Kosh is built for high performance. Follow these guidelines to maintain and improve performance.

Memory Management

Kosh uses object pooling extensively to reduce garbage collection pressure during high-throughput builds.

Buffer Pooling

Use SharedBufferPool for temporary byte buffers:
builder/utils/pools.go
// BufferPool manages reusable bytes.Buffer objects
type BufferPool struct {
    pool sync.Pool
}

func NewBufferPool() *BufferPool {
    return &BufferPool{
        pool: sync.Pool{
            New: func() interface{} {
                return new(bytes.Buffer)
            },
        },
    }
}

func (p *BufferPool) Get() *bytes.Buffer {
    return p.pool.Get().(*bytes.Buffer)
}

func (p *BufferPool) Put(buf *bytes.Buffer) {
    if buf.Cap() > MaxBufferSize {
        return  // Discard oversized buffers
    }
    buf.Reset()
    p.pool.Put(buf)
}
Usage pattern:
// Get buffer from pool
buf := utils.SharedBufferPool.Get()
defer utils.SharedBufferPool.Put(buf)

// Use buffer
buf.WriteString("content")
result := buf.String()
Always use defer to return buffers to the pool, even if an error occurs.

String Building

Use strings.Builder for string concatenation instead of + operator:
// Good - efficient
var sb strings.Builder
for _, word := range words {
    sb.WriteString(word)
    sb.WriteString(" ")
}
result := sb.String()

// Bad - creates intermediate strings
var result string
for _, word := range words {
    result += word + " "
}

Slice Pre-allocation

Pre-allocate slices when you know the size:
// Good - pre-allocate
posts := make([]models.PostMetadata, 0, expectedCount)
for _, file := range files {
    posts = append(posts, processFile(file))
}

// Bad - grows dynamically
var posts []models.PostMetadata
for _, file := range files {
    posts = append(posts, processFile(file))
}

Pool for Encoded Posts

builder/cache/cache.go
// Pool for batch BoltDB operations
var encodedPostPool = sync.Pool{
    New: func() interface{} {
        return make([]EncodedPost, 0, 64)
    },
}

// Usage
func (m *Manager) batchWrite(posts []*PostMeta) error {
    encoded := encodedPostPool.Get().([]EncodedPost)
    defer func() {
        encoded = encoded[:0]
        encodedPostPool.Put(encoded)
    }()
    
    // Use encoded slice...
    return nil
}

Concurrency Patterns

Worker Pools

Use the generic WorkerPool[T] for concurrent operations:
builder/utils/worker_pool.go
type WorkerPool[T any] struct {
    workers   int
    ctx       context.Context
    wg        sync.WaitGroup
    taskQueue chan T
    handler   func(T)
}

func NewWorkerPool[T any](ctx context.Context, workers int, handler func(T)) *WorkerPool[T] {
    if workers <= 0 {
        workers = runtime.NumCPU()
    }
    if workers > MaxWorkers {
        workers = MaxWorkers
    }
    return &WorkerPool[T]{
        workers:   workers,
        ctx:       ctx,
        taskQueue: make(chan T, workers*WorkerBufferSize),
        handler:   handler,
    }
}
Usage example:
ctx := context.Background()

// Create pool with 4 workers
pool := utils.NewWorkerPool(ctx, 4, func(path string) {
    processMarkdownFile(path)
})

pool.Start()

// Submit tasks
for _, path := range markdownFiles {
    pool.Submit(path)
}

// Wait for completion
pool.Stop()

Atomic Operations

Use atomic operations for counters in concurrent code:
var (
    processedCount int32
    anyChanged     atomic.Bool
)

// In worker goroutine
atomic.AddInt32(&processedCount, 1)
if changed {
    anyChanged.Store(true)
}

// Read final values
total := atomic.LoadInt32(&processedCount)
hasChanges := anyChanged.Load()
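Since Go 1.19, the typed atomics (atomic.Int32, atomic.Bool) express the same pattern without passing pointers to AddInt32/LoadInt32. A minimal sketch with a hypothetical worker function:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countChanged runs one goroutine per item and tracks progress with
// typed atomics instead of the pointer-based atomic.AddInt32 calls.
func countChanged(changes []bool) (int32, bool) {
	var processed atomic.Int32
	var anyChanged atomic.Bool
	var wg sync.WaitGroup
	for _, changed := range changes {
		wg.Add(1)
		go func(c bool) {
			defer wg.Done()
			processed.Add(1) // safe without a mutex
			if c {
				anyChanged.Store(true)
			}
		}(changed)
	}
	wg.Wait()
	return processed.Load(), anyChanged.Load()
}

func main() {
	total, changed := countChanged([]bool{false, true, false})
	fmt.Println(total, changed) // 3 true
}
```

The typed forms also prevent accidental non-atomic access to the counter, since the underlying integer is unexported.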

Mutex vs RWMutex

Use sync.RWMutex when reads are more frequent than writes:
type Cache struct {
    data map[string]*Entry
    mu   sync.RWMutex
}

// Many readers can access concurrently
func (c *Cache) Get(key string) *Entry {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.data[key]
}

// Writers get exclusive access
func (c *Cache) Set(key string, entry *Entry) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.data[key] = entry
}

Cache Optimization

In-Memory LRU Cache

Kosh uses an in-memory LRU cache for hot PostMeta data:
builder/cache/cache.go
type memoryCacheEntry struct {
    meta      *PostMeta
    expiresAt time.Time
}

type Manager struct {
    db          *bolt.DB
    memCache    map[string]*memoryCacheEntry
    memCacheMu  sync.RWMutex
    memCacheTTL time.Duration
}

const defaultMemCacheTTL = 5 * time.Minute
Benefits:
  • Reduces BoltDB reads for frequently accessed posts
  • 5-minute TTL ensures fresh data
  • Thread-safe with RWMutex

Content-Addressed Storage

// Small content stored inline (< 32KB)
if len(html) < 32*1024 {
    post.InlineHTML = html
} else {
    // Large content stored by hash
    hash := hashContent(html)
    storeContent(hash, html)
    post.ContentHash = hash
}
Benefits:
  • Avoids duplicate storage of identical content
  • Single I/O for small posts
  • Deduplication for large content
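A runnable sketch of this inline-vs-hash split. hashContent is not shown in the docs, so a SHA-256 hex digest stands in for the content address, and the in-memory store map is purely illustrative:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const inlineThreshold = 32 * 1024 // 32KB, matching the check above

// hashContent stands in for Kosh's content hash; any stable digest works.
func hashContent(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

type Post struct {
	InlineHTML  []byte
	ContentHash string
}

// storePost inlines small HTML and stores large HTML by hash. Because the
// hash is derived from the content, identical large posts share one entry.
func storePost(html []byte, store map[string][]byte) Post {
	if len(html) < inlineThreshold {
		return Post{InlineHTML: html} // small: single write, no extra I/O
	}
	hash := hashContent(html)
	store[hash] = html // same content, same key: deduplication for free
	return Post{ContentHash: hash}
}

func main() {
	store := map[string][]byte{}
	small := storePost([]byte("<p>hi</p>"), store)
	fmt.Println(len(small.InlineHTML) > 0, len(store)) // true 0
}
```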

Batch Operations

Group database writes for better throughput:
// Bad - multiple transactions
for _, post := range posts {
    db.Update(func(tx *bolt.Tx) error {
        return tx.Bucket([]byte("posts")).Put([]byte(post.ID), post.Data)
    })
}

// Good - single transaction
db.Update(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("posts"))
    for _, post := range posts {
        if err := b.Put([]byte(post.ID), post.Data); err != nil {
            return err
        }
    }
    return nil
})

Body Hash Caching

Kosh caches body content hash separately from frontmatter (v1.2.1):
type PostMeta struct {
    ID             string
    Title          string
    BodyHash       string             // Hash of body content only
    SSRInputHashes map[string]string  // D2/LaTeX hashes
}
Benefits:
  • Accurate cache invalidation on body-only changes
  • Prevents silent cache misses
  • Tracks server-side rendering dependencies
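The idea behind BodyHash can be sketched as hashing only what follows the closing frontmatter delimiter, so that a title or tag edit leaves the hash untouched. The delimiter handling below is an assumption about the source format, not Kosh's actual parser:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bodyHash hashes only the content after the "---" frontmatter block.
// Files without frontmatter are hashed whole.
func bodyHash(source []byte) string {
	body := source
	if bytes.HasPrefix(source, []byte("---")) {
		// Find the newline that starts the closing "---" line.
		if end := bytes.Index(source[3:], []byte("\n---")); end >= 0 {
			body = source[3+end+4:] // skip past the closing delimiter
		}
	}
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := bodyHash([]byte("---\ntitle: One\n---\nbody text"))
	b := bodyHash([]byte("---\ntitle: Two\n---\nbody text"))
	fmt.Println(a == b) // true
}
```

Two sources that differ only in frontmatter produce the same BodyHash, which is exactly the property that makes body-only cache invalidation accurate.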

Build Pipeline Optimization

Build Order (Critical)

Static assets MUST complete before post rendering because templates use the Assets map.
The build pipeline enforces this order:
1. Static assets build. Populates the Assets map via SetAssets():

   assets := make(map[string]string)
   assets["/static/css/layout.css"] = "/static/css/layout.abc123.css"
   renderService.SetAssets(assets)

2. Posts render. Templates read the asset map:

   <link rel="stylesheet" href="{{ index .Assets "/static/css/layout.css" }}">

3. Global pages render. Same asset references as posts.

4. PWA generation. Uses GetAssets() for the manifest and service worker.

Pre-computed Fields

Store normalized data to avoid runtime computation:
// Bad - normalize at query time
type SearchRecord struct {
    Title string
    Body  string
}

func search(query string) {
    normalizedQuery := strings.ToLower(query)
    for _, record := range records {
        if strings.Contains(strings.ToLower(record.Title), normalizedQuery) {
            // Match
        }
    }
}

// Good - pre-compute normalized strings
type SearchRecord struct {
    Title           string
    TitleNormalized string  // Pre-computed at index time
    Body            string
    BodyNormalized  string  // Pre-computed at index time
}

func search(query string) {
    normalizedQuery := strings.ToLower(query)
    for _, record := range records {
        if strings.Contains(record.TitleNormalized, normalizedQuery) {
            // Match - no runtime normalization
        }
    }
}

Stemming Cache

Kosh caches stemmed words for ~76x speedup on repeated words:
var (
    stemCache sync.Map  // Thread-safe cache
)

func StemCached(word string) string {
    if cached, ok := stemCache.Load(word); ok {
        return cached.(string)
    }
    
    stemmed := Stem(word)
    stemCache.Store(word, stemmed)
    return stemmed
}

I/O Optimization

Efficient File Walking

Use filepath.WalkDir instead of filepath.Walk:
// Good - efficient
err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
    if err != nil {
        return err
    }
    if d.IsDir() {
        return nil
    }
    // Process file
    return nil
})

// Bad - extra stat calls
err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
    if err != nil {
        return err
    }
    if info.IsDir() {
        return nil
    }
    // Process file
    return nil
})

Buffered I/O

// Use a buffered writer for large outputs
file, err := os.Create("output.html")
if err != nil {
    return err
}
defer file.Close()

writer := bufio.NewWriterSize(file, 64*1024)
defer writer.Flush()  // Deferred calls run LIFO: Flush before Close

writer.WriteString(content)

Avoid Double ReadFile

Kosh optimizes image encoding (v1.2.1):
// Bad - reads file twice
data, _ := os.ReadFile(imagePath)
encoded := encodeImage(data)
writeToCache(encoded)

data2, _ := os.ReadFile(imagePath)  // Duplicate read!
writeToDestination(data2)

// Good - read once, write twice
data, _ := os.ReadFile(imagePath)
encoded := encodeImage(data)
writeToCache(encoded)
writeToDestination(encoded)

Bytes vs String

Use bytes.Contains to avoid string allocation:
// Good - no allocation
if bytes.Contains(content, []byte("---")) {
    // Has frontmatter
}

// Bad - converts to string
if strings.Contains(string(content), "---") {
    // Has frontmatter
}

Profiling

CPU Profiling

# During build
kosh build --cpuprofile=cpu.prof

# Analyze
go tool pprof cpu.prof
Common pprof commands:
(pprof) top10          # Top 10 CPU consumers
(pprof) list Render    # Show CPU time in Render function
(pprof) web            # Visual graph (requires graphviz)
(pprof) pdf > cpu.pdf  # Export to PDF

Memory Profiling

# During build
kosh build --memprofile=mem.prof

# Analyze
go tool pprof mem.prof
Finding memory leaks:
(pprof) top10 -cum                # Top allocations (cumulative)
(pprof) list Cache                # Show allocations in Cache
(pprof) sample_index=inuse_space  # Switch to memory currently in use
(pprof) sample_index=alloc_space  # Switch to total allocated memory

Live Profiling

import _ "net/http/pprof"

func main() {
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    
    // Rest of application
}
Access profiles at:
  • http://localhost:6060/debug/pprof/
  • http://localhost:6060/debug/pprof/heap
  • http://localhost:6060/debug/pprof/goroutine

Build Metrics

Kosh tracks build performance metrics:
builder/metrics/metrics.go
type BuildMetrics struct {
    StartTime   time.Time
    CacheHits   int
    CacheMisses int
    PostsBuilt  int
}

// Output format
// 📊 Built 150 posts in 2.3s (cache: 120/30 hits, 80%)
Metrics collected:
  • Build duration
  • Cache hit/miss ratio
  • Posts processed
  • Average post processing time
Metrics are suppressed in serve --dev mode to reduce noise during watch mode.

Optimization Checklist

Before optimizing:
  • Profile first - Measure before changing
  • Identify bottleneck - Focus on the slowest part
  • Benchmark before - Establish baseline
  • Make one change - Isolate the impact
  • Benchmark after - Verify improvement
  • Check memory - Ensure no memory leaks
  • Test edge cases - Large files, many files
  • Document trade-offs - Note any complexity added
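The "benchmark before/after" steps usually mean Go benchmarks. A sketch comparing the two string-building approaches from earlier; in a real project these would live in a _test.go file and run via `go test -bench=. -benchmem`, but testing.Benchmark lets the example run standalone (the word list is a made-up workload):

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

var words = []string{"alpha", "beta", "gamma", "delta"}

// Baseline: the += pattern flagged as "bad" earlier.
func BenchmarkConcat(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var s string
		for _, w := range words {
			s += w + " "
		}
		_ = s
	}
}

// Candidate: strings.Builder, the recommended pattern.
func BenchmarkBuilder(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var sb strings.Builder
		for _, w := range words {
			sb.WriteString(w)
			sb.WriteString(" ")
		}
		_ = sb.String()
	}
}

func main() {
	fmt.Println("concat: ", testing.Benchmark(BenchmarkConcat))
	fmt.Println("builder:", testing.Benchmark(BenchmarkBuilder))
}
```

Keeping both the baseline and the candidate in the same file makes the before/after comparison reproducible for the next contributor.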

Performance Tips

Use Efficient Data Structures

// Use map for O(1) lookup
visited := make(map[string]bool)
if visited[id] {
    return  // Already processed
}

// Use sync.Map for concurrent access
var cache sync.Map
cache.Store(key, value)
value, ok := cache.Load(key)

Avoid Unnecessary Allocations

// Bad - allocates on every call
func getConfig() *Config {
    return &Config{
        Timeout: 30 * time.Second,
    }
}

// Good - reuse singleton
var defaultConfig = &Config{
    Timeout: 30 * time.Second,
}

func getConfig() *Config {
    return defaultConfig
}

Minimize Interface Conversions

// Bad - repeated type assertions
func process(items []interface{}) {
    for _, item := range items {
        str := item.(string)  // Type assertion in loop
        processString(str)
    }
}

// Good - use generics
func process[T any](items []T) {
    for _, item := range items {
        processItem(item)  // No type assertion
    }
}

Lazy Initialization

type Service struct {
    expensiveResource *Resource
    once              sync.Once
}

func (s *Service) getResource() *Resource {
    s.once.Do(func() {
        s.expensiveResource = initializeResource()
    })
    return s.expensiveResource
}

Performance Goals

  • Build time: < 5s for 1000 posts (cold cache)
  • Incremental build: < 500ms for single post
  • Memory usage: < 500MB for 10,000 posts
  • Cache hit rate: > 80% on incremental builds
  • Search index size: < 30% of total content size
  • WASM load time: < 2s on 3G connection
