Kosh is built for high performance. Follow these guidelines to maintain and improve performance.

Memory Management

Kosh uses object pooling extensively to reduce garbage collection pressure during high-throughput builds.

Buffer Pooling

Use SharedBufferPool for temporary byte buffers:
builder/utils/pools.go
// BufferPool manages reusable bytes.Buffer objects
type BufferPool struct {
    pool sync.Pool
}

func NewBufferPool() *BufferPool {
    return &BufferPool{
        pool: sync.Pool{
            New: func() interface{} {
                return new(bytes.Buffer)
            },
        },
    }
}

func (p *BufferPool) Get() *bytes.Buffer {
    return p.pool.Get().(*bytes.Buffer)
}

func (p *BufferPool) Put(buf *bytes.Buffer) {
    if buf.Cap() > MaxBufferSize {
        return  // Discard oversized buffers
    }
    buf.Reset()
    p.pool.Put(buf)
}
Usage pattern:
// Get buffer from pool
buf := utils.SharedBufferPool.Get()
defer utils.SharedBufferPool.Put(buf)

// Use buffer
buf.WriteString("content")
result := buf.String()
Always use defer to return buffers to the pool, even if an error occurs.

String Building

Use strings.Builder for string concatenation instead of + operator:
// Good - efficient
var sb strings.Builder
for _, word := range words {
    sb.WriteString(word)
    sb.WriteString(" ")
}
result := sb.String()

// Bad - creates intermediate strings
var result string
for _, word := range words {
    result += word + " "
}

Slice Pre-allocation

Pre-allocate slices when you know the size:
// Good - pre-allocate
posts := make([]models.PostMetadata, 0, expectedCount)
for _, file := range files {
    posts = append(posts, processFile(file))
}

// Bad - grows dynamically
var posts []models.PostMetadata
for _, file := range files {
    posts = append(posts, processFile(file))
}

Pool for Encoded Posts

builder/cache/cache.go
// Pool for batch BoltDB operations
var encodedPostPool = sync.Pool{
    New: func() interface{} {
        return make([]EncodedPost, 0, 64)
    },
}

// Usage
func (m *Manager) batchWrite(posts []*PostMeta) error {
    encoded := encodedPostPool.Get().([]EncodedPost)
    defer func() {
        encoded = encoded[:0]
        encodedPostPool.Put(encoded)
    }()
    
    // Use encoded slice...
    return nil
}

Concurrency Patterns

Worker Pools

Use the generic WorkerPool[T] for concurrent operations:
builder/utils/worker_pool.go
type WorkerPool[T any] struct {
    workers   int
    ctx       context.Context
    wg        sync.WaitGroup
    taskQueue chan T
    handler   func(T)
}

func NewWorkerPool[T any](ctx context.Context, workers int, handler func(T)) *WorkerPool[T] {
    if workers <= 0 {
        workers = runtime.NumCPU()
    }
    if workers > MaxWorkers {
        workers = MaxWorkers
    }
    return &WorkerPool[T]{
        workers:   workers,
        ctx:       ctx,
        taskQueue: make(chan T, workers*WorkerBufferSize),
        handler:   handler,
    }
}
Usage example:
ctx := context.Background()

// Create pool with 4 workers
pool := utils.NewWorkerPool(ctx, 4, func(path string) {
    processMarkdownFile(path)
})

pool.Start()

// Submit tasks
for _, path := range markdownFiles {
    pool.Submit(path)
}

// Wait for completion
pool.Stop()

Atomic Operations

Use atomic operations for counters in concurrent code:
var (
    processedCount int32
    anyChanged     atomic.Bool
)

// In worker goroutine
atomic.AddInt32(&processedCount, 1)
if changed {
    anyChanged.Store(true)
}

// Read final values
total := atomic.LoadInt32(&processedCount)
hasChanges := anyChanged.Load()
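Since Go 1.19, the typed atomics (atomic.Int32, atomic.Bool) express the same pattern without passing pointers to AddInt32/LoadInt32. A minimal sketch with a hypothetical worker function:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countChanged runs one goroutine per item and tracks progress with
// typed atomics instead of the pointer-based atomic.AddInt32 calls.
func countChanged(changes []bool) (int32, bool) {
	var processed atomic.Int32
	var anyChanged atomic.Bool
	var wg sync.WaitGroup
	for _, changed := range changes {
		wg.Add(1)
		go func(c bool) {
			defer wg.Done()
			processed.Add(1) // safe without a mutex
			if c {
				anyChanged.Store(true)
			}
		}(changed)
	}
	wg.Wait()
	return processed.Load(), anyChanged.Load()
}

func main() {
	total, changed := countChanged([]bool{false, true, false})
	fmt.Println(total, changed) // 3 true
}
```

The typed forms also prevent accidental non-atomic access to the counter, since the underlying integer is unexported.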

Mutex vs RWMutex

Use sync.RWMutex when reads are more frequent than writes:
type Cache struct {
    data map[string]*Entry
    mu   sync.RWMutex
}

// Many readers can access concurrently
func (c *Cache) Get(key string) *Entry {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.data[key]
}

// Writers get exclusive access
func (c *Cache) Set(key string, entry *Entry) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.data[key] = entry
}

Cache Optimization

In-Memory LRU Cache

Kosh uses an in-memory LRU cache for hot PostMeta data:
builder/cache/cache.go
type memoryCacheEntry struct {
    meta      *PostMeta
    expiresAt time.Time
}

type Manager struct {
    db          *bolt.DB
    memCache    map[string]*memoryCacheEntry
    memCacheMu  sync.RWMutex
    memCacheTTL time.Duration
}

const defaultMemCacheTTL = 5 * time.Minute
Benefits:
  • Reduces BoltDB reads for frequently accessed posts
  • 5-minute TTL ensures fresh data
  • Thread-safe with RWMutex

Content-Addressed Storage

// Small content stored inline (< 32KB)
if len(html) < 32*1024 {
    post.InlineHTML = html
} else {
    // Large content stored by hash
    hash := hashContent(html)
    storeContent(hash, html)
    post.ContentHash = hash
}
Benefits:
  • Avoids duplicate storage of identical content
  • Single I/O for small posts
  • Deduplication for large content
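A runnable sketch of this inline-vs-hash split. hashContent is not shown in the docs, so a SHA-256 hex digest stands in for the content address, and the in-memory store map is purely illustrative:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const inlineThreshold = 32 * 1024 // 32KB, matching the check above

// hashContent stands in for Kosh's content hash; any stable digest works.
func hashContent(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

type Post struct {
	InlineHTML  []byte
	ContentHash string
}

// storePost inlines small HTML and stores large HTML by hash. Because the
// hash is derived from the content, identical large posts share one entry.
func storePost(html []byte, store map[string][]byte) Post {
	if len(html) < inlineThreshold {
		return Post{InlineHTML: html} // small: single write, no extra I/O
	}
	hash := hashContent(html)
	store[hash] = html // same content, same key: deduplication for free
	return Post{ContentHash: hash}
}

func main() {
	store := map[string][]byte{}
	small := storePost([]byte("<p>hi</p>"), store)
	fmt.Println(len(small.InlineHTML) > 0, len(store)) // true 0
}
```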

Batch Operations

Group database writes for better throughput:
// Bad - multiple transactions
for _, post := range posts {
    db.Update(func(tx *bolt.Tx) error {
        return tx.Bucket([]byte("posts")).Put([]byte(post.ID), post.Data)
    })
}

// Good - single transaction
db.Update(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("posts"))
    for _, post := range posts {
        if err := b.Put([]byte(post.ID), post.Data); err != nil {
            return err
        }
    }
    return nil
})

Body Hash Caching

Kosh caches body content hash separately from frontmatter (v1.2.1):
type PostMeta struct {
    ID             string
    Title          string
    BodyHash       string             // Hash of body content only
    SSRInputHashes map[string]string  // D2/LaTeX hashes
}
Benefits:
  • Accurate cache invalidation on body-only changes
  • Prevents silent cache misses
  • Tracks server-side rendering dependencies
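The idea behind BodyHash can be sketched as hashing only what follows the closing frontmatter delimiter, so that a title or tag edit leaves the hash untouched. The delimiter handling below is an assumption about the source format, not Kosh's actual parser:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bodyHash hashes only the content after the "---" frontmatter block.
// Files without frontmatter are hashed whole.
func bodyHash(source []byte) string {
	body := source
	if bytes.HasPrefix(source, []byte("---")) {
		// Find the newline that starts the closing "---" line.
		if end := bytes.Index(source[3:], []byte("\n---")); end >= 0 {
			body = source[3+end+4:] // skip past the closing delimiter
		}
	}
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := bodyHash([]byte("---\ntitle: One\n---\nbody text"))
	b := bodyHash([]byte("---\ntitle: Two\n---\nbody text"))
	fmt.Println(a == b) // true
}
```

Two sources that differ only in frontmatter produce the same BodyHash, which is exactly the property that makes body-only cache invalidation accurate.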

Build Pipeline Optimization

Build Order (Critical)

Static assets MUST complete before post rendering because templates use the Assets map.
The build pipeline enforces this order:
1. Static assets build. Populates the Assets map via SetAssets():

   assets := make(map[string]string)
   assets["/static/css/layout.css"] = "/static/css/layout.abc123.css"
   renderService.SetAssets(assets)

2. Posts render. Templates read the asset map:

   <link rel="stylesheet" href="{{ index .Assets "/static/css/layout.css" }}">

3. Global pages render. Same asset references as posts.

4. PWA generation. Uses GetAssets() for the manifest and service worker.

Pre-computed Fields

Store normalized data to avoid runtime computation:
// Bad - normalize at query time
type SearchRecord struct {
    Title string
    Body  string
}

func search(query string) {
    normalizedQuery := strings.ToLower(query)
    for _, record := range records {
        if strings.Contains(strings.ToLower(record.Title), normalizedQuery) {
            // Match
        }
    }
}

// Good - pre-compute normalized strings
type SearchRecord struct {
    Title           string
    TitleNormalized string  // Pre-computed at index time
    Body            string
    BodyNormalized  string  // Pre-computed at index time
}

func search(query string) {
    normalizedQuery := strings.ToLower(query)
    for _, record := range records {
        if strings.Contains(record.TitleNormalized, normalizedQuery) {
            // Match - no runtime normalization
        }
    }
}

Stemming Cache

Kosh caches stemmed words for ~76x speedup on repeated words:
var (
    stemCache sync.Map  // Thread-safe cache
)

func StemCached(word string) string {
    if cached, ok := stemCache.Load(word); ok {
        return cached.(string)
    }
    
    stemmed := Stem(word)
    stemCache.Store(word, stemmed)
    return stemmed
}

I/O Optimization

Efficient File Walking

Use filepath.WalkDir instead of filepath.Walk:
// Good - efficient
err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
    if err != nil {
        return err
    }
    if d.IsDir() {
        return nil
    }
    // Process file
    return nil
})

// Bad - extra stat calls
err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
    if err != nil {
        return err
    }
    if info.IsDir() {
        return nil
    }
    // Process file
    return nil
})

Buffered I/O

// Use a buffered writer for large outputs
file, err := os.Create("output.html")
if err != nil {
    return err
}
defer file.Close()

writer := bufio.NewWriterSize(file, 64*1024)
defer writer.Flush()  // Deferred calls run LIFO: Flush before Close

writer.WriteString(content)

Avoid Double ReadFile

Kosh optimizes image encoding (v1.2.1):
// Bad - reads file twice
data, _ := os.ReadFile(imagePath)
encoded := encodeImage(data)
writeToCache(encoded)

data2, _ := os.ReadFile(imagePath)  // Duplicate read!
writeToDestination(data2)

// Good - read once, write twice
data, _ := os.ReadFile(imagePath)
encoded := encodeImage(data)
writeToCache(encoded)
writeToDestination(encoded)

Bytes vs String

Use bytes.Contains to avoid string allocation:
// Good - no allocation
if bytes.Contains(content, []byte("---")) {
    // Has frontmatter
}

// Bad - converts to string
if strings.Contains(string(content), "---") {
    // Has frontmatter
}

Profiling

CPU Profiling

# During build
kosh build --cpuprofile=cpu.prof

# Analyze
go tool pprof cpu.prof
Common pprof commands:
(pprof) top10          # Top 10 CPU consumers
(pprof) list Render    # Show CPU time in Render function
(pprof) web            # Visual graph (requires graphviz)
(pprof) pdf > cpu.pdf  # Export to PDF

Memory Profiling

# During build
kosh build --memprofile=mem.prof

# Analyze
go tool pprof mem.prof
Finding memory leaks:
(pprof) top10 -cum                # Top allocations (cumulative)
(pprof) list Cache                # Show allocations in Cache
(pprof) sample_index=inuse_space  # Switch to memory currently in use
(pprof) sample_index=alloc_space  # Switch to total allocated memory

Live Profiling

import _ "net/http/pprof"

func main() {
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    
    // Rest of application
}
Access profiles at:
  • http://localhost:6060/debug/pprof/
  • http://localhost:6060/debug/pprof/heap
  • http://localhost:6060/debug/pprof/goroutine

Build Metrics

Kosh tracks build performance metrics:
builder/metrics/metrics.go
type BuildMetrics struct {
    StartTime   time.Time
    CacheHits   int
    CacheMisses int
    PostsBuilt  int
}

// Output format
// 📊 Built 150 posts in 2.3s (cache: 120/30 hits, 80%)
Metrics collected:
  • Build duration
  • Cache hit/miss ratio
  • Posts processed
  • Average post processing time
Metrics are suppressed in serve --dev mode to reduce noise during watch mode.

Optimization Checklist

Before optimizing:
  • Profile first - Measure before changing
  • Identify bottleneck - Focus on the slowest part
  • Benchmark before - Establish baseline
  • Make one change - Isolate the impact
  • Benchmark after - Verify improvement
  • Check memory - Ensure no memory leaks
  • Test edge cases - Large files, many files
  • Document trade-offs - Note any complexity added
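The "benchmark before/after" steps usually mean Go benchmarks. A sketch comparing the two string-building approaches from earlier; in a real project these would live in a _test.go file and run via `go test -bench=. -benchmem`, but testing.Benchmark lets the example run standalone (the word list is a made-up workload):

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

var words = []string{"alpha", "beta", "gamma", "delta"}

// Baseline: the += pattern flagged as "bad" earlier.
func BenchmarkConcat(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var s string
		for _, w := range words {
			s += w + " "
		}
		_ = s
	}
}

// Candidate: strings.Builder, the recommended pattern.
func BenchmarkBuilder(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var sb strings.Builder
		for _, w := range words {
			sb.WriteString(w)
			sb.WriteString(" ")
		}
		_ = sb.String()
	}
}

func main() {
	fmt.Println("concat: ", testing.Benchmark(BenchmarkConcat))
	fmt.Println("builder:", testing.Benchmark(BenchmarkBuilder))
}
```

Keeping both the baseline and the candidate in the same file makes the before/after comparison reproducible for the next contributor.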

Performance Tips

Use Efficient Data Structures

// Use map for O(1) lookup
visited := make(map[string]bool)
if visited[id] {
    return  // Already processed
}

// Use sync.Map for concurrent access
var cache sync.Map
cache.Store(key, value)
value, ok := cache.Load(key)

Avoid Unnecessary Allocations

// Bad - allocates on every call
func getConfig() *Config {
    return &Config{
        Timeout: 30 * time.Second,
    }
}

// Good - reuse singleton
var defaultConfig = &Config{
    Timeout: 30 * time.Second,
}

func getConfig() *Config {
    return defaultConfig
}

Minimize Interface Conversions

// Bad - repeated type assertions
func process(items []interface{}) {
    for _, item := range items {
        str := item.(string)  // Type assertion in loop
        processString(str)
    }
}

// Good - use generics
func process[T any](items []T) {
    for _, item := range items {
        processItem(item)  // No type assertion
    }
}

Lazy Initialization

type Service struct {
    expensiveResource *Resource
    once              sync.Once
}

func (s *Service) getResource() *Resource {
    s.once.Do(func() {
        s.expensiveResource = initializeResource()
    })
    return s.expensiveResource
}

Performance Goals

  • Build time: < 5s for 1000 posts (cold cache)
  • Incremental build: < 500ms for single post
  • Memory usage: < 500MB for 10,000 posts
  • Cache hit rate: > 80% on incremental builds
  • Search index size: < 30% of total content size
  • WASM load time: < 2s on 3G connection
