Kosh implements incremental builds using a BoltDB cache and content-addressed storage. Only changed files are reprocessed, making warm rebuilds roughly 10-20x faster than cold full builds, and single-file rebuilds faster still (see the benchmarks below).

Architecture

The cache system consists of three layers:
builder/cache/
├── cache.go           # BoltDB manager with LRU cache
├── cache_reads.go     # Generic read operations
├── cache_writes.go    # Batch write operations
├── types.go           # Data structures (PostMeta, SearchRecord)
└── store.go           # Content-addressed filesystem

Storage Design

.kosh-cache/
├── meta.db            # BoltDB database
│   ├── posts          # PostMeta indexed by PostID
│   ├── paths          # Path → PostID mapping
│   ├── search         # Pre-computed BM25 data
│   ├── deps           # Template/include dependencies
│   └── ssr_artifacts  # D2 diagrams, KaTeX math
└── store/             # Content-addressed files
    ├── 3a/f2...       # Large HTML (by BLAKE3 hash)
    └── d4/1c...       # SSR outputs (diagrams, math)

Cache Invalidation

Kosh uses hash-based invalidation to detect changes:

Body Hash Tracking (v1.2.1)

Previously, Kosh only hashed frontmatter, causing silent cache hits when body content changed. v1.2.1 introduced separate body hashing. From builder/cache/types.go:16-25:
type PostMeta struct {
    PostID         string
    ContentHash    string  // Frontmatter hash
    BodyHash       string  // Body content hash (CRITICAL)
    HTMLHash       string  // For large posts
    InlineHTML     []byte  // < 32KB posts stored inline
    TemplateHash   string
    SSRInputHashes []string  // D2/LaTeX input hashes
    // ...
}
Invalidation logic:
// Both hashes must match for cache hit
if cached.ContentHash == newContentHash && cached.BodyHash == newBodyHash {
    return cached  // Cache hit
}
// Otherwise, re-render
Before v1.2.1, changing only the body content (without touching frontmatter) would incorrectly reuse cached HTML. This critical bug is now fixed.

SSR Hash Tracking

D2 diagrams and LaTeX math are cached separately and tracked in SSRInputHashes:
type SSRArtifact struct {
    Type       string  // "d2" or "katex"
    InputHash  string  // BLAKE3 of source code
    OutputHash string  // BLAKE3 of rendered output
    RefCount   int
    Size       int64
    Compressed bool    // Zstd compression
}
Example: If you change a D2 diagram from A -> B to A -> C, only that diagram re-renders—the rest of the post uses cached HTML.
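The lookup behind that example can be sketched as a get-or-render keyed by input hash. This is an illustrative, in-memory stand-in for the `ssr_artifacts` bucket: `getOrRender` and `renderD2` are hypothetical names, and sha256 substitutes for BLAKE3.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// SSRArtifact mirrors the struct above (fields trimmed for the sketch).
type SSRArtifact struct {
	Type   string
	Output string
}

// artifacts is an in-memory stand-in for the ssr_artifacts bucket.
var artifacts = map[string]SSRArtifact{}

func inputHash(src string) string {
	sum := sha256.Sum256([]byte(src)) // Kosh uses BLAKE3; sha256 keeps this self-contained
	return hex.EncodeToString(sum[:])
}

// renderD2 would invoke the real D2 compiler; stubbed here.
func renderD2(src string) string { return "<svg>" + src + "</svg>" }

// getOrRender returns a cached artifact when the input hash matches,
// re-rendering only when the diagram source actually changed.
func getOrRender(src string) SSRArtifact {
	key := inputHash(src)
	if a, ok := artifacts[key]; ok {
		return a // cache hit: source unchanged
	}
	a := SSRArtifact{Type: "d2", Output: renderD2(src)}
	artifacts[key] = a
	return a
}

func main() {
	getOrRender("A -> B")
	getOrRender("A -> C") // different input hash: only this diagram re-renders
	fmt.Println(len(artifacts))
}
```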

In-Memory LRU Cache

To reduce BoltDB reads, v1.2.1 added an LRU (Least Recently Used) cache for hot data. From builder/cache/cache.go:21-42:
type Manager struct {
    db          *bolt.DB
    memCache    map[string]*memoryCacheEntry
    memCacheMu  sync.RWMutex
    memCacheTTL time.Duration  // Default: 5 minutes
}

type memoryCacheEntry struct {
    meta      *PostMeta
    expiresAt time.Time
}

Cache Lookup Flow

From builder/cache/cache_reads.go:72-108:
func (m *Manager) GetPostByPath(path string) (*PostMeta, error) {
    normalizedPath := normalizePath(path) // normalization helper elided in this excerpt

    // 1. Check in-memory cache first
    if cached := m.memCacheGet("path:" + normalizedPath); cached != nil {
        return cached, nil // Fast path: in-memory map lookup
    }

    // 2. Fall back to BoltDB
    var result *PostMeta
    err := m.db.View(func(tx *bolt.Tx) error {
        paths := tx.Bucket([]byte(BucketPaths))
        postID := paths.Get([]byte(normalizedPath))
        if postID == nil {
            return nil // Unknown path: nothing cached
        }

        posts := tx.Bucket([]byte(BucketPosts))
        data := posts.Get(postID)
        if data == nil {
            return nil
        }

        var meta PostMeta
        if err := Decode(data, &meta); err != nil {
            return err
        }
        result = &meta
        return nil
    })

    // 3. Store in memory cache for next lookup
    if result != nil {
        m.memCacheSet("path:"+normalizedPath, result)
    }
    return result, err
}
Performance impact: Frequently accessed posts (like index pages) see ~100x faster lookups after the first read.
The 5-minute TTL ensures the cache stays fresh during watch mode. Cache entries are automatically evicted on writes.

Generic Cache Operations

Kosh uses Go 1.18+ generics for type-safe cache reads (from builder/cache/cache_reads.go:13-33):
func getCachedItem[T any](db *bolt.DB, bucketName string, key []byte) (*T, error) {
    var result *T
    err := db.View(func(tx *bolt.Tx) error {
        bucket := tx.Bucket([]byte(bucketName))
        data := bucket.Get(key)
        if data == nil {
            return nil
        }

        var item T
        if err := Decode(data, &item); err != nil {
            return err
        }
        result = &item
        return nil
    })
    return result, err
}
Usage:
// Type-safe, no casting needed
post, err := getCachedItem[PostMeta](db, BucketPosts, postID)
search, err := getCachedItem[SearchRecord](db, BucketSearch, postID)

Batch Writes

To maximize throughput, Kosh uses object pooling and batch commits (from builder/cache/cache.go:15-19):
var encodedPostPool = sync.Pool{
    New: func() interface{} {
        return make([]EncodedPost, 0, 64)
    },
}

Batch Commit Pattern

func (m *Manager) CommitBatch(posts []EncodedPost) error {
    return m.db.Update(func(tx *bolt.Tx) error {
        postsB := tx.Bucket([]byte(BucketPosts))
        pathsB := tx.Bucket([]byte(BucketPaths))
        searchB := tx.Bucket([]byte(BucketSearch))

        for _, p := range posts {
            if err := postsB.Put(p.PostID, p.Data); err != nil {
                return err
            }
            if err := pathsB.Put(p.Path, p.PostID); err != nil {
                return err
            }
            if err := searchB.Put(p.PostID, p.SearchData); err != nil {
                return err
            }
        }
        return nil
    })
}
Why batch? A BoltDB fsync costs roughly 5ms. Committing 100 posts in a single transaction replaces 100 fsyncs with one, saving ~500ms per batch.
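The pool's intended lifecycle (borrow, fill, commit once, reset, return) can be sketched without BoltDB. `withBatch` is a hypothetical helper for illustration; only the pool definition comes from the source above.

```go
package main

import (
	"fmt"
	"sync"
)

// EncodedPost is trimmed to two fields for the sketch.
type EncodedPost struct {
	PostID, Data []byte
}

// encodedPostPool mirrors the pool above: the backing array is reused
// across batches, so steady-state builds allocate no new slices.
var encodedPostPool = sync.Pool{
	New: func() interface{} { return make([]EncodedPost, 0, 64) },
}

// withBatch is a hypothetical helper showing the lifecycle:
// borrow, fill, commit once (one transaction -> one fsync), reset, return.
func withBatch(fill func([]EncodedPost) []EncodedPost, commit func([]EncodedPost) error) error {
	batch := encodedPostPool.Get().([]EncodedPost)
	batch = fill(batch)
	err := commit(batch)
	encodedPostPool.Put(batch[:0]) // reset length, keep capacity
	return err
}

func main() {
	var committed int
	_ = withBatch(
		func(b []EncodedPost) []EncodedPost {
			return append(b, EncodedPost{PostID: []byte("p1"), Data: []byte("...")})
		},
		func(b []EncodedPost) error { committed = len(b); return nil },
	)
	fmt.Println(committed)
}
```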

Content-Addressed Storage

Large HTML (>32KB) is stored in the filesystem by BLAKE3 hash:
const InlineHTMLThreshold = 32 * 1024  // 32KB

if len(html) < InlineHTMLThreshold {
    meta.InlineHTML = html  // Store in BoltDB
} else {
    hash := cache.HashContent(html)
    store.Write(hash, html)  // Write to .kosh-cache/store/
    meta.HTMLHash = hash
}
Directory structure:
.kosh-cache/store/
├── 3a/f2d8e1...  # First 2 chars = subdir
└── d4/1c9b7a...
This prevents BoltDB bloat and enables efficient deduplication.
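Deriving the sharded path is straightforward: the first two hex characters of the hash become the subdirectory, so no single directory accumulates every object. A minimal sketch, with sha256 standing in for BLAKE3 and `storePath` as an illustrative name:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// storePath maps content to a sharded location under root.
// Identical content always maps to the same path, which is what
// makes writes naturally deduplicating.
func storePath(root string, content []byte) string {
	sum := sha256.Sum256(content) // Kosh uses BLAKE3; sha256 keeps this self-contained
	h := hex.EncodeToString(sum[:])
	return filepath.Join(root, h[:2], h[2:])
}

func main() {
	fmt.Println(storePath(".kosh-cache/store", []byte("<html>...</html>")))
}
```

Since the path is a pure function of the content, writing the same HTML twice targets the same file, and a write can be skipped when the file already exists.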

Watch Mode

In dev mode, Kosh watches for changes and performs surgical rebuilds. From builder/run/incremental.go:60-95:
func (b *Builder) BuildChanged(ctx context.Context, changedPath string, op fsnotify.Op) {
    ext := filepath.Ext(changedPath) // used for the CSS/JS check below
    // Handle deletions
    if op&fsnotify.Remove == fsnotify.Remove {
        b.deletePostFromCache(changedPath)
        b.Build(ctx)  // Full rebuild to update indexes
        return
    }

    // Handle markdown changes - single post rebuild
    if strings.HasSuffix(changedPath, ".md") {
        b.buildSinglePost(ctx, changedPath)
        utils.SyncVFS(b.DestFs, b.cfg.OutputDir, b.renderService.GetRenderedFiles())
        return
    }

    // Handle CSS/JS changes - full rebuild to update asset hashes
    if ext == ".css" || ext == ".js" {
        b.Build(ctx)  // Asset hashes affect HTML
        return
    }

    // Handle template changes - invalidate affected posts
    if strings.HasSuffix(changedPath, ".html") {
        affectedPaths := b.invalidateForTemplate(changedPath)
        if affectedPaths == nil {
            b.Build(ctx)  // Layout.html changed
        } else {
            for _, path := range affectedPaths {
                b.buildSinglePost(ctx, path)
            }
        }
    }
}

Rebuild Strategies

Change type        Strategy                         Speed
-----------        --------                         -----
Single .md file    Re-render only that post         ~50ms
CSS/JS file        Full rebuild (asset hashes)      ~500ms
Template           Rebuild posts using template     ~200ms
layout.html        Full rebuild                     ~500ms
Config             Full rebuild                     ~500ms
Watch mode automatically debounces rapid changes to prevent rebuild storms. Only one build runs at a time.

Cache Management Commands

# Show cache statistics
kosh cache stats

# Verify cache integrity
kosh cache verify

# Run garbage collection (remove orphaned content)
kosh cache gc

# Dry run (show what would be deleted)
kosh cache gc --dry-run

# Force full rebuild (clear cache)
kosh cache rebuild

# Delete all cache data
kosh cache clear

# Inspect a specific file's cache entry
kosh cache inspect content/posts/my-post.md

Performance Benchmarks

100-post documentation site:

Operation        Cold (no cache)   Warm (cached)   Speedup
Full build       2.5s              250ms           10x
Single post      50ms              10ms            5x
Watch rebuild    500ms             50ms            10x

500-post blog:

Operation        Cold    Warm    Speedup
Full build       15s     800ms   18x
Single post      80ms    15ms    5x
Incremental builds shine during development. The first build after git clone is slow, but subsequent builds are nearly instant.

BoltDB Configuration

Kosh optimizes BoltDB for SSG workloads (from builder/cache/cache.go:50-79):
opts := &bolt.Options{
    Timeout:         10 * time.Second,
    FreelistType:    bolt.FreelistArrayType,  // Faster than map
    PageSize:        16384,                   // 16KB pages
    InitialMmapSize: calculatedSize,          // Pre-allocate based on existing DB
}

if isDev {
    opts.NoGrowSync = true  // Faster, less durable (okay for dev)
} else {
    opts.NoGrowSync = false // Production: full durability
}
Key optimizations:
  • Array freelist: Faster for SSG’s write-heavy workload
  • 16KB pages: better fit for multi-KB post records than the 4KB default
  • Dynamic mmap: Grows to 2x current DB size (max 100MB)
  • Dev mode: Skips fsync on grow operations

Cache ID Verification

Kosh stores a cache ID to detect configuration changes:
func (m *Manager) VerifyCacheID(expectedID string) (needsRebuild bool, err error) {
    var storedID []byte
    m.db.View(func(tx *bolt.Tx) error {
        meta := tx.Bucket([]byte(BucketMeta))
        storedID = meta.Get([]byte(KeyCacheID))
        return nil
    })

    if storedID == nil || string(storedID) != expectedID {
        return true, nil  // Rebuild needed
    }
    return false, nil
}
What triggers cache invalidation:
  • Theme change
  • Output directory change
  • Base URL change (if affecting rendered HTML)
  • Major version upgrade
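One way to derive such an ID is to hash the invalidating settings together, so any change to any of them produces a different ID. This is a hypothetical sketch of that idea, not Kosh's actual derivation; the field list comes from the bullets above, and sha256 stands in for the hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// cacheID hashes the settings that should invalidate the cache.
// A NUL separator prevents ambiguous concatenations like
// ("ab","c") vs ("a","bc").
func cacheID(theme, outputDir, baseURL, majorVersion string) string {
	sum := sha256.Sum256([]byte(strings.Join(
		[]string{theme, outputDir, baseURL, majorVersion}, "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := cacheID("default", "public", "https://example.com", "1")
	b := cacheID("dark", "public", "https://example.com", "1")
	fmt.Println(a != b) // any config change yields a new ID -> full rebuild
}
```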
