Kosh implements incremental builds using a BoltDB cache and content-addressed storage. Only changed files are reprocessed, making warm rebuilds roughly 10-20x faster than cold full builds, and single-file rebuilds faster still (see the benchmarks below).

Architecture

The cache system consists of three layers:
builder/cache/
├── cache.go           # BoltDB manager with LRU cache
├── cache_reads.go     # Generic read operations
├── cache_writes.go    # Batch write operations
├── types.go           # Data structures (PostMeta, SearchRecord)
└── store.go           # Content-addressed filesystem

Storage Design

.kosh-cache/
├── meta.db            # BoltDB database
│   ├── posts          # PostMeta indexed by PostID
│   ├── paths          # Path → PostID mapping
│   ├── search         # Pre-computed BM25 data
│   ├── deps           # Template/include dependencies
│   └── ssr_artifacts  # D2 diagrams, KaTeX math
└── store/             # Content-addressed files
    ├── 3a/f2...       # Large HTML (by BLAKE3 hash)
    └── d4/1c...       # SSR outputs (diagrams, math)

Cache Invalidation

Kosh uses hash-based invalidation to detect changes:

Body Hash Tracking (v1.2.1)

Previously, Kosh only hashed frontmatter, causing silent cache hits when body content changed. v1.2.1 introduced separate body hashing. From builder/cache/types.go:16-25:
type PostMeta struct {
    PostID         string
    ContentHash    string  // Frontmatter hash
    BodyHash       string  // Body content hash (CRITICAL)
    HTMLHash       string  // For large posts
    InlineHTML     []byte  // < 32KB posts stored inline
    TemplateHash   string
    SSRInputHashes []string  // D2/LaTeX input hashes
    // ...
}
Invalidation logic:
// Both hashes must match for cache hit
if cached.ContentHash == newContentHash && cached.BodyHash == newBodyHash {
    return cached  // Cache hit
}
// Otherwise, re-render
Before v1.2.1, changing only the body content (without touching frontmatter) would incorrectly reuse cached HTML. This critical bug is now fixed.

SSR Hash Tracking

D2 diagrams and LaTeX math are cached separately and tracked in SSRInputHashes:
type SSRArtifact struct {
    Type       string  // "d2" or "katex"
    InputHash  string  // BLAKE3 of source code
    OutputHash string  // BLAKE3 of rendered output
    RefCount   int
    Size       int64
    Compressed bool    // Zstd compression
}
Example: If you change a D2 diagram from A -> B to A -> C, only that diagram re-renders—the rest of the post uses cached HTML.
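The lookup behind that example can be sketched as a get-or-render keyed by input hash. This is an illustrative, in-memory stand-in for the `ssr_artifacts` bucket: `getOrRender` and `renderD2` are hypothetical names, and sha256 substitutes for BLAKE3.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// SSRArtifact mirrors the struct above (fields trimmed for the sketch).
type SSRArtifact struct {
	Type   string
	Output string
}

// artifacts is an in-memory stand-in for the ssr_artifacts bucket.
var artifacts = map[string]SSRArtifact{}

func inputHash(src string) string {
	sum := sha256.Sum256([]byte(src)) // Kosh uses BLAKE3; sha256 keeps this self-contained
	return hex.EncodeToString(sum[:])
}

// renderD2 would invoke the real D2 compiler; stubbed here.
func renderD2(src string) string { return "<svg>" + src + "</svg>" }

// getOrRender returns a cached artifact when the input hash matches,
// re-rendering only when the diagram source actually changed.
func getOrRender(src string) SSRArtifact {
	key := inputHash(src)
	if a, ok := artifacts[key]; ok {
		return a // cache hit: source unchanged
	}
	a := SSRArtifact{Type: "d2", Output: renderD2(src)}
	artifacts[key] = a
	return a
}

func main() {
	getOrRender("A -> B")
	getOrRender("A -> C") // different input hash: only this diagram re-renders
	fmt.Println(len(artifacts))
}
```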

In-Memory LRU Cache

To reduce BoltDB reads, v1.2.1 added an LRU (Least Recently Used) cache for hot data. From builder/cache/cache.go:21-42:
type Manager struct {
    db          *bolt.DB
    memCache    map[string]*memoryCacheEntry
    memCacheMu  sync.RWMutex
    memCacheTTL time.Duration  // Default: 5 minutes
}

type memoryCacheEntry struct {
    meta      *PostMeta
    expiresAt time.Time
}

Cache Lookup Flow

From builder/cache/cache_reads.go:72-108:
func (m *Manager) GetPostByPath(path string) (*PostMeta, error) {
    normalizedPath := normalizePath(path) // normalization helper elided in this excerpt

    // 1. Check in-memory cache first
    if cached := m.memCacheGet("path:" + normalizedPath); cached != nil {
        return cached, nil // Fast path: in-memory map lookup
    }

    // 2. Fall back to BoltDB
    var result *PostMeta
    err := m.db.View(func(tx *bolt.Tx) error {
        paths := tx.Bucket([]byte(BucketPaths))
        postID := paths.Get([]byte(normalizedPath))
        if postID == nil {
            return nil // Unknown path: nothing cached
        }

        posts := tx.Bucket([]byte(BucketPosts))
        data := posts.Get(postID)
        if data == nil {
            return nil
        }

        var meta PostMeta
        if err := Decode(data, &meta); err != nil {
            return err
        }
        result = &meta
        return nil
    })

    // 3. Store in memory cache for next lookup
    if result != nil {
        m.memCacheSet("path:"+normalizedPath, result)
    }
    return result, err
}
Performance impact: Frequently accessed posts (like index pages) see ~100x faster lookups after the first read.
The 5-minute TTL ensures the cache stays fresh during watch mode. Cache entries are automatically evicted on writes.

Generic Cache Operations

Kosh uses Go 1.18+ generics for type-safe cache reads (from builder/cache/cache_reads.go:13-33):
func getCachedItem[T any](db *bolt.DB, bucketName string, key []byte) (*T, error) {
    var result *T
    err := db.View(func(tx *bolt.Tx) error {
        bucket := tx.Bucket([]byte(bucketName))
        data := bucket.Get(key)
        if data == nil {
            return nil
        }

        var item T
        if err := Decode(data, &item); err != nil {
            return err
        }
        result = &item
        return nil
    })
    return result, err
}
Usage:
// Type-safe, no casting needed
post, err := getCachedItem[PostMeta](db, BucketPosts, postID)
search, err := getCachedItem[SearchRecord](db, BucketSearch, postID)

Batch Writes

To maximize throughput, Kosh uses object pooling and batch commits (from builder/cache/cache.go:15-19):
var encodedPostPool = sync.Pool{
    New: func() interface{} {
        return make([]EncodedPost, 0, 64)
    },
}

Batch Commit Pattern

func (m *Manager) CommitBatch(posts []EncodedPost) error {
    return m.db.Update(func(tx *bolt.Tx) error {
        postsB := tx.Bucket([]byte(BucketPosts))
        pathsB := tx.Bucket([]byte(BucketPaths))
        searchB := tx.Bucket([]byte(BucketSearch))

        for _, p := range posts {
            if err := postsB.Put(p.PostID, p.Data); err != nil {
                return err
            }
            if err := pathsB.Put(p.Path, p.PostID); err != nil {
                return err
            }
            if err := searchB.Put(p.PostID, p.SearchData); err != nil {
                return err
            }
        }
        return nil
    })
}
Why batch? A BoltDB fsync costs roughly 5ms. Committing 100 posts in a single transaction replaces 100 fsyncs with one, saving ~500ms per batch.
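The pool's intended lifecycle (borrow, fill, commit once, reset, return) can be sketched without BoltDB. `withBatch` is a hypothetical helper for illustration; only the pool definition comes from the source above.

```go
package main

import (
	"fmt"
	"sync"
)

// EncodedPost is trimmed to two fields for the sketch.
type EncodedPost struct {
	PostID, Data []byte
}

// encodedPostPool mirrors the pool above: the backing array is reused
// across batches, so steady-state builds allocate no new slices.
var encodedPostPool = sync.Pool{
	New: func() interface{} { return make([]EncodedPost, 0, 64) },
}

// withBatch is a hypothetical helper showing the lifecycle:
// borrow, fill, commit once (one transaction -> one fsync), reset, return.
func withBatch(fill func([]EncodedPost) []EncodedPost, commit func([]EncodedPost) error) error {
	batch := encodedPostPool.Get().([]EncodedPost)
	batch = fill(batch)
	err := commit(batch)
	encodedPostPool.Put(batch[:0]) // reset length, keep capacity
	return err
}

func main() {
	var committed int
	_ = withBatch(
		func(b []EncodedPost) []EncodedPost {
			return append(b, EncodedPost{PostID: []byte("p1"), Data: []byte("...")})
		},
		func(b []EncodedPost) error { committed = len(b); return nil },
	)
	fmt.Println(committed)
}
```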

Content-Addressed Storage

Large HTML (>32KB) is stored in the filesystem by BLAKE3 hash:
const InlineHTMLThreshold = 32 * 1024  // 32KB

if len(html) < InlineHTMLThreshold {
    meta.InlineHTML = html  // Store in BoltDB
} else {
    hash := cache.HashContent(html)
    store.Write(hash, html)  // Write to .kosh-cache/store/
    meta.HTMLHash = hash
}
Directory structure:
.kosh-cache/store/
├── 3a/f2d8e1...  # First 2 chars = subdir
└── d4/1c9b7a...
This prevents BoltDB bloat and enables efficient deduplication.
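Deriving the sharded path is straightforward: the first two hex characters of the hash become the subdirectory, so no single directory accumulates every object. A minimal sketch, with sha256 standing in for BLAKE3 and `storePath` as an illustrative name:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// storePath maps content to a sharded location under root.
// Identical content always maps to the same path, which is what
// makes writes naturally deduplicating.
func storePath(root string, content []byte) string {
	sum := sha256.Sum256(content) // Kosh uses BLAKE3; sha256 keeps this self-contained
	h := hex.EncodeToString(sum[:])
	return filepath.Join(root, h[:2], h[2:])
}

func main() {
	fmt.Println(storePath(".kosh-cache/store", []byte("<html>...</html>")))
}
```

Since the path is a pure function of the content, writing the same HTML twice targets the same file, and a write can be skipped when the file already exists.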

Watch Mode

In dev mode, Kosh watches for changes and performs surgical rebuilds. From builder/run/incremental.go:60-95:
func (b *Builder) BuildChanged(ctx context.Context, changedPath string, op fsnotify.Op) {
    ext := filepath.Ext(changedPath) // used for the CSS/JS check below
    // Handle deletions
    if op&fsnotify.Remove == fsnotify.Remove {
        b.deletePostFromCache(changedPath)
        b.Build(ctx)  // Full rebuild to update indexes
        return
    }

    // Handle markdown changes - single post rebuild
    if strings.HasSuffix(changedPath, ".md") {
        b.buildSinglePost(ctx, changedPath)
        utils.SyncVFS(b.DestFs, b.cfg.OutputDir, b.renderService.GetRenderedFiles())
        return
    }

    // Handle CSS/JS changes - full rebuild to update asset hashes
    if ext == ".css" || ext == ".js" {
        b.Build(ctx)  // Asset hashes affect HTML
        return
    }

    // Handle template changes - invalidate affected posts
    if strings.HasSuffix(changedPath, ".html") {
        affectedPaths := b.invalidateForTemplate(changedPath)
        if affectedPaths == nil {
            b.Build(ctx)  // Layout.html changed
        } else {
            for _, path := range affectedPaths {
                b.buildSinglePost(ctx, path)
            }
        }
    }
}

Rebuild Strategies

Change type        Strategy                         Speed
-----------        --------                         -----
Single .md file    Re-render only that post         ~50ms
CSS/JS file        Full rebuild (asset hashes)      ~500ms
Template           Rebuild posts using template     ~200ms
layout.html        Full rebuild                     ~500ms
Config             Full rebuild                     ~500ms
Watch mode automatically debounces rapid changes to prevent rebuild storms. Only one build runs at a time.

Cache Management Commands

# Show cache statistics
kosh cache stats

# Verify cache integrity
kosh cache verify

# Run garbage collection (remove orphaned content)
kosh cache gc

# Dry run (show what would be deleted)
kosh cache gc --dry-run

# Force full rebuild (clear cache)
kosh cache rebuild

# Delete all cache data
kosh cache clear

# Inspect a specific file's cache entry
kosh cache inspect content/posts/my-post.md

Performance Benchmarks

100-post documentation site:

Operation        Cold (no cache)   Warm (cached)   Speedup
Full build       2.5s              250ms           10x
Single post      50ms              10ms            5x
Watch rebuild    500ms             50ms            10x

500-post blog:

Operation        Cold    Warm    Speedup
Full build       15s     800ms   18x
Single post      80ms    15ms    5x
Incremental builds shine during development. The first build after git clone is slow, but subsequent builds are nearly instant.

BoltDB Configuration

Kosh optimizes BoltDB for SSG workloads (from builder/cache/cache.go:50-79):
opts := &bolt.Options{
    Timeout:         10 * time.Second,
    FreelistType:    bolt.FreelistArrayType,  // Faster than map
    PageSize:        16384,                   // 16KB pages
    InitialMmapSize: calculatedSize,          // Pre-allocate based on existing DB
}

if isDev {
    opts.NoGrowSync = true  // Faster, less durable (okay for dev)
} else {
    opts.NoGrowSync = false // Production: full durability
}
Key optimizations:
  • Array freelist: Faster for SSG’s write-heavy workload
  • 16KB pages: better fit for multi-KB post records than the 4KB default
  • Dynamic mmap: Grows to 2x current DB size (max 100MB)
  • Dev mode: Skips fsync on grow operations

Cache ID Verification

Kosh stores a cache ID to detect configuration changes:
func (m *Manager) VerifyCacheID(expectedID string) (needsRebuild bool, err error) {
    var storedID []byte
    m.db.View(func(tx *bolt.Tx) error {
        meta := tx.Bucket([]byte(BucketMeta))
        storedID = meta.Get([]byte(KeyCacheID))
        return nil
    })

    if storedID == nil || string(storedID) != expectedID {
        return true, nil  // Rebuild needed
    }
    return false, nil
}
What triggers cache invalidation:
  • Theme change
  • Output directory change
  • Base URL change (if affecting rendered HTML)
  • Major version upgrade
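One way to derive such an ID is to hash the invalidating settings together, so any change to any of them produces a different ID. This is a hypothetical sketch of that idea, not Kosh's actual derivation; the field list comes from the bullets above, and sha256 stands in for the hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// cacheID hashes the settings that should invalidate the cache.
// A NUL separator prevents ambiguous concatenations like
// ("ab","c") vs ("a","bc").
func cacheID(theme, outputDir, baseURL, majorVersion string) string {
	sum := sha256.Sum256([]byte(strings.Join(
		[]string{theme, outputDir, baseURL, majorVersion}, "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := cacheID("default", "public", "https://example.com", "1")
	b := cacheID("dark", "public", "https://example.com", "1")
	fmt.Println(a != b) // any config change yields a new ID -> full rebuild
}
```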
