Kosh implements a compiler-grade incremental build system using BoltDB for metadata and a content-addressed file store for large artifacts. This enables fast rebuilds by processing only changed files.
## Cache Architecture

### Two-Tier Storage

- **BoltDB (Metadata):** Fast key-value lookups for post metadata, dependencies, and search indexes
- **Content Store (Large Artifacts):** Content-addressed storage for HTML, images, and SSR outputs
## BoltDB Schema

Kosh uses multiple buckets to organize cached data:

| Bucket | Key | Value | Purpose |
|---|---|---|---|
| `posts` | PostID (BLAKE3) | PostMeta (msgpack) | Post metadata and inline HTML |
| `paths` | Normalized path | PostID | Path → ID lookup |
| `search` | PostID | SearchRecord (msgpack) | BM25 data and tokenized content |
| `deps` | PostID | Dependencies (msgpack) | Template and include dependencies |
| `tags` | Tag name | []PostID | Tag → posts index |
| `templates` | Template path | []PostID | Template → posts index |
| `ssr` | Input hash | SSRArtifact (msgpack) | D2/KaTeX cached outputs |
| `meta` | Key name | Value | Build metadata (schema version, cache ID) |
All data structures use msgpack for efficient serialization (30% smaller and 2.5× faster than Go's gob encoding).
## Cache Data Structures

### PostMeta

Stores all metadata about a cached post:

```go
type PostMeta struct {
	PostID         string   // BLAKE3 of UUID or path
	Path           string   // Normalized relative path
	ModTime        int64    // File modification timestamp
	ContentHash    string   // BLAKE3 of frontmatter
	BodyHash       string   // BLAKE3 of body content (v1.2.1)
	HTMLHash       string   // BLAKE3 of HTML (for large posts)
	InlineHTML     []byte   // HTML < 32KB stored inline
	TemplateHash   string   // Template fingerprint
	SSRInputHashes []string // D2/KaTeX input hashes
	Title          string
	Date           time.Time
	Tags           []string
	WordCount      int
	ReadingTime    int
	Description    string
	Link           string
	Weight         int
	Pinned         bool
	Draft          bool
	Meta           map[string]interface{}
	TOC            []models.TOCEntry
	Version        string
}
```
**Key Innovation (v1.2.1):**

The `BodyHash` field was added to fix a critical bug where body-only changes were ignored:

```go
// Always read the source to compute the body hash (CRITICAL for cache validity)
if info == nil {
	info, _ = s.sourceFs.Stat(path)
}
source, _ := afero.ReadFile(s.sourceFs, path)
bodyHash := utils.GetBodyHash(source) // BLAKE3 of just the body

// Invalidate the cache if the body content changed (regardless of ModTime)
if exists && cachedMeta != nil && cachedMeta.BodyHash != "" && cachedMeta.BodyHash != bodyHash {
	exists = false
}
```

This ensures the cache is properly invalidated even when only the markdown body changes, without any frontmatter changes.
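The internals of `utils.GetBodyHash` are not shown above; a minimal sketch of the idea, using stdlib SHA-256 as a stand-in for BLAKE3 and assuming `---`-delimited YAML frontmatter, might look like this:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// getBodyHash hashes only the markdown body, skipping a leading
// "---\n...\n---\n" frontmatter block if one is present.
// (Hypothetical stand-in: the real GetBodyHash uses BLAKE3.)
func getBodyHash(source []byte) string {
	body := source
	if bytes.HasPrefix(source, []byte("---\n")) {
		// Find the closing frontmatter delimiter.
		if end := bytes.Index(source[4:], []byte("\n---\n")); end >= 0 {
			body = source[4+end+5:]
		}
	}
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := []byte("---\ntitle: Old\n---\n# Hello\n")
	b := []byte("---\ntitle: New\n---\n# Hello\n")
	c := []byte("---\ntitle: Old\n---\n# Changed\n")
	fmt.Println(getBodyHash(a) == getBodyHash(b)) // frontmatter-only change: same body hash
	fmt.Println(getBodyHash(a) == getBodyHash(c)) // body change: different body hash
}
```

Because the frontmatter is excluded, `ContentHash` and `BodyHash` can invalidate independently: metadata edits and body edits each hit their own check.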
### SearchRecord

Stores pre-computed search index data:

```go
type SearchRecord struct {
	Title           string
	NormalizedTitle string // Lowercase for case-insensitive search
	Tokens          []string
	BM25Data        map[string]int // Word frequency map
	DocLen          int            // Document length in tokens
	Content         string         // Plain text content
	NormalizedTags  []string       // Lowercase tags
	Words           []string       // Cached tokenized words
}
```
**Pre-computed Fields:**

The cache stores normalized strings to avoid runtime `strings.ToLower` calls during search:

```go
// Pre-compute normalized fields for search
normalizedTags := make([]string, len(post.Tags))
for i, t := range post.Tags {
	normalizedTags[i] = strings.ToLower(t)
}

searchRecord := models.PostRecord{
	Title:           post.Title,
	NormalizedTitle: strings.ToLower(post.Title), // Pre-normalized
	NormalizedTags:  normalizedTags,              // Pre-normalized
	// ...
}
```
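At query time, only the query itself needs normalizing. A self-contained sketch of that asymmetry (the type and method names here are illustrative, not Kosh's actual API):

```go
package main

import (
	"fmt"
	"strings"
)

// searchRecord mirrors the cached, pre-normalized fields
// (hypothetical stand-in for the real SearchRecord type).
type searchRecord struct {
	Title           string
	NormalizedTitle string
	NormalizedTags  []string
}

func newSearchRecord(title string, tags []string) searchRecord {
	norm := make([]string, len(tags))
	for i, t := range tags {
		norm[i] = strings.ToLower(t) // paid once, at cache time
	}
	return searchRecord{
		Title:           title,
		NormalizedTitle: strings.ToLower(title),
		NormalizedTags:  norm,
	}
}

// matches does a case-insensitive title/tag match with no
// per-query ToLower on the record side.
func (r searchRecord) matches(query string) bool {
	q := strings.ToLower(query) // only the query is normalized at search time
	if strings.Contains(r.NormalizedTitle, q) {
		return true
	}
	for _, t := range r.NormalizedTags {
		if strings.Contains(t, q) {
			return true
		}
	}
	return false
}

func main() {
	rec := newSearchRecord("Incremental Builds with BoltDB", []string{"Go", "Caching"})
	fmt.Println(rec.matches("boltdb"), rec.matches("CACHING"), rec.matches("rust"))
}
```

The per-record normalization cost moves from every search to a single cache write.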
### Dependencies

Tracks what each post depends on for incremental invalidation:

```go
type Dependencies struct {
	Templates []string // Template files used
	Includes  []string // Included content files
	Tags      []string // Tags (for tag page invalidation)
}
```
## BLAKE3 Hashing

Kosh uses BLAKE3 for all content hashing (it replaced MD5 in Phase 1 for security):

```go
import (
	"encoding/hex"

	"github.com/zeebo/blake3"
)

// HashContent computes the BLAKE3 hash of content and returns a hex string
func HashContent(data []byte) string {
	hash := blake3.Sum256(data)
	return hex.EncodeToString(hash[:])
}

// HashString computes the BLAKE3 hash of a string
func HashString(s string) string {
	return HashContent([]byte(s))
}
```
**Benefits of BLAKE3:**

- **Secure:** Cryptographically strong (unlike MD5)
- **Fast:** ~3× faster than SHA-256
- **Deterministic:** Same input always produces the same hash
- **Collision-resistant:** Practically impossible to find collisions
## Content-Addressed Storage

Large HTML content (≥ 32KB) is stored in a content-addressed file store:

```go
type Store struct {
	basePath string
	encoder  *zstd.Encoder
	decoder  *zstd.Decoder
}

// Put stores content and returns its hash and compression type
func (s *Store) Put(category string, content []byte) (hash string, ct CompressionType, err error) {
	hash = HashContent(content) // BLAKE3 hash
	ct = determineCompression(len(content))

	// Two-tier sharding: hash[0:2]/hash[2:4]/hash
	path := s.shardPath(category, hash) + extension(ct)

	// Check if the file already exists (content-addressed = deduplication)
	if _, err := os.Stat(path); err == nil {
		return hash, ct, nil // Already stored
	}

	// Compress and write
	compressed := s.compress(content, ct)
	return hash, ct, os.WriteFile(path, compressed, 0644)
}
```
### Sharding Strategy

Files are sharded into directories to avoid file system performance issues:

```text
.kosh-cache/store/
├── html/
│   ├── a1/
│   │   ├── b2/
│   │   │   └── a1b2c3d4e5f6...zst   (BLAKE3 hash)
│   │   └── c3/
│   └── f7/
└── ssr/
    └── d2/
        └── a1/
            └── b2c3d4e5f6...raw
```
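The `shardPath` helper used by `Put` is not shown above; under the stated `hash[0:2]/hash[2:4]/hash` scheme, a sketch could be as simple as:

```go
package main

import (
	"fmt"
	"path"
)

// shardPath builds the two-tier sharded location for a content hash:
// <category>/<hash[0:2]>/<hash[2:4]>/<hash><ext>.
// (Hypothetical stand-in for the store's real method, which also
// prefixes the cache base directory.)
func shardPath(category, hash, ext string) string {
	return path.Join(category, hash[0:2], hash[2:4], hash+ext)
}

func main() {
	fmt.Println(shardPath("html", "a1b2c3d4e5f6", ".zst"))
}
```

With two hex-byte tiers, each category fans out over up to 256 × 256 directories, keeping any single directory's entry count small.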
### Compression Strategy

| Size | Compression | Reason |
|---|---|---|
| < 1KB | None (.raw) | Compression overhead > savings |
| 1KB - 100KB | Zstd Fast | Balanced speed/ratio |
| > 100KB | Zstd Level 3 | Better compression for large files |
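The tier selection in `determineCompression` follows directly from this table. A sketch with the thresholds above (the constant names are assumptions, not Kosh's real identifiers):

```go
package main

import "fmt"

type compressionType int

const (
	compressNone compressionType = iota // stored as .raw
	compressZstdFast
	compressZstdLevel3
)

// determineCompression picks a tier from the size thresholds in the
// table above (hypothetical sketch of the helper used by Put).
func determineCompression(size int) compressionType {
	switch {
	case size < 1024:
		return compressNone // compression overhead would exceed savings
	case size <= 100*1024:
		return compressZstdFast // balanced speed/ratio
	default:
		return compressZstdLevel3 // better ratio for large files
	}
}

func main() {
	fmt.Println(determineCompression(512), determineCompression(4096), determineCompression(1<<20))
}
```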
Content-addressed storage provides automatic deduplication: if two posts generate identical HTML, only one copy is stored.
## In-Memory LRU Cache

Since v1.2.1, hot PostMeta entries are cached in memory with a TTL:

```go
type memoryCacheEntry struct {
	meta      *PostMeta
	expiresAt time.Time
}

type Manager struct {
	// ...
	memCache    map[string]*memoryCacheEntry
	memCacheMu  sync.RWMutex
	memCacheTTL time.Duration // Default: 5 minutes
}

func (m *Manager) memCacheGet(key string) *PostMeta {
	m.memCacheMu.RLock()
	entry, ok := m.memCache[key]
	m.memCacheMu.RUnlock()
	if !ok {
		return nil
	}
	if time.Now().After(entry.expiresAt) {
		// Expired: drop the entry so the next read falls through to BoltDB
		m.memCacheMu.Lock()
		delete(m.memCache, key)
		m.memCacheMu.Unlock()
		return nil
	}
	return entry.meta
}
```

**Performance Impact:**

- Reduces BoltDB reads for frequently accessed posts
- Particularly effective during pagination rendering
- The 5-minute TTL balances freshness and hit rate
## Generic Cache Reads

Kosh uses Go 1.18+ generics for type-safe cache retrieval:

```go
func getCachedItem[T any](db *bolt.DB, bucketName string, key []byte) (*T, error) {
	var result *T
	err := db.View(func(tx *bolt.Tx) error {
		bucket := tx.Bucket([]byte(bucketName))
		if bucket == nil {
			return nil
		}
		data := bucket.Get(key)
		if data == nil {
			return nil
		}
		var item T
		if err := Decode(data, &item); err != nil {
			return err
		}
		result = &item
		return nil
	})
	return result, err
}

// Usage
post, err := getCachedItem[PostMeta](m.db, BucketPosts, []byte(postID))
```

**Benefits:**

- **Type safety:** Compile-time type checking
- **Code reduction:** A single implementation for all types
- **Performance:** No runtime type assertions
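The same pattern works without BoltDB; here is a runnable sketch backed by a plain map, with JSON standing in for msgpack, to show how the type parameter fixes the decode target at compile time:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// getCached decodes a stored value into T, mirroring the generic BoltDB
// helper above but backed by a map (hypothetical stand-in: JSON here,
// msgpack in Kosh).
func getCached[T any](store map[string][]byte, key string) (*T, error) {
	data, ok := store[key]
	if !ok {
		return nil, nil // absent key: nil result, nil error
	}
	var item T
	if err := json.Unmarshal(data, &item); err != nil {
		return nil, err
	}
	return &item, nil
}

type postMeta struct {
	Title string
	Tags  []string
}

func main() {
	store := map[string][]byte{}
	store["p1"], _ = json.Marshal(postMeta{Title: "Hello", Tags: []string{"go"}})

	// The instantiation getCached[postMeta] is checked at compile time;
	// no interface{} round-trip or runtime type assertion is needed.
	post, err := getCached[postMeta](store, "p1")
	fmt.Println(post.Title, err)
}
```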
## Batch Operations

Cache writes are batched to minimize BoltDB transactions:

```go
func (m *Manager) BatchCommit(
	posts []*PostMeta,
	records map[string]*SearchRecord,
	deps map[string]*Dependencies,
) error {
	// Encode all items before opening the transaction
	encodedPosts := make([]EncodedPost, 0, len(posts))
	for _, post := range posts {
		data, _ := Encode(post)
		searchData, _ := Encode(records[post.PostID])
		depsData, _ := Encode(deps[post.PostID])
		encodedPosts = append(encodedPosts, EncodedPost{
			PostID:     []byte(post.PostID),
			Data:       data,
			Path:       []byte(post.Path),
			SearchData: searchData,
			DepsData:   depsData,
			Tags:       post.Tags,
		})
	}

	// Single write transaction
	return m.db.Update(func(tx *bolt.Tx) error {
		for _, enc := range encodedPosts {
			// Write to the posts bucket
			// Write to the paths bucket
			// Write to the search bucket
			// Write to the deps bucket
			// Write to the tags bucket
		}
		return nil
	})
}
```

Batching reduces the number of write transactions from n (one per post) to one, turning O(n) transaction overhead into O(1).
## Cache Invalidation

Kosh invalidates cached posts when dependencies change:

### 1. File Modification

Compare the file system ModTime with the cached value:

```go
if info.ModTime().Unix() > cachedMeta.ModTime {
	exists = false // Invalidate cache
}
```
### 2. Body Hash Mismatch

Compare the BLAKE3 hash of the body content:

```go
bodyHash := utils.GetBodyHash(source)
if exists && cachedMeta.BodyHash != bodyHash {
	exists = false // Invalidate cache
}
```
### 3. Template Changes

Invalidate all posts using a changed template:

```go
func (b *Builder) invalidateForTemplate(templatePath string) []string {
	if b.cacheService == nil {
		return nil
	}
	affectedPosts, _ := b.cacheService.GetPostsByTemplate(templatePath)
	return affectedPosts // These posts will be deleted and rebuilt
}
```
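This lookup is served by the `templates` bucket from the schema table, a reverse index from template path to post IDs. A map-backed sketch of that index (names hypothetical):

```go
package main

import "fmt"

// templateIndex is a hypothetical stand-in for the templates bucket:
// template path → IDs of posts rendered with it.
type templateIndex map[string][]string

// add records that postID was rendered with templatePath.
func (idx templateIndex) add(templatePath, postID string) {
	idx[templatePath] = append(idx[templatePath], postID)
}

// invalidateForTemplate returns every post that must be rebuilt when
// the given template changes.
func (idx templateIndex) invalidateForTemplate(templatePath string) []string {
	return idx[templatePath]
}

func main() {
	idx := templateIndex{}
	idx.add("post.html", "p1")
	idx.add("post.html", "p2")
	idx.add("page.html", "p3")
	fmt.Println(idx.invalidateForTemplate("post.html"))
}
```

Maintaining the reverse index at write time makes template invalidation O(affected posts) instead of a scan over every cached post.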
### 4. Global Dependency Changes

Force a full rebuild if any global dependency changed:

```go
globalDependencies := []string{
	filepath.Join(cfg.TemplateDir, "layout.html"),
	filepath.Join(cfg.StaticDir, "css/layout.css"),
	"kosh.yaml",
}
for _, dep := range globalDependencies {
	// Check the Stat error so a missing dependency doesn't cause a nil dereference
	if info, err := os.Stat(dep); err == nil && info.ModTime().After(lastBuildTime) {
		shouldForce = true // Rebuild everything
	}
}
```
## Cache Garbage Collection

The `kosh cache gc` command removes orphaned entries:

```go
func (m *Manager) GarbageCollect(dryRun bool) (*GCStats, error) {
	stats := &GCStats{}

	// 1. Find all referenced hashes
	referencedHashes := make(map[string]bool)
	_ = m.db.View(func(tx *bolt.Tx) error {
		posts := tx.Bucket([]byte(BucketPosts))
		return posts.ForEach(func(k, v []byte) error {
			var post PostMeta
			if err := Decode(v, &post); err == nil {
				if post.HTMLHash != "" {
					referencedHashes[post.HTMLHash] = true
				}
				for _, h := range post.SSRInputHashes {
					referencedHashes[h] = true
				}
			}
			return nil
		})
	})

	// 2. Scan the content store and delete unreferenced files
	err := filepath.Walk(m.store.basePath, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err // don't dereference a nil FileInfo on walk errors
		}
		if info.IsDir() {
			return nil
		}
		hash := extractHashFromPath(path)
		if !referencedHashes[hash] {
			stats.OrphanedFiles++
			stats.BytesReclaimed += info.Size()
			if !dryRun {
				os.Remove(path)
			}
		}
		return nil
	})
	return stats, err
}
```
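`extractHashFromPath` is named but not shown above. Since content-addressed files are stored under their hash plus a compression extension, a plausible sketch (an assumption about the real helper, not its actual implementation) is:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extractHashFromPath recovers the content hash from a sharded store
// path such as "store/html/a1/b2/a1b2c3d4.zst" by taking the base
// name and stripping the compression extension (.zst or .raw).
// Hypothetical sketch of the helper used by GarbageCollect.
func extractHashFromPath(path string) string {
	base := filepath.Base(path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

func main() {
	fmt.Println(extractHashFromPath("store/html/a1/b2/a1b2c3d4.zst"))
	fmt.Println(extractHashFromPath("store/ssr/d2/a1/b2c3d4.raw"))
}
```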
## Cache Statistics

The `kosh cache stats` command displays cache metrics:

```go
func (m *Manager) Stats() (*CacheStats, error) {
	stats := &CacheStats{}
	_ = m.db.View(func(tx *bolt.Tx) error {
		// Count posts
		posts := tx.Bucket([]byte(BucketPosts))
		stats.TotalPosts = posts.Stats().KeyN

		// Count SSR artifacts
		ssr := tx.Bucket([]byte(BucketSSR))
		stats.TotalSSR = ssr.Stats().KeyN

		// Get the build count
		meta := tx.Bucket([]byte(BucketMeta))
		if data := meta.Get([]byte(KeyBuildCount)); data != nil {
			stats.BuildCount = int(binary.BigEndian.Uint32(data))
		}
		return nil
	})

	// Sum the content store's size on disk
	err := filepath.Walk(m.store.basePath, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err // don't dereference a nil FileInfo on walk errors
		}
		if !info.IsDir() {
			stats.StoreBytes += info.Size()
		}
		return nil
	})
	return stats, err
}
```
## Next Steps