Kosh implements a compiler-grade incremental build system using BoltDB for metadata and a content-addressed file store for large artifacts. This enables fast rebuilds by only processing changed files.

Cache Architecture

Two-Tier Storage

  1. BoltDB (Metadata): Fast key-value lookups for post metadata, dependencies, and search indexes
  2. Content Store (Large Artifacts): Content-addressed storage for HTML, images, and SSR outputs
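To make the two-tier split concrete, here is a minimal sketch of the routing decision (the function name and the injected put callback are illustrative; the 32KB threshold comes from the PostMeta and content-store sections below):

```go
package main

import "fmt"

// inlineThreshold mirrors the 32KB cutoff described below: HTML smaller
// than this is embedded directly in the PostMeta record in BoltDB;
// anything larger goes to the content-addressed store and only its hash
// is kept in metadata.
const inlineThreshold = 32 * 1024

// storeHTML routes rendered HTML to the right tier. The put argument is
// a stand-in for the content store's Put and returns a content hash.
func storeHTML(html []byte, put func([]byte) (string, error)) (inline []byte, htmlHash string, err error) {
	if len(html) < inlineThreshold {
		return html, "", nil // tier 1: inline in BoltDB metadata
	}
	hash, err := put(html) // tier 2: content-addressed file store
	if err != nil {
		return nil, "", err
	}
	return nil, hash, nil
}

func main() {
	fakePut := func(b []byte) (string, error) { return "fakehash", nil }

	inline, hash, _ := storeHTML(make([]byte, 100), fakePut)
	fmt.Println(len(inline), hash) // small post: inlined, no store hash

	inline, hash, _ = storeHTML(make([]byte, 64*1024), fakePut)
	fmt.Println(len(inline), hash) // large post: stored, hash recorded
}
```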

BoltDB Schema

Kosh uses multiple buckets to organize cached data:
| Bucket    | Key             | Value                  | Purpose                                   |
|-----------|-----------------|------------------------|-------------------------------------------|
| posts     | PostID (BLAKE3) | PostMeta (msgpack)     | Post metadata and inline HTML             |
| paths     | Normalized path | PostID                 | Path → ID lookup                          |
| search    | PostID          | SearchRecord (msgpack) | BM25 data and tokenized content           |
| deps      | PostID          | Dependencies (msgpack) | Template and include dependencies         |
| tags      | Tag name        | []PostID               | Tag → posts index                         |
| templates | Template path   | []PostID               | Template → posts index                    |
| ssr       | Input hash      | SSRArtifact (msgpack)  | D2/KaTeX cached outputs                   |
| meta      | Key name        | Value                  | Build metadata (schema version, cache ID) |
All data structures are serialized with msgpack (roughly 30% smaller and 2.5x faster than Go's gob encoding).

Cache Data Structures

PostMeta

Stores all metadata about a cached post:
type PostMeta struct {
    PostID         string                 // BLAKE3 of UUID or path
    Path           string                 // Normalized relative path
    ModTime        int64                  // File modification timestamp
    ContentHash    string                 // BLAKE3 of frontmatter
    BodyHash       string                 // BLAKE3 of body content (v1.2.1)
    HTMLHash       string                 // BLAKE3 of HTML (for large posts)
    InlineHTML     []byte                 // HTML < 32KB stored inline
    TemplateHash   string                 // Template fingerprint
    SSRInputHashes []string               // D2/KaTeX input hashes
    Title          string
    Date           time.Time
    Tags           []string
    WordCount      int
    ReadingTime    int
    Description    string
    Link           string
    Weight         int
    Pinned         bool
    Draft          bool
    Meta           map[string]interface{}
    TOC            []models.TOCEntry
    Version        string
}
Key Innovation (v1.2.1): The BodyHash field was added to fix a critical bug where body-only changes were ignored:
// Always read source to compute body hash (CRITICAL for cache validity)
if info == nil {
    info, _ = s.sourceFs.Stat(path)
}
source, _ := afero.ReadFile(s.sourceFs, path)
bodyHash := utils.GetBodyHash(source)  // BLAKE3 of just the body

// Invalidate cache if body content changed (regardless of ModTime)
if exists && cachedMeta != nil && cachedMeta.BodyHash != "" && cachedMeta.BodyHash != bodyHash {
    exists = false
}
This ensures that even if only the markdown body changes (without frontmatter changes), the cache is properly invalidated.
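The interesting part of GetBodyHash is isolating the body from the frontmatter. Below is a sketch of that step using only the standard library; the delimiter convention (YAML frontmatter fenced by "---" lines) and the helper name extractBody are assumptions, and the real helper then hashes the returned body with BLAKE3 via HashContent:

```go
package main

import (
	"bytes"
	"fmt"
)

// extractBody strips a "---"-fenced frontmatter block, returning just
// the markdown body. If no frontmatter is present (or the fence is
// unterminated), the whole file is treated as the body.
func extractBody(source []byte) []byte {
	delim := []byte("---")
	if !bytes.HasPrefix(source, delim) {
		return source // no frontmatter: the whole file is the body
	}
	rest := source[len(delim):]
	// Locate the closing "---" at the start of a later line.
	i := bytes.Index(rest, []byte("\n---"))
	if i < 0 {
		return source // unterminated frontmatter: hash everything
	}
	body := rest[i+1+len(delim):]
	return bytes.TrimPrefix(body, []byte("\n"))
}

func main() {
	src := []byte("---\ntitle: Hello\n---\n# Body\n")
	fmt.Printf("%q\n", extractBody(src)) // "# Body\n"
}
```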

SearchRecord

Stores pre-computed search index data:
type SearchRecord struct {
    Title           string
    NormalizedTitle string         // Lowercase for case-insensitive search
    Tokens          []string
    BM25Data        map[string]int // Word frequency map
    DocLen          int            // Document length in tokens
    Content         string         // Plain text content
    NormalizedTags  []string       // Lowercase tags
    Words           []string       // Cached tokenized words
}
Pre-computed Fields: The cache stores normalized strings to avoid runtime ToLower() calls during search:
// Pre-compute normalized fields for search
normalizedTags := make([]string, len(post.Tags))
for i, t := range post.Tags {
    normalizedTags[i] = strings.ToLower(t)
}

searchRecord := models.PostRecord{
    Title:           post.Title,
    NormalizedTitle: strings.ToLower(post.Title),  // Pre-normalized
    NormalizedTags:  normalizedTags,                // Pre-normalized
    // ...
}

Dependencies

Tracks what each post depends on for incremental invalidation:
type Dependencies struct {
    Templates []string  // Template files used
    Includes  []string  // Included content files
    Tags      []string  // Tags (for tag page invalidation)
}
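These per-post records are what feed the reverse indexes in the templates and tags buckets. A sketch of how such an index can be derived (the function name buildTemplateIndex is illustrative):

```go
package main

import "fmt"

// Dependencies matches the struct above.
type Dependencies struct {
	Templates []string
	Includes  []string
	Tags      []string
}

// buildTemplateIndex derives a template path -> []PostID reverse index
// from per-post Dependencies, so a changed template maps directly to
// the set of posts that must be rebuilt.
func buildTemplateIndex(deps map[string]*Dependencies) map[string][]string {
	idx := make(map[string][]string)
	for postID, d := range deps {
		for _, tpl := range d.Templates {
			idx[tpl] = append(idx[tpl], postID)
		}
	}
	return idx
}

func main() {
	deps := map[string]*Dependencies{
		"post-a": {Templates: []string{"layout.html", "post.html"}},
		"post-b": {Templates: []string{"layout.html"}},
	}
	idx := buildTemplateIndex(deps)
	fmt.Println(len(idx["layout.html"]), len(idx["post.html"])) // 2 1
}
```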

BLAKE3 Hashing

Kosh uses BLAKE3 for all content hashing (replaced MD5 in Phase 1 for security).
import "github.com/zeebo/blake3"

// HashContent computes BLAKE3 hash of content and returns hex string
func HashContent(data []byte) string {
    hash := blake3.Sum256(data)
    return hex.EncodeToString(hash[:])
}

// HashString computes BLAKE3 hash of a string
func HashString(s string) string {
    return HashContent([]byte(s))
}
Benefits of BLAKE3:
  • Secure: Cryptographically strong (unlike MD5)
  • Fast: ~3x faster than SHA-256
  • Deterministic: Same input always produces same hash
  • Collision-resistant: Practically impossible to find collisions

Content-Addressed Storage

Large HTML content (≥32KB) is stored in a content-addressed file store:
type Store struct {
    basePath string
    encoder  *zstd.Encoder
    decoder  *zstd.Decoder
}

// Put stores content and returns its hash and compression type
func (s *Store) Put(category string, content []byte) (hash string, ct CompressionType, err error) {
    hash = HashContent(content)  // BLAKE3 hash
    ct = determineCompression(len(content))
    
    // Two-tier sharding: hash[0:2]/hash[2:4]/hash
    path := s.shardPath(category, hash) + extension(ct)
    
    // Check if already exists (content-addressed = deduplication)
    if _, err := os.Stat(path); err == nil {
        return hash, ct, nil  // Already stored
    }
    
    // Compress and write
    compressed := s.compress(content, ct)
    return hash, ct, os.WriteFile(path, compressed, 0644)
}

Sharding Strategy

Files are sharded into directories to avoid file system performance issues:
.kosh-cache/store/
├── html/
│   ├── a1/
│   │   ├── b2/
│   │   │   └── a1b2c3d4e5f6...zst  (BLAKE3 hash)
│   │   └── c3/
│   └── f7/
└── ssr/
    └── d2/
        └── a1/
            └── d2a1b2c3d4e5f6...raw  (full hash as filename)

Compression Strategy

| Size        | Compression  | Reason                             |
|-------------|--------------|------------------------------------|
| < 1KB       | None (.raw)  | Compression overhead > savings     |
| 1KB - 100KB | Zstd Fast    | Balanced speed/ratio               |
| > 100KB     | Zstd Level 3 | Better compression for large files |
Content-addressed storage provides automatic deduplication: if two posts generate identical HTML, only one copy is stored.
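The determineCompression call in Put can be sketched directly from the table; the type and constant names below are assumptions, only the thresholds come from the table:

```go
package main

import "fmt"

// CompressionType enumerates the size-based codec choices.
type CompressionType int

const (
	CompressionNone CompressionType = iota // stored with a .raw extension
	CompressionFast                        // zstd, fastest level
	CompressionHigh                        // zstd level 3
)

// determineCompression picks a codec from the content size alone.
func determineCompression(size int) CompressionType {
	switch {
	case size < 1024:
		return CompressionNone // compression overhead > savings
	case size <= 100*1024:
		return CompressionFast // balanced speed/ratio
	default:
		return CompressionHigh // better ratio for large files
	}
}

func main() {
	fmt.Println(determineCompression(512), determineCompression(50*1024), determineCompression(1<<20))
	// 0 1 2
}
```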

In-Memory LRU Cache

Added in v1.2.1, an in-memory cache keeps hot PostMeta entries with a TTL:
type memoryCacheEntry struct {
    meta      *PostMeta
    expiresAt time.Time
}

type Manager struct {
    // ...
    memCache    map[string]*memoryCacheEntry
    memCacheMu  sync.RWMutex
    memCacheTTL time.Duration  // Default: 5 minutes
}

func (m *Manager) memCacheGet(key string) *PostMeta {
    m.memCacheMu.RLock()
    entry, ok := m.memCache[key]
    m.memCacheMu.RUnlock()
    
    if !ok {
        return nil
    }
    
    if time.Now().After(entry.expiresAt) {
        m.memCacheMu.Lock()
        delete(m.memCache, key)
        m.memCacheMu.Unlock()
        return nil
    }
    
    return entry.meta
}
Performance Impact:
  • Reduces BoltDB reads for frequently accessed posts
  • Particularly effective during pagination rendering
  • 5-minute TTL balances freshness and hit rate

Generic Cache Reads

Kosh uses Go 1.18+ generics for type-safe cache retrieval:
func getCachedItem[T any](db *bolt.DB, bucketName string, key []byte) (*T, error) {
    var result *T
    err := db.View(func(tx *bolt.Tx) error {
        bucket := tx.Bucket([]byte(bucketName))
        if bucket == nil {
            return nil
        }
        data := bucket.Get(key)
        if data == nil {
            return nil
        }
        
        var item T
        if err := Decode(data, &item); err != nil {
            return err
        }
        result = &item
        return nil
    })
    return result, err
}

// Usage
post, err := getCachedItem[PostMeta](m.db, BucketPosts, []byte(postID))
Benefits:
  • Type safety: Compile-time type checking
  • Code reduction: Single implementation for all types
  • Performance: No runtime type assertions

Batch Operations

Cache writes are batched to minimize BoltDB transactions:
func (m *Manager) BatchCommit(
    posts []*PostMeta,
    records map[string]*SearchRecord,
    deps map[string]*Dependencies,
) error {
    // Encode all items before transaction
    encodedPosts := make([]EncodedPost, 0, len(posts))
    for _, post := range posts {
        data, _ := Encode(post)
        searchData, _ := Encode(records[post.PostID])
        depsData, _ := Encode(deps[post.PostID])
        
        encodedPosts = append(encodedPosts, EncodedPost{
            PostID:     []byte(post.PostID),
            Data:       data,
            Path:       []byte(post.Path),
            SearchData: searchData,
            DepsData:   depsData,
            Tags:       post.Tags,
        })
    }
    
    // Single write transaction
    return m.db.Update(func(tx *bolt.Tx) error {
        for _, enc := range encodedPosts {
            // Write to posts bucket
            // Write to paths bucket
            // Write to search bucket
            // Write to deps bucket
            // Write to tags bucket
        }
        return nil
    })
}
Batching reduces transaction overhead from O(n) to O(1), where n is the number of posts.

Cache Invalidation

Kosh invalidates cached posts when dependencies change:

1. File Modification

Compare ModTime from file system with cached value:
if info.ModTime().Unix() > cachedMeta.ModTime {
    exists = false  // Invalidate cache
}

2. Body Hash Mismatch

Compare BLAKE3 hash of body content:
bodyHash := utils.GetBodyHash(source)
if exists && cachedMeta.BodyHash != bodyHash {
    exists = false  // Invalidate cache
}

3. Template Changes

Invalidate all posts using a changed template:
func (b *Builder) invalidateForTemplate(templatePath string) []string {
    if b.cacheService == nil {
        return nil
    }
    
    affectedPosts, _ := b.cacheService.GetPostsByTemplate(templatePath)
    return affectedPosts  // Will be deleted and rebuilt
}

4. Global Dependency Changes

Force rebuild if any global dependency changed:
globalDependencies := []string{
    filepath.Join(cfg.TemplateDir, "layout.html"),
    filepath.Join(cfg.StaticDir, "css/layout.css"),
    "kosh.yaml",
}

for _, dep := range globalDependencies {
    // Guard against Stat errors (e.g. a missing file) before using info
    if info, err := os.Stat(dep); err == nil && info.ModTime().After(lastBuildTime) {
        shouldForce = true  // Rebuild everything
    }
}

Cache Garbage Collection

The kosh cache gc command removes orphaned entries:
func (m *Manager) GarbageCollect(dryRun bool) (*GCStats, error) {
    stats := &GCStats{}
    
    // 1. Find all referenced hashes
    referencedHashes := make(map[string]bool)
    _ = m.db.View(func(tx *bolt.Tx) error {
        posts := tx.Bucket([]byte(BucketPosts))
        return posts.ForEach(func(k, v []byte) error {
            var post PostMeta
            if err := Decode(v, &post); err == nil {
                if post.HTMLHash != "" {
                    referencedHashes[post.HTMLHash] = true
                }
                for _, h := range post.SSRInputHashes {
                    referencedHashes[h] = true
                }
            }
            return nil
        })
    })
    
    // 2. Scan content store and delete unreferenced files
    filepath.Walk(m.store.basePath, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err  // info is nil when Walk reports an error
        }
        if info.IsDir() {
            return nil
        }
        
        hash := extractHashFromPath(path)
        if !referencedHashes[hash] {
            stats.OrphanedFiles++
            stats.BytesReclaimed += info.Size()
            
            if !dryRun {
                os.Remove(path)
            }
        }
        return nil
    })
    
    return stats, nil
}

Cache Statistics

The kosh cache stats command displays cache metrics:
func (m *Manager) Stats() (*CacheStats, error) {
    stats := &CacheStats{}
    
    _ = m.db.View(func(tx *bolt.Tx) error {
        // Count posts
        posts := tx.Bucket([]byte(BucketPosts))
        stats.TotalPosts = posts.Stats().KeyN
        
        // Count SSR artifacts
        ssr := tx.Bucket([]byte(BucketSSR))
        stats.TotalSSR = ssr.Stats().KeyN
        
        // Get build count
        meta := tx.Bucket([]byte(BucketMeta))
        if data := meta.Get([]byte(KeyBuildCount)); data != nil {
            stats.BuildCount = int(binary.BigEndian.Uint32(data))
        }
        
        return nil
    })
    
    // Get content store size
    filepath.Walk(m.store.basePath, func(path string, info os.FileInfo, err error) error {
        if err == nil && !info.IsDir() {
            stats.StoreBytes += info.Size()
        }
        return nil
    })
    
    return stats, nil
}
