Kosh implements a compiler-grade incremental build system using BoltDB for metadata and a content-addressed file store for large artifacts. This enables fast rebuilds by processing only changed files.
## Cache Architecture

### Two-Tier Storage

- **BoltDB (Metadata):** Fast key-value lookups for post metadata, dependencies, and search indexes
- **Content Store (Large Artifacts):** Content-addressed storage for HTML, images, and SSR outputs
## BoltDB Schema

Kosh uses multiple buckets to organize cached data:

| Bucket | Key | Value | Purpose |
|---|---|---|---|
| `posts` | PostID (BLAKE3) | PostMeta (msgpack) | Post metadata and inline HTML |
| `paths` | Normalized path | PostID | Path → ID lookup |
| `search` | PostID | SearchRecord (msgpack) | BM25 data and tokenized content |
| `deps` | PostID | Dependencies (msgpack) | Template and include dependencies |
| `tags` | Tag name | []PostID | Tag → posts index |
| `templates` | Template path | []PostID | Template → posts index |
| `ssr` | Input hash | SSRArtifact (msgpack) | D2/KaTeX cached outputs |
| `meta` | Key name | Value | Build metadata (schema version, cache ID) |
All data structures use msgpack for efficient serialization (30% smaller and 2.5× faster than Go's gob encoding).
## Cache Data Structures

### PostMeta

Stores all metadata about a cached post:

```go
type PostMeta struct {
	PostID         string   // BLAKE3 of UUID or path
	Path           string   // Normalized relative path
	ModTime        int64    // File modification timestamp
	ContentHash    string   // BLAKE3 of frontmatter
	BodyHash       string   // BLAKE3 of body content (v1.2.1)
	HTMLHash       string   // BLAKE3 of HTML (for large posts)
	InlineHTML     []byte   // HTML < 32KB stored inline
	TemplateHash   string   // Template fingerprint
	SSRInputHashes []string // D2/KaTeX input hashes
	Title          string
	Date           time.Time
	Tags           []string
	WordCount      int
	ReadingTime    int
	Description    string
	Link           string
	Weight         int
	Pinned         bool
	Draft          bool
	Meta           map[string]interface{}
	TOC            []models.TOCEntry
	Version        string
}
```
**Key Innovation (v1.2.1):**

The `BodyHash` field was added to fix a critical bug where body-only changes were ignored:

```go
// Always read the source to compute the body hash (CRITICAL for cache validity)
if info == nil {
	info, _ = s.sourceFs.Stat(path)
}
source, _ := afero.ReadFile(s.sourceFs, path)
bodyHash := utils.GetBodyHash(source) // BLAKE3 of just the body

// Invalidate the cache if the body content changed (regardless of ModTime)
if exists && cachedMeta != nil && cachedMeta.BodyHash != "" && cachedMeta.BodyHash != bodyHash {
	exists = false
}
```

This ensures the cache is properly invalidated even when only the markdown body changes, without any frontmatter changes.
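The internals of `utils.GetBodyHash` are not shown above; a minimal sketch of the idea, using stdlib SHA-256 as a stand-in for BLAKE3 and assuming `---`-delimited YAML frontmatter, might look like this:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// getBodyHash hashes only the markdown body, skipping a leading
// "---\n...\n---\n" frontmatter block if one is present.
// (Hypothetical stand-in: the real GetBodyHash uses BLAKE3.)
func getBodyHash(source []byte) string {
	body := source
	if bytes.HasPrefix(source, []byte("---\n")) {
		// Find the closing frontmatter delimiter.
		if end := bytes.Index(source[4:], []byte("\n---\n")); end >= 0 {
			body = source[4+end+5:]
		}
	}
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := []byte("---\ntitle: Old\n---\n# Hello\n")
	b := []byte("---\ntitle: New\n---\n# Hello\n")
	c := []byte("---\ntitle: Old\n---\n# Changed\n")
	fmt.Println(getBodyHash(a) == getBodyHash(b)) // frontmatter-only change: same body hash
	fmt.Println(getBodyHash(a) == getBodyHash(c)) // body change: different body hash
}
```

Because the frontmatter is excluded, `ContentHash` and `BodyHash` can invalidate independently: metadata edits and body edits each hit their own check.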
### SearchRecord

Stores pre-computed search index data:

```go
type SearchRecord struct {
	Title           string
	NormalizedTitle string // Lowercase for case-insensitive search
	Tokens          []string
	BM25Data        map[string]int // Word frequency map
	DocLen          int            // Document length in tokens
	Content         string         // Plain text content
	NormalizedTags  []string       // Lowercase tags
	Words           []string       // Cached tokenized words
}
```
**Pre-computed Fields:**

The cache stores normalized strings to avoid runtime `strings.ToLower` calls during search:

```go
// Pre-compute normalized fields for search
normalizedTags := make([]string, len(post.Tags))
for i, t := range post.Tags {
	normalizedTags[i] = strings.ToLower(t)
}

searchRecord := models.PostRecord{
	Title:           post.Title,
	NormalizedTitle: strings.ToLower(post.Title), // Pre-normalized
	NormalizedTags:  normalizedTags,              // Pre-normalized
	// ...
}
```
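At query time, only the query itself needs normalizing. A self-contained sketch of that asymmetry (the type and method names here are illustrative, not Kosh's actual API):

```go
package main

import (
	"fmt"
	"strings"
)

// searchRecord mirrors the cached, pre-normalized fields
// (hypothetical stand-in for the real SearchRecord type).
type searchRecord struct {
	Title           string
	NormalizedTitle string
	NormalizedTags  []string
}

func newSearchRecord(title string, tags []string) searchRecord {
	norm := make([]string, len(tags))
	for i, t := range tags {
		norm[i] = strings.ToLower(t) // paid once, at cache time
	}
	return searchRecord{
		Title:           title,
		NormalizedTitle: strings.ToLower(title),
		NormalizedTags:  norm,
	}
}

// matches does a case-insensitive title/tag match with no
// per-query ToLower on the record side.
func (r searchRecord) matches(query string) bool {
	q := strings.ToLower(query) // only the query is normalized at search time
	if strings.Contains(r.NormalizedTitle, q) {
		return true
	}
	for _, t := range r.NormalizedTags {
		if strings.Contains(t, q) {
			return true
		}
	}
	return false
}

func main() {
	rec := newSearchRecord("Incremental Builds with BoltDB", []string{"Go", "Caching"})
	fmt.Println(rec.matches("boltdb"), rec.matches("CACHING"), rec.matches("rust"))
}
```

The per-record normalization cost moves from every search to a single cache write.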
### Dependencies

Tracks what each post depends on for incremental invalidation:

```go
type Dependencies struct {
	Templates []string // Template files used
	Includes  []string // Included content files
	Tags      []string // Tags (for tag page invalidation)
}
```
## BLAKE3 Hashing

Kosh uses BLAKE3 for all content hashing (it replaced MD5 in Phase 1 for security):

```go
import (
	"encoding/hex"

	"github.com/zeebo/blake3"
)

// HashContent computes the BLAKE3 hash of content and returns a hex string
func HashContent(data []byte) string {
	hash := blake3.Sum256(data)
	return hex.EncodeToString(hash[:])
}

// HashString computes the BLAKE3 hash of a string
func HashString(s string) string {
	return HashContent([]byte(s))
}
```
**Benefits of BLAKE3:**

- **Secure:** Cryptographically strong (unlike MD5)
- **Fast:** ~3× faster than SHA-256
- **Deterministic:** Same input always produces the same hash
- **Collision-resistant:** Practically impossible to find collisions
## Content-Addressed Storage

Large HTML content (≥ 32KB) is stored in a content-addressed file store:

```go
type Store struct {
	basePath string
	encoder  *zstd.Encoder
	decoder  *zstd.Decoder
}

// Put stores content and returns its hash and compression type
func (s *Store) Put(category string, content []byte) (hash string, ct CompressionType, err error) {
	hash = HashContent(content) // BLAKE3 hash
	ct = determineCompression(len(content))

	// Two-tier sharding: hash[0:2]/hash[2:4]/hash
	path := s.shardPath(category, hash) + extension(ct)

	// Check if the file already exists (content-addressed = deduplication)
	if _, err := os.Stat(path); err == nil {
		return hash, ct, nil // Already stored
	}

	// Compress and write
	compressed := s.compress(content, ct)
	return hash, ct, os.WriteFile(path, compressed, 0644)
}
```
### Sharding Strategy

Files are sharded into directories to avoid file system performance issues:

```text
.kosh-cache/store/
├── html/
│   ├── a1/
│   │   ├── b2/
│   │   │   └── a1b2c3d4e5f6...zst   (BLAKE3 hash)
│   │   └── c3/
│   └── f7/
└── ssr/
    └── d2/
        └── a1/
            └── b2c3d4e5f6...raw
```
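The `shardPath` helper used by `Put` is not shown above; under the stated `hash[0:2]/hash[2:4]/hash` scheme, a sketch could be as simple as:

```go
package main

import (
	"fmt"
	"path"
)

// shardPath builds the two-tier sharded location for a content hash:
// <category>/<hash[0:2]>/<hash[2:4]>/<hash><ext>.
// (Hypothetical stand-in for the store's real method, which also
// prefixes the cache base directory.)
func shardPath(category, hash, ext string) string {
	return path.Join(category, hash[0:2], hash[2:4], hash+ext)
}

func main() {
	fmt.Println(shardPath("html", "a1b2c3d4e5f6", ".zst"))
}
```

With two hex-byte tiers, each category fans out over up to 256 × 256 directories, keeping any single directory's entry count small.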
### Compression Strategy

| Size | Compression | Reason |
|---|---|---|
| < 1KB | None (.raw) | Compression overhead > savings |
| 1KB - 100KB | Zstd Fast | Balanced speed/ratio |
| > 100KB | Zstd Level 3 | Better compression for large files |
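The tier selection in `determineCompression` follows directly from this table. A sketch with the thresholds above (the constant names are assumptions, not Kosh's real identifiers):

```go
package main

import "fmt"

type compressionType int

const (
	compressNone compressionType = iota // stored as .raw
	compressZstdFast
	compressZstdLevel3
)

// determineCompression picks a tier from the size thresholds in the
// table above (hypothetical sketch of the helper used by Put).
func determineCompression(size int) compressionType {
	switch {
	case size < 1024:
		return compressNone // compression overhead would exceed savings
	case size <= 100*1024:
		return compressZstdFast // balanced speed/ratio
	default:
		return compressZstdLevel3 // better ratio for large files
	}
}

func main() {
	fmt.Println(determineCompression(512), determineCompression(4096), determineCompression(1<<20))
}
```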
Content-addressed storage provides automatic deduplication: if two posts generate identical HTML, only one copy is stored.
## In-Memory LRU Cache

Since v1.2.1, hot PostMeta entries are cached in memory with a TTL:

```go
type memoryCacheEntry struct {
	meta      *PostMeta
	expiresAt time.Time
}

type Manager struct {
	// ...
	memCache    map[string]*memoryCacheEntry
	memCacheMu  sync.RWMutex
	memCacheTTL time.Duration // Default: 5 minutes
}

func (m *Manager) memCacheGet(key string) *PostMeta {
	m.memCacheMu.RLock()
	entry, ok := m.memCache[key]
	m.memCacheMu.RUnlock()
	if !ok {
		return nil
	}
	if time.Now().After(entry.expiresAt) {
		// Expired: drop the entry so the next read falls through to BoltDB
		m.memCacheMu.Lock()
		delete(m.memCache, key)
		m.memCacheMu.Unlock()
		return nil
	}
	return entry.meta
}
```

**Performance Impact:**

- Reduces BoltDB reads for frequently accessed posts
- Particularly effective during pagination rendering
- The 5-minute TTL balances freshness and hit rate
## Generic Cache Reads

Kosh uses Go 1.18+ generics for type-safe cache retrieval:

```go
func getCachedItem[T any](db *bolt.DB, bucketName string, key []byte) (*T, error) {
	var result *T
	err := db.View(func(tx *bolt.Tx) error {
		bucket := tx.Bucket([]byte(bucketName))
		if bucket == nil {
			return nil
		}
		data := bucket.Get(key)
		if data == nil {
			return nil
		}
		var item T
		if err := Decode(data, &item); err != nil {
			return err
		}
		result = &item
		return nil
	})
	return result, err
}

// Usage
post, err := getCachedItem[PostMeta](m.db, BucketPosts, []byte(postID))
```

**Benefits:**

- **Type safety:** Compile-time type checking
- **Code reduction:** A single implementation for all types
- **Performance:** No runtime type assertions
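The same pattern works without BoltDB; here is a runnable sketch backed by a plain map, with JSON standing in for msgpack, to show how the type parameter fixes the decode target at compile time:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// getCached decodes a stored value into T, mirroring the generic BoltDB
// helper above but backed by a map (hypothetical stand-in: JSON here,
// msgpack in Kosh).
func getCached[T any](store map[string][]byte, key string) (*T, error) {
	data, ok := store[key]
	if !ok {
		return nil, nil // absent key: nil result, nil error
	}
	var item T
	if err := json.Unmarshal(data, &item); err != nil {
		return nil, err
	}
	return &item, nil
}

type postMeta struct {
	Title string
	Tags  []string
}

func main() {
	store := map[string][]byte{}
	store["p1"], _ = json.Marshal(postMeta{Title: "Hello", Tags: []string{"go"}})

	// The instantiation getCached[postMeta] is checked at compile time;
	// no interface{} round-trip or runtime type assertion is needed.
	post, err := getCached[postMeta](store, "p1")
	fmt.Println(post.Title, err)
}
```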
## Batch Operations

Cache writes are batched to minimize BoltDB transactions:

```go
func (m *Manager) BatchCommit(
	posts []*PostMeta,
	records map[string]*SearchRecord,
	deps map[string]*Dependencies,
) error {
	// Encode all items before opening the transaction
	encodedPosts := make([]EncodedPost, 0, len(posts))
	for _, post := range posts {
		data, _ := Encode(post)
		searchData, _ := Encode(records[post.PostID])
		depsData, _ := Encode(deps[post.PostID])
		encodedPosts = append(encodedPosts, EncodedPost{
			PostID:     []byte(post.PostID),
			Data:       data,
			Path:       []byte(post.Path),
			SearchData: searchData,
			DepsData:   depsData,
			Tags:       post.Tags,
		})
	}

	// Single write transaction
	return m.db.Update(func(tx *bolt.Tx) error {
		for _, enc := range encodedPosts {
			// Write to the posts bucket
			// Write to the paths bucket
			// Write to the search bucket
			// Write to the deps bucket
			// Write to the tags bucket
		}
		return nil
	})
}
```

Batching reduces the number of write transactions from n (one per post) to one, turning O(n) transaction overhead into O(1).
## Cache Invalidation

Kosh invalidates cached posts when dependencies change:

### 1. File Modification

Compare the file system ModTime with the cached value:

```go
if info.ModTime().Unix() > cachedMeta.ModTime {
	exists = false // Invalidate cache
}
```
### 2. Body Hash Mismatch

Compare the BLAKE3 hash of the body content:

```go
bodyHash := utils.GetBodyHash(source)
if exists && cachedMeta.BodyHash != bodyHash {
	exists = false // Invalidate cache
}
```
### 3. Template Changes

Invalidate all posts using a changed template:

```go
func (b *Builder) invalidateForTemplate(templatePath string) []string {
	if b.cacheService == nil {
		return nil
	}
	affectedPosts, _ := b.cacheService.GetPostsByTemplate(templatePath)
	return affectedPosts // These posts will be deleted and rebuilt
}
```
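This lookup is served by the `templates` bucket from the schema table, a reverse index from template path to post IDs. A map-backed sketch of that index (names hypothetical):

```go
package main

import "fmt"

// templateIndex is a hypothetical stand-in for the templates bucket:
// template path → IDs of posts rendered with it.
type templateIndex map[string][]string

// add records that postID was rendered with templatePath.
func (idx templateIndex) add(templatePath, postID string) {
	idx[templatePath] = append(idx[templatePath], postID)
}

// invalidateForTemplate returns every post that must be rebuilt when
// the given template changes.
func (idx templateIndex) invalidateForTemplate(templatePath string) []string {
	return idx[templatePath]
}

func main() {
	idx := templateIndex{}
	idx.add("post.html", "p1")
	idx.add("post.html", "p2")
	idx.add("page.html", "p3")
	fmt.Println(idx.invalidateForTemplate("post.html"))
}
```

Maintaining the reverse index at write time makes template invalidation O(affected posts) instead of a scan over every cached post.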
### 4. Global Dependency Changes

Force a full rebuild if any global dependency changed:

```go
globalDependencies := []string{
	filepath.Join(cfg.TemplateDir, "layout.html"),
	filepath.Join(cfg.StaticDir, "css/layout.css"),
	"kosh.yaml",
}
for _, dep := range globalDependencies {
	// Check the Stat error so a missing dependency doesn't cause a nil dereference
	if info, err := os.Stat(dep); err == nil && info.ModTime().After(lastBuildTime) {
		shouldForce = true // Rebuild everything
	}
}
```
## Cache Garbage Collection

The `kosh cache gc` command removes orphaned entries:

```go
func (m *Manager) GarbageCollect(dryRun bool) (*GCStats, error) {
	stats := &GCStats{}

	// 1. Find all referenced hashes
	referencedHashes := make(map[string]bool)
	_ = m.db.View(func(tx *bolt.Tx) error {
		posts := tx.Bucket([]byte(BucketPosts))
		return posts.ForEach(func(k, v []byte) error {
			var post PostMeta
			if err := Decode(v, &post); err == nil {
				if post.HTMLHash != "" {
					referencedHashes[post.HTMLHash] = true
				}
				for _, h := range post.SSRInputHashes {
					referencedHashes[h] = true
				}
			}
			return nil
		})
	})

	// 2. Scan the content store and delete unreferenced files
	err := filepath.Walk(m.store.basePath, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err // don't dereference a nil FileInfo on walk errors
		}
		if info.IsDir() {
			return nil
		}
		hash := extractHashFromPath(path)
		if !referencedHashes[hash] {
			stats.OrphanedFiles++
			stats.BytesReclaimed += info.Size()
			if !dryRun {
				os.Remove(path)
			}
		}
		return nil
	})
	return stats, err
}
```
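`extractHashFromPath` is named but not shown above. Since content-addressed files are stored under their hash plus a compression extension, a plausible sketch (an assumption about the real helper, not its actual implementation) is:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extractHashFromPath recovers the content hash from a sharded store
// path such as "store/html/a1/b2/a1b2c3d4.zst" by taking the base
// name and stripping the compression extension (.zst or .raw).
// Hypothetical sketch of the helper used by GarbageCollect.
func extractHashFromPath(path string) string {
	base := filepath.Base(path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

func main() {
	fmt.Println(extractHashFromPath("store/html/a1/b2/a1b2c3d4.zst"))
	fmt.Println(extractHashFromPath("store/ssr/d2/a1/b2c3d4.raw"))
}
```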
## Cache Statistics

The `kosh cache stats` command displays cache metrics:

```go
func (m *Manager) Stats() (*CacheStats, error) {
	stats := &CacheStats{}
	_ = m.db.View(func(tx *bolt.Tx) error {
		// Count posts
		posts := tx.Bucket([]byte(BucketPosts))
		stats.TotalPosts = posts.Stats().KeyN

		// Count SSR artifacts
		ssr := tx.Bucket([]byte(BucketSSR))
		stats.TotalSSR = ssr.Stats().KeyN

		// Get the build count
		meta := tx.Bucket([]byte(BucketMeta))
		if data := meta.Get([]byte(KeyBuildCount)); data != nil {
			stats.BuildCount = int(binary.BigEndian.Uint32(data))
		}
		return nil
	})

	// Sum the content store's size on disk
	err := filepath.Walk(m.store.basePath, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err // don't dereference a nil FileInfo on walk errors
		}
		if !info.IsDir() {
			stats.StoreBytes += info.Size()
		}
		return nil
	})
	return stats, err
}
```
## Next Steps