Kosh’s build pipeline is a carefully orchestrated sequence of stages designed for performance, correctness, and incremental builds.

Pipeline Overview

Stage 1: Setup & Validation

Build Lock Acquisition

Guards against concurrent builds that could corrupt the output. The lock is advisory: if it cannot be acquired, the build logs a warning and continues:
func (b *Builder) Build(ctx context.Context) error {
    // Acquire build lock to prevent concurrent builds
    buildLock, lockErr := utils.AcquireBuildLock(b.cfg.OutputDir)
    if lockErr != nil {
        b.logger.Warn("Could not acquire build lock - another build may be in progress", "error", lockErr)
        // Continue anyway - warn but don't block
    } else {
        defer func() { _ = buildLock.Release() }()
    }
    
    // ...
}

WASM Update Check

Runs in parallel to avoid blocking the critical path:
var setupWg sync.WaitGroup
setupWg.Add(1)
go func() {
    defer setupWg.Done()
    select {
    case <-ctx.Done():
        return
    default:
        b.checkWasmUpdate()  // Compare hash of search engine source
    }
}()

Cache Invalidation

Detects changes to global dependencies:
globalDependencies := []string{
    filepath.Join(cfg.TemplateDir, "layout.html"),
    filepath.Join(cfg.TemplateDir, "index.html"),
    filepath.Join(cfg.StaticDir, "css/layout.css"),
    "kosh.yaml",
}

var affectedPosts []string
for _, dep := range globalDependencies {
    if info, err := os.Stat(dep); err == nil && info.ModTime().After(lastBuildTime) {
        affected := b.invalidateForTemplate(dep)
        if affected != nil {
            affectedPosts = append(affectedPosts, affected...)
        } else {
            shouldForce = true  // Force rebuild all
        }
    }
}
If a template changes, only the posts that use it are rebuilt. If a dependency cannot be mapped to specific posts (invalidateForTemplate returns nil, as for kosh.yaml), all posts are rebuilt.
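To make the nil-versus-list contract concrete, here is a small sketch of a dependency index. The templateIndex type and plan function are hypothetical; Kosh's real bookkeeping is not shown in this page. Unlike the loop above, this sketch also de-duplicates posts and stops at the first untracked dependency.

```go
package main

import "fmt"

// templateIndex maps a template path to the posts rendered with it.
// Hypothetical structure used only for this illustration.
type templateIndex map[string][]string

// invalidateForTemplate returns the posts that depend on dep, or nil when
// dep is not a tracked template, which signals a full rebuild.
func (idx templateIndex) invalidateForTemplate(dep string) []string {
	posts, ok := idx[dep]
	if !ok {
		return nil
	}
	return posts
}

// plan decides between a targeted rebuild and a forced full rebuild.
func plan(idx templateIndex, changed []string) (affected []string, force bool) {
	seen := make(map[string]bool)
	for _, dep := range changed {
		posts := idx.invalidateForTemplate(dep)
		if posts == nil {
			return nil, true // untracked dependency: rebuild everything
		}
		for _, p := range posts {
			if !seen[p] {
				seen[p] = true
				affected = append(affected, p)
			}
		}
	}
	return affected, false
}

func main() {
	idx := templateIndex{
		"templates/layout.html": {"a.md", "b.md"},
		"templates/index.html":  {"b.md"},
	}
	fmt.Println(plan(idx, []string{"templates/layout.html", "templates/index.html"}))
	fmt.Println(plan(idx, []string{"kosh.yaml"}))
}
```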

Stage 2: Build Static Assets

Critical Order: Assets MUST complete before post rendering.
fmt.Println("📦 Building assets...")
b.copyStaticAndBuildAssets(ctx)

Why Assets Come First

Templates reference assets using the Assets map:
<!-- Template uses hashed filenames -->
<link rel="stylesheet" href="{{ index .Assets "/static/css/layout.css" }}">
<!-- Rendered as: /static/css/layout-a1b2c3d4.css -->
The Assets map is populated during asset processing:
func (s *assetServiceImpl) Build(ctx context.Context) error {
    var wg sync.WaitGroup
    wg.Add(2)
    
    // 1. Static Copy (images, fonts, etc.)
    go func() {
        defer wg.Done()
        utils.CopyDirVFS(s.sourceFs, s.destFs, s.cfg.StaticDir, destStaticDir, ...)
    }()
    
    // 2. CSS/JS Bundling with esbuild
    go func() {
        defer wg.Done()
        assets := make(map[string]string)
        
        // Bundle CSS
        cssHash := bundleCSS()
        assets["/static/css/layout.css"] = "/static/css/layout-" + cssHash + ".css"
        
        // Bundle JS
        jsHash := bundleJS()
        assets["/static/js/app.js"] = "/static/js/app-" + jsHash + ".js"
        
        // Inject into render service
        s.renderer.SetAssets(assets)
    }()
    
    wg.Wait()  // MUST wait before posts can render
    return nil
}
In v1.2.1, the asset stage was changed to run synchronously before post rendering instead of in parallel with it. This fixed a race condition that caused CSS 404 errors on post pages.

Stage 3: Process Markdown Posts

The most complex stage, handling parsing, rendering, and indexing.

Fast Path: Cache Rehydration

If only templates changed (not content), rehydrate from cache:
isTemplateOnly := false  // Detect template-only changes
if isTemplateOnly && cachedCount > 0 {
    fmt.Println("📝 Rehydrating from cache...")
    b.renderCachedPosts()
    
    // Hydrate data for global pages from cache
    ids, _ := b.cacheService.ListAllPosts()
    cachedPosts, _ := b.cacheService.GetPostsByIDs(ids)  // Batch fetch
    searchRecords, _ := b.cacheService.GetSearchRecords(ids)
    
    for _, cached := range cachedPosts {
        post := models.PostMetadata{
            Title: cached.Title,
            Link:  cached.Link,
            // ... reconstruct from cache ...
        }
        allPosts = append(allPosts, post)
    }
}
Fast Path Performance:
  • No markdown parsing
  • No HTML rendering
  • No SSR (D2/KaTeX)
  • Only template application

Slow Path: Full Processing

If content changed, parse and render:
fmt.Println("📝 Processing content...")
result, err := b.postService.Process(ctx, shouldForce, forceSocialRebuild, outputMissing)
allPosts = result.AllPosts
indexedPosts = result.IndexedPosts

Post Processing Pipeline

Worker Pool Parallelization

numWorkers := utils.GetDefaultWorkerCount()  // CPU count

parsePool := utils.NewWorkerPool(ctx, numWorkers, func(task PostTask) {
    // 1. Check cache
    cachedMeta, _ := s.cache.GetPostByPath(task.path)
    
    // 2. Validate cache with body hash
    source, _ := afero.ReadFile(s.sourceFs, task.path)
    bodyHash := utils.GetBodyHash(source)
    
    if cachedMeta != nil && cachedMeta.BodyHash == bodyHash {
        // Use cache
        useCache = true
    } else {
        // Parse markdown
        docNode := s.md.Parser().Parse(text.NewReader(source))
        
        // Render HTML
        buf := utils.SharedBufferPool.Get()
        defer utils.SharedBufferPool.Put(buf)
        s.md.Renderer().Render(buf, source, docNode)
        htmlContent := buf.String()
        
        // SSR processing
        htmlContent = mdParser.ReplaceD2BlocksWithThemeSupport(htmlContent, d2Pairs)
        htmlContent, mathHashes = mdParser.RenderMathForHTML(htmlContent, s.nativeRenderer, ...)
        
        // Extract metadata
        metaData := meta.Get(ctx)
        toc := mdParser.GetTOC(ctx)
        
        // Tokenize for search
        words := search.DefaultAnalyzer.Analyze(plainText)
        wordFreqs := make(map[string]int)
        for _, w := range words {
            wordFreqs[w]++
        }
    }
    
    // Store in cache
    newMeta := &cache.PostMeta{...}
    s.cache.StoreHTMLForPost(newMeta, []byte(htmlContent))
})

parsePool.Start()
for _, file := range files {
    parsePool.Submit(PostTask{path: file})
}
parsePool.Stop()  // Wait for all workers

Social Card Generation

Runs in parallel with post processing:
cardPool := utils.NewWorkerPool(ctx, numWorkers, func(task socialCardTask) {
    s.generateSocialCard(task)
})
cardPool.Start()

// Submit cards as posts are processed
for _, post := range posts {
    if needsSocialCard(post) {
        cardPool.Submit(socialCardTask{...})
    }
}

cardPool.Stop()  // Wait for all cards
Social card generation uses headless Chrome via Playwright to render Open Graph preview images.

Stage 4: Render Global Pages

Generate site-wide pages using aggregated post metadata.

Pagination

func (b *Builder) renderPagination(allPosts, pinnedPosts []models.PostMetadata, shouldForce bool) {
    perPage := 10
    totalPages := (len(allPosts) + perPage - 1) / perPage
    
    for page := 1; page <= totalPages; page++ {
        start := (page - 1) * perPage
        end := min(start+perPage, len(allPosts))
        
        data := models.PageData{
            Title:       b.cfg.Title,
            Posts:       allPosts[start:end],
            PinnedPosts: pinnedPosts,  // Show on first page only
            CurrentPage: page,
            TotalPages:  totalPages,
            // ...
        }
        
        if page == 1 {
            b.renderService.RenderIndex(filepath.Join(b.cfg.OutputDir, "index.html"), data)
        } else {
            path := filepath.Join(b.cfg.OutputDir, fmt.Sprintf("page-%d.html", page))
            b.renderService.RenderIndex(path, data)
        }
    }
}
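The slice arithmetic in renderPagination is easy to check in isolation. The sketch below reproduces the same ceiling division and bounds clamping; paginate and pageRange are illustrative names, not part of Kosh.

```go
package main

import "fmt"

// pageRange describes the half-open slice [Start:End) shown on a page.
type pageRange struct{ Page, Start, End int }

// paginate splits total items into pages of perPage items each.
func paginate(total, perPage int) []pageRange {
	if total <= 0 || perPage <= 0 {
		return nil
	}
	totalPages := (total + perPage - 1) / perPage // ceiling division
	ranges := make([]pageRange, 0, totalPages)
	for page := 1; page <= totalPages; page++ {
		start := (page - 1) * perPage
		end := start + perPage
		if end > total {
			end = total // last page may be short
		}
		ranges = append(ranges, pageRange{page, start, end})
	}
	return ranges
}

func main() {
	for _, r := range paginate(25, 10) {
		fmt.Printf("page %d: posts[%d:%d]\n", r.Page, r.Start, r.End)
	}
	// page 1: posts[0:10]
	// page 2: posts[10:20]
	// page 3: posts[20:25]
}
```

Note that with zero posts the page count is zero, so the loop in renderPagination as shown would not render index.html at all; a site with no posts may need a separate code path.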

Tag Pages

func (b *Builder) renderTags(tagMap map[string][]models.PostMetadata, forceSocialRebuild bool) {
    for tag, posts := range tagMap {
        utils.SortPosts(posts)  // Sort by date
        
        data := models.PageData{
            Title: "Posts tagged " + tag,
            Posts: posts,
            // ...
        }
        
        tagPath := filepath.Join(b.cfg.OutputDir, "tags", tag+".html")
        b.renderService.RenderPage(tagPath, data)
    }
}

Metadata Generation

Runs in parallel for optimal performance:
func (b *Builder) generateMetadata(allContent []models.PostMetadata, tagMap map[string][]models.PostMetadata, indexedPosts []models.IndexedPost, shouldForce bool) {
    var genWg sync.WaitGroup
    
    if b.cfg.Features.Generators.Sitemap {
        genWg.Add(1)
        go func() {
            defer genWg.Done()
            generators.GenerateSitemap(b.DestFs, b.cfg.BaseURL, allContent, tagMap, ...)
        }()
    }
    
    if b.cfg.Features.Generators.RSS {
        genWg.Add(1)
        go func() {
            defer genWg.Done()
            generators.GenerateRSS(b.DestFs, b.cfg.BaseURL, allContent, ...)
        }()
    }
    
    if b.cfg.Features.Generators.Search {
        genWg.Add(1)
        go func() {
            defer genWg.Done()
            generators.GenerateSearchIndex(b.DestFs, b.cfg.OutputDir, indexedPosts)
        }()
    }
    
    if b.cfg.Features.Generators.Graph {
        genWg.Add(1)
        go func() {
            defer genWg.Done()
            generators.GenerateGraph(b.DestFs, b.cfg.BaseURL, allContent, ...)
        }()
    }
    
    genWg.Wait()  // Wait for all metadata generation
}
Metadata generators run in parallel because they’re independent and I/O-bound.

Stage 5: PWA Generation

Generates Progressive Web App assets (only in production builds):
if cfg.Features.Generators.PWA {
    setupWg.Add(1)
    go func() {
        defer setupWg.Done()
        select {
        case <-ctx.Done():
            return
        default:
            fmt.Println("📱 Generating PWA...")
            b.generatePWA(shouldForce)
        }
    }()
}
PWA Components:
  • manifest.json: App metadata
  • service-worker.js: Offline caching
  • Icon set: 192x192, 512x512 (generated from favicon)
Dev mode (kosh serve --dev) skips PWA generation for faster rebuilds.

Stage 6: Sync VFS to Disk

Flush the in-memory file system to disk using parallel workers:
fmt.Println("💾 Syncing to disk...")
if err := utils.SyncVFS(b.DestFs, b.cfg.OutputDir, b.renderService.GetRenderedFiles()); err != nil {
    b.logger.Error("Failed to sync VFS to disk", "error", err)
}
b.renderService.ClearRenderedFiles()

Parallel Sync Implementation

func SyncVFS(vfs afero.Fs, destDir string, renderedFiles map[string]bool) error {
    numWorkers := runtime.NumCPU()
    
    var files []string
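    // Propagate walk errors; info is nil whenever err is non-nil.
    if err := afero.Walk(vfs, "/", func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        if !info.IsDir() {
            files = append(files, path)
        }
        return nil
    }); err != nil {
        return err
    }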
    
    pool := NewWorkerPool(context.Background(), numWorkers, func(vfsPath string) {
        diskPath := filepath.Join(destDir, vfsPath)
        
        // Read from VFS
        data, _ := afero.ReadFile(vfs, vfsPath)
        
        // Write to disk
        os.MkdirAll(filepath.Dir(diskPath), 0755)
        os.WriteFile(diskPath, data, 0644)
    })
    
    pool.Start()
    for _, file := range files {
        pool.Submit(file)
    }
    pool.Stop()
    
    return nil
}
Why VFS?
  • Atomic builds (all-or-nothing)
  • Fast file operations (memory speed)
  • Incremental sync (only changed files written)
  • Crash-safe (disk never in partial state)

Stage 7: Cleanup & Metrics

Save Caches

defer b.SaveCaches()

func (b *Builder) SaveCaches() {
    // Flush diagram adapter to BoltDB
    if b.diagramAdapter != nil {
        _ = b.diagramAdapter.Close()
    }
    
    // Increment build count
    if b.cacheService != nil {
        _ = b.cacheService.IncrementBuildCount()
    }
    
    // Record end time and print metrics
    b.metrics.RecordEnd()
    if !b.cfg.IsDev {
        b.metrics.Print()  // "📊 Built N posts in Xs (cache: H/M hits, P%)"
    }
}

Close Resources

defer b.Close()

func (b *Builder) Close() {
    if b.cacheService != nil {
        _ = b.cacheService.Close()  // Close BoltDB
    }
}

Context Cancellation

All stages respect context cancellation for graceful shutdown:
select {
case <-ctx.Done():
    b.logger.Info("Build cancelled", "reason", ctx.Err())
    return ctx.Err()
default:
    // Continue building
}
Signal Handling:
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)

go func() {
    <-sigChan
    fmt.Println("\n🛑 Shutdown signal received, cleaning up...")
    cancel()  // Trigger context cancellation
    
    time.Sleep(5 * time.Second)  // Grace period
    os.Exit(1)
}()

Performance Optimizations

Critical Path Ordering

| Stage | Blocks | Reason |
| --- | --- | --- |
| Assets | Posts | Templates need Assets map |
| Posts | Global | Global pages need post metadata |
| Global | PWA | PWA needs all pages for offline caching |
| PWA | Sync | Sync needs all generated files |

Parallel Opportunities

| Operation | Parallelization |
| --- | --- |
| Post parsing | Worker pool (CPU cores) |
| Social cards | Worker pool (I/O bound) |
| Asset copying | Worker pool (I/O bound) |
| Metadata generation | Goroutines (independent) |
| VFS sync | Worker pool (I/O bound) |

Cache Hit Optimization

Optimize for the common case (no changes):
  1. Fast path detection: Check if only templates changed
  2. Batch reads: GetPostsByIDs() instead of N × GetPost()
  3. In-memory LRU: Cache hot PostMeta for pagination
  4. Pre-computed fields: Store normalized strings in cache
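The in-memory LRU mentioned in item 3 can be sketched with container/list and a map. This is a generic, non-thread-safe illustration with string values; Kosh's cache stores PostMeta and its actual implementation is not shown here.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element holds.
type entry struct{ key, value string }

// lru is a fixed-capacity cache that evicts the least recently used key.
// It is not safe for concurrent use without external locking.
type lru struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func newLRU(capacity int) *lru {
	if capacity < 1 {
		capacity = 1
	}
	return &lru{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks the key as recently used.
func (c *lru) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(entry).value, true
}

// Put inserts or updates a key, evicting the oldest entry when full.
func (c *lru) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value = entry{key, value}
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(entry{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(entry).key)
	}
}

func main() {
	cache := newLRU(2)
	cache.Put("post-a", "meta A")
	cache.Put("post-b", "meta B")
	cache.Get("post-a")           // touch post-a so post-b becomes the oldest
	cache.Put("post-c", "meta C") // evicts post-b
	_, ok := cache.Get("post-b")
	fmt.Println("post-b cached:", ok) // post-b cached: false
}
```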
