
System Architecture

doc-kit uses a dependency-driven orchestration system that executes documentation generators in the correct order, with support for parallel processing and streaming results.

Core Components

The architecture consists of three main layers:
  1. Generator Orchestration (src/generators.mjs) - Manages dependency chains and execution order
  2. Worker Thread Pool (src/threading/) - Distributes work across CPU cores
  3. Streaming System (src/streaming.mjs) - Handles async generators and result collection

Dependency Chain

Generators can depend on the output of other generators using the dependsOn property:
// Example from src/generators/index.mjs
const allGenerators = {
  'legacy-json': {
    dependsOn: undefined, // Root generator
    generate: async (input) => { /* ... */ },
  },
  'json-simple': {
    dependsOn: 'legacy-json', // Depends on legacy-json output
    generate: async (legacyJsonData) => { /* ... */ },
  },
};

Execution Order

The system automatically resolves dependencies and executes generators in the correct order:
  1. Scheduling Phase - Recursively schedule generators and their dependencies
  2. Execution Phase - Execute generators once dependencies are resolved
  3. Collection Phase - Collect results from streaming generators
// From src/generators.mjs:51-62
const scheduleGenerator = async (generatorName, configuration) => {
  if (generatorName in cachedGenerators) {
    return; // Already scheduled
  }

  const { dependsOn, generate, hasParallelProcessor } =
    allGenerators[generatorName];

  // Schedule dependency first
  if (dependsOn && !(dependsOn in cachedGenerators)) {
    await scheduleGenerator(dependsOn, configuration);
  }

  // Then schedule this generator
  cachedGenerators[generatorName] = (async () => {
    const dependencyInput = await getDependencyInput(dependsOn);
    return await generate(dependencyInput);
  })();
};
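The scheduling pattern above can be exercised end to end with a minimal, self-contained sketch. The generator table and its outputs here are hypothetical stand-ins, not the real doc-kit generators, but the recursion and promise cache follow the same shape:

```javascript
// Minimal sketch of recursive scheduling with a two-generator chain
// (hypothetical generators; not the real doc-kit table).
const allGenerators = {
  'legacy-json': {
    dependsOn: undefined,
    generate: async () => ['a', 'b'],
  },
  'json-simple': {
    dependsOn: 'legacy-json',
    generate: async (input) => input.length,
  },
};

const cachedGenerators = {};
const executionOrder = [];

const scheduleGenerator = async (generatorName) => {
  if (generatorName in cachedGenerators) {
    return; // Already scheduled
  }

  const { dependsOn, generate } = allGenerators[generatorName];

  // Schedule the dependency first so its promise exists in the cache
  if (dependsOn && !(dependsOn in cachedGenerators)) {
    await scheduleGenerator(dependsOn);
  }

  cachedGenerators[generatorName] = (async () => {
    const input = dependsOn ? await cachedGenerators[dependsOn] : undefined;
    executionOrder.push(generatorName);
    return generate(input);
  })();
};

await scheduleGenerator('json-simple');
const result = await cachedGenerators['json-simple'];
```

Requesting only `json-simple` still runs `legacy-json` first, because scheduling recurses into the dependency before the dependent generator's promise is created.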

Generator Cache

The orchestration system maintains a cache of generator promises to:
  • Prevent duplicate execution - Each generator runs only once per pipeline
  • Enable parallel collection - Multiple consumers can await the same generator
  • Support streaming - Async generators are collected once and shared
// From src/generators.mjs:18
const cachedGenerators = {};

// Generators are stored as promises, e.g.:
//   cachedGenerators['legacy-json']  -> Promise<JSONData>
//   cachedGenerators['json-simple']  -> AsyncGenerator<SimpleJSON[]>
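Caching the *promise* (rather than the resolved value) is what makes the first two properties work. A short sketch, using a hypothetical `expensiveGenerate` in place of a real generator:

```javascript
// Sketch: caching the generator promise means the work runs exactly once,
// even when several consumers await it (hypothetical expensiveGenerate).
const cachedGenerators = {};
let runs = 0;

const expensiveGenerate = async () => {
  runs += 1;
  return { data: 'parsed' };
};

// First request starts the work and caches the in-flight promise
cachedGenerators['legacy-json'] ??= expensiveGenerate();

// Later consumers reuse the same promise and get the same result object
const [first, second] = await Promise.all([
  cachedGenerators['legacy-json'],
  cachedGenerators['legacy-json'],
]);
```

Both consumers resolve to the same object, and `expensiveGenerate` runs only once.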

Streaming vs Non-Streaming

Non-Streaming Generators

Return a promise that resolves to the complete result:
{
  generate: async (input) => {
    const result = await processAllData(input);
    return result; // Complete result
  }
}

Streaming Generators

Return an async generator that yields chunks of results:
{
  hasParallelProcessor: true, // Enables streaming
  generate: async function* (input, worker) {
    // Process in parallel, yield chunks as they complete
    for await (const chunk of worker.stream(items, input, extra)) {
      yield chunk;
    }
  }
}
Streaming generators require hasParallelProcessor: true to receive a worker instance.
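The practical difference for consumers is how results are collected: a non-streaming generator is simply awaited, while a streaming generator is drained with `for await...of`. A self-contained sketch (the chunk producer is hypothetical; in doc-kit the chunks come back from worker threads):

```javascript
// Sketch: an async generator that yields chunks, and a consumer that
// drains it into one flat result (hypothetical chunk producer).
async function* streamingGenerate(items, chunkSize) {
  for (let i = 0; i < items.length; i += chunkSize) {
    // Each yield hands one processed chunk to the consumer immediately
    yield items.slice(i, i + chunkSize).map((item) => item.toUpperCase());
  }
}

const collected = [];
for await (const chunk of streamingGenerate(['a', 'b', 'c'], 2)) {
  collected.push(...chunk);
}
```

Because chunks are yielded as they are produced, the consumer never holds more than one unprocessed chunk plus the accumulated output in memory.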

Parallel Processing

Generators with hasParallelProcessor: true receive a parallel worker instance:
// From src/generators.mjs:76-78
const worker = hasParallelProcessor
  ? createParallelWorker(generatorName, pool, configuration)
  : Promise.resolve(null);

const result = await generate(dependencyInput, await worker);
The worker distributes items across the thread pool and streams results back as chunks complete.
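Conceptually, the worker's streaming interface splits the item list into chunks, processes them concurrently, and yields each chunk's result. A simplified in-process stand-in (hypothetical names; the real pool runs chunks on separate threads, and this sketch yields in submission order rather than completion order):

```javascript
// Sketch: distribute items as chunks, process them concurrently, and
// yield results chunk by chunk (in-process stand-in for the thread pool).
async function* streamChunks(items, chunkSize, processChunk) {
  const pending = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    // Start every chunk immediately so they run concurrently
    pending.push(processChunk(items.slice(i, i + chunkSize)));
  }
  // Yield each chunk's result as its promise settles
  for (const promise of pending) {
    yield await promise;
  }
}

const chunks = [];
const timesTen = async (chunk) => chunk.map((n) => n * 10);
for await (const chunk of streamChunks([1, 2, 3, 4], 2, timesTen)) {
  chunks.push(chunk);
}
```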

Complete Pipeline Flow

Step-by-Step Execution

  1. Initialize - Create worker pool with specified thread count
  2. Schedule - Recursively schedule all requested generators and dependencies
  3. Execute - Start generator execution (dependencies first)
  4. Stream - For parallel generators, distribute work across threads
  5. Collect - Gather all results, draining streaming generators
  6. Cleanup - Destroy worker pool and return results
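The phases above can be condensed into one sketch, with a stubbed worker pool standing in for the real Piscina-backed one (all names here are hypothetical simplifications):

```javascript
// End-to-end sketch of the pipeline phases, with a stubbed worker pool
// (hypothetical shapes; the real pool is Piscina-backed).
const createWorkerPool = (threads) => ({
  threads,
  destroy() {}, // real pools release their threads here
});

const runPipeline = async (configuration) => {
  const pool = createWorkerPool(configuration.threads); // 1. Initialize
  try {
    const results = {};
    for (const name of configuration.target) {          // 2. Schedule
      results[name] = await configuration.generators[name](); // 3-5. Execute + collect
    }
    return results;
  } finally {
    pool.destroy();                                     // 6. Cleanup, even on error
  }
};

const output = await runPipeline({
  threads: 2,
  target: ['json-simple'],
  generators: { 'json-simple': async () => [1, 2, 3] },
});
```

Putting cleanup in a `finally` block ensures the pool is destroyed even when a generator throws.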

Configuration Flow

Configuration is passed through the entire pipeline:
// From src/generators.mjs:98-104
const runGenerators = async configuration => {
  const { target: generators, threads } = configuration;

  // Create worker pool with thread count
  pool = createWorkerPool(threads);

  // Pass configuration to all generators
  for (const name of generators) {
    await scheduleGenerator(name, configuration);
  }
};
Each generator receives:
  • Global settings - threads, chunkSize, input, output
  • Generator-specific config - From configuration[generatorName]
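One way to picture that split is a resolver that merges the global settings with the generator's own section of the configuration. The shape below is an illustrative assumption, not the exact doc-kit implementation:

```javascript
// Sketch: merging global settings with a generator's own config section
// (hypothetical shape, following the split described above).
const configuration = {
  threads: 4,
  chunkSize: 100,
  input: 'doc/api',
  output: 'out',
  'json-simple': { minify: true }, // generator-specific section
};

const resolveGeneratorConfig = (configuration, generatorName) => {
  const { threads, chunkSize, input, output } = configuration;
  // Generator-specific keys are layered on top of the global settings
  return { threads, chunkSize, input, output, ...configuration[generatorName] };
};

const resolved = resolveGeneratorConfig(configuration, 'json-simple');
```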

Performance Characteristics

Memory Efficiency

  • Streaming generators - Process and yield chunks incrementally
  • Worker isolation - Each thread has its own memory space
  • Shared cache - Results are cached to avoid recomputation

CPU Utilization

  • Parallel processing - Work distributed across available CPU cores
  • Async execution - Non-blocking I/O operations
  • Lazy collection - Streaming generators only collected when needed

Optimal Configuration

import os from 'node:os';

{
  threads: os.cpus().length, // Use all available cores
  chunkSize: 100,            // Balance per-chunk overhead against parallelism
  target: ['json-simple'],   // Only run the generators you need
}

Error Handling

The system handles errors at multiple levels:
  1. Generator errors - Caught and logged by orchestration system
  2. Worker errors - Piscina handles thread crashes and restarts
  3. Dependency errors - Propagate to dependent generators
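Dependency errors propagate naturally through the promise cache: a dependent generator that awaits a rejected dependency promise rejects in turn. A small sketch with a hypothetical failing root generator:

```javascript
// Sketch: a root generator's rejection propagates to every dependent
// that awaits its cached promise (hypothetical failing generator).
const cachedGenerators = {};

cachedGenerators['legacy-json'] = (async () => {
  throw new Error('parse failed');
})();

cachedGenerators['json-simple'] = (async () => {
  // Awaiting the rejected promise rethrows the dependency's error here
  const input = await cachedGenerators['legacy-json'];
  return input;
})();

let caught = null;
try {
  await cachedGenerators['json-simple'];
} catch (error) {
  caught = error.message;
}
```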

Next Steps

Worker Threads

Deep dive into the worker thread implementation

Streaming

Learn about async generators and streaming architecture
