
Overview

nrvna-ai is an asynchronous inference primitive - a directory-based job queue for LLM inference using llama.cpp. Jobs are represented as filesystem directories that move through states via atomic renames.
The filesystem becomes the state machine. A job’s location is its state - no database needed.

Core Architecture

The system follows a clean separation of concerns with distinct components handling specific responsibilities:

Client Layer

Work and Flow classes provide the public API for job submission and result retrieval

Server Layer

Server orchestrates all components and manages the lifecycle

Discovery Layer

Scanner finds jobs waiting in the queue

Execution Layer

Pool, Processor, and Runner handle concurrent job execution

Component Responsibilities

Work

Location: work.hpp/cpp

The client-facing API for submitting jobs:
  • Creates job directories in staging area (input/writing/)
  • Writes prompt to prompt.txt
  • Atomically moves job to ready queue (input/ready/)
  • Returns job identifier to client
[[nodiscard]] SubmitResult submit(const std::string& prompt);
static JobId generateId() noexcept;
static bool isValidPrompt(const std::string&) noexcept;
Flow

Location: flow.hpp/cpp

The client-facing API for querying job status and results:
  • Checks job state by directory location
  • Retrieves results from output/ directory
  • Retrieves error messages from failed/ directory
[[nodiscard]] Status status(const JobId& id) const noexcept;
[[nodiscard]] std::optional<Job> get(const JobId& id) const noexcept;
Server

Location: server.hpp/cpp

Manages the entire system lifecycle:
  • Initializes workspace directory structure
  • Creates and manages Scanner and Pool
  • Runs scan loop in dedicated thread
  • Handles graceful shutdown
  • Recovers orphaned jobs on startup
Server(const std::string& modelPath, 
       const std::filesystem::path& workspace, 
       int workers = 4);
[[nodiscard]] bool start();
void shutdown() noexcept;
Scanner

Location: scanner.hpp/cpp

Discovers jobs waiting in the queue:
  • Scans input/ready/ once per second
  • Submits found job IDs to worker pool
  • Single-threaded, runs in dedicated scanner thread
Pool

Location: pool.hpp/cpp

Manages concurrent worker threads:
  • Creates N worker threads (default: 4)
  • Maintains job queue with mutex protection
  • Distributes jobs to available workers
  • Uses condition variables for efficient waiting
Processor

Location: processor.hpp/cpp

Executes individual jobs:
  • Atomically moves job from ready/ to processing/
  • Reads prompt from job directory
  • Calls Runner for inference
  • Writes result and moves job to output/ or failed/
  • Thread-safe, shared across all workers
Runner

Location: runner.hpp/cpp

Wraps llama.cpp for LLM inference:
  • Each worker has its own Runner instance
  • Shares loaded model across all workers
  • Each Runner has dedicated inference context
  • Handles token generation and sampling
Logger

Location: logger.hpp/cpp

Provides structured logging across all components:
  • Thread-safe with mutex protection
  • Configurable log levels (ERROR, WARN, INFO, DEBUG, TRACE)
  • Named threads for easy debugging
  • Errors to stderr, everything else to stdout

Directory Structure

The workspace directory acts as the state machine:
WORKSPACE/
├── input/
│   ├── writing/      ← Jobs being created (staging area)
│   └── ready/        ← Jobs waiting to be processed
├── processing/       ← Jobs currently running inference
├── output/           ← Completed jobs with results
└── failed/           ← Failed jobs with error messages
Never manually move job directories while the server is running. Use atomic filesystem operations if you must interact with the queue.

Workflow: Job Submission

1. Client calls Work::submit(): application code submits a prompt string.
2. Create staging directory: Work creates input/writing/<job_id>/.
3. Write prompt: the prompt is saved to input/writing/<job_id>/prompt.txt.
4. Atomic rename: the directory is atomically moved to input/ready/<job_id>.
5. Return job ID: the client receives a job identifier for later retrieval.

Workflow: Job Processing

1. Scanner discovers job: the scanner thread finds the job in input/ready/ during a periodic scan.
2. Pool assigns to worker: the job is added to the queue and picked up by an available worker thread.
3. Processor moves job: the job is atomically renamed from ready/ to processing/.
4. Read prompt: the Processor reads processing/<job_id>/prompt.txt.
5. Run inference: the Runner executes llama.cpp inference on the prompt.
6. Write result: on success, write result.txt and move to output/; on failure, write error.txt and move to failed/.

Workflow: Result Retrieval

// Submit job
Work work(workspace);
auto result = work.submit("What is AI?");
if (!result.ok) {
    std::cerr << "Submit failed: " << result.message << std::endl;
    return;
}

// Poll for completion
Flow flow(workspace);
while (true) {
    auto status = flow.status(result.id);
    if (status == Status::Done) break;
    if (status == Status::Failed) {
        std::cerr << "Job failed" << std::endl;
        return;
    }
    std::this_thread::sleep_for(std::chrono::seconds(1));
}

// Retrieve result
auto job = flow.get(result.id);
if (job) {
    std::cout << job->result << std::endl;
}

Key Design Decisions

Atomic Renames

Directory moves are atomic on POSIX filesystems, ensuring thread-safe state transitions without locks

Directory = State

A job’s location is its state - no database, no coordination needed

Shared Model

llama.cpp model loaded once in memory, shared across all workers

Per-Thread Context

Each worker gets its own inference context for true parallelism

Filesystem-Based

Survives process crashes, easy to inspect and debug

No Exceptions

All errors returned as values, no exception handling needed

Single Responsibility Principle

Each component has exactly one reason to change:
  • Scanner: Only discovers jobs in directories
  • Pool: Only manages worker threads
  • Processor: Only executes individual jobs
  • Work: Only job submission and validation
  • Flow: Only job result retrieval
This design enables easy testing, maintenance, and evolution of the system.

See Also

Job Lifecycle

Detailed state machine and transitions

Filesystem Queue

Directory-based queue design

Threading Model

Concurrency and thread safety

CLI Tools

Command-line interface reference
