
Overview

nrvna-ai is an asynchronous inference primitive - a directory-based job queue for LLM inference using llama.cpp. Jobs are represented as filesystem directories that move through states via atomic renames.
The filesystem becomes the state machine. A job’s location is its state - no database needed.

Core Architecture

The system follows a clean separation of concerns with distinct components handling specific responsibilities:

Client Layer

Work and Flow classes provide the public API for job submission and result retrieval

Server Layer

Server orchestrates all components and manages the lifecycle

Discovery Layer

Scanner finds jobs waiting in the queue

Execution Layer

Pool, Processor, and Runner handle concurrent job execution

Component Responsibilities

Work

Location: work.hpp/cpp

The client-facing API for submitting jobs:
  • Creates job directories in staging area (input/writing/)
  • Writes prompt to prompt.txt
  • Atomically moves job to ready queue (input/ready/)
  • Returns job identifier to client
[[nodiscard]] SubmitResult submit(const std::string& prompt);
static JobId generateId() noexcept;
static bool isValidPrompt(const std::string&) noexcept;
Flow

Location: flow.hpp/cpp

The client-facing API for querying job status and results:
  • Checks job state by directory location
  • Retrieves results from output/ directory
  • Retrieves error messages from failed/ directory
[[nodiscard]] Status status(const JobId& id) const noexcept;
[[nodiscard]] std::optional<Job> get(const JobId& id) const noexcept;
Server

Location: server.hpp/cpp

Manages the entire system lifecycle:
  • Initializes workspace directory structure
  • Creates and manages Scanner and Pool
  • Runs scan loop in dedicated thread
  • Handles graceful shutdown
  • Recovers orphaned jobs on startup
Server(const std::string& modelPath, 
       const std::filesystem::path& workspace, 
       int workers = 4);
[[nodiscard]] bool start();
void shutdown() noexcept;
Scanner

Location: scanner.hpp/cpp

Discovers jobs waiting in the queue:
  • Scans input/ready/ once per second
  • Submits found job IDs to worker pool
  • Single-threaded, runs in dedicated scanner thread
Pool

Location: pool.hpp/cpp

Manages concurrent worker threads:
  • Creates N worker threads (default: 4)
  • Maintains job queue with mutex protection
  • Distributes jobs to available workers
  • Uses condition variables for efficient waiting
Processor

Location: processor.hpp/cpp

Executes individual jobs:
  • Atomically moves job from ready/ to processing/
  • Reads prompt from job directory
  • Calls Runner for inference
  • Writes result and moves job to output/ or failed/
  • Thread-safe, shared across all workers
Runner

Location: runner.hpp/cpp

Wraps llama.cpp for LLM inference:
  • Each worker has its own Runner instance
  • Shares loaded model across all workers
  • Each Runner has dedicated inference context
  • Handles token generation and sampling
Logger

Location: logger.hpp/cpp

Provides structured logging across all components:
  • Thread-safe with mutex protection
  • Configurable log levels (ERROR, WARN, INFO, DEBUG, TRACE)
  • Named threads for easy debugging
  • Errors to stderr, everything else to stdout

Directory Structure

The workspace directory acts as the state machine:
WORKSPACE/
├── input/
│   ├── writing/      ← Jobs being created (staging area)
│   └── ready/        ← Jobs waiting to be processed
├── processing/       ← Jobs currently running inference
├── output/           ← Completed jobs with results
└── failed/           ← Failed jobs with error messages
Never manually move job directories while the server is running. Use atomic filesystem operations if you must interact with the queue.

Workflow: Job Submission

1. Client calls Work::submit(): application code submits a prompt string.
2. Create staging directory: Work creates input/writing/<job_id>/.
3. Write prompt: the prompt is saved to input/writing/<job_id>/prompt.txt.
4. Atomic rename: the directory is atomically moved to input/ready/<job_id>.
5. Return job ID: the client receives a job identifier for later retrieval.

Workflow: Job Processing

1. Scanner discovers job: the scanner thread finds the job in input/ready/ during a periodic scan.
2. Pool assigns to worker: the job is added to the queue and picked up by an available worker thread.
3. Processor moves job: the job is atomically renamed from ready/ to processing/.
4. Read prompt: the Processor reads processing/<job_id>/prompt.txt.
5. Run inference: the Runner executes llama.cpp inference on the prompt.
6. Write result: on success, write result.txt and move to output/; on failure, write error.txt and move to failed/.

Workflow: Result Retrieval

// Submit job
Work work(workspace);
auto result = work.submit("What is AI?");
if (!result.ok) {
    std::cerr << "Submit failed: " << result.message << std::endl;
    return;
}

// Poll for completion
Flow flow(workspace);
while (true) {
    auto status = flow.status(result.id);
    if (status == Status::Done) break;
    if (status == Status::Failed) {
        std::cerr << "Job failed" << std::endl;
        return;
    }
    std::this_thread::sleep_for(std::chrono::seconds(1));
}

// Retrieve result
auto job = flow.get(result.id);
if (job) {
    std::cout << job->result << std::endl;
}

Key Design Decisions

Atomic Renames

Directory moves are atomic on POSIX filesystems, ensuring thread-safe state transitions without locks

Directory = State

A job’s location is its state - no database, no coordination needed

Shared Model

llama.cpp model loaded once in memory, shared across all workers

Per-Thread Context

Each worker gets its own inference context for true parallelism

Filesystem-Based

Survives process crashes, easy to inspect and debug

No Exceptions

All errors returned as values, no exception handling needed

Single Responsibility Principle

Each component has exactly one reason to change:
  • Scanner: Only discovers jobs in directories
  • Pool: Only manages worker threads
  • Processor: Only executes individual jobs
  • Work: Only job submission and validation
  • Flow: Only job result retrieval
This design enables easy testing, maintenance, and evolution of the system.

See Also

Job Lifecycle

Detailed state machine and transitions

Filesystem Queue

Directory-based queue design

Threading Model

Concurrency and thread safety

CLI Tools

Command-line interface reference
