Overview
nrvna-ai is an asynchronous inference primitive: a directory-based job queue for LLM inference using llama.cpp. Jobs are represented as filesystem directories that move through states via atomic renames. The filesystem becomes the state machine: a job's location is its state, so no database is needed.
Core Architecture
The system follows a clean separation of concerns, with distinct components handling specific responsibilities:

Client Layer
Work and Flow classes provide the public API for job submission and result retrieval
Server Layer
Server orchestrates all components and manages the lifecycle
Discovery Layer
Scanner finds jobs waiting in the queue
Execution Layer
Pool, Processor, and Runner handle concurrent job execution
Component Responsibilities
Work - Job Submission API

Location: work.hpp/cpp

The client-facing API for submitting jobs:
- Creates the job directory in the staging area (input/writing/)
- Writes the prompt to prompt.txt
- Atomically moves the job to the ready queue (input/ready/)
- Returns the job identifier to the client
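The submission sequence above can be sketched with std::filesystem. This is a minimal illustration of the write-then-rename pattern, not the actual nrvna-ai API; the function name and error handling here are assumptions, while the directory names follow this page.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical submission helper: returns the job id on success, or an
// empty string on failure (errors as values, no exceptions).
std::string submit_job(const fs::path& workspace,
                       const std::string& job_id,
                       const std::string& prompt) {
    std::error_code ec;
    fs::path staging = workspace / "input" / "writing" / job_id;
    fs::create_directories(staging, ec);
    if (ec) return "";

    // Write the prompt while the job is still invisible to workers.
    std::ofstream out(staging / "prompt.txt");
    out << prompt;
    out.close();
    if (!out) return "";

    // The atomic rename publishes the job: it appears in input/ready/
    // fully formed, so the scanner never sees a half-written directory.
    fs::path ready = workspace / "input" / "ready" / job_id;
    fs::rename(staging, ready, ec);
    if (ec) return "";
    return job_id;
}
```

Because the prompt is written before the rename, a crash mid-submission leaves debris only in input/writing/, never a broken job in the ready queue.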
Flow - Result Retrieval API

Location: flow.hpp/cpp

The client-facing API for querying job status and results:
- Checks job state by directory location
- Retrieves results from the output/ directory
- Retrieves error messages from the failed/ directory
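"Directory = state" makes the status check a handful of existence tests. A sketch, assuming the directory layout described on this page (the exact parent of processing/ and the enum are illustrative):

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// One state per workspace directory (names are illustrative).
enum class JobState { Ready, Processing, Done, Failed, Unknown };

// A job's state is simply wherever its directory currently lives.
JobState job_state(const fs::path& workspace, const std::string& job_id) {
    if (fs::exists(workspace / "input" / "ready" / job_id)) return JobState::Ready;
    if (fs::exists(workspace / "processing" / job_id))      return JobState::Processing;
    if (fs::exists(workspace / "output" / job_id))          return JobState::Done;
    if (fs::exists(workspace / "failed" / job_id))          return JobState::Failed;
    return JobState::Unknown;
}
```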
Server - Orchestrator

Location: server.hpp/cpp

Manages the entire system lifecycle:
- Initializes the workspace directory structure
- Creates and manages the Scanner and Pool
- Runs the scan loop in a dedicated thread
- Handles graceful shutdown
- Recovers orphaned jobs on startup
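Orphan recovery falls out of the filesystem design: any directory still in processing/ at startup belonged to a crashed run and can simply be renamed back into the ready queue. A sketch under the assumption that processing/ sits at the workspace root (the function name is hypothetical):

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// On startup, move every orphaned job in processing/ back to
// input/ready/ so the scanner picks it up again. Returns the
// recovered job ids.
std::vector<std::string> recover_orphans(const fs::path& workspace) {
    std::vector<std::string> recovered;
    std::error_code ec;
    for (const auto& entry :
         fs::directory_iterator(workspace / "processing", ec)) {
        fs::path back = workspace / "input" / "ready" / entry.path().filename();
        fs::rename(entry.path(), back, ec);  // atomic requeue
        if (!ec) recovered.push_back(back.filename().string());
    }
    return recovered;
}
```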
Scanner - Job Discovery

Location: scanner.hpp/cpp

Discovers jobs waiting in the queue:
- Scans input/ready/ every second
- Submits found job IDs to the worker pool
- Single-threaded, runs in a dedicated scanner thread
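The scan loop can be sketched in a few lines; the function names and the callback interface are illustrative, while the 1-second poll interval matches the description above:

```cpp
#include <atomic>
#include <chrono>
#include <filesystem>
#include <functional>
#include <string>
#include <system_error>
#include <thread>

namespace fs = std::filesystem;

// One scan pass: hand every job directory in input/ready/ to the pool.
void scan_once(const fs::path& ready_dir,
               const std::function<void(std::string)>& submit) {
    std::error_code ec;
    for (const auto& entry : fs::directory_iterator(ready_dir, ec)) {
        if (entry.is_directory())
            submit(entry.path().filename().string());
    }
}

// The dedicated scanner thread: poll every second until told to stop.
void scan_loop(const fs::path& ready_dir,
               const std::function<void(std::string)>& submit,
               const std::atomic<bool>& stop) {
    while (!stop) {
        scan_once(ready_dir, submit);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
```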
Pool - Thread Pool Manager

Location: pool.hpp/cpp

Manages concurrent worker threads:
- Creates N worker threads (default: 4)
- Maintains a job queue with mutex protection
- Distributes jobs to available workers
- Uses condition variables for efficient waiting
Processor - Job Executor

Location: processor.hpp/cpp

Executes individual jobs:
- Atomically moves the job from ready/ to processing/
- Reads the prompt from the job directory
- Calls the Runner for inference
- Writes the result and moves the job to output/ or failed/
- Thread-safe, shared across all workers
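The claim-process-publish cycle can be sketched as below. The directory names follow this page, but the result filename, the failed/ branch being omitted for brevity, and the `infer` hook standing in for the Runner are all assumptions:

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical single-job pass. Returns false if the job could not be
// claimed or published (errors as values, no exceptions).
bool process_job(const fs::path& workspace, const std::string& job_id,
                 const std::function<std::string(const std::string&)>& infer) {
    std::error_code ec;
    fs::path job = workspace / "processing" / job_id;

    // Claim the job: only one worker can win this atomic rename, so no
    // lock is needed even with many workers racing for the same id.
    fs::rename(workspace / "input" / "ready" / job_id, job, ec);
    if (ec) return false;  // another worker already claimed it

    std::ifstream in(job / "prompt.txt");
    std::stringstream prompt;
    prompt << in.rdbuf();

    std::string result = infer(prompt.str());  // Runner call goes here

    std::ofstream out(job / "result.txt");     // assumed result filename
    out << result;

    // Publish the finished job with another atomic rename.
    fs::rename(job, workspace / "output" / job_id, ec);
    return !ec;
}
```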
Runner - Inference Wrapper

Location: runner.hpp/cpp

Wraps llama.cpp for LLM inference:
- Each worker has its own Runner instance
- The loaded model is shared across all workers
- Each Runner has a dedicated inference context
- Handles token generation and sampling
Logger - Thread-Safe Logging

Location: logger.hpp/cpp

Provides structured logging across all components:
- Thread-safe with mutex protection
- Configurable log levels (ERROR, WARN, INFO, DEBUG, TRACE)
- Named threads for easy debugging
- Errors go to stderr; everything else goes to stdout
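A minimal sketch of such a logger, assuming nothing about the project's actual interface beyond the behavior listed above (one mutex serializes output; ERROR goes to stderr, the rest to stdout):

```cpp
#include <iostream>
#include <mutex>
#include <string>

// Log levels as described above (names illustrative).
enum class Level { Error, Warn, Info, Debug, Trace };

class Logger {
public:
    static const char* level_name(Level l) {
        switch (l) {
            case Level::Error: return "ERROR";
            case Level::Warn:  return "WARN";
            case Level::Info:  return "INFO";
            case Level::Debug: return "DEBUG";
            case Level::Trace: return "TRACE";
        }
        return "?";
    }

    // The mutex keeps lines from interleaving when many workers log at
    // once; the thread name makes per-worker traces easy to follow.
    void log(Level level, const std::string& thread_name,
             const std::string& msg) {
        std::lock_guard<std::mutex> lock(mtx_);
        std::ostream& out = (level == Level::Error) ? std::cerr : std::cout;
        out << "[" << level_name(level) << "] [" << thread_name << "] "
            << msg << "\n";
    }

private:
    std::mutex mtx_;
};
```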
Directory Structure
The workspace directory acts as the state machine.

Workflow: Job Submission
Workflow: Job Processing
Workflow: Result Retrieval
Key Design Decisions
Atomic Renames
Directory moves are atomic on POSIX filesystems, ensuring thread-safe state transitions without locks
Directory = State
Job’s location is its state - no database, no coordination needed
Shared Model
llama.cpp model loaded once in memory, shared across all workers
Per-Thread Context
Each worker gets its own inference context for true parallelism
Filesystem-Based
Survives process crashes, easy to inspect and debug
No Exceptions
All errors returned as values, no exception handling needed
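The errors-as-values convention can be illustrated as follows; this Result type and the validation helper are hypothetical stand-ins, not the project's actual definitions:

```cpp
#include <string>

// Every fallible operation returns a Result carrying either a value or
// an error message, so no code path needs to throw or catch.
struct Result {
    std::string value;  // meaningful only when error is empty
    std::string error;  // empty string means success
    bool ok() const { return error.empty(); }
};

// Illustrative use: validate a prompt before submission.
Result validate_prompt(const std::string& prompt) {
    if (prompt.empty())
        return {"", "prompt must not be empty"};
    return {prompt, ""};
}
```

Callers branch on `ok()` and propagate the error string upward, which keeps every failure path explicit and visible in the code.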
Single Responsibility Principle
Each component has exactly one reason to change:
- Scanner: Only discovers jobs in directories
- Pool: Only manages worker threads
- Processor: Only executes individual jobs
- Work: Only job submission and validation
- Flow: Only job result retrieval
See Also
Job Lifecycle
Detailed state machine and transitions
Filesystem Queue
Directory-based queue design
Threading Model
Concurrency and thread safety
CLI Tools
Command-line interface reference