Overview
In nrvna-ai, jobs move through a well-defined lifecycle represented by their location in the filesystem. The job’s directory path is the job’s state; no database or separate state tracking is needed.
Job States
Every job exists in exactly one of five states at any given time:
STAGING - Being Created
Directory: input/writing/<job_id>/
Status enum: Not visible to system yet
Duration: Milliseconds (during job creation)
Jobs begin here during the submission process. The writing/ directory acts as a staging area where jobs are assembled before becoming visible to the queue.
What happens:
- Work creates the job directory
- Prompt is written to prompt.txt
- Job is invisible to Scanner (not yet in ready/)
- Atomic rename moves job to QUEUED state
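The staging steps above can be sketched with std::filesystem. This is a minimal illustration, not the actual Work implementation: the function name submit_job and the workspace parameter are assumptions, and only the directory names and prompt.txt come from this page.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not the real Work API): assemble a job under
// input/writing/, then publish it to input/ready/ with one rename.
bool submit_job(const fs::path& workspace, const std::string& job_id,
                const std::string& prompt) {
    const fs::path staging = workspace / "input" / "writing" / job_id;
    const fs::path ready   = workspace / "input" / "ready" / job_id;

    std::error_code ec;
    fs::create_directories(staging, ec);
    if (ec) return false;
    fs::create_directories(ready.parent_path(), ec);
    if (ec) return false;

    {
        // Write the prompt while the job is still invisible to the Scanner.
        std::ofstream out(staging / "prompt.txt", std::ios::binary);
        if (!out) return false;
        out << prompt;
        out.flush();
        if (!out) return false;
    }  // file is closed before the rename

    // The job becomes visible in ready/ only once it is fully written.
    fs::rename(staging, ready, ec);
    return !ec;
}
```

Because the rename happens last, a scanner looking at ready/ never sees a job directory without its prompt.txt. This assumes writing/ and ready/ live on the same filesystem, which is what makes the rename atomic.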
QUEUED - Waiting for Processing
Directory: input/ready/<job_id>/
Status enum: Status::Queued
Duration: Variable (depends on queue depth and worker availability)
Jobs wait here until a worker becomes available. This is the main queue for the inference system.
What happens:
- Scanner discovers job during periodic scan (every 1 second)
- Job ID submitted to Pool’s work queue
- Job waits for available worker thread
- When worker picks up job, Processor atomically moves it to RUNNING
Multiple jobs can be queued simultaneously. They’re processed in the order discovered by the Scanner.
RUNNING - Inference in Progress
Directory: processing/<job_id>/
Status enum: Status::Running
Duration: Variable (depends on prompt length and model speed)
Jobs are actively being processed by a worker thread.
What happens:
- Worker’s Processor atomically renames job from ready/ to processing/
- Processor reads prompt.txt from the job directory
- Runner executes llama.cpp inference
- Tokens are generated and accumulated
- On completion, result written and job moved to DONE or FAILED
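The claim step at the top of this list works as a lock: if two workers try to take the same job, only one rename can succeed. A minimal sketch of that idea follows. The name claim_job is hypothetical and does not come from the Processor source.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: try to move a job from input/ready/ to processing/.
// Returns true only for the caller whose rename succeeds.
bool claim_job(const fs::path& workspace, const std::string& job_id) {
    const fs::path from = workspace / "input" / "ready" / job_id;
    const fs::path to   = workspace / "processing" / job_id;

    std::error_code ec;
    fs::create_directories(to.parent_path(), ec);
    if (ec) return false;

    // If another worker already moved the job, `from` no longer exists
    // and the rename reports an error instead of throwing.
    fs::rename(from, to, ec);
    return !ec;
}
```

A worker that gets false simply skips the job, so no extra locking is needed around the queue directory.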
DONE - Completed Successfully
Directory: output/<job_id>/
Status enum: Status::Done
Duration: Indefinite (until client retrieves or manually cleaned)
Jobs that completed successfully end up here.
What happens:
- Processor writes inference result to result.txt
- Job atomically renamed from processing/ to output/
- Client can retrieve result using Flow::get()
- Job remains here until manually cleaned up
FAILED - Error Occurred
Directory: failed/<job_id>/
Status enum: Status::Failed
Duration: Indefinite (until manually cleaned)
Jobs that encountered errors during processing.
What happens:
- Processor catches exception or error during inference
- Error message written to error.txt
- Job atomically renamed from processing/ to failed/
- Client can retrieve error using Flow::get()
Common failure reasons:
- Out of memory during inference
- Model file corruption
- Invalid prompt format
- Context length exceeded
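DONE and FAILED follow the same pattern: write the outcome file first, then rename the job directory. The sketch below shows that ordering under the directory layout described on this page. The function finish_job and its parameters are illustrative assumptions, not the Processor's real interface.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: record the outcome of a running job and move it
// to output/ (success) or failed/ (error).
bool finish_job(const fs::path& workspace, const std::string& job_id,
                bool ok, const std::string& text) {
    const std::string file_name = ok ? "result.txt" : "error.txt";
    const std::string dest_dir  = ok ? "output" : "failed";

    const fs::path job  = workspace / "processing" / job_id;
    const fs::path dest = workspace / dest_dir / job_id;

    {
        // Write the result or error message while the job is still RUNNING.
        std::ofstream out(job / file_name, std::ios::binary);
        if (!out) return false;
        out << text;
        out.flush();
        if (!out) return false;
    }

    std::error_code ec;
    fs::create_directories(dest.parent_path(), ec);
    if (ec) return false;

    // Only after the file is complete does the job change state.
    fs::rename(job, dest, ec);
    return !ec;
}
```

With this ordering, a client that sees a job in output/ or failed/ can expect the corresponding file to be fully written.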
MISSING - Not Found
Directory: None
Status enum: Status::Missing
Duration: N/A
The job ID doesn’t exist in any directory.
Possible reasons:
- Invalid or typo’d job ID
- Job was manually deleted
- Job hasn’t been submitted yet
- Workspace was cleared
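Taken together, the states above suggest that a status lookup needs only a series of directory checks. The sketch below is one way to write that, not the actual Flow code. The enumerator order and the check order are assumptions, since this page only names the enum values and says the type is a uint8_t.

```cpp
#include <cstdint>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Assumed enumerator order; the page only states the names and uint8_t.
enum class Status : std::uint8_t { Queued, Running, Done, Failed, Missing };

// Hypothetical lookup: the first directory that contains the job wins.
Status job_status(const fs::path& workspace, const std::string& job_id) {
    std::error_code ec;  // is_directory returns false on error
    if (fs::is_directory(workspace / "input" / "ready" / job_id, ec)) return Status::Queued;
    if (fs::is_directory(workspace / "processing" / job_id, ec))      return Status::Running;
    if (fs::is_directory(workspace / "output" / job_id, ec))          return Status::Done;
    if (fs::is_directory(workspace / "failed" / job_id, ec))          return Status::Failed;
    return Status::Missing;  // not in any directory, or still in writing/
}
```

Checking in lifecycle order has one useful property: a job that advances while the lookup is running is still found by a later check, rather than being reported as missing. Jobs in input/writing/ are deliberately not checked, matching the STAGING description.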
State Machine
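The job lifecycle follows a strict state machine with atomic transitions: STAGING -> QUEUED -> RUNNING -> DONE or FAILED.
State Transitions
All state transitions are implemented as atomic directory renames:
- STAGING -> QUEUED: input/writing/<job_id>/ to input/ready/<job_id>/
- QUEUED -> RUNNING: input/ready/<job_id>/ to processing/<job_id>/
- RUNNING -> DONE: processing/<job_id>/ to output/<job_id>/
- RUNNING -> FAILED: processing/<job_id>/ to failed/<job_id>/
Status Detection
The Flow class determines job status by checking directory existence in order. The Status enum is defined in types.hpp as a uint8_t for compact representation.
Job Identifier Format
Job IDs are generated using a timestamp-based format, for example 1736700000_12345_0:
- 1736700000 - Unix timestamp (seconds since epoch)
- 12345 - Process ID of the client
- 0 - Atomic counter within the process
This format provides:
- Uniqueness: Across processes and time
- Sortability: Chronological ordering
- Debuggability: Timestamp visible in the ID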
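An ID in this format could be produced as shown below. This is a sketch for POSIX systems (it uses getpid), and the function names format_job_id and make_job_id are hypothetical rather than taken from the nrvna-ai source.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <string>
#include <unistd.h>  // getpid (POSIX only)

// Join the three components with underscores: <timestamp>_<pid>_<counter>.
std::string format_job_id(std::int64_t unix_seconds, long long pid,
                          std::uint64_t counter) {
    return std::to_string(unix_seconds) + "_" + std::to_string(pid) + "_" +
           std::to_string(counter);
}

// Hypothetical generator: current time, this process's ID, and a
// process-wide atomic counter that starts at 0.
std::string make_job_id() {
    static std::atomic<std::uint64_t> counter{0};
    const auto now = std::chrono::system_clock::now().time_since_epoch();
    const auto secs = std::chrono::duration_cast<std::chrono::seconds>(now).count();
    return format_job_id(static_cast<std::int64_t>(secs),
                         static_cast<long long>(getpid()),
                         counter.fetch_add(1));
}
```

The counter keeps IDs distinct when one process submits several jobs within the same second, and the process ID does the same across concurrent clients.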
Orphaned Job Recovery
When the server starts, it checks for orphaned jobs that were left in processing/ due to crashes.
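This page does not say what the server does with the orphans it finds. The sketch below assumes one plausible policy, moving each orphan back to input/ready/ so it is processed again. Both that policy and the name requeue_orphans are assumptions; the real server may instead mark such jobs as failed.

```cpp
#include <cstddef>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical startup step (assumed policy: re-queue). Moves every job
// directory found in processing/ back to input/ready/ and returns how
// many were moved.
std::size_t requeue_orphans(const fs::path& workspace) {
    const fs::path processing = workspace / "processing";
    const fs::path ready      = workspace / "input" / "ready";

    std::error_code ec;
    if (!fs::is_directory(processing, ec)) return 0;
    fs::create_directories(ready, ec);
    if (ec) return 0;

    // Collect first, so entries are not renamed while iterating.
    std::vector<fs::path> orphans;
    for (fs::directory_iterator it(processing, ec), end; !ec && it != end;
         it.increment(ec)) {
        std::error_code entry_ec;
        if (it->is_directory(entry_ec)) orphans.push_back(it->path());
    }

    std::size_t moved = 0;
    for (const auto& job : orphans) {
        fs::rename(job, ready / job.filename(), ec);
        if (!ec) ++moved;
    }
    return moved;
}
```

This must run before any worker starts, otherwise a job that is legitimately RUNNING could be mistaken for an orphan.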
Monitoring Job Progress
Jobs can be monitored through the filesystem, the C++ API, or the CLI. With the filesystem approach, monitor jobs by watching directory changes.
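For the filesystem approach, a simple polling loop is enough. The sketch below waits until a job appears in output/ or failed/, or until a timeout passes. The helper wait_for_job is hypothetical and uses only std::filesystem, not the Flow API.

```cpp
#include <chrono>
#include <filesystem>
#include <string>
#include <system_error>
#include <thread>

namespace fs = std::filesystem;

// Hypothetical helper: poll until the job reaches a terminal directory.
// Returns "done", "failed", or "timeout".
std::string wait_for_job(const fs::path& workspace, const std::string& job_id,
                         std::chrono::milliseconds interval,
                         std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        std::error_code ec;
        if (fs::is_directory(workspace / "output" / job_id, ec)) return "done";
        if (fs::is_directory(workspace / "failed" / job_id, ec)) return "failed";
        if (std::chrono::steady_clock::now() >= deadline) return "timeout";
        std::this_thread::sleep_for(interval);  // avoid busy-waiting
    }
}
```

In real use the interval would be in the 1-5 second range, for example wait_for_job(workspace, id, std::chrono::seconds(2), std::chrono::minutes(5)).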
Best Practices
Don't Poll Too Aggressively
Poll every 1-5 seconds. Jobs typically take seconds to minutes to complete.
Clean Up Completed Jobs
Manually delete jobs from output/ and failed/ to prevent disk buildup.
Check for FAILED State
Always handle the FAILED state in your client code.
Never Modify Directories Manually
Let the system manage state transitions. Manual moves can cause race conditions.
See Also
Architecture
Overall system design and components
Filesystem Queue
Directory-based queue implementation
Work API
Job submission API reference
Flow API
Result retrieval API reference