nrvna-ai is an async inference primitive. You submit jobs, they process in the background, you collect results.
This guide assumes you’ve already installed nrvna-ai. If not, install it first.

Get a model

Download any GGUF model from HuggingFace and place it in ./models/:
mkdir -p models
# Example: download a small model for testing
# https://huggingface.co/TheBloke/phi-2-GGUF
Or set NRVNA_MODELS_DIR to point to your existing models directory:
export NRVNA_MODELS_DIR=/path/to/your/models

Start the daemon

1. Launch nrvnad

Start the daemon in interactive mode (recommended for first run):
nrvnad
This shows a dashboard with your models and workspaces. Pick a number to get started.

Alternatively, start directly with a specific model and workspace:
nrvnad model.gguf workspace
Leave this running in your terminal.
2. Submit a job

In a new terminal, submit your first job:
wrk workspace "What is the capital of France?"
You’ll get a job ID back immediately:
abc123
The job is now processing in the background.
3. Collect the result

Retrieve your result using the job ID:
flw workspace abc123
Output:
The capital of France is Paris.
Use the -w flag to wait for a job that’s still processing:
flw workspace -w abc123

Submit and wait in one line

Pipe the job ID directly to flw:
wrk workspace "Hello" | xargs flw workspace -w
This submits the job and waits for the result in a single command.
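If you use this pattern often, you can wrap the pair in a small shell function. This is a sketch, not part of nrvna-ai; `ask` is a hypothetical name, and it relies only on the documented behavior that `wrk` prints a job ID and `flw -w` waits for completion:

```shell
# Hypothetical convenience wrapper (not part of nrvna-ai):
# submit a prompt, capture the job ID, block until the result is ready.
ask() {
  local ws="$1" prompt="$2" id
  id="$(wrk "$ws" "$prompt")" || return 1  # wrk prints the job ID immediately
  flw "$ws" -w "$id"                       # -w waits for the job to finish
}
```

Usage: answer="$(ask workspace "Hello")"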

Batch processing

Submit multiple jobs at once; they’ll process in parallel:
# Submit 3 jobs in parallel
wrk workspace "Explain quantum computing"
wrk workspace "Explain machine learning"
wrk workspace "Explain neural networks"
Each wrk call returns immediately with a job ID. The daemon processes all jobs concurrently.
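To collect every result afterwards, you can fan out first and then wait on each ID. A sketch, assuming only the documented `wrk`/`flw` commands; `collect_batch` is a hypothetical helper name:

```shell
# Hypothetical helper (not part of nrvna-ai): submit each prompt right away,
# then wait for and print every result in submission order.
collect_batch() {
  local ws="$1"; shift
  local ids=() p id
  for p in "$@"; do
    ids+=("$(wrk "$ws" "$p")")   # each wrk returns a job ID immediately
  done
  for id in "${ids[@]}"; do
    flw "$ws" -w "$id"           # results print in submission order
  done
}
```

Usage: collect_batch workspace "Explain quantum computing" "Explain machine learning"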

How the filesystem works

Inspect the workspace to see jobs moving through states:
ls -R workspace/
You’ll see:
workspace/
├── input/ready/    ← queued jobs
├── processing/     ← jobs being worked
├── output/         ← completed results
└── failed/         ← errors
Jobs are directories. State is location. Transitions are atomic renames.
You can inspect jobs with standard Unix tools:
watch ls workspace/processing/  # monitor active jobs
cat workspace/output/abc123/result.txt  # read results directly
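Because a transition is nothing more than a directory move, you can simulate the whole lifecycle with plain `mkdir` and `mv`. The demo below mirrors the layout above in a throwaway `demo/` tree; it uses no nrvna-ai commands at all:

```shell
# Simulate the job lifecycle with plain filesystem operations.
# A job is a directory; moving it between state directories IS the
# transition, and rename(2) makes each move atomic.
mkdir -p demo/input/ready demo/processing demo/output demo/failed

mkdir demo/input/ready/abc123                   # job queued
echo "What is 2+2?" > demo/input/ready/abc123/prompt.txt

mv demo/input/ready/abc123 demo/processing/     # daemon claims the job
echo "4" > demo/processing/abc123/result.txt    # work happens here
mv demo/processing/abc123 demo/output/          # job completed

ls demo/output/                                 # → abc123
cat demo/output/abc123/result.txt               # → 4
```

At no point does a job exist in two states at once, which is why crashed readers never see half-finished transitions.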

Common patterns

Agent loop

Feed results back as prompts:
memory=""
for i in {1..5}; do
  result=$(wrk workspace "Continue: $memory" | xargs flw workspace -w)
  memory="$memory"$'\n'"$result"   # $'\n' is a real newline; "\n" would be literal
done

Fan-out / fan-in

Parallelize, then synthesize:
a=$(wrk workspace "Research: databases")
b=$(wrk workspace "Research: caching")
c=$(wrk workspace "Research: queuing")
wrk workspace "Synthesize: $(flw workspace -w $a) $(flw workspace -w $b) $(flw workspace -w $c)"

Multi-model routing

Run different models for different tasks:
nrvnad qwen-vl.gguf   ws-vision    # mmproj auto-detected
nrvnad codellama.gguf  ws-code
nrvnad phi-3.gguf      ws-fast

wrk ws-vision "Describe this" --image photo.jpg
wrk ws-code   "Refactor: $(cat main.py)"
wrk ws-fast   "Classify: bug or feature?"
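If you route frequently, a tiny dispatcher can choose the workspace for you. This is a hypothetical sketch, not an nrvna-ai feature; `route` is an invented name, the workspace names match the daemons started above, and only the documented `wrk` command is used:

```shell
# Hypothetical dispatcher (not part of nrvna-ai): pick a workspace by task
# kind, then pass everything else through to wrk unchanged.
route() {
  local kind="$1"; shift
  local ws
  case "$kind" in
    vision) ws=ws-vision ;;
    code)   ws=ws-code ;;
    *)      ws=ws-fast ;;   # default to the fast general-purpose model
  esac
  wrk "$ws" "$@"
}
```

Usage: route code "Refactor: $(cat main.py)"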

Process images

Submit jobs with image inputs:
for img in photos/*.jpg; do
  wrk workspace "Caption this" --image "$img"
done

Next steps

Advanced patterns

See ADVANCED.md for batch, loops, and routing

Architecture

See ARCHITECTURE.md for internals and threading

wrk options

Run wrk --help to see all submission options

flw options

Run flw --help to see all collection options
