Run multiple daemons with different models and route jobs based on the task at hand.
Basic Setup
Start specialized daemons, each with its own workspace:
```bash
nrvnad qwen-vl.gguf ./ws-vision &    # Vision model with mmproj auto-detected
nrvnad codellama.gguf ./ws-code &    # Code-specialized model
nrvnad phi-3-mini.gguf ./ws-fast &   # Fast general-purpose model
```
Each daemon listens on its own workspace directory.
Routing Logic
Route jobs based on file type or task characteristics:
```bash
classify() {
    case "$1" in
        *.jpg|*.png) echo "./ws-vision" ;;
        *.py|*.js)   echo "./ws-code" ;;
        *)           echo "./ws-fast" ;;
    esac
}

# Submit to the appropriate workspace
ws=$(classify "$input")
wrk "$ws" "Process this: $input"
```
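Extension matching in `case` is literal, so `PHOTO.JPG` would fall through to the default branch. A minimal variant (assuming bash 4+ for the `${var,,}` lowercase expansion) normalizes the filename first; the patterns are illustrative and worth extending for your workloads:

```bash
#!/usr/bin/env bash
# Variant of classify() that lowercases the filename before matching,
# so FOO.JPG and photo.jpg route to the same workspace.
classify() {
    local name
    name=$(basename -- "$1")
    name=${name,,}                       # bash 4+: lowercase the name
    case "$name" in
        *.jpg|*.jpeg|*.png) echo "./ws-vision" ;;
        *.py|*.js)          echo "./ws-code" ;;
        *)                  echo "./ws-fast" ;;
    esac
}
```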
Common Routing Patterns
By file extension
Route images to vision models and code to code models:

```bash
case "$file" in
    *.jpg|*.png|*.gif) ws="./ws-vision" ;;
    *.py|*.js|*.cpp)   ws="./ws-code" ;;
    *)                 ws="./ws-default" ;;
esac
```
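Extensions can be missing or misleading; as an alternative, file content can be sniffed with `file --mime-type`. The workspace names below follow the examples above, and the MIME patterns are approximate, since different `file` builds report slightly different type strings:

```bash
# Route by content type instead of extension, using file(1).
# MIME strings vary by file(1) version; verify what yours reports.
route_by_mime() {
    local mime
    mime=$(file --brief --mime-type "$1")
    case "$mime" in
        image/*)                                 echo "./ws-vision" ;;
        text/x-python|text/x-script.python)      echo "./ws-code" ;;
        *)                                       echo "./ws-default" ;;
    esac
}
```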
By task complexity
Use fast models for simple tasks and larger models for complex ones:

```bash
if [ ${#prompt} -lt 100 ]; then
    ws="./ws-fast"
else
    ws="./ws-large"
fi
```
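The character-count check can be wrapped in a small helper with the cutoff as a parameter; the 100-character threshold above is arbitrary and worth tuning per model:

```bash
# Sketch: route on prompt length, with a tunable threshold (default 100).
route_by_size() {
    local prompt=$1 threshold=${2:-100}
    if [ "${#prompt}" -lt "$threshold" ]; then
        echo "./ws-fast"
    else
        echo "./ws-large"
    fi
}
```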
By latency requirements
Route real-time requests to fast models:

```bash
if [ "$realtime" = "true" ]; then
    ws="./ws-fast"
else
    ws="./ws-quality"
fi
```
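These signals compose. The precedence below (explicit realtime flag first, then file type, then prompt size) is one possible policy, not the only one; the workspace names are taken from the examples above:

```bash
# One possible routing policy combining the three patterns.
# Precedence: realtime flag > file type > prompt size.
route() {
    local realtime=$1 file=$2 prompt=$3
    if [ "$realtime" = "true" ]; then
        echo "./ws-fast"; return
    fi
    case "$file" in
        *.jpg|*.png|*.gif) echo "./ws-vision"; return ;;
        *.py|*.js|*.cpp)   echo "./ws-code"; return ;;
    esac
    if [ "${#prompt}" -lt 100 ]; then
        echo "./ws-fast"
    else
        echo "./ws-quality"
    fi
}
```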
Vision Model Routing
Vision jobs are automatically serialized (a mutex prevents parallel jobs from corrupting the model state):
```bash
# Submit multiple vision jobs
for img in *.jpg; do
    wrk ./ws-vision "Describe this image" --image "$img"
done
```
Jobs queue up and process sequentially.
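One bash gotcha in a loop like the one above: if no `*.jpg` files match, the loop body runs once with the literal string `*.jpg`. Setting `nullglob` avoids that. The sketch below prints what it would submit instead of calling `wrk`, so it is safe to dry-run:

```bash
#!/usr/bin/env bash
# With nullglob set, an unmatched glob expands to nothing instead of
# the literal pattern, so the loop never sees a bogus "*.jpg" filename.
shopt -s nullglob

submit_images() {
    local dir=$1 count=0
    for img in "$dir"/*.jpg; do
        # In real use: wrk ./ws-vision "Describe this image" --image "$img"
        echo "would submit: $img"
        count=$((count + 1))
    done
    echo "$count"
}
```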
Load Balancing
Round-robin across multiple instances:
```bash
# Start multiple workers for the same model
nrvnad model.gguf ./ws-1 &
nrvnad model.gguf ./ws-2 &
nrvnad model.gguf ./ws-3 &

# Round-robin across the pool
count=0
workspaces=(./ws-1 ./ws-2 ./ws-3)
for task in "${tasks[@]}"; do
    ws="${workspaces[$((count % ${#workspaces[@]}))]}"
    wrk "$ws" "$task"
    ((count++))
done
```
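Round-robin ignores queue depth. A least-loaded variant can pick the workspace with the fewest pending files; the `input` subdirectory here is an assumption about the workspace layout, so adjust the path to whatever your daemons actually watch:

```bash
# Pick the workspace with the fewest queued files.
# The "input" subdirectory is an assumed layout; adjust to match yours.
least_loaded() {
    local best="" best_count=-1 ws count
    for ws in "$@"; do
        count=$(find "$ws/input" -maxdepth 1 -type f 2>/dev/null | wc -l)
        if [ "$best_count" -lt 0 ] || [ "$count" -lt "$best_count" ]; then
            best=$ws
            best_count=$count
        fi
    done
    echo "$best"
}
```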
Model Configuration
Tune each daemon for its workload:
```bash
# Vision: large context, fewer workers (jobs are serialized anyway)
NRVNA_MAX_CTX=16384 NRVNA_WORKERS=2 nrvnad qwen-vl.gguf ./ws-vision &

# Code: moderate context, more workers
NRVNA_MAX_CTX=8192 NRVNA_WORKERS=6 nrvnad codellama.gguf ./ws-code &

# Fast: small context, max workers
NRVNA_MAX_CTX=4096 NRVNA_WORKERS=8 nrvnad phi-3-mini.gguf ./ws-fast &
```
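The launch lines can also be driven from a small table, which keeps the tuning in one place. This sketch only prints the commands via `echo` rather than starting daemons, so the output is a plan you can review before running:

```bash
# Declarative launcher: one line per daemon (model, workspace, ctx, workers).
# Swap the echo for a real invocation once the plan looks right.
launch_plan() {
    while read -r model ws ctx workers; do
        echo "NRVNA_MAX_CTX=$ctx NRVNA_WORKERS=$workers nrvnad $model $ws &"
    done <<'EOF'
qwen-vl.gguf ./ws-vision 16384 2
codellama.gguf ./ws-code 8192 6
phi-3-mini.gguf ./ws-fast 4096 8
EOF
}
```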
Each workspace is independent. Monitor each one separately, e.g. count a workspace's finished outputs with `ls ./ws-vision/output | wc -l`.
Tips
- Workspace isolation — each daemon owns its workspace
- GPU sharing — limit `NRVNA_GPU_LAYERS` per daemon to prevent OOM
- Model search path — use `NRVNA_MODELS_DIR` to organize models
- Atomic routing — workspace choice is the routing decision