nrvna-ai uses a multi-threaded architecture to process multiple LLM inference jobs concurrently. The design separates job discovery, queue management, and execution into distinct threads with minimal synchronization overhead.
The threading model leverages atomic filesystem operations instead of mutexes wherever possible, reducing lock contention and complexity.
```cpp
void Server::scanLoop() {
    setThreadName("Scanner");
    while (!shutdown_.load()) {
        auto jobs = scanner_->scan();
        for (const auto& job_id : jobs) {
            pool_->submit(job_id);
            LOG_DEBUG("Submitted job to pool: " + job_id);
        }
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
```
The scanner may submit the same job multiple times if it is still in ready/ during the next scan. Workers handle this by attempting an atomic rename; only one worker's rename succeeds.
Responsibilities:

- Wait on a condition variable for jobs in the queue
- Pop a job ID from the thread-safe queue
- Call the Processor to execute the job
- Each worker has a dedicated llama.cpp Runner instance
- Process jobs until the shutdown signal
- Thread names: Worker-0, Worker-1, …, Worker-N
```cpp
void Pool::workerLoop(int worker_id) {
    setThreadName("Worker-" + std::to_string(worker_id));
    while (!shutdown_.load()) {
        std::unique_lock<std::mutex> lock(queueMutex_);
        // Wait for a job or shutdown
        condition_.wait(lock, [this] {
            return !jobs_.empty() || shutdown_.load();
        });
        if (shutdown_.load()) break;
        auto job_id = jobs_.front();
        jobs_.pop();
        lock.unlock();
        // Process job (outside the lock)
        processor_(job_id, worker_id);
    }
}
```
The only mutex in the system protects the job queue in Pool:
```cpp
class Pool {
private:
    std::queue<JobId> jobs_;              // Job queue
    std::mutex queueMutex_;               // Protects jobs_
    std::condition_variable condition_;   // Notifies workers
};
```
Operations:
```cpp
void Pool::submit(const JobId& job_id) {
    std::lock_guard<std::mutex> lock(queueMutex_);
    jobs_.push(job_id);
    condition_.notify_one();  // Wake up one worker
}
```
The mutex is held only during queue manipulation. Actual job processing happens outside the lock, so workers don’t block each other.
Simple boolean flags use std::atomic for lock-free reads/writes:
```cpp
class Server {
private:
    std::atomic<bool> running_{false};    // Server state
    std::atomic<bool> shutdown_{false};   // Shutdown signal
};

// Check running state (no lock needed)
bool Server::isRunning() const noexcept {
    return running_.load();
}

// Signal shutdown (no lock needed)
void Server::shutdown() noexcept {
    shutdown_.store(true);
}
```
```cpp
void Server::shutdown() noexcept {
    LOG_INFO("Shutting down server...");
    shutdown_.store(true);     // Signal all threads
    condition_.notify_all();   // Wake up all workers
}
```
3. Scanner thread exits
```cpp
while (!shutdown_.load()) {  // Loop exits
    // ...
}
LOG_INFO("Scanner thread exiting");
```
4. Worker threads finish current jobs
```cpp
void Pool::workerLoop(int worker_id) {
    while (!shutdown_.load()) {
        // ... process job ...
    }
    LOG_INFO("Worker-" + std::to_string(worker_id) + " exiting");
}
```
5. Server destructor joins threads
```cpp
Server::~Server() {
    if (scannerThread_.joinable()) {
        scannerThread_.join();
    }
    // Pool destructor joins worker threads
}
```
In-progress jobs are interrupted. Jobs left in processing/ are moved back to ready/ on the next server start.