The Sovereign AI principle
All inference is local. GenieHelper runs on an IONOS dedicated server (Ubuntu 24, 16 GB RAM, 1 TB NVMe). Ollama hosts three models on-device. The only cloud LLM dependency is the optional PowerAdmin bypass for admin accounts — and that is explicitly opt-in.
Three Ollama models
Each model has a specific role. They are not interchangeable.
dolphin3:8b-llama3.1-q4_K_M
Orchestrator and tool planner. Handles multi-step reasoning, ACTION tag emission, and flow decomposition. Chosen for its instruction-following reliability on tool-use tasks.
dolphin-mistral:7b
Uncensored content writer. Drafts captions, fan messages, post concepts, and custom request responses. No corporate content policy. Runs in the creator’s voice.
qwen-2.5:latest
Primary AnythingLLM agent. Handles code generation, structured JSON output, and the main chat agent workspace. Default model across MCP tool calls.
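The role split above can be sketched as a simple registry plus a request builder for Ollama's `/api/chat` endpoint. The registry keys and helper names here are illustrative, not GenieHelper's actual API; only the model tags come from this document.

```python
# Hypothetical role -> model registry; tags are the three models listed above.
OLLAMA_MODELS = {
    "orchestrator": "dolphin3:8b-llama3.1-q4_K_M",  # tool planning, ACTION tags
    "writer": "dolphin-mistral:7b",                 # uncensored content drafts
    "agent": "qwen-2.5:latest",                     # code gen, JSON, MCP calls
}

def chat_payload(role: str, prompt: str) -> dict:
    """Build a request body for Ollama's /api/chat endpoint."""
    return {
        "model": OLLAMA_MODELS[role],
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
```

Keeping the mapping in one place makes the "not interchangeable" rule enforceable: callers pick a role, never a model tag.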
Resource profile
| Metric | Value |
|---|---|
| RAM pinned at steady state | ~4.8 GB |
| Qwen 2.5 7B inference latency (CPU) | ~2–5 s/call |
| RAM ceiling (full server) | 16 GB |
| BullMQ queue concurrency | 1 (prevents RAM exhaustion) |
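The queue itself is BullMQ on Node; as a language-agnostic sketch, the effect of concurrency 1 is that a single worker drains jobs sequentially, so at most one inference job (and its model context) is resident in RAM at a time. The job names and stand-in work function below are illustrative.

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()
results = []

def worker():
    # Concurrency 1: one worker drains the queue, so only one
    # inference job is in flight at any time.
    while True:
        prompt = jobs.get()
        if prompt is None:
            break
        results.append(f"done:{prompt}")  # stand-in for an Ollama call
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()
for p in ["caption", "fan-reply", "post-idea"]:
    jobs.put(p)
jobs.put(None)  # sentinel: stop the worker
t.join()
```

With a 16 GB ceiling and ~4.8 GB pinned at steady state, serialising jobs this way is what keeps a burst of requests from stacking multiple model contexts into memory.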
PowerAdmin Claude bypass
Admin accounts (`admin_access: true`) route through the Anthropic Claude API instead of Ollama. This is the only cloud LLM path in the system.
Set `ANTHROPIC_API_KEY` in `server/.env` to enable the bypass. Regular creator accounts always use Ollama, regardless of whether this key is present.
The PowerAdmin bypass exists for development and support workflows, not for production creator sessions. Creator data never touches the Claude API.
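The gate has two conditions, both of which must hold: the account flag and the configured key. A minimal sketch of that rule (the function name is hypothetical; the flag and env var names are from this document):

```python
import os

def pick_provider(account: dict) -> str:
    """Route admin accounts to Claude only when the key is configured;
    creator accounts always stay on local Ollama."""
    if account.get("admin_access") and os.environ.get("ANTHROPIC_API_KEY"):
        return "claude"
    return "ollama"
```

Because the creator branch never inspects the key, deploying with `ANTHROPIC_API_KEY` set cannot leak creator traffic to the cloud path.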
JIT skill hydration
GenieHelper maintains 191 procedural skills across 11 categories in a DuckDB skill graph (memory/core/agent_memory.duckdb). Skills are not preloaded into every session — that would exhaust the context window.
Instead, at the start of each session, surgical_context.py runs stimulus propagation across the graph and surfaces only the top-N skills relevant to the current task.
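A minimal spreading-activation sketch of that step follows. The real graph lives in DuckDB and surgical_context.py's scoring will differ; the skill names, edge weights, and decay factor here are invented for illustration.

```python
# Illustrative skill graph: skill -> weighted neighbours.
GRAPH = {
    "post_caption": {"hashtag_picker": 0.8, "tone_match": 0.6},
    "hashtag_picker": {"trend_lookup": 0.5},
    "tone_match": {},
    "trend_lookup": {},
    "tax_report": {},  # unrelated skill: should not surface
}

def top_n_skills(stimuli: dict, n: int = 3, decay: float = 0.5) -> list:
    """One hop of stimulus propagation, then rank skills by activation."""
    activation = dict(stimuli)
    for node, energy in stimuli.items():
        for nbr, weight in GRAPH.get(node, {}).items():
            activation[nbr] = activation.get(nbr, 0.0) + energy * weight * decay
    ranked = sorted(activation.items(), key=lambda kv: -kv[1])
    return [skill for skill, _ in ranked[:n]]
```

Stimulating `post_caption` surfaces its caption-adjacent neighbours while `tax_report` never enters the context, which is the point: only task-relevant skills consume context-window budget.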
Model selection flow
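A hedged sketch of the end-to-end decision, combining the admin bypass with the per-role split. The task labels are illustrative; the model tags and the bypass rule are from this document.

```python
def select_model(account: dict, task: str, have_claude_key: bool) -> str:
    # Admin bypass first: the only cloud path, and explicitly opt-in.
    if account.get("admin_access") and have_claude_key:
        return "claude-api"
    # Local Ollama routing by task type (labels are illustrative).
    if task in ("plan", "tool_use"):
        return "dolphin3:8b-llama3.1-q4_K_M"   # orchestrator
    if task in ("caption", "fan_message", "custom_request"):
        return "dolphin-mistral:7b"            # content writer
    return "qwen-2.5:latest"                   # default agent: code, JSON, MCP
```

Qwen is the fall-through because it is the default across MCP tool calls and the main chat workspace; the other two models are only selected for their named roles.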
Subsystem guides
- **Local inference**: Ollama installation, model management, context window budgeting, and JIT skill hydration in detail.
- **Action Runner**: how the agent emits ACTION tags and how the deterministic execution layer intercepts and runs them.
- **MCP tools**: the unified genie-mcp-server, 83 tools across 6 plugin namespaces.
- **Memory & retrieval**: HyDE, RRF, synaptic propagation, Shannon entropy gating, and nightly Hebbian consolidation.