GenieHelper is built entirely on open-source primitives running on a single dedicated server. There are no cloud LLM dependencies, no managed databases, and no paid CDNs. Every layer described below runs on the same IONOS machine.

LLM inference

Ollama (local)

All inference runs locally via Ollama. Inference is CPU-bound, with approximately 4.8 GB of RAM pinned for the active model. No requests leave the server.

Models

Three models are available depending on task type. Model selection is per-workspace and per-user-tier.
| Model | Quantization | Role |
| --- | --- | --- |
| dolphin3:8b-llama3.1-q4_K_M | Q4_K_M | Orchestrator, tool planning, ACTION tag emission |
| dolphin-mistral:7b | default | Uncensored content writer — captions, fan messages, post drafts |
| qwen2.5:latest | default | Primary AnythingLLM agent workspace model — code, JSON, structured tasks |
CPU inference on Qwen 2.5 7B runs at approximately 2–5 seconds per call with ~4.8 GB RAM pinned. The poweradmin role bypasses Ollama entirely and routes to the Anthropic Claude API when ANTHROPIC_API_KEY is set in server/.env.
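A hedged sketch of this per-role routing logic (the function shape and role keys are illustrative; only the model names and the ANTHROPIC_API_KEY switch come from the description above):

```javascript
// Illustrative model-routing sketch. MODELS keys and resolveBackend are
// hypothetical; the three model names and the poweradmin/Anthropic bypass
// are from the docs.
const MODELS = {
  orchestrator: "dolphin3:8b-llama3.1-q4_K_M",
  writer: "dolphin-mistral:7b",
  agent: "qwen2.5:latest",
};

function resolveBackend(role, env = process.env) {
  // poweradmin bypasses Ollama entirely when an Anthropic key is configured
  if (role === "poweradmin" && env.ANTHROPIC_API_KEY) {
    return { backend: "anthropic", model: "claude" };
  }
  // everyone else stays on local Ollama; unknown roles fall back to the agent model
  return { backend: "ollama", model: MODELS[role] ?? MODELS.agent };
}
```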

Agent orchestration

AnythingLLM (fork) — server/ — port 3001 — pm2: anything-llm

The GenieHelper fork of AnythingLLM provides the multi-user agent runtime. Key modifications from upstream:
  • Per-workspace isolation — every creator gets their own AnythingLLM workspace provisioned at registration time via workspaceProvisioner.js. User-generated and scraped content is injected only into the creator’s own workspace; global/onboarding docs go to the administrator workspace.
  • MCP auto-boot — server/utils/boot/index.js is patched to call bootMCPServers() on startup, which spawns genie-mcp-server as a stdio child process.
  • Unified auth — the upstream validatedRequest.js is patched to delegate to unifiedAuth.js, accepting Directus JWTs across all endpoints (both custom GenieHelper routes and AnythingLLM native routes).
  • System prompt hydration — workspaceProvisioner.js fetches the prime_directive from the system_config Directus collection (5-minute TTL cache) and injects it as the workspace system prompt.
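The 5-minute TTL cache pattern used for prompt hydration can be sketched like this (makeTTLCache and its wiring are illustrative, not the actual workspaceProvisioner.js code):

```javascript
// Minimal TTL-cache sketch around an async fetcher (e.g. a Directus read).
// The `now` parameter is injectable so the cache is testable.
function makeTTLCache(fetcher, ttlMs = 5 * 60 * 1000, now = Date.now) {
  let value;
  let expiresAt = 0;
  return async function get() {
    if (now() < expiresAt) return value; // fresh: serve the cached copy
    value = await fetcher();             // stale: re-fetch from the source
    expiresAt = now() + ttlMs;
    return value;
  };
}

// Hypothetical usage: fetchPrimeDirective() stands in for the Directus call.
// const getPrimeDirective = makeTTLCache(fetchPrimeDirective);
```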

Unified MCP server

83 tools, 6 plugins — scripts/mcp/genie-mcp-server.mjs — stdio transport

All agent tools are served by a single MCP server process. No per-tool process overhead, no duplicated auth. One plugin loader, six namespaced tool bundles loaded from storage/plugins/.
genie-mcp-server
├── cms.directus    (55 tools) — full Directus 11 API: CRUD, users, files, flows, fields,
│                               relations, roles, policies, permissions, settings, schema,
│                               activity, revisions
├── ai.ollama       (3 tools)  — generate, chat, list-models
├── web.stagehand   (9 tools)  — browser sessions, navigate, act, extract, observe,
│                               cookies, screenshot
├── media.process   (5 tools)  — validate (ffprobe), watermark, clip, thumbnail, job-status
├── taxonomy.core   (7 tools)  — search, tag-content, map-term, ingest-source,
│                               rebuild-graph, prune, strengthen
└── memory.recall   (4 tools)  — activate_skills, get_skill, find_skills, get_correlations
All Directus writes from server-side business logic must go through cms.directus MCP tools. Direct fetch() calls to Directus from endpoint handlers are not permitted. The only exceptions are register.js and rbacSync.js, which are pre-auth flows that require DIRECTUS_ADMIN_TOKEN.
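A toy sketch of the single-loader idea: six bundles registered under their namespaces by one process (the bundle format here is assumed, not the actual storage/plugins/ layout):

```javascript
// Hypothetical plugin-loader sketch: each bundle exposes named tool handlers,
// and the loader registers them under "namespace.toolName" in one registry.
function registerPlugins(bundles) {
  const tools = new Map();
  for (const [namespace, bundle] of Object.entries(bundles)) {
    for (const [name, handler] of Object.entries(bundle)) {
      tools.set(`${namespace}.${name}`, handler); // e.g. "ai.ollama.generate"
    }
  }
  return tools;
}
```

One registry means one auth check and one process, which is the stated motivation for the unified server.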

CMS and data layer

Directus 11 — cms/ — port 8055 — pm2: agentx-cms

Directus is the primary application data layer. All creator-owned collections are protected by item-level row filters (user_id=$CURRENT_USER) enforced at the Directus policy layer — data isolation does not depend on application logic.
| Aspect | Detail |
| --- | --- |
| SDK pattern | Composable client: createDirectus() + rest() + authentication() |
| Auth | JWT issued by Directus; validated in unifiedAuth.js |
| Admin token | DIRECTUS_ADMIN_TOKEN — used only by register.js and rbacSync.js |
| Service token | MCP_SERVICE_TOKEN — scoped token used by all other server-side reads/writes |
| Flows limitation | The request operation in Directus Flows returns {} — a known upstream bug. All flow-triggered operations are routed through the MCP server instead |

Primary database

PostgreSQL 16 — port 5432 — Docker container — database: impactgenie

Three PostgreSQL extensions are in active use:

pgvector

Stores and queries dense embedding vectors for RAG. Used by AnythingLLM workspace vector stores and the taxonomy graph backbone.
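For intuition, this is the ranking that pgvector's cosine-distance operator (`<=>`) performs, reimplemented over a tiny in-memory set. This is illustrative only; real queries run as SQL, along the lines of `ORDER BY embedding <=> $1 LIMIT n`:

```javascript
// Cosine distance between two equal-length vectors: 1 - cosine similarity.
function cosineDistance(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return 1 - dot / Math.sqrt(na * nb);
}

// Rank stored rows by distance to a query vector and keep the top n,
// mirroring what an ORDER BY ... LIMIT n vector query returns.
function nearest(query, rows, n = 3) {
  return rows
    .map((r) => ({ ...r, dist: cosineDistance(query, r.embedding) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, n);
}
```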

AGE (Apache Graph Extension)

Graph traversal via Cypher queries. Powers the knowledge compression architecture — 17 GB flat data compressed into ~500 MB graph representation.

supabase_vault

Encrypted credential storage. Platform credentials (OnlyFans session tokens, Fansly cookies, etc.) are encrypted with AES-256-GCM before being written to the vault, and decrypted on read via credentialsCrypto.js.

Redis

Port 6379, Docker container. Backing store for BullMQ job queues and session cache.

Job queue

BullMQ + Redis — media-worker/ — pm2: media-worker

concurrency:1 is a hard constraint, not a preference — it prevents RAM exhaustion on the 16 GB ceiling.
| Queue | Jobs | Notes |
| --- | --- | --- |
| media-jobs | convert_image, resize_image, crop_image, resize_video, crop_video, compress_video, download_video, ytdlp_info, trim_videos, join_videos, apply_watermark, create_teaser, strip_metadata, apply_steganographic_watermark | FFmpeg/ImageMagick operations |
| scrape-jobs | scrape_profile, publish_post, scrape_post_performance | Browser automation via Stagehand |
| onboarding-jobs | onboarding:* (5 handlers) | LLM-heavy onboarding pipeline operations |
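Inside the worker, each job name resolves to a handler through an OPERATIONS map. A minimal sketch of that dispatch (the handler bodies here are placeholders, not the real operations/ modules):

```javascript
// Placeholder handlers standing in for the real operations/ modules.
const OPERATIONS = {
  convert_image: async (payload) => ({ op: "convert_image", ...payload }),
  resize_video: async (payload) => ({ op: "resize_video", ...payload }),
};

// Worker processor: look the job name up in the map, fail loudly on unknowns.
async function processJob(job) {
  const handler = OPERATIONS[job.name];
  if (!handler) throw new Error(`Unknown operation: ${job.name}`);
  return handler(job.data);
}
```

With BullMQ this function would be passed to a `Worker` constructed with `{ concurrency: 1 }`, enforcing the single-job constraint described above.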
The media worker was refactored on 2026-03-15: index.js (previously 2000+ lines) was split into operations/ modules — helpers.js, media.js, scrape.js, publish.js, onboarding.js. The index.js entry point is now ~310 lines covering only the OPERATIONS map, worker registration, and schedulers.

Schedulers running inside the media worker process:
| Scheduler | Interval | Purpose |
| --- | --- | --- |
| post_scheduler | 60 seconds | Polls scheduled_posts for due items and enqueues publish jobs |
| scrape_scheduler | 6 hours | Triggers profile and stat scrapes for all connected platforms |
| job_monitor | 5 minutes | Detects stalled jobs and triggers retry or failure notification |
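One post_scheduler tick might look like the following sketch. The store and queue interfaces are hypothetical stand-ins for the Directus and BullMQ clients; only the 60-second polling behavior and the publish-job enqueueing come from the table above:

```javascript
// One polling tick: find due scheduled posts, enqueue a publish job for each,
// and mark them so the next tick does not re-enqueue them.
async function postSchedulerTick(store, queue, now = Date.now()) {
  const due = await store.findDuePosts(now); // scheduled_posts with publish time <= now
  for (const post of due) {
    await queue.add("publish_post", { postId: post.id });
    await store.markEnqueued(post.id);
  }
  return due.length;
}

// In production this would run on a 60-second setInterval inside the worker.
```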

Browser automation

Stagehand (Playwright fork) — stagehand/ — port 3002 — pm2: stagehand-server

Stagehand wraps Playwright with an AI-assisted extraction layer. It handles platform scraping for OnlyFans, Fansly, Instagram, TikTok, and other platforms that block standard headless Chrome.

HITL escalation — when Stagehand detects a login wall, CAPTCHA, or 2FA challenge it cannot resolve automatically, it writes a record to hitl_action_queue and emits a scrape_alert SSE event. The React SPA transitions the center stage to scrape_alert mode, prompting the creator to complete the action in the browser extension. Execution resumes when the HITL session is resolved.

Available tools via the web.stagehand MCP plugin: browser_session_create, browser_navigate, browser_act, browser_extract, browser_observe, browser_screenshot, browser_cookies_get, browser_cookies_set, browser_session_close.

Frontend

React 18 + Vite + Tailwind CSS — dashboard/ — port 3100 — pm2: genie-dashboard

The dashboard SPA is a static build served by serve dashboard/dist/. The agent-driven Stage architecture uses StageContext (a React reducer) to control layout — the AI opens panels, not the user.
| Aspect | Detail |
| --- | --- |
| Build tool | Vite |
| Styling | Tailwind CSS (utility-first) |
| Router | React Router v6 |
| Auth | Directus JWT, auto-refresh, invite-gated registration |
| Agent widget | Custom AgentWidget component (replaced the upstream AnythingLLM embed widget) |
| State | StageContext.jsx — reducer-driven UI layout |
| Real-time | sseClient.js + realtimeEventRouter.js — SSE events drive stage transitions |
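The SSE-driven stage transitions can be sketched as an event router that maps incoming events onto reducer actions. The action shapes below are assumptions, not the actual realtimeEventRouter.js contract:

```javascript
// Hypothetical router: translate a server-sent event into a StageContext
// dispatch. scrape_alert (a documented event) takes over the center stage;
// anything else is forwarded generically.
function routeRealtimeEvent(event, dispatch) {
  switch (event.type) {
    case "scrape_alert":
      // HITL escalation from Stagehand: switch the center stage mode
      dispatch({ type: "SET_STAGE_MODE", mode: "scrape_alert", payload: event.data });
      break;
    default:
      dispatch({ type: "REALTIME_EVENT", event });
  }
}
```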
Route map:
| Visibility | Routes |
| --- | --- |
| Public | /, /pricing, /about, /register, /login |
| Authenticated | /app/dashboard, /app/media, /app/calendar, /app/fans, /app/analytics, /app/platforms, /app/settings |
| Admin | /admin, /view-as |

Skill graph

DuckDB — memory/core/agent_memory.duckdb
| Metric | Value |
| --- | --- |
| Total skills | 191 |
| Graph nodes | 252 |
| Graph edges | 12,880+ |
| Categories | 11 |
Skills are not preloaded into the agent context window. At session start, surgical_context.py activate "<task>" runs stimulus propagation across the DuckDB graph and surfaces only the top-N skills relevant to the current task. This is called JIT (just-in-time) skill hydration. Skill operations are exposed via the memory.recall MCP plugin: activate_skills, get_skill, find_skills, get_correlations.
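A toy sketch of the stimulus-propagation idea: seed nodes receive activation, activation spreads along weighted edges, and only the top-N most activated skills are surfaced. This illustrates the concept behind surgical_context.py, not its actual algorithm:

```javascript
// Toy spreading activation over a skill graph. Seeds start at 1.0; each edge
// from an activated node passes weighted activation to its target. The top-N
// activated skills are returned for context injection.
function activateSkills(seeds, edges, topN = 3) {
  const activation = new Map();
  for (const s of seeds) activation.set(s, 1.0);
  for (const { from, to, weight } of edges) {
    if (activation.has(from)) {
      activation.set(to, (activation.get(to) ?? 0) + activation.get(from) * weight);
    }
  }
  return [...activation.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([skill]) => skill);
}
```

The point of the technique is the same either way: the context window receives only the few skills relevant to the current stimulus, not all 191.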

Taxonomy graph

JSON (pgvector-backed) — Nodes/Universe/
| Metric | Value |
| --- | --- |
| Total nodes | 3,205 |
| Super-concepts | 4 |
| Backing store | pgvector embeddings in PostgreSQL |
The taxonomy is the creator-content classification system — a proprietary adult content ontology. Every piece of content, every fan interaction, and every content idea is tagged against this graph. Nodes are stored as JSON files in Nodes/Universe/ and their embeddings live in pgvector. Nightly Hebbian consolidation runs node-decay.mjs — recently activated nodes strengthen, dormant nodes decay. Cross-user FP-Growth pattern mining (in progress in memory/consolidation/cross_user/) will promote creator-specific patterns to the universal graph when they appear across multiple creator profiles.
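The nightly Hebbian pass can be sketched as a single update rule. The constants below are illustrative, not the actual node-decay.mjs parameters: activated nodes strengthen toward a cap of 1, dormant nodes decay multiplicatively toward 0:

```javascript
// One consolidation pass over taxonomy nodes. `activatedIds` is the set of
// nodes that fired since the last run; strengthen/decay rates are invented
// for illustration.
function consolidate(nodes, activatedIds, { strengthen = 0.1, decay = 0.95 } = {}) {
  return nodes.map((n) =>
    activatedIds.has(n.id)
      ? { ...n, weight: Math.min(1, n.weight + strengthen) } // recently used: reinforce
      : { ...n, weight: n.weight * decay }                   // dormant: fade
  );
}
```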

Memory subsystem

Python + DuckDB — memory/

The memory subsystem is the core differentiator from standard RAG platforms. It implements four retrieval techniques that run in sequence:
| Technique | Location | What it does |
| --- | --- | --- |
| HyDE | memory/retrieval/ | Generates a hypothetical ideal answer and embeds that instead of the raw query — improves retrieval precision for semantically distant queries |
| RRF | memory/retrieval/rrf/ | Fuses BM25 sparse term matching with dense pgvector similarity using Reciprocal Rank Fusion |
| Synaptic propagation | memory/retrieval/synaptic/ | Retrieved seed nodes activate connected taxonomy nodes via a Leaky Integrate-and-Fire neuron model; graph edges strengthen with use (Hebbian reinforcement) |
| Shannon entropy gating | memory/retrieval/entropy/ | Before context injection, nodes are ranked by information entropy; high-redundancy nodes are evicted to maximize signal per token |
CRAG (Corrective RAG) grades retrieved context for relevance before injection. Low-confidence retrievals trigger web search fallback or escalation to HITL rather than hallucinating.
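The RRF step is simple enough to sketch in full. Given two ranked result lists (sparse BM25 and dense vector), each document scores 1/(k + rank) per list, with the conventional k = 60; the real implementation lives in memory/retrieval/rrf/ and may differ in detail:

```javascript
// Reciprocal Rank Fusion over two ranked lists of document IDs.
// A document present in both lists accumulates score from each.
function rrfFuse(sparseRanking, denseRanking, k = 60) {
  const scores = new Map();
  for (const ranking of [sparseRanking, denseRanking]) {
    ranking.forEach((docId, idx) => {
      const rank = idx + 1; // 1-based rank
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```

Documents ranked well by both retrievers rise to the top even if neither retriever ranked them first, which is exactly why RRF is used to fuse sparse and dense signals.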

Credential encryption

AES-256-GCM — server/utils/credentialsCrypto.js

Platform credentials (session tokens, cookies, API keys for OnlyFans, Fansly, etc.) are encrypted per-user before storage and decrypted on read. Encryption is performed in the application layer before the data reaches PostgreSQL.
```javascript
// Encrypt before storage
const encrypted = encryptJSON(plainCredentials, process.env.CREDENTIALS_ENC_KEY_B64);

// Decrypt on retrieval
const plain = decryptJSON(encrypted, process.env.CREDENTIALS_ENC_KEY_B64);
```
The key is a base64-encoded 256-bit value stored in server/.env as CREDENTIALS_ENC_KEY_B64. Losing this key means losing access to all stored credentials — back it up.

Infrastructure

| Aspect | Detail |
| --- | --- |
| Provider | IONOS dedicated server |
| OS | Ubuntu 24 |
| RAM | 16 GB (hard ceiling — all concurrency and memory allocation decisions must respect this) |
| Storage | 1 TB NVMe |
| Process manager | PM2 |
| Web server | Nginx via Plesk |
| Containers | Docker Compose for Directus + PostgreSQL + Redis |
| LLM runtime | Ollama (systemd service, not PM2) on port 11434 |
The 16 GB RAM ceiling is not a soft guideline. It directly determines BullMQ concurrency:1, the absence of a GPU (CPU inference only), and the decision to run all services on a single machine rather than splitting into microservices. Any architecture change that increases peak RAM usage must be evaluated against this ceiling before implementation.
