POST /api/genie/stream-chat
Sends a message to the Genie agent and receives a real-time SSE response stream. The endpoint proxies through the AnythingLLM workspace stream-chat API, which runs the full agent pipeline including all MCP tools (Directus, Ollama, Stagehand, media processing, taxonomy).
Admin users with admin_access=true are routed to the Claude API (Anthropic) instead of the local Ollama stack. This is the PowerAdmin bypass, intended for platform administrators rather than regular creators.
Request
Body
message
The message to send to the agent. Cannot be empty or whitespace-only.
sessionId
Optional session identifier for conversation continuity. If omitted, the server generates one scoped to the authenticated user: genie-{userId}. Providing the same sessionId across requests preserves conversation history in the AnythingLLM workspace thread.
Response
The response is an SSE stream (Content-Type: text/event-stream). The connection stays open until the full response is delivered, then closes. Each SSE event is a JSON-encoded payload on a data: line.
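The stream can be consumed with a standard fetch + reader loop. Below is a minimal sketch of the `data:`-line parsing, assuming each event's JSON arrives on a single `data:` line (a production client would buffer chunks that split an event across reads):

```typescript
// Minimal sketch of parsing a raw SSE text chunk from stream-chat.
// Each event is a JSON-encoded payload on a `data:` line.
type GenieEvent = { type: string; [key: string]: unknown };

function parseSseChunk(chunk: string): GenieEvent[] {
  const events: GenieEvent[] = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blanks and comments
    const payload = trimmed.slice("data:".length).trim();
    try {
      events.push(JSON.parse(payload) as GenieEvent);
    } catch {
      // Partial JSON (event split across chunks) would need buffering; skipped here.
    }
  }
  return events;
}
```

A real client would call this inside a `ReadableStream` read loop on the fetch response body and stop once a `finalizeResponseStream` event arrives.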
Event types
| type field | When emitted | Purpose |
|---|---|---|
| textResponseChunk | During streaming | Incremental text from the LLM |
| textResponse | Single-shot (gate blocked) | Full response when onboarding is incomplete |
| stage_update | During agent reasoning | UI stage state update (see below) |
| finalizeResponseStream | End of stream | Signals the client to close the connection |
textResponseChunk
Emitted once per token or chunk as the LLM generates text.
finalizeResponseStream
Always the last event. The client should close the connection after receiving this.
stage_update
Stage update events are emitted when the agent transitions between reasoning stages. They drive the Stage UI (GenieHelperStageLayout) in the dashboard.
Human-readable label for the current agent task, rendered in the top status rail.
Agent mood state. Affects the visual tone of the Stage UI. Values include focused, thinking, idle, error.
Which content module to display in the Stage left panel.
Which content module to display in the Stage right panel.
The active skill or tool the agent is currently invoking.
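Putting the event types together, a client-side dispatcher might look like the following sketch. The `textResponse` payload field and the handler names are illustrative assumptions, not confirmed by this reference:

```typescript
// Sketch of a client-side dispatcher over the four documented event types.
type GenieEvent = { type: string; [key: string]: unknown };

interface StreamHandlers {
  onText(chunk: string): void;       // textResponseChunk / textResponse
  onStage(update: GenieEvent): void; // stage_update -> drives the Stage UI
  onDone(): void;                    // finalizeResponseStream -> close connection
}

function handleGenieEvent(ev: GenieEvent, h: StreamHandlers): void {
  switch (ev.type) {
    case "textResponseChunk":
    case "textResponse":
      // Assumed payload field name; adjust to the actual event shape.
      h.onText(String(ev.textResponse ?? ""));
      break;
    case "stage_update":
      h.onStage(ev);
      break;
    case "finalizeResponseStream":
      h.onDone();
      break;
  }
}
```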
Content gate
Before the agent runs, the endpoint checks the user’s onboarding state via nodeRag.getOnboardingState(). If onboarding is not complete, the stream returns a single blocked message instead of forwarding to the workspace.
The gate message depends on the phase in the user’s onboarding record:
| Phase | Gate message |
|---|---|
| EXTENSION_INSTALL, DATA_COLLECTION | Directs user to the Setup tab to complete onboarding |
| PROCESSING | Reports data ingestion progress: sources_ingested / sources_required |
| COMPLETE | Gate lifted — agent runs normally |
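The gate decision above can be sketched as a pure function. The phase names come from the table; the message strings and the shape of the onboarding record are assumptions:

```typescript
// Hedged sketch of the content-gate decision.
type Phase = "EXTENSION_INSTALL" | "DATA_COLLECTION" | "PROCESSING" | "COMPLETE";

interface OnboardingState {
  phase: Phase;
  sources_ingested: number;
  sources_required: number;
}

// Returns the blocked-message text, or null when the gate is lifted.
function gateMessage(state: OnboardingState): string | null {
  switch (state.phase) {
    case "EXTENSION_INSTALL":
    case "DATA_COLLECTION":
      return "Head to the Setup tab to finish onboarding before chatting with Genie.";
    case "PROCESSING":
      return `Still ingesting your data: ${state.sources_ingested}/${state.sources_required} sources.`;
    case "COMPLETE":
      return null; // gate lifted; forward to the workspace
    default:
      throw new Error("unknown onboarding phase");
  }
}
```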
JIT skill hydration
Before forwarding the message to the AnythingLLM workspace, the server injects two pieces of context:
- Node RAG context — nodeRag.getNodeContext() fetches the top-15 weighted persona nodes for the user from the user_nodes collection (5-minute TTL cache). This gives the agent the user’s voice, platform stats, and behavioral patterns without repeating them in every prompt.
- Secure identity context — a [SECURE_CONTEXT] block is prepended with the user’s directus_user_id. All MCP tool calls that accept a user_id parameter must use this value. This prevents the LLM from fabricating or substituting a different user identity.
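The two injections amount to string assembly at request time. In the sketch below, only the [SECURE_CONTEXT] marker and directus_user_id are documented; the closing marker, ordering, and exact layout are assumptions:

```typescript
// Sketch of the request-time context injection. The [/SECURE_CONTEXT]
// closer and the block ordering are illustrative assumptions.
function buildAgentMessage(
  userMessage: string,
  nodeContext: string,    // output of nodeRag.getNodeContext()
  directusUserId: string, // authoritative identity for MCP tool calls
): string {
  const secure = `[SECURE_CONTEXT]\ndirectus_user_id: ${directusUserId}\n[/SECURE_CONTEXT]`;
  return `${secure}\n\n${nodeContext}\n\n${userMessage}`;
}
```

Keeping the secure block first means tools that read a user_id always see the server-supplied value before any model-generated text.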
JIT skill hydration via surgical_context.py runs at request time. The agent always operates with fresh persona context — there is no stale skill state.
ACTION tag interception
The LLM’s response is scanned for [ACTION:slug:{"params"}] tags. When detected, the Action Runner (server/utils/actionRunner/) intercepts the tag and dispatches the corresponding flow deterministically, bypassing LLM tool-calling limitations.
| Slug | Purpose |
|---|---|
| scout-analyze | Scout and analyze platform profile data |
| taxonomy-tag | Tag content using the DuckDB taxonomy graph |
| post-create | Create and queue a scheduled post |
| message-generate | Generate a fan message response |
| memory-recall | Retrieve relevant memory nodes |
| media-process | Trigger a media processing job |
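Tag interception can be sketched as a regex scan over the LLM output. The tag grammar follows the [ACTION:slug:{"params"}] shape above; the actual dispatch into the Action Runner is omitted, and the simple non-greedy regex assumes the params JSON does not itself contain a `}]` sequence:

```typescript
// Sketch of ACTION tag extraction from LLM output.
interface ActionTag {
  slug: string;
  params: Record<string, unknown>;
}

function extractActionTags(text: string): ActionTag[] {
  const tags: ActionTag[] = [];
  // Matches [ACTION:some-slug:{...json...}]; slugs are lowercase-hyphenated.
  const re = /\[ACTION:([a-z0-9-]+):(\{.*?\})\]/g;
  for (const m of text.matchAll(re)) {
    try {
      tags.push({ slug: m[1], params: JSON.parse(m[2]) });
    } catch {
      // Malformed params JSON: skip the tag rather than dispatching it.
    }
  }
  return tags;
}
```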
PowerAdmin bypass
When the authenticated user has admin_access=true in Directus and ANTHROPIC_API_KEY is set in the server environment, the request is routed directly to the Claude API instead of the AnythingLLM + Ollama stack.
In this mode:
- The response is streamed using the same SSE event format (textResponseChunk + finalizeResponseStream)
- The system prompt is the PowerAdmin prompt (full platform architecture context)
- Conversation history is maintained across turns within the same session
- The X-Genie-Backend: claude response header signals to the frontend which backend is active
- Max tokens: 8192
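A frontend can branch on the documented response header to show which backend served the reply. A minimal sketch, where HeaderSource stands in for the Fetch Headers interface:

```typescript
// Sketch of backend detection from the stream-chat response headers.
interface HeaderSource {
  get(name: string): string | null;
}

function activeBackend(headers: HeaderSource): "claude" | "local" {
  // X-Genie-Backend: claude is only set on the PowerAdmin path;
  // anything else means the AnythingLLM + Ollama stack handled it.
  return headers.get("X-Genie-Backend") === "claude" ? "claude" : "local";
}
```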
Error responses
| Status | Body | Cause |
|---|---|---|
| 400 | { "error": "message required" } | Empty or missing message field |
| 401 | { "error": "Unauthorized" } | Missing or invalid Directus JWT |
| 503 | { "error": "Account setup incomplete..." } | User has no anythingllm_user_id — workspace provisioning failed |