
Split-Brain Design

Flower Engine uses a split-architecture approach that separates concerns between AI orchestration and user interface:
    [ THE FACE ]             [ THE BRAIN ]
    (Rust / Ratatui)         (Python / FastAPI)
          |                         |
    TUI Interface <--- WebSocket ---> LLM Orchestrator
          |            (JSON V1)            |
    Async Input                     RAG (ChromaDB)
    Event Loop                      SQLite Persistence
Why split? The architecture decouples fast UI rendering (Rust) from heavyweight AI operations (Python), ensuring the terminal interface stays responsive even during long LLM inference times.
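The JSON V1 envelope is small enough to sketch in a few lines. The sketch below assumes the `event` / `payload.content` shape the TUI reads (`msg.event`, `msg.payload.content`) and the bare `{"prompt": ...}` object it sends; `build_ws_payload` is modeled on the helper that appears in the cancellation excerpt later on this page, and any extra fields the real engine includes are omitted here.

```python
import json

def build_ws_payload(event: str, content: str) -> str:
    """Build a JSON V1 envelope as the backend sends it over the WebSocket.

    Only the event/payload.content shape the TUI reads is modeled here;
    any additional fields in the real protocol are omitted.
    """
    return json.dumps({"event": event, "payload": {"content": content}})

# The TUI -> backend direction is simpler: a bare prompt object.
outbound = json.dumps({"prompt": "Look around the alley"})

chunk = build_ws_payload("chat_chunk", "Neon light spills ")
print(chunk)  # {"event": "chat_chunk", "payload": {"content": "Neon light spills "}}
```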

Component Layers

The Brain (Python Backend)

The backend is built on FastAPI and handles all AI orchestration, data persistence, and narrative logic.

Core Responsibilities

LLM Orchestration

  • Multi-provider support (OpenRouter, DeepSeek, Groq, Gemini)
  • Token streaming with real-time performance metrics
  • Dynamic model switching and pricing calculation
  • Provider-specific client routing

Persistence

  • SQLite for sessions, characters, and worlds
  • ChromaDB for vector storage (RAG)
  • Session history with hot-swapping
  • Character and world asset management

Context & RAG

  • RAG-based lore retrieval (top 2 chunks)
  • Recent memory injection (top 3 chunks)
  • Scene context on session start
  • Chunked lore embedding (800-char chunks)
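One way to picture provider-specific client routing is a dispatch on the model identifier. This is a hypothetical sketch only: the page does not show the engine's actual dispatch code, so the prefix table and function name below are assumptions.

```python
# Hypothetical sketch of provider-specific client routing.
# The real engine's dispatch logic is not shown on this page;
# the prefixes and names here are illustrative only.
PROVIDER_PREFIXES = {
    "openrouter/": "openrouter",
    "deepseek/": "deepseek",
    "groq/": "groq",
    "gemini/": "gemini",
}

def route_provider(model_id: str, default: str = "openrouter") -> str:
    """Pick a provider client based on the model identifier's prefix."""
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model_id.startswith(prefix):
            return provider
    return default
```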

Startup Sequence

From engine/main.py:26-149, the backend performs initialization:
@app.on_event("startup")
async def startup():
    # 1. Load YAML assets from disk
    for data in load_yaml_assets("assets/worlds/*.yaml"):
        w = World(...)
        world_manager.add_world(w)
        
        # 2. Chunk and embed lore for RAG
        if w.lore:
            chunks = []
            current_chunk = ""
            chunk_size = 800
            
            for line in w.lore.split('\n'):
                if len(current_chunk) + len(line) > chunk_size:
                    chunks.append(current_chunk.strip())
                    current_chunk = line + '\n'
                else:
                    current_chunk += line + '\n'
            if current_chunk.strip():
                # flush the final partial chunk so it is not dropped
                chunks.append(current_chunk.strip())

            for i, chunk in enumerate(chunks):
                rag_manager.add_lore(w.id, f"base_lore_{i}", chunk)

    # 3. Fetch available models from providers
    resp = await hc.get("https://openrouter.ai/api/v1/models")
    for m in resp.json().get("data", []):
        state.AVAILABLE_MODELS.append({...})
The backend requires at least one API key (OpenRouter, Groq, DeepSeek, or Gemini) to function. Models are fetched dynamically at startup.

The Face (Rust Frontend)

The TUI is built with Ratatui and Tokio, providing a fast, fully asynchronous terminal interface.

Event Loop Architecture

From tui/src/main.rs:54-286, the main loop uses tokio::select! for concurrent event handling:
loop {
    terminal.draw(|f| ui::draw(f, app))?;

    tokio::select! {
        // Process incoming WebSocket messages
        Some(msg) = rx_in.recv() => {
            match msg.event.as_str() {
                "sync_state" => { /* Update UI state */ }
                "chat_chunk" => { app.append_chunk(&msg.payload.content); }
                "chat_end" => { app.finish_stream(); }
                "error" => { /* Display error */ }
                _ => {}
            }
        }
        
        // Process terminal input (keystrokes)
        Some(Ok(event)) = reader.next().fuse() => {
            match event {
                Event::Key(key) => { /* Handle input */ }
                _ => {}
            }
        }
        
        // Animation tick (spinner, cursor)
        _ = tokio::time::sleep(timeout).fuse() => {
            if app.is_typing {
                app.spinner_frame = (app.spinner_frame + 1) % 10;
            }
        }
    }
}
The 150ms tick rate keeps spinner animation and cursor blinking smooth without consuming excessive CPU.

Connection Management

From tui/src/ws.rs:8-69, the WebSocket client implements auto-reconnect:
pub async fn start_ws_client(
    tx: mpsc::UnboundedSender<WsMessage>,
    mut rx_out: mpsc::UnboundedReceiver<String>,
) {
    let url = Url::parse("ws://localhost:8000/ws/rpc").unwrap();
    
    // Retry loop — Python backend may still be warming up
    let ws_stream = loop {
        match connect_async(url.clone()).await {
            Ok((stream, _)) => break stream,
            Err(_) => {
                tokio::time::sleep(Duration::from_secs(1)).await;
            }
        }
    };

    let (mut write, mut read) = ws_stream.split();
    // ... spawn read/write tasks
}

Data Flow

User Message Flow

1. Input Capture: The user types a message in the Rust TUI and presses Enter.

2. WebSocket Send: The TUI sends the JSON payload {"prompt": "user message"}.

3. Command Routing: The Python backend checks whether the message starts with / for command handling.

4. Database Save: The user message is saved to SQLite before the LLM call.

5. Context Building:
  • RAG queries lore (top 2 chunks)
  • RAG queries memory (top 3 chunks)
  • Scene added if this is the first message

6. LLM Streaming:
  • System prompt + history + context sent to the LLM
  • Tokens streamed back as chat_chunk events

7. Live Rendering: The TUI appends each chunk to the display with a typewriter effect.

8. Finalization:
  • chat_end event signals completion
  • Assistant message saved to SQLite
  • Memory chunk added to RAG
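The steps above can be condensed into a sketch of the backend's per-message handler. The structure and helper comments are illustrative, not the engine's actual code; only the event names (system_update, chat_chunk, chat_end) come from the protocol described on this page.

```python
# Condensed, hypothetical sketch of the backend's per-message flow.
# Returns the (event, content) pairs the backend would emit over the
# WebSocket for one user prompt.
def handle_prompt(prompt: str, stream_tokens):
    events = []
    if prompt.startswith("/"):            # 3. command routing
        events.append(("system_update", f"command: {prompt}"))
        return events
    # 4. persist the user message before the LLM call (SQLite in the real engine)
    # 5. build context: top-2 lore chunks + top-3 memory chunks (+ scene on first message)
    for token in stream_tokens:           # 6-7. stream tokens back to the TUI
        events.append(("chat_chunk", token))
    events.append(("chat_end", ""))       # 8. finalize: save reply, add memory chunk
    return events
```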

Cancellation Flow

From engine/main.py:233-250, streaming can be interrupted:
while not task.done():
    try:
        raw = await asyncio.wait_for(websocket.receive_text(), timeout=0.05)
        cmd_msg = json.loads(raw)
        if cmd_msg.get("prompt") == "/cancel":
            task.cancel()
            await websocket.send_text(
                build_ws_payload("system_update", "✗ Stream cancelled by user.")
            )
    except asyncio.TimeoutError:
        continue
Press Esc during an LLM response to cancel streaming. The TUI sends the /cancel command, which cancels the streaming task via asyncio.CancelledError.

System Requirements

Memory

4GB+ RAM required. Embeddings run on the CPU using all-MiniLM-L6-v2 for maximum compatibility.

Storage

~1GB of disk space. The setup is optimized to avoid heavy CUDA libraries.

Runtime

Python 3.12+ and Rust (stable). Latest versions recommended.

Platform

Linux and macOS; Windows via WSL2. Native terminal support required.

Performance Characteristics

Latency Breakdown

Operation              Typical Time        Notes
WebSocket round-trip   <5ms                Localhost connection
RAG query (lore)       50-150ms            CPU embedding, 2 results
RAG query (memory)     30-100ms            CPU embedding, 3 results
LLM first token        200ms-2s            Provider-dependent
Token streaming        20-100 tokens/sec   Model-dependent
UI render frame        <1ms                Ratatui efficiency
The Rust TUI maintains 60+ FPS during active streaming, ensuring smooth scrolling and animations.

Asset Structure

The engine loads configuration from YAML files at startup:
assets/
├── worlds/
│   └── *.yaml       # Setting, lore, start_message, system_prompt
├── characters/
│   └── *.yaml       # Player personas and backgrounds
└── rules/
    └── *.yaml       # Global narrative constraints

World YAML Schema

id: "cyberpunk_city"
name: "Neo-Tokyo 2077"
start_message: "Neon lights flicker as rain falls on chrome streets."
lore: |
  A sprawling megacity ruled by megacorporations...
  (Multi-paragraph world lore, chunked into 800-char segments)
system_prompt: "You are the Game Master for a cyberpunk noir scenario."
scene: "You stand in a rain-soaked alley, sirens wailing in the distance."
Lore chunking: Long lore text is automatically split into 800-character chunks with smart line-break handling, then embedded separately for RAG retrieval.
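The chunking rule can be isolated as a small, testable function. This is a sketch mirroring the startup sequence shown earlier, with the final partial chunk flushed so no lore is dropped; the function name is illustrative.

```python
def chunk_lore(lore: str, chunk_size: int = 800) -> list[str]:
    """Split lore on line breaks into roughly chunk_size-character chunks,
    mirroring the chunking loop in the startup sequence."""
    chunks, current = [], ""
    for line in lore.split("\n"):
        if len(current) + len(line) > chunk_size:
            chunks.append(current.strip())
            current = line + "\n"
        else:
            current += line + "\n"
    if current.strip():
        chunks.append(current.strip())  # keep the final partial chunk
    return chunks
```

Note that a single line longer than chunk_size still becomes its own oversized chunk; the split only happens at line boundaries.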

Next Steps

Split-Brain Deep Dive

Learn why Python and Rust work better apart

WebSocket Protocol

Master the JSON message format

System Rules

Understand hardcore narrative constraints

Quick Start

Set up your own instance
