The split-brain architecture

Philosophy: Separation of Concerns

Flower Engine deliberately splits the application into two independent processes:

The Brain (Python)

FastAPI BackendHandles AI, embeddings, and data persistence

The Face (Rust)

Ratatui TUIHandles rendering, input, and animations

This design choice stems from a fundamental truth: AI inference and UI rendering have opposing performance characteristics.

The Problem: LLM streaming can take 5-30 seconds per response. During this time, the UI must remain buttery smooth at 60+ FPS for scrolling, animations, and input handling.

Why Python for the Brain?

Python dominates the AI/ML ecosystem for good reasons:

Ecosystem Maturity

# From engine/llm.py:29-48
client = AsyncOpenAI(
    base_url=OPENAI_BASE_URL,
    api_key=OPENAI_API_KEY,
    default_headers={
        "HTTP-Referer": "https://github.com/ritz541/flower-engine",
        "X-Title": "The Flower Roleplay Engine",
    },
)

ds_client = AsyncOpenAI(
    base_url="https://api.deepseek.com", 
    api_key=DEEPSEEK_API_KEY
)

groq_client = AsyncOpenAI(
    base_url=GROQ_BASE_URL, 
    api_key=GROQ_API_KEY
)

if GEMINI_API_KEY:
    from google import genai
    gemini_client = genai.Client(api_key=GEMINI_API_KEY)

Four providers supported out-of-the-box with minimal code. Python’s AI library ecosystem is unmatched.

Async-Native Design

From engine/main.py:152-259, WebSocket handling leverages FastAPI’s async capabilities:

@app.websocket("/ws/rpc")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    await websocket.send_text(
        build_ws_payload("system_update", "✓ Engine ready.", {"status": "ok"})
    )
    await broadcast_sync_state(websocket)

    try:
        while True:
            data = await websocket.receive_text()
            # ... process commands or prompts
            
            # Stream LLM response
            task = asyncio.create_task(
                stream_chat_response(
                    websocket, prompt, context, 
                    world_id, char_id, session_id
                )
            )

            # Allow cancellation during streaming
            while not task.done():
                try:
                    raw = await asyncio.wait_for(
                        websocket.receive_text(), 
                        timeout=0.05
                    )
                    if cmd_msg.get("prompt") == "/cancel":
                        task.cancel()
                except asyncio.TimeoutError:
                    continue

Why this matters

The backend can:

Handle long-running LLM streams
Accept cancellation commands mid-stream
Manage multiple concurrent operations (RAG, DB writes, token streaming)
All without blocking

RAG with Minimal Boilerplate

Python makes vector search trivial:

# Chunked lore embedding at startup
for i, chunk in enumerate(chunks):
    rag_manager.add_lore(w.id, f"base_lore_{i}", chunk)

# Query at runtime (from engine/main.py:197-201)
lore_list, _ = rag_manager.query_lore(
    state.ACTIVE_WORLD_ID, prompt, n_results=2
)
mem_key = f"{state.ACTIVE_CHARACTER_ID}_{state.ACTIVE_SESSION_ID}"
mem_list, _ = rag_manager.query_memory(mem_key, prompt, n_results=3)

ChromaDB + SentenceTransformers provide production-grade RAG with ~10 lines of code. Embeddings run on CPU for maximum compatibility.

Why Rust for the Face?

Rust brings predictable performance to a domain (TUI rendering) where milliseconds matter.

Zero-Cost Abstractions

From tui/src/main.rs:19-52, the event loop is lean:

const TICK_RATE: Duration = Duration::from_millis(150);

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    enable_raw_mode()?;
    let mut stdout = io::stdout();
    execute!(stdout, EnterAlternateScreen)?;
    let backend = CrosstermBackend::new(stdout);
    let mut terminal = Terminal::new(backend)?;

    let mut app = App::new();

    let (tx_in, rx_in) = mpsc::unbounded_channel::<WsMessage>();
    let (tx_out, rx_out) = mpsc::unbounded_channel::<String>();

    // Spawn WebSocket client in background task
    tokio::spawn(async move {
        ws::start_ws_client(tx_in, rx_out).await;
    });

    let res = run_app(&mut terminal, &mut app, rx_in, tx_out).await;

    // Cleanup
    disable_raw_mode()?;
    execute!(terminal.backend_mut(), LeaveAlternateScreen)?;
    terminal.show_cursor()?;
    Ok(())
}

No garbage collection pauses. Every frame renders in less than 1ms with predictable latency.

Concurrent Event Handling

The tokio::select! macro elegantly handles three event sources:

loop {
    terminal.draw(|f| ui::draw(f, app))?;

    tokio::select! {
        // WebSocket messages from Python backend
        Some(msg) = rx_in.recv() => {
            match msg.event.as_str() {
                "chat_chunk" => { 
                    app.append_chunk(&msg.payload.content); 
                    if let Some(tps) = msg.payload.metadata.tokens_per_second {
                        app.tps = tps;
                    }
                }
                "sync_state" => { /* Update UI state */ }
                "chat_end" => { app.finish_stream(); }
                _ => {}
            }
        }
        
        // User keyboard input
        Some(Ok(event)) = reader.next().fuse() => {
            match event {
                Event::Key(key) => {
                    match key.code {
                        KeyCode::Esc => { /* Cancel or quit */ }
                        KeyCode::Enter => { /* Submit message */ }
                        KeyCode::Char(c) => { app.handle_char(c); }
                        _ => {}
                    }
                }
                _ => {}
            }
        }
        
        // Animation tick (spinner, cursor blink)
        _ = tokio::time::sleep(timeout).fuse() => {
            if app.is_typing {
                app.spinner_frame = (app.spinner_frame + 1) % 10;
            }
        }
    }
}

150ms tick rate balances smooth animations with CPU efficiency. The loop never blocks on any single operation.

Type-Safe Message Parsing

From tui/src/models.rs:1-51, WebSocket messages have compile-time guarantees:

#[derive(Debug, Deserialize, Serialize, Clone)]
pub struct WsMessage {
    pub event: String,
    pub payload: Payload,
}

#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct Payload {
    pub content: String,
    #[serde(default)]
    pub metadata: Metadata,
}

#[derive(Debug, Serialize, Deserialize, Clone, Default)]
pub struct Metadata {
    pub model: Option<String>,
    pub tokens_per_second: Option<f64>,
    pub world_id: Option<String>,
    pub total_tokens: Option<u32>,
    pub available_worlds: Option<Vec<EntityInfo>>,
    // ... 10+ optional fields
}

If the Python backend sends malformed JSON, serde_json::from_str fails gracefully:

if let Ok(ws_msg) = serde_json::from_str::<WsMessage>(&text) {
    tx.send(ws_msg).unwrap();
} else {
    eprintln!("Failed to parse WS message: {}", text);
}

The TUI never crashes on bad data.

Communication: The WebSocket Bridge

The two halves communicate via a JSON-based WebSocket protocol at ws://localhost:8000/ws/rpc.

Connection Lifecycle

Startup

Python backend starts FastAPI server on port 8000

TUI Launch

Rust TUI attempts connection with 1-second retry loop:

let ws_stream = loop {
    match connect_async(url.clone()).await {
        Ok((stream, _)) => break stream,
        Err(_) => {
            tokio::time::sleep(Duration::from_secs(1)).await;
        }
    }
};

Handshake

Backend sends system_update event: ”✓ Engine ready.”

State Sync

Backend sends sync_state with available worlds, characters, models, rules

Active Session

User sends prompts, receives chat_chunk streams

Shutdown

User presses Esc (quit), TUI closes connection

Why WebSocket over gRPC/REST?

Bidirectional streaming

Both client and server can send messages at any time. Essential for:

Token-by-token LLM streaming
Mid-stream cancellation commands
Real-time performance metrics (tokens/sec)

Low overhead

After initial handshake, messages are pure JSON with minimal framing. No HTTP headers on every message.

Simplicity

FastAPI has native WebSocket support:

@app.websocket("/ws/rpc")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()

Tokio-tungstenite is the de facto Rust WebSocket client:

let (ws_stream, _) = connect_async(url).await?;
let (write, read) = ws_stream.split();

Performance Benefits

Comparison: Monolithic vs Split

Metric	Monolithic Python	Split Architecture
UI Frame Rate	10-30 FPS (during LLM calls)	60+ FPS (always)
Input Latency	50-200ms (GC pauses)	Less than 5ms (predictable)
Memory Usage	800MB+ (Python overhead)	400MB Python + 10MB Rust
Crash Isolation	Backend crash kills UI	UI survives backend restarts
Dev Iteration	Restart entire app	Restart only changed process

Result: The TUI never stutters, even during 30-second LLM responses.

Real-World Latency

From tui/src/main.rs:124-139, token streaming is instant:

"chat_chunk" => {
    app.append_chunk(&msg.payload.content);  // <1ms
    if let Some(tps) = msg.payload.metadata.tokens_per_second { 
        app.tps = tps;  // Update metrics
    }
}

Each token appears in the TUI within 5ms after Python sends it (WebSocket latency + render time).

Crash Resilience

The split design provides isolation:

Backend Crash
Frontend Crash

If Python crashes:

Rust TUI detects WebSocket closure
Displays “Connection lost” message
Enters retry loop (1-second intervals)
Reconnects when backend restarts
State synced via sync_state event

User doesn’t lose their session (SQLite persists data).

Development Experience

Hot Reload

Both processes support live reloading:

# Terminal 1: Python backend with auto-reload
cd engine
uvicorn engine.main:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2: Rust TUI (cargo watch for auto-rebuild)
cd tui
cargo watch -x run

Change a Python file → Backend reloads in ~2s → TUI reconnects automatically.Change a Rust file → TUI rebuilds in ~5s → No backend disruption.

Language-Appropriate Debugging

Python: Rich logging with context

from engine.logger import log

log.info(f"\n=== RETRIEVED LORE ({len(lore_list)} chunks) ===")
for i, chunk in enumerate(lore_list):
    log.info(f"[LORE {i+1}]\n{chunk}\n")
log.info(f"=== END LORE ===\n")

Rust: Compile-time error checking

// If metadata schema changes, compiler catches it
if let Some(tps) = msg.payload.metadata.tokens_per_second {
    app.tps = tps;  // Type error if field removed
}

Trade-offs

Complexity

Downside: You must manage two processes, ensure protocol compatibility, and handle connection failures.

Flower Engine mitigates this with:

start.sh script launches both processes
Auto-reconnect logic in TUI
Strict JSON schema (errors are logged, not crashed)

Deployment

Benefit: Each half can be deployed independently. Future possibilities:

Remote backend (TUI connects via ws://remote-server:8000)
Multiple TUI clients sharing one backend
Backend horizontal scaling (session-based routing)

When to Use This Pattern

The split-brain architecture excels when:

Mismatched Performance

One part is fast (UI), the other is slow (AI)

Different Ecosystems

Best libraries exist in different languages

Isolation Requirements

Crashes shouldn’t cascade between components

Independent Scaling

UI and backend have different resource needs

Avoid this pattern for simple CRUD apps or when real-time sync is critical (e.g., collaborative editing with less than 10ms latency requirements).

Next Steps

WebSocket Protocol

Learn the JSON message format

Architecture Overview

See the full system design

Development Guide

Set up your dev environment

System Rules

Understand narrative constraints

Get Started

Core Concepts

Guides

Advanced

The split-brain architecture

Philosophy: Separation of Concerns

The Brain (Python)

The Face (Rust)

Why Python for the Brain?

Ecosystem Maturity

Async-Native Design

RAG with Minimal Boilerplate

Why Rust for the Face?

Zero-Cost Abstractions

Concurrent Event Handling

Type-Safe Message Parsing

Communication: The WebSocket Bridge

Connection Lifecycle

Why WebSocket over gRPC/REST?

Performance Benefits

Comparison: Monolithic vs Split

Real-World Latency

Crash Resilience

Development Experience

Hot Reload

Language-Appropriate Debugging

Trade-offs

Complexity

Deployment

When to Use This Pattern

Mismatched Performance

Different Ecosystems

Isolation Requirements

Independent Scaling

Next Steps

WebSocket Protocol

Architecture Overview

Development Guide

System Rules

Build docs developers (and LLMs) love

Get Started

Core Concepts

Guides

Advanced

​Philosophy: Separation of Concerns

The Brain (Python)

The Face (Rust)

​Why Python for the Brain?

​Ecosystem Maturity

​Async-Native Design

​RAG with Minimal Boilerplate

​Why Rust for the Face?

​Zero-Cost Abstractions

​Concurrent Event Handling

​Type-Safe Message Parsing

​Communication: The WebSocket Bridge

​Connection Lifecycle

​Why WebSocket over gRPC/REST?

​Performance Benefits

​Comparison: Monolithic vs Split

​Real-World Latency

​Crash Resilience

​Development Experience

​Hot Reload

​Language-Appropriate Debugging

​Trade-offs

​Complexity

​Deployment

​When to Use This Pattern

Mismatched Performance

Different Ecosystems

Isolation Requirements

Independent Scaling

​Next Steps

WebSocket Protocol

Architecture Overview

Development Guide

System Rules

Build docs developers (and LLMs) love

Philosophy: Separation of Concerns

Why Python for the Brain?

Ecosystem Maturity

Async-Native Design

RAG with Minimal Boilerplate

Why Rust for the Face?

Zero-Cost Abstractions

Concurrent Event Handling

Type-Safe Message Parsing

Communication: The WebSocket Bridge

Connection Lifecycle

Why WebSocket over gRPC/REST?

Performance Benefits

Comparison: Monolithic vs Split

Real-World Latency

Crash Resilience

Development Experience

Hot Reload

Language-Appropriate Debugging

Trade-offs

Complexity

Deployment

When to Use This Pattern

Next Steps