Philosophy: Separation of Concerns

Flower Engine deliberately splits the application into two independent processes:

The Brain (Python)

FastAPI backend: handles AI, embeddings, and data persistence

The Face (Rust)

Ratatui TUI: handles rendering, input, and animations

This design choice stems from a fundamental truth: AI inference and UI rendering have opposing performance characteristics.
The Problem: LLM streaming can take 5-30 seconds per response. During this time, the UI must remain buttery smooth at 60+ FPS for scrolling, animations, and input handling.

Why Python for the Brain?

Python dominates the AI/ML ecosystem for good reasons:

Ecosystem Maturity

# From engine/llm.py:29-48
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url=OPENAI_BASE_URL,
    api_key=OPENAI_API_KEY,
    default_headers={
        "HTTP-Referer": "https://github.com/ritz541/flower-engine",
        "X-Title": "The Flower Roleplay Engine",
    },
)

ds_client = AsyncOpenAI(
    base_url="https://api.deepseek.com", 
    api_key=DEEPSEEK_API_KEY
)

groq_client = AsyncOpenAI(
    base_url=GROQ_BASE_URL, 
    api_key=GROQ_API_KEY
)

if GEMINI_API_KEY:
    from google import genai
    gemini_client = genai.Client(api_key=GEMINI_API_KEY)
Four providers supported out-of-the-box with minimal code. Python’s AI library ecosystem is unmatched.
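For illustration, routing a request to the right client can be as simple as a prefix match on the model name. This helper and the client names are hypothetical, not Flower Engine's actual dispatch logic:

```python
# Hypothetical sketch: pick one of several AsyncOpenAI-style clients by
# model-name prefix. Clients are stubbed as strings to keep the example
# self-contained; the real code would hold AsyncOpenAI instances.
def pick_client(model: str, clients: dict) -> str:
    """Return the client whose key prefixes the model name."""
    for prefix, client in clients.items():
        if model.startswith(prefix):
            return client
    return clients["default"]

clients = {
    "deepseek": "ds_client",
    "groq": "groq_client",
    "gemini": "gemini_client",
    "default": "openrouter_client",
}

print(pick_client("deepseek-chat", clients))   # routes to the DeepSeek client
print(pick_client("gpt-4o", clients))          # falls through to the default
```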

Async-Native Design

From engine/main.py:152-259, WebSocket handling leverages FastAPI’s async capabilities:
@app.websocket("/ws/rpc")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    await websocket.send_text(
        build_ws_payload("system_update", "✓ Engine ready.", {"status": "ok"})
    )
    await broadcast_sync_state(websocket)

    try:
        while True:
            data = await websocket.receive_text()
            # ... process commands or prompts
            
            # Stream LLM response
            task = asyncio.create_task(
                stream_chat_response(
                    websocket, prompt, context, 
                    world_id, char_id, session_id
                )
            )

            # Allow cancellation during streaming
            while not task.done():
                try:
                    raw = await asyncio.wait_for(
                        websocket.receive_text(),
                        timeout=0.05
                    )
                    cmd_msg = json.loads(raw)
                    if cmd_msg.get("prompt") == "/cancel":
                        task.cancel()
                except asyncio.TimeoutError:
                    continue
The backend can:
  • Handle long-running LLM streams
  • Accept cancellation commands mid-stream
  • Manage multiple concurrent operations (RAG, DB writes, token streaming)
All without blocking the event loop.
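The polling loop above can be sketched in miniature with plain asyncio. `stream_tokens` and the command queue are illustrative stand-ins, not Flower Engine code:

```python
import asyncio

# Sketch of the cancel-during-streaming pattern: a streaming task runs in the
# background while the loop polls for commands, and "/cancel" aborts it
# mid-stream.
async def stream_tokens(out):
    for i in range(100):           # pretend each token takes ~10ms
        out.append(f"tok{i}")
        await asyncio.sleep(0.01)

async def main():
    out = []
    commands = asyncio.Queue()
    task = asyncio.create_task(stream_tokens(out))
    commands.put_nowait("/cancel")  # simulate the user typing /cancel

    while not task.done():
        try:
            cmd = await asyncio.wait_for(commands.get(), timeout=0.05)
            if cmd == "/cancel":
                task.cancel()
        except asyncio.TimeoutError:
            continue        # no command; keep streaming

    try:
        await task
    except asyncio.CancelledError:
        pass                # expected: the stream was cancelled
    return out

tokens = asyncio.run(main())
print(len(tokens) < 100)  # True: the stream stopped early
```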

RAG with Minimal Boilerplate

Python makes vector search trivial:
# Chunked lore embedding at startup
for i, chunk in enumerate(chunks):
    rag_manager.add_lore(w.id, f"base_lore_{i}", chunk)

# Query at runtime (from engine/main.py:197-201)
lore_list, _ = rag_manager.query_lore(
    state.ACTIVE_WORLD_ID, prompt, n_results=2
)
mem_key = f"{state.ACTIVE_CHARACTER_ID}_{state.ACTIVE_SESSION_ID}"
mem_list, _ = rag_manager.query_memory(mem_key, prompt, n_results=3)
ChromaDB + SentenceTransformers provide production-grade RAG with ~10 lines of code. Embeddings run on CPU for maximum compatibility.
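To make the retrieval flow concrete without pulling in ChromaDB, here is a self-contained toy version of `query_lore` that ranks chunks by bag-of-words cosine similarity. Real sentence embeddings replace this in the actual engine; the store contents are invented:

```python
import math
from collections import Counter

# Toy RAG sketch: embed() and cosine() stand in for SentenceTransformers,
# and the dict stands in for a ChromaDB collection.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def query_lore(store, prompt, n_results=2):
    """Return the n_results chunks most similar to the prompt."""
    q = embed(prompt)
    ranked = sorted(store.items(),
                    key=lambda kv: cosine(q, embed(kv[1])),
                    reverse=True)
    return [text for _, text in ranked[:n_results]]

store = {
    "base_lore_0": "The kingdom of Eldra is ruled by a council of mages.",
    "base_lore_1": "Dragons sleep beneath the northern mountains.",
    "base_lore_2": "Trade caravans cross the desert every spring.",
}
print(query_lore(store, "who rules the kingdom of Eldra?"))  # Eldra chunk first
```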

Why Rust for the Face?

Rust brings predictable performance to a domain (TUI rendering) where milliseconds matter.

Zero-Cost Abstractions

From tui/src/main.rs:19-52, the event loop is lean:
const TICK_RATE: Duration = Duration::from_millis(150);

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    enable_raw_mode()?;
    let mut stdout = io::stdout();
    execute!(stdout, EnterAlternateScreen)?;
    let backend = CrosstermBackend::new(stdout);
    let mut terminal = Terminal::new(backend)?;

    let mut app = App::new();

    let (tx_in, rx_in) = mpsc::unbounded_channel::<WsMessage>();
    let (tx_out, rx_out) = mpsc::unbounded_channel::<String>();

    // Spawn WebSocket client in background task
    tokio::spawn(async move {
        ws::start_ws_client(tx_in, rx_out).await;
    });

    let res = run_app(&mut terminal, &mut app, rx_in, tx_out).await;

    // Cleanup
    disable_raw_mode()?;
    execute!(terminal.backend_mut(), LeaveAlternateScreen)?;
    terminal.show_cursor()?;
    Ok(())
}
No garbage collection pauses. Every frame renders in less than 1ms with predictable latency.

Concurrent Event Handling

The tokio::select! macro elegantly handles three event sources:
loop {
    terminal.draw(|f| ui::draw(f, app))?;

    tokio::select! {
        // WebSocket messages from Python backend
        Some(msg) = rx_in.recv() => {
            match msg.event.as_str() {
                "chat_chunk" => { 
                    app.append_chunk(&msg.payload.content); 
                    if let Some(tps) = msg.payload.metadata.tokens_per_second {
                        app.tps = tps;
                    }
                }
                "sync_state" => { /* Update UI state */ }
                "chat_end" => { app.finish_stream(); }
                _ => {}
            }
        }
        
        // User keyboard input
        Some(Ok(event)) = reader.next().fuse() => {
            match event {
                Event::Key(key) => {
                    match key.code {
                        KeyCode::Esc => { /* Cancel or quit */ }
                        KeyCode::Enter => { /* Submit message */ }
                        KeyCode::Char(c) => { app.handle_char(c); }
                        _ => {}
                    }
                }
                _ => {}
            }
        }
        
        // Animation tick (spinner, cursor blink)
        _ = tokio::time::sleep(timeout).fuse() => {
            if app.is_typing {
                app.spinner_frame = (app.spinner_frame + 1) % 10;
            }
        }
    }
}
The 150ms tick rate balances smooth animations with CPU efficiency. The loop never blocks on any single operation.
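For readers more at home in Python, the same three-way multiplexing can be sketched with asyncio.wait. The queue names and event strings are illustrative, not from the Flower Engine codebase:

```python
import asyncio

# asyncio analogue of the tokio::select! loop: race three event sources and
# handle whichever finishes first, cancelling the losers each iteration.
async def event_loop(net_q, key_q, iterations=3, tick=0.05):
    handled = []
    for _ in range(iterations):
        net = asyncio.ensure_future(net_q.get())            # WebSocket message
        key = asyncio.ensure_future(key_q.get())            # keyboard input
        timer = asyncio.ensure_future(asyncio.sleep(tick))  # animation tick
        done, pending = await asyncio.wait(
            {net, key, timer}, return_when=asyncio.FIRST_COMPLETED
        )
        if net in done:
            handled.append(("ws", net.result()))
        elif key in done:
            handled.append(("key", key.result()))
        else:
            handled.append(("tick", None))
        for p in pending:
            p.cancel()
    return handled

async def main():
    net_q, key_q = asyncio.Queue(), asyncio.Queue()
    net_q.put_nowait("chat_chunk")  # one pending backend message
    return await event_loop(net_q, key_q)

print(asyncio.run(main()))
# [('ws', 'chat_chunk'), ('tick', None), ('tick', None)]
```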

Type-Safe Message Parsing

From tui/src/models.rs:1-51, WebSocket messages have compile-time guarantees:
#[derive(Debug, Deserialize, Serialize, Clone)]
pub struct WsMessage {
    pub event: String,
    pub payload: Payload,
}

#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct Payload {
    pub content: String,
    #[serde(default)]
    pub metadata: Metadata,
}

#[derive(Debug, Serialize, Deserialize, Clone, Default)]
pub struct Metadata {
    pub model: Option<String>,
    pub tokens_per_second: Option<f64>,
    pub world_id: Option<String>,
    pub total_tokens: Option<u32>,
    pub available_worlds: Option<Vec<EntityInfo>>,
    // ... 10+ optional fields
}
If the Python backend sends malformed JSON, serde_json::from_str fails gracefully:
if let Ok(ws_msg) = serde_json::from_str::<WsMessage>(&text) {
    tx.send(ws_msg).unwrap();
} else {
    eprintln!("Failed to parse WS message: {}", text);
}
The TUI never crashes on bad data.

Communication: The WebSocket Bridge

The two halves communicate via a JSON-based WebSocket protocol at ws://localhost:8000/ws/rpc.
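A message on this protocol is a small JSON envelope. The helper below is a hypothetical sketch of `build_ws_payload`, shaped to match the Rust `WsMessage`/`Payload`/`Metadata` structs shown above; the real `engine/main.py` implementation may differ:

```python
import json

# Hypothetical sketch of build_ws_payload: wrap an event name, content
# string, and optional metadata dict into the JSON envelope the TUI parses.
def build_ws_payload(event, content, metadata=None):
    return json.dumps({
        "event": event,
        "payload": {
            "content": content,
            "metadata": metadata or {},
        },
    })

msg = build_ws_payload("system_update", "✓ Engine ready.", {"status": "ok"})
print(msg)
```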

Connection Lifecycle

1. Startup: The Python backend starts the FastAPI server on port 8000.
2. TUI Launch: The Rust TUI attempts connection with a 1-second retry loop:

let ws_stream = loop {
    match connect_async(url.clone()).await {
        Ok((stream, _)) => break stream,
        Err(_) => {
            tokio::time::sleep(Duration::from_secs(1)).await;
        }
    }
};

3. Handshake: The backend sends a system_update event: "✓ Engine ready."
4. State Sync: The backend sends sync_state with available worlds, characters, models, and rules.
5. Active Session: The user sends prompts and receives chat_chunk streams.
6. Shutdown: The user presses Esc to quit; the TUI closes the connection.

Why WebSocket over gRPC/REST?

Full-duplex messaging: Both client and server can send messages at any time. Essential for:
  • Token-by-token LLM streaming
  • Mid-stream cancellation commands
  • Real-time performance metrics (tokens/sec)
Low overhead: After the initial handshake, messages are pure JSON with minimal framing. No HTTP headers on every message.
First-class library support: FastAPI has native WebSocket support:
@app.websocket("/ws/rpc")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
And tokio-tungstenite is the de facto Rust WebSocket client:
let (ws_stream, _) = connect_async(url).await?;
let (write, read) = ws_stream.split();

Performance Benefits

Comparison: Monolithic vs Split

| Metric | Monolithic Python | Split Architecture |
| --- | --- | --- |
| UI frame rate | 10-30 FPS (during LLM calls) | 60+ FPS (always) |
| Input latency | 50-200 ms (GC pauses) | Less than 5 ms (predictable) |
| Memory usage | 800 MB+ (Python overhead) | 400 MB Python + 10 MB Rust |
| Crash isolation | Backend crash kills UI | UI survives backend restarts |
| Dev iteration | Restart entire app | Restart only the changed process |
Result: The TUI never stutters, even during 30-second LLM responses.

Real-World Latency

From tui/src/main.rs:124-139, handling each streamed token is nearly instantaneous:
"chat_chunk" => {
    app.append_chunk(&msg.payload.content);  // <1ms
    if let Some(tps) = msg.payload.metadata.tokens_per_second { 
        app.tps = tps;  // Update metrics
    }
}
Each token appears in the TUI within 5ms after Python sends it (WebSocket latency + render time).

Crash Resilience

The split design provides isolation:
If Python crashes:
  1. Rust TUI detects WebSocket closure
  2. Displays “Connection lost” message
  3. Enters retry loop (1-second intervals)
  4. Reconnects when backend restarts
  5. State synced via sync_state event
User doesn’t lose their session (SQLite persists data).

Development Experience

Hot Reload

Both processes support live reloading:
# Terminal 1: Python backend with auto-reload
cd engine
uvicorn engine.main:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2: Rust TUI (cargo watch for auto-rebuild)
cd tui
cargo watch -x run
Change a Python file → the backend reloads in ~2s → the TUI reconnects automatically.
Change a Rust file → the TUI rebuilds in ~5s → no backend disruption.

Language-Appropriate Debugging

Python: Rich logging with context
from engine.logger import log

log.info(f"\n=== RETRIEVED LORE ({len(lore_list)} chunks) ===")
for i, chunk in enumerate(lore_list):
    log.info(f"[LORE {i+1}]\n{chunk}\n")
log.info(f"=== END LORE ===\n")
Rust: Compile-time error checking
// If metadata schema changes, compiler catches it
if let Some(tps) = msg.payload.metadata.tokens_per_second {
    app.tps = tps;  // Type error if field removed
}

Trade-offs

Complexity

Downside: You must manage two processes, ensure protocol compatibility, and handle connection failures.
Flower Engine mitigates this with:
  • A start.sh script that launches both processes
  • Auto-reconnect logic in the TUI
  • A strict JSON schema (parse errors are logged, not fatal)

Deployment

Benefit: Each half can be deployed independently. Future possibilities:
  • Remote backend (TUI connects via ws://remote-server:8000)
  • Multiple TUI clients sharing one backend
  • Backend horizontal scaling (session-based routing)

When to Use This Pattern

The split-brain architecture excels when:

Mismatched Performance

One part is fast (UI), the other is slow (AI)

Different Ecosystems

Best libraries exist in different languages

Isolation Requirements

Crashes shouldn’t cascade between components

Independent Scaling

UI and backend have different resource needs
Avoid this pattern for simple CRUD apps or when real-time sync is critical (e.g., collaborative editing with less than 10ms latency requirements).

Next Steps

  • WebSocket Protocol: Learn the JSON message format
  • Architecture Overview: See the full system design
  • Development Guide: Set up your dev environment
  • System Rules: Understand narrative constraints
