
Overview

The prime eval tui command launches an interactive terminal user interface for browsing evaluation results. It automatically discovers and displays all evaluation runs from your workspace.

Usage

prime eval tui [OPTIONS]

Options

None. The TUI auto-discovers results from the standard locations listed under Discovery.

What It Shows

The TUI provides a hierarchical browser:
  1. Environment selection - All environments with completed evaluations
  2. Model selection - All models evaluated for that environment
  3. Run selection - All evaluation runs for that environment + model combo
  4. Rollout viewer - Individual prompts, completions, and metrics

Discovery

Results are discovered from:
  • ./outputs/evals/ - Global output directory
  • ./environments/*/outputs/evals/ - Per-environment output directories
Each run must have both:
  • results.jsonl - Rollout data
  • metadata.json - Evaluation metadata
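The discovery rule above can be approximated from the shell. This is a sketch of the scan, not the TUI's actual scanner: a directory counts as a run only when metadata.json sits next to a results.jsonl.

```shell
# Build the expected layout in a scratch workspace:
workspace=$(mktemp -d)
run="$workspace/outputs/evals/gsm8k--openai--gpt-4.1-mini/abc123"
mkdir -p "$run"
touch "$run/results.jsonl" "$run/metadata.json"

# A decoy missing metadata.json, which discovery must skip:
decoy="$workspace/outputs/evals/gsm8k--openai--gpt-4.1-mini/zzz999"
mkdir -p "$decoy"
touch "$decoy/results.jsonl"

# List every directory holding both required files:
runs=$(find "$workspace/outputs/evals" -name metadata.json |
       while read -r meta; do
         d=$(dirname "$meta")
         [ -f "$d/results.jsonl" ] && echo "$d"
       done)
echo "$runs"
```

Only the abc123 directory is printed; zzz999 lacks metadata.json and is ignored, matching the "each run must have both" rule.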

Environment Selection Screen

┌─ Select Environment ─────────────────────────────────┐
│ gsm8k - Models: 2, Runs: 5                          │
│ math-python - Models: 1, Runs: 3                    │
│ alphabet-sort - Models: 2, Runs: 4                  │
└──────────────────────────────────────────────────────┘

q: Quit  Enter: Select
Keys:
  • ↑/↓ or j/k - Navigate
  • Enter - Select environment
  • q - Quit

Model Selection Screen

┌─ Environment: gsm8k ─────────────────────────────────┐
│ Select Model                                         │
│ openai/gpt-4.1-mini - Runs: 3                       │
│ anthropic/claude-sonnet-4 - Runs: 2                 │
└──────────────────────────────────────────────────────┘

q: Quit  b: Back  Enter: Select
Keys:
  • ↑/↓ - Navigate
  • Enter - Select model
  • b or Backspace - Go back
  • q - Quit

Run Selection Screen

┌─ Environment: gsm8k ─────────────────────────────────┐
│ Model: openai/gpt-4.1-mini                          │
│ Select Run                                           │
│ abc123 - 2026-03-03 14:23 | Reward: 0.867           │
│ def456 - 2026-03-02 10:15 | Reward: 0.823           │
│ ghi789 - 2026-03-01 16:47 | Reward: 0.891           │
└──────────────────────────────────────────────────────┘

┌─ Run Details ────────────────────────────────────────┐
│ Run ID: abc123                                       │
│ Environment: gsm8k                                   │
│ Model: openai/gpt-4.1-mini                          │
│ Avg reward: 0.867                                    │
│ Runtime: 2m 34.5s                                    │
└──────────────────────────────────────────────────────┘

q: Quit  b: Back  Enter: Select
Keys:
  • ↑/↓ - Navigate runs
  • Enter - View rollout details
  • b or Backspace - Go back
  • q - Quit

Rollout Viewer

The main screen shows the prompt and completion side by side, with run metadata above and metrics below:
┌─ Metadata ───────────────────────────────────────────┐
│ Environment: gsm8k      Record: 1/30                 │
│ Model: openai/gpt-4.1-mini                          │
│ Run ID: abc123          Examples: 10                 │
│ Date: 2026-03-03 14:23  Rollouts/ex: 3              │
└──────────────────────────────────────────────────────┘

┌─ Prompt ─────────────┬─ Completion ──────────────────┐
│ user: What is 2+2?   │ assistant: The answer is 4.   │
│                      │                                │
│                      │ tool call: calculate          │
│                      │ {"expression": "2+2"}         │
│                      │                                │
│                      │ tool result: 4                │
│                      │                                │
│                      │ assistant: The result is 4.   │
└──────────────────────┴────────────────────────────────┘

┌─ Details ────────────────────────────────────────────┐
│ Reward: 1.000                                        │
│ Answer: 4                                            │
│ Info: {"calculation": "2+2=4"}                      │
└──────────────────────────────────────────────────────┘

q: Quit  b: Back  ←/→: Prev/Next  s: Search  c: Copy
Keys:
  • ←/→ or h/l - Navigate between rollouts
  • s - Search prompt/completion text
  • c - Enter copy mode
  • b or Backspace - Go back to run list
  • q - Quit
  • d - Toggle dark/light theme

Search Mode

Press s to search within prompts and completions:
┌─ Search (regex, case-insensitive) ───────────────────┐
│ [calculate               ]                           │
└──────────────────────────────────────────────────────┘

┌─ Prompt results (0) ────┬─ Completion results (2) ───┐
│                          │   245 | tool call: calculate│
│                          │   312 | assistant: The calcu│
└──────────────────────────┴────────────────────────────┘

Esc: Close  Enter: Select  ←/→: Switch column
Keys:
  • Type to search (regex supported)
  • ↑/↓ - Navigate results
  • ←/→ - Switch between prompt and completion results
  • Enter - Jump to selected match
  • Esc - Close search
Matches are highlighted for 3 seconds after selection.

Copy Mode

Press c to enter copy mode:
┌─ Copy Mode ──────────────────────────────────────────┐
│ Tab: switch columns                                  │
│ Highlight text with mouse drag or Shift+Arrow       │
│ Esc: close                                           │
└──────────────────────────────────────────────────────┘

┌─ Prompt ─────────────┬─ Completion ──────────────────┐
│ user: What is 2+2?   │ assistant: The answer is 4.   │
│                      │                                │
│ [Text is selectable] │ [Text is selectable]          │
└──────────────────────┴────────────────────────────────┘

q: Quit  Tab: Next column  c: Copy  Esc: Close
Keys:
  • Tab - Switch between prompt and completion
  • Mouse drag or Shift+Arrow - Select text
  • c - Copy selected text to clipboard
  • Esc - Exit copy mode
  • q - Quit

Display Features

Message Formatting

Messages are formatted with role-based styling:
  • user messages - Standard text
  • assistant messages - assistant: prefix in bold
  • tool calls - tool call: prefix with function name and arguments
  • tool results - tool result: prefix in dimmed style
  • errors - Red text with error: prefix

Metrics Display

The details panel shows:
  • Reward - Scalar reward from rubric (formatted to 3 decimals)
  • Answer - Ground truth answer from task
  • Info - Additional environment-specific data (formatted as JSON)
  • Task - Full task data if available
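Because results.jsonl stores one rollout per line, the same figures can be pulled out with standard tools. The sketch below computes the 3-decimal average reward shown in the run details; the `reward` field name is assumed from the display, and for real files `jq` is more robust than this sed extraction.

```shell
# A tiny results.jsonl stand-in, one JSON record per line:
tmp=$(mktemp)
printf '%s\n' \
  '{"reward": 1.0, "answer": "4"}' \
  '{"reward": 0.5, "answer": "7"}' > "$tmp"

# Extract each reward and average to 3 decimals, like the "Avg reward" line:
avg=$(sed -n 's/.*"reward": \([0-9.]*\).*/\1/p' "$tmp" |
      awk '{ s += $1; n++ } END { printf "%.3f", s / n }')
echo "$avg"   # 0.750
```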

Lazy Loading

Results are loaded lazily for performance:
  • File handles opened on-demand
  • Lines read as needed
  • Metadata count used when available
  • Caching for already-read records
This allows the TUI to handle evaluations with thousands of rollouts efficiently.
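The same idea can be seen from the shell: a single record can be fetched from a large JSONL file without reading the rest. This is an illustrative sketch, not the TUI's actual loader.

```shell
# Simulate a large evaluation: 10,000 one-line JSON records.
tmp=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 10000; i++) printf "{\"id\": %d}\n", i }' > "$tmp"

# Fetch only record 42; `q` makes sed stop after that line,
# so the remaining 9,958 lines are never read.
record=$(sed -n '42{p;q;}' "$tmp")
echo "$record"   # {"id": 42}
```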

Themes

Toggle between dark and light themes with d:
  • black-warm (default) - Dark theme with warm accent colors
  • white-warm - Light theme with matching warm tones

Examples

Basic Usage

# Run an evaluation
prime eval run gsm8k -m gpt-4.1-mini -n 10 -s

# Launch TUI
prime eval tui

View Specific Results

The TUI automatically finds all results, so just launch it:
prime eval tui
Navigate to your environment → model → run.

Search for Patterns

  1. Launch TUI and navigate to a run
  2. Press s to open search
  3. Type a regex pattern (e.g., error|failed)
  4. Navigate results with arrow keys
  5. Press Enter to jump to a match

Copy Completions

  1. Navigate to a rollout
  2. Press c to enter copy mode
  3. Tab to completion column
  4. Select text with mouse or Shift+Arrow
  5. Press c to copy to clipboard

File Locations

Results are saved by prime eval run --save-results to:
./outputs/evals/
└── gsm8k--openai--gpt-4.1-mini/
    └── abc123/
        ├── results.jsonl
        └── metadata.json
Or per-environment:
./environments/gsm8k/outputs/evals/
└── gsm8k--openai--gpt-4.1-mini/
    └── abc123/
        ├── results.jsonl
        └── metadata.json
The TUI scans both locations.

Performance

The TUI is optimized for large evaluations:
  • Lazy file reading - Only loads visible data
  • Incremental parsing - Reads JSONL line-by-line
  • Metadata caching - Avoids re-parsing metadata files
  • Efficient rendering - Textual’s virtual DOM
Evaluations with 10,000+ rollouts are handled smoothly.

Troubleshooting

No Evaluations Found

┌─ Select Environment ─────────────────────────────────┐
│ No completed evals found                            │
└──────────────────────────────────────────────────────┘
Solution: Run an evaluation with --save-results:
prime eval run gsm8k -m gpt-4.1-mini -n 5 -s
prime eval tui

Corrupted Results

If results.jsonl is malformed, the affected rollout is shown as {}. Solution: inspect the file manually:
jq . outputs/evals/gsm8k--openai--gpt-4.1-mini/abc123/results.jsonl
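To narrow down which line is broken, a crude per-line check works even without jq (a real parse, e.g. `jq empty`, is more reliable; the file below is a synthetic example):

```shell
# Three records, the second one truncated mid-object:
tmp=$(mktemp)
printf '%s\n' '{"reward": 1.0}' '{"reward": 0.5' '{"reward": 0.9}' > "$tmp"

# Crude syntax check: flag lines that don't look like closed JSON objects.
bad=$(grep -vn '^{.*}$' "$tmp" | cut -d: -f1)
echo "bad line(s): $bad"   # bad line(s): 2
```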

Terminal Size

If the TUI appears cramped, resize your terminal:
resize -s 40 160  # 40 rows, 160 columns
prime eval tui
Recommended minimum: 24 rows × 80 columns

Search Not Working

Search uses regex with case-insensitive matching. Test your pattern:
echo "test string" | grep -iE 'pattern'
If the pattern is invalid, an error appears below the search box.

Keyboard Reference

Global

  • q - Quit application
  • d - Toggle dark/light theme
  • ↑/↓ or j/k - Move selection
  • Enter - Select item
  • b or Backspace - Go back one screen

Rollout Viewer

  • ←/→ or h/l - Previous/next rollout
  • s - Open search
  • c - Enter copy mode
  • b or Backspace - Return to run list

Search Mode

  • Type - Enter search pattern
  • ↑/↓ - Navigate results
  • ←/→ - Switch prompt/completion
  • Enter - Jump to selected match
  • Esc - Close search

Copy Mode

  • Tab or Shift+Tab - Switch column
  • Mouse drag - Select text
  • Shift+Arrow - Select text (keyboard)
  • c - Copy to clipboard
  • Esc - Exit copy mode

Tips

  • Use search (s) to quickly find errors or specific patterns
  • Copy mode (c) allows extracting full completions for analysis
  • Results persist across runs - view historical evaluations anytime
  • The TUI works great with tmux/screen for remote evaluation monitoring
  • Use --state-columns when running evals to save additional fields visible in the TUI
