Overview
The prime eval tui command launches an interactive terminal user interface for browsing evaluation results. It automatically discovers and displays all evaluation runs from your workspace.
Usage
Options
No required options. The TUI will auto-discover results from standard locations.
What It Shows
The TUI provides a hierarchical browser:
- Environment selection - All environments with completed evaluations
- Model selection - All models evaluated for that environment
- Run selection - All evaluation runs for that environment + model combo
- Rollout viewer - Individual prompts, completions, and metrics
Discovery
Results are discovered from:
- ./outputs/evals/ - Global output directory
- ./environments/*/outputs/evals/ - Per-environment output directories
Each run directory is expected to contain:
- results.jsonl - Rollout data
- metadata.json - Evaluation metadata
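The discovery rule above can be approximated outside the TUI, for example to check which runs it is likely to pick up. The following is a minimal sketch and not the TUI's actual implementation: the function name discover_runs is hypothetical, and the recursive search is an assumption, since the exact depth of run directories below the two roots is not stated here.

```python
from pathlib import Path


def discover_runs(workspace: Path) -> list[Path]:
    """Return directories under the documented roots that hold both result files."""
    # The two root patterns come from the Discovery list; how deep run
    # directories sit below them is an assumption, so search recursively.
    roots = [workspace / "outputs" / "evals"]
    roots += sorted(workspace.glob("environments/*/outputs/evals"))

    runs = []
    for root in roots:
        if not root.is_dir():
            continue
        for results in sorted(root.rglob("results.jsonl")):
            # Only count a directory when metadata.json sits next to results.jsonl.
            if (results.parent / "metadata.json").is_file():
                runs.append(results.parent)
    return runs


if __name__ == "__main__":
    for run_dir in discover_runs(Path(".")):
        print(run_dir)
```

Running the script from the workspace root prints one candidate run directory per line. An empty output suggests that no directory under the two roots contains both files.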
Navigation
Environment Selection Screen
- ↑/↓ or j/k - Navigate
- Enter - Select environment
- q - Quit
Model Selection Screen
- ↑/↓ - Navigate
- Enter - Select model
- b or Backspace - Go back
- q - Quit
Run Selection Screen
- ↑/↓ - Navigate runs
- Enter - View rollout details
- b or Backspace - Go back
- q - Quit
Rollout Viewer
The main screen shows prompts, completions, and metrics side-by-side:
- ←/→ or h/l - Navigate between rollouts
- s - Search prompt/completion text
- c - Enter copy mode
- b or Backspace - Go back to run list
- q - Quit
- d - Toggle dark/light theme
Search Mode
Press s to search within prompts and completions:
- Type to search (regex supported)
- ↑/↓ - Navigate results
- ←/→ - Switch between prompt and completion results
- Enter - Jump to selected match
- Esc - Close search
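Because search patterns are regular expressions matched case-insensitively, it can help to try a pattern on sample text before using it in the TUI. The sketch below uses Python's re module as a stand-in. The helper name find_matches is hypothetical, and the TUI's regex engine may differ in details.

```python
import re


def find_matches(pattern: str, text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for each case-insensitive match."""
    try:
        compiled = re.compile(pattern, re.IGNORECASE)
    except re.error as exc:
        # Surface invalid patterns with a clearer message.
        raise ValueError(f"invalid pattern {pattern!r}: {exc}") from exc
    return [(m.start(), m.end(), m.group(0)) for m in compiled.finditer(text)]


if __name__ == "__main__":
    sample = "Step 1 FAILED with Error: timeout"
    for start, end, hit in find_matches(r"error|failed", sample):
        print(start, end, hit)
```

A pattern that raises ValueError here is unlikely to behave as intended in the search box either.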
Copy Mode
Press c to enter copy mode:
- Tab - Switch between prompt and completion
- Mouse drag or Shift+Arrow - Select text
- c - Copy selected text to clipboard
- Esc - Exit copy mode
- q - Quit
Display Features
Message Formatting
Messages are formatted with role-based styling:
- user messages - Standard text
- assistant messages - assistant: prefix in bold
- tool calls - tool call: prefix with function name and arguments
- tool results - tool result: prefix in dimmed style
- errors - Red text with error: prefix
Metrics Display
The details panel shows:
- Reward - Scalar reward from rubric (formatted to 3 decimals)
- Answer - Ground truth answer from task
- Info - Additional environment-specific data (formatted as JSON)
- Task - Full task data if available
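To illustrate the formatting described above, here is a small sketch that renders a rollout record as text. The record keys reward, answer, and info are assumptions about the stored data, and format_details is a hypothetical helper, not part of the TUI.

```python
import json


def format_details(record: dict) -> str:
    """Render reward, answer, and info roughly as the details panel describes."""
    lines = []
    reward = record.get("reward")  # assumed key name
    if reward is not None:
        lines.append(f"Reward: {reward:.3f}")  # three decimals
    if "answer" in record:
        lines.append(f"Answer: {record['answer']}")
    if "info" in record:
        # Environment-specific data is shown as pretty-printed JSON.
        lines.append("Info: " + json.dumps(record["info"], indent=2, sort_keys=True))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_details({"reward": 0.5, "answer": "42", "info": {"split": "test"}}))
```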
Lazy Loading
Results are loaded lazily for performance:
- File handles opened on-demand
- Lines read as needed
- Metadata count used when available
- Caching for already-read records
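The lazy-loading strategy listed above can be sketched as a small reader that opens the file on first access, reads only as many lines as a request needs, and caches parsed records. This is an illustrative sketch, not the TUI's code: the class name LazyResults is hypothetical, and turning a malformed line into {} mirrors the behavior described under Corrupted Results rather than a confirmed implementation detail.

```python
import json
from pathlib import Path


class LazyResults:
    """Read records from a JSONL file on demand and cache what was read."""

    def __init__(self, path: Path):
        self.path = path
        self._file = None        # opened on first access
        self._cache = []         # records parsed so far, in file order
        self._exhausted = False  # True once end of file was reached

    def get(self, index: int) -> dict:
        """Return record `index`, reading only as many lines as needed."""
        if index < 0:
            raise IndexError(index)
        while len(self._cache) <= index and not self._exhausted:
            if self._file is None:
                self._file = self.path.open("r", encoding="utf-8")
            line = self._file.readline()
            if not line:
                self._exhausted = True
                self._file.close()
                break
            if not line.strip():
                continue  # skip blank lines
            try:
                self._cache.append(json.loads(line))
            except json.JSONDecodeError:
                self._cache.append({})  # assumed fallback for malformed lines
        if index >= len(self._cache):
            raise IndexError(index)
        return self._cache[index]
```

Requesting record 1 from a large file parses only the first two lines. Later requests for the same records are served from the cache without touching the file.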
Themes
Toggle between dark and light themes with d:
- black-warm (default) - Dark theme with warm accent colors
- white-warm - Light theme with matching warm tones
Examples
Basic Usage
View Specific Results
The TUI automatically finds all results, so just launch it:
Search for Patterns
- Launch TUI and navigate to a run
- Press s to open search
- Type a regex pattern (e.g., error|failed)
- Navigate results with arrow keys
- Press Enter to jump to a match
Copy Completions
- Navigate to a rollout
- Press c to enter copy mode
- Tab to completion column
- Select text with mouse or Shift+Arrow
- Press c to copy to clipboard
File Locations
Results are saved by prime eval run --save-results to:
Performance
The TUI is optimized for large evaluations:
- Lazy file reading - Only loads visible data
- Incremental parsing - Reads JSONL line-by-line
- Metadata caching - Avoids re-parsing metadata files
- Efficient rendering - Textual’s virtual DOM
Troubleshooting
No Evaluations Found
Make sure the evaluations were run with --save-results:
Corrupted Results
If a line in results.jsonl is malformed, that rollout will show as {}.
Solution: Check the file manually:
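One way to check the file is to parse it line by line and report which lines are not valid JSON. The script below is a minimal sketch. The function name find_bad_lines is hypothetical, and the path to results.jsonl is passed as a command-line argument.

```python
import json
import sys
from pathlib import Path


def find_bad_lines(path: Path) -> list[tuple[int, str]]:
    """Return (line_number, error_message) for each line that is not valid JSON."""
    bad = []
    with path.open("r", encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are skipped rather than reported
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((number, exc.msg))
    return bad


if __name__ == "__main__":
    # Usage: python check_results.py path/to/results.jsonl
    problems = find_bad_lines(Path(sys.argv[1]))
    for number, message in problems:
        print(f"line {number}: {message}")
    if not problems:
        print("all lines parsed as JSON")
```

Each reported line number corresponds to one rollout that would otherwise appear as {} in the viewer.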
Terminal Size
If the TUI appears cramped, resize your terminal:
Search Not Working
Search uses regex with case-insensitive matching. Test your pattern:
Keyboard Reference
Global
- q - Quit application
- d - Toggle dark/light theme
Navigation Screens
- ↑/↓ or j/k - Move selection
- Enter - Select item
- b or Backspace - Go back one screen
Rollout Viewer
- ←/→ or h/l - Previous/next rollout
- s - Open search
- c - Enter copy mode
- b or Backspace - Return to run list
Search Mode
- Type - Enter search pattern
- ↑/↓ - Navigate results
- ←/→ - Switch prompt/completion
- Enter - Jump to selected match
- Esc - Close search
Copy Mode
- Tab or Shift+Tab - Switch column
- Mouse drag - Select text
- Shift+Arrow - Select text (keyboard)
- c - Copy to clipboard
- Esc - Exit copy mode
Tips
- Use search (s) to quickly find errors or specific patterns
- Copy mode (c) allows extracting full completions for analysis
- Results persist across runs - view historical evaluations anytime
- The TUI works great with tmux/screen for remote evaluation monitoring
- Use --state-columns when running evals to save additional fields visible in the TUI