The TUI (Terminal User Interface) is llmfit’s default mode, providing an interactive, keyboard-driven interface for browsing models, filtering by fit level, searching, and downloading models directly to local runtime providers.

Launching TUI Mode

# Default mode - just run llmfit
llmfit

# With GPU memory override
llmfit --memory 24G

# With context length cap
llmfit --max-context 8192

# Both flags together
llmfit --memory 32G --max-context 16384

Interface Layout

The TUI is divided into four regions:
  1. System Bar (top) - Shows CPU, RAM, GPU hardware, and provider status (Ollama, MLX, llama.cpp)
  2. Search & Filters (second row) - Search box, provider/use-case filters, sort column, fit filter, availability filter, theme selector
  3. Model Table (main area) - Scrollable list of models with scores, quantization, memory usage, and fit indicators
  4. Status Bar (bottom) - Keybinding hints and download progress

Core Keybindings

• ↑ / ↓: Navigate up/down through the model list
• j / k: Vim-style navigation (down/up)
• PgUp / PgDn: Scroll by 10 rows
• Ctrl-U / Ctrl-D: Half-page scroll (up/down by 5 rows)
• g / G: Jump to the top / bottom of the list
• Home / End: Alternative keys for top/bottom navigation

Search & Filtering

• /: Enter search mode. Type to filter models by name, provider, parameters, or use case. All terms must match (AND logic).
• Esc / Enter: Exit search mode (while in search)
• Ctrl-U: Clear the search query
• f: Cycle the fit filter: All → Runnable → Perfect → Good → Marginal → Too Tight → All
• a: Cycle the availability filter: All → GGUF Avail → Installed → All
• s: Cycle the sort column: Score → Params → Mem% → Ctx → Date → Use Case → Score
• P: Open the provider filter popup (capital P). Use ↑/↓ to navigate, Space/Enter to toggle, 'a' to select all.
• U: Open the use-case filter popup (capital U). Use ↑/↓ to navigate, Space/Enter to toggle, 'a' to select all.

Model Details & Planning

• Enter: Toggle the detail view for the selected model. Shows full metadata, scoring breakdown, MoE architecture info, memory requirements, GGUF sources, and installation status.
• p: Open Plan mode for the selected model (hardware planning). Lets you edit context length, quantization, and target TPS to estimate the required hardware.

Provider Integration

• d: Download the selected model. Opens a provider picker if multiple providers are available (Ollama vs llama.cpp). Shows an animated progress indicator during the download.
• r: Refresh installed models from all runtime providers (Ollama, MLX, llama.cpp)
• i: Toggle installed-first sorting. When enabled, models detected in any runtime provider appear at the top.

Display Options

• t: Cycle the color theme: Default → Dracula → Solarized → Nord → Monokai → Gruvbox → Default. Theme selection is saved automatically to ~/.config/llmfit/theme.

Exit

• q / Esc: Quit the TUI (or close the detail view if open)

Search Mode

Press / to enter search mode. The search box border is highlighted while search is active, and typing filters the model list. Search features:
  • Partial matching across model name, provider, parameter count, use case, and category
  • Multiple terms (space-separated) use AND logic - all terms must be present
  • Case-insensitive
  • Real-time filtering as you type
  • Navigate results with ↑/↓ while in search mode
Examples:
llama 8b        # Matches "Llama-3.1-8B", "Llama-3.2-8B", etc.
coding qwen     # Matches Qwen models with "coding" use case
mistral 7b      # Matches Mistral 7B variants
Press Esc or Enter to exit search mode. Press Ctrl-U to clear the search.
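The AND-style matching described above can be sketched as follows (a hypothetical helper for illustration, not llmfit's actual code):

```python
# Sketch of multi-term AND search: every space-separated term must appear
# somewhere in the row's searchable fields, case-insensitively.
def matches(query: str, row_fields: list[str]) -> bool:
    haystack = " ".join(row_fields).lower()
    return all(term in haystack for term in query.lower().split())

rows = [
    ["Llama-3.1-8B", "meta", "8B", "general"],
    ["Qwen2.5-Coder-14B", "alibaba", "14B", "coding"],
]
print([r[0] for r in rows if matches("coding 14b", r)])  # only the Qwen row
```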

Plan Mode

Plan mode inverts the normal fit analysis: instead of asking "what fits my hardware?", it estimates what hardware a given model configuration requires.

Entering Plan mode:
  1. Navigate to a model row
  2. Press p
Editable Fields:
• Tab / ↓ / j: Move to the next field
• Shift-Tab / ↑ / k: Move to the previous field
• ← / →: Move the cursor within the current field
• Type: Edit the current field (digits only for Context/TPS, alphanumeric for Quant)
• Backspace / Delete: Remove characters
• Ctrl-U: Clear the current field
• Esc / q: Exit Plan mode
Fields:
• Context (number, required): Context length in tokens (e.g., 8192, 16384). Affects memory estimation.
• Quant (string): Quantization override (e.g., Q4_K_M, Q8_0, mlx-4bit). Leave empty for auto-selection.
• Target TPS (number): Target decode speed in tokens per second. Used to recommend GPU memory bandwidth.
Plan Output:
  • Minimum Hardware: VRAM/RAM/CPU cores needed to run the model
  • Recommended Hardware: Specs for optimal performance
  • Run Paths: Feasibility of GPU, CPU+GPU offload, and CPU-only modes with estimated TPS and fit level
  • Upgrade Deltas: Specific hardware changes needed to reach better fit targets
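To give a feel for how context length and quantization drive the memory side of a plan, here is a back-of-envelope sketch. The formula and constants (bits per weight, fp16 KV cache, default layer/head counts) are simplifying assumptions for illustration, not llmfit's actual estimator:

```python
# Rough memory estimate: quantized weights plus a fp16 KV cache.
QUANT_BITS = {"Q4_K_M": 4.5, "Q8_0": 8.5, "F16": 16.0}  # approx. bits per weight

def estimate_gib(params_b: float, quant: str, context: int,
                 n_layers: int = 32, n_kv_heads: int = 8, head_dim: int = 128) -> float:
    weights = params_b * 1e9 * QUANT_BITS[quant] / 8          # weight bytes
    kv = 2 * n_layers * n_kv_heads * head_dim * context * 2   # K+V cache, 2 bytes/elem
    return (weights + kv) / 2**30

# An 8B model at Q4_K_M with an 8192-token context:
print(round(estimate_gib(8, "Q4_K_M", 8192), 1))
```

Doubling the context only grows the KV-cache term, which is why raising Context in Plan mode moves the hardware requirement less than switching quantization does.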

Provider Filter Popup

Press P (capital P) to open the provider filter popup. Controls:
  • ↑ / ↓ or j / k: Navigate
  • Space / Enter: Toggle checkbox for current provider
  • a: Toggle all providers (select all / deselect all)
  • Esc / P / q: Close popup
Display:
  • [x] indicates provider is enabled
  • [ ] indicates provider is disabled
  • Active count shown in title: “Providers (N/Total)”
  • Selected row highlighted
Filtering is applied immediately when you toggle providers.

Use-Case Filter Popup

Press U (capital U) to open the use-case filter popup. Controls:
  • ↑ / ↓ or j / k: Navigate
  • Space / Enter: Toggle checkbox for current use case
  • a: Toggle all use cases (select all / deselect all)
  • Esc / U / q: Close popup
Available Use Cases:
  • General
  • Coding
  • Reasoning
  • Chat
  • Multimodal
  • Embedding

Download Functionality

Press d on any model to download it to a local runtime provider.
Provider Selection: If multiple providers are available, a popup appears:
  1. Ollama: Pulls via Ollama API (ollama pull <tag>)
  2. llama.cpp: Downloads GGUF from HuggingFace to local cache
Use ↑ / ↓ to select a provider, Enter to confirm.
Download Progress:
  • Progress bar appears in the “Inst” column for the downloading model
  • Animated spinner shows activity
  • Percentage displayed when available (Ollama and llama.cpp)
  • Status message shown in status bar
  • Row highlighted during download
Install Detection: The “Inst” column shows:
  • Model installed in at least one provider
  • O: Available via Ollama only
  • L: Available via llama.cpp only
  • OL: Available via both Ollama and llama.cpp
  • Checking availability (background probe)
  • Not available for download
  • Animated progress indicator: currently downloading

Fit Filter Modes

• All: Shows all models regardless of fit
• Runnable: Perfect + Good + Marginal (excludes Too Tight)
• Perfect: Only models that meet recommended VRAM/RAM on GPU
• Good: Models that fit with headroom (GPU, MoE offload, or CPU+GPU)
• Marginal: Tight fit or CPU-only (CPU-only always caps at Marginal)
• Too Tight: Models that don't fit in VRAM or system RAM
Cycle with the f key.
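The fit levels above can be approximated by comparing an estimated memory requirement against available VRAM and RAM. This is a hypothetical sketch; the thresholds (e.g. the 80% headroom cutoff) are invented for illustration and are not llmfit's real cutoffs:

```python
# Classify a model's fit from estimated memory need vs. available memory.
def fit_level(required_gib: float, vram_gib: float, ram_gib: float,
              cpu_only: bool = False) -> str:
    if cpu_only:
        # CPU-only runs are capped at Marginal regardless of headroom
        return "Marginal" if required_gib <= ram_gib else "Too Tight"
    if required_gib <= vram_gib * 0.8:
        return "Perfect"       # fits on GPU with comfortable headroom
    if required_gib <= vram_gib:
        return "Good"          # fits on GPU, little headroom
    if required_gib <= vram_gib + ram_gib:
        return "Marginal"      # needs CPU+GPU offload
    return "Too Tight"

print(fit_level(5.2, 24, 64))  # small model, large GPU
```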

Availability Filter Modes

• All: Shows all models
• GGUF Avail: Only models with known GGUF download sources (unsloth, bartowski, etc.)
• Installed: Only models already installed in Ollama, MLX, or llama.cpp
Cycle with the a key.

Sort Columns

• Score: Composite ranking (Quality + Speed + Fit + Context) weighted by use case
• Params: Model parameter count (ascending: smallest first)
• Mem%: Memory utilization percentage (ascending: most efficient first)
• Ctx: Context window length (descending: largest first)
• Date: Release date (descending: newest first)
• Use Case: Grouped by use-case category (General, Coding, Reasoning, etc.)
Cycle with the s key. The current sort column is marked in the table header.
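A use-case-weighted composite score of the kind described for the Score column might look like this sketch (the weights are invented for illustration; llmfit's actual weighting is not documented here):

```python
# Weighted sum of score components, with weights chosen per use case.
WEIGHTS = {
    "general": {"quality": 0.4, "speed": 0.2, "fit": 0.3, "context": 0.1},
    "coding":  {"quality": 0.5, "speed": 0.1, "fit": 0.2, "context": 0.2},
}

def score(components: dict[str, float], use_case: str = "general") -> float:
    w = WEIGHTS[use_case]
    return sum(w[k] * components[k] for k in w)

print(score({"quality": 90, "speed": 70, "fit": 80, "context": 60}))
```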

Themes

Press t to cycle through six built-in color themes:
• Default: Original llmfit colors (blue/cyan accents, balanced contrast)
• Dracula: Dark purple background with pastel accents (popular in IDEs)
• Solarized: Ethan Schoonover's Solarized Dark palette (warm, low-contrast)
• Nord: Arctic cool blue-gray tones (minimal, frosty)
• Monokai: Monokai Pro warm syntax colors (yellow/orange accents)
• Gruvbox: Retro groove palette with warm earth tones (brown/orange)
Your theme selection is saved to ~/.config/llmfit/theme and restored on next launch.

Environment Variables

• OLLAMA_HOST (string): Ollama API URL (default: http://localhost:11434). Set to connect to remote Ollama instances. Example: OLLAMA_HOST="http://192.168.1.100:11434" llmfit
• OLLAMA_CONTEXT_LENGTH (integer): Context length fallback for memory estimation when --max-context is not set. Example: OLLAMA_CONTEXT_LENGTH=8192 llmfit

Tips

Fast navigation: Use g / G to jump to top/bottom, then use j / k for fine control. Combine with search (/) to quickly find specific models.
Multi-term search: Search for “coding 14b” to find all 14B parameter coding models. All terms must match.
Plan mode workflow: Press p on a model, adjust context to your workload (e.g., 32k for long documents), and see if you need more VRAM/RAM.
Install detection latency: The TUI probes download availability in the background. The "Inst" column may briefly show a placeholder while a check is in progress. This is normal and non-blocking.
Ollama, MLX, or llama.cpp must be installed for download (d) and refresh (r) functionality. The TUI works without them, but provider-specific features will be disabled.
