Synopsis
Description
Provides curated recommendations of the best models for your system based on fit quality, use case, runtime, and other filters. This command is optimized for programmatic use and defaults to JSON output.
Options
Number of recommendations to return.
Filter by use case category. Options:
- `general` - General-purpose models
- `coding` (alias: `code`) - Code generation and analysis
- `reasoning` (alias: `reason`) - Complex reasoning tasks
- `chat` - Conversational models
- `multimodal` (alias: `vision`) - Vision and multimodal
- `embedding` (alias: `embed`) - Text embeddings
Filter by minimum fit level. Options:
- `perfect` - Only perfect fits
- `good` - Good or better
- `marginal` - Marginal or better (default)
Filter by inference runtime. Options:
- `any` - All runtimes (default)
- `mlx` - MLX only (Apple Silicon)
- `llamacpp` (aliases: `llama.cpp`, `llama_cpp`) - llama.cpp only
Output as JSON. Default is true for this command.
Override GPU VRAM size (e.g., `32G`, `32000M`, `1.5T`).
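The VRAM override accepts size strings with a unit suffix. A minimal parsing sketch, assuming binary (1024-based) units and a bare number meaning bytes; this is a hypothetical helper, not the tool's actual code:

```python
def parse_vram(size: str) -> int:
    """Parse a size string like '32G', '32000M', or '1.5T' into bytes.

    Hypothetical helper; the tool's accepted grammar may differ.
    """
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    size = size.strip().upper()
    if size and size[-1] in units:
        return int(float(size[:-1]) * units[size[-1]])
    return int(size)  # bare number: assume bytes

print(parse_vram("32G"))  # 34359738368
```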
Cap context length used for memory estimation (tokens). Must be >= 1.
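Capping context length matters because KV-cache memory grows linearly with context. A rough illustrative estimate using generic transformer arithmetic (shapes and constants are examples, not this tool's exact model):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    # 2x accounts for the separate key and value tensors per layer;
    # bytes_per_elem=2 assumes fp16/bf16 cache entries.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem

# Example: a Llama-3-8B-like shape at full 128k vs. capped 8k context
full = kv_cache_bytes(32, 8, 128, 131072)
capped = kv_cache_bytes(32, 8, 128, 8192)
print(full / 2**30, capped / 2**30)  # 16.0 1.0 (GiB)
```

Capping the context used for estimation can therefore change whether a model is judged to fit.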
Usage Examples
Basic Recommendations
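A plausible invocation, assuming the CLI is invoked as `mytool recommend` (placeholder names; only the option behavior comes from this page):

```shell
# Get the default set of recommendations (JSON output by default)
mytool recommend

# Limit how many recommendations are returned (flag name assumed)
mytool recommend --limit 5
```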
Filter by Use Case
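Sketched with the use-case values documented above; the `--use-case` flag name and `mytool recommend` invocation are assumptions:

```shell
# Coding models only
mytool recommend --use-case coding

# Aliases also work
mytool recommend --use-case code
```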
Filter by Fit Level
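The `--min-fit` flag is named in the Filtering Behavior section below; the `mytool recommend` invocation is a placeholder:

```shell
# Only perfect fits
mytool recommend --min-fit perfect

# Good or better
mytool recommend --min-fit good
```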
Filter by Runtime
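Sketched with the runtime values documented above; the `--runtime` flag name and `mytool recommend` invocation are assumptions:

```shell
# MLX only (Apple Silicon)
mytool recommend --runtime mlx

# llama.cpp only, via an alias
mytool recommend --runtime llama.cpp
```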
Combined Filters
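Filters compose; a plausible combined invocation (flag names other than `--min-fit` are assumptions):

```shell
# Coding models with good-or-better fit, MLX runtime only
mytool recommend --use-case coding --min-fit good --runtime mlx
```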
Human-Readable Output
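Since JSON is the default for this command, human-readable output requires disabling it; the exact flag syntax is an assumption:

```shell
# Disable the default JSON output (flag name assumed)
mytool recommend --json=false
```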
Example Output
JSON Format (Default)
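The exact schema is not reproduced here; an illustrative shape, with all field names and values hypothetical:

```json
{
  "recommendations": [
    {
      "model": "example/model-name",
      "use_case": "coding",
      "fit": "perfect",
      "runtime": "mlx"
    }
  ]
}
```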
Filtered by Use Case
Human-Readable Format
Filtering Behavior
Backend Compatibility
- MLX-only models are automatically hidden on non-Apple Silicon systems
- CUDA/ROCm models require appropriate GPU drivers
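The MLX visibility rule above implies a platform check along these lines (a sketch, not the tool's actual implementation):

```python
import platform

def mlx_supported() -> bool:
    # MLX requires macOS ("Darwin") on Apple Silicon (arm64)
    return platform.system() == "Darwin" and platform.machine() == "arm64"

# MLX-only models would be hidden when this returns False
print(mlx_supported())
```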
Fit Level Filtering
- `--min-fit perfect`: Only models with Perfect fit level
- `--min-fit good`: Perfect and Good fit levels
- `--min-fit marginal`: Perfect, Good, and Marginal (default)
- Too Tight models are excluded by default
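The fit levels above form an ordered scale, so the filter is a simple rank comparison; a sketch with hypothetical names:

```python
# Fit levels from best to worst; "too_tight" ranks below the
# default threshold ("marginal"), so it is excluded by default.
FIT_ORDER = ["perfect", "good", "marginal", "too_tight"]

def passes_min_fit(fit: str, min_fit: str = "marginal") -> bool:
    # A model passes if its fit ranks at or above the threshold
    return FIT_ORDER.index(fit) <= FIT_ORDER.index(min_fit)

models = [("a", "perfect"), ("b", "marginal"), ("c", "too_tight")]
print([name for name, fit in models if passes_min_fit(fit)])  # ['a', 'b']
```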
Use Case Categories
- General: Versatile models for various tasks
- Coding: Code generation, analysis, completion
- Reasoning: Complex logical and mathematical reasoning
- Chat: Conversational and instruction-following
- Multimodal: Vision and image understanding
- Embedding: Text embedding generation
