Model Selection
Use the /model command to configure which model Gemini CLI uses:
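For example, inside an interactive session (in current releases, /model opens an interactive model picker; exact behavior may vary by version):

```
/model
```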
You can also set the model with the --model flag when starting Gemini CLI:
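For example (the model name here is illustrative):

```shell
# Launch Gemini CLI with an explicitly chosen model
gemini --model gemini-3-flash-preview
```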
The /model command and --model flag do not override the model used by sub-agents. You may see other models in your usage reports even when specifying a particular model.

Model Options

Gemini CLI offers three main approaches to model selection:

- Auto (Gemini 3)
- Auto (Gemini 2.5)
- Manual Selection
Auto (Gemini 3) - Recommended
Let the system automatically choose the best Gemini 3 model for your task.

Available models:

- gemini-3-pro-preview - For complex reasoning tasks
- gemini-3-flash-preview - For fast, simple operations
Best for:

- Most users and general-purpose development
- Projects with a mix of complex and simple tasks
- When you want optimal balance of speed and intelligence
Examples of mixed tasks:

- Building a web application (architecture planning + CSS generation)
- Debugging issues (complex analysis + quick file edits)
- Code reviews (deep understanding + formatting fixes)
Model Families
Pro Models
Pro models offer the highest levels of reasoning and creativity.

When to Use Pro
- Complex multi-stage debugging
- Architectural design and planning
- Advanced code refactoring
- Deep codebase analysis
- Novel problem-solving requiring creativity
Pro Characteristics

- Higher reasoning capabilities
- Better for complex tasks
- Slower response times
- Higher token costs
Flash Models
Flash models provide fast responses for simpler tasks.

When to Use Flash
- Simple code generation
- Quick file edits
- Format conversions (JSON to YAML)
- Basic questions and explanations
- Rapid iteration on small changes
Flash Characteristics

- Very fast response times
- Optimized for simple tasks
- Lower token costs
- Good for high-volume operations
Auto Mode (Recommended)
Auto mode intelligently selects between Pro and Flash based on task complexity.

Why Use Auto
- System automatically matches model to task complexity
- Optimal balance of speed and intelligence
- Cost-effective for mixed workloads
- No manual switching required
How Auto Works

- Simple tasks automatically use Flash for speed
- Complex tasks automatically use Pro for quality
- The system learns from task patterns
- You get the best model for each specific request
Model Context Windows
Gemini 3 models feature a 1M token context window, allowing you to:

- Work with extremely large codebases
- Maintain very long conversations
- Include extensive documentation in context
- Process large files without truncation
Best Practices
Default to Auto
For most users, the Auto option provides the best experience:

- Automatically optimizes for each task
- Balances speed and quality
- Handles mixed workloads efficiently
- Reduces cognitive overhead of model selection
Switch to Pro for Better Results
If Auto mode isn’t giving you the results you need, switch to Pro for tasks like:

- Debugging complex, multi-component issues
- Designing system architecture
- Reverse engineering unfamiliar code
- Solving novel problems without clear patterns
- Working with complex business logic
Switch to Flash for Speed
For simple, repetitive tasks that need quick responses:

- Converting between data formats
- Generating boilerplate code
- Making simple text edits
- Answering straightforward questions
- Performing bulk, simple operations
Model Configuration
You can configure your default model in several ways:

Command-Line Flag

Specify a model when launching with the --model flag.

Environment Variable

Set a default by exporting the GEMINI_MODEL variable in your shell profile.

Settings File

Configure model.name in ~/.gemini/settings.json.
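The three methods can be sketched together as follows (model names are illustrative, and the settings fragment assumes model.name maps to a nested JSON key):

```shell
# 1) Command-line flag (highest priority):
gemini --model gemini-3-pro-preview

# 2) Environment variable, e.g. in ~/.bashrc or ~/.zshrc:
export GEMINI_MODEL="gemini-3-flash-preview"

# 3) Settings file (~/.gemini/settings.json), shown here as a comment:
# {
#   "model": {
#     "name": "gemini-3-pro-preview"
#   }
# }
```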
Precedence Order

When multiple configurations exist, they’re applied in this order:

1. --model flag (highest priority)
2. GEMINI_MODEL environment variable
3. model.name in settings.json
4. Default (auto)
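For example, when both an environment variable and a flag are set, the flag takes precedence (model names illustrative):

```shell
export GEMINI_MODEL="gemini-3-flash-preview"
gemini --model gemini-3-pro-preview   # this session uses gemini-3-pro-preview
```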
Model Fallback
Gemini CLI includes automatic model fallback for resilience:

Model Failure Detected
If your selected model fails (quota exceeded, rate limiting, server errors), the CLI detects this automatically.
User Confirmation
You’re prompted to switch to a fallback model (unless configured for silent fallback).
Internal utility calls (prompt completion, classification) use silent fallback:
gemini-2.5-flash-lite → gemini-2.5-flash → gemini-2.5-pro, without changing your configured model.

Model Capabilities
All Gemini models support:

Multimodal Input
Process text, images, PDFs, and audio files as input
Tool Calling
Execute tools for file operations, shell commands, and web access
Long Context
Handle up to 1M tokens in Gemini 3 models
Code Generation
Generate, analyze, and modify code across multiple languages
Quota and Pricing
Free Tier (Google Login)
When using Google Login (OAuth):

- 60 requests/minute
- 1,000 requests/day
- Access to Gemini 3 models with 1M token context
- No API key management required
Gemini API Key
When using a Gemini API key:

- 1,000 requests/day (free tier)
- Mix of Flash and Pro models
- Usage-based billing available for higher limits
- Model-specific pricing applies
Vertex AI
- Enterprise features and compliance
- Scalable with billing account
- Higher rate limits
- Integration with Google Cloud
Next Steps

- How It Works: Understand the architecture and request flow
- Tools: Learn about available tools and how to use them
- Configuration: Explore all configuration options
- Authentication: Set up different authentication methods