Model Library
Ollama hosts an extensive library of pre-trained models that are optimized for local execution. Browse the complete collection at ollama.com/library. Popular models include:

- Llama - Meta’s family of models (Llama 2, Llama 3, Llama 3.1, Llama 3.2, Llama 4)
- Gemma - Google’s efficient language models (Gemma 1, Gemma 2, Gemma 3)
- Qwen - Alibaba’s Qwen family (Qwen 2, Qwen 3, Qwen 2.5-VL, Qwen 3-VL)
- Mistral - Mistral AI’s models (Mistral 1, Mistral 2, Mistral 3, Mixtral)
- DeepSeek - DeepSeek’s reasoning models (DeepSeek v3.1, DeepSeek R1)
- Phi - Microsoft’s small language models
- CodeLlama - Specialized for code generation
Model Naming Convention
Ollama uses a structured naming format of the form registry/namespace/model:tag to identify models, where the leading components may be omitted.

Components

- Registry - The registry host where the model is stored. Typically omitted for official models.
- Namespace - The organization or user that published the model. Official models use library.
- Model - The model name (e.g., llama3, gemma3, qwen3).
- Tag - The model variant, typically indicating size or quantization (e.g., 7b, 13b, 70b).

Examples
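As an illustration of this convention, here is a minimal Python sketch (a hypothetical helper, not part of Ollama) that splits a model reference into the components above:

```python
def parse_model_name(name: str) -> dict:
    """Split an Ollama-style model reference into its components.

    Format: registry/namespace/model:tag, with leading components optional.
    Defaults mirror the conventions described above: official models live
    under the "library" namespace, and the tag defaults to "latest".
    """
    # Separate the optional tag first
    if ":" in name:
        path, tag = name.rsplit(":", 1)
    else:
        path, tag = name, "latest"

    parts = path.split("/")
    if len(parts) == 3:
        registry, namespace, model = parts
    elif len(parts) == 2:
        registry, namespace, model = None, parts[0], parts[1]
    else:
        registry, namespace, model = None, "library", parts[0]
    return {"registry": registry, "namespace": namespace,
            "model": model, "tag": tag}

print(parse_model_name("llama3"))
# {'registry': None, 'namespace': 'library', 'model': 'llama3', 'tag': 'latest'}
print(parse_model_name("registry.ollama.ai/library/llama3:70b"))
# {'registry': 'registry.ollama.ai', 'namespace': 'library', 'model': 'llama3', 'tag': '70b'}
```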
Discovering Models
Using the CLI
List all locally available models:

Using the API
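Assuming a default local install (daemon on port 11434), the listing operations in this section look like the following sketch:

```shell
# CLI: list models stored locally
ollama list

# API: list locally available models
curl http://localhost:11434/api/tags

# API: list models currently loaded into memory
curl http://localhost:11434/api/ps
```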
List loaded models:

Model Architecture Support
Ollama supports multiple model architectures through specialized backends:

- Text Models
- Vision Models
- Embedding Models
- Specialized Models
- Llama - Including Llama 2, 3, 3.1, 3.2, and 4
- Mistral - Including Mistral 1, 2, 3, and Mixtral
- Gemma - Including Gemma 1, 2, and 3
- Qwen - Including Qwen 2, 3, and variants
- Phi - Microsoft’s Phi models
- DeepSeek - DeepSeek v3.1 and R1
- OLMo - Open Language Model
Model Format
Ollama uses the GGUF (GPT-Generated Unified Format) file format for storing model weights. GGUF is an efficient format designed for inference:

- Quantization support - Models can be quantized to reduce memory usage
- Fast loading - Optimized for quick model loading
- Metadata - Includes model configuration and tokenizer data
- Cross-platform - Works across different operating systems
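As a small aside (not part of Ollama's API), GGUF files are easy to recognize programmatically: they begin with the 4-byte ASCII magic `GGUF`, followed by a little-endian version number. A minimal sketch:

```python
import struct

def read_gguf_header(path: str):
    """Return (is_gguf, version) for a file, based on the GGUF magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            return False, None
        # The format version is a little-endian uint32 right after the magic
        (version,) = struct.unpack("<I", f.read(4))
        return True, version
```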
Model Size and Quantization
Models are available in different sizes and quantization levels. Quantization reduces model precision to decrease memory usage and improve speed, with minimal impact on quality.
Common Sizes
- 7B - 7 billion parameters (~4-8 GB RAM)
- 13B - 13 billion parameters (~8-16 GB RAM)
- 70B - 70 billion parameters (~40-80 GB RAM)
Quantization Levels
- Q4_0, Q4_1 - 4-bit quantization (smallest, fastest)
- Q5_0, Q5_1 - 5-bit quantization (balanced)
- Q8_0 - 8-bit quantization (larger, higher quality)
- F16 - 16-bit floating point (highest quality)
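Tags on ollama.com/library typically encode both size and quantization level. A sketch (exact tag names vary by model and may change):

```shell
# Pull a specific size and quantization (illustrative tag)
ollama pull llama3:8b-instruct-q4_0

# Pull the default tag, usually a mid-range quantization
ollama pull llama3
```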
Creating Custom Models
You can create custom models from:

Existing Models
Build on top of base models with custom configurations
Safetensors
Import models from Hugging Face and other sources
GGUF Files
Use pre-quantized GGUF model files
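To make the first option concrete, here is a minimal Modelfile sketch that builds on an existing base model (the base name and parameter values are illustrative):

```
# Modelfile - derive a custom model from an existing base
FROM llama3

# Sampling and context parameters for the derived model
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

# Custom system prompt baked into the model
SYSTEM You are a concise technical assistant.
```

Build it with `ollama create my-assistant -f Modelfile`, then run it like any other model.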
Model Discovery
Models are discovered from multiple sources:

- Local models - Stored in ~/.ollama/models
- Official library - Models from ollama.com/library
- Custom registries - Self-hosted model registries
- Cloud models - Remote models accessed via API
Model Storage Location
By default, models are stored in:

- macOS
- Linux
- Windows
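The storage location can be overridden with the OLLAMA_MODELS environment variable before starting the server (the path below is only an example):

```shell
# Point Ollama at a custom model directory, then start the server
export OLLAMA_MODELS=/data/ollama/models
ollama serve
```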
Model Details
View detailed information about a model:

- Model architecture and family
- Parameter count and quantization level
- Template and system prompt
- Model parameters (temperature, context length, etc.)
- License information
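This information is printed by the `ollama show` command (the model name below is a placeholder):

```shell
# Print architecture, parameters, template, and license for a model
ollama show llama3
```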
Managing Models
Copy a Model
Create a copy with a different name:

Delete a Model
Remove a model from local storage:

Pull Updates
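These three management operations map to `ollama cp`, `ollama rm`, and `ollama pull` respectively (model names below are placeholders):

```shell
# Copy a model under a new name
ollama cp llama3 my-llama3

# Remove a model from local storage
ollama rm my-llama3

# Re-pull a model to update it to the latest version
ollama pull llama3
```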
Update to the latest version:

Running Models
Interactive Mode
Single Query
Via API
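Assuming a default install, the three ways of running a model named above look like:

```shell
# Interactive Mode: start a chat session in the terminal
ollama run llama3

# Single Query: pass the prompt as an argument and exit
ollama run llama3 "Why is the sky blue?"

# Via API: the /api/generate endpoint (default port 11434)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
```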
Model Capabilities
Different models support different capabilities:

Vision
Models like Llama 3.2-Vision and Qwen 2.5-VL support image input and understanding. See Vision documentation for details.
Tool Calling
Many models support function calling and tool use for agentic applications. See Tool Calling documentation for details.
Thinking/Reasoning
Models like DeepSeek R1, GPT-OSS, and Qwen 3 support extended reasoning. See Thinking documentation for details.
Embeddings
Specialized models generate embeddings for semantic search and similarity. See Embeddings documentation for details.
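As a concrete example of the embeddings capability, the API exposes an embeddings endpoint; a sketch, assuming the default port and that an embedding model such as nomic-embed-text has already been pulled:

```shell
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "The sky is blue because of Rayleigh scattering"
}'
```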
Best Practices
Start with smaller models (7B-13B) and increase size based on performance needs and available resources.
Next Steps
Modelfile
Learn how to create and customize models
Context & Memory
Understand how conversation context works
API Reference
Integrate models into your applications
CLI Reference
Master the command-line interface