Ollama provides access to a wide variety of open-source language models that can be run locally on your machine. Models are the core building blocks that power AI applications.

Model Library

Ollama hosts an extensive library of pre-trained models that are optimized for local execution. Browse the complete collection at ollama.com/library. Popular models include:
  • Llama - Meta’s family of models (Llama 2, Llama 3, Llama 3.1, Llama 3.2, Llama 4)
  • Gemma - Google’s efficient language models (Gemma 1, Gemma 2, Gemma 3)
  • Qwen - Alibaba’s Qwen family (Qwen 2, Qwen 3, Qwen 2.5-VL, Qwen 3-VL)
  • Mistral - Mistral AI’s models (Mistral 1, Mistral 2, Mistral 3, Mixtral)
  • DeepSeek - DeepSeek’s reasoning models (DeepSeek v3.1, DeepSeek R1)
  • Phi - Microsoft’s small language models
  • CodeLlama - Specialized for code generation

Model Naming Convention

Ollama uses a structured naming format to identify models:
[host/][namespace/]model[:tag]

Components

host
string
default:"registry.ollama.ai"
The registry host where the model is stored. Typically omitted for official models.
namespace
string
default:"library"
The organization or user that published the model. Official models use library.
model
string
required
The model name (e.g., llama3, gemma3, qwen3).
tag
string
default:"latest"
The model variant, typically indicating size or quantization (e.g., 7b, 13b, 70b).

Examples

registry.ollama.ai/library/llama3:8b (fully qualified form)
llama3:8b (equivalent short form; host and namespace fall back to their defaults)
llama3 (resolves to registry.ollama.ai/library/llama3:latest)
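The resolution rules above can be sketched in Python. This is an illustrative parser, not Ollama's actual implementation; the defaults follow the component table above, and hosts with port numbers are not handled in this sketch.

```python
def parse_model_name(name: str,
                     default_host: str = "registry.ollama.ai",
                     default_namespace: str = "library",
                     default_tag: str = "latest") -> dict:
    """Split an Ollama-style reference [host/][namespace/]model[:tag]
    into its components, filling in the documented defaults.
    Note: hosts with ports (e.g. localhost:11434) are not handled here."""
    # Separate the optional :tag suffix first.
    base, _, tag = name.partition(":")
    tag = tag or default_tag

    parts = base.split("/")
    if len(parts) == 1:        # "llama3"
        host, namespace, model = default_host, default_namespace, parts[0]
    elif len(parts) == 2:      # "library/llama3"
        host, namespace, model = default_host, parts[0], parts[1]
    else:                      # "registry.ollama.ai/library/llama3"
        host, namespace, model = parts[0], parts[1], "/".join(parts[2:])
    return {"host": host, "namespace": namespace, "model": model, "tag": tag}

print(parse_model_name("llama3:8b"))
```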

Discovering Models

Using the CLI

List all locally available models:
ollama list
Search for models by browsing the library at ollama.com/library; the CLI itself does not provide a search command.
Pull a model from the library:
ollama pull llama3:8b

Using the API

List locally available models:
curl http://localhost:11434/api/tags
List models currently loaded into memory:
curl http://localhost:11434/api/ps
Get details about a specific model:
curl http://localhost:11434/api/show -d '{
  "model": "llama3:8b"
}'
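The /api/show response includes a details object describing the model. The sketch below summarizes that object; the sample response is trimmed and its field values are illustrative, not taken from a real model.

```python
import json

# A trimmed /api/show response; the "details" object is the part read here.
# Field values below are illustrative, not from a real model.
sample = json.loads("""
{
  "details": {
    "format": "gguf",
    "family": "llama",
    "parameter_size": "8B",
    "quantization_level": "Q4_0"
  }
}
""")

def summarize(show_response: dict) -> str:
    """Build a one-line summary from the "details" object of /api/show."""
    d = show_response["details"]
    return f'{d["family"]} {d["parameter_size"]} ({d["quantization_level"]}, {d["format"]})'

print(summarize(sample))   # llama 8B (Q4_0, gguf)
```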

Model Architecture Support

Ollama supports multiple model architectures through specialized backends:
  • Llama - Including Llama 2, 3, 3.1, 3.2, and 4
  • Mistral - Including Mistral 1, 2, 3, and Mixtral
  • Gemma - Including Gemma 1, 2, and 3
  • Qwen - Including Qwen 2, 3, and variants
  • Phi - Microsoft’s Phi models
  • DeepSeek - DeepSeek v3.1 and R1
  • OLMo - Ai2’s Open Language Model family

Model Format

Ollama uses the GGUF (GPT-Generated Unified Format) file format for storing model weights. GGUF is an efficient format designed for inference:
  • Quantization support - Models can be quantized to reduce memory usage
  • Fast loading - Optimized for quick model loading
  • Metadata - Includes model configuration and tokenizer data
  • Cross-platform - Works across different operating systems

Model Size and Quantization

Models are available in different sizes and quantization levels. Quantization reduces numerical precision to decrease memory usage and improve speed, with minimal impact on quality.

Common Sizes

  • 7B - 7 billion parameters (~4-8 GB RAM)
  • 13B - 13 billion parameters (~8-16 GB RAM)
  • 70B - 70 billion parameters (~40-80 GB RAM)

Quantization Levels

  • Q4_0, Q4_1 - 4-bit quantization (smallest, fastest)
  • Q5_0, Q5_1 - 5-bit quantization (balanced)
  • Q8_0 - 8-bit quantization (larger, higher quality)
  • F16 - 16-bit floating point (highest quality)
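A rough rule of thumb ties the sizes and quantization levels above together: weight memory is approximately parameters × bits per weight ÷ 8. The bits-per-weight figures below are approximate averages (an assumption; real GGUF files mix precisions per tensor and add metadata overhead), so treat the results as estimates only.

```python
# Approximate average bits per weight for common GGUF quantization levels.
# These are rough figures (assumption); real files mix precisions per tensor.
BITS_PER_WEIGHT = {"Q4_0": 4.5, "Q5_0": 5.5, "Q8_0": 8.5, "F16": 16.0}

def estimate_weight_gb(n_params: float, quant: str) -> float:
    """Estimate weight size in GB: parameters * bits-per-weight / 8 bytes."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9

for quant in ("Q4_0", "Q8_0", "F16"):
    print(f"7B at {quant}: ~{estimate_weight_gb(7e9, quant):.1f} GB")
```

This matches the ballpark figures above: a 7B model needs roughly 4 GB at 4-bit quantization and roughly 14 GB at F16, before runtime overhead such as the KV cache.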

Creating Custom Models

You can create custom models from:

Existing Models

Build on top of base models with custom configurations

Safetensors

Import models from Hugging Face and other sources

GGUF Files

Use pre-quantized GGUF model files
ollama create mymodel -f ./Modelfile
See the Modelfile documentation for details on creating custom models.

Model Discovery

Models are discovered from multiple sources:
  1. Local models - Stored in ~/.ollama/models
  2. Official library - Models from ollama.com/library
  3. Custom registries - Self-hosted model registries
  4. Cloud models - Remote models accessed via API

Model Storage Location

By default, models are stored in:
~/.ollama/models

Model Details

View detailed information about a model:
ollama show llama3:8b
This displays:
  • Model architecture and family
  • Parameter count and quantization level
  • Template and system prompt
  • Model parameters (temperature, context length, etc.)
  • License information
View the Modelfile used to build the model:
ollama show --modelfile llama3:8b

Managing Models

Copy a Model

Create a copy with a different name:
ollama cp llama3:8b my-llama3

Delete a Model

Remove a model from local storage:
ollama rm llama3:8b

Pull Updates

Update to the latest version:
ollama pull llama3:8b

Running Models

Interactive Mode

ollama run llama3:8b

Single Query

ollama run llama3:8b "Explain quantum computing"

Via API

curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Why is the sky blue?"
}'
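By default, /api/generate streams its reply as newline-delimited JSON: each line carries a "response" text fragment, and the final line has "done": true. The sketch below assembles such a stream; the sample chunks are fabricated for illustration.

```python
import json

# Hypothetical stream chunks as /api/generate would emit them, one JSON
# object per line; the text fragments here are made up for illustration.
stream = """\
{"model":"llama3:8b","response":"The sky ","done":false}
{"model":"llama3:8b","response":"is blue because...","done":false}
{"model":"llama3:8b","response":"","done":true,"total_duration":123456789}
"""

def assemble(lines: str) -> str:
    """Concatenate the "response" fragments of an NDJSON generate stream."""
    text = []
    for line in lines.splitlines():
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

print(assemble(stream))   # The sky is blue because...
```

Passing "stream": false in the request body instead returns the full reply as a single JSON object.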

Model Capabilities

Different models support different capabilities:
  • Vision - Models like Llama 3.2-Vision and Qwen 2.5-VL support image input and understanding. See the Vision documentation for details.
  • Tool use - Many models support function calling and tool use for agentic applications. See the Tool Calling documentation for details.
  • Thinking - Models like DeepSeek R1, GPT-OSS, and Qwen 3 support extended reasoning. See the Thinking documentation for details.
  • Embeddings - Specialized models generate embeddings for semantic search and similarity. See the Embeddings documentation for details.

Best Practices

Always verify that you have sufficient RAM and VRAM before running large models. Use ollama ps to monitor resource usage.
Start with smaller models (7B-13B) and increase size based on performance needs and available resources.
Use quantized models (Q4, Q5) for better performance on consumer hardware while maintaining good quality.

Next Steps

Modelfile

Learn how to create and customize models

Context & Memory

Understand how conversation context works

API Reference

Integrate models into your applications

CLI Reference

Master the command-line interface
