llama.cpp supports a wide variety of LLM architectures, both text-only and multimodal models. Typically, finetunes of the base models listed below are also supported.
For instructions on adding support for new models, see the HOWTO-add-model.md guide in the llama.cpp repository.

Text-Only Models

The following text-generation models are fully supported for inference:
LLaMA, LLaMA 2, and LLaMA 3 - Meta’s foundational large language models
  • LLaMA (original 7B, 13B, 33B, 65B)
  • LLaMA 2 (7B, 13B, 70B)
  • LLaMA 3 (8B, 70B, and larger variants)
These models form the foundation of llama.cpp and provide excellent performance across various tasks.
Mistral AI Models - High-performance open models
Mistral models are known for their strong performance relative to model size.
Gemma - Google’s open-source language models
  • Gemma - Available in multiple sizes
  • Optimized for efficiency and safety

Complete Text Model List

In addition to the families above, llama.cpp supports many more text model architectures, including:
  • Koala, Aquila, Vigogne (French)
  • InternLM2, Orion, Xverse
  • Command-R models, SEA-LION
  • GritLM, OLMo, OLMo 2
  • Poro, Smaug, Grok-1
  • Flan T5, Bitnet b1.58
  • Jais, Bielik, Trillion
  • Ling, LFM2, Hunyuan
  • And many more…
Visit the llama.cpp README for the complete list.

Multimodal Models

llama.cpp supports multimodal models that can process both text and images:
Vision-Language Models
LLaVA models combine vision encoders with language models for visual understanding tasks.
Multimodal support in llama-server is documented in the multimodal documentation.

Model Compatibility

Finetunes

Most finetunes of the base models listed above are automatically supported. This includes:
  • Instruction-tuned variants (e.g., -Instruct, -Chat)
  • Domain-specific adaptations
  • LoRA-merged models
  • RLHF-trained variants

Format Requirements

All models must be in GGUF format to work with llama.cpp. Models in other formats (PyTorch, SafeTensors, etc.) need to be converted first. See Converting Models for details on the conversion process.
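A quick way to verify that a converted file is valid GGUF is to inspect its fixed header: the file starts with the 4-byte magic `GGUF`, followed (in the current spec, little-endian) by a uint32 version, a uint64 tensor count, and a uint64 metadata key-value count. A minimal sketch, assuming that header layout:

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header: magic, version, tensor and KV counts."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # little-endian: uint32 version, uint64 tensor_count, uint64 metadata_kv_count
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```

This only checks the header, not the tensor data itself, but it catches the common mistake of pointing llama.cpp at an unconverted PyTorch or SafeTensors checkpoint.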

Finding Models

# Search for GGUF models on Hugging Face
https://huggingface.co/models?library=gguf&sort=trending

# Search for specific model families
https://huggingface.co/models?sort=trending&search=llama+gguf
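The search URLs above can also be built programmatically. A small hypothetical helper (plain string construction, no Hugging Face client library assumed):

```python
from urllib.parse import urlencode

def hf_gguf_search_url(query=None, sort="trending"):
    """Build a Hugging Face model-search URL filtered to the GGUF library tag."""
    params = {"library": "gguf", "sort": sort}
    if query:
        params["search"] = query  # e.g. "llama gguf" to narrow to a model family
    return "https://huggingface.co/models?" + urlencode(params)
```

For example, `hf_gguf_search_url("llama")` reproduces the second search above.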

Performance Considerations

Different model architectures have varying performance characteristics:
  • Smaller models (1B-7B): Run efficiently on consumer hardware, suitable for edge deployment
  • Medium models (13B-34B): Balance between capability and resource requirements
  • Large models (70B+): Require substantial VRAM or RAM, best quality results
  • MoE models: Larger parameter counts but efficient inference due to sparse activation
For optimal performance, consider using quantized models to reduce memory requirements while maintaining quality.
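As a rough rule of thumb, the memory needed just for the weights is parameters × bits-per-weight ÷ 8; KV cache, activations, and per-layer overhead come on top and are not modeled here. A back-of-the-envelope sketch:

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Weight-only memory estimate in GiB: params * bits / 8 bytes.

    Ignores KV cache, activations, and quantization block overhead,
    so real usage will be somewhat higher.
    """
    return n_params * bits_per_weight / 8 / (1024 ** 3)

# A 7B model at 4-bit quantization needs roughly 3.3 GiB for weights,
# versus roughly 13 GiB at 16-bit -- which is why quantized models
# fit on consumer hardware.
```

The exact bits-per-weight varies by quantization type (for example, the various K-quants use mixed precisions), so treat this as an estimate, not a guarantee.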