Overview
llama.cpp includes an extensive test suite covering unit tests, integration tests, and backend-specific tests. This guide covers how to build, run, and debug tests effectively.

Before submitting a pull request, you should execute the full CI locally to ensure your changes don’t break existing functionality.
Quick Start
Build and Run All Tests
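A typical sequence, assuming a CPU-only build from the repository root:

```shell
# Configure and build in Release mode
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j $(nproc)

# Run the full test suite
ctest --test-dir build --output-on-failure
```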
Run Specific Tests
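Individual tests can be selected with a name regex via ctest (the test name below is an example):

```shell
# Run one test by name
ctest --test-dir build -R test-tokenizer-0 --output-on-failure

# List all registered tests without running them
ctest --test-dir build -N
```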
Test Categories
llama.cpp has several categories of tests:
Unit Tests
C++ Unit Tests - Test individual components and functions

Examples:
- test-tokenizer-0 - Tokenizer validation
- test-sampling - Sampling algorithms
- test-grammar-parser - Grammar parsing
- test-arg-parser - Command-line argument parsing
- test-rope - Rotary position embeddings
- test-quantize-fns - Quantization functions
Location: tests/test-*.cpp
Backend Operations Tests
Backend Ops Tests - Verify consistency across different backends (CPU, CUDA, Metal, etc.)

The test-backend-ops tool checks that different backend implementations of ggml operators produce consistent results.
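Common invocations look like the following; the operator name is an example, and the exact filters are documented in the tool's own usage text:

```shell
# Check all operators on the available backends against the CPU reference
./build/bin/test-backend-ops test

# Benchmark operator performance instead of checking correctness
./build/bin/test-backend-ops perf

# Restrict the run to a single operator
./build/bin/test-backend-ops test -o MUL_MAT
```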
Server Tests
Python-based Server Tests - Test the HTTP API server using pytest

Location: tools/server/tests/

See the Server Testing section for details.
Integration Tests
End-to-End Tests - Test complete workflows with real models

Examples:
- test-chat - Chat template functionality
- test-chat-template - Chat template parsing
- test-llama-archs - Model architecture loading
- test-thread-safety - Multi-threaded inference
Running the Full CI Locally
Before submitting a PR, execute the full CI locally. The CI runs comprehensive tests on different hardware configurations, so running it locally helps catch issues before the PR is opened.
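The entry point is ci/run.sh, as described in ci/README.md in the repository; a typical local run looks like:

```shell
mkdir tmp

# CPU-only build and tests
bash ./ci/run.sh ./tmp/results ./tmp/mnt

# With CUDA support
GG_BUILD_CUDA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
```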
Testing Modified Code
Testing ggml Modifications
If you modified the ggml source, you must run test-backend-ops:
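For example (the -b backend filter follows the tool's usage text; the backend name is an example):

```shell
# Run backend operations test
./build/bin/test-backend-ops test

# If multiple backends are built, test a specific one against the CPU reference
./build/bin/test-backend-ops test -b CUDA0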
Testing Performance Impact
Verify your changes don’t negatively impact performance:

Testing Perplexity
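Both the benchmark and the perplexity comparison can be sketched as follows; the model path and dataset file are placeholders:

```shell
# Benchmark prompt processing and generation speed (run on master and on your branch)
./build/bin/llama-bench -m models/7B/ggml-model-q4_0.gguf

# Compare perplexity on a reference corpus before and after the change
./build/bin/llama-perplexity -m models/7B/ggml-model-q4_0.gguf -f wikitext-2-raw/wiki.test.raw
```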
Ensure your changes don’t affect model quality by comparing perplexity before and after the change.

Debugging Tests
Using the debug-test.sh Script
The scripts/debug-test.sh script provides an easy way to debug specific tests:
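For example, to select and run tests matching a regex, or to rerun one under gdb (the regex and the -g flag follow the script's documented usage):

```shell
# Interactively select and run tests matching a regex
./scripts/debug-test.sh test-tokenizer

# Same, but launch the selected test in a gdb session
./scripts/debug-test.sh -g test-tokenizer
```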
Manual Debugging Process
For more control, follow these steps:

Debugging with Valgrind
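A sketch of the manual flow, ending with a Valgrind run; the test name and vocab file path are examples:

```shell
# 1. Configure a debug build so symbols are available
cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug --target test-tokenizer-0

# 2. Run the test under gdb with the arguments it needs
gdb --args ./build-debug/bin/test-tokenizer-0 ./models/ggml-vocab-llama-spm.gguf

# 3. Or run it under Valgrind to catch memory errors
valgrind --leak-check=full --track-origins=yes ./build-debug/bin/test-tokenizer-0 ./models/ggml-vocab-llama-spm.gguf
```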
Server Testing
The server has its own comprehensive test suite using Python and pytest.

Setup Server Tests
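Setup typically looks like the following; see tools/server/tests/README.md in the repository for the authoritative steps:

```shell
cd tools/server/tests

# Install Python test dependencies
pip install -r requirements.txt

# Run the whole server test suite
./tests.sh
```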
Server Test Configuration
Environment variables for customizing server tests:

| Variable | Description | Default |
|---|---|---|
| PORT | Server listening port | 8080 |
| LLAMA_SERVER_BIN_PATH | Path to server binary | ../../../build/bin/llama-server |
| DEBUG | Enable verbose output | |
| N_GPU_LAYERS | Layers to offload to GPU | |
| LLAMA_CACHE | Model cache directory | tmp |
Running Specific Server Tests
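With pytest you can target individual files or keyword expressions (the file name below is an example):

```shell
cd tools/server/tests

# Run a single test file
pytest unit/test_chat_completion.py -v

# Run only tests whose names match a keyword expression
pytest -k "completion" -v
```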
Debugging Server Tests
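One possible flow, using the DEBUG_EXTERNAL variable described in this section (the test file name is an example):

```shell
# Terminal 1: start the server manually, e.g. under a debugger
gdb --args ../../../build/bin/llama-server --host 127.0.0.1 --port 8080

# Terminal 2: point the test suite at the externally-started server
cd tools/server/tests
DEBUG_EXTERNAL=1 pytest unit/test_chat_completion.py -v
```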
Debug the server while running tests. The DEBUG_EXTERNAL=1 environment variable tells the test suite to connect to an externally-started server instead of spawning its own.

Test Structure and CMake
Understanding Test Registration
Tests are registered in tests/CMakeLists.txt using helper functions:
Adding a New Test
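A sketch of the workflow, using a hypothetical test name (test-myfeature):

```shell
# 1. Create the test source at tests/test-myfeature.cpp
# 2. Register it in tests/CMakeLists.txt using the existing helper functions
# 3. Reconfigure, build, and run only the new test
cmake -B build
cmake --build build --target test-myfeature
ctest --test-dir build -R test-myfeature --output-on-failure
```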
Common Test Patterns
Testing with Models
Many tests require model files.

Assertion Helpers
Use the testing helpers from tests/testing.h.
Continuous Integration
llama.cpp uses GitHub Actions for CI/CD. The CI runs:

- Unit tests on multiple platforms (Linux, macOS, Windows)
- Backend-specific tests (CUDA, Metal, SYCL)
- Integration tests with real models
- Performance benchmarks
- Code style checks
Best Practices
Test Before Submitting
Always run the full CI locally before opening a PR to catch issues early.
Add Tests for New Features
Every new feature should include corresponding tests to prevent regressions.
Test Multiple Backends
If modifying ggml operations, test on CPU, CUDA, and Metal backends.
Check Performance
Use llama-bench and llama-perplexity to verify no performance degradation.
Troubleshooting
Test Failures
Tests fail on CI but pass locally

- Ensure you’re testing the same commit
- Check if it’s a platform-specific issue
- Verify model files are the same version
- Increase test timeout in CMakeLists.txt
- Check for infinite loops or deadlocks
- Run with smaller models for unit tests
- Check for race conditions in multi-threaded code
- Ensure tests don’t depend on external state
- Use fixed random seeds for reproducibility
Getting Help
- Check existing issues for similar problems
- Ask in GitHub Discussions
- Refer to debugging documentation
Next Steps
Contributing
Learn the full contribution workflow and guidelines
Adding Models
Understand how to add new model architectures

