Overview

llama.cpp is a community-driven project that values high-quality contributions. This guide covers the contribution workflow, coding standards, and best practices for collaborating on the project.
The project has a strict AI usage policy. Pull requests that are fully or predominantly AI-generated are not accepted. See AI Usage Policy for details.

Contributor Levels

The project differentiates between three levels of contributors:
  • Contributors: People who have contributed before (no special privileges)
  • Collaborators (Triage): Contributors with significant contributions who may be responsible for some parts of the code and are expected to maintain and review contributions for the code they own
  • Maintainers: Responsible for reviewing and merging PRs after approval from code owners

AI Usage Policy

This project does not accept pull requests that are fully or predominantly AI-generated. AI tools may be used only in an assistive capacity.
Code that is initially generated by AI and subsequently edited is still considered AI-generated. AI assistance is acceptable only when the majority of the code is written by a human contributor, with AI used solely for minor corrections or for verbose modifications that the contributor has already fully conceptualized.

Requirements When Using AI

If AI is used to generate any portion of the code, contributors must:
1. Disclose AI usage

Explicitly disclose how AI was used in your pull request description.
2. Manual review

Perform a comprehensive manual review prior to submitting the pull request.
3. Be prepared to explain

Be prepared to explain every line of code you submitted when asked by a maintainer.
4. No AI-written posts

Do not use AI to write bug reports, feature requests, pull request descriptions, GitHub discussions, or responses to humans.
For more information, refer to the AGENTS.md file in the repository.

Pull Request Workflow

Before Submitting Your PR

1. Search for existing PRs

Search for existing PRs to prevent duplicating efforts. Check both open and closed pull requests.
2. Understand ggml

llama.cpp uses the ggml tensor library for model evaluation. If you are unfamiliar with ggml, consider reviewing the examples in the ggml repository:
  • simple - bare minimum for using ggml
  • gpt-2 - minimal language model inference
  • mnist - training and evaluation example
3. Test your changes

Execute the full CI locally on your machine before publishing:
# Execute the full CI locally
bash ./ci/run.sh ./tmp/results ./tmp/mnt

# With CUDA support
GG_BUILD_CUDA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
Verify that perplexity and performance are not negatively affected:
# Check perplexity
llama-perplexity -m model.gguf -f test.txt

# Benchmark performance
llama-bench -m model.gguf
4. Test ggml modifications

If you modified the ggml source:
# Run backend operations test
test-backend-ops
This requires access to at least two different ggml backends to verify consistent results. If you modified a ggml operator or added a new one, add corresponding test cases to test-backend-ops.
5. Create focused PRs

  • Avoid combining unrelated changes in a single PR
  • For complex features, consider opening a feature request first to discuss and align expectations
  • When adding support for a new model or feature, focus on CPU support only in the initial PR unless you have a good reason not to
  • Add support for other backends like CUDA in follow-up PRs
6. Enable write access

Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly.

After Submitting Your PR

  • Expect modification requests: Maintainers will request changes to ensure code meets quality and maintainability standards
  • Be available for review: Maintainers will rely on your insights when making final approval decisions
  • Keep PR up to date: If your PR becomes stale, rebase it on top of latest master to get maintainers’ attention
  • Consider adding yourself to CODEOWNERS: Indicate your availability for fixing related issues and reviewing related PRs

Coding Guidelines

General Principles

  • Avoid adding third-party dependencies, extra files, or extra headers
  • Always consider cross-compatibility with other operating systems and architectures
  • Avoid fancy-looking modern STL constructs; use basic for loops, avoid templates, and keep it simple
  • Vertical alignment makes things more readable and easier to batch edit
  • Clean up trailing whitespaces
  • Use 4 spaces for indentation
  • Brackets on the same line
  • Pointer/reference style: void * ptr, int & a

Data Types

// Use sized integer types in public APIs
int32_t process_tokens(const int32_t * tokens, size_t count);

// size_t is appropriate for allocation sizes or byte offsets
size_t buffer_size = ggml_tensor_size(tensor);

Struct Declarations

Declare structs with struct foo {} instead of typedef struct foo {} foo:
// Correct
struct llama_context {
    // ...
};

// In C++ code, omit optional struct and enum keywords
llama_context * ctx;  // OK
const llama_rope_type rope_type;  // OK

// Not recommended
struct llama_context * ctx;  // Not OK
const enum llama_rope_type rope_type;  // Not OK
This guideline is being applied to new code. Legacy code may not follow this convention yet.

Code Formatting

Try to follow existing patterns in the code. When in doubt, use clang-format (from clang-tools v15+) to format added code:
clang-format -i src/llama.cpp
For anything not covered in these guidelines, refer to the C++ Core Guidelines.

Tensor Operations

Tensors store data in row-major order. We refer to dimension 0 as columns, 1 as rows, 2 as matrices.
Matrix multiplication is unconventional:
// C = ggml_mul_mat(ctx, A, B) means: C^T = A B^T ⟺ C = B A^T
struct ggml_tensor * C = ggml_mul_mat(ctx, A, B);
The dimensions in ggml are typically in the reverse order of PyTorch dimensions.

Naming Guidelines

Function and Variable Names

Use snake_case for function, variable, and type names:
int token_count;
float temperature_value;
void process_tokens(llama_context * ctx);

Optimize for Longest Common Prefix

// Not recommended
int small_number;
int big_number;

// Recommended - easier to search and group
int number_small;
int number_big;

Enum Values

Enum values are always in upper case and prefixed with the enum name:
enum llama_vocab_type {
    LLAMA_VOCAB_TYPE_NONE = 0,
    LLAMA_VOCAB_TYPE_SPM  = 1,
    LLAMA_VOCAB_TYPE_BPE  = 2,
    LLAMA_VOCAB_TYPE_WPM  = 3,
};

Method Naming Pattern

The general naming pattern is <class>_<method>, with <method> being <action>_<noun>:
llama_model_init();           // class: "llama_model", method: "init"
llama_sampler_chain_remove(); // class: "llama_sampler_chain", method: "remove"
llama_sampler_get_seed();     // class: "llama_sampler", method: "get_seed"
llama_set_embeddings();       // class: "llama_context", method: "set_embeddings"
Guidelines:
  • The get action can be omitted
  • The noun can be omitted if not necessary
  • The _context suffix of the class is optional (use it to disambiguate when needed)
  • Use init/free for constructor/destructor actions

Opaque Types

Use the _t suffix when a type is supposed to be opaque to the user:
typedef struct llama_context * llama_context_t;

enum llama_pooling_type llama_pooling_type(const llama_context_t ctx);

File Naming

  • C/C++ filenames are all lowercase with dashes
  • Headers use the .h extension
  • Source files use the .c or .cpp extension
  • Python filenames are all lowercase with underscores

Code Maintenance

Code Ownership

Existing code should have designated collaborators and/or maintainers specified in the CODEOWNERS file responsible for:
  • Reviewing and merging related PRs
  • Fixing related bugs
  • Providing developer guidance/support

When Adding Large Code Changes

1. Add yourself to CODEOWNERS

If you are a collaborator, add yourself to CODEOWNERS to indicate your availability for reviewing related PRs.
2. Find a maintainer

If you are a contributor, find an existing collaborator willing to review and maintain your code long-term.
3. Provide CI workflow

Provide the necessary CI workflow (and hardware) to test your changes. See ci/README.md.
New code should follow the guidelines outlined in this document. For legacy reasons, existing code is not required to follow these guidelines.

Documentation

Documentation is a community effort:
  • When you need to look into source code to figure out how to use an API, consider adding a short summary to the header file for future reference
  • When you notice incorrect or outdated documentation, please update it
  • Document the “why” rather than the “what” when writing comments

For Maintainers

Merging Pull Requests

1. Squash-merge PRs

Always use squash-merge when merging pull requests.
2. Format commit title

Use the following format for the squashed commit title:
<module> : <commit title> (#<issue_number>)
Example: utils : fix typo in utils.py (#1234)
Optionally pick a <module> from: https://github.com/ggml-org/llama.cpp/wiki/Modules
3. Let others merge their PRs

Let other maintainers merge their own PRs when possible.
4. Understand the changes

When merging a PR, make sure you have a good understanding of the changes.
5. Consider long-term maintenance

Be mindful that most work on a feature happens after the PR is merged. If the PR author is not committed to long-term contribution, someone else needs to take responsibility (potentially you).

Declining Pull Requests

Maintainers reserve the right to decline review or close pull requests for any reason, particularly when:
  • The proposed change is already mentioned in the roadmap or an existing issue and has been assigned to someone
  • The pull request duplicates an existing one
  • The contributor fails to adhere to this contributing guide

Resources

The GitHub issues, PRs, and discussions contain valuable information for getting familiar with the codebase. For convenience, important information is referenced from GitHub projects: https://github.com/ggml-org/llama.cpp/projects

Next Steps

Adding Models

Learn how to add new model architectures to llama.cpp

Testing

Understand the testing procedures and how to run tests