Overview

llama.cpp is a community-driven project that values high-quality contributions. This guide covers the contribution workflow, coding standards, and best practices for collaborating on the project.
The project has a strict AI usage policy. Pull requests that are fully or predominantly AI-generated are not accepted. See AI Usage Policy for details.

Contributor Levels

The project differentiates between three levels of contributors:
  • Contributors: People who have contributed before (no special privileges)
  • Collaborators (Triage): Contributors with significant contributions who may be responsible for some parts of the code and are expected to maintain and review contributions for the code they own
  • Maintainers: Responsible for reviewing and merging PRs after approval from code owners

AI Usage Policy

This project does not accept pull requests that are fully or predominantly AI-generated. AI tools may be used only in an assistive capacity.
Code that is initially generated by AI and subsequently edited is still considered AI-generated. AI assistance is acceptable only when the majority of the code is written by a human contributor, with AI used solely for minor corrections or for verbose modifications that the contributor has already fully conceptualized.

Requirements When Using AI

If AI is used to generate any portion of the code, contributors must:
1. Disclose AI usage

Explicitly disclose how AI was used in your pull request description.
2. Manual review

Perform a comprehensive manual review prior to submitting the pull request.
3. Be prepared to explain

Be prepared to explain every line of code you submitted when asked by a maintainer.
4. No AI-written posts

Do not use AI to write bug reports, feature requests, pull request descriptions, GitHub discussions, or responses to humans.
For more information, refer to the AGENTS.md file in the repository.

Pull Request Workflow

Before Submitting Your PR

1. Search for existing PRs

Search for existing PRs to prevent duplicating efforts. Check both open and closed pull requests.
2. Understand ggml

llama.cpp uses the ggml tensor library for model evaluation. If you are unfamiliar with ggml, consider reviewing the examples in the ggml repository:
  • simple - bare minimum for using ggml
  • gpt-2 - minimal language model inference
  • mnist - training and evaluation example
3. Test your changes

Execute the full CI locally on your machine before publishing:
# Execute the full CI locally
bash ./ci/run.sh ./tmp/results ./tmp/mnt

# With CUDA support
GG_BUILD_CUDA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
Verify that perplexity and performance are not negatively affected:
# Check perplexity
llama-perplexity -m model.gguf -f test.txt

# Benchmark performance
llama-bench -m model.gguf
4. Test ggml modifications

If you modified the ggml source:
# Run backend operations test
test-backend-ops
This requires access to at least two different ggml backends to verify consistent results. If you modified a ggml operator or added a new one, add corresponding test cases to test-backend-ops.
5. Create focused PRs

  • Avoid combining unrelated changes in a single PR
  • For complex features, consider opening a feature request first to discuss and align expectations
  • When adding support for a new model or feature, focus on CPU support only in the initial PR unless you have a good reason not to
  • Add support for other backends like CUDA in follow-up PRs
6. Enable write access

Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly.

After Submitting Your PR

  • Expect modification requests: Maintainers will request changes to ensure code meets quality and maintainability standards
  • Be available for review: Maintainers will rely on your insights when making final approval decisions
  • Keep PR up to date: If your PR becomes stale, rebase it on top of latest master to get maintainers’ attention
  • Consider adding yourself to CODEOWNERS: Indicate your availability for fixing related issues and reviewing related PRs

Coding Guidelines

General Principles

  • Avoid adding third-party dependencies, extra files, or extra headers
  • Always consider cross-compatibility with other operating systems and architectures
  • Avoid fancy-looking modern STL constructs; use basic for loops, avoid templates, and keep it simple
  • Vertical alignment makes things more readable and easier to batch edit
  • Clean up trailing whitespaces
  • Use 4 spaces for indentation
  • Brackets on the same line
  • Pointer/reference style: void * ptr, int & a

Data Types

// Use sized integer types in public APIs
int32_t process_tokens(const int32_t * tokens, size_t count);

// size_t is appropriate for allocation sizes or byte offsets
size_t buffer_size = ggml_tensor_size(tensor);

Struct Declarations

Declare structs with struct foo {} instead of typedef struct foo {} foo:
// Correct
struct llama_context {
    // ...
};

// In C++ code, omit optional struct and enum keywords
llama_context * ctx;  // OK
const llama_rope_type rope_type;  // OK

// Not recommended
struct llama_context * ctx;  // Not OK
const enum llama_rope_type rope_type;  // Not OK
This guideline is being applied to new code. Legacy code may not follow this convention yet.

Code Formatting

Try to follow existing patterns in the code. When in doubt, use clang-format (from clang-tools v15+) to format added code:
clang-format -i src/llama.cpp
For anything not covered in these guidelines, refer to the C++ Core Guidelines.

Tensor Operations

Tensors store data in row-major order. We refer to dimension 0 as columns, 1 as rows, 2 as matrices.
Matrix multiplication is unconventional:
// C = ggml_mul_mat(ctx, A, B) means: C^T = A B^T ⟺ C = B A^T
struct ggml_tensor * C = ggml_mul_mat(ctx, A, B);
The dimensions in ggml are typically in the reverse order of PyTorch dimensions.

Naming Guidelines

Function and Variable Names

Use snake_case for function, variable, and type names:
int token_count;
float temperature_value;
void process_tokens(llama_context * ctx);

Optimize for Longest Common Prefix

// Not recommended
int small_number;
int big_number;

// Recommended - easier to search and group
int number_small;
int number_big;

Enum Values

Enum values are always in upper case and prefixed with the enum name:
enum llama_vocab_type {
    LLAMA_VOCAB_TYPE_NONE = 0,
    LLAMA_VOCAB_TYPE_SPM  = 1,
    LLAMA_VOCAB_TYPE_BPE  = 2,
    LLAMA_VOCAB_TYPE_WPM  = 3,
};

Method Naming Pattern

The general naming pattern is <class>_<method>, with <method> being <action>_<noun>:
llama_model_init();           // class: "llama_model", method: "init"
llama_sampler_chain_remove(); // class: "llama_sampler_chain", method: "remove"
llama_sampler_get_seed();     // class: "llama_sampler", method: "get_seed"
llama_set_embeddings();       // class: "llama_context", method: "set_embeddings"
Guidelines:
  • The get action can be omitted
  • The noun can be omitted if not necessary
  • The _context suffix of the class is optional (use it to disambiguate when needed)
  • Use init/free for constructor/destructor actions

Opaque Types

Use the _t suffix when a type is supposed to be opaque to the user:
typedef struct llama_context * llama_context_t;

enum llama_pooling_type llama_pooling_type(const llama_context_t ctx);

File Naming

  • C/C++ filenames are all lowercase with dashes
  • Headers use the .h extension
  • Source files use the .c or .cpp extension
  • Python filenames are all lowercase with underscores

Code Maintenance

Code Ownership

Existing code should have designated collaborators and/or maintainers specified in the CODEOWNERS file responsible for:
  • Reviewing and merging related PRs
  • Fixing related bugs
  • Providing developer guidance/support

When Adding Large Code Changes

1. Add yourself to CODEOWNERS

If you are a collaborator, add yourself to CODEOWNERS to indicate your availability for reviewing related PRs.
2. Find a maintainer

If you are a contributor, find an existing collaborator willing to review and maintain your code long-term.
3. Provide CI workflow

Provide the necessary CI workflow (and hardware) to test your changes. See ci/README.md.
New code should follow the guidelines outlined in this document. For legacy reasons, existing code is not required to follow these guidelines.

Documentation

Documentation is a community effort:
  • When you need to look into source code to figure out how to use an API, consider adding a short summary to the header file for future reference
  • When you notice incorrect or outdated documentation, please update it
  • Document the “why” rather than the “what” when writing comments

For Maintainers

Merging Pull Requests

1. Squash-merge PRs

Always use squash-merge when merging pull requests.
2. Format commit title

Use the following format for the squashed commit title:
<module> : <commit title> (#<issue_number>)
Example: utils : fix typo in utils.py (#1234)
Optionally pick a <module> from: https://github.com/ggml-org/llama.cpp/wiki/Modules
3. Let others merge their PRs

Let other maintainers merge their own PRs when possible.
4. Understand the changes

When merging a PR, make sure you have a good understanding of the changes.
5. Consider long-term maintenance

Be mindful that most work on a feature happens after the PR is merged. If the PR author is not committed to long-term contribution, someone else needs to take responsibility (potentially you).

Declining Pull Requests

Maintainers reserve the right to decline review or close pull requests for any reason, particularly when:
  • The proposed change is already mentioned in the roadmap or an existing issue and has been assigned to someone
  • The pull request duplicates an existing one
  • The contributor fails to adhere to this contributing guide

Resources

The GitHub issues, PRs, and discussions contain valuable information for getting familiar with the codebase. For convenience, important information is referenced from GitHub projects: https://github.com/ggml-org/llama.cpp/projects

Next Steps

Adding Models

Learn how to add new model architectures to llama.cpp

Testing

Understand the testing procedures and how to run tests