Building from Source Code on Linux
This document provides instructions for building TensorRT-LLM from source code on Linux. Building from source is recommended for:
- Achieving optimal performance
- Enabling debugging capabilities
- Using a GNU CXX11 ABI configuration different from that of the pre-built TensorRT-LLM wheel on PyPI
The current pre-built TensorRT-LLM wheel on PyPI is linked against PyTorch 2.9.1, which uses the new CXX11 ABI.
Prerequisites
Use Docker to build and run TensorRT-LLM. Instructions to install an environment to run Docker containers for the NVIDIA platform can be found here.
Clone the Repository
If you intend to build any TensorRT-LLM artifacts, you first need to clone the TensorRT-LLM repository:
Building a TensorRT-LLM Docker Image
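The clone step above can be sketched as follows (the repository URL is the public NVIDIA/TensorRT-LLM GitHub repository; the submodule and LFS steps follow common practice for this repository and may need adjusting):

```shell
# Clone the TensorRT-LLM repository, including its submodules
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
git lfs pull  # fetch large files tracked with Git LFS
```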
There are two options to create a TensorRT-LLM Docker image. The approximate disk space required to build the image is 63 GB.
Option 1: Build TensorRT-LLM in One Step
TensorRT-LLM contains a simple command to create a Docker image. Note that if you plan to develop on TensorRT-LLM, we recommend using Option 2: Build Step-By-Step. The command supports an optional CUDA_ARCHS="<list of architectures in CMake format>" argument to specify which architectures should be supported by TensorRT-LLM. Restricting the supported GPU architectures helps reduce compilation time:
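A sketch of the one-step build (the `release_build` make target is assumed from the repository's docker/Makefile; adjust CUDA_ARCHS to match your GPUs):

```shell
# Build the release image; restrict to Ampere and Hopper, for example
make -C docker release_build CUDA_ARCHS="80-real;90-real"
```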
Once the image is built, run the container with the corresponding make target. The make command supports the LOCAL_USER=1 argument to switch to the local user account instead of root inside the container. The examples of TensorRT-LLM are installed in the /app/tensorrt_llm/examples directory.
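For example (the `release_run` target is assumed from the repository's docker/Makefile):

```shell
# Run the release container as your local user instead of root
make -C docker release_run LOCAL_USER=1
```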
Since TensorRT-LLM has been built and installed, you can skip the remaining steps.
Option 2: Container for Building TensorRT-LLM Step-by-Step
If you are looking for more flexibility, TensorRT-LLM has commands to create and run a development container in which TensorRT-LLM can be built.
On Systems with GNU make
- Create a Docker image for development. The image will be tagged locally with tensorrt_llm/devel:latest:
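A sketch, assuming the `build` target in the repository's docker/Makefile produces the development image:

```shell
# Build the development image, tagged tensorrt_llm/devel:latest
make -C docker build
```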
- Run the container:
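A sketch, assuming the `run` target in the repository's docker/Makefile:

```shell
# Start the development container (as root by default)
make -C docker run
```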
If you prefer to work with your own user account inside the container instead of root, add the LOCAL_USER=1 option:
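For example:

```shell
# Start the development container as your local user
make -C docker run LOCAL_USER=1
```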
Using Enroot Instead of Docker
If you wish to use Enroot instead of Docker, you can build a sqsh file that has an environment identical to the development image tensorrt_llm/devel:latest as follows:
- Allocate a compute node:
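A hypothetical allocation on a Slurm cluster (partition and time limits are site-specific placeholders):

```shell
# Allocate one node for interactive work; adjust to your site's policy
salloc --nodes=1 --partition=<partition> --time=02:00:00
```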
- Create a sqsh file with essential TensorRT-LLM dependencies installed:
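One way to sketch this step with standard Enroot commands (the dockerd:// scheme imports from the local Docker daemon; it assumes the tensorrt_llm/devel:latest image was built beforehand):

```shell
# Import the local development image into a squash file
enroot import --output dev_trtllm_image.sqsh dockerd://tensorrt_llm/devel:latest
```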
- Once this squash file is ready, launch an enroot sandbox from dev_trtllm_image.sqsh:
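A sketch with standard Enroot commands (the container name and mount point are illustrative):

```shell
# Create a container root filesystem from the squash file and start it,
# mounting the current source tree into the sandbox
enroot create --name trtllm_dev dev_trtllm_image.sqsh
enroot start --rw --mount "$(pwd)":/code/tensorrt_llm trtllm_dev
```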
On Systems Without GNU make
- Create a Docker image for development:
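A sketch using plain docker, assuming the multi-stage Dockerfile under docker/ provides a devel target (file name and target are assumptions based on the repository layout):

```shell
# Build the development stage of the multi-stage Dockerfile
docker build --pull \
  --target devel \
  --file docker/Dockerfile.multi \
  --tag tensorrt_llm/devel:latest .
```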
- Run the container:
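A sketch of the equivalent docker invocation (mount point and resource limits mirror the make-based workflow and may need adjusting):

```shell
# Run interactively with GPU access and the source tree mounted
docker run --rm -it \
  --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  --gpus=all \
  --volume "${PWD}":/code/tensorrt_llm \
  --workdir /code/tensorrt_llm \
  tensorrt_llm/devel:latest bash
```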
Advanced Topics
For more information on building and running various TensorRT-LLM container images, check the docker directory in the repository.
Build TensorRT-LLM
Option 1: Full Build with C++ Compilation
The following command compiles the C++ code and packages the compiled libraries along with the Python files into a wheel. When developing C++ code, you need this full build command to apply your code changes.
Build Options
By default, build_wheel.py enables incremental builds. To clean the build directory, add the --clean option:
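A sketch of both invocations (the script path scripts/build_wheel.py and the wheel output location are assumptions based on the repository layout):

```shell
# Incremental build (default), then a clean rebuild
python3 ./scripts/build_wheel.py
python3 ./scripts/build_wheel.py --clean
# Install the wheel produced under build/
pip install ./build/tensorrt_llm*.whl
```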
The build_wheel.py script accepts a semicolon-separated list of CUDA architectures via the --cuda_architectures option, to restrict compilation to specific GPUs:
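For example (architectures shown are illustrative):

```shell
# Build only for Ampere (SM80) and Hopper (SM90)
python3 ./scripts/build_wheel.py --cuda_architectures "80-real;90-real"
```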
To compile the C++ benchmarks under benchmarks/cpp/, add the --benchmarks option:
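For example:

```shell
# Include the C++ benchmarks in the build
python3 ./scripts/build_wheel.py --benchmarks
```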
Building the Python Bindings for the C++ Runtime
The C++ Runtime can be exposed to Python via bindings. This feature is turned on through the default build options. The resulting wheel contains the tensorrt_llm.bindings package. Running help on this package in a Python interpreter will provide an overview of the relevant classes. The associated unit tests should also be consulted for understanding the API.
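Once a wheel with bindings is installed, the package overview mentioned above can be printed from a shell:

```shell
# Print the auto-generated help for the bindings package
python3 -c "import tensorrt_llm.bindings as b; help(b)"
```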
This feature will not be enabled when building only the C++ runtime.
Linking with the TensorRT-LLM C++ Runtime
The build_wheel.py script will also compile the library containing the C++ runtime of TensorRT-LLM. If Python support and torch modules are not required, the script provides the option --cpp_only, which restricts the build to the C++ runtime only:
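For example, a clean C++-only build:

```shell
# Build only the C++ runtime, starting from a clean build directory
python3 ./scripts/build_wheel.py --cpp_only --clean
```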
This is useful for avoiding linking problems that may be introduced by versions of torch (prior to 2.7.0) due to the Dual ABI support in GCC.
The --clean option removes the build directory before starting a new build. By default, TensorRT-LLM uses cpp/build as the build directory, but you can specify a different location with the --build_dir option.
For a complete list of available build options, run:
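For example:

```shell
# List all build options supported by the script
python3 ./scripts/build_wheel.py --help
```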
Supported C++ Header Files
When using TensorRT-LLM, you need to add the cpp and cpp/include directories to the project's include paths. Only header files contained in cpp/include are part of the supported API and may be directly included. Other headers contained under cpp should not be included directly since they might change in future versions.
Option 2: Python-Only Build without C++ Compilation
If you only need to modify Python code, it is possible to package and install TensorRT-LLM without compilation. Setting TRTLLM_USE_PRECOMPILED=1 enables downloading a pre-built wheel of the version specified in tensorrt_llm/version.py and extracting the compiled libraries into your current directory, thus skipping C++ compilation. The version can be overridden by specifying TRTLLM_USE_PRECOMPILED=x.y.z.
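A sketch of the Python-only install, run from the repository root (the environment variable is read by the build machinery, as described above):

```shell
# Skip C++ compilation by downloading the matching precompiled wheel
TRTLLM_USE_PRECOMPILED=1 pip install -e .
```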
You can specify a custom URL or local path for downloading using TRTLLM_PRECOMPILED_LOCATION. For example, to use version 0.16.0 from PyPI:
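The override might look like this (the location is a placeholder; substitute the actual URL or local path of the version 0.16.0 wheel):

```shell
# Point the build at a specific precompiled wheel instead of the default
TRTLLM_PRECOMPILED_LOCATION=<url-or-path-to-tensorrt_llm-0.16.0-wheel> pip install -e .
```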
Known Limitations
Next Steps
After building TensorRT-LLM from source:
- Follow the Coding Guidelines when making changes
- Review the Contributing Guide before submitting pull requests
- Run tests to verify your build
- Start developing!