TensorRT-LLM uses a Jenkins-based CI/CD system that runs unit tests and integration tests across multiple GPU configurations. This page explains how the CI is organized, how tests map to Jenkins stages, and how to trigger specific test stages.
Pull requests do not automatically trigger CI. Developers must comment on the PR to start testing:
# Run standard pre-merge pipeline
/bot run
# Run specific stages only
/bot run --stage-list "stage-A,stage-B"
# Add extra stages to pre-merge set
/bot run --extra-stage "stage-A,stage-B"
# Run all stages even if earlier ones fail (use sparingly)
/bot run --disable-fail-fast
# Include AutoDeploy stages
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
Avoid using --disable-fail-fast habitually. The CI system automatically reuses successful test stages when commits remain unchanged, so a later /bot run only retries the stages that failed. With this flag set, a pipeline that has already failed keeps occupying scarce hardware (such as DGX-H100s), which lengthens queue backlogs and reduces team efficiency.
For a full list of available commands, post /bot help as a PR comment.
Unit tests are located in tests/unittest/ and run during the merge-request pipeline. They are invoked from jenkins/L0_MergeRequest.groovy and do not require mapping to specific hardware stages.
Running unit tests locally:
# Run all unit tests
pytest tests/unittest/
# Run specific test file
pytest tests/unittest/llmapi/test_llm_args.py
# Run tests matching pattern
pytest tests/unittest -k "test_llm_args"
Integration tests are defined in YAML files under tests/integration/test_lists/test-db/. Most files are named after the GPU or configuration they run on:
l0_a100.yml - Tests for A100 GPUs
l0_h100.yml - Tests for H100 GPUs
l0_a10.yml - Tests for A10 GPUs
l0_sanity_check.yml - Tests that run on multiple hardware types
YAML structure:
terms:
  stage: post_merge # or pre_merge
  backend: triton # pytorch, tensorrt, or triton
tests:
  - triton_server/test_triton.py::test_gpt_ib_ptuning[gpt-ib-ptuning]
Key fields:
stage: Either pre_merge or post_merge
backend: pytorch, tensorrt, or triton
tests: List of pytest test paths
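The key fields above can be read as a filter: an entry contributes its tests only when its terms match the stage and backend being run. The following is a minimal sketch of that idea, not the CI's actual selection logic. It assumes the YAML has already been loaded into plain dictionaries; the `entries` list, the `select_tests` helper, and the second placeholder entry are hypothetical and exist only for illustration.

```python
# Hypothetical in-memory form of test-db entries. The real files may wrap
# these keys in additional structure that this page does not show.
entries = [
    {
        "terms": {"stage": "post_merge", "backend": "triton"},
        "tests": [
            "triton_server/test_triton.py::test_gpt_ib_ptuning[gpt-ib-ptuning]",
        ],
    },
    {
        # Placeholder entry, not a real test path.
        "terms": {"stage": "pre_merge", "backend": "pytorch"},
        "tests": ["placeholder/test_example.py::test_placeholder"],
    },
]


def select_tests(entries, stage, backend):
    """Return tests from entries whose terms match both stage and backend."""
    selected = []
    for entry in entries:
        terms = entry.get("terms", {})
        if terms.get("stage") == stage and terms.get("backend") == backend:
            selected.extend(entry.get("tests", []))
    return selected


print(select_tests(entries, "post_merge", "triton"))
```

An entry that matches on only one of the two fields contributes nothing, which is why a test listed under post_merge does not run in the pre-merge pipeline.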
Running integration tests locally:
Integration tests require GPU access and the LLM_MODELS_ROOT environment variable set to the path containing model weights.
# Set model root
export LLM_MODELS_ROOT=/path/to/models
# Run integration tests
pytest tests/integration/defs/...
API stability tests are located in tests/api_stability/ and protect committed API signatures. Changes to LLM API signatures will fail these tests and require code owner review.
The helper script scripts/test_to_stage_mapping.py automates stage lookup:
# Find stages that run a specific test
python scripts/test_to_stage_mapping.py \
  --tests "triton_server/test_triton.py::test_gpt_ib_ptuning[gpt-ib-ptuning]"
# Find stages using pattern matching
python scripts/test_to_stage_mapping.py --tests gpt_ib_ptuning
# List all tests in a specific stage
python scripts/test_to_stage_mapping.py --stages A100X-Triton-Post-Merge-1
# Read tests from a file
python scripts/test_to_stage_mapping.py --test-list my_tests.txt
python scripts/test_to_stage_mapping.py --test-list my_tests.yml
Patterns are matched by substring, so partial test names work. When providing tests on the command line, quote each test string so the shell doesn’t interpret [ and ] as globs.
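To make the substring rule and the quoting advice concrete, here is a small sketch. It does not reproduce the internals of scripts/test_to_stage_mapping.py; the `STAGE_TESTS` mapping, the `Example-Stage-2` name, and the `stages_for_pattern` helper are assumptions made for illustration. `shlex.quote` is the standard-library call for producing a shell-safe argument.

```python
import shlex

# Hypothetical stage-to-tests mapping, for illustration only.
STAGE_TESTS = {
    "A100X-Triton-Post-Merge-1": [
        "triton_server/test_triton.py::test_gpt_ib_ptuning[gpt-ib-ptuning]",
    ],
    "Example-Stage-2": [
        "examples/test_openai.py::test_llm_openai_triton_1gpu",
    ],
}


def stages_for_pattern(pattern, stage_tests=STAGE_TESTS):
    """Return sorted stage names that have a test containing the pattern."""
    return sorted(
        stage
        for stage, tests in stage_tests.items()
        if any(pattern in test for test in tests)
    )


# A partial name matches because the comparison is a plain substring check.
print(stages_for_pattern("gpt_ib_ptuning"))

# Quote a full test ID so the shell does not treat [ and ] as glob characters.
test_id = "triton_server/test_triton.py::test_gpt_ib_ptuning[gpt-ib-ptuning]"
print("--tests", shlex.quote(test_id))
```

A plain substring check also means short patterns can match more tests than intended, so a longer fragment of the test name gives a narrower result.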
Example workflow:
# Find which stages run a test
python scripts/test_to_stage_mapping.py --tests "test_gpt_ib_ptuning"
# Output:
# A100X-Triton-[Post-Merge]-1
# A100X-Triton-[Post-Merge]-2
# Run those stages on your PR
/bot run --stage-list "A100X-Triton-[Post-Merge]-1,A100X-Triton-[Post-Merge]-2"
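The last step of the workflow, turning a list of stage names into a PR comment, can be sketched as a small helper. The `build_stage_list_comment` function is hypothetical and not part of the repository; it only assembles the `/bot run --stage-list` form shown on this page.

```python
def build_stage_list_comment(stages):
    """Join stage names into a '/bot run --stage-list' PR comment."""
    names = [name.strip() for name in stages if name.strip()]
    if not names:
        raise ValueError("at least one stage name is required")
    return '/bot run --stage-list "{}"'.format(",".join(names))


print(build_stage_list_comment([
    "A100X-Triton-[Post-Merge]-1",
    "A100X-Triton-[Post-Merge]-2",
]))
```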
Sometimes tests are known to fail due to bugs or unsupported features. Instead of removing them from YAML files, add them to tests/integration/test_lists/waives.txt.
The CI passes this file to pytest via --waives-file, automatically skipping listed tests.
Format:
# General waive with bug link
examples/test_openai.py::test_llm_openai_triton_1gpu SKIP (https://nvbugspro.nvidia.com/bug/4963654)
# GPU-specific waive
full:GH200/examples/test_qwen2audio.py::test_llm_qwen2audio_single_gpu[qwen2_audio_7b_instruct] SKIP (arm is not supported)
Changes to waives.txt should include a bug link or brief explanation so other developers understand why the test is disabled.
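A waive entry can be read as a test ID, the SKIP keyword, and a parenthesized reason. The sketch below parses that shape. It is inferred only from the two example lines on this page, not from the CI's actual parser: the `parse_waive_line` helper is hypothetical, and treating `full:<GPU>/` as a GPU-specific prefix is an assumption based on the single GH200 example.

```python
import re

# Assumed line shape: "<test id> SKIP (<reason>)", with the reason optional.
WAIVE_RE = re.compile(r"^(?P<test>\S+)\s+SKIP(?:\s+\((?P<reason>.*)\))?\s*$")
# Assumed GPU-specific prefix: "full:<GPU>/" in front of the test path.
GPU_PREFIX_RE = re.compile(r"^full:(?P<gpu>[^/]+)/(?P<test>.+)$")


def parse_waive_line(line):
    """Parse one waive line into a dict, or return None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = WAIVE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized waive line: {line!r}")
    test, gpu = match.group("test"), None
    prefixed = GPU_PREFIX_RE.match(test)
    if prefixed:
        gpu, test = prefixed.group("gpu"), prefixed.group("test")
    return {"test": test, "gpu": gpu, "reason": match.group("reason")}


general = parse_waive_line(
    "examples/test_openai.py::test_llm_openai_triton_1gpu "
    "SKIP (https://nvbugspro.nvidia.com/bug/4963654)"
)
gpu_specific = parse_waive_line(
    "full:GH200/examples/test_qwen2audio.py::"
    "test_llm_qwen2audio_single_gpu[qwen2_audio_7b_instruct] "
    "SKIP (arm is not supported)"
)
print(general)
print(gpu_specific)
```

A check like this could flag entries that lack a reason (`reason` is `None`), which is the case the guidance above asks contributors to avoid.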
Resolve the Jenkins base URL and fetch failure data:
JENKINS_BASE="$(curl -skI 'https://nv/trt-llm-cicd' 2>/dev/null | \
  grep -i '^location:' | sed 's/^[Ll]ocation: *//;s/[[:space:]]*$//')job/main/job/L0_MergeRequest_PR"
curl -s "${JENKINS_BASE}/${BUILD_NUM}/testReport/api/json" | python3 -c "
import json, sys
data = json.load(sys.stdin)
print(f'Summary: {data[\"passCount\"]} passed, {data[\"failCount\"]} failed, {data[\"skipCount\"]} skipped')
failed = []
for suite in data.get('suites', []):
    for case in suite.get('cases', []):
        if case.get('status') in ('FAILED', 'REGRESSION'):
            failed.append(case)
if not failed:
    print('No test failures!')
else:
    print(f'Failed tests ({len(failed)}):')
    for f in failed:
        print(f'  - {f[\"className\"]}.{f[\"name\"]}')
        err = (f.get('errorDetails') or '')[:200]
        if err:
            print(f'    Error: {err}')
"
The CI system automatically reuses successful test stages when commits remain unchanged, and subsequent /bot run commands only retry failed stages. Using --disable-fail-fast unnecessarily:
Wastes scarce hardware resources
Keeps failed pipelines consuming DGX-H100s
Increases queue backlogs for all developers
Reduces team efficiency
Only use --disable-fail-fast when explicitly needed.