This guide covers the testing workflows for Codex CLI, including unit tests, integration tests, and snapshot tests.

Rust Testing

The Rust implementation uses standard cargo test, along with nextest (through just test) for faster full runs and insta for snapshot tests.

Running Tests

1. Run Tests for a Specific Crate

Always start by testing the specific crate you modified:
cargo test -p codex-tui
cargo test -p codex-core
cargo test -p codex-app-server-protocol
This is the fastest way to get feedback on your changes.
2. Run Full Test Suite (If Needed)

If you changed common, core, or protocol crates, run the complete test suite:
# Standard cargo test
cargo test

# Or with nextest (faster)
just test
Avoid --all-features for routine local runs: it expands the build matrix and significantly increases build time and disk usage. Use it only when you specifically need full feature coverage.
3. Check Results

Review test output for any failures or warnings.
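
The decision rule in the steps above can be sketched as a small helper. This is only an illustration and not part of the Codex tooling: the function name test_commands is hypothetical, and the crate names in SHARED_CRATES are assumptions standing in for the "common, core, or protocol" crates.

```rust
/// Crates whose changes warrant a full-suite run.
/// Assumption: these names stand in for the "common, core, or protocol" crates.
const SHARED_CRATES: [&str; 3] = ["codex-common", "codex-core", "codex-protocol"];

/// Returns the `cargo test` invocations to run for the crates you modified:
/// one per-crate run each, plus a full-suite run if any shared crate changed.
fn test_commands(modified: &[&str]) -> Vec<String> {
    let mut commands: Vec<String> = modified
        .iter()
        .map(|name| format!("cargo test -p {name}"))
        .collect();

    if modified.iter().any(|name| SHARED_CRATES.contains(name)) {
        commands.push("cargo test".to_string());
    }
    commands
}
```

Per-crate runs come first in the returned list so that the fastest feedback arrives before the slower full-suite run.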

Snapshot Tests

Codex uses snapshot tests via insta to validate rendered output, especially in codex-tui.
Requirement: Any change that affects user-visible UI must include corresponding insta snapshot coverage.
1. Run Tests to Generate Snapshots

cargo test -p codex-tui
2. Check Pending Snapshots

cargo insta pending-snapshots -p codex-tui
3. Review Changes

Review the generated *.snap.new files directly, or preview a specific file:
cargo insta show -p codex-tui path/to/file.snap.new
4. Accept Snapshots (If Correct)

Only accept if you’ve verified the changes are correct:
cargo insta accept -p codex-tui
If cargo-insta is not installed, install it first:
cargo install cargo-insta
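
To make concrete what a snapshot test checks, here is a minimal sketch. It compares rendered text against an inline expected string with the standard assert_eq! so that it compiles without dependencies. In a real test the comparison would normally be insta's assert_snapshot! macro, with the expected text stored in a .snap file. The function render_status_line and its output format are hypothetical and are not taken from codex-tui.

```rust
/// Hypothetical rendering function standing in for real TUI output.
fn render_status_line(model: &str, tokens_used: u32) -> String {
    format!("model: {model} | tokens: {tokens_used}")
}

#[test]
fn status_line_matches_snapshot() {
    let rendered = render_status_line("example-model", 1200);

    // With insta, this comparison would be `insta::assert_snapshot!(rendered);`
    // and the expected text would live in a reviewed `.snap` file.
    assert_eq!(rendered, "model: example-model | tokens: 1200");
}
```

Any change to the rendered string makes the test fail, which is the point at which the pending-snapshot review steps above apply.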

Test Assertions Best Practices

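Use pretty_assertions::assert_eq for readable diffs on failure, and compare whole objects rather than individual fields:
use pretty_assertions::assert_eq;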

#[test]
fn test_example() {
    let result = calculate_something();
    let expected = ExpectedStruct { /* ... */ };
    
    // Prefer deep equals on entire objects
    assert_eq!(result, expected);
}
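
A self-contained version of the same idea is sketched below. The Summary type and the summarize function are hypothetical, made up only for illustration. The standard assert_eq! is used so that the block compiles on its own; importing pretty_assertions::assert_eq as shown above is a drop-in replacement.

```rust
#[derive(Debug, PartialEq)]
struct Summary {
    passed: u32,
    failed: u32,
    name: String,
}

/// Hypothetical function under test.
fn summarize(name: &str, results: &[bool]) -> Summary {
    let passed = results.iter().filter(|ok| **ok).count() as u32;
    Summary {
        passed,
        failed: results.len() as u32 - passed,
        name: name.to_string(),
    }
}

#[test]
fn summarize_counts_results() {
    let actual = summarize("suite", &[true, false, true]);
    let expected = Summary {
        passed: 2,
        failed: 1,
        name: "suite".to_string(),
    };

    // One whole-object comparison instead of three per-field assertions.
    assert_eq!(actual, expected);
}
```

Deriving Debug and PartialEq on the type is what allows the single comparison, and a failure then reports every mismatched field at once.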

Integration Tests

When writing end-to-end Codex tests, use the utilities in core_test_support::responses.
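use anyhow::Result; // assumed here; any compatible Result alias works
use core_test_support::responses;

// `server`, `codex`, `call_id`, `args`, `expected_output`, and the `Op` import
// are assumed to come from test setup that is omitted here.
#[tokio::test]
async fn test_function_call() -> Result<()> {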
    let mock = responses::mount_sse_once(&server, responses::sse(vec![
        responses::ev_response_created("resp-1"),
        responses::ev_function_call(call_id, "shell", &serde_json::to_string(&args)?),
        responses::ev_completed("resp-1"),
    ])).await;
    
    codex.submit(Op::UserTurn { /* ... */ }).await?;
    
    // Assert request body
    let request = mock.single_request();
    assert_eq!(request.function_call_output(call_id)?, expected_output);
    
    Ok(())
}
Best practices for integration tests:
  • Prefer wait_for_event over wait_for_event_with_timeout
  • Prefer mount_sse_once over mount_sse_once_match or mount_sse_sequence
  • Avoid mutating the process environment in tests
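
One way to follow the last point is to pass environment-derived values in from the caller, so that tests never call std::env::set_var. The sketch below shows that pattern under stated assumptions: the variable name EXAMPLE_LOG_LEVEL and the functions log_level and log_level_from_env are hypothetical and do not come from the Codex codebase.

```rust
/// Hypothetical setting that production code would read from the environment.
const LOG_LEVEL_VAR: &str = "EXAMPLE_LOG_LEVEL";

/// Takes a lookup function instead of calling `std::env::var` directly,
/// so tests can supply values without touching the process environment.
fn log_level(lookup: impl Fn(&str) -> Option<String>) -> String {
    lookup(LOG_LEVEL_VAR).unwrap_or_else(|| "info".to_string())
}

/// Production call site: the real environment is read only here.
fn log_level_from_env() -> String {
    log_level(|key| std::env::var(key).ok())
}

#[test]
fn log_level_uses_injected_value() {
    // The closure plays the role of the environment for this test only.
    let fake_env = |key: &str| (key == LOG_LEVEL_VAR).then(|| "debug".to_string());

    assert_eq!(log_level(fake_env), "debug");
    // With nothing set, the default applies.
    assert_eq!(log_level(|_| None), "info");
}
```

Because the process environment is shared by every test thread, injecting values this way keeps tests independent of one another when they run in parallel.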

Spawning Workspace Binaries in Tests

Use codex_utils_cargo_bin::cargo_bin("...") instead of assert_cmd::Command::cargo_bin(...) when tests need to spawn first-party binaries.
use codex_utils_cargo_bin::cargo_bin;

#[test]
fn test_cli_binary() {
    let codex_bin = cargo_bin("codex");
    // Use codex_bin path...
}
This ensures paths resolve correctly under both Cargo and Bazel runfiles.

TypeScript Testing

The TypeScript implementation is legacy. This section is for reference only.
The TypeScript CLI uses Vitest for unit tests.

Running TypeScript Tests

pnpm test:watch

Git Hooks

The TypeScript project uses Husky to enforce code quality:
  • Pre-commit hook: Runs lint-staged to format and lint files
  • Pre-push hook: Runs tests and type checking
These hooks help maintain code quality and prevent pushing code with failing tests.

App-Server Protocol Testing

After changing API shapes in app-server-protocol:
1. Regenerate Schema Fixtures

just write-app-server-schema

# If experimental API fixtures are affected:
just write-app-server-schema --experimental
2. Validate Changes

cargo test -p codex-app-server-protocol

Sandbox Testing

Test commands under the Codex sandbox using dedicated subcommands:
codex sandbox macos [--full-auto] [--log-denials] [COMMAND]...

# Legacy alias
codex debug seatbelt [--full-auto] [--log-denials] [COMMAND]...
Use --log-denials on macOS to see which file accesses Seatbelt is blocking.

Before Submitting a PR

Before marking your PR as ready for review, run all checks locally:
# Format code
just fmt

# Fix linter issues
just fix -p <crate-you-touched>

# Run tests
cargo test -p <crate-you-touched>

# If you changed core crates:
cargo test
CI failures that could have been caught locally slow down the review process. Always run checks before pushing.

Next Steps

  • Guidelines: Review contribution guidelines
  • Building: Learn how to build the project