Testing

This guide covers the testing workflows for Codex CLI, including unit tests, integration tests, and snapshot tests.

Rust Testing

The Rust implementation uses standard cargo test, with nextest (through just test) for faster full runs and cargo-insta for snapshot tests.

Running Tests

Run Tests for a Specific Crate

Always start by testing the specific crate you modified:

cargo test -p codex-tui
cargo test -p codex-core
cargo test -p codex-app-server-protocol

This is the fastest way to get feedback on your changes.

Run Full Test Suite (If Needed)

If you changed common, core, or protocol crates, run the complete test suite:

# Standard cargo test
cargo test

# Or with nextest (faster)
just test

Avoid --all-features for routine local runs. It expands the build matrix and significantly increases build time and disk usage. Only use it when you specifically need full feature coverage.

Check Results

Review test output for any failures or warnings.

Snapshot Tests

Codex uses snapshot tests via insta to validate rendered output, especially in codex-tui.

Requirement: Any change that affects user-visible UI must include corresponding insta snapshot coverage.
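To illustrate what a snapshot test protects, here is a minimal std-only sketch. The function `render_approval_prompt` is a hypothetical stand-in for a TUI widget and is not part of the Codex codebase. The expected text is written inline here; with insta, the reviewed output would be stored in a .snap file and compared automatically.

```rust
// Hypothetical renderer standing in for a TUI widget; not Codex code.
fn render_approval_prompt(command: &str, choices: &[&str]) -> String {
    let mut out = format!("Run `{command}`?\n");
    // Number each choice starting from 1, one per line.
    for (i, choice) in choices.iter().enumerate() {
        out.push_str(&format!("  {}. {}\n", i + 1, choice));
    }
    out
}

#[test]
fn approval_prompt_matches_reviewed_output() {
    // This literal plays the role of the reviewed snapshot.
    let reviewed = "Run `ls`?\n  1. Yes\n  2. No\n";
    assert_eq!(render_approval_prompt("ls", &["Yes", "No"]), reviewed);
}
```

Any change to spacing, numbering, or wording in the renderer makes the comparison fail, which is the signal to review the new output before accepting it.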

Run Tests to Generate Snapshots

cargo test -p codex-tui

Check Pending Snapshots

cargo insta pending-snapshots -p codex-tui

Review Changes

Review the generated *.snap.new files directly, or preview a specific file:

cargo insta show -p codex-tui path/to/file.snap.new

Accept Snapshots (If Correct)

Only accept if you’ve verified the changes are correct:

cargo insta accept -p codex-tui

Installing cargo-insta

If you don’t have the tool installed:

cargo install cargo-insta

Test Assertions Best Practices

use pretty_assertions::assert_eq;

#[test]
fn test_example() {
    let result = calculate_something();
    let expected = ExpectedStruct { /* ... */ };
    
    // Prefer deep equals on entire objects
    assert_eq!(result, expected);
}
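A runnable version of the same pattern might look like the sketch below. `ExecSummary` and `summarize` are hypothetical names chosen for illustration. The sketch uses the standard `assert_eq!` so it has no dependencies; importing `pretty_assertions::assert_eq` as in the example above should be a drop-in replacement that prints a diff on failure.

```rust
// Hypothetical type for illustration; not part of the Codex codebase.
// Debug and PartialEq are required for assert_eq! on the whole struct.
#[derive(Debug, Clone, PartialEq)]
struct ExecSummary {
    command: Vec<String>,
    exit_code: i32,
    stdout: String,
}

// Hypothetical function under test: copies the inputs and trims trailing
// whitespace from stdout.
fn summarize(command: &[&str], exit_code: i32, stdout: &str) -> ExecSummary {
    ExecSummary {
        command: command.iter().map(|s| s.to_string()).collect(),
        exit_code,
        stdout: stdout.trim_end().to_string(),
    }
}

#[test]
fn summarize_trims_trailing_newline() {
    let actual = summarize(&["echo", "hi"], 0, "hi\n");
    let expected = ExecSummary {
        command: vec!["echo".to_string(), "hi".to_string()],
        exit_code: 0,
        stdout: "hi".to_string(),
    };
    // One assertion covers every field. Because the struct literal must list
    // all fields, a field added later cannot be silently left unchecked.
    assert_eq!(actual, expected);
}
```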

Integration Tests

When writing end-to-end Codex tests, use the utilities in core_test_support::responses.

use core_test_support::responses;

#[tokio::test]
async fn test_function_call() -> Result<()> {
    // Setup elided: `server`, `codex`, `call_id`, `args`, `expected_output`,
    // and a `Result` alias (for example anyhow::Result) are assumed to be in scope.

    // Mock one SSE stream: response created, a `shell` function call, then completion.
    let mock = responses::mount_sse_once(&server, responses::sse(vec![
        responses::ev_response_created("resp-1"),
        responses::ev_function_call(call_id, "shell", &serde_json::to_string(&args)?),
        responses::ev_completed("resp-1"),
    ])).await;

    codex.submit(Op::UserTurn { /* ... */ }).await?;

    // Assert request body
    let request = mock.single_request();
    assert_eq!(request.function_call_output(call_id)?, expected_output);

    Ok(())
}

Best practices for integration tests:

Prefer wait_for_event over wait_for_event_with_timeout
Prefer mount_sse_once over mount_sse_once_match or mount_sse_sequence
Avoid mutating process environment in tests
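One way to follow the last point is to inject the environment lookup instead of calling `std::env::set_var` in a test. Tests in the same binary run as threads of one process by default, so a mutated variable is visible to every other test. The sketch below is an assumption-laden illustration: `config_dir`, `DEMO_HOME`, and the `.demo` directory are hypothetical names, not Codex APIs.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Hypothetical helper: resolves a config directory from an injected lookup
// function rather than reading the process environment directly.
fn config_dir(lookup: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    // An explicit override takes priority.
    if let Some(dir) = lookup("DEMO_HOME") {
        return Some(PathBuf::from(dir));
    }
    // Otherwise fall back to a dot-directory under HOME.
    lookup("HOME").map(|home| PathBuf::from(home).join(".demo"))
}

#[test]
fn explicit_override_wins() {
    // A per-test map stands in for the environment; nothing global is touched.
    let env: HashMap<&str, &str> =
        HashMap::from([("DEMO_HOME", "/tmp/x"), ("HOME", "/home/u")]);
    let dir = config_dir(|key| env.get(key).map(|v| v.to_string()));
    assert_eq!(dir, Some(PathBuf::from("/tmp/x")));
}
```

Production code can pass the real environment with `config_dir(|key| std::env::var(key).ok())`, while each test supplies its own map.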

Spawning Workspace Binaries in Tests

Use codex_utils_cargo_bin::cargo_bin("...") instead of assert_cmd::Command::cargo_bin(...) when tests need to spawn first-party binaries.

use codex_utils_cargo_bin::cargo_bin;

#[test]
fn test_cli_binary() {
    let codex_bin = cargo_bin("codex");
    // Use codex_bin path...
}

This ensures paths resolve correctly under both Cargo and Bazel runfiles.

TypeScript Testing

The TypeScript implementation is legacy. This section is for reference only.

The TypeScript CLI uses Vitest for unit tests.

Running TypeScript Tests

pnpm test:watch

Git Hooks

The TypeScript project uses Husky to enforce code quality:

Pre-commit hook: Runs lint-staged to format and lint files
Pre-push hook: Runs tests and type checking

These hooks help maintain code quality and prevent pushing code with failing tests.

App-Server Protocol Testing

After changing API shapes in app-server-protocol:

Regenerate Schema Fixtures

just write-app-server-schema

# If experimental API fixtures are affected:
just write-app-server-schema --experimental

Validate Changes

cargo test -p codex-app-server-protocol

Sandbox Testing

Test commands under the Codex sandbox using dedicated subcommands:

codex sandbox macos [--full-auto] [--log-denials] [COMMAND]...

# Legacy alias
codex debug seatbelt [--full-auto] [--log-denials] [COMMAND]...

Use --log-denials on macOS to see what file accesses are being blocked by Seatbelt.

Before Submitting a PR

Before marking your PR as ready for review, run all checks locally:

Rust

# Format code
just fmt

# Fix linter issues
just fix -p <crate-you-touched>

# Run tests
cargo test -p <crate-you-touched>

# If you changed core crates:
cargo test

TypeScript

# Run full validation suite
pnpm test && pnpm run lint && pnpm run typecheck

CI failures that could have been caught locally slow down the review process. Always run checks before pushing.
