The pensar benchmark command runs automated security benchmarks across different branches of a repository, comparing vulnerability counts and security posture between code versions.

Synopsis

pensar benchmark <repo-path> [options] [branch1 branch2 ...]

Description

Benchmark mode performs automated pentests on specified branches of a repository, allowing you to:
  • Compare security posture across branches
  • Track vulnerability trends over development cycles
  • Validate that security fixes reduce vulnerabilities
  • Test multiple code versions efficiently
Benchmark mode requires local access to a git repository with the source code.

Arguments

repo-path
string
required
Path to the git repository to benchmark.
pensar benchmark /path/to/vulnerable-app
Must be a valid git repository with at least one branch.
branches
string[]
Specific branches to benchmark (optional).
pensar benchmark /path/to/app main develop feature/auth
If no branches are listed, the command tests all branches when --all-branches is set and otherwise defaults to the current branch.

Options

--all-branches
boolean
Test all branches in the repository.
pensar benchmark /path/to/app --all-branches
Useful for comprehensive security audits across every branch of the repository.
--limit
number
Limit the number of branches to test.
pensar benchmark /path/to/app --all-branches --limit 5
Tests only the first N branches (by git branch listing order).
--skip
number
Skip the first N branches.
pensar benchmark /path/to/app --all-branches --skip 3 --limit 5
Useful for paginating through large branch lists.
--model
string
default: "claude-sonnet-4-5"
AI model to use for benchmarking.
pensar benchmark /path/to/app --model claude-opus-4
Higher-capability models may find more vulnerabilities but cost more.
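
The --skip and --limit options can be combined to walk a large branch list in fixed-size pages. The following is a minimal shell sketch, not part of Pensar itself: print_benchmark_pages is a hypothetical helper that only prints one command per page, using the flags documented above, so the commands can be reviewed before anything is run. The total branch count is supplied by the caller.

```shell
# Hypothetical helper (not part of Pensar): prints one `pensar benchmark`
# command per page of branches, using only the documented
# --all-branches, --skip, and --limit options.
# Usage: print_benchmark_pages <repo-path> <total-branches> <page-size>
print_benchmark_pages() {
  local repo="$1" total="$2" page_size="$3"
  local skip=0
  # Guard against a zero page size, which would loop forever.
  [ "$page_size" -gt 0 ] || return 1
  while [ "$skip" -lt "$total" ]; do
    if [ "$skip" -eq 0 ]; then
      # First page: nothing to skip, so --skip is omitted.
      echo "pensar benchmark $repo --all-branches --limit $page_size"
    else
      echo "pensar benchmark $repo --all-branches --skip $skip --limit $page_size"
    fi
    skip=$((skip + page_size))
  done
}

# Example: 7 branches, 3 per page, prints three commands.
print_benchmark_pages /path/to/app 7 3
```

Printing the commands instead of executing them keeps the sketch safe to try; pipe the output to `sh` only once the page boundaries look right.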

Examples

Basic Branch Comparison

Compare security posture between two branches:
pensar benchmark /path/to/webapp main develop
==========================================================
BENCHMARK RESULTS
==========================================================

Branch: main
  Findings: 12 (3 CRITICAL, 5 HIGH, 3 MEDIUM, 1 LOW)
  Time: 8m 32s

Branch: develop
  Findings: 8 (2 CRITICAL, 3 HIGH, 2 MEDIUM, 1 LOW)
  Time: 7m 18s

Comparison:
  develop has 4 fewer vulnerabilities than main
  CRITICAL reduced by 1
  HIGH reduced by 2

Results saved to:
  ~/.pensar/benchmarks/2024-03-05_webapp/
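
The comparison in the sample output is plain subtraction of per-severity counts. As a rough illustration (a sketch, not Pensar's actual implementation), the hypothetical severity_deltas function below takes the four counts for a baseline branch and a candidate branch, in CRITICAL, HIGH, MEDIUM, LOW order, and prints how each one changed. Unlike the sample output, it reports all four severities.

```shell
# Hypothetical helpers (not part of Pensar) that mirror the arithmetic
# behind the "Comparison" section of the sample output.

# print_delta <severity-name> <baseline-count> <candidate-count>
print_delta() {
  local name="$1"
  local diff=$(( $3 - $2 ))
  if [ "$diff" -lt 0 ]; then
    echo "$name reduced by $((0 - diff))"
  elif [ "$diff" -gt 0 ]; then
    echo "$name increased by $diff"
  else
    echo "$name unchanged"
  fi
}

# severity_deltas <base C H M L> <candidate C H M L>
severity_deltas() {
  print_delta CRITICAL "$1" "$5"
  print_delta HIGH "$2" "$6"
  print_delta MEDIUM "$3" "$7"
  print_delta LOW "$4" "$8"
}

# Counts from the sample output: main (3 5 3 1) vs develop (2 3 2 1)
severity_deltas 3 5 3 1 2 3 2 1
```

With the sample counts this prints a reduction of 1 for CRITICAL, 2 for HIGH, and 1 for MEDIUM, with LOW unchanged, which adds up to the 4 fewer findings reported for develop.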

Test All Branches (Limited)

Benchmark the first 3 branches (in git branch listing order):
pensar benchmark /path/to/api --all-branches --limit 3

Feature Branch Validation

Test if a security fix reduces vulnerabilities:
# Test before fix
pensar benchmark /path/to/app main

# Test after fix
pensar benchmark /path/to/app feature/fix-sqli

# Compare results: each run saves its report under ~/.pensar/benchmarks/

CI/CD Integration

Run benchmark in continuous integration:
.github/workflows/benchmark.yml
name: Security Benchmark

on:
  pull_request:
    branches: [ main ]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Fetch all history for all branches

      - name: Install Pensar Apex
        run: npm install -g @pensar/apex

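      - name: Run Benchmark
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          # Pass the branch name via env instead of interpolating it into the
          # script, so a crafted branch name cannot inject shell commands.
          HEAD_REF: ${{ github.head_ref }}
        run: |
          pensar benchmark . main "$HEAD_REF"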

      - name: Check Results
        run: |
          # Fail if PR introduces new CRITICAL vulnerabilities
          # (implement custom check script)
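
The "Check Results" step above is left as a placeholder. One possible shape for that check is sketched below: a hypothetical fail_on_new_critical function that compares the CRITICAL count of the base branch with that of the PR branch and returns a non-zero status when the PR has more. How the two counts are read from the benchmark output is deliberately left out, because the schema of comparison.json and findings.json is not documented on this page.

```shell
# Hypothetical CI gate (not part of Pensar).
# Usage: fail_on_new_critical <base-critical-count> <head-critical-count>
# The caller is responsible for extracting both counts from the reports.
fail_on_new_critical() {
  local base="$1" head="$2"
  if [ "$head" -gt "$base" ]; then
    echo "FAIL: CRITICAL findings rose from $base to $head" >&2
    return 1
  fi
  echo "OK: CRITICAL findings did not increase ($base -> $head)"
}
```

In a workflow step, a non-zero return from this function as the last command of the script would fail the job and block the pull request.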

How It Works

1. Repository Preparation

Pensar clones or accesses the specified repository and validates it’s a git repo.
2. Branch Iteration

For each specified branch:
  1. Checks out the branch
  2. Runs whitebox pentest on the codebase
  3. Stores findings separately per branch
3. Results Comparison

After all branches are tested, generates a comparison report showing:
  • Vulnerability counts per severity
  • New vulnerabilities introduced
  • Vulnerabilities fixed
  • Trend analysis
4. Report Generation

Creates markdown and JSON reports in:
~/.pensar/benchmarks/<timestamp>_<repo-name>/
  ├── comparison.md
  ├── comparison.json
  ├── main/
  │   ├── findings.json
  │   └── pocs/
  └── develop/
      ├── findings.json
      └── pocs/
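
Because each run writes to its own <timestamp>_<repo-name> directory, scripts often need to locate the most recent run for a repository. A minimal sketch follows; latest_benchmark_dir is a hypothetical helper, and it assumes that the timestamp prefix sorts lexicographically in chronological order (as a 2024-03-05 style date would) and that no other repository name ends in the same _<repo-name> suffix.

```shell
# Hypothetical helper (not part of Pensar): prints the newest results
# directory for a repository name.
# Usage: latest_benchmark_dir <repo-name> [benchmarks-dir]
latest_benchmark_dir() {
  local repo_name="$1"
  local base_dir="${2:-$HOME/.pensar/benchmarks}"
  local dir latest=""
  # Glob results are sorted, so the last match has the greatest timestamp.
  for dir in "$base_dir"/*_"$repo_name"/; do
    [ -d "$dir" ] || continue   # skips the literal pattern when nothing matches
    latest="${dir%/}"
  done
  # Print the result, or return non-zero if no run was found.
  [ -n "$latest" ] && echo "$latest"
}
```

For example, `latest_benchmark_dir webapp` would print the newest matching directory under ~/.pensar/benchmarks/, from which comparison.md or a branch's findings.json can be opened.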

Use Cases

Track security improvements across development:
# Weekly security benchmarks
pensar benchmark /path/to/app --all-branches --limit 10
Monitor if new features introduce vulnerabilities.

Limitations

Benchmark mode needs full source code access. It cannot run on blackbox targets without source.
Each branch takes 5-15 minutes to test, depending on codebase size, so benchmarking 10 branches may take 1-2 hours or more.
Benchmark mode tests the code as it exists on each branch at the time of testing. It does not account for runtime environment differences.
AI-based testing may find different vulnerabilities on repeated runs of the same branch. Use the same model for every run you intend to compare.
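
As a back-of-the-envelope check on the timing figures above, total run time can be estimated by multiplying the branch count by the 5 to 15 minute per-branch range. The sketch below assumes branches are tested one after another, which this page does not state explicitly; estimate_benchmark_minutes is a hypothetical helper, not a Pensar command.

```shell
# Hypothetical helper (not part of Pensar): estimates total benchmark time
# from the documented 5-15 minutes per branch, assuming sequential runs.
# Usage: estimate_benchmark_minutes <branch-count>
estimate_benchmark_minutes() {
  local branches="$1"
  local min_per_branch=5 max_per_branch=15
  echo "$((branches * min_per_branch))-$((branches * max_per_branch)) minutes"
}

estimate_benchmark_minutes 10   # prints "50-150 minutes"
```

An estimate like this can help pick a --limit value that fits a CI time budget.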

Troubleshooting

If the repository path is not recognized, ensure it points to a valid git repository:
cd /path/to/app
git status  # Should show branch info
Initialize git if needed (the repository still needs at least one commit so that a branch exists):
git init
If a branch cannot be found, verify that it exists:
git branch -a | grep branch-name
Fetch remote branches if needed:
git fetch --all
If a benchmark takes too long, use --limit to reduce the number of branches:
pensar benchmark /path/to/app --all-branches --limit 3
Or test specific branches only:
pensar benchmark /path/to/app main develop
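
The manual checks above can be bundled into a single pre-flight script that runs before a benchmark. This is a sketch under stated assumptions: preflight_check is a hypothetical helper, not a Pensar command, and it only verifies local branches (refs/heads), so remote-only branches would need a `git fetch` and a local checkout first.

```shell
# Hypothetical pre-flight check (not part of Pensar).
# Usage: preflight_check <repo-path> <branch>...
# Returns 0 only if the path is a git repository and every branch exists locally.
preflight_check() {
  local repo="$1"
  shift
  if ! git -C "$repo" rev-parse --git-dir >/dev/null 2>&1; then
    echo "Not a git repository: $repo" >&2
    return 1
  fi
  local branch status=0
  for branch in "$@"; do
    # show-ref --verify requires the full ref name and matches it exactly.
    if ! git -C "$repo" show-ref --verify --quiet "refs/heads/$branch"; then
      echo "Local branch not found: $branch" >&2
      status=1
    fi
  done
  return "$status"
}
```

A typical use would be `preflight_check /path/to/app main develop && pensar benchmark /path/to/app main develop`, so the benchmark only starts when both checks pass.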

Next Steps

Whitebox Testing

Learn more about source code security analysis

CI/CD Integration

Automate benchmarks in your pipeline

Pentest Command

Run standard pentests instead of benchmarks

API Reference

Use the benchmark API programmatically
