pensar benchmark

The pensar benchmark command runs automated security benchmarks across different branches of a repository, comparing vulnerability counts and security posture between code versions.

Synopsis

pensar benchmark <repo-path> [options] [branch1 branch2 ...]

Description

Benchmark mode performs automated pentests on specified branches of a repository, allowing you to:

Compare security posture across branches
Track vulnerability trends over development cycles
Validate that security fixes reduce vulnerabilities
Test multiple code versions efficiently

Benchmark mode requires local access to a git repository with the source code.

Arguments

repo-path (string, required)

Path to the git repository to benchmark.

pensar benchmark /path/to/vulnerable-app

Must be a valid git repository with at least one branch.

branches (string[])

Specific branches to benchmark (optional).

pensar benchmark /path/to/app main develop feature/auth

If no branches are specified, Pensar tests all branches when --all-branches is set and otherwise defaults to the current branch.

Options

--all-branches (boolean)

Test all branches in the repository.

pensar benchmark /path/to/app --all-branches

Useful for comprehensive security audits across every branch of a repository.

--limit (number)

Limit the number of branches to test.

pensar benchmark /path/to/app --all-branches --limit 5

Tests only the first N branches (by git branch listing order).

--skip (number)

Skip the first N branches.

pensar benchmark /path/to/app --all-branches --skip 3 --limit 5

Useful for paginating through large branch lists.
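How --skip and --limit combine can be pictured as a slice over the ordered branch list. The sketch below is illustrative only: select_branches is a hypothetical helper, not part of Pensar, and it assumes --skip is applied before --limit, as the pagination example suggests.

```python
from typing import List, Optional


def select_branches(branches: List[str], skip: int = 0,
                    limit: Optional[int] = None) -> List[str]:
    """Drop the first `skip` branches, then keep at most `limit` of the rest.

    Assumption: skip is applied before limit (not confirmed by the Pensar docs).
    """
    remaining = branches[skip:]
    return remaining if limit is None else remaining[:limit]


# Hypothetical branch list, in the order `git branch` might print it.
all_branches = ["develop", "feature/a", "feature/b", "feature/c",
                "feature/d", "feature/e", "main", "release/1.0", "release/1.1"]

# Two consecutive "pages" of five branches each.
page_one = select_branches(all_branches, skip=0, limit=5)
page_two = select_branches(all_branches, skip=5, limit=5)
print(page_one)
print(page_two)
```

The second page holds only four branches here, because the list runs out before the limit is reached.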

--model (string, default: "claude-sonnet-4-5")

AI model to use for benchmarking.

pensar benchmark /path/to/app --model claude-opus-4

Higher-capability models may find more vulnerabilities but cost more.

Examples

Basic Branch Comparison

Compare security posture between two branches:

pensar benchmark /path/to/webapp main develop

Example Output

==========================================================
BENCHMARK RESULTS
==========================================================

Branch: main
  Findings: 12 (3 CRITICAL, 5 HIGH, 3 MEDIUM, 1 LOW)
  Time: 8m 32s

Branch: develop
  Findings: 8 (2 CRITICAL, 3 HIGH, 2 MEDIUM, 1 LOW)
  Time: 7m 18s

Comparison:
  develop has 4 fewer vulnerabilities than main
  CRITICAL reduced by 1
  HIGH reduced by 2

Results saved to:
  ~/.pensar/benchmarks/2024-03-05_webapp/
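The comparison lines in the sample output amount to a per-severity subtraction. The sketch below reproduces that arithmetic from the counts shown; the numbers are typed in by hand rather than read from Pensar output, and the variable names are illustrative.

```python
from collections import Counter

SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]

# Counts copied by hand from the example output.
main = Counter({"CRITICAL": 3, "HIGH": 5, "MEDIUM": 3, "LOW": 1})
develop = Counter({"CRITICAL": 2, "HIGH": 3, "MEDIUM": 2, "LOW": 1})

# A negative delta means develop has fewer findings than main.
delta = {sev: develop[sev] - main[sev] for sev in SEVERITIES}
total_delta = sum(develop.values()) - sum(main.values())

print(total_delta)
for sev in SEVERITIES:
    if delta[sev] != 0:
        print(sev, delta[sev])
```

By this arithmetic MEDIUM also drops by 1, which the sample output does not list.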

Test All Branches (Limited)

Benchmark the first 3 branches in git branch listing order:

pensar benchmark /path/to/api --all-branches --limit 3

Feature Branch Validation

Test if a security fix reduces vulnerabilities:

# Test before fix
pensar benchmark /path/to/app main

# Test after fix
pensar benchmark /path/to/app feature/fix-sqli

# Compare results

CI/CD Integration

Run benchmark in continuous integration:

.github/workflows/benchmark.yml

name: Security Benchmark

on:
  pull_request:
    branches: [ main ]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Fetch all history for all branches

      - name: Install Pensar Apex
        run: npm install -g @pensar/apex

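      - name: Run Benchmark
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          # Pass the PR branch name through the environment instead of
          # interpolating it into the script, which avoids shell injection
          HEAD_REF: ${{ github.head_ref }}
        run: |
          pensar benchmark . main "$HEAD_REF"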

      - name: Check Results
        run: |
          # Fail if PR introduces new CRITICAL vulnerabilities
          # (implement custom check script)
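The "Check Results" step leaves the gate to a custom script. One possible shape is sketched below. It assumes each findings.json is a JSON array of objects with a severity field; that schema is not documented here, and load_findings, count_critical, and gate are hypothetical names.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_findings(path: Path) -> List[Dict]:
    """Read one branch's findings.json (assumed to be a JSON array of objects)."""
    return json.loads(path.read_text(encoding="utf-8"))


def count_critical(findings: List[Dict]) -> int:
    """Count findings whose assumed `severity` field is CRITICAL."""
    return sum(1 for f in findings
               if str(f.get("severity", "")).upper() == "CRITICAL")


def gate(base: List[Dict], head: List[Dict]) -> int:
    """Return an exit code: 1 if head has more CRITICAL findings than base."""
    return 1 if count_critical(head) > count_critical(base) else 0


# In-memory demo data; a real script would call load_findings() on the
# per-branch files under the benchmark results directory.
base_findings = [{"severity": "CRITICAL"}, {"severity": "HIGH"}]
head_findings = [{"severity": "CRITICAL"}, {"severity": "CRITICAL"}]
exit_code = gate(base_findings, head_findings)
print(exit_code)
```

In a workflow, the script would pass the result of gate() to sys.exit() so that a non-zero code fails the job. Comparing raw counts is a coarse check, since repeated runs can report different findings.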

How It Works

Repository Preparation

Pensar clones or accesses the specified repository and validates it’s a git repo.

Branch Iteration

For each specified branch:

Checks out the branch
Runs whitebox pentest on the codebase
Stores findings separately per branch

Results Comparison

After all branches are tested, generates a comparison report showing:

Vulnerability counts per severity
New vulnerabilities introduced
Vulnerabilities fixed
Trend analysis
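The "introduced" and "fixed" lists in such a report can be thought of as set differences between two branches' findings. The sketch below illustrates the idea only; it is not Pensar's implementation, and it assumes a finding can be identified by hypothetical title and file fields.

```python
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, str]


def finding_keys(findings: Iterable[Dict]) -> Set[Key]:
    """Reduce findings to comparable keys (assumed fields: `title`, `file`)."""
    return {(f["title"], f["file"]) for f in findings}


def diff_findings(base: Iterable[Dict], head: Iterable[Dict]) -> Dict[str, Set[Key]]:
    """Classify findings as introduced, fixed, or unchanged between branches."""
    base_keys, head_keys = finding_keys(base), finding_keys(head)
    return {
        "introduced": head_keys - base_keys,  # only on head
        "fixed": base_keys - head_keys,       # only on base
        "unchanged": base_keys & head_keys,   # on both
    }


# Hypothetical findings for two branches.
main_findings = [
    {"title": "SQL injection", "file": "api/users.py"},
    {"title": "Hardcoded secret", "file": "config.py"},
]
develop_findings = [
    {"title": "Hardcoded secret", "file": "config.py"},
    {"title": "Open redirect", "file": "web/login.py"},
]

result = diff_findings(main_findings, develop_findings)
print(sorted(result["introduced"]))
print(sorted(result["fixed"]))
```

Matching on title and file is a simplification: a finding whose file is renamed between branches would show up as both fixed and introduced.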

Report Generation

Creates markdown and JSON reports in:

~/.pensar/benchmarks/<timestamp>_<repo-name>/
  ├── comparison.md
  ├── comparison.json
  ├── main/
  │   ├── findings.json
  │   └── pocs/
  └── develop/
      ├── findings.json
      └── pocs/
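Because findings are stored per branch, a results directory can be summarized with a short script. The sketch below is a minimal example under stated assumptions: each findings.json is a JSON array of objects with a severity field (the schema is not documented here), and severity_counts_by_branch is a hypothetical helper. It builds a temporary directory in the same layout so that it runs without Pensar installed.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict


def severity_counts_by_branch(report_dir: Path) -> Dict[str, Counter]:
    """Map each branch directory to a Counter of its findings' severities.

    Assumptions: findings.json is a JSON array of objects with a `severity`
    field, and the branch name is the file's parent path relative to
    report_dir.
    """
    counts: Dict[str, Counter] = {}
    for findings_file in sorted(report_dir.rglob("findings.json")):
        branch = findings_file.parent.relative_to(report_dir).as_posix()
        findings = json.loads(findings_file.read_text(encoding="utf-8"))
        counts[branch] = Counter(f.get("severity", "UNKNOWN") for f in findings)
    return counts


# Build a throwaway report directory so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    samples = {"main": ["CRITICAL", "HIGH", "HIGH"], "develop": ["HIGH"]}
    for branch, severities in samples.items():
        (root / branch).mkdir()
        payload = [{"severity": s} for s in severities]
        (root / branch / "findings.json").write_text(
            json.dumps(payload), encoding="utf-8")
    counts = severity_counts_by_branch(root)

print(counts["main"]["HIGH"], counts["develop"]["HIGH"])
```

To summarize a real run, pass the actual results directory as a Path instead of the temporary one.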

Use Cases

Development Cycle

Track security improvements across development:

# Weekly security benchmarks
pensar benchmark /path/to/app --all-branches --limit 10

Monitor if new features introduce vulnerabilities.

Security Fix Validation

Verify security patches work:

# Before fix
pensar benchmark /path/to/app v1.0.0

# After fix
pensar benchmark /path/to/app v1.0.1

# Confirm CRITICAL count decreased

Code Review

Assess security impact of pull requests:

# Compare PR branch against main
pensar benchmark /path/to/app main feature/new-auth

# Review introduced vulnerabilities

Historical Analysis

Analyze security trends over time:

# Test release tags
pensar benchmark /path/to/app v1.0.0 v1.1.0 v1.2.0

# Plot vulnerability trends

Limitations

Requires whitebox access

Benchmark mode needs full source code access. It cannot run on blackbox targets without source.

Time intensive

Each branch takes 5-15 minutes to test, depending on codebase size. At that rate, benchmarking 10 branches one after another takes roughly 50 minutes to 2.5 hours.

Branch state matters

Pensar tests the code as it exists on each branch at the time of the run. It does not account for differences between runtime environments.

Determinism not guaranteed

AI-based testing may find different vulnerabilities on repeated runs of the same branch. Use the same model for every run you intend to compare.
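One way to soften run-to-run variation is to benchmark the same branch several times and compare a summary statistic rather than a single run. The sketch below takes the median count per severity. It is a suggestion rather than a Pensar feature, and both the counts and the median_counts helper are hypothetical.

```python
from statistics import median
from typing import Dict, List


def median_counts(runs: List[Dict[str, int]]) -> Dict[str, float]:
    """Median finding count per severity across repeated runs of one branch."""
    severities = sorted({sev for run in runs for sev in run})
    # A severity missing from a run counts as zero findings for that run.
    return {sev: median(run.get(sev, 0) for run in runs) for sev in severities}


# Hypothetical counts from three runs of the same branch with the same model.
runs = [
    {"CRITICAL": 3, "HIGH": 5},
    {"CRITICAL": 2, "HIGH": 6},
    {"CRITICAL": 3, "HIGH": 4},
]

summary = median_counts(runs)
print(summary)
```

The median is less sensitive to a single unusually noisy run than the mean, at the cost of extra benchmark time.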

Troubleshooting

'Not a git repository' error

Ensure the path points to a valid git repository:

cd /path/to/app
git status  # Should show branch info

Initialize git if needed (a branch only exists after the first commit, so commit at least once before benchmarking):

git init

Branch not found

Verify branch exists:

git branch -a | grep branch-name

Fetch remote branches if needed:

git fetch --all

Benchmark taking too long

Use --limit to reduce branches:

pensar benchmark /path/to/app --all-branches --limit 3

Or test specific branches only:

pensar benchmark /path/to/app main develop

Next Steps

Whitebox Testing

Learn more about source code security analysis

CI/CD Integration

Automate benchmarks in your pipeline

Pentest Command

Run standard pentests instead of benchmarks

API Reference

Use the benchmark API programmatically
