The Benchmark API allows you to programmatically compare security posture across different branches or commits of a repository.

runBenchmarkComparisonAgent

Runs a benchmark comparison agent to analyze security differences between code versions.

Import

import { runBenchmarkComparisonAgent } from '@pensar/apex';

Function Signature

async function runBenchmarkComparisonAgent(
  input: BenchmarkComparisonAgentInput
): Promise<{
  comparison: BenchmarkComparison | null;
  resultsPath: string;
}>

Parameters

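input (BenchmarkComparisonAgentInput, required)
Configuration for the benchmark comparison. The examples on this page set repository (path to the repository), branches (the refs to compare), and model, and some also pass authConfig and callbacks.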
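This page does not list the fields of BenchmarkComparisonAgentInput. As a sketch for typing your own wrapper code, the interface below mirrors only the fields the examples pass. The names ComparisonInputSketch and makeTwoBranchInput are hypothetical, and whether the real type accepts exactly this shape is an assumption.

```typescript
// Hypothetical local type inferred from the examples on this page.
// It is NOT the published BenchmarkComparisonAgentInput and may be incomplete.
interface ComparisonInputSketch {
  repository: string; // local path to the repository
  branches: string[]; // refs to compare, e.g. ['main', 'develop']
  model: string;      // model identifier, e.g. 'claude-sonnet-4-5'
  authConfig?: { anthropicAPIKey?: string };
  callbacks?: {
    onTextDelta?: (d: { text: string }) => void;
    onToolCall?: (d: { toolName: string }) => void;
    onToolResult?: (d: { toolName: string }) => void;
    onError?: (e: unknown) => void;
  };
}

// Hypothetical helper that builds a two-branch input like the one in
// the Basic Benchmark example.
function makeTwoBranchInput(
  repository: string,
  baseBranch: string,
  headBranch: string,
  model: string,
  apiKey?: string
): ComparisonInputSketch {
  const input: ComparisonInputSketch = {
    repository,
    branches: [baseBranch, headBranch],
    model,
  };
  // Attach authConfig only when a key is supplied; the Multiple Branches
  // example omits it entirely.
  if (apiKey !== undefined) {
    input.authConfig = { anthropicAPIKey: apiKey };
  }
  return input;
}
```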

Return Value

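comparison (BenchmarkComparison | null)
Comparison results between branches, or null when no comparison is available. The examples read its matched, totalExpected, precision, and recall fields.

resultsPath (string)
Path to the results directory.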
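Because comparison can be null, callers should handle that case explicitly rather than skip it. A minimal sketch, assuming the comparison carries the matched, totalExpected, precision, and recall fields the examples read; the interfaces and describeResult are hypothetical and not part of @pensar/apex.

```typescript
// Assumed minimal shape, inferred from the fields the examples read.
// The real BenchmarkComparison type likely has more fields.
interface ComparisonSketch {
  matched: unknown[];
  totalExpected: number;
  precision: number; // assumed to be a fraction in [0, 1]
  recall: number;    // assumed to be a fraction in [0, 1]
}

interface ResultSketch {
  comparison: ComparisonSketch | null;
  resultsPath: string;
}

// Hypothetical helper: one-line summary that also covers the null case.
function describeResult(result: ResultSketch): string {
  const { comparison, resultsPath } = result;
  if (comparison === null) {
    return `No comparison produced; inspect ${resultsPath}`;
  }
  // Format a fraction as a whole percentage, as the Basic Benchmark example does.
  const pct = (x: number) => `${Math.round(x * 100)}%`;
  return (
    `Matched ${comparison.matched.length}/${comparison.totalExpected}, ` +
    `precision ${pct(comparison.precision)}, recall ${pct(comparison.recall)} ` +
    `(results in ${resultsPath})`
  );
}
```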

Example Usage

Basic Benchmark

Compare two branches:
import { runBenchmarkComparisonAgent } from '@pensar/apex';

const result = await runBenchmarkComparisonAgent({
  repository: '/path/to/webapp',
  branches: ['main', 'develop'],
  model: 'claude-sonnet-4-5',
  authConfig: {
    anthropicAPIKey: process.env.ANTHROPIC_API_KEY
  },
  callbacks: {
    onTextDelta: (d) => process.stdout.write(d.text),
    onToolCall: (d) => console.log(`→ ${d.toolName}`),
    onToolResult: (d) => console.log(`✓ ${d.toolName} completed`),
    onError: (e) => console.error('Error:', e)
  }
});

if (result.comparison) {
  console.log(`Matched: ${result.comparison.matched.length}/${result.comparison.totalExpected}`);
  console.log(`Precision: ${Math.round(result.comparison.precision * 100)}%`);
  console.log(`Recall: ${Math.round(result.comparison.recall * 100)}%`);
}
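The page does not define precision and recall. Assuming they follow the standard definitions, precision is matched findings divided by all findings the agent reported, and recall is matched findings divided by totalExpected (the same ratio as the Matched line). The hypothetical computeMetrics below shows that arithmetic on plain counts, plus F1, the harmonic mean of the two.

```typescript
// Standard retrieval metrics. That Apex computes precision and recall
// exactly this way is an assumption.
interface Metrics {
  precision: number;
  recall: number;
  f1: number;
}

function computeMetrics(matched: number, reported: number, expected: number): Metrics {
  // Guard empty denominators: this sketch reports 0 instead of NaN.
  const precision = reported > 0 ? matched / reported : 0;
  const recall = expected > 0 ? matched / expected : 0;
  // F1 is the harmonic mean of precision and recall.
  const f1 = precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;
  return { precision, recall, f1 };
}

// Example: 3 of 4 reported findings match, out of 6 expected.
const m = computeMetrics(3, 4, 6);
console.log(m.precision, m.recall, m.f1.toFixed(2)); // 0.75 0.5 0.60
```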

Multiple Branches

Compare across multiple versions:
import { runBenchmarkComparisonAgent } from '@pensar/apex';

const versions = ['v1.0.0', 'v1.1.0', 'v1.2.0'];

for (let i = 0; i < versions.length - 1; i++) {
  const result = await runBenchmarkComparisonAgent({
    repository: '/path/to/api',
    branches: [versions[i], versions[i + 1]],
    model: 'claude-opus-4'
  });

  console.log(`\n${versions[i]} -> ${versions[i + 1]}:`);
  console.log(`Results: ${result.resultsPath}`);
}
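The loop generalizes to any list of refs. A sketch, assuming you want the runs to stay sequential as in the loop: the hypothetical compareConsecutive takes the runner as a parameter, so it can wrap runBenchmarkComparisonAgent in real use or a stub in tests.

```typescript
// Hypothetical helper: runs `run` on each adjacent pair of refs, one at a
// time, and collects the results in order.
async function compareConsecutive<R>(
  refs: string[],
  run: (pair: [string, string]) => Promise<R>
): Promise<Array<{ from: string; to: string; result: R }>> {
  const out: Array<{ from: string; to: string; result: R }> = [];
  for (let i = 0; i + 1 < refs.length; i++) {
    const pair: [string, string] = [refs[i], refs[i + 1]];
    // Await each run before starting the next, matching the original loop.
    out.push({ from: pair[0], to: pair[1], result: await run(pair) });
  }
  return out;
}

// Usage with the real function (not executed here):
// const runs = await compareConsecutive(['v1.0.0', 'v1.1.0', 'v1.2.0'], (branches) =>
//   runBenchmarkComparisonAgent({ repository: '/path/to/api', branches, model: 'claude-opus-4' }));
```

Injecting the runner keeps the pairing logic testable without API keys. Running sequentially also avoids having several agents work on the same repository path at once.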

CI/CD Integration

Automated benchmark in CI:
import { runBenchmarkComparisonAgent } from '@pensar/apex';

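const mainBranch = 'main';
// GITHUB_HEAD_REF is only set for pull_request events in GitHub Actions
const prBranch = process.env.GITHUB_HEAD_REF;
if (!prBranch) {
  console.error('GITHUB_HEAD_REF is not set; run this on a pull_request event.');
  process.exit(1);
}

const result = await runBenchmarkComparisonAgent({
  repository: process.cwd(),
  branches: [mainBranch, prBranch],
  model: 'claude-sonnet-4-5',
  authConfig: {
    anthropicAPIKey: process.env.ANTHROPIC_API_KEY
  }
});

// Fail the job if no comparison was produced, so a broken run cannot pass silently
if (!result.comparison) {
  console.error(`No comparison produced; see ${result.resultsPath}`);
  process.exit(1);
}

// Fail the job if not every expected finding was matched (recall below 100%)
if (result.comparison.recall < 1.0) {
  const { matched, totalExpected } = result.comparison;
  console.error(`Only ${matched.length}/${totalExpected} expected findings matched`);
  process.exit(1);
}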
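A recall threshold of 1.0 fails the job on any missed expected finding. If you want tunable thresholds, a sketch like the hypothetical gateFailures below collects every failure reason. The default values are illustrative, not recommendations from Pensar.

```typescript
// Assumed minimal shape; mirrors the fields the examples read.
interface GateInput {
  precision: number;
  recall: number;
}

// Hypothetical gate: returns the reasons a build should fail (empty means pass).
function gateFailures(
  comparison: GateInput | null,
  minRecall = 1.0,
  minPrecision = 0
): string[] {
  // A missing comparison counts as a failure.
  if (comparison === null) return ['no comparison was produced'];
  const reasons: string[] = [];
  if (comparison.recall < minRecall) {
    reasons.push(`recall ${comparison.recall} is below ${minRecall}`);
  }
  if (comparison.precision < minPrecision) {
    reasons.push(`precision ${comparison.precision} is below ${minPrecision}`);
  }
  return reasons;
}

// In a CI script (not executed here):
// const failures = gateFailures(result.comparison, 0.9, 0.8);
// if (failures.length > 0) { console.error(failures.join('\n')); process.exit(1); }
```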

Use Cases

Release Validation

Verify security fixes reduce vulnerabilities

Code Review

Assess security impact of pull requests

Regression Testing

Ensure no security regressions

Trend Analysis

Track security posture over time
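For regression testing and trend analysis, one approach is to keep the recall from each run in chronological order and flag drops. A minimal sketch; RunRecord and findRegressions are hypothetical, and where you store the history (a file, a CI artifact) is up to you.

```typescript
// Hypothetical record of one benchmark run; field names are illustrative.
interface RunRecord {
  label: string;  // commit, tag, or date identifying the run
  recall: number; // fraction in [0, 1]
}

// Returns the labels of runs whose recall dropped by more than `tolerance`
// relative to the immediately preceding run. `history` must be chronological.
function findRegressions(history: RunRecord[], tolerance = 0): string[] {
  const regressions: string[] = [];
  for (let i = 1; i < history.length; i++) {
    const drop = history[i - 1].recall - history[i].recall;
    if (drop > tolerance) regressions.push(history[i].label);
  }
  return regressions;
}
```

A small tolerance keeps run-to-run noise in an agent-driven benchmark from being reported as a regression.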

Related

Benchmark Command

CLI interface for benchmarking

Whitebox Testing

Source code analysis guide

CI/CD Integration

Automate benchmarks in pipelines

Findings

Understanding vulnerability findings
