Benchmark API

The Benchmark API allows you to programmatically compare security posture across different branches or commits of a repository.

runBenchmarkComparisonAgent

Runs a benchmark comparison agent to analyze security differences between code versions.

Import

import { runBenchmarkComparisonAgent } from '@pensar/apex';

Function Signature

async function runBenchmarkComparisonAgent(
  input: BenchmarkComparisonAgentInput
): Promise<{
  comparison: BenchmarkComparison | null;
  resultsPath: string;
}>

Parameters

input: BenchmarkComparisonAgentInput (required)
Configuration for the benchmark comparison. BenchmarkComparisonAgentInput has the following fields:

  repository: string (required)
  Path to the git repository to benchmark.

  branches: string[] (required)
  List of branches to compare.

  model: AIModel
  AI model to use (default: claude-sonnet-4-5).

  authConfig: AIAuthConfig
  AI provider authentication configuration.

  callbacks: ConsumeCallbacks
  Event handlers for agent execution:

    onTextDelta: Stream text output
    onToolCall: Tool invocation events
    onToolResult: Tool completion events
    onError: Error handling
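
The four handlers listed above can be wired to an in-memory log, which is handy when you want to inspect a run afterwards instead of printing as it happens. The following is a minimal sketch, not the library's own types: the payload shapes (`text`, `toolName`) are inferred from the fields used in the examples on this page, and `CallbackSketch`, `RunLog`, `createRecordingCallbacks`, and the tool name `read_file` are hypothetical names introduced for illustration.

```typescript
// Hypothetical payload shapes, inferred from the fields this page's examples
// read (d.text, d.toolName). The real ConsumeCallbacks types may differ.
type TextDelta = { text: string };
type ToolEvent = { toolName: string };

interface CallbackSketch {
  onTextDelta: (d: TextDelta) => void;
  onToolCall: (d: ToolEvent) => void;
  onToolResult: (d: ToolEvent) => void;
  onError: (e: unknown) => void;
}

interface RunLog {
  text: string;
  toolCalls: string[];
  toolResults: string[];
  errors: unknown[];
}

// Build a callbacks object that records every event into an in-memory log.
function createRecordingCallbacks(): { callbacks: CallbackSketch; log: RunLog } {
  const log: RunLog = { text: '', toolCalls: [], toolResults: [], errors: [] };
  const callbacks: CallbackSketch = {
    onTextDelta: (d) => { log.text += d.text; },
    onToolCall: (d) => { log.toolCalls.push(d.toolName); },
    onToolResult: (d) => { log.toolResults.push(d.toolName); },
    onError: (e) => { log.errors.push(e); },
  };
  return { callbacks, log };
}

// Simulate the events an agent run might emit ("read_file" is a made-up tool name).
const { callbacks, log } = createRecordingCallbacks();
callbacks.onTextDelta({ text: 'Scanning ' });
callbacks.onTextDelta({ text: 'main...' });
callbacks.onToolCall({ toolName: 'read_file' });
callbacks.onToolResult({ toolName: 'read_file' });
console.log(log.text); // prints "Scanning main..."
```

If the real payloads match these shapes, the `callbacks` object could be passed as the `callbacks` field of the input, and `log` read once the returned promise resolves.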

Return Value

comparison: BenchmarkComparison | null
Comparison results between branches. BenchmarkComparison has the following fields:

  matched: Finding[]
  Vulnerabilities found in both branches.

  totalExpected: number
  Expected number of findings.

  precision: number
  Precision score (0-1).

  recall: number
  Recall score (0-1).

resultsPath: string
Path to the results directory.
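
This page does not say how the precision and recall scores are computed. As a worked illustration only, the sketch below uses the conventional definitions: precision is matched findings divided by all reported findings, and recall is matched findings divided by expected findings. That choice is an assumption, and the library may calculate its scores differently. `precisionRecall` and `totalReported` are hypothetical names.

```typescript
// Conventional precision/recall, assumed here for illustration.
// The library's actual computation is not documented on this page.
function precisionRecall(
  matched: number,       // findings that match an expected finding
  totalReported: number, // all findings reported (hypothetical input)
  totalExpected: number  // expected number of findings
): { precision: number; recall: number } {
  // Guard against division by zero when nothing was reported or expected.
  const precision = totalReported === 0 ? 0 : matched / totalReported;
  const recall = totalExpected === 0 ? 0 : matched / totalExpected;
  return { precision, recall };
}

// 8 of 10 reported findings match, out of 16 expected findings.
const scores = precisionRecall(8, 10, 16);
console.log(scores.precision); // prints 0.8
console.log(scores.recall);    // prints 0.5
```

Under these definitions, a low precision means many reported findings had no expected counterpart, while a low recall means many expected findings went unmatched.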

Example Usage

Basic Benchmark

Compare two branches:

import { runBenchmarkComparisonAgent } from '@pensar/apex';

const result = await runBenchmarkComparisonAgent({
  repository: '/path/to/webapp',
  branches: ['main', 'develop'],
  model: 'claude-sonnet-4-5',
  authConfig: {
    anthropicAPIKey: process.env.ANTHROPIC_API_KEY
  },
  callbacks: {
    onTextDelta: (d) => process.stdout.write(d.text),
    onToolCall: (d) => console.log(`→ ${d.toolName}`),
    onToolResult: (d) => console.log(`✓ ${d.toolName} completed`),
    onError: (e) => console.error('Error:', e)
  }
});

if (result.comparison) {
  console.log(`Matched: ${result.comparison.matched.length}/${result.comparison.totalExpected}`);
  console.log(`Precision: ${Math.round(result.comparison.precision * 100)}%`);
  console.log(`Recall: ${Math.round(result.comparison.recall * 100)}%`);
}

Multiple Branches

Compare each pair of consecutive versions:

import { runBenchmarkComparisonAgent } from '@pensar/apex';

const versions = ['v1.0.0', 'v1.1.0', 'v1.2.0'];

for (let i = 0; i < versions.length - 1; i++) {
  const result = await runBenchmarkComparisonAgent({
    repository: '/path/to/api',
    branches: [versions[i], versions[i + 1]],
    model: 'claude-opus-4'
  });

  console.log(`\n${versions[i]} -> ${versions[i + 1]}:`);
  console.log(`Results: ${result.resultsPath}`);
}
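
The loop above compares each version with the one that follows it. The pairing step can be pulled out into a small helper so it can be checked on its own before any agent runs. This is a sketch; `adjacentPairs` is a hypothetical helper, not part of `@pensar/apex`.

```typescript
// Turn an ordered list of versions into consecutive [older, newer] pairs.
// A list with fewer than two entries yields no pairs.
function adjacentPairs(versions: string[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (let i = 0; i < versions.length - 1; i++) {
    pairs.push([versions[i], versions[i + 1]]);
  }
  return pairs;
}

const versionPairs = adjacentPairs(['v1.0.0', 'v1.1.0', 'v1.2.0']);
console.log(versionPairs.length); // prints 2
```

Each pair could then be passed as the `branches` field of one call, which gives N - 1 runs for N versions.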

CI/CD Integration

Run the benchmark automatically in CI, comparing the pull request branch against main:

import { runBenchmarkComparisonAgent } from '@pensar/apex';

const mainBranch = 'main';
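// GITHUB_HEAD_REF is only set by GitHub Actions for pull request events.
const prBranch = process.env.GITHUB_HEAD_REF;
if (!prBranch) {
  throw new Error('GITHUB_HEAD_REF is not set; run this on a pull request event');
}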

const result = await runBenchmarkComparisonAgent({
  repository: process.cwd(),
  branches: [mainBranch, prBranch],
  model: 'claude-sonnet-4-5',
  authConfig: {
    anthropicAPIKey: process.env.ANTHROPIC_API_KEY
  }
});

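// Fail the job when recall is below 1.0, meaning not every expected finding was matched.
// This check does not look at severity, and a null comparison passes it silently.
if (result.comparison && result.comparison.recall < 1.0) {
  console.error(`Recall is below 1.0; see ${result.resultsPath} for details.`);
  process.exit(1);
}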
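
The check above is skipped when `comparison` is null, so a run that produces no comparison lets the job pass. A stricter gate could treat that case as a failure and make the recall threshold configurable. The sketch below is one way to do this under those assumptions; `ComparisonScores`, `GateDecision`, `evaluateGate`, and `minRecall` are hypothetical names, and only the `precision` and `recall` fields come from this page.

```typescript
// Only the two score fields documented on this page are needed here.
type ComparisonScores = { precision: number; recall: number };

interface GateDecision {
  pass: boolean;
  reason: string;
}

// Decide whether a CI job should pass. A missing comparison fails the gate,
// unlike the inline check, which skips it (an assumption about desired policy).
function evaluateGate(
  comparison: ComparisonScores | null,
  minRecall: number = 1.0
): GateDecision {
  if (comparison === null) {
    return { pass: false, reason: 'no comparison was produced' };
  }
  if (comparison.recall < minRecall) {
    return { pass: false, reason: `recall ${comparison.recall} is below ${minRecall}` };
  }
  return { pass: true, reason: 'recall meets the threshold' };
}

const decision = evaluateGate({ precision: 0.9, recall: 0.5 });
console.log(decision.pass); // prints false
```

In a pipeline, the caller would exit with a non-zero status when `decision.pass` is false and print `decision.reason` alongside the results path.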

Use Cases

Release Validation

Verify security fixes reduce vulnerabilities

Code Review

Assess security impact of pull requests

Regression Testing

Ensure no security regressions

Trend Analysis

Track security posture over time

Related

Benchmark Command

CLI interface for benchmarking

Whitebox Testing

Source code analysis guide

CI/CD Integration

Automate benchmarks in pipelines

Findings

Understanding vulnerability findings
