The Benchmark API allows you to programmatically compare security posture across different branches or commits of a repository.
runBenchmarkComparisonAgent
Runs a benchmark comparison agent to analyze security differences between code versions.
Import
```typescript
import { runBenchmarkComparisonAgent } from '@pensar/apex';
```
Function Signature
```typescript
async function runBenchmarkComparisonAgent(
  input: BenchmarkComparisonAgentInput
): Promise<{
  comparison: BenchmarkComparison | null;
  resultsPath: string;
}>
```
Parameters
input (BenchmarkComparisonAgentInput, required)
Configuration for the benchmark comparison. Fields:
- repository: Path to the git repository to benchmark.
- branches: List of branches to compare.
- model: AI model to use (default: claude-sonnet-4-5).
- authConfig: AI provider authentication configuration.
- callbacks: Event handlers for agent execution:
  - onTextDelta: Stream text output
  - onToolCall: Tool invocation events
  - onToolResult: Tool completion events
  - onError: Error handling
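The `callbacks` field can also capture agent activity instead of printing it, for example to attach a log to a CI run. The following is a minimal sketch, assuming the event payloads carry the `text` and `toolName` fields that the examples below read. `TextDeltaEvent`, `ToolEvent`, `AgentEvent`, and `createRecordingCallbacks` are hypothetical names, not part of `@pensar/apex`.

```typescript
// Hypothetical payload shapes, inferred from the fields the examples read
// (d.text and d.toolName); the library's real types may differ.
interface TextDeltaEvent { text: string }
interface ToolEvent { toolName: string }

// Hypothetical record of one agent event.
type AgentEvent =
  | { kind: 'text'; text: string }
  | { kind: 'toolCall'; toolName: string }
  | { kind: 'toolResult'; toolName: string }
  | { kind: 'error'; error: unknown };

// Builds a callbacks object that appends every event to an in-memory
// array instead of printing it.
function createRecordingCallbacks() {
  const events: AgentEvent[] = [];
  const callbacks = {
    onTextDelta: (d: TextDeltaEvent) => {
      events.push({ kind: 'text', text: d.text });
    },
    onToolCall: (d: ToolEvent) => {
      events.push({ kind: 'toolCall', toolName: d.toolName });
    },
    onToolResult: (d: ToolEvent) => {
      events.push({ kind: 'toolResult', toolName: d.toolName });
    },
    onError: (e: unknown) => {
      events.push({ kind: 'error', error: e });
    },
  };
  return { callbacks, events };
}

// Simulate a short run by invoking the handlers directly;
// 'scan' is a made-up tool name.
const { callbacks, events } = createRecordingCallbacks();
callbacks.onToolCall({ toolName: 'scan' });
callbacks.onTextDelta({ text: 'Analyzing...' });
callbacks.onToolResult({ toolName: 'scan' });

const toolCalls = events.filter((e) => e.kind === 'toolCall').length;
console.log(`Recorded ${events.length} events, ${toolCalls} tool call(s)`);
// Recorded 3 events, 1 tool call(s)
```

The returned `callbacks` object could then be passed as the `callbacks` field of the input, with `events` inspected once the call resolves.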
Return Value
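comparison (BenchmarkComparison | null)
Comparison results between branches. Fields:
- matched: Vulnerabilities found in both branches.
- totalExpected: Expected number of findings.
- precision, recall: Scores between 0 and 1 (the examples below multiply them by 100 to print percentages).
resultsPath (string)
Path to the results directory.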
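For intuition about the `precision` and `recall` scores the examples below print, here is a sketch of the arithmetic, assuming the usual definitions: recall as matched findings over expected findings, and precision as matched findings over all reported findings. `Counts`, its `reported` field, and `precisionRecall` are hypothetical; the library's own formula may differ.

```typescript
// Hypothetical input counts; `reported` is not a documented field.
interface Counts {
  matched: number;   // findings that matched an expected finding
  reported: number;  // total findings reported by the agent (assumed)
  expected: number;  // corresponds to totalExpected
}

// Assumed standard definitions:
//   precision = matched / reported
//   recall    = matched / expected
function precisionRecall({ matched, reported, expected }: Counts) {
  // Convention chosen for this sketch: report 0 when a denominator is 0.
  const precision = reported === 0 ? 0 : matched / reported;
  const recall = expected === 0 ? 0 : matched / expected;
  return { precision, recall };
}

const { precision, recall } = precisionRecall({ matched: 6, reported: 8, expected: 10 });
console.log(`Precision: ${Math.round(precision * 100)}%`); // Precision: 75%
console.log(`Recall: ${Math.round(recall * 100)}%`);       // Recall: 60%
```

Returning 0 for an empty denominator is only one convention; a run with no expected findings could just as reasonably be treated as perfect recall.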
Example Usage
Basic Benchmark
Compare two branches:
```typescript
import { runBenchmarkComparisonAgent } from '@pensar/apex';

const result = await runBenchmarkComparisonAgent({
  repository: '/path/to/webapp',
  branches: ['main', 'develop'],
  model: 'claude-sonnet-4-5',
  authConfig: {
    anthropicAPIKey: process.env.ANTHROPIC_API_KEY
  },
  callbacks: {
    onTextDelta: (d) => process.stdout.write(d.text),
    onToolCall: (d) => console.log(`→ ${d.toolName}`),
    onToolResult: (d) => console.log(`✓ ${d.toolName} completed`),
    onError: (e) => console.error('Error:', e)
  }
});

if (result.comparison) {
  console.log(`Matched: ${result.comparison.matched.length}/${result.comparison.totalExpected}`);
  console.log(`Precision: ${Math.round(result.comparison.precision * 100)}%`);
  console.log(`Recall: ${Math.round(result.comparison.recall * 100)}%`);
}
```
Multiple Branches
Compare across multiple versions:
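```typescript
import { runBenchmarkComparisonAgent } from '@pensar/apex';

const versions = ['v1.0.0', 'v1.1.0', 'v1.2.0'];

// Compare each version with the next one
for (let i = 0; i < versions.length - 1; i++) {
  const result = await runBenchmarkComparisonAgent({
    repository: '/path/to/api',
    branches: [versions[i], versions[i + 1]],
    model: 'claude-opus-4',
    authConfig: {
      anthropicAPIKey: process.env.ANTHROPIC_API_KEY
    }
  });

  console.log(`\n${versions[i]} -> ${versions[i + 1]}:`);
  console.log(`Results: ${result.resultsPath}`);
}
```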
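The pairwise loop above can be factored into a reusable helper that walks a list of refs and collects each run's results directory. A minimal sketch follows; `compareAdjacent`, `Runner`, and `RunResult` are hypothetical names, and the runner is passed in so the sketch can be exercised with a stub instead of a real agent run.

```typescript
// Hypothetical shape of the one result field this sketch needs.
interface RunResult { resultsPath: string }

// Hypothetical runner type; in practice it would wrap
// runBenchmarkComparisonAgent with your repository, model and authConfig.
type Runner = (branches: [string, string]) => Promise<RunResult>;

// Compares each version with the next one, one run at a time, and returns
// a map from "from -> to" labels to results directories.
async function compareAdjacent(
  versions: readonly string[],
  run: Runner,
): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  for (let i = 0; i < versions.length - 1; i++) {
    const from = versions[i];
    const to = versions[i + 1];
    // Awaiting inside the loop keeps the runs sequential.
    const { resultsPath } = await run([from, to]);
    results.set(`${from} -> ${to}`, resultsPath);
  }
  return results;
}

// Stub runner that returns made-up paths, so the sketch runs
// without the library installed.
const stub: Runner = async ([from, to]) => ({ resultsPath: `/tmp/bench/${from}_${to}` });

compareAdjacent(['v1.0.0', 'v1.1.0', 'v1.2.0'], stub).then((results) => {
  results.forEach((path, label) => console.log(`${label}: ${path}`));
});
```

In a real script the runner might be `(branches) => runBenchmarkComparisonAgent({ repository: '/path/to/api', branches })`, with the remaining input fields supplied as in the examples above.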
CI/CD Integration
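Run the benchmark automatically in a GitHub Actions pull request workflow:
```typescript
import { runBenchmarkComparisonAgent } from '@pensar/apex';

const mainBranch = 'main';

// GITHUB_HEAD_REF is only set for pull_request and pull_request_target events
const prBranch = process.env.GITHUB_HEAD_REF;
if (!prBranch) {
  console.error('GITHUB_HEAD_REF is not set; run this job on a pull request event.');
  process.exit(1);
}

const result = await runBenchmarkComparisonAgent({
  repository: process.cwd(),
  branches: [mainBranch, prBranch],
  model: 'claude-sonnet-4-5',
  authConfig: {
    anthropicAPIKey: process.env.ANTHROPIC_API_KEY
  }
});

// Fail the job if recall is below 100%, which this example treats as
// the PR introducing new vulnerabilities
if (result.comparison && result.comparison.recall < 1.0) {
  console.error('PR introduces new vulnerabilities!');
  console.error(`See ${result.resultsPath} for details.`);
  process.exit(1);
}
```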
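The exit condition in the CI example can be pulled into a small, testable function. This sketch makes two assumptions that differ from the example: a `null` comparison fails the job (the example passes in that case), and the recall threshold is a parameter rather than a fixed 1.0. `gateExitCode` and `ComparisonMetrics` are hypothetical names.

```typescript
// Hypothetical subset of BenchmarkComparison used by this sketch.
interface ComparisonMetrics { recall: number }

// Decides the CI exit code from a comparison result.
// Assumptions: a missing comparison (null) fails the job, and the
// minimum acceptable recall defaults to 1.0 but can be lowered.
function gateExitCode(
  comparison: ComparisonMetrics | null,
  minRecall = 1.0,
): 0 | 1 {
  if (comparison === null) return 1; // no result: fail closed
  return comparison.recall < minRecall ? 1 : 0;
}

console.log(gateExitCode({ recall: 1.0 }));       // 0
console.log(gateExitCode({ recall: 0.8 }));       // 1
console.log(gateExitCode({ recall: 0.8 }, 0.75)); // 0
console.log(gateExitCode(null));                  // 1
```

The result could then feed `process.exit(gateExitCode(result.comparison))`. Failing closed on `null` keeps a broken run from passing silently, at the cost of blocking the pull request until the run is fixed.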
Use Cases
- Release Validation: Verify that security fixes reduce vulnerabilities.
- Code Review: Assess the security impact of pull requests.
- Regression Testing: Ensure there are no security regressions.
- Trend Analysis: Track security posture over time.
Related
- Benchmark Command: CLI interface for benchmarking
- Whitebox Testing: Source code analysis guide
- CI/CD Integration: Automate benchmarks in pipelines
- Findings: Understanding vulnerability findings