
Benchmark Overview

We benchmarked bdg (CLI) against Chrome DevTools MCP Server on five real developer debugging workflows to answer: which interface paradigm serves AI agents better?

  • CLI Score: 77/100 points
  • MCP Score: 60/100 points
  • Token Efficiency: +33% better for CLI (202.1 vs 152.3 TES)
  • Time Comparison: MCP finished in 27% less time (323s vs 441s)

Test Suite

| Test | Difficulty | Task | Time Limit |
| --- | --- | --- | --- |
| Basic Error | ⭐ Easy | Find and diagnose one JS error | 2 min |
| Multiple Errors | ⭐⭐ Moderate | Capture and categorize 5+ errors | 3 min |
| SPA Debugging | ⭐⭐⭐ Advanced | Debug React app, correlate console/network | 5 min |
| Form Validation | ⭐⭐⭐⭐ Expert | Test validation logic, find bugs | 5 min |
| Memory Leak | ⭐⭐⭐⭐⭐ Master | Detect and quantify DOM memory leak | 8 min |

Methodology

  • Same URLs and time limits for both tools
  • Alternating test order to prevent learning bias
  • Metrics: task score, completion time, tokens consumed
  • Token Efficiency Score (TES) = (Score × 100) / (Tokens / 1000)
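
The alternating order in the methodology can be pictured with a small scheduling sketch. This is only an illustration: the source does not say which tool ran first on which test, so the starting tool and the helper name `alternating_schedule` are assumptions.

```python
# Hypothetical sketch: flip which tool runs first on each successive test,
# so neither tool always benefits from the agent having seen the page already.
TESTS = [
    "Basic Error",
    "Multiple Errors",
    "SPA Debugging",
    "Form Validation",
    "Memory Leak",
]
TOOLS = ("bdg", "MCP")  # assumed starting order; not stated in the benchmark


def alternating_schedule(tests, tools=TOOLS):
    """Return (test, first_tool, second_tool) tuples, flipping order each test."""
    schedule = []
    for index, test in enumerate(tests):
        first, second = tools if index % 2 == 0 else tools[::-1]
        schedule.append((test, first, second))
    return schedule


for test, first, second in alternating_schedule(TESTS):
    print(f"{test}: {first} first, then {second}")
```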

Overall Results

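| Metric | bdg (CLI) | MCP | Difference (bdg vs MCP) |
| --- | --- | --- | --- |
| Total Score | 77/100 | 60/100 | +28% |
| Total Time | 441s | 323s | 37% slower |
| Total Tokens | ~38.1K | ~39.4K | 3% fewer |
| Token Efficiency | 202.1 | 152.3 | +33% |

Winner: CLI - a higher score with similar token usage means better token efficiency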
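
The percentages in the overall results depend on which tool is taken as the baseline, which is why the same timings read as "37% slower" for bdg and "27% less time" for MCP. A minimal sketch of that arithmetic, using only the totals reported above (the helper name `pct_diff` is illustrative):

```python
def pct_diff(value, baseline):
    """Percentage difference of value relative to baseline."""
    return (value - baseline) / baseline * 100


# Totals as reported in the overall results.
bdg = {"score": 77, "time_s": 441, "tokens": 38_100}
mcp = {"score": 60, "time_s": 323, "tokens": 39_400}

score_gain = pct_diff(bdg["score"], mcp["score"])        # bdg score, MCP as baseline
bdg_slower = pct_diff(bdg["time_s"], mcp["time_s"])      # bdg time, MCP as baseline
mcp_less_time = -pct_diff(mcp["time_s"], bdg["time_s"])  # MCP time, bdg as baseline
token_delta = pct_diff(bdg["tokens"], mcp["tokens"])     # bdg tokens, MCP as baseline

print(f"Score:  {score_gain:+.0f}%")
print(f"Time:   bdg {bdg_slower:.0f}% slower, MCP {mcp_less_time:.0f}% less time")
print(f"Tokens: {token_delta:+.0f}%")
```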

Test-by-Test Analysis

Test 1: Basic Error Detection (⭐ Easy)

| Metric | bdg | MCP |
| --- | --- | --- |
| Score | 18/20 | 14/20 |
| Time | 69s | 46s |
| Tokens | ~3.6K | ~4.8K |
| Penalty | -2 pts | 0 |

Takeaway: bdg’s structured JSON with full stack traces saves developer investigation time

Test 2: Multiple Error Collection (⭐⭐ Moderate)

| Metric | bdg | MCP |
| --- | --- | --- |
| Score | 18/20 | 12/20 |
| Time | 75s | 48s |
| Tokens | ~18.7K | ~9.3K |
| Errors Found | 18 (14 unique) | 3 |

MCP doesn’t expose arbitrary JavaScript execution - each interaction requires a separate tool call

Test 3: SPA Debugging (⭐⭐⭐ Advanced)

| Metric | bdg | MCP |
| --- | --- | --- |
| Score | 14/20 | 13/20 |
| Time | 100s | 57s |
| Tokens | ~4.7K | ~6.6K |

Takeaway: When an application has no bugs to find, tools perform similarly. bdg’s advantage comes from deeper analysis capabilities.

Test 4: Form Validation Testing (⭐⭐⭐⭐ Expert)

| Metric | bdg | MCP |
| --- | --- | --- |
| Score | 15/20 | 13/20 |
| Time | 93s | 102s |
| Tokens | ~3.5K | ~15.2K |
| Penalty | 0 | -2 pts (timeout) |

MCP exceeded time limit by 42s due to verbose accessibility tree dumps on every interaction

Test 5: Memory Leak Detection (⭐⭐⭐⭐⭐ Master)

| Metric | bdg | MCP |
| --- | --- | --- |
| Score | 12/20 | 8/20 |
| Time | 104s | 70s |
| Tokens | ~7.6K | ~3.5K |

Critical advantage: Memory debugging requires CDP access - bdg’s core strength, MCP’s blind spot

Capability Comparison

Coverage Matrix

| CDP Domain | bdg Access | MCP Access |
| --- | --- | --- |
| Runtime | Full (21 methods) | evaluate_script only |
| DOM | Full (39 methods) | Via accessibility tree |
| Network | Full + HAR export | list_network_requests |
| Console | Full + streaming | list_console_messages |
| HeapProfiler | Full (17 methods) | None |
| Debugger | Full (35 methods) | None |
| Performance | Full (22 methods) | None |
| Accessibility | Selective queries | Full tree dumps |

Feature Comparison

  • Comprehensive error data - Full stack traces, line numbers, function names
  • Efficient batch operations - JavaScript eval for clicking multiple elements
  • Advanced CDP access - Direct access to HeapProfiler, Runtime, all 53 domains
  • Structured output - JSON format with detailed error categorization
  • Network analysis - HAR export capability for detailed request inspection
  • Memory profiling - Native heap measurement via CDP
  • Unix composability - Pipes naturally with jq, grep, etc.
  • Selective queries - Fetch only what’s needed, control token usage

Token Efficiency Analysis

Test-by-Test Token Usage

| Test | bdg Tokens | MCP Tokens | Ratio | bdg Score | MCP Score |
| --- | --- | --- | --- | --- | --- |
| Test 1: Basic Error | 3,600 | 4,800 | 0.75× | 18/20 | 14/20 |
| Test 2: Multiple Errors | 18,700 | 9,300 | 2.0× | 18/20 | 12/20 |
| Test 3: SPA Debugging | 4,700 | 6,600 | 0.71× | 14/20 | 13/20 |
| Test 4: Form Validation | 3,500 | 15,200 | 0.23× | 15/20 | 13/20 |
| Test 5: Memory Leak | 7,600 | 3,500 | 2.17× | 12/20 | 8/20 |
| Total | 38,100 | 39,400 | 0.97× | 77/100 | 60/100 |

Similar total tokens, but bdg achieved a 28% higher score - tokens spent more effectively
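
As a consistency check, the per-test rows can be summed and the Ratio column recomputed as bdg tokens divided by MCP tokens. The sketch below uses only the figures from the table above; the names `RESULTS` and `token_ratio` are illustrative.

```python
# (test, bdg_tokens, mcp_tokens, bdg_score, mcp_score), copied from the table.
RESULTS = [
    ("Basic Error", 3_600, 4_800, 18, 14),
    ("Multiple Errors", 18_700, 9_300, 18, 12),
    ("SPA Debugging", 4_700, 6_600, 14, 13),
    ("Form Validation", 3_500, 15_200, 15, 13),
    ("Memory Leak", 7_600, 3_500, 12, 8),
]


def token_ratio(bdg_tokens, mcp_tokens):
    """Ratio below 1.0 means bdg used fewer tokens than MCP on that test."""
    return bdg_tokens / mcp_tokens


for test, bdg_tokens, mcp_tokens, _, _ in RESULTS:
    print(f"{test}: {token_ratio(bdg_tokens, mcp_tokens):.2f}x")

# Column-wise sums: bdg tokens, MCP tokens, bdg score, MCP score.
totals = [sum(column) for column in zip(*(row[1:] for row in RESULTS))]
print("Totals:", totals)
print(f"Overall ratio: {token_ratio(totals[0], totals[1]):.2f}x")
```

The ratios show that bdg was not cheaper on every test: it used roughly twice the tokens on Tests 2 and 5, and the totals only even out because of the large MCP cost on Test 4.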

Token Efficiency Score (TES)

TES = (Score × 100) / (Tokens / 1000)

bdg: (77 × 100) / 38.1 = 202.1
MCP: (60 × 100) / 39.4 = 152.3

Advantage: +33% for CLI
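
The TES formula above translates directly into a few lines of code. This is a minimal sketch that reproduces the reported values from the totals; the function name `token_efficiency_score` is an illustrative choice, not part of either tool.

```python
def token_efficiency_score(score, tokens):
    """TES = (Score x 100) / (Tokens / 1000), as defined in the methodology."""
    return (score * 100) / (tokens / 1000)


bdg_tes = token_efficiency_score(score=77, tokens=38_100)
mcp_tes = token_efficiency_score(score=60, tokens=39_400)
advantage_pct = (bdg_tes / mcp_tes - 1) * 100

print(f"bdg TES: {bdg_tes:.1f}")              # 202.1
print(f"MCP TES: {mcp_tes:.1f}")              # 152.3
print(f"Advantage: +{advantage_pct:.0f}%")    # +33%
```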

Real-World Token Examples

Amazon product page:
  • MCP: 52,000 tokens (single snapshot, truncated at system limit)
  • bdg: 1,200 tokens (two targeted queries)
  • 43× more efficient
Form with 195-option dropdown:
  • MCP: ~5,000 tokens per interaction (full accessibility tree)
  • bdg: ~50 tokens per interaction (targeted selector)
  • 100× more efficient
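
To see how the per-interaction gap in the dropdown example compounds over a session, the sketch below multiplies the reported per-interaction costs by a few interaction counts. The counts themselves (1, 10, 20) are hypothetical and chosen only for illustration.

```python
# Per-interaction costs reported for the 195-option dropdown form.
MCP_TOKENS_PER_INTERACTION = 5_000
BDG_TOKENS_PER_INTERACTION = 50

# Snapshot costs reported for the Amazon product page.
AMAZON_MCP_TOKENS = 52_000
AMAZON_BDG_TOKENS = 1_200


def cumulative_tokens(per_interaction, interactions):
    """Total tokens if every interaction costs the same amount."""
    return per_interaction * interactions


print(f"Amazon page ratio: {AMAZON_MCP_TOKENS / AMAZON_BDG_TOKENS:.0f}x")

for interactions in (1, 10, 20):  # hypothetical session lengths
    mcp_total = cumulative_tokens(MCP_TOKENS_PER_INTERACTION, interactions)
    bdg_total = cumulative_tokens(BDG_TOKENS_PER_INTERACTION, interactions)
    print(f"{interactions:>2} interactions: MCP {mcp_total:,} vs bdg {bdg_total:,} tokens")
```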

When to Use Each Tool

  • Complex Debugging: Detailed error analysis with full stack traces
  • Memory Profiling: Heap analysis and memory leak detection
  • Automated Testing: Structured JSON output for test automation
  • Network Analysis: HAR export for detailed request inspection
  • Batch Operations: JavaScript evaluation for efficient testing
  • Token Efficiency: Context-constrained agents, cost optimization

Key Findings

CLI’s selective queries (fetch what you need), compared with MCP’s bulk dumps (dump everything), yielded 33% better token efficiency while achieving higher scores. Example: a 195-option dropdown consumed ~50 tokens per interaction with bdg vs ~5,000 tokens with MCP.
Memory profiling, HAR export, and batch JavaScript execution are core debugging workflows, not edge features. bdg’s access to 300+ CDP methods, versus MCP’s limited tool set, was decisive.
bdg’s output pipes naturally with jq, grep, sort - enabling transformations and filtering without additional implementation. MCP responses require internal parsing.
Agents can estimate bdg token cost before calling (proportional to results, bounded by limits). MCP’s snapshots are unpredictable (5K-52K tokens).
MCP finished in 27% less time (323s vs 441s) but provided less actionable information. bdg’s extra time bought comprehensive data that saves developer investigation time.
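
The finding that bdg's token cost is "proportional to results, bounded by limits" suggests an agent could budget a call before making it. The sketch below shows one way that estimate might look. Everything in it is hypothetical: `estimate_tokens`, `tokens_per_result`, `limit`, and `overhead` are illustrative names and values, not bdg options or measured figures.

```python
def estimate_tokens(result_count, tokens_per_result, limit, overhead=0):
    """Estimate response size: cost grows with results but is capped by the limit.

    All parameters are hypothetical inputs an agent would have to supply itself.
    """
    returned = min(result_count, limit)
    return overhead + returned * tokens_per_result


def fits_budget(estimate, remaining_budget):
    """True if the estimated response fits in the agent's remaining context."""
    return estimate <= remaining_budget


# Hypothetical query: 500 matching items, capped at 50, ~40 tokens each.
estimate = estimate_tokens(result_count=500, tokens_per_result=40, limit=50, overhead=100)
print(estimate, fits_budget(estimate, remaining_budget=4_000))
```

A full-page snapshot offers no such cap: by the figures above its size ranges from about 5K to 52K tokens depending on the page, so the same check can only be made after the tokens are already spent.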

Conclusion

For developer debugging workflows, CLI provides more capability with better efficiency:
  • +28% higher score (77 vs 60)
  • +33% better token efficiency (202.1 vs 152.3 TES)
  • Capabilities MCP lacks: Memory profiling, HAR export, batch operations
  • Token efficiency wins: Form testing (4.3× better), selective queries (43× better)
The margin wasn’t close. CLI completed tasks that MCP structurally couldn’t (memory profiling), and did shared tasks with less overhead (selective queries vs full dumps).
This doesn’t mean MCP is “bad” - it optimizes for different constraints (ecosystem integration, sandboxing). For power-user debugging, CLI is superior.

Full Benchmark Documentation

Benchmark Article

Complete MCP vs CLI analysis with detailed explanations

Benchmark Results

Raw benchmark data and test-by-test breakdowns

Next Steps

Overview

Why bdg is built for AI agents

Discovery Pattern

How agents discover capabilities

Error Handling

Semantic exit codes and recovery
