Overview
The BuilderQuery class provides the interface for analyzing agent behavior across runs. It is designed around the questions Builder needs to answer when improving agents:
- What happened? (summaries, narratives)
- Why did it fail? (failure analysis, decision traces)
- What patterns emerge? (across runs, across nodes)
- What should we change? (suggestions)
Class: BuilderQuery
Constructor
Argument: the path to the storage directory containing run data.
What Happened?
get_run_summary()
Get a quick summary of a run.
Argument: the run ID to retrieve.
Returns: Summary object with status, decision_count, success_rate, etc., or None if not found.
get_full_run()
Get the complete run with all decisions.
Argument: the run ID to retrieve.
Returns: the complete Run object, or None if not found.
list_runs_for_goal()
Get summaries of all runs for a specific goal.
Argument: the goal ID to query.
Returns: list of run summaries for this goal.
get_recent_failures()
Get recent failed runs.
Argument: maximum number of failures to return.
Returns: list of failed run summaries, most recent first.
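The queries above can be sketched with a small in-memory stand-in. The real BuilderQuery reads run data from a storage directory; this stub (with an invented `InMemoryBuilderQuery` class, invented field names, and invented sample runs) only illustrates the query surface described in this section, not the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class RunSummary:
    # Field names here are illustrative assumptions based on the docs above.
    run_id: str
    goal_id: str
    status: str            # e.g. "success" or "failed"
    decision_count: int
    success_rate: float

class InMemoryBuilderQuery:
    """Illustrative stand-in for BuilderQuery over an in-memory run list."""

    def __init__(self, runs):
        self._runs = runs  # assumed ordered oldest to newest

    def get_run_summary(self, run_id):
        # Returns the summary, or None if the run ID is unknown.
        return next((r for r in self._runs if r.run_id == run_id), None)

    def list_runs_for_goal(self, goal_id):
        return [r for r in self._runs if r.goal_id == goal_id]

    def get_recent_failures(self, limit=5):
        failed = [r for r in self._runs if r.status == "failed"]
        return list(reversed(failed))[:limit]  # most recent first

runs = [
    RunSummary("run-1", "goal-a", "success", 12, 1.0),
    RunSummary("run-2", "goal-a", "failed", 7, 0.43),
    RunSummary("run-3", "goal-b", "failed", 3, 0.0),
]
bq = InMemoryBuilderQuery(runs)
print(bq.get_run_summary("run-2").status)                    # failed
print([r.run_id for r in bq.get_recent_failures(limit=1)])   # ['run-3']
```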
Why Did It Fail?
analyze_failure()
Deep analysis of why a run failed.
Argument: the failed run ID to analyze.
Returns: FailureAnalysis object with:
- run_id (str): The run ID
- failure_point (str): Where it failed
- root_cause (str): Why it failed
- decision_chain (list[str]): Decisions leading to failure
- problems (list[str]): Reported problems
- suggestions (list[str]): Improvement suggestions
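A common use of these fields is rendering a human-readable failure report. The sketch below assumes the analysis is accessible as a dict in the documented shape; the sample values are invented for illustration.

```python
# Invented sample data in the documented FailureAnalysis shape.
analysis = {
    "run_id": "run-42",
    "failure_point": "node:web_search",
    "root_cause": "tool returned an empty result set",
    "decision_chain": ["plan", "web_search", "retry"],
    "problems": ["empty tool output"],
    "suggestions": ["add a fallback search provider"],
}

def format_failure(a: dict) -> str:
    """Render a FailureAnalysis-shaped dict as a short readable report."""
    lines = [
        f"Run {a['run_id']} failed at {a['failure_point']}: {a['root_cause']}",
        "Decisions: " + " -> ".join(a["decision_chain"]),
    ]
    lines += [f"Suggestion: {s}" for s in a["suggestions"]]
    return "\n".join(lines)

print(format_failure(analysis))
```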
get_decision_trace()
Get a readable trace of all decisions in a run.
Argument: the run ID to trace.
Returns: list of human-readable decision summaries.
What Patterns Emerge?
find_patterns()
Find patterns across runs for a goal.
Argument: the goal ID to analyze.
Returns: PatternAnalysis object with:
- goal_id (str): The goal ID
- run_count (int): Number of runs analyzed
- success_rate (float): Overall success rate
- common_failures (list[tuple[str, int]]): Most common errors and their counts
- problematic_nodes (list[tuple[str, float]]): Nodes with high failure rates
- decision_patterns (dict): Common decision patterns
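One way to act on these fields is to flag the nodes whose failure rate exceeds some threshold. The sketch assumes the analysis is accessible as a dict in the documented shape; the sample values and threshold are invented.

```python
# Invented sample data in the documented PatternAnalysis shape.
pattern = {
    "goal_id": "goal-a",
    "run_count": 20,
    "success_rate": 0.65,
    "common_failures": [("timeout", 4), ("empty result", 2)],
    "problematic_nodes": [("summarize", 0.10), ("web_search", 0.40)],
}

def nodes_above(pattern: dict, threshold: float) -> list[str]:
    """Node IDs whose failure rate exceeds the threshold, worst first."""
    hits = [(n, r) for n, r in pattern["problematic_nodes"] if r > threshold]
    return [n for n, r in sorted(hits, key=lambda x: -x[1])]

print(nodes_above(pattern, 0.25))  # ['web_search']
```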
compare_runs()
Compare two runs to understand what differed.
Arguments: the first run ID and the second run ID.
Returns: comparison with:
- run_1: First run metrics
- run_2: Second run metrics
- differences: List of key differences
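A `differences` list like the one above can be derived by diffing the two metric sets key by key. This is a sketch, not the actual comparison logic, and the metric keys below are invented examples.

```python
def diff_metrics(run_1: dict, run_2: dict) -> list[str]:
    """List the keys whose values differ between two run-metric dicts."""
    diffs = []
    for key in sorted(set(run_1) | set(run_2)):
        a, b = run_1.get(key), run_2.get(key)
        if a != b:
            diffs.append(f"{key}: {a} vs {b}")
    return diffs

# Invented sample metrics for two runs.
m1 = {"status": "success", "decision_count": 12, "total_tokens": 8000}
m2 = {"status": "failed", "decision_count": 7, "total_tokens": 8000}
print(diff_metrics(m1, m2))  # ['decision_count: 12 vs 7', 'status: success vs failed']
```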
get_node_performance()
Get performance metrics for a specific node across all runs.
Argument: the node ID to analyze.
Returns: performance metrics with:
- node_id (str): The node ID
- total_decisions (int): Total decisions made
- success_rate (float): Success rate (0-1)
- avg_latency_ms (int): Average latency in milliseconds
- total_tokens (int): Total tokens used
- decision_type_distribution (dict): Breakdown by decision type
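The metrics above are aggregates over individual decisions, and could be derived roughly as follows. The per-decision record fields (`ok`, `latency_ms`, `tokens`, `type`) are invented for illustration; only the output field names come from the docs.

```python
from collections import Counter

# Invented raw decision records for one node.
decisions = [
    {"node_id": "web_search", "ok": True,  "latency_ms": 120, "tokens": 300, "type": "tool_call"},
    {"node_id": "web_search", "ok": False, "latency_ms": 480, "tokens": 500, "type": "tool_call"},
    {"node_id": "web_search", "ok": True,  "latency_ms": 240, "tokens": 400, "type": "reflection"},
]

def node_performance(node_id: str, records: list[dict]) -> dict:
    """Aggregate raw decision records into the documented metric shape."""
    mine = [d for d in records if d["node_id"] == node_id]
    return {
        "node_id": node_id,
        "total_decisions": len(mine),
        "success_rate": sum(d["ok"] for d in mine) / len(mine),
        "avg_latency_ms": int(sum(d["latency_ms"] for d in mine) / len(mine)),
        "total_tokens": sum(d["tokens"] for d in mine),
        "decision_type_distribution": dict(Counter(d["type"] for d in mine)),
    }

perf = node_performance("web_search", decisions)
print(perf["total_decisions"], perf["avg_latency_ms"])  # 3 280
```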
What Should We Change?
suggest_improvements()
Generate improvement suggestions based on run analysis.
Argument: the goal ID to analyze.
Returns: list of improvement suggestions, each with:
- type (str): Type of suggestion (node_improvement, error_handling, architecture)
- target (str): What to improve (node_id, error message, goal_id)
- reason (str): Why this needs improvement
- recommendation (str): What to do
- priority (str): high, medium, or low
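Since each suggestion carries a high/medium/low priority, a natural consumer-side step is triaging them highest-priority first. A minimal sketch, with invented sample suggestions in the documented shape:

```python
# Map the documented priority strings to a sortable rank.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

# Invented sample suggestions in the documented shape.
suggestions = [
    {"type": "error_handling", "target": "timeout", "priority": "medium",
     "reason": "frequent timeouts", "recommendation": "add retries"},
    {"type": "node_improvement", "target": "web_search", "priority": "high",
     "reason": "40% failure rate", "recommendation": "tighten the prompt"},
]

def triage(items: list[dict]) -> list[dict]:
    """Sort suggestions highest priority first (stable within a priority)."""
    return sorted(items, key=lambda s: PRIORITY_ORDER[s["priority"]])

print([s["target"] for s in triage(suggestions)])  # ['web_search', 'timeout']
```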