Overview
The `sam eval` command runs evaluation test suites to measure and validate the performance of your AI agents. This is useful for regression testing, quality assurance, and continuous improvement of agent responses.
Syntax
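A plausible invocation form, reconstructed from the argument and options described below (the exact placeholder name for the path argument is an assumption, not taken from the CLI itself):

```shell
sam eval [OPTIONS] PATH
```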
Arguments
Path to the evaluation test suite configuration file (YAML format). Example: `path/to/evaluation_suite.yaml`

Options
`--verbose`: Enable verbose output to see detailed evaluation progress and results.
`--help`: Show the help message and exit.
Description
The evaluation command:

- Loads the test suite configuration from the specified YAML file
- Runs each test case against your agent mesh
- Compares actual responses against expected outputs
- Generates evaluation metrics and reports
The command uses `configs/logging_config.yaml` if it exists in your project root.
Test Suite Configuration
An evaluation test suite YAML file defines test cases with expected inputs and outputs. Example structure (`test_suite.yaml`):
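A hypothetical sketch of what such a file might contain. The field names here are illustrative assumptions only, not the actual schema; consult `evaluation/run.py` for the real specification:

```yaml
# Illustrative only: all field names below are assumptions, not the actual schema.
name: basic_agent_checks
test_cases:
  - id: greeting
    input: "Hello, what can you do?"
    expected_output_contains:
      - "help"
  - id: capital_lookup
    input: "What is the capital of France?"
    expected_output_contains:
      - "Paris"
```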
The exact schema for test suite configuration files is defined in the evaluation module. Refer to `evaluation/run.py` for the complete specification.

Examples
Run basic evaluation
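Using the example suite path from the Arguments section above:

```shell
sam eval path/to/evaluation_suite.yaml
```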
Run with verbose output
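The same invocation with the `--verbose` flag enabled:

```shell
sam eval path/to/evaluation_suite.yaml --verbose
```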
Verbose output includes:

- Each test case execution
- Agent responses
- Scoring metrics
- Performance timing
Organize evaluation suites
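One possible layout, with hypothetical directory and file names, keeping suites in version control next to the agent configurations they exercise:

```
evals/
  smoke/
    basic_suite.yaml
  regression/
    tool_usage_suite.yaml
configs/
  agents/
    my_agent.yaml
```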
Evaluation Metrics
The evaluation framework can measure:

- Response accuracy: Keyword matching, semantic similarity
- Response time: Latency and throughput
- Tool usage: Correct tool selection and execution
- Error handling: Graceful degradation
- Consistency: Similar inputs producing similar outputs
Specific metrics available depend on your test suite configuration and the evaluation framework setup.
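As an illustration of the keyword-matching style of accuracy scoring listed above, here is a minimal sketch. This helper is not part of the framework; the actual scorer may weight, stem, or embed keywords differently:

```python
def keyword_accuracy(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords found in the response (case-insensitive).

    Illustrative sketch only; the framework's real scorer may differ.
    """
    if not expected_keywords:
        return 1.0  # nothing required, trivially satisfied
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)

# Two of the three expected keywords appear in the response.
score = keyword_accuracy("Paris is the capital of France.", ["Paris", "France", "Europe"])
print(score)
```

Semantic-similarity scoring would replace the substring check with an embedding comparison, but the aggregation pattern stays the same.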
Implementation
The command delegates to `evaluation.run.main()`, which orchestrates:
- Test suite loading and validation
- Agent mesh interaction
- Response evaluation
- Results aggregation and reporting
Source files: `cli/commands/eval_cmd.py`, `evaluation/run.py`
Troubleshooting
Configuration file not found
Error: File path does not exist

Solution: Verify that the path to your test suite YAML file is correct. Use absolute paths or paths relative to your current working directory.
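A quick way to confirm the file is where you think it is before running the command (the path below is the illustrative example from this page; substitute your own):

```shell
# Prints a warning if the suite file is missing at the given path.
SUITE="path/to/evaluation_suite.yaml"
if [ ! -f "$SUITE" ]; then
  echo "not found: $SUITE"
fi
```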
Evaluation error
Error: An error occurred during evaluation: <error message>

Solution: Check that:

- Your agent mesh is properly configured
- Required services (broker, LLM) are accessible
- Test suite YAML is valid
- Use the `--verbose` flag for detailed error information
Missing evaluation module
Error: Cannot import from evaluation.run

Solution: The evaluation framework must be properly installed. Check that all required dependencies are installed in your environment.

Best Practices
Version control
Keep test suites in version control alongside your agent configurations to track changes over time.
CI/CD integration
Run evaluations in your CI/CD pipeline to catch regressions before deployment.
Incremental testing
Start with basic test cases and gradually add more complex scenarios as your agents evolve.
Baseline metrics
Establish baseline performance metrics for your agents and monitor for degradation.
See also
- sam run - Run your agent mesh
- sam task - Send individual tasks for testing
- Agent Configuration - Configure agent behavior