## Overview
Trigger testing uses four prompt types to ensure robust skill activation:

| Type | Description | Example |
|---|---|---|
| explicit | Skill named directly with $ prefix | $my-skill do something |
| implicit | Describes the scenario without naming skill | I need to run the test suite |
| contextual | Realistic noisy prompt with domain context | This React app needs better test coverage. Can you help? |
| negative | Should NOT trigger (catches false positives) | How do I install Python? |
Trigger testing requires the Claude CLI (`npm install -g @anthropic-ai/claude-code`) or the Codex CLI. Both runtimes are supported.

## Prerequisites
### Create or generate tests
You need trigger test definitions in `.skill-lab/tests/triggers.yaml`. Either create them manually or use LLM-powered test generation.

## Test Definition Format
Trigger tests are defined in `.skill-lab/tests/triggers.yaml`:
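For illustration, here is a minimal `triggers.yaml` built from the fields documented below. The top-level `tests:` key and the individual prompts are assumptions for the sketch, not a confirmed schema:

```yaml
# Illustrative only: the top-level "tests" key is an assumption
tests:
  - id: explicit-1
    type: explicit
    prompt: "$my-skill do something"
    expected: trigger

  - id: implicit-1
    type: implicit
    prompt: "I need to run the test suite"
    expected: trigger

  - id: negative-1
    name: Unrelated install question
    type: negative
    prompt: "How do I install Python?"
    expected: no_trigger
```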
### Required Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique test identifier (e.g., `explicit-1`) |
| `type` | enum | Trigger type: `explicit`, `implicit`, `contextual`, or `negative` |
| `prompt` | string | The prompt to send to the agent |
| `expected` | enum | Expected outcome: `trigger` or `no_trigger` |
### Optional Fields
| Field | Type | Description |
|---|---|---|
| `name` | string | Human-readable test name (defaults to `id`) |
## Running Trigger Tests

### Basic Usage
- Load test cases from `.skill-lab/tests/triggers.yaml`
- Execute each prompt through the Claude runtime
- Analyze execution traces for skill invocations
- Report pass/fail for each test
### Example Output
### Filter by Trigger Type

Run only tests of a specific type: `explicit`, `implicit`, `contextual`, or `negative`.
### JSON Output

Generate machine-readable JSON output.

## How It Works
### Test execution
For each test case, Skill Lab:

- Launches the Claude CLI with the test prompt
- Records the execution trace to `.skill-lab/traces/{test-id}.jsonl`
- Captures the agent's tool calls and skill invocations
### Trace analysis
The trace analyzer examines the JSONL trace file:
- Scans for skill invocation events
- Identifies which skills were loaded
- Tracks the order of tool calls
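The analysis pass above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not Skill Lab's actual code: the event shape (a `type` field with `skill_invocation`/`tool_call` values, plus `skill` and `tool` fields) is an assumption about the JSONL trace schema.

```python
import json


def analyze_trace(path):
    """Scan a JSONL trace for skill invocations and tool calls.

    Illustrative sketch: the event field names below are assumptions,
    not Skill Lab's real trace schema.
    """
    skills_loaded = []  # which skills were invoked, in order
    tool_calls = []     # order of tool calls
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)  # one JSON object per line
            if event.get("type") == "skill_invocation":
                skills_loaded.append(event.get("skill"))
            elif event.get("type") == "tool_call":
                tool_calls.append(event.get("tool"))
    return skills_loaded, tool_calls
```

A pass/fail decision then reduces to checking whether the expected skill appears in `skills_loaded` (for `trigger` tests) or is absent (for `no_trigger` tests).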
## Best Practices

### Write Diverse Test Cases
Cover all four trigger types for comprehensive validation:

- Explicit (3+ tests): Direct invocations with variations
- Implicit (3+ tests): Scenario descriptions without naming the skill
- Contextual (3+ tests): Realistic prompts with noise and domain context
- Negative (4+ tests): Adjacent tasks that should NOT trigger
### Test Edge Cases

Include negative tests for:

- Similar domains but different tasks
- Ambiguous prompts that could match multiple skills
- Common questions unrelated to your skill’s purpose
### Example: Testing Boundaries
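As a sketch, negative tests for a hypothetical test-runner skill might probe nearby but out-of-scope tasks. The skill's scope and the prompts here are invented for illustration:

```yaml
# Hypothetical boundary tests for a skill that runs test suites
- id: negative-1
  type: negative
  prompt: "How do I install pytest?"   # setup question, not running tests
  expected: no_trigger

- id: negative-2
  type: negative
  prompt: "What's the difference between unit and integration tests?"
  expected: no_trigger

- id: negative-3
  type: negative
  prompt: "Write a test for this function"   # authoring tests, not running them
  expected: no_trigger
```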
### Optimize Test Runtime

Skill Lab automatically stops execution when the expected skill is triggered (for positive tests). This reduces runtime and API costs.

## Interpreting Results
### Pass Rates by Type
| Pass Rate | Quality Level | Action |
|---|---|---|
| 100% | Excellent | Ready for production |
| 80-99% | Good | Review failed tests, improve activation logic |
| 60-79% | Fair | Significant false positives/negatives |
| < 60% | Poor | Skill trigger logic needs redesign |
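The per-type pass rates in the table can be computed from individual test results. A minimal sketch, assuming results are available as `(type, passed)` pairs rather than any specific Skill Lab output format:

```python
from collections import defaultdict


def pass_rates(results):
    """results: iterable of (trigger_type, passed) pairs.

    Returns the pass rate per trigger type as a percentage.
    (Illustrative helper; the input shape is an assumption.)
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for trigger_type, passed in results:
        totals[trigger_type] += 1
        if passed:
            passes[trigger_type] += 1
    return {t: 100.0 * passes[t] / totals[t] for t in totals}


rates = pass_rates([
    ("explicit", True), ("explicit", True),
    ("negative", True), ("negative", False),
])
print(rates)  # {'explicit': 100.0, 'negative': 50.0}
```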
### Common Failure Patterns
**Implicit tests failing**: the skill description may be too narrow or unclear.

- Update the skill description to cover implicit scenarios
- Add more examples to SKILL.md

**Negative tests failing**: the skill triggers on adjacent tasks it should ignore (false positives).

- Refine the skill description to be more specific
- Add explicit prerequisites or constraints

**Contextual tests failing**: the skill misses realistic prompts with domain noise.

- Add domain keywords to the skill description
- Include contextual examples in SKILL.md
## CI/CD Integration
## Next Steps

- **Test Generation**: Auto-generate trigger tests using LLMs instead of writing them manually
- **Static Analysis**: Validate skill structure and content quality