## Overview
Trigger testing uses four prompt types to ensure robust skill activation:

| Type | Description | Example |
|---|---|---|
| explicit | Skill named directly with $ prefix | $my-skill do something |
| implicit | Describes the scenario without naming skill | I need to run the test suite |
| contextual | Realistic noisy prompt with domain context | This React app needs better test coverage. Can you help? |
| negative | Should NOT trigger (catches false positives) | How do I install Python? |
Trigger testing requires the Claude CLI (`npm install -g @anthropic-ai/claude-code`) or the Codex CLI. Both runtimes are supported.

## Prerequisites
### Create or generate tests
You need trigger test definitions in `.skill-lab/tests/triggers.yaml`. Either create them manually or use LLM-powered test generation.

## Test Definition Format
Trigger tests are defined in `.skill-lab/tests/triggers.yaml`:
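For illustration, here is a minimal `triggers.yaml` built from the fields documented below. The top-level `tests:` key and the individual prompts are assumptions for the sketch, not a confirmed schema:

```yaml
# Illustrative only: the top-level "tests" key is an assumption
tests:
  - id: explicit-1
    type: explicit
    prompt: "$my-skill do something"
    expected: trigger

  - id: implicit-1
    type: implicit
    prompt: "I need to run the test suite"
    expected: trigger

  - id: negative-1
    name: Unrelated install question
    type: negative
    prompt: "How do I install Python?"
    expected: no_trigger
```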
### Required Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique test identifier (e.g., `explicit-1`) |
| `type` | enum | Trigger type: `explicit`, `implicit`, `contextual`, or `negative` |
| `prompt` | string | The prompt to send to the agent |
| `expected` | enum | Expected outcome: `trigger` or `no_trigger` |
### Optional Fields
| Field | Type | Description |
|---|---|---|
| `name` | string | Human-readable test name (defaults to `id`) |
## Running Trigger Tests

### Basic Usage
- Load test cases from `.skill-lab/tests/triggers.yaml`
- Execute each prompt through the Claude runtime
- Analyze execution traces for skill invocations
- Report pass/fail for each test
### Example Output
### Filter by Trigger Type

Run only tests of a specific type: `explicit`, `implicit`, `contextual`, or `negative`.
### JSON Output

Generate machine-readable JSON output.

## How It Works
### Test execution
For each test case, Skill Lab:

- Launches the Claude CLI with the test prompt
- Records the execution trace to `.skill-lab/traces/{test-id}.jsonl`
- Captures the agent's tool calls and skill invocations
### Trace analysis
The trace analyzer examines the JSONL trace file:
- Scans for skill invocation events
- Identifies which skills were loaded
- Tracks the order of tool calls
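The analysis pass above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not Skill Lab's actual code: the event shape (a `type` field with `skill_invocation`/`tool_call` values, plus `skill` and `tool` fields) is an assumption about the JSONL trace schema.

```python
import json


def analyze_trace(path):
    """Scan a JSONL trace for skill invocations and tool calls.

    Illustrative sketch: the event field names below are assumptions,
    not Skill Lab's real trace schema.
    """
    skills_loaded = []  # which skills were invoked, in order
    tool_calls = []     # order of tool calls
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)  # one JSON object per line
            if event.get("type") == "skill_invocation":
                skills_loaded.append(event.get("skill"))
            elif event.get("type") == "tool_call":
                tool_calls.append(event.get("tool"))
    return skills_loaded, tool_calls
```

A pass/fail decision then reduces to checking whether the expected skill appears in `skills_loaded` (for `trigger` tests) or is absent (for `no_trigger` tests).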
## Best Practices

### Write Diverse Test Cases
Cover all four trigger types for comprehensive validation:

- Explicit (3+ tests): Direct invocations with variations
- Implicit (3+ tests): Scenario descriptions without naming the skill
- Contextual (3+ tests): Realistic prompts with noise and domain context
- Negative (4+ tests): Adjacent tasks that should NOT trigger
### Test Edge Cases

Include negative tests for:

- Similar domains but different tasks
- Ambiguous prompts that could match multiple skills
- Common questions unrelated to your skill’s purpose
### Example: Testing Boundaries
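As a sketch, negative tests for a hypothetical test-runner skill might probe nearby but out-of-scope tasks. The skill's scope and the prompts here are invented for illustration:

```yaml
# Hypothetical boundary tests for a skill that runs test suites
- id: negative-1
  type: negative
  prompt: "How do I install pytest?"   # setup question, not running tests
  expected: no_trigger

- id: negative-2
  type: negative
  prompt: "What's the difference between unit and integration tests?"
  expected: no_trigger

- id: negative-3
  type: negative
  prompt: "Write a test for this function"   # authoring tests, not running them
  expected: no_trigger
```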
### Optimize Test Runtime

Skill Lab automatically stops execution when the expected skill is triggered (for positive tests). This reduces runtime and API costs.

## Interpreting Results
### Pass Rates by Type
| Pass Rate | Quality Level | Action |
|---|---|---|
| 100% | Excellent | Ready for production |
| 80-99% | Good | Review failed tests, improve activation logic |
| 60-79% | Fair | Significant false positives/negatives |
| < 60% | Poor | Skill trigger logic needs redesign |
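The per-type pass rates in the table can be computed from individual test results. A minimal sketch, assuming results are available as `(type, passed)` pairs rather than any specific Skill Lab output format:

```python
from collections import defaultdict


def pass_rates(results):
    """results: iterable of (trigger_type, passed) pairs.

    Returns the pass rate per trigger type as a percentage.
    (Illustrative helper; the input shape is an assumption.)
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for trigger_type, passed in results:
        totals[trigger_type] += 1
        if passed:
            passes[trigger_type] += 1
    return {t: 100.0 * passes[t] / totals[t] for t in totals}


rates = pass_rates([
    ("explicit", True), ("explicit", True),
    ("negative", True), ("negative", False),
])
print(rates)  # {'explicit': 100.0, 'negative': 50.0}
```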
### Common Failure Patterns
**Implicit tests failing**: the skill description may be too narrow or unclear.

- Update the skill description to cover implicit scenarios
- Add more examples to SKILL.md

**Negative tests failing**: the skill triggers on adjacent tasks it should ignore (false positives).

- Refine the skill description to be more specific
- Add explicit prerequisites or constraints

**Contextual tests failing**: the skill misses realistic prompts with domain noise.

- Add domain keywords to the skill description
- Include contextual examples in SKILL.md
## CI/CD Integration
## Next Steps

- **Test Generation**: Auto-generate trigger tests using LLMs instead of writing them manually
- **Static Analysis**: Validate skill structure and content quality