Skill Lab evaluates your Agent Skills through 28 automated checks across 4 quality dimensions. Static analysis validates skill structure, naming conventions, descriptions, and content quality — providing a quality score and detailed feedback.

Overview

Static checks validate:
  • Structure (7 checks): File existence, YAML frontmatter, scripts/references validation
  • Description (9 checks): Frontmatter fields (name, description, license, compatibility, etc.)
  • Naming (1 check): Directory name matches skill name
  • Content (11 checks): Examples, token budgets, actionable descriptions, asset paths
Each skill receives a 0-100 quality score based on weighted check results.

Basic Usage

1. Evaluate a skill

Run static analysis on a skill directory:
sklab evaluate ./my-skill
Or evaluate the current directory:
sklab evaluate
2. Review the report

The console output shows:
  • Quality score (0-100)
  • Overall pass/fail status
  • Failed checks with severity levels
  • Summary by dimension
╭─────────────── Skill Lab Evaluation ───────────────╮
│ Skill: my-skill                                     │
│ Path: /path/to/my-skill                             │
╰─────────────────────────────────────────────────────╯

Quality Score: 87.5/100
Status: PASS
Checks: 26/28 passed
Duration: 45.3ms

                     Failed Checks
┌────────┬──────────┬────────────────────┬──────────────┐
│ Status │ Severity │ Check              │ Message      │
├────────┼──────────┼────────────────────┼──────────────┤
│ !      │ WARNING  │ content.examples   │ No examples  │
│ i      │ INFO     │ content.line-limit │ 152 lines    │
└────────┴──────────┴────────────────────┴──────────────┘

Command Options

Verbose Mode

Show all checks, including passing ones:
sklab evaluate ./my-skill --verbose
By default, only failed checks are shown. Use --verbose (or -V) to see the full report.

Spec-Only Mode

Run only the 10 spec-required checks, skipping the 18 quality suggestions:
sklab evaluate ./my-skill --spec-only
Use the -s or --spec-only flag when you only care about Agent Skills spec compliance.
Spec-required checks use ERROR severity and must pass for overall_pass: true. Quality suggestions use WARNING or INFO severity.

JSON Output

Generate machine-readable JSON output:
# Print to stdout
sklab evaluate ./my-skill --format json

# Save to file
sklab evaluate ./my-skill --format json --output report.json
{
  "skill_path": "/path/to/my-skill",
  "skill_name": "my-skill",
  "timestamp": "2026-03-03T14:30:00Z",
  "duration_ms": 45.3,
  "quality_score": 87.5,
  "overall_pass": true,
  "checks_run": 28,
  "checks_passed": 26,
  "checks_failed": 2,
  "results": [
    {
      "check_id": "structure.skill-md-exists",
      "check_name": "SKILL.md Exists",
      "passed": true,
      "severity": "error",
      "dimension": "structure",
      "message": "SKILL.md found"
    },
    {
      "check_id": "content.examples",
      "check_name": "Has Examples",
      "passed": false,
      "severity": "warning",
      "dimension": "content",
      "message": "Skill body contains no example blocks"
    }
  ],
  "summary": {
    "by_severity": {
      "error": {"passed": 10, "failed": 0},
      "warning": {"passed": 12, "failed": 1},
      "info": {"passed": 4, "failed": 1}
    },
    "by_dimension": {
      "structure": {"passed": 7, "failed": 0},
      "description": {"passed": 9, "failed": 0},
      "naming": {"passed": 1, "failed": 0},
      "content": {"passed": 9, "failed": 2}
    }
  }
}
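The JSON report is straightforward to consume in scripts. Below is a minimal sketch that gates on the report saved by the command above; the 80-point threshold is an illustrative choice, not a tool default:

# parse_report.py -- sketch for gating a pipeline on the JSON report
# produced by: sklab evaluate ./my-skill --format json --output report.json
# The 80-point threshold is an illustrative choice, not a tool default.
import json
import sys

with open("report.json") as f:
    report = json.load(f)

# Print every failed check with its severity and message
for result in report["results"]:
    if not result["passed"]:
        print(f'{result["severity"].upper():8} {result["check_id"]}: {result["message"]}')

# Fail the pipeline on any ERROR-level failure or a low quality score
if not report["overall_pass"] or report["quality_score"] < 80:
    sys.exit(1)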

Quick Validation

For CI/CD pipelines, use the validate command for a quick pass/fail check:
sklab validate ./my-skill
This command:
  • Returns exit code 0 if all ERROR-level checks pass
  • Returns exit code 1 if any ERROR-level check fails
  • Only prints ERROR-level failures (no quality suggestions)
Example: CI Script
#!/bin/bash
if sklab validate ./my-skill; then
  echo "Skill validation passed"
  exit 0
else
  echo "Skill validation failed"
  exit 1
fi

List Available Checks

View all 28 available checks:
# All checks
sklab list-checks

# Filter by dimension
sklab list-checks --dimension structure
sklab list-checks --dimension content

# Only spec-required checks (10 total)
sklab list-checks --spec-only

# Only quality suggestions (18 total)
sklab list-checks --suggestions-only
Output shows check ID, name, dimension, severity, and whether it’s spec-required:
                        Available Checks
┌────────────────────────┬──────────────┬───────────┬──────────┬──────┐
│ Check ID               │ Name         │ Dimension │ Severity │ Spec │
├────────────────────────┼──────────────┼───────────┼──────────┼──────┤
│ structure.skill-md     │ SKILL.md ... │ structure │ ERROR    │ Yes  │
│ content.examples       │ Has Examples │ content   │ WARNING  │ No   │
└────────────────────────┴──────────────┴───────────┴──────────┴──────┘

Total: 28 checks (10 spec-required, 18 quality suggestions)

Understanding Severity Levels

Severity   Weight   Meaning          Impact
ERROR      1.0      Spec violation   Fails overall_pass, blocks deployment
WARNING    0.5      Quality issue    Lowers quality score
INFO       0.25     Suggestion       Minimal score impact
A skill with any ERROR-level failures will have overall_pass: false and return exit code 1. Fix all ERROR-level issues before deploying skills.

Quality Score Calculation

The quality score (0-100) is calculated using:
  1. Dimension weights: Structure (30%), Description (25%), Content (25%), Naming (20%)
  2. Severity weights: ERROR (1.0), WARNING (0.5), INFO (0.25)
  3. Weighted pass rate per dimension
  4. Final score: Weighted average across dimensions
Quality Score Ranges:
  90-100: Excellent
  80-89:  Good
  60-79:  Fair
  0-59:   Needs improvement
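As a rough illustration of this scheme, the sketch below recomputes a score from the results array of the JSON report shown earlier, using the weights listed above. It mirrors the description here, not Skill Lab's actual implementation:

# score_sketch.py -- illustrative recomputation of the quality score from a
# JSON report's "results" array; not Skill Lab's actual implementation.
DIMENSION_WEIGHTS = {"structure": 0.30, "description": 0.25, "content": 0.25, "naming": 0.20}
SEVERITY_WEIGHTS = {"error": 1.0, "warning": 0.5, "info": 0.25}

def quality_score(results):
    # Weighted pass rate per dimension: earned severity weight / possible severity weight
    per_dimension = {}
    for check in results:
        bucket = per_dimension.setdefault(check["dimension"], [0.0, 0.0])  # [earned, possible]
        weight = SEVERITY_WEIGHTS[check["severity"]]
        bucket[1] += weight
        if check["passed"]:
            bucket[0] += weight

    # Final score: weighted average of dimension pass rates, scaled to 0-100
    score = sum(
        DIMENSION_WEIGHTS.get(dim, 0.0) * (earned / possible if possible else 1.0)
        for dim, (earned, possible) in per_dimension.items()
    )
    total = sum(DIMENSION_WEIGHTS.get(dim, 0.0) for dim in per_dimension)
    return 100.0 * score / total if total else 0.0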

Integration Examples

GitHub Actions

The workflow below installs Skill Lab and runs spec validation on every push and pull request:
name: Validate Skills
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install skill-lab
      - name: Validate skill
        run: sklab validate ./my-skill --spec-only

Next Steps

Trigger Testing

Test if your skill activates correctly with different prompt types

Test Generation

Auto-generate trigger tests using LLMs
