
Creating Skills

Skills are the building blocks of Superpowers. A skill is a reference guide for proven techniques, patterns, or tools that helps future Claude instances find and apply effective approaches.
Creating skills IS Test-Driven Development applied to process documentation. You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).

What is a Skill?

Skills are:
  • Reusable techniques, patterns, tools, reference guides
  • Documentation that helps agents discover and apply proven approaches
  • Living documentation that evolves through testing and refinement
Skills are NOT:
  • Narratives about how you solved a problem once
  • One-off solutions to specific problems
  • Project-specific conventions (those belong in CLAUDE.md)

When to Create a Skill

1. Evaluate if a skill is needed

Create a skill when:
  • The technique wasn’t intuitively obvious to you
  • You’d reference this again across projects
  • The pattern applies broadly (not project-specific)
  • Others would benefit from this knowledge
Don’t create a skill for:
  • One-off solutions
  • Standard practices well-documented elsewhere
  • Project-specific conventions
  • Mechanical constraints (if it’s enforceable with regex/validation, automate it—save documentation for judgment calls)
2. Understand the TDD mapping

Before creating a skill, understand that skill creation follows the RED-GREEN-REFACTOR cycle:
  • Test case → Pressure scenario with subagent
  • Production code → Skill document (SKILL.md)
  • Test fails (RED) → Agent violates the rule without the skill (baseline)
  • Test passes (GREEN) → Agent complies with the skill present
  • Refactor → Close loopholes while maintaining compliance
The Iron Law: NO SKILL WITHOUT A FAILING TEST FIRST. This applies to NEW skills AND to EDITS of existing skills. If you write a skill before testing it, delete it and start over. No exceptions.

Skill Types

Superpowers uses three main types of skills:

Technique

Concrete method with steps to follow (e.g., condition-based-waiting, root-cause-tracing)

Pattern

Way of thinking about problems (e.g., flatten-with-flags, test-invariants)

Reference

API docs, syntax guides, tool documentation (e.g., office docs)

Directory Structure

Skills use a flat namespace for easy discovery:
skills/
  skill-name/
    SKILL.md              # Main reference (required)
    supporting-file.*     # Only if needed
Separate files for:
  1. Heavy reference (100+ lines) - API docs, comprehensive syntax
  2. Reusable tools - Scripts, utilities, templates
Keep inline:
  • Principles and concepts
  • Code patterns (< 50 lines)
  • Everything else

SKILL.md Structure

Every skill follows a consistent structure:

Frontmatter

---
name: skill-name-with-hyphens
description: Use when [specific triggering conditions and symptoms]
---
Requirements:
  • Only two fields: name and description
  • Max 1024 characters total
  • name: Letters, numbers, and hyphens only (no parentheses, special chars)
  • description: Third-person, describes ONLY when to use (NOT what it does)
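These constraints are mechanical, so they can be checked before a skill is ever tested. A minimal sketch in TypeScript, assuming the 1024-character limit covers both fields combined (the function name and return shape are illustrative, not part of any Superpowers API):

```typescript
// Hypothetical frontmatter checks for the rules above.
function validateFrontmatter(name: string, description: string): string[] {
  const errors: string[] = [];
  // name: letters, numbers, and hyphens only
  if (!/^[A-Za-z0-9-]+$/.test(name)) {
    errors.push("name may contain only letters, numbers, and hyphens");
  }
  // max 1024 characters total (interpreted here as both fields combined)
  if (name.length + description.length > 1024) {
    errors.push("frontmatter exceeds 1024 characters");
  }
  // best practice rather than a hard rule: lead with triggering conditions
  if (!description.startsWith("Use when")) {
    errors.push('description should start with "Use when"');
  }
  return errors;
}
```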

Content Sections

# Skill Name

## Overview
What is this? Core principle in 1-2 sentences.

## When to Use
[Small inline flowchart IF decision non-obvious]

Bullet list with SYMPTOMS and use cases
When NOT to use

## Core Pattern (for techniques/patterns)
Before/after code comparison

## Quick Reference
Table or bullets for scanning common operations

## Implementation
Inline code for simple patterns
Link to file for heavy reference or reusable tools

## Common Mistakes
What goes wrong + fixes

## Real-World Impact (optional)
Concrete results

Claude Search Optimization (CSO)

CSO is critical for discovery - future Claude needs to FIND your skill.
Purpose: Claude reads the description to decide which skills to load for a given task.

CRITICAL: Description = When to Use, NOT What the Skill Does

The description should ONLY describe triggering conditions. Do NOT summarize the skill’s process or workflow.

Why this matters: Testing revealed that when a description summarizes the skill’s workflow, Claude may follow the description instead of reading the full skill content.
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
Best practices:
  • Start with “Use when…” to focus on triggering conditions
  • Use concrete triggers, symptoms, and situations
  • Describe the problem, not language-specific symptoms
  • Keep triggers technology-agnostic unless skill is technology-specific
  • Write in third person (injected into system prompt)
  • NEVER summarize the skill’s process or workflow
Use words Claude would search for:
  • Error messages: “Hook timed out”, “ENOTEMPTY”, “race condition”
  • Symptoms: “flaky”, “hanging”, “zombie”, “pollution”
  • Synonyms: “timeout/hang/freeze”, “cleanup/teardown/afterEach”
  • Tools: Actual commands, library names, file types
Use active voice, verb-first:
  • creating-skills not skill-creation
  • condition-based-waiting not async-test-helpers
Gerunds (-ing) work well for processes:
  • creating-skills, testing-skills, debugging-with-logs
  • Active, describes the action you’re taking
Name by what you DO or core insight:
  • flatten-with-flags > data-structure-refactoring
  • root-cause-tracing > debugging-techniques
Problem: Frequently-referenced skills load into EVERY conversation. Every token counts.

Target word counts:
  • Getting-started workflows: under 150 words each
  • Frequently-loaded skills: under 200 words total
  • Other skills: under 500 words (still be concise)
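These targets can be checked mechanically, e.g. before committing a skill. A sketch (the function name and the frontmatter-stripping regex are illustrative):

```typescript
// Count the words in a SKILL.md body, ignoring YAML frontmatter,
// so the count can be compared against the targets above.
function skillWordCount(markdown: string): number {
  // Drop the leading frontmatter block, if present, before counting.
  const body = markdown.replace(/^---[\s\S]*?---/, "");
  return body.split(/\s+/).filter(Boolean).length;
}
```

Pairing this with `readFileSync` in a pre-commit hook is one way to keep frequently-loaded skills under their budget.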
Techniques:

Move details to tool help:
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
Use cross-references:
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.

Code Examples

One excellent example beats many mediocre ones.
Choose the most relevant language:
  • Testing techniques → TypeScript/JavaScript
  • System debugging → Shell/Python
  • Data processing → Python
A good example:
  • Is complete and runnable
  • Has clear comments explaining WHY
  • Comes from a real scenario
  • Shows the pattern clearly
  • Is ready to adapt (not a generic template)
Don’t:
  • Implement in 5+ languages
  • Create fill-in-the-blank templates
  • Write contrived examples

File Organization Patterns

Self-Contained

defense-in-depth/
  SKILL.md
When all content fits, no heavy reference needed

With Reusable Tool

condition-based-waiting/
  SKILL.md
  example.ts
When tool is reusable code, not just narrative

With Heavy Reference

pptx/
  SKILL.md
  pptxgenjs.md
  ooxml.md
  scripts/
When reference material is too large for inline

Complete Example: Creating a Simple Skill

Let’s walk through creating a skill for condition-based waiting in tests.
1. RED Phase - Establish Baseline

First, create a pressure scenario to test WITHOUT the skill:

Scenario: “Write a test that waits for an async operation to complete”

Run with a subagent WITHOUT the skill. Document what happens:
  • Agent uses setTimeout with arbitrary delays
  • Tests are flaky (pass/fail inconsistently)
  • Agent doesn’t understand the race condition
Capture the exact rationalizations:
  • “100ms should be enough time”
  • “Adding a longer timeout will fix it”
  • “It works on my machine”
2. GREEN Phase - Write Minimal Skill

Create skills/condition-based-waiting/SKILL.md:
---
name: condition-based-waiting
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently
---

# Condition-Based Waiting

## Overview

Wait for conditions, not arbitrary timeouts. Tests should wait for the actual state change, not guess how long it takes.

## When to Use

**Use when:**
- Tests are flaky
- Using setTimeout or sleep in tests
- "Works locally but fails in CI"
- Race conditions in async code

**Don't use for:**
- Synchronous operations
- Intentional delays (debounce testing)

## Core Pattern

**Before (flaky):**
```typescript
test('updates user', async () => {
  updateUser({ name: 'Alice' });
  await sleep(100); // Hope it's done
  expect(getUser().name).toBe('Alice');
});
```

**After (reliable):**
```typescript
test('updates user', async () => {
  updateUser({ name: 'Alice' });
  await waitFor(() => getUser().name === 'Alice');
  expect(getUser().name).toBe('Alice');
});
```

## Quick Reference

| Framework       | Helper                       |
|-----------------|------------------------------|
| Jest            | `waitFor(() => condition)`   |
| Vitest          | `waitFor(() => condition)`   |
| Testing Library | `waitFor(() => expect(...))` |

## Common Mistakes

**Mistake:** Increasing the timeout duration
```typescript
await sleep(5000); // Longer = more flaky
```

**Fix:** Wait for the condition
```typescript
await waitFor(() => condition, { timeout: 5000 });
```
Run the same scenario WITH this skill. Agent should now use condition-based waiting.
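The SKILL.md example above assumes a `waitFor` helper. Test frameworks such as Testing Library ship more robust versions; for plain boolean conditions, a minimal polling sketch looks like this (the defaults and error message are illustrative):

```typescript
// Poll a condition until it returns true or the timeout elapses.
// A minimal sketch of the helper the example relies on.
async function waitFor(
  condition: () => boolean,
  { timeout = 2000, interval = 20 }: { timeout?: number; interval?: number } = {}
): Promise<void> {
  const deadline = Date.now() + timeout;
  while (!condition()) {
    if (Date.now() > deadline) {
      throw new Error(`Condition not met within ${timeout}ms`);
    }
    // Yield between checks instead of blocking the event loop.
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}
```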
3. REFACTOR Phase - Close Loopholes

Test again and capture new rationalizations:
  • “But I need a small timeout for debouncing” → Add to “When NOT to use”
  • “waitFor is slower” → Add to Common Mistakes with performance note
Update the skill to address each rationalization explicitly.

Real-World Skills to Study

Examine these skills from the Superpowers repository:

test-driven-development

Excellent example of a discipline-enforcing skill with comprehensive rationalization table

brainstorming

Clean example of a process skill with clear flowchart

systematic-debugging

Pattern skill showing how to structure complex workflows

writing-plans

Technique skill with comprehensive reference material

Skill Creation Checklist

Use this checklist for every skill (see Testing Skills for testing details):

RED Phase - Write Failing Test:
  • Create pressure scenarios (3+ combined pressures for discipline skills)
  • Run scenarios WITHOUT skill - document baseline behavior verbatim
  • Identify patterns in rationalizations/failures
GREEN Phase - Write Minimal Skill:
  • Name uses only letters, numbers, hyphens (no parentheses/special chars)
  • YAML frontmatter with only name and description (max 1024 chars)
  • Description starts with “Use when…” and includes specific triggers/symptoms
  • Description written in third person
  • Keywords throughout for search (errors, symptoms, tools)
  • Clear overview with core principle
  • Address specific baseline failures identified in RED
  • Code inline OR link to separate file
  • One excellent example (not multi-language)
  • Run scenarios WITH skill - verify agents now comply
REFACTOR Phase - Close Loopholes:
  • Identify NEW rationalizations from testing
  • Add explicit counters (if discipline skill)
  • Build rationalization table from all test iterations
  • Create red flags list
  • Re-test until bulletproof
Quality Checks:
  • Small flowchart only if decision non-obvious
  • Quick reference table
  • Common mistakes section
  • No narrative storytelling
  • Supporting files only for tools or heavy reference
Deployment:
  • Commit skill to git and push to your fork (if configured)
  • Consider contributing back via PR (if broadly useful)

Anti-Patterns to Avoid

Common mistakes when creating skills:

Narrative Example: “In session 2025-10-03, we found empty projectDir caused…”
  • Too specific, not reusable
Multi-Language Dilution: example-js.js, example-py.py, example-go.go
  • Mediocre quality, maintenance burden
Code in Flowcharts:
step1 [label="import fs"];
step2 [label="read file"];
  • Can’t copy-paste, hard to read
Generic Labels: helper1, helper2, step3, pattern4
  • Labels should have semantic meaning

Next Steps

Once you’ve created your skill:
  1. Test it thoroughly - See Testing Skills
  2. Contribute it back - See Contributing
  3. Iterate based on feedback - Skills are living documentation

Additional Resources

  • Full skill creation guide: skills/writing-skills/SKILL.md in the source repository
  • Anthropic’s official best practices: anthropic-best-practices.md
  • Testing methodology: Testing Skills
