
Creating Skills

Skills are the building blocks of Superpowers. A skill is a reference guide for proven techniques, patterns, or tools that helps future Claude instances find and apply effective approaches.
Creating skills IS Test-Driven Development applied to process documentation. You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).

What is a Skill?

Skills are:
  • Reusable techniques, patterns, tools, reference guides
  • Documentation that helps agents discover and apply proven approaches
  • Living documentation that evolves through testing and refinement
Skills are NOT:
  • Narratives about how you solved a problem once
  • One-off solutions to specific problems
  • Project-specific conventions (those belong in CLAUDE.md)

When to Create a Skill

1. Evaluate if a skill is needed

Create a skill when:
  • The technique wasn’t intuitively obvious to you
  • You’d reference this again across projects
  • The pattern applies broadly (not project-specific)
  • Others would benefit from this knowledge
Don’t create a skill for:
  • One-off solutions
  • Standard practices well-documented elsewhere
  • Project-specific conventions
  • Mechanical constraints (if it’s enforceable with regex/validation, automate it—save documentation for judgment calls)
2. Understand the TDD mapping

Before creating a skill, understand that skill creation follows the RED-GREEN-REFACTOR cycle:
  • Test case → Pressure scenario with subagent
  • Production code → Skill document (SKILL.md)
  • Test fails (RED) → Agent violates the rule without the skill (baseline)
  • Test passes (GREEN) → Agent complies with the skill present
  • Refactor → Close loopholes while maintaining compliance
The Iron Law: NO SKILL WITHOUT A FAILING TEST FIRST. This applies to NEW skills AND to EDITS of existing skills. If you write a skill before testing it, delete it and start over. No exceptions.

Skill Types

Superpowers uses three main types of skills:

Technique

Concrete method with steps to follow (e.g., condition-based-waiting, root-cause-tracing)

Pattern

Way of thinking about problems (e.g., flatten-with-flags, test-invariants)

Reference

API docs, syntax guides, tool documentation (e.g., office docs)

Directory Structure

Skills use a flat namespace for easy discovery:
skills/
  skill-name/
    SKILL.md              # Main reference (required)
    supporting-file.*     # Only if needed
Separate files for:
  1. Heavy reference (100+ lines) - API docs, comprehensive syntax
  2. Reusable tools - Scripts, utilities, templates
Keep inline:
  • Principles and concepts
  • Code patterns (< 50 lines)
  • Everything else

SKILL.md Structure

Every skill follows a consistent structure:

Frontmatter

---
name: skill-name-with-hyphens
description: Use when [specific triggering conditions and symptoms]
---
Requirements:
  • Only two fields: name and description
  • Max 1024 characters total
  • name: Letters, numbers, and hyphens only (no parentheses, special chars)
  • description: Third-person, describes ONLY when to use (NOT what it does)
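These constraints are mechanical, so they can be checked before a skill is ever tested. A minimal sketch in TypeScript, assuming the 1024-character limit covers both fields combined (the function name and return shape are illustrative, not part of any Superpowers API):

```typescript
// Hypothetical frontmatter checks for the rules above.
function validateFrontmatter(name: string, description: string): string[] {
  const errors: string[] = [];
  // name: letters, numbers, and hyphens only
  if (!/^[A-Za-z0-9-]+$/.test(name)) {
    errors.push("name may contain only letters, numbers, and hyphens");
  }
  // max 1024 characters total (interpreted here as both fields combined)
  if (name.length + description.length > 1024) {
    errors.push("frontmatter exceeds 1024 characters");
  }
  // best practice rather than a hard rule: lead with triggering conditions
  if (!description.startsWith("Use when")) {
    errors.push('description should start with "Use when"');
  }
  return errors;
}
```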

Content Sections

# Skill Name

## Overview
What is this? Core principle in 1-2 sentences.

## When to Use
[Small inline flowchart IF decision non-obvious]

Bullet list with SYMPTOMS and use cases
When NOT to use

## Core Pattern (for techniques/patterns)
Before/after code comparison

## Quick Reference
Table or bullets for scanning common operations

## Implementation
Inline code for simple patterns
Link to file for heavy reference or reusable tools

## Common Mistakes
What goes wrong + fixes

## Real-World Impact (optional)
Concrete results

Claude Search Optimization (CSO)

CSO is critical for discovery - future Claude needs to FIND your skill.
Purpose: Claude reads the description to decide which skills to load for a given task.

CRITICAL: Description = When to Use, NOT What the Skill Does

The description should ONLY describe triggering conditions. Do NOT summarize the skill’s process or workflow.

Why this matters: Testing revealed that when a description summarizes the skill’s workflow, Claude may follow the description instead of reading the full skill content.
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
Best practices:
  • Start with “Use when…” to focus on triggering conditions
  • Use concrete triggers, symptoms, and situations
  • Describe the problem, not language-specific symptoms
  • Keep triggers technology-agnostic unless skill is technology-specific
  • Write in third person (injected into system prompt)
  • NEVER summarize the skill’s process or workflow
Use words Claude would search for:
  • Error messages: “Hook timed out”, “ENOTEMPTY”, “race condition”
  • Symptoms: “flaky”, “hanging”, “zombie”, “pollution”
  • Synonyms: “timeout/hang/freeze”, “cleanup/teardown/afterEach”
  • Tools: Actual commands, library names, file types
Use active voice, verb-first:
  • creating-skills not skill-creation
  • condition-based-waiting not async-test-helpers
Gerunds (-ing) work well for processes:
  • creating-skills, testing-skills, debugging-with-logs
  • Active, describes the action you’re taking
Name by what you DO or core insight:
  • flatten-with-flags > data-structure-refactoring
  • root-cause-tracing > debugging-techniques
Problem: Frequently-referenced skills load into EVERY conversation. Every token counts.

Target word counts:
  • Getting-started workflows: under 150 words each
  • Frequently-loaded skills: under 200 words total
  • Other skills: under 500 words (still be concise)
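These targets can be checked mechanically, e.g. before committing a skill. A sketch (the function name and the frontmatter-stripping regex are illustrative):

```typescript
// Count the words in a SKILL.md body, ignoring YAML frontmatter,
// so the count can be compared against the targets above.
function skillWordCount(markdown: string): number {
  // Drop the leading frontmatter block, if present, before counting.
  const body = markdown.replace(/^---[\s\S]*?---/, "");
  return body.split(/\s+/).filter(Boolean).length;
}
```

Pairing this with `readFileSync` in a pre-commit hook is one way to keep frequently-loaded skills under their budget.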
Techniques:

Move details to tool help:
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
Use cross-references:
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.

Code Examples

One excellent example beats many mediocre ones.
Choose the most relevant language:
  • Testing techniques → TypeScript/JavaScript
  • System debugging → Shell/Python
  • Data processing → Python
A good example:
  • Is complete and runnable
  • Has clear comments explaining WHY
  • Comes from a real scenario
  • Shows the pattern clearly
  • Is ready to adapt (not a generic template)
Don’t:
  • Implement in 5+ languages
  • Create fill-in-the-blank templates
  • Write contrived examples

File Organization Patterns

Self-Contained

defense-in-depth/
  SKILL.md
When all content fits, no heavy reference needed

With Reusable Tool

condition-based-waiting/
  SKILL.md
  example.ts
When tool is reusable code, not just narrative

With Heavy Reference

pptx/
  SKILL.md
  pptxgenjs.md
  ooxml.md
  scripts/
When reference material is too large for inline

Complete Example: Creating a Simple Skill

Let’s walk through creating a skill for condition-based waiting in tests.
1. RED Phase - Establish Baseline

First, create a pressure scenario to test WITHOUT the skill:

Scenario: “Write a test that waits for an async operation to complete”

Run with a subagent WITHOUT the skill. Document what happens:
  • Agent uses setTimeout with arbitrary delays
  • Tests are flaky (pass/fail inconsistently)
  • Agent doesn’t understand the race condition
Capture the exact rationalizations:
  • “100ms should be enough time”
  • “Adding a longer timeout will fix it”
  • “It works on my machine”
2. GREEN Phase - Write Minimal Skill

Create skills/condition-based-waiting/SKILL.md:
---
name: condition-based-waiting
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently
---

# Condition-Based Waiting

## Overview

Wait for conditions, not arbitrary timeouts. Tests should wait for the actual state change, not guess how long it takes.

## When to Use

**Use when:**
- Tests are flaky
- Using setTimeout or sleep in tests
- "Works locally but fails in CI"
- Race conditions in async code

**Don't use for:**
- Synchronous operations
- Intentional delays (debounce testing)

## Core Pattern

**Before (flaky):**
```typescript
test('updates user', async () => {
  updateUser({ name: 'Alice' });
  await sleep(100); // Hope it's done
  expect(getUser().name).toBe('Alice');
});
```

**After (reliable):**
```typescript
test('updates user', async () => {
  updateUser({ name: 'Alice' });
  await waitFor(() => getUser().name === 'Alice');
  expect(getUser().name).toBe('Alice');
});
```

## Quick Reference

| Framework       | Helper                       |
|-----------------|------------------------------|
| Jest            | `waitFor(() => condition)`   |
| Vitest          | `waitFor(() => condition)`   |
| Testing Library | `waitFor(() => expect(...))` |

## Common Mistakes

**Mistake:** Increasing the timeout duration
```typescript
await sleep(5000); // Longer = more flaky
```

**Fix:** Wait for the condition
```typescript
await waitFor(() => condition, { timeout: 5000 });
```
Run the same scenario WITH this skill. Agent should now use condition-based waiting.
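The SKILL.md example above assumes a `waitFor` helper. Test frameworks such as Testing Library ship more robust versions; for plain boolean conditions, a minimal polling sketch looks like this (the defaults and error message are illustrative):

```typescript
// Poll a condition until it returns true or the timeout elapses.
// A minimal sketch of the helper the example relies on.
async function waitFor(
  condition: () => boolean,
  { timeout = 2000, interval = 20 }: { timeout?: number; interval?: number } = {}
): Promise<void> {
  const deadline = Date.now() + timeout;
  while (!condition()) {
    if (Date.now() > deadline) {
      throw new Error(`Condition not met within ${timeout}ms`);
    }
    // Yield between checks instead of blocking the event loop.
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}
```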
3. REFACTOR Phase - Close Loopholes

Test again and capture new rationalizations:
  • “But I need a small timeout for debouncing” → Add to “When NOT to use”
  • “waitFor is slower” → Add to Common Mistakes with performance note
Update the skill to address each rationalization explicitly.

Real-World Skills to Study

Examine these skills from the Superpowers repository:

test-driven-development

Excellent example of a discipline-enforcing skill with comprehensive rationalization table

brainstorming

Clean example of a process skill with clear flowchart

systematic-debugging

Pattern skill showing how to structure complex workflows

writing-plans

Technique skill with comprehensive reference material

Skill Creation Checklist

Use this checklist for every skill (see Testing Skills for testing details):

RED Phase - Write Failing Test:
  • Create pressure scenarios (3+ combined pressures for discipline skills)
  • Run scenarios WITHOUT skill - document baseline behavior verbatim
  • Identify patterns in rationalizations/failures
GREEN Phase - Write Minimal Skill:
  • Name uses only letters, numbers, hyphens (no parentheses/special chars)
  • YAML frontmatter with only name and description (max 1024 chars)
  • Description starts with “Use when…” and includes specific triggers/symptoms
  • Description written in third person
  • Keywords throughout for search (errors, symptoms, tools)
  • Clear overview with core principle
  • Address specific baseline failures identified in RED
  • Code inline OR link to separate file
  • One excellent example (not multi-language)
  • Run scenarios WITH skill - verify agents now comply
REFACTOR Phase - Close Loopholes:
  • Identify NEW rationalizations from testing
  • Add explicit counters (if discipline skill)
  • Build rationalization table from all test iterations
  • Create red flags list
  • Re-test until bulletproof
Quality Checks:
  • Small flowchart only if decision non-obvious
  • Quick reference table
  • Common mistakes section
  • No narrative storytelling
  • Supporting files only for tools or heavy reference
Deployment:
  • Commit skill to git and push to your fork (if configured)
  • Consider contributing back via PR (if broadly useful)

Anti-Patterns to Avoid

Common mistakes when creating skills:

Narrative Example: “In session 2025-10-03, we found empty projectDir caused…”
  • Too specific, not reusable
Multi-Language Dilution: example-js.js, example-py.py, example-go.go
  • Mediocre quality, maintenance burden
Code in Flowcharts:
step1 [label="import fs"];
step2 [label="read file"];
  • Can’t copy-paste, hard to read
Generic Labels: helper1, helper2, step3, pattern4
  • Labels should have semantic meaning

Next Steps

Once you’ve created your skill:
  1. Test it thoroughly - See Testing Skills
  2. Contribute it back - See Contributing
  3. Iterate based on feedback - Skills are living documentation

Additional Resources

  • Full skill creation guide: skills/writing-skills/SKILL.md in the source repository
  • Anthropic’s official best practices: anthropic-best-practices.md
  • Testing methodology: Testing Skills
