SKILL.md) that acts as a table of contents, plus a set of focused reference files that load progressively. This page explains why the architecture is designed this way and how each piece works.
File structure
A skill is a directory. The layout is always two levels deep: SKILL.md → reference file, and no deeper. Reference files that point to further reference files cause Claude to read files only partially and miss information.
SKILL.md
The map. Workflow steps, self-review checklist, golden rules, and reference file index. Always loaded on trigger. Kept under 100 lines.
Reference files
The details. Code patterns, templates, step instructions, failure diagnosis tables, examples. Loaded only when Claude needs them. Under 500 lines each.
Golden rules
Hard mechanical rules specific to the skill’s domain. Encoded in SKILL.md so they’re visible on every invocation. Prevent output drift between runs.
SKILL.md as a map
SKILL.md has exactly four sections:
- Title and one-line purpose — one sentence describing what the skill builds and how
- Workflow — numbered steps, one line each, each pointing to a reference file
- Self-review checklist — objectively verifiable conditions to check before delivering
- Reference file index — a table linking every reference file with a one-line summary
SKILL.md contains no code blocks and no multi-paragraph explanations. If content requires more than one line, it belongs in a reference file. This keeps the initial context load small and focused.
| Content type | Where it goes |
|---|---|
| Workflow steps (one line each) | SKILL.md |
| Self-review checklist | SKILL.md |
| Golden rules | SKILL.md |
| Reference file index | SKILL.md |
| Code patterns and templates | Reference file |
| Detailed step instructions | Reference file |
| Examples and samples | Reference file |
| Testing methodology | Reference file |
| Failure diagnosis tables | Reference file |
| Domain-specific reference | Reference file |
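The placement rules in this table lend themselves to a mechanical check. The following is a minimal Python sketch, not part of any official tooling: the function name `check_skill_md` and its messages are hypothetical, and it assumes code blocks are written as triple-backtick fences. It flags a SKILL.md body that reaches the 100-line limit or contains a fenced code block.

```python
MAX_SKILL_MD_LINES = 100  # "kept under 100 lines", as stated on this page
FENCE = "`" * 3  # a Markdown code fence, built this way to avoid nesting fences


def check_skill_md(text: str) -> list[str]:
    """Return a list of problems found in a SKILL.md body (empty if none)."""
    problems = []
    lines = text.splitlines()

    # The page asks for fewer than 100 lines, so 100 or more is a violation.
    if len(lines) >= MAX_SKILL_MD_LINES:
        problems.append(f"SKILL.md has {len(lines)} lines; keep it under {MAX_SKILL_MD_LINES}")

    # Assumption: any line that starts with a fence marks a code block.
    if any(line.lstrip().startswith(FENCE) for line in lines):
        problems.append("SKILL.md contains a code block; move it to a reference file")

    return problems


if __name__ == "__main__":
    short_map = "# my-skill\n1. Gather inputs\n2. Build output\n"
    print(check_skill_md(short_map))  # no problems expected for a short map
```

A check like this only covers the two rules that can be tested by counting and pattern matching; whether a step belongs in one line or in a reference file still has to be decided by the author.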
YAML frontmatter
Every skill begins with YAML frontmatter that defines how Claude discovers and invokes it:
| Field | Purpose |
|---|---|
| name | The slash command identifier. Lowercase, hyphens only, max 64 characters. |
| description | How Claude discovers the skill. Written in third person. Used as a search index. |
| argument-hint | Displayed as a hint when the user types the slash command. |
The description field is the most consequential. Claude selects which skill to load based on the description alone, before reading any other part of the file. A skill whose description omits the keywords a user would naturally say will never trigger.
Formula for descriptions: [What it does]. Use when [trigger conditions].
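The field constraints and the description formula can be sketched as a small validator. This is an illustrative Python example under stated assumptions: `check_frontmatter` is a hypothetical helper, the frontmatter is assumed to be already parsed into a dictionary, digits are assumed to be allowed in names alongside lowercase letters and hyphens, and the presence of the phrase "Use when" is used as a rough proxy for the formula.

```python
import re

# Assumption: digits are permitted in addition to lowercase letters and hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MAX_NAME_LENGTH = 64  # limit stated in the field table


def check_frontmatter(fields: dict[str, str]) -> list[str]:
    """Return a list of problems with already-parsed frontmatter fields."""
    problems = []

    name = fields.get("name", "")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use only lowercase characters and hyphens")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")

    # Rough proxy for "[What it does]. Use when [trigger conditions]."
    description = fields.get("description", "")
    if "Use when" not in description:
        problems.append("description should state trigger conditions with 'Use when ...'")

    return problems


if __name__ == "__main__":
    # Hypothetical skill, invented for illustration only.
    example = {
        "name": "pdf-report-builder",
        "description": (
            "Builds PDF reports from CSV data. "
            "Use when the user asks for a report, a PDF export, or a printable summary."
        ),
        "argument-hint": "[csv-path]",
    }
    print(check_frontmatter(example))
```

The description in the example names several phrasings a user might say ("report", "PDF export", "printable summary"), which is the point of treating the field as a search index.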
Reference files
Each reference file covers exactly one concern. The filename is itself a signal: Claude uses it to decide whether to read the file at all.
Progressive disclosure
Skills are designed to load only the context needed at each phase:
| Phase | What Claude loads | Token cost |
|---|---|---|
| Startup | name + description from every installed skill | ~100 tokens per skill |
| Trigger | Full SKILL.md body | The full file |
| As-needed | Individual reference files, one at a time | Only when read |
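The three phases in the table can be modeled in a few lines. The sketch below is a toy Python model, not a description of how Claude is implemented: `SkillSession` and `startup_cost` are hypothetical names, and the 100-token figure is the approximate per-skill value from the table.

```python
from dataclasses import dataclass, field

TOKENS_PER_SKILL_AT_STARTUP = 100  # approximate figure from the table


def startup_cost(installed_skills: int) -> int:
    """Estimate startup tokens: name + description for every installed skill."""
    return installed_skills * TOKENS_PER_SKILL_AT_STARTUP


@dataclass
class SkillSession:
    """Tracks which parts of one skill have entered context, in order."""

    skill_body: str                 # SKILL.md content below the frontmatter
    references: dict[str, str]      # filename -> file content
    context: list[str] = field(default_factory=list)

    def trigger(self) -> None:
        """Trigger phase: the full SKILL.md body is loaded."""
        self.context.append(self.skill_body)

    def read_reference(self, filename: str) -> None:
        """As-needed phase: exactly one reference file is loaded."""
        self.context.append(self.references[filename])


if __name__ == "__main__":
    print(startup_cost(20))  # 20 skills at roughly 100 tokens each

    session = SkillSession("the map", {"templates.md": "T", "testing.md": "S"})
    session.trigger()
    session.read_reference("templates.md")
    print(len(session.context))  # testing.md was never read, so it costs nothing
```

The model makes the trade-off visible: installing more skills raises only the small startup cost, while the larger reference files are paid for one at a time and only if they are read.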
Golden rules
Golden rules are hard mechanical rules specific to a skill’s domain. They appear in SKILL.md so they’re visible on every invocation, not buried in a reference file that might not be read.
Properties of effective golden rules:
- Imperative voice: “Never”, “Always”, “Must”, “Do not” — not “Consider”, “Try to”, “Prefer”
- Mechanical: an agent can follow the rule without exercising judgment
- Domain-specific: each rule prevents a failure mode identified during the design phase
- Count: 3–8 rules per skill — fewer means insufficient guardrails, more means the skill is overspecified
| Failure mode | Golden rule |
|---|---|
| Agent puts all content in SKILL.md | “SKILL.md is a map. If you’re writing a code block in SKILL.md, it belongs in a reference file.” |
| Agent writes vague descriptions | “Description is discovery. If the description doesn’t contain the keywords a user would say, the skill won’t trigger.” |
| Agent skips testing | “Every skill must have at least one feedback loop: do → check → fix.” |
| Output differs between runs | “Replace every adjective with a specification.” |
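Two of the properties listed for golden rules, the count and the absence of soft wording, can be checked automatically. The following Python sketch is a heuristic under stated assumptions: `check_golden_rules` is a hypothetical helper, and the soft phrases it looks for are only the three named on this page ("Consider", "Try to", "Prefer"). It cannot judge whether a rule is mechanical or domain-specific.

```python
import re

# Soft phrasings named on this page as unsuitable for golden rules.
SOFT_PHRASES = re.compile(r"\b(consider|try to|prefer)\b", re.IGNORECASE)
MIN_RULES, MAX_RULES = 3, 8  # recommended range from this page


def check_golden_rules(rules: list[str]) -> list[str]:
    """Return a list of problems with a skill's golden rules (empty if none)."""
    problems = []

    if not MIN_RULES <= len(rules) <= MAX_RULES:
        problems.append(f"expected {MIN_RULES}-{MAX_RULES} rules, found {len(rules)}")

    for rule in rules:
        if SOFT_PHRASES.search(rule):
            problems.append(f"soft wording in rule: {rule!r}")

    return problems


if __name__ == "__main__":
    # Hypothetical rules, invented for illustration only.
    draft = [
        "Never put a code block in SKILL.md.",
        "Always link each workflow step to a reference file.",
        "Try to keep reference files short.",
    ]
    for problem in check_golden_rules(draft):
        print(problem)
```

In the draft above, the third rule would be flagged; a mechanical rewrite in the style of this page would be something like "Reference files must stay under 500 lines."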
Self-review checklists
Every skill includes a self-review checklist: a list of objectively verifiable conditions the agent checks after completing the workflow. This is the primary feedback loop mechanism. Effective checklist items are concrete and binary: each one can be answered yes or no by inspecting the output.
Harness engineering principles
The architecture above is an expression of harness engineering: the practice of encoding constraints, conventions, and feedback loops into skill files rather than relying on the agent’s general judgment. Five core principles:
- Map, not manual. SKILL.md is a table of contents. Details live in reference files. Agents navigate to what they need.
- Concrete beats abstract. Every quality standard is a specification, not an adjective. “Functions under 30 lines” is a standard. “Clean code” is not.
- Feedback loops are the product. A skill without a verification step is a suggestion. Every skill must encode at least one do → check → fix cycle.
- Rules promote to code. When a documented instruction keeps being violated, encode it as a validation function or linter — not a stronger-worded paragraph. Executable rules enforce themselves.
- If it’s not in the files, it doesn’t exist. The agent can only see what’s in the skill directory. Every constraint, convention, and pattern must be written down or it will be ignored.
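The do → check → fix cycle and the idea of promoting rules to code can be illustrated together. The sketch below is one possible shape for such a loop in Python, under stated assumptions: `run_feedback_loop`, the check labels, and the `fix` callback are all hypothetical, and the artifact is treated as plain text. Each check is a binary function, in the spirit of a self-review checklist item.

```python
from typing import Callable

Check = Callable[[str], bool]            # returns True when the condition holds
Fix = Callable[[str, list[str]], str]    # receives the artifact and failed labels


def run_feedback_loop(
    artifact: str,
    checks: dict[str, Check],
    fix: Fix,
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Check the artifact, fix failures, and re-check until clean or out of rounds.

    Returns the final artifact and the labels of any checks that still fail.
    """

    def failing(text: str) -> list[str]:
        return [label for label, check in checks.items() if not check(text)]

    for _ in range(max_rounds):
        failed = failing(artifact)
        if not failed:
            return artifact, []
        artifact = fix(artifact, failed)

    # Re-check once more so the result reflects the last fix attempt.
    return artifact, failing(artifact)


if __name__ == "__main__":
    # Hypothetical checklist: each item is concrete and binary.
    checklist = {
        "ends with a newline": lambda text: text.endswith("\n"),
        "contains no tab characters": lambda text: "\t" not in text,
    }

    def tidy(text: str, failed: list[str]) -> str:
        return text.replace("\t", "    ").rstrip("\n") + "\n"

    result, remaining = run_feedback_loop("step one\tdone", checklist, tidy)
    print(repr(result), remaining)
```

Returning the labels of checks that still fail, rather than raising, leaves the caller free to decide whether an unfixed item blocks delivery.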