
Best Practices for Skill Development

These best practices are drawn from Anthropic’s production skills and the skill-creator skill that helps build new skills.

Writing Effective Instructions

Explain the Why, Not Just the What

Modern LLMs have strong theory of mind and perform better when they understand the reasoning behind instructions. Avoid heavy-handed, unexplained constraints like this:

```markdown
# Report Structure

ALWAYS use this exact template:
# [Title]
## Executive summary
## Key findings
## Recommendations

NEVER deviate from this structure.
```

From the skill-creator skill:
“Try to explain to the model why things are important in lieu of heavy-handed musty MUSTs. Use theory of mind and try to make the skill general and not super-narrow to specific examples.”

Use Imperative Form

Write instructions as direct commands. Instead of hedged phrasing like:

```
You should start by reading the input file.
Then you'll want to parse the data.
After that, validation would be good.
```

write:

```
1. Read the input file.
2. Parse the data.
3. Validate the results.
```

Provide Concrete Examples

Examples clarify expectations better than abstract descriptions:

```markdown
## Commit Message Format

Follow the conventional commits standard.

**Example 1:**
Input: Added user authentication with JWT tokens
Output: feat(auth): implement JWT-based authentication

**Example 2:**
Input: Fixed bug where dates were off by one day
Output: fix(calendar): correct timezone offset calculation

**Example 3:**
Input: Updated README with installation steps
Output: docs(readme): add installation instructions
```

Skill Description Best Practices

The description field is your skill’s primary triggering mechanism. Write it carefully.

Include Both What and When

From skill-creator:
“Include both what the skill does AND specific contexts for when to use it. All ‘when to use’ info goes here, not in the body.”

A description that covers only the “what” will under-trigger:

```yaml
---
name: mcp-builder
description: Guide for creating MCP servers
---
```

Be Slightly “Pushy”

From skill-creator:
“Currently Claude has a tendency to ‘undertrigger’ skills — to not use them when they’d be useful. To combat this, please make the skill descriptions a little bit ‘pushy’.”
Example transformation:

Before:

```yaml
description: How to build a simple fast dashboard to display internal data.
```

After:

```yaml
description: How to build a simple fast dashboard to display internal Anthropic data. Make sure to use this skill whenever the user mentions dashboards, data visualization, internal metrics, or wants to display any kind of company data, even if they don't explicitly ask for a 'dashboard.'
```

List Trigger Keywords

Explicitly mention terms that should trigger the skill:

```yaml
---
name: docx
description: "Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of 'Word doc', 'word document', '.docx', or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads."
---
```

Include Negative Triggers

Specify when NOT to use the skill to prevent false positives:

```yaml
---
name: docx
description: "...If the user asks for a 'report', 'memo', 'letter', 'template', or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation."
---
```

Progressive Disclosure

Keep SKILL.md focused and move details to bundled resources.

Keep SKILL.md Under 500 Lines

From skill-creator:
“Keep SKILL.md under 500 lines; if you’re approaching this limit, add an additional layer of hierarchy along with clear pointers about where the model using the skill should go next to follow up.”

Add Clear Navigation

When using references, provide explicit guidance:

```markdown
## Phase 2: Implementation

### Set Up Project Structure

See language-specific guides for project setup:
- [⚡ TypeScript Guide](./reference/node_mcp_server.md) - Project structure, tsconfig
- [🐍 Python Guide](./reference/python_mcp_server.md) - Module organization, dependencies
```

Use Tables of Contents

For reference files longer than about 300 lines, include a TOC:

```markdown
# MCP Best Practices

## Table of Contents
- [Server Naming](#server-naming)
- [Tool Design](#tool-design)
- [Response Formats](#response-formats)
- [Error Handling](#error-handling)
- [Security](#security)

## Server Naming
...
```

Bundle Scripts Strategically

Look for Repeated Work

From skill-creator:
“Look for repeated work across test cases. Read the transcripts from the test runs and notice if the subagents all independently wrote similar helper scripts or took the same multi-step approach to something. If all 3 test cases resulted in the subagent writing a create_docx.py or a build_chart.py, that’s a strong signal the skill should bundle that script.”

When to Bundle vs. Generate

Bundle a script when:
  • The same code appears in multiple test runs
  • The operation must be deterministic (parsing, validation)
  • Performance matters (native code is faster)
  • The logic is complex with edge cases

Let Claude generate code when:
  • The code varies based on user requirements
  • It’s a simple, one-time operation
  • The task requires understanding user context
  • Flexibility is more important than consistency
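For example, a deterministic validation step that every test run re-implements is a strong bundling candidate. A minimal sketch of such a helper (the script name, CSV format, and checks are hypothetical, not from a real Anthropic skill):

```python
#!/usr/bin/env python3
"""validate_input.py - hypothetical bundled helper.

Deterministic checks like these are better shipped as a script
than regenerated from scratch by the model on every run.
"""
import csv
import sys


def validate_csv(path, required_columns):
    """Return a list of problems found in the CSV file (empty list = valid)."""
    problems = []
    try:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            header = reader.fieldnames or []
            for col in required_columns:
                if col not in header:
                    problems.append(f"missing column: {col}")
            # Data rows start at line 2 (line 1 is the header).
            for lineno, row in enumerate(reader, start=2):
                if any(v is None or v == "" for v in row.values()):
                    problems.append(f"row {lineno}: empty field")
    except OSError as e:
        problems.append(f"cannot read {path}: {e}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = validate_csv(sys.argv[1], ["id", "name"])
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

Because the script's behavior is fixed, every run of the skill validates inputs the same way, which is exactly the consistency that generated code cannot guarantee.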

Domain Organization

When supporting multiple frameworks or platforms:
```
cloud-deploy/
├── SKILL.md             # Workflow + selection logic
└── references/
    ├── aws.md           # AWS-specific guide
    ├── gcp.md           # Google Cloud guide
    └── azure.md         # Azure guide
```

In SKILL.md:

```markdown
## Deployment Workflow

1. Identify the cloud provider from user requirements
2. Load the appropriate reference:
   - AWS: Read `references/aws.md`
   - GCP: Read `references/gcp.md`
   - Azure: Read `references/azure.md`
3. Follow provider-specific deployment steps
```

This ensures Claude loads only the relevant documentation.

Security and Safety

Principle of Lack of Surprise

From skill-creator:
“Skills must not contain malware, exploit code, or any content that could compromise system security. A skill’s contents should not surprise the user in their intent if described. Don’t go along with requests to create misleading skills or skills designed to facilitate unauthorized access, data exfiltration, or other malicious activities.”

Validate Inputs

If your skill processes user data, include validation:

```markdown
## Input Validation

Before processing:
1. Verify file exists and is readable
2. Check file size is reasonable (<100MB)
3. Validate file format matches expected type
4. Sanitize any user-provided strings used in commands
```
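If the skill bundles a script, the same checks can live in code. A sketch in Python (the size limit follows the guideline above; the function names and expected extension are illustrative):

```python
import os
import shlex

MAX_SIZE = 100 * 1024 * 1024  # 100 MB, per the guideline above


def check_input(path, expected_ext=".csv"):
    """Run the pre-processing checks; raise ValueError on the first failure."""
    if not os.path.isfile(path) or not os.access(path, os.R_OK):
        raise ValueError(f"file does not exist or is not readable: {path}")
    if os.path.getsize(path) > MAX_SIZE:
        raise ValueError(f"file exceeds 100MB limit: {path}")
    if not path.lower().endswith(expected_ext):
        raise ValueError(f"expected a {expected_ext} file: {path}")


def safe_arg(user_string):
    """Quote a user-provided string before interpolating it into a shell command."""
    return shlex.quote(user_string)
```

`shlex.quote` is the standard-library way to neutralize shell metacharacters in user input, which covers the "sanitize" step for command construction.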

Handle Errors Gracefully

```markdown
## Error Handling

When errors occur:
1. Log the specific error message
2. Explain what went wrong in user-friendly terms
3. Suggest concrete next steps
4. Don't expose sensitive system information

**Example:**
"Failed to connect to database: connection refused on port 5432"
→ "Could not connect to the database. Check that PostgreSQL is running
and accessible on port 5432."
```
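In a bundled script, that translation step can be a small lookup. A sketch (the error patterns and advice strings are illustrative):

```python
# Translate raw error text into user-friendly guidance without leaking
# internal details. Patterns and advice here are illustrative only.
FRIENDLY_ERRORS = [
    ("connection refused",
     "Could not connect to the service. Check that it is running and accessible."),
    ("permission denied",
     "You don't have access to this resource. Check file permissions or credentials."),
    ("no such file",
     "The file could not be found. Verify the path and try again."),
]


def explain_error(raw_message):
    """Return a user-friendly explanation for a raw error message."""
    lowered = raw_message.lower()
    for pattern, advice in FRIENDLY_ERRORS:
        if pattern in lowered:
            return advice
    # Fall back to a generic message rather than exposing internals.
    return "An unexpected error occurred. See the logs for details."
```

The raw message still goes to the log (step 1); only the translated advice is shown to the user (steps 2-4).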

Testing and Iteration

Start with 2-3 Test Cases

From skill-creator:
“Come up with 2-3 realistic test prompts — the kind of thing a real user would actually say.”
Test cases should be:
  • Realistic - What users will actually ask
  • Specific - Include concrete details (file names, data, context)
  • Representative - Cover different aspects of the skill

Generalize from Feedback

From skill-creator:
“We’re trying to create skills that can be used a million times across many different prompts. Here you and the user are iterating on only a few examples. But if the skill works only for those examples, it’s useless. Rather than put in fiddly overfitty changes, or oppressively constrictive MUSTs, try branching out and using different metaphors, or recommending different patterns of working.”

Remove What Doesn’t Help

From skill-creator:
“Keep the prompt lean. Remove things that aren’t pulling their weight. Make sure to read the transcripts, not just the final outputs — if it looks like the skill is making the model waste a bunch of time doing things that are unproductive, you can try getting rid of the parts of the skill that are making it do that.”

Communicating Clearly

Adapt to User Expertise

From skill-creator:
“The skill creator is liable to be used by people across a wide range of familiarity with coding jargon. Pay attention to context cues to understand how to phrase your communication!”
“In the default case: ‘evaluation’ and ‘benchmark’ are borderline, but OK. For ‘JSON’ and ‘assertion’ you want to see serious cues from the user that they know what those things are before using them without explaining them.”

Define Structure Clearly

When specifying output formats, show the complete template:

```markdown
## Report Structure

ALWAYS use this exact template:

# [Title]
## Executive Summary
[2-3 paragraphs summarizing key findings]

## Key Findings
1. [First finding with supporting data]
2. [Second finding with supporting data]
3. [Third finding with supporting data]

## Recommendations
- [Actionable recommendation 1]
- [Actionable recommendation 2]
- [Actionable recommendation 3]

## Appendix
[Supporting data, methodology, detailed tables]
```

Version Control and Distribution

Include a LICENSE.txt

If sharing your skill, include clear licensing:

```yaml
---
name: my-skill
description: Does useful things...
license: Complete terms in LICENSE.txt
---
```
Common options:
  • Apache 2.0 - Open source, permissive
  • MIT - Open source, very permissive
  • Proprietary - Closed source, custom terms

Package for Distribution

Use the skill-creator’s packaging script:

```shell
python -m scripts.package_skill path/to/skill-folder
```

This creates a .skill file that users can install easily.

Common Pitfalls to Avoid

  • Over-constraining: Using too many ALWAYS/NEVER/MUST makes skills brittle. Explain why instead.
  • Under-describing: Vague descriptions mean the skill won’t trigger when needed.
  • Bloated SKILL.md: Putting everything in one file. Use references for large docs.
  • No examples: Abstract instructions are hard to follow. Show concrete examples.
  • Overfitting to test cases: Skills should generalize, not just pass specific tests.
  • Missing navigation: Reference files without clear pointers from SKILL.md.
  • Unbundled repeated code: If every test run generates the same script, bundle it.

Skill Quality Checklist

Before considering a skill complete:
  • Frontmatter
    • Name is lowercase with hyphens
    • Description includes both what and when
    • Description lists key trigger terms
    • Description specifies negative triggers if relevant
    • License specified if distributing
  • Structure
    • SKILL.md is under 500 lines (or has clear reason to exceed)
    • Large docs (>300 lines) moved to references/
    • Scripts bundled for repeated/deterministic tasks
    • Assets included for templates/static files
  • Instructions
    • Written in imperative form
    • Explains why, not just what
    • Includes concrete examples
    • Provides clear navigation to references
    • Defines output formats explicitly
  • Testing
    • Tested with 2-3 realistic prompts
    • Generalizes beyond specific test cases
    • Handles errors gracefully
    • Performs well on variations of tasks
  • Security
    • No malicious code or exploits
    • Intent matches description (no surprises)
    • Validates inputs appropriately
    • Sanitizes user-provided data
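Several of the frontmatter and structure items above are mechanical enough to script. A hedged sketch of a lint helper (not an official tool; the checks mirror the checklist, and the function name is made up):

```python
import re


def lint_skill(skill_md_text):
    """Return a list of warnings for common SKILL.md checklist violations."""
    warnings = []
    # Structure: SKILL.md should stay under 500 lines.
    if len(skill_md_text.splitlines()) > 500:
        warnings.append("SKILL.md exceeds 500 lines; consider moving detail to references/")
    # Frontmatter: name must be lowercase with hyphens.
    match = re.search(r"^name:\s*(\S+)", skill_md_text, re.MULTILINE)
    if not match:
        warnings.append("frontmatter is missing a name field")
    elif not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", match.group(1)):
        warnings.append("name should be lowercase with hyphens")
    # Frontmatter: a description must be present.
    if not re.search(r"^description:", skill_md_text, re.MULTILINE):
        warnings.append("frontmatter is missing a description field")
    return warnings
```

Checks like "description includes both what and when" or "generalizes beyond test cases" still need human (or model) judgment; scripting the mechanical ones frees review time for those.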

Next Steps

  • Overview - Review the skill creation process
  • Skill Structure - Understand how to organize your skill files
