Overview

These are Trail of Bits house standards on top of Anthropic’s requirements. Following these practices ensures your skills are discoverable, maintainable, and effective.

Description Quality

Your skill competes with 100+ others. The description must trigger correctly.

Third-Person Voice

description: "Analyzes smart contracts for reentrancy vulnerabilities"
Claude reads descriptions to decide which skills to invoke. Third-person descriptions integrate better with this selection process.

Include Trigger Keywords

description: "Detects timing side-channel vulnerabilities in cryptographic code. Use when implementing or reviewing crypto code, encountering division on secrets, or constant-time programming questions."
Why it matters: Specific triggers increase the chances Claude will invoke your skill at the right time.

Be Specific About Capabilities

description: "Detects reentrancy vulnerabilities in Solidity, Vyper, and Cairo smart contracts using static analysis patterns"

Value-Add: Behavioral Guidance Over Reference Dumps

Skills should provide guidance Claude doesn’t already have, not duplicate reference material.

Teach How to Look Things Up

Don’t paste entire specs. Teach when and how to look things up.
Instead of:
## DWARF Spec

[10,000 lines of DWARF specification]
Do this:
## Working with DWARF

Use these tools to look up what you need:

- `dwarfdump -debug-info` - Inspect debug info entries
- `readelf -wi` - Quick overview of debug sections
- `pyelftools` - Programmatic access for complex analysis

**When to use each:**
- Use `dwarfdump` for human-readable output
- Use `readelf` for quick checks
- Use `pyelftools` when you need to parse or transform data

Explain WHY, Not Just WHAT

Include trade-offs, decision criteria, and judgment calls:
## Why Constant-Time Matters

**The problem:** Timing variations leak secret information through:
- Cache access patterns (fast vs slow memory)
- Branch prediction (taken vs not-taken)
- Division/modulo (variable-time on most CPUs)

**Why not just optimize later:** Compiler optimizations can introduce timing leaks
even in code that was originally constant-time. You must verify assembly output.

**Trade-off:** Constant-time code is often slower. This is acceptable for crypto
where correctness matters more than performance.

Document Anti-Patterns WITH Explanations

Say why something is wrong, not just that it’s wrong:
## Anti-Patterns

### Early Return on Secret Comparison

**Bad:**
\`\`\`c
if (secret[i] != input[i]) return 0;  // Early exit
\`\`\`

**Why it's wrong:** Exit timing reveals position of first mismatch, leaking partial
secret information. An attacker can measure timing to guess bytes one at a time.

**Good:**
\`\`\`c
diff |= secret[i] ^ input[i];  // Accumulate differences
\`\`\`

**Why it's right:** All bytes are always compared. Timing is independent of where
mismatches occur.
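The accumulate-differences pattern carries over directly to higher-level languages. A minimal Python sketch for illustration (in production code, prefer the standard library's vetted `hmac.compare_digest` instead of rolling your own):

```python
import hmac

def constant_time_eq(secret: bytes, candidate: bytes) -> bool:
    """Compare two byte strings without early exit on the first mismatch."""
    if len(secret) != len(candidate):
        return False  # Length is usually public; leaking it is acceptable
    diff = 0
    for s, c in zip(secret, candidate):
        diff |= s ^ c  # Accumulate differences; no data-dependent branch
    return diff == 0

# The standard library provides a vetted equivalent:
assert constant_time_eq(b"tok_123", b"tok_123") == hmac.compare_digest(b"tok_123", b"tok_123")
```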

Scope Boundaries

Prescriptiveness should match task risk:

Strict for Fragile Tasks

Security audits, crypto implementations, and compliance checks need rigid step-by-step enforcement:
## Mandatory Steps

You MUST complete all steps in order. Do not skip any step.

### 1) Initial Scan
Run static analysis: `semgrep --config crypto.yaml`

### 2) Manual Review
Review all findings. For each:
- [ ] Verify it's a real issue (not false positive)
- [ ] Assess severity (critical/high/medium/low)
- [ ] Document exploit scenario

### 3) Verification
For each critical finding:
- [ ] Write proof-of-concept exploit
- [ ] Verify in test environment
- [ ] Document mitigation

Flexible for Variable Tasks

Code exploration, documentation, and refactoring can offer options:
## Exploration Options

Choose the approach that fits your needs:

**Option A: Quick Overview**
- Scan file structure with `tree`
- Read entry points and main modules
- Review README and documentation

**Option B: Deep Dive**
- Map all dependencies
- Trace execution paths
- Analyze data flows
- Document architecture

**Option C: Targeted Search**
- Define specific questions
- Search for relevant code
- Focus on specific subsystems

Required Sections

Every SKILL.md must include:

When to Use

Specific scenarios where this skill applies:
## When to Use

**Concrete triggers:**
- User implements signature, encryption, or key derivation
- Code contains `/` or `%` operators on secret-derived values
- User mentions "constant-time", "timing attack", "side-channel"
- Reviewing functions named `sign`, `verify`, `encrypt`, `decrypt`

**Flowchart:**
\`\`\`
User writing crypto code? ──yes──> Use this skill
        │
        no
        │
        ▼
User asking about timing attacks? ──yes──> Use this skill
        │
        no
        │
        ▼
Skip this skill
\`\`\`
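The "division on secrets" trigger above can be approximated mechanically. A rough sketch — the `SECRET_NAMES` list and matching logic are illustrative assumptions, not part of any real skill:

```python
import re

# Hypothetical identifier fragments that suggest secret-derived values
SECRET_NAMES = ("secret", "key", "nonce", "priv")

def flag_division_on_secrets(source: str) -> list[int]:
    """Return 1-based line numbers where / or % touches a secret-looking name."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not re.search(r"[/%]", line):
            continue  # No division or modulo on this line
        if any(name in line.lower() for name in SECRET_NAMES):
            hits.append(lineno)
    return hits

code = "x = secret_scalar % q\ny = public_len / 2\n"
print(flag_division_on_secrets(code))  # → [1]
```

A textual scan like this over-triggers on comments and strings; it is only a cheap first pass before manual review.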

When NOT to Use

Scenarios where another approach is better:
## When NOT to Use

- Non-cryptographic code (business logic, UI, etc.)
- Public data processing where timing leaks don't matter
- Code that doesn't handle secrets, keys, or authentication tokens
- High-level API usage where timing is handled by the library

Security Skills: Rationalizations to Reject

For audit/security skills, include common shortcuts to reject:
## Rationalizations to Reject

These are common shortcuts that lead to missed findings:

- **"This code is internal-only"**
  - Reject: Internal attackers exist; defense in depth matters
  
- **"Performance matters more"**
  - Reject: Security is non-negotiable for cryptographic code
  
- **"The compiler will optimize it"**
  - Reject: Never rely on compiler optimizations for security properties
  
- **"It's too rare to matter"**
  - Reject: Cryptographic vulnerabilities are often systematic, not edge cases

Content Organization

Progressive Disclosure Pattern

Start simple, then provide depth:
## Quick Start

[2-3 paragraphs of core guidance]

For common scenarios, use this command:
\`\`\`bash
uv run {baseDir}/scripts/analyze.py --quick file.c
\`\`\`

## Deep Analysis

See [ADVANCED.md](references/ADVANCED.md) for:
- Cross-architecture testing
- Custom rule development
- Integration with CI/CD
- False positive reduction

## Language-Specific Guides

- [C/C++](references/compiled.md)
- [Python](references/python.md)
- [JavaScript](references/javascript.md)

Keep SKILL.md Under 500 Lines

Split into supporting files when needed:
skills/
  my-skill/
    SKILL.md              # < 500 lines: overview + quick start
    references/
      ADVANCED.md         # Detailed usage
      API.md              # Complete API reference
      TROUBLESHOOTING.md  # Common issues
    workflows/
      audit-workflow.md   # Step-by-step audit process
      review-workflow.md  # Code review process
    scripts/
      analyze.py          # Analysis script
      report.py           # Report generator

One Level Deep

SKILL.md can link to files, but those files shouldn’t chain to more files.
Good: SKILL.md → advanced.md (one level)
Bad: SKILL.md → advanced.md → expert.md (chained)
Directory depth is fine (references/guides/topic.md). The restriction is on reference chains, not nested folders.
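The one-level rule is easy to check mechanically. A sketch, assuming standard markdown links to `.md` files (the regex and layout assumptions are illustrative):

```python
import re
from pathlib import Path

# Matches markdown links to .md files, capturing the relative target path
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#]+\.md)\)")

def chained_references(skill_md: Path) -> list[str]:
    """Return .md files linked from SKILL.md that themselves link onward."""
    chained = []
    for target in LINK_RE.findall(skill_md.read_text()):
        ref = skill_md.parent / target
        if ref.exists() and LINK_RE.search(ref.read_text()):
            chained.append(target)  # SKILL.md -> ref -> another file: too deep
    return chained
```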

Hooks Best Practices

PreToolUse hooks run on every Bash command—performance is critical.

Prefer Shell + jq Over Python

#!/bin/bash
# Fast: shell + jq
command=$(jq -r '.tool_input.command')  # hook input arrives as JSON on stdin
if [[ ! "$command" =~ ^python[0-9.]* ]]; then
    exit 0  # Fast path: most commands exit immediately
fi
# Only analyze python commands
Why: Interpreter startup (Python + tree-sitter) adds noticeable latency. Shell + jq is instant.

Fast-Fail Early

Exit immediately for non-matching commands:
#!/bin/bash
command=$(jq -r '.tool_input.command')  # hook input arrives as JSON on stdin

# Fast-fail: most invocations exit here (instant)
if [[ ! "$command" =~ git|rm|mv ]]; then
    exit 0
fi

# Only proceed for dangerous commands

Favor Regex Over AST Parsing

Accept rare false positives if performance gain is significant:
# Fast but has false positives
if [[ "$command" =~ python.*\.py ]]; then
    # Analyze
fi

# vs.

# Slow but precise
python -c "import tree_sitter; ..." # Parse AST
Trade-off: Regex might trigger on `grep python.py`, but Claude can rephrase. The speed gain is worth it.

Anticipate False Positive Patterns

Don’t trigger on diagnostic commands:
# Skip diagnostic commands
if [[ "$command" =~ ^(which|type|command\ -v) ]]; then
    exit 0
fi

# Skip search tools
if [[ "$command" =~ ^(grep|rg|ag|ack) ]]; then
    exit 0
fi

# Skip file operations with dangerous terms in filenames
if [[ "$command" =~ (cat|less|head|tail).*python ]]; then
    exit 0
fi

Document Trade-offs in PRs

Explain deliberate design choices in PR descriptions:
## Performance Considerations

This hook uses regex instead of AST parsing to minimize latency on every Bash command.

**Trade-off:** May have false positives on edge cases like `grep "def " python.txt`,
but Claude can rephrase. The ~100ms latency reduction is worth it.

**Tested:** No false positives in normal development workflows.

Examples and Concreteness

Provide Concrete Input/Output

## Example: Detecting Reentrancy

Input:
\`\`\`solidity
function withdraw(uint amount) public {
    require(balances[msg.sender] >= amount);
    msg.sender.call{value: amount}("");  // External call before state update
    balances[msg.sender] -= amount;      // State updated after
}
\`\`\`

Output:
\`\`\`
Line 3: REENTRANCY VULNERABILITY
  Pattern: External call before state update
  Impact: Attacker can recursively call withdraw()
  Fix: Update balances[msg.sender] before external call
\`\`\`
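For illustration only, the call-before-state-update ordering can be sketched as a naive textual heuristic. Real detection needs a proper analyzer (e.g. Slither); the regexes here are assumptions tailored to this one example:

```python
import re

def naive_reentrancy_check(function_body: str) -> bool:
    """Flag an external .call before any write to balances (textual heuristic only)."""
    call_at = state_at = None
    for lineno, line in enumerate(function_body.splitlines(), start=1):
        if call_at is None and re.search(r"\.call\{?", line):
            call_at = lineno  # First external call
        if state_at is None and re.search(r"balances\[[^\]]+\]\s*[-+]?=", line):
            state_at = lineno  # First balance update
    return call_at is not None and (state_at is None or call_at < state_at)
```

Against the `withdraw` example above, the call on the line before the balance update makes this return `True`; reordering the two lines makes it return `False`.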

Show Decision Criteria

Help Claude make judgment calls:
## Choosing Analysis Depth

**Quick scan** - Use when:
- Initial code review
- Unfamiliar codebase exploration
- Time constraints

**Deep analysis** - Use when:
- Security audit
- Pre-deployment verification
- Critical infrastructure

**Trade-off:** Deep analysis takes 10x longer but finds 3x more issues.

Quality Checklist

Before submitting, verify:

Description:
  • Third-person voice
  • Includes trigger keywords
  • Specific about capabilities
Content:
  • Explains WHY, not just WHAT
  • Includes trade-offs and decision criteria
  • Documents anti-patterns with explanations
  • Concrete examples with input/output
Structure:
  • “When to use” section present
  • “When NOT to use” section present
  • Security skills have “Rationalizations to reject”
  • SKILL.md under 500 lines
  • References are one level deep
Technical:
  • No hardcoded paths (use {baseDir})
  • Python scripts use PEP 723 metadata
  • Hooks are performance-optimized
  • All referenced files exist
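The mechanical items on this checklist can be scripted. A minimal lint sketch — the path heuristics and section names mirror the checklist but are otherwise assumptions:

```python
from pathlib import Path

def lint_skill(skill_md: Path) -> list[str]:
    """Return human-readable violations for the mechanical checklist items."""
    problems = []
    text = skill_md.read_text()
    if len(text.splitlines()) >= 500:
        problems.append("SKILL.md is 500+ lines; split into references/")
    if "/home/" in text or "/Users/" in text:
        problems.append("hardcoded path found; use {baseDir} instead")
    for heading in ("## When to Use", "## When NOT to Use"):
        if heading not in text:
            problems.append("missing required section: " + heading)
    return problems
```

Judgment items (third-person voice, quality of trade-off discussion) still need human review; the script only narrows the checklist.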

Next Steps

Skill Authoring

Detailed skill authoring guide

Plugin Structure

Required directory structure

Examples

Real-world skill examples

Getting Started

Start contributing
