Overview
Playbooks are workflow templates that define step-by-step procedures for custom stewards. They’re stored as Workflow elements in Stoneforge and can include bash commands, CLI workflows, decision trees, and manual procedures.
What Are Playbooks?
Playbooks are markdown documents that contain:
Objective : What the workflow achieves
Steps : Ordered procedures to execute
Decision points : Branching logic based on conditions
Commands : Bash scripts, CLI commands, or API calls
Validation : How to verify success
Automated Workflows Stewards execute playbook steps automatically (cleanup, monitoring, etc.)
Manual Procedures Stewards follow playbook steps with human approval for critical operations
Creating a Playbook
Basic Structure
Playbooks use markdown with a specific structure:
Stale Branch Cleanup Playbook
## Stale Branch Cleanup
### Objective
Remove branches for closed tasks older than 30 days to reduce clutter and improve repository performance.
### Prerequisites
- Git repository with remote access
- Stoneforge CLI installed
- Write permissions on the repository
### Steps
#### 1. Find Stale Branches
List all branches in the `agent/` namespace older than 30 days:
```bash
git for-each-ref --format= '%(refname:short) %(authordate:iso8601)' refs/heads/agent/ | \
awk -v cutoff="$( date -d '30 days ago' +%Y-%m-%d)" '$2 < cutoff {print $1}'
Store results in $STALE_BRANCHES.
2. Verify Task Status
For each branch, extract the task ID and check if it’s closed:
for branch in $STALE_BRANCHES ; do
# Extract task ID from branch name: agent/worker-1/task-abc123-feature
task_id = $( echo $branch | grep -oP 'task-[a-z0-9]+' )
# Check task status
status = $( sf show $task_id --json | jq -r '.status' )
if [ " $status " = "closed" ]; then
echo " $branch is safe to delete (task $task_id is closed)"
else
echo "SKIP $branch (task $task_id status: $status )"
fi
done
3. Delete Stale Branches
Only delete branches where the task is closed:
for branch in $SAFE_TO_DELETE ; do
# Delete local branch
git branch -D $branch
# Delete remote branch
git push origin --delete $branch
echo "Deleted: $branch "
done
4. Report Results
Create a summary document:
sf document create --title "Branch Cleanup $( date +%Y-%m-%d)" \
--category "changelog" \
--content "Deleted $COUNT stale branches: $DELETED_BRANCHES "
Send notification to operations channel:
sf message send --from < Steward I D > --channel < ops-channe l > \
--content "Completed branch cleanup: $COUNT branches deleted. See doc: $DOC_ID "
Validation
Verify the cleanup was successful:
# Count remaining stale branches (should be 0)
stale_count = $( git for-each-ref --format= '%(refname:short) %(authordate:iso8601)' refs/heads/agent/ | \
awk -v cutoff="$( date -d '30 days ago' +%Y-%m-%d)" '$2 < cutoff {print $1}' | wc -l )
if [ $stale_count -eq 0 ]; then
echo "SUCCESS: All stale branches removed"
else
echo "WARNING: $stale_count stale branches remain (possibly with open tasks)"
fi
### Creating the Workflow Element
```typescript
import { createPlaybook } from '@stoneforge/core';
import * as fs from 'fs';
const playbookContent = fs.readFileSync('./playbooks/cleanup-stale-branches.md', 'utf8');
const playbook = await createPlaybook({
title: 'Stale Branch Cleanup',
content: playbookContent,
createdBy: operatorId,
tags: ['maintenance', 'cleanup', 'git'],
});
const savedPlaybook = await api.create(playbook);
console.log('Playbook created:', savedPlaybook.id);
Registering a Custom Steward
Once the playbook is created, register a steward that references it:
import { createOrchestratorAPI } from '@stoneforge/smithy' ;
const api = createOrchestratorAPI ( storage );
const steward = await api . registerSteward ({
name: 'cleanup-steward' ,
stewardFocus: 'custom' ,
playbookId: savedPlaybook . id , // Reference the playbook
triggers: [
{ type: 'cron' , schedule: '0 2 * * 0' }, // Weekly on Sunday at 2am
],
createdBy: directorId ,
maxConcurrentTasks: 1 ,
});
Use playbookId (preferred) to reference a Workflow element. The legacy playbook string field is deprecated.
Trigger Types
Event Triggers
Execute playbook when specific events occur:
const steward = await api . registerSteward ({
name: 'post-merge-steward' ,
stewardFocus: 'custom' ,
playbookId: postMergePlaybookId ,
triggers: [
{ type: 'event' , event: 'task_completed' },
{ type: 'event' , event: 'merge_successful' },
],
createdBy: directorId ,
});
Available events:
task_completed - After a worker completes a task
merge_successful - After a successful merge to main
test_failure - When tests fail during merge review
conflict_detected - When merge conflicts are detected
Cron Triggers
Schedule playbook execution using cron syntax:
const steward = await api . registerSteward ({
name: 'nightly-maintenance' ,
stewardFocus: 'custom' ,
playbookId: maintenancePlaybookId ,
triggers: [
{ type: 'cron' , schedule: '0 2 * * *' }, // Daily at 2am
],
createdBy: directorId ,
});
Cron schedule format:
┌───────────── minute (0 - 59)
│ ┌─────────── hour (0 - 23)
│ │ ┌───────── day of month (1 - 31)
│ │ │ ┌─────── month (1 - 12)
│ │ │ │ ┌───── day of week (0 - 7, 0 and 7 = Sunday)
│ │ │ │ │
* * * * *
Examples:
0 2 * * * - Daily at 2:00 AM
0 */6 * * * - Every 6 hours
0 2 * * 0 - Weekly on Sunday at 2:00 AM
0 0 1 * * - Monthly on the 1st at midnight
Playbook Examples
Dependency Update Checker
Check for Outdated Dependencies
## Dependency Update Checker
### Objective
Detect outdated npm dependencies and create tasks to update them.
### Steps
#### 1. Check for Updates
```bash
npm outdated --json > /tmp/outdated.json
2. Filter Critical Updates
Extract packages with major version updates:
jq -r 'to_entries | .[] | select(.value.wanted != .value.latest) | "\(.key): \(.value.current) -> \(.value.latest)"' /tmp/outdated.json
3. Create Update Tasks
For each outdated package, create a task:
for update in $UPDATES ; do
package = $( echo $update | cut -d: -f1 )
versions = $( echo $update | cut -d: -f2 )
sf task create --title "Update $package " \
--priority 2 \
--type maintenance \
--tags dependency-update \
--plan "Dependency Updates"
done
4. Notify Team
sf message send --from < Steward I D > --channel < dev-channe l > \
--content "Found $COUNT outdated dependencies. Created update tasks in plan 'Dependency Updates'."
### Documentation Validation
```markdown title="Validate Documentation Links"
## Documentation Link Validator
### Objective
Find and report broken links in markdown documentation.
### Steps
#### 1. Extract All Links
```bash
find docs -name '*.md' -exec grep -oP '\[.*?\]\(\K[^)]+' {} \; | sort -u > /tmp/links.txt
2. Check External Links
Verify HTTP/HTTPS links return 200:
while read link ; do
if [[ $link =~ ^https ? :// ]]; then
status = $( curl -s -o /dev/null -w '%{http_code}' $link )
if [ $status -ne 200 ]; then
echo "BROKEN: $link (HTTP $status )"
fi
fi
done < /tmp/links.txt
3. Check Internal Links
Verify relative file paths exist:
while read link ; do
if [[ ! $link =~ ^https ? :// ]]; then
if [ ! -f "docs/ $link " ]; then
echo "BROKEN: $link (file not found)"
fi
fi
done < /tmp/links.txt
4. Report Broken Links
If any broken links found, create a task:
if [ -s /tmp/broken-links.txt ]; then
sf task create --title "Fix broken documentation links" \
--priority 2 \
--type bug \
--tags documentation \
--description "$( cat /tmp/broken-links.txt)"
fi
### Code Quality Scan
```markdown title="Run Code Quality Checks"
## Code Quality Scanner
### Objective
Run linting and type checking, create fix tasks for issues.
### Steps
#### 1. Run ESLint
```bash
npx eslint . --format json > /tmp/eslint-report.json
2. Count Errors
error_count = $( jq '[.[].errorCount] | add' /tmp/eslint-report.json )
warning_count = $( jq '[.[].warningCount] | add' /tmp/eslint-report.json )
3. Create Tasks for Errors
If errors > 0, create a fix task:
if [ $error_count -gt 0 ]; then
sf task create --title "Fix $error_count ESLint errors" \
--priority 3 \
--type bug \
--tags linting,quality \
--description "Run \` npm run lint:fix \` and fix remaining errors."
fi
4. Run Type Checking
npx tsc --noEmit > /tmp/tsc-errors.txt 2>&1
5. Report Results
sf message send --from < Steward I D > --channel < dev-channe l > \
--content "Code quality scan: $error_count errors, $warning_count warnings. Tasks created for errors."
## Playbook Execution
Stewards execute playbooks when triggered:
### Execution Flow
┌─────────────────────────────────────────────┐
│ Trigger Fired │
│ (event: task_completed OR cron) │
└───────────────┬─────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Dispatch Daemon assigns │
│ task to available steward │
└───────────────┬─────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Steward spawns with │
│ playbook loaded │
└───────────────┬─────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Steward executes steps │
│ (bash commands, CLI workflows) │
└───────────────┬─────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Steward reports results │
│ (creates docs, sends messages) │
└───────────────┬─────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Session ends │
└─────────────────────────────────────────────┘
### Plugin Executor
The `PluginExecutor` service runs playbook commands:
```typescript
import { createPluginExecutor } from '@stoneforge/smithy';
const executor = createPluginExecutor();
// Execute a bash script from playbook
const result = await executor.executeScript(
'/tmp/cleanup-script.sh',
{
cwd: workspaceRoot,
timeout: 60_000, // 1 minute
}
);
if (result.exitCode === 0) {
console.log('Script succeeded:', result.stdout);
} else {
console.error('Script failed:', result.stderr);
}
Best Practices
Make Steps Idempotent Playbook steps should be safe to run multiple times. Check state before modifying (e.g., check if branch exists before deleting).
Include Validation Every playbook should verify success at the end. Report what changed and what the final state is.
Log All Actions Create summary documents or send messages with what was done. Auditable workflows prevent confusion.
Handle Errors Gracefully Use bash error handling (set -e, || true) and report failures clearly.
Idempotent Scripts
# BAD: Fails if branch already deleted
git branch -D stale-branch
# GOOD: Check first
if git show-ref --verify --quiet refs/heads/stale-branch ; then
git branch -D stale-branch
echo "Deleted stale-branch"
else
echo "stale-branch already deleted (or never existed)"
fi
Error Handling
# Exit on first error
set -e
# Or handle errors explicitly
if ! npm test ; then
echo "Tests failed, creating fix task"
sf task create --title "Fix failing tests" --priority 1 --type bug
exit 1
fi
Steward Scheduler
The StewardScheduler service manages cron-triggered stewards:
import { createStewardScheduler } from '@stoneforge/smithy' ;
const scheduler = createStewardScheduler ( api , dispatchService , agentRegistry );
// Start the scheduler
await scheduler . start ();
// Scheduler polls every minute for stewards with cron triggers
// and dispatches them when the schedule matches
The scheduler runs automatically in the Smithy server. You typically don’t need to start it manually.
Troubleshooting
Symptom : Steward spawns but doesn’t run playbook stepsPossible causes :
playbookId points to non-existent Workflow
Playbook markdown syntax errors
Bash commands have incorrect syntax
Solution : Verify playbook exists and test bash commands manually:sf show < playbook-i d >
# Extract and test bash snippets
bash -c 'your command here'
Symptom : Cron trigger never firesPossible causes :
Scheduler not running
Cron syntax incorrect
Steward has maxConcurrentTasks: 0
Solution :// Verify steward configuration
const steward = await api . getAgent ( stewardId );
console . log ( 'Triggers:' , steward . metadata ?. triggers );
console . log ( 'Max tasks:' , steward . metadata ?. maxConcurrentTasks );
// Test cron expression
import { cronParser } from 'cron-parser' ;
const interval = cronParser . parseExpression ( '0 2 * * *' );
console . log ( 'Next run:' , interval . next (). toString ());
Symptom : Playbook execution times outCause : Script takes longer than timeout (default: 2 minutes)Solution : Increase timeout or optimize script:const result = await executor . executeScript (
'/tmp/long-script.sh' ,
{
timeout: 10 * 60_000 , // 10 minutes
}
);
Custom Agents Create specialized steward roles
Steward Types Overview of built-in steward focuses
Agent Communication How stewards send reports and notifications