Token Optimization

PromptSmith offers multiple strategies to optimize token usage and reduce AI API costs, including TOON format - a token-optimized alternative to markdown that can reduce prompt size by 30-60%.

Understanding Token Costs

Large system prompts directly impact your costs:

Input tokens: Charged every time you send a request
Larger prompts: Mean more input tokens per request
High volume: Costs multiply across thousands/millions of requests

A 2000-token prompt uses twice the prompt input tokens of a 1000-token prompt on every request; at 10,000 requests/day the difference adds up fast, so optimization pays for itself quickly.
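
To make the comparison concrete, the sketch below computes the daily input-token volume for the two prompt sizes mentioned above. The helper name dailyInputTokens is illustrative and not part of PromptSmith.

```typescript
// Illustrative helper (not a PromptSmith API): input tokens sent per day
// that are attributable to the system prompt alone.
function dailyInputTokens(promptTokens: number, requestsPerDay: number): number {
  return promptTokens * requestsPerDay;
}

const largePrompt = dailyInputTokens(2000, 10_000); // 20,000,000 tokens/day
const smallPrompt = dailyInputTokens(1000, 10_000); // 10,000,000 tokens/day

console.log(`Extra tokens per day: ${largePrompt - smallPrompt}`);
```

Multiplying the difference by your provider's input-token price gives the daily cost of the larger prompt.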

TOON Format

TOON (Token-Oriented Object Notation) is an optimized format that reduces tokens while maintaining model comprehension.

Basic Usage

import { createPromptBuilder } from 'promptsmith-ts/builder';

const builder = createPromptBuilder()
  .withIdentity('You are a helpful assistant')
  .withCapabilities([
    'Answer questions',
    'Provide information'
  ])
  .withFormat('toon'); // ✅ Enable TOON format

const prompt = builder.build();

Format Comparison

Markdown output for a customer service prompt with one tool:

# Identity
You are a customer service assistant

# Capabilities
1. Process returns and exchanges
2. Track order status
3. Answer product questions

# Available Tools

## track_order
Look up order status by order number

**Parameters:**
- `order_number` (string, required): Order number to track

Savings Analysis

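import { z } from 'zod'; // assumed: the z schema helpers come from Zod
import { createPromptBuilder } from 'promptsmith-ts/builder';

const builder = createPromptBuilder()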
  .withIdentity('You are a data analyst')
  .withCapabilities(['Analyze data', 'Generate reports'])
  .withTool({
    name: 'query_db',
    description: 'Query database',
    schema: z.object({
      query: z.string().describe('SQL query')
    })
  });

// Compare formats
const markdown = builder.build('markdown');
const toon = builder.build('toon');
const compact = builder.build('compact');

console.log('Markdown:', markdown.length, 'chars');
console.log('Compact:', compact.length, 'chars');
console.log('TOON:', toon.length, 'chars');

// Example output:
// Markdown: 850 chars (~213 tokens)
// Compact: 720 chars (~180 tokens) - 15% reduction
// TOON: 510 chars (~128 tokens) - 40% reduction
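
The percentages in the example output follow from simple arithmetic on the character counts. A minimal sketch, assuming the common rule of thumb of roughly 4 characters per token (an approximation, not a tokenizer); estimateTokens and percentReduction are illustrative names, not PromptSmith functions.

```typescript
// Rough token estimate: ~4 characters per token (heuristic, not a real tokenizer).
function estimateTokens(chars: number): number {
  return Math.ceil(chars / 4);
}

// Percentage saved relative to a baseline size, rounded to the nearest integer.
function percentReduction(baselineChars: number, optimizedChars: number): number {
  return Math.round((1 - optimizedChars / baselineChars) * 100);
}

// Character counts taken from the example output above.
const sizes = { markdown: 850, compact: 720, toon: 510 };

console.log(estimateTokens(sizes.markdown)); // 213
console.log(percentReduction(sizes.markdown, sizes.compact)); // 15
console.log(percentReduction(sizes.markdown, sizes.toon)); // 40
```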

When to Use Each Format

Development: Markdown

Use during development for readability:

const devBuilder = createPromptBuilder()
  .withFormat('markdown') // Default, most readable
  .withIdentity('You are a helpful assistant');

// Easy to read, review, and debug

Pros:

Human-readable
Easy to review in diffs
Clear structure
Familiar syntax

Cons:

Highest token usage
Expensive at scale

Staging: Compact

Use in QA/staging for moderate optimization:

const stagingBuilder = createPromptBuilder()
  .withFormat('compact')
  .withIdentity('You are a helpful assistant');

// 10-20% token reduction, still readable

Pros:

Moderate token savings (10-20%)
Still uses markdown semantics
Reasonably readable

Cons:

Less readable than full markdown
Not maximum optimization

Production: TOON

Use in production for maximum savings:

const prodBuilder = createPromptBuilder()
  .withFormat('toon')
  .withIdentity('You are a helpful assistant');

// 30-60% token reduction

Pros:

Maximum token savings (30-60%)
Significant cost reduction
Model comprehension maintained

Cons:

Less human-readable
Harder to debug directly
Different from markdown

Temporary Format Override

Override format per build without changing default:

const builder = createPromptBuilder()
  .withIdentity('You are a helpful assistant')
  .withFormat('markdown'); // Default format

// Use different formats temporarily
const markdownPrompt = builder.build(); // Uses default (markdown)
const toonPrompt = builder.build('toon'); // Override to TOON
const compactPrompt = builder.build('compact'); // Override to compact

Cost Impact Example

Scenario: High-Volume Application

// Assumptions
const requestsPerDay = 100000;
const daysPerMonth = 30;
const inputTokenCost = 0.01; // USD per 1000 input tokens (example rate; check your provider's current pricing)

// Markdown format
const markdownTokens = 2000;
const markdownCost =
  (markdownTokens / 1000) * inputTokenCost * requestsPerDay * daysPerMonth;

console.log('Markdown monthly cost:', markdownCost); // ~$60,000

// TOON format (40% reduction)
const toonTokens = 1200;
const toonCost =
  (toonTokens / 1000) * inputTokenCost * requestsPerDay * daysPerMonth;

console.log('TOON monthly cost:', toonCost); // ~$36,000
console.log('Monthly savings:', markdownCost - toonCost); // ~$24,000
console.log('Annual savings:', (markdownCost - toonCost) * 12); // ~$288,000

At this volume, TOON format saves roughly $288,000 per year while maintaining model performance; savings scale linearly with request volume.
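
The same calculation can be wrapped in a function to see how savings scale with traffic. This is a sketch under stated assumptions: the price per 1,000 tokens is an example rate, the 40% reduction is taken from the scenario above, and monthlySavings is a hypothetical helper rather than a PromptSmith API.

```typescript
interface CostInputs {
  promptTokens: number;     // tokens in the baseline (markdown) prompt
  reduction: number;        // fraction saved by the optimized format, e.g. 0.4
  requestsPerDay: number;
  pricePer1kTokens: number; // example rate in USD; check your provider's pricing
  daysPerMonth?: number;    // defaults to 30
}

// Monthly USD saved on prompt input tokens by switching formats.
function monthlySavings(c: CostInputs): number {
  const days = c.daysPerMonth ?? 30;
  const tokensSavedPerRequest = c.promptTokens * c.reduction;
  return (tokensSavedPerRequest / 1000) * c.pricePer1kTokens * c.requestsPerDay * days;
}

for (const requestsPerDay of [1_000, 10_000, 100_000]) {
  const saved = monthlySavings({
    promptTokens: 2000,
    reduction: 0.4,
    requestsPerDay,
    pricePer1kTokens: 0.01,
  });
  console.log(`${requestsPerDay} req/day -> ~$${saved.toFixed(2)} saved per month`);
}
```

Because the formula is linear in every input, halving the request volume or the prompt size halves the savings.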

Size Debugging

Use .debug() to analyze prompt size and see savings:

const builder = createPromptBuilder()
  .withIdentity('You are a data analyst')
  .withCapabilities(['Analyze data', 'Generate reports'])
  .withTool({
    name: 'query_db',
    description: 'Query database',
    schema: z.object({ query: z.string() })
  })
  .withFormat('markdown')
  .debug();

// Output:
// PromptSmith Builder Debug
//
// Format: markdown | Identity: ✓ | Capabilities: 2 | Tools: 1
// Constraints: 0 | Examples: 0 | Guardrails: ✗
//
// Preview: # Identity You are a data analyst # Capabilities 1. Analyze...
// Size: 850 chars (~213 tokens)
// TOON format: 510 chars (~128 tokens) - saves 40%

Optimization Strategies

1. Remove Redundancy

Eliminate duplicate or unnecessary information:

// ❌ Redundant: every capability restates the identity
const redundantBuilder = createPromptBuilder()
  .withIdentity('You are a helpful customer service assistant')
  .withCapability('Be helpful to customers')
  .withCapability('Provide customer assistance')
  .withCapability('Help customers with questions');

// ✅ Concise: each capability adds new information
const conciseBuilder = createPromptBuilder()
  .withIdentity('You are a customer service assistant')
  .withCapabilities([
    'Answer product questions',
    'Process returns',
    'Track orders'
  ]);

2. Consolidate Examples

Use fewer, high-quality examples:

// ❌ Too many examples
.withExamples([
  { user: 'Hi', assistant: 'Hello!' },
  { user: 'Hey', assistant: 'Hi there!' },
  { user: 'Hello', assistant: 'Hello! How can I help?' },
  // 10 more greeting examples...
])

// ✅ One comprehensive example
.withExamples([
  {
    user: 'Hello',
    assistant: 'Hi! How can I help you today?',
    explanation: 'Greet users warmly and offer assistance'
  }
])

3. Concise Descriptions

Be specific but brief:

// ❌ Verbose
.withTool({
  name: 'search',
  description: 'This tool allows you to search through our entire product catalog by providing keywords, product names, categories, or SKU numbers. You should use this tool whenever a customer is looking for a product or wants to know if we have something in stock.',
  schema: z.object({
    query: z.string().describe('The search keywords that the user wants to look for')
  })
})

// ✅ Concise
.withTool({
  name: 'search',
  description: 'Search product catalog by keyword, name, category, or SKU. Use when customer asks about product availability.',
  schema: z.object({
    query: z.string().describe('Search keywords')
  })
})
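
One way to keep descriptions in check is a small lint step that flags tool descriptions over a length budget. A minimal sketch, assuming a plain object shape with toolName and description fields and an arbitrary 150-character threshold; neither is prescribed by PromptSmith, and findVerboseTools is a hypothetical helper.

```typescript
interface ToolSummary {
  toolName: string;
  description: string;
}

// Return the names of tools whose description exceeds maxChars.
function findVerboseTools(tools: ToolSummary[], maxChars = 150): string[] {
  return tools
    .filter((tool) => tool.description.length > maxChars)
    .map((tool) => tool.toolName);
}

// The two descriptions from the example above, under hypothetical tool names.
const flagged = findVerboseTools([
  {
    toolName: 'search_concise',
    description:
      'Search product catalog by keyword, name, category, or SKU. Use when customer asks about product availability.',
  },
  {
    toolName: 'search_verbose',
    description:
      'This tool allows you to search through our entire product catalog by providing keywords, product names, categories, or SKU numbers. You should use this tool whenever a customer is looking for a product or wants to know if we have something in stock.',
  },
]);

console.log(flagged); // only the verbose tool is flagged
```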

4. Use Template Merging

Avoid repeating common patterns:

// ❌ Repeated security config across agents
const agent1 = createPromptBuilder()
  .withGuardrails()
  .withConstraint('must', '...')
  .withConstraint('must_not', '...');

const agent2 = createPromptBuilder()
  .withGuardrails()
  .withConstraint('must', '...')
  .withConstraint('must_not', '...');

// ✅ Reusable template
import { security } from 'promptsmith-ts/templates';

const agent1 = createPromptBuilder().merge(security());
const agent2 = createPromptBuilder().merge(security());

5. Conditional Content

Only include what’s needed:

function createAgent(features: string[]) {
  const builder = createPromptBuilder()
    .withIdentity('You are an assistant');
  
  // Only add tools user has access to
  if (features.includes('database')) {
    builder.withTool(/* database tool */);
  }
  
  if (features.includes('email')) {
    builder.withTool(/* email tool */);
  }
  
  return builder;
}

// Basic user gets smaller prompt
const basicAgent = createAgent(['email']);

// Premium user gets full prompt
const premiumAgent = createAgent(['database', 'email']);
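
When the number of optional features grows, a lookup table can replace the chain of if statements. The sketch below is self-contained and deliberately does not call PromptSmith; ToolSpec, toolRegistry, selectTools, and the send_email tool are hypothetical, and in a real builder each selected spec would be passed to withTool.

```typescript
interface ToolSpec {
  toolName: string;
  description: string;
}

// Hypothetical registry mapping a feature flag to the tool it unlocks.
const toolRegistry = new Map<string, ToolSpec>([
  ['database', { toolName: 'query_db', description: 'Query database' }],
  ['email', { toolName: 'send_email', description: 'Send email' }],
]);

// Return only the tools for features the user has; unknown features are ignored.
function selectTools(features: string[]): ToolSpec[] {
  const selected: ToolSpec[] = [];
  for (const feature of features) {
    const tool = toolRegistry.get(feature);
    if (tool) selected.push(tool);
  }
  return selected;
}

const basicTools = selectTools(['email']);
const premiumTools = selectTools(['database', 'email']);

console.log(basicTools.length, premiumTools.length); // 1 2
```

Adding a feature then means adding one registry entry rather than another branch.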

Environment-Based Optimization

function getFormat(): 'markdown' | 'toon' | 'compact' {
  if (process.env.NODE_ENV === 'development') {
    return 'markdown'; // Readability
  }
  if (process.env.NODE_ENV === 'staging') {
    return 'compact'; // Moderate optimization
  }
  return 'toon'; // Maximum optimization
}

const builder = createPromptBuilder()
  .withIdentity('You are a helpful assistant')
  .withFormat(getFormat());
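
A variant of the same idea takes the environment name as a parameter, which makes the mapping easy to unit test without touching process.env. This is a sketch; formatForEnv and formatByEnv are illustrative names, PromptFormat is declared locally so the block stands alone, and the fallback to toon mirrors the function above.

```typescript
// Local stand-in for the format union used in this guide.
type PromptFormat = 'markdown' | 'toon' | 'compact';

// Environment name -> format. Anything not listed falls back to 'toon',
// mirroring the production default in getFormat above.
const formatByEnv = new Map<string, PromptFormat>([
  ['development', 'markdown'],
  ['staging', 'compact'],
]);

function formatForEnv(env: string | undefined): PromptFormat {
  const mapped = env !== undefined ? formatByEnv.get(env) : undefined;
  return mapped ?? 'toon';
}

console.log(formatForEnv('development')); // markdown
console.log(formatForEnv('staging'));     // compact
console.log(formatForEnv(undefined));     // toon
```

In application code the call would look like formatForEnv(process.env.NODE_ENV).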

Measuring Impact

Track token usage over time:

function buildAndMeasure(builder: SystemPromptBuilder, format: PromptFormat) {
  const prompt = builder.build(format);
  const chars = prompt.length;
  const estimatedTokens = Math.ceil(chars / 4); // Rough estimate
  
  return {
    format,
    chars,
    estimatedTokens,
    prompt
  };
}

const builder = createPromptBuilder()
  .withIdentity('You are a data analyst')
  .withCapabilities(['Query data', 'Generate reports']);

const formats: PromptFormat[] = ['markdown', 'compact', 'toon'];
const measurements = formats.map(format => buildAndMeasure(builder, format));

console.table(measurements);
// ┌─────────┬──────────┬───────┬──────────────────┐
// │ (index) │  format  │ chars │ estimatedTokens  │
// ├─────────┼──────────┼───────┼──────────────────┤
// │    0    │ markdown │  850  │       213        │
// │    1    │ compact  │  720  │       180        │
// │    2    │   toon   │  510  │       128        │
// └─────────┴──────────┴───────┴──────────────────┘
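
To catch regressions, the estimate can be turned into a budget check that runs in CI. A minimal sketch, assuming the same rough 4-characters-per-token heuristic used above (a real tokenizer will give different counts); assertWithinBudget is a hypothetical helper, and the sample string stands in for the result of builder.build().

```typescript
// Throws if the estimated token count of a prompt exceeds the budget;
// otherwise returns the estimate.
function assertWithinBudget(prompt: string, maxTokens: number): number {
  const estimatedTokens = Math.ceil(prompt.length / 4);
  if (estimatedTokens > maxTokens) {
    throw new Error(
      `Prompt is ~${estimatedTokens} tokens, over the budget of ${maxTokens}`
    );
  }
  return estimatedTokens;
}

// Stand-in for a built prompt.
const samplePrompt = 'You are a data analyst. Query data and generate reports.';
const used = assertWithinBudget(samplePrompt, 200);

console.log(`~${used} tokens, within budget`);
```

Failing the build when a prompt grows past its budget makes size increases visible in review instead of on the invoice.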

Prompt Caching

PromptSmith automatically caches built prompts:

const builder = createPromptBuilder()
  .withIdentity('You are a helpful assistant');

// First call: builds and caches
const prompt1 = builder.build();

// Second call: returns cached result (instant)
const prompt2 = builder.build();

// Cache invalidates on changes
builder.withCapability('Answer questions');
const prompt3 = builder.build(); // Rebuilds and caches new version

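Cache statistics (this reads the internal _cache field, which does not appear to be part of the public API and may change between releases):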

const stats = builder['_cache'].getStats();
console.log(stats);
// {
//   isDirty: false,
//   cachedFormats: ['markdown'],
//   cacheSize: 850
// }
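
The behavior described above (cache per format, invalidate on any change) can be pictured as a cache that is cleared on every change. The following is a hypothetical sketch of that pattern, not PromptSmith's actual implementation; the class name FormatCache and its methods are assumptions made for illustration.

```typescript
// Hypothetical per-format cache that is cleared on change (illustration only).
class FormatCache {
  private entries = new Map<string, string>();
  buildCount = 0; // counts real builds, to show when the cache is hit

  // Call whenever the builder's configuration changes.
  invalidate(): void {
    this.entries.clear();
  }

  // Return the cached prompt for a format, building it only on a miss.
  get(format: string, build: () => string): string {
    const cached = this.entries.get(format);
    if (cached !== undefined) return cached;
    const prompt = build();
    this.buildCount += 1;
    this.entries.set(format, prompt);
    return prompt;
  }
}

const promptCache = new FormatCache();
const render = () => '# Identity\nYou are a helpful assistant';

promptCache.get('markdown', render); // miss: builds
promptCache.get('markdown', render); // hit: no rebuild
promptCache.invalidate();            // simulate a builder change
promptCache.get('markdown', render); // miss: rebuilds

console.log(promptCache.buildCount); // 2
```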

Best Practices

Start with Markdown: Develop and test with readable format
Promote to Compact in Staging: Test with moderate optimization
Use TOON in Production: Deploy with maximum optimization
Monitor Token Usage: Track actual token consumption
Profile Before Optimizing: Measure impact of changes
Test Behavior: Ensure TOON format doesn’t affect model performance
Document Format Choice: Explain why each format is used

Common Pitfalls

Over-Optimization: Don’t sacrifice clarity for minimal savings:

// ❌ Too abbreviated
.withIdentity('CS asst')
.withCapability('Proc ret')

// ✅ Clear and concise
.withIdentity('You are a customer service assistant')
.withCapability('Process returns')

Premature Optimization: Optimize after validating prompt behavior:

Build and test with markdown ✅
Validate behavior thoroughly ✅
Switch to TOON for production ✅
Not: Start with TOON and debug ❌

Next Steps

Testing: Validate TOON format effectiveness
Composition: Optimize reusable templates
API Reference: Detailed format documentation
Examples: See optimization examples
