
Token Optimization

PromptSmith offers several strategies for reducing token usage and AI API costs, including the TOON format, a token-optimized alternative to markdown that can reduce prompt size by 30-60%.

Understanding Token Costs

Large system prompts directly impact your costs:
  • Input tokens: Charged every time you send a request
  • Larger prompts: Mean more input tokens per request
  • High volume: Costs multiply across thousands/millions of requests
At the same per-token price, a 2000-token prompt costs twice as much in prompt input tokens as a 1000-token prompt, and at 10,000 requests/day that difference is paid 10,000 times a day. Optimization pays for itself quickly.
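
The arithmetic behind this can be sketched as follows. The helper name monthlyPromptCost and the rate of $0.01 per 1,000 tokens are assumptions chosen for illustration; they are not PromptSmith APIs or quoted provider prices.

```typescript
// Hypothetical helper: monthly input-token cost attributable to the system prompt.
// pricePer1kTokens is an illustrative rate, not a real provider quote.
function monthlyPromptCost(
  promptTokens: number,
  requestsPerDay: number,
  pricePer1kTokens: number,
  daysPerMonth: number = 30,
): number {
  const tokensPerMonth = promptTokens * requestsPerDay * daysPerMonth;
  return (tokensPerMonth / 1000) * pricePer1kTokens;
}

const largePromptCost = monthlyPromptCost(2000, 10000, 0.01);
const smallPromptCost = monthlyPromptCost(1000, 10000, 0.01);

// Prompt cost scales linearly with prompt size, so halving the prompt halves this bill.
console.log('2000-token prompt:', largePromptCost.toFixed(2));
console.log('1000-token prompt:', smallPromptCost.toFixed(2));
console.log('Ratio:', (largePromptCost / smallPromptCost).toFixed(2));
```

Substitute your own request volume and your provider's current input-token price to see what a given reduction is worth.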

TOON Format

TOON (Token-Oriented Object Notation) is a compact format that reduces token count while keeping the prompt understandable to the model.

Basic Usage

import { createPromptBuilder } from 'promptsmith-ts/builder';

const builder = createPromptBuilder()
  .withIdentity('You are a helpful assistant')
  .withCapabilities([
    'Answer questions',
    'Provide information'
  ])
  .withFormat('toon'); // ✅ Enable TOON format

const prompt = builder.build();

Format Comparison

# Identity
You are a customer service assistant

# Capabilities
1. Process returns and exchanges
2. Track order status
3. Answer product questions

# Available Tools

## track_order
Look up order status by order number

**Parameters:**
- `order_number` (string, required): Order number to track

Savings Analysis

import { z } from 'zod';
import { createPromptBuilder } from 'promptsmith-ts/builder';

const builder = createPromptBuilder()
  .withIdentity('You are a data analyst')
  .withCapabilities(['Analyze data', 'Generate reports'])
  .withTool({
    name: 'query_db',
    description: 'Query database',
    schema: z.object({
      query: z.string().describe('SQL query')
    })
  });

// Compare formats
const markdown = builder.build('markdown');
const toon = builder.build('toon');
const compact = builder.build('compact');

console.log('Markdown:', markdown.length, 'chars');
console.log('Compact:', compact.length, 'chars');
console.log('TOON:', toon.length, 'chars');

// Example sizes (token estimates assume ~4 characters per token):
// Markdown: 850 chars (~213 tokens)
// Compact: 720 chars (~180 tokens), 15% reduction
// TOON: 510 chars (~128 tokens), 40% reduction
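
The reduction percentages above follow directly from the character counts. A minimal sketch of that calculation, where percentReduction is a hypothetical helper and not part of promptsmith-ts:

```typescript
// Hypothetical helper: percentage saved relative to a baseline size.
function percentReduction(baselineChars: number, optimizedChars: number): number {
  if (baselineChars <= 0) {
    throw new RangeError('baselineChars must be positive');
  }
  return ((baselineChars - optimizedChars) / baselineChars) * 100;
}

// Character counts taken from the example sizes in this section.
const compactSaving = percentReduction(850, 720);
const toonSaving = percentReduction(850, 510);

console.log('Compact saves', compactSaving.toFixed(1) + '%');
console.log('TOON saves', toonSaving.toFixed(1) + '%');
```

Character counts are only a proxy for tokens; for billing-grade numbers, count tokens with your provider's tokenizer.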

When to Use Each Format

Development: Markdown

Use during development for readability:
const devBuilder = createPromptBuilder()
  .withFormat('markdown') // Default, most readable
  .withIdentity('You are a helpful assistant');

// Easy to read, review, and debug
Pros:
  • Human-readable
  • Easy to review in diffs
  • Clear structure
  • Familiar syntax
Cons:
  • Highest token usage
  • Expensive at scale

Staging: Compact

Use in QA/staging for moderate optimization:
const stagingBuilder = createPromptBuilder()
  .withFormat('compact')
  .withIdentity('You are a helpful assistant');

// 10-20% token reduction, still readable
Pros:
  • Moderate token savings (10-20%)
  • Still uses markdown semantics
  • Reasonably readable
Cons:
  • Less readable than full markdown
  • Not maximum optimization

Production: TOON

Use in production for maximum savings:
const prodBuilder = createPromptBuilder()
  .withFormat('toon')
  .withIdentity('You are a helpful assistant');

// 30-60% token reduction
Pros:
  • Maximum token savings (30-60%)
  • Significant cost reduction
  • Model comprehension maintained
Cons:
  • Less human-readable
  • Harder to debug directly
  • Unfamiliar syntax compared with markdown

Temporary Format Override

Override the format for a single build without changing the builder's default:
const builder = createPromptBuilder()
  .withIdentity('You are a helpful assistant')
  .withFormat('markdown'); // Default format

// Use different formats temporarily
const markdownPrompt = builder.build(); // Uses default (markdown)
const toonPrompt = builder.build('toon'); // Override to TOON
const compactPrompt = builder.build('compact'); // Override to compact

Cost Impact Example

Scenario: High-Volume Application

// Assumptions
const requestsPerDay = 100000;
const daysPerMonth = 30;
const inputTokenCost = 0.01; // USD per 1,000 input tokens (illustrative; check your provider's current pricing)

// Markdown format
const markdownTokens = 2000;
const markdownCost = 
  (markdownTokens / 1000) * inputTokenCost * requestsPerDay * daysPerMonth;

console.log('Markdown monthly cost:', markdownCost); // ≈ $60,000

// TOON format (40% reduction)
const toonTokens = 1200;
const toonCost = 
  (toonTokens / 1000) * inputTokenCost * requestsPerDay * daysPerMonth;

console.log('TOON monthly cost:', toonCost); // ≈ $36,000
console.log('Monthly savings:', markdownCost - toonCost); // ≈ $24,000
console.log('Annual savings:', (markdownCost - toonCost) * 12); // ≈ $288,000
At this volume, a 40% reduction saves roughly $24,000 per month. Savings scale linearly with request volume and prompt size, so high-traffic applications benefit most, provided model behavior is unchanged.

Size Debugging

Use .debug() to analyze prompt size and see savings:
const builder = createPromptBuilder()
  .withIdentity('You are a data analyst')
  .withCapabilities(['Analyze data', 'Generate reports'])
  .withTool({
    name: 'query_db',
    description: 'Query database',
    schema: z.object({ query: z.string() })
  })
  .withFormat('markdown')
  .debug();

// Output:
// PromptSmith Builder Debug
//
// Format: markdown | Identity: ✓ | Capabilities: 2 | Tools: 1
// Constraints: 0 | Examples: 0 | Guardrails: ✗
//
// Preview: # Identity You are a data analyst # Capabilities 1. Analyze...
// Size: 850 chars (~213 tokens)
// TOON format: 510 chars (~128 tokens) - saves 40%

Optimization Strategies

1. Remove Redundancy

Eliminate duplicate or unnecessary information:
// ❌ Redundant
const redundantBuilder = createPromptBuilder()
  .withIdentity('You are a helpful customer service assistant')
  .withCapability('Be helpful to customers')
  .withCapability('Provide customer assistance')
  .withCapability('Help customers with questions');

// ✅ Concise
const conciseBuilder = createPromptBuilder()
  .withIdentity('You are a customer service assistant')
  .withCapabilities([
    'Answer product questions',
    'Process returns',
    'Track orders'
  ]);

2. Consolidate Examples

Use fewer, high-quality examples:
// ❌ Too many examples
.withExamples([
  { user: 'Hi', assistant: 'Hello!' },
  { user: 'Hey', assistant: 'Hi there!' },
  { user: 'Hello', assistant: 'Hello! How can I help?' },
  // 10 more greeting examples...
])

// ✅ One comprehensive example
.withExamples([
  {
    user: 'Hello',
    assistant: 'Hi! How can I help you today?',
    explanation: 'Greet users warmly and offer assistance'
  }
])

3. Concise Descriptions

Be specific but brief:
// ❌ Verbose
.withTool({
  name: 'search',
  description: 'This tool allows you to search through our entire product catalog by providing keywords, product names, categories, or SKU numbers. You should use this tool whenever a customer is looking for a product or wants to know if we have something in stock.',
  schema: z.object({
    query: z.string().describe('The search keywords that the user wants to look for')
  })
})

// ✅ Concise
.withTool({
  name: 'search',
  description: 'Search product catalog by keyword, name, category, or SKU. Use when customer asks about product availability.',
  schema: z.object({
    query: z.string().describe('Search keywords')
  })
})
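
One way to keep descriptions brief over time is a small length check run in tests or CI. The sketch below is an assumption-laden illustration: ToolSummary, findVerboseTools, and the 150-character budget are hypothetical and not part of promptsmith-ts.

```typescript
// Hypothetical shape: just the fields needed for a length check.
interface ToolSummary {
  name: string;
  description: string;
}

// Returns the names of tools whose description exceeds the character budget.
// The default budget of 150 characters is an arbitrary starting point.
function findVerboseTools(tools: ToolSummary[], maxChars: number = 150): string[] {
  return tools
    .filter((tool) => tool.description.length > maxChars)
    .map((tool) => tool.name);
}

const toolsToCheck: ToolSummary[] = [
  {
    name: 'search_verbose',
    description:
      'This tool allows you to search through our entire product catalog by providing keywords, product names, categories, or SKU numbers. You should use this tool whenever a customer is looking for a product or wants to know if we have something in stock.',
  },
  {
    name: 'search_concise',
    description:
      'Search product catalog by keyword, name, category, or SKU. Use when customer asks about product availability.',
  },
];

const verboseTools = findVerboseTools(toolsToCheck);
console.log('Descriptions over budget:', verboseTools);
```

A length budget only flags candidates for review; a short description can still be unclear, so pair it with behavior tests.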

4. Use Template Merging

Avoid repeating common patterns:
import { security } from 'promptsmith-ts/templates';

// ❌ Repeated security config across agents
const repeatedAgent1 = createPromptBuilder()
  .withGuardrails()
  .withConstraint('must', '...')
  .withConstraint('must_not', '...');

const repeatedAgent2 = createPromptBuilder()
  .withGuardrails()
  .withConstraint('must', '...')
  .withConstraint('must_not', '...');

// ✅ Reusable template
const agent1 = createPromptBuilder().merge(security());
const agent2 = createPromptBuilder().merge(security());

5. Conditional Content

Only include what’s needed:
function createAgent(features: string[]) {
  const builder = createPromptBuilder()
    .withIdentity('You are an assistant');
  
  // Only add tools the user has access to
  if (features.includes('database')) {
    builder.withTool(/* database tool */);
  }
  
  if (features.includes('email')) {
    builder.withTool(/* email tool */);
  }
  
  return builder;
}

// Basic user gets smaller prompt
const basicAgent = createAgent(['email']);

// Premium user gets full prompt
const premiumAgent = createAgent(['database', 'email']);

Environment-Based Optimization

function getFormat(): 'markdown' | 'toon' | 'compact' {
  if (process.env.NODE_ENV === 'development') {
    return 'markdown'; // Readability
  }
  if (process.env.NODE_ENV === 'staging') {
    return 'compact'; // Moderate optimization
  }
  return 'toon'; // Maximum optimization
}

const builder = createPromptBuilder()
  .withIdentity('You are a helpful assistant')
  .withFormat(getFormat());

Measuring Impact

Measure each format's size so you can track token usage over time:
function buildAndMeasure(builder: SystemPromptBuilder, format: PromptFormat) {
  const prompt = builder.build(format);
  const chars = prompt.length;
  const estimatedTokens = Math.ceil(chars / 4); // Rough estimate: ~4 characters per token
  
  return {
    format,
    chars,
    estimatedTokens,
    prompt
  };
}

const builder = createPromptBuilder()
  .withIdentity('You are a data analyst')
  .withCapabilities(['Query data', 'Generate reports']);

const formats: PromptFormat[] = ['markdown', 'compact', 'toon'];
const measurements = formats.map(format => buildAndMeasure(builder, format));

console.table(measurements);
// ┌─────────┬──────────┬───────┬──────────────────┐
// │ (index) │  format  │ chars │ estimatedTokens  │
// ├─────────┼──────────┼───────┼──────────────────┤
// │    0    │ markdown │  850  │       213        │
// │    1    │ compact  │  720  │       180        │
// │    2    │   toon   │  510  │       128        │
// └─────────┴──────────┴───────┴──────────────────┘
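
To catch regressions between measurements, the same rough estimate can back a budget check that fails when a prompt grows past an agreed size. This is a sketch under stated assumptions: estimateTokens and assertWithinTokenBudget are hypothetical helpers, and the 4-characters-per-token ratio is the approximation used above, not an exact tokenizer.

```typescript
// Rough token estimate: ~4 characters per token (approximation only).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical guard: throws when the estimated size exceeds the budget,
// otherwise returns the estimate so it can be logged or recorded.
function assertWithinTokenBudget(prompt: string, maxTokens: number): number {
  const tokens = estimateTokens(prompt);
  if (tokens > maxTokens) {
    throw new Error(
      `Prompt is ~${tokens} tokens, over the ${maxTokens}-token budget`,
    );
  }
  return tokens;
}

// Stand-in for a built prompt of 850 characters.
const samplePrompt = 'x'.repeat(850);
const measuredTokens = assertWithinTokenBudget(samplePrompt, 250);
console.log('Estimated tokens:', measuredTokens);
```

In practice you would pass the string returned by builder.build(format) and pick a budget per environment.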

Prompt Caching

PromptSmith automatically caches built prompts:
const builder = createPromptBuilder()
  .withIdentity('You are a helpful assistant');

// First call: builds and caches
const prompt1 = builder.build();

// Second call: returns cached result (instant)
const prompt2 = builder.build();

// Cache invalidates on changes
builder.withCapability('Answer questions');
const prompt3 = builder.build(); // Rebuilds and caches new version
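Cache statistics (this reads the _cache field, which appears to be internal, so treat it as a debugging aid rather than a stable API):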
const stats = builder['_cache'].getStats();
console.log(stats);
// {
//   isDirty: false,
//   cachedFormats: ['markdown'],
//   cacheSize: 850
// }
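
PromptSmith's cache internals are not shown in this guide. As a mental model only, the behavior described above (reuse until something changes, then rebuild) can be sketched with a small per-format cache. CachedRenderer and all of its members are hypothetical and are not part of promptsmith-ts.

```typescript
// Toy model of a per-format cache that is cleared on every change.
// This is NOT PromptSmith's implementation; all names are hypothetical.
type Format = 'markdown' | 'compact' | 'toon';

class CachedRenderer {
  private cache = new Map<Format, string>();
  private parts: string[] = [];
  renderCount = 0; // counts real rebuilds, not cache hits

  add(part: string): this {
    this.parts.push(part);
    this.cache.clear(); // any change invalidates every cached format
    return this;
  }

  build(format: Format = 'markdown'): string {
    const hit = this.cache.get(format);
    if (hit !== undefined) {
      return hit; // cache hit: no rebuild
    }
    this.renderCount += 1;
    const rendered = `[${format}] ${this.parts.join(' | ')}`;
    this.cache.set(format, rendered);
    return rendered;
  }
}

const renderer = new CachedRenderer().add('You are a helpful assistant');
renderer.build(); // first call renders and caches
renderer.build(); // second call is served from the cache
const buildsBeforeChange = renderer.renderCount;

renderer.add('Answer questions'); // change invalidates the cache
renderer.build(); // renders again
const buildsAfterChange = renderer.renderCount;

console.log({ buildsBeforeChange, buildsAfterChange });
```

The practical takeaway is the same regardless of the real internals: finish configuring a builder before calling build() in a hot path, since each change forces a rebuild.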

Best Practices

  1. Start with Markdown: Develop and test with readable format
  2. Promote to Compact in Staging: Test with moderate optimization
  3. Use TOON in Production: Deploy with maximum optimization
  4. Monitor Token Usage: Track actual token consumption
  5. Profile Before Optimizing: Measure impact of changes
  6. Test Behavior: Ensure TOON format doesn’t affect model performance
  7. Document Format Choice: Explain why each format is used

Common Pitfalls

Over-Optimization: Don’t sacrifice clarity for minimal savings:
// ❌ Too abbreviated
.withIdentity('CS asst')
.withCapability('Proc ret')

// ✅ Clear and concise
.withIdentity('You are a customer service assistant')
.withCapability('Process returns')
Premature Optimization: Optimize after validating prompt behavior:
  1. Build and test with markdown ✅
  2. Validate behavior thoroughly ✅
  3. Switch to TOON for production ✅
  Avoid: starting with TOON and then debugging the optimized output ❌

Next Steps

Testing

Validate TOON format effectiveness

Composition

Optimize reusable templates

API Reference

Detailed format documentation

Examples

See optimization examples
