Understanding how AI API costs work is essential for budgeting and optimization. This guide explains Tokenizador’s cost calculation system and provides strategies for minimizing expenses.

How Costs Are Calculated

AI model costs are based on token count, not character count or request count. Different models have different prices per million tokens.

Basic Cost Formula

// From statistics-calculator.js:58-64
calculateCost(tokenCount, modelInfo) {
  if (!modelInfo || !modelInfo.inputCost) {
    return 0;
  }
  // Calculate cost based on input tokens (cost per 1M tokens)
  return (tokenCount / 1000000) * modelInfo.inputCost;
}

Input Formula

Cost = (Token Count / 1,000,000) × Input Price

Example (GPT-4o):
1,000 tokens = (1000 / 1000000) × $2.50
             = $0.0025

Output Formula

Cost = (Token Count / 1,000,000) × Output Price

Example (GPT-4o):
1,000 tokens = (1000 / 1000000) × $10.00
             = $0.01

Input vs Output Costs

Output tokens are typically 2-5x more expensive than input tokens. This is crucial for cost optimization.

Price Comparison Across Models

Model          Input (per 1M)   Output (per 1M)   Ratio
GPT-4o         $2.50            $10.00            4.0x
GPT-4o Mini    $0.15            $0.60             4.0x
GPT-4 Turbo    $10.00           $30.00            3.0x
GPT-4          $30.00           $60.00            2.0x
GPT-3.5 Turbo  $0.50            $1.50             3.0x
// From models-config.js:88-89
inputCost: 2.50,
outputCost: 10.00

Why Output Costs More

Generating tokens requires more computation than processing them.

Input Processing:
  • One forward pass through the model
  • Parallel processing possible
  • Relatively fast

Output Generation:
  • Multiple forward passes (one per token)
  • Sequential processing required
  • Sampling and search algorithms
  • Quality checks and safety filters

Result: roughly 4-5x more computational resources.

Pricing also incentivizes efficient usage:
  • Encourages concise prompts (lower input)
  • Rewards efficient prompt engineering
  • Discourages generating unnecessary output
  • Balances infrastructure costs

Finally, output generation requires maintaining full context:
Input: Process once and cache
Output: Each token attends to all previous tokens

Example - Generating 1000 tokens after a 500-token input:
Token 1:    Attends to input + itself (501 tokens)
Token 2:    Attends to input + tokens 1-2 (502 tokens)
Token 3:    Attends to input + tokens 1-3 (503 tokens)
...
Token 1000: Attends to all (1,500 tokens)

Total attention operations: ~1,000,500
This quadratic growth makes output expensive.
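This sum can be computed directly with a short script (the 500-token input and 1,000 generated tokens mirror the example):

```javascript
// Total attention operations when generating `outputTokens` tokens
// after an `inputTokens`-token prompt: token i attends to the input
// plus generated tokens 1..i (including itself), i.e. inputTokens + i.
function attentionOperations(inputTokens, outputTokens) {
  let total = 0;
  for (let i = 1; i <= outputTokens; i++) {
    total += inputTokens + i;
  }
  return total;
}

console.log(attentionOperations(500, 1000)); // 1000500
```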

Tokenizador’s Cost Display

What’s Shown in the Interface

// From statistics-calculator.js:16-37
calculateStatistics(text, tokenResult, modelId) {
  const modelInfo = MODELS_DATA[modelId];
  const tokenCount = tokenResult.count || 0;
  const costEstimate = this.calculateCost(tokenCount, modelInfo);
  
  return {
    tokenCount,
    charCount: text.length,
    wordCount: this.countWords(text),
    costEstimate,              // Input cost only
    inputCostPer1M: modelInfo.inputCost,
    outputCostPer1M: modelInfo.outputCost
  };
}

Displayed Cost

The main cost display shows input token cost only:
Your text: 1,000 tokens
Model: GPT-4o

Displayed: $0.0025
Calculation: 1000 / 1,000,000 × $2.50
This represents the cost to send your text to the model.

Full Cost Info

Model information panel shows both:
Input: $2.50/1M tokens
Output: $10.00/1M tokens
You must account for output costs separately based on expected response length.
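A small helper makes the combined estimate explicit. `estimateTotalCost` is an illustrative sketch, not part of Tokenizador's code; it assumes the same `inputCost`/`outputCost` shape used in models-config.js:

```javascript
// Combine the displayed input cost with an expected response length
// to estimate the full cost of one API call.
function estimateTotalCost(inputTokens, expectedOutputTokens, modelInfo) {
  const inputCost = (inputTokens / 1_000_000) * modelInfo.inputCost;
  const outputCost = (expectedOutputTokens / 1_000_000) * modelInfo.outputCost;
  return { inputCost, outputCost, total: inputCost + outputCost };
}

// GPT-4o: 1,000 input tokens plus an expected 400-token response
const est = estimateTotalCost(1000, 400, { inputCost: 2.50, outputCost: 10.00 });
// est.total ≈ $0.0065 — more than double the displayed input-only $0.0025
```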

Calculating Total API Cost

// Example: Customer support chatbot

const inputTokens = 500;      // User message + context
const outputTokens = 200;     // Bot response
const model = 'GPT-4o Mini';

const inputCost = (500 / 1_000_000) * 0.15;   // $0.000075
const outputCost = (200 / 1_000_000) * 0.60;  // $0.000120
const totalCost = inputCost + outputCost;     // $0.000195

// Per 1M conversations:
const millionConversations = totalCost * 1_000_000;  // $195
Output typically costs more despite fewer tokens.

Cost Optimization Strategies

1. Model Selection

Don’t use expensive models for simple tasks:
// Task complexity analysis
const taskToModel = {
  // Simple tasks - use cheapest models
  'classification': ['GPT-4o Mini', 'Gemini Flash', 'Llama 3.1 8B'],
  'extraction': ['GPT-4o Mini', 'Command R', 'Mistral Nemo'],
  'simple_qa': ['GPT-3.5 Turbo', 'Claude Haiku', 'Gemini Flash'],
  
  // Medium tasks - balance cost and quality
  'summarization': ['GPT-4o', 'Claude 3.5 Sonnet', 'Llama 3.1 70B'],
  'translation': ['GPT-4o', 'Command R+', 'Qwen 2.5'],
  'analysis': ['GPT-4o', 'Claude Sonnet', 'Mistral Large'],
  
  // Complex tasks - use best models
  'reasoning': ['GPT-4o', 'Claude 3.5 Sonnet', 'Claude Opus'],
  'code_generation': ['GPT-4o', 'Claude 3.5 Sonnet', 'Llama 3.1 405B'],
  'creative_writing': ['GPT-4o', 'Claude 3.5 Sonnet', 'Command R+']
};
Potential savings: 80-95% by using appropriate models.
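A mapping like the one above can be wired into a trivial selector. `pickModel` is a hypothetical helper (not Tokenizador code) that falls back to a cheap default for unknown task types:

```javascript
// Route each task type to the first (preferred) model in its list.
const taskToModel = {
  classification: ['GPT-4o Mini', 'Gemini Flash', 'Llama 3.1 8B'],
  summarization: ['GPT-4o', 'Claude 3.5 Sonnet', 'Llama 3.1 70B'],
  reasoning: ['GPT-4o', 'Claude 3.5 Sonnet', 'Claude Opus']
};

function pickModel(task, fallback = 'GPT-4o Mini') {
  const candidates = taskToModel[task];
  return candidates ? candidates[0] : fallback;
}

console.log(pickModel('classification')); // "GPT-4o Mini"
console.log(pickModel('poetry'));         // "GPT-4o Mini" (fallback)
```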
Models with lower token ratios save costs:
// Same 10,000 character input
const text = "...10,000 characters...";

// Token counts (approximate)
const tokenization = {
  'Qwen 2.5':    2300,  // 0.92 ratio
  'Llama 3.1':   2375,  // 0.95 ratio  
  'GPT-4o':      2500,  // 1.0 ratio (baseline)
  'Gemini 1.5':  2625,  // 1.05 ratio
  'Claude 3.5':  2750   // 1.1 ratio
};

// Cost comparison: Qwen and Llama at $0.35 per 1M (Llama 3.1 70B pricing);
// GPT-4o and Claude 3.5 at their own input rates
const costs = {
  'Qwen 2.5':   (2300 / 1_000_000) * 0.35,  // $0.000805
  'Llama 3.1':  (2375 / 1_000_000) * 0.35,  // $0.000831
  'GPT-4o':     (2500 / 1_000_000) * 2.50,  // $0.00625
  'Claude 3.5': (2750 / 1_000_000) * 3.00   // $0.00825
};
Key insight: Token efficiency matters most when prices are similar.
Some providers offer batch processing:
// OpenAI Batch API example
const standardCost = {
  'GPT-4o': 2.50,           // Input per 1M
  'GPT-4o Batch': 1.25      // 50% discount
};

// Trade-off: 24-hour processing time
// Best for: Non-urgent bulk processing
Check provider docs for batch discounts (not shown in Tokenizador).

2. Prompt Engineering

Every token costs money:
// Verbose prompt (expensive)
const verbosePrompt = `
I would really appreciate it if you could please take a moment 
to carefully analyze the following text and provide me with a 
comprehensive summary that captures all of the main points and 
key ideas presented in the document.
`;
// Tokens: ~45

// Concise prompt (efficient)
const concisePrompt = `
Summarize the key points:
`;
// Tokens: ~7

// Savings: 84% reduction
Best practices:
  • Remove filler words
  • Use direct instructions
  • Avoid redundancy
  • Let the model infer context
Limit unnecessary generation:
// Uncontrolled output
const openEnded = "Explain quantum computing.";
// Model might generate: 2000+ tokens

// Controlled output
const controlled = "Explain quantum computing in 100 words.";
// Model generates: ~130 tokens (includes some overhead)

// Cost comparison (GPT-4o):
// Uncontrolled: 2000 tokens × $10/1M = $0.020
// Controlled:   130 tokens × $10/1M  = $0.0013
// Savings: 93.5%
Techniques:
  • Specify word/token limits
  • Request bullet points instead of paragraphs
  • Use “briefly” or “concisely”
  • Set max_tokens in API calls
Minimize repeated context:
// Inefficient: Resending full context each time
const inefficientChat = [
  { tokens: 5000 },  // Initial context + message 1
  { tokens: 5100 },  // Full context + message 2
  { tokens: 5200 },  // Full context + message 3
];
// Total: 15,300 tokens

// Efficient: Use conversation memory strategically
const efficientChat = [
  { tokens: 5000 },  // Initial context + message 1
  { tokens: 1100 },  // Recent context + message 2
  { tokens: 1200 },  // Recent context + message 3
];
// Total: 7,300 tokens (52% savings)
Strategies:
  • Maintain summary of conversation
  • Only send recent exchanges
  • Use semantic search for relevant context
  • Clear context when topic changes
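The "only send recent exchanges" strategy can be sketched as a token-budgeted trim. Both helpers below are illustrative; a real implementation would count tokens with the model's actual tokenizer rather than the chars/4 heuristic:

```javascript
// Rough token estimate: ~4 characters per token (heuristic only).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the newest messages that fit within maxTokens.
function trimContext(messages, maxTokens) {
  const kept = [];
  let used = 0;
  // Walk backwards from the newest message, stopping at the budget.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

const history = ['a'.repeat(400), 'b'.repeat(400), 'c'.repeat(400)];
console.log(trimContext(history, 220).length); // 2 (only the newest two fit)
```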

3. Caching and Preprocessing

Store and reuse frequent outputs:
// Example: FAQ system
const cache = new Map();

async function getAnswer(question) {
  // Check cache first
  const cacheKey = normalizeQuestion(question);
  if (cache.has(cacheKey)) {
    return cache.get(cacheKey);  // $0 cost
  }
  
  // Call API only for new questions
  const answer = await callAI(question);  // Full cost
  cache.set(cacheKey, answer);
  return answer;
}

// If 70% of questions are repeats:
// Savings: 70% of API costs
Reduce token count before API calls:
// Collapse unnecessary whitespace
function compress(text) {
  return text
    .replace(/\s+/g, ' ')  // Any whitespace run (incl. newlines) → single space
    .trim();
}

// Example:
const original = `
  This  has     extra    spaces.
  
  
  And multiple blank lines.
`;
// Tokens: ~18

const compressed = compress(original);
// "This has extra spaces. And multiple blank lines."
// Tokens: ~11
// Savings: 39%
Embeddings are much cheaper than generation:
// Cost comparison
const embedding = {
  model: 'text-embedding-3-small',
  inputCost: 0.02,        // Per 1M tokens
  outputCost: 0           // No output
};

const generation = {
  model: 'GPT-4o',
  inputCost: 2.50,        // Per 1M tokens
  outputCost: 10.00       // Per 1M tokens
};

// Embedding 1M tokens: $0.02
// Generating 1M tokens: $12.50 (input + output)
// Embeddings are 625x cheaper!
Use case: Semantic search instead of asking AI to find information.

Real-World Cost Scenarios

Scenario 1: Customer Support Chatbot

  • 10,000 conversations per day
  • Average 5 exchanges per conversation
  • 100 tokens per user message (with context)
  • 150 tokens per bot response
  • Need high quality responses

Scenario 2: Document Summarization Service

  • 1,000 documents per day
  • Average 20,000 tokens per document
  • Generate 500 token summaries
  • Accuracy is critical

Scenario 3: Code Analysis Tool

  • Analyze codebases for bugs and improvements
  • Average codebase: 50,000 tokens
  • Generate 2,000 token reports
  • 100 analyses per day
  • Need high accuracy
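As a rough worked example, the daily costs of the three scenarios can be computed from per-token prices. The model choices below are assumptions matched to each scenario's quality needs, using the prices listed earlier in this guide:

```javascript
// Daily cost = daily input tokens × input price + daily output tokens × output price
function dailyCost({ requests, inputTokens, outputTokens, inputPrice, outputPrice }) {
  return (requests * inputTokens / 1_000_000) * inputPrice +
         (requests * outputTokens / 1_000_000) * outputPrice;
}

// Scenario 1: chatbot on GPT-4o Mini (10,000 conversations × 5 exchanges)
const chatbot = dailyCost({
  requests: 10_000 * 5, inputTokens: 100, outputTokens: 150,
  inputPrice: 0.15, outputPrice: 0.60
});  // $5.25/day

// Scenario 2: summarization on GPT-4o (accuracy is critical)
const summaries = dailyCost({
  requests: 1_000, inputTokens: 20_000, outputTokens: 500,
  inputPrice: 2.50, outputPrice: 10.00
});  // $55.00/day

// Scenario 3: code analysis on GPT-4o (high accuracy required)
const analysis = dailyCost({
  requests: 100, inputTokens: 50_000, outputTokens: 2_000,
  inputPrice: 2.50, outputPrice: 10.00
});  // $14.50/day
```

Note how output tokens dominate the chatbot bill, while the input-heavy summarization and analysis workloads are dominated by input cost.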

Cost Tracking and Monitoring

Build Your Own Cost Calculator

// Comprehensive cost tracking
class CostTracker {
  constructor(modelId) {
    this.modelInfo = MODELS_DATA[modelId];
    this.totalInputTokens = 0;
    this.totalOutputTokens = 0;
    this.requestCount = 0;
  }
  
  trackRequest(inputTokens, outputTokens) {
    this.totalInputTokens += inputTokens;
    this.totalOutputTokens += outputTokens;
    this.requestCount++;
  }
  
  getCurrentCost() {
    const inputCost = (this.totalInputTokens / 1_000_000) * 
                      this.modelInfo.inputCost;
    const outputCost = (this.totalOutputTokens / 1_000_000) * 
                       this.modelInfo.outputCost;
    const total = inputCost + outputCost;
    return {
      input: inputCost,
      output: outputCost,
      total,
      breakdown: {
        inputPercentage: total ? (inputCost / total) * 100 : 0,
        outputPercentage: total ? (outputCost / total) * 100 : 0
      }
    };
  }
  
  projectCost(requests, inputPerRequest, outputPerRequest) {
    return (requests * inputPerRequest / 1_000_000) * this.modelInfo.inputCost +
           (requests * outputPerRequest / 1_000_000) * this.modelInfo.outputCost;
  }
  
  getProjections(requestsPerDay) {
    const avgInputPerRequest = this.totalInputTokens / this.requestCount;
    const avgOutputPerRequest = this.totalOutputTokens / this.requestCount;
    
    return {
      daily: this.projectCost(requestsPerDay, avgInputPerRequest, avgOutputPerRequest),
      monthly: this.projectCost(requestsPerDay * 30, avgInputPerRequest, avgOutputPerRequest),
      annual: this.projectCost(requestsPerDay * 365, avgInputPerRequest, avgOutputPerRequest)
    };
  }
}

Monitoring Dashboard Metrics

Token Efficiency

Track tokens per request:
{  
  avgInputTokens: 1250,
  avgOutputTokens: 800,
  avgRatio: 0.64,
  trend: "↓ 5% this week"
}

Cost per Request

Monitor unit costs:
{
  current: "$0.0042",
  target: "$0.0035",
  variance: "+20%",
  status: "Above target"
}

Model Distribution

Track model usage:
{
  "GPT-4o Mini": "75%",
  "GPT-4o": "20%",
  "Claude 3.5": "5%"
}

Optimization Opportunities

Identify savings:
{
  "Reduce prompts": "$120/mo",
  "Limit output": "$350/mo",
  "Switch models": "$200/mo"
}

Common Cost Mistakes

Avoid these expensive mistakes:
Mistake 1: Using premium models for simple tasks

Problem:
// Using GPT-4o for simple classification
const result = await classify(text, 'GPT-4o');
// Cost: $0.0025 per 1000 tokens
Solution:
// Use GPT-4o Mini for simple tasks
const result = await classify(text, 'GPT-4o Mini');
// Cost: $0.00015 per 1000 tokens
// Savings: 94%
Mistake 2: Unbounded output length

Problem:
// Requesting verbose outputs
prompt = "Write a comprehensive, detailed analysis...";
// Output: 3000+ tokens at $10/1M = $0.030+
Solution:
// Request concise outputs
prompt = "Write a concise analysis (max 200 words)...";
// Output: ~300 tokens at $10/1M = $0.003
// Savings: 90%
Mistake 3: Resending full conversation history

Problem:
// Sending full conversation history every time
const context = fullHistory.join('\n');  // 10,000 tokens
const prompt = context + newMessage;     // +100 tokens
// Cost per message: $0.025
Solution:
// Summarize and limit context
const context = recentHistory.slice(-5).join('\n');  // 500 tokens
const prompt = context + newMessage;                  // +100 tokens
// Cost per message: $0.0015
// Savings: 94%
Mistake 4: No caching of repeated requests

Problem:
// Regenerating same content repeatedly
for (let i = 0; i < 1000; i++) {
  await generateFAQ(commonQuestion);
}
// Cost: 1000x API calls
Solution:
// Cache frequent responses
const cached = cache.get(commonQuestion);
if (cached) return cached;

const result = await generateFAQ(commonQuestion);
cache.set(commonQuestion, result);
// Cost: 1x API call (999 cache hits = $0)

Tools and Resources

Tokenizador

Use this tool to estimate costs before implementation

Model Comparison

Compare pricing across all 48 supported models

OpenAI Pricing

Official OpenAI pricing page

Anthropic Pricing

Official Anthropic pricing page

Cost Calculator Spreadsheet

Build a custom spreadsheet with your usage patterns

API Usage Dashboards

Monitor actual costs through provider dashboards

Quick Reference

Budget Models (< $0.20 per 1M input)

Model             Input    Output   Best For
Granite 3 2B      $0.025   $0.025   Ultra-low-cost tasks
Llama 3.1 8B      $0.055   $0.055   Budget-friendly general use
Gemini 1.5 Flash  $0.075   $0.30    High-volume, large context
DeepSeek V2.5     $0.14    $0.28    Balanced budget option
GPT-4o Mini       $0.15    $0.60    Quality on a budget
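To make the table concrete, here is the combined cost of a single 1,000-token-input / 500-token-output request for each budget model (prices copied from the table above):

```javascript
const budgetModels = [
  { name: 'Granite 3 2B',     inputCost: 0.025, outputCost: 0.025 },
  { name: 'Llama 3.1 8B',     inputCost: 0.055, outputCost: 0.055 },
  { name: 'Gemini 1.5 Flash', inputCost: 0.075, outputCost: 0.30 },
  { name: 'DeepSeek V2.5',    inputCost: 0.14,  outputCost: 0.28 },
  { name: 'GPT-4o Mini',      inputCost: 0.15,  outputCost: 0.60 }
];

// Cost of one request: 1,000 input + 500 output tokens by default
function requestCost(model, inputTokens = 1000, outputTokens = 500) {
  return (inputTokens / 1_000_000) * model.inputCost +
         (outputTokens / 1_000_000) * model.outputCost;
}

for (const m of budgetModels) {
  console.log(`${m.name}: $${requestCost(m).toFixed(7)}`);
}
// Granite 3 2B ($0.0000375) is 12x cheaper per request than GPT-4o Mini ($0.0004500)
```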

Key Cost Principles

1. Output costs 2-5x more than input
   Always optimize for shorter outputs when possible.

2. Token efficiency varies by model
   Consider tokenRatio when comparing costs.

3. Match model to task complexity
   Don't overspend on simple tasks.

4. Cache aggressively
   Reuse responses whenever possible.

5. Monitor and optimize
   Track costs and iterate on efficiency.

Next Steps

How to Use

Learn to use Tokenizador effectively

Supported Models

Full model specifications and pricing
