
Overview

The StatisticsCalculator class provides comprehensive statistical analysis for tokenized text: token, character, and word counts, cost estimates, context utilization, and model comparison capabilities.
This calculator works with data from 48 AI models and provides accurate cost estimates based on current pricing.

Constructor

Creates a new StatisticsCalculator instance.
const calculator = new StatisticsCalculator();
The calculator is stateless and can be reused for multiple calculations.

Methods

calculateStatistics()

Calculates comprehensive statistics for the given text and model.
calculateStatistics(text, tokenResult, modelId)
Parameters:
  • text (string, required): The original input text
  • tokenResult (Object, required): Result object from TokenizationService.tokenizeText()
  • modelId (string, required): Model identifier (e.g., "gpt-4o", "claude-3.5-sonnet")
Returns (Object): Comprehensive statistics object
Return value structure:
  • tokenCount (number): Total number of tokens
  • charCount (number): Total number of characters
  • wordCount (number): Total number of words
  • costEstimate (number): Estimated cost in USD for input tokens
  • contextUtilization (number): Percentage of context window used (0-100)
  • tokensPerWord (number): Average tokens per word ratio
  • inputCostPer1M (number): Cost per 1M input tokens in USD
  • outputCostPer1M (number): Cost per 1M output tokens in USD
const calculator = new StatisticsCalculator();
const tokenizer = new TokenizationService();

const text = "Hello world! This is a test.";
const tokenResult = await tokenizer.tokenizeText(text, 'gpt-4o');

const stats = calculator.calculateStatistics(text, tokenResult, 'gpt-4o');

console.log(stats);
// {
//   tokenCount: 8,
//   charCount: 28,
//   wordCount: 6,
//   costEstimate: 0.00002,
//   contextUtilization: 0.00625,
//   tokensPerWord: 1.33,
//   inputCostPer1M: 2.50,
//   outputCostPer1M: 10.00
// }

countWords()

Counts words in text using whitespace-based word boundary detection.
countWords(text)
Parameters:
  • text (string, required): Text to analyze
Returns (number): Number of words (0 for empty text)
Algorithm:
  1. Trims whitespace from text
  2. Splits on whitespace characters (\s+)
  3. Filters out empty strings
  4. Returns count
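The steps above can be sketched directly in JavaScript (a minimal illustration of the documented algorithm, not the library source):

```javascript
// Trim, split on runs of whitespace, drop empty strings, count.
function countWords(text) {
  const trimmed = text.trim();
  if (trimmed === '') return 0; // empty or whitespace-only text yields 0
  return trimmed.split(/\s+/).filter(word => word.length > 0).length;
}
```

Because the split is purely whitespace-based, hyphenated compounds count as a single word, matching the examples below.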
const calculator = new StatisticsCalculator();

console.log(calculator.countWords('Hello world'));
// 2

console.log(calculator.countWords('  Multiple   spaces   between  '));
// 3

console.log(calculator.countWords(''));
// 0

console.log(calculator.countWords('One-hyphenated-word'));
// 1

calculateCost()

Calculates estimated cost based on token count and model pricing.
calculateCost(tokenCount, modelInfo)
Parameters:
  • tokenCount (number, required): Number of tokens
  • modelInfo (Object, required): Model information object from MODELS_DATA
Returns (number): Estimated cost in USD
Cost calculation formula:
cost = (tokenCount / 1,000,000) × inputCostPer1M
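The formula translates to a one-line sketch (assuming modelInfo carries an inputCostPer1M field in USD per million input tokens, as the statistics structure above suggests):

```javascript
// Pro-rate the per-million price by the actual token count.
function calculateCost(tokenCount, modelInfo) {
  return (tokenCount / 1_000_000) * modelInfo.inputCostPer1M;
}
```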
const calculator = new StatisticsCalculator();
const modelInfo = MODELS_DATA['gpt-4o'];

// Calculate cost for 1,000 tokens
const cost1k = calculator.calculateCost(1000, modelInfo);
console.log(`1K tokens: $${cost1k.toFixed(6)}`);
// 1K tokens: $0.002500

// Calculate cost for 1,000,000 tokens
const cost1m = calculator.calculateCost(1000000, modelInfo);
console.log(`1M tokens: $${cost1m.toFixed(2)}`);
// 1M tokens: $2.50
Cost estimates are based on input token pricing. Output tokens typically cost more.

calculateContextUtilization()

Calculates the percentage of the model’s context window being used.
calculateContextUtilization(tokenCount, contextLimit)
Parameters:
  • tokenCount (number, required): Number of tokens in the text
  • contextLimit (number, required): Maximum context window size for the model
Returns (number): Percentage from 0 to 100 (capped at 100)
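A minimal sketch of this calculation; the cap at 100 matches the documented return range:

```javascript
// Percentage of the context window used, capped at 100.
function calculateContextUtilization(tokenCount, contextLimit) {
  return Math.min((tokenCount / contextLimit) * 100, 100);
}
```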
const calculator = new StatisticsCalculator();

// GPT-4o has 128K context
console.log(calculator.calculateContextUtilization(1000, 128000));
// 0.78 (less than 1%)

console.log(calculator.calculateContextUtilization(64000, 128000));
// 50.0 (half the context)

console.log(calculator.calculateContextUtilization(128000, 128000));
// 100.0 (full context)

console.log(calculator.calculateContextUtilization(150000, 128000));
// 100.0 (capped at 100%; actual usage exceeds the window)

exceedsContextLimit()

Checks if token count exceeds the model’s context limit.
exceedsContextLimit(tokenCount, modelId)
Parameters:
  • tokenCount (number, required): Number of tokens
  • modelId (string, required): Model identifier
Returns (boolean): true if the count exceeds the limit, false otherwise
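A hedged sketch of the check; the MODELS_DATA stub and its contextWindow field are hypothetical stand-ins for however the library stores per-model context limits:

```javascript
// Hypothetical per-model data; the real MODELS_DATA shape may differ.
const MODELS_DATA = {
  'gpt-4o': { contextWindow: 128000 },
};

function exceedsContextLimit(tokenCount, modelId, modelsData = MODELS_DATA) {
  const model = modelsData[modelId];
  if (!model) return false; // unknown model: no limit information available
  return tokenCount > model.contextWindow;
}
```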
const calculator = new StatisticsCalculator();

// GPT-4o has 128K context limit
console.log(calculator.exceedsContextLimit(100000, 'gpt-4o'));
// false

console.log(calculator.exceedsContextLimit(150000, 'gpt-4o'));
// true

// GPT-3.5 has 16K context limit
console.log(calculator.exceedsContextLimit(20000, 'gpt-3.5-turbo'));
// true

getContextWarning()

Returns a warning message if context usage is high or exceeded.
getContextWarning(tokenCount, modelId)
Parameters:
  • tokenCount (number, required): Number of tokens
  • modelId (string, required): Model identifier
Returns (string|null): Warning message, or null if no warning is needed
Warning messages, from lowest to highest severity (strings are returned in Spanish):
  • "ℹ️ Alto uso del contexto (…% utilizado)": high context usage.
  • "⚠️ Cerca del límite de contexto (…% utilizado)": near the context limit.
  • "⚠️ Texto excede el límite de contexto del modelo (128,000 tokens)": text exceeds the model's maximum context window.
const calculator = new StatisticsCalculator();

// GPT-4o: 128K context
console.log(calculator.getContextWarning(50000, 'gpt-4o'));
// null (39% usage)

console.log(calculator.getContextWarning(100000, 'gpt-4o'));
// "ℹ️ Alto uso del contexto (78.1% utilizado)"

console.log(calculator.getContextWarning(120000, 'gpt-4o'));
// "⚠️ Cerca del límite de contexto (93.8% utilizado)"

console.log(calculator.getContextWarning(150000, 'gpt-4o'));
// "⚠️ Texto excede el límite de contexto del modelo (128,000 tokens)"

formatStatistics()

Formats statistics for display with proper localization and units.
formatStatistics(stats)
Parameters:
  • stats (Object, required): Raw statistics object from calculateStatistics()
Returns (Object): Formatted statistics with string values
const calculator = new StatisticsCalculator();

const rawStats = {
  tokenCount: 15847,
  charCount: 72456,
  wordCount: 11234,
  costEstimate: 0.03961175,
  contextUtilization: 12.380469,
  tokensPerWord: 1.410987,
  inputCostPer1M: 2.50,
  outputCostPer1M: 10.00
};

const formatted = calculator.formatStatistics(rawStats);

console.log(formatted);
// {
//   tokenCount: "15,847",
//   charCount: "72,456",
//   wordCount: "11,234",
//   costEstimate: "$0.039612",
//   contextUtilization: "12.4%",
//   tokensPerWord: "1.41",
//   inputCostPer1M: "$2.50/1M",
//   outputCostPer1M: "$10.00/1M"
// }
Use the formatted statistics when displaying values in the UI: they include thousands separators, currency symbols, and percentage signs.
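A plausible sketch of this formatting, inferred from the example output above; the exact locale and precision the library uses are assumptions:

```javascript
// Format raw numeric statistics into display strings.
function formatStatistics(stats) {
  const n = value => value.toLocaleString('en-US'); // thousands separators
  return {
    tokenCount: n(stats.tokenCount),
    charCount: n(stats.charCount),
    wordCount: n(stats.wordCount),
    costEstimate: `$${stats.costEstimate.toFixed(6)}`,
    contextUtilization: `${stats.contextUtilization.toFixed(1)}%`,
    tokensPerWord: stats.tokensPerWord.toFixed(2),
    inputCostPer1M: `$${stats.inputCostPer1M.toFixed(2)}/1M`,
    outputCostPer1M: `$${stats.outputCostPer1M.toFixed(2)}/1M`,
  };
}
```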

compareModels()

Compares tokenization statistics across multiple models.
async compareModels(text, modelIds, tokenizationService)
Parameters:
  • text (string, required): Text to analyze
  • modelIds (string[], required): Array of model IDs to compare
  • tokenizationService (TokenizationService, required): Tokenization service instance
Returns (Promise<Array>): Array of comparison objects sorted by cost (cheapest first)
Comparison object structure:
  • modelId (string): Model identifier
  • company (string): Model provider (e.g., "OpenAI", "Anthropic")
  • stats (Object): Raw statistics object
  • formatted (Object): Formatted statistics for display
const calculator = new StatisticsCalculator();
const tokenizer = new TokenizationService();
await tokenizer.waitForInitialization();

const text = "Long text for comparison...";

const comparison = await calculator.compareModels(
  text,
  ['gpt-4o', 'claude-3.5-sonnet', 'llama-3.1-70b'],
  tokenizer
);

comparison.forEach(result => {
  console.log(`${result.modelId} (${result.company}):`);
  console.log(`  Tokens: ${result.formatted.tokenCount}`);
  console.log(`  Cost: ${result.formatted.costEstimate}`);
});
Comparison results are automatically sorted by cost estimate, making it easy to find the most economical model for your text.

getEfficiencyMetrics()

Calculates efficiency metrics for tokenization analysis.
getEfficiencyMetrics(stats)
Parameters:
  • stats (Object, required): Statistics object from calculateStatistics()
Returns (Object): Efficiency metrics object
Return value structure:
  • costEfficiency (number): Cost per million tokens in USD (lower is better)
  • compressionRatio (number): Tokens per character (lower = better compression)
  • verbosityIndex (number): Tokens per word (lower = more efficient encoding)
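A sketch consistent with the example values below; note that the cost figure works out to USD per million tokens, which is an inference from that example rather than a documented contract:

```javascript
// Derive efficiency ratios from a raw statistics object.
function getEfficiencyMetrics(stats) {
  return {
    costEfficiency: (stats.costEstimate / stats.tokenCount) * 1_000_000,
    compressionRatio: stats.tokenCount / stats.charCount,
    verbosityIndex: stats.tokenCount / stats.wordCount,
  };
}
```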
const calculator = new StatisticsCalculator();

const stats = {
  tokenCount: 1000,
  charCount: 4500,
  wordCount: 750,
  costEstimate: 0.0025
};

const metrics = calculator.getEfficiencyMetrics(stats);

console.log(metrics);
// {
//   costEfficiency: 2.5,        // $2.50 per 1M tokens
//   compressionRatio: 0.222,    // 0.22 tokens per character
//   verbosityIndex: 1.333       // 1.33 tokens per word
// }

Usage Examples

const calculator = new StatisticsCalculator();
const tokenizer = new TokenizationService();
await tokenizer.waitForInitialization();

const text = `
  This is a sample text for comprehensive tokenization analysis.
  We'll analyze tokens, costs, and efficiency metrics.
`;

const modelId = 'gpt-4o';

// Tokenize
const tokenResult = await tokenizer.tokenizeText(text, modelId);

// Calculate statistics
const stats = calculator.calculateStatistics(text, tokenResult, modelId);

// Get formatted display values
const formatted = calculator.formatStatistics(stats);

// Check for warnings
const warning = calculator.getContextWarning(stats.tokenCount, modelId);

// Get efficiency metrics
const efficiency = calculator.getEfficiencyMetrics(stats);

console.log('Statistics:', formatted);
if (warning) console.log('Warning:', warning);
console.log('Efficiency:', efficiency);

Statistics Interpretation

Token Count

The total number of tokens the text is divided into. This directly impacts:
  • API costs (priced per token)
  • Processing time
  • Context window usage
Typical ranges:
  • Short prompt: 10-100 tokens
  • Medium text: 100-1,000 tokens
  • Long document: 1,000-10,000+ tokens

Character Count

Total number of characters, including spaces and punctuation. Rule of thumb: English text averages ~4 characters per token.

Word Count

Number of words (whitespace-separated). Rule of thumb: English text averages ~1.3 tokens per word.

Cost Estimate

Estimated API cost for processing the text. Note: based on input pricing; output tokens cost more. Cost ranges (GPT-4o):
  • 1K tokens: ~$0.0025
  • 10K tokens: ~$0.025
  • 100K tokens: ~$0.25

Context Utilization

Percentage of the model's context window being used. Guidelines:
  • Less than 50%: Comfortable usage
  • 50-75%: Moderate usage
  • 75-90%: High usage
  • 90-100%: Near limit
  • Greater than 100%: Exceeds limit (requests will fail)

Tokens per Word

Average number of tokens per word. Typical values:
  • English: 1.3-1.5
  • Code: 1.5-2.0
  • Non-English: varies by language
Lower values indicate more efficient tokenization.
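These rules of thumb can be turned into quick back-of-envelope estimators (using common English-text averages of ~4 characters per token and ~1.3 tokens per word; actual counts depend on the tokenizer and content):

```javascript
// Rough heuristics only; real counts come from the tokenizer.
const estimateTokensFromChars = charCount => Math.round(charCount / 4);
const estimateTokensFromWords = wordCount => Math.round(wordCount * 1.33);
```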

Cost Optimization Tips

Choose Efficient Models

Compare models to find the best token-to-cost ratio for your use case

Minimize Prompt Length

Remove unnecessary context and instructions to reduce token count

Use Smaller Models

Consider mini variants (e.g., gpt-4o-mini) for simpler tasks

Batch Requests

Process multiple items in one request to reduce per-request overhead

See Also

TokenAnalyzer

Main application orchestrator

TokenizationService

Tokenization engine

UIController

UI management

Supported Models

View all model pricing
