Overview
The StatisticsCalculator class provides comprehensive statistical analysis for tokenized text. It calculates token counts, character counts, word counts, cost estimates, context utilization, and provides model comparison capabilities.
This calculator works with data from 48 AI models and provides accurate cost estimates based on current pricing.
Constructor
Creates a new StatisticsCalculator instance.
const calculator = new StatisticsCalculator();
The calculator is stateless and can be reused for multiple calculations.
Methods
calculateStatistics()
Calculates comprehensive statistics for the given text and model.
calculateStatistics(text, tokenResult, modelId)
Parameters:
- text (string, required): The text to analyze
- tokenResult (object, required): Result object from TokenizationService.tokenizeText()
- modelId (string, required): Model identifier (e.g., "gpt-4o", "claude-3.5-sonnet")

Returns: Comprehensive statistics object.

Return value structure:
- tokenCount: Total number of tokens
- charCount: Total number of characters
- wordCount: Total number of words
- costEstimate: Estimated cost in USD for input tokens
- contextUtilization: Percentage of context window used (0-100)
- tokensPerWord: Average tokens-per-word ratio
- inputCostPer1M: Cost per 1M input tokens in USD
- outputCostPer1M: Cost per 1M output tokens in USD
Basic Usage
const calculator = new StatisticsCalculator();
const tokenizer = new TokenizationService();

const text = "Hello world! This is a test.";
const tokenResult = await tokenizer.tokenizeText(text, 'gpt-4o');
const stats = calculator.calculateStatistics(text, tokenResult, 'gpt-4o');

console.log(stats);
// {
//   tokenCount: 8,
//   charCount: 28,
//   wordCount: 6,
//   costEstimate: 0.00002,
//   contextUtilization: 0.00625,
//   tokensPerWord: 1.33,
//   inputCostPer1M: 2.50,
//   outputCostPer1M: 10.00
// }
countWords()
Counts words in text using whitespace-based word boundary detection.

countWords(text)

Parameters:
- text (string, required): The text to count words in

Returns: Number of words (0 for empty text)

Algorithm:
1. Trim leading and trailing whitespace
2. Split on whitespace characters (\s+)
3. Filter out empty strings
4. Return the count
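The steps above can be sketched as a plain function. This is an assumed implementation matching the documented algorithm, not the class's actual source:

```javascript
// Assumed implementation of countWords(), following the documented algorithm.
function countWords(text) {
  const trimmed = text.trim();           // 1. trim surrounding whitespace
  if (trimmed === '') return 0;          // empty input -> 0 words
  return trimmed
    .split(/\s+/)                        // 2. split on runs of whitespace
    .filter(word => word.length > 0)     // 3. drop empty strings
    .length;                             // 4. return the count
}

console.log(countWords('  Multiple   spaces   between  ')); // 3
```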
const calculator = new StatisticsCalculator();

console.log(calculator.countWords('Hello world'));
// 2

console.log(calculator.countWords('  Multiple   spaces   between  '));
// 3

console.log(calculator.countWords(''));
// 0

console.log(calculator.countWords('One-hyphenated-word'));
// 1
calculateCost()
Calculates estimated cost based on token count and model pricing.
calculateCost(tokenCount, modelInfo)

Parameters:
- tokenCount (number, required): Number of tokens
- modelInfo (object, required): Model information object from MODELS_DATA

Cost calculation formula:

cost = (tokenCount / 1,000,000) × inputCostPer1M
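The formula translates directly into code. A sketch, assuming modelInfo carries an inputCostPer1M field as the return structure of calculateStatistics() suggests; the pricing object here is illustrative, not pulled from MODELS_DATA:

```javascript
// Sketch of the documented cost formula; not the class's actual source.
function calculateCost(tokenCount, modelInfo) {
  return (tokenCount / 1_000_000) * modelInfo.inputCostPer1M;
}

const gpt4oLike = { inputCostPer1M: 2.50 }; // illustrative pricing object
console.log(calculateCost(1_000_000, gpt4oLike)); // 2.5
```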
const calculator = new StatisticsCalculator();
const modelInfo = MODELS_DATA['gpt-4o'];

// Calculate cost for 1,000 tokens
const cost1k = calculator.calculateCost(1000, modelInfo);
console.log(`1K tokens: $${cost1k.toFixed(6)}`);
// 1K tokens: $0.002500

// Calculate cost for 1,000,000 tokens
const cost1m = calculator.calculateCost(1000000, modelInfo);
console.log(`1M tokens: $${cost1m.toFixed(2)}`);
// 1M tokens: $2.50
Cost estimates are based on input token pricing. Output tokens typically cost more.
calculateContextUtilization()
Calculates the percentage of the model’s context window being used.
calculateContextUtilization(tokenCount, contextLimit)

Parameters:
- tokenCount (number, required): Number of tokens in the text
- contextLimit (number, required): Maximum context window size for the model

Returns: Percentage from 0 to 100 (capped at 100)
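A minimal sketch of the calculation, assuming the documented capping behavior:

```javascript
// Sketch: percentage of the context window used, capped at 100.
function calculateContextUtilization(tokenCount, contextLimit) {
  return Math.min((tokenCount / contextLimit) * 100, 100);
}

console.log(calculateContextUtilization(1000, 128000)); // 0.78125
```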
const calculator = new StatisticsCalculator();

// GPT-4o has a 128K context window
console.log(calculator.calculateContextUtilization(1000, 128000));
// 0.78 (less than 1%)

console.log(calculator.calculateContextUtilization(64000, 128000));
// 50.0 (half the context)

console.log(calculator.calculateContextUtilization(128000, 128000));
// 100.0 (full context)

console.log(calculator.calculateContextUtilization(150000, 128000));
// 100.0 (capped at 100%; actual usage exceeds the limit)
exceedsContextLimit()
Checks if token count exceeds the model’s context limit.
exceedsContextLimit(tokenCount, modelId)

Parameters:
- tokenCount (number, required): Number of tokens
- modelId (string, required): Model identifier

Returns: true if the token count exceeds the limit, false otherwise
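Internally this amounts to a limit lookup plus a comparison. The sketch below uses a hypothetical CONTEXT_LIMITS table standing in for the real model data; the limit values are assumptions for illustration:

```javascript
// Hypothetical lookup table standing in for the real model data.
const CONTEXT_LIMITS = { 'gpt-4o': 128000, 'gpt-3.5-turbo': 16385 };

function exceedsContextLimit(tokenCount, modelId) {
  const limit = CONTEXT_LIMITS[modelId];
  return limit !== undefined && tokenCount > limit;
}
```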
const calculator = new StatisticsCalculator();

// GPT-4o has a 128K context limit
console.log(calculator.exceedsContextLimit(100000, 'gpt-4o'));
// false

console.log(calculator.exceedsContextLimit(150000, 'gpt-4o'));
// true

// GPT-3.5 has a 16K context limit
console.log(calculator.exceedsContextLimit(20000, 'gpt-3.5-turbo'));
// true
getContextWarning()
Returns a warning message if context usage is high or exceeded.
getContextWarning(tokenCount, modelId)

Parameters:
- tokenCount (number, required): Number of tokens
- modelId (string, required): Model identifier

Returns: Warning message string, or null if no warning is needed

Warning thresholds:
- 100%+: Exceeded
- 90-99%: Near limit
- 75-89%: High usage
- Under 75%: No warning
- "⚠️ Texto excede el límite de contexto del modelo (128,000 tokens)": text exceeds the model's maximum context window.
- "⚠️ Cerca del límite de contexto (95.5% utilizado)": approaching the context limit; may cause issues.
- "ℹ️ Alto uso del contexto (82.3% utilizado)": high context usage; consider splitting the text.
- null: context usage is acceptable; no warning needed.
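The threshold ladder maps naturally onto a chain of comparisons. A sketch that takes the context limit directly rather than a modelId (the real method resolves the limit from model data); the messages mirror the documented strings:

```javascript
// Sketch of the documented threshold ladder; assumed implementation.
function getContextWarning(tokenCount, contextLimit) {
  const pct = (tokenCount / contextLimit) * 100;
  if (pct >= 100) return `⚠️ Texto excede el límite de contexto del modelo (${contextLimit.toLocaleString('en-US')} tokens)`;
  if (pct >= 90) return `⚠️ Cerca del límite de contexto (${pct.toFixed(1)}% utilizado)`;
  if (pct >= 75) return `ℹ️ Alto uso del contexto (${pct.toFixed(1)}% utilizado)`;
  return null; // under 75%: no warning
}
```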
const calculator = new StatisticsCalculator();

// GPT-4o: 128K context
console.log(calculator.getContextWarning(50000, 'gpt-4o'));
// null (39% usage)

console.log(calculator.getContextWarning(100000, 'gpt-4o'));
// "ℹ️ Alto uso del contexto (78.1% utilizado)"

console.log(calculator.getContextWarning(120000, 'gpt-4o'));
// "⚠️ Cerca del límite de contexto (93.8% utilizado)"

console.log(calculator.getContextWarning(150000, 'gpt-4o'));
// "⚠️ Texto excede el límite de contexto del modelo (128,000 tokens)"
formatStatistics()

Formats statistics for display with proper localization and units.

formatStatistics(stats)

Parameters:
- stats (object, required): Raw statistics object from calculateStatistics()

Returns: Formatted statistics object with string values
const calculator = new StatisticsCalculator();

const rawStats = {
  tokenCount: 15847,
  charCount: 72456,
  wordCount: 11234,
  costEstimate: 0.03961175,
  contextUtilization: 12.380469,
  tokensPerWord: 1.410987,
  inputCostPer1M: 2.50,
  outputCostPer1M: 10.00
};

const formatted = calculator.formatStatistics(rawStats);
console.log(formatted);
// {
//   tokenCount: "15,847",
//   charCount: "72,456",
//   wordCount: "11,234",
//   costEstimate: "$0.039612",
//   contextUtilization: "12.4%",
//   tokensPerWord: "1.41",
//   inputCostPer1M: "$2.50/1M",
//   outputCostPer1M: "$10.00/1M"
// }
Use formatted statistics for displaying in UI. They include proper thousand separators, currency symbols, and percentage signs.
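The formatting rules implied by the example output (thousand separators, six-decimal dollar amounts, one-decimal percentages) can be sketched as follows. This is an assumed implementation that reproduces the documented output shape, not the class's source:

```javascript
// Sketch reproducing the documented formatting rules.
function formatStatistics(stats) {
  const group = (v) => v.toLocaleString('en-US'); // thousand separators
  return {
    tokenCount: group(stats.tokenCount),
    charCount: group(stats.charCount),
    wordCount: group(stats.wordCount),
    costEstimate: `$${stats.costEstimate.toFixed(6)}`,             // six decimals
    contextUtilization: `${stats.contextUtilization.toFixed(1)}%`, // one decimal
    tokensPerWord: stats.tokensPerWord.toFixed(2),
    inputCostPer1M: `$${stats.inputCostPer1M.toFixed(2)}/1M`,
    outputCostPer1M: `$${stats.outputCostPer1M.toFixed(2)}/1M`,
  };
}
```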
compareModels()
Compares tokenization statistics across multiple models.
async compareModels(text, modelIds, tokenizationService)

Parameters:
- text (string, required): Text to compare
- modelIds (Array<string>, required): Array of model IDs to compare
- tokenizationService (TokenizationService, required): Tokenization service instance

Returns: Array of comparison objects sorted by cost (cheapest first)

Comparison object structure:
- modelId: Model identifier
- company: Model provider (e.g., "OpenAI", "Anthropic")
- formatted: Formatted statistics for display
Basic Comparison
const calculator = new StatisticsCalculator();
const tokenizer = new TokenizationService();
await tokenizer.waitForInitialization();

const text = "Long text for comparison...";

const comparison = await calculator.compareModels(
  text,
  ['gpt-4o', 'claude-3.5-sonnet', 'llama-3.1-70b'],
  tokenizer
);

comparison.forEach(result => {
  console.log(`${result.modelId} (${result.company}):`);
  console.log(`  Tokens: ${result.formatted.tokenCount}`);
  console.log(`  Cost: ${result.formatted.costEstimate}`);
});
Comparison results are automatically sorted by cost estimate, making it easy to find the most economical model for your text.
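The cheapest-first ordering boils down to an ascending sort on the cost estimate. A self-contained sketch of that step; the cost values here are invented for illustration, not real pricing:

```javascript
// Stubbed comparison results; cost values are invented for illustration.
const results = [
  { modelId: 'gpt-4o', costEstimate: 0.0025 },
  { modelId: 'llama-3.1-70b', costEstimate: 0.0009 },
  { modelId: 'claude-3.5-sonnet', costEstimate: 0.0030 },
];

// Sort ascending by cost so the cheapest model comes first.
const sorted = [...results].sort((a, b) => a.costEstimate - b.costEstimate);
console.log(sorted.map(r => r.modelId));
// [ 'llama-3.1-70b', 'gpt-4o', 'claude-3.5-sonnet' ]
```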
getEfficiencyMetrics()
Calculates efficiency metrics for tokenization analysis.
getEfficiencyMetrics(stats)

Parameters:
- stats (object, required): Statistics object from calculateStatistics()

Returns: Efficiency metrics object

Return value structure:
- costEfficiency: Cost per million tokens (lower is better)
- compressionRatio: Tokens per character (lower = better compression)
- verbosityIndex: Tokens per word (lower = more efficient encoding)
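Each metric is a simple ratio of fields from calculateStatistics(). A sketch, assumed but consistent with the documented example values:

```javascript
// Sketch of the efficiency ratios; assumed implementation.
function getEfficiencyMetrics(stats) {
  return {
    costEfficiency: (stats.costEstimate / stats.tokenCount) * 1_000_000, // USD per 1M tokens
    compressionRatio: stats.tokenCount / stats.charCount,                // tokens per character
    verbosityIndex: stats.tokenCount / stats.wordCount,                  // tokens per word
  };
}
```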
Example
const calculator = new StatisticsCalculator();

const stats = {
  tokenCount: 1000,
  charCount: 4500,
  wordCount: 750,
  costEstimate: 0.0025
};

const metrics = calculator.getEfficiencyMetrics(stats);
console.log(metrics);
// {
//   costEfficiency: 2.5,      // $2.50 per 1M tokens
//   compressionRatio: 0.222,  // 0.22 tokens per character
//   verbosityIndex: 1.333     // 1.33 tokens per word
// }
Usage Examples
Complete Analysis
const calculator = new StatisticsCalculator();
const tokenizer = new TokenizationService();
await tokenizer.waitForInitialization();

const text = `
This is a sample text for comprehensive tokenization analysis.
We'll analyze tokens, costs, and efficiency metrics.
`;
const modelId = 'gpt-4o';

// Tokenize
const tokenResult = await tokenizer.tokenizeText(text, modelId);

// Calculate statistics
const stats = calculator.calculateStatistics(text, tokenResult, modelId);

// Get formatted display values
const formatted = calculator.formatStatistics(stats);

// Check for warnings
const warning = calculator.getContextWarning(stats.tokenCount, modelId);

// Get efficiency metrics
const efficiency = calculator.getEfficiencyMetrics(stats);

console.log('Statistics:', formatted);
if (warning) console.log('Warning:', warning);
console.log('Efficiency:', efficiency);
Statistics Interpretation
tokenCount

The total number of tokens the text is divided into. This directly impacts:
API costs (priced per token)
Processing time
Context window usage
Typical ranges:
Short prompt: 10-100 tokens
Medium text: 100-1,000 tokens
Long document: 1,000-10,000+ tokens
charCount

Total number of characters, including spaces and punctuation. Rule of thumb: English text averages ~4 characters per token.
wordCount

Number of words (whitespace-separated). Rule of thumb: English text averages ~0.75 words per token, i.e. roughly 1.3 tokens per word.
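These rules of thumb support quick back-of-envelope token estimates before running a tokenizer. The heuristics below are rough approximations for English text, not replacements for tokenizeText():

```javascript
// Rough token estimators for English text; heuristic, not a tokenizer.
function estimateTokensFromChars(charCount) {
  return Math.ceil(charCount / 4);     // ~4 characters per token
}

function estimateTokensFromWords(wordCount) {
  return Math.ceil(wordCount * 1.33);  // ~1.33 tokens per word
}

console.log(estimateTokensFromChars(4000)); // 1000
```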
costEstimate

Estimated API cost for processing the text. Note: based on input pricing; output tokens cost more.

Cost ranges (GPT-4o):
1K tokens: ~$0.0025
10K tokens: ~$0.025
100K tokens: ~$0.25
contextUtilization

Percentage of the model’s context window being used. Guidelines:
Less than 50%: Comfortable usage
50-75%: Moderate usage
75-90%: High usage
90-100%: Near limit
Greater than 100%: Exceeds limit (will fail)
tokensPerWord

Average number of tokens per word. Typical values:
English: 1.3-1.5
Code: 1.5-2.0
Non-English: varies by language
Lower values indicate more efficient tokenization.
Cost Optimization Tips
- Choose Efficient Models: compare models to find the best token-to-cost ratio for your use case
- Minimize Prompt Length: remove unnecessary context and instructions to reduce token count
- Use Smaller Models: consider mini variants (e.g., gpt-4o-mini) for simpler tasks
- Batch Requests: process multiple items in one request to reduce per-request overhead
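To see what switching to a smaller model can save, combine the cost formula with two candidate prices. The prices below are example figures for illustration, not values taken from MODELS_DATA:

```javascript
// Illustrative monthly-cost comparison; prices are example values
// in USD per 1M input tokens, not real MODELS_DATA entries.
const PRICES = { 'gpt-4o': 2.50, 'gpt-4o-mini': 0.15 };

function monthlyCost(tokensPerMonth, pricePer1M) {
  return (tokensPerMonth / 1_000_000) * pricePer1M;
}

const full = monthlyCost(50_000_000, PRICES['gpt-4o']);      // 125
const mini = monthlyCost(50_000_000, PRICES['gpt-4o-mini']); // 7.5
console.log(`Switching saves $${(full - mini).toFixed(2)}/month`);
```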
See Also
- TokenAnalyzer: Main application orchestrator
- TokenizationService: Tokenization engine
- UIController: UI management
- Supported Models: View all model pricing