Getting Started
Select Your Model
Choose from 48 AI models across different providers. The model selector is organized by company. Each model has different tokenization characteristics, costs, and context limits.
Enter Your Text
Type or paste your text into the input area. The analyzer processes tokens in real time as you type.
Tokenization happens automatically with minimal delay, providing instant feedback on token count and costs.
Understanding the Interface
Statistics Dashboard
The stats grid shows four key metrics updated in real time:
Total Tokens
Precise token count calculated using the tiktoken library, matching the actual tokenization used by AI providers.
Characters
Total character count including spaces, useful for understanding compression ratios.
Words
Intelligent word counting using whitespace separation.
Estimated Cost
Real-time cost calculation based on current pricing per 1M tokens.
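A minimal sketch of how these four metrics could be computed (the function and variable names here are illustrative, not the tool's actual code; the token count itself would come from the tokenizer):

```javascript
// Illustrative sketch of the four dashboard metrics; not the tool's real code.
// tokenCount would come from the tokenizer (tiktoken).
function computeStats(text, tokenCount, pricePerMillionTokens) {
  const characters = text.length; // includes spaces
  const trimmed = text.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length; // whitespace separation
  const estimatedCost = (tokenCount / 1_000_000) * pricePerMillionTokens;
  return { tokens: tokenCount, characters, words, estimatedCost };
}

const stats = computeStats("Hello, token world!", 5, 2.5);
// stats.characters === 19, stats.words === 3
```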
Model Information Panel
After entering text, you’ll see detailed model information.
Token Visualization Features
Color-Coded Tokens
Tokens are displayed with different visual styles based on their type:
Token Type Color Coding
The tokenization service categorizes tokens into different types for better visualization:
- palabra (word) - Regular word tokens
- palabra_con_espacio - Words that include leading spaces
- subword - Parts of longer words (BPE sub-tokens)
- espacio_en_blanco (whitespace) - Space and tab tokens
- number - Numeric tokens
- punctuation - Punctuation marks
- special - Special characters and symbols
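As an illustration only (the service's actual logic may differ), a classifier over these categories might look like the following. Subword detection is omitted because it requires BPE context (knowing whether a piece sits mid-word), which a real tokenizer tracks while decoding:

```javascript
// Illustrative token classifier mirroring the categories listed above;
// not the actual tokenization service code. Subword detection omitted.
function classifyToken(token) {
  if (/^\s+$/.test(token)) return "espacio_en_blanco"; // spaces and tabs
  if (/^\d+$/.test(token)) return "number";
  if (/^[.,;:!?'"()\[\]{}-]+$/.test(token)) return "punctuation";
  if (/^ [A-Za-z]+$/.test(token)) return "palabra_con_espacio"; // leading space
  if (/^[A-Za-z]+$/.test(token)) return "palabra";
  return "special"; // symbols and mixed content
}

classifyToken(" world"); // "palabra_con_espacio"
```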
Interactive Token List
The token list shows each token with its unique ID.
Advanced Features
Real-Time Cost Estimation
Cost calculation is performed using the pricing data from models-config.js:
- Input Cost - price per 1M tokens sent to the model
- Output Cost - price per 1M tokens generated by the model
The displayed cost represents the price for input tokens (text sent to the model).
Example for GPT-4o:
- Input: $2.50 per 1M tokens
- 1,000 tokens = $0.0025
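The arithmetic above can be reproduced directly. Prices are quoted per 1M input tokens; the helper name here is illustrative:

```javascript
// Input-token cost: (tokens / 1,000,000) * price-per-million-tokens.
const inputCost = (tokens, pricePerMillion) => (tokens / 1_000_000) * pricePerMillion;

inputCost(1000, 2.50); // ≈ 0.0025, matching the GPT-4o example above
```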
Context Limit Warnings
The analyzer monitors your context usage:
75-89% Usage
High context usage - consider if you have room for responses
90-99% Usage
Near context limit - very little room remaining
100%+ Usage
Exceeds context limit - text will be truncated
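The tiers above amount to a simple threshold check; a sketch (function name and return strings are illustrative, not the tool's code):

```javascript
// Map context usage to the warning tiers described above.
function contextWarning(tokens, contextLimit) {
  const pct = (tokens / contextLimit) * 100;
  if (pct >= 100) return "Exceeds context limit - text will be truncated";
  if (pct >= 90) return "Near context limit - very little room remaining";
  if (pct >= 75) return "High context usage - consider if you have room for responses";
  return null; // below 75%: no warning shown
}

contextWarning(8000, 10000); // 80% usage -> high-usage warning
```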
Tips for Effective Use
Compare Models for Cost Optimization
Different models tokenize text differently. Compare models to find the most cost-effective option:
- GPT-4o uses o200k_base encoding (newer, more efficient)
- GPT-4 uses cl100k_base encoding
- Claude models typically use ~10% more tokens
- Llama models typically use ~5% fewer tokens
Understand Token-to-Word Ratios
Monitor the tokens-per-word ratio in your statistics:
- English text: typically 1.3-1.5 tokens per word
- Code: can be 2-3+ tokens per word
- Special characters: may be individual tokens
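Checking the ratio yourself is a one-liner (names here are illustrative):

```javascript
// Tokens-per-word ratio; ~1.3-1.5 is typical for English prose.
function tokensPerWord(tokenCount, wordCount) {
  return wordCount === 0 ? 0 : tokenCount / wordCount;
}

tokensPerWord(140, 100); // 1.4 -> within the typical English range
```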
Use the Clear Button
Click Clear to empty the input area and start a fresh analysis instead of selecting and deleting text manually.
Keyboard Shortcuts
The text input area supports all standard keyboard shortcuts for text editing:
- Ctrl/Cmd + A - Select all
- Ctrl/Cmd + C - Copy
- Ctrl/Cmd + V - Paste
- Ctrl/Cmd + Z - Undo
Mobile Usage
Tokenizador is fully responsive and works on mobile devices:
- Touch-friendly interface
- Optimized model selector for mobile
- Responsive token visualization
- Full feature parity with desktop
Troubleshooting
Token IDs Show as Approximate
If you see isApproximate: true in the token data, the tiktoken library couldn’t load and the fallback tokenizer is being used. Token counts remain accurate, but specific token IDs may differ from production.
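For instance, a hypothetical check over the token data (the field name comes from above; the helper name is illustrative):

```javascript
// True only when every token carries an exact ID (no fallback approximation).
const idsAreExact = (tokens) => tokens.every((t) => !t.isApproximate);

idsAreExact([{ id: 9906 }, { id: 11, isApproximate: true }]); // false
```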
Costs Seem Different from API Usage
The analyzer calculates input token costs only. Actual API costs include:
- Output tokens (usually more expensive)
- Any additional API fees
- Volume discounts or special pricing
Model Not in Dropdown
The tool includes 48 models as of the last update. If you need a newer model:
- Use a similar model from the same provider
- Models from the same family usually share tokenization
- Check the supported models guide for the complete list
Next Steps
Supported Models
Explore all 48 models with detailed pricing and specifications
Understanding Tokenization
Learn how tokenization works and why it matters
Cost Estimation
Deep dive into how costs are calculated
GitHub Repository
View source code and contribute