## Executive Summary

### Best Overall: Naive Bayes (alpha=0.5)

- 99.92% accuracy
- 0.03s training time
- 29.84 MB model size
- Best for production
### Most Efficient: Linear SVM

- 99.77% accuracy
- 14.92 MB model size
- Smallest footprint among the traditional ML models
- Fast inference
## Overall Performance Comparison

### Accuracy Rankings
| Rank | Model | Validation Accuracy | Test Accuracy | Misclassifications (out of 7,350) |
|---|---|---|---|---|
| 🥇 1 | Naive Bayes | 99.92% | 99.92% | 6 |
| 🥈 2 | SVM | 99.77% | - | ~17 |
| 🥉 3 | Logistic Regression | 99.56% | - | ~32 |
| 4 | Random Forest | 99.41% | - | ~43 |
| 5 | BiLSTM | 94.08% | 93.67% | 465 |
| 6 | LSTM | 94.07% | - | 436 |
### Training Time Rankings
| Rank | Model | Training Time | Relative Speed |
|---|---|---|---|
| 🥇 1 | Naive Bayes | 0.03s | 1x (baseline) |
| 🥈 2 | SVM | 0.59s | 20x slower |
| 🥉 3 | Logistic Regression | 14.73s | 491x slower |
| 4 | Random Forest | 128s | 4,267x slower |
| 5 | LSTM | ~1,312s (15 epochs × 82s) | 43,733x slower |
| 6 | BiLSTM | ~2,070s (15 epochs × 138s) | 69,000x slower |
### Model Size Rankings
| Rank | Model | Model Size | Relative Size |
|---|---|---|---|
| 🥇 1 | BiLSTM | 0.62 MB | 1x (smallest) |
| 🥈 2 | LSTM | 0.62 MB | 1x |
| 🥉 3 | SVM | 14.92 MB | 24x larger |
| 4 | Logistic Regression | 14.92 MB | 24x larger |
| 5 | Naive Bayes | 29.84 MB | 48x larger |
| 6 | Random Forest | ~230 MB | 371x larger |
## Detailed Performance Metrics

### Traditional ML Models
- Naive Bayes
- SVM
- Random Forest
- Logistic Regression
#### Naive Bayes (Best Overall Model)

Hyperparameters:

- Alpha: 0.5 (smoothing parameter)
- Vectorizer: TF-IDF with word n-grams (1, 2)

Metrics:

- Validation Accuracy: 99.92%
- Test Accuracy: 99.92%
- Training Time: 0.03s
- Inference Time: 0.01s per batch
- Model Size: 29.84 MB
Per-language F1 scores:

- Swedish (sv): 1.0000 ✓
- Dutch (nl): 1.0000 ✓
- Portuguese (pt): 0.9995
- Italian (it): 0.9991
- French (fr): 0.9986
- Spanish (es): 0.9986
- German (de): 0.9985
Strengths:

- Highest accuracy across all models
- Extremely fast training and inference
- Robust performance on all languages
- Production-ready out of the box

Limitations:

- Larger model size than SVM
- Assumes feature independence (not an issue in practice)
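The winning configuration maps directly onto scikit-learn. Below is a minimal sketch of that setup; the tiny Dutch/German corpus and labels are illustrative placeholders, not the project's actual training data:

```python
# Sketch: TF-IDF with word 1-2 grams feeding Multinomial Naive Bayes (alpha=0.5).
# The toy corpus below is illustrative only; substitute the real dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "de kat zit op de mat", "de hond loopt snel",          # Dutch examples
    "die Katze sitzt auf der Matte", "der Hund läuft schnell",  # German examples
]
labels = ["nl", "nl", "de", "de"]

model = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),  # word n-grams (1, 2)
    MultinomialNB(alpha=0.5),                              # smoothing from the report
)
model.fit(texts, labels)
print(model.predict(["die Katze läuft"]))  # expected: 'de' on this toy corpus
```

Because both steps are a single matrix multiply plus a log-sum at inference time, this pipeline retrains and predicts in fractions of a second, consistent with the timings above.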
### Deep Learning Models
- BiLSTM
- LSTM
#### BiLSTM (Best Deep Learning Model)

Architecture:

- Embedding: 10K vocab, 100 dimensions
- Bidirectional LSTM: 32 units (64 total)
- Dense layer: 64 units with ReLU
- Output: 7 units with softmax
- Dropout: 0.3 rate

Metrics:

- Validation Accuracy: 94.08%
- Test Accuracy: 93.67%
- Training Time: ~2,070s (15 epochs)
- Inference Time: 30ms per sample
- Model Size: 0.62 MB
Per-language F1 scores:

- Swedish (sv): 0.9747
- German (de): 0.9659 ✓ (best DL)
- Dutch (nl): 0.9645
- Portuguese (pt): 0.9368
- French (fr): 0.9365
- Spanish (es): 0.9272
- Italian (it): 0.8853
Strengths:

- Bidirectional context understanding
- Smallest model size (0.62 MB)
- Learns word embeddings
- Better on German, French, and Spanish than LSTM

Limitations:

- ~5.8 percentage points lower accuracy than Naive Bayes
- 2x slower inference than LSTM
- Long training time
- Poor performance on Italian
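The architecture above implies roughly a million trainable parameters, almost all in the embedding layer. A back-of-the-envelope check using the standard Keras parameter formulas (layer sizes taken from the architecture list; everything else is arithmetic):

```python
# Parameter counts for the BiLSTM described above, via standard Keras formulas.
vocab, emb_dim, units, dense_units, classes = 10_000, 100, 32, 64, 7

embedding = vocab * emb_dim                             # 1,000,000
# LSTM: 4 gates, each with input weights, recurrent weights, and a bias.
lstm_one_dir = 4 * (units * (emb_dim + units) + units)  # 17,024
bilstm = 2 * lstm_one_dir                               # 34,048 for both directions
dense = (2 * units) * dense_units + dense_units         # 4,160 (concat input is 64)
output = dense_units * classes + classes                # 455

total = embedding + bilstm + dense + output
print(total)  # 1,038,663 parameters; the embedding layer dominates
```

Note that ~1M float32 parameters would occupy roughly 4 MB uncompressed, so the 0.62 MB figure reported above presumably reflects compression or quantization in the saved format.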
## Vectorization Techniques Comparison

Before settling on word-level TF-IDF, we compared multiple vectorization approaches:

| Technique | Vectorization Time | Accuracy | Model Size |
|---|---|---|---|
| Word TF-IDF | 1.54s | 99.89% | 29.84 MB |
| Character TF-IDF | 4.99s | 99.88% | 9.66 MB |
| Hashing | 4.96s | 99.81% | Variable |
| Letter Frequency | 0.86s | 86.80% | 0.00 MB |
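The letter-frequency baseline in the last row is the simplest of the four: each text becomes a fixed-length vector of normalized character frequencies, so there is no vocabulary to store (hence the 0.00 MB size) at the cost of much lower accuracy. A minimal sketch, where the a-z alphabet choice is an assumption (accented characters are simply ignored here):

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase  # assumption: a-z only; accents are dropped

def letter_frequencies(text: str) -> list[float]:
    """Map text to a fixed 26-dim vector of normalized letter frequencies."""
    counts = Counter(c for c in text.lower() if c in ALPHABET)
    total = sum(counts.values()) or 1  # guard against empty input
    return [counts[letter] / total for letter in ALPHABET]

vec = letter_frequencies("Hello")  # 'hello': e=1, h=1, l=2, o=1 out of 5 letters
```

The feature space is fixed in advance, which is why vectorization is the fastest of the four techniques, but single-letter statistics cannot separate closely related languages as well as word or character n-grams.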
## Per-Language Performance Analysis

### Traditional ML (Naive Bayes) vs Deep Learning (BiLSTM)

Results are broken out per language (Swedish, Dutch, Portuguese, Italian, French, German, and Spanish). The table below shows Swedish (sv):
| Model | Precision | Recall | F1-Score |
|---|---|---|---|
| Naive Bayes | 1.0000 | 1.0000 | 1.0000 |
| BiLSTM | 0.9939 | 0.9562 | 0.9747 |
| Difference | -0.61% | -4.38% | -2.53% |
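The F1 scores in the table are the harmonic mean of precision and recall; the BiLSTM row can be checked directly:

```python
def f1(precision: float, recall: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# BiLSTM on Swedish, from the table above:
print(round(f1(0.9939, 0.9562), 4))  # 0.9747
```

The harmonic mean punishes imbalance, which is why the BiLSTM's 4.38-point recall gap drags its F1 down more than the precision gap alone would suggest.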
## Common Misclassifications

### Naive Bayes Misclassifications (6 total)

### Deep Learning Misclassifications (436–465 total)
## Use Case Recommendations

### Production Systems

Recommended: Naive Bayes
- Highest accuracy (99.92%)
- Fast inference (<10ms)
- Reliable and battle-tested
- Easy to deploy
### Resource-Constrained Environments

Recommended: SVM or BiLSTM
- SVM: 14.92 MB, 99.77% accuracy
- BiLSTM: 0.62 MB, 93.67% accuracy
- Trade accuracy for size based on needs
### Real-Time Processing

Recommended: Naive Bayes or SVM
- Inference: <1ms per prediction
- Batch processing: thousands/second
- Low latency requirements
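The throughput claim is easy to sanity-check by timing batched predictions. A sketch using an illustrative pipeline (substitute the actual trained model; the corpus here is a placeholder):

```python
import time
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative stand-in for the trained Naive Bayes model.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB(alpha=0.5))
model.fit(["de kat zit op de mat", "die Katze sitzt auf der Matte"], ["nl", "de"])

batch = ["de hond loopt snel"] * 10_000
start = time.perf_counter()
preds = model.predict(batch)  # vectorize + predict the whole batch at once
elapsed = time.perf_counter() - start
print(f"{len(batch) / elapsed:,.0f} predictions/second")
```

Batching matters here: per-call overhead is amortized across the batch, which is how sub-millisecond amortized latency per prediction is achievable.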
### Research & Embeddings

Recommended: BiLSTM
- Learned embeddings reusable
- Interpretable attention weights
- Sequential pattern analysis
## Training Cost Analysis

Assuming training on a standard machine (GPU for deep learning):

| Model | Training Time | Est. Cloud Cost* | Retraining Frequency |
| Naive Bayes | 0.03s | $0.00 | Daily if needed |
| SVM | 0.59s | $0.00 | Daily if needed |
| Logistic Regression | 14.73s | $0.00 | Daily if needed |
| Random Forest | 128s | $0.01 | Weekly |
| LSTM | ~22 min | $0.10 | Monthly |
| BiLSTM | ~35 min | $0.15 | Monthly |
Traditional ML models can be retrained in seconds, making them ideal for continuous learning scenarios. Deep learning models require minutes to hours and are better suited for less frequent retraining.
## Final Recommendations

- For most users: Naive Bayes (highest accuracy, fastest to train and deploy)
- For embedded systems: BiLSTM (smallest model at 0.62 MB)
- For research: BiLSTM (reusable learned embeddings)

### Summary Table
| Criteria | Best Model | Runner-Up |
|---|---|---|
| Accuracy | Naive Bayes (99.92%) | SVM (99.77%) |
| Training Speed | Naive Bayes (0.03s) | SVM (0.59s) |
| Inference Speed | SVM (<0.01s) | Naive Bayes (0.01s) |
| Model Size | BiLSTM (0.62 MB) | LSTM (0.62 MB) |
| Production Ready | Naive Bayes | SVM |
| Embeddings | BiLSTM | LSTM |
| Overall Winner | Naive Bayes | SVM |
Naive Bayes with alpha=0.5 is the recommended model for production use, achieving 99.92% accuracy with fast training and inference times.
## Next Steps

- Training Guide — learn how to train your own models
- Using Models — start making predictions
- Model Details — deep dive into the model implementations
- API Reference — integrate the models into your application