# Available Models
Our system provides two categories of models, each with different trade-offs:

- **Traditional ML Models**: fast, efficient, and highly accurate classical machine learning approaches
- **Deep Learning Models**: neural network architectures that learn sequential patterns in text
## Quick Comparison
| Model | Accuracy | Training Time | Inference Speed | Model Size |
|---|---|---|---|---|
| Naive Bayes | 99.92% | ~0.03s | ~0.01s | 29.84 MB |
| SVM | 99.77% | ~0.59s | ~0.00s | 14.92 MB |
| Random Forest | 99.41% | ~128s | ~0.66s | 230 MB |
| LSTM | 94.07% | ~15 epochs | ~15ms/sample | 618 KB |
| BiLSTM | 94.08% | ~15 epochs | ~30ms/sample | 618 KB |
All models were trained on the Europarl Parallel Corpus with 7,000 samples per language.
## Model Selection Guide
### When to Use Traditional ML
Choose traditional ML models when you need:

- Maximum accuracy (>99.9% on the validation set)
- Fast training (seconds instead of minutes)
- Quick inference (milliseconds per prediction)
- Production-ready performance with minimal resources
Naive Bayes with `alpha=0.5` achieves 99.92% accuracy and offers the best balance of speed and accuracy.
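As a minimal sketch, this configuration maps onto scikit-learn as follows; the training texts and labels below are toy placeholders, not the Europarl data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the Europarl training samples.
texts = ["the quick brown fox", "el zorro marrón rápido", "der schnelle braune Fuchs"]
labels = ["en", "es", "de"]

# Word-level unigrams and bigrams feeding a smoothed multinomial Naive Bayes.
model = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
    MultinomialNB(alpha=0.5),  # the smoothing value reported as best above
)
model.fit(texts, labels)
print(model.predict(["the lazy dog"])[0])
```

Here `alpha` is the Lidstone smoothing parameter of `MultinomialNB`; the three-sentence corpus only illustrates the wiring.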
### When to Use Deep Learning
Choose deep learning models when you:

- Want to learn sequential patterns in text
- Need interpretable embeddings for downstream tasks
- Are working with noisy or out-of-vocabulary words
- Have sufficient computational resources for training
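Before an LSTM or BiLSTM can learn sequential patterns, text must be tokenized, mapped to integer ids, and padded to a fixed length. A framework-free sketch, where the id scheme (0 = padding, 1 = out-of-vocabulary) is an illustrative assumption rather than this project's actual tokenizer:

```python
from collections import Counter

def build_vocab(texts, max_size=50_000):
    """Map the most frequent tokens to ids; 0 is reserved for padding, 1 for OOV."""
    counts = Counter(tok for t in texts for tok in t.split())
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_size))}

def encode(text, vocab, max_len=50):
    """Encode a sentence as ids, truncated and zero-padded to max_len."""
    ids = [vocab.get(tok, 1) for tok in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))

texts = ["the quick brown fox", "the lazy dog"]
vocab = build_vocab(texts)
print(encode("the quick dog", vocab, max_len=6))
```

Unseen tokens map to the OOV id rather than being dropped, which is one reason sequence models can cope with noisy or out-of-vocabulary words.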
## Feature Extraction
Traditional ML models use TF-IDF vectorization with word-level n-grams, while deep learning models operate on fixed-length token sequences:

- Analyzer: word-based
- N-gram range: (1, 2) - unigrams and bigrams
- Vocabulary size: ~50K features for traditional ML
- Sequence length: 95th percentile of training data for deep learning (~50 tokens)
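The settings above can be reproduced with scikit-learn's `TfidfVectorizer`, and the deep-learning sequence length with a NumPy percentile. The corpora here are toy examples, and `max_features=50_000` stands in for the ~50K vocabulary cap:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF configuration matching the settings listed above.
vectorizer = TfidfVectorizer(
    analyzer="word",      # word-based analyzer
    ngram_range=(1, 2),   # unigrams and bigrams
    max_features=50_000,  # cap the vocabulary at ~50K features
)
X = vectorizer.fit_transform(["the quick brown fox", "the lazy dog sleeps"])
print(X.shape)  # (n_documents, n_features)

# For deep learning, pick the sequence length at the 95th percentile of token counts.
lengths = [len(t.split()) for t in ["a b c", "a b", "a b c d e"]]
max_len = int(np.percentile(lengths, 95))
print(max_len)
```

Truncating at the 95th percentile instead of the maximum keeps a handful of unusually long sentences from inflating every padded sequence.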
## Performance Metrics
### Best Performing Models by Language
| Language | Best Traditional ML | Best Deep Learning |
|---|---|---|
| Spanish (es) | NB: 0.9986 F1 | BiLSTM: 0.9272 F1 |
| German (de) | NB: 0.9985 F1 | BiLSTM: 0.9659 F1 |
| French (fr) | NB: 0.9995 F1 | BiLSTM: 0.9365 F1 |
| Italian (it) | NB: 0.9995 F1 | LSTM: 0.8969 F1 |
| Dutch (nl) | NB: 0.9995 F1 | LSTM: 0.9693 F1 |
| Portuguese (pt) | NB: 0.9986 F1 | LSTM: 0.9369 F1 |
| Swedish (sv) | NB: 1.0000 F1 | BiLSTM: 0.9747 F1 |
Traditional ML models (NB = Naive Bayes in the table above) consistently outperform deep learning models on this task, owing to the structured nature of the Europarl corpus.
## Next Steps
- **Traditional ML Details**: explore Naive Bayes, SVM, and Random Forest implementations
- **Deep Learning Details**: learn about LSTM and BiLSTM architectures
- **Model Comparison**: see detailed performance comparisons and metrics
- **Training Guide**: learn how to train your own models