This page provides a detailed comparison of all six models implemented in the Language Detection System, based on actual results from the Europarl corpus.

Executive Summary

Best Overall

Naive Bayes (alpha=0.5)
  • 99.92% accuracy
  • 0.03s training time
  • 29.84 MB model size
  • Best for production

Most Efficient

Linear SVM
  • 99.77% accuracy
  • 14.92 MB model size
  • Smallest footprint
  • Fast inference

Overall Performance Comparison

Accuracy Rankings

| Rank | Model | Validation Accuracy | Test Accuracy | Misclassifications (out of 7,350) |
|------|-------|---------------------|---------------|-----------------------------------|
| 🥇 1 | Naive Bayes | 99.92% | 99.92% | 6 |
| 🥈 2 | SVM | 99.77% | - | ~17 |
| 🥉 3 | Logistic Regression | 99.56% | - | ~32 |
| 4 | Random Forest | 99.41% | - | ~43 |
| 5 | BiLSTM | 94.08% | 93.67% | 465 |
| 6 | LSTM | 94.07% | - | 436 |
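As a sanity check, the misclassification counts in the last column follow directly from the accuracies: errors ≈ total × (1 − accuracy). A quick sketch:

```python
# Approximate misclassification counts implied by the reported accuracies
# on the 7,350-sentence evaluation set (a sanity-check sketch, not part
# of the original pipeline).
TOTAL = 7_350

def misclassified(accuracy: float, total: int = TOTAL) -> int:
    """Number of errors implied by an accuracy figure."""
    return round(total * (1 - accuracy))

print(misclassified(0.9992))  # Naive Bayes -> 6
print(misclassified(0.9977))  # SVM -> 17
print(misclassified(0.9956))  # Logistic Regression -> 32
```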

Training Time Rankings

| Rank | Model | Training Time | Relative Speed |
|------|-------|---------------|----------------|
| 🥇 1 | Naive Bayes | 0.03s | 1x (baseline) |
| 🥈 2 | SVM | 0.59s | 20x slower |
| 🥉 3 | Logistic Regression | 14.73s | 491x slower |
| 4 | Random Forest | 128s | ~4,267x slower |
| 5 | LSTM | ~1,312s (15 epochs × 82s) | ~43,733x slower |
| 6 | BiLSTM | ~2,070s (15 epochs × 138s) | ~69,000x slower |

Model Size Rankings

| Rank | Model | Model Size | Relative Size |
|------|-------|------------|---------------|
| 🥇 1 | BiLSTM | 0.62 MB | 1x (smallest) |
| 🥈 2 | LSTM | 0.62 MB | 1x |
| 🥉 3 | SVM | 14.92 MB | 24x larger |
| 4 | Logistic Regression | 14.92 MB | 24x larger |
| 5 | Naive Bayes | 29.84 MB | 48x larger |
| 6 | Random Forest | ~230 MB | ~371x larger |

Detailed Performance Metrics

Traditional ML Models

Naive Bayes (Best Overall Model)

Hyperparameters:
  • Alpha: 0.5 (smoothing parameter)
  • Vectorizer: TF-IDF with word n-grams (1, 2)
Performance:
  • Validation Accuracy: 99.92%
  • Test Accuracy: 99.92%
  • Training Time: 0.03s
  • Inference Time: 0.01s per batch
  • Model Size: 29.84 MB
Per-Language F1-Scores (Test Set):
  • Swedish (sv): 1.0000 ✓
  • Dutch (nl): 1.0000 ✓
  • Portuguese (pt): 0.9995
  • Italian (it): 0.9991
  • French (fr): 0.9986
  • Spanish (es): 0.9986
  • German (de): 0.9985
Strengths:
  • Highest accuracy across all models
  • Extremely fast training and inference
  • Robust performance on all languages
  • Production-ready out of the box
Weaknesses:
  • Larger model size than SVM
  • Assumes feature independence (not an issue in practice)

Deep Learning Models

BiLSTM (Best Deep Learning Model)

Architecture:
  • Embedding: 10K vocab, 100 dimensions
  • Bidirectional LSTM: 32 units (64 total)
  • Dense layer: 64 units with ReLU
  • Output: 7 units with softmax
  • Dropout: 0.3 rate
Performance:
  • Validation Accuracy: 94.08%
  • Test Accuracy: 93.67%
  • Training Time: ~2,070s (15 epochs)
  • Inference Time: 30ms per sample
  • Model Size: 0.62 MB
Per-Language F1-Scores (Validation):
  • Swedish (sv): 0.9747
  • German (de): 0.9659 ✓ (best DL)
  • Dutch (nl): 0.9645
  • Portuguese (pt): 0.9368
  • French (fr): 0.9365
  • Spanish (es): 0.9272
  • Italian (it): 0.8853
Strengths:
  • Bidirectional context understanding
  • Smallest model size (0.62 MB)
  • Learns word embeddings
  • Better on German, French, Spanish vs LSTM
Weaknesses:
  • 5.84 percentage points lower validation accuracy than Naive Bayes
  • 2x slower inference than LSTM
  • Long training time
  • Poor performance on Italian
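The architecture bullets above translate into a short Keras sketch. The dropout placement, optimizer, and loss here are assumptions; the source specifies only the layer sizes:

```python
from tensorflow.keras import layers, models

# Sketch of the BiLSTM described above. Dropout placement and the
# optimizer/loss are assumptions; the source lists only layer sizes.
model = models.Sequential([
    layers.Input(shape=(None,)),                         # variable-length token ids
    layers.Embedding(input_dim=10_000, output_dim=100),  # 10K vocab, 100 dims
    layers.Bidirectional(layers.LSTM(32)),               # 32 units per direction (64 total)
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(7, activation='softmax'),               # one unit per language
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```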

Vectorization Techniques Comparison

Before settling on word-level TF-IDF, we compared multiple vectorization approaches:
| Technique | Vectorization Time | Accuracy | Model Size |
|-----------|--------------------|----------|------------|
| Word TF-IDF | 1.54s | 99.89% | 29.84 MB |
| Character TF-IDF | 4.99s | 99.88% | 9.66 MB |
| Hashing | 4.96s | 99.81% | Variable |
| Letter Frequency | 0.86s | 86.80% | 0.00 MB |
Winner: Word-level TF-IDF with n-grams (1, 2) provides the best balance of accuracy and speed.
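For reference, the two leading configurations can be sketched with scikit-learn. The character n-gram range shown is an illustrative assumption, not a value from the benchmark:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "the quick brown fox",
    "le renard brun rapide",
]

# Word-level TF-IDF with unigrams + bigrams (the winning configuration)
word_vec = TfidfVectorizer(analyzer='word', ngram_range=(1, 2))
X_word = word_vec.fit_transform(texts)

# Character-level TF-IDF; the (2, 4) range here is an illustrative guess
char_vec = TfidfVectorizer(analyzer='char_wb', ngram_range=(2, 4))
X_char = char_vec.fit_transform(texts)

print(X_word.shape, X_char.shape)
```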

Per-Language Performance Analysis

Traditional ML (Naive Bayes) vs Deep Learning (BiLSTM) on Swedish

| Model | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| Naive Bayes | 1.0000 | 1.0000 | 1.0000 |
| BiLSTM | 0.9939 | 0.9562 | 0.9747 |
| Difference | -0.61% | -4.38% | -2.53% |
Analysis: NB achieves perfect classification on Swedish, while BiLSTM still performs very well.

Common Misclassifications

Naive Bayes Misclassifications (6 total)

Text: "Monsieur Bolkestein, je veux vous dire quelque chose!"
Actual: German (de)
Predicted: French (fr) - Confidence: 98.47%
Reason: French sentence appears in German document
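Confidence figures like the 98.47% above come from the classifier's predicted probabilities. A minimal sketch with a toy two-language corpus (the real pipeline is trained on Europarl):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus with one sentence per language; for illustration only.
pipe = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='word', ngram_range=(1, 2))),
    ('classifier', MultinomialNB(alpha=0.5)),
])
pipe.fit(["das ist ein satz", "c'est une phrase"], ["de", "fr"])

# predict_proba yields a per-language confidence for each input sentence
proba = pipe.predict_proba(["c'est une belle phrase"])[0]
for lang, p in zip(pipe.classes_, proba):
    print(lang, f"{p:.4f}")
```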

Deep Learning Misclassifications (436-465 total)

Text: "Temos de mudar resolutamente de rumo."
Tokens: "<OOV> de <OOV> <OOV> de <OOV>"
Actual: Portuguese (pt)
Predicted: Dutch (nl) - Confidence: 45.31%
Reason: Most words are OOV, model relies on "de" which appears in both languages
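The OOV collapse described above is easy to reproduce: with only "de" in the vocabulary, every content word maps to the same placeholder token. A toy sketch:

```python
import re

# Toy illustration of the OOV effect described above; "vocab" stands in
# for the model's 10K-word vocabulary.
vocab = {"de"}

def tokenize(text, vocab, oov="<OOV>"):
    """Map each word to itself if in-vocabulary, else to the OOV token."""
    words = re.findall(r"\w+", text.lower())
    return [w if w in vocab else oov for w in words]

print(tokenize("Temos de mudar resolutamente de rumo.", vocab))
# -> ['<OOV>', 'de', '<OOV>', '<OOV>', 'de', '<OOV>']
```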

Use Case Recommendations

Production Systems

Recommended: Naive Bayes
  • Highest accuracy (99.92%)
  • Fast inference (<10ms)
  • Reliable and battle-tested
  • Easy to deploy

Resource-Constrained

Recommended: SVM or BiLSTM
  • SVM: 14.92 MB, 99.77% accuracy
  • BiLSTM: 0.62 MB, 93.67% accuracy
  • Trade accuracy for size based on needs

Real-Time Processing

Recommended: Naive Bayes or SVM
  • Inference: <1ms per prediction
  • Batch processing: thousands/second
  • Low latency requirements
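Batch throughput comes from vectorizing and predicting many sentences in a single call. A self-contained sketch with a toy corpus (a real deployment would load the Europarl-trained pipeline instead):

```python
import time
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data; real deployments train on the Europarl corpus.
X_train = ["das ist ein satz", "dies ist ein haus",
           "c'est une phrase", "c'est une maison"]
y_train = ["de", "de", "fr", "fr"]

pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='word', ngram_range=(1, 2))),
    ('classifier', MultinomialNB(alpha=0.5)),
])
pipeline.fit(X_train, y_train)

# One predict() call over many sentences amortises vectorization cost,
# which is what makes thousands of predictions per second feasible.
batch = ["das ist gut"] * 1000
start = time.perf_counter()
preds = pipeline.predict(batch)
elapsed = time.perf_counter() - start
print(len(preds), f"{elapsed:.3f}s")
```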

Research & Embeddings

Recommended: BiLSTM
  • Learned embeddings reusable
  • Learned representations available for inspection
  • Sequential pattern analysis

Training Cost Analysis

Assuming training on a standard machine (GPU for deep learning):
| Model | Training Time | Est. Cloud Cost* | Retraining Frequency |
|-------|---------------|------------------|----------------------|
| Naive Bayes | 0.03s | $0.00 | Daily if needed |
| SVM | 0.59s | $0.00 | Daily if needed |
| Logistic Regression | 14.73s | $0.00 | Daily if needed |
| Random Forest | 128s | $0.01 | Weekly |
| LSTM | ~22 min | $0.10 | Monthly |
| BiLSTM | ~35 min | $0.15 | Monthly |
*Estimated costs on AWS/GCP compute instances
Traditional ML models can be retrained in seconds, making them ideal for continuous learning scenarios. Deep learning models require minutes to hours and are better suited for less frequent retraining.

Final Recommendations

For Most Users

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
import joblib

# Use Naive Bayes - best overall performance
pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='word', ngram_range=(1, 2))),
    ('classifier', MultinomialNB(alpha=0.5))
])

pipeline.fit(X_train, y_train)
joblib.dump(pipeline, 'language_detector.joblib')
```

For Embedded Systems

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Use SVM for smaller model size
pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='word', ngram_range=(1, 2))),
    ('classifier', LinearSVC())
])
```

For Research

```python
from tensorflow.keras.models import load_model, Model

# Use BiLSTM for embeddings and interpretability
model = load_model('modelos/modelo_bilstm.keras')

# Extract embeddings from the embedding layer's output
embedding_model = Model(inputs=model.input,
                        outputs=model.layers[1].output)
embeddings = embedding_model.predict(sequences)
```

Summary Table

| Criteria | Best Model | Runner-Up |
|----------|------------|-----------|
| Accuracy | Naive Bayes (99.92%) | SVM (99.77%) |
| Training Speed | Naive Bayes (0.03s) | SVM (0.59s) |
| Inference Speed | SVM (<0.01s) | Naive Bayes (0.01s) |
| Model Size | BiLSTM (0.62 MB) | LSTM (0.62 MB) |
| Production Ready | Naive Bayes | SVM |
| Embeddings | BiLSTM | LSTM |
| Overall Winner | Naive Bayes | SVM |
Naive Bayes with alpha=0.5 is the recommended model for production use, achieving 99.92% accuracy with fast training and inference times.

Next Steps

Training Guide

Learn how to train your own models

Using Models

Start making predictions

Model Details

Deep dive into model implementations

API Reference

Integrate models into your application
