This guide walks you through training the fake news detector and classifying your first article. You’ll train a model on approximately 44,000 news articles and reach about 98.5% accuracy on the test set.
Before starting, make sure you have completed the installation steps and have the required CSV files (Fake.csv and True.csv) in your project directory.

Train the model

1

Download NLTK data

The model requires NLTK stopwords and tokenizers. Download them first:
python3 -c 'import nltk; nltk.download("stopwords"); nltk.download("punkt")'
This downloads the English stopword list and the Punkt tokenizer data used for text preprocessing.
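To illustrate what the stopword list is for, here is a minimal sketch of a cleaning step. It is not the code from fake_news_ia.py: the `clean_text` helper and the small `STOPWORDS` set are hypothetical stand-ins (the real script presumably uses the full NLTK list), and the exact cleaning rules the project applies are not shown in this guide.

```python
import re

# Tiny illustrative stand-in for the NLTK English stopword list (assumption).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
             "that", "it", "is", "was", "will"}


def clean_text(text, stopwords=STOPWORDS):
    """Lowercase the text, keep only letters, and drop stopwords (hypothetical helper)."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # replace digits and punctuation with spaces
    tokens = text.split()
    return " ".join(token for token in tokens if token not in stopwords)


print(clean_text("The Federal Reserve announced on Wednesday that it will maintain the rate."))
# -> federal reserve announced wednesday maintain rate
```

Removing very common words leaves the terms that are more likely to distinguish one article from another, which is what the later vectorization step relies on.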
2

Run the training script

Execute the training script to process the dataset and train the model:
python3 fake_news_ia.py
The script will:
  • Load approximately 44,000 articles from Fake.csv and True.csv
  • Clean and preprocess the text using NLP techniques
  • Train a Logistic Regression model with TF-IDF vectorization
  • Save the model to modelo_fake_news.pkl
  • Save the vectorizer to vectorizer_tfidf.pkl
Training takes approximately 1-2 minutes on modern hardware. You’ll see progress messages for each step.
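The steps listed above can be sketched roughly as follows. This is an illustration using scikit-learn, not the contents of fake_news_ia.py: the function names, the 80/20 split, the random seed, `max_iter=1000`, and the use of `pickle` for saving are all assumptions. Loading the CSV files is left out because their column names are not given in this guide.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_detector(texts, labels, test_size=0.2, seed=42):
    """Fit TF-IDF + Logistic Regression; return (model, vectorizer, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    vectorizer = TfidfVectorizer()
    model = LogisticRegression(max_iter=1000)  # hyperparameter is an assumption
    # Learn the vocabulary from the training split only, then reuse it on the test split.
    model.fit(vectorizer.fit_transform(X_train), y_train)
    accuracy = accuracy_score(y_test, model.predict(vectorizer.transform(X_test)))
    return model, vectorizer, accuracy


def save_artifacts(model, vectorizer,
                   model_path="modelo_fake_news.pkl",
                   vectorizer_path="vectorizer_tfidf.pkl"):
    """Write both objects to disk (assumes plain pickle serialization)."""
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(vectorizer_path, "wb") as f:
        pickle.dump(vectorizer, f)
```

The vectorizer is saved alongside the model because new articles must be transformed with the same vocabulary and weights that were learned during training.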
3

Review the results

The script outputs model performance metrics:
Accuracy (Precisión General): 0.985

Reporte de Clasificación:
              precision    recall  f1-score   support

        fake       0.99      0.98      0.98      4519
        real       0.98      0.99      0.98      4469
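You should see approximately 98.5% accuracy on the test set. In the Spanish output labels, “Precisión General” means overall accuracy and “Reporte de Clasificación” means classification report.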
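As a quick check on how to read the report above, the f1-score column is the harmonic mean of precision and recall, and support is the number of test articles in each class. The short sketch below recomputes those values from the numbers shown; the `f1_score` helper is defined here for illustration and is not part of the project.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) copied from the report above.
report = {"fake": (0.99, 0.98, 4519), "real": (0.98, 0.99, 4469)}

for label, (precision, recall, support) in report.items():
    print(f"{label}: f1 = {f1_score(precision, recall):.2f} (support {support})")
# -> fake: f1 = 0.98 (support 4519)
# -> real: f1 = 0.98 (support 4469)

total = sum(support for _, _, support in report.values())
print(f"test articles: {total}")
# -> test articles: 8988
```

The 8,988 test articles are about a fifth of the roughly 44,000 articles in the dataset, which would be consistent with an 80/20 train/test split, although the split ratio is not stated here.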

Classify your first article

Once training is complete, you can classify news articles using either the web interface or command line.

Option 1: Use the Streamlit web app

1

Launch the app

Start the Streamlit web server:
streamlit run app.py
The app will open in your browser at http://localhost:8501.
2

Paste a news article

Copy and paste a news article into the text area. Try this example:
The Federal Reserve announced on Wednesday that it will maintain 
the benchmark interest rate within the current range of 5.25% to 
5.50%, citing steady economic growth and easing inflation. Federal 
Reserve Chair Jerome Powell stated during a press briefing in 
Washington that future rate decisions will depend on labor market 
data and inflation trends over the coming months.
3

Get the prediction

Click the “Clasificar Noticia” (“Classify News”) button. The model will:
  1. Clean and preprocess the text
  2. Transform it using the trained TF-IDF vectorizer
  3. Make a prediction using the Logistic Regression model
You’ll see either:
  • REAL - with a success message and balloons animation
  • FAKE - with an error message
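The three prediction steps can be sketched as a small function. This is an assumption-laden illustration rather than the code in app.py: `load_artifacts` and `classify` are hypothetical names, the files are assumed to be plain pickle files, the model is assumed to return the string labels `fake` and `real` shown in the classification report, and the default `clean` argument is only a placeholder for the project's real cleaning function.

```python
import pickle


def load_artifacts(model_path="modelo_fake_news.pkl",
                   vectorizer_path="vectorizer_tfidf.pkl"):
    """Load the saved model and vectorizer (assumes they were written with pickle)."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(vectorizer_path, "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def classify(text, model, vectorizer, clean=str.lower):
    """Clean, vectorize, and predict; returns the label in upper case."""
    cleaned = clean(text)                       # step 1: clean and preprocess
    features = vectorizer.transform([cleaned])  # step 2: expects a list of documents
    label = model.predict(features)[0]          # step 3: first (only) prediction
    return str(label).upper()
```

After training, usage would look like `model, vectorizer = load_artifacts()` followed by `classify(article_text, model, vectorizer)`. The text should go through the same cleaning that was applied during training, otherwise the vectorizer sees tokens it was not fitted on.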

Option 2: Use the command-line script

For batch predictions, use the predict_news.py script:
python3 predict_news.py
This script includes several pre-configured test cases. For each one it prints the start of the article (“Noticia N (Inicio)”) followed by the prediction (“Predicción”):
Noticia 1 (Inicio): The Federal Reserve announced on Wednesday that it...
Predicción: REAL

Noticia 2 (Inicio): A secret meeting was held at the UN headquarters wh...
Predicción: FAKE

Noticia 3 (Inicio): President Joe Biden announced a new infrastructure...
Predicción: REAL

Test with different examples

Try classifying other news-style text, such as this example:
The European Union formally approved a new trade agreement with Canada 
on Thursday following a vote in the European Parliament in Brussels. 
Officials said the agreement is expected to strengthen economic 
cooperation and reduce tariffs on industrial goods over the next five years.
The model works best with news-style articles. Short statements or social media posts may not produce accurate results.
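One simple way to act on this caveat is to warn when the input is very short before trusting the prediction. The sketch below is only an illustration: the `looks_too_short` helper and the 50-word threshold are hypothetical and are not part of the project.

```python
def looks_too_short(text, min_words=50):
    """Return True when the input has fewer words than a (hypothetical) threshold."""
    return len(text.split()) < min_words


post = "You won't believe what happened today!"
if looks_too_short(post):
    print("Warning: input is short; the prediction may be unreliable.")
```

A suitable threshold would need to be chosen by checking how accuracy varies with article length on your own data.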
