Quickstart

This guide walks you through training the fake news detector and classifying your first article. You’ll train a model on approximately 44,000 news articles and should reach about 98.5% accuracy on the held-out test set.

Before starting, make sure you have completed the installation steps and have the required CSV files (Fake.csv and True.csv) in your project directory.

Train the model

Download NLTK data

The model requires NLTK stopwords and tokenizers. Download them first:

python3 -c 'import nltk; nltk.download("stopwords"); nltk.download("punkt")'

This downloads the English stopword list and the Punkt tokenizer models used for text preprocessing.

Run the training script

Execute the training script to process the dataset and train the model:

python3 fake_news_ia.py

The script will:

Load approximately 44,000 articles from Fake.csv and True.csv
Clean and preprocess the text using NLP techniques
Train a Logistic Regression model with TF-IDF vectorization
Save the model to modelo_fake_news.pkl
Save the vectorizer to vectorizer_tfidf.pkl
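The cleaning step in the list above can be pictured with a minimal sketch. This is not the code in fake_news_ia.py: the function name clean_text, the regular expressions, and the tiny hardcoded stopword set are all assumptions made for illustration. The real script relies on the much longer NLTK stopword list downloaded earlier.

```python
import re

# Tiny illustrative stopword set (assumption); the real pipeline uses NLTK's English list.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "it", "of", "on", "that", "the", "to", "will"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and non-letter characters, then remove stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links before stripping punctuation
    text = NON_LETTER_RE.sub(" ", text)  # digits and punctuation become spaces
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_text("The Fed will hold rates at 5.25%, see https://example.com"))
    # fed hold rates see
```

Whatever cleaning is applied at training time has to be applied identically at prediction time, otherwise the vectorizer sees tokens it was never fitted on.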

Training takes approximately 1-2 minutes on modern hardware. You’ll see progress messages for each step.

Review the results

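The script prints model performance metrics. Its labels are in Spanish: “Precisión General” means overall accuracy, and “Reporte de Clasificación” means classification report.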

Accuracy (Precisión General): 0.985

Reporte de Clasificación:
              precision    recall  f1-score   support

        fake       0.99      0.98      0.98      4519
        real       0.98      0.99      0.98      4469

You should see approximately 98.5% accuracy on the test set.
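As a sanity check, the overall accuracy can be approximately recovered from the per-class recall and support values in the sample report. The sketch below uses only the numbers printed in that sample output. Because the report rounds recall to two decimals, the result is an approximation rather than an exact reproduction, and the helper name approximate_accuracy is hypothetical.

```python
# Per-class recall and support, copied from the sample classification report.
report = {
    "fake": {"recall": 0.98, "support": 4519},
    "real": {"recall": 0.99, "support": 4469},
}


def approximate_accuracy(per_class: dict) -> float:
    """Accuracy = correctly classified samples / all samples.

    recall * support estimates the correctly classified samples in each class.
    """
    total = sum(row["support"] for row in per_class.values())
    correct = sum(row["recall"] * row["support"] for row in per_class.values())
    return correct / total


test_size = sum(row["support"] for row in report.values())
print(test_size)                               # 8988
print(round(approximate_accuracy(report), 3))  # 0.985
```

The 8,988 test articles are roughly 20% of the approximately 44,000 loaded, which would be consistent with an 80/20 train/test split, although the split ratio is not stated in this guide.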

Classify your first article

Once training is complete, you can classify news articles using either the web interface or the command line.

Option 1: Use the Streamlit web app

Launch the app

Start the Streamlit web server:

streamlit run app.py

The app will open in your browser at http://localhost:8501.

Paste a news article

Copy and paste a news article into the text area. Try this example:

The Federal Reserve announced on Wednesday that it will maintain 
the benchmark interest rate within the current range of 5.25% to 
5.50%, citing steady economic growth and easing inflation. Federal 
Reserve Chair Jerome Powell stated during a press briefing in 
Washington that future rate decisions will depend on labor market 
data and inflation trends over the coming months.

Get the prediction

Click the “Clasificar Noticia” (“Classify News”) button. The model will:

Clean and preprocess the text
Transform it using the trained TF-IDF vectorizer
Make a prediction using the Logistic Regression model
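The three steps above can be sketched in Python as follows. This is a hedged illustration, not the code in app.py. It assumes the two .pkl files were written with the standard pickle module (they may instead have been saved with joblib), that the model and vectorizer expose scikit-learn-style predict and transform methods, and that the cleaning function matches the one used during training. The names basic_clean, load_artifacts, and classify_article are hypothetical.

```python
import pickle
import re
from typing import Any, Callable, Tuple


def basic_clean(text: str) -> str:
    """Placeholder cleaning (assumption): lowercase and keep letters only."""
    return " ".join(re.sub(r"[^a-z\s]", " ", text.lower()).split())


def load_artifacts(model_path: str = "modelo_fake_news.pkl",
                   vectorizer_path: str = "vectorizer_tfidf.pkl") -> Tuple[Any, Any]:
    """Load the trained model and vectorizer, assuming both were pickled."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(vectorizer_path, "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def classify_article(text: str, model: Any, vectorizer: Any,
                     clean: Callable[[str], str] = basic_clean) -> str:
    """Clean the text, vectorize it, and return the predicted label."""
    cleaned = clean(text)
    features = vectorizer.transform([cleaned])  # one-row feature matrix
    return str(model.predict(features)[0])      # e.g. "fake" or "real"
```

With the trained files in the working directory, calling load_artifacts() and then classify_article(article_text, model, vectorizer) would return the label. Only unpickle files you created yourself, since pickle can execute arbitrary code on load.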

You’ll see either:

✅ REAL - with a success message and balloons animation
❌ FAKE - with an error message

Option 2: Use the command-line script

For batch predictions, use the predict_news.py script:

python3 predict_news.py

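This script includes several pre-configured test cases and prints a prediction for each. The labels are in Spanish: “Noticia” means news item, “Inicio” means start, and “Predicción” means prediction.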

Noticia 1 (Inicio): The Federal Reserve announced on Wednesday that it...
Predicción: REAL

Noticia 2 (Inicio): A secret meeting was held at the UN headquarters wh...
Predicción: FAKE

Noticia 3 (Inicio): President Joe Biden announced a new infrastructure...
Predicción: REAL
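A batch run like this boils down to a loop over article texts. The sketch below is an assumption about the general shape, not the contents of predict_news.py. The function name classify_batch, the 50-character preview length, and the English labels are illustrative, and predict stands in for whatever function wraps the trained model and vectorizer.

```python
from typing import Callable, Iterable, List


def classify_batch(articles: Iterable[str],
                   predict: Callable[[str], str],
                   preview_chars: int = 50) -> List[str]:
    """Return one formatted report per article: a short preview plus the label."""
    reports = []
    for i, article in enumerate(articles, start=1):
        preview = article[:preview_chars]
        suffix = "..." if len(article) > preview_chars else ""
        label = predict(article).upper()
        reports.append(f"Article {i} (start): {preview}{suffix}\nPrediction: {label}")
    return reports


if __name__ == "__main__":
    # Stand-in predictor for demonstration only; replace it with the real model call.
    def demo_predict(text: str) -> str:
        return "fake" if "secret" in text.lower() else "real"

    samples = [
        "The Federal Reserve announced on Wednesday that it will maintain the benchmark interest rate.",
        "Placeholder text about a secret meeting, used only to exercise the loop.",
    ]
    print("\n\n".join(classify_batch(samples, demo_predict)))
```

Passing the predictor in as an argument keeps the loop independent of how the model is loaded, so the same function works with a stub in tests and with the real classifier in production.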

Test with different examples

Try classifying other news-style text, such as this example:

The European Union formally approved a new trade agreement with Canada 
on Thursday following a vote in the European Parliament in Brussels. 
Officials said the agreement is expected to strengthen economic 
cooperation and reduce tariffs on industrial goods over the next five years.

The model works best with news-style articles. Short statements or social media posts may not produce accurate results.
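One way to act on this caveat is to flag very short inputs before classifying them. The sketch below is an illustration only: the 30-word threshold is an arbitrary assumption, not a value taken from this project, and looks_like_article is a hypothetical helper name.

```python
def looks_like_article(text: str, min_words: int = 30) -> bool:
    """Return True if the text has at least min_words whitespace-separated words."""
    return len(text.split()) >= min_words


if __name__ == "__main__":
    post = "You won't believe what happened today!"
    if not looks_like_article(post):
        print("Warning: input is short; the prediction may be unreliable.")
```

A word count is a crude proxy for "news-style", so treat the check as a prompt to review the input rather than a guarantee about prediction quality.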

Next steps

Learn about the training process and how to customize the model
Explore integration options for using the model in your applications
Understand the NLP preprocessing pipeline in detail
