
Overview

The prediction API allows you to integrate the fake news detector into your Python applications. This guide demonstrates how to load the trained model and vectorizer, preprocess text, and generate predictions.

Quick Start

Step 1: Load Required Dependencies

Import the necessary libraries for model loading and text preprocessing:
import joblib
import re
from nltk.corpus import stopwords
import sys

Step 2: Load Trained Models

Load the pre-trained model and TF-IDF vectorizer from disk:
try:
    modelo = joblib.load('modelo_fake_news.pkl')
    vectorizer = joblib.load('vectorizer_tfidf.pkl')
    stop_words = set(stopwords.words("english"))
    
    print("Models loaded successfully. Ready to classify. ✅")
except FileNotFoundError:
    print("Error: The .pkl model files were not found.")
    print("Make sure to run 'fake_news_ia.py' first.")
    sys.exit(1)
Make sure you have the modelo_fake_news.pkl and vectorizer_tfidf.pkl files in your working directory. These are generated by training the model first.

Step 3: Define Text Preprocessing Function

The limpiar_texto function must be identical to the one used during training:
def limpiar_texto(texto):
    # 1. Remove metadata/source (e.g., WASHINGTON (REUTERS) - )
    texto = re.sub(r'([A-Z\s]+)\s*\((REUTERS|AP|AFP)\)\s*\-\s*', '', str(texto), flags=re.IGNORECASE)
    
    # 2. Convert to lowercase
    texto = str(texto).lower()
    
    # 3. Remove punctuation, numbers, and special characters
    texto = re.sub(r'[^a-z\s]', '', texto) 
    
    # 4. Tokenize with split()
    tokens = texto.split() 

    # 5. Filter stopwords and single-letter tokens
    tokens = [t for t in tokens if t not in stop_words and len(t) > 1]
    return " ".join(tokens)
The preprocessing function must match the training pipeline exactly to ensure consistent results.
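To see what the pipeline produces, here is a self-contained sketch of the same cleaning steps. It substitutes a small hardcoded stopword set for illustration only; the real script uses NLTK's full English list via stopwords.words("english"):

```python
import re

# Small hardcoded stopword set for illustration; the guide's script
# uses NLTK's full English stopword list instead.
stop_words = {"the", "a", "an", "in", "on", "it", "will", "that", "to", "of"}

def limpiar_texto(texto):
    # Strip agency bylines such as "WASHINGTON (Reuters) - "
    texto = re.sub(r'([A-Z\s]+)\s*\((REUTERS|AP|AFP)\)\s*\-\s*', '',
                   str(texto), flags=re.IGNORECASE)
    texto = str(texto).lower()
    # Drop punctuation, digits, and other non-letter characters
    texto = re.sub(r'[^a-z\s]', '', texto)
    tokens = texto.split()
    # Keep tokens that are not stopwords and longer than one character
    return " ".join(t for t in tokens if t not in stop_words and len(t) > 1)

print(limpiar_texto("WASHINGTON (Reuters) - The economy grew 3% in 2023."))
# → economy grew
```

Note how the byline, the percentage, and the year all disappear before vectorization.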

Step 4: Make a Prediction

Process and classify a single news article:
# Example news article
noticia = "The Federal Reserve announced on Wednesday that it will maintain the benchmark interest rate within the current range of 5.25% to 5.50%, citing steady economic growth and easing inflation."

# 1. Clean the text
noticia_limpia = limpiar_texto(noticia)

# 2. Vectorize the text using the trained vectorizer
noticia_vec = vectorizer.transform([noticia_limpia])

# 3. Make prediction
prediccion = modelo.predict(noticia_vec)[0]

# 4. Display result
print(f"Prediction: {prediccion.upper()}")
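If the saved classifier exposes predict_proba (an assumption, since this guide doesn't name the model class; it holds for logistic regression or naive Bayes), you can report a confidence alongside the label. The sketch below trains tiny toy stand-ins for the real modelo and vectorizer so it runs without the .pkl files:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the vectorizer/model normally loaded from the .pkl files.
textos = ["economy grew steadily", "aliens secretly control banks",
          "fed holds interest rates", "miracle cure hidden by doctors"]
labels = ["real", "fake", "real", "fake"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(textos)
modelo = LogisticRegression().fit(X, labels)

noticia_vec = vectorizer.transform(["interest rates held steady"])
prediccion = modelo.predict(noticia_vec)[0]
# predict_proba returns one probability per class; take the winner
confianza = modelo.predict_proba(noticia_vec)[0].max()
print(f"Prediction: {prediccion.upper()} ({confianza:.1%} confidence)")
```

With the real trained model, you would call predict_proba on the same noticia_vec you pass to predict.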

Complete Example

Here’s a complete working example from predict_news.py:
import joblib
import re
from nltk.corpus import stopwords
import sys

# Load models
try:
    modelo = joblib.load('modelo_fake_news.pkl')
    vectorizer = joblib.load('vectorizer_tfidf.pkl')
    stop_words = set(stopwords.words("english"))
    print("Models loaded successfully. ✅")
except FileNotFoundError:
    print("Error: .pkl files not found.")
    sys.exit(1)

# Preprocessing function
def limpiar_texto(texto):
    texto = re.sub(r'([A-Z\s]+)\s*\((REUTERS|AP|AFP)\)\s*\-\s*', '', str(texto), flags=re.IGNORECASE)
    texto = str(texto).lower()
    texto = re.sub(r'[^a-z\s]', '', texto) 
    tokens = texto.split() 
    tokens = [t for t in tokens if t not in stop_words and len(t) > 1]
    return " ".join(tokens)

# Classify a news article
noticia = "President Joe Biden announced a new infrastructure plan."
noticia_limpia = limpiar_texto(noticia)
noticia_vec = vectorizer.transform([noticia_limpia])
prediccion = modelo.predict(noticia_vec)[0]

print(f"Prediction: {prediccion.upper()}")

Expected Output

When you run the prediction successfully:
Models loaded successfully. Ready to classify. ✅
Prediction: REAL
The model returns one of two classifications:
  • real - The article is classified as legitimate news
  • fake - The article is classified as fake news
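To classify several articles at once, pass them to the vectorizer in a single call rather than looping, since TF-IDF vectorizers accept a list of documents. A minimal sketch with a hypothetical helper name (clasificar_lote) and toy stand-ins for the loaded modelo and vectorizer:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def clasificar_lote(modelo, vectorizer, noticias):
    # One transform call vectorizes the whole batch at once.
    X = vectorizer.transform(noticias)
    return list(modelo.predict(X))

# Toy stand-ins for the objects normally loaded from the .pkl files.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(["rates held steady", "secret miracle cure"])
modelo = MultinomialNB().fit(X, ["real", "fake"])

print(clasificar_lote(modelo, vectorizer, ["rates steady", "miracle cure"]))
```

Batch transforms avoid repeated per-document overhead when scoring many articles.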

Integration Tips

NLTK Stopwords: If you encounter a stopwords error, download them first:
python3 -c 'import nltk; nltk.download("stopwords")'
Model Location: Ensure both .pkl files are in the same directory as your script, or provide absolute paths to joblib.load().
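One way to make the model location explicit is to build the paths with pathlib and check they exist before loading. MODEL_DIR below is a hypothetical location you would adjust for your setup:

```python
from pathlib import Path
import joblib

# Hypothetical directory holding the trained artifacts; adjust as needed.
MODEL_DIR = Path.cwd()

modelo_path = MODEL_DIR / "modelo_fake_news.pkl"
vectorizer_path = MODEL_DIR / "vectorizer_tfidf.pkl"

if modelo_path.exists() and vectorizer_path.exists():
    modelo = joblib.load(modelo_path)
    vectorizer = joblib.load(vectorizer_path)
else:
    print(f"Missing model files in {MODEL_DIR}")
```

joblib.load accepts Path objects directly, so no string conversion is needed.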
