Fake News Detector

Detect fake news with machine learning

This project is an AI-powered fake news detector that uses Natural Language Processing (NLP) and Logistic Regression to classify news articles as real or fake with 98.5% accuracy. It is built with scikit-learn and deployed as an interactive Streamlit web application.

The model is trained on approximately 44,000 news articles and uses TF-IDF vectorization with unigrams and bigrams to capture the word and phrase patterns that characterize fake news content.

Key features

98.5% accuracy

Achieves high accuracy through optimized TF-IDF vectorization and Logistic Regression

TF-IDF with N-grams

Uses unigrams and bigrams with 5,000 features to capture key phrases and patterns

44,000 training samples

Trained on a large dataset from Kaggle’s Fake and Real News collections

Interactive Streamlit UI

Easy-to-use web interface for real-time news classification

Anti-bias design

Removes source metadata to ensure the model focuses on content, not publisher

NLP preprocessing

Advanced text cleaning with stopword removal and metadata filtering

How it works

The fake news detector uses a machine learning pipeline that combines:

Data preparation - Combines title and text fields, removes source metadata like “WASHINGTON (REUTERS) -” to prevent bias

NLP preprocessing - Removes stopwords and punctuation, then tokenizes the text

TF-IDF vectorization - Converts cleaned text into numerical features using Term Frequency-Inverse Document Frequency

Logistic Regression - Fast and interpretable classification model

Model persistence - Saves trained model and vectorizer using joblib for deployment
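The vectorization and classification steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's training script: the six toy headlines, the label encoding (1 = real, 0 = fake), and `max_iter=1000` are all made up here, while the n-gram range and the 5,000-feature cap come from the feature list.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in data (hypothetical); the real project trains on about 44,000 articles.
texts = [
    "senate passes budget bill after long debate",
    "central bank holds interest rates steady",
    "city council approves new transit plan",
    "shocking secret cure doctors hate revealed",
    "you will not believe this miracle weight trick",
    "aliens secretly control the world government",
]
labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = real, 0 = fake

# Turn cleaned text into a sparse TF-IDF matrix of unigram and bigram features.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
X = vectorizer.fit_transform(texts)

# Fit the classifier on the TF-IDF features.
model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

# Score an unseen headline; columns of predict_proba follow model.classes_.
sample = vectorizer.transform(["council approves budget plan"])
proba = model.predict_proba(sample)
print(dict(zip(model.classes_.tolist(), proba[0].round(3).tolist())))
```

New text has to pass through the fitted vectorizer's `transform`, never `fit_transform`, so that it maps onto the vocabulary learned during training.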

The model focuses on content analysis rather than source verification, making it capable of detecting fake news patterns regardless of publisher.
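As a rough sketch of how source metadata might be stripped before vectorization, the snippet below removes a leading dateline of the form "CITY (AGENCY) - " and then applies basic cleaning. The regular expression, the `clean_text` name, and the tiny stopword set are assumptions made for illustration; the project's actual cleaning rules are not shown in the source.

```python
import re
import string

# Hypothetical pattern: an uppercase city followed by an agency in parentheses
# and a dash, for example "WASHINGTON (REUTERS) - ".
DATELINE = re.compile(r"^\s*[A-Z][A-Z .,/]*\([A-Za-z ]+\)\s*-\s*")

# Tiny illustrative stopword set; a real pipeline would use a fuller list.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are", "for"}

# Translation table that deletes ASCII punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Drop a leading dateline, lowercase, strip punctuation, remove stopwords."""
    text = DATELINE.sub("", text)
    text = text.lower().translate(_PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_text("WASHINGTON (REUTERS) - The Senate voted on the bill."))
# senate voted bill
```

Without this step a classifier can learn that a dateline alone signals "real", which tells it about the publisher rather than the content.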

Architecture

The project consists of three main components:

fake_news_ia.py - Training script that processes 44,000 articles, trains the model, and saves artifacts

app.py - Streamlit web application for interactive news classification

predict_news.py - Command-line interface for batch predictions

All components use identical preprocessing functions to ensure consistency between training and inference.
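One way to keep preprocessing consistent is to define a single function and call it from both the training and inference paths, as in the sketch below. The function names, the artifact file names (`model.joblib`, `vectorizer.joblib`), and the simplified `preprocess` step are hypothetical stand-ins for whatever the three scripts actually use.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def preprocess(text: str) -> str:
    """Hypothetical shared cleaning step: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def train_and_save(texts, labels, out_dir: Path) -> None:
    """Fit the vectorizer and model on preprocessed text, then persist both."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
    X = vectorizer.fit_transform([preprocess(t) for t in texts])
    model = LogisticRegression(max_iter=1000).fit(X, labels)
    joblib.dump(model, out_dir / "model.joblib")
    joblib.dump(vectorizer, out_dir / "vectorizer.joblib")


def classify(text: str, out_dir: Path) -> int:
    """Load the saved artifacts and classify one text with the same preprocessing."""
    model = joblib.load(out_dir / "model.joblib")
    vectorizer = joblib.load(out_dir / "vectorizer.joblib")
    return int(model.predict(vectorizer.transform([preprocess(text)]))[0])


# Toy round trip with made-up data; assumed encoding: 1 = real, 0 = fake.
with tempfile.TemporaryDirectory() as tmp:
    artifacts = Path(tmp)
    train_and_save(
        ["senate passes budget bill", "miracle cure doctors hate"],
        [1, 0],
        artifacts,
    )
    result = classify("Senate   PASSES budget bill", artifacts)
    baseline = classify("senate passes budget bill", artifacts)
```

Because both paths call `preprocess`, differences in casing or spacing at inference time cannot produce features the model never saw during training.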
