Detect fake news with machine learning
This project is an AI-powered fake news detector that uses Natural Language Processing (NLP) and Logistic Regression to classify news articles as real or fake, with a reported accuracy of 98.5%. It is built with scikit-learn and deployed as an interactive Streamlit web application. The model is trained on approximately 44,000 news articles and uses TF-IDF vectorization with unigrams and bigrams to capture the word and phrase patterns typical of fake news content.
Key features
98.5% accuracy
Reaches this accuracy through tuned TF-IDF vectorization and Logistic Regression
TF-IDF with N-grams
Uses a 5,000-feature vocabulary of unigrams and bigrams to capture key phrases and patterns
44,000 training samples
Trained on a large dataset from Kaggle’s Fake and Real News collections
Interactive Streamlit UI
Easy-to-use web interface for real-time news classification
Anti-bias design
Removes source metadata so that the model focuses on the content, not the publisher
NLP preprocessing
Advanced text cleaning with stopword removal and metadata filtering
How it works
The fake news detector uses a machine learning pipeline with five stages:
- Data preparation - Combines the title and text fields and removes source metadata such as “WASHINGTON (REUTERS) -” to prevent bias
- NLP preprocessing - Removes stopwords and punctuation, and tokenizes the text
- TF-IDF vectorization - Converts the cleaned text into numerical features using Term Frequency-Inverse Document Frequency
- Logistic Regression - Fast and interpretable classification model
- Model persistence - Saves the trained model and vectorizer using joblib for deployment
The model focuses on content analysis rather than source verification, making it capable of detecting fake news patterns regardless of publisher.
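The first four stages can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `DATELINE` regex, the `clean_text` and `train` helpers, the use of scikit-learn's built-in English stopword list, `max_iter=1000`, the 0/1 label encoding, and the toy corpus are all assumptions made for the example. Only the unigram and bigram range and the 5,000-feature limit come from the description above.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Assumed pattern: strips a leading dateline such as "WASHINGTON (Reuters) - ".
DATELINE = re.compile(r"^\s*[A-Z][A-Za-z .,/]*\([A-Za-z ]+\)\s*-\s*")


def clean_text(title: str, text: str) -> str:
    """Drop the source dateline, join title and body, lowercase, strip punctuation."""
    body = DATELINE.sub("", text)
    combined = f"{title} {body}".lower()
    combined = re.sub(r"[^a-z\s]", " ", combined)  # keep letters and whitespace only
    return re.sub(r"\s+", " ", combined).strip()


def train(docs, labels):
    """Fit a TF-IDF vectorizer and a Logistic Regression classifier on cleaned text."""
    vectorizer = TfidfVectorizer(
        stop_words="english",  # assumption: scikit-learn's built-in stopword list
        ngram_range=(1, 2),    # unigrams and bigrams
        max_features=5000,     # keep at most 5,000 terms
    )
    features = vectorizer.fit_transform(docs)
    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels)
    return vectorizer, model


# Made-up toy articles, only to show the call shape; 0 = real, 1 = fake is assumed.
articles = [
    ("Senate passes bill", "WASHINGTON (Reuters) - The Senate voted on Tuesday."),
    ("Shocking secret exposed", "You will not believe what they are hiding!"),
    ("Central bank holds rates", "LONDON (Reuters) - Policymakers kept rates unchanged."),
    ("Miracle cure banned", "Doctors hate this one weird trick, share now!"),
]
labels = [0, 1, 0, 1]

docs = [clean_text(title, text) for title, text in articles]
vectorizer, model = train(docs, labels)
predictions = model.predict(vectorizer.transform(docs))
```

Because the dateline is stripped before vectorization, the publisher name never enters the vocabulary, which is the point of the anti-bias step.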
Get started
Quickstart
Train the model and classify your first news article in minutes
Installation
Set up your Python environment and install dependencies
Training
Learn how the model is trained and optimized
API Reference
Explore the prediction API and integration options
Architecture
The project consists of three main components:
- fake_news_ia.py - Training script that processes 44,000 articles, trains the model, and saves artifacts
- app.py - Streamlit web application for interactive news classification
- predict_news.py - Command-line interface for batch predictions
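The hand-off between the training script and the two prediction entry points could look something like the sketch below: the trained model and vectorizer are written with joblib, then loaded again to classify a batch of texts. The artifact file names, the `save_artifacts`, `load_artifacts`, and `classify` helpers, the string labels, and the toy training data are hypothetical; the source does not state the real file names or function signatures.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical artifact names; the project's real file names are not stated.
MODEL_FILE = "model.joblib"
VECTORIZER_FILE = "vectorizer.joblib"


def save_artifacts(model, vectorizer, directory):
    """Persist both objects; the vectorizer is needed again at prediction time."""
    directory = Path(directory)
    joblib.dump(model, str(directory / MODEL_FILE))
    joblib.dump(vectorizer, str(directory / VECTORIZER_FILE))


def load_artifacts(directory):
    """Load the model and vectorizer saved by save_artifacts."""
    directory = Path(directory)
    model = joblib.load(str(directory / MODEL_FILE))
    vectorizer = joblib.load(str(directory / VECTORIZER_FILE))
    return model, vectorizer


def classify(texts, model, vectorizer):
    """Return (label, probability of that label) for each input text."""
    probabilities = model.predict_proba(vectorizer.transform(texts))
    results = []
    for row in probabilities:
        best = row.argmax()
        results.append((str(model.classes_[best]), float(row[best])))
    return results


# Made-up toy data standing in for the training script's output.
train_texts = [
    "senate passes budget bill after long debate",
    "shocking secret cure they do not want you to know",
    "central bank keeps interest rates unchanged",
    "you will not believe this one weird trick",
]
train_labels = ["real", "fake", "real", "fake"]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(train_texts), train_labels)

# Round-trip through disk, as the web app or CLI would after training.
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(model, vectorizer, tmp)
    loaded_model, loaded_vectorizer = load_artifacts(tmp)

batch = ["senate debate on budget bill", "shocking weird trick"]
results = classify(batch, loaded_model, loaded_vectorizer)
```

Saving the vectorizer alongside the model matters because new text must be mapped onto the same vocabulary and IDF weights that were learned during training.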