Before starting, make sure you have completed the installation steps and have the required CSV files (Fake.csv and True.csv) in your project directory.
Train the model
Download NLTK data
The model requires NLTK stopwords and tokenizers, so download them before training. The download includes the English stopwords used for text preprocessing.
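A minimal sketch of the download step in Python. The resource names are assumptions: "stopwords" matches the stopword list mentioned above, and "punkt" is a commonly used NLTK tokenizer model; your NLTK version or the project's own scripts may expect different names. The helper name `download_nltk_resources` is hypothetical.

```python
"""One-time download of the NLTK resources used for preprocessing."""

# Assumed resource names; adjust them to whatever the project actually uses.
NLTK_RESOURCES = ("stopwords", "punkt")


def download_nltk_resources(resources=NLTK_RESOURCES):
    """Download each resource and return the names that failed."""
    # Imported here so this module can be loaded before NLTK is installed.
    import nltk

    failed = []
    for name in resources:
        # nltk.download returns True on success and False on failure.
        if not nltk.download(name, quiet=True):
            failed.append(name)
    return failed
```

Call `download_nltk_resources()` once before training; an empty list means every resource was fetched.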
Run the training script
Execute the training script to process the dataset and train the model.
The script will:
- Load approximately 44,000 articles from Fake.csv and True.csv
- Clean and preprocess the text using NLP techniques
- Train a Logistic Regression model with TF-IDF vectorization
- Save the model to modelo_fake_news.pkl
- Save the vectorizer to vectorizer_tfidf.pkl
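The steps above can be sketched roughly as follows. This is an illustration, not the project's training script: the `text` column name, the 0 = fake / 1 = real label convention, the default hyperparameters, and the use of `pickle` for saving are all assumptions, and the text cleaning step is left out for brevity.

```python
import pickle

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def load_dataset(fake_path="Fake.csv", true_path="True.csv", text_column="text"):
    """Return (texts, labels), labelling fake as 0 and real as 1 (assumed)."""
    fake = pd.read_csv(fake_path)
    real = pd.read_csv(true_path)
    texts = pd.concat([fake[text_column], real[text_column]]).fillna("").tolist()
    labels = [0] * len(fake) + [1] * len(real)
    return texts, labels


def train(texts, labels):
    """Fit a TF-IDF vectorizer and a Logistic Regression classifier."""
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(texts)
    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels)
    return model, vectorizer


def save(model, vectorizer,
         model_path="modelo_fake_news.pkl",
         vectorizer_path="vectorizer_tfidf.pkl"):
    """Persist both artifacts so the app can reload them later."""
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(vectorizer_path, "wb") as f:
        pickle.dump(vectorizer, f)
```

With both CSV files present, `save(*train(*load_dataset()))` would run the whole sequence. The vectorizer is saved alongside the model because new text must be transformed with the same vocabulary the model was trained on.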
Classify your first article
Once training is complete, you can classify news articles using either the web interface or the command line.
Option 1: Use the Streamlit web app
Launch the app
Start the Streamlit web server. The app will open in your browser at http://localhost:8501.
Get the prediction
Click the “Clasificar Noticia” (“Classify News”) button. The model will:
- Clean and preprocess the text
- Transform it using the trained TF-IDF vectorizer
- Make a prediction using the Logistic Regression model
The result is displayed as one of:
- ✅ REAL - with a success message and a balloons animation
- ❌ FAKE - with an error message
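The clean, transform, and predict sequence can be sketched outside the app as below. The cleaning rules, the helper names, and the `real_label=1` convention are assumptions for illustration. The loader also assumes the `.pkl` files were written with `pickle`; if the training script used `joblib`, load them with `joblib.load` instead.

```python
import pickle
import re


def clean_text(text):
    """Lowercase, drop non-letters, collapse whitespace (assumed rules)."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def load_artifacts(model_path="modelo_fake_news.pkl",
                   vectorizer_path="vectorizer_tfidf.pkl"):
    """Load the trained model and its TF-IDF vectorizer from disk."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(vectorizer_path, "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def classify(text, model, vectorizer, real_label=1):
    """Return "REAL" or "FAKE" for a single article."""
    features = vectorizer.transform([clean_text(text)])
    prediction = model.predict(features)[0]
    return "REAL" if prediction == real_label else "FAKE"
```

Typical use would be `model, vectorizer = load_artifacts()` followed by `classify(article_text, model, vectorizer)`. Check which numeric label the training step assigned to real articles before relying on `real_label`.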
Option 2: Use the command-line script
For batch predictions, use the predict_news.py script.
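The contents of predict_news.py are not shown here, so the following is only a sketch of what batch prediction could look like with an already loaded model and vectorizer. The function name, the `preprocess` hook, and the `real_label=1` convention are hypothetical; `predict_proba` and `classes_` are standard on scikit-learn's Logistic Regression.

```python
def classify_batch(texts, model, vectorizer, real_label=1, preprocess=None):
    """Return one (label, confidence) pair per input text."""
    if preprocess is not None:
        # Hypothetical hook for the project's own cleaning function.
        texts = [preprocess(t) for t in texts]
    # Transform the whole batch in one call instead of one text at a time.
    features = vectorizer.transform(texts)
    probabilities = model.predict_proba(features)
    # Find which probability column corresponds to the "real" class.
    real_index = list(model.classes_).index(real_label)
    results = []
    for row in probabilities:
        p_real = row[real_index]
        label = "REAL" if p_real >= 0.5 else "FAKE"
        # Confidence is the probability of the predicted class (binary case).
        results.append((label, max(p_real, 1 - p_real)))
    return results
```

Returning the probability alongside the label makes it easier to spot borderline articles in a large batch.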
Test with different examples
Try classifying different types of content.
Next steps
- Learn about the training process and how to customize the model
- Explore integration options for using the model in your applications
- Understand the NLP preprocessing pipeline in detail