This guide covers everything you need to install and configure the Fake News Detector on your local machine.

Requirements

Before you begin, ensure you have:
  • Python 3.7 or higher
  • pip (Python package manager)
  • Git (optional, for cloning the repository)
  • 2GB of free disk space (for datasets and NLTK data)

Installation steps

Step 1: Clone or download the project

If you’re using Git, clone the repository:
git clone https://github.com/MisaelCast/Proyecto-IA.git
cd Proyecto-IA
Otherwise, download and extract the project files to a directory.
Step 2: Create a virtual environment

Create an isolated Python environment to avoid dependency conflicts:
python3 -m venv .venv
source .venv/bin/activate
On Windows, activate with .venv\Scripts\activate instead.
You should see (.venv) in your terminal prompt, indicating that the virtual environment is active.
Always activate the virtual environment before working with the project to ensure you’re using the correct dependencies.
Step 3: Install Python dependencies

Install all required packages using pip:
pip install pandas nltk scikit-learn joblib streamlit
This installs:
  • pandas - Data manipulation and CSV loading
  • nltk - Natural Language Toolkit for text preprocessing
  • scikit-learn - Machine learning library with TF-IDF and Logistic Regression
  • joblib - Model serialization and persistence
  • streamlit - Web application framework
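
As a sanity check that these packages work together, here is a minimal sketch of the kind of pipeline the project's description implies (TF-IDF features fed to Logistic Regression, persisted with joblib). The toy texts, the `1 = fake / 0 = real` labeling convention, and the `demo_model.pkl` filename are illustrative assumptions, not taken from fake_news_ia.py:

```python
# Toy TF-IDF + Logistic Regression pipeline, persisted with joblib.
import pandas as pd
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = pd.Series([
    "aliens built the pyramids last week",
    "city council approves new budget",
    "miracle pill cures everything overnight",
    "central bank holds interest rates steady",
])
labels = [1, 0, 1, 0]  # assumed convention: 1 = fake, 0 = real

# Turn raw text into TF-IDF feature vectors
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Fit a simple linear classifier on those features
model = LogisticRegression()
model.fit(X, labels)

# Persist the fitted model to disk, as the real training script does
joblib.dump(model, "demo_model.pkl")
print(model.predict(vectorizer.transform(["shocking miracle cure discovered"])))
```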
Step 4: Download NLTK data

The model requires NLTK’s stopwords and tokenizers. Download them with:
python3 -c 'import nltk; nltk.download("stopwords"); nltk.download("punkt")'
This downloads:
  • English stopwords (common words like “the”, “is”, “and” to be filtered out)
  • Punkt tokenizer (for sentence and word tokenization)
NLTK data is downloaded to ~/nltk_data/ by default and is approximately 50MB.
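
If you want to confirm the download worked without re-running it, a short check like the following reports which resources are missing instead of raising (the resource paths match the stopwords and punkt downloads above):

```python
# Check whether the required NLTK resources are already on disk.
import nltk

def missing_nltk_resources(resources=("corpora/stopwords", "tokenizers/punkt")):
    """Return the subset of resources that nltk.data.find cannot locate."""
    missing = []
    for res in resources:
        try:
            nltk.data.find(res)
        except LookupError:
            missing.append(res)
    return missing

missing = missing_nltk_resources()
if missing:
    print("Missing NLTK resources:", ", ".join(missing))
else:
    print("All NLTK resources found.")
```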
Step 5: Download the datasets

The model requires two CSV files from Kaggle:
  • Fake.csv - Collection of fake news articles
  • True.csv - Collection of real news articles
Download these files and place them in your project directory alongside fake_news_ia.py.
The training script will fail if these files are not present. Ensure both CSV files are in the same directory as the Python scripts.
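
For reference, the two files are typically combined into one labeled DataFrame before training. This is a hedged sketch of that step, not the actual code from fake_news_ia.py; the `label` column and the `1 = fake / 0 = real` convention are assumptions:

```python
# Combine the two Kaggle CSVs into a single labeled DataFrame.
import os
import pandas as pd

def load_datasets(fake_path="Fake.csv", true_path="True.csv"):
    fake = pd.read_csv(fake_path)
    true = pd.read_csv(true_path)
    fake["label"] = 1  # assumed convention: 1 = fake
    true["label"] = 0  # assumed convention: 0 = real
    return pd.concat([fake, true], ignore_index=True)

# Only attempt the load if both files are present in the working directory
if os.path.exists("Fake.csv") and os.path.exists("True.csv"):
    df = load_datasets()
    print(df["label"].value_counts())
```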

Verify installation

Confirm everything is installed correctly:
python3 -c "import pandas, nltk, sklearn, joblib, streamlit; print('All dependencies installed successfully!')"
If you see the success message, you’re ready to proceed to the quickstart guide.
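
The one-liner above fails on the first missing import. If you would rather see the status of every dependency at once, this stdlib-only alternative reports each package individually (note scikit-learn imports as `sklearn`):

```python
# Report install status of each dependency instead of failing on the first.
from importlib.util import find_spec

REQUIRED = ["pandas", "nltk", "sklearn", "joblib", "streamlit"]

def check_deps(names=REQUIRED):
    """Map each package name to True if it is importable, else False."""
    return {name: find_spec(name) is not None for name in names}

for name, ok in check_deps().items():
    print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```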

Troubleshooting

NLTK stopwords not found

If you see an error about missing stopwords when running the scripts:
Error: Necesitas descargar las stopwords de NLTK. (You need to download the NLTK stopwords.)
Run the NLTK download command again:
python3 -c 'import nltk; nltk.download("stopwords")'

Model files not found

If the Streamlit app shows:
Error: Archivos de modelo o vectorizador (.pkl) no encontrados. (Model or vectorizer .pkl files not found.)
You need to train the model first:
python3 fake_news_ia.py
This creates modelo_fake_news.pkl and vectorizer_tfidf.pkl required by the app.
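
The app presumably loads those two files at startup, along the lines of this hedged sketch (the file names come from the error message above; the exact loading code in the app may differ):

```python
# Load the trained model and vectorizer, with an explicit check first.
import os
import joblib

MODEL_PATH = "modelo_fake_news.pkl"
VECTORIZER_PATH = "vectorizer_tfidf.pkl"

def load_artifacts():
    """Load the persisted model and vectorizer, or fail with a clear message."""
    if not (os.path.exists(MODEL_PATH) and os.path.exists(VECTORIZER_PATH)):
        raise FileNotFoundError(
            "Model or vectorizer .pkl not found - run fake_news_ia.py first."
        )
    return joblib.load(MODEL_PATH), joblib.load(VECTORIZER_PATH)
```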

CSV files not found

If training fails with:
Error: Asegúrate de que los archivos 'Fake.csv' y 'True.csv' estén en la misma carpeta. (Make sure the files 'Fake.csv' and 'True.csv' are in the same folder.)
Ensure both CSV files are in your project directory and not in a subdirectory.

Package versions

The project is tested with:
  • pandas 1.3+
  • nltk 3.6+
  • scikit-learn 0.24+
  • joblib 1.0+
  • streamlit 1.0+
Newer versions should work, but if you encounter issues, you can install specific versions:
pip install pandas==1.3.0 nltk==3.6 scikit-learn==0.24.0 joblib==1.0.0 streamlit==1.0.0

Next steps

Now that installation is complete, head to the quickstart guide to train your model and classify your first article.
