
Common errors

This page covers the most common issues you might encounter when using the Tweet Audit Tool and how to resolve them.

Environment and configuration errors

Error message:
GEMINI_API_KEY is required. Set it via environment variable or .env file
Cause: The tool cannot find your Gemini API key in the environment.

Solution:

1. Create .env file

Create a .env file in your project root if it doesn’t exist:
touch .env
2. Add your API key

Add your Gemini API key to the .env file:
GEMINI_API_KEY=your_actual_api_key_here
X_USERNAME=your_x_username
3. Get an API key

If you don’t have an API key yet, get one from Google AI Studio.
Make sure there are no spaces around the = sign and no quotes around the key value.
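If you’re unsure whether a .env line is formatted correctly, a small stand-alone check like the following catches both mistakes. This helper is illustrative only, not part of the tool:

```python
def check_env_line(line):
    """Return a list of formatting problems in one .env line.

    Illustrative helper, not part of the tool: it only catches the two
    mistakes called out above (spaces around '=' and quoted values).
    """
    problems = []
    line = line.rstrip("\n")
    if not line or line.startswith("#") or "=" not in line:
        return problems  # blank lines, comments, and non-assignments are ignored
    key, value = line.split("=", 1)
    if key != key.rstrip() or value != value.lstrip():
        problems.append("spaces around '='")
    if value.strip().startswith(('"', "'")):
        problems.append("quotes around the value")
    return problems
```

For example, `check_env_line('GEMINI_API_KEY = "abc"')` reports both problems, while `check_env_line("GEMINI_API_KEY=abc")` returns an empty list.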
Error message:
Tweet archive not found: data/tweets/tweets.json
Cause: The tool cannot find your Twitter archive file in the expected location.

Solution:

1. Request your archive

If you haven’t already, request your archive from X:
  • Go to X.com → More → Settings and Privacy → Your Account
  • Click “Download an archive of your data”
  • Wait 24-48 hours for the email
2. Extract the archive

Download and extract the ZIP file you receive from X.

3. Copy to correct location

Create the directory and copy the tweets file:
mkdir -p data/tweets
cp /path/to/your/archive/data/tweets.json data/tweets/tweets.json
Error message:
Corrupted checkpoint file data/checkpoint.txt: expected integer, got '...'
Cause: The checkpoint file that tracks progress has become corrupted.

Solution: Simply delete the checkpoint file and restart:
rm data/checkpoint.txt
python src/main.py analyze-tweets
Deleting the checkpoint will restart the analysis from the beginning. Any previously analyzed tweets will need to be re-analyzed.
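For context, a checkpoint file like this typically holds a single integer: the index of the last analyzed tweet. A sketch of how such a file is usually read (assumed logic, not the tool’s actual code) shows why any non-integer content triggers the error:

```python
def read_checkpoint(path):
    """Sketch of a typical checkpoint reader (assumed, not the tool's code):
    the file holds one integer, the index of the last analyzed tweet."""
    try:
        with open(path) as f:
            raw = f.read().strip()
        return int(raw)  # raises ValueError if the file is corrupted
    except FileNotFoundError:
        return 0  # no checkpoint yet: start from the first tweet
```

Anything else in the file (for example, a partially written line left by a crash) raises `ValueError`, which is what the error message above reports.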

File format errors

Error message:
Missing required column 'id' in data/tweets/transformed/tweets.csv
Cause: The transformed CSV file is corrupted or incomplete.

Solution: Delete the corrupted CSV and re-extract tweets from the archive:
rm data/tweets/transformed/tweets.csv
python src/main.py extract-tweets
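To confirm the regenerated CSV is well-formed, you can inspect its header with a few lines of stdlib Python. This is an illustrative check: `id` is the column named in the error, but the tool may require other columns too:

```python
import csv

def missing_columns(path, required=("id",)):
    """Return required column names absent from the CSV header.

    Illustrative check: 'id' is the column named in the error message;
    the tool may expect additional columns as well.
    """
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    return [name for name in required if name not in header]
```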
Error message:
Invalid JSON in data/tweets/tweets.json: ...
Cause: The Twitter archive JSON file is malformed or incomplete.

Solution:

1. Verify the download

Re-download your Twitter archive from the email link.

2. Extract completely

Ensure the ZIP file is fully extracted without errors.

3. Copy again

Copy the tweets.json file again:
cp /path/to/archive/data/tweets.json data/tweets/tweets.json
Error message:
Missing required field 'id_str' in data/tweets/tweets.json.
Expected format: [{'tweet': {'id_str': '...', 'full_text': '...'}}]
Cause: The archive file format doesn’t match the expected Twitter export structure.

Solution: Ensure you’re using the official Twitter/X archive export:
  • The file should come from X.com’s “Download an archive of your data” feature
  • The JSON should contain a list of objects with a tweet key
  • Each tweet should have id_str and full_text fields
If you’re using an old archive format, you may need to request a fresh archive export from X.
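You can pre-check an archive file against the expected structure with a short script. This sketch is based only on the format shown in the error message above (a list of objects with a tweet key containing id_str and full_text):

```python
import json

def validate_archive(path):
    """Check tweets.json against the structure in the error message above:
    [{'tweet': {'id_str': '...', 'full_text': '...'}}].
    Returns 'ok' or a description of the first problem found."""
    with open(path) as f:
        data = json.load(f)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, list):
        return "top level is not a list"
    for i, item in enumerate(data):
        tweet = item.get("tweet") if isinstance(item, dict) else None
        if tweet is None:
            return f"item {i}: missing 'tweet' key"
        for field in ("id_str", "full_text"):
            if field not in tweet:
                return f"item {i}: missing '{field}' field"
    return "ok"
```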
Error message:
Invalid CSV format in data/tweets/transformed/tweets.csv: ...
Cause: The CSV file has formatting issues.

Solution: Re-extract tweets to regenerate the CSV:
rm -rf data/tweets/transformed/
python src/main.py extract-tweets

API and analysis errors

Error message:
429 rate limit exceeded
Cause: You’re hitting Gemini API rate limits.

Solution: The tool automatically retries with exponential backoff, but you can also:

1. Increase delay

Add or adjust RATE_LIMIT_SECONDS in your .env:
RATE_LIMIT_SECONDS=2.0
2. Wait and resume

The tool saves progress automatically. Wait a few minutes and run again:
python src/main.py analyze-tweets
3. Check your quota

Visit Google AI Studio to check your API quota limits.
Gemini 2.5 Flash free tier: 15 requests per minute, 1,500 per day. For large tweet volumes, spread analysis over multiple days.
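The retry behavior described above amounts to exponential backoff: wait longer after each failure, with a little jitter so retries don’t synchronize. A generic sketch of the pattern (not the tool’s actual code):

```python
import random
import time

def with_backoff(call, retries=3, base_delay=1.0):
    """Retry a callable with exponential backoff plus jitter.

    Generic sketch of the pattern described above, not the tool's code:
    delays grow as base_delay * 2**attempt (1s, 2s, 4s, ...).
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

Raising `RATE_LIMIT_SECONDS` plays the same role as `base_delay` here: it spaces requests further apart so the 429 never fires in the first place.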
Error message:
Empty response from Gemini for tweet 1234567890
Cause: The API returned no content for a specific tweet.

Solution: This is usually a transient error. The tool will:
  1. Automatically retry up to 3 times with exponential backoff
  2. If it continues failing, the tweet may contain content that Gemini cannot process
You can resume analysis after checking the logs:
python src/main.py analyze-tweets
Error message:
Invalid Gemini response for tweet 1234567890: ...
Missing decision field in Gemini response for tweet 1234567890: ...
Invalid decision value from Gemini for tweet 1234567890: ...
Cause: Gemini returned a response in an unexpected format.

Solution: This indicates an issue with the AI model’s output format. Try:

1. Check your model

Verify you’re using a supported model in .env:
GEMINI_MODEL=gemini-2.5-flash
2. Retry analysis

The tool will resume from the last successful checkpoint:
python src/main.py analyze-tweets
3. Check API status

If the issue persists, check Google Cloud Status for API outages.
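These three errors plausibly correspond to three validation steps applied to each response: parse the JSON, require a decision field, and require a known decision value. A sketch of that logic under stated assumptions (the KEEP/DELETE value set is a guess; only “KEEP” appears elsewhere on this page):

```python
import json

VALID_DECISIONS = {"KEEP", "DELETE"}  # assumed set; only "KEEP" appears on this page

def parse_decision(raw):
    """Mirror the three error messages above (assumed validation logic,
    not the tool's actual code)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid Gemini response: {exc}")
    if "decision" not in payload:
        raise ValueError("Missing decision field in Gemini response")
    if payload["decision"] not in VALID_DECISIONS:
        raise ValueError(f"Invalid decision value from Gemini: {payload['decision']!r}")
    return payload["decision"]
```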
Error messages:
Connection timeout
Connection error
503 Service temporarily unavailable
Cause: Network issues or temporary API unavailability.

Solution: The tool automatically retries with exponential backoff for:
  • Timeouts
  • Connection errors
  • Rate limits (429)
  • Service unavailable (503)
  • Quota errors
If errors persist:
  1. Check your internet connection
  2. Wait a few minutes and resume analysis
  3. The checkpoint system ensures you don’t lose progress

Permission errors

Error message:
Permission denied: data/tweets/processed/results.csv
Cause: The tool cannot write to the output directory due to file permissions.

Solution:

1. Check directory permissions

Ensure you have write access to the data directory:
ls -la data/
2. Fix permissions

If needed, update permissions:
chmod -R u+w data/
3. Check disk space

Ensure you have sufficient disk space:
df -h .

Analysis quality issues

Gemini returns “KEEP” for everything

Symptom: The analysis completes but no tweets are flagged for deletion, even though you expected some to be flagged.

Cause: Your criteria might be too lenient or not specific enough.

Solution:

1. Review your criteria

Open config.json and examine your criteria settings.

2. Add forbidden words

Include specific words that should trigger deletion:
"forbidden_words": ["damn", "wtf", "crypto", "NFT"]
3. Be more specific with topics

Make your topics_to_exclude more detailed:
"topics_to_exclude": [
  "Profanity or unprofessional language",
  "Personal attacks or insults directed at individuals",
  "Political opinions from 2020-2021",
  "Cryptocurrency or NFT promotion"
]
4. Add stronger instructions

Provide clearer guidance:
"additional_instructions": "Be aggressive in flagging content. Flag anything that could be seen as unprofessional by a potential employer or client."
5. Test on a sample

Before re-running on all tweets:
  1. Create a small test archive with 5-10 known problematic tweets
  2. Run the analysis
  3. Verify the results match your expectations
  4. Adjust criteria as needed
6. Reset and re-analyze

Once criteria are refined, restart:
rm data/checkpoint.txt
rm data/tweets/processed/results.csv
python src/main.py analyze-tweets
Deleting the checkpoint and results file will restart the analysis from scratch. This will use additional API quota.
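Before spending API quota on a re-run, you can preview the effect of the forbidden_words list locally with a plain substring check. This is illustrative only: the real flagging is done by Gemini, which also weighs topics_to_exclude and additional_instructions:

```python
def matched_forbidden_words(text, forbidden_words):
    """Case-insensitive substring preview of the forbidden_words criterion.

    Illustrative only: the real analysis is done by Gemini, which also
    considers topics_to_exclude and additional_instructions.
    """
    lowered = text.lower()
    return [w for w in forbidden_words if w.lower() in lowered]
```

For example, `matched_forbidden_words("Just bought an NFT, wtf", ["damn", "wtf", "crypto", "NFT"])` returns `["wtf", "NFT"]`; an empty result means the word list alone would not flag that tweet.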

Getting help

If you encounter an error not listed here:
  1. Check the logs: Set LOG_LEVEL=DEBUG in your .env for detailed logging
  2. Search existing issues: Check the GitHub issues
  3. Report a bug: Open a new issue with:
    • The full error message
    • Your Python version (python --version)
    • Steps to reproduce
    • Relevant log output (with sensitive data removed)
Remember to never share your actual API keys or personal tweet content when reporting issues.
