Before starting, make sure you’ve completed the installation steps, including setting up your
.env file and placing your Twitter archive in data/tweets/tweets.json.
Your first analysis
The tweet audit process involves two simple commands:
Extract tweets from archive
Convert your X archive JSON to a structured CSV format:
Expected output:
This command:
- Reads data/tweets/tweets.json (your X archive)
- Parses and validates tweet data
- Writes structured CSV to data/tweets/transformed/tweets.csv
- Skips retweets automatically
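The steps listed above can be sketched roughly as follows. This is an illustrative sketch, not the project's code: the function name `extract_tweets`, the `id` and `text` column names, the `{"tweet": {...}}` wrapper, the `id_str` and `full_text` field names, and the "RT @" retweet check are all assumptions about the archive layout.

```python
import csv
import json
from pathlib import Path


def extract_tweets(archive_path: Path, csv_path: Path) -> int:
    """Convert an archive JSON file to CSV, skipping retweets.

    Returns the number of rows written (header excluded).
    """
    records = json.loads(archive_path.read_text(encoding="utf-8"))
    csv_path.parent.mkdir(parents=True, exist_ok=True)

    written = 0
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "text"])
        writer.writeheader()
        for record in records:
            # Assumption: an entry may be wrapped as {"tweet": {...}}.
            tweet = record.get("tweet", record)
            tweet_id = tweet.get("id_str") or tweet.get("id")
            text = tweet.get("full_text") or tweet.get("text") or ""
            if not tweet_id or not text:
                continue  # validation: drop entries missing required fields
            if text.startswith("RT @"):
                continue  # assumption: retweets are recognised by this prefix
            writer.writerow({"id": tweet_id, "text": text})
            written += 1
    return written
```

Calling `extract_tweets(Path("data/tweets/tweets.json"), Path("data/tweets/transformed/tweets.csv"))` would mirror the paths used in this guide.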
The extraction process is defined in src/application.py:47-62. It uses JSONParser to read the archive and CSVWriter to create the transformed output.
Analyze tweets with AI
Send tweets to Gemini AI for evaluation against your criteria:
Expected output:
This command:
- Loads tweets from the transformed CSV
- Processes tweets in batches (default: 10 tweets per batch)
- Sends each tweet to Gemini AI for analysis
- Writes flagged tweets to data/tweets/processed/results.csv
- Saves checkpoints after each batch for resumability
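A minimal sketch of the batch loop described above, under stated assumptions: `batches`, `analyze_all`, and the `should_flag` callback are hypothetical names, the callback stands in for the Gemini call, and the `https://x.com/i/status/<id>` URL form is an assumption about how links are built.

```python
import csv
from pathlib import Path
from typing import Callable, Iterator


def batches(items: list, size: int = 10) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def analyze_all(tweets: list, should_flag: Callable[[dict], bool],
                results_path: Path, batch_size: int = 10) -> int:
    """Write flagged tweets to a results CSV; returns how many were flagged."""
    results_path.parent.mkdir(parents=True, exist_ok=True)
    flagged = 0
    with results_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["tweet_url", "deleted"])
        writer.writeheader()
        for batch in batches(tweets, batch_size):
            for tweet in batch:
                # Hypothetical stand-in for one AI evaluation per tweet.
                if should_flag(tweet):
                    url = f"https://x.com/i/status/{tweet['id']}"
                    writer.writerow({"tweet_url": url, "deleted": "false"})
                    flagged += 1
            # Persist results at the batch boundary; a checkpoint
            # would be saved at this point as well.
            handle.flush()
    return flagged
```

Keeping the evaluation behind a callback makes the loop easy to exercise with a plain function before any API quota is spent.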
Review results
Open the results CSV to see flagged tweets:
Example output:
Each row contains:
- tweet_url: Direct link to the tweet on X
- deleted: Status flag (false by default, update to true after deleting)
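As a small illustration of working with these two columns, the sketch below lists the flagged tweets that have not yet been marked as deleted. The helper name `pending_deletions` is hypothetical; only the `tweet_url` and `deleted` column names come from this guide.

```python
import csv
from pathlib import Path


def pending_deletions(results_path: Path) -> list:
    """Return the URLs of flagged tweets whose `deleted` flag is not true."""
    with results_path.open(newline="", encoding="utf-8") as handle:
        return [
            row["tweet_url"]
            for row in csv.DictReader(handle)
            if row["deleted"].strip().lower() != "true"
        ]
```

Running it against data/tweets/processed/results.csv after each manual deletion pass gives a quick view of what is left.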
Understanding the output
After running both commands, your data/ directory will look like:
File details
tweets.json
Your original X archive export. This file is never modified.
transformed/tweets.csv
Structured CSV with extracted tweet data. Used as input for analysis.
processed/results.csv
Final output with URLs of tweets flagged for deletion. Update the deleted column to true after manually deleting tweets to track your progress.
checkpoint.txt
Stores the last processed tweet index for resumability. If analysis is interrupted, the tool automatically resumes from this point.
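To make the idea of an index-based checkpoint concrete, here is a minimal sketch. It is not the project's Checkpoint class: the name `FileCheckpoint`, its `load` and `save` methods, and the choice to fall back to 0 on a missing or unreadable file are assumptions for illustration.

```python
from pathlib import Path


class FileCheckpoint:
    """Hypothetical checkpoint that stores one integer index in a text file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> int:
        """Return the saved index, or 0 if no usable checkpoint exists."""
        try:
            return int(self.path.read_text(encoding="utf-8").strip())
        except (FileNotFoundError, ValueError):
            return 0

    def save(self, index: int) -> None:
        """Record the index of the next tweet to process."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(str(index), encoding="utf-8")
```

A resumable run would then start from `tweets[checkpoint.load():]` and call `checkpoint.save(...)` after each completed batch.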
Advanced usage
Resume interrupted analysis
The tool automatically saves progress after each batch. If interrupted (Ctrl+C, crash, API quota exceeded):
The checkpoint system is implemented in src/application.py:77-79 using the Checkpoint class that reads/writes data/checkpoint.txt.
Test with a small sample
Before processing thousands of tweets, test your criteria on a small sample:
Review and refine
Check data/tweets/processed/results.csv and adjust your config.json criteria based on results.
Adjust processing speed
Control API call frequency to balance speed and rate limits:
Change batch size
Modify batch size in src/config.py (default: 10 tweets per batch):
- Larger batches (e.g., 50) = Faster processing, longer recovery if interrupted
- Smaller batches (e.g., 5) = More frequent checkpoints, better resumability
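The trade-off above can be put into numbers with a short calculation. The sketch assumes a checkpoint is written only at batch boundaries, so an interruption just before a checkpoint repeats at most one full batch; `batch_tradeoff` is a hypothetical helper, not part of the tool.

```python
import math


def batch_tradeoff(total_tweets: int, batch_size: int) -> tuple:
    """Return (checkpoints written, worst-case tweets redone after an interruption)."""
    checkpoints = math.ceil(total_tweets / batch_size)
    worst_case_redo = min(batch_size, total_tweets)
    return checkpoints, worst_case_redo


print(batch_tradeoff(1000, 5))   # (200, 5)
print(batch_tradeoff(1000, 50))  # (20, 50)
```

For 1,000 tweets, a batch size of 5 writes ten times as many checkpoints as a batch size of 50, but risks repeating a tenth as much work.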
Enable debug logging
Get detailed logs for troubleshooting (configured in .env):
Understanding AI decisions
Gemini AI evaluates each tweet and returns a decision (see src/analyzer.py:107-136):
Common workflows
Full archive cleanup
Process all tweets and clean up your entire history:
Iterative refinement
Test criteria on samples before full analysis:
Periodic audits
Run audits regularly to catch new problematic tweets:
Troubleshooting
Missing required column: id
The transformed CSV is corrupted. Delete and re-extract:
API rate limit errors (429)
You’re hitting Gemini’s rate limits. Solutions:
- Increase delay in .env:
- Wait and resume (the tool auto-resumes):
- Check daily quota at Google AI Studio
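For readers who want to see how waiting between attempts helps with rate limits, here is a generic sketch of retrying with exponential backoff. It does not reproduce the tool's internals: `RateLimited` is a hypothetical stand-in for whatever error an API client raises on HTTP 429, and `call_with_backoff`, `max_retries`, and `base_delay` are illustrative names and defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 error raised by an API client."""


def call_with_backoff(request: Callable[[], T], max_retries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `request`, doubling the wait after each rate-limit failure."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller decide what to do
            # Waits base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_retries must be >= 0")
```

The `sleep` parameter is injectable so the behaviour can be checked without actually waiting.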
Empty response from Gemini
Gemini returned an empty response. The tool will retry automatically (up to 3 times with exponential backoff).
If it persists:
Tool keeps flagging everything (or nothing)
Your criteria might be too strict (or too lenient). Adjust config.json:
If flagging too much:
- Remove some forbidden_words
- Make topics_to_exclude more specific
- Soften tone_requirements
If flagging too little:
- Add more forbidden_words
- Expand topics_to_exclude
- Strengthen additional_instructions
Cost and time estimates
Gemini API limits (free tier)
- Requests per minute: 15
- Requests per day: 1,500
- Cost: Free for moderate use
Processing time
- 100 tweets: ~2 minutes (1 req/sec)
- 1,000 tweets: ~17 minutes (1 req/sec)
- 10,000 tweets: ~7 days (1,500 req/day limit)
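The arithmetic behind these estimates can be sketched as below, assuming one request per tweet, a fixed pace between requests, and a daily quota that resets once per day. The function `estimate` and its parameter names are hypothetical.

```python
import math


def estimate(tweets: int, seconds_per_request: float = 1.0,
             daily_limit: int = 1500) -> tuple:
    """Return (calendar days needed, minutes spent actively sending requests)."""
    active_minutes = tweets * seconds_per_request / 60
    # Each day can absorb at most `daily_limit` requests.
    calendar_days = max(math.ceil(tweets / daily_limit), 1)
    return calendar_days, active_minutes


for count in (100, 1_000, 10_000):
    days, minutes = estimate(count)
    print(f"{count} tweets: {days} day(s), about {minutes:.0f} min of requests")
```

Under these assumptions, 10,000 tweets need 7 calendar days because of the 1,500 requests per day limit. Note that a pace of 1 request per second is faster than 15 requests per minute; if the per-minute limit is the binding constraint, `seconds_per_request=4` would be the more realistic input and the minute figures grow fourfold.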
Next steps
- Configuration guide: Learn advanced criteria customization and settings
- CLI reference: Complete command-line interface documentation
- Troubleshooting: Common issues and solutions
- GitHub repository: View source code and contribute