The `analyze-tweets` command processes your extracted tweets using Google's Gemini AI to identify tweets that don't align with your configured criteria.
## Usage

### Prerequisites

Ensure you have:

- ✅ Successfully run `extract-tweets` to generate the CSV file
- ✅ Set your `GEMINI_API_KEY` in the `.env` file
- ✅ Configured your criteria in `config.json` (optional; defaults are used if not provided)
- ✅ Set your `X_USERNAME` in the `.env` file
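A minimal `.env` for this command might look like the following (the values are placeholders; the variable names are the two mentioned in the prerequisites above):

```ini
GEMINI_API_KEY=your-google-ai-studio-key
X_USERNAME=yourhandle
```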
## What it does

The `analyze-tweets` command performs the following operations:

- Loads tweets from the extracted CSV file
- Initializes the Gemini AI analyzer
- Processes tweets in batches (default: 10 tweets per batch)
- Skips retweets (tweets starting with “RT @”)
- Analyzes each tweet against your configured criteria
- Writes flagged tweets (marked for deletion) to the results CSV
- Saves progress checkpoints for resumability
The analysis respects rate limits and can be safely interrupted. Use Ctrl+C to stop, then re-run to resume from the last checkpoint.
Input requirements
The CSV file generated by the
extract-tweets command containing all your tweets.Environment configuration file containing:
Optional criteria configuration. If not provided, sensible defaults are used.
## Output

A CSV file containing tweets flagged for deletion, with columns:

- `tweet_url`: Full URL to the tweet (e.g., `https://x.com/username/status/123456`)
- `deleted`: Boolean flag (initially `false`; update to `true` after manual deletion)

A checkpoint file tracking analysis progress (tweet index), used for resuming interrupted sessions.
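As a sketch of how the results file can be consumed, the following reads the two columns described above (the sample rows and the `pending_deletions` helper are illustrative, not part of the project):

```python
import csv
import io

# Hypothetical excerpt of the results CSV (columns match the schema above)
SAMPLE = """tweet_url,deleted
https://x.com/username/status/111,false
https://x.com/username/status/222,true
"""

def pending_deletions(csv_text: str) -> list[str]:
    """Return URLs of flagged tweets that have not been deleted yet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["tweet_url"] for row in reader if row["deleted"] == "false"]

print(pending_deletions(SAMPLE))  # ['https://x.com/username/status/111']
```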
## Success output

When the analysis succeeds, a completion summary is logged. The analyzed count may be lower than your total tweet count because retweets are automatically skipped.
## Behavior details

### Batch processing

Tweets are processed in configurable batches (default: 10 tweets per batch):

- Progress is logged after each batch (src/application.py:89-92)
- Checkpoints are saved after completing each batch (src/application.py:116-117)
- Rate limiting is applied between individual tweet analyses
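A minimal sketch of this loop, assuming one API call per tweet with a configurable delay between calls (the function and variable names are illustrative, not the real code in src/application.py):

```python
import time

def analyze(tweet: str) -> str:
    """Placeholder for the real Gemini call; returns KEEP or DELETE."""
    return "KEEP"

def process_in_batches(tweets: list[str], batch_size: int = 10, delay: float = 1.0) -> int:
    """Analyze tweets batch by batch, sleeping between individual calls."""
    analyzed = 0
    for start in range(0, len(tweets), batch_size):
        for tweet in tweets[start:start + batch_size]:
            analyze(tweet)
            analyzed += 1
            time.sleep(delay)  # rate limit between individual tweet analyses
        # after each batch, the real command logs progress and saves a checkpoint
    return analyzed

print(process_in_batches(["tweet"] * 25, delay=0))  # 25
```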
### Retweet handling

Retweets are automatically skipped (src/application.py:95-96). They are not analyzed because they are not your original content and have different deletion implications.
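The skip condition amounts to a prefix check on the tweet text (the "RT @" prefix is described above; the helper name is illustrative):

```python
def is_retweet(text: str) -> bool:
    """Tweets in the archive that start with 'RT @' are retweets."""
    return text.startswith("RT @")

tweets = ["RT @someone: great thread", "My own original tweet"]
originals = [t for t in tweets if not is_retweet(t)]
print(originals)  # ['My own original tweet']
```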
### Analysis decisions

The Gemini analyzer evaluates each tweet and returns one of two decisions:

- `KEEP`: The tweet aligns with your criteria and should be kept
- `DELETE`: The tweet violates your criteria and should be considered for deletion

Tweets marked `DELETE` are written to the results CSV (src/application.py:103-104).
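These two outcomes can be modeled as a small enum; the KEEP/DELETE names come from the decisions above, while everything else here is a sketch rather than the project's actual types:

```python
from enum import Enum

class Decision(Enum):
    KEEP = "KEEP"      # tweet aligns with criteria
    DELETE = "DELETE"  # tweet flagged for possible deletion

def flagged_urls(analyses: list[tuple[str, Decision]]) -> list[str]:
    """Return the URLs of tweets the analyzer marked DELETE."""
    return [url for url, decision in analyses if decision is Decision.DELETE]

results = [
    ("https://x.com/u/status/1", Decision.KEEP),
    ("https://x.com/u/status/2", Decision.DELETE),
]
print(flagged_urls(results))  # ['https://x.com/u/status/2']
```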
### Resumability

The command automatically resumes from the last checkpoint:

- On start, reads the checkpoint file (src/application.py:78)
- Skips already-processed tweets
- Continues from the last saved position
- Saves progress after each batch
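Checkpointing of this kind can be as simple as persisting the index of the last processed tweet; the file name and JSON shape below are assumptions, not the project's actual format:

```python
import json
from pathlib import Path

CHECKPOINT = Path("analysis_checkpoint.json")  # hypothetical file name

def load_checkpoint() -> int:
    """Return the index to resume from, or 0 on a fresh run."""
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())["last_index"]
    return 0

def save_checkpoint(index: int) -> None:
    CHECKPOINT.write_text(json.dumps({"last_index": index}))

save_checkpoint(20)
print(load_checkpoint())  # 20
```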
## Error handling

The command handles several error scenarios:

### File not found

### Invalid format

### Analysis failed

When a tweet analysis fails:

- The error is logged with full details
- The command exits with error code 1
- A partial result is returned showing how many tweets were successfully analyzed
- The checkpoint is saved up to the last successful batch

### Permission denied

### Empty tweet list
If no tweets are found in the CSV, this case is handled as well.

## Configuration
The command uses these settings:

### From environment (.env)

- `GEMINI_API_KEY`: Your Google AI Studio API key for accessing Gemini
- `X_USERNAME`: Your X (Twitter) username, used to generate tweet URLs
- The Gemini model to use for analysis
- The delay in seconds between API calls, to respect rate limits
- The logging verbosity: DEBUG, INFO, WARNING, or ERROR
### From config (src/config.py)

- The number of tweets to process before saving a checkpoint (batch size)
- The input CSV file location
- The output results file location
- The checkpoint file location
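These settings could be sketched as a dataclass; the field names and the checkpoint file name are assumptions (the input and results paths appear elsewhere on this page, and the real definitions live in src/config.py):

```python
from dataclasses import dataclass

@dataclass
class AnalyzerConfig:
    # All field names are illustrative, not the actual src/config.py attributes
    batch_size: int = 10
    input_csv: str = "data/tweets/transformed/tweets.csv"
    results_csv: str = "results.csv"
    checkpoint_file: str = "analysis_checkpoint.json"  # hypothetical name

cfg = AnalyzerConfig()
print(cfg.batch_size)  # 10
```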
## Example workflow

A complete workflow: run `extract-tweets` to build the CSV, run `analyze-tweets` to flag tweets, then review the results file.

## Logging

Detailed logs provide visibility into the analysis process.

## Performance

Analysis time depends on several factors:

| Tweet Count | Default Rate (1s) | Fast Rate (0.5s) | API Calls |
|---|---|---|---|
| 100 tweets | ~2 minutes | ~1 minute | ~100 |
| 1,000 tweets | ~17 minutes | ~8 minutes | ~1,000 |
| 10,000 tweets | ~3 hours | ~1.5 hours | ~10,000 |
| 50,000 tweets | ~14 hours | ~7 hours | ~50,000 |
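The table's lower-bound estimates follow from one API call per tweet, spaced by the rate-limit delay:

```python
def estimated_minutes(tweet_count: int, delay_seconds: float = 1.0) -> float:
    """Lower bound: one API call per tweet, spaced by the rate-limit delay."""
    return tweet_count * delay_seconds / 60

print(round(estimated_minutes(1_000)))        # 17 (minutes, at the 1 s default)
print(round(estimated_minutes(50_000) / 60))  # 14 (hours)
```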
Actual time may be longer due to:
- Network latency
- API response times
- Automatic retry delays on errors
- Skipped retweets (reducing total API calls)
## Optimizing performance

You can reduce the rate-limit delay for faster processing, but be mindful of API quotas.

## API costs

Gemini 2.5 Flash pricing (as of 2026):

- Free tier: 15 requests per minute, 1,500 requests per day
- Paid tier: Check current pricing at Google AI Pricing
| Archive Size | Daily Limit Coverage | Estimated Cost (Paid) |
|---|---|---|
| 1,000 tweets | Fits in 1 day | ~$0.01 |
| 10,000 tweets | Requires 7 days | ~$0.10 |
| 50,000 tweets | Requires 34 days | ~$0.50 |
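The "Daily Limit Coverage" column follows directly from the free tier's 1,500-requests-per-day cap:

```python
import math

FREE_TIER_DAILY_LIMIT = 1_500  # requests per day (free tier)

def days_needed(tweet_count: int) -> int:
    """Days required to analyze an archive within the free tier's daily cap."""
    return math.ceil(tweet_count / FREE_TIER_DAILY_LIMIT)

print(days_needed(10_000))  # 7
print(days_needed(50_000))  # 34
```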
## Troubleshooting

### Analysis stops midway

- **Cause:** Rate limit exceeded, network error, or API quota reached
- **Solution:** Simply re-run the command; it resumes from the checkpoint

### All tweets marked as KEEP

- **Cause:** Your criteria may be too lenient
- **Solution:** Refine your `config.json` with more specific criteria

### All tweets marked as DELETE

- **Cause:** Your criteria may be too strict
- **Solution:** Relax your criteria or add more nuanced guidance in `additional_instructions`

### "API key is invalid"

- **Cause:** Invalid or missing `GEMINI_API_KEY`
- **Solution:** Verify your API key at Google AI Studio

### "Missing required column: id"

- **Cause:** Corrupted CSV file
- **Solution:** Delete `data/tweets/transformed/tweets.csv` and re-run `extract-tweets`
## Next steps

After successfully analyzing your tweets:

- Review the `results.csv` file
- Visit each flagged tweet URL to manually review it
- Delete tweets you agree should be removed
- Track your progress by updating the `deleted` column in the CSV
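Updating the `deleted` column can also be scripted; this sketch rewrites the CSV text in memory (the column names match the output schema, but the `mark_deleted` helper is illustrative):

```python
import csv
import io

def mark_deleted(csv_text: str, url: str) -> str:
    """Return the results CSV with deleted=true for the given tweet URL."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        if row["tweet_url"] == url:
            row["deleted"] = "true"
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["tweet_url", "deleted"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

sample = "tweet_url,deleted\nhttps://x.com/u/status/1,false\n"
print(mark_deleted(sample, "https://x.com/u/status/1"))
```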
## Configuration guide

Learn how to customize your analysis criteria for better results.