The analyze-tweets command processes your extracted tweets using Google’s Gemini AI to identify tweets that don’t align with your configured criteria.

Usage

python src/main.py analyze-tweets
This command takes no additional arguments or options.

Prerequisites

Before running this command, ensure you have:
  1. ✅ Successfully run extract-tweets to generate the CSV file
  2. ✅ Set your GEMINI_API_KEY in the .env file
  3. ✅ Configured your criteria in config.json (optional, uses defaults if not provided)
  4. ✅ Set your X_USERNAME in the .env file

What it does

The analyze-tweets command performs the following operations:
  1. Loads tweets from the extracted CSV file
  2. Initializes the Gemini AI analyzer
  3. Processes tweets in batches (default: 10 tweets per batch)
  4. Skips retweets (tweets starting with “RT @”)
  5. Analyzes each tweet against your configured criteria
  6. Writes flagged tweets (marked for deletion) to the results CSV
  7. Saves progress checkpoints for resumability
The analysis respects rate limits and can be safely interrupted. Use Ctrl+C to stop, then re-run to resume from the last checkpoint.
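The steps above can be sketched as a single batch loop. This is an illustrative sketch, not the project's actual code: the `Tweet` dataclass, `analyze_batches`, and the `analyze` callback are assumed names; the real logic lives in src/application.py.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    id: str
    content: str


def analyze_batches(tweets, analyze, start_index=0, batch_size=10):
    """Analyze tweets in batches, skipping retweets.

    Returns the IDs flagged for deletion and the checkpoint index
    (saved after each completed batch, enabling resume).
    """
    flagged = []
    checkpoint = start_index
    for i in range(start_index, len(tweets), batch_size):
        batch = tweets[i:i + batch_size]
        for tweet in batch:
            if tweet.content.startswith("RT @"):
                continue  # retweets are never analyzed
            if analyze(tweet) == "DELETE":
                flagged.append(tweet.id)
        checkpoint = i + len(batch)  # checkpoint advances per batch
    return flagged, checkpoint


# Toy run with a stand-in analyzer instead of the Gemini API:
tweets = [Tweet("1", "hello"), Tweet("2", "RT @x: hi"), Tweet("3", "bad word")]
flagged, ckpt = analyze_batches(
    tweets, lambda t: "DELETE" if "bad" in t.content else "KEEP"
)
```

Because the checkpoint only advances after a full batch, an interrupted run re-analyzes at most one partial batch on resume.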

Input requirements

data/tweets/transformed/tweets.csv
file
required
The CSV file generated by the extract-tweets command containing all your tweets.
.env
file
required
Environment configuration file containing:
GEMINI_API_KEY=your_api_key_here
X_USERNAME=your_twitter_username
GEMINI_MODEL=gemini-2.5-flash  # Optional
RATE_LIMIT_SECONDS=1.0  # Optional
config.json
file
Optional criteria configuration. If not provided, sensible defaults are used.
{
  "criteria": {
    "forbidden_words": ["word1", "word2"],
    "topics_to_exclude": ["Topic 1", "Topic 2"],
    "tone_requirements": ["Professional", "Respectful"],
    "additional_instructions": "Custom guidance for AI"
  }
}

Output

data/tweets/processed/results.csv
file
A CSV file containing tweets flagged for deletion with columns:
  • tweet_url: Full URL to the tweet (e.g., https://x.com/username/status/123456)
  • deleted: Boolean flag (initially false, update to true after manual deletion)
data/checkpoint.txt
file
A checkpoint file tracking analysis progress (tweet index). Used for resuming interrupted sessions.
Both output files are created automatically with secure permissions (0o600, owner read/write only).
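A minimal sketch of consuming the results file with Python's csv module, assuming the two columns described above. The sample rows here are made up, not real output:

```python
import csv
import io

# Sample rows shaped like results.csv (illustrative only).
sample = (
    "tweet_url,deleted\n"
    "https://x.com/username/status/111,false\n"
    "https://x.com/username/status/222,true\n"
)

# Collect URLs still awaiting manual deletion (deleted == "false").
pending = [
    row["tweet_url"]
    for row in csv.DictReader(io.StringIO(sample))
    if row["deleted"] == "false"
]
```

In a real session you would open data/tweets/processed/results.csv instead of the in-memory sample.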

Success output

When the analysis succeeds, you’ll see:
Analyzing tweets...
Successfully analyzed 1247 tweets
The number displayed is the count of tweets actually analyzed (excluding retweets).
The analyzed count may be lower than your total tweet count because retweets are automatically skipped.

Behavior details

Batch processing

Tweets are processed in configurable batches (default: 10 tweets per batch):
  • Progress is logged after each batch (src/application.py:89-92)
  • Checkpoints are saved after completing each batch (src/application.py:116-117)
  • Rate limiting is applied between individual tweet analyses

Retweet handling

Retweets are automatically skipped (src/application.py:95-96):
if tweet.content.startswith("RT @"):
    continue  # Skip retweets
Retweets are not analyzed because they are not your original content and have different deletion implications.

Analysis decisions

The Gemini analyzer evaluates each tweet and returns one of two decisions:
KEEP
Decision
The tweet aligns with your criteria and should be kept
DELETE
Decision
The tweet violates your criteria and should be considered for deletion
Only tweets marked as DELETE are written to the results CSV (src/application.py:103-104).
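A binary decision like this is typically normalized from the model's free-text reply. The helper below is a hypothetical sketch of that pattern, not necessarily how this project parses Gemini responses:

```python
def parse_decision(response_text: str) -> str:
    """Map a free-text model reply onto KEEP or DELETE.

    Defaults to the safe choice (KEEP) when the reply is
    ambiguous or empty.
    """
    text = response_text.strip().upper()
    return "DELETE" if "DELETE" in text else "KEEP"
```

Defaulting to KEEP on ambiguity errs on the side of not recommending deletion.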

Resumability

The command automatically resumes from the last checkpoint:
  1. On start, reads the checkpoint file (src/application.py:78)
  2. Skips already-processed tweets
  3. Continues from the last saved position
  4. Saves progress after each batch
# If interrupted at tweet 150:
python src/main.py analyze-tweets
# Automatically resumes from tweet 150
To restart analysis from the beginning, delete data/checkpoint.txt before running the command.
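Since the checkpoint file holds only a single tweet index, the read/write logic can be as small as this sketch (the function names are assumptions, not the project's API):

```python
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> int:
    """Return the saved tweet index, or 0 when no valid checkpoint exists."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0


def save_checkpoint(path: Path, index: int) -> None:
    """Persist the index of the next tweet to process."""
    path.write_text(str(index))


# Round-trip in a temporary directory:
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.txt"
    start = load_checkpoint(ckpt)    # file doesn't exist yet -> 0
    save_checkpoint(ckpt, 150)
    resumed = load_checkpoint(ckpt)  # -> 150
```

Treating a missing or corrupt file as index 0 is what makes "delete data/checkpoint.txt to restart" work.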

Error handling

The command handles several error scenarios:

File not found

Error type: file_not_found
Cause: The transformed tweets CSV doesn’t exist
Solution: Run extract-tweets first to generate the required CSV file
Analyzing tweets...
Error: [Errno 2] No such file or directory: 'data/tweets/transformed/tweets.csv'

Invalid format

Error type: invalid_format
Cause: The CSV file is corrupted or missing required columns
Solution: Delete the CSV and re-run extract-tweets

Analysis failed

Error type: analysis_failed
Cause: Failed to analyze a specific tweet (API error, network issue, etc.)
Solution:
  • Check your internet connection
  • Verify your Gemini API key is valid
  • Check if you’ve hit API rate limits
  • Review logs for specific error details
Analyzing tweets...
Error: Failed to analyze tweet 1234567890: API request failed
When analysis fails on a specific tweet (src/application.py:105-114):
  • The error is logged with full details
  • The command exits with error code 1
  • A partial result is returned showing how many tweets were successfully analyzed
  • The checkpoint is saved up to the last successful batch

Permission denied

Error type: permission_denied
Cause: Cannot read input CSV or write to output files
Solution: Check file and directory permissions

Empty tweet list

If no tweets are found in the CSV:
Analyzing tweets...
Successfully analyzed 0 tweets
This is not an error: the command exits successfully with a count of 0 (src/application.py:70-72).

Configuration

The command uses these settings:

From environment (.env)

GEMINI_API_KEY
string
required
Your Google AI Studio API key for accessing Gemini
X_USERNAME
string
required
Your X (Twitter) username, used to generate tweet URLs
GEMINI_MODEL
string
default:"gemini-2.5-flash"
The Gemini model to use for analysis
RATE_LIMIT_SECONDS
float
default:"1.0"
Delay in seconds between API calls to respect rate limits
LOG_LEVEL
string
default:"INFO"
Logging verbosity: DEBUG, INFO, WARNING, or ERROR

From config (src/config.py)

batch_size
int
default:"10"
Number of tweets to process before saving a checkpoint
TRANSFORMED_TWEETS_PATH
string
default:"data/tweets/transformed/tweets.csv"
Input CSV file location
PROCESSED_RESULTS_PATH
string
default:"data/tweets/processed/results.csv"
Output results file location
CHECKPOINT_PATH
string
default:"data/checkpoint.txt"
Checkpoint file location

Example workflow

Here’s a complete analysis workflow:
# 1. Ensure extraction is complete
python src/main.py extract-tweets
# Successfully extracted 1247 tweets

# 2. Verify configuration
grep GEMINI_API_KEY .env
# GEMINI_API_KEY=your_api_key_here

# 3. Run analysis
python src/main.py analyze-tweets
# Analyzing tweets...
# Successfully analyzed 1247 tweets

# 4. Review results
head -5 data/tweets/processed/results.csv
# tweet_url,deleted
# https://x.com/username/status/123456,false
# https://x.com/username/status/789012,false

# 5. Manually review and delete flagged tweets
# Update the 'deleted' column as you process each URL

Logging

Detailed logs provide visibility into the analysis process:
INFO: Loading tweets from data/tweets/transformed/tweets.csv
INFO: Loaded 1247 tweets for analysis
INFO: Gemini analyzer initialized
INFO: Resuming from tweet index 0
INFO: Processing batch 1/125 (tweets 1-10 of 1247)
DEBUG: Tweet 123456: KEEP
DEBUG: Tweet 123457: DELETE
INFO: Checkpoint saved at index 10
...
INFO: Analysis complete. Results written to data/tweets/processed/results.csv
Enable DEBUG logging for per-tweet decisions:
LOG_LEVEL=DEBUG python src/main.py analyze-tweets

Performance

Analysis time depends on several factors:
| Tweet Count | Default Rate (1s) | Fast Rate (0.5s) | API Calls |
| --- | --- | --- | --- |
| 100 tweets | ~2 minutes | ~1 minute | ~100 |
| 1,000 tweets | ~17 minutes | ~8 minutes | ~1,000 |
| 10,000 tweets | ~3 hours | ~1.5 hours | ~10,000 |
| 50,000 tweets | ~14 hours | ~7 hours | ~50,000 |
Actual time may be longer due to:
  • Network latency
  • API response times
  • Automatic retry delays on errors
  • Skipped retweets (reducing total API calls)

Optimizing performance

Adjust rate limiting for faster processing (be mindful of API quotas):
# In .env file
RATE_LIMIT_SECONDS=0.5  # 2 requests per second instead of 1
Increase batch size for less frequent checkpointing:
# In src/config.py
batch_size: int = 50  # Save checkpoint every 50 tweets
Use parallel processing for very large archives:
This requires code modification and careful rate limit management. Not recommended unless you have >100k tweets.

API costs

Gemini 2.5 Flash pricing (as of 2026):
  • Free tier: 15 requests per minute, 1,500 requests per day
  • Paid tier: Check current pricing at Google AI Pricing
For typical usage:
| Archive Size | Daily Limit Coverage | Estimated Cost (Paid) |
| --- | --- | --- |
| 1,000 tweets | Fits in 1 day | ~$0.01 |
| 10,000 tweets | Requires 7 days | ~$0.10 |
| 50,000 tweets | Requires 34 days | ~$0.50 |
To stay within free tier limits, process large archives over multiple days. The checkpoint system makes this seamless.
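The "Daily Limit Coverage" figures are just a ceiling division against the free tier's 1,500 requests per day:

```python
import math


def free_tier_days(tweet_count: int, daily_limit: int = 1_500) -> int:
    """Days needed to process an archive within the free tier's daily cap."""
    return math.ceil(tweet_count / daily_limit)
```

For example, 50,000 tweets / 1,500 per day rounds up to 34 days, matching the table.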

Troubleshooting

Analysis stops midway

Cause: Rate limit exceeded, network error, or API quota reached
Solution: Re-run the command; it will resume from the checkpoint

All tweets marked as KEEP

Cause: Your criteria may be too lenient
Solution: Refine your config.json with more specific criteria

All tweets marked as DELETE

Cause: Your criteria may be too strict
Solution: Relax your criteria or add more nuanced guidance in additional_instructions

“API key is invalid”

Cause: Invalid or missing GEMINI_API_KEY
Solution: Verify your API key at Google AI Studio

“Missing required column: id”

Cause: Corrupted CSV file
Solution: Delete data/tweets/transformed/tweets.csv and re-run extract-tweets

Next steps

After successfully analyzing your tweets:
  1. Review the results.csv file
  2. Visit each flagged tweet URL to manually review
  3. Delete tweets you agree should be removed
  4. Track your progress by updating the deleted column in the CSV
No automated deletion: You must manually delete tweets. The tool only provides recommendations.
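Updating the deleted column can be done in a spreadsheet, or with a small script along these lines. The `mark_deleted` helper and the sample rows are illustrative, not part of the tool:

```python
import csv
import io


def mark_deleted(csv_text: str, deleted_url: str) -> str:
    """Return the results CSV with deleted=true for one tweet_url."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        if row["tweet_url"] == deleted_url:
            row["deleted"] = "true"
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["tweet_url", "deleted"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = (
    "tweet_url,deleted\n"
    "https://x.com/u/status/1,false\n"
    "https://x.com/u/status/2,false\n"
)
updated = mark_deleted(sample, "https://x.com/u/status/1")
```

For the real file, read data/tweets/processed/results.csv, pass its contents through, and write the result back.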

Configuration guide

Learn how to customize your analysis criteria for better results
