The analyze-tweets command processes your extracted tweets using Google’s Gemini AI to identify tweets that don’t align with your configured criteria.

Usage

python src/main.py analyze-tweets
This command takes no additional arguments or options.

Prerequisites

Before running this command, ensure you have:
  1. ✅ Successfully run extract-tweets to generate the CSV file
  2. ✅ Set your GEMINI_API_KEY in the .env file
  3. ✅ Configured your criteria in config.json (optional, uses defaults if not provided)
  4. ✅ Set your X_USERNAME in the .env file

What it does

The analyze-tweets command performs the following operations:
  1. Loads tweets from the extracted CSV file
  2. Initializes the Gemini AI analyzer
  3. Processes tweets in batches (default: 10 tweets per batch)
  4. Skips retweets (tweets starting with “RT @”)
  5. Analyzes each tweet against your configured criteria
  6. Writes flagged tweets (marked for deletion) to the results CSV
  7. Saves progress checkpoints for resumability
The analysis respects rate limits and can be safely interrupted. Use Ctrl+C to stop, then re-run to resume from the last checkpoint.
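The steps above can be sketched as a single batch loop. This is an illustrative sketch, not the project's actual code: the `Tweet` dataclass, `analyze_batches`, and the `analyze` callback are assumed names; the real logic lives in src/application.py.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    id: str
    content: str


def analyze_batches(tweets, analyze, start_index=0, batch_size=10):
    """Analyze tweets in batches, skipping retweets.

    Returns the IDs flagged for deletion and the checkpoint index
    (saved after each completed batch, enabling resume).
    """
    flagged = []
    checkpoint = start_index
    for i in range(start_index, len(tweets), batch_size):
        batch = tweets[i:i + batch_size]
        for tweet in batch:
            if tweet.content.startswith("RT @"):
                continue  # retweets are never analyzed
            if analyze(tweet) == "DELETE":
                flagged.append(tweet.id)
        checkpoint = i + len(batch)  # checkpoint advances per batch
    return flagged, checkpoint


# Toy run with a stand-in analyzer instead of the Gemini API:
tweets = [Tweet("1", "hello"), Tweet("2", "RT @x: hi"), Tweet("3", "bad word")]
flagged, ckpt = analyze_batches(
    tweets, lambda t: "DELETE" if "bad" in t.content else "KEEP"
)
```

Because the checkpoint only advances after a full batch, an interrupted run re-analyzes at most one partial batch on resume.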

Input requirements

data/tweets/transformed/tweets.csv
file
required
The CSV file generated by the extract-tweets command containing all your tweets.
.env
file
required
Environment configuration file containing:
GEMINI_API_KEY=your_api_key_here
X_USERNAME=your_twitter_username
GEMINI_MODEL=gemini-2.5-flash  # Optional
RATE_LIMIT_SECONDS=1.0  # Optional
config.json
file
Optional criteria configuration. If not provided, sensible defaults are used.
{
  "criteria": {
    "forbidden_words": ["word1", "word2"],
    "topics_to_exclude": ["Topic 1", "Topic 2"],
    "tone_requirements": ["Professional", "Respectful"],
    "additional_instructions": "Custom guidance for AI"
  }
}

Output

data/tweets/processed/results.csv
file
A CSV file containing tweets flagged for deletion with columns:
  • tweet_url: Full URL to the tweet (e.g., https://x.com/username/status/123456)
  • deleted: Boolean flag (initially false, update to true after manual deletion)
data/checkpoint.txt
file
A checkpoint file tracking analysis progress (tweet index). Used for resuming interrupted sessions.
Both output files are created automatically with secure permissions (0o600, owner read/write only).
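A minimal sketch of consuming the results file with Python's csv module, assuming the two columns described above. The sample rows here are made up, not real output:

```python
import csv
import io

# Sample rows shaped like results.csv (illustrative only).
sample = (
    "tweet_url,deleted\n"
    "https://x.com/username/status/111,false\n"
    "https://x.com/username/status/222,true\n"
)

# Collect URLs still awaiting manual deletion (deleted == "false").
pending = [
    row["tweet_url"]
    for row in csv.DictReader(io.StringIO(sample))
    if row["deleted"] == "false"
]
```

In a real session you would open data/tweets/processed/results.csv instead of the in-memory sample.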

Success output

When the analysis succeeds, you’ll see:
Analyzing tweets...
Successfully analyzed 1247 tweets
The number displayed is the count of tweets actually analyzed (excluding retweets).
The analyzed count may be lower than your total tweet count because retweets are automatically skipped.

Behavior details

Batch processing

Tweets are processed in configurable batches (default: 10 tweets per batch):
  • Progress is logged after each batch (src/application.py:89-92)
  • Checkpoints are saved after completing each batch (src/application.py:116-117)
  • Rate limiting is applied between individual tweet analyses

Retweet handling

Retweets are automatically skipped (src/application.py:95-96):
if tweet.content.startswith("RT @"):
    continue  # Skip retweets
Retweets are not analyzed because they are not your original content and have different deletion implications.

Analysis decisions

The Gemini analyzer evaluates each tweet and returns one of two decisions:
KEEP
Decision
The tweet aligns with your criteria and should be kept
DELETE
Decision
The tweet violates your criteria and should be considered for deletion
Only tweets marked as DELETE are written to the results CSV (src/application.py:103-104).
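A binary decision like this is typically normalized from the model's free-text reply. The helper below is a hypothetical sketch of that pattern, not necessarily how this project parses Gemini responses:

```python
def parse_decision(response_text: str) -> str:
    """Map a free-text model reply onto KEEP or DELETE.

    Defaults to the safe choice (KEEP) when the reply is
    ambiguous or empty.
    """
    text = response_text.strip().upper()
    return "DELETE" if "DELETE" in text else "KEEP"
```

Defaulting to KEEP on ambiguity errs on the side of not recommending deletion.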

Resumability

The command automatically resumes from the last checkpoint:
  1. On start, reads the checkpoint file (src/application.py:78)
  2. Skips already-processed tweets
  3. Continues from the last saved position
  4. Saves progress after each batch
# If interrupted at tweet 150:
python src/main.py analyze-tweets
# Automatically resumes from tweet 150
To restart analysis from the beginning, delete data/checkpoint.txt before running the command.
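Since the checkpoint file holds only a single tweet index, the read/write logic can be as small as this sketch (the function names are assumptions, not the project's API):

```python
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> int:
    """Return the saved tweet index, or 0 when no valid checkpoint exists."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0


def save_checkpoint(path: Path, index: int) -> None:
    """Persist the index of the next tweet to process."""
    path.write_text(str(index))


# Round-trip in a temporary directory:
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.txt"
    start = load_checkpoint(ckpt)    # file doesn't exist yet -> 0
    save_checkpoint(ckpt, 150)
    resumed = load_checkpoint(ckpt)  # -> 150
```

Treating a missing or corrupt file as index 0 is what makes "delete data/checkpoint.txt to restart" work.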

Error handling

The command handles several error scenarios:

File not found

Error type: file_not_found
Cause: The transformed tweets CSV doesn’t exist
Solution: Run extract-tweets first to generate the required CSV file
Analyzing tweets...
Error: [Errno 2] No such file or directory: 'data/tweets/transformed/tweets.csv'

Invalid format

Error type: invalid_format
Cause: The CSV file is corrupted or missing required columns
Solution: Delete the CSV and re-run extract-tweets

Analysis failed

Error type: analysis_failed
Cause: Failed to analyze a specific tweet (API error, network issue, etc.)
Solution:
  • Check your internet connection
  • Verify your Gemini API key is valid
  • Check if you’ve hit API rate limits
  • Review logs for specific error details
Analyzing tweets...
Error: Failed to analyze tweet 1234567890: API request failed
When analysis fails on a specific tweet (src/application.py:105-114):
  • The error is logged with full details
  • The command exits with error code 1
  • A partial result is returned showing how many tweets were successfully analyzed
  • The checkpoint is saved up to the last successful batch

Permission denied

Error type: permission_denied
Cause: Cannot read input CSV or write to output files
Solution: Check file and directory permissions

Empty tweet list

If no tweets are found in the CSV:
Analyzing tweets...
Successfully analyzed 0 tweets
This is not an error: the command exits successfully with a count of 0 (src/application.py:70-72).

Configuration

The command uses these settings:

From environment (.env)

GEMINI_API_KEY
string
required
Your Google AI Studio API key for accessing Gemini
X_USERNAME
string
required
Your X (Twitter) username, used to generate tweet URLs
GEMINI_MODEL
string
default:"gemini-2.5-flash"
The Gemini model to use for analysis
RATE_LIMIT_SECONDS
float
default:"1.0"
Delay in seconds between API calls to respect rate limits
LOG_LEVEL
string
default:"INFO"
Logging verbosity: DEBUG, INFO, WARNING, or ERROR

From config (src/config.py)

batch_size
int
default:"10"
Number of tweets to process before saving a checkpoint
TRANSFORMED_TWEETS_PATH
string
default:"data/tweets/transformed/tweets.csv"
Input CSV file location
PROCESSED_RESULTS_PATH
string
default:"data/tweets/processed/results.csv"
Output results file location
CHECKPOINT_PATH
string
default:"data/checkpoint.txt"
Checkpoint file location

Example workflow

Here’s a complete analysis workflow:
# 1. Ensure extraction is complete
python src/main.py extract-tweets
# Successfully extracted 1247 tweets

# 2. Verify configuration
grep GEMINI_API_KEY .env
# GEMINI_API_KEY=your_api_key_here

# 3. Run analysis
python src/main.py analyze-tweets
# Analyzing tweets...
# Successfully analyzed 1247 tweets

# 4. Review results
head -5 data/tweets/processed/results.csv
# tweet_url,deleted
# https://x.com/username/status/123456,false
# https://x.com/username/status/789012,false

# 5. Manually review and delete flagged tweets
# Update the 'deleted' column as you process each URL

Logging

Detailed logs provide visibility into the analysis process:
INFO: Loading tweets from data/tweets/transformed/tweets.csv
INFO: Loaded 1247 tweets for analysis
INFO: Gemini analyzer initialized
INFO: Resuming from tweet index 0
INFO: Processing batch 1/125 (tweets 1-10 of 1247)
DEBUG: Tweet 123456: KEEP
DEBUG: Tweet 123457: DELETE
INFO: Checkpoint saved at index 10
...
INFO: Analysis complete. Results written to data/tweets/processed/results.csv
Enable DEBUG logging for per-tweet decisions:
LOG_LEVEL=DEBUG python src/main.py analyze-tweets

Performance

Analysis time depends on several factors:
| Tweet Count | Default Rate (1s) | Fast Rate (0.5s) | API Calls |
| --- | --- | --- | --- |
| 100 tweets | ~2 minutes | ~1 minute | ~100 |
| 1,000 tweets | ~17 minutes | ~8 minutes | ~1,000 |
| 10,000 tweets | ~3 hours | ~1.5 hours | ~10,000 |
| 50,000 tweets | ~14 hours | ~7 hours | ~50,000 |
Actual time may be longer due to:
  • Network latency
  • API response times
  • Automatic retry delays on errors
  • Skipped retweets (reducing total API calls)

Optimizing performance

Adjust rate limiting for faster processing (be mindful of API quotas):
# In .env file
RATE_LIMIT_SECONDS=0.5  # 2 requests per second instead of 1
Increase batch size for less frequent checkpointing:
# In src/config.py
batch_size: int = 50  # Save checkpoint every 50 tweets
Use parallel processing for very large archives:
This requires code modification and careful rate limit management. Not recommended unless you have >100k tweets.

API costs

Gemini 2.5 Flash pricing (as of 2026):
  • Free tier: 15 requests per minute, 1,500 requests per day
  • Paid tier: Check current pricing at Google AI Pricing
For typical usage:
| Archive Size | Daily Limit Coverage | Estimated Cost (Paid) |
| --- | --- | --- |
| 1,000 tweets | Fits in 1 day | ~$0.01 |
| 10,000 tweets | Requires 7 days | ~$0.10 |
| 50,000 tweets | Requires 34 days | ~$0.50 |
To stay within free tier limits, process large archives over multiple days. The checkpoint system makes this seamless.
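The "Daily Limit Coverage" figures are just a ceiling division against the free tier's 1,500 requests per day:

```python
import math


def free_tier_days(tweet_count: int, daily_limit: int = 1_500) -> int:
    """Days needed to process an archive within the free tier's daily cap."""
    return math.ceil(tweet_count / daily_limit)
```

For example, 50,000 tweets / 1,500 per day rounds up to 34 days, matching the table.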

Troubleshooting

Analysis stops midway

Cause: Rate limit exceeded, network error, or API quota reached
Solution: Re-run the command; it will resume from the checkpoint

All tweets marked as KEEP

Cause: Your criteria may be too lenient
Solution: Refine your config.json with more specific criteria

All tweets marked as DELETE

Cause: Your criteria may be too strict
Solution: Relax your criteria or add more nuanced guidance in additional_instructions

“API key is invalid”

Cause: Invalid or missing GEMINI_API_KEY
Solution: Verify your API key at Google AI Studio

“Missing required column: id”

Cause: Corrupted CSV file
Solution: Delete data/tweets/transformed/tweets.csv and re-run extract-tweets

Next steps

After successfully analyzing your tweets:
  1. Review the results.csv file
  2. Visit each flagged tweet URL to manually review
  3. Delete tweets you agree should be removed
  4. Track your progress by updating the deleted column in the CSV
No automated deletion: You must manually delete tweets. The tool only provides recommendations.
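Updating the deleted column can be done in a spreadsheet, or with a small script along these lines. The `mark_deleted` helper and the sample rows are illustrative, not part of the tool:

```python
import csv
import io


def mark_deleted(csv_text: str, deleted_url: str) -> str:
    """Return the results CSV with deleted=true for one tweet_url."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        if row["tweet_url"] == deleted_url:
            row["deleted"] = "true"
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["tweet_url", "deleted"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = (
    "tweet_url,deleted\n"
    "https://x.com/u/status/1,false\n"
    "https://x.com/u/status/2,false\n"
)
updated = mark_deleted(sample, "https://x.com/u/status/1")
```

For the real file, read data/tweets/processed/results.csv, pass its contents through, and write the result back.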

Configuration guide

Learn how to customize your analysis criteria for better results
