Before starting, make sure you’ve completed the installation steps, including setting up your .env file and placing your Twitter archive in data/tweets/tweets.json.

Your first analysis

The tweet audit process comes down to two commands, followed by a manual review:
Step 1: Extract tweets from archive

Convert your X archive JSON to a structured CSV format:
python src/main.py extract-tweets
Expected output:
Extracting tweets from archive...
Successfully extracted 1247 tweets
This command:
  • Reads data/tweets/tweets.json (your X archive)
  • Parses and validates tweet data
  • Writes structured CSV to data/tweets/transformed/tweets.csv
  • Skips retweets automatically
The extraction process is defined in src/application.py:47-62. It uses JSONParser to read the archive and CSVWriter to create the transformed output.
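The extract step can be pictured with the standard library alone. The sketch below is an illustrative reconstruction under assumed behaviour, not the actual JSONParser/CSVWriter code from src/application.py; in particular, the `RT @` prefix check is one common heuristic for spotting retweets in X archive exports.

```python
import csv
import json

def extract_tweets(archive_path: str, csv_path: str) -> int:
    """Sketch of the extract step: archive JSON in, structured CSV out."""
    with open(archive_path, encoding="utf-8") as f:
        entries = json.load(f)

    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "content", "created_at"])
        for entry in entries:
            tweet = entry["tweet"]
            # Skip retweets, matching the documented behaviour
            # (assumption: retweets start with "RT @" in the archive)
            if tweet["full_text"].startswith("RT @"):
                continue
            writer.writerow([tweet["id"], tweet["full_text"], tweet["created_at"]])
            count += 1
    return count
```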
Step 2: Analyze tweets with AI

Send tweets to Gemini AI for evaluation against your criteria:
python src/main.py analyze-tweets
Expected output:
Analyzing tweets...
Loading tweets from data/tweets/transformed/tweets.csv
Loaded 1247 tweets for analysis
Resuming from tweet index 0
Processing batch 1/125 (tweets 1-10 of 1247)
Processing batch 2/125 (tweets 11-20 of 1247)
...
Analysis complete. Results written to data/tweets/processed/results.csv
Successfully analyzed 1247 tweets
This command:
  • Loads tweets from the transformed CSV
  • Processes tweets in batches (default: 10 tweets per batch)
  • Sends each tweet to Gemini AI for analysis
  • Writes flagged tweets to data/tweets/processed/results.csv
  • Saves checkpoints after each batch for resumability
Analysis respects rate limits with a default 1-second delay between API calls. For 1,000 tweets, expect ~17 minutes of processing time.
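The batch loop described above can be sketched as follows. The function and parameter names here are illustrative stand-ins for the real implementation in src/application.py; `analyze_fn` represents the Gemini call and `save_checkpoint` the Checkpoint class.

```python
import time

def analyze_in_batches(tweets, analyze_fn, batch_size=10,
                       rate_limit_seconds=1.0, start_index=0,
                       save_checkpoint=lambda i: None):
    """Sketch of batched analysis with rate limiting and checkpoints."""
    flagged = []
    for i in range(start_index, len(tweets), batch_size):
        batch = tweets[i:i + batch_size]
        for tweet in batch:
            result = analyze_fn(tweet)
            if result["decision"] == "DELETE":
                flagged.append(tweet)
            time.sleep(rate_limit_seconds)  # respect the API rate limit
        save_checkpoint(i + len(batch))  # resume point after each batch
    return flagged
```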
Step 3: Review results

Open the results CSV to see flagged tweets:
cat data/tweets/processed/results.csv
Example output:
tweet_url,deleted
https://x.com/username/status/1234567890,false
https://x.com/username/status/1234567891,false
https://x.com/username/status/1234567892,false
Each row contains:
  • tweet_url - Direct link to the tweet on X
  • deleted - Status flag (false by default, update to true after deleting)
Step 4: Delete flagged tweets

Review and manually delete tweets:
  1. Visit each URL in your browser
  2. Read the tweet and decide if you agree with the AI’s assessment
  3. Delete manually if you agree (click ⋯ → Delete)
  4. Update the CSV by changing false to true for deleted tweets
The tool never deletes tweets automatically. You maintain complete control over what gets deleted.
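Updating the CSV works in any spreadsheet or text editor, or with a short helper script like this one (illustrative only; the tool itself does not ship this command):

```python
import csv

def mark_deleted(results_path: str, tweet_url: str) -> None:
    """Flip the deleted flag to true for one tweet_url in results.csv."""
    with open(results_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        if row["tweet_url"] == tweet_url:
            row["deleted"] = "true"
    with open(results_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["tweet_url", "deleted"])
        writer.writeheader()
        writer.writerows(rows)
```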

Understanding the output

After running both commands, your data/ directory will look like:
data/
├── tweets/
│   ├── tweets.json              # Original X archive
│   ├── transformed/
│   │   └── tweets.csv           # Extracted and structured tweets
│   └── processed/
│       └── results.csv          # Tweets flagged for deletion
└── checkpoint.txt               # Resume point if interrupted

File details

data/tweets/tweets.json — your original X archive export. This file is never modified.
[
  {
    "tweet": {
      "id": "1234567890",
      "full_text": "This is my tweet content",
      "created_at": "Wed Mar 15 10:30:00 +0000 2023"
    }
  }
]
data/tweets/transformed/tweets.csv — structured CSV with extracted tweet data, used as input for analysis.
id,content,created_at
1234567890,"This is my tweet content","2023-03-15T10:30:00+00:00"
1234567891,"Another tweet","2023-03-14T15:20:00+00:00"
data/tweets/processed/results.csv — final output with URLs of tweets flagged for deletion.
tweet_url,deleted
https://x.com/username/status/1234567890,false
Update the deleted column to true after manually deleting tweets to track your progress.
data/checkpoint.txt — stores the last processed tweet index for resumability.
120
If analysis is interrupted, the tool automatically resumes from this point.
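A checkpoint file this simple needs only a few lines to read and write. This sketch assumes the behaviour described above, not the actual Checkpoint class in the source:

```python
from pathlib import Path

def load_checkpoint(path: str = "data/checkpoint.txt") -> int:
    # A missing file means a fresh run, starting at index 0
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else 0

def save_checkpoint(index: int, path: str = "data/checkpoint.txt") -> None:
    # Overwrite the file with the latest processed index
    Path(path).write_text(str(index))
```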

Advanced usage

Resume interrupted analysis

The tool automatically saves progress after each batch. If interrupted (Ctrl+C, crash, API quota exceeded):
# Simply run analyze again - it automatically resumes
python src/main.py analyze-tweets
Output:
Analyzing tweets...
Loaded 1247 tweets for analysis
Resuming from tweet index 120  # <-- Picks up where it left off
Processing batch 13/125 (tweets 121-130 of 1247)
...
The checkpoint system is implemented in src/application.py:77-79 using the Checkpoint class that reads/writes data/checkpoint.txt.

Test with a small sample

Before processing thousands of tweets, test your criteria on a small sample:
Step 1: Create a test archive

Create a test file containing just the first 10 tweets from your archive:
# Create a test file with just a few tweets
python -c "import json; tweets = json.load(open('data/tweets/tweets.json')); json.dump(tweets[:10], open('data/tweets/tweets-test.json', 'w'))"
Step 2: Temporarily update config

Edit src/config.py to use the test file:
tweets_archive_path: str = "data/tweets/tweets-test.json"
Step 3: Run analysis on test data

python src/main.py extract-tweets
python src/main.py analyze-tweets
Step 4: Review and refine

Check data/tweets/processed/results.csv and adjust your config.json criteria based on results.
Step 5: Reset and run full analysis

# Delete test outputs and checkpoint
rm -rf data/tweets/transformed/ data/tweets/processed/ data/checkpoint.txt

# Restore original config
# Edit src/config.py back to: tweets_archive_path: str = "data/tweets/tweets.json"

# Run full analysis
python src/main.py extract-tweets
python src/main.py analyze-tweets

Adjust processing speed

Control API call frequency to balance speed and rate limits:
# Edit .env
RATE_LIMIT_SECONDS=0.5
Faster rates may hit API limits. If you see 429 errors, increase RATE_LIMIT_SECONDS.
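Reading this setting might look like the sketch below; this is an assumption about how the value is consumed, since the actual config loading lives in src/config.py:

```python
import os

def rate_limit_seconds(default: float = 1.0) -> float:
    # RATE_LIMIT_SECONDS is read from the environment (populated from .env);
    # fall back to the documented 1-second default when unset.
    raw = os.environ.get("RATE_LIMIT_SECONDS", "").strip()
    return float(raw) if raw else default
```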

Change batch size

Modify batch size in src/config.py (default: 10 tweets per batch):
src/config.py
batch_size: int = 10  # Process 10 tweets per batch
Trade-offs:
  • Larger batches (e.g., 50) = fewer checkpoint writes, but more tweets to re-process after an interruption
  • Smaller batches (e.g., 5) = more frequent checkpoints and better resumability

Enable debug logging

Get detailed logs for troubleshooting:
.env
LOG_LEVEL=DEBUG
Output with DEBUG enabled:
2024-03-15 10:30:00 - __main__ - INFO - Analyzing tweets...
2024-03-15 10:30:00 - application - INFO - Loading tweets from data/tweets/transformed/tweets.csv
2024-03-15 10:30:01 - analyzer - DEBUG - Tweet 1234567890: DELETE
2024-03-15 10:30:02 - analyzer - DEBUG - Tweet 1234567891: KEEP
2024-03-15 10:30:03 - analyzer - DEBUG - Tweet 1234567892: DELETE
...

Understanding AI decisions

Gemini AI evaluates each tweet and returns a decision:
{
  "decision": "DELETE",
  "reason": "Contains forbidden word 'crypto' and promotes cryptocurrency hype"
}
The prompt sent to Gemini (defined in src/analyzer.py:107-136):
You are evaluating tweets for a professional's Twitter cleanup.

Tweet ID: 1234567890
Tweet: "Check out this crypto project!"

Mark for deletion if it violates any of these criteria:
1. Profanity or unprofessional language
2. Personal attacks or insults
3. Outdated political opinions
4. Contains any of these words: crypto, NFT, web3

Additional guidance: Flag any content that could harm professional reputation

Respond in JSON format:
{
  "decision": "DELETE" or "KEEP",
  "reason": "brief explanation"
}
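A prompt like the one above could be assembled from config values roughly as follows. The criteria keys (`forbidden_words`, `additional_instructions`) mirror the option names used elsewhere in this guide, but the exact template in src/analyzer.py may differ:

```python
def build_prompt(tweet_id: str, tweet_text: str, criteria: dict) -> str:
    """Illustrative assembly of the analysis prompt from config criteria."""
    words = ", ".join(criteria.get("forbidden_words", []))
    return (
        "You are evaluating tweets for a professional's Twitter cleanup.\n\n"
        f"Tweet ID: {tweet_id}\n"
        f'Tweet: "{tweet_text}"\n\n'
        "Mark for deletion if it violates any of these criteria:\n"
        "1. Profanity or unprofessional language\n"
        "2. Personal attacks or insults\n"
        "3. Outdated political opinions\n"
        f"4. Contains any of these words: {words}\n\n"
        f"Additional guidance: {criteria.get('additional_instructions', '')}\n\n"
        "Respond in JSON format:\n"
        '{\n  "decision": "DELETE" or "KEEP",\n  "reason": "brief explanation"\n}'
    )
```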

Common workflows

Process all tweets and clean up your entire history:
# Extract all tweets
python src/main.py extract-tweets

# Analyze all tweets
python src/main.py analyze-tweets

# Review results
cat data/tweets/processed/results.csv

# Delete flagged tweets manually via X.com
Test criteria on samples before full analysis:
# 1. Test on small sample (manually create test file)
python src/main.py extract-tweets
python src/main.py analyze-tweets

# 2. Review results and adjust config.json
cat data/tweets/processed/results.csv

# 3. Clear outputs and retest
rm -rf data/tweets/transformed/ data/tweets/processed/ data/checkpoint.txt

# 4. Once satisfied, run on full archive
python src/main.py extract-tweets
python src/main.py analyze-tweets
Run audits regularly to catch new problematic tweets:
# 1. Download fresh archive from X
# 2. Replace data/tweets/tweets.json
# 3. Clear previous results
rm -rf data/tweets/transformed/ data/tweets/processed/ data/checkpoint.txt

# 4. Run fresh analysis
python src/main.py extract-tweets
python src/main.py analyze-tweets

Troubleshooting

CSV parsing errors

The transformed CSV is corrupted. Delete and re-extract:
rm -rf data/tweets/transformed/
python src/main.py extract-tweets
Rate limit (429) errors

You’re hitting Gemini’s rate limits. Solutions:
  1. Increase delay in .env:
    RATE_LIMIT_SECONDS=2.0
    
  2. Wait and resume (the tool auto-resumes):
    # Wait 1 hour, then:
    python src/main.py analyze-tweets
    
  3. Check daily quota at Google AI Studio
Empty API responses

Gemini returned an empty response. The tool retries automatically (up to 3 times with exponential backoff). If the problem persists:
# Check your API key is valid
echo $GEMINI_API_KEY

# Try a different model
# Edit .env:
GEMINI_MODEL=gemini-1.5-flash
Unexpected flagging results

Your criteria might be too strict (or too lenient). Adjust config.json.

If flagging too much:
  • Remove some forbidden_words
  • Make topics_to_exclude more specific
  • Soften tone_requirements
If flagging too little:
  • Add more forbidden_words
  • Expand topics_to_exclude
  • Strengthen additional_instructions
Then clear checkpoint and re-analyze:
rm data/checkpoint.txt data/tweets/processed/results.csv
python src/main.py analyze-tweets

Cost and time estimates

Gemini API limits (free tier)

  • Requests per minute: 15
  • Requests per day: 1,500
  • Cost: Free for moderate use

Processing time

| Archive size | Estimated time | Limiting factor |
| --- | --- | --- |
| 100 tweets | ~2 minutes | 1 req/sec |
| 1,000 tweets | ~17 minutes | 1 req/sec |
| 10,000 tweets | ~7 days | 1,500 req/day quota |
For large archives (10,000+ tweets), consider spreading analysis over multiple days or upgrading to a paid Gemini API plan.
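These estimates follow directly from the per-request delay and the daily quota; as a quick sanity check (the function name is illustrative, not part of the tool):

```python
import math

def estimate_processing_time(n_tweets: int, rate_limit_seconds: float = 1.0,
                             daily_quota: int = 1500) -> str:
    """Rough processing-time estimate from the rate limit and daily quota."""
    # Once the archive exceeds the free tier's daily quota, the quota
    # (not the per-request delay) becomes the bottleneck.
    if n_tweets > daily_quota:
        return f"~{math.ceil(n_tweets / daily_quota)} days"
    return f"~{math.ceil(n_tweets * rate_limit_seconds / 60)} minutes"
```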

Next steps

Configuration guide

Learn advanced criteria customization and settings

CLI reference

Complete command-line interface documentation

Troubleshooting

Common issues and solutions

GitHub repository

View source code and contribute
