Before starting, make sure you’ve completed the installation steps, including setting up your .env file and placing your Twitter archive in data/tweets/tweets.json.

Your first analysis

The tweet audit process comes down to two commands, followed by a manual review:
Step 1: Extract tweets from archive

Convert your X archive JSON to a structured CSV format:
python src/main.py extract-tweets
Expected output:
Extracting tweets from archive...
Successfully extracted 1247 tweets
This command:
  • Reads data/tweets/tweets.json (your X archive)
  • Parses and validates tweet data
  • Writes structured CSV to data/tweets/transformed/tweets.csv
  • Skips retweets automatically
The extraction process is defined in src/application.py:47-62. It uses JSONParser to read the archive and CSVWriter to create the transformed output.
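The extract step can be pictured with the standard library alone. The sketch below is an illustrative reconstruction under assumed behaviour, not the actual JSONParser/CSVWriter code from src/application.py; in particular, the `RT @` prefix check is one common heuristic for spotting retweets in X archive exports.

```python
import csv
import json

def extract_tweets(archive_path: str, csv_path: str) -> int:
    """Sketch of the extract step: archive JSON in, structured CSV out."""
    with open(archive_path, encoding="utf-8") as f:
        entries = json.load(f)

    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "content", "created_at"])
        for entry in entries:
            tweet = entry["tweet"]
            # Skip retweets, matching the documented behaviour
            # (assumption: retweets start with "RT @" in the archive)
            if tweet["full_text"].startswith("RT @"):
                continue
            writer.writerow([tweet["id"], tweet["full_text"], tweet["created_at"]])
            count += 1
    return count
```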
Step 2: Analyze tweets with AI

Send tweets to Gemini AI for evaluation against your criteria:
python src/main.py analyze-tweets
Expected output:
Analyzing tweets...
Loading tweets from data/tweets/transformed/tweets.csv
Loaded 1247 tweets for analysis
Resuming from tweet index 0
Processing batch 1/125 (tweets 1-10 of 1247)
Processing batch 2/125 (tweets 11-20 of 1247)
...
Analysis complete. Results written to data/tweets/processed/results.csv
Successfully analyzed 1247 tweets
This command:
  • Loads tweets from the transformed CSV
  • Processes tweets in batches (default: 10 tweets per batch)
  • Sends each tweet to Gemini AI for analysis
  • Writes flagged tweets to data/tweets/processed/results.csv
  • Saves checkpoints after each batch for resumability
Analysis respects rate limits with a default 1-second delay between API calls. For 1,000 tweets, expect ~17 minutes of processing time.
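The batch loop described above can be sketched as follows. The function and parameter names here are illustrative stand-ins for the real implementation in src/application.py; `analyze_fn` represents the Gemini call and `save_checkpoint` the Checkpoint class.

```python
import time

def analyze_in_batches(tweets, analyze_fn, batch_size=10,
                       rate_limit_seconds=1.0, start_index=0,
                       save_checkpoint=lambda i: None):
    """Sketch of batched analysis with rate limiting and checkpoints."""
    flagged = []
    for i in range(start_index, len(tweets), batch_size):
        batch = tweets[i:i + batch_size]
        for tweet in batch:
            result = analyze_fn(tweet)
            if result["decision"] == "DELETE":
                flagged.append(tweet)
            time.sleep(rate_limit_seconds)  # respect the API rate limit
        save_checkpoint(i + len(batch))  # resume point after each batch
    return flagged
```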
Step 3: Review results

Open the results CSV to see flagged tweets:
cat data/tweets/processed/results.csv
Example output:
tweet_url,deleted
https://x.com/username/status/1234567890,false
https://x.com/username/status/1234567891,false
https://x.com/username/status/1234567892,false
Each row contains:
  • tweet_url - Direct link to the tweet on X
  • deleted - Status flag (false by default, update to true after deleting)
Step 4: Delete flagged tweets

Review and manually delete tweets:
  1. Visit each URL in your browser
  2. Read the tweet and decide if you agree with the AI’s assessment
  3. Delete manually if you agree (click ⋯ → Delete)
  4. Update the CSV by changing false to true for deleted tweets
The tool never deletes tweets automatically. You maintain complete control over what gets deleted.
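Updating the CSV works in any spreadsheet or text editor, or with a short helper script like this one (illustrative only; the tool itself does not ship this command):

```python
import csv

def mark_deleted(results_path: str, tweet_url: str) -> None:
    """Flip the deleted flag to true for one tweet_url in results.csv."""
    with open(results_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        if row["tweet_url"] == tweet_url:
            row["deleted"] = "true"
    with open(results_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["tweet_url", "deleted"])
        writer.writeheader()
        writer.writerows(rows)
```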

Understanding the output

After running both commands, your data/ directory will look like:
data/
├── tweets/
│   ├── tweets.json              # Original X archive
│   ├── transformed/
│   │   └── tweets.csv           # Extracted and structured tweets
│   └── processed/
│       └── results.csv          # Tweets flagged for deletion
└── checkpoint.txt               # Resume point if interrupted

File details

data/tweets/tweets.json — your original X archive export. This file is never modified.
[
  {
    "tweet": {
      "id": "1234567890",
      "full_text": "This is my tweet content",
      "created_at": "Wed Mar 15 10:30:00 +0000 2023"
    }
  }
]
data/tweets/transformed/tweets.csv — structured CSV with extracted tweet data, used as input for analysis.
id,content,created_at
1234567890,"This is my tweet content","2023-03-15T10:30:00+00:00"
1234567891,"Another tweet","2023-03-14T15:20:00+00:00"
data/tweets/processed/results.csv — final output with URLs of tweets flagged for deletion.
tweet_url,deleted
https://x.com/username/status/1234567890,false
Update the deleted column to true after manually deleting tweets to track your progress.
data/checkpoint.txt — stores the last processed tweet index for resumability.
120
If analysis is interrupted, the tool automatically resumes from this point.
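A checkpoint file this simple needs only a few lines to read and write. This sketch assumes the behaviour described above, not the actual Checkpoint class in the source:

```python
from pathlib import Path

def load_checkpoint(path: str = "data/checkpoint.txt") -> int:
    # A missing file means a fresh run, starting at index 0
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else 0

def save_checkpoint(index: int, path: str = "data/checkpoint.txt") -> None:
    # Overwrite the file with the latest processed index
    Path(path).write_text(str(index))
```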

Advanced usage

Resume interrupted analysis

The tool automatically saves progress after each batch. If interrupted (Ctrl+C, crash, API quota exceeded):
# Simply run analyze again - it automatically resumes
python src/main.py analyze-tweets
Output:
Analyzing tweets...
Loaded 1247 tweets for analysis
Resuming from tweet index 120  # <-- Picks up where it left off
Processing batch 13/125 (tweets 121-130 of 1247)
...
The checkpoint system is implemented in src/application.py:77-79 using the Checkpoint class that reads/writes data/checkpoint.txt.

Test with a small sample

Before processing thousands of tweets, test your criteria on a small sample:
Step 1: Create a test archive

Create a test file containing just the first 10 tweets from your archive:
# Create a test file with just a few tweets
python -c "import json; tweets = json.load(open('data/tweets/tweets.json')); json.dump(tweets[:10], open('data/tweets/tweets-test.json', 'w'))"
Step 2: Temporarily update config

Edit src/config.py to use the test file:
tweets_archive_path: str = "data/tweets/tweets-test.json"
Step 3: Run analysis on test data

python src/main.py extract-tweets
python src/main.py analyze-tweets
Step 4: Review and refine

Check data/tweets/processed/results.csv and adjust your config.json criteria based on results.
Step 5: Reset and run full analysis

# Delete test outputs and checkpoint
rm -rf data/tweets/transformed/ data/tweets/processed/ data/checkpoint.txt

# Restore original config
# Edit src/config.py back to: tweets_archive_path: str = "data/tweets/tweets.json"

# Run full analysis
python src/main.py extract-tweets
python src/main.py analyze-tweets

Adjust processing speed

Control API call frequency to balance speed and rate limits:
# Edit .env
RATE_LIMIT_SECONDS=0.5
Faster rates may hit API limits. If you see 429 errors, increase RATE_LIMIT_SECONDS.
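Reading this setting might look like the sketch below; this is an assumption about how the value is consumed, since the actual config loading lives in src/config.py:

```python
import os

def rate_limit_seconds(default: float = 1.0) -> float:
    # RATE_LIMIT_SECONDS is read from the environment (populated from .env);
    # fall back to the documented 1-second default when unset.
    raw = os.environ.get("RATE_LIMIT_SECONDS", "").strip()
    return float(raw) if raw else default
```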

Change batch size

Modify batch size in src/config.py (default: 10 tweets per batch):
src/config.py
batch_size: int = 10  # Process 10 tweets per batch
Trade-offs:
  • Larger batches (e.g., 50) = fewer checkpoint writes, but more tweets to re-process after an interruption
  • Smaller batches (e.g., 5) = more frequent checkpoints and better resumability

Enable debug logging

Get detailed logs for troubleshooting:
.env
LOG_LEVEL=DEBUG
Output with DEBUG enabled:
2024-03-15 10:30:00 - __main__ - INFO - Analyzing tweets...
2024-03-15 10:30:00 - application - INFO - Loading tweets from data/tweets/transformed/tweets.csv
2024-03-15 10:30:01 - analyzer - DEBUG - Tweet 1234567890: DELETE
2024-03-15 10:30:02 - analyzer - DEBUG - Tweet 1234567891: KEEP
2024-03-15 10:30:03 - analyzer - DEBUG - Tweet 1234567892: DELETE
...

Understanding AI decisions

Gemini AI evaluates each tweet and returns a decision:
{
  "decision": "DELETE",
  "reason": "Contains forbidden word 'crypto' and promotes cryptocurrency hype"
}
The prompt sent to Gemini (defined in src/analyzer.py:107-136):
You are evaluating tweets for a professional's Twitter cleanup.

Tweet ID: 1234567890
Tweet: "Check out this crypto project!"

Mark for deletion if it violates any of these criteria:
1. Profanity or unprofessional language
2. Personal attacks or insults
3. Outdated political opinions
4. Contains any of these words: crypto, NFT, web3

Additional guidance: Flag any content that could harm professional reputation

Respond in JSON format:
{
  "decision": "DELETE" or "KEEP",
  "reason": "brief explanation"
}
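A prompt like the one above could be assembled from config values roughly as follows. The criteria keys (`forbidden_words`, `additional_instructions`) mirror the option names used elsewhere in this guide, but the exact template in src/analyzer.py may differ:

```python
def build_prompt(tweet_id: str, tweet_text: str, criteria: dict) -> str:
    """Illustrative assembly of the analysis prompt from config criteria."""
    words = ", ".join(criteria.get("forbidden_words", []))
    return (
        "You are evaluating tweets for a professional's Twitter cleanup.\n\n"
        f"Tweet ID: {tweet_id}\n"
        f'Tweet: "{tweet_text}"\n\n'
        "Mark for deletion if it violates any of these criteria:\n"
        "1. Profanity or unprofessional language\n"
        "2. Personal attacks or insults\n"
        "3. Outdated political opinions\n"
        f"4. Contains any of these words: {words}\n\n"
        f"Additional guidance: {criteria.get('additional_instructions', '')}\n\n"
        "Respond in JSON format:\n"
        '{\n  "decision": "DELETE" or "KEEP",\n  "reason": "brief explanation"\n}'
    )
```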

Common workflows

Process all tweets and clean up your entire history:
# Extract all tweets
python src/main.py extract-tweets

# Analyze all tweets
python src/main.py analyze-tweets

# Review results
cat data/tweets/processed/results.csv

# Delete flagged tweets manually via X.com
Test criteria on samples before full analysis:
# 1. Test on small sample (manually create test file)
python src/main.py extract-tweets
python src/main.py analyze-tweets

# 2. Review results and adjust config.json
cat data/tweets/processed/results.csv

# 3. Clear outputs and retest
rm -rf data/tweets/transformed/ data/tweets/processed/ data/checkpoint.txt

# 4. Once satisfied, run on full archive
python src/main.py extract-tweets
python src/main.py analyze-tweets
Run audits regularly to catch new problematic tweets:
# 1. Download fresh archive from X
# 2. Replace data/tweets/tweets.json
# 3. Clear previous results
rm -rf data/tweets/transformed/ data/tweets/processed/ data/checkpoint.txt

# 4. Run fresh analysis
python src/main.py extract-tweets
python src/main.py analyze-tweets

Troubleshooting

CSV parsing errors

The transformed CSV is corrupted. Delete and re-extract:
rm -rf data/tweets/transformed/
python src/main.py extract-tweets
Rate limit (429) errors

You’re hitting Gemini’s rate limits. Solutions:
  1. Increase delay in .env:
    RATE_LIMIT_SECONDS=2.0
    
  2. Wait and resume (the tool auto-resumes):
    # Wait 1 hour, then:
    python src/main.py analyze-tweets
    
  3. Check daily quota at Google AI Studio
Empty API responses

Gemini returned an empty response. The tool retries automatically (up to 3 times with exponential backoff). If the problem persists:
# Check your API key is valid
echo $GEMINI_API_KEY

# Try a different model
# Edit .env:
GEMINI_MODEL=gemini-1.5-flash
Unexpected flagging results

Your criteria might be too strict (or too lenient). Adjust config.json.

If flagging too much:
  • Remove some forbidden_words
  • Make topics_to_exclude more specific
  • Soften tone_requirements
If flagging too little:
  • Add more forbidden_words
  • Expand topics_to_exclude
  • Strengthen additional_instructions
Then clear checkpoint and re-analyze:
rm data/checkpoint.txt data/tweets/processed/results.csv
python src/main.py analyze-tweets

Cost and time estimates

Gemini API limits (free tier)

  • Requests per minute: 15
  • Requests per day: 1,500
  • Cost: Free for moderate use

Processing time

| Archive size | Estimated time | Limiting factor |
| --- | --- | --- |
| 100 tweets | ~2 minutes | 1 req/sec |
| 1,000 tweets | ~17 minutes | 1 req/sec |
| 10,000 tweets | ~7 days | 1,500 req/day quota |
For large archives (10,000+ tweets), consider spreading analysis over multiple days or upgrading to a paid Gemini API plan.
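These estimates follow directly from the per-request delay and the daily quota; as a quick sanity check (the function name is illustrative, not part of the tool):

```python
import math

def estimate_processing_time(n_tweets: int, rate_limit_seconds: float = 1.0,
                             daily_quota: int = 1500) -> str:
    """Rough processing-time estimate from the rate limit and daily quota."""
    # Once the archive exceeds the free tier's daily quota, the quota
    # (not the per-request delay) becomes the bottleneck.
    if n_tweets > daily_quota:
        return f"~{math.ceil(n_tweets / daily_quota)} days"
    return f"~{math.ceil(n_tweets * rate_limit_seconds / 60)} minutes"
```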

Next steps

Configuration guide

Learn advanced criteria customization and settings

CLI reference

Complete command-line interface documentation

Troubleshooting

Common issues and solutions

GitHub repository

View source code and contribute
