What counts as a large archive?
| Size | Archive size | Estimated time | Handling |
|---|---|---|---|
| Small | < 1,000 tweets | ~20 minutes | No special handling needed |
| Medium | 1,000 - 10,000 tweets | ~3 hours | Basic rate limiting sufficient |
| Large | 10,000+ tweets | Multiple days | Requires planning (this guide) |
Understanding API limits
Gemini 2.5 Flash (free tier) limits:

Rate limits
- 15 requests per minute
- 1,500 requests per day
With `RATE_LIMIT_SECONDS=1.0` (one request per second):
- You make up to 60 requests per minute → exceeds the 15/minute limit
- You make up to ~3,600 requests per hour → exceeds the 1,500/day limit
Cost estimates
Gemini 2.5 Flash is free within limits:
- Input: 1,500 requests/day free
- Output: Generous free tier
For a 10,000-tweet archive:
- 10,000 tweets = 10,000 API calls
- At 1,500 requests/day, that's ~7 days to complete
- Cost: $0 (free tier)
Quota exceeded behavior
When you hit rate limits:
- Tool automatically retries with exponential backoff
- Wait time increases: 1s → 2s → 4s
- After 3 attempts, the request fails
- Progress is saved; resume with `analyze-tweets`
Strategy 1: Multi-day processing
The simplest approach for free-tier users with large archives.

Configuration
Adjust rate limiting in `.env` to stay within daily limits:
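A sketch of the relevant `.env` setting; `RATE_LIMIT_SECONDS` is the variable referenced elsewhere in this guide, and 4 seconds per request keeps you at the 15 requests/minute free-tier ceiling:

```ini
# One request every 4 seconds ≈ 15 requests/minute (free-tier ceiling)
RATE_LIMIT_SECONDS=4.0
```

At this pace the 1,500-request daily quota is reached after roughly 100 minutes of continuous running.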
Daily workflow
Day 1: Start processing
Run the analysis in the morning and let it run until you hit the daily limit (~1,400 tweets).
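For example (the exact invocation depends on how you installed the tool; `analyze-tweets` is the command referenced in this guide):

```shell
# Start or resume the analysis; progress is checkpointed automatically
analyze-tweets
```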
Strategy 2: Paid API tier
For users who want to process large archives quickly.

Upgrade to paid tier
- Visit Google AI Studio
- Enable billing on your Google Cloud project
- Paid tier removes the 1,500/day limit
Optimized configuration
With the paid tier you can raise the request rate in `.env`:
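A sketch, assuming the same `RATE_LIMIT_SECONDS` knob; 1.5 seconds per request gives the ~40 requests/minute used in the timeline below:

```ini
# Paid tier: the 1,500/day cap no longer applies
RATE_LIMIT_SECONDS=1.5   # ≈ 40 requests per minute
```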
Cost calculation
Gemini 2.5 Flash pricing (as of 2024):
- Input: ~$0.0001 per tweet
- Output: $0.30 per 1M tokens

For a 10,000-tweet archive:
- Input cost: ~$1.00
- Output cost: ~$0.50
- Total: ~$1.50
Pricing changes over time. Check Google’s pricing page for current rates.
Processing timeline
- 10,000 tweets at 40 req/min = ~4 hours
- 50,000 tweets = ~21 hours
- 100,000 tweets = ~42 hours
Strategy 3: Selective processing
Process only recent or relevant tweets instead of your entire archive.

Filter by date range
Modify the extraction step in `src/custom_extract.py` to filter tweets. This requires a custom code modification; the default tool processes all tweets.

Set the cutoff date in `.env` or modify `src/config.py`.
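A minimal sketch of what a date filter in `src/custom_extract.py` could look like (the tweet structure and `created_at` format are assumptions based on Twitter's archive export; the cutoff value is illustrative):

```python
from datetime import datetime, timezone

# Hypothetical cutoff; in practice read it from .env or src/config.py
CUTOFF = datetime(2023, 1, 1, tzinfo=timezone.utc)

def parse_created_at(value: str) -> datetime:
    # Archive timestamps look like "Wed Oct 10 20:19:24 +0000 2018"
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")

def filter_by_date(tweets: list[dict]) -> list[dict]:
    """Keep only tweets created on or after CUTOFF."""
    return [t for t in tweets if parse_created_at(t["created_at"]) >= CUTOFF]

sample = [
    {"id": "1", "created_at": "Wed Oct 10 20:19:24 +0000 2018"},
    {"id": "2", "created_at": "Mon Mar 06 09:00:00 +0000 2023"},
]
print([t["id"] for t in filter_by_date(sample)])  # → ['2']
```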
Filter by engagement
Prioritize analyzing tweets that are still visible:
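A sketch of an engagement sort for `src/custom_extract.py`; the field names `favorite_count` and `retweet_count` follow the archive export format, but the weighting is an assumption:

```python
def sort_by_engagement(tweets: list[dict]) -> list[dict]:
    """Highest-engagement tweets first, so they are analyzed before quota runs out."""
    def score(t: dict) -> int:
        # Archive exports store counts as strings; default to 0 if missing
        return int(t.get("favorite_count", 0)) + 2 * int(t.get("retweet_count", 0))
    return sorted(tweets, key=score, reverse=True)

sample = [
    {"id": "a", "favorite_count": "1", "retweet_count": "0"},
    {"id": "b", "favorite_count": "5", "retweet_count": "3"},
]
print([t["id"] for t in sort_by_engagement(sample)])  # → ['b', 'a']
```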
Strategy 4: Batch processing with checkpoints
Leverage the built-in checkpoint system for interrupted workflows.

How checkpoints work

The tool automatically saves progress to `checkpoint.txt`.
Manual checkpoint management

- Check progress: read the current index from `checkpoint.txt`
- Resume processing: re-run `analyze-tweets`; it picks up from the checkpoint
- Reset progress: delete `checkpoint.txt` to start from the beginning
- Skip to position: write the desired tweet index (e.g. `1420`) into `checkpoint.txt`
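Assuming `checkpoint.txt` holds a single tweet index, these operations reduce to standard shell commands:

```shell
echo 1420 > checkpoint.txt   # skip to tweet index 1420
cat checkpoint.txt           # check progress
rm checkpoint.txt            # reset progress; next run starts from the beginning
```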
Handling interruptions
The checkpoint system handles all interruption types:

| Interruption Type | Behavior | Recovery |
|---|---|---|
| Manual Ctrl+C | Saves after current batch | Re-run analyze-tweets |
| API quota exceeded | Saves before failure | Wait 24h, re-run |
| Network failure | Retries 3x, then saves | Re-run when online |
| Computer crash | Last batch checkpoint lost | Resume from last save |
| Power outage | Last batch checkpoint lost | Resume from last save |
Maximum lost progress in the worst case: one batch (10 tweets by default).
Optimizing for speed
Adjust batch size
Increase the batch size in `src/config.py` for faster processing, with trade-offs:

Benefits:
- Faster overall processing
- Fewer checkpoint writes

Trade-offs:
- More progress lost if interrupted
- Longer wait between progress updates
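A sketch of the setting in `src/config.py`; the variable name `BATCH_SIZE` is an assumption:

```python
# src/config.py (sketch) — larger batches mean fewer checkpoint writes,
# but more progress is lost if a batch is interrupted
BATCH_SIZE = 50  # default is 10
```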
Parallel processing (advanced)
For very large archives (100,000+ tweets), consider splitting the work.

Parallel processing requires code modifications to support multiple input files and checkpoint files; it is not supported out of the box.
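If you do go down that road, the first step is splitting the input. A sketch using standard Unix tools (the file name `tweets.csv` is hypothetical; each resulting chunk would still need its own checkpoint handling):

```shell
# Build a small example input (normally your transformed archive CSV)
printf 'id,text\n' > tweets.csv
for i in 1 2 3 4 5 6; do printf '%s,hello\n' "$i" >> tweets.csv; done

# Split the body (without the header) into 3-row chunks: chunk_aa, chunk_ab, ...
tail -n +2 tweets.csv | split -l 3 - chunk_
ls chunk_*
```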
Monitoring long-running jobs
Real-time progress tracking
You can monitor progress in a separate terminal, e.g. by watching `checkpoint.txt` grow.

Estimate completion time
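A back-of-the-envelope estimate, assuming `checkpoint.txt` stores the index of the last processed tweet:

```python
# Hypothetical numbers — substitute your own
processed = 1420        # value read from checkpoint.txt
total = 10_000          # tweets in the archive
rate_per_min = 15       # free-tier request ceiling

remaining_minutes = (total - processed) / rate_per_min
print(f"~{remaining_minutes / 60:.1f} hours remaining")  # → ~9.5 hours remaining
```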
Troubleshooting large archives
Memory issues
If you see a `MemoryError` while processing:
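The tool has no built-in chunked mode, but as a sketch of the general workaround, pandas can stream a large CSV in pieces instead of loading it all at once (the file name and chunk size are illustrative):

```python
import pandas as pd

# Build a small sample file (normally your transformed archive CSV)
pd.DataFrame({"id": range(2500), "text": "hello"}).to_csv(
    "tweets_transformed.csv", index=False
)

# Process in 1,000-row chunks instead of holding the whole archive in memory
total = 0
for chunk in pd.read_csv("tweets_transformed.csv", chunksize=1000):
    total += len(chunk)  # replace with the real per-chunk analysis
print(total)  # → 2500
```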
Disk space
Check available space first. Typical requirements:
- Original archive: 5-10 MB per 10,000 tweets
- Transformed CSV: ~2x original size
- Results CSV: ~1-5% of transformed size
- Total: ~3x original archive size
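To see free space on the filesystem holding your working directory:

```shell
df -h .
```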
Connection timeouts
For unreliable connections, adjust the `.env` settings:
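A sketch of possible settings; apart from `RATE_LIMIT_SECONDS`, these variable names are assumptions and may not exist in your version of the tool:

```ini
RATE_LIMIT_SECONDS=2.0   # slower pacing leaves headroom for retries
REQUEST_TIMEOUT=60       # assumed name: per-request timeout (seconds)
MAX_RETRIES=5            # assumed name: attempts before giving up
```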
Next steps
- Basic workflow: review the end-to-end process
- Custom criteria: fine-tune your deletion criteria