The extract-tweets command reads your Twitter archive JSON file and converts it to a structured CSV format suitable for analysis.
Usage
What it does
The extract-tweets command performs the following operations:
- Reads the Twitter archive JSON file from data/tweets/tweets.json
- Parses the tweet data using the JSON parser
- Transforms tweets into a normalized CSV format
- Writes the output to data/tweets/transformed/tweets.csv
This is always the first command you should run. The analyze-tweets command depends on the CSV file generated by this command.
Input requirements
Your Twitter archive JSON file. This file is obtained by:
- Requesting your Twitter archive from X.com
- Downloading the archive ZIP file (after 24-48 hours)
- Extracting the ZIP and copying data/tweets.json to your project’s data/tweets/ directory
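The extract-and-copy step can also be scripted. The following is a minimal sketch using only the Python standard library; the function name and the archive_extracted scratch directory are hypothetical, and it assumes the archive contains data/tweets.json as described in the steps above.

```python
import shutil
import zipfile
from pathlib import Path


def copy_tweets_from_archive(zip_path: Path, project_root: Path) -> Path:
    """Extract the archive ZIP and copy data/tweets.json into data/tweets/."""
    # Hypothetical scratch directory for the unpacked archive.
    extract_dir = project_root / "archive_extracted"
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)

    source = extract_dir / "data" / "tweets.json"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found in the extracted archive")

    # Create the project's data/tweets/ directory if it does not exist yet.
    target_dir = project_root / "data" / "tweets"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / "tweets.json"))
```

Calling copy_tweets_from_archive(Path("twitter-archive.zip"), Path(".")) from the project root would leave the file at data/tweets/tweets.json; the ZIP file name here is only an example.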
Archive structure
Your data directory should look like:
Output
A CSV file containing all extracted tweets with normalized fields including:
- id: Tweet ID
- content: Tweet text
- created_at: Timestamp
- Other metadata fields
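To illustrate what normalization into these fields might look like, here is a hedged sketch. It assumes each raw record carries id_str, full_text, and created_at keys, optionally wrapped in a "tweet" object; that input shape and the function names are assumptions, not the project's confirmed format.

```python
import csv
import io

# Output columns named in the list above.
FIELDNAMES = ["id", "content", "created_at"]


def normalize(raw: dict) -> dict:
    """Map one raw record (assumed shape) onto the normalized CSV fields."""
    tweet = raw.get("tweet", raw)  # unwrap {"tweet": {...}} if present
    return {
        "id": tweet["id_str"],
        "content": tweet["full_text"],
        "created_at": tweet["created_at"],
    }


def rows_to_csv(raw_tweets: list[dict]) -> str:
    """Render the normalized rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(normalize(raw) for raw in raw_tweets)
    return buffer.getvalue()
```

The csv module quotes values that contain commas or newlines, so tweet text survives the round trip when read back with csv.DictReader.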
Success output
When the extraction succeeds, you’ll see:
Error handling
The command handles several error scenarios:
File not found
Invalid format
Permission denied
Unexpected errors
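As an illustration of how the four scenarios above could map onto Python exceptions, consider the sketch below. The function name, return shape, and message wording are hypothetical and are not the command's actual output.

```python
import json
from pathlib import Path


def load_archive(path: Path) -> tuple[bool, str]:
    """Return (ok, message) instead of raising, one branch per scenario."""
    try:
        with path.open(encoding="utf-8") as handle:
            tweets = json.load(handle)
    except FileNotFoundError:
        return False, f"File not found: {path}"
    except json.JSONDecodeError as exc:
        return False, f"Invalid format: {exc.msg} (line {exc.lineno})"
    except PermissionError:
        return False, f"Permission denied: {path}"
    except Exception as exc:  # catch-all for anything unexpected
        return False, f"Unexpected error: {exc}"
    return True, f"Loaded {len(tweets)} tweets"
```

Listing the specific exceptions before the catch-all keeps each scenario's message distinct while still guaranteeing that no error escapes unreported.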
Configuration
The command uses these settings from your configuration (src/config.py:26):
- Location of your Twitter archive JSON file
- Output location for the extracted CSV file
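For orientation, two settings of this kind could be modeled as follows. This is only a sketch: the class and field names are hypothetical and may not match src/config.py, while the default values are the input and output paths documented earlier on this page.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExtractSettings:
    # Field names are hypothetical; defaults are the documented paths.
    tweets_archive_path: Path = Path("data/tweets/tweets.json")
    transformed_tweets_path: Path = Path("data/tweets/transformed/tweets.csv")


settings = ExtractSettings()
```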
Implementation details
The extraction process:
- Initializes a JSONParser with the archive path (src/application.py:50)
- Parses all tweets from the JSON structure (src/application.py:51)
- Opens a CSV writer with the output path (src/application.py:54)
- Writes all tweets to CSV format (src/application.py:55)
- Returns a success result with the tweet count (src/application.py:60)
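The five steps above can be sketched roughly as follows. The JSONParser here is a stand-in written for this example (the real class's interface is not shown on this page), ExtractResult is a hypothetical result type, and the sketch assumes the JSON file is a list of objects that already use the id, content, and created_at keys.

```python
import csv
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExtractResult:  # hypothetical result type
    success: bool
    count: int


class JSONParser:
    """Stand-in for the project's parser; its real interface may differ."""

    def __init__(self, archive_path: Path) -> None:
        self.archive_path = archive_path

    def parse(self) -> list[dict]:
        with self.archive_path.open(encoding="utf-8") as handle:
            return json.load(handle)


def extract_tweets(archive_path: Path, output_path: Path) -> ExtractResult:
    parser = JSONParser(archive_path)  # initialize the parser
    tweets = parser.parse()            # the whole archive is held in memory
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Open a CSV writer on the output path and write every tweet.
    with output_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=["id", "content", "created_at"],
            extrasaction="ignore",  # silently drop keys outside the header
        )
        writer.writeheader()
        writer.writerows(tweets)
    return ExtractResult(success=True, count=len(tweets))
```

Because parse() returns the full list before any row is written, memory use grows with archive size, which is the behavior the note on large archives refers to.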
The extraction is performed in-memory. For very large archives (100k+ tweets), ensure you have sufficient available RAM.
Example workflow
Here’s a complete example of preparing and extracting your archive:
Next steps
After successfully extracting your tweets:
- Review the generated CSV file to ensure tweets were extracted correctly
- Configure your analysis criteria in config.json
- Run the analyze-tweets command to process the tweets
Logging
Detailed logs are written during execution. Set your desired log level in .env:
The logs include:
- Archive file path being read
- Number of tweets found
- Output file path
- Any errors encountered
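A minimal sketch of how level-controlled logging of these items might be wired up with Python's standard logging module is shown below. LOG_LEVEL and the logger name are hypothetical, and the sketch assumes the values from .env have already been loaded into the process environment.

```python
import logging
import os

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logging() -> logging.Logger:
    # LOG_LEVEL is a hypothetical name; use the variable your .env defines.
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = LEVELS.get(level_name, logging.INFO)  # unknown values fall back to INFO
    logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    logger = logging.getLogger("extract_tweets")  # hypothetical logger name
    logger.setLevel(level)
    return logger


def log_run(logger: logging.Logger, archive_path: str, tweet_count: int, output_path: str) -> None:
    """Emit the items listed above for one successful run."""
    logger.info("Reading archive from %s", archive_path)
    logger.info("Found %d tweets", tweet_count)
    logger.info("Writing CSV to %s", output_path)
```

Errors would typically be reported with logger.exception(...) inside an except block so that the traceback is recorded alongside the message.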
Performance
Extraction performance varies by archive size:
| Archive Size | Approximate Time | Memory Usage |
|---|---|---|
| 1,000 tweets | < 1 second | ~10 MB |
| 10,000 tweets | 1-2 seconds | ~50 MB |
| 50,000 tweets | 5-10 seconds | ~200 MB |
| 100,000+ tweets | 20+ seconds | ~500+ MB |