Learn how to analyze tweets with Google Gemini AI and generate a list of tweets to delete based on your custom criteria.
Once you’ve extracted tweets from your archive, you can analyze them using Google’s Gemini AI to identify which tweets no longer align with your values.
The analysis process follows these steps from application.py:64-122:
1. Load extracted tweets
The tool reads your transformed CSV file:
    logger.info(f"Loading tweets from {settings.transformed_tweets_path}")
    parser = CSVParser(settings.transformed_tweets_path)
    tweets = parser.parse()

    if not tweets:
        logger.warning("No tweets found to analyze")
        return Result(success=True, count=0)

    logger.info(f"Loaded {len(tweets)} tweets for analysis")
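To make the loading step concrete, here is a minimal sketch of a CSV-to-tweet parser. The `Tweet` model, the `parse_tweets` function, and the `id` and `text` column names are assumptions for illustration; the actual `CSVParser` and the layout of the transformed CSV are not shown in this guide.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical minimal model; the guide only references `id` and `content`.
    id: str
    content: str


def parse_tweets(csv_file) -> list[Tweet]:
    """Read one Tweet per CSV row; 'id' and 'text' are assumed column names."""
    reader = csv.DictReader(csv_file)
    return [Tweet(id=row["id"], content=row["text"]) for row in reader]


# In-memory stand-ins for the transformed CSV file.
sample = io.StringIO('id,text\n1,"Hello, world"\n2,Second tweet\n')
tweets = parse_tweets(sample)

# A file with only a header yields an empty list, which triggers the early return above.
empty = parse_tweets(io.StringIO("id,text\n"))
```

An empty list is falsy in Python, so a check like `if not tweets:` covers the header-only case without a separate length test.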
2. Initialize checkpoint system
The checkpoint allows resuming if interrupted:
    with Checkpoint(settings.checkpoint_path) as checkpoint:
        start_index = checkpoint.load()
        logger.info(f"Resuming from tweet index {start_index}")
On first run, start_index is 0. On subsequent runs, it’s the last saved position.
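The resume behavior can be sketched with a small file-backed context manager. This is an illustration under assumptions, not the tool's actual `Checkpoint` class: the plain-text file format and the `save` method are hypothetical, and only `load()` and the `with` usage appear in the source.

```python
import os
import tempfile


class Checkpoint:
    """Hypothetical checkpoint: stores the next tweet index in a text file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def __enter__(self) -> "Checkpoint":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Nothing to clean up in this sketch; exceptions are not suppressed.
        return False

    def load(self) -> int:
        """Return the saved index, or 0 when no checkpoint file exists."""
        try:
            with open(self.path, encoding="utf-8") as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def save(self, index: int) -> None:
        """Write to a temp file first, then atomically replace the checkpoint."""
        tmp_path = f"{self.path}.tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(str(index))
        os.replace(tmp_path, self.path)


checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.txt")

with Checkpoint(checkpoint_path) as checkpoint:
    first_run_index = checkpoint.load()  # no file yet, so this is 0
    checkpoint.save(30)                  # pretend 30 tweets were processed

with Checkpoint(checkpoint_path) as checkpoint:
    resumed_index = checkpoint.load()    # picks up the saved position
```

Writing to a temporary file and swapping it in with `os.replace` means an interruption during a save leaves the previous checkpoint intact.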
3. Process tweets in batches
Tweets are analyzed in batches (default: 10 tweets per batch):
    for i in range(start_index, len(tweets), settings.batch_size):
        batch = tweets[i : i + settings.batch_size]
        batch_num = (i // settings.batch_size) + 1
        total_batches = (len(tweets) + settings.batch_size - 1) // settings.batch_size

        logger.info(
            f"Processing batch {batch_num}/{total_batches} "
            f"(tweets {i + 1}-{min(i + len(batch), len(tweets))} of {len(tweets)})"
        )
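The batch arithmetic is easier to follow with concrete numbers. The sketch below reproduces the same index calculations in a standalone helper; `batch_ranges` is a hypothetical name used only for this example.

```python
def batch_ranges(total: int, batch_size: int, start_index: int = 0):
    """Yield (batch_num, total_batches, first_tweet, last_tweet), 1-based and inclusive."""
    total_batches = (total + batch_size - 1) // batch_size  # ceiling division
    for i in range(start_index, total, batch_size):
        batch_num = (i // batch_size) + 1
        last_tweet = min(i + batch_size, total)
        yield batch_num, total_batches, i + 1, last_tweet


# 25 tweets with the default batch size of 10, starting from scratch.
fresh_run = list(batch_ranges(total=25, batch_size=10))

# The same archive, resumed from a checkpoint at index 20.
resumed_run = list(batch_ranges(total=25, batch_size=10, start_index=20))
```

Here `fresh_run` contains three batches covering tweets 1-10, 11-20, and 21-25, while `resumed_run` contains only the last one, still labeled batch 3 of 3. The batch numbers line up this way only when the resume index is a multiple of the batch size.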
4. Skip retweets
Retweets are automatically skipped:
    for tweet in batch:
        if _is_retweet(tweet):
            continue
Each remaining tweet is sent to Gemini with a prompt assembled from your criteria. From analyzer.py:107-135, the prompt is built like this:
    def _build_prompt(self, tweet: Tweet) -> str:
        criteria_parts = []

        # Add topics and tone requirements
        criteria_parts.extend(settings.criteria.topics_to_exclude)
        criteria_parts.extend(settings.criteria.tone_requirements)

        # Add forbidden words
        if settings.criteria.forbidden_words:
            words = ", ".join(settings.criteria.forbidden_words)
            criteria_parts.append(f"Contains any of these words: {words}")

        # Format as numbered list
        criteria_list = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(criteria_parts))

        # Add additional instructions
        additional = ""
        if settings.criteria.additional_instructions:
            additional = f"\n\nAdditional guidance: {settings.criteria.additional_instructions}"

        return f"""You are evaluating tweets for a professional's Twitter cleanup.

    Tweet ID: {tweet.id}
    Tweet: "{tweet.content}"

    Mark for deletion if it violates any of these criteria:
    {criteria_list}{additional}

    Respond in JSON format:
    {{
      "decision": "DELETE" or "KEEP",
      "reason": "brief explanation"
    }}"""
With sample criteria, the resulting prompt looks like this:

    You are evaluating tweets for a professional's Twitter cleanup.

    Tweet ID: 1234567890
    Tweet: "This crypto project is going to the moon! 🚀"

    Mark for deletion if it violates any of these criteria:
    1. Profanity or unprofessional language
    2. Personal attacks or insults
    3. Outdated political opinions
    4. Professional language only
    5. Respectful communication
    6. Contains any of these words: crypto, NFT, web3

    Additional guidance: Flag any content that could harm professional reputation

    Respond in JSON format:
    {
      "decision": "DELETE" or "KEEP",
      "reason": "brief explanation"
    }
Gemini returns a JSON decision that’s parsed and validated:
    if not response.text:
        raise ValueError(f"Empty response from Gemini for tweet {tweet.id}")

    try:
        data = json.loads(response.text)
    except json.JSONDecodeError as e:
        raise ValueError(
            f"Invalid Gemini response for tweet {tweet.id}: {e}"
        ) from e

    try:
        decision = Decision(data["decision"].upper())
    except KeyError as e:
        raise ValueError(
            f"Missing decision field in Gemini response for tweet {tweet.id}"
        ) from e

    return AnalysisResult(tweet_url=settings.tweet_url(tweet.id), decision=decision)
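The validation path can be tried without calling Gemini by feeding raw strings to a standalone function. This is a sketch under assumptions: `parse_decision` is a hypothetical helper, and the `Decision` enum values are inferred from the "DELETE" or "KEEP" contract in the prompt rather than taken from the source.

```python
import json
from enum import Enum


class Decision(Enum):
    # Assumed values, inferred from the prompt's "DELETE" or "KEEP" contract.
    DELETE = "DELETE"
    KEEP = "KEEP"


def parse_decision(response_text: str, tweet_id: str) -> Decision:
    """Validate a raw model reply and return the decision it contains."""
    if not response_text:
        raise ValueError(f"Empty response for tweet {tweet_id}")
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid response for tweet {tweet_id}: {e}") from e
    try:
        # Enum lookup by value raises ValueError for anything but DELETE/KEEP.
        return Decision(data["decision"].upper())
    except KeyError as e:
        raise ValueError(f"Missing decision field for tweet {tweet_id}") from e


# A lowercase decision is normalized by .upper() before the enum lookup.
ok = parse_decision('{"decision": "delete", "reason": "mentions crypto"}', "1")

# Each malformed reply surfaces as a ValueError.
errors = []
for bad in ["", "not json", '{"reason": "no decision"}', '{"decision": "MAYBE"}']:
    try:
        parse_decision(bad, "2")
    except ValueError as e:
        errors.append(str(e))
```

In this sketch every failure mode ends up as a `ValueError`, so a caller needs only one `except` clause to handle a bad reply.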
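Each result is then appended to the results CSV. The header row is written once, and the file is flushed after every row so that progress survives an interruption:

    def write_result(self, result: AnalysisResult) -> None:
        if not self.writer:
            raise RuntimeError("CSVWriter is not open")

        if not self.header_written:
            self.writer.writerow([RESULT_CSV_URL_COLUMN, RESULT_CSV_DELETED_COLUMN])
            self.header_written = True

        self.writer.writerow([result.tweet_url, CSV_BOOL_FALSE])
        self.file.flush()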
Only tweets with Decision.DELETE are written to the results file. Tweets marked KEEP are not recorded.
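The filtering described above can be sketched as follows. The column names, the `CSV_BOOL_FALSE` value, the example URLs, and the `Decision` and `AnalysisResult` definitions are placeholders chosen for this example; the real constants are not shown in this guide.

```python
import csv
import io
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    DELETE = "DELETE"
    KEEP = "KEEP"


@dataclass
class AnalysisResult:
    tweet_url: str
    decision: Decision


# Placeholder values; the real constants are not shown in the source.
RESULT_CSV_URL_COLUMN = "tweet_url"
RESULT_CSV_DELETED_COLUMN = "deleted"
CSV_BOOL_FALSE = "false"

results = [
    AnalysisResult("https://example.com/status/1", Decision.DELETE),
    AnalysisResult("https://example.com/status/2", Decision.KEEP),
    AnalysisResult("https://example.com/status/3", Decision.DELETE),
]

buffer = io.StringIO()  # stands in for the results file
writer = csv.writer(buffer)
writer.writerow([RESULT_CSV_URL_COLUMN, RESULT_CSV_DELETED_COLUMN])

for result in results:
    # KEEP decisions are dropped; DELETE rows start out as not yet deleted.
    if result.decision is Decision.DELETE:
        writer.writerow([result.tweet_url, CSV_BOOL_FALSE])

rows = buffer.getvalue().splitlines()
```

The second column starts as false for every row, which suggests it tracks whether a flagged tweet has actually been deleted yet; that reading is an inference from the column name, not something the source states.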