The quality of your tweet analysis depends entirely on how well you define your criteria. This guide covers advanced strategies for customizing your analysis rules.

Understanding criteria types

The tool uses four types of criteria from config.py:26-30:
@dataclass
class Criteria:
    additional_instructions: str = ""
    forbidden_words: list[str] = field(default_factory=list)
    topics_to_exclude: list[str] = field(default_factory=list)
    tone_requirements: list[str] = field(default_factory=list)
Each type serves a different purpose:

Forbidden words

Exact string matching for specific words or phrases

Topics to exclude

High-level content categories interpreted by AI

Tone requirements

Stylistic rules for how content is communicated

Additional instructions

Free-form guidance for edge cases
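To make the mapping concrete, here is a minimal sketch of how a config.json `criteria` block could populate the `Criteria` dataclass. The loader shown is hypothetical; the tool's actual config loading in config.py may differ.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Criteria:
    additional_instructions: str = ""
    forbidden_words: list[str] = field(default_factory=list)
    topics_to_exclude: list[str] = field(default_factory=list)
    tone_requirements: list[str] = field(default_factory=list)

# Hypothetical loader: unpack the "criteria" block of config.json into the dataclass
raw = json.loads('{"criteria": {"forbidden_words": ["wtf"]}}')
criteria = Criteria(**raw["criteria"])
print(criteria.forbidden_words)    # ['wtf']
print(criteria.topics_to_exclude)  # [] — omitted keys fall back to their defaults
```

Because every field has a default, a config.json that sets only one criteria type is still valid.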

Forbidden words strategy

How it works

Forbidden words trigger exact, case-insensitive matching. From analyzer.py:113-115:
if settings.criteria.forbidden_words:
    words = ", ".join(settings.criteria.forbidden_words)
    criteria_parts.append(f"Contains any of these words: {words}")

Use cases

Remove tweets with explicit language:
config.json
{
  "criteria": {
    "forbidden_words": [
      "damn",
      "hell",
      "wtf",
      "shit",
      "fuck"
    ]
  }
}
This matches exact words only. “damn” matches “Damn!” but not “condemned”.
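The whole-word behavior can be sketched with a word-boundary regex. This is a hypothetical illustration of the matching semantics described above, not the tool's implementation (in the tool, the words are handed to the AI as a criterion):

```python
import re

def matches_forbidden(text: str, words: list[str]) -> bool:
    """Whole-word, case-insensitive match — a sketch of the behavior described above."""
    pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(matches_forbidden("Damn!", ["damn"]))           # True — case and punctuation ignored
print(matches_forbidden("condemned", ["damn"]))       # False — no whole-word match
print(matches_forbidden("my shell broke", ["hell"]))  # False — "hell" inside "shell" doesn't count
```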

Best practices

1

Start with obvious terms

Begin with clear-cut words you definitely want to remove:
"forbidden_words": ["wtf", "lmao", "yolo"]
2

Test on a sample

Run analysis on 50-100 tweets to see what gets caught:
# Process first 100 tweets
python src/main.py analyze-tweets
# Stop after batch 10, review results
3

Refine iteratively

Add more words based on what you missed:
"forbidden_words": ["wtf", "lmao", "yolo", "omg", "smh"]
4

Consider case variations

You don’t need case variations, because matching is case-insensitive:
// ✅ Good
"forbidden_words": ["crypto"]

// ❌ Redundant
"forbidden_words": ["crypto", "Crypto", "CRYPTO"]
Keep the forbidden words list under 50 items for optimal performance. For broader matching, use topics instead.

Topics to exclude strategy

How it works

Topics are interpreted contextually by the AI. From analyzer.py:110:
criteria_parts.extend(settings.criteria.topics_to_exclude)
These become numbered criteria in the prompt:
1. Profanity or unprofessional language
2. Personal attacks or insults
3. Cryptocurrency speculation
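The numbering above can be sketched as follows (the formatting is assumed; the exact prompt wording lives in analyzer.py):

```python
# Excluded topics become a numbered criteria list in the prompt
criteria_parts = [
    "Profanity or unprofessional language",
    "Personal attacks or insults",
    "Cryptocurrency speculation",
]
numbered = "\n".join(f"{i}. {part}" for i, part in enumerate(criteria_parts, start=1))
print(numbered)
```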

Effective topic definitions

Remove content that damages professional image:
config.json
{
  "criteria": {
    "topics_to_exclude": [
      "Profanity or unprofessional language",
      "Personal attacks or insults directed at individuals",
      "Complaints about employers or coworkers",
      "Excessive personal drama or venting",
      "Controversial political statements",
      "Inappropriate jokes or humor"
    ]
  }
}
Remove all political content:
config.json
{
  "criteria": {
    "topics_to_exclude": [
      "Political opinions or election commentary",
      "Partisan statements about political parties",
      "Policy advocacy or activism",
      "Government or political figure criticism",
      "Political memes or jokes"
    ]
  }
}
Keep only technical/professional content:
config.json
{
  "criteria": {
    "topics_to_exclude": [
      "Non-technical personal updates",
      "Entertainment or pop culture commentary",
      "Sports opinions or fandom",
      "Food, travel, or lifestyle content",
      "General social media conversations"
    ]
  }
}
Remove content that doesn’t reflect current views:
config.json
{
  "criteria": {
    "topics_to_exclude": [
      "Cryptocurrency or NFT enthusiasm from 2021-2022",
      "Hot takes or strong opinions that I no longer hold",
      "Technology predictions that proved incorrect",
      "Trend-chasing or hype-driven content",
      "Overly optimistic or pessimistic forecasts"
    ]
  }
}

Writing effective topic descriptions

1

Be specific

❌ Bad: “Bad tweets”
✅ Good: “Tweets containing personal attacks on individuals”
2

Use action-oriented language

❌ Bad: “Politics”
✅ Good: “Political opinions or partisan statements”
3

Provide context

❌ Bad: “Old content”
✅ Good: “Technology predictions from before 2020 that proved incorrect”
4

Avoid ambiguity

❌ Bad: “Weird stuff”
✅ Good: “Unprofessional humor or inappropriate jokes”

Tone requirements strategy

How it works

Tone requirements define stylistic standards. From analyzer.py:111:
criteria_parts.extend(settings.criteria.tone_requirements)

Effective tone rules

{
  "criteria": {
    "tone_requirements": [
      "Professional and courteous language only",
      "Respectful disagreement without personal attacks",
      "Evidence-based claims with sources when possible",
      "Constructive criticism rather than negativity",
      "Thoughtful communication, not reactive hot takes"
    ]
  }
}

Tone vs. topic

Topics describe what the tweet is about:
"topics_to_exclude": [
  "Cryptocurrency speculation",  // WHAT: The subject matter
  "Political opinions"            // WHAT: The content type
]
Tone requirements describe how the tweet communicates:
"tone_requirements": [
  "Professional and courteous language",  // HOW: The delivery
  "Respectful disagreement"               // HOW: The communication style
]

Additional instructions strategy

How it works

Free-form guidance for the AI. From analyzer.py:119-121:
additional = ""
if settings.criteria.additional_instructions:
    additional = f"\n\nAdditional guidance: {settings.criteria.additional_instructions}"
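Putting the analyzer.py excerpts from this guide together, the criteria prompt is assembled roughly like this. This is a sketch based only on the snippets shown above; the real code's ordering and wording may differ:

```python
# Example criteria, as they would come from config.json
forbidden_words = ["wtf", "lmao"]
topics_to_exclude = ["Cryptocurrency speculation"]
tone_requirements = ["Professional and courteous language only"]
additional_instructions = "When in doubt, keep the tweet."

# Topics and tone rules go in as-is; forbidden words become one combined criterion
criteria_parts = []
criteria_parts.extend(topics_to_exclude)
criteria_parts.extend(tone_requirements)
if forbidden_words:
    criteria_parts.append(f"Contains any of these words: {', '.join(forbidden_words)}")

# Free-form guidance is appended after the numbered list
additional = ""
if additional_instructions:
    additional = f"\n\nAdditional guidance: {additional_instructions}"

prompt_criteria = "\n".join(f"{i}. {p}" for i, p in enumerate(criteria_parts, 1)) + additional
print(prompt_criteria)
```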

Effective uses

Target content from specific time periods:
"additional_instructions": "Be especially strict with tweets from 2020-2021 during the pandemic and crypto boom. Content from this period often doesn't reflect my current views."
Set the bar for deletion:
"additional_instructions": "When in doubt, mark for deletion. I prefer a clean timeline over preserving borderline content. Err on the side of caution."
Or the opposite:
"additional_instructions": "Only flag tweets that are clearly problematic. Preserve content unless it strongly violates criteria. When uncertain, keep it."
Handle special cases:
"additional_instructions": "Technical tweets about blockchain technology are fine - only flag cryptocurrency speculation or financial advice. Distinguish between technology discussion and hype."
Emphasize the goal:
"additional_instructions": "Focus on professional reputation protection. Would a future employer, client, or colleague be concerned by this tweet? If yes, flag it."

Complete examples by use case

Professional branding

For consultants, job seekers, or public figures:
config.json
{
  "criteria": {
    "forbidden_words": [
      "damn", "hell", "wtf", "crap", "sucks"
    ],
    "topics_to_exclude": [
      "Profanity or unprofessional language",
      "Personal attacks on individuals or companies",
      "Complaints about employers or coworkers",
      "Controversial political or religious opinions",
      "Excessive personal drama or life updates",
      "Inappropriate jokes or off-color humor"
    ],
    "tone_requirements": [
      "Professional and courteous language",
      "Respectful disagreement without insults",
      "Constructive rather than purely critical",
      "Thoughtful analysis over hot takes"
    ],
    "additional_instructions": "Flag any content that could negatively impact professional opportunities. Would I want a hiring manager or client to see this? If no, delete it."
  }
}

Technology focus

For developers focusing on technical content:
config.json
{
  "criteria": {
    "forbidden_words": [
      "crypto", "NFT", "web3", "HODL", "wagmi", "gm"
    ],
    "topics_to_exclude": [
      "Cryptocurrency or NFT speculation",
      "Non-technical personal life updates",
      "Political opinions unrelated to technology policy",
      "Sports, entertainment, or pop culture",
      "Food, travel, or lifestyle content",
      "Clickbait or engagement-bait posts"
    ],
    "tone_requirements": [
      "Technical accuracy and precision",
      "Objective analysis over emotional reactions",
      "Nuanced discussion of trade-offs",
      "Educational or informative content",
      "Professional communication standards"
    ],
    "additional_instructions": "Keep only tweets about software development, technology, engineering, or related professional topics. Remove everything else to maintain a focused technical profile."
  }
}

Political neutrality

For removing all political content:
config.json
{
  "criteria": {
    "forbidden_words": [],
    "topics_to_exclude": [
      "Political opinions or partisan statements",
      "Election commentary or predictions",
      "Policy advocacy or activism",
      "Criticism of political figures or parties",
      "Social issues with political dimensions",
      "Government actions or legislation commentary"
    ],
    "tone_requirements": [
      "Non-partisan communication",
      "Objective analysis over opinion",
      "Professional neutrality maintained"
    ],
    "additional_instructions": "Remove all political content regardless of viewpoint. Keep technology, professional, and neutral educational content only."
  }
}

Era cleanup

For removing content from a specific period:
config.json
{
  "criteria": {
    "forbidden_words": [
      "COVID", "pandemic", "quarantine", "lockdown",
      "crypto", "NFT", "web3", "metaverse"
    ],
    "topics_to_exclude": [
      "COVID-19 or pandemic-related hot takes",
      "Cryptocurrency or NFT enthusiasm",
      "Predictions about web3 or metaverse",
      "Work-from-home lifestyle tweets",
      "Quarantine or lockdown content",
      "Technology hype from 2020-2022 that aged poorly"
    ],
    "tone_requirements": [
      "Timeless content that remains relevant",
      "Measured opinions rather than reactive takes",
      "Evidence-based rather than speculative"
    ],
    "additional_instructions": "Focus on removing dated content from 2020-2022 that was reactive to temporary circumstances or speculative hype cycles. Preserve evergreen technical and professional content."
  }
}

Testing and iteration

Start with a test set

1

Create a small sample

Extract your full archive but only analyze a sample:
# Extract all tweets
python src/main.py extract-tweets

# Edit checkpoint to stop early
echo "50" > data/checkpoint.txt

# Analyze first 50 tweets
python src/main.py analyze-tweets
2

Review results

Check what was flagged:
# See all flagged tweets
cat data/tweets/processed/results.csv

# Open each URL in browser and review
3

Adjust criteria

Based on results:
  • Too many false positives: Make criteria more specific
  • Missing obvious deletions: Add forbidden words or topics
  • Borderline cases: Refine tone requirements or add additional instructions
4

Reset and retry

Clear results and retest:
# Remove previous results
rm data/tweets/processed/results.csv
rm data/checkpoint.txt

# Test again with updated config
python src/main.py analyze-tweets

Iteration checklist

If the tool flags too many tweets:
  • Remove overly broad topics
  • Make tone requirements more specific
  • Add “only flag if clearly problematic” to additional instructions
  • Review forbidden words for common false positives
  • Test with a smaller word list
AI analysis has inherent variability. Running the same tweet twice might yield different results. Focus on overall patterns, not individual edge cases.

Advanced techniques

Tiered criteria sets

Create multiple config files for different passes:
# First pass: Obvious deletions
cp config-aggressive.json config.json
python src/main.py analyze-tweets

# Second pass: Borderline content
mv data/tweets/processed/results.csv results-aggressive.csv  # Keep first-pass results
cp config-moderate.json config.json
rm data/checkpoint.txt  # Start over
python src/main.py analyze-tweets

Negative testing

Test what doesn’t get flagged:
# After analysis, find tweets NOT in results
import pandas as pd

all_tweets = pd.read_csv('data/tweets/transformed/tweets.csv')
flagged = pd.read_csv('data/tweets/processed/results.csv')

# Extract IDs from URLs in flagged tweets
flagged['id'] = flagged['tweet_url'].str.split('/').str[-1]

# Compare as strings, since pandas may read numeric IDs as integers
kept = all_tweets[~all_tweets['id'].astype(str).isin(flagged['id'])]
print(f"Kept {len(kept)} tweets")
print(kept.sample(min(20, len(kept))))  # Review random kept tweets

A/B testing criteria

Compare different criteria configurations:
  1. Run analysis with Config A, save results as results-a.csv
  2. Delete checkpoint and results
  3. Run analysis with Config B, save results as results-b.csv
  4. Compare which tweets differ between configurations
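Step 4 can be sketched with the standard library. The sample files written below are stand-ins for your real results-a.csv and results-b.csv, and the tweet_url column layout is assumed to match the results.csv format used elsewhere in this guide:

```python
import csv

# Create two tiny sample result files (stand-ins for real results-a.csv / results-b.csv)
for name, urls in {
    "results-a.csv": ["https://x.com/user/status/1", "https://x.com/user/status/2"],
    "results-b.csv": ["https://x.com/user/status/2", "https://x.com/user/status/3"],
}.items():
    with open(name, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["tweet_url"])
        writer.writerows([u] for u in urls)

def flagged_ids(path: str) -> set[str]:
    """Read tweet IDs from a results CSV (assumes a tweet_url column)."""
    with open(path, newline="") as f:
        return {row["tweet_url"].rsplit("/", 1)[-1] for row in csv.DictReader(f)}

a, b = flagged_ids("results-a.csv"), flagged_ids("results-b.csv")
only_a, only_b, both = a - b, b - a, a & b
print(f"Only Config A flagged: {sorted(only_a)}")  # Tweets the stricter config caught
print(f"Only Config B flagged: {sorted(only_b)}")
print(f"Both flagged: {sorted(both)}")
```

Reviewing the tweets that only one configuration flagged tells you which criteria wording actually changed the outcome.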

Common pitfalls

❌ Problem: Adding “fire” removes “fired up about this project”
✅ Solution: Use topics for contextual matching instead
// ❌ Don't do this
"forbidden_words": ["fire", "kill", "dead"]

// ✅ Do this instead
"topics_to_exclude": ["Violent or aggressive language"]
❌ Problem: Same concept repeated multiple ways
"topics_to_exclude": [
  "Political opinions",
  "Political statements",  // Redundant
  "Politics",              // Redundant
  "Partisan content"       // Redundant
]
✅ Solution: Consolidate to one clear description
"topics_to_exclude": [
  "Political opinions or partisan statements"
]
❌ Problem: Instructions too open to interpretation
"additional_instructions": "Delete bad tweets"
✅ Solution: Be specific about what “bad” means
"additional_instructions": "Flag any content that could harm my professional reputation as a software engineer, including unprofessional language, controversial opinions, or outdated technical takes"

Next steps

Run analysis

Apply your custom criteria to analyze tweets

Resume analysis

Learn about checkpoints for long-running analyses
