The Tweet Audit Tool uses two configuration methods: environment variables for credentials and system settings, and a JSON file for analysis criteria.

Environment variables

Environment variables are loaded from a .env file in your project root. This keeps sensitive credentials like API keys out of version control.
1. Create .env file

Create a .env file in your project root:
touch .env
2. Add required variables

Add your Gemini API key and X username:
.env
# Required: Your Gemini API key from https://aistudio.google.com/app/apikey
GEMINI_API_KEY=your_api_key_here

# Required: Your X (Twitter) username
X_USERNAME=your_x_username
3. Configure optional settings

Add optional configuration for performance tuning:
.env
# Optional: Gemini model to use (default: gemini-2.5-flash)
GEMINI_MODEL=gemini-2.5-flash

# Optional: Seconds to wait between API calls (default: 1.0)
RATE_LIMIT_SECONDS=1.0

# Optional: Logging verbosity (default: INFO)
# Options: DEBUG, INFO, WARNING, ERROR, CRITICAL
LOG_LEVEL=INFO

Available environment variables

GEMINI_API_KEY
string
required
Your Google Gemini API key. Get one free at Google AI Studio. The tool will fail to start without this key:
if not settings.gemini_api_key:
    raise ValueError(
        "GEMINI_API_KEY is required. Set it via environment variable or .env file"
    )
X_USERNAME
string
required
Your X (Twitter) username, used to construct tweet URLs. Example: if your profile is https://x.com/johndoe, use:
X_USERNAME=johndoe
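As a rough sketch of how the username feeds into URL construction, a tweet URL combines the base URL, the username, and a tweet ID; the helper name below is illustrative, not the tool's actual code.

```python
# Illustrative sketch: build a tweet URL from a username and tweet ID.
# The function name is an assumption; only the URL shape is from the docs.
BASE_TWITTER_URL = "https://x.com"

def build_tweet_url(username: str, tweet_id: str) -> str:
    """Return the canonical tweet URL: https://x.com/<username>/status/<id>."""
    return f"{BASE_TWITTER_URL}/{username}/status/{tweet_id}"

print(build_tweet_url("johndoe", "1234567890"))
# → https://x.com/johndoe/status/1234567890
```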
GEMINI_MODEL
string
default:"gemini-2.5-flash"
The Gemini model to use for analysis. Recommended options:
  • gemini-2.5-flash - Fast and cost-effective (recommended)
  • gemini-2.0-flash - Slightly older version
  • gemini-1.5-pro - More capable but slower
The model is used in analyzer.py:76:
response = self.client.models.generate_content(
    model=self.model,
    contents=prompt,
    config=genai.types.GenerateContentConfigDict(response_mime_type="application/json"),
)
RATE_LIMIT_SECONDS
float
default:"1.0"
Minimum seconds to wait between API requests to Gemini. Useful for:
  • Avoiding rate limits (increase to 2.0 or higher)
  • Reducing API costs (higher values = slower processing)
  • Testing (decrease to 0.5 for faster local development)
Implementation in analyzer.py:64-68:
def _rate_limit(self) -> None:
    elapsed = time.time() - self.last_request_time
    if elapsed < self.min_request_interval:
        time.sleep(self.min_request_interval - elapsed)
    self.last_request_time = time.time()
LOG_LEVEL
string
default:"INFO"
Controls the verbosity of console output. Options:
  • DEBUG - Detailed information for debugging
  • INFO - General progress updates (recommended)
  • WARNING - Only warnings and errors
  • ERROR - Only error messages
  • CRITICAL - Only critical failures
Set in config.py:14-22:
def configure_logging() -> None:
    log_level = os.getenv("LOG_LEVEL", "INFO").upper()
    log_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

    logging.basicConfig(
        level=getattr(logging, log_level, logging.INFO),
        format=log_format,
        handlers=[logging.StreamHandler(sys.stdout)],
    )

Analysis criteria configuration

The config.json file defines what makes a tweet “problematic”. This file is optional; the tool falls back to sensible defaults if it is not provided.

Creating config.json

Create a config.json file in your project root:
{
  "criteria": {
    "forbidden_words": ["damn", "wtf", "hell"],
    "topics_to_exclude": [
      "Profanity or unprofessional language",
      "Personal attacks or insults",
      "Outdated political opinions"
    ],
    "tone_requirements": [
      "Professional language only",
      "Respectful communication"
    ],
    "additional_instructions": "Flag any content that could harm professional reputation"
  }
}

Criteria fields

criteria.forbidden_words
string[]
Exact words that trigger deletion (case-insensitive). Example:
"forbidden_words": ["damn", "crypto", "wtf"]
Used in prompt construction in analyzer.py:113-115:
if settings.criteria.forbidden_words:
    words = ", ".join(settings.criteria.forbidden_words)
    criteria_parts.append(f"Contains any of these words: {words}")
Forbidden words match whole words only, case-insensitively: “crypto” matches “Crypto is great” but not “cryptocurrency”.
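Note that the tool itself passes the forbidden words to the model via the prompt rather than matching them in code; the standalone sketch below only illustrates the whole-word, case-insensitive semantics described above.

```python
import re

# Illustration only: whole-word, case-insensitive matching as described
# above. The actual tool delegates this check to the AI via the prompt.
def contains_forbidden_word(text: str, words: list[str]) -> bool:
    return any(
        re.search(rf"\b{re.escape(w)}\b", text, flags=re.IGNORECASE)
        for w in words
    )

print(contains_forbidden_word("Crypto is great", ["crypto"]))       # True
print(contains_forbidden_word("I love cryptocurrency", ["crypto"])) # False
```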
criteria.topics_to_exclude
string[]
High-level content categories to flag. The AI interprets these broadly. Example:
"topics_to_exclude": [
  "Profanity or unprofessional language",
  "Personal attacks or insults",
  "Outdated political opinions"
]
Each topic becomes a numbered criterion in the AI prompt:
criteria_parts.extend(settings.criteria.topics_to_exclude)
criteria_list = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(criteria_parts))
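Running the same enumerate-and-join pattern on sample criteria shows what the model actually receives:

```python
# Demonstration of the numbered criteria list built by the code above,
# using sample values from this page.
criteria_parts = [
    "Contains any of these words: damn, wtf, hell",
    "Profanity or unprofessional language",
    "Personal attacks or insults",
]
criteria_list = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(criteria_parts))
print(criteria_list)
# → 1. Contains any of these words: damn, wtf, hell
#   2. Profanity or unprofessional language
#   3. Personal attacks or insults
```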
criteria.tone_requirements
string[]
Stylistic and tone-related rules for content. Example:
"tone_requirements": [
  "Professional language only",
  "Respectful communication",
  "No sarcasm or mockery"
]
These are added to the criteria list sent to the AI:
criteria_parts.extend(settings.criteria.tone_requirements)
criteria.additional_instructions
string
Free-form guidance for the AI analyzer. Example:
"additional_instructions": "Be extra cautious with tweets from 2020-2021 during controversial periods"
Appended to the prompt in analyzer.py:119-121:
additional = ""
if settings.criteria.additional_instructions:
    additional = f"\n\nAdditional guidance: {settings.criteria.additional_instructions}"

Default criteria

If config.json doesn’t exist, these defaults are used from config.py:51-64:
def _default_criteria() -> Criteria:
    return Criteria(
        forbidden_words=[],
        topics_to_exclude=[
            "Profanity or unprofessional language",
            "Personal attacks or insults",
            "Outdated political opinions",
        ],
        tone_requirements=[
            "Professional language only",
            "Respectful communication",
        ],
        additional_instructions="Flag any content that could harm professional reputation",
    )

File paths configuration

The tool uses predefined paths for data storage. These are hardcoded in config.py:34-38 but can be modified:
@dataclass
class Settings:
    tweets_archive_path: str = "data/tweets/tweets.json"
    transformed_tweets_path: str = "data/tweets/transformed/tweets.csv"
    checkpoint_path: str = "data/checkpoint.txt"
    processed_results_path: str = "data/tweets/processed/results.csv"
    base_twitter_url: str = "https://x.com"
To change data paths, modify these values in config.py before running the tool.

Batch processing configuration

The batch size controls how many tweets are processed before saving a checkpoint:
batch_size: int = 10  # Process 10 tweets per batch
Trade-offs:
  • Larger batches (20-50): Faster processing, but longer recovery if interrupted
  • Smaller batches (5-10): More frequent checkpoints, safer for large datasets
Batch size must be modified in config.py:43. There’s no environment variable for this setting.
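The checkpoint mechanics behind this trade-off can be sketched as a simple loop; the function and parameter names below are illustrative, not the tool's actual code.

```python
from typing import Callable

BATCH_SIZE = 10  # mirrors batch_size in config.py

def process_in_batches(
    tweets: list[dict],
    analyze: Callable[[dict], None],
    checkpoint_path: str = "data/checkpoint.txt",
) -> None:
    """Process tweets in fixed-size batches, checkpointing after each batch."""
    for start in range(0, len(tweets), BATCH_SIZE):
        for tweet in tweets[start:start + BATCH_SIZE]:
            analyze(tweet)
        # Record how many tweets are done so an interrupted run resumes
        # from the last completed batch instead of starting over.
        with open(checkpoint_path, "w", encoding="utf-8") as f:
            f.write(str(min(start + BATCH_SIZE, len(tweets))))
```

A larger BATCH_SIZE means fewer checkpoint writes but more re-analysis after a crash, which is exactly the trade-off listed above.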

Configuration loading order

The tool loads configuration in this order (from config.py:87-102):
1. Load environment variables

The .env file is loaded first:
from dotenv import load_dotenv
load_dotenv()
2. Apply default criteria

Default analysis criteria are set:
criteria = _default_criteria()
3. Override with config.json

If config.json exists, it overrides defaults:
if file_criteria := _load_config_file("config.json"):
    criteria = file_criteria
4. Create Settings object

Final settings combine environment variables and criteria:
return Settings(
    x_username=os.getenv("X_USERNAME", "iamuchihadan"),
    gemini_api_key=os.getenv("GEMINI_API_KEY", ""),
    gemini_model=os.getenv("GEMINI_MODEL", "gemini-2.5-flash"),
    rate_limit_seconds=float(os.getenv("RATE_LIMIT_SECONDS", "1.0")),
    criteria=criteria,
)

Validation and errors

Missing API key

If GEMINI_API_KEY is not set, the analyzer fails immediately:
if not settings.gemini_api_key:
    raise ValueError(
        "GEMINI_API_KEY is required. Set it via environment variable or .env file"
    )

Invalid config.json

If config.json exists but has invalid JSON, the tool falls back to defaults:
try:
    with open(config_path, encoding="utf-8") as f:
        data = json.load(f)
except (OSError, json.JSONDecodeError):
    return None  # Use default criteria
Verify that config.json contains valid JSON with:
python -m json.tool config.json

Example: Complete configuration

Here’s a complete configuration setup:
GEMINI_API_KEY=AIzaSyDxxxxxxxxxxxxxxxxxxxxxxxxx
X_USERNAME=johndoe
GEMINI_MODEL=gemini-2.5-flash
RATE_LIMIT_SECONDS=1.5
LOG_LEVEL=INFO
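To sanity-check a configuration like this, you can resolve each variable with the same defaults documented above; the helper below is a hypothetical smoke check, not part of the tool.

```python
import os

# Hypothetical smoke check: resolve each documented variable with the
# same defaults the tool uses and report the result.
def resolve_settings(env: dict[str, str]) -> dict:
    return {
        "gemini_api_key": env.get("GEMINI_API_KEY", ""),
        "x_username": env.get("X_USERNAME", ""),
        "gemini_model": env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        "rate_limit_seconds": float(env.get("RATE_LIMIT_SECONDS", "1.0")),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }

print(resolve_settings(dict(os.environ)))
```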

Next steps

Extract tweets

Learn how to extract tweets from your X archive

Customize criteria

Deep dive into criteria customization strategies
