
Getting started

Thank you for considering contributing to Tweet Audit Tool! This guide will help you get started with contributing code, documentation, and improvements.

Before you begin

1. Set up your environment

Follow the development setup guide to configure your local environment.
2. Familiarize yourself with the codebase

Read through the project structure and understand the code organization:
  • Each file has a single responsibility (models, config, storage, analyzer, application, main)
  • Tests mirror the source code structure
  • All tests should follow the Arrange-Act-Assert pattern
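The Arrange-Act-Assert pattern separates setup, execution, and verification. A minimal sketch (the function under test is illustrative, not from the codebase):

```python
def normalize_handle(handle: str) -> str:
    """Strip a leading @ and lowercase a Twitter handle."""
    return handle.lstrip("@").lower()


def test_should_normalize_handle():
    # Arrange: set up the input
    raw = "@ExampleUser"

    # Act: call the code under test
    result = normalize_handle(raw)

    # Assert: verify the outcome
    assert result == "exampleuser"
```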
3. Check existing issues

Look for existing issues or feature requests on GitHub before starting work. This helps avoid duplicate efforts.

Pull request process

1. Fork and clone

Fork the repository on GitHub and clone your fork:
git clone https://github.com/YOUR_USERNAME/tweet-audit-impl.git
cd tweet-audit-impl
2. Create a feature branch

Create a descriptive branch name for your changes:
git checkout -b feature/add-json-export
# or
git checkout -b fix/checkpoint-corruption
# or
git checkout -b docs/update-api-examples
Use prefixes: feature/, fix/, docs/, refactor/, test/, or chore/
3. Make your changes

Follow the coding standards and best practices outlined below. Key points:
  • Write tests for new functionality
  • Update documentation if needed
  • Follow existing code style and conventions
  • Keep commits focused and atomic
4. Run quality checks

Before committing, ensure all checks pass:
# Format code
ruff format .

# Check for linting issues
ruff check --fix .

# Run tests with coverage
pytest --cov=src
All tests should pass and coverage should meet the guidelines:
  • Core logic: 80%+
  • Utilities: 60%+
  • CLI: 40%+
5. Commit your changes

Write clear, descriptive commit messages following the conventional commits format:
git add .
git commit -m "feat: add JSON output format for analysis results"
See commit message format below for details.
6. Push and create pull request

Push your branch and create a pull request:
git push origin feature/add-json-export
Then open a pull request on GitHub with:
  • Clear title describing the change
  • Description of what was changed and why
  • Reference to any related issues
  • Screenshots or examples if applicable

Commit message format

Follow the conventional commits specification:
<type>: <description>

[optional body]

[optional footer]

Commit types

  • feat - New feature
  • fix - Bug fix
  • docs - Documentation only changes
  • style - Code style changes (formatting, etc.)
  • refactor - Code restructuring without changing behavior
  • test - Adding or updating tests
  • chore - Maintenance tasks (dependencies, build, etc.)

Examples

feat: add support for JSON output format

Adds a new JSONWriter class that exports analysis results
in JSON format alongside the existing CSV format.

Code style guidelines

Imports

Organize imports in three groups:
# Standard library
import json
import os
from pathlib import Path

# Third-party packages
import google.genai as genai
from dotenv import load_dotenv

# Local modules
from models import Tweet
from config import settings

Type hints

Always use type hints for function parameters and return values:
# Good
def analyze(self, tweet: Tweet) -> AnalysisResult:
    ...

def process(tweets: list[Tweet]) -> dict[str, list[str]]:
    ...

# Bad
def analyze(self, tweet):
    ...

def process(tweets):
    ...

Naming conventions

  • Classes: PascalCase (e.g., TweetAnalyzer, CSVWriter)
  • Functions/methods: snake_case (e.g., analyze_tweet, write_results)
  • Constants: UPPER_SNAKE_CASE (e.g., MAX_RETRIES, DEFAULT_BATCH_SIZE)
  • Private methods: _leading_underscore (e.g., _build_prompt, _rate_limit)
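Put together, a module following these conventions might look like this (the names and banned-word logic are illustrative stand-ins, not the project's actual code):

```python
# Constants use UPPER_SNAKE_CASE
MAX_RETRIES = 3


# Classes use PascalCase
class TweetAnalyzer:
    # Public methods use snake_case
    def analyze_tweet(self, content: str) -> str:
        return "DELETE" if self._contains_banned_word(content) else "KEEP"

    # Private helpers use a leading underscore
    def _contains_banned_word(self, content: str) -> bool:
        return "spam" in content.lower()
```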

Docstrings

Write docstrings for public functions and classes:
def analyze(self, tweet: Tweet) -> AnalysisResult:
    """Analyze a tweet using Gemini AI.
    
    Args:
        tweet: The tweet to analyze
        
    Returns:
        AnalysisResult with decision and tweet URL
        
    Raises:
        ValueError: If API response is invalid
        RuntimeError: If API request fails
    """
    ...

Context managers

Use context managers for resource management:
# Good
with CSVWriter(path) as writer:
    writer.write_tweets(tweets)

# Bad
writer = CSVWriter(path)
writer.open()
writer.write_tweets(tweets)
writer.close()  # Easy to forget!

Fail fast

Validate inputs early and raise clear exceptions:
# Good
def analyze(self, tweet: Tweet) -> AnalysisResult:
    if not tweet.content:
        raise ValueError("Tweet content cannot be empty")
    # ... rest of code

# Bad
def analyze(self, tweet: Tweet) -> AnalysisResult:
    if tweet.content:  # Silent failure if empty
        # ... process
    return None  # Caller doesn't know it failed

Development workflow

Adding new features

Follow this workflow when adding functionality:
1. Update models (if needed)

If you need new data structures, add them to models.py:
src/models.py
@dataclass(frozen=True)
class AnalysisResult:
    tweet_url: str
    decision: Decision
    reason: str = ""  # Add new field
2. Add core functionality

Implement the feature in the appropriate module:
src/storage.py
class JSONWriter:
    def __init__(self, path: str) -> None:
        self.path = path

    def __enter__(self) -> "JSONWriter":
        return self

    def __exit__(self, *exc_info) -> None:
        pass

    def write_results(self, results: list[AnalysisResult]) -> None:
        data = [
            {
                "url": r.tweet_url,
                "decision": r.decision.value,
                "reason": r.reason,
            }
            for r in results
        ]
        with open(self.path, "w") as f:
            json.dump(data, f, indent=2)
3. Write tests

Add comprehensive tests for the new functionality:
tests/test_storage.py
def test_should_write_results_to_json(tmp_path):
    output_path = tmp_path / "results.json"
    results = [
        AnalysisResult(
            tweet_url="https://x.com/u/s/1",
            decision=Decision.DELETE,
            reason="Profanity"
        )
    ]
    
    with JSONWriter(str(output_path)) as writer:
        writer.write_results(results)
    
    with open(output_path) as f:
        data = json.load(f)
    
    assert len(data) == 1
    assert data[0]["decision"] == "DELETE"
    assert data[0]["reason"] == "Profanity"
4. Integrate with application layer

Wire up the feature in application.py:
src/application.py
def analyze_tweets(self, output_format: str = "csv") -> Result:
    # ... existing code ...
    
    if output_format == "json":
        with JSONWriter(settings.json_results_path) as writer:
            writer.write_results(results)
    else:
        with CSVWriter(settings.processed_results_path) as writer:
            # ... existing code ...
5. Update CLI (if applicable)

Add command-line options if needed:
src/main.py
parser.add_argument(
    "--format",
    choices=["csv", "json"],
    default="csv",
    help="Output format"
)

result = app.analyze_tweets(output_format=args.format)
6. Update documentation

Document the new feature in README.md and other relevant docs.

Common tasks

To add a new analysis criterion:
  1. Update the Criteria model in config.py
  2. Update _build_prompt in analyzer.py to include the new criterion
  3. Update config.example.json with an example
  4. Add tests in test_analyzer.py
  5. Update README.md with an explanation of the new criterion
To change the Gemini model, update the value in .env:
GEMINI_MODEL=gemini-1.5-pro  # Use Pro instead of Flash
Or programmatically in config.py:
gemini_model: str = "gemini-1.5-pro"
To add a progress bar, install tqdm:
pip install tqdm
Use in application.py:
from tqdm import tqdm

def analyze_tweets(self) -> Result:
    tweets = parser.parse()
    results = []

    with tqdm(total=len(tweets), desc="Analyzing") as pbar:
        for tweet in tweets:
            results.append(self.analyzer.analyze(tweet))
            pbar.update(1)

Testing guidelines

Test-driven development

Always write tests before or alongside implementation:
  1. Write a failing test that defines the desired behavior
  2. Run the test to verify it fails (red)
  3. Implement the minimal code to make it pass (green)
  4. Refactor while keeping tests green
  5. Repeat
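The red-green cycle in miniature (the function and its rule are illustrative):

```python
# Steps 1-2 (red): this test fails until is_flagged exists
def test_should_flag_empty_content():
    assert is_flagged("") is True


# Step 3 (green): the minimal implementation that makes it pass
def is_flagged(content: str) -> bool:
    return content.strip() == ""


# Step 4 (refactor): rename or simplify while the test stays green
```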

Test coverage

Ensure adequate test coverage:
# Check coverage
pytest --cov=src --cov-report=term-missing

# Generate HTML report
pytest --cov=src --cov-report=html
open htmlcov/index.html

Writing good tests

Test one thing

Each test should verify a single behavior or scenario

Use descriptive names

Test names should clearly describe what’s being tested

Mock external dependencies

Mock APIs, file systems, and external services

Test edge cases

Include tests for error conditions and boundary cases
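For example, the Gemini client can be swapped for a mock so tests never touch the network. The analyzer shape below is an assumption for illustration, not the project's actual class:

```python
from unittest.mock import Mock


class Analyzer:
    """Minimal stand-in: delegates analysis to an injected API client."""

    def __init__(self, client):
        self.client = client

    def analyze(self, content: str) -> str:
        return self.client.generate(content)


def test_should_return_client_decision_without_network():
    # Mock the external dependency instead of calling the real API
    client = Mock()
    client.generate.return_value = "DELETE"

    analyzer = Analyzer(client)

    assert analyzer.analyze("some tweet") == "DELETE"
    client.generate.assert_called_once_with("some tweet")
```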

Code review process

When reviewing or submitting code:

What reviewers look for

  • Functionality - Does the code work as intended?
  • Tests - Are there adequate tests with good coverage?
  • Code style - Does it follow project conventions?
  • Documentation - Are changes documented appropriately?
  • Performance - Are there any obvious performance issues?
  • Security - Are there any security concerns?

Responding to feedback

  • Address all review comments
  • Ask for clarification if feedback is unclear
  • Push additional commits to the same branch
  • Re-request review when ready
Be open to feedback and remember that code review improves code quality and helps everyone learn.

Documentation contributions

Documentation improvements are highly valued:
  • Fix typos or unclear explanations
  • Add examples or use cases
  • Improve API documentation
  • Write tutorials or guides
  • Update outdated information
Documentation follows the same pull request process as code changes.

Best practices

Each function should do one thing well:
# Good: Each function has a single responsibility
def analyze_tweets(self) -> Result:
    tweets = self._load_tweets()
    results = self._process_tweets(tweets)
    self._save_results(results)
    return Result(success=True)

# Bad: God function
def analyze_tweets(self) -> Result:
    # 200 lines of mixed concerns
    ...
Use clear variable and function names:
# Good
def should_delete_tweet(tweet: Tweet, criteria: Criteria) -> bool:
    return contains_forbidden_word(tweet, criteria.forbidden_words)

# Bad
def check(t, c):
    return proc(t, c.fw)
Focus on correctness and clarity first:
  1. Make it work (correctness)
  2. Make it right (clean code)
  3. Make it fast (only if needed)
Don’t silently ignore errors:
# Good
try:
    result = api.call()
except APIError as e:
    logger.error(f"API call failed: {e}")
    raise

# Bad
try:
    result = api.call()
except:
    pass  # What went wrong?

Resources

Getting help

If you need help:
  • Read the docs - Check DEVELOPMENT.md and README.md
  • Review tests - Tests show usage examples
  • Ask questions - Open a GitHub discussion
  • Report bugs - Open a GitHub issue with details

Thank you!

Your contributions make this project better for everyone. We appreciate your time and effort in improving Tweet Audit Tool.
