Directory structure
Core files
main.py
The CLI entry point, built with Typer. Structure:
- Creates three command groups: readme, contribution, listings
- Each command is a thin wrapper that calls functions from list_updater/
- Handles argument parsing and validation
pyproject.toml
Project configuration and dependencies. Key sections:
- [project] - Package metadata (name, version, Python requirement)
- dependencies - Runtime dependencies (typer)
- [dependency-groups] - Development dependencies (ruff, mypy, taskipy)
- [tool.taskipy.tasks] - Task shortcuts for linting and type checking
- [tool.ruff] - Linter configuration
- [tool.mypy] - Type checker configuration
Note: the code uses modern Python syntax (type statement, | union syntax); the type statement requires Python 3.12 or later.
Library modules
list_updater/__init__.py
Public API that exports all functions used by main.py and GitHub Actions workflows.
Exports:
- Command implementations (8 functions starting with cmd_)
- Constants (categories, thresholds, button templates)
- Utility functions (filtering, formatting, GitHub Actions helpers)
list_updater/commands.py
Core command implementations for README updates and contribution processing. Functions:
- cmd_readme_update() - Generate all README files from listings.json
- cmd_contribution_process(event_file) - Process new/edit internship issues
- cmd_listings_mark_inactive(event_file) - Bulk mark listings as inactive
- cmd_listings_remove(...) - Remove by URL/ID with hide/permanent options
Features:
- Parses GitHub issue form data
- Validates and cleans user input
- Updates listings.json atomically
- Sets GitHub Actions output variables
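The atomic update mentioned above could be sketched as follows. This is a minimal illustration and not the project's actual code: the helper name write_listings_atomically and the temp-file-then-rename approach are assumptions, since the source only states that listings.json is updated atomically.

```python
import json
import os
import tempfile
from pathlib import Path


def write_listings_atomically(listings: list[dict], path: Path) -> None:
    """Write listings to `path` so readers never see a half-written file.

    Hypothetical helper: the data goes to a temporary file in the same
    directory, which is then moved over the target with os.replace. The
    move is atomic when both paths are on the same filesystem.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(listings, tmp, indent=4)
            tmp.write("\n")
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temporary file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If the process dies mid-write, the original listings.json is left untouched, which is the property a commit-on-success workflow needs.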
list_updater/analytics.py
Analytics and data quality commands. Functions:
- cmd_listings_stats(json_output) - Statistics and counts
- cmd_listings_validate(fix) - Schema and data validation
- cmd_listings_search(...) - Filter and search listings
- cmd_listings_diff(since, commit) - Show changes over time
- cmd_listings_fix(...) - Interactive issue fixing
Features:
- Comprehensive validation (schema, duplicates, categories)
- Intelligent duplicate resolution (prefers Simplify source)
- Auto-classification suggestions for invalid categories
- Git integration for diff commands
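To make the "prefers Simplify source" rule concrete, here is a minimal sketch of one way duplicates could be resolved. The function name resolve_duplicates, the url and source field names, the "Simplify" value, and the choice to treat listings with the same URL as duplicates are all assumptions for illustration.

```python
def resolve_duplicates(listings: list[dict]) -> list[dict]:
    """Keep one listing per URL, preferring entries whose source is Simplify.

    Hypothetical sketch: "url" and "source" are assumed field names.
    """
    best_by_url: dict[str, dict] = {}
    for listing in listings:
        url = listing["url"]
        current = best_by_url.get(url)
        if current is None:
            # First time this URL is seen: keep it for now.
            best_by_url[url] = listing
        elif listing.get("source") == "Simplify" and current.get("source") != "Simplify":
            # A Simplify-sourced duplicate replaces a community-sourced one.
            best_by_url[url] = listing
    # Dicts preserve insertion order, so the output follows first appearance.
    return list(best_by_url.values())
```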
list_updater/category.py
Job category classification and management. Key functions:
- classify_job_category(listing) - ML-based category classification from job title
- create_category_table(listings, category) - Generate markdown table for a category
- ensure_categories(listings) - Validate/fix category fields
Keyword mappings:
- “software”, “swe”, “backend” → “Software Engineering”
- “pm”, “product” → “Product Management”
- “data”, “ml”, “ai” → “Data Science, AI & Machine Learning”
- “quant”, “finance” → “Quantitative Finance”
- “hardware”, “embedded” → “Hardware Engineering”
list_updater/constants.py
Configuration constants used throughout the codebase. Key constants:
- CATEGORIES - Category definitions with emojis and names
- FAANG_PLUS - List of top-tier companies (for 🔥 indicator)
- BLOCKED_COMPANIES - Companies to exclude from README
- LISTING_SCHEMA_PROPS - Required fields for validation
- SIMPLIFY_BUTTON / SHORT_APPLY_BUTTON / LONG_APPLY_BUTTON - Apply button HTML
- SIMPLIFY_INACTIVE_THRESHOLD_MONTHS - When to auto-mark Simplify listings inactive
- NON_SIMPLIFY_INACTIVE_THRESHOLD_MONTHS - When to auto-mark community listings inactive
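As a rough illustration of how the two threshold constants might be used, the sketch below checks whether a listing is old enough to be marked inactive. The threshold values, the date_posted and source field names, the Unix-timestamp format, and the 30-day month approximation are all hypothetical; the real values live in constants.py.

```python
from datetime import datetime, timedelta, timezone

SIMPLIFY_INACTIVE_THRESHOLD_MONTHS = 2      # hypothetical value
NON_SIMPLIFY_INACTIVE_THRESHOLD_MONTHS = 4  # hypothetical value


def is_stale(listing: dict, now: datetime) -> bool:
    """Return True if the listing is older than its source's threshold.

    Assumes "date_posted" is a Unix timestamp and approximates a month
    as 30 days to keep the arithmetic simple.
    """
    if listing.get("source") == "Simplify":
        months = SIMPLIFY_INACTIVE_THRESHOLD_MONTHS
    else:
        months = NON_SIMPLIFY_INACTIVE_THRESHOLD_MONTHS
    posted = datetime.fromtimestamp(listing["date_posted"], tz=timezone.utc)
    return now - posted > timedelta(days=30 * months)
```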
list_updater/formatter.py
Markdown and HTML formatting utilities. Key functions:
- create_md_table(listings, include_age) - Generate markdown table from listings
- get_link(listing) - Format company name with link and indicators (🔥, 🎓, etc.)
- get_locations(listing) - Format location string with indicators (🛂, 🇺🇸)
- convert_markdown_to_html(md) - Convert markdown to HTML for meta tags
- get_minimal_css() - CSS for HTML version of README
Indicators:
- 🔥 - FAANG+ company
- 🎓 - Advanced degree required (Master’s/MBA/PhD)
- 🛂 - Does not offer sponsorship
- 🇺🇸 - U.S. citizenship required
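The indicators above could be attached to table cells roughly as shown below. This is only a sketch: the function names, the sample FAANG_PLUS entries, the field names (company_name, company_url, requires_advanced_degree, locations, sponsorship), the sponsorship strings, and the placement of each emoji are assumptions, not the behavior of get_link or get_locations.

```python
FAANG_PLUS = {"google", "meta"}  # hypothetical sample, not the real list


def company_cell(listing: dict) -> str:
    """Format the company name as a markdown link with 🔥 and 🎓 markers."""
    name = listing["company_name"]
    url = listing.get("company_url")
    cell = f"[{name}]({url})" if url else name
    if name.lower() in FAANG_PLUS:
        cell = f"🔥 {cell}"
    if listing.get("requires_advanced_degree"):  # assumed boolean field
        cell += " 🎓"
    return cell


def locations_cell(listing: dict) -> str:
    """Join locations and append a sponsorship or citizenship marker."""
    cell = ", ".join(listing.get("locations", []))
    sponsorship = listing.get("sponsorship", "")  # assumed string field
    if sponsorship == "Does Not Offer Sponsorship":
        cell += " 🛂"
    elif sponsorship == "U.S. Citizenship is Required":
        cell += " 🇺🇸"
    return cell
```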
list_updater/github.py
GitHub Actions integration utilities. Functions:
- set_output(name, value) - Set GitHub Actions output variable
- fail(message) - Output error and set error_message output
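For readers unfamiliar with how step outputs work, here is a minimal sketch of what these two helpers might look like. It relies on the documented GitHub Actions mechanism of appending name=value lines to the file named by the GITHUB_OUTPUT environment variable. The local-run fallback and the exit code of 1 in fail are assumptions, and the sketch handles single-line values only (multi-line values need the delimiter syntax).

```python
import os
import sys
from typing import NoReturn


def set_output(name: str, value: str) -> None:
    """Append `name=value` to the file named by GITHUB_OUTPUT."""
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path is None:
        # Assumed fallback for local runs outside GitHub Actions.
        print(f"{name}={value}")
        return
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


def fail(message: str) -> NoReturn:
    """Report an error, expose it as the error_message output, and exit."""
    print(f"::error::{message}")  # workflow command that annotates the run
    set_output("error_message", message)
    sys.exit(1)  # assumed exit code
```

A later workflow step could then read the error_message output to post a comment on the issue.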
list_updater/listings.py
Listing data operations and utilities. Key functions:
- get_listings_from_json() - Load listings from .github/scripts/listings.json
- check_schema(listings) - Validate required fields
- sort_listings(listings) - Sort by company name and date
- filter_active(listings) - Filter to active only
- filter_summer(listings, year, earliest_date) - Filter Summer term listings
- filter_off_season(listings) - Filter Fall/Winter/Spring listings
- mark_stale_listings(listings) - Auto-mark old listings as inactive
Filtering behavior:
- Excludes blocked companies
- Excludes hidden listings (is_visible: false)
- Excludes listings with empty terms
- Handles term matching (“Summer 2026”, “Summer 2025/2026”)
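The filtering rules above can be sketched as follows. This is an illustrative approximation, not the real filter_summer: it omits the earliest_date parameter, and the sample blocked list, the field names (company_name, is_visible, terms), and the exact term-matching rule are assumptions.

```python
BLOCKED_COMPANIES = {"examplecorp"}  # hypothetical sample, lowercase names


def matches_summer(term: str, year: int) -> bool:
    """True for "Summer 2026" and for ranges such as "Summer 2025/2026"."""
    season, _, years = term.partition(" ")
    return season == "Summer" and str(year) in years.split("/")


def filter_summer_sketch(listings: list[dict], year: int) -> list[dict]:
    """Keep visible, non-blocked listings that have a matching Summer term."""
    kept = []
    for listing in listings:
        if listing["company_name"].lower() in BLOCKED_COMPANIES:
            continue  # blocked company
        if not listing.get("is_visible", True):
            continue  # hidden listing
        terms = listing.get("terms") or []
        if not terms:
            continue  # empty terms
        if any(matches_summer(term, year) for term in terms):
            kept.append(listing)
    return kept
```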
list_updater/readme_generator.py
README file generation and embedding. Key functions:
- embed_table(listings, filename, active_only, inactive_only, off_season) - Generate and embed tables into README
- check_and_insert_warning(content) - Add file size warning if approaching GitHub limit
Process:
- Reads README template
- Generates category tables
- Embeds tables between markers
- Checks file size against GitHub’s 100MB limit
- Writes updated README
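The "embeds tables between markers" and size-check steps might look like the sketch below. The marker strings, the function names, and the 90% warning ratio are hypothetical; only the 100MB figure comes from the text above.

```python
TABLE_START = "<!-- TABLE_START -->"  # hypothetical marker
TABLE_END = "<!-- TABLE_END -->"      # hypothetical marker
SIZE_LIMIT_BYTES = 100 * 1024 * 1024  # the 100MB limit mentioned above


def embed_between_markers(content: str, table: str) -> str:
    """Replace whatever sits between the two markers with `table`.

    Raises ValueError if either marker is missing, so a broken template
    fails loudly instead of silently producing a README without tables.
    """
    start = content.index(TABLE_START) + len(TABLE_START)
    end = content.index(TABLE_END, start)
    return f"{content[:start]}\n{table}\n{content[end:]}"


def is_approaching_limit(content: str, ratio: float = 0.9) -> bool:
    """True once the UTF-8 size reaches `ratio` of the limit (assumed 90%)."""
    return len(content.encode("utf-8")) >= ratio * SIZE_LIMIT_BYTES
```

Because the markers are kept in the output, running the embed step again with the same table leaves the file unchanged.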
Data flow
Contribution flow
- User submits GitHub issue (new_internship or edit_internship)
- Maintainer adds approved label
- GitHub Action triggers → contribution process $GITHUB_EVENT_PATH
- Command parses issue body and updates listings.json
- GitHub Action commits with contributor attribution
- README update workflow triggers automatically
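The parsing step in this flow can be sketched as below. GitHub issue forms render each field as a "### Label" heading followed by the value, and empty optional fields appear as "_No response_". The function names and the example field labels are hypothetical, and the real command may parse the body differently.

```python
import json
import re


def load_issue_body(event_file: str) -> str:
    """Read the issue body from the event payload at $GITHUB_EVENT_PATH."""
    with open(event_file, encoding="utf-8") as fh:
        event = json.load(fh)
    return event["issue"]["body"] or ""


def parse_issue_form(body: str) -> dict[str, str]:
    """Map each "### Label" heading in the body to the text beneath it."""
    fields: dict[str, str] = {}
    # The text before the first heading (if any) is discarded.
    for section in re.split(r"^### ", body, flags=re.MULTILINE)[1:]:
        label, _, value = section.partition("\n")
        value = value.strip()
        fields[label.strip()] = "" if value == "_No response_" else value
    return fields
```

Keeping the parser separate from the file loading makes it easy to try locally with a pasted issue body.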
README update flow
- listings.json changes (via contribution or external script)
- GitHub Action triggers → readme update
- Command loads, validates, sorts, and filters listings
- Generates three README files with category tables
- GitHub Action commits changes
Testing
While there are no automated tests currently, you can test commands locally, for example by validating the data.
Related documentation
- Contributing to the CLI - Development workflow
- Testing - Testing best practices