High Level Overview
The internship list repository uses a combination of structured data storage, GitHub Actions automation, and external microservices to maintain an up-to-date database of tech internships.
Data Flow
Data sources
Internships come from two sources:
- GitHub issue forms (new_internship and edit_internship)
- External microservice that fetches from Simplify’s database
Central storage
All internships are stored in .github/scripts/listings.json as the single source of truth.
Approval workflow
When an approved label is attached to a contribution issue, GitHub Actions automatically processes it.
Listings.json Storage
Purpose
The .github/scripts/listings.json file serves as:
- The single source of truth for all internship data
- A structured, machine-readable format
- Version-controlled history of all changes
- Input for README generation
How It’s Updated
The file is edited by:
- GitHub contribution workflow
  - User submits new_internship or edit_internship issue
  - Maintainer adds approved label
  - GitHub Action parses issue form data
  - Action updates listings.json with new entry
- External microservice
  - Runs daily on a schedule
  - Fetches new internships from Simplify’s database
  - Adds them to listings.json
  - Commits changes with “Simplify” as source
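Both update paths end in the same step: append an entry to listings.json and write the file back. The following Python sketch illustrates that step. It assumes the file holds a JSON array, and the helper name add_listing and the company_name field are hypothetical; the actual logic in the repository's scripts may differ.

```python
import json
from pathlib import Path


def add_listing(path: Path, entry: dict, source: str) -> list[dict]:
    """Append one entry to the listings file and return the updated list.

    Hypothetical helper: assumes the file contains a JSON array of objects.
    """
    listings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    # Record where the entry came from: a GitHub username or "Simplify".
    listings.append({**entry, "source": source})
    path.write_text(json.dumps(listings, indent=4) + "\n", encoding="utf-8")
    return listings
```

A call such as add_listing(Path(".github/scripts/listings.json"), {"company_name": "Acme"}, "octocat") would add one community entry attributed to octocat.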
GitHub Actions Workflow
The repository uses three main GitHub Actions workflows:
1. Contribution Approved
File: .github/workflows/contribution_approved.yml
Trigger: When the approved label is added to an issue
Process:
- Reads the issue form data
- Validates the submission
- Runs main.py contribution process to update listings.json
- Commits changes with contributor attribution
- Auto-closes the issue
- If failure occurs, comments on issue with error details
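The "reads the issue form data" step can be sketched as follows. GitHub renders an issue-form submission as a Markdown body in which each field label becomes a "### " heading followed by the answer, and blank optional fields appear as "_No response_". The function name parse_issue_form and the field labels in the usage note are assumptions for illustration, not the repository's actual parser.

```python
import re


def parse_issue_form(body: str) -> dict[str, str]:
    """Map each '### Heading' in an issue-form body to the text below it."""
    fields: dict[str, str] = {}
    # Splitting on a pattern with one capture group yields
    # [preamble, heading1, text1, heading2, text2, ...].
    parts = re.split(r"^### +(.+?)\s*$", body, flags=re.MULTILINE)
    for heading, text in zip(parts[1::2], parts[2::2]):
        value = text.strip()
        # GitHub writes "_No response_" for optional fields left blank.
        fields[heading] = "" if value == "_No response_" else value
    return fields
```

Given a body with a "Company Name" heading answered "Acme" and a blank "Location" heading, the function returns a dictionary with "Acme" for the first key and an empty string for the second.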
2. Update READMEs
File: .github/workflows/update_readmes.yml
Trigger:
- Changes to listings.json
- Manual workflow dispatch
Process:
- Reads all data from listings.json
- Categorizes internships by type
- Runs main.py readme update to regenerate README tables
- Applies special indicators (🔥, 🎓, 🛂, 🇺🇸)
- Commits and pushes updated README files
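The categorize-then-render steps can be pictured with a short Python sketch. The field names (company, title, location, url, category) and the column layout are assumptions, and the special indicators are left out because their rules are not described here; the real generator in main.py may work differently.

```python
from collections import defaultdict


def group_by_category(listings: list[dict]) -> dict[str, list[dict]]:
    """Bucket listings by their category field (hypothetical field name)."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for item in listings:
        grouped[item["category"]].append(item)
    return dict(grouped)


def render_table(rows: list[dict]) -> str:
    """Render one category as a Markdown table, one row per listing."""
    lines = [
        "| Company | Role | Location | Link |",
        "| --- | --- | --- | --- |",
    ]
    for row in rows:
        lines.append(
            f"| {row['company']} | {row['title']} | {row['location']} "
            f"| [Apply]({row['url']}) |"
        )
    return "\n".join(lines)
```

Each category's table could then be written into the matching README section. This sketch does not escape pipe characters inside field values.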
3. Lint
File: .github/workflows/lint.yml
Trigger: Changes to Python code
Process:
- Runs ruff for code style checks
- Runs mypy for type checking
- Reports any violations
Microservice Integration
External Script
A private script runs externally once per day to:
- Pull new internships
  - Connects to Simplify’s database
  - Identifies new Summer 2026 tech internships
  - Extracts relevant data (company, title, location, URL, etc.)
- Add to repository
  - Formats data according to listings.json schema
  - Adds entries with "source": "Simplify"
  - Commits changes to the repository
- Trigger README update
  - Commit triggers the “Update READMEs” workflow
  - New Simplify internships appear in README automatically
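Since the script itself is private, the following Python sketch only illustrates the "add to repository" step under stated assumptions: fetched records are plain dictionaries, each has a url field, and a record counts as new when its URL is not already in the listings. The function name merge_fetched is hypothetical.

```python
def merge_fetched(listings: list[dict], fetched: list[dict]) -> list[dict]:
    """Return listings plus any fetched records whose URL is not yet present."""
    known_urls = {item["url"] for item in listings}
    added = []
    for record in fetched:
        if record["url"] in known_urls:
            continue  # already listed, skip it
        # Tag the entry so its origin is visible in listings.json.
        added.append({**record, "source": "Simplify"})
        known_urls.add(record["url"])
    return listings + added
```

Writing the merged list back and committing it would then trigger the README update described above.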
Why External?
The script runs externally (not as a GitHub Action) because:
- It requires access to Simplify’s private database
- It needs authentication credentials that shouldn’t be in the public repo
- It runs on Simplify’s infrastructure with their security model
Contributor Attribution
The system tracks where each internship came from:
- Community contributions: source field contains the GitHub username of the contributor
- Simplify additions: source field contains "Simplify"
Commits for community contributions also carry attribution:
- Commit author: The contributor’s GitHub username
- Commit email: The contributor’s GitHub email
- Commit message: Indicates what was added/changed
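One way to attach this attribution is to pass the identity to git for a single commit. The Python sketch below only builds the command; the function name build_commit_command and the message wording are assumptions, and the workflow may set the identity differently. Passing user.name and user.email with -c sets both author and committer for that one invocation.

```python
def build_commit_command(username: str, email: str, message: str) -> list[str]:
    """Build a git commit command that attributes the commit to a contributor."""
    return [
        "git",
        "-c", f"user.name={username}",    # identity applies to this command only
        "-c", f"user.email={email}",
        "commit",
        "-m", message,
    ]
```

The resulting list can be handed to subprocess.run(command, check=True) once the changed files are staged.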
Data Integrity
Several mechanisms ensure data quality:
- Schema validation: CLI tools validate JSON schema
- Duplicate detection: System checks for duplicate URLs and IDs
- Category validation: Only valid categories are accepted
- Manual review: Maintainers review contributions before approval
- Automated testing: Lint workflow catches code issues
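The automated checks can be sketched in Python as a single pass over the listings. The required field names and the category values below are placeholders, not the repository's real schema, and validate_listings is a hypothetical name.

```python
REQUIRED_FIELDS = ("id", "url", "category")   # hypothetical schema
VALID_CATEGORIES = {"Software", "Hardware"}   # placeholder values


def validate_listings(listings: list[dict]) -> list[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    errors: list[str] = []
    seen_ids: set = set()
    seen_urls: set = set()
    for index, item in enumerate(listings):
        missing = [name for name in REQUIRED_FIELDS if name not in item]
        if missing:
            errors.append(f"entry {index}: missing {', '.join(missing)}")
            continue  # remaining checks need these fields
        if item["category"] not in VALID_CATEGORIES:
            errors.append(f"entry {index}: invalid category {item['category']!r}")
        if item["id"] in seen_ids:
            errors.append(f"entry {index}: duplicate id {item['id']!r}")
        if item["url"] in seen_urls:
            errors.append(f"entry {index}: duplicate url {item['url']!r}")
        seen_ids.add(item["id"])
        seen_urls.add(item["url"])
    return errors
```

Collecting every problem instead of stopping at the first makes it possible to report all errors in one comment on the contribution issue.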
Performance Considerations
The architecture is designed for efficiency:
- Single source of truth: listings.json prevents data inconsistency
- Automatic README generation: No manual README edits needed
- Incremental updates: Only changed data triggers workflows
- Fast parsing: JSON format is quick to read and write
- GitHub Action caching: Dependencies are cached for faster runs