GitHub Actions runs this scraper automatically every day at midnight UTC using the
`update.yml` workflow. You only need to run it manually if you are developing locally or contributing changes to the scraper itself.

## Prerequisites
- Python 3.11 or later
- `pip` package manager
## Steps

### Install dependencies

Run `pip install -r requirements.txt`.
The project depends on the following packages (from `requirements.txt`):

```text
certifi==2026.2.25
charset-normalizer==3.4.6
idna==3.11
requests==2.33.0
urllib3==2.6.3
```
## What happens when you run it

`main.py` iterates over all 17 categories returned by `ApifyScraper.get_category_list()`:
- Calls Apify’s Algolia search API in batches of 1,000 actors until all results are fetched.
- Deduplicates results by actor ID.
- Merges the new data with any existing JSON in the `data/` directory. Ratings and review counts from existing records are preserved if the new fetch does not include them.
- Sorts the merged list using the Bayesian scoring algorithm (see below).
- Writes the sorted list to `data/<slug>-agent-apis.json`.
- Generates a `README.md` for the category directory (e.g., `agents-agent-apis/README.md`).
- Updates the top-level `README.md` with updated totals and top-25 API tables for each category.
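The deduplicate-and-merge steps above can be sketched as follows. This is a minimal illustration, not the repository's actual code; the function name `merge_actors` and the `rating`/`review_count` field names are assumptions:

```python
def merge_actors(existing: list[dict], fetched: list[dict]) -> list[dict]:
    """Deduplicate fetched records by actor ID and merge with existing data,
    keeping old ratings/review counts when the new fetch lacks them."""
    old_by_id = {a["id"]: a for a in existing}
    merged: dict[str, dict] = {}
    for actor in fetched:
        if actor["id"] in merged:  # dedupe by actor ID: first occurrence wins
            continue
        prev = old_by_id.get(actor["id"], {})
        # Preserve rating fields from the existing record if the new fetch
        # did not include them.
        for field in ("rating", "review_count"):
            if actor.get(field) is None and field in prev:
                actor[field] = prev[field]
        merged[actor["id"]] = actor
    return list(merged.values())


old = [{"id": "a1", "name": "Old name", "rating": 4.2, "review_count": 80}]
new = [
    {"id": "a1", "name": "New name", "rating": None, "review_count": None},
    {"id": "a1", "name": "Duplicate"},
    {"id": "a2", "name": "Fresh", "rating": 4.9, "review_count": 12},
]
print(merge_actors(old, new))  # two records; a1 keeps its old rating
```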
Output files are written to two locations: JSON data files in `data/` and `README.md` files inside each category directory (e.g., `agents-agent-apis/README.md`).

## Bayesian rating algorithm
APIs are ranked using a Bayesian average score, defined in `scripts/helpers/utils.py`:

- `R`: the API's average rating (0–5 scale)
- `v`: the number of reviews the API has received
- `m = 50`: the minimum votes threshold; APIs with fewer than 50 reviews are pulled toward the mean
- `C`: the mean rating across all APIs in the category
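Given these inputs, the score is presumably the standard weighted (Bayesian) average, `(v / (v + m)) * R + (m / (v + m)) * C`. A minimal sketch under that assumption (the function name `bayesian_score` is illustrative, not necessarily what `scripts/helpers/utils.py` uses):

```python
def bayesian_score(rating: float, votes: int, m: int = 50, prior_mean: float = 0.0) -> float:
    """Weighted average of the item's own rating and the category mean.

    With few votes the score is dominated by the category mean (prior_mean);
    as votes grow well past m, the item's own rating takes over.
    """
    return (votes / (votes + m)) * rating + (m / (votes + m)) * prior_mean


# An API rated 4.8 from only 10 reviews ranks below one rated 4.4 from
# 500 reviews when the category mean is 3.9.
print(bayesian_score(4.8, 10, prior_mean=3.9))   # pulled toward 3.9
print(bayesian_score(4.4, 500, prior_mean=3.9))  # stays close to 4.4
```

This is why a niche API with a handful of five-star reviews does not outrank an established one with hundreds of slightly lower ratings.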
## GitHub Actions workflow

The workflow file at `.github/workflows/update.yml` runs the scraper on a cron schedule:
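The trigger section presumably looks something like the following; this is a sketch based on the daily midnight-UTC schedule described above, not the file's verbatim contents:

```yaml
on:
  schedule:
    - cron: "0 0 * * *"  # every day at 00:00 UTC
  workflow_dispatch:     # allow manual runs from the Actions UI
```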
A `workflow_dispatch` trigger also allows you to run the workflow manually from the GitHub Actions UI.