The scraper fetches actor data from Apify’s marketplace and generates the JSON data files and Markdown documentation that make up the directory. You can run it locally to refresh data or test changes.
GitHub Actions runs this scraper automatically every day at midnight UTC using the update.yml workflow. You only need to run it manually if you are developing locally or contributing changes to the scraper itself.

Prerequisites

  • Python 3.11 or later
  • pip package manager

Steps

1. Clone the repository

git clone https://github.com/AgentsAPI/awesome-agent-apis.git
cd awesome-agent-apis

2. Install dependencies

The project depends on the following packages (from requirements.txt):
  • certifi==2026.2.25
  • charset-normalizer==3.4.6
  • idna==3.11
  • requests==2.33.0
  • urllib3==2.6.3
Install them with:
pip install -r requirements.txt

3. Run the scraper

python main.py
The scraper will print progress for each category and offset as it runs:
Category AGENTS - 0
Category AGENTS - 1000
...
Updated AGENTS: 1336 APIs
Category AI - 0
...

What happens when you run it

main.py iterates over all 17 categories returned by ApifyScraper.get_category_list():
['AGENTS', 'AI', 'AUTOMATION', 'DEVELOPER_TOOLS', 'ECOMMERCE',
 'INTEGRATIONS', 'JOBS', 'LEAD_GENERATION', 'MCP_SERVERS',
 'NEWS', 'OPEN_SOURCE', 'REAL_ESTATE', 'SEO_TOOLS',
 'SOCIAL_MEDIA', 'TRAVEL', 'VIDEOS', 'OTHER']
For each category, the scraper:
  1. Calls Apify’s Algolia search API in batches of 1,000 actors until all results are fetched.
  2. Deduplicates results by actor ID.
  3. Merges the new data with any existing JSON in the data/ directory. Ratings and review counts from existing records are preserved if the new fetch does not include them.
  4. Sorts the merged list using the Bayesian scoring algorithm (see below).
  5. Writes the sorted list to data/<slug>-agent-apis.json.
  6. Generates a README.md for the category directory (e.g., agents-agent-apis/README.md).
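The batched fetching in step 1 can be sketched as a simple pagination loop. This is an illustrative sketch, not the repository's actual code: fetch_page stands in for the real Algolia search call, and the fake backend below just mirrors the 1,336-item AGENTS count from the example log.

```python
# Hypothetical sketch of step 1: request pages of up to 1,000 actors
# until a short page signals that all results have been fetched.
def fetch_all(fetch_page, batch_size=1000):
    results, offset = [], 0
    while True:
        page = fetch_page(offset, batch_size)
        results.extend(page)
        if len(page) < batch_size:  # last (partial or empty) page reached
            break
        offset += batch_size
    return results

# Fake backend standing in for the Algolia API call.
data = [{"id": i} for i in range(1336)]
fake_page = lambda offset, n: data[offset:offset + n]
items = fetch_all(fake_page)  # fetches offsets 0 and 1000, then stops
```

This matches the progress output above: one log line per offset, ending when a batch comes back with fewer than 1,000 results.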
After all categories are processed, it regenerates the root README.md with updated totals and top-25 API tables for each category.
Output files are written to two locations: JSON data files in data/ and README.md files inside each category directory (e.g., agents-agent-apis/README.md).
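The merge-and-preserve behavior in step 3 can be sketched as follows. The function and field names here are illustrative, not the repository's actual implementation; the point is that records are keyed by actor ID and that rating fields survive a fetch that omits them.

```python
# Hypothetical sketch of the merge step: dedupe by actor ID and keep
# ratings from existing records when the fresh fetch lacks them.
def merge_records(existing, fetched):
    merged = {item["id"]: item for item in existing}
    for item in fetched:
        old = merged.get(item["id"], {})
        # Preserve rating fields if the new fetch omits them.
        for field in ("rating", "rating_count"):
            if field not in item and field in old:
                item[field] = old[field]
        merged[item["id"]] = item
    return list(merged.values())

existing = [{"id": "a1", "name": "Old Actor", "rating": 4.5, "rating_count": 120}]
fetched = [
    {"id": "a1", "name": "Old Actor (renamed)"},  # no rating in the new fetch
    {"id": "b2", "name": "New Actor", "rating": 5.0, "rating_count": 3},
]
result = merge_records(existing, fetched)
```

Here the updated name from the new fetch wins, but the old rating and review count carry over because the fresh record did not include them.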

Bayesian rating algorithm

APIs are ranked using a Bayesian average score, defined in scripts/helpers/utils.py:
def api_score(item, m, C):
    R = item.get('rating', 0)
    v = item.get('rating_count', 0)
    return (v / (v + m)) * R + (m / (v + m)) * C if (v + m) else 0
Where:
  • R — the API’s average rating (0–5 scale)
  • v — the number of reviews the API has received
  • m = 50 — the vote-count prior; the fewer reviews an API has relative to 50, the more strongly its score is pulled toward the category mean
  • C — the mean rating across all APIs in the category
This algorithm prevents APIs with a single 5-star review from outranking well-reviewed APIs with hundreds of ratings. An API needs a substantial number of reviews before its rating is treated as reliable.
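A worked example makes the effect concrete. The function below is the api_score shown above; the category mean C = 4.2 is an illustrative value, not taken from real data.

```python
# Bayesian score from scripts/helpers/utils.py, applied to two examples.
def api_score(item, m, C):
    R = item.get('rating', 0)
    v = item.get('rating_count', 0)
    return (v / (v + m)) * R + (m / (v + m)) * C if (v + m) else 0

m, C = 50, 4.2  # C is an assumed category mean for illustration

one_review = {'rating': 5.0, 'rating_count': 1}      # single 5-star review
many_reviews = {'rating': 4.6, 'rating_count': 400}  # well-established rating

# (1/51)*5.0  + (50/51)*4.2  ≈ 4.216 — pulled almost entirely to the mean
# (400/450)*4.6 + (50/450)*4.2 ≈ 4.556 — keeps most of its own rating
```

Despite its perfect 5.0 average, the single-review API scores below the 4.6-rated API with 400 reviews, which is exactly the ordering the directory wants.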

GitHub Actions workflow

The workflow file at .github/workflows/update.yml runs the scraper on a cron schedule:
on:
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:
The workflow_dispatch trigger also allows you to run the workflow manually from the GitHub Actions UI.
