Open Source APIs give AI agents access to tools built on well-known open-source projects and community-maintained scrapers. They are particularly useful for OSINT workflows, GitHub-based research, and building on battle-tested scraping infrastructure.
This category contains 825 APIs, updated daily from Apify’s marketplace.

Top APIs in this category

APIs are ranked by a Bayesian quality score, which balances average rating against review volume to surface consistently high-quality tools.
API | Rating (reviews) | Description
Web Scraper | ⭐ 4.81 (72) | Crawl arbitrary websites using a web browser and extract structured data using a provided JavaScript function. Supports recursive crawling and manages concurrency automatically.
Sherlock | ⭐ 4.70 (76) | Hunt down social media accounts by username across social networks. Based on the open-source Sherlock project on GitHub.
LinkedIn Profile Scraper + Email | ⭐ 4.67 (29) | Extract detailed LinkedIn profiles in bulk including work experience, education history, and skills. No cookies required.
Cheerio Scraper | ⭐ 4.99 (20) | Crawl websites using raw HTTP requests and parse HTML with the Cheerio library. High-performance alternative to browser-based scrapers.
LinkedIn Profile Posts Scraper | ⭐ 4.89 (19) | Extract posts from LinkedIn profiles including content, media, engagement, reactions, and comments. No cookies required.
Puppeteer Scraper | ⭐ 4.97 (18) | Crawl websites with headless Chrome and Puppeteer. Gives fine control over the crawl process and supports login.
Website Screenshot Generator | ⭐ 4.39 (14) | Create screenshots of websites from a specified URL. Useful for monitoring web changes on a schedule.
Bing Ads Scraper | ⭐ 5.00 (9) | Find and scrape current and past ads on Bing. Get ad copy, shown dates and locations, impressions, and Microsoft Ad Library data.
LinkedIn Post Comments Scraper | ⭐ 4.17 (11) | Extract LinkedIn post comments and replies in bulk, including social activities such as likes and reactions. No cookies required.
Tester MCP Client | ⭐ 4.99 (8) | An MCP client that connects to any MCP server using Streamable HTTP and displays conversation in a chat-like UI.
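
The exact formula behind the quality score is not given here. As a rough illustration only, one common way to balance rating against review volume is a Bayesian average, which pulls each API's rating toward a prior mean until enough reviews accumulate. The function name, prior_mean, and prior_weight below are assumptions made for this sketch, not values used by the marketplace:

```python
def bayesian_score(rating: float, reviews: int,
                   prior_mean: float = 3.0, prior_weight: float = 50.0) -> float:
    """Bayesian average: a weighted blend of the observed rating and a prior.

    prior_mean and prior_weight are hypothetical defaults chosen for illustration.
    prior_weight acts like a number of "phantom" reviews at the prior mean.
    """
    return (prior_weight * prior_mean + reviews * rating) / (prior_weight + reviews)


# With these assumed priors, many good reviews outweigh a few perfect ones.
many_reviews = bayesian_score(4.81, 72)
few_reviews = bayesian_score(5.00, 9)
print(many_reviews > few_reviews)
```

Under these assumed priors, an API with a 4.81 rating from 72 reviews scores higher than one with a 5.00 rating from 9 reviews, which matches the intuition of rewarding consistency over a small sample.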

Use with your agent

The following example uses the Cheerio Scraper to crawl a website and extract structured data from pages that do not need JavaScript rendering:
import requests

response = requests.post(
    "https://api.apify.com/v2/acts/apify~cheerio-scraper/run-sync-get-dataset-items",
    headers={"Authorization": "Bearer YOUR_APIFY_TOKEN"},
    json={
        "startUrls": [{"url": "https://github.com/trending"}],
        "pageFunction": """
            async function pageFunction(context) {
                const { $, request } = context;
                const repos = [];
                $('article.Box-row').each((i, el) => {
                    repos.push({
                        name: $(el).find('h2 a').text().trim(),
                        description: $(el).find('p').text().trim(),
                        stars: $(el).find('[aria-label*=\"stars\"]').text().trim()
                    });
                });
                return repos;
            }
        """,
        "maxRequestsPerCrawl": 5
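    },
    timeout=300,  # the synchronous endpoint waits for the run to finish, so allow a generous timeout
)
response.raise_for_status()  # surface HTTP errors (for example, an invalid token) instead of parsing an error body
data = response.json()  # dataset items produced by pageFunction
print(data)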
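
Scraped text usually needs some cleanup before an agent can use it. The sketch below assumes each returned item has the name, description, and stars fields produced by the pageFunction above, and that star counts arrive as strings such as "12,345". The helper names parse_star_count and normalize_repo are hypothetical, introduced only for this example:

```python
import re
from typing import Optional


def parse_star_count(text: str) -> Optional[int]:
    """Convert a scraped star string such as '12,345' to an int, or None if it has no digits."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group(0).replace(",", ""))


def normalize_repo(item: dict) -> dict:
    """Tidy one scraped item; missing fields are treated as empty strings."""
    return {
        # Repository names may be split across lines, e.g. 'owner /\n  repo'.
        "name": re.sub(r"\s+", "", item.get("name", "")),
        # Collapse runs of whitespace in the description to single spaces.
        "description": " ".join(item.get("description", "").split()),
        "stars": parse_star_count(item.get("stars", "")),
    }


# Hand-written sample item, not real scraper output.
sample = {"name": "owner /\n   repo", "description": " A   tool ", "stars": "1,024"}
print(normalize_repo(sample))
```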

What you can build

  • Open-source intelligence (OSINT) — Use Sherlock to find social media accounts across platforms by username for research and investigation workflows.
  • Project discovery agents — Monitor GitHub trending repositories and extract project metadata to stay on top of emerging open-source tools.
  • Developer research pipelines — Scrape LinkedIn profiles of open-source contributors to understand the ecosystem around a project.
  • Web archiving and monitoring — Use Puppeteer or Cheerio scrapers to periodically capture web content and detect changes over time.
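
For the monitoring use case, change detection can be as simple as comparing a fingerprint of each capture with the previous one. This is a minimal sketch under the assumption that captured page text is available as a string. The function names are hypothetical, and persisting the fingerprint between runs is left to the caller:

```python
import hashlib
from typing import Optional, Tuple


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so formatting-only differences are ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_change(previous_fingerprint: Optional[str], current_text: str) -> Tuple[bool, str]:
    """Return (changed, new_fingerprint). The first capture is never reported as a change."""
    current = content_fingerprint(current_text)
    changed = previous_fingerprint is not None and previous_fingerprint != current
    return changed, current


# Hand-written sample captures standing in for scraper output.
_, fingerprint = detect_change(None, "Hello  world")
changed, fingerprint = detect_change(fingerprint, "Hello there")
print(changed)
```

Hashing the whole page is coarse: any change, including rotating ads or timestamps, counts as a difference. Fingerprinting only the extracted fields of interest is usually more robust.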

Developer Tools APIs

Infrastructure tools for building data pipelines.

AI APIs

LLM integrations for processing extracted data.
