Build data-rich applications with these APIs that handle web scraping, search, and research. Perfect for RAG applications, training datasets, and real-time search features.

Firecrawl

Turn entire websites into LLM-ready data

Serper.dev

Lightning-fast Google Search API

Parallel.ai

Advanced web research for AI agents

Firecrawl

Firecrawl transforms websites into clean, structured data ready for LLMs, RAG applications, or training datasets. It handles JavaScript rendering, pagination, and data extraction automatically.

Key features

Extract single pages in multiple formats:
  • Markdown - Clean, structured text for LLMs
  • HTML - Full page source with styling
  • Structured data - Extracted JSON objects
  • Screenshots - Visual captures of pages

Pricing and free tier

Firecrawl offers a generous free tier for testing. Check their website for current pricing details.

Quick start

Install the Python SDK:
pip install firecrawl-py

Code examples

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

# Scrape a website
doc = firecrawl.scrape("https://example.com", formats=["markdown", "html"])
print(doc.markdown)

Use cases

RAG applications

Create vector databases from documentation sites, knowledge bases, or technical resources.

Training datasets

Build clean, structured datasets for fine-tuning models or training classifiers.

Competitive intelligence

Monitor competitor websites, pricing changes, or product updates automatically.

Content aggregation

Collect and organize content from multiple sources into unified datasets.
Essential for RAG applications - Firecrawl’s markdown output is well suited to creating vector embeddings; cleaner input data means better retrieval accuracy.
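Before embedding scraped markdown, it usually needs to be split into chunks that fit an embedding model's context window. A minimal sketch of that preprocessing step (the chunk sizes are illustrative, not Firecrawl recommendations):

```python
# A minimal sketch of preparing scraped markdown for embedding:
# split the document into overlapping character chunks so each
# fits an embedding model's input limit. Sizes are illustrative.

def chunk_markdown(text, chunk_size=500, overlap=50):
    """Split text into overlapping character chunks for embedding."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Example: a scraped page becomes a list of embedding-ready chunks
doc = "# Docs\n" + "Some documentation text. " * 100
chunks = chunk_markdown(doc)
print(len(chunks), len(chunks[0]))
```

In practice you might chunk on markdown headings or sentence boundaries instead of raw character counts, but the overlap idea carries over: it keeps context that straddles a chunk boundary retrievable.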

Serper.dev

Serper.dev provides lightning-fast Google Search results through a simple API. With 1-2 second response times and generous free credits, it’s perfect for adding real-time search to your hackathon app.

Key features

  • Blazing fast - 1-2 second response times (industry-leading)
  • Generous free tier - 2,500+ free credits for new signups
  • Cost-effective - $0.30-$1.00 per 1,000 queries (10x cheaper than alternatives)
  • Rich results - Organic results, images, videos, knowledge graphs, places
  • Structured JSON - Easy to parse and integrate
  • No rate limits on paid plans - the free tier has reasonable limits

Pricing

Free tier

2,500+ credits for new signups. Perfect for hackathons.

Standard pricing

$0.30-$1.00 per 1,000 queries based on volume.

Enterprise

Custom pricing for high-volume needs.

Response structure

Serper returns structured JSON with:
  • Organic results - Title, snippet, URL, position
  • Knowledge graph - Entity information from Google
  • People also ask - Related questions
  • Images - Image search results
  • Videos - Video results from YouTube and others
  • Places - Local business results (for location queries)
  • Related searches - Suggested follow-up queries
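Not every query returns every block above, so it pays to parse the response defensively. A sketch using a sample payload (the values are illustrative, not a real API response; the key names follow Serper's documented JSON fields):

```python
# Defensive parsing of a Serper-style response: use .get() with
# defaults, since blocks like peopleAlsoAsk or knowledgeGraph only
# appear for some queries. The sample payload is illustrative.

sample = {
    "organic": [
        {"title": "Hackathon Guide", "link": "https://example.com",
         "snippet": "Tips...", "position": 1},
    ],
    "peopleAlsoAsk": [{"question": "How long is a hackathon?"}],
    "relatedSearches": [{"query": "hackathon ideas"}],
}

def summarize(results):
    """Extract the pieces most apps need, tolerating missing blocks."""
    return {
        "top_links": [r["link"] for r in results.get("organic", [])],
        "questions": [q["question"] for q in results.get("peopleAlsoAsk", [])],
        "related": [r["query"] for r in results.get("relatedSearches", [])],
        "has_knowledge_graph": "knowledgeGraph" in results,
    }

print(summarize(sample))
```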

Quick start

curl -X POST https://google.serper.dev/search \
  -H 'X-API-KEY: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"q":"hackathon tips"}'

Code examples

import requests
import json

url = "https://google.serper.dev/search"
payload = json.dumps({"q": "latest AI news"})
headers = {
    'X-API-KEY': 'YOUR_API_KEY',
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)
results = response.json()

# Print top 3 results
for result in results['organic'][:3]:
    print(f"Title: {result['title']}")
    print(f"URL: {result['link']}")
    print(f"Snippet: {result['snippet']}")
    print()

Use cases

  • Add Google search to chatbots, research tools, or content aggregators without building your own crawler.
  • Track keyword rankings, monitor search results, or analyze SERP features for competitive intelligence.
  • Give LLM agents the ability to search the web for current information beyond their training data.
  • Find relevant images, videos, or articles programmatically for content curation apps.
Generous free tier - 2,500 credits is enough for the entire hackathon. You won’t hit limits mid-demo.

Parallel.ai

Parallel.ai provides advanced web research and search APIs specifically designed for AI agents. With 48% multi-hop accuracy compared to GPT-4’s 14%, it excels at deep research tasks.

Key features

  • Deep Research Mode - Multi-hop reasoning with 48% accuracy (vs GPT-4’s 14%)
  • Multiple agent modes - Fast, hyper-fast, and comprehensive research options
  • Scraping & extraction - Get structured data from any page
  • SOC 2 Type II certified - Enterprise-grade security and compliance
  • Structured JSON outputs - Easy integration with your applications
  • Citations included - All answers include source URLs for verification

Research modes

Quick research

Fast mode for straightforward queries:
  • Response time: 5-10 seconds
  • Single-hop queries
  • Best for factual lookups
  • Lower cost per query

Multi-hop research

Unlike standard search APIs, Parallel.ai can answer questions that require multiple reasoning steps:
  1. Initial query - User asks: “What programming language was used to build the first version of Twitter?”
  2. Research step 1 - Agent searches: “Twitter first version programming language”
  3. Research step 2 - Finds Ruby on Rails, then searches: “Ruby on Rails programming language”
  4. Final answer - Returns: “Ruby - Twitter was originally built using Ruby on Rails framework”
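The multi-hop flow above is essentially a loop where each hop's result decides the next query. A sketch of that loop, where `search` is a canned stub standing in for a real search API and `next_query` plays the role an LLM would in a real agent (both are illustrative assumptions, not Parallel.ai's API):

```python
# A sketch of multi-hop research: each hop's result seeds the next
# query until the policy decides it has enough evidence. `search`
# and `next_query` are stubs; a real agent would call a search API
# and use an LLM to pick the next query.

def search(query):
    """Stub search, standing in for a real search API call."""
    canned = {
        "Twitter first version programming language": "Built with Ruby on Rails",
        "Ruby on Rails programming language": "Rails is a framework written in Ruby",
    }
    return canned.get(query, "")

def next_query(question, last_result):
    """Toy query policy; in a real agent an LLM would decide this."""
    if last_result is None:
        return "Twitter first version programming language"
    if "Ruby on Rails" in last_result:
        return "Ruby on Rails programming language"
    return None  # enough evidence gathered

def multi_hop(question, max_hops=3):
    """Chain searches until the query policy stops or max_hops is hit."""
    evidence, query = [], next_query(question, None)
    while query and len(evidence) < max_hops:
        result = search(query)
        evidence.append((query, result))
        query = next_query(question, result)
    return evidence

hops = multi_hop("What language was Twitter first built in?")
print(hops[-1][1])
```

The `max_hops` cap matters: without it, a query policy that never returns `None` would loop (and spend credits) forever.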

When to use what

Use Firecrawl for...

  • Creating RAG datasets
  • Scraping documentation
  • Building training data
  • Extracting structured info

Use Serper for...

  • Real-time search features
  • Simple web queries
  • Image/video search
  • Cost-effective high volume

Use Parallel.ai for...

  • Deep research tasks
  • Multi-hop reasoning
  • Complex questions
  • AI agent capabilities

Best practices

For scraping (Firecrawl)

Respect robots.txt and terms of service - Always check if a website allows scraping. Firecrawl respects these rules automatically.
  1. Start small - Test on a few pages before crawling thousands
  2. Use the Map feature - Plan your crawl strategy by seeing all URLs first
  3. Choose the right format - Markdown for LLMs, HTML for full fidelity, structured data for specific extraction
  4. Cache results - Store scraped data locally to avoid re-scraping during development
  5. Handle errors - Some pages may fail; implement retry logic with exponential backoff
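The retry advice in step 5 can be a small generic wrapper. A sketch of retry with exponential backoff (the delay values are illustrative):

```python
# A generic retry-with-backoff sketch for flaky scrape calls:
# retry on exception, doubling the wait each attempt.
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Usage: wrap a scrape call that may fail transiently, e.g.
# doc = with_retries(lambda: firecrawl.scrape(url, formats=["markdown"]))
```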

For search (Serper/Parallel.ai)

  1. Cache common queries - Don’t waste credits on repeated searches
  2. Monitor usage - Track API calls to stay within free tier during hackathon
  3. Parse structured data - Both APIs return JSON; extract exactly what you need
  4. Add citations - Always credit sources when displaying search results
  5. Implement fallbacks - If one API fails, have a backup search method
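The caching advice in step 1 can be as simple as a dictionary keyed by query string. A minimal in-memory sketch (a real app might persist to disk or add a TTL):

```python
# A minimal in-memory cache for search queries so repeated searches
# during development don't burn credits.

_cache = {}

def cached_search(query, search_fn):
    """Return a cached result if this query has been seen before."""
    if query not in _cache:
        _cache[query] = search_fn(query)
    return _cache[query]

# Usage: pass the real API call as search_fn, e.g.
# results = cached_search("latest AI news", lambda q: search_web(q))
```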

Cost optimization

Free tier strategy - Use Serper’s 2,500 free credits for demos and testing. Switch to Parallel.ai only for complex research tasks that justify the higher cost.
  • Use free tiers exclusively
  • Cache all results locally
  • Mock API responses for UI development
  • Only make real API calls when testing functionality
  • Pre-load common queries
  • Have cached responses ready
  • Monitor rate limits closely
  • Implement graceful fallbacks
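Mocking API responses for UI development, as suggested above, can be a single toggle. A sketch where the `MOCK` flag and the canned payload are illustrative assumptions:

```python
# A sketch of a mock toggle: while building the UI, return canned
# payloads instead of spending credits. MOCK and the sample payload
# are illustrative assumptions, not part of any API.

MOCK = True

MOCK_RESULTS = {
    "organic": [
        {"title": "Mock result", "link": "https://example.com", "snippet": "..."},
    ]
}

def search_web(query):
    """Return mock data in dev; only hit the real API when MOCK is off."""
    if MOCK:
        return MOCK_RESULTS
    # the real requests.post call to the search API would go here
    raise NotImplementedError("set MOCK = True during development")

print(search_web("anything")["organic"][0]["title"])
```

Because the mock returns the same JSON shape as the real API, the rest of the app doesn't need to know which mode it's in.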

Example: Building a research assistant

Combine all three APIs for a powerful research tool:
from firecrawl import Firecrawl
import requests
import json

class ResearchAssistant:
    def __init__(self, firecrawl_key, serper_key):
        self.firecrawl = Firecrawl(api_key=firecrawl_key)
        self.serper_key = serper_key
    
    def search_web(self, query):
        """Search with Serper for fast results"""
        url = "https://google.serper.dev/search"
        payload = json.dumps({"q": query, "num": 5})
        headers = {
            'X-API-KEY': self.serper_key,
            'Content-Type': 'application/json'
        }
        response = requests.post(url, headers=headers, data=payload)
        return response.json()['organic']
    
    def deep_scrape(self, url):
        """Get full content from a URL"""
        doc = self.firecrawl.scrape(url, formats=["markdown"])
        return doc.markdown
    
    def research_topic(self, topic):
        """Complete research workflow"""
        # 1. Search for relevant pages
        print(f"Searching for: {topic}")
        results = self.search_web(topic)
        
        # 2. Scrape top 3 results
        research_data = []
        for result in results[:3]:
            print(f"Scraping: {result['link']}")
            content = self.deep_scrape(result['link'])
            research_data.append({
                'title': result['title'],
                'url': result['link'],
                'content': content
            })
        
        return research_data

# Usage
assistant = ResearchAssistant(
    firecrawl_key="fc-YOUR_KEY",
    serper_key="YOUR_SERPER_KEY"
)

data = assistant.research_topic("machine learning best practices")
This example demonstrates how to combine search (Serper) with scraping (Firecrawl) for comprehensive research automation.
