Batch Research

Overview

The batch research endpoint performs complete company analysis and returns results after all processing is finished. This is ideal for:
  • Background processing and scheduled jobs
  • Batch analysis of multiple companies
  • Integration with ETL pipelines
  • Simple request/response workflows
For real-time progress updates, use the Streaming Research endpoint instead.

Endpoint

POST /research/batch

Request Body

research_goal
string
required
The high-level research objective. Be specific about what you’re looking for.
Examples:
  • “Find fintech companies using AI for fraud detection”
  • “Identify healthcare startups building HIPAA-compliant patient portals”
  • “Discover e-commerce companies using microservices architecture”
company_domains
array
required
List of company domains to analyze. Each domain should be a valid hostname without protocol.
Example: ["stripe.com", "paypal.com", "square.com"]
Maximum 100 domains per request. For larger batches, split into multiple requests.
search_depth
enum
required
Controls the number of search strategies and breadth of research.
Options:
  • quick - 3-5 strategies, 10-20 seconds per company
  • standard - 8-12 strategies, 30-60 seconds per company
  • comprehensive - 15-20+ strategies, 90-180 seconds per company
Recommendation: Use standard for most use cases.
max_parallel_searches
integer
required
Maximum number of concurrent searches across all sources. Controls API throughput and rate limiting.
Range: 5-50
Recommendations:
  • 5-10 - Conservative, prevents rate limits
  • 20 - Balanced performance (recommended)
  • 40-50 - Aggressive, requires high API quotas
confidence_threshold
float
required
Minimum confidence score (0.0-1.0) to include a company in results.
Range: 0.0-1.0
Recommendations:
  • 0.5-0.6 - Inclusive, more potential matches
  • 0.7-0.8 - Balanced, good signal-to-noise
  • 0.9+ - Strict, only high-confidence matches
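The documented ranges can be checked client-side before sending, which avoids a round trip for an obviously invalid request. A minimal sketch in Python, assuming only the field names and limits listed above (`validate_payload` is an illustrative helper, not part of the API):

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of validation problems; an empty list means the payload looks sendable."""
    errors = []
    if not payload.get("research_goal"):
        errors.append("research_goal is required")
    domains = payload.get("company_domains", [])
    if not domains:
        errors.append("company_domains must be non-empty")
    if len(domains) > 100:
        errors.append("maximum 100 domains per request")
    if any(d.startswith(("http://", "https://")) for d in domains):
        errors.append("domains must be hostnames without protocol")
    if payload.get("search_depth") not in ("quick", "standard", "comprehensive"):
        errors.append("search_depth must be quick, standard, or comprehensive")
    if not 5 <= payload.get("max_parallel_searches", 0) <= 50:
        errors.append("max_parallel_searches must be between 5 and 50")
    if not 0.0 <= payload.get("confidence_threshold", -1) <= 1.0:
        errors.append("confidence_threshold must be between 0.0 and 1.0")
    return errors
```

Running this before each request surfaces the same problems the server would report as a 422, but with no network cost.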

Response Fields

research_id
string
Unique identifier for this research run (UUID v4).
total_companies
integer
Number of companies analyzed in this request.
search_strategies_generated
integer
Number of search strategies generated by the LLM for this research goal.
total_searches_executed
integer
Total number of search queries executed across all sources and companies.
processing_time_ms
integer
Total processing time in milliseconds from request to response.
results
array
Array of company research results.
search_performance
object
Performance metrics for the research run.
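For typed downstream code, the response shape above can be expressed as `TypedDict` definitions. This is an illustrative sketch of the documented fields, not a shipped client model:

```python
from typing import TypedDict


class Evidence(TypedDict):
    url: str
    title: str
    snippet: str
    source_name: str


class Findings(TypedDict):
    technologies: list[str]
    evidence: list[Evidence]
    signals_found: int


class CompanyResult(TypedDict):
    domain: str
    confidence_score: float
    evidence_sources: int
    findings: Findings


class SearchPerformance(TypedDict):
    queries_per_second: float
    failed_requests: int


class BatchResearchResponse(TypedDict):
    research_id: str
    total_companies: int
    search_strategies_generated: int
    total_searches_executed: int
    processing_time_ms: int
    results: list[CompanyResult]
    search_performance: SearchPerformance
```

Because `TypedDict` instances are plain dicts at runtime, the parsed JSON response can be annotated with these types directly for editor and type-checker support.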

Example Request

curl -X POST http://localhost:8000/research/batch \
  -H "Content-Type: application/json" \
  -d '{
    "research_goal": "Find fintech companies using AI for fraud detection",
    "company_domains": ["stripe.com", "paypal.com"],
    "search_depth": "standard",
    "max_parallel_searches": 20,
    "confidence_threshold": 0.7
  }'

Example Response

{
  "research_id": "a7f3c8e9-4b2d-4a1e-8c5f-9d7e6f8a3b2c",
  "total_companies": 2,
  "search_strategies_generated": 12,
  "total_searches_executed": 24,
  "processing_time_ms": 34200,
  "results": [
    {
      "domain": "stripe.com",
      "confidence_score": 0.92,
      "evidence_sources": 3,
      "findings": {
        "technologies": [
          "tensorflow",
          "python",
          "kubernetes",
          "radar",
          "machine-learning"
        ],
        "evidence": [
          {
            "url": "https://stripe.com/blog/radar-2.0",
            "title": "Introducing Radar 2.0: Advanced fraud detection with machine learning",
            "snippet": "Stripe Radar uses adaptive machine learning algorithms to detect and prevent fraud in real-time across millions of transactions...",
            "source_name": "google_search"
          },
          {
            "url": "https://stripe.com/jobs/listing/machine-learning-engineer-fraud/5678",
            "title": "Machine Learning Engineer - Fraud Detection",
            "snippet": "Build and deploy ML models for real-time fraud detection using TensorFlow and Python. Work on Stripe Radar...",
            "source_name": "jobs_search"
          },
          {
            "url": "https://newsapi.org/stripe-announces-100m-investment",
            "title": "Stripe Announces $100M Investment in AI Fraud Prevention",
            "snippet": "Payment processor Stripe today announced a major investment in artificial intelligence capabilities for fraud detection...",
            "source_name": "news_search"
          }
        ],
        "signals_found": 8
      }
    },
    {
      "domain": "paypal.com",
      "confidence_score": 0.85,
      "evidence_sources": 2,
      "findings": {
        "technologies": [
          "deep-learning",
          "java",
          "scala",
          "risk-management"
        ],
        "evidence": [
          {
            "url": "https://www.paypal.com/us/webapps/mpp/security/fraud-protection",
            "title": "PayPal Fraud Protection - Advanced Security",
            "snippet": "Our advanced AI and machine learning systems monitor transactions 24/7 to detect and prevent fraudulent activity...",
            "source_name": "google_search"
          },
          {
            "url": "https://newsapi.org/paypal-fraud-detection-ai",
            "title": "PayPal Enhances Fraud Detection with Deep Learning",
            "snippet": "PayPal has deployed new deep learning models that reduce false positives by 30% while catching more fraud...",
            "source_name": "news_search"
          }
        ],
        "signals_found": 6
      }
    }
  ],
  "search_performance": {
    "queries_per_second": 18.5,
    "failed_requests": 2
  }
}

Error Responses

422 Unprocessable Entity
error
Invalid request parameters or validation errors.
{
  "detail": [
    {
      "loc": ["body", "confidence_threshold"],
      "msg": "ensure this value is less than or equal to 1.0",
      "type": "value_error.number.not_le"
    }
  ]
}
500 Internal Server Error
error
Server-side processing error.
{
  "detail": "Failed to generate search strategies: API key invalid"
}
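A client can turn these two error shapes into readable exceptions. A stdlib-only sketch, assuming the local base URL from the example request and the 422/500 bodies shown above (`run_batch_research` and `format_validation_errors` are hypothetical helpers):

```python
import json
import urllib.error
import urllib.request


def format_validation_errors(detail: list[dict]) -> str:
    """Turn a 422 detail list of {loc, msg, type} objects into a readable string."""
    return "; ".join(f"{'.'.join(map(str, d['loc']))}: {d['msg']}" for d in detail)


def run_batch_research(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST to /research/batch, raising descriptive errors for 422 and 500 responses."""
    req = urllib.request.Request(
        f"{base_url}/research/batch",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=300) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        body = json.loads(err.read() or b"{}")
        if err.code == 422:
            raise ValueError(
                f"Invalid request: {format_validation_errors(body['detail'])}"
            ) from err
        raise RuntimeError(f"Server error ({err.code}): {body.get('detail', '')}") from err
```

The long timeout reflects that batch runs can take minutes for comprehensive searches over many domains.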

Performance Optimization

Tuning Parallelism

Adjust max_parallel_searches based on your API quotas:
{
  "max_parallel_searches": 10,
  "search_depth": "quick"
}
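One way to choose a value programmatically is to derive it from a known per-source rate limit and clamp to the documented 5-50 range. A rough sketch; the average search duration is an assumption, not a service guarantee:

```python
def pick_parallelism(quota_qps: float, avg_search_seconds: float = 2.0) -> int:
    """Choose max_parallel_searches so steady-state throughput stays under quota_qps.

    If each search averages avg_search_seconds, N parallel workers issue roughly
    N / avg_search_seconds queries per second, so N = quota_qps * avg_search_seconds.
    The result is clamped to the API's documented 5-50 range.
    """
    target = int(quota_qps * avg_search_seconds)
    return max(5, min(50, target))
```

For example, a 10 queries/second quota with 2-second searches supports about 20 parallel workers, which matches the "balanced" recommendation above.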

Filtering Results

Use confidence_threshold to reduce noise:
// High precision - only strong matches
const strictSearch = {
  confidence_threshold: 0.85,
  // ...
};

// High recall - more potential matches
const broadSearch = {
  confidence_threshold: 0.6,
  // ...
};

Batch Processing

For large company lists, process in batches:
import asyncio
import aiohttp

BATCH_SIZE = 50

async def research_batch(domains, research_goal):
    async with aiohttp.ClientSession() as session:
        for i in range(0, len(domains), BATCH_SIZE):
            batch = domains[i:i + BATCH_SIZE]
            payload = {
                'research_goal': research_goal,
                'company_domains': batch,
                'search_depth': 'standard',
                'max_parallel_searches': 20,
                'confidence_threshold': 0.7,
            }
            async with session.post(
                'http://localhost:8000/research/batch',
                json=payload
            ) as resp:
                data = await resp.json()
                yield data

async def main():
    # Process 500 companies in batches of 50
    domains = [f"company{i}.com" for i in range(500)]
    async for result in research_batch(domains, "Find AI startups"):
        print(f"Batch complete: {len(result['results'])} companies")

asyncio.run(main())

Integration Examples

Filtering High-Confidence Matches

def filter_high_confidence(response, min_confidence=0.8):
    """Filter results by confidence score."""
    return [
        result for result in response['results']
        if result['confidence_score'] >= min_confidence
    ]

data = response.json()  # response from requests.post(..., json=payload)
high_confidence = filter_high_confidence(data, min_confidence=0.85)
print(f"Found {len(high_confidence)} high-confidence matches")

Extracting Technologies

function extractAllTechnologies(response) {
  const allTechs = new Set();
  
  response.results.forEach(result => {
    result.findings.technologies.forEach(tech => {
      allTechs.add(tech);
    });
  });
  
  return Array.from(allTechs);
}

const technologies = extractAllTechnologies(data);
console.log('Technologies found:', technologies);
// Output: ['tensorflow', 'python', 'kubernetes', 'java', ...]

CSV Export

import csv

def export_to_csv(response, filename='research_results.csv'):
    """Export research results to CSV."""
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([
            'Domain',
            'Confidence',
            'Evidence Sources',
            'Technologies',
            'Signals Found'
        ])
        
        for result in response['results']:
            writer.writerow([
                result['domain'],
                result['confidence_score'],
                result['evidence_sources'],
                ', '.join(result['findings']['technologies']),
                result['findings']['signals_found']
            ])

export_to_csv(data)

Next Steps

Streaming Research

Learn how to implement real-time progress tracking with Server-Sent Events
