Batch Research

Overview

The batch research endpoint performs complete company analysis and returns results after all processing is finished. This is ideal for:
  • Background processing and scheduled jobs
  • Batch analysis of multiple companies
  • Integration with ETL pipelines
  • Simple request/response workflows
For real-time progress updates, use the Streaming Research endpoint instead.

Endpoint

POST /research/batch

Request Body

research_goal
string
required
The high-level research objective. Be specific about what you’re looking for.
Examples:
  • “Find fintech companies using AI for fraud detection”
  • “Identify healthcare startups building HIPAA-compliant patient portals”
  • “Discover e-commerce companies using microservices architecture”
company_domains
array
required
List of company domains to analyze. Each domain should be a valid hostname without protocol.
Example: ["stripe.com", "paypal.com", "square.com"]
Maximum 100 domains per request. For larger batches, split into multiple requests.
search_depth
enum
required
Controls the number of search strategies and breadth of research.
Options:
  • quick - 3-5 strategies, 10-20 seconds per company
  • standard - 8-12 strategies, 30-60 seconds per company
  • comprehensive - 15-20+ strategies, 90-180 seconds per company
Recommendation: Use standard for most use cases.
max_parallel_searches
integer
required
Maximum number of concurrent searches across all sources. Controls API throughput and rate limiting.
Range: 5-50
Recommendations:
  • 5-10 - Conservative, prevents rate limits
  • 20 - Balanced performance (recommended)
  • 40-50 - Aggressive, requires high API quotas
confidence_threshold
float
required
Minimum confidence score (0.0-1.0) to include a company in results.
Range: 0.0-1.0
Recommendations:
  • 0.5-0.6 - Inclusive, more potential matches
  • 0.7-0.8 - Balanced, good signal-to-noise
  • 0.9+ - Strict, only high-confidence matches
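The documented ranges can be checked client-side before sending, which avoids a round trip for an obviously invalid request. A minimal sketch in Python, assuming only the field names and limits listed above (`validate_payload` is an illustrative helper, not part of the API):

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of validation problems; an empty list means the payload looks sendable."""
    errors = []
    if not payload.get("research_goal"):
        errors.append("research_goal is required")
    domains = payload.get("company_domains", [])
    if not domains:
        errors.append("company_domains must be non-empty")
    if len(domains) > 100:
        errors.append("maximum 100 domains per request")
    if any(d.startswith(("http://", "https://")) for d in domains):
        errors.append("domains must be hostnames without protocol")
    if payload.get("search_depth") not in ("quick", "standard", "comprehensive"):
        errors.append("search_depth must be quick, standard, or comprehensive")
    if not 5 <= payload.get("max_parallel_searches", 0) <= 50:
        errors.append("max_parallel_searches must be between 5 and 50")
    if not 0.0 <= payload.get("confidence_threshold", -1) <= 1.0:
        errors.append("confidence_threshold must be between 0.0 and 1.0")
    return errors
```

Running this before each request surfaces the same problems the server would report as a 422, but with no network cost.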

Response Fields

research_id
string
Unique identifier for this research run (UUID v4).
total_companies
integer
Number of companies analyzed in this request.
search_strategies_generated
integer
Number of search strategies generated by the LLM for this research goal.
total_searches_executed
integer
Total number of search queries executed across all sources and companies.
processing_time_ms
integer
Total processing time in milliseconds from request to response.
results
array
Array of company research results.
search_performance
object
Performance metrics for the research run.
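For typed downstream code, the response shape above can be expressed as `TypedDict` definitions. This is an illustrative sketch of the documented fields, not a shipped client model:

```python
from typing import TypedDict


class Evidence(TypedDict):
    url: str
    title: str
    snippet: str
    source_name: str


class Findings(TypedDict):
    technologies: list[str]
    evidence: list[Evidence]
    signals_found: int


class CompanyResult(TypedDict):
    domain: str
    confidence_score: float
    evidence_sources: int
    findings: Findings


class SearchPerformance(TypedDict):
    queries_per_second: float
    failed_requests: int


class BatchResearchResponse(TypedDict):
    research_id: str
    total_companies: int
    search_strategies_generated: int
    total_searches_executed: int
    processing_time_ms: int
    results: list[CompanyResult]
    search_performance: SearchPerformance
```

Because `TypedDict` instances are plain dicts at runtime, the parsed JSON response can be annotated with these types directly for editor and type-checker support.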

Example Request

curl -X POST http://localhost:8000/research/batch \
  -H "Content-Type: application/json" \
  -d '{
    "research_goal": "Find fintech companies using AI for fraud detection",
    "company_domains": ["stripe.com", "paypal.com"],
    "search_depth": "standard",
    "max_parallel_searches": 20,
    "confidence_threshold": 0.7
  }'

Example Response

{
  "research_id": "a7f3c8e9-4b2d-4a1e-8c5f-9d7e6f8a3b2c",
  "total_companies": 2,
  "search_strategies_generated": 12,
  "total_searches_executed": 24,
  "processing_time_ms": 34200,
  "results": [
    {
      "domain": "stripe.com",
      "confidence_score": 0.92,
      "evidence_sources": 3,
      "findings": {
        "technologies": [
          "tensorflow",
          "python",
          "kubernetes",
          "radar",
          "machine-learning"
        ],
        "evidence": [
          {
            "url": "https://stripe.com/blog/radar-2.0",
            "title": "Introducing Radar 2.0: Advanced fraud detection with machine learning",
            "snippet": "Stripe Radar uses adaptive machine learning algorithms to detect and prevent fraud in real-time across millions of transactions...",
            "source_name": "google_search"
          },
          {
            "url": "https://stripe.com/jobs/listing/machine-learning-engineer-fraud/5678",
            "title": "Machine Learning Engineer - Fraud Detection",
            "snippet": "Build and deploy ML models for real-time fraud detection using TensorFlow and Python. Work on Stripe Radar...",
            "source_name": "jobs_search"
          },
          {
            "url": "https://newsapi.org/stripe-announces-100m-investment",
            "title": "Stripe Announces $100M Investment in AI Fraud Prevention",
            "snippet": "Payment processor Stripe today announced a major investment in artificial intelligence capabilities for fraud detection...",
            "source_name": "news_search"
          }
        ],
        "signals_found": 8
      }
    },
    {
      "domain": "paypal.com",
      "confidence_score": 0.85,
      "evidence_sources": 2,
      "findings": {
        "technologies": [
          "deep-learning",
          "java",
          "scala",
          "risk-management"
        ],
        "evidence": [
          {
            "url": "https://www.paypal.com/us/webapps/mpp/security/fraud-protection",
            "title": "PayPal Fraud Protection - Advanced Security",
            "snippet": "Our advanced AI and machine learning systems monitor transactions 24/7 to detect and prevent fraudulent activity...",
            "source_name": "google_search"
          },
          {
            "url": "https://newsapi.org/paypal-fraud-detection-ai",
            "title": "PayPal Enhances Fraud Detection with Deep Learning",
            "snippet": "PayPal has deployed new deep learning models that reduce false positives by 30% while catching more fraud...",
            "source_name": "news_search"
          }
        ],
        "signals_found": 6
      }
    }
  ],
  "search_performance": {
    "queries_per_second": 18.5,
    "failed_requests": 2
  }
}

Error Responses

422 Unprocessable Entity
error
Invalid request parameters or validation errors.
{
  "detail": [
    {
      "loc": ["body", "confidence_threshold"],
      "msg": "ensure this value is less than or equal to 1.0",
      "type": "value_error.number.not_le"
    }
  ]
}
500 Internal Server Error
error
Server-side processing error.
{
  "detail": "Failed to generate search strategies: API key invalid"
}
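A client can turn these two error shapes into readable exceptions. A stdlib-only sketch, assuming the local base URL from the example request and the 422/500 bodies shown above (`run_batch_research` and `format_validation_errors` are hypothetical helpers):

```python
import json
import urllib.error
import urllib.request


def format_validation_errors(detail: list[dict]) -> str:
    """Turn a 422 detail list of {loc, msg, type} objects into a readable string."""
    return "; ".join(f"{'.'.join(map(str, d['loc']))}: {d['msg']}" for d in detail)


def run_batch_research(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST to /research/batch, raising descriptive errors for 422 and 500 responses."""
    req = urllib.request.Request(
        f"{base_url}/research/batch",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=300) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        body = json.loads(err.read() or b"{}")
        if err.code == 422:
            raise ValueError(
                f"Invalid request: {format_validation_errors(body['detail'])}"
            ) from err
        raise RuntimeError(f"Server error ({err.code}): {body.get('detail', '')}") from err
```

The long timeout reflects that batch runs can take minutes for comprehensive searches over many domains.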

Performance Optimization

Tuning Parallelism

Adjust max_parallel_searches based on your API quotas:
{
  "max_parallel_searches": 10,
  "search_depth": "quick"
}
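One way to choose a value programmatically is to derive it from a known per-source rate limit and clamp to the documented 5-50 range. A rough sketch; the average search duration is an assumption, not a service guarantee:

```python
def pick_parallelism(quota_qps: float, avg_search_seconds: float = 2.0) -> int:
    """Choose max_parallel_searches so steady-state throughput stays under quota_qps.

    If each search averages avg_search_seconds, N parallel workers issue roughly
    N / avg_search_seconds queries per second, so N = quota_qps * avg_search_seconds.
    The result is clamped to the API's documented 5-50 range.
    """
    target = int(quota_qps * avg_search_seconds)
    return max(5, min(50, target))
```

For example, a 10 queries/second quota with 2-second searches supports about 20 parallel workers, which matches the "balanced" recommendation above.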

Filtering Results

Use confidence_threshold to reduce noise:
// High precision - only strong matches
const strictSearch = {
  confidence_threshold: 0.85,
  // ...
};

// High recall - more potential matches
const broadSearch = {
  confidence_threshold: 0.6,
  // ...
};

Batch Processing

For large company lists, process in batches:
import asyncio
import aiohttp

BATCH_SIZE = 50

async def research_batch(domains, research_goal):
    async with aiohttp.ClientSession() as session:
        for i in range(0, len(domains), BATCH_SIZE):
            batch = domains[i:i + BATCH_SIZE]
            payload = {
                'research_goal': research_goal,
                'company_domains': batch,
                'search_depth': 'standard',
                'max_parallel_searches': 20,
                'confidence_threshold': 0.7,
            }
            async with session.post(
                'http://localhost:8000/research/batch',
                json=payload
            ) as resp:
                data = await resp.json()
                yield data

async def main():
    # Process 500 companies in batches of 50
    domains = [f"company{i}.com" for i in range(500)]
    async for result in research_batch(domains, "Find AI startups"):
        print(f"Batch complete: {len(result['results'])} companies")

asyncio.run(main())

Integration Examples

Filtering High-Confidence Matches

def filter_high_confidence(response, min_confidence=0.8):
    """Filter results by confidence score."""
    return [
        result for result in response['results']
        if result['confidence_score'] >= min_confidence
    ]

data = response.json()  # response from requests.post(..., json=payload)
high_confidence = filter_high_confidence(data, min_confidence=0.85)
print(f"Found {len(high_confidence)} high-confidence matches")

Extracting Technologies

function extractAllTechnologies(response) {
  const allTechs = new Set();
  
  response.results.forEach(result => {
    result.findings.technologies.forEach(tech => {
      allTechs.add(tech);
    });
  });
  
  return Array.from(allTechs);
}

const technologies = extractAllTechnologies(data);
console.log('Technologies found:', technologies);
// Output: ['tensorflow', 'python', 'kubernetes', 'java', ...]

CSV Export

import csv

def export_to_csv(response, filename='research_results.csv'):
    """Export research results to CSV."""
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([
            'Domain',
            'Confidence',
            'Evidence Sources',
            'Technologies',
            'Signals Found'
        ])
        
        for result in response['results']:
            writer.writerow([
                result['domain'],
                result['confidence_score'],
                result['evidence_sources'],
                ', '.join(result['findings']['technologies']),
                result['findings']['signals_found']
            ])

export_to_csv(data)

Next Steps

Streaming Research

Learn how to implement real-time progress tracking with Server-Sent Events
