The Research Agent skill pack enables intelligent web research workflows with semantic memory, privacy-focused search, and headless browser automation.

Included Services

Qdrant

Vector database for semantic memory

SearXNG

Privacy-focused metasearch engine

Browserless

Headless Chrome for web scraping

Skills Provided

Qdrant Memory

Capabilities:
  • Store and search vector embeddings
  • Semantic similarity search
  • Filter searches by metadata
  • Build RAG (Retrieval-Augmented Generation) systems
  • Manage multiple collections
Example Usage:
# Create a collection for research notes
# ("size" must match the output dimension of your embedding model)
curl -X PUT "http://qdrant:6333/collections/research_notes" \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {"size": 1536, "distance": "Cosine"},
    "optimizers_config": {"default_segment_number": 2}
  }'

# Store a research finding with embedding
curl -X PUT "http://qdrant:6333/collections/research_notes/points" \
  -H "Content-Type: application/json" \
  -d '{
    "points": [{
      "id": 1,
      "vector": [0.05, 0.61, 0.76, ...],
      "payload": {
        "source": "https://example.com/article",
        "text": "Key findings from the research paper...",
        "timestamp": "2025-01-15T10:30:00Z",
        "tags": ["ai", "research"]
      }
    }]
  }'

# Search for similar research
curl -X POST "http://qdrant:6333/collections/research_notes/points/search" \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.2, 0.1, 0.9, ...],
    "limit": 5,
    "with_payload": true
  }'
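
The capability list mentions filtering by metadata, which the examples above do not show. Here is a minimal sketch of a filtered search body, assuming the `tags` payload field from the stored point above. `qdrant_filter_body` is a hypothetical helper name, and the vector argument must be a complete JSON array matching the collection's dimension.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a search body restricted to points whose
# "tags" payload field contains the given tag.
# $1 = query vector as a JSON array, $2 = tag, $3 = result limit (default 5)
# Assumes the tag contains no characters that need JSON escaping.
qdrant_filter_body() {
  local vector=$1 tag=$2 limit=${3:-5}
  printf '{"vector": %s, "limit": %d, "with_payload": true, "filter": {"must": [{"key": "tags", "match": {"value": "%s"}}]}}\n' \
    "$vector" "$limit" "$tag"
}
```

The output can be piped straight into the search endpoint, for example `qdrant_filter_body "$QUERY_VECTOR" ai 5 | curl -s -X POST "http://qdrant:6333/collections/research_notes/points/search" -H "Content-Type: application/json" --data-binary @-`, where `QUERY_VECTOR` is a placeholder for your own embedding.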

SearXNG Search

Capabilities:
  • Privacy-focused web search
  • Aggregates results from multiple search engines
  • No tracking or profiling
  • JSON API for programmatic access (the json format must be enabled under search.formats in SearXNG's settings.yml)
  • Filter by category (general, images, news, science, etc.)
Example Usage:
# Search the web
curl "http://searxng:8080/search?q=artificial+intelligence&format=json"

# Search for academic papers
curl "http://searxng:8080/search?q=quantum+computing&categories=science&format=json"

# Search for news articles
curl "http://searxng:8080/search?q=latest+ai+developments&categories=news&format=json"

# Image search
curl "http://searxng:8080/search?q=neural+networks&categories=images&format=json"
Response structure:
{
  "query": "artificial intelligence",
  "results": [
    {
      "title": "What is AI?",
      "url": "https://example.com/ai",
      "content": "Artificial intelligence is...",
      "engine": "google",
      "score": 0.95
    }
  ]
}
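
Given the response structure above, a small `jq` filter is usually enough to pick out the results worth visiting. The sketch below assumes `jq` is installed and that `score` may be missing on some results. `top_urls` is a hypothetical helper name, not part of SearXNG.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a SearXNG JSON response on stdin and print the
# URLs of the N highest-scoring results, one per line (default N = 5).
# Results without a "score" field are treated as score 0.
top_urls() {
  local n=${1:-5}
  jq -r --argjson n "$n" \
    '.results | sort_by(.score // 0) | reverse | .[:$n] | .[].url'
}
```

For example, `curl -s "http://searxng:8080/search?q=artificial+intelligence&format=json" | top_urls 3` would print up to three URLs.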

Browserless Browse

Capabilities:
  • Headless Chrome automation
  • Render JavaScript-heavy pages
  • Take screenshots
  • Generate PDFs
  • Extract structured data
  • Handle dynamic content
Example Usage:
# Scrape a webpage with JavaScript rendering
curl -X POST "http://browserless:3000/content?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "waitForSelector": "#main-content"
  }'

# Take a screenshot
curl -X POST "http://browserless:3000/screenshot?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output screenshot.png

# Generate a PDF
curl -X POST "http://browserless:3000/pdf?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/report"}' \
  --output report.pdf

# Execute a custom Puppeteer script
# (the "code" value is a JSON string, so it cannot contain raw line breaks)
curl -X POST "http://browserless:3000/function?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "code": "async ({ page }) => { await page.goto(\"https://example.com\"); const title = await page.title(); return { title }; }"
  }'
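
Because the `code` value travels inside a JSON string, a script with several lines needs its quotes, backslashes, and line breaks escaped before it is sent. The sketch below shows one way to do that in plain bash. `json_escape` and `function_body` are hypothetical helper names, and the escaping covers only the common cases (backslash, double quote, newline, tab, carriage return), not every control character.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape a string so it can sit inside JSON double quotes.
# Backslashes are escaped first so later replacements are not double-escaped.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  printf '%s' "$s"
}

# Hypothetical helper: wrap a script in the {"code": "..."} request body
# used by the /function example.
function_body() {
  printf '{"code": "%s"}\n' "$(json_escape "$1")"
}
```

A script kept in a file could then be sent with `function_body "$(cat script.js)" | curl -s -X POST "http://browserless:3000/function?token=YOUR_TOKEN" -H "Content-Type: application/json" --data-binary @-`, where `script.js` is a placeholder file name.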

Use Cases

Intelligent Web Research

Build a research agent that:
  1. Searches the web using SearXNG
  2. Visits top results with Browserless
  3. Extracts key information from page content
  4. Generates embeddings using Ollama (from Local AI pack)
  5. Stores in Qdrant for semantic retrieval
  6. Answers questions using RAG
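
For step 6, the retrieved payload texts have to be combined with the user's question before they are handed to a language model. The sketch below shows only that assembly step. The prompt wording and the `build_rag_prompt` name are assumptions for illustration, and the snippets are expected to come from the `text` payload of a Qdrant search.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a RAG prompt from a question and any number of
# retrieved text snippets. The wording of the prompt is an assumption.
# $1 = question, remaining arguments = snippets in ranked order
build_rag_prompt() {
  local question=$1
  shift
  local i=1 snippet
  printf 'Answer the question using only the numbered sources below.\n\n'
  for snippet in "$@"; do
    printf '[%d] %s\n' "$i" "$snippet"
    i=$((i + 1))
  done
  printf '\nQuestion: %s\n' "$question"
}
```

Numbering the sources lets the model cite which snippet supports each statement. The resulting text would be sent to an LLM such as one served by Ollama from the Local AI pack.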

Competitive Intelligence

Monitor competitor websites and track changes:
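# 1. Search for competitor mentions and save the result URLs
curl -s "http://searxng:8080/search?q=competitor+product+launch&format=json" \
  | jq -r '.results[].url' > urls.txt

# 2. Visit each result and capture its content in a per-URL file
mkdir -p captures
while IFS= read -r url; do
  curl -s -X POST "http://browserless:3000/content?token=TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"$url\"}" \
    > "captures/$(printf '%s' "$url" | md5sum | cut -d' ' -f1).html"

  # 3. Extract and store insights in Qdrant
  # (with embeddings)
done < urls.txt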

Academic Research

Search academic sources and build a knowledge base:
# Search for research papers and save each result as a PDF
mkdir -p papers
curl -s "http://searxng:8080/search?q=neural+networks&categories=science&format=json" \
  | jq -r '.results[].url' \
  | while IFS= read -r url; do
      # Render the page to PDF, named by a hash of its URL
      curl -s -X POST "http://browserless:3000/pdf?token=TOKEN" \
        -H "Content-Type: application/json" \
        -d "{\"url\": \"$url\"}" \
        --output "papers/$(printf '%s' "$url" | md5sum | cut -d' ' -f1).pdf"
    done

Content Monitoring

Track website changes and get alerts:
# Capture current state
curl -X POST "http://browserless:3000/screenshot?token=TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://target.com"}' \
  --output current.png

# Compare with previous state
# (use image diff tools)

# If changed, store in Qdrant with timestamp
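
The comparison step can be sketched with a plain byte-for-byte check. This is a deliberately simple stand-in for the image diff tools mentioned above: any pixel difference (ads, timestamps, animations) counts as a change, so a perceptual diff is likely a better fit for noisy pages. `has_changed` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the new capture differs from the
# previous one, or when no previous capture exists yet.
# $1 = path to previous capture, $2 = path to current capture
has_changed() {
  local previous=$1 current=$2
  [ ! -f "$previous" ] || ! cmp -s "$previous" "$current"
}
```

A monitoring loop could then run `if has_changed previous.png current.png; then cp current.png previous.png; fi` and store the change record in Qdrant inside the `if` branch. `previous.png` is a placeholder file name.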

Example Research Workflow

Complete research pipeline:
#!/bin/bash
# Research Agent Workflow

# Make sure the target collection exists
# (nomic-embed-text produces 768-dimensional vectors)
curl -s -X PUT "http://qdrant:6333/collections/research" \
  -H "Content-Type: application/json" \
  -d '{"vectors": {"size": 768, "distance": "Cosine"}}' > /dev/null

# 1. Search for information
RESULTS=$(curl -s "http://searxng:8080/search?q=AI+trends+2025&format=json")

# 2. Extract top 5 URLs
URLS=$(echo "$RESULTS" | jq -r '.results[0:5] | .[].url')

# 3. Visit each URL and extract content
for URL in $URLS; do
  echo "Processing: $URL"

  # Scrape content with Browserless
  CONTENT=$(curl -s -X POST "http://browserless:3000/content?token=TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"$URL\", \"waitForSelector\": \"body\"}")

  # Generate embedding (using Ollama from Local AI pack)
  # jq builds the request body so the HTML is JSON-escaped correctly
  EMBEDDING=$(printf '%s' "$CONTENT" \
    | jq -Rs '{model: "nomic-embed-text", input: [.]}' \
    | curl -s -X POST "http://ollama:11434/api/embed" \
        -H "Content-Type: application/json" --data-binary @- \
    | jq -c '.embeddings[0]')

  # Store in Qdrant (point IDs must be unsigned integers or UUIDs)
  jq -n --arg id "$(uuidgen)" --argjson vector "$EMBEDDING" --arg url "$URL" \
        --arg ts "$(date -Iseconds)" --rawfile content <(printf '%s' "$CONTENT") \
        '{points: [{id: $id, vector: $vector,
                    payload: {url: $url, content: $content, timestamp: $ts}}]}' \
    | curl -s -X PUT "http://qdrant:6333/collections/research/points" \
        -H "Content-Type: application/json" --data-binary @-
done

echo "Research complete. Data stored in Qdrant."

Configuration

Environment Variables

# Qdrant
QDRANT_HOST=qdrant
QDRANT_PORT=6333
QDRANT_GRPC_PORT=6334

# SearXNG
SEARXNG_HOST=searxng
SEARXNG_PORT=8080

# Browserless
BROWSERLESS_HOST=browserless
BROWSERLESS_PORT=3000
BROWSERLESS_TOKEN=<generated>
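
Scripts can derive their base URLs from these variables instead of hard-coding hostnames. The sketch below falls back to the values listed above when a variable is unset. The `*_URL` variable names and the `browserless_endpoint` helper are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# Fall back to the documented defaults when a variable is unset or empty.
QDRANT_URL="http://${QDRANT_HOST:-qdrant}:${QDRANT_PORT:-6333}"
SEARXNG_URL="http://${SEARXNG_HOST:-searxng}:${SEARXNG_PORT:-8080}"
BROWSERLESS_URL="http://${BROWSERLESS_HOST:-browserless}:${BROWSERLESS_PORT:-3000}"

# Hypothetical helper: build a Browserless endpoint URL with the token attached.
# Aborts with an error message if BROWSERLESS_TOKEN is not set.
# $1 = endpoint name, e.g. content, screenshot, pdf
browserless_endpoint() {
  printf '%s/%s?token=%s\n' "$BROWSERLESS_URL" "$1" \
    "${BROWSERLESS_TOKEN:?BROWSERLESS_TOKEN must be set}"
}
```

With this in place, a call such as `curl -X POST "$(browserless_endpoint screenshot)" ...` keeps the token out of the script source.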

Collection Patterns

Recommended Qdrant collections:
  • research_notes - Manual research findings
  • web_scrapes - Automated scraping results
  • documents - Uploaded research documents
  • conversations - Chat history for RAG

Memory Requirements

  • Qdrant: ~512 MB base + vector data
  • SearXNG: ~512 MB
  • Browserless: ~1.5 GB (Chrome + Node.js)
Total: ~2.5 GB minimum
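
The "+ vector data" part depends on how many vectors you store. A rough estimate can follow the rule of thumb from Qdrant's capacity-planning guidance: number of vectors × dimensions × 4 bytes × 1.5. The sketch below applies it with integer arithmetic. `estimate_vector_mib` is a hypothetical helper name, and the result covers only in-memory float32 vectors, not payloads or indexes on payload fields.

```shell
#!/usr/bin/env bash
# Rough RAM estimate in MiB for float32 vectors held in memory:
# count * dimensions * 4 bytes * 1.5 overhead factor (written as * 3 / 2).
# $1 = number of vectors, $2 = vector dimension
estimate_vector_mib() {
  local count=$1 dims=$2
  echo $(( count * dims * 4 * 3 / 2 / 1024 / 1024 ))
}
```

For example, `estimate_vector_mib 1000000 768` gives about 4.3 GiB for one million 768-dimensional vectors, on top of the base footprint listed above.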

Performance Tips

Qdrant

  • Create payload indexes on frequently filtered fields
  • Use with_vector: false when only payloads are needed
  • Batch upsert operations for better performance
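
To illustrate the batching tip, the sketch below groups points into upsert bodies of a fixed size, so that many points go out in a few requests instead of one request each. It assumes one complete point JSON object per input line. `batch_points` is a hypothetical helper name, and the batch size of 100 is an arbitrary default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one point JSON object per line on stdin and print
# one {"points": [...]} upsert body per batch of up to $1 points (default 100).
batch_points() {
  local size=${1:-100} count=0 batch="" point
  while IFS= read -r point || [ -n "$point" ]; do
    batch+="${batch:+,}$point"
    count=$((count + 1))
    if [ "$count" -eq "$size" ]; then
      printf '{"points": [%s]}\n' "$batch"
      batch=""
      count=0
    fi
  done
  # Flush the final, possibly smaller batch.
  if [ "$count" -gt 0 ]; then
    printf '{"points": [%s]}\n' "$batch"
  fi
}
```

Each output line is a complete request body that can be sent with a `PUT` to the collection's `/points` endpoint, as in the storage example earlier on this page.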

SearXNG

  • Cache search results to reduce load
  • Use specific categories to narrow results
  • Respect rate limits from upstream engines
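
One way to apply the caching tip is a small file cache keyed by the query string. The sketch below is an assumption-laden example, not a SearXNG feature: the cache directory, the 60-minute freshness window, and the helper names are all hypothetical.

```shell
#!/usr/bin/env bash
CACHE_DIR=${CACHE_DIR:-/tmp/searxng-cache}   # hypothetical cache location
CACHE_TTL_MIN=${CACHE_TTL_MIN:-60}           # hypothetical freshness window

# Map a query string to a stable cache file path using its CRC checksum.
cache_path() {
  printf '%s/%s.json\n' "$CACHE_DIR" "$(printf '%s' "$1" | cksum | cut -d' ' -f1)"
}

# Succeed when the cache file exists and is newer than the TTL.
cache_fresh() {
  [ -n "$(find "$1" -mmin "-$CACHE_TTL_MIN" 2>/dev/null)" ]
}

# Return the cached response if fresh; otherwise query SearXNG and cache it.
cached_search() {
  local query=$1 file
  file=$(cache_path "$query")
  if ! cache_fresh "$file"; then
    mkdir -p "$CACHE_DIR"
    # Write to a temporary file first so a failed request is never cached.
    curl -sf -G "http://searxng:8080/search" \
      --data-urlencode "q=$query" --data-urlencode "format=json" \
      > "$file.tmp" && mv "$file.tmp" "$file"
  fi
  cat "$file"
}
```

Repeated calls such as `cached_search "AI trends 2025"` within the freshness window read from disk and put no extra load on the upstream engines.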

Browserless

  • Reuse browser contexts when possible
  • Use waitForSelector instead of arbitrary delays
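  • Block ads for faster scraping: {"blockAds": true}. This blocks ad and tracker requests only; to skip images and CSS as well, reject those resource types (for example with rejectResourceTypes, if your Browserless version supports it)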
  • Increase timeout for slow-loading pages

Next Steps

Local AI Pack

Add Ollama for embeddings and LLM inference

Knowledge Base Pack

Add full-text search with Meilisearch
