Research Agent Pack

The Research Agent skill pack enables intelligent web research workflows with semantic memory, privacy-focused search, and headless browser automation.

Included Services

Qdrant

Vector database for semantic memory

SearXNG

Privacy-focused metasearch engine

Browserless

Headless Chrome for web scraping

Skills Provided

Qdrant Memory

Capabilities:

Store and search vector embeddings
Semantic similarity search
Filter searches by metadata
Build RAG (Retrieval-Augmented Generation) systems
Manage multiple collections

Example Usage:

# Create a collection for research notes
# (the vector size must match the output dimension of your embedding model)
curl -X PUT "http://qdrant:6333/collections/research_notes" \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {"size": 1536, "distance": "Cosine"},
    "optimizers_config": {"default_segment_number": 2}
  }'

# Store a research finding with embedding
curl -X PUT "http://qdrant:6333/collections/research_notes/points" \
  -H "Content-Type: application/json" \
  -d '{
    "points": [{
      "id": 1,
      "vector": [0.05, 0.61, 0.76, ...],
      "payload": {
        "source": "https://example.com/article",
        "text": "Key findings from the research paper...",
        "timestamp": "2025-01-15T10:30:00Z",
        "tags": ["ai", "research"]
      }
    }]
  }'

# Search for similar research
curl -X POST "http://qdrant:6333/collections/research_notes/points/search" \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.2, 0.1, 0.9, ...],
    "limit": 5,
    "with_payload": true
  }'
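The capabilities list mentions filtering searches by metadata, which the examples above do not show. The following is a minimal sketch, assuming the `tags` payload field from the stored point above. `tag_filter_query` is a hypothetical helper name, and the query vector is passed in as a JSON array string.

```shell
# Build a Qdrant search body that only matches points whose "tags"
# payload field contains the given tag.
# tag_filter_query is a hypothetical helper, not part of the pack.
# The tag is inserted verbatim, so it must not contain quotes or backslashes.
tag_filter_query() {
  local vector_json="$1" tag="$2" limit="${3:-5}"
  printf '{"vector": %s, "limit": %d, "with_payload": true, "filter": {"must": [{"key": "tags", "match": {"value": "%s"}}]}}' \
    "$vector_json" "$limit" "$tag"
}

# Usage (needs a running Qdrant and a full-length query vector):
# tag_filter_query "$QUERY_VECTOR" "ai" \
#   | curl -s -X POST "http://qdrant:6333/collections/research_notes/points/search" \
#       -H "Content-Type: application/json" --data-binary @-
```

Keeping the body builder separate from the curl call makes it easy to inspect the JSON before sending it.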

SearXNG Search

Capabilities:

Privacy-focused web search
Aggregates results from multiple search engines
No tracking or profiling
JSON API for programmatic access (the json format must be enabled under search.formats in settings.yml)
Filter by category (general, images, news, science, etc.)

Example Usage:

# Search the web
curl "http://searxng:8080/search?q=artificial+intelligence&format=json"

# Search for academic papers
curl "http://searxng:8080/search?q=quantum+computing&categories=science&format=json"

# Search for news articles
curl "http://searxng:8080/search?q=latest+ai+developments&categories=news&format=json"

# Image search
curl "http://searxng:8080/search?q=neural+networks&categories=images&format=json"
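The examples above hand-encode spaces as `+`. For queries that come from user input, a small helper can build the URL instead. This is a sketch for ASCII input only; `urlencode` and `searx_url` are hypothetical names, and the host and port are taken from the examples above.

```shell
# Percent-encode a string for use in a query parameter.
# Sketch for ASCII input only; multi-byte characters are not handled.
urlencode() {
  local rest="$1" out="" tail c
  while [ -n "$rest" ]; do
    tail="${rest#?}"          # everything after the first character
    c="${rest%"$tail"}"       # the first character
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    rest="$tail"
  done
  printf '%s' "$out"
}

# Build a SearXNG JSON search URL with an optional category.
searx_url() {
  local query="$1" category="${2:-}"
  local url="http://searxng:8080/search?q=$(urlencode "$query")&format=json"
  if [ -n "$category" ]; then
    url="$url&categories=$(urlencode "$category")"
  fi
  printf '%s\n' "$url"
}

# Usage (needs a running SearXNG):
# curl -s "$(searx_url "quantum computing" science)"
```

An alternative that avoids a custom encoder is `curl -G --data-urlencode "q=..."`, which lets curl do the encoding.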

Response structure:

{
  "query": "artificial intelligence",
  "results": [
    {
      "title": "What is AI?",
      "url": "https://example.com/ai",
      "content": "Artificial intelligence is...",
      "engine": "google",
      "score": 0.95
    }
  ]
}

Browserless Browse

Capabilities:

Headless Chrome automation
Render JavaScript-heavy pages
Take screenshots
Generate PDFs
Extract structured data
Handle dynamic content

Example Usage:

# Scrape a webpage with JavaScript rendering
# (waitForSelector takes an object with a "selector" field)
curl -X POST "http://browserless:3000/content?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "waitForSelector": {"selector": "#main-content"}
  }'

# Take a screenshot
curl -X POST "http://browserless:3000/screenshot?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output screenshot.png

# Generate a PDF
curl -X POST "http://browserless:3000/pdf?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/report"}' \
  --output report.pdf

# Execute custom Puppeteer script
# (the script is kept on one line because JSON strings cannot contain raw newlines)
curl -X POST "http://browserless:3000/function?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "code": "async ({ page }) => { await page.goto(\"https://example.com\"); const title = await page.title(); return { title }; }"
  }'

Use Cases

Intelligent Web Research

Build a research agent that:

Searches the web using SearXNG
Visits top results with Browserless
Extracts key information from page content
Generates embeddings using Ollama (from Local AI pack)
Stores the embeddings in Qdrant for semantic retrieval
Answers questions using RAG
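The last step, answering questions with RAG, is not shown elsewhere on this page. The following is a rough sketch under several assumptions: the `research` collection holds points with a `content` payload field, Ollama is reachable at `ollama:11434`, and `llama3.2` stands in for whatever generation model you have pulled. Both helper names are hypothetical.

```shell
# Combine retrieved passages and a question into a single prompt.
build_prompt() {
  local question="$1" context="$2"
  printf 'Answer the question using only the context below.\n\nContext:\n%s\n\nQuestion: %s\n' \
    "$context" "$question"
}

# Embed the question, fetch the closest passages from Qdrant, and ask the model.
# Needs curl, jq, and running Ollama and Qdrant services.
# "llama3.2" is an assumed model name; substitute the one you use.
answer_question() {
  local question="$1" vector context
  vector=$(jq -n --arg q "$question" '{model: "nomic-embed-text", input: [$q]}' \
    | curl -s "http://ollama:11434/api/embed" \
        -H "Content-Type: application/json" --data-binary @- \
    | jq -c '.embeddings[0]')
  context=$(jq -n --argjson v "$vector" '{vector: $v, limit: 3, with_payload: true}' \
    | curl -s "http://qdrant:6333/collections/research/points/search" \
        -H "Content-Type: application/json" --data-binary @- \
    | jq -r '.result[].payload.content')
  build_prompt "$question" "$context" \
    | jq -Rs '{model: "llama3.2", prompt: ., stream: false}' \
    | curl -s "http://ollama:11434/api/generate" \
        -H "Content-Type: application/json" --data-binary @- \
    | jq -r '.response'
}

# Usage:
# answer_question "What are the main AI trends this year?"
```

Limiting retrieval to three passages keeps the prompt short; raise `limit` if answers lack context.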

Competitive Intelligence

Monitor competitor websites and track changes:

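# 1. Search for competitor mentions and collect the result URLs
urls=$(curl -s "http://searxng:8080/search?q=competitor+product+launch&format=json" \
  | jq -r '.results[].url')

# 2. Visit each result and capture content
mkdir -p captures
for url in $urls; do
  # One file per URL, named by the URL's hash, so captures are not overwritten
  out="captures/$(printf '%s' "$url" | md5sum | cut -d' ' -f1).html"
  curl -s -X POST "http://browserless:3000/content?token=TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"$url\"}" > "$out"

  # 3. Extract and store insights in Qdrant
  # (with embeddings)
done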

Academic Research

Search academic sources and build a knowledge base:

# Search for research papers and save each result as a PDF
mkdir -p papers
curl -s "http://searxng:8080/search?q=neural+networks&categories=science&format=json" \
  | jq -r '.results[].url' \
  | while read -r url; do
      # Render the page to PDF, named by the URL's hash
      curl -s -X POST "http://browserless:3000/pdf?token=TOKEN" \
        -H "Content-Type: application/json" \
        -d "{\"url\": \"$url\"}" \
        --output "papers/$(printf '%s' "$url" | md5sum | cut -d' ' -f1).pdf"
    done

Content Monitoring

Track website changes and get alerts:

# Capture current state
curl -X POST "http://browserless:3000/screenshot?token=TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://target.com"}' \
  --output current.png

# Compare with previous state
# (use image diff tools)

# If changed, store in Qdrant with timestamp
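The comparison step can be approximated without an image diff tool. The sketch below uses a byte-for-byte comparison as a crude stand-in; `has_changed` and the `previous.png` file name are hypothetical.

```shell
# Return success (0) when the two captures differ byte for byte.
# A byte comparison is a crude stand-in for an image diff: any pixel
# change, including rotating ads or timestamps, counts as a change.
has_changed() {
  local previous="$1" current="$2"
  [ ! -f "$previous" ] && return 0   # no baseline yet: treat as changed
  ! cmp -s "$previous" "$current"
}

# Usage after capturing current.png:
# if has_changed previous.png current.png; then
#   echo "Change detected at $(date -Iseconds)"
#   cp current.png previous.png
# fi
```

For pages with dynamic elements, comparing extracted text from the `/content` endpoint is likely to produce fewer false alarms than comparing screenshots.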

Example Research Workflow

Complete research pipeline:

#!/bin/bash
# Research Agent Workflow
# Assumes the "research" collection already exists with a vector size that
# matches the embedding model (nomic-embed-text produces 768-dimensional vectors).

# 1. Search for information
RESULTS=$(curl -s "http://searxng:8080/search?q=AI+trends+2025&format=json")

# 2. Extract top 5 URLs
URLS=$(echo "$RESULTS" | jq -r '.results[0:5] | .[].url')

# 3. Visit each URL and extract content
for URL in $URLS; do
  echo "Processing: $URL"

  # Scrape content with Browserless (jq builds the JSON so the URL is escaped)
  CONTENT=$(jq -n --arg url "$URL" '{url: $url, waitForSelector: {selector: "body"}}' \
    | curl -s -X POST "http://browserless:3000/content?token=TOKEN" \
        -H "Content-Type: application/json" \
        --data-binary @-)

  # Generate embedding (using Ollama from Local AI pack)
  # Raw HTML contains quotes and newlines, so jq escapes it into valid JSON
  EMBEDDING=$(printf '%s' "$CONTENT" \
    | jq -Rs '{model: "nomic-embed-text", input: [.]}' \
    | curl -s -X POST "http://ollama:11434/api/embed" \
        -H "Content-Type: application/json" \
        --data-binary @- \
    | jq -c '.embeddings[0]')

  # Store in Qdrant (point IDs must be unsigned integers or UUIDs)
  printf '%s' "$CONTENT" \
    | jq -Rs --arg id "$(uuidgen)" --arg url "$URL" \
        --arg ts "$(date -Iseconds)" --argjson vector "$EMBEDDING" \
        '{points: [{id: $id, vector: $vector,
                    payload: {url: $url, content: ., timestamp: $ts}}]}' \
    | curl -s -X PUT "http://qdrant:6333/collections/research/points" \
        -H "Content-Type: application/json" \
        --data-binary @-
done

echo "Research complete. Data stored in Qdrant."

Configuration

Environment Variables

# Qdrant
QDRANT_HOST=qdrant
QDRANT_PORT=6333
QDRANT_GRPC_PORT=6334

# SearXNG
SEARXNG_HOST=searxng
SEARXNG_PORT=8080

# Browserless
BROWSERLESS_HOST=browserless
BROWSERLESS_PORT=3000
BROWSERLESS_TOKEN=<generated>
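Scripts can read these variables instead of hard-coding hosts and ports. A minimal sketch, with hypothetical helper names, that falls back to the defaults listed above when a variable is unset:

```shell
# Base URLs built from the pack's environment variables, falling back
# to the documented defaults when a variable is unset.
qdrant_base() {
  printf 'http://%s:%s\n' "${QDRANT_HOST:-qdrant}" "${QDRANT_PORT:-6333}"
}

searxng_base() {
  printf 'http://%s:%s\n' "${SEARXNG_HOST:-searxng}" "${SEARXNG_PORT:-8080}"
}

# Browserless endpoint URL with the token appended as a query parameter,
# as in the examples on this page. Fails loudly if the token is missing.
browserless_url() {
  local endpoint="$1"
  printf 'http://%s:%s/%s?token=%s\n' \
    "${BROWSERLESS_HOST:-browserless}" "${BROWSERLESS_PORT:-3000}" \
    "$endpoint" "${BROWSERLESS_TOKEN:?BROWSERLESS_TOKEN is not set}"
}

# Usage:
# curl -s "$(qdrant_base)/collections"
# curl -s -X POST "$(browserless_url content)" \
#   -H "Content-Type: application/json" -d '{"url": "https://example.com"}'
```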

Collection Patterns

Recommended Qdrant collections:

research_notes - Manual research findings
web_scrapes - Automated scraping results
documents - Uploaded research documents
conversations - Chat history for RAG
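These collections can be created in one pass. The sketch below assumes cosine distance and a default vector size of 768, which is what `nomic-embed-text` emits; pass a different size if you use another embedding model. The function names are hypothetical.

```shell
# JSON body for a collection whose vectors have the given size.
# The size must match the embedding model; 768 is an assumed default.
collection_config() {
  local size="${1:-768}"
  printf '{"vectors": {"size": %d, "distance": "Cosine"}}' "$size"
}

# Create each recommended collection (needs a running Qdrant).
create_collections() {
  local name
  for name in research_notes web_scrapes documents conversations; do
    curl -s -X PUT "http://qdrant:6333/collections/$name" \
      -H "Content-Type: application/json" \
      -d "$(collection_config "$1")"
    echo
  done
}

# Usage:
# create_collections        # 768-dimensional vectors
# create_collections 1536   # for a 1536-dimensional embedding model
```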

Memory Requirements

Qdrant: ~512 MB base + vector data
SearXNG: ~512 MB
Browserless: ~1.5 GB (Chrome + Node.js)

Total: ~2.5 GB minimum
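The "+ vector data" part of the Qdrant figure depends on how many vectors you store. As a rough estimate, each float32 component takes 4 bytes, and a commonly cited rule of thumb adds about 50% overhead for indexes and metadata. The 1.5 factor and the helper name below are assumptions, not measured values.

```shell
# Rough RAM estimate for vector data, in MB:
#   vectors * dimensions * 4 bytes (float32) * 1.5 overhead factor
# The 1.5 factor is a rule of thumb, not a guarantee.
estimate_vector_mb() {
  local count="$1" dims="$2"
  echo $(( count * dims * 4 * 3 / 2 / 1048576 ))
}

# Usage:
# estimate_vector_mb 100000 768    # prints 439
```

Under these assumptions, 100,000 vectors of 768 dimensions add roughly 439 MB on top of the ~512 MB base.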

Performance Tips

Qdrant

Create payload indexes on frequently filtered fields
Use with_vector: false when only payloads are needed
Batch upsert operations for better performance
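The first and third tips can be sketched as request bodies. The example assumes the `tags` payload field and `research_notes` collection used earlier on this page; `index_config` and `batch_body` are hypothetical helper names.

```shell
# JSON body that creates a payload index on one field.
# "keyword" suits exact-match fields such as tags or source URLs.
index_config() {
  local field="$1" schema="${2:-keyword}"
  printf '{"field_name": "%s", "field_schema": "%s"}' "$field" "$schema"
}

# Wrap several point objects (one JSON object per argument) into a
# single upsert body, so many points go out in one request.
batch_body() {
  local IFS=,
  printf '{"points": [%s]}' "$*"
}

# Usage (needs a running Qdrant):
# curl -s -X PUT "http://qdrant:6333/collections/research_notes/index" \
#   -H "Content-Type: application/json" -d "$(index_config tags)"
# curl -s -X PUT "http://qdrant:6333/collections/research_notes/points" \
#   -H "Content-Type: application/json" -d "$(batch_body "$POINT_1" "$POINT_2")"
```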

SearXNG

Cache search results to reduce load
Use specific categories to narrow results
Respect rate limits from upstream engines
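Caching can be as simple as storing each response in a file keyed by the query. The following is a minimal file-based sketch; `CACHE_DIR`, `CACHE_TTL_MIN`, and `cached` are hypothetical names, and the default location and 60-minute lifetime are arbitrary choices.

```shell
CACHE_DIR="${CACHE_DIR:-/tmp/searxng-cache}"   # hypothetical location
CACHE_TTL_MIN="${CACHE_TTL_MIN:-60}"           # cache lifetime in minutes

# Derive a file-safe key from an arbitrary string.
cache_key() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# cached KEY COMMAND [ARGS...]
# Print the cached output for KEY if it is younger than the TTL;
# otherwise run COMMAND, store its output, and print it.
cached() {
  local key file
  key=$(cache_key "$1")
  shift
  file="$CACHE_DIR/$key.json"
  mkdir -p "$CACHE_DIR"
  if [ -n "$(find "$file" -mmin "-$CACHE_TTL_MIN" 2>/dev/null)" ]; then
    cat "$file"
  elif "$@" > "$file.tmp"; then
    mv "$file.tmp" "$file"
    cat "$file"
  else
    rm -f "$file.tmp"   # do not cache failed requests
    return 1
  fi
}

# Usage (needs a running SearXNG; -f makes curl fail on HTTP errors):
# cached "ai trends" curl -sf "http://searxng:8080/search?q=ai+trends&format=json"
```

Writing to a temporary file first means a failed request never replaces a good cached copy.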

Browserless

Reuse browser contexts when possible
Use waitForSelector instead of arbitrary delays
Skip images and stylesheets for faster scraping: {"rejectResourceTypes": ["image", "stylesheet"]} (blockAds only blocks ad requests)
Increase timeout for slow-loading pages

Next Steps

Local AI Pack

Add Ollama for embeddings and LLM inference

Knowledge Base Pack

Add full-text search with Meilisearch
