The Knowledge Base skill pack provides the services and skills for building document search and retrieval systems that combine semantic (vector) search with keyword-based (full-text) search.

Included Services

Qdrant

Vector database for semantic search

PostgreSQL

Relational database for structured data

Meilisearch

Fast typo-tolerant full-text search

Skills Provided

Qdrant Memory

Capabilities:
  • Store document embeddings
  • Semantic similarity search
  • Metadata filtering
  • Hybrid search with filters
  • Multi-collection management
Example Usage:
# Create a documents collection
curl -X PUT "http://qdrant:6333/collections/documents" \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {"size": 1536, "distance": "Cosine"},
    "optimizers_config": {"default_segment_number": 2}
  }'

# Store document chunks with embeddings
curl -X PUT "http://qdrant:6333/collections/documents/points" \
  -H "Content-Type: application/json" \
  -d '{
    "points": [{
      "id": 1,
      "vector": [0.05, 0.61, 0.76, ...],
      "payload": {
        "document_id": "doc-123",
        "chunk_index": 0,
        "text": "Machine learning is a subset of artificial intelligence...",
        "source": "ml-handbook.pdf",
        "page": 5,
        "created_at": "2025-01-15T10:30:00Z"
      }
    }]
  }'

# Semantic search
curl -X POST "http://qdrant:6333/collections/documents/points/search" \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.2, 0.1, 0.9, ...],
    "limit": 10,
    "with_payload": true,
    "filter": {
      "must": [
        {"key": "source", "match": {"value": "ml-handbook.pdf"}}
      ]
    }
  }'

PostgreSQL Query

Capabilities:
  • Store document metadata
  • Track document versions
  • User permissions and access control
  • Full ACID transactions
  • Complex relational queries
Example Usage:
-- Create documents table
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title VARCHAR(255) NOT NULL,
  content_type VARCHAR(50),
  file_path TEXT,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW(),
  created_by UUID,
  tags TEXT[],
  metadata JSONB
);

-- Create chunks table (for RAG)
CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  chunk_index INTEGER,
  text TEXT NOT NULL,
  qdrant_id BIGINT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Insert a document
INSERT INTO documents (title, content_type, file_path, tags)
VALUES (
  'Machine Learning Handbook',
  'application/pdf',
  '/data/documents/ml-handbook.pdf',
  ARRAY['ai', 'ml', 'handbook']
);

-- Search by tags
SELECT * FROM documents
WHERE 'ai' = ANY(tags)
ORDER BY created_at DESC
LIMIT 10;

Meilisearch Index

Capabilities:
  • Lightning-fast full-text search
  • Typo-tolerant search
  • Faceted filtering
  • Highlighting and snippets
  • Instant search as-you-type
  • Ranking customization
Example Usage:
# Create an index
curl -X POST "http://meilisearch:7700/indexes" \
  -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "uid": "documents",
    "primaryKey": "id"
  }'

# Configure searchable attributes
curl -X PATCH "http://meilisearch:7700/indexes/documents/settings" \
  -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "searchableAttributes": ["title", "content", "tags"],
    "filterableAttributes": ["content_type", "created_at", "tags"],
    "sortableAttributes": ["created_at", "title"]
  }'

# Add documents
curl -X POST "http://meilisearch:7700/indexes/documents/documents" \
  -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "id": "doc-123",
      "title": "Machine Learning Handbook",
      "content": "A comprehensive guide to machine learning...",
      "content_type": "pdf",
      "tags": ["ai", "ml"],
      "created_at": 1705315800
    }
  ]'

# Search (typo-tolerant)
curl -X POST "http://meilisearch:7700/indexes/documents/search" \
  -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "machne lerning",
    "limit": 20,
    "attributesToHighlight": ["title", "content"],
    "filter": "tags = ai"
  }'

Use Cases

RAG (Retrieval-Augmented Generation)

Build a complete RAG system:
  1. Ingest documents
    • Upload PDFs, markdown, etc.
    • Store metadata in PostgreSQL
    • Index full text in Meilisearch
  2. Chunk and embed
    • Split documents into chunks
    • Generate embeddings with Ollama (from the Local AI Pack)
    • Store vectors in Qdrant
  3. Search
    • Full-text search with Meilisearch
    • Semantic search with Qdrant
    • Combine results for hybrid search
  4. Generate answers
    • Retrieve relevant chunks
    • Pass to LLM for generation
    • Cite sources from PostgreSQL
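
The "split documents into chunks" step can be sketched as fixed-size character windows with some overlap, so that a sentence cut at one boundary still appears whole in the neighboring chunk. This is a minimal sketch, not the pack's own chunker: `chunkText`, the 1000-character size, and the 200-character overlap are illustrative assumptions.

```javascript
// Split text into overlapping character windows.
// chunkSize and overlap are illustrative defaults, not values from the skill pack.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      chunk_index: chunks.length,
      text: text.slice(start, start + chunkSize),
    });
    // Stop once a window has reached the end of the text
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Usage: each entry maps onto one Qdrant point payload (chunk_index, text)
const demoChunks = chunkText("abcdefghij", 4, 1);
console.log(demoChunks.map((c) => c.text));
```

Slicing by characters is the simplest option; splitting on paragraph or sentence boundaries usually gives more coherent chunks at the cost of uneven sizes.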

Document Management System

Build an intelligent DMS:
# Upload document
1. Store file in MinIO (from Video Creator pack)
2. Extract text content
3. Insert metadata in PostgreSQL
4. Index in Meilisearch
5. Generate embeddings and store in Qdrant

# Search documents
- Full-text: Meilisearch
- Semantic: Qdrant
- Metadata filters: PostgreSQL
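
Upload steps 3 to 5 write the same document to three services, so it helps to derive all three records from one object and share a single ID. The sketch below only builds the payloads; it does not talk to any service, and the MinIO upload and text extraction are left out. The function name and the shape of `doc` and `chunks` are hypothetical, and the `{ text, values }` query shape assumes a client such as node-postgres.

```javascript
// Build the records each service needs from one uploaded document.
// Field names follow the examples in this guide; the `doc` and `chunks`
// input shapes are assumptions made for this sketch.
function buildIngestRecords(doc, chunks) {
  const createdAtUnix = Math.floor(new Date(doc.createdAt).getTime() / 1000);
  return {
    // Parameterized INSERT for the documents table (values bind to $1..$5)
    postgres: {
      text:
        "INSERT INTO documents (id, title, content_type, file_path, tags) " +
        "VALUES ($1, $2, $3, $4, $5)",
      values: [doc.id, doc.title, doc.contentType, doc.filePath, doc.tags],
    },
    // One Meilisearch document per file, keyed by the same ID
    meilisearch: {
      id: doc.id,
      title: doc.title,
      content: doc.text,
      content_type: doc.contentType,
      tags: doc.tags,
      created_at: createdAtUnix, // numeric timestamp so it can be filtered and sorted
    },
    // One Qdrant point per chunk, linked back through payload.document_id
    qdrantPoints: chunks.map((chunk, index) => ({
      id: chunk.pointId, // unsigned integer or UUID string
      vector: chunk.vector,
      payload: { document_id: doc.id, chunk_index: index, text: chunk.text },
    })),
  };
}
```

Keeping the PostgreSQL `id`, the Meilisearch `id`, and the Qdrant `payload.document_id` identical is what lets search hits from either engine be joined back to the metadata row.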

Knowledge Graph

Connect documents with relationships:
-- Create relationships table
CREATE TABLE document_relationships (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  source_doc_id UUID REFERENCES documents(id),
  target_doc_id UUID REFERENCES documents(id),
  relationship_type VARCHAR(50),
  confidence FLOAT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Find related documents
SELECT d.*
FROM documents d
JOIN document_relationships r ON d.id = r.target_doc_id
WHERE r.source_doc_id = $1
  AND r.relationship_type = 'similar'
  AND r.confidence > 0.8
ORDER BY r.confidence DESC;
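
One way the `similar` rows and their `confidence` values could be produced is from embedding similarity. The sketch below assumes each document has a single vector (for example, the mean of its chunk embeddings, which is an assumption rather than something the pack prescribes) and emits one row per ordered pair whose cosine similarity exceeds the threshold. The function names are hypothetical.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Build rows for document_relationships from per-document vectors.
// docs: [{ id, vector }] (hypothetical shape)
function buildSimilarRelationships(docs, threshold = 0.8) {
  const rows = [];
  for (const source of docs) {
    for (const target of docs) {
      if (source.id === target.id) continue;
      const confidence = cosineSimilarity(source.vector, target.vector);
      if (confidence > threshold) {
        rows.push({
          source_doc_id: source.id,
          target_doc_id: target.id,
          relationship_type: "similar",
          confidence,
        });
      }
    }
  }
  return rows;
}
```

Comparing every pair is quadratic in the number of documents; for larger collections, asking Qdrant for each document's nearest neighbors and inserting only those pairs is likely the more practical route.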

Wiki / Documentation Site

Build a searchable wiki:
  • Store pages in PostgreSQL
  • Index content in Meilisearch
  • Generate embeddings for “related pages”
  • Use Qdrant for semantic “see also” suggestions
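
For "see also" suggestions, Qdrant can search with an existing point as the example instead of a fresh query vector. The sketch below assumes the `/points/recommend` endpoint and the `documents` collection from the examples in this guide, and that each point's payload carries a `document_id`. Because points are chunks, several hits can belong to the same page, so the second helper keeps only the best-scoring hit per page. Both function names are hypothetical.

```javascript
// Request description for Qdrant's recommend endpoint, using one stored
// point (a chunk of the current page) as the positive example.
function buildRecommendRequest(pointId, limit = 20) {
  return {
    method: "POST",
    url: "http://qdrant:6333/collections/documents/points/recommend",
    body: { positive: [pointId], limit, with_payload: true },
  };
}

// Collapse chunk-level hits into page-level suggestions.
// hits: [{ score, payload: { document_id } }] as returned with with_payload.
function toSeeAlso(hits, currentDocId, maxPages = 5) {
  const bestScore = new Map();
  for (const hit of hits) {
    const docId = hit.payload.document_id;
    if (docId === currentDocId) continue; // never suggest the page itself
    if (!bestScore.has(docId) || hit.score > bestScore.get(docId)) {
      bestScore.set(docId, hit.score);
    }
  }
  return [...bestScore.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, maxPages)
    .map(([document_id, score]) => ({ document_id, score }));
}
```

The resulting `document_id` values can then be resolved to page titles with a single PostgreSQL query.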

Example RAG Pipeline

Complete document processing pipeline:
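#!/bin/bash
# RAG Pipeline: Ingest Document
# Requires: pdftotext, psql, curl, jq, uuidgen
set -euo pipefail
shopt -s nullglob

DOC_PATH="/data/uploads/handbook.pdf"
DOC_ID=$(uuidgen)
WORK_DIR=$(mktemp -d)
trap 'rm -rf "$WORK_DIR"' EXIT

# 1. Extract text from PDF
pdftotext "$DOC_PATH" "$WORK_DIR/doc.txt"

# 2. Store metadata in PostgreSQL (psql variables keep the values safely quoted)
psql -h postgresql -U postgres -d knowledge_base \
  -v ON_ERROR_STOP=1 -v doc_id="$DOC_ID" -v doc_path="$DOC_PATH" <<'EOF'
INSERT INTO documents (id, title, content_type, file_path)
VALUES (
  :'doc_id',
  'ML Handbook',
  'application/pdf',
  :'doc_path'
);
EOF

# 3. Index in Meilisearch (jq builds the JSON so the text is escaped correctly)
jq -Rs --arg id "$DOC_ID" \
  '[{id: $id, title: "ML Handbook", content: .}]' "$WORK_DIR/doc.txt" \
  | curl -sS -X POST "http://meilisearch:7700/indexes/documents/documents" \
      -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @-

# 4. Chunk text into files of at most 1000 bytes each
#    (split -b can cut a multi-byte character in half; acceptable for a demo)
split -b 1000 "$WORK_DIR/doc.txt" "$WORK_DIR/chunk_"

# 5. Generate embeddings and store in Qdrant
#    The collection's vector size must match the model's output
#    (768 for nomic-embed-text).
CHUNK_IDX=0
for CHUNK_FILE in "$WORK_DIR"/chunk_*; do
  # Generate embedding with Ollama
  EMBEDDING=$(jq -Rs '{model: "nomic-embed-text", input: [.]}' "$CHUNK_FILE" \
    | curl -sS -X POST "http://ollama:11434/api/embed" \
        -H "Content-Type: application/json" \
        --data-binary @- \
    | jq -c '.embeddings[0]')

  # Store in Qdrant (a UUID point ID avoids the collisions $RANDOM would cause)
  jq -Rs --arg point_id "$(uuidgen)" --arg doc_id "$DOC_ID" \
      --argjson idx "$CHUNK_IDX" --argjson vector "$EMBEDDING" \
      '{points: [{id: $point_id, vector: $vector,
                  payload: {document_id: $doc_id, chunk_index: $idx, text: .}}]}' \
      "$CHUNK_FILE" \
    | curl -sS -X PUT "http://qdrant:6333/collections/documents/points" \
        -H "Content-Type: application/json" \
        --data-binary @-

  CHUNK_IDX=$((CHUNK_IDX + 1))
done

echo "Document $DOC_ID ingested successfully"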
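
The script above covers ingestion. For the "generate answers" step of the RAG use case, the retrieved chunks have to be turned into a prompt that lets the LLM cite its sources. The following is a minimal sketch under stated assumptions: `buildPrompt` is a hypothetical helper, `hits` are Qdrant search results with the payload fields used in this guide, and `titlesById` is a lookup built from the PostgreSQL `documents` table.

```javascript
// Assemble a prompt with numbered sources from retrieved chunks.
// hits: [{ payload: { document_id, chunk_index, text } }]
// titlesById: { [document_id]: title } (hypothetical lookup from PostgreSQL)
function buildPrompt(question, hits, titlesById = {}) {
  const sources = hits.map((hit, i) => {
    const { document_id, chunk_index, text } = hit.payload;
    const title = titlesById[document_id] ?? document_id; // fall back to the raw ID
    return `[${i + 1}] (${title}, chunk ${chunk_index})\n${text}`;
  });
  return [
    "Answer the question using only the numbered sources below.",
    "Cite the sources you rely on as [n].",
    "",
    sources.join("\n\n"),
    "",
    `Question: ${question}`,
    "Answer:",
  ].join("\n");
}

// Usage with a single retrieved chunk
console.log(
  buildPrompt(
    "What is machine learning?",
    [{ payload: { document_id: "doc-123", chunk_index: 0, text: "Machine learning is a subset of artificial intelligence..." } }],
    { "doc-123": "Machine Learning Handbook" }
  )
);
```

Numbering the sources in the prompt makes it possible to map each `[n]` in the model's answer back to a document row when rendering citations.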

Configuration

Environment Variables

# Qdrant
QDRANT_HOST=qdrant
QDRANT_PORT=6333

# PostgreSQL
POSTGRES_HOST=postgresql
POSTGRES_PORT=5432
POSTGRES_DB=knowledge_base
POSTGRES_USER=postgres
POSTGRES_PASSWORD=<generated>

# Meilisearch
MEILISEARCH_HOST=meilisearch
MEILISEARCH_PORT=7700
MEILISEARCH_API_KEY=<generated>

Collection Patterns

Recommended structures:
Qdrant Collections:
  • documents - Document chunk embeddings
  • questions - FAQ embeddings
  • code_snippets - Code example embeddings
PostgreSQL Tables:
  • documents - Document metadata
  • document_chunks - Chunk text and pointers
  • users - User accounts
  • permissions - Access control
Meilisearch Indexes:
  • documents - Full document content
  • users - User search
  • tags - Tag autocomplete

Memory Requirements

  • Qdrant: ~512 MB base + vector data
  • PostgreSQL: ~256 MB base + table data
  • Meilisearch: ~512 MB base + index data
Total: ~2 GB minimum (scales with data)
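
To get a feel for the "+ vector data" part, a commonly cited rule of thumb for Qdrant is roughly `vectors × dimensions × 4 bytes × 1.5`, where 4 bytes is one float32 and the 1.5 factor is an allowance for index and metadata overhead. The sketch below applies that estimate on top of the base figures listed above; treat the result as a rough planning number, not a guarantee.

```javascript
// Base memory figures from the list above, in MB
const BASE_MB = { qdrant: 512, postgresql: 256, meilisearch: 512 };

// Rough RAM estimate for vectors held in memory.
// 4 bytes per float32 component; overhead 1.5 is a rule-of-thumb assumption.
function estimateVectorMemoryBytes(numVectors, dimensions, overhead = 1.5) {
  return numVectors * dimensions * 4 * overhead;
}

// Example: one million 768-dimensional chunk embeddings
const vectorBytes = estimateVectorMemoryBytes(1_000_000, 768);
const vectorMB = vectorBytes / (1024 * 1024);
const baseMB = BASE_MB.qdrant + BASE_MB.postgresql + BASE_MB.meilisearch;
console.log(`vectors: ~${Math.round(vectorMB)} MB, services base: ${baseMB} MB`);
```

At that scale the vector data dominates the base footprint, which is why halving the embedding dimension roughly halves Qdrant's share of the memory.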

Performance Tips

Qdrant

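  • Match the collection's vector size to your embedding model's output dimension (for example, 1536 for OpenAI text-embedding-3-small, 768 for nomic-embed-text, 384 for all-MiniLM-L6-v2)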
  • Create payload indexes on filtered fields
  • Batch operations when ingesting large datasets
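
The last two tips can be sketched as request builders. The first describes a payload index on a filtered field (here `source`, which the search example filters on); the second groups points so that each upsert request carries many points instead of one. The helper names and the batch size of 100 are assumptions for illustration, and the code only builds request descriptions rather than sending them.

```javascript
// Request description for creating a payload index on a filtered field.
function payloadIndexRequest(collection, fieldName, fieldSchema = "keyword") {
  return {
    method: "PUT",
    url: `http://qdrant:6333/collections/${collection}/index`,
    body: { field_name: fieldName, field_schema: fieldSchema },
  };
}

// Split an array of points into batches of at most batchSize items.
function toBatches(items, batchSize = 100) {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// One upsert request per batch, using the same endpoint as the examples above
function upsertRequests(collection, points, batchSize = 100) {
  return toBatches(points, batchSize).map((batch) => ({
    method: "PUT",
    url: `http://qdrant:6333/collections/${collection}/points`,
    body: { points: batch },
  }));
}
```

For 1,000 chunks and the default batch size this yields 10 requests instead of 1,000, which mainly saves HTTP round trips.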

PostgreSQL

  • Create indexes on frequently queried columns
  • Use JSONB for flexible metadata storage
  • Enable connection pooling (pgBouncer)

Meilisearch

  • Configure searchableAttributes to only necessary fields
  • Use filterableAttributes for faceted search
  • Set appropriate ranking rules for your use case
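
As an illustration of these tips, the sketch below builds a settings body and a faceted search body for the `documents` index. The attribute lists are the ones from the settings example earlier in this guide; the ranking rules are Meilisearch's documented defaults with an optional custom rule appended. Preferring recent documents via `created_at:desc` is an assumption about what a use case might want, and both function names are hypothetical.

```javascript
// Meilisearch's default ranking rules, in their default order
const DEFAULT_RANKING_RULES = ["words", "typo", "proximity", "attribute", "sort", "exactness"];

// Settings body for PATCH /indexes/documents/settings.
// The order of searchableAttributes also sets attribute importance.
function buildIndexSettings({ preferRecent = false } = {}) {
  return {
    searchableAttributes: ["title", "content", "tags"],
    filterableAttributes: ["content_type", "created_at", "tags"],
    sortableAttributes: ["created_at", "title"],
    // A custom rule at the end only breaks ties left by the built-in rules
    rankingRules: preferRecent
      ? [...DEFAULT_RANKING_RULES, "created_at:desc"]
      : [...DEFAULT_RANKING_RULES],
  };
}

// Search body that filters by tag and asks for facet counts.
// Assumes tag values contain no double quotes.
function buildFacetedSearch(query, tags = []) {
  return {
    q: query,
    filter: tags.map((tag) => `tags = "${tag}"`), // array entries are ANDed
    facets: ["tags", "content_type"],
  };
}
```

Facets can only be requested for attributes listed in `filterableAttributes`, which is why `tags` and `content_type` appear in both places.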

Hybrid Search Strategy

Combine all three for best results:
// meilisearch, qdrant, generateEmbedding, mergeResults, and fetchMetadata
// are placeholders for your own client wrappers and helpers.
async function hybridSearch(query) {
  // 1. Keyword search (fast, exact matches)
  const keywordResults = await meilisearch.search(query);

  // 2. Semantic search (understands meaning)
  const embedding = await generateEmbedding(query);
  const semanticResults = await qdrant.search(embedding);

  // 3. Merge and rank results
  const combined = mergeResults(keywordResults, semanticResults);

  // 4. Fetch full metadata from PostgreSQL
  return fetchMetadata(combined);
}
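
The merge step is left open above. One common choice is Reciprocal Rank Fusion (RRF), which combines the two result lists by rank position only, so Meilisearch and Qdrant scores never need to be put on the same scale. The sketch below is one possible implementation, not the pack's own: `reciprocalRankFusion` is a hypothetical name, it expects ordered lists of document IDs (Qdrant chunk hits would first be mapped to `payload.document_id`), and `k = 60` is the conventional default rather than a tuned value.

```javascript
// Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank),
// with rank starting at 1 in each list.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    // Keep only the first (best) occurrence of each ID within a list
    const uniqueIds = [...new Set(list)];
    uniqueIds.forEach((id, index) => {
      const previous = scores.get(id) ?? 0;
      scores.set(id, previous + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Usage: "b" appears in both lists, so it ranks first
const keywordIds = ["a", "b", "c"];
const semanticIds = ["b", "d"];
console.log(reciprocalRankFusion([keywordIds, semanticIds]).map((r) => r.id));
```

Documents found by both engines accumulate two contributions and rise to the top, while documents found by only one engine still remain in the merged list.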

Next Steps

Local AI Pack

Add Ollama for generating embeddings

Research Agent Pack

Add web scraping and search
