ClinicalPilot uses LanceDB as its embedded vector database for Retrieval-Augmented Generation (RAG). This enables agents to cite medical literature and ground their reasoning in evidence.

Why LanceDB?

  • Serverless: No separate database server required
  • Embedded: Runs in-process with your application
  • Fast: Native vector search with disk-based persistence
  • Zero cost: No cloud infrastructure or API charges

Architecture

Query
  ↓
Embeddings (all-MiniLM-L6-v2)
  ↓
LanceDB Vector Search
  ↓
Top-K Similar Documents
  ↓
Literature Agent (GPT-4o-mini)

Installation

LanceDB is included in requirements.txt, so no additional installation is needed. To install it manually:
pip install lancedb sentence-transformers

Initialization

1. Auto-Initialization

LanceDB auto-creates the vector store on first run:
python -m uvicorn backend.main:app --reload
Database path: data/lancedb/ (configurable via LANCEDB_PATH in .env)
2. Manual Initialization

To explicitly initialize the database:
python -m backend.rag.lancedb_store --init
Output:
Initializing LanceDB...
✓ LanceDB initialized at /path/to/clinicalpilot/data/lancedb
3. Verify Setup

Check that the database was created:
ls -la data/lancedb/
You should see:
medical_knowledge.lance/

Embedding Model

ClinicalPilot uses all-MiniLM-L6-v2 from sentence-transformers:
  • Dimension: 384
  • Model size: ~80MB
  • Speed: ~1000 sentences/second on CPU
  • Quality: Optimized for semantic similarity
The embedding model downloads automatically on first use (~80MB). Subsequent runs use the cached model.
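Retrieval ranks chunks by the similarity between their embeddings and the query embedding. A minimal, standalone sketch of cosine similarity (pure Python on toy vectors; the real store compares 384-dimensional vectors, but the math is identical):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Higher scores mean closer semantic matches; `search()` returns the top-k chunks by this ranking.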

Ingesting Documents

From Directory

Ingest all .txt, .md, and .pdf files from a directory:
python -m backend.rag.lancedb_store --ingest /path/to/documents
Example:
# Create a directory for medical literature
mkdir -p data/rag_documents

# Add your medical PDFs, guidelines, etc.
cp ~/Downloads/clinical_guidelines.pdf data/rag_documents/

# Ingest
python -m backend.rag.lancedb_store --ingest data/rag_documents/
Output:
✓ Ingested 127 chunks from data/rag_documents/

Programmatic Ingestion

Add documents from Python code:
from backend.rag.lancedb_store import add_documents

texts = [
    "Aspirin 81mg daily reduces cardiovascular risk in patients with CAD.",
    "Beta-blockers are contraindicated in severe asthma.",
]

sources = ["AHA Guidelines 2024", "GINA Guidelines 2024"]
categories = ["cardiology", "pulmonology"]

count = add_documents(texts, sources, categories)
print(f"Added {count} documents")

Document Chunking

Documents are automatically split into 500-character chunks with 50-character overlap to preserve context.
Chunking ensures that embedding quality remains high and search results are granular. A 10-page PDF might generate 50-100 chunks.

Chunking Parameters

Edit backend/rag/lancedb_store.py to customize:
def _chunk_text(text: str, chunk_size: int = 500, overlap: int = 50):
    # Adjust chunk_size and overlap as needed
    ...
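As a rough illustration of the sliding-window chunking described above (a standalone sketch, not the project's actual implementation), each chunk starts chunk_size − overlap characters after the previous one:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of chunk_size characters,
    stepping forward by chunk_size - overlap each time."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# A 25,000-character document (roughly 10 pages) yields 56 chunks
# with the default 500/50 parameters.
print(len(chunk_text("x" * 25_000)))  # 56
```

This matches the 50-100 chunks per 10-page PDF estimate above: with a 450-character effective stride, chunk count is about len(text) / 450.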

Searching the Vector Store

From Python

from backend.rag.lancedb_store import search

results = search(
    query="What are the contraindications for beta-blockers?",
    top_k=5
)

for result in results:
    print(f"Score: {result['score']:.3f}")
    print(f"Source: {result['source']}")
    print(f"Text: {result['text'][:100]}...\n")

From API

The Literature Agent automatically searches LanceDB when processing cases. No manual API calls needed.

Configuration

Environment Variables

# .env
LANCEDB_PATH=data/lancedb

Absolute Path

You can use an absolute path:
LANCEDB_PATH=/var/lib/clinicalpilot/lancedb
Relative paths are resolved from the project root.
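A sketch of how this resolution rule could be implemented (the LANCEDB_PATH variable comes from the project; the helper function below is illustrative, not the project's actual code):

```python
import os
from pathlib import Path

def resolve_lancedb_path(project_root: Path) -> Path:
    """Resolve LANCEDB_PATH: absolute paths are used as-is,
    relative paths are resolved from the project root."""
    raw = os.environ.get("LANCEDB_PATH", "data/lancedb")
    path = Path(raw)
    return path if path.is_absolute() else project_root / path

root = Path("/opt/clinicalpilot")
# e.g. /opt/clinicalpilot/data/lancedb when LANCEDB_PATH is unset
print(resolve_lancedb_path(root))
```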

Schema

LanceDB table schema:
Field    | Type        | Description
---------|-------------|--------------------------------------------
text     | str         | Document chunk text
source   | str         | Source file or reference
category | str         | Category (e.g., "cardiology", "pharmacology")
vector   | list[float] | 384-dimensional embedding
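A minimal sketch of a record matching this schema, with a dimension check before insertion (illustrative only; the actual table is managed by backend/rag/lancedb_store.py):

```python
EMBEDDING_DIM = 384  # all-MiniLM-L6-v2 output width

def make_row(text: str, source: str, category: str, vector: list[float]) -> dict:
    """Build a record matching the table schema, validating the vector width."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-dim vector, got {len(vector)}")
    return {"text": text, "source": source, "category": category, "vector": vector}

row = make_row("Aspirin 81mg daily reduces cardiovascular risk.",
               "AHA Guidelines 2024", "cardiology", [0.0] * 384)
print(sorted(row.keys()))  # ['category', 'source', 'text', 'vector']
```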

Performance

Search Latency

  • 10K documents: ~10-20ms
  • 100K documents: ~50-100ms
  • 1M documents: ~200-500ms
LanceDB uses disk-based indices, so performance scales well even with large datasets. RAM usage remains low (~100-200MB).

Embedding Latency

  • Single query: ~5-10ms (CPU)
  • Batch (100 docs): ~500ms (CPU)
  • Batch (1000 docs): ~5s (CPU)
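The batch figures above work out to roughly 200 documents per second on CPU. A quick back-of-the-envelope estimate for ingestion time (using only the approximate timings quoted above):

```python
def estimate_embed_seconds(num_chunks: int, docs_per_second: float = 200.0) -> float:
    """Rough batch-embedding time, from the ~5 s per 1000 docs CPU figure."""
    return num_chunks / docs_per_second

# Embedding a 100K-chunk corpus on CPU takes on the order of 500 s (~8 minutes).
print(estimate_embed_seconds(100_000))  # 500.0
```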

Integration with Agents

The Literature Agent (backend/agents/literature.py) automatically queries LanceDB:
from backend.rag.lancedb_store import search

# Inside Literature Agent
rag_results = search(patient_context.chief_complaint, top_k=5)

# Agent uses RAG results + PubMed citations
prompt = f"""
Evidence from local knowledge base:
{rag_results}

Evidence from PubMed:
{pubmed_results}

Provide clinical recommendations...
"""

Best Practices

Do not commit the data/lancedb/ directory to Git; it can grow to several GB, so add it to .gitignore.
High-value content to ingest:
  1. Clinical Practice Guidelines (AHA, ACC, ACCP, etc.)
  2. Pharmacology References (DrugBank extracts, FDA labels)
  3. Differential Diagnosis Tables
  4. Your Organization’s Protocols (internal SOPs, pathways)

Avoiding Hallucinations

  • Cite sources: Always include source metadata so agents can reference guidelines
  • Keep chunks focused: Don’t mix unrelated topics in the same document
  • Update regularly: Medical knowledge changes — refresh your RAG store quarterly

Backup and Restore

Backup

tar -czf lancedb_backup.tar.gz data/lancedb/

Restore

tar -xzf lancedb_backup.tar.gz

Troubleshooting

"Table not found" Error

# Re-initialize the database
python -m backend.rag.lancedb_store --init

Search Returns No Results

Check that documents were ingested:
from backend.rag.lancedb_store import get_table

table = get_table()
print(f"Total documents: {table.count_rows()}")

Embedding Model Download Fails

# Pre-download the model
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"

"lancedb not installed" Warning

pip install lancedb --upgrade

Advanced: Custom Embeddings

To use OpenAI embeddings instead of sentence-transformers:
# backend/rag/embeddings.py
import openai

def embed_texts(texts: list[str]) -> list[list[float]]:
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [e.embedding for e in response.data]
OpenAI embeddings have 1536 dimensions (vs 384 for all-MiniLM-L6-v2). You’ll need to re-create the LanceDB table and re-ingest all documents.
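Because the table's vector column has a fixed width, mixing embedders silently breaks search. One defensive pattern (illustrative; the model names and dimensions are the ones stated above) is to refuse queries whose embedder doesn't match the table's:

```python
# Dimensions stated in this guide: MiniLM = 384, text-embedding-3-small = 1536.
KNOWN_DIMS = {"all-MiniLM-L6-v2": 384, "text-embedding-3-small": 1536}

def check_compatible(table_model: str, query_model: str) -> None:
    """Raise if query embeddings won't match the table's vector width."""
    if KNOWN_DIMS[table_model] != KNOWN_DIMS[query_model]:
        raise ValueError(
            f"{query_model} produces {KNOWN_DIMS[query_model]}-dim vectors, "
            f"but the table stores {KNOWN_DIMS[table_model]}-dim vectors; "
            "re-create the table and re-ingest."
        )

check_compatible("all-MiniLM-L6-v2", "all-MiniLM-L6-v2")  # OK, same model
```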

Next Steps

Observability

Monitor RAG retrieval quality with LangSmith tracing

Testing

Write tests to validate RAG search results
