
Overview

This guide walks through building a production-ready RAG (Retrieval-Augmented Generation) application using SolVec. You’ll learn how to:
  • Load and chunk documents
  • Generate embeddings
  • Store vectors in SolVec
  • Retrieve relevant context
  • Generate responses with an LLM
  • Verify data integrity on-chain
RAG combines the precision of semantic search with the reasoning of large language models. SolVec adds a third dimension: cryptographic verifiability of your knowledge base.
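
Under the hood, "semantic search" reduces to nearest-neighbor lookup over embedding vectors under a similarity metric. As a minimal illustration (not SolVec's internal implementation), cosine similarity in TypeScript looks like:

```typescript
// Cosine similarity between two equal-length vectors:
// 1 = same direction, 0 = orthogonal (unrelated), -1 = opposite.
// Illustrative only; SolVec computes this internally.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The collection created later in this guide uses `metric: 'cosine'`, so higher scores mean more semantically similar chunks.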

Architecture

Documents are chunked, embedded, and upserted into a SolVec collection; at query time the question is embedded, the nearest chunks are retrieved, and an LLM generates the answer. Each write anchors the collection state on Solana for verification.

Prerequisites

  • Node.js 18+ or Python 3.10+
  • OpenAI API key
  • Solana wallet (optional, for verification)

Installation

npm install solvec@alpha openai

Step-by-Step Implementation

1. Document Loading & Chunking

import * as fs from 'fs';

interface Chunk {
  id: string;
  text: string;
  metadata: {
    source: string;
    chunkIndex: number;
    totalChunks: number;
  };
}

function chunkDocument(
  text: string,
  chunkSize: number = 1000,
  overlap: number = 200
): Chunk[] {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }

  const chunks: Chunk[] = [];
  let start = 0;
  let chunkIndex = 0;

  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);

    chunks.push({
      id: `chunk_${chunkIndex}`,
      text: text.slice(start, end),
      metadata: {
        source: 'documentation',
        chunkIndex,
        totalChunks: 0,  // updated below
      },
    });

    chunkIndex++;
    if (end === text.length) break;  // avoid a final chunk that is pure overlap
    start += chunkSize - overlap;
  }

  // Update totalChunks
  chunks.forEach(chunk => {
    chunk.metadata.totalChunks = chunks.length;
  });

  return chunks;
}

// Load document
const docText = fs.readFileSync('./data/veclabs-docs.txt', 'utf-8');
const chunks = chunkDocument(docText, 1000, 200);
console.log(`Split into ${chunks.length} chunks`);

2. Generate Embeddings

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function embedChunks(chunks: Chunk[]): Promise<number[][]> {
  const texts = chunks.map(c => c.text);

  // Batch embed for efficiency (max 2048 per request)
  const BATCH_SIZE = 2048;
  const embeddings: number[][] = [];

  for (let i = 0; i < texts.length; i += BATCH_SIZE) {
    const batch = texts.slice(i, i + BATCH_SIZE);

    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',  // 1536 dims
      input: batch,
    });

    embeddings.push(...response.data.map(e => e.embedding));
    console.log(`Embedded ${embeddings.length}/${texts.length} chunks`);
  }

  return embeddings;
}

const embeddings = await embedChunks(chunks);

3. Store in SolVec

import { SolVec } from 'solvec';

const sv = new SolVec({
  network: 'devnet',
  walletPath: process.env.SOLANA_WALLET,  // optional
});

const collection = sv.collection('rag-knowledge-base', {
  dimensions: 1536,
  metric: 'cosine',
});

// Upsert chunks with embeddings
await collection.upsert(
  chunks.map((chunk, i) => ({
    id: chunk.id,
    values: embeddings[i],
    metadata: {
      text: chunk.text,
      ...chunk.metadata,
    },
  }))
);

console.log(`Stored ${chunks.length} chunks in SolVec`);

// Verify on-chain
const proof = await collection.verify();
console.log('Explorer:', proof.solanaExplorerUrl);

4. Retrieval Function

async function retrieve(
  query: string,
  topK: number = 5
): Promise<string[]> {
  // Embed query
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });

  const queryEmbedding = response.data[0].embedding;

  // Search SolVec
  const results = await collection.query({
    vector: queryEmbedding,
    topK,
    includeMetadata: true,
  });

  // Extract text from top matches
  return results.matches.map(
    match => match.metadata?.text as string
  );
}

5. RAG Answer Generation

async function answerQuestion(question: string): Promise<string> {
  // 1. Retrieve relevant context
  const context = await retrieve(question, 3);

  // 2. Build prompt
  const systemPrompt = `You are a helpful AI assistant.
Answer the question based on the following context.

Context:
${context.map((c, i) => `[${i + 1}] ${c}`).join('\n\n')}

If the context doesn't contain enough information, say so.`;

  // 3. Generate answer
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: question },
    ],
    temperature: 0.3,
  });

  return completion.choices[0].message.content || 'No answer generated.';
}

// Example usage
const answer = await answerQuestion(
  'What makes VecLabs different from Pinecone?'
);
console.log('Answer:', answer);

Full Working Example

import { SolVec } from 'solvec';
import OpenAI from 'openai';
import * as fs from 'fs';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const sv = new SolVec({ network: 'devnet' });
const collection = sv.collection('docs', { dimensions: 1536 });

async function buildRAG() {
  // 1. Load & chunk document
  const text = fs.readFileSync('./veclabs-README.md', 'utf-8');
  const chunks = chunkDocument(text, 1000, 200);
  console.log(`✓ Split into ${chunks.length} chunks`);

  // 2. Embed chunks
  const embeddings = await embedChunks(chunks);
  console.log(`✓ Generated ${embeddings.length} embeddings`);

  // 3. Store in SolVec
  await collection.upsert(
    chunks.map((chunk, i) => ({
      id: chunk.id,
      values: embeddings[i],
      metadata: { text: chunk.text },
    }))
  );
  console.log(`✓ Stored in SolVec`);

  // 4. Verify on-chain
  const proof = await collection.verify();
  console.log(`✓ Verified: ${proof.solanaExplorerUrl}`);
}

async function queryRAG() {
  const questions = [
    'What is VecLabs?',
    'How fast is query latency?',
    'How does verification work?',
  ];

  for (const q of questions) {
    console.log(`\nQ: ${q}`);
    const answer = await answerQuestion(q);
    console.log(`A: ${answer}`);
  }
}

// Run
await buildRAG();
await queryRAG();

Expected Output:
✓ Split into 47 chunks
✓ Generated 47 embeddings
[SolVec] Upserted 47 vectors to collection 'docs'
✓ Stored in SolVec
✓ Verified: https://explorer.solana.com/address/8iLpy...?cluster=devnet

Q: What is VecLabs?
A: VecLabs is a decentralized vector database built on Solana. It provides
sub-5ms query latency using a Rust HNSW implementation, with encrypted storage
and on-chain verification of data integrity.

Q: How fast is query latency?
A: VecLabs achieves sub-5ms p99 latency at 100K vectors, with p50 at 1.9ms
and p95 at 2.8ms. This is 5-6x faster than Pinecone and Weaviate.

Q: How does verification work?
A: After every write, SolVec computes a SHA-256 Merkle root of all vector IDs
and posts it to Solana. Anyone can verify that the local collection matches
the on-chain root, proving the data hasn't been tampered with.
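
The Merkle-root scheme described in that last answer can be sketched as follows. This illustrates the idea only; SolVec's actual leaf encoding, ordering, and odd-node handling are assumptions here:

```typescript
import { createHash } from 'crypto';

function sha256(data: string): string {
  return createHash('sha256').update(data).digest('hex');
}

// Illustrative Merkle root over vector IDs. Leaf order and odd-node
// duplication are assumptions, not SolVec's documented scheme.
function merkleRoot(ids: string[]): string {
  if (ids.length === 0) return sha256('');
  let level = ids.map(sha256);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i]; // duplicate last node on odd levels
      next.push(sha256(level[i] + right));
    }
    level = next;
  }
  return level[0];
}
```

Because the root changes if any ID is added, removed, or altered, comparing a locally computed root against the on-chain value detects tampering.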

Advanced Features

Metadata Filtering

Filter results by document properties:
const results = await collection.query({
  vector: queryEmbedding,
  topK: 5,
  filter: {
    source: 'documentation',
    version: '0.1.0',
  },
});

Incremental Updates

Add new documents without rebuilding the entire index:
// Add new chunks
const newChunks = chunkDocument(newDocument, 1000, 200);
const newEmbeddings = await embedChunks(newChunks);

await collection.upsert(
  newChunks.map((chunk, i) => ({
    id: `new_${chunk.id}`,
    values: newEmbeddings[i],
    metadata: { text: chunk.text },
  }))
);

// Verify updated state
const proof = await collection.verify();
console.log('Updated root:', proof.localRoot.slice(0, 16) + '...');

Streaming Responses

async function streamAnswer(question: string) {
  const context = await retrieve(question, 3);

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: buildPrompt(context) },
      { role: 'user', content: question },
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

// buildPrompt assembles the same system prompt used in answerQuestion
function buildPrompt(context: string[]): string {
  return `You are a helpful AI assistant.
Answer the question based on the following context.

Context:
${context.map((c, i) => `[${i + 1}] ${c}`).join('\n\n')}

If the context doesn't contain enough information, say so.`;
}

Performance Benchmarks

Operation            Time (ms)   Notes
Chunk 100KB doc      ~15         RecursiveCharacterTextSplitter
Embed 100 chunks     ~1200       OpenAI text-embedding-3-small
Upsert 100 vectors   ~45         In-memory HNSW
Query (top-5)        1.9         p50 latency
Full RAG pipeline    ~1500       Including LLM generation

Measured on Apple M2, 16GB RAM.

Next Steps

  • AI Agent Memory: add persistent memory to conversational agents
  • LangChain Integration: use SolVec with LangChain for advanced RAG
  • API Reference: complete SDK documentation
  • Verification Guide: learn how on-chain verification works

Production Checklist

Alpha limitations: Shadow Drive persistence is in progress. Vectors are currently stored in-memory.

1. Enable persistence: wait for Shadow Drive integration (coming soon) or implement custom persistence.
2. Optimize chunking: experiment with chunk size (500-2000) and overlap (10-20%) for your use case.
3. Add error handling: wrap all API calls in try/catch for network failures and rate limits.
4. Monitor verification: periodically call collection.verify() to detect data corruption.
5. Implement caching: cache embeddings and query results to reduce API costs.
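
Checklist item 3 (error handling) can be implemented as a small retry wrapper with exponential backoff. The attempt count and delays below are illustrative defaults, not SolVec or OpenAI recommendations:

```typescript
// Retry an async call with exponential backoff: waits baseDelayMs, then 2x,
// then 4x between attempts, rethrowing the last error if all attempts fail.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Usage: wrap any network call from this guide, e.g.
// const embeddings = await withRetry(() => embedChunks(chunks));
```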
For large document collections (>10K chunks), batch upserts in groups of 500-1000 vectors for optimal performance.
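
The batching tip above can be sketched like this; the record shape mirrors the upsert calls earlier in the guide, and the 500-record default follows the suggested 500-1000 range:

```typescript
interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, unknown>;
}

// Upsert records in fixed-size batches so no single request carries the
// whole collection. Batches are sent sequentially, in order.
async function batchUpsert(
  collection: { upsert(records: VectorRecord[]): Promise<void> },
  records: VectorRecord[],
  batchSize: number = 500
): Promise<void> {
  for (let i = 0; i < records.length; i += batchSize) {
    await collection.upsert(records.slice(i, i + batchSize));
  }
}
```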
