
Overview

This guide walks through building a production-ready RAG (Retrieval-Augmented Generation) application using SolVec. You’ll learn how to:
  • Load and chunk documents
  • Generate embeddings
  • Store vectors in SolVec
  • Retrieve relevant context
  • Generate responses with an LLM
  • Verify data integrity on-chain
RAG combines the precision of semantic search with the reasoning of large language models. SolVec adds a third dimension: cryptographic verifiability of your knowledge base.
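
Under the hood, "semantic search" reduces to nearest-neighbor lookup over embedding vectors under a similarity metric. As a minimal illustration (not SolVec's internal implementation), cosine similarity in TypeScript looks like:

```typescript
// Cosine similarity between two equal-length vectors:
// 1 = same direction, 0 = orthogonal (unrelated), -1 = opposite.
// Illustrative only; SolVec computes this internally.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The collection created later in this guide uses `metric: 'cosine'`, so higher scores mean more semantically similar chunks.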

Architecture

Documents are chunked, embedded, and upserted into a SolVec collection; at query time the question is embedded, the nearest chunks are retrieved, and an LLM generates the answer. Each write anchors the collection state on Solana for verification.

Prerequisites

  • Node.js 18+ or Python 3.10+
  • OpenAI API key
  • Solana wallet (optional, for verification)

Installation

npm install solvec@alpha openai

Step-by-Step Implementation

1. Document Loading & Chunking

import * as fs from 'fs';

interface Chunk {
  id: string;
  text: string;
  metadata: {
    source: string;
    chunkIndex: number;
    totalChunks: number;
  };
}

function chunkDocument(
  text: string,
  chunkSize: number = 1000,
  overlap: number = 200
): Chunk[] {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }

  const chunks: Chunk[] = [];
  let start = 0;
  let chunkIndex = 0;

  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);

    chunks.push({
      id: `chunk_${chunkIndex}`,
      text: text.slice(start, end),
      metadata: {
        source: 'documentation',
        chunkIndex,
        totalChunks: 0,  // updated below
      },
    });

    chunkIndex++;
    if (end === text.length) break;  // avoid a final chunk that is pure overlap
    start += chunkSize - overlap;
  }

  // Update totalChunks
  chunks.forEach(chunk => {
    chunk.metadata.totalChunks = chunks.length;
  });

  return chunks;
}

// Load document
const docText = fs.readFileSync('./data/veclabs-docs.txt', 'utf-8');
const chunks = chunkDocument(docText, 1000, 200);
console.log(`Split into ${chunks.length} chunks`);

2. Generate Embeddings

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function embedChunks(chunks: Chunk[]): Promise<number[][]> {
  const texts = chunks.map(c => c.text);

  // Batch embed for efficiency (max 2048 per request)
  const BATCH_SIZE = 2048;
  const embeddings: number[][] = [];

  for (let i = 0; i < texts.length; i += BATCH_SIZE) {
    const batch = texts.slice(i, i + BATCH_SIZE);

    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',  // 1536 dims
      input: batch,
    });

    embeddings.push(...response.data.map(e => e.embedding));
    console.log(`Embedded ${embeddings.length}/${texts.length} chunks`);
  }

  return embeddings;
}

const embeddings = await embedChunks(chunks);

3. Store in SolVec

import { SolVec } from 'solvec';

const sv = new SolVec({
  network: 'devnet',
  walletPath: process.env.SOLANA_WALLET,  // optional
});

const collection = sv.collection('rag-knowledge-base', {
  dimensions: 1536,
  metric: 'cosine',
});

// Upsert chunks with embeddings
await collection.upsert(
  chunks.map((chunk, i) => ({
    id: chunk.id,
    values: embeddings[i],
    metadata: {
      text: chunk.text,
      ...chunk.metadata,
    },
  }))
);

console.log(`Stored ${chunks.length} chunks in SolVec`);

// Verify on-chain
const proof = await collection.verify();
console.log('Explorer:', proof.solanaExplorerUrl);

4. Retrieval Function

async function retrieve(
  query: string,
  topK: number = 5
): Promise<string[]> {
  // Embed query
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });

  const queryEmbedding = response.data[0].embedding;

  // Search SolVec
  const results = await collection.query({
    vector: queryEmbedding,
    topK,
    includeMetadata: true,
  });

  // Extract text from top matches
  return results.matches.map(
    match => match.metadata?.text as string
  );
}

5. RAG Answer Generation

async function answerQuestion(question: string): Promise<string> {
  // 1. Retrieve relevant context
  const context = await retrieve(question, 3);

  // 2. Build prompt
  const systemPrompt = `You are a helpful AI assistant.
Answer the question based on the following context.

Context:
${context.map((c, i) => `[${i + 1}] ${c}`).join('\n\n')}

If the context doesn't contain enough information, say so.`;

  // 3. Generate answer
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: question },
    ],
    temperature: 0.3,
  });

  return completion.choices[0].message.content || 'No answer generated.';
}

// Example usage
const answer = await answerQuestion(
  'What makes VecLabs different from Pinecone?'
);
console.log('Answer:', answer);

Full Working Example

import { SolVec } from 'solvec';
import OpenAI from 'openai';
import * as fs from 'fs';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const sv = new SolVec({ network: 'devnet' });
const collection = sv.collection('docs', { dimensions: 1536 });

async function buildRAG() {
  // 1. Load & chunk document
  const text = fs.readFileSync('./veclabs-README.md', 'utf-8');
  const chunks = chunkDocument(text, 1000, 200);
  console.log(`✓ Split into ${chunks.length} chunks`);

  // 2. Embed chunks
  const embeddings = await embedChunks(chunks);
  console.log(`✓ Generated ${embeddings.length} embeddings`);

  // 3. Store in SolVec
  await collection.upsert(
    chunks.map((chunk, i) => ({
      id: chunk.id,
      values: embeddings[i],
      metadata: { text: chunk.text },
    }))
  );
  console.log(`✓ Stored in SolVec`);

  // 4. Verify on-chain
  const proof = await collection.verify();
  console.log(`✓ Verified: ${proof.solanaExplorerUrl}`);
}

async function queryRAG() {
  const questions = [
    'What is VecLabs?',
    'How fast is query latency?',
    'How does verification work?',
  ];

  for (const q of questions) {
    console.log(`\nQ: ${q}`);
    const answer = await answerQuestion(q);
    console.log(`A: ${answer}`);
  }
}

// Run
await buildRAG();
await queryRAG();

Expected Output:
✓ Split into 47 chunks
✓ Generated 47 embeddings
[SolVec] Upserted 47 vectors to collection 'docs'
✓ Stored in SolVec
✓ Verified: https://explorer.solana.com/address/8iLpy...?cluster=devnet

Q: What is VecLabs?
A: VecLabs is a decentralized vector database built on Solana. It provides
sub-5ms query latency using a Rust HNSW implementation, with encrypted storage
and on-chain verification of data integrity.

Q: How fast is query latency?
A: VecLabs achieves sub-5ms p99 latency at 100K vectors, with p50 at 1.9ms
and p95 at 2.8ms. This is 5-6x faster than Pinecone and Weaviate.

Q: How does verification work?
A: After every write, SolVec computes a SHA-256 Merkle root of all vector IDs
and posts it to Solana. Anyone can verify that the local collection matches
the on-chain root, proving the data hasn't been tampered with.
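
The Merkle-root scheme described in that last answer can be sketched as follows. This illustrates the idea only; SolVec's actual leaf encoding, ordering, and odd-node handling are assumptions here:

```typescript
import { createHash } from 'crypto';

function sha256(data: string): string {
  return createHash('sha256').update(data).digest('hex');
}

// Illustrative Merkle root over vector IDs. Leaf order and odd-node
// duplication are assumptions, not SolVec's documented scheme.
function merkleRoot(ids: string[]): string {
  if (ids.length === 0) return sha256('');
  let level = ids.map(sha256);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i]; // duplicate last node on odd levels
      next.push(sha256(level[i] + right));
    }
    level = next;
  }
  return level[0];
}
```

Because the root changes if any ID is added, removed, or altered, comparing a locally computed root against the on-chain value detects tampering.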

Advanced Features

Metadata Filtering

Filter results by document properties:
const results = await collection.query({
  vector: queryEmbedding,
  topK: 5,
  filter: {
    source: 'documentation',
    version: '0.1.0',
  },
});

Incremental Updates

Add new documents without rebuilding the entire index:
// Add new chunks
const newChunks = chunkDocument(newDocument, 1000, 200);
const newEmbeddings = await embedChunks(newChunks);

await collection.upsert(
  newChunks.map((chunk, i) => ({
    id: `new_${chunk.id}`,
    values: newEmbeddings[i],
    metadata: { text: chunk.text },
  }))
);

// Verify updated state
const proof = await collection.verify();
console.log('Updated root:', proof.localRoot.slice(0, 16) + '...');

Streaming Responses

async function streamAnswer(question: string) {
  const context = await retrieve(question, 3);

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: buildPrompt(context) },
      { role: 'user', content: question },
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

// buildPrompt assembles the same system prompt used in answerQuestion
function buildPrompt(context: string[]): string {
  return `You are a helpful AI assistant.
Answer the question based on the following context.

Context:
${context.map((c, i) => `[${i + 1}] ${c}`).join('\n\n')}

If the context doesn't contain enough information, say so.`;
}

Performance Benchmarks

Operation            Time (ms)   Notes
Chunk 100KB doc      ~15         RecursiveCharacterTextSplitter
Embed 100 chunks     ~1200       OpenAI text-embedding-3-small
Upsert 100 vectors   ~45         In-memory HNSW
Query (top-5)        1.9         p50 latency
Full RAG pipeline    ~1500       Including LLM generation

Measured on Apple M2, 16GB RAM.

Next Steps

  • AI Agent Memory: add persistent memory to conversational agents
  • LangChain Integration: use SolVec with LangChain for advanced RAG
  • API Reference: complete SDK documentation
  • Verification Guide: learn how on-chain verification works

Production Checklist

Alpha limitations: Shadow Drive persistence is in progress. Vectors are currently stored in-memory.

1. Enable persistence: wait for Shadow Drive integration (coming soon) or implement custom persistence.
2. Optimize chunking: experiment with chunk size (500-2000) and overlap (10-20%) for your use case.
3. Add error handling: wrap all API calls in try/catch for network failures and rate limits.
4. Monitor verification: periodically call collection.verify() to detect data corruption.
5. Implement caching: cache embeddings and query results to reduce API costs.
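
Checklist item 3 (error handling) can be implemented as a small retry wrapper with exponential backoff. The attempt count and delays below are illustrative defaults, not SolVec or OpenAI recommendations:

```typescript
// Retry an async call with exponential backoff: waits baseDelayMs, then 2x,
// then 4x between attempts, rethrowing the last error if all attempts fail.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Usage: wrap any network call from this guide, e.g.
// const embeddings = await withRetry(() => embedChunks(chunks));
```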
For large document collections (>10K chunks), batch upserts in groups of 500-1000 vectors for optimal performance.
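
The batching tip above can be sketched like this; the record shape mirrors the upsert calls earlier in the guide, and the 500-record default follows the suggested 500-1000 range:

```typescript
interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, unknown>;
}

// Upsert records in fixed-size batches so no single request carries the
// whole collection. Batches are sent sequentially, in order.
async function batchUpsert(
  collection: { upsert(records: VectorRecord[]): Promise<void> },
  records: VectorRecord[],
  batchSize: number = 500
): Promise<void> {
  for (let i = 0; i < records.length; i += batchSize) {
    await collection.upsert(records.slice(i, i + batchSize));
  }
}
```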
