
Overview

Integrate SolVec with LangChain to build RAG (Retrieval-Augmented Generation) applications with on-chain verifiable vector storage. This guide shows how to create a custom LangChain vector store backed by SolVec.
Status: LangChain integration is planned but not yet published to npm. This guide shows the conceptual implementation and will be updated when the official package is released.

Why LangChain + SolVec?

Familiar API

Drop-in replacement for Pinecone, Weaviate, or Chroma

On-Chain Provenance

Every document store is verifiable on Solana

88% Cheaper

~$8/month vs ~$70/month for Pinecone s1

Data Ownership

Vectors encrypted with your wallet key

Installation

1. Install dependencies

npm install solvec@alpha langchain @langchain/openai

2. Set environment variables

export OPENAI_API_KEY="sk-..."
export SOLANA_WALLET="~/.config/solana/id.json"  # optional

Custom Vector Store Implementation

1. Create SolVecStore Class

Implement LangChain’s VectorStore interface:
import { VectorStore } from '@langchain/core/vectorstores';
import { Document } from '@langchain/core/documents';
import { Embeddings } from '@langchain/core/embeddings';
import { SolVec, SolVecCollection } from 'solvec';

export class SolVecStore extends VectorStore {
  private collection: SolVecCollection;

  constructor(
    embeddings: Embeddings,
    config: {
      collectionName: string;
      network?: 'mainnet-beta' | 'devnet' | 'localnet';
      walletPath?: string;
      dimensions?: number;
    }
  ) {
    super(embeddings, {});

    const sv = new SolVec({
      network: config.network ?? 'devnet',
      walletPath: config.walletPath,
    });

    this.collection = sv.collection(config.collectionName, {
      dimensions: config.dimensions ?? 1536,
      metric: 'cosine',
    });
  }

  /** Add documents to the vector store */
  async addDocuments(
    documents: Document[],
    options?: { ids?: string[] }
  ): Promise<string[]> {
    const texts = documents.map(doc => doc.pageContent);
    const embeddings = await this.embeddings.embedDocuments(texts);

    const ids = options?.ids ?? documents.map(
      (_, i) => `doc_${Date.now()}_${i}`
    );

    await this.collection.upsert(
      documents.map((doc, i) => ({
        id: ids[i],
        values: embeddings[i],
        metadata: {
          text: doc.pageContent,
          ...doc.metadata,
        },
      }))
    );

    return ids;
  }

  /** Add vectors directly */
  async addVectors(
    vectors: number[][],
    documents: Document[],
    options?: { ids?: string[] }
  ): Promise<string[]> {
    const ids = options?.ids ?? documents.map(
      (_, i) => `vec_${Date.now()}_${i}`
    );

    await this.collection.upsert(
      documents.map((doc, i) => ({
        id: ids[i],
        values: vectors[i],
        metadata: {
          text: doc.pageContent,
          ...doc.metadata,
        },
      }))
    );

    return ids;
  }

  /** Similarity search */
  async similaritySearchVectorWithScore(
    query: number[],
    k: number,
    filter?: Record<string, unknown>
  ): Promise<[Document, number][]> {
    const results = await this.collection.query({
      vector: query,
      topK: k,
      filter,
      includeMetadata: true,
    });

    return results.matches.map(match => [
      new Document({
        pageContent: match.metadata?.text as string,
        metadata: match.metadata ?? {},
      }),
      match.score,
    ]);
  }

  /** Delete documents */
  async delete(params: { ids: string[] }): Promise<void> {
    await this.collection.delete(params.ids);
  }

  /** Verify collection integrity against on-chain root */
  async verify() {
    return await this.collection.verify();
  }

  _vectorstoreType(): string {
    return 'solvec';
  }
}
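Official LangChain stores such as PineconeStore and Chroma also expose a static fromDocuments() factory that creates the store and ingests documents in one call. A minimal sketch of the same convenience for the class above (the helper name and import path are illustrative, not part of any published package):

```typescript
import { Document } from '@langchain/core/documents';
import { Embeddings } from '@langchain/core/embeddings';
import { SolVecStore } from './solvec-langchain'; // the class defined above (hypothetical path)

// Convenience factory mirroring LangChain's fromDocuments() convention:
// construct the store, embed and upsert the documents, return the store.
export async function solVecStoreFromDocuments(
  docs: Document[],
  embeddings: Embeddings,
  config: {
    collectionName: string;
    network?: 'mainnet-beta' | 'devnet' | 'localnet';
    walletPath?: string;
    dimensions?: number;
  }
): Promise<SolVecStore> {
  const store = new SolVecStore(embeddings, config);
  await store.addDocuments(docs);
  return store;
}
```

With this in place, `const store = await solVecStoreFromDocuments(docs, new OpenAIEmbeddings(), { collectionName: 'kb' })` replaces the two-step construct-then-add pattern used in the examples below.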

Usage Examples

Basic RAG Pipeline

import { SolVecStore } from './solvec-langchain';  // your implementation
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
import { RetrievalQAChain } from 'langchain/chains';
import { Document } from '@langchain/core/documents';

// 1. Initialize vector store
const vectorStore = new SolVecStore(
  new OpenAIEmbeddings({ model: 'text-embedding-3-small' }),
  {
    collectionName: 'knowledge-base',
    network: 'devnet',
    dimensions: 1536,
  }
);

// 2. Add documents
const docs = [
  new Document({
    pageContent: 'VecLabs is a decentralized vector database built on Solana.',
    metadata: { source: 'docs', section: 'intro' },
  }),
  new Document({
    pageContent: 'SolVec provides sub-5ms query latency with Rust HNSW.',
    metadata: { source: 'docs', section: 'performance' },
  }),
];

await vectorStore.addDocuments(docs);

// 3. Create RAG chain
const chain = RetrievalQAChain.fromLLM(
  new ChatOpenAI({ model: 'gpt-4o-mini' }),
  vectorStore.asRetriever({ k: 3 })
);

// 4. Query
const response = await chain.invoke({
  query: 'What is VecLabs and how fast is it?',
});

console.log(response.text);
// "VecLabs is a decentralized vector database with sub-5ms query latency..."

Document Loading Pipeline

import { TextLoader } from 'langchain/document_loaders/fs/text';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

// 1. Load documents
const loader = new TextLoader('./data/documentation.txt');
const rawDocs = await loader.load();

// 2. Split into chunks
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});

const chunks = await splitter.splitDocuments(rawDocs);
console.log(`Split into ${chunks.length} chunks`);

// 3. Store in SolVec
await vectorStore.addDocuments(chunks);

// 4. Verify on-chain
const proof = await vectorStore.verify();
console.log('Explorer:', proof.solanaExplorerUrl);

Conversational RAG with Memory

import { ConversationalRetrievalQAChain } from 'langchain/chains';
import { BufferMemory } from 'langchain/memory';

const memory = new BufferMemory({
  memoryKey: 'chat_history',
  returnMessages: true,
});

const conversationalChain = ConversationalRetrievalQAChain.fromLLM(
  new ChatOpenAI({ model: 'gpt-4o-mini' }),
  vectorStore.asRetriever({ k: 5 }),
  { memory }
);

// Multi-turn conversation
const res1 = await conversationalChain.invoke({
  question: 'What is VecLabs?',
});
console.log('Q1:', res1.text);

const res2 = await conversationalChain.invoke({
  question: 'How does it compare to Pinecone?',
});
console.log('Q2:', res2.text);
// Context from previous question is maintained

Metadata Filtering

// Add documents with rich metadata
await vectorStore.addDocuments([
  new Document({
    pageContent: 'VecLabs architecture uses three layers...',
    metadata: {
      category: 'architecture',
      version: '0.1.0',
      author: 'VecLabs Team',
    },
  }),
  new Document({
    pageContent: 'API reference for upsert()...',
    metadata: {
      category: 'api',
      version: '0.1.0',
      author: 'VecLabs Team',
    },
  }),
]);

// Query with filters
const retriever = vectorStore.asRetriever({
  k: 5,
  filter: { category: 'architecture' },  // only architecture docs
});

const results = await retriever.getRelevantDocuments(
  'How is the system designed?'
);
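When you need the raw similarity scores rather than a retriever, note that similaritySearchWithScore() comes for free from LangChain's VectorStore base class: it embeds the query string and delegates to the similaritySearchVectorWithScore() implementation above. A short sketch, assuming the vectorStore instance from the earlier examples:

```typescript
// Inherited from the VectorStore base class: embeds the query text,
// then calls similaritySearchVectorWithScore() with the same filter shape
// accepted by asRetriever().
const scored = await vectorStore.similaritySearchWithScore(
  'How is the system designed?',
  5,
  { category: 'architecture' }
);

// Each result is a [Document, score] pair.
for (const [doc, score] of scored) {
  console.log(score.toFixed(3), doc.metadata.category);
}
```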

Comparison: Pinecone vs SolVec

Migration from Pinecone is straightforward. A typical Pinecone setup looks like this:
import { PineconeStore } from '@langchain/pinecone';
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.Index('my-index');

const vectorStore = await PineconeStore.fromExistingIndex(
  new OpenAIEmbeddings(),
  { pineconeIndex: index }
);
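The SolVec equivalent, using the SolVecStore class sketched earlier in this guide (names and import path are illustrative until the official package ships), drops the API key and hosted index in favor of a named, on-chain-anchored collection:

```typescript
import { OpenAIEmbeddings } from '@langchain/openai';
import { SolVecStore } from './solvec-langchain'; // the custom class above (hypothetical path)

// No API key and no hosted index to provision: the collection is
// addressed by name, and the wallet defaults to the SOLANA_WALLET path.
const vectorStore = new SolVecStore(
  new OpenAIEmbeddings(),
  {
    collectionName: 'my-index',
    network: 'devnet',
    dimensions: 1536,
  }
);
```

Everything downstream (addDocuments, asRetriever, chain wiring) is unchanged, which is the point of implementing the VectorStore interface.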

Verify Document Integrity

Unique to SolVec — verify that your knowledge base hasn’t been tampered with:
const proof = await vectorStore.verify();

if (proof.verified) {
  console.log('✓ Knowledge base verified on-chain');
  console.log('  Documents:', proof.vectorCount);
  console.log('  Merkle Root:', proof.localRoot.slice(0, 16) + '...');
  console.log('  Explorer:', proof.solanaExplorerUrl);
} else {
  console.error('✗ Verification failed! Data may be corrupted.');
}

Performance Benchmarks

Operation                   SolVec    Pinecone    Improvement
Similarity search (p99)     4.3ms     ~25ms       5.8x faster
Batch upsert (1000 docs)    ~180ms    ~450ms      2.5x faster
Cold start query            1.9ms     ~8ms        4.2x faster

Measured on Apple M2, 16GB RAM; 100K vectors, 1536 dimensions (OpenAI embeddings).

Next Steps

RAG Application

Full RAG example with document chunking and streaming

AI Agent Memory

Build an agent with persistent, verifiable memory

API Reference

Complete TypeScript SDK documentation

GitHub

View source code and contribute

Production Checklist

1. Enable Shadow Drive persistence

Currently in development; vectors are held in memory in the alpha release.

2. Add error handling

Wrap all vector operations in try/catch to handle network failures.

3. Implement pagination

For large document sets, batch upserts in groups of 100-500.

4. Monitor on-chain state

Verify the Merkle root periodically to detect data corruption.
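Checklist items 2 and 3 can be combined in a small helper: split large document sets into fixed-size batches and wrap each upsert in try/catch. A sketch (the 200-document batch size is an illustrative value within the 100-500 guideline, and the store parameter is anything with the addDocuments() method from this guide):

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunkArray<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Upsert documents in batches, surfacing which batch failed on error.
async function addDocumentsBatched(
  store: { addDocuments(docs: unknown[]): Promise<string[]> },
  docs: unknown[],
  batchSize = 200
): Promise<string[]> {
  const ids: string[] = [];
  for (const batch of chunkArray(docs, batchSize)) {
    try {
      ids.push(...(await store.addDocuments(batch)));
    } catch (err) {
      // Network or RPC failure: log the failing batch size and rethrow.
      console.error(`Batch of ${batch.length} docs failed:`, err);
      throw err;
    }
  }
  return ids;
}
```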
Join the VecLabs Discord for LangChain integration updates and support.
