Overview
Integrate SolVec with LangChain to build RAG (Retrieval-Augmented Generation) applications with on-chain verifiable vector storage. This guide shows how to create a custom LangChain vector store backed by SolVec.
Status: LangChain integration is planned but not yet published to npm. This guide shows the conceptual implementation and will be updated when the official package is released.
Why LangChain + SolVec?
Familiar API: Drop-in replacement for Pinecone, Weaviate, or Chroma
On-Chain Provenance: Every document store is verifiable on Solana
88% Cheaper: ~$8/month vs $70/month for Pinecone s1
Data Ownership: Vectors encrypted with your wallet key
Installation
Install dependencies
npm install solvec@alpha langchain @langchain/openai
Set environment variables
export OPENAI_API_KEY="sk-..."
export SOLANA_WALLET="$HOME/.config/solana/id.json"  # optional
Custom Vector Store Implementation
1. Create SolVecStore Class
Implement LangChain’s VectorStore interface:
import { VectorStore } from '@langchain/core/vectorstores';
import { Document } from '@langchain/core/documents';
import { Embeddings } from '@langchain/core/embeddings';
import { SolVec, SolVecCollection } from 'solvec';

export class SolVecStore extends VectorStore {
  private collection: SolVecCollection;

  constructor(
    embeddings: Embeddings,
    config: {
      collectionName: string;
      network?: 'mainnet-beta' | 'devnet' | 'localnet';
      walletPath?: string;
      dimensions?: number;
    }
  ) {
    super(embeddings, {});
    const sv = new SolVec({
      network: config.network ?? 'devnet',
      walletPath: config.walletPath,
    });
    this.collection = sv.collection(config.collectionName, {
      dimensions: config.dimensions ?? 1536,
      metric: 'cosine',
    });
  }

  /** Embed documents and add them to the vector store */
  async addDocuments(
    documents: Document[],
    options?: { ids?: string[] }
  ): Promise<string[]> {
    const texts = documents.map(doc => doc.pageContent);
    const embeddings = await this.embeddings.embedDocuments(texts);
    const ids = options?.ids ?? documents.map((_, i) => `doc_${Date.now()}_${i}`);
    await this.collection.upsert(
      documents.map((doc, i) => ({
        id: ids[i],
        values: embeddings[i],
        metadata: {
          text: doc.pageContent,
          ...doc.metadata,
        },
      }))
    );
    return ids;
  }

  /** Add precomputed vectors directly */
  async addVectors(
    vectors: number[][],
    documents: Document[],
    options?: { ids?: string[] }
  ): Promise<string[]> {
    const ids = options?.ids ?? documents.map((_, i) => `vec_${Date.now()}_${i}`);
    await this.collection.upsert(
      documents.map((doc, i) => ({
        id: ids[i],
        values: vectors[i],
        metadata: {
          text: doc.pageContent,
          ...doc.metadata,
        },
      }))
    );
    return ids;
  }

  /** Similarity search against a query vector */
  async similaritySearchVectorWithScore(
    query: number[],
    k: number,
    filter?: Record<string, unknown>
  ): Promise<[Document, number][]> {
    const results = await this.collection.query({
      vector: query,
      topK: k,
      filter,
      includeMetadata: true,
    });
    return results.matches.map(
      match =>
        [
          new Document({
            pageContent: match.metadata?.text as string,
            metadata: match.metadata ?? {},
          }),
          match.score,
        ] as [Document, number]
    );
  }

  /** Delete documents by id */
  async delete(params: { ids: string[] }): Promise<void> {
    await this.collection.delete(params.ids);
  }

  /** Verify collection integrity against the on-chain Merkle root */
  async verify() {
    return await this.collection.verify();
  }

  _vectorstoreType(): string {
    return 'solvec';
  }
}
Usage Examples
Basic RAG Pipeline
import { SolVecStore } from './solvec-langchain'; // your implementation
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
import { RetrievalQAChain } from 'langchain/chains';
import { Document } from '@langchain/core/documents';

// 1. Initialize vector store
const vectorStore = new SolVecStore(
  new OpenAIEmbeddings({ model: 'text-embedding-3-small' }),
  {
    collectionName: 'knowledge-base',
    network: 'devnet',
    dimensions: 1536,
  }
);

// 2. Add documents
const docs = [
  new Document({
    pageContent: 'VecLabs is a decentralized vector database built on Solana.',
    metadata: { source: 'docs', section: 'intro' },
  }),
  new Document({
    pageContent: 'SolVec provides sub-5ms query latency with Rust HNSW.',
    metadata: { source: 'docs', section: 'performance' },
  }),
];
await vectorStore.addDocuments(docs);

// 3. Create RAG chain
const chain = RetrievalQAChain.fromLLM(
  new ChatOpenAI({ model: 'gpt-4o-mini' }),
  vectorStore.asRetriever({ k: 3 })
);

// 4. Query
const response = await chain.invoke({
  query: 'What is VecLabs and how fast is it?',
});
console.log(response.text);
// "VecLabs is a decentralized vector database with sub-5ms query latency..."
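Because SolVecStore extends LangChain's `VectorStore` base class, it also inherits convenience helpers such as `similaritySearchWithScore`, which embeds the query text and delegates to the `similaritySearchVectorWithScore` method implemented above. A sketch using the store from step 1 (output depends on your stored documents):

```typescript
// Returns [Document, score] pairs; with the 'cosine' metric configured
// earlier, higher scores mean closer matches.
const hits = await vectorStore.similaritySearchWithScore('query latency', 2);
for (const [doc, score] of hits) {
  console.log(score.toFixed(3), doc.metadata.section, doc.pageContent);
}
```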
Document Loading Pipeline
import { TextLoader } from 'langchain/document_loaders/fs/text';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

// 1. Load documents
const loader = new TextLoader('./data/documentation.txt');
const rawDocs = await loader.load();

// 2. Split into chunks
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const chunks = await splitter.splitDocuments(rawDocs);
console.log(`Split into ${chunks.length} chunks`);

// 3. Store in SolVec
await vectorStore.addDocuments(chunks);

// 4. Verify on-chain
const proof = await vectorStore.verify();
console.log('Explorer:', proof.solanaExplorerUrl);
Conversational RAG with Memory
import { ConversationalRetrievalQAChain } from 'langchain/chains';
import { BufferMemory } from 'langchain/memory';

const memory = new BufferMemory({
  memoryKey: 'chat_history',
  returnMessages: true,
});

const conversationalChain = ConversationalRetrievalQAChain.fromLLM(
  new ChatOpenAI({ model: 'gpt-4o-mini' }),
  vectorStore.asRetriever({ k: 5 }),
  { memory }
);

// Multi-turn conversation
const res1 = await conversationalChain.invoke({
  question: 'What is VecLabs?',
});
console.log('Q1:', res1.text);

const res2 = await conversationalChain.invoke({
  question: 'How does it compare to Pinecone?',
});
console.log('Q2:', res2.text);
// Context from the previous question is maintained
Metadata Filtering

// Add documents with rich metadata
await vectorStore.addDocuments([
  new Document({
    pageContent: 'VecLabs architecture uses three layers...',
    metadata: {
      category: 'architecture',
      version: '0.1.0',
      author: 'VecLabs Team',
    },
  }),
  new Document({
    pageContent: 'API reference for upsert()...',
    metadata: {
      category: 'api',
      version: '0.1.0',
      author: 'VecLabs Team',
    },
  }),
]);

// Query with filters
const retriever = vectorStore.asRetriever({
  k: 5,
  filter: { category: 'architecture' }, // only architecture docs
});
const results = await retriever.getRelevantDocuments(
  'How is the system designed?'
);
Comparison: Pinecone vs SolVec
Migration from Pinecone is straightforward. A typical existing Pinecone setup looks like this:

import { PineconeStore } from '@langchain/pinecone';
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.Index('my-index');

const vectorStore = await PineconeStore.fromExistingIndex(
  new OpenAIEmbeddings(),
  { pineconeIndex: index }
);
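The SolVec-backed equivalent swaps the Pinecone client for the SolVecStore class implemented earlier in this guide; the retriever and chain code stay the same. A sketch (the `./solvec-langchain` module path and collection name are assumptions from the examples above):

```typescript
import { SolVecStore } from './solvec-langchain'; // the class defined above
import { OpenAIEmbeddings } from '@langchain/openai';

// No API key needed -- authentication is your Solana wallet.
const vectorStore = new SolVecStore(new OpenAIEmbeddings(), {
  collectionName: 'my-index',
  network: 'devnet',
});
```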
Verify Document Integrity
Unique to SolVec — verify that your knowledge base hasn’t been tampered with:
const proof = await vectorStore.verify();

if (proof.verified) {
  console.log('✓ Knowledge base verified on-chain');
  console.log('  Documents:', proof.vectorCount);
  console.log('  Merkle Root:', proof.localRoot.slice(0, 16) + '...');
  console.log('  Explorer:', proof.solanaExplorerUrl);
} else {
  console.error('✗ Verification failed! Data may be corrupted.');
}
| Operation | SolVec | Pinecone | Improvement |
| --- | --- | --- | --- |
| Similarity search (p99) | 4.3ms | ~25ms | 5.8x faster |
| Batch upsert (1000 docs) | ~180ms | ~450ms | 2.5x faster |
| Cold start query | 1.9ms | ~8ms | 4.2x faster |
Measured on Apple M2, 16GB RAM. 100K vectors, 1536 dimensions (OpenAI embeddings).
Next Steps
RAG Application: Full RAG example with document chunking and streaming
AI Agent Memory: Build an agent with persistent, verifiable memory
API Reference: Complete TypeScript SDK documentation
GitHub: View source code and contribute
Production Checklist
Enable Shadow Drive persistence
Currently in development — vectors are in-memory in alpha.
Add error handling
Wrap all vector operations in try/catch for network failures.
Implement pagination
For large document sets, batch upserts in groups of 100-500.
Monitor on-chain state
Verify Merkle root periodically to detect data corruption.
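The batching and error-handling items above can be sketched together. The `chunk` and `addDocumentsInBatches` helpers below are illustrative, not part of the SolVec SDK; the batch size and retry count are assumptions:

```typescript
// Minimal structural types so the sketch is self-contained.
interface Doc { pageContent: string; metadata: Record<string, unknown>; }
interface DocStore { addDocuments(docs: Doc[]): Promise<string[]>; }

/** Split an array into consecutive batches of at most `size` items. */
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/** Upsert documents in batches, retrying each batch on network failure. */
async function addDocumentsInBatches(
  store: DocStore,
  docs: Doc[],
  batchSize = 250,
  maxRetries = 3
): Promise<string[]> {
  const allIds: string[] = [];
  for (const batch of chunk(docs, batchSize)) {
    for (let attempt = 1; ; attempt++) {
      try {
        allIds.push(...(await store.addDocuments(batch)));
        break; // batch succeeded, move to the next one
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        console.warn(`Batch failed (attempt ${attempt}), retrying...`);
      }
    }
  }
  return allIds;
}
```

Because each batch is an independent upsert, a transient RPC failure only forces a retry of a few hundred documents rather than the whole set.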