
Overview

The SolVecCollection class represents a single vector collection. Its API intentionally mirrors Pinecone’s Index class to ease migration. Do not instantiate this class directly; call SolVec.collection() to get or create a collection.

Constructor

The constructor is internal. Obtain a collection through sv.collection(), where sv is a SolVec instance.
# Internal constructor signature (for reference only)
SolVecCollection(
    name: str,
    dimensions: int,
    metric: DistanceMetric,
    network: str,
    wallet_path: Optional[str] = None
)

Methods

upsert()

Insert or update vectors in the collection. If a vector with the same ID exists, it will be updated.
response = col.upsert([
    {
        "id": "mem_001",
        "values": [0.1, 0.2, 0.3, ...],
        "metadata": {"text": "User likes coffee"}
    }
])

Parameters

vectors
list[dict] | list[UpsertRecord]
required
List of vectors to upsert. Each vector can take either of two forms.
Dictionary format:
  • id (str, required) - Unique identifier for the vector
  • values (list[float], required) - The embedding vector
  • metadata (dict, optional) - Additional metadata to store with the vector
Or use the UpsertRecord dataclass:
from solvec.types import UpsertRecord

record = UpsertRecord(
    id="mem_001",
    values=[0.1, 0.2, 0.3],
    metadata={"text": "hello"}
)

Returns

response
UpsertResponse
Object containing:
  • upserted_count (int) - Number of vectors successfully upserted

Raises

  • ValueError - If vector dimensions don’t match the collection’s dimensions
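Because a dimension mismatch raises ValueError, it can help to check a batch locally before calling upsert(). The following is a minimal sketch that does not call SolVec at all; check_dimensions is a hypothetical helper, and its error message is illustrative rather than the one the library produces.

```python
from typing import Iterable


def check_dimensions(vectors: Iterable[dict], dimensions: int) -> None:
    """Raise ValueError for the first record whose length differs from `dimensions`.

    Hypothetical pre-flight check; not part of the SolVec API.
    """
    for record in vectors:
        actual = len(record["values"])
        if actual != dimensions:
            raise ValueError(
                f"vector {record['id']!r} has {actual} dimensions, expected {dimensions}"
            )


records = [
    {"id": "vec_1", "values": [1.0, 0.0, 0.0]},
    {"id": "vec_2", "values": [0.0, 1.0]},  # one value short
]

try:
    check_dimensions(records, dimensions=3)
    message = ""
except ValueError as err:
    message = str(err)

print(message)
```

Running the check first lets you report which record is malformed before any of the batch is sent.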

Examples

from solvec import SolVec

sv = SolVec(network="devnet")
col = sv.collection("memories", dimensions=3)

response = col.upsert([
    {
        "id": "vec_1",
        "values": [1.0, 0.0, 0.0],
        "metadata": {"text": "first memory"}
    },
    {
        "id": "vec_2",
        "values": [0.0, 1.0, 0.0],
        "metadata": {"text": "second memory"}
    }
])

print(f"Upserted {response.upserted_count} vectors")

query()

Query for nearest neighbors using vector similarity search.
results = col.query(
    vector=[0.1, 0.2, 0.3, ...],
    top_k=10,
    filter={"category": "memory"},
    include_metadata=True,
    include_values=False
)

Parameters

vector
list[float]
required
Query embedding vector. Must match the collection’s dimensions.
top_k
int
default:"10"
Number of results to return. Results are sorted by score in descending order.
filter
dict | None
default:"None"
Metadata filter dictionary. Only vectors whose metadata matches ALL key-value pairs are returned.
Example: {"category": "memory", "user_id": "123"}
include_metadata
bool
default:"True"
Whether to include metadata in the response. Set to False to reduce response size.
include_values
bool
default:"False"
Whether to include the vector values in the response. Most use cases do not need them.
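The "matches ALL key-value pairs" rule for filter can be pictured as a plain equality check on every key. This is a sketch of the semantics as described above, not SolVec's implementation; matches_filter is a hypothetical name, and it assumes exact equality with no operators or nested matching.

```python
from typing import Optional


def matches_filter(metadata: dict, metadata_filter: Optional[dict]) -> bool:
    """Return True when every key-value pair in the filter is present in metadata.

    Assumes exact-equality matching; a missing key counts as a mismatch.
    """
    if not metadata_filter:
        return True  # no filter means every vector qualifies
    return all(
        key in metadata and metadata[key] == value
        for key, value in metadata_filter.items()
    )


stored_metadata = {
    "a": {"category": "memory", "user_id": "123"},
    "b": {"category": "memory", "user_id": "456"},
    "c": {"category": "note", "user_id": "123"},
}

wanted = {"category": "memory", "user_id": "123"}
kept = [vid for vid, meta in stored_metadata.items() if matches_filter(meta, wanted)]
print(kept)
```

Only "a" satisfies both pairs; "b" and "c" each match one pair and are excluded.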

Returns

response
QueryResponse
Object containing:
  • matches (list[QueryMatch]) - List of matching vectors sorted by score (highest first)
  • namespace (str) - The collection name
Each QueryMatch contains:
  • id (str) - Vector ID
  • score (float) - Similarity score
  • metadata (dict) - Vector metadata (if include_metadata=True)
  • values (list[float] | None) - Vector values (if include_values=True)
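To make the ordering of matches concrete, here is a brute-force sketch that scores stored vectors with cosine similarity and keeps the top_k highest. It is a stand-in for illustration only: the helper names are hypothetical, the cosine metric is an assumption, and the real collection may use an approximate index whose results differ.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_query(
    stored: Dict[str, List[float]], vector: List[float], top_k: int = 10
) -> List[Tuple[str, float]]:
    """Score every stored vector and return the top_k (id, score) pairs, highest first."""
    scored = [(vid, cosine_similarity(vector, values)) for vid, values in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


stored = {
    "x": [1.0, 0.0, 0.0],
    "y": [0.0, 1.0, 0.0],
    "z": [1.0, 1.0, 0.0],
}

top = brute_force_query(stored, [1.0, 0.0, 0.0], top_k=2)
for vid, score in top:
    print(f"{vid}: {score:.3f}")
```

With this data, "x" is identical to the query and ranks first, "z" is 45 degrees away and ranks second, and the orthogonal "y" is cut off by top_k=2.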

Raises

  • ValueError - If query vector dimensions don’t match the collection’s dimensions

Examples

from solvec import SolVec

sv = SolVec(network="devnet")
col = sv.collection("memories", dimensions=3)

# Insert some vectors
col.upsert([
    {"id": "a", "values": [1.0, 0.0, 0.0], "metadata": {"text": "alpha"}},
    {"id": "b", "values": [0.9, 0.1, 0.0], "metadata": {"text": "beta"}},
    {"id": "c", "values": [0.0, 1.0, 0.0], "metadata": {"text": "gamma"}}
])

# Query
results = col.query(vector=[1.0, 0.0, 0.0], top_k=2)

for match in results.matches:
    print(f"{match.id}: {match.metadata['text']} (score: {match.score:.3f})")
# Output (with the cosine metric):
# a: alpha (score: 1.000)
# b: beta (score: 0.994)

delete()

Delete vectors by ID.
col.delete(["vec_1", "vec_2", "vec_3"])

Parameters

ids
list[str]
required
List of vector IDs to delete. Non-existent IDs are silently ignored.

Returns

None

Example

from solvec import SolVec

sv = SolVec(network="devnet")
col = sv.collection("memories", dimensions=3)

col.upsert([
    {"id": "x", "values": [1.0, 0.0, 0.0]},
    {"id": "y", "values": [0.0, 1.0, 0.0]}
])

# Delete one vector
col.delete(["x"])

# Verify deletion
stats = col.describe_index_stats()
print(stats.vector_count)  # Output: 1

fetch()

Fetch specific vectors by ID.
result = col.fetch(["vec_1", "vec_2"])

Parameters

ids
list[str]
required
List of vector IDs to fetch.

Returns

result
dict
Dictionary containing:
  • vectors (dict) - Dictionary mapping IDs to vector objects
  • namespace (str) - The collection name
Each vector object contains:
  • id (str) - Vector ID
  • values (list[float]) - The embedding vector
  • metadata (dict) - Vector metadata
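Since fetch() returns a plain dictionary, callers usually want to know which requested IDs came back. The sketch below works on a hand-built dictionary in the shape described above; split_found_missing is a hypothetical helper, and it assumes that IDs not in the collection are simply absent from "vectors", which this page does not state.

```python
from typing import Dict, List, Tuple


def split_found_missing(result: dict, requested_ids: List[str]) -> Tuple[Dict[str, dict], List[str]]:
    """Separate requested IDs into those present in result["vectors"] and those absent."""
    vectors = result.get("vectors", {})
    found = {vid: vectors[vid] for vid in requested_ids if vid in vectors}
    missing = [vid for vid in requested_ids if vid not in vectors]
    return found, missing


# Hand-built stand-in for a fetch() result; no SolVec call is made here.
sample_result = {
    "vectors": {
        "a": {"id": "a", "values": [1.0, 0.0, 0.0], "metadata": {"text": "first"}},
    },
    "namespace": "memories",
}

found, missing = split_found_missing(sample_result, ["a", "zzz"])
print(sorted(found))
print(missing)
```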

Example

from solvec import SolVec

sv = SolVec(network="devnet")
col = sv.collection("memories", dimensions=3)

col.upsert([
    {"id": "a", "values": [1.0, 0.0, 0.0], "metadata": {"text": "first"}},
    {"id": "b", "values": [0.0, 1.0, 0.0], "metadata": {"text": "second"}}
])

result = col.fetch(["a"])
print(result["vectors"]["a"]["values"])  # [1.0, 0.0, 0.0]
print(result["vectors"]["a"]["metadata"])  # {'text': 'first'}

describe_index_stats()

Get collection statistics and metadata.
stats = col.describe_index_stats()

Returns

stats
CollectionStats
Object containing:
  • vector_count (int) - Total number of vectors in the collection
  • dimension (int) - Vector dimensions
  • metric (DistanceMetric) - Distance metric being used
  • name (str) - Collection name
  • merkle_root (str) - Current Merkle root hash for integrity verification
  • last_updated (int) - Unix timestamp of last update
  • is_frozen (bool) - Whether the collection is frozen (read-only)

Example

from solvec import SolVec

sv = SolVec(network="devnet")
col = sv.collection("memories", dimensions=3)

col.upsert([
    {"id": "a", "values": [1.0, 0.0, 0.0]},
    {"id": "b", "values": [0.0, 1.0, 0.0]}
])

stats = col.describe_index_stats()
print(f"Vectors: {stats.vector_count}")
print(f"Dimensions: {stats.dimension}")
print(f"Metric: {stats.metric}")
print(f"Merkle root: {stats.merkle_root}")

verify()

Verify collection integrity against the on-chain Merkle root.
result = col.verify()

Returns

result
VerificationResult
Object containing:
  • verified (bool) - Whether verification succeeded
  • on_chain_root (str) - Merkle root stored on Solana
  • local_root (str) - Locally computed Merkle root
  • match (bool) - Whether roots match
  • vector_count (int) - Number of vectors verified
  • solana_explorer_url (str) - Link to Solana Explorer for the verification transaction
  • timestamp (int) - Verification timestamp in milliseconds
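The comparison of local_root with on_chain_root rests on a general property of Merkle trees: changing any stored record changes the root. The sketch below illustrates that property with the standard library only. It is not SolVec's hashing scheme; the use of SHA-256, JSON serialization of records, ordering by ID, and duplication of the last node on odd levels are all assumptions made for the example.

```python
import hashlib
import json
from typing import List


def leaf_hash(record: dict) -> bytes:
    """Hash one record using a deterministic JSON encoding (assumed format)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).digest()


def merkle_root(leaves: List[bytes]) -> str:
    """Fold leaf hashes pairwise until one root remains; returns a hex string."""
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def root_of(records: List[dict]) -> str:
    """Order records by ID so the root does not depend on insertion order."""
    ordered = sorted(records, key=lambda r: r["id"])
    return merkle_root([leaf_hash(r) for r in ordered])


records = [
    {"id": "a", "values": [1.0, 0.0, 0.0]},
    {"id": "b", "values": [0.0, 1.0, 0.0]},
    {"id": "c", "values": [0.0, 0.0, 1.0]},
]

root_before = root_of(records)

# Tamper with one value and recompute.
tampered = [dict(r) for r in records]
tampered[1]["values"] = [0.0, 0.9, 0.0]
root_tampered = root_of(tampered)

print(root_before == root_of(list(reversed(records))))  # same data, same root
print(root_before == root_tampered)                     # modified data, different root
```

A root that differs from the one recorded on-chain therefore signals that the local data no longer matches what was committed.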

Example

from solvec import SolVec

sv = SolVec(network="devnet")
col = sv.collection("memories", dimensions=3)

col.upsert([
    {"id": "a", "values": [1.0, 0.0, 0.0]}
])

result = col.verify()
print(f"Verified: {result.verified}")
print(f"Match: {result.match}")
print(f"Vector count: {result.vector_count}")
print(f"Explorer: {result.solana_explorer_url}")

Instance Attributes

name
str
Collection name
dimensions
int
Vector dimensions
metric
DistanceMetric
Distance metric (cosine, euclidean, or dot)
network
str
Solana network (mainnet-beta, devnet, or localnet)
wallet_path
str | None
Path to wallet keypair file

Complete Example: RAG Pipeline

import openai
from solvec import SolVec

# Initialize
sv = SolVec(network="devnet", wallet="~/.config/solana/id.json")
col = sv.collection("knowledge-base", dimensions=1536)

# 1. Index documents
documents = [
    "VecLabs combines HNSW indexing with Solana blockchain",
    "AI agents need persistent memory for context",
    "Semantic search finds meaning, not just keywords"
]

for i, doc in enumerate(documents):
    # Generate embedding
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=doc
    )
    embedding = response.data[0].embedding
    
    # Store in collection
    col.upsert([{
        "id": f"doc_{i}",
        "values": embedding,
        "metadata": {"text": doc}
    }])

# 2. Query with natural language
query = "How do AI agents maintain context?"

response = openai.embeddings.create(
    model="text-embedding-3-small",
    input=query
)
query_embedding = response.data[0].embedding

# 3. Search
results = col.query(vector=query_embedding, top_k=3)

# 4. Get context for LLM
context = "\n".join([
    match.metadata["text"] 
    for match in results.matches
])

print("Relevant context:")
print(context)

# 5. Verify data integrity
verification = col.verify()
print(f"\nData verified on-chain: {verification.verified}")
print(f"Solana Explorer: {verification.solana_explorer_url}")
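The indexing loop above calls upsert() once per document. Since upsert() accepts a list, records can also be grouped and sent in batches. The sketch below shows only the grouping step and runs without SolVec; batched is a hypothetical helper, batch_size=2 is an arbitrary value, and this page does not state whether the library imposes a maximum batch size.

```python
from typing import Iterator, List


def batched(records: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of `records`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Placeholder records; real ones would carry embeddings from the model.
records = [
    {"id": f"doc_{i}", "values": [float(i), 0.0, 0.0], "metadata": {"text": f"document {i}"}}
    for i in range(5)
]

batches = list(batched(records, batch_size=2))
sizes = [len(batch) for batch in batches]
print(sizes)

# With a real collection, each batch would be passed to upsert():
# for batch in batches:
#     col.upsert(batch)
```

Five records with a batch size of two produce batches of 2, 2, and 1, so three calls replace five.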
