
Overview

Chroma is an open-source embedding database designed for AI applications. It suits local development and can be deployed to the cloud or to a self-hosted environment.

Setup

No setup required! Chroma runs in-memory by default.
// Simply leave Chroma URL empty
Collection Name: my-collection
Embeddings: OpenAI Embeddings

Configuration

Required Parameters

collectionName (string, required)
  Name of the collection in which embeddings are stored and from which they are retrieved.
embeddings (Embeddings, required)
  Embedding model to use (e.g., OpenAI Embeddings).

Optional Parameters

document (Document[])
  Documents to upsert into the collection.
chromaURL (string)
  URL of the Chroma server. Leave empty for in-memory mode:
  • Empty = In-memory
  • http://localhost:8000 = Local server
  • https://api.trychroma.com = Chroma Cloud
credential (credential)
  Chroma API credential (only needed for cloud-hosted instances).
recordManager (RecordManager)
  Tracks indexed documents to prevent duplication.
chromaMetadataFilter (json)
  Filters search results by metadata:
  {
    "source": "docs",
    "category": "tutorial"
  }
topK (number, default: 4)
  Number of results to return.
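
The three chromaURL forms listed above can be told apart mechanically. The following is a minimal sketch, assuming the mode is decided purely from the URL value; chroma_mode is a hypothetical helper written for illustration and is not part of Chroma or of this integration.

```python
from urllib.parse import urlparse


def chroma_mode(chroma_url: str) -> str:
    """Classify a Chroma URL value into one of the modes described above.

    Hypothetical helper for illustration only; not a Chroma API.
    """
    url = (chroma_url or "").strip()
    if not url:
        return "in-memory"
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"Not a valid http(s) URL: {chroma_url!r}")
    if parsed.hostname in ("localhost", "127.0.0.1"):
        return "local server"
    if parsed.hostname == "api.trychroma.com":
        return "Chroma Cloud"
    # Any other host is assumed to be a self-hosted deployment.
    return "self-hosted server"


print(chroma_mode(""))
print(chroma_mode("http://localhost:8000"))
print(chroma_mode("https://api.trychroma.com"))
```

A value without a scheme, such as localhost:8000, is rejected by this sketch, which mirrors the URL format check suggested under Common Issues.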

Usage Examples

In-Memory (Development)

// Fastest setup - no persistence
Collection Name: test-collection
Embeddings: OpenAI Embeddings
Chroma URL: [leave empty]
Top K: 4

// Data stored in memory, lost on restart
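
To make the Top K setting concrete, here is a rough sketch of what "return the 4 most similar entries" means. It assumes cosine similarity as the metric and uses tiny hand-written vectors; the metric Chroma actually applies depends on how the collection is configured, and top_k is a hypothetical function, not a Chroma API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=4):
    """Return the ids of the k stored vectors most similar to the query, best first."""
    scored = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]


# Toy two-dimensional "embeddings" keyed by document id (illustrative data only).
store = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
    "e": [0.5, 0.5],
}
print(top_k([1.0, 0.0], store, k=4))
```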

Local Persistent

# Start the Chroma server and persist its data in ./chroma_data
docker run -p 8000:8000 -v "$(pwd)/chroma_data:/chroma/chroma" chromadb/chroma

// Connect to local server
Collection Name: my-docs
Chroma URL: http://localhost:8000
Embeddings: OpenAI Embeddings

Chroma Cloud

// Cloud configuration
Chroma URL: https://api.trychroma.com
Collection Name: production-docs
Credential: Chroma API (with key, tenant, database)
Embeddings: OpenAI Embeddings

With Metadata Filtering

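// Search only specific documents
// (conditions are wrapped in $and because some Chroma versions reject more than one top-level key)
{
  "chromaMetadataFilter": {
    "$and": [
      { "type": "api-docs" },
      { "version": "v2" }
    ]
  }
}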

With Record Manager

// Prevent duplicate indexing
Document: Text Loader
Collection Name: knowledge-base
Record Manager: Postgres Record Manager
Embeddings: OpenAI Embeddings

// Only new/changed docs are processed
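
The idea behind "only new/changed docs are processed" can be sketched with content hashes. This is an illustration under stated assumptions, not the actual record manager implementation: InMemoryRecordManager and filter_changed are hypothetical names, and a real record manager keeps its state in a database such as Postgres rather than in a dict.

```python
import hashlib


class InMemoryRecordManager:
    """Toy stand-in for a record manager: remembers one content hash per document key."""

    def __init__(self):
        self._hashes = {}

    def filter_changed(self, docs):
        """Return only the docs whose content is new or differs from the last indexed version."""
        changed = {}
        for key, text in docs.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self._hashes.get(key) != digest:
                changed[key] = text
                self._hashes[key] = digest  # record the version that is about to be indexed
        return changed


manager = InMemoryRecordManager()
first = manager.filter_changed({"intro.txt": "Hello", "faq.txt": "Q and A"})
second = manager.filter_changed({"intro.txt": "Hello", "faq.txt": "Q and A, updated"})
print(sorted(first))   # both files are new on the first run
print(sorted(second))  # only the edited file needs re-embedding
```

Skipping unchanged documents avoids paying for embeddings that already exist in the collection.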

Metadata Filter Syntax

Chroma supports WHERE clause filtering:
// Simple equality
{ "category": "tutorial" }

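// Operators: $eq, $ne, $gt, $gte, $lt, $lte
// Several conditions are combined with $and, since some Chroma versions
// reject a filter with more than one top-level key
{
  "$and": [
    { "year": { "$gte": 2023 } },
    { "rating": { "$gt": 4.5 } }
  ]
}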

// $in operator
{
  "status": { "$in": ["published", "reviewed"] }
}

// Logical operators: $and, $or
{
  "$and": [
    { "category": "docs" },
    { "language": "en" }
  ]
}

{
  "$or": [
    { "priority": "high" },
    { "urgent": true }
  ]
}
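
To check how a filter will behave before sending it to Chroma, the operators above can be emulated locally. The following is a sketch of their apparent semantics, not Chroma's own implementation: matches is a hypothetical function, and it assumes that a document lacking the filtered key never matches.

```python
import operator

# Comparison operators from the syntax above, mapped to Python equivalents.
COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def matches(metadata, where):
    """Return True if a metadata dict satisfies the filter (illustrative semantics only)."""
    for key, condition in where.items():
        if key == "$and":
            ok = all(matches(metadata, clause) for clause in condition)
        elif key == "$or":
            ok = any(matches(metadata, clause) for clause in condition)
        elif key not in metadata:
            ok = False  # assumption: a missing key never matches
        elif isinstance(condition, dict):
            ok = all(COMPARATORS[op](metadata[key], operand) for op, operand in condition.items())
        else:
            ok = metadata[key] == condition  # a bare value means equality
        if not ok:
            return False
    return True


doc = {"category": "docs", "language": "en", "year": 2024, "rating": 4.8, "status": "published"}
print(matches(doc, {"$and": [{"category": "docs"}, {"language": "en"}]}))
print(matches(doc, {"rating": {"$gt": 4.9}}))
```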

Best Practices

Development

  • Use in-memory for quick testing
  • Use local server for development
  • Small datasets work great in-memory
  • Easy to reset and iterate

Production

  • Use Chroma Cloud or self-hosted server
  • Enable authentication
  • Set up backups
  • Monitor collection sizes

Performance

  • Create indexes on frequently queried metadata
  • Use appropriate collection sizes
  • Consider sharding for very large datasets
  • Batch upserts when possible
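
Batching upserts, the last point above, amounts to splitting the document list into fixed-size chunks and sending each chunk in one request. A minimal sketch follows; batches is a hypothetical helper, and the batch size of 3 is arbitrary, chosen only to keep the example small.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


documents = [f"doc-{i}" for i in range(7)]
for batch in batches(documents, 3):
    # A real pipeline would upsert `batch` here in a single request.
    print(len(batch), batch[0])
```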

Data Management

  • Use descriptive collection names
  • Tag documents with metadata
  • Use record manager to avoid duplicates
  • Implement collection lifecycle management

Collection Management

Creating Collections

Collections are created automatically when you first upsert documents.

Deleting Collections

# Via Chroma client (if needed)
import chromadb
client = chromadb.HttpClient(host="localhost", port=8000)
client.delete_collection(name="old-collection")

Listing Collections

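# Check existing collections
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
# Assumes list_collections() returns Collection objects; some chromadb releases
# return only names, in which case fetch each one with client.get_collection(name).
collections = client.list_collections()
for collection in collections:
    print(f"{collection.name}: {collection.count()} documents")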

Deployment Options

Docker Compose

version: '3'
services:
  chroma:
    image: chromadb/chroma
    ports:
      - "8000:8000"
    volumes:
      - ./chroma_data:/chroma/chroma
    environment:
      - CHROMA_SERVER_AUTH_PROVIDER=token
      - CHROMA_SERVER_AUTH_CREDENTIALS=your-secret-token

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chroma
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chroma
  template:
    metadata:
      labels:
        app: chroma
    spec:
      containers:
      - name: chroma
        image: chromadb/chroma
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: chroma-data
          mountPath: /chroma/chroma
      volumes:
      # Assumes a PersistentVolumeClaim named chroma-pvc already exists in the namespace
      - name: chroma-data
        persistentVolumeClaim:
          claimName: chroma-pvc

Common Issues

Can’t connect to Chroma server
Solution:
  • Verify that the Chroma server is running
  • Check the URL format: http://localhost:8000
  • Ensure port 8000 is not blocked
  • For Docker: check that the container is running

Error accessing collection
Solution:
  • Collections are auto-created on first upsert
  • Check the spelling of the collection name
  • Ensure documents were successfully indexed
  • Verify you’re connecting to the correct server

Data disappears after restart
Solution:
  • In-memory mode doesn’t persist data
  • Use a Chroma server for persistence
  • Configure a persistent volume
  • Consider Chroma Cloud for managed hosting

Chroma Cloud connection issues
Solution:
  • Verify the API key is correct
  • Check the tenant and database names
  • Ensure the credential is properly configured
  • Test with the Chroma Cloud console
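
For the first issue above, a quick way to confirm that the server's port is reachable is a plain TCP connection attempt. This sketch uses only the Python standard library; can_reach is a hypothetical helper, and a successful connection shows only that something is listening on the port, not that it is a healthy Chroma server.

```python
import socket
from urllib.parse import urlparse


def can_reach(chroma_url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(chroma_url)
    if not parsed.hostname:
        raise ValueError(f"Not a valid URL: {chroma_url!r}")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


print(can_reach("http://localhost:8000"))
```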

Chroma vs Other Vector DBs

Feature       | Chroma      | Pinecone   | Qdrant
Open Source   | Yes         | No         | Yes
In-Memory     | Yes         | No         | Yes
Managed Cloud | Yes         | Yes        | Yes
Self-Hosted   | Yes         | No         | Yes
Best For      | Development | Production | Production
Ease of Use   | Excellent   | Very Good  | Good

Outputs

retriever (VectorStoreRetriever)
  Retriever interface for use in chains and agents.
vectorStore (ChromaVectorStore)
  Direct vector store access for custom operations.
