Concurrency Control
The most critical performance setting is `SEMAPHORE_LIMIT`, which controls concurrent episode processing.
Understanding SEMAPHORE_LIMIT
Graphiti’s ingestion pipelines are highly concurrent. `SEMAPHORE_LIMIT` determines how many episodes can be processed simultaneously. Each episode involves multiple LLM calls:
- Entity extraction (2-3 calls)
- Entity deduplication (1-2 calls)
- Fact extraction (2-3 calls)
- Summarization (1-2 calls)
Default Configuration
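Assuming `graphiti_core` reads `SEMAPHORE_LIMIT` from the environment at import time, a minimal sketch of setting it in-process (the value 10 is illustrative, not a recommendation):

```python
import os

# SEMAPHORE_LIMIT is read from the environment when graphiti_core is first
# imported, so set it here (or export it in your shell) before that import.
# The value 10 is illustrative, not a recommendation.
os.environ["SEMAPHORE_LIMIT"] = "10"
```

Exporting the variable in your shell or deployment manifest works equally well.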
Tuning by LLM Provider
OpenAI
Anthropic
Azure OpenAI
Ollama (Local LLM)
Groq
Symptoms of Misconfiguration
Too High:
- 429 rate limit errors in logs
- Increased API costs from retries
- Memory pressure from queued operations
- Inconsistent response times

Too Low:
- Slow episode ingestion
- Underutilized API quota
- Poor throughput
- Long processing queues
Monitoring and Adjustment
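A low-effort way to monitor ingestion is to time each episode and watch for drift; a hypothetical stdlib sketch (`timed` is not part of Graphiti):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    # Hypothetical helper: wrap each add_episode call and log its wall-clock
    # latency. Steadily rising times usually mean the semaphore queue is
    # backing up and SEMAPHORE_LIMIT needs revisiting.
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label} took {time.perf_counter() - start:.2f}s")
```

Wrap calls like `with timed("add_episode"): ...` and chart the printed latencies over time.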
Dynamic Adjustment
Adjust concurrency at runtime.
Database Optimization
Neo4j Performance
Memory Configuration
Edit `neo4j.conf`:
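Illustrative starting points (keys are for Neo4j 5.x; size the values to your data volume and available RAM):

```
server.memory.heap.initial_size=2g
server.memory.heap.max_size=4g
server.memory.pagecache.size=2g
```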
Index Configuration
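Graphiti ships a helper that creates the indices and constraints its own queries rely on. A sketch with placeholder credentials (it needs a running Neo4j, so it is not runnable as-is):

```python
import asyncio

from graphiti_core import Graphiti

async def main():
    # Placeholder connection details -- substitute your own.
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    try:
        # Creates the indices/constraints Graphiti's queries depend on.
        await graphiti.build_indices_and_constraints()
    finally:
        await graphiti.close()

asyncio.run(main())
```

Run this once per database, after the first connection.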
Create indices for the node properties your queries filter on most.
Query Optimization
Use query plans (`PROFILE`/`EXPLAIN`) to identify bottlenecks.
Connection Pooling
FalkorDB Performance
Redis Configuration
Optimize Redis for FalkorDB.
Graph-Specific Settings
Kuzu Performance
File System Optimization
Memory vs Disk Trade-off
Chunking Configuration
Graphiti automatically chunks large episodes to avoid LLM context limits.
Chunking Parameters
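Assuming chunk size is controlled via an environment variable, like the concurrency limit (`CHUNK_TOKEN_SIZE` is the name referenced in Troubleshooting below; verify against your installed version), a sketch:

```python
import os

# Assumption: CHUNK_TOKEN_SIZE is read from the environment, like
# SEMAPHORE_LIMIT. Smaller chunks mean more (but cheaper) LLM calls;
# larger chunks mean fewer calls that sit closer to the context limit.
os.environ["CHUNK_TOKEN_SIZE"] = "4096"  # illustrative value
```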
Tuning Guidance
Large documents:
Embedding Performance
Batch Embeddings
Graphiti batches embedding requests by default.
Choose Faster Embedding Models
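A sketch of swapping in a smaller OpenAI embedding model. Class and parameter names follow graphiti-core's OpenAI embedder as I understand it, so verify them against your installed version (it also needs API credentials and a database, so it is not runnable as-is):

```python
from graphiti_core import Graphiti
from graphiti_core.embedder.openai import OpenAIEmbedder, OpenAIEmbedderConfig

# text-embedding-3-small trades a little recall for noticeably lower
# latency and cost than the -large variant.
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(embedding_model="text-embedding-3-small")
)

graphiti = Graphiti(
    "bolt://localhost:7687", "neo4j", "password",  # placeholder credentials
    embedder=embedder,
)
```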
Local Embeddings
Use local models to eliminate network latency.
Search Performance
Limit Result Counts
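Fewer results mean less reranking work. A sketch using the `num_results` parameter of `Graphiti.search` (parameter name per graphiti-core's search API; requires a live backend):

```python
# Inside an async context, with `graphiti` already connected:
results = await graphiti.search(
    "What did Alice say about the outage?",  # example query
    num_results=5,  # cap results; reranking cost grows with this number
)
```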
Use Centered Searches
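Centering a search on a known node narrows the graph traversal and reranks hits by graph distance. A sketch, assuming the `center_node_uuid` parameter of graphiti-core's search API:

```python
# `node_uuid` would come from a previous search hit or episode result.
# Inside an async context, with `graphiti` already connected:
results = await graphiti.search(
    "recent incidents involving the billing service",  # example query
    center_node_uuid=node_uuid,  # rerank by distance from this node
)
```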
Optimize Search Configuration
Parallel Processing
Enable Parallel Runtime
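Graphiti can opt into Neo4j's parallel Cypher runtime via an environment flag; the variable name `USE_PARALLEL_RUNTIME` follows graphiti-core's helpers as I understand them, so treat it as an assumption to verify:

```python
import os

# Assumption: graphiti-core checks USE_PARALLEL_RUNTIME when building
# queries. Neo4j's parallel runtime is an Enterprise feature; leave this
# unset on Community Edition.
os.environ["USE_PARALLEL_RUNTIME"] = "True"
```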
Batch Episode Ingestion
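Bulk ingestion amortizes per-call overhead. A sketch around `add_episode_bulk` and `RawEpisode` (names per graphiti-core's bulk utilities as I understand them; verify the field names, and note it requires a live backend):

```python
from datetime import datetime, timezone

from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.bulk_utils import RawEpisode

# Inside an async context, with `graphiti` already connected and
# `note_texts` holding your raw documents:
episodes = [
    RawEpisode(
        name=f"meeting-note-{i}",
        content=text,
        source=EpisodeType.text,
        source_description="meeting notes",
        reference_time=datetime.now(timezone.utc),
    )
    for i, text in enumerate(note_texts)
]
await graphiti.add_episode_bulk(episodes)
```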
Caching Strategies
LLM Response Caching
Some providers support prompt caching.
Application-Level Caching
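For read-heavy workloads, a small TTL cache in front of search calls avoids re-running identical queries. A hypothetical stdlib sketch (not part of Graphiti):

```python
import time

class TTLCache:
    """Tiny time-bounded cache for (query -> results) pairs."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired; drop and report a miss
            return None
        return value

    def put(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic(), value)
```

Check the cache before calling search and store results after; invalidate (or keep the TTL short) if fresh ingestion must be visible immediately.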
Monitoring and Profiling
Enable Logging
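Standard Python logging surfaces Graphiti's internal activity; the logger name `graphiti_core` matches the package name, which is the usual convention but worth verifying:

```python
import logging

# Show INFO-level pipeline activity; bump to DEBUG when diagnosing
# individual LLM calls (DEBUG is verbose).
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logging.getLogger("graphiti_core").setLevel(logging.INFO)
```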
Track Metrics
OpenTelemetry Integration
See `examples/opentelemetry/` for full instrumentation.
Production Deployment
Horizontal Scaling
Deploy multiple Graphiti instances.
Database Clustering
Neo4j Cluster
Load Balancing
Performance Benchmarks
Typical performance on modern hardware:

| Operation | Avg Time | P95 Time | Notes |
|---|---|---|---|
| Add Episode (short) | 2-5s | 8s | SEMAPHORE_LIMIT=10 |
| Add Episode (long) | 8-15s | 25s | With chunking |
| Search (5 results) | 200-500ms | 1s | With indices |
| Search (20 results) | 500ms-1s | 2s | With reranking |
| Bulk ingest (100 episodes) | 30-60s | 90s | Parallel |
Troubleshooting
High Memory Usage
Symptoms: Memory grows unbounded.
Solutions:
- Lower `SEMAPHORE_LIMIT`
- Reduce `CHUNK_TOKEN_SIZE`
- Enable database connection pooling
- Clear episode queue periodically
Slow Ingestion
Symptoms: Episodes take > 30s to process.
Solutions:
- Increase `SEMAPHORE_LIMIT` (if not hitting rate limits)
- Use a faster embedding model
- Reduce chunking overhead
- Check database index health
Rate Limit Errors
Symptoms: 429 errors in logs.
Solutions:
- Lower `SEMAPHORE_LIMIT`
- Implement exponential backoff
- Upgrade LLM provider tier
- Switch to local models (Ollama)