Overview
Filebright uses OpenRouter as the AI provider for:
- Embeddings: Converting document chunks into vector representations for semantic search
- Chat: Answering questions using RAG (Retrieval-Augmented Generation)
Why OpenRouter?
- Model flexibility: Switch between different models without changing code
- Cost optimization: Choose models based on your budget and performance needs
- Reliability: Automatic fallbacks if a model is unavailable
- Simple billing: One account for all AI providers
Getting started
Create an OpenRouter account
- Visit openrouter.ai
- Sign up for a free account
- Add credits to your account (pay-as-you-go pricing)
- Navigate to Keys in the dashboard
- Create a new API key
- Copy the key and add it to your `.env` file
Configuration
Environment variables
Your OpenRouter API key. Example: `sk-or-v1-1234567890abcdef`

The model used to generate vector embeddings for document chunks. Recommended models:
- `text-embedding-3-small` - Fast, cost-effective (1536 dimensions)
- `text-embedding-3-large` - Higher quality (3072 dimensions)
- `text-embedding-ada-002` - Legacy OpenAI model (1536 dimensions)

The model used for RAG-based question answering. Recommended models:
- `openai/gpt-3.5-turbo` - Fast, cost-effective
- `openai/gpt-4-turbo` - Higher quality reasoning
- `openai/gpt-4o` - Balanced performance and cost
- `anthropic/claude-3.5-sonnet` - Excellent for long context
- `meta-llama/llama-3.1-70b-instruct` - Open source alternative
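The exact variable names are not shown above; a typical `.env` fragment might look like the following (the variable names are illustrative, not confirmed from the Filebright source):

```env
OPENROUTER_API_KEY=sk-or-v1-1234567890abcdef
OPENROUTER_EMBEDDING_MODEL=text-embedding-3-small
OPENROUTER_CHAT_MODEL=openai/gpt-3.5-turbo
```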
Configuration file
The OpenRouter configuration is defined in `backend/config/services.php`.
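The configuration block itself is not reproduced here. As a sketch, a Laravel-style `services.php` entry might look like this (the key names and defaults are assumptions, not taken from the Filebright source):

```php
// Hypothetical shape; the actual keys in backend/config/services.php may differ.
'openrouter' => [
    'api_key'         => env('OPENROUTER_API_KEY'),
    'embedding_model' => env('OPENROUTER_EMBEDDING_MODEL', 'text-embedding-3-small'),
    'chat_model'      => env('OPENROUTER_CHAT_MODEL', 'openai/gpt-3.5-turbo'),
],
```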
Embedding models
Embedding models convert text into vector representations for semantic search.

Available models
| Model | Dimensions | Cost | Best for |
|---|---|---|---|
| `text-embedding-3-small` | 1536 | $0.02/1M tokens | General use, cost-effective |
| `text-embedding-3-large` | 3072 | $0.13/1M tokens | High accuracy, larger docs |
| `text-embedding-ada-002` | 1536 | $0.10/1M tokens | Legacy compatibility |
Choosing an embedding model
For most use cases, `text-embedding-3-small` provides excellent performance at the lowest cost. Choose `text-embedding-3-large` if:
- You need higher accuracy for complex queries
- Your documents contain specialized or technical content
- Cost is not a primary concern
How embeddings work
The `EmbeddingService` class (`backend/app/Services/EmbeddingService.php`) handles embedding generation.
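The service's code is not reproduced here. Conceptually, each chunk is sent to the embedding model and the returned vector is stored; at query time, chunks are ranked by how similar their vectors are to the query vector, typically via cosine similarity. A minimal sketch (illustrative Python; the real implementation is PHP):

```python
import math

def cosine_similarity(a, b):
    """Compare two embedding vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Chunks whose stored embeddings score highest against the query
# embedding are returned as context for the chat model.
```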
Chat models
Chat models generate answers to user questions using retrieved context from your documents.

Available models
| Model | Context Window | Cost (Input/Output) | Best for |
|---|---|---|---|
| `openai/gpt-3.5-turbo` | 16K | $0.50/$1.50 per 1M tokens | Fast, cost-effective |
| `openai/gpt-4-turbo` | 128K | $10/$30 per 1M tokens | High quality, complex queries |
| `openai/gpt-4o` | 128K | $2.50/$10 per 1M tokens | Balanced option |
| `anthropic/claude-3.5-sonnet` | 200K | $3/$15 per 1M tokens | Long documents, analysis |
| `anthropic/claude-3-haiku` | 200K | $0.25/$1.25 per 1M tokens | Fast, efficient |
| `meta-llama/llama-3.1-70b-instruct` | 128K | $0.88 per 1M tokens | Open source, privacy |
Choosing a chat model
Start with `openai/gpt-3.5-turbo` for development, then upgrade to `openai/gpt-4o` or `anthropic/claude-3.5-sonnet` for production.

- Budget: GPT-3.5-turbo is most cost-effective
- Quality: GPT-4-turbo and Claude-3.5-sonnet provide better reasoning
- Context length: Claude models support longer contexts for large documents
- Speed: GPT-3.5-turbo and Claude-3-haiku are fastest
How RAG works
The `RAGService` class (`backend/app/Services/RAGService.php`) orchestrates the retrieval and generation process:
- User asks a question
- Question is converted to an embedding
- MongoDB vector search finds similar document chunks
- Chunks are combined as context
- Chat model generates an answer based on the context
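The five steps above can be sketched with stubbed dependencies (illustrative Python; the real orchestration is PHP in `RAGService`):

```python
def answer_question(question, embed, search, generate, limit=3):
    """Minimal RAG flow: embed, retrieve, assemble context, generate."""
    query_vector = embed(question)        # step 2: question -> embedding
    chunks = search(query_vector, limit)  # step 3: vector search in MongoDB
    context = "\n\n".join(chunks)         # step 4: combine chunks as context
    return generate(question, context)    # step 5: chat model answers

# Example with trivial stand-ins for the embedding, search, and chat calls:
answer = answer_question(
    "What is Filebright?",
    embed=lambda q: [0.1, 0.2, 0.3],
    search=lambda vec, k: ["Filebright is a document Q&A app."],
    generate=lambda q, ctx: f"Based on the context: {ctx}",
)
```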
Vector search parameters
The RAG system uses these parameters for retrieval (see `backend/app/Services/RAGService.php`):
- `numCandidates`: Higher = more thorough search, slower
- `limit`: More chunks = more context, higher cost
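For illustration, these parameters map onto a MongoDB Atlas `$vectorSearch` aggregation stage. The index name and field path below are assumptions, not confirmed from the Filebright source:

```python
def vector_search_stage(query_vector, num_candidates=100, limit=3):
    """Build a hypothetical $vectorSearch stage for chunk retrieval."""
    return {
        "$vectorSearch": {
            "index": "vector_index",          # assumed index name
            "path": "embedding",              # assumed field holding vectors
            "queryVector": query_vector,
            "numCandidates": num_candidates,  # higher = more thorough, slower
            "limit": limit,                   # more chunks = more context, cost
        }
    }
```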
API endpoints
OpenRouter provides these endpoints:

Embeddings
Chat completions
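The endpoint request bodies are not reproduced here. As a sketch, OpenRouter exposes an OpenAI-compatible API surface; the chat completions route is `POST /api/v1/chat/completions`, and the payload shapes look roughly like this (the embeddings route and the system-prompt wording are assumptions):

```python
import json

API_BASE = "https://openrouter.ai/api/v1"  # OpenAI-compatible base URL

def embeddings_payload(model, chunks):
    """JSON body for an OpenAI-style embeddings request."""
    return json.dumps({"model": model, "input": chunks})

def chat_payload(model, question, context):
    """JSON body for a chat completions request with retrieved context."""
    return json.dumps({
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    })
```

Both requests are sent with an `Authorization: Bearer <API key>` header.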
Cost optimization
Tips for reducing costs
- Use efficient models
  - `text-embedding-3-small` instead of `text-embedding-3-large`
  - `openai/gpt-3.5-turbo` for simple queries
- Optimize chunk size
  - Larger chunks = fewer embeddings to generate and store
  - Smaller chunks = more precise retrieval
  - Default: 1000 characters with 200 character overlap
- Reduce retrieved chunks
  - Lower `limit` in vector search (default: 3)
  - Fewer chunks = less context sent to chat model
- Cache responses
  - Implement caching for common queries
  - Reuse answers for identical questions
- Monitor usage
  - Check OpenRouter dashboard regularly
  - Set up usage alerts
  - Review which models are being used most
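The chunking defaults mentioned above (1000 characters with 200 characters of overlap) can be sketched as follows; this is an illustration of the trade-off, not Filebright's actual chunker:

```python
def chunk_text(text, size=1000, overlap=200):
    """Split text into overlapping chunks; consecutive chunks share
    `overlap` characters so sentences are not cut off between chunks."""
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```

Raising `size` reduces the number of embedding calls; lowering it makes retrieval more precise but sends more, smaller pieces to the index.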
Example costs
For a typical document upload and query:

Upload a 10-page PDF:
- Text extraction: Free
- Generate embeddings: ~5,000 tokens = $0.0001
- Store in MongoDB: Free
- Total: ~$0.0001

Ask a question:
- Query embedding: ~20 tokens = $0.000001
- Retrieve 3 chunks: Free
- Generate answer: ~500 tokens = $0.00075
- Total: ~$0.00075
With the default configuration, 1,000 document queries cost approximately $0.75.
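A quick back-of-envelope check of these figures:

```python
# Per-query cost from the breakdown above: query embedding + answer generation.
cost_per_query = 0.000001 + 0.00075
total_for_1000 = 1000 * cost_per_query
print(round(total_for_1000, 2))  # -> 0.75
```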
Testing the integration
Verify your OpenRouter integration:

Test embeddings
Test chat
Troubleshooting
API key invalid
- Verify the API key is correct in `.env`
- Check for extra spaces or newlines
- Ensure the key starts with `sk-or-v1-`
- Generate a new key from OpenRouter dashboard
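The first three checks above can be automated with a cheap local sanity check before making a paid API call (an illustrative helper, not part of Filebright):

```python
def looks_like_openrouter_key(key):
    """Return True if the key has the expected OpenRouter prefix
    after stripping stray whitespace/newlines from the .env file."""
    key = key.strip()
    return key.startswith("sk-or-v1-") and len(key) > len("sk-or-v1-")
```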
Insufficient credits
- Check your balance at openrouter.ai
- Add credits to your account
- Review recent usage to identify unexpected costs
Model not found
- Verify the model name is correct
- Check available models at openrouter.ai/models
- Some models require special access
Embeddings dimension mismatch
- Check MongoDB vector index dimensions
  - `text-embedding-3-small` = 1536 dimensions
  - `text-embedding-3-large` = 3072 dimensions
- Update the index or change the model to match
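For reference, a MongoDB Atlas Vector Search index definition ties the dimension count to the indexed field; the field path and similarity choice below are assumptions, not confirmed from the Filebright source:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
```

`numDimensions` must match the embedding model exactly (1536 for `text-embedding-3-small`, 3072 for `text-embedding-3-large`).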
Rate limit exceeded
- OpenRouter enforces rate limits per model
- Implement exponential backoff in your code
- Reduce concurrent requests
- Contact OpenRouter for higher limits
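Exponential backoff can be sketched as follows (an illustrative retry wrapper; the exception type is a stand-in for whatever your HTTP client raises on HTTP 429):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client's rate-limit (HTTP 429) exception."""

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry `call` with exponential backoff and jitter on rate limits."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus up to the same amount again as jitter
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```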
Security best practices
- Never commit API keys: Use `.env` files and add them to `.gitignore`
- Use environment variables: Never hardcode keys in source code
- Rotate keys regularly: Generate new keys periodically
- Monitor usage: Set up alerts for unusual activity
- Restrict key access: Use separate keys for development and production
- Sanitize user input: Always validate and sanitize user queries before sending to the API
Next steps
RAG system
Learn how the RAG system works
Document management
Upload and manage documents