Overview
Filebright uses OpenRouter as the AI provider for:
- Embeddings: Converting document chunks into vector representations for semantic search
- Chat: Answering questions using RAG (Retrieval-Augmented Generation)
Why OpenRouter?
- Model flexibility: Switch between different models without changing code
- Cost optimization: Choose models based on your budget and performance needs
- Reliability: Automatic fallbacks if a model is unavailable
- Simple billing: One account for all AI providers
Getting started
Create an OpenRouter account
- Visit openrouter.ai
- Sign up for a free account
- Add credits to your account (pay-as-you-go pricing)
- Navigate to Keys in the dashboard
- Create a new API key
- Copy the key and add it to your `.env` file
Configuration
Environment variables
Your OpenRouter API key. Example: `sk-or-v1-1234567890abcdef`

The model used to generate vector embeddings for document chunks. Recommended models:
- `text-embedding-3-small` - Fast, cost-effective (1536 dimensions)
- `text-embedding-3-large` - Higher quality (3072 dimensions)
- `text-embedding-ada-002` - Legacy OpenAI model (1536 dimensions)

The model used for RAG-based question answering. Recommended models:
- `openai/gpt-3.5-turbo` - Fast, cost-effective
- `openai/gpt-4-turbo` - Higher quality reasoning
- `openai/gpt-4o` - Balanced performance and cost
- `anthropic/claude-3.5-sonnet` - Excellent for long context
- `meta-llama/llama-3.1-70b-instruct` - Open source alternative
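The exact variable names are not shown above; a typical `.env` fragment might look like the following (the variable names are illustrative, not confirmed from the Filebright source):

```env
OPENROUTER_API_KEY=sk-or-v1-1234567890abcdef
OPENROUTER_EMBEDDING_MODEL=text-embedding-3-small
OPENROUTER_CHAT_MODEL=openai/gpt-3.5-turbo
```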
Configuration file
The OpenRouter configuration is defined in `backend/config/services.php`.
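The configuration block itself is not reproduced here. As a sketch, a Laravel-style `services.php` entry might look like this (the key names and defaults are assumptions, not taken from the Filebright source):

```php
// Hypothetical shape; the actual keys in backend/config/services.php may differ.
'openrouter' => [
    'api_key'         => env('OPENROUTER_API_KEY'),
    'embedding_model' => env('OPENROUTER_EMBEDDING_MODEL', 'text-embedding-3-small'),
    'chat_model'      => env('OPENROUTER_CHAT_MODEL', 'openai/gpt-3.5-turbo'),
],
```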
Embedding models
Embedding models convert text into vector representations for semantic search.

Available models
| Model | Dimensions | Cost | Best for |
|---|---|---|---|
| `text-embedding-3-small` | 1536 | $0.02/1M tokens | General use, cost-effective |
| `text-embedding-3-large` | 3072 | $0.13/1M tokens | High accuracy, larger docs |
| `text-embedding-ada-002` | 1536 | $0.10/1M tokens | Legacy compatibility |
Choosing an embedding model
For most use cases, `text-embedding-3-small` provides excellent performance at the lowest cost. Choose `text-embedding-3-large` if:
- You need higher accuracy for complex queries
- Your documents contain specialized or technical content
- Cost is not a primary concern
How embeddings work
The `EmbeddingService` class (`backend/app/Services/EmbeddingService.php`) handles embedding generation.
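The service's code is not reproduced here. Conceptually, each chunk is sent to the embedding model and the returned vector is stored; at query time, chunks are ranked by how similar their vectors are to the query vector, typically via cosine similarity. A minimal sketch (illustrative Python; the real implementation is PHP):

```python
import math

def cosine_similarity(a, b):
    """Compare two embedding vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Chunks whose stored embeddings score highest against the query
# embedding are returned as context for the chat model.
```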
Chat models
Chat models generate answers to user questions using retrieved context from your documents.

Available models
| Model | Context Window | Cost (Input/Output) | Best for |
|---|---|---|---|
| `openai/gpt-3.5-turbo` | 16K | $0.50/$1.50 per 1M tokens | Fast, cost-effective |
| `openai/gpt-4-turbo` | 128K | $10/$30 per 1M tokens | High quality, complex queries |
| `openai/gpt-4o` | 128K | $2.50/$10 per 1M tokens | Balanced option |
| `anthropic/claude-3.5-sonnet` | 200K | $3/$15 per 1M tokens | Long documents, analysis |
| `anthropic/claude-3-haiku` | 200K | $0.25/$1.25 per 1M tokens | Fast, efficient |
| `meta-llama/llama-3.1-70b-instruct` | 128K | $0.88 per 1M tokens | Open source, privacy |
Choosing a chat model
Start with `openai/gpt-3.5-turbo` for development, then upgrade to `openai/gpt-4o` or `anthropic/claude-3.5-sonnet` for production.

- Budget: GPT-3.5-turbo is most cost-effective
- Quality: GPT-4-turbo and Claude-3.5-sonnet provide better reasoning
- Context length: Claude models support longer contexts for large documents
- Speed: GPT-3.5-turbo and Claude-3-haiku are fastest
How RAG works
The `RAGService` class (`backend/app/Services/RAGService.php`) orchestrates the retrieval and generation process:
- User asks a question
- Question is converted to an embedding
- MongoDB vector search finds similar document chunks
- Chunks are combined as context
- Chat model generates an answer based on the context
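The five steps above can be sketched with stubbed dependencies (illustrative Python; the real orchestration is PHP in `RAGService`):

```python
def answer_question(question, embed, search, generate, limit=3):
    """Minimal RAG flow: embed, retrieve, assemble context, generate."""
    query_vector = embed(question)        # step 2: question -> embedding
    chunks = search(query_vector, limit)  # step 3: vector search in MongoDB
    context = "\n\n".join(chunks)         # step 4: combine chunks as context
    return generate(question, context)    # step 5: chat model answers

# Example with trivial stand-ins for the embedding, search, and chat calls:
answer = answer_question(
    "What is Filebright?",
    embed=lambda q: [0.1, 0.2, 0.3],
    search=lambda vec, k: ["Filebright is a document Q&A app."],
    generate=lambda q, ctx: f"Based on the context: {ctx}",
)
```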
Vector search parameters
The RAG system uses these parameters for retrieval (see `backend/app/Services/RAGService.php`):
- `numCandidates`: Higher = more thorough search, slower
- `limit`: More chunks = more context, higher cost
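For illustration, these parameters map onto a MongoDB Atlas `$vectorSearch` aggregation stage. The index name and field path below are assumptions, not confirmed from the Filebright source:

```python
def vector_search_stage(query_vector, num_candidates=100, limit=3):
    """Build a hypothetical $vectorSearch stage for chunk retrieval."""
    return {
        "$vectorSearch": {
            "index": "vector_index",          # assumed index name
            "path": "embedding",              # assumed field holding vectors
            "queryVector": query_vector,
            "numCandidates": num_candidates,  # higher = more thorough, slower
            "limit": limit,                   # more chunks = more context, cost
        }
    }
```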
API endpoints
OpenRouter provides these endpoints:

Embeddings
Chat completions
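The endpoint request bodies are not reproduced here. As a sketch, OpenRouter exposes an OpenAI-compatible API surface; the chat completions route is `POST /api/v1/chat/completions`, and the payload shapes look roughly like this (the embeddings route and the system-prompt wording are assumptions):

```python
import json

API_BASE = "https://openrouter.ai/api/v1"  # OpenAI-compatible base URL

def embeddings_payload(model, chunks):
    """JSON body for an OpenAI-style embeddings request."""
    return json.dumps({"model": model, "input": chunks})

def chat_payload(model, question, context):
    """JSON body for a chat completions request with retrieved context."""
    return json.dumps({
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    })
```

Both requests are sent with an `Authorization: Bearer <API key>` header.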
Cost optimization
Tips for reducing costs
- Use efficient models
  - `text-embedding-3-small` instead of `text-embedding-3-large`
  - `openai/gpt-3.5-turbo` for simple queries
- Optimize chunk size
  - Larger chunks = fewer embeddings to generate and store
  - Smaller chunks = more precise retrieval
  - Default: 1000 characters with 200 character overlap
- Reduce retrieved chunks
  - Lower `limit` in vector search (default: 3)
  - Fewer chunks = less context sent to chat model
- Cache responses
  - Implement caching for common queries
  - Reuse answers for identical questions
- Monitor usage
  - Check OpenRouter dashboard regularly
  - Set up usage alerts
  - Review which models are being used most
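The chunking defaults mentioned above (1000 characters with 200 characters of overlap) can be sketched as follows; this is an illustration of the trade-off, not Filebright's actual chunker:

```python
def chunk_text(text, size=1000, overlap=200):
    """Split text into overlapping chunks; consecutive chunks share
    `overlap` characters so sentences are not cut off between chunks."""
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```

Raising `size` reduces the number of embedding calls; lowering it makes retrieval more precise but sends more, smaller pieces to the index.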
Example costs
For a typical document upload and query:

Upload a 10-page PDF:
- Text extraction: Free
- Generate embeddings: ~5,000 tokens = $0.0001
- Store in MongoDB: Free
- Total: ~$0.0001

Ask a question:
- Query embedding: ~20 tokens = $0.000001
- Retrieve 3 chunks: Free
- Generate answer: ~500 tokens = $0.00075
- Total: ~$0.00075
With the default configuration, 1,000 document queries cost approximately $0.75.
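A quick back-of-envelope check of these figures:

```python
# Per-query cost from the breakdown above: query embedding + answer generation.
cost_per_query = 0.000001 + 0.00075
total_for_1000 = 1000 * cost_per_query
print(round(total_for_1000, 2))  # -> 0.75
```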
Testing the integration
Verify your OpenRouter integration:

Test embeddings
Test chat
Troubleshooting
API key invalid
- Verify the API key is correct in `.env`
- Check for extra spaces or newlines
- Ensure the key starts with `sk-or-v1-`
- Generate a new key from OpenRouter dashboard
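The first three checks above can be automated with a cheap local sanity check before making a paid API call (an illustrative helper, not part of Filebright):

```python
def looks_like_openrouter_key(key):
    """Return True if the key has the expected OpenRouter prefix
    after stripping stray whitespace/newlines from the .env file."""
    key = key.strip()
    return key.startswith("sk-or-v1-") and len(key) > len("sk-or-v1-")
```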
Insufficient credits
- Check your balance at openrouter.ai
- Add credits to your account
- Review recent usage to identify unexpected costs
Model not found
- Verify the model name is correct
- Check available models at openrouter.ai/models
- Some models require special access
Embeddings dimension mismatch
- Check MongoDB vector index dimensions
  - `text-embedding-3-small` = 1536 dimensions
  - `text-embedding-3-large` = 3072 dimensions
- Update the index or change the model to match
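For reference, a MongoDB Atlas Vector Search index definition ties the dimension count to the indexed field; the field path and similarity choice below are assumptions, not confirmed from the Filebright source:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
```

`numDimensions` must match the embedding model exactly (1536 for `text-embedding-3-small`, 3072 for `text-embedding-3-large`).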
Rate limit exceeded
- OpenRouter enforces rate limits per model
- Implement exponential backoff in your code
- Reduce concurrent requests
- Contact OpenRouter for higher limits
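Exponential backoff can be sketched as follows (an illustrative retry wrapper; the exception type is a stand-in for whatever your HTTP client raises on HTTP 429):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client's rate-limit (HTTP 429) exception."""

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry `call` with exponential backoff and jitter on rate limits."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus up to the same amount again as jitter
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```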
Security best practices
- Never commit API keys: Use `.env` files and add them to `.gitignore`
- Use environment variables: Never hardcode keys in source code
- Rotate keys regularly: Generate new keys periodically
- Monitor usage: Set up alerts for unusual activity
- Restrict key access: Use separate keys for development and production
- Sanitize user input: Always validate and sanitize user queries before sending to the API
Next steps
RAG system
Learn how the RAG system works
Document management
Upload and manage documents