OpenAI API Configuration

Overview

The WhatsApp RAG Bot uses OpenAI’s API for:

Chat Completion: GPT models for conversational responses
Text Embeddings: Vector embeddings for semantic search (RAG)
Audio Transcription: Whisper model for voice message processing

Prerequisites

You need an OpenAI API key with access to:

GPT models (gpt-3.5-turbo or gpt-4)
Text embedding models
Whisper audio transcription

Get your API key at platform.openai.com/api-keys

Configuration Options

Option 1: Database Configuration (Recommended)

Configure through the admin dashboard, which saves the values through the credential service:

src/Services/CredentialService.php

$credentialService->saveOpenAICredentials([
    'api_key' => 'sk-...',
    'model' => 'gpt-4',
    'embedding_model' => 'text-embedding-3-small'
]);

Credentials stored in the database are automatically encrypted with AES-256-CBC.
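For readers who want to see what AES-256-CBC encryption of a credential can look like in PHP, here is a minimal sketch using the OpenSSL extension. The helper names `encryptSecret` and `decryptSecret` are hypothetical, and the storage format (random IV prepended to the ciphertext, then Base64-encoded) is an assumption; the actual CredentialService may work differently.

```php
<?php
// Hypothetical helpers for illustration; not the project's CredentialService.

function encryptSecret(string $plaintext, string $key): string
{
    $cipher = 'aes-256-cbc';
    // A fresh random IV per value; it is not secret and is stored with the ciphertext.
    $iv = random_bytes(openssl_cipher_iv_length($cipher));
    $ciphertext = openssl_encrypt($plaintext, $cipher, $key, OPENSSL_RAW_DATA, $iv);
    if ($ciphertext === false) {
        throw new RuntimeException('Encryption failed');
    }
    return base64_encode($iv . $ciphertext);
}

function decryptSecret(string $encoded, string $key): string
{
    $cipher = 'aes-256-cbc';
    $ivLength = openssl_cipher_iv_length($cipher);
    $raw = base64_decode($encoded, true);
    if ($raw === false || strlen($raw) <= $ivLength) {
        throw new RuntimeException('Malformed encrypted value');
    }
    // Split the stored value back into IV and ciphertext.
    $iv = substr($raw, 0, $ivLength);
    $plaintext = openssl_decrypt(substr($raw, $ivLength), $cipher, $key, OPENSSL_RAW_DATA, $iv);
    if ($plaintext === false) {
        throw new RuntimeException('Decryption failed');
    }
    return $plaintext;
}

// Example: a 32-byte (256-bit) key; in practice it would come from secure configuration.
$key = random_bytes(32);
$stored = encryptSecret('sk-example', $key);
```

CBC mode on its own does not authenticate the ciphertext, so a production implementation would typically add an integrity check such as an HMAC.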

Option 2: Environment Variables

Add these to your .env file:

.env

OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDING_MODEL=text-embedding-ada-002

Configuration File

config/config.php

'openai' => [
    'api_key' => getenv('OPENAI_API_KEY') ?: '',
    'model' => getenv('OPENAI_MODEL') ?: 'gpt-3.5-turbo',
    'embedding_model' => getenv('OPENAI_EMBEDDING_MODEL') ?: 'text-embedding-ada-002',
    'temperature' => 0.7,
    'max_tokens' => 500
],

Available Models

Chat Models

GPT-3.5 Turbo

Model: gpt-3.5-turbo
Fast and cost-effective for most use cases. Good balance of performance and price.

Context: 16k tokens
Speed: Very fast
Cost: Low

GPT-4

Model: gpt-4
More capable and accurate responses. Better for complex reasoning.

Context: 8k tokens
Speed: Slower
Cost: Higher

GPT-4 Turbo

Model: gpt-4-turbo-preview
GPT-4 Turbo preview with improved performance and a larger context window.

Context: 128k tokens
Speed: Fast
Cost: Moderate

GPT-4o

Model: gpt-4o
Optimized version with the best balance of speed and capability.

Context: 128k tokens
Speed: Very fast
Cost: Moderate

Embedding Models

Ada-002 (Legacy)

Model: text-embedding-ada-002
The previous generation embedding model.

Dimensions: 1536
Cost: $0.0001 per 1K tokens
Good for most RAG use cases

'embedding_model' => 'text-embedding-ada-002'

v3-small (Recommended)

Model: text-embedding-3-small
New generation model with better performance.

Dimensions: 1536
Cost: $0.00002 per 1K tokens (5x cheaper)
Improved accuracy

'embedding_model' => 'text-embedding-3-small'

v3-large

Model: text-embedding-3-large
Highest quality embeddings.

Dimensions: 3072
Cost: $0.00013 per 1K tokens
Best accuracy for complex queries

'embedding_model' => 'text-embedding-3-large'

Model Parameters

Temperature

Controls randomness in responses (0.0 to 2.0):

'temperature' => 0.7  // Default: balanced creativity

0.0-0.3: Focused and deterministic (good for factual responses)
0.4-0.7: Balanced creativity and consistency (recommended)
0.8-1.0: More creative and varied responses
1.1-2.0: Very creative (may be less coherent)
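Because values outside 0.0 to 2.0 are not valid, it can help to clamp a configured temperature before sending it. The helper below is a hypothetical sketch, not part of the bot's code.

```php
<?php
// Hypothetical helper: keeps a configured temperature inside the 0.0-2.0 range.
function clampTemperature(float $value): float
{
    return max(0.0, min(2.0, $value));
}

$temperature = clampTemperature(2.5); // out-of-range input is pulled back to the upper bound
```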

Max Tokens

Maximum length of generated responses:

'max_tokens' => 500  // Default: ~375 words

1 token ≈ 0.75 words in English. Adjust based on your needs:

Short answers: 150-300 tokens
Medium answers: 300-500 tokens
Long answers: 500-1000 tokens
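The rule of thumb above can be turned into a quick estimate when picking a max_tokens value. This is a rough sketch with hypothetical helper names; the 0.75 ratio is only an approximation for English text and varies by language and content.

```php
<?php
// Hypothetical helpers based on the "1 token is roughly 0.75 words" approximation.

function estimateWords(int $maxTokens): int
{
    return (int) round($maxTokens * 0.75);
}

function estimateTokens(int $words): int
{
    // Round up so the budget is not underestimated.
    return (int) ceil($words / 0.75);
}

$words = estimateWords(500);   // the default max_tokens setting
$tokens = estimateTokens(100); // budget for a reply of about 100 words
```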

Usage in Code

webhook.php

$openaiTemperature = 0.7;
$openaiMaxTokens = 500;

if ($credentialService && $credentialService->hasOpenAICredentials()) {
    $oaiCreds = $credentialService->getOpenAICredentials();
    $openai = new OpenAIService(
        $oaiCreds['api_key'],
        $oaiCreds['model'],
        $oaiCreds['embedding_model'],
        $logger
    );
    $openaiTemperature = $oaiCreds['temperature'] ?? 0.7;
    $openaiMaxTokens = $oaiCreds['max_tokens'] ?? 500;
}

OpenAI Service Features

Chat Completion

Generate responses with context and conversation history:

$openai = new OpenAIService(
    $apiKey,
    'gpt-3.5-turbo',
    'text-embedding-ada-002',
    $logger
);

$response = $openai->generateResponse(
    $userMessage,
    $contextFromRAG,
    $systemPrompt,
    0.7,  // temperature
    500,  // max tokens
    []    // conversation history (empty for a first message)
);
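The internals of generateResponse are not shown here. As an illustration only, the sketch below shows one plausible way the system prompt, RAG context, conversation history, and user message could be combined into the role/content message list that the Chat Completions API expects. The function name `buildChatMessages` and the way context is appended to the system prompt are assumptions.

```php
<?php
// Hypothetical helper; the real OpenAIService may assemble messages differently.
function buildChatMessages(
    string $systemPrompt,
    string $context,
    array $history,
    string $userMessage
): array {
    $system = $systemPrompt;
    if ($context !== '') {
        // Assumption: retrieved chunks are appended to the system prompt.
        $system .= "\n\nContext:\n" . $context;
    }

    $messages = [['role' => 'system', 'content' => $system]];

    // Earlier turns, oldest first, each with a 'role' and 'content' key.
    foreach ($history as $turn) {
        $messages[] = ['role' => $turn['role'], 'content' => $turn['content']];
    }

    // The new user message always goes last.
    $messages[] = ['role' => 'user', 'content' => $userMessage];

    return $messages;
}

$messages = buildChatMessages(
    'You are a helpful assistant.',
    'Opening hours: 9:00 to 17:00.',
    [
        ['role' => 'user', 'content' => 'Hi'],
        ['role' => 'assistant', 'content' => 'Hello! How can I help?'],
    ],
    'When are you open?'
);
```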

Text Embeddings

Generate vector embeddings for semantic search:

// Single text
$embedding = $openai->generateEmbedding($text);

// Batch processing
$texts = ['text 1', 'text 2', 'text 3'];
$embeddings = $openai->generateEmbeddings($texts);

Audio Transcription

Transcribe voice messages using Whisper:

src/Services/OpenAIService.php

public function transcribeAudio($audioContent, $filename = 'audio.ogg')
{
    $response = $this->client->post('audio/transcriptions', [
        'multipart' => [
            [
                'name' => 'file',
                'contents' => $audioContent,
                'filename' => $filename
            ],
            [
                'name' => 'model',
                'contents' => 'whisper-1'
            ],
            [
                'name' => 'language',
                'contents' => 'es'  // Spanish
            ]
        ]
    ]);
    
    $data = json_decode($response->getBody()->getContents(), true);
    return $data['text'] ?? '';
}

Audio transcription is only available in AI mode. In classic mode, users are prompted to send text messages instead.

RAG Configuration

The bot uses embeddings for semantic search in the RAG system:

config/config.php

'rag' => [
    'chunk_size' => 500,              // Characters per chunk
    'chunk_overlap' => 50,            // Overlap between chunks
    'top_k_results' => 3,             // Number of similar chunks to retrieve
    'similarity_threshold' => 0.7,    // Minimum similarity score (0.0-1.0)
    'similarity_method' => 'cosine'   // Similarity calculation method
],
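To make chunk_size and chunk_overlap concrete, here is a minimal sketch of character-based chunking with overlap. The function name `chunkText` is hypothetical, the sketch assumes the mbstring extension is available, and the bot's own splitter may behave differently (for example by respecting sentence boundaries).

```php
<?php
// Hypothetical chunker: fixed-size character windows that overlap by $overlap characters.
function chunkText(string $text, int $chunkSize = 500, int $overlap = 50): array
{
    if ($chunkSize <= 0 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException('Require chunkSize > 0 and 0 <= overlap < chunkSize');
    }

    $chunks = [];
    $step = $chunkSize - $overlap; // how far the window advances each time
    $length = mb_strlen($text);

    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = mb_substr($text, $start, $chunkSize);
        // Stop once the current window has reached the end of the text.
        if ($start + $chunkSize >= $length) {
            break;
        }
    }

    return $chunks;
}

// With the defaults, a 1000-character document yields windows starting at 0, 450, and 900.
$chunks = chunkText(str_repeat('a', 1000));
```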

How RAG Works

Document Processing

Documents are split into chunks and converted to embeddings using your configured embedding model.

Query Embedding

User queries are converted to embeddings using the same model.

Similarity Search

The system finds the most similar document chunks using cosine similarity.

Context Generation

Top matching chunks are provided as context to the GPT model.

Response Generation

GPT generates a response based on the context and conversation history.
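The similarity search step can be sketched in a few lines. The code below computes cosine similarity and applies the top_k_results and similarity_threshold settings to an in-memory list of chunks; the function names and the chunk array shape are assumptions for illustration, not the bot's actual retrieval code.

```php
<?php
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same dimension');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // a zero vector has no direction
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

// Hypothetical retrieval helper: each chunk is ['text' => string, 'embedding' => float[]].
function topMatches(array $queryEmbedding, array $chunks, int $topK = 3, float $threshold = 0.7): array
{
    $scored = [];
    foreach ($chunks as $chunk) {
        $score = cosineSimilarity($queryEmbedding, $chunk['embedding']);
        if ($score >= $threshold) {
            $scored[] = ['text' => $chunk['text'], 'score' => $score];
        }
    }
    // Highest score first, then keep at most $topK results.
    usort($scored, function (array $x, array $y): int {
        return $y['score'] <=> $x['score'];
    });
    return array_slice($scored, 0, $topK);
}

$chunks = [
    ['text' => 'A', 'embedding' => [1.0, 0.0]],
    ['text' => 'B', 'embedding' => [0.0, 1.0]],
    ['text' => 'C', 'embedding' => [1.0, 1.0]],
];
$matches = topMatches([1.0, 0.0], $chunks);
```

In this toy example chunk B is orthogonal to the query and falls below the threshold, so only A and C are returned.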

Error Handling

Insufficient Funds

The system automatically detects when your OpenAI account has insufficient credits:

webhook.php

function handleInsufficientFunds($db, $e) {
    if (strpos($e->getMessage(), 'INSUFFICIENT_FUNDS') !== false) {
        $db->query(
            "INSERT INTO settings (setting_key, setting_value) 
             VALUES ('openai_status', 'insufficient_funds') 
             ON DUPLICATE KEY UPDATE setting_value = 'insufficient_funds'",
            []
        );
        return true;
    }
    return false;
}

When OpenAI credits are depleted, the bot will use fallback messages. Monitor your usage at platform.openai.com/usage

Rate Limiting

OpenAI has rate limits based on your account tier:

Free tier: Limited requests per minute
Pay-as-you-go: Higher limits based on usage history
Enterprise: Custom rate limits

Implement retry logic or upgrade your account if you hit rate limits frequently.
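A retry wrapper with exponential backoff might look like the sketch below. It assumes rate-limit failures surface as a RuntimeException (or a subclass); the function name `withRetry` and its defaults are hypothetical, and a real implementation would ideally retry only on rate-limit or transient errors rather than on every exception.

```php
<?php
// Hypothetical retry helper with exponential backoff and a little random jitter.
function withRetry(callable $operation, int $maxAttempts = 4, int $baseDelayMs = 500)
{
    $attempt = 0;
    while (true) {
        try {
            return $operation();
        } catch (RuntimeException $e) {
            $attempt++;
            if ($attempt >= $maxAttempts) {
                throw $e; // give up and surface the last error
            }
            // Delay doubles on each failed attempt: base, 2x base, 4x base, ...
            $delayMs = $baseDelayMs * (2 ** ($attempt - 1));
            $delayMs += random_int(0, intdiv($delayMs, 4)); // jitter spreads out retries
            usleep($delayMs * 1000);
        }
    }
}

// Usage: the callable fails twice before succeeding.
$calls = 0;
$result = withRetry(function () use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('Rate limit exceeded');
    }
    return 'ok';
}, 4, 1);
```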

Cost Optimization

Model Selection

Choose models based on your budget.

Most Cost-Effective:

'model' => 'gpt-3.5-turbo',
'embedding_model' => 'text-embedding-3-small'

Balanced:

'model' => 'gpt-4o-mini',
'embedding_model' => 'text-embedding-3-small'

Best Quality:

'model' => 'gpt-4',
'embedding_model' => 'text-embedding-3-large'

Token Management

Reduce token usage:

Limit conversation history:

UPDATE settings 
SET setting_value = '3' 
WHERE setting_key = 'context_messages_count';

Reduce max tokens:

'max_tokens' => 300  // Shorter responses

Optimize RAG chunks:

'rag' => [
    'top_k_results' => 2,  // Fewer context chunks
    'chunk_size' => 400    // Smaller chunks
]

Caching

The system caches query embeddings to reduce API calls:

database/schema.sql

CREATE TABLE IF NOT EXISTS query_embedding_cache (
    query_hash VARCHAR(32) NOT NULL PRIMARY KEY,
    embedding MEDIUMBLOB NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_used_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    hit_count INT DEFAULT 0,
    INDEX idx_last_used (last_used_at)
);

Cached embeddings are reused for identical queries, saving API costs.
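The lookup logic behind such a cache can be sketched as follows, with an in-memory array standing in for the query_embedding_cache table. Several details are assumptions: the key is an MD5 hash of the query (its 32 hex characters happen to fit the VARCHAR(32) column), embeddings are packed as 32-bit little-endian floats for the blob column, and all function names are hypothetical.

```php
<?php
// Assumption: the cache key is an MD5 hash of the exact query text (32 hex characters).
function cacheKey(string $query): string
{
    return md5($query);
}

// Pack floats into a compact binary string suitable for a BLOB column.
// 32-bit floats lose some precision compared with PHP's 64-bit floats.
function packEmbedding(array $embedding): string
{
    return pack('g*', ...$embedding);
}

function unpackEmbedding(string $blob): array
{
    // unpack() returns 1-based keys; array_values() restores a plain list.
    return array_values(unpack('g*', $blob));
}

// $cache stands in for the database table; $embed is the call that hits the API.
function cachedEmbedding(string $query, array &$cache, callable $embed): array
{
    $key = cacheKey($query);
    if (isset($cache[$key])) {
        $cache[$key]['hit_count']++;
        return unpackEmbedding($cache[$key]['embedding']);
    }
    $embedding = $embed($query);
    $cache[$key] = ['embedding' => packEmbedding($embedding), 'hit_count' => 0];
    return $embedding;
}

// Usage with a stub in place of the real embedding call.
$cache = [];
$apiCalls = 0;
$embed = function (string $query) use (&$apiCalls): array {
    $apiCalls++;
    return [0.5, -0.25, 1.0];
};
$first = cachedEmbedding('hola', $cache, $embed);
$second = cachedEmbedding('hola', $cache, $embed); // served from the cache
```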

Testing Your Configuration

Test API Connection

Send a test message through the admin dashboard or directly via WhatsApp.

Verify Embedding Generation

Upload a test document and check if vectors are generated successfully.

Test Audio Transcription

Send a voice message (in AI mode) and verify it’s transcribed correctly.

Monitor Logs

Check for any OpenAI API errors:

tail -f logs/app.log | grep -i openai

Troubleshooting

Invalid API key error

Verify your API key is correct
Check that the key hasn’t been revoked
Ensure there are no extra spaces or newlines
Generate a new key at platform.openai.com/api-keys

Model not found error

Verify your account has access to the specified model
Check for typos in the model name
Some models require special access (e.g., GPT-4)
Try using gpt-3.5-turbo as a fallback

Rate limit exceeded

Reduce the number of concurrent requests
Implement request queuing
Upgrade your OpenAI account tier
Add retry logic with exponential backoff

Embedding dimension mismatch

If you change embedding models, you must:

Clear the vectors table
Clear the query_embedding_cache table
Re-process all documents

Dimensions by model (vectors from different models are not comparable, even when the dimensions match):

text-embedding-ada-002: 1536
text-embedding-3-small: 1536
text-embedding-3-large: 3072
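A quick sanity check can catch stale vectors after switching models. The sketch below uses the dimensions listed above; the constant and function names are hypothetical.

```php
<?php
// Dimensions as listed in this document.
const EMBEDDING_DIMENSIONS = [
    'text-embedding-ada-002' => 1536,
    'text-embedding-3-small' => 1536,
    'text-embedding-3-large' => 3072,
];

// Hypothetical helper: true when a stored vector has the length the model produces.
function embeddingMatchesModel(string $model, array $embedding): bool
{
    if (!array_key_exists($model, EMBEDDING_DIMENSIONS)) {
        throw new InvalidArgumentException("Unknown embedding model: $model");
    }
    return count($embedding) === EMBEDDING_DIMENSIONS[$model];
}

$storedVector = array_fill(0, 1536, 0.0); // e.g. a vector created with text-embedding-3-small
$stillValid = embeddingMatchesModel('text-embedding-3-large', $storedVector);
```

A length check cannot detect a switch between text-embedding-ada-002 and text-embedding-3-small, since both produce 1536 values, so clearing and re-processing is still required in that case.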

Next Steps

System Settings

Configure system prompts and bot behavior

Google Calendar

Set up appointment scheduling
