Overview

The RAGService is the core component that implements Retrieval-Augmented Generation (RAG). It combines vector search with OpenAI’s language models to provide context-aware, accurate responses based on your indexed documents.

Class Structure

Constructor

public function __construct(
    OpenAIService $openai,
    VectorSearchService $vectorSearch,
    Logger $logger,
    $topK = 3,
    $threshold = 0.7,
    Database $db = null
)
  • openai (OpenAIService, required): OpenAI service instance for embeddings and chat completions
  • vectorSearch (VectorSearchService, required): Vector search service for finding similar document chunks
  • logger (Logger, required): Logger instance for tracking operations
  • topK (int, default: 3): Number of most similar chunks to retrieve
  • threshold (float, default: 0.7): Minimum similarity score (0.0-1.0) to include a chunk
  • db (Database, default: null): Optional database connection for embedding caching

Instantiation Example

webhook.php
$vectorSearch = new VectorSearchService(
    $db,
    Config::get('rag.similarity_method')
);

$rag = new RAGService(
    $openai,
    $vectorSearch,
    $logger,
    3,      // Top 3 results
    0.7,    // 70% similarity threshold
    $db     // Enable caching
);

Core Methods

generateResponse()

Generates an AI response using relevant document context.
public function generateResponse(
    $userMessage,
    $systemPrompt = null,
    $conversationHistory = [],
    $temperature = 0.7,
    $maxTokens = 500
)
  • userMessage (string, required): The user’s query or message
  • systemPrompt (string, default: null): Custom system prompt to guide AI behavior
  • conversationHistory (array, default: []): Previous messages for context (format: [['sender' => 'user|bot', 'message_text' => '...']])
  • temperature (float, default: 0.7): Creativity level (0.0-1.0)
  • maxTokens (int, default: 500): Maximum response length
Returns: Array containing:
  • response (string|null): Generated AI response
  • context (string): Retrieved document context
  • confidence (float): Highest similarity score
  • sources (array): Source documents with scores

Usage Example

webhook.php
$result = $rag->generateResponse(
    $messageData['text'],
    $systemPrompt,
    $conversationHistory,
    $openaiTemperature,
    $openaiMaxTokens
);

if ($result['response'] && $result['confidence'] >= 0.7) {
    $whatsapp->sendMessage(
        $conversation['phone_number'],
        $result['response']
    );
    
    $conversationService->addMessage(
        $conversation['id'],
        'bot',
        $result['response'],
        null,
        $result['context'],
        $result['confidence']
    );
}

indexDocument()

Indexes a document by splitting it into chunks and generating embeddings.
public function indexDocument(
    $documentId,
    $text,
    $chunkSize = 500,
    $overlap = 50
)
  • documentId (int, required): Database ID of the document being indexed
  • text (string, required): Full text content to index
  • chunkSize (int, default: 500): Number of words per chunk
  • overlap (int, default: 50): Number of overlapping words between chunks
Returns: Number of chunks successfully indexed

Usage Example

$documentText = TextProcessor::extractText($filePath, 'pdf');
$chunksIndexed = $rag->indexDocument(
    $documentId,
    $documentText,
    500,  // 500 words per chunk
    50    // 50 word overlap
);

$logger->info("Indexed {$chunksIndexed} chunks");
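The windowing behavior behind chunkSize and overlap can be sketched in pure PHP. This is an illustration of the semantics only, not RAGService's actual implementation:

```php
// Illustrative word-based chunker (an assumption, not RAGService's
// internals): slide a window of $chunkSize words, stepping forward by
// $chunkSize - $overlap so consecutive chunks share $overlap words.
function chunkWords(string $text, int $chunkSize = 500, int $overlap = 50): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $step = $chunkSize - $overlap;

    for ($i = 0; $i < count($words); $i += $step) {
        $chunks[] = implode(' ', array_slice($words, $i, $chunkSize));
        if ($i + $chunkSize >= count($words)) {
            break; // last window already covers the tail
        }
    }

    return $chunks;
}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.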

RAG Pipeline Algorithm

The generateResponse() method implements a four-step RAG pipeline:

1. Query Embedding: converts the user’s message into a vector embedding, using the cache when available
$queryEmbedding = $this->getCachedOrCreateEmbedding($userMessage);

2. Vector Search: searches for the most similar document chunks
$similarChunks = $this->vectorSearch->searchSimilar(
    $queryEmbedding,
    $this->topK,
    $this->threshold
);

3. Context Assembly: combines the retrieved chunks into a single context string
foreach ($similarChunks as $chunk) {
    $contextParts[] = $chunk['chunk_text'];
    $sources[] = [
        'document' => $chunk['original_name'],
        'score' => $chunk['score']
    ];
    $maxScore = max($maxScore, $chunk['score']);
}
$context = implode("\n\n", $contextParts);

4. Response Generation: generates the AI response with the context and conversation history
$response = $this->openai->generateResponse(
    $userMessage,
    $context,
    $systemPrompt,
    $temperature,
    $maxTokens,
    $conversationHistory
);
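The retrieval contract behind steps 2 and 3 can be illustrated in plain PHP. This sketch shows only the expected semantics (threshold filter, topK cap, context join), not the service's database-backed implementation:

```php
// Illustrative sketch (an assumption, not the service's code): keep chunks
// scoring at or above the threshold, take the topK best, then join their
// text into one context string and report the highest score as confidence.
function assembleContext(array $scoredChunks, int $topK, float $threshold): array
{
    $kept = array_values(array_filter(
        $scoredChunks,
        fn($c) => $c['score'] >= $threshold
    ));
    usort($kept, fn($a, $b) => $b['score'] <=> $a['score']);
    $kept = array_slice($kept, 0, $topK);

    return [
        'context'    => implode("\n\n", array_column($kept, 'chunk_text')),
        'confidence' => $kept ? max(array_column($kept, 'score')) : 0.0,
    ];
}
```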

Embedding Caching

The service caches query embeddings in the database to reduce API costs and improve response times.

Cache Strategy

$normalized = trim(mb_strtolower($userMessage));
$queryHash = md5($normalized);

$cached = $this->db->fetchOne(
    'SELECT embedding FROM query_embedding_cache 
     WHERE query_hash = :hash 
     AND created_at > DATE_SUB(NOW(), INTERVAL 24 HOUR)',
    [':hash' => $queryHash]
);

if ($cached && !empty($cached['embedding'])) {
    $this->db->query(
        'UPDATE query_embedding_cache 
         SET last_used_at = NOW(), hit_count = hit_count + 1 
         WHERE query_hash = :hash',
        [':hash' => $queryHash]
    );
    return VectorMath::unserializeVector($cached['embedding']);
}
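Because lookups normalize case and surrounding whitespace before hashing, superficially different queries resolve to the same cache entry:

```php
// Cache key derivation as used above (extracted into a helper here
// purely for illustration; RAGService inlines these calls).
function cacheKey(string $query): string
{
    return md5(trim(mb_strtolower($query)));
}
```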

Cache Benefits

Cost Reduction

Avoids redundant embedding API calls for repeated queries

Performance

Instant embedding retrieval from database

Case Insensitive

Normalizes queries to maximize cache hits

Auto Cleanup

Removes entries unused for 7 days

Response Structure

No Context Found

if (empty($similarChunks)) {
    return [
        'response' => null,
        'context' => '',
        'confidence' => 0.0,
        'sources' => []
    ];
}
When no relevant context is found, the bot falls back to calling the OpenAI service without RAG context, or sends a fallback message.

Successful Response

{
  "response": "Based on our documentation, you can reset your password by...",
  "context": "Password Reset\n\nTo reset your password: 1. Click 'Forgot Password'...",
  "confidence": 0.92,
  "sources": [
    {
      "document": "user_guide.pdf",
      "score": 0.92
    },
    {
      "document": "faq.pdf",
      "score": 0.85
    }
  ]
}
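The sources array can be surfaced to users for attribution. A small sketch (formatSources is a hypothetical helper, not part of RAGService; field names match the response structure above):

```php
// Turn the sources array into a human-readable attribution line,
// rendering each similarity score as a percentage.
function formatSources(array $sources): string
{
    $parts = array_map(
        fn($s) => sprintf('%s (%.0f%%)', $s['document'], $s['score'] * 100),
        $sources
    );
    return 'Sources: ' . implode(', ', $parts);
}
```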

Error Handling

try {
    $result = $rag->generateResponse($messageData['text'], $systemPrompt);
    
    if ($result['response'] && $result['confidence'] >= 0.7) {
        // Handle high-confidence response
    } else {
        // Handle low-confidence response
    }
} catch (\Exception $e) {
    if (strpos($e->getMessage(), 'INSUFFICIENT_FUNDS') !== false) {
        // Handle OpenAI quota exceeded
        $db->query(
            "INSERT INTO settings (setting_key, setting_value) 
             VALUES ('openai_status', 'insufficient_funds') 
             ON DUPLICATE KEY UPDATE setting_value = 'insufficient_funds'"
        );
    }
    
    $logger->error('RAG Error: ' . $e->getMessage());
}

Configuration

Configure RAG behavior in config/config.php:
config/config.php
return [
    'rag' => [
        'similarity_method' => 'cosine',  // or 'euclidean'
        'similarity_threshold' => 0.7,
        'top_k' => 3,
        'chunk_size' => 500,
        'chunk_overlap' => 50
    ]
];
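The similarity_method option selects the scoring function. For reference, cosine similarity is the normalized dot product of two vectors; the implementation below is illustrative, not the service's own code:

```php
// Reference cosine similarity (an illustration of the 'cosine' option;
// the service's actual vector math may differ). Returns 1.0 for
// identical directions, 0.0 for orthogonal vectors.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $v) {
        $dot   += $v * $b[$i];
        $normA += $v * $v;
        $normB += $b[$i] * $b[$i];
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}
```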

Best Practices

Top-K selection:
  • Small knowledge base (<10 docs): topK = 3
  • Medium knowledge base (10-50 docs): topK = 5
  • Large knowledge base (>50 docs): topK = 7-10

Similarity threshold:
  • Strict accuracy (70-85%): threshold = 0.75
  • Balanced (60-75%): threshold = 0.7 (default)
  • Broader recall (50-70%): threshold = 0.6

Chunk sizing:
  • Technical docs: 300-500 words with 50-word overlap
  • Conversational content: 500-700 words with 100-word overlap
  • Short FAQs: 200-300 words with 30-word overlap
Always pass the $db parameter to enable embedding caching:
$rag = new RAGService($openai, $vectorSearch, $logger, 3, 0.7, $db);

Related Services

OpenAI Service

Handles embeddings and chat completions

Vector Search

Performs similarity search on embeddings

Next Steps

Index Documents

Learn how to upload and index documents

Tune Parameters

Optimize RAG performance for your use case
