Overview
The RAGService is the core component that implements Retrieval-Augmented Generation (RAG). It combines vector search with OpenAI’s language models to provide context-aware, accurate responses based on your indexed documents.
Class Structure
Constructor
public function __construct(
    OpenAIService $openai,
    VectorSearchService $vectorSearch,
    Logger $logger,
    $topK = 3,
    $threshold = 0.7,
    Database $db = null
)
openai (OpenAIService, required): OpenAI service instance for embeddings and chat completions
vectorSearch (VectorSearchService, required): Vector search service for finding similar document chunks
logger (Logger, required): Logger instance for tracking operations
topK (int, default 3): Number of most similar chunks to retrieve
threshold (float, default 0.7): Minimum similarity score (0.0-1.0) to include a chunk
db (Database, default null): Optional database connection for embedding caching
Instantiation Example
$vectorSearch = new VectorSearchService(
    $db,
    Config::get('rag.similarity_method')
);

$rag = new RAGService(
    $openai,
    $vectorSearch,
    $logger,
    3,    // Top 3 results
    0.7,  // 70% similarity threshold
    $db   // Enable caching
);
Core Methods
generateResponse()
Generates an AI response using relevant document context.
public function generateResponse(
    $userMessage,
    $systemPrompt = null,
    $conversationHistory = [],
    $temperature = 0.7,
    $maxTokens = 500
)
userMessage (string, required): The user's query or message
systemPrompt (string|null): Custom system prompt to guide AI behavior
conversationHistory (array): Previous messages for context (format: [['sender' => 'user|bot', 'message_text' => '...']])
temperature (float, default 0.7): Creativity level (0.0-1.0)
maxTokens (int, default 500): Maximum number of tokens in the generated response
Returns: Array containing:
response (string|null): Generated AI response
context (string): Retrieved document context
confidence (float): Highest similarity score
sources (array): Source documents with scores
Usage Example
$result = $rag->generateResponse(
    $messageData['text'],
    $systemPrompt,
    $conversationHistory,
    $openaiTemperature,
    $openaiMaxTokens
);

if ($result['response'] && $result['confidence'] >= 0.7) {
    $whatsapp->sendMessage(
        $conversation['phone_number'],
        $result['response']
    );
    $conversationService->addMessage(
        $conversation['id'],
        'bot',
        $result['response'],
        null,
        $result['context'],
        $result['confidence']
    );
}
indexDocument()
Indexes a document by splitting it into chunks and generating embeddings.
public function indexDocument(
    $documentId,
    $text,
    $chunkSize = 500,
    $overlap = 50
)
documentId (int, required): Database ID of the document being indexed
text (string, required): Full text content to index
chunkSize (int, default 500): Number of words per chunk
overlap (int, default 50): Number of overlapping words between chunks
Returns: Number of chunks successfully indexed
Usage Example
$documentText = TextProcessor::extractText($filePath, 'pdf');

$chunksIndexed = $rag->indexDocument(
    $documentId,
    $documentText,
    500, // 500 words per chunk
    50   // 50 word overlap
);

$logger->info("Indexed {$chunksIndexed} chunks");
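To make the chunking parameters concrete, here is a minimal standalone sketch of word-based chunking with overlap. This is an illustration of the technique, not the service's actual internals; the function name chunkWords is assumed.

```php
// Illustrative sketch (not RAGService's real implementation): split text
// into chunks of $chunkSize words, each sharing $overlap words with the
// previous chunk so context is not lost at chunk boundaries.
function chunkWords(string $text, int $chunkSize = 500, int $overlap = 50): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    // Each new chunk starts (chunkSize - overlap) words after the previous one.
    $step = max(1, $chunkSize - $overlap);
    for ($i = 0; $i < count($words); $i += $step) {
        $chunks[] = implode(' ', array_slice($words, $i, $chunkSize));
        if ($i + $chunkSize >= count($words)) {
            break; // final chunk reached
        }
    }
    return $chunks;
}
```

With the defaults (500 words, 50 overlap), a 1000-word document yields three chunks, the last two each repeating the previous chunk's final 50 words.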
RAG Pipeline Algorithm
The generateResponse() method implements a four-step RAG pipeline:
Query Embedding
Converts the user's message into a vector embedding, using the cache if available:

$queryEmbedding = $this->getCachedOrCreateEmbedding($userMessage);
Vector Search
Searches for the most similar document chunks:

$similarChunks = $this->vectorSearch->searchSimilar(
    $queryEmbedding,
    $this->topK,
    $this->threshold
);
Context Assembly
Combines retrieved chunks into a context string:

foreach ($similarChunks as $chunk) {
    $contextParts[] = $chunk['chunk_text'];
    $sources[] = [
        'document' => $chunk['original_name'],
        'score' => $chunk['score']
    ];
    $maxScore = max($maxScore, $chunk['score']);
}

$context = implode("\n\n", $contextParts);
Response Generation
Generates the AI response with context and conversation history:

$response = $this->openai->generateResponse(
    $userMessage,
    $context,
    $systemPrompt,
    $temperature,
    $maxTokens,
    $conversationHistory
);
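The similarity scores used in the vector search step come from the configured similarity method. With the default 'cosine' method, each chunk's score can be computed as in this minimal sketch; it illustrates the measure itself and is not the VectorSearchService internals.

```php
// Cosine similarity between two equal-length vectors:
// score = (a . b) / (|a| * |b|), ranging from -1 to 1
// (1.0 means identical direction; chunks below the threshold are discarded).
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $v) {
        $dot   += $v * $b[$i];
        $normA += $v * $v;
        $normB += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}
```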
Embedding Caching
The service implements intelligent embedding caching to reduce API costs and improve response times.
Cache Strategy
Cache Lookup

$normalized = trim(mb_strtolower($userMessage));
$queryHash = md5($normalized);

$cached = $this->db->fetchOne(
    'SELECT embedding FROM query_embedding_cache
     WHERE query_hash = :hash
       AND created_at > DATE_SUB(NOW(), INTERVAL 24 HOUR)',
    [':hash' => $queryHash]
);

if ($cached && !empty($cached['embedding'])) {
    $this->db->query(
        'UPDATE query_embedding_cache
         SET last_used_at = NOW(), hit_count = hit_count + 1
         WHERE query_hash = :hash',
        [':hash' => $queryHash]
    );
    return VectorMath::unserializeVector($cached['embedding']);
}

Cache Storage

$embedding = $this->openai->createEmbedding($userMessage);
$binaryEmbedding = VectorMath::serializeVector($embedding);

$this->db->query(
    'INSERT INTO query_embedding_cache
        (query_hash, embedding, created_at, last_used_at, hit_count)
     VALUES (:hash, :embedding, NOW(), NOW(), 0)
     ON DUPLICATE KEY UPDATE
        embedding = :embedding2,
        created_at = NOW(),
        last_used_at = NOW()',
    [
        ':hash' => $queryHash,
        ':embedding' => $binaryEmbedding,
        ':embedding2' => $binaryEmbedding
    ]
);

Cache Cleanup

private function cleanExpiredCache()
{
    try {
        $this->db->query(
            'DELETE FROM query_embedding_cache
             WHERE last_used_at < DATE_SUB(NOW(), INTERVAL 7 DAY)'
        );
    } catch (\Exception $e) {
        $this->logger->warning('RAG: Cache cleanup failed');
    }
}
Cache Benefits
Cost Reduction: Avoids redundant embedding API calls for repeated queries
Performance: Instant embedding retrieval from the database
Case Insensitive: Normalizes queries to maximize cache hits
Auto Cleanup: Removes entries unused for 7 days
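The effect of normalization on cache hits is easy to see in isolation. The sketch below reproduces only the cache-key computation shown in the lookup code above (trim, lowercase, md5); the helper name cacheKey is assumed.

```php
// Cache-key derivation as used in the lookup step: case-folding plus
// trimming means trivially different phrasings of the same query
// map to the same hash and therefore hit the same cache row.
function cacheKey(string $query): string
{
    return md5(trim(mb_strtolower($query)));
}
```

For example, "How do I reset my password?" and "  HOW do I reset my password?  " produce identical keys, so the second query is served from cache.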
Response Structure
No Context Found
if (empty($similarChunks)) {
    return [
        'response' => null,
        'context' => '',
        'confidence' => 0.0,
        'sources' => []
    ];
}
When no relevant context is found, the bot falls back to using OpenAI Service without RAG context, or displays a fallback message.
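That routing decision can be sketched as a small pure function; this is an illustration of the pattern described above, not part of RAGService itself, and the 'rag'/'fallback' labels are placeholders for whatever branches your application takes.

```php
// Route a generateResponse() result: use the RAG answer only when a
// response was generated AND its confidence clears the threshold;
// otherwise fall back (e.g. to plain OpenAI or a canned message).
function routeResult(array $result, float $minConfidence = 0.7): string
{
    if (!empty($result['response']) && $result['confidence'] >= $minConfidence) {
        return 'rag';      // high-confidence, context-backed answer
    }
    return 'fallback';     // no context found or confidence too low
}
```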
Successful Response
{
    "response": "Based on our documentation, you can reset your password by...",
    "context": "Password Reset\n\nTo reset your password: 1. Click 'Forgot Password'...",
    "confidence": 0.92,
    "sources": [
        {
            "document": "user_guide.pdf",
            "score": 0.92
        },
        {
            "document": "faq.pdf",
            "score": 0.85
        }
    ]
}
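If you want to surface attribution to end users, the sources array can be collapsed into a short string. A minimal sketch (the formatSources helper is an assumption, not part of the service):

```php
// Turn the 'sources' array from generateResponse() into a one-line
// attribution string, e.g. "Sources: user_guide.pdf (92%), faq.pdf (85%)".
function formatSources(array $sources): string
{
    $parts = array_map(
        fn(array $s) => sprintf('%s (%d%%)', $s['document'], round($s['score'] * 100)),
        $sources
    );
    return 'Sources: ' . implode(', ', $parts);
}
```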
Error Handling
try {
    $result = $rag->generateResponse($messageData['text'], $systemPrompt);

    if ($result['response'] && $result['confidence'] >= 0.7) {
        // Handle high-confidence response
    } else {
        // Handle low-confidence response
    }
} catch (\Exception $e) {
    if (strpos($e->getMessage(), 'INSUFFICIENT_FUNDS') !== false) {
        // Handle OpenAI quota exceeded
        $db->query(
            "INSERT INTO settings (setting_key, setting_value)
             VALUES ('openai_status', 'insufficient_funds')
             ON DUPLICATE KEY UPDATE setting_value = 'insufficient_funds'"
        );
    }
    $logger->error('RAG Error: ' . $e->getMessage());
}
Configuration
Configure RAG behavior in config/config.php:
return [
    'rag' => [
        'similarity_method' => 'cosine', // or 'euclidean'
        'similarity_threshold' => 0.7,
        'top_k' => 3,
        'chunk_size' => 500,
        'chunk_overlap' => 50
    ]
];
Best Practices
Adjust topK Based on Document Count
Small knowledge base (<10 docs): Use topK = 3
Medium knowledge base (10-50 docs): Use topK = 5
Large knowledge base (>50 docs): Use topK = 7-10
Set Appropriate Thresholds
Strict accuracy (70-85%): threshold = 0.75
Balanced (60-75%): threshold = 0.7 (default)
Broader recall (50-70%): threshold = 0.6
Choose Chunk Sizes by Content Type

Technical docs: 300-500 words with 50 word overlap
Conversational content: 500-700 words with 100 word overlap
Short FAQs: 200-300 words with 30 word overlap
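When tuning chunk size and overlap it helps to estimate how many chunks, and therefore how many embedding API calls, a document will produce. Since each chunk advances by (chunkSize - overlap) words, a rough estimate is ceil((words - overlap) / (chunkSize - overlap)). A hedged sketch (the helper name is assumed):

```php
// Rough chunk-count estimate for indexDocument(): a document of $words
// words advances (chunkSize - overlap) words per chunk, giving about
// ceil((words - overlap) / (chunkSize - overlap)) chunks.
function estimateChunkCount(int $words, int $chunkSize = 500, int $overlap = 50): int
{
    if ($words <= $chunkSize) {
        return 1; // everything fits in a single chunk
    }
    return (int) ceil(($words - $overlap) / ($chunkSize - $overlap));
}
```

For example, with the defaults a 1000-word document produces about 3 chunks, so doubling the overlap roughly increases indexing cost for large documents by the same ratio that it shrinks the step size.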
Enable Embedding Caching

Always pass the $db parameter to enable embedding caching:

$rag = new RAGService($openai, $vectorSearch, $logger, 3, 0.7, $db);
Related Services

OpenAI Service: Handles embeddings and chat completions
Vector Search: Performs similarity search on embeddings
Next Steps
Index Documents: Learn how to upload and index documents
Tune Parameters: Optimize RAG performance for your use case