Overview

The RAGService is the core component that implements Retrieval-Augmented Generation (RAG). It combines vector search with OpenAI’s language models to provide context-aware, accurate responses based on your indexed documents.

Class Structure

Constructor

public function __construct(
    OpenAIService $openai,
    VectorSearchService $vectorSearch,
    Logger $logger,
    $topK = 3,
    $threshold = 0.7,
    Database $db = null
)
  • openai (OpenAIService, required): OpenAI service instance for embeddings and chat completions
  • vectorSearch (VectorSearchService, required): Vector search service for finding similar document chunks
  • logger (Logger, required): Logger instance for tracking operations
  • topK (int, default: 3): Number of most similar chunks to retrieve
  • threshold (float, default: 0.7): Minimum similarity score (0.0-1.0) to include a chunk
  • db (Database, default: null): Optional database connection for embedding caching

Instantiation Example

webhook.php
$vectorSearch = new VectorSearchService(
    $db,
    Config::get('rag.similarity_method')
);

$rag = new RAGService(
    $openai,
    $vectorSearch,
    $logger,
    3,      // Top 3 results
    0.7,    // 70% similarity threshold
    $db     // Enable caching
);

Core Methods

generateResponse()

Generates an AI response using relevant document context.
public function generateResponse(
    $userMessage,
    $systemPrompt = null,
    $conversationHistory = [],
    $temperature = 0.7,
    $maxTokens = 500
)
  • userMessage (string, required): The user’s query or message
  • systemPrompt (string, default: null): Custom system prompt to guide AI behavior
  • conversationHistory (array, default: []): Previous messages for context (format: [['sender' => 'user|bot', 'message_text' => '...']])
  • temperature (float, default: 0.7): Creativity level (0.0-1.0)
  • maxTokens (int, default: 500): Maximum response length
Returns: Array containing:
  • response (string|null): Generated AI response
  • context (string): Retrieved document context
  • confidence (float): Highest similarity score
  • sources (array): Source documents with scores

Usage Example

webhook.php
$result = $rag->generateResponse(
    $messageData['text'],
    $systemPrompt,
    $conversationHistory,
    $openaiTemperature,
    $openaiMaxTokens
);

if ($result['response'] && $result['confidence'] >= 0.7) {
    $whatsapp->sendMessage(
        $conversation['phone_number'],
        $result['response']
    );
    
    $conversationService->addMessage(
        $conversation['id'],
        'bot',
        $result['response'],
        null,
        $result['context'],
        $result['confidence']
    );
}

indexDocument()

Indexes a document by splitting it into chunks and generating embeddings.
public function indexDocument(
    $documentId,
    $text,
    $chunkSize = 500,
    $overlap = 50
)
  • documentId (int, required): Database ID of the document being indexed
  • text (string, required): Full text content to index
  • chunkSize (int, default: 500): Number of words per chunk
  • overlap (int, default: 50): Number of overlapping words between chunks
Returns: Number of chunks successfully indexed

Usage Example

$documentText = TextProcessor::extractText($filePath, 'pdf');
$chunksIndexed = $rag->indexDocument(
    $documentId,
    $documentText,
    500,  // 500 words per chunk
    50    // 50 word overlap
);

$logger->info("Indexed {$chunksIndexed} chunks");
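The windowing behavior behind chunkSize and overlap can be sketched in pure PHP. This is an illustration of the semantics only, not RAGService's actual implementation:

```php
// Illustrative word-based chunker (an assumption, not RAGService's
// internals): slide a window of $chunkSize words, stepping forward by
// $chunkSize - $overlap so consecutive chunks share $overlap words.
function chunkWords(string $text, int $chunkSize = 500, int $overlap = 50): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $step = $chunkSize - $overlap;

    for ($i = 0; $i < count($words); $i += $step) {
        $chunks[] = implode(' ', array_slice($words, $i, $chunkSize));
        if ($i + $chunkSize >= count($words)) {
            break; // last window already covers the tail
        }
    }

    return $chunks;
}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.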

RAG Pipeline Algorithm

The generateResponse() method implements a four-step RAG pipeline:

1. Query Embedding: converts the user’s message into a vector embedding, using the cache when available
$queryEmbedding = $this->getCachedOrCreateEmbedding($userMessage);

2. Vector Search: searches for the most similar document chunks
$similarChunks = $this->vectorSearch->searchSimilar(
    $queryEmbedding,
    $this->topK,
    $this->threshold
);

3. Context Assembly: combines the retrieved chunks into a single context string
foreach ($similarChunks as $chunk) {
    $contextParts[] = $chunk['chunk_text'];
    $sources[] = [
        'document' => $chunk['original_name'],
        'score' => $chunk['score']
    ];
    $maxScore = max($maxScore, $chunk['score']);
}
$context = implode("\n\n", $contextParts);

4. Response Generation: generates the AI response with the context and conversation history
$response = $this->openai->generateResponse(
    $userMessage,
    $context,
    $systemPrompt,
    $temperature,
    $maxTokens,
    $conversationHistory
);
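The retrieval contract behind steps 2 and 3 can be illustrated in plain PHP. This sketch shows only the expected semantics (threshold filter, topK cap, context join), not the service's database-backed implementation:

```php
// Illustrative sketch (an assumption, not the service's code): keep chunks
// scoring at or above the threshold, take the topK best, then join their
// text into one context string and report the highest score as confidence.
function assembleContext(array $scoredChunks, int $topK, float $threshold): array
{
    $kept = array_values(array_filter(
        $scoredChunks,
        fn($c) => $c['score'] >= $threshold
    ));
    usort($kept, fn($a, $b) => $b['score'] <=> $a['score']);
    $kept = array_slice($kept, 0, $topK);

    return [
        'context'    => implode("\n\n", array_column($kept, 'chunk_text')),
        'confidence' => $kept ? max(array_column($kept, 'score')) : 0.0,
    ];
}
```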

Embedding Caching

The service caches query embeddings in the database to reduce API costs and improve response times.

Cache Strategy

$normalized = trim(mb_strtolower($userMessage));
$queryHash = md5($normalized);

$cached = $this->db->fetchOne(
    'SELECT embedding FROM query_embedding_cache 
     WHERE query_hash = :hash 
     AND created_at > DATE_SUB(NOW(), INTERVAL 24 HOUR)',
    [':hash' => $queryHash]
);

if ($cached && !empty($cached['embedding'])) {
    $this->db->query(
        'UPDATE query_embedding_cache 
         SET last_used_at = NOW(), hit_count = hit_count + 1 
         WHERE query_hash = :hash',
        [':hash' => $queryHash]
    );
    return VectorMath::unserializeVector($cached['embedding']);
}
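Because lookups normalize case and surrounding whitespace before hashing, superficially different queries resolve to the same cache entry:

```php
// Cache key derivation as used above (extracted into a helper here
// purely for illustration; RAGService inlines these calls).
function cacheKey(string $query): string
{
    return md5(trim(mb_strtolower($query)));
}
```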

Cache Benefits

Cost Reduction

Avoids redundant embedding API calls for repeated queries

Performance

Instant embedding retrieval from database

Case Insensitive

Normalizes queries to maximize cache hits

Auto Cleanup

Removes entries unused for 7 days

Response Structure

No Context Found

if (empty($similarChunks)) {
    return [
        'response' => null,
        'context' => '',
        'confidence' => 0.0,
        'sources' => []
    ];
}
When no relevant context is found, the bot falls back to calling the OpenAI service without RAG context, or sends a fallback message.

Successful Response

{
  "response": "Based on our documentation, you can reset your password by...",
  "context": "Password Reset\n\nTo reset your password: 1. Click 'Forgot Password'...",
  "confidence": 0.92,
  "sources": [
    {
      "document": "user_guide.pdf",
      "score": 0.92
    },
    {
      "document": "faq.pdf",
      "score": 0.85
    }
  ]
}
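The sources array can be surfaced to users for attribution. A small sketch (formatSources is a hypothetical helper, not part of RAGService; field names match the response structure above):

```php
// Turn the sources array into a human-readable attribution line,
// rendering each similarity score as a percentage.
function formatSources(array $sources): string
{
    $parts = array_map(
        fn($s) => sprintf('%s (%.0f%%)', $s['document'], $s['score'] * 100),
        $sources
    );
    return 'Sources: ' . implode(', ', $parts);
}
```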

Error Handling

try {
    $result = $rag->generateResponse($messageData['text'], $systemPrompt);
    
    if ($result['response'] && $result['confidence'] >= 0.7) {
        // Handle high-confidence response
    } else {
        // Handle low-confidence response
    }
} catch (\Exception $e) {
    if (strpos($e->getMessage(), 'INSUFFICIENT_FUNDS') !== false) {
        // Handle OpenAI quota exceeded
        $db->query(
            "INSERT INTO settings (setting_key, setting_value) 
             VALUES ('openai_status', 'insufficient_funds') 
             ON DUPLICATE KEY UPDATE setting_value = 'insufficient_funds'"
        );
    }
    
    $logger->error('RAG Error: ' . $e->getMessage());
}

Configuration

Configure RAG behavior in config/config.php:
config/config.php
return [
    'rag' => [
        'similarity_method' => 'cosine',  // or 'euclidean'
        'similarity_threshold' => 0.7,
        'top_k' => 3,
        'chunk_size' => 500,
        'chunk_overlap' => 50
    ]
];
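The similarity_method option selects the scoring function. For reference, cosine similarity is the normalized dot product of two vectors; the implementation below is illustrative, not the service's own code:

```php
// Reference cosine similarity (an illustration of the 'cosine' option;
// the service's actual vector math may differ). Returns 1.0 for
// identical directions, 0.0 for orthogonal vectors.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $v) {
        $dot   += $v * $b[$i];
        $normA += $v * $v;
        $normB += $b[$i] * $b[$i];
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}
```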

Best Practices

Top-K selection:
  • Small knowledge base (<10 docs): topK = 3
  • Medium knowledge base (10-50 docs): topK = 5
  • Large knowledge base (>50 docs): topK = 7-10

Similarity threshold:
  • Strict accuracy (70-85%): threshold = 0.75
  • Balanced (60-75%): threshold = 0.7 (default)
  • Broader recall (50-70%): threshold = 0.6

Chunk sizing:
  • Technical docs: 300-500 words with 50-word overlap
  • Conversational content: 500-700 words with 100-word overlap
  • Short FAQs: 200-300 words with 30-word overlap
Always pass the $db parameter to enable embedding caching:
$rag = new RAGService($openai, $vectorSearch, $logger, 3, 0.7, $db);

Related Services

OpenAI Service

Handles embeddings and chat completions

Vector Search

Performs similarity search on embeddings

Next Steps

Index Documents

Learn how to upload and index documents

Tune Parameters

Optimize RAG performance for your use case
