Model Architecture
Sentinel AI’s AI stack consists of:

- Language Model: GPT-4o for reasoning, planning, and decision-making
- Embedding Model: text-embedding-3-small for document vectorization
- Vector Database: Pinecone for semantic search and knowledge retrieval
- Reranker: Cohere Rerank for improving search relevance
Primary Language Model
The main reasoning engine uses OpenAI’s GPT-4o model.

Model Parameters

The OpenAI model to use for reasoning and text generation.

Supported models:

- gpt-4o - Latest GPT-4 optimized model (recommended)
- gpt-4-turbo - Fast GPT-4 variant
- gpt-4 - Standard GPT-4
- gpt-3.5-turbo - Faster, cheaper alternative
Defined in: src/core/config.py:11

Sampling temperature for response generation.

- 0 - Deterministic, consistent responses (recommended for DevOps)
- 0.0-0.3 - Focused, predictable outputs
- 0.4-0.7 - Balanced creativity and consistency
- 0.8-1.0 - More creative, varied responses

Defined in: src/core/config.py:12

Temperature is set to 0 for deterministic DevOps operations. Increase it for more creative problem-solving.

Customizing the Language Model
To use a different model, modify src/core/config.py:
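A hedged sketch of what that change might look like — the actual structure of src/core/config.py and the attribute names below are assumptions, not the project's real code:

```python
# Hypothetical shape of the LLM settings in src/core/config.py;
# the real module may use plain constants or pydantic settings instead.
from dataclasses import dataclass


@dataclass
class LLMSettings:
    model: str = "gpt-4o"     # swap for "gpt-4-turbo", "gpt-4", or "gpt-3.5-turbo"
    temperature: float = 0.0  # keep at 0 for deterministic DevOps operations


# Example: switch to the cheaper model for lower-stakes workloads
settings = LLMSettings(model="gpt-3.5-turbo")
```

Whichever model you choose, keep temperature at 0 unless you specifically want more varied outputs.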
Embedding Model
Embeddings convert text into vector representations for semantic search.

Embedding Parameters

OpenAI embedding model for document vectorization.

Available models:

- text-embedding-3-small - 1536 dimensions, fast and efficient (recommended)
- text-embedding-3-large - 3072 dimensions, higher quality
- text-embedding-ada-002 - Legacy model, 1536 dimensions

Defined in: src/core/config.py:19-20

Dimension size for embeddings. Must match the model’s output dimensions.

- text-embedding-3-small: 1536 dims
- text-embedding-3-large: 3072 dims
- text-embedding-ada-002: 1536 dims
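To keep the index dimension in sync with the chosen model, the pairs above can be captured as a lookup. The helper below is hypothetical, for illustration only — it is not part of the project:

```python
# Embedding model -> output dimension, per the list above.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def dimension_for(model: str) -> int:
    """Return the dimension the vector index must be created with."""
    try:
        return EMBEDDING_DIMS[model]
    except KeyError:
        raise ValueError(f"Unknown embedding model: {model}")
```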
Defined in: src/core/config.py:21

Switching Embedding Models

To use a different embedding model, update src/core/config.py with the new model name and its matching dimension.

Vector Database (Pinecone)

Pinecone stores document embeddings for semantic search.

Pinecone Configuration
Name of the Pinecone index for storing document vectors.

Defined in: src/core/config.py:15-16

Change this if you want to use a different index or environment.

Index Configuration
The Pinecone index is configured with:

- Dimension: 1536 (matches text-embedding-3-small)
- Metric: Cosine similarity
- Cloud: AWS
- Region: us-east-1
- Type: Serverless (auto-scaling)
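The settings above correspond roughly to an index-creation call like the following, assuming the current pinecone Python client (v3+ API); the index name and API key are placeholders:

```python
# Sketch of serverless index creation matching the configuration above.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
pc.create_index(
    name="sentinel-knowledge",  # placeholder; use the name from src/core/config.py
    dimension=1536,             # must match text-embedding-3-small
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```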
Customize Index Configuration
Modify src/core/knowledge.py:27-36 to change index settings.

Reranker (Cohere)
Cohere Rerank improves search relevance by reranking retrieved documents.

Reranker Configuration
Number of documents to return after reranking.

Defined in: src/core/knowledge.py:23

- Lower values (3-5): Faster, more focused results
- Higher values (10-15): More context, slower processing
Customizing Reranker
Modify src/core/knowledge.py to adjust reranking behavior:
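As an illustration of the reranking call, here is a sketch assuming the official cohere Python client; the model name and function shape are assumptions, and the project's actual code in src/core/knowledge.py may differ:

```python
# Sketch of a Cohere rerank step over retrieved documents.
import cohere

co = cohere.Client("YOUR_COHERE_API_KEY")


def rerank(query: str, documents: list[str], top_n: int = 5) -> list[str]:
    """Return the top_n documents ordered by Cohere relevance score."""
    response = co.rerank(
        model="rerank-english-v3.0",  # assumed model; the project may pin another
        query=query,
        documents=documents,
        top_n=top_n,
    )
    return [documents[r.index] for r in response.results]
```

Lowering top_n here trades context for speed, as described above.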
RAG Configuration
Retrieval-Augmented Generation (RAG) combines vector search with LLM reasoning.

Document Chunking

Documents are split into chunks for efficient retrieval.

Maximum number of characters per chunk.
- Smaller chunks (512-1024): More precise, faster search
- Larger chunks (2048-4096): More context, slower search
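A minimal character-based chunker showing how chunk size and overlap interact; the values are illustrative, and the project's real splitter in src/core/knowledge.py may use a library implementation instead:

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks with overlapping edges.

    Overlap repeats the tail of each chunk at the head of the next,
    so sentences spanning a boundary are not lost.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```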
Defined in: src/core/knowledge.py:62

Number of overlapping characters between chunks. Prevents information loss at chunk boundaries.

Defined in: src/core/knowledge.py:62

Query Optimization
Sentinel AI uses query rewriting to improve search accuracy: it generates 5 variations of each question to improve recall. Results are combined and reranked for relevance.
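The fan-out-and-merge pattern can be sketched as follows; rewrite_query and search are stand-ins for the real LLM rewriter and Pinecone search in src/core/knowledge.py:

```python
def expand_queries(question: str, rewrite_query) -> list[str]:
    """The original question plus 5 LLM-generated rephrasings."""
    return [question] + rewrite_query(question, n=5)


def retrieve(question: str, rewrite_query, search, per_query: int = 5) -> list[str]:
    """Search each query variation and merge results, de-duplicating."""
    seen, merged = set(), []
    for q in expand_queries(question, rewrite_query):
        for doc in search(q, top_k=per_query):
            if doc not in seen:  # same doc often surfaces for several variations
                seen.add(doc)
                merged.append(doc)
    return merged  # up to 6 x per_query docs, reranked downstream
```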
Retrieval Settings
Number of documents to retrieve per query from Pinecone.

Defined in: src/core/knowledge.py:126

With 6 query variations (the original question plus 5 rewrites) × 5 docs each = up to 30 documents retrieved, then reranked to the top 5.

LlamaParse Configuration
LlamaParse converts PDF documentation to markdown.

Parser Parameters

- result_type: Output format (markdown or text)
- verbose: Enable detailed logging
- language: Primary document language (en, es, etc.)
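Putting those parameters together, a setup sketch assuming the llama_parse package; the input path is a placeholder:

```python
# Sketch of LlamaParse setup with the parameters listed above.
from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",  # or "text"
    verbose=True,            # detailed parsing logs
    language="en",           # primary document language
)
documents = parser.load_data("docs/runbook.pdf")  # placeholder PDF path
```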
Customize PDF Parsing
Modify src/core/knowledge.py to adjust parsing behavior.

Response Generation
The final response is generated using a structured prompt. The system prompt enforces strict adherence to source material, preventing hallucinations and ensuring accurate technical responses.
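An illustrative system prompt and message builder in that spirit — the project's actual prompt wording is not shown in this document, so the text below is an assumption:

```python
# Hypothetical grounded-answer prompt; the real wording lives in the project.
SYSTEM_PROMPT = (
    "You are Sentinel AI, a DevOps assistant. Answer ONLY from the provided "
    "context. If the context does not contain the answer, say you don't know. "
    "Never invent commands, paths, or configuration values."
)


def build_messages(context: str, question: str) -> list[dict]:
    """Assemble chat messages with retrieved context ahead of the question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
```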
Performance Tuning
Optimize for Speed
Optimize for Quality
Optimize for Cost
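The three profiles above could translate into concrete settings along these lines — illustrative values, not project defaults; note that text-embedding-3-large would also require a 3072-dimension index:

```python
# Illustrative tuning profiles combining the knobs discussed in this page.
PROFILES = {
    "speed":   {"model": "gpt-3.5-turbo", "embedding": "text-embedding-3-small", "rerank_top_n": 3},
    "quality": {"model": "gpt-4o",        "embedding": "text-embedding-3-large", "rerank_top_n": 10},
    "cost":    {"model": "gpt-3.5-turbo", "embedding": "text-embedding-3-small", "rerank_top_n": 3},
}
```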
Monitoring and Debugging
Enable Verbose Logging
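A standard-library way to turn on verbose logging for the pipeline; the logger name src.core.knowledge is an assumption about how the project names its loggers:

```python
# Enable DEBUG-level logging for the RAG pipeline.
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logging.getLogger("src.core.knowledge").setLevel(logging.DEBUG)
```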
Test RAG Pipeline
Cost Estimation
Estimate API costs for different configurations:

| Component | Model | Input Cost | Output Cost | Notes |
|---|---|---|---|---|
| LLM | gpt-4o | $2.50/1M tokens | $10/1M tokens | Main reasoning |
| LLM | gpt-3.5-turbo | $0.50/1M tokens | $1.50/1M tokens | Budget option |
| Embeddings | text-embedding-3-small | $0.02/1M tokens | - | Document vectorization |
| Embeddings | text-embedding-3-large | $0.13/1M tokens | - | Higher quality |
| Rerank | Cohere Rerank | $1.00/1000 searches | - | Per query |
| Vector DB | Pinecone Serverless | $0.10/1M reads | $2.00/1M writes | Storage + queries |
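A quick back-of-the-envelope calculation using the LLM prices from the table:

```python
# LLM prices from the table above, in USD per 1M tokens.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
}


def llm_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated LLM spend in USD for the given token volumes."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Example: 2M input + 0.5M output tokens on gpt-4o
monthly = llm_cost("gpt-4o", 2_000_000, 500_000)  # 2 * 2.50 + 0.5 * 10.00 = 10.0 USD
```

The same shape extends to embeddings and reranking if you track document and query volumes.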
Next Steps

- Environment Variables: Configure API keys and system settings
- Services Configuration: Define services to monitor
- Knowledge Base: Learn about ingesting documentation
- Agent Workflow: Understand how the agent uses these models