All providers implement the common `ILLMService` interface, ensuring consistent behavior regardless of which vendor you choose.
## Supported providers

The platform currently supports five LLM providers:

- OpenAI
- Anthropic
- Google Gemini
- Groq
- Azure
## OpenAI GPT

**Provider ID:** `OpenAIGPT`
**Implementation:** `OpenAIGPTStreamingLLMService.cs`

Supports the full GPT model family, including GPT-4o, GPT-4 Turbo, and the o-series reasoning models.

### Configuration fields
| Field | Type | Required | Description |
|---|---|---|---|
| `apiKey` | password | Yes | OpenAI API key from platform.openai.com |
| `endpoint` | text | Yes | API endpoint (default: https://api.openai.com/v1) |
| `model` | select | Yes | Model identifier (e.g., gpt-4o, gpt-4-turbo) |
| `temperature` | number | No | Sampling temperature (0.0-2.0) |
| `topP` | number | No | Nucleus sampling (0.0-1.0) |
| `maxTokens` | number | No | Max completion tokens (min: 200) |
| `serviceTier` | select | No | default, flex, or priority |
| `reasoningEffort` | select | No | For o-series models: minimal, low, medium, high |
| `reasoningSummary` | select | No | auto, concise, or detailed |
### Supported models
- GPT-4o - Multimodal flagship model
- GPT-4o mini - Cost-optimized variant
- GPT-4 Turbo - Previous generation high-performance
- o1 - Advanced reasoning model
- o1-mini - Faster reasoning variant
- o3-mini - Latest reasoning model
The `reasoningEffort` and `reasoningSummary` parameters apply only to o-series models. They control how much computational budget the model spends on chain-of-thought reasoning.

### Example configuration
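A minimal configuration might look like the following sketch. The field names come from the table above; the specific values (placeholder key, model choice, budgets) are illustrative, and the `provider` key is an assumption about how the platform identifies the service:

```json
{
  "provider": "OpenAIGPT",
  "apiKey": "sk-...",
  "endpoint": "https://api.openai.com/v1",
  "model": "gpt-4o",
  "temperature": 0.7,
  "topP": 1.0,
  "maxTokens": 800,
  "serviceTier": "default"
}
```

For an o-series model such as o1, you would also set `reasoningEffort` (e.g., `medium`) and optionally `reasoningSummary`.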
## Implementation details

All LLM providers follow a consistent implementation pattern.

### Interface contract
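The authoritative contract lives in the codebase; as a rough sketch, the members named in this page (`ProcessInputAsync`, `MessageStreamed`, `Cancel`) suggest a shape like the following, with the exact signatures being assumptions:

```csharp
// Sketch of the ILLMService contract. Member names come from this page;
// the parameter and event payload types are assumptions.
public interface ILLMService
{
    // Fires once per received token delta during streaming.
    event EventHandler<string> MessageStreamed;

    // Starts the streaming request against the provider.
    Task ProcessInputAsync(string input, CancellationToken cancellationToken = default);

    // Stops an in-flight stream.
    void Cancel();
}
```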
### Streaming architecture

All providers use server-sent events (SSE) for streaming:

- **Request initiation** - `ProcessInputAsync` starts the streaming request
- **Chunk reception** - the provider SDK receives token deltas
- **Event emission** - the `MessageStreamed` event fires for each chunk
- **Cancellation** - `Cancel()` stops the stream mid-flight
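Putting those four steps together, a consumer might look like this sketch. Only `ProcessInputAsync`, `MessageStreamed`, and `Cancel()` come from the documentation above; how you obtain the service instance is an assumption:

```csharp
// Illustrative consumer: subscribe to deltas, start the stream,
// and optionally cancel mid-flight.
ILLMService service = GetConfiguredService();   // hypothetical helper

service.MessageStreamed += (sender, chunk) =>
{
    Console.Write(chunk);   // render each token delta as it arrives
};

Task streaming = service.ProcessInputAsync("Summarize today's tickets.");

// If the user abandons the conversation, stop the stream mid-flight:
// service.Cancel();

await streaming;
```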
## Provider manager

The `LLMProviderManager` (defined in `IqraInfrastructure/Managers/LLM/LLMProviderManager.cs`) handles:

- **Provider registration** - auto-discovers implementations via reflection
- **Model catalog** - maintains available models per provider
- **Configuration validation** - ensures required fields are present
- **Instance creation** - instantiates provider services with credentials
- **Integration linking** - connects to `IntegrationsManager` for credential lookup
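The registration step can be sketched as follows, assuming each provider exposes the static `GetProviderTypeStatic()` marker method described under "Adding custom providers" below; the `registry` dictionary is a hypothetical stand-in for the manager's internal catalog:

```csharp
// Sketch of reflection-based auto-discovery; registry is hypothetical.
var registry = new Dictionary<object, Type>();

var providerTypes = typeof(LLMProviderManager).Assembly
    .GetTypes()
    .Where(t => !t.IsAbstract && typeof(ILLMService).IsAssignableFrom(t));

foreach (var type in providerTypes)
{
    var marker = type.GetMethod("GetProviderTypeStatic",
        BindingFlags.Public | BindingFlags.Static);
    if (marker is null) continue;            // skip types without the marker method

    var providerEnum = marker.Invoke(null, null);   // e.g. OpenAIGPT
    registry[providerEnum!] = type;                 // map enum value -> implementation
}
```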
## Configuration best practices

### Temperature tuning
- 0.0-0.3 - Deterministic, factual responses (customer support, data lookup)
- 0.4-0.7 - Balanced creativity (general conversation)
- 0.8-1.0 - Creative, varied responses (storytelling, brainstorming)
- >1.0 - Highly random (rarely useful in production)
### Token budgets
- Minimum 200 tokens - Enforced by Iqra AI to prevent truncated responses
- Voice use cases - Keep under 500 tokens for natural conversation pacing
- Complex reasoning - Allocate 2000+ tokens for o-series or Claude thinking modes
### Model selection

Weigh three priorities when choosing a model: latency-critical, quality-first, and cost-optimized.

Best options:

- Groq Llama 3.3 70B (fastest inference)
- GPT-4o mini (good balance)
- Gemini 2.0 Flash (multimodal + speed)
## Adding custom providers

To add a new LLM provider:

1. **Add an enum value** in `IqraCore/Entities/Interfaces/InterfaceLLMProviderEnum.cs`
2. **Implement the interface** in `IqraInfrastructure/Managers/LLM/Providers/`
3. **Add a static method** `GetProviderTypeStatic()` returning your enum value
4. **Handle streaming** using the provider's native SDK
5. **Restart the application** - the provider auto-registers on startup

See `OpenAIGPTStreamingLLMService.cs:26-65` for a reference implementation.
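Following those steps, a new provider class might start out like the skeleton below. Apart from `GetProviderTypeStatic()` and the enum file named above, every member here is an assumption; take the real `ILLMService` shape and streaming pattern from the reference implementation:

```csharp
// Hypothetical skeleton for a custom provider. Member signatures and the
// vendor SDK client are assumptions, not the platform's actual API.
public class MyVendorStreamingLLMService : ILLMService
{
    public event EventHandler<string>? MessageStreamed;

    private readonly CancellationTokenSource _cts = new();
    private readonly MyVendorSdkClient _sdk = new();   // hypothetical vendor SDK

    // Step 3: the marker method the manager discovers via reflection,
    // returning the enum value added in step 1.
    public static InterfaceLLMProviderEnum GetProviderTypeStatic()
        => InterfaceLLMProviderEnum.MyVendor;

    // Step 4: stream with the vendor's native SDK, raising an event per chunk.
    public async Task ProcessInputAsync(string input, CancellationToken ct = default)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(ct, _cts.Token);
        await foreach (var chunk in _sdk.StreamAsync(input, linked.Token))
            MessageStreamed?.Invoke(this, chunk);
    }

    public void Cancel() => _cts.Cancel();
}
```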
## Next steps

- **Configure agent prompts** - learn how to craft effective system prompts
- **Add voice output** - set up text-to-speech for conversations
- **Multi-language support** - configure parallel language contexts
- **Script builder** - build conversation flows in the visual IDE