Iqra AI supports multiple large language model (LLM) providers through a unified streaming interface. All providers implement the ILLMService interface, ensuring consistent behavior regardless of which vendor you choose.

Supported providers

The platform currently supports five LLM providers:

OpenAI GPT

Provider ID: OpenAIGPT
Implementation: OpenAIGPTStreamingLLMService.cs
Supports the full GPT model family including GPT-4o, GPT-4 Turbo, and o-series reasoning models.

Configuration fields

| Field | Type | Required | Description |
|---|---|---|---|
| apiKey | password | Yes | OpenAI API key from platform.openai.com |
| endpoint | text | Yes | API endpoint (default: https://api.openai.com/v1) |
| model | select | Yes | Model identifier (e.g., gpt-4o, gpt-4-turbo) |
| temperature | number | No | Sampling temperature (0.0-2.0) |
| topP | number | No | Nucleus sampling (0.0-1.0) |
| maxTokens | number | No | Max completion tokens (min: 200) |
| serviceTier | select | No | default, flex, or priority |
| reasoningEffort | select | No | For o-series models: minimal, low, medium, high |
| reasoningSummary | select | No | auto, concise, or detailed |
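The required/optional split and numeric ranges above lend themselves to a simple validation pass. The sketch below is illustrative only, not the actual Iqra AI implementation; the `validate_config` helper and its error strings are hypothetical, while the field names and constraints come from the table.

```python
# Hypothetical validation helper mirroring the OpenAIGPT field table.
# Field names and constraints come from the table above; the helper
# itself is illustrative, not Iqra AI's actual code.

REQUIRED_FIELDS = {"apiKey", "endpoint", "model"}
NUMERIC_RANGES = {
    "temperature": (0.0, 2.0),  # sampling temperature
    "topP": (0.0, 1.0),         # nucleus sampling
}

def validate_config(config: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = [f"missing required field: {f}"
              for f in sorted(REQUIRED_FIELDS - config.keys())]
    for field, (lo, hi) in NUMERIC_RANGES.items():
        if field in config and not (lo <= config[field] <= hi):
            errors.append(f"{field} must be between {lo} and {hi}")
    if config.get("maxTokens", 200) < 200:
        errors.append("maxTokens must be at least 200")
    return errors
```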

Supported models

  • GPT-4o - Multimodal flagship model
  • GPT-4o mini - Cost-optimized variant
  • GPT-4 Turbo - Previous generation high-performance
  • o1 - Advanced reasoning model
  • o1-mini - Faster reasoning variant
  • o3-mini - Latest reasoning model
The reasoningEffort and reasoningSummary parameters apply only to o-series models: reasoningEffort controls how much computational budget the model spends on chain-of-thought reasoning, while reasoningSummary controls how detailed the returned reasoning summary is.

Example configuration

{
  "model": "gpt-4o",
  "temperature": 0.7,
  "maxTokens": 1000,
  "serviceTier": "default"
}
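
For o-series models, the same structure applies with the reasoning fields instead of temperature. The values below are illustrative; field names follow the configuration table above.

```json
{
  "model": "o1",
  "maxTokens": 4000,
  "reasoningEffort": "medium",
  "reasoningSummary": "auto"
}
```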

Implementation details

All LLM providers follow a consistent implementation pattern:

Interface contract

public interface ILLMService
{
    // Streaming events
    event EventHandler<ConversationAgentEventLLMStreamed>? MessageStreamed;
    event EventHandler<ConversationAgentEventLLMStreamCancelled> MessageStreamedCancelled;
    
    // Core methods
    Task ProcessInputAsync(CancellationToken cancellationToken, 
                          string? beforeMessageContext = null, 
                          string? afterMessageContext = null);
    void AddUserMessage(string message);
    void SetSystemPrompt(string prompt);
    void Cancel();
}

Streaming architecture

All providers use server-sent events (SSE) for streaming:
  1. Request initiation - ProcessInputAsync starts the streaming request
  2. Chunk reception - Provider SDK receives token deltas
  3. Event emission - MessageStreamed event fires for each chunk
  4. Cancellation - Cancel() stops the stream mid-flight
This architecture keeps perceived latency low for voice applications: the agent can start speaking as soon as the first chunks arrive, before the LLM completes the full response.
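The four steps above can be sketched without any real provider SDK. The Python mock below loosely mirrors the ILLMService contract (the actual implementation is C#); `StreamingLLMService`, `fake_provider`, and the callback names are hypothetical stand-ins.

```python
# Sketch of the streaming flow: a fake provider yields token deltas,
# each delta fires an on_chunk callback (the MessageStreamed analogue),
# and cancel() stops the stream mid-flight. Illustrative only.
from typing import Callable, Iterator

class StreamingLLMService:
    def __init__(self, provider_stream: Callable[[str], Iterator[str]]):
        self._provider_stream = provider_stream  # stands in for the provider SDK
        self._cancelled = False
        self.on_chunk: Callable[[str], None] = lambda chunk: None

    def cancel(self) -> None:
        """Step 4: request cancellation mid-flight (Cancel() analogue)."""
        self._cancelled = True

    def process_input(self, prompt: str) -> str:
        """Step 1: start the request; steps 2-3: receive deltas, emit events."""
        self._cancelled = False
        received = []
        for delta in self._provider_stream(prompt):
            if self._cancelled:      # stop as soon as cancellation is seen
                break
            received.append(delta)
            self.on_chunk(delta)     # event emission, one call per chunk
        return "".join(received)

def fake_provider(prompt: str) -> Iterator[str]:
    """Stand-in for an SSE stream of token deltas."""
    yield from ["Hello", ", ", "world"]
```

The point of the callback is that a downstream consumer (e.g., text-to-speech) can act on the first chunk immediately rather than waiting for `process_input` to return.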

Provider manager

The LLMProviderManager (defined in IqraInfrastructure/Managers/LLM/LLMProviderManager.cs) handles:
  • Provider registration - Auto-discovers implementations via reflection
  • Model catalog - Maintains available models per provider
  • Configuration validation - Ensures required fields are present
  • Instance creation - Instantiates provider services with credentials
  • Integration linking - Connects to IntegrationsManager for credential lookup
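The registration and instance-creation responsibilities can be illustrated with a self-registering class pattern. The real manager uses .NET reflection; the Python sketch below (`LLMProvider`, `create_provider`, and the registry layout are all hypothetical) shows the same idea with `__init_subclass__`.

```python
# Sketch of provider auto-registration: each subclass registers itself
# in a catalog at class-definition time, analogous to the reflection-based
# discovery in LLMProviderManager. All names are illustrative.

class LLMProvider:
    registry: dict[str, type["LLMProvider"]] = {}
    provider_id: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.provider_id:
            LLMProvider.registry[cls.provider_id] = cls  # provider registration

class OpenAIGPTProvider(LLMProvider):
    provider_id = "OpenAIGPT"
    models = ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"]    # model catalog

def create_provider(provider_id: str) -> LLMProvider:
    """Instance creation: look up the registered class and instantiate it."""
    try:
        return LLMProvider.registry[provider_id]()
    except KeyError:
        raise ValueError(f"unknown provider: {provider_id}") from None
```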

Configuration best practices

Temperature tuning

  • 0.0-0.3 - Deterministic, factual responses (customer support, data lookup)
  • 0.4-0.7 - Balanced creativity (general conversation)
  • 0.8-1.0 - Creative, varied responses (storytelling, brainstorming)
  • >1.0 - Highly random (rarely useful in production)

Token budgets

  • Minimum 200 tokens - Enforced by Iqra AI to prevent truncated responses
  • Voice use cases - Keep under 500 tokens for natural conversation pacing
  • Complex reasoning - Allocate 2000+ tokens for o-series or Claude thinking modes
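The 200-token floor can be expressed as a simple clamp. The helper below is hypothetical, shown only to make the enforced minimum concrete.

```python
# Clamp a requested completion budget to the platform's 200-token minimum.
# Hypothetical helper for illustration; not Iqra AI's actual code.
MIN_COMPLETION_TOKENS = 200

def effective_max_tokens(requested: int) -> int:
    """Return the budget actually used: never below the 200-token floor."""
    return max(MIN_COMPLETION_TOKENS, requested)
```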

Model selection

For latency-sensitive voice use cases, the best options are:
  • Groq Llama 3.3 70B (fastest inference)
  • GPT-4o mini (good balance of cost, quality, and speed)
  • Gemini 2.0 Flash (multimodal + speed)
Avoid o-series models and Claude with thinking enabled: the reasoning pass delays the first token, which hurts conversational pacing.

Adding custom providers

To add a new LLM provider:
  1. Add enum value in IqraCore/Entities/Interfaces/InterfaceLLMProviderEnum.cs
  2. Implement interface in IqraInfrastructure/Managers/LLM/Providers/
  3. Add static method GetProviderTypeStatic() returning your enum value
  4. Handle streaming using provider’s native SDK
  5. Restart application - Provider auto-registers on startup
See OpenAIGPTStreamingLLMService.cs:26-65 for a reference implementation.

Next steps

Configure agent prompts

Learn how to craft effective system prompts

Add voice output

Set up text-to-speech for conversations

Multi-language support

Configure parallel language contexts

Script builder

Build conversation flows in the visual IDE
