Configure LLMs for `Arcana.ask/2` and the Agent pipeline. Use model strings, functions, or custom modules.
## Quick Start

### Using req_llm (Recommended)
req_llm provides a unified interface to 45+ LLM providers.

#### Setup
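A sketch of the dependency setup; the version requirements below are illustrative, so check Hex for the current releases:

```elixir
# mix.exs — versions are illustrative, not pinned recommendations.
defp deps do
  [
    {:arcana, "~> 0.1"},
    {:req_llm, "~> 1.0"}
  ]
end
```

Then run `mix deps.get`.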
## Model Strings
Pass model strings directly to `ask/2` or Agent functions:
- OpenAI
- Anthropic
- Google
- Other Providers
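For illustration, the strings below use the `provider:model` format with model names that appear later in this guide; the `:llm` option key is assumed from the configuration examples:

```elixir
# Hypothetical calls; any provider supported by req_llm can be
# addressed with a "provider:model" string.
Arcana.ask("What is OTP?", llm: "openai:gpt-4o-mini")
Arcana.ask("What is OTP?", llm: "anthropic:claude-sonnet-4-20250514")
Arcana.ask("What is OTP?", llm: "google:gemini-2.0-flash-exp")
```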
## Model String Options
Pass options as a tuple with the model string, e.g. `{"openai:gpt-4o-mini", temperature: 0.7, max_tokens: 1000}`.

## Global Configuration
Set a default LLM under the `:llm` key in your application config:
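A sketch of the global default, assuming `:llm` is a key under the `:arcana` OTP app (the exact key layout is an assumption based on this guide):

```elixir
# config/config.exs — key layout is an assumption.
import Config

config :arcana, llm: "openai:gpt-4o-mini"
```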
## Function-Based LLM
Provide a function for custom LLM logic:

- 1-Arity Function
- 2-Arity Function
- 3-Arity Function
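As a sketch, a 1-arity function matching the signature shown below; passing it via the `:llm` option mirrors the model-string examples:

```elixir
# A 1-arity LLM function: receives the prompt, returns {:ok, response}
# or {:error, reason}. Replace the body with a real API call.
my_llm = fn prompt ->
  {:ok, "stubbed answer to: " <> prompt}
end

# Hypothetical usage.
Arcana.ask("What is Elixir?", llm: my_llm)
```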
Signature:

```elixir
fn prompt -> {:ok, response} | {:error, reason}
```

## Custom LLM Module
Implement custom LLM logic in a module:

## Agentic RAG
Use LLMs with the Agent pipeline for complex workflows:

### Pipeline Steps with LLM
Each Agent step uses the LLM:

| Step | LLM Purpose |
|---|---|
| `gate/2` | Decide if retrieval is needed |
| `rewrite/2` | Clean up conversational queries |
| `select/2` | Choose relevant collections |
| `expand/2` | Add synonyms and related terms |
| `decompose/2` | Split into sub-questions |
| `reason/2` | Evaluate if more search is needed |
| `rerank/2` | Score chunk relevance (0-10) |
| `answer/2` | Generate final answer |
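The `/2` arities in the table suggest each step takes the agent state plus options, which composes naturally with the pipe operator. The sketch below is purely illustrative — the constructor and module shape are invented, so consult the Agentic RAG guide for the real entry points:

```elixir
# Purely illustrative: Arcana.Agent.new/2 is a hypothetical constructor;
# only the step names and arities come from the table above.
alias Arcana.Agent

opts = [llm: "openai:gpt-4o-mini"]

"how do supervisors restart children?"
|> Agent.new(opts)      # hypothetical constructor
|> Agent.gate(opts)     # decide if retrieval is needed
|> Agent.rewrite(opts)  # clean up the query
|> Agent.expand(opts)   # add synonyms and related terms
|> Agent.rerank(opts)   # score chunk relevance (0-10)
|> Agent.answer(opts)   # generate the final answer
```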
## Custom Prompts
Override default prompts for any step:

## Streaming Responses
Stream LLM responses for better UX in LiveView:

## Custom RAG Module
Wrap Arcana for app-specific RAG:

## Cost Tracking
Monitor LLM costs via telemetry:

## Best Practices
- Use gpt-4o-mini for development - Fast and cheap ($0.15/1M tokens)
- Upgrade to Claude 4.5 for production - Better quality, longer context
- Set max_tokens - Prevent runaway costs
- Use temperature=0.7 - Good balance of creativity and consistency
- Stream responses - Better UX for chat interfaces
- Monitor costs - Attach telemetry handlers
- Cache common queries - LLM calls are expensive
- Use hybrid search - Better context = better answers
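Several of the points above translate into per-call options; a hypothetical combination using the tuple form from earlier in this guide:

```elixir
# Hypothetical: cap output tokens and keep temperature moderate,
# per the advice above.
Arcana.ask("Summarize our deployment guide",
  llm: {"openai:gpt-4o-mini", temperature: 0.7, max_tokens: 1024}
)
```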
## Model Selection Guide
| Use Case | Recommended Model | Reason |
|---|---|---|
| Development | gpt-4o-mini | Fast, cheap, good quality |
| Production | claude-sonnet-4-20250514 | Best quality, 200K context |
| High Volume | gemini-2.0-flash-exp | Free tier, fast |
| Complex Reasoning | gpt-4o or claude-opus-4 | Best reasoning capabilities |
| Low Latency | groq:llama-3.1-* | Ultra-fast inference |
| Budget | gemini-flash or gpt-4o-mini | Low cost |
## Troubleshooting
### req_llm not loaded
Add the `req_llm` dependency to your `mix.exs` and run `mix deps.get`.
### API key errors
Set the provider API keys as environment variables and check your config.
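For example, using each provider's conventional variable names (how req_llm discovers keys may vary by provider adapter):

```shell
# Replace the placeholders with real keys before running.
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."
```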
### Rate limit errors
Implement retry logic:
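A minimal retry-with-backoff sketch in plain Elixir; this is not an Arcana API, just a wrapper you could put around `Arcana.ask/2`:

```elixir
defmodule MyApp.Retry do
  # Retries fun when it returns {:error, _}, doubling the delay each time.
  def with_retry(fun, attempts \\ 3, delay_ms \\ 500) do
    case fun.() do
      {:error, _reason} when attempts > 1 ->
        Process.sleep(delay_ms)
        with_retry(fun, attempts - 1, delay_ms * 2)

      result ->
        result
    end
  end
end

# Hypothetical usage.
MyApp.Retry.with_retry(fn ->
  Arcana.ask("What is a GenServer?", llm: "openai:gpt-4o-mini")
end)
```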
### Timeout errors
Increase timeout:
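Since req_llm builds on Req, a Req-style `:receive_timeout` is a plausible knob; the option name here is an assumption, so check the provider adapter docs:

```elixir
# Option name is an assumption; 60_000 ms = 60 s.
Arcana.ask("Explain supervision trees in depth",
  llm: {"anthropic:claude-sonnet-4-20250514", receive_timeout: 60_000}
)
```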
## Next Steps
- Agentic RAG Guide - Build sophisticated RAG pipelines
- Embeddings - Configure embedding providers