The Arcana.Ask module implements the complete RAG (Retrieval-Augmented Generation) workflow for question answering. It retrieves relevant context from your knowledge base and uses an LLM to generate accurate, grounded answers.

Overview

The RAG workflow consists of three steps:
  1. Retrieve - Search for relevant context chunks using Arcana.Search
  2. Augment - Build a prompt with the retrieved context
  3. Generate - Use an LLM to generate an answer based on the context
This approach ensures answers are grounded in your actual documentation rather than relying solely on the LLM’s training data.
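
The three steps can be sketched with Arcana's public pieces. Note this is an illustration, not the actual implementation: the `Arcana.Search.search/2` call shape and the `Arcana.LLM.complete/4` argument order below are assumptions.

```elixir
# A sketch of what ask/2 does internally (function shapes assumed).

question = "What is Elixir?"

# 1. Retrieve: find relevant chunks in the knowledge base
{:ok, chunks} = Arcana.Search.search(question, repo: MyApp.Repo, limit: 5)

# 2. Augment: fold the retrieved chunks into a system prompt
context_text = Enum.map_join(chunks, "\n\n", & &1.text)

system_prompt = """
Answer the user's question based on the following context.
If the answer is not in the context, say you don't know.

Context:
#{context_text}
"""

# 3. Generate: ask the LLM, grounded in the retrieved context
# (argument order here is an assumption for illustration)
{:ok, answer} = Arcana.LLM.complete("openai:gpt-4o-mini", system_prompt, question, [])
```

Using ask/2 directly gives you all three steps in one call, plus the error handling and telemetry described below.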

Function

ask/2

Asks a question using retrieved context from the knowledge base.
ask(question, opts) :: {:ok, answer, context} | {:error, term()}
question
string
required
The question to ask. This will be used to search for relevant context and passed to the LLM.
opts
keyword
required
Ask options:
repo
module
required
The Ecto repo to use for searching the knowledge base. Required unless configured globally via config :arcana, repo: MyApp.Repo.
llm
term
required
LLM implementing the Arcana.LLM protocol. Can be:
  • A string: "openai:gpt-4o-mini", "anthropic:claude-3-5-sonnet"
  • A configured module: {MyApp.CustomLLM, opts}
  • Any implementation of Arcana.LLM.complete/4
Required unless configured globally via config :arcana, llm: "openai:gpt-4o-mini".
limit
integer
default: 5
Maximum number of context chunks to retrieve. More context provides better answers but increases LLM costs and latency.
mode
atom
default: :semantic
Search mode for context retrieval:
  • :semantic - Vector similarity search (default)
  • :fulltext - Keyword-based search
  • :hybrid - Combines both modes
source_id
string
Filter context to documents with this source_id. Useful for scoping answers to specific document sources.
threshold
float
default: 0.0
Minimum similarity score for context chunks (0.0 to 1.0). Filters out low-quality context.
collection
string
Filter context to a specific collection by name. Use this to answer questions from a subset of documents.
collections
list(string)
Filter context to multiple collections. Context is retrieved from all specified collections.
prompt
function
Custom prompt function with the signature fn question, context -> system_prompt_string end. Use this to customize how context is presented to the LLM. The default prompt instructs the LLM to answer based on the provided context.
ok
{:ok, answer, context}
Returns a tuple with:
  • answer (string) - The LLM’s generated response
  • context (list) - The context chunks that were provided to the LLM. Each chunk is a map with:
    • id - Chunk UUID
    • text - Chunk text content
    • document_id - Parent document UUID
    • chunk_index - Position in document
    • score - Relevance score
error
{:error, term()}
Returns an error tuple if the ask operation fails:
  • {:error, :no_llm_configured} - No LLM specified in options or config
  • {:error, {:search_failed, reason}} - Failed to retrieve context
  • {:error, reason} - LLM generation failed
Examples:
# Basic question answering
{:ok, answer, context} = Arcana.Ask.ask(
  "What is Elixir?",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o-mini"
)

IO.puts(answer)
# "Elixir is a dynamic, functional programming language designed for building 
# scalable and maintainable applications. It runs on the Erlang VM..."

IO.puts("Used #{length(context)} context chunks")
# "Used 3 context chunks"

# With more context
{:ok, answer, _context} = Arcana.Ask.ask(
  "How does Elixir handle concurrency?",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o",
  limit: 10  # Retrieve more context for complex questions
)

# Scoped to a collection
{:ok, answer, _} = Arcana.Ask.ask(
  "How do I authenticate API requests?",
  repo: MyApp.Repo,
  llm: "anthropic:claude-3-5-sonnet",
  collection: "api_documentation"
)

# With custom prompt
{:ok, answer, _} = Arcana.Ask.ask(
  "Summarize the deployment process",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o-mini",
  prompt: fn question, context ->
    context_text = Enum.map_join(context, "\n\n", & &1.text)
    """
    You are a technical writer creating documentation.
    Be concise and use bullet points.
    
    Question: #{question}
    
    Documentation:
    #{context_text}
    """
  end
)

# Multiple collections with hybrid search
{:ok, answer, context} = Arcana.Ask.ask(
  "What are the security best practices?",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o",
  collections: ["security_guides", "api_docs", "compliance"],
  mode: :hybrid,
  limit: 8
)

Custom Prompts

The default prompt instructs the LLM to answer based on context:
# Default prompt (built-in)
"""
Answer the user's question based on the following context.
If the answer is not in the context, say you don't know.

Context:
[retrieved chunks]
"""
Customize the prompt for different use cases:

Technical Documentation

defmodule MyApp.Prompts do
  def technical_docs(_question, context) do
    context_text = Enum.map_join(context, "\n\n---\n\n", & &1.text)
    
    """
    You are a technical documentation expert.
    
    Instructions:
    - Answer based only on the provided documentation
    - Include code examples when available
    - Be precise and accurate
    - If unsure, say "The documentation doesn't specify"
    
    Documentation:
    #{context_text}
    """
  end
end

Arcana.Ask.ask(
  question,
  repo: MyApp.Repo,
  llm: "openai:gpt-4o",
  prompt: &MyApp.Prompts.technical_docs/2
)

Conversational Support

def support_bot(_question, context) do
  context_text = Enum.map_join(context, "\n\n", & &1.text)
  
  """
  You are a friendly customer support assistant.
  
  Instructions:
  - Answer helpfully based on the knowledge base below
  - Be conversational and empathetic
  - If you can't help, suggest contacting support
  - Provide step-by-step instructions when appropriate
  
  Knowledge Base:
  #{context_text}
  """
end

Summarization

def summarizer(question, context) do
  context_text = Enum.map_join(context, "\n\n", & &1.text)
  
  """
  Create a concise summary answering: #{question}
  
  Requirements:
  - Use bullet points
  - Maximum 3-5 key points
  - Be factual and concise
  
  Source Material:
  #{context_text}
  """
end

Citation-Aware

def with_citations(_question, context) do
  # Build context with chunk IDs for citation
  context_text =
    context
    |> Enum.with_index(1)
    |> Enum.map_join("\n\n", fn {chunk, idx} ->
      "[#{idx}] #{chunk.text}"
    end)
  
  """
  Answer the question using the provided sources.
  Cite sources using [1], [2], etc. after each claim.
  
  Sources:
  #{context_text}
  """
end

{:ok, answer, context} = Arcana.Ask.ask(
  "What are the benefits?",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o",
  prompt: &with_citations/2
)

IO.puts(answer)
# "The main benefits include scalability [1], fault tolerance [2], 
# and developer productivity [1][3]."

Search Configuration

The ask/2 function uses Arcana.Search under the hood. All search options are supported:

Semantic Search (Default)

Arcana.Ask.ask(
  "What is pattern matching?",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o-mini",
  mode: :semantic,  # explicit, but this is default
  limit: 5
)

Full-Text Search

# Good for exact keyword matching
Arcana.Ask.ask(
  "GenServer callback documentation",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o",
  mode: :fulltext
)

Hybrid Search

# Best of both worlds
Arcana.Ask.ask(
  "How do I deploy to production?",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o",
  mode: :hybrid
)

High-Quality Context Only

# Only use highly relevant context
Arcana.Ask.ask(
  question,
  repo: MyApp.Repo,
  llm: "openai:gpt-4o",
  threshold: 0.8,  # Only chunks with 80%+ similarity
  limit: 3
)

LLM Configuration

Arcana.Ask works with any LLM implementing the Arcana.LLM protocol:

String Format

# OpenAI
Arcana.Ask.ask(question, repo: MyApp.Repo, llm: "openai:gpt-4o-mini")
Arcana.Ask.ask(question, repo: MyApp.Repo, llm: "openai:gpt-4o")

# Anthropic
Arcana.Ask.ask(question, repo: MyApp.Repo, llm: "anthropic:claude-3-5-sonnet")
Arcana.Ask.ask(question, repo: MyApp.Repo, llm: "anthropic:claude-3-5-haiku")

Module Configuration

# With options
Arcana.Ask.ask(
  question,
  repo: MyApp.Repo,
  llm: {Arcana.LLM.OpenAI, model: "gpt-4o", temperature: 0.7}
)

# Custom implementation
Arcana.Ask.ask(
  question,
  repo: MyApp.Repo,
  llm: {MyApp.CustomLLM, api_key: "...", endpoint: "..."}
)

Global Configuration

# config/config.exs
config :arcana,
  repo: MyApp.Repo,
  llm: "openai:gpt-4o-mini"

# Now you can omit repo and llm
Arcana.Ask.ask("What is Elixir?")

Working with Context

The returned context can be used for various purposes:

Display Sources

{:ok, answer, context} = Arcana.Ask.ask(question, repo: MyApp.Repo, llm: llm)

IO.puts(answer)
IO.puts("\n---\nSources:")

Enum.each(context, fn chunk ->
  IO.puts("\n- Score: #{Float.round(chunk.score, 2)}")
  IO.puts("  Document: #{chunk.document_id}")
  IO.puts("  Text: #{String.slice(chunk.text, 0..100)}...")
end)

Confidence Scoring

{:ok, answer, context} = Arcana.Ask.ask(question, repo: MyApp.Repo, llm: llm)

# Average the relevance scores (guard against empty context)
avg_score =
  case context do
    [] -> 0.0
    chunks -> Enum.sum(Enum.map(chunks, & &1.score)) / length(chunks)
  end

confidence = 
  cond do
    avg_score > 0.8 -> "high"
    avg_score > 0.6 -> "medium"
    true -> "low"
  end

IO.puts("Answer confidence: #{confidence}")
Document References

{:ok, _answer, context} = Arcana.Ask.ask(question, repo: MyApp.Repo, llm: llm)

# Get unique document IDs
document_ids = 
  context
  |> Enum.map(& &1.document_id)
  |> Enum.uniq()

# Fetch documents with metadata (requires Ecto.Query macros)
import Ecto.Query

documents =
  MyApp.Repo.all(
    from d in Arcana.Document,
    where: d.id in ^document_ids,
    select: %{id: d.id, file_path: d.file_path, metadata: d.metadata}
  )

IO.puts("\nReferences:")
Enum.each(documents, fn doc ->
  IO.puts("- #{doc.file_path || doc.metadata["title"]}")
end)

Error Handling

case Arcana.Ask.ask(question, repo: MyApp.Repo, llm: llm) do
  {:ok, _answer, []} ->
    # No relevant context found
    {:ok, "I couldn't find relevant information to answer that question."}

  {:ok, answer, _context} ->
    # Success
    {:ok, answer}
    
  {:error, :no_llm_configured} ->
    Logger.error("LLM not configured")
    {:error, "Service unavailable"}
    
  {:error, {:search_failed, reason}} ->
    Logger.error("Search failed: #{inspect(reason)}")
    {:error, "Failed to retrieve context"}
    
  {:error, reason} ->
    Logger.error("LLM failed: #{inspect(reason)}")
    {:error, "Failed to generate answer"}
end

Telemetry Events

Monitor RAG operations with telemetry:
:telemetry.attach(
  "ask-handler",
  [:arcana, :ask, :stop],
  fn _event, measurements, metadata, _config ->
    IO.puts("Ask took #{measurements.duration}ns")
    IO.puts("Question: #{metadata.question}")
    IO.puts("Context chunks: #{metadata.context_count}")
    if metadata[:answer] do
      IO.puts("Answer length: #{String.length(metadata.answer)}")
    end
  end,
  nil
)
Events:
  • [:arcana, :ask, :start] - RAG operation started
  • [:arcana, :ask, :stop] - RAG operation completed
  • [:arcana, :ask, :exception] - RAG operation failed
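
As a sketch, all three events can be handled with a single :telemetry.attach_many/4 call. The Logger usage, the :exception metadata shape, and the assumption that duration arrives in native time units are mine, not confirmed by Arcana:

```elixir
require Logger

:telemetry.attach_many(
  "ask-logger",
  [
    [:arcana, :ask, :start],
    [:arcana, :ask, :stop],
    [:arcana, :ask, :exception]
  ],
  fn event, measurements, metadata, _config ->
    case event do
      [:arcana, :ask, :start] ->
        Logger.info("Ask started: #{metadata.question}")

      [:arcana, :ask, :stop] ->
        # Assumes duration is in native time units, as with :telemetry.span/3
        ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
        Logger.info("Ask completed in #{ms}ms (#{metadata.context_count} chunks)")

      [:arcana, :ask, :exception] ->
        Logger.error("Ask failed: #{inspect(metadata)}")
    end
  end,
  nil
)
```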

Best Practices

Context Amount

# Simple questions: fewer chunks
Arcana.Ask.ask(
  "What is X?",
  repo: MyApp.Repo,
  llm: llm,
  limit: 3
)

# Complex questions: more context
Arcana.Ask.ask(
  "Compare X and Y, including pros and cons",
  repo: MyApp.Repo,
  llm: llm,
  limit: 10
)

Quality Over Quantity

# Use threshold to filter low-quality context
Arcana.Ask.ask(
  question,
  repo: MyApp.Repo,
  llm: llm,
  threshold: 0.7,  # Only use relevant chunks
  limit: 5
)

Cost Optimization

# Use smaller, cheaper models for simple questions
Arcana.Ask.ask(
  "What is the API endpoint?",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o-mini"  # Cheaper
)

# Use powerful models for complex reasoning
Arcana.Ask.ask(
  "Analyze the trade-offs between these approaches",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o"  # More capable
)
