Function Signature
answer(ctx, opts \\ [])

Generates the final answer from search results using the LLM.
Purpose
Collects all chunks from results, deduplicates by ID, and prompts the LLM to generate an answer based on the context.
Handles both with-context answers (normal RAG) and no-context answers (when gate/2 has set skip_retrieval: true).
Parameters
ctx (Arcana.Agent.Context, required) - The agent context from the pipeline
opts (keyword list, optional) - Options for the answer step
Options
- answerer - Custom answerer module or function (default: Arcana.Agent.Answerer.LLM)
  - Module: must implement the answer/3 callback
  - Function: must match fn question, chunks, opts -> {:ok, answer} | {:error, reason} end
- prompt - Custom prompt function fn question, chunks -> prompt_string end. Only used by the default LLM answerer.
- llm - Override the LLM function for this step
- self_correct - Enable self-correcting answers (default: false). If true, the LLM evaluates the answer and corrects it if not well-grounded in context.
- max_corrections - Max correction attempts (default: 2). Only used when self_correct: true.
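For the default LLM answerer, the custom prompt function could be wired in as follows. This is a sketch: the :prompt option key is an assumption inferred from the option description, not confirmed by the examples in this document.

```elixir
ctx
|> Arcana.Agent.search()
|> Arcana.Agent.answer(
  # :prompt is assumed to be the option key for the custom prompt function
  prompt: fn question, chunks ->
    context = Enum.map_join(chunks, "\n\n", & &1.text)
    "Answer strictly from the context below.\n\nContext:\n#{context}\n\nQuestion: #{question}"
  end
)
```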
Context Updates
ctx.context_used - The chunks that were used to generate the answer (deduplicated)
ctx.correction_count - Number of self-corrections performed (0 if self_correct is false)
ctx.corrections - List of {previous_answer, feedback} tuples from correction iterations
ctx.error - Set if answer generation fails
Examples
Basic Usage
ctx
|> Arcana.Agent.search()
|> Arcana.Agent.answer()
ctx.answer
# => "Elixir is a functional, concurrent programming language..."
ctx.context_used
# => [%{id: 1, text: "...", score: 0.89}, ...]
With Self-Correction
ctx
|> Arcana.Agent.search()
|> Arcana.Agent.answer(self_correct: true, max_corrections: 3)
ctx.answer
# => "Based on the documentation, GenServer is..."
ctx.correction_count
# => 1
ctx.corrections
# => [
# {"GenServer is probably...", "Answer contains speculation, stick to documentation"}
# ]
Complete Pipeline
ctx =
  Arcana.Agent.new("What is Elixir?")
  |> Arcana.Agent.gate()
  |> Arcana.Agent.search()
  |> Arcana.Agent.rerank()
  |> Arcana.Agent.answer()

IO.puts(ctx.answer)
No-Context Answer (Skip Retrieval)
ctx =
  Arcana.Agent.new("What is 2 + 2?")
  |> Arcana.Agent.gate()    # Sets skip_retrieval: true
  |> Arcana.Agent.search()  # Returns empty results
  |> Arcana.Agent.answer()  # Answers from general knowledge
ctx.skip_retrieval
# => true
ctx.context_used
# => []
ctx.answer
# => "2 + 2 equals 4."
Custom Answerer Module
defmodule MyApp.TemplateAnswerer do
  @behaviour Arcana.Agent.Answerer

  @impl true
  def answer(question, chunks, opts) do
    skip_retrieval = Keyword.get(opts, :skip_retrieval, false)

    answer =
      if skip_retrieval do
        general_knowledge_answer(question)
      else
        template_based_answer(question, chunks)
      end

    {:ok, answer}
  end

  defp template_based_answer(question, chunks) do
    context = Enum.map_join(chunks, "\n", & &1.text)

    """
    Based on our documentation:
    #{context}

    To answer your question "#{question}":
    #{synthesize_answer(question, chunks)}
    """
  end

  defp general_knowledge_answer(question) do
    "This is a general knowledge question: #{question}"
  end
end
# Usage
ctx
|> Arcana.Agent.search()
|> Arcana.Agent.answer(answerer: MyApp.TemplateAnswerer)
Inline Answerer Function
ctx
|> Arcana.Agent.search()
|> Arcana.Agent.answer(
  answerer: fn question, chunks, opts ->
    llm = Keyword.fetch!(opts, :llm)

    # Simple prompt
    context = Enum.map_join(chunks, "\n\n", & &1.text)
    prompt = "Q: #{question}\n\nContext:\n#{context}\n\nA:"

    Arcana.LLM.complete(llm, prompt, [], [])
  end
)
Streaming Answer
defmodule MyApp.StreamingAnswerer do
  @behaviour Arcana.Agent.Answerer

  @impl true
  def answer(question, chunks, opts) do
    callback = Keyword.get(opts, :stream_callback)
    llm = Keyword.fetch!(opts, :llm)

    context = Enum.map_join(chunks, "\n\n", & &1.text)
    prompt = build_prompt(question, context)

    # Stream tokens to the callback while accumulating the full answer
    full_answer =
      llm
      |> stream_completion(prompt)
      |> Enum.reduce("", fn token, acc ->
        if callback, do: callback.(token)
        acc <> token
      end)

    {:ok, full_answer}
  end
end

# Usage
ctx
|> Arcana.Agent.search()
|> Arcana.Agent.answer(
  answerer: MyApp.StreamingAnswerer,
  stream_callback: fn token -> IO.write(token) end
)
Self-Correction Flow
When self_correct: true:
- Generate initial answer
- Ask LLM: “Is this answer well-grounded in the context?”
- If yes → Done
- If no → Get feedback and regenerate
- Repeat until grounded or max_corrections reached
ctx
|> Arcana.Agent.search()
|> Arcana.Agent.answer(self_correct: true, max_corrections: 2)
# Iteration 1:
# Answer: "GenServer probably handles state..."
# Eval: Not grounded - contains speculation
# Feedback: "Stick to documentation, avoid 'probably'"
# Iteration 2:
# Answer: "According to the docs, GenServer handles state..."
# Eval: Well-grounded
# Done
ctx.correction_count
# => 1
Evaluation Prompt
For self-correction, the LLM evaluates answers with:
"""
Evaluate if the following answer is well-grounded in the provided context.
Question: "#{question}"
Context:
#{context}
Answer to evaluate:
#{answer}
Respond with JSON:
- If the answer is well-grounded and accurate: {"grounded": true}
- If the answer needs improvement: {"grounded": false, "feedback": "specific feedback on what to improve"}
Only mark as not grounded if there are clear issues like:
- Claims not supported by the context
- Missing key information from the context
- Factual errors
JSON response:
"""
Chunk Deduplication
Chunks are deduplicated by ID before answer generation:
# If search/decompose/reason created duplicate chunks
ctx.results = [
  %{chunks: [%{id: 1, text: "..."}, %{id: 2, text: "..."}]},
  %{chunks: [%{id: 2, text: "..."}, %{id: 3, text: "..."}]} # id: 2 is a duplicate
]

# After deduplication
ctx.context_used
# => [%{id: 1, ...}, %{id: 2, ...}, %{id: 3, ...}] # Only unique chunks
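The deduplication described above amounts to flattening every result's chunks and keeping the first chunk seen for each ID. A minimal standalone sketch using Enum.flat_map/2 and Enum.uniq_by/2 (the chunk maps are illustrative; the library's actual implementation may differ):

```elixir
results = [
  %{chunks: [%{id: 1, text: "a"}, %{id: 2, text: "b"}]},
  %{chunks: [%{id: 2, text: "b"}, %{id: 3, text: "c"}]}
]

context_used =
  results
  |> Enum.flat_map(& &1.chunks)
  |> Enum.uniq_by(& &1.id)

# context_used keeps the first occurrence of each ID: 1, 2, 3
```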
Custom Answerer Behaviour
defmodule Arcana.Agent.Answerer do
  @callback answer(
              question :: String.t(),
              chunks :: [map()],
              opts :: Keyword.t()
            ) :: {:ok, String.t()} | {:error, term()}
end
Answerer options include:
- llm - The LLM function
- skip_retrieval - Whether this is a no-context answer
- Any custom options passed to answer/2
Telemetry Events
Emits [:arcana, :agent, :answer]:
# Start metadata
%{
  question: ctx.question,
  answerer: Arcana.Agent.Answerer.LLM
}

# Stop metadata
%{
  context_chunk_count: 5,
  correction_count: 1,
  success: true
}
Also emits [:arcana, :agent, :self_correct] for each correction attempt:
# Start metadata
%{attempt: 1}

# Stop metadata
%{
  result: :corrected, # or :accepted, :error, :eval_failed
  attempt: 1
}
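These events can be consumed with the standard :telemetry API. A sketch of a logging handler (the handler ID is arbitrary, and the :start/:stop event suffixes assume the usual telemetry span convention; the exact event shape should be checked against the library):

```elixir
# Attach one handler to both answer and self-correct stop events
:telemetry.attach_many(
  "arcana-answer-logger",
  [
    [:arcana, :agent, :answer, :stop],
    [:arcana, :agent, :self_correct, :stop]
  ],
  fn event, _measurements, metadata, _config ->
    IO.inspect({event, metadata}, label: "arcana telemetry")
  end,
  nil
)
```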
Error Handling
If answer generation fails, sets ctx.error:
ctx
|> Arcana.Agent.search()
|> Arcana.Agent.answer()
if ctx.error do
  Logger.error("Answer failed: #{ctx.error}")
  {:error, ctx.error}
else
  {:ok, ctx.answer}
end
When to Use Self-Correction
Use self_correct: true when:
- Answer quality is critical
- You want to reduce hallucinations
- Context might be ambiguous
- LLM tends to speculate beyond context
Costs:
- 2-3× more LLM calls (eval + regeneration)
- Increased latency
- Higher token usage
Best Practices
- Use rerank first - Better chunks = better answers
- Monitor context_used - Ensure relevant chunks are used
- Tune prompts - Custom prompts for domain-specific answers
- Handle errors - Check ctx.error before using the answer
- Consider streaming - For better UX on long answers
- Use self-correct selectively - Only when quality is critical
Common Patterns
Minimal Pipeline
Agent.new("question")
|> Agent.search()
|> Agent.answer()
Quality-Focused Pipeline
Agent.new("question")
|> Agent.search()
|> Agent.reason(max_iterations: 2)
|> Agent.rerank(threshold: 8)
|> Agent.answer(self_correct: true)
Speed-Focused Pipeline
Agent.new("question")
|> Agent.gate() # Skip retrieval if possible
|> Agent.search(collection: "fast_index")
|> Agent.answer() # No rerank, no self-correct
See Also