
What are Embeddings?

Embeddings are numerical representations of text that capture semantic meaning. They transform words, sentences, or documents into vectors (lists of numbers) in a high-dimensional space where similar meanings are positioned close together.

Why Embeddings Matter for RAG

Traditional keyword search matches exact words. Embeddings understand meaning:
Query: "ML algorithms"

Keyword match:
❌ "machine learning models"  (no match - different words)
✅ "ML algorithms overview"    (exact match)

Embedding match:
✅ "machine learning models"  (0.87 similarity - same concept)
✅ "ML algorithms overview"    (0.92 similarity)
✅ "neural networks"           (0.78 similarity - related)

How Vector Similarity Search Works

Arcana uses cosine similarity to find relevant chunks:
# 1. Embed the query
query = "What is Elixir?"
{:ok, query_embedding} = Embedder.embed(embedder, query, intent: :query)
# => [0.23, -0.45, 0.67, ...] (384 dimensions for bge-small)

# 2. Compare with stored chunk embeddings using cosine similarity
# Cosine similarity measures the angle between vectors (-1 to 1 in general;
# typically 0 to 1 for text embeddings)
#   1.0 = identical meaning
#   0.8+ = highly relevant
#   0.5-0.8 = somewhat relevant
#   under 0.5 = not relevant

# 3. PostgreSQL pgvector computes similarity efficiently
results = VectorStore.search(collection, query_embedding, limit: 5)
# Returns chunks sorted by similarity score
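The rough score bands in the comments above can be expressed as a small helper. The cutoffs are heuristics from this guide, not part of Arcana's API:

```elixir
defmodule Relevance do
  # Map a cosine similarity score to a coarse relevance label
  # using the heuristic bands described above
  def label(score) when score >= 0.8, do: :highly_relevant
  def label(score) when score >= 0.5, do: :somewhat_relevant
  def label(_score), do: :not_relevant
end

Relevance.label(0.87)  # => :highly_relevant
Relevance.label(0.42)  # => :not_relevant
```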

Cosine Similarity Formula

Given two vectors A and B:
similarity = (A · B) / (||A|| × ||B||)

Where:
- A · B = dot product (sum of element-wise products)
- ||A|| = magnitude of vector A
- ||B|| = magnitude of vector B
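As a quick sanity check, the formula can be written in plain Elixir (a standalone sketch, independent of Arcana's implementation):

```elixir
defmodule CosineSim do
  # Dot product: sum of element-wise products
  def dot(a, b) do
    Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
  end

  # Magnitude: square root of the vector's dot product with itself
  def magnitude(a), do: :math.sqrt(dot(a, a))

  # similarity = (A · B) / (||A|| × ||B||)
  def similarity(a, b), do: dot(a, b) / (magnitude(a) * magnitude(b))
end

CosineSim.similarity([1.0, 0.0], [1.0, 0.0])  # => 1.0 (same direction)
CosineSim.similarity([1.0, 0.0], [0.0, 1.0])  # => 0.0 (orthogonal)
```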
PostgreSQL implementation (from lib/arcana/vector_store/pgvector.ex:109):
SELECT 
  id, text,
  1 - (embedding <=> query_embedding) AS score
FROM arcana_chunks
ORDER BY embedding <=> query_embedding
LIMIT 10
The <=> operator computes cosine distance. Arcana converts it to similarity with 1 - distance.

Embedding Providers

Arcana supports multiple embedding providers with a pluggable architecture:
Local (default): run models locally with no API costs.
# config/config.exs

# Use the default local model:
config :arcana, embedder: :local

# Or pick a specific model:
config :arcana, embedder: {:local, model: "BAAI/bge-large-en-v1.5"}
Pros:
  • No API costs
  • Data privacy (no external calls)
  • No rate limits
Cons:
  • Requires CPU/GPU resources
  • Slower initial model download
  • Needs Nx backend (EXLA, EMLX, or Torchx)
Popular models (from lib/arcana/embedder/local.ex:33-48):
Model                                  | Dimensions | Size    | Best For
BAAI/bge-small-en-v1.5                 | 384        | ~133 MB | Default - balanced speed/quality
BAAI/bge-base-en-v1.5                  | 768        | ~438 MB | Better quality, slower
BAAI/bge-large-en-v1.5                 | 1024       | ~1.3 GB | Best quality, slowest
intfloat/e5-small-v2                   | 384        | ~133 MB | Requires query/passage prefixes
sentence-transformers/all-MiniLM-L6-v2 | 384        | ~90 MB  | Lightweight, fast
Setup:
# Add to supervision tree
children = [
  MyApp.Repo,
  {Arcana.Embedder.Local, model: "BAAI/bge-small-en-v1.5"}
]

# Configure Nx backend (required)
config :nx,
  default_backend: EXLA.Backend,
  default_defn_options: [compiler: EXLA]

E5 Models and Query/Passage Prefixes

E5 models from Microsoft require special prefixes to distinguish search queries from document content:
# Query embedding (what the user searches for)
Embedder.embed(embedder, "What is Elixir?", intent: :query)
# Behind the scenes: "query: What is Elixir?" (lib/arcana/embedder/local.ex:141-150)

# Document embedding (content being indexed)
Embedder.embed(embedder, "Elixir is a functional language...", intent: :document)
# Behind the scenes: "passage: Elixir is a functional language..."

Why Prefixes Matter

E5 models were trained with these prefixes to differentiate:
  • Queries = short, question-like text
  • Passages = longer document chunks
Using the wrong prefix significantly reduces retrieval quality. Automatic prefix handling (from lib/arcana/embedder/local.ex:141-151):
def prepare_text(text, model, intent) do
  if MapSet.member?(@e5_models, model) do
    case intent do
      :query -> "query: #{text}"
      :document -> "passage: #{text}"
      nil -> "passage: #{text}"  # default to passage
    end
  else
    text  # Other models don't need prefixes
  end
end
Only E5 models (intfloat/e5-*) require prefixes. BGE, GTE, and Sentence Transformers models do not use them.

Embedding Dimensions Comparison

Dimensions affect:
  • Storage size: More dimensions = larger database
  • Search speed: More dimensions = slower cosine similarity
  • Quality: Generally, more dimensions = better semantic understanding (with diminishing returns)

Storage Calculator

# Each dimension = 4 bytes (float32)
# Example: 10,000 chunks with bge-small (384 dims)

chunk_count = 10_000
dimensions = 384
bytes_per_dim = 4

total_mb = (chunk_count * dimensions * bytes_per_dim) / 1_024 / 1_024
# => ~14.6 MB for embeddings alone

# With text-embedding-3-large (3072 dims):
dimensions = 3_072
total_mb = (chunk_count * dimensions * bytes_per_dim) / 1_024 / 1_024
# => ~117 MB for embeddings

Dimension Trade-offs

Low Dimensions (384)

Models: bge-small, e5-small, MiniLM
Pros:
  • Fast search (under 10ms for 100K chunks)
  • Small storage footprint
  • Quick embedding generation
Cons:
  • Slightly lower semantic precision
  • May miss subtle relationships
Best for: High-volume applications, real-time search

Medium Dimensions (768)

Models: bge-base, e5-base
Pros:
  • Balanced quality/speed
  • Good semantic understanding
Cons:
  • 2x storage vs 384 dims
  • Moderate speed impact
Best for: Most production use cases

High Dimensions (1024-1536)

Models: bge-large, text-embedding-3-small
Pros:
  • Excellent semantic precision
  • Better handling of nuanced queries
Cons:
  • 3-4x storage vs 384 dims
  • Slower search on large datasets
Best for: Research, legal, medical domains

Very High Dimensions (3072)

Models: text-embedding-3-large
Pros:
  • State-of-the-art quality
  • Best for complex domains
Cons:
  • 8x storage vs 384 dims
  • Noticeably slower search
  • Higher API costs
Best for: Critical applications where quality > cost

How Embeddings Work with Vector Stores

Ingestion flow (from lib/arcana/ingest.ex:125-156):
# 1. Text is chunked
chunks = Chunker.chunk(chunker_config, text, opts)
# => [%{text: "...", chunk_index: 0, token_count: 342}, ...]

# 2. Each chunk is embedded
Enum.reduce_while(chunks, {:ok, []}, fn chunk, {:ok, acc} ->
  case Embedder.embed(emb, chunk.text, intent: :document) do
    {:ok, embedding} ->
      # 3. Embedding stored with chunk
      chunk_record =
        %Chunk{}
        |> Chunk.changeset(%{
          text: chunk.text,
          embedding: embedding,  # [0.23, -0.45, 0.67, ...]
          chunk_index: chunk.chunk_index,
          document_id: document.id
        })
        |> repo.insert!()
      
      {:cont, {:ok, [chunk_record | acc]}}
      
    {:error, reason} ->
      {:halt, {:error, reason}}
  end
end)
Search flow (from lib/arcana/search.ex:228-246):
# 1. Query is embedded
case Embedder.embed(embedder, query, intent: :query) do
  {:ok, query_embedding} ->
    # 2. pgvector finds similar chunks
    results = VectorStore.search(collection, query_embedding, opts)
    
    # SQL behind the scenes (lib/arcana/vector_store/pgvector.ex:97-114):
    # SELECT id, text,
    #   1 - (embedding <=> $1) AS score
    # FROM arcana_chunks
    # WHERE 1 - (embedding <=> $1) > $2  -- threshold
    # ORDER BY embedding <=> $1
    # LIMIT $3
    
    {:ok, transform_results(results)}
end

Embedding Real Examples

# Start the serving (in supervision tree)
children = [
  {Arcana.Embedder.Local, model: "BAAI/bge-small-en-v1.5"}
]

# Embed a query
embedder = {:local, model: "BAAI/bge-small-en-v1.5"}
{:ok, embedding} = Arcana.Embedder.embed(
  embedder, 
  "How does Phoenix LiveView work?",
  intent: :query
)

# Result:
# {:ok, [0.234, -0.456, 0.678, ...]} (384 floats)
length(embedding)  # => 384

# Behind the scenes (lib/arcana/embedder/local.ex:99-115):
# 1. Text sent to Nx.Serving (Bumblebee model)
# 2. Model computes embedding on EXLA/EMLX backend
# 3. Nx tensor converted to Elixir list
# 4. Telemetry event emitted

Choosing the Right Embedding Model

1. Consider Your Dataset Size

Small (under 10K chunks): Use any model, even large ones
Medium (10K-100K chunks): Use 384-768 dimensions
Large (over 100K chunks): Prefer 384 dimensions for speed
2. Evaluate Quality Requirements

General knowledge base: bge-small or MiniLM (384 dims)
Technical/domain-specific: bge-base or e5-base (768 dims)
Legal/medical/research: bge-large or OpenAI large (1024-3072 dims)
3. Factor in Costs

Budget-constrained: Use local models (no API costs)
Scale & convenience: OpenAI (pay per usage)
Hybrid: Local for documents, OpenAI for queries (queries are embedded far less often)
4. Test with Your Data

# Benchmark different models
models = [
  {:local, model: "BAAI/bge-small-en-v1.5"},
  {:local, model: "BAAI/bge-base-en-v1.5"},
  {:openai, model: "text-embedding-3-small"}
]

test_queries = ["query 1", "query 2", "query 3"]

Enum.each(models, fn embedder ->
  # Ingest test data
  # Run test queries
  # Measure precision/recall
  # Compare results
end)
See Evaluation Guide for metrics and testing.

Best Practices

Use Intent Parameter

Always specify :intent for E5 models:
# For queries
embed(embedder, query, intent: :query)

# For documents
embed(embedder, chunk, intent: :document)

Cache Embeddings

Never re-embed the same text. Arcana stores embeddings automatically:
# Embeddings stored in arcana_chunks table
# Only query text needs fresh embedding
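If you embed ad-hoc text outside the ingestion pipeline, a simple in-memory cache avoids repeat work. This is a generic sketch using an Agent keyed by the text's SHA-256 hash, not an Arcana feature:

```elixir
defmodule EmbedCache do
  # Start an Agent holding a map of text-hash => embedding
  def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Return the cached embedding, or compute and store it via embed_fun
  def fetch(text, embed_fun) do
    key = :crypto.hash(:sha256, text)

    case Agent.get(__MODULE__, &Map.get(&1, key)) do
      nil ->
        embedding = embed_fun.(text)
        Agent.update(__MODULE__, &Map.put(&1, key, embedding))
        embedding

      cached ->
        cached
    end
  end
end
```

On the second call with the same text, `embed_fun` is never invoked; the stored embedding is returned.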

Monitor Dimensions

Ensure dimensions match across ingestion and search:
# Migration checks dimensions
Embedder.dimensions(embedder)
# Must match for all chunks in collection
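A lightweight guard before insert can catch mismatches early. A sketch, where `expected_dims` would come from `Embedder.dimensions/1`:

```elixir
defmodule DimCheck do
  # Raise early if an embedding's length doesn't match the
  # collection's expected dimensions
  def validate!(embedding, expected_dims) when is_list(embedding) do
    actual = length(embedding)

    if actual != expected_dims do
      raise ArgumentError,
            "embedding has #{actual} dims, collection expects #{expected_dims}"
    end

    embedding
  end
end
```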

Handle Errors

Embedding can fail (API limits, model load):
case Embedder.embed(embedder, text) do
  {:ok, embedding} -> # proceed
  {:error, reason} -> # retry or log
end
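For transient failures such as API rate limits, a simple retry wrapper with backoff can help. This is a generic sketch, not part of Arcana; it takes any zero-arity function returning `{:ok, _}` or `{:error, _}`:

```elixir
defmodule EmbedRetry do
  # Retry a zero-arity fun up to `attempts` times, doubling the backoff
  def with_retry(fun, attempts \\ 3, backoff_ms \\ 200) do
    case fun.() do
      {:ok, _} = ok ->
        ok

      {:error, _reason} when attempts > 1 ->
        Process.sleep(backoff_ms)
        with_retry(fun, attempts - 1, backoff_ms * 2)

      {:error, _} = error ->
        error
    end
  end
end

# Usage: EmbedRetry.with_retry(fn -> Embedder.embed(embedder, text) end)
```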

Next Steps

Search Modes

Learn how to search with embeddings using semantic, full-text, and hybrid modes

Chunking Strategies

Optimize how documents are split before embedding

Evaluation

Measure and improve embedding quality with metrics

Getting Started

Set up your first embedding-powered application
