
What are Embeddings?

Embeddings are numerical representations of text that capture semantic meaning. They transform words, sentences, or documents into vectors (lists of numbers) in a high-dimensional space where similar meanings are positioned close together.

Why Embeddings Matter for RAG

Traditional keyword search matches exact words. Embeddings understand meaning:
Query: "ML algorithms"

Keyword match:
❌ "machine learning models"  (no match - different words)
✅ "ML algorithms overview"    (exact match)

Embedding match:
✅ "machine learning models"  (0.87 similarity - same concept)
✅ "ML algorithms overview"    (0.92 similarity)
✅ "neural networks"           (0.78 similarity - related)

How Vector Similarity Search Works

Arcana uses cosine similarity to find relevant chunks:
# 1. Embed the query
query = "What is Elixir?"
{:ok, query_embedding} = Embedder.embed(embedder, query, intent: :query)
# => [0.23, -0.45, 0.67, ...] (384 dimensions for bge-small)

# 2. Compare with stored chunk embeddings using cosine similarity
# Cosine similarity measures the angle between vectors (-1 to 1 in general;
# typically 0 to 1 for text embeddings)
#   1.0 = identical meaning
#   0.8+ = highly relevant
#   0.5-0.8 = somewhat relevant
#   under 0.5 = not relevant

# 3. PostgreSQL pgvector computes similarity efficiently
results = VectorStore.search(collection, query_embedding, limit: 5)
# Returns chunks sorted by similarity score
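The rough score bands in the comments above can be expressed as a small helper. The cutoffs are heuristics from this guide, not part of Arcana's API:

```elixir
defmodule Relevance do
  # Map a cosine similarity score to a coarse relevance label
  # using the heuristic bands described above
  def label(score) when score >= 0.8, do: :highly_relevant
  def label(score) when score >= 0.5, do: :somewhat_relevant
  def label(_score), do: :not_relevant
end

Relevance.label(0.87)  # => :highly_relevant
Relevance.label(0.42)  # => :not_relevant
```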

Cosine Similarity Formula

Given two vectors A and B:
similarity = (A · B) / (||A|| × ||B||)

Where:
- A · B = dot product (sum of element-wise products)
- ||A|| = magnitude of vector A
- ||B|| = magnitude of vector B
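As a quick sanity check, the formula can be written in plain Elixir (a standalone sketch, independent of Arcana's implementation):

```elixir
defmodule CosineSim do
  # Dot product: sum of element-wise products
  def dot(a, b) do
    Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
  end

  # Magnitude: square root of the vector's dot product with itself
  def magnitude(a), do: :math.sqrt(dot(a, a))

  # similarity = (A · B) / (||A|| × ||B||)
  def similarity(a, b), do: dot(a, b) / (magnitude(a) * magnitude(b))
end

CosineSim.similarity([1.0, 0.0], [1.0, 0.0])  # => 1.0 (same direction)
CosineSim.similarity([1.0, 0.0], [0.0, 1.0])  # => 0.0 (orthogonal)
```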
PostgreSQL implementation (from lib/arcana/vector_store/pgvector.ex:109):
SELECT 
  id, text,
  1 - (embedding <=> query_embedding) AS score
FROM arcana_chunks
ORDER BY embedding <=> query_embedding
LIMIT 10
The <=> operator computes cosine distance. Arcana converts it to similarity with 1 - distance.

Embedding Providers

Arcana supports multiple embedding providers with a pluggable architecture:
Local (default): run models locally with no API costs.
# config/config.exs

# Use the default local model:
config :arcana, embedder: :local

# Or pick a specific model:
config :arcana, embedder: {:local, model: "BAAI/bge-large-en-v1.5"}
Pros:
  • No API costs
  • Data privacy (no external calls)
  • No rate limits
Cons:
  • Requires CPU/GPU resources
  • Slower initial model download
  • Needs Nx backend (EXLA, EMLX, or Torchx)
Popular models (from lib/arcana/embedder/local.ex:33-48):
Model                                  | Dimensions | Size    | Best For
BAAI/bge-small-en-v1.5                 | 384        | ~133 MB | Default - balanced speed/quality
BAAI/bge-base-en-v1.5                  | 768        | ~438 MB | Better quality, slower
BAAI/bge-large-en-v1.5                 | 1024       | ~1.3 GB | Best quality, slowest
intfloat/e5-small-v2                   | 384        | ~133 MB | Requires query/passage prefixes
sentence-transformers/all-MiniLM-L6-v2 | 384        | ~90 MB  | Lightweight, fast
Setup:
# Add to supervision tree
children = [
  MyApp.Repo,
  {Arcana.Embedder.Local, model: "BAAI/bge-small-en-v1.5"}
]

# Configure Nx backend (required)
config :nx,
  default_backend: EXLA.Backend,
  default_defn_options: [compiler: EXLA]

E5 Models and Query/Passage Prefixes

E5 models from Microsoft require special prefixes to distinguish search queries from document content:
# Query embedding (what the user searches for)
Embedder.embed(embedder, "What is Elixir?", intent: :query)
# Behind the scenes: "query: What is Elixir?" (lib/arcana/embedder/local.ex:141-150)

# Document embedding (content being indexed)
Embedder.embed(embedder, "Elixir is a functional language...", intent: :document)
# Behind the scenes: "passage: Elixir is a functional language..."

Why Prefixes Matter

E5 models were trained with these prefixes to differentiate:
  • Queries = short, question-like text
  • Passages = longer document chunks
Using the wrong prefix significantly reduces retrieval quality. Automatic prefix handling (from lib/arcana/embedder/local.ex:141-151):
def prepare_text(text, model, intent) do
  if MapSet.member?(@e5_models, model) do
    case intent do
      :query -> "query: #{text}"
      :document -> "passage: #{text}"
      nil -> "passage: #{text}"  # default to passage
    end
  else
    text  # Other models don't need prefixes
  end
end
Only E5 models (intfloat/e5-*) require prefixes. BGE, GTE, and Sentence Transformers models do not use them.

Embedding Dimensions Comparison

Dimensions affect:
  • Storage size: More dimensions = larger database
  • Search speed: More dimensions = slower cosine similarity
  • Quality: Generally, more dimensions = better semantic understanding (with diminishing returns)

Storage Calculator

# Each dimension = 4 bytes (float32)
# Example: 10,000 chunks with bge-small (384 dims)

chunk_count = 10_000
dimensions = 384
bytes_per_dim = 4

total_mb = (chunk_count * dimensions * bytes_per_dim) / 1_024 / 1_024
# => ~14.6 MB for embeddings alone

# With text-embedding-3-large (3072 dims):
dimensions = 3_072
total_mb = (chunk_count * dimensions * bytes_per_dim) / 1_024 / 1_024
# => ~117 MB for embeddings

Dimension Trade-offs

Low Dimensions (384)

Models: bge-small, e5-small, MiniLM
Pros:
  • Fast search (under 10ms for 100K chunks)
  • Small storage footprint
  • Quick embedding generation
Cons:
  • Slightly lower semantic precision
  • May miss subtle relationships
Best for: High-volume applications, real-time search

Medium Dimensions (768)

Models: bge-base, e5-base
Pros:
  • Balanced quality/speed
  • Good semantic understanding
Cons:
  • 2x storage vs 384 dims
  • Moderate speed impact
Best for: Most production use cases

High Dimensions (1024-1536)

Models: bge-large, text-embedding-3-small
Pros:
  • Excellent semantic precision
  • Better handling of nuanced queries
Cons:
  • 3-4x storage vs 384 dims
  • Slower search on large datasets
Best for: Research, legal, medical domains

Very High Dimensions (3072)

Models: text-embedding-3-large
Pros:
  • State-of-the-art quality
  • Best for complex domains
Cons:
  • 8x storage vs 384 dims
  • Noticeably slower search
  • Higher API costs
Best for: Critical applications where quality > cost

How Embeddings Work with Vector Stores

Ingestion flow (from lib/arcana/ingest.ex:125-156):
# 1. Text is chunked
chunks = Chunker.chunk(chunker_config, text, opts)
# => [%{text: "...", chunk_index: 0, token_count: 342}, ...]

# 2. Each chunk is embedded
Enum.reduce_while(chunks, {:ok, []}, fn chunk, {:ok, acc} ->
  case Embedder.embed(emb, chunk.text, intent: :document) do
    {:ok, embedding} ->
      # 3. Embedding stored with chunk
      chunk_record =
        %Chunk{}
        |> Chunk.changeset(%{
          text: chunk.text,
          embedding: embedding,  # [0.23, -0.45, 0.67, ...]
          chunk_index: chunk.chunk_index,
          document_id: document.id
        })
        |> repo.insert!()
      
      {:cont, {:ok, [chunk_record | acc]}}
      
    {:error, reason} ->
      {:halt, {:error, reason}}
  end
end)
Search flow (from lib/arcana/search.ex:228-246):
# 1. Query is embedded
case Embedder.embed(embedder, query, intent: :query) do
  {:ok, query_embedding} ->
    # 2. pgvector finds similar chunks
    results = VectorStore.search(collection, query_embedding, opts)
    
    # SQL behind the scenes (lib/arcana/vector_store/pgvector.ex:97-114):
    # SELECT id, text,
    #   1 - (embedding <=> $1) AS score
    # FROM arcana_chunks
    # WHERE 1 - (embedding <=> $1) > $2  -- threshold
    # ORDER BY embedding <=> $1
    # LIMIT $3
    
    {:ok, transform_results(results)}
end

Embedding Real Examples

# Start the serving (in supervision tree)
children = [
  {Arcana.Embedder.Local, model: "BAAI/bge-small-en-v1.5"}
]

# Embed a query
embedder = {:local, model: "BAAI/bge-small-en-v1.5"}
{:ok, embedding} = Arcana.Embedder.embed(
  embedder, 
  "How does Phoenix LiveView work?",
  intent: :query
)

# Result:
# {:ok, [0.234, -0.456, 0.678, ...]} (384 floats)
length(embedding)  # => 384

# Behind the scenes (lib/arcana/embedder/local.ex:99-115):
# 1. Text sent to Nx.Serving (Bumblebee model)
# 2. Model computes embedding on EXLA/EMLX backend
# 3. Nx tensor converted to Elixir list
# 4. Telemetry event emitted

Choosing the Right Embedding Model

1. Consider Your Dataset Size

Small (under 10K chunks): Use any model, even large ones
Medium (10K-100K chunks): Use 384-768 dimensions
Large (over 100K chunks): Prefer 384 dimensions for speed
2. Evaluate Quality Requirements

General knowledge base: bge-small or MiniLM (384 dims)
Technical/domain-specific: bge-base or e5-base (768 dims)
Legal/medical/research: bge-large or OpenAI large (1024-3072 dims)
3. Factor in Costs

Budget-constrained: Use local models (no API costs)
Scale & convenience: OpenAI (pay per usage)
Hybrid: Local for documents, OpenAI for queries (queries are embedded far less often)
4. Test with Your Data

# Benchmark different models
models = [
  {:local, model: "BAAI/bge-small-en-v1.5"},
  {:local, model: "BAAI/bge-base-en-v1.5"},
  {:openai, model: "text-embedding-3-small"}
]

test_queries = ["query 1", "query 2", "query 3"]

Enum.each(models, fn embedder ->
  # Ingest test data
  # Run test queries
  # Measure precision/recall
  # Compare results
end)
See Evaluation Guide for metrics and testing.

Best Practices

Use Intent Parameter

Always specify :intent for E5 models:
# For queries
embed(embedder, query, intent: :query)

# For documents
embed(embedder, chunk, intent: :document)

Cache Embeddings

Never re-embed the same text. Arcana stores embeddings automatically:
# Embeddings stored in arcana_chunks table
# Only query text needs fresh embedding
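If you embed ad-hoc text outside the ingestion pipeline, a simple in-memory cache avoids repeat work. This is a generic sketch using an Agent keyed by the text's SHA-256 hash, not an Arcana feature:

```elixir
defmodule EmbedCache do
  # Start an Agent holding a map of text-hash => embedding
  def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Return the cached embedding, or compute and store it via embed_fun
  def fetch(text, embed_fun) do
    key = :crypto.hash(:sha256, text)

    case Agent.get(__MODULE__, &Map.get(&1, key)) do
      nil ->
        embedding = embed_fun.(text)
        Agent.update(__MODULE__, &Map.put(&1, key, embedding))
        embedding

      cached ->
        cached
    end
  end
end
```

On the second call with the same text, `embed_fun` is never invoked; the stored embedding is returned.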

Monitor Dimensions

Ensure dimensions match across ingestion and search:
# Migration checks dimensions
Embedder.dimensions(embedder)
# Must match for all chunks in collection
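A lightweight guard before insert can catch mismatches early. A sketch, where `expected_dims` would come from `Embedder.dimensions/1`:

```elixir
defmodule DimCheck do
  # Raise early if an embedding's length doesn't match the
  # collection's expected dimensions
  def validate!(embedding, expected_dims) when is_list(embedding) do
    actual = length(embedding)

    if actual != expected_dims do
      raise ArgumentError,
            "embedding has #{actual} dims, collection expects #{expected_dims}"
    end

    embedding
  end
end
```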

Handle Errors

Embedding can fail (API limits, model load):
case Embedder.embed(embedder, text) do
  {:ok, embedding} -> # proceed
  {:error, reason} -> # retry or log
end
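For transient failures such as API rate limits, a simple retry wrapper with backoff can help. This is a generic sketch, not part of Arcana; it takes any zero-arity function returning `{:ok, _}` or `{:error, _}`:

```elixir
defmodule EmbedRetry do
  # Retry a zero-arity fun up to `attempts` times, doubling the backoff
  def with_retry(fun, attempts \\ 3, backoff_ms \\ 200) do
    case fun.() do
      {:ok, _} = ok ->
        ok

      {:error, _reason} when attempts > 1 ->
        Process.sleep(backoff_ms)
        with_retry(fun, attempts - 1, backoff_ms * 2)

      {:error, _} = error ->
        error
    end
  end
end

# Usage: EmbedRetry.with_retry(fn -> Embedder.embed(embedder, text) end)
```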

Next Steps

Search Modes

Learn how to search with embeddings using semantic, full-text, and hybrid modes

Chunking Strategies

Optimize how documents are split before embedding

Evaluation

Measure and improve embedding quality with metrics

Getting Started

Set up your first embedding-powered application
