Embedding functions provide tools for normalizing and computing similarity between embedding vectors. All functions are available via fc.embedding.*.

normalize

Normalize embedding vectors to unit length.
fc.embedding.normalize(column: ColumnOrName) -> Column
column
ColumnOrName
required
Column containing embedding vectors.
return
Column
A column of normalized embedding vectors with the same embedding type.
  • Normalizes each embedding vector to have unit length (L2 norm = 1)
  • Preserves the original embedding model in the type
  • Null values are preserved as null
  • Zero vectors become NaN after normalization
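
The behavior listed above can be illustrated outside the library with plain NumPy. This is only a sketch of L2 normalization under the assumption that embeddings are rows of a 2-D array; `l2_normalize` is a hypothetical helper, not part of fc.embedding and not the library's implementation.

```python
import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (hypothetical helper for illustration)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # A zero vector has norm 0, so 0 / 0 produces NaN for that row,
    # which mirrors the "zero vectors become NaN" note.
    with np.errstate(divide="ignore", invalid="ignore"):
        return vectors / norms

vectors = np.array([[3.0, 4.0], [0.0, 0.0]])
unit = l2_normalize(vectors)
print(unit)
```

The first row has norm 5 and is scaled to unit length; the second row is a zero vector and comes out as NaN, so it may be worth filtering such rows before normalizing.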

Examples

df.select(
    fc.embedding.normalize(fc.col("embeddings")).alias("unit_embeddings")
)

compute_similarity

Compute similarity between embedding vectors using the specified metric.
fc.embedding.compute_similarity(
    column: ColumnOrName,
    other: Union[ColumnOrName, List[float], np.ndarray],
    metric: SemanticSimilarityMetric = "cosine"
) -> Column
column
ColumnOrName
required
Column containing embedding vectors.
other
Union[ColumnOrName, List[float], np.ndarray]
required
Either:
  • Another column containing embedding vectors for pairwise similarity
  • A query vector (list of floats or numpy array) for similarity with each embedding
metric
SemanticSimilarityMetric
default:"cosine"
The similarity metric to use:
  • cosine: Cosine similarity (range: -1 to 1, higher is more similar)
  • dot: Dot product similarity (raw inner product)
  • l2: L2 (Euclidean) distance (lower is more similar)
return
Column
A column of float values: similarity scores for cosine and dot, or distances for l2.
  • Cosine similarity normalizes vectors internally, so pre-normalization is not required
  • Dot product does not normalize, so its result depends on vector magnitude; it is most useful when vectors are already normalized
  • L2 distance measures the straight-line distance between vectors
  • When using two columns, dimensions must match between embeddings
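
To make the difference between the three metrics concrete, here is a minimal NumPy sketch of the underlying formulas for a query vector compared against each row of an array. The `similarity` function is hypothetical and written only for illustration; it is not the library's implementation, and the library may differ in details such as null handling.

```python
import numpy as np

def similarity(embeddings, query, metric="cosine"):
    """Compare each row of `embeddings` with `query` (hypothetical helper)."""
    embeddings = np.asarray(embeddings, dtype=float)
    query = np.asarray(query, dtype=float)
    if metric == "dot":
        # Raw inner product: sensitive to vector magnitude.
        return embeddings @ query
    if metric == "cosine":
        # Inner product divided by both norms: magnitude-independent.
        norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
        return (embeddings @ query) / norms
    if metric == "l2":
        # Euclidean distance: lower means more similar.
        return np.linalg.norm(embeddings - query, axis=1)
    raise ValueError(f"unknown metric: {metric}")

embeddings = np.array([[2.0, 0.0], [0.0, 2.0]])
query = [1.0, 0.0]
cosine_scores = similarity(embeddings, query, "cosine")
dot_scores = similarity(embeddings, query, "dot")
l2_distances = similarity(embeddings, query, "l2")
```

In this sketch the first row points in the same direction as the query but is twice as long: cosine reports a perfect match, dot reflects the extra length, and l2 reports a nonzero distance. On unit-length vectors, cosine and dot give the same values.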

Examples

query = [0.1, 0.2, 0.3]
df.select(
    fc.embedding.compute_similarity(
        fc.col("embeddings"),
        query
    ).alias("similarity")
)
