Embeddings are numerical representations of text that capture semantic meaning. In Flowise, embeddings power retrieval-augmented generation (RAG), semantic search, and document similarity features.

Overview

Embeddings transform text into high-dimensional vectors:

    Text: "How do I reset my password?"
            ↓ (Embedding Model)
    Vector: [0.234, -0.891, 0.432, ..., 0.123] (1536 dimensions)
These vectors enable:
  • Semantic Search: Find similar content by meaning, not just keywords
  • Document Retrieval: Retrieve relevant documents for RAG
  • Clustering: Group similar documents together
  • Similarity Comparison: Measure how similar two texts are
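Similarity between two embedding vectors is typically measured with cosine similarity; a minimal illustration on toy vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim "embeddings"
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # → 1.0 (identical)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # → 0.0 (orthogonal)
```

Semantic search ranks documents by this score against the embedded query.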

Supported Embedding Providers

Flowise supports multiple embedding providers:

OpenAI Embeddings

Industry-standard embeddings with excellent performance.
Available Models:
  • text-embedding-3-large: 3072 dimensions, best quality
  • text-embedding-3-small: 1536 dimensions, faster and cheaper
  • text-embedding-ada-002: 1536 dimensions, previous generation
Pricing:
  • text-embedding-3-large: $0.13 per 1M tokens
  • text-embedding-3-small: $0.02 per 1M tokens
  • text-embedding-ada-002: $0.10 per 1M tokens
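At these rates, embedding cost is easy to estimate from token count; a quick sketch using the prices listed above:

```python
# Price per 1M tokens, from the list above
PRICES = {
    "text-embedding-3-large": 0.13,
    "text-embedding-3-small": 0.02,
    "text-embedding-ada-002": 0.10,
}

def embedding_cost(tokens, model="text-embedding-3-small"):
    """Estimated USD cost of embedding `tokens` tokens with `model`."""
    return tokens / 1_000_000 * PRICES[model]

# Embedding a 10M-token corpus:
print(f"${embedding_cost(10_000_000, 'text-embedding-3-small'):.2f}")  # → $0.20
print(f"${embedding_cost(10_000_000, 'text-embedding-3-large'):.2f}")  # → $1.30
```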

Azure OpenAI Embeddings

OpenAI embeddings through Azure:
  • Same models as OpenAI
  • Enterprise security and compliance
  • Private network deployment
  • Regional data residency

Cohere Embeddings

Multilingual embeddings with strong performance.
Available Models:
  • embed-english-v3.0: English optimized
  • embed-multilingual-v3.0: 100+ languages
  • embed-english-light-v3.0: Faster, lower cost
Features:
  • Compression support
  • Input type specification (search_document, search_query)
  • Fine-tuning capabilities

HuggingFace Embeddings

Open-source embeddings for privacy and cost control.
Popular Models:
  • sentence-transformers/all-MiniLM-L6-v2: 384 dimensions, fast
  • sentence-transformers/all-mpnet-base-v2: 768 dimensions, high quality
  • intfloat/e5-large-v2: 1024 dimensions, state-of-the-art
Deployment Options:
  • HuggingFace Inference API (cloud)
  • Self-hosted (via HuggingFace Inference Endpoints)
  • Local execution

Google Vertex AI Embeddings

Google Cloud’s embedding service.
Available Models:
  • textembedding-gecko@003: 768 dimensions
  • textembedding-gecko-multilingual@001: Multilingual support
  • text-embedding-preview-0409: Preview models
Features:
  • Google Cloud integration
  • Enterprise-grade SLAs
  • Global deployment

Ollama Embeddings

Run embeddings locally with Ollama.
Available Models:
  • nomic-embed-text: 768 dimensions, high quality
  • mxbai-embed-large: 1024 dimensions
  • all-minilm: 384 dimensions, lightweight
Benefits:
  • Complete privacy (local execution)
  • No API costs
  • Offline capability
  • Custom model support

Additional Providers

  • Mistral AI: European AI with competitive pricing
  • Voyage AI: Optimized for retrieval tasks
  • Jina AI: Multimodal embeddings (text + images)
  • Together AI: Multiple open-source models
  • AWS Bedrock: Amazon’s embedding service
  • IBM Watsonx: Enterprise AI platform

Using Embeddings in Flowise

In Document Stores

Embeddings are essential for document stores:
1. Create Document Store: Navigate to Document Store and create a new store.
2. Add Documents: Upload documents using document loaders.
3. Configure Embeddings: In the Upsert Configuration screen:
  • Click Select Embeddings Provider
  • Choose your preferred provider
  • Configure credentials and settings
4. Select Vector Store: Choose where to store the embedded vectors.
5. Upsert Documents: Click Upsert to generate embeddings and store them.

In Chatflows

Add embeddings directly to your chatflow:
1. Drag an Embeddings node onto the canvas
2. Configure the embeddings provider
3. Connect to a Vector Store node
4. Connect document loaders to the vector store

    [PDF Loader] → [Text Splitter] → [Vector Store] ← [Embeddings]
                                           ↓
                                      [Retriever]
    

Configuring Embedding Providers

OpenAI Embeddings

    {
      "model": "text-embedding-3-small",
      "dimensions": 1536,  // Optional: reduce dimensions
      "batchSize": 512,    // Batch processing
      "stripNewLines": true,
      "timeout": 30000
    }
    
1. Add Credential:
  • Go to Credentials settings
  • Add OpenAI API credential
  • Enter your OpenAI API key
2. Select Model: Choose the embedding model:
  • Use text-embedding-3-small for most cases
  • Use text-embedding-3-large for best quality
  • Use text-embedding-ada-002 for compatibility
3. Configure Options: Optional parameters:
  • Dimensions: Reduce vector size (only for v3 models)
  • Batch Size: Number of texts to embed at once
  • Strip New Lines: Remove newlines before embedding

Cohere Embeddings

    {
      "model": "embed-english-v3.0",
      "inputType": "search_document",  // or "search_query"
      "truncate": "END",  // How to truncate long texts
      "embeddingTypes": ["float"]  // Can add "int8" for compression
    }
    

HuggingFace Embeddings

    {
      "model": "sentence-transformers/all-MiniLM-L6-v2",
      "endpointUrl": "https://api-inference.huggingface.co"
      // For self-hosted:
      // "endpointUrl": "http://localhost:8080"
    }

Ollama Embeddings

    {
      "model": "nomic-embed-text",
      "baseUrl": "http://localhost:11434"
    }

Make sure Ollama is running locally and the model is downloaded:

    ollama pull nomic-embed-text
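Once pulled, the model can be queried over Ollama's local REST API; a minimal Python sketch (assuming the default `/api/embeddings` endpoint at `localhost:11434` — this is for direct testing, Flowise's Ollama node makes these calls for you):

```python
import json
import urllib.request

def build_embedding_request(text, model="nomic-embed-text",
                            base_url="http://localhost:11434"):
    """Build the HTTP request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def embed(text):
    # Requires a running Ollama server with the model pulled
    with urllib.request.urlopen(build_embedding_request(text)) as resp:
        return json.load(resp)["embedding"]  # e.g. 768 floats for nomic-embed-text
```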
    

Embedding Dimensions

Different models produce different vector dimensions:

| Provider    | Model                  | Dimensions | Use Case         |
| ----------- | ---------------------- | ---------- | ---------------- |
| OpenAI      | text-embedding-3-small | 1536       | General purpose  |
| OpenAI      | text-embedding-3-large | 3072       | High accuracy    |
| Cohere      | embed-english-v3.0     | 1024       | English content  |
| HuggingFace | all-MiniLM-L6-v2       | 384        | Fast retrieval   |
| HuggingFace | all-mpnet-base-v2      | 768        | Quality balance  |
| Ollama      | nomic-embed-text       | 768        | Local deployment |
| Google      | textembedding-gecko    | 768        | GCP integration  |

Dimension Reduction

Some providers support dimension reduction:

    // OpenAI text-embedding-3-small (normally 1536 dims)
    {
      "model": "text-embedding-3-small",
      "dimensions": 512  // Reduce to 512 dims
    }

Benefits:
  • Lower storage costs
  • Faster similarity search
  • Reduced memory usage
Trade-offs:
  • Slightly lower accuracy
  • Cannot increase later
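Conceptually, reduced-dimension v3 embeddings behave like the full vector truncated and re-normalized. A sketch of that idea (an illustration of the concept, not the provider's exact implementation):

```python
import math

def reduce_dimensions(vector, target_dims):
    """Truncate an embedding to target_dims and L2-normalize it (illustrative)."""
    truncated = vector[:target_dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1]   # pretend 6-dim embedding
small = reduce_dimensions(full, 4)       # 4 dims, unit length
```

This is why the reduction is one-way: the discarded components cannot be recovered from the shorter vector.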

Best Practices

Choosing an Embedding Model

Consider these factors:
For Production:
  • OpenAI text-embedding-3-small: Best balance of cost/performance
  • Cohere embed-english-v3.0: Great for English content
  • Azure OpenAI: Enterprise requirements
For Development:
  • HuggingFace models: No API costs
  • Ollama: Local testing
For Privacy:
  • Ollama: Complete data privacy
  • Self-hosted HuggingFace: Control your infrastructure
For Multilingual:
  • Cohere embed-multilingual-v3.0: 100+ languages
  • Google Vertex AI: Strong multilingual support

Input Optimization

Text Preprocessing:

    import unicodedata

    # Clean text before embedding
    def prepare_text(text):
        # Normalize unicode (full-width characters, combining accents, etc.)
        text = unicodedata.normalize("NFKC", text)
        # Collapse excessive whitespace
        text = " ".join(text.split())
        return text

Chunking Strategy:

    {
      "chunkSize": 512,      // Match model's context window
      "chunkOverlap": 50,    // Preserve context
      "separator": "\n\n"    // Split on paragraphs
    }
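The chunking parameters above amount to a sliding window; a minimal sketch (a hypothetical character-based helper for clarity — Flowise's text splitters work on separators and token counts):

```python
def chunk_text(text, chunk_size=512, chunk_overlap=50):
    """Split text into overlapping chunks (character-based illustration)."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end
    return chunks

# chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
# → ["abcd", "defg", "ghij"]
```

The overlap means each chunk repeats the tail of the previous one, so a sentence split at a boundary still appears whole in at least one chunk.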
    

Cost Optimization

Batch Processing: Embed multiple texts at once:

    {
      "batchSize": 100  // Process 100 texts per API call
    }
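Batching is just grouping texts so each API call carries many of them; a sketch (hypothetical helper — the embeddings node handles this internally):

```python
def batched(texts, batch_size=100):
    """Yield successive batches of texts, one API call per batch."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

docs = [f"doc {i}" for i in range(250)]
sizes = [len(b) for b in batched(docs, 100)]  # → [100, 100, 50]
```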
    
Caching: Avoid re-embedding the same text:

    import hashlib

    def get_embedding(text, cache):
        # embed() stands in for your provider's embedding call
        text_hash = hashlib.md5(text.encode()).hexdigest()

        if text_hash in cache:
            return cache[text_hash]

        embedding = embed(text)
        cache[text_hash] = embedding
        return embedding

Model Selection: Balance cost vs. quality:

    High Volume:
    ├─ OpenAI text-embedding-3-small ($0.02/1M tokens)
    ├─ HuggingFace (self-hosted, free)
    └─ Ollama (local, free)

    High Quality:
    ├─ OpenAI text-embedding-3-large ($0.13/1M tokens)
    ├─ Cohere embed-english-v3.0
    └─ Voyage AI
    

Embedding Quality

Testing Embeddings

Test semantic similarity:

    from sklearn.metrics.pairwise import cosine_similarity

    # get_embedding() stands in for your provider's embedding call
    query1 = "How to reset password?"
    query2 = "Forgot my password"
    query3 = "Weather forecast"

    emb1 = get_embedding(query1)
    emb2 = get_embedding(query2)
    emb3 = get_embedding(query3)

    # Calculate similarity
    sim_1_2 = cosine_similarity([emb1], [emb2])[0][0]
    sim_1_3 = cosine_similarity([emb1], [emb3])[0][0]

    print(f"Password queries similarity: {sim_1_2:.3f}")  # Should be high
    print(f"Unrelated similarity: {sim_1_3:.3f}")         # Should be low
    

Evaluation Metrics

Retrieval Quality:
  • Precision@K: Relevant docs in top K results
  • Recall@K: Proportion of relevant docs retrieved
  • MRR: Mean reciprocal rank of first relevant result
  • NDCG: Normalized discounted cumulative gain
Monitor Performance:

    # retrieve() stands in for your vector store's search call
    def evaluate_retrieval(queries, expected_docs, k=5):
        precision_sum = 0

        for query, expected in zip(queries, expected_docs):
            results = retrieve(query, k=k)
            relevant = len([r for r in results if r in expected])
            precision_sum += relevant / k

        return precision_sum / len(queries)
    

Troubleshooting

Embeddings Fail to Generate

  • Verify API key is valid
  • Check rate limits
  • Ensure text is not empty
  • Review text length (max tokens)

Poor Retrieval Quality

  • Try different embedding models
  • Adjust chunk size and overlap
  • Improve text preprocessing
  • Add metadata for filtering

High Costs

  • Switch to smaller models
  • Enable caching
  • Increase batch size
  • Consider self-hosted options

Dimension Mismatch

    Error: Vector dimensions don't match

  • Ensure all vectors use same model
  • Verify dimension configuration
  • Recreate vector store if changed
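A quick guard before upserting catches this class of error early; a minimal sketch (hypothetical helper, names are illustrative):

```python
def check_dimensions(vectors, expected_dims):
    """Raise early if any embedding doesn't match the vector store's dimensions."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dims:
            raise ValueError(
                f"Vector {i} has {len(vec)} dims, expected {expected_dims} "
                "- was it embedded with a different model?"
            )

check_dimensions([[0.1] * 1536, [0.2] * 1536], expected_dims=1536)  # OK
```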
