Embeddings are numerical representations of text that capture semantic meaning. In Flowise, embeddings power retrieval-augmented generation (RAG), semantic search, and document similarity features.

Overview

Embeddings transform text into high-dimensional vectors:

    Text: "How do I reset my password?"
            ↓ (Embedding Model)
    Vector: [0.234, -0.891, 0.432, ..., 0.123] (1536 dimensions)
These vectors enable:
  • Semantic Search: Find similar content by meaning, not just keywords
  • Document Retrieval: Retrieve relevant documents for RAG
  • Clustering: Group similar documents together
  • Similarity Comparison: Measure how similar two texts are
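Similarity between two embedding vectors is typically measured with cosine similarity; a minimal illustration on toy vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim "embeddings"
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # → 1.0 (identical)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # → 0.0 (orthogonal)
```

Semantic search ranks documents by this score against the embedded query.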

Supported Embedding Providers

Flowise supports multiple embedding providers:

OpenAI Embeddings

Industry-standard embeddings with excellent performance.
Available Models:
  • text-embedding-3-large: 3072 dimensions, best quality
  • text-embedding-3-small: 1536 dimensions, faster and cheaper
  • text-embedding-ada-002: 1536 dimensions, previous generation
Pricing:
  • text-embedding-3-large: $0.13 per 1M tokens
  • text-embedding-3-small: $0.02 per 1M tokens
  • text-embedding-ada-002: $0.10 per 1M tokens
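At these rates, embedding cost is easy to estimate from token count; a quick sketch using the prices listed above:

```python
# Price per 1M tokens, from the list above
PRICES = {
    "text-embedding-3-large": 0.13,
    "text-embedding-3-small": 0.02,
    "text-embedding-ada-002": 0.10,
}

def embedding_cost(tokens, model="text-embedding-3-small"):
    """Estimated USD cost of embedding `tokens` tokens with `model`."""
    return tokens / 1_000_000 * PRICES[model]

# Embedding a 10M-token corpus:
print(f"${embedding_cost(10_000_000, 'text-embedding-3-small'):.2f}")  # → $0.20
print(f"${embedding_cost(10_000_000, 'text-embedding-3-large'):.2f}")  # → $1.30
```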

Azure OpenAI Embeddings

OpenAI embeddings through Azure:
  • Same models as OpenAI
  • Enterprise security and compliance
  • Private network deployment
  • Regional data residency

Cohere Embeddings

Multilingual embeddings with strong performance.
Available Models:
  • embed-english-v3.0: English optimized
  • embed-multilingual-v3.0: 100+ languages
  • embed-english-light-v3.0: Faster, lower cost
Features:
  • Compression support
  • Input type specification (search_document, search_query)
  • Fine-tuning capabilities

HuggingFace Embeddings

Open-source embeddings for privacy and cost control.
Popular Models:
  • sentence-transformers/all-MiniLM-L6-v2: 384 dimensions, fast
  • sentence-transformers/all-mpnet-base-v2: 768 dimensions, high quality
  • intfloat/e5-large-v2: 1024 dimensions, state-of-the-art
Deployment Options:
  • HuggingFace Inference API (cloud)
  • Self-hosted (via HuggingFace Inference Endpoints)
  • Local execution

Google Vertex AI Embeddings

Google Cloud’s embedding service.
Available Models:
  • textembedding-gecko@003: 768 dimensions
  • textembedding-gecko-multilingual@001: Multilingual support
  • text-embedding-preview-0409: Preview models
Features:
  • Google Cloud integration
  • Enterprise-grade SLAs
  • Global deployment

Ollama Embeddings

Run embeddings locally with Ollama.
Available Models:
  • nomic-embed-text: 768 dimensions, high quality
  • mxbai-embed-large: 1024 dimensions
  • all-minilm: 384 dimensions, lightweight
Benefits:
  • Complete privacy (local execution)
  • No API costs
  • Offline capability
  • Custom model support

Additional Providers

  • Mistral AI: European AI with competitive pricing
  • Voyage AI: Optimized for retrieval tasks
  • Jina AI: Multimodal embeddings (text + images)
  • Together AI: Multiple open-source models
  • AWS Bedrock: Amazon’s embedding service
  • IBM Watsonx: Enterprise AI platform

Using Embeddings in Flowise

In Document Stores

Embeddings are essential for document stores:
1. Create Document Store: Navigate to Document Store and create a new store.
2. Add Documents: Upload documents using document loaders.
3. Configure Embeddings: In the Upsert Configuration screen:
  • Click Select Embeddings Provider
  • Choose your preferred provider
  • Configure credentials and settings
4. Select Vector Store: Choose where to store the embedded vectors.
5. Upsert Documents: Click Upsert to generate embeddings and store them.

In Chatflows

Add embeddings directly to your chatflow:
1. Drag an Embeddings node onto the canvas
2. Configure the embeddings provider
3. Connect to a Vector Store node
4. Connect document loaders to the vector store

    [PDF Loader] → [Text Splitter] → [Vector Store] ← [Embeddings]
                                           ↓
                                      [Retriever]
    

Configuring Embedding Providers

OpenAI Embeddings

    {
      "model": "text-embedding-3-small",
      "dimensions": 1536,  // Optional: reduce dimensions
      "batchSize": 512,    // Batch processing
      "stripNewLines": true,
      "timeout": 30000
    }
    
1. Add Credential:
  • Go to Credentials settings
  • Add OpenAI API credential
  • Enter your OpenAI API key
2. Select Model: Choose the embedding model:
  • Use text-embedding-3-small for most cases
  • Use text-embedding-3-large for best quality
  • Use text-embedding-ada-002 for compatibility
3. Configure Options: Optional parameters:
  • Dimensions: Reduce vector size (only for v3 models)
  • Batch Size: Number of texts to embed at once
  • Strip New Lines: Remove newlines before embedding

Cohere Embeddings

    {
      "model": "embed-english-v3.0",
      "inputType": "search_document",  // or "search_query"
      "truncate": "END",  // How to truncate long texts
      "embeddingTypes": ["float"]  // Can add "int8" for compression
    }
    

HuggingFace Embeddings

    {
      "model": "sentence-transformers/all-MiniLM-L6-v2",
      "endpointUrl": "https://api-inference.huggingface.co"
      // For self-hosted:
      // "endpointUrl": "http://localhost:8080"
    }

Ollama Embeddings

    {
      "model": "nomic-embed-text",
      "baseUrl": "http://localhost:11434"
    }

Make sure Ollama is running locally and the model is downloaded:

    ollama pull nomic-embed-text
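Once pulled, the model can be queried over Ollama's local REST API; a minimal Python sketch (assuming the default `/api/embeddings` endpoint at `localhost:11434` — this is for direct testing, Flowise's Ollama node makes these calls for you):

```python
import json
import urllib.request

def build_embedding_request(text, model="nomic-embed-text",
                            base_url="http://localhost:11434"):
    """Build the HTTP request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def embed(text):
    # Requires a running Ollama server with the model pulled
    with urllib.request.urlopen(build_embedding_request(text)) as resp:
        return json.load(resp)["embedding"]  # e.g. 768 floats for nomic-embed-text
```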
    

Embedding Dimensions

Different models produce different vector dimensions:

| Provider    | Model                  | Dimensions | Use Case         |
| ----------- | ---------------------- | ---------- | ---------------- |
| OpenAI      | text-embedding-3-small | 1536       | General purpose  |
| OpenAI      | text-embedding-3-large | 3072       | High accuracy    |
| Cohere      | embed-english-v3.0     | 1024       | English content  |
| HuggingFace | all-MiniLM-L6-v2       | 384        | Fast retrieval   |
| HuggingFace | all-mpnet-base-v2      | 768        | Quality balance  |
| Ollama      | nomic-embed-text       | 768        | Local deployment |
| Google      | textembedding-gecko    | 768        | GCP integration  |

Dimension Reduction

Some providers support dimension reduction:

    // OpenAI text-embedding-3-small (normally 1536 dims)
    {
      "model": "text-embedding-3-small",
      "dimensions": 512  // Reduce to 512 dims
    }

Benefits:
  • Lower storage costs
  • Faster similarity search
  • Reduced memory usage
Trade-offs:
  • Slightly lower accuracy
  • Cannot increase later
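Conceptually, reduced-dimension v3 embeddings behave like the full vector truncated and re-normalized. A sketch of that idea (an illustration of the concept, not the provider's exact implementation):

```python
import math

def reduce_dimensions(vector, target_dims):
    """Truncate an embedding to target_dims and L2-normalize it (illustrative)."""
    truncated = vector[:target_dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1]   # pretend 6-dim embedding
small = reduce_dimensions(full, 4)       # 4 dims, unit length
```

This is why the reduction is one-way: the discarded components cannot be recovered from the shorter vector.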

Best Practices

Choosing an Embedding Model

Consider these factors:
For Production:
  • OpenAI text-embedding-3-small: Best balance of cost/performance
  • Cohere embed-english-v3.0: Great for English content
  • Azure OpenAI: Enterprise requirements
For Development:
  • HuggingFace models: No API costs
  • Ollama: Local testing
For Privacy:
  • Ollama: Complete data privacy
  • Self-hosted HuggingFace: Control your infrastructure
For Multilingual:
  • Cohere embed-multilingual-v3.0: 100+ languages
  • Google Vertex AI: Strong multilingual support

Input Optimization

Text Preprocessing:

    import unicodedata

    # Clean text before embedding
    def prepare_text(text):
        # Normalize unicode (full-width characters, combining accents, etc.)
        text = unicodedata.normalize("NFKC", text)
        # Collapse excessive whitespace
        text = " ".join(text.split())
        return text

Chunking Strategy:

    {
      "chunkSize": 512,      // Match model's context window
      "chunkOverlap": 50,    // Preserve context
      "separator": "\n\n"    // Split on paragraphs
    }
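The chunking parameters above amount to a sliding window; a minimal sketch (a hypothetical character-based helper for clarity — Flowise's text splitters work on separators and token counts):

```python
def chunk_text(text, chunk_size=512, chunk_overlap=50):
    """Split text into overlapping chunks (character-based illustration)."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end
    return chunks

# chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
# → ["abcd", "defg", "ghij"]
```

The overlap means each chunk repeats the tail of the previous one, so a sentence split at a boundary still appears whole in at least one chunk.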
    

Cost Optimization

Batch Processing: Embed multiple texts at once:

    {
      "batchSize": 100  // Process 100 texts per API call
    }
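Batching is just grouping texts so each API call carries many of them; a sketch (hypothetical helper — the embeddings node handles this internally):

```python
def batched(texts, batch_size=100):
    """Yield successive batches of texts, one API call per batch."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

docs = [f"doc {i}" for i in range(250)]
sizes = [len(b) for b in batched(docs, 100)]  # → [100, 100, 50]
```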
    
Caching: Avoid re-embedding the same text:

    import hashlib

    def get_embedding(text, cache):
        # embed() stands in for your provider's embedding call
        text_hash = hashlib.md5(text.encode()).hexdigest()

        if text_hash in cache:
            return cache[text_hash]

        embedding = embed(text)
        cache[text_hash] = embedding
        return embedding

Model Selection: Balance cost vs. quality:

    High Volume:
    ├─ OpenAI text-embedding-3-small ($0.02/1M tokens)
    ├─ HuggingFace (self-hosted, free)
    └─ Ollama (local, free)

    High Quality:
    ├─ OpenAI text-embedding-3-large ($0.13/1M tokens)
    ├─ Cohere embed-english-v3.0
    └─ Voyage AI
    

Embedding Quality

Testing Embeddings

Test semantic similarity:

    from sklearn.metrics.pairwise import cosine_similarity

    # get_embedding() stands in for your provider's embedding call
    query1 = "How to reset password?"
    query2 = "Forgot my password"
    query3 = "Weather forecast"

    emb1 = get_embedding(query1)
    emb2 = get_embedding(query2)
    emb3 = get_embedding(query3)

    # Calculate similarity
    sim_1_2 = cosine_similarity([emb1], [emb2])[0][0]
    sim_1_3 = cosine_similarity([emb1], [emb3])[0][0]

    print(f"Password queries similarity: {sim_1_2:.3f}")  # Should be high
    print(f"Unrelated similarity: {sim_1_3:.3f}")         # Should be low
    

Evaluation Metrics

Retrieval Quality:
  • Precision@K: Relevant docs in top K results
  • Recall@K: Proportion of relevant docs retrieved
  • MRR: Mean reciprocal rank of first relevant result
  • NDCG: Normalized discounted cumulative gain
Monitor Performance:

    # retrieve() stands in for your vector store's search call
    def evaluate_retrieval(queries, expected_docs, k=5):
        precision_sum = 0

        for query, expected in zip(queries, expected_docs):
            results = retrieve(query, k=k)
            relevant = len([r for r in results if r in expected])
            precision_sum += relevant / k

        return precision_sum / len(queries)
    

Troubleshooting

Embeddings Fail to Generate

  • Verify API key is valid
  • Check rate limits
  • Ensure text is not empty
  • Review text length (max tokens)

Poor Retrieval Quality

  • Try different embedding models
  • Adjust chunk size and overlap
  • Improve text preprocessing
  • Add metadata for filtering

High Costs

  • Switch to smaller models
  • Enable caching
  • Increase batch size
  • Consider self-hosted options

Dimension Mismatch

    Error: Vector dimensions don't match

  • Ensure all vectors use same model
  • Verify dimension configuration
  • Recreate vector store if changed
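A quick guard before upserting catches this class of error early; a minimal sketch (hypothetical helper, names are illustrative):

```python
def check_dimensions(vectors, expected_dims):
    """Raise early if any embedding doesn't match the vector store's dimensions."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dims:
            raise ValueError(
                f"Vector {i} has {len(vec)} dims, expected {expected_dims} "
                "- was it embedded with a different model?"
            )

check_dimensions([[0.1] * 1536, [0.2] * 1536], expected_dims=1536)  # OK
```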
