Introduction
Embeddings are a powerful way to represent data as dense vectors that capture semantic meaning. With the rise of Large Language Models (LLMs), embeddings have become essential for building intelligent applications that understand the meaning and context of text, images, and videos.

What are Embeddings?
In traditional IT systems, most data is organized as structured or tabular data, using simple keywords, labels, and categories in databases and search engines. In contrast, AI-powered services arrange data into a simple data structure known as “embeddings.” Embeddings are vector representations that capture the semantic meaning of content. Similar content has embeddings that are close together in the embedding space.
How Embeddings Work
Let’s take an example where a text discusses movies, music, and actors, with a distribution of 10%, 2%, and 30%, respectively. The AI can create an embedding with three values, 0.1, 0.02, and 0.3, in 3-dimensional space, and place content with similar meanings close together in that space. This is how Google organizes data across services like Google Search, YouTube, and Play to provide search results and recommendations with relevant content.

Use Cases
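The closeness of meanings can be made concrete with a toy computation. The vectors below reuse the illustrative topic shares from the example above (they are not the output of any real embedding model), compared with cosine similarity:

```python
import math

# Each toy "embedding" holds the share of a text devoted to
# movies, music, and actors — illustrative numbers only.
doc_a = [0.10, 0.02, 0.30]   # the text from the example
doc_b = [0.12, 0.01, 0.28]   # another text with a similar topic mix
doc_c = [0.01, 0.40, 0.02]   # a text that is mostly about music

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity(doc_a, doc_b))  # close to 1.0: similar meaning
print(cosine_similarity(doc_a, doc_c))  # much lower: different topic
```

Texts with similar topic mixes end up pointing in nearly the same direction, which is exactly what “close together in the embedding space” means.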
Embeddings can be used to represent different types of business data:

Semantic Search
Find content based on meaning rather than just keywords. Enable natural language queries to find relevant documents, products, or media.
Recommendations
Build recommendation systems that understand user preferences and content similarity to suggest relevant items.
Classification
Classify texts with semantic understanding for use cases like customer segmentation and content categorization.
Question Answering
Ground LLM outputs with relevant business data through Retrieval-Augmented Generation (RAG).
Embedding Models on Vertex AI
Vertex AI provides several embedding models for different use cases:

Text Embeddings
text-embedding-005
Latest model with 768 dimensions, supporting task types for optimized retrieval, classification, and clustering.
Multimodal Embeddings
The multimodalembedding model generates embeddings for:
- Text: Contextual text understanding
- Images: Visual content representation
- Video: Temporal video segment embeddings
Vector Search
Once you have embeddings, you need a fast way to find similar items. Vertex AI Vector Search (formerly Matching Engine) provides:

- Blazingly fast: Millisecond-level search across billions of vectors
- ScaNN algorithm: Google’s state-of-the-art Approximate Nearest Neighbor (ANN) algorithm
- Fully managed: No infrastructure management required
- Hybrid search: Combine semantic and keyword-based search
- Autoscaling: Automatically resize based on workload demands
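To illustrate the hybrid search idea, one common way to combine a semantic ranking with a keyword ranking is Reciprocal Rank Fusion (RRF). This is a generic sketch of the technique, not the internal Vector Search implementation:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    `rankings` is a list of ranked ID lists (best first); `k` damps the
    influence of any single list (60 is the value from the original RRF paper).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d2"]   # order from embedding-based search
keyword  = ["d1", "d4", "d3"]   # order from keyword-based search
print(reciprocal_rank_fusion([semantic, keyword]))
```

Documents that rank well in both lists (here `d1`) rise to the top, which is the point of combining semantic and keyword signals.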
How Vector Search Works
Vector Search uses Approximate Nearest Neighbor (ANN) techniques to quickly find similar embeddings:

- Indexing: Organize embeddings into an efficient tree structure
- Querying: Find nearest neighbors in milliseconds
- Ranking: Return top-k most similar items
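The querying and ranking steps can be sketched with an exact brute-force search — the result that ANN indexes such as ScaNN approximate without scanning every vector:

```python
import heapq
import math

def top_k_nearest(query, index, k=2):
    """Exact (brute-force) top-k search by cosine similarity.

    ANN indexes avoid scoring every vector, but the contract is the
    same: return the k items most similar to the query.
    """
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    scored = ((cos(query, vec), doc_id) for doc_id, vec in index.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

# Toy index of id -> embedding (illustrative vectors).
index = {
    "doc_movies": [0.9, 0.1, 0.0],
    "doc_music":  [0.1, 0.9, 0.0],
    "doc_actors": [0.8, 0.0, 0.2],
}
print(top_k_nearest([1.0, 0.0, 0.1], index, k=2))
```

This linear scan is fine for a toy index; at billions of vectors it is exactly what ANN indexing exists to avoid.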
Key Concepts
Embedding Dimensions
Vertex AI embedding models support various dimensions:

- 128, 256, 512: Smaller dimensions for faster processing
- 768: Default for text-embedding-005
- 1408: Maximum for multimodal embeddings
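The trade-off behind dimension choice is easy to quantify: at float32 (4 bytes per value), storage grows linearly with dimensions. A quick back-of-envelope sketch:

```python
def bytes_per_vector(dims, bytes_per_value=4):
    """Raw storage for one embedding at float32 (4 bytes per value)."""
    return dims * bytes_per_value

# More dimensions can capture more nuance, but every vector costs more
# to store and compare — which matters at millions or billions of items.
for dims in (128, 256, 512, 768, 1408):
    per_million_gb = bytes_per_vector(dims) * 1_000_000 / 1e9
    print(f"{dims:>5} dims: {bytes_per_vector(dims)} bytes/vector, "
          f"~{per_million_gb:.2f} GB per million vectors")
```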
Distance Metrics
- Dot Product: Used by Vertex AI text embedding models. Higher values indicate more similarity.
- Cosine Similarity: Higher values indicate more similarity; equivalent to dot product for normalized (unit-length) vectors.
- Euclidean Distance: Lower values indicate more similarity.
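The behavior of the three metrics can be checked on toy vectors. Note the direction of each scale: for dot product and cosine similarity, higher means more similar; for Euclidean distance, lower means more similar:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# b points almost the same way as a; c is orthogonal to a.
a, b, c = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]

print(cosine(a, b), cosine(a, c))        # higher = more similar
print(euclidean(a, b), euclidean(a, c))  # lower = more similar
print(dot(a, b), dot(a, c))              # higher = more similar
```

All three metrics agree here that `b` is closer to `a` than `c` is; they can disagree when vectors have very different magnitudes, which is why normalization matters.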
Architecture Patterns
RAG (Retrieval-Augmented Generation)
Embed the user’s question, retrieve the most relevant business documents with Vector Search, and pass them to the LLM as grounding context for its answer.
Semantic Search
Embed both queries and content with the same model, then return the items whose embeddings are nearest to the query embedding.
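A minimal sketch of the RAG pattern, using a toy word-count “embedding” in place of a real embedding model and a prompt string in place of the LLM call (all names here are illustrative):

```python
# Toy vocabulary: a real system would use a learned embedding model,
# not word counts. This only illustrates the retrieve-then-generate flow.
VOCAB = ["refund", "shipping", "warranty", "login"]

def embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def retrieve(query, corpus, k=1):
    """Return the k corpus documents most similar to the query."""
    q = embed(query)
    def score(doc):
        d = embed(doc)
        return sum(a * b for a, b in zip(q, d))
    return sorted(corpus, key=score, reverse=True)[:k]

corpus = [
    "Refund requests are processed within 5 business days.",
    "Shipping takes 3 to 7 days depending on region.",
]
context = retrieve("How long does a refund take?", corpus)
prompt = f"Answer using only this context: {context[0]}"
# `prompt` would then be sent to the LLM, grounding its answer
# in the retrieved business data.
```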
Getting Started
Ready to build with embeddings and vector search? Explore these topics:

Text Embeddings
Learn how to generate and use text embeddings with task types
Multimodal Embeddings
Work with image and video embeddings
Vector Search
Set up and query Vector Search indexes
Hybrid Search
Combine semantic and keyword search for better results
Pricing
Vertex AI Embeddings and Vector Search have separate pricing:

- Embeddings API: Charged per 1,000 characters of input text
- Vector Search: Charged based on node hours and queries