Overview

Vertex AI Text Embeddings API allows you to generate embeddings from text using state-of-the-art language models. The API supports task-specific optimization, enabling you to generate embeddings tailored for specific use cases like question answering, document retrieval, or classification.

Supported Models

text-embedding-005

The latest English text embedding model, producing 768-dimensional vectors by default. Features:
  • Task type optimization
  • Improved semantic understanding
  • Support for up to 2048 input tokens
  • Multiple output dimensions: 128, 256, 512, 768

Generate Text Embeddings

Installation

pip install --upgrade google-genai

Basic Usage

from google import genai

# Initialize client
client = genai.Client(
    vertexai=True,
    project="your-project-id",
    location="us-central1"
)

# Generate embeddings
response = client.models.embed_content(
    model="text-embedding-005",
    contents=["What are embeddings?"]
)

embedding = response.embeddings[0].values
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Task Types

Task types optimize embeddings for specific use cases, significantly improving search quality and performance.
Vertex AI supports the following task types:
| Task Type            | Use Case            | Query Type       | Document Type      |
|----------------------|---------------------|------------------|--------------------|
| RETRIEVAL_QUERY      | Document search     | Query text       | RETRIEVAL_DOCUMENT |
| QUESTION_ANSWERING   | Q&A systems         | Question         | RETRIEVAL_DOCUMENT |
| SEMANTIC_SIMILARITY  | Similarity search   | Any text         | Any text           |
| CLASSIFICATION       | Text classification | Text to classify | N/A                |
| CLUSTERING           | Text clustering     | Text to cluster  | N/A                |
| FACT_VERIFICATION    | Fact checking       | Statement        | RETRIEVAL_DOCUMENT |
| CODE_RETRIEVAL_QUERY | Code search         | Query            | Code snippets      |

Using Task Types

from google import genai
from google.genai.types import EmbedContentConfig

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

# Generate question embedding
question = "Why is the sky blue?"
question_emb = client.models.embed_content(
    model="text-embedding-005",
    contents=[question],
    config=EmbedContentConfig(task_type="QUESTION_ANSWERING")
).embeddings[0].values

# Generate answer embeddings
answers = [
    "The sky is blue today",
    "The scattering of sunlight causes the blue color"
]
answer_embs = client.models.embed_content(
    model="text-embedding-005",
    contents=answers,
    config=EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT")
)

# Calculate similarities via dot product (equivalent to cosine similarity
# when embeddings are unit-normalized, as full-dimension embeddings are)
import numpy as np

for i, answer in enumerate(answers):
    similarity = np.dot(question_emb, answer_embs.embeddings[i].values)
    print(f"{answer}: {similarity:.4f}")

Semantic Similarity

Calculate similarity between texts using embeddings:
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from google import genai

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

# Sample texts
texts = [
    "i really enjoyed the movie last night",
    "so many amazing cinematic scenes yesterday",
    "had a great time writing my Python scripts a few days ago",
    "huge sense of relief when my .py script finally ran without error"
]

# Generate embeddings in a single batched call (contents accepts a list)
response = client.models.embed_content(
    model="text-embedding-005",
    contents=texts
)
embeddings = [e.values for e in response.embeddings]

# Calculate cosine similarity matrix
similarity_matrix = cosine_similarity(embeddings)

# Display as DataFrame
df = pd.DataFrame(similarity_matrix, index=texts, columns=texts)
print(df)

Output Dimensions

You can specify different output dimensions for optimization:
from google.genai.types import EmbedContentConfig

# Request 256-dimensional embeddings instead of the default 768
response = client.models.embed_content(
    model="text-embedding-005",
    contents=["Sample text"],
    config=EmbedContentConfig(output_dimensionality=256)
)
# Returns 256-dimensional embeddings
Smaller dimensions (128, 256) are faster to process and require less storage, while larger dimensions (512, 768) capture more semantic nuance.
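
If you truncate embeddings with output_dimensionality, the returned vectors are generally no longer unit-normalized, so re-normalize them before using dot products as a similarity measure. A short sketch:

import numpy as np

emb = np.array(response.embeddings[0].values)
emb = emb / np.linalg.norm(emb)  # rescale to unit length so dot products behave like cosine similarity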

Working with DataFrames

Integrate embeddings into your pandas workflow:
import pandas as pd
from google import genai

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

# Create DataFrame
df = pd.DataFrame({
    'text': [
        "Customer service inquiry",
        "Technical support request",
        "Billing question"
    ]
})

# Generate embeddings
def get_embedding(text):
    response = client.models.embed_content(
        model="text-embedding-005",
        contents=[text]
    )
    return response.embeddings[0].values

df['embedding'] = df['text'].apply(get_embedding)
print(df.head())
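
Most downstream libraries expect a two-dimensional array rather than a column of lists. Stacking the column is a one-liner (the shape shown assumes the default 768 dimensions):

import numpy as np

# Stack the embedding column into an (n_rows, n_dims) matrix for scikit-learn, FAISS, etc.
matrix = np.vstack(df['embedding'].to_numpy())
print(matrix.shape)  # (3, 768)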

Rate Limits and Quotas

The text embeddings API has the following default quotas:
  • Requests per minute: 60 for new projects, 600 for projects with usage history
  • Batch size: Up to 5 texts per request

Rate Limiting

import time
from google import genai

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

def get_embeddings_with_rate_limit(texts: list[str], batch_size: int = 5):
    embeddings = []
    
    for i in range(0, len(texts), batch_size):
        time.sleep(1)  # Limit to 1 request per second
        batch = texts[i:i + batch_size]
        response = client.models.embed_content(
            model="text-embedding-005",
            contents=batch
        )
        embeddings.extend([e.values for e in response.embeddings])
    
    return embeddings
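
For example, twelve inputs are fetched in three batched requests of five, five, and two (the texts below are illustrative):

texts = [f"Sample document {i}" for i in range(12)]  # illustrative inputs
embeddings = get_embeddings_with_rate_limit(texts)
print(len(embeddings))  # 12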

Tuning Text Embeddings

For domain-specific applications, you can fine-tune the embedding model:
1. Prepare Training Data

Create query-document pairs in JSONL format:
{"query": "How to reset password?", "document": "Navigate to Settings > Security..."}
2. Upload to Cloud Storage

gsutil cp training_data.jsonl gs://your-bucket/embeddings/
3. Start Tuning Job

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

tuning_job = aiplatform.TextEmbeddingTuningJob.create(
    base_model="text-embedding-005",
    training_data_uri="gs://your-bucket/embeddings/training_data.jsonl"
)
Learn more in the Embeddings Tuning Guide.

Best Practices

Choose the Right Task Type

Use QUESTION_ANSWERING for Q&A, RETRIEVAL_QUERY for search, and SEMANTIC_SIMILARITY for general similarity.

Batch Your Requests

Process up to 5 texts per API call to reduce latency and costs.

Cache Embeddings

Store generated embeddings to avoid regenerating them for the same text.
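
A minimal in-memory cache keyed by the exact input text, assuming a configured client as in the examples above (a sketch; a production system would persist the cache or use a vector database):

# In-memory embedding cache keyed by the input text (sketch)
_cache: dict[str, list[float]] = {}

def get_embedding_cached(text: str) -> list[float]:
    if text not in _cache:
        _cache[text] = client.models.embed_content(
            model="text-embedding-005",
            contents=[text]
        ).embeddings[0].values
    return _cache[text]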

Monitor Token Usage

The API accepts up to 2048 tokens per input. Truncate longer texts appropriately.
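
Tokenization is model-specific, so a rough character budget is a common safeguard; the 4-characters-per-token figure below is a heuristic, not an exact limit:

MAX_TOKENS = 2048
CHARS_PER_TOKEN = 4  # rough heuristic; actual tokenization varies by text

def truncate_text(text: str) -> str:
    # Keep the input under a conservative character budget for the 2048-token limit
    return text[:MAX_TOKENS * CHARS_PER_TOKEN]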
