What is RAG and Grounding?
Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by allowing them to access and process external information sources during generation. Grounding responses in factual data reduces hallucinations.

Ungrounded Generation
Relies on the LLM's training data alone and is prone to hallucinations when the model lacks the relevant facts
Grounded Generation
Provides fresh and potentially private data to the model as part of its input or prompt
Why Use RAG?
Access Up-to-Date Information
LLMs are trained on static datasets, so their knowledge can become outdated. RAG allows them to access real-time or frequently updated information.
Improved Accuracy
RAG reduces the risk of LLM “hallucinations” (generating false or misleading information) by grounding responses in verified external data.
Enhanced Context
By combining additional knowledge sources with existing LLM knowledge, RAG provides better context to enhance response quality.
RAG Architecture
A typical RAG system consists of several key components:

1. Data Ingestion
Intake data from different sources:
- Local files
- Google Cloud Storage
- Google Drive
- BigQuery
- Websites and structured data
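For the local-files case, ingestion can be sketched as a simple directory walk (the directory layout, file extensions, and document-dictionary shape here are illustrative assumptions, not part of any Google Cloud SDK):

```python
from pathlib import Path

def ingest_local_files(root: str, extensions=(".txt", ".md")) -> list[dict]:
    """Walk a directory tree and load plain-text documents for downstream processing."""
    documents = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions:
            documents.append({
                "id": str(path),  # the source path doubles as a document ID
                "text": path.read_text(encoding="utf-8"),
            })
    return documents
```

Connectors for Cloud Storage, Drive, or BigQuery would follow the same pattern: fetch raw content, attach a stable ID, and hand the documents to the transformation step.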
2. Data Transformation
Conversion and preparation of data for indexing:
- Document parsing and extraction
- Text chunking and splitting
- Metadata extraction
- Format normalization
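The chunking step can be sketched as a fixed-size character splitter with overlap, so that context spanning a chunk boundary survives intact in at least one chunk (the size and overlap values are illustrative assumptions; production pipelines often split on sentence or section boundaries instead):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlapping windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks
```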
3. Embedding
Numerical representations of text that capture semantic meaning and context. Similar or related text tends to have similar embeddings in high-dimensional vector space.

4. Data Indexing
Structure the knowledge base for optimized searching:
- Vector databases (Vertex AI Vector Search, Feature Store)
- Enterprise search indexes (Vertex AI Search)
- Database storage (AlloyDB, BigQuery)
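Steps 3 and 4 can be illustrated with a toy in-memory vector index ranked by cosine similarity. This is a minimal sketch: the three-dimensional vectors are made up for illustration, and a real system would use an embedding model plus a managed store such as Vertex AI Vector Search.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class VectorIndex:
    """Minimal in-memory index: store (id, embedding) pairs, rank by similarity."""
    def __init__(self):
        self.entries: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, embedding: list[float]) -> None:
        self.entries.append((doc_id, embedding))

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        scored = [(doc_id, cosine_similarity(query, emb))
                  for doc_id, emb in self.entries]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

Because semantically similar text yields nearby embeddings, ranking by cosine similarity surfaces the documents most related to the query vector.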
5. Retrieval
When a user asks a question, the retrieval component searches through the knowledge base to find relevant information:
- Semantic search using vector similarity
- Keyword-based search
- Hybrid search combining both approaches
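A hybrid retriever can be sketched by blending a keyword-overlap score with a vector-similarity score, then feeding the winners into the augmented prompt used for generation. The blending weight `alpha`, the precomputed `vector_scores` (standing in for a real embedding comparison), and the prompt template are all illustrative assumptions:

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the document text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    text_lower = text.lower()
    return sum(term in text_lower for term in terms) / len(terms)

def hybrid_search(query: str, docs: dict, vector_scores: dict,
                  alpha: float = 0.5, top_k: int = 2) -> list[tuple[str, float]]:
    """Blend keyword and vector-similarity signals; alpha balances the two."""
    scored = []
    for doc_id, text in docs.items():
        score = (alpha * vector_scores.get(doc_id, 0.0)
                 + (1 - alpha) * keyword_score(query, text))
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def build_grounded_prompt(query: str, retrieved_texts: list[str]) -> str:
    """Assemble retrieved passages plus the user query into one grounded prompt."""
    context = "\n\n".join(retrieved_texts)
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")
```

The last helper shows the hand-off to the generation step: retrieved passages become the context that constrains the LLM's answer.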
6. Generation
The retrieved information becomes context added to the original user query to guide the LLM in generating factually grounded responses.

RAG Solutions on Google Cloud
Google Cloud offers multiple approaches for implementing RAG:

Vertex AI Search
Out-of-the-box enterprise search with Google-quality results for your data
RAG Engine
Managed data framework for building context-augmented LLM applications with flexible backends
Custom RAG
Build your own RAG pipeline using Vertex AI components and vector databases
Grounding API
Ground Gemini responses in Google Search or Vertex AI Search with a simple API
Common Use Cases
Enterprise Search
Enable employees to search across company documents, wikis, and data sources with natural language queries.

Customer Support
Provide customer service agents or chatbots with instant access to product documentation and support knowledge bases.

Document Q&A
Answer questions about contracts, reports, research papers, and other documents by extracting relevant information.

Code Search
Help developers find relevant code snippets, API documentation, and implementation examples.

For best results with RAG, focus on high-quality data ingestion, appropriate chunking strategies, and comprehensive evaluation of retrieval accuracy.
Next Steps
RAG Engine
Learn about managed RAG orchestration with Vertex AI
Vertex AI Search
Explore enterprise search capabilities and datastores
Grounding Techniques
Understand chunking, retrieval, and grounding strategies
Evaluation
Learn how to evaluate RAG system performance