
Overview

FixMyCar is a production-ready Retrieval-Augmented Generation (RAG) application that helps car owners troubleshoot issues by querying vehicle owner’s manuals. The application demonstrates how to integrate Vertex AI Search with Gemini for accurate, grounded responses.

Architecture

System Components

Frontend

Streamlit Python App
  • Chat interface
  • Real-time streaming
  • Deployed on GKE

Backend

Java Spring Boot
  • REST API
  • Vertex AI Search client
  • Gemini integration

Search Engine

Vertex AI Search
  • OCR Parser for PDFs
  • Vector embeddings
  • Extractive answers

Infrastructure

GKE Autopilot
  • Auto-scaling
  • Workload Identity
  • Load balancing

RAG Implementation

Three-Step RAG Pipeline

FixMyCar implements the classic RAG pattern:

  1. Retrieval (Vertex AI Search): search the car manual datastore with the user's natural-language query.
  2. Augmentation (Prompt Engineering): construct a Gemini prompt using the search results as grounding context.
  3. Generation (Gemini Inference): generate an accurate, contextual response grounded in the manual excerpts.
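
The augmentation step can be sketched in Python (a minimal illustration; the helper name and prompt wording are hypothetical, loosely modeled on the backend log output shown later on this page):

```python
def build_grounded_prompt(user_prompt: str, search_results: str) -> str:
    """Combine the user's question with retrieved manual excerpts.

    Hypothetical helper: the real backend builds this prompt in Java,
    but the structure is the same.
    """
    return (
        "You are a helpful car manual chatbot. Answer using only the "
        "grounding data below. If the answer is not there, say so.\n"
        f"Human prompt: {user_prompt}\n"
        f"Grounding data: [{search_results}]"
    )

prompt = build_grounded_prompt(
    "What is the max cargo capacity?",
    "The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.",
)
```

Only the grounding data changes per query; the instructions and framing stay fixed, which keeps the model anchored to the manual.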

Java Backend Implementation

package com.cpet.fixmycarbackend;

import com.google.cloud.discoveryengine.v1.*;
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.generativeai.ChatSession;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class FixMyCarBackendController {
  @Autowired
  private FixMyCarConfiguration config;
  
  @PostMapping("/chat")
  public ChatMessage message(@RequestBody ChatMessage message) {
    return ragVertexAISearch(message);
  }
  
  public ChatMessage ragVertexAISearch(ChatMessage message) {
    // Step 1: Retrieval - query the Vertex AI Search datastore
    String searchQuery = message.getPrompt();
    String vectorSearchResults = performVertexAISearch(searchQuery);
    
    // Steps 2 & 3: Augmentation and generation - build the grounded
    // prompt and call Gemini (helper methods omitted for brevity)
    String result = geminiInference(
      message.getPrompt(), 
      vectorSearchResults
    );
    message.setResponse(result);
    return message;
  }
}

Streamlit Frontend

import streamlit as st
import requests

BACKEND_URL = "http://fixmycar-backend:8080/chat"

st.title("🚗 FixMyCar Assistant")
st.write("Ask questions about your Cymbal Starlight 2024")

# Chat interface
if prompt := st.chat_input("What's your question?"):
    st.chat_message("user").write(prompt)
    
    # Call backend API
    response = requests.post(
        BACKEND_URL,
        json={"prompt": prompt},
        headers={"Content-Type": "application/json"}
    )
    
    if response.status_code == 200:
        data = response.json()
        st.chat_message("assistant").write(data["response"])
    else:
        st.error("Failed to get response from backend")

Vertex AI Search Configuration

OCR Parser for PDFs

Vertex AI Search uses Document AI’s OCR parser to extract text from owner’s manuals:

Step 1: Upload PDFs to Cloud Storage

Store manuals in a GCS bucket (e.g., cymbal-starlight-2024.pdf)

Step 2: Create Datastore

Configure with:
  • Source: Cloud Storage bucket
  • Parser: OCR Parser (not Layout Parser)
  • Region: Global
  • Enterprise features: Enabled

Step 3: Indexing

Vertex AI Search automatically:
  • Extracts text from PDFs
  • Generates vector embeddings
  • Creates extractive answer indexes
  • Builds search indexes
Duration: ~10 minutes for a typical owner’s manual

Step 4: Test Search

Use the Preview interface to test queries before deployment

Extractive Answers

Vertex AI Search returns structured extractive answers:
{
  "results": [
    {
      "document": {
        "derivedStructData": {
          "extractive_answers": [
            {
              "content": "The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet. The cargo area is located in the trunk of the vehicle.",
              "pageNumber": 42
            }
          ]
        }
      }
    }
  ]
}
These answers are pre-extracted during indexing, not generated by an LLM, which ensures:
  • Accuracy: Direct quotes from source documents
  • Low latency: No inference required during retrieval
  • Grounding: Provenance with page numbers
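
For illustration, the extractive answers in a response shaped like the JSON above can be flattened into (content, page) pairs (a sketch; the real backend consumes this structure through the Java Discovery Engine client):

```python
def extract_answers(response: dict) -> list[tuple[str, int]]:
    """Collect (content, pageNumber) pairs from a Vertex AI Search
    response shaped like the JSON above."""
    pairs = []
    for result in response.get("results", []):
        data = result.get("document", {}).get("derivedStructData", {})
        for answer in data.get("extractive_answers", []):
            pairs.append((answer["content"], answer["pageNumber"]))
    return pairs

sample = {
    "results": [{
        "document": {"derivedStructData": {"extractive_answers": [{
            "content": "The Cymbal Starlight 2024 has a cargo capacity "
                       "of 13.5 cubic feet.",
            "pageNumber": 42,
        }]}}
    }]
}
answers = extract_answers(sample)  # one (content, page) pair, page 42
```

Keeping the page number alongside each excerpt makes it easy to cite provenance back to the user.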

GKE Deployment

Workload Identity Setup

FixMyCar uses GKE Workload Identity to authenticate with Vertex AI:
#!/bin/bash
# workload_identity.sh

PROJECT_ID=$(gcloud config get-value project)
CLUSTER_NAME="fixmycar"
REGION="us-central1"
NAMESPACE="default"
KSA_NAME="fixmycar-backend-sa"  # Kubernetes Service Account
GSA_NAME="fixmycar-gsa"         # Google Cloud Service Account

# Create Google Cloud Service Account
gcloud iam service-accounts create ${GSA_NAME} \
  --display-name="FixMyCar Backend Service Account"

# Grant Vertex AI permissions
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/discoveryengine.editor"

# Create Kubernetes Service Account
kubectl create serviceaccount ${KSA_NAME} -n ${NAMESPACE}

# Bind Kubernetes SA to Google Cloud SA
gcloud iam service-accounts add-iam-policy-binding \
  ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Annotate Kubernetes SA
kubectl annotate serviceaccount ${KSA_NAME} \
  -n ${NAMESPACE} \
  iam.gke.io/gcp-service-account=${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com

Kubernetes Manifests

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fixmycar-backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fixmycar-backend
  template:
    metadata:
      labels:
        app: fixmycar-backend
    spec:
      serviceAccountName: fixmycar-backend-sa
      containers:
      - name: fixmycar-backend
        image: us-central1-docker.pkg.dev/PROJECT_ID/fixmycar/backend:latest
        ports:
        - containerPort: 8080
        env:
        - name: GCP_PROJECT_ID
          value: "your-project-id"
        - name: VERTEX_AI_DATASTORE_ID
          value: "your-datastore-id"
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
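
A matching Service exposes the backend inside the cluster. This is a minimal sketch of what kubernetes/backend-service.yaml might contain; the name and port follow the Deployment above, and the frontend reaches it at http://fixmycar-backend:8080/chat:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: fixmycar-backend
spec:
  selector:
    app: fixmycar-backend
  ports:
  - port: 8080
    targetPort: 8080
```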

Deployment Steps

Step 1: Prerequisites

  • Google Cloud project with billing
  • gcloud CLI installed
  • Docker or Colima for container builds
  • Java 18+, Maven 3.9.6+
  • Python 3.9+

Step 2: Create Artifact Registry

gcloud artifacts repositories create fixmycar \
  --repository-format=docker \
  --location=us-central1

Step 3: Build & Push Containers

# Authenticate Docker
gcloud auth configure-docker us-central1-docker.pkg.dev

# Update PROJECT_ID in dockerpush.sh
./dockerpush.sh

Step 4: Create GKE Cluster

gcloud container clusters create-auto fixmycar \
  --region=us-central1 \
  --project=YOUR_PROJECT_ID

# Get credentials
gcloud container clusters get-credentials fixmycar \
  --region=us-central1

Step 5: Upload Owner's Manual

# Create bucket
gcloud storage buckets create gs://YOUR_PROJECT_ID-fixmycar \
  --location=us-central1

# Upload manual
gcloud storage cp cymbal-starlight-2024.pdf \
  gs://YOUR_PROJECT_ID-fixmycar/

Step 6: Configure Vertex AI Search

  1. Navigate to Agent Builder in console
  2. Create Search app: YOUR_PROJECT_ID-fixmycar
  3. Create datastore:
    • Source: Cloud Storage bucket
    • Parser: OCR Parser
    • Region: Global
  4. Wait ~10 minutes for indexing
  5. Test in Preview interface

Step 7: Setup Workload Identity

./workload_identity.sh

Step 8: Deploy to GKE

# Update image and env vars in YAML files
kubectl apply -f kubernetes/backend-deployment-vertex-search.yaml
kubectl apply -f kubernetes/backend-service.yaml
kubectl apply -f kubernetes/frontend-deployment.yaml
kubectl apply -f kubernetes/frontend-service.yaml

# Wait for pods to be ready
kubectl get pods -w

Step 9: Access Application

# Get external IP
kubectl get service fixmycar-frontend

# Open in browser: http://EXTERNAL_IP

Testing & Validation

Example Queries

Cymbal Starlight 2024: What is the max cargo capacity?

# Expected Response:
# The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet. 
# The cargo area is located in the trunk of the vehicle.

Backend Logs

View RAG pipeline execution:
kubectl logs -l app=fixmycar-backend --tail=100 -f
Example output:
2024-03-23T23:35:07.059Z INFO --- 🔍 Vertex AI Search results: 
Chapter 6: Towing, Cargo, and Luggage
The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.

2024-03-23T23:35:07.060Z INFO --- 🔮 Gemini Prompt: 
You are a helpful car manual chatbot...
Human prompt: What is the max cargo capacity?
Grounding data: [The Cymbal Starlight 2024 has a cargo capacity...]

2024-03-23T23:35:07.762Z INFO --- 🔮 Gemini Response: 
The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.

Performance Optimization

Caching Strategy

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
@EnableCaching  // required for @Cacheable to take effect
public class CacheConfig {
  @Bean
  public CacheManager cacheManager() {
    return new ConcurrentMapCacheManager("searchResults");
  }
}

@Service
public class SearchService {
  // Repeated identical queries return the cached search result
  // instead of re-querying Vertex AI Search
  @Cacheable(value = "searchResults", key = "#query")
  public String search(String query) {
    return performVertexAISearch(query);
  }
}

GKE Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fixmycar-backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fixmycar-backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
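
The scaling behavior follows the standard HPA formula, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). A quick illustration in Python:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float = 70.0) -> int:
    """Standard Kubernetes HPA formula:
    desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# Two pods averaging 140% of requested CPU scale out (capped by maxReplicas)
scaled = hpa_desired_replicas(2, 140.0)
```

With the 500m CPU request in the Deployment above, the 70% target corresponds to roughly 350m of actual CPU usage per pod before the HPA adds replicas.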

Troubleshooting

Pods stuck in Pending: GKE Autopilot is scaling up nodes. Wait 3-5 minutes, then inspect the pod:
kubectl describe pod <pod-name>

Permission denied errors: check the Workload Identity configuration:
# Verify annotation
kubectl get sa fixmycar-backend-sa -o yaml

# Verify IAM binding
gcloud iam service-accounts get-iam-policy \
  fixmycar-gsa@PROJECT_ID.iam.gserviceaccount.com

No search results: ensure that:
  • Datastore indexing completed (check the Activity tab)
  • The OCR Parser is selected (not the Layout Parser)
  • PDFs were uploaded to the correct bucket path
  • The query works in the Preview interface

Backend errors: check the logs for details:
kubectl logs -l app=fixmycar-backend --tail=50
Common issues:
  • Incorrect VERTEX_AI_DATASTORE_ID
  • Missing GCP_PROJECT_ID
  • Network policy blocking egress

Cleanup

# Delete GKE cluster
gcloud container clusters delete fixmycar --region=us-central1

# Delete Artifact Registry
gcloud artifacts repositories delete fixmycar --location=us-central1

# Delete Cloud Storage bucket
gcloud storage rm -r gs://YOUR_PROJECT_ID-fixmycar

# Delete Vertex AI Search app
# (Must be done via console)

# Delete service account
gcloud iam service-accounts delete fixmycar-gsa@PROJECT_ID.iam.gserviceaccount.com

Key Takeaways

Vertex AI Search

Managed search with OCR removes the complexity of building custom RAG pipelines

Extractive Answers

Pre-computed answers ensure accurate, low-latency retrieval

GKE Workload Identity

Secure, keyless authentication for Google Cloud services

Spring Boot + Gemini

Java ecosystem integrates seamlessly with Vertex AI SDKs
