
Overview

FixMyCar is a production-ready Retrieval-Augmented Generation (RAG) application that helps car owners troubleshoot issues by querying vehicle owner’s manuals. The application demonstrates how to integrate Vertex AI Search with Gemini for accurate, grounded responses.

Architecture

System Components

Frontend

Streamlit Python App
  • Chat interface
  • Real-time streaming
  • Deployed on GKE

Backend

Java Spring Boot
  • REST API
  • Vertex AI Search client
  • Gemini integration

Search Engine

Vertex AI Search
  • OCR Parser for PDFs
  • Vector embeddings
  • Extractive answers

Infrastructure

GKE Autopilot
  • Auto-scaling
  • Workload Identity
  • Load balancing

RAG Implementation

Three-Step RAG Pipeline

FixMyCar implements the classic RAG pattern:

  1. Retrieval (Vertex AI Search): search the car manual datastore with the user's natural-language query.
  2. Augmentation (Prompt Engineering): construct a Gemini prompt using the search results as grounding context.
  3. Generation (Gemini Inference): generate an accurate, contextual response grounded in the manual excerpts.
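
The augmentation step can be sketched in Python (a minimal illustration; the helper name and prompt wording are hypothetical, loosely modeled on the backend log output shown later on this page):

```python
def build_grounded_prompt(user_prompt: str, search_results: str) -> str:
    """Combine the user's question with retrieved manual excerpts.

    Hypothetical helper: the real backend builds this prompt in Java,
    but the structure is the same.
    """
    return (
        "You are a helpful car manual chatbot. Answer using only the "
        "grounding data below. If the answer is not there, say so.\n"
        f"Human prompt: {user_prompt}\n"
        f"Grounding data: [{search_results}]"
    )

prompt = build_grounded_prompt(
    "What is the max cargo capacity?",
    "The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.",
)
```

Only the grounding data changes per query; the instructions and framing stay fixed, which keeps the model anchored to the manual.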

Java Backend Implementation

package com.cpet.fixmycarbackend;

import com.google.cloud.discoveryengine.v1.*;
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.generativeai.ChatSession;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class FixMyCarBackendController {
  @Autowired
  private FixMyCarConfiguration config;
  
  @PostMapping("/chat")
  public ChatMessage message(@RequestBody ChatMessage message) {
    return ragVertexAISearch(message);
  }
  
  public ChatMessage ragVertexAISearch(ChatMessage message) {
    // Step 1: Retrieval - query the Vertex AI Search datastore
    String searchQuery = message.getPrompt();
    String vectorSearchResults = performVertexAISearch(searchQuery);
    
    // Steps 2 & 3: Augmentation and generation - build the grounded
    // prompt and call Gemini (helper methods omitted for brevity)
    String result = geminiInference(
      message.getPrompt(), 
      vectorSearchResults
    );
    message.setResponse(result);
    return message;
  }
}

Streamlit Frontend

import streamlit as st
import requests

BACKEND_URL = "http://fixmycar-backend:8080/chat"

st.title("🚗 FixMyCar Assistant")
st.write("Ask questions about your Cymbal Starlight 2024")

# Chat interface
if prompt := st.chat_input("What's your question?"):
    st.chat_message("user").write(prompt)
    
    # Call backend API
    response = requests.post(
        BACKEND_URL,
        json={"prompt": prompt},
        headers={"Content-Type": "application/json"}
    )
    
    if response.status_code == 200:
        data = response.json()
        st.chat_message("assistant").write(data["response"])
    else:
        st.error("Failed to get response from backend")

Vertex AI Search Configuration

OCR Parser for PDFs

Vertex AI Search uses Document AI’s OCR parser to extract text from owner’s manuals:

Step 1: Upload PDFs to Cloud Storage

Store manuals in a GCS bucket (e.g., cymbal-starlight-2024.pdf)

Step 2: Create Datastore

Configure with:
  • Source: Cloud Storage bucket
  • Parser: OCR Parser (not Layout Parser)
  • Region: Global
  • Enterprise features: Enabled

Step 3: Indexing

Vertex AI Search automatically:
  • Extracts text from PDFs
  • Generates vector embeddings
  • Creates extractive answer indexes
  • Builds search indexes
Duration: ~10 minutes for a typical owner’s manual

Step 4: Test Search

Use the Preview interface to test queries before deployment

Extractive Answers

Vertex AI Search returns structured extractive answers:
{
  "results": [
    {
      "document": {
        "derivedStructData": {
          "extractive_answers": [
            {
              "content": "The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet. The cargo area is located in the trunk of the vehicle.",
              "pageNumber": 42
            }
          ]
        }
      }
    }
  ]
}
These answers are pre-extracted during indexing, not generated by an LLM, which ensures:
  • Accuracy: Direct quotes from source documents
  • Low latency: No inference required during retrieval
  • Grounding: Provenance with page numbers
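
For illustration, the extractive answers in a response shaped like the JSON above can be flattened into (content, page) pairs (a sketch; the real backend consumes this structure through the Java Discovery Engine client):

```python
def extract_answers(response: dict) -> list[tuple[str, int]]:
    """Collect (content, pageNumber) pairs from a Vertex AI Search
    response shaped like the JSON above."""
    pairs = []
    for result in response.get("results", []):
        data = result.get("document", {}).get("derivedStructData", {})
        for answer in data.get("extractive_answers", []):
            pairs.append((answer["content"], answer["pageNumber"]))
    return pairs

sample = {
    "results": [{
        "document": {"derivedStructData": {"extractive_answers": [{
            "content": "The Cymbal Starlight 2024 has a cargo capacity "
                       "of 13.5 cubic feet.",
            "pageNumber": 42,
        }]}}
    }]
}
answers = extract_answers(sample)  # one (content, page) pair, page 42
```

Keeping the page number alongside each excerpt makes it easy to cite provenance back to the user.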

GKE Deployment

Workload Identity Setup

FixMyCar uses GKE Workload Identity to authenticate with Vertex AI:
#!/bin/bash
# workload_identity.sh

PROJECT_ID=$(gcloud config get-value project)
CLUSTER_NAME="fixmycar"
REGION="us-central1"
NAMESPACE="default"
KSA_NAME="fixmycar-backend-sa"  # Kubernetes Service Account
GSA_NAME="fixmycar-gsa"         # Google Cloud Service Account

# Create Google Cloud Service Account
gcloud iam service-accounts create ${GSA_NAME} \
  --display-name="FixMyCar Backend Service Account"

# Grant Vertex AI permissions
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/discoveryengine.editor"

# Create Kubernetes Service Account
kubectl create serviceaccount ${KSA_NAME} -n ${NAMESPACE}

# Bind Kubernetes SA to Google Cloud SA
gcloud iam service-accounts add-iam-policy-binding \
  ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Annotate Kubernetes SA
kubectl annotate serviceaccount ${KSA_NAME} \
  -n ${NAMESPACE} \
  iam.gke.io/gcp-service-account=${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com

Kubernetes Manifests

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fixmycar-backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fixmycar-backend
  template:
    metadata:
      labels:
        app: fixmycar-backend
    spec:
      serviceAccountName: fixmycar-backend-sa
      containers:
      - name: fixmycar-backend
        image: us-central1-docker.pkg.dev/PROJECT_ID/fixmycar/backend:latest
        ports:
        - containerPort: 8080
        env:
        - name: GCP_PROJECT_ID
          value: "your-project-id"
        - name: VERTEX_AI_DATASTORE_ID
          value: "your-datastore-id"
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
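
A matching Service exposes the backend inside the cluster. This is a minimal sketch of what kubernetes/backend-service.yaml might contain; the name and port follow the Deployment above, and the frontend reaches it at http://fixmycar-backend:8080/chat:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: fixmycar-backend
spec:
  selector:
    app: fixmycar-backend
  ports:
  - port: 8080
    targetPort: 8080
```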

Deployment Steps

Step 1: Prerequisites

  • Google Cloud project with billing
  • gcloud CLI installed
  • Docker or Colima for container builds
  • Java 18+, Maven 3.9.6+
  • Python 3.9+

Step 2: Create Artifact Registry

gcloud artifacts repositories create fixmycar \
  --repository-format=docker \
  --location=us-central1

Step 3: Build & Push Containers

# Authenticate Docker
gcloud auth configure-docker us-central1-docker.pkg.dev

# Update PROJECT_ID in dockerpush.sh
./dockerpush.sh

Step 4: Create GKE Cluster

gcloud container clusters create-auto fixmycar \
  --region=us-central1 \
  --project=YOUR_PROJECT_ID

# Get credentials
gcloud container clusters get-credentials fixmycar \
  --region=us-central1

Step 5: Upload Owner's Manual

# Create bucket
gcloud storage buckets create gs://YOUR_PROJECT_ID-fixmycar \
  --location=us-central1

# Upload manual
gcloud storage cp cymbal-starlight-2024.pdf \
  gs://YOUR_PROJECT_ID-fixmycar/

Step 6: Configure Vertex AI Search

  1. Navigate to Agent Builder in console
  2. Create Search app: YOUR_PROJECT_ID-fixmycar
  3. Create datastore:
    • Source: Cloud Storage bucket
    • Parser: OCR Parser
    • Region: Global
  4. Wait ~10 minutes for indexing
  5. Test in Preview interface

Step 7: Setup Workload Identity

./workload_identity.sh

Step 8: Deploy to GKE

# Update image and env vars in YAML files
kubectl apply -f kubernetes/backend-deployment-vertex-search.yaml
kubectl apply -f kubernetes/backend-service.yaml
kubectl apply -f kubernetes/frontend-deployment.yaml
kubectl apply -f kubernetes/frontend-service.yaml

# Wait for pods to be ready
kubectl get pods -w

Step 9: Access Application

# Get external IP
kubectl get service fixmycar-frontend

# Open in browser: http://EXTERNAL_IP

Testing & Validation

Example Queries

Cymbal Starlight 2024: What is the max cargo capacity?

# Expected Response:
# The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet. 
# The cargo area is located in the trunk of the vehicle.

Backend Logs

View RAG pipeline execution:
kubectl logs -l app=fixmycar-backend --tail=100 -f
Example output:
2024-03-23T23:35:07.059Z INFO --- 🔍 Vertex AI Search results: 
Chapter 6: Towing, Cargo, and Luggage
The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.

2024-03-23T23:35:07.060Z INFO --- 🔮 Gemini Prompt: 
You are a helpful car manual chatbot...
Human prompt: What is the max cargo capacity?
Grounding data: [The Cymbal Starlight 2024 has a cargo capacity...]

2024-03-23T23:35:07.762Z INFO --- 🔮 Gemini Response: 
The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet.

Performance Optimization

Caching Strategy

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
@EnableCaching  // required for @Cacheable to take effect
public class CacheConfig {
  @Bean
  public CacheManager cacheManager() {
    return new ConcurrentMapCacheManager("searchResults");
  }
}

@Service
public class SearchService {
  // Repeated identical queries return the cached search result
  // instead of re-querying Vertex AI Search
  @Cacheable(value = "searchResults", key = "#query")
  public String search(String query) {
    return performVertexAISearch(query);
  }
}

GKE Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fixmycar-backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fixmycar-backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
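
The scaling behavior follows the standard HPA formula, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). A quick illustration in Python:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float = 70.0) -> int:
    """Standard Kubernetes HPA formula:
    desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# Two pods averaging 140% of requested CPU scale out (capped by maxReplicas)
scaled = hpa_desired_replicas(2, 140.0)
```

With the 500m CPU request in the Deployment above, the 70% target corresponds to roughly 350m of actual CPU usage per pod before the HPA adds replicas.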

Troubleshooting

Pods stuck in Pending: GKE Autopilot is scaling up nodes. Wait 3-5 minutes, then inspect the pod:
kubectl describe pod <pod-name>

Permission denied errors: check the Workload Identity configuration:
# Verify annotation
kubectl get sa fixmycar-backend-sa -o yaml

# Verify IAM binding
gcloud iam service-accounts get-iam-policy \
  fixmycar-gsa@PROJECT_ID.iam.gserviceaccount.com

No search results: ensure that:
  • Datastore indexing completed (check the Activity tab)
  • The OCR Parser is selected (not the Layout Parser)
  • PDFs were uploaded to the correct bucket path
  • The query works in the Preview interface

Backend errors: check the logs for details:
kubectl logs -l app=fixmycar-backend --tail=50
Common issues:
  • Incorrect VERTEX_AI_DATASTORE_ID
  • Missing GCP_PROJECT_ID
  • Network policy blocking egress

Cleanup

# Delete GKE cluster
gcloud container clusters delete fixmycar --region=us-central1

# Delete Artifact Registry
gcloud artifacts repositories delete fixmycar --location=us-central1

# Delete Cloud Storage bucket
gcloud storage rm -r gs://YOUR_PROJECT_ID-fixmycar

# Delete Vertex AI Search app
# (Must be done via console)

# Delete service account
gcloud iam service-accounts delete fixmycar-gsa@PROJECT_ID.iam.gserviceaccount.com

Key Takeaways

Vertex AI Search

Managed search with OCR removes the complexity of building custom RAG pipelines

Extractive Answers

Pre-computed answers ensure accurate, low-latency retrieval

GKE Workload Identity

Secure, keyless authentication for Google Cloud services

Spring Boot + Gemini

Java ecosystem integrates seamlessly with Vertex AI SDKs
