Overview
FixMyCar is a production-ready Retrieval-Augmented Generation (RAG) application that helps car owners troubleshoot issues by querying vehicle owner’s manuals. The application demonstrates how to integrate Vertex AI Search with Gemini for accurate, grounded responses.
Architecture
System Components
Frontend
Streamlit Python App
- Chat interface
- Real-time streaming
- Deployed on GKE
Backend
Java Spring Boot
- REST API
- Vertex AI Search client
- Gemini integration
Search Engine
Vertex AI Search
- OCR Parser for PDFs
- Vector embeddings
- Extractive answers
Infrastructure
GKE Autopilot
- Auto-scaling
- Workload Identity
- Load balancing
RAG Implementation
Two-Step RAG Pipeline
FixMyCar implements the classic two-step RAG pattern: retrieve relevant passages from the indexed owner's manuals, then generate a grounded answer with Gemini.
Java Backend Implementation
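The backend itself is Java Spring Boot; as a minimal, runnable illustration of the two-step flow (retrieval, then grounded generation), here is a Python sketch with the Vertex AI Search and Gemini calls stubbed out. The function names, prompt wording, and sample data are illustrative, not the app's actual code.

```python
# Two-step RAG sketch: (1) retrieve extractive answers, (2) build a
# grounded prompt and send it to the LLM. Both external calls are
# stubbed so the flow is runnable offline.

def retrieve_extractive_answers(question: str) -> list[dict]:
    """Step 1: query the search index for extractive answers.

    In the real app this calls Vertex AI Search; here we return a
    canned result for illustration.
    """
    return [
        {"content": "Check the brake fluid reservoir under the hood.",
         "page": 214},
    ]

def build_grounded_prompt(question: str, answers: list[dict]) -> str:
    """Assemble a prompt that grounds the model in retrieved passages."""
    context = "\n".join(
        f"- (p. {a['page']}) {a['content']}" for a in answers
    )
    return (
        "Answer using ONLY the excerpts from the owner's manual below.\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )

def answer_question(question: str) -> str:
    """Step 2: send the grounded prompt to the LLM (stubbed here)."""
    prompt = build_grounded_prompt(
        question, retrieve_extractive_answers(question)
    )
    # In the real app: send `prompt` to Gemini and stream the response.
    return prompt
```

Keeping retrieval and prompt assembly as separate steps is what makes the answers auditable: every claim in the generated response can be traced back to a quoted excerpt and page number.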
Streamlit Frontend
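The Streamlit code is not reproduced here; the sketch below shows only how the frontend might package a chat turn for the backend's REST API. The in-cluster service URL, the endpoint path, and the JSON field name are all assumptions, not the app's actual contract.

```python
import json

# Assumed in-cluster backend address and endpoint; the real route and
# payload shape may differ.
BACKEND_URL = "http://fixmycar-backend:8080"

def build_chat_request(question: str) -> tuple[str, bytes]:
    """Build the (url, body) pair a Streamlit chat handler could POST,
    e.g. via requests.post(url, data=body) while streaming the reply
    into st.chat_message."""
    body = json.dumps({"prompt": question}).encode("utf-8")
    return f"{BACKEND_URL}/chat", body
```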
Vertex AI Search Configuration
OCR Parser for PDFs
Vertex AI Search uses Document AI’s OCR parser to extract text from owner’s manuals.
Create Datastore
Configure with:
- Source: Cloud Storage bucket
- Parser: OCR Parser (not Layout Parser)
- Region: Global
- Enterprise features: Enabled
Indexing
Vertex AI Search automatically:
- Extracts text from PDFs
- Generates vector embeddings
- Creates extractive answer indexes
- Builds search indexes
Extractive Answers
Vertex AI Search returns structured extractive answers:
- Accuracy: Direct quotes from source documents
- Low latency: No inference required during retrieval
- Grounding: Provenance with page numbers
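Each search result carries its extractive answers under the document's derived data; the sketch below flattens them into (quote, page) pairs for grounding and citation display. The dict layout mirrors the JSON shape of Vertex AI Search responses (`extractive_answers` entries with `content` and `pageNumber` fields), but the sample is hand-built, so treat the exact field names as assumptions.

```python
def extract_answers(search_response: dict) -> list[dict]:
    """Flatten extractive answers from a search response into
    (quote, page) pairs suitable for prompting and display."""
    answers = []
    for result in search_response.get("results", []):
        data = result.get("document", {}).get("derivedStructData", {})
        for ans in data.get("extractive_answers", []):
            answers.append({
                "quote": ans.get("content", ""),
                "page": ans.get("pageNumber"),
            })
    return answers
```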
GKE Deployment
Workload Identity Setup
FixMyCar uses GKE Workload Identity to authenticate with Vertex AI.
Kubernetes Manifests
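The manifests themselves are not shown here; the sketch below captures the two pieces Workload Identity needs on the Kubernetes side: a ServiceAccount annotated with the Google service account it impersonates, and a Deployment that runs under that ServiceAccount. All names, the namespace, and the image path are illustrative.

```yaml
# Illustrative Workload Identity wiring; names are assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fixmycar-backend
  namespace: fixmycar
  annotations:
    iam.gke.io/gcp-service-account: fixmycar-backend@YOUR_PROJECT_ID.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fixmycar-backend
  namespace: fixmycar
spec:
  replicas: 1
  selector:
    matchLabels: {app: fixmycar-backend}
  template:
    metadata:
      labels: {app: fixmycar-backend}
    spec:
      serviceAccountName: fixmycar-backend  # picks up the annotation above
      containers:
        - name: backend
          image: us-docker.pkg.dev/YOUR_PROJECT_ID/fixmycar/backend:latest
          ports:
            - containerPort: 8080
```

On the Google Cloud side, the Google service account additionally needs a `roles/iam.workloadIdentityUser` binding for the Kubernetes service account; with both halves in place the pod obtains credentials automatically, with no exported key files.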
Deployment Steps
Prerequisites
- Google Cloud project with billing
- gcloud CLI installed
- Docker or Colima for container builds
- Java 18+, Maven 3.9.6+
- Python 3.9+
Configure Vertex AI Search
- Navigate to Agent Builder in console
- Create Search app: YOUR_PROJECT_ID-fixmycar
- Create datastore:
- Source: Cloud Storage bucket
- Parser: OCR Parser
- Region: Global
- Wait ~10 minutes for indexing
- Test in Preview interface
Testing & Validation
Example Queries
Backend Logs
View the RAG pipeline execution in the backend logs.
Performance Optimization
Caching Strategy
GKE Autoscaling
Troubleshooting
Pods stuck in Pending state
GKE Autopilot is scaling up nodes. Wait 3-5 minutes.
403 Forbidden from Vertex AI
Check the Workload Identity configuration: verify the `iam.gke.io/gcp-service-account` annotation on the Kubernetes service account and the `roles/iam.workloadIdentityUser` binding on the Google service account.
Vertex AI Search returns no results
Ensure:
- Datastore indexing completed (check Activity tab)
- OCR Parser selected (not Layout Parser)
- PDFs uploaded to correct bucket path
- Test query in Preview interface first
Backend returns 500 error
Check the backend logs for the detailed error message. Common issues:
- Incorrect VERTEX_AI_DATASTORE_ID
- Missing GCP_PROJECT_ID
- Network policy blocking egress
Cleanup
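The cleanup commands are not listed; assuming illustrative names for the cluster and bucket (both assumptions), teardown looks roughly like the following. The Search app and datastore are deleted from the Agent Builder console.

```shell
# Delete the GKE Autopilot cluster (name and region are assumptions)
gcloud container clusters delete fixmycar-cluster --region us-central1 --quiet

# Remove the manuals bucket (illustrative name)
gcloud storage rm --recursive gs://YOUR_PROJECT_ID-fixmycar-manuals
```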
Key Takeaways
Vertex AI Search
Managed search with OCR removes the complexity of building a custom RAG pipeline
Extractive Answers
Pre-computed answers ensure accurate, low-latency retrieval
GKE Workload Identity
Secure, keyless authentication for Google Cloud services
Spring Boot + Gemini
Java ecosystem integrates seamlessly with Vertex AI SDKs
Next Steps
- Explore GenWealth’s AlloyDB AI integration
- Learn about Spanner’s graph search
- Build real-time voice AI with Gemini Live