Cloud Run Deployment

Deploy Genkit applications to Google Cloud Run with automatic scaling and containerization, using any of Genkit's supported languages (JavaScript, Go, Python).

Overview

Cloud Run provides:

Fully managed - Serverless container platform
Any language - JavaScript, Go, Python, or any container
Automatic scaling - Scale to zero when not in use
Pay per use - With request-based billing (the default), you pay only while instances are processing requests
Custom domains - Map to your own domain

Prerequisites

# Install Google Cloud CLI
curl https://sdk.cloud.google.com | bash

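# Login and set project
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Optional: default region for commands below that omit --region
gcloud config set run/region us-central1

# Enable required APIs (Secret Manager only if you store keys there)
gcloud services enable run.googleapis.com
gcloud services enable cloudbuild.googleapis.com
gcloud services enable artifactregistry.googleapis.com
gcloud services enable secretmanager.googleapis.com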

Node.js Deployment

1. Create Express Server

src/index.ts

import { expressHandler } from '@genkit-ai/express';
import { googleAI } from '@genkit-ai/google-genai';
import express from 'express';
import { genkit, z } from 'genkit';

const ai = genkit({
  plugins: [googleAI()],
});

const jokeFlow = ai.defineFlow(
  {
    name: 'jokeFlow',
    inputSchema: z.string(),
    outputSchema: z.string(),
  },
  async (subject) => {
    const result = await ai.generate({
      model: googleAI.model('gemini-2.5-flash'),
      prompt: `Tell me a joke about ${subject}`,
    });
    return result.text;
  }
);

const app = express();
app.use(express.json());

// Health check for Cloud Run
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy' });
});

// Expose flow
app.post('/joke', expressHandler(jokeFlow));

const port = process.env.PORT || 8080;
app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});

2. Create Dockerfile

Dockerfile

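# Build stage: install all dependencies (including TypeScript) and compile
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: production dependencies and compiled output only
FROM node:20-slim
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist

ENV PORT=8080
EXPOSE 8080

CMD ["node", "dist/index.js"]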

3. Create .dockerignore

.dockerignore

node_modules
npm-debug.log
.git
.gitignore
README.md
.env
*.local
dist
build

4. Deploy to Cloud Run

# Build and deploy in one command
gcloud run deploy genkit-app \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GEMINI_API_KEY=your-api-key

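# Or build separately and push to Artifact Registry
gcloud artifacts repositories create genkit \
  --repository-format=docker \
  --location=us-central1
gcloud builds submit --tag us-central1-docker.pkg.dev/PROJECT_ID/genkit/genkit-app
gcloud run deploy genkit-app \
  --image us-central1-docker.pkg.dev/PROJECT_ID/genkit/genkit-app \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GEMINI_API_KEY=your-api-key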

Go Deployment

1. Create Go Server

main.go

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "os"

    "github.com/firebase/genkit/go/ai"
    "github.com/firebase/genkit/go/genkit"
    "github.com/firebase/genkit/go/plugins/googlegenai"
)

func main() {
    ctx := context.Background()

    // Initialize Genkit
    g := genkit.Init(ctx, genkit.WithPlugins(&googlegenai.GoogleAI{}))

    // Define a flow
    genkit.DefineFlow(g, "jokeFlow", 
        func(ctx context.Context, input string) (string, error) {
            if input == "" {
                input = "programming"
            }

            return genkit.GenerateText(ctx, g,
                ai.WithModelName("googleai/gemini-2.5-flash"),
                ai.WithPrompt("Tell me a joke about %s.", input),
            )
        },
    )

    // Create HTTP server
    mux := http.NewServeMux()

    // Health check
    mux.HandleFunc("GET /health", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(http.StatusOK)
        w.Write([]byte(`{"status":"healthy"}`))
    })

    // Expose all flows as HTTP endpoints
    for _, flow := range genkit.ListFlows(g) {
        mux.HandleFunc("POST /"+flow.Name(), genkit.Handler(flow))
    }

    // Get port from environment (Cloud Run sets this)
    port := os.Getenv("PORT")
    if port == "" {
        port = "8080"
    }

    addr := fmt.Sprintf(":%s", port)
    log.Printf("Server listening on %s", addr)
    log.Fatal(http.ListenAndServe(addr, mux))
}

2. Create Dockerfile for Go

Dockerfile

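# Build stage (use a Go version that satisfies the go directive in go.mod;
# the "GET /health" route pattern needs Go 1.22 or later)
FROM golang:1.24-alpine AS builder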

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /server .

# Runtime stage
FROM alpine:latest

RUN apk --no-cache add ca-certificates
WORKDIR /root/

COPY --from=builder /server .

ENV PORT=8080
EXPOSE 8080

CMD ["./server"]

3. Deploy Go App

gcloud run deploy genkit-go-app \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GEMINI_API_KEY=your-api-key

Python Deployment

1. Create FastAPI Server

main.py

import os
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

from genkit.ai import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)

app = FastAPI(title='Genkit App')

class JokeRequest(BaseModel):
    subject: str

class JokeResponse(BaseModel):
    text: str

@app.get('/health')
async def health():
    return {'status': 'healthy'}

@ai.flow()
async def joke_flow(subject: str) -> str:
    """Generate a joke about a subject."""
    response = await ai.generate(
        prompt=f'Tell me a joke about {subject}'
    )
    return response.text

@app.post('/joke', response_model=JokeResponse)
async def joke_endpoint(request: JokeRequest) -> JokeResponse:
    result = await joke_flow(request.subject)
    return JokeResponse(text=result)

if __name__ == '__main__':
    port = int(os.getenv('PORT', 8080))
    uvicorn.run(app, host='0.0.0.0', port=port)

2. Create requirements.txt

requirements.txt

fastapi
uvicorn[standard]
genkit
genkit-plugin-google-genai

3. Create Dockerfile for Python

Dockerfile

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

ENV PORT=8080
EXPOSE 8080

CMD ["python", "main.py"]

4. Deploy Python App

gcloud run deploy genkit-python-app \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GEMINI_API_KEY=your-api-key

Configuration

Environment Variables

# Set environment variables (comma-separated KEY=VALUE pairs)
gcloud run deploy genkit-app \
  --set-env-vars GEMINI_API_KEY=your-key,LOG_LEVEL=info

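# Or store the key in Secret Manager and expose it as an environment variable
echo -n "your-api-key" | gcloud secrets create genkit-api-key --data-file=-
gcloud run deploy genkit-app \
  --update-secrets GEMINI_API_KEY=genkit-api-key:latest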
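A secret passed with --update-secrets NAME=SECRET:VERSION reaches the container as an ordinary environment variable, so application code reads it exactly like a --set-env-vars value. The Python sketch below (the helper name load_secret is hypothetical) fails fast at startup when the key is missing, and optionally falls back to a file path, assuming you mounted the secret as a file instead.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def load_secret(name: str, environ: Mapping[str, str] = os.environ,
                file_path: Optional[str] = None) -> str:
    """Return a secret from an env var, falling back to a mounted file.

    Raises RuntimeError so a misconfigured revision fails at startup
    instead of on the first model call.
    """
    value = environ.get(name, "").strip()
    if value:
        return value
    # Hypothetical fallback: secret mounted as a file rather than an env var
    if file_path and Path(file_path).is_file():
        value = Path(file_path).read_text().strip()
        if value:
            return value
    raise RuntimeError(f"{name} is not set; check --set-env-vars/--update-secrets")
```

Calling load_secret("GEMINI_API_KEY") at module import time surfaces the problem in the revision's startup logs rather than as a failed request.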

Memory and CPU

gcloud run deploy genkit-app \
  --memory 2Gi \
  --cpu 2 \
  --timeout 300s  # 5 minutes (the default); raise it for long generations

Concurrency and Autoscaling

gcloud run deploy genkit-app \
  --concurrency 80 \
  --min-instances 1 \
  --max-instances 100
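These two flags bound how much traffic the service can absorb at once: each instance accepts up to --concurrency simultaneous requests, so the ceiling is concurrency × max-instances. For sizing, Little's law (requests in flight ≈ arrival rate × average latency) gives a rough instance count. A small Python sketch; the 50 req/s and 4 s latency figures are hypothetical, and since Cloud Run's autoscaler also weighs CPU utilization, treat the result as a lower bound rather than a prediction.

```python
import math


def max_concurrent_requests(concurrency: int, max_instances: int) -> int:
    """Upper bound on requests the service can hold in flight at once."""
    return concurrency * max_instances


def estimate_instances(requests_per_second: float, avg_latency_s: float,
                       concurrency: int) -> int:
    """Rough instance count via Little's law: in-flight = arrival rate x latency."""
    in_flight = requests_per_second * avg_latency_s
    return max(1, math.ceil(in_flight / concurrency))


# Values from the deploy command above: --concurrency 80, --max-instances 100
print(max_concurrent_requests(80, 100))  # 8000
# Hypothetical load: 50 req/s, each generation taking about 4 s on average
print(estimate_instances(50, 4.0, 80))   # 200 in flight / 80 per instance -> 3
```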

Custom Domain

# Map to your domain
gcloud run domain-mappings create \
  --service genkit-app \
  --domain api.yourdomain.com \
  --region us-central1

Authentication

Require Authentication

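# Deploy with authentication required
gcloud run deploy genkit-app \
  --no-allow-unauthenticated

# Call with an identity token (the caller needs the Cloud Run Invoker role)
curl -X POST https://genkit-app-xxx.run.app/joke \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{"data": "programming"}'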
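gcloud auth print-identity-token works from a developer machine. Code running on Google Cloud (for example another Cloud Run service calling this one) can instead request an ID token from the metadata server, using the receiving service's URL as the audience. A stdlib-only Python sketch; the helper names are hypothetical, fetch_identity_token only works on Google Cloud infrastructure, and the calling service account still needs the Cloud Run Invoker role on genkit-app.

```python
import urllib.parse
import urllib.request

METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def identity_token_request(audience: str) -> urllib.request.Request:
    """Build a metadata-server request for an ID token scoped to `audience`."""
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )


def fetch_identity_token(audience: str, timeout: float = 5.0) -> str:
    """Fetch the token; only reachable when running on Google Cloud."""
    req = identity_token_request(audience)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The returned token goes into an Authorization: Bearer header, just like the curl example.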

Service Account

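# Create service account
gcloud iam service-accounts create genkit-service

# Grant Vertex AI access (only needed if the app calls Gemini through Vertex AI)
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:genkit-service@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# Let the service read the API key stored in Secret Manager
gcloud secrets add-iam-policy-binding genkit-api-key \
  --member="serviceAccount:genkit-service@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

# Deploy with service account
gcloud run deploy genkit-app \
  --service-account genkit-service@PROJECT_ID.iam.gserviceaccount.com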

Monitoring

View Logs

# Stream logs (beta command; may prompt to install the log-streaming component)
gcloud beta run services logs tail genkit-app \
  --region us-central1

# View in Cloud Console
echo "https://console.cloud.google.com/run/detail/us-central1/genkit-app/logs"
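Cloud Run forwards container stdout and stderr to Cloud Logging and parses single-line JSON into structured entries, recognizing fields such as severity, message, and logging.googleapis.com/trace (which links a log line to its trace). A Python sketch, assuming the incoming request carries an X-Cloud-Trace-Context header in the TRACE_ID/SPAN_ID;o=1 format; the helper names are hypothetical.

```python
import json
import sys
from typing import Any, Optional


def trace_field(trace_header: Optional[str], project_id: str) -> Optional[str]:
    """Turn an X-Cloud-Trace-Context value ("TRACE_ID/SPAN_ID;o=1") into the
    projects/PROJECT_ID/traces/TRACE_ID name Cloud Logging uses for correlation."""
    if not trace_header:
        return None
    trace_id = trace_header.split("/", 1)[0]
    return f"projects/{project_id}/traces/{trace_id}"


def log(severity: str, message: str, trace: Optional[str] = None, **fields: Any) -> str:
    """Print one single-line JSON log entry to stdout and return it."""
    entry = {"severity": severity, "message": message, **fields}
    if trace:
        entry["logging.googleapis.com/trace"] = trace
    line = json.dumps(entry)
    print(line, file=sys.stdout, flush=True)
    return line
```

Inside a request handler this might look like log("INFO", "joke generated", trace=trace_field(header_value, "your-project-id"), flow="jokeFlow"), where header_value is whatever your framework returns for X-Cloud-Trace-Context.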

Enable Tracing

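In Node.js, call enableGoogleCloudTelemetry() once at startup. The service account the service runs as needs the Cloud Trace Agent, Monitoring Metric Writer, and Logs Writer roles for telemetry to be exported.

import { enableGoogleCloudTelemetry } from '@genkit-ai/google-cloud';

// Call once at startup, before serving requests
enableGoogleCloudTelemetry({
  projectId: 'your-project-id',
});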

Testing

Test Deployed Service

# Get service URL
SERVICE_URL=$(gcloud run services describe genkit-app \
  --region us-central1 \
  --format 'value(status.url)')

# Test health check
curl $SERVICE_URL/health

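# Test flow (Node.js and Go: flow input goes in a "data" field;
# the Go server registers each flow under its name, e.g. $SERVICE_URL/jokeFlow)
curl -X POST $SERVICE_URL/joke \
  -H "Content-Type: application/json" \
  -d '{"data": "programming"}'

# The FastAPI example defines its own request schema instead
curl -X POST $SERVICE_URL/joke \
  -H "Content-Type: application/json" \
  -d '{"subject": "programming"}'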
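The Node.js (expressHandler) and Go (genkit.Handler) endpoints wrap flow input in a data field, which is why the test request sends {"data": "programming"}, and they return the flow output under result. A stdlib-only Python client sketch, assuming that envelope; the function names are hypothetical, and the optional token is only needed for services deployed with --no-allow-unauthenticated.

```python
import json
import urllib.request
from typing import Any, Optional


def build_flow_request(url: str, data: Any,
                       id_token: Optional[str] = None) -> urllib.request.Request:
    """POST request wrapping the flow input in the {"data": ...} envelope."""
    headers = {"Content-Type": "application/json"}
    if id_token:  # only for services deployed with --no-allow-unauthenticated
        headers["Authorization"] = f"Bearer {id_token}"
    body = json.dumps({"data": data}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_flow_response(raw: bytes) -> Any:
    """Return the flow output from a {"result": ...} response body."""
    payload = json.loads(raw)
    if "result" not in payload:
        raise ValueError(f"unexpected response shape: {payload!r}")
    return payload["result"]


def call_flow(url: str, data: Any, id_token: Optional[str] = None,
              timeout: float = 300.0) -> Any:
    """Send the request and unwrap the result (timeout matches --timeout 300s)."""
    req = build_flow_request(url, data, id_token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_flow_response(resp.read())
```

For example, call_flow(service_url + "/joke", "programming") returns the joke text, with service_url set to the value printed by gcloud run services describe.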

Load Testing

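# Install Apache Bench (Debian/Ubuntu)
sudo apt-get install apache2-utils

# Request body for the flow endpoint
echo '{"data": "programming"}' > data.json

# Run load test: 100 requests, 10 at a time
ab -n 100 -c 10 -p data.json -T application/json \
  $SERVICE_URL/joke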
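Once a load test pushes past what --max-instances allows, Cloud Run may reject some requests (typically with HTTP 429, sometimes 503). Clients can absorb this with capped exponential backoff. A Python sketch with hypothetical names; the set of retryable statuses is an assumption, and it omits random jitter so the delays are deterministic, though production clients usually add jitter to avoid retrying in lockstep.

```python
import time
import urllib.error
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Statuses this sketch treats as "service busy, try again later" (an assumption)
RETRYABLE_STATUSES = {429, 503}


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Capped exponential delays: base, 2*base, 4*base, ... never above cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def with_retries(call: Callable[[], T], retries: int = 4,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying retryable HTTP errors up to `retries` times."""
    for delay in backoff_delays(retries):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE_STATUSES:
                raise
            sleep(delay)
    return call()  # last attempt: any error now propagates to the caller
```

Wrap any zero-argument callable that performs the request, such as a lambda around urllib.request.urlopen. Keep the retry count small, since a retried request that reaches the model runs (and is billed) again.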

Multi-Region Deployment

Deploy to multiple regions for lower latency:

# Deploy to multiple regions
for region in us-central1 europe-west1 asia-east1; do
  gcloud run deploy genkit-app \
    --region $region \
    --source .
done

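# Use an external Application Load Balancer for global routing. This is only
# the first step: you also need a serverless NEG per region, a URL map,
# a target proxy, and a forwarding rule.
gcloud compute backend-services create genkit-backend \
  --global \
  --load-balancing-scheme=EXTERNAL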

Cost Optimization

Scale to Zero

# Allow scaling to zero (default)
gcloud run deploy genkit-app \
  --min-instances 0

CPU Allocation

# Only allocate CPU during request processing
gcloud run deploy genkit-app \
  --cpu-throttling  # Default

# Keep CPU always allocated (faster response, higher cost)
gcloud run deploy genkit-app \
  --no-cpu-throttling

Troubleshooting

Container Fails to Start

Problem: The deployment fails because the container does not start or never listens on the port Cloud Run expects. Solution: Check the logs:

gcloud run services logs read genkit-app \
  --region us-central1 \
  --limit 50
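A frequent cause is a server that binds to 127.0.0.1 or to a hard-coded port. Cloud Run routes traffic to the port given in the PORT environment variable, so the server must listen on 0.0.0.0 at that port. A Python helper sketch (the name listen_address is hypothetical) that validates PORT at startup, so a bad value fails with a clear log line:

```python
import os
from typing import Mapping, Tuple


def listen_address(environ: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Host/port a Cloud Run container should bind to.

    Binds all interfaces (127.0.0.1 is unreachable from outside the
    container) and uses $PORT, falling back to 8080 for local runs.
    """
    raw = environ.get("PORT", "8080")
    try:
        port = int(raw)
    except ValueError:
        raise RuntimeError(f"PORT must be an integer, got {raw!r}") from None
    if not 0 < port < 65536:
        raise RuntimeError(f"PORT out of range: {port}")
    return "0.0.0.0", port
```

In the FastAPI example this would replace the inline lookup: host, port = listen_address(), then uvicorn.run(app, host=host, port=port).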

Timeout Errors

Problem: Requests timeout. Solution: Increase timeout:

gcloud run deploy genkit-app \
  --timeout 540s  # 9 minutes; the maximum for services is 3600s (60 minutes)

Out of Memory

Problem: Container crashes with OOM. Solution: Increase memory:

gcloud run deploy genkit-app \
  --memory 4Gi

Best Practices

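Use health checks - Cloud Run's default startup probe is a TCP check on the container port; add a dedicated endpoint such as /health and configure it as an HTTP startup or liveness probe
Set appropriate timeouts - The default request timeout is 5 minutes; raise it with --timeout if generations can run longer
Enable tracing - Use Cloud Trace for debugging
Use secrets - Store API keys in Secret Manager and inject them with --update-secrets instead of passing plain-text values with --set-env-vars
Implement graceful shutdown - Cloud Run sends SIGTERM and allows about 10 seconds before stopping the instance, so finish in-flight requests and flush telemetry when it arrives
Monitor costs - Set up billing alerts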
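A minimal Python sketch of graceful shutdown using only the standard library, assuming a hand-rolled HTTP server (uvicorn, used in the FastAPI example, already handles SIGTERM itself). On SIGTERM it stops accepting connections, and because request threads are non-daemon, server_close() waits for in-flight requests before the process exits. Class and function names are hypothetical.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DrainingHTTPServer(ThreadingHTTPServer):
    # Non-daemon request threads, so server_close() joins in-flight requests
    daemon_threads = False


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = b'{"status":"healthy"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # silence per-request stderr logging
        pass


def install_sigterm_handler(server: ThreadingHTTPServer) -> None:
    """On SIGTERM, make serve_forever() return so the process can drain and exit.

    Must be called from the main thread (a signal-module requirement).
    server.shutdown() blocks until the serve loop exits, so it runs on a
    helper thread rather than inside the signal handler itself.
    """
    def handle_sigterm(signum, frame):
        threading.Thread(target=server.shutdown, daemon=True).start()

    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(port: int) -> None:
    server = DrainingHTTPServer(("0.0.0.0", port), HealthHandler)
    install_sigterm_handler(server)
    server.serve_forever()
    server.server_close()  # waits for in-flight request threads to finish
```

Start it with serve(int(os.environ.get("PORT", "8080"))). Any cleanup, such as flushing telemetry, belongs after server_close() and must fit inside Cloud Run's shutdown window.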
