Overview

Vertex AI Agent Engine is a fully managed platform for deploying AI agents at scale. It handles infrastructure, scaling, and operational complexity so you can focus on building agent logic. Key features:
  • Automatic scaling from zero to millions of requests
  • Built-in Memory Bank for persistent agent memory
  • Support for ADK, LangGraph, and custom frameworks
  • Terraform deployment automation
  • Integrated monitoring and logging
Agent Engine includes a free Express Mode for 90 days with no billing account required—perfect for learning and prototyping.

Deployment Methods

Agent Engine supports three deployment approaches:

Express Mode

Free & Fast
  • No billing account for 90 days
  • Simple API key authentication
  • Deploy with ADK CLI
  • Perfect for learning

Agent Object

Interactive Development
  • Create agents in notebooks
  • Direct deployment from code
  • Ideal for experimentation
  • Requires Cloud Storage bucket

Inline Source

Production CI/CD
  • Deploy from source files
  • Version control friendly
  • Terraform compatible
  • No Cloud Storage needed

Express Mode Deployment

The fastest way to deploy your first agent:
Step 1: Get API Key

  1. Sign up at console.cloud.google.com/expressmode
  2. Navigate to APIs & Services > Credentials
  3. Copy your Generative Language API Key
Step 2: Install ADK

pip install google-adk
Step 3: Create Agent

adk create my_agent --api_key=YOUR_API_KEY
This creates a directory with:
  • agent.py - Agent definition
  • requirements.txt - Dependencies
  • .adk/config.json - Configuration
Step 4: Deploy

adk deploy agent_engine my_agent
Deployment takes 5-10 minutes. You’ll receive a resource name like:
projects/123.../locations/us-central1/reasoningEngines/456...
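The resource name encodes the project, location, and engine ID. A small helper can split it into its parts (a sketch; the full resource name below is hypothetical, for illustration only):

```python
def parse_engine_name(resource_name: str) -> dict:
    """Split a reasoning-engine resource name into its components."""
    # Expected shape: projects/P/locations/L/reasoningEngines/ID
    parts = resource_name.split("/")
    return {
        "project": parts[1],
        "location": parts[3],
        "engine_id": parts[5],
    }

# Hypothetical resource name for illustration:
name = "projects/123456/locations/us-central1/reasoningEngines/7890"
info = parse_engine_name(name)
print(info["location"])  # us-central1
```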
Step 5: Query Your Agent

import vertexai

client = vertexai.Client(api_key=api_key)
agent = client.agent_engines.get(name=agent_resource_name)

async for item in agent.async_stream_query(
    message="What are the latest AI announcements from Google?",
    user_id="demo_user",
):
    if "content" in item and item["content"]:
        for part in item["content"]["parts"]:
            if "text" in part:
                print(part["text"], end="", flush=True)
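Each streamed item is a plain dict. A small helper (a sketch, assuming only the item shape shown in the loop above) can pull out just the text parts:

```python
def extract_text(item: dict) -> str:
    """Concatenate the text parts of one streamed response item."""
    if not item.get("content"):
        return ""
    return "".join(
        part["text"] for part in item["content"]["parts"] if "text" in part
    )

# Example item shaped like the stream above:
item = {"content": {"parts": [{"text": "Hello"}, {"text": " world"}]}}
print(extract_text(item))  # Hello world
```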

Agent Object Deployment

Deploy agents created in notebooks or Python scripts:
from google.adk.agents import LlmAgent
from google.adk.tools import google_search
from vertexai import agent_engines

# Create agent in memory
agent = LlmAgent(
    name="search_agent",
    model="gemini-2.5-flash",
    description="A production agent that can search the web",
    instruction="Use Google Search for fresh information. Cite sources.",
    tools=[google_search],
)

# Wrap in AdkApp for deployment
adk_app = agent_engines.AdkApp(
    agent=agent,
    enable_tracing=True,
)
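The wrapped app still has to be deployed. A sketch of that step, mirroring the `client.agent_engines.create` call used in the multi-agent example later on; the display name, bucket, and config keys here are placeholder assumptions, not confirmed API values:

```python
# Placeholder deployment config; adjust for your project. The bucket is
# needed because Agent Object deployment requires Cloud Storage staging.
deployment_config = {
    "display_name": "Search Agent",
    "staging_bucket": "gs://your-bucket",
    "requirements": ["google-cloud-aiplatform[adk,agent_engines]"],
}

# With a configured client (see Step 5 above), the deploy call would be:
# remote = client.agent_engines.create(agent=adk_app, config=deployment_config)
print(sorted(deployment_config))  # ['display_name', 'requirements', 'staging_bucket']
```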

Inline Source Deployment

Deploy agents from source files for CI/CD pipelines:
Step 1: Project Structure

my_agent/
├── agent_package/
│   ├── __init__.py
│   └── agent.py          # Agent definition
├── deployment/
│   └── deploy.py         # AdkApp wrapper
└── requirements.txt
Step 2: Define Agent

agent_package/agent.py
from google.adk.agents import LlmAgent
from google.adk.tools import google_search

root_agent = LlmAgent(
    name="academic_research",
    model="gemini-2.5-flash",
    description="Answer academic research questions",
    instruction="Search for scholarly information",
    tools=[google_search],
)
Step 3: Create Deployment Wrapper

deployment/deploy.py
from vertexai import agent_engines
from agent_package.agent import root_agent

adk_app = agent_engines.AdkApp(
    agent=root_agent,
    enable_tracing=True,
)
Step 4: Deploy

import vertexai

client = vertexai.Client(
    project="your-project",
    location="us-central1"
)

agent = client.agent_engines.create(
    config={
        "display_name": "Academic Research Agent",
        "source_packages": ["agent_package", "deployment", "requirements.txt"],
        "entrypoint_module": "deployment.deploy",
        "entrypoint_object": "adk_app",
        "class_methods": [
            {
                "name": "async_stream_query",
                "api_mode": "async_stream",
                "description": "Stream responses",
            },
            {
                "name": "async_create_session",
                "api_mode": "async",
                "description": "Create session",
            },
        ],
    },
)

Memory Bank Integration

Memory Bank provides persistent, context-aware memory for agents:

What is Memory Bank?

Memory Bank is a managed service that gives agents the ability to:
  • Remember user preferences and context across sessions
  • Consolidate information like a human brain during sleep
  • Retrieve relevant memories for personalized interactions
  • Scale to millions of users with automatic memory management
Memory Bank uses Gemini models to automatically extract, consolidate, and retrieve memories from conversations—no manual memory management required.

Creating an Agent with Memory Bank

import vertexai
from vertexai import types

# Configuration aliases
MemoryBankConfig = types.ReasoningEngineContextSpecMemoryBankConfig
SimilaritySearchConfig = types.ReasoningEngineContextSpecMemoryBankConfigSimilaritySearchConfig
GenerationConfig = types.ReasoningEngineContextSpecMemoryBankConfigGenerationConfig

client = vertexai.Client(project=PROJECT_ID, location=LOCATION)

# Create Memory Bank configuration
memory_config = MemoryBankConfig(
    similarity_search_config=SimilaritySearchConfig(
        embedding_model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/text-embedding-005"
    ),
    generation_config=GenerationConfig(
        model=f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/gemini-2.5-flash"
    ),
)

# Create Agent Engine with Memory Bank
agent_engine = client.agent_engines.create(
    config={"context_spec": {"memory_bank_config": memory_config}}
)

Memory Retrieval Methods

Scope-Based

Get all memories for a user. Use when:
  • Building user profiles
  • Displaying preference dashboards
  • Small number of memories
results = memories.retrieve(
    scope={"user_id": user_id}
)

Similarity Search

Get relevant memories for specific questions. Use when:
  • Answering targeted questions
  • User has many memories
  • Need fast, focused responses
results = memories.retrieve(
    scope={"user_id": user_id},
    similarity_search_params={
        "search_query": "dietary needs?",
        "top_k": 3,
    },
)
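Conceptually, similarity search embeds the query and ranks stored memories by vector similarity, returning only the top matches. A minimal local sketch of that idea (plain cosine similarity over toy three-dimensional vectors, not the actual Memory Bank implementation, which uses text-embedding-005):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy memory embeddings (real embeddings have hundreds of dimensions).
memories = {
    "user is vegetarian": [0.9, 0.1, 0.0],
    "user lives in Berlin": [0.0, 0.2, 0.9],
    "user is allergic to nuts": [0.7, 0.5, 0.2],
}
query = [0.85, 0.2, 0.05]  # toy embedding of "dietary needs?"

# Rank memories by similarity to the query and keep the top 2.
top_k = sorted(memories, key=lambda m: cosine(memories[m], query), reverse=True)[:2]
print(top_k)  # ['user is vegetarian', 'user is allergic to nuts']
```

The diet-related memories rank first; the location memory is filtered out, which is why similarity search stays fast and focused even when a user has many memories.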

Terraform Deployment

Automate infrastructure and agent deployment:
module "agent_engine_project" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0"

  name              = "agent-engine-demo"
  billing_account   = var.billing_account
  org_id            = var.org_id
  
  activate_apis = [
    "aiplatform.googleapis.com",
    "cloudaicompanion.googleapis.com",
  ]
}

resource "google_vertex_ai_agent_engine" "demo_agent" {
  project  = module.agent_engine_project.project_id
  location = "us-central1"
  
  display_name = "Demo Agent"
  
  reasoning_engine {
    source_code {
      inline_source {
        source_packages     = ["agent_package"]
        entrypoint_module   = "agent_package.agent"
        entrypoint_object   = "root_agent"
      }
    }
  }
}

Multi-Agent Systems

Deploy orchestrated multi-agent architectures:
Multi-Agent with Claude
from google.adk.agents import LlmAgent
from vertexai import agent_engines

# Create specialized agents
research_agent = LlmAgent(
    name="researcher",
    model="claude-4-sonnet@20250514",
    instruction="Search and synthesize research",
)

writing_agent = LlmAgent(
    name="writer",
    model="gemini-2.5-flash",
    instruction="Create polished content",
)

# Orchestrator routes work to its sub-agents
root_agent = LlmAgent(
    name="orchestrator",
    model="gemini-2.5-flash",
    instruction="Route tasks to specialized agents",
    sub_agents=[research_agent, writing_agent],
)

# Deploy (assumes an existing client and deployment_config,
# as in the Inline Source example above)
adk_app = agent_engines.AdkApp(agent=root_agent)
remote = client.agent_engines.create(agent=adk_app, config=deployment_config)

Monitoring and Observability

Cloud Logging

All agent requests are automatically logged to Cloud Logging:
gcloud logging read \
  "resource.type=vertex_ai_agent_engine"

Cloud Trace

Enable tracing for performance monitoring:
adk_app = agent_engines.AdkApp(
    agent=agent,
    enable_tracing=True,
)

Custom Metrics

Export custom metrics to Cloud Monitoring. Track:
  • Request latency
  • Token usage
  • Error rates
  • Memory operations
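Before wiring up Cloud Monitoring exports, it can help to see what these metrics look like in practice. A minimal in-process sketch (a toy aggregator, not the Cloud Monitoring API) tracking the quantities listed above:

```python
from collections import defaultdict

class AgentMetrics:
    """Toy in-memory aggregator for request latency, tokens, and errors."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.latency_ms = []

    def record(self, latency_ms: float, tokens: int, error: bool = False):
        """Record one agent request."""
        self.latency_ms.append(latency_ms)
        self.counts["requests"] += 1
        self.counts["tokens"] += tokens
        if error:
            self.counts["errors"] += 1

    def summary(self) -> dict:
        """Aggregate values you might export as custom metrics."""
        n = self.counts["requests"]
        return {
            "requests": n,
            "avg_latency_ms": sum(self.latency_ms) / n if n else 0.0,
            "tokens": self.counts["tokens"],
            "error_rate": self.counts["errors"] / n if n else 0.0,
        }

m = AgentMetrics()
m.record(120.0, tokens=350)
m.record(480.0, tokens=900, error=True)
print(m.summary())
```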

Audit Logs

Compliance-ready audit trails
  • Who deployed what
  • Configuration changes
  • Access patterns

Best Practices

1. Use Express Mode for Learning: Start with Express Mode to understand Agent Engine without billing setup.
2. Version Control Your Agents: Use Inline Source deployment for production to maintain agent code in Git.
3. Implement Sessions: Always create sessions for stateful conversations with Memory Bank.
4. Monitor Token Usage: Track Gemini token consumption through Cloud Monitoring for cost optimization.
5. Use Similarity Search: For users with extensive history, use similarity search instead of retrieving all memories.

Next Steps

ADK Documentation

Learn how to build agents with the Agent Development Kit

Memory Bank Guide

Deep dive into Memory Bank capabilities

Terraform Examples

Infrastructure-as-code templates

Multi-Agent Patterns

Build collaborative agent systems