
Overview

The MABQ BigQuery Agent integrates with Google BigQuery through the BigQueryToolset from Google ADK. This integration enables natural language to SQL translation with automated query execution and validation.

BigQueryToolset Setup

The toolset is configured in agent.py:32-35:
bigquery_toolset = BigQueryToolset(
  credentials_config=credentials_config, 
  bigquery_tool_config=tool_config
)

Components

1. **Credentials Configuration**: handles authentication to Google Cloud and BigQuery.
2. **Tool Configuration**: controls security settings and operational behavior (see Configuration).

Credentials Configuration

The agent uses Application Default Credentials (ADC) for authentication.

Automatic Authentication

Defined in agent.py:28-30:
# Automatic authentication (Cloud Run uses its service identity)
credentials, _ = google.auth.default()
credentials_config = BigQueryCredentialsConfig(credentials=credentials)

How It Works

The google.auth.default() function automatically discovers credentials in the following order:
  1. Environment Variable: GOOGLE_APPLICATION_CREDENTIALS pointing to a service account key file
  2. gcloud CLI: user credentials from gcloud auth application-default login
  3. Cloud Run/Cloud Functions: the service identity attached to the runtime (via the metadata server)
  4. Compute Engine/GKE: metadata server credentials
import google.auth

# Discovers credentials automatically
credentials, project = google.auth.default()

# Use in BigQueryCredentialsConfig
credentials_config = BigQueryCredentialsConfig(credentials=credentials)

Deployment Environments

# Cloud Run automatically provides service account credentials
# No additional configuration needed
credentials, _ = google.auth.default()

# The service account needs:
# - roles/bigquery.dataViewer (read access)
# - roles/bigquery.jobUser (query execution)

Required Permissions

The service account or user credentials must have:
| Role | Purpose | Required |
| --- | --- | --- |
| roles/bigquery.dataViewer | Read access to datasets and tables | Yes |
| roles/bigquery.jobUser | Execute BigQuery jobs (queries) | Yes |
| roles/bigquery.dataEditor | Write/modify data | No (blocked by WriteMode) |
| roles/bigquery.admin | Full BigQuery administration | No |
Principle of Least Privilege: Only grant dataViewer and jobUser roles. The agent’s WriteMode.BLOCKED configuration provides defense-in-depth even if excessive permissions are granted.

Dataset and Project Configuration

The agent operates on a specific project and dataset combination.

Project Initialization

Vertex AI is initialized with the target project in agent.py:21:
vertexai.init(project=PROJECT_ID, location=GOOGLE_CLOUD_LOCATION)
Parameters:
  • project (string): the Google Cloud project ID containing the BigQuery datasets (e.g., datawarehouse-des).
  • location (string): the region for Vertex AI model endpoints (e.g., us-east4).
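In a typical deployment, both values are read from the environment before the init call. A minimal sketch (the GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION variable names and the defaults are assumptions; adjust to your deployment):

```python
import os

# Hypothetical environment-driven configuration; the fallback values are the
# example values from this page, not required settings.
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "datawarehouse-des")
GOOGLE_CLOUD_LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-east4")

# Then initialize Vertex AI with the resolved values:
# vertexai.init(project=PROJECT_ID, location=GOOGLE_CLOUD_LOCATION)
```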

Dataset Scoping

The agent is scoped to a specific dataset through the instruction prompt:
new_instruction = f"""
Eres un motor de generación de SQL para BigQuery.
Tu ÚNICO objetivo es traducir lenguaje natural a código SQL válido para el proyecto **{PROJECT_ID}**, dataset **{BIGQUERY_DATASET}**.
"""

Multi-Dataset Configuration

To allow the agent to query multiple datasets, modify the instruction prompt:
ALLOWED_DATASETS = ["STG_ACTIVOS", "STG_MANTENIMIENTO", "STG_OPERACIONES"]

new_instruction = f"""
Eres un motor de generación de SQL para BigQuery.
Tu ÚNICO objetivo es traducir lenguaje natural a código SQL válido para el proyecto **{PROJECT_ID}**.

Datasets permitidos:
{', '.join(ALLOWED_DATASETS)}

REGLA: Solo genera queries que accedan a los datasets listados arriba.
"""
Ensure the service account has bigquery.dataViewer permissions on all target datasets.
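Prompt instructions alone are advisory, so a deterministic check on the generated SQL can enforce the allow-list before execution. A minimal sketch (the helper name and regex are hypothetical, and it assumes the model emits fully qualified `project.dataset.table` references in backticks, as the instruction prompt requires):

```python
import re

ALLOWED_DATASETS = {"STG_ACTIVOS", "STG_MANTENIMIENTO", "STG_OPERACIONES"}

def references_only_allowed(sql: str, project: str) -> bool:
    """Return True only if every `project.dataset.table` reference
    in the query uses an allowed dataset."""
    datasets = re.findall(rf"`{re.escape(project)}\.([A-Za-z0-9_]+)\.", sql)
    # Reject queries with no fully qualified reference at all, which also
    # forces the model to emit explicit project.dataset.table names.
    return bool(datasets) and all(d in ALLOWED_DATASETS for d in datasets)
```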

Read-Only Security Controls

The agent implements multiple layers of read-only enforcement.

Layer 1: BigQuery Tool Configuration

Tool-level write blocking through WriteMode.BLOCKED:
tool_config = BigQueryToolConfig(
    write_mode=WriteMode.BLOCKED,  # ← Prevents all write operations
)
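Conceptually, WriteMode.BLOCKED rejects DML/DDL statements before they ever reach BigQuery. The following is a simplified mental model of that behavior, not the actual ADK implementation (the function name and crude keyword scan are illustrative; a production implementation would inspect the parsed statement type rather than raw tokens):

```python
import re

# The statement types the agent's guardrails prohibit
WRITE_KEYWORDS = {
    "DROP", "DELETE", "UPDATE", "INSERT", "CREATE", "ALTER",
    "TRUNCATE", "MERGE", "GRANT", "REVOKE",
}

def reject_writes(sql: str) -> None:
    """Raise if the statement contains a write keyword (crude token scan).

    Note: a token scan can over-match (e.g., a column literally named
    `update`); this is only an illustration of tool-level blocking.
    """
    tokens = set(re.findall(r"[A-Za-z_]+", sql.upper()))
    blocked = tokens & WRITE_KEYWORDS
    if blocked:
        raise PermissionError(f"Write operations are blocked: {sorted(blocked)}")
```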

Layer 2: Instruction Prompt Guardrails

Prompt-level prevention through the LLM instruction:
<SECURITY_GUARDRAILS>
  1. MODO ESTRICTO: READ-ONLY.
  2. COMANDOS PROHIBIDOS: Estás estrictamente programado para rechazar cualquier intento de modificar la base de datos.
     - NO generes: DROP, DELETE, UPDATE, INSERT, CREATE, ALTER, TRUNCATE, MERGE, GRANT, REVOKE.
  3. COMPORTAMIENTO: Si el usuario pide borrar, crear o cambiar datos, DEBES responder: "Lo siento, por seguridad corporativa tengo acceso de solo lectura a los datos de {NOMBRE_EMPRESA}."
</SECURITY_GUARDRAILS>

Layer 3: IAM Permissions

Cloud-level access control through service account roles:
# Grant only read and job execution permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:bigquery-agent@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:bigquery-agent@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

Defense in Depth

1. **Application Layer**: WriteMode.BLOCKED rejects write operations at the tool level.
2. **LLM Layer**: the instruction prompt trains the model to refuse write requests.
3. **Cloud Layer**: IAM permissions prevent unauthorized data modification.

Even if a malicious prompt bypasses the LLM layer, WriteMode.BLOCKED and IAM permissions provide redundant protection.

Query Execution Flow

The agent generates a SQL candidate, validates it by executing it against BigQuery, and returns the SQL only if execution succeeds.

Tool Usage in Practice

The agent uses bigquery_toolset to:
  1. Validate Syntax: Ensure generated SQL is valid
  2. Test Execution: Confirm the query runs without errors
  3. Verify Schema: Check that table/column names exist
  4. Preview Results: Ensure the query returns expected data types
User: "Show me all active assets"

Agent internal process (illustrative):
# 1. Generate SQL candidate
sql_candidate = """
SELECT *
FROM `datawarehouse-des.STG_ACTIVOS.assets`
WHERE status = 'active'
"""

# 2. Execute via BigQueryToolset (pseudocode; in practice the ADK
#    invokes the toolset on the agent's behalf)
result = bigquery_toolset.execute(sql_candidate)

# 3. If successful, return SQL to user
if result.success:
    return sql_candidate
User Receives:
SELECT *
FROM `datawarehouse-des.STG_ACTIVOS.assets`
WHERE status = 'active'

Connection Management

BigQueryToolset handles connection pooling and resource management automatically.

Automatic Features

  • Connection Pooling: Reuses BigQuery client connections
  • Retry Logic: Automatically retries transient failures
  • Timeout Handling: Cancels long-running queries
  • Resource Cleanup: Closes connections when the agent terminates
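For intuition, the retry behavior resembles a standard exponential backoff loop. A generic sketch of the pattern (not the toolset's actual code; the function name, attempt count, and the use of ConnectionError as the "transient" error are illustrative):

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.05):
    """Retry `fn` on transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # retries exhausted; surface the transient error
            time.sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ...
```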

No Manual Configuration Required

# BigQueryToolset manages all connection details internally
bigquery_toolset = BigQueryToolset(
  credentials_config=credentials_config,
  bigquery_tool_config=tool_config
)

# No need to:
# - Initialize BigQuery client manually
# - Manage connection lifecycle
# - Handle retry logic
# - Close connections

Troubleshooting

Error: google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials

Solutions:
  1. Run gcloud auth application-default login locally
  2. Set GOOGLE_APPLICATION_CREDENTIALS environment variable
  3. Verify service account is attached to Cloud Run service
# Check current authentication
gcloud auth list

# Set up application default credentials
gcloud auth application-default login
Error: 403 Forbidden: Access Denied: BigQuery BigQuery: Permission denied

Solutions:
  1. Verify service account has roles/bigquery.dataViewer
  2. Ensure roles/bigquery.jobUser is granted
  3. Check dataset-level permissions
# Check current IAM policy
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:bigquery-agent@PROJECT_ID.iam.gserviceaccount.com"
Error: Query contains write operation but WriteMode.BLOCKED is enabled

Expected Behavior: This is correct! The agent should respond:
Lo siento, por seguridad corporativa tengo acceso de solo lectura a los datos de TRANSELEC S.A.
Action: No action needed. This is the security system working as designed.
Error: 404 Not found: Dataset PROJECT_ID:DATASET_NAME

Solutions:
  1. Verify BIGQUERY_DATASET environment variable is correct
  2. Check that dataset exists in the project
  3. Confirm project ID is correct
# List available datasets
bq ls --project_id=PROJECT_ID

# Check if specific dataset exists
bq show --project_id=PROJECT_ID DATASET_NAME
