
Overview

The MABQ BigQuery Agent integrates with Google BigQuery through the BigQueryToolset from Google ADK. This integration enables natural language to SQL translation with automated query execution and validation.

BigQueryToolset Setup

The toolset is configured in agent.py:32-35:
bigquery_toolset = BigQueryToolset(
  credentials_config=credentials_config, 
  bigquery_tool_config=tool_config
)

Components

1. **Credentials Configuration**: handles authentication to Google Cloud and BigQuery.
2. **Tool Configuration**: controls security settings and operational behavior (see Configuration).

Credentials Configuration

The agent uses Application Default Credentials (ADC) for authentication.

Automatic Authentication

Defined in agent.py:28-30:
# Automatic authentication (Cloud Run uses its service identity)
credentials, _ = google.auth.default()
credentials_config = BigQueryCredentialsConfig(credentials=credentials)

How It Works

The google.auth.default() function automatically discovers credentials in the following order:
  1. Environment Variable: GOOGLE_APPLICATION_CREDENTIALS pointing to a service account key file
  2. gcloud CLI: user credentials from gcloud auth application-default login
  3. Cloud Run/Cloud Functions: the service identity attached to the runtime (via the metadata server)
  4. Compute Engine/GKE: metadata server credentials
import google.auth

# Discovers credentials automatically
credentials, project = google.auth.default()

# Use in BigQueryCredentialsConfig
credentials_config = BigQueryCredentialsConfig(credentials=credentials)

Deployment Environments

# Cloud Run automatically provides service account credentials
# No additional configuration needed
credentials, _ = google.auth.default()

# The service account needs:
# - roles/bigquery.dataViewer (read access)
# - roles/bigquery.jobUser (query execution)

Required Permissions

The service account or user credentials must have:
| Role | Purpose | Required |
| --- | --- | --- |
| roles/bigquery.dataViewer | Read access to datasets and tables | Yes |
| roles/bigquery.jobUser | Execute BigQuery jobs (queries) | Yes |
| roles/bigquery.dataEditor | Write/modify data | No (blocked by WriteMode) |
| roles/bigquery.admin | Full BigQuery administration | No |
Principle of Least Privilege: Only grant dataViewer and jobUser roles. The agent’s WriteMode.BLOCKED configuration provides defense-in-depth even if excessive permissions are granted.

Dataset and Project Configuration

The agent operates on a specific project and dataset combination.

Project Initialization

Vertex AI is initialized with the target project in agent.py:21:
vertexai.init(project=PROJECT_ID, location=GOOGLE_CLOUD_LOCATION)
Parameters:
  • project (string): the Google Cloud project ID containing the BigQuery datasets (e.g., datawarehouse-des).
  • location (string): the region for Vertex AI model endpoints (e.g., us-east4).
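In a typical deployment, both values are read from the environment before the init call. A minimal sketch (the GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION variable names and the defaults are assumptions; adjust to your deployment):

```python
import os

# Hypothetical environment-driven configuration; the fallback values are the
# example values from this page, not required settings.
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "datawarehouse-des")
GOOGLE_CLOUD_LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-east4")

# Then initialize Vertex AI with the resolved values:
# vertexai.init(project=PROJECT_ID, location=GOOGLE_CLOUD_LOCATION)
```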

Dataset Scoping

The agent is scoped to a specific dataset through the instruction prompt:
new_instruction = f"""
Eres un motor de generación de SQL para BigQuery.
Tu ÚNICO objetivo es traducir lenguaje natural a código SQL válido para el proyecto **{PROJECT_ID}**, dataset **{BIGQUERY_DATASET}**.
"""

Multi-Dataset Configuration

To allow the agent to query multiple datasets, modify the instruction prompt:
ALLOWED_DATASETS = ["STG_ACTIVOS", "STG_MANTENIMIENTO", "STG_OPERACIONES"]

new_instruction = f"""
Eres un motor de generación de SQL para BigQuery.
Tu ÚNICO objetivo es traducir lenguaje natural a código SQL válido para el proyecto **{PROJECT_ID}**.

Datasets permitidos:
{', '.join(ALLOWED_DATASETS)}

REGLA: Solo genera queries que accedan a los datasets listados arriba.
"""
Ensure the service account has bigquery.dataViewer permissions on all target datasets.
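Prompt instructions alone are advisory, so a deterministic check on the generated SQL can enforce the allow-list before execution. A minimal sketch (the helper name and regex are hypothetical, and it assumes the model emits fully qualified `project.dataset.table` references in backticks, as the instruction prompt requires):

```python
import re

ALLOWED_DATASETS = {"STG_ACTIVOS", "STG_MANTENIMIENTO", "STG_OPERACIONES"}

def references_only_allowed(sql: str, project: str) -> bool:
    """Return True only if every `project.dataset.table` reference
    in the query uses an allowed dataset."""
    datasets = re.findall(rf"`{re.escape(project)}\.([A-Za-z0-9_]+)\.", sql)
    # Reject queries with no fully qualified reference at all, which also
    # forces the model to emit explicit project.dataset.table names.
    return bool(datasets) and all(d in ALLOWED_DATASETS for d in datasets)
```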

Read-Only Security Controls

The agent implements multiple layers of read-only enforcement.

Layer 1: BigQuery Tool Configuration

Tool-level write blocking through WriteMode.BLOCKED:
tool_config = BigQueryToolConfig(
    write_mode=WriteMode.BLOCKED,  # ← Prevents all write operations
)
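Conceptually, WriteMode.BLOCKED rejects DML/DDL statements before they ever reach BigQuery. The following is a simplified mental model of that behavior, not the actual ADK implementation (the function name and crude keyword scan are illustrative; a production implementation would inspect the parsed statement type rather than raw tokens):

```python
import re

# The statement types the agent's guardrails prohibit
WRITE_KEYWORDS = {
    "DROP", "DELETE", "UPDATE", "INSERT", "CREATE", "ALTER",
    "TRUNCATE", "MERGE", "GRANT", "REVOKE",
}

def reject_writes(sql: str) -> None:
    """Raise if the statement contains a write keyword (crude token scan).

    Note: a token scan can over-match (e.g., a column literally named
    `update`); this is only an illustration of tool-level blocking.
    """
    tokens = set(re.findall(r"[A-Za-z_]+", sql.upper()))
    blocked = tokens & WRITE_KEYWORDS
    if blocked:
        raise PermissionError(f"Write operations are blocked: {sorted(blocked)}")
```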

Layer 2: Instruction Prompt Guardrails

Prompt-level prevention through the LLM instruction:
<SECURITY_GUARDRAILS>
  1. MODO ESTRICTO: READ-ONLY.
  2. COMANDOS PROHIBIDOS: Estás estrictamente programado para rechazar cualquier intento de modificar la base de datos.
     - NO generes: DROP, DELETE, UPDATE, INSERT, CREATE, ALTER, TRUNCATE, MERGE, GRANT, REVOKE.
  3. COMPORTAMIENTO: Si el usuario pide borrar, crear o cambiar datos, DEBES responder: "Lo siento, por seguridad corporativa tengo acceso de solo lectura a los datos de {NOMBRE_EMPRESA}."
</SECURITY_GUARDRAILS>

Layer 3: IAM Permissions

Cloud-level access control through service account roles:
# Grant only read and job execution permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:bigquery-agent@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:bigquery-agent@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

Defense in Depth

1. **Application Layer**: WriteMode.BLOCKED rejects write operations at the tool level.
2. **LLM Layer**: the instruction prompt trains the model to refuse write requests.
3. **Cloud Layer**: IAM permissions prevent unauthorized data modification.

Even if a malicious prompt bypasses the LLM layer, WriteMode.BLOCKED and IAM permissions provide redundant protection.

Query Execution Flow

The agent generates a SQL candidate, validates it by executing it against BigQuery, and returns the SQL only if execution succeeds.

Tool Usage in Practice

The agent uses bigquery_toolset to:
  1. Validate Syntax: Ensure generated SQL is valid
  2. Test Execution: Confirm the query runs without errors
  3. Verify Schema: Check that table/column names exist
  4. Preview Results: Ensure the query returns expected data types
User: "Show me all active assets"

Agent internal process (illustrative):
# 1. Generate SQL candidate
sql_candidate = """
SELECT *
FROM `datawarehouse-des.STG_ACTIVOS.assets`
WHERE status = 'active'
"""

# 2. Execute via BigQueryToolset (pseudocode; in practice the ADK
#    invokes the toolset on the agent's behalf)
result = bigquery_toolset.execute(sql_candidate)

# 3. If successful, return SQL to user
if result.success:
    return sql_candidate
User Receives:
SELECT *
FROM `datawarehouse-des.STG_ACTIVOS.assets`
WHERE status = 'active'

Connection Management

BigQueryToolset handles connection pooling and resource management automatically.

Automatic Features

  • Connection Pooling: Reuses BigQuery client connections
  • Retry Logic: Automatically retries transient failures
  • Timeout Handling: Cancels long-running queries
  • Resource Cleanup: Closes connections when the agent terminates
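For intuition, the retry behavior resembles a standard exponential backoff loop. A generic sketch of the pattern (not the toolset's actual code; the function name, attempt count, and the use of ConnectionError as the "transient" error are illustrative):

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.05):
    """Retry `fn` on transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # retries exhausted; surface the transient error
            time.sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ...
```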

No Manual Configuration Required

# BigQueryToolset manages all connection details internally
bigquery_toolset = BigQueryToolset(
  credentials_config=credentials_config,
  bigquery_tool_config=tool_config
)

# No need to:
# - Initialize BigQuery client manually
# - Manage connection lifecycle
# - Handle retry logic
# - Close connections

Troubleshooting

Error: google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials

Solutions:
  1. Run gcloud auth application-default login locally
  2. Set GOOGLE_APPLICATION_CREDENTIALS environment variable
  3. Verify service account is attached to Cloud Run service
# Check current authentication
gcloud auth list

# Set up application default credentials
gcloud auth application-default login
Error: 403 Forbidden: Access Denied: BigQuery BigQuery: Permission denied

Solutions:
  1. Verify service account has roles/bigquery.dataViewer
  2. Ensure roles/bigquery.jobUser is granted
  3. Check dataset-level permissions
# Check current IAM policy
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:bigquery-agent@PROJECT_ID.iam.gserviceaccount.com"
Error: Query contains write operation but WriteMode.BLOCKED is enabled

Expected Behavior: This is correct! The agent should respond:
Lo siento, por seguridad corporativa tengo acceso de solo lectura a los datos de TRANSELEC S.A.
Action: No action needed. This is the security system working as designed.
Error: 404 Not found: Dataset PROJECT_ID:DATASET_NAME

Solutions:
  1. Verify BIGQUERY_DATASET environment variable is correct
  2. Check that dataset exists in the project
  3. Confirm project ID is correct
# List available datasets
bq ls --project_id=PROJECT_ID

# Check if specific dataset exists
bq show --project_id=PROJECT_ID DATASET_NAME
