
Environment Variables

The agent reads configuration from environment variables with sensible defaults defined in agent.py:10-18.

Project and Dataset Configuration

PROJECT_ID (string, default: "datawarehouse-des")
The Google Cloud project ID containing the BigQuery datasets.
PROJECT_ID = os.getenv("PROJECT_ID", "datawarehouse-des")
BIGQUERY_DATASET (string, default: "STG_ACTIVOS")
The target BigQuery dataset for agent queries.
BIGQUERY_DATASET = os.getenv("BIGQUERY_DATASET", "STG_ACTIVOS")
GOOGLE_CLOUD_LOCATION (string, default: "us-east4")
The Google Cloud region for Vertex AI operations.
GOOGLE_CLOUD_LOCATION = os.getenv("GOOGLE_CLOUD_LOCATION", "us-east4")
NOMBRE_EMPRESA (string, default: "TRANSELEC S.A.")
The organization name used in security rejection messages.
NOMBRE_EMPRESA = os.getenv("NOMBRE_EMPRESA", "TRANSELEC S.A.")
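Because each variable is read with os.getenv and a fallback, any value set in the environment takes precedence over the default. A minimal sketch of that precedence (the override value here is purely illustrative):

```python
import os

# Simulate a deployment that overrides only PROJECT_ID;
# the other variable falls back to its default.
os.environ["PROJECT_ID"] = "datawarehouse-prod"
os.environ.pop("BIGQUERY_DATASET", None)  # ensure no override is present

PROJECT_ID = os.getenv("PROJECT_ID", "datawarehouse-des")
BIGQUERY_DATASET = os.getenv("BIGQUERY_DATASET", "STG_ACTIVOS")

print(PROJECT_ID)        # value from the environment
print(BIGQUERY_DATASET)  # falls back to the default
```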

Model Configuration

ANALYTICS_AGENT_MODEL (string, default: "gemini-2.5-pro")
Legacy model configuration variable (retained for compatibility).
ANALYTICS_AGENT_MODEL = os.getenv("ANALYTICS_AGENT_MODEL", "gemini-2.5-pro")
LLM_1_NAME (string, default: "bigquery_agent_stg_activos")
The internal name identifier for the agent.
LLM_1_NAME = os.getenv("LLM_1_NAME", "bigquery_agent_stg_activos")
LLM_1_MODELO (string, default: "gemini-2.5-pro")
The Vertex AI model used by the agent. This is the active model configuration.
LLM_1_MODELO = os.getenv("LLM_1_MODELO", "gemini-2.5-pro")

Model Selection

The agent uses Gemini 2.5 Pro as the default language model.

Supported Models

While gemini-2.5-pro is the default, you can configure any Vertex AI model by setting the LLM_1_MODELO environment variable:
export LLM_1_MODELO="gemini-2.5-pro"

Model Selection Criteria

When choosing a model, consider:
  • Gemini 2.5 Pro: Best for complex SQL generation and reasoning
  • Gemini 2.5 Flash: Faster responses, suitable for simpler queries
  • Gemini 1.5 Pro: Balance of performance and cost
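The model is resolved from the LLM_1_MODELO variable at startup. A hedged sketch of that resolution with a sanity check against the models listed above (resolve_model and KNOWN_MODELS are hypothetical helpers, not part of agent.py):

```python
import os

# Models discussed in this guide; any Vertex AI model name is accepted,
# this set only drives an advisory warning.
KNOWN_MODELS = {"gemini-2.5-pro", "gemini-2.5-flash", "gemini-1.5-pro"}

def resolve_model(default: str = "gemini-2.5-pro") -> str:
    """Read the model from the environment, falling back to the default."""
    model = os.getenv("LLM_1_MODELO", default)
    if model not in KNOWN_MODELS:
        print(f"warning: {model!r} is not a model this guide covers")
    return model

os.environ.pop("LLM_1_MODELO", None)  # no override set in this demo
print(resolve_model())  # falls back to the default
```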

Tool Configuration

The agent’s BigQuery integration is configured with strict security controls.

BigQueryToolConfig

Defined in agent.py:24-26:
tool_config = BigQueryToolConfig(
    write_mode=WriteMode.BLOCKED,
)
write_mode (WriteMode, default: WriteMode.BLOCKED)
Controls write access to BigQuery. Set to WriteMode.BLOCKED to enforce read-only operations.

WriteMode Options

The WriteMode enum provides the following security levels:
Mode                 Description                      Use Case
WriteMode.BLOCKED    Prevents all write operations    Production analytics (current setting)
WriteMode.ALLOWED    Permits write operations         Development/testing environments
Security Critical: The agent is configured with WriteMode.BLOCKED to prevent any data modification. Changing this setting could compromise data integrity.
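Since exercising the real toolset requires a live BigQuery connection, the contract that WriteMode.BLOCKED enforces can be illustrated with a stand-in. The member names below match the table above; the enum values and the is_permitted guard are purely illustrative, not the toolset's implementation:

```python
from enum import Enum

class WriteMode(Enum):
    # Member names mirror the table above; values are illustrative.
    BLOCKED = "blocked"
    ALLOWED = "allowed"

# Naive keyword screen, for illustration only; the real toolset
# enforces read-only access at the API level.
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "DROP",
                  "CREATE", "ALTER", "MERGE", "TRUNCATE"}

def is_permitted(sql: str, mode: WriteMode) -> bool:
    """Reject statements containing write keywords when writes are blocked."""
    if mode is WriteMode.ALLOWED:
        return True
    return not any(tok in WRITE_KEYWORDS for tok in sql.upper().split())

print(is_permitted("SELECT * FROM assets", WriteMode.BLOCKED))   # allowed
print(is_permitted("DELETE FROM assets", WriteMode.BLOCKED))     # rejected
```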

Tool Integration

The configured tool is passed to the agent:
root_agent = LlmAgent(
    model=LLM_1_MODELO, 
    name=LLM_1_NAME,
    description="Agente para responder preguntas sobre datos y modelos de BigQuery",
    instruction=new_instruction,
    tools=[bigquery_toolset]  # ← Tool configuration applied here
)

Customizing the Instruction Prompt

The agent’s behavior is primarily controlled by the instruction prompt. You can customize it by modifying the new_instruction variable in agent.py:38-68.

Current Instruction Structure

1. Agent Role Definition

Defines the agent as a SQL generation engine:
new_instruction = f"""
Eres un motor de generación de SQL para BigQuery.
Tu ÚNICO objetivo es traducir lenguaje natural a código SQL válido...
"""
2. Security Guardrails

Specifies prohibited commands and rejection behavior:
<SECURITY_GUARDRAILS>
  1. MODO ESTRICTO: READ-ONLY.
  2. COMANDOS PROHIBIDOS: DROP, DELETE, UPDATE, INSERT...
</SECURITY_GUARDRAILS>
3. Operational Instructions

Defines tool usage and output format requirements:
<INSTRUCTIONS>
  - Tienes acceso a `bigquery_toolset`.
  - Tu prioridad absoluta es la sintaxis correcta...
</INSTRUCTIONS>
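The three sections combine into a single instruction string. A heavily abbreviated sketch of that assembly (the real prompt in agent.py:38-68 is much longer; the section contents here are truncated placeholders):

```python
PROJECT_ID = "datawarehouse-des"
BIGQUERY_DATASET = "STG_ACTIVOS"

# Abbreviated stand-ins for the three prompt sections described above.
role = (f"Eres un motor de generación de SQL para BigQuery "
        f"({PROJECT_ID}.{BIGQUERY_DATASET}).")
guardrails = ("<SECURITY_GUARDRAILS>\n"
              "  1. MODO ESTRICTO: READ-ONLY.\n"
              "</SECURITY_GUARDRAILS>")
instructions = ("<INSTRUCTIONS>\n"
                "  - Tienes acceso a `bigquery_toolset`.\n"
                "</INSTRUCTIONS>")

new_instruction = "\n\n".join([role, guardrails, instructions])
print(new_instruction)
```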

Customization Examples

Add domain context:
new_instruction = f"""
Eres un motor de generación de SQL para BigQuery especializado en datos del sector eléctrico.
Tu ÚNICO objetivo es traducir lenguaje natural a código SQL válido para el proyecto **{PROJECT_ID}**, dataset **{BIGQUERY_DATASET}**.

Contexto del dominio:
- Los datos provienen de operaciones de transmisión eléctrica
- Las tablas contienen información de activos, mantenimiento y operaciones
- Los usuarios son analistas de ingeniería y operaciones

<SECURITY_GUARDRAILS>
  # ... rest of the prompt
"""
Add a response format requirement:
new_instruction = f"""
# ... existing prompt sections ...

FORMATO DE RESPUESTA ACEPTADO:
```sql
-- Query generated for: [user question]
SELECT ...
```
Incluye un comentario breve con la pregunta del usuario.
"""

Add query optimization guidance:
new_instruction = f"""
# ... existing prompt sections ...

<OPTIMIZATION_RULES>
  - Usa particiones de fecha cuando estén disponibles
  - Limita resultados con LIMIT cuando sea apropiado
  - Prefiere agregaciones sobre datos completos
  - Evita SELECT * en producción
</OPTIMIZATION_RULES>
"""

Configuration Best Practices

1. Use Environment Variables

Never hardcode configuration values. Always use environment variables for:
  • Project IDs and dataset names
  • Model selection
  • Region configuration
  • Organization names
2. Maintain Security Defaults

Keep WriteMode.BLOCKED in production environments:
tool_config = BigQueryToolConfig(
    write_mode=WriteMode.BLOCKED,  # ← Never change this in production
)
3. Test Instruction Changes

When modifying the instruction prompt:
  1. Test with diverse query types
  2. Verify security guardrails still work
  3. Ensure output format remains consistent
  4. Check that tool usage is still correct
4. Document Custom Configuration

If you modify defaults, document:
  • What was changed and why
  • Expected behavior differences
  • Any new environment variables
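The checks under "Test Instruction Changes" can be partially automated. A hypothetical smoke-test sketch (check_guardrails and the abbreviated prompt are illustrations, not part of the project):

```python
# Abbreviated prompt stand-in; the real new_instruction is defined
# in agent.py:38-68.
new_instruction = """
Eres un motor de generación de SQL para BigQuery.
<SECURITY_GUARDRAILS>
  1. MODO ESTRICTO: READ-ONLY.
  2. COMANDOS PROHIBIDOS: DROP, DELETE, UPDATE, INSERT
</SECURITY_GUARDRAILS>
<INSTRUCTIONS>
  - Tienes acceso a `bigquery_toolset`.
</INSTRUCTIONS>
"""

def check_guardrails(prompt: str) -> list[str]:
    """Return a list of problems; an empty list means the prompt passes."""
    problems = []
    if "<SECURITY_GUARDRAILS>" not in prompt:
        problems.append("missing security guardrails section")
    for cmd in ("DROP", "DELETE", "UPDATE", "INSERT"):
        if cmd not in prompt:
            problems.append(f"prohibited command {cmd} no longer listed")
    if "bigquery_toolset" not in prompt:
        problems.append("tool reference missing")
    return problems

print(check_guardrails(new_instruction))  # expect no problems
```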

Example Configuration File

For deployment, create a .env file:
# Project Configuration
PROJECT_ID=datawarehouse-prod
BIGQUERY_DATASET=STG_ACTIVOS
GOOGLE_CLOUD_LOCATION=us-east4
NOMBRE_EMPRESA=TRANSELEC S.A.

# Model Configuration
LLM_1_NAME=bigquery_agent_stg_activos
LLM_1_MODELO=gemini-2.5-pro
Load environment variables using python-dotenv (already included in requirements.txt).
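python-dotenv performs this loading via load_dotenv(). The sketch below mimics its core behavior (KEY=VALUE lines, comments ignored, already-set environment variables win) purely for illustration; use the library itself in practice:

```python
import os

def load_env_text(text: str) -> None:
    """Illustrative .env parser; python-dotenv handles this properly."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # setdefault: existing environment variables take precedence
        os.environ.setdefault(key.strip(), value.strip())

os.environ.pop("PROJECT_ID", None)  # clean slate for the demo
load_env_text("""
# Project Configuration
PROJECT_ID=datawarehouse-prod
""")
print(os.environ["PROJECT_ID"])  # datawarehouse-prod
```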
