Skip to main content

Overview

The Databricks integration provides a unified data platform for advanced analytics, enabling SQL-based queries and data management across multiple business domains including SCADA analysis, recipe control, and route management.

Architecture

The Databricks integration is built on a comprehensive archetype pattern:

Core Components

  1. DatabricksAClient - HTTP client for Databricks SQL API (DataBricks/Database/DatabricksAClient.cs)
  2. GenericRepository - Base repository pattern for data access (DataBricks/Archetype/Repository/GenericRepository.cs)
  3. DatabricksExecutor - SQL execution engine
  4. SqlBuilder - Dynamic SQL query builder
  5. DatabricksUnitOfWork - Transaction management

Service Interfaces

Location: Program.cs:377-379
builder.Services.AddScoped<IDatabricksApiClient, DatabricksApiClient>();
builder.Services.AddScoped<IDatabricksRepository, DatabricksRepository>();

Authentication

Databricks supports three authentication modes:

1. Static Token

{
  "Databricks": {
    "AuthMode": "StaticToken",
    "AccessToken": "your-databricks-token"
  }
}

2. Azure CLI

Uses Azure CLI credentials for authentication:
{
  "Databricks": {
    "AuthMode": "AzureCli"
  }
}

3. Managed Identity

Uses Azure Managed Identity (for production deployments):
{
  "Databricks": {
    "AuthMode": "ManagedIdentity"
  }
}
Location: DataBricks/Database/DatabricksAClient.cs:9-66

Business Domains

The Databricks integration covers four main business domains:

1. RegistroNIR (NIR Analysis Registration)

Manages SCADA (Supervisory Control and Data Acquisition) analysis data for semi-finished and finished products.

Entities

  • SemielaboradoAnalisis - Semi-finished product analysis
  • SemielaboradoAnalisisReal - Real-time semi-finished analysis
  • SemielaboradoAnalisisNIR - NIR analysis for semi-finished products
  • ProductoTerminadoAnalisis - Finished product analysis
  • ProductoTerminadoAnalisisReal - Real-time finished product analysis
  • ProductoTerminadoAnalisisNIR - NIR analysis for finished products
  • Homologacion - Product homologation records

2. ControlReceta (Recipe Control)

Manages product recipes, tolerances, and material specifications.

Entities

  • MaterialPT - Finished product materials
  • ToleranciaPT - Finished product tolerances
  • UnidadToleranciaCatalogo - Tolerance unit catalog

3. GestionRutas (Route Management)

Manages transportation routes and distances.

Entities

  • RutaDistancia - Route distance information

4. Archetype (Generic Data Operations)

Provides reusable patterns for all Databricks operations.

GraphQL Queries

RegistroNIR Queries

GetSemielaboradoAnalisisById

Retrieves a semi-finished product analysis record by ID. Location: DataBricks/Schema/Queries/RegistroNIR/ScadaSemiAnalisisQuery.cs:11-16
query GetSemielaboradoAnalisisById($id: String!) {
  getSemielaboradoAnalisisById(id: $id) {
    id
    semana
    turnoId
    fechaAnalisis
    horaMuestreo
    lote
    homologacionId
    mezclaTrigo
    molinoId
    silo
    inspectorId
    estado
    tipoHarina
    observaciones
    createdAt
    updatedAt
    active
  }
}
id
String!
required
Unique identifier for the analysis record

GetSemielaboradoAnalisisPaged

Retrieves paginated semi-finished product analysis records with filtering.
query GetSemielaboradoAnalisisPaged($filter: FilterRequestInput) {
  getSemielaboradoAnalisisPaged(filter: $filter) {
    items {
      id
      fechaAnalisis
      lote
      homologacionId
      estado
    }
    totalCount
    pageInfo {
      hasNextPage
      hasPreviousPage
    }
  }
}
filter
FilterRequestInput
Filter criteria including:
  • pagination - Page number and size
  • sort - Sort fields and direction
  • conditions - Filter conditions

GetSemielaboradoAnalisisByConditions

Retrieves all matching records based on filter conditions.
query GetSemielaboradoAnalisisByConditions($filter: FilterRequestInput) {
  getSemielaboradoAnalisisByConditions(filter: $filter) {
    id
    fechaAnalisis
    lote
    estado
  }
}

GestionRutas Queries

GetRutaDistanciaById

Retrieves route distance information by ID. Location: DataBricks/Schema/Queries/GestionRutas/RutaDistanciaQuery.cs:11-16
query GetRutaDistanciaById($id: String!) {
  getRutaDistanciaById(id: $id) {
    id
    anio
    categoria
    origen
    destino
    distanciaKm
    usuarioCreacion
    createdAt
    updatedAt
    active
  }
}
anio
Int!
Year of the route data (e.g., 2025)
categoria
String
Route category (T0, T1, T2, IMP-TRIGO, etc.)
origen
String
Origin location
destino
String
Destination location
distanciaKm
Decimal!
Distance in kilometers

GetRutaDistanciaAvailableImportYears

Retrieves available years for importing route data. Location: DataBricks/Schema/Queries/GestionRutas/RutaDistanciaQuery.cs:34-38
query GetRutaDistanciaAvailableImportYears($currentYear: String!) {
  getRutaDistanciaAvailableImportYears(currentYear: $currentYear)
}

ControlReceta Queries

GetMaterialesPorSector

Retrieves materials filtered by sector. Location: DataBricks/Schema/Queries/ControlReceta/MaterialPTQuery.cs:10-11
query GetMaterialesPorSector($sectorMaterial: String!) {
  getMaterialesPorSector(sectorMaterial: $sectorMaterial) {
    # Material fields
  }
}

GetMaterialesByConditions

Retrieves materials based on filter conditions.
query GetMaterialesByConditions($filter: FilterRequestInput) {
  getMaterialesByConditions(filter: $filter) {
    # Material fields
  }
}

Data Models

SemielaboradoAnalisis

Table: nir_semielaborado_analisis_cabecera Location: DataBricks/Entities/RegistroNIR/SemielaboradoAnalisis.cs
{
  id: string                    // Primary key
  semana: string                // Week number
  turnoId: string              // Shift ID
  fechaAnalisis: DateTime       // Analysis date
  horaMuestreo: string         // Sampling time
  lote: string                 // Batch/lot number
  homologacionId: string       // Homologation ID (flour type)
  mezclaTrigo: string          // Wheat mixture
  molinoId: string             // Mill ID
  silo: string                 // Silo identifier
  inspectorId: string          // Inspector ID
  estado: string               // Status
  tipoHarina: string           // Flour type
  observaciones: string        // Observations/notes
  createdAt: DateTime          // Creation timestamp
  updatedAt: DateTime          // Update timestamp
  active: boolean              // Active flag (default: true)
}

RutaDistancia

Table: ruta_distancia Location: DataBricks/Entities/GestionRutas/RutaDistancia.cs
{
  id: string                    // Primary key
  anio: int                     // Year (e.g., 2025)
  categoria: string             // Category (T0, T1, T2, IMP-TRIGO)
  origen: string                // Origin location
  destino: string               // Destination location
  distanciaKm: decimal          // Distance in kilometers
  usuarioCreacion: string       // Creator user
  createdAt: DateTime           // Creation timestamp
  updatedAt: DateTime           // Update timestamp
  active: boolean               // Active flag (default: true)
}

ToleranciaPT

Table: tolerancia_pt Location: DataBricks/Entities/ControlReceta/ToleranciaPT.cs
{
  productoTerminado: string     // Finished product code
  centro: string                // Center/plant code
  nombreCentro: string          // Center name
  listaMaterial: string         // Material list
  descripcionPT: string         // Product description
  versionSistemaProd: string    // Production system version
  versionTol: string            // Tolerance version
  codPro: string                // Product code
  pro: string                   // Product
  cantidad: decimal             // Quantity
  um: string                    // Unit of measure
  porcentajeMax: decimal        // Maximum percentage
  porcentajeMin: decimal        // Minimum percentage
  unidadTolerancia: string      // Tolerance unit
  usuario: string               // User
  estadoVersionSistema: string  // System version status
  fechaIngreso: string          // Entry date
  fechaActualizacion: string    // Update date
  fechaCargaSistema: string     // System load date
  validoDesde: string           // Valid from
  validoHasta: string           // Valid until
  fechaIngresoTolerancia: string // Tolerance entry date
  vigDesdeTol: string           // Tolerance valid from
  vigHastaTol: string           // Tolerance valid until
  estadoTolerancia: string      // Tolerance status
  usuarioTol: string            // Tolerance user
  temporal: string              // Temporary flag
}

Filter System

The Databricks integration uses a powerful filter system:

FilterRequestInput

{
  pagination: PaginationInput   // Page number and size
  sort: SortInput[]            // Sort criteria
  conditions: ConditionInput[] // Filter conditions
}

PaginationInput

{
  pageNumber: int  // Page number (1-based)
  pageSize: int    // Items per page
}

SortInput

{
  field: string     // Field name to sort by
  direction: string // "ASC" or "DESC"
}

ConditionInput

{
  field: string      // Field name
  operator: string   // Comparison operator (=, !=, >, <, >=, <=, LIKE, IN)
  value: string      // Comparison value
}

SQL Execution

The Databricks client provides three execution methods:

ExecuteQueryAsync

Executes a SELECT query and waits for results. Location: DataBricks/Database/DatabricksAClient.cs:119-156
Task<JsonElement> ExecuteQueryAsync(string sql)
  • Timeout: 60 seconds
  • Polling interval: 250ms
  • Returns: JSON result set

ExecuteNonQueryAsync

Executes INSERT/UPDATE/DELETE and waits for completion. Location: DataBricks/Database/DatabricksAClient.cs:158-217
Task<int> ExecuteNonQueryAsync(string sql)
  • Returns: Number of rows affected
  • Includes result validation

ExecuteNonQueryFastAsync

Executes a statement without waiting for completion (fire-and-forget). Location: DataBricks/Database/DatabricksAClient.cs:219-236
Task ExecuteNonQueryFastAsync(string sql)
  • No result waiting
  • Fast execution
  • Use for bulk operations

Transaction Management

The DatabricksUnitOfWork class manages transactions:
using (var uow = new DatabricksUnitOfWork(client))
{
    await uow.AddOperationAsync("INSERT INTO ...");
    await uow.AddOperationAsync("UPDATE ...");
    await uow.CommitAsync();
}

GraphQL Mutations

Mutations follow similar patterns across all domains:
mutation CreateSemielaboradoAnalisis($input: SemielaboradoAnalisisInput!) {
  createSemielaboradoAnalisis(input: $input) {
    id
    fechaAnalisis
    estado
  }
}

mutation UpdateRutaDistancia($id: String!, $input: RutaDistanciaInput!) {
  updateRutaDistancia(id: $id, input: $input) {
    id
    distanciaKm
    updatedAt
  }
}

Error Handling

Timeout Errors

⏳ Timeout esperando resultado de Databricks.
Occurs when query execution exceeds 60 seconds.

Execution Errors

❌ Databricks error: Estado=FAILED, Mensaje={error message}
Occurs when SQL execution fails or is canceled.

HTTP Errors

Standard HTTP status codes with detailed error messages.

Configuration

Required configuration in appsettings.json:
{
  "Databricks": {
    "BaseUrl": "https://your-databricks-workspace.azuredatabricks.net",
    "WarehouseId": "your-warehouse-id",
    "DatabricksId": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d",
    "AuthMode": "ManagedIdentity",
    "AccessToken": "optional-static-token"
  }
}

Configuration Parameters

BaseUrl
String
required
Databricks workspace URL
WarehouseId
String
required
SQL warehouse identifier
DatabricksId
String
required
Azure Databricks resource ID for authentication
AuthMode
String
required
Authentication mode: StaticToken, AzureCli, or ManagedIdentity
AccessToken
String
Static access token (required only for StaticToken mode)

Performance Optimization

  1. Pagination: Always use pagination for large datasets
  2. Filtering: Apply filters at the database level, not in memory
  3. Caching: Token caching reduces authentication overhead
  4. Batch Operations: Use ExecuteNonQueryFastAsync for bulk inserts
  5. Connection Pooling: HttpClient is configured with connection pooling

Best Practices

  1. Use Filters: Leverage the FilterRequestInput for efficient queries
  2. Pagination: Always paginate large result sets
  3. Transaction Management: Use UnitOfWork for multi-statement operations
  4. Error Handling: Implement proper timeout and error handling
  5. Authentication: Use Managed Identity in production environments
  6. SQL Injection: Use parameterized queries (handled by SqlBuilder)
  7. Active Flags: Soft delete using active field instead of hard deletes

Build docs developers (and LLMs) love