Insights Tool

Overview

The Insights Tool enables AI agents to perform data analysis and generate actionable insights. It includes capabilities for data quality assessment, table comparisons, and database exploration.

Core Functions

generateInsights

Generate AI-powered insights from data analysis.

tableName

string

required

The name of the table to analyze for insights.

query

string

Optional SQL query to filter the data before analysis. If not provided, analyzes the entire table.

forceRefresh

boolean

default:false

Force refresh insights even if cached results exist.

Response

{
  success: boolean
  message?: string
  tableName: string
  rowCount: number
  columnCount: number
  note?: string
  error?: string
}

Example Usage

const insights = await generateInsights.handler(ctx, {
  tableName: 'customers',
  query: 'SELECT * FROM customers WHERE signup_date > \'2024-01-01\''
})

if (insights.success) {
  console.log(`Analyzed ${insights.rowCount} rows, ${insights.columnCount} columns`)
  console.log(insights.message)
}

analyzeDataQuality

Analyze data quality of a table, checking for null values, duplicates, and other issues.

tableName

string

required

The name of the table to analyze.

Response

{
  success: boolean
  tableName: string
  rowCount: number
  columnCount: number
  qualityScore: number  // 0-100 score
  issueCount: number
  issues: Array<{
    column: string
    issue: string
    severity: 'low' | 'medium' | 'high'
    details: string
  }>
  summary: string
  error?: string
}

Example Usage

const quality = await analyzeDataQuality.handler(ctx, {
  tableName: 'orders'
})

if (quality.success) {
  console.log(`Quality Score: ${quality.qualityScore}/100`)
  console.log(`Found ${quality.issueCount} issues:`)
  
  quality.issues.forEach(issue => {
    console.log(`  [${issue.severity}] ${issue.column}: ${issue.issue}`)
    console.log(`    ${issue.details}`)
  })
}

compareTables

Compare two tables to find differences, similarities, or relationships.

table1

string

required

Name of the first table.

table2

string

required

Name of the second table.

compareColumns

boolean

default:true

Whether to compare column schemas.

compareRowCounts

boolean

default:true

Whether to compare row counts.

Response

{
  success: boolean
  table1: {
    name: string
    rowCount: number
    columnCount: number
  }
  table2: {
    name: string
    rowCount: number
    columnCount: number
  }
  comparison: {
    rowCountDifference: number
    commonColumns: number
    onlyInTable1: number
    onlyInTable2: number
  }
  details: {
    commonColumns: Array<{ name: string; type: string }>
    onlyInTable1: Array<{ name: string; type: string }>
    onlyInTable2: Array<{ name: string; type: string }>
  }
  summary: string
  error?: string
}

Example Usage

const comparison = await compareTables.handler(ctx, {
  table1: 'products_2023',
  table2: 'products_2024'
})

if (comparison.success) {
  console.log(comparison.summary)
  console.log('Common columns:', comparison.details.commonColumns.length)
  console.log('Only in 2023:', comparison.details.onlyInTable1.length)
  console.log('Only in 2024:', comparison.details.onlyInTable2.length)
}

getTableList

Get a list of all available tables in the database.

Response

{
  success: boolean
  tables: string[]
  count: number
  message: string
  error?: string
}

Example Usage

const tableList = await getTableList.handler(ctx)

if (tableList.success) {
  console.log(`Found ${tableList.count} tables:`)
  tableList.tables.forEach(table => console.log(`  - ${table}`))
}

Data Quality Analysis Details

The analyzeDataQuality function performs comprehensive checks:

Null Value Detection

From table_agent.ts:315-345:

// Check for nulls in each column
const nullCheckQuery = `
  SELECT COUNT(*) as total, COUNT(${colName}) as non_null
  FROM ${tableName}
`
const nullPercent = ((total - nonNull) / total) * 100

if (nullPercent > 50) {
  issues.push({
    column: colName,
    issue: 'High null percentage',
    severity: 'high',
    details: `${nullPercent.toFixed(1)}% of values are null`
  })
} else if (nullPercent > 20) {
  issues.push({
    column: colName,
    issue: 'Moderate null percentage',
    severity: 'medium',
    details: `${nullPercent.toFixed(1)}% of values are null`
  })
}

Duplicate Detection

For columns that appear to be identifiers:

if (colName.toLowerCase().includes('id') || 
    colName.toLowerCase().includes('key')) {
  const duplicateQuery = `
    SELECT ${colName}, COUNT(*) as cnt
    FROM ${tableName}
    GROUP BY ${colName}
    HAVING COUNT(*) > 1
    LIMIT 10
  `
  
  if (dupResult.rows.length > 0) {
    issues.push({
      column: colName,
      issue: 'Duplicate values found',
      severity: 'high',
      details: `Found ${dupResult.rows.length} duplicate values`
    })
  }
}

Empty String Detection

For text columns:

if (colType.includes('varchar') || colType.includes('text')) {
  const emptyQuery = `
    SELECT COUNT(*) as empty_count
    FROM ${tableName}
    WHERE ${colName} = '' OR ${colName} IS NULL
  `
  
  if (emptyCount > 0) {
    issues.push({
      column: colName,
      issue: 'Empty string values',
      severity: 'low',
      details: `Found ${emptyCount} empty or null string values`
    })
  }
}

Quality Score Calculation

From table_agent.ts:416-424:

qualityScore = issues.length === 0 ? 100 :
  Math.max(0,
    100 - 
    issues.length * 10 - 
    issues.filter(i => i.severity === 'high').length * 20
  )

Scoring:

Start at 100
Deduct 10 points per issue
Deduct additional 20 points per high-severity issue
Minimum score is 0

Table Comparison Implementation

The compareTables tool performs parallel queries for efficiency (agent_tools.ts:676-708):

// Get schemas in parallel
const [schema1Result, schema2Result] = await Promise.all([
  ctx.runAction(api.table_agent.fetchDuckDBQuery, {
    query: `DESCRIBE ${args.table1}`
  }),
  ctx.runAction(api.table_agent.fetchDuckDBQuery, {
    query: `DESCRIBE ${args.table2}`
  }),
])

// Get row counts in parallel
const [count1Result, count2Result] = await Promise.all([
  ctx.runAction(api.table_agent.fetchDuckDBQuery, {
    query: `SELECT COUNT(*) as count FROM ${args.table1}`
  }),
  ctx.runAction(api.table_agent.fetchDuckDBQuery, {
    query: `SELECT COUNT(*) as count FROM ${args.table2}`
  }),
])

Column Comparison Logic

// Find common columns (same name and type)
const commonColumns = columns1.filter((col1) =>
  columns2.some((col2) => 
    col2.name === col1.name && col2.type === col1.type
  )
)

// Find columns only in table1
const onlyInTable1 = columns1.filter((col1) =>
  !columns2.some((col2) => col2.name === col1.name)
)

// Find columns only in table2
const onlyInTable2 = columns2.filter((col2) =>
  !columns1.some((col1) => col1.name === col2.name)
)

Usage in Agent Workflows

Both Query and Analysis agents have access to insights tools:

const analysis_agent = new Agent(components.agent, {
  name: 'analysis_agent',
  languageModel: model,
  instructions: `
You are an assistant that explores and analyzes databases.

Use the available tools to:
- Assess data quality, compare tables, and list tables
- Query and inspect DuckDB tables
- Visualize or analyze data
`,
  tools: {
    analyzeDataQuality,
    compareTables,
    getTableList,
    generateInsights,
    // ... other tools
  },
})

Example Agent Workflow

User: “Check the quality of my customer data”

Agent calls getTableList() to find customer-related tables
Agent calls analyzeDataQuality('customers')
Agent receives quality score and issues
Agent summarizes findings:
- “Your customer table has a quality score of 75/100”
- “Found 3 issues: 25% null values in ‘email’ column, duplicate IDs detected”
Agent may call generateInsights for additional analysis

Implementation Details

Tool Definition

export const analyzeDataQuality = createTool({
  description:
    'Analyze data quality of a table. Checks for null values, duplicates, empty strings, and other data quality issues.',
  args: z.object({
    tableName: z.string().describe('The name of the table to analyze'),
  }),
  handler: async (ctx, args) => {
    try {
      const result = await ctx.runAction(
        api.table_agent.analyzeDataQualityAction,
        { tableName: args.tableName }
      )
      return truncateToolResponse(result)
    } catch (error) {
      return {
        success: false,
        error: error instanceof Error ? error.message : 'Unknown error'
      }
    }
  },
})

Response Truncation

Insights tool responses are truncated for efficiency:

Issues array: Maximum 10 items
String fields: Maximum 2,000 characters
Truncation flag: _truncatedIssues set to true if truncated

Error Handling

Table not found

{
  "success": false,
  "error": "Table 'unknown_table' not found or has no columns"
}

Solution: Use getTableList to verify table names.

No data to analyze

{
  "success": false,
  "error": "No data found to analyze"
}

Solution: Verify table contains data before analysis.

Query execution failed

{
  "success": false,
  "error": "Invalid column name in quality check"
}

Solution: Check table schema integrity.

Issue Severity Levels

High Severity

Critical issues requiring immediate attention:

High null percentage (>50%)
Duplicate IDs/keys
Data type mismatches

Medium Severity

Issues that should be addressed:

Moderate null percentage (20-50%)
Inconsistent formatting
Missing expected columns

Low Severity

Minor issues for awareness:

Empty strings
Trailing/leading whitespace
Unused columns

Best Practices

Run quality checks regularly

Schedule periodic data quality assessments to catch issues early.

Compare before and after transformations

Use compareTables to validate data migrations and transformations.

Address high-severity issues first

Focus on fixing issues marked as “high” severity before others.

Use insights for optimization

Let insights guide query optimization and data modeling decisions.

Query Tool

Execute queries for detailed data inspection

Chart Tool

Visualize quality metrics and trends

Query Editor

Manually fix data quality issues

Backend API

Agent Tools

DuckDB Integration

Overview

Core Functions

generateInsights

Response

Example Usage

analyzeDataQuality

Response

Example Usage

compareTables

Response

Example Usage

getTableList

Response

Example Usage

Data Quality Analysis Details

Null Value Detection

Duplicate Detection

Empty String Detection

Quality Score Calculation

Table Comparison Implementation

Column Comparison Logic

Usage in Agent Workflows

Example Agent Workflow

Implementation Details

Tool Definition

Response Truncation

Error Handling

Issue Severity Levels

High Severity

Medium Severity

Low Severity

Best Practices

Query Tool

Chart Tool

Query Editor

Build docs developers (and LLMs) love

Backend API

Agent Tools

DuckDB Integration

​Overview

​Core Functions

​generateInsights

​Response

​Example Usage

​analyzeDataQuality

​Response

​Example Usage

​compareTables

​Response

​Example Usage

​getTableList

​Response

​Example Usage

​Data Quality Analysis Details

​Null Value Detection

​Duplicate Detection

​Empty String Detection

​Quality Score Calculation

​Table Comparison Implementation

​Column Comparison Logic

​Usage in Agent Workflows

​Example Agent Workflow

​Implementation Details

​Tool Definition

​Response Truncation

​Error Handling

​Issue Severity Levels

High Severity

Medium Severity

Low Severity

​Best Practices

​Related Tools

Query Tool

Chart Tool

Query Editor

Build docs developers (and LLMs) love

Overview

Core Functions

generateInsights

Response

Example Usage

analyzeDataQuality

Response

Example Usage

compareTables

Response

Example Usage

getTableList

Response

Example Usage

Data Quality Analysis Details

Null Value Detection

Duplicate Detection

Empty String Detection

Quality Score Calculation

Table Comparison Implementation

Column Comparison Logic

Usage in Agent Workflows

Example Agent Workflow

Implementation Details

Tool Definition

Response Truncation

Error Handling

Issue Severity Levels

Best Practices

Related Tools