
Overview

The Insights Cache system generates AI-powered insights from query results and caches them to avoid redundant API calls. It combines:
  • Statistical analysis - Means, medians, outliers, trends, and performance metrics
  • AI interpretation - Natural language insights generated by Gemini 2.5 Flash
  • Intelligent caching - Content-based cache keys using data signatures
  • Real-time analysis - Automatic insight generation for new queries
Insights are cached under a content-based key derived from tableName + query + rowCount + columnCount, so identical datasets return consistent results.

Schema

See the insightsCache table definition for the complete field reference.

Core Functions

generateInsights

Generate AI insights from statistical analysis results.
  • tableName (string, required) - DuckDB table name being analyzed
  • query (string, required) - SQL query that produced the data
  • statisticalAnalyses (array, required) - Array of statistical analysis results for each numeric column
  • rowCount (number, required) - Number of rows analyzed
  • columnCount (number, required) - Number of columns analyzed
  • forceRefresh (boolean, default: false) - Bypass the cache and regenerate insights
  • returns (object) - Insights, statistical findings, an error field, and a cached flag

Implementation

convex/insights.ts
export const generateInsights = action({
  args: {
    tableName: v.string(),
    query: v.string(),
    statisticalAnalyses: v.array(/* ... */),
    rowCount: v.number(),
    columnCount: v.number(),
    forceRefresh: v.optional(v.boolean()),
  },
  handler: async (ctx, {
    tableName,
    query,
    statisticalAnalyses,
    rowCount,
    columnCount,
    forceRefresh,
  }) => {
    await checkAuth(ctx)
    
    // Filter to numeric columns only
    const analyses = statisticalAnalyses.filter(a => a.hasNumericData)
    
    if (analyses.length === 0) {
      return {
        insights: [],
        statisticalFindings: [],
        error: 'No numeric data to analyze',
        cached: false,
      }
    }
    
    // Generate cache key
    const cacheKey = generateCacheKey(tableName, query, columnCount, rowCount)
    
    // Check cache first
    if (!forceRefresh) {
      const cached = await ctx.runQuery(api.insights.getCachedInsights, {
        cacheKey,
      })
      if (cached) {
        return { ...cached, error: null }
      }
    }
    
    // Build statistical findings summary
    const findings: string[] = []
    analyses.forEach(analysis => {
      if (!analysis.stats) return
      
      const { columnName, stats, outliers, timeTrend } = analysis
      
      findings.push(
        `${columnName}: Mean=${stats.mean.toFixed(2)}, Median=${stats.median.toFixed(2)}`
      )
      
      if (outliers && outliers.length > 0) {
        findings.push(
          `${columnName}: Found ${outliers.length} outliers (>2 std dev)`
        )
      }
      
      if (timeTrend) {
        findings.push(
          `${columnName}: ${timeTrend.trend} trend (${timeTrend.avgChangePercent.toFixed(2)}%)`
        )
      }
    })
    
    // Generate AI insights using Gemini
    const result = await generateObject({
      model: google_gemini.languageModel('gemini-2.5-flash'),
      schema: z.object({
        insights: z.array(
          z.object({
            title: z.string(),
            description: z.string(),
            type: z.enum(['outlier', 'trend', 'aggregation', 'pattern', 'anomaly']),
            severity: z.enum(['low', 'medium', 'high']),
          })
        ).min(1).max(5),
      }),
      prompt: `
DATASET ANALYSIS:
- Table: ${tableName}
- Rows: ${rowCount}
- Numeric columns: ${analyses.length}

STATISTICAL FINDINGS:
${findings.join('\n')}

Provide 3-5 key insights focusing on patterns, anomalies, trends, and actionable observations.
      `,
    })
    
    // Cache the insights
    await ctx.runMutation(api.insights.cacheInsights, {
      cacheKey,
      tableName,
      query,
      dataHash: cacheKey,
      insights: result.object.insights,
      statisticalFindings: analyses,
    })
    
    return {
      insights: result.object.insights,
      statisticalFindings: analyses,
      error: null,
      cached: false,
    }
  },
})

getCachedInsights

Retrieve cached insights by cache key.
  • cacheKey (string, required) - Hash generated from tableName|query|columnCount|rowCount
  • returns (object | null) - Cached insights object, or null if not found
convex/insights.ts
export const getCachedInsights = query({
  args: { cacheKey: v.string() },
  handler: async (ctx, { cacheKey }) => {
    const userId = await getAuthUserId(ctx)
    if (!userId) return null
    
    const cached = await ctx.db
      .query('insightsCache')
      .withIndex('by_cacheKey', q => q.eq('cacheKey', cacheKey))
      .filter(q => q.eq(q.field('userId'), userId))
      .first()
    
    if (!cached) return null
    
    return {
      insights: cached.insights,
      statisticalFindings: cached.statisticalFindings,
      cached: true,
      createdAt: cached.createdAt,
    }
  },
})

cacheInsights

Store generated insights in the cache.
  • cacheKey (string, required) - Cache lookup key
  • tableName (string, required) - Table name
  • query (string, required) - SQL query
  • dataHash (string, required) - Data signature hash
  • insights (array, required) - Generated insights array
  • statisticalFindings (any, required) - Statistical analysis results
  • returns (Id<'insightsCache'>) - ID of the cached entry (creates new or updates existing)
convex/insights.ts
export const cacheInsights = mutation({
  args: {
    cacheKey: v.string(),
    tableName: v.string(),
    query: v.string(),
    dataHash: v.string(),
    insights: v.array(/* ... */),
    statisticalFindings: v.any(),
  },
  handler: async (ctx, args) => {
    const userId = await getAuthUserId(ctx)
    if (!userId) throw new Error('Not authenticated')
    
    // Check if cache entry exists
    const existing = await ctx.db
      .query('insightsCache')
      .withIndex('by_cacheKey', q => q.eq('cacheKey', args.cacheKey))
      .filter(q => q.eq(q.field('userId'), userId))
      .first()
    
    if (existing) {
      // Update existing cache
      await ctx.db.patch(existing._id, {
        insights: args.insights,
        statisticalFindings: args.statisticalFindings,
        createdAt: Date.now(), // Refresh timestamp
      })
      return existing._id
    } else {
      // Create new cache entry
      return await ctx.db.insert('insightsCache', {
        ...args,
        createdAt: Date.now(),
        userId,
      })
    }
  },
})

Cache Key Generation

The cache key is generated by hashing the data signature:
convex/insights.ts
function simpleHash(str: string): string {
  let hash = 0
  for (let i = 0; i < str.length; i++) {
    const char = str.charCodeAt(i)
    hash = ((hash << 5) - hash) + char
    hash = hash & hash // Convert to 32bit integer
  }
  return Math.abs(hash).toString(36)
}

function generateCacheKey(
  tableName: string,
  query: string,
  columnCount: number,
  rowCount: number
): string {
  const dataSignature = `${tableName}|${query}|${columnCount}|${rowCount}`
  return simpleHash(dataSignature)
}
The cache key changes when:
  • Table name changes
  • Query changes (different WHERE clause, columns, etc.)
  • Number of rows or columns changes (data updated)
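To illustrate the invalidation behavior, the hashing logic above is reproduced here as a standalone sketch; the table and query values are hypothetical:

```typescript
// Same hashing logic as in convex/insights.ts, reproduced for a
// self-contained demonstration of cache-key invalidation.
function simpleHash(str: string): string {
  let hash = 0
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash) + str.charCodeAt(i)
    hash = hash & hash // keep within 32-bit integer range
  }
  return Math.abs(hash).toString(36)
}

function generateCacheKey(
  tableName: string,
  query: string,
  columnCount: number,
  rowCount: number
): string {
  return simpleHash(`${tableName}|${query}|${columnCount}|${rowCount}`)
}

const base = generateCacheKey('sales', 'SELECT * FROM sales', 5, 1000)
// Adding a WHERE clause changes the query component of the signature
const filtered = generateCacheKey('sales', "SELECT * FROM sales WHERE region = 'EU'", 5, 1000)
// New rows change the rowCount component of the signature
const grown = generateCacheKey('sales', 'SELECT * FROM sales', 5, 1200)

console.log(base !== filtered)
console.log(base !== grown)
```

Identical inputs always hash to the same key, which is what lets repeat analyses of an unchanged dataset hit the cache.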

Usage Example

Frontend Integration
import { useState } from 'react'
import { useAction } from 'convex/react'
import { api } from '@/convex/_generated/api'

function DataInsights({ tableName, queryResult }) {
  const generateInsights = useAction(api.insights.generateInsights)
  const [insights, setInsights] = useState(null)
  const [loading, setLoading] = useState(false)
  
  const analyzeData = async () => {
    setLoading(true)
    try {
      // Perform statistical analysis on query results
      const statisticalAnalyses = queryResult.columns.map(col => ({
        columnName: col.name,
        columnType: col.type,
        hasNumericData: isNumericType(col.type),
        stats: calculateStats(queryResult.rows, col.name),
        outliers: findOutliers(queryResult.rows, col.name),
        topPerformers: getTopValues(queryResult.rows, col.name, 5),
        bottomPerformers: getBottomValues(queryResult.rows, col.name, 5),
      }))
      
      const result = await generateInsights({
        tableName,
        query: queryResult.query,
        statisticalAnalyses,
        rowCount: queryResult.rows.length,
        columnCount: queryResult.columns.length,
      })
      
      setInsights(result)
    } finally {
      setLoading(false) // reset even if the action throws
    }
  }
  
  return (
    <div>
      <button onClick={analyzeData} disabled={loading}>
        {loading ? 'Generating insights...' : 'Analyze Data'}
      </button>
      
      {insights && (
        <div>
          {insights.cached && (
            <span>Cached from {new Date(insights.createdAt).toLocaleString()}</span>
          )}
          
          {insights.insights.map((insight, i) => (
            <div key={i} className={`insight-${insight.severity}`}>
              <h3>{insight.title}</h3>
              <p>{insight.description}</p>
              <span className="badge">{insight.type}</span>
            </div>
          ))}
        </div>
      )}
    </div>
  )
}

Insight Types

The AI generates insights in five categories:
  • Outlier - Values that deviate significantly from the norm (>2 standard deviations). Example: “Revenue spike of $1.2M detected in Q3, 3.5x above quarterly average”
  • Trend - Directional patterns over time (increasing, decreasing, stable). Example: “Customer satisfaction declining 12% month-over-month since June”
  • Aggregation - Summary statistics and aggregate findings. Example: “Top 10% of customers account for 67% of total revenue”
  • Pattern - Recurring behaviors or correlations. Example: “Sales peak every Monday at 2pm, suggesting optimal campaign timing”
  • Anomaly - Unusual events or data quality issues. Example: “47 orders with negative prices detected, potential data entry error”
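The insight shape returned to clients mirrors the Zod schema in generateInsights. A minimal TypeScript sketch for consumers (the sortBySeverity helper is illustrative, not part of the API):

```typescript
// Mirrors the Zod schema used by generateInsights.
type Insight = {
  title: string
  description: string
  type: 'outlier' | 'trend' | 'aggregation' | 'pattern' | 'anomaly'
  severity: 'low' | 'medium' | 'high'
}

// Illustrative helper: surface high-severity insights first in the UI.
function sortBySeverity(insights: Insight[]): Insight[] {
  const rank = { high: 0, medium: 1, low: 2 }
  return [...insights].sort((a, b) => rank[a.severity] - rank[b.severity])
}

const sorted = sortBySeverity([
  { title: 'Stable revenue', description: '...', type: 'trend', severity: 'low' },
  { title: 'Negative prices', description: '...', type: 'anomaly', severity: 'high' },
])
console.log(sorted[0].severity) // 'high'
```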

Performance Optimization

Cache Hit Rate

Monitor cache effectiveness:
type InsightLog = { cached: boolean }

function calculateCacheHitRate(logs: InsightLog[]): number {
  if (logs.length === 0) return 0 // avoid division by zero
  const cached = logs.filter(log => log.cached).length
  return (cached / logs.length) * 100
}
Target: >60% cache hit rate for typical workloads.

Statistical Analysis Offloading

Perform statistical analysis in DuckDB queries rather than in JavaScript for large datasets:
-- Calculate stats in DuckDB
SELECT 
  AVG(revenue) as mean,
  MEDIAN(revenue) as median,
  STDDEV(revenue) as stddev,
  MIN(revenue) as min,
  MAX(revenue) as max
FROM sales
For datasets larger than roughly 10k rows, pushing aggregation into DuckDB is typically orders of magnitude faster than computing the same statistics row by row in JavaScript.
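One row of that DuckDB query can then feed the statisticalAnalyses argument directly. A hedged sketch; the StatsRow shape and toAnalysis helper are illustrative adapters, not part of the documented API:

```typescript
// Illustrative only: adapt one row of the DuckDB stats query above into the
// per-column entry shape that generateInsights expects.
type StatsRow = { mean: number; median: number; stddev: number; min: number; max: number }

function toAnalysis(columnName: string, row: StatsRow) {
  return {
    columnName,
    columnType: 'DOUBLE', // assumed numeric column type
    hasNumericData: true, // already numeric: stats were computed in SQL
    stats: row,
  }
}

const revenue = toAnalysis('revenue', { mean: 120.5, median: 98, stddev: 40.2, min: 3, max: 510 })
console.log(revenue.stats.mean) // 120.5
```

This keeps the heavy computation in the database and sends only a small summary object to the Convex action.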

AI Model Configuration

Meridian uses Gemini 2.5 Flash for insight generation:
convex/table_agent.ts
import { createGoogleGenerativeAI } from '@ai-sdk/google'

const google_gemini = createGoogleGenerativeAI({
  apiKey: process.env.GEMINI_API_KEY,
})

export const model = google_gemini.languageModel('gemini-2.5-flash')
Benefits of Gemini 2.5 Flash:
  • Fast response times (under 2 seconds typical)
  • Low cost per request
  • Strong reasoning for data analysis
  • Structured output support via generateObject

See Also

  • Schema Reference - Complete insightsCache table definition
  • Query Log - Query execution tracking
  • AI SDK Docs - Vercel AI SDK documentation
