
Overview

The Insights Cache system generates AI-powered insights from query results and caches them to avoid redundant API calls. It combines:
  • Statistical analysis - Means, medians, outliers, trends, and performance metrics
  • AI interpretation - Natural language insights generated by Gemini 2.5 Flash
  • Intelligent caching - Content-based cache keys using data signatures
  • Real-time analysis - Automatic insight generation for new queries
Insights are cached under a content-based key derived from tableName + query + rowCount + columnCount, so identical datasets return consistent results.

Schema

See the insightsCache table definition for the complete field reference.

Core Functions

generateInsights

Generate AI insights from statistical analysis results.
  • tableName (string, required) - DuckDB table name being analyzed
  • query (string, required) - SQL query that produced the data
  • statisticalAnalyses (array, required) - Array of statistical analysis results for each numeric column
  • rowCount (number, required) - Number of rows analyzed
  • columnCount (number, required) - Number of columns analyzed
  • forceRefresh (boolean, default: false) - Bypass the cache and regenerate insights
  • returns (object) - Insights, statistical findings, an error field, and a cached flag

Implementation

convex/insights.ts
export const generateInsights = action({
  args: {
    tableName: v.string(),
    query: v.string(),
    statisticalAnalyses: v.array(/* ... */),
    rowCount: v.number(),
    columnCount: v.number(),
    forceRefresh: v.optional(v.boolean()),
  },
  handler: async (ctx, {
    tableName,
    query,
    statisticalAnalyses,
    rowCount,
    columnCount,
    forceRefresh,
  }) => {
    await checkAuth(ctx)
    
    // Filter to numeric columns only
    const analyses = statisticalAnalyses.filter(a => a.hasNumericData)
    
    if (analyses.length === 0) {
      return {
        insights: [],
        statisticalFindings: [],
        error: 'No numeric data to analyze',
        cached: false,
      }
    }
    
    // Generate cache key
    const cacheKey = generateCacheKey(tableName, query, columnCount, rowCount)
    
    // Check cache first
    if (!forceRefresh) {
      const cached = await ctx.runQuery(api.insights.getCachedInsights, {
        cacheKey,
      })
      if (cached) {
        return { ...cached, error: null }
      }
    }
    
    // Build statistical findings summary
    const findings: string[] = []
    analyses.forEach(analysis => {
      if (!analysis.stats) return
      
      const { columnName, stats, outliers, timeTrend } = analysis
      
      findings.push(
        `${columnName}: Mean=${stats.mean.toFixed(2)}, Median=${stats.median.toFixed(2)}`
      )
      
      if (outliers && outliers.length > 0) {
        findings.push(
          `${columnName}: Found ${outliers.length} outliers (>2 std dev)`
        )
      }
      
      if (timeTrend) {
        findings.push(
          `${columnName}: ${timeTrend.trend} trend (${timeTrend.avgChangePercent.toFixed(2)}%)`
        )
      }
    })
    
    // Generate AI insights using Gemini
    const result = await generateObject({
      model: google_gemini.languageModel('gemini-2.5-flash'),
      schema: z.object({
        insights: z.array(
          z.object({
            title: z.string(),
            description: z.string(),
            type: z.enum(['outlier', 'trend', 'aggregation', 'pattern', 'anomaly']),
            severity: z.enum(['low', 'medium', 'high']),
          })
        ).min(1).max(5),
      }),
      prompt: `
DATASET ANALYSIS:
- Table: ${tableName}
- Rows: ${rowCount}
- Numeric columns: ${analyses.length}

STATISTICAL FINDINGS:
${findings.join('\n')}

Provide 3-5 key insights focusing on patterns, anomalies, trends, and actionable observations.
      `,
    })
    
    // Cache the insights
    await ctx.runMutation(api.insights.cacheInsights, {
      cacheKey,
      tableName,
      query,
      dataHash: cacheKey,
      insights: result.object.insights,
      statisticalFindings: analyses,
    })
    
    return {
      insights: result.object.insights,
      statisticalFindings: analyses,
      error: null,
      cached: false,
    }
  },
})

getCachedInsights

Retrieve cached insights by cache key.
  • cacheKey (string, required) - Hash generated from tableName|query|columnCount|rowCount
  • returns (object | null) - Cached insights object, or null if not found
convex/insights.ts
export const getCachedInsights = query({
  args: { cacheKey: v.string() },
  handler: async (ctx, { cacheKey }) => {
    const userId = await getAuthUserId(ctx)
    if (!userId) return null
    
    const cached = await ctx.db
      .query('insightsCache')
      .withIndex('by_cacheKey', q => q.eq('cacheKey', cacheKey))
      .filter(q => q.eq(q.field('userId'), userId))
      .first()
    
    if (!cached) return null
    
    return {
      insights: cached.insights,
      statisticalFindings: cached.statisticalFindings,
      cached: true,
      createdAt: cached.createdAt,
    }
  },
})

cacheInsights

Store generated insights in the cache.
  • cacheKey (string, required) - Cache lookup key
  • tableName (string, required) - Table name
  • query (string, required) - SQL query
  • dataHash (string, required) - Data signature hash
  • insights (array, required) - Generated insights array
  • statisticalFindings (any, required) - Statistical analysis results
  • returns (Id<'insightsCache'>) - ID of the cached entry (creates new or updates existing)
convex/insights.ts
export const cacheInsights = mutation({
  args: {
    cacheKey: v.string(),
    tableName: v.string(),
    query: v.string(),
    dataHash: v.string(),
    insights: v.array(/* ... */),
    statisticalFindings: v.any(),
  },
  handler: async (ctx, args) => {
    const userId = await getAuthUserId(ctx)
    if (!userId) throw new Error('Not authenticated')
    
    // Check if cache entry exists
    const existing = await ctx.db
      .query('insightsCache')
      .withIndex('by_cacheKey', q => q.eq('cacheKey', args.cacheKey))
      .filter(q => q.eq(q.field('userId'), userId))
      .first()
    
    if (existing) {
      // Update existing cache
      await ctx.db.patch(existing._id, {
        insights: args.insights,
        statisticalFindings: args.statisticalFindings,
        createdAt: Date.now(), // Refresh timestamp
      })
      return existing._id
    } else {
      // Create new cache entry
      return await ctx.db.insert('insightsCache', {
        ...args,
        createdAt: Date.now(),
        userId,
      })
    }
  },
})

Cache Key Generation

The cache key is generated by hashing the data signature:
convex/insights.ts
function simpleHash(str: string): string {
  let hash = 0
  for (let i = 0; i < str.length; i++) {
    const char = str.charCodeAt(i)
    hash = ((hash << 5) - hash) + char
    hash = hash & hash // Convert to 32bit integer
  }
  return Math.abs(hash).toString(36)
}

function generateCacheKey(
  tableName: string,
  query: string,
  columnCount: number,
  rowCount: number
): string {
  const dataSignature = `${tableName}|${query}|${columnCount}|${rowCount}`
  return simpleHash(dataSignature)
}
The cache key changes when:
  • Table name changes
  • Query changes (different WHERE clause, columns, etc.)
  • Number of rows or columns changes (data updated)
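To illustrate the invalidation behavior, the hashing logic above is reproduced here as a standalone sketch; the table and query values are hypothetical:

```typescript
// Same hashing logic as in convex/insights.ts, reproduced for a
// self-contained demonstration of cache-key invalidation.
function simpleHash(str: string): string {
  let hash = 0
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash) + str.charCodeAt(i)
    hash = hash & hash // keep within 32-bit integer range
  }
  return Math.abs(hash).toString(36)
}

function generateCacheKey(
  tableName: string,
  query: string,
  columnCount: number,
  rowCount: number
): string {
  return simpleHash(`${tableName}|${query}|${columnCount}|${rowCount}`)
}

const base = generateCacheKey('sales', 'SELECT * FROM sales', 5, 1000)
// Adding a WHERE clause changes the query component of the signature
const filtered = generateCacheKey('sales', "SELECT * FROM sales WHERE region = 'EU'", 5, 1000)
// New rows change the rowCount component of the signature
const grown = generateCacheKey('sales', 'SELECT * FROM sales', 5, 1200)

console.log(base !== filtered)
console.log(base !== grown)
```

Identical inputs always hash to the same key, which is what lets repeat analyses of an unchanged dataset hit the cache.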

Usage Example

Frontend Integration
import { useState } from 'react'
import { useAction } from 'convex/react'
import { api } from '@/convex/_generated/api'

function DataInsights({ tableName, queryResult }) {
  const generateInsights = useAction(api.insights.generateInsights)
  const [insights, setInsights] = useState(null)
  const [loading, setLoading] = useState(false)
  
  const analyzeData = async () => {
    setLoading(true)
    try {
      // Perform statistical analysis on query results
      const statisticalAnalyses = queryResult.columns.map(col => ({
        columnName: col.name,
        columnType: col.type,
        hasNumericData: isNumericType(col.type),
        stats: calculateStats(queryResult.rows, col.name),
        outliers: findOutliers(queryResult.rows, col.name),
        topPerformers: getTopValues(queryResult.rows, col.name, 5),
        bottomPerformers: getBottomValues(queryResult.rows, col.name, 5),
      }))
      
      const result = await generateInsights({
        tableName,
        query: queryResult.query,
        statisticalAnalyses,
        rowCount: queryResult.rows.length,
        columnCount: queryResult.columns.length,
      })
      
      setInsights(result)
    } finally {
      setLoading(false) // reset even if the action throws
    }
  }
  
  return (
    <div>
      <button onClick={analyzeData} disabled={loading}>
        {loading ? 'Generating insights...' : 'Analyze Data'}
      </button>
      
      {insights && (
        <div>
          {insights.cached && (
            <span>Cached from {new Date(insights.createdAt).toLocaleString()}</span>
          )}
          
          {insights.insights.map((insight, i) => (
            <div key={i} className={`insight-${insight.severity}`}>
              <h3>{insight.title}</h3>
              <p>{insight.description}</p>
              <span className="badge">{insight.type}</span>
            </div>
          ))}
        </div>
      )}
    </div>
  )
}

Insight Types

The AI generates insights in five categories:
  • Outlier - Values that deviate significantly from the norm (>2 standard deviations). Example: “Revenue spike of $1.2M detected in Q3, 3.5x above quarterly average”
  • Trend - Directional patterns over time (increasing, decreasing, stable). Example: “Customer satisfaction declining 12% month-over-month since June”
  • Aggregation - Summary statistics and aggregate findings. Example: “Top 10% of customers account for 67% of total revenue”
  • Pattern - Recurring behaviors or correlations. Example: “Sales peak every Monday at 2pm, suggesting optimal campaign timing”
  • Anomaly - Unusual events or data quality issues. Example: “47 orders with negative prices detected, potential data entry error”
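The insight shape returned to clients mirrors the Zod schema in generateInsights. A minimal TypeScript sketch for consumers (the sortBySeverity helper is illustrative, not part of the API):

```typescript
// Mirrors the Zod schema used by generateInsights.
type Insight = {
  title: string
  description: string
  type: 'outlier' | 'trend' | 'aggregation' | 'pattern' | 'anomaly'
  severity: 'low' | 'medium' | 'high'
}

// Illustrative helper: surface high-severity insights first in the UI.
function sortBySeverity(insights: Insight[]): Insight[] {
  const rank = { high: 0, medium: 1, low: 2 }
  return [...insights].sort((a, b) => rank[a.severity] - rank[b.severity])
}

const sorted = sortBySeverity([
  { title: 'Stable revenue', description: '...', type: 'trend', severity: 'low' },
  { title: 'Negative prices', description: '...', type: 'anomaly', severity: 'high' },
])
console.log(sorted[0].severity) // 'high'
```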

Performance Optimization

Cache Hit Rate

Monitor cache effectiveness:
type InsightLog = { cached: boolean }

function calculateCacheHitRate(logs: InsightLog[]): number {
  if (logs.length === 0) return 0 // avoid division by zero
  const cached = logs.filter(log => log.cached).length
  return (cached / logs.length) * 100
}
Target: >60% cache hit rate for typical workloads.

Statistical Analysis Offloading

Perform statistical analysis in DuckDB queries rather than in JavaScript for large datasets:
-- Calculate stats in DuckDB
SELECT 
  AVG(revenue) as mean,
  MEDIAN(revenue) as median,
  STDDEV(revenue) as stddev,
  MIN(revenue) as min,
  MAX(revenue) as max
FROM sales
For datasets larger than roughly 10k rows, pushing aggregation into DuckDB is typically orders of magnitude faster than computing the same statistics row by row in JavaScript.
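One row of that DuckDB query can then feed the statisticalAnalyses argument directly. A hedged sketch; the StatsRow shape and toAnalysis helper are illustrative adapters, not part of the documented API:

```typescript
// Illustrative only: adapt one row of the DuckDB stats query above into the
// per-column entry shape that generateInsights expects.
type StatsRow = { mean: number; median: number; stddev: number; min: number; max: number }

function toAnalysis(columnName: string, row: StatsRow) {
  return {
    columnName,
    columnType: 'DOUBLE', // assumed numeric column type
    hasNumericData: true, // already numeric: stats were computed in SQL
    stats: row,
  }
}

const revenue = toAnalysis('revenue', { mean: 120.5, median: 98, stddev: 40.2, min: 3, max: 510 })
console.log(revenue.stats.mean) // 120.5
```

This keeps the heavy computation in the database and sends only a small summary object to the Convex action.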

AI Model Configuration

Meridian uses Gemini 2.5 Flash for insight generation:
convex/table_agent.ts
import { createGoogleGenerativeAI } from '@ai-sdk/google'

const google_gemini = createGoogleGenerativeAI({
  apiKey: process.env.GEMINI_API_KEY,
})

export const model = google_gemini.languageModel('gemini-2.5-flash')
Benefits of Gemini 2.5 Flash:
  • Fast response times (under 2 seconds typical)
  • Low cost per request
  • Strong reasoning for data analysis
  • Structured output support via generateObject

See Also

  • Schema Reference - Complete insightsCache table definition
  • Query Log - Query execution tracking
  • AI SDK Docs - Vercel AI SDK documentation
