
Overview

Quail BI leverages large language models (LLMs) to transform natural language questions into SQL queries, generate visualizations, and provide intelligent insights. This page explains how AI is integrated throughout the platform.
Quail uses Azure OpenAI Service by default, but can be configured to use other providers supported by the Vercel AI SDK.

AI Architecture

  • Natural Language to SQL: convert user questions into valid SQL queries
  • Chart Generation: automatically select and configure visualizations
  • Query Explanation: explain SQL queries in plain English

Azure OpenAI Integration

Configuration

Quail is configured to use Azure OpenAI Service through the Vercel AI SDK:
app/app/api/biChat/route.ts
import { createAzure } from '@ai-sdk/azure';

const azure = createAzure({
  // Note: NEXT_PUBLIC_-prefixed variables are inlined into the client bundle in
  // Next.js; for server-only secrets, prefer unprefixed names where possible.
  resourceName: process.env.NEXT_PUBLIC_AZURE_RESOURCE_NAME!,
  apiKey: process.env.NEXT_PUBLIC_AZURE_API_KEY!,
});

// Model is selected based on speed mode in the request
Azure OpenAI provides enterprise-grade security, compliance, and data residency options.

Environment Variables

Configure these environment variables in .env.local:
.env.local
# Azure AI Configuration
NEXT_PUBLIC_AZURE_RESOURCE_NAME=your-resource-name
NEXT_PUBLIC_AZURE_API_KEY=your-api-key-here
NEXT_PUBLIC_AZURE_FUNCTION_ENDPOINT=https://your-function.azurewebsites.net
Azure OpenAI is recommended for production:
  • Enterprise security and compliance
  • Data residency controls
  • No data used for model training
  • SLA-backed availability
Setup:
  1. Create Azure OpenAI resource
  2. Deploy GPT-4 or GPT-4o model
  3. Get endpoint URL and API key
  4. Configure environment variables
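A small startup check can catch missing configuration early. The sketch below is an assumed helper, not part of Quail; the variable names come from the section above:

```typescript
// Sketch: fail fast when required Azure variables are missing.
const required = [
  'NEXT_PUBLIC_AZURE_RESOURCE_NAME',
  'NEXT_PUBLIC_AZURE_API_KEY',
];

function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return required.filter(name => !env[name]);
}

// Example usage at startup:
// const missing = missingEnvVars(process.env);
// if (missing.length) throw new Error(`Missing env vars: ${missing.join(', ')}`);

console.log(missingEnvVars({ NEXT_PUBLIC_AZURE_RESOURCE_NAME: 'my-resource' }));
// → ['NEXT_PUBLIC_AZURE_API_KEY']
```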

AI-Powered Features

1. Natural Language to SQL

The chat interface converts natural language questions into SQL queries:
app/app/api/biChat/route.ts
import { generateText } from 'ai';
import { model } from '@/lib/ai/provider';

export async function POST(req: Request) {
  const { message, schema } = await req.json();
  
  const { text: sql } = await generateText({
    model,
    system: `You are a SQL expert. Generate SQL queries for the given database schema.
    
    Schema:
    ${JSON.stringify(schema, null, 2)}
    
    Rules:
    - Only generate SELECT queries (read-only)
    - Use proper SQL syntax
    - Include appropriate JOINs when needed
    - Use table aliases for clarity
    - Return only the SQL query, no explanations`,
    prompt: message,
  });
  
  return Response.json({ sql });
}
The AI receives the full database schema (tables, columns, relationships) but never sees actual data, following the Agentic Hydration pattern.
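As a defense-in-depth measure on top of the prompt's read-only rule, the server can verify the generated statement before executing it. The following is a minimal sketch (an assumption for illustration, not Quail's actual code):

```typescript
// Sketch: reject anything that isn't a single read-only SELECT/WITH statement.
function isReadOnlySelect(sql: string): boolean {
  const stripped = sql
    .replace(/--.*$/gm, '')            // strip line comments
    .replace(/\/\*[\s\S]*?\*\//g, '')  // strip block comments
    .trim();
  // Allow exactly one statement
  if (stripped.split(';').filter(s => s.trim().length > 0).length > 1) return false;
  return /^(select|with)\b/i.test(stripped);
}

console.log(isReadOnlySelect('SELECT * FROM orders'));          // true
console.log(isReadOnlySelect('DROP TABLE orders'));             // false
console.log(isReadOnlySelect('SELECT 1; DELETE FROM orders'));  // false
```

String checks alone are not a security boundary; a production system should also execute queries under a read-only database role.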

2. Chart Generation

The chart editor uses AI to select and configure visualizations:
app/app/api/chartEditor/route.ts
import { generateObject } from 'ai';
import { z } from 'zod';
import { model } from '@/lib/ai/provider';

const chartConfigSchema = z.object({
  type: z.enum(['bar', 'line', 'pie', 'scatter', 'radar', 'polar']),
  xAxis: z.string().describe('Column name for x-axis'),
  yAxis: z.array(z.string()).describe('Column names for y-axis'),
  title: z.string().describe('Chart title'),
  description: z.string().describe('Brief description'),
});

export async function POST(req: Request) {
  const { query, columns } = await req.json();
  
  const { object: config } = await generateObject({
    model,
    schema: chartConfigSchema,
    system: 'You are a data visualization expert.',
    prompt: `Generate chart config for: "${query}"
    Available columns: ${columns.join(', ')}
    
    Choose the most appropriate chart type and axes.`,
  });
  
  return Response.json({ config });
}
The AI only receives column names and the user’s query, not the actual data. This drastically reduces token usage and cost.
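Because the model only sees column names, it can still hallucinate axes that don't exist; a small validation step before rendering guards against that. A hedged sketch (assumed helper, not Quail's actual code):

```typescript
// Sketch: confirm the AI-chosen axes actually exist in the query result columns.
interface ChartAxes {
  xAxis: string;
  yAxis: string[];
}

function validateChartAxes(config: ChartAxes, columns: string[]): boolean {
  const known = new Set(columns);
  return known.has(config.xAxis) && config.yAxis.every(col => known.has(col));
}

const columns = ['month', 'revenue', 'cost'];
console.log(validateChartAxes({ xAxis: 'month', yAxis: ['revenue'] }, columns)); // true
console.log(validateChartAxes({ xAxis: 'month', yAxis: ['profit'] }, columns));  // false
```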

3. Query Explanation

The SQL editor can explain queries in plain English:
lib/actions/explain-query.ts
import { generateText } from 'ai';
import { model } from '@/lib/ai/provider';

export async function explainQuery(sql: string): Promise<string> {
  const { text } = await generateText({
    model,
    system: `You are a SQL expert who explains queries to non-technical users.
    Explain what the query does in simple, clear language.`,
    prompt: `Explain this SQL query:

    ${sql}`,
  });
  
  return text;
}

4. Smart Suggestions

The chat interface provides contextual suggestions:
lib/ai/suggestions.ts
import { generateObject } from 'ai';
import { z } from 'zod';
import { model } from '@/lib/ai/provider';

const suggestionsSchema = z.object({
  suggestions: z.array(z.string()).max(5),
});

export async function getSuggestions(
  userMessage: string,
  schema: any
): Promise<string[]> {
  const { object } = await generateObject({
    model,
    schema: suggestionsSchema,
    system: 'Generate helpful follow-up questions based on the conversation.',
    prompt: `User asked: "${userMessage}"
    
    Database schema: ${JSON.stringify(schema)}
    
    Suggest 5 related questions they might want to ask next.`,
  });
  
  return object.suggestions;
}

Token Optimization

Agentic Hydration Pattern

Quail uses the Agentic Hydration pattern to minimize token usage:
  1. Schema-Based Generation: send the database schema (structure) to the AI, not actual data
  2. Sample Data Only: for chart generation, send only the first row of results
  3. Structured Output: use Zod schemas to constrain AI responses
  4. Client-Side Hydration: inject full data into components after AI generation
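The final step, client-side hydration, amounts to a plain join between the AI-generated config (column names only) and the full result set. The shapes below are assumptions for illustration:

```typescript
// Sketch: inject the full query results into the chart config on the client.
interface ChartConfig {
  xAxis: string;
  yAxis: string[];
}
type Row = Record<string, unknown>;

function hydrate(config: ChartConfig, rows: Row[]) {
  return {
    labels: rows.map(r => r[config.xAxis]),
    series: config.yAxis.map(col => ({
      name: col,
      data: rows.map(r => r[col]),
    })),
  };
}

const chart = hydrate(
  { xAxis: 'month', yAxis: ['revenue'] },
  [{ month: 'Jan', revenue: 10 }, { month: 'Feb', revenue: 20 }]
);
// chart.labels → ['Jan', 'Feb']; chart.series[0].data → [10, 20]
```

The model never touched the row values, so token usage stays flat no matter how large the result set is.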

Token Usage Comparison

Feature             Without Optimization   With Agentic Hydration   Savings
Chart Generation    ~3,500 tokens          ~150 tokens              95%
SQL Generation      ~2,000 tokens          ~500 tokens              75%
Query Explanation   ~1,000 tokens          ~800 tokens              20%
These optimizations result in significant cost savings and faster response times.

Streaming Responses

Quail uses streaming for real-time AI responses:
app/app/api/chat/route.ts
import { streamText } from 'ai';
import { model } from '@/lib/ai/provider';

export async function POST(req: Request) {
  const { messages } = await req.json();
  
  const result = streamText({
    model,
    messages,
    system: 'You are a helpful data analysis assistant.',
  });
  
  return result.toDataStreamResponse();
}
On the client:
components/chat.tsx
import { useChat } from '@ai-sdk/react';

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
  });
  
  return (
    <form onSubmit={handleSubmit}>
      {messages.map(m => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}
      <input
        value={input}
        onChange={handleInputChange}
        placeholder="Ask a question..."
      />
    </form>
  );
}
Streaming provides instant feedback and improves perceived performance.

Error Handling

Robust error handling for AI operations:
lib/ai/error-handler.ts
import { APICallError } from 'ai';

export async function handleAIOperation<T>(
  operation: () => Promise<T>
): Promise<T> {
  try {
    return await operation();
  } catch (error) {
    // Provider errors surface as APICallError in the AI SDK
    if (APICallError.isInstance(error)) {
      if (error.statusCode === 429) {
        throw new Error('Rate limit exceeded. Please try again later.');
      }
      if (error.statusCode === 400) {
        throw new Error('Invalid request. Please rephrase your question.');
      }
    }
    
    // Generic error
    console.error('AI operation failed:', error);
    throw new Error('AI operation failed. Please try again.');
  }
}

Rate Limiting

Implement rate limiting to prevent abuse:
middleware.ts
import { NextResponse } from 'next/server';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '1 m'),
});

export async function middleware(request: Request) {
  const ip = request.headers.get('x-forwarded-for') ?? 'unknown';
  const { success } = await ratelimit.limit(ip);
  
  if (!success) {
    return new Response('Rate limit exceeded', { status: 429 });
  }
  
  return NextResponse.next();
}
Without rate limiting, AI endpoints can be expensive if abused.
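For local development without Redis, the same sliding-window idea can be sketched in memory. This is an illustration only: per-process counters don't survive restarts or scale across instances, which is why the example above uses a shared store:

```typescript
// Sketch: in-memory sliding-window limiter keyed by client identifier.
function makeRateLimiter(limit: number, windowMs: number) {
  const hits = new Map<string, number[]>();
  return (key: string, now = Date.now()): boolean => {
    // Keep only timestamps inside the current window
    const recent = (hits.get(key) ?? []).filter(t => now - t < windowMs);
    if (recent.length >= limit) {
      hits.set(key, recent);
      return false; // over the limit for this window
    }
    recent.push(now);
    hits.set(key, recent);
    return true;
  };
}

const allow = makeRateLimiter(10, 60_000); // 10 requests per minute, as above
```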

Cost Management

Token Tracking

Track token usage for cost monitoring:
lib/ai/tracking.ts
import { generateText } from 'ai';
import { model } from '@/lib/ai/provider';

export async function generateWithTracking(
  prompt: string,
  userId: string
) {
  const result = await generateText({
    model,
    prompt,
  });
  
  // Persist usage for cost monitoring (logTokenUsage is an app-defined helper)
  await logTokenUsage({
    userId,
    promptTokens: result.usage?.promptTokens ?? 0,
    completionTokens: result.usage?.completionTokens ?? 0,
    totalTokens: result.usage?.totalTokens ?? 0,
  });
  
  return result.text;
}

Cost Estimation

Based on GPT-4o pricing (as of 2025):
Operation           Avg. Tokens   Cost per Request   Requests/Day   Monthly Cost
SQL Generation      500           $0.0015            100            $4.50
Chart Config        150           $0.00045           50             $0.68
Query Explanation   800           $0.0024            20             $1.44
Total                                                               $6.62
Actual costs vary based on usage patterns and model pricing.
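Each row is simply cost per request × requests per day × ~30 days; a quick sketch for reproducing the table:

```typescript
// Sketch: monthly cost from per-request cost and daily volume (30-day month).
function monthlyCost(
  costPerRequest: number,
  requestsPerDay: number,
  days = 30
): number {
  return Number((costPerRequest * requestsPerDay * days).toFixed(2));
}

console.log(monthlyCost(0.0015, 100)); // 4.5  (SQL generation row)
```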

Best Practices

  • Use Structured Output: always use Zod schemas with generateObject for predictable results
  • Minimize Context: send only essential information to reduce token usage
  • Cache Results: cache common queries and chart configurations
  • Implement Fallbacks: have fallback behavior for when AI calls fail
  • Monitor Usage: track token usage and costs
  • Rate Limit: protect AI endpoints against abuse with rate limiting

Next Steps

  • Agentic Hydration: learn about the pattern that powers efficient AI generation
  • Chat Interface: use natural language to query your data
  • Chart Generation: create visualizations with AI assistance
  • Environment Setup: configure Azure AI and other providers
