
Overview

Quail BI leverages large language models (LLMs) to transform natural language questions into SQL queries, generate visualizations, and provide intelligent insights. This page explains how AI is integrated throughout the platform.
Quail uses Azure OpenAI Service by default, but can be configured to use other providers supported by the Vercel AI SDK.

AI Architecture

  • Natural Language to SQL: convert user questions into valid SQL queries
  • Chart Generation: automatically select and configure visualizations
  • Query Explanation: explain SQL queries in plain English

Azure OpenAI Integration

Configuration

Quail is configured to use Azure OpenAI Service through the Vercel AI SDK:
app/app/api/biChat/route.ts
import { createAzure } from '@ai-sdk/azure';

const azure = createAzure({
  // Note: NEXT_PUBLIC_-prefixed variables are inlined into the client bundle in
  // Next.js; for server-only secrets, prefer unprefixed names where possible.
  resourceName: process.env.NEXT_PUBLIC_AZURE_RESOURCE_NAME!,
  apiKey: process.env.NEXT_PUBLIC_AZURE_API_KEY!,
});

// Model is selected based on speed mode in the request
Azure OpenAI provides enterprise-grade security, compliance, and data residency options.

Environment Variables

Configure these environment variables in .env.local:
.env.local
# Azure AI Configuration
NEXT_PUBLIC_AZURE_RESOURCE_NAME=your-resource-name
NEXT_PUBLIC_AZURE_API_KEY=your-api-key-here
NEXT_PUBLIC_AZURE_FUNCTION_ENDPOINT=https://your-function.azurewebsites.net
Azure OpenAI is recommended for production:
  • Enterprise security and compliance
  • Data residency controls
  • No data used for model training
  • SLA-backed availability
Setup:
  1. Create Azure OpenAI resource
  2. Deploy GPT-4 or GPT-4o model
  3. Get endpoint URL and API key
  4. Configure environment variables
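A small startup check can catch missing configuration early. The sketch below is an assumed helper, not part of Quail; the variable names come from the section above:

```typescript
// Sketch: fail fast when required Azure variables are missing.
const required = [
  'NEXT_PUBLIC_AZURE_RESOURCE_NAME',
  'NEXT_PUBLIC_AZURE_API_KEY',
];

function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return required.filter(name => !env[name]);
}

// Example usage at startup:
// const missing = missingEnvVars(process.env);
// if (missing.length) throw new Error(`Missing env vars: ${missing.join(', ')}`);

console.log(missingEnvVars({ NEXT_PUBLIC_AZURE_RESOURCE_NAME: 'my-resource' }));
// → ['NEXT_PUBLIC_AZURE_API_KEY']
```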

AI-Powered Features

1. Natural Language to SQL

The chat interface converts natural language questions into SQL queries:
app/app/api/biChat/route.ts
import { generateText } from 'ai';
import { model } from '@/lib/ai/provider';

export async function POST(req: Request) {
  const { message, schema } = await req.json();
  
  const { text: sql } = await generateText({
    model,
    system: `You are a SQL expert. Generate SQL queries for the given database schema.
    
    Schema:
    ${JSON.stringify(schema, null, 2)}
    
    Rules:
    - Only generate SELECT queries (read-only)
    - Use proper SQL syntax
    - Include appropriate JOINs when needed
    - Use table aliases for clarity
    - Return only the SQL query, no explanations`,
    prompt: message,
  });
  
  return Response.json({ sql });
}
The AI receives the full database schema (tables, columns, relationships) but never sees actual data, following the Agentic Hydration pattern.
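As a defense-in-depth measure on top of the prompt's read-only rule, the server can verify the generated statement before executing it. The following is a minimal sketch (an assumption for illustration, not Quail's actual code):

```typescript
// Sketch: reject anything that isn't a single read-only SELECT/WITH statement.
function isReadOnlySelect(sql: string): boolean {
  const stripped = sql
    .replace(/--.*$/gm, '')            // strip line comments
    .replace(/\/\*[\s\S]*?\*\//g, '')  // strip block comments
    .trim();
  // Allow exactly one statement
  if (stripped.split(';').filter(s => s.trim().length > 0).length > 1) return false;
  return /^(select|with)\b/i.test(stripped);
}

console.log(isReadOnlySelect('SELECT * FROM orders'));          // true
console.log(isReadOnlySelect('DROP TABLE orders'));             // false
console.log(isReadOnlySelect('SELECT 1; DELETE FROM orders'));  // false
```

String checks alone are not a security boundary; a production system should also execute queries under a read-only database role.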

2. Chart Generation

The chart editor uses AI to select and configure visualizations:
app/app/api/chartEditor/route.ts
import { generateObject } from 'ai';
import { z } from 'zod';
import { model } from '@/lib/ai/provider';

const chartConfigSchema = z.object({
  type: z.enum(['bar', 'line', 'pie', 'scatter', 'radar', 'polar']),
  xAxis: z.string().describe('Column name for x-axis'),
  yAxis: z.array(z.string()).describe('Column names for y-axis'),
  title: z.string().describe('Chart title'),
  description: z.string().describe('Brief description'),
});

export async function POST(req: Request) {
  const { query, columns } = await req.json();
  
  const { object: config } = await generateObject({
    model,
    schema: chartConfigSchema,
    system: 'You are a data visualization expert.',
    prompt: `Generate chart config for: "${query}"
    Available columns: ${columns.join(', ')}
    
    Choose the most appropriate chart type and axes.`,
  });
  
  return Response.json({ config });
}
The AI only receives column names and the user’s query, not the actual data. This drastically reduces token usage and cost.
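Because the model only sees column names, it can still hallucinate axes that don't exist; a small validation step before rendering guards against that. A hedged sketch (assumed helper, not Quail's actual code):

```typescript
// Sketch: confirm the AI-chosen axes actually exist in the query result columns.
interface ChartAxes {
  xAxis: string;
  yAxis: string[];
}

function validateChartAxes(config: ChartAxes, columns: string[]): boolean {
  const known = new Set(columns);
  return known.has(config.xAxis) && config.yAxis.every(col => known.has(col));
}

const columns = ['month', 'revenue', 'cost'];
console.log(validateChartAxes({ xAxis: 'month', yAxis: ['revenue'] }, columns)); // true
console.log(validateChartAxes({ xAxis: 'month', yAxis: ['profit'] }, columns));  // false
```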

3. Query Explanation

The SQL editor can explain queries in plain English:
lib/actions/explain-query.ts
import { generateText } from 'ai';
import { model } from '@/lib/ai/provider';

export async function explainQuery(sql: string): Promise<string> {
  const { text } = await generateText({
    model,
    system: `You are a SQL expert who explains queries to non-technical users.
    Explain what the query does in simple, clear language.`,
    prompt: `Explain this SQL query:

    ${sql}`,
  });
  
  return text;
}

4. Smart Suggestions

The chat interface provides contextual suggestions:
lib/ai/suggestions.ts
import { generateObject } from 'ai';
import { z } from 'zod';
import { model } from '@/lib/ai/provider';

const suggestionsSchema = z.object({
  suggestions: z.array(z.string()).max(5),
});

export async function getSuggestions(
  userMessage: string,
  schema: any
): Promise<string[]> {
  const { object } = await generateObject({
    model,
    schema: suggestionsSchema,
    system: 'Generate helpful follow-up questions based on the conversation.',
    prompt: `User asked: "${userMessage}"
    
    Database schema: ${JSON.stringify(schema)}
    
    Suggest 5 related questions they might want to ask next.`,
  });
  
  return object.suggestions;
}

Token Optimization

Agentic Hydration Pattern

Quail uses the Agentic Hydration pattern to minimize token usage:
  1. Schema-Based Generation: send the database schema (structure) to the AI, not actual data
  2. Sample Data Only: for chart generation, send only the first row of results
  3. Structured Output: use Zod schemas to constrain AI responses
  4. Client-Side Hydration: inject full data into components after AI generation
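The final step, client-side hydration, amounts to a plain join between the AI-generated config (column names only) and the full result set. The shapes below are assumptions for illustration:

```typescript
// Sketch: inject the full query results into the chart config on the client.
interface ChartConfig {
  xAxis: string;
  yAxis: string[];
}
type Row = Record<string, unknown>;

function hydrate(config: ChartConfig, rows: Row[]) {
  return {
    labels: rows.map(r => r[config.xAxis]),
    series: config.yAxis.map(col => ({
      name: col,
      data: rows.map(r => r[col]),
    })),
  };
}

const chart = hydrate(
  { xAxis: 'month', yAxis: ['revenue'] },
  [{ month: 'Jan', revenue: 10 }, { month: 'Feb', revenue: 20 }]
);
// chart.labels → ['Jan', 'Feb']; chart.series[0].data → [10, 20]
```

The model never touched the row values, so token usage stays flat no matter how large the result set is.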

Token Usage Comparison

Feature             Without Optimization   With Agentic Hydration   Savings
Chart Generation    ~3,500 tokens          ~150 tokens              95%
SQL Generation      ~2,000 tokens          ~500 tokens              75%
Query Explanation   ~1,000 tokens          ~800 tokens              20%
These optimizations result in significant cost savings and faster response times.

Streaming Responses

Quail uses streaming for real-time AI responses:
app/app/api/chat/route.ts
import { streamText } from 'ai';
import { model } from '@/lib/ai/provider';

export async function POST(req: Request) {
  const { messages } = await req.json();
  
  const result = streamText({
    model,
    messages,
    system: 'You are a helpful data analysis assistant.',
  });
  
  return result.toDataStreamResponse();
}
On the client:
components/chat.tsx
import { useChat } from '@ai-sdk/react';

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
  });
  
  return (
    <form onSubmit={handleSubmit}>
      {messages.map(m => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}
      <input
        value={input}
        onChange={handleInputChange}
        placeholder="Ask a question..."
      />
    </form>
  );
}
Streaming provides instant feedback and improves perceived performance.

Error Handling

Robust error handling for AI operations:
lib/ai/error-handler.ts
import { APICallError } from 'ai';

export async function handleAIOperation<T>(
  operation: () => Promise<T>
): Promise<T> {
  try {
    return await operation();
  } catch (error) {
    // Provider errors surface as APICallError in the AI SDK
    if (APICallError.isInstance(error)) {
      if (error.statusCode === 429) {
        throw new Error('Rate limit exceeded. Please try again later.');
      }
      if (error.statusCode === 400) {
        throw new Error('Invalid request. Please rephrase your question.');
      }
    }
    
    // Generic error
    console.error('AI operation failed:', error);
    throw new Error('AI operation failed. Please try again.');
  }
}

Rate Limiting

Implement rate limiting to prevent abuse:
middleware.ts
import { NextResponse } from 'next/server';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '1 m'),
});

export async function middleware(request: Request) {
  const ip = request.headers.get('x-forwarded-for') ?? 'unknown';
  const { success } = await ratelimit.limit(ip);
  
  if (!success) {
    return new Response('Rate limit exceeded', { status: 429 });
  }
  
  return NextResponse.next();
}
Without rate limiting, AI endpoints can be expensive if abused.
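For local development without Redis, the same sliding-window idea can be sketched in memory. This is an illustration only: per-process counters don't survive restarts or scale across instances, which is why the example above uses a shared store:

```typescript
// Sketch: in-memory sliding-window limiter keyed by client identifier.
function makeRateLimiter(limit: number, windowMs: number) {
  const hits = new Map<string, number[]>();
  return (key: string, now = Date.now()): boolean => {
    // Keep only timestamps inside the current window
    const recent = (hits.get(key) ?? []).filter(t => now - t < windowMs);
    if (recent.length >= limit) {
      hits.set(key, recent);
      return false; // over the limit for this window
    }
    recent.push(now);
    hits.set(key, recent);
    return true;
  };
}

const allow = makeRateLimiter(10, 60_000); // 10 requests per minute, as above
```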

Cost Management

Token Tracking

Track token usage for cost monitoring:
lib/ai/tracking.ts
import { generateText } from 'ai';
import { model } from '@/lib/ai/provider';

export async function generateWithTracking(
  prompt: string,
  userId: string
) {
  const result = await generateText({
    model,
    prompt,
  });
  
  // Persist usage for cost monitoring (logTokenUsage is an app-defined helper)
  await logTokenUsage({
    userId,
    promptTokens: result.usage?.promptTokens ?? 0,
    completionTokens: result.usage?.completionTokens ?? 0,
    totalTokens: result.usage?.totalTokens ?? 0,
  });
  
  return result.text;
}

Cost Estimation

Based on GPT-4o pricing (as of 2025):
Operation           Avg. Tokens   Cost per Request   Requests/Day   Monthly Cost
SQL Generation      500           $0.0015            100            $4.50
Chart Config        150           $0.00045           50             $0.68
Query Explanation   800           $0.0024            20             $1.44
Total                                                               $6.62
Actual costs vary based on usage patterns and model pricing.
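Each row is simply cost per request × requests per day × ~30 days; a quick sketch for reproducing the table:

```typescript
// Sketch: monthly cost from per-request cost and daily volume (30-day month).
function monthlyCost(
  costPerRequest: number,
  requestsPerDay: number,
  days = 30
): number {
  return Number((costPerRequest * requestsPerDay * days).toFixed(2));
}

console.log(monthlyCost(0.0015, 100)); // 4.5  (SQL generation row)
```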

Best Practices

  • Use Structured Output: always use Zod schemas with generateObject for predictable results
  • Minimize Context: send only essential information to reduce token usage
  • Cache Results: cache common queries and chart configurations
  • Implement Fallbacks: have fallback behavior for when AI calls fail
  • Monitor Usage: track token usage and costs
  • Rate Limit: protect AI endpoints against abuse with rate limiting

Next Steps

  • Agentic Hydration: learn about the pattern that powers efficient AI generation
  • Chat Interface: use natural language to query your data
  • Chart Generation: create visualizations with AI assistance
  • Environment Setup: configure Azure AI and other providers
