
Overview

The /api/chat endpoint powers Maxw AI’s core conversational experience. It uses Anthropic’s Claude Sonnet 4.5 with native tool support, streaming responses, and extended thinking for complex reasoning tasks.
This endpoint returns streaming responses in the AI SDK’s UIMessageStream format. Responses include text chunks, tool calls, and thinking blocks in real time.

Endpoint

POST /api/chat

Authentication

Requires a valid user session via Better-Auth. The endpoint extracts user context from session headers.
const authData = await auth.api.getSession({ headers: await headers() });
if (!authData?.user) {
  return new Response(JSON.stringify({ error: "Unauthorized" }), { status: 401 });
}

Request Body

messages
UIMessage[]
required
Array of conversation messages. Each message follows the AI SDK UIMessage format with role (“user” | “assistant” | “system”) and content (string or multi-part content).
interface UIMessage {
  role: "user" | "assistant" | "system";
  content: string | Array<TextPart | ImagePart | ToolCallPart>;
  experimental_providerMetadata?: {
    anthropic?: {
      containerId?: string; // For container persistence
    };
  };
}
id
string
required
Unique chat session identifier (chatId). Used for context persistence and associating messages with a conversation.
trigger
string
Optional trigger type for the chat interaction (e.g., “user”, “suggestion”, “auto”).
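
A minimal client request can be assembled like this (the helper and the chat ID value are illustrative; the field names id, trigger, and messages match the spec above):

```typescript
// Hypothetical helper that assembles a request body for POST /api/chat.
// Field names follow the spec above; values are illustrative.
interface ChatRequestBody {
  id: string;
  trigger?: "user" | "suggestion" | "auto";
  messages: { role: "user" | "assistant" | "system"; content: string }[];
}

function buildChatRequest(chatId: string, userText: string): ChatRequestBody {
  return {
    id: chatId,
    trigger: "user",
    messages: [{ role: "user", content: userText }],
  };
}

// Send with fetch:
// await fetch("/api/chat", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildChatRequest("chat_abc123", "What's due this week?")),
// });
```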

Agent Context Building

The endpoint automatically builds rich context for the AI agent on each request:
interface AgentContext {
  userId: string;           // Authenticated user ID
  fullName: string;         // User's display name
  schoolName: string;       // User's educational institution
  classes: CanvasCourse[];  // Canvas LMS courses
  currentDateTime: string;  // Localized current date/time
  timezone: string;         // User's timezone (from geolocation or browser)
  chatId: string;          // Current chat session ID
  country?: string;        // User's country (from Vercel geolocation)
  city?: string;           // User's city
  region?: string;         // User's region/state
}

Context Collection Process

route.ts:24-86
  1. Geolocation Extraction: Captures user location from Vercel request headers
    const location = geolocation(request);
    
  2. User Authentication: Validates session and retrieves user profile
    const authData = await auth.api.getSession({ headers: await headers() });
    const userId = authData.user.id;
    const fullName = authData.user.name;
    
  3. Canvas Integration: Loads user’s enrolled courses
    const classesResponse = await getAllCanvasCourses();
    const classes = typeof classesResponse === "string" ? [] : classesResponse;
    
  4. Timezone Detection: Uses geolocation or browser defaults
    const timezone = location.countryRegion || Intl.DateTimeFormat().resolvedOptions().timeZone;
    
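Steps 3 and 4 reduce to small pure helpers, sketched below (types and names are illustrative; the real route works with the shapes shown above):

```typescript
// Hypothetical sketch of steps 3-4: a Canvas response that comes back as an
// error string is normalized to an empty course list, and the timezone falls
// back to the runtime default when geolocation data is missing.
type CanvasCourse = { id: number; name: string };

function normalizeCourses(response: CanvasCourse[] | string): CanvasCourse[] {
  return typeof response === "string" ? [] : response;
}

function resolveTimezone(geoRegion?: string): string {
  return geoRegion || Intl.DateTimeFormat().resolvedOptions().timeZone;
}
```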

AI Model Configuration

The endpoint uses Claude Sonnet 4.5 with several advanced features:

Extended Thinking

Enables deep reasoning for complex queries with a 10,000 token budget:
thinking: {
  type: "enabled" as const,
  budgetTokens: 10000,
}
Extended thinking allows the model to work through complex problems step-by-step before responding, improving accuracy for mathematical reasoning, planning, and multi-step tasks.
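
In context, the thinking block is passed through provider options (a sketch; the nesting under anthropic follows the AI SDK convention, so verify the exact keys against the SDK version you run):

```typescript
// Sketch: passing the extended-thinking config via Anthropic provider options.
const providerOptions = {
  anthropic: {
    thinking: {
      type: "enabled" as const,
      budgetTokens: 10000, // cap on tokens spent reasoning before the reply
    },
  },
};
```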

Container Skills

Persistent execution environment with document processing capabilities: route.ts:108-124
container: {
  id: existingContainerId, // Reuse container from previous messages
  skills: [
    { type: "anthropic" as const, skillId: "pptx" },  // PowerPoint creation/editing
    { type: "anthropic" as const, skillId: "docx" },  // Word document handling
    { type: "anthropic" as const, skillId: "xlsx" },  // Excel spreadsheet processing
    { type: "anthropic" as const, skillId: "pdf" },   // PDF generation
    { type: "custom" as const, skillId: "skill_01VmZ8Be2T5orYF7i1YiBUTv" },
    { type: "custom" as const, skillId: "skill_01KxC6EtBShVCeb2jVEs7gXW" },
  ],
}
Containers persist files across conversation turns and expire after ~4.5 minutes of inactivity. The existingContainerId is extracted from the last assistant message to maintain continuity.
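
The continuity logic can be sketched like this (the message shape follows the UIMessage interface above; the scan itself is an illustration, not the exact route code):

```typescript
// Hypothetical sketch: walk the history backwards and pull containerId from
// the most recent assistant message's provider metadata.
interface MessageLike {
  role: string;
  experimental_providerMetadata?: { anthropic?: { containerId?: string } };
}

function findExistingContainerId(messages: MessageLike[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (msg.role === "assistant") {
      return msg.experimental_providerMetadata?.anthropic?.containerId;
    }
  }
  return undefined; // no prior assistant turn: a fresh container is created
}
```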

Prompt Caching

Optimizes token usage by caching static content: route.ts:133-156
// Cache system prompt (largest static content)
const systemPromptWithCache: ModelMessage = {
  role: "system",
  content: buildSystemPrompt(context),
  providerOptions: {
    anthropic: { cacheControl: { type: "ephemeral" } },
  },
};

// Cache conversation history at strategic breakpoint
const messagesWithCache = modelMessages.map((msg, idx) => {
  if (idx === Math.max(0, modelMessages.length - 4)) {
    return {
      ...msg,
      providerOptions: {
        anthropic: { cacheControl: { type: "ephemeral" } },
      },
    };
  }
  return msg;
});
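
The breakpoint condition in the map reduces to a one-liner, which makes the behavior easy to check (a sketch mirroring the condition above):

```typescript
// Mirrors the condition in the map above: the ephemeral cache marker goes on
// the message four places from the end, clamped to index 0 for short histories.
function cacheBreakpointIndex(messageCount: number): number {
  return Math.max(0, messageCount - 4);
}
```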

Tool Execution

The agent has access to native and custom tools:

Native Anthropic Tools

code_execution

Python sandbox for calculations and programmatic tool calling

web_search

Location-aware web search (uses geolocation from context)

memory

Persistent user memory with filesystem-like interface

web_fetch

Fetch and process web content from URLs

Custom Tools

  • searchContent: Semantic search over Canvas assignments, pages, syllabus
  • getClassAssignments: Fetch assignments (programmatic calling only)
  • getTodos: Retrieve user’s todo list with filtering
  • createTodo: Create new tasks with Canvas linking
  • updateTodo: Modify existing todos
  • deleteTodo: Permanently delete tasks
  • createStudySet: Generate flashcards with term/definition pairs
Tools marked “programmatic calling only” are invoked from within Python code execution blocks, enabling efficient batch operations without round-trip latency.
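
A custom tool like getTodos might be shaped as follows (an illustrative sketch; the real implementations live in the Custom Tools docs, and the Todo fields and filter parameter here are assumptions):

```typescript
// Hypothetical tool shape modeled on the AI SDK's tool convention:
// a description plus an async execute function.
interface Todo { id: string; title: string; done: boolean }

const todos: Todo[] = [
  { id: "1", title: "Read chapter 3", done: false },
  { id: "2", title: "Submit lab report", done: true },
];

const getTodosTool = {
  description: "Retrieve the user's todo list, optionally filtered by status",
  execute: async ({ done }: { done?: boolean }): Promise<Todo[]> =>
    done === undefined ? todos : todos.filter((t) => t.done === done),
};
```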

Streaming Response

The endpoint returns a streaming response using AI SDK’s toUIMessageStreamResponse(): route.ts:158-195
const agent = new ToolLoopAgent({
  model: anthropic("claude-sonnet-4-5"),
  instructions: systemPromptWithCache,
  tools,
  providerOptions: providerOptions,
  prepareStep: forwardAnthropicContainerIdFromLastStep,
  onFinish: ({ steps }) => {
    // Log cache performance metrics
    const lastStep = steps[steps.length - 1];
    const metadata = lastStep?.providerMetadata?.anthropic;
    console.log("Cache read tokens:", metadata?.cacheReadInputTokens);
    console.log("Steps completed:", steps.length);
  },
});

const result = await agent.stream({
  messages: messagesWithCache,
  experimental_transform: smoothStream({ chunking: "word" }),
});

return result.toUIMessageStreamResponse();

Stream Format

The stream includes:
  • Text chunks: Streamed word-by-word for smooth rendering
  • Tool calls: Real-time tool execution notifications
  • Thinking blocks: Extended reasoning steps (when enabled)
  • Provider metadata: Container IDs, cache metrics, token usage
On the client, consume the stream with the AI SDK’s useChat hook:
import { useChat } from 'ai/react';

const { messages, input, handleSubmit } = useChat({
  api: '/api/chat',
  body: {
    id: chatId,
    trigger: 'user',
  },
});

Error Handling

The endpoint includes comprehensive error handling:
400 Bad Request
error
Missing or invalid request parameters
{
  "error": "No messages provided or invalid format"
}
{
  "error": "No chat ID provided"
}
401 Unauthorized
error
Invalid or missing session token
{
  "error": "Unauthorized"
}
500 Internal Server Error
error
Model or tool execution errors
{
  "error": "An error occurred while processing your request",
  "details": "Error message details"
}
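
The 400-level validation above can be sketched as a pure check (error strings are copied from the example payloads; the actual route logic may interleave these checks differently):

```typescript
// Hypothetical validation helper returning the status and error message
// for the 400-level branches described above.
function validateRequest(body: { id?: string; messages?: unknown[] }): {
  status: number;
  error?: string;
} {
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    return { status: 400, error: "No messages provided or invalid format" };
  }
  if (!body.id) {
    return { status: 400, error: "No chat ID provided" };
  }
  return { status: 200 };
}
```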

Performance Optimizations

Cache Metrics

The onFinish callback logs cache performance:
const lastStep = steps[steps.length - 1];
const metadata = lastStep?.providerMetadata?.anthropic;

const cacheCreationTokens = metadata?.cacheCreationInputTokens ?? 0;
const cacheReadTokens = metadata?.cacheReadInputTokens ?? 0;
const totalInputTokens = lastStep?.usage?.inputTokens ?? 0;

console.log("Cache creation tokens:", cacheCreationTokens);
console.log("Cache read tokens:", cacheReadTokens);
console.log("Total input tokens:", totalInputTokens);

const cacheHitRate = totalInputTokens > 0 ? (cacheReadTokens / totalInputTokens) * 100 : 0;
console.log("Cache hit rate:", `${Math.round(cacheHitRate)}%`);
Typical cache hit rates range from 60% to 80% for multi-turn conversations, significantly reducing latency and costs.
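
As a sanity check, the hit-rate arithmetic reduces to a small helper (with a zero-denominator guard):

```typescript
// Cache hit rate as a percentage of input tokens served from cache.
function cacheHitRatePercent(cacheReadTokens: number, totalInputTokens: number): number {
  if (totalInputTokens === 0) return 0;
  return Math.round((cacheReadTokens / totalInputTokens) * 100);
}
```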

Smooth Streaming

Word-level chunking improves perceived responsiveness:
experimental_transform: smoothStream({ chunking: "word" })

Agent Configuration

System prompt and tool configuration

Custom Tools

Canvas, Todo, and Study tool implementations

Memory System

Persistent user memory across conversations

Metadata API

Chat title and suggestion generation
