
Overview

The /api/chat endpoint powers Maxw AI’s core conversational experience. It uses Anthropic’s Claude Sonnet 4.5 with native tool support, streaming responses, and extended thinking for complex reasoning tasks.
This endpoint returns streaming responses in the AI SDK’s UIMessageStream format. Responses include text chunks, tool calls, and thinking blocks in real time.

Endpoint

POST /api/chat

Authentication

Requires a valid user session via Better-Auth. The endpoint extracts user context from session headers.
const authData = await auth.api.getSession({ headers: await headers() });
if (!authData?.user) {
  return new Response(JSON.stringify({ error: "Unauthorized" }), { status: 401 });
}

Request Body

messages
UIMessage[]
required
Array of conversation messages. Each message follows the AI SDK UIMessage format with role (“user” | “assistant” | “system”) and content (string or multi-part content).
interface UIMessage {
  role: "user" | "assistant" | "system";
  content: string | Array<TextPart | ImagePart | ToolCallPart>;
  experimental_providerMetadata?: {
    anthropic?: {
      containerId?: string; // For container persistence
    };
  };
}
id
string
required
Unique chat session identifier (chatId). Used for context persistence and associating messages with a conversation.
trigger
string
Optional trigger type for the chat interaction (e.g., “user”, “suggestion”, “auto”).
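
A minimal client request can be assembled like this (the helper and the chat ID value are illustrative; the field names id, trigger, and messages match the spec above):

```typescript
// Hypothetical helper that assembles a request body for POST /api/chat.
// Field names follow the spec above; values are illustrative.
interface ChatRequestBody {
  id: string;
  trigger?: "user" | "suggestion" | "auto";
  messages: { role: "user" | "assistant" | "system"; content: string }[];
}

function buildChatRequest(chatId: string, userText: string): ChatRequestBody {
  return {
    id: chatId,
    trigger: "user",
    messages: [{ role: "user", content: userText }],
  };
}

// Send with fetch:
// await fetch("/api/chat", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildChatRequest("chat_abc123", "What's due this week?")),
// });
```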

Agent Context Building

The endpoint automatically builds rich context for the AI agent on each request:
interface AgentContext {
  userId: string;           // Authenticated user ID
  fullName: string;         // User's display name
  schoolName: string;       // User's educational institution
  classes: CanvasCourse[];  // Canvas LMS courses
  currentDateTime: string;  // Localized current date/time
  timezone: string;         // User's timezone (from geolocation or browser)
  chatId: string;          // Current chat session ID
  country?: string;        // User's country (from Vercel geolocation)
  city?: string;           // User's city
  region?: string;         // User's region/state
}

Context Collection Process

route.ts:24-86
  1. Geolocation Extraction: Captures user location from Vercel request headers
    const location = geolocation(request);
    
  2. User Authentication: Validates session and retrieves user profile
    const authData = await auth.api.getSession({ headers: await headers() });
    const userId = authData.user.id;
    const fullName = authData.user.name;
    
  3. Canvas Integration: Loads user’s enrolled courses
    const classesResponse = await getAllCanvasCourses();
    const classes = typeof classesResponse === "string" ? [] : classesResponse;
    
  4. Timezone Detection: Uses geolocation or browser defaults
    const timezone = location.countryRegion || Intl.DateTimeFormat().resolvedOptions().timeZone;
    
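Steps 3 and 4 reduce to small pure helpers, sketched below (types and names are illustrative; the real route works with the shapes shown above):

```typescript
// Hypothetical sketch of steps 3-4: a Canvas response that comes back as an
// error string is normalized to an empty course list, and the timezone falls
// back to the runtime default when geolocation data is missing.
type CanvasCourse = { id: number; name: string };

function normalizeCourses(response: CanvasCourse[] | string): CanvasCourse[] {
  return typeof response === "string" ? [] : response;
}

function resolveTimezone(geoRegion?: string): string {
  return geoRegion || Intl.DateTimeFormat().resolvedOptions().timeZone;
}
```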

AI Model Configuration

The endpoint uses Claude Sonnet 4.5 with several advanced features:

Extended Thinking

Enables deep reasoning for complex queries with a 10,000 token budget:
thinking: {
  type: "enabled" as const,
  budgetTokens: 10000,
}
Extended thinking allows the model to work through complex problems step-by-step before responding, improving accuracy for mathematical reasoning, planning, and multi-step tasks.
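
In context, the thinking block is passed through provider options (a sketch; the nesting under anthropic follows the AI SDK convention, so verify the exact keys against the SDK version you run):

```typescript
// Sketch: passing the extended-thinking config via Anthropic provider options.
const providerOptions = {
  anthropic: {
    thinking: {
      type: "enabled" as const,
      budgetTokens: 10000, // cap on tokens spent reasoning before the reply
    },
  },
};
```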

Container Skills

Persistent execution environment with document processing capabilities: route.ts:108-124
container: {
  id: existingContainerId, // Reuse container from previous messages
  skills: [
    { type: "anthropic" as const, skillId: "pptx" },  // PowerPoint creation/editing
    { type: "anthropic" as const, skillId: "docx" },  // Word document handling
    { type: "anthropic" as const, skillId: "xlsx" },  // Excel spreadsheet processing
    { type: "anthropic" as const, skillId: "pdf" },   // PDF generation
    { type: "custom" as const, skillId: "skill_01VmZ8Be2T5orYF7i1YiBUTv" },
    { type: "custom" as const, skillId: "skill_01KxC6EtBShVCeb2jVEs7gXW" },
  ],
}
Containers persist files across conversation turns and expire after ~4.5 minutes of inactivity. The existingContainerId is extracted from the last assistant message to maintain continuity.
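
The continuity logic can be sketched like this (the message shape follows the UIMessage interface above; the scan itself is an illustration, not the exact route code):

```typescript
// Hypothetical sketch: walk the history backwards and pull containerId from
// the most recent assistant message's provider metadata.
interface MessageLike {
  role: string;
  experimental_providerMetadata?: { anthropic?: { containerId?: string } };
}

function findExistingContainerId(messages: MessageLike[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (msg.role === "assistant") {
      return msg.experimental_providerMetadata?.anthropic?.containerId;
    }
  }
  return undefined; // no prior assistant turn: a fresh container is created
}
```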

Prompt Caching

Optimizes token usage by caching static content: route.ts:133-156
// Cache system prompt (largest static content)
const systemPromptWithCache: ModelMessage = {
  role: "system",
  content: buildSystemPrompt(context),
  providerOptions: {
    anthropic: { cacheControl: { type: "ephemeral" } },
  },
};

// Cache conversation history at strategic breakpoint
const messagesWithCache = modelMessages.map((msg, idx) => {
  if (idx === Math.max(0, modelMessages.length - 4)) {
    return {
      ...msg,
      providerOptions: {
        anthropic: { cacheControl: { type: "ephemeral" } },
      },
    };
  }
  return msg;
});
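
The breakpoint condition in the map reduces to a one-liner, which makes the behavior easy to check (a sketch mirroring the condition above):

```typescript
// Mirrors the condition in the map above: the ephemeral cache marker goes on
// the message four places from the end, clamped to index 0 for short histories.
function cacheBreakpointIndex(messageCount: number): number {
  return Math.max(0, messageCount - 4);
}
```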

Tool Execution

The agent has access to native and custom tools:

Native Anthropic Tools

code_execution

Python sandbox for calculations and programmatic tool calling

web_search

Location-aware web search (uses geolocation from context)

memory

Persistent user memory with filesystem-like interface

web_fetch

Fetch and process web content from URLs

Custom Tools

  • searchContent: Semantic search over Canvas assignments, pages, syllabus
  • getClassAssignments: Fetch assignments (programmatic calling only)
  • getTodos: Retrieve user’s todo list with filtering
  • createTodo: Create new tasks with Canvas linking
  • updateTodo: Modify existing todos
  • deleteTodo: Permanently delete tasks
  • createStudySet: Generate flashcards with term/definition pairs
Tools marked “programmatic calling only” are invoked from within Python code execution blocks, enabling efficient batch operations without round-trip latency.
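
A custom tool like getTodos might be shaped as follows (an illustrative sketch; the real implementations live in the Custom Tools docs, and the Todo fields and filter parameter here are assumptions):

```typescript
// Hypothetical tool shape modeled on the AI SDK's tool convention:
// a description plus an async execute function.
interface Todo { id: string; title: string; done: boolean }

const todos: Todo[] = [
  { id: "1", title: "Read chapter 3", done: false },
  { id: "2", title: "Submit lab report", done: true },
];

const getTodosTool = {
  description: "Retrieve the user's todo list, optionally filtered by status",
  execute: async ({ done }: { done?: boolean }): Promise<Todo[]> =>
    done === undefined ? todos : todos.filter((t) => t.done === done),
};
```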

Streaming Response

The endpoint returns a streaming response using AI SDK’s toUIMessageStreamResponse(): route.ts:158-195
const agent = new ToolLoopAgent({
  model: anthropic("claude-sonnet-4-5"),
  instructions: systemPromptWithCache,
  tools,
  providerOptions: providerOptions,
  prepareStep: forwardAnthropicContainerIdFromLastStep,
  onFinish: ({ steps }) => {
    // Log cache performance metrics
    const lastStep = steps[steps.length - 1];
    const metadata = lastStep?.providerMetadata?.anthropic;
    console.log("Cache read tokens:", metadata?.cacheReadInputTokens);
    console.log("Steps completed:", steps.length);
  },
});

const result = await agent.stream({
  messages: messagesWithCache,
  experimental_transform: smoothStream({ chunking: "word" }),
});

return result.toUIMessageStreamResponse();

Stream Format

The stream includes:
  • Text chunks: Streamed word-by-word for smooth rendering
  • Tool calls: Real-time tool execution notifications
  • Thinking blocks: Extended reasoning steps (when enabled)
  • Provider metadata: Container IDs, cache metrics, token usage
On the client, consume the stream with the AI SDK’s useChat hook:
import { useChat } from 'ai/react';

const { messages, input, handleSubmit } = useChat({
  api: '/api/chat',
  body: {
    id: chatId,
    trigger: 'user',
  },
});

Error Handling

The endpoint includes comprehensive error handling:
400 Bad Request
error
Missing or invalid request parameters
{
  "error": "No messages provided or invalid format"
}
{
  "error": "No chat ID provided"
}
401 Unauthorized
error
Invalid or missing session token
{
  "error": "Unauthorized"
}
500 Internal Server Error
error
Model or tool execution errors
{
  "error": "An error occurred while processing your request",
  "details": "Error message details"
}
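
The 400-level validation above can be sketched as a pure check (error strings are copied from the example payloads; the actual route logic may interleave these checks differently):

```typescript
// Hypothetical validation helper returning the status and error message
// for the 400-level branches described above.
function validateRequest(body: { id?: string; messages?: unknown[] }): {
  status: number;
  error?: string;
} {
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    return { status: 400, error: "No messages provided or invalid format" };
  }
  if (!body.id) {
    return { status: 400, error: "No chat ID provided" };
  }
  return { status: 200 };
}
```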

Performance Optimizations

Cache Metrics

The onFinish callback logs cache performance:
const lastStep = steps[steps.length - 1];
const metadata = lastStep?.providerMetadata?.anthropic;

const cacheCreationTokens = metadata?.cacheCreationInputTokens ?? 0;
const cacheReadTokens = metadata?.cacheReadInputTokens ?? 0;
const totalInputTokens = lastStep?.usage?.inputTokens ?? 0;

console.log("Cache creation tokens:", cacheCreationTokens);
console.log("Cache read tokens:", cacheReadTokens);
console.log("Total input tokens:", totalInputTokens);

const cacheHitRate = totalInputTokens > 0 ? (cacheReadTokens / totalInputTokens) * 100 : 0;
console.log("Cache hit rate:", `${Math.round(cacheHitRate)}%`);
Typical cache hit rates range from 60% to 80% for multi-turn conversations, significantly reducing latency and costs.
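
As a sanity check, the hit-rate arithmetic reduces to a small helper (with a zero-denominator guard):

```typescript
// Cache hit rate as a percentage of input tokens served from cache.
function cacheHitRatePercent(cacheReadTokens: number, totalInputTokens: number): number {
  if (totalInputTokens === 0) return 0;
  return Math.round((cacheReadTokens / totalInputTokens) * 100);
}
```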

Smooth Streaming

Word-level chunking improves perceived responsiveness:
experimental_transform: smoothStream({ chunking: "word" })

Agent Configuration

System prompt and tool configuration

Custom Tools

Canvas, Todo, and Study tool implementations

Memory System

Persistent user memory across conversations

Metadata API

Chat title and suggestion generation
