
Endpoint

POST /api/chat
Send messages to the AI assistant and receive streaming responses. The AI uses RAG (Retrieval-Augmented Generation) to answer questions based on the PDF document content associated with the chat.
This endpoint runs on Vercel’s Edge Runtime for optimal streaming performance.

Request Body

chatId
number
required
The ID of the chat session to send messages to. Must be a valid chat ID created via /api/create-chat.
messages
array
required
Array of message objects representing the conversation history.

Message Object Structure:
  • role (string): Either "user" or "system"
  • content (string): The message content

Response

Returns a streaming response using Server-Sent Events (SSE). The AI response is streamed token-by-token for real-time display.

Response Type

StreamingTextResponse
The response streams text chunks as they are generated by the AI model.

Response Headers

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

Error Responses

error
string
Error message describing what went wrong

404 Not Found

Returned when the specified chat ID doesn’t exist:
{
  "error": "chat not found"
}

500 Internal Server Error

Returned when an unexpected error occurs:
{
  "error": "internal server error"
}
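Because error bodies are plain JSON rather than a stream, clients can check the status code and parse them directly before attempting to read a stream. A minimal sketch (the helper name is illustrative, not part of the API):

```javascript
// Sketch: turn a non-2xx status and its JSON body into a readable message.
// Only the two documented error statuses (404, 500) are handled here.
function handleErrorBody(status, bodyText) {
  if (status === 404 || status === 500) {
    const { error } = JSON.parse(bodyText);
    return `Request failed (${status}): ${error}`;
  }
  return null; // not an error status handled here
}
```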

How It Works

  1. Context Retrieval: The last user message is used to retrieve relevant context from the PDF via Pinecone vector search
  2. Prompt Construction: A system prompt is created with the retrieved context and AI instructions
  3. Streaming Response: GPT-4 generates a streaming response based on the context
  4. Database Storage: Both user and AI messages are saved to the database
The AI will only answer questions based on the PDF content. If the answer isn’t in the context, it will respond: “I’m sorry, but I don’t know the answer to that question”.
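Step 2 (prompt construction) can be sketched as follows. The template wording and helper name are assumptions for illustration; the endpoint's actual prompt may differ, but it combines the retrieved chunks with an instruction to refuse out-of-context questions:

```javascript
// Sketch of prompt construction: retrieved text chunks are joined and
// injected into a system prompt with the refusal instruction.
function buildSystemPrompt(contextChunks) {
  const context = contextChunks.join('\n');
  return [
    'Answer using only the context below.',
    'If the answer is not in the context, reply:',
    '"I\'m sorry, but I don\'t know the answer to that question".',
    'CONTEXT:',
    context,
  ].join('\n');
}
```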

Example Request

const response = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    chatId: 1,
    messages: [
      {
        role: 'user',
        content: 'What is the main topic of this document?'
      }
    ]
  })
});

// Handle streaming response
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  
  const chunk = decoder.decode(value);
  console.log(chunk); // Process each chunk
}

Example Streaming Response

The response is streamed token-by-token:
The main topic of this document is...
(Text appears progressively as tokens are generated)

AI Model Configuration

  • Model: GPT-4 Turbo (gpt-4-1106-preview)
  • Temperature: Default (not specified, typically 1.0)
  • Streaming: Enabled
  • Context: Dynamically retrieved from PDF via semantic search

Message Persistence

Messages are automatically saved to the database:
  • onStart: User message is saved when streaming begins
  • onCompletion: AI response is saved when streaming completes
For multi-turn conversations, include the full message history in the messages array. The API filters and processes messages appropriately.
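For example, a follow-up turn can be built by appending the new user message to the accumulated history before sending (the `chatId` and helper name here are placeholders):

```javascript
// Sketch: build the body for the next turn by appending the new user
// message to the full conversation history.
function nextRequestBody(history, newUserContent) {
  return {
    chatId: 1, // placeholder chat ID
    messages: [...history, { role: 'user', content: newUserContent }],
  };
}

const history = [
  { role: 'user', content: 'What is the main topic of this document?' },
  { role: 'system', content: 'The main topic is...' },
];
const body = nextRequestBody(history, 'Can you summarize it?');
```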

Context Retrieval

The endpoint uses semantic search to find relevant PDF content:
  1. Last user message is embedded using OpenAI embeddings
  2. Similar vectors are retrieved from Pinecone (top-k results)
  3. Retrieved text chunks are injected into the system prompt
  4. AI generates response based on this context
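Step 2 is performed server-side by Pinecone, but the idea can be illustrated with a small in-memory version: score each stored vector against the query embedding by cosine similarity and keep the top-k matches. The vectors and `k` here are toy placeholders:

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Illustrative top-k retrieval over an in-memory list of { id, vector } docs.
function topK(queryVec, docs, k) {
  return docs
    .map((d) => ({ ...d, score: cosine(queryVec, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```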

Best Practices

  • Include conversation history for context-aware responses
  • Keep individual messages under 4000 tokens for optimal performance
  • Handle streaming responses properly in your client
  • Implement error handling for network issues during streaming
  • Display a loading state while waiting for the first token
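The streaming-related practices above can be combined into one reusable consumer: decode chunks as they arrive, fire a callback on the first chunk (to clear a loading state), and return the accumulated text. The callback names are illustrative, not part of the API:

```javascript
// Sketch: consume a ReadableStream of text, with a hook for the first chunk
// so the UI can hide its loading indicator.
async function consumeTextStream(stream, { onFirstChunk, onChunk } = {}) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = '';
  let first = true;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (first) { onFirstChunk?.(); first = false; }
    const chunk = decoder.decode(value, { stream: true });
    text += chunk;
    onChunk?.(chunk); // e.g. append to the displayed message
  }
  return text;
}
```

In a client, pass `response.body` from the fetch call shown in the Example Request, wrapped in a try/catch to surface network errors mid-stream.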
