Endpoint
This endpoint runs on Vercel’s Edge Runtime for optimal streaming performance.
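If the route is implemented as a Next.js App Router handler (an assumption; the route file itself isn't shown on this page), the Edge Runtime is typically selected with a route segment config export:

```typescript
// Next.js route segment config (assumed setup): opts this route
// into Vercel's Edge Runtime for low-latency streaming.
export const runtime = "edge";
```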
Request Body
- The ID of the chat session to send messages to. Must be a valid chat ID created via /api/create-chat.
- Array of message objects representing the conversation history.

Message Object Structure:
- role (string): Either "user" or "system"
- content (string): The message content
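The field names below (`chatId`, `messages`) are assumptions — the original parameter table isn't visible on this page — but the shape follows the descriptions above:

```typescript
// Hypothetical request body shape; property names are assumed,
// the field descriptions come from the docs above.
type ChatRole = "user" | "system";

interface ChatMessage {
  role: ChatRole;      // either "user" or "system"
  content: string;     // the message content
}

interface ChatRequestBody {
  chatId: number;          // assumption: ID returned by /api/create-chat
  messages: ChatMessage[]; // full conversation history
}

const body: ChatRequestBody = {
  chatId: 42,
  messages: [{ role: "user", content: "Summarize page 3 of the PDF." }],
};
```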
Response
Returns a streaming response using Server-Sent Events (SSE). The AI response is streamed token-by-token for real-time display.

Response Type
Response Headers
Error Responses
Error message describing what went wrong
404 Not Found
Returned when the specified chat ID doesn’t exist.

500 Internal Server Error
Returned when an unexpected error occurs.

How It Works
- Context Retrieval: The last user message is used to retrieve relevant context from the PDF via Pinecone vector search
- Prompt Construction: A system prompt is created with the retrieved context and AI instructions
- Streaming Response: GPT-4 generates a streaming response based on the context
- Database Storage: Both user and AI messages are saved to the database
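The four steps above can be sketched as a single handler. The helper names and signatures here (`getContext`, `streamCompletion`, `saveMessage`) are hypothetical stand-ins for the route's real Pinecone, OpenAI, and database code:

```typescript
// Sketch only: dependency-injected helpers stand in for the
// actual Pinecone search, GPT-4 call, and database writes.
interface Msg {
  role: string;
  content: string;
}

interface Deps {
  getContext: (query: string) => Promise<string>;       // Pinecone vector search
  streamCompletion: (
    systemPrompt: string,
    history: Msg[]
  ) => Promise<ReadableStream>;                         // GPT-4 with streaming enabled
  saveMessage: (
    chatId: number,
    role: string,
    content: string
  ) => Promise<void>;                                   // database write
}

async function handleChat(chatId: number, messages: Msg[], deps: Deps) {
  // 1. Context retrieval: search with the last user message
  const lastUser = messages.filter((m) => m.role === "user").at(-1);
  if (!lastUser) throw new Error("no user message in history");
  const context = await deps.getContext(lastUser.content);

  // 2. Prompt construction: inject the retrieved chunks
  const systemPrompt = `Answer using only the following context:\n${context}`;

  // 4. Persistence: the user message is stored as streaming begins
  await deps.saveMessage(chatId, "user", lastUser.content);

  // 3. Streaming response from GPT-4 (the AI reply is saved on completion)
  return deps.streamCompletion(systemPrompt, messages);
}
```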
Example Request
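A minimal client call might look like the following. The `/api/chat` path and the `chatId`/`messages` field names are assumptions, since the original example block isn't visible on this page:

```typescript
// Hypothetical client call; endpoint path and body field names
// are assumed, not confirmed by the docs.
async function sendChatRequest(
  baseUrl: string,
  chatId: number,
  messages: { role: "user" | "system"; content: string }[]
): Promise<Response> {
  return fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chatId, messages }),
  });
}
```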
Example Streaming Response
The response is streamed token-by-token.

AI Model Configuration
- Model: GPT-4 Turbo (gpt-4-1106-preview)
- Temperature: Default (not specified, typically 1.0)
- Streaming: Enabled
- Context: Dynamically retrieved from PDF via semantic search
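Expressed as options in the OpenAI Node SDK's parameter naming (an assumption about which SDK the route uses), the configuration above corresponds to:

```typescript
// Completion settings implied by the list above; the parameter names
// follow the OpenAI Node SDK's chat.completions.create options.
const completionSettings = {
  model: "gpt-4-1106-preview", // GPT-4 Turbo
  stream: true,                // token-by-token streaming
  // temperature intentionally omitted -> provider default (typically 1.0)
};
```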
Message Persistence
Messages are automatically saved to the database:
- onStart: User message is saved when streaming begins
- onCompletion: AI response is saved when streaming completes
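The `onStart`/`onCompletion` names match the callback hooks in the Vercel AI SDK's `OpenAIStream`. The minimal wrapper below is a self-contained illustration of that pattern, not the SDK itself, and `hooks` is a hypothetical stand-in for the real persistence code:

```typescript
// Illustration of the onStart/onCompletion persistence pattern:
// wrap a token source so callbacks fire at the stream boundaries.
function withCallbacks(
  tokens: AsyncIterable<string>,
  hooks: {
    onStart?: () => void;                 // e.g. save the user message
    onCompletion?: (full: string) => void; // e.g. save the full AI reply
  }
) {
  let full = "";
  const iterator = tokens[Symbol.asyncIterator]();
  return new ReadableStream<string>({
    start() {
      hooks.onStart?.(); // fires once, when streaming begins
    },
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        hooks.onCompletion?.(full); // fires once, with the complete text
        controller.close();
      } else {
        full += value;
        controller.enqueue(value);
      }
    },
  });
}
```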
Context Retrieval
The endpoint uses semantic search to find relevant PDF content:
- Last user message is embedded using OpenAI embeddings
- Similar vectors are retrieved from Pinecone (top-k results)
- Retrieved text chunks are injected into the system prompt
- AI generates response based on this context
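These steps can be sketched with the embedding and vector-search calls injected as parameters, so the flow is visible without real API clients. The 0.7 relevance cutoff and top-k of 5 are illustrative assumptions, not values confirmed by this page:

```typescript
// Retrieval flow sketch: embed the query, fetch top-k matches,
// and join the stored text chunks into a context string.
type Match = { score: number; metadata: { text: string } };

async function getContext(
  query: string,
  embed: (text: string) => Promise<number[]>,            // e.g. OpenAI embeddings
  search: (vector: number[], topK: number) => Promise<Match[]> // e.g. Pinecone query
): Promise<string> {
  const vector = await embed(query);       // 1. embed the last user message
  const matches = await search(vector, 5); // 2. top-k similar vectors (k assumed)
  return matches
    .filter((m) => m.score > 0.7)          // assumption: a relevance cutoff
    .map((m) => m.metadata.text)           // 3. collect the stored chunks
    .join("\n");                           // -> injected into the system prompt
}
```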
Best Practices
- Include conversation history for context-aware responses
- Keep individual messages under 4000 tokens for optimal performance
- Handle streaming responses properly in your client
- Implement error handling for network issues during streaming
- Display a loading state while waiting for the first token
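For the streaming-related practices above, one way to consume the token stream on the client is via fetch's `ReadableStream` body:

```typescript
// Reads a streamed Response chunk-by-chunk, invoking onToken for
// each decoded piece (e.g. to append it to the chat UI).
async function readStream(
  res: Response,
  onToken: (token: string) => void
): Promise<string> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    const token = decoder.decode(value, { stream: true });
    full += token;
    onToken(token); // first call is a good place to clear the loading state
  }
  return full;
}
```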