
Overview

The AI Chat feature enables real-time conversations with your PDF documents, powered by OpenAI's GPT-4 model. Responses stream in as they are generated, and the AI grounds its answers in relevant sections of your document retrieved via Pinecone vector search.

Key Features

Streaming Responses

Responses stream in real time via the Vercel AI SDK, providing instant feedback as the AI generates an answer.

Context-Aware

Semantic search retrieves the top 5 most relevant document sections with similarity scores above 0.7.

Message History

All messages are persisted in the database and loaded automatically when you return to a chat.

Auto-Scroll

Chat automatically scrolls to the latest message for seamless conversation flow.

How It Works

1. Ask a Question: Type your question in the chat input at the bottom of the screen. Questions can be about any content in your uploaded PDF.
2. Semantic Search: Your question is converted to vector embeddings and compared against your PDF's content in Pinecone to find the most relevant sections.
3. Context Retrieval: The top 5 matching sections (with similarity score > 0.7) are retrieved and combined into a context block of up to 3,000 characters.
4. AI Response: GPT-4 generates a streaming response based on the retrieved context, appearing word by word in real time.
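Under the hood, the relevance ranking in steps 2 and 3 typically comes down to cosine similarity between the question's embedding and each chunk's embedding. A toy sketch with 3-dimensional vectors (real embeddings have ~1,536 dimensions; the vectors and scores here are illustrative only):

```typescript
// Toy illustration of the similarity score used during semantic search.
// Real embeddings are high-dimensional; these are 3-D for readability.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const question = [0.9, 0.1, 0.0];
const relevantChunk = [0.8, 0.2, 0.1];
const unrelatedChunk = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(question, relevantChunk) > 0.7);  // true: kept
console.log(cosineSimilarity(question, unrelatedChunk) > 0.7); // false: filtered out
```

Chunks whose score clears the 0.7 threshold survive the filter in step 3; everything else is discarded as noise.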

Chat Interface

The chat component is built with React hooks and the Vercel AI SDK:
src/components/ChatComponent.tsx
const ChatComponent = ({ chatId }: Props) => {
  const { data, isLoading } = useQuery({
    queryKey: ["chats", chatId],
    queryFn: async () => {
      const response = await fetch("/api/get-messages", {
        method: "POST",
        body: JSON.stringify({ chatId }),
      });
      if (!response.ok) {
        throw new Error("Error in fetching messages");
      }
      const data = (await response.json()) as Message[];
      return data;
    },
  });

  const { input, handleInputChange, handleSubmit, messages } = useChat({
    api: "/api/chat",
    body: {
      chatId,
    },
    initialMessages: data || [],
  });
  
  // Scroll to the newest message whenever the message list changes.
  useEffect(() => {
    const messageContainer = document.getElementById("message-container");
    if (messageContainer) {
      messageContainer.scrollTo({
        top: messageContainer.scrollHeight,
        behavior: "smooth",
      });
    }
  }, [messages]);

  return (
    <div className="relative max-h-screen overflow-scroll" id="message-container">
      <div className="sticky top-0 inset-x-0 p-2 bg-white h-fit">
        <h3 className="text-xl font-bold">Chat</h3>
      </div>

      <MessageList messages={messages} isLoading={isLoading} />
      
      <form onSubmit={handleSubmit} className="sticky bottom-0 inset-x-0 px-2 py-4 bg-white">
        <div className="flex">
          <Input
            value={input}
            onChange={handleInputChange}
            placeholder="Ask any question..."
            className="w-full"
          />
          <Button className="bg-blue-600 ml-2">
            <Send className="h-4 w-4" />
          </Button>
        </div>
      </form>
    </div>
  );
};

Context Retrieval

The AI retrieves relevant context from your PDF using vector similarity search:
src/lib/context.ts
export async function getContext(query: string, fileKey: string) {
  const queryEmbeddings = await getEmbeddings(query);
  const matches = await getMatchesFromEmbeddings(queryEmbeddings, fileKey);

  const qualifyingDocs = matches.filter(
    (match) => match.score && match.score > 0.7
  );

  type Metadata = {
    text: string;
    pageNumber: number;
  };

  const docs = qualifyingDocs.map((match) => (match.metadata as Metadata).text);
  // Return top 5 vectors, limited to 3000 characters
  return docs.join("\n").substring(0, 3000);
}
The system retrieves the top 5 most similar document chunks and keeps only those with a similarity score above 0.7 to ensure relevance. The combined context is capped at 3,000 characters so it fits comfortably within the prompt sent to GPT-4.
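The filter-and-truncate behavior is easy to see in isolation. The sketch below uses a simplified stand-in for Pinecone's match shape (the Match type and sample scores are illustrative, not the SDK's actual types):

```typescript
// Simplified stand-in for the Pinecone match shape used by getContext.
type Match = { score?: number; metadata?: { text: string; pageNumber: number } };

function buildContext(matches: Match[], maxChars = 3000): string {
  // Keep only matches that clear the 0.7 relevance threshold.
  const qualifying = matches.filter((m) => m.score && m.score > 0.7);
  const docs = qualifying.map((m) => m.metadata!.text);
  // Join the surviving chunks and cap the result at maxChars characters.
  return docs.join("\n").substring(0, maxChars);
}

const matches: Match[] = [
  { score: 0.92, metadata: { text: "Section A", pageNumber: 1 } },
  { score: 0.55, metadata: { text: "Section B", pageNumber: 2 } }, // below threshold
  { score: 0.81, metadata: { text: "Section C", pageNumber: 3 } },
];

console.log(buildContext(matches)); // "Section A\nSection C"
```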

Streaming API

The chat API uses OpenAI’s streaming capabilities via Vercel Edge Runtime:
src/app/api/chat/route.ts
export const runtime = "edge";

const config = new Configuration({
  apiKey: process.env.OPEN_AI_KEY,
});
const openai = new OpenAIApi(config);

export async function POST(req: NextRequest) {
  try {
    const { messages, chatId } = await req.json();
    const _chats = await db.select().from(chats).where(eq(chats.id, chatId));
    if (_chats.length !== 1) {
      return NextResponse.json({ error: "chat not found" }, { status: 404 });
    }
    const fileKey = _chats[0].fileKey;
    const lastMessage = messages[messages.length - 1];
    const context = await getContext(lastMessage.content, fileKey);

    const prompt = {
      role: "system",
      content: `AI assistant is a brand new, powerful, human-like artificial intelligence.
      The traits of AI include expert knowledge, helpfulness, cleverness, and articulateness.
      AI is a well-behaved and well-mannered individual.
      AI is always friendly, kind, and inspiring, and he is eager to provide vivid and thoughtful responses to the user.
      AI has the sum of all knowledge in their brain, and is able to accurately answer nearly any question about any topic in conversation.
      AI assistant is a big fan of Pinecone and Vercel.
      START CONTEXT BLOCK
      ${context}
      END OF CONTEXT BLOCK
      AI assistant will take into account any CONTEXT BLOCK that is provided in a conversation.
      If the context does not provide the answer to the question, the AI assistant will say, "I'm sorry, but I don't know the answer to that question".
      AI assistant will not apologize for previous responses, but instead will indicate that new information was gained.
      AI assistant will not invent anything that is not drawn directly from the context.
      `,
    };

    const response = await openai.createChatCompletion({
      model: "gpt-4-1106-preview",
      messages: [
        prompt,
        // Only user messages are forwarded; retrieved context travels in the system prompt.
        ...messages.filter((message: Message) => message.role === "user"),
      ],
      stream: true,
    });
    
    const stream = OpenAIStream(response, {
      onStart: async () => {
        // Save user's message to database
        await db.insert(dbMessages).values({
          chatId,
          content: lastMessage.content,
          role: "user",
        });
      },
      onCompletion: async (completion) => {
        // Save AI's response to database
        await db.insert(dbMessages).values({
          chatId,
          content: completion,
          role: "system",
        });
      },
    });
    return new StreamingTextResponse(stream);
  } catch (error) {
    return NextResponse.json(
      { error: "internal server error" },
      { status: 500 }
    );
  }
}
The API runs on Vercel's Edge Runtime for low-latency streaming. The user's message is saved to the database when the stream starts (onStart), and the AI's full response is saved once generation completes (onCompletion).
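On the client, a StreamingTextResponse body is just a readable byte stream that can be consumed incrementally, which is how useChat renders tokens as they arrive. The snippet below simulates that with a locally constructed ReadableStream; no network or OpenAI call is involved:

```typescript
// Simulate consuming a streamed text response chunk by chunk,
// the way the chat UI renders tokens as they arrive.
async function readStream(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  return text;
}

// Stand-in stream emitting three chunks, as a streamed completion would.
const fake = new ReadableStream<Uint8Array>({
  start(controller) {
    const encoder = new TextEncoder();
    for (const chunk of ["The answer ", "is on ", "page 5."]) {
      controller.enqueue(encoder.encode(chunk));
    }
    controller.close();
  },
});

readStream(fake).then((text) => console.log(text)); // "The answer is on page 5."
```

In the real component this loop is hidden inside useChat, which appends each decoded chunk to the in-progress assistant message.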

Message Display

Messages are rendered with distinct styling for user and AI responses:
src/components/MessageList.tsx
const MessageList = ({ messages, isLoading }: Props) => {
  if (isLoading) {
    return (
      <div className="absolute top-1/2 left-1/2 -translate-x-1/2 -translate-y-1/2">
        <Loader2 className="w-6 h-6 animate-spin" />
      </div>
    );
  }
  if (!messages) return <></>;
  return (
    <div className="flex flex-col gap-2 px-4">
      {messages.map((message) => {
        return (
          <div
            key={message.id}
            className={cn("flex", {
              "justify-end pl-10": message.role === "user",
              "justify-start pr-10": message.role === "assistant",
            })}
          >
            <div
              className={cn(
                "rounded-lg px-3 text-sm py-1 shadow-md ring-1 ring-gray-900/10",
                {
                  "bg-blue-600 text-white": message.role === "user",
                }
              )}
            >
              <p>{message.content}</p>
            </div>
          </div>
        );
      })}
    </div>
  );
};
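The cn helper used above is not shown in this snippet; it is typically clsx (often combined with tailwind-merge). A minimal stand-in that covers the string-plus-object usage in MessageList, assuming that clsx-style contract:

```typescript
// Minimal clsx-style class-name combiner: strings pass through,
// objects contribute keys whose values are truthy, falsy inputs are skipped.
type ClassValue = string | Record<string, boolean> | undefined | null;

function cn(...inputs: ClassValue[]): string {
  const classes: string[] = [];
  for (const input of inputs) {
    if (!input) continue;
    if (typeof input === "string") {
      classes.push(input);
    } else {
      for (const [key, on] of Object.entries(input)) {
        if (on) classes.push(key);
      }
    }
  }
  return classes.join(" ");
}

console.log(cn("flex", { "justify-end pl-10": true, "justify-start pr-10": false }));
// "flex justify-end pl-10"
```

This is why user and assistant bubbles get mutually exclusive alignment classes: only the condition matching the message's role evaluates to true.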

AI Behavior

The AI is configured with specific behavioral guidelines:
  • Context-bound responses: Only answers based on the provided document context
  • Honest limitations: States when information isn’t available in the context
  • No hallucination: Never invents information not present in the document
  • Incremental learning: Acknowledges new information rather than apologizing
If your question cannot be answered from the PDF content, the AI will respond with “I’m sorry, but I don’t know the answer to that question” rather than making up information.

Message Persistence

All chat messages are stored in a PostgreSQL database using Drizzle ORM:
src/lib/db/schema.ts
// userSystemEnum constrains role to the two values the chat API writes.
export const userSystemEnum = pgEnum('user_system_enum', ['system', 'user'])

export const messages = pgTable('messages', {
  id: serial('id').primaryKey(),
  chatId: integer('chat_id').references(() => chats.id).notNull(),
  content: text('content').notNull(),
  createdAt: timestamp('created_at').notNull().defaultNow(),
  role: userSystemEnum('role').notNull(),
})

Best Practices

The AI performs best with specific, focused questions. Instead of “What is this document about?”, try “What are the key findings in section 3?” or “How does the author define X?”
Since the context includes page numbers in metadata, you can ask questions like “What does page 5 say about…?” for more targeted responses.
The chat maintains conversation history, so you can ask follow-up questions that reference previous answers.

Environment Variables

.env.local
OPEN_AI_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_key
PINECONE_ENVIRONMENT=your_pinecone_environment
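These variables are read via process.env at runtime. A small, hypothetical fail-fast guard (requireEnv is not part of the codebase shown here) can surface a missing key at startup rather than mid-request:

```typescript
// Hypothetical startup guard: throws immediately if a required
// environment variable is unset or empty.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// e.g. const apiKey = requireEnv("OPEN_AI_KEY");
```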
