
System Architecture

PDF AI is a full-stack Next.js application that combines OpenAI models with a vector database to enable intelligent conversations with PDF documents. The architecture follows a serverless, edge-first approach for low latency and horizontal scalability.

High-Level Components

The system consists of four main architectural layers:
  1. Frontend Layer - Next.js 13+ with React Server Components
  2. API Layer - Edge Runtime API routes for low-latency responses
  3. Data Layer - PostgreSQL (via Drizzle ORM) for structured data
  4. AI/ML Layer - OpenAI embeddings and GPT-4 for semantic understanding
The application runs on Vercel’s Edge Runtime, ensuring global distribution and sub-100ms response times for most operations.

Tech Stack

Core Framework

  • Next.js 13+ - React framework with App Router
  • TypeScript - Type-safe development
  • Edge Runtime - Deployed on Vercel Edge Network

AI & Vector Database

  • OpenAI API - text-embedding-ada-002 for embeddings, gpt-4-1106-preview for chat
  • Pinecone - Serverless vector database for semantic search
  • LangChain - Document loading and text splitting utilities

Data & Storage

  • PostgreSQL - Relational database for chats, messages, and user data
  • Drizzle ORM - Type-safe database queries
  • AWS S3 - Object storage for PDF files

Authentication & Payments

  • Clerk - User authentication and management
  • Stripe - Subscription billing

Data Flow

Document Upload Flow

When a user uploads a PDF document, the following sequence occurs:
  1. Upload to S3: PDF file is uploaded to AWS S3 bucket
  2. Download: Server downloads PDF from S3 (src/lib/s3-server.ts:3)
  3. Load PDF: LangChain PDFLoader extracts text from all pages
  4. Split Documents: Text is chunked using RecursiveCharacterTextSplitter
  5. Generate Embeddings: Each chunk is converted to a 1536-dimension vector
  6. Store in Pinecone: Vectors are upserted with metadata (page number, text)
  7. Create Chat: Database record created linking user to document
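
Steps 4 through 6 do the bulk of the indexing work. The chunking step can be sketched with a simplified character-based splitter; this is a stand-in for LangChain's RecursiveCharacterTextSplitter (which additionally tries to break on paragraph and sentence boundaries), and the chunk size and overlap values here are illustrative, not the project's actual settings:

```typescript
// Simplified character-based splitter; a stand-in for LangChain's
// RecursiveCharacterTextSplitter, which additionally tries to break
// on paragraph and sentence boundaries. Sizes are illustrative.
function splitText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    // Advance by chunkSize minus overlap so adjacent chunks share context,
    // which helps retrieval for queries that straddle a chunk boundary.
    start += chunkSize - overlap;
  }
  return chunks;
}
```

Each resulting chunk is then embedded into a 1536-dimension vector and upserted into Pinecone with its page number and text as metadata.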

Query Flow

When a user asks a question about their document:
// src/app/api/chat/route.ts
import { NextRequest, NextResponse } from "next/server";
import { OpenAIStream, StreamingTextResponse } from "ai";
import { eq } from "drizzle-orm";
// Project-local modules; exact import paths are illustrative
import { db } from "@/lib/db";
import { chats } from "@/lib/db/schema";
import { getContext } from "@/lib/context";
import { openai } from "@/lib/openai";

export async function POST(req: NextRequest) {
  const { messages, chatId } = await req.json();

  // 1. Fetch chat metadata; bail out if the chat does not exist
  const _chats = await db.select().from(chats).where(eq(chats.id, chatId));
  if (_chats.length !== 1) {
    return NextResponse.json({ error: "chat not found" }, { status: 404 });
  }
  const fileKey = _chats[0].fileKey;

  // 2. Get relevant context from vector DB
  const lastMessage = messages[messages.length - 1];
  const context = await getContext(lastMessage.content, fileKey);

  // 3. Build prompt with context
  const prompt = {
    role: "system",
    content: `AI assistant...
    START CONTEXT BLOCK
    ${context}
    END OF CONTEXT BLOCK
    ...`,
  };

  // 4. Stream response from GPT-4
  const response = await openai.createChatCompletion({
    model: "gpt-4-1106-preview",
    messages: [prompt, ...messages],
    stream: true,
  });

  return new StreamingTextResponse(OpenAIStream(response));
}

Database Schema

The application uses three primary tables; the chats table, which links a user to an uploaded document, is representative:
// src/lib/db/schema.ts
export const chats = pgTable('chats', {
  id: serial('id').primaryKey(),
  pdfName: text('pdf_name').notNull(),
  pdfUrl: text('pdf_url').notNull(),
  createdAt: timestamp('created_at').notNull().defaultNow(),
  userId: varchar('user_id', { length: 255 }).notNull(),
  fileKey: text('file_key').notNull(),
})
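
The remaining tables follow the same pattern. As a hedged sketch (column and table names here are illustrative; the actual definitions live in src/lib/db/schema.ts), the messages table used by the chat route might look like:

```typescript
// Illustrative sketch only; see src/lib/db/schema.ts for the real definition.
export const dbMessages = pgTable('messages', {
  id: serial('id').primaryKey(),
  chatId: integer('chat_id').references(() => chats.id).notNull(),
  content: text('content').notNull(),
  createdAt: timestamp('created_at').notNull().defaultNow(),
  role: varchar('role', { enum: ['system', 'user'] }).notNull(),
})
```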

Performance Optimizations

The system is optimized for both cold starts and sustained performance.

Edge Runtime

All API routes use export const runtime = "edge" to run on Vercel’s Edge Network, providing:
  • Sub-100ms cold starts
  • Global distribution
  • Automatic scaling
  • Lower costs compared to serverless functions

Streaming Responses

The chat API streams responses using the Vercel AI SDK:
const stream = OpenAIStream(response, {
  onStart: async () => {
    // Save user message to DB
    await db.insert(dbMessages).values({...});
  },
  onCompletion: async (completion) => {
    // Save AI response to DB
    await db.insert(dbMessages).values({...});
  },
});
return new StreamingTextResponse(stream);
This provides instant feedback to users as the AI generates responses.

Pinecone Client Singleton

The Pinecone client is initialized once and reused across requests:
// src/lib/pinecone.ts:12
let pinecone: Pinecone | null = null;

export const getPineconeClient = async () => {
  if (!pinecone) {
    pinecone = new Pinecone({
      apiKey: process.env.PINECONE_API_KEY!,
      environment: process.env.PINECONE_ENVIRONMENT!,
    });
  }
  return pinecone;
};
Ensure all environment variables are properly configured in production. Missing API keys will cause runtime errors.

Security Considerations

Authentication

Clerk middleware protects all authenticated routes:
// src/app/api/create-chat/route.ts:9
const { userId } = getAuth(req);
if (!userId) {
  return NextResponse.json(
    { error: "Authentication error" },
    { status: 401 }
  );
}

Data Isolation

Pinecone namespaces ensure users can only access their own documents:
// src/lib/pinecone.ts:49
const namespace = pineconeIndex.namespace(convertToAscii(fileKey));
Each PDF is stored in a unique namespace, preventing cross-document data leakage.
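
The convertToAscii helper exists because Pinecone namespace names must be plain ASCII, while S3 file keys may contain arbitrary characters. A minimal sketch (the project's actual implementation may differ):

```typescript
// Strip non-ASCII code points so the S3 file key is a valid
// Pinecone namespace name. Minimal sketch; the real helper may differ.
function convertToAscii(input: string): string {
  return input.replace(/[^\x00-\x7F]/g, "");
}
```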

Environment Variables

Sensitive credentials are stored as environment variables:
  • PINECONE_API_KEY - Pinecone authentication
  • OPEN_AI_KEY - OpenAI API access
  • NEXT_PUBLIC_S3_ACCESS_KEY_ID - AWS S3 credentials
  • NEXT_PUBLIC_S3_SECRET_ACCESS_KEY - AWS S3 secret
  • Database connection strings
Caution: Next.js inlines any variable prefixed with NEXT_PUBLIC_ into the client-side bundle, so the two S3 variables above are visible to the browser. The secret access key in particular should be moved to a server-only variable (no NEXT_PUBLIC_ prefix).
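
Because a missing key only surfaces when the first request reaches the affected client, a small fail-fast check at startup can make misconfiguration obvious. A minimal sketch (the helper name is hypothetical, not part of the codebase):

```typescript
// Hypothetical helper: fail fast at startup if a required environment
// variable is missing, instead of erroring on the first API call.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

Calling, for example, requireEnv("PINECONE_API_KEY") at module load means a deploy with incomplete configuration fails immediately rather than on first use.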

Scalability

The architecture supports horizontal scaling:
  • Stateless API routes - No server-side session state
  • Serverless vector database - Pinecone auto-scales
  • Edge distribution - Globally distributed compute
  • Object storage - S3 handles unlimited PDFs
This design is intended to handle thousands of concurrent users and millions of documents without architectural changes.
