How PDF AI Works
PDF AI uses a RAG (Retrieval-Augmented Generation) architecture to enable intelligent conversations with your documents. This guide explains the technical implementation and AI workflows that power the platform.

This page covers technical concepts including vector embeddings, semantic search, and LLM integration. It’s designed for users who want to understand the technology behind PDF AI.
Architecture Overview
PDF AI combines multiple AI technologies into a seamless workflow.

Phase 1: Document Processing
When you upload a PDF, it goes through several processing stages.

Step 1: Secure Upload to S3
Your PDF is first uploaded to AWS S3 for secure, scalable storage.

Step 2: Loading the PDF
The PDF is downloaded from S3 and loaded with LangChain’s PDFLoader, which extracts text content page by page while preserving metadata such as page numbers.
Step 3: Document Splitting
Large documents are split into smaller, manageable chunks for better context retrieval.
- Better Context: Smaller chunks provide more precise context for queries
- Token Limits: LLMs have maximum input sizes; chunks stay within limits
- Improved Accuracy: Focused chunks reduce noise and improve relevance scoring
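The splitting step above can be sketched in plain TypeScript. This is a simplified stand-in for LangChain’s RecursiveCharacterTextSplitter, not the library’s actual implementation, and the chunk size and overlap defaults here are illustrative:

```typescript
// Simplified character splitter: prefers paragraph, then sentence, then word
// boundaries, and overlaps chunks so context is not lost at the edges.
function splitText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    if (end < text.length) {
      // Walk the separators from largest to smallest semantic unit.
      for (const sep of ["\n\n", ". ", " "]) {
        const cut = text.lastIndexOf(sep, end);
        if (cut > start) {
          end = cut + sep.length;
          break;
        }
      }
    }
    chunks.push(text.slice(start, end).trim());
    if (end >= text.length) break;
    // Step back by `overlap` characters so adjacent chunks share context.
    start = Math.max(end - overlap, start + 1);
  }
  return chunks;
}
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from either side.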
Step 4: Creating Vector Embeddings
Each document chunk is converted to a vector embedding, a numerical representation of the text’s semantic meaning, generated with OpenAI’s text-embedding-ada-002 model. Similar texts have similar vectors, enabling semantic search.
How Embeddings Enable Semantic Search
Traditional keyword search looks for exact matches. Vector embeddings enable semantic search:
- Query: “What is the main finding?”
- Matches: “The primary conclusion…”, “The key result shows…”
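Under the hood, “similar meaning” is measured as vector closeness. Here is a minimal sketch using toy 3-dimensional vectors; real text-embedding-ada-002 embeddings have 1536 dimensions, and the numbers below are invented purely for illustration:

```typescript
// Cosine similarity: 1.0 = same direction (same meaning), near 0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy embeddings standing in for the example phrases above:
const query = [0.9, 0.1, 0.0];         // "What is the main finding?"
const conclusion = [0.85, 0.15, 0.05]; // "The primary conclusion…"
const offTopic = [0.0, 0.1, 0.9];      // an unrelated sentence
```

Even though the query and the match share no keywords, their embeddings point in nearly the same direction, so cosine similarity ranks the conclusion far above the off-topic text.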
Why text-embedding-ada-002?
OpenAI’s text-embedding-ada-002 model offers:
- High-quality semantic representations
- 1536-dimensional vectors
- Cost-effective pricing
- Fast generation times
- Excellent performance on retrieval tasks
Step 5: Storing in Pinecone
Embeddings are stored in Pinecone, a specialized vector database optimized for similarity search.
- Index: aipdf (the main database)
- Namespace: One per PDF (identified by file key)
- Vectors: Each chunk stored with its embedding and metadata
Using namespaces isolates each PDF’s embeddings, ensuring queries only search within the correct document.
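Namespace isolation can be illustrated with a tiny in-memory stand-in for Pinecone. The class, method, and field names below are hypothetical (Pinecone performs the real storage and similarity search server-side); the point is that a lookup is always scoped to a single document’s namespace:

```typescript
// Each PDF's vectors live under their own namespace (its file key), so a
// query against one document can never return chunks from another.
type StoredVector = {
  id: string;
  values: number[];
  metadata: { text: string; pageNumber: number };
};

class NamespacedStore {
  private namespaces = new Map<string, StoredVector[]>();

  upsert(namespace: string, vectors: StoredVector[]): void {
    const existing = this.namespaces.get(namespace) ?? [];
    this.namespaces.set(namespace, existing.concat(vectors));
  }

  // Only the given namespace is searched: one PDF's embeddings.
  fetchAll(namespace: string): StoredVector[] {
    return this.namespaces.get(namespace) ?? [];
  }
}

const store = new NamespacedStore();
store.upsert("uploads/report.pdf", [
  { id: "report-0", values: [1, 0], metadata: { text: "Revenue grew 12%", pageNumber: 1 } },
]);
store.upsert("uploads/invoice.pdf", [
  { id: "invoice-0", values: [0, 1], metadata: { text: "Amount due: $450", pageNumber: 1 } },
]);
```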
Phase 2: Query & Response
When you ask a question, PDF AI retrieves relevant context and generates an answer.

Step 1: Query Embedding
Your question is converted to a vector embedding using the same model.

Step 2: Vector Similarity Search
Pinecone finds the most similar document chunks using cosine similarity. The 0.7 threshold ensures only highly relevant context is included, reducing noise and improving answer quality.
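The filtering described here reduces to a small function: keep matches scoring above 0.7, best first, capped at the top 5. Pinecone does this server-side; the sketch below shows the logic in plain TypeScript with invented match data:

```typescript
type Match = { text: string; score: number };

// Keep only matches above the similarity threshold, best first, capped at topK.
function selectContext(matches: Match[], threshold = 0.7, topK = 5): Match[] {
  return matches
    .filter((m) => m.score > threshold) // drop marginal, noisy context
    .sort((a, b) => b.score - a.score)  // most relevant first
    .slice(0, topK);                    // at most 5 chunks reach the prompt
}

const candidates: Match[] = [
  { text: "chunk A", score: 0.95 },
  { text: "chunk B", score: 0.72 },
  { text: "chunk C", score: 0.65 }, // below threshold: excluded
  { text: "chunk D", score: 0.9 },
  { text: "chunk E", score: 0.71 }, // above threshold, but squeezed out of the top 5
  { text: "chunk F", score: 0.8 },
  { text: "chunk G", score: 0.75 },
];
const context = selectContext(candidates);
```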
Step 3: LLM Generation with Context
The retrieved context is combined with your question and sent to GPT-4.
- System Prompt: Instructs GPT-4 to only use provided context
- Context Injection: Retrieved chunks are inserted in the CONTEXT BLOCK
- Streaming: Responses stream word-by-word for better UX
- Persistence: All messages saved to database for chat history
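Prompt assembly can be sketched roughly as below. The 3000-character cap and the CONTEXT BLOCK markers follow this page’s description, but the exact prompt wording is illustrative, not the app’s verbatim prompt:

```typescript
// Build the system prompt: instructions plus a CONTEXT BLOCK of retrieved
// chunks, capped at roughly 3000 characters to respect the token budget.
function buildSystemPrompt(chunks: string[], maxContextChars = 3000): string {
  let context = "";
  for (const chunk of chunks) {
    if (context.length + chunk.length > maxContextChars) break; // stay under the cap
    context += chunk + "\n";
  }
  return [
    "You are a helpful assistant answering questions about a PDF.",
    "Use ONLY the information in the CONTEXT BLOCK below.",
    "If the answer is not in the context, say you don't know.",
    "START CONTEXT BLOCK",
    context.trim(),
    "END OF CONTEXT BLOCK",
  ].join("\n");
}
```

The user’s question is then sent as a separate message alongside this system prompt.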
Step 4: Streaming Response
Responses stream to the client in real-time using the Vercel AI SDK. The useChat hook handles streaming, message state, and UI updates automatically.
Why RAG Architecture?
RAG (Retrieval-Augmented Generation) solves key limitations of LLMs:

Accuracy
LLMs alone can hallucinate. RAG grounds responses in actual document content.
Context Length
GPT-4 has token limits. RAG retrieves only relevant chunks, not entire documents.
Up-to-Date Info
LLMs have training cutoffs. RAG works with your latest documents.
Cost Efficiency
Sending only relevant chunks reduces token usage and API costs significantly.
Performance Optimizations
Edge Functions
PDF AI runs on Vercel Edge Functions for ultra-low latency.

Efficient Chunking
Document splitting uses RecursiveCharacterTextSplitter, which:
- Preserves sentence and paragraph boundaries
- Maintains semantic coherence within chunks
- Optimizes chunk sizes for embedding and retrieval
Namespace Isolation
Each PDF gets its own Pinecone namespace, ensuring:
- Fast, focused queries (search only one document)
- No cross-document contamination
- Easy document deletion
Technology Stack Deep Dive
OpenAI GPT-4 (gpt-4-1106-preview)
Model: GPT-4 Turbo with a 128k context window

Why GPT-4?
- Superior reasoning and comprehension
- Better at following complex instructions
- More accurate with nuanced queries
- Excellent at staying within provided context
Pinecone Vector Database
Purpose: Store and search vector embeddings

Features Used:
- Cosine similarity search
- Namespace isolation per document
- Metadata filtering (page numbers, text)
- Sub-millisecond query times
- Index: aipdf with 1536 dimensions (matching the embedding model)

LangChain
Purpose: Document processing and PDF loading

Components Used:
- PDFLoader: Extracts text from PDFs
- RecursiveCharacterTextSplitter: Intelligent document chunking
- Document: Structured document representation
AWS S3
Purpose: Secure, scalable file storage

Features:
- Server-side encryption
- Presigned URLs for secure access
- High availability and durability
- Integration with Vercel serverless
Next.js 13 App Router
Purpose: Full-stack React framework

Features Used:
- Server Components for efficient data fetching
- API Routes for backend endpoints
- Client Components for interactive UI
- Built-in TypeScript support
Data Flow Summary
Here’s the complete flow from upload to response: PDF upload → S3 storage → text extraction → chunking → embedding → Pinecone storage; then question → query embedding → similarity search → GPT-4 generation with retrieved context → streamed response.

Limitations & Considerations
- 10MB File Limit: Keeps processing times reasonable and costs manageable
- Text-Based PDFs Only: Scanned or image PDFs require OCR preprocessing
- Context Window: Only top 5 chunks (≤3000 chars) included as context
- Similarity Threshold: 0.7 score threshold may miss marginally relevant content
- No Cross-Document Search: Each chat is isolated to one PDF
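On the upload side, the first two limits can be enforced with a simple guard. The function and field names below are illustrative, not the app’s actual code:

```typescript
const MAX_FILE_BYTES = 10 * 1024 * 1024; // the 10MB limit described above

// Returns an error message, or null if the file is acceptable.
function validateUpload(file: { size: number; type: string }): string | null {
  if (file.type !== "application/pdf") return "Only PDF files are supported.";
  if (file.size > MAX_FILE_BYTES) return "File exceeds the 10MB limit.";
  return null;
}
```

Note that scanned PDFs still pass this check; text extraction simply finds no text, which is why OCR preprocessing appears under Future Enhancements.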
Future Enhancements
PDF AI’s architecture is designed for extensibility:
- Multi-document chat: Search across multiple PDFs simultaneously
- OCR support: Process scanned documents and images
- Custom embeddings: Fine-tuned models for domain-specific documents
- Larger context windows: Support for GPT-4 Turbo’s full 128k tokens
- Citation tracking: Highlight specific PDF sections in responses
Next Steps
Try It Yourself
Upload a PDF and see RAG in action
API Reference
Explore the API endpoints and integration options