Overview
PDF AI uses OpenAI’s `text-embedding-ada-002` model to convert PDF text into vector embeddings. These embeddings enable semantic search and AI-powered chat functionality by representing document content as numerical vectors.
Configuration
Environment Variables
Add your OpenAI API key to your `.env` file:
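For example (the variable name `OPEN_AI_KEY` is the one referenced in the Troubleshooting section; the key value shown is a placeholder):

```
OPEN_AI_KEY=sk-your-key-here
```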
Get your API key from the OpenAI Platform Dashboard
Implementation
The OpenAI integration is implemented in `src/lib/embeddings.ts:1-22`:
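The actual file is not reproduced here. As a rough sketch of what such a module might look like, the version below calls OpenAI's REST endpoint directly with `fetch` (the project itself uses the `openai-edge` client, which wraps the same endpoint); the `normalize` helper name and the exact error message are illustrative:

```typescript
// Illustrative sketch of src/lib/embeddings.ts -- not the actual file contents.

// Replace newline characters with spaces to normalize the input text.
export function normalize(text: string): string {
  return text.replace(/\n/g, " ");
}

// Request a 1536-dimensional embedding vector for a piece of PDF text.
export async function getEmbeddings(text: string): Promise<number[]> {
  try {
    const response = await fetch("https://api.openai.com/v1/embeddings", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPEN_AI_KEY}`,
      },
      body: JSON.stringify({
        model: "text-embedding-ada-002",
        input: normalize(text),
      }),
    });
    const result = await response.json();
    return result.data[0].embedding as number[];
  } catch (error) {
    // Log and propagate errors so callers can handle them.
    console.error("error calling openai embeddings api", error);
    throw error;
  }
}
```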
How It Works
- Text Preprocessing: Newline characters are replaced with spaces to normalize the input
- API Call: The text is sent to OpenAI’s embedding endpoint using the `text-embedding-ada-002` model
- Vector Output: Returns a 1536-dimensional vector representing the semantic meaning of the text
- Error Handling: Logs and propagates errors for debugging
Usage in the Application
The `getEmbeddings()` function is called during the PDF processing pipeline (see `src/lib/pinecone.ts:58`):
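In that pipeline, each text chunk is embedded and the resulting vector is stored in Pinecone. A self-contained sketch of the shape of that step follows; the `embed` parameter stands in for `getEmbeddings()` so the example runs without an API key, and the record shape mirrors Pinecone's `{ id, values, metadata }` format (all names here are illustrative, not taken from `src/lib/pinecone.ts`):

```typescript
// Illustrative sketch: turn a PDF chunk into a Pinecone-style record.
interface PineconeRecord {
  id: string;
  values: number[];
  metadata: { text: string; pageNumber: number };
}

// `embed` is injected so this example has no network dependency;
// in the real pipeline it would be getEmbeddings().
async function embedChunk(
  chunk: { text: string; pageNumber: number },
  embed: (text: string) => Promise<number[]>
): Promise<PineconeRecord> {
  const values = await embed(chunk.text);
  return {
    id: `chunk-${chunk.pageNumber}`,
    values,
    metadata: { text: chunk.text, pageNumber: chunk.pageNumber },
  };
}

// Demo with a stub embedder that returns a fixed-length zero vector.
async function demo() {
  const stubEmbed = async (_text: string) => new Array(1536).fill(0);
  const record = await embedChunk({ text: "hello", pageNumber: 1 }, stubEmbed);
  console.log(record.values.length); // 1536
}

demo();
```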
API Parameters
- `model`: The OpenAI embedding model to use. Currently configured for `text-embedding-ada-002`
- `input`: The text to generate embeddings for. Newlines are automatically replaced with spaces
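These correspond to the request body fields of OpenAI's embeddings endpoint. A minimal sketch of how the payload might be assembled (the field names are the OpenAI API's; the helper name is illustrative):

```typescript
// Build the embeddings request body: model fixed, newlines stripped from input.
function buildEmbeddingRequest(text: string) {
  return {
    model: "text-embedding-ada-002",
    input: text.replace(/\n/g, " "),
  };
}

console.log(buildEmbeddingRequest("line one\nline two").input); // prints "line one line two"
```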
Best Practices
- Text Length: The `text-embedding-ada-002` model supports up to 8,191 tokens per request
- Batch Processing: For multiple documents, process embeddings in parallel using `Promise.all()`
- Cost Optimization: Cache embeddings to avoid regenerating them for unchanged content
- Error Handling: Always wrap API calls in try-catch blocks to handle network failures
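The batch-processing advice above can be sketched like this; `fakeEmbed` is a stand-in for `getEmbeddings()` so the example runs without an API key:

```typescript
// Stand-in for getEmbeddings(): returns a small deterministic vector.
// Real embeddings from text-embedding-ada-002 are 1536-dimensional.
async function fakeEmbed(text: string): Promise<number[]> {
  return [text.length, 0, 0];
}

// Embed several chunks concurrently: all requests are issued at once,
// and Promise.all resolves once every one has finished.
async function embedAll(chunks: string[]): Promise<number[][]> {
  return Promise.all(chunks.map((chunk) => fakeEmbed(chunk)));
}

embedAll(["first chunk", "second chunk"]).then((vectors) => {
  console.log(vectors.length); // 2
});
```

With many chunks, a real pipeline would also cap concurrency (e.g. process in fixed-size batches) to stay under OpenAI's rate limits.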
Dependencies
The integration depends on `openai-edge`, a lightweight OpenAI client optimized for edge runtime environments.
Troubleshooting
If you encounter authentication errors, verify that your `OPEN_AI_KEY` environment variable is correctly set and that your API key has sufficient credits.
Common Issues
- Invalid API Key: Ensure the `OPEN_AI_KEY` is correctly formatted and active
- Rate Limiting: Implement exponential backoff for retry logic
- Token Limits: Split large text chunks before sending to the API
- Network Errors: Add timeout handling for slow or failed requests
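The retry advice above can be combined into a small helper. This is a generic sketch, not code from the project:

```typescript
// Retry an async operation with exponential backoff: waits
// baseDelayMs * 2^attempt between tries (100ms, 200ms, 400ms, ...).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  // All attempts failed: propagate the last error to the caller.
  throw lastError;
}
```

For timeout handling, the underlying `fetch` call can additionally be passed an `AbortSignal` (in recent Node.js and edge runtimes, `AbortSignal.timeout(10_000)` aborts the request after ten seconds).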