
What is a Vector Database?

A vector database is a specialized database designed to store and query high-dimensional vectors (embeddings). Unlike traditional databases that search for exact matches, vector databases find semantically similar content using mathematical distance metrics.
Vector databases enable semantic search: finding documents that mean the same thing, even if they use different words.

Why Pinecone?

PDF AI uses Pinecone as its vector database for several reasons:
  • Serverless - No infrastructure to manage
  • Fast - Sub-100ms query latency
  • Scalable - Handles billions of vectors
  • Accurate - Uses state-of-the-art approximate nearest neighbor algorithms
  • Namespace Support - Built-in data isolation

Pinecone Setup

Client Initialization

The Pinecone client is initialized as a singleton to avoid repeated authentication:
// src/lib/pinecone.ts:12
let pinecone: Pinecone | null = null;

export const getPineconeClient = async () => {
  if (!pinecone) {
    pinecone = new Pinecone({
      apiKey: process.env.PINECONE_API_KEY!,
      environment: process.env.PINECONE_ENVIRONMENT!,
    });
  }
  return pinecone;
};
The singleton pattern ensures only one client instance exists across all API requests, reducing authentication overhead.

Index Configuration

PDF AI uses a single Pinecone index named "aipdf":
// src/lib/pinecone.ts:48
const client = await getPineconeClient();
const pineconeIndex = await client.Index("aipdf");
Index Specifications:
  • Dimension: 1536 (matches OpenAI’s text-embedding-ada-002 model)
  • Metric: Cosine similarity
  • Cloud Provider: AWS (typically)
  • Region: Same as application deployment for low latency
The index must be created manually before deploying the application:
  1. Log in to Pinecone Console
  2. Click “Create Index”
  3. Set Name: aipdf
  4. Set Dimensions: 1536
  5. Set Metric: cosine
  6. Choose Serverless deployment
  7. Select your cloud provider and region
  8. Click “Create Index”
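The index can also be created programmatically instead of through the console. This sketch uses the current serverless `createIndex` API from `@pinecone-database/pinecone` (older SDK versions use a different signature); the cloud and region values are examples, and the application itself assumes the index already exists:

```typescript
import { Pinecone } from "@pinecone-database/pinecone";

// One-time setup script (not part of the app). Pick the region
// closest to your deployment to minimize query latency.
const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

await pc.createIndex({
  name: "aipdf",
  dimension: 1536, // must match text-embedding-ada-002 output
  metric: "cosine",
  spec: { serverless: { cloud: "aws", region: "us-east-1" } },
});
```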

Namespace Strategy

Each PDF document is stored in its own namespace for data isolation:
// src/lib/pinecone.ts:49
const namespace = pineconeIndex.namespace(convertToAscii(fileKey));
The fileKey (S3 object key) is converted to ASCII to ensure compatibility:
// src/lib/utils.ts (inferred)
export function convertToAscii(text: string) {
  return text.replace(/[^\x00-\x7F]/g, "");
}
Namespaces are critical for security. Without them, users could access vectors from other users’ documents.

Embedding Generation

Embeddings convert text into numerical vectors that capture semantic meaning.

OpenAI Embedding Model

PDF AI uses OpenAI’s text-embedding-ada-002 model:
// src/lib/embeddings.ts:9
export async function getEmbeddings(text: string) {
  try {
    const response = await openai.createEmbedding({
      model: "text-embedding-ada-002",
      input: text.replace(/\n/g, " "),
    });
    const result = await response.json();
    return result.data[0].embedding as number[];
  } catch (error) {
    console.log("error calling openai embeddings api", error);
    throw error;
  }
}
Model Characteristics:
  • Output Dimension: 1536
  • Max Tokens: 8,191 tokens (~6,000 words)
  • Cost: $0.0001 per 1,000 tokens
  • Performance: State-of-the-art for semantic search
The text-embedding-ada-002 model is optimized for retrieval tasks and provides excellent quality-to-cost ratio.

Text Preprocessing

Before embedding, text is preprocessed:
input: text.replace(/\n/g, " ")
Newlines are replaced with spaces because:
  1. OpenAI's early embedding guidance recommended it, as newlines could degrade embedding quality
  2. Whitespace normalization improves consistency
  3. It slightly reduces token count
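As a self-contained sketch, the preprocessing step could live in a small helper (hypothetical; the project inlines the replace call as shown above, and the added trim() is an extra normalization step not in the original):

```typescript
// Mirrors the preprocessing above: newlines become spaces.
// The trim() is an extra touch, not present in the original code.
function normalizeForEmbedding(text: string): string {
  return text.replace(/\n/g, " ").trim();
}

console.log(normalizeForEmbedding("line one\nline two\n")); // "line one line two"
```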

Embedding Document Chunks

Each document chunk is embedded and prepared for Pinecone:
// src/lib/pinecone.ts:56
async function embedDocument(doc: Document) {
  try {
    const embeddings = await getEmbeddings(doc.pageContent);
    const hash = md5(doc.pageContent);
    return {
      id: hash,
      values: embeddings,
      metadata: {
        text: doc.metadata.text,
        pageNumber: doc.metadata.pageNumber,
      },
    } as PineconeRecord;
  } catch (error) {
    console.log(error);
    throw new Error("unable to embed document");
  }
}
PineconeRecord Structure:
  • id: MD5 hash of content (deterministic, prevents duplicates)
  • values: 1536-dimension embedding vector
  • metadata: Stored alongside vector for retrieval
    • text: Original chunk text (truncated to 36KB)
    • pageNumber: Source page in PDF
Benefits of MD5 hashing:
  1. Deterministic - Same content always produces same ID
  2. Deduplication - Prevents storing identical chunks multiple times
  3. Idempotent Uploads - Re-uploading same document updates existing vectors
  4. No External State - Don’t need a database to track IDs
Trade-offs:
  • MD5 collisions are theoretically possible (but extremely rare)
  • For production systems at massive scale, consider UUIDs with deduplication logic
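The deterministic-ID scheme is easy to reproduce with Node's built-in crypto module (a standalone sketch of the `id: md5(doc.pageContent)` step; the project calls its own md5 helper as shown above):

```typescript
import { createHash } from "crypto";

// Deterministic chunk ID: the hex MD5 digest of the chunk's text.
// Identical content always maps to the same ID, so re-upserting a
// document overwrites rather than duplicates its vectors.
function chunkId(content: string): string {
  return createHash("md5").update(content).digest("hex");
}
```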

Vector Upsert

Vectors are uploaded to Pinecone using the upsert operation:
// src/lib/pinecone.ts:44
const vectors = await Promise.all(documents.flat().map(embedDocument));

const client = await getPineconeClient();
const pineconeIndex = await client.Index("aipdf");
const namespace = pineconeIndex.namespace(convertToAscii(fileKey));

console.log("uploading to pinecone...");
await namespace.upsert(vectors);

Upsert vs Insert

Upsert = Update + Insert
  • If a vector with the same ID exists, it’s updated
  • If it doesn’t exist, it’s inserted
  • Enables idempotent uploads (safe to re-run)
Batch upserts are more efficient than individual inserts. The code uses Promise.all to embed all chunks in parallel, then uploads them together.
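Although the snippet upserts every vector in a single call, Pinecone's guidance is to keep individual upsert requests modest (commonly around 100 vectors, and under the request size limit). A hypothetical batching helper, not in the project's source:

```typescript
// Split an array into fixed-size batches so each upsert request stays small.
function toBatches<T>(items: T[], size = 100): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage sketch: for (const batch of toBatches(vectors)) await namespace.upsert(batch);
```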

Upload Performance

For a typical 10-page PDF:
  • Chunks: ~50-100 (depends on text density)
  • Embedding Time: 0.3s × 100 = 30s (if sequential)
  • Parallel Embedding: ~3-5s (with Promise.all)
  • Upsert Time: ~1-2s (batch operation)
Total upload time: ~5-7 seconds

Semantic Search

Answering a question starts with querying Pinecone to find the most relevant document chunks.

Query Process

// src/lib/context.ts:18
export async function getMatchesFromEmbeddings(
  embeddings: number[],
  fileKey: string
) {
  const pinecone = new PineconeClient();
  await pinecone.init({
    apiKey: process.env.PINECONE_API_KEY!,
    environment: process.env.PINECONE_ENVIRONMENT!,
  });
  const index = await pinecone.Index("aipdf");

  try {
    const namespace = convertToAscii(fileKey);
    const queryResult = await index.query({
      queryRequest: {
        topK: 5,
        vector: embeddings,
        includeMetadata: true,
        namespace,
      },
    });
    return queryResult.matches || [];
  } catch (error) {
    console.log("error querying embeddings", error);
    throw error;
  }
}

Query Parameters

The query in getMatchesFromEmbeddings uses four parameters:
  • topK: 5 - Return the 5 most similar vectors; a larger value widens the context but lengthens the prompt and adds noise
  • vector - The embedding of the user's question
  • includeMetadata: true - Return the stored text and pageNumber with each match
  • namespace - Restrict the search to the current document's vectors

Query Response Structure

interface QueryResponse {
  matches: Array<{
    id: string;           // MD5 hash
    score: number;        // Similarity score (0-1)
    values?: number[];    // Vector (not included by default)
    metadata?: {
      text: string;       // Original chunk text
      pageNumber: number; // Source page
    };
  }>;
}
Similarity Scores:
  • 0.9-1.0: Extremely similar (almost identical meaning)
  • 0.7-0.9: Highly relevant (strong semantic match)
  • 0.5-0.7: Somewhat relevant (related topic)
  • < 0.5: Not relevant (different topic)

Relevance Filtering

Raw query results are filtered by similarity threshold:
// src/lib/context.ts:50
const qualifyingDocs = matches.filter(
  (match) => match.score && match.score > 0.7
);
Only chunks scoring above 0.7 are used for context. This prevents low-quality matches from confusing the AI.
If no chunks score above 0.7, an empty context is returned, and the AI will respond “I don’t know.”
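A sketch of how the filtered matches might then be assembled into a context string for the prompt. The buildContext helper and the 3,000-character cap are hypothetical; the actual assembly in src/lib/context.ts may differ:

```typescript
// Shape of a Pinecone match, reduced to the fields used here.
type Match = { score?: number; metadata?: { text: string; pageNumber: number } };

// Keep only high-confidence matches, join their text, and cap the
// total length so the context fits comfortably in the prompt.
function buildContext(matches: Match[], maxChars = 3000): string {
  const qualifyingDocs = matches.filter((m) => m.score && m.score > 0.7);
  const docs = qualifyingDocs.map((m) => m.metadata?.text ?? "");
  return docs.join("\n").substring(0, maxChars);
}
```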

Cosine Similarity

Pinecone uses cosine similarity to measure vector distance.

Mathematical Definition

cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
Where:
  • A · B is the dot product
  • ||A|| and ||B|| are vector magnitudes

Why Cosine Similarity?

  • Scale-Invariant: Measures angle, not magnitude
  • Bounded Range: Scores lie in [-1, 1], and effectively in [0, 1] for text embeddings, so they are easy to interpret
  • Fast Computation: Optimized for high-dimensional spaces
  • Semantic Meaning: Embeddings with similar meanings have small angles
function cosineSimilarity(vecA: number[], vecB: number[]): number {
  const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
  const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

// Example vectors (simplified to 3D for illustration)
const vec1 = [1, 2, 3];     // "machine learning"
const vec2 = [1.1, 2.2, 2.9]; // "deep learning"
const vec3 = [5, -3, 1];    // "cooking recipes"

console.log(cosineSimilarity(vec1, vec2)); // ~0.998 (very similar)
console.log(cosineSimilarity(vec1, vec3)); // ~0.090 (not similar)

Performance Optimization

Indexing Speed

Pinecone builds approximate nearest neighbor (ANN) indices for fast searches:
  • Algorithm: HNSW (Hierarchical Navigable Small World)
  • Query Time: O(log n) instead of O(n)
  • Trade-off: ~99% accuracy vs brute-force
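For intuition about what the ANN index approximates, here is an exact brute-force top-K search: score every stored vector against the query, sort, and keep the best K. This is the O(n) baseline that HNSW avoids (a standalone illustration, not Pinecone code):

```typescript
// Cosine similarity, as defined in the previous section.
function cosineSim(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const magB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dot / (magA * magB);
}

// Exact top-K: O(n) per query; HNSW approximates the same result in O(log n).
function bruteForceTopK(
  query: number[],
  vectors: { id: string; values: number[] }[],
  k: number
): { id: string; score: number }[] {
  return vectors
    .map((v) => ({ id: v.id, score: cosineSim(query, v.values) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```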

Query Latency

Typical query performance:
  • P50 Latency: 30-50ms
  • P95 Latency: 100-200ms
  • P99 Latency: 300-500ms
Latency increases with index size but remains logarithmic. A 1M-vector index queries nearly as fast as a 100K-vector index.

Parallel Embedding

The code uses Promise.all to embed chunks in parallel:
// src/lib/pinecone.ts:44
const vectors = await Promise.all(documents.flat().map(embedDocument));
Performance Impact:
  • Sequential: 100 chunks × 0.3s = 30 seconds
  • Parallel (10 concurrent): 100 chunks ÷ 10 × 0.3s = 3 seconds
Note that Promise.all imposes no concurrency cap; every request starts at once, which can hit OpenAI rate limits on large documents.
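A bounded-concurrency variant keeps a fixed number of embedding requests in flight, trading a little speed for rate-limit safety. The mapWithConcurrency helper below is hypothetical, not in the project's source:

```typescript
// Run fn over items with at most `limit` promises in flight,
// preserving the order of results.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // safe: JS is single-threaded between awaits
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}

// Usage sketch: const vectors = await mapWithConcurrency(documents.flat(), 10, embedDocument);
```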

Connection Pooling

The singleton pattern avoids repeated client initialization:
let pinecone: Pinecone | null = null;

export const getPineconeClient = async () => {
  if (!pinecone) {
    pinecone = new Pinecone({...});
  }
  return pinecone;
};
This saves ~100-200ms per API request.

Data Management

Namespace Operations

Each PDF gets its own namespace:
// Create/update namespace
const namespace = pineconeIndex.namespace(fileKey);
await namespace.upsert(vectors);

// Query namespace
const results = await index.query({
  queryRequest: { namespace: fileKey, ... }
});

// Delete namespace (when PDF is deleted)
await pineconeIndex.namespace(fileKey).deleteAll();
Important: Deleting a chat from the database doesn’t automatically delete its Pinecone namespace. Implement cleanup logic to prevent orphaned data.

Metadata Limits

Pinecone enforces metadata size limits:
  • Per Vector: 40KB
  • PDF AI Limit: 36KB (for safety margin)
// src/lib/pinecone.ts:95
text: truncateStringByBytes(pageContent, 36000)
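A plausible implementation of truncateStringByBytes (the real helper lives in the project's utils and may differ): encode to UTF-8, cut at the byte budget, and drop any partially-cut character at the end:

```typescript
// Truncate by UTF-8 byte length rather than character count, since
// Pinecone's metadata limit is measured in bytes. Cutting mid-character
// produces a U+FFFD replacement character, which is stripped.
function truncateStringByBytes(str: string, bytes: number): string {
  const encoded = new TextEncoder().encode(str).slice(0, bytes);
  return new TextDecoder("utf-8").decode(encoded).replace(/\uFFFD+$/, "");
}
```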

Storage Costs

Pinecone pricing is based on index size:
  • Serverless: Pay per vector stored and queried
  • Pod-based: Pay for dedicated capacity
For a typical 10-page PDF:
  • Vectors: ~100
  • Storage: 100 vectors × 1536 dims × 4 bytes = 614KB
  • Metadata: 100 vectors × 36KB = 3.6MB
  • Total: ~4.2MB per document
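The arithmetic above can be wrapped in a small estimator (illustrative only; actual storage depends on how much metadata each vector really carries):

```typescript
// Rough per-document storage estimate: float32 vector bytes plus
// worst-case metadata (the 36KB cap applied at upsert time).
function estimateStorageBytes(
  vectorCount: number,
  dimension = 1536,
  metadataBytesPerVector = 36_000
): number {
  const vectorBytes = vectorCount * dimension * 4; // 4 bytes per float32
  return vectorBytes + vectorCount * metadataBytesPerVector;
}

console.log(estimateStorageBytes(100)); // 4214400 bytes, i.e. ~4.2MB
```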

Error Handling

Common Errors

// Missing or invalid API key
Error: Pinecone API key not found

// Solution: Set PINECONE_API_KEY environment variable

Error Recovery

// src/lib/pinecone.ts:68
async function embedDocument(doc: Document) {
  try {
    const embeddings = await getEmbeddings(doc.pageContent);
    // ...
  } catch (error) {
    console.log(error);
    throw new Error("unable to embed document");
  }
}
Embedding errors are logged and re-thrown, causing the entire upload to fail. This prevents partial uploads that could corrupt the namespace.
Best Practice: Implement retry logic with exponential backoff for transient errors (rate limits, timeouts).
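A minimal retry helper with exponential backoff, as suggested above (the withRetry helper is hypothetical, not in the project's source):

```typescript
// Retry an async operation, doubling the wait between attempts.
// Re-throws the last error once maxAttempts is exhausted.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxAttempts) throw error;
      const delay = baseDelayMs * 2 ** (attempt - 1); // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage sketch: const embeddings = await withRetry(() => getEmbeddings(doc.pageContent));
```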

Advanced Topics

Metadata Filtering

Combine vector search with metadata filtering:
await index.query({
  queryRequest: {
    topK: 10,
    vector: embeddings,
    filter: {
      pageNumber: { $gte: 5, $lte: 10 }
    },
    namespace: fileKey,
  },
});
This retrieves semantically similar chunks only from pages 5-10.

Multi-Index Strategy

For large-scale applications, consider multiple indices:
  • User Index: One index per user (better isolation)
  • Document Index: One index per document type
  • Time-based Indices: Separate recent vs archived documents

Monitoring

Key metrics to track:
  • Query Latency: P50, P95, P99
  • Error Rate: Failed queries / total queries
  • Vector Count: Growth over time
  • Namespace Count: Active documents
  • Cost: Storage + query costs
Use Pinecone’s API to fetch metrics:
const stats = await pineconeIndex.describeIndexStats();
console.log(stats);
// {
//   dimension: 1536,
//   indexFullness: 0.23,
//   totalVectorCount: 125000,
//   namespaces: {
//     'pdf-abc123': { vectorCount: 1250 },
//     'pdf-def456': { vectorCount: 980 },
//     ...
//   }
// }

Summary

Pinecone vector database integration in PDF AI:
  1. Setup: Singleton client with namespace-per-document isolation
  2. Embeddings: OpenAI text-embedding-ada-002 (1536 dimensions)
  3. Upsert: Batch upload with MD5-based deduplication
  4. Search: Top-5 cosine similarity with 0.7 threshold
  5. Performance: Sub-100ms queries with parallel embedding
The vector database is the core of the RAG system, enabling semantic search that makes PDF AI possible.
