What is a Vector Database?
A vector database is a specialized database designed to store and query high-dimensional vectors (embeddings). Unlike traditional databases that search for exact matches, vector databases find semantically similar content using mathematical distance metrics.
Vector databases enable semantic search: finding documents that mean the same thing, even if they use different words.
Why Pinecone?
PDF AI uses Pinecone as its vector database for several reasons:
- Serverless - No infrastructure to manage
- Fast - Sub-100ms query latency
- Scalable - Handles billions of vectors
- Accurate - Uses state-of-the-art approximate nearest neighbor algorithms
- Namespace Support - Built-in data isolation
Pinecone Setup
Client Initialization
The Pinecone client is initialized as a singleton to avoid repeated authentication. The singleton pattern ensures only one client instance exists across all API requests, reducing authentication overhead.
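A minimal sketch of the singleton getter. In the real app the client would be `new Pinecone({ apiKey })` from `@pinecone-database/pinecone`; the `PineconeLike` type here is a stand-in so the pattern stays self-contained:

```typescript
// Singleton Pinecone client: constructed once, reused by every API route.
// PineconeLike stands in for the real Pinecone class from
// @pinecone-database/pinecone.
type PineconeLike = { index: (name: string) => { name: string } };

let pineconeClient: PineconeLike | null = null;

function getPineconeClient(): PineconeLike {
  if (!pineconeClient) {
    // Real app: pineconeClient = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
    pineconeClient = { index: (name: string) => ({ name }) };
  }
  return pineconeClient;
}
```

Every caller goes through `getPineconeClient()`, so the client is built at most once per process.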
Index Configuration
PDF AI uses a single Pinecone index named aipdf:
- Dimension: 1536 (matches OpenAI’s text-embedding-ada-002 model)
- Metric: Cosine similarity
- Cloud Provider: AWS (typically)
- Region: Same as application deployment for low latency
How to create the Pinecone index
The index must be created manually before deploying the application:
- Log in to Pinecone Console
- Click “Create Index”
- Set Name: aipdf
- Set Dimensions: 1536
- Set Metric: cosine
- Choose Serverless deployment
- Select your cloud provider and region
- Click “Create Index”
Namespace Strategy
Each PDF document is stored in its own namespace for data isolation. The namespace is derived from the fileKey (S3 object key), which is converted to ASCII to ensure compatibility.
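A sketch of the ASCII conversion (the function name `convertToAscii` is illustrative): strip any non-ASCII code points from the file key before using it as a namespace.

```typescript
// Namespaces must be ASCII-safe, so strip any non-ASCII characters
// from the S3 fileKey before using it as a Pinecone namespace.
function convertToAscii(input: string): string {
  return input.replace(/[^\x00-\x7F]/g, "");
}
```

For example, a key like `uploads/résumé.pdf` becomes `uploads/rsum.pdf`.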
Embedding Generation
Embeddings convert text into numerical vectors that capture semantic meaning.
OpenAI Embedding Model
PDF AI uses OpenAI’s text-embedding-ada-002 model:
- Output Dimension: 1536
- Max Tokens: 8,191 tokens (~6,000 words)
- Cost: $0.0001 per 1,000 tokens
- Performance: State-of-the-art for semantic search
The text-embedding-ada-002 model is optimized for retrieval tasks and provides an excellent quality-to-cost ratio.
Text Preprocessing
Before embedding, text is preprocessed:
- Embedding models treat newlines as semantic boundaries
- Whitespace normalization improves consistency
- Reduces token count slightly
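The preprocessing above can be sketched as a small normalizer (assumed implementation: newlines become spaces, runs of whitespace collapse to one):

```typescript
// Replace newlines with spaces and collapse repeated whitespace so the
// embedding model sees one continuous passage instead of layout breaks.
function preprocessForEmbedding(text: string): string {
  return text.replace(/\n/g, " ").replace(/\s+/g, " ").trim();
}
```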
Embedding Document Chunks
Each document chunk is embedded and prepared for Pinecone:
- id: MD5 hash of content (deterministic, prevents duplicates)
- values: 1536-dimension embedding vector
- metadata: Stored alongside vector for retrieval
  - text: Original chunk text (truncated to 36KB)
  - pageNumber: Source page in PDF
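Putting the pieces together, a hedged sketch of building one Pinecone record from an embedded chunk (the embedding itself comes from the OpenAI API and is passed in here; the truncation below is by character count, whereas Pinecone's real limit is in bytes):

```typescript
import { createHash } from "node:crypto";

type PineconeRecord = {
  id: string;
  values: number[];
  metadata: { text: string; pageNumber: number };
};

// Keep stored text under Pinecone's 40KB per-vector metadata cap.
const MAX_METADATA_TEXT = 36_000;

function toRecord(text: string, pageNumber: number, embedding: number[]): PineconeRecord {
  return {
    id: createHash("md5").update(text).digest("hex"), // deterministic, dedupes re-uploads
    values: embedding,                                // 1536 floats from text-embedding-ada-002
    metadata: { text: text.slice(0, MAX_METADATA_TEXT), pageNumber },
  };
}
```

Because the id is a content hash, re-uploading the same document produces the same ids and simply overwrites the existing vectors.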
Why use MD5 for vector IDs?
Benefits of MD5 hashing:
- Deterministic - Same content always produces same ID
- Deduplication - Prevents storing identical chunks multiple times
- Idempotent Uploads - Re-uploading same document updates existing vectors
- No External State - Don’t need a database to track IDs
Caveats:
- MD5 collisions are theoretically possible (but extremely rare)
- For production systems at massive scale, consider UUIDs with deduplication logic
Vector Upsert
Vectors are uploaded to Pinecone using the upsert operation:
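A sketch of the upsert step. `NamespaceLike` is a stub standing in for the namespace object of the `@pinecone-database/pinecone` client; the batch size of 100 is an illustrative choice to keep each request a reasonable size:

```typescript
type Vector = { id: string; values: number[]; metadata?: Record<string, unknown> };
type NamespaceLike = { upsert: (vectors: Vector[]) => Promise<void> };

// Split the vectors into fixed-size batches.
function toBatches<T>(items: T[], batchSize = 100): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Upload each batch in turn; same id updates, new id inserts.
async function upsertVectors(ns: NamespaceLike, vectors: Vector[]): Promise<void> {
  for (const batch of toBatches(vectors)) {
    await ns.upsert(batch);
  }
}
```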
Upsert vs Insert
Upsert = Update + Insert
- If a vector with the same ID exists, it’s updated
- If it doesn’t exist, it’s inserted
- Enables idempotent uploads (safe to re-run)
Batch upserts are more efficient than individual inserts. The code uses Promise.all to embed all chunks in parallel, then uploads them together.
Upload Performance
For a typical 10-page PDF:
- Chunks: ~50-100 (depends on text density)
- Embedding Time: 0.3s × 100 = 30s (if sequential)
- Parallel Embedding: ~3-5s (with Promise.all)
- Upsert Time: ~1-2s (batch operation)
Similarity Search
Querying Pinecone to find relevant document chunks.
Query Process
Query Parameters
Query Response Structure
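A hedged sketch of the query call and the shape of its response. `IndexLike` is a stub for the Pinecone index/namespace object; the real call is `query({ topK, vector, includeMetadata })` on the client, and the top-5 value matches the summary at the end of this page:

```typescript
type QueryMatch = {
  id: string;
  score: number; // cosine similarity; higher = more similar
  metadata?: { text: string; pageNumber: number };
};
type QueryResponse = { matches: QueryMatch[] };

type IndexLike = {
  query: (params: {
    topK: number;
    vector: number[];
    includeMetadata: boolean;
  }) => Promise<QueryResponse>;
};

async function getTopMatches(index: IndexLike, queryEmbedding: number[]): Promise<QueryMatch[]> {
  const result = await index.query({
    topK: 5,               // five most similar chunks
    vector: queryEmbedding,
    includeMetadata: true, // return stored chunk text and page number
  });
  return result.matches;
}
```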
Similarity Scores:
- 0.9-1.0: Extremely similar (almost identical meaning)
- 0.7-0.9: Highly relevant (strong semantic match)
- 0.5-0.7: Somewhat relevant (related topic)
- < 0.5: Not relevant (different topic)
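Based on the score bands above, low-scoring matches are dropped before they reach the LLM. A sketch of that filter, using the 0.7 cutoff from the summary at the end of this page:

```typescript
type ScoredMatch = { id: string; score: number };

const SIMILARITY_THRESHOLD = 0.7;

// Keep only matches that clear the relevance threshold; everything
// below 0.7 is treated as a different topic and discarded.
function filterRelevant(matches: ScoredMatch[]): ScoredMatch[] {
  return matches.filter((m) => m.score > SIMILARITY_THRESHOLD);
}
```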
Relevance Filtering
Raw query results are filtered by a similarity threshold.
Cosine Similarity
Pinecone uses cosine similarity to measure vector distance.
Mathematical Definition
cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
where A · B is the dot product and ||A|| and ||B|| are the vector magnitudes.
Why Cosine Similarity?
- Scale-Invariant: Measures angle, not magnitude
- Range: Scores for text embeddings typically fall in [0, 1], which is easy to interpret (cosine similarity in general ranges over [-1, 1])
- Fast Computation: Optimized for high-dimensional spaces
- Semantic Meaning: Embeddings with similar meanings have small angles
Example: Computing cosine similarity
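A worked example of the formula in plain TypeScript (no Pinecone involved):

```typescript
// A · B: sum of element-wise products.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// ||A||: Euclidean magnitude.
function magnitude(a: number[]): number {
  return Math.sqrt(dot(a, a));
}

// cosine(A, B) = (A · B) / (||A|| × ||B||)
function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (magnitude(a) * magnitude(b));
}
```

Identical vectors score 1; orthogonal (unrelated) vectors score 0.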
Performance Optimization
Indexing Speed
Pinecone builds approximate nearest neighbor (ANN) indices for fast searches:
- Algorithm: HNSW (Hierarchical Navigable Small World)
- Query Time: O(log n) instead of O(n)
- Trade-off: ~99% accuracy vs brute-force
Query Latency
Typical query performance:
- P50 Latency: 30-50ms
- P95 Latency: 100-200ms
- P99 Latency: 300-500ms
Latency increases with index size but remains logarithmic. A 1M-vector index queries nearly as fast as a 100K-vector index.
Parallel Embedding
The code uses Promise.all to embed chunks in parallel:
- Sequential: 100 chunks × 0.3s = 30 seconds
- Parallel (10 concurrent): 100 chunks ÷ 10 × 0.3s = 3 seconds
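The pattern can be sketched as follows, with `embed` as a stand-in for the OpenAI embedding call:

```typescript
// Embed all chunks concurrently instead of one at a time.
async function embedAll(
  chunks: string[],
  embed: (text: string) => Promise<number[]>,
): Promise<number[][]> {
  return Promise.all(chunks.map((chunk) => embed(chunk)));
}
```

Note that Promise.all fires everything at once; to respect API rate limits (the "10 concurrent" figure above), a production version would slice the chunks into groups and await each group in turn.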
Connection Pooling
The singleton pattern avoids repeated client initialization.
Data Management
Namespace Operations
Each PDF gets its own namespace.
Metadata Limits
Pinecone enforces metadata size limits:
- Per Vector: 40KB
- PDF AI Limit: 36KB (for safety margin)
Storage Costs
Pinecone pricing is based on index size:
- Serverless: Pay per vector stored and queried
- Pod-based: Pay for dedicated capacity
For a typical 10-page PDF (~100 chunks):
- Vectors: ~100
- Storage: 100 vectors × 1536 dims × 4 bytes = 614KB
- Metadata: 100 vectors × 36KB = 3.6MB
- Total: ~4.2MB per document
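The arithmetic above as a quick per-document calculator (worst-case metadata, 4 bytes per float32 dimension):

```typescript
// Rough per-document storage estimate in bytes:
// vector data (dims × 4 bytes each) plus metadata (up to 36KB per vector).
function estimateDocumentBytes(vectors: number, dims = 1536, metadataBytes = 36_000): number {
  const vectorBytes = vectors * dims * 4;
  return vectorBytes + vectors * metadataBytes;
}
```

For 100 vectors this gives 614,400 + 3,600,000 = 4,214,400 bytes, i.e. the ~4.2MB quoted above.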
Error Handling
Common Errors
Error Recovery
Best Practice: Implement retry logic with exponential backoff for transient errors (rate limits, timeouts).
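A sketch of that retry logic (attempt count and delays are illustrative, not the app's actual values):

```typescript
// Retry a transient-failure-prone call with exponential backoff:
// wait 500ms, then 1s, then 2s between attempts; rethrow after the last.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Wrap Pinecone queries and OpenAI embedding calls in `withRetry` so rate limits and timeouts don't surface to the user.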
Advanced Topics
Hybrid Search
Combine vector search with keyword filtering.
Multi-Index Strategy
For large-scale applications, consider multiple indices:
- User Index: One index per user (better isolation)
- Document Index: One index per document type
- Time-based Indices: Separate recent vs archived documents
Monitoring
Key metrics to track:
- Query Latency: P50, P95, P99
- Error Rate: Failed queries / total queries
- Vector Count: Growth over time
- Namespace Count: Active documents
- Cost: Storage + query costs
Example: Pinecone monitoring dashboard
Use Pinecone’s API to fetch metrics:
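For example, the index's describeIndexStats call returns per-namespace vector counts. A stubbed sketch (field names follow recent Pinecone SDK versions and may differ in older ones):

```typescript
// Shape of the stats payload returned by the Pinecone index
// (assumed; stubbed here so the example is self-contained).
type IndexStats = {
  namespaces: Record<string, { recordCount: number }>;
  totalRecordCount: number;
};

// One namespace per PDF, so the namespace count is the active-document count.
function countActiveNamespaces(stats: IndexStats): number {
  return Object.keys(stats.namespaces).length;
}
```

Polling this periodically gives the vector-count and namespace-count metrics listed above.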
Summary
Pinecone vector database integration in PDF AI:
- Setup: Singleton client with namespace-per-document isolation
- Embeddings: OpenAI text-embedding-ada-002 (1536 dimensions)
- Upsert: Batch upload with MD5-based deduplication
- Search: Top-5 cosine similarity with 0.7 threshold
- Performance: Sub-100ms queries with parallel embedding
The vector database is the core of the RAG system, enabling semantic search that makes PDF AI possible.