
Overview

The URL encoding utilities provide efficient compression of URL arrays using a combination of prefix-diffing, deflate compression, and base64url encoding. This is essential for passing large lists of documentation URLs through URL parameters without hitting length limits.
These utilities can reduce URL array sizes by 80-95%, depending on how similar the URLs are.

Functions

encodeUrls()

Encodes an array of URLs into a compressed, URL-safe string.
Parameters:
  • urls (string[], required) - Array of URLs to encode

Returns:
  • string - URL-safe base64 encoded string representing the compressed URL array
import { encodeUrls } from './url-pack';

const urls = [
  'https://docs.example.com/guide/intro.html',
  'https://docs.example.com/guide/setup.html',
  'https://docs.example.com/api/reference.html'
];

const encoded = encodeUrls(urls);
// Returns: "eJxLKkr..." (much shorter than the original)

console.log('Original size:', JSON.stringify(urls).length);
console.log('Encoded size:', encoded.length);
console.log('Compression:', (1 - encoded.length / JSON.stringify(urls).length) * 100, '%');

Algorithm

The encoding process follows these steps:
1. Sort URLs

URLs are sorted alphabetically to maximize prefix similarity:
const sorted = [...urls].sort();

2. Prefix Diff

Each URL is diffed against the previous one, storing only the differing suffix:
// Example:
// "https://docs.example.com/guide/intro.html"
// "https://docs.example.com/guide/setup.html"
// Becomes: "31|setup.html" (the first 31 characters match; only the suffix is stored)

3. Join with Newlines

All diffs are joined with newline characters:
const joined = diffs.join('\n');

4. Deflate Compression

The joined string is compressed using zlib deflate:
const compressed = deflateSync(joined);

5. Base64url Encoding

The binary data is encoded using URL-safe base64:
return base64urlEncode(compressed);
Sorting URLs alphabetically before diffing dramatically improves compression ratios because similar URLs (from the same site) will be adjacent.
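Taken together, the five steps can be sketched as one self-contained function. This is an illustrative approximation built on Node's zlib, not the module's actual source; encodeUrlsSketch is a hypothetical name.

```typescript
import { deflateSync } from 'node:zlib';

// Illustrative end-to-end sketch of the encoding pipeline described above.
function encodeUrlsSketch(urls: string[]): string {
  if (urls.length === 0) return '';
  const sorted = [...urls].sort();              // 1. sort for prefix similarity
  const diffs: string[] = [];
  let last = '';
  for (const url of sorted) {
    if (!last) {
      diffs.push(url);                          // first URL stored in full
    } else {
      let i = 0;                                // 2. find common prefix length
      const minLen = Math.min(url.length, last.length);
      while (i < minLen && url[i] === last[i]) i++;
      diffs.push(`${i}|${url.slice(i)}`);       //    store "prefixLength|suffix"
    }
    last = url;
  }
  const joined = diffs.join('\n');              // 3. join with newlines
  const compressed = deflateSync(joined);       // 4. zlib deflate
  return compressed.toString('base64')          // 5. base64url encode
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/g, '');
}
```

The output contains only URL-safe characters, so it can be dropped into a query parameter without further escaping.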

decodeUrls()

Decodes a string produced by encodeUrls() back into the original URL array.
Parameters:
  • encoded (string, required) - The encoded string from encodeUrls()

Returns:
  • string[] - Array of decoded URLs
import { decodeUrls } from './url-pack';

const encoded = "eJxLKkr...";
const urls = decodeUrls(encoded);

console.log('Decoded URLs:', urls);
// [
//   'https://docs.example.com/api/reference.html',
//   'https://docs.example.com/guide/intro.html',
//   'https://docs.example.com/guide/setup.html'
// ]
The decoded URLs will be in sorted (alphabetical) order, which may differ from the original input order.

Algorithm

The decoding process reverses the encoding:
1. Base64url Decode

Convert the URL-safe base64 string back to binary:
const decoded = base64urlDecode(encoded);

2. Inflate

Decompress the binary data using zlib inflate:
const joined = inflateSync(decoded).toString();

3. Split by Newlines

Split the string back into individual diffs:
const diffs = joined.split('\n');

4. Reconstruct URLs

Apply each diff to reconstruct the full URLs:
// "31|setup.html" + previous URL
// -> "https://docs.example.com/guide/setup.html"
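These steps can likewise be sketched end to end. As before, this is an illustrative reconstruction (decodeUrlsSketch is a hypothetical name) that assumes its input was produced by the encoding steps above.

```typescript
import { inflateSync } from 'node:zlib';

// Illustrative sketch of the decoding pipeline described above.
function decodeUrlsSketch(encoded: string): string[] {
  if (!encoded) return [];
  // 1. base64url -> standard base64, restore padding, decode to bytes
  let b64 = encoded.replace(/-/g, '+').replace(/_/g, '/');
  const pad = b64.length % 4;
  if (pad) b64 += '='.repeat(4 - pad);
  const bytes = Buffer.from(b64, 'base64');
  // 2. inflate back to the joined diff string
  const joined = inflateSync(bytes).toString();
  // 3. split into individual diffs
  const diffs = joined.split('\n');
  // 4. reconstruct: the first entry is a full URL,
  //    later entries are "prefixLength|suffix" against the previous URL
  const urls: string[] = [];
  let last = '';
  for (const diff of diffs) {
    const sep = diff.indexOf('|');
    const url = (sep === -1 || !last)
      ? diff
      : last.slice(0, Number(diff.slice(0, sep))) + diff.slice(sep + 1);
    urls.push(url);
    last = url;
  }
  return urls;
}
```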

Implementation Details

Prefix Diffing

The prefix diffing algorithm is the key to compression efficiency:
Source: url-pack.ts:14-29
for (const url of sorted) {
  if (!last) {
    diffs.push(url);  // First URL: store complete
  } else {
    // Find common prefix length
    let i = 0;
    const minLen = Math.min(url.length, last.length);
    while (i < minLen && url[i] === last[i]) i++;
    
    // Store as "prefixLength|suffix"
    const suffix = url.slice(i);
    diffs.push(`${i}|${suffix}`);
  }
  last = url;
}

Example

Given these URLs:
https://docs.example.com/guide/intro.html
https://docs.example.com/guide/setup.html
https://docs.example.com/api/reference.html
After sorting and diffing:
https://docs.example.com/api/reference.html
25|guide/intro.html
31|setup.html
The common prefix https://docs.example.com/ is stored once; each later entry stores only how many characters it shares with the previous URL, followed by the differing suffix.
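The prefix lengths can be verified directly (prefixLen is a hypothetical helper, not part of url-pack's public API):

```typescript
// Computes the length of the common prefix of two strings.
function prefixLen(a: string, b: string): number {
  let i = 0;
  const n = Math.min(a.length, b.length);
  while (i < n && a[i] === b[i]) i++;
  return i;
}

const ref = 'https://docs.example.com/api/reference.html';
const intro = 'https://docs.example.com/guide/intro.html';
const setup = 'https://docs.example.com/guide/setup.html';

console.log(prefixLen(ref, intro));   // 25 — "https://docs.example.com/"
console.log(prefixLen(intro, setup)); // 31 — "https://docs.example.com/guide/"
```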

Base64url Encoding

Standard base64 encoding uses characters that aren’t URL-safe (+, /, =). Base64url encoding replaces these:
Source: url-pack.ts:63-69
function base64urlEncode(buf: Buffer): string {
  return buf
    .toString('base64')
    .replace(/\+/g, '-')    // + becomes -
    .replace(/\//g, '_')    // / becomes _
    .replace(/=+$/g, '');   // Remove padding =
}

Standard Base64

Uses: A-Z, a-z, 0-9, +, /, =
Not URL-safe

Base64url

Uses: A-Z, a-z, 0-9, -, _
URL-safe
Decoding reverses the process:
Source: url-pack.ts:71-76
function base64urlDecode(str: string): Buffer {
  let b64 = str.replace(/-/g, '+').replace(/_/g, '/');
  const pad = b64.length % 4;
  if (pad) b64 += '='.repeat(4 - pad);  // Add back padding
  return Buffer.from(b64, 'base64');
}
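A quick round-trip check of these two helpers (reproduced here so the snippet is self-contained):

```typescript
// URL-safe base64 encode: swap +/ for -_ and drop padding.
function base64urlEncode(buf: Buffer): string {
  return buf
    .toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/g, '');
}

// Reverse: restore +/ and re-add padding before decoding.
function base64urlDecode(str: string): Buffer {
  let b64 = str.replace(/-/g, '+').replace(/_/g, '/');
  const pad = b64.length % 4;
  if (pad) b64 += '='.repeat(4 - pad);
  return Buffer.from(b64, 'base64');
}

// Bytes chosen so that standard base64 would emit '+' and '/'.
const data = Buffer.from([0xfb, 0xff, 0xfe]);
console.log(base64urlEncode(data));                               // "-__-"
console.log(base64urlDecode(base64urlEncode(data)).equals(data)); // true
```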

Compression with Deflate

The utilities use zlib’s deflate algorithm, which is the same compression used in gzip and ZIP files:
import { deflateSync, inflateSync } from 'zlib';

const compressed = deflateSync(joined);    // Compress
const decompressed = inflateSync(decoded); // Decompress
Deflate compression is particularly effective on the prefix-diffed URLs because they contain repetitive patterns (the prefix length numbers and pipe separators).

Usage Examples

Basic Encoding/Decoding

import { encodeUrls, decodeUrls } from './url-pack';

const originalUrls = [
  'https://docs.example.com/guide/getting-started.html',
  'https://docs.example.com/guide/installation.html',
  'https://docs.example.com/guide/configuration.html',
  'https://docs.example.com/api/classes/Client.html',
  'https://docs.example.com/api/classes/Server.html'
];

// Encode
const encoded = encodeUrls(originalUrls);
console.log('Encoded:', encoded);
console.log('Length:', encoded.length);

// Decode
const decodedUrls = decodeUrls(encoded);
console.log('Decoded:', decodedUrls);

// Verify
const sortedOriginal = [...originalUrls].sort();
const match = JSON.stringify(sortedOriginal) === JSON.stringify(decodedUrls);
console.log('Match:', match);  // true

URL Parameters

The primary use case is passing URL lists through URL parameters:
import { encodeUrls, decodeUrls } from './url-pack';

// Client side - encode URLs for API request
const docUrls = [
  'https://docs.example.com/page1.html',
  'https://docs.example.com/page2.html',
  'https://docs.example.com/page3.html'
];

const encoded = encodeUrls(docUrls);
const apiUrl = `https://api.example.com/search?docs=${encoded}`;

// Server side - decode URLs from request
const params = new URLSearchParams(req.url.split('?')[1]);
const encodedDocs = params.get('docs') ?? '';  // get() returns null if the parameter is absent
const decodedDocs = decodeUrls(encodedDocs);

console.log('Received docs:', decodedDocs);

Compression Analysis

import { encodeUrls } from './url-pack';

function analyzeCompression(urls: string[]) {
  const originalJson = JSON.stringify(urls);
  const encoded = encodeUrls(urls);
  
  const originalSize = originalJson.length;
  const encodedSize = encoded.length;
  const ratio = (1 - encodedSize / originalSize) * 100;
  
  console.log('Compression Analysis:');
  console.log(`  Original: ${originalSize} bytes`);
  console.log(`  Encoded: ${encodedSize} bytes`);
  console.log(`  Ratio: ${ratio.toFixed(1)}%`);
  console.log(`  Per URL: ${(encodedSize / urls.length).toFixed(1)} bytes`);
}

const urls = [
  'https://docs.example.com/guide/intro.html',
  'https://docs.example.com/guide/setup.html',
  'https://docs.example.com/guide/usage.html',
  'https://docs.example.com/api/reference.html'
];

analyzeCompression(urls);
// Example output (encoded size varies slightly with the zlib version):
// Compression Analysis:
//   Original: 179 bytes
//   Encoded: ~48 bytes
//   Ratio: ~73%
//   Per URL: ~12 bytes

Handling Edge Cases

import { encodeUrls, decodeUrls } from './url-pack';

// Empty array
const empty = encodeUrls([]);
console.log('Empty:', empty);  // ''
console.log('Decoded:', decodeUrls(empty));  // []

// Single URL
const single = encodeUrls(['https://example.com']);
console.log('Single:', decodeUrls(single));  // ['https://example.com']

// Duplicate URLs (preserved, not deduplicated; sorting just makes them adjacent)
const dupes = ['https://a.com', 'https://b.com', 'https://a.com'];
const encoded = encodeUrls(dupes);
const decoded = decodeUrls(encoded);
console.log('Decoded:', decoded);  // ['https://a.com', 'https://a.com', 'https://b.com']

// Very dissimilar URLs (less compression)
const dissimilar = [
  'https://site1.com/page.html',
  'https://different-site.org/doc.html',
  'https://another.net/file.html'
];
const poor = encodeUrls(dissimilar);
console.log('Poor compression:', poor.length);  // Less effective

Stream Processing for Large Lists

For very large URL lists, consider processing in batches:
import { encodeUrls, decodeUrls } from './url-pack';

function encodeLargeUrlList(urls: string[], batchSize: number = 100): string[] {
  const batches: string[] = [];
  
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    batches.push(encodeUrls(batch));
  }
  
  return batches;
}

function decodeLargeUrlList(encodedBatches: string[]): string[] {
  const allUrls: string[] = [];
  
  for (const batch of encodedBatches) {
    allUrls.push(...decodeUrls(batch));
  }
  
  return allUrls;
}

// Usage
const hugeList = Array.from({ length: 10000 }, (_, i) => 
  `https://docs.example.com/page${i}.html`
);

const encoded = encodeLargeUrlList(hugeList, 100);
console.log(`Encoded into ${encoded.length} batches`);

const decoded = decodeLargeUrlList(encoded);
console.log(`Decoded ${decoded.length} URLs`);

Performance Characteristics

Compression Ratios

Typical compression ratios for different URL patterns:

Same Domain

85-95% compression. URLs from the same documentation site compress extremely well.
https://docs.example.com/a.html
https://docs.example.com/b.html

Similar Domains

70-85% compression. URLs with similar structures compress well.
https://v1.docs.example.com/guide
https://v2.docs.example.com/guide

Different Domains

40-60% compression. Completely different URLs compress less effectively.
https://site1.com/page.html
https://different.org/doc.html

Time Complexity

encodeUrls
O(n log n + n*m)
  • O(n log n) - Sorting URLs
  • O(n*m) - Prefix comparison (n URLs, m avg length)
  • O(k) - Compression (k = joined string length)
decodeUrls
O(k + n*m)
  • O(k) - Decompression
  • O(n*m) - URL reconstruction

Space Complexity

Both functions use O(n*m) space for storing URL arrays, plus temporary space for compression.

Design Decisions

Why Sort URLs?

Sorting URLs alphabetically before diffing maximizes compression. Unsorted, adjacent URLs often share only a short prefix:
https://docs.example.com/api/client.html
https://docs.example.com/guide/intro.html
https://docs.example.com/api/server.html
https://docs.example.com/guide/setup.html
Only partial prefix matches exist between adjacent URLs (compression: ~60%). After sorting, URLs under the same path become adjacent and share much longer prefixes:
https://docs.example.com/api/client.html
https://docs.example.com/api/server.html
https://docs.example.com/guide/intro.html
https://docs.example.com/guide/setup.html
The tradeoff is that decoded URLs will be in sorted order, not the original order. This is acceptable for most use cases where URL order doesn’t matter.
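The effect can be measured directly. This sketch uses an inline version of the diff step (diffJoin is a hypothetical helper); exact byte counts depend on the zlib version.

```typescript
import { deflateSync } from 'node:zlib';

// Prefix-diff a URL list in the given order and join with newlines.
function diffJoin(urls: string[]): string {
  const diffs: string[] = [];
  let last = '';
  for (const url of urls) {
    if (!last) {
      diffs.push(url);
    } else {
      let i = 0;
      const n = Math.min(url.length, last.length);
      while (i < n && url[i] === last[i]) i++;
      diffs.push(`${i}|${url.slice(i)}`);
    }
    last = url;
  }
  return diffs.join('\n');
}

const urls = [
  'https://docs.example.com/api/client.html',
  'https://docs.example.com/guide/intro.html',
  'https://docs.example.com/api/server.html',
  'https://docs.example.com/guide/setup.html',
];

const unsortedSize = deflateSync(diffJoin(urls)).length;
const sortedSize = deflateSync(diffJoin([...urls].sort())).length;
console.log(unsortedSize, sortedSize);  // sorted input yields the smaller output
```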

Why Deflate Instead of Gzip?

Deflate is used instead of gzip because:
  • Smaller Output - No gzip header/trailer (gzip framing is ~18 bytes, versus ~6 bytes for Node's zlib-wrapped deflate)
  • Faster - Slightly faster compression/decompression
  • Sufficient - Same algorithm as gzip, just without metadata
For very small inputs (< 100 bytes), the gzip overhead is significant.
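The framing difference can be observed directly. Note that Node's deflateSync emits zlib-framed deflate (2-byte header plus 4-byte Adler-32 checksum), while gzip adds a 10-byte header and an 8-byte trailer around the same DEFLATE payload:

```typescript
import { deflateSync, gzipSync } from 'node:zlib';

// Same DEFLATE payload, different framing: gzip adds 18 bytes of
// header/trailer, the zlib wrapper only 6, so gzip output is 12 bytes larger.
const input = 'https://docs.example.com/guide/intro.html';
const deflated = deflateSync(input).length;
const gzipped = gzipSync(input).length;
console.log(gzipped - deflated);  // 12 — the extra framing bytes
```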

Why Base64url Instead of Base64?

Base64url encoding is necessary for URL parameters:
  • URL-Safe - Can be used in query parameters without encoding
  • No Padding - Removes unnecessary = padding
  • Smaller - Slightly shorter than percent-encoded base64
// Standard base64 in URL (requires percent-encoding)
const stdUrl = `?data=${encodeURIComponent('abc+def/123==')}`;
// ?data=abc%2Bdef%2F123%3D%3D

// Base64url in URL (no encoding needed)
const safeUrl = `?data=abc-def_123`;
// ?data=abc-def_123

Related

WebHelpSearchClient

Uses URL encoding for passing doc URLs to the MCP server

IndexLoader

Loads search indexes from documentation URLs
