
Overview

The URL encoding utilities provide efficient compression of URL arrays using a combination of prefix-diffing, deflate compression, and base64url encoding. This is essential for passing large lists of documentation URLs through URL parameters without hitting length limits.
These utilities can reduce URL array sizes by 80-95%, depending on how similar the URLs are.

Functions

encodeUrls()

Encodes an array of URLs into a compressed, URL-safe string.
Parameters:
  • urls (string[], required) - Array of URLs to encode

Returns:
  • string - URL-safe base64 encoded string representing the compressed URL array
import { encodeUrls } from './url-pack';

const urls = [
  'https://docs.example.com/guide/intro.html',
  'https://docs.example.com/guide/setup.html',
  'https://docs.example.com/api/reference.html'
];

const encoded = encodeUrls(urls);
// Returns: "eJxLKkr..." (much shorter than the original)

console.log('Original size:', JSON.stringify(urls).length);
console.log('Encoded size:', encoded.length);
console.log('Compression:', (1 - encoded.length / JSON.stringify(urls).length) * 100, '%');

Algorithm

The encoding process follows these steps:
1. Sort URLs

URLs are sorted alphabetically to maximize prefix similarity:
const sorted = [...urls].sort();

2. Prefix Diff

Each URL is diffed against the previous one, storing only the differing suffix:
// Example:
// "https://docs.example.com/guide/intro.html"
// "https://docs.example.com/guide/setup.html"
// Becomes: "31|setup.html" (the first 31 characters match; only the suffix is stored)

3. Join with Newlines

All diffs are joined with newline characters:
const joined = diffs.join('\n');

4. Deflate Compression

The joined string is compressed using zlib deflate:
const compressed = deflateSync(joined);

5. Base64url Encoding

The binary data is encoded using URL-safe base64:
return base64urlEncode(compressed);
Sorting URLs alphabetically before diffing dramatically improves compression ratios because similar URLs (from the same site) will be adjacent.
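Taken together, the five steps can be sketched as one self-contained function. This is an illustrative approximation built on Node's zlib, not the module's actual source; encodeUrlsSketch is a hypothetical name.

```typescript
import { deflateSync } from 'node:zlib';

// Illustrative end-to-end sketch of the encoding pipeline described above.
function encodeUrlsSketch(urls: string[]): string {
  if (urls.length === 0) return '';
  const sorted = [...urls].sort();              // 1. sort for prefix similarity
  const diffs: string[] = [];
  let last = '';
  for (const url of sorted) {
    if (!last) {
      diffs.push(url);                          // first URL stored in full
    } else {
      let i = 0;                                // 2. find common prefix length
      const minLen = Math.min(url.length, last.length);
      while (i < minLen && url[i] === last[i]) i++;
      diffs.push(`${i}|${url.slice(i)}`);       //    store "prefixLength|suffix"
    }
    last = url;
  }
  const joined = diffs.join('\n');              // 3. join with newlines
  const compressed = deflateSync(joined);       // 4. zlib deflate
  return compressed.toString('base64')          // 5. base64url encode
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/g, '');
}
```

The output contains only URL-safe characters, so it can be dropped into a query parameter without further escaping.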

decodeUrls()

Decodes a string produced by encodeUrls() back into the original URL array.
Parameters:
  • encoded (string, required) - The encoded string from encodeUrls()

Returns:
  • string[] - Array of decoded URLs
import { decodeUrls } from './url-pack';

const encoded = "eJxLKkr...";
const urls = decodeUrls(encoded);

console.log('Decoded URLs:', urls);
// [
//   'https://docs.example.com/api/reference.html',
//   'https://docs.example.com/guide/intro.html',
//   'https://docs.example.com/guide/setup.html'
// ]
The decoded URLs will be in sorted (alphabetical) order, which may differ from the original input order.

Algorithm

The decoding process reverses the encoding:
1. Base64url Decode

Convert the URL-safe base64 string back to binary:
const decoded = base64urlDecode(encoded);

2. Inflate

Decompress the binary data using zlib inflate:
const joined = inflateSync(decoded).toString();

3. Split by Newlines

Split the string back into individual diffs:
const diffs = joined.split('\n');

4. Reconstruct URLs

Apply each diff to reconstruct the full URLs:
// "31|setup.html" + previous URL
// -> "https://docs.example.com/guide/setup.html"
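These steps can likewise be sketched end to end. As before, this is an illustrative reconstruction (decodeUrlsSketch is a hypothetical name) that assumes its input was produced by the encoding steps above.

```typescript
import { inflateSync } from 'node:zlib';

// Illustrative sketch of the decoding pipeline described above.
function decodeUrlsSketch(encoded: string): string[] {
  if (!encoded) return [];
  // 1. base64url -> standard base64, restore padding, decode to bytes
  let b64 = encoded.replace(/-/g, '+').replace(/_/g, '/');
  const pad = b64.length % 4;
  if (pad) b64 += '='.repeat(4 - pad);
  const bytes = Buffer.from(b64, 'base64');
  // 2. inflate back to the joined diff string
  const joined = inflateSync(bytes).toString();
  // 3. split into individual diffs
  const diffs = joined.split('\n');
  // 4. reconstruct: the first entry is a full URL,
  //    later entries are "prefixLength|suffix" against the previous URL
  const urls: string[] = [];
  let last = '';
  for (const diff of diffs) {
    const sep = diff.indexOf('|');
    const url = (sep === -1 || !last)
      ? diff
      : last.slice(0, Number(diff.slice(0, sep))) + diff.slice(sep + 1);
    urls.push(url);
    last = url;
  }
  return urls;
}
```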

Implementation Details

Prefix Diffing

The prefix diffing algorithm is the key to compression efficiency:
Source: url-pack.ts:14-29
for (const url of sorted) {
  if (!last) {
    diffs.push(url);  // First URL: store complete
  } else {
    // Find common prefix length
    let i = 0;
    const minLen = Math.min(url.length, last.length);
    while (i < minLen && url[i] === last[i]) i++;
    
    // Store as "prefixLength|suffix"
    const suffix = url.slice(i);
    diffs.push(`${i}|${suffix}`);
  }
  last = url;
}

Example

Given these URLs:
https://docs.example.com/guide/intro.html
https://docs.example.com/guide/setup.html
https://docs.example.com/api/reference.html
After sorting and diffing:
https://docs.example.com/api/reference.html
25|guide/intro.html
31|setup.html
The common prefix https://docs.example.com/ is stored once; each later entry stores only how many characters it shares with the previous URL, followed by the differing suffix.
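The prefix lengths can be verified directly (prefixLen is a hypothetical helper, not part of url-pack's public API):

```typescript
// Computes the length of the common prefix of two strings.
function prefixLen(a: string, b: string): number {
  let i = 0;
  const n = Math.min(a.length, b.length);
  while (i < n && a[i] === b[i]) i++;
  return i;
}

const ref = 'https://docs.example.com/api/reference.html';
const intro = 'https://docs.example.com/guide/intro.html';
const setup = 'https://docs.example.com/guide/setup.html';

console.log(prefixLen(ref, intro));   // 25 — "https://docs.example.com/"
console.log(prefixLen(intro, setup)); // 31 — "https://docs.example.com/guide/"
```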

Base64url Encoding

Standard base64 encoding uses characters that aren’t URL-safe (+, /, =). Base64url encoding replaces these:
Source: url-pack.ts:63-69
function base64urlEncode(buf: Buffer): string {
  return buf
    .toString('base64')
    .replace(/\+/g, '-')    // + becomes -
    .replace(/\//g, '_')    // / becomes _
    .replace(/=+$/g, '');   // Remove padding =
}

Standard Base64

Uses: A-Z, a-z, 0-9, +, /, =
Not URL-safe

Base64url

Uses: A-Z, a-z, 0-9, -, _
URL-safe
Decoding reverses the process:
Source: url-pack.ts:71-76
function base64urlDecode(str: string): Buffer {
  let b64 = str.replace(/-/g, '+').replace(/_/g, '/');
  const pad = b64.length % 4;
  if (pad) b64 += '='.repeat(4 - pad);  // Add back padding
  return Buffer.from(b64, 'base64');
}
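A quick round-trip check of these two helpers (reproduced here so the snippet is self-contained):

```typescript
// URL-safe base64 encode: swap +/ for -_ and drop padding.
function base64urlEncode(buf: Buffer): string {
  return buf
    .toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/g, '');
}

// Reverse: restore +/ and re-add padding before decoding.
function base64urlDecode(str: string): Buffer {
  let b64 = str.replace(/-/g, '+').replace(/_/g, '/');
  const pad = b64.length % 4;
  if (pad) b64 += '='.repeat(4 - pad);
  return Buffer.from(b64, 'base64');
}

// Bytes chosen so that standard base64 would emit '+' and '/'.
const data = Buffer.from([0xfb, 0xff, 0xfe]);
console.log(base64urlEncode(data));                               // "-__-"
console.log(base64urlDecode(base64urlEncode(data)).equals(data)); // true
```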

Compression with Deflate

The utilities use zlib’s deflate algorithm, which is the same compression used in gzip and ZIP files:
import { deflateSync, inflateSync } from 'zlib';

const compressed = deflateSync(joined);    // Compress
const decompressed = inflateSync(decoded); // Decompress
Deflate compression is particularly effective on the prefix-diffed URLs because they contain repetitive patterns (the prefix length numbers and pipe separators).

Usage Examples

Basic Encoding/Decoding

import { encodeUrls, decodeUrls } from './url-pack';

const originalUrls = [
  'https://docs.example.com/guide/getting-started.html',
  'https://docs.example.com/guide/installation.html',
  'https://docs.example.com/guide/configuration.html',
  'https://docs.example.com/api/classes/Client.html',
  'https://docs.example.com/api/classes/Server.html'
];

// Encode
const encoded = encodeUrls(originalUrls);
console.log('Encoded:', encoded);
console.log('Length:', encoded.length);

// Decode
const decodedUrls = decodeUrls(encoded);
console.log('Decoded:', decodedUrls);

// Verify
const sortedOriginal = [...originalUrls].sort();
const match = JSON.stringify(sortedOriginal) === JSON.stringify(decodedUrls);
console.log('Match:', match);  // true

URL Parameters

The primary use case is passing URL lists through URL parameters:
import { encodeUrls, decodeUrls } from './url-pack';

// Client side - encode URLs for API request
const docUrls = [
  'https://docs.example.com/page1.html',
  'https://docs.example.com/page2.html',
  'https://docs.example.com/page3.html'
];

const encoded = encodeUrls(docUrls);
const apiUrl = `https://api.example.com/search?docs=${encoded}`;

// Server side - decode URLs from request
const params = new URLSearchParams(req.url.split('?')[1]);
const encodedDocs = params.get('docs') ?? '';  // get() returns null if the parameter is absent
const decodedDocs = decodeUrls(encodedDocs);

console.log('Received docs:', decodedDocs);

Compression Analysis

import { encodeUrls } from './url-pack';

function analyzeCompression(urls: string[]) {
  const originalJson = JSON.stringify(urls);
  const encoded = encodeUrls(urls);
  
  const originalSize = originalJson.length;
  const encodedSize = encoded.length;
  const ratio = (1 - encodedSize / originalSize) * 100;
  
  console.log('Compression Analysis:');
  console.log(`  Original: ${originalSize} bytes`);
  console.log(`  Encoded: ${encodedSize} bytes`);
  console.log(`  Ratio: ${ratio.toFixed(1)}%`);
  console.log(`  Per URL: ${(encodedSize / urls.length).toFixed(1)} bytes`);
}

const urls = [
  'https://docs.example.com/guide/intro.html',
  'https://docs.example.com/guide/setup.html',
  'https://docs.example.com/guide/usage.html',
  'https://docs.example.com/api/reference.html'
];

analyzeCompression(urls);
// Example output (encoded size varies slightly with the zlib version):
// Compression Analysis:
//   Original: 179 bytes
//   Encoded: ~48 bytes
//   Ratio: ~73%
//   Per URL: ~12 bytes

Handling Edge Cases

import { encodeUrls, decodeUrls } from './url-pack';

// Empty array
const empty = encodeUrls([]);
console.log('Empty:', empty);  // ''
console.log('Decoded:', decodeUrls(empty));  // []

// Single URL
const single = encodeUrls(['https://example.com']);
console.log('Single:', decodeUrls(single));  // ['https://example.com']

// Duplicate URLs (preserved, not deduplicated; sorting just makes them adjacent)
const dupes = ['https://a.com', 'https://b.com', 'https://a.com'];
const encoded = encodeUrls(dupes);
const decoded = decodeUrls(encoded);
console.log('Decoded:', decoded);  // ['https://a.com', 'https://a.com', 'https://b.com']

// Very dissimilar URLs (less compression)
const dissimilar = [
  'https://site1.com/page.html',
  'https://different-site.org/doc.html',
  'https://another.net/file.html'
];
const poor = encodeUrls(dissimilar);
console.log('Poor compression:', poor.length);  // Less effective

Stream Processing for Large Lists

For very large URL lists, consider processing in batches:
import { encodeUrls, decodeUrls } from './url-pack';

function encodeLargeUrlList(urls: string[], batchSize: number = 100): string[] {
  const batches: string[] = [];
  
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    batches.push(encodeUrls(batch));
  }
  
  return batches;
}

function decodeLargeUrlList(encodedBatches: string[]): string[] {
  const allUrls: string[] = [];
  
  for (const batch of encodedBatches) {
    allUrls.push(...decodeUrls(batch));
  }
  
  return allUrls;
}

// Usage
const hugeList = Array.from({ length: 10000 }, (_, i) => 
  `https://docs.example.com/page${i}.html`
);

const encoded = encodeLargeUrlList(hugeList, 100);
console.log(`Encoded into ${encoded.length} batches`);

const decoded = decodeLargeUrlList(encoded);
console.log(`Decoded ${decoded.length} URLs`);

Performance Characteristics

Compression Ratios

Typical compression ratios for different URL patterns:

Same Domain

85-95% compression. URLs from the same documentation site compress extremely well.
https://docs.example.com/a.html
https://docs.example.com/b.html

Similar Domains

70-85% compression. URLs with similar structures compress well.
https://v1.docs.example.com/guide
https://v2.docs.example.com/guide

Different Domains

40-60% compression. Completely different URLs compress less effectively.
https://site1.com/page.html
https://different.org/doc.html

Time Complexity

encodeUrls
O(n log n + n*m)
  • O(n log n) - Sorting URLs
  • O(n*m) - Prefix comparison (n URLs, m avg length)
  • O(k) - Compression (k = joined string length)
decodeUrls
O(k + n*m)
  • O(k) - Decompression
  • O(n*m) - URL reconstruction

Space Complexity

Both functions use O(n*m) space for storing URL arrays, plus temporary space for compression.

Design Decisions

Why Sort URLs?

Sorting URLs alphabetically before diffing maximizes compression. Unsorted, adjacent URLs often share only a short prefix:
https://docs.example.com/api/client.html
https://docs.example.com/guide/intro.html
https://docs.example.com/api/server.html
https://docs.example.com/guide/setup.html
Only partial prefix matches exist between adjacent URLs (compression: ~60%). After sorting, URLs under the same path become adjacent and share much longer prefixes:
https://docs.example.com/api/client.html
https://docs.example.com/api/server.html
https://docs.example.com/guide/intro.html
https://docs.example.com/guide/setup.html
The tradeoff is that decoded URLs will be in sorted order, not the original order. This is acceptable for most use cases where URL order doesn’t matter.
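The effect can be measured directly. This sketch uses an inline version of the diff step (diffJoin is a hypothetical helper); exact byte counts depend on the zlib version.

```typescript
import { deflateSync } from 'node:zlib';

// Prefix-diff a URL list in the given order and join with newlines.
function diffJoin(urls: string[]): string {
  const diffs: string[] = [];
  let last = '';
  for (const url of urls) {
    if (!last) {
      diffs.push(url);
    } else {
      let i = 0;
      const n = Math.min(url.length, last.length);
      while (i < n && url[i] === last[i]) i++;
      diffs.push(`${i}|${url.slice(i)}`);
    }
    last = url;
  }
  return diffs.join('\n');
}

const urls = [
  'https://docs.example.com/api/client.html',
  'https://docs.example.com/guide/intro.html',
  'https://docs.example.com/api/server.html',
  'https://docs.example.com/guide/setup.html',
];

const unsortedSize = deflateSync(diffJoin(urls)).length;
const sortedSize = deflateSync(diffJoin([...urls].sort())).length;
console.log(unsortedSize, sortedSize);  // sorted input yields the smaller output
```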

Why Deflate Instead of Gzip?

Deflate is used instead of gzip because:
  • Smaller Output - No gzip header/trailer (gzip framing is ~18 bytes, versus ~6 bytes for Node's zlib-wrapped deflate)
  • Faster - Slightly faster compression/decompression
  • Sufficient - Same algorithm as gzip, just without metadata
For very small inputs (< 100 bytes), the gzip overhead is significant.
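The framing difference can be observed directly. Note that Node's deflateSync emits zlib-framed deflate (2-byte header plus 4-byte Adler-32 checksum), while gzip adds a 10-byte header and an 8-byte trailer around the same DEFLATE payload:

```typescript
import { deflateSync, gzipSync } from 'node:zlib';

// Same DEFLATE payload, different framing: gzip adds 18 bytes of
// header/trailer, the zlib wrapper only 6, so gzip output is 12 bytes larger.
const input = 'https://docs.example.com/guide/intro.html';
const deflated = deflateSync(input).length;
const gzipped = gzipSync(input).length;
console.log(gzipped - deflated);  // 12 — the extra framing bytes
```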

Why Base64url Instead of Base64?

Base64url encoding is necessary for URL parameters:
  • URL-Safe - Can be used in query parameters without encoding
  • No Padding - Removes unnecessary = padding
  • Smaller - Slightly shorter than percent-encoded base64
// Standard base64 in URL (requires percent-encoding)
const stdUrl = `?data=${encodeURIComponent('abc+def/123==')}`;
// ?data=abc%2Bdef%2F123%3D%3D

// Base64url in URL (no encoding needed)
const safeUrl = `?data=abc-def_123`;
// ?data=abc-def_123

Related

WebHelpSearchClient

Uses URL encoding for passing doc URLs to the MCP server

IndexLoader

Loads search indexes from documentation URLs
