Overview
The URL encoding utilities provide efficient compression of URL arrays using a combination of prefix-diffing, deflate compression, and base64url encoding. This is essential for passing large lists of documentation URLs through URL parameters without hitting length limits.
These utilities can reduce URL array sizes by roughly 40-95% depending on how similar the URLs are; 80-95% is typical for lists drawn from a single site.
Functions
encodeUrls()
Encodes an array of URLs into a compressed, URL-safe string.
Returns: a URL-safe base64 string representing the compressed URL array.
import { encodeUrls } from './url-pack';

const urls = [
  'https://docs.example.com/guide/intro.html',
  'https://docs.example.com/guide/setup.html',
  'https://docs.example.com/api/reference.html'
];

const encoded = encodeUrls(urls);
// Returns: "eJxLKkr..." (much shorter than the original)

console.log('Original size:', JSON.stringify(urls).length);
console.log('Encoded size:', encoded.length);
console.log('Compression:', (1 - encoded.length / JSON.stringify(urls).length) * 100, '%');
Algorithm
The encoding process follows these steps:
Sort URLs
URLs are sorted alphabetically to maximize prefix similarity:
const sorted = [...urls].sort();
Prefix Diff
Each URL is diffed against the previous one, storing only the differing suffix:
// Example:
// "https://docs.example.com/guide/intro.html"
// "https://docs.example.com/guide/setup.html"
// Becomes: "31|setup.html" (31 chars match, the rest differs)
Join with Newlines
All diffs are joined with newline characters:
const joined = diffs.join('\n');
Deflate Compression
The joined string is compressed using zlib deflate:
const compressed = deflateSync(joined);
Base64url Encoding
The binary data is encoded using URL-safe base64:
return base64urlEncode(compressed);
Sorting URLs alphabetically before diffing dramatically improves compression ratios because similar URLs (from the same site) will be adjacent.
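Putting the steps together, the whole pipeline can be sketched as one function. This is a standalone illustration, not the actual url-pack implementation; encodeUrlsSketch is a hypothetical name.

```typescript
import { deflateSync } from 'zlib';

// Minimal sketch of the encoding pipeline described above.
function encodeUrlsSketch(urls: string[]): string {
  if (urls.length === 0) return '';
  const sorted = [...urls].sort();                 // 1. Sort
  const diffs: string[] = [];
  let last = '';
  for (const url of sorted) {                      // 2. Prefix diff
    if (!last) {
      diffs.push(url);
    } else {
      let i = 0;
      const minLen = Math.min(url.length, last.length);
      while (i < minLen && url[i] === last[i]) i++;
      diffs.push(`${i}|${url.slice(i)}`);
    }
    last = url;
  }
  const joined = diffs.join('\n');                 // 3. Join with newlines
  const compressed = deflateSync(joined);          // 4. Deflate
  return compressed.toString('base64')             // 5. Base64url encode
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/g, '');
}

const encoded = encodeUrlsSketch([
  'https://docs.example.com/guide/intro.html',
  'https://docs.example.com/guide/setup.html'
]);
console.log(encoded); // only A-Z, a-z, 0-9, '-' and '_' appear
```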
decodeUrls()
Decodes a string produced by encodeUrls() back into the original URL array.
Parameter: the encoded string produced by encodeUrls().
import { decodeUrls } from './url-pack';

const encoded = "eJxLKkr...";
const urls = decodeUrls(encoded);

console.log('Decoded URLs:', urls);
// [
//   'https://docs.example.com/api/reference.html',
//   'https://docs.example.com/guide/intro.html',
//   'https://docs.example.com/guide/setup.html'
// ]
The decoded URLs will be in sorted (alphabetical) order, which may differ from the original input order.
Algorithm
The decoding process reverses the encoding:
Base64url Decode
Convert the URL-safe base64 string back to binary:
const decoded = base64urlDecode(encoded);
Inflate
Decompress the binary data using zlib inflate:
const joined = inflateSync(decoded).toString();
Split by Newlines
Split the string back into individual diffs:
const diffs = joined.split('\n');
Reconstruct URLs
Apply each diff to rebuild the full URLs:
// "31|setup.html" + previous URL
// -> "https://docs.example.com/guide/setup.html"
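The reverse pipeline can likewise be sketched end to end. This is a standalone illustration, not the real url-pack code; decodeUrlsSketch is a hypothetical name, and the sample payload is built by hand from the "prefixLength|suffix" diff format described above.

```typescript
import { deflateSync, inflateSync } from 'zlib';

// Minimal sketch of the decoding pipeline described above.
function decodeUrlsSketch(encoded: string): string[] {
  if (!encoded) return [];
  // 1. Base64url -> base64, restoring padding
  let b64 = encoded.replace(/-/g, '+').replace(/_/g, '/');
  const pad = b64.length % 4;
  if (pad) b64 += '='.repeat(4 - pad);
  // 2. Inflate back to the newline-joined diffs
  const joined = inflateSync(Buffer.from(b64, 'base64')).toString();
  // 3./4. Split and reconstruct
  const urls: string[] = [];
  let last = '';
  for (const line of joined.split('\n')) {
    if (!last) {
      last = line; // first entry is stored complete
    } else {
      const sep = line.indexOf('|');
      const prefixLen = parseInt(line.slice(0, sep), 10);
      last = last.slice(0, prefixLen) + line.slice(sep + 1);
    }
    urls.push(last);
  }
  return urls;
}

// Build a sample payload by hand using the diff format from the encoding section:
const payload = 'https://docs.example.com/api/reference.html\n25|guide/intro.html\n31|setup.html';
const sample = deflateSync(payload).toString('base64')
  .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/g, '');

console.log(decodeUrlsSketch(sample));
// [
//   'https://docs.example.com/api/reference.html',
//   'https://docs.example.com/guide/intro.html',
//   'https://docs.example.com/guide/setup.html'
// ]
```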
Implementation Details
Prefix Diffing
The prefix diffing algorithm is the key to compression efficiency:
Source: url-pack.ts:14-29
for (const url of sorted) {
  if (!last) {
    diffs.push(url); // First URL: store complete
  } else {
    // Find common prefix length
    let i = 0;
    const minLen = Math.min(url.length, last.length);
    while (i < minLen && url[i] === last[i]) i++;
    // Store as "prefixLength|suffix"
    const suffix = url.slice(i);
    diffs.push(`${i}|${suffix}`);
  }
  last = url;
}
Example
Given these URLs:
https://docs.example.com/guide/intro.html
https://docs.example.com/guide/setup.html
https://docs.example.com/api/reference.html
After sorting and diffing:
https://docs.example.com/api/reference.html
25|guide/intro.html
31|setup.html
The common prefix https://docs.example.com/ is stored once, then only the different parts are stored.
Base64url Encoding
Standard base64 encoding uses characters that aren’t URL-safe (+, /, =). Base64url encoding replaces these:
Source: url-pack.ts:63-69
function base64urlEncode(buf: Buffer): string {
  return buf
    .toString('base64')
    .replace(/\+/g, '-')  // + becomes -
    .replace(/\//g, '_')  // / becomes _
    .replace(/=+$/g, ''); // Remove padding =
}
Standard Base64 uses A-Z, a-z, 0-9, +, /, = (not URL-safe).
Base64url uses A-Z, a-z, 0-9, -, _ (URL-safe).
Decoding reverses the process:
Source: url-pack.ts:71-76
function base64urlDecode(str: string): Buffer {
  let b64 = str.replace(/-/g, '+').replace(/_/g, '/');
  const pad = b64.length % 4;
  if (pad) b64 += '='.repeat(4 - pad); // Add back padding
  return Buffer.from(b64, 'base64');
}
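A quick round-trip check of the two helpers (repeated here so the snippet runs standalone) on bytes that exercise the '+', '/' and padding cases:

```typescript
// base64url helpers as shown above, repeated for a self-contained snippet.
function base64urlEncode(buf: Buffer): string {
  return buf.toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/g, '');
}

function base64urlDecode(str: string): Buffer {
  let b64 = str.replace(/-/g, '+').replace(/_/g, '/');
  const pad = b64.length % 4;
  if (pad) b64 += '='.repeat(4 - pad);
  return Buffer.from(b64, 'base64');
}

// 0xfb 0xff 0xfe encodes to "+//+" in standard base64, and the lone
// trailing byte forces "==" padding - exactly the cases the helpers rewrite.
const original = Buffer.from([0xfb, 0xff, 0xfe, 0x01]);
const encoded = base64urlEncode(original);
console.log(encoded);                                   // "-__-AQ"
console.log(base64urlDecode(encoded).equals(original)); // true
```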
Compression with Deflate
The utilities use zlib’s deflate algorithm, which is the same compression used in gzip and ZIP files:
import { deflateSync, inflateSync } from 'zlib';

const compressed = deflateSync(joined);    // Compress
const decompressed = inflateSync(decoded); // Decompress
Deflate compression is particularly effective on the prefix-diffed URLs because they contain repetitive patterns (the prefix length numbers and pipe separators).
Usage Examples
Basic Encoding/Decoding
import { encodeUrls, decodeUrls } from './url-pack';

const originalUrls = [
  'https://docs.example.com/guide/getting-started.html',
  'https://docs.example.com/guide/installation.html',
  'https://docs.example.com/guide/configuration.html',
  'https://docs.example.com/api/classes/Client.html',
  'https://docs.example.com/api/classes/Server.html'
];

// Encode
const encoded = encodeUrls(originalUrls);
console.log('Encoded:', encoded);
console.log('Length:', encoded.length);

// Decode
const decodedUrls = decodeUrls(encoded);
console.log('Decoded:', decodedUrls);

// Verify
const sortedOriginal = [...originalUrls].sort();
const match = JSON.stringify(sortedOriginal) === JSON.stringify(decodedUrls);
console.log('Match:', match); // true
URL Parameters
The primary use case is passing URL lists through URL parameters:
import { encodeUrls, decodeUrls } from './url-pack';

// Client side - encode URLs for API request
const docUrls = [
  'https://docs.example.com/page1.html',
  'https://docs.example.com/page2.html',
  'https://docs.example.com/page3.html'
];
const encoded = encodeUrls(docUrls);
const apiUrl = `https://api.example.com/search?docs=${encoded}`;

// Server side - decode URLs from request
const params = new URLSearchParams(req.url.split('?')[1]);
const encodedDocs = params.get('docs') ?? ''; // get() returns null if the param is absent
const decodedDocs = decodeUrls(encodedDocs);
console.log('Received docs:', decodedDocs);
Compression Analysis
import { encodeUrls } from './url-pack';

function analyzeCompression(urls: string[]) {
  const originalJson = JSON.stringify(urls);
  const encoded = encodeUrls(urls);
  const originalSize = originalJson.length;
  const encodedSize = encoded.length;
  const ratio = (1 - encodedSize / originalSize) * 100;
  console.log('Compression Analysis:');
  console.log(`  Original: ${originalSize} bytes`);
  console.log(`  Encoded: ${encodedSize} bytes`);
  console.log(`  Ratio: ${ratio.toFixed(1)}%`);
  console.log(`  Per URL: ${(encodedSize / urls.length).toFixed(1)} bytes`);
}

const urls = [
  'https://docs.example.com/guide/intro.html',
  'https://docs.example.com/guide/setup.html',
  'https://docs.example.com/guide/usage.html',
  'https://docs.example.com/api/reference.html'
];

analyzeCompression(urls);
// Prints the original size (179 bytes for this list), the encoded size,
// the compression ratio, and the average bytes per URL; the exact encoded
// figures depend on the zlib build.
Handling Edge Cases
import { encodeUrls, decodeUrls } from './url-pack';

// Empty array
const empty = encodeUrls([]);
console.log('Empty:', empty); // ''
console.log('Decoded:', decodeUrls(empty)); // []

// Single URL
const single = encodeUrls(['https://example.com']);
console.log('Single:', decodeUrls(single)); // ['https://example.com']

// Duplicate URLs are preserved; sorting just makes them adjacent
const dupes = ['https://a.com', 'https://b.com', 'https://a.com'];
const encoded = encodeUrls(dupes);
const decoded = decodeUrls(encoded);
console.log('Decoded:', decoded); // ['https://a.com', 'https://a.com', 'https://b.com']

// Very dissimilar URLs (less compression)
const dissimilar = [
  'https://site1.com/page.html',
  'https://different-site.org/doc.html',
  'https://another.net/file.html'
];
const poor = encodeUrls(dissimilar);
console.log('Poor compression:', poor.length); // Less effective
Stream Processing for Large Lists
For very large URL lists, consider processing in batches:
import { encodeUrls, decodeUrls } from './url-pack';

function encodeLargeUrlList(urls: string[], batchSize: number = 100): string[] {
  const batches: string[] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    batches.push(encodeUrls(batch));
  }
  return batches;
}

function decodeLargeUrlList(encodedBatches: string[]): string[] {
  const allUrls: string[] = [];
  for (const batch of encodedBatches) {
    allUrls.push(...decodeUrls(batch));
  }
  return allUrls;
}

// Usage
const hugeList = Array.from({ length: 10000 }, (_, i) =>
  `https://docs.example.com/page${i}.html`
);

const encoded = encodeLargeUrlList(hugeList, 100);
console.log(`Encoded into ${encoded.length} batches`);
const decoded = decodeLargeUrlList(encoded);
console.log(`Decoded ${decoded.length} URLs`);
Compression Ratios
Typical compression ratios for different URL patterns:
Same Domain - 85-95% compression. URLs from the same documentation site compress extremely well.
  https://docs.example.com/a.html
  https://docs.example.com/b.html
Similar Domains - 70-85% compression. URLs with similar structures compress well.
  https://v1.docs.example.com/guide
  https://v2.docs.example.com/guide
Different Domains - 40-60% compression. Completely different URLs compress less effectively.
  https://site1.com/page.html
  https://different.org/doc.html
Time Complexity
O(n log n) - Sorting URLs
O(n*m) - Prefix comparison (n URLs, m avg length)
O(k) - Compression (k = joined string length)
O(k) - Decompression
O(n*m) - URL reconstruction
Space Complexity
Both functions use O(n*m) space for storing URL arrays, plus temporary space for compression.
Design Decisions
Why Sort URLs?
Sorting URLs alphabetically before diffing maximizes compression:
Without sorting:
  https://docs.example.com/api/client.html
  https://docs.example.com/guide/intro.html
  https://docs.example.com/api/server.html
  https://docs.example.com/guide/setup.html
Only partial prefix matches between adjacent URLs. Compression: ~60%

With sorting:
  https://docs.example.com/api/client.html
  https://docs.example.com/api/server.html
  https://docs.example.com/guide/intro.html
  https://docs.example.com/guide/setup.html
Similar URLs are adjacent, maximizing prefix matches. Compression: ~85%
The tradeoff is that decoded URLs will be in sorted order, not the original order. This is acceptable for most use cases where URL order doesn’t matter.
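The effect is easy to measure on the list above: diffing it in its original order versus sorted order yields different diff sizes even before deflate runs. diffJoin below is an illustrative helper mirroring the encoder's diff step, not part of url-pack.

```typescript
// Prefix-diff a list in the given order and join the diffs, as the encoder does.
function diffJoin(urls: string[]): string {
  const diffs: string[] = [];
  let last = '';
  for (const url of urls) {
    if (!last) {
      diffs.push(url);
    } else {
      let i = 0;
      const minLen = Math.min(url.length, last.length);
      while (i < minLen && url[i] === last[i]) i++;
      diffs.push(`${i}|${url.slice(i)}`);
    }
    last = url;
  }
  return diffs.join('\n');
}

const urls = [
  'https://docs.example.com/api/client.html',
  'https://docs.example.com/guide/intro.html',
  'https://docs.example.com/api/server.html',
  'https://docs.example.com/guide/setup.html'
];

console.log('unsorted diff size:', diffJoin(urls).length);             // 99
console.log('sorted diff size:  ', diffJoin([...urls].sort()).length); // 89
```

The gap widens with larger same-site lists, since sorting keeps every long shared prefix adjacent instead of alternating between path families.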
Why Deflate Instead of Gzip?
Deflate is used instead of gzip because:
Smaller Output - No gzip header/trailer (gzip adds 18 bytes of framing; the zlib wrapper produced by deflateSync adds only ~6)
Faster - Slightly faster compression/decompression
Sufficient - Same algorithm as gzip, just without metadata
For very small inputs (< 100 bytes), the gzip overhead is significant.
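The framing difference is easy to observe with Node's zlib. Note that deflateSync emits zlib-wrapped deflate (2-byte header plus 4-byte Adler-32 checksum), while gzip adds a 10-byte header and an 8-byte trailer around the same compressed stream.

```typescript
import { deflateSync, gzipSync } from 'zlib';

// Same payload, two wrappers: the deflate stream inside is identical,
// so the size difference is pure framing overhead.
const payload = 'https://docs.example.com/api/reference.html\n25|guide/intro.html\n31|setup.html';
const deflated = deflateSync(payload); // zlib wrapper: 2-byte header + 4-byte Adler-32
const gzipped = gzipSync(payload);     // gzip wrapper: 10-byte header + 8-byte trailer

console.log('deflate:', deflated.length, 'bytes');
console.log('gzip:   ', gzipped.length, 'bytes');
console.log('overhead:', gzipped.length - deflated.length, 'bytes'); // ~12
```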
Why Base64url Instead of Base64?
Base64url encoding is necessary for URL parameters:
URL-Safe - Can be used in query parameters without encoding
No Padding - Removes unnecessary = padding
Smaller - Slightly shorter than percent-encoded base64
// Standard base64 in a URL requires percent-encoding
const withStandard = `?data=${encodeURIComponent('abc+def/123==')}`;
// ?data=abc%2Bdef%2F123%3D%3D

// Base64url in a URL needs no extra encoding
const withBase64url = `?data=abc-def_123`;
// ?data=abc-def_123
Related Components
WebHelpSearchClient - Uses URL encoding for passing doc URLs to the MCP server
IndexLoader - Loads search indexes from documentation URLs