Federated Search

Federated search enables querying multiple WebHelp documentation sites in a single request. Results from all sites are merged, scored, and ranked together, providing comprehensive answers across your entire documentation ecosystem.

How It Works

Instead of pointing to a single documentation site, federated search uses a special URL format that encodes multiple site URLs:

https://webhelp-mcp.vercel.app/federated/{encoded-urls}

The server:

Decodes the URL list using the decodeUrls function
Searches each site independently
Merges all results into a single array
Sorts by relevance score
Returns the top 10 results across all sites

Federated search always uses index-based search, never semantic search, so that every site is ranked by the same scoring method.
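The merge, sort, and truncate steps in the list above can be sketched as follows. This is a minimal illustration, not the server's code: `MergedHit` and `mergeTopHits` are hypothetical names, and the default limit of 10 is simply taken from the step list.

```typescript
// Hypothetical result shape, modeled on the fields a search result carries.
interface MergedHit {
  title: string;
  id: string;
  url: string;
  score: number;
}

// Flatten the per-site result lists, sort by descending score,
// and keep only the best `limit` entries.
function mergeTopHits(perSite: MergedHit[][], limit = 10): MergedHit[] {
  const merged: MergedHit[] = [];
  for (const hits of perSite) merged.push(...hits);
  merged.sort((a, b) => b.score - a.score);
  return merged.slice(0, limit);
}

// Illustrative data only.
const siteOneHits: MergedHit[] = [
  { title: 'Getting Started', id: '0:start.html', url: 'https://site1.example.com/docs/start.html', score: 8.5 },
  { title: 'Configuration', id: '0:config.html', url: 'https://site1.example.com/docs/config.html', score: 6.8 },
];
const siteTwoHits: MergedHit[] = [
  { title: 'API Reference', id: '1:api.html', url: 'https://site2.example.com/docs/api.html', score: 7.2 },
];

console.log(mergeTopHits([siteOneHits, siteTwoHits], 2).map((hit) => hit.title));
// → ['Getting Started', 'API Reference']
```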

URL Encoding

Federated search uses a compressed encoding scheme to pack multiple URLs into a single path segment:

// From url-pack.ts:11-34
export function encodeUrls(urls: string[]): string {
  if (!urls || urls.length === 0) return '';

  const sorted = [...urls].sort();
  const diffs: string[] = [];
  let last = '';

  for (const url of sorted) {
    if (!last) {
      diffs.push(url);
    } else {
      let i = 0;
      const minLen = Math.min(url.length, last.length);
      while (i < minLen && url[i] === last[i]) i++;
      const suffix = url.slice(i);
      diffs.push(`${i}|${suffix}`);
    }
    last = url;
  }

  const joined = diffs.join('\n');
  const compressed = deflateSync(joined);
  return base64urlEncode(compressed);
}

Encoding steps:

Sort URLs alphabetically to maximize prefix similarity
Compute prefix differences between consecutive URLs
Join differences with newlines
Compress with zlib deflate
Encode as base64url (URL-safe)

The encoding is compact because sorted URLs tend to share long prefixes: 10 similar URLs might compress to just 50-100 characters.
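To make the prefix-difference step concrete, here is a small sketch that runs only the sort and diff stages (no compression or base64url) on three Oxygen URLs. `prefixDiffs` is a hypothetical helper that mirrors the loop in `encodeUrls`; it is not part of the library.

```typescript
// Each entry after the first stores the length of the prefix it shares with
// the previous URL, then "|", then the remaining suffix.
function prefixDiffs(urls: string[]): string[] {
  const sorted = [...urls].sort();
  const diffs: string[] = [];
  let last = '';
  for (const url of sorted) {
    if (!last) {
      diffs.push(url); // the first URL is stored in full
    } else {
      let i = 0;
      const minLen = Math.min(url.length, last.length);
      while (i < minLen && url[i] === last[i]) i++;
      diffs.push(`${i}|${url.slice(i)}`);
    }
    last = url;
  }
  return diffs;
}

const oxygenDiffs = prefixDiffs([
  'https://www.oxygenxml.com/doc/versions/26.1/ug-editor/',
  'https://www.oxygenxml.com/doc/versions/26.1/ug-author/',
  'https://www.oxygenxml.com/doc/versions/26.1/ug-developer/',
]);
console.log(oxygenDiffs.join('\n'));
// https://www.oxygenxml.com/doc/versions/26.1/ug-author/
// 47|developer/
// 47|editor/
```

The 47-character shared prefix is `https://www.oxygenxml.com/doc/versions/26.1/ug-`, so the second and third URLs shrink to a few characters each before compression is even applied.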

URL Decoding

The server decodes the compressed URL list on each request:

// From url-pack.ts:36-61
export function decodeUrls(encoded: string): string[] {
  if (!encoded) return [];
  const decoded = base64urlDecode(encoded);
  const joined = inflateSync(decoded).toString();
  const diffs = joined.split('\n');

  const urls: string[] = [];
  let last = '';
  for (const diff of diffs) {
    const sepIndex = diff.indexOf('|');
    if (sepIndex > -1) {
      const prefixLen = parseInt(diff.slice(0, sepIndex), 10);
      if (!isNaN(prefixLen)) {
        const prefix = last.slice(0, prefixLen);
        const url = prefix + diff.slice(sepIndex + 1);
        urls.push(url);
        last = url;
        continue;
      }
    }
    urls.push(diff);
    last = diff;
  }
  return urls;
}
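The prefix-expansion loop can be tried in isolation, without the base64url and inflate stages. The sketch below uses a hypothetical helper, `expandDiffs`, that follows the same rules as the loop in `decodeUrls`: an entry of the form `{n}|{suffix}` reuses the first `n` characters of the previous URL, and anything else is taken literally.

```typescript
// Rebuild full URLs from a list of prefix-difference entries.
function expandDiffs(diffs: string[]): string[] {
  const urls: string[] = [];
  let last = '';
  for (const diff of diffs) {
    const sep = diff.indexOf('|');
    const prefixLen = sep > -1 ? parseInt(diff.slice(0, sep), 10) : NaN;
    // Fall back to the literal entry when there is no numeric prefix length.
    const url = isNaN(prefixLen) ? diff : last.slice(0, prefixLen) + diff.slice(sep + 1);
    urls.push(url);
    last = url;
  }
  return urls;
}

const expandedDemo = expandDiffs([
  'https://www.dita-ot.org/dev/',
  '12|oxygenxml.com/doc/versions/26.1/ug-editor/',
]);
console.log(expandedDemo);
// → ['https://www.dita-ot.org/dev/',
//    'https://www.oxygenxml.com/doc/versions/26.1/ug-editor/']
```

The decoded list comes back in the sorted order produced during encoding, not necessarily the order in which the URLs were first supplied.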

Route Handler

The Next.js route handler detects federated mode and decodes URLs:

// From app/[...site]/route.ts:8-14
function resolveBaseUrls(site: Array<string>): string[] {
  if (site[0] === 'federated' && site[1]) {
    return decodeUrls(site[1]);
  }
  const endpoint = site.join('/');
  return [`https://${endpoint}/`];
}

URL structure:

Single site: /www.example.com/docs
Federated: /federated/{encoded-string}
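The two URL shapes can be exercised with a self-contained variant of the resolver. In this sketch the decoder is passed in as a parameter so the example runs on its own; `resolveBaseUrlsSketch`, `UrlListDecoder`, and the comma-splitting stand-in decoder are all hypothetical and exist only for illustration.

```typescript
type UrlListDecoder = (encoded: string) => string[];

// Same branching as resolveBaseUrls, with the decoder injected.
function resolveBaseUrlsSketch(site: string[], decode: UrlListDecoder): string[] {
  if (site[0] === 'federated' && site[1]) {
    return decode(site[1]);
  }
  return [`https://${site.join('/')}/`];
}

// Stand-in for decodeUrls: NOT the real encoding, just a comma-separated list.
const commaDecoder: UrlListDecoder = (encoded) => encoded.split(',');

console.log(resolveBaseUrlsSketch(['www.example.com', 'docs'], commaDecoder));
// → ['https://www.example.com/docs/']
console.log(resolveBaseUrlsSketch(['federated', 'https://a.example/,https://b.example/'], commaDecoder));
// → ['https://a.example/', 'https://b.example/']
console.log(resolveBaseUrlsSketch(['federated'], commaDecoder));
// → ['https://federated/']
```

The last call shows an edge case implied by the condition: a `/federated` path with no encoded segment is treated as a single site named `federated`.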

Search Implementation

Federated search queries all sites and merges results:

// From webhelp-search-client.ts:40-88
async search(query: string): Promise<SearchResult> {
  const urls = this.baseUrls;

  // Semantic search only for single sites
  if (urls.length === 1) {
    try {
      const semantic = await this.semanticSearch(query, urls[0]);
      if (!semantic.error && semantic.results.length > 0) {
        return semantic;
      }
    } catch (e) {
      // Fall back to index search
    }
  }

  // Index-based search for federated or fallback
  const mergedResults: SearchResult['results'] = [];

  for (const url of urls) {
    await this.loadIndex(url);
    
    let result: any = null;
    this.indexLoader.performSearch(query, function (r: any) {
      result = r;
    });
    
    const idx = urls.indexOf(url);
    const formatted = this.formatSearchResult(result, url, idx);
    mergedResults.push(...formatted.results);
  }

  // Sort all results by score
  mergedResults.sort((a, b) => b.score - a.score);

  return { results: mergedResults };
}

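Because the semantic branch runs only when urls.length === 1, federated search skips semantic search even if every site supports it. All sites are therefore ranked by the same index-based method, although raw scores are still not normalized across sites (see Score Normalization).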

Result Format

Federated search results include the site index in the document ID:

[
  {
    "title": "Getting Started",
    "id": "0:topics/getting-started.html",
    "url": "https://site1.example.com/docs/topics/getting-started.html",
    "score": 8.5
  },
  {
    "title": "API Reference",
    "id": "1:reference/api.html",
    "url": "https://site2.example.com/docs/reference/api.html",
    "score": 7.2
  },
  {
    "title": "Configuration",
    "id": "0:topics/configuration.html",
    "url": "https://site1.example.com/docs/topics/configuration.html",
    "score": 6.8
  }
]

ID format: {index}:{path}

0: — First URL in the decoded list
1: — Second URL in the decoded list
And so on…

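The fetch tool uses the index to resolve the correct base URL when retrieving documents. Because encodeUrls sorts the URLs before packing them, the index refers to a URL's position in sorted order, which may differ from the order in which you listed the sites.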
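How the fetch tool performs this lookup is not shown here, but the idea can be sketched as follows. `resolveDocumentUrl` is a hypothetical helper, and the assumption that the path is appended directly to the base URL is inferred from the example results above.

```typescript
// Split "{index}:{path}" and join the path onto the matching base URL.
function resolveDocumentUrl(id: string, baseUrls: string[]): string {
  const sep = id.indexOf(':');
  const indexText = sep > 0 ? id.slice(0, sep) : '';
  if (!/^\d+$/.test(indexText)) {
    throw new Error(`Missing site index in id: ${id}`);
  }
  const index = parseInt(indexText, 10);
  if (index >= baseUrls.length) {
    throw new Error(`Site index ${index} is out of range`);
  }
  const base = baseUrls[index] || '';
  const path = id.slice(sep + 1);
  // Make sure exactly one slash separates the base URL and the path.
  const separator = base.charAt(base.length - 1) === '/' ? '' : '/';
  return base + separator + path;
}

const decodedSites = [
  'https://site1.example.com/docs/',
  'https://site2.example.com/docs/',
];
console.log(resolveDocumentUrl('1:reference/api.html', decodedSites));
// → 'https://site2.example.com/docs/reference/api.html'
```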

Usage Examples

Searching Multiple Oxygen Products

Search across Oxygen XML Editor, Author, and Developer documentation:

{
  "mcpServers": {
    "oxygen-all": {
      "url": "https://webhelp-mcp.vercel.app/federated/{encoded}"
    }
  }
}

Where {encoded} is the encodeUrls output for these URLs:

https://www.oxygenxml.com/doc/versions/26.1/ug-editor/
https://www.oxygenxml.com/doc/versions/26.1/ug-author/
https://www.oxygenxml.com/doc/versions/26.1/ug-developer/

Searching DITA-OT and Oxygen Together

Combine DITA-OT and Oxygen documentation for comprehensive DITA authoring help:

{
  "mcpServers": {
    "dita-ecosystem": {
      "url": "https://webhelp-mcp.vercel.app/federated/{encoded}"
    }
  }
}

Where {encoded} is the encodeUrls output for these URLs:

https://www.dita-ot.org/dev/
https://www.oxygenxml.com/doc/versions/26.1/ug-editor/

Creating Federated URLs

You can create federated URLs programmatically:

import { encodeUrls } from './lib/url-pack';

const urls = [
  'https://www.dita-ot.org/dev/',
  'https://www.oxygenxml.com/doc/versions/26.1/ug-editor/',
  'https://docs.oasis-open.org/dita/v1.3/'
];

const encoded = encodeUrls(urls);
const federatedUrl = `https://webhelp-mcp.vercel.app/federated/${encoded}`;

console.log(federatedUrl);

Performance Considerations

Index Loading

Each site’s search index must be loaded independently:

for (const url of urls) {
  await this.loadIndex(url);
  // ... search ...
}

Loading time:

1 site: ~500ms
3 sites: ~1.5s
5 sites: ~2.5s
10 sites: ~5s

Loading many sites sequentially can cause timeouts. Consider limiting to 3-5 sites per federated search.

Search Execution

Searching happens sequentially, not in parallel:

for (const url of urls) {
  await this.loadIndex(url);
  this.indexLoader.performSearch(query, callback);
  // ...
}

This is a current limitation that could be improved with parallel execution.
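One possible shape for parallel execution is sketched below using `Promise.all`. It assumes each site can be searched through an independent function, here the hypothetical `SiteSearcher`; the excerpt above appears to share a single `indexLoader` between sites, so a real change would likely need one loader per site. `SiteHit` and `searchAllSitesConcurrently` are also illustrative names.

```typescript
interface SiteHit {
  id: string;
  score: number;
}

// Hypothetical per-site search function: loads the index and runs the query.
type SiteSearcher = (baseUrl: string, query: string) => Promise<SiteHit[]>;

async function searchAllSitesConcurrently(
  urls: string[],
  query: string,
  searchSite: SiteSearcher
): Promise<SiteHit[]> {
  // Start every site search at once, then wait for all of them.
  const perSite = await Promise.all(
    urls.map(async (url, index) => {
      const hits = await searchSite(url, query);
      // Prefix each id with the site index, as in the {index}:{path} format.
      return hits.map((hit) => ({ id: `${index}:${hit.id}`, score: hit.score }));
    })
  );
  const merged: SiteHit[] = [];
  for (const hits of perSite) merged.push(...hits);
  return merged.sort((a, b) => b.score - a.score);
}

// Stand-in searcher with canned results, for demonstration only.
const cannedSiteSearch: SiteSearcher = async (baseUrl) =>
  baseUrl.indexOf('site1') > -1
    ? [{ id: 'a.html', score: 2 }]
    : [{ id: 'b.html', score: 5 }];

searchAllSitesConcurrently(
  ['https://site1.example.com/docs/', 'https://site2.example.com/docs/'],
  'install',
  cannedSiteSearch
).then((hits) => console.log(hits.map((hit) => hit.id)));
// → ['1:b.html', '0:a.html']
```

With this shape the total wait is bounded by the slowest site rather than the sum of all sites, but `Promise.all` still rejects as soon as any single site fails.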

Result Merging

Merging and sorting results is fast even with many results:

mergedResults.sort((a, b) => b.score - a.score);

100 results: < 1ms
1000 results: < 10ms

Score Normalization

Different sites may use different scoring scales. The server does not normalize scores, which can affect ranking:

Site A: scores 0-10
Site B: scores 0-100

Results from Site B will dominate the merged list.

This is a known limitation. Future versions may add score normalization.
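As an illustration of what normalization could look like, the sketch below divides each score by the highest score from the same site, so every site's results land in the 0-1 range before merging. This is only one possible approach and is not something the server does today; `ScoredHit` and `normalizePerSite` are hypothetical names.

```typescript
interface ScoredHit {
  site: number; // index of the site the hit came from
  id: string;
  score: number;
}

// Scale each hit by its own site's maximum score, then sort the merged list.
function normalizePerSite(hits: ScoredHit[]): ScoredHit[] {
  const maxBySite: { [site: number]: number } = {};
  for (const hit of hits) {
    const current = maxBySite[hit.site];
    if (current === undefined || hit.score > current) maxBySite[hit.site] = hit.score;
  }
  return hits
    .map((hit) => {
      const max = maxBySite[hit.site] || 0;
      return { site: hit.site, id: hit.id, score: max > 0 ? hit.score / max : 0 };
    })
    .sort((a, b) => b.score - a.score);
}

// Site 0 scores on a 0-10 scale, site 1 on a 0-100 scale.
const rawScaleHits: ScoredHit[] = [
  { site: 0, id: 'a1', score: 8 },
  { site: 0, id: 'a2', score: 6 },
  { site: 1, id: 'b1', score: 90 },
  { site: 1, id: 'b2', score: 45 },
];
console.log(normalizePerSite(rawScaleHits).map((hit) => `${hit.id}=${hit.score}`));
// The two best hits (a1 and b1) both score 1, followed by a2=0.75 and b2=0.5.
```

Without normalization, both site 1 results would outrank everything from site 0. The trade-off of this scheme is that each site's best hit is forced to tie at 1, regardless of how relevant it actually is.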

Best Practices

Limit Sites

Keep federated searches to 3-5 sites for acceptable performance

Group Related Docs

Federate related documentation sets, not random sites

Test Scoring

Verify that results from all sites appear in merged output

Cache Configs

Save encoded URLs in MCP configs rather than generating them each time
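For the "Test Scoring" practice, one quick check is to count how many merged results came from each site, using the site index at the start of each result ID. `countHitsPerSite` below is a hypothetical helper written for this check, not part of the server.

```typescript
// Count results per site from ids of the form "{index}:{path}".
function countHitsPerSite(ids: string[], siteCount: number): number[] {
  const counts: number[] = [];
  for (let i = 0; i < siteCount; i++) counts.push(0);
  for (const id of ids) {
    const sep = id.indexOf(':');
    if (sep < 1) continue; // no site index present
    const index = parseInt(id.slice(0, sep), 10);
    if (index >= 0 && index < siteCount) {
      counts[index] = (counts[index] || 0) + 1;
    }
  }
  return counts;
}

const perSiteCounts = countHitsPerSite(
  ['0:topics/getting-started.html', '1:reference/api.html', '0:topics/configuration.html'],
  2
);
console.log(perSiteCounts);
// → [2, 1]
```

A zero in the output means that site contributed nothing to the merged list, which may point to a scoring-scale mismatch or a failed index load.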

Limitations

No Semantic Search

Federated search always uses index-based search, even if all sites support semantic search. Reason: Semantic search scores from different sites aren’t comparable.

Sequential Loading

Sites are loaded and searched sequentially, not in parallel. Impact: Response time scales linearly with the number of sites.

No Score Normalization

Scores from different sites are merged without normalization. Impact: Results from high-scoring sites may dominate.

No Site Labels

Search results don’t indicate which site each result came from (except via the index in the ID). Workaround: Parse the URL or ID to determine the source site.

Error Handling

Partial Failures

If one site fails to load, the entire federated search fails:

for (const url of urls) {
  try {
    await this.loadIndex(url);
  } catch (error: any) {
    return {
      error: `Failed to load index: ${error.message}`,
      results: []
    };
  }
}

A single unavailable site breaks the entire federated search. Consider adding fallback logic for production use.
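One form such fallback logic could take is to skip a failing site, remember that it failed, and continue with the rest. The sketch below assumes a hypothetical per-site search function and illustrative names (`SiteOutcome`, `searchToleratingFailures`); it is not the current behavior of the server.

```typescript
interface SiteOutcome<T> {
  results: T[];
  failedSites: string[];
}

// Search sites one at a time; a failure is recorded instead of aborting.
async function searchToleratingFailures<T>(
  urls: string[],
  searchSite: (url: string) => Promise<T[]>
): Promise<SiteOutcome<T>> {
  const results: T[] = [];
  const failedSites: string[] = [];
  for (const url of urls) {
    try {
      const siteResults = await searchSite(url);
      results.push(...siteResults);
    } catch (error) {
      failedSites.push(url);
    }
  }
  return { results, failedSites };
}

// Stand-in searcher: one site is "down", the other returns a single title.
const flakySiteSearch = async (url: string): Promise<string[]> => {
  if (url.indexOf('down') > -1) throw new Error('unreachable');
  return [`hit from ${url}`];
};

searchToleratingFailures(
  ['https://down.example.com/', 'https://up.example.com/'],
  flakySiteSearch
).then((outcome) => console.log(outcome));
// results contains the hit from up.example.com;
// failedSites contains 'https://down.example.com/'
```

Returning the list of failed sites alongside the results lets the caller decide whether a partial answer is acceptable.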

Invalid Encoded URLs

decodeUrls does not validate its input, so a malformed encoded URL parameter reaches the decompression step unchecked:

export function decodeUrls(encoded: string): string[] {
  if (!encoded) return [];
  const decoded = base64urlDecode(encoded);
  const joined = inflateSync(decoded).toString();
  // ...
}

Malformed encoding will throw an error during decompression.
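A caller could guard against this by wrapping the decode step and turning the exception into an error value. In the sketch below the decoder is passed in so the example is self-contained; `tryDecodeUrls`, `DecodeOutcome`, and the treatment of an empty list as an error are assumptions made for illustration.

```typescript
type DecodeOutcome =
  | { ok: true; urls: string[] }
  | { ok: false; error: string };

// Run a decoder and report failures as values instead of exceptions.
function tryDecodeUrls(
  encoded: string,
  decode: (encoded: string) => string[]
): DecodeOutcome {
  try {
    const urls = decode(encoded);
    if (urls.length === 0) {
      return { ok: false, error: 'No URLs found in federated path' };
    }
    return { ok: true, urls };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, error: `Invalid federated URL list: ${message}` };
  }
}

// Stand-in decoder that always fails, simulating a decompression error.
const alwaysFailingDecoder = (_encoded: string): string[] => {
  throw new Error('incorrect header check');
};

console.log(tryDecodeUrls('not-valid', alwaysFailingDecoder));
// → { ok: false, error: 'Invalid federated URL list: incorrect header check' }
```

A route handler could then answer with a client error for the `ok: false` case rather than letting the exception propagate.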

Future Improvements

Potential enhancements to federated search:

Parallel loading — Load and search sites concurrently
Score normalization — Normalize scores to a 0-1 range per site
Partial success — Return results even if some sites fail
Site labels — Include site name or index in results
Semantic federation — Support semantic search across multiple sites
Result diversity — Ensure results from all sites appear in top results

Next Steps

Search Tool

Learn about single-site search

Fetch Tool

Retrieve documents from federated results

Integration Guide

Set up federated search in Claude Desktop

URL Encoding

Deep dive into the encoding scheme
