Federated Search

Federated search enables querying multiple WebHelp documentation sites in a single request. Results from all sites are merged, scored, and ranked together, providing comprehensive answers across your entire documentation ecosystem.

How It Works

Instead of pointing to a single documentation site, federated search uses a special URL format that encodes multiple site URLs:
https://webhelp-mcp.vercel.app/federated/{encoded-urls}
The server:
  1. Decodes the URL list using the decodeUrls function
  2. Searches each site independently
  3. Merges all results into a single array
  4. Sorts by relevance score
  5. Returns the top 10 results across all sites
Federated search always uses index-based search, not semantic search, to ensure consistent scoring across sites.

URL Encoding

Federated search uses a compressed encoding scheme to pack multiple URLs into a single path segment:
// From url-pack.ts:11-34
export function encodeUrls(urls: string[]): string {
  if (!urls || urls.length === 0) return '';

  const sorted = [...urls].sort();
  const diffs: string[] = [];
  let last = '';

  for (const url of sorted) {
    if (!last) {
      diffs.push(url);
    } else {
      let i = 0;
      const minLen = Math.min(url.length, last.length);
      while (i < minLen && url[i] === last[i]) i++;
      const suffix = url.slice(i);
      diffs.push(`${i}|${suffix}`);
    }
    last = url;
  }

  const joined = diffs.join('\n');
  const compressed = deflateSync(joined);
  return base64urlEncode(compressed);
}
Encoding steps:
  1. Sort URLs alphabetically to maximize prefix similarity
  2. Compute prefix differences between consecutive URLs
  3. Join differences with newlines
  4. Compress with zlib deflate
  5. Encode as base64url (URL-safe)
The encoding is most compact when the URLs share long prefixes: 10 similar URLs might compress to just 50-100 characters.
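To make the prefix-difference step concrete, the following sketch reproduces only steps 1 and 2 of the encoding, without compression or base64url. The function name prefixDiffs is hypothetical and is not part of url-pack.ts; it exists only to show the intermediate values.

```typescript
// Illustrative only: the sort and prefix-diff steps of encodeUrls,
// stopping before the deflate and base64url stages.
function prefixDiffs(urls: string[]): string[] {
  const sorted = [...urls].sort();
  const diffs: string[] = [];
  let last = '';
  for (const url of sorted) {
    if (!last) {
      // The first URL is stored in full.
      diffs.push(url);
    } else {
      // Later URLs store "<shared prefix length>|<remaining suffix>".
      let i = 0;
      const minLen = Math.min(url.length, last.length);
      while (i < minLen && url[i] === last[i]) i++;
      diffs.push(`${i}|${url.slice(i)}`);
    }
    last = url;
  }
  return diffs;
}

const diffs = prefixDiffs([
  'https://www.oxygenxml.com/doc/versions/26.1/ug-editor/',
  'https://www.oxygenxml.com/doc/versions/26.1/ug-author/',
  'https://www.oxygenxml.com/doc/versions/26.1/ug-developer/',
]);
console.log(diffs);
// [
//   'https://www.oxygenxml.com/doc/versions/26.1/ug-author/',
//   '47|developer/',
//   '47|editor/'
// ]
```

Only the first URL is stored in full. Each later entry costs a few characters, which is why closely related documentation sites pack well.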

URL Decoding

The server decodes the compressed URL list on each request:
// From url-pack.ts:36-61
export function decodeUrls(encoded: string): string[] {
  if (!encoded) return [];
  const decoded = base64urlDecode(encoded);
  const joined = inflateSync(decoded).toString();
  const diffs = joined.split('\n');

  const urls: string[] = [];
  let last = '';
  for (const diff of diffs) {
    const sepIndex = diff.indexOf('|');
    if (sepIndex > -1) {
      const prefixLen = parseInt(diff.slice(0, sepIndex), 10);
      if (!isNaN(prefixLen)) {
        const prefix = last.slice(0, prefixLen);
        const url = prefix + diff.slice(sepIndex + 1);
        urls.push(url);
        last = url;
        continue;
      }
    }
    urls.push(diff);
    last = diff;
  }
  return urls;
}
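A round trip is a quick way to check that encoding and decoding agree. The sketch below is a compact re-implementation for illustration, not the project's code: pack and unpack are hypothetical names, and the base64url helpers are stand-ins built on Node's Buffer, because the real base64urlEncode and base64urlDecode are not shown here.

```typescript
import { deflateSync, inflateSync } from 'node:zlib';

// Assumed stand-ins for the project's base64url helpers.
const toBase64Url = (buf: Buffer): string => buf.toString('base64url');
const fromBase64Url = (text: string): Buffer => Buffer.from(text, 'base64url');

function pack(urls: string[]): string {
  const sorted = [...urls].sort();
  const diffs = sorted.map((url, n) => {
    if (n === 0) return url;
    const prev = sorted[n - 1];
    let i = 0;
    while (i < Math.min(url.length, prev.length) && url[i] === prev[i]) i++;
    return `${i}|${url.slice(i)}`;
  });
  return toBase64Url(deflateSync(diffs.join('\n')));
}

function unpack(encoded: string): string[] {
  const diffs = inflateSync(fromBase64Url(encoded)).toString().split('\n');
  const urls: string[] = [];
  for (const diff of diffs) {
    const sep = diff.indexOf('|');
    const prefixLen = sep > -1 ? parseInt(diff.slice(0, sep), 10) : NaN;
    const prev = urls.length > 0 ? urls[urls.length - 1] : '';
    // Entries without a numeric prefix length are full URLs.
    urls.push(
      Number.isNaN(prefixLen) ? diff : prev.slice(0, prefixLen) + diff.slice(sep + 1),
    );
  }
  return urls;
}

const input = [
  'https://www.oxygenxml.com/doc/versions/26.1/ug-editor/',
  'https://www.dita-ot.org/dev/',
];
const roundTripped = unpack(pack(input));
console.log(roundTripped);
// [
//   'https://www.dita-ot.org/dev/',
//   'https://www.oxygenxml.com/doc/versions/26.1/ug-editor/'
// ]
```

The decoded list comes back in sorted order rather than input order, because the URLs are sorted before packing.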

Route Handler

The Next.js route handler detects federated mode and decodes URLs:
// From app/[...site]/route.ts:8-14
function resolveBaseUrls(site: Array<string>): string[] {
  if (site[0] === 'federated' && site[1]) {
    return decodeUrls(site[1]);
  }
  const endpoint = site.join('/');
  return [`https://${endpoint}/`];
}
URL structure:
  • Single site: /www.example.com/docs
  • Federated: /federated/{encoded-string}
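The two URL shapes can be exercised with a small standalone sketch. It assumes the catch-all route passes the path segments as a string array, and it takes the decoder as a parameter (a hypothetical change made only so the example runs without url-pack.ts). The fakeDecode function and its URLs are placeholders.

```typescript
type Decoder = (encoded: string) => string[];

// Same branching as the route handler, with the decoder injected.
function resolveBaseUrls(site: string[], decode: Decoder): string[] {
  if (site[0] === 'federated' && site[1]) {
    return decode(site[1]);
  }
  return [`https://${site.join('/')}/`];
}

// Placeholder decoder: ignores its input and returns two fixed sites.
const fakeDecode: Decoder = () => [
  'https://a.example.com/docs/',
  'https://b.example.com/docs/',
];

// Single site: /www.example.com/docs
const single = resolveBaseUrls(['www.example.com', 'docs'], fakeDecode);
console.log(single); // [ 'https://www.example.com/docs/' ]

// Federated: /federated/{encoded-string}
const federated = resolveBaseUrls(['federated', 'abc123'], fakeDecode);
console.log(federated.length); // 2
```

A path of /federated with no second segment falls through to the single-site branch and resolves to https://federated/.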

Search Implementation

Federated search queries all sites and merges results:
// From webhelp-search-client.ts:40-88
async search(query: string): Promise<SearchResult> {
  const urls = this.baseUrls;

  // Semantic search only for single sites
  if (urls.length === 1) {
    try {
      const semantic = await this.semanticSearch(query, urls[0]);
      if (!semantic.error && semantic.results.length > 0) {
        return semantic;
      }
    } catch (e) {
      // Fall back to index search
    }
  }

  // Index-based search for federated or fallback
  const mergedResults: SearchResult['results'] = [];

  for (const url of urls) {
    await this.loadIndex(url);
    
    let result: any = null;
    this.indexLoader.performSearch(query, function (r: any) {
      result = r;
    });
    
    const idx = urls.indexOf(url);
    const formatted = this.formatSearchResult(result, url, idx);
    mergedResults.push(...formatted.results);
  }

  // Sort all results by score
  mergedResults.sort((a, b) => b.score - a.score);

  return { results: mergedResults };
}
The semantic branch runs only when there is exactly one base URL, so federated search skips semantic search even if all sites support it. This ensures consistent scoring across sites.

Result Format

Federated search results include the site index in the document ID:
[
  {
    "title": "Getting Started",
    "id": "0:topics/getting-started.html",
    "url": "https://site1.example.com/docs/topics/getting-started.html",
    "score": 8.5
  },
  {
    "title": "API Reference",
    "id": "1:reference/api.html",
    "url": "https://site2.example.com/docs/reference/api.html",
    "score": 7.2
  },
  {
    "title": "Configuration",
    "id": "0:topics/configuration.html",
    "url": "https://site1.example.com/docs/topics/configuration.html",
    "score": 6.8
  }
]
ID format: {index}:{path}
  • 0: — First URL in the decoded list
  • 1: — Second URL in the decoded list
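  • And so on, in the order of the decoded list. Because encodeUrls sorts the URLs before packing them, this is the sorted order, which may differ from the order in which you supplied the URLs.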
The fetch tool uses the index to resolve the correct base URL when retrieving documents.
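The fetch tool's implementation is not shown here, so the following is only a sketch of how an ID in this format could be mapped back to a full document URL. The function name resolveDocumentUrl is hypothetical, and the code assumes every base URL ends with a slash.

```typescript
// Split "{index}:{path}" and resolve the path against the matching base URL.
function resolveDocumentUrl(id: string, baseUrls: string[]): string {
  const match = /^(\d+):(.+)$/.exec(id);
  if (!match) {
    throw new Error(`Unexpected id format: ${id}`);
  }
  const base = baseUrls[Number(match[1])];
  if (base === undefined) {
    throw new Error(`No base URL for id: ${id}`);
  }
  return new URL(match[2], base).toString();
}

const decodedUrls = [
  'https://site1.example.com/docs/',
  'https://site2.example.com/docs/',
];

console.log(resolveDocumentUrl('1:reference/api.html', decodedUrls));
// https://site2.example.com/docs/reference/api.html
```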

Usage Examples

Searching Multiple Oxygen Products

Search across Oxygen XML Editor, Author, and Developer documentation:
{
  "mcpServers": {
    "oxygen-all": {
      "url": "https://webhelp-mcp.vercel.app/federated/{encoded}"
    }
  }
}
Where {encoded} represents:
https://www.oxygenxml.com/doc/versions/26.1/ug-editor/
https://www.oxygenxml.com/doc/versions/26.1/ug-author/
https://www.oxygenxml.com/doc/versions/26.1/ug-developer/

Searching DITA-OT and Oxygen Together

Combine DITA-OT and Oxygen documentation for comprehensive DITA authoring help:
{
  "mcpServers": {
    "dita-ecosystem": {
      "url": "https://webhelp-mcp.vercel.app/federated/{encoded}"
    }
  }
}
Where {encoded} represents:
https://www.dita-ot.org/dev/
https://www.oxygenxml.com/doc/versions/26.1/ug-editor/

Creating Federated URLs

You can create federated URLs programmatically:
import { encodeUrls } from './lib/url-pack';

const urls = [
  'https://www.dita-ot.org/dev/',
  'https://www.oxygenxml.com/doc/versions/26.1/ug-editor/',
  'https://docs.oasis-open.org/dita/v1.3/'
];

const encoded = encodeUrls(urls);
const federatedUrl = `https://webhelp-mcp.vercel.app/federated/${encoded}`;

console.log(federatedUrl);

Performance Considerations

Index Loading

Each site’s search index must be loaded independently:
for (const url of urls) {
  await this.loadIndex(url);
  // ... search ...
}
Loading time:
  • 1 site: ~500ms
  • 3 sites: ~1.5s
  • 5 sites: ~2.5s
  • 10 sites: ~5s
Loading many sites sequentially can cause timeouts. Consider limiting to 3-5 sites per federated search.

Search Execution

Searching happens sequentially, not in parallel:
for (const url of urls) {
  await this.loadIndex(url);
  this.indexLoader.performSearch(query, callback);
  // ...
}
This is a current limitation that could be improved with parallel execution.
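One possible shape for parallel execution is sketched below. It is not the current implementation: SiteSearch is a hypothetical per-site function that loads one index and returns its hits, and the existing client shares a single indexLoader, so a real change would probably need one loader per site.

```typescript
interface Hit {
  title: string;
  id: string;
  url: string;
  score: number;
}

// Hypothetical per-site search: loads one site's index and returns its hits.
type SiteSearch = (query: string, baseUrl: string, siteIndex: number) => Promise<Hit[]>;

async function searchInParallel(
  query: string,
  baseUrls: string[],
  searchSite: SiteSearch,
): Promise<Hit[]> {
  // Start every site search at once; Promise.all rejects if any site fails.
  const perSite = await Promise.all(
    baseUrls.map((url, idx) => searchSite(query, url, idx)),
  );
  const merged = perSite.reduce<Hit[]>((all, hits) => all.concat(hits), []);
  return merged.sort((a, b) => b.score - a.score);
}

// Fake search used only to make the sketch runnable.
const fakeSearch: SiteSearch = async (query, baseUrl, siteIndex) => [
  { title: query, id: `${siteIndex}:index.html`, url: `${baseUrl}index.html`, score: siteIndex + 1 },
];

searchInParallel('dita', ['https://a.example.com/', 'https://b.example.com/'], fakeSearch)
  .then((hits) => console.log(hits.map((hit) => hit.id)));
// [ '1:index.html', '0:index.html' ]
```

With this approach the total wait is roughly that of the slowest site rather than the sum of all sites, at the cost of more concurrent requests.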

Result Merging

Merging and sorting results is fast even with many results:
mergedResults.sort((a, b) => b.score - a.score);
  • 100 results: < 1ms
  • 1000 results: < 10ms

Score Normalization

Different sites may use different scoring scales. The server does not normalize scores, which can affect ranking:
  • Site A: scores 0-10
  • Site B: scores 0-100
Results from Site B will dominate the merged list.
This is a known limitation. Future versions may add score normalization.
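Until the server normalizes scores, a client could rescale them after the fact. The sketch below divides each score by the highest score from the same site, which is one simple option among several and not something the server does. It assumes federated IDs in the {index}:{path} format and non-negative scores; normalizeBySiteMax is a hypothetical name.

```typescript
interface Hit {
  title: string;
  id: string;
  url: string;
  score: number;
}

// The site index is the part of the ID before the first colon.
function siteOf(hit: Hit): string {
  return hit.id.slice(0, hit.id.indexOf(':'));
}

function normalizeBySiteMax(hits: Hit[]): Hit[] {
  const maxBySite = new Map<string, number>();
  for (const hit of hits) {
    const site = siteOf(hit);
    maxBySite.set(site, Math.max(maxBySite.get(site) ?? 0, hit.score));
  }
  return hits
    .map((hit) => {
      const max = maxBySite.get(siteOf(hit)) ?? 0;
      // Guard against division by zero when a site only has zero scores.
      return { ...hit, score: max > 0 ? hit.score / max : 0 };
    })
    .sort((a, b) => b.score - a.score);
}

const rawHits: Hit[] = [
  { title: 'A1', id: '0:a.html', url: 'https://a.example.com/a.html', score: 8.5 },
  { title: 'A2', id: '0:b.html', url: 'https://a.example.com/b.html', score: 6.8 },
  { title: 'B1', id: '1:a.html', url: 'https://b.example.com/a.html', score: 72 },
  { title: 'B2', id: '1:b.html', url: 'https://b.example.com/b.html', score: 40 },
];

const normalized = normalizeBySiteMax(rawHits);
console.log(normalized.map((hit) => hit.id));
```

With raw scores, both Site B results outrank every Site A result. After rescaling, each site's best result scores 1 and the rest are ranked relative to it. The trade-off is that a site with only weak matches still gets a top-scoring result.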

Best Practices

Limit Sites

Keep federated searches to 3-5 sites for acceptable performance

Group Related Docs

Federate related documentation sets, not random sites

Test Scoring

Verify that results from all sites appear in merged output

Cache Configs

Save encoded URLs in MCP configs rather than generating them each time

Limitations

No Semantic Search

Federated search always uses index-based search, even if all sites support semantic search. Reason: Semantic search scores from different sites aren’t comparable.

Sequential Loading

Sites are loaded and searched sequentially, not in parallel. Impact: Response time scales linearly with the number of sites.

No Score Normalization

Scores from different sites are merged without normalization. Impact: Results from high-scoring sites may dominate.

No Site Labels

Search results don’t indicate which site each result came from (except via the index in the ID). Workaround: Parse the URL or ID to determine the source site.
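As a client-side illustration of the URL-parsing workaround, the sketch below adds a site field taken from each result's hostname. The site field and the withSiteLabel name are hypothetical and are not part of the server's response.

```typescript
interface Hit {
  title: string;
  id: string;
  url: string;
  score: number;
}

// Attach the hostname of each result URL as a human-readable site label.
function withSiteLabel(hits: Hit[]): Array<Hit & { site: string }> {
  return hits.map((hit) => ({ ...hit, site: new URL(hit.url).hostname }));
}

const labelled = withSiteLabel([
  {
    title: 'Getting Started',
    id: '0:topics/getting-started.html',
    url: 'https://site1.example.com/docs/topics/getting-started.html',
    score: 8.5,
  },
  {
    title: 'API Reference',
    id: '1:reference/api.html',
    url: 'https://site2.example.com/docs/reference/api.html',
    score: 7.2,
  },
]);

console.log(labelled.map((hit) => `${hit.site}: ${hit.title}`));
// [ 'site1.example.com: Getting Started', 'site2.example.com: API Reference' ]
```

Hostnames cannot tell apart sites that share a domain, such as several user guides under www.oxygenxml.com. In that case the index in the ID is the more reliable key.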

Error Handling

Partial Failures

If one site fails to load, the entire federated search fails:
for (const url of urls) {
  try {
    await this.loadIndex(url);
  } catch (error: any) {
    return {
      error: `Failed to load index: ${error.message}`,
      results: []
    };
  }
}
A single unavailable site breaks the entire federated search. Consider adding fallback logic for production use.
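One way such fallback logic might look is sketched below: each site is searched in turn, failures are recorded, and the remaining results are still returned. This is an illustration under stated assumptions, not the current behavior. SiteSearch, SiteFailure, and searchTolerant are hypothetical names.

```typescript
interface Hit {
  title: string;
  id: string;
  url: string;
  score: number;
}

interface SiteFailure {
  url: string;
  message: string;
}

// Hypothetical per-site search: loads one site's index and returns its hits.
type SiteSearch = (query: string, baseUrl: string, siteIndex: number) => Promise<Hit[]>;

async function searchTolerant(
  query: string,
  baseUrls: string[],
  searchSite: SiteSearch,
): Promise<{ results: Hit[]; failures: SiteFailure[] }> {
  const results: Hit[] = [];
  const failures: SiteFailure[] = [];
  for (let idx = 0; idx < baseUrls.length; idx++) {
    const url = baseUrls[idx];
    try {
      results.push(...(await searchSite(query, url, idx)));
    } catch (error) {
      // Record the failure and continue with the next site.
      failures.push({ url, message: error instanceof Error ? error.message : String(error) });
    }
  }
  results.sort((a, b) => b.score - a.score);
  return { results, failures };
}

// Fake search in which the site at index 1 is unavailable.
const flakySearch: SiteSearch = async (query, baseUrl, siteIndex) => {
  if (siteIndex === 1) throw new Error('index unavailable');
  return [{ title: query, id: `${siteIndex}:index.html`, url: `${baseUrl}index.html`, score: 5 }];
};

searchTolerant('dita', ['https://a.example.com/', 'https://b.example.com/'], flakySearch)
  .then((outcome) => console.log(outcome.results.length, outcome.failures));
```

Returning the failures alongside the results lets the caller decide whether to warn the user that one site was skipped.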

Invalid Encoded URLs

If the encoded URL parameter is malformed:
export function decodeUrls(encoded: string): string[] {
  if (!encoded) return [];
  const decoded = base64urlDecode(encoded);
  const joined = inflateSync(decoded).toString();
  // ...
}
A malformed encoding causes inflateSync to throw during decompression. decodeUrls does not catch this error, so it propagates to the caller.
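A caller that wants to report a clean error instead of an unhandled exception could wrap the decode step. The sketch below shows the idea with Node's zlib directly. tryInflate is a hypothetical helper, and Buffer's base64url support stands in for the project's base64urlDecode, which is not shown here.

```typescript
import { deflateSync, inflateSync } from 'node:zlib';

type InflateResult =
  | { ok: true; text: string }
  | { ok: false; error: string };

// Decode and decompress, turning any thrown error into a result value.
function tryInflate(encoded: string): InflateResult {
  try {
    const bytes = Buffer.from(encoded, 'base64url');
    return { ok: true, text: inflateSync(bytes).toString() };
  } catch (e) {
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}

// A well-formed payload round-trips.
const good = deflateSync('https://www.dita-ot.org/dev/').toString('base64url');
console.log(tryInflate(good));

// A payload that is not valid zlib data is reported instead of thrown.
const bad = tryInflate('not-valid');
console.log(bad.ok); // false
```

A route handler could then map the failed case to a client error response rather than a server error. That mapping is a suggestion and does not describe the current handler.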

Future Improvements

Potential enhancements to federated search:
  1. Parallel loading — Load and search sites concurrently
  2. Score normalization — Normalize scores to a 0-1 range per site
  3. Partial success — Return results even if some sites fail
  4. Site labels — Include site name or index in results
  5. Semantic federation — Support semantic search across multiple sites
  6. Result diversity — Ensure results from all sites appear in top results

Next Steps

Search Tool

Learn about single-site search

Fetch Tool

Retrieve documents from federated results

Integration Guide

Set up federated search in Claude Desktop

URL Encoding

Deep dive into the encoding scheme
