
Search Tool

The search tool enables AI assistants to query WebHelp documentation sites efficiently. It automatically chooses between semantic search (when available) and index-based search to deliver the most relevant results.

How It Works

When you invoke the search tool, the server:
  1. Attempts semantic search first (for single-site queries)
  2. Falls back to index-based search if semantic search is unavailable
  3. Returns up to 10 results sorted by relevance score
  4. Provides document IDs that can be used with the fetch tool
For federated search across multiple sites, see Federated Search.
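The four steps above can be condensed into a single function. This is a minimal sketch, not the server's code. `semantic` and `indexBased` are hypothetical stand-ins for the two search paths. The `Hit` shape is assumed from the Return Value section, with a `score` field added for ranking.

```typescript
// Hypothetical result shape: the fields the tool returns, plus a relevance score.
interface Hit {
  title: string;
  id: string; // "index:path", passed to the fetch tool
  url: string;
  score: number;
}

// Hypothetical backend: resolves to hits, to null when unavailable, or throws.
type Backend = (query: string, baseUrl: string) => Promise<Hit[] | null>;

const MAX_RESULTS = 10;

// Sketch of the documented flow for one site: try semantic search first,
// fall back to the index-based search, then keep the top 10 hits by score.
async function searchSite(
  query: string,
  baseUrl: string,
  semantic: Backend,
  indexBased: Backend
): Promise<Hit[]> {
  let hits: Hit[] | null = null;
  try {
    hits = await semantic(query, baseUrl);
  } catch {
    hits = null; // semantic service failed: fall through to the index
  }
  if (hits === null) {
    hits = await indexBased(query, baseUrl);
  }
  return [...(hits ?? [])]
    .sort((a, b) => b.score - a.score)
    .slice(0, MAX_RESULTS);
}
```

The sketch treats a thrown error and a `null` result the same way. This keeps the fallback decision in one place.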

Search Strategies

The WebHelp MCP Server uses two complementary search approaches.

Semantic Search

For single-site queries, the server first attempts semantic search using Oxygen Feedback’s AI-powered search service. This service interprets natural-language queries and ranks results by semantic relevance.
// From webhelp-search-client.ts:95-99
async semanticSearch(
  query: string,
  baseUrl: string,
  pageSize: number = 10
): Promise<SearchResult>
Semantic search:
  • Extracts the deployment token from the WebHelp site
  • Queries the Oxygen Feedback API at feedback.oxygenxml.com
  • Returns results with relevance scores
  • Falls back to index-based search if the service is unavailable

Index-Based Search

When semantic search isn’t available, or for multi-site queries, the server searches the WebHelp search index directly:
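// From webhelp-search-client.ts:56-83 (excerpt: `result`, `idx` and
// `mergedResults` are declared in the elided surrounding code)
for (const url of urls) {
  // Download (or reuse) the search index for this site
  await this.loadIndex(url);
  // performSearch reports results via a callback; it runs synchronously here,
  // since `result` is read immediately afterwards
  this.indexLoader.performSearch(query, function (r: any) {
    result = r;
  });
  // `idx` identifies the site; it becomes the index part of each result id
  const formatted = this.formatSearchResult(result, url, idx);
  mergedResults.push(...formatted.results);
}
// Rank hits from all sites together, highest score first
mergedResults.sort((a, b) => b.score - a.score);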
Index-based search:
  • Downloads the WebHelp search index files (index-1.js, index-2.js, etc.)
  • Loads stopwords and file metadata
  • Executes the WebHelp search engine (nwSearchFnt.js)
  • Supports boolean operators (AND, OR)
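The merge step in the index-based excerpt can be illustrated on its own. This is a minimal sketch with two assumptions. First, each site's hits arrive as `{ path, title, score }` records, a hypothetical shape. Second, a site's position in the URL list becomes the `index` part of the `index:path` id described under Result Fields. `mergeSiteResults` is a hypothetical helper, not the server's `formatSearchResult`.

```typescript
// Hypothetical per-site hit, before ids are assigned.
interface SiteHit {
  path: string; // e.g. "topics/getting-started.html"
  title: string;
  score: number;
}

interface MergedHit {
  title: string;
  id: string; // "index:path"
  url: string;
  score: number;
}

// Merge hits from several sites into one list ranked by score.
function mergeSiteResults(
  perSite: { baseUrl: string; hits: SiteHit[] }[]
): MergedHit[] {
  const merged: MergedHit[] = [];
  perSite.forEach(({ baseUrl, hits }, idx) => {
    const base = baseUrl.replace(/\/$/, ""); // avoid a double slash in URLs
    for (const h of hits) {
      merged.push({
        title: h.title,
        id: `${idx}:${h.path}`,
        url: `${base}/${h.path}`,
        score: h.score,
      });
    }
  });
  // Scores from different sites are compared directly, as in the excerpt above.
  return merged.sort((a, b) => b.score - a.score);
}
```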

Usage Example

Here’s how the search tool is defined in the MCP server:
// From app/[...site]/route.ts:37-84
server.tool(
  "search",
  searchDescription,
  {
    query: z
      .string()
      .describe("Search query string (supports boolean operators like AND, OR)"),
  },
  async ({ query }) => {
    const result = await searchClient.search(query);
    const maxResultsToUse = 10;

    if (result.error) {
      return {
        content: [{
          type: "text",
          text: `Search error: ${result.error}`
        }],
        isError: true
      };
    }

    const topResults = result.results.slice(0, maxResultsToUse);
    const results = topResults.map((doc: any) => ({
      title: doc.title,
      id: doc.id,
      url: doc.url
    }));
    return {
      content: [{
        type: "text",
        text: JSON.stringify(results)
      }]
    };
  }
);
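On the client side, the handler's output can be turned back into typed results. This is a minimal sketch. `parseSearchToolResult` is a hypothetical helper. The `ToolResult` type mirrors only the fields the handler above sets; it is not the full MCP SDK type.

```typescript
// Only the fields set by the search handler above.
interface ToolResult {
  content: { type: string; text: string }[];
  isError?: boolean;
}

interface SearchHit {
  title: string;
  id: string;
  url: string;
}

// Convert a search tool result into typed hits; throws if the tool reported an error.
function parseSearchToolResult(result: ToolResult): SearchHit[] {
  const text = result.content.find((c) => c.type === "text")?.text ?? "";
  if (result.isError) {
    // The handler puts "Search error: ..." in the text on failure.
    throw new Error(text || "search tool reported an error");
  }
  const parsed: unknown = JSON.parse(text);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array of results");
  }
  return parsed as SearchHit[];
}
```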

Parameters

query
string
required
Search query string. Supports boolean operators like AND and OR for index-based search.

Return Value

The search tool returns a JSON array of results:
[
  {
    "title": "Getting Started with WebHelp",
    "id": "0:topics/getting-started.html",
    "url": "https://example.com/docs/topics/getting-started.html"
  },
  {
    "title": "Advanced Configuration",
    "id": "0:topics/configuration.html",
    "url": "https://example.com/docs/topics/configuration.html"
  }
]

Result Fields

  • title — The document title extracted from the search index
  • id — Composite identifier in format index:path (used for fetching)
  • url — Full URL to the document
The id field is crucial — pass it to the fetch tool to retrieve the full document content.
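Clients normally pass `id` to the fetch tool unchanged. If you need to inspect it, the `index:path` format can be split at the first colon. This is a minimal sketch. `parseDocId` is a hypothetical helper, and reading the index as the site's position in the searched URL list is an assumption.

```typescript
interface DocRef {
  siteIndex: number; // assumed: position of the site in the searched URL list
  path: string; // path relative to the WebHelp root
}

// Split a composite id of the form "index:path" at the first colon.
function parseDocId(id: string): DocRef {
  const sep = id.indexOf(":");
  if (sep <= 0) throw new Error(`not an index:path id: ${id}`);
  const siteIndex = Number(id.slice(0, sep));
  if (!Number.isInteger(siteIndex) || siteIndex < 0) {
    throw new Error(`bad site index in id: ${id}`);
  }
  return { siteIndex, path: id.slice(sep + 1) };
}
```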

Real-World Examples

Searching DITA Documentation

# Claude Desktop example
Search for "publishing output" in DITA OT docs
MCP server configuration:
{
  "mcpServers": {
    "dita-ot-docs": {
      "url": "https://webhelp-mcp.vercel.app/www.dita-ot.org/dev"
    }
  }
}

Searching Oxygen XML Documentation

# Search for "transformation scenarios"
MCP server configuration:
{
  "mcpServers": {
    "oxygen-docs": {
      "url": "https://webhelp-mcp.vercel.app/www.oxygenxml.com/doc/versions/26.1/ug-editor"
    }
  }
}
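Both configurations follow the same pattern. The WebHelp site's host and path, without the scheme, are appended to the MCP server URL. The sketch below captures that mapping. It is inferred from the two examples above, not taken from the server's documentation, and `mcpServerUrl` is a hypothetical helper.

```typescript
// Build the MCP endpoint for a WebHelp site by appending the site's host and
// path (scheme and trailing slashes removed) to the MCP server URL.
function mcpServerUrl(
  webhelpUrl: string,
  server = "https://webhelp-mcp.vercel.app"
): string {
  const u = new URL(webhelpUrl);
  const path = u.pathname.replace(/\/+$/, ""); // drop trailing slashes
  return `${server.replace(/\/$/, "")}/${u.host}${path}`;
}
```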

Query Tips

Use Specific Terms

“DITA map validation” works better than “checking maps”

Boolean Operators

Use “publishing AND PDF” to require both terms (index-based search only)

Natural Language

“How do I publish output?” works well with semantic search

Short Queries

2-5 word queries typically yield better results
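These tips can be checked before a query is sent. This is a minimal client-side sketch. `checkQuery` is a hypothetical helper, and the word-count threshold comes from the tip above. Treating only standalone upper-case AND/OR as operators is an assumption about the WebHelp search engine.

```typescript
// Assumption: operators are recognized as standalone upper-case AND / OR words.
const BOOLEAN_OPERATOR = /(^|\s)(AND|OR)(\s|$)/;

function usesBooleanOperators(query: string): boolean {
  return BOOLEAN_OPERATOR.test(query.trim());
}

// Return a hint for queries that conflict with the tips above, or null if none applies.
function checkQuery(query: string): string | null {
  if (usesBooleanOperators(query)) {
    return "Boolean operators apply to index-based search only.";
  }
  const words = query.trim().split(/\s+/).filter(Boolean);
  if (words.length > 5) {
    return "Consider a shorter query (2-5 words).";
  }
  return null;
}
```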

Error Handling

The search tool handles various error scenarios:

Index Load Failure

{
  "error": "Failed to load index: HTTP 404: Not Found",
  "results": []
}
This typically means the WebHelp site doesn’t exist or the search index files aren’t accessible.

Search Engine Error

{
  "error": "Search error: Cannot read property 'w' of undefined",
  "results": []
}
Search errors usually indicate malformed or incomplete search indexes. Try fetching the index files directly to diagnose.
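A quick way to fetch the index files directly is to probe each one and record its HTTP status. This is a minimal sketch. The search directory matches the one built in `loadIndex`. The relative file paths in `ASSUMED_INDEX_FILES` are assumptions about a typical WebHelp layout, so adjust them to match your output. `diagnoseIndex` is a hypothetical helper, and its default probe uses the global `fetch` (Node 18+ or a browser).

```typescript
// Assumed locations relative to the search directory; verify against your output.
const ASSUMED_INDEX_FILES = [
  "nwSearchFnt.js",
  "index/index-1.js",
  "index/stopwords.js",
  "index/htmlFileInfoList.js",
];

// Same search directory that loadIndex derives from the base URL.
function searchDir(baseUrl: string): string {
  return `${baseUrl.replace(/\/$/, "")}/oxygen-webhelp/app/search`;
}

type Probe = (url: string) => Promise<number>; // resolves to an HTTP status

// Report the HTTP status of each index file so a missing file can be pinpointed.
async function diagnoseIndex(
  baseUrl: string,
  probe: Probe = async (url) => (await fetch(url, { method: "HEAD" })).status,
  files: string[] = ASSUMED_INDEX_FILES
): Promise<Record<string, number>> {
  const dir = searchDir(baseUrl);
  const report: Record<string, number> = {};
  for (const f of files) {
    const url = `${dir}/${f}`;
    try {
      report[url] = await probe(url);
    } catch {
      report[url] = 0; // network failure: no HTTP status available
    }
  }
  return report;
}
```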

Performance Considerations

Index Loading

The first search request for a site loads the entire search index:
// From webhelp-index-loader.ts:328-354
async loadIndex(baseUrl: string): Promise<void> {
  const searchUrl = `${baseUrl.replace(/\/$/, '')}/oxygen-webhelp/app/search`;
  
  // Download all files
  const nwSearchFntJs = await this.downloadSearchEngine(searchUrl);
  const indexParts = await this.downloadIndexParts(searchUrl);
  const metadataFiles = await this.downloadMetadataFiles(searchUrl);
  
  // Process and initialize
  this.processStopwords(metadataFiles.stopwords);
  this.processFileInfoList(metadataFiles.htmlFileInfoList);
  this.processIndexParts(indexParts);
  this.initializeSearchEngine(nwSearchFntJs);
}
Index files loaded:
  • nwSearchFnt.js — Search engine code
  • index-1.js through index-N.js — Word indexes
  • stopwords.js — Stop words list
  • htmlFileInfoList.js — File metadata
Index loading is cached per deployment, so subsequent searches reuse the loaded index instead of downloading it again.
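Per-site caching can be sketched by memoizing the load promise for each base URL. Caching the promise, rather than a "loaded" flag, also lets concurrent first requests share a single download. This is a minimal sketch that assumes an in-memory cache living as long as the server process. `IndexCache` and `LoadFn` are hypothetical names, not the server's actual classes.

```typescript
// Hypothetical loader: downloads and initializes one site's search index.
type LoadFn = (baseUrl: string) => Promise<void>;

// Cache the in-flight or completed load per site so each index is fetched once.
class IndexCache {
  private readonly loads = new Map<string, Promise<void>>();
  private readonly load: LoadFn;

  constructor(load: LoadFn) {
    this.load = load;
  }

  ensureLoaded(baseUrl: string): Promise<void> {
    const key = baseUrl.replace(/\/$/, ""); // treat ".../docs" and ".../docs/" alike
    let pending = this.loads.get(key);
    if (!pending) {
      pending = this.load(key).catch((err) => {
        this.loads.delete(key); // allow a retry after a failed load
        throw err;
      });
      this.loads.set(key, pending);
    }
    return pending;
  }
}
```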

Result Limits

The server returns a maximum of 10 results to keep responses fast and manageable:
const maxResultsToUse = 10;
const topResults = result.results.slice(0, maxResultsToUse);

Next Steps

Fetch Documents

Retrieve full content after searching

Federated Search

Search multiple sites simultaneously

Semantic Search

Deep dive into AI-powered search

Integration Guide

Connect to AI tools
