WebHelpSearchClient

Overview

The WebHelpSearchClient class is the primary interface for searching Oxygen WebHelp documentation. It combines two search strategies behind a single search() call:

Semantic Search - Uses the Oxygen Feedback service for AI-powered semantic search
Index Search - Falls back to traditional keyword-based search using the WebHelp search index

The client automatically handles multiple documentation sites, merging results and ranking them by relevance score.

Architecture

The WebHelpSearchClient is built on several key components:

WebHelpIndexLoader - Loads and manages the search index from WebHelp sites
Turndown - Converts HTML content to Markdown format
JSDOM - Parses HTML to extract article content
Proxy Support - Automatically detects and uses HTTP/HTTPS proxy settings

Class Interface

export class WebHelpSearchClient {
  constructor(baseUrls: string | string[] = [])
  
  async loadIndex(baseUrl: string): Promise<void>
  async search(query: string): Promise<SearchResult>
  async semanticSearch(query: string, baseUrl: string, pageSize?: number): Promise<SearchResult>
  async fetchDocumentContent(documentId: string): Promise<Document>
  
  private formatSearchResult(searchResult: any, baseUrl: string, index: number): SearchResult
  private resolveDocumentUrl(documentId: string): string
  extractArticleElement(htmlContent: string): string
}

Types

export interface SearchResult {
  error?: string;
  results: Array<{
    id: string;
    title: string;
    url: string;
    score: number;
  }>;
}
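The Document type returned by fetchDocumentContent() is not listed in this section. Going by the fields documented under that method (id, title, text, url, and optional metadata), its shape might look roughly like the sketch below. The declaration in the source may differ, and the name WebHelpDocument is hypothetical: it is used here only so the standalone snippet does not clash with the DOM's global Document type.

```typescript
// Assumed shape, inferred from the fetchDocumentContent() field descriptions.
// "WebHelpDocument" is a hypothetical name; the class interface calls it Document.
interface WebHelpDocument {
  id: string;      // Document identifier, e.g. "0:/topics/intro.html"
  title: string;   // Page title extracted from HTML
  text: string;    // Page content converted to Markdown
  url: string;     // Full URL to the page
  metadata?: any;  // Optional metadata about the document
}

// Example value that satisfies the assumed shape.
const exampleDocument: WebHelpDocument = {
  id: "0:/topics/intro.html",
  title: "Introduction",
  text: "# Introduction\n\nWelcome.",
  url: "https://docs.example.com/topics/intro.html",
};
```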

Constructor

baseUrls

string | string[]

required

The base URL(s) of the WebHelp documentation site(s). Can be a single URL string or an array of URLs for multi-site search.

const client = new WebHelpSearchClient('https://docs.example.com');

// Multiple sites
const multiClient = new WebHelpSearchClient([
  'https://docs.example.com/v1',
  'https://docs.example.com/v2'
]);

The constructor will throw an error if no base URL is provided. At least one URL is required for the search client to function.
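As an illustration of that contract, the sketch below accepts either a single URL or an array and rejects an empty input. The helper name normalizeBaseUrls, the error message, and the trimming of blank entries are assumptions made for this example, not the constructor's actual code.

```typescript
// Hypothetical helper: mirrors the documented constructor contract only.
function normalizeBaseUrls(baseUrls: string | string[] = []): string[] {
  // Accept a single URL string or an array of URLs.
  const urls = Array.isArray(baseUrls) ? baseUrls : [baseUrls];

  // Drop empty or whitespace-only entries (an assumption, not documented behavior).
  const cleaned = urls.map((u) => u.trim()).filter((u) => u.length > 0);

  // At least one URL is required for the client to function.
  if (cleaned.length === 0) {
    throw new Error("At least one base URL is required");
  }
  return cleaned;
}
```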

Methods

search()

Tries semantic search first (only when the client is configured with a single site), then falls back to index-based search.

query

string

required

The search query string

SearchResult

object

Contains search results sorted by relevance score (descending)

error

string

Error message if the search failed

results

array

Array of search results

id

string

Document identifier in format index:path (e.g., 0:/topics/intro.html)

title

string

Page title

url

string

Full URL to the page

score

number

Relevance score (higher is better)

const results = await client.search('authentication');

if (results.error) {
  console.error('Search failed:', results.error);
} else {
  results.results.forEach(result => {
    console.log(`${result.title} (score: ${result.score})`);
    console.log(`  ${result.url}`);
  });
}

The search method automatically merges results from multiple documentation sites and sorts them by relevance score.
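The merge step can be pictured as concatenating each site's hits and sorting the combined list by score, highest first. The sketch below is a self-contained approximation: mergeResults and SearchHit are hypothetical names, and it assumes scores from different sites are directly comparable.

```typescript
interface SearchHit {
  id: string;
  title: string;
  url: string;
  score: number;
}

interface SearchResult {
  error?: string;
  results: SearchHit[];
}

// Hypothetical helper: combine per-site results into one ranked list.
function mergeResults(perSite: SearchResult[]): SearchResult {
  const merged: SearchHit[] = [];
  for (const siteResult of perSite) {
    merged.push(...siteResult.results);
  }
  // Higher scores are more relevant, so sort in descending order.
  merged.sort((a, b) => b.score - a.score);
  return { results: merged };
}
```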

semanticSearch()

Performs AI-powered semantic search using the Oxygen Feedback service. This requires the documentation site to have the Oxygen Feedback integration enabled.

query

string

required

The search query string

baseUrl

string

required

The base URL of the documentation site

pageSize

number

default: 10

Maximum number of results to return

const results = await client.semanticSearch(
  'how to configure authentication',
  'https://docs.example.com',
  5
);

How It Works

Downloads the main page and extracts the Oxygen Feedback deployment token
Sends a POST request to the Oxygen Feedback API with the query
Parses and formats the semantic search results
Returns results with relevance scores

Semantic search respects proxy settings from environment variables: HTTPS_PROXY, https_proxy, HTTP_PROXY, or http_proxy.
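One plausible way to read those variables is sketched below. The function name resolveProxyUrl is hypothetical, and the precedence (the order in which the variables are listed above) is an assumption; the client's actual lookup order is not stated here.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: return the first non-empty proxy variable,
// checked in the order the variables are listed in the text (assumed precedence).
function resolveProxyUrl(env: Env): string | undefined {
  const names = ["HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"];
  for (const name of names) {
    const value = env[name];
    if (value && value.trim() !== "") {
      return value.trim();
    }
  }
  // No proxy configured: callers would connect directly.
  return undefined;
}
```

In Node.js the function could be called as resolveProxyUrl(process.env); taking the environment as a parameter keeps the sketch easy to test.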

fetchDocumentContent()

Retrieves the full content of a documentation page, converted to Markdown format.

documentId

string

required

The document ID from a search result (format: index:path)

Document

object

id

string

The document identifier

title

string

Page title extracted from HTML

text

string

Page content converted to Markdown

url

string

Full URL to the page

metadata

any

Optional metadata about the document

const searchResults = await client.search('installation');
if (searchResults.results.length > 0) {
  const doc = await client.fetchDocumentContent(searchResults.results[0].id);
  
  console.log(`# ${doc.title}\n`);
  console.log(doc.text);
}

Content Processing

The method performs several transformations:

Downloads the HTML page
Extracts only the <article> element (if present)
Converts HTML to Markdown using Turndown
Extracts the page title

By extracting only the article element, the method excludes navigation, headers, footers, and other non-content elements, providing clean documentation content.

loadIndex()

Explicitly loads the search index for a specific base URL. This is called automatically by search(), but can be called manually for preloading.

baseUrl

string

required

The base URL of the documentation site

// Preload index before searching
await client.loadIndex('https://docs.example.com');
const results = await client.search('query');

extractArticleElement()

Extracts the main article content from HTML, removing navigation and other UI elements.

htmlContent

string

required

The full HTML content of the page

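// downloadFile is not part of the WebHelpSearchClient interface shown above;
// substitute any helper that returns the page HTML as a string
const html = await downloadFile('https://docs.example.com/page.html');
const articleHtml = client.extractArticleElement(html);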

Implementation Details

Search Strategy

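The search() method tries semantic search first and falls back to index-based search. The listing below is condensed from the source, so names such as urls and callback are not defined within it: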

Source: webhelp-search-client.ts:40-88

async search(query: string): Promise<SearchResult> {
  // For single sites, try semantic search first
  if (urls.length === 1) {
    try {
      const semantic = await this.semanticSearch(query, urls[0]);
      if (!semantic.error && semantic.results.length > 0) {
        return semantic;
      }
    } catch (e) {
      // Fall back to index search
    }
  }

  // Load and search all indexes
  const mergedResults = [];
  for (const url of urls) {
    await this.loadIndex(url);
    const result = this.indexLoader.performSearch(query, callback);
    mergedResults.push(...result);
  }

  // Sort by relevance score
  mergedResults.sort((a, b) => b.score - a.score);
  return { results: mergedResults };
}

Semantic search is only attempted for single-site queries. Multi-site queries always use index-based search.

Document ID Format

Document IDs use a special format to support multiple documentation sites:

index:path

index - The zero-based index of the base URL in the constructor array
path - The relative path to the document (e.g., /topics/intro.html)

Examples:

0:/topics/intro.html - First site, intro page
1:/api/reference.html - Second site, reference page
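To make the format concrete, the sketch below splits an ID at its first colon and maps it back to a full URL. The private resolveDocumentUrl() method is not shown in this document, so everything here is an assumption: the ParsedDocumentId type, the extra baseUrls parameter, the error messages, and the way the base URL and path are joined.

```typescript
interface ParsedDocumentId {
  siteIndex: number; // Zero-based index into the base URL array
  path: string;      // Path of the document relative to that site
}

// Hypothetical parser for the "index:path" format.
function parseDocumentId(documentId: string): ParsedDocumentId {
  // Split at the first colon only, so a colon inside the path is preserved.
  const separator = documentId.indexOf(":");
  if (separator <= 0) {
    throw new Error(`Invalid document ID: ${documentId}`);
  }
  const siteIndex = Number(documentId.slice(0, separator));
  if (!Number.isInteger(siteIndex) || siteIndex < 0) {
    throw new Error(`Invalid site index in document ID: ${documentId}`);
  }
  return { siteIndex, path: documentId.slice(separator + 1) };
}

// Hypothetical resolver: takes the base URLs as a parameter for illustration.
function resolveDocumentUrl(documentId: string, baseUrls: string[]): string {
  const { siteIndex, path } = parseDocumentId(documentId);
  const baseUrl = baseUrls[siteIndex];
  if (baseUrl === undefined) {
    throw new Error(`No base URL configured for site index ${siteIndex}`);
  }
  // Join with exactly one slash between the base URL and the path.
  return baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}
```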

Markdown Conversion

The Turndown service is configured with sensible defaults:

const turndownService = new TurndownService({
  headingStyle: 'atx',          // Use # for headings
  codeBlockStyle: 'fenced',     // Use ``` for code blocks
  bulletListMarker: '-'         // Use - for lists
});

Usage Examples

Basic Search

import { WebHelpSearchClient } from './webhelp-search-client';

const client = new WebHelpSearchClient('https://docs.example.com');

// Search for documentation
const results = await client.search('authentication');

if (!results.error) {
  console.log(`Found ${results.results.length} results`);
  for (const result of results.results) {
    console.log(`${result.title} - ${result.url}`);
  }
}

Multi-Site Search

const client = new WebHelpSearchClient([
  'https://docs.example.com/v1',
  'https://docs.example.com/v2',
  'https://docs.example.com/v3'
]);

// Search across all versions
const results = await client.search('deprecated features');

// Results are merged and sorted by relevance
for (const result of results.results) {
  console.log(`[${result.id.split(':')[0]}] ${result.title}`);
}

Fetch Full Content

const searchResults = await client.search('getting started');

if (searchResults.results.length > 0) {
  // Get the most relevant result
  const topResult = searchResults.results[0];
  
  // Fetch full content as Markdown
  const doc = await client.fetchDocumentContent(topResult.id);
  
  console.log(`# ${doc.title}\n`);
  console.log(doc.text);
  console.log(`\nSource: ${doc.url}`);
}

Using Semantic Search Directly

const client = new WebHelpSearchClient('https://docs.example.com');

// Use semantic search for better results on conceptual queries
const results = await client.semanticSearch(
  'how do I set up user authentication with OAuth2',
  'https://docs.example.com',
  5
);

if (results.error) {
  console.log('Semantic search not available, falling back to index search');
  const fallbackResults = await client.search('OAuth2 authentication');
}

Error Handling

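// Wrapped in a function so the early return statements are valid
async function runSearch(): Promise<void> {
  try {
    const client = new WebHelpSearchClient('https://docs.example.com');
    const results = await client.search('query');

    if (results.error) {
      console.error('Search error:', results.error);
      return;
    }

    if (results.results.length === 0) {
      console.log('No results found');
      return;
    }

    // Process results
    for (const result of results.results) {
      console.log(result.title);
    }
  } catch (error) {
    // error is unknown in strict TypeScript, so narrow it before reading message
    console.error('Client error:', error instanceof Error ? error.message : error);
  }
}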

Design Decisions

Why Two Search Strategies?

The client implements both semantic and index-based search for optimal results:

Semantic Search provides better results for natural language queries and conceptual questions
Index Search is more reliable for keyword-based queries and works without external dependencies
Automatic Fallback ensures search always works, even if semantic search is unavailable

Multi-Site Support

Supporting multiple documentation sites in a single client enables:

Version Search - Search across multiple versions of documentation
Product Search - Search across related products
Unified Results - Merge and rank results from all sites

Markdown Conversion

Converting HTML to Markdown provides:

Clean Content - Removes styling and non-content elements
LLM-Friendly - Ideal format for language model consumption
Portable - Easy to process, display, and store

Related

IndexLoader

Learn about the index loading mechanism

URL Encoding

Understand URL compression utilities
