
Overview

The WebHelpSearchClient class is the primary interface for searching Oxygen WebHelp documentation. It provides a unified search experience that intelligently combines two search strategies:
  1. Semantic Search - Uses the Oxygen Feedback service for AI-powered semantic search
  2. Index Search - Falls back to traditional keyword-based search using the WebHelp search index
The client automatically handles multiple documentation sites, merging results and ranking them by relevance score.

Architecture

The WebHelpSearchClient is built on several key components:
  • WebHelpIndexLoader - Loads and manages the search index from WebHelp sites
  • Turndown - Converts HTML content to Markdown format
  • JSDOM - Parses HTML to extract article content
  • Proxy Support - Automatically detects and uses HTTP/HTTPS proxy settings

Class Interface

export class WebHelpSearchClient {
  constructor(baseUrls: string | string[] = [])
  
  async loadIndex(baseUrl: string): Promise<void>
  async search(query: string): Promise<SearchResult>
  async semanticSearch(query: string, baseUrl: string, pageSize?: number): Promise<SearchResult>
  async fetchDocumentContent(documentId: string): Promise<Document>
  
  private formatSearchResult(searchResult: any, baseUrl: string, index: number): SearchResult
  private resolveDocumentUrl(documentId: string): string
  extractArticleElement(htmlContent: string): string
}

Types

export interface SearchResult {
  error?: string;
  results: Array<{
    id: string;
    title: string;
    url: string;
    score: number;
  }>;
}

Constructor

Parameters:
  • baseUrls (string | string[], required) - The base URL(s) of the WebHelp documentation site(s). Can be a single URL string or an array of URLs for multi-site search.
const client = new WebHelpSearchClient('https://docs.example.com');

// Multiple sites
const multiClient = new WebHelpSearchClient([
  'https://docs.example.com/v1',
  'https://docs.example.com/v2'
]);
The constructor will throw an error if no base URL is provided. At least one URL is required for the search client to function.
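
The constructor's validation code is not shown on this page. As a minimal sketch of how a `string | string[]` argument could be normalized and checked, assuming a hypothetical helper named `normalizeBaseUrls` and assuming that trailing slashes are stripped:

```typescript
// Hypothetical helper: the real constructor's validation logic is not shown on this page.
function normalizeBaseUrls(baseUrls: string | string[] = []): string[] {
  // Accept either a single URL or an array of URLs.
  const list = Array.isArray(baseUrls) ? baseUrls : [baseUrls];
  // Assumption: trim whitespace, strip trailing slashes, and drop empty entries.
  const cleaned = list
    .map((url) => url.trim().replace(/\/+$/, ""))
    .filter((url) => url.length > 0);
  if (cleaned.length === 0) {
    throw new Error("At least one base URL is required");
  }
  return cleaned;
}

const single = normalizeBaseUrls("https://docs.example.com/");
const multi = normalizeBaseUrls([
  "https://docs.example.com/v1",
  "https://docs.example.com/v2",
]);
console.log(single); // [ 'https://docs.example.com' ]
console.log(multi.length); // 2
```

Normalizing to an array up front means the rest of the client can treat single-site and multi-site setups the same way.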

Methods

search()

Performs an intelligent search that tries semantic search first (for single sites), then falls back to index-based search.
Parameters:
  • query (string, required) - The search query string
Returns SearchResult (object), with results sorted by relevance score (descending):
  • error (string) - Error message if the search failed
  • results (array) - Array of search results, each with:
    • id (string) - Document identifier in format index:path (e.g., 0:/topics/intro.html)
    • title (string) - Page title
    • url (string) - Full URL to the page
    • score (number) - Relevance score (higher is better)
const results = await client.search('authentication');

if (results.error) {
  console.error('Search failed:', results.error);
} else {
  results.results.forEach(result => {
    console.log(`${result.title} (score: ${result.score})`);
    console.log(`  ${result.url}`);
  });
}
The search method automatically merges results from multiple documentation sites and sorts them by relevance score.
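
The merge step can be pictured as flattening the per-site result lists and sorting by score. The sketch below is illustrative only: `mergeByScore` and `ResultItem` are hypothetical names, and the sample scores are made up.

```typescript
// Shape of one entry in SearchResult.results, as documented above.
type ResultItem = { id: string; title: string; url: string; score: number };

// Hypothetical helper: flattens per-site result lists and sorts by score, highest first.
function mergeByScore(perSite: ResultItem[][]): ResultItem[] {
  const flattened = ([] as ResultItem[]).concat(...perSite);
  return flattened.sort((a, b) => b.score - a.score);
}

const merged = mergeByScore([
  [{ id: "0:/topics/intro.html", title: "Intro", url: "https://docs.example.com/v1/topics/intro.html", score: 0.4 }],
  [{ id: "1:/topics/intro.html", title: "Intro", url: "https://docs.example.com/v2/topics/intro.html", score: 0.9 }],
]);
console.log(merged.map((item) => item.id)); // [ '1:/topics/intro.html', '0:/topics/intro.html' ]
```

Because each id carries its site index, the origin of every result is still recoverable after merging.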

semanticSearch()

Performs AI-powered semantic search using the Oxygen Feedback service. This requires the documentation site to have the Oxygen Feedback integration enabled.
Parameters:
  • query (string, required) - The search query string
  • baseUrl (string, required) - The base URL of the documentation site
  • pageSize (number, default: 10) - Maximum number of results to return
const results = await client.semanticSearch(
  'how to configure authentication',
  'https://docs.example.com',
  5
);

How It Works

  1. Downloads the main page and extracts the Oxygen Feedback deployment token
  2. Sends a POST request to the Oxygen Feedback API with the query
  3. Parses and formats the semantic search results
  4. Returns results with relevance scores
Semantic search respects proxy settings from environment variables: HTTPS_PROXY, https_proxy, HTTP_PROXY, or http_proxy.
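
How the client reads these variables is not shown here. A minimal sketch of the lookup, assuming the variables are checked in the order listed above and that the first non-empty value wins (`resolveProxyUrl` is a hypothetical name):

```typescript
// Assumption: variables are checked in the order the documentation lists them.
const PROXY_ENV_VARS = ["HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"];

// Hypothetical helper: returns the first non-empty proxy URL, or undefined if none is set.
function resolveProxyUrl(env: Record<string, string | undefined>): string | undefined {
  for (const name of PROXY_ENV_VARS) {
    const value = env[name];
    if (value && value.trim() !== "") {
      return value.trim();
    }
  }
  return undefined;
}

const proxyUrl = resolveProxyUrl({
  http_proxy: "http://proxy.local:3128",
  HTTPS_PROXY: "http://secure-proxy.local:8443",
});
console.log(proxyUrl); // http://secure-proxy.local:8443
```

In a Node.js process the function would typically be called as `resolveProxyUrl(process.env)`; taking the environment as a parameter keeps the lookup easy to test.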

fetchDocumentContent()

Retrieves the full content of a documentation page, converted to Markdown format.
Parameters:
  • documentId (string, required) - The document ID from a search result (format: index:path)
Returns Document (object):
  • id (string) - The document identifier
  • title (string) - Page title extracted from HTML
  • text (string) - Page content converted to Markdown
  • url (string) - Full URL to the page
  • metadata (any) - Optional metadata about the document
const searchResults = await client.search('installation');
if (searchResults.results.length > 0) {
  const doc = await client.fetchDocumentContent(searchResults.results[0].id);
  
  console.log(`# ${doc.title}\n`);
  console.log(doc.text);
}

Content Processing

The method performs several transformations:
  1. Downloads the HTML page
  2. Extracts only the <article> element (if present)
  3. Converts HTML to Markdown using Turndown
  4. Extracts the page title
By extracting only the article element, the method excludes navigation, headers, footers, and other non-content elements, providing clean documentation content.
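
To illustrate steps 2 and 4 without external dependencies, the sketch below swaps the JSDOM-based parsing for regular expressions. This is a stand-in, not the client's implementation. It assumes that the full HTML is kept when no `<article>` is present and that the title comes from the `<title>` element. The function names are hypothetical.

```typescript
// Stand-in for the JSDOM-based extraction: regular expressions keep the sketch
// dependency-free. Nested <article> elements are not handled.
function extractArticleHtml(html: string): string {
  const match = html.match(/<article\b[^>]*>[\s\S]*?<\/article>/i);
  // Assumption: when no <article> is present, the full HTML is returned unchanged.
  return match ? match[0] : html;
}

// Assumption: the page title is taken from the <title> element.
function extractTitle(html: string): string {
  const match = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  return match?.[1]?.trim() ?? "";
}

const page =
  "<html><head><title> Intro </title></head><body><nav>Menu</nav>" +
  "<article><h1>Intro</h1><p>Hello</p></article><footer>Footer</footer></body></html>";

const articleOnly = extractArticleHtml(page);
const pageTitle = extractTitle(page);
console.log(articleOnly); // <article><h1>Intro</h1><p>Hello</p></article>
console.log(pageTitle); // Intro
```

A real DOM parser such as JSDOM is the more robust choice for production HTML; the regular expressions here only show which part of the page survives the extraction.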

loadIndex()

Explicitly loads the search index for a specific base URL. This is called automatically by search(), but can be called manually for preloading.
Parameters:
  • baseUrl (string, required) - The base URL of the documentation site
// Preload index before searching
await client.loadIndex('https://docs.example.com');
const results = await client.search('query');

extractArticleElement()

Extracts the main article content from HTML, removing navigation and other UI elements.
Parameters:
  • htmlContent (string, required) - The full HTML content of the page
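// downloadFile is not defined on this page; it stands for any helper that resolves to the page's HTML string
const html = await downloadFile('https://docs.example.com/page.html');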
const articleHtml = client.extractArticleElement(html);

Implementation Details

Search Strategy

The search() method implements an intelligent fallback strategy:
Source: webhelp-search-client.ts:40-88
async search(query: string): Promise<SearchResult> {
  // Condensed excerpt: urls and callback are defined in parts of the source not shown here
  // For single sites, try semantic search first
  if (urls.length === 1) {
    try {
      const semantic = await this.semanticSearch(query, urls[0]);
      if (!semantic.error && semantic.results.length > 0) {
        return semantic;
      }
    } catch (e) {
      // Fall back to index search
    }
  }

  // Load and search all indexes
  const mergedResults = [];
  for (const url of urls) {
    await this.loadIndex(url);
    const result = this.indexLoader.performSearch(query, callback);
    mergedResults.push(...result);
  }

  // Sort by relevance score
  mergedResults.sort((a, b) => b.score - a.score);
  return { results: mergedResults };
}
Semantic search is only attempted for single-site queries. Multi-site queries always use index-based search.

Document ID Format

Document IDs use a special format to support multiple documentation sites:
index:path
  • index - The zero-based index of the base URL in the constructor array
  • path - The relative path to the document (e.g., /topics/intro.html)
Examples:
  • 0:/topics/intro.html - First site, intro page
  • 1:/api/reference.html - Second site, reference page
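
The private resolveDocumentUrl() method is not shown on this page. As a rough sketch of how an index:path ID could be split and mapped back to a full URL, assuming the URL is simply the base URL joined with the path (`parseDocumentId` and `resolveUrl` are hypothetical names):

```typescript
// Hypothetical helper: splits "index:path" into its two parts.
function parseDocumentId(documentId: string): { siteIndex: number; path: string } {
  // Split on the first colon only, since the path itself could contain colons.
  const separator = documentId.indexOf(":");
  const prefix = separator > 0 ? documentId.slice(0, separator) : "";
  if (!/^\d+$/.test(prefix)) {
    throw new Error("Invalid document ID: " + documentId);
  }
  return { siteIndex: parseInt(prefix, 10), path: documentId.slice(separator + 1) };
}

// Hypothetical helper: maps a document ID back to a full URL.
function resolveUrl(documentId: string, baseUrls: string[]): string {
  const { siteIndex, path } = parseDocumentId(documentId);
  const baseUrl = baseUrls[siteIndex];
  if (baseUrl === undefined) {
    throw new Error("No base URL at index " + siteIndex);
  }
  // Assumption: base URL and path are joined with exactly one slash between them.
  return baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

const resolved = resolveUrl("1:/api/reference.html", [
  "https://docs.example.com/v1",
  "https://docs.example.com/v2",
]);
console.log(resolved); // https://docs.example.com/v2/api/reference.html
```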

Markdown Conversion

The Turndown service is configured to emit ATX-style headings, fenced code blocks, and hyphen list markers:
const turndownService = new TurndownService({
  headingStyle: 'atx',          // Use # for headings
  codeBlockStyle: 'fenced',     // Use ``` for code blocks
  bulletListMarker: '-'         // Use - for lists
});

Usage Examples

Basic Search

import { WebHelpSearchClient } from './webhelp-search-client';

const client = new WebHelpSearchClient('https://docs.example.com');

// Search for documentation
const results = await client.search('authentication');

if (!results.error) {
  console.log(`Found ${results.results.length} results`);
  for (const result of results.results) {
    console.log(`${result.title} - ${result.url}`);
  }
}

Multi-Site Search

const client = new WebHelpSearchClient([
  'https://docs.example.com/v1',
  'https://docs.example.com/v2',
  'https://docs.example.com/v3'
]);

// Search across all versions
const results = await client.search('deprecated features');

// Results are merged and sorted by relevance
for (const result of results.results) {
  console.log(`[${result.id.split(':')[0]}] ${result.title}`);
}

Fetch Full Content

const searchResults = await client.search('getting started');

if (searchResults.results.length > 0) {
  // Get the most relevant result
  const topResult = searchResults.results[0];
  
  // Fetch full content as Markdown
  const doc = await client.fetchDocumentContent(topResult.id);
  
  console.log(`# ${doc.title}\n`);
  console.log(doc.text);
  console.log(`\nSource: ${doc.url}`);
}

Using Semantic Search Directly

const client = new WebHelpSearchClient('https://docs.example.com');

// Use semantic search for better results on conceptual queries
const results = await client.semanticSearch(
  'how do I set up user authentication with OAuth2',
  'https://docs.example.com',
  5
);

if (results.error) {
  console.log('Semantic search not available, falling back to index search');
  const fallbackResults = await client.search('OAuth2 authentication');
}

Error Handling

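// Wrapped in a function so the early returns are valid
async function runSearch(): Promise<void> {
  try {
    const client = new WebHelpSearchClient('https://docs.example.com');
    const results = await client.search('query');

    if (results.error) {
      console.error('Search error:', results.error);
      return;
    }

    if (results.results.length === 0) {
      console.log('No results found');
      return;
    }

    // Process results
    for (const result of results.results) {
      console.log(result.title);
    }
  } catch (error) {
    // A caught value is typed unknown in TypeScript, so narrow it before reading message
    console.error('Client error:', error instanceof Error ? error.message : String(error));
  }
}

void runSearch();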

Design Decisions

Why Two Search Strategies?

The client implements both semantic and index-based search for optimal results:
  • Semantic Search provides better results for natural language queries and conceptual questions
  • Index Search is more reliable for keyword-based queries and works without external dependencies
  • Automatic Fallback ensures search always works, even if semantic search is unavailable

Multi-Site Support

Supporting multiple documentation sites in a single client enables:
  • Version Search - Search across multiple versions of documentation
  • Product Search - Search across related products
  • Unified Results - Merge and rank results from all sites

Markdown Conversion

Converting HTML to Markdown provides:
  • Clean Content - Removes styling and non-content elements
  • LLM-Friendly - Ideal format for language model consumption
  • Portable - Easy to process, display, and store

IndexLoader

Learn about the index loading mechanism

URL Encoding

Understand URL compression utilities
