Overview
The WebHelpSearchClient class is the primary interface for searching Oxygen WebHelp documentation. It provides a unified search experience that combines two search strategies:
Semantic Search - Uses the Oxygen Feedback service for AI-powered semantic search
Index Search - Falls back to traditional keyword-based search using the WebHelp search index
The client automatically handles multiple documentation sites, merging results and ranking them by relevance score.
Architecture
The WebHelpSearchClient is built on several key components:
WebHelpIndexLoader - Loads and manages the search index from WebHelp sites
Turndown - Converts HTML content to Markdown format
JSDOM - Parses HTML to extract article content
Proxy Support - Automatically detects and uses HTTP/HTTPS proxy settings
Class Interface
export class WebHelpSearchClient {
  constructor(baseUrls: string | string[] = [])
  async loadIndex(baseUrl: string): Promise<void>
  async search(query: string): Promise<SearchResult>
  async semanticSearch(query: string, baseUrl: string, pageSize?: number): Promise<SearchResult>
  async fetchDocumentContent(documentId: string): Promise<Document>
  private formatSearchResult(searchResult: any, baseUrl: string, index: number): SearchResult
  private resolveDocumentUrl(documentId: string): string
  extractArticleElement(htmlContent: string): string
}
Types
export interface SearchResult {
  error?: string;
  results: Array<{
    id: string;
    title: string;
    url: string;
    score: number;
  }>;
}
Constructor
baseUrls (string | string[], required)
The base URL(s) of the WebHelp documentation site(s). Can be a single URL string or an array of URLs for multi-site search.
// Single site
const client = new WebHelpSearchClient('https://docs.example.com');
// Multiple sites
const multiClient = new WebHelpSearchClient([
  'https://docs.example.com/v1',
  'https://docs.example.com/v2'
]);
The constructor will throw an error if no base URL is provided. At least one URL is required for the search client to function.
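The accepted input shapes and the empty-input error can be illustrated with a small sketch. This is not the class's actual code: normalizeBaseUrls is a hypothetical helper, and both the whitespace trimming and the error message are assumptions made for the example.

```typescript
// Hypothetical helper (not part of WebHelpSearchClient): turn the constructor
// argument into a non-empty list of base URLs.
function normalizeBaseUrls(baseUrls: string | string[] = []): string[] {
  const list = Array.isArray(baseUrls) ? baseUrls : [baseUrls];
  // Assumption: blank entries are ignored rather than treated as URLs.
  const cleaned = list.map((url) => url.trim()).filter((url) => url.length > 0);
  if (cleaned.length === 0) {
    // Assumption: the real error message may differ.
    throw new Error('At least one base URL is required');
  }
  return cleaned;
}
```

A single string and a one-element array produce the same list, so the rest of a client written this way would only need to handle the array form.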
Methods
search()
Performs an intelligent search that tries semantic search first (for single sites), then falls back to index-based search.
Returns a SearchResult containing the search results sorted by relevance score (descending):
error - Error message if the search failed
results - Array of search results
id - Document identifier in the format index:path (e.g., 0:/topics/intro.html)
score - Relevance score (higher is better)
const results = await client.search('authentication');
if (results.error) {
  console.error('Search failed:', results.error);
} else {
  results.results.forEach(result => {
    console.log(`${result.title} (score: ${result.score})`);
    console.log(`  ${result.url}`);
  });
}
The search method automatically merges results from multiple documentation sites and sorts them by relevance score.
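The merge-and-sort step described above can be sketched in isolation. SearchHit mirrors the entries of SearchResult.results; mergeAndRank is a hypothetical name, and the sketch assumes that scores from different sites are compared directly, which is what the description implies but does not spell out.

```typescript
// Shape of one entry in SearchResult.results.
interface SearchHit {
  id: string;
  title: string;
  url: string;
  score: number;
}

// Hypothetical helper: concatenate the per-site hit lists, then sort the
// combined list by score, highest first.
function mergeAndRank(perSite: SearchHit[][]): SearchHit[] {
  const merged: SearchHit[] = [];
  for (const hits of perSite) {
    merged.push(...hits);
  }
  return merged.sort((a, b) => b.score - a.score);
}
```

Because the ID keeps its index: prefix, each hit can still be traced back to its site after the lists are interleaved.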
semanticSearch()
Performs AI-powered semantic search using the Oxygen Feedback service. This requires the documentation site to have the Oxygen Feedback integration enabled.
baseUrl - The base URL of the documentation site
pageSize - Maximum number of results to return (optional)
const results = await client.semanticSearch(
  'how to configure authentication',
  'https://docs.example.com',
  5
);
How It Works
Downloads the main page and extracts the Oxygen Feedback deployment token
Sends a POST request to the Oxygen Feedback API with the query
Parses and formats the semantic search results
Returns results with relevance scores
Semantic search respects proxy settings from environment variables: HTTPS_PROXY, https_proxy, HTTP_PROXY, or http_proxy.
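One way to read the proxy settings is to check the four variables in turn and use the first one that is set. The sketch below assumes the precedence matches the order in which the variables are listed above; the source does not state the precedence, and resolveProxyUrl is a hypothetical name. The environment is passed in as a plain object so the function can be exercised without touching the real process environment.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: variables are consulted in the order they are listed above.
const PROXY_VARS = ['HTTPS_PROXY', 'https_proxy', 'HTTP_PROXY', 'http_proxy'];

// Hypothetical helper: return the first non-empty proxy URL, or undefined
// when no proxy variable is set.
function resolveProxyUrl(env: Env): string | undefined {
  for (const name of PROXY_VARS) {
    const value = env[name];
    if (value !== undefined && value.trim() !== '') {
      return value.trim();
    }
  }
  return undefined;
}
```

In a Node.js program the function would typically be called with process.env.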
fetchDocumentContent()
Retrieves the full content of a documentation page, converted to Markdown format.
documentId - The document ID from a search result (format: index:path)
title - Page title extracted from HTML
text - Page content converted to Markdown
Optional metadata about the document
const searchResults = await client.search('installation');
if (searchResults.results.length > 0) {
  const doc = await client.fetchDocumentContent(searchResults.results[0].id);
  console.log(`# ${doc.title}\n`);
  console.log(doc.text);
}
Content Processing
The method performs several transformations:
Downloads the HTML page
Extracts only the <article> element (if present)
Converts HTML to Markdown using Turndown
Extracts the page title
By extracting only the article element, the method excludes navigation, headers, footers, and other non-content elements, providing clean documentation content.
loadIndex()
Explicitly loads the search index for a specific base URL. This is called automatically by search(), but can be called manually for preloading.
baseUrl - The base URL of the documentation site
// Preload index before searching
await client.loadIndex('https://docs.example.com');
const results = await client.search('query');
extractArticleElement()
Extracts the main article content from HTML, removing navigation and other UI elements.
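htmlContent - The full HTML content of the page
// downloadFile is a helper that is not defined on this page; any function
// that returns the page HTML as a string works here.
const html = await downloadFile('https://docs.example.com/page.html');
const articleHtml = client.extractArticleElement(html);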
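To make the extraction step concrete without pulling in a DOM library, here is a dependency-free approximation. It is not how the client works: the client is described as using JSDOM, whereas this sketch swaps in a regular expression. extractArticleSketch is a hypothetical name, and returning the input unchanged when no article element exists is an assumption.

```typescript
// Regex-based stand-in for a DOM lookup of the <article> element.
// Limitation: with several sibling <article> elements, the greedy match
// spans from the first opening tag to the last closing tag.
function extractArticleSketch(htmlContent: string): string {
  const match = /<article[\s>][\s\S]*<\/article\s*>/i.exec(htmlContent);
  // Assumption: when there is no <article>, keep the whole document.
  return match ? match[0] : htmlContent;
}
```

A real DOM parser is the safer choice for arbitrary pages; the sketch only shows which part of the page survives the step.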
Implementation Details
Search Strategy
The search() method implements an intelligent fallback strategy:
Source: webhelp-search-client.ts:40-88
async search(query: string): Promise<SearchResult> {
  // Abridged excerpt: `urls` (the configured base URLs) and `callback`
  // are not defined in the lines shown here.

  // For single sites, try semantic search first
  if (urls.length === 1) {
    try {
      const semantic = await this.semanticSearch(query, urls[0]);
      if (!semantic.error && semantic.results.length > 0) {
        return semantic;
      }
    } catch (e) {
      // Fall back to index search
    }
  }

  // Load and search all indexes
  const mergedResults = [];
  for (const url of urls) {
    await this.loadIndex(url);
    const result = this.indexLoader.performSearch(query, callback);
    mergedResults.push(...result);
  }

  // Sort by relevance score
  mergedResults.sort((a, b) => b.score - a.score);
  return { results: mergedResults };
}
Semantic search is only attempted for single-site queries. Multi-site queries always use index-based search.
Document IDs use the format index:path to support multiple documentation sites:
index - The zero-based index of the base URL in the constructor array
path - The relative path to the document (e.g., /topics/intro.html)
Examples:
0:/topics/intro.html - First site, intro page
1:/api/reference.html - Second site, reference page
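The index:path format can be parsed and resolved back to a full URL along the following lines. This is a sketch rather than the client's private resolveDocumentUrl: the function names, the error messages, and the slash handling when joining base URL and path are all assumptions.

```typescript
interface ParsedDocumentId {
  siteIndex: number; // zero-based position in the base URL list
  path: string;      // relative path, e.g. /topics/intro.html
}

// Hypothetical helper: split "index:path" at the first colon.
function parseDocumentId(documentId: string): ParsedDocumentId {
  const separator = documentId.indexOf(':');
  const indexPart = separator > 0 ? documentId.slice(0, separator) : '';
  if (!/^\d+$/.test(indexPart)) {
    throw new Error(`Invalid document ID: ${documentId}`);
  }
  return { siteIndex: Number(indexPart), path: documentId.slice(separator + 1) };
}

// Hypothetical helper: look up the base URL and join it with the path,
// leaving exactly one slash at the seam.
function resolveDocumentUrlSketch(documentId: string, baseUrls: string[]): string {
  const { siteIndex, path } = parseDocumentId(documentId);
  const baseUrl = baseUrls[siteIndex];
  if (baseUrl === undefined) {
    throw new Error(`No base URL at index ${siteIndex}`);
  }
  return baseUrl.replace(/\/+$/, '') + '/' + path.replace(/^\/+/, '');
}
```

Splitting at the first colon only means that a path containing further colons is preserved intact.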
Markdown Conversion
The Turndown service is configured to override three of Turndown's own defaults (setext headings, indented code blocks, and * list markers):
const turndownService = new TurndownService({
  headingStyle: 'atx',       // Use # for headings
  codeBlockStyle: 'fenced',  // Use ``` for code blocks
  bulletListMarker: '-'      // Use - for lists
});
Usage Examples
Basic Search
import { WebHelpSearchClient } from './webhelp-search-client';
const client = new WebHelpSearchClient('https://docs.example.com');
// Search for documentation
const results = await client.search('authentication');
if (!results.error) {
  console.log(`Found ${results.results.length} results`);
  for (const result of results.results) {
    console.log(`${result.title} - ${result.url}`);
  }
}
Multi-Site Search
const client = new WebHelpSearchClient([
  'https://docs.example.com/v1',
  'https://docs.example.com/v2',
  'https://docs.example.com/v3'
]);
// Search across all versions
const results = await client.search('deprecated features');
// Results are merged and sorted by relevance
for (const result of results.results) {
  // The part of the ID before the colon is the site index
  console.log(`[${result.id.split(':')[0]}] ${result.title}`);
}
Fetch Full Content
const searchResults = await client.search('getting started');
if (searchResults.results.length > 0) {
  // Get the most relevant result
  const topResult = searchResults.results[0];
  // Fetch full content as Markdown
  const doc = await client.fetchDocumentContent(topResult.id);
  console.log(`# ${doc.title}\n`);
  console.log(doc.text);
  console.log(`\nSource: ${doc.url}`);
}
Using Semantic Search Directly
const client = new WebHelpSearchClient('https://docs.example.com');
// Use semantic search for better results on conceptual queries
let results = await client.semanticSearch(
  'how do I set up user authentication with OAuth2',
  'https://docs.example.com',
  5
);
if (results.error) {
  console.log('Semantic search not available, falling back to search()');
  // Reassign so the fallback results remain usable after this block
  results = await client.search('OAuth2 authentication');
}
Error Handling
// Wrapped in a function so that the early returns are valid
async function runSearch(): Promise<void> {
  try {
    const client = new WebHelpSearchClient('https://docs.example.com');
    const results = await client.search('query');
    if (results.error) {
      console.error('Search error:', results.error);
      return;
    }
    if (results.results.length === 0) {
      console.log('No results found');
      return;
    }
    // Process results
    for (const result of results.results) {
      console.log(result.title);
    }
  } catch (error) {
    // The constructor throws when no base URL is provided
    console.error('Client error:', error instanceof Error ? error.message : error);
  }
}
Design Decisions
Why Two Search Strategies?
The client implements both semantic and index-based search for optimal results:
Semantic Search provides better results for natural language queries and conceptual questions
Index Search is more reliable for keyword-based queries and works without external dependencies
Automatic Fallback ensures search always works, even if semantic search is unavailable
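The fallback decision can be isolated from the client so that it is easy to exercise with stubs. The sketch below is an illustration under assumptions: searchWithFallback and SearchFn are hypothetical names, and the rule "use the fallback when the primary search throws, reports an error, or returns no results" is taken from the search() excerpt shown earlier.

```typescript
interface SearchResult {
  error?: string;
  results: Array<{ id: string; title: string; url: string; score: number }>;
}

type SearchFn = (query: string) => Promise<SearchResult>;

// Hypothetical helper: try the primary strategy and use the fallback when the
// primary throws, reports an error, or comes back empty.
async function searchWithFallback(
  query: string,
  primary: SearchFn,
  fallback: SearchFn
): Promise<SearchResult> {
  try {
    const result = await primary(query);
    if (!result.error && result.results.length > 0) {
      return result;
    }
  } catch {
    // Ignore the failure and continue with the fallback strategy.
  }
  return fallback(query);
}
```

In the client's terms, primary would correspond to semantic search and fallback to index search; passing them in as functions keeps the decision logic free of network access.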
Multi-Site Support
Supporting multiple documentation sites in a single client enables:
Version Search - Search across multiple versions of documentation
Product Search - Search across related products
Unified Results - Merge and rank results from all sites
Markdown Conversion
Converting HTML to Markdown provides:
Clean Content - Removes styling and non-content elements
LLM-Friendly - Ideal format for language model consumption
Portable - Easy to process, display, and store