WebHelpIndexLoader

Overview

The WebHelpIndexLoader class downloads, parses, and initializes Oxygen WebHelp search indexes. It evaluates the WebHelp search engine code inside an isolated function scope so that the engine's variables do not pollute the global scope.

This class is used internally by WebHelpSearchClient, so you typically don't need to use it directly.

Architecture

The loader follows a multi-stage initialization process:

1. Download - Fetches search engine code and index files from the WebHelp site
2. Parse - Extracts JavaScript variables from the downloaded files
3. Initialize - Creates an isolated context and evaluates the search engine
4. Ready - Provides a search interface for querying the index

File Structure

Oxygen WebHelp sites organize search files in a specific structure:

oxygen-webhelp/app/search/
├── nwSearchFnt.js           # Search engine code
├── index/                   # Index files (may be in root)
│   ├── index-1.js
│   ├── index-2.js
│   ├── ...
│   ├── stopwords.js
│   └── htmlFileInfoList.js

The loader automatically tries both the index/ subdirectory and the root search/ directory, so it works with WebHelp versions that lay out these files differently.
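The two-location lookup can be pictured as a small helper that lists the URLs to try for a given file. This is only a sketch: the function name `candidateUrls` is hypothetical and not part of the class, and the order (index/ first, then the search root) follows the download strategy described later on this page.

```typescript
// Hypothetical helper (not part of WebHelpIndexLoader): lists the URLs
// to try for one search file, in the order they would be attempted.
function candidateUrls(searchUrl: string, fileName: string): string[] {
  const base = searchUrl.replace(/\/+$/, ''); // drop any trailing slashes
  return [
    `${base}/index/${fileName}`, // layout with an index/ subdirectory
    `${base}/${fileName}`,       // layout with files directly in search/
  ];
}

const urls = candidateUrls(
  'https://docs.example.com/oxygen-webhelp/app/search',
  'index-1.js'
);
console.log(urls);
```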

Class Interface

export class WebHelpIndexLoader {
  // Public methods
  async loadIndex(baseUrl: string): Promise<void>
  performSearch(query: string, callback: (result: any) => void): void
  getSearchContext(): any
  
  // Download methods
  async downloadSearchEngine(searchUrl: string): Promise<string>
  async downloadIndexParts(searchUrl: string): Promise<string[]>
  async downloadMetadataFiles(searchUrl: string): Promise<MetadataFiles>
  
  // Processing methods
  private setupSearchContext(): void
  private processStopwords(stopwordsContent: string): void
  private processLinkToParent(linkToParentContent: string): void
  private processKeywords(keywordsContent: string): void
  private processFileInfoList(htmlFileInfoListContent: string): void
  private processIndexParts(indexParts: string[]): void
  private initializeSearchEngine(nwSearchFntJs: string): void
  private parseJsonWithLogging<T>(json: string, context: string): T | null
}

Types

export interface SearchIndex {
  w: Record<string, any>;              // Word index
  fil: Record<string, any>;            // File information
  stopWords: string[];                 // Words to ignore
  link2parent: Record<string, any>;    // Parent links
}

Methods

loadIndex()

Loads and initializes the complete search index from a WebHelp documentation site.

baseUrl (string, required) - The base URL of the WebHelp documentation site (e.g., https://docs.example.com)

const loader = new WebHelpIndexLoader();
await loader.loadIndex('https://docs.example.com');

Loading Process

The method performs these steps in sequence:

1. Constructs the search URL: {baseUrl}/oxygen-webhelp/app/search
2. Sets up an isolated search context
3. Downloads the search engine code (nwSearchFnt.js)
4. Downloads all index parts (index-1.js, index-2.js, …)
5. Downloads metadata files (stopwords.js, htmlFileInfoList.js)
6. Processes and merges all data into the search context
7. Initializes the search engine

Source: webhelp-index-loader.ts:328-354

async loadIndex(baseUrl: string): Promise<void> {
  const searchUrl = `${baseUrl.replace(/\/$/, '')}/oxygen-webhelp/app/search`;
  this.baseUrl = searchUrl + '/';
  
  try {
    this.setupSearchContext();
    
    const nwSearchFntJs = await this.downloadSearchEngine(searchUrl);
    const indexParts = await this.downloadIndexParts(searchUrl);
    const metadataFiles = await this.downloadMetadataFiles(searchUrl);
    
    this.processStopwords(metadataFiles.stopwords);
    this.processFileInfoList(metadataFiles.htmlFileInfoList);
    this.processIndexParts(indexParts);
    
    this.initializeSearchEngine(nwSearchFntJs);
  } catch (error: any) {
    throw new Error(`Failed to load search index: ${error.message}`);
  }
}

If any step fails, the method throws an Error whose message starts with "Failed to load search index:" followed by the underlying cause. The index must load successfully before you call performSearch().
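One detail worth isolating is the URL construction in the first line of loadIndex(): a single trailing slash on the base URL is removed before the search path is appended. The sketch below copies that expression into a standalone function; the name `buildSearchUrl` is hypothetical.

```typescript
// Mirrors the URL construction shown in loadIndex().
// The function name is hypothetical; the expression is taken from the excerpt.
function buildSearchUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/$/, '')}/oxygen-webhelp/app/search`;
}

// Both forms of the base URL produce the same search URL.
console.log(buildSearchUrl('https://docs.example.com'));
console.log(buildSearchUrl('https://docs.example.com/'));
```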

performSearch()

Executes a search query against the loaded index using a callback pattern.

query (string, required) - The search query string
callback (function, required) - Callback function that receives the search results

const loader = new WebHelpIndexLoader();
await loader.loadIndex('https://docs.example.com');

loader.performSearch('authentication', (result) => {
  console.log('Found documents:', result.documents);
  result.documents.forEach((doc: any) => {
    console.log(`${doc.title} - ${doc.relativePath}`);
  });
});

Call loadIndex() before this method. If the search engine has not been initialized, performSearch() throws an error.
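If you prefer async/await over callbacks, the callback can be wrapped in a Promise. The following is a minimal sketch, not part of the class: `SearchCapable` and `searchAsync` are hypothetical names, and the `error` field on the result is assumed from the usage examples on this page.

```typescript
// Minimal shape this sketch needs; only performSearch() comes from the class interface.
interface SearchCapable {
  performSearch(query: string, callback: (result: any) => void): void;
}

// Hypothetical wrapper: resolves with the callback's result, and rejects if
// performSearch() throws or the result carries an `error` field.
function searchAsync(loader: SearchCapable, query: string): Promise<any> {
  return new Promise((resolve, reject) => {
    try {
      loader.performSearch(query, (result) => {
        if (result && result.error) {
          reject(new Error(String(result.error)));
        } else {
          resolve(result);
        }
      });
    } catch (err) {
      reject(err); // e.g. the search engine was never initialized
    }
  });
}
```

With a loaded WebHelpIndexLoader instance, a call would look like `const result = await searchAsync(loader, 'authentication');`.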

getSearchContext()

Returns the internal search context object, useful for debugging or advanced usage.

const context = loader.getSearchContext();
console.log('Word index entries:', Object.keys(context.w).length);
console.log('Stopwords:', context.stopWords);

downloadSearchEngine()

Downloads the WebHelp search engine code.

searchUrl (string, required) - The search directory URL

const code = await loader.downloadSearchEngine(
  'https://docs.example.com/oxygen-webhelp/app/search'
);

downloadIndexParts()

Downloads all index part files, trying both the index/ subdirectory and the root search directory.

searchUrl (string, required) - The search directory URL

const parts = await loader.downloadIndexParts(
  'https://docs.example.com/oxygen-webhelp/app/search'
);
console.log(`Loaded ${parts.length} index parts`);

Download Strategy

For each index part number n (1-10), the method:

1. Tries {searchUrl}/index/index-{n}.js
2. Falls back to {searchUrl}/index-{n}.js
3. Stops at the first part number that is found in neither location

The method supports up to 10 index parts. Most documentation sites use 1-3 parts, but large sites may use more.
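The strategy above can be sketched as a loop over part numbers with an injected fetch function. This is an illustration under stated assumptions rather than the loader's actual code: `FetchText` and `downloadParts` are hypothetical names, and the fetcher is assumed to return null when a file is not found.

```typescript
// Hypothetical fetcher type: resolves with the file body, or null when the file is not found.
type FetchText = (url: string) => Promise<string | null>;

const MAX_INDEX_PARTS = 10; // limit stated in the text above

async function downloadParts(searchUrl: string, fetchText: FetchText): Promise<string[]> {
  const parts: string[] = [];
  for (let n = 1; n <= MAX_INDEX_PARTS; n++) {
    // Try the index/ subdirectory first, then the search root.
    const body =
      (await fetchText(`${searchUrl}/index/index-${n}.js`)) ??
      (await fetchText(`${searchUrl}/index-${n}.js`));
    if (body === null) break; // part n is in neither location: stop probing
    parts.push(body);
  }
  return parts;
}
```

Injecting the fetcher keeps the loop testable without network access. Note that each missing part costs two failed requests before the loop stops.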

downloadMetadataFiles()

Downloads stopwords and file information metadata.

searchUrl (string, required) - The search directory URL

const metadata = await loader.downloadMetadataFiles(
  'https://docs.example.com/oxygen-webhelp/app/search'
);

Implementation Details

Isolated Context

The loader creates an isolated JavaScript context to safely evaluate WebHelp code:

Source: webhelp-index-loader.ts:89-128

private setupSearchContext(): void {
  this.searchContext = {
    w: {},                    // Word index
    fil: {},                  // File information
    stopWords: [],            // Stopwords array
    linkToParent: {},         // Parent links
    indexerLanguage: 'en',    // Default language
    doStem: false,            // Stemming disabled by default
    stemmer: null,            // Stemmer instance
    
    // Utility functions for the search engine
    debug: function() {},
    warn: function() {},
    info: function() {},
    trim: function(str: string, chars?: string) { /*...*/ },
    contains: function(arrayOfWords: string[], word: string) { /*...*/ },
    inArray: function(needle: any, haystack: any[]) { /*...*/ }
  };
}

The isolated context keeps the search engine's variables and helper functions on a single object instead of the global scope, which prevents naming conflicts and keeps all search state in one place.

JSON Parsing

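WebHelp index files are JavaScript files that assign a JSON-compatible literal to a variable. The loader extracts the literal with a regular expression and parses it with JSON.parse, logging a snippet of the input when parsing fails: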

Source: webhelp-index-loader.ts:24-34

private parseJsonWithLogging<T>(json: string, context: string): T | null {
  try {
    return JSON.parse(json) as T;
  } catch (e: any) {
    console.error(`Failed to parse JSON for ${context}: ${e.message}`);
    if (json) {
      console.error(`JSON snippet: ${json.substring(0, 500)}`);
    }
    return null;
  }
}

Example index file format:

var index1 = {
  "hello": [[0, 1, 2], [5, 3, 1]],
  "world": [[1, 2], [3, 1]]
};
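To make the extract-then-parse step concrete, here is a small standalone sketch that handles the sample file above. The helper name `extractIndex` is hypothetical, and the regular expression is a variation on the ones shown on this page (greedy and anchored at the end of the file), not the loader's own pattern. It assumes the literal is strict JSON with double-quoted keys and no trailing commas.

```typescript
// Sample file body in the format shown above.
const sample = `var index1 = {
  "hello": [[0, 1, 2], [5, 3, 1]],
  "world": [[1, 2], [3, 1]]
};`;

// Hypothetical helper: pulls the object literal out of "var indexN = {...};"
// and parses it. Returns null when the pattern or the JSON does not match.
function extractIndex(source: string): { name: string; data: Record<string, unknown> } | null {
  const match = source.match(/var\s+(index\d+)\s*=\s*(\{[\s\S]*\})\s*;?\s*$/);
  if (!match) return null;
  try {
    return { name: match[1], data: JSON.parse(match[2]) };
  } catch {
    return null; // the literal was not valid JSON
  }
}

const parsed = extractIndex(sample);
console.log(parsed?.name, Object.keys(parsed?.data ?? {}));
```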

Processing Pipeline

Each type of data file goes through a specific processing method:

The three excerpts below cover stopwords, file info, and index parts, in that order.

Source: webhelp-index-loader.ts:130-141

private processStopwords(stopwordsContent: string): void {
  const jsonMatch = stopwordsContent.match(
    /var\s+stopwords\s*=\s*(\[[\s\S]*?\]);?\s*(?:\/\/.*)?$/
  );
  if (jsonMatch) {
    const parsed = this.parseJsonWithLogging<any[]>(jsonMatch[1], 'stopwords');
    if (parsed) {
      this.searchContext.stopwords = parsed;
      this.searchContext.stopWords = parsed;
    }
  }
}

Extracts words to ignore during search (e.g., “the”, “a”, “an”).

Source: webhelp-index-loader.ts:183-210

private processFileInfoList(htmlFileInfoListContent: string): void {
  // Extract htmlFileInfoList array
  const htmlFileInfoListMatch = htmlFileInfoListContent.match(
    /var\s+htmlFileInfoList\s*=\s*(\[[\s\S]*?\]);/
  );
  
  // Extract fil object
  const filMatch = htmlFileInfoListContent.match(
    /var\s+fil\s*=\s*(\{[\s\S]*?\});/
  );
  
  // Build fil object from array if needed
  if (this.searchContext.htmlFileInfoList && !this.searchContext.fil) {
    this.searchContext.fil = {};
    this.searchContext.htmlFileInfoList.forEach((item: any, index: number) => {
      this.searchContext.fil[index.toString()] = item;
    });
  }
}

Maps file IDs to their metadata (title, path, etc.).
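The fallback branch in the excerpt builds the fil map from the htmlFileInfoList array, using each entry's position as its file ID. A minimal sketch of that step in isolation follows; `buildFil` is a hypothetical name and the sample entries are invented for illustration.

```typescript
// Builds an id -> entry map keyed by array position, as in the fallback branch above.
// Generic over the entry type because the real entry shape is not shown on this page.
function buildFil<T>(list: T[]): Record<string, T> {
  const fil: Record<string, T> = {};
  list.forEach((item, index) => {
    fil[index.toString()] = item;
  });
  return fil;
}

// Invented sample entries, for illustration only.
const fil = buildFil(['intro.html', 'install.html']);
console.log(fil['1']); // install.html
```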

Source: webhelp-index-loader.ts:212-237

private processIndexParts(indexParts: string[]): void {
  // Parse each index file
  indexParts.forEach((part, idx) => {
    const indexMatch = part.match(/var\s+index(\d+)\s*=\s*(\{[\s\S]*?\});?/);
    if (indexMatch) {
      const indexNum = indexMatch[1];
      const parsed = this.parseJsonWithLogging(indexMatch[2], `index${indexNum}`);
      if (parsed) {
        this.searchContext[`index${indexNum}`] = parsed;
      }
    }
  });
  
  // Merge all indexes into a single word index
  const allWords: Record<string, any> = {};
  for (let i = 1; i <= indexParts.length; i++) {
    Object.assign(allWords, this.searchContext[`index${i}`]);
  }
  this.searchContext.w = allWords;
}

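Merges multiple index files into a unified word index. Because the merge uses Object.assign, a word that appears in more than one part keeps the entry from the later part.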
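The merge behavior is easy to check in isolation. The sketch below uses a hypothetical `mergeIndexParts` function with invented sample data; it relies only on the documented behavior of Object.assign, which copies properties left to right and ignores undefined sources.

```typescript
type WordIndex = Record<string, unknown>;

// Merges index parts left to right. A word present in several parts
// ends up with the value from the last part that contains it.
function mergeIndexParts(parts: Array<WordIndex | undefined>): WordIndex {
  const merged: WordIndex = {};
  for (const part of parts) {
    Object.assign(merged, part); // undefined parts are ignored by Object.assign
  }
  return merged;
}

const merged = mergeIndexParts([{ hello: 1 }, undefined, { world: 2 }, { hello: 3 }]);
console.log(merged); // { hello: 3, world: 2 }
```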

Search Engine Initialization

The most complex part is safely evaluating the WebHelp search engine code:

Source: webhelp-index-loader.ts:239-326

private initializeSearchEngine(nwSearchFntJs: string): void {
  // Create sandboxed evaluation context
  const evalContext = (function(context: any) {
    const evalCode = `
      (function(context) {
        var w = context.w;
        var fil = context.fil;
        var stopWords = context.stopWords;
        // ... map other context variables
        
        ${nwSearchFntJs}  // Inject search engine code
        
        return nwSearchFnt;  // Return constructor
      })(arguments[0]);
    `;
    return eval(evalCode);
  })(this.searchContext);
  
  // Store constructor and create instance
  this.searchContext.nwSearchFnt = evalContext;
  this.searchContext.searchEngine = new this.searchContext.nwSearchFnt(
    this.searchContext.index,
    this.searchContext.options,
    this.searchContext.stemmer,
    this.searchContext.util
  );
}

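The search engine code is evaluated inside a function scope with explicit variable mapping, so its declarations stay local and do not leak into the global scope. A function scope is not a security boundary, though: code run through eval can still reach Node.js globals, so load search engines only from WebHelp sites you trust.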
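The scoping idea can be demonstrated with a tiny stand-in engine. This sketch swaps in the Function constructor instead of eval, which is a different technique from the excerpt above but shows the same effect: mapped variables and the engine's declarations stay inside the wrapper function. `engineSource` and `loadEngine` are hypothetical names, and the stand-in engine is invented for illustration.

```typescript
// Stand-in for downloaded engine code (invented; the real nwSearchFnt.js is far larger).
const engineSource = `
  function nwSearchFnt() {
    this.wordCount = Object.keys(w).length; // reads the mapped local "w"
  }
`;

// Wraps the engine source in a function body so its declarations stay local.
// This scopes names only; it is not a security sandbox.
function loadEngine(
  source: string,
  context: { w: Record<string, unknown> }
): new () => { wordCount: number } {
  const factory = new Function('context', `
    var w = context.w;
    ${source}
    return nwSearchFnt;
  `);
  return factory(context);
}

const Engine = loadEngine(engineSource, { w: { hello: [], world: [] } });
const engine = new Engine();
console.log(engine.wordCount); // 2
```

After this runs, neither `w` nor `nwSearchFnt` exists on the global object, which is the property the loader relies on when several loaders coexist.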

Usage Examples

Basic Usage

import { WebHelpIndexLoader } from './webhelp-index-loader';

const loader = new WebHelpIndexLoader();

// Load the index
await loader.loadIndex('https://docs.example.com');

// Perform a search
loader.performSearch('authentication', (result) => {
  if (result.error) {
    console.error('Search error:', result.error);
    return;
  }
  
  console.log(`Found ${result.documents.length} documents`);
  result.documents.forEach((doc: any) => {
    console.log(`${doc.title}`);
    console.log(`  Path: ${doc.relativePath}`);
    console.log(`  Score: ${doc.scoring}`);
  });
});

Inspecting the Index

const loader = new WebHelpIndexLoader();
await loader.loadIndex('https://docs.example.com');

const context = loader.getSearchContext();

console.log('Index Statistics:');
console.log(`  Words: ${Object.keys(context.w).length}`);
console.log(`  Files: ${Object.keys(context.fil).length}`);
console.log(`  Stopwords: ${context.stopWords.length}`);
console.log(`  Language: ${context.indexerLanguage}`);

Error Handling

try {
  const loader = new WebHelpIndexLoader();
  await loader.loadIndex('https://docs.example.com');
  
  loader.performSearch('query', (result) => {
    // Process results
  });
} catch (error) {
  // Catch variables are `unknown` under strict TypeScript, so narrow before reading .message
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('Failed to load search index')) {
    console.error('Could not download or initialize the index');
  } else if (message.includes('Search engine not initialized')) {
    console.error('Must call loadIndex() first');
  } else {
    console.error('Unexpected error:', error);
  }
}

Custom Download Locations

const loader = new WebHelpIndexLoader();

// The download methods take the search directory URL, not the site base URL
const searchUrl = 'https://docs.example.com/oxygen-webhelp/app/search';

// Call the download methods directly to check which files are reachable.
// downloadIndexParts() already tries both index/ and the search root.
try {
  const engine = await loader.downloadSearchEngine(searchUrl);
  const parts = await loader.downloadIndexParts(searchUrl);
  const metadata = await loader.downloadMetadataFiles(searchUrl);
  console.log(`Engine: ${engine.length} chars, index parts: ${parts.length}`);
} catch (error) {
  console.error('Could not download from standard locations');
}

Design Decisions

Why Isolated Context?

The WebHelp search engine code expects specific global variables. Creating an isolated context:

Prevents Pollution - Doesn’t modify the Node.js global scope
Limits Exposure - Keeps engine variables inside a function scope (scoping only, not a hard security boundary)
Enables Reuse - Multiple loaders can coexist
Simplifies Debugging - All search state is contained

Why Callback Pattern?

The performSearch() method uses a callback because:

Compatibility - Matches the WebHelp search engine API
Flexibility - Caller controls result processing
Efficiency - No need to create intermediate result objects

Why Try Multiple Locations?

Different Oxygen WebHelp versions organize files differently:

Older Versions - Place index files in the root search/ directory
Newer Versions - Use an index/ subdirectory
Fallback Strategy - Ensures compatibility across versions

Performance Considerations

Index Size

Large documentation sites can have substantial indexes:

Small Site - 1 index file, ~50KB
Medium Site - 2-3 index files, ~200KB
Large Site - 5-10 index files, ~1MB+

Consider caching the loaded index in memory if performing multiple searches, rather than reloading it each time.
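One way to apply this advice is to memoize loads per base URL so that concurrent and repeated callers share a single load. The sketch below is an assumption-laden illustration: `createLoaderCache` is a hypothetical helper, and it is generic so that it does not depend on the loader class itself.

```typescript
// Hypothetical cache: one in-flight or completed load per base URL.
function createLoaderCache<T>(load: (baseUrl: string) => Promise<T>) {
  const cache = new Map<string, Promise<T>>();
  return (baseUrl: string): Promise<T> => {
    let entry = cache.get(baseUrl);
    if (!entry) {
      entry = load(baseUrl).catch((err) => {
        cache.delete(baseUrl); // do not keep failed loads, so a retry is possible
        throw err;
      });
      cache.set(baseUrl, entry);
    }
    return entry;
  };
}
```

With this class, the `load` argument could create a WebHelpIndexLoader, await loadIndex(baseUrl), and return the loader. Storing the promise rather than the resolved value means two callers that arrive during the download still trigger only one load.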

Network Requests

The loader makes multiple HTTP requests:

1 request for search engine code
1-10 requests for index parts
2 requests for metadata files

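Total: 4-13 successful requests per load. Fallback attempts and the final not-found probe for index parts add a few failed requests on top.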

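// Load once up front
const baseUrl = 'https://docs.example.com';
const queries = ['authentication', 'installation'];
const processResults = (result: any) => {
  console.log(`Found ${result.documents.length} documents`);
};

const loader = new WebHelpIndexLoader();
await loader.loadIndex(baseUrl);

// Later searches reuse the in-memory index and trigger no downloads
for (const query of queries) {
  loader.performSearch(query, processResults);
}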

Related

WebHelpSearchClient - High-level search client that uses this loader
URL Encoding - URL compression utilities
