Overview

The WebHelpIndexLoader class handles the complex process of loading, parsing, and initializing Oxygen WebHelp search indexes. It creates an isolated JavaScript execution context to safely evaluate the WebHelp search engine code without polluting the global scope.
This class is used internally by WebHelpSearchClient and typically doesn’t need to be used directly.

Architecture

The loader follows a multi-stage initialization process:
  1. Download - Fetches search engine code and index files from the WebHelp site
  2. Parse - Extracts JavaScript variables from the downloaded files
  3. Initialize - Creates an isolated context and evaluates the search engine
  4. Ready - Provides a search interface for querying the index

File Structure

Oxygen WebHelp sites organize search files in a specific structure:
oxygen-webhelp/app/search/
├── nwSearchFnt.js           # Search engine code
├── index/                   # Index files (may be in root)
│   ├── index-1.js
│   ├── index-2.js
│   ├── ...
│   ├── stopwords.js
│   └── htmlFileInfoList.js
The loader automatically tries both the index/ subdirectory and the search root, so it works with WebHelp versions that lay out these files differently.

Class Interface

export class WebHelpIndexLoader {
  // Public methods
  async loadIndex(baseUrl: string): Promise<void>
  performSearch(query: string, callback: (result: any) => void): void
  getSearchContext(): any
  
  // Download methods
  async downloadSearchEngine(searchUrl: string): Promise<string>
  async downloadIndexParts(searchUrl: string): Promise<string[]>
  async downloadMetadataFiles(searchUrl: string): Promise<MetadataFiles>
  
  // Processing methods
  private setupSearchContext(): void
  private processStopwords(stopwordsContent: string): void
  private processLinkToParent(linkToParentContent: string): void
  private processKeywords(keywordsContent: string): void
  private processFileInfoList(htmlFileInfoListContent: string): void
  private processIndexParts(indexParts: string[]): void
  private initializeSearchEngine(nwSearchFntJs: string): void
  private parseJsonWithLogging<T>(json: string, context: string): T | null
}

Types

export interface SearchIndex {
  w: Record<string, any>;              // Word index
  fil: Record<string, any>;            // File information
  stopWords: string[];                 // Words to ignore
  link2parent: Record<string, any>;    // Parent links
}

Methods

loadIndex()

Loads and initializes the complete search index from a WebHelp documentation site.
baseUrl (string, required): The base URL of the WebHelp documentation site (e.g., https://docs.example.com)
const loader = new WebHelpIndexLoader();
await loader.loadIndex('https://docs.example.com');

Loading Process

The method performs these steps in sequence:
  1. Constructs the search URL: {baseUrl}/oxygen-webhelp/app/search
  2. Sets up an isolated search context
  3. Downloads the search engine code (nwSearchFnt.js)
  4. Downloads all index parts (index-1.js, index-2.js, …)
  5. Downloads metadata files (stopwords.js, htmlFileInfoList.js)
  6. Processes and merges all data into the search context
  7. Initializes the search engine
Source: webhelp-index-loader.ts:328-354
async loadIndex(baseUrl: string): Promise<void> {
  const searchUrl = `${baseUrl.replace(/\/$/, '')}/oxygen-webhelp/app/search`;
  this.baseUrl = searchUrl + '/';
  
  try {
    this.setupSearchContext();
    
    const nwSearchFntJs = await this.downloadSearchEngine(searchUrl);
    const indexParts = await this.downloadIndexParts(searchUrl);
    const metadataFiles = await this.downloadMetadataFiles(searchUrl);
    
    this.processStopwords(metadataFiles.stopwords);
    this.processFileInfoList(metadataFiles.htmlFileInfoList);
    this.processIndexParts(indexParts);
    
    this.initializeSearchEngine(nwSearchFntJs);
  } catch (error: any) {
    throw new Error(`Failed to load search index: ${error.message}`);
  }
}
If any step fails, the method throws an Error whose message starts with "Failed to load search index:" followed by the underlying cause. The index must be loaded successfully before calling performSearch().
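The URL construction in step 1 can be tried in isolation. The sketch below reuses the expression from loadIndex(); the function name buildSearchUrl is hypothetical and not part of the class. Note that the regular expression removes only a single trailing slash.

```typescript
// Hypothetical helper (not part of WebHelpIndexLoader): mirrors the URL
// expression used at the top of loadIndex().
function buildSearchUrl(baseUrl: string): string {
  // Remove one trailing slash, then append the search directory.
  return `${baseUrl.replace(/\/$/, '')}/oxygen-webhelp/app/search`;
}

console.log(buildSearchUrl('https://docs.example.com'));
console.log(buildSearchUrl('https://docs.example.com/'));
// Both calls print: https://docs.example.com/oxygen-webhelp/app/search
```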

performSearch()

Executes a search query against the loaded index using a callback pattern.
query (string, required): The search query string
callback (function, required): Callback function that receives the search results
const loader = new WebHelpIndexLoader();
await loader.loadIndex('https://docs.example.com');

loader.performSearch('authentication', (result) => {
  console.log('Found documents:', result.documents);
  result.documents.forEach(doc => {
    console.log(`${doc.title} - ${doc.relativePath}`);
  });
});
The search engine must be initialized (via loadIndex()) before calling this method. Otherwise, it throws an error.
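Callers that prefer async/await can wrap the callback API in a Promise. The following is a minimal sketch, not part of the class: searchAsync and the CallbackSearcher interface are hypothetical names, and the sketch assumes the callback fires at most once per call and that failures are reported through result.error, as the Basic Usage example on this page checks.

```typescript
// Minimal shape of the loader that this sketch relies on.
interface CallbackSearcher {
  performSearch(query: string, callback: (result: any) => void): void;
}

// Hypothetical wrapper: turns the callback API into a Promise.
function searchAsync(searcher: CallbackSearcher, query: string): Promise<any> {
  return new Promise((resolve, reject) => {
    // A synchronous throw from performSearch() (for example when the index
    // has not been loaded) rejects the promise automatically.
    searcher.performSearch(query, (result) => {
      if (result && result.error) {
        reject(new Error(String(result.error)));
      } else {
        resolve(result);
      }
    });
  });
}

// Usage, assuming `loader` has already completed loadIndex():
//   const result = await searchAsync(loader, 'authentication');
```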

getSearchContext()

Returns the internal search context object, useful for debugging or advanced usage.
const context = loader.getSearchContext();
console.log('Word index entries:', Object.keys(context.w).length);
console.log('Stopwords:', context.stopWords);

downloadSearchEngine()

Downloads the WebHelp search engine code.
searchUrl (string, required): The search directory URL
const code = await loader.downloadSearchEngine(
  'https://docs.example.com/oxygen-webhelp/app/search'
);

downloadIndexParts()

Downloads all index part files, trying both index/ subdirectory and root locations.
searchUrl (string, required): The search directory URL
const parts = await loader.downloadIndexParts(
  'https://docs.example.com/oxygen-webhelp/app/search'
);
console.log(`Loaded ${parts.length} index parts`);

Download Strategy

For each index file (1-10), the method:
  1. Tries {searchUrl}/index/index-{n}.js
  2. Falls back to {searchUrl}/index-{n}.js
  3. Stops when a file is not found
The method supports up to 10 index parts. Most documentation sites use 1-3 parts, but large sites may use more.
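The strategy above can be sketched as a loop with a fallback per part. This is an illustration under stated assumptions rather than the actual implementation: fetchIndexParts and the FetchText type are hypothetical, and the fetcher is assumed to resolve to null when a URL does not exist.

```typescript
// Fetcher abstraction (an assumption of this sketch): resolves to the file
// body, or to null when the URL does not exist.
type FetchText = (url: string) => Promise<string | null>;

const MAX_INDEX_PARTS = 10; // upper bound stated in this section

// Hypothetical stand-in for downloadIndexParts(); the real method may differ.
async function fetchIndexParts(searchUrl: string, fetchText: FetchText): Promise<string[]> {
  const parts: string[] = [];
  for (let n = 1; n <= MAX_INDEX_PARTS; n++) {
    // 1. Preferred location: the index/ subdirectory.
    let body = await fetchText(`${searchUrl}/index/index-${n}.js`);
    // 2. Fallback: the search directory itself.
    if (body === null) {
      body = await fetchText(`${searchUrl}/index-${n}.js`);
    }
    // 3. Neither location has part n, so assume there are no further parts.
    if (body === null) break;
    parts.push(body);
  }
  return parts;
}

// Example adapter using the global fetch available in Node.js 18+:
//   const fetchText: FetchText = async (url) => {
//     const res = await fetch(url);
//     return res.ok ? await res.text() : null;
//   };
```

Injecting the fetcher keeps the loop testable without a network connection.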

downloadMetadataFiles()

Downloads stopwords and file information metadata.
searchUrl (string, required): The search directory URL
const metadata = await loader.downloadMetadataFiles(
  'https://docs.example.com/oxygen-webhelp/app/search'
);

Implementation Details

Isolated Context

The loader creates an isolated JavaScript context to safely evaluate WebHelp code:
Source: webhelp-index-loader.ts:89-128
private setupSearchContext(): void {
  this.searchContext = {
    w: {},                    // Word index
    fil: {},                  // File information
    stopWords: [],            // Stopwords array
    linkToParent: {},         // Parent links
    indexerLanguage: 'en',    // Default language
    doStem: false,            // Stemming disabled by default
    stemmer: null,            // Stemmer instance
    
    // Utility functions for the search engine
    debug: function() {},
    warn: function() {},
    info: function() {},
    trim: function(str: string, chars?: string) { /*...*/ },
    contains: function(arrayOfWords: string[], word: string) { /*...*/ },
    inArray: function(needle: any, haystack: any[]) { /*...*/ }
  };
}
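Keeping all search state on this context object means the WebHelp code reads and writes its variables here rather than on the global scope, which prevents naming conflicts. The context is a namespace, not a security sandbox: code evaluated with eval() can still reach process-wide globals.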

JSON Parsing

WebHelp index files contain JavaScript variable declarations. The loader extracts and parses these:
Source: webhelp-index-loader.ts:24-34
private parseJsonWithLogging<T>(json: string, context: string): T | null {
  try {
    return JSON.parse(json) as T;
  } catch (e: any) {
    console.error(`Failed to parse JSON for ${context}: ${e.message}`);
    if (json) {
      console.error(`JSON snippet: ${json.substring(0, 500)}`);
    }
    return null;
  }
}
Example index file format:
var index1 = {
  "hello": [[0, 1, 2], [5, 3, 1]],
  "world": [[1, 2], [3, 1]]
};
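To show how a file in this format can be turned into data, here is a minimal sketch. The helper name extractAssignedJson is hypothetical, and the sketch assumes one declaration per file whose right-hand side is valid JSON; the loader's own extraction logic may differ.

```typescript
// Hypothetical helper: pulls the JSON literal out of "var name = <json>;".
function extractAssignedJson(source: string): unknown {
  const eq = source.indexOf('=');
  if (eq === -1) return null;
  // Drop everything up to "=", then surrounding whitespace and a trailing semicolon.
  const literal = source.slice(eq + 1).trim().replace(/;\s*$/, '');
  try {
    return JSON.parse(literal);
  } catch {
    return null; // not valid JSON (e.g. single quotes or trailing commas)
  }
}

const sampleIndexFile = `var index1 = {
  "hello": [[0, 1, 2], [5, 3, 1]],
  "world": [[1, 2], [3, 1]]
};`;

const wordEntries = extractAssignedJson(sampleIndexFile) as Record<string, number[][]>;
console.log(Object.keys(wordEntries)); // [ 'hello', 'world' ]
```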

Processing Pipeline

Each type of data file goes through a specific processing method:
Source: webhelp-index-loader.ts:130-141
private processStopwords(stopwordsContent: string): void {
  const jsonMatch = stopwordsContent.match(
    /var\s+stopwords\s*=\s*(\[[\s\S]*?\]);?\s*(?:\/\/.*)?$/
  );
  if (jsonMatch) {
    const parsed = this.parseJsonWithLogging<any[]>(jsonMatch[1], 'stopwords');
    if (parsed) {
      this.searchContext.stopwords = parsed;
      this.searchContext.stopWords = parsed;
    }
  }
}
Extracts words to ignore during search (e.g., “the”, “a”, “an”).
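The regular expression can be exercised on a sample file body. In this sketch the sample content is an assumption about what stopwords.js looks like, and removeStopWords is a hypothetical helper that only illustrates the effect of the list; the real search engine applies stopwords internally and may do so differently.

```typescript
// Same regular expression as in processStopwords() above.
const STOPWORDS_RE = /var\s+stopwords\s*=\s*(\[[\s\S]*?\]);?\s*(?:\/\/.*)?$/;

// Sample file body (assumed format).
const stopwordsJs = 'var stopwords = ["a", "an", "the"];\n';

const stopwordsMatch = stopwordsJs.match(STOPWORDS_RE);
const parsedStopWords: string[] = stopwordsMatch ? JSON.parse(stopwordsMatch[1]) : [];

// Hypothetical helper: drops ignored words from a query.
function removeStopWords(query: string, ignored: string[]): string[] {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0 && ignored.indexOf(word) === -1);
}

console.log(removeStopWords('The login of an admin', parsedStopWords));
// [ 'login', 'of', 'admin' ]
```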

Search Engine Initialization

The most complex part is safely evaluating the WebHelp search engine code:
Source: webhelp-index-loader.ts:239-326
private initializeSearchEngine(nwSearchFntJs: string): void {
  // Create sandboxed evaluation context
  const evalContext = (function(context: any) {
    const evalCode = `
      (function(context) {
        var w = context.w;
        var fil = context.fil;
        var stopWords = context.stopWords;
        // ... map other context variables
        
        ${nwSearchFntJs}  // Inject search engine code
        
        return nwSearchFnt;  // Return constructor
      })(arguments[0]);
    `;
    return eval(evalCode);
  })(this.searchContext);
  
  // Store constructor and create instance
  this.searchContext.nwSearchFnt = evalContext;
  this.searchContext.searchEngine = new this.searchContext.nwSearchFnt(
    this.searchContext.index,
    this.searchContext.options,
    this.searchContext.stemmer,
    this.searchContext.util
  );
}
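The search engine code is evaluated inside a function scope with explicit variable mapping, so its top-level declarations stay local instead of becoming globals. Note that eval() is not a security boundary: the evaluated code can still reach Node.js globals such as process, so only load indexes from WebHelp sites you trust.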
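The scoping technique can be demonstrated with a tiny stand-in engine. This sketch uses the Function constructor instead of eval() to keep the example short; the engine source, evaluateEngine, and the lookup method are invented for illustration and do not reflect the real nwSearchFnt.js API.

```typescript
// Stand-in for nwSearchFnt.js: it reads a free variable `w`, the way the
// real engine expects globals. The body is invented for this sketch.
const engineSource = `
  function nwSearchFnt() {
    this.lookup = function (word) { return w[word] || []; };
  }
`;

// Evaluate the source inside a function body so that `w` and `nwSearchFnt`
// are locals of that function rather than globals.
function evaluateEngine(source: string, context: { w: Record<string, number[]> }): any {
  const factory = new Function(
    'context',
    `var w = context.w;\n${source}\nreturn nwSearchFnt;`
  );
  return factory(context);
}

const EngineCtor = evaluateEngine(engineSource, { w: { hello: [0, 2] } });
const demoEngine = new EngineCtor();

console.log(demoEngine.lookup('hello'));                     // [ 0, 2 ]
// The constructor did not leak into the global scope:
console.log(new Function('return typeof nwSearchFnt')());    // undefined
```

Function scoping only controls where names are declared. The evaluated code still runs with the full privileges of the host process.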

Usage Examples

Basic Usage

import { WebHelpIndexLoader } from './webhelp-index-loader';

const loader = new WebHelpIndexLoader();

// Load the index
await loader.loadIndex('https://docs.example.com');

// Perform a search
loader.performSearch('authentication', (result) => {
  if (result.error) {
    console.error('Search error:', result.error);
    return;
  }
  
  console.log(`Found ${result.documents.length} documents`);
  result.documents.forEach(doc => {
    console.log(`${doc.title}`);
    console.log(`  Path: ${doc.relativePath}`);
    console.log(`  Score: ${doc.scoring}`);
  });
});

Inspecting the Index

const loader = new WebHelpIndexLoader();
await loader.loadIndex('https://docs.example.com');

const context = loader.getSearchContext();

console.log('Index Statistics:');
console.log(`  Words: ${Object.keys(context.w).length}`);
console.log(`  Files: ${Object.keys(context.fil).length}`);
console.log(`  Stopwords: ${context.stopWords.length}`);
console.log(`  Language: ${context.indexerLanguage}`);

Error Handling

try {
  const loader = new WebHelpIndexLoader();
  await loader.loadIndex('https://docs.example.com');
  
  loader.performSearch('query', (result) => {
    // Process results
  });
} catch (error) {
  // Under strict TypeScript settings `error` is `unknown`, so narrow it first.
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('Failed to load search index')) {
    console.error('Could not load the search index');
  } else if (message.includes('Search engine not initialized')) {
    console.error('Must call loadIndex() first');
  } else {
    console.error('Unexpected error:', error);
  }
}

Custom Download Locations

const loader = new WebHelpIndexLoader();

// The download methods can also be called directly, for example to check
// which files a site exposes. Each method already tries the locations it supports.
const searchUrl = 'https://docs.example.com/oxygen-webhelp/app/search';

try {
  const engine = await loader.downloadSearchEngine(searchUrl);
  const parts = await loader.downloadIndexParts(searchUrl);
  const metadata = await loader.downloadMetadataFiles(searchUrl);
  console.log(`Engine: ${engine.length} characters, index parts: ${parts.length}`);
  console.log('Metadata files:', Object.keys(metadata));
} catch (error) {
  console.error('Could not download from standard locations');
}

Design Decisions

Why Isolated Context?

The WebHelp search engine code expects specific global variables. Creating an isolated context:
  • Prevents Pollution - Doesn’t modify the Node.js global scope
  • Limits Exposure - Engine variables live on the context object, although eval() itself does not restrict access to system resources
  • Enables Reuse - Multiple loaders can coexist
  • Simplifies Debugging - All search state is contained

Why Callback Pattern?

The performSearch() method uses a callback because:
  • Compatibility - Matches the WebHelp search engine API
  • Flexibility - Caller controls result processing
  • Efficiency - No need to create intermediate result objects

Why Try Multiple Locations?

Different Oxygen WebHelp versions organize files differently:
  • Older Versions - Place index files in the root search/ directory
  • Newer Versions - Use an index/ subdirectory
  • Fallback Strategy - Ensures compatibility across versions

Performance Considerations

Index Size

Large documentation sites can have substantial indexes:
  • Small Site - 1 index file, ~50KB
  • Medium Site - 2-3 index files, ~200KB
  • Large Site - 5-10 index files, ~1MB+
Consider caching the loaded index in memory if performing multiple searches, rather than reloading it each time.
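One way to apply this advice is to keep one loaded instance per base URL. The sketch below is a suggestion, not part of the library: LoaderCache and the IndexLoader interface are hypothetical names, and URLs are compared as plain strings, so the same site with and without a trailing slash counts as two entries.

```typescript
// Minimal shape of a loader for this sketch.
interface IndexLoader {
  loadIndex(baseUrl: string): Promise<void>;
}

// Hypothetical cache: one loader per base URL, created on first use.
// Storing the promise (not the loader) lets concurrent callers share a
// single in-flight load instead of starting their own.
class LoaderCache<T extends IndexLoader> {
  private readonly entries = new Map<string, Promise<T>>();

  constructor(private readonly createLoader: () => T) {}

  get(baseUrl: string): Promise<T> {
    let entry = this.entries.get(baseUrl);
    if (!entry) {
      const loader = this.createLoader();
      entry = loader.loadIndex(baseUrl).then(() => loader);
      // Forget failed loads so that a later call can retry.
      entry.catch(() => this.entries.delete(baseUrl));
      this.entries.set(baseUrl, entry);
    }
    return entry;
  }
}

// Usage, assuming WebHelpIndexLoader is imported:
//   const cache = new LoaderCache(() => new WebHelpIndexLoader());
//   const loader = await cache.get('https://docs.example.com');
```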

Network Requests

The loader makes multiple HTTP requests:
  • 1 request for search engine code
  • 1-10 requests for index parts
  • 2 requests for metadata files
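Total: 4-13 successful requests per load. Failed fallback attempts and the probe that detects the end of the index parts add to this count.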
// Preload for better performance
const loader = new WebHelpIndexLoader();
await loader.loadIndex(baseUrl);

// Searches now run against the in-memory index, with no further downloads
for (const query of queries) {
  loader.performSearch(query, processResults);
}

WebHelpSearchClient

High-level search client that uses this loader

URL Encoding

URL compression utilities