Overview
The WebHelpIndexLoader class handles the complex process of loading, parsing, and initializing Oxygen WebHelp search indexes. It creates an isolated JavaScript execution context to safely evaluate the WebHelp search engine code without polluting the global scope.
This class is used internally by WebHelpSearchClient and typically doesn’t need to be used directly.
Architecture
The loader follows a multi-stage initialization process:
Download - Fetches search engine code and index files from the WebHelp site
Parse - Extracts JavaScript variables from the downloaded files
Initialize - Creates an isolated context and evaluates the search engine
Ready - Provides a search interface for querying the index
File Structure
Oxygen WebHelp sites organize search files in a specific structure:
oxygen-webhelp/app/search/
├── nwSearchFnt.js # Search engine code
├── index/ # Index files (may be in root)
│ ├── index-1.js
│ ├── index-2.js
│ ├── ...
│ ├── stopwords.js
│ └── htmlFileInfoList.js
The loader automatically tries both index/ subdirectory and root locations for maximum compatibility with different WebHelp versions.
Class Interface
export class WebHelpIndexLoader {
  // Public methods
  async loadIndex(baseUrl: string): Promise<void>
  performSearch(query: string, callback: (result: any) => void): void
  getSearchContext(): any
  // Download methods
  async downloadSearchEngine(searchUrl: string): Promise<string>
  async downloadIndexParts(searchUrl: string): Promise<string[]>
  async downloadMetadataFiles(searchUrl: string): Promise<MetadataFiles>
  // Processing methods
  private setupSearchContext(): void
  private processStopwords(stopwordsContent: string): void
  private processLinkToParent(linkToParentContent: string): void
  private processKeywords(keywordsContent: string): void
  private processFileInfoList(htmlFileInfoListContent: string): void
  private processIndexParts(indexParts: string[]): void
  private initializeSearchEngine(nwSearchFntJs: string): void
  private parseJsonWithLogging<T>(json: string, context: string): T | null
}
Types
SearchIndex
export interface SearchIndex {
  w: Record<string, any>;           // Word index
  fil: Record<string, any>;         // File information
  stopWords: string[];              // Words to ignore
  link2parent: Record<string, any>; // Parent links
}
Methods
loadIndex()
Loads and initializes the complete search index from a WebHelp documentation site.
baseUrl (string) - The base URL of the WebHelp documentation site (e.g., https://docs.example.com)
const loader = new WebHelpIndexLoader();
await loader.loadIndex('https://docs.example.com');
Loading Process
The method performs these steps in sequence:
Constructs the search URL: {baseUrl}/oxygen-webhelp/app/search
Sets up an isolated search context
Downloads the search engine code (nwSearchFnt.js)
Downloads all index parts (index-1.js, index-2.js, …)
Downloads metadata files (stopwords.js, htmlFileInfoList.js)
Processes and merges all data into the search context
Initializes the search engine
Source: webhelp-index-loader.ts:328-354
async loadIndex(baseUrl: string): Promise<void> {
  const searchUrl = `${baseUrl.replace(/\/$/, '')}/oxygen-webhelp/app/search`;
  this.baseUrl = searchUrl + '/';
  try {
    this.setupSearchContext();
    const nwSearchFntJs = await this.downloadSearchEngine(searchUrl);
    const indexParts = await this.downloadIndexParts(searchUrl);
    const metadataFiles = await this.downloadMetadataFiles(searchUrl);
    this.processStopwords(metadataFiles.stopwords);
    this.processFileInfoList(metadataFiles.htmlFileInfoList);
    this.processIndexParts(indexParts);
    this.initializeSearchEngine(nwSearchFntJs);
  } catch (error: any) {
    throw new Error(`Failed to load search index: ${error.message}`);
  }
}
If any step fails, the method throws an error with a descriptive message. The index must be successfully loaded before calling performSearch().
performSearch()
Executes a search query against the loaded index using a callback pattern.
query (string) - The search query
callback - Callback function that receives the search results
const loader = new WebHelpIndexLoader();
await loader.loadIndex('https://docs.example.com');
loader.performSearch('authentication', (result) => {
  console.log('Found documents:', result.documents);
  result.documents.forEach(doc => {
    console.log(`${doc.title} - ${doc.relativePath}`);
  });
});
The search engine must be initialized (via loadIndex()) before calling this method. Otherwise, it throws an error.
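Callers who prefer async/await can wrap the callback API in a Promise. The sketch below is illustrative only: `SearchCapable` and `searchAsync` are hypothetical names that do not appear in the loader, and the result is left as `any` because its shape is not specified here.

```typescript
// Hypothetical minimal shape: only the documented performSearch() signature is assumed.
interface SearchCapable {
  performSearch(query: string, callback: (result: any) => void): void;
}

// Illustrative helper (not part of WebHelpIndexLoader): adapts the callback to a Promise.
function searchAsync(loader: SearchCapable, query: string): Promise<any> {
  return new Promise((resolve) => {
    // If performSearch() throws synchronously (for example, because the engine
    // is not initialized), the Promise executor turns the throw into a rejection.
    loader.performSearch(query, resolve);
  });
}
```

With a loaded `WebHelpIndexLoader`, this would be used as `const result = await searchAsync(loader, 'authentication');`.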
getSearchContext()
Returns the internal search context object, useful for debugging or advanced usage.
const context = loader.getSearchContext();
console.log('Word index entries:', Object.keys(context.w).length);
console.log('Stopwords:', context.stopWords);
downloadSearchEngine()
Downloads the WebHelp search engine code.
const code = await loader.downloadSearchEngine(
  'https://docs.example.com/oxygen-webhelp/app/search'
);
downloadIndexParts()
Downloads all index part files, trying both index/ subdirectory and root locations.
const parts = await loader.downloadIndexParts(
  'https://docs.example.com/oxygen-webhelp/app/search'
);
console.log(`Loaded ${parts.length} index parts`);
Download Strategy
For each index file (1-10), the method:
Tries {searchUrl}/index/index-{n}.js
Falls back to {searchUrl}/index-{n}.js
Stops when a file is not found
The method supports up to 10 index parts. Most documentation sites use 1-3 parts, but large sites may use more.
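The download strategy above can be sketched as a loop over part numbers with a two-location fallback. This is a minimal sketch under stated assumptions, not the loader's actual code: `FetchText`, `MAX_INDEX_PARTS`, and `downloadIndexPartsSketch` are hypothetical names, and the HTTP layer is injected as a function that resolves to `null` when a file is not found.

```typescript
// Hypothetical fetch abstraction: resolves to the file text, or null when not found.
type FetchText = (url: string) => Promise<string | null>;

// Upper bound taken from the description above.
const MAX_INDEX_PARTS = 10;

async function downloadIndexPartsSketch(
  searchUrl: string,
  fetchText: FetchText
): Promise<string[]> {
  const parts: string[] = [];
  for (let n = 1; n <= MAX_INDEX_PARTS; n++) {
    // Prefer the index/ subdirectory, then fall back to the search root.
    const candidates = [
      `${searchUrl}/index/index-${n}.js`,
      `${searchUrl}/index-${n}.js`,
    ];
    let content: string | null = null;
    for (const url of candidates) {
      content = await fetchText(url);
      if (content !== null) break;
    }
    // The first part missing from both locations ends the scan.
    if (content === null) break;
    parts.push(content);
  }
  return parts;
}
```

Injecting the fetch function keeps the sketch independent of any particular HTTP client and makes the fallback order easy to exercise with a fake.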
downloadMetadataFiles()
Downloads stopwords and file information metadata.
const metadata = await loader.downloadMetadataFiles(
  'https://docs.example.com/oxygen-webhelp/app/search'
);
Implementation Details
Isolated Context
The loader creates an isolated JavaScript context to safely evaluate WebHelp code:
Source: webhelp-index-loader.ts:89-128
private setupSearchContext(): void {
  this.searchContext = {
    w: {},                 // Word index
    fil: {},               // File information
    stopWords: [],         // Stopwords array
    linkToParent: {},      // Parent links
    indexerLanguage: 'en', // Default language
    doStem: false,         // Stemming disabled by default
    stemmer: null,         // Stemmer instance
    // Utility functions for the search engine
    debug: function () {},
    warn: function () {},
    info: function () {},
    trim: function (str: string, chars?: string) { /*...*/ },
    contains: function (arrayOfWords: string[], word: string) { /*...*/ },
    inArray: function (needle: any, haystack: any[]) { /*...*/ }
  };
}
Keeping all engine state in this one context object means the WebHelp code reads and writes these properties instead of global variables, which avoids naming conflicts with the host application.
JSON Parsing
WebHelp index files contain JavaScript variable declarations. The loader extracts and parses these:
Source: webhelp-index-loader.ts:24-34
private parseJsonWithLogging<T>(json: string, context: string): T | null {
  try {
    return JSON.parse(json) as T;
  } catch (e: any) {
    console.error(`Failed to parse JSON for ${context}: ${e.message}`);
    if (json) {
      console.error(`JSON snippet: ${json.substring(0, 500)}`);
    }
    return null;
  }
}
Example index file format:
var index1 = {
  "hello": [[0, 1, 2], [5, 3, 1]],
  "world": [[1, 2], [3, 1]]
};
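As a rough illustration of how such a file can be turned into data, the sketch below pulls the object literal out of the declaration and hands it to `JSON.parse`. `parseIndexPart` is a hypothetical helper, and its regular expression is an assumption: it greedily matches up to the last closing brace in the file, which is not necessarily how the loader itself matches.

```typescript
// Illustrative helper (not from the loader): extracts the object literal from a
// `var indexN = {...};` declaration and parses it as JSON.
function parseIndexPart(
  source: string
): { name: string; words: Record<string, unknown> } | null {
  // Greedy match up to the last closing brace, then an optional semicolon.
  const match = source.match(/var\s+(index\d+)\s*=\s*(\{[\s\S]*\})\s*;?\s*$/);
  if (!match) return null;
  try {
    return { name: match[1], words: JSON.parse(match[2]) };
  } catch {
    // The literal was not valid JSON (for example, unquoted keys).
    return null;
  }
}

const sampleIndexFile = `var index1 = {
  "hello": [[0, 1, 2], [5, 3, 1]],
  "world": [[1, 2], [3, 1]]
};`;

const parsedPart = parseIndexPart(sampleIndexFile);
console.log(parsedPart?.name, Object.keys(parsedPart?.words ?? {}));
```

This only works because the index files use strict JSON syntax on the right-hand side of the assignment; a file with trailing commas or single-quoted strings would make the helper return `null`.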
Processing Pipeline
Each type of data file goes through a specific processing method:
Stopwords
File Info
Index Parts
Source: webhelp-index-loader.ts:130-141
private processStopwords(stopwordsContent: string): void {
  const jsonMatch = stopwordsContent.match(
    /var\s+stopwords\s*=\s*(\[[\s\S]*?\]);?\s*(?:\/\/.*)?$/
  );
  if (jsonMatch) {
    const parsed = this.parseJsonWithLogging<any[]>(jsonMatch[1], 'stopwords');
    if (parsed) {
      this.searchContext.stopwords = parsed;
      this.searchContext.stopWords = parsed;
    }
  }
}
Extracts words to ignore during search (e.g., “the”, “a”, “an”).
Source: webhelp-index-loader.ts:183-210
private processFileInfoList(htmlFileInfoListContent: string): void {
  // Extract htmlFileInfoList array
  const htmlFileInfoListMatch = htmlFileInfoListContent.match(
    /var\s+htmlFileInfoList\s*=\s*(\[[\s\S]*?\]);/
  );
  // Extract fil object
  const filMatch = htmlFileInfoListContent.match(
    /var\s+fil\s*=\s*(\{[\s\S]*?\});/
  );
  // ... parsing of both matches into this.searchContext is omitted in this excerpt
  // Build fil object from array if needed
  if (this.searchContext.htmlFileInfoList && !this.searchContext.fil) {
    this.searchContext.fil = {};
    this.searchContext.htmlFileInfoList.forEach((item: any, index: number) => {
      this.searchContext.fil[index.toString()] = item;
    });
  }
}
Maps file IDs to their metadata (title, path, etc.).
Source: webhelp-index-loader.ts:212-237
private processIndexParts(indexParts: string[]): void {
  // Parse each index file
  indexParts.forEach((part, idx) => {
    const indexMatch = part.match(/var\s+index(\d+)\s*=\s*(\{[\s\S]*?\});?/);
    if (indexMatch) {
      const indexNum = indexMatch[1];
      const parsed = this.parseJsonWithLogging(indexMatch[2], `index${indexNum}`);
      if (parsed) {
        this.searchContext[`index${indexNum}`] = parsed;
      }
    }
  });
  // Merge all indexes into a single word index
  const allWords: Record<string, any> = {};
  for (let i = 1; i <= indexParts.length; i++) {
    Object.assign(allWords, this.searchContext[`index${i}`]);
  }
  this.searchContext.w = allWords;
}
Merges multiple index files into a unified word index.
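One detail of an `Object.assign`-based merge is worth seeing in isolation: when the same word appears in two parts, the later part replaces the earlier entry instead of being combined with it. The sketch below uses made-up data and a hypothetical `mergeIndexParts` helper; whether real WebHelp indexes ever repeat a word across parts is not stated here.

```typescript
type WordIndex = Record<string, unknown>;

// Illustrative stand-in for the merge step: later parts win on duplicate keys.
function mergeIndexParts(parts: WordIndex[]): WordIndex {
  const merged: WordIndex = {};
  for (const part of parts) {
    Object.assign(merged, part);
  }
  return merged;
}

// Made-up sample data: "world" appears in both parts.
const mergedWords = mergeIndexParts([
  { hello: [[0, 1]], world: [[1, 2]] },
  { world: [[3, 1]], search: [[2, 4]] },
]);

console.log(Object.keys(mergedWords));          // [ 'hello', 'world', 'search' ]
console.log(JSON.stringify(mergedWords.world)); // [[3,1]]
```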
Search Engine Initialization
The most complex part is safely evaluating the WebHelp search engine code:
Source: webhelp-index-loader.ts:239-326
private initializeSearchEngine(nwSearchFntJs: string): void {
  // Create sandboxed evaluation context
  const evalContext = (function (context: any) {
    const evalCode = `
      (function(context) {
        var w = context.w;
        var fil = context.fil;
        var stopWords = context.stopWords;
        // ... map other context variables
        ${nwSearchFntJs} // Inject search engine code
        return nwSearchFnt; // Return constructor
      })(arguments[0]);
    `;
    return eval(evalCode);
  })(this.searchContext);
  // Store constructor and create instance
  this.searchContext.nwSearchFnt = evalContext;
  this.searchContext.searchEngine = new this.searchContext.nwSearchFnt(
    this.searchContext.index,
    this.searchContext.options,
    this.searchContext.stemmer,
    this.searchContext.util
  );
}
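The search engine code is evaluated inside a function scope with explicit variable mapping, so the variables it declares stay local instead of leaking into the global scope. A function-scoped eval is not a security sandbox, however: the evaluated code can still reach Node.js globals such as process and globalThis, so the downloaded engine code should come from a trusted site.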
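The explicit variable mapping can also be expressed with the `Function` constructor, where each parameter name becomes a variable visible to the evaluated code. The sketch below is an alternative illustration, not the loader's implementation: `engineSource` is a tiny made-up stand-in for nwSearchFnt.js, and `sketchContext` carries only two of the context variables. Like eval, this scopes names but does not restrict access to globals.

```typescript
// Made-up stand-in for downloaded engine code; the real nwSearchFnt.js is far larger.
const engineSource = `
  function nwSearchFnt() {
    this.count = function () { return Object.keys(w).length; };
  }
`;

// Minimal context carrying the variables the stand-in engine expects.
const sketchContext = { w: { hello: [], world: [] }, stopWords: ['the'] };

// Each parameter name ('w', 'stopWords') becomes a local variable for the engine code.
const engineFactory = new Function(
  'w',
  'stopWords',
  `${engineSource}\nreturn nwSearchFnt;`
);

// Calling the factory binds the context values and returns the constructor.
const EngineCtor = engineFactory(sketchContext.w, sketchContext.stopWords);
const engineInstance = new EngineCtor();

console.log(engineInstance.count()); // 2
```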
Usage Examples
Basic Usage
import { WebHelpIndexLoader } from './webhelp-index-loader';
const loader = new WebHelpIndexLoader();
// Load the index
await loader.loadIndex('https://docs.example.com');
// Perform a search
loader.performSearch('authentication', (result) => {
  if (result.error) {
    console.error('Search error:', result.error);
    return;
  }
  console.log(`Found ${result.documents.length} documents`);
  result.documents.forEach(doc => {
    console.log(`${doc.title}`);
    console.log(`  Path: ${doc.relativePath}`);
    console.log(`  Score: ${doc.scoring}`);
  });
});
Inspecting the Index
const loader = new WebHelpIndexLoader();
await loader.loadIndex('https://docs.example.com');
const context = loader.getSearchContext();
console.log('Index Statistics:');
console.log(`  Words: ${Object.keys(context.w).length}`);
console.log(`  Files: ${Object.keys(context.fil).length}`);
console.log(`  Stopwords: ${context.stopWords.length}`);
console.log(`  Language: ${context.indexerLanguage}`);
Error Handling
try {
  const loader = new WebHelpIndexLoader();
  await loader.loadIndex('https://docs.example.com');
  loader.performSearch('query', (result) => {
    // Process results
  });
} catch (error) {
  // The caught value is `unknown` under strict TypeScript, so narrow it first
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('Failed to load search index')) {
    console.error('Could not download index files');
  } else if (message.includes('Search engine not initialized')) {
    console.error('Must call loadIndex() first');
  } else {
    console.error('Unexpected error:', error);
  }
}
Custom Download Locations
const loader = new WebHelpIndexLoader();
const searchUrl = 'https://docs.example.com/oxygen-webhelp/app/search';
// Call the download methods directly to fetch the raw files.
// downloadIndexParts() already tries both the index/ subdirectory and the root.
try {
  const engine = await loader.downloadSearchEngine(searchUrl);
  const parts = await loader.downloadIndexParts(searchUrl);
  const metadata = await loader.downloadMetadataFiles(searchUrl);
  console.log(`Downloaded engine code and ${parts.length} index parts`);
} catch (error) {
  console.error('Could not download from standard locations');
}
Design Decisions
Why Isolated Context?
The WebHelp search engine code expects specific global variables. Creating an isolated context:
Prevents Pollution - Doesn’t modify the Node.js global scope
Limits Scope - Maps only the variables the engine needs, although eval does not block access to Node.js globals
Enables Reuse - Multiple loaders can coexist
Simplifies Debugging - All search state is contained
Why Callback Pattern?
The performSearch() method uses a callback because:
Compatibility - Matches the WebHelp search engine API
Flexibility - Caller controls result processing
Efficiency - No need to create intermediate result objects
Why Try Multiple Locations?
Different Oxygen WebHelp versions organize files differently:
Older Versions - Place index files in the root search/ directory
Newer Versions - Use an index/ subdirectory
Fallback Strategy - Ensures compatibility across versions
Index Size
Large documentation sites can have substantial indexes:
Small Site - 1 index file, ~50KB
Medium Site - 2-3 index files, ~200KB
Large Site - 5-10 index files, ~1MB+
Consider caching the loaded index in memory if performing multiple searches, rather than reloading it each time.
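One way to follow this advice is to keep a single loaded loader per base URL. The sketch below is a hypothetical helper, not part of the library: `IndexLoader` and `LoaderCache` are illustrative names, and the only assumption about the loader is the documented `loadIndex(baseUrl)` method.

```typescript
// Minimal shape assumed for anything that can load an index.
interface IndexLoader {
  loadIndex(baseUrl: string): Promise<void>;
}

// Illustrative cache: one in-flight or completed load per base URL.
class LoaderCache<T extends IndexLoader> {
  private readonly entries = new Map<string, Promise<T>>();
  private readonly createLoader: () => T;

  constructor(createLoader: () => T) {
    this.createLoader = createLoader;
  }

  get(baseUrl: string): Promise<T> {
    // Normalize a trailing slash so both spellings share one entry.
    const key = baseUrl.replace(/\/$/, '');
    let entry = this.entries.get(key);
    if (!entry) {
      const loader = this.createLoader();
      entry = loader.loadIndex(key).then(() => loader);
      // Drop failed loads so a later call can retry.
      entry.catch(() => this.entries.delete(key));
      this.entries.set(key, entry);
    }
    return entry;
  }
}
```

With the real class this would be constructed as `new LoaderCache(() => new WebHelpIndexLoader())`, and each search would start with `const loader = await cache.get(baseUrl);`. Caching the Promise instead of the loaded loader means concurrent callers share one download.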
Network Requests
The loader makes multiple HTTP requests:
1 request for search engine code
1-10 requests for index parts
2 requests for metadata files
Total: 4-13 successful requests per load (fallback attempts and the final not-found probe may add a few more)
// Load the index once up front
const loader = new WebHelpIndexLoader();
await loader.loadIndex(baseUrl);
// Subsequent searches reuse the in-memory index and need no further downloads
for (const query of queries) {
  loader.performSearch(query, processResults);
}
Related Pages
WebHelpSearchClient - High-level search client that uses this loader
URL Encoding - URL compression utilities