Semantic Search
Semantic search provides AI-powered natural language understanding for WebHelp documentation queries. When available, it offers more intuitive search results than traditional keyword matching.
How It Works
Semantic search is powered by the Oxygen Feedback service. For each query, the server:
Extracts the deployment token from the WebHelp site
Sends the query to the Oxygen Feedback API
Receives AI-ranked results based on semantic relevance
Falls back to index-based search if unavailable
Semantic search only works for WebHelp sites that have Oxygen Feedback enabled. The server automatically detects availability and falls back gracefully.
Implementation
Here’s the complete semantic search implementation:
```typescript
// From webhelp-search-client.ts:95-171
// Relies on: import * as https from 'https';
//            import { HttpsProxyAgent } from 'https-proxy-agent';
// downloadFile() is the client's own helper that fetches a page as text.
async semanticSearch(
  query: string,
  baseUrl: string,
  pageSize: number = 10
): Promise<SearchResult> {
  try {
    // Extract deployment token from the WebHelp page
    const mainPage = await downloadFile(baseUrl);
    const match = mainPage.match(/feedback-init[^>]+deploymentToken=([^"'>]+)/);
    if (!match) {
      return { error: 'Deployment token not found', results: [] };
    }
    const token = match[1];

    // Prepare search request
    const postData = JSON.stringify({
      searchQuery: query,
      facets: [],
      currentPage: 1,
      pageSize,
      exactSearch: false,
      defaultJoinOperator: 'AND',
      highlight: false,
      indexFields: []
    });

    // Configure proxy if needed
    const proxyUrl =
      process.env.HTTPS_PROXY ||
      process.env.https_proxy ||
      process.env.HTTP_PROXY ||
      process.env.http_proxy;
    const options: any = {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(postData)
      }
    };
    if (proxyUrl) {
      options.agent = new HttpsProxyAgent(proxyUrl);
    }

    // Execute search request (the token is percent-encoded for the query string)
    const dataStr: string = await new Promise((resolve, reject) => {
      const req = https.request(
        `https://feedback.oxygenxml.com/api/html-content/search?token=${encodeURIComponent(token)}`,
        options,
        res => {
          if (res.statusCode && res.statusCode >= 200 && res.statusCode < 300) {
            res.setEncoding('utf8'); // keep multi-byte characters intact across chunks
            let body = '';
            res.on('data', chunk => (body += chunk));
            res.on('end', () => resolve(body));
          } else {
            res.resume(); // discard the error body so the socket is released
            reject(new Error(`HTTP ${res.statusCode}: ${res.statusMessage}`));
          }
        }
      );
      req.on('error', reject);
      req.write(postData);
      req.end();
    });

    // Parse and format results
    const data: any = JSON.parse(dataStr);
    const results = (data.documents || []).map((doc: any) => {
      const url = doc.fields?.uri || '';
      const rel = url.startsWith(baseUrl) ? url.substring(baseUrl.length) : url;
      return {
        id: `0:${rel}`,
        title: doc.fields?.title || '',
        url,
        score: doc.score ?? 0
      };
    });
    return { results };
  } catch (error: any) {
    return { error: `Semantic search failed: ${error.message}`, results: [] };
  }
}
```
The deployment token is embedded in the HTML of the WebHelp main page. The server downloads that page and extracts the token with a regular expression:
```typescript
const mainPage = await downloadFile(baseUrl);
const match = mainPage.match(/feedback-init[^>]+deploymentToken=([^"'>]+)/);
if (!match) {
  return { error: 'Deployment token not found', results: [] };
}
const token = match[1];
```
Typical HTML:
```html
<script>
  oxygenFeedbackInit({
    deploymentToken: 'abc123def456',
    productName: 'My Documentation',
    productVersion: '1.0'
  });
</script>
```
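If the page has no feedback-init markup carrying a deploymentToken= value, the regular expression finds no match and semantic search is unavailable for that site. Note that the pattern looks for the literal text deploymentToken= after feedback-init, so a token that appears only as a JavaScript property, as in the snippet above, is not detected.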
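To test the extraction step on its own, the pattern can be wrapped in a small pure function. This is a sketch rather than part of the server: extractDeploymentToken is a hypothetical helper, and the sample markup below (a feedback-init script URL with a deploymentToken query parameter) is invented for illustration, not copied from a real WebHelp site.

```typescript
// Same pattern as semanticSearch: "feedback-init", then "deploymentToken=<value>"
// later in the same tag; the value stops at a quote or '>'.
const TOKEN_PATTERN = /feedback-init[^>]+deploymentToken=([^"'>]+)/;

// Hypothetical helper: returns the deployment token, or null if none is found.
function extractDeploymentToken(html: string): string | null {
  const match = html.match(TOKEN_PATTERN);
  return match?.[1] ?? null;
}

// Invented markup, for illustration only.
const samplePage =
  '<script src="https://example.com/feedback-init.js?deploymentToken=abc123def456"></script>';
console.log(extractDeploymentToken(samplePage)); // abc123def456

// No feedback-init markup, so no token is found.
console.log(extractDeploymentToken('<html><body>No feedback here</body></html>')); // null
```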
API Request
The server posts to the Oxygen Feedback API:
Endpoint:
https://feedback.oxygenxml.com/api/html-content/search?token={token}
Request body:
```json
{
  "searchQuery": "how do I validate DITA maps",
  "facets": [],
  "currentPage": 1,
  "pageSize": 10,
  "exactSearch": false,
  "defaultJoinOperator": "AND",
  "highlight": false,
  "indexFields": []
}
```
Parameters:
searchQuery — Natural language query
pageSize — Maximum results (default: 10)
exactSearch — Exact phrase matching (always false)
defaultJoinOperator — Term matching mode (always “AND”)
highlight — Return highlighted snippets (always false)
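Request construction can also be separated from the network call. The following sketch uses hypothetical helper names (buildSearchRequest, buildSearchUrl); the fixed values mirror the parameters listed above, and facets and indexFields are typed as unknown[] because their element format is not described here. The URL helper percent-encodes the token so that reserved characters cannot alter the query string.

```typescript
// Hypothetical type mirroring the request body shown above.
interface FeedbackSearchRequest {
  searchQuery: string;
  facets: unknown[];
  currentPage: number;
  pageSize: number;
  exactSearch: boolean;
  defaultJoinOperator: string;
  highlight: boolean;
  indexFields: unknown[];
}

const FEEDBACK_SEARCH_ENDPOINT = 'https://feedback.oxygenxml.com/api/html-content/search';

function buildSearchRequest(query: string, pageSize = 10): FeedbackSearchRequest {
  return {
    searchQuery: query,
    facets: [],
    currentPage: 1,             // only the first page of results is requested
    pageSize,
    exactSearch: false,         // fixed value
    defaultJoinOperator: 'AND', // fixed value
    highlight: false,           // fixed value
    indexFields: [],
  };
}

function buildSearchUrl(token: string): string {
  // Percent-encode the token so characters such as '&' stay inside the parameter.
  return `${FEEDBACK_SEARCH_ENDPOINT}?token=${encodeURIComponent(token)}`;
}

const requestBody = JSON.stringify(buildSearchRequest('how do I validate DITA maps'));
console.log(buildSearchUrl('abc123def456'));
console.log(requestBody);
```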
The Oxygen Feedback API returns:
```json
{
  "documents": [
    {
      "fields": {
        "uri": "https://example.com/docs/topics/validation.html",
        "title": "Validating DITA Maps"
      },
      "score": 0.95
    },
    {
      "fields": {
        "uri": "https://example.com/docs/topics/checking-links.html",
        "title": "Checking Links in DITA"
      },
      "score": 0.82
    }
  ]
}
```
Scores range from 0 to 1, with higher values indicating better semantic relevance.
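Turning this response into search results is a plain mapping over the documents array. This sketch mirrors the mapping in the semanticSearch implementation, with minimal hypothetical types (FeedbackDocument, FeedbackResponse, SearchHit) for the fields the server reads; other response fields are ignored. The 0: prefix on the id is copied from that implementation and is not explained further in this section.

```typescript
// Hypothetical minimal types for the parts of the response that are read.
interface FeedbackDocument {
  fields?: { uri?: string; title?: string };
  score?: number;
}
interface FeedbackResponse {
  documents?: FeedbackDocument[];
}
interface SearchHit {
  id: string;
  title: string;
  url: string;
  score: number;
}

function toSearchHits(data: FeedbackResponse, baseUrl: string): SearchHit[] {
  return (data.documents ?? []).map(doc => {
    const url = doc.fields?.uri ?? '';
    // Make the id relative to the site root when the URL is under baseUrl.
    const rel = url.startsWith(baseUrl) ? url.substring(baseUrl.length) : url;
    return {
      id: `0:${rel}`,
      title: doc.fields?.title ?? '',
      url,
      score: doc.score ?? 0, // a missing score defaults to 0
    };
  });
}

const sampleResponse: FeedbackResponse = {
  documents: [
    {
      fields: { uri: 'https://example.com/docs/topics/validation.html', title: 'Validating DITA Maps' },
      score: 0.95,
    },
  ],
};
console.log(toSearchHits(sampleResponse, 'https://example.com/docs/')[0]?.id); // 0:topics/validation.html
```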
Search Fallback Strategy
The WebHelp MCP Server implements automatic fallback:
```typescript
// From webhelp-search-client.ts:40-52
async search(query: string): Promise<SearchResult> {
  const urls = this.baseUrls;

  // Try semantic search for single sites
  if (urls.length === 1) {
    try {
      const semantic = await this.semanticSearch(query, urls[0]);
      if (!semantic.error && semantic.results.length > 0) {
        return semantic;
      }
    } catch (e) {
      // Fall back to index search
    }
  }

  // Use index search for federated or fallback
  // ...
}
```
Fallback conditions:
Federated search (multiple sites) — Always uses index search
No deployment token found — Falls back to index search
Oxygen Feedback API error — Falls back to index search
No results from semantic search — Falls back to index search
The fallback is transparent to the user. AI tools receive results without knowing which search method was used.
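The fallback rules can be expressed as a function whose two search strategies are passed in. This is a sketch, not the server's code: searchWithFallback, SemanticSearchFn and IndexSearchFn are hypothetical names, and the SearchResult shape is a local stand-in modeled on the results returned by semanticSearch. Injecting the strategies lets each fallback condition be exercised with fakes instead of live requests.

```typescript
// Local stand-ins modeled on the results shown above (hypothetical types).
interface SearchHit {
  id: string;
  title: string;
  url: string;
  score: number;
}
interface SearchResult {
  results: SearchHit[];
  error?: string;
}
type SemanticSearchFn = (query: string, baseUrl: string) => Promise<SearchResult>;
type IndexSearchFn = (query: string, baseUrls: string[]) => Promise<SearchResult>;

async function searchWithFallback(
  query: string,
  baseUrls: string[],
  semantic: SemanticSearchFn,
  index: IndexSearchFn,
): Promise<SearchResult> {
  // Semantic search is only attempted when exactly one site is configured.
  const single = baseUrls.length === 1 ? baseUrls[0] : undefined;
  if (single !== undefined) {
    try {
      const result = await semantic(query, single);
      // Accept semantic results only when there is no error and at least one hit.
      if (!result.error && result.results.length > 0) {
        return result;
      }
    } catch {
      // An exception is treated like an error result: fall through to index search.
    }
  }
  // Federated queries and every semantic failure end up here.
  return index(query, baseUrls);
}
```

With two or more base URLs the semantic function is never called, and a semantic result that carries an error or has an empty results array leads to the index search, matching the fallback conditions listed above.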
Advantages
Natural Language: Understands questions like “How do I configure output?”
Better Ranking: AI-powered relevance scoring improves result quality
Context Aware: Understands synonyms and related concepts
User Friendly: No need to learn boolean operators or exact keywords
Limitations
Single Site Only
Semantic search works only for single-site queries:
```typescript
if (urls.length === 1) {
  // Try semantic search
}
```
Federated searches always use index-based search.
Reason: Semantic scores from different Oxygen Feedback instances aren’t comparable.
Requires Oxygen Feedback
Semantic search only works if:
The WebHelp site was published with Oxygen Feedback enabled
The deployment token is accessible in the page HTML
The Feedback service is reachable from your server
Many WebHelp sites don’t have Oxygen Feedback enabled. The server falls back gracefully, but semantic search won’t be available.
API Dependency
Semantic search depends on an external service:
https://feedback.oxygenxml.com/api/html-content/search
If this service is down or unreachable, semantic search fails and falls back to index search.
No Customization
The server always uses these search parameters:
exactSearch: false
defaultJoinOperator: 'AND'
highlight: false
pageSize: 10
These cannot be customized per query.
Proxy Support
The server respects HTTP proxy environment variables:
```typescript
const proxyUrl =
  process.env.HTTPS_PROXY ||
  process.env.https_proxy ||
  process.env.HTTP_PROXY ||
  process.env.http_proxy;
if (proxyUrl) {
  options.agent = new HttpsProxyAgent(proxyUrl);
}
```
Supported variables:
HTTPS_PROXY
https_proxy
HTTP_PROXY
http_proxy
Set these environment variables if your server requires a proxy to reach feedback.oxygenxml.com.
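The lookup order can be captured in a small function that takes the environment as a parameter, which keeps it testable without modifying process.env. resolveProxyUrl is a hypothetical name; in Node.js it would be called as resolveProxyUrl(process.env). Like the || chain in the server code, it skips variables that are unset or set to an empty string.

```typescript
// Environment-like object; process.env has this shape in Node.js.
type Env = Record<string, string | undefined>;

// Checked in this order; the first non-empty value wins.
const PROXY_VARIABLES = ['HTTPS_PROXY', 'https_proxy', 'HTTP_PROXY', 'http_proxy'] as const;

function resolveProxyUrl(env: Env): string | undefined {
  for (const variable of PROXY_VARIABLES) {
    const value = env[variable];
    if (value) {
      return value; // empty strings are falsy and therefore skipped
    }
  }
  return undefined;
}

console.log(resolveProxyUrl({ HTTP_PROXY: 'http://proxy.example:3128' })); // http://proxy.example:3128
```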
Error Handling
Token Not Found
```json
{
  "error": "Deployment token not found",
  "results": []
}
```
The WebHelp page doesn’t include Oxygen Feedback integration.
API Request Failed
```json
{
  "error": "Semantic search failed: HTTP 503: Service Unavailable",
  "results": []
}
```
The Oxygen Feedback service is temporarily unavailable.
Network Errors
```json
{
  "error": "Semantic search failed: ECONNREFUSED",
  "results": []
}
```
The server cannot reach feedback.oxygenxml.com (check proxy settings).
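A caller that wants to react differently to these three cases could classify the error string. This is a hypothetical helper (classifySemanticError) based only on the message formats shown above; the network check assumes that a Node.js error code such as ECONNREFUSED or ENOTFOUND appears in the message, which is typical for socket errors but not guaranteed.

```typescript
type SemanticErrorKind = 'no-token' | 'http' | 'network' | 'unknown';

interface ClassifiedError {
  kind: SemanticErrorKind;
  status?: number; // set only for HTTP errors
}

// Hypothetical helper keyed to the documented error strings.
function classifySemanticError(message: string): ClassifiedError {
  if (message === 'Deployment token not found') {
    return { kind: 'no-token' }; // page has no Oxygen Feedback integration
  }
  const httpMatch = /^Semantic search failed: HTTP (\d{3})\b/.exec(message);
  if (httpMatch) {
    return { kind: 'http', status: Number(httpMatch[1]) };
  }
  // Assumption: socket errors carry an uppercase Node.js code such as ECONNREFUSED.
  if (/^Semantic search failed: .*\bE[A-Z_]{3,}\b/.test(message)) {
    return { kind: 'network' }; // check connectivity and proxy settings
  }
  return { kind: 'unknown' };
}

console.log(classifySemanticError('Semantic search failed: HTTP 503: Service Unavailable'));
```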
Request Time
Semantic search involves two HTTP requests:
Download main page to extract token (~200ms)
POST query to Feedback API (~300-800ms)
Typical total: 500-1000ms
Caching Opportunities
The deployment token could be cached to eliminate the first request:
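```typescript
// Potential optimization (not implemented)
// The cache would live at module or instance level so it persists across calls.
const tokenCache = new Map<string, string>();
let token = tokenCache.get(baseUrl);
if (token === undefined) {
  // Extract the token from the main page, then cache it
  const mainPage = await downloadFile(baseUrl);
  const match = mainPage.match(/feedback-init[^>]+deploymentToken=([^"'>]+)/);
  if (!match) {
    return { error: 'Deployment token not found', results: [] };
  }
  token = match[1];
  tokenCache.set(baseUrl, token);
}
```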
Token caching is not currently implemented. Each semantic search downloads the main page.
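If caching were added, a time-to-live would bound how long a stale token is reused after a site is republished. The class below is a hypothetical sketch: the TokenCache name, the one-hour default lifetime and the invalidate method are assumptions, not documented behavior. The clock is injectable so expiry can be tested without waiting.

```typescript
interface CachedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory cache of deployment tokens, keyed by base URL.
class TokenCache {
  private readonly entries = new Map<string, CachedToken>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = 60 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs; // assumed default: one hour
    this.now = now;
  }

  get(baseUrl: string): string | undefined {
    const entry = this.entries.get(baseUrl);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired: drop it so the next lookup downloads the main page again.
      this.entries.delete(baseUrl);
      return undefined;
    }
    return entry.token;
  }

  set(baseUrl: string, token: string): void {
    this.entries.set(baseUrl, { token, expiresAt: this.now() + this.ttlMs });
  }

  invalidate(baseUrl: string): void {
    this.entries.delete(baseUrl);
  }
}

const cache = new TokenCache();
cache.set('https://example.com/docs/', 'abc123def456');
console.log(cache.get('https://example.com/docs/')); // abc123def456
```

In semanticSearch, the cache would be consulted before downloading the main page, and an entry could be invalidated when the Feedback API rejects a request, since that may indicate the token has changed.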
Query Tips
Good semantic queries:
“How do I configure PDF output?”
“What is the difference between a map and a topic?”
“Troubleshooting build errors”
“Best practices for reuse”
Less effective queries:
Single words: “PDF”, “map”, “error”
Boolean operators: “publishing AND PDF” (use index search instead)
Exact phrases in quotes (the server always sends exactSearch: false, so quotes do not force an exact match)
Semantic search works best with natural questions and multi-word phrases that express intent.
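The tips above can be encoded as a rough pre-check, for example to suggest rephrasing before a query is sent. This is a hypothetical heuristic (isGoodSemanticQuery), not something the server does; its rules (at least two words, no uppercase AND, OR or NOT, not wrapped in quotes) simply restate the lists above in code.

```typescript
// Hypothetical heuristic restating the query tips; not part of the server.
function isGoodSemanticQuery(query: string): boolean {
  const trimmed = query.trim();
  const words = trimmed.split(/\s+/).filter(w => w.length > 0);
  if (words.length < 2) return false;                  // single words carry little intent
  if (/\b(AND|OR|NOT)\b/.test(trimmed)) return false;  // boolean syntax is not interpreted
  if (/^["“].*["”]$/.test(trimmed)) return false;      // quotes do not force an exact match
  return true;
}

console.log(isGoodSemanticQuery('How do I configure PDF output?')); // true
console.log(isGoodSemanticQuery('publishing AND PDF')); // false
```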
Checking Availability
To check if a WebHelp site supports semantic search:
View the page source
Search for “feedback-init” or “deploymentToken”
If found, semantic search is available
Example:
```bash
curl https://www.oxygenxml.com/doc/versions/26.1/ug-editor/ | grep deploymentToken
```
If output includes the token, semantic search works for that site.
Comparison: Semantic vs Index Search
| Feature | Semantic Search | Index Search |
|---|---|---|
| Natural language | ✅ Excellent | ❌ Limited |
| Boolean operators | ❌ Not supported | ✅ Supported |
| Federated search | ❌ Single site only | ✅ Multi-site |
| Performance | 🟡 500-1000ms | 🟢 200-500ms |
| Availability | 🟡 Oxygen Feedback only | ✅ All WebHelp sites |
| Result quality | 🟢 AI-ranked | 🟡 Keyword-based |
| Offline support | ❌ Requires API | ✅ Local index |
Next Steps
Search Tool: Learn about the unified search tool
Federated Search: Query multiple sites (uses index search)
Fetch Tool: Retrieve full document content
Oxygen Feedback: Learn about Oxygen Feedback integration