Overview
A Searcher can:
- Modify queries before they are executed (query rewriting)
- Process results by altering, reorganizing, or adding hits
- Federate to multiple search chains in series or parallel
- Act as a source by creating results from internal or external data
- Implement workflows by calling downstream searchers multiple times
Basic Searcher Structure
All custom searchers extend the com.yahoo.search.Searcher class and override its search method.
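The shape of such a class can be sketched without the Vespa jars on the classpath. In the block below, Query, Result, Execution, and Searcher are deliberately simplified stand-ins written for this illustration; the real classes in com.yahoo.search are much richer, and LowercasingSearcher is a hypothetical example. Only the pattern is meant to carry over: pre-process the query, pass it down the chain, then post-process the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Dependency-free sketch: every type here is a simplified stand-in, not the Vespa API.
class BasicSearcherSketch {

    static final class Query {
        String text;
        Query(String text) { this.text = text; }
    }

    static final class Result {
        final Query query;
        final List<String> hits = new ArrayList<>();
        Result(Query query) { this.query = query; }
    }

    /** Walks the chain: each call to search() hands the query to the next searcher. */
    static final class Execution {
        private final List<Searcher> chain;
        private final int index;

        Execution(List<Searcher> chain) { this(chain, 0); }

        private Execution(List<Searcher> chain, int index) {
            this.chain = chain;
            this.index = index;
        }

        Result search(Query query) {
            if (index >= chain.size()) return new Result(query); // end of chain: empty result
            return chain.get(index).search(query, new Execution(chain, index + 1));
        }
    }

    static abstract class Searcher {
        abstract Result search(Query query, Execution execution);
    }

    /** Hypothetical searcher showing the three steps of a typical search method. */
    static final class LowercasingSearcher extends Searcher {
        @Override
        Result search(Query query, Execution execution) {
            query.text = query.text.toLowerCase(Locale.ROOT); // 1. modify the query
            Result result = execution.search(query);          // 2. call the rest of the chain
            result.hits.add("seen:" + query.text);            // 3. modify the result
            return result;
        }
    }

    static Result run(String text) {
        List<Searcher> chain = List.of(new LowercasingSearcher());
        return new Execution(chain).search(new Query(text));
    }

    public static void main(String[] args) {
        Result result = run("Hello World");
        System.out.println(result.query.text + " " + result.hits);
    }
}
```

In a real searcher the same three steps appear inside the overridden search method, with the call to the rest of the chain made through the Execution argument.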
Searcher Lifecycle
Construction
The searcher is instantiated with its configuration. Build any required in-memory structures here.
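One way to picture this is a component that copies its configuration into an immutable structure in the constructor and only reads it afterwards. The sketch below is plain Java with no Vespa dependency; SynonymRewriter and its map of synonyms are hypothetical names chosen for illustration. Because the map is never mutated after construction, many threads can call rewrite concurrently without locks.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Dependency-free sketch: SynonymRewriter is a hypothetical stand-in for a searcher.
class ImmutableStateSketch {

    static final class SynonymRewriter {
        private final Map<String, String> synonyms; // built once, never mutated

        SynonymRewriter(Map<String, String> configured) {
            this.synonyms = Map.copyOf(configured); // immutable defensive copy
        }

        /** Read-only lookup, safe to call from any number of threads. */
        String rewrite(String term) {
            return synonyms.getOrDefault(term, term);
        }
    }

    /** Calls rewrite() from several threads and counts the successful rewrites. */
    static int rewriteInParallel(int threads, int callsPerThread) {
        SynonymRewriter rewriter = new SynonymRewriter(Map.of("tv", "television"));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger rewritten = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    if (rewriter.rewrite("tv").equals("television")) {
                        rewritten.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rewritten.get();
    }

    public static void main(String[] args) {
        System.out.println(rewriteInParallel(4, 1000));
    }
}
```

The only shared mutable state is the counter used by the demonstration itself; the rewriter needs no synchronization.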
In Service
The search() method is called by multiple threads in parallel. Keep shared data structures immutable or use proper synchronization.
Common Searcher Patterns
Query Processor
Modifies the query before execution.
Result Processor
Modifies results after execution.
Federator
Searches multiple sources and combines results.
Source Searcher
Generates results from custom data sources.
Constructor Injection
Vespa supports dependency injection in searcher constructors. The supported constructor signatures are:
- (ComponentId, ConfigClass1, ConfigClass2, ...)
- (String, ConfigClass1, ConfigClass2, ...)
- (ConfigClass1, ConfigClass2, ...)
- (ComponentId)
- (String)
- Default no-argument constructor
Search Chain Configuration
Add your searcher to services.xml.
Chain Dependencies
Control ordering with the @Before, @After, and @Provides annotations.
Real-World Example: Stemming
Here’s a simplified version of Vespa’s StemmingSearcher from ~/workspace/source/container-search/src/main/java/com/yahoo/prelude/querytransform/StemmingSearcher.java:62:
Error Handling
Handling Expected Events
Create a Result with an error message.
Handling Unexpected Events
Throw a RuntimeException.
Recoverable User Errors
Add a FeedbackHit explaining the condition.
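The three cases in this section can be contrasted in one dependency-free sketch. Everything in it is an assumption made for illustration: Result is a simplified stand-in (its error field and feedback list only mimic an error message and feedback hits), and backendLookup and MAX_HITS are hypothetical. It is not the Vespa API, only the decision logic.

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free sketch: Result is a simplified stand-in, not the Vespa class.
class ErrorHandlingSketch {

    static final int MAX_HITS = 3; // hypothetical cap on returned hits

    static final class Result {
        final List<String> hits = new ArrayList<>();
        final List<String> feedback = new ArrayList<>(); // stand-in for feedback hits
        String error;                                    // stand-in for an error message, null if absent
    }

    /** Hypothetical backend; returns null to simulate an inconsistent internal state. */
    static List<String> backendLookup(String queryText) {
        if (queryText.equals("corrupt")) return null;
        return List.of("doc1", "doc2", "doc3", "doc4", "doc5");
    }

    static Result search(String queryText, int requestedHits) {
        Result result = new Result();

        // Expected event: the request is invalid, so return a result carrying an error.
        if (queryText == null || queryText.isBlank()) {
            result.error = "Query is empty";
            return result;
        }

        // Unexpected event: a broken invariant is a bug, so fail with a RuntimeException.
        List<String> found = backendLookup(queryText);
        if (found == null) {
            throw new IllegalStateException("Backend returned no hit list for '" + queryText + "'");
        }

        // Recoverable user error: serve the request anyway and explain what was changed.
        if (requestedHits > MAX_HITS) {
            result.feedback.add("Requested " + requestedHits + " hits; returning at most " + MAX_HITS);
        }
        int limit = Math.max(0, Math.min(requestedHits, MAX_HITS));
        result.hits.addAll(found.subList(0, Math.min(limit, found.size())));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(search("vespa", 10).feedback);
    }
}
```

The distinction that matters is who can act on the problem: the caller for an invalid request, the developer for a broken invariant, and the end user for a condition the searcher could work around.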
Fill Operations
For federating searchers, override fill() to fetch additional data.
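The idea behind a two-phase search and fill can be sketched in plain Java. The Hit, Source, and Federator types below are hypothetical stand-ins, not the Vespa classes, and the sample documents are invented. The search step returns lightweight hits that only know their id and source; the fill step routes each hit back to the source that produced it to fetch the more expensive summary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Dependency-free sketch: Hit, Source, and Federator are stand-ins, not the Vespa API.
class FillSketch {

    static final class Hit {
        final String id;
        final String source;
        String summary; // stays null until the hit is filled

        Hit(String id, String source) {
            this.id = id;
            this.source = source;
        }
    }

    /** Hypothetical backend: search returns ids only, fill adds the summary text. */
    static final class Source {
        final String name;
        private final TreeMap<String, String> documents; // sorted for a stable hit order

        Source(String name, Map<String, String> documents) {
            this.name = name;
            this.documents = new TreeMap<>(documents);
        }

        List<Hit> search() {
            List<Hit> hits = new ArrayList<>();
            for (String id : documents.keySet()) hits.add(new Hit(id, name));
            return hits;
        }

        void fill(Hit hit) {
            hit.summary = documents.get(hit.id);
        }
    }

    static final class Federator {
        private final Map<String, Source> sourcesByName = new LinkedHashMap<>();

        Federator(List<Source> sources) {
            for (Source source : sources) sourcesByName.put(source.name, source);
        }

        /** Phase one: collect unfilled hits from every source. */
        List<Hit> search() {
            List<Hit> hits = new ArrayList<>();
            for (Source source : sourcesByName.values()) hits.addAll(source.search());
            return hits;
        }

        /** Phase two: send each unfilled hit back to the source it came from. */
        void fill(List<Hit> hits) {
            for (Hit hit : hits) {
                if (hit.summary != null) continue; // already filled
                sourcesByName.get(hit.source).fill(hit);
            }
        }
    }

    static Federator demoFederator() {
        return new Federator(List.of(
                new Source("news", Map.of("n1", "First news article")),
                new Source("wiki", Map.of("w1", "First wiki page", "w2", "Second wiki page"))));
    }

    public static void main(String[] args) {
        Federator federator = demoFederator();
        List<Hit> hits = federator.search();
        federator.fill(hits);
        for (Hit hit : hits) System.out.println(hit.source + "/" + hit.id + ": " + hit.summary);
    }
}
```

Deferring the summary lookup means hits that are discarded between the two phases never pay the cost of being filled.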
Testing
Test searchers using the Execution framework.
Performance Tips
- Avoid synchronization: Keep data structures built during construction read-only
- Use tracing: Call query.trace() to add debug information
- Minimize allocations: Reuse objects when possible
- Profile carefully: Searchers are on the critical path for all queries
Next Steps
- Learn about Document Processors for feed processing
- Explore HTTP Handlers for custom endpoints
- See Plugins and Bundles for packaging components