Overview
ForkBB includes a full-text search system that indexes posts and topics. The search engine handles multiple languages, CJK characters, wildcards, hashtags, stopwords, and word-length filtering.
Search Architecture
The search system is implemented in app/Models/Search/Search.php:15 with separate action handlers for different search types:
Forums: search within specific forums (ActionF)
Posts: full-text search of post content (ActionP)
Topics: search topic titles (ActionT)
Text Processing
Character Normalization
The search system normalizes text before indexing:
app/Models/Search/Search.php
const QUOTES = [ 'ʹ' , 'ʻ' , 'ʼ' , 'ʽ' , 'ʾ' , 'ʿ' , '΄' , '᾿' , 'Ꞌ' , 'ꞌ' , ''', ''' , '‛' , '′' , '´' , '`' , '`' , ''' , '`' , '''];
public function cleanText(string $text, bool $indexing = false): string
{
    // Extract hashtags separately
    $tags = [];
    $text = \preg_replace_callback(
        '%(?<=^|\s|\n|\r)#(?=[\p{L}\p{N}_]{3})[\p{L}\p{N}]+(?:_+[\p{L}\p{N}]+)*(?=$|\s|\n|\r|\.|,)%u',
        function ($matches) use (&$tags) {
            $tags[] = $matches[0];
            return ' ';
        },
        $text
    );
    // Normalize quotes
    $text = \str_replace(self::QUOTES, '\'', $text);
    // Russian normalization
    $text = \str_replace('ё', 'е', $text);
    // Separate CJK characters with spaces
    $text = \preg_replace('%' . self::CJK_REGEX . '%u', ' \0 ', $text);
    // Reduce repeated characters (4+ to 1)
    $text = \preg_replace('%(\p{L})\1{3,}%u', '\1', $text);
    // Remove quotes and hyphens outside words
    $text = \preg_replace('%((?<![\p{L}\p{N}])[\'\'\-]|[\'\'\-](?![\p{L}\p{N}]))%u', ' ', $text);
    if (false !== \strpos($text, '-')) {
        // Remove words ending with -либо, -нибудь, -нить
        $text = \preg_replace('%\b[\p{L}\p{N}\-\']+\-(?:либо|нибу[дт]ь|нить)(?![\p{L}\p{N}\'\-])%u', '', $text);
        // Remove trailing suffixes like -таки, -чуть
        $text = \preg_replace('%(?<=[\p{L}\p{N}])(\-(?:таки|чуть|[а-я]{1,2}))+(?![\p{L}\p{N}\'\-])%u', '', $text);
    }
    // Remove non-alphanumeric characters (keep wildcards if not indexing)
    $text = \preg_replace('%(?![\'\'\-' . ($indexing ? '' : '\?\*') . '])[^\p{L}\p{N}]+%u', ' ', $text);
    // Compress multiple spaces
    $text = \preg_replace('% {2,}%', ' ', $text);
    return \trim($text . ' ' . \implode(' ', $tags));
}
Text normalization ensures consistent indexing and searching regardless of input variations.
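To see the effect of a few of these steps in isolation, here is a minimal sketch. The function normalizeSample() is hypothetical and not part of ForkBB. It reproduces only the ё replacement and the repeated-letter reduction from cleanText(), followed by a simplified cleanup that turns every run of non-letter, non-digit characters into one space.

```php
<?php
// Hypothetical helper (not ForkBB code): a reduced version of cleanText().
function normalizeSample(string $text): string
{
    // Russian normalization: treat "ё" as "е"
    $text = \str_replace('ё', 'е', $text);
    // Collapse a letter repeated 4 or more times into a single letter
    $text = \preg_replace('%(\p{L})\1{3,}%u', '\1', $text);
    // Simplification: replace every run of non-letter, non-digit characters with one space
    $text = \preg_replace('%[^\p{L}\p{N}]+%u', ' ', $text);

    return \trim($text);
}

echo normalizeSample('Coooool   ёлка!!!'), "\n"; // prints "Col елка"
echo normalizeSample('aaa'), "\n";               // only 3 repeats, left unchanged
```

Note that the repeated-letter rule collapses the whole run to one letter, so an exaggerated "Coooool" becomes "Col" rather than "Cool".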
CJK Support
The search system has extensive support for Chinese, Japanese, and Korean characters:
app/Models/Search/Search.php
const CJK_REGEX = '[' .
'\x{1100}-\x{11FF}' . // Hangul Jamo
'\x{3130}-\x{318F}' . // Hangul Compatibility Jamo
'\x{AC00}-\x{D7AF}' . // Hangul Syllables
// Hiragana
'\x{3040}-\x{309F}' . // Hiragana
// Katakana
'\x{30A0}-\x{30FF}' . // Katakana
'\x{31F0}-\x{31FF}' . // Katakana Phonetic Extensions
// CJK Unified Ideographs
'\x{2E80}-\x{2EFF}' . // CJK Radicals Supplement
'\x{2F00}-\x{2FDF}' . // Kangxi Radicals
'\x{2FF0}-\x{2FFF}' . // Ideographic Description Characters
'\x{3000}-\x{303F}' . // CJK Symbols and Punctuation
'\x{31C0}-\x{31EF}' . // CJK Strokes
'\x{3200}-\x{32FF}' . // Enclosed CJK Letters and Months
'\x{3400}-\x{4DBF}' . // CJK Unified Ideographs Extension A
'\x{4E00}-\x{9FFF}' . // CJK Unified Ideographs
'\x{20000}-\x{2A6DF}' . // CJK Unified Ideographs Extension B
'\x{2A700}-\x{2B73F}' . // CJK Unified Ideographs Extension C
'\x{2B740}-\x{2B81F}' . // CJK Unified Ideographs Extension D
'\x{2B820}-\x{2CEAF}' . // CJK Unified Ideographs Extension E
'\x{2CEB0}-\x{2EBEF}' . // CJK Unified Ideographs Extension F
'\x{2F800}-\x{2FA1F}' . // CJK Compatibility Ideographs Supplement
'\x{30000}-\x{3134F}' . // CJK Unified Ideographs Extension G
'\x{31350}-\x{323AF}' . // CJK Unified Ideographs Extension H
']';
public function isCJKWord(string $word): bool
{
    return \preg_match('%^' . self::CJK_REGEX . '+$%u', $word) ? true : false;
}
CJK Word Handling
CJK characters are treated differently from alphabetic languages:
Each character is separated during text cleaning
CJK words bypass length restrictions
Individual characters can be searched
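The separation step can be sketched as follows. This is an illustration under assumptions: splitCjk() is a hypothetical helper, and its pattern covers only four of the ranges from CJK_REGEX (Hiragana, Katakana, CJK Unified Ideographs, Hangul Syllables) to keep the example short.

```php
<?php
// Hypothetical helper (not ForkBB code): pads CJK characters with spaces,
// then splits the text into tokens.
function splitCjk(string $text): array
{
    // Assumption: a reduced subset of the ranges listed in CJK_REGEX
    $cjk = '[\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{4E00}-\x{9FFF}\x{AC00}-\x{D7AF}]';
    // Surround every CJK character with spaces, as cleanText() does
    $text = \preg_replace('%' . $cjk . '%u', ' \0 ', $text);

    // Split on whitespace and drop empty pieces
    return \preg_split('%\s+%u', \trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

print_r(splitCjk('forum 検索 test')); // forum, 検, 索, test
```

Each ideograph becomes its own token, which is why single CJK characters must be exempt from the minimum word length.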
Word Processing
Word Validation
app/Models/Search/Search.php
public function word(string $word, bool $indexing = false): ?string
{
    // Check stopwords
    if (isset($this->c->stopwords->list[$word])) {
        return null;
    }
    // CJK words are always valid
    if ($this->isCJKWord($word)) {
        return $word;
    }
    // Check minimum length (3 characters)
    $len = \mb_strlen(\trim($word, '?*'), 'UTF-8');
    if ($len < 3) {
        return null;
    }
    // Truncate to maximum length (20 characters)
    if ($len > 20) {
        $word = \mb_substr($word, 0, 20, 'UTF-8');
    }
    return $word;
}
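Word length requirements: regular words need at least 3 characters, not counting leading or trailing wildcards. Words longer than 20 characters are truncated to their first 20 characters rather than rejected. CJK words skip the length check entirely.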
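The length rules can be tried out with a standalone sketch. The function filterByLength() is hypothetical. It mirrors only the length logic of word() and leaves out the stopword and CJK checks.

```php
<?php
// Hypothetical helper (not ForkBB code): the length rules of word() in isolation.
function filterByLength(string $word): ?string
{
    // Wildcards at either end do not count toward the length
    $len = \mb_strlen(\trim($word, '?*'), 'UTF-8');
    if ($len < 3) {
        return null; // too short to index or search
    }

    // Long words are cut to 20 characters, not rejected
    return $len > 20 ? \mb_substr($word, 0, 20, 'UTF-8') : $word;
}

var_dump(filterByLength('go*'));                  // NULL: 2 characters once "*" is trimmed
var_dump(filterByLength('search'));               // string "search"
var_dump(filterByLength(\str_repeat('a', 25)));   // string of 20 "a" characters
```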
app/Models/Search/Search.php
public function words(string $text, bool $indexing): array
{
    $text = $this->cleanText($text, $indexing);
    $words = [];
    foreach (\explode(' ', $text) as $word) {
        $word = $this->word($word, $indexing);
        if (null !== $word) {
            $words[$word] = $word;
        }
    }
    return \array_values($words);
}
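The assignment $words[$word] = $word is what removes duplicates: a repeated word overwrites its own key. The sketch below isolates that idiom. The function uniqueWords() is hypothetical and skips the cleaning and validation that words() performs.

```php
<?php
// Hypothetical helper (not ForkBB code): the de-duplication idiom from words().
function uniqueWords(string $text): array
{
    $words = [];
    foreach (\explode(' ', $text) as $word) {
        if ('' !== $word) {
            // Using the word as the key collapses repeats; storing it as the
            // value keeps it a string even when PHP casts a numeric key to int
            $words[$word] = $word;
        }
    }

    return \array_values($words);
}

print_r(uniqueWords('search forum search index forum')); // search, forum, index
```

Words keep the order of their first occurrence, and array_values() discards the keys so callers receive a plain list.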
Stopwords
Common words are filtered out to improve search relevance:
if (isset($this->c->stopwords->list[$word])) {
    return null;
}
Stopwords typically include:
Articles (a, an, the)
Prepositions (in, on, at)
Common verbs (is, are, was)
Pronouns (I, you, they)
Stopwords prevent searching for very common terms. Configure your stopword list based on your forum’s language.
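Because the check uses isset() on the list, the stopwords are looked up by key. A minimal sketch of that layout follows. The word list and the removeStopwords() helper are hypothetical, and how ForkBB loads its own list is not shown here.

```php
<?php
// Hypothetical word list; a real forum would load one per language.
$raw = ['the', 'and', 'are', 'was'];

// Keyed by word so that isset() is a hash lookup instead of a linear scan
$stopwords = \array_fill_keys($raw, true);

// Hypothetical helper (not ForkBB code): drops every word found in the list.
function removeStopwords(array $words, array $stopwords): array
{
    return \array_values(\array_filter(
        $words,
        fn (string $w): bool => ! isset($stopwords[$w])
    ));
}

print_r(removeStopwords(['the', 'forum', 'was', 'fast'], $stopwords)); // forum, fast
```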
Search Execution
The search system uses separate handlers for different search types:
Search Actions
ActionP (Posts)
Searches post content using the full-text index:
// app/Models/Search/ActionP.php
// Searches through post messages
// Returns post IDs matching the query
ActionT (Topics)
Searches topic titles:
// app/Models/Search/ActionT.php
// Searches through topic subjects
// Returns topic IDs matching the query
ActionF (Forums)
Filters results by forum:
// app/Models/Search/ActionF.php
// Limits search to specific forums
// Combines with other search types
Search Preparation
// app/Models/Search/Prepare.php
// Validates and prepares search queries
// Handles wildcards and operators
Search Execution
// app/Models/Search/Execute.php
// Runs the prepared query
// Returns sorted, paginated results
Indexing
The search index is maintained automatically:
// app/Models/Search/Index.php
// Indexes new posts as they're created
// Updates index when posts are edited
Index Management
// app/Models/Search/TruncateIndex.php
// Clears the search index
// Used for rebuilding or maintenance
The search index should be rebuilt periodically or after bulk imports.
Pagination and Results
Result Links
app/Models/Search/Search.php
protected function getlink(): string
{
    return $this->c->Router->link($this->linkMarker, $this->linkArgs);
}
protected function getpagination(): array
{
    return $this->c->Func->paginate($this->numPages, $this->page, $this->linkMarker, $this->linkArgs);
}
public function hasPage(): bool
{
    return $this->page > 0 && $this->page <= $this->numPages;
}
Result Slicing
Efficient result handling for pagination:
app/Models/Search/Search.php
public function slice(string|array $data, int $offset, int $length): array
{
    if (\is_array($data)) {
        return \array_slice($data, $offset, $length);
    }
    // For comma-separated string of IDs
    $p = 0;
    $i = 0;
    // Skip to offset
    while ($i < $offset) {
        if (false === ($p = \strpos($data, ',', $p))) {
            return [];
        }
        ++$p;
        ++$i;
    }
    $e = $p;
    $offset += $length;
    // Extract slice
    while ($i < $offset) {
        if (false === ($e = \strpos($data, ',', $e))) {
            return \array_map('\\intval', \explode(',', \substr($data, $p)));
        }
        ++$e;
        ++$i;
    }
    return \array_map('\\intval', \explode(',', \substr($data, $p, $e - $p - 1)));
}
public function count(string|array $data): int
{
    return \is_array($data) ? \count($data) : \substr_count($data, ',') + 1;
}
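Results can be held either as arrays or as comma-separated strings of IDs. Both slice() and count() accept either form, and the string form avoids building a full array when only one page of IDs is needed.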
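To connect page numbers with slicing, the sketch below maps a 1-based page number to an offset and length. pageWindow() is a hypothetical helper. The array_slice() over an exploded list is only a simple reference; the string branch of slice() is meant to return the same IDs without exploding the whole list.

```php
<?php
// Hypothetical helper (not ForkBB code): maps a 1-based page number to the
// offset/length pair that a slice() call expects.
function pageWindow(int $total, int $page, int $perPage): ?array
{
    $numPages = (int) \ceil($total / $perPage);
    if ($page < 1 || $page > $numPages) {
        return null; // the same range condition that hasPage() checks
    }

    return ['offset' => ($page - 1) * $perPage, 'length' => $perPage];
}

$ids    = '11,12,13,14,15,16,17';
$total  = \substr_count($ids, ',') + 1;   // 7, the way count() handles strings
$window = pageWindow($total, 2, 3);       // offset 3, length 3

// Reference implementation only: explode everything, then slice
$pageIds = \array_slice(
    \array_map('intval', \explode(',', $ids)),
    $window['offset'],
    $window['length']
);
print_r($pageIds); // 14, 15, 16
```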
Wildcard Search
Searches support wildcards for partial matching:
* matches zero or more characters
? matches exactly one character
Examples:
test* matches “test”, “testing”, “tester”
t?st matches “test”, “tost”, “tast”
Wildcards are removed during indexing but preserved during searching.
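The wildcard semantics can be checked with a small regex-based sketch. wildcardMatches() is a hypothetical helper written for illustration. It is not how ForkBB evaluates wildcards, which this page does not describe.

```php
<?php
// Hypothetical helper (not ForkBB code): tests a word against a wildcard pattern.
function wildcardMatches(string $pattern, string $word): bool
{
    // Escape regex metacharacters, then turn the escaped wildcards back into
    // their regex equivalents: "*" => any run of characters, "?" => one character
    $regex = \strtr(\preg_quote($pattern, '%'), ['\*' => '.*', '\?' => '.']);

    return 1 === \preg_match('%^' . $regex . '$%u', $word);
}

var_dump(wildcardMatches('test*', 'testing')); // true
var_dump(wildcardMatches('test*', 'test'));    // true: "*" may match nothing
var_dump(wildcardMatches('t?st', 'tost'));     // true
var_dump(wildcardMatches('t?st', 'tst'));      // false: "?" needs exactly one character
```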
Hashtag Support
Hashtags are automatically detected and indexed:
$text = \preg_replace_callback(
    '%(?<=^|\s|\n|\r)#(?=[\p{L}\p{N}_]{3})[\p{L}\p{N}]+(?:_+[\p{L}\p{N}]+)*(?=$|\s|\n|\r|\.|,)%u',
    function ($matches) use (&$tags) {
        $tags[] = $matches[0];
        return ' ';
    },
    $text
);
Hashtags must:
Start with #
Contain at least 3 letters, digits, or underscores after the #, beginning and ending with a letter or digit (underscores may appear only between them)
Be preceded by whitespace or line start
Be followed by whitespace, line end, a period, or a comma
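These rules can be verified by running the pattern on sample text. In the sketch below the pattern is copied from cleanText(), while extractTags() is a hypothetical wrapper that returns the matches instead of removing them from the text.

```php
<?php
// Hypothetical wrapper (not ForkBB code) around the hashtag pattern from cleanText().
function extractTags(string $text): array
{
    $pattern = '%(?<=^|\s|\n|\r)#(?=[\p{L}\p{N}_]{3})[\p{L}\p{N}]+(?:_+[\p{L}\p{N}]+)*(?=$|\s|\n|\r|\.|,)%u';
    \preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

// "#ab" is too short, "foo#bar" has no whitespace before the "#"
print_r(extractTags('Try #php_8 and #ab, not foo#bar or #search.')); // #php_8, #search
```

A tag such as #a_b is accepted because the underscore counts toward the three-character minimum, while #ab_ is rejected because a tag may not end with an underscore.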
Search Deletion
Old searches are cleaned up periodically:
// app/Models/Search/Delete.php
// Removes expired search results
// Keeps database size manageable
Indexed Searches: full-text indexes enable fast queries
Result Caching: search results are cached temporarily
Word Filtering: stopwords reduce index size
Efficient Slicing: pagination without loading all results
Best Practices
Rebuild the search index after importing posts or if search results seem stale. Run index maintenance during low-traffic periods.
Customize stopwords for your forum’s primary language. Too few stopwords bloat the index; too many prevent valid searches.
If your forum has CJK content, ensure proper character encoding (UTF-8) throughout your application.
For very large forums (millions of posts), consider external search solutions like Elasticsearch or Sphinx.
Forums & Topics: understand the content being searched
BBCode: how formatting affects search indexing