Search

Overview

ForkBB includes a full-text search system that indexes posts and topics. The search engine handles multiple languages, including CJK text, supports the * and ? wildcards, and normalizes words before they are indexed or searched.

Search Architecture

The search system is implemented in app/Models/Search/Search.php:15 with separate action handlers for different search types:

Forums

Search within specific forums (ActionF)

Posts

Full-text search of post content (ActionP)

Topics

Search topic titles (ActionT)

Text Processing

Character Normalization

The search system normalizes text before indexing:

app/Models/Search/Search.php

const QUOTES = ['ʹ', 'ʻ', 'ʼ', 'ʽ', 'ʾ', 'ʿ', '΄', '᾿', 'Ꞌ', 'ꞌ', '‘', '’', '‛', '′', '´', '`', '｀', '＇', '`', '\''];

public function cleanText(string $text, bool $indexing = false): string
{
    // Extract hashtags separately
    $tags = [];
    $text = \preg_replace_callback(
        '%(?<=^|\s|\n|\r)#(?=[\p{L}\p{N}_]{3})[\p{L}\p{N}]+(?:_+[\p{L}\p{N}]+)*(?=$|\s|\n|\r|\.|,)%u',
        function ($matches) use (&$tags)  {
            $tags[] = $matches[0];
            return ' ';
        },
        $text
    );
    
    // Normalize quotes
    $text = \str_replace(self::QUOTES, '\'', $text);
    
    // Russian normalization
    $text = \str_replace('ё', 'е', $text);
    
    // Separate CJK characters with spaces
    $text = \preg_replace('%' . self::CJK_REGEX . '%u', ' \0 ', $text);
    
    // Reduce repeated characters (4+ to 1)
    $text = \preg_replace('%(\p{L})\1{3,}%u', '\1', $text);
    
    // Remove quotes and hyphens outside words
    $text = \preg_replace('%((?<![\p{L}\p{N}])[\'\'\-]|[\'\'\-](?![\p{L}\p{N}]))%u', ' ', $text);
    
    if (false !== \strpos($text, '-')) {
        // Remove words ending with -либо, -нибудь, -нить
        $text = \preg_replace('%\b[\p{L}\p{N}\-\']+\-(?:либо|нибу[дт]ь|нить)(?![\p{L}\p{N}\'\-])%u', '', $text);
        
        // Remove trailing particles such as -таки, -чуть, and any one- or two-letter Cyrillic suffix (-то, -ка)
        $text = \preg_replace('%(?<=[\p{L}\p{N}])(\-(?:таки|чуть|[а-я]{1,2}))+(?![\p{L}\p{N}\'\-])%u', '', $text);
    }
    
    // Replace everything except letters, digits, quotes and hyphens with a space (? and * also survive when not indexing)
    $text = \preg_replace('%(?![\'\'\-'.($indexing ? '' : '\?\*').'])[^\p{L}\p{N}]+%u', ' ', $text);
    
    // Compress multiple spaces
    $text = \preg_replace('% {2,}%', ' ', $text);
    
    return \trim($text . ' '. \implode(' ', $tags));
}

Text normalization ensures consistent indexing and searching regardless of input variations.
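To make the effect of these steps concrete, here is a minimal standalone sketch that applies a few of them to a sample string. The function name normalizeSample() is hypothetical and only two quote variants are mapped; it is a reduced illustration, not the cleanText() method itself.

```php
<?php
// Simplified, standalone sketch of a few cleanText() steps.
// normalizeSample() is a hypothetical helper, not part of ForkBB.
function normalizeSample(string $text): string
{
    // Map typographic apostrophes to a plain one (only two variants shown here)
    $text = \str_replace(['’', '`'], '\'', $text);
    // Treat the Russian letters ё and е as the same letter
    $text = \str_replace('ё', 'е', $text);
    // Collapse a letter repeated 4 or more times into a single letter
    $text = \preg_replace('%(\p{L})\1{3,}%u', '\1', $text);
    // Replace everything except letters, digits, apostrophes and hyphens with a space
    $text = \preg_replace('%(?![\'\-])[^\p{L}\p{N}]+%u', ' ', $text);
    // Compress runs of spaces
    $text = \preg_replace('% {2,}%', ' ', $text);

    return \trim($text);
}

echo normalizeSample('зелёный!!!  It’s sooooo cold...'), "\n";
// prints: зеленый It's so cold
```

Note that the sketch, like the method above, does not change letter case.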

CJK Support

The search system has extensive support for Chinese, Japanese, and Korean characters:

app/Models/Search/Search.php

const CJK_REGEX = '['.  
    '\x{1100}-\x{11FF}'.   // Hangul Jamo
    '\x{3130}-\x{318F}'.   // Hangul Compatibility Jamo
    '\x{AC00}-\x{D7AF}'.   // Hangul Syllables
    
    // Hiragana
    '\x{3040}-\x{309F}'.   // Hiragana
    
    // Katakana
    '\x{30A0}-\x{30FF}'.   // Katakana
    '\x{31F0}-\x{31FF}'.   // Katakana Phonetic Extensions
    
    // CJK Unified Ideographs
    '\x{2E80}-\x{2EFF}'.   // CJK Radicals Supplement
    '\x{2F00}-\x{2FDF}'.   // Kangxi Radicals
    '\x{2FF0}-\x{2FFF}'.   // Ideographic Description Characters
    '\x{3000}-\x{303F}'.   // CJK Symbols and Punctuation
    '\x{31C0}-\x{31EF}'.   // CJK Strokes
    '\x{3200}-\x{32FF}'.   // Enclosed CJK Letters and Months
    '\x{3400}-\x{4DBF}'.   // CJK Unified Ideographs Extension A
    '\x{4E00}-\x{9FFF}'.   // CJK Unified Ideographs
    '\x{20000}-\x{2A6DF}'. // CJK Unified Ideographs Extension B
    '\x{2A700}-\x{2B73F}'. // CJK Unified Ideographs Extension C
    '\x{2B740}-\x{2B81F}'. // CJK Unified Ideographs Extension D
    '\x{2B820}-\x{2CEAF}'. // CJK Unified Ideographs Extension E
    '\x{2CEB0}-\x{2EBEF}'. // CJK Unified Ideographs Extension F
    '\x{2F800}-\x{2FA1F}'. // CJK Compatibility Ideographs Supplement
    '\x{30000}-\x{3134F}'. // CJK Unified Ideographs Extension G
    '\x{31350}-\x{323AF}'. // CJK Unified Ideographs Extension H
    ']';

public function isCJKWord(string $word): bool
{
    return \preg_match('%^' . self::CJK_REGEX . '+$%u', $word) ? true : false;
}

CJK Word Handling

CJK text is treated differently from alphabetic text:

Each character is separated during text cleaning
CJK words bypass length restrictions
Individual characters can be searched
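The separation step can be sketched as follows. This example assumes a reduced character class (Hiragana, Katakana, and the main CJK Unified Ideographs block) in place of the full CJK_REGEX; SAMPLE_CJK and splitCjk() are hypothetical names used only for illustration.

```php
<?php
// Reduced character class for illustration only: Hiragana, Katakana and the
// main CJK Unified Ideographs block. The full CJK_REGEX covers many more ranges.
const SAMPLE_CJK = '[\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{4E00}-\x{9FFF}]';

// Surround every CJK character with spaces, then split on whitespace
function splitCjk(string $text): array
{
    $spaced = \preg_replace('%' . SAMPLE_CJK . '%u', ' \0 ', $text);

    return \preg_split('%\s+%u', \trim($spaced));
}

\print_r(splitCjk('ForkBB検索 test'));
// tokens: ForkBB, 検, 索, test
```

Each ideograph becomes its own token, which is why single CJK characters can be searched while Latin words still go through the length check.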

Word Processing

Word Validation

app/Models/Search/Search.php

public function word(string $word, bool $indexing = false): ?string
{
    // Check stopwords
    if (isset($this->c->stopwords->list[$word])) {
        return null;
    }
    
    // CJK words are always valid
    if ($this->isCJKWord($word)) {
        return $word;
    }
    
    // Check minimum length (3 characters)
    $len = \mb_strlen(\trim($word, '?*'), 'UTF-8');
    
    if ($len < 3) {
        return null;
    }
    
    // Truncate to maximum length (20 characters)
    if ($len > 20) {
        $word = \mb_substr($word, 0, 20, 'UTF-8');
    }
    
    return $word;
}

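Word Length Requirements: Regular words shorter than 3 characters are discarded, and words longer than 20 characters are truncated to their first 20 characters. Wildcard characters (? and *) at the start or end of a word do not count toward its length. CJK words are exempt from both limits.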
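A standalone sketch of the length rule, with the stopword and CJK checks left out, may help show the three outcomes (dropped, kept, truncated). checkLength() is a hypothetical name, and the sketch assumes the mbstring extension is available.

```php
<?php
// Standalone sketch of the length rule; checkLength() is a hypothetical name.
function checkLength(string $word): ?string
{
    // Leading and trailing wildcards do not count toward the length
    $len = \mb_strlen(\trim($word, '?*'), 'UTF-8');

    if ($len < 3) {
        return null; // too short: dropped
    }

    // Too long: keep only the first 20 characters
    return $len > 20 ? \mb_substr($word, 0, 20, 'UTF-8') : $word;
}

\var_dump(checkLength('ab*'));                 // null: only 2 characters besides the wildcard
\var_dump(checkLength('abc'));                 // kept unchanged
\var_dump(checkLength(\str_repeat('x', 25)));  // cut to the first 20 characters
```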

Extracting Words

app/Models/Search/Search.php

public function words(string $text, bool $indexing): array
{
    $text  = $this->cleanText($text, $indexing);
    $words = [];
    
    foreach (\explode(' ', $text) as $word) {
        $word = $this->word($word, $indexing);
        
        if (null !== $word) {
            $words[$word] = $word;
        }
    }
    
    return \array_values($words);
}
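The deduplication idiom used in words(), storing each word under its own key and then calling array_values(), can be seen in isolation in the sketch below. uniqueTokens() is a hypothetical helper that skips validation and only shows how repeated tokens collapse while first-seen order is kept.

```php
<?php
// Collect unique tokens in first-seen order by using each token as its own key.
// uniqueTokens() is a hypothetical helper for illustration only.
function uniqueTokens(string $cleaned): array
{
    $seen = [];

    foreach (\explode(' ', $cleaned) as $token) {
        if ('' !== $token) {
            $seen[$token] = $token; // a repeated token overwrites itself
        }
    }

    return \array_values($seen);
}

\print_r(uniqueTokens('search index search forum index'));
// tokens: search, index, forum
```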

Stopwords

Common words are filtered out to improve search relevance:

if (isset($this->c->stopwords->list[$word])) {
    return null;
}

Stopwords typically include:

Articles (a, an, the)
Prepositions (in, on, at)
Common verbs (is, are, was)
Pronouns (I, you, they)

Stopwords prevent searching for very common terms. Configure your stopword list based on your forum’s language.
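Because the check uses isset() on the word itself, the stopword list is evidently keyed by word. The sketch below assumes a small hypothetical list and shows that lookup pattern outside the container; $stopwords and filterStopwords() are illustrative names, not ForkBB APIs.

```php
<?php
// Hypothetical stopword list, keyed by word so isset() gives a constant-time lookup
$stopwords = \array_flip(['the', 'and', 'are', 'was', 'they']);

// Drop every word that appears as a key in the stopword list
function filterStopwords(array $words, array $stopwords): array
{
    return \array_values(\array_filter(
        $words,
        fn (string $w): bool => ! isset($stopwords[$w])
    ));
}

\print_r(filterStopwords(['the', 'search', 'index', 'was', 'rebuilt'], $stopwords));
// remaining words: search, index, rebuilt
```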

Search Execution

The search system uses separate handlers for different search types:

Search Actions

ActionP (Posts)

Searches post content using the full-text index:

// app/Models/Search/ActionP.php
// Searches through post messages
// Returns post IDs matching the query

ActionT (Topics)

Searches topic titles:

// app/Models/Search/ActionT.php
// Searches through topic subjects
// Returns topic IDs matching the query

ActionF (Forums)

Filters results by forum:

// app/Models/Search/ActionF.php
// Limits search to specific forums
// Combines with other search types

Search Preparation

// app/Models/Search/Prepare.php
// Validates and prepares search queries
// Handles wildcards and operators

Search Execution

// app/Models/Search/Execute.php
// Runs the prepared query
// Returns sorted, paginated results

Indexing

The search index is maintained automatically:

// app/Models/Search/Index.php
// Indexes new posts as they're created
// Updates index when posts are edited

Index Management

// app/Models/Search/TruncateIndex.php
// Clears the search index
// Used for rebuilding or maintenance

The search index should be rebuilt periodically or after bulk imports.

Pagination and Results

Result Links

app/Models/Search/Search.php

protected function getlink(): string
{
    return $this->c->Router->link($this->linkMarker, $this->linkArgs);
}

protected function getpagination(): array
{
    return $this->c->Func->paginate($this->numPages, $this->page, $this->linkMarker, $this->linkArgs);
}

public function hasPage(): bool
{
    return $this->page > 0 && $this->page <= $this->numPages;
}

Result Slicing

Efficient result handling for pagination:

app/Models/Search/Search.php

public function slice(string|array $data, int $offset, int $length): array
{
    if (\is_array($data)) {
        return \array_slice($data, $offset, $length);
    }
    
    // For comma-separated string of IDs
    $p = 0;
    $i = 0;
    
    // Skip to offset
    while ($i < $offset) {
        if (false === ($p = \strpos($data, ',', $p))) {
            return [];
        }
        ++$p;
        ++$i;
    }
    
    $e       = $p;
    $offset += $length;
    
    // Extract slice
    while ($i < $offset) {
        if (false === ($e = \strpos($data, ',', $e))) {
            return \array_map('\\intval', \explode(',', \substr($data, $p)));
        }
        ++$e;
        ++$i;
    }
    
    return \array_map('\\intval', \explode(',', \substr($data, $p, $e - $p - 1)));
}

public function count(string|array $data): int
{
    return \is_array($data) ? \count($data) : \substr_count($data, ',') + 1;
}

Results can be stored either as arrays or as comma-separated strings of IDs. The string form takes less memory than a PHP array of integers, and slice() and count() accept both.
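As a rough illustration of how a page number maps onto an offset and length, the sketch below pages through a comma-separated ID string. pageIds() and the page size of 3 are assumptions for the example; unlike slice(), this simplified version splits the whole string before slicing.

```php
<?php
// pageIds() is a hypothetical helper; $perPage is an assumed setting.
function pageIds(string $ids, int $page, int $perPage): array
{
    // Page 1 starts at offset 0, page 2 at $perPage, and so on
    $offset = ($page - 1) * $perPage;

    // Simple version: split everything, then slice.
    // slice() avoids the full split by scanning for commas instead.
    $all = \array_map('intval', \explode(',', $ids));

    return \array_slice($all, $offset, $perPage);
}

$ids      = '11,12,13,14,15,16,17';
// Same counting rule as count(): number of commas plus one
$numPages = (int) \ceil((\substr_count($ids, ',') + 1) / 3);

\print_r(pageIds($ids, 2, 3)); // IDs 14, 15, 16
echo $numPages, "\n";          // 3
```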

Wildcard Search

Searches support wildcards for partial matching:

* matches zero or more characters
? matches exactly one character

Examples:

test* matches “test”, “testing”, “tester”
t?st matches “test”, “tost”, “tast”

Wildcards are removed during indexing but preserved during searching.
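The wildcard semantics can be checked with a small sketch that translates a pattern into a regular expression. This only illustrates what * and ? mean; how the index is actually queried is not shown here, and wildcardMatches() is a hypothetical helper.

```php
<?php
// wildcardMatches() is a hypothetical helper that illustrates wildcard semantics.
function wildcardMatches(string $pattern, string $word): bool
{
    // Quote regex metacharacters, then turn the quoted wildcards into regex tokens:
    // "*" becomes ".*" (zero or more characters), "?" becomes "." (exactly one)
    $regex = \strtr(\preg_quote($pattern, '%'), ['\*' => '.*', '\?' => '.']);

    return 1 === \preg_match('%^' . $regex . '$%u', $word);
}

\var_dump(wildcardMatches('test*', 'testing')); // true
\var_dump(wildcardMatches('test*', 'test'));    // true
\var_dump(wildcardMatches('t?st', 'tost'));     // true
\var_dump(wildcardMatches('t?st', 'toast'));    // false: "?" matches one character only
```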

Hashtag Support

Hashtags are automatically detected and indexed:

$text = \preg_replace_callback(
    '%(?<=^|\s|\n|\r)#(?=[\p{L}\p{N}_]{3})[\p{L}\p{N}]+(?:_+[\p{L}\p{N}]+)*(?=$|\s|\n|\r|\.|,)%u',
    function ($matches) use (&$tags)  {
        $tags[] = $matches[0];
        return ' ';
    },
    $text
);

Hashtags must:

Start with #
Contain at least 3 characters after the #, made up of letters and digits, with underscores allowed only between them
Be preceded by whitespace or the start of the text
Be followed by whitespace, the end of the text, a period, or a comma
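To see these rules applied, the sketch below runs the same pattern through preg_match_all() on a sample sentence. HASHTAG_REGEX and extractHashtags() are hypothetical names introduced for the example; only the regular expression itself comes from the code above.

```php
<?php
// Same pattern as in cleanText(), stored in a constant for reuse
const HASHTAG_REGEX = '%(?<=^|\s|\n|\r)#(?=[\p{L}\p{N}_]{3})[\p{L}\p{N}]+(?:_+[\p{L}\p{N}]+)*(?=$|\s|\n|\r|\.|,)%u';

// extractHashtags() is a hypothetical helper that returns every hashtag in $text
function extractHashtags(string $text): array
{
    \preg_match_all(HASHTAG_REGEX, $text, $matches);

    return $matches[0];
}

\print_r(extractHashtags('#forkbb release notes, see #full_text. Not tags: #ab foo#bar'));
// found: #forkbb, #full_text
```

In the sample, #ab is rejected because it is too short, and foo#bar is rejected because the # is not preceded by whitespace.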

Search Deletion

Old searches are cleaned up periodically:

// app/Models/Search/Delete.php
// Removes expired search results
// Keeps database size manageable

Performance Optimization

Indexed Searches

Full-text indexes enable fast queries

Result Caching

Search results are cached temporarily

Word Filtering

Stopwords reduce index size

Efficient Slicing

slice() extracts one page of IDs from a comma-separated string without splitting the whole list into an array

Best Practices

Index Maintenance

Rebuild the search index after importing posts or if search results seem stale. Run index maintenance during low-traffic periods.

Stopword Configuration

Customize stopwords for your forum’s primary language. Too few stopwords bloat the index; too many prevent valid searches.

CJK Content

If your forum has CJK content, ensure proper character encoding (UTF-8) throughout your application.

Performance

For very large forums (millions of posts), consider external search solutions like Elasticsearch or Sphinx.

Related Features

Forums & Topics

Understand the content being searched

BBCode

How formatting affects search indexing
