Overview
A suffix array is a sorted array of all suffixes of a string. It’s a space-efficient alternative to suffix trees, enabling fast pattern matching and string analysis.Time Complexity
- Build: O(n log n)
- Pattern Search: O(m log n)
- LCP Construction: O(n)
What is a Suffix Array?
For strings = "banana", the suffixes are:
[5, 3, 1, 0, 4, 2]
Implementation
Suffix Array Construction
string_suffix_array.cpp
LCP Array Construction
The LCP (Longest Common Prefix) array stores the length of the longest common prefix between consecutive suffixes in the sorted order.string_suffix_array.cpp
Basic Usage
- Build Suffix Array
- Build LCP Array
- Pattern Search
Advanced Applications
Longest Common Substring
Number of Distinct Substrings
Longest Repeated Substring
K-th Lexicographical Substring
Circular String Minimum Rotation
Suffix Array vs Other Structures
| Feature | Suffix Array | Suffix Tree | Trie |
|---|---|---|---|
| Build Time | O(n log n) | O(n) | O(Σ m) |
| Space | O(n) | O(n) | O(Σ m × alphabet) |
| Pattern Search | O(m log n) | O(m) | O(m) |
| LCP Queries | O(1) with RMQ | O(1) | ❌ |
| Memory Efficient | ✅ | ❌ | ❌ |
| Implementation | Medium | Complex | Simple |
When to use Suffix Array:
- Need space-efficient suffix structure
- Multiple pattern searches on same text
- Computing substring statistics
- Finding repeated patterns
- Bioinformatics (DNA sequence analysis)
- Less memory than suffix tree (5x)
- Simpler implementation
- Cache-friendly (better constants)
- Works well with LCP array
- Slower build than suffix tree
- Pattern search is O(m log n) vs O(m)
- Online construction not supported
Optimizations
Using Sparse Table for RMQ on LCP
Kasai’s Algorithm (Optimized LCP)
The implementation above uses Kasai’s algorithm, which is optimal O(n) and uses the observation that LCP values decrease by at most 1 between consecutive positions in the original string.Practice Problems
- Longest Repeated Substring (LeetCode 1062)
- Distinct Substrings (SPOJ - DISUBSTR)
- Longest Common Substring (SPOJ - LCS)
- Minimum Rotation (SPOJ - MINMOVE)
- Repeated DNA Sequences (LeetCode 187)
- Palindrome Pairs (LeetCode 336)
See Also
- String Hashing - Alternative for substring comparison
- Pattern Matching - KMP, Rabin-Karp for simpler searches