What is Levenshtein Distance?
The Levenshtein distance between two strings is the minimum number of edit operations required to transform one into the other.Example Calculation
Let’s walk through a simple example:Real Market Example
From the Satya Nadella market (Example 5 in the whitepaper):Actual Text
Copilot is now generating 46% of all new code at GitHub-connected enterprises. The AI transformation of software is just beginning.Claude Prediction
Copilot is now generating 45% of all new code at GitHub-connected enterprises. The AI transformation of software is just beginning.Distance: 1 (single character difference: 5 → 6)Winner: Claude at distance 1. The 7-edit gap between Claude and GPT determines the entire pool. In a binary market, both would “predict correctly” and split nothing.
Mathematical Properties
Levenshtein distance is a proper metric on the space of strings. This mathematical property is essential for fair market design.Metric Properties
Identity
d(a, b) = 0 if and only if a = bPerfect predictions are unambiguously identified.
Symmetry
d(a, b) = d(b, a)Order doesn’t matter - prediction vs actual gives same distance.
Triangle Inequality
d(a, c) ≤ d(a, b) + d(b, c)Ensures coherent distance relationships.
Non-Negativity
d(a, b) ≥ 0 for all stringsDistances are always positive or zero.
Why This Matters
These properties guarantee:- No gaming: You can’t manipulate distances through intermediate strings
- Deterministic: Same inputs always produce same distance
- Fair: The closest prediction always wins, no ambiguity
On-Chain Implementation
Proteus computes Levenshtein distance entirely on-chain using the Wagner-Fischer dynamic programming algorithm.Algorithm Overview
PredictionMarketV2.sol:460-503
Complexity & Gas Costs
Time Complexity: O(m × n)
Time Complexity: O(m × n)
Where
m and n are the lengths of the two strings. For tweet-length text (280 characters each), this is approximately 78,400 operations.Space Complexity: O(min(m, n))
Space Complexity: O(min(m, n))
The two-row optimization reduces memory usage. We only store two rows of the matrix at a time instead of the full m × n matrix.
Gas Costs on BASE L2
Gas Costs on BASE L2
Approximate gas costs from benchmarking:
The contract enforces
| String Length (each) | Gas Cost |
|---|---|
| 50 characters | ~400,000 |
| 100 characters | ~1,500,000 |
| 280 characters | ~9,000,000 |
MAX_TEXT_LENGTH = 280 to prevent block gas limit DoS attacks.Continuous Gradient Payoff
Unlike binary markets (correct/incorrect), Levenshtein creates a continuous payoff surface.Distance vs Payout
The winner is determined by:- Perfect (d = 0)
- Near-Perfect (d = 1-5)
- High Quality (d = 10-20)
- Thematic (d = 50-100)
- Random (d ≈ max length)
Exact match. Takes entire pool (93% after fees).Example:
__NULL__ prediction when target doesn’t post.Lipschitz Continuity
The expected payoff is Lipschitz-continuous with respect to prediction quality:- Marginal improvements always translate to marginal payout improvements
- No “close enough” threshold
- Every edit counts
Anti-Bot Properties
Levenshtein distance naturally filters spam and random submissions.Expected Distance for Random Strings
For random strings over a large alphabet (like printable ASCII with 95 characters):Real Example: Bot Entropy
From Tim Cook market (Example 6):| Submitter | Prediction | Distance |
|---|---|---|
| AI Roleplay | Coherent Apple Intelligence message | 28 |
| Human (thematic) | “Tim will say something about privacy…“ | 53 |
| Random bot | x7g APPLE j2m PHONE kq9 BUY... | 65 |
| Degenerate bot | aaaaaaaaaaaaaaaaaaaaaa... | 73 |
Outcome Space Analysis
Levenshtein distance operates over a combinatorially explosive space.Binary Market
- Outcome space:
- Information content: 1 bit
Text Prediction Market
- Alphabet: 95 printable ASCII characters
- Max length: 280 characters (tweet length)
- Outcome space: ≈ 95^280 ≈ 10^554 possibilities
- Information content: ≈ 1,840 bits
Information density ratio: 1,840:1Each text prediction market encodes approximately 1,840 times more information than a binary market.
Comparison: Levenshtein vs Binary
Binary Markets
Payoff: Cliff function (right/wrong)Space: 2 outcomesAI Resistance: Collapses as models convergeExample: “Will Nadella post about Copilot? Yes/No”Both Claude and GPT say “Yes” → split pool
Levenshtein Markets
Payoff: Continuous gradientSpace: 10^554 outcomesAI Resistance: Deepens as models improveExample: “What will Nadella post about Copilot?”Claude at d=1, GPT at d=8 → 7-edit gap = entire pool
Key Takeaways
Proper Metric
Mathematical guarantees ensure fair, deterministic scoring
Continuous Payoff
Every character of precision is rewarded
Natural Spam Filter
Random strings achieve maximum distance automatically
On-Chain Computation
Wagner-Fischer algorithm runs entirely on BASE
1,840x Information
Massive outcome space vs binary markets
AI-Resistant
Markets deepen as models improve, not commoditize
Further Reading
Market Lifecycle
How markets progress from creation to resolution
Fee Structure
Platform fees and payout distribution
Academic Reference: Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), 707-710.Wagner-Fischer algorithm: Wagner, R. A., & Fischer, M. J. (1974). The string-to-string correction problem. Journal of the ACM, 21(1), 168-173.