The Core Primitive
Proteus implements a novel prediction market structure: text prediction scored by on-chain Levenshtein distance. Instead of betting on yes/no outcomes, participants predict the exact character sequence a public figure will post, and the closest match wins on a continuous gradient.
Binary Markets (Existing)
- Outcome space: 2 possibilities
- Information density: 1 bit
- Payoff function: Cliff (right or wrong)
- AI impact: Commoditizes as models converge on the same probability
Text Prediction (Proteus)
- Outcome space: ~10^554 possibilities
- Information density: ~1,840 bits
- Payoff function: Continuous gradient
- AI impact: Deepens as models compete on character-level precision
Levenshtein Distance Explained
The Levenshtein distance (edit distance) between two strings is the minimum number of single-character operations needed to transform one into the other. Three operations are allowed:
- Insertion of a character
- Deletion of a character
- Substitution of one character for another
Why This Metric?
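Levenshtein distance is a proper metric on the space of text predictions, satisfying three critical properties:
- Identity of indiscernibles: d(a, b) = 0 if and only if a = b
- Symmetry: d(a, b) = d(b, a)
- Triangle inequality: d(a, c) ≤ d(a, b) + d(b, c)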
Why These Properties Matter for Markets
Identity of indiscernibles ensures perfect predictions (distance 0) are unambiguously identified. No approximation - character-for-character exactness is the only way to achieve zero distance.
Symmetry ensures the scoring function doesn’t depend on the order of comparison. Computing distance(prediction, actual) gives the same result as distance(actual, prediction).
Triangle inequality ensures coherence: a prediction “close” to the actual text cannot simultaneously be “far” from another prediction that is also “close” to the actual text. This prevents pathological scoring scenarios.
Together, these properties create a continuous payoff surface where marginal improvements in prediction quality always translate to marginal improvements in expected payout. There is no “close enough” threshold - every edit counts.
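The three properties can be spot-checked on a small sample of strings. The following is a minimal Python sketch, not the contract code: `distance` is written straight from the recursive definition of edit distance, and `satisfies_metric_axioms` and the sample strings are illustrative names chosen here.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def distance(a: str, b: str) -> int:
    """Levenshtein distance, written directly from its recursive definition."""
    if not a:
        return len(b)  # insert every character of b
    if not b:
        return len(a)  # delete every character of a
    cost = 0 if a[-1] == b[-1] else 1
    return min(
        distance(a[:-1], b) + 1,         # deletion
        distance(a, b[:-1]) + 1,         # insertion
        distance(a[:-1], b[:-1]) + cost  # substitution or match
    )


def satisfies_metric_axioms(strings: list[str]) -> bool:
    """Exhaustively check the three properties on a finite sample."""
    for a, b in product(strings, repeat=2):
        if (distance(a, b) == 0) != (a == b):  # identity of indiscernibles
            return False
        if distance(a, b) != distance(b, a):   # symmetry
            return False
    for a, b, c in product(strings, repeat=3):
        if distance(a, c) > distance(a, b) + distance(b, c):  # triangle inequality
            return False
    return True


samples = ["", "GO", "go", "is GO", "confirmed", "we die", "dies"]
print(satisfies_metric_axioms(samples))
```

A finite check like this illustrates the properties but does not prove them; the comparison is case-sensitive, so “GO” and “go” are two edits apart.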
The On-Chain Algorithm
The smart contract implements Levenshtein distance using the Wagner-Fischer dynamic programming algorithm with space optimization.
Gas Costs (BASE L2 approximate):
- 50 characters each: ~400,000 gas
- 100 characters each: ~1,500,000 gas
- 280 characters each: ~9,000,000 gas
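For readers who want to see the shape of the computation behind these numbers, here is a minimal Python sketch of Wagner-Fischer with a common space optimization (keeping only two rows of the matrix). It is an illustration under assumptions, not the contract source: the two-row layout, the function name `levenshtein_two_row`, and the use of Python's `len` (which counts code points) are choices made here, while the 280 limit comes from the text.

```python
MAX_TEXT_LENGTH = 280  # limit quoted in the text


def levenshtein_two_row(a: str, b: str) -> int:
    """Wagner-Fischer edit distance using two rows instead of a full matrix."""
    if len(a) > MAX_TEXT_LENGTH or len(b) > MAX_TEXT_LENGTH:
        raise ValueError("text exceeds MAX_TEXT_LENGTH")
    if len(b) > len(a):
        a, b = b, a  # distance is symmetric, so keep the rows short

    prev = list(range(len(b) + 1))  # row 0: distance from "" to each prefix of b
    curr = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr[0] = i  # distance from a[:i] to ""
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev, curr = curr, prev  # reuse the old row as scratch space
    return prev[len(b)]


print(levenshtein_two_row("is GO for March", "confirmed for March"))
```

The inner loop runs len(a) × len(b) times: 2,500 cell updates at 50 characters each, 10,000 at 100, and 78,400 at 280, which is why cost grows roughly quadratically with text length.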
MAX_TEXT_LENGTH is set to 280 to prevent block gas limit DoS attacks.
Worked Example: Computing Distance
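Let’s compute the distance between the actual text of an Elon Musk post and a prediction of it.
Initialize Matrix
Create a matrix where rows represent characters in the actual text and columns represent characters in the prediction. The first row and first column hold 0, 1, 2, ... because turning an empty string into a prefix of length k takes k edits.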
Fill Matrix
For each cell (i, j), compute the minimum of:
- Cell above + 1 (deletion)
- Cell to the left + 1 (insertion)
- Diagonal cell + cost (substitution; cost = 0 if the characters match, 1 otherwise)
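Find Mismatches
After “Starship flight 2 ”, we have:
- Actual: is GO for March
- Predicted: confirmed for March
Both fragments end in “ for March”, so only “is GO” versus “confirmed” costs anything. One minimal edit script that turns the actual text into the prediction:
- Insert ‘c’
- Insert ‘o’
- Insert ‘n’
- Insert ‘f’
- Keep ‘i’ (match, no cost)
- Substitute ‘s’ → ‘r’
- Substitute ‘ ’ → ‘m’
- Substitute ‘G’ → ‘e’
- Substitute ‘O’ → ‘d’
That is 4 insertions and 4 substitutions, for a distance of 8 on this segment.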
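The matrix procedure can be run on the two fragments directly. The sketch below is illustrative Python rather than the contract code, and `edit_matrix` is a name chosen here; it builds the full matrix with rows for the actual text and columns for the prediction, as described under Initialize Matrix and Fill Matrix.

```python
def edit_matrix(actual: str, predicted: str) -> list[list[int]]:
    """Full Wagner-Fischer matrix: rows = actual text, columns = prediction."""
    rows, cols = len(actual) + 1, len(predicted) + 1
    m = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        m[i][0] = i  # delete i characters of the actual text
    for j in range(cols):
        m[0][j] = j  # insert j characters of the prediction
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if actual[i - 1] == predicted[j - 1] else 1
            m[i][j] = min(
                m[i - 1][j] + 1,        # cell above + 1 (deletion)
                m[i][j - 1] + 1,        # cell to the left + 1 (insertion)
                m[i - 1][j - 1] + cost  # diagonal cell + cost (substitution)
            )
    return m


matrix = edit_matrix("is GO for March", "confirmed for March")
print(matrix[-1][-1])  # bottom-right cell is the distance: 8
```

The bottom-right cell holds the distance between the complete strings; every other cell holds the distance between the corresponding prefixes.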
Market Lifecycle
Markets progress through five phases:
Creation
Anyone can create a market by calling createMarket(actorHandle, duration).
Constraints:
- Duration: 1 hour to 30 days
- No ETH required to create
- Creator address recorded for fee distribution
Submission Phase
Users submit predictions by calling createSubmission(marketId, predictedText) with an ETH stake.
Constraints:
- Minimum stake: 0.001 ETH
- Maximum text length: 280 characters
- Betting cutoff: 1 hour before market end
- Empty strings revert (use __NULL__ for silence prediction)
What happens:
- ETH transferred to contract
- Submission stored on-chain
- totalPool incremented
- SubmissionCreated event emitted
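The submission rules can be modeled off-chain as a short Python sketch. Everything here is hypothetical scaffolding for illustration (the `Market` class, the `create_submission` signature, the exact boundary treatment of the cutoff, and the error messages); only the numeric limits and the __NULL__ sentinel come from the text.

```python
from dataclasses import dataclass, field

MIN_STAKE_WEI = 10**15         # 0.001 ETH (1 ETH = 10**18 wei)
MAX_TEXT_LENGTH = 280          # characters
BETTING_CUTOFF_SECONDS = 3600  # 1 hour before market end
NULL_SENTINEL = "__NULL__"     # prediction that the actor posts nothing


@dataclass
class Market:
    """Hypothetical in-memory stand-in for on-chain market state."""
    end_time: int
    total_pool: int = 0
    submissions: list = field(default_factory=list)


def create_submission(market: Market, submitter: str, predicted_text: str,
                      stake_wei: int, now: int) -> int:
    """Validate a prediction and record it; returns the new submission ID."""
    if stake_wei < MIN_STAKE_WEI:
        raise ValueError("stake below minimum")
    if predicted_text == "":
        raise ValueError(f"empty prediction; use {NULL_SENTINEL} for silence")
    if len(predicted_text) > MAX_TEXT_LENGTH:
        raise ValueError("prediction too long")
    # Assumption: the cutoff instant itself is already too late.
    if now >= market.end_time - BETTING_CUTOFF_SECONDS:
        raise ValueError("betting cutoff has passed")
    market.submissions.append((submitter, predicted_text, stake_wei))
    market.total_pool += stake_wei
    return len(market.submissions) - 1  # IDs follow submission order


market = Market(end_time=100_000)
first = create_submission(market, "0xAlice", NULL_SENTINEL, 10**15, now=1_000)
print(first, market.total_pool)
```

Checks run before any state changes, so a rejected submission leaves the pool and the submission list untouched.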
Market End
When block.timestamp >= endTime, submissions are no longer accepted. The market enters a waiting state for oracle resolution.
Edge case: If only 1 submission exists, anyone can call refundSingleSubmission(marketId) for a full refund (no fee taken).
Oracle Resolution
The contract owner (currently a single EOA; decentralized oracle consensus is planned for the future) calls resolveMarket(marketId, actualText).
What happens:
- Requires minimum 2 submissions
- Iterates through all submissions
- Computes Levenshtein distance for each
- Selects winner: argmin(distance)
- Tie-breaking: first submitter wins (deterministic)
- Emits MarketResolved event with winner ID and distance
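The resolution steps above can be sketched in Python as follows. This is an illustration, not the contract: `resolve_market` here takes a plain list of predicted texts in submission order, and both function names are chosen for this example. The strict `<` comparison is what makes the earliest submission win a tie.

```python
def levenshtein(a: str, b: str) -> int:
    """Two-row Wagner-Fischer edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def resolve_market(submissions: list[str], actual_text: str) -> tuple[int, int]:
    """Return (winner_id, distance); submissions are in chronological order."""
    if len(submissions) < 2:
        raise ValueError("resolution requires at least 2 submissions")
    winner_id = 0
    best = levenshtein(submissions[0], actual_text)
    for submission_id in range(1, len(submissions)):
        d = levenshtein(submissions[submission_id], actual_text)
        if d < best:  # strict less-than: an equal distance never displaces
            winner_id, best = submission_id, d
    return winner_id, best


predictions = ["confirmed for March", "is GO for March!", "is GO for March?"]
print(resolve_market(predictions, "is GO for March"))  # (1, 1)
```

Submissions 1 and 2 are both one edit away from the actual text, and submission 1 wins because it came first. A silence market works the same way: resolving with “__NULL__” gives distance 0 to any submission that predicted “__NULL__”.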
Scoring Examples
Let’s examine six scenarios demonstrating different strategic outcomes:
Example 1: AI Roleplay Dominance
Market: What will @elonmusk post about Starship?
- Claude captures tone, structure, and vocabulary
- Gets characteristic phrases: “Humanity becomes multiplanetary or … trying”
- Misses: “is GO” vs “confirmed” (8 edits), “we die” vs “dies” (4 edits)
- Human understood the theme but theme doesn’t pay - exact wording does
- Random bot demonstrates anti-spam property: gibberish → distance ≈ max(len(a), len(b))
Example 2: Insider Information Edge
Market: What will @sama post about AGI?
- Insider heard the rehearsed phrase “we are now confident”
- AI generated plausible but incorrect “we now believe”
- That single phrase difference = 14-edit advantage
- Information asymmetry is priced continuously, not binary
Example 3: THE THESIS EXAMPLE - AI vs AI
Market: What will @sataborasu (Satya Nadella) post about Copilot?
Example 4: Predicting Silence
Market: What will @JensenHuang post?
Actual result: (nothing posted) - resolved with __NULL__
- Binary markets cannot express “this person will not post”
- The __NULL__ sentinel enables betting on inaction
- AI roleplay agents always generate text, so they are structurally incapable of predicting silence
- Distance 0 (exact match) = entire pool to null trader
Outcome Space Analysis
The combinatorial explosion of text prediction creates an AI-resistant market structure.
Mathematical Formulation
Binary market outcome space: 2 outcomes, which is log2(2) = 1 bit.
Text prediction outcome space (280-char ASCII): 95^280 ≈ 10^554 strings over the 95 printable ASCII characters, which is 280 × log2(95) ≈ 1,840 bits.
Information density ratio: 1,840 bits / 1 bit ≈ 1,840, so each text prediction market encodes approximately 1,840 times more information than a binary market.
For context, the number of atoms in the observable universe is estimated at ~10^80. The text prediction outcome space exceeds this by 474 orders of magnitude.
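These figures can be reproduced with a few lines of Python. This sketch assumes the outcome space is every string of exactly 280 characters over the 95 printable ASCII characters; `outcome_space_stats` is a helper name chosen here.

```python
import math

ALPHABET_SIZE = 95     # printable ASCII characters
MAX_TEXT_LENGTH = 280  # characters per prediction
ATOMS_EXPONENT = 80    # ~10^80 atoms in the observable universe


def outcome_space_stats(alphabet_size: int, length: int) -> tuple[float, float]:
    """Return (bits, orders of magnitude) for alphabet_size ** length outcomes."""
    bits = length * math.log2(alphabet_size)
    orders = length * math.log10(alphabet_size)
    return bits, orders


bits, orders = outcome_space_stats(ALPHABET_SIZE, MAX_TEXT_LENGTH)
binary_bits, _ = outcome_space_stats(2, 1)  # a yes/no market: 1 bit

print(round(bits))                    # 1840
print(round(orders))                  # 554
print(round(bits / binary_bits))      # 1840
print(round(orders) - ATOMS_EXPONENT) # 474
```

Working with logarithms avoids having to print the 554-digit integer 95^280, although Python's arbitrary-precision integers can compute it exactly if needed.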
Expected Distance for Random Strings
For two random strings of lengths m and n over alphabet A, the expected distance approaches max(m, n) as the alphabet grows:
E[d(a, b)] ≈ max(m, n)
Intuition: When the alphabet is large (e.g., 95 printable ASCII characters), the probability that any two random characters match is ~1.05%. With negligible match probability, almost every aligned position needs a substitution, and each surplus character of the longer string needs one insertion or deletion, giving distance ≈ max(m, n).
Implication for spam: Bots submitting random strings cannot get lucky. The metric itself functions as a spam filter - there is no shortcut in a character-level outcome space.
Empirical verification: In Example 1, the random bot prediction “a8j3kd9xmz pqlw7 MARS ufk2 rocket lol” achieves distance 72 against actual text of ~85 characters, close to the theoretical maximum.
Fee Distribution
When a market resolves with 2+ submissions, the winner receives 93% of the total pool. The 7% platform fee (700 basis points) is split as follows:

| Recipient | Share of Fee | Share of Volume |
|---|---|---|
| Genesis NFT Holders | 20.0% | 1.4% |
| Oracles | 28.6% | 2.0% |
| Market Creators | 14.3% | 1.0% |
| Node Operators | 14.3% | 1.0% |
| Builder Pool | 28.6% | 2.0% |
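The top-level 93/7 split, together with the single-submission refund described under Market End, can be sketched with integer basis-point arithmetic. This is a hedged illustration: `settle` is a hypothetical helper, amounts are in wei, and rounding the fee down is an assumption made here rather than documented contract behavior.

```python
FEE_BPS = 700             # 7% platform fee, from the text
BPS_DENOMINATOR = 10_000  # 100% expressed in basis points


def settle(total_pool_wei: int, submission_count: int) -> tuple[int, int]:
    """Return (amount paid to the winner or refunded, platform fee) in wei."""
    if submission_count < 1:
        raise ValueError("no submissions to settle")
    if submission_count == 1:
        return total_pool_wei, 0  # lone submitter: full refund, no fee
    # Assumption: integer division rounds the fee down.
    fee = total_pool_wei * FEE_BPS // BPS_DENOMINATOR
    return total_pool_wei - fee, fee


payout, fee = settle(10**18, submission_count=3)  # a 1 ETH pool
print(payout, fee)
```

Computing the payout as the pool minus the fee, rather than as a separate 93% multiplication, guarantees that the two amounts always add up to the pool exactly.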
Edge case: Markets with only 1 submission receive a full refund (no fee) when refundSingleSubmission() is called after market end. This prevents unfair fee extraction when there’s no competition.
Tie-Breaking Rules
When multiple submissions achieve the same minimum distance, the first submitter wins. The resolution algorithm uses a strict less-than comparison, so a later submission with an equal distance never displaces an earlier one.
Security Properties
Reentrancy Protection
All payment functions use OpenZeppelin’s nonReentrant modifier. Payouts follow the checks-effects-interactions pattern.
Pull-Based Fees
Fee recipients call withdrawFees() to claim accumulated fees. This prevents griefing via malicious contract fee recipients.
Gas Limit Protection
MAX_TEXT_LENGTH = 280 prevents DoS via excessively long strings that would exceed the block gas limit.
Deterministic Ordering
Tie-breaking by submission ID (chronological) is deterministic and transparent. The oracle has no discretion in winner selection.