How It Works

The Core Primitive

Proteus implements a novel prediction market structure: text prediction scored by on-chain Levenshtein distance. Instead of betting on yes/no outcomes, participants predict the exact character sequence a public figure will post, and the closest match wins on a continuous gradient.

Binary Markets (Existing)

Outcome space: 2 possibilitiesInformation density: 1 bitPayoff function: Cliff (right or wrong)AI impact: Commoditizes as models converge on same probability

Text Prediction (Proteus)

Outcome space: ~10^554 possibilitiesInformation density: 1,840 bitsPayoff function: Continuous gradientAI impact: Deepens as models compete on character-level precision

Levenshtein Distance Explained

The Levenshtein distance (edit distance) between two strings is the minimum number of single-character operations needed to transform one into the other. Three operations are allowed:

Insertion

Add a character: cat → ca**r**t (distance = 1)

Deletion

Remove a character: cart → cat (distance = 1)

Substitution

Replace a character: cat → c**u**t (distance = 1)

Why This Metric?

Levenshtein distance is a proper metric on the space of text predictions, satisfying three critical properties:

For all strings a, b, c: 1. Identity: d_L(a, b) = 0 ⟺ a = b 2. Symmetry: d_L(a, b) = d_L(b, a) 3. Triangle inequality: d_L(a, c) ≤ d_L(a, b) + d_L(b, c)

Why These Properties Matter for Markets

Identity of indiscernibles ensures perfect predictions (distance 0) are unambiguously identified. No approximation - character-for-character exactness is the only way to achieve zero distance.Symmetry ensures the scoring function doesn’t depend on the order of comparison. Computing distance(prediction, actual) gives the same result as distance(actual, prediction).Triangle inequality ensures coherence: a prediction “close” to the actual text cannot simultaneously be “far” from another prediction that is also “close” to the actual text. This prevents pathological scoring scenarios.Together, these properties create a continuous payoff surface where marginal improvements in prediction quality always translate to marginal improvements in expected payout. There is no “close enough” threshold - every edit counts.

The On-Chain Algorithm

The smart contract implements Levenshtein distance using the Wagner-Fischer dynamic programming algorithm with space optimization:

function levenshteinDistance(
    string memory a,
    string memory b
) public pure returns (uint256) {
    bytes memory bytesA = bytes(a);
    bytes memory bytesB = bytes(b);
    
    uint256 lenA = bytesA.length;
    uint256 lenB = bytesB.length;
    
    // Handle empty strings
    if (lenA == 0) return lenB;
    if (lenB == 0) return lenA;
    
    // Two-row optimization (O(min(m,n)) space instead of O(m*n))
    uint256[] memory prevRow = new uint256[](lenB + 1);
    uint256[] memory currRow = new uint256[](lenB + 1);
    
    // Initialize first row: [0, 1, 2, 3, ..., lenB]
    for (uint256 j = 0; j <= lenB; j++) {
        prevRow[j] = j;
    }
    
    // Fill the distance matrix
    for (uint256 i = 1; i <= lenA; i++) {
        currRow[0] = i;
        
        for (uint256 j = 1; j <= lenB; j++) {
            uint256 cost = (bytesA[i - 1] == bytesB[j - 1]) ? 0 : 1;
            
            // Minimum of: deletion, insertion, substitution
            uint256 deletion = prevRow[j] + 1;
            uint256 insertion = currRow[j - 1] + 1;
            uint256 substitution = prevRow[j - 1] + cost;
            
            currRow[j] = min3(deletion, insertion, substitution);
        }
        
        // Swap rows for next iteration
        (prevRow, currRow) = (currRow, prevRow);
    }
    
    return prevRow[lenB];
}

Gas Costs (BASE L2 approximate):

50 characters each: ~400,000 gas
100 characters each: ~1,500,000 gas
280 characters each: ~9,000,000 gas

The contract enforces MAX_TEXT_LENGTH = 280 to prevent block gas limit DoS attacks.

Worked Example: Computing Distance

Let’s compute the distance between two predictions for an Elon Musk post: Actual text:

Starship flight 2 is GO for March

Prediction:

Starship flight 2 confirmed for March

Initialize Matrix

Create a matrix where rows represent characters in the actual text and columns represent characters in the prediction:

     ""  S  t  a  r  s  h  i  p  ...
""    0  1  2  3  4  5  6  7  8  ...
S     1  
t     2
a     3
r     4
...

Fill Matrix

For each cell (i, j), compute the minimum of:

Cell above + 1 (deletion)
Cell to the left + 1 (insertion)
Diagonal cell + cost (substitution, cost = 0 if characters match)

The first 17 characters match exactly: “Starship flight 2”

Find Mismatches

After “Starship flight 2 ”, we have:

Actual: is GO for March
Predicted: confirmed for March

Operations needed:

Substitute ‘i’ → ‘c’
Substitute ‘s’ → ‘o’
Insert ‘n’
Substitute ’ ’ → ‘f’
Substitute ‘G’ → ‘i’
Substitute ‘O’ → ‘r’
Insert ‘m’
Insert ‘e’
Delete extra ‘d’

Total distance: 8 edits

The algorithm computes the minimum number of edits. There may be multiple paths to transform string A into string B, but Levenshtein distance always returns the shortest one.

Market Lifecycle

Markets progress through five phases:

Creation

Anyone can create a market by calling createMarket(actorHandle, duration):

function createMarket(
    string calldata _actorHandle,  // e.g., "@elonmusk"
    uint256 _duration              // seconds until market ends
) external returns (uint256 marketId)

Constraints:

Duration: 1 hour to 30 days
No ETH required to create
Creator address recorded for fee distribution

Submission Phase

Users submit predictions by calling createSubmission(marketId, predictedText) with ETH stake:

function createSubmission(
    uint256 _marketId,
    string calldata _predictedText
) external payable returns (uint256 submissionId)

Constraints:

Minimum stake: 0.001 ETH
Maximum text length: 280 characters
Betting cutoff: 1 hour before market end
Empty strings revert (use __NULL__ for silence prediction)

What happens:

ETH transferred to contract
Submission stored on-chain
totalPool incremented
SubmissionCreated event emitted

Market End

When block.timestamp >= endTime, submissions are no longer accepted. The market enters a waiting state for oracle resolution.Edge case: If only 1 submission exists, anyone can call refundSingleSubmission(marketId) for full refund (no fee taken).

Oracle Resolution

The contract owner (currently a single EOA, future: decentralized oracle consensus) calls resolveMarket(marketId, actualText):

function resolveMarket(
    uint256 _marketId,
    string calldata _actualText
) external onlyOwner

What happens:

Requires minimum 2 submissions
Iterates through all submissions
Computes Levenshtein distance for each
Selects winner: argmin(distance)
Tie-breaking: first submitter wins (deterministic)
Emits MarketResolved event with winner ID and distance

Payout

The winning submitter calls claimPayout(submissionId):

function claimPayout(uint256 _submissionId) external nonReentrant

Payout calculation:

totalPool = sum of all stakes
fee = floor(totalPool × 700 / 10000)  // 7%
payout = totalPool - fee               // 93% to winner

The fee is accumulated for pull-based withdrawal (prevents griefing if fee recipient is a malicious contract).

Scoring Examples

Let’s examine six scenarios demonstrating different strategic outcomes:

Example 1: AI Roleplay Dominance

Market: What will @elonmusk post about Starship? Actual text:

Starship flight 2 is GO for March. Humanity becomes multiplanetary or we die trying.

Starship flight 2 confirmed for March. Humanity becomes multiplanetary or dies trying.

Analysis:

Claude captures tone, structure, and vocabulary
Gets characteristic phrases: “Humanity becomes multiplanetary or … trying”
Misses: “is GO” vs “confirmed” (8 edits), “we die” vs “dies” (4 edits)
Human understood the theme but theme doesn’t pay - exact wording does
Random bot demonstrates anti-spam property: gibberish → distance ≈ max(len(a), len(b))

Example 2: Insider Information Edge

Market: What will @sama post about AGI? Actual text:

we are now confident AGI is achievable with current techniques. announcement soon.

we are now confident AGI is achievable with current techniques. big announcement soon.

Analysis:

Insider heard the rehearsed phrase “we are now confident”
AI generated plausible but incorrect “we now believe”
That single phrase difference = 14-edit advantage
Information asymmetry is priced continuously, not binary

Example 3: THE THESIS EXAMPLE - AI vs AI

Market: What will @sataborasu (Satya Nadella) post about Copilot? Actual text:

Copilot is now generating 46% of all new code at GitHub-connected enterprises. The AI transformation of software is just beginning.

Copilot is now generating 45% of all new code at GitHub-connected enterprises. The AI transformation of software is just beginning.

This is the thesis example. Two frontier AI models, same public training corpus, same prompt. Claude gets within 1 edit (only difference: “45%” vs “46%”). GPT gets within 8 edits (additionally substitutes “has just begun” for “is just beginning”). The 7-edit gap between frontier models determines the entire pool. In a binary market, both “predicted correctly” - both said Copilot generates lots of code. Binary markets would split nothing. Levenshtein rewards marginal calibration.

Example 4: Predicting Silence

Market: What will @JensenHuang post? Actual result: (nothing posted) - resolved with __NULL__

__NULL__

Analysis:

Binary markets cannot express “this person will not post”
The __NULL__ sentinel enables betting on inaction
AI roleplay agents always generate text - structurally incapable of predicting silence
Distance 0 (exact match) = entire pool to null trader

Outcome Space Analysis

The combinatorial explosion of text prediction creates an AI-resistant market structure:

Mathematical Formulation

Binary market outcome space:

|O_binary| = 2
Information content = log₂(2) = 1 bit

Text prediction outcome space (280-char ASCII):

|O_text| = Σ(k=0 to 280) 95^k ≈ 95^280 ≈ 10^554
Information content = 280 × log₂(95) ≈ 1,840 bits

Information density ratio:

I_text / I_binary = 1,840 / 1 = 1,840:1

Each text prediction market encodes approximately 1,840 times more information than a binary market.For context, the number of atoms in the observable universe is estimated at ~10^80. The text prediction outcome space exceeds this by 474 orders of magnitude.

Expected Distance for Random Strings

For two random strings of lengths m and n over alphabet A:

E[d_L(a, b)] → max(m, n) as |A| → ∞

Intuition: When the alphabet is large (e.g., 95 printable ASCII characters), the probability that any two random characters match is ~1.05%. With negligible match probability, the optimal edit strategy is to delete all of string A and insert all of string B, giving distance = max(m, n).Implication for spam: Bots submitting random strings cannot get lucky. The metric itself functions as a spam filter - there is no shortcut in a character-level outcome space.Empirical verification: In Example 1, the random bot prediction a8j3kd9xmz pqlw7 MARS ufk2 rocket lol achieves distance 72 against actual text of ~85 characters, close to the theoretical maximum.

Fee Distribution

When a market resolves with 2+ submissions, the winner receives 93% of the total pool. The 7% platform fee (700 basis points) is split:

Recipient	Share of Fee	Share of Volume
Genesis NFT Holders	20.0%	1.4%
Oracles	28.6%	2.0%
Market Creators	14.3%	1.0%
Node Operators	14.3%	1.0%
Builder Pool	28.6%	2.0%

Fees are accumulated in a pull-based collection mapping to prevent griefing attacks where a malicious fee recipient reverts transfers.

Edge case: Markets with only 1 submission receive a full refund (no fee) when refundSingleSubmission() is called after market end. This prevents unfair fee extraction when there’s no competition.

Tie-Breaking Rules

When multiple submissions achieve the same minimum distance, the first submitter wins. The resolution algorithm uses strict less-than comparison:

if (distance < minDistance) {  // Strict less-than
    minDistance = distance;
    winningId = subIds[i];
}

A later submission with equal distance does not displace an earlier one. This creates an incentive to submit early and confidently - waiting to see others’ submissions provides no advantage since all submissions are committed on-chain before resolution.

Security Properties

Reentrancy Protection

All payment functions use OpenZeppelin’s nonReentrant modifier. Payouts follow checks-effects-interactions pattern.

Pull-Based Fees

Fee recipients call withdrawFees() to claim accumulated fees. Prevents griefing via malicious contract fee recipients.

Gas Limit Protection

MAX_TEXT_LENGTH = 280 prevents DoS via excessively long strings that exceed block gas limit.

Deterministic Ordering

Tie-breaking by submission ID (chronological) is deterministic and transparent. No oracle discretion in winner selection.

Known Centralization Risk: The current resolution mechanism uses a single EOA (contract owner) to call resolveMarket(). This is the most significant centralization risk in the v0 prototype.Planned upgrade: Decentralized oracle consensus using commit-reveal, with slashing for dishonest oracles and screenshot proof requirements.

What’s Next?

API Reference

Complete function reference for PredictionMarketV2

Examples

6 worked examples with full transaction data

Architecture

System design and contract stack overview

Whitepaper

Full research paper with formal analysis

Get Started

Core Concepts

Using Proteus

Smart Contracts

Development

The Core Primitive

Binary Markets (Existing)

Text Prediction (Proteus)

Levenshtein Distance Explained

Why This Metric?

The On-Chain Algorithm

Worked Example: Computing Distance

Market Lifecycle

Scoring Examples

Example 1: AI Roleplay Dominance

Example 2: Insider Information Edge

Example 3: THE THESIS EXAMPLE - AI vs AI

Example 4: Predicting Silence

Outcome Space Analysis

Fee Distribution

Tie-Breaking Rules

Security Properties

Reentrancy Protection

Pull-Based Fees

Gas Limit Protection

Deterministic Ordering

What’s Next?

API Reference

Examples

Architecture

Whitepaper

Build docs developers (and LLMs) love

Get Started

Core Concepts

Using Proteus

Smart Contracts

Development

​The Core Primitive

Binary Markets (Existing)

Text Prediction (Proteus)

​Levenshtein Distance Explained

​Why This Metric?

​The On-Chain Algorithm

​Worked Example: Computing Distance

​Market Lifecycle

​Scoring Examples

​Example 1: AI Roleplay Dominance

​Example 2: Insider Information Edge

​Example 3: THE THESIS EXAMPLE - AI vs AI

​Example 4: Predicting Silence

​Outcome Space Analysis

​Fee Distribution

​Tie-Breaking Rules

​Security Properties

Reentrancy Protection

Pull-Based Fees

Gas Limit Protection

Deterministic Ordering

​What’s Next?

API Reference

Examples

Architecture

Whitepaper

Build docs developers (and LLMs) love

The Core Primitive

Levenshtein Distance Explained

Why This Metric?

The On-Chain Algorithm

Worked Example: Computing Distance

Market Lifecycle

Scoring Examples

Example 1: AI Roleplay Dominance

Example 2: Insider Information Edge

Example 3: THE THESIS EXAMPLE - AI vs AI

Example 4: Predicting Silence

Outcome Space Analysis

Fee Distribution

Tie-Breaking Rules

Security Properties

What’s Next?