Repositories and MST

What are Repositories?

Repositories are signed, authenticated data structures that store all of a user’s records in AT Protocol. Each user has one repository containing their posts, likes, follows, profile, and other data.

Why Repositories Matter

Self-Authenticating

Every repository commit is cryptographically signed, so any tampering with the data can be detected.

Portable

Users can export their entire repository and import it on a different server, maintaining their complete history.

Efficient Sync

Repositories are built on Merkle trees, so two copies can be synchronized by transferring only the blocks that changed.

Verifiable

Anyone can verify the integrity and authorship of repository data using cryptographic proofs.

Repository Structure

A repository consists of:

Commit - Signed pointer to the current state
MST (Merkle Search Tree) - Ordered tree of records
Records - Individual data items (posts, likes, etc.)
Blocks - CBOR-encoded data blocks

Merkle Search Tree (MST)

The MST is the core data structure that organizes records in a repository. It combines properties of:

Merkle Trees - Cryptographic verification
B-Trees - Efficient searching and insertion
Deterministic ordering - Same data always produces same tree

Key Properties

/**
 * MST characteristics:
 * - Keys are stored in lexicographic (byte-wise) order
 * - Insert-order independent (deterministic)
 * - Each key is hashed; the count of leading zero bits determines its layer
 * - Fanout of roughly 4 (2 bits of leading zeros per layer)
 * - Uses SHA-256 for key hashing
 */

How It Works

The MST places each key on a layer derived from the key's hash:

Hash each key with SHA-256
Count leading zero bits in the hash
Divide the count by 2 (rounding down) to get the tree layer
More zeros = higher in the tree

// Example key hashing (hash value shown is illustrative):
// Key: "app.bsky.feed.post/abc123"
// Hash: 0x00F8A3... (8 leading zero bits)
// Layer: 4 (8 bits / 2 bits per layer = layer 4)

This ensures:

Deterministic structure - Same records always produce same tree
Balanced tree - Probabilistically balanced by hash distribution
Efficient operations - O(log n) search, insert, delete
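
The layer calculation described above can be sketched in a few lines. This is a minimal illustration built on Node's `node:crypto` module, not the `@atproto/repo` implementation; the helper names `leadingZeroPairs` and `mstLayer` are made up for this example.

```typescript
import { createHash } from 'node:crypto'

// Count leading zero bits in 2-bit chunks; each chunk corresponds to one layer.
function leadingZeroPairs(hash: Uint8Array): number {
  let pairs = 0
  for (let i = 0; i < hash.length; i++) {
    const byte = hash[i]
    if (byte < 64) pairs++ // top 2 bits are zero
    if (byte < 16) pairs++ // top 4 bits are zero
    if (byte < 4) pairs++ // top 6 bits are zero
    if (byte === 0) {
      pairs++ // all 8 bits are zero, keep scanning the next byte
    } else {
      break
    }
  }
  return pairs
}

// Hash the key with SHA-256 and derive its layer from the digest.
function mstLayer(key: string): number {
  const digest = createHash('sha256').update(key, 'utf8').digest()
  return leadingZeroPairs(digest)
}

// Bytes 0x00 0xF8 start with 8 zero bits, which gives layer 4.
console.log(leadingZeroPairs(Uint8Array.from([0x00, 0xf8, 0xa3])))
console.log(mstLayer('app.bsky.feed.post/abc123'))
```

Because the layer depends only on the key, inserting the same keys in any order yields the same layer assignment, which is what makes the tree shape deterministic.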

Using Repositories

Creating a Repository

import { MemoryBlockstore, Repo } from '@atproto/repo'
import { Secp256k1Keypair } from '@atproto/crypto'

// Create storage and a signing keypair
const storage = new MemoryBlockstore()
const keypair = await Secp256k1Keypair.create({ exportable: true })

const did = 'did:plc:user123'

// Create repository with initial records
const repo = await Repo.create(
  storage,
  did,
  keypair,
  [
    {
      action: 'create',
      collection: 'app.bsky.feed.post',
      rkey: '3jzxvpqr2bc2a',
      record: {
        $type: 'app.bsky.feed.post',
        text: 'Hello AT Protocol!',
        createdAt: new Date().toISOString()
      }
    },
    {
      action: 'create',
      collection: 'app.bsky.actor.profile',
      rkey: 'self',
      record: {
        $type: 'app.bsky.actor.profile',
        displayName: 'Alice',
        description: 'AT Protocol enthusiast'
      }
    }
  ]
)

console.log('Repository CID:', repo.cid)
console.log('DID:', repo.did)

Reading Records

// Get a specific record by collection and rkey
// (AT URI: at://did:plc:user123/app.bsky.feed.post/3jzxvpqr2bc2a)
const record = await repo.getRecord('app.bsky.feed.post', '3jzxvpqr2bc2a')

console.log('Post:', record)

// List records in a collection
const posts = await repo.listRecords(
  'app.bsky.feed.post',
  { limit: 50 }
)

for (const post of posts.records) {
  console.log(`${post.uri}: ${post.value.text}`)
}

Writing Records

// Create a new record
const updated = await repo.applyWrites(
  {
    action: 'create',
    collection: 'app.bsky.feed.post',
    rkey: '3jzxvpqr2bc2b',
    record: {
      $type: 'app.bsky.feed.post',
      text: 'Another post!',
      createdAt: new Date().toISOString()
    }
  },
  keypair
)

console.log('New repository state:', updated.cid)

Updating Records

// Update an existing record
const updated = await repo.applyWrites(
  {
    action: 'update',
    collection: 'app.bsky.actor.profile',
    rkey: 'self',
    record: {
      $type: 'app.bsky.actor.profile',
      displayName: 'Alice Smith',
      description: 'Updated bio'
    }
  },
  keypair
)

Deleting Records

// Delete a record
const updated = await repo.applyWrites(
  {
    action: 'delete',
    collection: 'app.bsky.feed.post',
    rkey: '3jzxvpqr2bc2a'
  },
  keypair
)

Batch Operations

// Apply multiple writes in one commit
const updated = await repo.applyWrites(
  [
    {
      action: 'create',
      collection: 'app.bsky.feed.post',
      rkey: 'post1',
      record: { $type: 'app.bsky.feed.post', text: 'Post 1', createdAt: new Date().toISOString() }
    },
    {
      action: 'create',
      collection: 'app.bsky.feed.post',
      rkey: 'post2',
      record: { $type: 'app.bsky.feed.post', text: 'Post 2', createdAt: new Date().toISOString() }
    },
    {
      action: 'update',
      collection: 'app.bsky.actor.profile',
      rkey: 'self',
      record: { $type: 'app.bsky.actor.profile', displayName: 'Alice' }
    }
  ],
  keypair
)

Record Keys (rkeys)

Records are identified by their collection and rkey (record key):

// AT URI format:
// at://{did}/{collection}/{rkey}

// Example:
// at://did:plc:abc123/app.bsky.feed.post/3jzxvpqr2bc2a
//     └─────┬─────┘ └────────┬────────┘ └──────┬──────┘
//          DID            collection           rkey
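
To make the URI layout concrete, here is a small parsing sketch. It is an illustration only: `parseAtUri` and `AtUriParts` are hypothetical names, and the pattern handles just the DID-based record form shown above, not every valid AT URI.

```typescript
interface AtUriParts {
  did: string
  collection: string
  rkey: string
}

// Split a record AT URI into its three components.
function parseAtUri(uri: string): AtUriParts {
  const match = /^at:\/\/(did:[a-z]+:[^/]+)\/([^/]+)\/([^/]+)$/.exec(uri)
  if (!match) {
    throw new Error(`Not a record AT URI: ${uri}`)
  }
  return { did: match[1], collection: match[2], rkey: match[3] }
}

const parts = parseAtUri('at://did:plc:abc123/app.bsky.feed.post/3jzxvpqr2bc2a')
// The repository stores the record under the path "collection/rkey".
console.log(`${parts.collection}/${parts.rkey}`)
```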

TID-based Keys

Most records use TIDs (Timestamp Identifiers) as rkeys:

import { TID } from '@atproto/common'

// Generate a TID (time-based, k-sortable)
const rkey = TID.nextStr()
// Example: '3jzxvpqr2bc2a'

// TIDs are:
// - Timestamp-based (roughly sortable by creation time)
// - Collision-resistant
// - URL-safe base32 encoded
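
The sketch below shows why TIDs sort by creation time. It assumes the layout from the record key specification (a microsecond timestamp followed by a 10-bit clock identifier, encoded with the sortable base32 alphabet); `s32encode` and `tidFromParts` are illustrative helpers, not the `@atproto/common` API.

```typescript
const S32_ALPHABET = '234567abcdefghijklmnopqrstuvwxyz'

// Encode a non-negative integer as a fixed-width, sortable base32 string.
function s32encode(value: number, width: number): string {
  let out = ''
  let rest = value
  for (let i = 0; i < width; i++) {
    out = S32_ALPHABET.charAt(rest % 32) + out
    rest = Math.floor(rest / 32)
  }
  return out
}

// 11 characters of timestamp (microseconds) + 2 characters of clock id = 13 characters.
function tidFromParts(timestampMicros: number, clockId: number): string {
  return s32encode(timestampMicros, 11) + s32encode(clockId, 2)
}

const earlier = tidFromParts(1700000000000000, 7)
const later = tidFromParts(1700000000000001, 0)
console.log(earlier, later, earlier < later)
```

Because the timestamp occupies the most significant characters, plain string comparison of two TIDs orders them by time first and by clock identifier second.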

Literal Keys

Some records use fixed keys:

// Profile always uses 'self'
at://did:plc:abc123/app.bsky.actor.profile/self

// Defined in Lexicon:
{
  "type": "record",
  "key": "literal:self"
}

Commits

Each repository state is represented by a signed commit:

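interface Commit {
  did: string        // Repository owner
  version: 3         // Commit format version
  rev: string        // Revision (TID)
  prev: CID | null   // Previous commit CID (nullable)
  data: CID          // Pointer to MST root
  sig: Uint8Array    // Signature over the unsigned commit
}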

Commit Lifecycle:

// Format a commit (doesn't apply it)
const commitData = await repo.formatCommit(
  writeOps,
  keypair
)

console.log('Commit CID:', commitData.cid)
console.log('Revision:', commitData.rev)
console.log('New blocks:', commitData.newBlocks.size)
console.log('Removed CIDs:', commitData.removedCids.size)

// Apply the commit
const updated = await repo.applyCommit(commitData)

CAR Files

Repositories are distributed as CAR (Content Addressed aRchive) files:

// Export repository to CAR format
const car = await repo.exportCar()

// CAR contains:
// - All MST nodes
// - All record blocks
// - Commit object
// - Organized by CID

CAR files enable:

Repository export - Users can download their complete data
Efficient sync - Only transfer changed blocks
Backup and migration - Portable repository format

MST Operations

Direct MST usage (lower-level API):

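import { MemoryBlockstore, MST } from '@atproto/repo'

// recordCid, recordCid2, recordCid3, and newCid below stand for CIDs of
// record blocks that have already been written to storage
const storage = new MemoryBlockstore()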

// Create an MST
let mst = await MST.create(storage)

// Add entries
mst = await mst.add('app.bsky.feed.post/abc', recordCid)
mst = await mst.add('app.bsky.feed.post/def', recordCid2)
mst = await mst.add('app.bsky.feed.post/xyz', recordCid3)

// Get entry
const cid = await mst.get('app.bsky.feed.post/abc')

// Update entry
mst = await mst.update('app.bsky.feed.post/abc', newCid)

// Delete entry
mst = await mst.delete('app.bsky.feed.post/abc')

// List entries
const entries = await mst.list(50, 'app.bsky.feed.post/')

// Get MST root CID
const rootCid = await mst.getPointer()

Walking the Tree

// Walk all entries
for await (const entry of mst.walk()) {
  if (entry.isLeaf()) {
    console.log(`Key: ${entry.key}, Value: ${entry.value}`)
  }
}

// Walk from a specific key
for await (const leaf of mst.walkLeavesFrom('app.bsky.feed.post/abc')) {
  console.log(`${leaf.key}: ${leaf.value}`)
}

// List with prefix
const posts = await mst.listWithPrefix('app.bsky.feed.post/', 100)

Data Diff

Compute differences between repository states:

import { DataDiff } from '@atproto/repo'

// Compare two MST states
const diff = await DataDiff.of(newMst, oldMst)

console.log('New MST blocks:', diff.newMstBlocks)
console.log('New leaf CIDs:', diff.newLeafCids)
console.log('Removed CIDs:', diff.removedCids)

Useful for:

Computing repository updates
Generating sync payloads
Tracking changes
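
As a rough illustration of what a diff contains, the sketch below compares two flat key-to-CID maps. It is a simplification under stated assumptions: CIDs are plain strings, and `RecordIndex`, `IndexDiff`, and `diffIndexes` are hypothetical names. A tree-based diff can skip whole subtrees whose root hashes match, which this flat version does not attempt.

```typescript
// MST key ("collection/rkey") -> record CID, represented as a string here.
type RecordIndex = Map<string, string>

interface IndexDiff {
  adds: string[]
  updates: string[]
  deletes: string[]
}

function diffIndexes(prev: RecordIndex, curr: RecordIndex): IndexDiff {
  const diff: IndexDiff = { adds: [], updates: [], deletes: [] }
  curr.forEach((cid, key) => {
    const before = prev.get(key)
    if (before === undefined) {
      diff.adds.push(key) // key is new in curr
    } else if (before !== cid) {
      diff.updates.push(key) // same key, different content
    }
  })
  prev.forEach((_cid, key) => {
    if (!curr.has(key)) diff.deletes.push(key) // key disappeared
  })
  return diff
}

const before: RecordIndex = new Map([['app.bsky.feed.post/a', 'cid1']])
const after: RecordIndex = new Map([['app.bsky.feed.post/b', 'cid2']])
console.log(diffIndexes(before, after))
```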

Proofs and Verification

MSTs support cryptographic proofs:

// Get proof for a specific record
const proof = await mst.getCoveringProof('app.bsky.feed.post/abc')

// Proof includes:
// - MST nodes on path to record
// - Sibling records (left and right)
// - Enough data to verify record existence

// Verify a record against proof
// (Typically done by receiving party)
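
The idea behind verification can be shown with a toy hash-linked tree. This is a sketch under loud assumptions: nodes are JSON strings addressed by SHA-256 hex digests rather than CBOR blocks addressed by CIDs, the search visits every supplied node instead of following key order, and `ToyNode`, `sha256Hex`, and `verifyInclusion` are invented for the example.

```typescript
import { createHash } from 'node:crypto'

// Toy node: links to child nodes and to records by SHA-256 hex digest.
interface ToyNode {
  children: string[]
  leaves: { [key: string]: string }
}

const sha256Hex = (data: string): string =>
  createHash('sha256').update(data, 'utf8').digest('hex')

// Check that `key` maps to `recordHash` somewhere under the trusted `rootHash`,
// using only the blocks supplied in the proof.
function verifyInclusion(
  rootHash: string,
  key: string,
  recordHash: string,
  proofBlocks: string[],
): boolean {
  // Index blocks by their own hash, so a tampered block is never reachable
  // from a link that points at the original content.
  const byHash = new Map<string, string>()
  for (const block of proofBlocks) {
    byHash.set(sha256Hex(block), block)
  }
  const pending: string[] = [rootHash]
  while (pending.length > 0) {
    const raw = byHash.get(pending.pop() as string)
    if (raw === undefined) continue // block not included in the proof
    const node = JSON.parse(raw) as ToyNode
    if (node.leaves[key] === recordHash) return true
    for (const child of node.children) pending.push(child)
  }
  return false
}
```

The receiving party only needs to trust the root hash, which in a repository comes from the signed commit; every other block is checked by rehashing it.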

Repository Sync Protocol

Repositories sync using the following protocol.

Event Format:

interface RepoCommit {
  seq: number           // Sequence number
  rebase: boolean       // If true, full repo resync needed
  tooBig: boolean       // If true, repo too large to include
  repo: string          // DID
  commit: CID           // Commit CID
  rev: string           // Revision (TID)
  since: string | null  // Previous revision
  blocks: Uint8Array    // CAR file of new blocks
  ops: RepoOp[]         // List of operations
  blobs: CID[]          // New blobs
  time: string          // Timestamp
}

interface RepoOp {
  action: 'create' | 'update' | 'delete'
  path: string          // collection/rkey
  cid: CID | null       // Record CID (null for deletes)
}
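
A consumer of these events typically folds the `ops` list into some local view of the repository. The sketch below keeps a path-to-CID map up to date; it assumes CIDs are represented as strings, and `RepoOpLike` and `applyOps` are hypothetical names rather than part of any package.

```typescript
interface RepoOpLike {
  action: 'create' | 'update' | 'delete'
  path: string // collection/rkey
  cid: string | null // null for deletes
}

// Apply a commit's operations, in order, to a local path -> CID index.
function applyOps(index: Map<string, string>, ops: RepoOpLike[]): void {
  for (const op of ops) {
    if (op.action === 'delete') {
      index.delete(op.path)
    } else {
      if (op.cid === null) {
        throw new Error(`Missing CID for ${op.action} of ${op.path}`)
      }
      index.set(op.path, op.cid)
    }
  }
}

const index = new Map<string, string>()
applyOps(index, [
  { action: 'create', path: 'app.bsky.feed.post/3jzxvpqr2bc2a', cid: 'cid-a' },
  { action: 'update', path: 'app.bsky.feed.post/3jzxvpqr2bc2a', cid: 'cid-b' },
])
console.log(index.get('app.bsky.feed.post/3jzxvpqr2bc2a'))
```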

Best Practices

Use TIDs for time-based records

For posts, likes, and other time-series data, use TID-based rkeys for chronological ordering.

Batch writes when possible

Combine multiple operations into a single commit to reduce overhead and improve atomicity.

Validate records before writing

Use Lexicon validation to ensure records conform to their schemas before adding them to the repository.

Handle repository migrations

Design your system to support repository export/import for user portability.

Monitor repository size

Large repositories can be expensive to sync. Consider archiving or pagination strategies.

Storage Backends

Repositories can use different storage implementations:

import { 
  MemoryBlockstore,    // In-memory (testing)
  SqliteBlockstore,    // SQLite (production)
  // Custom implementations possible
} from '@atproto/repo'

// Memory storage (ephemeral)
const memStorage = new MemoryBlockstore()

// Persistent storage
const sqlStorage = new SqliteBlockstore('path/to/db.sqlite')

Storage Interface:

interface Blockstore {
  has(cid: CID): Promise<boolean>
  get(cid: CID): Promise<Uint8Array>
  put(cid: CID, bytes: Uint8Array): Promise<void>
  delete(cid: CID): Promise<void>
  // ... additional methods
}

Error Handling

import { 
  MissingBlockError,
  MissingBlocksError 
} from '@atproto/repo'

try {
  const record = await repo.getRecord('app.bsky.feed.post', '3jzxvpqr2bc2a')
} catch (error) {
  if (error instanceof MissingBlockError) {
    console.error('Block not found:', error.cid)
  } else if (error instanceof MissingBlocksError) {
    console.error('Multiple blocks missing:', error.cids)
  } else {
    console.error('Unexpected error:', error)
  }
}

Related Topics

Lexicons - Record schemas and validation
Identity - DIDs and repository ownership
Overview - AT Protocol architecture

Additional Resources

@atproto/repo Package

NPM package documentation

Repository Spec

Official repository specification

MST Paper

Academic paper on Merkle Search Trees

CAR Format

Content Addressed aRchive specification
