What are Repositories?

Repositories are signed, authenticated data structures that store all of a user’s records in AT Protocol. Each user has one repository containing their posts, likes, follows, profile, and other data.

Why Repositories Matter

Self-Authenticating

Every repository is cryptographically signed, making it impossible to tamper with data without detection.

Portable

Users can export their entire repository and import it on a different server, maintaining their complete history.

Efficient Sync

Repositories use Merkle trees, allowing efficient synchronization by only transferring changed data.

Verifiable

Anyone can verify the integrity and authorship of repository data using cryptographic proofs.

Repository Structure

A repository consists of:
  1. Commit - Signed pointer to the current state
  2. MST (Merkle Search Tree) - Ordered tree of records
  3. Records - Individual data items (posts, likes, etc.)
  4. Blocks - CBOR-encoded data blocks

Merkle Search Tree (MST)

The MST is the core data structure that organizes records in a repository. It combines properties of:
  • Merkle Trees - Cryptographic verification
  • B-Trees - Efficient searching and insertion
  • Deterministic ordering - Same data always produces same tree

Key Properties

/**
 * MST characteristics:
 * - Keys are stored in lexicographic (byte) order
 * - Insert-order independent (deterministic)
 * - Each key is hashed; leading zero bits determine its layer
 * - ~4 fanout (2 bits of zero per layer)
 * - Uses SHA-256 for key hashing
 */

How It Works

The MST assigns each key to a layer derived from its hash:
  1. Hash each key with SHA-256
  2. Count leading zero bits in the hash
  3. Divide the count by 2 (rounding down) to get the tree layer
  4. More zeros = higher in the tree
// Example key hashing (hash value is illustrative, not the real digest):
// Key: "app.bsky.feed.post/abc123"
// Hash: 0x00F8A3... (8 leading zero bits)
// Layer: 4 (8 bits / 2 = layer 4)
This ensures:
  • Deterministic structure - Same records always produce same tree
  • Balanced tree - Probabilistically balanced by hash distribution
  • Efficient operations - O(log n) search, insert, delete
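
The layer calculation above can be sketched in a few lines. This is a minimal illustration, not the `@atproto/repo` implementation: the helper names `leadingZeroBits`, `layerFromHash`, and `mstLayer` are hypothetical, and it assumes Node's built-in `node:crypto` module for SHA-256.

```typescript
import { createHash } from 'node:crypto'

// Count leading zero bits in a byte array.
function leadingZeroBits(bytes: Uint8Array): number {
  let count = 0
  for (let i = 0; i < bytes.length; i++) {
    const byte = bytes[i]
    if (byte === 0) {
      count += 8
      continue
    }
    // Math.clz32 counts zeros in a 32-bit word; a byte sits in the low 8 bits.
    count += Math.clz32(byte) - 24
    break
  }
  return count
}

// Layer = floor(leading zero bits / 2), which gives a fanout of roughly 4.
function layerFromHash(hash: Uint8Array): number {
  return Math.floor(leadingZeroBits(hash) / 2)
}

// Hash the key with SHA-256 and derive its layer (hypothetical helper).
function mstLayer(key: string): number {
  const hash = createHash('sha256').update(key, 'utf8').digest()
  return layerFromHash(hash)
}
```

Because the layer depends only on the key's hash, inserting the same keys in any order yields the same layer assignment, which is what makes the tree shape deterministic.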

Using Repositories

Creating a Repository

import { MemoryBlockstore, Repo } from '@atproto/repo'
import { Secp256k1Keypair } from '@atproto/crypto'

// Create storage and a signing keypair
// (Keypair is an interface; Secp256k1Keypair is a concrete implementation)
const storage = new MemoryBlockstore()
const keypair = await Secp256k1Keypair.create({ exportable: true })

const did = 'did:plc:user123'

// Create repository with initial records
const repo = await Repo.create(
  storage,
  did,
  keypair,
  [
    {
      action: 'create',
      collection: 'app.bsky.feed.post',
      rkey: '3jzxvpqr2bc2a',
      record: {
        $type: 'app.bsky.feed.post',
        text: 'Hello AT Protocol!',
        createdAt: new Date().toISOString()
      }
    },
    {
      action: 'create',
      collection: 'app.bsky.actor.profile',
      rkey: 'self',
      record: {
        $type: 'app.bsky.actor.profile',
        displayName: 'Alice',
        description: 'AT Protocol enthusiast'
      }
    }
  ]
)

console.log('Repository CID:', repo.cid)
console.log('DID:', repo.did)

Reading Records

// Get a specific record by collection and rkey
// (at://did:plc:user123/app.bsky.feed.post/3jzxvpqr2bc2a)
const record = await repo.getRecord('app.bsky.feed.post', '3jzxvpqr2bc2a')

if (record !== null) {
  console.log('Post:', record)
}

// Walk records in a collection, stopping after 50
let count = 0
for await (const entry of repo.walkRecords('app.bsky.feed.post/')) {
  if (entry.collection !== 'app.bsky.feed.post' || count++ >= 50) break
  console.log(`${entry.collection}/${entry.rkey}: ${entry.record.text}`)
}

Writing Records

// Create a new record
const updated = await repo.applyWrites(
  {
    action: 'create',
    collection: 'app.bsky.feed.post',
    rkey: '3jzxvpqr2bc2b',
    record: {
      $type: 'app.bsky.feed.post',
      text: 'Another post!',
      createdAt: new Date().toISOString()
    }
  },
  keypair
)

console.log('New repository state:', updated.cid)

Updating Records

// Update an existing record
const updated = await repo.applyWrites(
  {
    action: 'update',
    collection: 'app.bsky.actor.profile',
    rkey: 'self',
    record: {
      $type: 'app.bsky.actor.profile',
      displayName: 'Alice Smith',
      description: 'Updated bio'
    }
  },
  keypair
)

Deleting Records

// Delete a record
const updated = await repo.applyWrites(
  {
    action: 'delete',
    collection: 'app.bsky.feed.post',
    rkey: '3jzxvpqr2bc2a'
  },
  keypair
)

Batch Operations

// Apply multiple writes in one commit
const updated = await repo.applyWrites(
  [
    {
      action: 'create',
      collection: 'app.bsky.feed.post',
      rkey: 'post1',
      record: { $type: 'app.bsky.feed.post', text: 'Post 1', createdAt: new Date().toISOString() }
    },
    {
      action: 'create',
      collection: 'app.bsky.feed.post',
      rkey: 'post2',
      record: { $type: 'app.bsky.feed.post', text: 'Post 2', createdAt: new Date().toISOString() }
    },
    {
      action: 'update',
      collection: 'app.bsky.actor.profile',
      rkey: 'self',
      record: { $type: 'app.bsky.actor.profile', displayName: 'Alice' }
    }
  ],
  keypair
)

Record Keys (rkeys)

Records are identified by their collection and rkey (record key):
// AT URI format:
// at://{did}/{collection}/{rkey}

// Example:
// at://did:plc:abc123/app.bsky.feed.post/3jzxvpqr2bc2a
//     └─────┬─────┘ └────────┬────────┘ └──────┬──────┘
//          DID            collection           rkey
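
To make the URI layout concrete, here is a small sketch that splits a record-level AT URI into its parts and derives the `collection/rkey` path used as the MST key. The names `AtUriParts`, `parseAtUri`, and `toMstKey` are hypothetical; production code would more likely use the `AtUri` class from `@atproto/syntax`, which also handles URIs this sketch rejects.

```typescript
interface AtUriParts {
  did: string
  collection: string
  rkey: string
}

// Hypothetical helper: split a record-level AT URI into its three parts.
function parseAtUri(uri: string): AtUriParts {
  const prefix = 'at://'
  if (uri.indexOf(prefix) !== 0) {
    throw new Error(`Not an AT URI: ${uri}`)
  }
  const parts = uri.slice(prefix.length).split('/')
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) {
    throw new Error(`Expected at://{did}/{collection}/{rkey}, got: ${uri}`)
  }
  return { did: parts[0], collection: parts[1], rkey: parts[2] }
}

// The repository path for a record is "{collection}/{rkey}".
function toMstKey(parts: AtUriParts): string {
  return `${parts.collection}/${parts.rkey}`
}
```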

TID-based Keys

Most records use TIDs (Timestamp Identifiers) as rkeys:
import { TID } from '@atproto/common'

// Generate a TID (time-based, k-sortable)
const rkey = TID.nextStr()
// Example: '3jzxvpqr2bc2a'

// TIDs are:
// - Timestamp-based (roughly sortable by creation time)
// - Collision-resistant
// - URL-safe base32 encoded
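
The sortability property follows from how a TID is laid out. The sketch below is an illustration under assumptions, not the library's code: it assumes a TID is a microsecond timestamp followed by a 10-bit clock identifier, encoded with the sortable base32 alphabet `234567abcdefghijklmnopqrstuvwxyz` into 11 + 2 characters. The function names are hypothetical; real code should call `TID.nextStr()`.

```typescript
const S32_ALPHABET = '234567abcdefghijklmnopqrstuvwxyz'

// Encode a non-negative integer in sortable base32 (most significant digit first).
function s32encode(value: number): string {
  let rest = value
  let out = ''
  while (rest > 0) {
    out = S32_ALPHABET.charAt(rest % 32) + out
    rest = Math.floor(rest / 32)
  }
  return out
}

// Left-pad with '2', the alphabet's zero digit.
function padWithZeroDigit(text: string, length: number): string {
  let out = text
  while (out.length < length) out = '2' + out
  return out
}

// Hypothetical encoder: 11 characters of timestamp + 2 characters of clock id.
function encodeTid(timestampMicros: number, clockId: number): string {
  if (!Number.isSafeInteger(timestampMicros) || timestampMicros < 0) {
    throw new RangeError('timestampMicros must be a non-negative safe integer')
  }
  if (!Number.isInteger(clockId) || clockId < 0 || clockId > 1023) {
    throw new RangeError('clockId must be an integer in 0..1023')
  }
  return padWithZeroDigit(s32encode(timestampMicros), 11) +
    padWithZeroDigit(s32encode(clockId), 2)
}
```

Since the alphabet is in ASCII order and the width is fixed, comparing two TIDs as strings gives the same result as comparing their timestamps, which is why TID rkeys list chronologically in the MST.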

Literal Keys

Some records use fixed keys:
// Profile always uses 'self':
// at://did:plc:abc123/app.bsky.actor.profile/self

// Defined in Lexicon:
{
  "type": "record",
  "key": "literal:self"
}

Commits

Each repository state is represented by a signed commit:
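interface Commit {
  did: string        // Repository owner
  version: 3         // Commit format version
  rev: string        // Revision (TID)
  prev: CID | null   // Previous commit (usually null in version 3)
  data: CID          // Pointer to MST root
  sig: Uint8Array    // Signature over the unsigned commit
}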
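
The idea of a signed commit can be sketched with Node's built-in `node:crypto` module. This is a simplified illustration under loud assumptions: JSON stands in for the DAG-CBOR encoding a real commit uses, CIDs are plain strings, and the signature is whatever `crypto.sign` produces rather than the exact format AT Protocol requires. `UnsignedCommit` and `commitBytes` are hypothetical names.

```typescript
import { generateKeyPairSync, sign, verify } from 'node:crypto'

// Simplified commit: CIDs are plain strings, JSON stands in for DAG-CBOR.
interface UnsignedCommit {
  did: string
  version: 3
  rev: string
  data: string
}

// Serialize with a fixed field order so the bytes are deterministic.
function commitBytes(commit: UnsignedCommit): Uint8Array {
  const fields = [commit.did, commit.version, commit.rev, commit.data]
  return new TextEncoder().encode(JSON.stringify(fields))
}

const { publicKey, privateKey } = generateKeyPairSync('ec', {
  namedCurve: 'secp256k1',
})

const commit: UnsignedCommit = {
  did: 'did:plc:user123',
  version: 3,
  rev: '3jzxvpqr2bc2a',
  data: 'example-mst-root-cid', // placeholder, not a real CID
}

// Sign the serialized commit, then verify it as a receiving party would.
const sig = sign('sha256', commitBytes(commit), privateKey)
const ok = verify('sha256', commitBytes(commit), publicKey, sig)

// Changing the MST root invalidates the signature.
const tamperedCommit: UnsignedCommit = { ...commit, data: 'other-root-cid' }
const tampered = verify('sha256', commitBytes(tamperedCommit), publicKey, sig)

console.log({ ok, tampered })
```

Because the commit points at the MST root and the root hash covers every record, one signature is enough to authenticate the whole repository state.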
Commit Lifecycle:
// Format a commit (doesn't apply it)
const commitData = await repo.formatCommit(
  writeOps,
  keypair
)

console.log('Commit CID:', commitData.cid)
console.log('Revision:', commitData.rev)
console.log('New blocks:', commitData.newBlocks.size)
console.log('Removed CIDs:', commitData.removedCids.size)

// Apply the commit
const updated = await repo.applyCommit(commitData)

CAR Files

Repositories are distributed as CAR (Content Addressable aRchive) files:
// Export repository to CAR format
const car = await repo.exportCar()

// CAR contains:
// - All MST nodes
// - All record blocks
// - Commit object
// - Organized by CID
CAR files enable:
  • Repository export - Users can download their complete data
  • Efficient sync - Only transfer changed blocks
  • Backup and migration - Portable repository format
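
As far as the CAR v1 layout goes, a file is a sequence of sections (a header, then one section per block), each prefixed by its length as an unsigned varint. The sketch below shows only that framing and assumes this layout; it does not decode the header or the CIDs, and the function names are hypothetical.

```typescript
// Encode a non-negative integer as an unsigned LEB128 varint.
function encodeVarint(value: number): Uint8Array {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError('value must be a non-negative safe integer')
  }
  const bytes: number[] = []
  let rest = value
  while (rest >= 0x80) {
    bytes.push((rest % 0x80) | 0x80) // low 7 bits, continuation bit set
    rest = Math.floor(rest / 0x80)
  }
  bytes.push(rest)
  return Uint8Array.from(bytes)
}

// Decode a varint at `offset`; returns the value and how many bytes it used.
function decodeVarint(
  bytes: Uint8Array,
  offset = 0,
): { value: number; length: number } {
  let value = 0
  let multiplier = 1
  for (let i = offset; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * multiplier
    if ((bytes[i] & 0x80) === 0) return { value, length: i - offset + 1 }
    multiplier *= 0x80
  }
  throw new Error('Truncated varint')
}

// Split a buffer of varint-length-prefixed sections into their payloads.
function readSections(bytes: Uint8Array): Uint8Array[] {
  const sections: Uint8Array[] = []
  let offset = 0
  while (offset < bytes.length) {
    const { value, length } = decodeVarint(bytes, offset)
    const start = offset + length
    if (start + value > bytes.length) throw new Error('Truncated section')
    sections.push(bytes.subarray(start, start + value))
    offset = start + value
  }
  return sections
}
```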

MST Operations

Direct MST usage (lower-level API):
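import { MST, MemoryBlockstore } from '@atproto/repo'

// recordCid, recordCid2, recordCid3, and newCid below are CIDs of record
// blocks that have already been written to storage
const storage = new MemoryBlockstore()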

// Create an MST
let mst = await MST.create(storage)

// Add entries
mst = await mst.add('app.bsky.feed.post/abc', recordCid)
mst = await mst.add('app.bsky.feed.post/def', recordCid2)
mst = await mst.add('app.bsky.feed.post/xyz', recordCid3)

// Get entry
const cid = await mst.get('app.bsky.feed.post/abc')

// Update entry
mst = await mst.update('app.bsky.feed.post/abc', newCid)

// Delete entry
mst = await mst.delete('app.bsky.feed.post/abc')

// List entries
const entries = await mst.list(50, 'app.bsky.feed.post/')

// Get MST root CID
const rootCid = await mst.getPointer()

Walking the Tree

// Walk all entries
for await (const entry of mst.walk()) {
  if (entry.isLeaf()) {
    console.log(`Key: ${entry.key}, Value: ${entry.value}`)
  }
}

// Walk from a specific key
for await (const leaf of mst.walkLeavesFrom('app.bsky.feed.post/abc')) {
  console.log(`${leaf.key}: ${leaf.value}`)
}

// List with prefix
const posts = await mst.listWithPrefix('app.bsky.feed.post/', 100)

Data Diff

Compute differences between repository states:
import { DataDiff } from '@atproto/repo'

// Compare two MST states
const diff = await DataDiff.of(newMst, oldMst)

console.log('New MST blocks:', diff.newMstBlocks)
console.log('New leaf CIDs:', diff.newLeafCids)
console.log('Removed CIDs:', diff.removedCids)
Useful for:
  • Computing repository updates
  • Generating sync payloads
  • Tracking changes

Proofs and Verification

MSTs support cryptographic proofs:
// Get proof for a specific record
const proof = await mst.getCoveringProof('app.bsky.feed.post/abc')

// Proof includes:
// - MST nodes on path to record
// - Sibling records (left and right)
// - Enough data to verify record existence

// Verify a record against proof
// (Typically done by receiving party)
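
The receiving side's check can be sketched as a walk from a trusted root hash down to the record. This is a toy model under stated assumptions: each node is encoded as a JSON array of child digests (real MST nodes are DAG-CBOR and referenced by CID), and `encodeNode` and `verifyPath` are hypothetical helpers. What it does show accurately is the principle: every block must hash to a digest named by its already-verified parent.

```typescript
import { createHash } from 'node:crypto'

const encoder = new TextEncoder()
const decoder = new TextDecoder()

function sha256Hex(bytes: Uint8Array): string {
  return createHash('sha256').update(bytes).digest('hex')
}

// Toy node encoding: a JSON array of child digests.
function encodeNode(childDigests: string[]): Uint8Array {
  return encoder.encode(JSON.stringify(childDigests))
}

// Verify a root-to-record path against a trusted root digest.
function verifyPath(
  trustedRoot: string,
  nodes: Uint8Array[],
  record: Uint8Array,
): boolean {
  let expected = trustedRoot
  for (let i = 0; i < nodes.length; i++) {
    // The node must hash to the digest its verified parent named.
    if (sha256Hex(nodes[i]) !== expected) return false
    const children: string[] = JSON.parse(decoder.decode(nodes[i]))
    const next =
      i + 1 < nodes.length ? sha256Hex(nodes[i + 1]) : sha256Hex(record)
    // The next block on the path must be listed as a child of this node.
    if (children.indexOf(next) === -1) return false
    expected = next
  }
  return sha256Hex(record) === expected
}
```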

Repository Sync Protocol

Repositories sync using the following protocol:
Event Format:
interface RepoCommit {
  seq: number           // Sequence number
  rebase: boolean       // If true, full repo resync needed
  tooBig: boolean       // If true, repo too large to include
  repo: string          // DID
  commit: CID           // Commit CID
  rev: string           // Revision (TID)
  since: string | null  // Previous revision
  blocks: Uint8Array    // CAR file of new blocks
  ops: RepoOp[]         // List of operations
  blobs: CID[]          // New blobs
  time: string          // Timestamp
}

interface RepoOp {
  action: 'create' | 'update' | 'delete'
  path: string          // collection/rkey
  cid: CID | null       // Record CID (null for deletes)
}
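
A consumer of these events might keep a local index of record paths and apply each event's `ops` in order. The sketch below is one possible shape, not part of any library: the types are trimmed copies of the interfaces above with CIDs as plain strings, `RepoIndex` is a hypothetical class, and treating a mismatch between `since` and the last applied `rev` as a signal to resync is an assumption about how a consumer could detect gaps.

```typescript
interface RepoOpLite {
  action: 'create' | 'update' | 'delete'
  path: string
  cid: string | null
}

interface CommitEventLite {
  seq: number
  repo: string
  rev: string
  since: string | null
  ops: RepoOpLite[]
}

// Hypothetical local index for a single repository.
class RepoIndex {
  private records = new Map<string, string>()
  private lastRev: string | null = null

  // Returns false (and changes nothing) when the event does not follow
  // the last applied revision, signalling that a resync is needed.
  apply(event: CommitEventLite): boolean {
    if (event.since !== this.lastRev) return false
    for (const op of event.ops) {
      if (op.action === 'delete') {
        this.records.delete(op.path)
      } else if (op.cid !== null) {
        this.records.set(op.path, op.cid)
      }
    }
    this.lastRev = event.rev
    return true
  }

  get(path: string): string | undefined {
    return this.records.get(path)
  }

  get size(): number {
    return this.records.size
  }
}
```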

Best Practices

  • For posts, likes, and other time-series data, use TID-based rkeys for chronological ordering.
  • Combine multiple operations into a single commit to reduce overhead and improve atomicity.
  • Use Lexicon validation to ensure records conform to their schemas before adding them to the repository.
  • Design your system to support repository export/import for user portability.
  • Large repositories can be expensive to sync. Consider archiving or pagination strategies.

Storage Backends

Repositories can use different storage implementations:
import { 
  MemoryBlockstore,    // In-memory (testing)
  SqliteBlockstore,    // SQLite (production)
  // Custom implementations possible
} from '@atproto/repo'

// Memory storage (ephemeral)
const memStorage = new MemoryBlockstore()

// Persistent storage
const sqlStorage = new SqliteBlockstore('path/to/db.sqlite')
Storage Interface:
interface Blockstore {
  has(cid: CID): Promise<boolean>
  get(cid: CID): Promise<Uint8Array>
  put(cid: CID, bytes: Uint8Array): Promise<void>
  delete(cid: CID): Promise<void>
  // ... additional methods
}
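
A custom backend only has to satisfy the interface. As a minimal sketch, the class below implements the four methods listed above on top of a `Map`. It is illustrative only: CIDs are simplified to strings, the elided "additional methods" are not covered, and `MapBlockstore` is a hypothetical name rather than an export of `@atproto/repo`.

```typescript
// Trimmed copy of the interface above, with CIDs simplified to strings.
interface Blockstore {
  has(cid: string): Promise<boolean>
  get(cid: string): Promise<Uint8Array>
  put(cid: string, bytes: Uint8Array): Promise<void>
  delete(cid: string): Promise<void>
}

// Hypothetical in-memory backend keyed by CID string.
class MapBlockstore implements Blockstore {
  private blocks = new Map<string, Uint8Array>()

  async has(cid: string): Promise<boolean> {
    return this.blocks.has(cid)
  }

  async get(cid: string): Promise<Uint8Array> {
    const bytes = this.blocks.get(cid)
    if (bytes === undefined) {
      throw new Error(`Block not found: ${cid}`)
    }
    return bytes
  }

  async put(cid: string, bytes: Uint8Array): Promise<void> {
    // Store a copy so later changes to the caller's buffer do not leak in.
    this.blocks.set(cid, bytes.slice())
  }

  async delete(cid: string): Promise<void> {
    this.blocks.delete(cid)
  }
}
```

Blocks are immutable once addressed by hash, so a backend never needs to update a block in place; `put` for an existing CID can safely overwrite identical bytes.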

Error Handling

import { 
  MissingBlockError,
  MissingBlocksError 
} from '@atproto/repo'

try {
  const record = await repo.getRecord('app.bsky.feed.post', '3jzxvpqr2bc2a')
} catch (error) {
  if (error instanceof MissingBlockError) {
    console.error('Block not found:', error.cid)
  } else if (error instanceof MissingBlocksError) {
    console.error('Multiple blocks missing:', error.cids)
  } else {
    console.error('Unexpected error:', error)
  }
}

Additional Resources

@atproto/repo Package

NPM package documentation

Repository Spec

Official repository specification

MST Paper

Academic paper on Merkle Search Trees

CAR Format

Content Addressable aRchive specification