What are Repositories?
Repositories are signed, authenticated data structures that store all of a user’s records in AT Protocol. Each user has one repository containing their posts, likes, follows, profile, and other data.
Why Repositories Matter
Self-Authenticating
Every repository is cryptographically signed, making it impossible to tamper with data without detection.
Portable
Users can export their entire repository and import it on a different server, maintaining their complete history.
Efficient Sync
Repositories use Merkle trees, allowing efficient synchronization by only transferring changed data.
Verifiable
Anyone can verify the integrity and authorship of repository data using cryptographic proofs.
Repository Structure
A repository consists of:
- Commit - Signed pointer to the current state
- MST (Merkle Search Tree) - Ordered tree of records
- Records - Individual data items (posts, likes, etc.)
- Blocks - CBOR-encoded data blocks
Merkle Search Tree (MST)
The MST is the core data structure that organizes records in a repository. It combines properties of:
- Merkle Trees - Cryptographic verification
- B-Trees - Efficient searching and insertion
- Deterministic ordering - Same data always produces same tree
Key Properties
- Deterministic structure - Same records always produce the same tree
- Balanced tree - Probabilistically balanced by hash distribution
- Efficient operations - O(log n) search, insert, delete
How It Works
The MST assigns each key to a tree layer as follows:
- Hash each key with SHA-256
- Count the leading zero bits in the hash
- The number of leading zeros determines the tree layer
- More zeros = higher in the tree
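The layer assignment (hash the key, count leading zero bits, map the count to a layer) can be sketched as follows. This is a minimal illustration rather than the library's implementation: the function names are hypothetical, and the `bitsPerLayer = 2` default is an assumption based on the AT Protocol repository specification, which counts leading zeros in 2-bit chunks for a fanout of about 4.

```typescript
import { createHash } from "node:crypto";

// Count the leading zero bits of a byte sequence.
function leadingZeroBits(bytes: Uint8Array): number {
  let count = 0;
  for (let i = 0; i < bytes.length; i++) {
    const byte = bytes[i];
    if (byte === 0) {
      count += 8;
      continue;
    }
    // Math.clz32 counts zeros in a 32-bit word; a byte uses the low 8 bits.
    count += Math.clz32(byte) - 24;
    break;
  }
  return count;
}

// Hypothetical helper: map a record key to its tree layer.
// Assumption: two zero bits per layer (fanout of about 4).
function layerForKey(key: string, bitsPerLayer: number = 2): number {
  const digest = createHash("sha256").update(key, "utf8").digest();
  return Math.floor(leadingZeroBits(digest) / bitsPerLayer);
}

// MST keys take the form "<collection>/<rkey>".
const exampleKey = "app.bsky.feed.post/3jzfcijpj2z2a";
console.log(exampleKey, "-> layer", layerForKey(exampleKey));
```

Because the layer depends only on the key's hash, two parties that hold the same set of keys build the same tree regardless of insertion order.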
Using Repositories
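The operations covered in this section (creating a repository, reading, writing, updating, and deleting records, and batching writes) can be illustrated with a toy in-memory model. This sketch is not the `@atproto/repo` API: `ToyRepo`, `WriteOp`, and `applyWrites` are hypothetical names, and the model omits signing, the MST, and block storage. It only shows how a batch of writes can be applied atomically as a single revision.

```typescript
type RecordValue = { [field: string]: unknown };

type WriteOp =
  | { action: "create"; collection: string; rkey: string; record: RecordValue }
  | { action: "update"; collection: string; rkey: string; record: RecordValue }
  | { action: "delete"; collection: string; rkey: string };

class ToyRepo {
  did: string;
  revision = 0;
  private records = new Map<string, RecordValue>();

  constructor(did: string) {
    this.did = did;
  }

  // Read a record by collection and record key.
  get(collection: string, rkey: string): RecordValue | undefined {
    return this.records.get(collection + "/" + rkey);
  }

  // Apply every write or none of them, then bump the revision once.
  applyWrites(ops: WriteOp[]): void {
    const next = new Map<string, RecordValue>(this.records);
    ops.forEach((op) => {
      const path = op.collection + "/" + op.rkey;
      if (op.action === "create") {
        if (next.has(path)) throw new Error("record exists: " + path);
        next.set(path, op.record);
      } else if (op.action === "update") {
        if (!next.has(path)) throw new Error("record not found: " + path);
        next.set(path, op.record);
      } else {
        if (!next.delete(path)) throw new Error("record not found: " + path);
      }
    });
    // Only reached if no operation threw, so a failed batch changes nothing.
    this.records = next;
    this.revision += 1;
  }
}

const repo = new ToyRepo("did:example:alice");
repo.applyWrites([
  { action: "create", collection: "app.bsky.feed.post", rkey: "3jzfcijpj2z2a", record: { text: "Hello" } },
]);
repo.applyWrites([
  { action: "update", collection: "app.bsky.feed.post", rkey: "3jzfcijpj2z2a", record: { text: "Hello, world" } },
  { action: "create", collection: "app.bsky.feed.like", rkey: "3jzfcijpj2z2b", record: { subject: "example" } },
]);
console.log(repo.revision, repo.get("app.bsky.feed.post", "3jzfcijpj2z2a"));
```

Working on a copy of the record map and swapping it in at the end is what makes the batch all-or-nothing in this model.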
Creating a Repository
Reading Records
Writing Records
Updating Records
Deleting Records
Batch Operations
Record Keys (rkeys)
Records are identified by their collection and rkey (record key).
TID-based Keys
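A TID is a 13-character string that encodes a 64-bit value in a sortable base32 alphabet: a microsecond timestamp in the upper bits and a 10-bit clock identifier in the lower bits. The sketch below shows why such keys sort chronologically as plain strings. The function names are hypothetical, and the encoding is written from the TID layout as understood here, so treat it as an illustration rather than a reference implementation.

```typescript
// Sortable base32 alphabet: characters are in ascending ASCII order.
const B32_SORTABLE = "234567abcdefghijklmnopqrstuvwxyz";

// Encode a non-negative integer as a fixed-width base32 string.
function encodeSortable(value: number, width: number): string {
  let out = "";
  let rest = value;
  for (let i = 0; i < width; i++) {
    out = B32_SORTABLE.charAt(rest % 32) + out;
    rest = Math.floor(rest / 32);
  }
  return out;
}

// The low 10 bits (clock id) fill exactly 2 characters; the timestamp
// fills the remaining 11, so no 64-bit arithmetic is needed here.
function encodeTid(timestampMicros: number, clockId: number): string {
  return encodeSortable(timestampMicros, 11) + encodeSortable(clockId % 1024, 2);
}

const earlier = encodeTid(1700000000000000, 1);
const later = encodeTid(1700000000000001, 0);
// The later timestamp sorts after the earlier one even though its clock id is smaller.
console.log(earlier, later, earlier < later);
```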
Most records use TIDs (Timestamp Identifiers) as rkeys.
Literal Keys
Some records use fixed keys.
Commits
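A commit can be pictured as a small object that names the account, points at the MST root, and carries a signature over those fields. The sketch below signs and verifies such an object with Node's `crypto` module. Several parts are assumptions made for illustration: the field names follow the repository specification as understood here, JSON stands in for the DAG-CBOR encoding used in practice, the CID is a placeholder string, and the helper names are hypothetical.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

interface UnsignedCommit {
  did: string;         // account the repository belongs to
  version: number;     // repository format version
  data: string;        // CID of the MST root (placeholder string here)
  rev: string;         // revision identifier, a TID
  prev: string | null; // previous commit, if tracked
}

// Assumption: a fixed-order JSON array stands in for DAG-CBOR.
function encodeCommit(c: UnsignedCommit): Buffer {
  return Buffer.from(JSON.stringify([c.did, c.version, c.data, c.rev, c.prev]), "utf8");
}

function signCommit(c: UnsignedCommit, privateKey: KeyObject): Buffer {
  return sign("sha256", encodeCommit(c), privateKey);
}

function verifyCommit(c: UnsignedCommit, sig: Buffer, publicKey: KeyObject): boolean {
  return verify("sha256", encodeCommit(c), publicKey, sig);
}

// P-256 key pair generated for the example only.
const { publicKey, privateKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });

const commit: UnsignedCommit = {
  did: "did:example:alice",
  version: 3,
  data: "example-mst-root-cid",
  rev: "3jzfcijpj2z2a",
  prev: null,
};
const sig = signCommit(commit, privateKey);
console.log("valid signature:", verifyCommit(commit, sig, publicKey));
```

Because the signature covers the MST root, changing any record changes the root and invalidates the signature, which is what makes the repository self-authenticating.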
Each repository state is represented by a signed commit.
CAR Files
Repositories are distributed as CAR (Content Addressed aRchive) files. Common uses include:
- Repository export - Users can download their complete data
- Efficient sync - Only transfer changed blocks
- Backup and migration - Portable repository format
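A CAR (version 1) file is a sequence of length-prefixed sections: a header followed by one section per block, each prefixed with an unsigned varint length. The sketch below covers only that framing layer. The function names are hypothetical, and parsing the DAG-CBOR header or the CID at the start of each block section is left out.

```typescript
// Encode a non-negative integer as an unsigned varint (LEB128).
function encodeVarint(n: number): number[] {
  const out: number[] = [];
  let rest = n;
  while (rest >= 0x80) {
    out.push((rest % 0x80) | 0x80); // low 7 bits, continuation bit set
    rest = Math.floor(rest / 0x80);
  }
  out.push(rest);
  return out;
}

// Decode a varint starting at `offset`; return the value and the next offset.
function decodeVarint(bytes: Uint8Array, offset: number): { value: number; next: number } {
  let value = 0;
  let scale = 1;
  let i = offset;
  while (true) {
    if (i >= bytes.length) throw new Error("truncated varint");
    const byte = bytes[i++];
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return { value, next: i };
    scale *= 0x80;
  }
}

// Concatenate sections, each prefixed with its varint length.
function frameSections(sections: Uint8Array[]): Uint8Array {
  const out: number[] = [];
  sections.forEach((section) => {
    encodeVarint(section.length).forEach((b) => out.push(b));
    section.forEach((b) => out.push(b));
  });
  return Uint8Array.from(out);
}

// Split a framed byte stream back into its sections.
function readSections(bytes: Uint8Array): Uint8Array[] {
  const sections: Uint8Array[] = [];
  let offset = 0;
  while (offset < bytes.length) {
    const { value: size, next } = decodeVarint(bytes, offset);
    if (next + size > bytes.length) throw new Error("truncated section");
    sections.push(bytes.subarray(next, next + size));
    offset = next + size;
  }
  return sections;
}

const framed = frameSections([new Uint8Array(300).fill(7), Uint8Array.from([1, 2, 3])]);
console.log(readSections(framed).map((s) => s.length));
```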
MST Operations
The MST can also be used directly through a lower-level API.
Walking the Tree
Data Diff
Compute differences between repository states. Data diffs are useful for:
- Computing repository updates
- Generating sync payloads
- Tracking changes
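The result of a diff can be sketched by comparing two snapshots that map record paths to content identifiers. This is a simplified illustration with hypothetical names (`RepoSnapshot`, `diffSnapshots`): it compares flat maps, whereas a diff over real MSTs can skip any subtree whose root hash is unchanged.

```typescript
// A snapshot maps "<collection>/<rkey>" to the record's content identifier.
type RepoSnapshot = Map<string, string>;

interface DataDiff {
  adds: string[];    // paths present only in the new state
  updates: string[]; // paths whose content identifier changed
  deletes: string[]; // paths present only in the old state
}

function diffSnapshots(prev: RepoSnapshot, next: RepoSnapshot): DataDiff {
  const result: DataDiff = { adds: [], updates: [], deletes: [] };
  next.forEach((cid, path) => {
    const old = prev.get(path);
    if (old === undefined) result.adds.push(path);
    else if (old !== cid) result.updates.push(path);
  });
  prev.forEach((_cid, path) => {
    if (!next.has(path)) result.deletes.push(path);
  });
  // Sort for deterministic output regardless of insertion order.
  result.adds.sort();
  result.updates.sort();
  result.deletes.sort();
  return result;
}

const before: RepoSnapshot = new Map([
  ["app.bsky.feed.post/a", "cid-1"],
  ["app.bsky.feed.post/b", "cid-2"],
  ["app.bsky.feed.post/c", "cid-3"],
]);
const after: RepoSnapshot = new Map([
  ["app.bsky.feed.post/a", "cid-1"],
  ["app.bsky.feed.post/b", "cid-9"],
  ["app.bsky.feed.post/d", "cid-4"],
]);
console.log(diffSnapshots(before, after));
```

Only the added and updated entries need to be sent to a peer that already holds the earlier state.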
Proofs and Verification
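An inclusion proof in a hash-linked tree is the chain of nodes from a trusted root down to the record: each node must hash to an identifier that its parent links to. The sketch below verifies such a chain over a deliberately simplified node shape. `ProofNode`, `cidOf`, and `verifyInclusion` are hypothetical names, and JSON with a hex SHA-256 digest stands in for DAG-CBOR and CIDs, so this is not the MST node layout itself.

```typescript
import { createHash } from "node:crypto";

interface ProofNode {
  links: string[];      // identifiers of child nodes
  value: string | null; // record payload on leaf nodes
}

// Content identifier of a node: the hash of its serialized form.
function cidOf(node: ProofNode): string {
  return createHash("sha256").update(JSON.stringify([node.links, node.value])).digest("hex");
}

// path[0] must hash to the trusted root, every later node must be linked
// from the node before it, and the last node must carry the expected value.
function verifyInclusion(rootCid: string, path: ProofNode[], expected: string): boolean {
  if (path.length === 0) return false;
  if (cidOf(path[0]) !== rootCid) return false;
  for (let i = 1; i < path.length; i++) {
    if (path[i - 1].links.indexOf(cidOf(path[i])) === -1) return false;
  }
  return path[path.length - 1].value === expected;
}

const leaf: ProofNode = { links: [], value: "post-record" };
const sibling: ProofNode = { links: [], value: "like-record" };
const middle: ProofNode = { links: [cidOf(leaf), cidOf(sibling)], value: null };
const root: ProofNode = { links: [cidOf(middle)], value: null };

console.log(verifyInclusion(cidOf(root), [root, middle, leaf], "post-record"));
```

The verifier needs only the nodes on the path, not the whole tree, as long as it trusts the root identifier (for example, because it came from a signed commit).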
MSTs support cryptographic proofs.
Repository Sync Protocol
Repositories sync using the following protocol:
Event Format:
Best Practices
Use TIDs for time-based records
For posts, likes, and other time-series data, use TID-based rkeys for chronological ordering.
Batch writes when possible
Combine multiple operations into a single commit to reduce overhead and improve atomicity.
Validate records before writing
Use Lexicon validation to ensure records conform to schemas before adding to repository.
Handle repository migrations
Design your system to support repository export/import for user portability.
Monitor repository size
Large repositories can be expensive to sync. Consider archiving or pagination strategies.
Storage Backends
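A storage backend can be thought of as a content-addressed block store: blocks are stored under an identifier derived from their bytes, so any implementation with the same interface is interchangeable. The sketch below defines such an interface and an in-memory implementation. `BlockStore` and `InMemoryBlockStore` are hypothetical names, not the `@atproto/repo` API. The interface is synchronous for brevity, and a hex SHA-256 digest stands in for a CID.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal interface a storage backend would provide.
interface BlockStore {
  put(bytes: Uint8Array): string;            // store a block, return its identifier
  get(cid: string): Uint8Array | undefined;  // fetch a block by identifier
  has(cid: string): boolean;
}

class InMemoryBlockStore implements BlockStore {
  private blocks = new Map<string, Uint8Array>();

  put(bytes: Uint8Array): string {
    // The identifier is derived from the content, so identical blocks deduplicate.
    const cid = createHash("sha256").update(bytes).digest("hex");
    this.blocks.set(cid, bytes);
    return cid;
  }

  get(cid: string): Uint8Array | undefined {
    return this.blocks.get(cid);
  }

  has(cid: string): boolean {
    return this.blocks.has(cid);
  }
}

const store: BlockStore = new InMemoryBlockStore();
const blockId = store.put(Uint8Array.from([1, 2, 3]));
console.log(blockId, store.has(blockId));
```

A disk-backed or database-backed store would implement the same three methods, leaving the rest of the repository code unchanged.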
Repositories can use different storage implementations.
Error Handling
Related Topics
- Lexicons - Record schemas and validation
- Identity - DIDs and repository ownership
- Overview - AT Protocol architecture
Additional Resources
- @atproto/repo Package - NPM package documentation
- Repository Spec - Official repository specification
- MST Paper - Academic paper on Merkle Search Trees
- CAR Format - Content Addressed aRchive specification