Databas uses B-tree structures to organize table data efficiently. Each table is stored as a B-tree where row IDs serve as keys and row data is stored in leaf pages.
B-tree structure
A Databas B-tree consists of two node types:
Leaf nodes
Interior nodes
Leaf nodes store actual rows as (row_id, payload) pairs. They form the bottom level of the tree and contain all user data.pub(crate) struct LeafCellRef<'a> {
pub(crate) row_id: RowId,
pub(crate) payload: &'a [u8],
}
Interior nodes store separator keys that route searches to child pages. Each cell contains a (left_child, row_id) pair.pub(crate) struct InteriorCell {
pub(crate) left_child: PageId,
pub(crate) row_id: RowId,
}
The tree is always balanced - all leaf pages are at the same depth.
Tree navigation
Searching for a row traverses the tree from root to leaf using the separator keys.
Interior page routing
Given a target row ID, an interior page determines which child to descend into (table_page/interior.rs:74-92):
pub(crate) fn child_for_row_id(&self, row_id: RowId) -> TablePageResult<PageId> {
let slot_count = usize::from(self.cell_count());
let slot_index = match find_interior_row_id(self.page, row_id)? {
SearchResult::Found(slot_index) => Some(slot_index),
SearchResult::NotFound(insertion_index) => {
(usize::from(insertion_index) < slot_count).then_some(insertion_index)
}
};
if let Some(slot_index) = slot_index {
return Ok(InteriorCell::try_deserialize_at_slot(self.page, slot_index)?.left_child);
}
Ok(self.rightmost_child())
}
The algorithm:
- Binary search for a separator ≥ target row ID
- If found, follow that cell’s left child pointer
- If not found but there’s a next separator, follow its left child
- Otherwise, follow the rightmost child pointer
Example traversal
Consider searching for row ID 25 in this tree:
Interior (root)
[10, 20, 30]
/ | | \
/ | | \
Leaf Leaf Leaf Leaf
[1-9] [10-19] [20-29] [30+]
Traversal steps:
- Start at root interior page
- Search separators: 25 falls between 20 and 30
- Follow cell with separator 30 → left child
- Land on leaf page containing rows 20-29
- Binary search within leaf for row 25
Cell ordering
Cells within a page are kept sorted by row ID using the slot directory.
Binary search
Both leaf and interior pages use binary search for lookups (table_page/leaf.rs:199-224):
fn find_leaf_row_id(page: &[u8; PAGE_SIZE], row_id: RowId) -> TablePageResult<SearchResult> {
layout::validate(page, LEAF_SPEC)?;
let cell_count = usize::from(layout::cell_count(page));
let mut left = 0usize;
let mut right = cell_count;
while left < right {
let mid = left + ((right - left) / 2);
let mid_u16 = mid as u16;
let cell = layout::cell_bytes_at_slot_on_valid_page(page, LEAF_SPEC, mid_u16)?;
if cell.len() < LEAF_CELL_PREFIX_SIZE {
return Err(TablePageError::CorruptCell { slot_index: mid_u16 });
}
let current_row_id = read_u64(cell, PAYLOAD_LEN_SIZE);
match current_row_id.cmp(&row_id) {
Ordering::Less => left = mid + 1,
Ordering::Greater => right = mid,
Ordering::Equal => return Ok(SearchResult::Found(mid_u16)),
}
}
let insertion_index = left as u16;
Ok(SearchResult::NotFound(insertion_index))
}
The search returns either:
SearchResult::Found(slot_index): Row exists at this slot
SearchResult::NotFound(insertion_index): Row doesn’t exist; this is where it would be inserted
Maintaining sort order
When inserting a new cell, the slot directory is shifted to maintain sorted order (table_page/layout.rs:211-243):
pub(super) fn insert_slot(
page: &mut [u8; PAGE_SIZE],
spec: PageSpec,
insert_index: u16,
cell_offset: u16,
) -> TablePageResult<()> {
validate(page, spec)?;
let cell_count = usize::from(cell_count(page));
let insert_index_usize = usize::from(insert_index);
if insert_index_usize > cell_count {
return Err(TablePageError::CorruptPage(
TablePageCorruptionKind::SlotIndexOutOfBounds
));
}
let new_count = cell_count + 1;
let slot_dir_end_after = slot_dir_end_for_count(spec, new_count)?;
let content_start = usize::from(content_start(page));
if slot_dir_end_after > content_start {
return Err(TablePageError::CorruptPage(
TablePageCorruptionKind::SlotDirectoryOverlapsCellContent,
));
}
// Shift slots [insert_index..cell_count) one position right
for slot in (insert_index_usize..cell_count).rev() {
let offset = slot_offset(page, spec, slot as u16)?;
write_slot_offset_raw(page, spec, slot + 1, offset)?;
}
write_slot_offset_raw(page, spec, insert_index_usize, cell_offset)?;
set_cell_count(page, new_count as u16);
Ok()
}
This ensures O(log n) lookup time despite the slot array being unsorted in memory.
The slot directory maintains logical order, but cells can be physically scattered in the cell-content region. This separation allows efficient updates without moving large payloads.
Leaf page operations
Leaf pages implement the core key-value operations.
Insert
Inserting a row (table_page/leaf.rs:124-132):
pub(crate) fn insert(&mut self, row_id: RowId, payload: &[u8]) -> TablePageResult<()> {
let insertion_index = match find_leaf_row_id(self.page, row_id)? {
SearchResult::Found(_) => return Err(TablePageError::DuplicateRowId { row_id }),
SearchResult::NotFound(insertion_index) => insertion_index,
};
let cell_offset = insert_leaf_cell(self.page, row_id, payload)?;
layout::insert_slot(self.page, LEAF_SPEC, insertion_index, cell_offset)
}
Steps:
- Search for row ID to find insertion point
- Reject if row ID already exists
- Encode and append cell to content region
- Insert slot at correct position
If space is insufficient, the page is defragmented automatically (table_page/leaf.rs:274-292).
Search
Searching for a row (table_page/leaf.rs:68-75):
pub(crate) fn search(&self, row_id: RowId) -> TablePageResult<Option<LeafCellRef<'a>>> {
let slot_index = match find_leaf_row_id(self.page, row_id)? {
SearchResult::Found(slot_index) => slot_index,
SearchResult::NotFound(_) => return Ok(None),
};
LeafCellRef::try_deserialize_at_slot(self.page, slot_index).map(Some)
}
Returns None if the row doesn’t exist, or Some(cell) with a borrowed reference to the payload.
Update
Updating a row (table_page/leaf.rs:137-162):
pub(crate) fn update(&mut self, row_id: RowId, payload: &[u8]) -> TablePageResult<()> {
let slot_index = match find_leaf_row_id(self.page, row_id)? {
SearchResult::Found(slot_index) => slot_index,
SearchResult::NotFound(_) => return Err(TablePageError::RowIdNotFound { row_id }),
};
let existing_cell = layout::cell_bytes_at_slot(self.page, LEAF_SPEC, slot_index)?;
let existing_len = leaf_cell_len(existing_cell)
.map_err(|_| TablePageError::CorruptCell { slot_index })?;
let new_len = leaf_cell_encoded_len(payload)?;
// Fast path: same size, overwrite in place
if existing_len == new_len {
let cell_offset = usize::from(layout::slot_offset(self.page, LEAF_SPEC, slot_index)?;
let cell_end = cell_offset + existing_len;
let cell = &mut self.page[cell_offset..cell_end];
let payload_len = payload.len() as u16;
cell[0..PAYLOAD_LEN_SIZE].copy_from_slice(&payload_len.to_le_bytes());
cell[PAYLOAD_LEN_SIZE..LEAF_CELL_PREFIX_SIZE].copy_from_slice(&row_id.to_le_bytes());
cell[LEAF_CELL_PREFIX_SIZE..].copy_from_slice(payload);
return Ok(());
}
// Different size: allocate new cell and update slot
let cell_offset = update_leaf_cell(self.page, row_id, payload)?;
layout::set_slot_offset(self.page, LEAF_SPEC, slot_index, cell_offset)
}
Optimization: If the new payload has the same length, the cell is overwritten in place without reallocating.
Delete
Deleting a row (table_page/leaf.rs:167-174):
pub(crate) fn delete(&mut self, row_id: RowId) -> TablePageResult<()> {
let slot_index = match find_leaf_row_id(self.page, row_id)? {
SearchResult::Found(slot_index) => slot_index,
SearchResult::NotFound(_) => return Err(TablePageError::RowIdNotFound { row_id }),
};
layout::remove_slot(self.page, LEAF_SPEC, slot_index)
}
Deletion removes the slot entry but leaves the cell content in place (creating a gap). The space is reclaimed when the page is defragmented.
Interior page operations
Interior pages manage routing information for tree traversal.
Insert separator
Adding a new separator key (table_page/interior.rs:167-176):
pub(crate) fn insert(&mut self, row_id: RowId, left_child: PageId) -> TablePageResult<()> {
let insertion_index = match find_interior_row_id(self.page, row_id)? {
SearchResult::Found(_) => return Err(TablePageError::DuplicateRowId { row_id }),
SearchResult::NotFound(insertion_index) => insertion_index,
};
let cell = encode_interior_cell(left_child, row_id);
let cell_offset = insert_interior_cell(self.page, &cell)?;
layout::insert_slot(self.page, INTERIOR_SPEC, insertion_index, cell_offset)
}
Identical logic to leaf inserts, but with fixed-size cells.
Update child pointer
Updating a separator’s left child (table_page/interior.rs:181-197):
pub(crate) fn update(&mut self, row_id: RowId, left_child: PageId) -> TablePageResult<()> {
let slot_index = match find_interior_row_id(self.page, row_id)? {
SearchResult::Found(slot_index) => slot_index,
SearchResult::NotFound(_) => return Err(TablePageError::RowIdNotFound { row_id }),
};
let existing_cell = layout::cell_bytes_at_slot(self.page, INTERIOR_SPEC, slot_index)?;
if existing_cell.len() < INTERIOR_CELL_SIZE {
return Err(TablePageError::CorruptCell { slot_index });
}
let cell_offset = usize::from(layout::slot_offset(self.page, INTERIOR_SPEC, slot_index)?;
let cell = encode_interior_cell(left_child, row_id);
let cell_end = cell_offset + INTERIOR_CELL_SIZE;
self.page[cell_offset..cell_end].copy_from_slice(&cell);
Ok()
}
Since interior cells are fixed-size, updates always overwrite in place.
Rightmost child
The rightmost child pointer lives in the page header (table_page/interior.rs:214-221):
pub(crate) fn rightmost_child(&self) -> PageId {
layout::read_u64_at(self.page, layout::INTERIOR_RIGHTMOST_CHILD_OFFSET)
}
pub(crate) fn set_rightmost_child(&mut self, page_id: PageId) -> TablePageResult<()> {
layout::validate(self.page, INTERIOR_SPEC)?;
layout::write_u64_at(self.page, layout::INTERIOR_RIGHTMOST_CHILD_OFFSET, page_id);
Ok()
}
This pointer is separate from the cell array because it handles all keys greater than the largest separator.
Space management
B-tree pages handle space exhaustion through defragmentation and eventual splits.
Page full errors
When a page cannot fit a new cell even after defragmentation, it returns PageFull (table_page/leaf.rs:268-271):
match layout::try_append_cell(page, LEAF_SPEC, &cell)? {
Ok(offset) => Ok(offset),
Err(SpaceError { needed, available }) => {
Err(TablePageError::PageFull { needed, available })
}
}
Higher layers must handle this by splitting the page.
Defragmentation trigger
Defragmentation is attempted automatically when space allocation fails (table_page/leaf.rs:260-264):
if let Ok(offset) = layout::try_append_cell(page, LEAF_SPEC, &cell)? {
return Ok(offset);
}
defragment_leaf_page(page)?;
This reclaims space from deleted or updated cells before declaring the page full.
Maximum cell size
Cell payloads are limited by:
- Payload length field: 16-bit unsigned (max 65,535 bytes)
- Page size: Must fit cell + slot + header in 4092 bytes
For leaf pages, the practical maximum payload size is:
max_payload = PAGE_DATA_END - LEAF_HEADER_SIZE - 2 - LEAF_CELL_PREFIX_SIZE
max_payload = 4092 - 8 - 2 - 10 = 4072 bytes
Larger payloads would require overflow pages (not currently implemented).
Page type dispatch
Since pages can be either leaf or interior, Databas provides runtime dispatch enums (table_page/mod.rs:18-37):
pub(crate) enum TablePageRef<'a> {
Leaf(TableLeafPageRef<'a>),
Interior(TableInteriorPageRef<'a>),
}
impl<'a> TablePageRef<'a> {
pub(crate) fn from_bytes(page: &'a [u8; PAGE_SIZE]) -> TablePageResult<Self> {
match layout::page_type(page) {
layout::LEAF_PAGE_TYPE => Ok(Self::Leaf(TableLeafPageRef::from_bytes(page)?)),
layout::INTERIOR_PAGE_TYPE => {
Ok(Self::Interior(TableInteriorPageRef::from_bytes(page)?))
}
page_type => Err(TablePageError::InvalidPageType { page_type }),
}
}
}
This allows working with pages of unknown type, checking the type byte at runtime.
The B-tree implementation is complete for single-page operations but does not yet implement page splitting, merging, or multi-level tree management.
Corruption detection
Multiple layers of validation protect against corruption:
- Page checksums: CRC32 verified on every disk read
- Header validation: Page type, cell count, content start checked
- Cell validation: Payload lengths and cell boundaries verified
- Slot validation: Slot indices must be in bounds
Corruption errors are surfaced through the TablePageError and TablePageCorruptionKind types, allowing higher layers to handle or report them appropriately.