Skip to main content
Databas uses B-tree structures to organize table data efficiently. Each table is stored as a B-tree where row IDs serve as keys and row data is stored in leaf pages.

B-tree structure

A Databas B-tree consists of two node types:
Leaf nodes store actual rows as (row_id, payload) pairs. They form the bottom level of the tree and contain all user data.
pub(crate) struct LeafCellRef<'a> {
    pub(crate) row_id: RowId,
    pub(crate) payload: &'a [u8],
}
The tree is always balanced - all leaf pages are at the same depth.

Tree navigation

Searching for a row traverses the tree from root to leaf using the separator keys.

Interior page routing

Given a target row ID, an interior page determines which child to descend into (table_page/interior.rs:74-92):
pub(crate) fn child_for_row_id(&self, row_id: RowId) -> TablePageResult<PageId> {
    let slot_count = usize::from(self.cell_count());
    let slot_index = match find_interior_row_id(self.page, row_id)? {
        SearchResult::Found(slot_index) => Some(slot_index),
        SearchResult::NotFound(insertion_index) => {
            (usize::from(insertion_index) < slot_count).then_some(insertion_index)
        }
    };

    if let Some(slot_index) = slot_index {
        return Ok(InteriorCell::try_deserialize_at_slot(self.page, slot_index)?.left_child);
    }

    Ok(self.rightmost_child())
}
The algorithm:
  1. Binary search for a separator ≥ target row ID
  2. If found, follow that cell’s left child pointer
  3. If not found but there’s a next separator, follow its left child
  4. Otherwise, follow the rightmost child pointer

Example traversal

Consider searching for row ID 25 in this tree:
        Interior (root)
        [10, 20, 30]
       /    |    |    \
     /     |    |      \
  Leaf   Leaf  Leaf   Leaf
  [1-9]  [10-19] [20-29] [30+]
Traversal steps:
  1. Start at root interior page
  2. Search separators: 25 falls between 20 and 30
  3. Follow cell with separator 30 → left child
  4. Land on leaf page containing rows 20-29
  5. Binary search within leaf for row 25

Cell ordering

Cells within a page are kept sorted by row ID using the slot directory. Both leaf and interior pages use binary search for lookups (table_page/leaf.rs:199-224):
fn find_leaf_row_id(page: &[u8; PAGE_SIZE], row_id: RowId) -> TablePageResult<SearchResult> {
    layout::validate(page, LEAF_SPEC)?;

    let cell_count = usize::from(layout::cell_count(page));
    let mut left = 0usize;
    let mut right = cell_count;

    while left < right {
        let mid = left + ((right - left) / 2);
        let mid_u16 = mid as u16;
        let cell = layout::cell_bytes_at_slot_on_valid_page(page, LEAF_SPEC, mid_u16)?;
        if cell.len() < LEAF_CELL_PREFIX_SIZE {
            return Err(TablePageError::CorruptCell { slot_index: mid_u16 });
        }
        let current_row_id = read_u64(cell, PAYLOAD_LEN_SIZE);

        match current_row_id.cmp(&row_id) {
            Ordering::Less => left = mid + 1,
            Ordering::Greater => right = mid,
            Ordering::Equal => return Ok(SearchResult::Found(mid_u16)),
        }
    }

    let insertion_index = left as u16;
    Ok(SearchResult::NotFound(insertion_index))
}
The search returns either:
  • SearchResult::Found(slot_index): Row exists at this slot
  • SearchResult::NotFound(insertion_index): Row doesn’t exist; this is where it would be inserted

Maintaining sort order

When inserting a new cell, the slot directory is shifted to maintain sorted order (table_page/layout.rs:211-243):
pub(super) fn insert_slot(
    page: &mut [u8; PAGE_SIZE],
    spec: PageSpec,
    insert_index: u16,
    cell_offset: u16,
) -> TablePageResult<()> {
    validate(page, spec)?;

    let cell_count = usize::from(cell_count(page));
    let insert_index_usize = usize::from(insert_index);
    if insert_index_usize > cell_count {
        return Err(TablePageError::CorruptPage(
            TablePageCorruptionKind::SlotIndexOutOfBounds
        ));
    }

    let new_count = cell_count + 1;
    let slot_dir_end_after = slot_dir_end_for_count(spec, new_count)?;
    let content_start = usize::from(content_start(page));
    if slot_dir_end_after > content_start {
        return Err(TablePageError::CorruptPage(
            TablePageCorruptionKind::SlotDirectoryOverlapsCellContent,
        ));
    }

    // Shift slots [insert_index..cell_count) one position right
    for slot in (insert_index_usize..cell_count).rev() {
        let offset = slot_offset(page, spec, slot as u16)?;
        write_slot_offset_raw(page, spec, slot + 1, offset)?;
    }

    write_slot_offset_raw(page, spec, insert_index_usize, cell_offset)?;

    set_cell_count(page, new_count as u16);
    Ok()
}
This ensures O(log n) lookup time despite the slot array being unsorted in memory.
The slot directory maintains logical order, but cells can be physically scattered in the cell-content region. This separation allows efficient updates without moving large payloads.

Leaf page operations

Leaf pages implement the core key-value operations.

Insert

Inserting a row (table_page/leaf.rs:124-132):
pub(crate) fn insert(&mut self, row_id: RowId, payload: &[u8]) -> TablePageResult<()> {
    let insertion_index = match find_leaf_row_id(self.page, row_id)? {
        SearchResult::Found(_) => return Err(TablePageError::DuplicateRowId { row_id }),
        SearchResult::NotFound(insertion_index) => insertion_index,
    };

    let cell_offset = insert_leaf_cell(self.page, row_id, payload)?;
    layout::insert_slot(self.page, LEAF_SPEC, insertion_index, cell_offset)
}
Steps:
  1. Search for row ID to find insertion point
  2. Reject if row ID already exists
  3. Encode and append cell to content region
  4. Insert slot at correct position
If space is insufficient, the page is defragmented automatically (table_page/leaf.rs:274-292). Searching for a row (table_page/leaf.rs:68-75):
pub(crate) fn search(&self, row_id: RowId) -> TablePageResult<Option<LeafCellRef<'a>>> {
    let slot_index = match find_leaf_row_id(self.page, row_id)? {
        SearchResult::Found(slot_index) => slot_index,
        SearchResult::NotFound(_) => return Ok(None),
    };

    LeafCellRef::try_deserialize_at_slot(self.page, slot_index).map(Some)
}
Returns None if the row doesn’t exist, or Some(cell) with a borrowed reference to the payload.

Update

Updating a row (table_page/leaf.rs:137-162):
pub(crate) fn update(&mut self, row_id: RowId, payload: &[u8]) -> TablePageResult<()> {
    let slot_index = match find_leaf_row_id(self.page, row_id)? {
        SearchResult::Found(slot_index) => slot_index,
        SearchResult::NotFound(_) => return Err(TablePageError::RowIdNotFound { row_id }),
    };

    let existing_cell = layout::cell_bytes_at_slot(self.page, LEAF_SPEC, slot_index)?;
    let existing_len = leaf_cell_len(existing_cell)
        .map_err(|_| TablePageError::CorruptCell { slot_index })?;
    let new_len = leaf_cell_encoded_len(payload)?;

    // Fast path: same size, overwrite in place
    if existing_len == new_len {
        let cell_offset = usize::from(layout::slot_offset(self.page, LEAF_SPEC, slot_index)?;
        let cell_end = cell_offset + existing_len;
        let cell = &mut self.page[cell_offset..cell_end];
        let payload_len = payload.len() as u16;

        cell[0..PAYLOAD_LEN_SIZE].copy_from_slice(&payload_len.to_le_bytes());
        cell[PAYLOAD_LEN_SIZE..LEAF_CELL_PREFIX_SIZE].copy_from_slice(&row_id.to_le_bytes());
        cell[LEAF_CELL_PREFIX_SIZE..].copy_from_slice(payload);
        return Ok(());
    }

    // Different size: allocate new cell and update slot
    let cell_offset = update_leaf_cell(self.page, row_id, payload)?;
    layout::set_slot_offset(self.page, LEAF_SPEC, slot_index, cell_offset)
}
Optimization: If the new payload has the same length, the cell is overwritten in place without reallocating.

Delete

Deleting a row (table_page/leaf.rs:167-174):
pub(crate) fn delete(&mut self, row_id: RowId) -> TablePageResult<()> {
    let slot_index = match find_leaf_row_id(self.page, row_id)? {
        SearchResult::Found(slot_index) => slot_index,
        SearchResult::NotFound(_) => return Err(TablePageError::RowIdNotFound { row_id }),
    };

    layout::remove_slot(self.page, LEAF_SPEC, slot_index)
}
Deletion removes the slot entry but leaves the cell content in place (creating a gap). The space is reclaimed when the page is defragmented.

Interior page operations

Interior pages manage routing information for tree traversal.

Insert separator

Adding a new separator key (table_page/interior.rs:167-176):
pub(crate) fn insert(&mut self, row_id: RowId, left_child: PageId) -> TablePageResult<()> {
    let insertion_index = match find_interior_row_id(self.page, row_id)? {
        SearchResult::Found(_) => return Err(TablePageError::DuplicateRowId { row_id }),
        SearchResult::NotFound(insertion_index) => insertion_index,
    };

    let cell = encode_interior_cell(left_child, row_id);
    let cell_offset = insert_interior_cell(self.page, &cell)?;
    layout::insert_slot(self.page, INTERIOR_SPEC, insertion_index, cell_offset)
}
Identical logic to leaf inserts, but with fixed-size cells.

Update child pointer

Updating a separator’s left child (table_page/interior.rs:181-197):
pub(crate) fn update(&mut self, row_id: RowId, left_child: PageId) -> TablePageResult<()> {
    let slot_index = match find_interior_row_id(self.page, row_id)? {
        SearchResult::Found(slot_index) => slot_index,
        SearchResult::NotFound(_) => return Err(TablePageError::RowIdNotFound { row_id }),
    };

    let existing_cell = layout::cell_bytes_at_slot(self.page, INTERIOR_SPEC, slot_index)?;
    if existing_cell.len() < INTERIOR_CELL_SIZE {
        return Err(TablePageError::CorruptCell { slot_index });
    }

    let cell_offset = usize::from(layout::slot_offset(self.page, INTERIOR_SPEC, slot_index)?;
    let cell = encode_interior_cell(left_child, row_id);
    let cell_end = cell_offset + INTERIOR_CELL_SIZE;
    self.page[cell_offset..cell_end].copy_from_slice(&cell);
    Ok()
}
Since interior cells are fixed-size, updates always overwrite in place.

Rightmost child

The rightmost child pointer lives in the page header (table_page/interior.rs:214-221):
pub(crate) fn rightmost_child(&self) -> PageId {
    layout::read_u64_at(self.page, layout::INTERIOR_RIGHTMOST_CHILD_OFFSET)
}

pub(crate) fn set_rightmost_child(&mut self, page_id: PageId) -> TablePageResult<()> {
    layout::validate(self.page, INTERIOR_SPEC)?;
    layout::write_u64_at(self.page, layout::INTERIOR_RIGHTMOST_CHILD_OFFSET, page_id);
    Ok()
}
This pointer is separate from the cell array because it handles all keys greater than the largest separator.

Space management

B-tree pages handle space exhaustion through defragmentation and eventual splits.

Page full errors

When a page cannot fit a new cell even after defragmentation, it returns PageFull (table_page/leaf.rs:268-271):
match layout::try_append_cell(page, LEAF_SPEC, &cell)? {
    Ok(offset) => Ok(offset),
    Err(SpaceError { needed, available }) => {
        Err(TablePageError::PageFull { needed, available })
    }
}
Higher layers must handle this by splitting the page.

Defragmentation trigger

Defragmentation is attempted automatically when space allocation fails (table_page/leaf.rs:260-264):
if let Ok(offset) = layout::try_append_cell(page, LEAF_SPEC, &cell)? {
    return Ok(offset);
}

defragment_leaf_page(page)?;
This reclaims space from deleted or updated cells before declaring the page full.

Maximum cell size

Cell payloads are limited by:
  1. Payload length field: 16-bit unsigned (max 65,535 bytes)
  2. Page size: Must fit cell + slot + header in 4092 bytes
For leaf pages, the practical maximum payload size is:
max_payload = PAGE_DATA_END - LEAF_HEADER_SIZE - 2 - LEAF_CELL_PREFIX_SIZE
max_payload = 4092 - 8 - 2 - 10 = 4072 bytes
Larger payloads would require overflow pages (not currently implemented).

Page type dispatch

Since pages can be either leaf or interior, Databas provides runtime dispatch enums (table_page/mod.rs:18-37):
pub(crate) enum TablePageRef<'a> {
    Leaf(TableLeafPageRef<'a>),
    Interior(TableInteriorPageRef<'a>),
}

impl<'a> TablePageRef<'a> {
    pub(crate) fn from_bytes(page: &'a [u8; PAGE_SIZE]) -> TablePageResult<Self> {
        match layout::page_type(page) {
            layout::LEAF_PAGE_TYPE => Ok(Self::Leaf(TableLeafPageRef::from_bytes(page)?)),
            layout::INTERIOR_PAGE_TYPE => {
                Ok(Self::Interior(TableInteriorPageRef::from_bytes(page)?))
            }
            page_type => Err(TablePageError::InvalidPageType { page_type }),
        }
    }
}
This allows working with pages of unknown type, checking the type byte at runtime.
The B-tree implementation is complete for single-page operations but does not yet implement page splitting, merging, or multi-level tree management.

Corruption detection

Multiple layers of validation protect against corruption:
  1. Page checksums: CRC32 verified on every disk read
  2. Header validation: Page type, cell count, content start checked
  3. Cell validation: Payload lengths and cell boundaries verified
  4. Slot validation: Slot indices must be in bounds
Corruption errors are surfaced through the TablePageError and TablePageCorruptionKind types, allowing higher layers to handle or report them appropriately.

Build docs developers (and LLMs) love