BinaryDB persists data to disk using Python’s pickle module with atomic file replacement to ensure data integrity. This page explains the persistence mechanism in detail.

Overview

BinaryDB stores all data in memory as a Python dictionary and serializes it to disk when needed. The persistence system includes:
  • commit() - Write in-memory data to disk atomically
  • load() - Read data from disk into memory
  • .pkl file format - Binary serialization using pickle
  • Atomic writes - Prevent corruption during save operations
  • _dirty flag - Track when data needs to be saved

File Format

BinaryDB uses the .pkl extension and Python’s pickle protocol for serialization:
database.py:49
self._path: Path = Path(path).with_suffix(".pkl")
Security Notice: Pickle is unsafe to use with untrusted data. Never load database files from untrusted sources. Pickle can execute arbitrary code during deserialization.
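To see why: an object's `__reduce__` method can direct pickle to call an arbitrary callable during loading. A minimal, harmless demonstration (the `Malicious` class and its payload are illustrative; a real attack would invoke something like `os.system` instead of `eval` on a benign string):

```python
import pickle

class Malicious:
    # __reduce__ tells pickle which callable to invoke on load.
    # Here it runs eval on a harmless expression; an attacker would
    # substitute a destructive call instead.
    def __reduce__(self):
        return (eval, ("'attacker code' + ' executed'",))

payload = pickle.dumps(Malicious())
result = pickle.loads(payload)  # the callable executes during deserialization
print(result)
```

This is why the load path must only ever see files your own application wrote.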

Automatic Extension Handling

The database automatically adds the .pkl extension:
# All of these create "mydata.pkl"
Database("mydata")
Database("mydata.db")
Database("mydata.pkl")
Database("/path/to/mydata.txt")
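Under the hood this is just pathlib's `with_suffix`, which swaps out whatever final extension is present. A quick standalone sketch:

```python
from pathlib import Path

# with_suffix replaces the last extension (or appends one if absent),
# which is how every input ends up as a .pkl file.
for name in ("mydata", "mydata.db", "mydata.pkl", "/path/to/mydata.txt"):
    print(Path(name).with_suffix(".pkl"))

# Caveat: only the final dotted component is treated as the suffix,
# so a name like "backup.2024" becomes "backup.pkl".
print(Path("backup.2024").with_suffix(".pkl"))
```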

Committing Data to Disk

commit()

Persist the current in-memory state to disk using atomic file replacement.
database.py:150-170
def commit(self) -> None:
    """
    Persist the database to disk.

    Uses atomic file replacement to avoid corruption.
    """
    self._ensure_open()

    if not self._dirty:
        return

    tmp = self._path.with_suffix(".tmp")

    try:
        with tmp.open("wb") as f:
            pickle.dump(self._data, f, protocol=pickle.HIGHEST_PROTOCOL)
        tmp.replace(self._path)
    except OSError as exc:
        raise DatabaseIOError("Failed to write database to disk") from exc

    self._dirty = False

How It Works

  1. Check dirty flag - Skip if no changes (if not self._dirty: return)
  2. Write to temporary file - Create .tmp file with serialized data
  3. Atomic replacement - Use tmp.replace(self._path) to atomically replace old file
  4. Reset dirty flag - Mark data as clean (self._dirty = False)
commit() uses pickle.HIGHEST_PROTOCOL for the fastest, most compact serialization. Note that files written with a newer protocol cannot be read by older Python versions that predate it.

Atomic File Replacement

The atomic write process prevents corruption:
# Step 1: Write to temporary file
tmp = self._path.with_suffix(".tmp")  # e.g., "mydata.tmp"
with tmp.open("wb") as f:
    pickle.dump(self._data, f, protocol=pickle.HIGHEST_PROTOCOL)

# Step 2: Atomically replace old file
tmp.replace(self._path)  # Atomic operation on most systems
Why this is important:
  • If the write fails midway, the original .pkl file remains intact
  • The .replace() operation is atomic on most filesystems
  • If the program crashes during write, you won’t have a corrupted database
File sequence:
Before commit:
  mydata.pkl  (old data)

During commit:
  mydata.pkl  (old data - still intact)
  mydata.tmp  (new data being written)

After commit:
  mydata.pkl  (new data - atomically replaced)
  mydata.tmp  (gone - renamed onto mydata.pkl by replace)
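The same pattern can be reproduced with the standard library alone. The sketch below (`atomic_dump` and `demo.pkl` are illustrative names, not part of BinaryDB) shows the replace step swapping files so the target is never half-written:

```python
import pickle
from pathlib import Path

def atomic_dump(data, path: Path) -> None:
    """Write pickled data via a temp file, then rename it into place."""
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    tmp.replace(path)  # atomic rename on POSIX filesystems

target = Path("demo.pkl")
atomic_dump({"a": 1}, target)          # initial write
atomic_dump({"a": 1, "b": 2}, target)  # old file swapped out atomically
print(pickle.loads(target.read_bytes()))
```

Readers either see the complete old file or the complete new one, never a mix.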

Basic Usage

from binarydb.database import Database

db = Database("mydata.pkl")

# Make changes
db.set("key1", "value1")
db.set("key2", "value2")

# Persist to disk
db.commit()  # Creates/updates mydata.pkl

Optimization: Dirty Flag

The commit() method checks the dirty flag before writing:
database.py:158-159
if not self._dirty:
    return
This prevents unnecessary disk I/O:
db = Database("mydata.pkl")
db.load()

# No changes made
db.commit()  # No-op, returns immediately

db.set("key", "value")  # Sets _dirty = True
db.commit()  # Writes to disk

db.commit()  # No-op again, _dirty is False
Call commit() frequently without worrying about performance. The dirty flag ensures no unnecessary writes.

Error Handling

Disk write failures raise DatabaseIOError:
from binarydb.database import Database
from binarydb.errors import DatabaseIOError

db = Database("/read-only-path/mydata.pkl")
db.set("key", "value")

try:
    db.commit()
except DatabaseIOError as e:
    print(f"Failed to save: {e}")
    # Original exception is preserved via 'from exc'
    print(f"Caused by: {e.__cause__}")

Loading Data from Disk

load()

Read database contents from disk and replace all in-memory data.
database.py:172-195
def load(self) -> None:
    """
    Load database contents from disk.

    Replaces all current in-memory data.
    """
    self._ensure_open()

    if not self._path.exists():
        return

    try:
        with self._path.open("rb") as f:
            data = pickle.load(f)
    except Exception as exc:
        raise DatabaseCorruptedError(
            "Failed to load database file"
        ) from exc

    if not isinstance(data, dict):
        raise DatabaseCorruptedError("Invalid database format")

    self._data = data
    self._dirty = False

How It Works

  1. Check if file exists - Return early if no database file (if not self._path.exists(): return)
  2. Deserialize data - Use pickle.load() to read the file
  3. Validate format - Ensure loaded data is a dictionary
  4. Replace in-memory data - Set self._data = data
  5. Clear dirty flag - Mark as clean since data matches disk
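These steps can be mirrored in a standalone sketch (`safe_load` is an illustrative helper, not part of BinaryDB's API):

```python
import pickle
from pathlib import Path

def safe_load(path: Path) -> dict:
    """Load a pickled dict, following the steps above."""
    if not path.exists():
        return {}                      # step 1: no file, nothing to load
    with path.open("rb") as f:
        data = pickle.load(f)          # step 2: deserialize
    if not isinstance(data, dict):     # step 3: validate the format
        raise ValueError("Invalid database format")
    return data                        # step 4: caller replaces in-memory data

print(safe_load(Path("missing.pkl")))  # {} when the file doesn't exist
```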

Basic Usage

from binarydb.database import Database

db = Database("mydata.pkl")
db.load()  # Load existing data from disk

# If file doesn't exist, no error is raised
db2 = Database("nonexistent.pkl")
db2.load()  # Safe - just returns without error

Loading Replaces All Data

load() completely replaces the in-memory data. Any uncommitted changes are lost.
db = Database("mydata.pkl")

# Initial data
db.set("key1", "value1")
db.commit()

# Make changes
db.set("key2", "value2")  # Not committed!
db.set("key3", "value3")  # Not committed!

# Load from disk
db.load()  # Discards key2 and key3

print(db.exists("key1"))  # True
print(db.exists("key2"))  # False - lost!
print(db.exists("key3"))  # False - lost!

Error Handling

Corrupted File

If the file cannot be deserialized, DatabaseCorruptedError is raised:
from binarydb.database import Database
from binarydb.errors import DatabaseCorruptedError

db = Database("corrupted.pkl")

try:
    db.load()
except DatabaseCorruptedError as e:
    print(f"Database corrupted: {e}")
    # Handle corruption - maybe restore from backup

Invalid Format

If the file contains valid pickle data but not a dictionary:
database.py:191-192
if not isinstance(data, dict):
    raise DatabaseCorruptedError("Invalid database format")
Example:
import pickle
from pathlib import Path
from binarydb.database import Database
from binarydb.errors import DatabaseCorruptedError

# Create invalid database file (list instead of dict)
with Path("invalid.pkl").open("wb") as f:
    pickle.dump([1, 2, 3], f)

# Try to load
db = Database("invalid.pkl")
try:
    db.load()
except DatabaseCorruptedError as e:
    print(e)  # "Invalid database format"

The Dirty Flag System

The dirty flag (_dirty) is central to BinaryDB’s persistence mechanism.

When It’s Set

Operations that modify data set the dirty flag:
database.py:72-74
def _mark_dirty(self) -> None:
    if not self._in_transaction:
        self._dirty = True
Called by:
  • set() - After adding/updating a record
  • delete() - After removing a record (if it existed)
  • update() - After modifying dictionary fields

When It’s Cleared

The dirty flag is reset to False by:
  • commit() - After successful write to disk (database.py:170)
  • load() - After loading data from disk (database.py:195)
  • rollback() - After discarding transaction changes (database.py:228)
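The flag's lifecycle can be modelled in isolation. The class below is a simplified illustration of the pattern, not BinaryDB's actual implementation:

```python
class DirtyTracker:
    """Minimal model of the dirty-flag lifecycle (illustrative)."""
    def __init__(self):
        self._data = {}
        self._dirty = False

    def set(self, key, value):
        self._data[key] = value
        self._dirty = True       # in-memory data no longer matches disk

    def commit(self):
        if not self._dirty:
            return False         # no-op: nothing to write
        # ... serialize self._data to disk here ...
        self._dirty = False      # clean again: memory matches disk
        return True

t = DirtyTracker()
print(t.commit())   # False -- nothing to save
t.set("key", "value")
print(t.commit())   # True  -- wrote once
print(t.commit())   # False -- already clean
```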

Transaction Behavior

During a transaction, the dirty flag is not set:
db = Database("mydata.pkl")
db.load()

db.begin()
print(db._dirty)  # False

db.set("key", "value")
print(db._dirty)  # Still False! (in transaction)

db.end()  # Applies the transaction and commits; _dirty ends up False
print(db._dirty)  # False (committed)
This prevents auto-commit during transactions:
database.py:72-74
def _mark_dirty(self) -> None:
    if not self._in_transaction:  # Only mark dirty outside transactions
        self._dirty = True

Lifecycle and Auto-Save

close()

Closing a database automatically commits pending changes:
database.py:247-255
def close(self) -> None:
    """
    Close the database.

    Commits pending changes and prevents further operations.
    """
    if not self._closed:
        self.commit()
        self._closed = True
Example:
from binarydb.database import Database
from binarydb.errors import DatabaseError

db = Database("mydata.pkl")
db.set("key", "value")  # Sets _dirty = True

db.close()  # Automatically commits before closing

# Further operations will fail
try:
    db.set("another_key", "value")
except DatabaseError as e:
    print(e)  # "Database is closed"
Always call close() when you’re done with a database to ensure data is persisted.

Complete Persistence Example

from binarydb.database import Database
from binarydb.errors import DatabaseIOError, DatabaseCorruptedError
import logging

logging.basicConfig(level=logging.INFO)

def initialize_database(path):
    """Create or load a database."""
    db = Database(path)
    
    try:
        db.load()
        logging.info(f"Loaded existing database with {len(db)} records")
    except DatabaseCorruptedError:
        logging.error("Database corrupted, starting fresh")
        # Could restore from backup here
    
    return db

def save_with_retry(db, max_retries=3):
    """Attempt to save database with retries."""
    for attempt in range(max_retries):
        try:
            db.commit()
            logging.info("Database saved successfully")
            return True
        except DatabaseIOError as e:
            logging.warning(f"Save attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                logging.error("Max retries reached, data not saved!")
                return False
    return False

# Main application
db = initialize_database("app.pkl")

# Populate database
if len(db) == 0:
    logging.info("Initializing new database")
    db.set("version", "1.0.0")
    db.set("created_at", "2026-03-04")
    db.set("config", {"debug": False, "max_connections": 100})
    save_with_retry(db)

# Perform operations
db.set("last_access", "2026-03-04T10:30:00")
db.update("config", {"debug": True})

# Check if save is needed
if db._dirty:
    logging.info("Changes detected, saving...")
    save_with_retry(db)
else:
    logging.info("No changes to save")

# Clean shutdown
db.close()  # Auto-commits if needed
logging.info("Database closed")

Best Practices

Load existing data before making changes to avoid losing data:
db = Database("mydata.pkl")
db.load()  # Load existing data first
db.set("new_key", "value")
db.commit()
Don’t wait to commit important changes:
db.set("payment_id", transaction_data)
db.commit()  # Ensure payment is saved immediately
Always close the database to ensure pending changes are saved:
try:
    db.set("key", "value")
    db.commit()
finally:
    db.close()  # Ensures commit even if error occurred
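If Database does not implement the context-manager protocol, contextlib.closing gives the same guarantee with less boilerplate. The `FakeDB` stand-in below (illustrative, not BinaryDB) just demonstrates that close() runs even when the body raises:

```python
from contextlib import closing

class FakeDB:
    """Stand-in for Database -- records whether close() was called."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

db = FakeDB()
with closing(db):
    pass  # ... work with db ...
print(db.closed)  # True: close() ran when the with-block exited
```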
Implement fallback strategies for corrupted databases:
try:
    db.load()
except DatabaseCorruptedError:
    # Restore from backup
    restore_from_backup(db._path)
    db.load()
Never load database files from untrusted sources:
# BAD: loading a user-uploaded file
# (request.files, is_trusted_source, and SecurityError below are
# illustrative placeholders for your framework's equivalents)
uploaded_file = request.files['database']
db.load()  # DANGEROUS! Unpickling can execute arbitrary code

# GOOD: validate the source first
if is_trusted_source(uploaded_file):
    db.load()
else:
    raise SecurityError("Untrusted database file")

Internals: Storage Format

The .pkl file contains a pickled Python dictionary:
import pickle
from pathlib import Path

# What BinaryDB stores:
data = {
    "key1": "value1",
    "key2": {"nested": "dict"},
    "key3": [1, 2, 3]
}

# Serialized like this:
with Path("mydata.pkl").open("wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialized like this:
with Path("mydata.pkl").open("rb") as f:
    loaded_data = pickle.load(f)

assert data == loaded_data
The entire database is a single dictionary serialized with pickle. There are no indexes, headers, or metadata - just pure Python data structures.

Next Steps

Error Handling

Learn how to handle persistence errors

Database Operations

Master CRUD operations
