Understand how BinaryDB saves and loads data from disk
BinaryDB persists data to disk using Python’s pickle module with atomic file replacement to ensure data integrity. This page explains the persistence mechanism in detail.
BinaryDB uses the .pkl extension and Python’s pickle protocol for serialization:
database.py:49
```python
self._path: Path = Path(path).with_suffix(".pkl")
```
Security Notice: Pickle is unsafe to use with untrusted data. Never load database files from untrusted sources. Pickle can execute arbitrary code during deserialization.
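To see why the notice matters, here is a minimal, self-contained sketch (independent of BinaryDB) showing that unpickling replays an arbitrary call stored in the payload; the `NotJustData` class is hypothetical and exists only for this demonstration:

```python
import pickle

class NotJustData:
    """Demonstrates that unpickling invokes an attacker-chosen callable."""
    def __reduce__(self):
        # pickle stores this (callable, args) pair; pickle.loads invokes it.
        # An attacker could substitute os.system or any other callable here.
        return (str.upper, ("code ran during unpickling",))

payload = pickle.dumps(NotJustData())
result = pickle.loads(payload)  # invokes str.upper(...), not plain data loading
print(result)  # CODE RAN DURING UNPICKLING
```

Because the callable is executed before any application-level validation can run, checking the loaded object afterwards (as `load()` does with `isinstance`) protects against format errors, not against malicious files.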
The commit() method persists the current in-memory state to disk using atomic file replacement.
database.py:150-170
```python
def commit(self) -> None:
    """
    Persist the database to disk.

    Uses atomic file replacement to avoid corruption.
    """
    self._ensure_open()
    if not self._dirty:
        return
    tmp = self._path.with_suffix(".tmp")
    try:
        with tmp.open("wb") as f:
            pickle.dump(self._data, f, protocol=pickle.HIGHEST_PROTOCOL)
        tmp.replace(self._path)
    except OSError as exc:
        raise DatabaseIOError("Failed to write database to disk") from exc
    self._dirty = False
```
```python
# Step 1: Write to a temporary file
tmp = self._path.with_suffix(".tmp")  # e.g., "mydata.tmp"
with tmp.open("wb") as f:
    pickle.dump(self._data, f, protocol=pickle.HIGHEST_PROTOCOL)

# Step 2: Atomically replace the old file
tmp.replace(self._path)  # Atomic operation on most systems
```
Why this is important:
If the write fails midway, the original .pkl file remains intact
The .replace() operation is atomic on most filesystems
If the program crashes during write, you won’t have a corrupted database
File sequence:
```
Before commit:
    mydata.pkl (old data)

During commit:
    mydata.pkl (old data - still intact)
    mydata.tmp (new data being written)

After commit:
    mydata.pkl (new data - atomically replaced)
    mydata.tmp (deleted by replace)
```
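The crash-safety property can be demonstrated with a standalone sketch of the same write-then-replace pattern (this does not import BinaryDB; `atomic_dump` is a hypothetical helper mirroring what `commit()` does):

```python
import pickle
import tempfile
from pathlib import Path

def atomic_dump(data: dict, path: Path) -> None:
    """Write to a sibling .tmp file, then atomically rename over the target."""
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    tmp.replace(path)  # single rename: readers see old or new data, never half

target = Path(tempfile.mkdtemp()) / "mydata.pkl"
atomic_dump({"key": "old"}, target)

# Simulate a crash mid-write: garbage lands in the .tmp file, but
# replace() never runs, so the committed .pkl is untouched.
target.with_suffix(".tmp").write_bytes(b"\x00 partial write")
with target.open("rb") as f:
    recovered = pickle.load(f)
print(recovered)  # {'key': 'old'}
```

The key design point is that the only destructive step, the rename, happens after the new file is fully written and flushed to disk.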
Example: handling a write failure:

```python
from binarydb.database import Database
from binarydb.errors import DatabaseIOError

db = Database("/read-only-path/mydata.pkl")
db.set("key", "value")

try:
    db.commit()
except DatabaseIOError as e:
    print(f"Failed to save: {e}")
    # Original exception is preserved via 'from exc'
    print(f"Caused by: {e.__cause__}")
```
The load() method reads database contents from disk and replaces all in-memory data.
database.py:172-195
```python
def load(self) -> None:
    """
    Load database contents from disk.

    Replaces all current in-memory data.
    """
    self._ensure_open()
    if not self._path.exists():
        return
    try:
        with self._path.open("rb") as f:
            data = pickle.load(f)
    except Exception as exc:
        raise DatabaseCorruptedError(
            "Failed to load database file"
        ) from exc
    if not isinstance(data, dict):
        raise DatabaseCorruptedError("Invalid database format")
    self._data = data
    self._dirty = False
```
```python
from binarydb.database import Database

db = Database("mydata.pkl")
db.load()  # Load existing data from disk

# If the file doesn't exist, no error is raised
db2 = Database("nonexistent.pkl")
db2.load()  # Safe - just returns without error
```
The _dirty flag interacts with transactions:

```python
db = Database("mydata.pkl")
db.load()
db.begin()
print(db._dirty)  # False

db.set("key", "value")
print(db._dirty)  # Still False! (in transaction)

db.end()  # Calls commit(), which sets and then clears _dirty
print(db._dirty)  # False (committed)
```
This prevents auto-commit during transactions:
database.py:72-74
```python
def _mark_dirty(self) -> None:
    if not self._in_transaction:  # Only mark dirty outside transactions
        self._dirty = True
```
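The gating behavior can be exercised in isolation with a minimal standalone sketch; `TinyDB` mirrors the attribute names from the snippet above but is not the BinaryDB implementation:

```python
class TinyDB:
    """Toy class reproducing only the dirty-flag gating."""
    def __init__(self) -> None:
        self._data: dict = {}
        self._dirty = False
        self._in_transaction = False

    def _mark_dirty(self) -> None:
        if not self._in_transaction:  # Only mark dirty outside transactions
            self._dirty = True

    def set(self, key, value) -> None:
        self._data[key] = value
        self._mark_dirty()

db = TinyDB()
db._in_transaction = True
db.set("key", "value")
flag_in_txn = db._dirty
print(flag_in_txn)  # False: writes inside a transaction don't flip the flag

db._in_transaction = False
db.set("key", "value")
print(db._dirty)  # True: outside a transaction the flag is set
```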
Closing a database automatically commits pending changes:
database.py:247-255
```python
def close(self) -> None:
    """
    Close the database.

    Commits pending changes and prevents further operations.
    """
    if not self._closed:
        self.commit()
        self._closed = True
```
Example:
```python
db = Database("mydata.pkl")
db.set("key", "value")  # Sets _dirty = True
db.close()  # Automatically commits before closing

# Further operations will fail
try:
    db.set("another_key", "value")
except DatabaseError as e:
    print(e)  # "Database is closed"
```
Always call close() when you’re done with a database to ensure data is persisted.
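One way to make that guarantee automatic is `contextlib.closing`, which works with any object exposing a `close()` method. The sketch below uses a hypothetical `FakeDatabase` stand-in, since it assumes only that `Database` has `close()` (not necessarily `__enter__`/`__exit__`):

```python
from contextlib import closing

class FakeDatabase:
    """Hypothetical stand-in exposing close(), like BinaryDB's Database."""
    def __init__(self) -> None:
        self.closed = False

    def set(self, key, value) -> None:
        pass

    def close(self) -> None:
        self.closed = True  # a real close() would commit() first

db = FakeDatabase()
with closing(db):  # close() runs even if the body raises
    db.set("key", "value")
print(db.closed)  # True
```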