Overview
The blob store provides content-addressed storage for notebook outputs (images, HTML, rich data). It has two components:
- Write API - Unix socket IPC for storing blobs (see IPC Protocol)
- Read API - HTTP server for retrieving blobs (this page)
Architecture
On-Disk Layout
Each blob is stored alongside a metadata sidecar (a .meta file).
HTTP Server
The daemon runs an HTTP server on 127.0.0.1:0 (random OS-assigned port).
Endpoint: http://127.0.0.1:{port} where {port} is advertised in daemon.json
daemon.json
Endpoints
GET /health
Health check endpoint. Responds with status 200.
GET /blob/:hash
Retrieve blob by SHA-256 hash.
Path parameter hash: 64-character hex SHA-256 hash
Success Response (200)
- Content-Type: Media type from the metadata sidecar (e.g., image/png, text/html). Falls back to application/octet-stream if metadata is missing.
- Content-Length: Blob size in bytes.
- Cache-Control: public, max-age=31536000, immutable. Content-addressed blobs never change, so cache aggressively.
- Access-Control-Allow-Origin: *. CORS is enabled for cross-origin requests from notebook renderers.
- Body: Raw blob bytes.
Error Response (404)
Returned when the hash is not found in the blob store.
Usage Examples
Rendering an Image
Fetching HTML Output
Streaming Large Data
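A client-side sketch of streaming a large blob to disk in fixed-size chunks instead of buffering it in memory. It is written in Python for illustration only. The helper names blob_url, copy_stream, and download_blob and the chunk size are assumptions, not part of the daemon's API; only the /blob/:hash route comes from this page.

```python
import io
from urllib.request import urlopen

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune as needed


def blob_url(port: int, digest: str) -> str:
    """Build the GET /blob/:hash URL for a daemon listening on localhost."""
    return f"http://127.0.0.1:{port}/blob/{digest}"


def copy_stream(src: io.BufferedIOBase, dst: io.BufferedIOBase,
                chunk_size: int = CHUNK_SIZE) -> int:
    """Copy a binary stream chunk by chunk; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def download_blob(port: int, digest: str, dest_path: str) -> int:
    """Stream one blob to dest_path. urlopen raises HTTPError on a 404."""
    with urlopen(blob_url(port, digest)) as resp, open(dest_path, "wb") as out:
        return copy_stream(resp, out)
```

The port passed to download_blob would be the one advertised in daemon.json.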
Content Addressing
Hash Calculation
Blobs are identified by SHA-256 hash of the raw bytes:
- Same bytes always produce the same hash
- Media type does NOT affect the hash
- 256-bit hashes are cryptographically unguessable
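The properties above can be illustrated with a few lines of Python. This is a sketch using the standard hashlib module; the function name blob_hash and the sample payload are illustrative assumptions, not the daemon's own code.

```python
import hashlib


def blob_hash(data: bytes) -> str:
    """Return the 64-character lowercase hex SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


payload = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a complete image

# Only the bytes are hashed, so an identical payload stored under a
# different media type would still map to the same key.
assert blob_hash(payload) == blob_hash(bytes(payload))
print(len(blob_hash(payload)))  # 64
```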
Validation
Valid hashes must be:
- Exactly 64 characters long
- Hex digits only (0-9a-f)
Invalid hashes return 404 Not Found.
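A minimal sketch of the two validation rules as a single check. It assumes lowercase-only hex, following the 0-9a-f range given above; whether the daemon also accepts uppercase digits is not stated here. The name is_valid_hash is hypothetical.

```python
import re

# Assumption: lowercase hex only, exactly 64 characters.
_HASH_RE = re.compile(r"[0-9a-f]{64}")


def is_valid_hash(candidate: str) -> bool:
    """True if candidate is exactly 64 lowercase hex characters."""
    # fullmatch rejects trailing newlines and any extra characters.
    return _HASH_RE.fullmatch(candidate) is not None
```

Running a check like this before touching the filesystem also keeps path separators and ".." out of any path built from the hash.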
Write Operations
Blobs are written via the Unix socket (see IPC Protocol).
Size Limit
Maximum blob size: 100 MiB
Atomic Writes
Blobs are written atomically:
- Write to temp file: .tmp.{uuid}
- Rename to final path: {hash}
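The write-then-rename pattern can be sketched as follows. This is an illustration in Python, not the daemon's Rust implementation: the function name write_atomically, the fsync call, and the cleanup on failure are assumptions. It also uses os.replace, which overwrites an existing target on every platform, unlike the Windows rename behavior described under Concurrent Writes.

```python
import os
import tempfile
import uuid


def write_atomically(directory: str, final_name: str, data: bytes) -> str:
    """Write data to a temp file in `directory`, then rename it into place."""
    tmp_path = os.path.join(directory, f".tmp.{uuid.uuid4()}")
    final_path = os.path.join(directory, final_name)
    try:
        with open(tmp_path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # assumption: flush to disk before renaming
        # Temp and final paths share a directory, so the rename stays on
        # one filesystem and readers never observe a partial file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store_dir:
        print(write_atomically(store_dir, "example-hash", b"hello"))
```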
Output Manifests (Phase 6)
Notebook outputs use a two-level storage strategy:
- Output manifests - Jupyter output structure with ContentRef for data
- Blobs - Raw content referenced by manifests
ContentRef
Content can be inlined or stored as a blob:
Inline (< 8 KB)
Blob (>= 8 KB)
Display Data Manifest
Stream Output Manifest
Small Log (inline)
Large Log (blob)
Error Output Manifest
Inlining Threshold
Content smaller than 8 KB is inlined in the manifest. Content of 8 KB or more goes to the blob store.
- Most text/plain outputs: inline (one request)
- Most images: blob (two requests)
- Small stdout: inline
- Training loop logs: blob
- Error tracebacks: usually inline (1-5 KB)
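The threshold rule reduces to a single size comparison. The sketch below assumes 8 KB means 8192 bytes and that the comparison is on the raw byte length; the names INLINE_THRESHOLD and storage_for are hypothetical.

```python
INLINE_THRESHOLD = 8 * 1024  # assumption: 8 KB interpreted as 8192 bytes


def storage_for(content: bytes) -> str:
    """Return "inline" for content under the threshold, otherwise "blob"."""
    return "inline" if len(content) < INLINE_THRESHOLD else "blob"
```

A 3 KB traceback would be inlined and served with the manifest in one request, while a 200 KB image would cost a second request to the blob endpoint.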
Manifest Storage
Manifests are themselves stored as blobs with media type application/x-jupyter-output+json.
Security Model
Why No Authentication?
Threat model: We protect against:
- Remote attackers (server binds to 127.0.0.1 only)
- Cross-user access (Unix socket permissions control writes)
- Local malicious processes reading outputs: not defended against, since they already have full filesystem access
- Hash guessing (256-bit SHA-256 hashes are cryptographically unguessable)
Why CORS is Enabled
Notebook outputs render in sandboxed iframes. CORS headers allow the iframe to fetch blobs from the localhost HTTP server.
Garbage Collection
Current strategy: Manual cleanup only. Users can clear the cache.
Performance Characteristics
Concurrent Reads
- Safe: Multiple processes can read simultaneously
- Efficient: OS page cache reduces disk I/O for frequently accessed blobs
- No locking: Content-addressed blobs are immutable
Concurrent Writes
- Safe: Atomic rename ensures partial writes are never visible
- Idempotent: Writing the same content twice is a no-op (same hash)
- Race handling: On Windows, rename fails if the target exists. This is detected and treated as success (a concurrent writer placed identical content).
Sharding
Two-character prefix directories prevent filesystem bottlenecks:
- 256 possible shards (00 through ff)
- Evenly distributes load across directories
- Typical notebook: ~100 blobs → ~0.4 blobs per shard on average
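A sketch of how a hash could map to a sharded path. It assumes the shard directory is named after the first two hex characters and that the file keeps the full hash as its name; the function name shard_path and the root directory are hypothetical.

```python
from pathlib import PurePosixPath


def shard_path(root: str, digest: str) -> PurePosixPath:
    """Place a blob under a directory named after its first two hex characters."""
    return PurePosixPath(root) / digest[:2] / digest


# 100 blobs spread over 256 shards: 100 / 256 is about 0.39 blobs per shard.
average_per_shard = 100 / 256
```

Because SHA-256 output is effectively uniform, the two-character prefix spreads blobs evenly without any extra bookkeeping.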
Rust API
The daemon’s BlobStore Rust API:
Migration Guide: Phase 5 → Phase 6
The outputs list in the CRDT document uses List of Str. The format is auto-detected:
Phase 5 (legacy): JSON string
- String starts with { and parses as JSON → Phase 5 inline JSON
- String is 64 hex characters → Phase 6 manifest hash
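The two detection rules can be sketched as a small classifier. This is an illustration in Python rather than the daemon's code: the function name detect_output_format, the lowercase-hex assumption, and the "unknown" fallback for strings matching neither rule are all assumptions.

```python
import json
import re

_HEX64 = re.compile(r"[0-9a-f]{64}")  # assumption: lowercase hex


def detect_output_format(entry: str) -> str:
    """Classify one outputs-list string as "phase5", "phase6", or "unknown"."""
    if entry.startswith("{"):
        try:
            json.loads(entry)
            return "phase5"  # legacy inline JSON
        except json.JSONDecodeError:
            pass  # starts with "{" but is not valid JSON
    if _HEX64.fullmatch(entry):
        return "phase6"  # hash of a manifest stored in the blob store
    return "unknown"
```

The two rules cannot collide, since a string that starts with { is never 64 hex characters.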