Architecture Diagram
Overview
After Bazel executes a build action, it uploads the outputs and action results to BuildBuddy’s remote cache so that future builds can reuse these results. BuildBuddy implements the Remote Execution API’s cache write operations, supporting both Content Addressable Storage (CAS) writes and Action Cache updates.
Components Involved
Bazel Build Tool
The client uploading cached data:
- Executes build actions
- Computes output file digests (SHA256)
- Uploads outputs to CAS
- Stores ActionResult in Action Cache
- Handles upload retries on failure
Cache Service (API Server)
Handles cache write requests:
- Receives gRPC requests for cache writes
- Authenticates and authorizes requests
- Validates digest formats and sizes
- Routes to storage backends
- Returns success/failure status
Digest Computer
Computes content hashes:
- SHA256 hash of file contents
- Includes file size in digest
- Used for content addressing
- Ensures data integrity
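A minimal sketch of this digest computation in Python (the `Digest` type mirrors the Remote Execution API's hash-plus-size pair; the helper name is illustrative):

```python
import hashlib
from typing import NamedTuple

class Digest(NamedTuple):
    """A content digest as used by the Remote Execution API: hash + size."""
    hash: str        # lowercase hex SHA256 of the content
    size_bytes: int  # length of the content in bytes

def compute_digest(content: bytes) -> Digest:
    # SHA256 over the raw bytes gives the content address; the size is
    # carried alongside so the server can sanity-check uploads.
    return Digest(hashlib.sha256(content).hexdigest(), len(content))
```

Because the digest is derived entirely from the content, any two files with identical bytes map to the same cache key, which is what makes deduplication automatic.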
Content Addressable Storage (CAS)
Stores artifact blobs:
- Files identified by content hash
- Immutable storage (write-once)
- Automatic deduplication
- Supports compression
Action Cache
Stores action results:
- Maps action digest → ActionResult
- ActionResult contains output digests and metadata
- Mutable (can be overwritten)
- Implements TTL expiration
Storage Backend
Persists data to disk or cloud:
- Local disk storage
- Cloud object storage (S3, GCS, Azure Blob)
- Supports multi-region replication
- Handles large file uploads
Upload Coordinator
Manages concurrent uploads:
- Batches small files
- Parallelizes large uploads
- Implements retry logic
- Tracks upload progress
Data Flow
CAS Write Flow
Step 1: Action Execution
- Bazel executes build action locally or remotely
- Action produces output files
- Bazel computes a digest for each output:
  - Read file contents
  - Compute SHA256 hash
  - Get file size
  - Create Digest (hash + size)
Step 2: Check Missing Blobs
- Before uploading, Bazel calls FindMissingBlobs
- Sends list of output digests to BuildBuddy
- BuildBuddy checks which digests are already in cache
- Returns list of missing digests
- Bazel only uploads missing blobs (saves bandwidth)
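The server side of this check reduces to a set difference over digests. A sketch (names are illustrative, not BuildBuddy's actual code):

```python
def find_missing_blobs(requested, cache_index):
    """Return the subset of requested digests the cache does not hold.

    `requested` is a list of (hash, size_bytes) pairs from the client;
    `cache_index` is a set of pairs the server already stores. The client
    then uploads only the returned digests, preserving request order.
    """
    return [d for d in requested if d not in cache_index]
```
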
Step 3: Blob Upload
Two upload APIs are available:
BatchUpdateBlobs (for small files, < 2MB):
- Multiple small files in a single request
- All blobs committed atomically
- Fast for many small files
ByteStream.Write (for large files):
- Stream file data in chunks
- Supports resumable uploads
- Handles large files efficiently
- Final chunk sets finish_write=true
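The streaming half can be sketched as a client-side chunker: each chunk carries its write offset, and only the last one sets finish_write (a simplified stand-in for building ByteStream WriteRequests):

```python
def chunk_blob(data: bytes, chunk_size: int = 1024 * 1024):
    """Split a blob into (write_offset, chunk, finish_write) tuples,
    mirroring how a ByteStream.Write client streams a blob."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        finish = offset + len(chunk) >= len(data)
        chunks.append((offset, chunk, finish))
    if not data:
        # An empty blob is still a single, immediately-finishing write.
        chunks.append((0, b"", True))
    return chunks
```
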
Step 4: Storage Write
- Cache service receives blob data
- Validates digest (recompute hash, verify size)
- Checks if blob already exists (deduplication)
- If new:
  - Writes to storage backend
  - Stores in multiple tiers (disk, cloud)
  - Applies compression if configured
- Returns success response
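The validation-then-deduplication order above can be sketched as follows (a toy in-memory store stands in for the real storage backend; the status strings are illustrative):

```python
import hashlib

def validate_and_store(blob: bytes, claimed_hash: str, claimed_size: int, store: dict):
    """Server-side write path sketch: verify size and hash against the
    client's claimed digest, then write only if the blob is new."""
    if len(blob) != claimed_size:
        return "SIZE_MISMATCH"       # upload rejected
    if hashlib.sha256(blob).hexdigest() != claimed_hash:
        return "DIGEST_MISMATCH"     # upload rejected
    key = (claimed_hash, claimed_size)
    if key in store:
        return "ALREADY_EXISTS"      # deduplicated; nothing written
    store[key] = blob
    return "OK"
```

Recomputing the hash server-side is what guarantees integrity: a corrupted or tampered upload can never be stored under a digest it does not match.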
Step 5: Write Verification
- Storage backend confirms write success
- Digest indexed for future reads
- Bazel receives success response
- Proceeds to update Action Cache
Action Cache Write Flow
Step 1: Create ActionResult
After outputs are uploaded to CAS:
- Bazel creates an ActionResult message that includes:
  - Output file paths and digests
  - Exit code
  - Stdout/stderr digests
  - Execution timing metadata
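A simplified Python mirror of that structure (the real message is a protobuf; these dataclasses only illustrate the fields listed above):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A digest here is a (sha256 hex, size_bytes) pair.
DigestPair = Tuple[str, int]

@dataclass
class OutputFile:
    path: str             # output path relative to the exec root
    digest: DigestPair    # where to find the file's bytes in CAS

@dataclass
class ActionResult:
    """Illustrative mirror of the REAPI ActionResult message."""
    output_files: List[OutputFile] = field(default_factory=list)
    exit_code: int = 0
    stdout_digest: Optional[DigestPair] = None
    stderr_digest: Optional[DigestPair] = None
```

Note that the ActionResult stores only digests, never file contents: a cache hit hands the client a list of pointers into CAS.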
Step 2: UpdateActionResult Request
- Bazel sends UpdateActionResult gRPC request
- Request includes:
  - Action digest (identifies the action)
  - ActionResult (output information)
  - Instance name
- Cache service authenticates request
Step 3: Action Cache Update
- Service stores action digest → ActionResult mapping
- Overwrites previous entry if exists
- Sets TTL for expiration
- Indexes for fast lookup
- Returns success response
Step 4: Cache Entry Ready
- Action is now cached
- Future builds with same action digest:
  - Will get cache hit
  - Can skip execution
  - Will download outputs from CAS
Upload Optimization
Deduplication
Content addressing provides automatic deduplication:
- Same file content = same digest
- FindMissingBlobs avoids re-uploading existing blobs
- Significant bandwidth savings for:
  - Shared dependencies
  - Incremental builds
  - Multiple build configurations
Batching
Small files are batched together:
- Reduces gRPC call overhead
- BatchUpdateBlobs handles up to hundreds of small files
- Single round-trip for multiple blobs
- Improves throughput for builds with many small outputs
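Batching is essentially greedy bin-packing under a request-size budget. A sketch (`max_batch_bytes` is a hypothetical stand-in for the server's message-size limit):

```python
def batch_blobs(blobs, max_batch_bytes=2 * 1024 * 1024):
    """Greedily pack blobs into batches whose total size stays under the
    request budget, so each batch fits in one BatchUpdateBlobs-style call."""
    batches, current, current_size = [], [], 0
    for blob in blobs:
        if current and current_size + len(blob) > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(blob)
        current_size += len(blob)
    if current:
        batches.append(current)
    return batches
```

Blobs too large for the budget would go through the streaming API instead; here an oversized blob simply lands in its own batch.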
Compression
Compress before upload:
- Zstd compression for large text files
- Upload to compressed-blobs/zstd path
- Reduces upload time
- Stored compressed (decompressed on read)
- Typical compression ratio: 2-4x
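The shape of the compress-on-write, decompress-on-read path, sketched with zlib as a stand-in since Zstandard is not in the Python standard library:

```python
import zlib

def compress_for_upload(blob: bytes) -> bytes:
    # The real path uses zstd; zlib stands in here. Repetitive text-like
    # data typically shrinks severalfold either way.
    return zlib.compress(blob, level=6)

def decompress_on_read(data: bytes) -> bytes:
    # The cache stores the compressed bytes and decompresses on read.
    return zlib.decompress(data)
```
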
Parallel Uploads
Multiple concurrent uploads:
- Bazel uploads blobs in parallel
- Configurable concurrency limit
- Maximizes network bandwidth utilization
- Especially beneficial for large builds
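A bounded worker pool captures the idea; `upload_one` is a placeholder for the actual per-blob upload call:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_all(blobs, upload_one, max_concurrency=8):
    """Upload blobs concurrently with a capped worker pool, mirroring a
    configurable client-side concurrency limit. Results keep input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(upload_one, blobs))
```
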
Incremental Uploads
For very large files:
- ByteStream.Write supports chunked uploads
- Upload can be resumed on failure
- Client tracks upload progress
- Retries only upload remaining chunks
Write Policies
Cache Isolation
Instance names provide cache isolation:
- Prevent cache poisoning between environments
- Different TTLs per instance
- Separate quota management
TTL Configuration
Cache entries expire after a configured TTL.
Size Limits
Blob size limits are enforced.
Compression Settings
Compression is configurable.
Error Handling
Upload Failures
- Network Errors: Bazel retries with exponential backoff
- Digest Mismatch: Upload rejected, client recomputes digest
- Size Mismatch: Upload rejected, client checks file
- Storage Full: Service returns RESOURCE_EXHAUSTED
- Permission Denied: Authentication or quota issue
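The retry behavior for transient network errors can be sketched as exponential backoff with jitter (the helper and its parameters are illustrative, not Bazel's actual implementation):

```python
import random
import time

def with_retries(op, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry an operation that raises ConnectionError on transient failure,
    backing off exponentially between attempts."""
    for attempt in range(max_attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Delay doubles each attempt, with random jitter so many
            # clients don't retry in lockstep.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Non-retryable outcomes (digest mismatch, permission denied) should fail fast instead of looping; only transient errors belong inside the retry.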
Partial Writes
- ByteStream.Write supports resumable uploads
- Client tracks write_offset for resume
- Server stores partial upload in temporary location
- Upload completed when finish_write=true received
- Cleanup of abandoned uploads after timeout
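The server-side state for a resumable write reduces to tracking the committed offset; a resuming client asks for it and sends only the remaining bytes. A sketch under those assumptions:

```python
class PartialUpload:
    """Toy model of server-side resumable-write state for one blob."""

    def __init__(self):
        self.buffer = bytearray()   # bytes received so far
        self.committed = False      # set once finish_write arrives

    def write(self, offset: int, chunk: bytes, finish_write: bool = False) -> int:
        # Writes must continue exactly where the last one ended; a resuming
        # client learns this offset from the server and skips ahead.
        if offset != len(self.buffer):
            raise ValueError(f"expected offset {len(self.buffer)}, got {offset}")
        self.buffer.extend(chunk)
        if finish_write:
            self.committed = True
        return len(self.buffer)  # committed offset, used to resume
```
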
Monitoring
Key metrics to track:
- Upload Rate: Bytes/second uploaded
- Upload Latency: Time to upload blobs (p50, p95, p99)
- Deduplication Rate: Percentage of blobs already cached
- Compression Ratio: Original size / compressed size
- Upload Errors: Failed uploads by error type
- Storage Growth: Cache size over time