Architecture Diagram
Overview
BuildBuddy implements the Remote Execution API’s cache services, providing both Content Addressable Storage (CAS) for build artifacts and an Action Cache (AC) for build action results. When Bazel needs a cached artifact or wants to check if an action has been previously executed, it queries BuildBuddy’s cache.
Components Involved
Bazel Build Tool
The client requesting cached data:
- Checks Action Cache before executing actions
- Reads build artifacts from CAS
- Uses content addressing (SHA256 digests) to identify artifacts
- Falls back to execution on cache miss
Cache Service (API Server)
Handles cache requests:
- Receives gRPC requests for cache reads
- Authenticates requests using API keys or mTLS
- Validates digest formats and permissions
- Routes requests to storage backends
- Returns cached data or cache miss responses
Action Cache
Maps action hashes to results:
- Stores action digest → ActionResult mappings
- ActionResult contains output file digests
- Fast lookup using hash-based indexing
- Implements TTL-based expiration
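The mapping and expiration behavior listed above can be sketched as a small in-memory class. This is only an illustration, not BuildBuddy's implementation: the class name ActionCache, the ttl_seconds parameter, and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ActionCache:
    """Toy action cache: action hash -> (ActionResult-like dict, expiry time).

    Hypothetical sketch; names and structure are not taken from BuildBuddy.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised in tests
        self._entries: Dict[str, Tuple[dict, float]] = {}

    def put(self, action_hash: str, result: dict) -> None:
        # Record the result together with the time at which it expires.
        self._entries[action_hash] = (result, self._clock() + self._ttl)

    def get(self, action_hash: str) -> Optional[dict]:
        entry = self._entries.get(action_hash)
        if entry is None:
            return None  # cache miss: no entry exists
        result, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[action_hash]  # expired entries count as misses
            return None
        return result
```

A dict keyed by the action hash gives the hash-based lookup; a production cache would also bound memory and persist entries.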
Content Addressable Storage (CAS)
Stores actual artifact data:
- Blobs identified by SHA256 content hash
- Immutable data (never modified after write)
- Automatic deduplication
- Supports compression
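To make content addressing and deduplication concrete, here is a minimal in-memory sketch. The InMemoryCAS class and the (hash, size) tuple used as a key are assumptions for illustration; a real CAS persists blobs and may compress them.

```python
import hashlib
from typing import Dict, Tuple

Digest = Tuple[str, int]  # (hex SHA256 of the content, size in bytes)


class InMemoryCAS:
    """Hypothetical content-addressable store backed by a dict."""

    def __init__(self) -> None:
        self._blobs: Dict[Digest, bytes] = {}

    def put(self, data: bytes) -> Digest:
        digest = (hashlib.sha256(data).hexdigest(), len(data))
        # Identical content always yields the same key, so it is stored once.
        # setdefault never overwrites, which mirrors immutability after write.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: Digest) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)
```

Because the key is derived from the bytes themselves, writing the same artifact twice costs no extra storage.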
Storage Backend
Persists cached data:
- Local disk cache for fast access
- Cloud storage (S3, GCS, Azure) for scale
- Multi-tier storage with promotion/demotion
- Supports encryption at rest
Digest Validator
Ensures request validity:
- Validates digest format (hash/size)
- Checks blob size limits
- Verifies instance name permissions
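A rough sketch of the format and size checks follows. The function name digest_problems and the MAX_BLOB_BYTES limit are hypothetical values chosen for the example, and the instance name permission check is left out because it depends on the deployment's auth model.

```python
import re
from typing import List

SHA256_HEX = re.compile(r"[0-9a-f]{64}")
MAX_BLOB_BYTES = 4 * 1024 * 1024  # hypothetical limit, not a BuildBuddy default


def digest_problems(hash_hex: str, size_bytes: int,
                    max_bytes: int = MAX_BLOB_BYTES) -> List[str]:
    """Return a list of validation problems; an empty list means valid."""
    problems: List[str] = []
    if not SHA256_HEX.fullmatch(hash_hex):
        problems.append("hash must be 64 lowercase hex characters")
    if size_bytes < 0:
        problems.append("size must be non-negative")
    elif size_bytes > max_bytes:
        problems.append("blob exceeds size limit")
    return problems
```

Returning a list rather than raising lets a caller report every problem with a request in one response.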
Data Flow
Action Cache Read Flow
Step 1: Action Hash Computation
- Bazel computes action hash from:
  - Command line and arguments
  - Input file digests
  - Environment variables
  - Platform properties
- Creates ActionDigest (hash + size)
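The idea behind the action hash can be sketched as follows. Bazel itself hashes serialized protobuf messages; this example substitutes canonical JSON as a stand-in, so the function action_digest and its parameters are illustrative assumptions and will not reproduce Bazel's real digests.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def action_digest(argv: List[str], input_digests: Dict[str, str],
                  env: Dict[str, str],
                  platform: Dict[str, str]) -> Tuple[str, int]:
    """Return (hex SHA256, size) over a deterministic encoding of the action.

    Stand-in encoding: JSON with sorted keys, NOT Bazel's protobuf encoding.
    """
    payload = json.dumps(
        {"argv": argv, "inputs": input_digests, "env": env, "platform": platform},
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace variation
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest(), len(payload)
```

The property that matters is determinism: the same command, inputs, environment, and platform must always produce the same digest, and any change to one of them must produce a different one.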
Step 2: GetActionResult Request
- Bazel sends GetActionResult gRPC request
- Request includes:
  - Action digest
  - Instance name (for cache isolation)
- Cache service authenticates request
- Validates digest format
Step 3: Action Cache Lookup
- Service queries Action Cache with action digest
- Cache lookup paths:
  - Cache Hit: ActionResult found
  - Cache Miss: No entry exists
- For cache hit:
  - ActionResult contains output file digests
  - Exit code and execution metadata included
Step 4: Response
- On Hit: Return ActionResult to Bazel
- On Miss: Return NOT_FOUND status
- Bazel proceeds accordingly:
  - Hit: Skip execution, fetch outputs from CAS
  - Miss: Execute action, upload results
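The hit and miss paths above can be sketched from the client's point of view. Plain dicts stand in for the Action Cache and CAS, and the function run_action, its parameters, and the shape of the stored result are assumptions for illustration rather than Bazel's actual client code.

```python
import hashlib
from typing import Callable, Dict


def run_action(action_hash: str,
               action_cache: Dict[str, dict],
               cas: Dict[str, bytes],
               execute: Callable[[], Dict[str, bytes]]) -> Dict[str, bytes]:
    """Return output files {path: content}, consulting the cache first."""
    result = action_cache.get(action_hash)
    if result is not None:
        # Hit: skip execution and fetch each output from the CAS by digest.
        return {path: cas[digest] for path, digest in result["outputs"].items()}

    # Miss: execute the action, upload outputs, then record the result.
    outputs = execute()
    output_digests: Dict[str, str] = {}
    for path, content in outputs.items():
        digest = hashlib.sha256(content).hexdigest()
        cas[digest] = content
        output_digests[path] = digest
    # This sketch assumes the action succeeded (exit code 0).
    action_cache[action_hash] = {"exit_code": 0, "outputs": output_digests}
    return outputs
```

Calling run_action twice with the same action hash runs execute only once; the second call is served entirely from the two stores.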
CAS Read Flow
Step 1: Digest Identification
- Bazel needs a file (input or cached output)
- Has the digest (SHA256 hash + size) from:
  - Input manifest
  - ActionResult output references
  - Directory structure
Step 2: Read Request
Two APIs available: BatchReadBlobs (for small files) and the ByteStream Read API (for large files, streamed in chunks).
Step 3: Storage Lookup
- Service receives read request
- Validates digest and permissions
- Checks storage tiers in order:
  - In-memory cache (hot data)
  - Local disk cache
  - Remote cloud storage
- Retrieves blob data
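The ordered tier check can be sketched as a loop over stores from fastest to slowest. The function tiered_read and the use of dicts as tiers are assumptions for illustration; copying a found blob into the faster tiers is one simple promotion policy among several.

```python
from typing import Dict, List, Optional, Tuple


def tiered_read(key: str,
                tiers: List[Tuple[str, Dict[str, bytes]]]
                ) -> Optional[Tuple[str, bytes]]:
    """Check tiers fastest-first; return (tier name, data) or None on a miss."""
    for index, (name, store) in enumerate(tiers):
        data = store.get(key)
        if data is not None:
            # Promote the blob into every faster tier so the next read is cheaper.
            for _, faster_store in tiers[:index]:
                faster_store[key] = data
            return name, data
    return None
```

For example, a blob found only in the cloud tier is copied into memory and disk, so a repeat read is answered by the memory tier.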
Step 4: Data Transfer
- Small blobs: Returned in single response
- Large blobs: Streamed in chunks
- Compression applied if requested:
  - Bazel requests compressed-blobs/zstd path
  - Service returns compressed data
  - Reduces network transfer time
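For streamed reads, the path mentioned above is part of the resource name the client sends. The helper below shows the naming layout as understood from the Remote Execution API's ByteStream convention, where the hash and size refer to the uncompressed blob; the function name read_resource_name is hypothetical, and the exact format should be checked against the API specification.

```python
def read_resource_name(instance_name: str, hash_hex: str, size_bytes: int,
                       zstd: bool = False) -> str:
    """Build a ByteStream read resource name (hypothetical helper).

    Assumed layout: [instance/]blobs/{hash}/{size} for raw reads and
    [instance/]compressed-blobs/zstd/{hash}/{size} for zstd reads.
    """
    kind = "compressed-blobs/zstd" if zstd else "blobs"
    name = f"{kind}/{hash_hex}/{size_bytes}"
    # An empty instance name contributes no leading path segment.
    return f"{instance_name}/{name}" if instance_name else name
```

The only client-visible difference between a plain and a compressed read is this path segment; the digest still identifies the uncompressed content.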
Step 5: Verification
- Bazel receives blob data
- Computes SHA256 of received data
- Verifies hash matches requested digest
- Uses data for build
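The verification step amounts to recomputing the digest and comparing it with the one requested. A minimal sketch, with verify_blob as an assumed helper name:

```python
import hashlib


def verify_blob(data: bytes, expected_hash: str, expected_size: int) -> bool:
    """Return True only if data matches the requested digest (hash + size)."""
    if len(data) != expected_size:
        return False  # cheap check first: a size mismatch means corruption
    return hashlib.sha256(data).hexdigest() == expected_hash.lower()
```

Checking the size before hashing rejects truncated transfers without paying for a full SHA256 pass.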
FindMissingBlobs Flow
Before transferring many blobs, Bazel can check which digests are absent from the remote cache:
- Bazel sends FindMissingBlobs request with digest list
- Service checks which digests exist in cache
- Returns list of missing digests
- Bazel acts only on the missing blobs (in the Remote Execution API, this is typically an upload of blobs the cache lacks)
- Reduces unnecessary network transfer
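On the service side, the check is essentially a set difference between the requested digests and the stored ones. A small sketch under that assumption, with find_missing_blobs as an illustrative function rather than the real RPC handler:

```python
from typing import Iterable, List, Set


def find_missing_blobs(requested: Iterable[str], stored: Set[str]) -> List[str]:
    """Return requested digests that are not stored, in request order."""
    seen: Set[str] = set()
    missing: List[str] = []
    for digest in requested:
        # Skip digests that exist, and report each missing digest only once.
        if digest not in stored and digest not in seen:
            seen.add(digest)
            missing.append(digest)
    return missing
```

Only digests travel over the wire for this check, which is why it is cheap compared with transferring the blobs themselves.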
Cache Hit Optimization
Local Disk Cache
BuildBuddy maintains a local disk cache for frequently accessed blobs:
- First access: Retrieve from cloud storage
- Cache promotion: Store in local disk cache
- Subsequent access: Serve from local disk (much faster)
- Eviction: LRU policy when disk fills
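The LRU eviction policy can be sketched with an ordered dictionary that tracks recency. The LRUBlobCache class and its byte-based capacity are assumptions for the example; it keeps blobs in memory rather than on disk, but the eviction logic is the same idea.

```python
from collections import OrderedDict
from typing import Optional


class LRUBlobCache:
    """Hypothetical size-bounded cache that evicts least recently used blobs."""

    def __init__(self, capacity_bytes: int) -> None:
        self._capacity = capacity_bytes
        self._used = 0
        self._blobs: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        data = self._blobs.get(key)
        if data is not None:
            self._blobs.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self._capacity:
            return  # larger than the whole cache: do not store it
        if key in self._blobs:
            self._used -= len(self._blobs.pop(key))
        self._blobs[key] = data
        self._used += len(data)
        while self._used > self._capacity:
            # popitem(last=False) removes the least recently used entry.
            _, evicted = self._blobs.popitem(last=False)
            self._used -= len(evicted)
```

Reads refresh an entry's position, so blobs that keep getting hit survive while cold blobs are evicted first.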
In-Memory Cache
Hot data cached in memory:
- Recent ActionResults
- Small frequently accessed blobs
- Directory structures
- Significantly faster than disk access
Compression
Reduces network transfer time:
- Zstd compression for large blobs
- Bazel requests via compressed-blobs/zstd path
- Typical compression ratio: 2-4x
- CPU trade-off for network bandwidth savings
Cache Warming
Pre-populate cache with common artifacts:
- Base images and toolchains
- Shared dependencies
- Reduces initial cache misses
Performance Considerations
Latency Factors
- Network RTT: Distance to BuildBuddy server
- Storage Backend: Disk vs cloud storage speed
- Blob Size: Large files take longer to transfer
- Compression: CPU overhead vs bandwidth savings
- Cache Tier: Memory is fastest, followed by disk, then cloud storage
Optimization Strategies
- Instance Name Isolation: Separate caches for different use cases
- Regional Deployment: Deploy near build infrastructure
- CDN Integration: Use CDN for geographically distributed teams
- Prefetching: Download inputs proactively
- Concurrent Reads: Parallel blob downloads
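As an illustration of concurrent reads, the sketch below fetches several blobs in parallel with a thread pool. The function fetch_all, the fetch_one callback, and the default of 8 workers are assumptions; a real client would issue gRPC calls inside fetch_one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all(digests: List[str],
              fetch_one: Callable[[str], bytes],
              max_workers: int = 8) -> Dict[str, bytes]:
    """Fetch every digest concurrently and return {digest: data}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(digests, pool.map(fetch_one, digests)))
```

Threads suit this workload because each download spends most of its time waiting on the network rather than using the CPU.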
Monitoring
Key metrics to track:
- Action Cache hit rate
- CAS hit rate (by storage tier)
- Blob download latency (p50, p95, p99)
- Bytes transferred (network usage)
- Storage backend errors
- Cache miss reasons
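Two of these metrics are simple to compute from raw counters and latency samples. The helpers below are a sketch with assumed names; percentile uses the nearest-rank method, which is one of several common definitions and may differ slightly from what a metrics backend reports.

```python
import math
from typing import List


def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there were none."""
    total = hits + misses
    return hits / total if total else 0.0


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

Tracking p95 and p99 alongside p50 matters because a few slow cloud-tier reads can dominate build time even when the median looks healthy.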