Purpose
The Privacy Budget Manager provides:
- Privacy Budget Tracking: Monitor cumulative privacy expenditure per privacy bucket
- Budget Enforcement: Reject queries that would exceed privacy budget limits
- Audit Logging: Record all queries for privacy compliance auditing
- Landscape Management: Support evolving privacy bucket definitions over time
Differential Privacy: A mathematical framework that provides provable privacy guarantees. The Privacy Budget Manager ensures that the total privacy “spent” across all queries never exceeds safe limits, preventing privacy leakage through repeated queries.
Core Concepts
Privacy Budget
Differential privacy is quantified by parameters:
- Epsilon (ε): Primary privacy parameter. Lower is more private.
- Delta (δ): Secondary privacy parameter for approximate DP.
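For reference, the standard textbook definition behind these two parameters is given below. It is background rather than something taken from the PBM source: a randomized mechanism M is (ε, δ)-differentially private if, for all neighboring datasets D and D' and every set of outputs S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```

A smaller ε forces the two output distributions closer together, which is why lower is more private. δ is an additive slack term; δ = 0 gives pure differential privacy.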
Privacy Buckets
Data is divided into privacy buckets, logical groupings for privacy accounting:
- Typically organized by time period (e.g., day, week, month)
- May include other dimensions (geography, user segment, etc.)
- Each bucket has an independent privacy budget
- Defined by the landscape configuration
Landscape
A landscape defines:
- The structure of privacy buckets
- How to map queries to buckets
- Privacy budget limits per bucket
- Valid date/time ranges
File: LandscapeProcessor.kt
Landscapes can evolve over time through a mapping chain.
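To make the query-to-bucket mapping concrete, here is a minimal sketch that assumes a landscape with one bucket per day and region. `BucketKey`, `DailyLandscape`, and `bucketsFor` are hypothetical names chosen for illustration; the real landscape configuration and LandscapeProcessor.kt may be structured quite differently.

```kotlin
import java.time.LocalDate

// Hypothetical bucket key: one bucket per (day, region).
data class BucketKey(val day: LocalDate, val region: String)

// Hypothetical landscape: daily buckets, valid only within [validFrom, validTo].
data class DailyLandscape(val validFrom: LocalDate, val validTo: LocalDate)

// Returns every bucket touched by a query covering [start, end] in one region.
fun DailyLandscape.bucketsFor(start: LocalDate, end: LocalDate, region: String): List<BucketKey> {
  require(!end.isBefore(start)) { "end must not precede start" }
  require(!start.isBefore(validFrom) && !end.isAfter(validTo)) { "range outside landscape" }
  return generateSequence(start) { it.plusDays(1) }
    .takeWhile { !it.isAfter(end) }
    .map { BucketKey(it, region) }
    .toList()
}
```

A query spanning three days in one region would touch three buckets under this layout, and each of them would be charged independently.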
Queries
A query represents:
- A request to access sensitive data
- Privacy parameters (epsilon, delta)
- Targeting criteria (which data/buckets)
- Unique reference ID for deduplication
Query proto definition
Architecture
Core Components
Privacy Budget Manager
File: PrivacyBudgetManager.kt
Initialization
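As a rough illustration of the inputs the manager needs at construction time, the sketch below groups the key parameters described in this section into a validated holder. `PbmConfig` and `MappingNodeRef` are hypothetical names; the actual constructor in PrivacyBudgetManager.kt is not shown here and may take different types.

```kotlin
// Hypothetical stand-in for a real MappingNode; only an ID is modeled here.
data class MappingNodeRef(val landscapeId: String)

// Hypothetical configuration holder for the three key parameters.
data class PbmConfig(
  val maximumPrivacyBudget: Double, // max epsilon consumable in any single bucket
  val maximumTotalDelta: Double, // max cumulative delta per bucket
  val landscapeMappingChain: List<MappingNodeRef>, // tail must be the active landscape
) {
  init {
    require(maximumPrivacyBudget > 0.0) { "maximumPrivacyBudget must be positive" }
    require(maximumTotalDelta >= 0.0 && maximumTotalDelta < 1.0) {
      "maximumTotalDelta must be in [0, 1)"
    }
    require(landscapeMappingChain.isNotEmpty()) { "landscapeMappingChain must not be empty" }
  }

  // The active landscape is the last node in the chain.
  val activeLandscape: MappingNodeRef
    get() = landscapeMappingChain.last()
}
```

Validating at construction time means a misconfigured limit or an empty chain fails fast rather than surfacing during the first charge.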
Key Parameters
Maximum Privacy Budget
The maximum epsilon that can be consumed in any single privacy bucket.
Example: maximumPrivacyBudget = 10.0
Once this limit is reached, no more queries can target that bucket.
Maximum Total Delta
The maximum cumulative delta parameter across all queries in a bucket.
Example: maximumTotalDelta = 0.001
Provides an additional privacy guarantee for approximate DP.
Landscape Mapping Chain
A list of MappingNode objects defining how to map older landscapes to the current active landscape.
Purpose: Allows evolving privacy bucket structure over time while maintaining historical accounting.
The tail of the list must be the active landscape currently in use.
Main Operation: charge()
charge() is the primary method for charging privacy budget.
Charge Process
Check for Duplicates
Read queries from ledger to identify already-committed queries:
- Uses external reference IDs for deduplication
- Idempotent: charging the same query twice doesn’t double-charge
Calculate Delta
Map new queries to privacy buckets and calculate privacy charge:
- Use landscape processor to map queries to buckets
- Aggregate epsilon and delta per bucket
- Produce a “slice” of privacy charges
Read Current Charges
Read existing charges for affected buckets from ledger:
- Transactional read for consistency
- Gets current epsilon and delta totals
Commit to Ledger
Write aggregated charges and queries to ledger:
- Atomic transaction ensures consistency
- Queries stamped with commit time
- Transaction succeeds or fails as unit
Write Audit Log
Write ALL queries (including duplicates) to audit log:
- Owned by EDP, not modifiable by PBM operator
- Provides independent verification
- Returns audit reference ID to caller
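The five steps above can be sketched end to end with in-memory stand-ins. Everything in this block (`SketchPbm`, `Charge`, `Query`, `InsufficientBudget`) is hypothetical and simplified: plain maps replace the transactional ledger, a list replaces the audit log, and the budget check uses simple addition. It illustrates the order of operations, not the real charge() implementation.

```kotlin
// All names below are illustrative stand-ins, not the real PBM API.
data class Charge(val epsilon: Double, val delta: Double) {
  operator fun plus(other: Charge) = Charge(epsilon + other.epsilon, delta + other.delta)

  companion object {
    val ZERO = Charge(0.0, 0.0)
  }
}

data class Query(val referenceId: String, val buckets: List<String>, val charge: Charge)

class InsufficientBudget(bucket: String) : Exception("budget exceeded for bucket $bucket")

class SketchPbm(private val maxEpsilon: Double, private val maxDelta: Double) {
  private val committedIds = mutableSetOf<String>() // ledger: committed reference IDs
  private val totals = mutableMapOf<String, Charge>() // ledger: bucket -> running total
  val auditLog = mutableListOf<Query>() // append-only audit entries

  // Returns an audit reference (here: the index of the first entry written).
  fun charge(queries: List<Query>): Int {
    // 1. Check for duplicates by reference ID.
    val fresh = queries.distinctBy { it.referenceId }.filter { it.referenceId !in committedIds }
    // 2. Calculate the delta: aggregate new charges per bucket into a slice.
    val slice = mutableMapOf<String, Charge>()
    for (q in fresh) {
      for (bucket in q.buckets) {
        slice[bucket] = (slice[bucket] ?: Charge.ZERO) + q.charge
      }
    }
    // 3. Read current charges and check limits before writing anything.
    val updated = slice.mapValues { (bucket, added) -> (totals[bucket] ?: Charge.ZERO) + added }
    for ((bucket, total) in updated) {
      if (total.epsilon > maxEpsilon || total.delta > maxDelta) throw InsufficientBudget(bucket)
    }
    // 4. Commit charges and query IDs to the ledger together.
    totals.putAll(updated)
    committedIds.addAll(fresh.map { it.referenceId })
    // 5. Write all queries, including duplicates, to the audit log.
    val auditRef = auditLog.size
    auditLog.addAll(queries)
    return auditRef
  }

  fun consumed(bucket: String): Charge = totals[bucket] ?: Charge.ZERO
}
```

In this sketch, charging the same reference ID twice leaves the bucket total unchanged but still produces a second audit entry, and a rejected charge throws before either store is modified.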
Ledger
File: Ledger.kt
Exception: LedgerException.kt
The ledger is a transactional backing store for privacy charges.
Responsibilities
- Store privacy charge rows (bucket ID → epsilon, delta)
- Store query records with commit timestamps
- Provide transactional reads and writes
- Support querying by reference ID for deduplication
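One way to picture these responsibilities is as a small transactional interface. The shapes below (`Ledger`, `LedgerTransaction`, `MapLedger`, and their method names) are assumptions made for illustration; the actual interfaces in Ledger.kt may look different. The in-memory implementation only buffers writes until commit and does not model locking or isolation.

```kotlin
// Hypothetical shapes; not the real Ledger.kt API.
data class Charge(val epsilon: Double, val delta: Double)

interface LedgerTransaction {
  fun readCharges(bucketIds: Set<String>): Map<String, Charge>
  fun readCommittedReferenceIds(referenceIds: Set<String>): Set<String>
  fun write(newCharges: Map<String, Charge>, referenceIds: Set<String>)
  fun commit()
}

interface Ledger {
  fun begin(): LedgerTransaction
}

// Minimal in-memory ledger: writes stay pending until commit() is called.
class MapLedger : Ledger {
  private val charges = mutableMapOf<String, Charge>()
  private val refs = mutableSetOf<String>()

  override fun begin(): LedgerTransaction =
    object : LedgerTransaction {
      private val pendingCharges = mutableMapOf<String, Charge>()
      private val pendingRefs = mutableSetOf<String>()

      override fun readCharges(bucketIds: Set<String>): Map<String, Charge> =
        charges.filterKeys { it in bucketIds }

      override fun readCommittedReferenceIds(referenceIds: Set<String>): Set<String> =
        refs.intersect(referenceIds)

      override fun write(newCharges: Map<String, Charge>, referenceIds: Set<String>) {
        pendingCharges.putAll(newCharges)
        pendingRefs.addAll(referenceIds)
      }

      override fun commit() {
        charges.putAll(pendingCharges)
        refs.addAll(pendingRefs)
      }
    }
}
```

Keeping reads and writes on the transaction object, rather than on the ledger itself, makes it explicit that the deduplication lookup, the charge read, and the write belong to one unit of work.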
Implementations
PostgresLedger
File: deploy/postgres/PostgresLedger.kt
PostgreSQL-backed ledger:
- Uses ACID transactions
- Row-level locking for bucket charges
- Efficient indexing on reference IDs
InMemoryLedger
File: testing/InMemoryLedger.kt
For testing:
- No persistence
- Simulates transactional behavior
- Fast for unit tests
Ledger Row Keys
File: Slice.kt
A “slice” contains:
- Map of ledger row keys to privacy charges
- Ledger row key = bucket identifier
- Privacy charge = (epsilon, delta) tuple
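The slice structure described above can be sketched as a plain map from row key to charge. `LedgerRowKey`, `PrivacyCharge`, `ChargedQuery`, and `buildSlice` are hypothetical names used only to show the shape; the real types in Slice.kt are not reproduced here.

```kotlin
// Hypothetical types illustrating the slice shape: row key -> (epsilon, delta).
data class LedgerRowKey(val bucketId: String)

data class PrivacyCharge(val epsilon: Double, val delta: Double)

data class ChargedQuery(val rowKeys: List<LedgerRowKey>, val charge: PrivacyCharge)

// Aggregates the charges of several queries into one slice, summing per row key.
fun buildSlice(queries: List<ChargedQuery>): Map<LedgerRowKey, PrivacyCharge> {
  val slice = mutableMapOf<LedgerRowKey, PrivacyCharge>()
  for (query in queries) {
    for (key in query.rowKeys) {
      val current = slice[key] ?: PrivacyCharge(0.0, 0.0)
      slice[key] =
        PrivacyCharge(current.epsilon + query.charge.epsilon, current.delta + query.charge.delta)
    }
  }
  return slice
}
```

Using a data class as the map key gives value-based equality, so two queries that target the same bucket land on the same row.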
Audit Log
File: AuditLog.kt
The audit log is an append-only, immutable log owned by the Event Data Provider (EDP).
Responsibilities
- Record all queries presented to PBM
- Provide tamper-evident logging
- Enable independent privacy audits
- Return audit reference ID for each write
Implementations
GcsAuditLog
File: deploy/gcloud/GcsAuditLog.kt
Google Cloud Storage backed audit log:
- Writes to GCS bucket owned by EDP
- Object names include timestamps for ordering
- Immutable once written (via bucket policy)
- Can be in different GCP project than PBM
InMemoryAuditLog
File: testing/InMemoryAuditLog.kt
For testing purposes.
Audit Trail
The audit log enables:
Independent Verification:
- EDP can verify PBM operated correctly
- Auditor can reconstruct privacy budget usage
- Detect unauthorized queries
- Demonstrate privacy budget enforcement
- Show all queries were properly accounted
- Provide evidence for privacy audits
Landscape Processor
File: LandscapeProcessor.kt
Processes landscape definitions and maps queries to privacy buckets.
MappingNode
Defines a landscape and, optionally, how to map from a previous landscape.
Landscape Evolution
As privacy bucket structure changes over time:
- Initial Landscape: Define initial bucket structure
- New Landscape: Define new bucket structure
- Mapping Function: Define how to map old buckets to new
- Append to Chain: Add new MappingNode to landscapeMappingChain
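The evolution steps above can be illustrated with a toy chain. The `MappingNode` class below is a hypothetical simplification (an ID plus an optional bucket-mapping function), and `mapToActive` is an assumed helper; the real MappingNode in LandscapeProcessor.kt likely carries more structure.

```kotlin
// Hypothetical model of a chain node; not the real MappingNode definition.
class MappingNode(
  val landscapeId: String,
  // Maps one bucket ID of the previous landscape to bucket IDs of this landscape.
  // Null for the first node, which has no predecessor.
  val fromPrevious: ((String) -> List<String>)?,
)

// Walks the chain from the landscape owning bucketId to the tail (the active landscape).
fun mapToActive(chain: List<MappingNode>, landscapeId: String, bucketId: String): List<String> {
  val start = chain.indexOfFirst { it.landscapeId == landscapeId }
  require(start >= 0) { "unknown landscape $landscapeId" }
  var buckets = listOf(bucketId)
  for (node in chain.drop(start + 1)) {
    val mapping = requireNotNull(node.fromPrevious) { "node ${node.landscapeId} lacks a mapping" }
    buckets = buckets.flatMap { mapping(it) }.distinct()
  }
  return buckets
}
```

For example, a chain that moves from daily to monthly buckets could map the daily bucket "2024-03-05" to the monthly bucket "2024-03", so charges recorded under the old landscape still count against the right bucket in the active one.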
Privacy Charge Calculation
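The PBM calculates privacy charge using composition theorems.
Sequential Composition
Multiple queries on the same data compose additively: under basic sequential composition, their epsilon values and their delta values each add up.
Group Privacy
Queries may have a groupId representing related queries: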
- Queries in the same group are charged together
- Group-level privacy guarantees may apply
- Enables advanced composition techniques
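For sequential composition, the basic theorem can be written out as follows. This is the textbook form and assumes simple additive accounting; whether the PBM applies a tighter accountant for grouped queries is not stated here. With Q_b denoting the set of committed queries that touch bucket b:

```latex
\varepsilon_{\text{total}}(b) = \sum_{i \in Q_b} \varepsilon_i ,
\qquad
\delta_{\text{total}}(b) = \sum_{i \in Q_b} \delta_i
```

A new charge is admissible only if both totals stay within their limits:

```latex
\varepsilon_{\text{total}}(b) \le \texttt{maximumPrivacyBudget},
\qquad
\delta_{\text{total}}(b) \le \texttt{maximumTotalDelta}
```

As a worked example with maximumPrivacyBudget = 10.0: three queries with ε = 2, 3, and 4 bring a bucket to 9, which is accepted. A fourth query with ε = 2 would bring it to 11 and is rejected.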
Error Handling
Insufficient Privacy Budget
Exception: InsufficientPrivacyBudgetException
When: Adding query charges would exceed budget limits
Resolution:
- Query is rejected (not charged)
- Ledger and audit log remain unchanged for this query
- Caller must wait or adjust query parameters
Ledger Transaction Failure
Exception: LedgerException
When: Database transaction fails
Resolution:
- Entire operation rolled back
- Nothing written to ledger
- Audit log not written (since charge failed)
- Caller should retry
Audit Log Write Failure
When: Audit log write fails after successful ledger commit
Critical Scenario:
- Privacy budget HAS been consumed
- But audit log doesn’t reflect it
- Caller should NOT fulfill requisitions
- System should alert operators
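From the caller's side, the three failure modes above lead to three different reactions. The sketch below defines local stand-ins for the exception types so that it is self-contained. `AuditLogWriteException`, `Decision`, and `decide` are hypothetical names: this page does not name the exception raised on an audit log failure.

```kotlin
// Local stand-ins so the sketch compiles on its own; the real classes live in the PBM library.
class InsufficientPrivacyBudgetException(message: String) : Exception(message)

class LedgerException(message: String) : Exception(message)

// Hypothetical: the exception type for an audit log write failure is an assumption.
class AuditLogWriteException(message: String) : Exception(message)

enum class Decision { FULFILL, REJECT, RETRY, ALERT_AND_HOLD }

// Runs a charge attempt and maps its outcome to what the caller should do next.
fun decide(charge: () -> String): Decision =
  try {
    charge() // returns the audit reference ID on success
    Decision.FULFILL
  } catch (e: InsufficientPrivacyBudgetException) {
    Decision.REJECT // nothing was charged; wait or adjust query parameters
  } catch (e: LedgerException) {
    Decision.RETRY // transaction rolled back; safe to try again
  } catch (e: AuditLogWriteException) {
    Decision.ALERT_AND_HOLD // budget consumed but not audited; do not fulfill
  }
```

The important asymmetry is the last branch: the budget is already spent, so retrying the audit write or alerting an operator is appropriate, while fulfilling the requisition is not.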
Integration with Halo
The Privacy Budget Manager integrates with the Halo system.
EDP Integration
EDPs use PBM before fulfilling requisitions.
Privacy Parameters
Measurement Consumers specify privacy parameters:
- Epsilon and delta in measurement request
- Kingdom validates parameters
- EDP PBM enforces budget limits
Deployment Considerations
Ledger Backend
Choose based on requirements:
PostgreSQL:
- ACID transactions
- Mature and well-understood
- Good for moderate scale
- Easier to operate
Distributed database alternative:
- Global distribution
- Higher scalability
- Built-in high availability
- Higher cost
Audit Log Storage
Requirements:
- Immutable (write-once)
- Owned by EDP (different from PBM operator)
- Durable and highly available
- Access controls (EDP and auditor only)
Recommended configuration:
- Enable object versioning
- Set bucket retention policy
- Use separate GCP project/AWS account
- Encrypt at rest
High Availability
Ensure PBM is highly available:
- Multiple PBM instances behind load balancer
- Database with replication and failover
- Monitoring and alerting for failures
- Automated retry with exponential backoff
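The retry item above can be sketched as a small helper. It assumes that only ledger failures are retried, which is safe because charging is idempotent by reference ID. `retryWithBackoff`, its parameters, and the local `LedgerException` stand-in are hypothetical.

```kotlin
// Local stand-in so the sketch compiles on its own.
class LedgerException(message: String) : Exception(message)

// Retries block on LedgerException, doubling the delay each time up to maxDelayMs.
// The sleep function is injectable so tests can run without real delays.
fun <T> retryWithBackoff(
  maxAttempts: Int = 5,
  initialDelayMs: Long = 100,
  maxDelayMs: Long = 5_000,
  sleep: (Long) -> Unit = { Thread.sleep(it) },
  block: () -> T,
): T {
  require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
  var delayMs = initialDelayMs
  for (attempt in 1 until maxAttempts) {
    try {
      return block()
    } catch (e: LedgerException) {
      sleep(delayMs)
      delayMs = minOf(delayMs * 2, maxDelayMs)
    }
  }
  // Final attempt: let any exception propagate to the caller.
  return block()
}
```

Other exceptions, such as a budget rejection, are deliberately not caught: retrying would not change the outcome. In production, adding random jitter to the delay helps avoid synchronized retries across instances.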
Security
Access Control
- Only authorized services can call PBM.charge()
- Mutual TLS for authentication
- Rate limiting to prevent abuse
Data Protection
- Queries may contain sensitive targeting criteria
- Encrypt data in transit and at rest
- Minimize retention of query details
Audit Log Security
- Audit log immutable and tamper-evident
- Separate from PBM operator control
- EDP controls access
- Enable logging of access to audit log itself
Monitoring
Budget Utilization
Current privacy budget consumed per bucket
Rejection Rate
Percentage of queries rejected due to insufficient budget
Charge Latency
Time to process charge() operation
Audit Log Lag
Delay between ledger commit and audit log write
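The first two metrics reduce to simple ratios. The helpers below are a hypothetical sketch of how they might be computed from counters; the function names and the choice to express rejection rate as a percentage of all charge attempts are assumptions.

```kotlin
// Fraction of a bucket's epsilon budget already consumed (1.0 means exhausted).
fun budgetUtilization(consumedEpsilon: Double, maximumPrivacyBudget: Double): Double {
  require(maximumPrivacyBudget > 0.0) { "maximumPrivacyBudget must be positive" }
  return consumedEpsilon / maximumPrivacyBudget
}

// Percentage of charge attempts rejected for insufficient budget.
fun rejectionRate(rejected: Long, total: Long): Double =
  if (total == 0L) 0.0 else rejected.toDouble() / total * 100
```

Tracking utilization per bucket, rather than as one global number, makes it possible to alert before a specific bucket is exhausted.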
Best Practices
Setting Budget Limits
- Base on differential privacy theory and acceptable privacy loss
- Consider cumulative loss over bucket lifetime
- Set conservative limits initially
- Monitor and adjust based on usage patterns
Landscape Design
- Align bucket granularity with measurement frequency
- Finer buckets = more flexibility, more complex accounting
- Coarser buckets = simpler, but less flexibility
- Plan for landscape evolution
Testing
- Test with representative query patterns
- Verify budget enforcement at limits
- Test landscape mapping functions
- Simulate concurrent charge() calls
- Validate audit log completeness
Next Steps
EDP Aggregator
Learn how EDPs integrate with PBM
Kingdom Overview
Understand measurement orchestration