What is Differential Privacy?
Differential privacy is a mathematical framework for quantifying and limiting the privacy risk when analyzing datasets containing sensitive information. In the Halo Cross-Media Measurement System, differential privacy ensures that individual user data cannot be extracted from measurement results, even when an adversary has access to aggregate statistics.

Differential privacy provides a rigorous guarantee: the presence or absence of any single individual in a dataset has a negligible effect on the output of any analysis.
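This guarantee has a standard formal statement. A randomized mechanism M is (ε, δ)-differentially private if, for every pair of datasets D and D′ that differ in one individual's records, and every set of possible outputs S:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta
```

Smaller ε and δ make the two output distributions harder to distinguish, so less can be inferred about any one individual from the result.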
Why Differential Privacy?
The measurement system processes sensitive user-level data from multiple publishers to compute cross-media reach and frequency statistics. Without privacy protections, these aggregated measurements could leak information about individual users, especially when:

- Multiple measurements are made over the same population
- Adversaries have auxiliary information about users
- Small audience segments are measured
How Noise Protects Privacy
The system adds random noise at multiple stages of the measurement protocol:

Publisher-Level Noise
Publishers add noise when computing their encrypted sketches. This protects individual user privacy against other participants in the measurement system.

Global Noise Mechanisms
The Duchies introduce additional noise during the secure multiparty computation phases:

Blind Histogram Noise
Added during the setup phase to protect the distribution of user frequencies across publishers. This noise is distributed among multiple Duchies to maintain security even if some Duchies are compromised.
Reach Noise
Applied to the final reach estimate to ensure that the total unique audience count preserves differential privacy guarantees.
Frequency Noise
Added to frequency distribution estimates to protect information about how many times individual users were exposed to campaigns.
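The exact noise distributions and parameters are fixed by the protocol configuration. As an illustration only (not the system's actual parameters), a two-sided geometric (discrete Laplace) mechanism adds integer-valued noise to a count while preserving ε-differential privacy:

```python
import math
import random

def discrete_laplace_noise(epsilon: float, sensitivity: int = 1) -> int:
    """Sample two-sided geometric (discrete Laplace) noise with
    P(k) proportional to r^|k|, where r = exp(-epsilon / sensitivity)."""
    r = math.exp(-epsilon / sensitivity)

    def geometric() -> int:
        # Number of failures before the first success (success prob 1 - r),
        # sampled by inverting the tail P(G >= k) = r^k.
        u = 1.0 - random.random()  # uniform in (0, 1]
        return int(math.log(u) / math.log(r))

    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric() - geometric()

def noisy_count(true_count: int, epsilon: float) -> int:
    """Return an epsilon-DP version of an integer count (sensitivity 1)."""
    return true_count + discrete_laplace_noise(epsilon)
```

With small ε the noise spread grows; with large ε the noise concentrates near zero, so the noisy count stays close to the true count.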
Noise Mechanisms
The system supports multiple noise mechanisms based on the measurement requirements:

- Laplace Mechanism: Adds noise from a Laplace distribution calibrated to the query's L1 sensitivity; provides pure ε-differential privacy
- Gaussian Mechanism: Adds Gaussian noise calibrated to L2 sensitivity; provides (ε, δ)-differential privacy and often better accuracy when many queries are composed
- Discrete Noise: Integer-valued noise (e.g., discrete Laplace or discrete Gaussian) for count-based measurements
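The continuous mechanisms above can be sketched in a few lines. This is an illustrative implementation of the textbook calibrations (Laplace scale Δ/ε; Gaussian σ from the classic analytic bound, which is valid for ε < 1), not the system's production code:

```python
import math
import random

def laplace_mechanism(value: float, sensitivity: float, epsilon: float) -> float:
    """epsilon-DP: add Laplace noise with scale b = sensitivity / epsilon.
    A Laplace variable is the difference of two exponential variables."""
    b = sensitivity / epsilon
    return value + b * (random.expovariate(1.0) - random.expovariate(1.0))

def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classic (epsilon, delta)-DP calibration: sigma = D * sqrt(2 ln(1.25/delta)) / epsilon."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def gaussian_mechanism(value: float, sensitivity: float,
                       epsilon: float, delta: float) -> float:
    """(epsilon, delta)-DP: add Gaussian noise with the calibrated sigma."""
    return value + random.gauss(0.0, gaussian_sigma(sensitivity, epsilon, delta))
```

For a reach query with sensitivity 1 (adding or removing one user changes the count by at most 1), `laplace_mechanism(reach, 1.0, epsilon)` yields an ε-DP estimate.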
Privacy Budget Management
Differential privacy uses the concept of a privacy budget (epsilon, ε) to quantify privacy loss:

- Lower ε values = stronger privacy guarantees, but more noise and less accuracy
- Higher ε values = weaker privacy guarantees, but less noise and higher accuracy
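The trade-off is visible directly in the noise calibration: for the Laplace mechanism the noise scale is b = sensitivity/ε, and the expected absolute error of the noisy answer equals b. A quick sketch with sensitivity 1 and illustrative ε values:

```python
# Laplace noise scale b = sensitivity / epsilon: shrinking epsilon
# (stronger privacy) grows the expected error proportionally.
sensitivity = 1.0
scales = {eps: sensitivity / eps for eps in (0.1, 1.0, 10.0)}
for eps, b in scales.items():
    print(f"epsilon = {eps:>4}: expected absolute error = {b}")
```

Halving ε doubles the expected error, which is why tight budgets hurt small-audience measurements most: the same absolute noise is a larger fraction of a small count.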
Privacy Budget Composition
When multiple measurements are performed on the same population, privacy budgets accumulate. The system tracks and manages this composition to ensure total privacy loss remains within acceptable bounds.

Typical privacy parameters used in the system:
- ε (epsilon): Privacy budget, commonly set between 1.0 and 10.0
- δ (delta): Failure probability, typically set to 1/population_size or smaller
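Under basic sequential composition, the ε and δ of independent measurements over the same population simply add, and the running total must stay under the allowed bound. A minimal sketch (a real accountant may use tighter, advanced composition theorems):

```python
def compose_sequential(budgets: list[tuple[float, float]]) -> tuple[float, float]:
    """Basic sequential composition: running k mechanisms with budgets
    (eps_i, delta_i) on the same data is (sum eps_i, sum delta_i)-DP."""
    total_eps = sum(eps for eps, _ in budgets)
    total_delta = sum(delta for _, delta in budgets)
    return total_eps, total_delta

# Three measurements over one campaign population:
eps, delta = compose_sequential([(1.0, 1e-9), (0.5, 1e-9), (2.0, 1e-9)])
assert eps == 3.5  # total epsilon consumed so far
```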
Distributed Privacy Budget
In the multi-party setting, the total privacy budget is distributed among:

- Publisher noise: Each data provider’s contribution to privacy
- MPC noise: Noise added by Duchies during computation
- Multiple queries: Budget allocated across different measurements
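How the split is chosen is a policy decision. As a purely hypothetical sketch (the names and weights below are illustrative, not the system's actual allocation), a total budget could be divided by fixed weights across the three sinks listed above:

```python
def allocate_budget(total_epsilon: float, weights: dict[str, float]) -> dict[str, float]:
    """Split a total epsilon across consumers in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}

shares = allocate_budget(10.0, {
    "publisher_noise": 2.0,  # hypothetical weight for publisher-added noise
    "mpc_noise": 3.0,        # hypothetical weight for Duchy (MPC) noise
    "queries": 5.0,          # hypothetical weight reserved for measurement queries
})
```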
Privacy-Accuracy Trade-offs
Differential privacy involves fundamental trade-offs:

| Aspect | Impact |
|---|---|
| Smaller audiences | Require more noise (relative to signal), reducing accuracy |
| Tighter privacy budgets | Better privacy but lower measurement precision |
| Frequency estimation | More complex than reach, requires additional budget |
| Multiple queries | Each query consumes budget, limiting total measurements |
Research Foundations
The differential privacy implementation is based on extensive research:

Privacy-Preserving Reach Estimation
System design paper covering the complete architecture
Secure Cardinality Estimation
Technical details on privacy-preserving cardinality and frequency estimation
Additional Resources
For those new to differential privacy, the following resources provide comprehensive introductions:

- Differential Privacy from the Ground Up: Conceptual introduction to differential privacy fundamentals (available in ~/workspace/source/docs/dp_intro/)
- Introduction to Differential Privacy (Slides): Visual presentation of key concepts
- Differential Privacy and Randomized Response: Historical context and classical techniques
Implementation in Halo
The noise computation is implemented in:

- C++ noise parameters: src/main/cc/wfa/measurement/internal/duchy/protocol/common/noise_parameters_computation.h
- Protocol configurations: src/main/proto/wfa/measurement/internal/duchy/protocol/liquid_legions_v2_noise_config.proto
Next Steps
Secure Computation
Learn how multiple parties compute results without revealing individual data
Sketches
Understand the data structures that enable privacy-preserving measurements