What is Differential Privacy?
Differential privacy is a mathematical framework for quantifying and limiting the privacy risk when analyzing datasets containing sensitive information. In the Halo Cross-Media Measurement System, differential privacy ensures that individual user data cannot be extracted from measurement results, even when an adversary has access to aggregate statistics.

Differential privacy provides a rigorous guarantee: the presence or absence of any single individual in a dataset has a negligible effect on the output of any analysis.
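This guarantee has a standard formal statement. A randomized mechanism M is (ε, δ)-differentially private if, for every pair of datasets D and D′ that differ in one individual's records, and every set of possible outputs S:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta
```

Smaller ε and δ make the two output distributions harder to distinguish, so less can be inferred about any one individual from the result.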
Why Differential Privacy?
The measurement system processes sensitive user-level data from multiple publishers to compute cross-media reach and frequency statistics. Without privacy protections, these aggregated measurements could leak information about individual users, especially when:

- Multiple measurements are made over the same population
- Adversaries have auxiliary information about users
- Small audience segments are measured
How Noise Protects Privacy
The system adds random noise at multiple stages of the measurement protocol:

Publisher-Level Noise
Publishers add noise when computing their encrypted sketches. This protects individual user privacy against other participants in the measurement system.

Global Noise Mechanisms
The Duchies introduce additional noise during the secure multiparty computation phases:

Blind Histogram Noise
Added during the setup phase to protect the distribution of user frequencies across publishers. This noise is distributed among multiple Duchies to maintain security even if some Duchies are compromised.
Reach Noise
Applied to the final reach estimate to ensure that the total unique audience count preserves differential privacy guarantees.
Frequency Noise
Added to frequency distribution estimates to protect information about how many times individual users were exposed to campaigns.
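The exact noise distributions and parameters are fixed by the protocol configuration. As an illustration only (not the system's actual parameters), a two-sided geometric (discrete Laplace) mechanism adds integer-valued noise to a count while preserving ε-differential privacy:

```python
import math
import random

def discrete_laplace_noise(epsilon: float, sensitivity: int = 1) -> int:
    """Sample two-sided geometric (discrete Laplace) noise with
    P(k) proportional to r^|k|, where r = exp(-epsilon / sensitivity)."""
    r = math.exp(-epsilon / sensitivity)

    def geometric() -> int:
        # Number of failures before the first success (success prob 1 - r),
        # sampled by inverting the tail P(G >= k) = r^k.
        u = 1.0 - random.random()  # uniform in (0, 1]
        return int(math.log(u) / math.log(r))

    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric() - geometric()

def noisy_count(true_count: int, epsilon: float) -> int:
    """Return an epsilon-DP version of an integer count (sensitivity 1)."""
    return true_count + discrete_laplace_noise(epsilon)
```

With small ε the noise spread grows; with large ε the noise concentrates near zero, so the noisy count stays close to the true count.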
Noise Mechanisms
The system supports multiple noise mechanisms based on the measurement requirements:

- Laplace Mechanism: Adds noise from a Laplace distribution calibrated to the query's L1 sensitivity; provides pure ε-differential privacy
- Gaussian Mechanism: Adds Gaussian noise calibrated to L2 sensitivity; provides (ε, δ)-differential privacy and often better accuracy when many queries are composed
- Discrete Noise: Integer-valued noise (e.g., discrete Laplace or discrete Gaussian) for count-based measurements
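The continuous mechanisms above can be sketched in a few lines. This is an illustrative implementation of the textbook calibrations (Laplace scale Δ/ε; Gaussian σ from the classic analytic bound, which is valid for ε < 1), not the system's production code:

```python
import math
import random

def laplace_mechanism(value: float, sensitivity: float, epsilon: float) -> float:
    """epsilon-DP: add Laplace noise with scale b = sensitivity / epsilon.
    A Laplace variable is the difference of two exponential variables."""
    b = sensitivity / epsilon
    return value + b * (random.expovariate(1.0) - random.expovariate(1.0))

def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classic (epsilon, delta)-DP calibration: sigma = D * sqrt(2 ln(1.25/delta)) / epsilon."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def gaussian_mechanism(value: float, sensitivity: float,
                       epsilon: float, delta: float) -> float:
    """(epsilon, delta)-DP: add Gaussian noise with the calibrated sigma."""
    return value + random.gauss(0.0, gaussian_sigma(sensitivity, epsilon, delta))
```

For a reach query with sensitivity 1 (adding or removing one user changes the count by at most 1), `laplace_mechanism(reach, 1.0, epsilon)` yields an ε-DP estimate.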
Privacy Budget Management
Differential privacy uses the concept of a privacy budget (epsilon, ε) to quantify privacy loss:

- Lower ε values = stronger privacy guarantees, but more noise and less accuracy
- Higher ε values = weaker privacy guarantees, but less noise and higher accuracy
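The trade-off is visible directly in the noise calibration: for the Laplace mechanism the noise scale is b = sensitivity/ε, and the expected absolute error of the noisy answer equals b. A quick sketch with sensitivity 1 and illustrative ε values:

```python
# Laplace noise scale b = sensitivity / epsilon: shrinking epsilon
# (stronger privacy) grows the expected error proportionally.
sensitivity = 1.0
scales = {eps: sensitivity / eps for eps in (0.1, 1.0, 10.0)}
for eps, b in scales.items():
    print(f"epsilon = {eps:>4}: expected absolute error = {b}")
```

Halving ε doubles the expected error, which is why tight budgets hurt small-audience measurements most: the same absolute noise is a larger fraction of a small count.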
Privacy Budget Composition
When multiple measurements are performed on the same population, privacy budgets accumulate. The system tracks and manages this composition to ensure total privacy loss remains within acceptable bounds.

Typical privacy parameters used in the system:
- ε (epsilon): Privacy budget, commonly set between 1.0 and 10.0
- δ (delta): Failure probability, typically set to 1/population_size or smaller
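Under basic sequential composition, the ε and δ of independent measurements over the same population simply add, and the running total must stay under the allowed bound. A minimal sketch (a real accountant may use tighter, advanced composition theorems):

```python
def compose_sequential(budgets: list[tuple[float, float]]) -> tuple[float, float]:
    """Basic sequential composition: running k mechanisms with budgets
    (eps_i, delta_i) on the same data is (sum eps_i, sum delta_i)-DP."""
    total_eps = sum(eps for eps, _ in budgets)
    total_delta = sum(delta for _, delta in budgets)
    return total_eps, total_delta

# Three measurements over one campaign population:
eps, delta = compose_sequential([(1.0, 1e-9), (0.5, 1e-9), (2.0, 1e-9)])
assert eps == 3.5  # total epsilon consumed so far
```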
Distributed Privacy Budget
In the multi-party setting, the total privacy budget is distributed among:

- Publisher noise: Each data provider’s contribution to privacy
- MPC noise: Noise added by Duchies during computation
- Multiple queries: Budget allocated across different measurements
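How the split is chosen is a policy decision. As a purely hypothetical sketch (the names and weights below are illustrative, not the system's actual allocation), a total budget could be divided by fixed weights across the three sinks listed above:

```python
def allocate_budget(total_epsilon: float, weights: dict[str, float]) -> dict[str, float]:
    """Split a total epsilon across consumers in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}

shares = allocate_budget(10.0, {
    "publisher_noise": 2.0,  # hypothetical weight for publisher-added noise
    "mpc_noise": 3.0,        # hypothetical weight for Duchy (MPC) noise
    "queries": 5.0,          # hypothetical weight reserved for measurement queries
})
```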
Privacy-Accuracy Trade-offs
Differential privacy involves fundamental trade-offs:

| Aspect | Impact |
|---|---|
| Smaller audiences | Require more noise (relative to signal), reducing accuracy |
| Tighter privacy budgets | Better privacy but lower measurement precision |
| Frequency estimation | More complex than reach, requires additional budget |
| Multiple queries | Each query consumes budget, limiting total measurements |
Research Foundations
The differential privacy implementation is based on extensive research:

Privacy-Preserving Reach Estimation
System design paper covering the complete architecture
Secure Cardinality Estimation
Technical details on privacy-preserving cardinality and frequency estimation
Additional Resources
For those new to differential privacy, the following resources provide comprehensive introductions:

- Differential Privacy from the Ground Up: Conceptual introduction to differential privacy fundamentals (available in ~/workspace/source/docs/dp_intro/)
- Introduction to Differential Privacy (Slides): Visual presentation of key concepts
- Differential Privacy and Randomized Response: Historical context and classical techniques
Implementation in Halo
The noise computation is implemented in:

- C++ noise parameters: src/main/cc/wfa/measurement/internal/duchy/protocol/common/noise_parameters_computation.h
- Protocol configurations: src/main/proto/wfa/measurement/internal/duchy/protocol/liquid_legions_v2_noise_config.proto
Next Steps
Secure Computation
Learn how multiple parties compute results without revealing individual data
Sketches
Understand the data structures that enable privacy-preserving measurements