Skip to main content

What are Sketches?

Sketches are compact, probabilistic data structures that enable approximate counting and frequency estimation with strong privacy guarantees. In the Halo Cross-Media Measurement System, publishers use sketches to represent their user populations in a form that can be:
  • Encrypted: Protected from unauthorized access
  • Aggregated: Combined across multiple publishers
  • Measured: Used to estimate reach and frequency
  • Private: Impossible to reverse-engineer individual user data
Sketches are inspired by algorithms like HyperLogLog, but modified significantly to support encryption, secure multiparty computation, and differential privacy.

Why Use Sketches?

Direct counting of unique users across publishers faces several challenges:
ChallengeHow Sketches Help
PrivacyCannot reverse-engineer individual users from sketch
EncryptionSketches support homomorphic operations when encrypted
ScalabilityFixed-size representation regardless of audience size
AccuracyProbabilistic but statistically rigorous estimates
Cross-publisherSketches from different publishers can be combined

Sketch Structure

A sketch consists of multiple registers, where each register stores information about a subset of users:

Register Components

Each register in a Liquid Legions sketch contains:
Register Index: Determines which register a user contributes to
  • Derived from hashing user identifiers
  • Distributes users uniformly across registers
  • Enables independent estimation from each register
Example: A sketch with 2^20 registers has indices from 0 to 1,048,575

Sketch Size Parameters

Sketches are configured with several parameters:
// From liquid_legions_sketch_parameter.proto
- decay_rate: Controls accuracy vs. sketch size
- size: Total number of registers (typically 2^20 to 2^25)
- sampling_rate: Fraction of users included in the sketch
- max_frequency: Maximum frequency value tracked
Trade-off: Larger sketches (more registers) provide better accuracy but increase computation cost and bandwidth requirements.

How Publishers Compute Sketches

Publishers follow these steps to create encrypted sketches:
1

Hash User Identifiers

For each user in the campaign audience:
  1. Hash the user identifier to determine the register index
  2. Hash the identifier again to create the key value
  3. Track the frequency (exposure count) for the user
2

Apply Differential Privacy

Add publisher-level noise:
  • Some users are randomly added or removed
  • Frequencies are perturbed
  • Noise calibrated to achieve the required privacy budget
3

Encrypt Register Values

Encrypt each register’s key and count:
  • Map plaintext values to elliptic curve points
  • Encrypt using ElGamal with the composite Duchy public key
  • Each register becomes a pair of ElGamal ciphertexts
4

Submit to Duchy

Send the encrypted sketch to a Duchy:
  • Sketch is stored encrypted in Duchy storage
  • Multiple sketches may be required per publisher (for different requisitions)
  • Publisher cannot decrypt or modify the sketch after submission

Encryption Details

Publishers use ElGamal encryption over elliptic curves:
  • Public key: Composite key from all Duchies (received from Kingdom)
  • Plaintext space: Points on an elliptic curve (e.g., NIST P-256)
  • Ciphertext: Pair of curve points (C₁, C₂) per encrypted value
  • Properties: Additively homomorphic, semantically secure
See Protocol Cryptography for cryptographic details.

Sketch Aggregation in the Protocol

The MPC protocol performs several operations on encrypted sketches:

1. Combining Sketches (Setup Phase)

The aggregator Duchy combines encrypted sketches from all publishers:
Global CRV = Combine(Sketch₁, Sketch₂, ..., Sketchₙ)
Process:
  • For each register index i
  • Collect all (key, count) pairs across all publisher sketches
  • Combine into a single Combined Register Vector (CRV)
  • Add distributed differential privacy noise

2. Shuffling (Execution Phase One)

Each Duchy shuffles the CRV to hide the original structure:
Shuffling prevents adversaries from:
  • Linking decrypted values back to specific publishers
  • Using register index patterns to infer user overlap
  • Correlating multiple measurements over time
Each Duchy applies an independent random permutation, compounding the privacy protection.

3. Join by Encrypted Keys (Execution Phase One)

The aggregator identifies registers with the same encrypted key:
  • Same key = same user across different publishers
  • Counts from matching keys are combined
  • Result: One (flag, count) tuple per unique user
Cryptographic challenge: Keys are encrypted, so comparison must be done without decryption! Solution: Uses Same Key Aggregation (SKA) with ElGamal properties:
  • Blind and re-randomize ciphertexts
  • Destroyed registers (non-matching keys) become identifiable after decryption
  • Preserved registers contain combined counts

4. Frequency Aggregation (Execution Phase Two & Three)

After partial decryption:
  • Flag values reveal which registers represent real users vs. noise
  • Count values are decrypted to reveal frequency information
  • A 2-D matrix aggregates counts: matrix[frequency][index]
  • Final reach and frequency statistics are estimated from the matrix

Privacy Properties of Sketches

Sketches provide several privacy protections:

1. One-Way Function

User identifiers are hashed to produce keys. The hash function is one-way: ✅ User ID → Key (easy)
❌ Key → User ID (computationally infeasible)
Even after decryption, the system learns only that “some user with hash value X” appeared, not the user’s actual identity.

2. Aggregation

Multiple users contribute to each register. Individual contributions are mixed together, making it impossible to isolate a single user’s data.

3. Noise

Differential privacy noise is added at multiple levels:
  • Publisher-level: During sketch construction
  • MPC-level: During setup and aggregation phases
  • Result-level: In final reach/frequency estimates
See Differential Privacy for details.

4. Encryption

Sketches remain encrypted throughout transmission and most of the computation. Only after all Duchies collaborate in the MPC protocol can values be decrypted.

Advanced: HyperLogLog-Style Estimation

The sketch design borrows concepts from HyperLogLog (HLL):

HyperLogLog Basics

HLL estimates cardinality by:
  1. Hashing elements and observing the position of the first ‘1’ bit
  2. Maintaining maximum observed positions in multiple registers
  3. Computing harmonic mean across registers

Modifications for Privacy

The Liquid Legions protocol modifies HLL:
HLL FeatureLiquid Legions Modification
Store max bit positionStore encrypted user key + count
Deterministic mergeSecure multi-party join
Direct cardinalityEstimate after decryption
No frequencyFull frequency distribution
Important: Standard HyperLogLog is NOT used directly because:
  • It doesn’t support encryption of intermediate values
  • Merging encrypted HLL sketches would leak information
  • No support for frequency estimation
  • Difficult to add differential privacy without bias

Reach-Only vs. Frequency Sketches

The system supports two sketch variants:
Use case: Only measuring unique users (cardinality)Optimizations:
  • No count values stored (only keys)
  • Faster computation (fewer decryptions)
  • Smaller bandwidth requirements
  • Simpler protocol stages
Implementation: ReachOnlyLiquidLegionsSketchAggregationV2Protocol

Implementation References

Key files for sketch handling:
  • Encryption utilities: src/main/cc/wfa/measurement/common/crypto/encryption_utility_helper.h
  • Register extraction: Functions to parse key-count pairs from encrypted sketches
  • Sketch parameters: src/main/proto/wfa/measurement/internal/duchy/protocol/liquid_legions_sketch_parameter.proto
  • Mill processing: src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/job/mill/liquidlegionsv2/

Research Papers

The sketch design and aggregation techniques are described in:

Privacy-Preserving Reach Estimation

Complete system design including sketch construction

Secure Cardinality Estimation

Detailed cryptographic protocols for sketch aggregation

Next Steps

Protocol Cryptography

Learn the cryptographic operations performed on encrypted sketches

Secure Computation

Understand how multiple Duchies process sketches without decryption

Build docs developers (and LLMs) love