Utility Analysis for Differentially-Private Pipelines

Monday, June 03, 2024, 10:00 am to 10:20 am

Vadym Doroshenko, Google

Abstract: 

Differential privacy offers strong privacy guarantees, but implementing it involves complex trade-offs with data quality. Techniques such as aggregation, outlier handling, and noise addition introduce data loss, bias, and variance. These effects are hard to predict because they depend on both the characteristics of the data and the chosen hyperparameters, which severely limits the usability of differential privacy tools.
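To make the data loss, bias, and variance effects concrete, here is a minimal sketch of a per-partition error breakdown for a differentially private count. It assumes contribution bounding controlled by two illustrative parameters, max_partitions_contributed and max_contributions_per_partition, followed by Laplace noise. The function count_error_breakdown is hypothetical and is not part of PipelineDP. Dropped contributions appear as negative bias, and the noise adds a standard deviation that grows with the bounds and shrinks as epsilon grows.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def count_error_breakdown(
    rows: List[Tuple[str, str]],  # (user_id, partition_key), one row per contribution
    max_partitions_contributed: int,
    max_contributions_per_partition: int,
    epsilon: float,
) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper (not a PipelineDP API): bias, noise, and RMSE of a DP count.

    Assumption: when a user exceeds max_partitions_contributed, this sketch keeps
    the lexicographically smallest partition keys; real systems usually sample
    the kept partitions at random.
    """
    # Count rows per (user, partition).
    per_user = defaultdict(Counter)
    for user, partition in rows:
        per_user[user][partition] += 1

    true_count = Counter()
    bounded_count = Counter()
    for user, partitions in per_user.items():
        kept = set(sorted(partitions)[:max_partitions_contributed])
        for partition, n in partitions.items():
            true_count[partition] += n
            if partition in kept:
                # Cap the user's rows within this partition.
                bounded_count[partition] += min(n, max_contributions_per_partition)

    # L1 sensitivity of the bounded count and the Laplace scale it implies.
    sensitivity = max_partitions_contributed * max_contributions_per_partition
    scale = sensitivity / epsilon
    noise_std = math.sqrt(2) * scale  # standard deviation of Laplace(0, scale)

    report = {}
    for partition, actual in true_count.items():
        bias = bounded_count[partition] - actual  # <= 0: bounding drops data
        report[partition] = {
            "actual": actual,
            "bias": bias,
            "noise_std": noise_std,
            "rmse": math.sqrt(bias ** 2 + noise_std ** 2),
        }
    return report
```

For example, with rows [("u1", "a"), ("u1", "a"), ("u1", "a"), ("u1", "b"), ("u2", "a")], bounds of 1 and 2, and epsilon = 1.0, user u1 keeps only partition "a" and only 2 of its 3 rows there. Both partitions therefore get a bias of -1, the noise standard deviation is 2*sqrt(2), and the RMSE is 3 in each partition.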

We present a methodology and an open-source module within the PipelineDP framework that help address this challenge. Our approach systematically estimates the impact of differential privacy techniques on data quality and helps users choose hyperparameters that maximize it. Results are presented in a user-friendly format, supporting informed decisions when implementing differentially private pipelines in Python.
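One simple way to pick hyperparameters for data quality is a grid search over candidate contribution bounds, scoring each by estimated error (squared bias from dropped rows plus Laplace noise variance). The sketch below does this for a DP count, assuming every user contributes to at most one partition. The functions expected_squared_error and choose_bound are hypothetical, and this grid search is a stand-in for illustration, not necessarily the method used by PipelineDP's utility analysis module.

```python
import math
from typing import Dict, List, Sequence, Tuple


def expected_squared_error(
    contributions: Dict[str, List[int]],  # partition -> per-user row counts
    bound: int,
    epsilon: float,
) -> float:
    """Sum over partitions of bias^2 + noise variance for a Laplace-noised count.

    Assumes each user contributes to at most one partition, so the
    sensitivity of the count equals `bound`.
    """
    noise_variance = 2 * (bound / epsilon) ** 2  # variance of Laplace(0, bound/epsilon)
    total = 0.0
    for counts in contributions.values():
        bias = sum(min(c, bound) for c in counts) - sum(counts)
        total += bias ** 2 + noise_variance
    return total


def choose_bound(
    contributions: Dict[str, List[int]],
    candidates: Sequence[int],
    epsilon: float,
) -> Tuple[int, float]:
    """Grid search: return the candidate bound with the lowest estimated RMSE."""
    best = min(candidates, key=lambda b: expected_squared_error(contributions, b, epsilon))
    rmse = math.sqrt(expected_squared_error(contributions, best, epsilon) / len(contributions))
    return best, rmse
```

With contributions {"a": [1, 1, 2, 10], "b": [1, 3]}, epsilon = 1.0, and candidates [1, 2, 3, 5, 10], a bound of 2 wins: a bound of 1 discards too many rows (large bias), while larger bounds add more noise than the extra data they preserve is worth.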

Vadym Doroshenko, Google

Vadym Doroshenko is a software engineer at Google, where he builds anonymization infrastructure and helps teams apply anonymization. He is the tech lead of PipelineDP (pipelinedp.io). He is passionate about differential privacy research and about bringing it to production. He received his PhD in mathematics from Taras Shevchenko National University of Kyiv.

BibTeX
@conference {296335,
author = {Vadym Doroshenko},
title = {Utility Analysis for {Differentially-Private} Pipelines},
year = {2024},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = jun
}