PEPR '24 Conference Program

Monday, June 3

7:45 am–8:45 am

Continental Breakfast

8:45 am–9:00 am

Opening Remarks

Nuria Ruiz and Lawrence You

9:00 am–10:20 am

PETs

Empirical Privacy Metrics: The Bad, the Ugly… and the Good, Maybe?

Monday, 9:00 am–9:20 am

Damien Desfontaines, Tumult Labs

Synthetic data generation makes for a convincing pitch: create fake data that follows the same statistical distribution as your real data, so you can analyze it, share it, sell it… while claiming that this is all privacy-safe and compliant, because synthetic data is also "fully anonymous".

How do synthetic data vendors justify such privacy claims? The answer often boils down to "empirical privacy metrics": after generating synthetic data, run measurements on this data and empirically determine whether it's safe enough to release. But how do these metrics work? How useful are they? How much should you rely on them?

This talk will take a critical look at the space of synthetic data generation and empirical privacy metrics, dispel some marketing-fueled myths that are a little too good to be true, and explain what is needed for these tools to be a valuable part of a larger privacy posture.

Damien Desfontaines, Tumult Labs

Damien works at Tumult Labs, a startup that helps organizations share or publish insights from sensitive data using differential privacy. He likes deploying robust anonymization solutions that solve real-world problems while deeply respecting the privacy of the people in the data. He tends to get kind of annoyed when people adopt an approach to anonymization mostly based on ~vibes~, even though the principled solution is, like, right there.

ABSet: Harnessing Blowfish Privacy for Private Friend Recommendations

Monday, 9:20 am–9:40 am

Yuchao Tao and Ios Kotsogiannis, Snap Inc.

In this talk, we describe ABSet-DP (aka ABSet), a novel Blowfish-based framework that is currently being used in the Snapchat production friend recommendation system.

ABSet helps protect the privacy of the underlying social graph (e.g., the social connection between Alice and Bob should only be known to them). While traditional techniques like edge-DP are applicable, they can be infeasible for large graphs (e.g., O(10^8) nodes), or non-trivial to correctly implement.

ABSet has low computation cost and a high provable privacy bar. For each Snapchat user, friend suggestions come from multiple sources; for example, the system considers users that are 2 and 3 hops away from the searcher in the friend graph. ABSet assumes a partition of sources into two sets, set A and set B, and applies a randomized swapping mechanism to the list of friend recommendations. It provides indistinguishability as to which source a recommendation is coming from, thereby limiting friend graph leakage.

In general, ABSet considers two datasets as neighboring if they differ in one set label, and it allows flexible rules for the set assignment; the mechanism design on top of ABSet is therefore orthogonal to the semantic meaning of the set assignment. Following the Blowfish framework, ABSet's policy offers privacy semantics that are equivalent to edge-DP under certain practical assumptions.
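
The abstract gives the mechanism's intent rather than its exact form; as a rough illustration, a randomized-response-style swap of the set labels could look like the sketch below (the names and flip probability are assumptions, not Snap's production code).

```python
import math
import random

def randomized_swap(recs, epsilon):
    """Hypothetical ABSet-style swap: each recommendation carries a set
    label "A" or "B" indicating which source produced it. With probability
    p = 1 / (1 + e^epsilon) the label is flipped before the list is used
    downstream, so an observer cannot confidently tell which source a
    given recommendation came from (randomized response over two labels)."""
    p_flip = 1.0 / (1.0 + math.exp(epsilon))
    out = []
    for friend_id, label in recs:
        if random.random() < p_flip:
            label = "B" if label == "A" else "A"
        out.append((friend_id, label))
    return out

# Example: recommendations from 2-hop ("A") and 3-hop ("B") sources.
print(randomized_swap([("u42", "A"), ("u77", "B"), ("u13", "A")], epsilon=1.0))
```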

Authors: Yuchao Tao and Ios Kotsogiannis

Yuchao Tao, Snap Inc.

Yuchao Tao is a privacy engineer at Snap Inc. He works on applying Differential Privacy technologies to graph-related privacy problems and general query answering problems. He defended his PhD at the CS department of Duke University under the supervision of Ashwin Machanavajjhala.

Ios Kotsogiannis, Snap Inc.

Ios Kotsogiannis is a privacy engineer at Snap Inc. He holds two patents on the application of privacy technologies for real-world products at Snap. Prior to joining Snap, he defended his PhD at the CS department of Duke University under the supervision of Ashwin Machanavajjhala.

Balancing Privacy: Leveraging Privacy Budgets for Privacy-Enhancing Technologies

Monday, 9:40 am–10:00 am

Dr. Jordan Brandt, Inpher

PETs ensure input privacy, enabling sharing of and computation on sensitive input data without revealing its contents or allowing deduction. However, PETs do not necessarily guarantee output privacy. Revealing the final result of a computation (the output) often allows some conclusions about the input data, which might compromise sensitive data. A critical (and sometimes overlooked) aspect in ensuring robust privacy preservation is to account for the privacy budget allocated in any given PETs project. The privacy budget refers to the finite amount of privacy protection that can be allocated when performing various computations or data analyses using PETs, in order to safeguard output privacy. In the talk we will start with some examples of how PETs don't protect output privacy and we will give a comprehensive overview of privacy budget allocation for PETs by reviewing different privacy metrics and a number of relevant policy and governance guidelines. We will discuss how the concept of a privacy budget can help to demonstrate compliance with the requirements of privacy by design and we will show how organizations can strike a balance between preserving the privacy of sensitive input data and deriving valuable insights from data analysis.
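
To make the notion of a budget concrete, here is a minimal sketch of a privacy budget ledger under basic sequential composition; the class and its names are illustrative assumptions, not Inpher's implementation, and a real accountant would also track delta and use tighter composition theorems.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon spent across releases and refuses any
    release that would exceed the total budget (sequential composition)."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        if self.spent + epsilon > self.total_epsilon:
            raise RuntimeError("privacy budget exhausted; release refused")
        self.spent += epsilon
        return self.total_epsilon - self.spent  # remaining budget

budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.3)    # first released statistic
budget.charge(0.5)    # second released statistic
# budget.charge(0.5)  # would raise: only 0.2 of the budget remains
```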

Jordan Brandt, Inpher

Dr. Jordan Brandt is the CEO and co-founder at Inpher, the leader in privacy-enhancing computation that empowers organizations to harness the potential of AI securely and with trust. As a Technology Futurist, Jordan's research and insight on cybersecurity, AI, and robotics have been featured in print and live broadcast internationally on Bloomberg, CNBC, Forbes, Financial Times, Wired, and other business and technology press. Jordan is the former CEO and co-founder of Horizontal Systems, acquired by Autodesk (Nasdaq: ADSK) in 2011. He went on to serve as the director of Autodesk's $100m investment fund, while also teaching and conducting research as a Consulting Professor of Engineering at Stanford University. Jordan completed his undergraduate work at the University of Kansas and his Ph.D. in Building Technology at Harvard. In 2014 he was selected as one of Forbes 'Next-Gen Innovators'.

Utility Analysis for Differentially-Private Pipelines

Monday, 10:00 am–10:20 am

Vadym Doroshenko, Google

Differential privacy offers strong privacy guarantees, but its implementation can lead to complex trade-offs with data quality. Techniques like aggregation, outlier handling, and noise addition introduce data loss, bias, and variance. These effects are difficult to predict and they depend on both data characteristics and chosen hyperparameters. This severely limits the usability of differential privacy tools.

We present a methodology and an open-source module within the PipelineDP framework to help address this challenge. Our approach systematically estimates the impact of differential privacy techniques on data quality and helps to choose hyperparameters to maximize data quality. Results are presented in a user-friendly format, aiding informed decision-making when implementing differentially private pipelines in Python.
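
As a hand-rolled illustration of the kind of estimate such a utility analysis produces (this is not the PipelineDP API), the sketch below quantifies, for a single partition's DP count, the bias introduced by contribution bounding and the variance introduced by Laplace noise.

```python
import math

def dp_count_utility(per_user_contributions, epsilon, max_contribution):
    """Estimate the data-quality impact of a Laplace-noised count:
    bias comes from clamping each user to max_contribution, and the noise
    has scale max_contribution / epsilon (the query's sensitivity)."""
    true_count = sum(per_user_contributions)
    dropped = sum(max(0, c - max_contribution) for c in per_user_contributions)
    noise_std = math.sqrt(2) * max_contribution / epsilon
    rmse = math.sqrt(dropped ** 2 + noise_std ** 2)
    return {"true_count": true_count, "bias": -dropped,
            "noise_std": noise_std, "rmse": rmse}

# Four users contributing 1, 5, 2, and 9 records to one partition.
print(dp_count_utility([1, 5, 2, 9], epsilon=1.0, max_contribution=3))
```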

Vadym Doroshenko, Google

Vadym Doroshenko is a software engineer at Google, where he works on building anonymization infrastructure and helping teams apply anonymization. He is the tech lead of PipelineDP (pipelinedp.io). He is passionate about Differential Privacy research and about bringing it to production. He received his PhD in mathematics from Taras Shevchenko National University of Kyiv.

10:20 am–10:50 am

Break with Refreshments

10:50 am–12:30 pm

Privacy Preserving Analytics

Presto-Native Noisy Aggregations for Privacy-Preserving Workflows

Monday, 10:50 am–11:05 am

Kien Nguyen and Chen-Kuei Lee, Meta

At Meta, large-scale data analysis happens constantly, across varied surfaces, platforms, and systems. Differential privacy (DP), because of its strong protection, is one of the privacy-enhancing technologies deployed by Meta to protect users' privacy. However, implementing DP in practice, especially at Meta scale, has many challenges, including the diversity of interfaces for analysis, size of datasets, expertise required, and integration with other policy requirements and enforcement. In this talk, we describe an approach to private data analysis at Meta that places a set of common privacy primitives in the compute engine (Presto), which are leveraged by different frameworks and services to enforce DP guarantees across our many systems. Examples include automatic query rewriting for interactive data analysis, privacy-preserving ETL pipelines, and web mapping of aggregate statistics. The Presto-based approach helped increase flexibility, minimize changes to existing workflows, and enable robust privacy enforcement and guarantees. This is joint work with Jonathan Hehir (Meta Platforms, Inc.)

Kien Nguyen, Meta

Kien Nguyen currently works as a Research Scientist in the Applied Privacy Tech team at Meta, developing and deploying large-scale privacy-preserving systems in Meta. Kien finished his PhD program in Computer Science at the University of Southern California, under the supervision of Prof. Cyrus Shahabi. Kien is interested in privacy-preserving data analysis, location privacy, marketplaces, and their applications.

Chen-Kuei Lee, Meta

Chen-Kuei Lee is a Software Engineer on the Applied Privacy Team at Meta. He works on applying a variety of privacy-preserving techniques inside Meta to support data minimization and to reduce re-identification risks.

Designing for User Privacy: Integrating Differential Privacy into Ad Measurement Systems in Practice

Monday, 11:05 am–11:25 am

Jian Du, TikTok Inc.

Advertisers deploy ad campaigns across many advertising platforms to increase their reach. A multi-touch advertising measurement system is extensively used in practice to assess which ad exposure, amid several platforms, contributes to the final desired user actions such as purchases, sign-ups, and downloads. This attribution process involves tracing users' actions such as views, clicks, and purchases across platforms using tools like pixels and cookies. However, cross-site user tracking has raised increasing privacy concerns regarding the potential misuse of personal information. These concerns have led to legislative actions such as GDPR and industry initiatives like Apple's App Tracking Transparency.

We propose an initiative that provides formal privacy guarantees for cross-site advertising measurement outcomes, with a specific focus on real-time reporting in practical advertising campaigns. This proposal maintains the utility of the practical systems while offering formal and stronger user-level privacy guarantees through differential privacy. Experiments conducted with publicly available real-world advertising campaign datasets demonstrate the effectiveness of this proposal in providing formal privacy guarantees and increasing measurement accuracy, thereby advancing the state-of-the-art in privacy-preserving advertising measurement.

Authors: Jian Du and Shikun Zhang

Jian Du, TikTok Inc.

Jian Du is a research scientist at TikTok, leading the research and development efforts focused on integrating privacy-enhancing technologies into TikTok's products. For instance, Jian leads the development of PrivacyGo, an open-source project available on GitHub. Privacy Go aims to synergistically fuse PETs to address real-world privacy challenges, such as combining secure multi-party computation and differential privacy to enable privacy-preserving ad measurement and optimization of ad models, as well as privacy-preserving large language models. Prior to joining TikTok, Jian worked on PETs at Ant Financial and held a postdoctoral research position at Carnegie Mellon University.

Being Intentional: AB Testing and Data Minimization

Monday, 11:25 am–11:45 am

Matt Gershoff, Conductrics

Often in analytics and data science we have the 'big table' mental picture of customer data where we are continuously trying to append and link new bits of data back to each customer. It turns out, however, that for many of the statistics needed for tasks like AB Testing, if we are intentional about how the data is collected, then often there is no need to link all of this information back to a central visitor or customer ID. Most of the basic statistical approaches used for inference in AB Testing (t-tests, ANOVA, nested partial f-tests, etc.) can be done on data stored in equivalence classes, or at the task level, rather than at an individual level. The hope is that once armed with options, attendees will be able to consider the trade-offs and make informed decisions on when each approach is most appropriate.
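
To illustrate the point about equivalence classes, a two-sample t-test needs only each arm's count, mean, and variance, so it can be computed from aggregates that were never linked to a visitor ID. A minimal sketch (not Conductrics' implementation):

```python
import math

def welch_t_from_aggregates(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Welch's two-sample t statistic and degrees of freedom computed
    purely from per-arm aggregates -- no individual-level rows needed."""
    se = math.sqrt(var_a / n_a + var_b / n_b)
    t = (mean_a - mean_b) / se
    df = (var_a / n_a + var_b / n_b) ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df

# Aggregates kept per experiment arm (equivalence class), not per visitor.
t, df = welch_t_from_aggregates(10_000, 0.052, 0.049, 10_000, 0.047, 0.045)
print(f"t = {t:.3f}, df = {df:.0f}")
```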

Matt Gershoff, Conductrics

Matt Gershoff is co-founder of Conductrics, a software company that offers integrated AB Testing, multi-armed Bandit, and customer research/survey software. Despite exposure to various 'advanced' approaches during MSc degrees in both Resource Economics and Artificial Intelligence, Matt's bias is to try to squeeze as much value as possible out of the simplest approach and to always be intentional about the marginal cost of complexity. "Just as the ability to devise simple but evocative models is the signature of the great scientist so over elaboration and overparameterization is often the mark of mediocrity." - George Box '76 Science and Statistics.

12:30 pm–2:00 pm

Lunch

2:00 pm–3:35 pm

Spectrum of Privacy Design Choices

FTC, Privacy, and You

Monday, 2:00 pm–2:15 pm

Jessica Colnago, Federal Trade Commission, Office of Technology

The Federal Trade Commission (FTC) has a dual mandate to promote fair competition and to protect consumers. Data plays an important role in both, as it can be used (or restricted) in a way that illegally hinders competition or that harms consumers. In order to fulfill these duties, it is critical that the agency have technical experts who can help its attorneys better understand the systems at play. For over a hundred years, the FTC's staff have worked hard to tackle emerging harms and tech developments. In early 2023 they were complemented by the newly established Office of Technology. In this talk I will cover the basics of the FTC as an agency, its mandate and authorities, and how it approaches technology and privacy. I'll then cover the Office of Technology, the role of a technologist in supporting the FTC's mission, and how the skillset of a privacy engineer aligns with what is needed.

Jessica Colnago, Federal Trade Commission, Office of Technology

Dr. Jessica Colnago is a senior technology advisor in the Federal Trade Commission's Office of Technology, with expertise in machine learning privacy, kids and teens privacy, and, more broadly, usable privacy and people's understanding and expectations of privacy. She has experience in both academic and industry environments, performing foundational research and practitioner-focused work. Prior to the FTC she was a senior privacy engineer on Google's machine learning privacy team and kids and teens privacy team. Jessica has a bachelor's in Computer Engineering, a master's in Computer Science with a focus on Human Computer Interaction, and a master's and PhD in Societal Computing, Computer Science.

Don't End Up in the PETs Cemetery

Monday, 2:15 pm–2:35 pm

Simon Fondrie-Teitler, Federal Trade Commission, Office of Technology

Privacy Enhancing Technologies (PETs) offer a useful set of tools for building products and functionality while protecting the privacy of users' data. But the methods and implementations exist on a spectrum. On one end of the spectrum, they allow a company to offer products and services without ever having access to a user's data. On the other, they are a small layer on top of otherwise unfettered access to an individual's data. This talk will discuss where two technologies in particular, multi-party computation and private relays, fall on this spectrum. It will describe the Federal Trade Commission's optimism about PETs. It will also emphasize that because not all PETs are on the fully private end of the spectrum, companies making representations to consumers about their use of PETs must follow the law and ensure that any privacy claims or representations are accurate.

Simon Fondrie-Teitler, Federal Trade Commission, Office of Technology

Simon is a senior technology advisor in the Federal Trade Commission's Office of Technology. Previously, he worked at The Markup, a non-profit investigative news organization, where he contributed to several investigations, including co-authoring articles reporting on the use and data-sharing practices of advertising-motivated tracking technologies on sensitive websites like those of hospitals, telehealth startups, and tax preparation companies. Simon also contributed to building and maintaining Blacklight, an online real-time website privacy inspector tool. Before that, he helped stabilize and build the cloud infrastructure at several companies and led the technical effort to bring a startup into compliance with HIPAA.

Anonymization Aspects of a Low-latency VoIP Security Analytics System

Monday, 2:35 pm–2:55 pm

Jiri Kuthan, Intuitive Labs

We describe the privacy aspects of an alerting system we have designed for low-latency voice-over-IP (VoIP) security analytics. The application of end-to-end encryption to Personally Identifiable Information (PII) reliably assures that neither analytics system administrators nor intruders can find out who has been calling whom. Only a client, represented by a traffic probe at the source and a GUI at the receiving end, can observe the data in plain text. At the same time, we aim to preserve the system's analytic capabilities. The underlying system can ingest massive streams of events describing user and device behavior, analyze them, and provide low-latency automated responses to detected threats. Encryption of PII in ingested data poses a challenge to both analytical capabilities on the server side and CPU performance on the client side. We therefore use specific knowledge of the application's data: we limit the encryption to PII such as SIP URIs, E.164 telephone numbers, and IP addresses, and we use prefix-preserving encryption techniques. Performance measurements and field validation have shown that we could still support typical security analytical cases, preserve PII privacy, and achieve reasonable processing latency for human system users and automated response facilities.
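
The talk does not spell out the exact scheme, but one well-known way to obtain prefix preservation is a Crypto-PAn-style construction in which each bit's encryption depends on a keyed PRF of the preceding prefix, so addresses sharing a prefix share the corresponding pseudonymized prefix. A minimal sketch under that assumption (not Intuitive Labs' implementation):

```python
import hmac, hashlib, ipaddress

def prefix_preserving_ip(addr, key):
    """Pseudonymize an IPv4 address such that any two addresses sharing an
    n-bit prefix also share an n-bit prefix after pseudonymization: each
    output bit is the input bit XOR a keyed PRF bit of the preceding prefix."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        out.append(str(int(bits[i]) ^ (digest[0] & 1)))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))

key = b"key held only by the client probe and GUI"
print(prefix_preserving_ip("192.168.1.10", key))
print(prefix_preserving_ip("192.168.1.77", key))  # shares the /24 prefix
```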

Authors: Cristian Constantin and Jiri Kuthan

Jiri Kuthan, Intuitive Labs

Jiri Kuthan serves as Chief Technology Officer at Intuitive Labs. Jiri graduated in CS from the University of Salzburg, Austria, and started his research career at Fraunhofer Labs in Berlin, Germany. He co-founded a startup, iptel.org, that produced an open-source SIP server known today as Kamalio/opensips. Jiri then started several other startups that focused on monitoring, session border control, and, lately, security analytics. Jiri has co-authored the RFC 3303, which coined the notion of a middlebox, a book on SIP security, and several related patents.

How We Can Save Anonymization

Monday, 2:55 pm–3:15 pm

Daniel Simmons-Marengo, Tumult Labs

When we claim data is anonymous, we offer a simple promise to users: "This data cannot harm you. You don't need to worry about it." We've broken that promise again and again when datasets that we claimed were safe have been reidentified. There is now growing skepticism that anonymization is possible, and a belief that any claim of anonymization is fraud.

The good news is that we can do better. It is possible to effectively anonymize data. It's been done before! The bad news is that we're not doing it consistently, because there are no widely used standards for effective anonymization techniques. We might normally expect lawmakers and regulators to set minimum standards that prevent mistakes and restore public trust. But as Katharina Koerner's excellent PEPR talk pointed out last year, the legal guidance is muddled and contradictory, and shows no sign of improving any time soon. The law is not going to save anonymization.

Which leaves us - the privacy community - as the only ones left to fix this mess. If we want to regain the trust of our users, we need to follow a consistent approach that reliably gets anonymization right. That's what this talk is about. I'll walk through a list of operational principles our anonymization techniques need to meet to live up to our promises.

Transforming Data through Deidentification

Monday, 3:15 pm–3:35 pm

Akshatha Gangadharaiah, Bijeeta Pal, and Sameera Ghayyur, Snap Inc.

Text data shared by users is important for improving the user experience and quality of products and introducing new features. To clarify, text data does not relate to private communications data but rather refers to data that can come from various features, e.g., search queries and text captions from public content. Using text data for downstream tasks like analysis, improving user safety, and training models can pose significant privacy risks, as the data may contain sensitive and private information about individuals. To address these challenges, we introduced a novel text deidentification workflow, designed to improve privacy while maximizing the utility for downstream tasks. The deidentification workflow works as follows: firstly, a PII redaction process systematically eliminates user-identifying attributes; secondly, an LLM rewrite modifies sentences to remove user-specific writing styles; and lastly, a validation process gauges the efficacy of text deidentification. In this talk, we will go over the details of the various phases of the text deidentification workflow, along with an overview of its implementation using Temporal.
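
The shape of that three-phase workflow can be sketched as follows; the regexes, the LLM call, and the validation check are placeholders standing in for Snap's actual redaction, rewrite, and validation logic (the Temporal orchestration is omitted).

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact_pii(text):
    """Phase 1: systematically eliminate user-identifying attributes."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def call_llm(prompt):
    """Placeholder for the LLM rewrite call; echoes its input here."""
    return prompt.split(": ", 1)[-1]

def llm_rewrite(text):
    """Phase 2: rewrite the sentence to remove user-specific writing style."""
    return call_llm(f"Paraphrase neutrally, keeping the meaning: {text}")

def validate(rewritten):
    """Phase 3: gauge efficacy, e.g. confirm no PII patterns survive."""
    return not any(p.search(rewritten) for p in PII_PATTERNS.values())

def deidentify(text):
    rewritten = llm_rewrite(redact_pii(text))
    assert validate(rewritten), "deidentification check failed"
    return rewritten

print(deidentify("Met Alex at alex@example.com, call +1 415 555 0100."))
```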

Akshatha Gangadharaiah, Snap Inc.

Akshatha Gangadharaiah is the lead of Data Governance and MyAI Privacy at Snap. She has been working on privacy and governance solutions at Snap for the last 5 years. She holds a Master's degree in Computer Science from University of California, San Diego.

Bijeeta Pal, Snap Inc.

Bijeeta Pal is a privacy engineer at Snap. Before joining Snap, she completed her PhD in computer science at Cornell University.

Sameera Ghayyur, Snap Inc.

Sameera Ghayyur is currently a privacy engineer at Snap Inc., where she is the primary privacy reviewer for the My AI chatbot product, among many other features in Snapchat. In the past, she has also worked in the privacy teams at Meta and Honeywell. She received her PhD in computer science from the University of California, Irvine, where her research focused on accuracy-aware privacy-preserving algorithms. She also has experience working as a software engineer and a lecturer.

3:35 pm–4:05 pm

Break with Refreshments

4:05 pm–5:35 pm

Synthetic Data and Data Governance

A New Model for International, Privacy-Preserving Data Science

Monday, 4:05 pm–4:20 pm

Curtis Mitchell, xD, US Census Bureau

Currently, when data analysis is performed between National Statistical Organizations (NSOs) such as the US Census Bureau and Statistics Canada, a complex series of arrangements must be agreed to, creating severe yet important restrictions on how and by whom the required data is accessed and thus increasing burden and time.

Here we demonstrate a new approach using remote, privacy-preserving processes via a collaboration between multiple NSOs in conjunction with the United Nations Privacy-Enhancing Technologies Lab (UN PET Lab). The proof-of-concept involves using the open-source data science platform PySyft and establishing the necessary cloud infrastructure so that nodes hosted by the US Census Bureau and other NSOs are connected through a network gateway hosted by the UN PET Lab. This architecture enables a private join on synthetic data representing realistic trade data from UN Comtrade, without either NSO needing to directly access the other NSO's data. It also enables investigations into key policy and governance questions as these technologies mature.
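
The private join itself goes through PySyft; purely as a conceptual stand-in (not the PySyft protocol or API), a keyed-hash intersection conveys the idea that each NSO shares only blinded identifiers rather than raw trade records:

```python
import hmac, hashlib

def blind(join_keys, shared_key):
    """Each NSO keeps raw keys local and exposes only keyed hashes."""
    return {hmac.new(shared_key, k.encode(), hashlib.sha256).hexdigest(): k
            for k in join_keys}

def private_join(keys_a, keys_b, shared_key):
    """Toy keyed-hash intersection: neither side sees the other's raw
    records, only blinded identifiers that match on common join keys."""
    blinded_a, blinded_b = blind(keys_a, shared_key), blind(keys_b, shared_key)
    return sorted(blinded_a[h] for h in blinded_a.keys() & blinded_b.keys())

shared_key = b"key agreed out of band between the NSOs"
print(private_join({"HS0101", "HS2204"}, {"HS2204", "HS9999"}, shared_key))
```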

We believe this project will be an important milestone towards enabling privacy-preserving and remote data science between international government entities and uncovering future aspects of privacy policy and governance.

Curtis Mitchell, xD, US Census Bureau

Curtis Mitchell is an Emerging Technology Fellow on the xD team at the US Census Bureau where he is contributing to a variety of projects involving privacy-enhancing technologies, responsible artificial intelligence, and modern web applications. He has over 15 years of experience in software- and data-related roles at small startups, large corporations, and open-source communities. Prior to joining the Census Bureau, he worked at NASA's Ames Research Center.

Compute Engine Testing with Synthetic Data Generation

Monday, 4:20 pm–4:40 pm

Brandon Vo and Eric Liu, Meta

At Meta, we have developed a new testing framework that utilizes privacy-safe and production-like synthetic data to detect regressions in various compute engines, such as Presto, within the Meta Data Warehouse. In this talk, we will discuss the challenges and solutions we have implemented to operate this framework at scale. We will also highlight key features of our synthetic data generation process, including the addition of differential privacy, expanded column schema support, and improved scalability. Finally, we will discuss how Meta leverages this testing framework to increase test coverage, reduce the Presto release cycle, and prevent production regressions.

Brandon Vo, Meta

Brandon Vo is a software engineer at Meta. He has worked on privacy at the company for the past seven years with a specific focus on data anonymization and synthetic data generation.

Eric Liu, Meta

Eric Liu is a software engineer at Meta. He works in the Presto team, with a strong interest in improving and consolidating testing solutions across Compute Engines in Data Warehouse. Before joining Meta, Eric Liu worked as a Chief Engineer and TLM for ADP.

Empowered User Control: Learnings from Building "Event Level Deletion"

Monday, 4:40 pm–4:55 pm

Lingwei Meng and Yan Li, Meta

Through a new privacy initiative, Meta gives users the ability to remove their historical ad-interaction events. For example, users can go to their "recent ad activities" page and "delete" an ad they have clicked on. After deletion, this particular interaction will no longer be used in their ads' ranking on Facebook. How did we change the data ingestion and storage infra to support this new functionality? How did we change the feature representation and model architecture to support this new functionality? This talk covers both the infra-side changes and the new feature representation we leveraged to support them.

Lingwei Meng, Meta

Lingwei is a software engineer at Meta working on ads ranking. Lingwei's work focuses on serving the highest-quality ads to Facebook's users while maintaining a high privacy bar that is compliant with various regulations.

Yan Li, Meta

Yan Li is a software engineer at Meta Platforms, Incorporated. Since joining Meta, she has focused on feature infrastructure. Now, she is focusing on building data privacy control infrastructure to enhance the ads experience for Meta users.

Lineage Quality Measurement

Monday, 4:55 pm–5:15 pm

JiSu Kim, Alex Lambert, and Francesco Logozzo, Meta

Lineage allows us to understand data flows in systems, which is important for privacy because we need to understand where our data goes in order to protect them. There are a variety of Lineage approaches, such as static & dynamic analysis; each approach has potential false positives and false negatives.

These potential false positives and false negatives impact multiple entities: the Lineage tool owners, the product teams using the tool, and the privacy engineers assessing the products and tools. To ensure all these entities use and uphold a consistent standard or threshold, we will present a common framework for measuring the performance of Lineage.

With this framework, we can help:

  • Product teams and privacy engineers select the most effective and balanced approach for relevant privacy problems;
  • Lineage tool owners focus their engineering efforts on the Lineage improvements that reduce the most risk; and
  • Lineage tool owners develop new Lineage solutions and quantitatively show their efficacy.
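
One way to operationalize such a framework, sketched here as an assumption rather than Meta's internal metric, is to score a lineage tool's reported flows against a hand-labeled ground-truth sample and report precision (false positives) and recall (false negatives):

```python
def lineage_quality(predicted_edges, ground_truth_edges):
    """Score reported data flows (source asset -> destination asset)
    against a labeled sample: false positives are flows the tool reports
    that do not exist, false negatives are real flows it misses."""
    predicted, truth = set(predicted_edges), set(ground_truth_edges)
    tp, fp, fn = len(predicted & truth), len(predicted - truth), len(truth - predicted)
    return {"precision": tp / (tp + fp) if tp + fp else 1.0,
            "recall": tp / (tp + fn) if tp + fn else 1.0,
            "false_positives": fp, "false_negatives": fn}

print(lineage_quality(
    predicted_edges={("logs.raw", "warehouse.t1"), ("warehouse.t1", "dash.x")},
    ground_truth_edges={("logs.raw", "warehouse.t1"), ("warehouse.t1", "ml.train")},
))
```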

JiSu Kim, Meta

JiSu Kim is a privacy engineer at Meta focusing on critical privacy reviews.

Alex Lambert, Meta

Alex Lambert is a software engineer at Meta focusing on privacy and security problems.

Francesco Logozzo, Meta

Francesco Logozzo is a software engineer on the Privacy team at Meta. He has published more than 50 academic papers at the main programming languages conferences (POPL, PLDI, OOPSLA, SAS, VMCAI...), given keynote talks at academic and industrial conferences (MSFT BUILD, …), chaired several program committees (SAS, VMCAI...), and served on many more. He is the co-recipient of the "2021 IEEE Cybersecurity Award for Practice" for his work on Zoncolan.

Deleting Data at Organizational Scale

Monday, 5:15 pm–5:35 pm

Diogo Lucas, Stripe

Deleting a million records from a dataset can be hard. Deleting one record from a million datasets can be even harder.

Data has a tendency to sprawl. In today's information-hungry world, information is replicated and permuted in a myriad of ways in data marts, lakes, and warehouses. This proliferation can add massive volume and variety, turning a single input point into many thousands of somewhat related downstream entries.

So when it comes to observing a person's right to be forgotten, how can we find their information's needle in a company's data-hungry haystack? How can we do that in a world of architectural sprawl and data repurposing? And how do we do all that without breaking legitimate data usage cases?

In this session, we will evaluate the fundamental building blocks and practices that allow Stripe to guarantee our customers' (direct and indirect) rights to data deletion. Those include detection and attribution of sensitive data and its affiliation, impact analysis through exploration, and the combined use of deletion propagation and orchestration.

Diogo Lucas, Stripe

Diogo Lucas is an engineering lead in Stripe's privacy infrastructure team. He is deeply involved in privacy-related initiatives such as data deletion and sensitive data access controls. He has more than 15 years of industry experience, many of those dedicated to automating privacy and overall governance controls.

5:35 pm–7:00 pm

Conference Reception

Tuesday, June 4

8:00 am–9:00 am

Continental Breakfast

9:00 am–10:30 am

User Privacy Controls and Governance

Building Permissions into Data Modeling

Tuesday, 9:00 am–9:20 am

Lingtian Cheng, Meta

Complex products like social network apps unavoidably require complicated permission checks. There are thousands of actions that could happen on the Facebook app, and each action requires a permission check to decide if the viewer is allowed to perform that activity, in order to prevent unintentional or unauthorized actions. For example:

  • Can the viewer make a post on their friend's Timeline?
  • Can the viewer change the cover photo of this Group?
  • Can the viewer send a message to that seller on Marketplace?

The concept might seem simple at first, but as products grow and add more features over time, managing complicated permission logic becomes challenging.

In this talk, I will describe a design pattern that enables engineers to define and implement permissions into data models. It contains three components:

  1. a rules engine, which is responsible for modeling the permission logic;
  2. an integration with the data modeling layer, which supports flexible abstraction and delegation of permissions;
  3. an integration with the data fetching layer, which allows conditional loading based on permissions.

This design pattern has been widely used at Meta on numerous products, and has improved the reliability and performance of permission checks in production.
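
A toy sketch of how the three components can fit together is shown below; the rule names, data model, and loader are illustrative assumptions, not Meta's framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Viewer:
    id: str
    friend_ids: set

# 1. Rules engine: permission logic modeled as small named predicates.
Rule = Callable[[Viewer, "Post"], bool]
RULES: Dict[str, Rule] = {
    "is_author": lambda viewer, post: viewer.id == post.author_id,
    "is_friend": lambda viewer, post: post.author_id in viewer.friend_ids,
}

# 2. Data-modeling integration: the model declares which rules gate which actions.
@dataclass
class Post:
    id: str
    author_id: str
    permissions = {"edit": ["is_author"], "comment": ["is_author", "is_friend"]}

    def can(self, action: str, viewer: Viewer) -> bool:
        return any(RULES[name](viewer, self) for name in self.permissions[action])

# 3. Data-fetching integration: load the object only if the check passes.
def fetch_post(post: Post, viewer: Viewer):
    return post if post.can("comment", viewer) else None

alice = Viewer(id="alice", friend_ids={"bob"})
print(fetch_post(Post(id="p1", author_id="bob"), alice) is not None)  # True
```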

Lingtian Cheng, Meta

Lingtian Cheng is a Software Engineer on the Facebook Privacy team at Meta. He is passionate about building products with privacy by design, and has been developing solutions for permission modeling, data segmentation and other privacy challenges.

Lessons Learned Deploying the Global Privacy Control to Millions of Customers

Tuesday, 9:20 am–9:35 am

Ryan Guest, Amazon

This talk will focus on how Amazon deployed the Global Privacy Control (Sec-GPC). The Global Privacy Control is a new web technology that allows customers to restrict the sale of their data, the sharing of their data with third parties, and the use of their data for cross-site targeted advertising. The talk starts by summarizing the standard and how the Global Privacy Control works. Then, we will talk about the different lessons learned when deploying the Global Privacy Control to millions of Amazon customers.
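
For context, the signal itself is small: a conforming client sends the Sec-GPC request header (and exposes navigator.globalPrivacyControl to scripts), and the value "1" indicates the opt-out. A minimal server-side check, sketched as an illustration rather than Amazon's implementation:

```python
def gpc_opt_out(headers: dict) -> bool:
    """Return True if the request carries a Global Privacy Control opt-out.
    Per the GPC proposal, the signal is the `Sec-GPC` header with value "1";
    anything else (absent, "0", malformed) is treated as no signal. How the
    opt-out is honored (no sale/sharing, no cross-site targeted ads) is a
    separate policy decision not shown here."""
    return headers.get("Sec-GPC", "").strip() == "1"

print(gpc_opt_out({"Sec-GPC": "1"}))  # True -> record and honor the opt-out
print(gpc_opt_out({}))                # False -> no GPC signal present
```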

Ryan Guest, Amazon

Ryan Guest is a Principal Software Engineer at Amazon, where he focuses on data privacy. Before joining Amazon, he worked at Salesforce and a legal tech startup. Ryan is a member of the IAPP Privacy Engineering Advisory Board. He graduated from Cal Poly San Luis Obispo with a degree in Computer Science and enjoys hanging out on the beach with his wife and three young sons.

Consent Management at Airbnb

Tuesday, 9:35 am–9:55 am

Aziel Epilepsia and Fernando Rubio, Airbnb

At Airbnb, we architected a Consent Management Platform that provides a data model and API that solve the cookie consent and banner requirements of the GDPR and the ePrivacy Directive, and that are extensible to support other consent gestures/scenarios applicable to our users. We allow services in Airbnb to programmatically model the "Terms" to present to an end-user, describe the personas impacted by the consent, and persist an audit log for regulatory inquiries. We built this platform from learnings gained from a prior vendor solution that was no longer meeting our needs. In this talk, we present the concepts and data model we use to implement consent, and the API surface that allows our clients to query whether they must prompt the user for consent, determine what terms need to be presented to the user, and persist the new record of consent. We illustrate how this model scales to solve various problems, like explicit consent through a cookie banner and implicit consent through GPC (Global Privacy Control), and how we adapted the API to fulfill new user flows.
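
A stripped-down sketch of what such a consent record and query API can look like follows; the field names are hypothetical, not Airbnb's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    """One row per (user, terms) consent gesture, kept append-only so the
    history doubles as an audit log for regulatory inquiries."""
    user_id: str
    terms_id: str      # which "Terms" were presented (e.g. analytics cookies v3)
    granted: bool      # accepted or declined
    source: str        # "cookie_banner" (explicit) or "gpc_signal" (implicit)
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

def must_prompt(records, user_id, terms_id):
    """Query sketch: prompt the user only if no record exists yet for the
    current version of the terms."""
    return not any(r.user_id == user_id and r.terms_id == terms_id
                   for r in records)

log = [ConsentRecord("u1", "cookies_v3", granted=True, source="cookie_banner")]
print(must_prompt(log, "u1", "cookies_v3"))  # False: already answered
print(must_prompt(log, "u1", "cookies_v4"))  # True: new terms, prompt again
```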

Aziel Epilepsia, Airbnb

Aziel is a Software Engineer in the Airbnb Privacy Engineering team. He currently focuses on Privacy service technical problems regarding user consent and data subject rights.

Fernando Rubio, Airbnb

Fernando is a software engineer on the Airbnb privacy engineering team. He currently focuses on Privacy UX to prevent dark patterns in the UI and on building client-side libraries and tooling with embedded privacy, primarily focused on software used to honor user consent and manage client-side storage.

Approaches and Challenges to Purpose Limitation across Diverse Data Uses

Tuesday, 9:55 am–10:15 am

Rituraj Kirti and Diana Marsala, Meta

Purpose limitation is a fundamental principle of data privacy. It means that the use of data is limited to only the stated purpose(s) disclosed at the point of its collection. In this presentation we will discuss the main challenges when addressing purpose limitation at scale and some of the new technical solutions we have developed at Meta that make it more efficient.

Our approach to purpose limitation involves using annotations to represent different aspects of data and its processing and using these annotations to apply policy checks across data flows. We will describe the key concepts and our overall workflow, which illustrates how we maintain continuous discovery of assets and data flows, review them where required, apply annotations, and iteratively traverse the data flow graph to find and fix any issues.

There are several challenges that impact designing a solution:

  • Translating purpose limitation restrictions to code, data, and systems is not yet a well-defined concept in the industry;
  • Handling different data granularities (table, column, row level);
  • Conditional data flows; and
  • The scale of applying this tech to large companies such as Meta.
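
To make the annotation-driven check concrete, the toy sketch below (an assumption for illustration, not Meta's system) treats each asset's allowed purposes as an annotation and flags any data flow whose destination claims a purpose the source does not permit.

```python
# Purpose annotations per asset: the set of purposes its data may serve.
ALLOWED_PURPOSES = {
    "raw.messages":        {"delivery", "safety"},
    "safety.classifier":   {"safety"},
    "analytics.msg_stats": {"delivery", "safety", "ads"},  # "ads" not permitted upstream
}

def check_flow(src: str, dst: str) -> bool:
    """A flow is allowed only if the destination's purposes are a subset
    of the source's -- i.e. data never gains a new purpose downstream."""
    return ALLOWED_PURPOSES[dst] <= ALLOWED_PURPOSES[src]

for src, dst in [("raw.messages", "safety.classifier"),
                 ("raw.messages", "analytics.msg_stats")]:
    print(f"{src} -> {dst}: {'ok' if check_flow(src, dst) else 'VIOLATION'}")
```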

Rituraj Kirti, Meta

Rituraj Kirti is a Software Engineer on the Privacy Infrastructure team at Meta that builds technologies for addressing privacy obligations. Kirti's prior work at Meta includes creating and scaling various products that apply machine learning to improve the effectiveness of advertisers. He holds a B.E. (Hons) degree in Instrumentation Engineering from Birla Institute of Technology and Science, Pilani, India.

Diana Marsala, Meta

Diana Marsala is a Software Engineer on the Privacy Infrastructure team at Meta. She was an early adopter of privacy infrastructure technologies, using them to uphold key privacy obligations, and now builds and adapts these technologies for wider use across the company. Marsala holds B.A.S. and M.S.E. degrees in Computer Science from the University of Pennsylvania.

Governing Identity, Respectfully

Tuesday, 10:15 am–10:30 am

Wendy Seltzer, Tucows

User-centric or self-sovereign identity envisions a world in which individuals are at the center of their data and its uses. The challenge is getting all the other users of this data to agree!

The challenge is more than the technical hurdles of incomplete or incompatible standards; it is also a steeplechase of social, economic, and governance obstacles: closing the gaps in trust and managing conflicting motivations. How do we do all this while ensuring that respect for the individual stays centered?

Multistakeholder governance brings the interests of individuals into dialog with other ecosystem participants. It doesn't guarantee that they will always prevail, but gives them the chance to be part of the consensus. We seek institutional designs that elucidate participant interests and promote benefits from cooperation, respecting individual agency to make choices both at design-time and use-time.

We'll discuss lessons from previous "identity" efforts and multistakeholder institutions, efforts to build a new governance framework for user-centered identity, and what privacy assurances result. An identity system that embraces individual stakeholders in its governance offers the best measure of respect and privacy assurance.

Wendy Seltzer, Tucows

Wendy Seltzer is Principal Identity Architect at Tucows. She previously served as Strategy Lead and Counsel to the World Wide Web Consortium (W3C), improving the Web's security, availability, and interoperability through standards. As a Fellow with Harvard's Berkman Klein Center for Internet & Society, Wendy founded the Lumen Project (formerly Chilling Effects Clearinghouse), the web's pioneering transparency report to measure the impact of legal takedown demands online. She seeks to improve technology policy in support of user-driven innovation and secure communication. She co-authored the second edition of Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion.

10:30 am–11:00 am

Break with Refreshments

11:00 am–12:35 pm

AI and LLMs

Through the Lens of LLMs: Unveiling Differential Privacy Challenges

Tuesday, 11:00 am–11:15 am

Aman Priyanshu, Yash Maurya, and Vy Tran, Carnegie Mellon University

Despite the growing reliance on differential privacy to shield user data in interest-based advertising, critical gaps remain in our understanding of its effectiveness against sophisticated threats. This presentation zeroes in on Privacy Attacks, highlighting their significance in truly appraising privacy in such settings, and explores whether LLMs/LMs could serve as formidable attackers. We will explore Google's Topics API, a pioneering effort to balance user privacy with advertising needs, to identify and quantify its vulnerabilities to re-identification and membership inference attacks. Leveraging practical simulations, we expose how edge cases and niche topics within the API amplify re-identification risks, a concern underexplored in prior literature. Our use of Large Language Models (LLMs) to simulate attacks marks a significant departure from traditional analysis, uncovering a heightened accuracy in user re-identification that challenges the API's privacy assertions. The findings underscore a pressing need for the PETs community to pivot towards evaluating the resilience of privacy technologies against LLM-driven threats, ensuring that mechanisms like the Topics API can truly withstand the evolving landscape of digital privacy risks.

Authors:

Aman Priyanshu (Carnegie Mellon University) apriyans@andrew.cmu.edu
Yash Maurya (Carnegie Mellon University) ymaurya@andrew.cmu.edu
Suriya Ganesh Ayyamperumal (Carnegie Mellon University) sayyampe@andrew.cmu.edu
Vy Tran (Carnegie Mellon University) vtran@andrew.cmu.edu
Saranya Vijayakumar (Carnegie Mellon University) saranyav@andrew.cmu.edu
Hana Habib (Carnegie Mellon University) htq@andrew.cmu.edu
Norman Sadeh (Carnegie Mellon University) sadeh@cs.cmu.edu

Aman Priyanshu, Carnegie Mellon University

Aman Priyanshu is a master's student at Carnegie Mellon University, specializing in Privacy Engineering. He is currently working under Professor Norman Sadeh and Professor Ashique KhudaBukush on AI for Social Good. Aman has earned recognition as an AAAI Undergraduate Scholar for his work in Fairness and Privacy. His professional experience includes working at Eder Labs R&D Private Limited, Concordia University (MITACS Globalink Research Scholar), and the Manipal Institute of Technology.

Yash Maurya, Carnegie Mellon University

Yash Maurya is a Privacy Engineering graduate student at Carnegie Mellon University, aiming to develop AI solutions that prioritize privacy and ethics. His work focuses on creating systems that safeguard societal values, with interests in Federated Learning, Differential Privacy, and Explainable AI. He is currently working with Professor Virginia Smith, on Unlearning in LLMs. Yash is also working as a Research Assistant, building a user-centric notice and choice threat modeling framework.

Vy Tran, Carnegie Mellon University

Vy Tran, currently a second-year undergraduate at Carnegie Mellon University, majors in Information Systems and minors in Information Security, Privacy and Policy. Vy is passionate about integrating privacy and security into both physical and digital domains and exploring the ML-privacy nexus alongside her graduate peers. She is set to intern in summer 2024 with The Washington Post's Cyber Security & Infrastructure team, and in summer 2025 with PwC's Cyber Defense & Engineering Consulting team.

Learning and Unlearning Your Data in Federated Settings

Tuesday, 11:15 am–11:30 am

Tamara Bonaci, Northeastern University, Khoury College of Computer Sciences

Federated learning is a distributed machine learning approach, allowing multiple data owners to collaboratively train a machine learning model while never revealing their data sets to other data owners. It is seen as a promising approach towards achieving data privacy, and it has already proven useful in several ubiquitous applications, such as predictive spelling.

Machine unlearning is another emerging machine learning sub-field, focusing on the need to minimize and/or fully remove a data input from a training data set, as well as from the trained machine learning model. The need for machine unlearning comes as a direct response to recent regulations, such as the GDPR's right to erasure and the right to be forgotten. In this proposal, we focus on the question of machine unlearning in the context of federated learning. Due to the distributed and collaborative nature of federated learning, simply removing a data input from a training set and re-training the model is typically not possible. More commonly, different approaches, referred to as federated unlearning, need to be used. We introduce and analyze several such federated unlearning approaches in terms of their ability to unlearn and their performance. We also provide guidance for a practical federated unlearning method.

Tamara Bonaci, Northeastern University, Khoury College of Computer Sciences

Tamara Bonaci is an Assistant Teaching Professor at Northeastern University, Khoury College of Computer Sciences, and an Affiliate Assistant Professor at the University of Washington, Department of Electrical and Computer Engineering. Her research interests focus on security and privacy of emerging technologies, with an emphasis on biomedical technologies, and the development of privacy-preserving machine learning approaches.

Navigating the Privacy Landscape of AI on the Devices: Challenges and Best Practices

Tuesday, 11:30 am–11:50 am

Tousif Ahmed and Mina Askari, Google

As artificial-intelligence-based technologies gain traction and are delivered on our personal devices, user data privacy is a key consideration in developing them. Personal devices, including mobile, wearable, and smart home devices, collect a variety of user and sensor data (from cameras, microphones, and motion sensors) that, on one hand, can provide extraordinary opportunities for personalizing the user experience while, on the other hand, can reveal extremely sensitive information about the user. Therefore, it is crucial to establish clear privacy guidelines for the responsible use of sensitive device and sensor data in AI development. In this talk, we explore the intricate relationship between device-specific AI and privacy considerations. We examine the unique challenges posed by device- and sensor-based AI models, spanning from data collection and usage transparency to model security and user control. We also present frameworks and guidelines that can help organizations and developers responsibly handle user data while harnessing the power of device-specific AI.

Authors: Tousif Ahmed, Mina Askari, and Joe Genereux, Devices and Services team at Google

Tousif Ahmed, Google

Tousif Ahmed is a Privacy Engineer and Applied Privacy Researcher at Google on the Devices and Services Team, which includes Pixel, Fitbit, and Nest. He is interested in topics related to mobile, wearable, and smart home devices, including sensor privacy, privacy of multimodal devices, IoT privacy, health information privacy, camera privacy, and bystander privacy. He received his PhD in Computer Science from Indiana University Bloomington in 2019.

Mina Askari, Google

Mina Askari is the Privacy Lead for Pixel Phones and Applied Privacy Researcher at Google. Her interest lies in Generative AI and Privacy, ML Privacy, Data Anonymization, and Data centric privacy. She holds a PhD in Computer Science from University of Calgary.

Panel: Privacy Design Patterns for AI Systems: Threats and Protections

Tuesday, 11:50 am–12:35 pm

Moderator: Vasudha Hegde, DoorDash
Panelists: Nitin Agrawal, Snap Inc.; Smita Rajmohan, Autodesk; Sri Pravallika, Google

As industries increasingly embrace AI technologies, the risk of privacy breaches and unlawful data processing escalates. This panel proposes a comprehensive discussion on identifying essential privacy patterns for AI systems and advocating for privacy by design principles within ML pipelines. We will explore legal obligations surrounding the implementation of robust security measures, delve into technical risks associated with ML algorithms, and examine prevailing privacy-preserving machine learning technologies. Additionally, we will analyze the specific challenges posed by large language models (LLMs) and generative AI, including their susceptibility to privacy and ethical risks. By sharing insights and strategies, this session aims to equip participants with actionable knowledge to enhance privacy in AI/ML practices.

Nitin Agrawal, Snap Inc.

Nitin Agrawal is currently a Privacy Engineer at Snap Inc., specializing in AI privacy and data classification. Previously, he worked as an Applied Scientist for Alexa Privacy at Amazon. He holds a Ph.D. in Computer Science from the University of Oxford, where his research focused on advancing techniques for effective and equitable privacy-preserving machine learning.

Smita Rajmohan, Autodesk

Smita Rajmohan is a Senior Product Counsel at Autodesk, where she is the head of the AI/ML Legal Practice Group. Smita serves on the IAPP Education Advisory Board and is an exam writer for the AI Governance certification. She is also part of IEEE's AI Policy Committee and serves on the Standards Committee for Institute of Operational Privacy Design.

Sri Pravallika, Google

Sri Pravallika is a Privacy Tech Lead at Google's Privacy Trust Response team. Besides managing the response to complex Privacy incidents, she also leads incident prevention and remediation programs. She built her career in Security with a Masters in Cybersecurity from Northeastern University and eventually pivoted to Privacy.

12:35 pm–2:00 pm

Lunch

2:00 pm–3:30 pm

Privacy Governance, Safety, and Risk

Automating Technical Privacy Reviews Using LLMs

Tuesday, 2:00 pm–2:20 pm

Stefano Bennati, HERE Technologies; Engin Bozdag, Uber

In the world of Trust-by-Design, technical privacy and security reviews are essential for ensuring systems not only meet privacy standards, but also integrate privacy from the start. However, as companies grow and diversify their technology, the process of conducting these reviews becomes more challenging and expensive.

This challenge is particularly evident in agile environments, where frequent releases of small software components need to be reviewed in a timely manner. Scale worsens this challenge: the number of reviews from thousands of developers and microservices can easily overwhelm a small team of privacy engineers.

The quality of documentation also plays a significant role: poor or incomplete documentation can result in wasted effort on reviews that present little privacy risk or, even worse, in overlooking serious privacy concerns.

The challenge of identifying low-risk items that don't need a review (false positives) and high-risk items skipping the review (false negatives) becomes a critical task for maintaining privacy-by-design effectively across the organization.

This presentation will explore how Uber and HERE Technologies have worked to improve the efficiency of their review triage processes via automation. Large Language Models (LLMs) are suited to assessing the completeness of technical documentation and classifying a feature/project into high- or low-risk buckets, due to the textual representation of information and the models being trained on privacy and security concepts. We will demonstrate how we have adopted LLMs in the triage phase and how we identified that LLMs are not suited to performing full reviews and remediating issues without supervision, as they struggle to reach factual and logical conclusions.

Attendees will learn how AI can enhance efficiency for privacy engineers, along with the most effective technologies and strategies, such as policy writing, dataset validation, prompt engineering with detailed decision trees, and fine-tuning. The discussion will also cover the balance between performance (e.g., model accuracy vs. human labeling, false negatives) and cost (e.g., workload reduction, computational expense). As an example, we will show how using gates (decision trees) in GPT-4 prompts allowed us to reach accuracy rates of up to 90%, but at high cost.
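
The gated-prompt idea can be sketched roughly as below; the gates, prompt wording, and model call are placeholders, not the prompts Uber or HERE actually use, and the LLM client is stubbed out.

```python
TRIAGE_PROMPT = """You are a privacy review triage assistant.
Answer each gate with YES or NO, then output one final label.

Gate 1: Does the design doc describe collecting or processing personal data?
Gate 2: If yes, is any of it sensitive (location, health, biometrics, minors)?
Gate 3: Is data shared with third parties or used for a new purpose?

If every gate is NO, output LOW_RISK; otherwise output NEEDS_REVIEW.

Design doc:
{doc}

Final label:"""

def call_llm(prompt: str) -> str:
    """Stub standing in for a model call (e.g. a GPT-4 chat completion)."""
    return "NEEDS_REVIEW"

def triage(design_doc: str) -> str:
    """Route documents into low-risk vs. needs-review buckets; the LLM only
    triages -- it does not perform or remediate the review itself."""
    label = call_llm(TRIAGE_PROMPT.format(doc=design_doc)).strip()
    return label if label in {"LOW_RISK", "NEEDS_REVIEW"} else "NEEDS_REVIEW"

print(triage("New feature: store rider pickup locations for 90 days."))
```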

The talk will conclude with a discussion of the limitations of this approach and future directions in this area.

Stefano Bennati, HERE Technologies

Stefano is Principal Privacy Engineer at HERE Technologies. He holds a PhD in Privacy algorithms, and authored several scientific publications and patents in the location privacy domain.

At HERE, Stefano provides technical guidance to product teams on building privacy into products. He also builds privacy-enhancing technologies for internal use cases, e.g. data lineage, as well as external use cases, e.g. Anonymizer, HERE's first privacy-focused product.

Engin Bozdag, Uber

Engin is Uber's Principal Privacy Architect and the team lead of Uber's Privacy Architecture team. He holds a PhD in AI Ethics and authored one of the first works on algorithmic bias. He also helped create ISO 31700 (the world's first standard on Privacy by Design) and the OWASP AI Security and Privacy Guide. Engin has gained extensive experience in diverse organizational settings, cultivating a privacy-focused career that has evolved over the course of a decade. Throughout his journey, he has assumed multifaceted roles, encompassing legal expertise, privacy engineering, engineering management, research, and consultancy in the realm of privacy.

Designing a Data Subject Access Rights Tool

Tuesday, 2:20 pm–2:35 pm

Arthur Borem, University of Chicago

The GDPR and US state privacy laws have strengthened data subjects' right to access personal data collected by companies. However, the data exports companies provide consumers in response to Data Subject Access Requests (DSARs) can be overwhelming and hard to understand. To identify directions for improving the user experience of data exports, we conducted focus groups/codesign sessions, ran a quantitative online survey, and collected over 800 de-identified files from volunteers' data exports. We found that users were overwhelmed by and unable to decipher the dozens and sometimes hundreds of record types in their data and wished to enact other data subject rights (primarily that of deletion) while exploring their data.

This talk will discuss work co-authored by Arthur Borem, Elleen Pan, Olufunmilola Obielodan, Aurelie Roubinowitz, Luca Dovichi, Sophie Veys, Daniel Serrano, Madison Stamos, Margot Herman, Nathan Reitinger, Michelle L. Mazurek, and Blase Ur.

Arthur Borem, University of Chicago

Arthur Borem is a PhD student in Computer Science at the University of Chicago advised by Blase Ur. His work focuses on the privacy and security implications arising from the large-scale collection and use of personal data by online platforms. In particular, he's exploring and designing strategies and tools for implementing data subject rights guaranteed by the GDPR and other privacy regulations that more effectively promote user autonomy and platform transparency. Arthur has a BS in Computer Science from Brown University and before starting his PhD he was a software engineer at Asana.

Building a Protocol to Improve DSR Flexibility and Integration

Tuesday, 2:35 pm–2:50 pm

Prachi Khandekar, Sam Alexander, and Suejung Shin, Ketch

In order to fulfill data subject requests (DSRs), a structure of the rights request must be created and maintained from the time of intake, to the propagation to all data systems where Personal Information (PI) resides. What should this structure contain to perform effectively and efficiently? What exchanges are required to fulfill Delete and Access obligations?

The aim of this talk is to share how we developed and implemented a protocol based on the Data Rights Protocol, outlining standardized request and response data flows by which Data Subjects can exercise Personal Data Rights. The protocol is incorporated into our broader product's internal handshakes and specifies the payload structures for externally registered webhooks. I'll explain the decisions that drove the structures of these exchanges and lessons learned from implementation in practice. Lastly I'll share some areas of flexibility (e.g., "data subject variables") and discuss future applications for this build.
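
As a rough picture of the exchange, the dictionaries below show the kind of payload a deletion request and its status response might carry; the field names are illustrative, not the actual Data Rights Protocol schema.

```python
# Hypothetical Delete request delivered to a registered webhook.
dsr_request = {
    "request_id": "3f9c0e9e-5f3a-4b1e-9a6d-000000000000",
    "right": "deletion",                               # or "access"
    "regulation": "ccpa",
    "data_subject": {
        "email": "subject@example.com",
        "variables": {"account_id": "acct_123"},       # "data subject variables"
    },
    "submitted_at": "2024-06-04T17:00:00Z",
}

# Hypothetical response: acknowledged now, fulfilled once the deletion has
# propagated to every data system where the subject's PI resides.
dsr_response = {
    "request_id": dsr_request["request_id"],
    "status": "in_progress",                           # later "fulfilled" / "denied"
    "expected_completion": "2024-06-18T00:00:00Z",
}
```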

Prachi Khandekar, Ketch

Prachi Khandekar is a Software Engineer at Ketch. After graduating from UC Berkeley with a degree in Computer Science, Prachi worked at Nextdoor and then at Affirm, a Fintech company, where she was first exposed to the challenges of working with PI data at scale. Prachi now builds the scalable data privacy automation platform at Ketch.

Sam Alexander, Ketch

Sam Alexander is a Data Privacy Engineer at Ketch. He previously worked as an engineer for Zendesk and KVH and has a degree in Computer Science and Math from Brown University.

Suejung Shin, Ketch

Suejung Shin is a Senior Software Engineer at Ketch where she has spent the last 4 years building privacy solutions for a span of organizations from budding businesses to multinational corporations.

Why "Safety"? How We Use Measurement to Advance Governance of Technology

Tuesday, 2:50 pm–3:10 pm

Irene Knapp, Internet Safety Labs

Internet Safety Labs is the world's first independent product safety testing lab for software, seeking to drive industry-wide change in culture and practices. This talk discusses the philosophy of why we think privacy experts can benefit from the concept of safety, and how that framing helps us measure it. To that end, we will do a deep dive into our app safety label, part of our free public data-exploration tool, App Microscope. We will discuss how we develop our standards, rubrics, and risk dictionaries, and our methodologies for data collection.

Irene Knapp, Internet Safety Labs

Irene Knapp (they/them), the Technology Director at Internet Safety Labs, is formerly from Google's advertising privacy team. They found that trying to change the system from within left much to be desired, and have pivoted to non-profit work to drive change that way. They are an activist on a wide variety of topics, best known for tech labor organizing efforts some years ago. Their activism informs their thoughts on privacy and on the responsible governance of technology.

Measuring Privacy Risks in Mobile Apps at Scale

Tuesday, 3:10 pm–3:30 pm

Lisa LeVasseur and Bryce Simpson, Internet Safety Labs

Since 2022, Internet Safety Labs (ISL) has been measuring privacy behaviors in K-12 EdTech apps used in schools across the US. More than 120,000 data points and over 1,000 apps' worth of network traffic have been collected, analyzed, and presented in the form of app Safety Labels. This talk will share how and what was measured, along with key learnings from both the research itself and the challenges and successes of measuring privacy at scale. Research and development was also performed by the ISL team, including Irene Knapp, Saish Mandavkar, and George Vo.

Lisa LeVasseur, Internet Safety Labs

Lisa LeVasseur is the founder, Executive Director and Research Director of Internet Safety Labs, a nonprofit technology watchdog. Her technical industry contributions and deep knowledge of consumer software products and connected technologies span more than three decades. Throughout her career in software she has developed a particular love for industry standards development work, software product & supplier management, and software product safety. She has dedicated the past five years to developing a vision for software product safety—where all software-driven technology is safe for people and humanity.

Bryce Simpson, Internet Safety Labs

Bryce Simpson is a Safety Researcher/Auditor at Internet Safety Labs. He has been performing cybersecurity and online privacy assessments for five years, with a recent focus on educational technology platforms and on synthesizing regulatory requirements and industry best practices. Specializing in meticulously analyzing the behavior of digital ecosystems, he is dedicated to navigating the intersection of privacy concerns and technological advancements, ensuring a secure online environment for all.

3:30 pm–4:00 pm

Break with Refreshments

4:00 pm–4:55 pm

Threats and Engineering Challenges

"A Big Deal": How Privacy Engineering Can Streamline the M&A Process

Tuesday, 4:00 pm–4:20 pm

Shlomi Pasternak, Google

The challenges of conducting privacy assessments of acquisitions are significant: tight timelines, competing priorities, and limited initial visibility. Yet, early proactive privacy assessments are critical to mitigate risks and streamline the integration process. Privacy Engineering's "Privacy by Design" principles, combined with advanced assessment methodologies and tools, offer a solution. This approach enables timely, high-impact privacy insights during the crucial early stages when deal structures and integration plans are being solidified.

Shlomi Pasternak, Google

Shlomi Pasternak is a Manager on the Privacy M&A team at Google. Over the past four years, he has worked with more than 50 acquisitions on mitigating their privacy gaps and integrating their environments and products. For the past two years, he has led a new team of Privacy Engineers developing privacy solutions dedicated to helping acquisitions and Bets with their privacy remediation.

Cache-22: Doing Privacy Engineering with Privacy Standards

Tuesday, 4:20 pm–4:40 pm

Zachary Kilhoffer and Devyn Wilder, University of Illinois at Urbana–Champaign

Data privacy is a pressing and critical concern for numerous organizations, and the burgeoning field of privacy engineering has emerged to address this demand. Although there is no universally agreed-upon definition of the role or educational requirements for privacy engineers (PEs), many organizations enlist professionals to fulfill this pivotal function. To understand the daily practices and challenges faced by PEs, we conducted interviews with 14 individuals currently in this role.

Initial findings underscore the immense diversity of responsibilities, tasks, and competencies inherent in privacy engineering. Our research spotlights two key thematic areas: first, the varied ways in which PEs employ privacy and security standards and controls; and second, the intricate and multifaceted relationships PEs cultivate within their organizations. Notably, our investigation reveals that a considerable number of PEs primarily concentrate on ensuring compliance with legal frameworks such as GDPR and COPPA rather than actively developing or implementing ambitious privacy policies. Furthermore, results indicate that privacy engineering, while still lacking a precise occupational definition, is undeniably a growing career path deserving of increased standardization. We believe our findings provide insight into the many ways privacy engineering can be expanded and refined.

Zachary Kilhoffer, University of Illinois at Urbana–Champaign

Zachary Kilhoffer is a tech policy researcher and PhD candidate at the University of Illinois at Urbana-Champaign. With his background in governance and technical ML studies, Kilhoffer aims to standardize development and deployment procedures to make AI systems more fair, accountable, transparent, and ethical. In his free time, Kilhoffer likes woodworking, sci-fi, and spending time with his cats Theodore Roosevelt (Teddy) and Franklin Delano Roosevelt (Frankie).

Devyn Wilder, University of Illinois at Urbana–Champaign

Dev Wilder is a PhD student in Information Science at the University of Illinois Urbana-Champaign. Her areas of focus are information policy, medical misinformation, community information sharing, and various topics surrounding privacy. When she's not doing research, Wilder enjoys knitting, ice skating, and baking bread.

Preserving Privacy While Mitigating Insider Threat and Risk

Tuesday, 4:40 pm–4:55 pm

Mark Paes, Carnegie Mellon University

When mitigating the critical risks posed by potential insider threats, a delicate balance must be maintained in preserving the privacy of insiders. This presentation will examine the intersection of insider threats, privacy, and methods of detection and security management. We will define insiders, threats (both negligent and malicious), and risks, and explore their interconnectedness. We'll then delve into the human element of security and its inherent link to privacy concerns. A core focus will be the aspects of insider risk management that raise privacy and civil liberties questions, including program governance, personal data management, user activity monitoring, and more. The latter portion of the presentation explores the privacy threats posed by insiders themselves, as well as those arising from insider management practices, along with safeguards and strategies to mitigate these concerns. Finally, we'll examine emerging technologies in artificial intelligence and encryption that promise effective risk management while upholding privacy.