The Sticky Problem of Measuring Passwords

October 4, 2022

Research

Authors:

Marina Sanusi Bohuk, Mazharul Islam, Thomas Ristenpart, Rahul Chatterjee

Article shepherded by:

Rik Farrow

Passwords are the primary mode of authentication, yet rarely studied or monitored because of their sensitive nature. In our paper, we introduced a framework called Gossamer for securely measuring password-based logins [1] and a simulation method for measuring risk of individual password-derived measurements. We then deployed Gossamer at two large universities, monitored 34 million login requests over several months, and reported on the resulting data.

The Problem

Although password-based authentication is still widely used across the Web, little is known about the types of passwords submitted by users or attackers to a login system. This is primarily because passwords are very sensitive; submitted passwords that result in login failure are also risky to monitor, as they could be users’ real passwords on other web services or a typo of their real password. Industry practitioners have largely shied away from investigating passwords themselves because of the high risk, with a few exceptions [2, 3]. However, certain questions can only be answered through analyzing submitted passwords as observed by a login system. For example, how often is a mistyped variant of the actual user password submitted against a user account? How often are breached passwords submitted? How often is the submitted password a close variant of a breached password? And more broadly, can we use the submitted passwords, especially the ones that are not correct, to differentiate legitimate users from attackers? These insights can help us understand user behavior, inform policy, and develop countermeasures to attacks on password-based login systems.

To answer these questions, however, we need (a) a framework to log submitted passwords in a secure manner that does not undermine the safety of passwords, and (b) a partnering login system with a large active user base. Creating a simulated login environment—such as using Amazon Mechanical Turk (AMT), as used in many previous password studies—would miss attacker behavior, and users may behave differently when they know they’re being monitored. In our paper, we introduce a framework called Gossamer [1] for measuring password-derived information in live login systems. Here, we explain more about the process behind designing this tool.

Designing a Secure Logging Architecture

We presented the research idea to the IT departments at two large universities who run web login systems with more than 500k users combined. Fortunately, both IT departments were excited about the idea and willing to collaborate with us to design a system that could safely perform measurements on passwords without compromising the performance or security of the university web login systems. We iterated over several different designs over several months trying to satisfy the IRB policies while meeting the operational constraints of the IT departments’ infrastructures. For example, to avoid interacting with any identifiable information, we wanted to only store a keyed hash of the usernames associated with login requests, and delete the key after data collection is over; however, this would not allow us to report compromised users that we discover through our instrumentation to the IT departments. On the other hand, using standard encryption to protect usernames would not allow learning if two login requests contain the same username, preventing us from aggregating login requests by usernames for downstream analysis. Thus we decided to use a deterministic encryption algorithm to protect the privacy of usernames with an encryption key held by the relevant IT Security office.

Design Principles

We finally created a set of four design principles to follow, and throughout the design process, we made infrastructural and design decisions based on these principles. These principles are as follows:

1. Least privilege access to password data

We wanted to ensure that the researchers and engineers involved receive only information necessary for maintaining the service and analyzing login behaviors.

2. Periodic deletion

To further minimize risk in the unlikely event of a complete system compromise, all sensitive data older than 24 hours must be expunged.

3. Safe-on-reboot

All sensitive data should be destroyed on reboot. This principle was introduced in Bunker by Miklas et al [4].

4. Bounded-leakage logging

Since passwords are very sensitive, any password-derived measurements logged must be carefully chosen to bound the improvement in a potential password guessing attack.

How can we measure the risk of logging a measurement?

The fourth design principle required us to consider how we can measure the potential risk of storing password-derived measurements should they be compromised by an attacker. For example, say that an attacker has a list of passwords, and they are executing an optimal frequency-based attack against a user account based on their knowledge of the password distribution. The attacker would start by guessing the password with the highest frequency on their first guess, then the password with the second highest frequency, and so on. Now, If the attacker is targeting a specific password and knows some information about that password, then they can filter the guess list and potentially speed up the attack. Say that the attacker knows the target password has a zxcvbn score of 3 (zxcvbn is a popular password strength meter); then the attacker can filter out any passwords that do not have a zxcvbn score of 3.

We simulated the attack described using a dataset of breach data, split into target passwords and the attacker’s guess list. For the zxcvbn measurement specifically, we found that the original zxcvbn score, which has a range of 0 to 4, gave too much information to the attacker — it would increase an attacker’s probability of guessing a user password by 20%. Thus we remapped the Zxcvbn score to ‘0’ (denoting weak passwords, having an original Zxcvbn score of 0) and ‘1’ (denoting medium to strong passwords, with a zxcvbn score of 1 or more), which leaked much less information and resulted in an attack speedup of less than 2%. We recommend doing a similar simulation for measuring the safety of logging other password-derived measurements that researchers or practitioners may want to record in the future. A full list of the measurements we logged and more details about the system can be found in our paper [1].

The Architecture

Once we settled on the set of measurements that would be useful to answer our research questions, while providing minimal benefit to an attacker in the unlikely case of compromising the measurement logs (bounded leakage), we went ahead and designed a system that would adhere to our other three design principles. This system, called Gossamer, receives a copy of the login request (with a subset of HTTP headers) from the main login server and performs a series of measurements.

Gossamer has two levels of storage — one ephemeral and one persistent — so that we can perform measurements across requests without storing sensitive data persistently. We wanted to know how similar or different are different passwords submitted by an IP address to a user account. Therefore, we need to access previously submitted passwords in plaintext. We solve this challenge by having two different storages: the ephemeral storage stores the passwords (encrypted under an in-memory key which is rotated every 24-hours) and persistent storage stores the anonymized logs of login requests with certain statistics computed about the submitted passwords.

It also uses different levels of privilege for different parts of the system. For example, the service that receives the login request and computes the statistics over raw passwords have a higher privilege level than the analysis service used to interact with the anonymized data stored in the persistent database. In theory, the ephemeral database could sit in the actual login server, although we will have to put more thought into how to do this with distributed login systems.

Deploying Gossamer

We deployed Gossamer in collaboration with IT security offices in two universities. At University 1 (U1), Gossamer was deployed for 7 months, and at University 2 (U2) it was deployed for 3 months. At U2 we received 24 million requests within 3 months, which was sufficient for answering our initial research questions. However, at U1 we only received 3.9 million requests in 3 months, as we were monitoring a subset of login traffic. We therefore continued monitoring at U1 until a very large, distributed attack bottlenecked all the login servers to the point that IT engineers had to remove any extraneous services. By that time we had collected 10 million login requests from U1.

With the data collected over these time periods, we were able to report statistics on user login behavior, especially regarding passwords. For example, we find that 30% of all failed requests at U1 were a typo of the actual password. We also found that 3% of submitted passwords at U1 appeared in a breach, and 6% of valid usernames also appeared in a breach. A full set of statistics and our findings can be found in the paper [3]; but to summarize, we found that breached credential use is still a serious problem, password-based login systems have usability issues, and two-factor authentication adds significant friction to login. The password-derived measurements were very helpful in coming to these conclusions, and we hope that this study will motivate the use of them for developing new login policies and countermeasure designs.

Next Steps

We have open sourced Gossamer, which can easily be augmented with new password-derived measurements. Gossamer can help system admins to securely measure statistics about the submitted passwords and create informed policy decisions. Our findings have strong implications for the utility of password-derived measurements, which we hope will be incorporated in future defense mechanisms.

Appendix

References:

[1] Marina Sanusi Bohuk, Mazharul Islam, Suleman Ahmad, Mike Swift, Tom Ristenpart, Rahul Chatterjee. Gossamer: Securely Measuring Password-based Logins. USENIX Security, 2022.

[2] Rahul Chatterjee, Anish Athalye, Devdatta Akhawe, Ari Juels, Thomas Ristenpart. pASSWORD tYPOS and How to Correct Them Securely. IEEE Symposium on Security and Privacy (S&P), 2016

[3] Joseph Bonneau. The science of guessing: Analyzing an anonymized corpus of 70 million passwords. IEEE Symposium on Security and Privacy, 2012.

[4] Andrew Miklas, Stefan Saroiu, Alec Wolman, and Angela Demke Brown. Bunker: A privacy-oriented platform for network tracing. USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2009.

Article Categories:

Security

Sysadmin

Last updated February 8, 2023

Authors:

Marina Sanusi Bohuk is a PhD student in the Department of Computer Science at Cornell University. Her research interests focus on designing more secure authentication systems and countermeasures that can prevent abuse. Most recently, she has been working on detecting password guessing attacks in live login systems.

marina@cs.cornell.edu

Mazharul Islam is a Ph.D. student at UW-Madison. His research interest lies in security, privacy, applied cryptography, and privacy-preserving machine learning. His works have focused on understanding users’ login behavior and protecting users against advanced guessing attacks in a privacy-preserving way using cryptography and machine learning techniques. Before starting his Ph.D. journey in security and privacy, he worked on developing statistical tools for genomic sequencing and phylogenies.

mislam9@wisc.edu

Thomas Ristenpart is an Associate Professor at Cornell Tech and a member of the Computer Science department at Cornell University. His research spans a wide range of computer security topics, with recent focuses including digital privacy and safety in intimate partner violence, mitigating abuse and harassment online, cloud computing security, improvements to authentication mechanisms including passwords, confidentiality and privacy in machine learning, and topics in applied and theoretical cryptography.

ristenpart@cornell.edu

Rahul Chatterjee is an Assistant Professor at UW-Madison. Dr. Chatterjee designs secure systems to make digital technologies safe and secure for everyone. His research methodology combines empiricism with analytical techniques. Some of his recent research includes: designing secure and usable authentication systems, securing private data in trigger-action platforms, mitigating abuse of smart home devices by an abusive intimate partner, and building security mindset CS undergraduate students.

rahul.chatterjee@wisc.edu