Two-in-One: A Model Hijacking Attack Against Text Generation Models

Authors: 

Wai Man Si, Michael Backes, and Yang Zhang, CISPA Helmholtz Center for Information Security; Ahmed Salem, Microsoft

Abstract: 

Machine learning has progressed significantly in various applications ranging from face recognition to text generation. However, its success has been accompanied by different attacks. Recently, a new attack has been proposed which raises both accountability and parasitic computing risks, namely the model hijacking attack. Nevertheless, this attack has only focused on image classification tasks. In this work, we broaden the scope of this attack to include text generation and classification models, hence showing its broader applicability. More concretely, we propose a new model hijacking attack, Ditto, that can hijack different text classification tasks into multiple generation ones, e.g., language translation, text summarization, and language modeling. We use a range of text benchmark datasets such as SST-2, TweetEval, AGnews, QNLI, and IMDB to evaluate the performance of our attacks. Our results show that by using Ditto, an adversary can successfully hijack text generation models without jeopardizing their utility.

BibTeX
@inproceedings {291056,
author = {Wai Man Si and Michael Backes and Yang Zhang and Ahmed Salem},
title = {{Two-in-One}: A Model Hijacking Attack Against Text Generation Models},
booktitle = {32nd USENIX Security Symposium (USENIX Security 23)},
year = {2023},
isbn = {978-1-939133-37-3},
address = {Anaheim, CA},
pages = {2223--2240},
url = {https://www.usenix.org/conference/usenixsecurity23/presentation/si},
publisher = {USENIX Association},
month = aug
}