Optimizing Dynamic Neural Networks with Brainstorm

Authors: 

Weihao Cui, Shanghai Jiao Tong University; Zhenhua Han, Microsoft Research Asia; Lingji Ouyang, University of Science and Technology of China; Yichuan Wang, Shanghai Jiao Tong University; Ningxin Zheng, Lingxiao Ma, Yuqing Yang, Fan Yang, Jilong Xue, Lili Qiu, and Lidong Zhou, Microsoft Research Asia; Quan Chen, Shanghai Jiao Tong University; Haisheng Tan, University of Science and Technology of China; Minyi Guo, Shanghai Jiao Tong University

Abstract: 

Dynamic neural networks (NNs), which can adapt sparsely activated sub-networks to inputs during inference, have shown significant advantages over static ones in terms of accuracy, computational efficiency, and adaptiveness. However, existing deep learning frameworks and compilers mainly focus on optimizing static NNs with deterministic execution, missing the optimization opportunities brought by the non-uniform distribution of activations in dynamic NNs. The key to optimizing dynamic NNs is the traceability of how data are dynamically dispatched to different paths at inference. Such dynamism often happens at the sub-tensor level (e.g., conditionally dispatching the tokens of a tensor), making it hard for existing tensor-centric frameworks to trace due to the misaligned expression granularity.
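
To make the misalignment concrete, here is a minimal sketch (not taken from the paper) of token-level dispatch in a Mixture-of-Experts layer, written in plain PyTorch. Each token of the input tensor may take a different expert at runtime, so a trace recorded at whole-tensor granularity cannot capture which tokens went where.

    import torch
    import torch.nn as nn

    # A toy Mixture-of-Experts layer: every token (a row of x) is dispatched
    # to one expert chosen at runtime, so the dynamism lives at sub-tensor
    # granularity and is invisible to a whole-tensor trace.
    class TinyMoE(nn.Module):
        def __init__(self, dim=16, num_experts=4):
            super().__init__()
            self.gate = nn.Linear(dim, num_experts)   # per-token routing scores
            self.experts = nn.ModuleList(
                nn.Linear(dim, dim) for _ in range(num_experts)
            )

        def forward(self, x):                          # x: [num_tokens, dim]
            expert_ids = self.gate(x).argmax(dim=-1)   # one decision per token
            out = torch.empty_like(x)
            for i, expert in enumerate(self.experts):
                mask = expert_ids == i                 # tokens routed to expert i
                if mask.any():
                    out[mask] = expert(x[mask])        # conditional sub-tensor dispatch
            return out

    print(TinyMoE()(torch.randn(8, 16)).shape)         # torch.Size([8, 16])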

In this paper, we present Brainstorm, a deep learning framework for optimizing dynamic NNs, which bridges the gap by unifying how dynamism should be expressed. Brainstorm proposes (1) Cell, the key data abstraction that lets model developers express the data granularity where dynamism exists, and (2) Router, a unified interface that lets model developers express how Cells should be dynamically dispatched. Brainstorm handles the efficient execution of routing actions. This design allows Brainstorm to collect profiles of fine-grained dataflow at the correct granularity. The traceability further opens up a new space of dynamic optimizations that specialize the execution of dynamic NNs to the runtime dynamism distribution. Extensive evaluations show that Brainstorm brings up to a 11.7× speedup (3.29× on average) or up to 42% less memory consumption for popular dynamic neural networks with the proposed dynamic optimizations.
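
The abstract does not give Brainstorm's concrete API, so the following is a hypothetical sketch of how a Cell-granularity Router might look; the names (Router, decide, dispatch, trace) are illustrative assumptions, not Brainstorm's actual interface. The point it shows is that a single routing interface lets the framework both execute the dispatch and record per-Cell decisions for later profile-guided optimization.

    import torch

    # Hypothetical Router sketch: `decide` picks a branch per Cell (here, a
    # Cell is one row of the tensor), and `dispatch` executes the routing
    # while recording the decisions, so a profile of the fine-grained
    # dataflow can be collected at Cell granularity.
    class Router:
        def __init__(self, decide):
            self.decide = decide       # Cell tensor -> per-Cell branch ids
            self.trace = []            # recorded routing decisions

        def dispatch(self, x, branches):
            ids = self.decide(x)       # one branch id per Cell
            self.trace.append(ids)     # traceability at the Cell level
            out = torch.empty_like(x)
            for i, branch in enumerate(branches):
                mask = ids == i
                if mask.any():
                    out[mask] = branch(x[mask])
            return out

    # Usage: route each token Cell to one of two branches, then inspect the profile.
    router = Router(decide=lambda x: (x.sum(dim=-1) > 0).long())
    y = router.dispatch(torch.randn(8, 16), [lambda t: 2.0 * t, lambda t: 0.5 * t])
    print(router.trace[0])             # per-Cell decisions, e.g. tensor([1, 0, 1, ...])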


BibTeX
@inproceedings{288550,
  author = {Weihao Cui and Zhenhua Han and Lingji Ouyang and Yichuan Wang and Ningxin Zheng and Lingxiao Ma and Yuqing Yang and Fan Yang and Jilong Xue and Lili Qiu and Lidong Zhou and Quan Chen and Haisheng Tan and Minyi Guo},
  title = {Optimizing Dynamic Neural Networks with Brainstorm},
  booktitle = {17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)},
  year = {2023},
  isbn = {978-1-939133-34-2},
  address = {Boston, MA},
  pages = {797--815},
  url = {https://www.usenix.org/conference/osdi23/presentation/cui},
  publisher = {USENIX Association},
  month = jul
}