Aviv Nachman and Gala Yadgar, Technion - Israel Institute of Technology; Sarai Sheinvald, Braude College of Engineering
Deduplication decreases the physical occupancy of files in a storage volume by removing duplicate copies of data chunks, but creates data-sharing dependencies that complicate standard storage management tasks. Specifically, data migration plans must consider the dependencies between files that are remapped to new volumes and files that are not. Thus far, only greedy approaches have been suggested for constructing such plans, and it is unclear how they compare to one another and how much they can be improved.
We set to bridge this gap for seeding—migration in which the target volume is initially empty. We present GoSeed, a formulation of seeding as an integer linear programming (ILP) problem, and three acceleration methods for applying it to real-sized storage volumes. Our experimental evaluation shows that, while the greedy approaches perform well on "easy" problem instances, the cost of their solution can be significantly higher than that of GoSeed's solution on “hard” instances, for which they are sometimes unable to find a solution at all.
FAST '20 Open Access Sponsored by NetApp
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Aviv Nachman and Gala Yadgar and Sarai Sheinvald},
title = {{GoSeed}: Generating an Optimal Seeding Plan for Deduplicated Storage},
booktitle = {18th USENIX Conference on File and Storage Technologies (FAST 20)},
year = {2020},
isbn = {978-1-939133-12-0},
address = {Santa Clara, CA},
pages = {193--207},
url = {https://www.usenix.org/conference/fast20/presentation/nachman},
publisher = {USENIX Association},
month = feb
}