### Practical Erase Suspension for Modern Low-latency SSDs

**Shine Kim<sup>†§</sup>** Jonghyun Bae<sup>†</sup> Hakbeom Jang<sup>\*</sup> Wenjing Jin<sup>†</sup> Jeonghun Gong<sup>†</sup> Seoungyeon Lee<sup>§</sup> Tae Jun Ham<sup>†</sup> Jae W. Lee<sup>†</sup>





§Samsung Electronics



\*Sungkyunkwan University

July 12<sup>th</sup>, 2019

**USENIX ATC 2019, RENTON, WA, USA** 

#### **Today's NAND flash-based SSDs in datacenters**

- NAND flash-based SSDs have become a *de facto* standard in datacenters
  - Superior throughput, low average latency, and relatively low price



[1] https://www.samsung.com/semiconductor/ssd/enterprise-ssd/

[2] IEEE ISSCC'18, W. Cheong et al., A flash memory controller for 15us ULL-SSD using high-speed 3D NAND flash with 3us read time

[3] www.amazon.com: SAMSUNG 860QVO 1TB

#### **Read tail behavior of NAND flash-based SSD**

Challenge: Despite low average response time, read tail latency can be very long



- Garbage collection (GC) (e.g., 100ms → 10ms)
  - GC-induced read tail latency has been optimized by sophisticated GC schemes
- Block erase operation (e.g., 10ms/block)
  - Has become the dominant source of read tail latency
  - Erase suspension<sup>[1]</sup> can effectively reduce the read latency caused by block erase

**However, existing erase suspension can cause *write starvation and NAND reliability problems*!**

[1] Wu et al, Reducing SSD Read Latency via NAND Flash Program and Erase Suspension, USENIX FAST 2012


Architecture and Code Optimization (ARC) Laboratory @ SNU

- Modern SSDs perform erase operations with multiple discrete pulses, which provide well-aligned safe points for suspending an ongoing erase
- We propose three practical erase suspension schemes
  - Immediate erase suspension (I-ES): Aborts the erase immediately and restarts from the previous safe point
  - Deferred erase suspension (D-ES): Waits until the current erase pulse is finished
  - Timeout-based erase suspension (T-ES): Adaptively switches between I-ES and D-ES



#### Prior work: Problems with existing erase suspension<sup>[1]</sup> (1)

- Problem #1: Write starvation
  - With bursty reads

1) The remaining erase pulse (9ms) may fail to make progress because of incoming reads

**Erase (and write) starvation!**

[1] Wu et al, Reducing SSD Read Latency via NAND Flash Program and Erase Suspension, USENIX FAST 2012



#### Prior work: Problems with existing erase suspension<sup>[1]</sup> (2)

- Problem #2: Endurance degradation
  - With bursty reads



2) Erase suspension/resumption causes additional stress to NAND

Over-erased NAND blocks → Increased uncorrectable bit error rate (UBER)

**Endurance degradation of SSD!**

[1] Wu et al, Reducing SSD Read Latency via NAND Flash Program and Erase Suspension, USENIX FAST 2012





#### Practical erase suspension: Background

- NAND erase operation
  - Pulls electrons out of the floating gate by applying a very high voltage
- Incremental Step Pulse Erasing (ISPE)
  - Standard technique to minimize damage to NAND cells
  - Applies several discrete pulses (of ~1ms each) with increasingly higher nominal voltages
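The ISPE loop described above can be sketched roughly as follows. This is an illustration only, not vendor firmware: the starting voltage, the voltage increment, and the `verify` callback are hypothetical placeholders. Only the structure (a pulse of about 1 ms, a verify, then a higher voltage) comes from the slide.

```python
from typing import Callable, List, Tuple

PULSE_MS = 1.0  # approximate duration of one erase pulse (from the slide)


def ispe_erase(verify: Callable[[int], bool],
               start_v: float = 14.0,   # hypothetical starting voltage
               step_v: float = 0.5,     # hypothetical per-step increment
               max_steps: int = 5) -> Tuple[bool, List[float]]:
    """Apply erase pulses with increasing voltage until verify passes.

    Returns (erased_ok, list of voltages applied).
    """
    applied: List[float] = []
    for step in range(max_steps):
        voltage = start_v + step * step_v
        applied.append(voltage)   # one erase pulse at this voltage
        if verify(step):          # verify pulse follows each erase pulse
            return True, applied
        # The boundary between this verify and the next pulse is where
        # an erase could be suspended without losing completed work.
    return False, applied


# Example: a block whose cells pass verification after the third pulse.
ok, volts = ispe_erase(lambda step: step >= 2)
```

Each loop iteration corresponds to one erase step, so the end of an iteration is a natural candidate for the "safe points" mentioned earlier.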



#### I-ES operations

- Suspend: Immediately terminates the ongoing erase step (taking ~100µs)
- Resume: Restarts the suspended erase pulse from the beginning



[Figure: I-ES timeline; x-axis: time (ms); legend: erase pulse, verify pulse]



- Does not guarantee forward progress of the erase operation → Write starvation problem!
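To see why restarting the pulse from the beginning can stall an erase, here is a small simulation sketch. It is a simplified model under stated assumptions: the 1 ms pulse and 100 µs suspension penalty come from the slides, read service time is ignored, and the function name and arrival pattern are hypothetical.

```python
from typing import Sequence


def ies_completed_pulses(read_times_ms: Sequence[float],
                         horizon_ms: float,
                         pulse_ms: float = 1.0,
                         suspend_ms: float = 0.1) -> int:
    """Count erase pulses that finish under I-ES within horizon_ms.

    A read arriving during a pulse aborts it; after the suspension
    penalty the pulse restarts from the beginning.
    """
    reads = sorted(read_times_ms)
    completed = 0
    now = 0.0
    i = 0
    while now + pulse_ms <= horizon_ms:
        # Reads that arrived while the erase was already suspended
        # do not abort anything; skip them.
        while i < len(reads) and reads[i] < now:
            i += 1
        if i < len(reads) and reads[i] < now + pulse_ms:
            # Read lands mid-pulse: abort, pay the penalty, restart.
            now = reads[i] + suspend_ms
            i += 1
        else:
            completed += 1
            now += pulse_ms
    return completed


# Bursty reads every 0.5 ms for 10 ms: no pulse ever completes.
bursty = [0.5 * k for k in range(1, 21)]
stalled = ies_completed_pulses(bursty, horizon_ms=10.0)
# With no reads, five pulses finish in 5 ms.
idle = ies_completed_pulses([], horizon_ms=5.0)
```

In this model, any read stream whose inter-arrival gap stays below the pulse length keeps the erase from advancing, which is the starvation scenario the slide describes.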



#### D-ES operations

- Suspend: Waits until the current erase step (erase and verify pulse) is finished
- Resume: Starts the next erase pulse



[Figure: D-ES timeline; x-axis: time (ms); legend: erase pulse, verify pulse]



- No erase or write starvation problem, but a longer read tail (up to the length of a single step, ~1ms)!
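The read delay under D-ES can be estimated with a short sketch. It assumes equal-length steps (1 ms each, 5 steps, matching the figures used elsewhere in the talk) and ignores the suspension penalty. The function names are hypothetical.

```python
import math


def des_read_wait_ms(arrival_ms: float,
                     step_ms: float = 1.0,
                     steps: int = 5) -> float:
    """Wait time for a read arriving arrival_ms after an erase started.

    Under D-ES the read waits only until the current step ends.
    """
    if arrival_ms < 0 or arrival_ms >= step_ms * steps:
        return 0.0  # no erase in flight
    next_boundary = (math.floor(arrival_ms / step_ms) + 1) * step_ms
    return next_boundary - arrival_ms


def baseline_read_wait_ms(arrival_ms: float,
                          step_ms: float = 1.0,
                          steps: int = 5) -> float:
    """Without suspension the read waits for the whole erase to finish."""
    total = step_ms * steps
    if arrival_ms < 0 or arrival_ms >= total:
        return 0.0
    return total - arrival_ms


# A read arriving 0.25 ms into the erase:
des_wait = des_read_wait_ms(0.25)        # waits for the end of step 1
base_wait = baseline_read_wait_ms(0.25)  # waits for all five steps
```

The worst case for D-ES in this model is a read that arrives just as a step begins, which waits one full step.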



#### T-ES operations

1. Performs I-ES until the erase operation has been suspended for a timeout period (*N* ms)
2. If the timeout expires, switches to D-ES to avoid erase and write starvation

- Choice of erase timeout period (*N*)
  - Provides an effective control knob for read/write latency
  - Trades maximum write tail latency for reduced read latency

Ex) With $N = 64\,\text{ms}$ and a GC write latency of $35\,\text{ms}$, the maximum write latency is $64 + 35 = 99\,\text{ms} \leq 100\,\text{ms}$.
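The switching rule and the write-latency bound can be sketched as below. Two assumptions are made here: that "suspended for a timeout period" means cumulative suspended time per erase operation, and that the worst-case write latency is simply the timeout plus the GC write latency. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimeoutES:
    timeout_ms: float = 64.0    # N, as in the example
    suspended_ms: float = 0.0   # cumulative suspended time of this erase

    def mode(self) -> str:
        """I-ES until the timeout budget is used up, then D-ES."""
        return "I-ES" if self.suspended_ms < self.timeout_ms else "D-ES"

    def record_suspension(self, duration_ms: float) -> None:
        """Account for one suspension interval (assumed bookkeeping)."""
        self.suspended_ms += duration_ms


def max_write_latency_ms(timeout_ms: float, gc_write_ms: float) -> float:
    """Upper bound on write latency: the write can be held back for at
    most the timeout before D-ES guarantees erase progress."""
    return timeout_ms + gc_write_ms


policy = TimeoutES()
first_mode = policy.mode()          # "I-ES"
policy.record_suspension(64.0)      # reads kept the erase suspended for N ms
later_mode = policy.mode()          # "D-ES"
bound = max_write_latency_ms(64, 35)
```

A smaller *N* tightens the write bound but spends more time in D-ES, where reads may wait for a full erase step.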

#### **Evaluation: Methodology**

- NVMe SSD simulator: MQSim<sup>[1]</sup>
- Benchmarks: Flexible I/O Tester, Aerospike Certification Tool (ACT) and TPC-C
- Comparison of six designs:
  - **Baseline** (no suspension) and **Ideal-ES** (erase suspension with zero penalty)
  - Erase suspension (ES)<sup>[2]</sup>
  - Immediate-ES (I-ES), Deferred-ES (D-ES), and Timeout-based-ES (T-ES)

| Parameter                | Value                                                          |
|--------------------------|----------------------------------------------------------------|
| Device                   | PCIe Gen 3 x4 lanes, 240GB NVMe SSD                            |
| NAND configuration       | 4 channels, 4 chips/channel, 1 die/chip                        |
| FTL schemes              | Page mapping, preemptible GC                                   |
| NAND latency             | Read: 3µs, Program: 100µs, Block erase: 1ms per step (5 steps) |
| Erase suspension penalty | 100µs                                                          |
| T-ES timeout             | 64ms                                                           |

[1] Tavakkol et al, MQSim: A framework for enabling realistic studies of modern multi-queue SSD devices, USENIX FAST 2018

[2] Wu et al, Reducing SSD Read Latency via NAND Flash Program and Erase Suspension, USENIX FAST 2012
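For quick sanity checks, the simulated device parameters can be captured in a small record. This is a hypothetical sketch and not MQSim's actual configuration format. The field names are made up, and only the values come from the methodology slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    channels: int = 4
    chips_per_channel: int = 4
    dies_per_chip: int = 1
    read_us: float = 3.0
    program_us: float = 100.0
    erase_step_us: float = 1000.0     # 1 ms per erase step
    erase_steps: int = 5
    suspend_penalty_us: float = 100.0
    tes_timeout_ms: float = 64.0

    @property
    def total_dies(self) -> int:
        return self.channels * self.chips_per_channel * self.dies_per_chip

    @property
    def full_erase_us(self) -> float:
        """Latency of an uninterrupted block erase."""
        return self.erase_step_us * self.erase_steps


cfg = SimConfig()
dies = cfg.total_dies           # 16 dies in total
full_erase = cfg.full_erase_us  # 5000 µs, i.e. the ~5 ms baseline tail
```

The derived 5 ms full-erase time is what a read can wait behind when no suspension is available.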



#### FIO random test

- Read 70%, Write 30%, 4KB QD 16



- Baseline → ~5ms (entire erase operation)
- D-ES → ~1ms (single erase pulse)
- ES, I-ES, T-ES → ~100µs (suspension latency)

- I-ES, T-ES → Long write latency due to repeated erase suspension



#### **Evaluation: Aerospike Certification Tool (ACT)**

- ACT: Database benchmark
  - Consists of three threads, and gradually increases I/O rate in integer multiples



| Test Item        | Evaluation Criteria                                                     | SSD #1 | SSD #2 |
|------------------|-------------------------------------------------------------------------|--------|--------|
| Performance Test | i) 95% of I/O < 1ms<br>ii) 99% of I/O < 8ms<br>iii) 99.9% of I/O < 64ms | 10X    | 8X     |
| Stress Test      | iv) I/O latency < request period                                        | 2X     | 10X    |
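The three performance-test criteria in the table can be checked against a latency trace with a sketch like the one below. It assumes latencies are available as a flat list in milliseconds and does not reproduce ACT's own reporting. The function names are hypothetical.

```python
from typing import Sequence


def fraction_below(latencies_ms: Sequence[float], limit_ms: float) -> float:
    """Fraction of I/Os that completed strictly faster than limit_ms."""
    if not latencies_ms:
        raise ValueError("empty latency trace")
    return sum(1 for x in latencies_ms if x < limit_ms) / len(latencies_ms)


def passes_act_performance(latencies_ms: Sequence[float]) -> bool:
    """Criteria i) to iii): 95% < 1ms, 99% < 8ms, 99.9% < 64ms."""
    return (fraction_below(latencies_ms, 1.0) >= 0.95
            and fraction_below(latencies_ms, 8.0) >= 0.99
            and fraction_below(latencies_ms, 64.0) >= 0.999)


# Synthetic traces for illustration only (not measured data).
good_trace = [0.2] * 960 + [5.0] * 35 + [20.0] * 5
bad_trace = [0.2] * 900 + [5.0] * 100   # only 90% of I/Os under 1 ms

good_ok = passes_act_performance(good_trace)
bad_ok = passes_act_performance(bad_trace)
```

A long read tail fails criterion i) or ii) first, which is why erase-induced delays matter for the rating.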




#### ACT test results

- Baseline shows a poor *performance test* result (14x) due to the long tail latency of read requests
- ES and I-ES suffer from the write starvation problem (22x)
- D-ES and T-ES demonstrate good results (30x) for both stress and performance tests



- TPC-C from SNIA

(a) Read tail latency

- Baseline → ~5ms (entire erase operation)
- D-ES, T-ES → ~1ms (single erase pulse)
- ES, I-ES → Failure by write command timeout

(b) Write tail latency

- T-ES → Timeout (64ms) + GC latency (24ms)



#### Conclusion

- Practical erase suspension harnesses the full potential of NAND flash-based SSDs
  - Minimizes the impact of erase operations on read tail latency
  - Achieves very low read tail latency without write starvation or endurance degradation



# **Thank You!**

Our simulator is available at

https://github.com/SNU-ARC/MQSim-Practical-ERS-SUS

