Single Disk Failure Recovery for X-Code-Based Parallel Storage Systems
Title | Single Disk Failure Recovery for X-Code-Based Parallel Storage Systems |
Publication Type | Journal Article |
Year of Publication | 2014 |
Authors | Silei Xu, Runhui Li, P. P. C. Lee, Yunfeng Zhu, Liping Xiang, Yinlong Xu, J. C. S. Lui |
Journal | IEEE Transactions on Computers |
Volume | 63 |
Pagination | 995-1007 |
Date Published | April |
ISSN | 0018-9340 |
Keywords | Arrays, cloud storage, coding theory, Complexity theory, data availability, data centers, data communication, disc storage, double-fault tolerant coding scheme, encoding, Load management, logical encoding scheme, MDRR, minimum-disk-read-recovery, networked storage system prototype, optimal single-disk failure recovery, optimal update complexity, parallel memories, Parallel storage systems, Peer to peer computing, recovery algorithm, Redundancy, redundancy coding schemes, reliability, single disk failure recovery algorithm, storage management, System recovery, X-code-based optimal recovery scheme, X-code-based parallel storage systems |
Abstract | In modern parallel storage systems (e.g., cloud storage and data centers), it is important to provide data availability guarantees against disk (or storage node) failures via redundancy coding schemes. One coding scheme is X-code, which is double-fault tolerant while achieving the optimal update complexity. When a disk/node fails, recovery must be carried out to reduce the possibility of data unavailability. We propose an X-code-based optimal recovery scheme called minimum-disk-read-recovery (MDRR), which minimizes the number of disk reads for single-disk failure recovery. We make several contributions. First, we show that MDRR provides optimal single-disk failure recovery and reduces disk reads by about 25 percent compared with the conventional recovery approach. Second, we prove that, in general, no optimal recovery scheme for X-code can balance disk reads among the disks within a single stripe. Third, we propose an efficient logical encoding scheme that issues balanced disk reads across a group of stripes for any recovery algorithm (including the MDRR scheme). Finally, we implement our proposed recovery schemes and conduct extensive testbed experiments in a networked storage system prototype. Experiments indicate that MDRR reduces recovery time by around 20 percent relative to the conventional approach, showing that our theoretical findings are applicable in practice. |
DOI | 10.1109/TC.2013.8 |
Citation Key | 6409832 |
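The abstract refers to X-code and to a "conventional" single-disk recovery without spelling either out. The following is a minimal sketch, not the paper's MDRR algorithm: it assumes the usual X-code layout (a p x p array per stripe with p prime, data in the first p-2 rows, diagonal parity in row p-2 and anti-diagonal parity in row p-1) and assumes that a conventional recovery rebuilds each lost data symbol from its diagonal parity chain only. The function names `encode` and `recover_column` are hypothetical and chosen for illustration.

```python
import random
from functools import reduce
from operator import xor


def encode(data, p):
    """Build a p x p stripe: rows 0..p-3 hold data, rows p-2 and p-1 hold parity.

    Assumed layout: parity in column i covers data symbols from other columns only.
    """
    stripe = [row[:] for row in data] + [[0] * p, [0] * p]
    for i in range(p):
        # Diagonal parity chain for column i.
        stripe[p - 2][i] = reduce(
            xor, (data[k][(i + k + 2) % p] for k in range(p - 2)), 0)
        # Anti-diagonal parity chain for column i.
        stripe[p - 1][i] = reduce(
            xor, (data[k][(i - k - 2) % p] for k in range(p - 2)), 0)
    return stripe


def recover_column(stripe, failed, p):
    """Rebuild column `failed` from the surviving columns.

    Data symbols are rebuilt from their diagonal chain only (the assumed
    conventional approach). Returns the rebuilt column and the set of
    distinct (row, column) symbols that were read.
    """
    reads = set()

    def read(r, c):
        assert c != failed, "must not read from the failed disk"
        reads.add((r, c))
        return stripe[r][c]

    col = [0] * p
    for k in range(p - 2):
        # Column whose diagonal parity covers the lost symbol (k, failed).
        i = (failed - k - 2) % p
        value = read(p - 2, i)
        for k2 in range(p - 2):
            if k2 != k:
                value ^= read(k2, (i + k2 + 2) % p)
        col[k] = value
    # The two lost parity symbols are recomputed from surviving data.
    col[p - 2] = reduce(
        xor, (read(k, (failed + k + 2) % p) for k in range(p - 2)), 0)
    col[p - 1] = reduce(
        xor, (read(k, (failed - k - 2) % p) for k in range(p - 2)), 0)
    return col, reads


if __name__ == "__main__":
    p = 7
    data = [[random.randrange(256) for _ in range(p)] for _ in range(p - 2)]
    stripe = encode(data, p)
    rebuilt, reads = recover_column(stripe, failed=3, p=p)
    print("recovered correctly:", rebuilt == [stripe[r][3] for r in range(p)])
    print("distinct symbols read:", len(reads))
```

Counting distinct symbols in `reads` is the quantity a read-minimizing scheme such as MDRR would aim to reduce, for instance by mixing diagonal and anti-diagonal chains so that more reads are shared; how the paper selects those chains is not reproduced here.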