Single Disk Failure Recovery for X-Code-Based Parallel Storage Systems

Title: Single Disk Failure Recovery for X-Code-Based Parallel Storage Systems
Publication Type: Journal Article
Year of Publication: 2014
Authors: Silei Xu, Runhui Li, P. P. C. Lee, Yunfeng Zhu, Liping Xiang, Yinlong Xu, J. C. S. Lui
Journal: IEEE Transactions on Computers
Volume: 63
Pagination: 995-1007
Date Published: April
ISSN: 0018-9340
Keywords: Arrays, cloud storage, coding theory, Complexity theory, data availability, data centers, data communication, disc storage, double-fault tolerant coding scheme, encoding, Load management, logical encoding scheme, MDRR, minimum-disk-read-recovery, networked storage system prototype, optimal single-disk failure recovery, optimal update complexity, parallel memories, Parallel storage systems, Peer to peer computing, recovery algorithm, Redundancy, redundancy coding schemes, reliability, single disk failure recovery algorithm, storage management, System recovery, X-code-based optimal recovery scheme, X-code-based parallel storage systems
Abstract

In modern parallel storage systems (e.g., cloud storage and data centers), it is important to provide data availability guarantees against disk (or storage node) failures via redundancy coding schemes. One such coding scheme is X-code, which is double-fault tolerant while achieving the optimal update complexity. When a disk/node fails, recovery must be carried out to reduce the possibility of data unavailability. We propose an X-code-based optimal recovery scheme called minimum-disk-read-recovery (MDRR), which minimizes the number of disk reads for single-disk failure recovery. We make several contributions. First, we show that MDRR provides optimal single-disk failure recovery and reduces disk reads by about 25 percent compared with the conventional recovery approach. Second, we prove that, in general cases, no optimal recovery scheme for X-code can balance disk reads among different disks within a single stripe. Third, we propose an efficient logical encoding scheme that balances disk reads across a group of stripes for any recovery algorithm (including the MDRR scheme). Finally, we implement our proposed recovery schemes and conduct extensive testbed experiments in a networked storage system prototype. Experiments indicate that MDRR reduces recovery time by around 20 percent relative to the conventional approach, showing that our theoretical findings are applicable in practice.
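The idea of counting disk reads during single-disk recovery can be illustrated with a small sketch. It is not the paper's MDRR algorithm or prototype. It assumes the commonly cited X-code layout: a p x p stripe with p prime, p-2 data rows, one diagonal parity row, and one anti-diagonal parity row. The function names (`encode`, `chain`, `chain_through`, `recover`, `min_reads`) are hypothetical. A "conventional" recovery is modeled here as rebuilding every lost data symbol from its diagonal parity chain, and it is compared with a plain exhaustive search over per-symbol parity choices, where each surviving symbol is counted once even if several chains use it.

```python
from itertools import product


def encode(data, p):
    """Build a p x p stripe from (p-2) x p data symbols (ints).

    Assumed layout: row p-2 holds diagonal parity, row p-1 anti-diagonal parity.
    """
    stripe = [row[:] for row in data] + [[0] * p, [0] * p]
    for i in range(p):
        for k in range(p - 2):
            stripe[p - 2][i] ^= data[k][(i + k + 2) % p]  # diagonal parity
            stripe[p - 1][i] ^= data[k][(i - k - 2) % p]  # anti-diagonal parity
    return stripe


def chain(p, kind, i):
    """Cells (row, col) of the parity chain whose parity symbol is in column i."""
    sign = 1 if kind == "diag" else -1
    parity_row = p - 2 if kind == "diag" else p - 1
    cells = [(k, (i + sign * (k + 2)) % p) for k in range(p - 2)]
    return cells + [(parity_row, i)]


def chain_through(p, kind, row, col):
    """Parity chain of the given kind that contains data cell (row, col)."""
    sign = 1 if kind == "diag" else -1
    return chain(p, kind, (col - sign * (row + 2)) % p)


def recover(stripe, p, failed, kinds):
    """Rebuild column `failed`; kinds[k] selects the chain used for data row k.

    Returns the rebuilt column and the set of surviving cells that were read.
    """
    plan = [(k, chain_through(p, kinds[k], k, failed)) for k in range(p - 2)]
    plan.append((p - 2, chain(p, "diag", failed)))  # lost diagonal parity
    plan.append((p - 1, chain(p, "anti", failed)))  # lost anti-diagonal parity
    column, reads = [0] * p, set()
    for row, cells in plan:
        for r, c in cells:
            if (r, c) != (row, failed):  # skip the lost symbol itself
                column[row] ^= stripe[r][c]
                reads.add((r, c))  # a symbol shared by chains counts once
    return column, reads


def min_reads(stripe, p, failed):
    """Exhaustive search (not MDRR) for the chain choice with fewest reads."""
    best = None
    for kinds in product(("diag", "anti"), repeat=p - 2):
        _, reads = recover(stripe, p, failed, kinds)
        if best is None or len(reads) < len(best[1]):
            best = (kinds, reads)
    return best


if __name__ == "__main__":
    p, failed = 7, 0  # illustrative values; p is assumed prime
    data = [[(31 * r + 17 * c + 5) % 256 for c in range(p)] for r in range(p - 2)]
    stripe = encode(data, p)
    column, conventional = recover(stripe, p, failed, ["diag"] * (p - 2))
    assert column == [stripe[r][failed] for r in range(p)]
    kinds, fewest = min_reads(stripe, p, failed)
    print("diagonal-only reads:", len(conventional))
    print("fewest reads found: ", len(fewest), "using", kinds)
```

Running the script prints the number of distinct symbols read by the diagonal-only plan and by the best plan the search finds for one failed column. The search is exponential in p, so it is only practical for small stripes; it is meant to show why mixing diagonal and anti-diagonal chains can reduce reads, not to reproduce the paper's figures.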

DOI: 10.1109/TC.2013.8
Citation Key: 6409832