Single Disk Failure Recovery for X-Code-Based Parallel Storage Systems

Submitted by BrandonB on Wed, 05/06/2015 - 2:52pm

Title	Single Disk Failure Recovery for X-Code-Based Parallel Storage Systems
Publication Type	Journal Article
Year of Publication	2014
Authors	Silei Xu, Runhui Li, P. P. C. Lee, Yunfeng Zhu, Liping Xiang, Yinlong Xu, J. C. S. Lui
Journal	IEEE Transactions on Computers
Volume	63
Pagination	995-1007
Date Published	April
ISSN	0018-9340
Keywords	Arrays, cloud storage, coding theory, Complexity theory, data availability, data centers, data communication, disc storage, double-fault tolerant coding scheme, encoding, Load management, logical encoding scheme, MDRR, minimum-disk-read-recovery, networked storage system prototype, optimal single-disk failure recovery, optimal update complexity, parallel memories, Parallel storage systems, Peer to peer computing, recovery algorithm, Redundancy, redundancy coding schemes, reliability, single disk failure recovery algorithm, storage management, System recovery, X-code-based optimal recovery scheme, X-code-based parallel storage systems
Abstract	In modern parallel storage systems (e.g., cloud storage and data centers), it is important to provide data availability guarantees against disk (or storage node) failures via redundancy coding schemes. One such coding scheme is X-code, which is double-fault tolerant and achieves the optimal update complexity. When a disk or node fails, recovery must be carried out promptly to reduce the possibility of data unavailability. We propose an X-code-based optimal recovery scheme called minimum-disk-read-recovery (MDRR), which minimizes the number of disk reads for single-disk failure recovery. We make several contributions. First, we show that MDRR provides optimal single-disk failure recovery and reduces disk reads by about 25 percent compared with the conventional recovery approach. Second, we prove that, in general, no optimal recovery scheme for X-code can balance disk reads among different disks within a single stripe. Third, we propose an efficient logical encoding scheme that issues balanced disk reads across a group of stripes for any recovery algorithm (including the MDRR scheme). Finally, we implement our proposed recovery schemes and conduct extensive testbed experiments in a networked storage system prototype. Experiments indicate that MDRR reduces recovery time by around 20 percent relative to the conventional approach, showing that our theoretical findings are applicable in practice.
DOI	10.1109/TC.2013.8
Citation Key	6409832
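
Illustration (not part of the published record): the abstract refers to X-code stripes and to counting disk reads during single-disk recovery. The sketch below is a minimal example under stated assumptions: it uses the commonly cited X-code layout (an n x n stripe with n prime, n-2 data rows, one diagonal parity row and one anti-diagonal parity row), and it rebuilds a lost column with a naive chain choice. It is not MDRR and not necessarily the "conventional approach" measured in the paper; the function names xcode_encode and recover_column are hypothetical.

```python
from functools import reduce
from operator import xor


def xcode_encode(data, n):
    """Build an n x n X-code stripe from (n-2) x n data symbols (ints).

    Assumed layout: row n-2 holds diagonal parities and row n-1 holds
    anti-diagonal parities; n is assumed to be prime.
    """
    diag = [reduce(xor, (data[k][(i + k + 2) % n] for k in range(n - 2)), 0)
            for i in range(n)]
    anti = [reduce(xor, (data[k][(i - k - 2) % n] for k in range(n - 2)), 0)
            for i in range(n)]
    return [list(row) for row in data] + [diag, anti]


def recover_column(stripe, n, lost):
    """Rebuild column `lost` from the surviving columns.

    Naive strategy (an assumption for illustration): every lost data
    symbol is rebuilt from its diagonal parity chain, and both lost
    parity symbols are recomputed from their own chains.
    Returns the rebuilt column and the set of distinct cells read.
    """
    reads = set()

    def read(r, c):
        assert c != lost, "must not read from the failed disk"
        reads.add((r, c))
        return stripe[r][c]

    col = [0] * n
    for k in range(n - 2):
        i = (lost - k - 2) % n      # column of the diagonal parity covering (k, lost)
        val = read(n - 2, i)
        for k2 in range(n - 2):
            if k2 != k:
                val ^= read(k2, (i + k2 + 2) % n)
        col[k] = val
    # The lost parity symbols depend only on data in surviving columns.
    col[n - 2] = reduce(xor, (read(k, (lost + k + 2) % n) for k in range(n - 2)), 0)
    col[n - 1] = reduce(xor, (read(k, (lost - k - 2) % n) for k in range(n - 2)), 0)
    return col, reads
```

The size of the returned read set is the quantity a minimum-read scheme would try to reduce, for example by mixing diagonal and anti-diagonal chains so that more reads are shared between chains.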
