On Structuring Holistic Fault Tolerance

Title: On Structuring Holistic Fault Tolerance
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Gensh, Rem; Romanovsky, Alexander; Yakovlev, Alex
Conference Name: Proceedings of the 15th International Conference on Modularity
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-3995-7
Keywords: Energy efficiency, error recovery, Many-core systems, Performance, pubcrawl, Resiliency, system layering, System recovery, system structuring
Abstract:

Computer systems are developed with the expectation that they will be easy to maintain in the future; this is one of the main requirements of sound architectural design. Existing approaches to introducing fault tolerance rely on recursive system structuring out of functional components, which typically results in non-optimal fault tolerance. The paper proposes a vision of structuring complex many-core systems by introducing a special component that supports system-wide fault tolerance coordination. The component acts as a central module that decides which fault tolerance strategies individual system components should implement, depending on the performance and energy requirements specified as system operating modes.

URL: http://doi.acm.org/10.1145/2889443.2889458
DOI: 10.1145/2889443.2889458
Citation Key: gensh_structuring_2016