On Structuring Holistic Fault Tolerance
Title | On Structuring Holistic Fault Tolerance |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Gensh, Rem; Romanovsky, Alexander; Yakovlev, Alex |
Conference Name | Proceedings of the 15th International Conference on Modularity |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-3995-7 |
Keywords | Energy efficiency, error recovery, Many-core systems, Performance, pubcrawl, Resiliency, system layering, System recovery, system structuring |
Abstract | Computer systems are developed with future maintainability in mind, which is one of the main requirements of sound architectural design. Existing approaches to introducing fault tolerance rely on recursively structuring a system out of functional components, which typically results in non-optimal fault tolerance. The paper proposes a vision of structuring complex many-core systems by introducing a special component that supports system-wide fault tolerance coordination. The component acts as a central module that decides which fault tolerance strategies individual system components should implement, depending on the performance and energy requirements specified as system operating modes. |
URL | http://doi.acm.org/10.1145/2889443.2889458 |
DOI | 10.1145/2889443.2889458 |
Citation Key | gensh_structuring_2016 |
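
Illustrative sketch (not from the paper): the abstract describes a central component that selects fault tolerance strategies for individual components according to the system operating mode. The Python below is one minimal way such coordination might look. Every name in it (Mode, Strategy, POLICY, Component, Coordinator) is hypothetical, and the mapping from modes to strategies is an assumption made only for illustration; the paper may structure this quite differently.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical system operating modes (assumed, not from the paper)."""
    HIGH_PERFORMANCE = "high_performance"
    LOW_ENERGY = "low_energy"


class Strategy(Enum):
    """Hypothetical fault tolerance strategies a component could apply."""
    RETRY = "retry"
    REPLICATE = "replicate"


# Assumed policy table: which strategy to use in which mode.
# The pairing is illustrative only.
POLICY = {
    Mode.HIGH_PERFORMANCE: Strategy.REPLICATE,
    Mode.LOW_ENERGY: Strategy.RETRY,
}


class Component:
    """A functional component that follows the coordinator's decision."""

    def __init__(self, name):
        self.name = name
        self.strategy = None  # no strategy until the coordinator assigns one

    def apply(self, strategy):
        self.strategy = strategy


class Coordinator:
    """Central module that decides strategies for all registered components."""

    def __init__(self, policy):
        self.policy = policy
        self.components = []

    def register(self, component):
        self.components.append(component)

    def set_mode(self, mode):
        # Look up the strategy for the new mode and push it to every component.
        strategy = self.policy[mode]
        for component in self.components:
            component.apply(strategy)
        return strategy


coordinator = Coordinator(POLICY)
cores = [Component("core0"), Component("core1")]
for core in cores:
    coordinator.register(core)
chosen = coordinator.set_mode(Mode.LOW_ENERGY)
```

In this sketch the decision is made in one place, so changing the operating mode updates every registered component consistently, rather than each component choosing its own strategy in isolation.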