Fractal++: Closing the performance gap between fractal and conventional coherence
Title | Fractal++: Closing the performance gap between fractal and conventional coherence |
Publication Type | Conference Paper |
Year of Publication | 2014 |
Authors | Voskuilen, G., Vijaykumar, T.N. |
Conference Name | 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA) |
Date Published | June |
Keywords | 32-core simulations, cache coherence protocol bugs, cache storage, Coherence, coherence verification approaches, contention-hints, decoupled-replies, directory protocols, Erbium, formal verification, four-socket system, fractal coherence, fractal protocols, Fractal++, Fractals, fully-parallel-fractal-invalidations, indirect-communication, longer-latency multisocket system, Multicore processing, multicores, observational equivalence, Optimization, parallel invalidations, parallel processing, partially-serial-invalidations, performance gap, performance optimizations, performance scalability, protocol optimizations, Protocols, reply-forwarding, Scalability, single-socket system, state explosion, verification scalability, verification-constrained architectures |
Abstract | Cache coherence protocol bugs can cause multicores to fail. Existing coherence verification approaches either incur state explosion at small scales or require considerable human effort. As protocol complexity and multicore core counts increase, verification remains a challenge. Researchers recently proposed fractal coherence, which achieves scalable verification by enforcing observational equivalence between sub-systems in the coherence protocol: if a smaller sub-system has been verified, a larger sub-system is verified implicitly. Unfortunately, fractal protocols suffer from two fundamental limitations: (1) indirect-communication, in which sub-systems cannot communicate directly, and (2) partially-serial-invalidations, in which cores must be invalidated in a specific, serial order. These limitations disallow two common performance optimizations used by conventional directory protocols: reply-forwarding, in which caches communicate directly, and parallel invalidations. Consequently, fractal protocols lack performance scalability, while directory protocols lack verification scalability. To enable both performance and verification scalability, we propose Fractal++, which employs a new class of protocol optimizations for verification-constrained architectures: decoupled-replies, contention-hints, and fully-parallel-fractal-invalidations. The first two optimizations provide reply-forwarding-like performance, while the third enables parallel invalidations in fractal protocols. Unlike conventional protocols, Fractal++ preserves observational equivalence and is hence scalably verifiable. In 32-core simulations of single- and four-socket systems, Fractal++ performs nearly as well as a directory protocol while providing scalable verifiability, whereas the best-performing previous fractal protocol performs 8% worse on average (up to 26% worse) on a single-socket system and 12% worse on average (up to 34% worse) on a longer-latency multi-socket system. |
DOI | 10.1109/ISCA.2014.6853211 |
Citation Key | 6853211 |
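To make the two limitations named in the abstract more concrete, the following is a minimal toy latency model. It is an illustrative assumption, not the paper's evaluation methodology, and it does not model how decoupled-replies, contention-hints, or fully-parallel-fractal-invalidations actually work. It contrasts the two extremes: invalidating sharers one after another (total cost is the sum of the round trips) versus all at once (the requester waits only for the slowest acknowledgement), and an indirect reply routed back through the directory versus a forwarded reply sent straight from the owner to the requester. All function names and link-latency parameters are hypothetical, and links are assumed to be symmetric.

```python
from typing import Sequence


def serial_invalidation_latency(round_trips: Sequence[int]) -> int:
    """Hypothetical model: each sharer must be invalidated and acknowledge
    before the next one starts, so the round trips add up."""
    return sum(round_trips)


def parallel_invalidation_latency(round_trips: Sequence[int]) -> int:
    """Hypothetical model: all invalidations are in flight at once, so the
    requester waits only for the slowest acknowledgement."""
    return max(round_trips, default=0)


def indirect_reply_latency(req_dir: int, dir_owner: int, owner_req: int) -> int:
    """Indirect communication: requester -> directory -> owner -> directory
    -> requester. owner_req is unused because the data never takes that
    link; symmetric links are assumed for the return path."""
    return req_dir + dir_owner + dir_owner + req_dir


def forwarded_reply_latency(req_dir: int, dir_owner: int, owner_req: int) -> int:
    """Reply forwarding: requester -> directory -> owner -> requester,
    so the owner's cache replies to the requester directly."""
    return req_dir + dir_owner + owner_req


if __name__ == "__main__":
    sharers = [20, 20, 20, 20]  # hypothetical per-sharer round-trip latencies
    print("serial invalidations:  ", serial_invalidation_latency(sharers))
    print("parallel invalidations:", parallel_invalidation_latency(sharers))
    print("indirect reply:        ", indirect_reply_latency(10, 10, 10))
    print("forwarded reply:       ", forwarded_reply_latency(10, 10, 10))
```

Under these assumptions, serial invalidation cost grows linearly with the number of sharers while parallel invalidation cost stays at roughly one round trip, and forwarding removes one network traversal from each miss to a remotely owned block. Because fractal protocols are only partially serial, their invalidation latency would fall between the two extremes in this sketch.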