Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation

Title: Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation
Publication Type: Conference Paper
Year of Publication: 2022
Authors: Moses, William S.; Narayanan, Sri Hari Krishna; Paehler, Ludger; Churavy, Valentin; Schanen, Michel; Hückelheim, Jan; Doerfert, Johannes; Hovland, Paul
Conference Name: SC22: International Conference for High Performance Computing, Networking, Storage and Analysis
Date Published: November
Keywords: automatic differentiation, C++, C++ languages, codes, compiler, compiler security, compositionality, distributed, Enzyme, Enzymes, hybrid parallelization, Julia, LLVM, Metrics, MPI, OpenMP, parallel, parallel programming, Program processors, pubcrawl, Raja, Resiliency, Runtime, Scalability, Tasks
Abstract: Derivatives are key to numerous science, engineering, and machine learning applications. While existing tools generate derivatives of programs in a single language, modern parallel applications combine a set of frameworks and languages to leverage available performance and function in an evolving hardware landscape. We propose a scheme for differentiating arbitrary DAG-based parallelism that preserves scalability and efficiency, implemented into the LLVM-based Enzyme automatic differentiation framework. By integrating with a full-fledged compiler backend, Enzyme can differentiate numerous parallel frameworks and directly control code generation. Combined with its ability to differentiate any LLVM-based language, this flexibility permits Enzyme to leverage the compiler tool chain for parallel and differentiation-specific optimizations. We differentiate nine distinct versions of the LULESH and miniBUDE applications, written in different programming languages (C++, Julia) and parallel frameworks (OpenMP, MPI, RAJA, Julia tasks, MPI.jl), demonstrating similar scalability to the original program. On benchmarks with 64 threads or nodes, we find a differentiation overhead of 3.4-6.8x on C++ and 5.4-12.5x on Julia.
DOI: 10.1109/SC41404.2022.00065
Citation Key: moses_scalable_2022