MIT taps LLVM’s parallel processing power for faster code
6 February 2017
If you want to go fast in a multicore, multiprocessor world, you must go parallel. Splitting workloads across CPUs and cores is looking like the last, best hope for speed boosts as the limits of Moore’s Law loom.
Small wonder there is growing emphasis on giving developers software tools that exploit parallelism. This work spans everything from language design to the compiler toolchain.
Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory have developed techniques to automatically add parallel computing optimisations to existing code whenever possible.
These speed-ups come from modifications to LLVM, a popular and well-understood open source compiler framework used by everyone from Apple to Microsoft.
Going faster, side by side
MIT’s findings are described in a paper, “Tapir: Embedding Fork-Join Parallelism into LLVM’s Intermediate Representation,” to be presented next week at the Association for Computing Machinery’s Symposium on Principles and Practice of Parallel Programming.
LLVM uses an intermediate representation (IR) as part of its compilation process. The language to be compiled is first translated into the IR, then the IR is itself turned into machine code, give or take a few steps. Working on the IR, rather than on source code or machine code directly, lets LLVM reason more effectively about the code and perform optimisations that would otherwise be hard to implement. (Rust’s compiler also uses an IR for similar reasons.)
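To make the source-to-IR-to-machine-code flow concrete, here is a minimal sketch of a toy compiler pipeline. It is not LLVM’s IR or API: the tuple-based expression format, the function names (lower, fold_constants, emit_asm) and the pseudo-assembly are all hypothetical, chosen only to show why an optimisation is easier to write against a simple intermediate form.

```python
import operator

# Toy pipeline, NOT LLVM: every name and format here is a hypothetical illustration.
OPS = {"+": operator.add, "*": operator.mul}


def lower(expr):
    """Front end: turn a nested (op, lhs, rhs) tuple into three-address IR."""
    code = []

    def emit(node):
        if not isinstance(node, tuple):
            return node  # an int constant or a variable name (str)
        op, lhs, rhs = node
        a, b = emit(lhs), emit(rhs)
        dest = f"t{len(code)}"  # fresh temporary for this result
        code.append((dest, op, a, b))
        return dest

    result = emit(expr)
    return code, result


def fold_constants(code, result):
    """Optimisation pass on the IR: evaluate operations whose inputs are known."""
    known = {}  # temporaries that turned out to be compile-time constants
    out = []
    for dest, op, a, b in code:
        a = known.get(a, a)
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)  # computed now, no instruction emitted
        else:
            out.append((dest, op, a, b))
    return out, known.get(result, result)


def emit_asm(code, result):
    """Back end: print the IR as made-up assembly-like text."""
    mnemonic = {"+": "add", "*": "mul"}
    lines = [f"{mnemonic[op]} {dest}, {a}, {b}" for dest, op, a, b in code]
    lines.append(f"ret {result}")
    return lines


if __name__ == "__main__":
    ir, res = lower(("+", "x", ("*", 2, 3)))  # x + 2 * 3
    for line in emit_asm(*fold_constants(ir, res)):
        print(line)
```

The constant-folding pass only ever sees flat instructions, so it never has to understand the source language’s syntax. Any front end that produces this IR would get the optimisation for free, which is the general appeal of an IR-based design.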
Tapir, the MIT team’s name for its custom IR changes to LLVM, “allows the compiler to optimise across parallel control constructs with only minor changes to its existing analyses and code transformations,” according to the paper. In other words, because parallel constructs are represented in the IR itself, the optimisation passes LLVM already has can keep working on parallel code largely unmodified. The changes involved only “about 6,000 lines of LLVM’s 4-million-line codebase,” the paper says.
Tapir adds a small set of IR instructions that represent fork-join parallelism (FJP), a programming model many existing compilers already support. With FJP, “subroutines can be spawned in parallel and iterations of a parallel loop can execute concurrently on modern multicore machines.” However, these optimisations only apply to code that explicitly requests parallelism; ordinary, or “serial,” code isn’t optimised this way.
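For readers unfamiliar with the fork-join model, the sketch below shows its two usual shapes, a spawned subroutine and a parallel loop, using Python’s standard library. It illustrates the programming pattern only, not Tapir or LLVM. The function names and the cutoff value are hypothetical, and Python threads show the structure rather than any real speed-up on CPU-bound work. The Tapir paper names its three IR instructions detach, reattach and sync; the comments map them loosely onto the pattern.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fib_serial(n):
    """Plain serial Fibonacci, used below the spawn cutoff."""
    return n if n < 2 else fib_serial(n - 1) + fib_serial(n - 2)


def fib_fork_join(n, cutoff=15):
    """Fork one subproblem, run the other in this thread, then join."""
    if n < cutoff:
        return fib_serial(n)  # too small to be worth spawning a thread
    result = {}

    def child():
        result["left"] = fib_fork_join(n - 1, cutoff)

    worker = threading.Thread(target=child)
    worker.start()                        # fork (loosely: detach)
    right = fib_fork_join(n - 2, cutoff)  # continuation runs alongside the child
    worker.join()                         # join (loosely: sync)
    return result["left"] + right


def parallel_map(func, items, workers=4):
    """Parallel loop: iterations may run concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))  # leaving the block waits for all tasks


if __name__ == "__main__":
    print(fib_fork_join(20))
    print(parallel_map(lambda x: x * x, range(5)))
```

In both shapes the programmer has to mark where work may split and where it must rejoin. That is the explicit request for parallelism that Tapir’s optimisations depend on.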
Machines for heavy lifting
How optimisations like this pay off in the real world is always an open question. MIT’s team ran a series of benchmarks designed to test parallel performance and found that Tapir’s optimisations were generally either as good as or better than manual optimisations to the source code. (One benchmark fared worse with Tapir than with the non-Tapir optimisations, but only by about 2%.)
MIT has worked on parallelism before. Last year, it announced Milk, a set of extensions to C/C++ that alleviate memory bottlenecks in big data applications. That project extended OpenMP, an existing system for adding parallelism to applications, which the Tapir paper also cites as a standard way for developers to express parallelism.
Developers will probably always have to do a certain amount of manual work to take advantage of parallelism, such as splitting a given program into two or more high-level components that run as discrete processes. Still, experiments like Tapir show there’s room for many more kinds of automatic optimisations.
IDG News Service