Scientific progress often depends on ever faster computers capable of processing ever-increasing amounts of data. The growth in memory capacity and density of current computer systems, however, is in peril because Dynamic Random Access Memory (DRAM), the currently dominant main memory technology, faces serious roadblocks in scaling. Non-volatile memory, or persistent memory, is an emerging alternative technology that offers high integration density, speed and byte addressability similar to those of current main memory, and lower standby power. Hence, persistent memory is expected to increasingly augment or replace DRAM as main memory, and the same change is expected in computing systems based on Graphics Processing Units (GPUs), which are the dominant accelerators for high performance computing. However, fully realizing this potential requires research on persistency models for GPUs. This project investigates integrated software and hardware techniques that enable GPUs to make efficient use of non-volatile memory. Successful outcomes of this project will lead to faster access to data by reducing the overheads of file access. The software produced (persistent GPU benchmarks, compiler, and tuner) and the prototyping platform will be made available to other researchers. Education and outreach activities in this project seek to train the next generation of programmers in this discipline.
The research in this project answers the question: what architectural support does a GPU system need to achieve efficient persistency programming with persistent memory (PM) as its device memory? The research contributions include: (1) an open-source GPU PM benchmark suite that is representative of various application domains; (2) an exploration of persistency models for GPUs and of Instruction Set Architecture support; (3) optimizations of the persistency models that remove the need for logging; and (4) a compiler pass and performance tuner that automatically determine the best-performing memory persistency and recovery model and transform the code accordingly.