Task-Based Parallel Programming in HPC and beyond
By Jesus Labarta, BSC, Barcelona, Spain

Abstract:
The talk will present a vision of how multicore architectures are impacting parallel programming practices in the high-performance computing context.
Using the OmpSs programming model developed at BSC as a guiding thread, we will show how we believe the challenges posed by new architectures should be addressed.
Overall, the key idea of our vision is to provide a clean interface that decouples the programmer from the architecture itself and lets her focus on the algorithmic issues, data accesses and dependences. We advocate for intelligent runtimes that take responsibility for mapping the computations expressed by the programmer onto the available resources in a potentially very dynamic environment.
We aim for OmpSs to be a forerunner for OpenMP, proposing and experimenting with features that we believe should be included in such a model, targeting not only HPC but all kinds of general-purpose computing. Example features we will briefly describe include the asynchronous data-flow execution of tasks, support for heterogeneous devices such as GPUs or systems with big and little cores, the benefits of automatic locality-aware scheduling policies, and hybrid MPI+OmpSs programming.
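As a rough illustration of the asynchronous data-flow execution of tasks mentioned above, the following C sketch (not taken from the talk) chains three tasks through in/out/inout dependence clauses in the OmpSs style; the helper functions and sizes are invented for the example, and the exact clause syntax should be checked against the OmpSs/OpenMP specifications.

/* Minimal sketch of OmpSs-style asynchronous data-flow tasking: the
 * in()/out()/inout() clauses declare the data each task reads and writes,
 * and the runtime derives the dependence graph and maps ready tasks onto
 * the available resources. Without an OmpSs/OpenMP compiler the pragmas
 * are ignored and the program simply runs sequentially.                 */
#include <stdio.h>

#define N 1024

static void fill(double *v, int n)            { for (int i = 0; i < n; i++) v[i] = i; }
static void scale(double *v, int n, double a) { for (int i = 0; i < n; i++) v[i] *= a; }
static double reduce(const double *v, int n)  { double s = 0.0; for (int i = 0; i < n; i++) s += v[i]; return s; }

int main(void)
{
    static double x[N];
    double sum = 0.0;

    #pragma omp task out(x)             /* producer task: no predecessors      */
    fill(x, N);

    #pragma omp task inout(x)           /* waits for fill() through x          */
    scale(x, N, 2.0);

    #pragma omp task in(x) inout(sum)   /* waits for scale() through x         */
    sum = reduce(x, N);

    #pragma omp taskwait                /* synchronize before using the result */
    printf("sum = %f\n", sum);
    return 0;
}

The same dependence information is what an intelligent runtime can exploit for locality-aware scheduling, offloading tasks to heterogeneous devices, or hybrid MPI+OmpSs execution, as mentioned above.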

 

Managing Application Resilience: A Hybrid Programming Language and Run-Time Approach

By Pedro C. Diniz, USC Information Sciences Institute (USC/ISI), Marina del Rey, CA, USA

Abstract:

System resilience is an important challenge that needs to be addressed in the era of extreme-scale computing. High-performance computing systems will be architected using millions of processor cores and memory modules. As process technology scales, the reliability of such systems will be challenged by the inherent unreliability of individual components due to extremely small transistor geometries, variability in silicon manufacturing processes, device aging, etc. Therefore, errors and failures in extreme-scale systems will increasingly be the norm rather than the exception. Not all detected errors warrant a catastrophic system failure, but there are presently no mechanisms for the programmer to communicate their knowledge of algorithmic fault tolerance to the system.

In this talk we present a programming model approach to system resilience that allows programmers to explicitly express their fault tolerance knowledge. We propose novel resilience-oriented programming model extensions and programming directives, and illustrate their effectiveness. An inference engine leverages this information and combines it with context gathered at runtime to increase the dependability of HPC systems. The preliminary experimental results presented here, for a limited set of kernel codes from both scientific and graph-based computing domains, reveal that, with very modest programming effort, the described approach incurs fairly low execution-time overhead while allowing computations to survive a large number of faults that would otherwise always result in the termination of the computation.
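To make the flavour of such directives concrete, here is a purely hypothetical C sketch of how a programmer might annotate a fault-tolerant region; the pragma name and clauses (resilience, tolerant, check, recover) are invented for illustration and are not the actual extensions proposed in this work, which the abstract does not spell out.

/* Hypothetical resilience annotation on a Jacobi-style relaxation sweep.
 * The idea, following the abstract: the programmer states which data can
 * absorb errors, an acceptability test, and a recovery action, and the
 * runtime's inference engine uses this instead of terminating the run.
 * The pragma is NOT real; unknown pragmas are ignored by C compilers.   */
#include <math.h>
#include <stdlib.h>

#define N 1000000

/* One relaxation sweep; small numerical perturbations in u are damped by
 * later iterations, so isolated faults here need not be fatal.          */
static void sweep(double *u, const double *f, int n)
{
    for (int i = 1; i < n - 1; i++)
        u[i] = 0.5 * (u[i - 1] + u[i + 1] - f[i]);
}

int main(void)
{
    double *u = calloc(N, sizeof *u);
    double *f = calloc(N, sizeof *f);

    for (int iter = 0; iter < 100; iter++) {
        /* Hypothetical directive: errors in u are acceptable while the
         * values stay finite; on violation, retry the sweep rather than
         * aborting the whole computation.                               */
        #pragma resilience tolerant(u[0:N]) check(isfinite(u[N/2])) recover(retry)
        sweep(u, f, N);
    }

    free(u);
    free(f);
    return 0;
}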

As transient faults become the norm rather than the exception, it will become increasingly important to provide users with high-level programming mechanisms with which they can convey important application acceptability criteria. For best performance (whether in terms of time, power, or energy), the underlying systems need to leverage this information to better navigate the very complex system-level trade-offs while still delivering a reliable and productive computing environment. The work presented here is a simple first step towards this vision.

Short Bio:

Pedro C. Diniz is a Research Associate in the Computational Sciences Division at the University of Southern California's Information Sciences Institute. Dr. Diniz has 20 years of experience in the areas of computer architecture, high-performance computing and compilation, program analysis and optimization. He has been a principal participant in major research programs funded by DARPA and DoE's Office of Science. He has collaborated with universities, national laboratories and industry as both prime contractor and sub-contractor. Dr. Diniz received a B.S. in Computer and Electrical Engineering and an M.S. in Electrical Engineering from the Technical University of Lisbon in 1988 and 1992, respectively, and a Ph.D. from the University of California at Santa Barbara in 1997. His current research focuses on program analysis for software resiliency and on high-performance and reconfigurable computing.

 

Energy-efficient HPC in Mont-Blanc and beyond: an ARM hardware and software perspective

By Roxana Rusitoru, ARM Ltd, Cambridge, UK

Abstract:

In this talk we give an overview of technologies that enable energy-efficient HPC, from architectural features to software libraries.

Short Bio:

Roxana Rusitoru is a Senior Research Engineer in ARM's Research division, working on Software and Large-Scale Systems. She joined ARM in 2012 after obtaining an MEng degree in Computing (Software Engineering) from Imperial College London, where she worked on optimising unstructured mesh CFD applications on multicores via machine learning and code transformation. At ARM she has worked, among other things, on Linux kernel optimizations aimed at HPC and on sensitivity studies designed to showcase ARM AArch64 microprocessor characteristics suitable for HPC. Most recently, she has been working on power-aware scheduling at the OS level for heterogeneous cores and on methodologies to identify representative sub-sections of multi-threaded applications. Her research interests include software performance optimization and next-generation heterogeneous architectures. Roxana was part of the Mont-Blanc 1 and 2 projects and now leads the software ecosystem work in Mont-Blanc 3, in addition to her technical contributions.