Task Based Parallel Programming
in HPC and beyond
By Jesus Labarta,
BSC, Barcelona, Spain
Abstract:
The talk will present a vision of how multicore architectures are impacting
parallel programming practices in the high performance context.
Using the OmpSs programming model developed at BSC as
a guiding thread, we will show how we think the challenges posed by new
architectures should be addressed.
Overall, the key ideas of our vision are to provide a
clean interface to the programmer that decouples her from the architecture
itself and lets her focus on the algorithmic issues, data accesses and
dependences. We advocate for intelligent runtimes that take responsibility for
mapping the computations expressed by the programmer to the available
resources in a potentially very dynamic environment.
We aim for OmpSs to be a forerunner for OpenMP, proposing and experimenting
with features that we believe should be included in such a model, targeting not
only HPC but also all kinds of general purpose computing. Example features we
will briefly describe include the asynchronous data flow execution of tasks,
the support for heterogeneous devices such as GPUs or systems with big and
little cores, the benefits of automatic locality-aware scheduling policies, and
hybrid MPI+OmpSs programming.
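As a rough illustration of asynchronous data flow execution of tasks, the sketch below uses the task dependence clauses of standard OpenMP rather than OmpSs's own syntax. The function name pipeline_sum and the three-stage pipeline are hypothetical and are not taken from the talk. Each task only declares what it reads and writes; the ordering of the tasks is left to the runtime.

```c
#define N 4

/* Hypothetical three-stage pipeline: produce -> scale -> sum.
   The depend clauses declare the data each task reads and writes;
   the runtime derives the execution order from them. */
int pipeline_sum(void)
{
    int a[N], b[N];
    int sum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Stage 1: writes a. */
        #pragma omp task shared(a) depend(out: a)
        for (int i = 0; i < N; i++)
            a[i] = i + 1;

        /* Stage 2: reads a, writes b; runs only after stage 1. */
        #pragma omp task shared(a, b) depend(in: a) depend(out: b)
        for (int i = 0; i < N; i++)
            b[i] = 2 * a[i];

        /* Stage 3: reads b, updates sum; runs only after stage 2. */
        #pragma omp task shared(b, sum) depend(in: b) depend(inout: sum)
        for (int i = 0; i < N; i++)
            sum += b[i];

        /* Wait for all three tasks before leaving the region. */
        #pragma omp taskwait
    }
    return sum;
}
```

With an OpenMP-enabled compiler (for example GCC with -fopenmp) the three stages become tasks ordered by their declared dependences. Without OpenMP support the pragmas are ignored and the same code runs serially with the same result.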
Managing Application Resilience: A Hybrid
Programming Language and Run-Time Approach
By Pedro C. Diniz,
USC Information Sciences Institute (USC/ISI), Marina del Rey, CA, USA
Abstract:
System resilience is an important challenge that needs
to be addressed in the era of extreme scale computing. High-performance
computing systems will be architected using millions of processor cores and
memory modules. As process technology scales, the reliability of such systems
will be challenged by the inherent unreliability of individual components due
to extremely small transistor geometries, variability in silicon manufacturing
processes, device aging, etc. Therefore, errors and failures in extreme scale
systems will increasingly be the norm rather than the exception. Not all the errors detected warrant
catastrophic system failure, but there are presently no mechanisms for the
programmer to communicate their knowledge of algorithmic fault tolerance to the
system.
In this talk we present a programming model approach
for system resilience that allows programmers to explicitly express their fault
tolerance knowledge. We propose novel resilience-oriented programming model
extensions and programming directives, and illustrate their effectiveness. An
inference engine leverages this information and combines it with
runtime-gathered context to increase the dependability of HPC systems. The
preliminary experimental results presented here, for a limited set of kernel
codes from both scientific and graph-based computing domains, reveal that with
a very modest programming effort, the described approach incurs fairly low execution
time overhead while allowing computations to survive a large number of faults
that would otherwise always result in the termination of the computation.
As transient faults become the norm rather than the
exception, it will become increasingly important to
provide the user with high-level programming mechanisms with which he/she can
convey important application acceptability criteria. For best performance
(whether in terms of time, power, or energy), the underlying systems need to
leverage this information to better navigate the very complex system-level
trade-offs and still deliver a reliable and productive computing environment.
The work presented here is a simple first step towards this vision.
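To make the idea of an application acceptability criterion concrete, here is a minimal sketch in plain C. It does not reproduce the directives or the inference engine described in the talk; every name in it (sum_of_squares, acceptable, resilient_sum_of_squares, resilience_example) is hypothetical, and the transient fault is simulated by returning NaN. The programmer supplies a predicate saying which results are acceptable, and a small wrapper re-executes the kernel instead of terminating when the predicate fails.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel under protection: sum of squares of x[0..n-1].
   If inject_fault is true, the result is replaced by NaN to
   simulate a transient fault (for illustration only). */
static double sum_of_squares(const double *x, size_t n, bool inject_fault)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * x[i];
    return inject_fault ? NAN : s;
}

/* Hypothetical acceptability criterion supplied by the programmer:
   a sum of squares must be finite and non-negative. */
static bool acceptable(double r)
{
    return isfinite(r) && r >= 0.0;
}

/* Re-executes the kernel until the result is acceptable or
   max_attempts is reached. The first faulty_attempts executions
   receive a simulated fault. Returns true and stores the result
   in *out on success, false otherwise. */
bool resilient_sum_of_squares(const double *x, size_t n,
                              int faulty_attempts, int max_attempts,
                              double *out)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        double r = sum_of_squares(x, n, attempt < faulty_attempts);
        if (acceptable(r)) {
            *out = r;
            return true;
        }
    }
    return false;
}

/* Usage: two simulated faults, up to five attempts. */
double resilience_example(void)
{
    const double x[] = {1.0, 2.0, 3.0};
    double r = -1.0;
    resilient_sum_of_squares(x, 3, 2, 5, &r);
    return r;
}
```

Here re-execution is the only recovery action. A real system could choose among several responses based on the declared criterion and on context gathered at run time.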
Short Bio:
Pedro C. Diniz is a Research Associate in the
Computational Sciences Division at the University of Southern California's
Information Sciences Institute. Dr. Diniz has 20 years of experience in the
areas of computer architecture, high-performance computing and compilation,
program analysis and optimization. He has been a principal participant in major
research programs funded by DARPA and DoE’s Office of Science. He has
collaborated with universities, national laboratories and industry as prime
contractor and sub-contractor. Dr. Diniz received a B.S. in Computer and Electrical
Engineering and an M.S. in Electrical Engineering from the Technical University of
Lisbon in 1988 and 1992, respectively, and a Ph.D. from the University of California
at Santa Barbara in 1997. His current research focuses on program analysis for software
resiliency and high-performance and reconfigurable computing.
Energy-efficient HPC in Mont-Blanc and beyond:
an ARM hardware and software perspective
By Roxana Rusitoru,
ARM Ltd, Cambridge, UK
Abstract:
In this talk we give
an overview of energy-efficiency enabling technologies, from architectural
features to software libraries.
Short Bio:
Roxana Rusitoru is a Senior Research Engineer in ARM’s Research
division, working in Software and Large Scale Systems. She joined ARM in 2012
after obtaining an MEng degree in Computing (Software
Engineering) from Imperial College London, where she worked on optimising
unstructured mesh CFD applications on multicores via machine learning and code
transformation. At ARM, among other things, she has worked on Linux kernel
optimizations aimed at HPC and sensitivity studies aimed at showcasing ARM
AArch64 microprocessor characteristics suitable for HPC. Most recently, she has
been working on power-aware scheduling at OS level for heterogeneous cores and
methodologies to identify representative sub-sections from multi-threaded
applications. Some of her research interests are software performance
optimization and next-gen heterogeneous architectures. Roxana has been a part
of the Mont-Blanc 1 and 2 projects, and is now leading the Software ecosystem
in Mont-Blanc 3, in addition to making technical contributions.