Hartung-Gorre Verlag
Owner: Dr. Renate Gorre
D-78465 Konstanz
Phone: +49 (0)7533 97227 Fax: +49 (0)7533 97228
www.hartung-gorre.de
Series in Microelectronics
edited by Qiuting Huang, Andreas Schenk, Mathieu Maurice Luisier, Bernd Witzigmann
Fabian Thomas Schuiki
Streaming Architectures for Extreme Energy Efficiency in High-Performance Computing
2021. XVI, 312 pages. € 64,00.
ISBN 978-3-86628-725-9
Abstract:
The end of Moore’s law and the breakdown of Dennard
scaling have prompted a paradigm shift in the way we
approach computer architecture design. Performance at low power has become the
key ingredient in achieving high utilization of available hardware in order to
mitigate the effect of limited frequency and overcome dark silicon. The von
Neumann bottleneck is one of the key challenges in this field: instruction
fetches compete with data accesses for memory bandwidth. This bottleneck also
applies to the instruction pipeline of a processor, where load-store and control
instructions compete with compute instructions for
issue slots. A popular way to overcome this bottleneck is to implement
dedicated accelerators for a specific problem. This approach has grown ever
more popular with the recent rise of machine learning. It is based on the
observation that, all other things being equal, specialization in hardware
always wins. However, the complementary conclusion also holds: the lack of
general programmability limits the accelerator’s use to a specific problem. In
a time of fast-moving algorithms, today’s hardware accelerator cannot compute
tomorrow’s algorithm. General purpose processors have evolved to mitigate the
von Neumann bottleneck as well. One example of this is the CISC-to-RISC
translation in modern processors, which can act as an instruction compression
scheme. Similarly, SIMD and SIMT paradigms offer a fixed increase in
computations per instruction, while Cray-style vectorization offers a more
dynamic and potentially higher increase. Among the algorithms that lend
themselves particularly well to such acceleration is the class of
data-oblivious algorithms. These algorithms have control flow which does not
depend on the data being processed, and comprise many relevant algorithms from
linear algebra, machine learning, and scientific computing. This thesis
develops the concept of hardware address generation and direct memory streaming
as a method to mitigate the von Neumann bottleneck, applies the concept to
in-order single-issue processors, allowing them to achieve full utilization of
compute resources, introduces pseudo-dual-issue execution with dedicated
compute hardware loops, and distills these extensions into
an architectural template for high-performance computers capable of
concentrating a significant part of their energy footprint in the arithmetic
units.
Keywords: energy-efficient, high-performance computing
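As a rough illustration of the hardware address generation and memory streaming idea named in the abstract, the sketch below models it in software. It assumes affine access patterns (a base address plus per-loop strides), which is one common form for data-oblivious kernels. The names `affine_addresses`, `stream`, and `matvec_streamed` are hypothetical and are not taken from the thesis. The point is only that, because the control flow does not depend on the data, all operand addresses can be produced outside the compute loop, leaving the loop body with arithmetic alone.

```python
from itertools import product


def affine_addresses(base, bounds, strides):
    """Yield base + sum(i_k * stride_k) for every index tuple of a loop nest.

    bounds and strides are listed from the outermost to the innermost loop.
    """
    for idx in product(*(range(b) for b in bounds)):
        yield base + sum(i * s for i, s in zip(idx, strides))


def stream(memory, base, bounds, strides):
    """Model a read stream: operands are delivered in address order."""
    return (memory[a] for a in affine_addresses(base, bounds, strides))


def matvec_streamed(memory, a_base, x_base, rows, cols):
    """Compute y = A @ x for a row-major A and a vector x in a flat memory."""
    # One stream walks A once; the other re-reads x for every row,
    # which is expressed as an outer stride of 0.
    a_stream = stream(memory, a_base, (rows, cols), (cols, 1))
    x_stream = stream(memory, x_base, (rows, cols), (0, 1))
    y = []
    for _ in range(rows):
        acc = 0
        for _ in range(cols):
            # The loop body holds only the multiply-accumulate;
            # no explicit load or address arithmetic appears here.
            acc += next(a_stream) * next(x_stream)
        y.append(acc)
    return y


# Flat memory image (assumed layout): A = [[1, 2, 3], [4, 5, 6]], then x = [1, 0, 2].
memory = [1, 2, 3, 4, 5, 6, 1, 0, 2]
result = matvec_streamed(memory, a_base=0, x_base=6, rows=2, cols=3)
print(result)
```

In this model the address sequence for `x` is 6, 7, 8, 6, 7, 8, fixed by the loop bounds and strides alone. A data-dependent kernel, such as one that branches on operand values, could not be described this way.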
To order directly from:
Hartung-Gorre Verlag / D-78465 Konstanz / Germany
Phone: +49 (0) 7533 97227 Fax: +49 (0) 7533 97228
http://www.hartung-gorre.de eMail: verlag@hartung-gorre.de