Hartung-Gorre Verlag
Owner: Dr. Renate Gorre
D-78465 Konstanz
Phone: +49 (0)7533 97227
Fax: +49 (0)7533 97228
www.hartung-gorre.de
Series in Microelectronics
edited by Qiuting Huang, Andreas Schenk, Mathieu Maurice Luisier, Bernd Witzigmann
Michael Andreas Gautschi
Design of Energy-Efficient Processing Elements for Near-Threshold Parallel Computing
2017. XIV, 178 pages. € 64,00
ISBN 978-3-86628-595-8
Short Abstract:
Internet-of-Things (IoT) applications require energy-efficient, flexible, and low-cost devices to enable more complex near-sensor processing. This thesis focuses on the design of processing elements for such systems, targeting higher energy efficiency and better performance scalability. By operating integrated circuits near the threshold voltage of the transistors and by using a parallel architecture, we build an energy-efficient processing platform consisting of multiple RISC-V processor cores. The platform offers performance ranging from a few kOp/s to GOp/s and achieves a top energy efficiency of 193 MOp/s per mW when implemented in a 28 nm FD-SOI technology node.
Further, the platform supports single-precision floating-point operations, which are realized through shared execution units in the multi-core cluster. This enables the platform to deliver not only very energy-efficient fixed-point operations but also high-dynamic-range operations, which will be key to enabling more complex near-sensor processing in IoT systems.
Long Abstract:
Over the last years, the number of Internet-of-Things (IoT) endpoint devices has grown considerably, and this trend is expected to intensify in the coming decade. Such systems are small, mostly battery-powered, and consist of various sensors, a wireless transmission unit, and a microcontroller unit (MCU). The varying application requirements of such IoT systems call for a programmable and scalable solution that offers performance ranging from a few kOp/s to GOp/s. Further, such systems need to be low-cost and energy-efficient, consuming only a few milliwatts.
Today's systems often consist of a low-power MCU with limited computing capabilities, which is mainly used for control tasks rather than for data processing. Near-sensor data processing, on the other hand, enables sensor fusion and feature extraction and can significantly reduce the number of transmitted bytes. We propose a programmable multi-core system that is scalable in performance and energy efficiency thanks to its parallel architecture and the use of near-threshold (NT) operation.
This thesis focuses on the heart of this architecture, the processing elements (PEs), which can be programmed to execute different applications in parallel or to work jointly on a single application. To reach higher performance and better energy efficiency, a RISC-V processor architecture has been designed and extended with new instructions typically found in more energy-efficient digital signal processing (DSP) engines. Lower-precision sensor data can be processed on average 2.3× faster through single-instruction multiple-data (SIMD) extensions, and the integration of the PEs into the multi-core platform is optimized with prefetch buffers that reduce cache contention and instruction fetch costs.
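The speed-up for lower-precision data comes from packing several narrow values into one 32-bit register and operating on all of them with a single instruction. The Python sketch below emulates such a lane-wise addition in software. The 4×8-bit lane layout, the wrap-around (non-saturating) behaviour, and the helper names `pack4x8`, `unpack4x8`, and `add4x8` are assumptions for illustration; they do not reproduce the instruction set of the processor described in the thesis.

```python
def pack4x8(values):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the lowest byte)."""
    word = 0
    for lane, v in enumerate(values):
        word |= (v & 0xFF) << (8 * lane)
    return word

def unpack4x8(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]

def add4x8(a: int, b: int) -> int:
    """Lane-wise wrap-around addition of two packed 4x8-bit words."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        # Mask each lane so a carry never spills into the neighbouring lane.
        result |= ((x + y) & 0xFF) << shift
    return result

a = pack4x8([10, 20, 250, 7])
b = pack4x8([1, 2, 10, 3])
print(unpack4x8(add4x8(a, b)))  # [11, 22, 4, 10]
```

In hardware the four lane additions complete in one instruction, so the ideal gain over one element per instruction is 4× for 8-bit lanes and 2× for 16-bit lanes; application-level averages depend on the mix of data widths and on how much of the code vectorizes.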
Further, the feasibility of supporting high-dynamic-range (HDR) arithmetic in multi-core clusters is investigated with two number systems: the logarithmic number system (LNS) and the traditional IEEE-754 floating-point format. The former has been explored because complex operations such as multiplication, division, and square roots turn into simple integer operations in the logarithmic domain and can therefore be computed very energy-efficiently. Additions and subtractions, in contrast, translate into non-linear functions, which can be interpolated in a shared unit. This LNS unit can also evaluate other complex functions, such as logarithms and trigonometric functions, allowing the system to process non-linear kernels up to 4.1× more energy-efficiently than with traditional floating-point units (FPUs).
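The LNS trade-off can be sketched in a few lines. A positive value x is stored as its base-2 logarithm A = log2(x), so multiplication becomes A + B, division A - B, and the square root A / 2, while addition requires the non-linear correction log2(2^A + 2^B) = max(A, B) + log2(1 + 2^-|A - B|). The Python below is a minimal software model, not the hardware unit of the thesis: it assumes positive operands only (no sign bit), a fixed-point log2 encoding with 16 fractional bits, and a hypothetical 65-entry lookup table with linear interpolation for the correction function; all names are illustrative.

```python
import math

FRAC_BITS = 16            # assumed fixed-point format: 16 fractional bits per log2 value
SCALE = 1 << FRAC_BITS

def to_lns(x: float) -> int:
    """Encode a positive real x as round(log2(x) * 2^FRAC_BITS); no sign bit."""
    return round(math.log2(x) * SCALE)

def from_lns(e: int) -> float:
    """Decode a fixed-point log2 value back to a real number."""
    return 2.0 ** (e / SCALE)

# Multiplication, division, and square root become plain integer operations.
def lns_mul(a: int, b: int) -> int:
    return a + b            # log2(x*y) = log2(x) + log2(y)

def lns_div(a: int, b: int) -> int:
    return a - b            # log2(x/y) = log2(x) - log2(y)

def lns_sqrt(a: int) -> int:
    return a >> 1           # log2(sqrt(x)) = log2(x) / 2; Python's >> is arithmetic

# Addition needs the non-linear function sb(d) = log2(1 + 2^-d) for d >= 0.
# Hypothetical approximation: linear interpolation in a 65-entry table over [0, 16].
STEP = 0.25
TABLE = [math.log2(1.0 + 2.0 ** -(i * STEP)) for i in range(65)]

def sb(d: float) -> float:
    pos = d / STEP
    i = int(pos)
    if i >= len(TABLE) - 1:
        return 0.0          # for d >= 16 the correction is only about one LSB; dropped here
    frac = pos - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

def lns_add(a: int, b: int) -> int:
    """log2(x + y) = max(A, B) + sb(|A - B|) for positive x, y."""
    hi, lo = max(a, b), min(a, b)
    return hi + round(sb((hi - lo) / SCALE) * SCALE)

x, y = to_lns(3.0), to_lns(5.0)
print(from_lns(lns_mul(x, y)))            # close to 15.0
print(from_lns(lns_div(x, y)))            # close to 0.6
print(from_lns(lns_sqrt(to_lns(16.0))))   # 4.0
print(from_lns(lns_add(x, y)))            # close to 8.0 (interpolation error)
```

Only the addition path needs the interpolation table; multiplication, division, and square root are single integer operations. This is why the costly part lends itself to a unit that several cores share.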
Finally, a generalized sharing framework is introduced that allows individual operators of various latencies to be shared in a cluster of multiple PEs. A fine-grained, shared FPU of 63 kGE that supports all RISC-V single-precision floating-point instructions is integrated into an octa-core cluster, making HDR arithmetic available to all cores at a small per-core cost. On a parallel seizure-detection application, it is shown that access contention can be kept below 2%, which allows the shared unit to scale in performance while minimizing the per-core area overhead.
Implemented as a four-core cluster in an advanced technology node such as 28 nm FD-SOI, the PEs achieve a top energy efficiency of 193 MOp/s per mW, significantly more than commercially available MCUs achieve, while remaining scalable at the same time. This makes the platform ready to serve more complex IoT systems, which will increasingly require HDR arithmetic.
About the Author:
Michael Gautschi was born in Zurich, Switzerland, in 1986. He received his BSc and MSc degrees from ETH Zurich, Switzerland, in 2010 and 2012, respectively. In spring 2013, he started his PhD in the Digital Circuits and Systems group of Prof. Dr. Luca Benini, focusing on energy-efficient computing architectures. His research interests include the design of very-large-scale integration (VLSI) circuits and systems, including processor design; parallel, low-power, and energy-efficient architectures; and mobile communication.
Order directly from:
Hartung-Gorre Verlag / D-78465 Konstanz / Germany
Phone: +49 (0) 7533 97227
Fax: +49 (0) 7533 97228
http://www.hartung-gorre.de
Email: verlag@hartung-gorre.de