Hartung-Gorre Verlag

Owner: Dr. Renate Gorre

D-78465 Konstanz

Phone: +49 (0)7533 97227

Fax: +49 (0)7533 97228

www.hartung-gorre.de

Series in Microelectronics

edited by Qiuting Huang

Andreas Schenk

Mathieu Maurice Luisier

Bernd Witzigmann

Vol. 231

Michael Andreas Gautschi

Design of Energy-Efficient

Processing Elements for

Near-Threshold Parallel

Computing

2017. XIV, 178 pages. € 64,00

ISBN 978-3-86628-595-8

Short Abstract:

Internet-of-things applications require energy-efficient, flexible, and low-cost devices to enable more complex near-sensor processing. This thesis focuses on the design of processing elements for such systems, targeting higher energy efficiency and better performance scalability. By operating integrated circuits near the threshold voltage of transistors, and by using a parallel architecture, we build an energy-efficient processing platform consisting of multiple RISC-V processor cores. The platform offers performance capabilities ranging from a few kOp/s to GOp/s and achieves a top energy efficiency of 193 MOp/s per mW when implemented in a 28 nm FD-SOI technology node.

Further, the platform supports single-precision floating-point operations, which are realized through shared execution units in the multi-core cluster. This enables the platform to deliver not only very energy-efficient fixed-point operations but also high-dynamic-range operations, which will be key to enabling more complex near-sensor processing in IoT systems.

Long Abstract:

Over the last few years, the number of internet-of-things (IoT) endpoint devices has grown considerably, and this trend is expected to accelerate in the coming decade. Such systems are small, mostly battery powered, and consist of various sensors, a wireless transmission solution, and a micro-controller unit (MCU). The varying application requirements of such IoT systems call for a programmable and scalable solution that offers performance capabilities ranging from a few kOp/s to GOp/s. Further, such systems need to be low cost and energy efficient, consuming only a few milliwatts. Today's systems often consist of a low-power MCU with limited computing capabilities, which is mainly used for control tasks and not for data processing. Near-sensor data processing, on the other hand, allows for sensor fusion and feature extraction and can significantly reduce the number of transmitted bytes. We propose to use a programmable multi-core system that is scalable in performance and energy efficiency due to its parallel architecture and the use of near-threshold (NT) operation.

This thesis focuses on the heart of this architecture, the processing elements (PEs), which can be programmed to execute various applications in parallel or to work jointly on a single application. To reach higher performance and better energy efficiency, a RISC-V processor architecture has been designed and extended with new instructions typically present in more energy-efficient digital signal processing (DSP) engines. Sensor data of lower precision can be processed on average 2.3× faster through single-instruction multiple-data (SIMD) extensions, and the integration of the PEs in the multi-core platform is optimized through prefetch buffers to reduce cache contention and instruction fetch costs.
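
The idea behind SIMD processing of low-precision data can be illustrated in software. The following Python sketch emulates a packed addition of four 8-bit lanes held in one 32-bit word. It is only an illustration under stated assumptions: the helper names `pack4`, `unpack4`, and `simd_add8` are hypothetical, the lane width and wrap-around behavior are chosen for the example, and nothing here is taken from the instruction set extensions described in the thesis.

```python
LOW7 = 0x7F7F7F7F  # mask selecting the low 7 bits of each 8-bit lane
MSB = 0x80808080   # mask selecting the top bit of each 8-bit lane

def pack4(values):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word

def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def simd_add8(a, b):
    """Add four 8-bit lanes at once, each lane wrapping modulo 256.

    The low 7 bits of every lane are added with a single integer addition;
    the top bit of every lane is then fixed up with XOR so that no carry
    crosses a lane boundary.
    """
    low = (a & LOW7) + (b & LOW7)
    return low ^ ((a ^ b) & MSB)
```

For example, `unpack4(simd_add8(pack4([1, 2, 3, 4]), pack4([10, 20, 30, 40])))` adds the four lanes with one word-wide addition instead of four scalar additions, which is the effect a hardware SIMD instruction achieves in a single cycle.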

Further, the feasibility of supporting high-dynamic-range (HDR) arithmetic in multi-core clusters is investigated through two number systems, the logarithmic number system (LNS) format and a traditional IEEE-754 floating-point format. The former has been explored because complex operations such as multiplications, divisions, and square roots transform to simple integer operations in the logarithmic domain and can be computed very energy efficiently. Additions and subtractions translate to non-linear functions, which can be interpolated in a shared unit. This LNS unit can also process other complex functions such as logarithms and trigonometric functions, allowing the system to process non-linear kernels up to 4.1× more energy efficiently than with traditional floating-point units (FPUs).
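
The trade-off described above can be made concrete with a small Python sketch of LNS arithmetic. It assumes a value is stored as a sign bit plus a fixed-point base-2 logarithm of its magnitude. The names (`FRAC_BITS`, `to_lns`, `lns_mul`, and so on) and the 16-bit fraction width are hypothetical choices for this illustration and do not reflect the format used in the thesis. Zero is not representable in this sketch, and the non-linear correction term for addition is evaluated exactly with `math.log2`, whereas the text describes interpolating it in a shared hardware unit.

```python
import math

FRAC_BITS = 16        # assumed number of fractional bits of the stored log2
ONE = 1 << FRAC_BITS  # fixed-point representation of 1.0

def to_lns(x):
    """Encode a nonzero real as (sign, fixed-point log2 of |x|)."""
    if x == 0:
        raise ValueError("zero needs a special encoding, omitted in this sketch")
    return (1 if x < 0 else 0, round(math.log2(abs(x)) * ONE))

def from_lns(v):
    """Decode (sign, exponent) back to a float."""
    sign, e = v
    mag = 2.0 ** (e / ONE)
    return -mag if sign else mag

def lns_mul(a, b):
    """Multiplication becomes one integer addition of the logarithms."""
    return (a[0] ^ b[0], a[1] + b[1])

def lns_div(a, b):
    """Division becomes one integer subtraction of the logarithms."""
    return (a[0] ^ b[0], a[1] - b[1])

def lns_sqrt(a):
    """Square root becomes a right shift of the logarithm by one bit."""
    if a[0]:
        raise ValueError("square root of a negative value")
    return (0, a[1] >> 1)

def lns_add(a, b):
    """Addition needs the non-linear term log2(1 +/- 2**d) with d <= 0."""
    (sa, ea), (sb, eb) = a, b
    if ea < eb:  # make a the operand with the larger magnitude
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    d = (eb - ea) / ONE
    if sa == sb:
        corr = math.log2(1.0 + 2.0 ** d)
    else:
        if d == 0:
            raise ValueError("exact cancellation yields zero, not representable here")
        corr = math.log2(1.0 - 2.0 ** d)
    return (sa, ea + round(corr * ONE))
```

In this sketch, `lns_mul`, `lns_div`, and `lns_sqrt` use only integer addition, subtraction, and shifting, while `lns_add` is the only operation that evaluates a non-linear function. That asymmetry is what motivates placing the costly part in a unit shared among several cores.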

Finally, a generalized sharing framework is introduced that allows individual operators of various latencies to be shared in a cluster of multiple PEs. A fine-grained, shared FPU of 63 kGE, which supports all RISC-V instructions, is integrated in an octa-core cluster, making HDR arithmetic available to all cores at diminishing cost. On a parallel seizure detection application, it is shown that access contention can be kept below 2%, which allows the shared unit to scale in performance while minimizing the per-core area overhead.

Implementing a four-core cluster in an advanced technology node such as 28 nm FD-SOI allows the PEs to achieve a top energy efficiency of 193 MOp/s per mW. This is significantly higher than what commercially available MCUs achieve, and the platform remains scalable at the same time, making it ready to serve more complex IoT systems, which will increasingly require HDR arithmetic.

About the Author:

Michael Gautschi was born in Zurich, Switzerland, in 1986. He received his BSc and MSc degrees from ETH Zurich, Switzerland, in 2010 and 2012, respectively. In spring 2013, he started his PhD in the Digital Circuits and Systems group of Prof. Dr. Luca Benini with a focus on energy-efficient computing architectures. His research interests include the design of very-large-scale integration circuits and systems, including processor design, parallel, low-power, and energy-efficient architectures, and mobile communication.

To order directly from:

Hartung-Gorre Verlag / D-78465 Konstanz / Germany

Phone: +49 (0) 7533 97227 Fax: +49 (0) 7533 97228
http://www.hartung-gorre.de Email: verlag@hartung-gorre.de