Josef Nemecek

Computer-Aided Management of Commodity Parts-Based Supercomputers

edited by Wolfgang Fichtner, Qiuting Huang, Heinz Jäckel, Hans Melchior, George S. Moschytz and Gerhard Tröster

Assuming the current trends in supercomputing continue, the supercomputers of the future will be large clusters containing millions of off-the-shelf workstations. These «superclusters» will use supercomputing-specific technology to make this collection of hardware and software work together as one supercomputer, even though each product was originally designed to be used individually and not as part of a supercomputing system. This dissertation presents one technology that is necessary for using the supercluster as a single entity: comprehensive and integrated management software. It is the first complete research work in its field.

This thesis presents a concept for managing superclusters of various sizes and several blueprints of management architectures, together with a guide for selecting the optimal architecture depending on system size and user requirements. One of the presented architectures has been implemented in the first Swiss-based supercluster, «Swiss-T1», installed at the supercomputing center CAPA of the EPFL. This first implementation, called «COSMOS», is presented in detail, together with the «Swiss-Tx» project of which this thesis was a part. The goal of this project was to develop, build and install a series of superclusters with 1 TFLOPS performance in Switzerland.

About the author

Josef Nemecek holds a Dipl. Ing. (M.S.) degree in computer science from the Swiss Federal Institute of Technology in Zurich (ETH Zurich), Switzerland. He joined the Electronics Laboratory of the ETH in spring 1997 as a research and teaching assistant and has participated in several supercomputing projects. His main work is the specification, design and implementation of software for integrated and comprehensive system management of commodity parts-based supercomputers. In 2005, he was awarded the Dr. sc. techn. (Ph.D.) degree by the ETH Zurich.

Keywords: Commodity, Supercomputing, Cluster Computing, System Management, Integration, Scalability, Availability.

