Series in Signal and Information Processing, Vol. 14
edited by Hans-Andrea Loeliger

Markus Hofbauer,
Optimal Linear Separation and Deconvolution
of Acoustical Convolutive Mixtures.
1st edition 2005, XX, 182 pages, 64,00. ISBN 3-89649-996-3

This thesis addresses the problem of the optimal inversion of a linear acoustical convolutive mixing process by means of multichannel linear filtering. In most real-world acoustical scenarios, several sound-emitting sources are present and may be active simultaneously. When the sound of these sources is perceived by direct listening or from microphone recordings, only a mixture of the superposed sources is accessible, not the original undistorted signal of any single source. Furthermore, the source signals are reverberated due to multipath propagation. Propagation and mixing of the sources are characterized by a convolutive mixing process and can be completely described by a matrix of acoustical impulse responses (AIRs).
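The mixing model described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not code from the thesis: the function name `convolutive_mix`, the array layout, and the toy sizes and random "AIRs" are all assumptions made for the example.

```python
import numpy as np

def convolutive_mix(sources, airs):
    """Mix sources through a matrix of acoustical impulse responses (AIRs).

    sources: array (n_src, n_samples), one row per source signal.
    airs:    array (n_mic, n_src, air_len); airs[m, s] is the impulse
             response from source s to microphone m (assumed layout).
    Returns an array (n_mic, n_samples + air_len - 1).
    """
    n_mic, n_src, air_len = airs.shape
    n_samples = sources.shape[1]
    mics = np.zeros((n_mic, n_samples + air_len - 1))
    for m in range(n_mic):
        for s in range(n_src):
            # Each microphone receives the sum of all reverberated sources.
            mics[m] += np.convolve(sources[s], airs[m, s])
    return mics

# Toy example (hypothetical data): 2 sources, 3 microphones, 4-tap responses.
rng = np.random.default_rng(0)
toy_sources = rng.standard_normal((2, 100))
toy_airs = rng.standard_normal((3, 2, 4))
toy_mics = convolutive_mix(toy_sources, toy_airs)
```

Real AIRs would be measured rather than random and, as noted further on, typically have thousands of taps; the short responses here only keep the example small.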
Reverberation, the superposition of several sources, and additive background noise account for a reduced speech intelligibility in the case of speech sources, and for a reduced sound fidelity in general. Several multichannel algorithms exist which aim at a separation and deconvolution (dereverberation) of the sources. The most prevalent linear multichannel filtering techniques are beamforming and blind algorithms. In adverse and highly reverberant environments the performance of these methods is limited, and it is not clear whether the limitations arise from the particular linear algorithm or whether the setup and physical environment fundamentally limit the performance of any linear filtering method.
A theoretical and practical `best-case' performance analysis for linear multichannel filtering methods in the least-squares optimal sense is presented in this thesis. The term `best-case' implies that the convolutive mixing process is known, i.e., the matrix of AIRs is given. The results of the analysis may serve as an upper bound on the performance of any practical linear filtering algorithm that has less knowledge of the mixing process.
AIRs in real-world environments are complex in the sense that they are typically non-minimum phase, with lengths of thousands of taps. A direct single-channel inversion of such an AIR requires a non-causal filter of infinite length (IIR) and is infeasible. Hence, it is not apparent that effective inverse filters of finite length (FIR) can be found for real-world acoustical convolutive mixing systems and applied in practice.
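As a small illustration of the non-minimum-phase property mentioned above: an FIR response has a causal, stable inverse only if all zeros of its transfer function lie inside the unit circle. The sketch below checks this with `numpy.roots`; the function name `is_minimum_phase` and the two-tap examples are assumptions for illustration, not material from the thesis.

```python
import numpy as np

def is_minimum_phase(h):
    """Return True if all zeros of H(z) = sum_k h[k] z^-k lie strictly
    inside the unit circle. Assumes h[0] != 0 (no leading pure delay)."""
    zeros = np.roots(h)  # roots of h[0] z^(L-1) + ... + h[L-1]
    return bool(np.all(np.abs(zeros) < 1.0))

h_min = np.array([1.0, -0.5])     # single zero at z = 0.5
h_nonmin = np.array([0.5, -1.0])  # single zero at z = 2.0

result_min = is_minimum_phase(h_min)
result_nonmin = is_minimum_phase(h_nonmin)
```

For responses with thousands of taps, root finding of this kind becomes numerically delicate, so the check is meant for small examples only.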
In this thesis it is demonstrated that in the multichannel case and under certain conditions a perfect separation and deconvolution is achievable with FIR filters, in theory and practice, even in adverse environments. In the more applicable general case, where these conditions do not apply, a least-squares solution still yields a significant source enhancement.
A versatile theoretical framework for a `best-case' analysis is developed to determine the filters which yield a least-squares optimal inverse of the convolutive mixing process. Each source is embedded in a block-Toeplitz matrix equation (BTME) according to its propagation model. A general weighting function makes it possible to control the filtering task: the BTME can be arbitrarily weighted to specify confined problems and to influence the least-squares solution as desired. Conditions and lower bounds on the FIR filter lengths are derived that guarantee an exact deconvolution and separation, or either one alone.
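The idea of a least-squares inverse obtained from a Toeplitz-structured matrix equation can be illustrated with a toy single-source case. The sketch below is not the thesis's framework: the function names, the two-microphone setup, the random 8-tap responses, and the simple row weighting are assumptions chosen for brevity. With two microphones and inverse filters of length `air_len - 1`, the stacked convolution matrix is square, so an exact FIR inverse exists provided the two responses share no common zeros; with a single microphone and the same filter length, only an approximate inverse is obtained.

```python
import numpy as np

def conv_matrix(h, n_taps):
    """Toeplitz matrix T with T @ g == np.convolve(h, g) for len(g) == n_taps."""
    T = np.zeros((len(h) + n_taps - 1, n_taps))
    for k in range(n_taps):
        T[k:k + len(h), k] = h
    return T

def ls_inverse_filters(airs, n_taps, delay=0, weights=None):
    """Least-squares FIR inverse filters for one source, several microphones.

    airs: array (n_mic, air_len). The target is a unit impulse at `delay`.
    weights: optional per-row weights of the matrix equation (assumed form).
    Returns (filters, overall_response, target).
    """
    H = np.hstack([conv_matrix(h, n_taps) for h in airs])
    d = np.zeros(H.shape[0])
    d[delay] = 1.0
    w = np.ones_like(d) if weights is None else np.asarray(weights, dtype=float)
    g, *_ = np.linalg.lstsq(H * w[:, None], d * w, rcond=None)
    # H @ g equals the sum over microphones of convolve(airs[m], filters[m]).
    return g.reshape(len(airs), n_taps), H @ g, d

# Hypothetical toy data: random responses stand in for measured AIRs.
rng = np.random.default_rng(0)
air_len = 8
airs = rng.standard_normal((2, air_len))
_, resp_two, target_two = ls_inverse_filters(airs, n_taps=air_len - 1)
_, resp_one, target_one = ls_inverse_filters(airs[:1], n_taps=air_len - 1)
err_two = np.linalg.norm(resp_two - target_two)  # two microphones
err_one = np.linalg.norm(resp_one - target_one)  # one microphone
```

Comparing `err_two` with `err_one` shows the difference between the multichannel and single-channel cases in this toy setting. Extending the sketch to several sources would mean stacking one such equation per source, with zero targets for the sources to be suppressed.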
A measurement system is established which allows the measurement of the AIRs and the assessment of the `best-case' performance with a maximum of eight sources and eight sensors, and with background noise. The insights of the theoretical analysis are confirmed by real-world experiments in representative environments: a quiet office, a noisy cafeteria, and a highly reverberant hallway. It is demonstrated that inverse filtering is indeed possible in adverse real-world conditions. Dependencies on important selectable parameters such as the system latency and filter length are analyzed. An AIR sensitivity analysis shows the importance of an accurate AIR estimation.
The presented `best-case' analysis framework and measurement system may be utilized when designing an application or multichannel algorithm: For a particular setup and environment, parameters can be optimized and results assessed in listening tests. Finally, the analysis framework reveals the complexity of the convolutive mixing process in a particular environment, and imparts a deeper understanding thereof.

Markus Hofbauer was born in Basel, Switzerland, on July 10, 1972. After compulsory education in Peru and Germany, he attended the Gymnasium in Weil am Rhein, Germany, where he obtained the Abitur. He spent an interim high-school year in Tacoma, WA, USA, and received the high-school diploma. In 1993 he joined the Swiss Federal Institute of Technology Zurich (ETH) to study electrical engineering and graduated with a Diploma degree in electrical engineering in 1998. Subsequently, he started as a research and teaching assistant at the Signal and Information Processing Laboratory (ISI) at ETH. In parallel with his work on his Ph.D., he was an appointed lecturer of the graduate course `Adaptive Filters' at ETH from 1999 to 2004. He is co-author of the textbook `Adaptive Filter'. In March 2005 he completed the thesis entitled `Optimal Linear Separation and Deconvolution of Acoustical Convolutive Mixtures' and received the Ph.D. degree from ETH. In May 2005 he joined SIEMENS Zurich as an R&D engineer in a technological consulting team.

Keywords:
Acoustical convolutive mixtures, acoustic impulse-response, beamforming, blind source separation, block-Toeplitz matrix, deconvolution, dereverberation, multichannel filtering, noise suppression, polynomial matrix, optimal filtering, room acoustics, speech enhancement, Sylvester matrix.

To order directly from: Hartung.Gorre@t-online.de

The "Series in Signal and Information Processing" at Hartung-Gorre Verlag