Selected Readings in Vision and Graphics
edited by Luc Van Gool, Gábor Székely, Markus Gross, Bernt Schiele

Volume 46

Edgar Seemann

Pedestrian Detection in Crowded Street Scenes

First edition 2007, X, 154 pages, € 64,00. ISBN 3-86628-173-0

This thesis addresses the challenging task of pedestrian detection in real-world environments. The aim is to count and localize persons and pedestrians in still images despite background clutter and partial occlusion.

Although pedestrian detection has many practical applications and has been an active area of research for many years, only recently have recognition algorithms become robust enough to deal with scenes of realistic complexity. This thesis presents algorithms and algorithmic extensions that further enhance detection robustness compared to existing state-of-the-art approaches. The pedestrian detection system proposed in this thesis builds on a general object categorization approach that has been successful in detecting rigid object categories such as cars or motorbikes. Persons and pedestrians, however, are not rigid, and their appearance changes greatly with body articulation and pose. The variety of textures and colors in clothing and accessories adds further difficulty. We therefore develop a number of algorithms that cope with these appearance changes.

The general object categorization model used in this thesis has a highly flexible implicit shape representation. It is based on a visual vocabulary of small object parts and aggregates evidence from local image descriptors. Because pedestrian shapes and appearances vary so widely in images, aggregating local information alone is often not discriminative enough. We therefore propose algorithms that combine local with global information. For example, we explicitly learn possible pedestrian articulations from training examples and show how this information helps make detection hypotheses more globally consistent. Efficient learning algorithms make it possible to learn these articulations and their associated appearances from relatively few training examples. Furthermore, we conduct a thorough evaluation of shape-based features that compares the generalization abilities of various local image descriptors. Our findings indicate that edge-based or gradient-based descriptions can yield significantly better detection results than descriptions based on gray values. Finally, this thesis takes a first step towards robust pedestrian detection in image sequences. We propose an algorithm that learns instance-specific pedestrian models from the initial detections of a general pedestrian model. This makes it possible to follow pedestrians even through longer periods of occlusion.
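
As a rough illustration of how a vocabulary of object parts can aggregate local evidence, the following minimal sketch lets each matched part vote for possible object centers and picks the strongest accumulation. It is not the implementation from the thesis: the names `cast_votes` and `best_hypothesis`, the toy codebook, and the fixed grid binning are all assumptions made for this example, and a real system would typically search a continuous voting space rather than coarse bins.

```python
from collections import defaultdict


def cast_votes(features, codebook, bin_size=10):
    """Accumulate weighted votes for object-center positions.

    features: list of (x, y, codeword_id) tuples, one per matched local
              descriptor (hypothetical input format).
    codebook: dict mapping codeword_id -> list of (dx, dy) offsets from
              the part to the object center, as observed in training.
    """
    votes = defaultdict(float)
    for x, y, codeword_id in features:
        offsets = codebook.get(codeword_id, [])
        if not offsets:
            continue  # unknown part: contributes no evidence
        # Each feature spreads one unit of evidence over its offsets.
        weight = 1.0 / len(offsets)
        for dx, dy in offsets:
            cx, cy = x + dx, y + dy
            votes[(cx // bin_size, cy // bin_size)] += weight
    return votes


def best_hypothesis(votes, bin_size=10):
    """Return (center, score) of the strongest bin, or None if no votes."""
    if not votes:
        return None
    (bx, by), score = max(votes.items(), key=lambda item: item[1])
    center = (bx * bin_size + bin_size / 2, by * bin_size + bin_size / 2)
    return center, score


# Toy vocabulary: a "head" part sits 40 px above the center,
# a "foot" part roughly 40 px below it (illustrative values only).
codebook = {"head": [(0, 40)], "foot": [(0, -40), (5, -40)]}
features = [(100, 60, "head"), (100, 140, "foot"), (300, 200, "foot")]

votes = cast_votes(features, codebook)
center, score = best_hypothesis(votes)
print(center, score)
```

In this toy input, the head and one foot agree on the same center and outvote the isolated foot. The isolated foot still produces a weaker hypothesis of its own, which hints at why purely local evidence may need a global consistency check.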

This thesis puts particular emphasis on pedestrian detection in crowded scenes, where people may heavily overlap or be partially occluded. This is reflected in both our test sets and our evaluation criteria. The reported quantitative detection results show that the developed algorithms can robustly detect pedestrians in this scenario.


Keywords: Pedestrian Detection, Crowded Scenes, Implicit Shape Model, Articulation Estimation, Cross-Articulation Learning, Computer Vision, Object Categorization

Series "Selected Readings in Vision and Graphics", published by Hartung-Gorre Verlag

To order directly from: