An Empirical Analysis of Visual Features for Multiple Object Tracking in Urban Scenes

International Conference on Pattern Recognition (ICPR 2020)
@inproceedings{miah2020empirical,
    author = {Miah, Mehdi and Pepin, Justine and Saunier, Nicolas and Bilodeau, Guillaume-Alexandre},
    title = {An {Empirical} {Analysis} of {Visual} {Features} for {Multiple} {Object} {Tracking} in {Urban} {Scenes}},
    year = {2020},
    booktitle = {International {Conference} on {Pattern} {Recognition} ({ICPR})}
}

Mehdi Miah, Justine Pepin, Nicolas Saunier, Guillaume-Alexandre Bilodeau

Abstract
This paper addresses the problem of selecting appearance features for multiple object tracking (MOT) in urban scenes. Over the years, a large number of features have been used for MOT. However, it is not clear whether some of them are better than others. Commonly used features include color histograms, histograms of oriented gradients, deep features from convolutional neural networks and re-identification (ReID) features. In this study, we assess how well these features discriminate between objects enclosed by a bounding box in urban scene tracking scenarios. Several affinity measures, namely the L1, L2 and Bhattacharyya distances, Rank-1 counts and the cosine similarity, are also assessed for their impact on the discriminative power of the features. Results on several datasets show that features from ReID networks are the best for discriminating instances from one another, regardless of the quality of the detector. If a ReID model is not available, color histograms may be selected if the detector has good recall and there are few occlusions; otherwise, deep features are more robust to detectors with lower recall.
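The affinity measures named in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes NumPy, treats each appearance feature as a 1-D array, and uses hypothetical function names. The Bhattacharyya distance has several definitions in the literature; the one below (negative log of the Bhattacharyya coefficient) is a common choice and may differ from the one used in the paper. The Rank-1 helper is likewise an assumed reading: it checks whether the closest gallery feature shares the query's identity.

```python
import numpy as np


def l1_distance(p, q):
    """Sum of absolute differences between two feature vectors."""
    return float(np.abs(p - q).sum())


def l2_distance(p, q):
    """Euclidean distance between two feature vectors."""
    return float(np.sqrt(((p - q) ** 2).sum()))


def bhattacharyya_distance(p, q, eps=1e-12):
    """Negative log of the Bhattacharyya coefficient (one common definition).

    Assumes p and q are non-negative histograms with a positive sum;
    both are normalized to sum to 1 before comparison.
    """
    p = p / p.sum()
    q = q / q.sum()
    coefficient = np.sqrt(p * q).sum()
    # eps guards against log(0) when the histograms do not overlap at all.
    return float(-np.log(max(coefficient, eps)))


def cosine_similarity(p, q):
    """Cosine of the angle between two non-zero feature vectors."""
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))


def rank1_hit(query, gallery, gallery_ids, query_id, distance):
    """True if the gallery feature closest to the query has the query's identity."""
    distances = [distance(query, g) for g in gallery]
    best = int(np.argmin(distances))
    return gallery_ids[best] == query_id


if __name__ == "__main__":
    # Two toy 4-bin color histograms (illustrative values only).
    h1 = np.array([0.5, 0.3, 0.1, 0.1])
    h2 = np.array([0.4, 0.4, 0.1, 0.1])
    print("L1:", l1_distance(h1, h2))
    print("L2:", l2_distance(h1, h2))
    print("Bhattacharyya:", bhattacharyya_distance(h1, h2))
    print("Cosine similarity:", cosine_similarity(h1, h2))
```

Note that the three distances decrease as two features become more alike, whereas the cosine similarity increases, so the sign must be flipped when mixing them in a single cost for data association.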
Acknowledgements
We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), [CRDPJ 528786 - 18], [DG 2017-06115] and the support of Arcturus Networks.