This article is part of the Academic Alibaba series and is taken from the paper entitled “Previewer for Multi-Scale Object Detector” by Zhihang Fu, Zhongming Jin, Guo-Jun Qi, Chen Shen, Rongxin Jiang, Yaowu Chen, and Xian-Sheng Hua, accepted by ACM MM 2018. The full paper can be read here.
In the field of object detection, a false positive — an error in which an object or attribute is improperly indicated as present in an image — can adversely impact the overall accuracy of the object detection process. Object detection methods that use convolutional neural networks (CNNs) have improved dramatically in recent years, but they still often fall short when it comes to dealing with images that contain objects of varying sizes.
To improve the detection of objects at different scales and sizes, a number of common CNN-based detectors, including SSD, MS-CNN, and Hierarchical Gated Deep Network, make predictions from several feature layers rather than one. The high resolution of low-level features allows small sliding windows, making it easier to detect smaller objects.
Low-level features are inadequate, however, as they have weak semantic capabilities and small receptive fields. This in turn leads to contextual information — an essential element in object detection, especially for small objects — often being missed. A lack of contextual information typically results in multi-scale detectors performing poorly. What’s more, due to the large resolution of low-level features, the number of small object priors is vast. This means that, in multi-scale detectors, most false positives tend to be found in small priors.
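To give a sense of how lopsided the distribution of priors can be, the short sketch below counts prior boxes per feature layer using the standard SSD300 layout (six feature maps with 4 or 6 boxes per location). The layout is taken from the original SSD design, not from the previewer paper, and the variable names are illustrative only.

```python
# Standard SSD300 layout: (feature map side length, prior boxes per location).
# Listed from the lowest-level (highest-resolution) layer to the deepest one.
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def priors_per_layer(layers):
    """Return the number of prior boxes contributed by each feature layer."""
    return [side * side * boxes for side, boxes in layers]


counts = priors_per_layer(SSD300_LAYERS)
total = sum(counts)
share_lowest = counts[0] / total  # fraction of priors on the lowest-level layer

print(counts)                 # [5776, 2166, 600, 150, 36, 4]
print(total)                  # 8732
print(f"{share_lowest:.1%}")  # 66.1%
```

In this layout roughly two thirds of all priors sit on the single lowest-level layer, which is the layer responsible for the smallest objects and the one with the weakest context.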
To combat this, the Alibaba team has proposed a novel previewer block that can be embedded easily into any multi-scale detector. The team also formulated a new matching strategy that selects positive and negative training examples for the previewer block. The lightweight previewer block previews objectness probability for the potential regression region of each prior box, using the stronger features with larger receptive fields and more contextual information for better predictions.
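The idea of previewing a prior with a deeper layer can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: it assumes the deeper layer outputs a small grid of objectness probabilities and that a prior is matched to the grid cell containing its center. The function name `preview_objectness` and the toy values are hypothetical.

```python
def preview_objectness(prior_center, objectness_map):
    """Look up the objectness score of the deeper-layer cell covering a prior.

    prior_center: (cx, cy) in normalized image coordinates, each in [0, 1].
    objectness_map: 2D list (rows x cols) of probabilities predicted from a
        deeper feature layer (assumed layout, for illustration only).
    """
    cx, cy = prior_center
    rows, cols = len(objectness_map), len(objectness_map[0])
    # Map the normalized center to a grid cell, clamping the 1.0 edge case.
    col = min(int(cx * cols), cols - 1)
    row = min(int(cy * rows), rows - 1)
    return objectness_map[row][col]


# Toy 2x2 objectness map from a deep, low-resolution layer.
deep_map = [
    [0.05, 0.10],
    [0.90, 0.20],
]

# A small prior centered in the lower-left part of the image.
print(preview_objectness((0.2, 0.8), deep_map))  # 0.9
```

Because each deep cell covers a large part of the image, many small priors share the same previewed score, which is one reason such a lookup can stay cheap.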
Alibaba found that independent predictions from different feature layers on the same region are conducive to reducing the prevalence of false positives. What sets Alibaba’s previewer block apart is that it is decoupled from the detection task, using deeper feature layers with sufficiently large receptive fields to preview whether regions really contain objects. The type of false positive shown in Fig. 1 is easily avoided when the broader image is taken into account.
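One way to picture how two independent predictions on the same region can suppress false positives is to combine them into a single score. The sketch below multiplies the detector's class confidence by the previewed objectness and filters on a threshold. Both the multiplication rule and the threshold of 0.5 are assumptions made for illustration; the paper may combine the scores differently.

```python
def rescore(detections, threshold=0.5):
    """Keep detections whose combined score clears the threshold.

    detections: list of (label, class_score, objectness) tuples, where
        class_score comes from the detector and objectness from the previewer.
    The product rule used here is an assumed combination, not the paper's.
    """
    kept = []
    for label, class_score, objectness in detections:
        combined = class_score * objectness
        if combined >= threshold:
            kept.append((label, combined))
    return kept


detections = [
    ("car", 0.9, 0.95),    # detector and previewer agree an object is present
    ("person", 0.8, 0.1),  # detector is confident, previewer sees no object
]

kept = rescore(detections)
print([label for label, _ in kept])  # ['car']
```

The second detection has a high class score from the low-level layer alone, yet it is dropped once the broader context captured by the previewer is taken into account.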
To demonstrate the effectiveness of the proposed method, the Alibaba team conducted extensive experiments on the Pascal VOC and KITTI benchmarks, with results showing that the previewer block consistently improved every multi-scale detector tested. Moreover, the previewer block is lightweight and has barely any impact on real-time performance.
Fig. 3 illustrates the significant reduction in small-size false positives after embedding the proposed previewer block in the multi-scale detector.
The figure below visualizes the predictions of objectness and classifications for the same images. It clearly demonstrates how the previewer block suppresses false positives and improves the object detection performance.
Alibaba’s previewer block is generic and can be easily implemented in multi-scale detectors, such as SSD, RFBNet and MS-CNN.