This article is part of the Academic Alibaba series and is taken from the paper entitled “Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification” by Chen Shen, Zhongmin Jin, Yiru Zhao, Zhihuang Fu, Rongxin Jiang, Yaowu Chen, and Xian-Sheng Hua, accepted at ACM MM 2017. The full paper can be read here.
Since its inception, facial recognition has proven to be an extremely useful technology, gaining wide application across the security industry and a host of other commercial contexts. But for all its perks, facial recognition is not without its limitations, the biggest being its inability to process images that do not contain a recognizable face.
Handling such images requires training recognition systems to move beyond the analysis of basic facial traits and incorporate broader image recognition techniques. One proposed solution is pedestrian re-identification.
Pedestrian re-identification refers to the process of distinguishing between different individuals captured on the same camera based on image characteristics, isolating a single individual, then finding the isolated individual on a different camera. Pedestrian re-ID technology has important scientific and practical applications in transportation, security, and other fields significant to the creation of cities that are both safe and smart. While some claim facial recognition technology is fully mature, faces can still be difficult to identify in complex real-world scenarios due to issues like low resolution, partial occlusion, differing recording angles, and other circumstances. Therefore, using characteristics obtained from images of the subject’s entire body becomes necessary to make a positive identification.
The Alibaba tech team, in collaboration with Zhejiang University, has proposed a new CNN-based pedestrian re-identification method built on multi-level similarity perception (MSP-CNN). Debuting at the world’s leading multimedia conference, ACM MM, in 2017, the network efficiently learns discriminative feature representations, significantly improving identification accuracy.
Teaching a Computer to See the Whole Picture
To create a system for pedestrian re-ID that outperforms existing deep learning frameworks, researchers taught a computer to pick up on discriminative features such as shapes and patterns on clothing and to understand how these features behave in three-dimensional space. The network was also taught to extract these features offline in order to make the technology suitable for large-scale, real-world applications.
During the network training stage, researchers employed a deep Siamese model to apply similarity constraints to corresponding feature maps. Image pairs serve as input, and both images in a pair are processed by the same weight-sharing deep CNN. Guided by a multi-level similarity perception mechanism, the network learns discriminative local semantic features at low-level layers and abstract global features at high-level layers. A loss function reinforces the cross-correlation scores of areas from positive pairs while weakening those from negative pairs. Certain semantic patterns shared between positive and negative images, such as hair color, are disregarded.
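To make the idea of a pairwise similarity constraint more concrete, here is a minimal sketch in plain Python. It is not the loss from the paper: the function names, the use of normalized cross-correlation, the margin, and the per-level weights are all assumptions chosen for illustration. Each list of numbers stands in for a flattened feature map that a shared CNN would produce for one image at one layer.

```python
import math


def cross_correlation(a, b):
    """Normalized cross-correlation of two equal-length feature vectors.

    Returns a score in [-1, 1]; 0.0 if either vector has no variance.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    norm = (math.sqrt(sum(x * x for x in centered_a))
            * math.sqrt(sum(y * y for y in centered_b)))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(centered_a, centered_b)) / norm


def similarity_loss(feat_a, feat_b, same_person, margin=0.0):
    """Hypothetical pairwise loss (an assumption, not the paper's formula).

    Positive pairs are penalized for low correlation; negative pairs are
    penalized only when their correlation exceeds the margin.
    """
    score = cross_correlation(feat_a, feat_b)
    if same_person:
        return 1.0 - score
    return max(0.0, score - margin)


def multi_level_loss(levels_a, levels_b, same_person, weights=None):
    """Weighted sum of the pairwise loss over several feature levels.

    levels_a and levels_b hold one feature vector per layer, for example
    a low-level (local) vector followed by a high-level (global) one.
    """
    if weights is None:
        weights = [1.0] * len(levels_a)
    return sum(w * similarity_loss(fa, fb, same_person)
               for w, fa, fb in zip(weights, levels_a, levels_b))


# Two images of the same person with closely matching features at both levels.
low_a, high_a = [1.0, 2.0, 3.0], [1.0, 0.0, 1.0, 0.0]
low_b, high_b = [2.0, 4.0, 6.0], [1.0, 0.0, 1.0, 0.0]
print(multi_level_loss([low_a, high_a], [low_b, high_b], same_person=True))
```

Minimizing a loss of this shape pushes correlation up for positive pairs and down for negative pairs at every level where it is applied, which is the general behavior the paragraph above describes.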
More Than Just a Face in the Crowd
To demonstrate the model’s effectiveness, researchers used two large datasets and employed cumulative matching characteristics (CMC) top-k accuracy, the fraction of queries whose correct match appears among the k highest-ranked candidates, to evaluate the MSP-CNN model against several industry-standard identification methods. Results showed that the MSP-CNN model outperformed traditional approaches, improving CMC rank-1 accuracy by 9.3% for labeled discriminative groups and 16% for detected discriminative groups.
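For readers unfamiliar with the metric, the following is a small sketch of how CMC top-k accuracy can be computed. The function name, the input layout, and the toy identities are assumptions made for illustration and are unrelated to the datasets or results in the paper. Each query comes with a list of gallery identities already sorted from most to least similar.

```python
def cmc_curve(query_ids, ranked_gallery_ids, max_rank):
    """Compute CMC accuracy for ranks 1..max_rank.

    query_ids[i] is the true identity of query i, and ranked_gallery_ids[i]
    is the list of gallery identities sorted by decreasing similarity to it.
    Entry k-1 of the result is the fraction of queries whose true identity
    appears within the top k gallery results.
    """
    hits = [0] * max_rank
    for query_id, ranking in zip(query_ids, ranked_gallery_ids):
        for position, gallery_id in enumerate(ranking[:max_rank]):
            if gallery_id == query_id:
                # A match at this position counts for this rank and all larger ones.
                for k in range(position, max_rank):
                    hits[k] += 1
                break
    return [h / len(query_ids) for h in hits]


# Toy example: four queries, each with a three-item ranked gallery.
queries = ["a", "b", "c", "d"]
rankings = [
    ["a", "x", "y"],  # correct match at rank 1
    ["x", "b", "y"],  # correct match at rank 2
    ["x", "y", "c"],  # correct match at rank 3
    ["x", "y", "z"],  # no correct match in the top 3
]
print(cmc_curve(queries, rankings, max_rank=3))  # [0.25, 0.5, 0.75]
```

Rank-1 accuracy, the figure quoted above, is simply the first entry of this curve.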
While the bounding boxes generated from pedestrian images were not as ideal as manually annotated ones, the overall results were promising. Researchers hope to further optimize pedestrian re-ID technology to create safer, smarter city streets.