CNNs with cross-correlation matching for face recognition in video surveillance using a single training sample per person

Submitted by K_Hooper on Wed, 04/04/2018 - 9:56am

Title	CNNs with cross-correlation matching for face recognition in video surveillance using a single training sample per person
Publication Type	Conference Paper
Year of Publication	2017
Authors	Parchami, M., Bashbaghi, S., Granger, E.
Conference Name	2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
Keywords	adaptive weighted cross-correlation, camera distributed network, CCM-CNN parameters, convolution, correlation methods, cross-correlation matching, discriminant face representations, Face, face recognition, face recognition systems, feature extraction, Hadamard matrices, Human Behavior, image matching, image representation, matrix Hadamard product, neural nets, optimisation, Optimization, Pipelines, pubcrawl, Resiliency, Robustness, Scalability, single training sample, still-to-video FR systems, Training, triplet-loss optimization methods, video surveillance
Abstract	In video surveillance, face recognition (FR) systems seek to detect individuals of interest appearing over a distributed network of cameras. Still-to-video FR systems match faces captured in videos under challenging conditions against facial models, often designed using one reference still per individual. Although CNNs can achieve among the highest levels of accuracy in many real-world FR applications, state-of-the-art CNNs that are suitable for still-to-video FR, like trunk-branch ensemble (TBE) CNNs, represent complex solutions for real-time applications. In this paper, an efficient CNN architecture is proposed for accurate still-to-video FR from a single reference still. The CCM-CNN is based on new cross-correlation matching (CCM) and triplet-loss optimization methods that provide discriminant face representations. The matching pipeline exploits a matrix Hadamard product followed by a fully connected layer inspired by adaptive weighted cross-correlation. A triplet-based training approach is proposed to optimize the CCM-CNN parameters such that the inter-class variations are increased, while enhancing robustness to intra-class variations. To further improve robustness, the network is fine-tuned using synthetically generated faces based on stills and videos of non-target individuals. Experiments on videos from the COX Face and Chokepoint datasets indicate that the CCM-CNN can achieve a high level of accuracy that is comparable to TBE-CNN and HaarNet, but with significantly lower time and memory complexity. It may therefore represent a better trade-off between accuracy and complexity for real-time video surveillance applications.
URL	https://ieeexplore.ieee.org/document/8078554/
DOI	10.1109/AVSS.2017.8078554
Citation Key	parchami_cnns_2017
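
Illustrative sketch (not from the paper): the abstract describes a matching pipeline built from a matrix Hadamard product followed by a fully connected layer, trained with a triplet-based loss. The NumPy snippet below is a minimal sketch of those two ideas only. The function names, the sigmoid output, the squared Euclidean distance, and the margin value are all assumptions made for illustration; the abstract does not specify how the CCM-CNN actually implements them.

```python
import numpy as np

def hadamard_match(still_feat, video_feat, weights, bias=0.0):
    """Score a still/video feature pair (hypothetical helper).

    Element-wise (Hadamard) product of the two feature maps, followed by a
    single linear unit standing in for the fully connected layer.
    """
    product = still_feat * video_feat                  # matrix Hadamard product
    logit = float(np.sum(weights * product) + bias)    # weighted sum = one FC unit
    return 1.0 / (1.0 + np.exp(-logit))                # sigmoid is an assumption

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic hinge-style triplet loss (assumed form, not the paper's exact loss).

    Encourages the anchor to be closer to the positive (same identity)
    than to the negative (different identity) by at least `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2)   # squared Euclidean distance
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

In this sketch, `weights` plays the role of the learned per-element weighting that the abstract says is inspired by adaptive weighted cross-correlation; with all-zero weights every pair scores 0.5, so any discrimination comes from what the layer learns.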
