CNNs with cross-correlation matching for face recognition in video surveillance using a single training sample per person
Title | CNNs with cross-correlation matching for face recognition in video surveillance using a single training sample per person |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Parchami, M., Bashbaghi, S., Granger, E. |
Conference Name | 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) |
Keywords | adaptive weighted cross-correlation, camera distributed network, CCM-CNN parameters, convolution, correlation methods, cross-correlation matching, discriminant face representations, Face, face recognition, face recognition systems, feature extraction, Hadamard matrices, Human Behavior, image matching, image representation, matrix Hadamard product, neural nets, optimisation, Optimization, Pipelines, pubcrawl, Resiliency, Robustness, Scalability, single training sample, still-to-video FR systems, Training, triplet-loss optimization methods, video surveillance |
Abstract | In video surveillance, face recognition (FR) systems seek to detect individuals of interest appearing over a distributed network of cameras. Still-to-video FR systems match faces captured in videos under challenging conditions against facial models, often designed using one reference still per individual. Although CNNs can achieve among the highest levels of accuracy in many real-world FR applications, state-of-the-art CNNs that are suitable for still-to-video FR, like trunk-branch ensemble (TBE) CNNs, represent complex solutions for real-time applications. In this paper, an efficient CNN architecture is proposed for accurate still-to-video FR from a single reference still. The CCM-CNN is based on new cross-correlation matching (CCM) and triplet-loss optimization methods that provide discriminant face representations. The matching pipeline exploits a matrix Hadamard product followed by a fully connected layer inspired by adaptive weighted cross-correlation. A triplet-based training approach is proposed to optimize the CCM-CNN parameters such that inter-class variations are increased, while enhancing robustness to intra-class variations. To further improve robustness, the network is fine-tuned using synthetically-generated faces based on stills and videos of non-target individuals. Experiments on videos from the COX Face and Chokepoint datasets indicate that the CCM-CNN can achieve a high level of accuracy that is comparable to TBE-CNN and HaarNet, but with a significantly lower time and memory complexity. It may therefore represent a better trade-off between accuracy and complexity for real-time video surveillance applications. |
URL | https://ieeexplore.ieee.org/document/8078554/ |
DOI | 10.1109/AVSS.2017.8078554 |
Citation Key | parchami_cnns_2017 |
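The abstract describes the matching pipeline only at a high level (a Hadamard product of feature maps followed by a fully connected layer inspired by adaptive weighted cross-correlation, trained with a triplet loss). The sketch below is an illustrative reconstruction under stated assumptions, not the authors' CCM-CNN: it assumes flattened feature maps, zero-mean unit-norm normalization before the product, a single fully connected output unit with a sigmoid, and the standard hinge triplet loss on squared Euclidean distances with a hypothetical margin of 0.2. The function names (`normalize`, `hadamard`, `ccm_score`, `triplet_loss`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(x: Sequence[float]) -> Vector:
    """Zero-mean, unit-L2-norm copy of x (assumed: a flattened feature map)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered)) or 1.0  # avoid divide-by-zero
    return [v / norm for v in centered]


def hadamard(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise (Hadamard) product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same size")
    return [x * y for x, y in zip(a, b)]


def ccm_score(probe: Sequence[float], reference: Sequence[float],
              weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical CCM-style matching head.

    Hadamard product of the normalized probe (video face) and reference
    (still) feature maps, then one fully connected unit (weighted sum + bias),
    squashed to a (0, 1) similarity score with a sigmoid. With all weights
    equal to 1 and bias 0, the logit is the normalized cross-correlation.
    """
    prod = hadamard(normalize(probe), normalize(reference))
    if len(weights) != len(prod):
        raise ValueError("one weight per feature element is expected")
    logit = sum(w * p for w, p in zip(weights, prod)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Standard hinge triplet loss on squared Euclidean distances.

    Zero once the negative is at least `margin` farther from the anchor
    than the positive; otherwise it penalizes the violation.
    """
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    still = [0.9, 0.1, 0.4, 0.7]   # toy reference-still features
    same = [0.8, 0.2, 0.5, 0.6]    # toy video face of the same person
    other = [0.1, 0.9, 0.6, 0.2]   # toy video face of someone else
    uniform = [1.0] * len(still)
    print(ccm_score(same, still, uniform) > ccm_score(other, still, uniform))
    print(triplet_loss(still, same, other))
```

With uniform weights the head reduces to plain normalized cross-correlation; learning one weight per element is one plausible reading of "adaptive weighted" cross-correlation, letting the network emphasize feature locations that are more reliable across still and video domains.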