Biblio

Filters: Keyword is Computer vision
2020-10-12
Marrone, Stefano, Sansone, Carlo.  2019.  An Adversarial Perturbation Approach Against CNN-based Soft Biometrics Detection. 2019 International Joint Conference on Neural Networks (IJCNN). :1–8.
Biometric-based authentication systems have spread across everyday consumer electronics. Over the years, researchers' interest has shifted from hard biometrics (such as fingerprints, voice and keystroke dynamics) to soft biometrics (such as age, ethnicity and gender), mainly using the latter to improve the effectiveness of authentication systems. While newer approaches are constantly being proposed by domain experts, in recent years Deep Learning has risen to prominence in many computer vision tasks, becoming the current state of the art for several biometric approaches. However, since the automatic processing of data rich in sensitive information could expose users to privacy threats associated with its unfair use (e.g. inferring gender or ethnicity), researchers have started to focus on developing defensive strategies toward a more secure and private AI. The aim of this work is to exploit Adversarial Perturbation, namely approaches able to mislead state-of-the-art CNNs by injecting a suitably small perturbation into the input image, to protect subjects against unwanted soft-biometrics-based identification by automatic means. In particular, since ethnicity is one of the most critical soft biometrics, as a case study we focus on the generation of adversarial stickers that, once printed, can hide a subject's ethnicity in a real-world scenario.
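For illustration, the kind of adversarial perturbation this paper builds on can be sketched as a single fast-gradient-sign step against a classifier; this is a generic method under assumed PyTorch inputs, not the authors' printable-sticker pipeline:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.03):
    """Minimal FGSM: one signed-gradient step that pushes the classifier
    away from the true soft-biometric label (epsilon is illustrative)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, then clamp to valid range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```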
2020-10-05
Gamba, Matteo, Azizpour, Hossein, Carlsson, Stefan, Björkman, Mårten.  2019.  On the Geometry of Rectifier Convolutional Neural Networks. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). :793–797.

While recent studies have shed light on the expressivity, complexity and compositionality of convolutional networks, the real inductive bias of the family of functions reachable by gradient descent on natural data is still unknown. By exploiting symmetries in the preactivation space of convolutional layers, we present preliminary empirical evidence of regularities in the preimage of trained rectifier networks, in terms of arrangements of polytopes, and relate them to the nonlinear transformations applied by the network to its input.
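The polytope picture can be probed empirically: a rectifier network partitions its input space into linear regions, each identified by which ReLUs fire. A toy sketch (the two-layer network and the random sampling are illustrative assumptions, not the paper's setup):

```python
import torch
import torch.nn as nn

# Toy rectifier network; which ReLUs fire identifies the linear region
# (polytope) an input falls in.
net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU())

def activation_pattern(x):
    bits, h = [], x
    for layer in net:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            bits.append((h > 0).flatten())  # post-ReLU > 0 iff preactivation > 0
    return tuple(torch.cat(bits).tolist())

# Two inputs share a polytope exactly when their patterns coincide.
samples = torch.rand(1000, 2)
regions = {activation_pattern(s) for s in samples}
print(f"sampled {len(regions)} distinct linear regions")
```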

2020-07-03
Kakadiya, Rutvik, Lemos, Reuel, Mangalan, Sebin, Pillai, Meghna, Nikam, Sneha.  2019.  AI Based Automatic Robbery/Theft Detection using Smart Surveillance in Banks. 2019 3rd International conference on Electronics, Communication and Aerospace Technology (ICECA). :201–204.

Deep learning is the branch of artificial intelligence concerned with imitating the learning approach that human beings use to acquire knowledge. Video analysis, one of its application areas, has been a fundamental problem of computer vision and multimedia content analysis for at least 20 years. The task is challenging because video carries a large amount of highly variable information. Human supervision is still required in all surveillance systems; new advances in computer vision, an important trend in video surveillance, promise dramatic efficiency gains. We propose CCTV-based theft detection along with tracking of thieves. We use image processing to detect theft and the motion of thieves in CCTV footage, without the use of sensors. This system concentrates on object detection. Real-time analysis of human movement in CCTV footage can notify security personnel about a suspicious individual committing burglary, giving them a chance to avert it.
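A minimal sketch of sensor-free motion detection with OpenCV background subtraction, an illustrative stand-in for the paper's pipeline (file name, thresholds, and blob-size cutoff are assumptions):

```python
import cv2

cap = cv2.VideoCapture("cctv_footage.mp4")  # hypothetical input file
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=25)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    # Keep only large moving blobs; small ones are likely noise.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 1500:
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.imshow("motion", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
```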

Feng, Ri-Chen, Lin, Daw-Tung, Chen, Ken-Min, Lin, Yi-Yao, Liu, Chin-De.  2019.  Improving Deep Learning by Incorporating Semi-automatic Moving Object Annotation and Filtering for Vision-based Vehicle Detection. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). :2484–2489.

Deep learning has undergone tremendous advancement in computer vision studies. Training deep learning neural networks depends on a considerable amount of ground-truth data. However, labeling ground-truth data is a labor-intensive task, particularly for large-volume video analytics applications such as video surveillance and vehicle detection for autonomous driving. This paper presents a rapid and accurate method for associative searching in big image data obtained from security monitoring systems. We developed a semi-automatic moving object annotation method for improving deep learning models. The proposed method comprises three stages: automatic foreground object extraction, object annotation in subsequent video frames, and dataset construction using human-in-the-loop quick selection. The method thus expedites both dataset collection and ground-truth annotation. In contrast to data augmentation and data-generative models, it produces a large amount of real data, which can improve training results and avoid adverse effects caused by artifactual data. We applied the constructed annotation dataset to train a deep learning you-only-look-once (YOLO) model to perform vehicle detection on street intersection surveillance videos. Experimental results demonstrated that detection performance improved from a mean average precision (mAP) of 83.99 to 88.03.
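For reference, a YOLO training set stores one plain-text label line per box, normalized by image size; a small hypothetical helper shows the convention the constructed dataset would feed into:

```python
def to_yolo_line(cls, x, y, bw, bh, img_w, img_h):
    # YOLO label line: "class x_center y_center width height",
    # all coordinates normalized to [0, 1] by the image dimensions.
    return (f"{cls} {(x + bw / 2) / img_w:.6f} {(y + bh / 2) / img_h:.6f} "
            f"{bw / img_w:.6f} {bh / img_h:.6f}")

# A 80x120 box at (100, 50) in a 1920x1080 frame, class 0 ("vehicle"):
print(to_yolo_line(0, 100, 50, 80, 120, 1920, 1080))
```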

Bashir, Muzammil, Rundensteiner, Elke A., Ahsan, Ramoza.  2019.  A deep learning approach to trespassing detection using video surveillance data. 2019 IEEE International Conference on Big Data (Big Data). :3535–3544.
Railroad trespassing is a dangerous activity with significant security and safety risks. However, regular patrolling of potential trespassing sites is infeasible due to exceedingly high resource demands and personnel costs. This raises the need for automated trespass detection and early-warning prediction techniques leveraging state-of-the-art machine learning. To meet this need, we propose ARTS, a novel framework for an Automated Railroad Trespassing detection System using video surveillance data. As the core of our solution, we adopt a CNN-based deep learning architecture capable of video processing. However, such deep learning methods, while effective, are known to be computationally expensive and time consuming, especially when applied to a large volume of surveillance data. Leveraging the sparsity of railroad trespassing activity, ARTS employs a dual-stage deep learning architecture composed of an inexpensive pre-filtering stage for activity detection, followed by a high-fidelity trespass classification stage employing a deep neural network. The resulting dual-stage architecture represents a flexible solution capable of trading off accuracy against computational time. We demonstrate the efficacy of our approach on public-domain surveillance data, achieving an F1 score of 0.87 while keeping up with the enormous video volume, a practical time-accuracy trade-off.
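The dual-stage idea can be sketched as a cheap activity gate in front of an expensive classifier; the frame-difference pre-filter, threshold, and `classify_trespass` callback below are stand-ins for ARTS's actual stages:

```python
import cv2
import numpy as np

def has_activity(prev_gray, gray, threshold=12.0):
    # Stage 1: inexpensive mean absolute frame difference as an activity gate.
    diff = np.abs(gray.astype(np.int16) - prev_gray.astype(np.int16))
    return diff.mean() > threshold

def run_dual_stage(video_path, classify_trespass):
    cap = cv2.VideoCapture(video_path)
    prev, alerts, idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Only the rare "active" frames reach the expensive stage-2 network.
        if prev is not None and has_activity(prev, gray):
            if classify_trespass(frame):  # stage 2: deep classifier
                alerts.append(idx)
        prev, idx = gray, idx + 1
    cap.release()
    return alerts
```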
Adari, Suman Kalyan, Garcia, Washington, Butler, Kevin.  2019.  Adversarial Video Captioning. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). :24–27.
In recent years, developments in the field of computer vision have allowed deep learning-based techniques to surpass human-level performance. However, these advances have also culminated in the advent of adversarial machine learning techniques, capable of launching targeted image captioning attacks that easily fool deep learning models. Although attacks in the image domain are well studied, little work has been done in the video domain. In this paper, we show it is possible to extend prior attacks in the image domain to the video captioning task, without heavily affecting the video's playback quality. We demonstrate our attack against a state-of-the-art video captioning model, by extending a prior image captioning attack known as Show and Fool. To the best of our knowledge, this is the first successful method for targeted attacks against a video captioning model, which is able to inject 'subliminal' perturbations into the video stream, and force the model to output a chosen caption with up to 0.981 cosine similarity, achieving near-perfect similarity to chosen target captions.
2020-06-19
Keshari, Tanya, Palaniswamy, Suja.  2019.  Emotion Recognition Using Feature-level Fusion of Facial Expressions and Body Gestures. 2019 International Conference on Communication and Electronics Systems (ICCES). :1184–1189.

Automatic emotion recognition using computer vision is significant for many real-world applications such as photojournalism, virtual reality, sign language recognition, and Human-Robot Interaction (HRI). Psychological research findings suggest that humans depend on the combined visual cues of face and body to comprehend human emotional behaviour. A plethora of studies have analysed human emotions using facial expressions, EEG signals, speech, and other signals, but most of this work was based on a single modality. Our objective is to efficiently integrate emotions recognized from facial expressions and the upper-body pose of humans using images. Our work on bimodal emotion recognition combines the accuracy benefits of both modalities.
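Feature-level fusion in this sense concatenates the per-sample feature vectors of the two modalities before a single classifier; a schematic sketch with stand-in features and an SVM (all names, dimensions, and data are hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
face_feats = rng.random((200, 128))   # stand-in facial-expression features
body_feats = rng.random((200, 64))    # stand-in upper-body-pose features
labels = rng.integers(0, 7, 200)      # e.g. seven emotion classes

# Feature-level fusion: concatenate per-sample vectors, then classify once.
fused = np.concatenate([face_feats, body_feats], axis=1)
clf = SVC().fit(fused, labels)
```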

Wang, Si, Liu, Wenye, Chang, Chip-Hong.  2019.  Detecting Adversarial Examples for Deep Neural Networks via Layer Directed Discriminative Noise Injection. 2019 Asian Hardware Oriented Security and Trust Symposium (AsianHOST). :1–6.

Deep learning is a popular and powerful machine learning solution for computer vision tasks. Its most criticized vulnerability is its poor tolerance of adversarial images, obtained by deliberately adding imperceptibly small perturbations to clean inputs; such inputs can delude a classifier into making wrong decisions. Previous defensive techniques mostly focused on refining the models or transforming the input. They have either been implemented only on small datasets or shown limited success. Furthermore, they are rarely scrutinized from the hardware perspective, even though Artificial Intelligence (AI) on a chip is the roadmap for embedded intelligence everywhere. In this paper we propose a new discriminative noise injection strategy to adaptively select a few dominant layers and progressively discriminate adversarial from benign inputs. This is made possible by evaluating the differences in label change rate for adversarial and natural images when different amounts of noise are injected into the weights of individual layers in the model. The approach is evaluated on the ImageNet dataset with 8-bit truncated models for state-of-the-art DNN architectures. The results show a high detection rate of up to 88.00% with a false positive rate of only approximately 5% for MobileNet. Both detection rate and false positive rate improve well beyond existing advanced defenses against the most practical noninvasive universal perturbation attack on deep-learning-based AI chips.
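The detection signal described, the label change rate under weight noise, can be sketched as follows; the layer selection, noise scale, and trial count are illustrative, and the paper's adaptive layer choice and 8-bit truncation are omitted:

```python
import copy
import torch

@torch.no_grad()
def label_change_rate(model, x, layer_prefixes, sigma=0.02, trials=20):
    """x: a single-input batch. Returns the fraction of noisy copies of the
    model whose prediction on x differs from the clean prediction."""
    base = model(x).argmax(dim=1)
    changes = 0
    for _ in range(trials):
        noisy = copy.deepcopy(model)
        for name, p in noisy.named_parameters():
            if any(name.startswith(pref) for pref in layer_prefixes):
                p.add_(sigma * torch.randn_like(p))  # perturb chosen layers
        changes += int((noisy(x).argmax(dim=1) != base).item())
    return changes / trials

# Adversarial inputs tend to flip labels more readily under such noise,
# so thresholding the rate separates them from benign inputs.
```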

2020-04-13
Shahbaz, Ajmal, Hoang, Van-Thanh, Jo, Kang-Hyun.  2019.  Convolutional Neural Network based Foreground Segmentation for Video Surveillance Systems. IECON 2019 - 45th Annual Conference of the IEEE Industrial Electronics Society. 1:86–89.
Convolutional Neural Networks (CNN) have shown astonishing results in the field of computer vision. This paper proposes a CNN-based foreground segmentation algorithm to tackle practical challenges in video surveillance systems such as illumination changes, dynamic backgrounds, camouflage, and static foreground objects. The network is trained on image sequences with their respective ground truth. The algorithm employs a CNN called VGG-16 to extract features from the input. The extracted feature maps are upsampled using bilinear interpolation, and the upsampled feature mask is passed through a sigmoid function and thresholded to obtain the foreground mask. Binary cross-entropy is used as the error function to compare the constructed foreground mask with the ground truth. The proposed algorithm was tested on two standard datasets and showed superior performance compared with the top-ranked foreground segmentation methods.
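A minimal PyTorch sketch of the described pipeline, VGG-16 features, bilinear upsampling, and a sigmoid mask trained with binary cross-entropy; the 1x1 prediction head and other details are assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class ForegroundNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = vgg16().features             # pretrained in practice
        self.head = nn.Conv2d(512, 1, kernel_size=1)  # per-pixel logit

    def forward(self, x):
        h = self.head(self.features(x))
        # Bilinear upsampling back to input resolution, then sigmoid mask.
        h = F.interpolate(h, size=x.shape[2:], mode="bilinear",
                          align_corners=False)
        return torch.sigmoid(h)

model = ForegroundNet()
pred = model(torch.rand(1, 3, 240, 320))
loss = nn.BCELoss()(pred, torch.ones_like(pred))  # vs. ground-truth mask
```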
2020-02-18
Han, Chihye, Yoon, Wonjun, Kwon, Gihyun, Kim, Daeshik, Nam, Seungkyu.  2019.  Representation of White- and Black-Box Adversarial Examples in Deep Neural Networks and Humans: A Functional Magnetic Resonance Imaging Study. 2019 International Joint Conference on Neural Networks (IJCNN). :1–8.

The recent success of brain-inspired deep neural networks (DNNs) in solving complex, high-level visual tasks has led to rising expectations for their potential to match the human visual system. However, DNNs exhibit idiosyncrasies that suggest their visual representation and processing might be substantially different from human vision. One limitation of DNNs is that they are vulnerable to adversarial examples, input images to which subtle, carefully designed noise is added to fool a machine classifier. The robustness of the human visual system against adversarial examples is potentially of great importance, as it could uncover a key mechanistic feature that machine vision has yet to incorporate. In this study, we compare the visual representations of white- and black-box adversarial examples in DNNs and humans by leveraging functional magnetic resonance imaging (fMRI). We find a small but significant difference in representation patterns between the two types (white- versus black-box) of adversarial examples for both humans and DNNs. However, unlike for DNNs, human performance on categorical judgment is not degraded by the noise, regardless of type. These results suggest that adversarial examples may be differentially represented in the human visual system, but are unable to affect the perceptual experience.

2019-09-04
Liang, J., Jiang, L., Cao, L., Li, L., Hauptmann, A..  2018.  Focal Visual-Text Attention for Visual Question Answering. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. :6135–6143.
Recent insights into language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photos, we have to look at whole collections with sequences of photos or videos. When answering questions from a large collection, a natural problem is to identify snippets to support the answer. In this paper, we describe a novel neural network called the Focal Visual-Text Attention network (FVTA) for collective reasoning in visual question answering, in which both visual and text sequence information, such as images and text metadata, is present. FVTA introduces an end-to-end approach that uses a hierarchical process to dynamically determine which media, and which moments in the sequential data, to focus on to answer the question. FVTA not only answers questions well but also provides the justifications on which its answers are based. FVTA achieves state-of-the-art performance on the MemexQA dataset and competitive results on the MovieQA dataset.
2019-08-12
Eetha, S., Agrawal, S., Neelam, S..  2018.  Zynq FPGA Based System Design for Video Surveillance with Sobel Edge Detection. 2018 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS). :76–79.

Advancements in the semiconductor domain have made it possible to realize numerous video surveillance applications using computer vision and deep learning; image understanding improves efficiency in industrial automation, security, ADAS, live traffic analysis, and more. Image understanding requires input data with high precision, which depends on image resolution and camera placement. The data of interest can be a thermal image or a live feed coming from various sensors. Composite video (CVBS) is a popular interface capable of streaming up to HD (1920x1080) quality. Unlike high-speed serial interfaces such as HDMI or MIPI CSI, the analog composite video interface is a single-wire standard supporting longer distances. Image understanding requires edge detection and classification for further processing, and the Sobel filter is one of the most widely used edge detection filters that can be embedded into a live stream. This paper proposes a Zynq FPGA based system design for video surveillance with Sobel edge detection, in which the input composite video is decoded (analog CVBS input to YCbCr digital output), processed in hardware, and streamed to an HDMI display while simultaneously being stored in SD memory for later processing. The hardware design is scalable for resolutions from VGA to Full HD at 60 fps and 4K at 24 fps. The system is built on the Xilinx ZC702 platform with a TVP5146 decoder to showcase the functional path.
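As a software reference for the Sobel stage (the paper implements it in FPGA fabric), the arithmetic of the two 3x3 kernels looks like this in NumPy:

```python
import numpy as np
from scipy.ndimage import convolve

KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
KY = KX.T  # vertical-gradient kernel is the transpose

def sobel_edges(gray):
    gx = convolve(gray.astype(np.float32), KX)  # horizontal gradient
    gy = convolve(gray.astype(np.float32), KY)  # vertical gradient
    mag = np.hypot(gx, gy)                      # gradient magnitude
    return (255 * mag / (mag.max() + 1e-8)).astype(np.uint8)
```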

2018-12-03
Sharifara, Ali, Rahim, Mohd Shafry Mohd, Navabifar, Farhad, Ebert, Dylan, Ghaderi, Amir, Papakostas, Michalis.  2017.  Enhanced Facial Recognition Framework Based on Skin Tone and False Alarm Rejection. Proceedings of the 10th International Conference on PErvasive Technologies Related to Assistive Environments. :240–241.

Human face detection plays an essential role in the first stage of face processing applications. In this study, an enhanced face detection framework is proposed that improves the detection rate based on skin color and provides a validation process. A preliminary segmentation of the input images based on skin color can significantly reduce the search space and accelerate human face detection. The primary detection is based on Haar-like features and the Adaboost algorithm. A validation process, based on two-stage Extended Local Binary Patterns, is introduced to reject non-face objects that might arise during detection. Experimental results on the CMU-MIT and Caltech 10000 datasets, over a wide range of facial variations in color, position, scale, and lighting conditions, demonstrated a successful face detection rate.
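A sketch of the two ideas in the abstract, a skin-tone mask to shrink the search space followed by a Haar cascade for the primary detection; the YCrCb thresholds are common heuristics rather than the paper's values, and the two-stage LBP validation is omitted:

```python
import cv2
import numpy as np

def detect_faces(bgr):
    # Coarse skin segmentation in YCrCb space (common heuristic bounds).
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb,
                       np.array((0, 133, 77), np.uint8),
                       np.array((255, 173, 127), np.uint8))
    masked = cv2.bitwise_and(bgr, bgr, mask=skin)
    gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
    # Primary detection: Haar-like features (cascade file ships with OpenCV).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```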

2018-11-19
Shinya, A., Tung, N. D., Harada, T., Thawonmas, R..  2017.  Object-Specific Style Transfer Based on Feature Map Selection Using CNNs. 2017 Nicograph International (NicoInt). :88–88.

We propose a method for transferring an arbitrary style to only a specific object in an image. Style transfer is the process of combining the content of one image and the style of another into a new image. Our results show that the proposed method can realize style transfer restricted to a specific object.

2018-06-20
Kosmidis, Konstantinos, Kalloniatis, Christos.  2017.  Machine Learning and Images for Malware Detection and Classification. Proceedings of the 21st Pan-Hellenic Conference on Informatics. :5:1–5:6.

Detecting malicious code by exact matching on collected datasets is becoming a large-scale identification problem due to the existence of new malware variants. Being able to promptly and accurately identify new attacks enables security experts to respond effectively. My proposal is to develop an automated framework for the identification of unknown vulnerabilities by leveraging current neural network techniques. This has significant and immediate value for the security field, as current anti-virus software is typically able to recognize the malware type only after infection, and preventive measures are limited. Artificial Intelligence plays a major role in automatic malware classification: numerous machine learning methods, both supervised and unsupervised, have been investigated for classifying malware into families based on features acquired through static and dynamic analysis. The value of automated identification is clear, as feature engineering is both time-consuming and time-sensitive, with new malware needing to be studied as it is observed in the wild.
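One widely used bridge between malware binaries and vision models in this line of work is rendering raw bytes as a grayscale image; a sketch (the fixed width is a common convention, not the paper's choice):

```python
import numpy as np
from PIL import Image

def binary_to_image(path, width=256):
    # Read the file as raw bytes and reshape into rows of `width` pixels;
    # the resulting grayscale image can be fed to an image classifier.
    data = np.fromfile(path, dtype=np.uint8)
    rows = len(data) // width
    return Image.fromarray(data[:rows * width].reshape(rows, width), mode="L")
```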

2018-06-11
Moskewicz, Matthew W., Jannesari, Ali, Keutzer, Kurt.  2017.  Boda: A Holistic Approach for Implementing Neural Network Computations. Proceedings of the Computing Frontiers Conference. :53–62.
Neural networks (NNs) are currently a very popular topic in machine learning for both research and practice. GPUs are the dominant computing platform for research efforts and are also gaining popularity as a deployment platform for applications such as autonomous vehicles. As a result, GPU vendors such as NVIDIA have spent enormous effort to write special-purpose NN libraries. On other hardware targets, especially mobile GPUs, such vendor libraries are not generally available. Thus, the development of portable, open, high-performance, energy-efficient GPU code for NN operations would enable broader deployment of NN-based algorithms. A root problem is that high efficiency GPU programming suffers from high complexity, low productivity, and low portability. To address this, this work presents a framework to enable productive, high-efficiency GPU programming for NN computations across hardware platforms and programming models. In particular, the framework provides specific support for metaprogramming and autotuning of operations over ND-Arrays. To show the correctness and value of our framework and approach, we implement a selection of NN operations, covering the core operations needed for deploying three common image-processing neural networks. We target three different hardware platforms: NVIDIA, AMD, and Qualcomm GPUs. On NVIDIA GPUs, we show both portability between OpenCL and CUDA as well as competitive performance compared to the vendor library. On Qualcomm GPUs, we show that our framework enables productive development of target-specific optimizations, and achieves reasonable absolute performance. Finally, on AMD GPUs, we show initial results that indicate our framework can yield reasonable performance on a new platform with minimal effort.
2018-04-04
Gajjar, V., Khandhediya, Y., Gurnani, A..  2017.  Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach. 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). :2805–2809.

With crimes on the rise all around the world, video surveillance is becoming more important day by day. Due to the lack of human resources to monitor this increasing number of cameras manually, new computer vision algorithms performing both lower- and higher-level tasks are being developed. We have developed a new method incorporating the widely acclaimed Histograms of Oriented Gradients (HOG), the theory of visual saliency, and the saliency prediction model Deep Multi-Level Network to detect human beings in video sequences. Furthermore, we implemented the k-means algorithm to cluster the HOG feature vectors of the positively detected windows and determined the path followed by a person in the video. We achieved a detection precision of 83.11% and a recall of 41.27%, obtaining these results 76.866 times faster than classification on normal images.
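A sketch combining the two named ingredients, HOG descriptors for the positively detected windows clustered with k-means; parameters and the equal-size-patch assumption are illustrative:

```python
import numpy as np
from skimage.feature import hog
from sklearn.cluster import KMeans

def cluster_detections(windows, n_tracks=3):
    """windows: equal-sized grayscale patches from positively detected
    humans; returns a cluster label per window."""
    feats = np.array([hog(w, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                      for w in windows])
    return KMeans(n_clusters=n_tracks, n_init=10).fit_predict(feats)
```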

2018-03-19
Abdeslam, W. Oulad, Tabii, Y., El Kadiri, K. E..  2017.  Adaptive Appearance Model in Particle Filter Based Visual Tracking. Proceedings of the 2Nd International Conference on Big Data, Cloud and Applications. :85:1–85:5.

Visual tracking methods based on the particle filter framework frequently use the state-space information of the target object to compute the observation model. However, this often yields a poor estimate when unexpected motions occur, or under cluttered backgrounds and illumination changes, because the model explores the state space without any additional information about the current state. To avoid tracking failure, we address particle-filter-based visual tracking in which the target appearance model is represented through an adaptive conjunction of a color histogram and a space-based appearance model combined with velocity parameters. The appearance models are then estimated using particles whose weights are incrementally updated for dynamic adaptation of the cue parametrization.
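A minimal particle-filter tracking step with a color-histogram appearance model, the baseline this paper extends (patch size, motion noise, and grayscale frames are assumptions):

```python
import numpy as np

def histogram(patch, bins=16):
    h, _ = np.histogram(patch, bins=bins, range=(0, 255), density=True)
    return h

def track_step(particles, weights, frame, template_hist, motion_std=5.0):
    # Predict: diffuse particle (x, y) positions with a random-walk model.
    particles = particles + np.random.normal(0, motion_std, particles.shape)
    # Update: weight each particle by appearance similarity (Bhattacharyya
    # coefficient between the patch histogram and the target template).
    for i, (x, y) in enumerate(particles.astype(int)):
        patch = frame[max(y - 8, 0):y + 8, max(x - 8, 0):x + 8]
        if patch.size == 0:
            weights[i] = 0.0
            continue
        weights[i] = np.sum(np.sqrt(histogram(patch) * template_hist))
    weights = weights / weights.sum()
    # Resample in proportion to the weights, then reset to uniform.
    idx = np.random.choice(len(particles), len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```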

2018-02-28
Su, J. C., Wu, C., Jiang, H., Maji, S..  2017.  Reasoning About Fine-Grained Attribute Phrases Using Reference Games. 2017 IEEE International Conference on Computer Vision (ICCV). :418–427.

We present a framework for learning to describe fine-grained visual differences between instances using attribute phrases. Attribute phrases capture distinguishing aspects of an object (e.g., “propeller on the nose” or “door near the wing” for airplanes) in a compositional manner. Instances within a category can be described by a set of these phrases and collectively they span the space of semantic attributes for a category. We collect a large dataset of such phrases by asking annotators to describe several visual differences between a pair of instances within a category. We then learn to describe and ground these phrases to images in the context of a reference game between a speaker and a listener. The goal of a speaker is to describe attributes of an image that allows the listener to correctly identify it within a pair. Data collected in a pairwise manner improves the ability of the speaker to generate, and the ability of the listener to interpret visual descriptions. Moreover, due to the compositionality of attribute phrases, the trained listeners can interpret descriptions not seen during training for image retrieval, and the speakers can generate attribute-based explanations for differences between previously unseen categories. We also show that embedding an image into the semantic space of attribute phrases derived from listeners offers a 20% improvement in accuracy over existing attribute-based representations on the FGVC-aircraft dataset.

2018-02-27
Soleymani, Mohammad, Riegler, Michael, Halvorsen, Pål.  2017.  Multimodal Analysis of Image Search Intent: Intent Recognition in Image Search from User Behavior and Visual Content. Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. :251–259.

Users search for multimedia content with different underlying motivations or intentions. The study of user search intentions is an emerging topic in information retrieval, since understanding why a user is searching for content is crucial for satisfying the user's need. In this paper, we aim at automatically recognizing a user's intent for image search in the early stage of a search session. We designed seven different search scenarios under the intent conditions of finding items, re-finding items, and entertainment. We collected facial expressions, physiological responses, eye gaze, and implicit user interactions from 51 participants who performed seven different search tasks on a custom-built image retrieval platform. We analyzed the users' spontaneous and explicit reactions under different intent conditions. Finally, we trained machine learning models to predict users' search intentions from the visual content of the visited images, the user interactions, and the spontaneous responses. After fusing the visual and user-interaction features, our system achieved an F1 score of 0.722 for classifying three classes in a user-independent cross-validation. We found that eye gaze and implicit user interactions, including mouse movements and keystrokes, are the most informative features. Given that the most promising results are obtained from modalities that can be captured unobtrusively and online, the results demonstrate the feasibility of deploying such methods for improving multimedia retrieval platforms.

2018-02-15
Yonetani, R., Boddeti, V. N., Kitani, K. M., Sato, Y..  2017.  Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption. 2017 IEEE International Conference on Computer Vision (ICCV). :2059–2069.

We propose a privacy-preserving framework for learning visual classifiers by leveraging distributed private image data. This framework is designed to aggregate multiple classifiers updated locally using private data and to ensure that no private information about the data is exposed during and after its learning procedure. We utilize a homomorphic cryptosystem that can aggregate the local classifiers while they are encrypted and thus kept secret. To overcome the high computational cost of homomorphic encryption of high-dimensional classifiers, we (1) impose sparsity constraints on local classifier updates and (2) propose a novel efficient encryption scheme named doubly-permuted homomorphic encryption (DPHE) which is tailored to sparse high-dimensional data. DPHE (i) decomposes sparse data into its constituent non-zero values and their corresponding support indices, (ii) applies homomorphic encryption only to the non-zero values, and (iii) employs double permutations on the support indices to make them secret. Our experimental evaluation on several public datasets shows that the proposed approach achieves comparable performance against state-of-the-art visual recognition methods while preserving privacy and significantly outperforms other privacy-preserving methods.
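The core DPHE idea, encrypt only the non-zero values of a sparse update and hide their support order, can be sketched with the python-paillier package; the single permutation here simplifies the paper's double permutation:

```python
import numpy as np
from phe import paillier  # python-paillier package

public_key, private_key = paillier.generate_paillier_keypair()

def encrypt_sparse(w, rng):
    # Encrypt only the non-zero values; permute the support indices so
    # their order reveals nothing about the original positions.
    support = np.flatnonzero(w)
    perm = rng.permutation(len(support))
    values = [public_key.encrypt(float(w[i])) for i in support[perm]]
    return support[perm], values

w = np.zeros(1000)
w[[3, 17, 512]] = [0.5, -1.2, 0.7]
indices, enc = encrypt_sparse(w, np.random.default_rng(0))
total = enc[0] + enc[1] + enc[2]    # server-side additive aggregation
print(private_key.decrypt(total))   # 0.0 (approximately)
```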

2018-01-23
Maheshwari, B. C., Burns, J., Blott, M., Gambardella, G..  2017.  Implementation of a scalable real time canny edge detector on programmable SOC. 2017 International Conference on Electrical and Computing Technologies and Applications (ICECTA). :1–5.

In today's world, we are surrounded by a variety of computer vision applications, e.g., medical imaging, biometrics, security, surveillance, and robotics. Most of these applications require real-time processing of a single image or a sequence of images. This real-time image/video processing requires high computational power and a specialized hardware architecture, and cannot be achieved using general-purpose CPUs. In this paper, an FPGA-based generic Canny edge detector is introduced. Edge detection is one of the basic steps in image processing, image analysis, image pattern recognition, and computer vision. We have implemented a resizable Canny edge detector IP on the programmable logic (PL) of the PYNQ platform. The IP is integrated with HDMI input/output blocks and can process a 1080p input video stream at 60 frames per second. The Canny edge detection IP is scalable with respect to frame size; depending on the input frame size, the hardware architecture can be scaled up or down by changing the template parameters. Offloading Canny edge detection from the PS to the PL causes CPU usage to drop from about 100% to 0%. Moreover, the hardware-based edge detector runs about 14 times faster than the software-based edge detector running on the Cortex-A9 ARM processor.

2017-05-17
Thompson, Christopher, Wagner, David.  2016.  Securing Recognizers for Rich Video Applications. Proceedings of the 6th Workshop on Security and Privacy in Smartphones and Mobile Devices. :53–62.

Cameras have become nearly ubiquitous with the rise of smartphones and laptops. New wearable devices, such as Google Glass, focus directly on using live video data to enable augmented reality and contextually enabled services. However, granting applications full access to video data exposes more information than is necessary for their functionality, introducing privacy risks. We propose a privilege-separation architecture for visual recognizer applications that encourages modularization and least privilege: separating the recognizer logic, sandboxing it to restrict filesystem and network access, and restricting what it can extract from the raw video data. We designed and implemented a prototype that separates the recognizer and application modules, and evaluated our architecture on a set of 17 computer vision applications. Our experiments show that our prototype incurs low overhead for each of these applications, reduces some of the privacy risks associated with these applications, and in some cases can actually increase performance due to increased parallelism and concurrency.

2017-03-08
Sim, T., Zhang, L..  2015.  Controllable Face Privacy. 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). 04:1–8.

We present the novel concept of Controllable Face Privacy. Existing methods that alter face images to conceal identity inadvertently also destroy other facial attributes such as gender, race or age. This all-or-nothing approach is too harsh. Instead, we propose a flexible method that can independently control the amount of identity alteration while keeping other facial attributes unchanged. To achieve this flexibility, we apply a subspace decomposition to our face encoding scheme, effectively decoupling facial attributes such as gender, race, age, and identity into mutually orthogonal subspaces, which in turn enables independent control of these attributes. Our method is thus useful for nuanced face de-identification, in which only facial identity is altered while others, such as gender, race and age, are retained. These altered face images protect identity privacy, yet allow other computer vision analyses, such as gender detection, to proceed unimpeded. Controllable Face Privacy is therefore useful for reaping the benefits of surveillance cameras while preventing privacy abuse. Our proposal also permits privacy to be applied not just to identity but to other facial attributes as well. Furthermore, privacy-protection mechanisms, such as k-anonymity, L-diversity, and t-closeness, may be readily incorporated into our method. Extensive experiments with commercial facial analysis software show that our alteration method is indeed effective.
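The subspace idea can be sketched directly: project the face encoding onto an identity subspace and attenuate only that component, leaving the orthogonal complement (the other attributes) untouched. The orthonormal basis here is a stand-in; the paper derives its attribute subspaces from data:

```python
import numpy as np

def alter_identity(encoding, identity_basis, alpha=0.0):
    """identity_basis: (d, k) orthonormal columns spanning identity
    variation. alpha=0 removes the identity component entirely;
    alpha=1 leaves the encoding unchanged."""
    projector = identity_basis @ identity_basis.T   # projects onto subspace
    identity_part = projector @ encoding
    # Other attributes live in the orthogonal complement and are untouched.
    return encoding - identity_part + alpha * identity_part
```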

Song, D., Liu, W., Ji, R., Meyer, D. A., Smith, J. R..  2015.  Top Rank Supervised Binary Coding for Visual Search. 2015 IEEE International Conference on Computer Vision (ICCV). :1922–1930.

In recent years, binary coding techniques have become increasingly popular because of their high efficiency in handling large-scale computer vision applications. It has been demonstrated that supervised binary coding techniques that leverage supervision information can significantly enhance coding quality and hence greatly benefit visual search tasks. Typically, a modern binary coding method seeks to learn a group of coding functions that compress data samples into binary codes. However, few methods pursue coding functions that optimize the precision at the top of a ranking list ordered by the Hamming distances of the generated binary codes. In this paper, we propose a novel supervised binary coding approach, Top Rank Supervised Binary Coding (Top-RSBC), which explicitly focuses on optimizing the precision of top positions in a Hamming-distance ranking list toward preserving the supervision information. The core idea is to train disciplined coding functions in which mistakes at the top of a Hamming-distance ranking list are penalized more than those at the bottom. To solve for such coding functions, we relax the original discrete optimization objective with a continuous surrogate and derive a stochastic gradient descent method to optimize it. To further reduce training time, we also design an online learning algorithm to optimize the surrogate objective more efficiently. Empirical studies on three benchmark image datasets demonstrate that the proposed binary coding approach achieves superior image search accuracy over the state of the art.
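For context, the generic supervised-binary-coding setup this paper refines: linear coding functions whose signs give the bits, with retrieval ranked by Hamming distance (the Top-RSBC surrogate loss itself is not reproduced here; data and dimensions are illustrative):

```python
import numpy as np

def encode(X, W):
    # One bit per coding function: the sign of a linear projection.
    return (X @ W > 0).astype(np.uint8)

def hamming_rank(query_code, database_codes):
    dists = (database_codes != query_code).sum(axis=1)
    return np.argsort(dists)  # the top of this list is what Top-RSBC optimizes

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 32))            # 32 coding functions
db = encode(rng.normal(size=(1000, 128)), W)
q = encode(rng.normal(size=(1, 128)), W)
top10 = hamming_rank(q, db)[:10]          # nearest items in Hamming space
```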