Biblio
Here we explore the applicability of the traditional sliding-window-based convolutional neural network (CNN) detection pipeline and of region-based object detection techniques, such as the Faster Region-based CNN (R-CNN) and Region-based Fully Convolutional Networks (R-FCN), to the problem of object detection in X-ray security imagery. Within this context, given limited dataset availability, we employ a transfer learning paradigm for network training, tackling both single and multiple object detection problems over a number of R-CNN/R-FCN variants. The use of a first-stage region proposal within Faster R-CNN and R-FCN provides superior results to the traditional sliding-window-driven CNN (SWCNN) approach. Using Faster R-CNN with VGG16, pretrained on the ImageNet dataset, we achieve 88.3 mAP for a six-class X-ray detection problem. The use of R-FCN with ResNet-101 yields 96.3 mAP for the two-class firearm detection problem, requiring 0.1 seconds of computation per image. Overall, we illustrate the comparative performance of these techniques as object localization strategies within cluttered X-ray security imagery.
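As an illustration of the transfer learning step this abstract describes, the sketch below fine-tunes a detector pretrained on a large dataset for a six-class problem. It is a minimal sketch assuming PyTorch/torchvision and its ResNet-50 FPN backbone, not the authors' VGG16/ResNet-101 pipeline; `num_classes` includes the background class.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 6 + 1  # six X-ray object classes plus background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the pretrained classification head for one sized to our classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
# The model is now ready to be fine-tuned on the (limited) X-ray dataset
# instead of being trained from scratch.
```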
Pattern recognition in the sparse representation (SR) framework has been very successful. In this model, a test sample is represented as a sparse linear combination of training samples by solving a norm-regularized least squares problem. However, the regularization parameter is typically applied indiscriminately across the whole dictionary. To enhance the group concentration of the coefficients and to improve sparsity, we propose a new SR model called the adaptive sparse representation classifier (ASRC). In ASRC, a sparse-coefficient strengthening term is added to the objective function. The model is solved by the artificial bee colony (ABC) algorithm with a variable step size to speed up convergence. In addition, a partition strategy for large-scale dictionaries is adopted to lighten each bee's load and to remove irrelevant groups. Across different data sets, we empirically demonstrate the properties of the new model and its recognition performance.
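For context, the baseline SR classifier this paper builds on can be sketched as follows: code the test sample over the training dictionary with an l1-regularized least squares fit, then assign the class with the smallest reconstruction residual. This is the standard SRC scheme (here via scikit-learn's Lasso), not the ABC-solved ASRC itself.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_predict(D, labels, y, alpha=0.01):
    """D: (n_features, n_train) dictionary with training samples as columns,
    labels: (n_train,) class labels, y: (n_features,) test sample."""
    # l1-regularized least squares: min ||y - D x||^2 + alpha * ||x||_1
    coef = Lasso(alpha=alpha, max_iter=5000).fit(D, y).coef_
    # Class-wise reconstruction residual using only that class's coefficients.
    residuals = {c: np.linalg.norm(y - D @ np.where(labels == c, coef, 0.0))
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)
```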
In this paper, we propose a new regularization scheme for the well-known Support Vector Machine (SVM) classifier that operates at the training sample level. The proposed approach is motivated by the fact that maximum-margin classification defines decision functions as a linear combination of the selected training data; thus, variations in training sample selection directly affect generalization performance. We show that the proposed regularization scheme is well motivated and intuitive. Experimental results show that it outperforms the standard SVM in human action recognition tasks as well as in classical recognition problems.
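For reference, the maximum-margin decision function alluded to above has the standard kernel SVM form, a linear combination over the set S of selected (support) training samples, which is why the choice of those samples matters:

```latex
f(\mathbf{x}) = \operatorname{sign}\!\Big( \sum_{i \in S} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big)
```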
With crime on the rise around the world, video surveillance is becoming more important day by day. Due to the lack of human resources to monitor the increasing number of cameras manually, new computer vision algorithms that perform both lower- and higher-level tasks are being developed. We have developed a new method that combines the widely used Histograms of Oriented Gradients (HOG) descriptor, the theory of visual saliency, and the Deep Multi-Level Network saliency prediction model to detect human beings in video sequences. Furthermore, we applied the k-means algorithm to cluster the HOG feature vectors of the positively detected windows and determined the path followed by a person in the video. We achieved a detection precision of 83.11% and a recall of 41.27%. These results were obtained 76.866 times faster than classification on the original images.
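A minimal sketch of the detect-then-cluster idea described above, assuming OpenCV and scikit-learn: people are found with the stock HOG + linear SVM pedestrian detector, and the HOG vectors of the positive windows are clustered with k-means. The saliency pre-filtering stage is omitted here.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

hog = cv2.HOGDescriptor()  # default 64x128 detection window
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_and_cluster(frame, n_clusters=2):
    # Positive detection windows for people in the frame.
    rects, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    # HOG feature vector of each positive window, resized to the HOG window.
    feats = [hog.compute(cv2.resize(frame[y:y + h, x:x + w], (64, 128))).ravel()
             for (x, y, w, h) in rects]
    if len(feats) < n_clusters:  # too few detections to cluster
        return rects, None
    return rects, KMeans(n_clusters=n_clusters, n_init=10).fit_predict(np.array(feats))
```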
We present a novel multimodal fusion model for affective content analysis, combining visual, audio and deep visual-sentiment descriptors from the media content with automated facial action measurements from naturalistic responses to the media. We collected a dataset of 48,867 facial responses to 384 media clips and extracted a rich feature set from the facial responses and media content. The stimulus videos were validated to be informative, inspiring, persuasive, sentimental or amusing. By combining the features, we obtained a classification accuracy of 63% (weighted F1-score: 0.62) on a five-class task, a significant improvement over using the media content features alone. By analyzing the feature sets independently, we found that the informed and persuaded states were difficult to differentiate from facial responses alone, due to the presence of similar sets of action units in each state (AU 2 occurring frequently in both cases). Facial actions were beneficial in differentiating between the amused and informed states, whereas media content features alone performed less well due to similarities in the visual and audio make-up of the content. We highlight examples of content and reactions from each class. This is the first affective content analysis based on the reactions of tens of thousands of people.
Protecting the privacy of user-identification data is fundamental to protecting information systems from attacks and vulnerabilities. Providing access to such data only to limited and legitimate users is the key motivation for `Biometrics'. In biometric systems, reliably confirming a user's claim to his/her identity is more important than focusing on `what he/she possesses' or `what he/she remembers'. In this paper, the use of face images for biometric access is proposed via two multistage face recognition algorithms that employ biometric facial features to validate the user's claim. The proposed algorithms use standard algorithms and classifiers such as Eigenfaces, PCA and LDA in stages. Performance evaluation of both proposed algorithms is carried out on two standard datasets, the Extended Yale database and the AT&T database. Results using the proposed multistage algorithms are better than those using other standard algorithms. Current limitations and possible applications of the proposed algorithms are also discussed, along with further scope for making them robust to pose, illumination and noise variations.
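A hedged sketch of a staged PCA-then-LDA face recognizer of the kind this abstract names (eigenface projection followed by a discriminant stage), not the authors' exact multistage algorithm. It uses scikit-learn's Olivetti faces loader, which serves the AT&T (ORL) database; the split and component count are illustrative choices.

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

faces = fetch_olivetti_faces()  # the AT&T (ORL) face database
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, test_size=0.25, stratify=faces.target, random_state=0)

model = make_pipeline(
    PCA(n_components=100, whiten=True),  # stage 1: eigenface projection
    LinearDiscriminantAnalysis(),        # stage 2: class-discriminant (LDA) classifier
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))       # recognition accuracy on held-out faces
```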
A major privacy issue arises from mass visual media distribution in modern video sharing, social media and cloud services: malicious users can exploit these services to track the actions of certain individuals and/or groups, thus violating their privacy. As a result, there is a need to hinder automatic facial image identification in images and videos. In this paper, we propose a method for de-identifying facial images. Contrary to most de-identification methods, this method manipulates facial images so that humans can still recognize the individual or individuals in an image or video frame, while common automatic identification algorithms fail to do so. This is achieved by projecting the facial images onto a hypersphere. The conducted experiments verify that this method is effective in reducing classification accuracy to under 10%, while in the resulting images the subjects can still be identified by human viewers.
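A minimal sketch of the hypersphere projection as described: each flattened facial image is mapped to a point on a hypersphere of fixed radius, normalizing away the magnitude information that automatic classifiers partly rely on. The radius is an illustrative parameter, not a value from the paper.

```python
import numpy as np

def project_to_hypersphere(img, radius=1.0):
    """Map a flattened facial image onto a hypersphere of the given radius."""
    x = np.asarray(img, dtype=np.float64).ravel()
    return radius * x / np.linalg.norm(x)  # unit direction scaled to the radius
```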
Segmentation of land and water regions is necessary in many applications involving analysis of remote sensing imagery. Not only is manual segmentation of these regions prone to considerable subjective variability, but the large volume of imagery collected by modern platforms makes manual segmentation extremely tedious, particularly in applications that require frequent re-measurement. This paper examines a robust, semi-automated approach that utilizes simple and efficient machine learning algorithms to perform supervised classification of multi-spectral image data into land and water regions. By combining the four wavelength bands widely available from imaging platforms such as IKONOS, QuickBird, and GeoEye-1 with basic texture metrics, high-quality segmentation can be achieved. An efficient workflow was created by constructing a Graphical User Interface (GUI) for these machine learning algorithms.
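A sketch of the band-plus-texture pixel classification described above, assuming a 4-band image stack with NumPy/SciPy/scikit-learn. The local standard deviation stands in for the paper's unnamed texture metrics, and the random forest stands in for its unspecified "simple and efficient" classifier.

```python
import numpy as np
from scipy.ndimage import generic_filter
from sklearn.ensemble import RandomForestClassifier

def pixel_features(bands):
    """bands: (H, W, 4) array of the four spectral bands (e.g. B, G, R, NIR)."""
    texture = generic_filter(bands[..., 3], np.std, size=5)  # local NIR variability
    return np.dstack([bands, texture]).reshape(-1, 5)        # 4 bands + 1 texture metric

# Train on labeled pixels, then predict a land/water mask (sketch):
# clf = RandomForestClassifier(n_estimators=100).fit(
#     pixel_features(train_img), train_labels.ravel())
# mask = clf.predict(pixel_features(test_img)).reshape(test_img.shape[:2])
```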
Visual tracking usually requires a robust appearance model that can handle pose variation, illumination variation, occlusion and the many other interferences that occur in video. To date, a number of tracking algorithms use image samples from previous frames to update their appearance models. That approach has notable limitations: 1) at the beginning of tracking, there is not yet sufficient data for online updates, because these adaptive models are data-dependent; and 2) in many challenging situations, robustly updating the appearance model is difficult, which often results in drift. In this paper, we propose a tracking algorithm based on compressive sensing theory and the particle filter framework. Features are extracted by random projection with a data-independent basis. A particle filter is employed to estimate the target location more accurately and to make full use of the updated classifier. The robustness and effectiveness of our tracker are demonstrated in several experiments.
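The data-independent feature step can be sketched as follows: raw patch intensities are compressed through a fixed sparse random projection whose basis depends only on the input dimensions, never on the video. This illustrates the compressive feature extraction only; the particle filter and classifier update are omitted, and the patch size and dimensionality are illustrative.

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection

rp = SparseRandomProjection(n_components=50, random_state=0)  # fixed sparse basis
rp.fit(np.zeros((1, 32 * 32)))  # basis is set by the input size alone, not the data

def compress(patch):
    """patch: 32x32 grayscale candidate window -> 50-D compressive feature."""
    return rp.transform(patch.reshape(1, -1)).ravel()
```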
In this paper, we focus on the detection of both road vehicles and pedestrians, namely obstacle detection, and propose a new obstacle detection and classification technique for dynamic backgrounds. Obstacle detection is based on inverse perspective mapping (IPM) and homography, while obstacle classification is based on a fuzzy neural network. The estimation of the vanishing point relies on a feature extraction strategy that segments the lane markings of the images by combining histogram-based segmentation with temporal filtering. The vanishing point of each image is then stabilized by temporal filtering over the estimates from previous images, and the IPM image is computed from the stabilized vanishing point. The method exploits the geometrical relations between the elements in the scene so that obstacles can be detected. The estimated homography of the road plane between successive images is used for image alignment. A new fuzzy decision fusion method with fuzzy attribution for obstacle detection and classification is described. The fuzzy decision function adapts its parameters with a self-adjusting algorithm to obtain a better classification probability. It is shown that the method achieves better classification results.
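A minimal sketch of the IPM step using OpenCV: four road-plane points in the image are mapped to a rectangle in the bird's-eye view via a homography. The point coordinates and output size here are placeholders; in the paper they would follow from the stabilized vanishing point.

```python
import cv2
import numpy as np

# Four points on the road plane in the image and their bird's-eye targets.
src = np.float32([[420, 300], [520, 300], [760, 480], [180, 480]])  # image plane
dst = np.float32([[0, 0], [200, 0], [200, 400], [0, 400]])          # ground plane
H = cv2.getPerspectiveTransform(src, dst)                           # road-plane homography

def to_ipm(frame):
    """Warp a road image into the bird's-eye (IPM) view."""
    return cv2.warpPerspective(frame, H, (200, 400))
```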
The existence of mixed pixels is a major problem in remote-sensing image classification. Although soft classification and spectral unmixing techniques can obtain the abundances of the different classes in a pixel, thereby addressing the mixed pixel problem, the subpixel spatial attribution of the pixel remains unknown. The subpixel mapping technique can effectively solve this problem by providing a fine-resolution map of class labels from coarser, spectrally unmixed fraction images. However, most traditional subpixel mapping algorithms treat all mixed pixels as an identical type, either boundary-mixed pixels or linear subpixels, leading to incomplete and inaccurate results. To improve subpixel mapping accuracy, this paper proposes an adaptive subpixel mapping framework based on a multiagent system for remote-sensing imagery. In the proposed framework, three kinds of agents, namely feature detection agents, subpixel mapping agents and decision agents, are designed to solve the subpixel mapping problem. Experiments with artificial images and synthetic remote-sensing images were performed to evaluate the performance of the proposed algorithm in comparison with a hard classification method and two other subpixel mapping algorithms: subpixel mapping based on a back-propagation neural network and the spatial attraction model. The experimental results indicate that the proposed algorithm outperforms the other two subpixel mapping algorithms in reconstructing the different structures within mixed pixels.
To deliver sample estimates with the probability foundation necessary to permit generalization from the sample data subset to the whole target population being sampled, probability sampling strategies are required to satisfy the following necessary but not sufficient conditions: 1) all inclusion probabilities must be greater than zero in the target population to be sampled; if some sampling units have an inclusion probability of zero, then a map accuracy assessment does not represent the entire target region depicted in the map to be assessed; and 2) the inclusion probabilities must be a) knowable for nonsampled units and b) known for those units selected in the sample, since the inclusion probability determines the weight attached to each sampling unit in the accuracy estimation formulas; if the inclusion probabilities are unknown, so are the estimation weights. This original work presents a novel (to the best of these authors' knowledge, the first) probability sampling protocol for quality assessment and comparison of thematic maps generated from spaceborne/airborne very high resolution images, in which: 1) an original Categorical Variable Pair Similarity Index (proposed in two different formulations) is estimated as a fuzzy degree of match between a reference and a test semantic vocabulary, which may not coincide, and 2) both symbolic pixel-based thematic quality indicators (TQIs) and sub-symbolic object-based spatial quality indicators (SQIs) are estimated with a degree of uncertainty in measurement, in compliance with the well-known Quality Assurance Framework for Earth Observation (QA4EO) guidelines. Like a decision tree, any protocol (guidelines for best practice) comprises a set of rules, equivalent to structural knowledge, and an order of presentation of the rule set, known as procedural knowledge. The combination of these two levels of knowledge makes an original protocol worth more than the sum of its parts. The several degrees of novelty of the proposed probability sampling protocol are highlighted in this paper, at the levels of understanding of both structural and procedural knowledge, in comparison with related multi-disciplinary works selected from the existing literature. In the experimental session, the proposed protocol is tested for accuracy validation of preliminary classification maps automatically generated by the Satellite Image Automatic Mapper (SIAM™) software product from two WorldView-2 images and one QuickBird-2 image provided by DigitalGlobe for testing purposes. In these experiments, the collected TQIs and SQIs are statistically valid, statistically significant, consistent across maps, and in agreement with theoretical expectations, visual (qualitative) evidence and the quantitative quality indexes of operativeness (OQIs) claimed for SIAM™ in related papers. As a subsidiary conclusion, the statistically consistent and statistically significant accuracy validation of the SIAM™ pre-classification maps proposed in this contribution, together with the OQIs claimed for SIAM™ in related works, makes the operational (automatic, accurate, near real-time, robust, scalable) SIAM™ software product eligible for opening up new inter-disciplinary research and market opportunities, in accordance with the visionary goal of the Global Earth Observation System of Systems initiative and the QA4EO international guidelines.
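The weighting role of the inclusion probabilities mentioned above follows the standard design-based (Horvitz-Thompson) form, in which each unit i of the sample s is weighted by the inverse of its inclusion probability (our notation, not the paper's), so an unknown probability leaves the estimator undefined:

```latex
\hat{Y} = \sum_{i \in s} \frac{y_i}{\pi_i}
```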