Biblio
Relevance feedback can be considered as a learning problem. It has been extensively used to improve the performance of retrieval multimedia information. In this paper, after the relevance feedback upon content-based image retrieval (CBIR) discussed, a hybrid learning scheme on multi-target retrieval (MTR) with relevance feedback was proposed. Suppose the symbolic image database (SID) of object-level with combined image metadata and feature model was constructed. During the interactive query for remote sensing image, we calculate the similarity metric so as to get the relevant image sets from the image library. For the purpose of further improvement of the precision of image retrieval, a hybrid learning scheme parameter also need to be chosen. As a result, the idea of our hybrid learning scheme contains an exception maximization algorithm (EMA) used for retrieving the most relevant images from SID and an algorithm called supported vector machine (SVM) with relevance feedback used for learning the feedback information substantially. Experimental results show that our hybrid learning scheme with relevance feedback on MTR can improve the performance and accuracy compared the basic algorithms.
Recent years, more and more testing criteria for deep learning systems has been proposed to ensure system robustness and reliability. These criteria were defined based on different perspectives of diversity. However, there lacks comprehensive investigation on what are the most essential diversities that should be considered by a testing criteria for deep learning systems. Therefore, in this paper, we conduct an empirical study to investigate the relation between test diversities and erroneous behaviors of deep learning models. We define five metrics to reflect diversities in neuron activities, and leverage metamorphic testing to detect erroneous behaviors. We investigate the correlation between metrics and erroneous behaviors. We also go further step to measure the quality of test suites under the guidance of defined metrics. Our results provided comprehensive insights on the essential diversities for testing criteria to exhibit good fault detection ability.
The goal of content-based recommendation system is to retrieve and rank the list of items that are closest to the query item. Today, almost every e-commerce platform has a recommendation system strategy for products that customers can decide to buy. In this paper we describe our work on creating a Generative Adversarial Network based image retrieval system for e-commerce platforms to retrieve best similar images for a given product image specifically for shoes. We compare state-of-the-art solutions and provide results for the proposed deep learning network on a standard data set.
Recently, hashing has attracted considerable attention for nearest neighbor search due to its fast query speed and low storage cost. However, existing unsupervised hashing algorithms have two problems in common. Firstly, the widely utilized anchor graph construction algorithm has inherent limitations in local weight estimation. Secondly, the locally linear structure in the original feature space is seldom taken into account for binary encoding. Therefore, in this paper, we propose a novel unsupervised hashing method, dubbed “discrete locally-linear preserving hashing”, which effectively calculates the adjacent matrix while preserving the locally linear structure in the obtained hash space. Specifically, a novel local anchor embedding algorithm is adopted to construct the approximate adjacent matrix. After that, we directly minimize the reconstruction error with the discrete constrain to learn the binary codes. Experimental results on two typical image datasets indicate that the proposed method significantly outperforms the state-of-the-art unsupervised methods.
The storage efficiency of hash codes and their application in the fast approximate nearest neighbor search, along with the explosion in the size of available labeled image datasets caused an intensive interest in developing learning based hash algorithms recently. In this paper, we present a learning based hash algorithm that utilize ordinal information of feature vectors. We have proposed a novel mathematically differentiable approximation of argmax function for this hash algorithm. It has enabled seamless integration of hash function with deep neural network architecture which can exploit the rich feature vectors generated by convolutional neural networks. We have also proposed a loss function for the case that the hash code is not binary and its entries are digits of arbitrary k-ary base. The resultant model comprised of feature vector generation and hashing layer is amenable to end-to-end training using gradient descent methods. In contrast to the majority of current hashing algorithms that are either not learning based or use hand-crafted feature vectors as input, simultaneous training of the components of our system results in better optimization. Extensive evaluations on NUS-WIDE, CIFAR-10 and MIRFlickr benchmarks show that the proposed algorithm outperforms state-of-art and classical data agnostic, unsupervised and supervised hashing methods by 2.6% to 19.8% mean average precision under various settings.
With the rapid development of information technology, video surveillance system has become a key part in the security and protection system of modern cities. Especially in prisons, surveillance cameras could be found almost everywhere. However, with the continuous expansion of the surveillance network, surveillance cameras not only bring convenience, but also produce a massive amount of monitoring data, which poses huge challenges to storage, analytics and retrieval. The smart monitoring system equipped with intelligent video analytics technology can monitor as well as pre-alarm abnormal events or behaviours, which is a hot research direction in the field of surveillance. This paper combines deep learning methods, using the state-of-the-art framework for instance segmentation, called Mask R-CNN, to train the fine-tuning network on our datasets, which can efficiently detect objects in a video image while simultaneously generating a high-quality segmentation mask for each instance. The experiment show that our network is simple to train and easy to generalize to other datasets, and the mask average precision is nearly up to 98.5% on our own datasets.
Image retrieval systems have been an active area of research for more than thirty years progressively producing improved algorithms that improve performance metrics, operate in different domains, take advantage of different features extracted from the images to be retrieved, and have different desirable invariance properties. With the ever-growing visual databases of images and videos produced by a myriad of devices comes the challenge of selecting effective features and performing fast retrieval on such databases. In this paper, we incorporate Fourier descriptors (FD) along with a metric-based balanced indexing tree as a viable solution to DHS (Department of Homeland Security) needs to for quick identification and retrieval of weapon images. The FDs allow a simple but effective outline feature representation of an object, while the M-tree provide a dynamic, fast, and balanced search over such features. Motivated by looking for applications of interest to DHS, we have created a basic guns and rifles databases that can be used to identify weapons in images and videos extracted from media sources. Our simulations show excellent performance in both representation and fast retrieval speed.
We, humans, have the ability to easily imagine scenes that depict sentences such as ``Today is a beautiful sunny day'' or ``There is a Christmas feel, in the air''. While it is hard to precisely describe what one person may imagine, the essential high-level themes associated with such sentences largely remains the same. The ability to synthesize novel images that depict the feel of a sentence is very useful in a variety of applications such as education, advertisement, and entertainment. While existing papers tackle this problem given a style image, we aim to provide a far more intuitive and easy to use solution that synthesizes novel renditions of an existing image, conditioned on a given sentence. We present a method for cross-modal style transfer between an English sentence and an image, to produce a new image that imbibes the essential theme of the sentence. We do this by modifying the style transfer mechanism used in image style transfer to incorporate a style component derived from the given sentence. We demonstrate promising results using the YFCC100m dataset.
We present a framework for learning to describe finegrained visual differences between instances using attribute phrases. Attribute phrases capture distinguishing aspects of an object (e.g., “propeller on the nose” or “door near the wing” for airplanes) in a compositional manner. Instances within a category can be described by a set of these phrases and collectively they span the space of semantic attributes for a category. We collect a large dataset of such phrases by asking annotators to describe several visual differences between a pair of instances within a category. We then learn to describe and ground these phrases to images in the context of a reference game between a speaker and a listener. The goal of a speaker is to describe attributes of an image that allows the listener to correctly identify it within a pair. Data collected in a pairwise manner improves the ability of the speaker to generate, and the ability of the listener to interpret visual descriptions. Moreover, due to the compositionality of attribute phrases, the trained listeners can interpret descriptions not seen during training for image retrieval, and the speakers can generate attribute-based explanations for differences between previously unseen categories. We also show that embedding an image into the semantic space of attribute phrases derived from listeners offers 20% improvement in accuracy over existing attributebased representations on the FGVC-aircraft dataset.
This paper proposes a novel recursive hashing scheme, in contrast to conventional "one-off" based hashing algorithms. Inspired by human's "nonsalient-to-salient" perception path, the proposed hashing scheme generates a series of binary codes based on progressively expanded salient regions. Built on a recurrent deep network, i.e., LSTM structure, the binary codes generated from later output nodes naturally inherit information aggregated from previously codes while explore novel information from the extended salient region, and therefore it possesses good scalability property. The proposed deep hashing network is trained via minimizing a triplet ranking loss, which is end-to-end trainable. Extensive experimental results on several image retrieval benchmarks demonstrate good performance gain over state-of-the-art image retrieval methods and its scalability property.
In recent years, binary coding techniques are becoming increasingly popular because of their high efficiency in handling large-scale computer vision applications. It has been demonstrated that supervised binary coding techniques that leverage supervised information can significantly enhance the coding quality, and hence greatly benefit visual search tasks. Typically, a modern binary coding method seeks to learn a group of coding functions which compress data samples into binary codes. However, few methods pursued the coding functions such that the precision at the top of a ranking list according to Hamming distances of the generated binary codes is optimized. In this paper, we propose a novel supervised binary coding approach, namely Top Rank Supervised Binary Coding (Top-RSBC), which explicitly focuses on optimizing the precision of top positions in a Hamming-distance ranking list towards preserving the supervision information. The core idea is to train the disciplined coding functions, by which the mistakes at the top of a Hamming-distance ranking list are penalized more than those at the bottom. To solve such coding functions, we relax the original discrete optimization objective with a continuous surrogate, and derive a stochastic gradient descent to optimize the surrogate objective. To further reduce the training time cost, we also design an online learning algorithm to optimize the surrogate objective more efficiently. Empirical studies based upon three benchmark image datasets demonstrate that the proposed binary coding approach achieves superior image search accuracy over the state-of-the-arts.