Biblio

Filters: Keyword is image retrieval
2023-07-21
Sivasangari, A., Gomathi, R. M., Anandhi, T., Roobini, Roobini, Ajitha, P..  2022.  Facial Recognition System using Decision Tree Algorithm. 2022 3rd International Conference on Electronics and Sustainable Communication Systems (ICESC). :1542–1546.
Face recognition technology is widely employed in a variety of applications, including public security, criminal identification, and multimedia data management. Because of its importance for both practical applications and theoretical research, facial recognition has received a lot of attention, and numerous strategies have been proposed, each of which has proven beneficial in the field of facial and pattern recognition. Despite these advances, face recognition still faces substantial hurdles in unconstrained settings. This paper presents deep learning techniques for facial recognition aimed at the accurate detection and identification of facial images; the primary goal of facial recognition is to recognize and validate facial features. The database consists of 500 color images of people, which are pre-processed and from which features are extracted using Linear Discriminant Analysis. These features are split into 70 percent for training and 30 percent for testing decision tree classifiers, which are used to compute the performance of the face recognition system.
2022-06-09
Yan, Longchuan, Zhang, Zhaoxia, Huang, Huige, Yuan, Xiaoyu, Peng, Yuanlong, Zhang, Qingyun.  2021.  An Improved Deep Pairwise Supervised Hashing Algorithm for Fast Image Retrieval. 2021 IEEE 2nd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA). 2:1152–1156.
In recent years, hashing algorithms have been widely researched and have made considerable progress in large-scale image retrieval tasks, owing to their compact storage and fast computation. Most researchers now use deep convolutional neural networks (CNNs) to learn features and hash codes simultaneously for image retrieval, and deep hashing methods based on deep CNNs perform much better than traditional hashing methods built on hand-crafted features. However, most methods are designed to handle simple binary similarity and to reduce quantization error, overlooking the fact that the features of similar images, and the hash codes generated from them, are not compact enough. To enhance the performance of CNN-based hashing algorithms for large-scale image retrieval, this paper proposes a new deep supervised hashing algorithm that adds a novel channel attention mechanism and carefully redesigns the loss function to generate compact binary codes. Experiments show that, compared with existing hashing methods, this method performs better on two large-scale image datasets, CIFAR-10 and NUS-WIDE.
2022-03-08
Kim, Ji-Hoon, Park, Yeo-Reum, Do, Jaeyoung, Ji, Soo-Young, Kim, Joo-Young.  2021.  Accelerating Large-Scale Nearest Neighbor Search with Computational Storage Device. 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). :254.
The K-nearest neighbor (KNN) algorithm, which searches for the K closest samples in a high-dimensional feature space, is one of the most fundamental tasks in machine learning and image retrieval applications. Computational storage devices, which combine a computing unit and a storage module on a single board, have become a popular way to address the data bandwidth bottleneck of conventional computing systems. In this paper, we propose a nearest neighbor search acceleration platform based on a computational storage device, which can process a large-scale image dataset efficiently in terms of speed, energy, and cost. We believe that the proposed acceleration platform is a promising candidate for deployment in cloud datacenters for data-intensive applications.
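The exhaustive search that such a platform accelerates can be written as a short CPU baseline. This is a minimal Python sketch assuming Euclidean distance over in-memory feature vectors; it is not the authors' computational-storage design, and the helper name `knn` is hypothetical.

```python
import heapq
import math

def knn(database, query, k):
    """Return the indices of the k database vectors closest to `query` (Euclidean)."""
    # heapq.nsmallest keeps only k candidates while scanning every vector once
    return heapq.nsmallest(
        k,
        range(len(database)),
        key=lambda i: math.dist(database[i], query),
    )

db = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
print(knn(db, (1.0, 1.0), 2))  # [1, 3]
```

Every query touches every stored vector, which is exactly the data-movement cost that moving computation next to storage is meant to reduce.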
2022-02-10
Song, Fuyuan, Qin, Zheng, Zhang, Jixin, Liu, Dongxiao, Liang, Jinwen, Shen, Xuemin Sherman.  2020.  Efficient and Privacy-preserving Outsourced Image Retrieval in Public Clouds. GLOBECOM 2020 - 2020 IEEE Global Communications Conference. :1–6.
With the proliferation of cloud services, cloud-based image retrieval services enable large-scale image outsourcing and ubiquitous image searching. Despite these benefits, such services raise critical privacy concerns, since the outsourced images may contain sensitive personal information. In this paper, we propose an efficient and Privacy-Preserving Image Retrieval scheme with Key Switching Technique (PPIRS). PPIRS uses inner product encryption to measure Euclidean distances between image feature vectors and query vectors in a privacy-preserving manner. Because the image feature vectors are high-dimensional and the image databases are large, traditional secure Euclidean distance comparison methods offer insufficient search efficiency. To prune the search space of image retrieval, PPIRS tailors the key switching technique (KST) to reduce the dimension of the encrypted image feature vectors, which also lowers communication overhead. Meanwhile, by introducing locality-sensitive hashing (LSH), PPIRS builds efficient searchable indexes for image retrieval by organizing similar images into buckets. Security analysis shows that the privacy of both outsourced images and queries is guaranteed. Extensive experiments on a real-world dataset demonstrate that PPIRS achieves efficient image retrieval in terms of computational cost.
ISSN: 2576-6813
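The bucketing idea can be illustrated, without any of PPIRS's encryption, by a plaintext p-stable LSH index for Euclidean distance. The sketch below assumes Gaussian projections and hypothetical parameter values (`num_hashes`, `bucket_width`); it shows only how similar vectors are grouped so that a query scans one bucket instead of the whole database, not the PPIRS scheme itself.

```python
import math
import random
from collections import defaultdict

class EuclideanLSH:
    """Toy p-stable LSH index: vectors close in Euclidean distance tend to share a key."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        # Each hash is h(x) = floor((a . x + b) / w) with a ~ N(0, I) and b ~ U[0, w)
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.w = bucket_width
        self.buckets = defaultdict(list)

    def key(self, x):
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

    def add(self, item_id, x):
        self.buckets[self.key(x)].append(item_id)

    def candidates(self, q):
        # Only the query's own bucket is scanned, pruning the search space
        return list(self.buckets.get(self.key(q), []))

index = EuclideanLSH(dim=3)
index.add("img1", [1.0, 2.0, 3.0])
index.add("img2", [1.1, 2.0, 2.9])
print(index.candidates([1.0, 2.0, 3.0]))  # contains "img1"; "img2" only if it landed in the same bucket
```

In practice several such tables are usually queried and their candidates merged, since a single table can split near neighbors across bucket boundaries.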
2021-03-29
Li, J., Wang, X., Liu, S..  2020.  Hash Retrieval Method for Recaptured Images Based on Convolutional Neural Network. 2020 2nd World Symposium on Artificial Intelligence (WSAI). :79–83.
For outdoor advertising market research, ad images are recaptured and uploaded every day for statistical analysis. However, the quality of recaptured advertising images is often affected by shooting conditions such as angle, distance, and lighting, which in turn reduces either the speed or the accuracy of the retrieval algorithm. In this paper, we propose a hash retrieval method based on convolutional neural networks for recaptured images. The basic idea is to add a hash layer to the convolutional neural network and then use the binary hash code output by that layer to perform image retrieval in a low-dimensional Hamming space. Experimental results show that retrieval performance improves compared with currently common hash retrieval methods.
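The retrieval step described above, binarizing hash-layer outputs and ranking by Hamming distance, can be sketched as follows. This assumes sigmoid-style activations in (0, 1) thresholded at 0.5 (some networks use tanh and threshold at 0 instead); the image names and activation values are hypothetical.

```python
def to_binary_code(activations):
    """Binarize hash-layer outputs, assumed to lie in (0, 1), e.g. after a sigmoid."""
    return tuple(1 if a >= 0.5 else 0 for a in activations)

def hamming(c1, c2):
    """Number of bit positions in which two codes differ."""
    return sum(b1 != b2 for b1, b2 in zip(c1, c2))

def retrieve(query_code, database_codes, top_k=3):
    """Rank database items (name -> code) by Hamming distance to the query code."""
    ranked = sorted(database_codes.items(), key=lambda kv: hamming(query_code, kv[1]))
    return [name for name, _ in ranked[:top_k]]

db = {
    "ad_a": to_binary_code([0.9, 0.1, 0.8, 0.2]),  # (1, 0, 1, 0)
    "ad_b": to_binary_code([0.1, 0.9, 0.2, 0.8]),  # (0, 1, 0, 1)
    "ad_c": to_binary_code([0.8, 0.2, 0.3, 0.4]),  # (1, 0, 0, 0)
}
query = to_binary_code([0.7, 0.3, 0.6, 0.1])
print(retrieve(query, db, top_k=2))  # ['ad_a', 'ad_c']
```

Hamming distance reduces to XOR and bit counting on packed integers, which is why short binary codes make retrieval fast.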
Al-Janabi, S. I. Ali, Al-Janabi, S. T. Faraj, Al-Khateeb, B..  2020.  Image Classification using Convolution Neural Network Based Hash Encoding and Particle Swarm Optimization. 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI). :1–5.
Image Retrieval (IR) has recently become one of the main problems facing the computing community. To make computing similarities between images more efficient, hashing approaches have become the focus of many researchers. In the past few years, Deep Learning (DL) with Convolutional Neural Networks (CNNs) has been considered the backbone of image analysis. This paper aims to design and implement a high-performance image classifier that can be used in applications such as intelligent vehicles, face recognition, and marketing. This work experiments to find the best configuration of a sequential model for classifying images. The best performance was obtained with a two-layer architecture, in which the first layer has 128 nodes and the second 32 nodes, reaching an accuracy of up to 0.9012. The proposed classifier is built using a CNN and data extracted from the CIFAR-10 dataset by the Inception model, called Transfer Values (TRVs). The Particle Swarm Optimization (PSO) algorithm is used to reduce the TRVs, with the aim of obtaining high-performance image classifier models. The PSO algorithm is further enhanced with the crossover technique from genetic algorithms. This reduces model complexity in terms of both the number of parameters and the execution time.
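The mechanics of PSO enhanced with a genetic-algorithm crossover step can be sketched on a toy problem. In the paper PSO is used to reduce the TRVs; here, as an assumption for illustration only, it minimizes a simple continuous test function, and the arithmetic-blend crossover and all parameter values are hypothetical choices, not the authors' settings.

```python
import random

def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)

def pso_with_crossover(f, dim=2, n_particles=20, iters=100, crossover_rate=0.2,
                       w=0.729, c1=1.494, c2=1.494, bound=5.0, seed=0):
    """Minimize f with particle swarm optimization plus a GA-style crossover step."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    best_i = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best_i][:], pbest_val[best_i]

    for _ in range(iters):
        # Standard velocity/position update toward personal and global bests
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
        # Arithmetic crossover: blend some particles with a random partner
        for i in range(n_particles):
            if rng.random() < crossover_rate:
                j = rng.randrange(n_particles)
                alpha = rng.random()
                pos[i] = [alpha * a + (1 - alpha) * b for a, b in zip(pos[i], pos[j])]
        # Update personal and global bests
        for i in range(n_particles):
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best_x, best_val = pso_with_crossover(sphere)
print(best_x, best_val)  # best_val should end up close to 0
```

For a feature-reduction task, each particle would instead encode which transfer values to keep, and `f` would score a classifier trained on that subset.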
2021-02-08
Saleh, A. H., Yousif, A. S., Ahmed, F. Y. H..  2020.  Information Hiding for Text Files by Adopting the Genetic Algorithm and DNA Coding. 2020 IEEE 10th Symposium on Computer Applications & Industrial Electronics (ISCAIE). :220–223.
Information hiding is the process of concealing data within digital media such as images, audio, video, and text. Many techniques exist for hiding information in images. In this paper, a new method is proposed for hiding data (a text file), in which a transposition cipher is first employed for encryption. It can be used to build an encrypted text and to increase security against possible attacks while the text is sent over the World Wide Web. A genetic algorithm is applied to adjust the encoded text, and DNA coding is used to create an encrypted text that is difficult to detect; this text is then embedded in the image, which affects the image's visual quality. The proposed method outperforms the state of the art in efficiently retrieving the embedded messages. The performance evaluation recorded high visual quality scores in terms of SNR (signal-to-noise ratio), PSNR (peak signal-to-noise ratio), and MSE (mean squared error).
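The MSE and PSNR scores mentioned above follow standard definitions, sketched below for 8-bit grayscale images stored as lists of rows. The image values are hypothetical, and SNR is omitted because its exact definition varies between papers.

```python
import math

def mse(original, stego):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(original, stego)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def psnr(original, stego, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(original, stego)
    return math.inf if err == 0 else 10 * math.log10(max_value ** 2 / err)

cover = [[10, 20], [30, 40]]
stego = [[10, 21], [30, 39]]  # two pixels changed by one intensity level
print(mse(cover, stego), psnr(cover, stego))  # 0.5 and roughly 51.1 dB
```

Higher PSNR (lower MSE) means the embedding is less visible, so a good hiding scheme aims to keep PSNR high while carrying the payload.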
2020-12-11
Zhou, Y., Zeng, Z..  2019.  Info-Retrieval with Relevance Feedback using Hybrid Learning Scheme for RS Image. 2019 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :135–138.

Relevance feedback can be considered a learning problem. It has been used extensively to improve the performance of multimedia information retrieval. In this paper, after discussing relevance feedback in content-based image retrieval (CBIR), we propose a hybrid learning scheme for multi-target retrieval (MTR) with relevance feedback. We assume that an object-level symbolic image database (SID), combining image metadata with a feature model, has been constructed. During interactive querying of remote sensing images, we calculate a similarity metric to obtain the relevant image sets from the image library. To further improve the precision of image retrieval, the parameters of the hybrid learning scheme also need to be chosen. Our hybrid learning scheme combines an expectation maximization algorithm (EMA), used to retrieve the most relevant images from the SID, with a support vector machine (SVM) with relevance feedback, used to learn from the feedback information. Experimental results show that our hybrid learning scheme with relevance feedback on MTR can improve performance and accuracy compared with the basic algorithms.
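How a single round of relevance feedback reshapes a query can be shown with the classic Rocchio update. Note that this is a different, simpler technique swapped in for illustration: the paper's scheme uses EM and an SVM, not Rocchio. The weights `alpha`, `beta`, `gamma` default to conventional textbook values and are assumptions here.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query feature vector toward relevant and away from non-relevant examples."""
    dim = len(query)

    def centroid(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

    rel_c, non_c = centroid(relevant), centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel_c, non_c)]

# User marked [0, 1] as relevant and [1, 0] as not relevant
print(rocchio([1.0, 0.0], [[0.0, 1.0]], [[1.0, 0.0]], alpha=1.0, beta=1.0, gamma=1.0))  # [0.0, 1.0]
```

The updated vector is then used for the next similarity search; a learned model such as an SVM replaces this fixed rule with a decision boundary fitted to the labeled feedback.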

2020-12-01
Garbo, A., Quer, S..  2018.  A Fast MPEG’s CDVS Implementation for GPU Featured in Mobile Devices. IEEE Access. 6:52027–52046.
The Moving Picture Experts Group's Compact Descriptors for Visual Search (MPEG's CDVS) aims to standardize technologies that enable an interoperable, efficient, and cross-platform solution for internet-scale visual search applications and services. The key technologies within CDVS include the format of visual descriptors, the descriptor extraction process, and the algorithms for indexing and matching. Unfortunately, these steps require precision and computational accuracy. Moreover, they are very time-consuming, with running times on the order of seconds when implemented on the central processing unit (CPU) of modern mobile devices. In this paper, to reduce computation times while maintaining precision and accuracy, we redesign all the main phases of the MPEG CDVS local descriptor extraction pipeline for many-core embedded graphics processing units (GPUs). To reach this goal, we introduce new techniques to adapt the standard algorithm to parallel processing. Furthermore, to reduce memory accesses and distribute the kernel workload efficiently, we use new approaches to store and retrieve CDVS information in suitable GPU data structures. We present a complete experimental analysis on a large, standard test set. Our experiments show that our GPU-based approach is remarkably faster than the CPU-based reference implementation of the standard while maintaining comparable precision in terms of true and false positive rates.
2020-11-30
Zhou, K., Sun, S., Wang, H., Huang, P., He, X., Lan, R., Li, W., Liu, W., Yang, T..  2019.  Improving Cache Performance for Large-Scale Photo Stores via Heuristic Prefetching Scheme. IEEE Transactions on Parallel and Distributed Systems. 30:2033–2045.
Photo service providers face the critical challenge of storing huge numbers of photos, typically on the order of billions, while ensuring satisfactory user experiences nationwide or worldwide. Distributed photo caching architectures are widely deployed to meet high performance expectations, and efficient but still poorly understood caching policies play an essential role in them. In this work, we present a comprehensive study of internet-scale photo caching algorithms, using QQPhoto from Tencent Inc., the largest social network service company in China, as a case study. We show that even advanced cache algorithms perform only at a level similar to simple baseline algorithms, and that a large performance gap remains between these cache algorithms and the theoretically optimal algorithm, owing to the complicated access behaviors in such a large multi-tenant environment. We then explain this phenomenon by extensively investigating the characteristics of QQPhoto workloads. Finally, to further improve QQPhoto cache efficiency in a practical way, we propose incorporating a prefetcher into the cache stack, based on an observed immediacy feature that is unique to the QQPhoto workload. The prefetcher proactively loads selected photos into the cache before they are requested for the first time, eliminating compulsory misses and raising hit ratios. Our extensive evaluation shows that, with appropriate prefetching, we improve the cache hit ratio by up to 7.4 percent while reducing the average access latency by 6.9 percent, at a marginal cost of 4.14 percent additional backend network traffic, compared with the original system, which performs no prefetching.
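Why prefetching removes compulsory misses can be seen in a small trace replay against an LRU cache. This is a minimal sketch under stated assumptions: the `("upload" | "get", photo_id)` trace format is hypothetical, and every upload is prefetched, whereas the QQPhoto prefetcher selects which photos to load.

```python
from collections import OrderedDict

def hit_ratio(trace, capacity, prefetch_on_upload=False):
    """Replay a trace of ("upload" | "get", photo_id) events against an LRU cache."""
    cache = OrderedDict()
    hits = gets = 0

    def insert(photo_id):
        cache[photo_id] = True
        cache.move_to_end(photo_id)
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used photo

    for op, photo_id in trace:
        if op == "upload":
            if prefetch_on_upload:
                insert(photo_id)  # proactively cache the fresh photo
        elif op == "get":
            gets += 1
            if photo_id in cache:
                hits += 1
                cache.move_to_end(photo_id)
            else:
                insert(photo_id)  # compulsory or capacity miss
    return hits / gets if gets else 0.0

# Each photo is read right after upload, mimicking the "immediacy" pattern
trace = [event for i in range(5) for event in (("upload", i), ("get", i))]
print(hit_ratio(trace, capacity=2))                           # 0.0
print(hit_ratio(trace, capacity=2, prefetch_on_upload=True))  # 1.0
```

Prefetching everything also pollutes the cache and costs backend traffic, which is why a real prefetcher must be selective.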
2020-11-16
Anju, J., Shreelekshmi, R..  2019.  Modified Feature Descriptors to enhance Secure Content-based Image Retrieval in Cloud. 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT). 1:674–680.
With the emergence of cloud computing, content-based image retrieval (CBIR) in the encrypted domain has gained enormous importance due to the ever-increasing need to ensure the confidentiality, authentication, integrity, and privacy of data. CBIR on outsourced encrypted images can be performed by extracting features from the unencrypted images and generating a searchable encrypted index from them. Visual descriptors such as color, shape, and texture descriptors are employed for similarity search. Since the visual descriptors used to represent an image play a crucial role in retrieving the most similar results, this paper attempts to combine them. The effect of combining different visual descriptors on retrieval precision is analyzed in the secure CBIR scheme proposed by Xia et al. Experimental results show that combining visual descriptors can significantly enhance the retrieval precision of the secure CBIR scheme.
2020-11-02
Zhang, Z., Xie, X..  2019.  On the Investigation of Essential Diversities for Deep Learning Testing Criteria. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS). :394–405.

In recent years, more and more testing criteria for deep learning systems have been proposed to ensure system robustness and reliability. These criteria were defined from different perspectives on diversity. However, there has been no comprehensive investigation of which diversities are the most essential for a testing criterion for deep learning systems to consider. Therefore, in this paper, we conduct an empirical study of the relation between test diversities and the erroneous behaviors of deep learning models. We define five metrics that reflect diversities in neuron activities, and we leverage metamorphic testing to detect erroneous behaviors. We investigate the correlation between the metrics and erroneous behaviors, and we go a step further by measuring the quality of test suites under the guidance of the defined metrics. Our results provide comprehensive insights into the diversities that are essential for testing criteria to exhibit good fault detection ability.

2020-09-04
Zhao, Pu, Liu, Sijia, Chen, Pin-Yu, Hoang, Nghia, Xu, Kaidi, Kailkhura, Bhavya, Lin, Xue.  2019.  On the Design of Black-Box Adversarial Examples by Leveraging Gradient-Free Optimization and Operator Splitting Method. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). :121–130.
Robust machine learning is currently one of the most prominent topics, one that could help shape a future of advanced AI platforms that perform well not only in average cases but also in worst cases or adverse situations. Despite this long-term vision, existing studies on black-box adversarial attacks are still restricted to very specific threat-model settings (e.g., a single distortion metric and restrictive assumptions on the target model's feedback to queries) and/or suffer from prohibitively high query complexity. To push for further advances in this field, we introduce a general framework based on an operator splitting method, the alternating direction method of multipliers (ADMM), to devise efficient, robust black-box attacks that work with various distortion metrics and feedback settings without incurring high query complexity. Because of the black-box nature of the threat model, the proposed ADMM solution framework is integrated with zeroth-order (ZO) optimization and Bayesian optimization (BO), and is thus applicable to the gradient-free regime. This results in two new black-box adversarial attack generation methods, ZO-ADMM and BO-ADMM. Our empirical evaluations on image classification datasets show that the proposed approaches have much lower function query complexities than state-of-the-art attack methods while achieving very competitive attack success rates.
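The zeroth-order ingredient can be illustrated in isolation: a two-point random-direction estimator approximates the gradient using only function queries. The sketch below plugs that estimator into plain gradient descent on a toy quadratic; it is not ZO-ADMM, involves no attack or ADMM splitting, and the step size, smoothing radius `mu`, and direction count are assumed values.

```python
import math
import random

def zo_gradient(f, x, rng, num_dirs=10, mu=1e-3):
    """Two-point zeroth-order gradient estimate from function values only:
    g ~ (d / q) * sum_i [(f(x + mu*u_i) - f(x)) / mu] * u_i, with u_i random unit vectors."""
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        u = [ui / norm for ui in u]  # uniformly random direction on the unit sphere
        fu = f([xi + mu * ui for xi, ui in zip(x, u)])
        coef = d * (fu - fx) / (mu * num_dirs)
        grad = [g + coef * ui for g, ui in zip(grad, u)]
    return grad

def zo_descent(f, x0, steps=500, lr=0.05, seed=0):
    """Plain gradient descent driven by the zeroth-order estimate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = zo_gradient(f, x, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

target = [1.0, -2.0, 0.5]

def loss(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

x_hat = zo_descent(loss, [0.0, 0.0, 0.0])
print(x_hat)  # approaches target without ever computing an analytic gradient
```

Each iteration costs `num_dirs + 1` function queries, which is the query complexity that black-box attack methods try to keep low.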
2020-07-30
Perez, Claudio A., Estévez, Pablo A., Galdames, Francisco J., Schulz, Daniel A., Perez, Juan P., Bastías, Diego, Vilar, Daniel R..  2018.  Trademark Image Retrieval Using a Combination of Deep Convolutional Neural Networks. 2018 International Joint Conference on Neural Networks (IJCNN). :1–7.
Trademarks are recognizable images and/or words used to distinguish products or services. They become associated with the reputation, innovation, quality, and warranty of the products. Countries around the world have offices for industrial/intellectual property (IP) registration. A new trademark image submitted for registration must be distinct from all registered trademarks. Given the volume of trademark registration applications and the size of the databases containing existing trademarks, it is impossible for humans to make all the comparisons visually, so technological tools are essential for this task. In this work, we use a pre-trained, publicly available Convolutional Neural Network (CNN), VGG19, that was trained on the ImageNet database. We adapted VGG19 to the trademark image retrieval (TIR) task by fine-tuning the network on two different databases. VGG19v was trained on a database of trademark images organized by visual similarity, and VGG19c was trained on trademarks organized by conceptual similarity. The database for VGG19v was built from trademarks downloaded from the Web and organized by visual similarity according to experts from the IP office. The database for VGG19c was built from trademark images from the United States Patent and Trademark Office and organized according to the Vienna conceptual protocol. TIR was assessed using the normalized average rank for a test set from the METU database, which has 922,926 trademark images. We computed the normalized average ranks for VGG19v, VGG19c, and a combination of both networks. Our method achieved significantly better results on the METU database than those published previously.
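The normalized average rank used for evaluation can be computed from the (1-based) ranks at which the relevant trademarks appear in the retrieved list. The sketch below uses a common form of the measure, normalized so that a perfect ranking scores 0 and a random ordering scores about 0.5; this exact normalization is an assumption, so check the paper for the variant it reports.

```python
def normalized_average_rank(ranks, database_size):
    """Normalized average rank of the relevant items (ranks are 1-based).
    0 means every relevant item was ranked first; values near 0.5 match random ordering."""
    n_rel = len(ranks)
    ideal = n_rel * (n_rel + 1) / 2  # sum of ranks when relevant items come first
    return (sum(ranks) - ideal) / (database_size * n_rel)

print(normalized_average_rank([1, 2], database_size=10))   # 0.0 (perfect ranking)
print(normalized_average_rank([9, 10], database_size=10))  # 0.8 (relevant items ranked last)
```

Lower is better, so combining two networks improves retrieval if it pulls the relevant trademarks toward the top of the list.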
2020-06-12
Ay, Betül, Aydın, Galip, Koyun, Zeynep, Demir, Mehmet.  2019.  A Visual Similarity Recommendation System using Generative Adversarial Networks. 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML). :44–48.

The goal of a content-based recommendation system is to retrieve and rank the items closest to the query item. Today, almost every e-commerce platform has a recommendation strategy for products that customers may decide to buy. In this paper, we describe our work on creating a Generative Adversarial Network-based image retrieval system that lets e-commerce platforms retrieve the most similar images for a given product image, specifically for shoes. We compare state-of-the-art solutions and provide results for the proposed deep learning network on a standard dataset.
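The retrieve-and-rank step of a content-based recommender reduces to scoring catalog items against the query. This minimal sketch assumes each product image has already been mapped to an embedding vector (for example by a trained network) and uses cosine similarity; the item names and vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def recommend(query_vec, catalog, top_k=2):
    """Rank catalog items (id -> embedding) by cosine similarity to the query."""
    scored = sorted(catalog.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [item_id for item_id, _ in scored[:top_k]]

catalog = {
    "sneaker_a": [0.9, 0.1, 0.0],
    "boot_b": [0.1, 0.9, 0.2],
    "sneaker_c": [0.8, 0.2, 0.1],
}
print(recommend([1.0, 0.0, 0.0], catalog))  # ['sneaker_a', 'sneaker_c']
```

The quality of the recommendations depends almost entirely on how well the embedding places visually similar shoes near each other.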

[Anonymous].  2018.  Discrete Locally-Linear Preserving Hashing. 2018 25th IEEE International Conference on Image Processing (ICIP). :490–494.

Recently, hashing has attracted considerable attention for nearest neighbor search due to its fast query speed and low storage cost. However, existing unsupervised hashing algorithms share two problems. First, the widely used anchor graph construction algorithm has inherent limitations in local weight estimation. Second, the locally linear structure of the original feature space is seldom taken into account during binary encoding. Therefore, in this paper, we propose a novel unsupervised hashing method, dubbed “discrete locally-linear preserving hashing”, which effectively calculates the adjacency matrix while preserving the locally linear structure in the obtained hash space. Specifically, a novel local anchor embedding algorithm is adopted to construct the approximate adjacency matrix. We then directly minimize the reconstruction error under a discrete constraint to learn the binary codes. Experimental results on two typical image datasets indicate that the proposed method significantly outperforms state-of-the-art unsupervised methods.

Al Kobaisi, Ali, Wocjan, Pawel.  2018.  Supervised Max Hashing for Similarity Image Retrieval. 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). :359–365.

The storage efficiency of hash codes and their application to fast approximate nearest neighbor search, along with the explosion in the size of available labeled image datasets, have recently driven intense interest in learning-based hash algorithms. In this paper, we present a learning-based hash algorithm that utilizes ordinal information of feature vectors. We propose a novel, mathematically differentiable approximation of the argmax function for this hash algorithm. It enables seamless integration of the hash function with a deep neural network architecture that can exploit the rich feature vectors generated by convolutional neural networks. We also propose a loss function for the case in which the hash code is not binary and its entries are digits of an arbitrary base k. The resulting model, comprising feature vector generation and a hashing layer, is amenable to end-to-end training using gradient descent methods. In contrast to the majority of current hashing algorithms, which are either not learning-based or use hand-crafted feature vectors as input, simultaneous training of the components of our system results in better optimization. Extensive evaluations on the NUS-WIDE, CIFAR-10, and MIRFlickr benchmarks show that the proposed algorithm outperforms state-of-the-art and classical data-agnostic, unsupervised, and supervised hashing methods by 2.6% to 19.8% mean average precision under various settings.
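The idea of a k-ary max hash and a differentiable stand-in for argmax can be sketched as follows. The abstract does not give the paper's exact approximation; the softmax-weighted average of indices used here is a standard surrogate and is an assumption, as are the group layout and `beta` value.

```python
import math

def max_hash(features, k):
    """Split the feature vector into groups of k and emit each group's argmax index
    as one k-ary hash digit (len(features) must be a multiple of k)."""
    groups = [features[i:i + k] for i in range(0, len(features), k)]
    return [max(range(k), key=lambda j: g[j]) for g in groups]

def soft_argmax(group, beta=10.0):
    """Differentiable surrogate for argmax: softmax-weighted average of the indices.
    Larger beta makes the output closer to the hard argmax."""
    m = max(group)  # subtract the max for numerical stability
    weights = [math.exp(beta * (x - m)) for x in group]
    total = sum(weights)
    return sum(j * w for j, w in enumerate(weights)) / total

print(max_hash([0.1, 0.7, 0.2, 0.9, 0.0, 0.3], k=3))  # [1, 0]
print(soft_argmax([0.1, 0.7, 0.2], beta=50.0))         # very close to 1.0
```

The hard digits are used at query time, while the smooth surrogate lets gradients flow back into the feature network during training.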

2020-03-30
Li, Jian, Zhang, Zelin, Li, Shengyu, Benton, Ryan, Huang, Yulong, Kasukurthi, Mohan Vamsi, Li, Dongqi, Lin, Jingwei, Borchert, Glen M., Tan, Shaobo et al..  2019.  Reversible Data Hiding Based Key Region Protection Method in Medical Images. 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). :1526–1530.
The transmission of medical image data in an open network environment raises privacy issues, including patient privacy and data leakage. In the past, image encryption and information-hiding technology have been used to solve such security problems, but these methods have generally made it difficult to retrieve the original images. In this paper, we present an algorithm to protect key regions in medical images. First, the coefficient of variation is used to locate the key regions (the lesion areas) of an image; the other areas are then processed in blocks and analyzed for texture complexity. Next, our reversible data-hiding algorithm embeds the contents of the lesion areas into a high-texture area, and the Arnold transformation is applied to protect the original lesion information. In addition, we use the ciphertext of the image's basic information and the decryption parameter to generate a Quick Response (QR) code that replaces the original key regions. Consequently, only authorized customers can obtain the encryption key to extract information from the encrypted images. Experimental results show that our algorithm not only restores the original image without information loss but also safely transfers the medical image copyright and patient-sensitive information.
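The Arnold transformation mentioned above is the "cat map" scrambling of an N x N pixel grid, and because the map is invertible the original region can be restored exactly. This sketch shows only the scrambling step; the number of iterations (which acts like a key) is an assumed parameter, and the block analysis, embedding, and QR steps are not shown.

```python
def arnold(image, iterations=1):
    """Arnold cat map on an N x N image: pixel (x, y) moves to (x + y, x + 2y) mod N."""
    n = len(image)
    for _ in range(iterations):
        out = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(x + y) % n][(x + 2 * y) % n] = image[x][y]
        image = out
    return image

def inverse_arnold(image, iterations=1):
    """Undo the map: pixel (x, y) moves back to (2x - y, y - x) mod N."""
    n = len(image)
    for _ in range(iterations):
        out = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(2 * x - y) % n][(y - x) % n] = image[x][y]
        image = out
    return image

img = [[4 * r + c for c in range(4)] for r in range(4)]
scrambled = arnold(img, iterations=3)
print(inverse_arnold(scrambled, iterations=3) == img)  # True
```

The map's matrix [[1, 1], [1, 2]] has determinant 1, which is why it is a bijection on the grid and the inverse uses the integer matrix [[2, -1], [-1, 1]].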
2019-08-12
Liu, Y., Yang, Y., Shi, A., Jigang, P., Haowei, L..  2019.  Intelligent monitoring of indoor surveillance video based on deep learning. 2019 21st International Conference on Advanced Communication Technology (ICACT). :648–653.

With the rapid development of information technology, video surveillance systems have become a key part of the security and protection systems of modern cities. In prisons especially, surveillance cameras can be found almost everywhere. However, as surveillance networks continue to expand, the cameras not only bring convenience but also produce a massive amount of monitoring data, which poses huge challenges for storage, analytics, and retrieval. A smart monitoring system equipped with intelligent video analytics can monitor abnormal events or behaviours and raise early alarms about them, which makes such systems a hot research direction in the field of surveillance. This paper applies deep learning, using the state-of-the-art instance segmentation framework Mask R-CNN fine-tuned on our datasets, to efficiently detect objects in a video image while simultaneously generating a high-quality segmentation mask for each instance. Experiments show that our network is simple to train and generalizes easily to other datasets, and that the mask average precision reaches nearly 98.5% on our own datasets.

2019-03-22
Quweider, M., Lei, H., Zhang, L., Khan, F..  2018.  Managing Big Data in Visual Retrieval Systems for DHS Applications: Combining Fourier Descriptors and Metric Space Indexing. 2018 1st International Conference on Data Intelligence and Security (ICDIS). :188–193.

Image retrieval systems have been an active area of research for more than thirty years, progressively producing algorithms that improve performance metrics, operate in different domains, exploit different features extracted from the images to be retrieved, and offer different desirable invariance properties. With the ever-growing visual databases of images and videos produced by a myriad of devices comes the challenge of selecting effective features and performing fast retrieval on such databases. In this paper, we incorporate Fourier descriptors (FDs) along with a metric-based balanced indexing tree (the M-tree) as a viable solution to the needs of the DHS (Department of Homeland Security) for quick identification and retrieval of weapon images. The FDs provide a simple but effective outline-based feature representation of an object, while the M-tree provides dynamic, fast, and balanced search over such features. Motivated by applications of interest to the DHS, we have created basic databases of guns and rifles that can be used to identify weapons in images and videos extracted from media sources. Our simulations show excellent performance in both representation and retrieval speed.
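The outline representation can be illustrated with the textbook construction of Fourier descriptors from a closed contour. This is a minimal sketch, not the authors' code: the boundary sampling, the number of coefficients kept, and the use of Euclidean distance between descriptors are assumptions. The M-tree itself is not shown; the point is that the descriptor distance is a true metric, which is what a metric index such as the M-tree requires.

```python
import cmath
import math

def fourier_descriptors(contour, num_coeffs=4):
    """Shape signature of a closed contour given as a list of (x, y) boundary points.

    Dropping F[0] removes translation, dividing by |F[1]| removes scale, and keeping
    only magnitudes removes rotation and the choice of starting point."""
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    # Naive O(n^2) discrete Fourier transform of the complex boundary signal
    coeffs = [sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
              for k in range(n)]
    scale = abs(coeffs[1])
    return [abs(coeffs[k]) / scale for k in range(2, 2 + num_coeffs)]

def descriptor_distance(fd1, fd2):
    """Euclidean distance between descriptors (a metric, so usable in an M-tree)."""
    return math.dist(fd1, fd2)

l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
# Same outline scaled by 3, rotated 90 degrees, shifted, and traced from another vertex
moved = [(-3 * y + 5, 3 * x - 1) for x, y in l_shape[2:] + l_shape[:2]]
print(descriptor_distance(fourier_descriptors(l_shape), fourier_descriptors(moved)))  # ~0
```

Real systems usually resample the boundary at uniform arc length before the transform so that descriptors from differently sampled outlines remain comparable.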

2018-11-19
Chelaramani, S., Jha, A., Namboodiri, A. M..  2018.  Cross-Modal Style Transfer. 2018 25th IEEE International Conference on Image Processing (ICIP). :2157–2161.

We humans can easily imagine scenes that depict sentences such as “Today is a beautiful sunny day” or “There is a Christmas feel, in the air”. While it is hard to describe precisely what one person may imagine, the essential high-level themes associated with such sentences largely remain the same. The ability to synthesize novel images that depict the feel of a sentence is useful in a variety of applications such as education, advertising, and entertainment. While existing papers tackle this problem given a style image, we aim to provide a far more intuitive and easy-to-use solution that synthesizes novel renditions of an existing image, conditioned on a given sentence. We present a method for cross-modal style transfer between an English sentence and an image, producing a new image that captures the essential theme of the sentence. We do this by modifying the style transfer mechanism used in image style transfer to incorporate a style component derived from the given sentence. We demonstrate promising results using the YFCC100m dataset.

2018-02-28
Su, J. C., Wu, C., Jiang, H., Maji, S..  2017.  Reasoning About Fine-Grained Attribute Phrases Using Reference Games. 2017 IEEE International Conference on Computer Vision (ICCV). :418–427.

We present a framework for learning to describe fine-grained visual differences between instances using attribute phrases. Attribute phrases capture distinguishing aspects of an object (e.g., “propeller on the nose” or “door near the wing” for airplanes) in a compositional manner. Instances within a category can be described by a set of these phrases, and collectively the phrases span the space of semantic attributes for the category. We collect a large dataset of such phrases by asking annotators to describe several visual differences between a pair of instances within a category. We then learn to describe and ground these phrases to images in the context of a reference game between a speaker and a listener. The speaker's goal is to describe attributes of an image that allow the listener to correctly identify it within a pair. Collecting data in a pairwise manner improves both the speaker's ability to generate visual descriptions and the listener's ability to interpret them. Moreover, due to the compositionality of attribute phrases, the trained listeners can interpret descriptions not seen during training for image retrieval, and the speakers can generate attribute-based explanations for differences between previously unseen categories. We also show that embedding an image into the semantic space of attribute phrases derived from listeners offers a 20% improvement in accuracy over existing attribute-based representations on the FGVC-aircraft dataset.

2018-02-02
Abura'ed, Nour, Khan, Faisal Shah, Bhaskar, Harish.  2017.  Advances in the Quantum Theoretical Approach to Image Processing Applications. ACM Comput. Surv.. 49:75:1–75:49.
In this article, a detailed survey of the quantum approach to image processing is presented. It has recently been established that existing quantum algorithms are applicable to image processing tasks, enabling quantum information models of classical image processing. However, efforts continue to identify the full range of image processing domains to which the approach applies. In addition to reviewing some of the critical image processing applications that quantum mechanics has targeted, such as denoising, edge detection, image storage, retrieval, and compression, this study also highlights the complexities of transitioning from the classical to the quantum domain. This article establishes theoretical fundamentals, analyzes performance and evaluation, draws key statistical evidence to support claims, and provides recommendations based on literature published mostly between 2010 and 2015.

2018-01-10
Bai, Jiale, Ni, Bingbing, Wang, Minsi, Shen, Yang, Lai, Hanjiang, Zhang, Chongyang, Mei, Lin, Hu, Chuanping, Yao, Chen.  2017.  Deep Progressive Hashing for Image Retrieval. Proceedings of the 2017 ACM on Multimedia Conference. :208–216.

This paper proposes a novel recursive hashing scheme, in contrast to conventional "one-off" hashing algorithms. Inspired by the human "nonsalient-to-salient" perception path, the proposed scheme generates a series of binary codes based on progressively expanded salient regions. Because the scheme is built on a recurrent deep network (an LSTM), the binary codes generated at later output nodes naturally inherit information aggregated from earlier codes while exploring novel information from the expanded salient region, which gives the scheme good scalability. The proposed deep hashing network is trained end to end by minimizing a triplet ranking loss. Extensive experiments on several image retrieval benchmarks demonstrate performance gains over state-of-the-art image retrieval methods, as well as the scheme's scalability.
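
The abstract names a triplet ranking loss but does not give its form. A common hinge formulation, assumed here, pushes a query's code closer to a similar (positive) image's code than to a dissimilar (negative) one by at least a margin; codes stay real-valued during training and are binarized by sign afterwards. This pure-Python sketch covers that loss only. It omits the LSTM, the salient-region expansion, and the progressive code generation, and the margin value is arbitrary.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length code vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_ranking_loss(query, positive, negative, margin=1.0):
    """Hinge triplet loss (assumed form): zero once the negative is at least
    `margin` farther from the query than the positive, positive otherwise."""
    return max(0.0, margin + squared_distance(query, positive)
               - squared_distance(query, negative))


def binarize(code):
    """Map a relaxed real-valued code to {-1, +1} by sign (0 maps to +1)."""
    return [1 if x >= 0 else -1 for x in code]


if __name__ == "__main__":
    q = [0.9, -0.8, 0.7]    # relaxed query code (made-up values)
    p = [0.8, -0.9, 0.6]    # code of a similar image
    n = [-0.7, 0.9, -0.8]   # code of a dissimilar image
    print(triplet_ranking_loss(q, p, n))      # 0.0: ranking already satisfied
    print(triplet_ranking_loss(q, n, p) > 0)  # True: roles swapped, violated
    print(binarize(q))                        # [1, -1, 1]
```

For codes with entries in {-1, +1}, the squared Euclidean distance equals four times the Hamming distance, which is why a loss on relaxed codes can serve as a proxy for Hamming-distance ranking.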

2017-03-08
Song, D., Liu, W., Ji, R., Meyer, D. A., Smith, J. R..  2015.  Top Rank Supervised Binary Coding for Visual Search. 2015 IEEE International Conference on Computer Vision (ICCV). :1922–1930.

In recent years, binary coding techniques have become increasingly popular because of their high efficiency in handling large-scale computer vision applications. It has been demonstrated that supervised binary coding techniques, which leverage supervised information, can significantly enhance coding quality and hence greatly benefit visual search tasks. Typically, a modern binary coding method seeks to learn a group of coding functions that compress data samples into binary codes. However, few methods learn coding functions that explicitly optimize the precision at the top of a list ranked by the Hamming distances of the generated binary codes. In this paper, we propose a novel supervised binary coding approach, Top Rank Supervised Binary Coding (Top-RSBC), which explicitly focuses on optimizing the precision of the top positions in a Hamming-distance ranking list while preserving the supervision information. The core idea is to train disciplined coding functions under which mistakes at the top of a Hamming-distance ranking list are penalized more heavily than those at the bottom. To learn such coding functions, we relax the original discrete optimization objective with a continuous surrogate and derive a stochastic gradient descent algorithm to optimize the surrogate objective. To further reduce training time, we also design an online learning algorithm that optimizes the surrogate objective more efficiently. Empirical studies on three benchmark image datasets demonstrate that the proposed approach achieves superior image search accuracy over state-of-the-art methods.
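
Top-RSBC targets precision at the top of a Hamming-distance ranking. The sketch below shows only the retrieval and evaluation side under simple assumptions: codes stored as Python integers, Hamming distance computed by XOR and a bit count, the database ranked by distance, and precision@k measured against class labels. It is not the Top-RSBC training objective, and the codes and labels are invented for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, db_codes):
    """Database indices sorted by Hamming distance to the query (stable for ties)."""
    return sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))


def precision_at_k(query_code, query_label, db_codes, db_labels, k):
    """Fraction of the top-k ranked items that share the query's label."""
    top = rank_by_hamming(query_code, db_codes)[:k]
    return sum(1 for i in top if db_labels[i] == query_label) / k


if __name__ == "__main__":
    # Toy 4-bit codes and labels (made up for illustration).
    db_codes = [0b1011, 0b0101, 0b1010, 0b1110]
    db_labels = ["cat", "dog", "dog", "cat"]
    query, label = 0b1010, "cat"
    print(rank_by_hamming(query, db_codes))                      # [2, 0, 3, 1]
    print(precision_at_k(query, label, db_codes, db_labels, 1))  # 0.0
    print(precision_at_k(query, label, db_codes, db_labels, 4))  # 0.5
```

Here the single nearest code belongs to the wrong class, so precision@1 is 0.0 while precision@4 is still 0.5. An error at rank 1 costs top-of-list precision far more than the same error further down, which is the imbalance a top-rank objective is meant to penalize.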