Biblio
The storage efficiency of hash codes and their application in the fast approximate nearest neighbor search, along with the explosion in the size of available labeled image datasets caused an intensive interest in developing learning based hash algorithms recently. In this paper, we present a learning based hash algorithm that utilize ordinal information of feature vectors. We have proposed a novel mathematically differentiable approximation of argmax function for this hash algorithm. It has enabled seamless integration of hash function with deep neural network architecture which can exploit the rich feature vectors generated by convolutional neural networks. We have also proposed a loss function for the case that the hash code is not binary and its entries are digits of arbitrary k-ary base. The resultant model comprised of feature vector generation and hashing layer is amenable to end-to-end training using gradient descent methods. In contrast to the majority of current hashing algorithms that are either not learning based or use hand-crafted feature vectors as input, simultaneous training of the components of our system results in better optimization. Extensive evaluations on NUS-WIDE, CIFAR-10 and MIRFlickr benchmarks show that the proposed algorithm outperforms state-of-art and classical data agnostic, unsupervised and supervised hashing methods by 2.6% to 19.8% mean average precision under various settings.
Hashing has been a widely-adopted technique for nearest neighbor search in large-scale image retrieval tasks. Recent research has shown that leveraging supervised information can lead to high quality hashing. However, the cost of annotating data is often an obstacle when applying supervised hashing to a new domain. Moreover, the results can suffer from the robustness problem as the data at training and test stage may come from different distributions. This paper studies the exploration of generating synthetic data through semi-supervised generative adversarial networks (GANs), which leverages largely unlabeled and limited labeled training data to produce highly compelling data with intrinsic invariance and global coherence, for better understanding statistical structures of natural data. We demonstrate that the above two limitations can be well mitigated by applying the synthetic data for hashing. Specifically, a novel deep semantic hashing with GANs (DSH-GANs) is presented, which mainly consists of four components: a deep convolution neural networks (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations to hash codes and a classification stream. The whole architecture is trained end-to-end by jointly optimizing three losses, i.e., adversarial loss to correct label of synthetic or real for each sample, triplet ranking loss to preserve the relative similarity ordering in the input real-synthetic triplets and classification loss to classify each sample accurately. Extensive experiments conducted on both CIFAR-10 and NUS-WIDE image benchmarks validate the capability of exploiting synthetic images for hashing. Our framework also achieves superior results when compared to state-of-the-art deep hash models.
K-means algorithm has been widely used in machine learning and data mining due to its simplicity and good performance. However, the standard k-means algorithm would be quite slow for clustering millions of data into thousands of or even tens of thousands of clusters. In this paper, we propose a fast k-means algorithm named multi-stage k-means (MKM) which uses a multi-stage filtering approach. The multi-stage filtering approach greatly accelerates the k-means algorithm via a coarse-to-fine search strategy. To further speed up the algorithm, hashing is introduced to accelerate the assignment step which is the most time-consuming part in k-means. Extensive experiments on several massive datasets show that the proposed algorithm can obtain up to 600X speed-up over the k-means algorithm with comparable accuracy.
Covert operations involving clandestine dealings and communication through cryptic and hidden messages have existed since time immemorial. While these do have a negative connotation, they have had their fair share of use in situations and applications beneficial to society in general. A "Dead Drop" is one such method of espionage trade craft used to physically exchange items or information between two individuals using a secret rendezvous point. With a "Dead Drop", to maintain operational security, the exchange itself is asynchronous. Information hiding in the slack space is one modern technique that has been used extensively. Slack space is the unused space within the last block allocated to a stored file. However, hiding in slack space operates under significant constraints with little resilience and fault tolerance. In this paper, we propose FROST – a novel asynchronous "Digital Dead Drop" robust to detection and data loss with tunable fault tolerance. Fault tolerance is a critical attribute of a secure and robust system design. Through extensive validation of FROST prototype implementation on Ubuntu Linux, we confirm the performance and robustness of the proposed digital dead drop to detection and data loss. We verify the recoverability of the secret message under various operating conditions ranging from block corruption and drive de-fragmentation to growing existing files on the target drive.
Wireless wearable embedded devices dominate the Internet of Things (IoT) due to their ability to provide useful information about the body and its local environment. The constrained resources of low power processors, however, pose a significant challenge to run-time error logging and hence, product reliability. Error logs classify error type and often system state following the occurrence of an error. Traditional error logging algorithms attempt to balance storage and accuracy by selectively overwriting past log entries. Since a specific combination of firmware faults may result in system instability, preserving all error occurrences becomes increasingly beneficial as IOT systems become more complex. In this paper, a novel hash-based error logging algorithm is presented which has both constant insertion time and constant memory while also exhibiting no false negatives and an acceptable false positive error rate. Both theoretical analysis and simulations are used to compare the performance of the hash-based and traditional approaches.
Obtaining frequency information of data streams, in limited space, is a well-recognized problem in literature. A number of recent practical applications (such as those in computational advertising) require temporally-aware solutions: obtaining historical count statistics for both time-points as well as time-ranges. In these scenarios, accuracy of estimates is typically more important for recent instances than for older ones; we call this desirable property Time Adaptiveness. With this observation, [20] introduced the Hokusai technique based on count-min sketches for estimating the frequency of any given item at any given time. The proposed approach is problematic in practice, as its memory requirements grow linearly with time, and it produces discontinuities in the estimation accuracy. In this work, we describe a new method, Time-adaptive Sketches, (Ada-sketch), that overcomes these limitations, while extending and providing a strict generalization of several popular sketching algorithms. The core idea of our method is inspired by the well-known digital Dolby noise reduction procedure that dates back to the 1960s. The theoretical analysis presented could be of independent interest in itself, as it provides clear results for the time-adaptive nature of the errors. An experimental evaluation on real streaming datasets demonstrates the superiority of the described method over Hokusai in estimating point and range queries over time. The method is simple to implement and offers a variety of design choices for future extensions. The simplicity of the procedure and the method's generalization of classic sketching techniques give hope for wide applicability of Ada-sketches in practice.
The secure hash algorithm (SHA)-3 has been selected in 2012 and will be used to provide security to any application which requires hashing, pseudo-random number generation, and integrity checking. This algorithm has been selected based on various benchmarks such as security, performance, and complexity. In this paper, in order to provide reliable architectures for this algorithm, an efficient concurrent error detection scheme for the selected SHA-3 algorithm, i.e., Keccak, is proposed. To the best of our knowledge, effective countermeasures for potential reliability issues in the hardware implementations of this algorithm have not been presented to date. In proposing the error detection approach, our aim is to have acceptable complexity and performance overheads while maintaining high error coverage. In this regard, we present a low-complexity recomputing with rotated operands-based scheme which is a step-forward toward reducing the hardware overhead of the proposed error detection approach. Moreover, we perform injection-based fault simulations and show that the error coverage of close to 100% is derived. Furthermore, we have designed the proposed scheme and through ASIC analysis, it is shown that acceptable complexity and performance overheads are reached. By utilizing the proposed high-performance concurrent error detection scheme, more reliable and robust hardware implementations for the newly-standardized SHA-3 are realized.
The need for increased surveillance due to increase in flight volume in remote or oceanic regions outside the range of traditional radar coverage has been fulfilled by the advent of space-based Automatic Dependent Surveillance — Broadcast (ADS-B) Surveillance systems. ADS-B systems have the capability of providing air traffic controllers with highly accurate real-time flight data. ADS-B is dependent on digital communications between aircraft and ground stations of the air route traffic control center (ARTCC); however these communications are not secured. Anyone with the appropriate capabilities and equipment can interrogate the signal and transmit their own false data; this is known as spoofing. The possibility of this type of attacks decreases the situational awareness of United States airspace. The purpose of this project is to design a secure transmission framework that prevents ADS-B signals from being spoofed. Three alternative methods of securing ADS-B signals are evaluated: hashing, symmetric encryption, and asymmetric encryption. Security strength of the design alternatives is determined from research. Feasibility criteria are determined by comparative analysis of alternatives. Economic implications and possible collision risk is determined from simulations that model the United State airspace over the Gulf of Mexico and part of the airspace under attack respectively. The ultimate goal of the project is to show that if ADS-B signals can be secured, the situational awareness can improve and the ARTCC can use information from this surveillance system to decrease the separation between aircraft and ultimately maximize the use of the United States airspace.