Biblio
Deep learning has made remarkable achievements in various domains. Active learning, which aims to reduce the budget for training a machine-learning model, is especially useful for the Deep learning tasks with the demand of a large number of labeled samples. Unfortunately, our empirical study finds that many of the active learning heuristics are not effective when applied to Deep learning models in batch settings. To tackle these limitations, we propose a density weighted diversity based query strategy (DWDS), which makes use of the geometry of the samples. Within a limited labeling budget, DWDS enhances model performance by querying labels for the new training samples with the maximum informativeness and representativeness. Furthermore, we propose a beam-search based method to obtain a good approximation to the optimum of such samples. Our experiments show that DWDS outperforms existing algorithms in Deep learning tasks.
The concept of the adversary model has been widely applied in the context of cryptography. When designing a cryptographic scheme or protocol, the adversary model plays a crucial role in the formalization of the capabilities and limitations of potential attackers. These models further enable the designer to verify the security of the scheme or protocol under investigation. Although being well established for conventional cryptanalysis attacks, adversary models associated with attackers enjoying the advantages of machine learning techniques have not yet been developed thoroughly. In particular, when it comes to composed hardware, often being security-critical, the lack of such models has become increasingly noticeable in the face of advanced, machine learning-enabled attacks. This paper aims at exploring the adversary models from the machine learning perspective. In this regard, we provide examples of machine learning-based attacks against hardware primitives, e.g., obfuscation schemes and hardware root-of-trust, claimed to be infeasible. We demonstrate that this assumption becomes however invalid as inaccurate adversary models have been considered in the literature.
This paper studies the deletion propagation problem in terms of minimizing view side-effect. It is a problem funda-mental to data lineage and quality management which could be a key step in analyzing view propagation and repairing data. The investigated problem is a variant of the standard deletion propagation problem, where given a source database D, a set of key preserving conjunctive queries Q, and the set of views V obtained by the queries in Q, we try to identify a set T of tuples from D whose elimination prevents all the tuples in a given set of deletions on views △V while preserving any other results. The complexity of this problem has been well studied for the case with only a single query. Dichotomies, even trichotomies, for different settings are developed. However, no results on multiple queries are given which is a more realistic case. We study the complexity and approximations of optimizing the side-effect on the views, i.e., find T to minimize the additional damage on V after removing all the tuples of △V. We focus on the class of key-preserving conjunctive queries which is a dichotomy for the single query case. It is surprising to find that except the single query case, this problem is NP-hard to approximate within any constant even for a non-trivial set of multiple project-free conjunctive queries in terms of view side-effect. The proposed algorithm shows that it can be approximated within a bound depending on the number of tuples of both V and △V. We identify a class of polynomial tractable inputs, and provide a dynamic programming algorithm to solve the problem. Besides data lineage, study on this problem could also provide important foundations for the computational issues in data repairing. Furthermore, we introduce some related applications of this problem, especially for query feedback based data cleaning.
The storage efficiency of hash codes and their application in the fast approximate nearest neighbor search, along with the explosion in the size of available labeled image datasets caused an intensive interest in developing learning based hash algorithms recently. In this paper, we present a learning based hash algorithm that utilize ordinal information of feature vectors. We have proposed a novel mathematically differentiable approximation of argmax function for this hash algorithm. It has enabled seamless integration of hash function with deep neural network architecture which can exploit the rich feature vectors generated by convolutional neural networks. We have also proposed a loss function for the case that the hash code is not binary and its entries are digits of arbitrary k-ary base. The resultant model comprised of feature vector generation and hashing layer is amenable to end-to-end training using gradient descent methods. In contrast to the majority of current hashing algorithms that are either not learning based or use hand-crafted feature vectors as input, simultaneous training of the components of our system results in better optimization. Extensive evaluations on NUS-WIDE, CIFAR-10 and MIRFlickr benchmarks show that the proposed algorithm outperforms state-of-art and classical data agnostic, unsupervised and supervised hashing methods by 2.6% to 19.8% mean average precision under various settings.