Biblio

Filters: Author is Giles, C. Lee  [Clear All Filters]
2018-06-20
Wang, Qinglong, Guo, Wenbo, Zhang, Kaixuan, Ororbia, II, Alexander G., Xing, Xinyu, Liu, Xue, Giles, C. Lee.  2017.  Adversary Resistant Deep Neural Networks with an Application to Malware Detection. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :1145–1153.
Outside the highly publicized victories in the game of Go, there have been numerous successful applications of deep learning in the fields of information retrieval, computer vision, and speech recognition. In cybersecurity, an increasing number of companies have begun exploring the use of deep learning (DL) in a variety of security tasks with malware detection among the more popular. These companies claim that deep neural networks (DNNs) could help turn the tide in the war against malware infection. However, DNNs are vulnerable to adversarial samples, a shortcoming that plagues most, if not all, statistical and machine learning models. Recent research has demonstrated that those with malicious intent can easily circumvent deep learning-powered malware detection by exploiting this weakness. To address this problem, previous work developed defense mechanisms that are based on augmenting training data or enhancing model complexity. However, after analyzing DNN susceptibility to adversarial samples, we discover that the current defense mechanisms are limited and, more importantly, cannot provide theoretical guarantees of robustness against adversarial sampled-based attacks. As such, we propose a new adversary resistant technique that obstructs attackers from constructing impactful adversarial samples by randomly nullifying features within data vectors. Our proposed technique is evaluated on a real world dataset with 14,679 malware variants and 17,399 benign programs. We theoretically validate the robustness of our technique, and empirically show that our technique significantly boosts DNN robustness to adversarial samples while maintaining high accuracy in classification. To demonstrate the general applicability of our proposed method, we also conduct experiments using the MNIST and CIFAR-10 datasets, widely used in image recognition research.
2018-11-19
Liang, Chen, Yang, Xiao, Wham, Drew, Pursel, Bart, Passonneaur, Rebecca, Giles, C. Lee.  2017.  Distractor Generation with Generative Adversarial Nets for Automatically Creating Fill-in-the-Blank Questions. Proceedings of the Knowledge Capture Conference. :33:1–33:4.

Distractor generation is a crucial step for fill-in-the-blank question generation. We propose a generative model learned from training generative adversarial nets (GANs) to create useful distractors. Our method utilizes only context information and does not use the correct answer, which is completely different from previous Ontology-based or similarity-based approaches. Trained on the Wikipedia corpus, the proposed model is able to predict Wiki entities as distractors. Our method is evaluated on two biology question datasets collected from Wikipedia and actual college-level exams. Experimental results show that our context-based method achieves comparable performance to a frequently used word2vec-based method for the Wiki dataset. In addition, we propose a second-stage learner to combine the strengths of the two methods, which further improves the performance on both datasets, with 51.7% and 48.4% of generated distractors being acceptable.

2017-03-07
Kim, Kunho, Giles, C. Lee.  2016.  Financial Entity Record Linkage with Random Forests. Proceedings of the Second International Workshop on Data Science for Macro-Modeling. :13:1–13:2.

Record linkage refers to the task of finding same entity across different databases. We propose a machine learning based record linkage algorithm for financial entity databases. Record linkage on financial databases are essential for information integration on certain financial entity, since those databases do not have common unified identifier. Our algorithm works in two steps to determine if a pair of record is same entity or not. First we check with proposed rules if the record pair can be exactly matched after cleaning the entity name and address. Second, inspired by earlier work on author name disambiguation, we train a binary Random Forest classifier to decide the linkage. To reduce and scale the computation, this process is done only for candidate pairs within a proposed heuristic. Initial evaluation for precision, recall and F1 measures on two different linking tasks in the Financial Entity Identification and Information Integration (FEIII) Challenge show promising results.