Biblio

Found 1137 results

Filters: First Letter Of Last Name is X
2017-12-20
Zhang, S., Peng, J., Huang, K., Xu, X., Zhong, Z.  2017.  Physical layer security in IoT: A spatial-temporal perspective. 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP). :1–6.
Delay and security are both major concerns in the Internet of Things (IoT). In this paper, we set up a secure analytical framework for IoT networks to characterize the network delay performance and secrecy performance. First, stochastic geometry and queueing theory are adopted to model the locations of IoT devices and the temporal arrival of packets. Based on this model, a low-complexity secure on-off scheme is proposed to improve the network performance. Then, the delay performance and secrecy performance are evaluated in terms of packet delay and packet secrecy outage probability. It is demonstrated that the intensity of IoT devices gives rise to a tradeoff between delay and security, and that the secure on-off scheme can improve both the network delay performance and the secrecy performance. Moreover, the secrecy transmission rate is adopted to reflect the delay-security tradeoff. The analytical and simulation results show the effects of the intensity of IoT devices and of the secure on-off scheme on the network delay performance and secrecy performance.
2018-01-10
Zhang, Jun, Cormode, Graham, Procopiuc, Cecilia M., Srivastava, Divesh, Xiao, Xiaokui.  2017.  PrivBayes: Private Data Release via Bayesian Networks. ACM Trans. Database Syst. 42:25:1–25:41.
Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods require injecting a prohibitive amount of noise compared to the signal in the data, which renders the published data next to useless. To address the deficiency of the existing methods, this paper presents PrivBayes, a differentially private method for releasing high-dimensional data. Given a dataset D, PrivBayes first constructs a Bayesian network N, which (i) provides a succinct model of the correlations among the attributes in D and (ii) allows us to approximate the distribution of data in D using a set P of low-dimensional marginals of D. After that, PrivBayes injects noise into each marginal in P to ensure differential privacy and then uses the noisy marginals and the Bayesian network to construct an approximation of the data distribution in D. Finally, PrivBayes samples tuples from the approximate distribution to construct a synthetic dataset, and then releases the synthetic data. Intuitively, PrivBayes circumvents the curse of dimensionality, as it injects noise into the low-dimensional marginals in P instead of the high-dimensional dataset D. Private construction of Bayesian networks turns out to be significantly challenging, and we introduce a novel approach that uses a surrogate function for mutual information to build the model more accurately. We experimentally evaluate PrivBayes on real data and demonstrate that it significantly outperforms existing solutions in terms of accuracy.
2018-08-23
Xu, D., Xiao, L., Sun, L., Lei, M.  2017.  Game theoretic study on blockchain based secure edge networks. 2017 IEEE/CIC International Conference on Communications in China (ICCC). :1–5.

Blockchain has recently been applied to study data privacy and network security. In this paper, we propose a punishment scheme based on the action record on the blockchain to suppress the attack motivation of the edge servers and the mobile devices in the edge network. The interactions between a mobile device and an edge server are formulated as a blockchain security game, in which the mobile device either sends a request to the server to obtain real-time service or launches attacks against the server for illegal security gains, and the server chooses either to perform the request from the device or to attack it. The Nash equilibria (NEs) of the game are derived, and the conditions under which each NE exists are provided to disclose how the punishment scheme impacts the adversarial behaviors of the mobile device and the edge server.

2018-06-20
Chakraborty, S., Stokes, J. W., Xiao, L., Zhou, D., Marinescu, M., Thomas, A.  2017.  Hierarchical learning for automated malware classification. MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM). :23–28.

Despite widespread use of commercial anti-virus products, the number of malicious files detected on home and corporate computers continues to increase at a significant rate. Recently, anti-virus companies have started investing in machine learning solutions to augment signatures manually designed by analysts. A malicious file's determination is often represented as a hierarchical structure consisting of a type (e.g. Worm, Backdoor), a platform (e.g. Win32, Win64), a family (e.g. Rbot, Rugrat) and a family variant (e.g. A, B). While there has been substantial research in automated malware classification, the aforementioned hierarchical structure, which can provide additional information to the classification models, has been ignored. In this paper, we propose the novel idea and study the performance of employing hierarchical learning algorithms for automated classification of malicious files. To the best of our knowledge, this is the first research effort which incorporates the hierarchical structure of the malware label in its automated classification and in the security domain, in general. It is important to note that our method does not require any additional effort by analysts because they typically assign these hierarchical labels today. Our empirical results on a real world, industrial-scale malware dataset of 3.6 million files demonstrate that incorporation of the label hierarchy achieves a significant reduction of 33.1% in the binary error rate as compared to a non-hierarchical classifier which is traditionally used in such problems.

2019-05-31
Bradley Potteiger, William Emfinger, Himanshu Neema, Xenofon Koutsoukos, CheeYee Tang, Keith Stouffer.  2017.  Evaluating the effects of cyber-attacks on cyber physical systems using a hardware-in-the-loop simulation testbed. Resilience Week (RWS). :177-183.

Cyber-Physical Systems (CPS) consist of embedded computers with sensing and actuation capability, and are integrated into and tightly coupled with a physical system. Because the physical and cyber components of the system are tightly coupled, cyber-security is important for ensuring the system functions properly and safely. However, the effects of a cyberattack on the whole system may be difficult to determine, analyze, and therefore detect and mitigate. This work presents a model based software development framework integrated with a hardware-in-the-loop (HIL) testbed for rapidly deploying CPS attack experiments. The framework provides the ability to emulate low level attacks and obtain platform specific performance measurements that are difficult to obtain in a traditional simulation environment. The framework improves the cybersecurity design process which can become more informed and customized to the production environment of a CPS. The developed framework is illustrated with a case study of a railway transportation system.

2018-09-30
B. Potteiger, W. Emfinger, H. Neema, X. Koutsoukos, C. Tang, K. Stouffer.  2017.  Evaluating the effects of cyber-attacks on cyber physical systems using a hardware-in-the-loop simulation testbed. 2017 Resilience Week (RWS). :177-183.
2017-12-20
Xiaohao, S., Baolong, L.  2017.  An Investigation on Tree-Based Tags Anti-collision Algorithms in RFID. 2017 International Conference on Computer Network, Electronic and Automation (ICCNEA). :5–11.

The tree-based tag anti-collision algorithm is an important class of anti-collision algorithms. In this paper, several typical tree algorithms are evaluated. The comparison of the algorithms is summarized in terms of time complexity, communication complexity and recognition, and the characteristics and disadvantages of each algorithm are pointed out. Finally, improvement strategies for tree-based anti-collision algorithms are proposed, and future research directions are outlined.

2018-05-17
Xiong, Xiaobin, Ames, Aaron D, Goldman, Daniel I.  2017.  A Stability Region Criterion for Flat-footed Bipedal Walking on Deformable Granular Terrain. Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on.
2017-09-27
Dai, Hong-Ning, Wang, Hao, Xiao, Hong, Zheng, Zibin, Wang, Qiu, Li, Xuran, Zhuge, Xu.  2016.  On Analyzing Eavesdropping Behaviours in Underwater Acoustic Sensor Networks. Proceedings of the 11th ACM International Conference on Underwater Networks & Systems. :53:1–53:2.
Underwater Acoustic Sensor Networks (UWASNs) have a wide range of applications owing to the recent proliferation of underwater activities. Most current studies focus on designing protocols to improve the network performance of UWASNs. However, the security of UWASNs is also an important concern, since malicious nodes can easily wiretap the information transmitted in UWASNs due to their vulnerability. In this paper, we investigate one of the security problems in UWASNs: eavesdropping behaviours. In particular, we propose a general model to quantitatively evaluate the probability of eavesdropping behaviour in UWASNs. Simulation results also validate the accuracy of our proposed model.
2017-10-27
Xu, Peng, Li, Jingnan, Wang, Wei, Jin, Hai.  2016.  Anonymous Identity-Based Broadcast Encryption with Constant Decryption Complexity and Strong Security. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. :223–233.
Anonymous Identity-Based Broadcast Encryption (AIBBE) allows a sender to broadcast a ciphertext to multiple receivers while preserving the receivers' anonymity. Existing AIBBE schemes fail to achieve efficient decryption or strong security, such as constant decryption complexity, security under adaptive attack, or security in the standard model. Hence, we propose two new AIBBE schemes to overcome the drawbacks of previous state-of-the-art schemes. The biggest contribution of our work is the proposed AIBBE scheme with constant decryption complexity and provable security under adaptive attack in the standard model. This scheme should be the first one to obtain advantages in all of the above-mentioned aspects, and it makes a sufficient theoretical contribution due to its strong security. We also propose another AIBBE scheme in the Random Oracle (RO) model, which is of sufficient interest in practice according to our experiment.
2021-04-08
Wu, X., Yang, Z., Ling, C., Xia, X.  2016.  Artificial-Noise-Aided Message Authentication Codes With Information-Theoretic Security. IEEE Transactions on Information Forensics and Security. 11:1278–1290.
In the past, two main approaches to authentication, information-theoretic authentication codes and complexity-theoretic message authentication codes (MACs), were developed almost independently. In this paper, we consider constructing new MACs that are both computationally secure and information-theoretically secure. Essentially, we propose a new cryptographic primitive, namely, artificial-noise-aided MACs (ANA-MACs), where artificial noise is used to interfere with the complexity-theoretic MACs and quantization is further employed to facilitate packet-based transmission. With a channel coding formulation of key recovery in the MACs, the generation of standard authentication tags can be seen as an encoding process for the ensemble of codes, where the shared key between Alice and Bob is considered as the input and the message is used to specify a code from the ensemble of codes. Then, we show that artificial noise in ANA-MACs can be employed to resist the key recovery attack even if the opponent has unlimited computing power. Finally, a pragmatic approach for the analysis of ANA-MACs is provided, and we show how to balance the three performance metrics: the completeness error, the false acceptance probability, and the conditional equivocation about the key. The analysis can be applied to a class of ANA-MACs in which MACs with the Rijndael cipher are employed.
2017-03-07
Ruan, Wenjie, Sheng, Quan Z., Yang, Lei, Gu, Tao, Xu, Peipei, Shangguan, Longfei.  2016.  AudioGest: Enabling Fine-grained Hand Gesture Detection by Decoding Echo Signal. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. :474–485.
Hand gesture is becoming an increasingly popular means of interacting with consumer electronic devices, such as mobile phones, tablets and laptops. In this paper, we present AudioGest, a device-free gesture recognition system that can accurately sense in-air hand movement around a user's devices. Compared to the state of the art, AudioGest is superior in using only one pair of built-in speaker and microphone, without any extra hardware or infrastructure support and with no training, to achieve fine-grained hand detection. Our system is able to accurately recognize various hand gestures and to estimate the hand in-air time, as well as the average moving speed and waving range. We achieve this by transforming the device into an active sonar system that transmits an inaudible audio signal and decodes the echoes of the hand at its microphone. We address various challenges, including cleaning the noisy reflected sound signal, interpreting the echo spectrogram into hand gestures, decoding the Doppler frequency shifts into the hand waving speed and range, and being robust to environmental motion and signal drifting. We implement the proof-of-concept prototype on three different electronic devices and extensively evaluate the system in four real-world scenarios using 3,900 hand gestures collected from five users over more than two weeks. Our results show that AudioGest can detect six hand gestures with an accuracy of up to 96%, and by distinguishing the gesture attributes, it can provide up to 162 control commands for various applications.
2017-09-27
Xu, Jinsheng, Yuan, Xiaohong, Velma, Ashrith.  2016.  Design and Evaluation of a Course Module on Android Cipher Programming (Abstract Only). Proceedings of the 47th ACM Technical Symposium on Computing Science Education. :689–690.
Encryption is critical in protecting the confidentiality of users' data on mobile devices. However, research has shown that many mobile apps do not use ciphers correctly, which makes them vulnerable to attacks. The existing resources on cipher programming education do not provide enough practical scenarios to help students learn cipher programming in the context of real-world situations, with programs that have complex interacting modules with access to networking, storage, and databases. This poster introduces a course module that teaches students how to develop secure Android applications by correctly using Android's cryptography APIs. The course module targets two areas where programmers commonly make mistakes: password-based encryption and SSL certificate validation. The core of the module includes a real-world sample Android program for students to secure by implementing cryptographic components correctly. The course module will use open-ended problem solving to let students freely explore the multiple options for securing the application. The course module includes a lecture slide on Android's Crypto library, its common misuses, and suggested good practices. Assessment materials will also be included in the course module. This course module will be used and evaluated in a Network Security class. We will present the results of the evaluation at the conference.
2017-10-27
Fang, Fuyang, Li, Bao, Lu, Xianhui, Liu, Yamin, Jia, Dingding, Xue, Haiyang.  2016.  (Deterministic) Hierarchical Identity-based Encryption from Learning with Rounding over Small Modulus. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. :907–912.
In this paper, we propose a hierarchical identity-based encryption (HIBE) scheme in the random oracle (RO) model based on the learning with rounding (LWR) problem over small modulus $q$. Compared with the previous HIBE schemes based on the learning with errors (LWE) problem, the ciphertext expansion ratio of our scheme can be decreased to 1/2. Then, we utilize the HIBE scheme to construct a deterministic hierarchical identity-based encryption (D-HIBE) scheme based on the LWR problem over small modulus. Finally, with the technique of binary tree encryption (BTE) we can construct HIBE and D-HIBE schemes in the standard model based on the LWR problem over small modulus.
2017-09-27
Chen, Zhongyue, Xu, Wen, Chen, Huifang.  2016.  Distributed Sensor Layout Optimization for Target Detection with Data Fusion. Proceedings of the 11th ACM International Conference on Underwater Networks & Systems. :50:1–50:2.
Distributed detection with data fusion has gained great attention in recent years. Collaborative detection improves the performance, and the optimal sensor deployment may change with time. It has been shown that, with data fusion, fewer sensors are needed to achieve the same detection ability when abundant sensors are deployed randomly. However, because of limitations on equipment number and deployment methods, fixed sensor locations may be preferred underwater. In this paper, we try to establish a theoretical framework for finding sensor positions that maximize the detection probability of a distributed sensor network. With joint data processing, detection performance is related to all the sensor locations; as the number of sensors grows, the optimization problem becomes more difficult. To simplify the demonstration, we choose a 1-dimensional line deployment model and present the relevant numerical results.
Wang, Deqing, Zhang, Youfeng, Hu, Xiaoyi, Zhang, Rongxin, Su, Wei, Xie, Yongjun.  2016.  A Dynamic Spectrum Decision Algorithm for Underwater Cognitive Acoustic Networks. Proceedings of the 11th ACM International Conference on Underwater Networks & Systems. :3:1–3:5.
Cognitive acoustic (CA) is emerging as a promising technique for spectrum-efficient Underwater Acoustic Networks (UANs). Due to the unique features of UANs, especially the long propagation delay, the busy terminal problem and the large interference range, traditional spectrum decision methods used for radio networks need an overhaul to work efficiently in the underwater environment. In this paper, we propose a dynamic spectrum decision algorithm called the Receiver-viewed Dynamic Borrowing (RvDB) algorithm for Underwater Cognitive Acoustic Networks (UCANs) to improve the efficiency of spectrum utilization. The RvDB algorithm has the following features. First, the spectrum resource is decided by the receiver. Second, the receivers can dynamically borrow idle spectrum resources from neighbouring nodes. Finally, spectrum sensing is completed by control packets on a control channel that is separated from the data channels. Simulation results show that the RvDB algorithm can greatly improve spectrum efficiency.
2017-10-27
Zhongjing Ma, Suli Zou, Long Ran, Xingyu Shi, Ian Hiskens.  2016.  Efficient decentralized coordination of large-scale plug-in electric vehicle charging. Automatica. 69:35-47.
Minimizing the grid impacts of large-scale plug-in electric vehicle (PEV) charging tends to be associated with coordination strategies that seek to fill the overnight valley in electricity demand. However such strategies can result in high charging power, raising the possibility of local overloads within the distribution grid and of accelerated battery degradation. The paper establishes a framework for PEV charging coordination that facilitates the tradeoff between total generation cost and the local costs associated with overloading and battery degradation. A decentralized approach to solving the resulting large-scale optimization problem involves each PEV minimizing their charging cost with respect to a forecast price profile while taking into account local grid and battery effects. The charging strategies proposed by participating PEVs are used to update the price profile which is subsequently rebroadcast to the PEVs. The process then repeats. It is shown that under mild conditions this iterative process converges to the unique, efficient (socially optimal) coordination strategy.
2017-09-27
Gao, Mingsheng, Chen, Zhenming, Yao, Xiao, Xu, Ning.  2016.  Harmonic Potential Field Based Routing Protocol for 3D Underwater Sensor Networks. Proceedings of the 11th ACM International Conference on Underwater Networks & Systems. :38:1–38:2.
Local minima have been deemed a challenging issue when designing routing protocols for 3D underwater sensor networks. Recently, the harmonic potential field method has been used to tackle the issue of local minima, which was also a major bottleneck in path planning and obstacle avoidance in the robotics community. Inspired by this, this paper proposes a harmonic potential field based routing protocol for 3D underwater sensor networks with local minima. More specifically, the harmonic potential field is calculated using harmonic functions, and Dirichlet boundary conditions are used for the local minima, the sink (or sea buoy) and the sending node. Numerical results show the effectiveness of the proposed routing protocol.
Chen, Huifang, Zhang, Ying, Chen, Zhongyue, Xu, Wen.  2016.  Implementation and Application of Underwater Acoustic Sensor Nodes. Proceedings of the 11th ACM International Conference on Underwater Networks & Systems. :41:1–41:2.
Underwater sensing is envisioned using inexpensive underwater sensor nodes distributed over a wide area, deployed close to the bottom, and networked through underwater acoustic communications. In this paper, an underwater acoustic sensor node to perform underwater sensing is designed and implemented. Specifically, we describe the design criteria, architecture and functional modules of the underwater acoustic sensor node. Moreover, we present the experimental results of ocean current field estimation using the designed underwater acoustic sensor nodes in the sea area of Liuheng, Zhoushan, China.
2017-10-04
Wang, Zhao, Xi, Yuan.  2016.  A Kind of De-noising and Segmentation Method for Hollow CAPTCHAs with Noise Arcs. Proceedings of the Fifth International Conference on Network, Communication and Computing. :68–72.
While many text-based CAPTCHA schemes have been broken, hollow CAPTCHAs, as a new technology, have been adopted by many websites. We investigate the generation method of currently used hollow CAPTCHAs and find that there is a color difference between the boundary of the character contour lines and the noise arcs. An algorithm for noise arc removal that exploits this vulnerability is proposed. Furthermore, a de-noising and segmentation scheme for hollow CAPTCHAs with noise arcs is presented. The scheme is verified on real CAPTCHA data from the website Sina Weibo. The segmentation success rate is 77%. Finally, some advice is given to improve the design of hollow CAPTCHAs.
2017-09-27
Xu, Yanli, Jiang, Shengming, Liu, Feng.  2016.  A LTE-based Communication Architecture for Coastal Networks. Proceedings of the 11th ACM International Conference on Underwater Networks & Systems. :6:1–6:2.
Currently, coastal communication is mainly provided by satellite networks, which are expensive, have low transmission rates, and are unable to support underwater communication efficiently. In this work, we propose a communication architecture for coastal networks based on long term evolution (LTE) cellular networks, in which a cellular network architecture is designed for the maritime communication scenario. Some key technologies of next-generation cellular networks, such as device-to-device (D2D) and multiple input multiple output (MIMO), are integrated into the proposed architecture to support more efficient data transmission. In addition, over-water nodes aid the transmission of the underwater network to improve the communication quality. With the proposed communication architecture, the coastal network can provide high-quality communication service to traffic with different quality-of-service (QoS) requirements.
2017-10-19
Zhang, Chenwei, Xie, Sihong, Li, Yaliang, Gao, Jing, Fan, Wei, Yu, Philip S.  2016.  Multi-source Hierarchical Prediction Consolidation. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :2251–2256.
In big data applications such as healthcare data mining, due to privacy concerns, it is necessary to collect predictions from multiple information sources for the same instance, with raw features being discarded or withheld when aggregating multiple predictions. Besides, crowd-sourced labels need to be aggregated to estimate the ground truth of the data. Due to the imperfection caused by predictive models or human crowdsourcing workers, noisy and conflicting information is ubiquitous and inevitable. Although state-of-the-art aggregation methods have been proposed to handle label spaces with flat structures, as the label space becomes more and more complicated, aggregation under a hierarchical label structure becomes necessary but has been largely ignored. These label hierarchies can be quite informative, as they are usually created by domain experts to make sense of highly complex label correlations such as protein functionality interactions or disease relationships. We propose a novel multi-source hierarchical prediction consolidation method that effectively exploits the complicated hierarchical label structures to resolve the noisy and conflicting information that inherently originates from multiple imperfect sources. We formulate the problem as an optimization problem with a closed-form solution. The consolidation result is inferred in a totally unsupervised, iterative fashion. Experimental results on both synthetic and real-world data sets show the effectiveness of the proposed method over existing alternatives.
2017-05-16
Su, Jinshu, Chen, Shuhui, Han, Biao, Xu, Chengcheng, Wang, Xin.  2016.  A 60Gbps DPI Prototype Based on Memory-Centric FPGA. Proceedings of the 2016 ACM SIGCOMM Conference. :627–628.

Deep packet inspection (DPI) is widely used in content-aware network applications to detect string features. It is of vital importance to improve DPI performance due to ever-increasing link speeds. In this demo, we propose a novel DPI architecture with a hierarchical memory structure and parallel matching engines based on a memory-centric FPGA. The implemented DPI prototype is able to provide up to 60Gbps full-text string matching throughput and fast rule updates.

2017-07-24
Liao, Xiaojing, Yuan, Kan, Wang, XiaoFeng, Li, Zhou, Xing, Luyi, Beyah, Raheem.  2016.  Acing the IOC Game: Toward Automatic Discovery and Analysis of Open-Source Cyber Threat Intelligence. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :755–766.

To adapt to the rapidly evolving landscape of cyber threats, security professionals are actively exchanging Indicators of Compromise (IOC) (e.g., malware signatures, botnet IPs) through public sources (e.g., blogs, forums, tweets, etc.). Such information, often presented in articles, posts, white papers, etc., can be converted into a machine-readable OpenIOC format for automatic analysis and quick deployment to various security mechanisms like an intrusion detection system. With hundreds of thousands of sources in the wild, the IOC data are produced at a high volume and velocity today, which makes them increasingly hard for humans to manage. Efforts to automatically gather such information from unstructured text, however, are impeded by the limitations of today's Natural Language Processing (NLP) techniques, which cannot meet the high standard (in terms of accuracy and coverage) expected from the IOCs that could serve as direct input to a defense system. In this paper, we present iACE, an innovative solution for fully automated IOC extraction. Our approach is based upon the observation that the IOCs in technical articles are often described in a predictable way: being connected to a set of context terms (e.g., "download") through stable grammatical relations. Leveraging this observation, iACE is designed to automatically locate a putative IOC token (e.g., a zip file) and its context (e.g., "malware", "download") within the sentences in a technical article, and further analyze their relations through a novel application of graph mining techniques. Once the grammatical connection between the tokens is found to be in line with the way that the IOC is commonly presented, these tokens are extracted to generate an OpenIOC item that describes not only the indicator (e.g., a malicious zip file) but also its context (e.g., download from an external source).
Running on 71,000 articles collected from 45 leading technical blogs, this new approach demonstrates remarkable performance: it generated 900K OpenIOC items with a precision of 95% and a coverage of over 90%, which is far beyond what the state-of-the-art NLP technique and industry IOC tool can achieve, at a speed of thousands of articles per hour. Further, by correlating the IOCs mined from the articles published over a 13-year span, our study sheds new light on the links across hundreds of seemingly unrelated attack instances, particularly their shared infrastructure resources, as well as the impacts of such open-source threat intelligence on security protection and the evolution of attack strategies.

2017-05-22
Kantarcioglu, Murat, Xi, Bowei.  2016.  Adversarial Data Mining: Big Data Meets Cyber Security. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :1866–1867.

As more and more cyber security incident data, ranging from system logs to vulnerability scan results, are collected, manually analyzing these collected data to detect important cyber security events becomes impossible. Hence, data mining techniques are becoming an essential tool for real-world cyber security applications. For example, a report from Gartner [gartner12] claims that "Information security is becoming a big data analytics problem, where massive amounts of data will be correlated, analyzed and mined for meaningful patterns". Of course, data mining/analytics is a means to an end, where the ultimate goal is to provide cyber security analysts with prioritized actionable insights derived from big data. This raises the question: can we directly apply existing techniques to cyber security applications? One of the most important differences between data mining for cyber security and many other data mining applications is the existence of malicious adversaries that continuously adapt their behavior to hide their actions and to make the data mining models ineffective. Unfortunately, traditional data mining techniques are insufficient to handle such adversarial problems directly. The adversaries adapt to the data miner's reactions, and data mining algorithms constructed based on a training dataset degrade quickly. To address these concerns, over the last couple of years, novel data mining techniques that are more resilient to such adversarial behavior have been developed in the machine learning and data mining community. We believe that lessons learned as a part of this research direction would be beneficial for cyber security researchers who are increasingly applying machine learning and data mining techniques in practice. To give an overview of recent developments in adversarial data mining, in this three-hour tutorial, we introduce the foundations, the techniques, and the applications of adversarial data mining to cyber security applications.
We first introduce various approaches proposed in the past to defend against active adversaries, such as a minimax approach to minimize the worst-case error through a zero-sum game. We then discuss a game theoretic framework to model the sequential actions of the adversary and the data miner, while both parties try to maximize their utilities. We also introduce a modified support vector machine method and a relevance vector machine method to defend against active adversaries. Intrusion detection and malware detection are two important application areas for adversarial data mining models that will be discussed in detail during the tutorial. Finally, we discuss some practical guidelines on how to use adversarial data mining ideas in generic cyber security applications and how to leverage existing big data management tools for building data mining algorithms for cyber security.