Visible to the public Biblio

Found 16998 results

2017-12-12
Shaabani, Nuhad, Meinel, Christoph.  2017.  Incremental Discovery of Inclusion Dependencies. Proceedings of the 29th International Conference on Scientific and Statistical Database Management. :2:1–2:12.

Inclusion dependencies form one of the most fundamental classes of integrity constraints. Their importance in classical data management is reinforced by modern applications such as data profiling, data cleaning, entity resolution and schema matching. Their discovery in an unknown dataset is at the core of any data analysis effort. Therefore, several research approaches have focused on their efficient discovery in a given, static dataset. However, none of these approaches are appropriate for applications on dynamic datasets, such as transactional datasets, scientific applications, and social network. In these cases, discovery techniques should be able to efficiently update the inclusion dependencies after an update in the dataset, without reprocessing the entire dataset. We present the first approach for incrementally updating the unary inclusion dependencies. In particular, our approach is based on the concept of attribute clustering from which the unary inclusion dependencies are efficiently derivable. We incrementally update the clusters after each update of the dataset. Updating the clusters does not need to access the dataset because of special data structures designed to efficiently support the updating process. We perform an exhaustive analysis of our approach by applying it to large datasets with several hundred attributes and more than 116,200,000 million tuples. The results show that the incremental discovery significantly reduces the runtime needed by the static discovery. This reduction in the runtime is up to 99.9996 % for both the insert and the delete.

Zhou, G., Huang, J. X..  2017.  Modeling and Learning Distributed Word Representation with Metadata for Question Retrieval. IEEE Transactions on Knowledge and Data Engineering. 29:1226–1239.

Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find the existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with metadata of category information within cQA pages for question retrieval using two novel category powered models. One is a basic category powered model called MB-NET and the other one is an enhanced category powered model called ME-NET which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the framework of fisher kernel to transform them into the fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further conduct our approaches on large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved.

Saundry, A..  2017.  Institutional Repository Digital Object Metadata Enhancement and Re-Architecting. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL). :1–3.

We present work undertaken at our institutional repository to enhance metadata and re-organize digital objects according to new information architecture, in an effort to minimize administrative object management and processing, and improve object discovery and use. This work was partly motivated by the launch of a new discovery platform at our institution, which aggregates metadata and full text from our four open access repositories into a cohesive, consistent, and enhanced searching and browsing experience. The platform provides digital object identifier (DOI) assignment, metadata access via various formats, and an open metadata and full text application program interface (API) for researchers, amongst other features. Functionality of these platform features relies heavily on accurate object representation and metadata. This work facilitates and improves the discovery and engagement of the diverse digital objects available from our institution, so they can be used and analyzed in new, flexible, and innovative ways by a myriad of communities and disciplines.

Kollenda, B., Göktaş, E., Blazytko, T., Koppe, P., Gawlik, R., Konoth, R. K., Giuffrida, C., Bos, H., Holz, T..  2017.  Towards Automated Discovery of Crash-Resistant Primitives in Binary Executables. 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :189–200.

Many modern defenses rely on address space layout randomization (ASLR) to efficiently hide security-sensitive metadata in the address space. Absent implementation flaws, an attacker can only bypass such defenses by repeatedly probing the address space for mapped (security-sensitive) regions, incurring a noisy application crash on any wrong guess. Recent work shows that modern applications contain idioms that allow the construction of crash-resistant code primitives, allowing an attacker to efficiently probe the address space without causing any visible crash. In this paper, we classify different crash-resistant primitives and show that this problem is much more prominent than previously assumed. More specifically, we show that rather than relying on labor-intensive source code inspection to find a few "hidden" application-specific primitives, an attacker can find such primitives semi-automatically, on many classes of real-world programs, at the binary level. To support our claims, we develop methods to locate such primitives in real-world binaries. We successfully identified 29 new potential primitives and constructed proof-of-concept exploits for four of them.

Ktob, A., Li, Z..  2017.  The Arabic Knowledge Graph: Opportunities and Challenges. 2017 IEEE 11th International Conference on Semantic Computing (ICSC). :48–52.

Semantic Web has brought forth the idea of computing with knowledge, hence, attributing the ability of thinking to machines. Knowledge Graphs represent a major advancement in the construction of the Web of Data where machines are context-aware when answering users' queries. The English Knowledge Graph was a milestone realized by Google in 2012. Even though it is a useful source of information for English users and applications, it does not offer much for the Arabic users and applications. In this paper, we investigated the different challenges and opportunities prone to the life-cycle of the construction of the Arabic Knowledge Graph (AKG) while following some best practices and techniques. Additionally, this work suggests some potential solutions to these challenges. The proprietary factor of data creates a major problem in the way of harvesting this latter. Moreover, when the Arabic data is openly available, it is generally in an unstructured form which requires further processing. The complexity of the Arabic language itself creates a further problem for any automatic or semi-automatic extraction processes. Therefore, the usage of NLP techniques is a feasible solution. Some preliminary results are presented later in this paper. The AKG has very promising outcomes for the Semantic Web in general and the Arabic community in particular. The goal of the Arabic Knowledge Graph is mainly the integration of the different isolated datasets available on the Web. Later, it can be used in both the academic (by providing a large dataset for many different research fields and enhance discovery) and commercial sectors (by improving search engines, providing metadata, interlinking businesses).

Nadgowda, S., Duri, S., Isci, C., Mann, V..  2017.  Columbus: Filesystem Tree Introspection for Software Discovery. 2017 IEEE International Conference on Cloud Engineering (IC2E). :67–74.

Software discovery is a key management function to ensure that systems are free of vulnerabilities, comply with licensing requirements, and support advanced search for systems containing given software. Today, software is predominantly discovered through querying package management tools, or using rules that check for file metadata or contents. These approaches are inadequate as not every software is installed through package managers, and agile development practices lead to frequent deployment of software. Other approaches to software discovery use machine learning methods requiring training phase, or require maintaining knowledge bases. Columbus uses the knowledge of the software packaging practices that evolved over time, and uses the information embedded in the file system impression created by a software package to discover it. Columbus is able to discover software in 92% of all official Docker images. Further, Columbus can be used in problem diagnosis and drift detection situations to compare two different systems, or to determine the evolution of a system overtime.

Kimmig, A., Memory, A., Miller, R. J., Getoor, L..  2017.  A Collective, Probabilistic Approach to Schema Mapping. 2017 IEEE 33rd International Conference on Data Engineering (ICDE). :921–932.

We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using stateof- the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.

Diaz, J. S. B., Medeiros, C. B..  2017.  WorkflowHunt: Combining Keyword and Semantic Search in Scientific Workflow Repositories. 2017 IEEE 13th International Conference on e-Science (e-Science). :138–147.

Scientific datasets and the experiments that analyze them are growing in size and complexity, and scientists are facing difficulties to share such resources. Some initiatives have emerged to try to solve this problem. One of them involves the use of scientific workflows to represent and enact experiment execution. There is an increasing number of workflows that are potentially relevant for more than one scientific domain. However, it is hard to find workflows suitable for reuse given an experiment. Creating a workflow takes time and resources, and their reuse helps scientists to build new workflows faster and in a more reliable way. Search mechanisms in workflow repositories should provide different options for workflow discovery, but it is difficult for generic repositories to provide multiple mechanisms. This paper presents WorkflowHunt, a hybrid architecture for workflow search and discovery for generic repositories, which combines keyword and semantic search to allow finding relevant workflows using different search methods. We validated our architecture creating a prototype that uses real workflows and metadata from myExperiment, and compare search results via WorkflowHunt and via myExperiment's search interface.

2017-12-04
Won, J., Singla, A., Bertino, E..  2017.  CertificateLess Cryptography-Based Rule Management Protocol for Advanced Mission Delivery Networks. 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW). :7–12.

Assured Mission Delivery Network (AMDN) is a collaborative network to support data-intensive scientific collaborations in a multi-cloud environment. Each scientific collaboration group, called a mission, specifies a set of rules to handle computing and network resources. Security is an integral part of the AMDN design since the rules must be set by authorized users and the data generated by each mission may be privacy-sensitive. In this paper, we propose a CertificateLess cryptography-based Rule-management Protocol (CL-RP) for AMDN, which supports authenticated rule registrations and updates with non-repudiation. We evaluate CL-RP through test-bed experiments and compare it with other standard protocols.

Sattar, N. S., Adnan, M. A., Kali, M. B..  2017.  Secured aerial photography using Homomorphic Encryption. 2017 International Conference on Networking, Systems and Security (NSysS). :107–114.

Aerial photography is fast becoming essential in scientific research that requires multi-agent system in several perspective and we proposed a secured system using one of the well-known public key cryptosystem namely NTRU that is somewhat homomorphic in nature. Here we processed images of aerial photography that were captured by multi-agents. The agents encrypt the images and upload those in the cloud server that is untrusted. Cloud computing is a buzzword in modern era and public cloud is being used by people everywhere for its shared, on-demand nature. Cloud Environment faces a lot of security and privacy issues that needs to be solved. This paper focuses on how to use cloud so effectively that there remains no possibility of data or computation breaches from the cloud server itself as it is prone to the attack of treachery in different ways. The cloud server computes on the encrypted data without knowing the contents of the images. After concatenation, encrypted result is delivered to the concerned authority where it is decrypted retaining its originality. We set up our experiment in Amazon EC2 cloud server where several instances were the agents and an instance acted as the server. We varied several parameters so that we could minimize encryption time. After experimentation we produced our desired result within feasible time sustaining the image quality. This work ensures data security in public cloud that was our main concern.

Thayananthan, V., Abdulkader, O., Jambi, K., Bamahdi, A. M..  2017.  Analysis of Cybersecurity Based on Li-Fi in Green Data Storage Environments. 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud). :327–332.

Industrial networking has many issues based on the type of industries, data storage, data centers, and cloud computing, etc. Green data storage improves the scientific, commercial and industrial profile of the networking. Future industries are looking for cybersecurity solution with the low-cost resources in which the energy serving is the main problem in the industrial networking. To improve these problems, green data storage will be the priority because data centers and cloud computing deals with the data storage. In this analysis, we have decided to use solar energy source and different light rays as methodologies include a prism and the Li-Fi techniques. In this approach, light rays sent through the prism which allows us to transmit the data with different frequencies. This approach provides green energy and maximum protection within the data center. As a result, we have illustrated that cloud services within the green data center in industrial networking will achieve better protection with the low-cost energy through this analysis. Finally, we have to conclude that Li-Fi enhances the use of green energy and protection which are advantages to current and future industrial networking.

Al-Shomrani, A., Fathy, F., Jambi, K..  2017.  Policy enforcement for big data security. 2017 2nd International Conference on Anti-Cyber Crimes (ICACC). :70–74.

Security and privacy of big data becomes challenging as data grows and more accessible by more and more clients. Large-scale data storage is becoming a necessity for healthcare, business segments, government departments, scientific endeavors and individuals. Our research will focus on the privacy, security and how we can make sure that big data is secured. Managing security policy is a challenge that our framework will handle for big data. Privacy policy needs to be integrated, flexible, context-aware and customizable. We will build a framework to receive data from customer and then analyze data received, extract privacy policy and then identify the sensitive data. In this paper we will present the techniques for privacy policy which will be created to be used in our framework.

Rodrigues, P., Sreedharan, S., Basha, S. A., Mahesh, P. S..  2017.  Security threat identification using energy points. 2017 2nd International Conference on Anti-Cyber Crimes (ICACC). :52–54.

This research paper identifies security issues; especially energy based security attacks and enhances security of the system. It is very essential to consider Security of the system to be developed in the initial Phases of the software Cycle of Software Development (SDLC) as many billions of bucks are drained owing to security flaws in software caused due to improper or no security process. Security breaches that occur on software system are in umpteen numbers. Scientific Literature propose many solutions to overcome security issues, all security mechanisms are reactive in nature. In this paper new security solution is proposed that is proactive in nature especially for energy based denial of service attacks which is frequent in the recent past. Proposed solution is based on energy consumption by system known as energy points.

Athinaiou, M..  2017.  Cyber security risk management for health-based critical infrastructures. 2017 11th International Conference on Research Challenges in Information Science (RCIS). :402–407.

This brief paper reports on an early stage ongoing PhD project in the field of cyber-physical security in health care critical infrastructures. The research overall aims to develop a methodology that will increase the ability of secure recovery of health critical infrastructures. This ambitious or reckless attempt, as it is currently at an early stage, in this paper, tries to answer why cyber-physical security for health care infrastructures is important and of scientific interest. An initial PhD project methodology and expected outcomes are also discussed. The report concludes with challenges that emerge and possible future directions.

Mayer, N., Feltus, C..  2017.  Evaluation of the risk and security overlay of archimate to model information system security risks. 2017 IEEE 21st International Enterprise Distributed Object Computing Workshop (EDOCW). :106–116.

We evaluated the support proposed by the RSO to represent graphically our EAM-ISSRM (Enterprise Architecture Management - Information System Security Risk Management) integrated model. The evaluation of the RSO visual notation has been done at two different levels: completeness with regards to the EAM-ISSRM integrated model (Section III) and cognitive effectiveness, relying on the nine principles established by D. Moody ["The 'Physics' of Notations: Toward a Scientific Basis for Constructing Visual Notations in Software Engineering," IEEE Trans. Softw. Eng., vol. 35, no. 6, pp. 756-779, Nov. 2009] (Section IV). Regarding completeness, the coverage of the EAMISSRM integrated model by the RSO is complete apart from 'Event'. As discussed in Section III, this lack is negligible and we can consider the RSO as an appropriate notation to support the EAM-ISSRM integrated model from a completeness point of view. Regarding cognitive effectiveness, many gaps have been identified with regards to the nine principle established by Moody. Although no quantitative analysis has been performed to objectify this conclusion, the RSO can decently not be considered as an appropriate notation from a cognitive effectiveness point of view and there is room to propose a notation better on this aspect. This paper is focused on assessing the RSO without suggesting improvements based on the conclusions drawn. As a consequence, our objective for future work is to propose a more cognitive effective visual notation for the EAM-ISSRM integrated model. The approach currently considered is to operationalize Moody's principles into concrete metrics and requirements, taking into account the needs and profile of the target group of our notation (information security risk managers) through personas development and user experience map. With such an approach, we will be able to make decisions on the necessary trade-offs about our visual syntax, taking care of a specific context. We also aim at valida- ing our proposal(s) with the help of tools and approaches extracted from cognitive psychology research applied to HCI domain (e.g., eye tracking, heuristic evaluation, user experience evaluation…).

Lier, B. van.  2017.  The industrial internet of things and cyber security: An ecological and systemic perspective on security in digital industrial ecosystems. 2017 21st International Conference on System Theory, Control and Computing (ICSTCC). :641–647.

All over the world, objects are increasingly connected in networks such as the Industrial Internet of Things. Interconnections, intercommunications and interactions are driving the development of an entirely new whole in the form of the Industrial Internet of Things. Communication and interaction are the norm both for separate components, such as cyber-physical systems, and for the functioning of the system as a whole. This new whole can be likened to a natural ecosystem where the process of homeostasis ensures the stability and security of the whole. Components of such an industrial ecosystem, or even an industrial ecosystem as a whole, are increasingly targeted by cyber attacks. Such attacks not only threaten the functioning of one or multiple components, they also constitute a threat to the functioning of the new whole. General systems theory can offer a scientific framework for the development of measures to improve the security and stability of both separate components and the new whole.

Hwang, T..  2017.  NSF GENI cloud enabled architecture for distributed scientific computing. 2017 IEEE Aerospace Conference. :1–8.

GENI (Global Environment for Network Innovations) is a National Science Foundation (NSF) funded program which provides a virtual laboratory for networking and distributed systems research and education. It is well suited for exploring networks at a scale, thereby promoting innovations in network science, security, services and applications. GENI allows researchers obtain compute resources from locations around the United States, connect compute resources using 100G Internet2 L2 service, install custom software or even custom operating systems on these compute resources, control how network switches in their experiment handle traffic flows, and run their own L3 and above protocols. GENI architecture incorporates cloud federation. With the federation, cloud resources can be federated and/or community of clouds can be formed. The heart of federation is user identity and an ability to “advertise” cloud resources into community including compute, storage, and networking. GENI administrators can carve out what resources are available to the community and hence a portion of GENI resources are reserved for internal consumption. GENI architecture also provides “stitching” of compute and storage resources researchers request. This provides L2 network domain over Internet2's 100G network. And researchers can run their Software Defined Networking (SDN) controllers on the provisioned L2 network domain for a complete control of networking traffic. This capability is useful for large science data transfer (bypassing security devices for high throughput). Renaissance Computing Institute (RENCI), a research institute in the state of North Carolina, has developed ORCA (Open Resource Control Architecture), a GENI control framework. ORCA is a distributed resource orchestration system to serve science experiments. ORCA provides compute resources as virtual machines and as well as baremetals. ORCA based GENI ra- k was designed to serve both High Throughput Computing (HTC) and High Performance Computing (HPC) type of computes. Although, GENI is primarily used in various universities and research entities today, GENI architecture can be leveraged in the commercial, aerospace and government settings. This paper will go over the architecture of GENI and discuss the GENI architecture for scientific computing experiments.

Johnston, B., Lee, B., Angove, L., Rendell, A..  2017.  Embedded Accelerators for Scientific High-Performance Computing: An Energy Study of OpenCL Gaussian Elimination Workloads. 2017 46th International Conference on Parallel Processing Workshops (ICPPW). :59–68.

Energy efficient High-Performance Computing (HPC) is becoming increasingly important. Recent ventures into this space have introduced an unlikely candidate to achieve exascale scientific computing hardware with a small energy footprint. ARM processors and embedded GPU accelerators originally developed for energy efficiency in mobile devices, where battery life is critical, are being repurposed and deployed in the next generation of supercomputers. Unfortunately, the performance of executing scientific workloads on many of these devices is largely unknown, yet the bulk of computation required in high-performance supercomputers is scientific. We present an analysis of one such scientific code, in the form of Gaussian Elimination, and evaluate both execution time and energy used on a range of embedded accelerator SoCs. These include three ARM CPUs and two mobile GPUs. Understanding how these low power devices perform on scientific workloads will be critical in the selection of appropriate hardware for these supercomputers, for how can we estimate the performance of tens of thousands of these chips if the performance of one is largely unknown?

Farinholt, B., Rezaeirad, M., Pearce, P., Dharmdasani, H., Yin, H., Blond, S. L., McCoy, D., Levchenko, K..  2017.  To Catch a Ratter: Monitoring the Behavior of Amateur DarkComet RAT Operators in the Wild. 2017 IEEE Symposium on Security and Privacy (SP). :770–787.

Remote Access Trojans (RATs) give remote attackers interactive control over a compromised machine. Unlike large-scale malware such as botnets, a RAT is controlled individually by a human operator interacting with the compromised machine remotely. The versatility of RATs makes them attractive to actors of all levels of sophistication: they've been used for espionage, information theft, voyeurism and extortion. Despite their increasing use, there are still major gaps in our understanding of RATs and their operators, including motives, intentions, procedures, and weak points where defenses might be most effective. In this work we study the use of DarkComet, a popular commercial RAT. We collected 19,109 samples of DarkComet malware found in the wild, and in the course of two, several-week-long experiments, ran as many samples as possible in our honeypot environment. By monitoring a sample's behavior in our system, we are able to reconstruct the sequence of operator actions, giving us a unique view into operator behavior. We report on the results of 2,747 interactive sessions captured in the course of the experiment. During these sessions operators frequently attempted to interact with victims via remote desktop, to capture video, audio, and keystrokes, and to exfiltrate files and credentials. To our knowledge, we are the first large-scale systematic study of RAT use.

Donno, M. De, Dragoni, N., Giaretta, A., Spognardi, A..  2017.  Analysis of DDoS-capable IoT malwares. 2017 Federated Conference on Computer Science and Information Systems (FedCSIS). :807–816.

The Internet of Things (IoT) revolution promises to make our lives easier by providing cheap and always connected smart embedded devices, which can interact on the Internet and create added values for human needs. But all that glitters is not gold. Indeed, the other side of the coin is that, from a security perspective, this IoT revolution represents a potential disaster. This plethora of IoT devices that flooded the market were very badly protected, thus an easy prey for several families of malwares that can enslave and incorporate them in very large botnets. This, eventually, brought back to the top Distributed Denial of Service (DDoS) attacks, making them more powerful and easier to achieve than ever. This paper aims at provide an up-to-date picture of DDoS attacks in the specific subject of the IoT, studying how these attacks work and considering the most common families in the IoT context, in terms of their nature and evolution through the years. It also explores the additional offensive capabilities that this arsenal of IoT malwares has available, to mine the security of Internet users and systems. We think that this up-to-date picture will be a valuable reference to the scientific community in order to take a first crucial step to tackle this urgent security issue.

Fraunholz, D., Zimmermann, M., Anton, S. D., Schneider, J., Schotten, H. Dieter.  2017.  Distributed and highly-scalable WAN network attack sensing and sophisticated analysing framework based on Honeypot technology. 2017 7th International Conference on Cloud Computing, Data Science Engineering - Confluence. :416–421.

Recently, the increase of interconnectivity has led to a rising amount of IoT enabled devices in botnets. Such botnets are currently used for large scale DDoS attacks. To keep track with these malicious activities, Honeypots have proven to be a vital tool. We developed and set up a distributed and highly-scalable WAN Honeypot with an attached backend infrastructure for sophisticated processing of the gathered data. For the processed data to be understandable we designed a graphical frontend that displays all relevant information that has been obtained from the data. We group attacks originating in a short period of time in one source as sessions. This enriches the data and enables a more in-depth analysis. We produced common statistics like usernames, passwords, username/password combinations, password lengths, originating country and more. From the information gathered, we were able to identify common dictionaries used for brute-force login attacks and other more sophisticated statistics like login attempts per session and attack efficiency.

Boudguiga, A., Bouzerna, N., Granboulan, L., Olivereau, A., Quesnel, F., Roger, A., Sirdey, R..  2017.  Towards Better Availability and Accountability for IoT Updates by Means of a Blockchain. 2017 IEEE European Symposium on Security and Privacy Workshops (EuroS PW). :50–58.

Building the Internet of Things requires deploying a huge number of objects with full or limited connectivity to the Internet. Given that these objects are exposed to attackers and generally not secured-by-design, it is essential to be able to update them, to patch their vulnerabilities and to prevent hackers from enrolling them into botnets. Ideally, the update infrastructure should implement the CIA triad properties, i.e., confidentiality, integrity and availability. In this work, we investigate how the use of a blockchain infrastructure can meet these requirements, with a focus on availability. In addition, we propose a peer-to-peer mechanism, to spread updates between objects that have limited access to the Internet. Finally, we give an overview of our ongoing prototype implementation.

Joshi, H. P., Bennison, M., Dutta, R..  2017.  Collaborative botnet detection with partial communication graph information. 2017 IEEE 38th Sarnoff Symposium. :1–6.

Botnets have long been used for malicious purposes with huge economic costs to the society. With the proliferation of cheap but non-secure Internet-of-Things (IoT) devices generating large amounts of data, the potential for damage from botnets has increased manifold. There are several approaches to detect bots or botnets, though many traditional techniques are becoming less effective as botnets with centralized command & control structure are being replaced by peer-to-peer (P2P) botnets which are harder to detect. Several algorithms have been proposed in literature that use graph analysis or machine learning techniques to detect the overlay structure of P2P networks in communication graphs. Many of these algorithms however, depend on the availability of a universal communication graph or a communication graph aggregated from several ISPs, which is not likely to be available in reality. In real world deployments, significant gaps in communication graphs are expected and any solution proposed should be able to work with partial information. In this paper, we analyze the effectiveness of some community detection algorithms in detecting P2P botnets, especially with partial information. We show that the approach can work with only about half of the nodes reporting their communication graphs, with only small increase in detection errors.

Alejandre, F. V., Cortés, N. C., Anaya, E. A..  2017.  Feature selection to detect botnets using machine learning algorithms. 2017 International Conference on Electronics, Communications and Computers (CONIELECOMP). :1–7.

In this paper, a novel method to do feature selection to detect botnets at their phase of Command and Control (C&C) is presented. A major problem is that researchers have proposed features based on their expertise, but there is no a method to evaluate these features since some of these features could get a lower detection rate than other. To this aim, we find the feature set based on connections of botnets at their phase of C&C, that maximizes the detection rate of these botnets. A Genetic Algorithm (GA) was used to select the set of features that gives the highest detection rate. We used the machine learning algorithm C4.5, this algorithm did the classification between connections belonging or not to a botnet. The datasets used in this paper were extracted from the repositories ISOT and ISCX. Some tests were done to get the best parameters in a GA and the algorithm C4.5. We also performed experiments in order to obtain the best set of features for each botnet analyzed (specific), and for each type of botnet (general) too. The results are shown at the end of the paper, in which a considerable reduction of features and a higher detection rate than the related work presented were obtained.

Gardner, M. T., Beard, C., Medhi, D..  2017.  Using SEIRS Epidemic Models for IoT Botnets Attacks. DRCN 2017 - Design of Reliable Communication Networks; 13th International Conference. :1–8.

The spread of Internet of Things (IoT) botnets like those utilizing the Mirai malware were successful enough to power some of the most powerful DDoS attacks that have been seen thus far on the Internet. Two such attacks occurred on October 21, 2016 and September 20, 2016. Since there are an estimated three billion IoT devices currently connected to the Internet, these attacks highlight the need to understand the spread of IoT worms like Mirai and the vulnerability that they create for the Internet. In this work, we describe the spread of IoT worms using a proposed model known as the IoT Botnet with Attack Information (IoT-BAI), which utilizes a variation of the Susceptible-Exposed-Infected-Recovered-Susceptible (SEIRS) epidemic model [14]. The IoT-BAI model has shown that it may be possible to mitigate the frequency of IoT botnet attacks with improved user information which may positively affect user behavior. Additionally, the IoT-BAI model has shown that increased vulnerability to attack can be caused by new hosts entering the IoT population on a daily basis. Models like IoT-BAI could be used to predict user behavior after significant events in the network like a significant botnet attack.