Biblio
Testing a software product line such as Linux implies building the source with different configurations. Manual approaches to generating configurations that enable code of interest are doomed to fail due to the large number of variation points distributed over the feature model, the build system and the source code. Research has proposed various approaches to generate covering configurations, but the algorithms show many drawbacks related to run time, exhaustiveness and the number of generated configurations. Hence, analyzing an entire Linux source tree can yield more than 30 thousand configurations and thereby exceed the limited budget and resources available for build testing. In this paper, we present an approach to fill the gap between the systematic generation of configurations and the necessity to fully build software in order to test it. By merging previously generated configurations, we reduce the number of necessary builds and enable global variability-aware testing. We reduce the problem of merging configurations to finding maximum cliques in a graph. We evaluate the approach on the Linux kernel, compare the results to common practices in industry, and show that our implementation scales even when facing graphs with millions of edges.
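To illustrate the reduction described above, the following sketch builds a compatibility graph over partial configurations and merges each maximal clique into a single build configuration. It is a minimal illustration only, not the paper's implementation; the dict-based configuration format, the compatible() predicate and the greedy clique selection are assumptions.

```python
# Hedged sketch: reducing configuration merging to maximum cliques.
# Configurations are modeled as dicts mapping option names to values;
# two configurations are "compatible" (mergeable) if they never assign
# different values to the same option.
import networkx as nx

def compatible(a, b):
    """Two partial configurations can be merged if they agree on all shared options."""
    return all(b[k] == v for k, v in a.items() if k in b)

def merge_by_cliques(configs):
    g = nx.Graph()
    g.add_nodes_from(range(len(configs)))
    for i in range(len(configs)):
        for j in range(i + 1, len(configs)):
            if compatible(configs[i], configs[j]):
                g.add_edge(i, j)
    merged, covered = [], set()
    # Greedily pick large cliques until every configuration is covered by some build.
    for clique in sorted(nx.find_cliques(g), key=len, reverse=True):
        remaining = [i for i in clique if i not in covered]
        if not remaining:
            continue
        combo = {}
        for i in remaining:
            combo.update(configs[i])
        merged.append(combo)
        covered.update(remaining)
    return merged

# Example: three partial configurations, two of which conflict on CONFIG_B.
configs = [{"CONFIG_A": "y"}, {"CONFIG_B": "y"}, {"CONFIG_B": "n", "CONFIG_C": "m"}]
print(merge_by_cliques(configs))
```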
In transactional database systems, multiversion concurrency control is maintained to provide secure, fast and efficient access to shared data. Effective coordination must be established between owners, users, developers and system operators to maintain inter-cloud and intra-cloud communication. Most of the services and applications offered in the cloud are real-time, which entails an optimized, compatible service environment between master and slave clusters. The methodology offered in this paper supports replication and triggering methods intended for data consistency and dynamicity, where intercommunication between different clusters is processed through middleware and intra-communication among slaves is handled by verification and identification protection. The proposed approach incorporates resistive flow to handle high-impact systems by identifying and verifying multiple processes. Results show that the new scheme reduces the overheads of different master and slave servers, as they are co-located in clusters, which allows increased horizontal and vertical scalability of resources.
Time optimal reachability analysis employs model checking to compute goal states that can be reached from an initial state with a minimal accumulated time duration. The model checker may produce a corresponding diagnostic trace, which can be interpreted as a feasible schedule for many scheduling and planning problems, response time optimization, etc. We propose swarm verification to accelerate time optimal reachability using the real-time model checker Uppaal. In swarm verification, a large number of model-checker instances execute in parallel on a computer cluster using different, typically randomized search strategies. We develop four swarm algorithms and evaluate them with four models in terms of scalability and time and memory consumption. Three of these cooperate by exchanging the costs of intermediate solutions to prune the search using a branch-and-bound approach. Our results show that the swarm algorithms work much faster than sequential algorithms, and especially the two using combinations of random depth-first and breadth-first search show very promising performance.
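A minimal sketch of the cooperative pruning idea (branch-and-bound over shared intermediate costs) follows. It runs several randomized depth-first searches over a toy weighted state space instead of Uppaal models; the state space, the in-process sharing of the bound and the fixed seeds are assumptions for illustration.

```python
# Hedged sketch: several randomized depth-first searches share the best cost
# found so far and prune any branch whose accumulated duration already
# exceeds it (branch-and-bound). The "model" is just a weighted graph.
import random

EDGES = {  # hypothetical state space: state -> [(successor, duration), ...]
    "s0": [("s1", 2), ("s2", 5)],
    "s1": [("s3", 4), ("goal", 9)],
    "s2": [("goal", 3)],
    "s3": [("goal", 1)],
}

def random_dfs(state, elapsed, best):
    """Randomized DFS that prunes against the shared best cost."""
    if elapsed >= best[0]:          # bound: no point exploring further
        return
    if state == "goal":
        best[0] = elapsed           # new incumbent, visible to later searches
        return
    successors = EDGES.get(state, [])[:]
    random.shuffle(successors)      # different search order per "swarm member"
    for nxt, dur in successors:
        random_dfs(nxt, elapsed + dur, best)

best = [float("inf")]               # shared bound (exchanged over the cluster in the real setting)
for seed in range(8):               # eight swarm instances with different randomization
    random.seed(seed)
    random_dfs("s0", 0, best)
print("minimal accumulated duration:", best[0])
```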
Unsafe behavior of cyber-physical systems can have disastrous consequences, motivating the need for formal verification of these kinds of systems. Deductive verification in a proof assistant such as Coq is a promising technique for this verification because it (1) justifies all verification from first principles, (2) is not limited to classes of systems for which full automation is possible, and (3) provides a platform for proving powerful, higher-order modularity theorems that are crucial for scaling verification to complex systems. In this paper, we demonstrate the practicality, utility, and scalability of this approach by developing in Coq sound and powerful rules for modular construction and verification of sampled-data cyber-physical systems. We evaluate these rules by using them to verify a number of non-trivial controllers enforcing safety properties of a quadcopter, e.g. a geo-fence. We show that our controllers are realistic by running them on a real, flying quadcopter.
Cloud service providers typically adopt the multi-tenancy model to optimize resource usage and achieve the promised cost-effectiveness. Sharing resources between different tenants and the underlying complex technology increase the need for transparency and accountability. In this regard, auditing the security compliance of the provider's infrastructure against standards, regulations and customers' policies takes on increasing importance in the cloud to boost trust between the stakeholders. However, virtualization and scalability make compliance verification challenging. In this work, we propose an automated framework that allows auditing the cloud infrastructure from a structural point of view, focusing on virtualization-related security properties and consistency between multiple control layers. Furthermore, to show the feasibility of our approach, we integrate our auditing system into OpenStack, one of the most widely used cloud infrastructure management systems. To show the scalability and validity of our framework, we present experimental results on assessing several properties related to auditing inter-layer consistency, virtual machine co-residence, and virtual resource isolation.
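The following sketch illustrates the kind of structural checks mentioned above: comparing VM placement reported by two control layers and flagging co-residence of tenants that must be isolated. The data sources (management database vs. per-host hypervisor view), the policy format and all names are hypothetical; this is not the paper's auditing framework.

```python
# Hedged sketch of inter-layer consistency and co-residence auditing.
from collections import defaultdict

mgmt_view   = {"vm1": "host1", "vm2": "host1", "vm3": "host2"}   # e.g. management-layer database
hyperv_view = {"vm1": "host1", "vm2": "host2", "vm3": "host2"}   # e.g. per-host hypervisor inventory
tenant_of   = {"vm1": "tenantA", "vm2": "tenantB", "vm3": "tenantA"}
conflicts   = {("tenantA", "tenantB")}                            # tenants that must not share a host

def audit():
    findings = []
    # Inter-layer consistency: both layers must agree on VM placement.
    for vm, host in mgmt_view.items():
        if hyperv_view.get(vm) != host:
            findings.append(f"inconsistent placement for {vm}: {host} vs {hyperv_view.get(vm)}")
    # Co-residence: group VMs by actual host and check tenant isolation policy.
    by_host = defaultdict(list)
    for vm, host in hyperv_view.items():
        by_host[host].append(vm)
    for host, vms in by_host.items():
        tenants = {tenant_of[v] for v in vms}
        for a, b in conflicts:
            if a in tenants and b in tenants:
                findings.append(f"co-residence violation on {host}: {sorted(tenants)}")
    return findings

print(audit())
```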
Since debugging is a time-consuming activity, automated program repair tools such as GenProg have garnered interest. A recent study revealed that the majority of GenProg repairs avoid bugs simply by deleting functionality. We found that SPR, a state-of-the-art repair tool proposed in 2015, still deletes functionality in many of its "plausible" repairs. Unlike generate-and-validate systems such as GenProg and SPR, semantics-based repair techniques synthesize a repair based on semantic information about the program. While such semantics-based repair methods show promise in terms of the quality of generated repairs, their scalability has been a concern so far. In this paper, we present Angelix, a novel semantics-based repair method that scales up to programs of similar size to those handled by search-based repair tools such as GenProg and SPR. This shows that Angelix is more scalable than previously proposed semantics-based repair methods such as SemFix and DirectFix. Furthermore, our repair method can repair multiple buggy locations that are dependent on each other. Such repairs are hard to achieve using SPR and GenProg. In our experiments, Angelix generated repairs for large-scale real-world software such as wireshark and php, including multi-location repairs. We also report our experience in automatically repairing the well-known Heartbleed vulnerability.
We present the design and implementation of a trust-on-first-use (TOFU) policy for OpenPGP. When an OpenPGP user verifies a signature, TOFU checks that the signer used the same key as in the past. If not, this is a strong indicator that a key is a forgery and either the message is also a forgery or an active man-in-the-middle attack (MitM) is or was underway. That is, TOFU can proactively detect new attacks if the user had previously verified a message from the signer. And, it can reactively detect an attack if the signer gets a message through. TOFU cannot, however, protect against sustained MitM attacks. Despite this weakness, TOFU's practical security is stronger than the Web of Trust (WoT), OpenPGP's current trust policy, for most users. The problem with the WoT is that it requires too much user support. TOFU is also better than the most popular alternative, an X.509-based PKI, which relies on central servers whose certification processes are often sloppy. In this paper, we outline how TOFU can be integrated into OpenPGP; we address a number of potential attacks against TOFU; and, we show how TOFU can work alongside the WoT. Our implementation demonstrates the practicality of the approach.
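A minimal sketch of the trust-on-first-use check is shown below: the first key fingerprint observed for a signer is recorded, and any later signature made with a different key is reported as a conflict. The JSON store, function names and toy fingerprints are illustrative assumptions, not GnuPG's actual TOFU implementation.

```python
# Hedged sketch of a trust-on-first-use (TOFU) binding check.
import json, os

DB = "tofu_bindings.json"  # hypothetical local store of signer -> fingerprint bindings

def load():
    return json.load(open(DB)) if os.path.exists(DB) else {}

def check_signer(email, fingerprint):
    bindings = load()
    known = bindings.get(email)
    if known is None:
        bindings[email] = fingerprint           # first use: trust and remember
        json.dump(bindings, open(DB, "w"))
        return "accepted (first use)"
    if known == fingerprint:
        return "accepted (matches recorded key)"
    # A different key for a known signer suggests a forged key or a MitM attack.
    return "CONFLICT: signer previously used key %s" % known

print(check_signer("alice@example.org", "AB12CD34"))
print(check_signer("alice@example.org", "FF00FF00"))
```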
The semantics of online authentication in the web are rather straightforward: if Alice has a certificate binding Bob's name to a public key, and if a remote entity can prove knowledge of Bob's private key, then (barring key compromise) that remote entity must be Bob. However, in reality, many websites, and the majority of the most popular ones, are hosted at least in part by third parties such as Content Delivery Networks (CDNs) or web hosting providers. Put simply: administrators of websites who deal with (extremely) sensitive user data are giving their private keys to third parties. Importantly, this sharing of keys is undetectable by most users, and widely unknown even among researchers. In this paper, we perform a large-scale measurement study of key sharing in today's web. We analyze the prevalence with which websites trust third-party hosting providers with their secret keys, as well as the impact that this trust has on responsible key management practices, such as revocation. Our results reveal that key sharing is extremely common, with a small handful of hosting providers having keys from the majority of the most popular websites. We also find that hosting providers often manage their customers' keys, and that they tend to react more slowly yet more thoroughly to compromised or potentially compromised keys.
In this paper we discuss several improvements to the security and reliability of a classic Bluetooth network (piconet) that arise from being able to transmit the same frame on two frequencies in each slot, instead of only one frequency as in the current standard. Furthermore, we build upon this possibility and show that piconet participants can explore many strategies to increase the security of their communications by confounding eavesdroppers, such as multiple hopping sequences, random selection of a hopping sequence on each transmission slot and variable frame encryption per hopping sequence. Finally, all this can be decided independently by any piconet participant without having to agree in real time on some type of service with other participants of the same piconet.
Recent studies show that by the end of 2016 more than 60% of Internet traffic would be running over HTTPS. In the presence of secure tunnels such as HTTPS, transparent caching solutions become ineffective, as the application payload is encrypted by lower-level security protocols. This paper addresses this issue and provides an alternative approach for caching content without compromising its security. There are three parts to our proposal. First, we propose two new IP-layer primitives that allow routers to differentiate between IP and ICN flows. Second, we introduce DCAR (Dual-mode Content Aware Router), a traditional IP router enabled to understand the proposed IP primitives. Third, we propose the design of the DISCS (DCAR based Information centric Secure Content Sharing) framework, which leverages DCAR to allow content object caching along with security services comparable to HTTPS. Finally, we share details on realizing such a system.
Twitter is one of the most popular microblogging social systems, providing a set of distinctive posting services operating in real time. The flexibility of these services has attracted unethical individuals, so-called "spammers", aiming at spreading malicious, phishing, and misleading information. Unfortunately, the existence of spam results in non-negligible problems related to search and users' privacy. In the battle against spam, various detection methods have been designed, which automate the detection process using the "features" concept combined with machine learning methods. However, the existing features are not effective enough to adapt to spammers' tactics, because they are easy to manipulate. Also, despite the high performance obtainable with graph features, such features are not suitable for Twitter-based applications. In this paper, beyond simple statistical features such as the number of hashtags and the number of URLs, we examine the time property by advancing the design of some features used in the literature and proposing new time-based features. The new design of features is divided between robust advanced statistical features explicitly incorporating the time attribute and behavioral features identifying posting behavior patterns. The experimental results show that the new form of features is able to correctly classify the majority of spammers with an accuracy higher than 93% when using the Random Forest learning algorithm, applied on a collected and annotated dataset. The results obtained outperform the accuracy of state-of-the-art features by about 6%, proving the significance of leveraging time in detecting spam accounts.
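The sketch below illustrates the flavor of time-based features discussed above: simple statistics over an account's inter-posting gaps fed to a Random Forest classifier. The specific features, thresholds and toy data are assumptions and do not reproduce the paper's feature set.

```python
# Hedged sketch: time-based posting features plus a Random Forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def time_features(timestamps):
    """timestamps: posting times of one account, in seconds."""
    gaps = np.diff(np.sort(timestamps))
    if len(gaps) == 0:
        return [0.0, 0.0, 0.0]
    return [gaps.mean(),            # average inter-tweet delay
            gaps.std(),             # burstiness: spammers often post at short, regular intervals
            (gaps < 60).mean()]     # fraction of tweets posted less than a minute apart

# Toy training data: accounts labelled 1 = spammer, 0 = legitimate.
accounts = [[0, 30, 61, 90, 121], [0, 4000, 9500, 20000, 86000]]
X = np.array([time_features(a) for a in accounts])
y = np.array([1, 0])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([time_features([0, 20, 45, 70])]))
```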
In the last few years, the wide acceptance of service computing delivered over the Internet has created immense security challenges for service providers. Cyber criminals use advanced malware such as polymorphic botnets to infiltrate everyday online activities and to access sensitive information such as personal details, credit card numbers and banking credentials. The polymorphic botnet attack is one of the biggest attacks in the history of cybercrime, and millions of computers around the world are currently infected by botnet clients. A botnet attack is an intelligent and highly coordinated distributed attack consisting of a large number of bots that generate large volumes of spam e-mail and launch distributed denial of service (DDoS) attacks on victim machines in a heterogeneous network environment. Therefore, it is necessary to detect malicious bots and prevent their planned attacks in the cloud environment. A number of techniques for detecting malicious bots in a network have been developed in the literature. This paper discusses the ineffectiveness of signature-based detection, network-traffic-based detection such as NetFlow or traffic-flow detection, and anomaly-based detection. We propose a real-time malware detection methodology based on Domain Generation Algorithms. It increases throughput in terms of early detection of malicious bots and high accuracy in identifying suspicious behavior.
Botnets play a major role in a vast number of threats to network security, such as DDoS attacks, spam email generation and information theft. Detecting botnets is a difficult task due to the complexity and performance issues involved in analyzing the huge amount of data from real large-scale networks. In most botnet malware, the use of Domain Generation Algorithms (DGAs) decreases the likelihood of detection by whitelist/blacklist schemes, and thus DGA botnets have a higher survival rate. This paper proposes a DGA botnet detection scheme based on DNS traffic analysis which utilizes semantic measures such as entropy, the meaningfulness of domain levels, the frequency of n-gram appearances, and the Mahalanobis distance for domain classification. The proposed method is an improvement of the Phoenix botnet detection mechanism, where in the classification phase a modified Mahalanobis distance is used instead of the original, and the clustering phase is based on a modified k-means algorithm for better effectiveness. The effectiveness of the proposed method was measured and compared with the Phoenix, Linguistic and SVM Light methods. The experimental results show that the accuracy of the proposed botnet detection scheme ranges from 90% to 99.97%, depending on the botnet type.
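The following sketch computes the kind of lexical measures named above (character entropy, n-gram statistics) per domain and scores domains by Mahalanobis distance to a small benign sample. The feature choice, the bigram list and the regularization are illustrative assumptions; the paper's modified distance and clustering phases are not reproduced.

```python
# Hedged sketch: lexical features per domain and Mahalanobis distance to a benign cluster.
import math
import numpy as np

def entropy(s):
    counts = {c: s.count(c) for c in set(s)}
    return -sum(n / len(s) * math.log2(n / len(s)) for n in counts.values())

def bigram_score(s, common=("er", "in", "on", "an", "le", "co")):
    bigrams = [s[i:i + 2] for i in range(len(s) - 1)]
    return sum(b in common for b in bigrams) / max(len(bigrams), 1)

def features(domain):
    label = domain.split(".")[0]
    return np.array([entropy(label), bigram_score(label), len(label)])

benign = np.array([features(d) for d in ["google.com", "wikipedia.org", "conference.net"]])
mu = benign.mean(axis=0)
cov = np.cov(benign.T) + 1e-3 * np.eye(3)   # regularized to keep it invertible
inv_cov = np.linalg.inv(cov)

def mahalanobis(domain):
    d = features(domain) - mu
    return float(np.sqrt(d @ inv_cov @ d))

# Algorithmically generated names tend to score far from the benign cluster.
print(mahalanobis("xkqjz93hdqpo.biz"), mahalanobis("weather.com"))
```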
The hyperlink structure of the World Wide Web is modeled as a directed, dynamic, and huge web graph. Web graphs are analyzed for determining page rank, fighting web spam, detecting communities, and so on, by performing tasks such as clustering, classification, and reachability. These tasks involve operations such as graph navigation, checking link existence, and identifying active links, which demand scanning of entire graphs. Frequent scanning of very large graphs incurs substantial I/O and memory overheads. To rectify these issues, several data structures have been proposed to represent graphs in a compact manner. Even though the problem of representing graphs has been actively studied in the literature, there has been much less focus on the representation of dynamic graphs. In this paper, we propose Tree-Dictionary-Representation (TDR), a compressed graph representation that supports the dynamic nature of graphs as well as various graph operations. Our experimental study shows that this representation works efficiently with limited main memory and provides fast traversal of edges.
While email plays an increasingly important role on the Internet, we are faced with severe challenges brought by compromised email accounts, especially for administrators of institutional email service providers. Inspired by previous experience in spam filtering and compromised-account detection, we propose several criteria, such as Success Outdegree Proportion, Reverse PageRank, Recipient Clustering Coefficient and Legitimate Recipient Proportion, for detecting compromised email accounts from the perspective of graph topology. Specifically, several widely used social network analysis metrics are adapted according to the characteristics of mail log analysis. We evaluate our methods on a dataset constructed by mining one month (30 days) of mail logs from a university with 118,617 local users and 11,460,399 mail log entries. The experimental results demonstrate that our methods achieve very positive performance, and we also show that these methods can be applied efficiently on even larger datasets.
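Two of the named criteria can be sketched on a toy mail graph as shown below: Reverse PageRank (PageRank on the reversed sending graph) and a recipient clustering coefficient. The exact definitions and thresholds used in the paper may differ; the toy graph is an assumption.

```python
# Hedged sketch of two graph-topology metrics on a toy mail graph
# (edge u -> v means u sent mail to v).
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("acct1", "a"), ("acct1", "b"), ("acct1", "c"), ("acct1", "d"),  # one-way fan-out, typical of abuse
    ("u1", "u2"), ("u2", "u1"), ("u2", "u3"), ("u3", "u1"),          # normal, reciprocated traffic
])

# Reverse PageRank: PageRank on the graph with all edges reversed; accounts
# that only push mail outwards receive little "reverse" importance.
reverse_pr = nx.pagerank(g.reverse(copy=True))

# Recipient clustering coefficient: how interconnected an account's contacts are.
clustering = nx.clustering(g.to_undirected())

for acct in ("acct1", "u2"):
    print(acct, round(reverse_pr[acct], 3), round(clustering[acct], 3))
```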
Tremendous amounts of data are generated daily. Accordingly, unstructured text data distributed through news, blogs, and social media has gained much attention from researchers, as it contains abundant information about various consumers' opinions. However, as the usefulness of text data increases, attempts to gain profit by distorting text data maliciously or non-maliciously are also increasing. Consequently, various types of spam detection techniques have been studied to prevent the side effects of spamming. The most representative studies include e-mail spam detection, web spam detection, and opinion spam detection. "Spam" is recognized on the basis of three characteristics and actions: (1) if a certain user is recognized as a spammer, then all content created by that user should be recognized as spam; (2) if certain content is exposed to other users (regardless of the users' intention), then that content is recognized as spam; and (3) any content that contains malicious or non-malicious false information is recognized as spam. Many studies have been performed to address type (1) and type (2) spamming by analyzing various metadata, such as user networks and spam words. In the case of type (3), however, relatively few studies have been conducted because it is difficult to determine the veracity of a certain word or piece of information. In this study, we regard a hashtag that is irrelevant to the content of a blog post as spam and devise a methodology to detect such spam hashtags.
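As a minimal illustration of the underlying intuition, the sketch below flags a hashtag as a spam candidate when its terms share almost no vocabulary with the post body (low cosine similarity over term counts). The tokenization, similarity measure and threshold are assumptions; the paper's methodology is more elaborate.

```python
# Hedged sketch: flag hashtags with near-zero lexical overlap with the post body.
import re
from collections import Counter
from math import sqrt

def tokens(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2]

def cosine(a, b):
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def spam_hashtags(post_body, hashtags, threshold=0.1):
    body = tokens(post_body)
    return [h for h in hashtags if cosine(tokens(h), body) < threshold]

post = "Tried the new ramen place downtown, great broth and friendly staff."
print(spam_hashtags(post, ["#ramen", "#cryptogiveaway"]))
```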
Text messaging is used by more people around the world than any other communications technology. As such, it presents a desirable medium for spammers. While this problem has been studied by many researchers over the years, the recent increase in legitimate bulk traffic (e.g., account verification, 2FA, etc.) has dramatically changed the mix of traffic seen in this space, reducing the effectiveness of previous spam classification efforts. This paper demonstrates the performance degradation of those detectors when used on a large-scale corpus of text messages containing both bulk and spam messages. Against our labeled dataset of text messages collected over 14 months, the precision and recall of past classifiers fall to 23.8% and 61.3% respectively. However, using our classification techniques and labeled clusters, precision and recall rise to 100% and 96.8%. We not only show that our collected dataset helps to correct many of the overtraining errors seen in previous studies, but also present insights into a number of current SMS spam campaigns.
Amplification DDoS attacks have gained popularity and become a serious threat to Internet participants. However, little is known about where these attacks originate, and revealing the attack sources is a non-trivial problem due to the spoofed nature of the traffic. In this paper, we present novel techniques to uncover the infrastructures behind amplification DDoS attacks. We follow a two-step approach to tackle this challenge: First, we develop a methodology to impose a fingerprint on scanners that perform the reconnaissance for amplification attacks that allows us to link subsequent attacks back to the scanner. Our methodology attributes over 58% of attacks to a scanner with a confidence of over 99.9%. Second, we use Time-to-Live-based trilateration techniques to map scanners to the actual infrastructures launching the attacks. Using this technique, we identify 34 networks as being the source for amplification attacks at 98% certainty.
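The TTL-based idea can be sketched as follows: infer the sender's likely initial TTL from the observed value, derive the hop distance to each vantage point, and intersect the candidate networks consistent with those distances. The initial-TTL table, the toy topology and the exact matching rule are assumptions for illustration.

```python
# Hedged sketch of TTL-based hop-distance estimation and source narrowing.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)   # typical OS defaults

def hop_distance(observed_ttl):
    """Infer the smallest plausible initial TTL and thus the hop count."""
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl

def candidate_sources(observations, topology):
    """observations: {vantage_point: observed_ttl};
    topology: {network: {vantage_point: hops}} (assumed known, e.g. from traceroutes)."""
    candidates = set(topology)
    for vp, ttl in observations.items():
        hops = hop_distance(ttl)
        candidates &= {net for net, dist in topology.items() if dist.get(vp) == hops}
    return candidates

topology = {"AS-A": {"vp1": 3, "vp2": 7}, "AS-B": {"vp1": 5, "vp2": 7}}
print(candidate_sources({"vp1": 61, "vp2": 57}, topology))
```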
A honeypot is a deception tool for enticing attackers to make efforts to compromise the electronic information systems of an organization. A honeypot can serve as an advanced security surveillance tool for use in minimizing the risks of attacks on information technology systems and networks. Honeypots are useful for providing valuable insights into potential system security loopholes. The current research investigated the effectiveness of using centralized system management technologies, namely Puppet and virtual machines, in implementing automated honeypots for intrusion detection, correction and prevention. A centralized logging system was used to collect information on the source address, country and timestamp of intrusions by attackers. The unique contributions of this research include: a demonstration of how open-source technologies can be used to dynamically add or modify hacking incidents in a high-interaction honeynet system; a presentation of strategies for making honeypots more attractive, so that hackers spend more time and provide more hacking evidence; and an exhibition of algorithms for system and network intrusion prevention.
IP tracking and cloaking are practices for identifying users that are used legitimately by websites to provide services and content tailored to particular users. However, it is believed that these practices are also used by malicious websites to avoid detection by anti-virus companies crawling the web to find malware. In addition, malicious websites are also believed to use IP tracking in order to deliver targeted malware based upon a history of previous visits by users. In this paper we empirically investigate these beliefs and collect a large dataset of suspicious URLs in order to identify at what level IP tracking takes place, that is, at the level of an individual address or at the level of the network provider or organisation (network tracking). Our results illustrate that IP tracking is used in a small subset of domains within our dataset, while no strong indication of network tracking was observed.
Honeypot systems are an effective method for defending production systems from security breaches and for gaining detailed information about attackers' motivation, tactics, software and infrastructure. In this paper we present how different types of honeypots can be employed to gain valuable information about attacks and attackers, and also outline new and innovative possibilities for future research.
Defending information systems against advanced attacks is a challenging task; even if all systems have been properly updated and all known vulnerabilities have been patched, there is still the possibility of a previously unknown zero-day attack compromising the system. Honeypots offer a more proactive tool for detecting possible attacks. What is more, they can act as a tool for understanding attackers' intentions. In this paper, we propose a design for a diversified honeypot. By increasing the variability present in software, diversification decreases the number of assumptions an attacker can make about the target system.
When running large human computation tasks in the real-world, honeypots play an important role for assessing the overall quality of the work produced. The generation of such honeypots can be a significant burden on the task owner as they require specific characteristics in their design and implementation and continuous maintenance when operating data pipelines that include a human computation component. In this extended abstract we outline a novel approach for creating honeypots using automatically generated questions from a reference knowledge base with the ability to control such parameters as topic and difficulty.
Active defense is a popular defense technique based on systems that hinder an attacker's progress by design, rather than reactively responding to an attack only after its detection. Well-known active defense systems are honeypots. Honeypots are fake systems, designed to look like real production systems, aimed at trapping an attacker, and analyzing his attack strategy and goals. These types of systems suffer from a major weakness: it is extremely hard to design them in such a way that an attacker cannot distinguish them from a real production system. In this paper, we advocate that, instead of adding additional fake systems in the corporate network, the production systems themselves should be instrumented to provide active defense capabilities. This perspective to active defense allows containing costs and complexity, while at the same time provides the attacker with a more realistic-looking target, and gives the Incident Response Team more time to identify the attacker. The proposed proof-of-concept prototype system can be used to implement active defense in any corporate production network, with little upfront work, and little maintenance.
Multimedia security and copyright protection have been popular topics for research and application, due to the explosion of data exchange over the Internet and the widespread use of digital media. Watermarking is the process of hiding digital information inside a digital medium. Hiding information as digital watermarks in multimedia enables a protection mechanism for decrypted content. This paper presents a comparative study between an existing image-watermarking technique based on a Genetic Algorithm combined with Bacterial Foraging Optimization (BFO) and a proposed technique that combines a Genetic Algorithm with Honey Bee based optimization. The experimental results show that the new method outperforms the conventional technique. The implementation is done in MATLAB.