Biblio
Code clone detection is an important problem for software maintenance and evolution. Many approaches consider either structure or identifiers, but none of the existing detection techniques model both sources of information. These techniques also depend on generic, handcrafted features to represent code fragments. We introduce learning-based detection techniques where everything for representing terms and fragments in source code is mined from the repository. Our code analysis supports a framework, which relies on deep learning, for automatically linking patterns mined at the lexical level with patterns mined at the syntactic level. We evaluated our novel learning-based approach for code clone detection with respect to feasibility from the point of view of software maintainers. We sampled and manually evaluated 398 file- and 480 method-level pairs across eight real-world Java systems; 93% of the file- and method-level samples were evaluated to be true positives. Among the true positives, we found pairs mapping to all four clone types. We compared our approach to a traditional structure-oriented technique and found that our learning-based approach detected clones that were either undetected or suboptimally reported by the prominent tool Deckard. Our results affirm that our learning-based approach is suitable for clone detection and a tenable technique for researchers.
Tremendous amounts of data are generated daily. Accordingly, unstructured text data that is distributed through news, blogs, and social media has gained much attention from many researchers as this data contains abundant information about various consumers' opinions. However, as the usefulness of text data is increasing, attempts to gain profits by distorting text data maliciously or non-maliciously are also increasing. In this sense, various types of spam detection techniques have been studied to prevent the side effects of spamming. The most representative studies include e-mail spam detection, web spam detection, and opinion spam detection. "Spam" is recognized on the basis of three characteristics and actions: (1) if a certain user is recognized as a spammer, then all content created by that user should be recognized as spam; (2) if certain content is exposed to other users (regardless of the users' intention), then content is recognized as spam; and (3) any content that contains malicious or non-malicious false information is recognized as spam. Many studies have been performed to solve type (1) and type (2) spamming by analyzing various metadata, such as user networks and spam words. In the case of type (3), however, relatively few studies have been conducted because it is difficult to determine the veracity of a certain word or information. In this study, we regard a hashtag that is irrelevant to the content of a blog post as spam and devise a methodology to detect such spam hashtags.
k-anonymity is an efficient way to anonymize the relational data to protect privacy against re-identification attacks. For the purpose of k-anonymity on transaction data, each item is considered as the quasi-identifier attribute, thus increasing high dimension problem as well as the computational complexity and information loss for anonymity. In this paper, an efficient anonymity system is designed to not only anonymize transaction data with lower information loss but also reduce the computational complexity for anonymity. An extensive experiment is carried to show the efficiency of the designed approach compared to the state-of-the-art algorithms for anonymity in terms of runtime and information loss. Experimental results indicate that the proposed anonymous system outperforms the compared algorithms in all respects.
Many solutions for composability and compositionality rely on specifying the interface for a component using bandwidth. Some previous works specify period (P) and budget (Q) as an interface for a component. Q/P provides us with a bandwidth (the share of a processor that this component may request); P specifies the time-granularity of the allocation of this processing capacity. Other works add another parameter deadline which can help to provide tighter bounds on how this processing capacity is distributed. Yet other works use the parameters α and Δ where α is the bandwidth and Δ specifies how smoothly this bandwidth is distributed. It is known [4] that such bandwidth-like interfaces carry a cost: there are tasksets that could be guaranteed to be schedulable if tasks were scheduled directly on the processor, but with bandwidth-like interfaces, it is impossible to guarantee the taskset to be schedulable. And it is also known that this penalty can be infinite, i.e., the use of bandwidth-like interfaces may require the use of a processor that has a speed that is k times faster, and one can show this for any k. This brings the question: "What is the average-case performance penalty of bandwidth-like interfaces?" This paper addresses this question. We answer the question by randomly generating tasksets and then for each of these tasksets, compute a lower bound on how much faster a processor needs to be when a bandwidth-like scheme is used. We do not consider any specific bandwidth-like scheme; instead, we derive an expression that states a lower bound on how much faster a processor needs to be when a bandwidth-like scheme is used. For the distributions considered in this paper, we find that (i) the experimental results depend on the experimental setup, (ii) this lower bound on the penalty was never larger than 4.0, (iii) for one experimental setup, for each taskset, it was greater than 2.4, (iv) the histogram of this penalty appears to be unimodal, and (v) for implicit-deadline sporadic tasks, this lower bound on the penalty was exactly 1.
Given a stream of heterogeneous graphs containing different types of nodes and edges, how can we spot anomalous ones in real-time while consuming bounded memory? This problem is motivated by and generalizes from its application in security to host-level advanced persistent threat (APT) detection. We propose StreamSpot, a clustering based anomaly detection approach that addresses challenges in two key fronts: (1) heterogeneity, and (2) streaming nature. We introduce a new similarity function for heterogeneous graphs that compares two graphs based on their relative frequency of local substructures, represented as short strings. This function lends itself to a vector representation of a graph, which is (a) fast to compute, and (b) amenable to a sketched version with bounded size that preserves similarity. StreamSpot exhibits desirable properties that a streaming application requires: it is (i) fully-streaming; processing the stream one edge at a time as it arrives, (ii) memory-efficient; requiring constant space for the sketches and the clustering, (iii) fast; taking constant time to update the graph sketches and the cluster summaries that can process over 100,000 edges per second, and (iv) online; scoring and flagging anomalies in real time. Experiments on datasets containing simulated system-call flow graphs from normal browser activity and various attack scenarios (ground truth) show that StreamSpot is high-performance; achieving above 95% detection accuracy with small delay, as well as competitive time and memory usage.
In the field of image processing, it is more complex and challenging task to detect the Human motion in the video and recognize their actions from the video sequences. A novel approach is presented in this paper to detect the human motion and recognize their actions. By tracking the selected object over consecutive frames of a video or image sequences, the different Human actions are recognized. Initially, the background motion is subtracted from the input video stream and its binary images are constructed. Using spatiotemporal interest points, the object which needs to be monitored is selected by enclosing the required pixels within the bounding rectangle. The selected foreground pixels within the bounding rectangle are then tracked using edge tracking algorithm. The features are extracted and using these features human motion are detected. Finally, the different human actions are recognized using K-Nearest Neighbor classifier. The applications which uses this methodology where monitoring the human actions is required such as shop surveillance, city surveillance, airports surveillance and other important places where security is the prime factor. The results obtained are quite significant and are analyzed on the datasets like KTH and Weizmann dataset, which contains actions like bending, running, walking, skipping, and hand-waving.
Detecting lock-related defects has long been a hot research topic in software engineering. Many efforts have been spent on detecting such deadlocks in concurrent software systems. However, latent locks may be hidden in application programming interface (API) methods whose source code may not be accessible to developers. Many APIs have latent locks. For example, our study has shown that J2SE alone can have 2,000+ latent locks. As latent locks are less known by developers, they can cause deadlocks that are hard to perceive or diagnose. Meanwhile, the state-of-the-art tools mostly handle API methods as black boxes, and cannot detect deadlocks that involve such latent locks. In this paper, we propose a novel black-box testing approach, called LockPeeker, that reveals latent locks in Java APIs. The essential idea of LockPeeker is that latent locks of a given API method can be revealed by testing the method and summarizing the locking effects during testing execution. We have evaluated LockPeeker on ten real-world Java projects. Our evaluation results show that (1) LockPeeker detects 74.9% of latent locks in API methods, and (2) it enables state-of-the-art tools to detect deadlocks that otherwise cannot be detected.
In order to identify a personalized story, suitable for the needs of large masses of visitors and tourists, our work has been aimed at the definition of appropriate models and solutions of fruition that make the visit experience more appealing and immersive. This paper proposes the characteristic functionalities of narratology and of the techniques of storytelling for the dynamic creation of experiential stories on a sematic basis. Therefore, it represents a report about sceneries, implementation models and architectural and functional specifications of storytelling for the dynamic creation of functional contents for the visit. Our purpose is to indicate an approach for the realization of a dynamic storytelling engine that can allow the dynamic supply of narrative contents, not necessarily predetermined and pertinent to the needs and the dynamic behaviors of the users. In particular, we have chosen to employ an adaptive, social and mobile approach, using an ontological model in order to realize a dynamic digital storytelling system, able to collect and elaborate social information and contents about the users giving them a personalized story on the basis of the place they are visiting. A case of study and some experimental results are presented and discussed.
Modern world has witnessed a dramatic increase in our ability to collect, transmit and distribute real-time monitoring and surveillance data from large-scale information systems and cyber-physical systems. Detecting system anomalies thus attracts significant amount of interest in many fields such as security, fault management, and industrial optimization. Recently, invariant network has shown to be a powerful way in characterizing complex system behaviours. In the invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. Structures and evolutions of the invariance network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detect causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible casual components, which have several limitations: 1) fault propagation in the network is ignored; 2) the root casual anomalies may not always be the nodes with a high-percentage of vanishing correlations; 3) temporal patterns of vanishing correlations are not exploited for robust detection. To address these limitations, in this paper we propose a network diffusion based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network, and can perform joint inference on both the structural, and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations, and can compensate for unstructured measurement noise in the system. Extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets demonstrate the effectiveness of our approach.
In the last few years, the high acceptability of service computing delivered over the internet has exponentially created immense security challenges for the services providers. Cyber criminals are using advanced malware such as polymorphic botnets for participating in our everyday online activities and trying to access the desired information in terms of personal details, credit card numbers and banking credentials. Polymorphic botnet attack is one of the biggest attacks in the history of cybercrime and currently, millions of computers are infected by the botnet clients over the world. Botnet attack is an intelligent and highly coordinated distributed attack which consists of a large number of bots that generates big volumes of spamming e-mails and launching distributed denial of service (DDoS) attacks on the victim machines in a heterogeneous network environment. Therefore, it is necessary to detect the malicious bots and prevent their planned attacks in the cloud environment. A number of techniques have been developed for detecting the malicious bots in a network in the past literature. This paper recognize the ineffectiveness exhibited by the singnature based detection technique and networktraffic based detection such as NetFlow or traffic flow detection and Anomaly based detection. We proposed a real time malware detection methodology based on Domain Generation Algorithm. It increasesthe throughput in terms of early detection of malicious bots and high accuracy of identifying the suspicious behavior.
Compositional theories and technologies facilitate the decomposition of a complex system into components, as well as their integration via interfaces. Component interfaces hide the internal details of the components, thereby reducing integration complexity. A system is said to be composable if the properties established and validated for components in isolation hold once the components are integrated to form the system. This brings us the question: "Is compositionality related to composability?" This paper answers this question in the affirmative; it considers a previously known interface for compositionality and shows that it can be used for composability. It also presents a run-time policing mechanism for this interface.
Institutions use the information security (InfoSec) policy document as a set of rules and guidelines to govern the use of the institutional information resources. However, a common problem is that these policies are often not followed or complied with. This study explores the extent to which the problem lies with the policy documents themselves. The InfoSec policies are documented in the natural languages, which are prone to ambiguity and misinterpretation. Subsequently such policies may be ambiguous, thereby making it hard, if not impossible for users to comply with. A case study approach with a content analysis was conducted. The research explores the extent of the problem by using a case study of an educational institution in South Africa.
Data randomization or scrambling has been effectively used in various applications to improve the data security. In this paper, we use the idea of data randomization to proactively randomize the spectrum (re)allocation to improve connections' security. As it is well-known that random (re)allocation fragments the spectrum and thus increases blocking in elastic optical networks, we analyze the tradeoff between system performance and security. To this end, in addition to spectrum randomization, we utilize an on-demand defragmentation scheme every time a request is blocked due to the spectrum fragmentation. We model the occupancy pattern of an elastic optical link (EOL) using a multi-class continuous-time Markov chain (CTMC) under the random-fit spectrum allocation method. Numerical results show that although both the blocking and security can be improved for a particular so-called randomization process (RP) arrival rate, while with the increase in RP arrival rate the connections' security improves at the cost of the increase in overall blocking.
Protecting Critical Infrastructures (CIs) against contemporary cyber attacks has become a crucial as well as complex task. Modern attack campaigns, such as Advanced Persistent Threats (APTs), leverage weaknesses in the organization's business processes and exploit vulnerabilities of several systems to hit their target. Although their life-cycle can last for months, these campaigns typically go undetected until they achieve their goal. They usually aim at performing data exfiltration, cause service disruptions and can also undermine the safety of humans. Novel detection techniques and incident handling approaches are therefore required, to effectively protect CI's networks and timely react to this type of threats. Correlating large amounts of data, collected from a multitude of relevant sources, is necessary and sometimes required by national authorities to establish cyber situational awareness, and allow to promptly adopt suitable countermeasures in case of an attack. In this paper we propose three novel methods for security information correlation designed to discover relevant insights and support the establishment of cyber situational awareness.
Smart grid is an evolving new power system framework with ICT driven power equipment massively layered structure. The new generation sensors, smart meters and electronic devices are integral components of smart grid. However, the upcoming deployment of smart devices at different layers followed by their integration with communication networks may introduce cyber threats. The interdependencies of various subsystems functioning in the smart grid, if affected by cyber-attack, may be vulnerable and greatly reduce efficiency and reliability due to any one of the device not responding in real time frame. The cyber security vulnerabilities become even more evident due to the existing superannuated cyber infrastructure. This paper presents a critical review on expected cyber security threats in complex environment and addresses the grave concern of a secure cyber infrastructure and related developments. An extensive review on the cyber security objectives and requirements along with the risk evaluation process has been undertaken. The paper analyses confidentiality and privacy issues of entire components of smart power system. A critical evaluation on upcoming challenges with innovative research concerns is highlighted to achieve a roadmap of an immune smart grid infrastructure. This will further facilitate R&d; associated developments.
This paper focuses on exploitable cyber vulnerabilities in industrial control systems (ICS) and on a new approach of resiliency against them. Even with numerous metrics and methods for intrusion detection and mitigation strategy, a complete detection and deterrence of cyber-attacks for ICS is impossible. Countering the impact and consequence of possible malfunctions caused by such attacks in the safety-critical ICS's, this paper proposes new controller architecture to fail-operate even under compromised situations. The proposed new ICS is realized with diversification of hardware/software and unidirectional communication in alerting suspicious infiltration to upper-level management. Equipped with control bus monitoring, this operation-basis approach of infiltration detection would become a truly cyber-resilient ICS. The proposed system is tested in a lab hardware experimentation setup and on a cybersecurity test bed, DeterLab, for validation.
The Internet of Things(IoT) has become a popular technology, and various middleware has been proposed and developed for IoT systems. However, there have been few studies on the data management of IoT systems. In this paper, we consider graph database models for the data management of IoT systems because these models can specify relationships in a straightforward manner among entities such as devices, users, and information that constructs IoT systems. However, applying a graph database to the data management of IoT systems raises issues regarding distribution and security. For the former issue, we propose graph database operations integrated with REST APIs. For the latter, we extend a graph edge property by adding access protocol permissions and checking permissions using the APIs with authentication. We present the requirements for a use case scenario in addition to the features of a distributed graph database for IoT data management to solve the aforementioned issues, and implement a prototype of the graph database.
Intrusion detection has been an active field of research for more than 35 years. Numerous systems had been built based on the two fundamental detection principles, knowledge-based and behavior-based detection. Anyway, having a look at day-to-day news about data breaches and successful attacks, detection effectiveness is still limited. Even more, heavy-weight intrusion detection systems cannot be installed in every endangered environment. For example, Industrial Control Systems are typically utilized for decades, charging off huge investments of companies. Thus, some of these systems have been in operation for years, but were designed afore without security in mind. Even worse, as systems often have connections to other networks and even the Internet nowadays, an adequate protection is mandatory, but integrating intrusion detection can be extremely difficult - or even impossible to date. We propose a new lightweight current-based IDS which is using a difficult to manipulate measurement base and verifiable ground truth. Focus of our system is providing intrusion detection for ICS and SCADA on a low-priced base, easy to integrate. Dr. WATTson, a prototype implemented based on our concept provides high detection and low false alarm rates.
Elliptic Curve Cryptosystems are very much delicate to attacks or physical attacks. This paper aims to correctly implementing the fault injection attack against Elliptic Curve Digital Signature Algorithm. More specifically, the proposed algorithm concerns to fault attack which is implemented to sufficiently alter signature against vigilant periodic sequence algorithm that supports the efficient speed up and security perspectives with most prominent and well known scalar multiplication algorithm for ECDSA. The purpose is to properly injecting attack whether any probable countermeasure threatening the pseudo code is determined by the attack model according to the predefined methodologies. We show the results of our experiment with bits acquire from the targeted implementation to determine the reliability of our attack.
Privacy and security have been discussed in many occasions and in most cases, the importance that these two aspects play on the information system domain are mentioned often. Many times, research is carried out on the individual information security or privacy measures where it is commonly regarded with the focus on the particular measure or both privacy and security are regarded as a whole subject. However, there have been no attempts at establishing a proper method in categorizing any form of objects of protection. Through the review done on this paper, we would like to investigate the relationship between privacy and security and form a break down the aspects of privacy and security in order to provide better understanding through determining if a measure or methodology is security, privacy oriented or both. We would recommend that in further research, a further refined formulation should be formed in order to carry out this determination process. As a result, we propose a Privacy-Security Tree (PST) in this paper that distinguishes the privacy from security measures.
With increasing popularity of cloud computing, the data owners are motivated to outsource their sensitive data to cloud servers for flexibility and reduced cost in data management. However, privacy is a big concern for outsourcing data to the cloud. The data owners typically encrypt documents before outsourcing for privacy-preserving. As the volume of data is increasing at a dramatic rate, it is essential to develop an efficient and reliable ciphertext search techniques, so that data owners can easily access and update cloud data. In this paper, we propose a privacy preserving multi-keyword ranked search scheme over encrypted data in cloud along with data integrity using a new authenticated data structure MIR-tree. The MIR-tree based index with including the combination of widely used vector space model and TF×IDF model in the index construction and query generation. We use inverted file index for storing word-digest, which provides efficient and fast relevance between the query and cloud data. Design an authentication set(AS) for authenticating the queries, for verifying top-k search results. Because of tree based index, our scheme achieves optimal search efficiency and reduces communication overhead for verifying the search results. The analysis shows security and efficiency of our scheme.
A smart city uses information technology to integrate and manage physical, social, and business infrastructures in order to provide better services to its dwellers while ensuring efficient and optimal utilization of available resources. With the proliferation of technologies such as Internet of Things (IoT), cloud computing, and interconnected networks, smart cities can deliver innovative solutions and more direct interaction and collaboration between citizens and the local government. Despite a number of potential benefits, digital disruption poses many challenges related to information security and privacy. This paper proposes a security framework that integrates the blockchain technology with smart devices to provide a secure communication platform in a smart city.
In the original algorithm for grey correlation analysis, the detected edge is comparatively rough and the thresholds need determining in advance. Thus, an adaptive edge detection method based on grey correlation analysis is proposed, in which the basic principle of the original algorithm for grey correlation analysis is used to get adaptively automatic threshold according to the mean value of the 3×3 area pixels around the detecting pixel and the property of people's vision. Because the false edge that the proposed algorithm detected is relatively large, the proposed algorithm is enhanced by dealing with the eight neighboring pixels around the edge pixel, which is merged to get the final edge map. The experimental results show that the algorithm can get more complete edge map with better continuity by comparing with the traditional edge detection algorithms.
In this paper, we propose a new color image encryption and compression algorithm based on the DNA complementary rule and the Chinese remainder theorem, which combines the DNA complementary rule with quantum chaotic map. We use quantum chaotic map and DNA complementary rule to shuffle the color image and obtain the shuffled image, then Chinese remainder theorem from number theory is utilized to diffuse and compress the shuffled image simultaneously. The security analysis and experiment results show that the proposed encryption algorithm has large key space and good encryption result, it also can resist against common attacks.