Biblio
Failure to detect malware at its very inception leaves room for it to post significant threat and cost to cyber security for not only individuals, organizations but also the society and nation. However, the rapid growth in volume and diversity of malware renders conventional detection techniques that utilize feature extraction and comparison insufficient, making it very difficult for well-trained network administrators to identify malware, not to mention regular users of internet. Challenges in malware detection is exacerbated since complexity in the type and structure also increase dramatically in these years to include source code, binary file, shell script, Perl script, instructions, settings and others. Such increased complexity offers a premium on misjudgment. In order to increase malware detection efficiency and accuracy under large volume and multiple types of malware, this research adopts Convolutional Neural Networks (CNN), one of the most successful deep learning techniques. The experiment shows an accuracy rate of over 90% in identifying malicious and benign codes. The experiment also presents that CNN is effective with detecting source code and binary code, it can further identify malware that is embedded into benign code, leaving malware no place to hide. This research proposes a feasible solution for network administrators to efficiently identify malware at the very inception in the severe network environment nowadays, so that information technology personnel can take protective actions in a timely manner and make preparations for potential follow-up cyber-attacks.
Defect prediction is an active topic in software quality assurance, which can help developers find potential bugs and make better use of resources. To improve prediction performance, this paper introduces cross-entropy, one common measure for natural language, as a new code metric into defect prediction tasks and proposes a framework called DefectLearner for this process. We first build a recurrent neural network language model to learn regularities in source code from software repository. Based on the trained model, the cross-entropy of each component can be calculated. To evaluate the discrimination for defect-proneness, cross-entropy is compared with 20 widely used metrics on 12 open-source projects. The experimental results show that cross-entropy metric is more discriminative than 50% of the traditional metrics. Besides, we combine cross-entropy with traditional metric suites together for accurate defect prediction. With cross-entropy added, the performance of prediction models is improved by an average of 2.8% in F1-score.
This paper proposed method for source code authorship attribution using modern natural language processing methods. Our method based on text embedding with convolutional recurrent neural network reaches 94.5% accuracy within 500 authors in one dataset, which outperformed many state of the art models for authorship attribution. Our approach is dealing with source code as with natural language texts, so it is potentially programming language independent with more potential of future improving.
When implemented on real systems, cryptographic algorithms are vulnerable to attacks observing their execution behavior, such as cache-timing attacks. Designing protected implementations must be done with knowledge and validation tools as early as possible in the development cycle. In this article we propose a methodology to assess the robustness of the candidates for the NIST post-quantum standardization project to cache-timing attacks. To this end we have developed a dedicated vulnerability research tool. It performs a static analysis with tainting propagation of sensitive variables across the source code and detects leakage patterns. We use it to assess the security of the NIST post-quantum cryptography project submissions. Our results show that more than 80% of the analyzed implementations have at least one potential flaw, and three submissions total more than 1000 reported flaws each. Finally, this comprehensive study of the competitors security allows us to identify the most frequent weaknesses amongst candidates and how they might be fixed.
Most mobile applications generate local data on internal memory with SharedPreference interface of an Android operating system. Therefore, many possible loopholes can access the confidential information such as passwords. We propose a hybrid encryption approach for SharedPreferences to protect the leaking confidential information through the source code. We develop an Android application and store some data using SharedPreference. We produce different experiments with which this data could be accessed. We apply Hybrid encryption approach combining encryption approach with Android Keystore system, for providing better encryption algorithm to hide sensitive data.
The article considers the approach to static analysis of program code and the general principles of static analyzer operation. The authors identify the most important syntactic and semantic information in the programs, which can be used to find errors in the source code. The general methodology for development of diagnostic rules is proposed, which will improve the efficiency of static code analyzers.
The growing popularity of Android applications makes them vulnerable to security threats. There exist several studies that focus on the analysis of the behaviour of Android applications to detect the repackaged and malicious ones. These techniques use a variety of features to model the application's behaviour, among which the calls to Android API, made by the application components, are shown to be the most reliable. To generate the APIs that an application calls is not an easy task. This is because most malicious applications are obfuscated and do not come with the source code. This makes the problem of identifying the API methods invoked by an application an interesting research issue. In this paper, we present HyDroid, a hybrid approach that combines static and dynamic analysis to generate API call traces from the execution of an application's services. We focus on services because they contain key characteristics that allure attackers to misuse them. We show that HyDroid can be used to extract API call trace signatures of several malware families.
Software security is an important aspect of ensuring software quality. Early detection of vulnerable code during development is essential for the developers to make cost and time effective software testing. The traditional software metrics are used for early detection of software vulnerability, but they are not directly related to code constructs and do not specify any particular granularity level. The goal of this study is to help developers evaluate software security using class-level traceable patterns called micro patterns to reduce security risks. The concept of micro patterns is similar to design patterns, but they can be automatically recognized and mined from source code. If micro patterns can better predict vulnerable classes compared to traditional software metrics, they can be used in developing a vulnerability prediction model. This study explores the performance of class-level patterns in vulnerability prediction and compares them with traditional class-level software metrics. We studied security vulnerabilities as reported for one major release of Apache Tomcat, Apache Camel and three stand-alone Java web applications. We used machine learning techniques for predicting vulnerabilities using micro patterns and class-level metrics as features. We found that micro patterns have higher recall in detecting vulnerable classes than the software metrics.
With the growth of the Internet, web applications are becoming very popular in the user communities. However, the presence of security vulnerabilities in the source code of these applications is raising cyber crime rate rapidly. It is required to detect and mitigate these vulnerabilities before their exploitation in the execution environment. Recently, Open Web Application Security Project (OWASP) and Common Vulnerabilities and Exposures (CWE) reported Cross-Site Scripting (XSS) as one of the most serious vulnerabilities in the web applications. Though many vulnerability detection approaches have been proposed in the past, existing detection approaches have the limitations in terms of false positive and false negative results. This paper proposes a context-sensitive approach based on static taint analysis and pattern matching techniques to detect and mitigate the XSS vulnerabilities in the source code of web applications. The proposed approach has been implemented in a prototype tool and evaluated on a public data set of 9408 samples. Experimental results show that proposed approach based tool outperforms over existing popular open source tools in the detection of XSS vulnerabilities.
The importance and potential advantages with a comprehensive product architecture description are well described in the literature. However, developing such a description takes additional resources, and it is difficult to maintain consistency with evolving implementations. This paper presents an approach and industrial experience which is based on architecture recovery from source code at truck manufacturer Scania CV AB. The extracted representation of the architecture is presented in several views and verified on CAN signal level. Lessons learned are discussed.
Currently, dependence on web applications is increasing rapidly for social communication, health services, financial transactions and many other purposes. Unfortunately, the presence of cross-site scripting vulnerabilities in these applications allows malicious user to steals sensitive information, install malware, and performs various malicious operations. Researchers proposed various approaches and developed tools to detect XSS vulnerability from source code of web applications. However, existing approaches and tools are not free from false positive and false negative results. In this paper, we propose a taint analysis and defensive programming based HTML context-sensitive approach for precise detection of XSS vulnerability from source code of PHP web applications. It also provides automatic suggestions to improve the vulnerable source code. Preliminary experiments and results on test subjects show that proposed approach is more efficient than existing ones.
Dependence on web applications is increasing very rapidly in recent time for social communications, health problem, financial transaction and many other purposes. Unfortunately, presence of security weaknesses in web applications allows malicious user's to exploit various security vulnerabilities and become the reason of their failure. Currently, SQL Injection (SQLI) and Cross-Site Scripting (XSS) vulnerabilities are most dangerous security vulnerabilities exploited in various popular web applications i.e. eBay, Google, Facebook, Twitter etc. Research on defensive programming, vulnerability detection and attack prevention techniques has been quite intensive in the past decade. Defensive programming is a set of coding guidelines to develop secure applications. But, mostly developers do not follow security guidelines and repeat same type of programming mistakes in their code. Attack prevention techniques protect the applications from attack during their execution in actual environment. The difficulties associated with accurate detection of SQLI and XSS vulnerabilities in coding phase of software development life cycle. This paper proposes a classification of software security approaches used to develop secure software in various phase of software development life cycle. It also presents a survey of static analysis based approaches to detect SQL Injection and cross-site scripting vulnerabilities in source code of web applications. The aim of these approaches is to identify the weaknesses in source code before their exploitation in actual environment. This paper would help researchers to note down future direction for securing legacy web applications in early phases of software development life cycle.
Most web applications have critical bugs (faults) affecting their security, which makes them vulnerable to attacks by hackers and organized crime. To prevent these security problems from occurring it is of utmost importance to understand the typical software faults. This paper contributes to this body of knowledge by presenting a field study on two of the most widely spread and critical web application vulnerabilities: SQL Injection and XSS. It analyzes the source code of security patches of widely used web applications written in weak and strong typed languages. Results show that only a small subset of software fault types, affecting a restricted collection of statements, is related to security. To understand how these vulnerabilities are really exploited by hackers, this paper also presents an analysis of the source code of the scripts used to attack them. The outcomes of this study can be used to train software developers and code inspectors in the detection of such faults and are also the foundation for the research of realistic vulnerability and attack injectors that can be used to assess security mechanisms, such as intrusion detection systems, vulnerability scanners, and static code analyzers.
With the rapid increase in cloud services collecting and using user data to offer personalized experiences, ensuring that these services comply with their privacy policies has become a business imperative for building user trust. However, most compliance efforts in industry today rely on manual review processes and audits designed to safeguard user data, and therefore are resource intensive and lack coverage. In this paper, we present our experience building and operating a system to automate privacy policy compliance checking in Bing. Central to the design of the system are (a) Legal ease-a language that allows specification of privacy policies that impose restrictions on how user data is handled, and (b) Grok-a data inventory for Map-Reduce-like big data systems that tracks how user data flows among programs. Grok maps code-level schema elements to data types in Legal ease, in essence, annotating existing programs with information flow types with minimal human input. Compliance checking is thus reduced to information flow analysis of Big Data systems. The system, bootstrapped by a small team, checks compliance daily of millions of lines of ever-changing source code written by several thousand developers.