Biblio

Filters: Keyword is source code
2023-01-13
Li, Xiuli, Wang, Guoshi, Wang, Chuping, Qin, Yanyan, Wang, Ning.  2022.  Software Source Code Security Audit Algorithm Supporting Incremental Checking. 2022 IEEE 7th International Conference on Smart Cloud (SmartCloud). :53–58.
Source code security auditing is an effective technique for dealing with security vulnerabilities and software bugs. As a white-box testing approach, it can effectively help developers eliminate defects in the code. However, it suffers from performance issues. In this paper, we propose an incremental checking mechanism that enables fast source code security audits, and we conduct comprehensive experiments to verify the effectiveness of our approach.
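
The abstract does not detail the mechanism, but the general idea of incremental checking can be illustrated with a minimal sketch, assuming a per-file audit: cache a digest of each analyzed file and re-run the check only on files whose digest has changed. `audit_cache.json` and `audit_file` are hypothetical names, not the paper's implementation.

```python
import hashlib
import json
import os

CACHE = "audit_cache.json"

def digest(path):
    """Return a SHA-256 digest of a source file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def audit_file(path):
    """Placeholder for a real per-file security check."""
    return []  # list of findings

def incremental_audit(paths):
    cache = json.load(open(CACHE)) if os.path.exists(CACHE) else {}
    findings = {}
    for path in paths:
        d = digest(path)
        if cache.get(path) != d:          # only re-check changed files
            findings[path] = audit_file(path)
            cache[path] = d
    with open(CACHE, "w") as f:
        json.dump(cache, f)
    return findings
```
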
2022-09-09
Frankel, Sophia F., Ghosh, Krishnendu.  2021.  Machine Learning Approaches for Authorship Attribution using Source Code Stylometry. 2021 IEEE International Conference on Big Data (Big Data). :3298–3304.
Identification of source code authorship is vital for attribution. In this work, a machine learning framework is described to identify source code authorship. The framework integrates features extracted using natural language processing approaches with features from the abstract syntax tree of the code. We evaluate the methodology on the Google Code Jam dataset and report the performance of logistic regression and deep learning models on it.
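
As a rough illustration of this kind of pipeline (not the authors' implementation), one can concatenate character n-gram text features with simple AST-derived counts and train a logistic regression classifier. The sketch below assumes Python source; Google Code Jam submissions are typically C++, so a real system would use a language-appropriate parser.

```python
import ast
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def ast_features(code):
    """Crude stylometric counts from the abstract syntax tree."""
    nodes = list(ast.walk(ast.parse(code)))
    return [
        len(nodes),
        sum(isinstance(n, ast.FunctionDef) for n in nodes),
        sum(isinstance(n, ast.For) for n in nodes),
    ]

def train(samples, authors):
    """samples: list of source strings; authors: list of labels."""
    vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))
    text = vec.fit_transform(samples).toarray()
    tree = np.array([ast_features(s) for s in samples])
    X = np.hstack([text, tree])            # combined feature matrix
    return vec, LogisticRegression(max_iter=1000).fit(X, authors)
```
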
2022-07-29
Ganesh, Sundarakrishnan, Ohlsson, Tobias, Palma, Francis.  2021.  Predicting Security Vulnerabilities using Source Code Metrics. 2021 Swedish Workshop on Data Science (SweDS). :1–7.
Large open-source systems generate and operate on a plethora of sensitive enterprise data. Thus, security threats or vulnerabilities must not be present in open-source systems and must be resolved as early as possible in the development phases to avoid catastrophic consequences. One way to recognize security vulnerabilities is to predict them while developers write code, minimizing costs and resources. This study examines the effectiveness of machine learning algorithms in predicting potential security vulnerabilities by analyzing the source code of a system. We obtained the security vulnerabilities dataset from Apache Tomcat security reports for versions 4.x to 10.x. We also collected the source code of Apache Tomcat 4.x to 10.x to compute 43 object-oriented metrics. We assessed four traditional supervised learning algorithms, i.e., Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbors (KNN), and Logistic Regression (LR), to understand their efficacy in predicting security vulnerabilities. We obtained the highest accuracy of 80.6% using the KNN classifier, making it the most effective of all the models we built. The DT classifier also performed well but underperformed in multi-class classification.
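
A minimal scikit-learn sketch of this experimental setup, with random placeholder data standing in for the Tomcat metrics and vulnerability labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Rows: classes in the system; columns: OO metrics (e.g., WMC, CBO, LOC).
X = np.random.rand(500, 43)          # placeholder for 43 computed metrics
y = np.random.randint(0, 2, 500)     # 1 = vulnerability reported for class

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, knn.predict(X_te)))
```
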
2022-03-10
Zhang, Zhongtang, Liu, Shengli, Yang, Qichao, Guo, Shichen.  2021.  Semantic Understanding of Source and Binary Code based on Natural Language Processing. 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). 4:2010–2016.
With the development of open source projects, large amounts of open source code are reused in binary software, and bugs in the source code are thereby introduced into binary code. In order to detect reused open source code in binaries, it is sometimes necessary to compare and analyze the similarity between source code and binary code. One of the main challenges is that the compilation process can generate different binary representations for the same source code, depending on factors such as compiler version, compilation optimization options, and target architecture, which greatly increases the difficulty of semantic similarity detection between source code and binary code. To eliminate the influence of the compilation process on code similarity comparison, this paper transforms both source code and binary code into the LLVM intermediate representation (LLVM IR), a universal intermediate representation independent of either form. We carry out semantic feature extraction and embedding training on LLVM IR based on a natural language processing model. Experimental results show that LLVM IR eliminates the influence of compilation on the syntactic differences between source code and binary code, and that the semantic features of the code are well represented and preserved.
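
A minimal sketch of the pipeline described above, assuming clang is installed: lower a source file to textual LLVM IR, tokenize the IR, and train a word-embedding model on the token streams. The tokenizer and model settings are illustrative, not the authors' exact setup.

```python
import re
import subprocess
from gensim.models import Word2Vec

def source_to_ir(c_file):
    """Compile a C file to textual LLVM IR using clang."""
    subprocess.run(["clang", "-S", "-emit-llvm", c_file, "-o", "out.ll"],
                   check=True)
    return open("out.ll").read()

def tokenize(ir_text):
    """Split IR into per-line token sequences for embedding training."""
    return [re.findall(r"[%@\w.]+|[=,()*]", line)
            for line in ir_text.splitlines() if line.strip()]

sentences = tokenize(source_to_ir("example.c"))
model = Word2Vec(sentences, vector_size=64, window=5, min_count=1)
# model.wv["add"] is now an embedding for the IR 'add' token (if present).
```
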
2021-04-08
Wang, P., Zhang, J., Wang, S., Wu, D..  2020.  Quantitative Assessment on the Limitations of Code Randomization for Legacy Binaries. 2020 IEEE European Symposium on Security and Privacy (EuroS P). :1–16.
Software development and deployment are generally fast-paced practices, yet to date there is still a significant amount of legacy software running in various critical industries with lifespans of years or even decades. As the source code of some legacy software has become unavailable, it is difficult for maintainers to actively patch vulnerabilities, leaving the outdated binaries appealing targets for advanced security attacks. One of the most powerful attacks today is code reuse, a technique that can circumvent most existing system-level security facilities. While there have been various countermeasures against code reuse, applying them to sourceless software appears to be exceptionally challenging. Fine-grained code randomization is considered an effective strategy to impede modern code-reuse attacks. To apply it to legacy software, a technique called binary rewriting is employed to directly reconstruct binaries without symbol or relocation information. However, we found that current rewriting-based randomization techniques, regardless of their designs and implementations, share a common security defect such that the randomized binaries may remain vulnerable in certain cases. Our finding does not invalidate fine-grained code randomization as a meaningful defense against code-reuse attacks, for it significantly raises the bar for exploits to succeed. Nevertheless, it is critical for the maintainers of legacy software systems to be aware of this problem and obtain a quantitative assessment of the risks of adopting a potentially incomprehensive defense. In this paper, we conducted a systematic investigation into the effectiveness of randomization techniques designed for hardening outdated binaries. We studied various state-of-the-art, fine-grained randomization tools, confirming that all of them can leave a certain part of the retrofitted binary code still reusable. To quantify the risks, we proposed a set of concrete criteria to classify gadgets immune to rewriting-based randomization and investigated their availability and capability.
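
To make the notion of reusable gadgets concrete, the following toy sketch (not the paper's analysis tool) scans a raw x86-64 byte buffer for short instruction sequences ending in ret, using the Capstone disassembler:

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

def find_ret_gadgets(code, base=0x400000, max_len=5):
    """Naively enumerate gadgets ending in ret within a small byte window."""
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    gadgets = []
    for i, b in enumerate(code):
        if b != 0xC3:                     # 0xC3 = ret opcode
            continue
        for start in range(max(0, i - 15), i + 1):
            insns = list(md.disasm(code[start:i + 1], base + start))
            if insns and insns[-1].mnemonic == "ret" and len(insns) <= max_len:
                gadgets.append((base + start,
                                "; ".join(f"{x.mnemonic} {x.op_str}".strip()
                                          for x in insns)))
                break                     # keep the longest valid decoding
    return gadgets
```
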
2021-03-15
Simon, L., Verma, A..  2020.  Improving Fuzzing through Controlled Compilation. 2020 IEEE European Symposium on Security and Privacy (EuroS P). :34–52.
We observe that operations performed by standard compilers harm fuzzing, because optimizations and the Intermediate Representation (IR) lead to transformations that improve execution speed at the expense of fuzzing. To remedy this problem, we propose 'controlled compilation', a set of techniques to automatically refactor a program's source code and cherry-pick beneficial compiler optimizations to improve fuzzing. We design, implement, and evaluate controlled compilation by building a new toolchain with Clang/LLVM. We perform an evaluation on 10 open source projects and compare the results of AFL to state-of-the-art grey-box fuzzers and concolic fuzzers. We show that when programs are compiled with this new toolchain, AFL covers 30% new code on average and finds 21 additional bugs in real-world programs. Our study reveals that controlled compilation often covers more code and finds more bugs than state-of-the-art fuzzing techniques, without the need to write a fuzzer from scratch or resort to advanced techniques. We identify two main reasons to explain why. First, it has proven difficult for researchers to appropriately configure existing fuzzers such as AFL. To address this problem, we provide guidelines and new LLVM passes to help automate AFL's configuration. This will enable researchers to perform a fairer comparison with AFL. Second, we find that current coverage-based evaluation measures (e.g., the total number of visited lines, edges, or basic blocks) are inadequate because they lose valuable information, such as which parts of a program a fuzzer actually visits and how consistently it does so. Coverage is considered a useful metric to evaluate a fuzzer's performance and devise a fuzzing strategy. However, the lack of a standard methodology for evaluating coverage remains a problem. To address this, we propose a rigorous evaluation methodology based on 'qualitative coverage'. Qualitative coverage uniquely identifies each program line to help understand which lines are commonly visited by different fuzzers vs. which lines are visited only by a particular fuzzer. Throughout our study, we show the benefits of this new evaluation methodology. For example, we provide valuable insights into the consistency of fuzzers, i.e., their ability to cover the same code or find the same bug across multiple independent runs. Overall, our evaluation methodology based on qualitative coverage helps to understand whether a fuzzer performs better than, worse than, or complementary to another fuzzer. This helps security practitioners adjust their fuzzing strategies.
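
The qualitative-coverage idea, comparing lines as identified sets rather than counts, reduces to simple set algebra. A minimal sketch, assuming each fuzzer's coverage has been exported as (file, line) pairs:

```python
def qualitative_compare(cov_a, cov_b):
    """cov_a, cov_b: sets of (file, line) pairs covered by two fuzzers."""
    return {
        "common": cov_a & cov_b,   # lines both fuzzers reach
        "only_a": cov_a - cov_b,   # lines unique to fuzzer A
        "only_b": cov_b - cov_a,   # lines unique to fuzzer B
    }

def consistency(runs):
    """Fraction of lines hit in every independent run of one fuzzer."""
    always = set.intersection(*runs)
    ever = set.union(*runs)
    return len(always) / len(ever) if ever else 1.0
```
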
2020-06-01
Nikolaidis, Fotios, Kossifidis, Nick, Leibovici, Thomas, Zertal, Soraya.  2018.  Towards a TRansparent I/O Solution. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). :1221–1228.
The benefits of data distribution to multiple storage platforms with different characteristics have been widely acknowledged. Such systems are more tolerant to outages and bottlenecks and allow for more flexible policies regarding cost reduction, security, and workload diversity. To leverage multiple platforms simultaneously, additional orchestration steps are needed. Existing approaches either implement such steps in the application's source code, resulting in minimal reusability across applications, or handle them at the infrastructure level. The latter usually involves over-engineering to handle different application behaviors and binds the system to a specific infrastructure. In this paper we present a middleware that decouples the I/O path from the application's source code and performs in-transit processing before data lands on the storage platforms. Abstracting the I/O process as a graph of reusable components allows developers to easily implement complex storage solutions without the burden of writing custom code. Similarly, administrators can create their own graph that reflects the infrastructure setup and append it to the preceding graph, so that various policies and infrastructure-related changes can be applied transparently to the application. Users can also extend the graph chain to enhance the application's functionality by using plug-ins. Our approach eliminates the need for custom I/O management code and allows applications to evolve independently of the storage back-end. To evaluate our system we employed a secure web service scenario that was seamlessly adapted to changes in its storage back-end.
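
The component-graph abstraction can be sketched in a few lines: each stage transforms a data buffer and forwards it, so a storage path becomes a composable chain. The stage composition below is hypothetical, not the middleware's actual plug-in API.

```python
import zlib

class Stage:
    """One reusable node in an I/O processing graph."""
    def __init__(self, fn, nxt=None):
        self.fn, self.nxt = fn, nxt

    def write(self, data):
        out = self.fn(data)
        return self.nxt.write(out) if self.nxt else out

# Administrator-defined tail: land the bytes on a backend.
store = Stage(lambda d: f"stored {len(d)} bytes")
# Developer-defined path: compress in transit, without touching app code.
pipeline = Stage(zlib.compress, store)

print(pipeline.write(b"application payload" * 100))
```
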
2020-04-06
Chen, Chia-Mei, Wang, Shi-Hao, Wen, Dan-Wei, Lai, Gu-Hsin, Sun, Ming-Kung.  2019.  Applying Convolutional Neural Network for Malware Detection. 2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST). :1–5.

Failure to detect malware at its inception allows it to pose significant threats and costs not only to individuals and organizations but also to society and nations. However, the rapid growth in the volume and diversity of malware renders conventional detection techniques that rely on feature extraction and comparison insufficient, making it very difficult for well-trained network administrators to identify malware, let alone regular Internet users. Malware detection is further complicated because the types and structures of malware have grown dramatically more complex in recent years to include source code, binary files, shell scripts, Perl scripts, instructions, settings, and more. Such complexity increases the risk of misjudgment. To increase malware detection efficiency and accuracy under a large volume and multiple types of malware, this research adopts Convolutional Neural Networks (CNN), one of the most successful deep learning techniques. The experiments show an accuracy rate of over 90% in distinguishing malicious from benign code. They also show that the CNN is effective at detecting both source code and binary code, and can further identify malware embedded in benign code, leaving malware no place to hide. This research proposes a feasible solution for network administrators to efficiently identify malware at its inception in today's hostile network environments, so that information technology personnel can take protective actions in a timely manner and prepare for potential follow-up cyber-attacks.
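
A minimal sketch of the general byte-to-image CNN approach (the paper's exact architecture is not given in the abstract): reshape a file's bytes into a fixed-size grayscale grid and classify it with a small convolutional network in PyTorch. Shapes and layer sizes are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn

def bytes_to_image(path, side=64):
    """Fit a file's bytes into a side x side grayscale grid (cycled/truncated)."""
    raw = np.frombuffer(open(path, "rb").read(), dtype=np.uint8)
    raw = np.resize(raw, side * side).astype(np.float32) / 255.0
    return torch.from_numpy(raw).view(1, 1, side, side)

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),     # 2 classes: benign vs. malicious
)
logits = model(bytes_to_image("sample.bin"))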

2019-11-12
Zhang, Xian, Ben, Kerong, Zeng, Jie.  2018.  Cross-Entropy: A New Metric for Software Defect Prediction. 2018 IEEE International Conference on Software Quality, Reliability and Security (QRS). :111–122.

Defect prediction is an active topic in software quality assurance that can help developers find potential bugs and make better use of resources. To improve prediction performance, this paper introduces cross-entropy, a common measure in natural language modeling, as a new code metric for defect prediction tasks and proposes a framework called DefectLearner for this process. We first build a recurrent neural network language model to learn regularities in source code from a software repository. Based on the trained model, the cross-entropy of each component can be calculated. To evaluate its discrimination of defect-proneness, cross-entropy is compared with 20 widely used metrics on 12 open-source projects. The experimental results show that the cross-entropy metric is more discriminative than 50% of the traditional metrics. We also combine cross-entropy with traditional metric suites for more accurate defect prediction. With cross-entropy added, the performance of the prediction models improves by an average of 2.8% in F1-score.
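
The metric itself is the standard per-token cross-entropy, H = -(1/n) * sum_i log2 p(t_i | t_1..t_i-1). A minimal sketch, assuming a trained language model exposing a hypothetical prob(token, context) method:

```python
import math

def cross_entropy(tokens, model):
    """Average negative log-probability of a component's token stream.

    Higher values mean the code is more 'surprising' to a model trained
    on the repository, which the paper links to defect-proneness.
    """
    total = 0.0
    for i, tok in enumerate(tokens):
        p = model.prob(tok, tokens[:i])   # hypothetical LM interface
        total += -math.log2(max(p, 1e-12))
    return total / max(len(tokens), 1)
```
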

2019-02-22
Petrík, Juraj, Chudá, Daniela.  2018.  Source Code Authorship Approaches Natural Language Processing. Proceedings of the 19th International Conference on Computer Systems and Technologies. :58–61.

This paper proposes a method for source code authorship attribution using modern natural language processing techniques. Our method, based on text embeddings with a convolutional recurrent neural network, reaches 94.5% accuracy across 500 authors in one dataset, outperforming many state-of-the-art models for authorship attribution. Our approach treats source code like natural language text, so it is potentially programming-language independent and has further potential for improvement.
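
To show the shape of a convolutional recurrent model over embedded code text (an illustrative stand-in, not the authors' exact network): embed characters, convolve for local n-gram features, summarize with a GRU, and classify among authors.

```python
import torch
import torch.nn as nn

class ConvRecurrentAttributor(nn.Module):
    def __init__(self, vocab=256, n_authors=500):
        super().__init__()
        self.embed = nn.Embedding(vocab, 32)        # char embeddings
        self.conv = nn.Conv1d(32, 64, kernel_size=5, padding=2)
        self.gru = nn.GRU(64, 128, batch_first=True)
        self.out = nn.Linear(128, n_authors)

    def forward(self, x):                           # x: (batch, seq) char ids
        e = self.embed(x).transpose(1, 2)           # (batch, 32, seq)
        c = torch.relu(self.conv(e)).transpose(1, 2)
        _, h = self.gru(c)                          # final hidden state
        return self.out(h.squeeze(0))

model = ConvRecurrentAttributor()
logits = model(torch.randint(0, 256, (4, 512)))     # 4 snippets, 512 chars
```
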

2019-02-14
Facon, A., Guilley, S., Lec'Hvien, M., Schaub, A., Souissi, Y..  2018.  Detecting Cache-Timing Vulnerabilities in Post-Quantum Cryptography Algorithms. 2018 IEEE 3rd International Verification and Security Workshop (IVSW). :7–12.

When implemented on real systems, cryptographic algorithms are vulnerable to attacks that observe their execution behavior, such as cache-timing attacks. Designing protected implementations must be done with knowledge and validation tools as early as possible in the development cycle. In this article we propose a methodology to assess the robustness of the candidates in the NIST post-quantum standardization project against cache-timing attacks. To this end we have developed a dedicated vulnerability research tool. It performs a static analysis with taint propagation of sensitive variables across the source code and detects leakage patterns. We use it to assess the security of the NIST post-quantum cryptography project submissions. Our results show that more than 80% of the analyzed implementations have at least one potential flaw, and three submissions total more than 1000 reported flaws each. Finally, this comprehensive study of the candidates' security allows us to identify the most frequent weaknesses among candidates and how they might be fixed.
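
The leakage patterns in question are secret-dependent branches and secret-indexed memory accesses. A toy taint-propagation sketch over an invented three-address representation (the authors' tool operates on real source code):

```python
# Each instruction: (op, dest, operands). 'secret_key' marks initial taint.
PROGRAM = [
    ("load",   "k",   ["secret_key"]),
    ("and",    "bit", ["k", "1"]),
    ("branch", None,  ["bit"]),          # secret-dependent branch -> leak
    ("index",  "v",   ["table", "k"]),   # secret-indexed lookup   -> leak
]

def find_leaks(program, secrets={"secret_key"}):
    tainted, leaks = set(secrets), []
    for op, dest, operands in program:
        hit = any(o in tainted for o in operands)
        if op in ("branch", "index") and hit:
            leaks.append((op, operands))  # cache-timing observable
        elif hit and dest:
            tainted.add(dest)             # propagate taint to result
    return leaks

print(find_leaks(PROGRAM))
```
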

2019-01-31
Kumbhar, S. S., Lee, Y., Yang, J..  2018.  Hybrid Encryption for Securing SharedPreferences of Android Applications. 2018 1st International Conference on Data Intelligence and Security (ICDIS). :246–249.

Most mobile applications store local data in internal memory using the SharedPreferences interface of the Android operating system. As a result, many possible loopholes allow access to confidential information such as passwords. We propose a hybrid encryption approach for SharedPreferences to prevent confidential information from leaking through the source code. We develop an Android application and store some data using SharedPreferences, then run several experiments showing how this data can be accessed. We apply a hybrid encryption approach that combines an encryption algorithm with the Android Keystore system to better hide sensitive data.
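
The hybrid pattern, a symmetric key encrypting the data and an asymmetric key wrapping that symmetric key, can be sketched with Python's cryptography package. On Android the RSA key pair would live in the Keystore; this is an illustrative desktop analogue, not the paper's code.

```python
import os
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Stand-in for a Keystore-resident key pair.
priv = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def protect(plaintext: bytes):
    """AES-GCM encrypts the data; RSA-OAEP wraps the AES key."""
    key, nonce = AESGCM.generate_key(bit_length=128), os.urandom(12)
    ct = AESGCM(key).encrypt(nonce, plaintext, None)
    return priv.public_key().encrypt(key, oaep), nonce, ct

def recover(wrapped_key, nonce, ct):
    key = priv.decrypt(wrapped_key, oaep)
    return AESGCM(key).decrypt(nonce, ct, None)

w, n, c = protect(b"password=hunter2")
assert recover(w, n, c) == b"password=hunter2"
```
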

2018-06-07
Novikov, A. S., Ivutin, A. N., Troshina, A. G., Vasiliev, S. N..  2017.  The approach to finding errors in program code based on static analysis methodology. 2017 6th Mediterranean Conference on Embedded Computing (MECO). :1–4.

The article considers an approach to static analysis of program code and the general principles of static analyzer operation. The authors identify the most important syntactic and semantic information in programs that can be used to find errors in source code. A general methodology for developing diagnostic rules is proposed, which improves the efficiency of static code analyzers.
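
A diagnostic rule in this style pairs a syntactic pattern with a semantic condition. As a minimal illustration (using Python's own ast module rather than the authors' analyzer), here is a rule flagging comparisons of a value against itself, a common copy-paste error:

```python
import ast

def self_comparison_rule(source):
    """Flag expressions like `x == x`, which are usually typos."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare):
            left = ast.dump(node.left)
            for comp in node.comparators:
                if ast.dump(comp) == left:
                    findings.append(f"line {node.lineno}: "
                                    "comparison of a value with itself")
    return findings

print(self_comparison_rule("if x == x:\n    pass"))
```
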

2018-04-02
Khanmohammadi, K., Hamou-Lhadj, A..  2017.  HyDroid: A Hybrid Approach for Generating API Call Traces from Obfuscated Android Applications for Mobile Security. 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS). :168–175.

The growing popularity of Android applications makes them vulnerable to security threats. Several existing studies focus on analyzing the behaviour of Android applications to detect repackaged and malicious ones. These techniques use a variety of features to model an application's behaviour, among which the calls to the Android API made by the application's components are shown to be the most reliable. Generating the list of APIs that an application calls is not an easy task, because most malicious applications are obfuscated and do not come with source code. This makes the problem of identifying the API methods invoked by an application an interesting research issue. In this paper, we present HyDroid, a hybrid approach that combines static and dynamic analysis to generate API call traces from the execution of an application's services. We focus on services because they contain key characteristics that lure attackers into misusing them. We show that HyDroid can be used to extract API call trace signatures of several malware families.

2017-12-28
Sultana, K. Z., Williams, B. J..  2017.  Evaluating micro patterns and software metrics in vulnerability prediction. 2017 6th International Workshop on Software Mining (SoftwareMining). :40–47.

Software security is an important aspect of ensuring software quality. Early detection of vulnerable code during development is essential for developers to perform cost- and time-effective software testing. Traditional software metrics are used for early detection of software vulnerabilities, but they are not directly related to code constructs and do not specify any particular granularity level. The goal of this study is to help developers evaluate software security using class-level traceable patterns, called micro patterns, to reduce security risks. The concept of micro patterns is similar to design patterns, but they can be automatically recognized and mined from source code. If micro patterns can better predict vulnerable classes than traditional software metrics, they can be used to develop a vulnerability prediction model. This study explores the performance of class-level patterns in vulnerability prediction and compares them with traditional class-level software metrics. We studied security vulnerabilities as reported for one major release each of Apache Tomcat, Apache Camel, and three stand-alone Java web applications. We used machine learning techniques to predict vulnerabilities using micro patterns and class-level metrics as features. We found that micro patterns have higher recall in detecting vulnerable classes than the software metrics.

2017-03-07
Gupta, M. K., Govil, M. C., Singh, G., Sharma, P..  2015.  XSSDM: Towards detection and mitigation of cross-site scripting vulnerabilities in web applications. 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI). :2010–2015.

With the growth of the Internet, web applications are becoming very popular in user communities. However, the presence of security vulnerabilities in the source code of these applications is rapidly raising the cybercrime rate. These vulnerabilities must be detected and mitigated before they are exploited in the execution environment. Recently, the Open Web Application Security Project (OWASP) and the Common Weakness Enumeration (CWE) reported Cross-Site Scripting (XSS) as one of the most serious vulnerabilities in web applications. Though many vulnerability detection approaches have been proposed in the past, existing approaches have limitations in terms of false positive and false negative results. This paper proposes a context-sensitive approach based on static taint analysis and pattern matching techniques to detect and mitigate XSS vulnerabilities in the source code of web applications. The proposed approach has been implemented in a prototype tool and evaluated on a public dataset of 9408 samples. Experimental results show that the tool based on the proposed approach outperforms existing popular open-source tools in the detection of XSS vulnerabilities.
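
The core of such a detector is tracing untrusted input (a source) to an output statement (a sink) with no context-appropriate sanitizer in between. A toy pattern-matching sketch over PHP-like lines, far simpler than the context-sensitive analysis the paper implements:

```python
import re

SOURCES = re.compile(r"\$_(GET|POST|COOKIE|REQUEST)\[")
SANITIZED = re.compile(r"htmlspecialchars|htmlentities")
SINK = re.compile(r"\becho\b|\bprint\b")

def scan(php_lines):
    tainted, findings = set(), []
    for no, line in enumerate(php_lines, 1):
        m = re.match(r"\s*(\$\w+)\s*=", line)
        if m and SOURCES.search(line) and not SANITIZED.search(line):
            tainted.add(m.group(1))          # variable now holds raw input
        if SINK.search(line) and any(v in line for v in tainted):
            findings.append(f"line {no}: possible XSS (tainted echo)")
    return findings

print(scan(['$name = $_GET["name"];', 'echo "Hi " . $name;']))
```
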

2015-05-06
Xinhai Zhang, Persson, M., Nyberg, M., Mokhtari, B., Einarson, A., Linder, H., Westman, J., DeJiu Chen, Torngren, M..  2014.  Experience on applying software architecture recovery to automotive embedded systems. Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), 2014 Software Evolution Week - IEEE Conference on. :379–382.

The importance and potential advantages of a comprehensive product architecture description are well described in the literature. However, developing such a description takes additional resources, and it is difficult to maintain consistency with evolving implementations. This paper presents an approach to, and industrial experience with, architecture recovery from source code at the truck manufacturer Scania CV AB. The extracted representation of the architecture is presented in several views and verified at the CAN signal level. Lessons learned are discussed.

2015-05-05
Gupta, M.K., Govil, M.C., Singh, G..  2014.  A context-sensitive approach for precise detection of cross-site scripting vulnerabilities. Innovations in Information Technology (INNOVATIONS), 2014 10th International Conference on. :7–12.

Currently, dependence on web applications is increasing rapidly for social communication, health services, financial transactions, and many other purposes. Unfortunately, the presence of cross-site scripting vulnerabilities in these applications allows malicious users to steal sensitive information, install malware, and perform various malicious operations. Researchers have proposed various approaches and developed tools to detect XSS vulnerabilities in the source code of web applications. However, existing approaches and tools are not free from false positive and false negative results. In this paper, we propose an HTML context-sensitive approach based on taint analysis and defensive programming for the precise detection of XSS vulnerabilities in the source code of PHP web applications. It also provides automatic suggestions to improve the vulnerable source code. Preliminary experiments and results on test subjects show that the proposed approach is more efficient than existing ones.

Gupta, M.K., Govil, M.C., Singh, G..  2014.  Static analysis approaches to detect SQL injection and cross site scripting vulnerabilities in web applications: A survey. Recent Advances and Innovations in Engineering (ICRAIE), 2014. :1–5.

Dependence on web applications has been increasing rapidly in recent times for social communication, health services, financial transactions, and many other purposes. Unfortunately, the presence of security weaknesses in web applications allows malicious users to exploit various security vulnerabilities and cause applications to fail. Currently, SQL Injection (SQLI) and Cross-Site Scripting (XSS) are the most dangerous security vulnerabilities exploited in popular web applications such as eBay, Google, Facebook, and Twitter. Research on defensive programming, vulnerability detection, and attack prevention techniques has been quite intensive in the past decade. Defensive programming is a set of coding guidelines for developing secure applications, but developers often do not follow security guidelines and repeat the same types of programming mistakes in their code. Attack prevention techniques protect applications from attack during their execution in the production environment, but accurately detecting SQLI and XSS vulnerabilities in the coding phase of the software development life cycle remains difficult. This paper proposes a classification of software security approaches used to develop secure software in the various phases of the software development life cycle. It also presents a survey of static analysis based approaches for detecting SQL injection and cross-site scripting vulnerabilities in the source code of web applications. The aim of these approaches is to identify weaknesses in source code before they can be exploited in the production environment. This paper should help researchers identify future directions for securing legacy web applications in the early phases of the software development life cycle.

2015-04-30
Fonseca, J., Seixas, N., Vieira, M., Madeira, H..  2014.  Analysis of Field Data on Web Security Vulnerabilities. Dependable and Secure Computing, IEEE Transactions on. 11:89–100.

Most web applications have critical bugs (faults) affecting their security, which makes them vulnerable to attacks by hackers and organized crime. To prevent these security problems from occurring, it is of utmost importance to understand the typical software faults. This paper contributes to this body of knowledge by presenting a field study on two of the most widespread and critical web application vulnerabilities: SQL Injection and XSS. It analyzes the source code of security patches of widely used web applications written in weakly and strongly typed languages. Results show that only a small subset of software fault types, affecting a restricted collection of statements, is related to security. To understand how these vulnerabilities are really exploited by hackers, this paper also presents an analysis of the source code of the scripts used to attack them. The outcomes of this study can be used to train software developers and code inspectors in the detection of such faults, and are also the foundation for research on realistic vulnerability and attack injectors that can be used to assess security mechanisms such as intrusion detection systems, vulnerability scanners, and static code analyzers.

Sen, S., Guha, S., Datta, A., Rajamani, S.K., Tsai, J., Wing, J.M..  2014.  Bootstrapping Privacy Compliance in Big Data Systems. Security and Privacy (SP), 2014 IEEE Symposium on. :327–342.

With the rapid increase in cloud services collecting and using user data to offer personalized experiences, ensuring that these services comply with their privacy policies has become a business imperative for building user trust. However, most compliance efforts in industry today rely on manual review processes and audits designed to safeguard user data, and are therefore resource intensive and lack coverage. In this paper, we present our experience building and operating a system to automate privacy policy compliance checking in Bing. Central to the design of the system are (a) Legalease, a language that allows specification of privacy policies that impose restrictions on how user data is handled, and (b) Grok, a data inventory for Map-Reduce-like big data systems that tracks how user data flows among programs. Grok maps code-level schema elements to data types in Legalease, in essence annotating existing programs with information flow types with minimal human input. Compliance checking is thus reduced to information flow analysis of big data systems. The system, bootstrapped by a small team, checks compliance daily of millions of lines of ever-changing source code written by several thousand developers.
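
The reduction of compliance checking to information-flow analysis can be caricatured in a few lines: label each program's purpose and the data types it reads, then check every program against the policy's restrictions. The clause format below is invented for illustration, not actual Legalease syntax.

```python
# Policy: which data types a purpose may not use. (Invented clause format.)
POLICY = {
    "advertising": {"deny": {"HealthData", "PreciseLocation"}},
    "abuse_detection": {"deny": set()},
}

# Data inventory: program -> (purpose, data types it reads). Grok-like role.
PROGRAMS = {
    "ad_ranker": ("advertising", {"SearchQuery", "PreciseLocation"}),
    "spam_filter": ("abuse_detection", {"IPAddress", "SearchQuery"}),
}

def check_compliance(programs, policy):
    violations = []
    for name, (purpose, data_types) in programs.items():
        banned = data_types & policy[purpose]["deny"]
        if banned:
            violations.append((name, purpose, sorted(banned)))
    return violations

print(check_compliance(PROGRAMS, POLICY))
# -> [('ad_ranker', 'advertising', ['PreciseLocation'])]
```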