Visible to the public Biblio

Filters: Keyword is Open Source Software  [Clear All Filters]
2021-06-24
Angermeir, Florian, Voggenreiter, Markus, Moyón, Fabiola, Mendez, Daniel.  2021.  Enterprise-Driven Open Source Software: A Case Study on Security Automation. 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). :278—287.
Agile and DevOps are widely adopted by the industry. Hence, integrating security activities with industrial practices, such as continuous integration (CI) pipelines, is necessary to detect security flaws and adhere to regulators’ demands early. In this paper, we analyze automated security activities in CI pipelines of enterprise-driven open source software (OSS). This shall allow us, in the long-run, to better understand the extent to which security activities are (or should be) part of automated pipelines. In particular, we mine publicly available OSS repositories and survey a sample of project maintainers to better understand the role that security activities and their related tools play in their CI pipelines. To increase transparency and allow other researchers to replicate our study (and to take different perspectives), we further disclose our research artefacts.Our results indicate that security activities in enterprise-driven OSS projects are scarce and protection coverage is rather low. Only 6.83% of the analyzed 8,243 projects apply security automation in their CI pipelines, even though maintainers consider security to be rather important. This alerts industry to keep the focus on vulnerabilities of 3rd Party software and it opens space for other improvements of practice which we outline in this manuscript.
2021-05-18
Zeng, Jingxiang, Nie, Xiaofan, Chen, Liwei, Li, Jinfeng, Du, Gewangzi, Shi, Gang.  2020.  An Efficient Vulnerability Extrapolation Using Similarity of Graph Kernel of PDGs. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1664–1671.
Discovering the potential vulnerabilities in software plays a crucial role in ensuring the security of computer system. This paper proposes a method that can assist security auditors with the analysis of source code. When security auditors identify new vulnerabilities, our method can be adopted to make a list of recommendations that may have the same vulnerabilities for the security auditors. Our method relies on graph representation to automatically extract the mode of PDG(program dependence graph, a structure composed of control dependence and data dependence). Besides, it can be applied to the vulnerability extrapolation scenario, thus reducing the amount of audit code. We worked on an open-source vulnerability test set called Juliet. According to the evaluation results, the clustering effect produced is satisfactory, so that the feature vectors extracted by the Graph2Vec model are applied to labeling and supervised learning indicators are adopted to assess the model for its ability to extract features. On a total of 12,000 small data sets, the training score of the model can reach up to 99.2%, and the test score can reach a maximum of 85.2%. Finally, the recommendation effect of our work is verified as satisfactory.
2021-05-05
Đuranec, A., Gruičić, S., Žagar, M..  2020.  Forensic analysis of Windows 10 Sandbox. 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO). :1224—1229.

With each Windows operating system Microsoft introduces new features to its users. Newly added features present a challenge to digital forensics examiners as they are not analyzed or tested enough. One of the latest features, introduced in Windows 10 version 1909 is Windows Sandbox; a lightweight, temporary, environment for running untrusted applications. Because of the temporary nature of the Sandbox and insufficient documentation, digital forensic examiners are facing new challenges when examining this newly added feature which can be used to hide different illegal activities. Throughout this paper, the focus will be on analyzing different Windows artifacts and event logs, with various tools, left behind as a result of the user interaction with the Sandbox feature on a clear virtual environment. Additionally, the setup of testing environment will be explained, the results of testing and interpretation of the findings will be presented, as well as open-source tools used for the analysis.

Herrera, Adrian.  2020.  Optimizing Away JavaScript Obfuscation. 2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM). :215—220.

JavaScript is a popular attack vector for releasing malicious payloads on unsuspecting Internet users. Authors of this malicious JavaScript often employ numerous obfuscation techniques in order to prevent the automatic detection by antivirus and hinder manual analysis by professional malware analysts. Consequently, this paper presents SAFE-DEOBS, a JavaScript deobfuscation tool that we have built. The aim of SAFE-DEOBS is to automatically deobfuscate JavaScript malware such that an analyst can more rapidly determine the malicious script's intent. This is achieved through a number of static analyses, inspired by techniques from compiler theory. We demonstrate the utility of SAFE-DEOBS through a case study on real-world JavaScript malware, and show that it is a useful addition to a malware analyst's toolset.

Block, Matthew, Barcaskey, Benjamin, Nimmo, Andrew, Alnaeli, Saleh, Gilbert, Ian, Altahat, Zaid.  2020.  Scalable Cloud-Based Tool to Empirically Detect Vulnerable Code Patterns in Large-Scale System. 2020 IEEE International Conference on Electro Information Technology (EIT). :588—592.
Open-source development is a well-accepted model by software development communities from both academia and industry. Many companies and corporations adopt and use open source systems daily as a core component in their business activities. One of the most important factors that will determine the success of this model is security. The security of software systems is a combination of source code quality, stability, and vulnerabilities. Software vulnerabilities can be introduced by many factors, some of which are the way that programmers write their programs, their background on security standards, and safe programming practices. This paper describes a cloud-based software tool developed by the authors that can help our computing communities in both academia and research to evaluate their software systems on the source code level to help them identify and detect some of the well-known source code vulnerability patterns that can cause security issues if maliciously exploited. The paper also presents an empirical study on the prevalence of vulnerable C/C++ coding patterns inside three large-scale open-source systems comprising more than 42 million lines of source code. The historical data for the studied systems is presented over five years to uncover some historical trends to highlight the changes in the system analyzed over time concerning the presence of some of the source code vulnerabilities patterns. The majority of results show the continued usage of known unsafe functions.
Pawar, Shrikant, Stanam, Aditya.  2020.  Scalable, Reliable and Robust Data Mining Infrastructures. 2020 Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4). :123—125.

Mining of data is used to analyze facts to discover formerly unknown patterns, classifying and grouping the records. There are several crucial scalable statistics mining platforms that have been developed in latest years. RapidMiner is a famous open source software which can be used for advanced analytics, Weka and Orange are important tools of machine learning for classifying patterns with techniques of clustering and regression, whilst Knime is often used for facts preprocessing like information extraction, transformation and loading. This article encapsulates the most important and robust platforms.

2021-03-29
Naik, N., Jenkins, P..  2020.  uPort Open-Source Identity Management System: An Assessment of Self-Sovereign Identity and User-Centric Data Platform Built on Blockchain. 2020 IEEE International Symposium on Systems Engineering (ISSE). :1—7.

Managing identity across an ever-growing digital services landscape has become one of the most challenging tasks for security experts. Over the years, several Identity Management (IDM) systems were introduced and adopted to tackle with the growing demand of an identity. In this series, a recently emerging IDM system is Self-Sovereign Identity (SSI) which offers greater control and access to users regarding their identity. This distinctive feature of the SSI IDM system represents a major development towards the availability of sovereign identity to users. uPort is an emerging open-source identity management system providing sovereign identity to users, organisations, and other entities. As an emerging identity management system, it requires meticulous analysis of its architecture, working, operational services, efficiency, advantages and limitations. Therefore, this paper contributes towards achieving all of these objectives. Firstly, it presents the architecture and working of the uPort identity management system. Secondly, it develops a Decentralized Application (DApp) to demonstrate and evaluate its operational services and efficiency. Finally, based on the developed DApp and experimental analysis, it presents the advantages and limitations of the uPort identity management system.

2021-03-22
Kellogg, M., Schäf, M., Tasiran, S., Ernst, M. D..  2020.  Continuous Compliance. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :511–523.
Vendors who wish to provide software or services to large corporations and governments must often obtain numerous certificates of compliance. Each certificate asserts that the software satisfies a compliance regime, like SOC or the PCI DSS, to protect the privacy and security of sensitive data. The industry standard for obtaining a compliance certificate is an auditor manually auditing source code. This approach is expensive, error-prone, partial, and prone to regressions. We propose continuous compliance to guarantee that the codebase stays compliant on each code change using lightweight verification tools. Continuous compliance increases assurance and reduces costs. Continuous compliance is applicable to any source-code compliance requirement. To illustrate our approach, we built verification tools for five common audit controls related to data security: cryptographically unsafe algorithms must not be used, keys must be at least 256 bits long, credentials must not be hard-coded into program text, HTTPS must always be used instead of HTTP, and cloud data stores must not be world-readable. We evaluated our approach in three ways. (1) We applied our tools to over 5 million lines of open-source software. (2) We compared our tools to other publicly-available tools for detecting misuses of encryption on a previously-published benchmark, finding that only ours are suitable for continuous compliance. (3) We deployed a continuous compliance process at AWS, a large cloud-services company: we integrated verification tools into the compliance process (including auditors accepting their output as evidence) and ran them on over 68 million lines of code. Our tools and the data for the former two evaluations are publicly available.
2021-02-08
Haque, M. A., Shetty, S., Kamhoua, C. A., Gold, K..  2020.  Integrating Mission-Centric Impact Assessment to Operational Resiliency in Cyber-Physical Systems. GLOBECOM 2020 - 2020 IEEE Global Communications Conference. :1–7.

Developing mission-centric impact assessment techniques to address cyber resiliency in the cyber-physical systems (CPSs) requires integrating system inter-dependencies to the risk and resilience analysis process. Generally, network administrators utilize attack graphs to estimate possible consequences in a networked environment. Attack graphs lack to incorporate the operations-specific dependencies. Localizing the dependencies among operational missions, tasks, and the hosting devices in a large-scale CPS is also challenging. In this work, we offer a graphical modeling technique to integrate the mission-centric impact assessment of cyberattacks by relating the effect to the operational resiliency by utilizing a combination of the logical attack graph and mission impact propagation graph. We propose formal techniques to compute cyberattacks’ impact on the operational mission and offer an optimization process to minimize the same, having budgetary restrictions. We also relate the effect to the system functional operability. We illustrate our modeling techniques using a SCADA (supervisory control and data acquisition) case study for the cyber-physical power systems. We believe our proposed method would help evaluate and minimize the impact of cyber attacks on CPS’s operational missions and, thus, enhance cyber resiliency.

2021-01-15
Korshunov, P., Marcel, S..  2019.  Vulnerability assessment and detection of Deepfake videos. 2019 International Conference on Biometrics (ICB). :1—6.
It is becoming increasingly easy to automatically replace a face of one person in a video with the face of another person by using a pre-trained generative adversarial network (GAN). Recent public scandals, e.g., the faces of celebrities being swapped onto pornographic videos, call for automated ways to detect these Deepfake videos. To help developing such methods, in this paper, we present the first publicly available set of Deepfake videos generated from videos of VidTIMIT database. We used open source software based on GANs to create the Deepfakes, and we emphasize that training and blending parameters can significantly impact the quality of the resulted videos. To demonstrate this impact, we generated videos with low and high visual quality (320 videos each) using differently tuned parameter sets. We showed that the state of the art face recognition systems based on VGG and Facenet neural networks are vulnerable to Deepfake videos, with 85.62% and 95.00% false acceptance rates (on high quality versions) respectively, which means methods for detecting Deepfake videos are necessary. By considering several baseline approaches, we found the best performing method based on visual quality metrics, which is often used in presentation attack detection domain, to lead to 8.97% equal error rate on high quality Deep-fakes. Our experiments demonstrate that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods, and the further development of face swapping technology will make it even more so.
2020-11-20
Lardier, W., Varo, Q., Yan, J..  2019.  Quantum-Sim: An Open-Source Co-Simulation Platform for Quantum Key Distribution-Based Smart Grid Communications. 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). :1—6.
Grid modernization efforts with the latest information and communication technologies will significantly benefit smart grids in the coming years. More optical fibre communications between consumers and the control center will promise better demand response and customer engagement, yet the increasing attack surface and man-in-the-middle (MITM) threats can result in security and privacy challenges. Among the studies for more secure smart grid communications, quantum key distribution protocols (QKD) have emerged as a promising option. To bridge the theoretical advantages of quantum communication to its practical utilization, however, comprehensive investigations have to be conducted with realistic cyber-physical smart grid structures and scenarios. To facilitate research in this direction, this paper proposes an open-source, research-oriented co-simulation platform that orchestrates cyber and power simulators under the MOSAIK framework. The proposed platform allows flexible and realistic power flow-based co-simulation of quantum communications and electrical grids, where different cyber and power topologies, QKD protocols, and attack threats can be investigated. Using quantum-based communication under MITM attacks, the paper presented detailed case studies to demonstrate how the platform enables quick setup of a lowvoltage distribution grid, implementation of different protocols and cryptosystems, as well as evaluations of both communication efficiency and security against MITM attacks. The platform has been made available online to empower researchers in the modelling of quantum-based cyber-physical systems, pilot studies on quantum communications in smart grid, as well as improved attack resilience against malicious intruders.
2020-09-28
Ibrahim, Ahmed, El-Ramly, Mohammad, Badr, Amr.  2019.  Beware of the Vulnerability! How Vulnerable are GitHub's Most Popular PHP Applications? 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA). :1–7.
The presence of software vulnerabilities is a serious threat to any software project. Exploiting them can compromise system availability, data integrity, and confidentiality. Unfortunately, many open source projects go for years with undetected ready-to-exploit critical vulnerabilities. In this study, we investigate the presence of software vulnerabilities in open source projects and the factors that influence this presence. We analyzed the top 100 open source PHP applications in GitHub using a static analysis vulnerability scanner to examine how common software vulnerabilities are. We also discussed which vulnerabilities are most present and what factors contribute to their presence. We found that 27% of these projects are insecure, with a median number of 3 vulnerabilities per vulnerable project. We found that the most common type is injection vulnerabilities, which made 58% of all detected vulnerabilities. Out of these, cross-site scripting (XSS) was the most common and made 43.5% of all vulnerabilities found. Statistical analysis revealed that project activities like branching, pulling, and committing have a moderate positive correlation with the number of vulnerabilities in the project. Other factors like project popularity, number of releases, and number of issues had almost no influence on the number of vulnerabilities. We recommend that open source project owners should set secure code development guidelines for their project members and establish secure code reviews as part of the project's development process.
2020-08-24
Harris, Daniel R., Delcher, Chris.  2019.  bench4gis: Benchmarking Privacy-aware Geocoding with Open Big Data. 2019 IEEE International Conference on Big Data (Big Data). :4067–4070.
Geocoding, the process of translating addresses to geographic coordinates, is a relatively straight-forward and well-studied process, but limitations due to privacy concerns may restrict usage of geographic data. The impact of these limitations are further compounded by the scale of the data, and in turn, also limits viable geocoding strategies. For example, healthcare data is protected by patient privacy laws in addition to possible institutional regulations that restrict external transmission and sharing of data. This results in the implementation of “in-house” geocoding solutions where data is processed behind an organization's firewall; quality assurance for these implementations is problematic because sensitive data cannot be used to externally validate results. In this paper, we present our software framework called bench4gis which benchmarks privacy-aware geocoding solutions by leveraging open big data as surrogate data for quality assurance; the scale of open big data sets for address data can ensure that results are geographically meaningful for the locale of the implementing institution.
2020-08-14
Gu, Zuxing, Wu, Jiecheng, Liu, Jiaxiang, Zhou, Min, Gu, Ming.  2019.  An Empirical Study on API-Misuse Bugs in Open-Source C Programs. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 1:11—20.
Today, large and complex software is developed with integrated components using application programming interfaces (APIs). Correct usage of APIs in practice presents a challenge due to implicit constraints, such as call conditions or call orders. API misuse, i.e., violation of these constraints, is a well-known source of bugs, some of which can cause serious security vulnerabilities. Although researchers have developed many API-misuse detectors over the last two decades, recent studies show that API misuses are still prevalent. In this paper, we provide a comprehensive empirical study on API-misuse bugs in open-source C programs. To understand the nature of API misuses in practice, we analyze 830 API-misuse bugs from six popular programs across different domains. For all the studied bugs, we summarize their root causes, fix patterns and usage statistics. Furthermore, to understand the capabilities and limitations of state-of-the-art static analysis detectors for API-misuse detection, we develop APIMU4C, a dataset of API-misuse bugs in C code based on our empirical study results, and evaluate three widely-used detectors on it qualitatively and quantitatively. We share all the findings and present possible directions towards more powerful API-misuse detectors.
2020-07-27
Babay, Amy, Schultz, John, Tantillo, Thomas, Amir, Yair.  2018.  Toward an Intrusion-Tolerant Power Grid: Challenges and Opportunities. 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS). :1321–1326.
While cyberattacks pose a relatively new challenge for power grid control systems, commercial cloud systems have needed to address similar threats for many years. However, technology and approaches developed for cloud systems do not necessarily transfer directly to the power grid, due to important differences between the two domains. We discuss our experience adapting intrusion-tolerant cloud technologies to the power domain and describe the challenges we have encountered and potential directions for overcoming those obstacles.
2020-07-10
Koloveas, Paris, Chantzios, Thanasis, Tryfonopoulos, Christos, Skiadopoulos, Spiros.  2019.  A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence. 2019 IEEE World Congress on Services (SERVICES). 2642-939X:3—8.

The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that -given the appropriate tools and methods-may be identified, crawled and subsequently leveraged to actionable cyber-threat intelligence. In this work, we focus on the information gathering task, and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. Initially a machine learning-based crawler is used to direct the harvesting towards websites of interest, while in the second phase state-of-the-art statistical language modelling techniques are used to represent the harvested information in a latent low-dimensional feature space and rank it based on its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness.

2020-05-15
Egert, Rolf, Grube, Tim, Born, Dustin, Mühlhäuser, Max.  2019.  Modular Vulnerability Indication for the IoT in IP-Based Networks. 2019 IEEE Globecom Workshops (GC Wkshps). :1—6.

With the rapidly increasing number of Internet of Things (IoT) devices and their extensive integration into peoples' daily lives, the security of those devices is of primary importance. Nonetheless, many IoT devices suffer from the absence, or the bad application, of security concepts, which leads to severe vulnerabilities in those devices. To achieve early detection of potential vulnerabilities, network scanner tools are frequently used. However, most of those tools are highly specialized; thus, multiple tools and a meaningful correlation of their results are required to obtain an adequate listing of identified network vulnerabilities. To simplify this process, we propose a modular framework for automated network reconnaissance and vulnerability indication in IP-based networks. It allows integrating a diverse set of tools as either, scanning tools or analysis tools. Moreover, the framework enables result aggregation of different modules and allows information sharing between modules facilitating the development of advanced analysis modules. Additionally, intermediate scanning and analysis data is stored, enabling a historical view of derived information and also allowing users to retrace decision-making processes. We show the framework's modular capabilities by implementing one scanner module and three analysis modules. The automated process is then evaluated using an exemplary scenario with common IP-based IoT components.

2020-04-13
Morishita, Shun, Hoizumi, Takuya, Ueno, Wataru, Tanabe, Rui, Gañán, Carlos, van Eeten, Michel J.G., Yoshioka, Katsunari, Matsumoto, Tsutomu.  2019.  Detect Me If You… Oh Wait. An Internet-Wide View of Self-Revealing Honeypots. 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM). :134–143.
Open-source honeypots are a vital component in the protection of networks and the observation of trends in the threat landscape. Their open nature also enables adversaries to identify the characteristics of these honeypots in order to detect and avoid them. In this study, we investigate the prevalence of 14 open- source honeypots running more or less default configurations, making them easily detectable by attackers. We deploy 20 simple signatures and test them for false positives against servers for domains in the Alexa top 10,000, official FTP mirrors, mail servers in real operation, and real IoT devices running telnet. We find no matches, suggesting good accuracy. We then measure the Internet-wide prevalence of default open-source honeypots by matching the signatures with Censys scan data and our own scans. We discovered 19,208 honeypots across 637 Autonomous Systems that are trivially easy to identify. Concentrations are found in research networks, but also in enterprise, cloud and hosting networks. While some of these honeypots probably have no operational relevance, e.g., they are student projects, this explanation does not fit the wider population. One cluster of honeypots was confirmed to belong to a well-known security center and was in use for ongoing attack monitoring. Concentrations in an another cluster appear to be the result of government incentives. We contacted 11 honeypot operators and received response from 4 operators, suggesting the problem of lack of network hygiene. Finally, we find that some honeypots are actively abused by attackers for hosting malicious binaries. We notified the owners of the detected honeypots via their network operators and provided recommendations for customization to avoid simple signature-based detection. We also shared our results with the honeypot developers.
2020-02-17
Wang, Xinda, Sun, Kun, Batcheller, Archer, Jajodia, Sushil.  2019.  Detecting "0-Day" Vulnerability: An Empirical Study of Secret Security Patch in OSS. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :485–492.
Security patches in open source software (OSS) not only provide security fixes to identified vulnerabilities, but also make the vulnerable code public to the attackers. Therefore, armored attackers may misuse this information to launch N-day attacks on unpatched OSS versions. The best practice for preventing this type of N-day attacks is to keep upgrading the software to the latest version in no time. However, due to the concerns on reputation and easy software development management, software vendors may choose to secretly patch their vulnerabilities in a new version without reporting them to CVE or even providing any explicit description in their change logs. When those secretly patched vulnerabilities are being identified by armored attackers, they can be turned into powerful "0-day" attacks, which can be exploited to compromise not only unpatched version of the same software, but also similar types of OSS (e.g., SSL libraries) that may contain the same vulnerability due to code clone or similar design/implementation logic. Therefore, it is critical to identify secret security patches and downgrade the risk of those "0-day" attacks to at least "n-day" attacks. In this paper, we develop a defense system and implement a toolset to automatically identify secret security patches in open source software. To distinguish security patches from other patches, we first build a security patch database that contains more than 4700 security patches mapping to the records in CVE list. Next, we identify a set of features to help distinguish security patches from non-security ones using machine learning approaches. Finally, we use code clone identification mechanisms to discover similar patches or vulnerabilities in similar types of OSS. The experimental results show our approach can achieve good detection performance. A case study on OpenSSL, LibreSSL, and BoringSSL discovers 12 secret security patches.
2020-02-10
Ben Othmane, Lotfi, Jamil, Ameerah-Muhsina, Abdelkhalek, Moataz.  2019.  Identification of the Impacts of Code Changes on the Security of Software. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 2:569–574.
Companies develop their software in versions and iterations. Ensuring the security of each additional version using code review is costly and time consuming. This paper investigates automated tracing of the impacts of code changes on the security of a given software. To this end, we use call graphs to model the software code, and security assurance cases to model the security requirements of the software. Then we relate assurance case elements to code through the entry point methods of the software, creating a map of monitored security functions. This mapping allows to evaluate the security requirements that are affected by code changes. The approach is implemented in a set of tools and evaluated using three open-source ERP/E-commerce software applications. The limited evaluation showed that the approach is effective in identifying the impacts of code changes on the security of the software. The approach promises to considerably reduce the security assessment time of the subsequent releases and iterations of software, keeping the initial security state throughout the software lifetime.
2019-09-26
Miletić, M., Vuku\v sić, M., Mau\v sa, G., Grbac, T. G..  2018.  Cross-Release Code Churn Impact on Effort-Aware Software Defect Prediction. 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). :1460-1466.

Code churn has been successfully used to identify defect inducing changes in software development. Our recent analysis of the cross-release code churn showed that several design metrics exhibit moderate correlation with the number of defects in complex systems. The goal of this paper is to explore whether cross-release code churn can be used to identify critical design change and contribute to prediction of defects for software in evolution. In our case study, we used two types of data from consecutive releases of open-source projects, with and without cross-release code churn, to build standard prediction models. The prediction models were trained on earlier releases and tested on the following ones, evaluating the performance in terms of AUC, GM and effort aware measure Pop. The comparison of their performance was used to answer our research question. The obtained results showed that the prediction model performs better when cross-release code churn is included. Practical implication of this research is to use cross-release code churn to aid in safe planning of next release in software development.

2019-02-25
Alami, Adam, Dittrich, Yvonne, Wąsowski, Andrzej.  2018.  Influencers of Quality Assurance in an Open Source Community. Proceedings of the 11th International Workshop on Cooperative and Human Aspects of Software Engineering. :61-68.
ROS (Robot Operating System) is an open source community in robotics that is developing standard robotics operating system facilities such as hardware abstraction, low-level device control, communication middleware, and a wide range of software components for robotics functionality. This paper studies the quality assurance practices of the ROS community. We use qualitative methods to understand how ideology, priorities of the community, culture, sustainability, complexity, and adaptability of the community affect the implementation of quality assurance practices. Our analysis suggests that software engineering practices require social and cultural alignment and adaptation to the community particularities to achieve seamless implementation in open source environments. This alignment should be incorporated into the design and implementation of quality assurance practices in open source communities.
2019-02-14
Jenkins, J., Cai, H..  2018.  Leveraging Historical Versions of Android Apps for Efficient and Precise Taint Analysis. 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR). :265-269.

Today, computing on various Android devices is pervasive. However, growing security vulnerabilities and attacks in the Android ecosystem constitute various threats through user apps. Taint analysis is a common technique for defending against these threats, yet it suffers from challenges in attaining practical simultaneous scalability and effectiveness. This paper presents a novel approach to fast and precise taint checking, called incremental taint analysis, by exploiting the evolving nature of Android apps. The analysis narrows down the search space of taint checking from an entire app, as conventionally addressed, to the parts of the program that are different from its previous versions. This technique improves the overall efficiency of checking multiple versions of the app as it evolves. We have implemented the techniques as a tool prototype, EVOTAINT, and evaluated our analysis by applying it to real-world evolving Android apps. Our preliminary results show that the incremental approach largely reduced the cost of taint analysis, by 78.6% on average, yet without sacrificing the analysis effectiveness, relative to a representative precise taint analysis as the baseline.

2018-12-03
Ma, Y..  2018.  Constructing Supply Chains in Open Source Software. 2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-Companion). :458–459.

The supply chain is an extremely successful way to cope with the risk posed by distributed decision making in product sourcing and distribution. While open source software has similarly distributed decision making and involves code and information flows similar to those in ordinary supply chains, the actual networks necessary to quantify and communicate risks in software supply chains have not been constructed on large scale. This work proposes to close this gap by measuring dependency, code reuse, and knowledge flow networks in open source software. We have done preliminary work by developing suitable tools and methods that rely on public version control data to measure and comparing these networks for R language and emberjs packages. We propose ways to calculate the three networks for the entirety of public software, evaluate their accuracy, and to provide public infrastructure to build risk assessment and mitigation tools for various individual and organizational participants in open sources software. We hope that this infrastructure will contribute to more predictable experience with OSS and lead to its even wider adoption.

2018-06-07
Reynolds, Z. P., Jayanth, A. B., Koc, U., Porter, A. A., Raje, R. R., Hill, J. H..  2017.  Identifying and Documenting False Positive Patterns Generated by Static Code Analysis Tools. 2017 IEEE/ACM 4th International Workshop on Software Engineering Research and Industrial Practice (SER IP). :55–61.

This paper presents our results from identifying anddocumenting false positives generated by static code analysistools. By false positives, we mean a static code analysis toolgenerates a warning message, but the warning message isnot really an error. The goal of our study is to understandthe different kinds of false positives generated so we can (1)automatically determine if an error message is truly indeed a truepositive, and (2) reduce the number of false positives developersand testers must triage. We have used two open-source tools andone commercial tool in our study. The results of our study haveled to 14 core false positive patterns, some of which we haveconfirmed with static code analysis tool developers.