Biblio

List
Filter

Found 152 results

Filters: Keyword is software engineering [Clear All Filters]

2021-06-24

Angermeir, Florian, Voggenreiter, Markus, Moyón, Fabiola, Mendez, Daniel. 2021. Enterprise-Driven Open Source Software: A Case Study on Security Automation. 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). :278—287.

Agile and DevOps are widely adopted by the industry. Hence, integrating security activities with industrial practices, such as continuous integration (CI) pipelines, is necessary to detect security flaws and adhere to regulators’ demands early. In this paper, we analyze automated security activities in CI pipelines of enterprise-driven open source software (OSS). This shall allow us, in the long-run, to better understand the extent to which security activities are (or should be) part of automated pipelines. In particular, we mine publicly available OSS repositories and survey a sample of project maintainers to better understand the role that security activities and their related tools play in their CI pipelines. To increase transparency and allow other researchers to replicate our study (and to take different perspectives), we further disclose our research artefacts.Our results indicate that security activities in enterprise-driven OSS projects are scarce and protection coverage is rather low. Only 6.83% of the analyzed 8,243 projects apply security automation in their CI pipelines, even though maintainers consider security to be rather important. This alerts industry to keep the focus on vulnerabilities of 3rd Party software and it opens space for other improvements of practice which we outline in this manuscript.

Moran, Kevin, Palacio, David N., Bernal-Cárdenas, Carlos, McCrystal, Daniel, Poshyvanyk, Denys, Shenefiel, Chris, Johnson, Jeff. 2020. Improving the Effectiveness of Traceability Link Recovery using Hierarchical Bayesian Networks. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :873—885.

Traceability is a fundamental component of the modern software development process that helps to ensure properly functioning, secure programs. Due to the high cost of manually establishing trace links, researchers have developed automated approaches that draw relationships between pairs of textual software artifacts using similarity measures. However, the effectiveness of such techniques are often limited as they only utilize a single measure of artifact similarity and cannot simultaneously model (implicit and explicit) relationships across groups of diverse development artifacts. In this paper, we illustrate how these limitations can be overcome through the use of a tailored probabilistic model. To this end, we design and implement a HierarchiCal PrObabilistic Model for SoftwarE Traceability (Comet) that is able to infer candidate trace links. Comet is capable of modeling relationships between artifacts by combining the complementary observational prowess of multiple measures of textual similarity. Additionally, our model can holistically incorporate information from a diverse set of sources, including developer feedback and transitive (often implicit) relationships among groups of software artifacts, to improve inference accuracy. We conduct a comprehensive empirical evaluation of Comet that illustrates an improvement over a set of optimally configured baselines of ≈14% in the best case and ≈5% across all subjects in terms of average precision. The comparative effectiveness of Comet in practice, where optimal configuration is typically not possible, is likely to be higher. Finally, we illustrate Comet's potential for practical applicability in a survey with developers from Cisco Systems who used a prototype Comet Jenkins plugin.

2021-06-01

Ghosal, Sandip, Shyamasundar, R. K.. 2020. A Generalized Notion of Non-interference for Flow Security of Sequential and Concurrent Programs. 2020 27th Asia-Pacific Software Engineering Conference (APSEC). :51–60.

For the last two decades, a wide spectrum of interpretations of non-interference11The notion of non-interference discussed in this paper enforces flow security in a program and is different from the concept of non-interference used for establishing functional correctness of parallel programs [1] have been used in the security analysis of programs, starting with the notion proposed by Goguen & Meseguer along with arguments of its impact on security practice. While the majority of works deal with sequential programs, several researchers have extended the notion of non-interference to enforce information flow-security in non-deterministic and concurrent programs. Major efforts of generalizations are based on (i) considering input sequences as a basic unit for input/output with semantic interpretation on a two-point information flow lattice, or (ii) typing of expressions as values for reading and writing, or (iii) typing of expressions along with its limited effects. Such approaches have limited compositionality and, thus, pose issues while extending these notions for concurrent programs. Further, in a general multi-point lattice, the notion of a public observer (or attacker) is not unique as it depends on the level of the attacker and the one attacked. In this paper, we first propose a compositional variant of non-interference for sequential systems that follow a general information flow lattice and place it in the context of earlier definitions of non-interference. We show that such an extension leads to the capturing of violations of information flow security in a concrete setting of a sequential language. Finally, we generalize non-interference for concurrent programs and illustrate its use for security analysis, particularly in the cases where information is transmitted through shared variables.

2021-05-03

Paulsen, Brandon, Wang, Jingbo, Wang, Jiawei, Wang, Chao. 2020. NEURODIFF: Scalable Differential Verification of Neural Networks using Fine-Grained Approximation. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :784–796.

As neural networks make their way into safety-critical systems, where misbehavior can lead to catastrophes, there is a growing interest in certifying the equivalence of two structurally similar neural networks - a problem known as differential verification. For example, compression techniques are often used in practice for deploying trained neural networks on computationally- and energy-constrained devices, which raises the question of how faithfully the compressed network mimics the original network. Unfortunately, existing methods either focus on verifying a single network or rely on loose approximations to prove the equivalence of two networks. Due to overly conservative approximation, differential verification lacks scalability in terms of both accuracy and computational cost. To overcome these problems, we propose NEURODIFF, a symbolic and fine-grained approximation technique that drastically increases the accuracy of differential verification on feed-forward ReLU networks while achieving many orders-of-magnitude speedup. NEURODIFF has two key contributions. The first one is new convex approximations that more accurately bound the difference of two networks under all possible inputs. The second one is judicious use of symbolic variables to represent neurons whose difference bounds have accumulated significant error. We find that these two techniques are complementary, i.e., when combined, the benefit is greater than the sum of their individual benefits. We have evaluated NEURODIFF on a variety of differential verification tasks. Our results show that NEURODIFF is up to 1000X faster and 5X more accurate than the state-of-the-art tool.

2021-04-09

Smith, B., Feather, M. S., Huntsberger, T., Bocchino, R.. 2020. Software Assurance of Autonomous Spacecraft Control. 2020 Annual Reliability and Maintainability Symposium (RAMS). :1—7.

Summary & Conclusions: The work described addresses assurance of a planning and execution software system being added to an in-orbit CubeSat to demonstrate autonomous control of that spacecraft. Our focus was on how to develop assurance of the correct operation of the added software in its operational context, our approach to which was to use an assurance case to guide and organize the information involved. The relatively manageable magnitude of the CubeSat and its autonomy demonstration experiment made it plausible to try out our assurance approach in a relatively short timeframe. Additionally, the time was ripe to inject useful assurance results into the ongoing development and testing of the autonomy demonstration. In conducting this, we sought to answer several questions about our assurance approach. The questions, and the conclusions we reached, are as follows: 1. Question: Would our approach to assurance apply to the introduction of a planning and execution software into an existing system? Conclusion: Yes. The use of an assurance case helped focus our attention on the more challenging aspects, notably the interactions between the added software and the existing software system into which it was being introduced. This guided us to choose a hazard analysis method specifically for software interactions. In addition, we were able to automate generation of assurance case elements from the hazard analysis' tabular representation. 2. Question: Would our methods prove understandable to the software engineers tasked with integrating the software into the CubeSat's existing system? Conclusion: Somewhat. In interim discussions with the software engineers we found the assurance case style, of decomposing an argument into smaller pieces, to be useful and understandable to organize discussion. Ultimately however we did not persuade them to adopt assurance cases as the means to present review information. We attribute this to reluctance to deviate from JPL's tried and true style of holding reviews. For the CubeSat project as a whole, hosting an autonomy demonstration was already a novelty. Combining this with presentation of review information via an assurance case, with which our reviewers would be unaccustomed, would have exacerbated the unfamiliarity. 3. Question: Would conducting our methods prove to be compatible with the (limited) time available of the software engineers? Conclusion: Yes. We used a series of six brief meetings (approximately one hour each) with the development team to first identify the interactions as the area on which to focus, and to then perform the hazard analysis on those interactions. We used the meetings to confirm, or correct as necessary, our understanding of the software system and the spacecraft context. Between meetings we studied the existing software documentation, did preliminary analyses by ourselves, and documented the results in a concise form suitable for discussion with the team. 4. Question: Would our methods yield useful results to the software engineers? Conclusion: Yes. The hazard analysis systematically confirmed existing hazards' mitigations, and drew attention to a mitigation whose implementation needed particular care. In some cases, the analysis identified potential hazards - and what to do about them - should some of the more sophisticated capabilities of the planning and execution software be used. These capabilities, not exercised in the initial experiments on the CubeSat, may be used in future experiments. We remain involved with the developers as they prepare for these future experiments, so our analysis results will be of benefit as these proceed.

2021-04-08

Mundie, D. A., Perl, S., Huth, C. L.. 2013. Toward an Ontology for Insider Threat Research: Varieties of Insider Threat Definitions. 2013 Third Workshop on Socio-Technical Aspects in Security and Trust. :26—36.

The lack of standardization of the terms insider and insider threat has been a noted problem for researchers in the insider threat field. This paper describes the investigation of 42 different definitions of the terms insider and insider threat, with the goal of better understanding the current conceptual model of insider threat and facilitating communication in the research community.

Claycomb, W. R., Huth, C. L., Phillips, B., Flynn, L., McIntire, D.. 2013. Identifying indicators of insider threats: Insider IT sabotage. 2013 47th International Carnahan Conference on Security Technology (ICCST). :1—5.

This paper describes results of a study seeking to identify observable events related to insider sabotage. We collected information from actual insider threat cases, created chronological timelines of the incidents, identified key points in each timeline such as when attack planning began, measured the time between key events, and looked for specific observable events or patterns that insiders held in common that may indicate insider sabotage is imminent or likely. Such indicators could be used by security experts to potentially identify malicious activity at or before the time of attack. Our process included critical steps such as identifying the point of damage to the organization as well as any malicious events prior to zero hour that enabled the attack but did not immediately cause harm. We found that nearly 71% of the cases we studied had either no observable malicious action prior to attack, or had one that occurred less than one day prior to attack. Most of the events observed prior to attack were behavioral, not technical, especially those occurring earlier in the case timelines. Of the observed technical events prior to attack, nearly one third involved installation of software onto the victim organizations IT systems.

2021-03-29

Grundy, J.. 2020. Human-centric Software Engineering for Next Generation Cloud- and Edge-based Smart Living Applications. 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID). :1—10.

Humans are a key part of software development, including customers, designers, coders, testers and end users. In this keynote talk I explain why incorporating human-centric issues into software engineering for next-generation applications is critical. I use several examples from our recent and current work on handling human-centric issues when engineering various `smart living' cloud- and edge-based software systems. This includes using human-centric, domain-specific visual models for non-technical experts to specify and generate data analysis applications; personality impact on aspects of software activities; incorporating end user emotions into software requirements engineering for smart homes; incorporating human usage patterns into emerging edge computing applications; visualising smart city-related data; reporting diverse software usability defects; and human-centric security and privacy requirements for smart living systems. I assess the usefulness of these approaches, highlight some outstanding research challenges, and briefly discuss our current work on new human-centric approaches to software engineering for smart living applications.

2021-03-22

Sai, C. C., Prakash, C. S., Jose, J., Mana, S. C., Samhitha, B. K.. 2020. Analysing Android App Privacy Using Classification Algorithm. 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI)(48184). :551–555.

The interface permits the client to scan for a subjective utility on the Play Store; the authorizations posting and the protection arrangement are then routinely recovered, on all events imaginable. The client has then the capability of choosing an interesting authorization, and a posting of pertinent sentences are separated with the guide of the privateer's inclusion and introduced to them, alongside a right depiction of the consent itself. Such an interface allows the client to rapidly assess the security-related dangers of an Android application, by utilizing featuring the pertinent segments of the privateer's inclusion and by introducing helpful data about shrewd authorizations. A novel procedure is proposed for the assessment of privateer's protection approaches with regards to Android applications. The gadget actualized widely facilitates the way toward understanding the security ramifications of placing in 1/3 birthday celebration applications and it has just been checked in a situation to feature troubling examples of uses. The gadget is created in light of expandability, and correspondingly inclines in the strategy can without trouble be worked in to broaden the unwavering quality and adequacy. Likewise, if your application handles non-open or delicate individual information, it would be ideal if you also allude to the extra necessities in the “Individual and Sensitive Information” territory underneath. These Google Play necessities are notwithstanding any prerequisites endorsed by method for material security or data assurance laws. It has been proposed that, an individual who needs to perform the establishment and utilize any 1/3 festival application doesn't perceive the significance and which methods for the consents mentioned by method for an application, and along these lines sincerely gives all the authorizations as a final product of which unsafe applications furthermore get set up and work their malevolent leisure activity in the rear of the scene.

Kellogg, M., Schäf, M., Tasiran, S., Ernst, M. D.. 2020. Continuous Compliance. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :511–523.

Vendors who wish to provide software or services to large corporations and governments must often obtain numerous certificates of compliance. Each certificate asserts that the software satisfies a compliance regime, like SOC or the PCI DSS, to protect the privacy and security of sensitive data. The industry standard for obtaining a compliance certificate is an auditor manually auditing source code. This approach is expensive, error-prone, partial, and prone to regressions. We propose continuous compliance to guarantee that the codebase stays compliant on each code change using lightweight verification tools. Continuous compliance increases assurance and reduces costs. Continuous compliance is applicable to any source-code compliance requirement. To illustrate our approach, we built verification tools for five common audit controls related to data security: cryptographically unsafe algorithms must not be used, keys must be at least 256 bits long, credentials must not be hard-coded into program text, HTTPS must always be used instead of HTTP, and cloud data stores must not be world-readable. We evaluated our approach in three ways. (1) We applied our tools to over 5 million lines of open-source software. (2) We compared our tools to other publicly-available tools for detecting misuses of encryption on a previously-published benchmark, finding that only ours are suitable for continuous compliance. (3) We deployed a continuous compliance process at AWS, a large cloud-services company: we integrated verification tools into the compliance process (including auditors accepting their output as evidence) and ran them on over 68 million lines of code. Our tools and the data for the former two evaluations are publicly available.

2021-03-18

Baolin, X., Minhuan, Z.. 2020. A Solution of Text Based CAPTCHA without Network Flow Consumption. 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS). :395—399.

With the widespread application of distributed information processing, information processing security issues have become one of the important research topics; CAPTCHA technology is often used as the first security barrier for distributed information processing and it prevents the client malicious programs to attack the server. The experiment proves that the existing “request / response” mode of CAPTCHA has great security risks. “The text-based CAPTCHA solution without network flow consumption” proposed in this paper avoids the “request / response” mode and the verification logic of the text-based CAPTCHA is migrated to the client in this solution, which fundamentally cuts off the client's attack facing to the server during the verification of the CAPTCHA and it is a high-security text-based CAPTCHA solution without network flow consumption.

2021-03-15

Staicu, C.-A., Torp, M. T., Schäfer, M., Møller, A., Pradel, M.. 2020. Extracting Taint Specifications for JavaScript Libraries. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :198—209.

Modern JavaScript applications extensively depend on third-party libraries. Especially for the Node.js platform, vulnerabilities can have severe consequences to the security of applications, resulting in, e.g., cross-site scripting and command injection attacks. Existing static analysis tools that have been developed to automatically detect such issues are either too coarse-grained, looking only at package dependency structure while ignoring dataflow, or rely on manually written taint specifications for the most popular libraries to ensure analysis scalability. In this work, we propose a technique for automatically extracting taint specifications for JavaScript libraries, based on a dynamic analysis that leverages the existing test suites of the libraries and their available clients in the npm repository. Due to the dynamic nature of JavaScript, mapping observations from dynamic analysis to taint specifications that fit into a static analysis is non-trivial. Our main insight is that this challenge can be addressed by a combination of an access path mechanism that identifies entry and exit points, and the use of membranes around the libraries of interest. We show that our approach is effective at inferring useful taint specifications at scale. Our prototype tool automatically extracts 146 additional taint sinks and 7 840 propagation summaries spanning 1 393 npm modules. By integrating the extracted specifications into a commercial, state-of-the-art static analysis, 136 new alerts are produced, many of which correspond to likely security vulnerabilities. Moreover, many important specifications that were originally manually written are among the ones that our tool can now extract automatically.

Danilova, A., Naiakshina, A., Smith, M.. 2020. One Size Does Not Fit All: A Grounded Theory and Online Survey Study of Developer Preferences for Security Warning Types. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :136–148.

A wide range of tools exist to assist developers in creating secure software. Many of these tools, such as static analysis engines or security checkers included in compilers, use warnings to communicate security issues to developers. The effectiveness of these tools relies on developers heeding these warnings, and there are many ways in which these warnings could be displayed. Johnson et al. [46] conducted qualitative research and found that warning presentation and integration are main issues. We built on Johnson et al.'s work and examined what developers want from security warnings, including what form they should take and how they should integrate into their workflow and work context. To this end, we conducted a Grounded Theory study with 14 professional software developers and 12 computer science students as well as a focus group with 7 academic researchers to gather qualitative insights. To back up the theory developed from the qualitative research, we ran a quantitative survey with 50 professional software developers. Our results show that there is significant heterogeneity amongst developers and that no one warning type is preferred over all others. The context in which the warnings are shown is also highly relevant, indicating that it is likely to be beneficial if IDEs and other development tools become more flexible in their warning interactions with developers. Based on our findings, we provide concrete recommendations for both future research as well as how IDEs and other security tools can improve their interaction with developers.

Hwang, S., Ryu, S.. 2020. Gap between Theory and Practice: An Empirical Study of Security Patches in Solidity. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :542–553.

Ethereum, one of the most popular blockchain platforms, provides financial transactions like payments and auctions through smart contracts. Due to the immense interest in smart contracts in academia, the research community of smart contract security has made a significant improvement recently. Researchers have reported various security vulnerabilities in smart contracts, and developed static analysis tools and verification frameworks to detect them. However, it is unclear whether such great efforts from academia has indeed enhanced the security of smart contracts in reality. To understand the security level of smart contracts in the wild, we empirically studied 55,046 real-world Ethereum smart contracts written in Solidity, the most popular programming language used by Ethereum smart contract developers. We first examined how many well-known vulnerabilities the Solidity compiler has patched, and how frequently the Solidity team publishes compiler releases. Unfortunately, we observed that many known vulnerabilities are not yet patched, and some patches are not even sufficient to avoid their target vulnerabilities. Subsequently, we investigated whether smart contract developers use the most recent compiler with vulnerabilities patched. We reported that developers of more than 98% of real-world Solidity contracts still use older compilers without vulnerability patches, and more than 25% of the contracts are potentially vulnerable due to the missing security patches. To understand actual impacts of the missing patches, we manually investigated potentially vulnerable contracts that are detected by our static analyzer and identified common mistakes by Solidity developers, which may cause serious security issues such as financial loss. We detected hundreds of vulnerable contracts and about one fourth of the vulnerable contracts are used by thousands of people. We recommend the Solidity team to make patches that resolve known vulnerabilities correctly, and developers to use the latest Solidity compiler to avoid missing security patches.

Brauckmann, A., Goens, A., Castrillon, J.. 2020. ComPy-Learn: A toolbox for exploring machine learning representations for compilers. 2020 Forum for Specification and Design Languages (FDL). :1–4.

Deep Learning methods have not only shown to improve software performance in compiler heuristics, but also e.g. to improve security in vulnerability prediction or to boost developer productivity in software engineering tools. A key to the success of such methods across these use cases is the expressiveness of the representation used to abstract from the program code. Recent work has shown that different such representations have unique advantages in terms of performance. However, determining the best-performing one for a given task is often not obvious and requires empirical evaluation. Therefore, we present ComPy-Learn, a toolbox for conveniently defining, extracting, and exploring representations of program code. With syntax-level language information from the Clang compiler frontend and low-level information from the LLVM compiler backend, the tool supports the construction of linear and graph representations and enables an efficient search for the best-performing representation and model for tasks on program code.

2021-03-09

Zhou, B., He, J., Tan, M.. 2020. A Two-stage P2P Botnet Detection Method Based on Statistical Features. 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS). :497—502.

P2P botnet has become one of the most serious threats to today's network security. It can be used to launch kinds of malicious activities, ranging from spamming to distributed denial of service attack. However, the detection of P2P botnet is always challenging because of its decentralized architecture. In this paper, we propose a two-stage P2P botnet detection method which only relies on several traffic statistical features. This method first detects P2P hosts based on three statistical features, and then distinguishes P2P bots from benign P2P hosts by means of another two statistical features. Experimental evaluations on real-world traffic datasets shows that our method is able to detect hidden P2P bots with a detection accuracy of 99.7% and a false positive rate of only 0.3% within 5 minutes.

Mashhadi, M. J., Hemmati, H.. 2020. Hybrid Deep Neural Networks to Infer State Models of Black-Box Systems. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :299–311.

Inferring behavior model of a running software system is quite useful for several automated software engineering tasks, such as program comprehension, anomaly detection, and testing. Most existing dynamic model inference techniques are white-box, i.e., they require source code to be instrumented to get run-time traces. However, in many systems, instrumenting the entire source code is not possible (e.g., when using black-box third-party libraries) or might be very costly. Unfortunately, most black-box techniques that detect states over time are either univariate, or make assumptions on the data distribution, or have limited power for learning over a long period of past behavior. To overcome the above issues, in this paper, we propose a hybrid deep neural network that accepts as input a set of time series, one per input/output signal of the system, and applies a set of convolutional and recurrent layers to learn the non-linear correlations between signals and the patterns, over time. We have applied our approach on a real UAV auto-pilot solution from our industry partner with half a million lines of C code. We ran 888 random recent system-level test cases and inferred states, over time. Our comparison with several traditional time series change point detection techniques showed that our approach improves their performance by up to 102%, in terms of finding state change points, measured by F1 score. We also showed that our state classification algorithm provides on average 90.45% F1 score, which improves traditional classification algorithms by up to 17%.

2021-03-04

Yangchun, Z., Zhao, Y., Yang, J.. 2020. New Virus Infection Technology and Its Detection. 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS). :388—394.

Computer virus detection technology is an important basic security technology in the information age. The current detection technology has a high success rate for the detection of known viruses and known virus infection technologies, but the development of detection technology often lags behind the development of computer virus infection technology. Under Windows system, there are many kinds of file viruses, which change rapidly, and pose a continuous security threat to users. The research of new file virus infection technology can provide help for the development of virus detection technology. In this paper, a new virus infection technology based on dynamic binary analysis is proposed to execute file virus infection. Using the new virus infection technology, the infected executable file can be detected in the experimental environment. At the same time, this paper discusses the detection method of new virus infection technology. We hope to provide help for the development of virus detection technology from the perspective of virus design.

2021-02-08

Chesnokov, N. I., Korochentsev, D. A., Cherckesova, L. V., Safaryan, O. A., Chumakov, V. E., Pilipenko, I. A.. 2020. Software Development of Electronic Digital Signature Generation at Institution Electronic Document Circulation. 2020 IEEE East-West Design Test Symposium (EWDTS). :1–5.

the purpose of this paper is investigation of existing approaches to formation of electronic digital signatures, as well as the possibility of software developing for electronic signature generation at electronic document circulation of institution. The article considers and analyzes the existing algorithms for generating and processing electronic signatures. Authors propose the model for documented information exchanging in institution, including cryptographic module and secure key storage, blockchain storage of electronic signatures, central web-server and web-interface. Examples of the developed software are demonstrated, and recommendations are given for its implementation, integration and using in different institutions.

Wang, Y., Wen, M., Liu, Y., Wang, Y., Li, Z., Wang, C., Yu, H., Cheung, S.-C., Xu, C., Zhu, Z.. 2020. Watchman: Monitoring Dependency Conflicts for Python Library Ecosystem. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :125–135.

The PyPI ecosystem has indexed millions of Python libraries to allow developers to automatically download and install dependencies of their projects based on the specified version constraints. Despite the convenience brought by automation, version constraints in Python projects can easily conflict, resulting in build failures. We refer to such conflicts as Dependency Conflict (DC) issues. Although DC issues are common in Python projects, developers lack tool support to gain a comprehensive knowledge for diagnosing the root causes of these issues. In this paper, we conducted an empirical study on 235 real-world DC issues. We studied the manifestation patterns and fixing strategies of these issues and found several key factors that can lead to DC issues and their regressions. Based on our findings, we designed and implemented Watchman, a technique to continuously monitor dependency conflicts for the PyPI ecosystem. In our evaluation, Watchman analyzed PyPI snapshots between 11 Jul 2019 and 16 Aug 2019, and found 117 potential DC issues. We reported these issues to the developers of the corresponding projects. So far, 63 issues have been confirmed, 38 of which have been quickly fixed by applying our suggested patches.

2021-02-01

Calhoun, C. S., Reinhart, J., Alarcon, G. A., Capiola, A.. 2020. Establishing Trust in Binary Analysis in Software Development and Applications. 2020 IEEE International Conference on Human-Machine Systems (ICHMS). :1–4.

The current exploratory study examined software programmer trust in binary analysis techniques used to evaluate and understand binary code components. Experienced software developers participated in knowledge elicitations to identify factors affecting trust in tools and methods used for understanding binary code behavior and minimizing potential security vulnerabilities. Developer perceptions of trust in those tools to assess implementation risk in binary components were captured across a variety of application contexts. The software developers reported source security and vulnerability reports provided the best insight and awareness of potential issues or shortcomings in binary code. Further, applications where the potential impact to systems and data loss is high require relying on more than one type of analysis to ensure the binary component is sound. The findings suggest binary analysis is viable for identifying issues and potential vulnerabilities as part of a comprehensive solution for understanding binary code behavior and security vulnerabilities, but relying simply on binary analysis tools and binary release metadata appears insufficient to ensure a secure solution.

2021-01-20

Hazhirpasand, M., Ghafari, M., Nierstrasz, O.. 2020. CryptoExplorer: An Interactive Web Platform Supporting Secure Use of Cryptography APIs. 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). :632—636.

Research has shown that cryptographic APIs are hard to use. Consequently, developers resort to using code examples available in online information sources that are often not secure. We have developed a web platform, named CryptoExplorer, stocked with numerous real-world secure and insecure examples that developers can explore to learn how to use cryptographic APIs properly. This platform currently provides 3 263 secure uses, and 5 897 insecure uses of Java Cryptography Architecture mined from 2 324 Java projects on GitHub. A preliminary study shows that CryptoExplorer provides developers with secure crypto API use examples instantly, developers can save time compared to searching on the internet for such examples, and they learn to avoid using certain algorithms in APIs by studying misused API examples. We have a pipeline to regularly mine more projects, and, on request, we offer our dataset to researchers.

Mindermann, K., Wagner, S.. 2020. Fluid Intelligence Doesn't Matter! Effects of Code Examples on the Usability of Crypto APIs. 2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). :306—307.

Context : Programmers frequently look for the code of previously solved problems that they can adapt for their own problem. Despite existing example code on the web, on sites like Stack Overflow, cryptographic Application Programming Interfaces (APIs) are commonly misused. There is little known about what makes examples helpful for developers in using crypto APIs. Analogical problem solving is a psychological theory that investigates how people use known solutions to solve new problems. There is evidence that the capacity to reason and solve novel problems a.k.a Fluid Intelligence (Gf) and structurally and procedurally similar solutions support problem solving. Aim: Our goal is to understand whether similarity and Gf also have an effect in the context of using cryptographic APIs with the help of code examples. Method : We conducted a controlled experiment with 76 student participants developing with or without procedurally similar examples, one of two Java crypto libraries and measured the Gf of the participants as well as the effect on usability (effectiveness, efficiency, satisfaction) and security bugs. Results: We observed a strong effect of code examples with a high procedural similarity on all dependent variables. Fluid intelligence Gf had no effect. It also made no difference which library the participants used. Conclusions: Example code must be more highly similar to a concrete solution, not very abstract and generic to have a positive effect in a development task.

2021-01-11

Wang, J., Wang, A.. 2020. An Improved Collaborative Filtering Recommendation Algorithm Based on Differential Privacy. 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS). :310–315.

In this paper, differential privacy protection method is applied to matrix factorization method that used to solve the recommendation problem. For centralized recommendation scenarios, a collaborative filtering recommendation model based on matrix factorization is established, and a matrix factorization mechanism satisfying ε-differential privacy is proposed. Firstly, the potential characteristic matrix of users and projects is constructed. Secondly, noise is added to the matrix by the method of target disturbance, which satisfies the differential privacy constraint, then the noise matrix factorization model is obtained. The parameters of the model are obtained by the stochastic gradient descent algorithm. Finally, the differential privacy matrix factorization model is used for score prediction. The effectiveness of the algorithm is evaluated on the public datasets including Movielens and Netflix. The experimental results show that compared with the existing typical recommendation methods, the new matrix factorization method with privacy protection can recommend within a certain range of recommendation accuracy loss while protecting the users' privacy information.

2020-12-07

Qian, Y.. 2019. Research on Trusted Authentication Model and Mechanism of Data Fusion. 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS). :479–482.

Firstly, this paper analyses the technical foundation of single sign-on solution of unified authentication platform, and analyses the advantages and disadvantages of each solution. Secondly, from the point of view of software engineering, such as function requirement, performance requirement, development mode, architecture scheme, technology development framework and system configuration environment of the unified authentication platform, the unified authentication platform is analyzed and designed, and the database design and system design framework of the system are put forward according to the system requirements. Thirdly, the idea and technology of unified authentication platform based on JA-SIG CAS are discussed, and the design and implementation of each module of unified authentication platform based on JA-SIG CAS are analyzed, which has been applied in ship cluster platform.