Biblio

Filters: Keyword is Cloning
2023-04-28
Li, Zongjie, Ma, Pingchuan, Wang, Huaijin, Wang, Shuai, Tang, Qiyi, Nie, Sen, Wu, Shi.  2022.  Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings. 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). :2253–2265.
Neural program embeddings have demonstrated considerable promise in a range of program analysis tasks, including clone identification, program repair, code completion, and program synthesis. However, most existing methods generate neural program embeddings directly from the program source codes, by learning from features such as tokens, abstract syntax trees, and control flow graphs. This paper takes a fresh look at how to improve program embeddings by leveraging compiler intermediate representation (IR). We first demonstrate simple yet highly effective methods for enhancing embedding quality by training embedding models alongside source code and LLVM IR generated by default optimization levels (e.g., -O2). We then introduce IRGEN, a framework based on genetic algorithms (GA), to identify (near-)optimal sequences of optimization flags that can significantly improve embedding quality. We use IRGEN to find optimal sequences of LLVM optimization flags by performing GA on source code datasets. We then extend a popular code embedding model, CodeCMR, by adding a new objective based on triplet loss to enable a joint learning over source code and LLVM IR. We benchmark the quality of embedding using a representative downstream application, code clone detection. When CodeCMR was trained with source code and LLVM IRs optimized by findings of IRGEN, the embedding quality was significantly improved, outperforming the state-of-the-art model, CodeBERT, which was trained only with source code. Our augmented CodeCMR also outperformed CodeCMR trained over source code and IR optimized with default optimization levels. We investigate the properties of optimization flags that increase embedding quality, demonstrate IRGEN's generalization in boosting other embedding models, and establish IRGEN's use in settings with extremely limited training data. 
Our research and findings demonstrate that a straightforward addition to modern neural code embedding models can provide a highly effective enhancement.
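The abstract above mentions an added objective based on triplet loss. As a minimal sketch only, assuming the standard margin-based formulation over Euclidean distance (the function name, the margin value, and the toy vectors are illustrative assumptions, not details taken from the paper), the idea can be written as:

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor, positive, negative are equal-length sequences of floats.
    The margin of 1.0 is an assumed default, not a value from the paper.
    """
    d_pos = math.dist(anchor, positive)  # distance to the matching sample
    d_neg = math.dist(anchor, negative)  # distance to the non-matching sample
    return max(0.0, d_pos - d_neg + margin)


# Toy usage: the anchor could stand for a source-code embedding, the positive
# for the embedding of IR from the same function, the negative for IR from
# another function (a hypothetical pairing for illustration).
loss = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])
```

The loss is zero once the negative sample is farther from the anchor than the positive sample by at least the margin, which is what pushes matching pairs together in the embedding space.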
2023-04-14
Salcedo, Mathew David, Abid, Mehdi, Kim, Yoohwan, Jo, Ju-Yeon.  2022.  Evil-Twin Browsers: Using Open-Source Code to Clone Browsers for Malicious Purposes. 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC). :0776–0784.
Browsers are one of the most widely used types of software around the world. This prevalence makes browsers a prime target for cyberattacks. To mitigate these threats, users can practice safe browsing habits and take advantage of the security features available to browsers. These protections, however, could be severely crippled if the browser itself were malicious. Presented in this paper is the concept of the evil-twin browser (ETB), a clone of a legitimate browser that looks and behaves identically to the original browser, but discreetly performs other tasks that harm a user's security. To better understand the concept of the evil-twin browser, a prototype ETB named ChroNe was developed. The creation and installation process of ChroNe is discussed in this paper. This paper also explores the motivation behind creating such a browser, examines existing relevant work, inspects the open-source codebase Chromium that assisted in ChroNe's development, and discusses relevant topics like ways to deliver an ETB, the capabilities of an ETB, and possible ways to defend against ETBs.
2022-03-01
Huang, Shanshi, Peng, Xiaochen, Jiang, Hongwu, Luo, Yandong, Yu, Shimeng.  2021.  Exploiting Process Variations to Protect Machine Learning Inference Engine from Chip Cloning. 2021 IEEE International Symposium on Circuits and Systems (ISCAS). :1–5.
Machine learning inference engine is of great interest to smart edge computing. Compute-in-memory (CIM) architecture has shown significant improvements in throughput and energy efficiency for hardware acceleration. Emerging nonvolatile memory (eNVM) technologies offer great potentials for instant on and off by dynamic power gating. Inference engine is typically pre-trained by the cloud and then being deployed to the field. There is a new security concern on cloning of the weights stored on eNVM-based CIM chip. In this paper, we propose a countermeasure to the weight cloning attack by exploiting the process variations of the periphery circuitry. In particular, we use weight fine-tuning to compensate the analog-to-digital converter (ADC) offset for a specific chip instance while inducing significant accuracy drop for cloned chip instances. We evaluate our proposed scheme on a CIFAR-10 classification task using a VGG-8 network. Our results show that with precisely chosen transistor size on the employed SAR-ADC, we could maintain 88% to 90% accuracy for the fine-tuned chip while the same set of weights cloned on other chips will only have 20% to 40% accuracy on average. The weight fine-tune could be completed within one epoch of 250 iterations. On average only 0.02%, 0.025%, 0.142% of cells are updated for 2-bit, 4-bit, 8-bit weight precisions in each iteration.
2021-11-29
Taghanaki, Saeid Rafiei, Arzandeh, Shohreh Behnam, Bohlooli, Ali.  2021.  A Decentralized Method for Detecting Clone ID Attacks on the Internet of Things. 2021 5th International Conference on Internet of Things and Applications (IoT). :1–6.
One of the attacks in the RPL protocol is the Clone ID attack, in which the attacker clones a node's ID in the network. In this research, a Clone ID detection system is designed for the Internet of Things (IoT), implemented in the Contiki operating system, and evaluated using the Cooja emulator. Our evaluation shows that the proposed method has desirable performance in terms of energy consumption overhead, true positive rate, and detection speed. The overhead cost of the proposed method is low enough that it can be deployed in limited-resource nodes. The proposed method in each node has two phases, which are the steps of gathering information and attack detection. In the proposed scheme, each node detects this type of attack using control packets received from its neighbors and their information such as IP, rank, Path ETX, and RSSI, as well as the use of a routing table. The design of this system will contribute to the security of the IoT network.
2021-08-02
Na, Yoonjong, Joo, Yejin, Lee, Heejo, Zhao, Xiangchen, Sajan, Kurian Karyakulam, Ramachandran, Gowri, Krishnamachari, Bhaskar.  2020.  Enhancing the Reliability of IoT Data Marketplaces through Security Validation of IoT Devices. 2020 16th International Conference on Distributed Computing in Sensor Systems (DCOSS). :265–272.
IoT data marketplaces are being developed to help cities and communities create large scale IoT applications. Such data marketplaces let the IoT device owners sell their data to the application developers. Following this application development model, the application developers need not deploy their own IoT devices when developing IoT applications; instead, they can buy data from a data marketplace. In a marketplace-based IoT application, the application developers are making critical business and operation decisions using the data produced by seller's IoT devices. Under these circumstances, it is crucial to verify and validate the security of IoT devices. In this paper, we assess the security of IoT data marketplaces. In particular, we discuss what kind of vulnerabilities exist in IoT data marketplaces using the well-known STRIDE model, and present a security assessment and certification framework for IoT data marketplaces to help the device owners to examine the security vulnerabilities of their devices. Most importantly, our solution certifies the IoT devices when they connect to the data marketplace, which helps the application developers to make an informed decision when buying and consuming data from a data marketplace. To demonstrate the effectiveness of the proposed approach, we have developed a proof-of-concept using I3 (Intelligent IoT Integrator), which is an open-source IoT data marketplace developed at the University of Southern California, and IoTcube, which is a vulnerability detection toolkit developed by researchers at Korea University. Through this work, we show that it is possible to increase the reliability of an IoT data marketplace while not damaging the convenience of the users.
2021-01-28
Kariyappa, S., Qureshi, M. K..  2020.  Defending Against Model Stealing Attacks With Adaptive Misinformation. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :767–775.

Deep Neural Networks (DNNs) are susceptible to model stealing attacks, which allows a data-limited adversary with no knowledge of the training dataset to clone the functionality of a target model, just by using black-box query access. Such attacks are typically carried out by querying the target model using inputs that are synthetically generated or sampled from a surrogate dataset to construct a labeled dataset. The adversary can use this labeled dataset to train a clone model, which achieves a classification accuracy comparable to that of the target model. We propose "Adaptive Misinformation" to defend against such model stealing attacks. We identify that all existing model stealing attacks invariably query the target model with Out-Of-Distribution (OOD) inputs. By selectively sending incorrect predictions for OOD queries, our defense substantially degrades the accuracy of the attacker's clone model (by up to 40%), while minimally impacting the accuracy (< 0.5%) for benign users. Compared to existing defenses, our defense has a significantly better security vs accuracy trade-off and incurs minimal computational overhead.

2020-11-17
Jaiswal, M., Malik, Y., Jaafar, F..  2018.  Android gaming malware detection using system call analysis. 2018 6th International Symposium on Digital Forensic and Security (ISDFS). :1–5.
Android operating systems have become a prime target for attackers as most of the market is currently dominated by Android users. The situation gets worse when users unknowingly download or sideload cloning applications, especially gaming applications that look like benign games. In this paper, we present a dynamic Android gaming malware detection system based on system call analysis to classify malicious and legitimate games. We performed the dynamic system call analysis on normal and malicious gaming applications while applications are in execution state. Our analysis reveals the similarities and differences between benign and malware game system calls and shows how dynamically analyzing the behavior of malicious activity through system calls during runtime makes it easier and more effective to detect malicious applications. Experimental analysis and results show the efficiency and effectiveness of our approach.
2020-09-21
Osman, Amr, Bruckner, Pascal, Salah, Hani, Fitzek, Frank H. P., Strufe, Thorsten, Fischer, Mathias.  2019.  Sandnet: Towards High Quality of Deception in Container-Based Microservice Architectures. ICC 2019 - 2019 IEEE International Conference on Communications (ICC). :1–7.
Responding to network security incidents requires interference with ongoing attacks to restore the security of services running on production systems. This approach prevents damage, but drastically impedes the collection of threat intelligence and the analysis of vulnerabilities, exploits, and attack strategies. We propose the live confinement of suspicious microservices into a sandbox network that allows to monitor and analyze ongoing attacks under quarantine and that retains an image of the vulnerable and open production network. A successful sandboxing requires that it happens completely transparent to and cannot be detected by an attacker. Therefore, we introduce a novel metric to measure the Quality of Deception (QoD) and use it to evaluate three proposed network deception mechanisms. Our evaluation results indicate that in our evaluation scenario in best case, an optimal QoD is achieved. In worst case, only a small downtime of approx. 3s per microservice (MS) occurs and thus a momentary drop in QoD to 70.26% before it converges back to optimum as the quarantined services are restored.
2020-09-04
Laguduva, Vishalini, Islam, Sheikh Ariful, Aakur, Sathyanarayanan, Katkoori, Srinivas, Karam, Robert.  2019.  Machine Learning Based IoT Edge Node Security Attack and Countermeasures. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). :670–675.
Advances in technology have enabled tremendous progress in the development of a highly connected ecosystem of ubiquitous computing devices collectively called the Internet of Things (IoT). Ensuring the security of IoT devices is a high priority due to the sensitive nature of the collected data. Physically Unclonable Functions (PUFs) have emerged as critical hardware primitive for ensuring the security of IoT nodes. Malicious modeling of PUF architectures has proven to be difficult due to the inherently stochastic nature of PUF architectures. Extant approaches to malicious PUF modeling assume that a priori knowledge and physical access to the PUF architecture is available for malicious attack on the IoT node. However, many IoT networks make the underlying assumption that the PUF architecture is sufficiently tamper-proof, both physically and mathematically. In this work, we show that knowledge of the underlying PUF structure is not necessary to clone a PUF. We present a novel non-invasive, architecture independent, machine learning attack for strong PUF designs with a cloning accuracy of 93.5% and improvements of up to 48.31% over an alternative, two-stage brute force attack model. We also propose a machine-learning based countermeasure, discriminator, which can distinguish cloned PUF devices and authentic PUFs with an average accuracy of 96.01%. The proposed discriminator can be used for rapidly authenticating millions of IoT nodes remotely from the cloud server.
2020-08-10
Kim, Byoungchul, Jung, Jaemin, Han, Sangchul, Jeon, Soyeon, Cho, Seong-je, Choi, Jongmoo.  2019.  A New Technique for Detecting Android App Clones Using Implicit Intent and Method Information. 2019 Eleventh International Conference on Ubiquitous and Future Networks (ICUFN). :478–483.
Detecting repackaged apps is one of the important issues in the Android ecosystem. Many attackers usually reverse engineer a legitimate app, modify or embed malicious codes into the app, repackage and distribute it in the online markets. They also employ code obfuscation techniques to hide app cloning or repackaging. In this paper, we propose a new technique for detecting repackaged Android apps, which is robust to code obfuscation. The technique analyzes the similarity of Android apps based on the method call information of component classes that receive implicit intents. We developed a tool Calldroid that implemented the proposed technique, and evaluated it on apps transformed using well-known obfuscators. The evaluation results showed that the proposed technique can effectively detect repackaged apps.
2020-02-10
Ding, Steven H. H., Fung, Benjamin C. M., Charland, Philippe.  2019.  Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization. 2019 IEEE Symposium on Security and Privacy (SP). :472–489.

Reverse engineering is a manually intensive but necessary technique for understanding the inner workings of new malware, finding vulnerabilities in existing systems, and detecting patent infringements in released software. An assembly clone search engine facilitates the work of reverse engineers by identifying those duplicated or known parts. However, it is challenging to design a robust clone search engine, since there exist various compiler optimization options and code obfuscation techniques that make logically similar assembly functions appear to be very different. A practical clone search engine relies on a robust vector representation of assembly code. However, the existing clone search approaches, which rely on a manual feature engineering process to form a feature vector for an assembly function, fail to consider the relationships between features and identify those unique patterns that can statistically distinguish assembly functions. To address this problem, we propose to jointly learn the lexical semantic relationships and the vector representation of assembly functions based on assembly code. We have developed an assembly code representation learning model Asm2Vec. It only needs assembly code as input and does not require any prior knowledge such as the correct mapping between assembly functions. It can find and incorporate rich semantic relationships among tokens appearing in assembly code. We conduct extensive experiments and benchmark the learning model with state-of-the-art static and dynamic clone search approaches. We show that the learned representation is more robust and significantly outperforms existing methods against changes introduced by obfuscation and optimizations.

2019-02-14
Bae, S., Shin, Y..  2018.  An Automated System Recovery Using BlockChain. 2018 Tenth International Conference on Ubiquitous and Future Networks (ICUFN). :897–901.

The existing Disaster Recovery (DR) system has a technique for checking the integrity of the duplicated file to be used for recovery, but it could not be used if the file was changed. In this study, a duplicate file is generated as a block and managed as a block-chain. If the duplicate file is corrupted, the DR system will check the integrity of the duplicated file by referring to the block-chain and proceed with the recovery. The proposed technology is verified through recovery performance evaluation and scenarios.
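The abstract above does not say how blocks are built. As a rough sketch of the general idea only, assuming each block stores a SHA-256 digest of one duplicate file plus the hash of the previous block (all function names, field names, and the all-zero genesis hash are hypothetical choices for illustration), an integrity check against such a chain could look like this:

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def _block_hash(file_digest, prev_hash):
    """Hash the block body in a canonical JSON form."""
    body = {"file_digest": file_digest, "prev_hash": prev_hash}
    payload = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def make_block(file_bytes, prev_hash):
    """Record a duplicate file's digest in a block chained to prev_hash."""
    file_digest = hashlib.sha256(file_bytes).hexdigest()
    return {
        "file_digest": file_digest,
        "prev_hash": prev_hash,
        "hash": _block_hash(file_digest, prev_hash),
    }


def chain_is_valid(chain):
    """Check that every block links to its predecessor and has an untampered body."""
    prev = GENESIS_HASH
    for block in chain:
        expected = _block_hash(block["file_digest"], block["prev_hash"])
        if block["prev_hash"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


def file_is_intact(file_bytes, block):
    """Compare a duplicate file against the digest recorded in its block."""
    return hashlib.sha256(file_bytes).hexdigest() == block["file_digest"]
```

Under these assumptions, a recovery step would first call `chain_is_valid` on the stored chain and then `file_is_intact` on the duplicate file before restoring from it.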

2018-06-20
Aslanyan, H., Avetisyan, A., Arutunian, M., Keropyan, G., Kurmangaleev, S., Vardanyan, V..  2017.  Scalable Framework for Accurate Binary Code Comparison. 2017 Ivannikov ISPRAS Open Conference (ISPRAS). :34–38.
Comparison of two binary files has many practical applications: the ability to detect programmatic changes between two versions, the ability to find old versions of statically linked libraries to prevent the use of well-known bugs, malware analysis, etc. In this article, a framework for comparison of binary files is presented. Framework uses IdaPro [1] disassembler and Binnavi [2] platform to recover structure of the target program and represent it as a call graph (CG). A program dependence graph (PDG) corresponds to each vertex of the CG. The proposed comparison algorithm consists of two main stages. At the first stage, several heuristics are applied to find the exact matches. Two functions are matched if at least one of the calculated heuristics is the same and unique in both binaries. At the second stage, backward and forward slicing is applied on matched vertices of CG to find further matches. According to empiric results heuristic method is effective and has high matching quality for unchanged or slightly modified functions. As a contradiction, to match heavily modified functions, binary code clone detection is used and it is based on finding maximum common subgraph for pair of PDGs. To achieve high performance on extensive binaries, the whole matching process is parallelized. The framework is tested on the number of real world libraries, such as python, openssh, openssl, libxml2, rsync, php, etc. Results show that in most cases more than 95% functions are truly matched. The tool is scalable due to parallelization of functions matching process and generation of PDGs and CGs.
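The first-stage rule in the abstract above (two functions match if at least one heuristic value is the same and unique in both binaries) can be sketched as follows. This is an illustration under assumptions: the input layout, the heuristic names in the usage example, and the tie-breaking rule are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def unique_matches(funcs_a, funcs_b):
    """Match functions whose heuristic value is unique in both binaries.

    funcs_a and funcs_b map a function name to a dict of
    {heuristic_name: hashable_value}. Returns {name_in_a: name_in_b}.
    """
    def index(funcs):
        # (heuristic_name, value) -> list of functions producing that value
        idx = defaultdict(list)
        for name, heuristics in funcs.items():
            for key, value in heuristics.items():
                idx[(key, value)].append(name)
        return idx

    idx_a, idx_b = index(funcs_a), index(funcs_b)
    matches = {}
    for key_value, names_a in idx_a.items():
        names_b = idx_b.get(key_value, [])
        if len(names_a) == 1 and len(names_b) == 1:
            # Assumed tie-break: keep the first match found for a function.
            matches.setdefault(names_a[0], names_b[0])
    return matches


# Toy usage with two made-up heuristics.
old = {"f": {"crc": 1, "n_instr": 10}, "g": {"crc": 2, "n_instr": 10}}
new = {"f2": {"crc": 1, "n_instr": 10}, "h": {"crc": 3, "n_instr": 10}}
result = unique_matches(old, new)
```

Here `n_instr` is shared by several functions on each side, so it cannot identify a match, while the `crc` value 1 is unique in both inputs and pairs `f` with `f2`.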
2017-12-20
Maleki, H., Rahaeimehr, R., Jin, C., Dijk, M. van.  2017.  New clone-detection approach for RFID-based supply chains. 2017 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). :122–127.

Radio-Frequency Identification (RFID) tags have been widely used as a low-cost wireless method for detection of counterfeit product injection in supply chains. In order to adequately perform authentication, current RFID monitoring schemes need to either have a persistent online connection between supply chain partners and the back-end database or have a local database on each partner site. A persistent online connection is not guaranteed and local databases on each partner site impose extra cost and security issues. We solve this problem by introducing a new scheme in which a small Non-Volatile Memory (NVM) embedded in RFID tag is used to function as a tiny “encoded local database”. In addition our scheme resists “tag tracing” so that each partner's operation remains private. Our scheme can be implemented in less than 1200 gates satisfying current RFID technology requirements.

Ishio, T., Sakaguchi, Y., Ito, K., Inoue, K..  2017.  Source File Set Search for Clone-and-Own Reuse Analysis. 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). :257–268.
Clone-and-own approach is a natural way of source code reuse for software developers. To assess how known bugs and security vulnerabilities of a cloned component affect an application, developers and security analysts need to identify an original version of the component and understand how the cloned component is different from the original one. Although developers may record the original version information in a version control system and/or directory names, such information is often either unavailable or incomplete. In this research, we propose a code search method that takes as input a set of source files and extracts all the components including similar files from a software ecosystem (i.e., a collection of existing versions of software packages). Our method employs an efficient file similarity computation using b-bit minwise hashing technique. We use an aggregated file similarity for ranking components. To evaluate the effectiveness of this tool, we analyzed 75 cloned components in Firefox and Android source code. The tool took about two hours to report the original components from 10 million files in Debian GNU/Linux packages. Recall of the top-five components in the extracted lists is 0.907, while recall of a baseline using SHA-1 file hash is 0.773, according to the ground truth recorded in the source code repositories.
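The file similarity computation mentioned above relies on b-bit minwise hashing. The sketch below shows the general technique, not the authors' tool: it keeps only the lowest b bits of each min-hash and corrects for chance agreement with the simplified estimate R ≈ (P − 2^−b) / (1 − 2^−b), which assumes the sets are small relative to the hash space. The function names, the seeded SHA-256 hash family, and the parameter defaults are assumptions made for illustration.

```python
import hashlib


def bbit_signature(tokens, num_hashes=64, b=2):
    """Return num_hashes min-hash values for a non-empty token set, truncated to b bits."""
    mask = (1 << b) - 1
    signature = []
    for seed in range(num_hashes):
        # One seeded hash function per position, built from SHA-256 (an assumed choice).
        min_value = min(
            int.from_bytes(hashlib.sha256(f"{seed}:{t}".encode()).digest()[:8], "big")
            for t in tokens
        )
        signature.append(min_value & mask)  # keep only the lowest b bits
    return signature


def estimate_similarity(sig_a, sig_b, b=2):
    """Estimate Jaccard similarity from two b-bit signatures of equal length."""
    agree = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
    chance = 1.0 / (1 << b)  # probability that unrelated b-bit values collide
    return max(0.0, (agree - chance) / (1.0 - chance))


# Toy usage: token sets standing in for the contents of two source files.
file_a = {"int", "main", "return", "printf"}
file_b = {"int", "main", "return", "puts"}
similarity = estimate_similarity(bbit_signature(file_a), bbit_signature(file_b))
```

Storing b bits per hash instead of a full 64-bit value is what makes the signatures compact enough for large collections, at the cost of needing more hash positions for the same accuracy.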
2017-03-07
Choi, S., Zage, D., Choe, Y. R., Wasilow, B..  2015.  Physically Unclonable Digital ID. 2015 IEEE International Conference on Mobile Services. :105–111.

The Center for Strategic and International Studies estimates the annual cost from cyber crime to be more than $400 billion. Most notable is the recent digital identity thefts that compromised millions of accounts. These attacks emphasize the security problems of using clonable static information. One possible solution is the use of a physical device known as a Physically Unclonable Function (PUF). PUFs can be used to create encryption keys, generate random numbers, or authenticate devices. While the concept shows promise, current PUF implementations are inherently problematic: inconsistent behavior, expensive, susceptible to modeling attacks, and permanent. Therefore, we propose a new solution by which an unclonable, dynamic digital identity is created between two communication endpoints such as mobile devices. This Physically Unclonable Digital ID (PUDID) is created by injecting a data scrambling PUF device at the data origin point that corresponds to a unique and matching descrambler/hardware authentication at the receiving end. This device is designed using macroscopic, intentional anomalies, making them inexpensive to produce. PUDID is resistant to cryptanalysis due to the separation of the challenge response pair and a series of hash functions. PUDID is also unique in that by combining the PUF device identity with a dynamic human identity, we can create true two-factor authentication. We also propose an alternative solution that eliminates the need for a PUF mechanism altogether by combining tamper resistant capabilities with a series of hash functions. This tamper resistant device, referred to as a Quasi-PUDID (Q-PUDID), modifies input data, using a black-box mechanism, in an unpredictable way. By mimicking PUF attributes, Q-PUDID is able to avoid traditional PUF challenges thereby providing high-performing physical identity assurance with or without a low performing PUF mechanism. 
Three different application scenarios with mobile devices for PUDID and Q-PUDID have been analyzed to show their unique advantages over traditional PUFs and outline the potential for placement in a host of applications.

2015-05-01
Kulkarni, A., Metta, R..  2014.  A New Code Obfuscation Scheme for Software Protection. Service Oriented System Engineering (SOSE), 2014 IEEE 8th International Symposium on. :409–414.

IT industry loses tens of billions of dollars annually from security attacks such as tampering and malicious reverse engineering. Code obfuscation techniques counter such attacks by transforming code into patterns that resist the attacks. None of the current code obfuscation techniques satisfy all the obfuscation effectiveness criteria such as resistance to reverse engineering attacks and state space increase. To address this, we introduce new code patterns that we call nontrivial code clones and propose a new obfuscation scheme that combines nontrivial clones with existing obfuscation techniques to satisfy all the effectiveness criteria. The nontrivial code clones need to be constructed manually, thus adding to the development cost. This cost can be limited by cloning only the code fragments that need protection and by reusing the clones across projects. This makes it worthwhile considering the security risks. In this paper, we present our scheme and illustrate it with a toy example.