Biblio
Performing large-scale malware classification is increasingly becoming a critical step in malware analytics as the number and variety of malware samples is rapidly growing. Statistical machine learning constitutes an appealing method to cope with this increase as it can use mathematical tools to extract information out of large-scale datasets and produce interpretable models. This has motivated a surge of scientific work in developing machine learning methods for detection and classification of malicious executables. However, an optimal method for extracting the most informative features for different malware families, with the final goal of malware classification, is yet to be found. Fortunately, neural networks have evolved to the state that they can surpass the limitations of other methods in terms of hierarchical feature extraction. Consequently, neural networks can now offer superior classification accuracy in many domains such as computer vision and natural language processing. In this paper, we transfer the performance improvements achieved in the area of neural networks to model the execution sequences of disassembled malicious binaries. We implement a neural network that consists of convolutional and feedforward neural constructs. This architecture embodies a hierarchical feature extraction approach that combines convolution of n-grams of instructions with plain vectorization of features derived from the headers of the Portable Executable (PE) files. Our evaluation results demonstrate that our approach outperforms baseline methods, such as simple Feedforward Neural Networks and Support Vector Machines, as we achieve 93% on precision and recall, even in case of obfuscations in the data.
As the malware threat landscape is constantly evolving and over one million new malware strains are being generated every day [1], early automatic detection of threats constitutes a top priority of cybersecurity research, and amplifies the need for more advanced detection and classification methods that are effective and efficient. In this paper, we present the application of machine learning algorithms to predict the length of time malware should be executed in a sandbox to reveal its malicious intent. We also introduce a novel hybrid approach to malware classification based on static binary analysis and dynamic analysis of malware. Static analysis extracts information from a binary file without executing it, and dynamic analysis captures the behavior of malware in a sandbox environment. Our experimental results show that by turning the aforementioned problems into machine learning problems, it is possible to get an accuracy of up to 90% on the prediction of the malware analysis run time and up to 92% on the classification of malware families.
We present AVAMAT: AntiVirus and Malware Analysis Tool - a tool for analysing the malware detection capabilities of AntiVirus (AV) products running on different operating system (OS) platforms. Even though similar tools are available, such as VirusTotal and MetaDefender, they have several limitations, which motivated the creation of our own tool. With AVAMAT we are able to analyse not only whether an AV detects a malware, but also at what stage of inspection does it detect it and on what OS. AVAMAT enables experimental campaigns to answer various research questions, ranging from the detection capabilities of AVs on OSs, to optimal ways in which AVs could be combined to improve malware detection capabilities.
Malware analysis relies heavily on the use of virtual machines (VMs) for functionality and safety. There are subtle differences in operation between virtual and physical machines. Contemporary malware checks for these differences and changes its behavior when it detects a VM presence. These anti-VM techniques hinder malware analysis. Existing research approaches to uncover differences between VMs and physical machines use randomized testing, and thus cannot guarantee completeness. In this article, we propose a detect-and-hide approach, which systematically addresses anti-VM techniques in malware. First, we propose cardinal pill testing—a modification of red pill testing that aims to enumerate the differences between a given VM and a physical machine through carefully designed tests. Cardinal pill testing finds five times more pills by running 15 times fewer tests than red pill testing. We examine the causes of pills and find that, while the majority of them stem from the failure of VMs to follow CPU specifications, a small number stem from under-specification of certain instructions by the Intel manual. This leads to divergent implementations in different CPU and VM architectures. Cardinal pill testing successfully enumerates the differences that stem from the first cause. Finally, we propose VM Cloak—a WinDbg plug-in which hides the presence of VMs from malware. VM Cloak monitors each execute malware command, detects potential pills, and at runtime modifies the command’s outcomes to match those that a physical machine would generate. We implemented VM Cloak and verified that it successfully hides VM presence from malware.
Malware damages computers and the threat is a serious problem. Malware can be detected by pattern matching method or dynamic heuristic method. However, it is difficult to detect all new malware subspecies perfectly by existing methods. In this paper, we propose a new method which automatically detects new malware subspecies by static analysis of execution files and machine learning. The method can distinguish malware from benignware and it can also classify malware subspecies into malware families. We combine static analysis of execution files with machine learning classifier and natural language processing by machine learning. Information of DLL Import, assembly code and hexdump are acquired by static analysis of execution files of malware and benignware to create feature vectors. Paragraph vectors of information by static analysis of execution files are created by machine learning of PV-DBOW model for natural language processing. Support vector machine and classifier of k-nearest neighbor algorithm are used in our method, and the classifier learns paragraph vectors of information by static analysis. Unknown execution files are classified into malware or benignware by pre-learned SVM. Moreover, malware subspecies are also classified into malware families by pre-learned k-nearest. We evaluate the accuracy of the classification by experiments. We think that new malware subspecies can be effectively detected by our method without existing methods for malware analysis such as generic method and dynamic heuristic method.
Code reuse detection is a key technique in reverse engineering. However, existing source code similarity comparison techniques are not applicable to binary code. Moreover, compilers have made this problem even more difficult due to the fact that different assembly code and control flow structures can be generated by the compilers even when implementing the same functionality. To address this problem, we present a fuzzy matching approach to compare two functions. We first obtain an initial mapping between basic blocks by leveraging the concept of longest common subsequence on the basic block level and execution path level. We then extend the achieved mapping using neighborhood exploration. To make our approach applicable to large data sets, we designed an effective filtering process using Minhashing. Based on the proposed approach, we implemented a tool named BinSequence and conducted extensive experiments with it. Our results show that given a large assembly code repository with millions of functions, BinSequence is efficient and can attain high quality similarity ranking of assembly functions with an accuracy of above 90%. We also present several practical use cases including patch analysis, malware analysis and bug search.
The increasing growth of cybercrimes targeting mobile devices urges an efficient malware analysis platform. With the emergence of evasive malware, which is capable of detecting that it is being analyzed in virtualized environments, bare-metal analysis has become the definitive resort. Existing works mainly focus on extracting the malicious behaviors exposed during bare-metal analysis. However, after malware analysis, it is equally important to quickly restore the system to a clean state to examine the next sample. Unfortunately, state-of-the-art solutions on mobile platforms can only restore the disk, and require a time-consuming system reboot. In addition, all of the existing works require some in-guest components to assist the restoration. Therefore, a kernel-level malware is still able to detect the presence of the in-guest components. We propose Bolt, a transparent restoration mechanism for bare-metal analysis on mobile platform without rebooting. Bolt achieves a reboot-less restoration by simultaneously making a snapshot for both the physical memory and the disk. Memory snapshot is enabled by an isolated operating system (BoltOS) in the ARM TrustZone secure world, and disk snapshot is accomplished by a piece of customized firmware (BoltFTL) for flash-based block devices. Because both the BoltOS and the BoltFTL are isolated from the guest system, even kernel-level malware cannot interfere with the restoration. More importantly, Bolt does not require any modifications into the guest system. As such, Bolt is the first that simultaneously achieves efficiency, isolation, and stealthiness to recover from infection due to malware execution. We have implemented a Bolt prototype working with the Android OS. Experimental results show that Bolt can restore the guest system to a clean state in only 2.80 seconds.
The blockchain emerges as an innovative tool that has the potential to positively impact the way we design a number of online applications today. In many ways, the blockchain technology is, however, still not mature enough to cater for industrial standards. Namely, existing Byzantine tolerant permission-based blockchain deployments can only scale to a limited number of nodes. These systems typically require that all transactions (and their order of execution) are publicly available to all nodes in the system, which comes at odds with common data sharing practices in the industry, and prevents a centralized regulator from overseeing the full blockchain system. In this paper, we propose a novel blockchain architecture devised specifically to meet industrial standards. Our proposal leverages the notion of satellite chains that can privately run different consensus protocols in parallel - thereby considerably boosting the scalability premises of the system. Our solution also accounts for a hands-off regulator that oversees the entire network, enforces specific policies by means of smart contracts, etc. We implemented our solution and integrated it with Hyperledger Fabric v0.6.
Sharing and working on sensitive data in distributed settings from healthcare to finance is a major challenge due to security and privacy concerns. Secure multiparty computation (SMC) is a viable panacea for this, allowing distributed parties to make computations while the parties learn nothing about their data, but the final result. Although SMC is instrumental in such distributed settings, it does not provide any guarantees not to leak any information about individuals to adversaries. Differential privacy (DP) can be utilized to address this; however, achieving SMC with DP is not a trivial task, either. In this paper, we propose a novel Secure Multiparty Distributed Differentially Private (SM-DDP) protocol to achieve secure and private computations in a multiparty environment. Specifically, with our protocol, we simultaneously achieve SMC and DP in distributed settings focusing on linear regression on horizontally distributed data. That is, parties do not see each others’ data and further, can not infer information about individuals from the final constructed statistical model. Any statistical model function that allows independent calculation of local statistics can be computed through our protocol. The protocol implements homomorphic encryption for SMC and functional mechanism for DP to achieve the desired security and privacy guarantees. In this work, we first introduce the theoretical foundation for the SM-DDP protocol and then evaluate its efficacy and performance on two different datasets. Our results show that one can achieve individual-level privacy through the proposed protocol with distributed DP, which is independently applied by each party in a distributed fashion. Moreover, our results also show that the SM-DDP protocol incurs minimal computational overhead, is scalable, and provides security and privacy guarantees.
Security issues in vehicular communication have become a huge concern to safeguard increasing applications. A group signature is one of the popular authentication approaches for VANETs (Vehicular ad hoc networks) which can be implemented to secure the vehicular communication. However, securely distributing group keys to fast-moving vehicular nodes is still a challenging problem. In this paper, we propose an efficient key management protocol for group signature based authentication, where a group is extended to a domain with multiple road side units. Our scheme not only provides a secure way to deliver group keys to vehicular nodes, but also ensures security features. The experiment results show that our key distribution scheme is a scalable, efficient and secure solution to vehicular networking.
The Internet of Things (IoT) devices perform security-critical operations and deal with sensitive information in the IoT-based systems. Therefore, the increased deployment of smart devices will make them targets for cyber attacks. Adversaries can perform malicious actions, leak private information, and track devices' and their owners' location by gaining unauthorized access to IoT devices and networks. However, conventional security protocols are not primarily designed for resource constrained devices and therefore cannot be applied directly to IoT systems. In this paper, we propose Boot-IoT - a privacy-preserving, lightweight, and scalable security scheme for limited resource devices. Boot-IoT prevents a malicious device from joining an IoT network. Boot-IoT enables a device to compute a unique identity for authentication each time the device enters a network. Moreover, during device to device communication, Boot-IoT provides a lightweight mutual authentication scheme that ensures privacy-preserving identity usages. We present a detailed analysis of the security strength of BootIoT. We implemented a prototype of Boot-IoT on IoT devices powered by Contiki OS and provided an extensive comparative analysis of Boot-IoT with contemporary authentication methods. Our results show that Boot-IoT is resource efficient and provides better scalability compared to current solutions.
A majority of today's mobile apps integrate web content of various kinds. Unfortunately, the interactions between app code and web content expose new attack vectors: a malicious app can subvert its embedded web content to steal user secrets; on the other hand, malicious web content can use the privileges of its embedding app to exfiltrate sensitive information such as the user's location and contacts. In this paper, we discuss security weaknesses of the interface between app code and web content through attacks, then introduce defenses that can be deployed without modifying the OS. Our defenses feature WIREframe, a service that securely embeds and renders external web content in Android apps, and in turn, prevents attacks between em- bedded web and host apps. WIREframe fully mediates the interface between app code and embedded web content. Un- like the existing web-embedding mechanisms, WIREframe allows both apps and embedded web content to define simple access policies to protect their own resources. These policies recognize fine-grained security principals, such as origins, and control all interactions between apps and the web. We also introduce WIRE (Web Isolation Rewriting Engine), an offline app rewriting tool that allows app users to inject WIREframe protections into existing apps. Our evaluation, based on 7166 popular apps and 20 specially selected apps, shows these techniques work on complex apps and incur acceptable end-to-end performance overhead.
Mobile applications have grown from knowing basic personal information to knowing intimate details of consumer's lives. The explosion of knowledge that applications contain and share can be contributed to many factors. Mobile devices are equipped with advanced sensors including GPS and cameras, while storing large amounts of personal information including photos and contacts. With millions of applications available to install, personal data is at constant risk of being misused. While mobile operating systems provide basic security and privacy controls, they are insufficient, leaving the consumer unaware of how applications are using permissions that were granted. In this paper, we propose a solution that aims to provide consumers awareness of applications misusing data and policies that can protect their data. From this investigation we present SPEProxy. SPEProxy utilizes a knowledge based approach to provide consumer's an ability to understand how applications are using permissions beyond their stated intent. Additionally, SPEProxy provides an awareness of fine grained policies that would allow the user to protect their data. SPEProxy is device and mobile operating system agnostic, meaning it does not require a specific device or operating system nor modification to the operating system or applications. This approach allows consumers to utilize the solution without requiring a high degree of technical expertise. We evaluated SPEProxy across 817 of the most popular applications in the iOS App Store and Google Play. In our evaluation, SPEProxy was highly effective across 86.55% applications where several well known applications exhibited misusing granted permissions.
Internet of Things (IoT) devices are getting increasingly popular, becoming a core element for the next generations of informational architectures: smart city, smart factory, smart home, smart health-care and many others. IoT systems are mainly comprised of embedded devices with limited computing capabilities while having a cloud component which processes the data and delivers it to the end-users. IoT devices access the user private data, thus requiring robust security solution which must address features like usability and scalability. In this paper we discuss about an IoT authentication service for smart-home devices using a smart-phone as security anchor, QR codes and attribute based cryptography (ABC). Regarding the fact that in an IoT ecosystem some of the IoT devices and the cloud components may be considered untrusted, we propose a privacy preserving attribute based access control protocol to handle the device authentication to the cloud service. For the smart-phone centric authentication to the cloud component, we employ the FIDO UAF protocol and we extend it, by adding an attributed based privacy preserving component.
The revolution of smart devices has a significant and positive impact on the lives of many people, especially in regard to elements of healthcare. In part, this revolution is attributed to technological advances that enable individuals to wear and use medical devices to monitor their health activities, but remotely. Also, these smart, wearable medical devices assist health care providers in monitoring their patients remotely, thereby enabling physicians to respond quickly in the event of emergencies. An ancillary advantage is that health care costs will be reduced, another benefit that, when paired with prompt medical treatment, indicates significant advances in the contemporary management of health care. However, the competition among manufacturers of these medical devices creates a complexity of small and smart wearable devices such as ECG and EMG. This complexity results in other issues such as patient security, privacy, confidentiality, and identity theft. In this paper, we discuss the design and implementation of a hybrid real-time cryptography algorithm to secure lightweight wearable medical devices. The proposed system is based on an emerging innovative technology between the genomic encryptions and the deterministic chaos method to provide a quick and secure cryptography algorithm for real-time health monitoring that permits for threats to patient confidentiality to be addressed. The proposed algorithm also considers the limitations of memory and size of the wearable health devices. The experimental results and the encryption analysis indicate that the proposed algorithm provides a high level of security for the remote health monitoring system.
Applying security to the transmitted image is very important issues, because the transmission channel is open and can be compromised by attackers. To secure this channel from the eavesdropping attack, man in the middle attack, and so on. A new hybrid encryption image mechanism that utilize triangular scrambling, DNA encoding and chaotic map is implemented. The scheme takes a master key with a length of 320 bit, and produces a group of sub-keys with two length (32 and 128 bit) to encrypt the blocks of images, then a new triangular scrambling method is used to increase the security of the image. Many experiments are implemented using several different images. The analysis results for these experiments show that the security obtained on by using the proposed method is very suitable for securing the transmitted images. The current work has been compared with other works and the result of comparison shows that the current work is very strong against attacks.
Personalized medicine performs diagnoses and treatments according to the DNA information of the patients. The new paradigm will change the health care model in the future. A doctor will perform the DNA sequence matching instead of the regular clinical laboratory tests to diagnose and medicate the diseases. Additionally, with the help of the affordable personal genomics services such as 23andMe, personalized medicine will be applied to a great population. Cloud computing will be the perfect computing model as the volume of the DNA data and the computation over it are often immense. However, due to the sensitivity, the DNA data should be encrypted before being outsourced into the cloud. In this paper, we start from a practical system model of the personalize medicine and present a solution for the secure DNA sequence matching problem in cloud computing. Comparing with the existing solutions, our scheme protects the DNA data privacy as well as the search pattern to provide a better privacy guarantee. We have proved that our scheme is secure under the well-defined cryptographic assumption, i.e., the sub-group decision assumption over a bilinear group. Unlike the existing interactive schemes, our scheme requires only one round of communication, which is critical in practical application scenarios. We also carry out a simulation study using the real-world DNA data to evaluate the performance of our scheme. The simulation results show that the computation overhead for real world problems is practical, and the communication cost is small. Furthermore, our scheme is not limited to the genome matching problem but it applies to general privacy preserving pattern matching problems which is widely used in real world.
Since the first whole-genome sequencing, the biomedical research community has made significant steps towards a more precise, predictive and personalized medicine. Genomic data is nowadays widely considered privacy-sensitive and consequently protected by strict regulations and released only after careful consideration. Various additional types of biomedical data, however, are not shielded by any dedicated legal means and consequently disseminated much less thoughtfully. This in particular holds true for DNA methylation data as one of the most important and well-understood epigenetic element influencing human health. In this paper, we show that, in contrast to the aforementioned belief, releasing one's DNA methylation data causes privacy issues akin to releasing one's actual genome. We show that already a small subset of methylation regions influenced by genomic variants are sufficient to infer parts of someone's genome, and to further map this DNA methylation profile to the corresponding genome. Notably, we show that such re-identification is possible with 97.5% accuracy, relying on a dataset of more than 2500 genomes, and that we can reject all wrongly matched genomes using an appropriate statistical test. We provide means for countering this threat by proposing a novel cryptographic scheme for privately classifying tumors that enables a privacy-respecting medical diagnosis in a common clinical setting. The scheme relies on a combination of random forests and homomorphic encryption, and it is proven secure in the honest-but-curious model. We evaluate this scheme on real DNA methylation data, and show that we can keep the computational overhead to acceptable values for our application scenario.
DNA cryptography is one of the promising fields in cryptographic research which emerged with the evolution of DNA computing. In this era, end to end transmission of secure data by ensuring confidentiality and authenticity over the networks is a real challenge. Even though various DNA based cryptographic algorithms exists, they are not secure enough to provide better security as required with today's security requirements. Hence we propose a cryptographic model which will enhance the message security. A new method of round key selection is used, which provides better and enhanced security against intruder's attack. The crucial attraction of this proposed model is providing multi level security of 3 levels with round key selection and message encryption in level 1, 16×16 matrix manipulation using asymmetric key encryption in level 2 and shift operations in level 3. Thus we design a system with multi level encryption without compromising complexity and size of the cipher text.
Cellular Automata based computing paradigm is an efficient platform for modeling complicated computational problems. This can be used for various applications in the field of Cryptography. In this paper, it is used for generating a DNA cryptography based encryption algorithm. The encoded message in binary format is encrypted to cipher colors with the help of a simple algorithm based on the principles of DNA cryptography and cellular automata. The message will be in compressed form using XOR operator. Since cellular automata and DNA cryptographic principles are exploited, high level of parallelism, reversibility, uniformity etc. can be achieved.
In today's growing concern for home security, we have developed an advanced security system using integrated digital signature and DNA cryptography. The digital signature is formed using multi-feature biometric traits which includes both fingerprint as well as iris image. We further increase the security by using DNA cryptography which is embedded on a smart card. In order to prevent unauthorized access manually or digitally, we use geo-detection which compares the unregistered devices location with the user's location using any of their personal devices such as smart phone or tab.