Biblio

Filters: Keyword is genomics
2022-04-12
Chen, Huiping, Dong, Changyu, Fan, Liyue, Loukides, Grigorios, Pissis, Solon P., Stougie, Leen.  2021.  Differentially Private String Sanitization for Frequency-Based Mining Tasks. 2021 IEEE International Conference on Data Mining (ICDM). :41–50.
Strings are used to model genomic, natural language, and web activity data, and are thus often shared broadly. However, string data sharing has raised privacy concerns stemming from the fact that knowledge of length-k substrings of a string and their frequencies (multiplicities) may be sufficient to uniquely reconstruct the string; and from that the inference of such substrings may leak confidential information. We thus introduce the problem of protecting length-k substrings of a single string S by applying Differential Privacy (DP) while maximizing data utility for frequency-based mining tasks. Our theoretical and empirical evidence suggests that classic DP mechanisms are not suitable to address the problem. In response, we employ the order-k de Bruijn graph G of S and propose a sampling-based mechanism for enforcing DP on G. We consider the task of enforcing DP on G using our mechanism while preserving the normalized edge multiplicities in G. We define an optimization problem on integer edge weights that is central to this task and develop an algorithm based on dynamic programming to solve it exactly. We also consider two variants of this problem with real edge weights. By relaxing the constraint of integer edge weights, we are able to develop linear-time exact algorithms for these variants, which we use as stepping stones towards effective heuristics. An extensive experimental evaluation using real-world large-scale strings (in the order of billions of letters) shows that our heuristics are efficient and produce near-optimal solutions which preserve data utility for frequency-based mining tasks.
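
As an illustration of the graph the abstract above refers to, the following minimal sketch builds the edge multiplicities of an order-k de Bruijn graph of a string and normalizes them. It assumes the common convention that nodes are length-(k-1) substrings and each length-k substring contributes one edge; the function names `de_bruijn_edges` and `normalized_multiplicities` are hypothetical, and the sketch does not implement the paper's sampling-based DP mechanism or its optimization algorithms.

```python
from collections import Counter


def de_bruijn_edges(s: str, k: int) -> Counter:
    """Count edges of the order-k de Bruijn graph of s.

    Assumed convention: each length-k substring x of s is an edge from
    its length-(k-1) prefix to its length-(k-1) suffix, and the edge
    multiplicity is the number of occurrences of x in s.
    """
    if k < 2 or k > len(s):
        raise ValueError("k must satisfy 2 <= k <= len(s)")
    edges = Counter()
    for i in range(len(s) - k + 1):
        kmer = s[i:i + k]
        edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def normalized_multiplicities(edges: Counter) -> dict:
    """Divide every edge multiplicity by the total number of edges."""
    total = sum(edges.values())
    return {edge: count / total for edge, count in edges.items()}


if __name__ == "__main__":
    edges = de_bruijn_edges("ATATAG", 3)
    for edge, freq in normalized_multiplicities(edges).items():
        print(edge, freq)
```

The normalized multiplicities are the relative frequencies of the length-k substrings, which is the quantity a frequency-based mining task would read from the graph.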
2022-02-10
Rahman Mahdi, Md Safiur, Sadat, Md Nazmus, Mohammed, Noman, Jiang, Xiaoqian.  2020.  Secure Count Query on Encrypted Heterogeneous Data. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :548–555.
Cost-effective and efficient sequencing technologies have resulted in massive genomic data availability. To compute on a large-scale genomic dataset, it is often required to outsource the dataset to the cloud. To protect data confidentiality, data owners encrypt sensitive data before outsourcing. Outsourcing frees data owners from the storage management problem. Since genome data is large in volume, secure execution of researchers' queries is challenging. In this paper, we propose a method to securely perform count queries on datasets containing genotype, phenotype, and numeric data. Our method modifies the prefix-tree proposed by Hasan et al. [1] to incorporate numerical data. The proposed method guarantees data privacy, output privacy, and query privacy. We preserve the security through encryption and garbled circuits. For a query over a sequence of 100 single-nucleotide polymorphisms (SNPs), we achieve a query execution time of approximately 3.5 minutes on a database of 1500 records. To the best of our knowledge, this is the first proposed secure framework that addresses heterogeneous biomedical data including numeric attributes.
2021-09-16
Ullman, Steven, Samtani, Sagar, Lazarine, Ben, Zhu, Hongyi, Ampel, Benjamin, Patton, Mark, Chen, Hsinchun.  2020.  Smart Vulnerability Assessment for Scientific Cyberinfrastructure: An Unsupervised Graph Embedding Approach. 2020 IEEE International Conference on Intelligence and Security Informatics (ISI). :1–6.
The accelerated growth of computing technologies has provided interdisciplinary teams a platform for producing innovative research at an unprecedented speed. Advanced scientific cyberinfrastructures, in particular, provide data storage, applications, software, and other resources to facilitate the development of critical scientific discoveries. Users of these environments often rely on custom developed virtual machine (VM) images that are comprised of a diverse array of open source applications. These can include vulnerabilities undetectable by conventional vulnerability scanners. This research aims to identify the installed applications, their vulnerabilities, and how they vary across images in scientific cyberinfrastructure. We propose a novel unsupervised graph embedding framework that captures relationships between applications, as well as vulnerabilities identified on corresponding GitHub repositories. This embedding is used to cluster images with similar applications and vulnerabilities. We evaluate cluster quality using Silhouette, Calinski-Harabasz, and Davies-Bouldin indices, and application vulnerabilities through inspection of selected clusters. Results reveal that images pertaining to genomics research in our research testbed are at greater risk of high-severity shell spawning and data validation vulnerabilities.
2021-02-08
Mathur, G., Pandey, A., Goyal, S..  2020.  Immutable DNA Sequence Data Transmission for Next Generation Bioinformatics Using Blockchain Technology. 2nd International Conference on Data, Engineering and Applications (IDEA). :1–6.
In recent years, high-throughput DNA sequencing technology has grown rapidly and the cost of genome sequencing has fallen, which has led to advances in the genetic industries. However, despite the reduction in the cost and time required for DNA sequencing, managing such a large amount of data is still an issue. The security and transmission of such a huge amount of DNA sequence data also remain an issue. The idea is to provide a secure storage platform for future-generation bioinformatics systems for both researchers and healthcare users. Secure data sharing strategies that permit healthcare providers, along with their secured substances, to verify the accuracy of data are crucial for ensuring proper medical services. This paper surveys the applications of blockchain technology for securing healthcare data, where the recorded information is encrypted so that it becomes difficult to penetrate or remove, as the primary goal of blockchain technology is to make data immutable.
2020-10-29
Belenko, Viacheslav, Krundyshev, Vasiliy, Kalinin, Maxim.  2019.  Intrusion detection for Internet of Things applying metagenome fast analysis. 2019 Third World Conference on Smart Trends in Systems Security and Sustainablity (WorldS4). :129–135.
Today, intrusion detection and prevention systems (IDS / IPS) are a necessary element of protection against network attacks. The main goal of such systems is to identify unauthorized access to the network and take appropriate countermeasures: alerting security officers about the intrusion, reconfiguring the firewall to block further acts of the attacker, and protecting against cyberattacks and malware. For traditional computer networks there are a large number of sufficiently effective approaches for protection against malicious activity; however, for the rapidly developing dynamic ad hoc networks (Internet of Things - IoT, MANET, WSN, etc.) the task of creating a universal means of protection is quite acute. In this paper, we review various methods for detecting polymorphic intrusion activity (polymorphic viral code and sequences of operations), present a comparative analysis, and implement the suggested technology for detecting polymorphic chains of operations using bioinformatics for IoT. The proposed approach has been tested with different lengths of operation sequences and different k-measures, as a result of which the optimal parameters of the proposed method have been determined.
2020-03-30
Narendra, Nanjangud C., Shukla, Anshu, Nayak, Sambit, Jagadish, Asha, Kalkur, Rachana.  2019.  Genoma: Distributed Provenance as a Service for IoT-based Systems. 2019 IEEE 5th World Forum on Internet of Things (WF-IoT). :755–760.
One of the key aspects of IoT-based systems, which we believe has not been getting the attention it deserves, is provenance. Provenance refers to those actions that record the usage of data in the system, along with the rationale for said usage. Historically, most provenance methods in distributed systems have been tightly coupled with those of the underlying data processing frameworks in such systems. However, in this paper, we argue that IoT provenance requires a different treatment, given the heterogeneity and dynamism of IoT-based systems. In particular, provenance in IoT-based systems should be decoupled as far as possible from the underlying data processing substrates in IoT-based systems. To that end, in this paper, we present Genoma, our ongoing work on a system for provenance-as-a-service in IoT-based systems. By "provenance-as-a-service" we mean the following: distributed provenance across IoT devices, edge and cloud; and agnostic of the underlying data processing substrate. Genoma comprises a set of services that act together to provide useful provenance information to users across the system. We also show how we are realizing Genoma via an implementation prototype built on Apache Atlas and Tinkergraph, through which we are investigating several key research issues in distributed IoT provenance.
2018-05-02
Clifford, J., Garfield, K., Towhidnejad, M., Neighbors, J., Miller, M., Verenich, E., Staskevich, G..  2017.  Multi-layer model of swarm intelligence for resilient autonomous systems. 2017 IEEE/AIAA 36th Digital Avionics Systems Conference (DASC). :1–4.

Embry-Riddle Aeronautical University (ERAU) is working with the Air Force Research Lab (AFRL) to develop a distributed multi-layer autonomous UAS planning and control technology for gathering intelligence in Anti-Access Area Denial (A2/AD) environments populated by intelligent adaptive adversaries. These resilient autonomous systems are able to navigate through hostile environments while performing Intelligence, Surveillance, and Reconnaissance (ISR) tasks, and minimizing the loss of assets. Our approach incorporates artificial life concepts, with a high-level architecture divided into three biologically inspired layers: cyber-physical, reactive, and deliberative. Each layer has a dynamic level of influence over the behavior of the agent. Algorithms within the layers act on a filtered view of reality, abstracted in the layer immediately below. Each layer takes input from the layer below, provides output to the layer above, and provides direction to the layer below. Fast-reactive control systems in lower layers ensure a stable environment supporting cognitive function on higher layers. The cyber-physical layer represents the central nervous system of the individual, consisting of elements of the vehicle that cannot be changed such as sensors, power plant, and physical configuration. On the reactive layer, the system uses an artificial life paradigm, where each agent interacts with the environment using a set of simple rules regarding wants and needs. Information is communicated explicitly via message passing and implicitly via observation and recognition of behavior. In the deliberative layer, individual agents look outward to the group, deliberating on efficient resource management and cooperation with other agents. Strategies at all layers are developed using machine learning techniques such as Genetic Algorithm (GA) or NN applied to system training that takes place prior to the mission.

2018-02-02
Khari, M., Vaishali, Kumar, M..  2016.  Analysis of software security testing using metaheuristic search technique. 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom). :2147–2152.

Metaheuristic search is one of the more advanced approaches compared with traditional heuristic search techniques. Selecting one option among different alternatives is not hard; what is really hard is giving assurance that the choice is cost-effective. Metaheuristic search techniques solve this hard problem with the help of a fitness function. A fitness function is a crucial metric, or measure, that helps decide which solution is optimal to choose from the available set of test sets. This paper discusses hill climbing, simulated annealing, tabu search, genetic algorithms, and particle swarm optimization in detail, explaining each with the help of its algorithm. If metaheuristic search techniques are combined with some of the security testing methods, the result would be a better and more secure search technique. This paper primarily focuses on the metaheuristic search techniques.
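
To make the role of the fitness function concrete, here is a minimal sketch of hill climbing, the simplest of the techniques listed in the abstract above. It is a generic steepest-ascent variant, not code from the paper; the names `hill_climb`, `fitness`, and `neighbors` are hypothetical, and the toy fitness function in the example is an assumption chosen only for illustration.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def hill_climb(
    fitness: Callable[[T], float],
    start: T,
    neighbors: Callable[[T], Iterable[T]],
    max_steps: int = 1000,
) -> T:
    """Steepest-ascent hill climbing.

    Repeatedly move to the neighbor with the highest fitness and stop
    when no neighbor is strictly better than the current candidate
    (a local optimum) or when max_steps is reached.
    """
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=fitness)
        if fitness(best) <= fitness(current):
            return current
        current = best
    return current


if __name__ == "__main__":
    # Toy problem (assumed): maximize -(x - 7)^2 over the integers,
    # where the neighbors of x are x - 1 and x + 1.
    result = hill_climb(lambda x: -(x - 7) ** 2, 0, lambda x: [x - 1, x + 1])
    print(result)
```

Hill climbing stops at the first local optimum it reaches; simulated annealing and tabu search, also named in the abstract, are commonly described as ways of escaping such local optima.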

2018-01-23
Aledhari, M., Marhoon, A., Hamad, A., Saeed, F..  2017.  A New Cryptography Algorithm to Protect Cloud-Based Healthcare Services. 2017 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE). :37–43.

The revolution of smart devices has a significant and positive impact on the lives of many people, especially in regard to elements of healthcare. In part, this revolution is attributed to technological advances that enable individuals to wear and use medical devices to monitor their health activities remotely. Also, these smart, wearable medical devices assist health care providers in monitoring their patients remotely, thereby enabling physicians to respond quickly in the event of emergencies. An ancillary advantage is that health care costs will be reduced, another benefit that, when paired with prompt medical treatment, indicates significant advances in the contemporary management of health care. However, the competition among manufacturers of these medical devices creates a complexity of small and smart wearable devices such as ECG and EMG. This complexity results in other issues such as patient security, privacy, confidentiality, and identity theft. In this paper, we discuss the design and implementation of a hybrid real-time cryptography algorithm to secure lightweight wearable medical devices. The proposed system is based on an emerging innovative technology that combines genomic encryption with the deterministic chaos method to provide a quick and secure cryptography algorithm for real-time health monitoring that allows threats to patient confidentiality to be addressed. The proposed algorithm also considers the limitations of memory and size of the wearable health devices. The experimental results and the encryption analysis indicate that the proposed algorithm provides a high level of security for the remote health monitoring system.

Wang, B., Song, W., Lou, W., Hou, Y. T..  2017.  Privacy-preserving pattern matching over encrypted genetic data in cloud computing. IEEE INFOCOM 2017 - IEEE Conference on Computer Communications. :1–9.

Personalized medicine performs diagnoses and treatments according to the DNA information of the patients. The new paradigm will change the health care model in the future. A doctor will perform DNA sequence matching instead of the regular clinical laboratory tests to diagnose and treat diseases. Additionally, with the help of affordable personal genomics services such as 23andMe, personalized medicine will be applied to a large population. Cloud computing will be the perfect computing model, as the volume of the DNA data and the computation over it are often immense. However, due to its sensitivity, the DNA data should be encrypted before being outsourced to the cloud. In this paper, we start from a practical system model of personalized medicine and present a solution for the secure DNA sequence matching problem in cloud computing. Compared with the existing solutions, our scheme protects the DNA data privacy as well as the search pattern to provide a better privacy guarantee. We have proved that our scheme is secure under a well-defined cryptographic assumption, i.e., the sub-group decision assumption over a bilinear group. Unlike the existing interactive schemes, our scheme requires only one round of communication, which is critical in practical application scenarios. We also carry out a simulation study using real-world DNA data to evaluate the performance of our scheme. The simulation results show that the computation overhead for real-world problems is practical, and the communication cost is small. Furthermore, our scheme is not limited to the genome matching problem but applies to general privacy-preserving pattern matching problems, which are widely used in the real world.

Backes, M., Berrang, P., Bieg, M., Eils, R., Herrmann, C., Humbert, M., Lehmann, I..  2017.  Identifying Personal DNA Methylation Profiles by Genotype Inference. 2017 IEEE Symposium on Security and Privacy (SP). :957–976.

Since the first whole-genome sequencing, the biomedical research community has made significant steps towards a more precise, predictive and personalized medicine. Genomic data is nowadays widely considered privacy-sensitive and consequently protected by strict regulations and released only after careful consideration. Various additional types of biomedical data, however, are not shielded by any dedicated legal means and consequently disseminated much less thoughtfully. This in particular holds true for DNA methylation data as one of the most important and well-understood epigenetic elements influencing human health. In this paper, we show that, in contrast to the aforementioned belief, releasing one's DNA methylation data causes privacy issues akin to releasing one's actual genome. We show that already a small subset of methylation regions influenced by genomic variants is sufficient to infer parts of someone's genome, and to further map this DNA methylation profile to the corresponding genome. Notably, we show that such re-identification is possible with 97.5% accuracy, relying on a dataset of more than 2500 genomes, and that we can reject all wrongly matched genomes using an appropriate statistical test. We provide means for countering this threat by proposing a novel cryptographic scheme for privately classifying tumors that enables a privacy-respecting medical diagnosis in a common clinical setting. The scheme relies on a combination of random forests and homomorphic encryption, and it is proven secure in the honest-but-curious model. We evaluate this scheme on real DNA methylation data, and show that we can keep the computational overhead to acceptable values for our application scenario.

2018-01-16
Ugwuoke, C., Erkin, Z., Lagendijk, R. L..  2017.  Privacy-safe linkage analysis with homomorphic encryption. 2017 25th European Signal Processing Conference (EUSIPCO). :961–965.

Genetic data are important datasets utilised in genetic epidemiology to investigate biologically coded information within the human genome. Enormous research effort has been invested in recent years to fully sequence and understand the genome. Personalised medicine, patient response to treatments, and relationships between specific genes and certain characteristics such as phenotypes and diseases are positive impacts of studying the genome, to mention just a few. The sensitivity, longevity and non-modifiable nature of genetic data make it even more interesting; consequently, the security and privacy of the storage and processing of genomic data demand attention. A common activity carried out by geneticists is association analysis between two alleles, or even between a genetic locus and a disease. We demonstrate how, using cryptographic techniques such as homomorphic encryption schemes and multiparty computation, such analysis can be carried out in a privacy-friendly manner. We compute a 3 × 3 contingency table and then genome analysis algorithms such as linkage disequilibrium (LD) measures, all in the encrypted domain. Our computation guarantees privacy of the genome data under our security settings, and provides up to 98.4% improvement compared to an existing solution.
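
For readers unfamiliar with the 3 × 3 contingency table mentioned in the abstract above, the following plaintext sketch shows one common way such a table is formed from genotypes at two loci. It assumes genotypes are coded as 0, 1, or 2 copies of a reference allele; the function names are hypothetical, and nothing here is encrypted or taken from the paper's protocol. Deriving LD measures from the table requires further steps that are not shown.

```python
from typing import List, Sequence


def contingency_table(locus_a: Sequence[int], locus_b: Sequence[int]) -> List[List[int]]:
    """Build a 3 x 3 genotype contingency table for two loci.

    Assumed coding: each genotype is 0, 1, or 2 (copies of a reference
    allele). Entry [i][j] counts the individuals whose genotype is i at
    locus A and j at locus B.
    """
    if len(locus_a) != len(locus_b):
        raise ValueError("both loci must cover the same individuals")
    table = [[0] * 3 for _ in range(3)]
    for a, b in zip(locus_a, locus_b):
        if a not in (0, 1, 2) or b not in (0, 1, 2):
            raise ValueError("genotypes must be coded as 0, 1, or 2")
        table[a][b] += 1
    return table


def allele_frequency(genotypes: Sequence[int]) -> float:
    """Frequency of the counted allele: total copies over 2n chromosomes."""
    return sum(genotypes) / (2 * len(genotypes))


if __name__ == "__main__":
    locus_a = [0, 1, 2, 1, 0]
    locus_b = [0, 1, 2, 2, 0]
    for row in contingency_table(locus_a, locus_b):
        print(row)
    print(allele_frequency(locus_a))
```

In a privacy-preserving setting, the same counts would be accumulated over encrypted genotype indicators rather than plaintext values.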