Visible to the public Biblio

Found 15086 results

Filters: Keyword is pubcrawl  [Clear All Filters]
2017-06-05
Singh, Neha, Singh, Saurabh, Kumar, Naveen, Kumar, Rakesh.  2016.  Key Management Techniques for Securing MANET. Proceedings of the ACM Symposium on Women in Research 2016. :77–80.

A Mobile Ad hoc Network (MANET) is a spontaneous network consisting of wireless nodes which are mobile and self-configuring in nature. Devices in MANET can move freely in any direction independently and change its link frequently to other devices. MANET does not have centralized infrastructure and its characteristics makes this network vulnerable to various kinds of attacks. Data transfer is a major problem due to its nature of unreliable wireless medium. Commonly used technique for secure transmission in wireless network is cryptography. Use of cryptography key is often involved in most of cryptographic techniques. Key management is main component in security issues of MANET and various schemes have been proposed for it. In this paper, a study on various kinds of key management techniques in MANET is presented.

Khodaei, Mohammad, Papadimitratos, Panos.  2016.  Evaluating On-demand Pseudonym Acquisition Policies in Vehicular Communication Systems. Proceedings of the First International Workshop on Internet of Vehicles and Vehicles of Internet. :7–12.

Standardization and harmonization efforts have reached a consensus towards using a special-purpose Vehicular Public-Key Infrastructure (VPKI) in upcoming Vehicular Communication (VC) systems. However, there are still several technical challenges with no conclusive answers; one such an important yet open challenge is the acquisition of short-term credentials, pseudonym: how should each vehicle interact with the VPKI, e.g., how frequently and for how long? Should each vehicle itself determine the pseudonym lifetime? Answering these questions is far from trivial. Each choice can affect both the user privacy and the system performance and possibly, as a result, its security. In this paper, we make a novel systematic effort to address this multifaceted question. We craft three generally applicable policies and experimentally evaluate the VPKI system performance, leveraging two large-scale mobility datasets. We consider the most promising, in terms of efficiency, pseudonym acquisition policies; we find that within this class of policies, the most promising policy in terms of privacy protection can be supported with moderate overhead. Moreover, in all cases, this work is the first to provide tangible evidence that the state-of-the-art VPKI can serve sizable areas or domain with modest computing resources.

Yao, Qingsong, Ma, Jianfeng, Cong, Sun, Li, Xinghua, Li, Jinku.  2016.  Attack Gives Me Power: DoS-defending Constant-time Privacy-preserving Authentication of Low-cost Devices Such As Backscattering RFID Tags. Proceedings of the 3rd ACM Workshop on Mobile Sensing, Computing and Communication. :23–28.

Denial of service (DoS) attack is a great threaten to privacy-preserving authentication protocols for low-cost devices such as RFID. During such attack, the legal internal states can be consumed by the DoS attack. Then the attacker can observe the behavior of the attacked tag in authentication to break privacy. Due to the inadequate energy and computing power, the low cost devices can hardly defend against the DoS attacks. In this paper, we propose a new insight of the DoS attack on tags and leverage the attacking behavior as a new source of power harvesting. In this way, a low-cost device such as a tag grows more and more powerful under DoS attack. Finally, it can defend against the DoS attack. We further propose a protocol that enables DoS-defending constant-time privacy-preserving authentication.

Huang, Baohua, Jia, Fengwei, Yu, Jiguo, Cheng, Wei.  2016.  A Transparent Framework Based on Accessing Bridge and Mobile App for Protecting Database Privacy with PKI. Proceedings of the 1st ACM Workshop on Privacy-Aware Mobile Computing. :43–50.

With the popularity of cloud computing, database outsourcing has been adopted by many companies. However, database owners may not 100% trust their database service providers. As a result, database privacy becomes a key issue for protecting data from the database service providers. Many researches have been conducted to address this issue, but few of them considered the simultaneous transparent support of existing DBMSs (Database Management Systems), applications and RADTs (Rapid Application Development Tools). A transparent framework based on accessing bridge and mobile app for protecting database privacy with PKI (Public Key Infrastructure) is, therefore, proposed to fill the blank. The framework uses PKI as its security base and encrypts sensitive data with data owners' public keys to protect data privacy. Mobile app is used to control private key and decrypt data, so that accessing sensitive data is completely controlled by data owners in a secure and independent channel. Accessing bridge utilizes database accessing middleware standard to transparently support existing DBMSs, applications and RADTs. This paper presents the framework, analyzes its transparency and security, and evaluates its performance via experiments.

Hu, Chunqiang, Li, Ruinian, Li, Wei, Yu, Jiguo, Tian, Zhi, Bie, Rongfang.  2016.  Efficient Privacy-preserving Schemes for Dot-product Computation in Mobile Computing. Proceedings of the 1st ACM Workshop on Privacy-Aware Mobile Computing. :51–59.

Many applications of mobile computing require the computation of dot-product of two vectors. For examples, the dot-product of an individual's genome data and the gene biomarkers of a health center can help detect diseases in m-Health, and that of the interests of two persons can facilitate friend discovery in mobile social networks. Nevertheless, exposing the inputs of dot-product computation discloses sensitive information about the two participants, leading to severe privacy violations. In this paper, we tackle the problem of privacy-preserving dot-product computation targeting mobile computing applications in which secure channels are hardly established, and the computational efficiency is highly desirable. We first propose two basic schemes and then present the corresponding advanced versions to improve efficiency and enhance privacy-protection strength. Furthermore, we theoretically prove that our proposed schemes can simultaneously achieve privacy-preservation, non-repudiation, and accountability. Our numerical results verify the performance of the proposed schemes in terms of communication and computational overheads.

He, Zaobo, Cai, Zhipeng, Li, Yingshu.  2016.  Customized Privacy Preserving for Classification Based Applications. Proceedings of the 1st ACM Workshop on Privacy-Aware Mobile Computing. :37–42.

The rise of sensor-equipped smart phones has enabled a variety of classification based applications that provide personalized services based on user data extracted from sensor readings. However, malicious applications aggressively collect sensitive information from inherent user data without permissions. Furthermore, they can mine sensitive information from user data just in the classification process. These privacy threats raise serious privacy concerns. In this paper, we introduce two new privacy concerns which are inherent-data privacy and latent-data privacy. We propose a framework that enables a data-obfuscation mechanism to be developed easily. It preserves latent-data privacy while guaranteeing satisfactory service quality. The proposed framework preserves privacy against powerful adversaries who have knowledge of users' access pattern and the data-obfuscation mechanism. We validate our framework towards a real classification-orientated dataset. The experiment results confirm that our framework is superior to the basic obfuscation mechanism.

Jin, Haiming, Su, Lu, Xiao, Houping, Nahrstedt, Klara.  2016.  INCEPTION: Incentivizing Privacy-preserving Data Aggregation for Mobile Crowd Sensing Systems. Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing. :341–350.

The recent proliferation of human-carried mobile devices has given rise to mobile crowd sensing (MCS) systems that outsource the collection of sensory data to the public crowd equipped with various mobile devices. A fundamental issue in such systems is to effectively incentivize worker participation. However, instead of being an isolated module, the incentive mechanism usually interacts with other components which may affect its performance, such as data aggregation component that aggregates workers' data and data perturbation component that protects workers' privacy. Therefore, different from past literature, we capture such interactive effect, and propose INCEPTION, a novel MCS system framework that integrates an incentive, a data aggregation, and a data perturbation mechanism. Specifically, its incentive mechanism selects workers who are more likely to provide reliable data, and compensates their costs for both sensing and privacy leakage. Its data aggregation mechanism also incorporates workers' reliability to generate highly accurate aggregated results, and its data perturbation mechanism ensures satisfactory protection for workers' privacy and desirable accuracy for the final perturbed results. We validate the desirable properties of INCEPTION through theoretical analysis, as well as extensive simulations.

Annadata, Prasad, Eltarjaman, Wisam, Thurimella, Ramakrishna.  2016.  Person Detection Techniques for an IoT Based Emergency Evacuation Assistance System. Adjunct Proceedings of the 13th International Conference on Mobile and Ubiquitous Systems: Computing Networking and Services. :77–82.

Emergency evacuations during disasters minimize loss of lives and injuries. It is not surprising that emergency evacuation preparedness is mandatory for organizations in many jurisdictions. In the case of corporations, this requirement translates to considerable expenses, consisting of construction costs, equipment, recruitment, retention and training. In addition, required regular evacuation drills cause recurring expenses and loss of productivity. Any automation to assist in these drills and in actual evacuations can mean savings of costs, time and lives. Evacuation assistance systems rely on attendance systems that often fall short in accuracy, particularly in environments with lot of "non-swipers" (customers, visitors, etc.,). A critical question to answer in the case of an emergency is "How many people are still in the building?". This number is calculated by comparing the number of people gathered at assembly point to the last known number of people inside the building. An IoT based system can enhance the answer to that question by providing the number of people in the building, provide their last known locations in an automated fashion and even automate the reconciliation process. Our proposed system detects the people in the building automatically using multiple channels such as WiFi and motion detection. Such a system needs the ability to link specific identifiers to persons reliably. In this paper we present our statistics and heuristics based solutions for linking detected identifiers as belonging to an actual persons in a privacy preserving manner using IoT technologies.

Kellaris, Georgios, Kollios, George, Nissim, Kobbi, O'Neill, Adam.  2016.  Generic Attacks on Secure Outsourced Databases. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :1329–1340.

Recently, various protocols have been proposed for securely outsourcing database storage to a third party server, ranging from systems with "full-fledged" security based on strong cryptographic primitives such as fully homomorphic encryption or oblivious RAM, to more practical implementations based on searchable symmetric encryption or even on deterministic and order-preserving encryption. On the flip side, various attacks have emerged that show that for some of these protocols confidentiality of the data can be compromised, usually given certain auxiliary information. We take a step back and identify a need for a formal understanding of the inherent efficiency/privacy trade-off in outsourced database systems, independent of the details of the system. We propose abstract models that capture secure outsourced storage systems in sufficient generality, and identify two basic sources of leakage, namely access pattern and ommunication volume. We use our models to distinguish certain classes of outsourced database systems that have been proposed, and deduce that all of them exhibit at least one of these leakage sources. We then develop generic reconstruction attacks on any system supporting range queries where either access pattern or communication volume is leaked. These attacks are in a rather weak passive adversarial model, where the untrusted server knows only the underlying query distribution. In particular, to perform our attack the server need not have any prior knowledge about the data, and need not know any of the issued queries nor their results. Yet, the server can reconstruct the secret attribute of every record in the database after about \$Ntextasciicircum4\$ queries, where N is the domain size. We provide a matching lower bound showing that our attacks are essentially optimal. Our reconstruction attacks using communication volume apply even to systems based on homomorphic encryption or oblivious RAM in the natural way. Finally, we provide experimental results demonstrating the efficacy of our attacks on real datasets with a variety of different features. On all these datasets, after the required number of queries our attacks successfully recovered the secret attributes of every record in at most a few seconds.

Chen, Yu-Chi, Chow, Sherman S.M., Chung, Kai-Min, Lai, Russell W.F., Lin, Wei-Kai, Zhou, Hong-Sheng.  2016.  Cryptography for Parallel RAM from Indistinguishability Obfuscation. Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science. :179–190.

Since many cryptographic schemes are about performing computation on data, it is important to consider a computation model which captures the prominent features of modern system architecture. Parallel random access machine (PRAM) is such an abstraction which not only models multiprocessor platforms, but also new frameworks supporting massive parallel computation such as MapReduce. In this work, we explore the feasibility of designing cryptographic solutions for the PRAM model of computation to achieve security while leveraging the power of parallelism and random data access. We demonstrate asymptotically optimal solutions for a wide-range of cryptographic tasks based on indistinguishability obfuscation. In particular, we construct the first publicly verifiable delegation scheme with privacy in the persistent database setting, which allows a client to privately delegate both computation and data to a server with optimal efficiency. Specifically, the server can perform PRAM computation on private data with parallel efficiency preserved (up to poly-logarithmic overhead). Our results also cover succinct randomized encoding, searchable encryption, functional encryption, secure multiparty computation, and indistinguishability obfuscation for PRAM. We obtain our results in a modular way through a notion of computational-trace indistinguishability obfuscation (CiO), which may be of independent interests.

Zhang, Rui, Xue, Rui, Yu, Ting, Liu, Ling.  2016.  Dynamic and Efficient Private Keyword Search over Inverted Index–Based Encrypted Data. ACM Trans. Internet Technol.. 16:21:1–21:20.

Querying over encrypted data is gaining increasing popularity in cloud-based data hosting services. Security and efficiency are recognized as two important and yet conflicting requirements for querying over encrypted data. In this article, we propose an efficient private keyword search (EPKS) scheme that supports binary search and extend it to dynamic settings (called DEPKS) for inverted index–based encrypted data. First, we describe our approaches of constructing a searchable symmetric encryption (SSE) scheme that supports binary search. Second, we present a novel framework for EPKS and provide its formal security definitions in terms of plaintext privacy and predicate privacy by modifying Shen et al.’s security notions [Shen et al. 2009]. Third, built on the proposed framework, we design an EPKS scheme whose complexity is logarithmic in the number of keywords. The scheme is based on the groups of prime order and enjoys strong notions of security, namely statistical plaintext privacy and statistical predicate privacy. Fourth, we extend the EPKS scheme to support dynamic keyword and document updates. The extended scheme not only maintains the properties of logarithmic-time search efficiency and plaintext privacy and predicate privacy but also has fewer rounds of communications for updates compared to existing dynamic search encryption schemes. We experimentally evaluate the proposed EPKS and DEPKS schemes and show that they are significantly more efficient in terms of both keyword search complexity and communication complexity than existing randomized SSE schemes.

Hoang, Thang, Yavuz, Attila Altay, Guajardo, Jorge.  2016.  Practical and Secure Dynamic Searchable Encryption via Oblivious Access on Distributed Data Structure. Proceedings of the 32Nd Annual Conference on Computer Security Applications. :302–313.

Dynamic Searchable Symmetric Encryption (DSSE) allows a client to perform keyword searches over encrypted files via an encrypted data structure. Despite its merits, DSSE leaks search and update patterns when the client accesses the encrypted data structure. These leakages may create severe privacy problems as already shown, for example, in recent statistical attacks on DSSE. While Oblivious Random Access Memory (ORAM) can hide such access patterns, it incurs significant communication overhead and, therefore, it is not yet fully practical for cloud computing systems. Hence, there is a critical need to develop private access schemes over the encrypted data structure that can seal the leakages of DSSE while achieving practical search/update operations. In this paper, we propose a new oblivious access scheme over the encrypted data structure for searchable encryption purposes, that we call textlessutextgreaterDtextless/utextgreateristributed textlessutextgreaterOtextless/utextgreaterblivious textlessutextgreaterDtextless/utextgreaterata structure textlessutextgreaterDSSEtextless/utextgreater (DOD-DSSE). The main idea is to create a distributed encrypted incidence matrix on two non-colluding servers such that no arbitrary queries on these servers can be linked to each other. This strategy prevents not only recent statistical attacks on the encrypted data structure but also other potential threats exploiting query linkability. Our security analysis proves that DOD-DSSE ensures the unlink-ability of queries and, therefore, offers much higher security than traditional DSSE. At the same time, our performance evaluation demonstrates that DOD-DSSE is two orders of magnitude faster than ORAM-based techniques (e.g., Path ORAM), since it only incurs a small-constant number of communication overhead. That is, we deployed DOD-DSSE on geographically distributed Amazon EC2 servers, and showed that, a search/update operation on a very large dataset only takes around one second with DOD-DSSE, while it takes 3 to 13 minutes with Path ORAM-based methods.

Abdelraheem, Mohamed Ahmed, Gehrmann, Christian, Lindström, Malin, Nordahl, Christian.  2016.  Executing Boolean Queries on an Encrypted Bitmap Index. Proceedings of the 2016 ACM on Cloud Computing Security Workshop. :11–22.

We propose a simple and efficient searchable symmetric encryption scheme based on a Bitmap index that evaluates Boolean queries. Our scheme provides a practical solution in settings where communications and computations are very constrained as it offers a suitable trade-off between privacy and performance.

Yuan, Xingliang, Wang, Xinyu, Wang, Cong, Qian, Chen, Lin, Jianxiong.  2016.  Building an Encrypted, Distributed, and Searchable Key-value Store. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. :547–558.

Modern distributed key-value stores are offering superior performance, incremental scalability, and fine availability for data-intensive computing and cloud-based applications. Among those distributed data stores, the designs that ensure the confidentiality of sensitive data, however, have not been fully explored yet. In this paper, we focus on designing and implementing an encrypted, distributed, and searchable key-value store. It achieves strong protection on data privacy while preserving all the above prominent features of plaintext systems. We first design a secure data partition algorithm that distributes encrypted data evenly across a cluster of nodes. Based on this algorithm, we propose a secure transformation layer that supports multiple data models in a privacy-preserving way, and implement two basic APIs for the proposed encrypted key-value store. To enable secure search queries for secondary attributes of data, we leverage searchable symmetric encryption to design the encrypted secondary indexes which consider security, efficiency, and data locality simultaneously, and further enable secure query processing in parallel. For completeness, we present formal security analysis to demonstrate the strong security strength of the proposed designs. We implement the system prototype and deploy it to a cluster at Microsoft Azure. Comprehensive performance evaluation is conducted in terms of Put/Get throughput, Put/Get latency under different workloads, system scaling cost, and secure query performance. The comparison with Redis shows that our prototype can function in a practical manner.

Zhao, Zengzhen, Guan, Qingxiao, Zhao, Xianfeng.  2016.  Constructing Near-optimal Double-layered Syndrome-Trellis Codes for Spatial Steganography. Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security. :139–148.

In this paper, we present a new kind of near-optimal double-layered syndrome-trellis codes (STCs) for spatial domain steganography. The STCs can hide longer message or improve the security with the same-length message comparing to the previous double-layered STCs. In our scheme, according to the theoretical deduction we can more precisely divide the secret payload into two parts which will be embedded in the first layer and the second layer of the cover respectively with binary STCs. When embed the message, we encourage to realize the double-layered embedding by ±1 modifications. But in order to further decrease the modifications and improve the time efficient, we allow few pixels to be modified by ±2. Experiment results demonstrate that while applying this double-layered STCs to the adaptive steganographic algorithms, the embedding modifications become more concentrative and the number decreases, consequently the security of steganography is improved.

Roque, Antonio, Bush, Kevin B., Degni, Christopher.  2016.  Security is About Control: Insights from Cybernetics. Proceedings of the Symposium and Bootcamp on the Science of Security. :17–24.

Cybernetic closed loop regulators are used to model socio-technical systems in adversarial contexts. Cybernetic principles regarding these idealized control loops are applied to show how the incompleteness of system models enables system exploitation. We consider abstractions as a case study of model incompleteness, and we characterize the ways that attackers and defenders interact in such a formalism. We end by arguing that the science of security is most like a military science, whose foundations are analytical and generative rather than normative.

Kirchler, Matthias, Herrmann, Dominik, Lindemann, Jens, Kloft, Marius.  2016.  Tracked Without a Trace: Linking Sessions of Users by Unsupervised Learning of Patterns in Their DNS Traffic. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :23–34.

Behavior-based tracking is an unobtrusive technique that allows observers to monitor user activities on the Internet over long periods of time – in spite of changing IP addresses. Previous work has employed supervised classifiers in order to link the sessions of individual users. However, classifiers need labeled training sessions, which are difficult to obtain for observers. In this paper we show how this limitation can be overcome with an unsupervised learning technique. We present a modified k-means algorithm and evaluate it on a realistic dataset that contains the Domain Name System (DNS) queries of 3,862 users. For this purpose, we simulate an observer that tries to track all users, and an Internet Service Provider that assigns a different IP address to every user on every day. The highest tracking accuracy is achieved within the subgroup of highly active users. Almost all sessions of 73% of the users in this subgroup can be linked over a period of 56 days. 19% of the highly active users can be traced completely, i.e., all their sessions are assigned to a single cluster. This fraction increases to 40% for shorter periods of seven days. As service providers may engage in behavior-based tracking to complement their existing profiling efforts, it constitutes a severe privacy threat for users of online services. Users can defend against behavior-based tracking by changing their IP address frequently, but this is cumbersome at the moment.

Mirsky, Yisroel, Shabtai, Asaf, Rokach, Lior, Shapira, Bracha, Elovici, Yuval.  2016.  SherLock vs Moriarty: A Smartphone Dataset for Cybersecurity Research. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :1–12.

In this paper we describe and share with the research community, a significant smartphone dataset obtained from an ongoing long-term data collection experiment. The dataset currently contains 10 billion data records from 30 users collected over a period of 1.6 years and an additional 20 users for 6 months (totaling 50 active users currently participating in the experiment). The experiment involves two smartphone agents: SherLock and Moriarty. SherLock collects a wide variety of software and sensor data at a high sample rate. Moriarty perpetrates various attacks on the user and logs its activities, thus providing labels for the SherLock dataset. The primary purpose of the dataset is to help security professionals and academic researchers in developing innovative methods of implicitly detecting malicious behavior in smartphones. Specifically, from data obtainable without superuser (root) privileges. To demonstrate possible uses of the dataset, we perform a basic malware analysis and evaluate a method of continuous user authentication.

Deo, Amit, Dash, Santanu Kumar, Suarez-Tangil, Guillermo, Vovk, Volodya, Cavallaro, Lorenzo.  2016.  Prescience: Probabilistic Guidance on the Retraining Conundrum for Malware Detection. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :71–82.

Malware evolves perpetually and relies on increasingly so- phisticated attacks to supersede defense strategies. Data-driven approaches to malware detection run the risk of becoming rapidly antiquated. Keeping pace with malware requires models that are periodically enriched with fresh knowledge, commonly known as retraining. In this work, we propose the use of Venn-Abers predictors for assessing the quality of binary classification tasks as a first step towards identifying antiquated models. One of the key benefits behind the use of Venn-Abers predictors is that they are automatically well calibrated and offer probabilistic guidance on the identification of nonstationary populations of malware. Our framework is agnostic to the underlying classification algorithm and can then be used for building better retraining strategies in the presence of concept drift. Results obtained over a timeline-based evaluation with about 90K samples show that our framework can identify when models tend to become obsolete.

Pevny, Tomas, Somol, Petr.  2016.  Discriminative Models for Multi-instance Problems with Tree Structure. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :83–91.

Modelling network traffic is gaining importance to counter modern security threats of ever increasing sophistication. It is though surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on a computer's all traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally-sized time windows for a large number of computers, where the only labels needed are (human) verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself learns discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, and demonstrate that the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with special structure reflecting two stacked multi instance problems. The main advantages of the proposed configuration include not only improved accuracy and ability to learn from gross labels, but also automatic learning of server types (together with their detectors) that are typically visited by infected computers.

Zhang, Hao, Yao, Danfeng(Daphne), Ramakrishnan, Naren.  2016.  Causality-based Sensemaking of Network Traffic for Android Application Security. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :47–58.

Malicious Android applications pose serious threats to mobile security. They threaten the data confidentiality and system integrity on Android devices. Monitoring runtime activities serves as an important technique for analyzing dynamic app behaviors. We design a triggering relation model for dynamically analyzing network traffic on Android devices. Our model enables one to infer the dependency of outbound network requests from the device. We describe a new machine learning approach for discovering the dependency of network requests. These request-level dependence relations are used to detect stealthy malware activities. Malicious requests are identified due to the lack of dependency with legitimate triggers. Our prototype is evaluated on 14GB network traffic data and system logs collected from an Android tablet. Experimental results show that our solution achieves a high accuracy (99.1%) in detecting malicious requests sent from new malicious apps.

Abdulla, Parosh Aziz, Aiswarya, C., Atig, Mohamed Faouzi, Montali, Marco, Rezine, Othmane.  2016.  Recency-Bounded Verification of Dynamic Database-Driven Systems. Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. :195–210.

We propose a formalism to model database-driven systems, called database manipulating systems (DMS). The actions of a (DMS) modify the current instance of a relational database by adding new elements into the database, deleting tuples from the relations and adding tuples to the relations. The elements which are modified by an action are chosen by (full) first-order queries. (DMS) is a highly expressive model and can be thought of as a succinct representation of an infinite state relational transition system, in line with similar models proposed in the literature. We propose monadic second order logic (MSO-FO) to reason about sequences of database instances appearing along a run. Unsurprisingly, the linear-time model checking problem of (DMS) against (MSO-FO) is undecidable. Towards decidability, we propose under-approximate model checking of (DMS), where the under-approximation parameter is the "bound on recency". In a k-recency-bounded run, only the most recent k elements in the current active domain may be modified by an action. More runs can be verified by increasing the bound on recency. Our main result shows that recency-bounded model checking of (DMS) against (MSO-FO) is decidable, by a reduction to the satisfiability problem of MSO over nested words.

Jing, Xiao-Yuan, Qi, Fumin, Wu, Fei, Xu, Baowen.  2016.  Missing Data Imputation Based on Low-rank Recovery and Semi-supervised Regression for Software Effort Estimation. Proceedings of the 38th International Conference on Software Engineering. :607–618.

Software effort estimation (SEE) is a crucial step in software development. Effort data missing usually occurs in real-world data collection. Focusing on the missing data problem, existing SEE methods employ the deletion, ignoring, or imputation strategy to address the problem, where the imputation strategy was found to be more helpful for improving the estimation performance. Current imputation methods in SEE use classical imputation techniques for missing data imputation, yet these imputation techniques have their respective disadvantages and might not be appropriate for effort data. In this paper, we aim to provide an effective solution for the effort data missing problem. Incompletion includes the drive factor missing case and effort label missing case. We introduce the low-rank recovery technique for addressing the drive factor missing case. And we employ the semi-supervised regression technique to perform imputation in the case of effort label missing. We then propose a novel effort data imputation approach, named low-rank recovery and semi-supervised regression imputation (LRSRI). Experiments on 7 widely used software effort datasets indicate that: (1) the proposed approach can obtain better effort data imputation effects than other methods; (2) the imputed data using our approach can apply to multiple estimators well.

Fang, Yi, Godavarthy, Archana, Lu, Haibing.  2016.  A Utility Maximization Framework for Privacy Preservation of User Generated Content. Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval. :281–290.

The prodigious amount of user-generated content continues to grow at an enormous rate. While it greatly facilitates the flow of information and ideas among people and communities, it may pose great threat to our individual privacy. In this paper, we demonstrate that the private traits of individuals can be inferred from user-generated content by using text classification techniques. Specifically, we study three private attributes on Twitter users: religion, political leaning, and marital status. The ground truth labels of the private traits can be readily collected from the Twitter bio field. Based on the tweets posted by the users and their corresponding bios, we show that text classification yields a high accuracy of identification of these personal attributes, which poses a great privacy risk on user-generated content. We further propose a constrained utility maximization framework for preserving user privacy. The goal is to maximize the utility of data when modifying the user-generated content, while degrading the prediction performance of the adversary. The KL divergence is minimized between the prior knowledge about the private attribute and the posterior probability after seeing the user-generated data. Based on this proposed framework, we investigate several specific data sanitization operations for privacy preservation: add, delete, or replace words in the tweets. We derive the exact transformation of the data under each operation. The experiments demonstrate the effectiveness of the proposed framework.

Bender, Michael A., Berry, Jonathan W., Johnson, Rob, Kroeger, Thomas M., McCauley, Samuel, Phillips, Cynthia A., Simon, Bertrand, Singh, Shikha, Zage, David.  2016.  Anti-Persistence on Persistent Storage: History-Independent Sparse Tables and Dictionaries. Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. :289–302.

We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information by examining the bit representation of the data structure that is not already available through the API. We show how to build a history-independent cache-oblivious B-tree and a history-independent external-memory skip list. One of the main contributions is a data structure we build on the way–-a history-independent packed-memory array (PMA). The PMA supports efficient range queries, one of the most important operations for answering database queries. Our HI PMA matches the asymptotic bounds of prior non-HI packed-memory arrays and sparse tables. Specifically, a PMA maintains a dynamic set of elements in sorted order in a linear-sized array. Inserts and deletes take an amortized O(log2 N) element moves with high probability. Simple experiments with our implementation of HI PMAs corroborate our theoretical analysis. Comparisons to regular PMAs give preliminary indications that the practical cost of adding history-independence is not too large. Our HI cache-oblivious B-tree bounds match those of prior non-HI cache-oblivious B-trees. Searches take O(logB N) I/Os; inserts and deletes take O((log2 N)/B+ logB N) amortized I/Os with high probability; and range queries returning k elements take O(logB N + k/B) I/Os. Our HI external-memory skip list achieves optimal bounds with high probability, analogous to in-memory skip lists: O(logB N) I/Os for point queries and amortized O(logB N) I/Os for inserts/deletes. Range queries returning k elements run in O(logB N + k/B) I/Os. In contrast, the best possible high-probability bounds for inserting into the folklore B-skip list, which promotes elements with probability 1/B, is just Theta(log N) I/Os. This is no better than the bounds one gets from running an in-memory skip list in external memory.