Biblio
The huge amount of user log data collected by search engine providers creates new opportunities to understand user loyalty and defection behavior at an unprecedented scale. However, this also poses a great challenge to analyze the behavior and glean insights into the complex, large data. In this paper, we introduce LoyalTracker, a visual analytics system to track user loyalty and switching behavior towards multiple search engines from the vast amount of user log data. We propose a new interactive visualization technique (flow view) based on a flow metaphor, which conveys a proper visual summary of the dynamics of user loyalty of thousands of users over time. Two other visualization techniques, a density map and a word cloud, are integrated to enable analysts to gain further insights into the patterns identified by the flow view. Case studies and the interview with domain experts are conducted to demonstrate the usefulness of our technique in understanding user loyalty and switching behavior in search engines.
Cloud computing is widely deployed to handle challenges such as big data processing and storage. Due to the outsourcing and sharing feature of cloud computing, security is one of the main concerns that hinders the end users to shift their businesses to the cloud. A lot of cryptographic techniques have been proposed to alleviate the data security issues in cloud computing, but most of these works focus on solving a specific security problem such as data sharing, comparison, searching, etc. At the same time, little efforts have been done on program security and formalization of the security requirements in the context of cloud computing. We propose a formal definition of the security of cloud computing, which captures the essence of the security requirements of both data and program. Analysis of some existing technologies under the proposed definition shows the effectiveness of the definition. We also give a simple look-up table based solution for secure cloud computing which satisfies the given definition. As FPGA uses look-up table as its main computation component, it is a suitable hardware platform for the proposed secure cloud computing scheme. So we use FPGAs to implement the proposed solution for k-means clustering algorithm, which shows the effectiveness of the proposed solution.
In the era of big data, many users and companies start to move their data to cloud storage to simplify data management and reduce data maintenance cost. However, security and privacy issues become major concerns because third-party cloud service providers are not always trusty. Although data contents can be protected by encryption, the access patterns that contain important information are still exposed to clouds or malicious attackers. In this paper, we apply the ORAM algorithm to enable privacy-preserving access to big data that are deployed in distributed file systems built upon hundreds or thousands of servers in a single or multiple geo-distributed cloud sites. Since the ORAM algorithm would lead to serious access load unbalance among storage servers, we study a data placement problem to achieve a load balanced storage system with improved availability and responsiveness. Due to the NP-hardness of this problem, we propose a low-complexity algorithm that can deal with large-scale problem size with respect to big data. Extensive simulations are conducted to show that our proposed algorithm finds results close to the optimal solution, and significantly outperforms a random data placement algorithm.
In the last decade, the request for Internet access in heterogeneous environments keeps on growing, principally in mobile platforms such as buses, airplanes and trains. Consequently, several extensions and schemes have been introduced to achieve seamless handoff of mobile networks from one subnet to another. Even with these enhancements, the problem of maintaining the security concerns and availability has not been resolved yet, especially, the absence of authentication mechanism between network entities in order to avoid vulnerability from attacks. To eliminate the threats on the interface between the mobile access gateway (MAG) and the mobile router (MR) in improving fast PMIPv6-based network mobility (IFP-NEMO) protocol, we propose a lightweight mutual authentication mechanism in improving fast PMIPv6-based network mobility scheme (LMAIFPNEMO). This scheme uses authentication, authorization and accounting (AAA) servers to enhance the security of the protocol IFP-NEMO which allows the integration of improved fast proxy mobile IPv6 (PMIPv6) in network mobility (NEMO). We use only symmetric cryptographic, generated nonces and hash operation primitives to ensure a secure authentication procedure. Then, we analyze the security aspect of the proposed scheme and evaluate it using the automated validation of internet security protocols and applications (AVISPA) software which has proved that authentication goals are achieved.
Billions of dollars of services and goods are sold through email marketing. Subject lines have a strong influence on open rates of the e-mails, as the consumers often open e-mails based on the subject. Traditionally, the e-mail-subject lines are compiled based on the best assessment of the human editors. We propose a method to help the editors by predicting subject line open rates by learning from past subject lines. The method derives different types of features from subject lines based on keywords, performance of past subject lines and syntax. Furthermore, we evaluate the contribution of individual subject-line keywords to overall open rates based on an iterative method-namely Attribution Scoring - and use this for improved predictions. A random forest based model is trained to combine these features to predict the performance. We use a dataset of more than a hundred thousand different subject lines with many billions of impressions to train and test the method. The proposed method shows significant improvement in prediction accuracy over the baselines for both new as well as already used subject lines.
Denial of Service (DoS) and Distributed Denial of Service (DDoS) attack, exhausts the resources of server/service and makes it unavailable for legitimate users. With increasing use of online services and attacks on these services, the importance of Intrusion Detection System (IDS) for detection of DoS/DDoS attacks has also grown. Detection accuracy & CPU utilization of Data mining based IDS is directly proportional to the quality of training dataset used to train it. Various preprocessing methods like normalization, discretization, fuzzification are used by researchers to improve the quality of training dataset. This paper evaluates the effect of various data preprocessing methods on the detection accuracy of DoS/DDoS attack detection IDS and proves that numeric to binary preprocessing method performs better compared to other methods. Experimental results obtained using KDD 99 dataset are provided to support the efficiency of proposed combination.
Dual Energy CT (DECT) has recently gained significant research interest owing to its ability to discriminate materials, and hence is widely applied in the field of nuclear safety and security inspection. With the current technological developments, DECT can be typically realized by using two sets of detectors, one for detecting lower energy X-rays and another for detecting higher energy X-rays. This makes the imaging system expensive, limiting its practical implementation. In 2009, our group performed a preliminary study on a new low-cost system design, using only a complete data set for lower energy level and a sparse data set for the higher energy level. This could significantly reduce the cost of the system, as it contained much smaller number of detector elements. Reconstruction method is the key point of this system. In the present study, we further validated this system and proposed a robust method, involving three main steps: (1) estimation of the missing data iteratively with TV constraints; (2) use the reconstruction from the complete lower energy CT data set to form an initial estimation of the projection data for higher energy level; (3) use ordered views to accelerate the computation. Numerical simulations with different number of detector elements have also been examined. The results obtained in this study demonstrate that 1 + 14% CT data is sufficient enough to provide a rather good reconstruction of both the effective atomic number and electron density distributions of the scanned object, instead of 2 sets CT data.
This paper investigates mechanisms that guarantee secure information flow in a computer system. These mechanisms are examined within a mathematical framework suitable for formulating the requirements of secure information flow among security classes. The central component of the model is a lattice structure derived from the security classes and justified by the semantics of information flow. The lattice properties permit concise formulations of the security requirements of different existing systems and facilitate the construction of mechanisms that enforce security. The model provides a unifying view of all systems that restrict information flow, enables a classification of them according to security objectives, and suggests some new approaches. It also leads to the construction of automatic program certification mechanisms for verifying the secure flow of information through a program.
This article was identified by the SoS Best Scientific Cybersecurity Paper Competition Distinguished Experts as a Science of Security Significant Paper.
The Science of Security Paper Competition was developed to recognize and honor recently published papers that advance the science of cybersecurity. During the development of the competition, members of the Distinguished Experts group suggested that listing papers that made outstanding contributions, empirical or theoretical, to the science of cybersecurity in earlier years would also benefit the research community.
Online cyber threat descriptions are rich, but little research has attempted to systematically analyze these descriptions. In this paper, we process and analyze two of Symantec’s online threat description corpora. The Anti-Virus (AV) corpus contains descriptions of more than 12,400 threats detected by Symantec’s AV, and the Intrusion Prevention System (IPS) corpus contains descriptions of more than 2,700 attacks detected by Symantec’s IPS. In our analysis, we quantify the over time evolution of threat severity and type in the corpora. We also assess the amount of time Symantec takes to release signatures for newly discovered threats. Our analysis indicates that a very small minority of threats in the AV corpus are high-severity, whereas the majority of attacks in the IPS corpus are high-severity. Moreover, we find that the prevalence of different threat types such as worms and viruses in the corpora varies considerably over time. Finally, we find that Symantec prioritizes releasing signatures for fast propagating threats.
In highly configurable systems the configuration space is too big for (re-)certifying every configuration in isolation. In this project, we combine software analysis with network analysis to detect which configuration options interact and which have local effects. Instead of analyzing a system as Linux and SELinux for every combination of configuration settings one by one (>102000 even considering compile-time configurations only), we analyze the effect of each configuration option once for the entire configuration space. The analysis will guide us to designs separating interacting configuration options in a core system and isolating orthogonal and less trusted configuration options from this core.
Information system developers and administrators often overlook critical security requirements and best practices. This may be due to lack of tools and techniques that allow practitioners to tailor security knowledge to their particular context. In order to explore the impact of new security methods, we must improve our ability to study the impact of security tools and methods on software and system development. In this paper, we present early findings of an experiment to assess the extent to which the number and type of examples used in security training stimuli can impact security problem solving. To motivate this research, we formulate hypotheses from analogical transfer theory in psychology. The independent variables include number of problem surfaces and schemas, and the dependent variable is the answer accuracy. Our study results do not show a statistically significant difference in performance when the number and types of examples are varied. We discuss the limitations, threats to validity and opportunities for future studies in this area.
According to a 2011 survey in healthcare, the most commonly reported breaches of protected health information involved employees snooping into medical records of friends and relatives. Logging mechanisms can provide a means for forensic analysis of user activity in software systems by proving that a user performed certain actions in the system. However, logging mechanisms often inconsistently capture user interactions with sensitive data, creating gaps in traces of user activity. Explicit design principles and systematic testing of logging mechanisms within the software development lifecycle may help strengthen the overall security of software. The objective of this research is to observe the current state of logging mechanisms by performing an exploratory case study in which we systematically evaluate logging mechanisms by supplementing the expected results of existing functional black-box test cases to include log output. We perform an exploratory case study of four open-source electronic health record (EHR) logging mechanisms: OpenEMR, OSCAR, Tolven eCHR, and WorldVistA. We supplement the expected results of 30 United States government-sanctioned test cases to include log output to track access of sensitive data. We then execute the test cases on each EHR system. Six of the 30 (20%) test cases failed on all four EHR systems because user interactions with sensitive data are not logged. We find that viewing protected data is often not logged by default, allowing unauthorized views of data to go undetected. Based on our results, we propose a set of principles that developers should consider when developing logging mechanisms to ensure the ability to capture adequate traces of user activity.