Biblio
under review
Encryption ransomware is malicious software that stealthily encrypts user files and demands a ransom to restore access to them. Several prior studies have developed systems to detect ransomware by monitoring the activities that typically occur during a ransomware attack. Unfortunately, by the time the ransomware is detected, some files have already been encrypted and the user is still required to pay a ransom to access them. Furthermore, ransomware variants can obtain kernel privilege, which allows them to terminate software-based defense systems such as anti-virus tools. While periodic backups have been explored as a means to mitigate ransomware, such backups incur storage overheads and remain vulnerable, as ransomware with kernel privilege can stop or destroy them. Ideally, we would like to defend against ransomware without relying on software-based solutions and without incurring the storage overheads of backups. To that end, this paper proposes FlashGuard, a ransomware-tolerant Solid State Drive (SSD) with a firmware-level recovery system that enables quick and effective recovery from encryption ransomware without relying on explicit backups. FlashGuard leverages the observation that an SSD already performs out-of-place writes to mitigate the long erase latency of flash memory. Therefore, when a page is updated or deleted, the older copy of that page remains in the SSD anyway. FlashGuard slightly modifies the SSD's garbage collection mechanism to retain the copies of data encrypted by ransomware and ensure effective data recovery. Our experiments with 1,447 manually labeled ransomware samples show that FlashGuard can efficiently restore files encrypted by ransomware. In addition, we demonstrate that FlashGuard has a negligible impact on the performance and lifetime of the SSD.
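A minimal sketch of the underlying idea, with hypothetical names and retention window (not FlashGuard's actual firmware): garbage collection defers reclaiming invalidated pages so pre-encryption copies survive:

```python
import time

RETENTION_WINDOW = 20 * 24 * 3600  # hypothetical: keep invalid pages ~20 days

class FlashPage:
    def __init__(self, data):
        self.data = data
        self.valid = True
        self.invalidated_at = None  # set when an overwrite invalidates this copy

    def invalidate(self):
        # An out-of-place update writes new data elsewhere and only marks the
        # old page invalid, so the pre-update (pre-encryption) content remains.
        self.valid = False
        self.invalidated_at = time.time()

def gc_can_reclaim(page):
    """Stock GC reclaims any invalid page; a ransomware-tolerant GC keeps
    invalidated pages until the retention window expires, so the host can
    roll files back to their pre-encryption copies."""
    if page.valid:
        return False
    return time.time() - page.invalidated_at > RETENTION_WINDOW
```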
In recommender systems based on low-rank factorization of a partially observed user-item matrix, a common phenomenon that plagues many otherwise effective models is the interleaving of good and spurious recommendations in the top-K results. A single spurious recommendation can dramatically impact the perceived quality of a recommender system. Spurious recommendations do not result in serendipitous discoveries but rather in cognitive dissonance. In this work, we investigate folding, a major contributing factor to spurious recommendations. Folding refers to the unintentional overlap of disparate groups of users and items in the low-rank embedding vector space, induced by improper handling of missing data. We formally define a metric that quantifies the severity of folding in a trained system, to assist in diagnosing its potential to make inappropriate recommendations. The folding metric complements existing information retrieval metrics that focus on the number of good recommendations and their ranks but ignore the impact of undesired recommendations. We motivate the folding metric definition on synthetic data and evaluate its effectiveness on both synthetic and real-world datasets. In studying the relationship between the folding metric and other characteristics of recommender systems, we observe that optimizing for goodness metrics can lead to high folding and thus more spurious recommendations.
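The abstract does not reproduce the metric's definition; as a purely illustrative probe (our own construction, not the paper's folding metric), one could measure, on synthetic data with known disjoint groups, the fraction of top-K slots occupied by out-of-group items in the dot-product embedding space:

```python
import numpy as np

def folding_probe(U, V, user_groups, item_groups, k=10):
    """Illustrative only. U: (num_users, d) user embeddings; V: (num_items, d)
    item embeddings; user_groups/item_groups: integer group id per user/item
    (numpy arrays). Returns the fraction of top-k slots filled by items from
    a group the user does not belong to; a high value suggests folding."""
    scores = U @ V.T                         # predicted affinities
    topk = np.argsort(-scores, axis=1)[:, :k]
    cross = item_groups[topk] != user_groups[:, None]
    return cross.mean()
```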
In open network environments, unfamiliar entities can establish mutual trust through Automated Trust Negotiation (ATN), which is based on the exchange of digital credentials. In traditional ATN, a required attribute certificate is either satisfied or not, and every certificate carries the same importance in the negotiation strategy, which can cause unnecessary negotiation failures. In practice, however, attribute satisfaction is not simply 0 or 1: it often lies between 0 and 1, so satisfaction degrees differ and the negotiation strategy needs to be quantified. This paper analyzes the negotiation process under fuzzy satisfaction degrees in order to further improve the efficiency and accuracy of trust establishment.
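As a hypothetical illustration of a quantified strategy (not the paper's model), one could weight each credential's satisfaction degree in [0, 1] by its importance and declare success when the aggregate crosses a threshold:

```python
def negotiation_score(satisfactions, weights, threshold=0.7):
    """Hypothetical fuzzy strategy: credential i is satisfied to degree
    s_i in [0, 1] and carries importance weight w_i; negotiation succeeds
    when the weighted satisfaction reaches the threshold, instead of
    requiring every certificate to be fully satisfied."""
    total = sum(w * s for w, s in zip(weights, satisfactions))
    return total / sum(weights) >= threshold

# Example: three credentials with partial satisfaction degrees.
print(negotiation_score([1.0, 0.6, 0.8], [3, 1, 2]))  # True: 5.2/6 >= 0.7
```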
To meet growing railway-transportation demand, a new train control system, the communication-based train control (CBTC) system, aims to maximize the capacity of train lines by reducing the headway between trains. However, wireless communication exposes the CBTC system to new security threats. Due to the cyber-physical nature of the CBTC system, a jamming attack can damage the physical part of the train system by disrupting its communications. To address this issue, we develop a secure framework that mitigates the impact of jamming attacks based on a security criterion. At the cyber layer, we apply a multi-channel model to enhance communication reliability and develop a zero-sum stochastic game to capture the interactions between the transmitter and the jammer. We present analytical results and apply dynamic programming to find the equilibrium of the stochastic game. Finally, experimental results are provided to evaluate the performance of the proposed secure mechanism.
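A minimal sketch of value iteration for such a zero-sum stochastic game, under simplifying assumptions: finite states and actions, a known transition kernel, and pure-strategy max-min at each state (the true equilibrium generally requires mixed strategies solved via linear programming):

```python
import numpy as np

def shapley_value_iteration(R, P, gamma=0.9, iters=500):
    """Zero-sum stochastic game between a transmitter (row, maximizer)
    and a jammer (column, minimizer).
    R[s, a, b]: stage payoff; P[s, a, b, s']: transition probabilities."""
    S = R.shape[0]
    V = np.zeros(S)
    for _ in range(iters):
        # One-step lookahead: stage payoff plus discounted expected value.
        Q = R + gamma * np.einsum('sabt,t->sab', P, V)
        # Pure-strategy max-min value per state (a simplification).
        V = Q.min(axis=2).max(axis=1)
    return V
```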
accepted
Reliable operation of electrical power systems in the presence of multiple critical N − k contingencies is an important challenge for system operators. Identifying all possible critical N − k contingencies in order to design effective mitigation strategies is computationally infeasible due to the combinatorial explosion of the search space. This paper describes two heuristic algorithms based on iterative pruning of the candidate contingency set to effectively and efficiently identify all critical N − k contingencies that result in system failure. The algorithms are applied to the standard IEEE 14-bus, IEEE 39-bus, and IEEE 57-bus systems, where they capture all possible critical N − k contingencies (for 1 ≤ k ≤ 9) without missing any dangerous contingency.
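A hedged sketch of the iterative-pruning idea (not the paper's exact algorithms): contingency sets are extended level by level, supersets of already-critical sets are pruned as dominated, and `causes_failure` is a hypothetical stand-in for a power-flow-based failure check:

```python
def find_critical_contingencies(components, causes_failure, k_max=9):
    """Enumerate N-k contingencies level by level with pruning.
    components: list of outage candidates (lines, generators, ...).
    causes_failure(subset) -> bool: hypothetical system-failure test."""
    critical, frontier = [], [()]
    for k in range(1, k_max + 1):
        next_frontier = []
        for base in frontier:
            # Extend in canonical order to avoid duplicate subsets.
            start = components.index(base[-1]) + 1 if base else 0
            for c in components[start:]:
                cand = base + (c,)
                if causes_failure(cand):
                    critical.append(cand)      # record; prune its supersets
                else:
                    next_frontier.append(cand)  # survivable: extend further
        frontier = next_frontier
    return critical
```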
A high-definition (HD) wide-dynamic-range video surveillance system is designed and implemented on a Field Programmable Gate Array (FPGA). The system is composed of three subsystems: video capture, wide-dynamic-range processing, and video display. Images are captured directly by a camera configured to alternate exposures, with long exposure in odd frames and short exposure in even frames. The video data stream is buffered in DDR2 SDRAM to obtain two adjacent frames; the long-exposure image is then fused with the short-exposure image pixel by pixel. The display subsystem outputs the result through an HDMI interface. The system is built on a Lattice ECP3-70EA FPGA, and the camera uses the Panasonic MN34229 sensor. Experimental results show that the system expands the dynamic range of HD video at 30 frames per second and a resolution of 1920×1080 pixels through real-time wide-dynamic-range (WDR) processing, and has high practical value.
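As an illustration of pixel-by-pixel exposure fusion (the abstract does not specify the FPGA's exact fusion arithmetic; the weighting below is a hypothetical software analogue): bright regions, where the long exposure saturates, are drawn mostly from the short exposure, and dark regions mostly from the long exposure.

```python
import numpy as np

def wdr_fuse(long_exp, short_exp):
    """Fuse a long/short exposure pair of uint8 frames, pixel by pixel.
    The 0.8 knee and linear ramp are illustrative choices."""
    long_f = long_exp.astype(np.float32) / 255.0
    short_f = short_exp.astype(np.float32) / 255.0
    w = np.clip((long_f - 0.8) / 0.2, 0.0, 1.0)  # weight ramps up near saturation
    fused = (1.0 - w) * long_f + w * short_f
    return (fused * 255.0).astype(np.uint8)
```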
With millions of apps available to users, the mobile app market is rapidly becoming very crowded. Given the intense competition, time to market is a critical factor for the success and profitability of an app. To shorten the development cycle, developers often focus their efforts on the unique features and workflows of their apps and rely on third-party Open Source Software (OSS) for common features. Unfortunately, despite its benefits, careless use of OSS can introduce significant legal and security risks, which, if ignored, can not only jeopardize the security and privacy of end users but also cause app developers significant financial loss. However, tracking OSS components, their versions, and their interdependencies can be very tedious and error-prone, particularly if an OSS component is imported with little to no knowledge of its provenance. We therefore propose OSSPolice, a scalable and fully automated tool that allows mobile app developers to quickly analyze their apps and identify free-software license violations as well as usage of known vulnerable versions of OSS. OSSPolice introduces a novel hierarchical indexing scheme that achieves both high scalability and accuracy, and it can efficiently compare app binaries against a database of hundreds of thousands of OSS sources (billions of lines of code). We populated OSSPolice with 60K C/C++ and 77K Java OSS sources and analyzed 1.6M free Google Play Store apps. Our results show that 1) over 40K apps potentially violate GPL/AGPL licensing terms, and 2) over 100K apps use known vulnerable versions of OSS. Further analysis shows that developers violate GPL/AGPL licensing terms due to a lack of alternatives and use vulnerable versions of OSS despite efforts from companies like Google to improve app security. OSSPolice is available on GitHub.
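As a heavily simplified illustration of matching app binaries against an OSS feature database (not OSSPolice's actual hierarchical index), one could invert a per-version feature index and flag versions whose features are largely present in the app:

```python
from collections import defaultdict

def build_index(oss_versions):
    """oss_versions: {(oss_name, version): set of extracted features,
    e.g. string literals or exported function names}. Illustrative only."""
    index = defaultdict(set)
    for key, feats in oss_versions.items():
        for f in feats:
            index[f].add(key)
    return index

def match(app_features, index, oss_versions, threshold=0.6):
    """Return (oss_name, version) pairs whose feature sets are covered by the
    app's extracted features at or above the given ratio."""
    hits = defaultdict(int)
    for f in app_features:
        for key in index.get(f, ()):
            hits[key] += 1
    return [key for key, n in hits.items()
            if n / len(oss_versions[key]) >= threshold]
```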
Recent studies have shown that adding explicit social trust information to social recommendation significantly improves the prediction accuracy of ratings, but clear trust data among users is difficult to obtain in real life. Researchers have studied and proposed trust measures to calculate and predict the interaction and trust between users. In this article, a method for extracting social trust relationships based on the Hellinger distance is proposed, in which user similarity is calculated via the f-divergence between one-side nodes in user-item bipartite networks. Then, a new matrix factorization model based on implicit social relationships is proposed by adding the extracted implicit social relations into an improved matrix factorization. The experimental results show that recommending with the extracted implicit social trust performs almost as well as using actual explicit user trust ratings, and that when explicit trust data cannot be obtained, our method outperforms traditional algorithms.
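For reference, the Hellinger distance between two discrete distributions is a bounded f-divergence in [0, 1]; a minimal sketch follows, where applying it to users' normalized item-interaction distributions is our illustrative reading of the paper's construction:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions:
    H(P, Q) = sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize to probability vectors
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# A similarity score for two users (one side of the bipartite network)
# can then be defined as, e.g., 1 - hellinger(p, q).
```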
Third-party IME (Input Method Editor) apps are often Android users' preferred means of text input. In this paper, we first discuss the insecurity of IME apps, including Potentially Harmful Apps (PHA) and malicious IME apps, which may leak users' sensitive keystrokes. Existing defense systems, such as I-BOX, are vulnerable to the prefix-substitution attack and the colluding attack due to their post-IME nature. We provide a deeper understanding that all designs with the post-IME nature are subject to prefix-substitution and colluding attacks. To remedy these flaws of post-IME systems, we propose a new idea, pre-IME, which guarantees that the "Is this touch event a sensitive keystroke?" analysis always accesses user touch events prior to the execution of any IME app code. We design an innovative TrustZone-based framework named IM-Visor which has the pre-IME nature. Specifically, IM-Visor creates an isolation environment named STIE as soon as a user intends to type on a soft keyboard; the STIE then intercepts, translates, and analyzes the user's touch input. If the input is sensitive, the translated keystrokes are delivered to user apps through a trusted path. Otherwise, IM-Visor replays non-sensitive keystroke touch events for IME apps or replays non-keystroke touch events for other apps. A prototype of IM-Visor has been implemented and tested with several of the most popular IMEs. The experimental results show that IM-Visor incurs small runtime overheads.
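A control-flow sketch of the pre-IME routing described in the abstract; the real IM-Visor logic runs inside a TrustZone secure world, and every name below is a hypothetical stand-in for the STIE's interception, translation, and routing:

```python
SENSITIVE_FIELDS = {"password", "pin"}  # hypothetical sensitivity policy

def translate_to_keystroke(event):
    """Map a raw touch event to a keystroke; None for non-keystroke touches."""
    return event.get("key")

def dispatch_touch_event(event, focused_field):
    """Classify the touch *before* any IME code runs, then route it."""
    key = translate_to_keystroke(event)
    if key is None:
        return ("replay_to_app", event)   # non-keystroke touch: other apps
    if focused_field in SENSITIVE_FIELDS:
        return ("trusted_path", key)      # sensitive: never exposed to the IME
    return ("replay_to_ime", event)       # non-sensitive: IME handles it

print(dispatch_touch_event({"key": "a"}, "password"))  # ('trusted_path', 'a')
```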
Modbus over TCP/IP is one of the most popular industrial network protocols and is widely used in critical infrastructures. However, the vulnerabilities of the Modbus TCP protocol have attracted wide public concern. Traditional intrusion detection methods can identify some intrusion behaviors, but problems remain. In this paper, we present an innovative approach, SD-IDS (Stereo Depth IDS), designed to perform real-time deep inspection of Modbus TCP traffic. SD-IDS is composed of two parts: rule extraction and deep inspection. The rule extraction module not only analyzes the characteristics of industrial traffic but also explores the semantic relationships among the key fields of the Modbus TCP protocol. The deep inspection module performs rule-based anomaly intrusion detection. Furthermore, we use online tests to evaluate the performance of our SD-IDS system. Our approach achieves low false positive and false negative rates.
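As a hedged illustration of rule-based inspection of Modbus TCP traffic (not SD-IDS's actual rules), the sketch below parses the standard MBAP header (transaction id, protocol id, length, unit id, then function code) and applies a hypothetical function-code whitelist and a length-consistency check:

```python
import struct

ALLOWED_FUNCTION_CODES = {1, 2, 3, 4, 5, 6, 15, 16}  # example policy only

def inspect_modbus_frame(frame: bytes):
    """Inspect one Modbus/TCP application data unit."""
    if len(frame) < 8:
        return "malformed: frame too short"
    tid, pid, length, uid, fc = struct.unpack(">HHHBB", frame[:8])
    if pid != 0:
        return "alert: protocol id must be 0 for Modbus"
    if length != len(frame) - 6:   # length counts unit id + PDU bytes
        return "alert: MBAP length field inconsistent with frame size"
    if fc not in ALLOWED_FUNCTION_CODES:
        return f"alert: function code {fc} not permitted by policy"
    return "pass"
```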
To investigate how the applied DC current, excitation source, excitation loop current, sensitivity, and induced voltage of the detecting winding relate to and affect the performance of a magnetic modulator, this paper measures the initial permeability, maximum permeability, saturation magnetic induction, remanent magnetic induction, coercivity, saturation magnetic field intensity, magnetization curve, permeability curve, and hysteresis loop of the modulator's 1J85 permalloy main core using the ballistic method. On this foundation, the MATLAB curve-fitting tool and a multiple regression method are employed to comprehensively compare and analyze the sum of squares due to error (SSE), coefficient of determination (R-square), degree-of-freedom adjusted coefficient of determination (Adjusted R-square), and root mean squared error (RMSE) of the fitting results. Finally, a B-H curve mathematical model is established as the sum of an arc-hyperbolic sine function and a polynomial.
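A minimal sketch of fitting such a model, using Python/SciPy in place of the paper's MATLAB curve-fitting tool; the polynomial degree, coefficient values, and synthetic data below are illustrative assumptions, not the paper's measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

def bh_model(H, a, b, c1, c3):
    """B-H model of the form described: an arc-hyperbolic sine term plus a
    (here odd, third-order) polynomial in the field intensity H."""
    return a * np.arcsinh(b * H) + c1 * H + c3 * H**3

# H, B would come from ballistic-method measurements of the 1J85 core;
# here a synthetic stand-in is used so the example runs end to end.
H = np.linspace(-50, 50, 101)
B = 0.6 * np.arcsinh(0.3 * H) + 0.002 * H + 1e-7 * H**3

params, _ = curve_fit(bh_model, H, B, p0=[1.0, 0.1, 0.0, 0.0])
residuals = B - bh_model(H, *params)
sse = np.sum(residuals**2)              # SSE, as compared in the paper
rmse = np.sqrt(np.mean(residuals**2))   # RMSE
```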
Existing generative adversarial network (GAN) methods use different criteria to distinguish between real and fake samples, such as probability [9], energy [44], or other losses [30]. In this paper, by employing the merits of deep metric learning, we propose a novel metric-based generative adversarial network (MBGAN), which uses a distance criterion to distinguish between real and fake samples. Specifically, the discriminator of MBGAN adopts a triplet structure and learns a deep nonlinear transformation that maps input samples into a new feature space. In the transformed space, the distance between real samples is minimized, while the distance between real and fake samples is maximized. Similar to the adversarial procedure of existing GANs, a generator is trained to produce synthesized examples that are close to real examples, while the discriminator is trained to push real and fake samples apart by a large margin. Meanwhile, instead of using a fixed margin, we adopt a data-dependent margin [30], so that the generator can focus on improving synthesized samples of poor quality instead of wasting energy on well-produced samples. Our proposed method is verified on various benchmarks, such as CIFAR-10, SVHN and CelebA, and generates high-quality samples.
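A minimal sketch of the triplet objective described above, with hypothetical names: the embeddings come from the discriminator's learned transformation, two real samples are pulled together, and a fake sample is pushed away from the real anchor by at least the margin (which the paper makes data-dependent):

```python
import numpy as np

def triplet_discriminator_loss(f_anchor, f_real, f_fake, margin):
    """f_anchor, f_real: embeddings of real samples; f_fake: embedding of a
    generated sample; all shaped (batch, d). Standard triplet hinge loss."""
    d_pos = np.sum((f_anchor - f_real) ** 2, axis=-1)  # real-real distance
    d_neg = np.sum((f_anchor - f_fake) ** 2, axis=-1)  # real-fake distance
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```

In the adversarial loop, the generator would be trained with the opposite objective, shrinking the real-fake distance so its samples fall inside the margin.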
Data deduplication [3] effectively identifies and eliminates redundant data, maintaining only a single copy of files and chunks. Hence, it is widely used in cloud storage systems to save users' network bandwidth when uploading data. However, the occurrence of deduplication can easily be identified by monitoring and analyzing network traffic, which creates a risk of user privacy leakage. An attacker can carry out a very dangerous side-channel attack, the learn-the-remaining-information (LRI) attack, to reveal users' private information by exploiting the network-traffic side channel of deduplication [1]. In the LRI attack, the attacker knows a large part of a target file in the cloud and tries to learn the remaining unknown parts by uploading all possible versions of the file's content. For example, the attacker knows all the contents of the target file X except the sensitive information θ. To learn the sensitive information, the attacker uploads m files covering all possible values of θ. If the file X_d containing the value θ_d is deduplicated and the other files are not, the attacker learns that θ = θ_d. In the threat model of the LRI attack, we consider a general cloud storage service model with two entities, the user and the cloud storage server. The attack is launched by users who aim to steal the private information of other users [1]. The attacker can act as a user via its own account or use multiple accounts to disguise itself as multiple users. The cloud storage server communicates with users over the Internet. Connections from clients to the cloud storage server are encrypted by the SSL or TLS protocol. Hence, the attacker can monitor and measure the amount of network traffic between client and server but cannot intercept and analyze the contents of the transmitted data due to the encryption. The attacker can then perform sophisticated traffic analysis with sufficient computing resources. We propose a simple yet effective scheme, called the randomized redundant chunk scheme (RRCS), to significantly mitigate the risk of the LRI attack while maintaining the high bandwidth efficiency of deduplication. The basic idea behind RRCS is to add randomized redundant chunks to mix up the real deduplication states of the files used for the LRI attack, which effectively obfuscates the view of an attacker attempting to exploit the network-traffic side channel. RRCS includes three key function modules: range generation (RG), secure bounds setting (SBS), and security-irrelevant redundancy elimination (SRE). When uploading a file, RRCS first uses RG to generate a fixed range [0, λN] (λ ∈ (0,1]) from which the number of added redundant chunks is randomly chosen, where N is the total number of chunks in the file and λ is a system parameter. However, the fixed range may cause a security issue, so SBS adjusts the bounds of the range to avoid it. Security-irrelevant redundant chunks may also exist in RRCS; SRE reduces them to improve deduplication efficiency. The design details are presented in our technical report [5]. Our security analysis demonstrates that RRCS can significantly reduce the risk of the LRI attack [5].
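A minimal sketch of the RG module as described (SBS bound adjustments and SRE redundancy elimination are omitted; see the technical report [5]); the function name is hypothetical:

```python
import random

def redundant_chunk_count(num_chunks, lam=0.5):
    """Draw the number of added redundant chunks uniformly from [0, λN],
    where N is the file's chunk count and λ ∈ (0, 1] is a system parameter."""
    upper = int(lam * num_chunks)
    return random.randint(0, upper)

# An attacker observing upload traffic now sees the deduplicated transfer
# size plus a random number of redundant chunks, which obfuscates whether
# any particular probe file was deduplicated.
```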
We examine the performance of RRCS using three real-world trace-based datasets, i.e., Fslhomes [2], MacOS [2], and Onefull [4], and compare RRCS with the randomized threshold scheme (RTS) [1]. Our experimental results show that source-based deduplication eliminates 100% of data redundancy but offers no security guarantee. File-level (chunk-level) RTS eliminates only 8.1% – 16.8% (9.8% – 20.3%) of redundancy, since it only removes the redundancy of files (chunks) that have many copies. RRCS with λ = 0.5 eliminates 76.1% – 78.0% of redundancy and RRCS with λ = 1 eliminates 47.9% – 53.6%.
As parallel and distributed systems evolve toward extreme scale, for example, high-performance computing (HPC) systems involving millions of cores and billion-way parallelism, and high-capacity storage systems requiring efficient access to petabytes or exabytes of data, many new challenges arise in designing and deploying the next-generation interconnection networks of these systems. Fat-tree networks have been widely used in both data centers and HPC systems over the past decades and are promising candidates for next-generation extreme-scale networks. In this article, we present FatTreeSim, a simulation framework that supports modeling and simulation of extreme-scale fat-tree networks, with the goal of understanding the design constraints of next-generation HPC and distributed systems and aiding the design and performance optimization of the applications running on them. We have systematically experimented with FatTreeSim on Emulab and Blue Gene/Q and analyzed its scalability and fidelity under various network configurations. On the Blue Gene/Q Mira, FatTreeSim achieves a peak performance of 305 million events per second using 16,384 cores. Finally, we have applied FatTreeSim to simulate several large-scale Hadoop YARN applications to demonstrate its usability.
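For context, the switch and host counts of a standard k-ary fat-tree, the topology family FatTreeSim models, follow directly from the topology's definition; the small helper below (our own, not part of FatTreeSim) computes them:

```python
def fat_tree_size(k):
    """Counts for a k-ary fat-tree (k even): k pods, each with k/2 edge and
    k/2 aggregation switches, plus (k/2)^2 core switches, supporting
    k^3/4 hosts in total."""
    assert k % 2 == 0, "k-ary fat-trees require an even port count k"
    return {
        "edge": k * (k // 2),
        "aggregation": k * (k // 2),
        "core": (k // 2) ** 2,
        "hosts": k ** 3 // 4,
    }

print(fat_tree_size(48))  # a 48-port fat-tree supports 27,648 hosts
```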