Visible to the public Biblio

Filters: Author is Chen, Y.  [Clear All Filters]
2021-04-27
Zhang, M., Chen, Y., Huang, J..  2020.  SE-PPFM: A Searchable Encryption Scheme Supporting Privacy-Preserving Fuzzy Multikeyword in Cloud Systems. IEEE Systems Journal. :1–9.
Cloud computing provides an appearing application for compelling vision in managing big-data files and responding queries over a distributed cloud platform. To overcome privacy revealing risks, sensitive documents and private data are usually stored in the clouds in a cipher-based manner. However, it is inefficient to search the data in traditional encryption systems. Searchable encryption is a useful cryptographic primitive to enable users to retrieve data in ciphertexts. However, the traditional searchable encryptions provide lower search efficiency and cannot carry out fuzzy multikeyword queries. To solve this issue, in this article, we propose a searchable encryption that supports privacy-preserving fuzzy multikeyword search (SE-PPFM) in cloud systems, which is built by asymmetric scalar-product-preserving encryptions and Hadamard product operations. In order to realize the functionality of efficient fuzzy searches, we employ Word2vec as the primitive of machine learning to obtain a fuzzy correlation score between encrypted data and queries predicates. We analyze and evaluate the performance in terms of token of multikeyword, retrieval and match time, file retrieval time and matching accuracy, etc. The experimental results show that our scheme can achieve a higher efficiency in fuzzy multikeyword ciphertext search and provide a higher accuracy in retrieving and matching procedure.
2021-03-30
Li, Y., Ji, X., Li, C., Xu, X., Yan, W., Yan, X., Chen, Y., Xu, W..  2020.  Cross-domain Anomaly Detection for Power Industrial Control System. 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC). :383—386.

In recent years, artificial intelligence has been widely used in the field of network security, which has significantly improved the effect of network security analysis and detection. However, because the power industrial control system is faced with the problem of shortage of attack data, the direct deployment of the network intrusion detection system based on artificial intelligence is faced with the problems of lack of data, low precision, and high false alarm rate. To solve this problem, we propose an anomaly traffic detection method based on cross-domain knowledge transferring. By using the TrAdaBoost algorithm, we achieve a lower error rate than using LSTM alone.

2021-02-10
Xie, J., Chen, Y., Wang, L., Wang, Z..  2020.  A Network Covert Timing Channel Detection Method Based on Chaos Theory and Threshold Secret Sharing. 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). 1:2380—2384.

Network covert timing channel(NCTC) is a process of transmitting hidden information by means of inter-packet delay (IPD) of legitimate network traffic. Their ability to evade traditional security policies makes NCTCs a grave security concern. However, a robust method that can be used to detect a large number of NCTCs is missing. In this paper, a NCTC detection method based on chaos theory and threshold secret sharing is proposed. Our method uses chaos theory to reconstruct a high-dimensional phase space from one-dimensional time series and extract the unique and stable channel traits. Then, a channel identifier is constructed using the secret reconstruction strategy from threshold secret sharing to realize the mapping of the channel features to channel identifiers. Experimental results show that the approach can detect varieties of NCTCs with a guaranteed true positive rate and greatly improve the versatility and robustness.

2021-02-08
Nikouei, S. Y., Chen, Y., Faughnan, T. R..  2018.  Smart Surveillance as an Edge Service for Real-Time Human Detection and Tracking. 2018 IEEE/ACM Symposium on Edge Computing (SEC). :336—337.

Monitoring for security and well-being in highly populated areas is a critical issue for city administrators, policy makers and urban planners. As an essential part of many dynamic and critical data-driven tasks, situational awareness (SAW) provides decision-makers a deeper insight of the meaning of urban surveillance. Thus, surveillance measures are increasingly needed. However, traditional surveillance platforms are not scalable when more cameras are added to the network. In this work, a smart surveillance as an edge service has been proposed. To accomplish the object detection, identification, and tracking tasks at the edge-fog layers, two novel lightweight algorithms are proposed for detection and tracking respectively. A prototype has been built to validate the feasibility of the idea, and the test results are very encouraging.

2020-12-11
Zhang, W., Byna, S., Niu, C., Chen, Y..  2019.  Exploring Metadata Search Essentials for Scientific Data Management. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). :83—92.

Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, trie, self-balancing search trees, sparse array, etc.) have been developed as part of efforts to provide an efficient method for locating target data. However, efficient determination of an indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation on metadata, metadata queries, and corresponding data structures. In this study, we perform a systematic study of the metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of the metadata in the dataset. We also study possible metadata queries based on the discovery of the metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on real-world dataset suggests that trie is a suitable data structure when prefix/suffix query is required, otherwise hash table should be used. We conclude our study with a summary of our findings. These findings provide a guideline and offers insights in developing metadata indexing methodologies for scientific applications.

2020-12-02
Wang, W., Xuan, S., Yang, W., Chen, Y..  2019.  User Credibility Assessment Based on Trust Propagation in Microblog. 2019 Computing, Communications and IoT Applications (ComComAp). :270—275.

Nowadays, Microblog has become an important online social networking platform, and a large number of users share information through Microblog. Many malicious users have released various false news driven by various interests, which seriously affects the availability of Microblog platform. Therefore, the evaluation of Microblog user credibility has become an important research issue. This paper proposes a microblog user credibility evaluation algorithm based on trust propagation. In view of the high consumption and low precision caused by malicious users' attacking algorithms and manual selection of seed sets by establishing false social relationships, this paper proposes two optimization strategies: pruning algorithm based on social activity and similarity and based on The seed node selection algorithm of clustering. The pruning algorithm can trim off the attack edges established by malicious users and normal users. The seed node selection algorithm can efficiently select the highly available seed node set, and finally use the user social relationship graph to perform the two-way propagation trust scoring, so that the low trusted user has a lower trusted score and thus identifies the malicious user. The related experiments verify the effectiveness of the trustworthiness-based user credibility evaluation algorithm in the evaluation of Microblog user credibility.

2020-12-01
Yang, R., Ouyang, X., Chen, Y., Townend, P., Xu, J..  2018.  Intelligent Resource Scheduling at Scale: A Machine Learning Perspective. 2018 IEEE Symposium on Service-Oriented System Engineering (SOSE). :132—141.

Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with,,,, existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we,,,, will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an MLbased method will help to achieve architectural optimization and efficiency improvement.

2020-11-16
Shen, N., Yeh, J., Chen, C., Chen, Y., Zhang, Y..  2019.  Ensuring Query Completeness in Outsourced Database Using Order-Preserving Encryption. 2019 IEEE Intl Conf on Parallel Distributed Processing with Applications, Big Data Cloud Computing, Sustainable Computing Communications, Social Computing Networking (ISPA/BDCloud/SocialCom/SustainCom). :776–783.
Nowadays database outsourcing has become business owners' preferred option and they are benefiting from its flexibility, reliability, and low cost. However, because database service providers cannot always be fully trusted and data owners will no longer have a direct control over their own data, how to make the outsourced data secure becomes a hot research topic. From the data integrity protection aspect, the client wants to make sure the data returned is correct, complete, and up-to-date. Previous research work in literature put more efforts on data correctness, while data completeness is still a challenging problem to solve. There are some existing works that tried to protect the completeness of data. Unfortunately, these solutions were considered not fully solving the problem because of their high communication or computation overhead. The implementations and limitations of existing works will be further discussed in this paper. From the data confidentiality protection aspect, order-preserving encryption (OPE) is a widely used encryption scheme in protecting data confidentiality. It allows the client to perform range queries and some other operations such as GROUP BY and ORDER BY over the OPE encrypted data. Therefore, it is worthy to develop a solution that allows user to verify the query completeness for an OPE encrypted database so that both data confidentiality and completeness are both protected. Inspired by this motivation, we propose a new data completeness protecting scheme by inserting fake tuples into databases. Both the real and fake tuples are OPE encrypted and thus the cloud server cannot distinguish among them. While our new scheme is much more efficient than all existing approaches, the level of security protection remains the same.
2020-11-04
Wu, X., Chen, Y., Li, S..  2018.  Contactless Smart Card Experiments in a Cybersecurity Course. 2018 IEEE Frontiers in Education Conference (FIE). :1—4.

This Innovate Practice Work in Progress paper is about education on Cybersecurity, which is essential in training of innovative talents in the era of the Internet. Besides knowledge and skills, it is important as well to enhance the students' awareness of cybersecurity in daily life. Considering that contactless smart cards are common and widely used in various areas, one basic and two advanced contactless smart card experiments were designed innovatively and assigned to junior students in 3-people groups in an introductory cybersecurity summer course. The experimental principles, facilities, contents and arrangement are introduced successively. Classroom tests were managed before and after the experiments, and a box and whisker plot is used to describe the distributions of the scores in both tests. The experimental output and student feedback implied the learning objectives were achieved through the problem-based, active and group learning experience during the experiments.

2019-10-14
Koo, H., Chen, Y., Lu, L., Kemerlis, V. P., Polychronakis, M..  2018.  Compiler-Assisted Code Randomization. 2018 IEEE Symposium on Security and Privacy (SP). :461–477.

Despite decades of research on software diversification, only address space layout randomization has seen widespread adoption. Code randomization, an effective defense against return-oriented programming exploits, has remained an academic exercise mainly due to i) the lack of a transparent and streamlined deployment model that does not disrupt existing software distribution norms, and ii) the inherent incompatibility of program variants with error reporting, whitelisting, patching, and other operations that rely on code uniformity. In this work we present compiler-assisted code randomization (CCR), a hybrid approach that relies on compiler-rewriter cooperation to enable fast and robust fine-grained code randomization on end-user systems, while maintaining compatibility with existing software distribution models. The main concept behind CCR is to augment binaries with a minimal set of transformation-assisting metadata, which i) facilitate rapid fine-grained code transformation at installation or load time, and ii) form the basis for reversing any applied code transformation when needed, to maintain compatibility with existing mechanisms that rely on referencing the original code. We have implemented a prototype of this approach by extending the LLVM compiler toolchain, and developing a simple binary rewriter that leverages the embedded metadata to generate randomized variants using basic block reordering. The results of our experimental evaluation demonstrate the feasibility and practicality of CCR, as on average it incurs a modest file size increase of 11.46% and a negligible runtime overhead of 0.28%, while it is compatible with link-time optimization and control flow integrity.

2019-10-08
Liu, Y., Yuan, X., Li, M., Zhang, W., Zhao, Q., Zhong, J., Cao, Y., Li, Y., Chen, L., Li, H. et al..  2018.  High Speed Device-Independent Quantum Random Number Generation without Detection Loophole. 2018 Conference on Lasers and Electro-Optics (CLEO). :1–2.

We report a an experimental study of device-independent quantum random number generation based on an detection-loophole free Bell test with entangled photons. After considering statistical fluctuations and applying an 80 Gb × 45.6 Mb Toeplitz matrix hashing, we achieve a final random bit rate of 114 bits/s, with a failure probability less than 10-5.

2019-09-23
Hunag, C., Yang, C., Weng, C., Chen, Y., Wang, S..  2019.  Secure Protocol for Identity-based Provable Data Possession in Cloud Storage. 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS). :327–331.
Remote data possession is becoming an increasingly important issue in cloud storage. It enables users to verify if their outsourced data have remained intact while in cloud storage. The existing remote data audit (RDA) protocols were designed with the public key infrastructure (PKI) system. However, this incurs considerable costs when users need to frequently access data from the cloud service provider with PKI. This study proposes a protocol, called identity-based RDA (ID-RDA) that addresses this problem without the need for users’ certificates. This study outperforms existing RDA protocols in computation and communication.
2019-04-05
Chen, S., Chen, Y., Tzeng, W..  2018.  Effective Botnet Detection Through Neural Networks on Convolutional Features. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :372-378.

Botnet is one of the major threats on the Internet for committing cybercrimes, such as DDoS attacks, stealing sensitive information, spreading spams, etc. It is a challenging issue to detect modern botnets that are continuously improving for evading detection. In this paper, we propose a machine learning based botnet detection system that is shown to be effective in identifying P2P botnets. Our approach extracts convolutional version of effective flow-based features, and trains a classification model by using a feed-forward artificial neural network. The experimental results show that the accuracy of detection using the convolutional features is better than the ones using the traditional features. It can achieve 94.7% of detection accuracy and 2.2% of false positive rate on the known P2P botnet datasets. Furthermore, our system provides an additional confidence testing for enhancing performance of botnet detection. It further classifies the network traffic of insufficient confidence in the neural network. The experiment shows that this stage can increase the detection accuracy up to 98.6% and decrease the false positive rate up to 0.5%.

2019-01-31
Chen, Y., Wu, B..  2018.  An Efficient Algorithm for Minimal Edit Cost of Graph Degree Anonymity. 2018 IEEE International Conference on Applied System Invention (ICASI). :574–577.

Personal privacy is an important issue when publishing social network data. An attacker may have information to reidentify private data. So, many researchers developed anonymization techniques, such as k-anonymity, k-isomorphism, l-diversity, etc. In this paper, we focus on graph k-degree anonymity by editing edges. Our method is divided into two steps. First, we propose an efficient algorithm to find a new degree sequence with theoretically minimal edit cost. Second, we insert and delete edges based on the new degree sequence to achieve k-degree anonymity.

2019-01-21
Lu, L., Yu, J., Chen, Y., Liu, H., Zhu, Y., Liu, Y., Li, M..  2018.  LipPass: Lip Reading-based User Authentication on Smartphones Leveraging Acoustic Signals. IEEE INFOCOM 2018 - IEEE Conference on Computer Communications. :1466–1474.

To prevent users' privacy from leakage, more and more mobile devices employ biometric-based authentication approaches, such as fingerprint, face recognition, voiceprint authentications, etc., to enhance the privacy protection. However, these approaches are vulnerable to replay attacks. Although state-of-art solutions utilize liveness verification to combat the attacks, existing approaches are sensitive to ambient environments, such as ambient lights and surrounding audible noises. Towards this end, we explore liveness verification of user authentication leveraging users' lip movements, which are robust to noisy environments. In this paper, we propose a lip reading-based user authentication system, LipPass, which extracts unique behavioral characteristics of users' speaking lips leveraging build-in audio devices on smartphones for user authentication. We first investigate Doppler profiles of acoustic signals caused by users' speaking lips, and find that there are unique lip movement patterns for different individuals. To characterize the lip movements, we propose a deep learning-based method to extract efficient features from Doppler profiles, and employ Support Vector Machine and Support Vector Domain Description to construct binary classifiers and spoofer detectors for user identification and spoofer detection, respectively. Afterwards, we develop a binary tree-based authentication approach to accurately identify each individual leveraging these binary classifiers and spoofer detectors with respect to registered users. Through extensive experiments involving 48 volunteers in four real environments, LipPass can achieve 90.21% accuracy in user identification and 93.1% accuracy in spoofer detection.

2018-11-19
Chen, Y., Lai, Y., Liu, Y..  2017.  Transforming Photos to Comics Using Convolutional Neural Networks. 2017 IEEE International Conference on Image Processing (ICIP). :2010–2014.

In this paper, inspired by Gatys's recent work, we propose a novel approach that transforms photos to comics using deep convolutional neural networks (CNNs). While Gatys's method that uses a pre-trained VGG network generally works well for transferring artistic styles such as painting from a style image to a content image, for more minimalist styles such as comics, the method often fails to produce satisfactory results. To address this, we further introduce a dedicated comic style CNN, which is trained for classifying comic images and photos. This new network is effective in capturing various comic styles and thus helps to produce better comic stylization results. Even with a grayscale style image, Gatys's method can still produce colored output, which is not desirable for comics. We develop a modified optimization framework such that a grayscale image is guaranteed to be synthesized. To avoid converging to poor local minima, we further initialize the output image using grayscale version of the content image. Various examples show that our method synthesizes better comic images than the state-of-the-art method.

2018-06-20
Zhou, H., Zhang, W., Wei, F., Chen, Y..  2017.  Analysis of Android Malware Family Characteristic Based on Isomorphism of Sensitive API Call Graph. 2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). :319–327.

The analysis of multiple Android malware families indicates malware instances within a common malware family always have similar call graph structures. Based on the isomorphism of sensitive API call graph, we propose a method which is used to construct malware family features via combining static analysis approach with graph similarity metric. The experiment is performed on a malware dataset which contains 1326 malware samples from 16 different malware families. The result shows that the method can differentiate distinct malware family features and divide suspect malware samples into corresponding families with a high accuracy of 96.77% overall and even defend a certain extent of obfuscation.

2018-05-02
Jian, R., Chen, Y., Cheng, Y., Zhao, Y..  2017.  Millimeter Wave Microstrip Antenna Design Based on Swarm Intelligence Algorithm in 5G. 2017 IEEE Globecom Workshops (GC Wkshps). :1–6.

In order to solve the problem of millimeter wave (mm-wave) antenna impedance mismatch in 5G communication system, a optimization algorithm for Particle Swarm Ant Colony Optimization (PSACO) is proposed to optimize antenna patch parameter. It is proved that the proposed method can effectively achieve impedance matching in 28GHz center frequency, and the return loss characteristic is obviously improved. At the same time, the nonlinear regression model is used to solve the nonlinear relationship between the resonant frequency and the patch parameters. The Elman Neural Network (Elman NN) model is used to verify the reliability of PSACO and nonlinear regression model. Patch parameters optimized by PSACO were introduced into the nonlinear relationship, which obtained error within 2%. The method proposed in this paper improved efficiency in antenna design.

2018-04-02
Chen, Y., Chen, W..  2017.  Finger ECG-Based Authentication for Healthcare Data Security Using Artificial Neural Network. 2017 IEEE 19th International Conference on E-Health Networking, Applications and Services (Healthcom). :1–6.

Wearable and mobile medical devices provide efficient, comfortable, and economic health monitoring, having a wide range of applications from daily to clinical scenarios. Health data security becomes a critically important issue. Electrocardiogram (ECG) has proven to be a potential biometric in human recognition over the past decade. Unlike conventional authentication methods using passwords, fingerprints, face, etc., ECG signal can not be simply intercepted, duplicated, and enables continuous identification. However, in many of the studies, algorithms developed are not suitable for practical application, which usually require long ECG data for authentication. In this work, we introduce a two-phase authentication using artificial neural network (NN) models. This algorithm enables fast authentication within only 3 seconds, meanwhile achieves reasonable performance in recognition. We test the proposed method in a controlled laboratory experiment with 50 subjects. Finger ECG signals are collected using a mobile device at different times and physical statues. At the first stage, a ``General'' NN model is constructed based on data from the cohort and used for preliminary screening, while at the second stage ``Personal'' NN models constructed from single individual's data are applied as fine-grained identification. The algorithm is tested on the whole data set, and on different sizes of subsets (5, 10, 20, 30, and 40). Results proved that the proposed method is feasible and reliable for individual authentication, having obtained average False Acceptance Rate (FAR) and False Rejection Rate (FRR) below 10% for the whole data set.

2017-12-12
Dai, D., Chen, Y., Carns, P., Jenkins, J., Ross, R..  2017.  Lightweight Provenance Service for High-Performance Computing. 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT). :117–129.

Provenance describes detailed information about the history of a piece of data, containing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of data. Provenance is key to supporting many data management functionalities that are increasingly important in operations such as identifying data sources, parameters, or assumptions behind a given result; auditing data usage; or understanding details about how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirements of providing provenance service in situ. The need to remain lightweight and to be always on often conflicts with the need to be transparent and offer an accurate catalog of details regarding the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrument mechanism to achieve transparency and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.