Biblio

Filters: Author is Wang, Tao
2023-06-16
Ren, Lijuan, Wang, Tao, Seklouli, Aicha Sekhari, Zhang, Haiqing, Bouras, Abdelaziz.  2022.  Missing Values for Classification of Machine Learning in Medical data. 2022 5th International Conference on Artificial Intelligence and Big Data (ICAIBD). :101–106.
Missing values are an unavoidable problem for machine learning classification tasks on medical data. With the rapid development of medical systems, the volume of large-scale medical data is increasing. Missing values make it harder to mine hidden but useful information from these medical datasets. Deletion and imputation are the most popular methods for dealing with missing values. Existing studies have neglected to compare and discuss deletion and imputation methods under both the row missing rate and the total missing rate. They have also rarely used experimental datasets that are both mixed-type and large-scale. In this work, mixed-type medical datasets of various sizes are used. The performance of deletion and imputation methods is compared under the MCAR (Missing Completely At Random) mechanism in a baseline task, using LR (Linear Regression) and SVM (Support Vector Machine) classifiers with the same row and total missing rates. Experimental results show that, under the MCAR mechanism, the performance of both types of processing methods is related to dataset size and missing rate. As the missing rate increases, the performance of both types of methods decreases, but the deletion method degrades faster, and the imputation methods based on machine learning have more stable and better classification performance on average. In addition, small datasets are easily affected by the choice of missing-value processing method.
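The contrast between deletion and imputation under MCAR can be illustrated with a minimal sketch. This is not the authors' experimental setup: the synthetic data, the function names, mean imputation as the imputation method, and the exact definitions of the row and total missing rates (share of rows with at least one missing cell, and share of missing cells overall) are all assumptions made for illustration.

```python
import random

def apply_mcar(rows, missing_rate, seed=0):
    """Blank each cell independently with probability missing_rate (MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_rate else v for v in row]
            for row in rows]

def listwise_delete(rows):
    """Deletion: keep only the rows that have no missing cell."""
    return [row for row in rows if all(v is not None for v in row)]

def mean_impute(rows):
    """Imputation: fill each missing cell with the mean of its column's observed values."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def row_missing_rate(rows):
    """Assumed definition: share of rows containing at least one missing cell."""
    return sum(any(v is None for v in row) for row in rows) / len(rows)

def total_missing_rate(rows):
    """Assumed definition: share of all cells that are missing."""
    cells = [v for row in rows for v in row]
    return sum(v is None for v in cells) / len(cells)

# Hypothetical synthetic dataset: 200 rows, 5 numeric features.
data_rng = random.Random(1)
data = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]

masked = apply_mcar(data, missing_rate=0.1)
deleted = listwise_delete(masked)   # shrinks as the missing rate grows
imputed = mean_impute(masked)       # always keeps every row

print(len(deleted), len(imputed))
print(row_missing_rate(masked), total_missing_rate(masked))
```

Even at a modest total missing rate, the row missing rate is at least as large, so deletion discards rows much faster than cells go missing. That is consistent with the abstract's observation that deletion degrades faster, although the sketch measures only data loss, not classifier performance.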
2020-01-27
Cao, Mengchen, Hou, Xiantong, Wang, Tao, Qu, Hunter, Zhou, Yajin, Bai, Xiaolong, Wang, Fuwei.  2019.  Different is Good: Detecting the Use of Uninitialized Variables through Differential Replay. Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. :1883–1897.
The use of uninitialized variables is a common issue. It can cause kernel information leaks, which defeat a widely deployed security defense, kernel address space layout randomization (KASLR). Though a recent system called Bochspwn Reloaded reported multiple memory leaks in Windows kernels, effective detection of this issue still lags far behind. In this paper, we propose a new technique, differential replay, that can effectively detect the use of uninitialized variables. Specifically, it records and replays a program's execution in multiple instances. One instance runs with the vanilla memory; the other changes (or poisons) the values of variables allocated from the stack and the heap. It then compares program states to find references to uninitialized variables. The idea is that if a variable is properly initialized, the initialization overwrites the poisoned value and the program states in the two running instances should be the same. After detecting the differences, our system leverages symbolic taint analysis to further identify the location where the variable was allocated. This helps us identify the root cause and facilitates the development of real exploits. We have implemented a prototype called TimePlayer. Applied to both Windows 7 and Windows 10 kernels (x86/x64), it successfully identified 34 new issues and another 85 that had already been patched (some of which were publicly unknown). Among the 34 new issues, 17 have been confirmed as zero-day vulnerabilities by Microsoft.
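The core idea of comparing a vanilla run against a poisoned run can be shown with a toy sketch. It is not TimePlayer and involves no real record-and-replay or taint analysis: the "program" is a hypothetical Python function, the fill byte stands in for the contents of freshly allocated stack or heap memory, and all names are assumptions.

```python
def leaky_program(fill_byte):
    """Toy execution: allocate 8 bytes of 'fresh' memory, initialize only
    the first 4, then copy the whole buffer out as observable state."""
    buf = bytearray([fill_byte] * 8)   # stands in for uninitialized memory
    buf[0:4] = b"\x01\x02\x03\x04"     # partial initialization
    return bytes(buf)

def safe_program(fill_byte):
    """Same as above, but every byte is initialized before it is used."""
    buf = bytearray([fill_byte] * 8)
    buf[0:8] = b"\x01\x02\x03\x04\x05\x06\x07\x08"
    return bytes(buf)

def differential_replay(program, vanilla=0x00, poison=0xAA):
    """Run the program twice with different fill values and return the
    offsets where the observable state differs between the two runs."""
    state_a = program(vanilla)
    state_b = program(poison)
    return [i for i, (a, b) in enumerate(zip(state_a, state_b)) if a != b]

print(differential_replay(leaky_program))  # offsets that depend on uninitialized memory
print(differential_replay(safe_program))   # empty when everything is initialized
```

Properly initialized bytes overwrite the poison, so they match across both runs. Any offset that differs must have been read without being written first.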
2020-01-21
Gao, Jiaqiong, Wang, Tao.  2019.  Research on the IPv6 Technical Defects and Countermeasures. 2019 International Conference on Computer Network, Electronic and Automation (ICCNEA). :165–170.
The current global Internet uses the TCP/IP protocol suite, whose current version is IPv4. IPv4 uses 32-bit addresses, so the maximum number of computers that can be connected to the Internet worldwide is 2^32. With the development of the Internet of Things, big data, cloud storage, and other technologies, the limited address space defined by IPv4 has been exhausted. To expand the address space, the IETF designed the next-generation IPv6 to replace IPv4. IPv6 uses a 128-bit address length that provides an almost unlimited number of addresses. However, with the development and application of the Internet of Things, big data, and cloud storage, IPv6 has shown some shortcomings in its addressing structure design, security, and network compatibility. As these technologies have been gradually applied in recent years, the continuous development of new technology applications shows that the design ideas behind the IPv6 address structure have some fatal defects. This paper proposes a route for upgrading the original IPv4 by studying the structure of the IPv6 "spliced address", and points out the defects in the design of the IPv6 interface ID and potential problems such as security holes.
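The address-space figures in the abstract can be checked with Python's standard `ipaddress` module. The helper `interface_id` is a hypothetical name, and it assumes the common 64-bit prefix / 64-bit interface ID split; the paper's "spliced address" analysis is not reproduced here.

```python
import ipaddress

# Total number of addresses in each protocol's full address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses   # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses        # 2**128

def interface_id(address):
    """Return the low 64 bits of an IPv6 address, assuming the usual
    64-bit network prefix + 64-bit interface ID layout."""
    return int(ipaddress.IPv6Address(address)) & ((1 << 64) - 1)

print(ipv4_total)                   # 4294967296
print(ipv6_total // ipv4_total)     # how many IPv4 Internets fit in IPv6: 2**96
print(hex(interface_id("2001:db8::1")))  # 0x1 (documentation-prefix example address)
```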