Biblio

Filters: Keyword is Data preprocessing
2023-09-20
Shi, Yong.  2022.  A Machine Learning Study on the Model Performance of Human Resources Predictive Algorithms. 2022 4th International Conference on Applied Machine Learning (ICAML). :405–409.
A good ecological environment is crucial to attracting talents, cultivating talents, retaining talents and making talents fully effective. This study provides a solution to the current mainstream problem of how to deal with excellent employee turnover in advance, so as to promote a sustainable and harmonious human resources ecological environment in enterprises with a shortage of talents. This study obtains open data sets, conducts data preprocessing, model construction and model optimization, and describes a set of enterprise employee turnover prediction models based on a RapidMiner workflow. The data preprocessing is completed with the help of the statistical analysis software IBM SPSS Statistics and RapidMiner. Statistical charts, scatter plots and boxplots are generated for data visualization analysis. Machine learning, model application, performance vector computation, and cross-validation are carried out through RapidMiner's multiple operators and workflows. Model design algorithms include support vector machines, naive Bayes, decision trees, and neural networks. The performance of the algorithm models is compared in four respects: accuracy, precision, recall and F1-score. It is concluded that the decision tree model performs best. The performance evaluation results confirm the effectiveness of this model in the sustainable exploration of enterprise employee turnover prediction in human resource management.
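The four performance measures named in the abstract above can be computed from confusion counts. The following is a minimal sketch in plain Python, not the RapidMiner workflow the study used; the function name `classification_metrics` and the binary labels (1 = employee leaves, 0 = employee stays) are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = employee leaves, 0 = employee stays.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Comparing several models then amounts to calling the function once per model on the same held-out labels.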
2023-09-07
Xie, Xinjia, Guo, Yunxiao, Yin, Jiangting, Gai, Shun, Long, Han.  2022.  Research on Intellectual Property Protection of Artificial Intelligence Creation in China Based on SVM Kernel Methods. 2022 International Conference on Blockchain Technology and Information Security (ICBCTIS). :230–236.
Artificial intelligence creation has come into fashion and has brought unprecedented challenges to intellectual property law. In order to study the viewpoints of AI creation copyright ownership from professionals in different institutions, taking the papers on AI creation in CNKI from 2016 to 2021, we applied orthogonal design and analysis of variance to construct the dataset. A kernel-SVM classifier with different kernel methods, in addition to some shallow machine learning classifiers, is selected for analyzing and predicting the copyright ownership of AI creation. Support vector machine (SVM) is widely used in statistics, and the performance of the SVM method is closely related to the choice of the kernel function. SVM with RBF kernel surpasses the other seven kernel-SVM classifiers and five shallow classifiers, although the accuracy provided by all of them was not satisfactory. Various performance metrics such as accuracy and F1-score are used to evaluate the performance of KSVM and other classifiers. The purpose of this study is to explore the overall viewpoints of AI creation copyright ownership, investigate the influence of different features on the final copyright ownership and predict the most likely viewpoint in the future. It will also encourage investors and researchers and promote intellectual property protection in China.
2023-03-17
Gao, Chulan, Shahriar, Hossain, Lo, Dan, Shi, Yong, Qian, Kai.  2022.  Improving the Prediction Accuracy with Feature Selection for Ransomware Detection. 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC). :424–425.
This paper presents a machine learning algorithm to detect whether an executable binary is benign or ransomware. Ransomware cybercriminals have targeted our infrastructure and businesses everywhere, which has directly affected our national security and daily life. Tackling ransomware threats more effectively is a big challenge. We applied a machine-learning model to classify and identify the security level of a given suspected malware sample for ransomware detection and prevention. We use feature selection as data preprocessing to improve the prediction accuracy of the model.
ISSN: 0730-3157
2022-09-30
Millot, Baptiste, Francq, Julien, Sicard, Franck.  2021.  Systematic and Efficient Anomaly Detection Framework using Machine Learning on Public ICS Datasets. 2021 IEEE International Conference on Cyber Security and Resilience (CSR). :292–297.
Industrial Control Systems (ICSs) are used in several domains such as Transportation, Manufacturing, Defense and Power Generation and Distribution. ICSs deal with complex physical systems in order to achieve an industrial purpose with operational safety. Security has not been taken into account by design in these systems, which makes them vulnerable to cyberattacks. In this paper, we rely on existing public ICS datasets as well as on the existing literature of Machine Learning (ML) applications for anomaly detection in ICSs in order to improve detection scores. To this end, we propose a systematic framework, relying on established ML algorithms and suitable data preprocessing methods, which allows us to quickly get efficient and, surprisingly, better results than the literature. Finally, the paper ends with some recommendations for generating future public ICS datasets, which would be fruitful for improving future attack detection models and thereby protecting new ICSs designed in the near future.
2022-09-09
Raafat, Maryam A., El-Wakil, Rania Abdel-Fattah, Atia, Ayman.  2021.  Comparative study for Stylometric analysis techniques for authorship attribution. 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC). :176–181.
A text is a meaningful source of information. Capturing the right patterns in written text gives metrics to measure and infer to what extent this text belongs or is relevant to a specific author. This research aims to introduce a new feature that goes deeper into the language structure. The feature introduced is based on an attempt to differentiate stylistic changes among authors according to the different sentence structure each author uses. The study showed the effect of introducing this new feature to machine learning models to enhance their performance. It was found that the prediction of authors was enhanced by adding sentence structure as an additional feature: the F1 scores increased by 0.3%, and when the data was normalized and the feature added, they increased by 5%.
2022-03-01
Zhao, Ruijie, Li, Zhaojie, Xue, Zhi, Ohtsuki, Tomoaki, Gui, Guan.  2021.  A Novel Approach Based on Lightweight Deep Neural Network for Network Intrusion Detection. 2021 IEEE Wireless Communications and Networking Conference (WCNC). :1–6.
With ubiquitous network applications and the continuous development of network attack technology, all social circles have paid close attention to cyberspace security. Intrusion detection systems (IDS) play a very important role in ensuring computer and communication systems security. Recently, deep learning has achieved great success in the field of intrusion detection. However, the high computational complexity poses a major hurdle for the practical deployment of DL-based models. In this paper, we propose a novel approach based on a lightweight deep neural network (LNN) for IDS. We design a lightweight unit that can fully extract data features while reducing the computational burden by expanding and compressing feature maps. In addition, we use inverse residual structure and channel shuffle operation to achieve more effective training. Experiment results show that our proposed model for intrusion detection not only reduces the computational cost by 61.99% and the model size by 58.84%, but also achieves satisfactory accuracy and detection rate.
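The abstract above mentions a channel shuffle operation without giving its details. The sketch below assumes the commonly used formulation (split the channels into groups, then interleave them so each new group draws from every old group) and works on a plain list of channel indices; the function name `channel_shuffle` is hypothetical and the paper's exact layer may differ.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Equivalent to reshaping the channel axis to (groups, channels_per_group),
    transposing to (channels_per_group, groups) and flattening again.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
# shuffled == [0, 3, 1, 4, 2, 5]
```

In a real network the same index permutation would be applied along the channel axis of a feature-map tensor, so that information can flow between groups in the next grouped convolution.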
2022-02-22
Cancela, Brais, Bolón-Canedo, Verónica, Alonso-Betanzos, Amparo.  2021.  A delayed Elastic-Net approach for performing adversarial attacks. 2020 25th International Conference on Pattern Recognition (ICPR). :378–384.
With the rise of the so-called Adversarial Attacks, there is an increased concern about model security. In this paper we present two different contributions: novel measures of robustness (based on adversarial attacks) and a novel adversarial attack. The key idea behind these metrics is to obtain a measure that could compare different architectures, independently of how the input is preprocessed (robustness against different input sizes and value ranges). To do so, a novel adversarial attack is presented, performing a delayed elastic-net adversarial attack (constraints are only used whenever a successful adversarial attack is obtained). Experimental results show that our approach obtains state-of-the-art adversarial samples, in terms of minimal perturbation distance. Finally, a benchmark of ImageNet pretrained models is used to conduct experiments aiming to shed some light on which model should be selected whenever security is a relevant factor.
2021-03-29
Begaj, S., Topal, A. O., Ali, M.  2020.  Emotion Recognition Based on Facial Expressions Using Convolutional Neural Network (CNN). 2020 International Conference on Computing, Networking, Telecommunications Engineering Sciences Applications (CoNTESA). :58–63.

Over the last few years, there has been an increasing number of studies about facial emotion recognition because of the importance and the impact that it has in the interaction of humans with computers. With the growing number of challenging datasets, the application of deep learning techniques has become necessary. In this paper, we study the challenges of Emotion Recognition Datasets and we also try different parameters and architectures of Convolutional Neural Networks (CNNs) in order to detect the seven emotions in human faces: anger, fear, disgust, contempt, happiness, sadness and surprise. We have chosen iCV MEFED (Multi-Emotion Facial Expression Dataset) as the main dataset for our study, which is relatively new, interesting and very challenging.

2020-08-28
Huang, Angus F.M., Chi-Wei, Yang, Tai, Hsiao-Chi, Chuan, Yang, Huang, Jay J.C., Liao, Yu-Han.  2019.  Suspicious Network Event Recognition Using Modified Stacking Ensemble Machine Learning. 2019 IEEE International Conference on Big Data (Big Data). :5873–5880.
This study aims to detect genuine suspicious events and false alarms within a dataset of network traffic alerts. The rapid development of cloud computing and artificial intelligence-oriented automatic services has enabled a large amount of data and information to be transmitted among network nodes. However, the number of cyber-threats, cyberattacks, and network intrusions has increased in various domains of network environments. Based on the fields of data science and machine learning, this paper proposes a series of solutions involving data preprocessing, exploratory data analysis, new feature creation, feature selection, ensemble learning, model construction, and verification to identify suspicious network events. This paper proposes a modified form of stacking ensemble machine learning which includes AdaBoost, Neural Networks, Random Forest, LightGBM, and Extremely Randomised Trees (Extra Trees) to realise a high-performance classification. A suspicious network event recognition dataset for a security operations centre, which uses real network log observations from the 2019 IEEE BigData Cup Challenge, is used as an experimental dataset. This paper investigates the possibility of integrating big-data analytics, machine learning, and data science to improve intelligent cybersecurity.
2020-02-10
Ishtiaq, Asra, Islam, Muhammad Arshad, Azhar Iqbal, Muhammad, Aleem, Muhammad, Ahmed, Usman.  2019.  Graph Centrality Based Spam SMS Detection. 2019 16th International Bhurban Conference on Applied Sciences and Technology (IBCAST). :629–633.

The use of short messages such as SMS, tweets and status updates has increased tremendously. Due to their popularity and ease of use, many companies use them for advertisement purposes. Hackers also use SMS to defraud users and steal personal information. In this paper, the use of graph centrality metrics is proposed for spam SMS detection. The graph centrality measures degree, closeness, and eccentricity are used for classification of SMS. Graphs for each class are created using labeled SMS, and then an unlabeled SMS is classified using the centrality scores of the tokens available in the unclassified SMS. Our results show that the highest precision and recall are achieved by using degree centrality: a precision of 0.81 and a recall of 0.76 for spam messages.
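The classification idea in the abstract above (one token graph per class, then scoring an unlabeled message by the centrality of its tokens in each graph) can be sketched as follows. The abstract does not say how edges are formed or how scores are combined, so this sketch assumes that adjacent tokens in a message are linked and that per-class scores are summed; all function names and sample messages are hypothetical.

```python
from collections import defaultdict


def build_token_graph(messages):
    """Undirected graph: tokens are nodes, adjacent tokens share an edge (assumption)."""
    adjacency = defaultdict(set)
    for message in messages:
        tokens = message.lower().split()
        for left, right in zip(tokens, tokens[1:]):
            if left != right:
                adjacency[left].add(right)
                adjacency[right].add(left)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbours) / (n - 1)
            for node, neighbours in adjacency.items()}


def classify(message, spam_scores, ham_scores):
    """Label a message by the larger summed centrality of its tokens."""
    tokens = message.lower().split()
    spam_total = sum(spam_scores.get(token, 0.0) for token in tokens)
    ham_total = sum(ham_scores.get(token, 0.0) for token in tokens)
    return "spam" if spam_total > ham_total else "ham"


# Hypothetical labeled messages.
spam_scores = degree_centrality(
    build_token_graph(["win free prize now", "free prize call now"]))
ham_scores = degree_centrality(
    build_token_graph(["see you at lunch", "call you at home"]))
label = classify("free prize", spam_scores, ham_scores)
```

Closeness or eccentricity could be swapped in by replacing `degree_centrality` with another scoring function over the same adjacency structure.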

2020-01-06
Mo, Ran, Liu, Jianfeng, Yu, Wentao, Jiang, Fu, Gu, Xin, Zhao, Xiaoshuai, Liu, Weirong, Peng, Jun.  2019.  A Differential Privacy-Based Protecting Data Preprocessing Method for Big Data Mining. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :693–699.

Analyzing clustering results may lead to the privacy disclosure issue in big data mining. In this paper, we put forward a differential privacy-based protecting data preprocessing method for distance-based clustering. Firstly, the data distortion technique differential privacy is used to prevent the distances in distance-based clustering from disclosing the relationships. Differential privacy may affect the clustering results while protecting privacy. Then an adaptive privacy budget parameter adjustment mechanism is applied for keeping the balance between the privacy protection and the clustering results. By solving the maximum and minimum problems, the differential privacy budget parameter can be obtained for different clustering algorithms. Finally, we conduct extensive experiments to evaluate the performance of our proposed method. The results demonstrate that our method can provide privacy protection with precise clustering results.
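To illustrate the trade-off the abstract above describes between privacy protection and clustering quality, here is a minimal sketch of the standard Laplace mechanism applied to a list of distances. It is not the paper's method: the adaptive privacy budget adjustment is not reproduced, and the function names, the `sensitivity` value and the sample distances are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_distances(distances, sensitivity, epsilon, rng=None):
    """Add Laplace noise with scale sensitivity / epsilon to each distance.

    A smaller privacy budget epsilon gives stronger privacy but noisier
    distances, which is what degrades the clustering result.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [d + laplace_noise(scale, rng) for d in distances]


# Hypothetical pairwise distances and an assumed sensitivity of 1.0.
rng = random.Random(42)
noisy = privatize_distances([0.5, 1.2, 3.4], sensitivity=1.0, epsilon=0.5, rng=rng)
```

Choosing epsilon is then the balancing act: a budget adjustment mechanism such as the one the paper proposes would search for the value that keeps clusters usable at an acceptable privacy level.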

2019-02-25
Popovac, M., Karanovic, M., Sladojevic, S., Arsenovic, M., Anderla, A.  2018.  Convolutional Neural Network Based SMS Spam Detection. 2018 26th Telecommunications Forum (TELFOR). :1–4.
SMS spam refers to undesired text messages. Machine Learning methods for anti-spam filters have been noticeably effective in categorizing spam messages. The dataset used in this research is known as Tiago's dataset. A crucial step in the experiment was data preprocessing, which involved reducing text to lower case, tokenization, and removing stopwords. A Convolutional Neural Network was the proposed method for classification. The overall accuracy of the model was 98.4%. The obtained model can be used as a tool in many applications.
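The three preprocessing steps listed in the abstract above (lower-casing, tokenization, stopword removal) can be sketched with the standard library alone. The tokenization rule and the small stopword list here are assumptions for illustration; the paper does not state which tokenizer or stopword list it used.

```python
import re

# Tiny illustrative stopword list (assumption, not the paper's list).
STOPWORDS = frozenset({"a", "an", "the", "to", "is", "you", "your",
                       "of", "and", "for", "in"})


def preprocess(text, stopwords=STOPWORDS):
    """Lower-case the text, split it into alphanumeric tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stopwords]


tokens = preprocess("You have WON a free ticket to the final!")
# tokens == ['have', 'won', 'free', 'ticket', 'final']
```

The resulting token lists would typically be mapped to integer indices and padded to a fixed length before being fed to a convolutional network.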
2017-03-07
Sadri, Mehdi, Mehrotra, Sharad, Yu, Yaming.  2016.  Online Adaptive Topic Focused Tweet Acquisition. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :2353–2358.

Twitter provides a public streaming API that is strictly limited, making it difficult to simultaneously achieve good coverage and relevance when monitoring tweets for a specific topic of interest. In this paper, we address the tweet acquisition challenge to enhance monitoring of tweets based on the client/application needs in an online adaptive manner such that the quality and quantity of the results improve over time. We propose a Tweet Acquisition System (TAS) that iteratively selects phrases to track based on an explore-exploit strategy. Our experimental studies show that TAS significantly improves recall of relevant tweets and that the performance improves when the topics are more specific.
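The abstract above does not specify which explore-exploit strategy TAS uses. As a generic illustration only, the sketch below applies an epsilon-greedy rule to choose which phrases to track next: usually the phrases with the best observed relevance rate, occasionally a random one. The function name, the `stats` layout and the sample phrases are all hypothetical.

```python
import random


def select_phrases(stats, k, epsilon, rng=None):
    """Pick k phrases to track with an epsilon-greedy rule.

    stats maps phrase -> (relevant_tweets, total_tweets) observed so far.
    With probability epsilon a random remaining phrase is explored;
    otherwise the remaining phrase with the best relevance rate is exploited.
    """
    rng = rng or random.Random()

    def relevance_rate(phrase):
        relevant, total = stats[phrase]
        return relevant / total if total else 0.0

    pool = sorted(stats, key=relevance_rate, reverse=True)
    chosen = []
    while pool and len(chosen) < k:
        if rng.random() < epsilon:
            pick = rng.choice(pool)   # explore
        else:
            pick = pool[0]            # exploit the best remaining phrase
        chosen.append(pick)
        pool.remove(pick)
    return chosen


# Hypothetical observation counts per tracked phrase.
stats = {"flu shot": (40, 50), "flu": (30, 100),
         "cold": (5, 50), "vaccine": (0, 0)}
tracked = select_phrases(stats, k=2, epsilon=0.1, rng=random.Random(7))
```

After each tracking round the counts in `stats` would be updated with the newly collected tweets, which is what makes the selection adaptive over time.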

2015-04-30
Katkar, V.D., Bhatia, D.S.  2014.  Lightweight approach for detection of denial of service attacks using numeric to binary preprocessing. Circuits, Systems, Communication and Information Technology Applications (CSCITA), 2014 International Conference on. :207–212.


Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks exhaust the resources of a server/service and make it unavailable for legitimate users. With increasing use of online services and attacks on these services, the importance of Intrusion Detection Systems (IDS) for detection of DoS/DDoS attacks has also grown. Detection accuracy and CPU utilization of a data mining based IDS are directly proportional to the quality of the training dataset used to train it. Various preprocessing methods like normalization, discretization, and fuzzification are used by researchers to improve the quality of the training dataset. This paper evaluates the effect of various data preprocessing methods on the detection accuracy of DoS/DDoS attack detection IDS and shows that the numeric to binary preprocessing method performs better compared to other methods. Experimental results obtained using the KDD 99 dataset are provided to support the efficiency of the proposed combination.
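The abstract above does not define numeric to binary preprocessing in detail. One plausible reading, sketched here as an assumption, is that each non-negative integer attribute is replaced by the bits of its fixed-width binary representation, so that every bit becomes a separate 0/1 feature. The function names and bit widths are hypothetical.

```python
def to_binary_features(value, width):
    """Encode a non-negative integer as a list of bits, most significant first."""
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in the given bit width")
    return [(value >> bit) & 1 for bit in range(width - 1, -1, -1)]


def encode_record(record, widths):
    """Concatenate the binary encodings of all numeric attributes in a record."""
    if len(record) != len(widths):
        raise ValueError("record and widths must have the same length")
    features = []
    for value, width in zip(record, widths):
        features.extend(to_binary_features(value, width))
    return features


# Hypothetical record with two numeric attributes, encoded in 4 and 2 bits.
encoded = encode_record([5, 2], widths=[4, 2])
# encoded == [0, 1, 0, 1, 1, 0]
```

Unlike normalization, this encoding keeps every feature in {0, 1}, at the cost of a wider feature vector.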