Filters: Keyword is Categorical data
Data Imputation Techniques: An Empirical Study using Chronic Kidney Disease and Life Expectancy Datasets. 2022 International Conference on Innovative Trends in Information Technology (ICITIIT). :1–7.
2022. Data is a collection of information from real-world activities. The file in which such data is stored, after being transformed into a form that machines can process, is generally known as a data set. In the real world, many data sets are incomplete and contain various types of noise; missing values are one such kind. Imputing these missing values is therefore one of the significant tasks of data pre-processing. This paper deals with two real-time health care data sets, the life expectancy (LE) dataset and the chronic kidney disease (CKD) dataset, which are very different in nature. It analyzes various data imputation techniques for filling missing values and provides insights on them. In data imputation, it is very common to replace missing values with a measure of central tendency such as the mean, median, or mode, each of which can represent the central value of a distribution, but choosing the appropriate one is the real challenge. To the best of our knowledge, this is the first paper to provide a complete analysis of the impact of basic data imputation techniques on various data distributions, classified by the size of the data set, the number of missing values, the type of data (categorical/numerical), etc. The paper compares the original data distribution with the distribution after each imputation in terms of skewness, outliers, and various descriptive statistics.
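The comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the `impute` and `skewness` helpers and the sample column are hypothetical, and skewness is computed here as the population (Fisher-Pearson) coefficient, which is an assumption about the statistic used.

```python
import statistics

def impute(values, strategy):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)  # also works for categorical (string) data
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

def skewness(values):
    """Population skewness: third central moment divided by sigma cubed."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed numeric column with two missing entries (None).
column = [1, 2, 2, 3, None, 4, 50, None]
observed = [v for v in column if v is not None]

# Impute with each strategy and compare skewness against the observed data.
results = {s: impute(column, s) for s in ("mean", "median", "mode")}
print("observed", round(skewness(observed), 3))
for name, filled in results.items():
    print(name, round(skewness(filled), 3))
```

On a skewed column like this one, the mean is pulled toward the outlier while the median and mode are not, which is one reason the choice of fill value can change the shape of the imputed distribution. For a categorical column only the mode strategy applies.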
Feature-Weighted Fuzzy K-Modes Clustering. Proceedings of the 2019 3rd International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence. :63–68.
2019. Fuzzy k-modes (FKM) algorithms are variants of fuzzy c-means used for categorical data. FKM algorithms generally treat feature components with equal importance. However, in the clustering process, different feature weights need to be assigned to feature components because irrelevant features may degrade the performance of FKM algorithms. In this paper, we propose a novel algorithm, called feature-weighted fuzzy k-modes (FW-FKM), that improves FKM with a feature-weight entropy term so that it can automatically compute different feature weights for categorical data. Some numerical and real data sets are used to compare FW-FKM with existing methods in the literature. Experimental results and comparisons demonstrate the effectiveness and usefulness of the proposed FW-FKM in practice.
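To make the role of feature weights concrete, the following sketch shows how weights change fuzzy memberships for one categorical record. It is an illustration under stated assumptions, not the FW-FKM algorithm itself: it uses a weighted simple-matching dissimilarity with the standard fuzzy membership update, the weights are supplied by hand rather than learned through the entropy term the abstract mentions, and all names (`weighted_mismatch`, `memberships`, the sample modes) are hypothetical.

```python
def weighted_mismatch(x, mode, weights):
    """Weighted simple-matching dissimilarity: sum of weights of differing features."""
    return sum(w for a, b, w in zip(x, mode, weights) if a != b)

def memberships(x, modes, weights, m=2.0):
    """Fuzzy membership of record x in each cluster mode (fuzzifier m > 1)."""
    dists = [weighted_mismatch(x, z, weights) for z in modes]
    zero = [i for i, d in enumerate(dists) if d == 0]
    if zero:
        # x coincides with one or more modes: share full membership among them.
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(modes))]
    exp = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exp for dj in dists) for di in dists]

# Hypothetical cluster modes over three categorical features.
modes = [("red", "small", "x"), ("blue", "large", "x")]
record = ("red", "large", "y")

# Equal weights: the record mismatches each mode on two features.
u_equal = memberships(record, modes, [1 / 3, 1 / 3, 1 / 3])

# Hand-picked weights that emphasize the first feature (an assumption,
# standing in for weights that FW-FKM would compute automatically).
u_weighted = memberships(record, modes, [0.6, 0.3, 0.1])

print(u_equal, u_weighted)
```

With equal weights the record is equally close to both modes; once the first feature carries more weight, membership shifts toward the mode that agrees with the record on that feature. Down-weighting an irrelevant feature has the opposite effect of reducing its influence on the assignment.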