Simplex Based Vector Mapping for Categorical Attributes Clustering

Title: Simplex Based Vector Mapping for Categorical Attributes Clustering
Publication Type: Conference Paper
Year of Publication: 2018
Authors: An, Ning; Jiang, Siyuan; Yang, Jiaoyun; Li, Lian
Conference Name: Proceedings of the 2018 International Conference on Computational Intelligence and Intelligent Systems
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-6595-6
Keywords: attribution, Categorical Attributes, clustering, composability, Human Behavior, human-in-the-loop security center paradigm, Metrics, pubcrawl, Simplex Theory, Vector Mapping
Abstract: When clustering unlabeled data, categorical attributes are usually treated differently from numerical attributes because of their unique characteristics, which makes it difficult to cluster data containing both types of attributes. In this paper, we propose a strategy based on the Simplex Theory that maps categorical attributes to high-dimensional vectors, so that categorical attributes can be handled in the same way as numerical attributes. To make the Euclidean distance between any two values of an attribute identical, we prove theoretically that a categorical attribute with n distinct values must be mapped to vectors of at least n-1 dimensions. Furthermore, we provide numerical vector mapping solutions under a zero-normalization constraint. Experiments on four datasets show that combining our vector mapping strategy with the K-means algorithm achieves better accuracy than combining similarity measures for categorical attributes with the K-modes algorithm.
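The abstract does not spell out the construction itself. As a minimal sketch, assuming the "0 normalized constraint" means the mapped vectors are centered at the origin and that all pairwise distances should equal a chosen edge length (the hypothetical `edge` parameter), one standard way to obtain such vectors is to center the n one-hot vectors and express them in an orthonormal basis (built by Gram-Schmidt) of the (n-1)-dimensional hyperplane they lie in. This is not necessarily the mapping derived in the paper; `simplex_vertices` and `build_encoder` are illustrative names only.

```python
import math
from typing import Dict, Hashable, List, Sequence

Vector = List[float]


def simplex_vertices(n: int, edge: float = 1.0) -> List[Vector]:
    """Return n points in R^(n-1), pairwise `edge` apart, whose centroid is the origin."""
    if n < 1:
        raise ValueError("need at least one category")
    # Centered one-hot vectors e_i - (1/n) * ones; all lie in the hyperplane sum(x) = 0.
    centered = [[(1.0 if j == i else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    # Gram-Schmidt on the first n-1 centered vectors (linearly independent) yields an
    # orthonormal basis of that (n-1)-dimensional hyperplane.
    basis: List[Vector] = []
    for v in centered[: n - 1]:
        w = v[:]
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    # Coordinates in the new basis preserve distances; centered one-hot vectors are
    # sqrt(2) apart, so rescale to the requested edge length.
    scale = edge / math.sqrt(2.0)
    return [[scale * sum(x * y for x, y in zip(v, b)) for b in basis] for v in centered]


def build_encoder(values: Sequence[Hashable], edge: float = 1.0) -> Dict[Hashable, Vector]:
    """Map each distinct categorical value (in order of first appearance) to a simplex vertex."""
    categories = list(dict.fromkeys(values))
    return dict(zip(categories, simplex_vertices(len(categories), edge)))


if __name__ == "__main__":
    colors = ["red", "green", "blue", "green"]
    enc = build_encoder(colors)
    rows = [enc[c] for c in colors]  # 2-D vectors for a 3-valued attribute
    # Every pair of distinct colors is the same Euclidean distance apart.
    print(round(math.dist(enc["red"], enc["green"]), 6), round(math.dist(enc["red"], enc["blue"]), 6))
```

Because n equidistant points are affinely independent, they cannot fit in fewer than n-1 dimensions, which is why each vertex here has exactly n-1 coordinates. The encoded columns can then be concatenated with numeric columns and passed to an ordinary K-means implementation; how to weight `edge` against the scale of the numeric attributes is left open in this sketch.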
URL: http://doi.acm.org/10.1145/3293475.3293481
DOI: 10.1145/3293475.3293481
Citation Key: an_simplex_2018