Title | Simplex Based Vector Mapping for Categorical Attributes Clustering |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | An, Ning; Jiang, Siyuan; Yang, Jiaoyun; Li, Lian |
Conference Name | Proceedings of the 2018 International Conference on Computational Intelligence and Intelligent Systems |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-6595-6 |
Keywords | attribution, Categorical Attributes, clustering, composability, Human Behavior, human-in-the-loop security center paradigm, Metrics, pubcrawl, Simplex Theory, Vector Mapping |
Abstract | When clustering unlabeled data, categorical attributes are usually treated differently from numerical attributes because of their unique characteristics, which makes it difficult to cluster data containing both types of attributes. In this paper, we propose a strategy that maps categorical attributes to high-dimensional vectors based on Simplex Theory, so that categorical attributes can be handled in the same way as numerical attributes. To achieve identical Euclidean distances between any two values, we prove theoretically that a categorical attribute with n distinct values must be mapped to vectors of at least n-1 dimensions. Furthermore, numerical vector mapping solutions are provided under a 0 normalized constraint. Experimentally, we show on four datasets that integrating our vector mapping strategy with the K-means algorithm achieves better accuracy than integrating similarities for categorical attributes with the K-modes algorithm. |
URL | http://doi.acm.org/10.1145/3293475.3293481 |
DOI | 10.1145/3293475.3293481 |
Citation Key | an_simplex_2018 |
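
The abstract's central claim, that n category values can be placed at identical pairwise Euclidean distances using vectors of n-1 dimensions, matches the geometry of a regular simplex. The sketch below illustrates that idea only; it is not the authors' construction. It assumes a Helmert-style set of contrast rows to build the vertices, and it reads the "0 normalized constraint" as "the vectors sum to zero in every dimension", which is an interpretation rather than something the record states. The names `simplex_vectors` and `encode_column` are hypothetical.

```python
import math


def simplex_vectors(n):
    """Return n points in (n-1)-dimensional space forming a regular simplex.

    Assumption (not from the paper): vertices are built from Helmert-style
    contrast rows, which gives pairwise distance sqrt(2) and a centroid
    at the origin.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    vectors = [[0.0] * (n - 1) for _ in range(n)]
    for k in range(1, n):
        # Row k: the first k points get +scale, point k gets -k * scale,
        # and all later points stay at 0 in this dimension.
        scale = 1.0 / math.sqrt(k * (k + 1))
        for i in range(k):
            vectors[i][k - 1] = scale
        vectors[k][k - 1] = -k * scale
    return vectors


def encode_column(values):
    """Map each distinct category in `values` to one simplex vertex.

    Returns the encoded column and the category-to-vector lookup table.
    """
    categories = sorted(set(values))
    table = dict(zip(categories, simplex_vectors(len(categories))))
    return [table[v] for v in values], table


if __name__ == "__main__":
    encoded, table = encode_column(["red", "green", "blue", "red"])
    for category, vector in table.items():
        print(category, vector)
```

Under these assumptions, the encoded vectors could be concatenated with the numerical attributes of each record before running a distance-based algorithm such as K-means, so that no pair of category values is treated as closer than any other pair.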