Title | Compositional Data Analysis with PLS-DA and Security Applications |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Ankam, D., Bouguila, N. |
Conference Name | 2018 IEEE International Conference on Information Reuse and Integration (IRI) |
Date Published | July |
Keywords | Aitchison transformation, Compositional data, compositional data analysis, compositional vectors, compositionality, Data analysis, Data models, data science models, data-based power transformation, Electronic mail, Euclidean distance, ILR, Information Reuse and Security, information system security applications, Intrusion detection, isometric log ratio transformation, Least squares approximations, Loading, Mathematical model, partial least squares discriminant analysis, PLS-DA algorithm, pubcrawl, Resiliency, security of data, spam filtering, spam filters, Standards, statistical analysis, unsolicited e-mail |
Abstract | In compositional data, the relative proportions of the components carry the relevant information. In such cases, Euclidean distance fails to capture variation when used within data science models and approaches such as partial least squares discriminant analysis (PLS-DA). Indeed, the Euclidean distance implicitly assumes that the data are normally distributed, which is not the case for compositional vectors. The Aitchison transformation has been considered a standard in compositional data analysis. In this paper, we consider two other transformation methods, the isometric log-ratio (ILR) transformation and the data-based power (alpha) transformation, before feeding the data to the PLS-DA algorithm for classification [1]. To investigate the merits of both methods, we apply them to two challenging information system security applications, namely spam filtering and intrusion detection. |
DOI | 10.1109/IRI.2018.00058 |
Citation Key | ankam_compositional_2018 |
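
As a rough illustration of the isometric log-ratio (ILR) transformation named in the abstract, the following is a minimal sketch and not the authors' implementation. The function names (`closure`, `clr`, `helmert_basis`, `ilr`) and the choice of a Helmert-type orthonormal basis are assumptions made for this example; the paper may use a different basis. All parts are assumed strictly positive, so zero counts (common in spam features) would need separate handling before the logarithm is taken.

```python
import numpy as np


def closure(x):
    """Rescale each composition so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio: log parts minus their mean log (assumes x > 0)."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)


def helmert_basis(d):
    """Hypothetical helper: a D x (D-1) matrix whose columns are orthonormal
    and each sum to zero (one common choice of ILR basis)."""
    v = np.zeros((d, d - 1))
    for i in range(d - 1):
        k = i + 1
        v[:k, i] = 1.0 / k          # first k parts share equal weight
        v[k, i] = -1.0              # contrasted against part k+1
        v[:, i] *= np.sqrt(k / (k + 1.0))  # normalize column to unit length
    return v


def ilr(x):
    """ILR coordinates: project the CLR vector onto the orthonormal basis."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ helmert_basis(x.shape[-1])


# Usage: two 3-part compositions map to 2 real-valued coordinates each.
samples = np.array([[0.2, 0.3, 0.5],
                    [0.6, 0.1, 0.3]])
coords = ilr(samples)
print(coords.shape)
```

Because the basis is orthonormal, Euclidean distances between ILR coordinates equal Aitchison distances between the original compositions, which is why such coordinates can be passed to a Euclidean method like PLS-DA.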