Compositional Data Analysis with PLS-DA and Security Applications

Title: Compositional Data Analysis with PLS-DA and Security Applications
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Ankam, D., Bouguila, N.
Conference Name: 2018 IEEE International Conference on Information Reuse and Integration (IRI)
Date Published: July
Keywords: Aitchison transformation, Compositional data, compositional data analysis, compositional vectors, compositionality, Data analysis, Data models, data science models, data-based power transformation, Electronic mail, Euclidean distance, ILR, Information Reuse and Security, information system security applications, Intrusion detection, isometric log ratio transformation, Least squares approximations, Loading, Mathematical model, partial least squares discriminant analysis, PLS-DA algorithm, pubcrawl, Resiliency, security of data, spam filtering, spam filters, Standards, statistical analysis, unsolicited e-mail
Abstract: In compositional data, the relative proportions of the components carry the relevant information. In such cases, the Euclidean distance fails to capture variation when used within data science models and approaches such as partial least squares discriminant analysis (PLS-DA). Indeed, the Euclidean distance implicitly assumes that the data is normally distributed, which is not the case for compositional vectors. The Aitchison transformation has been considered a standard in compositional data analysis. In this paper, we consider two other transformation methods, the isometric log-ratio (ILR) transformation and the data-based power (alpha) transformation, before feeding the data to the PLS-DA algorithm for classification [1]. To investigate the merits of both methods, we apply them to two challenging information system security applications, namely spam filtering and intrusion detection.
DOI: 10.1109/IRI.2018.00058
Citation Key: ankam_compositional_2018