Biblio

Filters: Keyword is dimensionality reduction
2023-03-31
Bauspieß, Pia, Olafsson, Jonas, Kolberg, Jascha, Drozdowski, Pawel, Rathgeb, Christian, Busch, Christoph.  2022.  Improved Homomorphically Encrypted Biometric Identification Using Coefficient Packing. 2022 International Workshop on Biometrics and Forensics (IWBF). :1–6.

Efficient large-scale biometric identification is a challenging open problem in biometrics today. Adding biometric information protection by cryptographic techniques increases the computational workload even further. Therefore, this paper proposes an efficient and improved use of coefficient packing for homomorphically protected biometric templates, allowing for the evaluation of multiple biometric comparisons at the cost of one. In combination with feature dimensionality reduction, the proposed technique facilitates a quadratic computational workload reduction for biometric identification, while long-term protection of the sensitive biometric data is maintained throughout the system. Previous work on coefficient packing reported only a linear speed-up. In an experimental evaluation on a public face database, efficient identification in the encrypted domain is achieved on off-the-shelf hardware with no loss in recognition performance. In particular, the proposed improved use of coefficient packing allows for a computational workload reduction down to 1.6% of that of a conventional homomorphically protected identification system without improved packing.
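A minimal plaintext sketch (numpy only, no actual homomorphic encryption library) of the packing idea the paper builds on: concatenating several reduced-dimension gallery templates into one vector lets a single polynomial multiplication, simulated here with a convolution, produce all comparison scores at once.

```python
# Plaintext simulation of coefficient packing for batched biometric comparison.
# In an HE deployment the same coefficient arrangement lives inside a ring
# ciphertext, so one ciphertext multiplication scores k templates at once.
import numpy as np

d, k = 8, 4                                # reduced feature dimension, templates per pack
rng = np.random.default_rng(0)
gallery = rng.standard_normal((k, d))      # k enrolled templates
probe = rng.standard_normal(d)             # query template

packed = gallery.reshape(-1)               # concatenate templates into one "ciphertext"
conv = np.convolve(probe[::-1], packed)    # one multiplication of the packed polynomials

# The k inner-product scores land in evenly spaced coefficients of the product.
scores = conv[d - 1 :: d][:k]
assert np.allclose(scores, gallery @ probe)
```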

2023-01-30
Cámara, Javier, Wohlrab, Rebekka, Garlan, David, Schmerl, Bradley.  2022.  ExTrA: Explaining architectural design tradeoff spaces via dimensionality reduction. Journal of Systems and Software. 198.

In software design, guaranteeing the correctness of run-time system behavior while achieving an acceptable balance among multiple quality attributes remains a challenging problem. Moreover, providing guarantees about the satisfaction of those requirements when systems are subject to uncertain environments is even more challenging. While recent developments in architectural analysis techniques can assist architects in exploring the satisfaction of quantitative guarantees across the design space, existing approaches are still limited because they do not explicitly link design decisions to satisfaction of quality requirements. Furthermore, the amount of information they yield can be overwhelming to a human designer, making it difficult to see the forest for the trees. In this paper we present ExTrA (Explaining Tradeoffs of software Architecture design spaces), an approach to analyzing architectural design spaces that addresses these limitations and provides a basis for explaining design tradeoffs. Our approach applies techniques from machine learning pipelines, namely dimensionality reduction via Principal Component Analysis (PCA) and Decision Tree Learning (DTL), to enable architects to understand how design decisions contribute to the satisfaction of extra-functional properties across the design space. Our results show the feasibility of the approach in two case studies and provide evidence that combining complementary techniques like PCA and DTL is a viable approach to facilitating comprehension of tradeoffs in poorly-understood design spaces.
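A hedged sketch of the two techniques the approach combines, on synthetic stand-in data (the design decisions and quality attributes are hypothetical): PCA exposes the dominant tradeoff axis in the quality space, and a decision tree then links design decisions to regions of that axis.

```python
# Minimal sketch of the PCA + decision-tree idea behind ExTrA on toy data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
decisions = rng.integers(0, 2, size=(200, 4))       # binary design decisions
quality = decisions @ rng.standard_normal((4, 3)) \
          + 0.1 * rng.standard_normal((200, 3))     # quality attributes per config

pc1 = PCA(n_components=1).fit_transform(quality).ravel()  # main tradeoff axis
region = (pc1 > np.median(pc1)).astype(int)               # two tradeoff regions

tree = DecisionTreeClassifier(max_depth=2).fit(decisions, region)
print(export_text(tree, feature_names=[f"decision_{i}" for i in range(4)]))
```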

2023-09-20
He, Zhenghao.  2022.  Comparison Of Different Machine Learning Methods Applied To Obesity Classification. 2022 International Conference on Machine Learning and Intelligent Systems Engineering (MLISE). :467–472.
Estimating obesity levels is an important topic in the medical field, since it can provide useful guidance for people who would like to lose weight or keep fit. This article tries to find a model that can predict obesity and provide people with information on how to avoid becoming overweight. To be more specific, the article applies dimension reduction to the data set to simplify the data and tries to identify the most decisive feature of obesity through Principal Component Analysis (PCA). The article also uses machine learning methods such as Support Vector Machine (SVM) and Decision Tree to predict obesity and to find its major causes. In addition, the article uses an Artificial Neural Network (ANN), which has a more powerful feature extraction ability, for prediction. Finally, the article finds that family history of obesity is the most decisive feature, possibly because obesity is strongly affected by genes or by the family's eating habits. Both the ANN and the Decision Tree reach a prediction accuracy higher than 90%.
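A small illustrative sketch, on synthetic data with hypothetical feature names (the paper's own data set is not reproduced here), of how the loadings of the first principal component can point to a dominant feature:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = ["family_history", "calorie_intake", "activity", "age"]  # hypothetical names
rng = np.random.default_rng(2)
z = rng.standard_normal(300)                   # latent factor driving two features
X = np.column_stack([
    z + 0.2 * rng.standard_normal(300),        # family_history tracks the latent factor
    0.8 * z + 0.4 * rng.standard_normal(300),  # calorie_intake correlates with it
    rng.standard_normal(300),                  # activity: independent noise
    rng.standard_normal(300),                  # age: independent noise
])

pca = PCA().fit(StandardScaler().fit_transform(X))
print("explained variance ratio:", pca.explained_variance_ratio_.round(2))
print("top PC1 loading:", features[int(np.abs(pca.components_[0]).argmax())])
```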
2022-07-14
Gong, Changqing, Dong, Zhaoyang, Gani, Abdullah, Qi, Han.  2021.  Quantum Ciphertext Dimension Reduction Scheme for Homomorphic Encrypted Data. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :903–910.

At present, in the face of the huge and complex data in cloud computing, the parallel computing ability of quantum computing is particularly important. The quantum principal component analysis algorithm is used as a method of quantum state tomography. We perform feature extraction on the eigenvalue matrix of the density matrix after eigendecomposition to achieve dimensionality reduction, and propose a quantum principal component extraction algorithm (QPCE). Compared with the classic algorithm, this algorithm achieves an exponential speedup under certain conditions. The specific realization of the quantum circuit is given. Considering the limited computing power of the client, we also propose a quantum homomorphic ciphertext dimension reduction scheme (QHEDR), in which the client can encrypt the quantum data and upload it to the cloud for computing, with security ensured by the quantum homomorphic encryption scheme. After the computation is completed, the client updates the key locally and decrypts the ciphertext result. The scheme is implemented in the quantum cloud, requires no interaction, and ensures security. In addition, we carry out experimental verification of the QPCE algorithm on IBM's real quantum computing platform. Experimental results show that the algorithm can perform ciphertext dimension reduction safely and effectively.
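A classical numpy illustration (not the quantum circuit) of the quantity QPCE targets: the dominant eigenvectors of a density matrix, obtained here by ordinary eigendecomposition.

```python
# Classical stand-in for what QPCE extracts: the principal eigenvectors of a
# density matrix. The quantum algorithm reaches the same components via phase
# estimation, with speedups under certain conditions.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 4))
rho = X.T @ X                            # unnormalized covariance matrix
rho /= np.trace(rho)                     # unit trace: a valid density matrix

eigvals, eigvecs = np.linalg.eigh(rho)   # eigh returns ascending eigenvalues
order = eigvals.argsort()[::-1]
print("principal eigenvalue:", eigvals[order[0]].round(3))
reduced = X @ eigvecs[:, order[:2]]      # project onto top-2 principal components
```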

2022-01-12
Cámara, Javier, Silva, Mariana, Garlan, David, Schmerl, Bradley.  2021.  Explaining Architectural Design Tradeoff Spaces: a Machine Learning Approach. Proceedings of the 15th European Conference on Software Architecture, Virtual (originally Växjö, Sweden).
In software design, guaranteeing the correctness of run-time system behavior while achieving an acceptable balance among multiple quality attributes remains a challenging problem. Moreover, providing guarantees about the satisfaction of those requirements when systems are subject to uncertain environments is even more challenging. While recent developments in architectural analysis techniques can assist architects in exploring the satisfaction of quantitative guarantees across the design space, existing approaches are still limited because they do not explicitly link design decisions to satisfaction of quality requirements. Furthermore, the amount of information they yield can be overwhelming to a human designer, making it difficult to see the forest for the trees. In this paper, we present an approach to analyzing architectural design spaces that addresses these limitations and provides a basis for explaining design tradeoffs. Our approach combines dimensionality reduction techniques employed in machine learning pipelines with quantitative verification to enable architects to understand how design decisions contribute to the satisfaction of strict quantitative guarantees under uncertainty across the design space. Our results show the feasibility of the approach in two case studies and provide evidence that dimensionality reduction is a viable approach to facilitating comprehension of tradeoffs in poorly-understood design spaces.
2022-05-19
Zhang, Xiaoyu, Fujiwara, Takanori, Chandrasegaran, Senthil, Brundage, Michael P., Sexton, Thurston, Dima, Alden, Ma, Kwan-Liu.  2021.  A Visual Analytics Approach for the Diagnosis of Heterogeneous and Multidimensional Machine Maintenance Data. 2021 IEEE 14th Pacific Visualization Symposium (PacificVis). :196–205.
Analysis of large, high-dimensional, and heterogeneous datasets is challenging, as no single technique is suitable for visualizing and clustering such data in order to make sense of the underlying information. For instance, heterogeneous logs detailing machine repair and maintenance in an organization often need to be analyzed to diagnose errors and identify abnormal patterns, formalize root-cause analyses, and plan preventive maintenance. Such real-world datasets are also beset by issues such as inconsistent and/or missing entries. To conduct an effective diagnosis, it is important to extract and understand patterns from the data with support from analytic algorithms (e.g., finding that certain kinds of machine complaints occur more in the summer) while keeping the human in the loop. To address these challenges, we adopt existing techniques for dimensionality reduction (DR) and clustering of numerical, categorical, and text data dimensions, and introduce a visual analytics approach that uses multiple coordinated views to connect DR + clustering results across each kind of data dimension. To help analysts label the clusters, each clustering view is supplemented with techniques and visualizations that contrast a cluster of interest with the rest of the dataset. Our approach assists analysts in making sense of machine maintenance logs and their errors; the gained insights then help them carry out preventive maintenance. We illustrate and evaluate our approach through use cases and expert studies, respectively, and discuss generalization of the approach to other heterogeneous data.
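A minimal sketch of the kind of pipeline one coordinated view might use for the text dimension (the log lines are hypothetical): TF-IDF features, SVD for dimensionality reduction, k-means clustering, and top terms per cluster to help an analyst label it.

```python
# DR + clustering of a toy text dimension of a machine maintenance log.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

logs = ["coolant leak on press three", "spindle motor overheating",
        "coolant pump replaced", "motor bearing noise",
        "hydraulic press leaking coolant", "overheating spindle stopped"]
vec = TfidfVectorizer()
X = vec.fit_transform(logs)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)  # DR step
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

terms = np.array(vec.get_feature_names_out())
for c in range(2):  # contrast a cluster with the rest via mean TF-IDF weight
    top = np.asarray(X[labels == c].mean(axis=0)).ravel().argsort()[::-1][:3]
    print(f"cluster {c}:", terms[top])
```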
2022-06-09
Qiang, Rong.  2021.  Improved Depth Neural Network Industrial Control Security Algorithm Based On PCA Dimension Reduction. 2021 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE). :891–894.
In order to improve the security and anti-interference ability of industrial control systems, this paper proposes an industrial control defense method based on PCA dimension reduction and an improved deep neural network. First, the proposed method reduces the dimensionality of the industrial data using the dimension reduction theory of principal component analysis (PCA). Then the improved deep neural network extracts features from the reduced data. Finally, a softmax classifier classifies the industrial data. Experimental results show that, compared with the unintegrated algorithm, this method achieves higher recognition accuracy and has great application potential.
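A hedged sketch of the described pipeline on synthetic data: PCA dimension reduction followed by a neural network whose final softmax layer does the classification (the layer sizes and component count are assumptions, not the paper's settings).

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=1000, n_features=40, n_informative=12,
                           n_classes=3, random_state=0)   # stand-in industrial data
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

model = make_pipeline(PCA(n_components=12),                # PCA dimension reduction
                      MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                                    random_state=0))       # softmax output for 3 classes
print("accuracy:", model.fit(Xtr, ytr).score(Xte, yte))
```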
2022-05-19
Anusha, M, Leelavathi, R.  2021.  Analysis on Sentiment Analytics Using Deep Learning Techniques. 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :542–547.
Sentiment analytics is the process of applying natural language processing and text-analysis methods to define and extract subjective information from text. Natural language processing and text classification can deal with limited corpus data, and semantic-text and word-embedding methods have gained increasing attention. Deep learning is a powerful method that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. In different applications of sentiment analytics, deep learning methods are used at the sentence, document, and aspect levels. This review paper focuses on the main difficulties in the sentiment assessment stage that significantly affect sentiment scoring, pooling, and polarity detection. The most popular deep learning methods are the Convolutional Neural Network and the Recurrent Neural Network. Finally, a comparative study is made through a broad literature survey of deep learning models.
2021-09-21
Ghanem, Sahar M., Aldeen, Donia Naief Saad.  2020.  AltCC: Alternating Clustering and Classification for Batch Analysis of Malware Behavior. 2020 International Symposium on Networks, Computers and Communications (ISNCC). :1–6.
The most common goal of malware analysis is to determine if a given binary is malware or benign. Another objective is similarity analysis of malware binaries to understand how new samples differ from known ones. Similarity analysis helps to analyze the malware with respect to those already analyzed and guides the discovery of novel aspects that should be analyzed more in depth. In this work, we are concerned with detecting similarities and differences between malware binaries. Thousands of malware samples are created every day, and machine learning is an indispensable tool for their analysis. Previous work has studied clustering and classification as competing paradigms. In this work, however, a malware similarity analysis technique (AltCC) is proposed that alternates the use of clustering and classification. In addition, it assumes the malware samples are not available all at once but are processed in batches. Initially, clustering is applied to the first batch to group similar binaries into novel malware classes. Then, the discovered classes are used to train a classifier. For the following batches, the classifier is used to decide whether a new binary belongs to a known class or is otherwise unclassified. The unclassified binaries are clustered and the process repeats. Malware clustering (i.e., labeling) may entail further human expert analysis but dramatically reduces the effort. The effectiveness of AltCC is studied using a dataset of 29,661 malware binaries that represent malware received in six consecutive days/batches. When KMeans is used to label the dataset all at once and its labeling is compared to AltCC's, the adjusted Rand index scores 0.71.
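A toy sketch of the alternation AltCC describes, using synthetic feature vectors and a hypothetical confidence threshold: cluster the first batch, train a classifier on the discovered classes, keep confident predictions on later batches, and re-cluster the remainder as new classes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

batches = [make_blobs(n_samples=200, centers=4, random_state=s)[0] for s in range(3)]

# Batch 1: clustering discovers the initial malware classes.
X_all = batches[0]
y_all = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_all)
clf = RandomForestClassifier(random_state=0).fit(X_all, y_all)
n_classes = 4

for X in batches[1:]:
    proba = clf.predict_proba(X)
    known = proba.max(axis=1) >= 0.8               # confident -> an already known class
    labels = np.full(len(X), -1)
    labels[known] = clf.classes_[proba[known].argmax(axis=1)]
    if (~known).sum() >= 2:                        # cluster the unclassified remainder
        extra = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[~known])
        labels[~known] = n_classes + extra
        n_classes += 2
    keep = labels >= 0                             # retrain on everything labeled so far
    X_all = np.vstack([X_all, X[keep]])
    y_all = np.concatenate([y_all, labels[keep]])
    clf = RandomForestClassifier(random_state=0).fit(X_all, y_all)
print("classes discovered:", n_classes)
```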
2021-06-24
Nilă, Constantin, Patriciu, Victor.  2020.  Taking advantage of unsupervised learning in incident response. 2020 12th International Conference on Electronics, Computers and Artificial Intelligence (ECAI). :1–6.
This paper looks at new ways to reduce the time needed for incident response triage operations. By employing unsupervised K-means, enhanced by both manual and automated feature extraction techniques, the incident response team can quickly and decisively extrapolate the malicious web requests that led to the investigated exploitation. More precisely, we evaluated the benefits of different visualization-enhancing methods that can improve feature selection and other dimensionality reduction techniques. Furthermore, early tests of the overall framework have shown that the time needed for triage is diminished, more so if a hybrid multi-model is employed. Our case study revolved around the need for unsupervised classification of unknown web access logs. However, the demonstrated principles may be considered for other applications of machine learning in the cybersecurity domain.
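A toy sketch of the idea on hypothetical web access logs: hand-crafted per-request features feed unsupervised K-means, and the minority cluster becomes the triage candidate set for the analyst.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

requests = ["GET /index.html", "GET /style.css", "GET /index.html",
            "GET /search?q=%27%20OR%201=1--", "POST /login.php?user=admin%27--",
            "GET /about.html"]

def featurize(r):  # manual feature extraction: length, special chars, injection hint
    return [len(r), sum(c in "%'=-;" for c in r), int("'" in r or "%27" in r)]

X = StandardScaler().fit_transform(np.array([featurize(r) for r in requests], float))
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
suspicious = np.bincount(km.labels_).argmin()      # the minority cluster
for r, label in zip(requests, km.labels_):
    if label == suspicious:
        print("review:", r)
```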
2021-09-08
Ali, Jehad, Roh, Byeong-hee, Lee, Byungkyu, Oh, Jimyung, Adil, Muhammad.  2020.  A Machine Learning Framework for Prevention of Software-Defined Networking Controller from DDoS Attacks and Dimensionality Reduction of Big Data. 2020 International Conference on Information and Communication Technology Convergence (ICTC). :515–519.
The controller is an indispensable entity in software-defined networking (SDN), as it maintains a global view of the underlying network. However, if the controller fails to respond to the network due to a distributed denial-of-service (DDoS) attack, the attacker can take charge of the whole network by launching a spoof controller and can also modify the flow tables. Hence, fast and accurate detection of DDoS attacks against the controller will make the SDN reliable and secure. Moreover, Internet traffic is drastically increasing due to the unprecedented growth of connected devices. Consequently, the processing of a large number of requests causes a performance bottleneck at the SDN controller. In this paper, we propose a hierarchical control plane SDN architecture for multi-domain communication that uses a statistical method called principal component analysis (PCA) to reduce the dimensionality of the big data traffic, and a support vector machine (SVM) classifier to detect DDoS attacks. SVM has high accuracy and a low false positive rate, while PCA drastically reduces the number of attributes. Consequently, classification performance and accuracy are improved while the false positive rate is reduced.
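A hedged sketch of the detection pipeline on synthetic, imbalanced traffic features (the dataset, component count, and kernel are assumptions): PCA shrinks the feature space, then an SVM flags attack flows.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)  # 1 = attack traffic
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

detector = make_pipeline(StandardScaler(), PCA(n_components=8), SVC())
print("accuracy:", detector.fit(Xtr, ytr).score(Xte, yte))
```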
2020-08-17
Yao, Yepeng, Su, Liya, Lu, Zhigang, Liu, Baoxu.  2019.  STDeepGraph: Spatial-Temporal Deep Learning on Communication Graphs for Long-Term Network Attack Detection. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :120–127.
Network communication data are high-dimensional and spatiotemporal, and their information content is often degraded by common traffic analysis methods. For long-term network attack detection based on network flows, it is important to extract a discriminative, high-dimensional intrinsic representation of such flows. This work focuses on a hybrid deep neural network design using a combination of a convolutional neural network (CNN) and long short-term memory (LSTM) with graph similarity measures to learn high-dimensional representations from the network traffic. In particular, given a set of network flows, we commence by constructing a temporal communication graph and then computing graph kernel matrices. Having obtained the kernel matrices, we use the kernel values between graphs to calculate a characterization vector for each graph via graph signal processing. This vector can be regarded as a kernel-based similarity embedding of the graph that integrates structural similarity information and leverages an efficient graph kernel based on the graph Laplacian matrix. Our approach exploits graph structures as additional prior information, the graph Laplacian matrix for feature extraction, and hybrid deep learning models for long-term information learning on communication graphs. Experiments on two real-world network attack datasets show that our approach can extract more discriminative representations, leading to improved accuracy in a supervised classification task. The experimental results show that our method increases the overall accuracy by approximately 10%–15%.
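A minimal numpy sketch of one ingredient, the normalized graph Laplacian whose spectrum characterizes a communication graph (the adjacency matrix is a toy example, not the paper's data):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],              # adjacency of a small communication graph
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian

spectrum = np.sort(np.linalg.eigvalsh(L))          # spectral signature of the graph
print("characterization vector:", spectrum.round(3))
```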
2020-12-01
Abdulhammed, R., Faezipour, M., Musafer, H., Abuzneid, A..  2019.  Efficient Network Intrusion Detection Using PCA-Based Dimensionality Reduction of Features. 2019 International Symposium on Networks, Computers and Communications (ISNCC). :1–6.

Designing a machine learning based network intrusion detection system (IDS) with high-dimensional features can lead to prolonged classification processes, whereas low-dimensional features can shorten them. Moreover, classification of network traffic with imbalanced class distributions has posed a significant drawback to the performance attainable by most well-known classifiers. In the presence of imbalanced data, the usual metrics may fail to provide adequate information about the performance of the classifier. This study first uses Principal Component Analysis (PCA) as a feature dimensionality reduction approach. The resulting low-dimensional features are then used to build various classifiers such as Random Forest (RF), Bayesian Network, Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) for designing an IDS. The experimental findings with low-dimensional features in binary and multi-class classification show better performance in terms of Detection Rate (DR), F-Measure, False Alarm Rate (FAR), and Accuracy. Furthermore, we apply a multi-class combined performance metric, Combined Mc, that incorporates FAR, DR, Accuracy, and class distribution parameters. In addition, we developed a uniform-distribution-based balancing approach to handle the imbalanced distribution of the minority class instances in the CICIDS2017 network intrusion dataset. We were able to reduce the CICIDS2017 dataset's feature dimensions from 81 to 10 using PCA, while maintaining a high accuracy of 99.6% in multi-class and binary classification.
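A sketch of the recipe on synthetic stand-in data (CICIDS2017 itself is not loaded here, and the balancing step is a simple resampling interpretation): resample every class toward a uniform distribution, reduce 81 features to 10 with PCA, and train one of the evaluated classifiers.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=3000, n_features=81, n_informative=15,
                           n_classes=3, weights=[0.8, 0.15, 0.05], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)                 # uniform-distribution balancing:
target = max(np.bincount(ytr))                 # resample every class to the same count
idx = np.concatenate([rng.choice(np.flatnonzero(ytr == c), target)
                      for c in np.unique(ytr)])

ids = make_pipeline(PCA(n_components=10), RandomForestClassifier(random_state=0))
print("test accuracy:", ids.fit(Xtr[idx], ytr[idx]).score(Xte, yte))
```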

2019-10-15
Panagiotakis, C., Papadakis, H., Fragopoulou, P..  2018.  Detection of Hurriedly Created Abnormal Profiles in Recommender Systems. 2018 International Conference on Intelligent Systems (IS). :499–506.

Recommender systems try to predict the preferences of users for specific items. These systems suffer from profile injection attacks, where the attackers have some prior knowledge of the system ratings and their goal is to promote or demote a particular item by introducing abnormal (anomalous) ratings. The detection of both cases is a challenging problem. In this paper, we propose a framework to spot anomalous rating profiles (outliers), where the outliers hurriedly create a profile that injects into the system either random ratings or specific ratings, without any prior knowledge of the existing ratings. The proposed detection method is based on the unpredictable behavior of the outliers in a validation set, on the user-item rating matrix, and on the similarity between users. The proposed system is totally unsupervised, and in the last step it uses the k-means clustering method to automatically spot the spurious profiles. For cases where labeled sample data are available, a random forest classifier is trained to show how supervised methods outperform unsupervised ones. Experimental results on the MovieLens 100k and MovieLens 1M datasets demonstrate the high performance of the proposed schemes.
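A toy sketch of the final unsupervised step, with hypothetical profile features and injected all-high "push" profiles: k-means over simple per-profile statistics, flagging the minority cluster as spurious.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
R = rng.integers(1, 6, size=(50, 20)).astype(float)   # genuine user-item ratings
R[-5:] = 5.0                                          # injected push-attack profiles
item_mean = R.mean(axis=0)

feats = np.column_stack([np.abs(R - item_mean).mean(axis=1),  # deviation from consensus
                         R.std(axis=1)])                      # within-profile variance
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
flagged = labels == np.bincount(labels).argmin()              # minority cluster
print("flagged profiles:", np.flatnonzero(flagged))
```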

2020-10-12
Sharafaldin, Iman, Ghorbani, Ali A..  2018.  EagleEye: A Novel Visual Anomaly Detection Method. 2018 16th Annual Conference on Privacy, Security and Trust (PST). :1–6.
We propose a novel visualization technique (EagleEye) for intrusion detection, which visualizes a host as a community of system call traces in two-dimensional space. The goal of EagleEye is to visually cluster the system call traces. Although human eyes can easily perceive anomalies in the EagleEye view, we propose two different methods, called SAM and CPM, that use the concept of data depth to help administrators distinguish between normal and abnormal behaviors. Our experimental results, conducted on the Australian Defence Force Academy Linux Dataset (ADFA-LD), a modern system call dataset that includes new exploits and attacks on various programs, show EagleEye's efficiency in detecting diverse exploits and attacks.
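A small sketch of the visualization idea with hypothetical traces (the SAM/CPM data-depth methods themselves are not implemented here): embed system-call traces as n-gram count vectors and project them to two dimensions, where an abnormal trace falls away from the normal cluster.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer

traces = ["open read read write close", "open read write close",
          "open read read write close", "fork execve socket connect send",
          "open read write close"]
X = CountVectorizer(ngram_range=(2, 2)).fit_transform(traces).toarray()
xy = PCA(n_components=2).fit_transform(X)     # 2-D coordinates for plotting
for t, (x, y) in zip(traces, xy.round(2)):
    print(f"({x:6.2f}, {y:6.2f})  {t}")
```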
2018-06-11
Fujiwara, Yasuhiro, Marumo, Naoki, Blondel, Mathieu, Takeuchi, Koh, Kim, Hideaki, Iwata, Tomoharu, Ueda, Naonori.  2017.  Scaling Locally Linear Embedding. Proceedings of the 2017 ACM International Conference on Management of Data. :1479–1492.
Locally Linear Embedding (LLE) is a popular approach to dimensionality reduction, as it can effectively represent nonlinear structures of high-dimensional data. For dimensionality reduction, it computes a nearest neighbor graph from a given dataset, where edge weights are obtained by applying the Lagrange multiplier method, and it then computes eigenvectors of the LLE kernel, where the edge weights are used to obtain the kernel. Although LLE is used in many applications, its computation cost is significantly high. This is because the cost of obtaining the edge weights is cubic in the number of edges incident to each data point, and the cost of obtaining the eigenvectors of the LLE kernel is cubic in the number of data points. Our approach, Ripple, is based on two ideas: (1) it incrementally updates the edge weights by exploiting the Woodbury formula, and (2) it efficiently computes eigenvectors of the LLE kernel by exploiting the LU decomposition-based inverse power method. Experiments show that Ripple is significantly faster than the original approach of LLE while guaranteeing the same dimensionality reduction results.
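For reference, the baseline LLE that Ripple accelerates is available off the shelf; a minimal scikit-learn example on the classic swiss-roll manifold:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
Y = lle.fit_transform(X)                 # nonlinear 3-D -> 2-D embedding
print(Y.shape, "reconstruction error:", lle.reconstruction_error_)
```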
2017-12-28
Liu, H., Ditzler, G..  2017.  A fast information-theoretic approximation of joint mutual information feature selection. 2017 International Joint Conference on Neural Networks (IJCNN). :4610–4617.

Feature selection is an important step in data analysis to address the curse of dimensionality. Such dimensionality reduction techniques are particularly important when classification is required and the model scales polynomially with the number of features (e.g., in applications such as genomics, life sciences, and cyber-security). Feature selection is the process of finding the minimum subset of features that allows for the maximum predictive power. Many of the state-of-the-art information-theoretic feature selection approaches use a greedy forward search; however, there are concerns with this search regarding its efficiency and optimality. A unified framework was recently presented for information-theoretic feature selection that tied together many of the works of the past twenty years. That work showed that joint mutual information maximization (JMI) is generally the best option; however, the complexity of the greedy search for JMI scales quadratically, making it infeasible on high-dimensional datasets. In this contribution, we propose a fast approximation of JMI based on information theory. Our approach takes advantage of decomposing the calculations within JMI to speed up a typical greedy search. We benchmarked the proposed approach against JMI on several UCI datasets and demonstrate that it returns feature sets that are highly consistent with JMI, while decreasing the run time required to perform feature selection.
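A didactic sketch of greedy JMI-style forward selection on small discrete synthetic data: each step scores a candidate feature f by the sum, over already selected features s, of I((f, s); y), which is exactly the per-step cost that grows quadratically and that the paper approximates.

```python
import numpy as np

def entropy(*cols):
    _, counts = np.unique(np.column_stack(cols), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def joint_mi(f, s, y):                 # I((f, s); y)
    return entropy(f, s) + entropy(y) - entropy(f, s, y)

rng = np.random.default_rng(6)
X = rng.integers(0, 3, size=(400, 8))
y = X[:, 0] + (X[:, 3] % 2)            # label depends jointly on features 0 and 3

# First pick: joint_mi(f, f, y) reduces to plain I(f; y).
selected = [int(np.argmax([joint_mi(X[:, j], X[:, j], y) for j in range(8)]))]
while len(selected) < 3:
    remaining = [j for j in range(8) if j not in selected]
    scores = [sum(joint_mi(X[:, j], X[:, s], y) for s in selected)
              for j in remaining]      # one MI term per selected feature: O(k) per candidate
    selected.append(remaining[int(np.argmax(scores))])
print("selected features:", selected)
```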

2015-05-05
Zadeh, B.Q., Handschuh, S..  2014.  Random Manhattan Indexing. 2014 25th International Workshop on Database and Expert Systems Applications (DEXA). :203–208.

Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in text processing. In these models, high-dimensional, often sparse vectors represent text units. In an application, the similarity of vectors -- and hence the text units that they represent -- is computed by a distance formula. The high dimensionality of vectors, however, is a barrier to the performance of methods that employ VSMs. Consequently, a dimensionality reduction technique is employed to alleviate this problem. This paper introduces a new method, called Random Manhattan Indexing (RMI), for the construction of L1 normed VSMs at reduced dimensionality. RMI combines the construction of a VSM and dimension reduction into an incremental, and thus scalable, procedure. To attain its goal, RMI employs sparse Cauchy random projections.
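A hedged numpy sketch of the core idea (the sparsification of the projection matrix is omitted here): Cauchy-distributed random projections approximately preserve L1 distances, which can then be recovered with a median estimator.

```python
import numpy as np

rng = np.random.default_rng(7)
dim, k = 10000, 400                       # original and reduced dimensionality
x = rng.random(dim)                       # stand-ins for two dense text vectors
y = rng.random(dim)

R = rng.standard_cauchy((k, dim))         # Cauchy entries give an L1-preserving sketch
estimate = np.median(np.abs(R @ x - R @ y))   # median of |Cauchy| recovers the L1 norm
print("true L1:", np.abs(x - y).sum().round(1), " estimated:", estimate.round(1))
```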