Biblio

Filters: Keyword is dimension reduction
2023-09-20
He, Zhenghao.  2022.  Comparison Of Different Machine Learning Methods Applied To Obesity Classification. 2022 International Conference on Machine Learning and Intelligent Systems Engineering (MLISE). :467–472.
Estimating obesity levels is an important topic in the medical field, since it can provide useful guidance for people who want to lose weight or keep fit. This article tries to find a model that can predict obesity and give people information on how to avoid becoming overweight. More specifically, it applies dimension reduction to the data set to simplify the data, and uses Principal Component Analysis (PCA) to try to identify the most decisive feature of obesity. The article also uses machine learning methods such as Support Vector Machines (SVM) and Decision Trees to predict obesity and to look for its major causes, and in addition applies an Artificial Neural Network (ANN), whose more powerful feature-extraction ability suits this prediction task. Finally, the article finds that family history of obesity is the most decisive feature, possibly because obesity is strongly affected by genes or by family eating habits, and both the ANN and the Decision Tree achieve prediction accuracy above 90%.
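As a concrete illustration of the pipeline this abstract describes, here is a minimal sketch of PCA-based dimension reduction followed by a decision-tree classifier. The random arrays stand in for the obesity dataset, and every name and parameter is an illustrative assumption rather than the authors' code.

```python
# Minimal sketch: PCA for dimension reduction, then a decision tree.
# X and y are random placeholders for the obesity dataset.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X = np.random.rand(200, 8)            # placeholder feature matrix
y = np.random.randint(0, 4, 200)      # placeholder obesity-level labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduce dimension with PCA; the first component's loadings indicate
# which original feature dominates it.
pca = PCA(n_components=3).fit(X_train)
print("PC1 loadings:", pca.components_[0])

clf = DecisionTreeClassifier(random_state=0).fit(pca.transform(X_train), y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(pca.transform(X_test))))
```

With real data, inspecting the largest-magnitude loading of the first principal component mirrors the paper's use of PCA to single out a dominant feature.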
2021-02-23
Xia, H., Gao, N., Peng, J., Mo, J., Wang, J..  2020.  Binarized Attributed Network Embedding via Neural Networks. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.
Traditional attributed network embedding methods are designed to map the structural and attribute information of networks jointly into a continuous Euclidean space, while recently a novel branch of them, named binarized attributed network embedding, has emerged to learn binary codes in Hamming space, aiming to save time and memory costs and to naturally fit node retrieval tasks. However, current binarized attributed network embedding methods are scarce and mostly ignore the local attribute similarity between each pair of nodes. Besides, none of them attempt to control the independence of each dimension (bit) of the learned binary representation vectors. As existing methods still leave room for improvement, we propose an unsupervised Neural-based Binarized Attributed Network Embedding (NBANE) approach. Firstly, we inherit the Weisfeiler-Lehman proximity matrix from predecessors to aggregate high-order features for each node. Secondly, we feed the aggregated features into an autoencoder with an attribute-similarity penalty term and an orthogonality term to perform further dimension reduction. To solve the resulting integer optimization problem we adopt a relaxation-quantization method while training the neural networks. Empirically, we evaluate the performance of NBANE through node classification and clustering tasks on three real-world datasets and study a case of fast retrieval in academic networks. Our method achieves better performance than state-of-the-art baseline methods of various types.
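The autoencoder-with-orthogonality idea can be sketched compactly. The snippet below is a schematic, not the NBANE implementation: it keeps the relaxed (tanh) codes, a reconstruction loss, and an orthogonality penalty that discourages correlated bits, and omits the Weisfeiler-Lehman aggregation and the attribute-similarity term; all sizes and weights are assumptions.

```python
# Schematic sketch of a binarizing autoencoder with an orthogonality penalty.
import torch
import torch.nn as nn

class BinarizedAE(nn.Module):
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, code_dim)
        self.dec = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        z = torch.tanh(self.enc(x))   # relaxed continuous codes in (-1, 1)
        return z, self.dec(z)

def loss_fn(x, z, x_hat, lam=0.1):
    recon = ((x - x_hat) ** 2).mean()           # reconstruction term
    gram = z.t() @ z / z.shape[0]
    # Push Z^T Z toward the identity so each bit carries independent information.
    ortho = ((gram - torch.eye(z.shape[1])) ** 2).sum()
    return recon + lam * ortho

x = torch.randn(64, 32)               # placeholder aggregated node features
model = BinarizedAE(32, 8)
z, x_hat = model(x)
print(loss_fn(x, z, x_hat))
codes = torch.sign(z.detach())        # quantize: binary codes in Hamming space
```

The sign step at the end corresponds to the quantization half of the relaxation-quantization scheme the abstract mentions.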
2017-05-16
Pandey, Shishir, Vaze, Rahul.  2016.  Trustworthiness of t-Distributed Stochastic Neighbour Embedding. Proceedings of the 3rd IKDD Conference on Data Science, 2016. :17:1–17:2.

A well-known technique for embedding high-dimensional objects in two- or three-dimensional space is t-distributed stochastic neighbour embedding (t-SNE). t-SNE minimizes the Kullback-Leibler (KL) divergence between two probability distributions, one induced on points in the high-dimensional space and the other induced on points in the low-dimensional embedding space. In this work, we consider a more general framework that uses the Rényi divergence, which is parametrized by its order α; the KL divergence is the special case α → 1. We study how various Rényi divergences perform when compared to the KL divergence, and show that in terms of the metrics of trustworthiness and neighbourhood preservation, the embedding becomes better as the Rényi divergence approaches the KL divergence.
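For readers who want to compute the trustworthiness metric the paper evaluates, here is a minimal sketch using scikit-learn. Note that scikit-learn's t-SNE minimizes the standard KL divergence only; the paper's Rényi-divergence variant is not available there, so this shows just the evaluation side, and the data and parameters are placeholders.

```python
# Minimal sketch: embed with (KL-based) t-SNE and score trustworthiness.
import numpy as np
from sklearn.manifold import TSNE, trustworthiness

X = np.random.rand(300, 50)           # placeholder high-dimensional data
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

# Trustworthiness lies in [0, 1] and penalizes points that are neighbours
# in the embedding but not in the original space.
print(trustworthiness(X, X_2d, n_neighbors=10))
```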

2015-05-06
Huang, T., Drake, B., Aalfs, D., Vidakovic, B..  2014.  Nonlinear Adaptive Filtering with Dimension Reduction in the Wavelet Domain. Data Compression Conference (DCC), 2014. :408–408.

Recent advances in adaptive filter theory and in hardware for signal acquisition have led to the realization that purely linear algorithms are often not adequate in these domains. Nonlinearities in the input space have become apparent in today's real-world problems, and the algorithms that process the data must keep pace with the advances in signal acquisition. Recently, kernel adaptive (online) filtering algorithms have been proposed that make no assumptions regarding the linearity of the input space. Additionally, advances in wavelet data compression/dimension reduction have led to new algorithms that are appropriate for producing a hybrid nonlinear filtering framework. In this paper we utilize a combination of wavelet dimension reduction and kernel adaptive filtering. We derive algorithms in which the dimension of the data is reduced by a wavelet transform, and then apply kernel adaptive filtering algorithms to the reduced-domain data to find the appropriate model parameters, demonstrating improved minimization of the mean-squared error (MSE). Another important feature of our methods is that the wavelet filter is also chosen on-the-fly, based on the data. In particular, it is shown that using a few optimal wavelet coefficients from the constructed wavelet filter, for both training and testing data sets, as the input to the kernel adaptive filter yields convergence to a near-optimal learning curve (MSE). We demonstrate these algorithms on simulated data and on a real data set from food processing.
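To make the hybrid framework concrete, below is a minimal sketch pairing wavelet-domain dimension reduction with kernel least-mean-squares (KLMS), one common kernel adaptive filter. This is not the authors' algorithm: it uses a fixed wavelet ('db4') rather than their data-driven filter choice, and the window size, kernel width, and step size are illustrative assumptions.

```python
# Sketch: keep the largest wavelet coefficients of each window, then run KLMS.
import numpy as np
import pywt

def wavelet_reduce(x, wavelet="db4", level=3, keep=16):
    """DWT the window and keep the `keep` largest-magnitude coefficients."""
    coeffs = np.concatenate(pywt.wavedec(x, wavelet, level=level))
    idx = np.sort(np.argsort(np.abs(coeffs))[-keep:])
    return coeffs[idx]

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def klms(inputs, targets, eta=0.5, sigma=1.0):
    """Kernel LMS: predict, compute the error, store (center, eta * error)."""
    centers, weights, errors = [], [], []
    for u, d in zip(inputs, targets):
        y = sum(w * gaussian_kernel(u, c, sigma) for w, c in zip(weights, centers))
        e = d - y
        centers.append(u)
        weights.append(eta * e)
        errors.append(e ** 2)
    return np.array(errors)          # learning curve: squared error per step

# Toy usage: predict the next sample from a wavelet-reduced window.
rng = np.random.default_rng(0)
s = np.sin(np.linspace(0, 16 * np.pi, 1024)) + 0.05 * rng.standard_normal(1024)
win = 64
inputs = [wavelet_reduce(s[i:i + win]) for i in range(0, len(s) - win, 8)]
targets = [s[i + win] for i in range(0, len(s) - win, 8)]
print("final squared error:", klms(inputs, targets)[-1])
```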

2015-05-04
Pratanwanich, N., Lio, P..  2014.  Who Wrote This? Textual Modeling with Authorship Attribution in Big Data. 2014 IEEE International Conference on Data Mining Workshop (ICDMW). :645–652.

By representing large corpora with concise and meaningful elements, topic-based generative models aim to reduce the dimension of documents and understand their content. These techniques originally analyzed the words in documents, but their extensions now accommodate meta-data such as authorship information, which has proved useful for textual modeling. Learning authorship matters because it allows author interests to be extracted and authors to be assigned to anonymous texts. The Author-Topic (AT) model, an unsupervised learning technique, successfully exploits authorship information to model both documents and author interests using topic representations. However, the AT model assumes that each author contributes equally to a multiple-author document. To overcome this limitation, we assume that authors contribute to a document in different degrees, modeled with a Dirichlet distribution. This turns the unsupervised AT model into a Supervised Author-Topic (SAT) model, which adds the novel capability of authorship prediction on anonymous texts. The SAT model outperforms the AT model in identifying the authors of documents written by either single or multiple authors, with a better Receiver Operating Characteristic (ROC) curve and a significantly higher Area Under the Curve (AUC). The SAT model not only achieves performance competitive with state-of-the-art techniques such as random forests, but also retains the characteristics of unsupervised models for information discovery, i.e., the word distributions of topics, author interests, and author contributions.
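The Dirichlet-weighted contribution idea at the heart of the SAT model can be illustrated with a toy generative step. The snippet below is not the SAT model's code; all sizes and hyperparameters are illustrative assumptions, and it only shows how unequal author contributions mix author-specific topic distributions into a document-level one.

```python
# Toy sketch: per-document author weights from a Dirichlet prior,
# mixing author topic-interest distributions into a document topic mix.
import numpy as np

rng = np.random.default_rng(0)
n_topics, authors = 5, ["a1", "a2", "a3"]

# Each author has a topic-interest distribution (each row sums to 1).
author_topics = rng.dirichlet(np.ones(n_topics), size=len(authors))

# Unequal contributions of the three authors to one document.
contrib = rng.dirichlet(np.ones(len(authors)))

# Document topic distribution as a contribution-weighted mixture.
doc_topics = contrib @ author_topics
print("contributions:", contrib)
print("document topic mix:", doc_topics)
```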