Biblio


Found 5 results

Filters: Keyword is Distribution functions

2022-03-08

Ma, Xiaoyu, Yang, Tao, Chen, Jiangchuan, Liu, Ziyu. 2021. k-Nearest Neighbor algorithm based on feature subspace. 2021 International Conference on Big Data Analysis and Computer Science (BDACS). :225–228.

The traditional KNN algorithm gives insufficient consideration to the spatial distribution of training samples, which leads to low accuracy on high-dimensional data sets. Moreover, generating the k nearest neighbors requires all known samples to participate in the distance calculation, resulting in high time overhead. To solve these problems, this paper proposes a feature-subspace-based KNN algorithm (Feature Subspace KNN, FSS-KNN). First, the FSS-KNN algorithm derives all the feature subspaces from the distribution of the training samples in the feature space, so that samples in the same subspace have higher similarity. Second, each test sample is matched to its corresponding feature subspace. On this basis, the search for the k nearest neighbors is carried out in the corresponding subspace first, improving both the accuracy and the efficiency of the algorithm. Experimental results show that, compared with the traditional KNN algorithm, the FSS-KNN algorithm improves accuracy and efficiency on Kaggle and UCI data sets. Compared with four other classical machine learning algorithms, the FSS-KNN algorithm significantly improves accuracy.
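The idea of restricting the neighbor search to a matched subspace can be illustrated with a minimal sketch. The abstract does not say how the feature subspaces are derived, so this example assumes a simple stand-in: a few k-means (Lloyd) iterations partition the training samples, and each query is matched to the nearest centroid. The names `build_subspaces` and `classify`, and the fallback to all samples when a cell is too small, are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def assign(samples, centroids):
    """Group (features, label) samples by their nearest centroid."""
    cells = [[] for _ in centroids]
    for features, label in samples:
        cells[nearest(features, centroids)].append((features, label))
    return cells


def build_subspaces(samples, seeds, iterations=10):
    """ASSUMPTION: k-means stands in for the paper's subspace construction."""
    centroids = list(seeds)
    for _ in range(iterations):
        cells = assign(samples, centroids)
        centroids = [
            tuple(sum(col) / len(cell) for col in zip(*(f for f, _ in cell)))
            if cell else centroids[i]
            for i, cell in enumerate(cells)
        ]
    return centroids, assign(samples, centroids)


def classify(query, centroids, cells, k=3):
    """Majority vote among the k nearest neighbors inside the matched cell."""
    cell = cells[nearest(query, centroids)]
    if len(cell) < k:  # hypothetical fallback: search every sample
        cell = [sample for c in cells for sample in c]
    neighbors = sorted(cell, key=lambda s: math.dist(query, s[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```

Because distances are computed only against samples in one cell, the per-query cost drops roughly in proportion to the number of cells, at the price of possibly missing neighbors that lie just across a cell boundary.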

2020-09-04

Moe, Khin Su Myat, Win, Thanda. 2018. Enhanced Honey Encryption Algorithm for Increasing Message Space against Brute Force Attack. 2018 15th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON). :86–89.

In the era of digitization, data security plays a vital role in message transmission, and all systems that deal with users require stronger encryption techniques that resist brute-force attacks. The honey encryption (HE) algorithm is a user data protection algorithm that can deceive attackers attempting unauthorized access to users, databases, and websites. The main part of conventional HE is the distribution-transforming encoder (DTE). However, the current DTE process, which uses the cumulative distribution function (CDF), is limited in message space because the CDF cannot handle the probability calculation for more than four messages. We therefore propose a new DTE process that uses a discrete distribution function to solve the message space limitation problem. Our proposed honeywords generation method also addresses weaknesses of the existing honeywords generation method, such as the storage overhead problem. In this paper, we also present case-study calculations of the DTE to show that the new DTE process has no message space limitation and that a mathematical model using the discrete distribution function for the DTE process facilitates the probability distribution calculation.
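To make the role of the DTE concrete, here is a toy sketch of a generic distribution-transforming encoder. It is not the construction proposed in the paper: it simply assigns each message a slice of a 16-bit seed space proportional to an integer weight, then masks the seed with a key-derived pad. The class `DTE`, the helpers `pad`, `encrypt`, and `decrypt`, the seed size, and the unsalted SHA-256 pad are all assumptions chosen for illustration, and the scheme is not secure as written.

```python
import bisect
import hashlib
import random
from itertools import accumulate

SEED_BITS = 16                 # ASSUMPTION: toy seed size
SEED_SPACE = 1 << SEED_BITS


class DTE:
    """Maps each message to a seed interval sized by its weight."""

    def __init__(self, weights):
        # weights: message -> positive integer; total must not exceed SEED_SPACE
        self.messages = list(weights)
        total = sum(weights.values())
        cumulative = accumulate(weights[m] for m in self.messages)
        # Exclusive upper bound of each message's interval in the seed space.
        self.bounds = [c * SEED_SPACE // total for c in cumulative]

    def encode(self, message, rng=random):
        i = self.messages.index(message)
        low = self.bounds[i - 1] if i else 0
        return rng.randrange(low, self.bounds[i])  # random seed in the interval

    def decode(self, seed):
        # Every seed in [0, SEED_SPACE) falls in exactly one interval.
        return self.messages[bisect.bisect_right(self.bounds, seed)]


def pad(key):
    """Key-derived mask with SEED_BITS bits (illustrative, not a real KDF)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[: SEED_BITS // 8], "big")


def encrypt(dte, message, key):
    return dte.encode(message) ^ pad(key)


def decrypt(dte, ciphertext, key):
    return dte.decode(ciphertext ^ pad(key))
```

With the correct key, `decrypt` recovers the original message. With any other key, the masked seed still lands in some interval, so the attacker sees a plausible message rather than an error, which is the property honey encryption relies on.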

2020-03-23

Hayashi, Masahito. 2019. Semi-Finite Length Analysis for Secure Random Number Generation. 2019 IEEE International Symposium on Information Theory (ISIT). :952–956.

To discuss secure key generation from imperfect random numbers, we address the secure key generation length. There are several studies for its asymptotic expansion up to the order √n or log n. However, these expansions have errors of the order o(√n) or o(log n), which do not go to zero asymptotically. To resolve this problem, we derive the asymptotic expansion up to the constant order for upper and lower bounds of these optimal values. While the expansions of upper and lower bounds do not match, they clarify the ranges of these optimal values, whose errors go to zero asymptotically.

2019-02-25

Lekshmi, M. B., Deepthi, V. R. 2018. Spam Detection Framework for Online Reviews Using Hadoop’s Computational Capability. 2018 International CET Conference on Control, Communication, and Computing (IC4). :436–440.

Nowadays, online reviews have become one of the vital elements of online shopping for customers. Organizations and individuals use this information to buy the right products and make business decisions. This has motivated spammers and unethical business people to create false reviews and promote their products to beat the competition. Spammers have developed sophisticated systems that can create spam reviews in bulk on any website within hours. To tackle this problem, studies have been conducted to formulate effective ways to detect spam reviews. Various spam detection methods have been introduced, most of which extract meaningful features from the text or use machine learning techniques. These approaches gave little importance to the extracted feature type and the processing rate. NetSpam[1] defines a framework that classifies the review dataset based on spam features and maps them to a spam detection procedure that outperforms previous works in predictive accuracy. In this work, a method is proposed that improves the processing rate by applying a distributed approach to the review dataset using MapReduce. The parallel programming concept of MapReduce is used for processing big data in Hadoop. The solution involves parallelising the algorithm defined in NetSpam, and it defines a spam detection procedure with better predictive accuracy and processing rate.
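The map, shuffle, and reduce phases mentioned in the abstract can be sketched in plain Python. This single-process example does not use Hadoop and does not reproduce NetSpam; it computes one hypothetical feature, the number of reviews each reviewer posts per day, and the field names `reviewer` and `date` are assumptions made for the illustration.

```python
from collections import defaultdict
from itertools import chain


def map_review(review):
    """Map phase: emit one ((reviewer, date), 1) pair per review."""
    yield (review["reviewer"], review["date"]), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the values collected for one key."""
    return key, sum(values)


def reviews_per_day(reviews):
    """Run the three phases in sequence over an in-memory list of reviews."""
    pairs = chain.from_iterable(map_review(r) for r in reviews)
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

Since each call to `map_review` depends on a single review and each call to `reduce_counts` depends on a single key, a framework such as Hadoop can run both phases in parallel across nodes, which is where the improvement in processing rate would come from.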

2017-03-08

Prabhakar, A., Flaßkamp, K., Murphey, T. D. 2015. Symplectic integration for optimal ergodic control. 2015 54th IEEE Conference on Decision and Control (CDC). :2594–2600.

Autonomous active exploration requires search algorithms that can effectively balance the need for workspace coverage with energetic costs. We present a strategy for planning optimal search trajectories with respect to the distribution of expected information over a workspace. We formulate an iterative optimal control algorithm for general nonlinear dynamics, where the metric for information gain is the difference between the spatial distribution and the statistical representation of the time-averaged trajectory, i.e., ergodicity. Previous work has designed a continuous-time trajectory optimization algorithm. In this paper, we derive two discrete-time iterative trajectory optimization approaches, one based on standard first-order discretization and the other using symplectic integration. The discrete-time methods based on first-order discretization techniques are both faster than the continuous-time method in the studied examples. Moreover, we show that even for a simple system, the choice of discretization has a dramatic impact on the resulting control and state trajectories. While the standard discretization method becomes unstable, the symplectic method, which is structure-preserving, achieves lower values for the objective.
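The effect of the discretization choice can be seen on a much simpler problem than the one in the paper. The sketch below assumes a unit harmonic oscillator (q' = p, p' = -q) and compares the standard explicit Euler step with the symplectic Euler step, both first order. It is not the paper's optimal control algorithm or necessarily its integrator; the function names, step size, and step count are illustrative choices.

```python
def energy(q, p):
    """Total energy of the unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)


def explicit_euler(q, p, h):
    """Standard first-order step: both updates use the old state."""
    return q + h * p, p - h * q


def symplectic_euler(q, p, h):
    """Structure-preserving step: update p first, then use the new p for q."""
    p_next = p - h * q
    return q + h * p_next, p_next


def final_energy(step, h=0.1, steps=1000):
    """Integrate from (q, p) = (1, 0) and return the energy at the end."""
    q, p = 1.0, 0.0
    for _ in range(steps):
        q, p = step(q, p, h)
    return energy(q, p)


if __name__ == "__main__":
    print("explicit Euler:  ", final_energy(explicit_euler))
    print("symplectic Euler:", final_energy(symplectic_euler))
```

The exact energy is 0.5 at all times. Explicit Euler multiplies the energy by (1 + h^2) at every step, so it grows without bound, while symplectic Euler conserves the nearby quantity p^2 + q^2 - h*p*q and therefore keeps the energy within a narrow band around 0.5 for this step size.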