Biblio

Filters: Keyword is Statistics
2020-12-01
Wang, S., Mei, Y., Park, J., Zhang, M.  2019.  A Two-Stage Genetic Programming Hyper-Heuristic for Uncertain Capacitated Arc Routing Problem. 2019 IEEE Symposium Series on Computational Intelligence (SSCI). :1606–1613.

Genetic Programming Hyper-heuristic (GPHH) has been successfully applied to automatically evolve effective routing policies for the complex Uncertain Capacitated Arc Routing Problem (UCARP). However, GPHH typically ignores the interpretability of the evolved routing policies. As a result, GP-evolved routing policies are often very complex and hard for human users to understand and trust. In this paper, we aim to improve the interpretability of GP-evolved routing policies. To this end, we propose a new Multi-Objective GP (MOGP) to optimise performance and size simultaneously. A major issue here is that size is much easier to optimise than performance, so the search tends to be biased toward small but poor routing policies. To address this issue, we propose a simple yet effective Two-Stage GPHH (TS-GPHH). In the first stage, only performance is optimised. Then, in the second stage, both objectives are considered (using our new MOGP). The experimental results showed that TS-GPHH could obtain much smaller and more interpretable routing policies than the state-of-the-art single-objective GPHH, without deteriorating performance. Compared with traditional MOGP, TS-GPHH can obtain a much better and more widespread Pareto front.
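A minimal sketch of the two-stage idea, with the UCARP simulation replaced by a toy fitness and routing policies reduced to coefficient lists (everything below is illustrative, not the authors' GP system): stage one optimises performance alone, stage two keeps the Pareto front over (performance, size).

```python
import random

def eval_performance(policy):
    # Stand-in for UCARP simulation: lower is better (illustrative only).
    return sum((c - 0.5) ** 2 for c in policy) / (len(policy) + 1)

def mutate(policy):
    p = policy[:]
    if random.random() < 0.5 and len(p) > 1:
        p.pop(random.randrange(len(p)))                           # shrink
    else:
        p.insert(random.randrange(len(p) + 1), random.random())  # grow
    return p

def pareto_front(pop):
    # Keep individuals not dominated on (performance, size); lower is better.
    scored = [(eval_performance(p), len(p), p) for p in pop]
    return [p for perf, size, p in scored
            if not any(f2 <= perf and s2 <= size and (f2, s2) != (perf, size)
                       for f2, s2, _ in scored)]

pop = [[random.random() for _ in range(random.randint(2, 10))]
       for _ in range(50)]

# Stage 1: single-objective search on performance only.
for _ in range(30):
    pop = sorted((mutate(p) for p in pop * 2), key=eval_performance)[:50]

# Stage 2: multi-objective search over (performance, size), seeded by stage 1.
for _ in range(30):
    pop = pareto_front(pop + [mutate(p) for p in pop])[:50]

print("Pareto-front policy sizes:", sorted(len(p) for p in pop))
```

Seeding the second stage with performance-optimised individuals is what keeps the multi-objective search from collapsing onto tiny but useless policies.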

2020-11-23
Ma, S.  2018.  Towards Effective Genetic Trust Evaluation in Open Network. 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). :563–569.
In open network environments, since there is no centralized authority to monitor misbehaving entities, malicious entities can easily degrade service quality. Trust has become an important factor in ensuring network security, as it helps entities distinguish good partners from bad ones. In this paper, trust in an open network environment is regarded as a self-organizing system, and, using the self-organizing principle of human social trust propagation, a genetic trust evaluation method with self-optimization and family attributes is proposed. In this method, the factors of trust evaluation include time, IP, behavior feedback, and intuitive trust. Data structures for an access record table and a trust record table are designed to store the relationship between ancestor nodes and descendant nodes. A genetic trust search algorithm is designed by simulating the biological evolution process. Based on the trust information of the current node's ancestors, chromosome populations, whose structure includes time, IP address, behavior feedback, and intuitive trust, are heuristically generated at random. A crossover and mutation strategy then drives the population's evolutionary search. When the genetic search termination condition is met, the optimal trust chromosome in the population is selected and its trust value is computed, which is the node's genetic trust evaluation result. The simulation results show that the genetic trust evaluation method is effective, and the trust evaluation process of the current node can be regarded as a search for optimal trust results within the ancestor nodes' information. As the ancestor nodes' genetic trust information increases, the trust evaluation result found by the genetic search becomes more accurate, which can effectively solve the joint fraud problem.
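A minimal sketch of the genetic trust search described above, with a four-field chromosome (time decay, IP match, behavior feedback, intuitive trust) and assumed fitness weights; the paper's record tables and tuned parameters are not reproduced here.

```python
import random

ancestors = [  # (time decay, ip match, behavior feedback, intuitive trust)
    (0.9, 1.0, 0.8, 0.7), (0.6, 0.0, 0.5, 0.6), (0.8, 1.0, 0.9, 0.8),
]

def fitness(ch):
    t, ip, fb, it = ch
    return 0.2 * t + 0.2 * ip + 0.4 * fb + 0.2 * it  # assumed weights

def crossover(a, b):
    cut = random.randrange(1, 4)
    return a[:cut] + b[cut:]

def mutate(ch, rate=0.1):
    return tuple(min(1.0, max(0.0, g + random.uniform(-0.1, 0.1)))
                 if random.random() < rate else g for g in ch)

# Heuristically seed the population from ancestor trust records.
pop = [mutate(random.choice(ancestors), rate=1.0) for _ in range(20)]
for _ in range(50):                     # termination: fixed generation count
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(10)]

best = max(pop, key=fitness)
print("genetic trust evaluation:", round(fitness(best), 3))
```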
2020-10-16
Kő, Andrea, Molnár, Tamás, Mátyus, Bálint.  2018.  A User-centred Design Approach for Mobile-Government Systems for the Elderly. 2018 12th International Conference on Software, Knowledge, Information Management and Applications (SKIMA). :1–7.

This paper aims to discover the characteristics of the acceptance of mobile-government systems by the elderly. Several initiatives and projects offer various governmental services for them, such as information sharing, alerting, and mHealth services. All of these carry important benefits for this user group, but the benefits can only be realized if user acceptance reaches a certain level. This is a requirement for the users to perceive the services as a benefit and not as a hindrance. The key aspects for high acceptance are usability and user-friendliness, which lead to successful m-government systems designed for the target group. We applied a combination of qualitative and quantitative research methods, including an m-government prototype, to explore the key acceptance factors. Our research approach utilizes the IGUAN framework, which is a user-driven method. Guided by the IGUAN framework, we collected and analysed data about the acceptance of e-government services by the elderly. The target group was recruited from Germany and Hungary. Our findings draw attention to the perceived security and perceived usability of an application; these are decisive factors for this target group.

2020-10-12
Puspitaningrum, Diyah, Fernando, Julio, Afriando, Edo, Utama, Ferzha Putra, Rahmadini, Rina, Pinata, Y.  2019.  Finding Local Experts for Dynamic Recommendations Using Lazy Random Walk. 2019 7th International Conference on Cyber and IT Service Management (CITSM). 7:1–6.
Statistics-based privacy-aware recommender systems make suggestions more powerful by extracting knowledge from the log of social contact interactions, but unfortunately they are static; moreover, advice from local experts is effective in finding specific business categories in a particular area. We propose a dynamic recommender algorithm based on a lazy random walk that recommends top-rank shopping places to potentially interested visitors. We consider both local authority and topical authority. The algorithm was tested on Foursquare shopping datasets from 5 cities in Indonesia with k = 5, 7, 9 steps of the (lazy) random walk, and the results were compared with other state-of-the-art ranking techniques. The results show that it can reach high precision scores (0.5, 0.37, and 0.26 on p@1, p@3, and p@5 respectively, for k = 5). The algorithm also scales well with respect to execution time. The advantage of dynamicity is that the database used to power the recommender system need not be updated very frequently to produce good recommendations.
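A minimal sketch of ranking via a k-step lazy random walk on a toy place graph, using the common lazy-walk form P_lazy = (1 - beta) * I + beta * P; the graph, beta, and k below are illustrative, not the paper's data or tuning.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],   # adjacency of a toy graph: rows/cols are
              [1, 0, 1, 1],   # shopping places linked by shared visitors
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

P = A / A.sum(axis=1, keepdims=True)       # row-stochastic transition matrix
beta, k = 0.5, 5                           # laziness and walk length
P_lazy = (1 - beta) * np.eye(len(A)) + beta * P

score = np.full(len(A), 1.0 / len(A))      # start from a uniform preference
for _ in range(k):
    score = score @ P_lazy                  # one lazy step

print("top-ranked places:", np.argsort(-score))
```

The self-loop term is what makes the walk "lazy": scores diffuse slowly, so a few steps already give a stable local ranking without iterating to the stationary distribution.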
2020-09-28
Kohli, Nitin, Laskowski, Paul.  2018.  Epsilon Voting: Mechanism Design for Parameter Selection in Differential Privacy. 2018 IEEE Symposium on Privacy-Aware Computing (PAC). :19–30.
The behavior of a differentially private system is governed by a parameter epsilon which sets a balance between protecting the privacy of individuals and returning accurate results. While a system owner may use a number of heuristics to select epsilon, existing techniques may be unresponsive to the needs of the users whose data is at risk. A promising alternative is to allow users to express their preferences for epsilon. In a system we call epsilon voting, users report the parameter values they want to a chooser mechanism, which aggregates them into a single value. We apply techniques from mechanism design to ask whether such a chooser mechanism can itself be truthful, private, anonymous, and also responsive to users. Without imposing restrictions on user preferences, the only feasible mechanisms belong to a class we call randomized dictatorships with phantoms. This is a restrictive class in which at most one user has any effect on the chosen epsilon. On the other hand, when users exhibit single-peaked preferences, a broader class of mechanisms - ones that generalize the median and other order statistics - becomes possible.
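For intuition, a minimal sketch of a generalized-median chooser of the kind that becomes feasible under single-peaked preferences; the phantom values below are illustrative designer choices (in the style of Moulin's classic rule), not the paper's construction.

```python
import statistics

def choose_epsilon(reports, phantoms):
    # Moulin-style rule: the median of n true reports and n - 1 fixed
    # phantoms is strategyproof when preferences are single-peaked.
    assert len(phantoms) == len(reports) - 1
    return statistics.median(list(reports) + list(phantoms))

reports = [0.1, 0.5, 1.0, 2.0]        # each user's preferred epsilon
phantoms = [0.25, 0.25, 0.25]         # fixed values chosen by the designer
print(choose_epsilon(reports, phantoms))
```

Placing all phantoms low, as here, biases the chosen epsilon toward stronger privacy unless a majority of users prefer otherwise.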
2020-09-11
Spradling, Matthew, Allison, Mark, Tsogbadrakh, Tsenguun, Strong, Jay.  2019.  Toward Limiting Social Botnet Effectiveness while Detection Is Performed: A Probabilistic Approach. 2019 International Conference on Computational Science and Computational Intelligence (CSCI). :1388–1391.
The prevalence of social botnets has increased public distrust of social media networks. Current methods exist for detecting bot activity on Twitter, Reddit, Facebook, and other social media platforms. Most of these detection methods rely upon observing user behavior for a period of time. Unfortunately, the behavior observation period allows time for a botnet to successfully propagate one or many posts before removal. In this paper, we model the post propagation patterns of normal users and social botnets. We prove that a botnet may exploit deterministic propagation actions to elevate a post even with a small botnet population. We propose a probabilistic model which can limit the impact of social media botnets until they can be detected and removed. While our approach maintains expected results for non-coordinated activity, coordinated botnets will be detected before propagation with high probability.
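A toy sketch of the probabilistic idea: if crossing a vote threshold promotes a post only with probability p, rather than deterministically, a small botnet that barely clears the threshold rarely succeeds before detection (threshold, p, and vote counts are illustrative).

```python
import random

def propagates(votes, threshold=10, p=0.3):
    # Deterministic rule: votes >= threshold always propagates.
    # Probabilistic rule: crossing the threshold propagates w.p. p.
    return votes >= threshold and random.random() < p

botnet_votes = 12                     # a small botnet just over threshold
trials = 10_000
hits = sum(propagates(botnet_votes) for _ in range(trials))
print(f"botnet post propagated in {hits / trials:.1%} of trials")
```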
2020-09-04
Wu, Yi, Liu, Jian, Chen, Yingying, Cheng, Jerry.  2019.  Semi-black-box Attacks Against Speech Recognition Systems Using Adversarial Samples. 2019 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN). :1–5.
As automatic speech recognition (ASR) systems have been integrated into a diverse set of devices around us in recent years, their security vulnerabilities have become an increasing concern for the public. Existing studies have demonstrated that deep neural networks (DNNs), acting as the computation core of ASR systems, are vulnerable to deliberately designed adversarial attacks. Based on the gradient descent algorithm, existing studies have successfully generated adversarial samples that can disturb ASR systems and produce the transcript texts designed by adversaries. Most of this research simulated white-box attacks, which require knowledge of all the components in the targeted ASR system. In this work, we propose the first semi-black-box attack against an ASR system, Kaldi. Requiring only partial information from Kaldi and none from the DNN, we can embed malicious commands into a single audio clip using a gradient-independent genetic algorithm. The crafted audio clip is recognized as the embedded malicious commands by Kaldi while remaining unnoticeable to humans. Experiments show that our attack achieves a high success rate with unnoticeable perturbations on three types of audio clips (pop music, pure music, and human commands), without needing the underlying DNN model's parameters or architecture.
Taori, Rohan, Kamsetty, Amog, Chu, Brenton, Vemuri, Nikita.  2019.  Targeted Adversarial Examples for Black Box Audio Systems. 2019 IEEE Security and Privacy Workshops (SPW). :15–20.
The application of deep recurrent networks to audio transcription has led to impressive gains in automatic speech recognition (ASR) systems. Many have demonstrated that small adversarial perturbations can fool deep neural networks into incorrectly predicting a specified target with high confidence. Current work on fooling ASR systems has focused on white-box attacks, in which the model architecture and parameters are known. In this paper, we adopt a black-box approach to adversarial generation, combining genetic algorithms and gradient estimation to solve the task. We achieve an 89.25% targeted attack similarity, with a 35% targeted attack success rate, after 3000 generations while maintaining 94.6% audio file similarity.
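A minimal sketch of the combined recipe, assuming an opaque loss() that stands in for querying the target model: a genetic phase explores the perturbation space, then finite-difference gradient estimation refines the best candidate (all parameters, phase split, and the toy loss are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
audio = rng.normal(size=256)                     # toy waveform
target = rng.normal(size=256)

def loss(x):                                      # black box: lower = closer
    return float(np.mean((x - target) ** 2))      # to the target transcript

pop = [audio + rng.normal(scale=0.01, size=256) for _ in range(20)]
for gen in range(100):
    pop.sort(key=loss)
    elite = pop[:5]
    if gen < 80:                                  # genetic phase
        pop = elite + [0.5 * (a + b) + rng.normal(scale=0.005, size=256)
                       for a, b in zip(elite * 3, reversed(elite * 3))]
    else:                                         # gradient-estimation phase
        x = elite[0]
        grad = np.zeros_like(x)
        for i in rng.choice(256, size=32, replace=False):
            e = np.zeros_like(x); e[i] = 1e-3
            grad[i] = (loss(x + e) - loss(x - e)) / 2e-3
        pop = [x - 0.1 * grad] * 20

print("final loss:", loss(min(pop, key=loss)))
```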
2020-06-12
Domniţa, Dan, Oprişa, Ciprian.  2018.  A genetic algorithm for obtaining memory constrained near-perfect hashing. 2018 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR). :1–6.

The problem of fast item retrieval from a fixed collection is encountered in most computer science areas, from operating system components to databases and user interfaces. We present an approach based on hash tables that focuses on minimizing both the number of comparisons performed during the search and the total collection size. The standard open-addressing double-hashing approach is improved with a non-linear transformation that can be parametrized to ensure a uniform distribution of the data in the hash table. The optimal parameter is determined using a genetic algorithm. Our results show that near-perfect hashing is faster than binary search, yet uses less memory than perfect hashing, making it a good choice for memory-constrained applications where search time is also critical.
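A minimal sketch of the idea: open-addressing double hashing whose step size passes through a parameterized non-linear transform, with a tiny genetic search over the parameter to minimize total probes (the transform, key set, and fitness below are illustrative, not the paper's).

```python
import random

KEYS = [f"item-{i * 7919}" for i in range(500)]
TABLE_SIZE = 1024

def probe(key, param, i):
    h1 = hash(key) % TABLE_SIZE
    # Parameterized non-linear transform of the second hash; "| 1" keeps
    # the step odd so every probe sequence covers the whole table.
    h2 = (hash(key) * param ^ (hash(key) >> 7)) % TABLE_SIZE | 1
    return (h1 + i * h2) % TABLE_SIZE

def fitness(param):
    # Total extra probes needed to insert all keys: fewer = more uniform.
    table, probes = set(), 0
    for key in KEYS:
        i = 0
        while probe(key, param, i) in table:
            i += 1
        probes += i
        table.add(probe(key, param, i))
    return probes

pop = [random.randrange(1, 1 << 16) for _ in range(20)]
for _ in range(30):
    pop.sort(key=fitness)
    # Elitism plus single-bit-flip mutation of the best parameters.
    pop = pop[:5] + [p ^ (1 << random.randrange(16)) for p in pop[:5] * 3]

print("best parameter:", pop[0], "total extra probes:", fitness(pop[0]))
```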

2020-05-22
Ranjan, G S K, Kumar Verma, Amar, Radhika, Sudha.  2019.  K-Nearest Neighbors and Grid Search CV Based Real Time Fault Monitoring System for Industries. 2019 IEEE 5th International Conference for Convergence in Technology (I2CT). :1–5.
Fault detection in a machine at an early stage can prevent severe damage and loss to industry. Fault detection techniques are broadly classified into three categories: signature extraction-based, model-based, and knowledge-based approaches. Model-based techniques are efficient at raising an alarm signal when there is a fault in the machine. This paper focuses on one such model-based technique to identify the internal faults of an induction machine. The developed model is deployed in the end to make it feasible to use in real time. K-Nearest Neighbors (KNN) and grid search cross-validation (CV) have been used to train and optimize the model to give the best results. The advantage of the proposed algorithm is its prediction accuracy, observed to be 80%. Finally, a user-friendly interface has been built using Flask, a Python web framework.
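A minimal sketch of the KNN-plus-grid-search pipeline in scikit-learn; since the induction-machine fault data is not public, X and y below are synthetic stand-ins and the parameter grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the machine's fault-feature dataset.
X, y = make_classification(n_samples=600, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 16)),
                "weights": ["uniform", "distance"]},
    cv=5)                               # 5-fold cross-validation
grid.fit(X_train, y_train)

print("best params:", grid.best_params_)
print("test accuracy:", grid.score(X_test, y_test))
```

The fitted `grid.best_estimator_` is what one would serve behind a Flask endpoint for real-time predictions.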
2020-05-18
Han, Ying, Li, Kun, Ge, Fawei.  2019.  Multiple Fault Diagnosis for Sucker Rod Pumping Systems Based on Matter Element Analysis with F-statistics. 2019 IEEE 8th Data Driven Control and Learning Systems Conference (DDCLS). :66–70.
Dynamometer cards can reflect different down-hole working conditions of sucker rod pumping wells. Realizing multiple-fault diagnosis is of great significance for actual oilfield production. In this paper, extension theory is used to build a matter-element model that describes the fault diagnosis problem of sucker rod pumping wells. A correlation function is used to calculate the correlation degree between the fault being diagnosed and many standard fault types. The diagnosed sample and the possible fault types are divided into different combinations according to the correlation degree; the F-statistic of each combination is calculated, and an "unbiased transformation" is used to find the mean of the interval vectors. A larger F-statistic means greater differences within the fault classification, and the minimum F-statistic reflects the real multiple fault types. A case study shows the effectiveness of the proposed method.
Yang, Xiaoliu, Li, Zetao, Zhang, Fabin.  2018.  Simultaneous diagnosis of multiple parametric faults based on differential evolution algorithm. 2018 Chinese Control And Decision Conference (CCDC). :2781–2786.
This paper addresses the analysis and design of multiple fault diagnosis for a class of Lipschitz nonlinear systems. In order to estimate multi-fault parameters automatically and efficiently, a new method of multi-fault diagnosis based on the differential evolution algorithm (DE) is proposed. Finally, a series of experiments validates the feasibility and effectiveness of the proposed method. The simulations show the high accuracy of the proposed strategy in diagnosing multiple abrupt faults.
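A minimal sketch of parameter estimation with differential evolution, replacing the paper's Lipschitz system with a toy model: unknown fault gains are recovered by minimizing the residual between the model and observed outputs (the model, bounds, and noise level are illustrative).

```python
import numpy as np
from scipy.optimize import differential_evolution

t = np.linspace(0, 5, 100)
true_faults = (0.8, 0.3)                        # two simultaneous fault gains

def model(theta):
    f1, f2 = theta
    return np.exp(-t) + f1 * np.sin(t) + f2 * t  # toy faulty response

observed = model(true_faults) + np.random.default_rng(0).normal(0, 0.01, t.size)

result = differential_evolution(
    lambda theta: np.sum((model(theta) - observed) ** 2),
    bounds=[(0, 1), (0, 1)], seed=0)

print("estimated fault parameters:", result.x)
```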
2020-05-15
Kelly, Jonathan, DeLaus, Michael, Hemberg, Erik, O’Reilly, Una-May.  2019.  Adversarially Adapting Deceptive Views and Reconnaissance Scans on a Software Defined Network. 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM). :49–54.

To gain strategic insight into defending against the network reconnaissance stage of advanced persistent threats, we recreate the escalating competition between scans and deceptive views on a Software Defined Network (SDN). Our threat model presumes the defense is a deceptive network view unique to each node on the network. It can be configured in terms of the number of honeypots and subnets, as well as how real nodes are distributed across the subnets. It assumes attacks are NMAP ping scans that can be configured in terms of how many IP addresses are scanned and how they are visited. Higher-performing defenses detect the scanner more quickly while leaking as little information as possible; higher-performing attacks are better at evading detection and discovering real nodes. By using Artificial Intelligence in the form of a competitive coevolutionary genetic algorithm, we can analyze the configurations of high-performing static defenses and attacks versus their evolving adversary, as well as the optimized configuration of the adversary itself. When attacks and defenses both evolve, we observe that the extent of evolution influences the best configurations.
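A minimal sketch of competitive coevolution with two toy populations, one of defense configurations and one of scan configurations, each scored against the other population (the engagement payoff, encoding as single floats, and mutation scheme are all illustrative).

```python
import random

def engage(defense, attack):
    # Toy payoff: the attacker wants to find real nodes while touching few
    # honeypots; the defender wants the opposite. Returns attacker payoff.
    return attack * (1 - defense) - 0.5 * attack * defense

defenses = [random.random() for _ in range(10)]
attacks = [random.random() for _ in range(10)]

for _ in range(100):
    d_fit = [-sum(engage(d, a) for a in attacks) for d in defenses]
    a_fit = [sum(engage(d, a) for d in defenses) for a in attacks]
    keep = lambda pop, fit: [p for _, p in sorted(zip(fit, pop), reverse=True)][:5]
    defenses, attacks = keep(defenses, d_fit), keep(attacks, a_fit)
    defenses += [min(1, max(0, d + random.gauss(0, 0.05))) for d in defenses]
    attacks += [min(1, max(0, a + random.gauss(0, 0.05))) for a in attacks]

print("evolved defense configs:", [round(d, 2) for d in defenses[:3]])
print("evolved attack configs:", [round(a, 2) for a in attacks[:3]])
```

Because each side's fitness depends on the other's current population, the loop produces an arms race rather than convergence to a fixed optimum, which is the point of the analysis.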

2020-05-11
Chae, Younghun, Katenka, Natallia, DiPippo, Lisa.  2019.  An Adaptive Threshold Method for Anomaly-based Intrusion Detection Systems. 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA). :1–4.
Anomaly-based Detection Systems (ADSs) attempt to learn the features of behaviors and events of a system and/or its users over a period of time in order to build a profile of normal behaviors. There has been growing interest in ADSs, which are typically conceived as more powerful systems. One of the important factors for an ADS is the ability to distinguish between normal and abnormal behaviors in a given period. However, this is becoming complicated in dynamic network environments that change every minute. It is dangerous to distinguish between normal and abnormal behaviors with a fixed threshold in a dynamic environment, because there is no guarantee that the threshold always indicates normal behavior. In this paper, we propose an adaptive threshold for a dynamic environment, with a trust management scheme for efficiently managing the profiles of normal and abnormal behaviors. Based on the assumption of statistical-analysis-based ADSs that normal data instances occur in high-probability regions of a stochastic model while malicious data instances occur in low-probability regions, we set two adaptive thresholds, one for normal and one for abnormal behaviors. The behaviors between the two thresholds are classified as suspicious, and they are efficiently evaluated with a trust management scheme.
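A minimal sketch of the two adaptive thresholds, assuming behavior scores summarized by a running Gaussian over a sliding window: the high-probability region is normal, the low-probability region abnormal, and the band in between is handed to the trust management scheme (window size and z-cutoffs are illustrative).

```python
from collections import deque
import statistics

window = deque(maxlen=200)                 # sliding window of recent scores

def classify(score):
    window.append(score)
    if len(window) < 30:
        return "learning"
    mu = statistics.fmean(window)
    sigma = statistics.stdev(window) or 1e-9
    z = abs(score - mu) / sigma            # thresholds adapt with the window
    if z < 2.0:                            # high-probability region
        return "normal"
    if z < 3.0:                            # between the two thresholds
        return "suspicious"                # hand off to trust management
    return "abnormal"                      # low-probability region

for s in [0.1, 0.12, 0.11] * 20 + [0.4, 0.9]:
    label = classify(s)
print("last score classified as:", label)
```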
2020-04-13
Morishita, Shun, Hoizumi, Takuya, Ueno, Wataru, Tanabe, Rui, Gañán, Carlos, van Eeten, Michel J.G., Yoshioka, Katsunari, Matsumoto, Tsutomu.  2019.  Detect Me If You… Oh Wait. An Internet-Wide View of Self-Revealing Honeypots. 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM). :134–143.
Open-source honeypots are a vital component in the protection of networks and the observation of trends in the threat landscape. Their open nature also enables adversaries to identify the characteristics of these honeypots in order to detect and avoid them. In this study, we investigate the prevalence of 14 open-source honeypots running more or less default configurations, making them easily detectable by attackers. We deploy 20 simple signatures and test them for false positives against servers for domains in the Alexa top 10,000, official FTP mirrors, mail servers in real operation, and real IoT devices running telnet. We find no matches, suggesting good accuracy. We then measure the Internet-wide prevalence of default open-source honeypots by matching the signatures with Censys scan data and our own scans. We discovered 19,208 honeypots across 637 Autonomous Systems that are trivially easy to identify. Concentrations are found in research networks, but also in enterprise, cloud, and hosting networks. While some of these honeypots probably have no operational relevance, e.g., they are student projects, this explanation does not fit the wider population. One cluster of honeypots was confirmed to belong to a well-known security center and was in use for ongoing attack monitoring. Concentrations in another cluster appear to be the result of government incentives. We contacted 11 honeypot operators and received responses from 4 of them, suggesting a lack of network hygiene. Finally, we find that some honeypots are actively abused by attackers for hosting malicious binaries. We notified the owners of the detected honeypots via their network operators and provided recommendations for customization to avoid simple signature-based detection. We also shared our results with the honeypot developers.
2020-04-06
Sun, Xuezi, Xu, Guangxian, Liu, Chao.  2019.  A Network Coding Optimization Scheme for Niche Algorithm based on Security Performance. 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). 1:1969–1972.

Network coding optimization based on a niche genetic algorithm can observably reduce the network overhead of coding; however, security issues have not been considered in the coding operation. In order to solve this problem, we propose a network coding optimization scheme for the niche algorithm based on security performance (SNGA). On the basis of a multi-objective niche genetic algorithm (NGA), a fitness function is constructed with a k-secure network coding mechanism to ensure information security and achieve maximum network transmission. The simulation results show that SNGA can effectively improve the security of network coding while preserving the running time and convergence speed toward the optimal solution.

2020-03-16
Ullah, Faheem, Ali Babar, M.  2019.  QuickAdapt: Scalable Adaptation for Big Data Cyber Security Analytics. 2019 24th International Conference on Engineering of Complex Computer Systems (ICECCS). :81–86.
Big Data Cyber Security Analytics (BDCA) leverages big data technologies for collecting, storing, and analyzing a large volume of security events data to detect cyber-attacks. Accuracy and response time, the most important quality concerns for BDCA, are impacted by changes in security events data. Whilst it is promising to adapt a BDCA system's architecture to changes in security events data to optimize accuracy and response time, it is important to consider the large search space of architectural configurations. Searching a large space of configurations for potential adaptation incurs an overwhelming adaptation time, which may cancel the benefits of adaptation. We present an adaptation approach, QuickAdapt, to enable quick adaptation of a BDCA system. QuickAdapt uses descriptive statistics (e.g., mean and variance) of security events data and fuzzy rules to (re)compose a system with a set of components that ensure optimal accuracy and response time. We have evaluated QuickAdapt for a distributed BDCA system using four datasets. Our evaluation shows that on average QuickAdapt reduces adaptation time by 105× with a competitive adaptation accuracy of 70% as compared to an existing solution.
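A minimal sketch of the shape of such an adaptation step: descriptive statistics of recent events feed simple fuzzy-style rules that pick a composition directly, avoiding a search over the full configuration space (the rules, thresholds, and component names below are illustrative, not QuickAdapt's).

```python
import statistics

def summarize(events):
    return statistics.fmean(events), statistics.pvariance(events)

def adapt(events):
    mean, var = summarize(events)
    # Fuzzy-style rules mapping data statistics to a composition choice.
    if var > 50:
        return "ensemble-detector"      # noisy stream: favour accuracy
    if mean > 100:
        return "sampling-pipeline"      # heavy load: favour response time
    return "baseline-pipeline"

print(adapt([90, 95, 102, 98, 97]))
```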
2020-01-20
Elisa, Noe, Yang, Longzhi, Fu, Xin, Naik, Nitin.  2019.  Dendritic Cell Algorithm Enhancement Using Fuzzy Inference System for Network Intrusion Detection. 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). :1–6.

The dendritic cell algorithm (DCA) is an immune-inspired classification algorithm developed for the purpose of anomaly detection in computer networks. In its context detection phase, the DCA uses a weighted function to process three categories of input signals, namely safe, danger, and pathogen-associated molecular pattern (PAMP), into three output context values, termed co-stimulatory, mature, and semi-mature, which are then used to perform classification. The weighted function used by the DCA requires either manually pre-defined weights, usually provided by immunologists, or weights empirically derived from the training dataset. Neither of these is sufficiently flexible to work with different datasets and produce optimal classification results. To address this limitation, this work proposes an approach for computing the three output context values of the DCA by employing the recently proposed TSK+ fuzzy inference system, such that the weights are always optimal for the provided dataset in a specific application. The proposed approach was validated and evaluated by applying it to the two popular datasets KDD99 and UNSW-NB15. The results from the experiments demonstrate that the proposed approach outperforms the conventional DCA in terms of classification accuracy.
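For reference, a minimal sketch of the conventional weighted context step that the TSK+ system replaces; the weight matrix below is illustrative, not the empirically tuned one.

```python
WEIGHTS = {                # rows: output; cols: (PAMP, danger, safe)
    "costimulatory": (2.0, 1.0, 2.0),
    "semi-mature":   (0.0, 0.0, 3.0),
    "mature":        (2.0, 1.0, -3.0),
}

def dca_context(pamp, danger, safe):
    signals = (pamp, danger, safe)
    return {name: sum(w * s for w, s in zip(ws, signals))
            for name, ws in WEIGHTS.items()}

ctx = dca_context(pamp=0.7, danger=0.4, safe=0.1)
# An antigen is labelled anomalous when the mature output dominates.
print(ctx, "->", "anomalous" if ctx["mature"] > ctx["semi-mature"] else "normal")
```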

2019-04-05
Vastel, A., Rudametkin, W., Rouvoy, R.  2018.  FP-TESTER: Automated Testing of Browser Fingerprint Resilience. 2018 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). :103–107.
Despite recent regulations and growing user awareness, undesired browser tracking is increasing. In addition to cookies, browser fingerprinting is a stateless technique that exploits a device's configuration for tracking purposes. In particular, browser fingerprinting builds on attributes made available from JavaScript and HTTP headers to create a unique and stable fingerprint. For example, browser plugins have been heavily exploited by state-of-the-art browser fingerprinters as a rich source of entropy. However, as browser vendors abandon plugins in favor of extensions, fingerprinters will adapt. We present FP-TESTER, an approach to automatically test the effectiveness of browser fingerprinting countermeasure extensions. We implement a testing toolkit to be used by developers to reduce browser fingerprintability. While countermeasures aim to hinder tracking by changing or blocking attributes, they may easily introduce subtle side-effects that make browsers more identifiable, rendering the extensions counterproductive. FP-TESTER reports on the side-effects introduced by the countermeasure, as well as how they impact tracking duration from a fingerprinter's point-of-view. To the best of our knowledge, FP-TESTER is the first tool to assist developers in fighting browser fingerprinting and reducing the exposure of end-users to such privacy leaks.
2019-02-21
Gao, Y.  2018.  An Improved Hybrid Group Intelligent Algorithm Based on Artificial Bee Colony and Particle Swarm Optimization. 2018 International Conference on Virtual Reality and Intelligent Systems (ICVRIS). :160–163.
Aiming at the poor convergence performance of the particle swarm optimization and artificial bee colony algorithms, an improved hybrid algorithm is proposed for complex optimization problems. The hybrid algorithm was tested on four standard functions and the results compared with the standard Particle Swarm Optimization (PSO) and Artificial Bee Colony (ABC) algorithms: both the convergence rate and the convergence precision of the hybrid algorithm are superior to those of standard PSO and ABC, giving better optimization performance.
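A minimal sketch of one way such a hybrid can work, with PSO velocity updates plus an ABC-style scout phase that re-seeds stagnant particles the way ABC abandons exhausted food sources (the coefficients, stagnation limit, and sphere test function are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sum(x ** 2, axis=-1)          # sphere test function

n, dim, limit = 20, 4, 10
x = rng.uniform(-5, 5, (n, dim)); v = np.zeros((n, dim))
pbest, pbest_f = x.copy(), f(x)
stall = np.zeros(n, dtype=int)

for _ in range(200):
    g = pbest[np.argmin(pbest_f)]              # global best
    v = (0.7 * v + 1.5 * rng.random((n, dim)) * (pbest - x)
               + 1.5 * rng.random((n, dim)) * (g - x))   # PSO update
    x = x + v
    fx = f(x)
    improved = fx < pbest_f
    pbest[improved], pbest_f[improved] = x[improved], fx[improved]
    stall = np.where(improved, 0, stall + 1)
    scouts = stall > limit                      # ABC-style abandonment
    x[scouts] = rng.uniform(-5, 5, (scouts.sum(), dim))
    stall[scouts] = 0

print("best value found:", pbest_f.min())
```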
2018-03-26
Hosseinpourpia, M., Oskoei, M. A.  2017.  GA Based Parameter Estimation for Multi-Faceted Trust Model of Recommender Systems. 2017 5th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS). :160–165.

A recommender system suggests items that might interest users in social networks. Collaborative filtering is an approach that works based on similarity and recommends items liked by other, similar users. A trust model adopts users' trust networks in place of similarity. A multi-faceted trust model considers multiple, heterogeneous trust relationships among users and recommends items based on the ratings that exist in the network of trustees for a specific facet. This paper applies a genetic algorithm to estimate the parameters of a multi-faceted trust model, in which the trust weights are calculated from the ratings and the trust network for each facet separately. The model was built on the Epinions dataset, which includes consumers' opinions, item ratings, and the web-of-trust network. It was used to predict users' ratings for items in different facets, and the root mean squared prediction error (RMSE) was used as a measure of performance. Empirical evaluations demonstrate that multi-faceted models improve the performance of the recommender system.

2018-02-02
Tramèr, F., Atlidakis, V., Geambasu, R., Hsu, D., Hubaux, J. P., Humbert, M., Juels, A., Lin, H.  2017.  FairTest: Discovering Unwarranted Associations in Data-Driven Applications. 2017 IEEE European Symposium on Security and Privacy (EuroS&P). :401–416.

In a world where traditional notions of privacy are increasingly challenged by the myriad companies that collect and analyze our data, it is important that decision-making entities are held accountable for unfair treatments arising from irresponsible data usage. Unfortunately, a lack of appropriate methodologies and tools means that even identifying unfair or discriminatory effects can be a challenge in practice. We introduce the unwarranted associations (UA) framework, a principled methodology for the discovery of unfair, discriminatory, or offensive user treatment in data-driven applications. The UA framework unifies and rationalizes a number of prior attempts at formalizing algorithmic fairness. It uniquely combines multiple investigative primitives and fairness metrics with broad applicability, granular exploration of unfair treatment in user subgroups, and incorporation of natural notions of utility that may account for observed disparities. We instantiate the UA framework in FairTest, the first comprehensive tool that helps developers check data-driven applications for unfair user treatment. It enables scalable and statistically rigorous investigation of associations between application outcomes (such as prices or premiums) and sensitive user attributes (such as race or gender). Furthermore, FairTest provides debugging capabilities that let programmers rule out potential confounders for observed unfair effects. We report on use of FairTest to investigate and in some cases address disparate impact, offensive labeling, and uneven rates of algorithmic error in four data-driven applications. As examples, our results reveal subtle biases against older populations in the distribution of error in a predictive health application and offensive racial labeling in an image tagger.
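A minimal sketch of the kind of association check FairTest automates, on synthetic data: compare an application outcome (here, an error rate) across a sensitive attribute, overall and within subgroups, where a subtle subgroup bias hides in the aggregate (all column names and rates are illustrative).

```python
import random

random.seed(0)
rows = [{"age": random.choice(("young", "old")),
         "region": random.choice(("A", "B")),
         "error": random.random() < 0.1} for _ in range(5000)]
# Inject a subtle bias: extra errors for older users in region B only.
for r in rows:
    if r["age"] == "old" and r["region"] == "B":
        r["error"] = r["error"] or random.random() < 0.15

def error_rate(subset):
    return sum(r["error"] for r in subset) / len(subset)

for region in (None, "A", "B"):
    sub = [r for r in rows if region is None or r["region"] == region]
    young = error_rate([r for r in sub if r["age"] == "young"])
    old = error_rate([r for r in sub if r["age"] == "old"])
    print(f"region={region or 'all'}: young={young:.3f} old={old:.3f}")
```

The aggregate gap is diluted across regions; only the per-subgroup breakdown exposes where the disparity actually lives, which is the point of FairTest's granular exploration.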

2017-12-20
Gayathri, S.  2017.  Phishing websites classifier using polynomial neural networks in genetic algorithm. 2017 Fourth International Conference on Signal Processing, Communication and Networking (ICSCN). :1–4.

Genetic algorithms are a group of mathematical models in computational science, inspired by evolution, that are prominent in AI techniques today. These algorithms preserve critical information by encoding a solution to a specific problem in a data structure manipulated with simple chromosome recombination operators. Genetic algorithms are optimizers, and the range of problems to which they can be applied is quite broad. Their global search rests on basic principles such as selection, crossover, and mutation. Neural networks draw on data structures, algorithms, and inspiration from the human brain for the classification of data and for learning. Artificial Intelligence (AI) is a field concerned with the many tasks performed naturally by humans, which have proved complicated for conventional computational methods. Applying neural network techniques creates an internal structure of rules by which a program can learn from examples to classify different inputs, in contrast to mining techniques. This paper proposes a phishing website classifier using improved polynomial neural networks in a genetic algorithm.

2017-10-10
Bassily, Raef, Nissim, Kobbi, Smith, Adam, Steinke, Thomas, Stemmer, Uri, Ullman, Jonathan.  2016.  Algorithmic Stability for Adaptive Data Analysis. Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing. :1046–1059.

Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated a general formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution P and a set of n independent samples x is drawn from P. We seek an algorithm that, given x as input, accurately answers a sequence of adaptively chosen "queries" about the unknown distribution P. How many samples n must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions towards resolving this question: We give upper bounds on the number of samples n that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015; NIPS, 2015). We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and an important class of optimization queries (alternatively, risk minimization queries). As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that the stability notion guaranteed by differential privacy implies low generalization error. We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.
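Schematically, and up to constants (see the paper for the exact statement and parameter regime), the transfer theorem referenced above has the following shape:

```latex
% Schematic form of the transfer theorem (constants omitted): if
% $\mathcal{M}$ is $(\varepsilon,\delta)$-differentially private and, on a
% sample $S \sim \mathcal{P}^n$, outputs a statistical query $q$, then
\[
  \Pr\bigl[\, \lvert q(S) - q(\mathcal{P}) \rvert \;\ge\; O(\varepsilon) \,\bigr]
  \;\le\; O\!\left(\delta/\varepsilon\right),
\]
% where $q(S)$ is the empirical mean of $q$ on the sample and
% $q(\mathcal{P})$ its population mean; answers that are both private and
% empirically accurate therefore generalize.
```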

2017-06-27
Jordan, Michael I.  2016.  On Computational Thinking, Inferential Thinking and Data Science. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. :47–47.

The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level: in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and ways to exploit parallelism so as to trade off the speed and accuracy of inference.