Biblio

Filters: Keyword is python
2020-05-22
Ranjan, G S K, Kumar Verma, Amar, Radhika, Sudha.  2019.  K-Nearest Neighbors and Grid Search CV Based Real Time Fault Monitoring System for Industries. 2019 IEEE 5th International Conference for Convergence in Technology (I2CT). :1–5.
Fault detection in a machine at an early stage can prevent severe damage and loss to industries. Fault detection techniques are broadly classified into three categories: signature extraction-based, model-based, and knowledge-based approaches. Model-based techniques are efficient for raising an alarm signal if there is any fault in the machine. This paper focuses on one such model-based technique to identify the internal faults of an induction machine. The developed model is deployed at the end to make it feasible to use in real time. K-Nearest Neighbors (KNN) and grid search cross-validation (CV) have been used to train and optimize the model to give the best results. The advantage of the proposed algorithm is its prediction accuracy, which has been observed to be 80%. Finally, a user-friendly interface has been built using Flask, a Python web framework.
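
The KNN-plus-grid-search combination described in this abstract maps naturally onto scikit-learn. The sketch below is only an illustration of that idea under assumptions (the CSV file, feature columns, and parameter grid are hypothetical), not the authors' actual pipeline; the Flask front end is omitted.

```python
# Minimal sketch: KNN tuned with grid-search cross-validation in scikit-learn.
# The data file, column names, and parameter ranges are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical sensor readings labelled with fault classes.
data = pd.read_csv("induction_motor_readings.csv")
X = data.drop(columns=["fault_type"])
y = data["fault_type"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": [3, 5, 7, 9], "knn__weights": ["uniform", "distance"]}

# Grid search with 5-fold cross-validation picks the best KNN settings.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```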
2020-03-16
Al Ghazo, Alaa T., Kumar, Ratnesh.  2019.  ICS/SCADA Device Recognition: A Hybrid Communication-Patterns and Passive-Fingerprinting Approach. 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM). :19–24.
Industrial Control System (ICS) and Supervisory Control and Data Acquisition (SCADA) systems are the backbone for monitoring and supervising factories, power grids, water distribution systems, nuclear plants, and other critical infrastructures. These systems are installed by third-party contractors, maintained by site engineers, and operate for a long time. This makes tracing the documentation of the systems' changes and updates challenging, since some of their components' information (type, manufacturer, model, etc.) may not be up-to-date, leading to possibly unaccounted-for security vulnerabilities in the systems. Device recognition is a useful first step in vulnerability identification and defense augmentation, but due to the lack of full traceability in legacy ICS/SCADA systems, typical device recognition based on document inspection is not applicable. In this paper, we propose a hybrid approach that combines communication patterns and passive fingerprinting to identify unknown devices' types, manufacturers, and models. The algorithm uses the ICS/SCADA devices' communication patterns to recognize the control hierarchy levels of the devices. In conjunction, certain distinguishable features in the communication packets are used to recognize the device manufacturer and model. We have implemented this hybrid approach in Python and tested it on traffic data from a water treatment SCADA testbed in Singapore (iTrust).
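
As a rough illustration of the passive side of such an approach, the sketch below groups captured packets by source host and looks for well-known ICS ports using Scapy. The pcap file, the port-to-protocol table, and the hierarchy heuristic are assumptions for illustration, not the paper's algorithm.

```python
# Sketch: build per-host communication profiles from a packet capture.
# The capture file and the port heuristics below are illustrative assumptions.
from collections import defaultdict
from scapy.all import rdpcap, IP, TCP

# Hypothetical mapping from well-known ICS ports to protocol hints.
ICS_PORTS = {502: "Modbus/TCP", 44818: "EtherNet/IP", 20000: "DNP3"}

packets = rdpcap("scada_capture.pcap")
profiles = defaultdict(lambda: {"peers": set(), "protocols": set()})

for pkt in packets:
    if IP in pkt and TCP in pkt:
        src, dst = pkt[IP].src, pkt[IP].dst
        dport = pkt[TCP].dport
        profiles[src]["peers"].add(dst)
        if dport in ICS_PORTS:
            profiles[src]["protocols"].add(ICS_PORTS[dport])

# A host speaking an ICS protocol to many peers is more likely to sit higher
# in the control hierarchy (e.g., an HMI or master) than one with a single peer.
for host, info in profiles.items():
    print(host, "peers:", len(info["peers"]), "protocols:", sorted(info["protocols"]))
```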
2020-02-10
Rahman, Md Rayhanur, Rahman, Akond, Williams, Laurie.  2019.  Share, But Be Aware: Security Smells in Python Gists. 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). :536–540.

GitHub Gist is a service provided by GitHub that developers use to share code snippets. While sharing, developers may inadvertently introduce security smells into the snippets, such as hard-coded passwords. Security smells are recurrent coding patterns that are indicative of security weaknesses and could potentially lead to security breaches. The goal of this paper is to help software practitioners avoid insecure coding practices through an empirical study of security smells in publicly available GitHub Gists. Through static analysis, we found 13 types of security smells with 4,403 occurrences in 5,822 publicly available Python Gists. 1,817 of those Gists, around 31%, have at least one security smell, including 689 instances of hard-coded secrets. We also found no significant relation between the presence of these security smells and the reputation of the Gist author. Based on our findings, we advocate for increased awareness and rigorous code review efforts related to software security for GitHub Gists so that the propagation of insecure coding practices is mitigated.
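A toy illustration of detecting one such smell (hard-coded secrets) with Python's standard ast module is sketched below; the variable-name patterns and the sample snippet are assumptions, and the paper's static analysis covers far more smell types.

```python
# Toy detector: flag string constants assigned to suspicious variable names.
# Patterns and the snippet are assumptions for illustration only.
import ast
import re

SECRET_NAMES = re.compile(r"(password|passwd|secret|api_key|token)", re.IGNORECASE)

snippet = '''
db_password = "hunter2"
timeout = 30
'''

tree = ast.parse(snippet)
for node in ast.walk(tree):
    # Look at simple assignments whose right-hand side is a string literal.
    if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant) \
            and isinstance(node.value.value, str):
        for target in node.targets:
            if isinstance(target, ast.Name) and SECRET_NAMES.search(target.id):
                print(f"possible hard-coded secret: {target.id!r} on line {node.lineno}")
```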

2019-12-16
Hou, Xin-Yu, Zhao, Xiao-Lin, Wu, Mei-Jing, Ma, Rui, Chen, Yu-Peng.  2018.  A Dynamic Detection Technique for XSS Vulnerabilities. 2018 4th Annual International Conference on Network and Information Systems for Computers (ICNISC). :34–43.

This paper studies the principles of vulnerability generation and the mechanism of cross-site scripting (XSS) attacks, and designs a dynamic XSS vulnerability detection technique based on existing theories of black-box vulnerability detection. The dynamic detection process contains five steps: crawling, feature construction, attack simulation, result detection, and report generation. The crawling strategy in the crawler module and the construction algorithm in the feature-construction module are the key points of this detection process. Finally, following the detection technique proposed in this paper, a detection tool was implemented on Linux in Python to test web applications. Experiments were conducted to verify the results and compare them with the test results of other existing tools, to analyze the usability, advantages, and disadvantages of the detection method, and to confirm the feasibility of applying the dynamic detection technique to XSS vulnerability detection.
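The attack-simulation and result-detection steps can be illustrated with a minimal reflected-XSS probe using the requests library, as sketched below; the target URL and parameter name are hypothetical, and the paper's tool also performs crawling, feature construction, and report generation.

```python
# Minimal reflected-XSS probe: inject a marker payload into a query parameter
# and check whether it is echoed back unescaped. Target and parameter are
# hypothetical placeholders.
import requests

PAYLOAD = "<script>alert('xss-probe-1337')</script>"

def probe(url, param):
    """Return True if the payload comes back unescaped (possible reflected XSS)."""
    resp = requests.get(url, params={param: PAYLOAD}, timeout=10)
    return PAYLOAD in resp.text

if __name__ == "__main__":
    target = "http://testsite.local/search"  # hypothetical crawled endpoint
    if probe(target, "q"):
        print("parameter 'q' appears vulnerable to reflected XSS")
    else:
        print("payload was filtered or escaped")
```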

2019-09-26
Jackson, K. A., Bennett, B. T..  2018.  Locating SQL Injection Vulnerabilities in Java Byte Code Using Natural Language Techniques. SoutheastCon 2018. :1-5.

With so much of our daily lives relying on digital devices like personal computers and cell phones, there is a growing demand for code that not only functions properly but is secure and keeps user data safe. However, ensuring this is not an easy task, and many developers do not have the required skills or resources to ensure their code is secure. Many code analysis tools have been written to find vulnerabilities in newly developed code, but this technology tends to produce many false positives and is still not able to identify all of the problems. Other methods of finding software vulnerabilities automatically are required. This proof-of-concept study applied natural language processing to Java byte code to locate SQL injection vulnerabilities in a Java program. Preliminary findings show that, due to the high number of terms in the dataset, single decision trees will not produce a suitable model for locating SQL injection vulnerabilities, while random forest structures proved more promising. Still, further work is needed to determine the best classification tool.
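One way to reproduce the classification step in spirit is to treat disassembled byte-code instruction streams as text and train a random forest on TF-IDF features, as sketched below with scikit-learn; the token strings and labels are invented placeholders, not the study's data.

```python
# Sketch: vectorize byte-code instruction streams as text and classify them
# with a random forest. The two "methods" and their labels are placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Each document is the instruction stream of one method; label 1 = vulnerable.
methods = [
    "aload_0 ldc invokevirtual java/sql/Statement.executeQuery",
    "aload_0 aload_1 invokevirtual java/sql/PreparedStatement.setString invokevirtual executeQuery",
]
labels = [1, 0]

model = make_pipeline(TfidfVectorizer(token_pattern=r"\S+"),
                      RandomForestClassifier(n_estimators=100))
model.fit(methods, labels)
print(model.predict(["aload_0 ldc invokevirtual executeQuery"]))
```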

2019-03-04
Aborisade, O., Anwar, M..  2018.  Classification for Authorship of Tweets by Comparing Logistic Regression and Naive Bayes Classifiers. 2018 IEEE International Conference on Information Reuse and Integration (IRI). :269–276.

At a time when all it takes to open a Twitter account is a mobile phone, the act of authenticating information encountered on social media becomes very complex, especially when we lack measures to verify digital identities in the first place. Because the platform supports anonymity, fake news generated by dubious sources has been observed to travel much faster and farther than real news. Hence, we need valid measures to identify authors of misinformation to avert these consequences. Researchers have proposed different authorship attribution techniques to approach this kind of problem. However, because tweets are made up of only 280 characters, finding a suitable authorship attribution technique is a challenge. This research aims to classify authors of tweets by comparing machine learning methods such as logistic regression and naive Bayes. The process consists of fetching tweets, pre-processing, feature extraction, and developing a machine learning model for classification. This paper illustrates the process of text classification for authorship using machine learning techniques. In total, 46,895 tweets were used as training and testing data, and unique features specific to Twitter were extracted. Several pre-processing steps were performed, including removal of short texts, removal of stop words and punctuation, and tokenizing and stemming of texts. This approach transforms the pre-processed data into a set of feature vectors in Python. Logistic regression and naive Bayes algorithms were applied to the feature vectors for training and testing of the classifiers. The logistic regression-based classifier gave the highest accuracy of 91.1%, compared to 89.8% for the naive Bayes classifier.
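A minimal scikit-learn version of the logistic regression versus naive Bayes comparison is sketched below; the example tweets and authors are placeholders, and the Twitter-specific features described in the paper are not reproduced.

```python
# Sketch: compare logistic regression and naive Bayes on TF-IDF features.
# The tweets and author labels are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = ["launch day is here", "new paper out on security smells",
          "coffee then code", "reviewing pull requests all night"]
authors = ["alice", "bob", "alice", "bob"]

X_train, X_test, y_train, y_test = train_test_split(
    tweets, authors, test_size=0.5, random_state=0, stratify=authors)

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("naive Bayes", MultinomialNB())]:
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))
```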

2019-02-22
Gharibi, Gharib, Tripathi, Rashmi, Lee, Yugyung.  2018.  Code2Graph: Automatic Generation of Static Call Graphs for Python Source Code. Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. :880-883.

A static call graph is an essential prerequisite for most interprocedural analyses and software comprehension tools. However, there is a lack of software tools that can automatically analyze Python source code and construct its static call graph. In this paper, we introduce a prototype Python tool, named code2graph, which automates the tasks of (1) analyzing the Python source code and extracting its structure, (2) constructing static call graphs from the source code, and (3) generating a similarity matrix of all possible execution paths in the system. Our goal is twofold: first, assist developers in understanding the overall structure of the system; second, provide a stepping stone for further research that can utilize the tool in software searching and similarity detection applications. For example, clustering the execution paths into a logical workflow of the system could be applied to automate specific software tasks. Code2graph has been successfully used to generate static call graphs and similarity matrices of the paths for three popular open-source deep learning projects (TensorFlow, Keras, PyTorch). A tool demo is available at https://youtu.be/ecctePpcAKU.
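A simplified flavor of what code2graph automates can be obtained with the standard-library ast module, as in the sketch below; the sample source string is hypothetical, and the real tool does considerably more (path enumeration, similarity matrices, visualization).

```python
# Sketch: extract a function-level static call graph from Python source
# using the standard-library ast module. The sample source is hypothetical.
import ast
from collections import defaultdict

source = '''
def load(path):
    return open(path).read()

def main():
    data = load("config.txt")
    print(data)
'''

tree = ast.parse(source)
call_graph = defaultdict(set)

for func in [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]:
    for node in ast.walk(func):
        # Record direct calls to simple names (no attribute resolution here).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            call_graph[func.name].add(node.func.id)

for caller, callees in call_graph.items():
    print(caller, "->", sorted(callees))
```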

2019-02-08
Sisiaridis, D., Markowitch, O..  2018.  Reducing Data Complexity in Feature Extraction and Feature Selection for Big Data Security Analytics. 2018 1st International Conference on Data Intelligence and Security (ICDIS). :43-48.

Feature extraction and feature selection are the first tasks in pre-processing input logs in order to detect cybersecurity threats and attacks by utilizing data mining techniques in the field of Artificial Intelligence. When it comes to the analysis of heterogeneous data derived from different sources, these tasks are found to be time-consuming and difficult to manage efficiently. In this paper, we present an approach for handling feature extraction and feature selection utilizing machine learning algorithms for security analytics of heterogeneous data derived from different network sensors. The approach is implemented in Apache Spark, using its Python API, pyspark.
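A minimal sketch of feature extraction and selection in pyspark is shown below, assuming hypothetical log-derived columns; the ChiSqSelector-based selection stands in for whatever selection method the paper actually uses.

```python
# Sketch: assemble log-derived columns into a feature vector, then select the
# most label-relevant ones. Column names, rows, and numTopFeatures are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, ChiSqSelector

spark = SparkSession.builder.appName("security-analytics-sketch").getOrCreate()

rows = [(120, 3, 0.2, 0), (4500, 40, 0.9, 1), (300, 5, 0.1, 0)]
df = spark.createDataFrame(rows, ["bytes", "conn_count", "entropy", "label"])

# Feature extraction: pack raw columns into a single vector column.
assembler = VectorAssembler(inputCols=["bytes", "conn_count", "entropy"],
                            outputCol="features")
features_df = assembler.transform(df)

# Feature selection: keep the columns most associated with the label.
selector = ChiSqSelector(numTopFeatures=2, featuresCol="features",
                         outputCol="selected_features", labelCol="label")
selected_df = selector.fit(features_df).transform(features_df)
selected_df.select("selected_features", "label").show()
```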

2017-03-07
Agnihotri, Lalitha, Mojarad, Shirin, Lewkow, Nicholas, Essa, Alfred.  2016.  Educational Data Mining with Python and Apache Spark: A Hands-on Tutorial. Proceedings of the Sixth International Conference on Learning Analytics & Knowledge. :507–508.

An enormous amount of educational data has been accumulated through Massive Open Online Courses (MOOCs), as well as commercial and non-commercial learning platforms. This is in addition to the educational data released by the US government since 2012 to facilitate disruption in education by making data freely available. The high volume, variety, and velocity of the collected data necessitate the use of big data tools and storage systems such as distributed databases for storage and Apache Spark for analysis. This tutorial will introduce researchers and faculty to real-world applications involving data mining and predictive analytics in the learning sciences. In addition, the tutorial will introduce the statistics required to validate and accurately report results. Topics will cover how big data is being used to transform education. Specifically, we will demonstrate how exploratory data analysis, data mining, predictive analytics, machine learning, and visualization techniques are being applied to educational big data to improve learning and scale insights derived from millions of students' records. The tutorial will be held over a half day and will be hands-on, with pre-posted material. Due to the interdisciplinary nature of the work, the tutorial appeals to researchers from a wide range of backgrounds, including big data, predictive analytics, learning sciences, educational data mining, and, in general, those interested in how big data analytics can transform learning. As a prerequisite, attendees are required to have familiarity with at least one programming language.
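
As a small taste of the kind of exploratory analysis the tutorial targets, the sketch below aggregates hypothetical MOOC clickstream records with PySpark; the schema and data are invented for illustration.

```python
# Sketch: per-student engagement summary from hypothetical clickstream events,
# a typical first step before predictive modelling (e.g., dropout prediction).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("edm-tutorial-sketch").getOrCreate()

rows = [("s1", "video", 12), ("s1", "quiz", 3), ("s2", "video", 2), ("s2", "quiz", 1)]
events = spark.createDataFrame(rows, ["student_id", "event_type", "count"])

# Pivot event types into columns to get one engagement row per student.
(events.groupBy("student_id")
       .pivot("event_type")
       .agg(F.sum("count"))
       .show())
```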