Biblio
Mobile phones have become nowadays a commodity to the majority of people. Using them, people are able to access the world of Internet and connect with their friends, their colleagues at work or even unknown people with common interests. This proliferation of the mobile devices has also been seen as an opportunity for the cyber criminals to deceive smartphone users and steel their money directly or indirectly, respectively, by accessing their bank accounts through the smartphones or by blackmailing them or selling their private data such as photos, credit card data, etc. to third parties. This is usually achieved by installing malware to smartphones masking their malevolent payload as a legitimate application and advertise it to the users with the hope that mobile users will install it in their devices. Thus, any existing application can easily be modified by integrating a malware and then presented it as a legitimate one. In response to this, scientists have proposed a number of malware detection and classification methods using a variety of techniques. Even though, several of them achieve relatively high precision in malware classification, there is still space for improvement. In this paper, we propose a text mining all repeated pattern detection method which uses the decompiled files of an application in order to classify a suspicious application into one of the known malware families. Based on the experimental results using a real malware dataset, the methodology tries to correctly classify (without any misclassification) all randomly selected malware applications of 3 categories with 3 different families each.
With so much our daily lives relying on digital devices like personal computers and cell phones, there is a growing demand for code that not only functions properly, but is secure and keeps user data safe. However, ensuring this is not such an easy task, and many developers do not have the required skills or resources to ensure their code is secure. Many code analysis tools have been written to find vulnerabilities in newly developed code, but this technology tends to produce many false positives, and is still not able to identify all of the problems. Other methods of finding software vulnerabilities automatically are required. This proof-of-concept study applied natural language processing on Java byte code to locate SQL injection vulnerabilities in a Java program. Preliminary findings show that, due to the high number of terms in the dataset, using singular decision trees will not produce a suitable model for locating SQL injection vulnerabilities, while random forest structures proved more promising. Still, further work is needed to determine the best classification tool.
Software1 vulnerabilities are closely associated with information systems security, a major and critical field in today's technology. Vulnerabilities constitute a constant and increasing threat for various aspects of everyday life, especially for safety and economy, since the social impact from the problems that they cause is complicated and often unpredictable. Although there is an entire research branch in software engineering that deals with the identification and elimination of vulnerabilities, the growing complexity of software products and the variability of software production procedures are factors contributing to the ongoing occurrence of vulnerabilities, Hence, another area that is being developed in parallel focuses on the study and management of the vulnerabilities that have already been reported and registered in databases. The information contained in such databases includes, a textual description and a number of metrics related to vulnerabilities. The purpose of this paper is to investigate to what extend the assessment of the vulnerability severity can be inferred directly from the corresponding textual description, or in other words, to examine the informative power of the description with respect to the vulnerability severity. For this purpose, text mining techniques, i.e. text analysis and three different classification methods (decision trees, neural networks and support vector machines) were employed. The application of text mining to a sample of 70,678 vulnerabilities from a public data source shows that the description itself is a reliable and highly accurate source of information for vulnerability prioritization.
The security of computer programs and systems is a very critical issue. With the number of attacks launched on computer networks and software, businesses and IT professionals are taking steps to ensure that their information systems are as secure as possible. However, many programmers do not think about adding security to their programs until their projects are near completion. This is a major mistake because a system is as secure as its weakest link. If security is viewed as an afterthought, it is highly likely that the resulting system will have a large number of vulnerabilities, which could be exploited by attackers. One of the reasons programmers overlook adding security to their code is because it is viewed as a complicated or time-consuming process. This paper presents a tool that will help programmers think more about security and add security tactics to their code with ease. We created a model that learns from existing open source projects and documentation using machine learning and text mining techniques. Our tool contains a module that runs in the background to analyze code as the programmer types and offers suggestions of where security could be included. In addition, our tool fetches existing open source implementations of cryptographic algorithms and sample code from repositories to aid programmers in adding security easily to their projects.
Social media has become an important platform for people to express opinions, share information and communicate with others. Detecting and tracking topics from social media can help people grasp essential information and facilitate many security-related applications. As social media texts are usually short, traditional topic evolution models built based on LDA or HDP often suffer from the data sparsity problem. Recently proposed topic evolution models are more suitable for short texts, but they need to manually specify topic number which is fixed during different time period. To address these issues, in this paper, we propose a nonparametric topic evolution model for social media short texts. We first propose the recurrent semantic dependent Chinese restaurant process (rsdCRP), which is a nonparametric process incorporating word embeddings to capture semantic similarity information. Then we combine rsdCRP with word co-occurrence modeling and build our short-text oriented topic evolution model sdTEM. We carry out experimental studies on Twitter dataset. The results demonstrate the effectiveness of our method to monitor social media topic evolution compared to the baseline methods.