Biblio
Spams are unsolicited and unnecessary messages which may contain harmful codes or links for activation of malicious viruses and spywares. Increasing popularity of social networks attracts the spammers to perform malicious activities in social networks. So an efficient spam detection method is necessary for social networks. In this paper, feed forward neural network with back propagation based spam detection model is proposed. The quality of the learning process is improved by tuning initial weights of feed forward neural network using proposed enhanced step size firefly algorithm which reduces the time for finding optimal weights during the learning process. The model is applied for twitter dataset and the experimental results show that, the proposed model performs well in terms of accuracy and detection rate and has lower false positive rate.
The aim of this paper is to show the importance of Computational Stylometry (CS) and Machine Learning (ML) support in author's gender and age detection in cyberbullying texts. We developed a cyberbullying detection platform and we show the results of performances in terms of Precision, Recall and F -Measure for gender and age detection in cyberbullying texts we collected.
Trust Relationships have shown great potential to improve recommendation quality, especially for cold start and sparse users. Since each user trust their friends in different degrees, there are numbers of works been proposed to take Trust Strength into account for recommender systems. However, these methods ignore the information of trust directions between users. In this paper, we propose a novel method to adaptively learn directive trust strength to improve trust-aware recommender systems. Advancing previous works, we propose to establish direction of trust strength by modeling the implicit relationships between users with roles of trusters and trustees. Specially, under new trust strength with directions, how to compute the directive trust strength is becoming a new challenge. Therefore, we present a novel method to adaptively learn directive trust strengths in a unified framework by enforcing the trust strength into range of [0, 1] through a mapping function. Our experiments on Epinions and Ciao datasets demonstrate that the proposed algorithm can effectively outperform several state-of-art algorithms on both MAE and RMSE metrics.
While the rapid adaptation of mobile devices changes our daily life more conveniently, the threat derived from malware is also increased. There are lots of research to detect malware to protect mobile devices, but most of them adopt only signature-based malware detection method that can be easily bypassed by polymorphic and metamorphic malware. To detect malware and its variants, it is essential to adopt behavior-based detection for efficient malware classification. This paper presents a system that classifies malware by using common behavioral characteristics along with malware families. We measure the similarity between malware families with carefully chosen features commonly appeared in the same family. With the proposed similarity measure, we can classify malware by malware's attack behavior pattern and tactical characteristics. Also, we apply community detection algorithm to increase the modularity within each malware family network aggregation. To maintain high classification accuracy, we propose a process to derive the optimal weights of the selected features in the proposed similarity measure. During this process, we find out which features are significant for representing the similarity between malware samples. Finally, we provide an intuitive graph visualization of malware samples which is helpful to understand the distribution and likeness of the malware networks. In the experiment, the proposed system achieved 97% accuracy for malware classification and 95% accuracy for prediction by K-fold cross-validation using the real malware dataset.
The importance of peer-to-peer (P2P) network overlays produced enormous interest in the research community due to their robustness, scalability, and increase of data availability. P2P networks are overlays of logically connected hosts and other nodes including servers. P2P networks allow users to share their files without the need for any centralized servers. Since P2P networks are largely constructed of end-hosts, they are susceptible to abuse and malicious activity, such as sybil attacks. Impostors perform sybil attacks by assigning nodes multiple addresses, as opposed to a single address, with the goal of degrading network quality. Sybil nodes will spread malicious data and provide bogus responses to requests. To prevent sybil attacks from occurring, a novel defense mechanism is proposed. In the proposed scheme, the DHT key-space is divided and treated in a similar manner to radio frequency allocation incensing. An overlay of trusted nodes is used to detect and handle sybil nodes with the aid of source-destination pairs reporting on each other. The simulation results show that the proposed scheme detects sybil nodes in large sized networks with thousands of interactions.
Personal privacy is an important issue when publishing social network data. An attacker may have information to reidentify private data. So, many researchers developed anonymization techniques, such as k-anonymity, k-isomorphism, l-diversity, etc. In this paper, we focus on graph k-degree anonymity by editing edges. Our method is divided into two steps. First, we propose an efficient algorithm to find a new degree sequence with theoretically minimal edit cost. Second, we insert and delete edges based on the new degree sequence to achieve k-degree anonymity.
It is a challenging problem to preserve the friendly-correlations between individuals when publishing social-network data. To alleviate this problem, uncertain graph has been presented recently. The main idea of uncertain graph is converting an original graph into an uncertain form, where the correlations between individuals is an associated probability. However, the existing methods of uncertain graph lack rigorous guarantees of privacy and rely on the assumption of adversary's knowledge. In this paper we first introduced a general model for constructing uncertain graphs. Then, we proposed an algorithm under the model which is based on differential privacy and made an analysis of algorithm's privacy. Our algorithm provides rigorous guarantees of privacy and against the background knowledge attack. Finally, the algorithm we proposed satisfied differential privacy and showed feasibility in the experiments. And then, we compare our algorithm with (k, ε)-obfuscation algorithm in terms of data utility, the importance of nodes for network in our algorithm is similar to (k, ε)-obfuscation algorithm.
In an Internet of Things (IOT) network, each node (device) provides and requires services and with the growth in IOT, the number of nodes providing the same service have also increased, thus creating a problem of selecting one reliable service from among many providers. In this paper, we propose a scalable graph-based collaborative filtering recommendation algorithm, improved using trust to solve service selection problem, which can scale to match the growth in IOT unlike a central recommender which fails. Using this recommender, a node can predict its ratings for the nodes that are providing the required service and then select the best rated service provider.
Recent studies have shown that adding explicit social trust information to social recommendation significantly improves the prediction accuracy of ratings, but it is difficult to obtain a clear trust data among users in real life. Scholars have studied and proposed some trust measure methods to calculate and predict the interaction and trust between users. In this article, a method of social trust relationship extraction based on hellinger distance is proposed, and user similarity is calculated by describing the f-divergence of one side node in user-item bipartite networks. Then, a new matrix factorization model based on implicit social relationship is proposed by adding the extracted implicit social relations into the improved matrix factorization. The experimental results support that the effect of using implicit social trust to recommend is almost the same as that of using actual explicit user trust ratings, and when the explicit trust data cannot be extracted, our method has a better effect than the other traditional algorithms.
Social networking sites such as Flickr, YouTube, Facebook, etc. contain huge amount of user contributed data for a variety of real-world events. We describe an unsupervised approach to the problem of automatically detecting subgroups of people holding similar tastes or either taste. Item or taste tags play an important role in detecting group or subgroup, if two or more persons share the same opinion on the item or taste, they tend to use similar content. We consider the latter to be an implicit attitude. In this paper, we have investigated the impact of implicit and explicit attitude in two genres of social media discussion data, more formal wikipedia discussions and a debate discussion forum that is much more informal. Experimental results strongly suggest that implicit attitude is an important complement for explicit attitudes (expressed via sentiment) and it can improve the sub-group detection performance independent of genre. Here, we have proposed taste-based group, which can enhance the quality of service.
In assessing privacy on online social networks, it is important to investigate their vulnerability to reconnaissance strategies, in which attackers lure targets into being their friends by exploiting the social graph in order to extract victims' sensitive information. As the network topology is only partially revealed after each successful friend request, attackers need to employ an adaptive strategy. Existing work only considered a simple strategy in which attackers sequentially acquire one friend at a time, which causes tremendous delay in waiting for responses before sending the next request, and which lack the ability to retry failed requests after the network has changed. In contrast, we investigate an adaptive and parallel strategy, of which attackers can simultaneously send multiple friend requests in batch and recover from failed requests by retrying after topology changes, thereby significantly reducing the time to reach the targets and greatly improving robustness. We cast this approach as an optimization problem, Max-Crawling, and show it inapproximable within (1 - 1/e + $ε$). We first design our core algorithm PM-AReST which has an approximation ratio of (1 - e-(1-1/e)) using adaptive monotonic submodular properties. We next tighten our algorithm to provide a nearoptimal solution, i.e. having a ratio of (1 - 1/e), via a two-stage stochastic programming approach. We further establish the gap bound of (1 - e-(1-1/e)2) between batch strategies versus the optimal sequential one. We experimentally validate our theoretical results, finding that our algorithm performs nearoptimally in practice and that this is robust under a variety of problem settings.
The ability to discover patterns of interest in criminal networks can support and ease the investigation tasks by security and law enforcement agencies. By considering criminal networks as a special case of social networks, we can properly reuse most of the state-of-the-art techniques to discover patterns of interests, i.e., hidden and potential links. Nevertheless, in time-sensible scenarios, like the one involving criminal actions, the ability to discover patterns in a (near) real-time manner can be of primary importance.In this paper, we investigate the identification of patterns for link detection and prediction on an evolving criminal network. To extract valuable information as soon as data is generated, we exploit a stream processing approach. To this end, we also propose three new similarity social network metrics, specifically tailored for criminal link detection and prediction. Then, we develop a flexible data stream processing application relying on the Apache Flink framework; this solution allows us to deploy and evaluate the newly proposed metrics as well as the ones existing in literature. The experimental results show that the new metrics we propose can reach up to 83% accuracy in detection and 82% accuracy in prediction, resulting competitive with the state of the art metrics.
Social media has become an important platform for people to express opinions, share information and communicate with others. Detecting and tracking topics from social media can help people grasp essential information and facilitate many security-related applications. As social media texts are usually short, traditional topic evolution models built based on LDA or HDP often suffer from the data sparsity problem. Recently proposed topic evolution models are more suitable for short texts, but they need to manually specify topic number which is fixed during different time period. To address these issues, in this paper, we propose a nonparametric topic evolution model for social media short texts. We first propose the recurrent semantic dependent Chinese restaurant process (rsdCRP), which is a nonparametric process incorporating word embeddings to capture semantic similarity information. Then we combine rsdCRP with word co-occurrence modeling and build our short-text oriented topic evolution model sdTEM. We carry out experimental studies on Twitter dataset. The results demonstrate the effectiveness of our method to monitor social media topic evolution compared to the baseline methods.
In our era, most of the communication between people is realized in the form of electronic messages and especially through smart mobile devices. As such, the written text exchanged suffers from bad use of punctuation, misspelling words, continuous chunk of several words without spaces, tables, internet addresses etc. which make traditional text analytics methods difficult or impossible to be applied without serious effort to clean the dataset. Our proposed method in this paper can work in massive noisy and scrambled texts with minimal preprocessing by removing special characters and spaces in order to create a continuous string and detect all the repeated patterns very efficiently using the Longest Expected Repeated Pattern Reduced Suffix Array (LERP-RSA) data structure and a variant of All Repeated Patterns Detection (ARPaD) algorithm. Meta-analyses of the results can further assist a digital forensics investigator to detect important information to the chunk of text analyzed.
In this paper we present results of a research on automatic extremist text detection. For this purpose an experimental dataset in the Russian language was created. According to the Russian legislation we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution of differentiating features (lexical, semantic and psycholinguistic) to classification quality. The results of experiments show that psycholinguistic and semantic features are promising for extremist text detection.
Internet plays a crucial role in today's life, so the usage of online social network monotonically increasing. People can share multimedia information's fastly and keep in touch or communicate with friend's easily through online social network across the world. Security in authentication is a big challenge in online social network and authentication is a preliminary process for identifying legitimate user. Conventionally, we are using alphanumeric textbased password for authentication approach. But the main flaw points of text based password is highly vulnerable to attacks and difficulty of recalling password during authentication time due to the irregular use of passwords. To overcome the shortcoming of text passwords, we propose a Graphical Password authentication. An approach of Graphical Password is an authentication of amalgam of pictures. It is less vulnerable to attacks and human can easily recall pictures better than text. So the graphical password is a better alternative to text passwords. As the image uploads are increasing by users share through online site, privacy preserving has become a major problem. So we need a Caption Based Metadata Stratification of images for delivers an automatic suggestion of similar category already in database, it works by comparing the caption metadata of album with caption metadata already in database or extract the synonyms of caption metadata of new album for checking the similarity with caption metadata already in database. This stratification offers an enhanced automatic privacy prediction for uploaded images in online social network, privacy is an inevitable factor for uploaded images, and privacy violation is a major concern. So we propose an Automatic Policy Prediction for uploaded images that are classified by caption metadata. An automatic policy prediction is a hassle-free privacy setting proposed to the user.
Online Social Networks have emerged as an interesting area for analysis where each user having a personalized user profile interact and share information with each other. Apart from analyzing the structural characteristics, detection of abnormal and anomalous activities in social networks has become need of the hour. These anomalous activities represent the rare and mischievous activities that take place in the network. Graphical structure of social networks has encouraged the researchers to use various graph metrics to detect the anomalous activities. One such measure that seemed to be highly beneficial to detect the anomalies was brokerage value which helped to detect the anomalies with high accuracy. Also, further application of the measure to different datasets verified the fact that the anomalous behavior detected by the proposed measure was efficient as compared to the already proposed measures in Oddball Algorithm.
Information threatening the security of critical infrastructures are exchanged over the Internet through communication platforms, such as online discussion forums. This information can be used by malicious hackers to attack critical computer networks and data systems. Much of the literature on the hacking of critical infrastructure has focused on developing typologies of cyber-attacks, but has not examined the communication activities of the actors involved. To address this gap in the literature, the language of hackers was analyzed to identify potential threats against critical infrastructures using automated analysis tools. First, discussion posts were collected from a selected hacker forum using a customized web-crawler. Posts were analyzed using a parts of speech tagger, which helped determine a list of keywords used to query the data. Next, a sentiment analysis tool scored these keywords, which were then analyzed to determine the effectiveness of this method.
Attributing the culprit of a cyber-attack is widely considered one of the major technical and policy challenges of cyber-security. The lack of ground truth for an individual responsible for a given attack has limited previous studies. Here, we overcome this limitation by leveraging DEFCON capture-the-flag (CTF) exercise data where the actual ground-truth is known. In this work, we use various classification techniques to identify the culprit in a cyberattack and find that deceptive activities account for the majority of misclassified samples. We also explore several heuristics to alleviate some of the misclassification caused by deception.
Certain crimes are difficult to be committed by individuals but carefully organised by group of associates and affiliates loosely connected to each other with a single or small group of individuals coordinating the overall actions. A common starting point in understanding the structural organisation of criminal groups is to identify the criminals and their associates. Situations arise in many criminal datasets where there is no direct connection among the criminals. In this paper, we investigate ties and community structure in crime data in order to understand the operations of both traditional and cyber criminals, as well as to predict the existence of organised criminal networks. Our contributions are twofold: we propose a bipartite network model for inferring hidden ties between actors who initiated an illegal interaction and objects affected by the interaction, we then validate the method in two case studies on pharmaceutical crime and underground forum data using standard network algorithms for structural and community analysis. The vertex level metrics and community analysis results obtained indicate the significance of our work in understanding the operations and structure of organised criminal networks which were not immediately obvious in the data. Identifying these groups and mapping their relationship to one another is essential in making more effective disruption strategies in the future.
Online Social Networks exploit a lightweight process to identify their users so as to facilitate their fast adoption. However, such convenience comes at the price of making legitimate users subject to different threats created by fake accounts. Therefore, there is a crucial need to empower users with tools helping them in assigning a level of trust to whomever they interact with. To cope with this issue, in this paper we introduce a novel model, DIVa, that leverages on mining techniques to find correlations among user profile attributes. These correlations are discovered not from user population as a whole, but from individual communities, where the correlations are more pronounced. DIVa exploits a decentralized learning approach and ensures privacy preservation as each node in the OSN independently processes its local data and is required to know only its direct neighbors. Extensive experiments using real-world OSN datasets show that DIVa is able to extract fine-grained community-aware correlations among profile attributes with average improvements up to 50% than the global approach.
Denial-of-Service (DoS) attacks pose a threat to any service provider on the internet. While traditional DoS flooding attacks require the attacker to control at least as much resources as the service provider in order to be effective, so-called low-rate DoS attacks can exploit weaknesses in careless design to effectively deny a service using minimal amounts of network traffic. This paper investigates one such weakness found within version 2.2 of the popular Apache HTTP Server software. The weakness concerns how the server handles the persistent connection feature in HTTP 1.1. An attack simulator exploiting this weakness has been developed and shown to be effective. The attack was then studied with spectral analysis for the purpose of examining how well the attack could be detected. Similar to other papers on spectral analysis of low-rate DoS attacks, the results show that disproportionate amounts of energy in the lower frequencies can be detected when the attack is present. However, by randomizing the attack pattern, an attacker can efficiently reduce this disproportion to a degree where it might be impossible to correctly identify an attack in a real world scenario.
The integration of social networking concepts into the Internet of things has led to the Social Internet of Things (SIoT) paradigm, according to which objects are capable of establishing social relationships in an autonomous way with respect to their owners with the benefits of improving the network scalability in information/service discovery. Within this scenario, we focus on the problem of understanding how the information provided by members of the social IoT has to be processed so as to build a reliable system on the basis of the behavior of the objects. We define two models for trustworthiness management starting from the solutions proposed for P2P and social networks. In the subjective model each node computes the trustworthiness of its friends on the basis of its own experience and on the opinion of the friends in common with the potential service providers. In the objective model, the information about each node is distributed and stored making use of a distributed hash table structure so that any node can make use of the same information. Simulations show how the proposed models can effectively isolate almost any malicious nodes in the network at the expenses of an increase in the network traffic for feedback exchange.