Biblio
Trust prediction in online social networks is crucial for information dissemination, product promotion, and decision making. Existing work on trust prediction mainly exploits the network structure or a low-rank approximation of the trust network; these approaches can suffer from data sparsity and limited prediction accuracy. Inspired by homophily theory, which observes that trust relations in social and economic networks tend to develop among similar people, we propose a novel deep user model for trust prediction based on user similarity measurement. It is a comprehensive model, insensitive to data sparsity, that combines a user's review behavior with the characteristics of the items the user is interested in. With this user model, we first generate a user's latent features mined from review behavior and the properties of the items the user cares about. We then develop a pair-wise deep neural network to further learn and represent these user features. Finally, we measure the trust relation between a pair of users by computing the cosine similarity of their feature vectors. Extensive experiments on two real-world datasets demonstrate the superior performance of the proposed approach over representative baselines.
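A minimal sketch of the pair-wise setup described in this abstract, assuming a shared encoder applied to precomputed user feature vectors and trust estimated as cosine similarity of the encoded pair; the layer sizes, weights, and toy data are illustrative assumptions, not taken from the paper:

# Hedged sketch: a shared (siamese-style) encoder over user feature vectors,
# with trust estimated as cosine similarity of the encoded pair.
# Layer sizes, initialisation, and the random data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy user features, e.g. mined from review behavior and item properties.
n_users, n_features, n_hidden = 100, 32, 16
user_features = rng.normal(size=(n_users, n_features))

# One shared set of weights encodes every user (pair-wise / siamese setup).
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden, n_hidden))

def encode(x):
    """Map raw user features to a latent representation."""
    h = np.tanh(x @ W1)
    return np.tanh(h @ W2)

def trust_score(u, v):
    """Cosine similarity between the latent vectors of users u and v."""
    zu, zv = encode(user_features[u]), encode(user_features[v])
    return float(zu @ zv / (np.linalg.norm(zu) * np.linalg.norm(zv) + 1e-12))

print(trust_score(0, 1))  # value in [-1, 1]; higher means more likely to trust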
Information shared on Twitter is ever increasing, and recipients are overwhelmed by the number of tweets they receive, many of which are of no interest. Filters that estimate the interest of each incoming post can alleviate this problem, for example by allowing users to sort incoming posts by predicted interest (e.g., "top stories" vs. "most recent" in Facebook). Global and personal filters have been used to detect interesting posts in social networks. Global filters are trained on large collections of posts and reactions to posts (e.g., retweets), aiming to predict how interesting a post is for a broad audience. In contrast, personal filters are trained on the posts received by a particular user and that user's reactions. Personal filters can provide recommendations tailored to a particular user's interests, which may not coincide with the interests of the majority of users that global filters are trained to predict. On the other hand, global filters are typically trained on much larger datasets than personal filters. Hence, global filters may work better in practice, especially for new users, for whom personal filters may have very few training instances (the "cold start" problem). Following Uysal and Croft, we devised a hybrid approach that combines the strengths of both global and personal filters. As in global filters, we train a single system on a large, multi-user collection of tweets. Each tweet, however, is represented as a feature vector that includes a number of user-specific features.
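A rough sketch of the hybrid idea: one classifier trained on tweets from many users, where each tweet vector mixes global features with user-specific ones. The feature choices, the synthetic data, and the use of scikit-learn are illustrative assumptions, not the paper's setup:

# Hedged sketch of a hybrid global/personal interest filter.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def tweet_vector(global_feats, user_profile, author_followed_by_user):
    """Concatenate global features with user-specific features."""
    overlap = float(global_feats @ user_profile)  # content match to the user's history
    return np.concatenate([global_feats, [overlap, float(author_followed_by_user)]])

# Synthetic training data: (global features, user profile, followed?, interested?)
X, y = [], []
for _ in range(500):
    g = rng.random(10)                  # e.g. retweet counts, hashtags, URL presence
    p = rng.random(10)                  # user's interest profile from past reactions
    followed = rng.random() < 0.3
    X.append(tweet_vector(g, p, followed))
    y.append(int(g @ p + followed > 3.0))  # toy notion of "interesting"

clf = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
print(clf.predict_proba(np.array([X[0]]))[0, 1])  # predicted interest for one tweet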
User modeling of individual users on Social Web platforms such as Twitter plays a significant role in providing personalized recommendations and filtering interesting information from social streams. Recently, researchers have proposed using concepts (e.g., DBpedia entities) to represent user interests instead of word-based approaches, since Knowledge Bases such as DBpedia provide cross-domain background knowledge about concepts and can thus be used to extend user interest profiles. Even so, not all concepts are covered by a Knowledge Base, especially on microblogging platforms such as Twitter, where new concepts and topics emerge every day. In this short paper, instead of using concepts alone, we propose using synsets from WordNet together with concepts from DBpedia to represent user interests. We evaluate the proposed user modeling strategies by comparing them with other bag-of-concepts approaches. The results show that using synsets and concepts together to represent user interests significantly improves the quality of user modeling in the context of link recommendation on Twitter.
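A small sketch of the bag-of-concepts profile described here, assuming synset and entity extraction with disambiguation has already been done; the identifiers, weights, and cosine-based ranking are illustrative assumptions:

# Hedged sketch: a user interest profile as a weighted bag of WordNet synset ids
# plus DBpedia concept URIs, with candidate links ranked by cosine similarity.
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Profile built from a user's tweets: synsets and DBpedia entities, weighted by frequency.
user_profile = Counter({
    "wn:machine_learning.n.01": 5,
    "wn:neural_network.n.01": 3,
    "dbpedia:Deep_learning": 4,
    "dbpedia:Twitter": 2,
})

# Candidate links (e.g. articles shared on Twitter) represented the same way.
candidates = {
    "article_a": Counter({"dbpedia:Deep_learning": 3, "wn:neural_network.n.01": 2}),
    "article_b": Counter({"dbpedia:Football": 4, "wn:match.n.01": 2}),
}

ranked = sorted(candidates, key=lambda c: cosine(user_profile, candidates[c]), reverse=True)
print(ranked)  # ['article_a', 'article_b']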
Modelling network traffic is gaining importance as a counter to modern security threats of ever increasing sophistication. It is, however, surprisingly difficult and costly to construct reliable classifiers on top of telemetry data, owing to the variety and complexity of signals that no human can interpret in full. Obtaining training data with a sufficiently large and varied body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on all of a computer's traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally sized time windows for a large number of computers, where the only labels needed are (human) verdicts about each computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself learns discriminative patterns in traffic targeted at individual servers and constructs the final high-level classifier on top of them. We show that the classifier performs with very high precision, and demonstrate that the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with a special structure reflecting two stacked multi-instance problems. The main advantages of the proposed configuration include not only improved accuracy and the ability to learn from coarse, per-computer labels, but also automatic learning of the server types (together with their detectors) that are typically visited by infected computers.
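A hedged sketch of the stacked multi-instance structure outlined above (flows grouped per target server, server representations pooled per computer), showing only a forward pass; the mean pooling, layer sizes, and random weights are illustrative assumptions, not the paper's architecture:

# Hedged sketch of a two-level multi-instance forward pass:
# flows -> per-server vectors -> one per-computer vector -> infection score.
import numpy as np

rng = np.random.default_rng(0)
n_flow_feats, n_hidden = 8, 16

W_flow = rng.normal(scale=0.1, size=(n_flow_feats, n_hidden))
W_server = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
w_out = rng.normal(scale=0.1, size=n_hidden)

def server_embedding(flows):
    """Lower MIL level: pool the flows sent to one server."""
    h = np.maximum(flows @ W_flow, 0.0)           # per-flow ReLU features
    return np.maximum(h.mean(axis=0) @ W_server, 0.0)

def computer_score(flows_per_server):
    """Upper MIL level: pool servers into one computer-level verdict."""
    servers = np.stack([server_embedding(f) for f in flows_per_server])
    z = servers.mean(axis=0)
    return 1.0 / (1.0 + np.exp(-(z @ w_out)))     # P(infected) for the 5-minute window

# One computer's window: flows to three servers, with varying numbers of flows.
window = [rng.normal(size=(k, n_flow_feats)) for k in (4, 2, 7)]
print(computer_score(window))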
Distributed Denial of Service (DDoS) attacks are one of the most important threats to network systems. Due to their distributed nature, DDoS attacks are very hard to detect, while they also have the destructive potential of classical denial of service attacks. In this study, a novel two-step system is proposed for the detection of DDoS attacks. In the first step, anomaly detection is performed on the destination IP traffic. If an anomaly is detected in the network, the system proceeds to the second step, where a decision is made for every user based on their behaviour model. Hence, it is possible to detect attacks whose traffic diverges from the users' behaviour models.
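A minimal sketch of the two-step scheme, assuming simple mean/standard-deviation baselines for both the destination-level anomaly test and the per-user behaviour models; the thresholds and data are illustrative assumptions:

# Hedged sketch: (1) flag an anomaly when traffic towards a destination IP deviates
# strongly from its recent baseline, (2) only then score each user against their
# own behaviour model.
import statistics

def is_anomalous(dest_history, current, z_thresh=3.0):
    """Step 1: z-score test on per-interval traffic volume to one destination IP."""
    mu = statistics.mean(dest_history)
    sigma = statistics.pstdev(dest_history) or 1.0
    return (current - mu) / sigma > z_thresh

def suspicious_users(user_models, current_rates, z_thresh=3.0):
    """Step 2: flag users whose current request rate diverges from their own model."""
    flagged = []
    for user, (mu, sigma) in user_models.items():
        rate = current_rates.get(user, 0.0)
        if (rate - mu) / (sigma or 1.0) > z_thresh:
            flagged.append(user)
    return flagged

dest_history = [120, 130, 110, 125, 118]   # requests per interval to one destination
current_total = 900                        # sudden surge
user_models = {"u1": (10.0, 2.0), "u2": (12.0, 3.0), "u3": (8.0, 1.5)}
current_rates = {"u1": 11.0, "u2": 300.0, "u3": 9.0}

if is_anomalous(dest_history, current_total):
    print(suspicious_users(user_models, current_rates))  # ['u2']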