Biblio

Found 286 results

Filters: First Letter Of Title is U
2017-09-05
Deng, Jing, Gao, Xiaoli, Wang, Chunyue.  2016.  Using Bi-level Penalized Logistic Classifier to Detect Zombie Accounts in Online Social Networks. Proceedings of the Fifth International Conference on Network, Communication and Computing. :126–130.

The huge popularity of online social networks and the potential financial gain have led to the creation and proliferation of zombie accounts, i.e., fake user accounts. For a considerable payment, zombie accounts can be directed by their managers to provide pre-arranged, biased reactions to social events or to the quality of a commercial product. It is thus critical to detect and screen these accounts. Prior work is either inaccurate or relies heavily on complex posting/tweeting behaviors when classifying accounts as normal or zombie. In this work, we propose to use a bi-level penalized logistic classifier, an efficient high-dimensional data analysis technique, to detect zombie accounts based on their publicly available profile information and the statistics of their followers' registration locations. Our approach, termed (B)i-level (P)enalized (LO)gistic (C)lassifier (BPLOC), is data adaptive and can be extended to provide more accurate detection. Our experimental results, based on a small number of Sina Weibo accounts, demonstrate that BPLOC can classify zombie accounts accurately.
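For readers who want to experiment with the general idea, the sketch below trains an L1-penalized (sparse) logistic classifier on a handful of profile-style features. It is a minimal illustration of penalized logistic classification with scikit-learn, not the paper's bi-level BPLOC formulation; the features and data are invented.

```python
# Minimal sketch: penalized (L1) logistic classification on profile-style
# features. Generic stand-in, not the paper's bi-level BPLOC model; the
# features and labels below are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical features: followers, followees, posts, profile completeness,
# and the share of followers registered in a single location.
X = rng.random((n, 5))
# Hypothetical labels: 1 = zombie, 0 = normal (synthetic rule plus noise).
y = ((0.8 * X[:, 4] - 0.3 * X[:, 3] + 0.1 * rng.standard_normal(n)) > 0.3).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X_tr, y_tr)
print("sparse coefficients:", clf.coef_)
print("held-out accuracy:", clf.score(X_te, y_te))
```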

2017-08-02
Schroeder, Jan, Berger, Christian, Staron, Miroslaw, Herpel, Thomas, Knauss, Alessia.  2016.  Unveiling Anomalies and Their Impact on Software Quality in Model-based Automotive Software Revisions with Software Metrics and Domain Experts. Proceedings of the 25th International Symposium on Software Testing and Analysis. :154–164.

The validation of simulation models (e.g., of electronic control units for vehicles) in industry is becoming increasingly challenging due to their growing complexity. To systematically assess the quality of such models, software metrics seem to be promising. In this paper we explore the use of software metrics and outlier analysis as a means to assess the quality of model-based software. More specifically, we investigate how results from regression analysis applied to measurement data received from size and complexity metrics can be mapped to software quality. Using the moving averages approach, models were fit to data received from over 65,000 software revisions for 71 simulation models that represent different electronic control units of real premium vehicles. Consecutive investigations using studentized deleted residuals and Cook’s Distance revealed outliers among the measurements. From these outliers we identified a subset, which provides meaningful information (anomalies) by comparing outlier scores with expert opinions. Eight engineers were interviewed separately for outlier impact on software quality. Findings were validated in consecutive workshops. The results show correlations between outliers and their impact on four of the considered quality characteristics. They also demonstrate the applicability of this approach in industry.
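As a rough illustration of the outlier-screening step described above, the sketch below fits an ordinary least-squares trend to a synthetic metric series and flags points with large studentized deleted residuals or Cook's Distance, using statsmodels' influence diagnostics. The data are made up and the simple OLS trend stands in for the paper's moving-averages models.

```python
# Sketch: flag outlying revisions in a metric time series using studentized
# deleted residuals and Cook's Distance (statsmodels influence diagnostics).
# Synthetic data; an OLS trend stands in for the paper's moving-averages fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
revisions = np.arange(200, dtype=float)
size_metric = 50 + 0.4 * revisions + rng.normal(0, 3, 200)
size_metric[120] += 40  # inject an anomalous revision

X = sm.add_constant(revisions)
results = sm.OLS(size_metric, X).fit()
influence = results.get_influence()

student_resid = influence.resid_studentized_external
cooks_d, _ = influence.cooks_distance

# Common rules of thumb: |studentized residual| > 3, Cook's D > 4/n.
suspects = np.where((np.abs(student_resid) > 3) | (cooks_d > 4 / len(revisions)))[0]
print("candidate anomalous revisions:", suspects)
```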

Liu, Yepang, Xu, Chang, Cheung, Shing-Chi, Terragni, Valerio.  2016.  Understanding and Detecting Wake Lock Misuses for Android Applications. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. :396–409.

Wake locks are widely used in Android apps to protect critical computations from being disrupted by device sleeping. Inappropriate use of wake locks often seriously impacts user experience. However, little is known about how wake locks are used in real-world Android apps and the impact of their misuse. To bridge this gap, we conducted a large-scale empirical study on 44,736 commercial and 31 open-source Android apps. Through automated program analysis and manual investigation, we observed (1) common program points where wake locks are acquired and released, (2) 13 types of critical computational tasks that are often protected by wake locks, and (3) eight patterns of wake lock misuse that commonly cause functional and non-functional issues, only three of which had been studied by existing work. Based on our findings, we designed a static analysis technique, Elite, to detect the two most common patterns of wake lock misuse. Our experiments on real-world subjects showed that Elite is effective and can outperform two state-of-the-art techniques.

Cao, Cong, Yan, Jun, Li, Mengxiang.  2016.  Understanding the Influence and Service Type of Trusted Third Party on Consumers' Online Trust: Evidence from Australian B2C Marketplace. Proceedings of the 18th Annual International Conference on Electronic Commerce: E-Commerce in Smart Connected World. :18:1–18:8.

This study examines the trusted third party (TTP) in Australia's B2C marketplace and the factors influencing consumers' trust behaviour from the perspective of consumers' online trust. Based on a literature review and the development status and background of Australian e-commerce, and underpinned by the Theory of Planned Behaviour (TPB) and a conceptual trust model, the paper details the specific factors through which a TTP influences consumers' trust behaviour. It also explains two different functions a TTP can serve in solving the online trust problem faced by consumers, and summarizes five types of services provided by TTPs during the establishment of the trust relationship. Finally, the study selects 100 B2C enterprises by simple random sampling and analyses their TTPs in detail to verify the services and functions of the proposed TTP in the trust model. The findings help explain the influence mechanism, functions and services of TTPs on consumers' trust behaviour in the Australian B2C environment.

Feil, Sebastian, Kretzer, Martin, Werder, Karl, Maedche, Alexander.  2016.  Using Gamification to Tackle the Cold-Start Problem in Recommender Systems. Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion. :253–256.

The cold-start problem in recommender systems refers to the inability to make reliable recommendations when a critical mass of items has not yet been rated. To bypass this problem, existing research has focused on developing more reliable prediction models for situations in which only a few item ratings exist. However, most of these approaches depend on adjusting the algorithm that determines a recommendation. We present a complementary approach that does not require any adjustments to the recommendation algorithm. We draw on motivation theory and reward users for rating items. In particular, we instantiate different gamification patterns and examine their effect on the average number of report ratings provided per user. Our results confirm the positive effect of instantiating gamification patterns on the number of received report ratings.

Piao, Guangyuan, Breslin, John G..  2016.  User Modeling on Twitter with WordNet Synsets and DBpedia Concepts for Personalized Recommendations. Proceedings of the 25th ACM International Conference on Information and Knowledge Management. :2057–2060.

User modeling of individual users on Social Web platforms such as Twitter plays a significant role in providing personalized recommendations and filtering interesting information from social streams. Recently, researchers proposed the use of concepts (e.g., DBpedia entities) for representing user interests instead of word-based approaches, since Knowledge Bases such as DBpedia provide cross-domain background knowledge about concepts, and thus can be used for extending user interest profiles. Even so, not all concepts can be covered by a Knowledge Base, especially in the case of microblogging platforms such as Twitter, where new concepts/topics emerge every day. In this short paper, instead of using concepts alone, we propose using synsets from WordNet and concepts from DBpedia for representing user interests. We evaluate our proposed user modeling strategies by comparing them with other bag-of-concepts approaches. The results show that using synsets and concepts together for representing user interests improves the quality of user modeling significantly in the context of link recommendations on Twitter.
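As a toy illustration of the synset side of this idea, the sketch below maps words from a couple of tweets to WordNet synsets with NLTK and builds a simple bag-of-synsets interest profile. It only sketches the representation, not the paper's full user-modeling strategy, and it omits the DBpedia concept extraction step.

```python
# Sketch: represent a user's tweets as a bag of WordNet synsets (NLTK).
# Illustrates the synset representation only; not the paper's full user
# modeling strategy, and the DBpedia concept side is omitted.
from collections import Counter
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

tweets = [
    "New deep learning model for music recommendation",
    "Great football match tonight",
]

profile = Counter()
for tweet in tweets:
    for word in tweet.lower().split():
        synsets = wn.synsets(word)
        if synsets:
            # Take the most frequent sense as a crude disambiguation choice.
            profile[synsets[0].name()] += 1

print(profile.most_common(10))
```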

2017-07-24
Ahmad, Kashif, Conci, Nicola, Boato, Giulia, De Natale, Francesco G. B..  2016.  USED: A Large-scale Social Event Detection Dataset. Proceedings of the 7th International Conference on Multimedia Systems. :50:1–50:6.

Event discovery from single pictures is a challenging problem that has raised significant interest in the last decade. During this time, a number of interesting solutions have been proposed to tackle event discovery in still images. However, a large-scale benchmark image dataset for the evaluation and comparison of event discovery algorithms from single images is still lacking. To this aim, in this paper we provide a large-scale, properly annotated and balanced dataset of 490,000 images, covering every aspect of 14 different types of social events, selected among the most shared ones on social networks. Such a large-scale collection of event-related images is intended to become a powerful support tool for the research community in multimedia analysis by providing a common benchmark for training, testing, validation and comparison of existing and novel algorithms. In this paper, we provide a detailed description of how the dataset is collected and organized and how it can benefit researchers in the multimedia analysis domain. Moreover, a deep learning based approach is introduced for event discovery from single images as one possible application of this dataset, in the belief that deep learning can prove to be a breakthrough in this research area as well. By providing this dataset, we hope to engage the research community in the multimedia and signal processing domains to advance this application.

2017-07-18
Benjamin Andow, Akhil Acharya, Dengfeng Li, University of Illinois at Urbana-Champaign, William Enck, Kapil Singh, Tao Xie, University of Illinois at Urbana-Champaign.  2017.  UiRef: Analysis of Sensitive User Inputs in Android Applications. 10th ACM Conference on Security and Privacy in Wireless and Mobile Networks (WiSec 2017).

Mobile applications frequently request sensitive data. While prior work has focused on analyzing sensitive-data uses originating from well-defined API calls in the system, the security and privacy implications of inputs requested via application user interfaces have been widely unexplored. In this paper, our goal is to understand the broad implications of such requests in terms of the type of sensitive data being requested by applications.

To this end, we propose UiRef (User Input REsolution Framework), an automated approach for resolving the semantics of user inputs requested by mobile applications. UiRef’s design includes a number of novel techniques for extracting and resolving user interface labels and addressing ambiguity in semantics, resulting in significant improvements over prior work. We apply UiRef to 50,162 Android applications from Google Play and use outlier analysis to triage applications with questionable input requests. We identify concerning developer practices, including insecure exposure of account passwords and non-consensual input disclosures to third parties. These findings demonstrate the importance of user-input semantics when protecting end users.

2017-06-27
Mu, Xin, Zhu, Feida, Lim, Ee-Peng, Xiao, Jing, Wang, Jianzong, Zhou, Zhi-Hua.  2016.  User Identity Linkage by Latent User Space Modelling. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :1775–1784.

User identity linkage across social platforms is an important problem of great research challenge and practical value. In real applications, the task often assumes an extra degree of difficulty by requiring linkage across multiple platforms. While pair-wise user linkage between two platforms, which has been the focus of most existing solutions, provides reasonably convincing linkage, the result depends by nature on the order of platform pairs in execution, with no theoretical guarantee on its stability. In this paper, we explore a new concept of "Latent User Space" to more naturally model the relationship between the underlying real users and their observed projections onto the varied social platforms, such that the more similar the real users, the closer their profiles in the latent user space. We propose two effective algorithms, a batch model (ULink) and an online model (ULink-On), based on latent user space modelling. Two simple yet effective methods are used for optimizing the objective function: the first based on the constrained concave-convex procedure (CCCP) and the second on accelerated proximal gradient. To the best of our knowledge, this is the first work to propose a unified framework addressing two important aspects of the multi-platform user identity linkage problem: (I) platform multiplicity and (II) online data generation. We present experimental evaluations on real-world data sets for not only traditional pairwise-platform linkage but also multi-platform linkage. The results demonstrate the superiority of our proposed method over the state-of-the-art ones.
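The abstract mentions accelerated proximal gradient as one of the optimization routines. The sketch below shows that method on a generic L1-regularized least-squares objective (soft-thresholding as the proximal step, with Nesterov-style momentum); it illustrates the optimizer in isolation, not the paper's latent-user-space objective, and the data are synthetic.

```python
# Sketch: accelerated proximal gradient (FISTA-style) for a generic
# L1-regularized least-squares problem  min_w 0.5*||Xw - y||^2 + lam*||w||_1.
# This shows the optimizer itself, not the paper's linkage objective.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def accelerated_prox_grad(X, y, lam=0.1, iters=300):
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = z = np.zeros(d)
    t = 1.0
    for _ in range(iters):
        grad = X.T @ (X @ z - y)
        w_next = soft_threshold(z - grad / L, lam / L)
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = w_next + ((t - 1) / t_next) * (w_next - w)   # momentum step
        w, t = w_next, t_next
    return w

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20); w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.05 * rng.standard_normal(100)
print(np.round(accelerated_prox_grad(X, y), 2))
```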

Isaakidis, Marios, Halpin, Harry, Danezis, George.  2016.  UnlimitID: Privacy-Preserving Federated Identity Management Using Algebraic MACs. Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society. :139–142.

UnlimitID is a method for enhancing the privacy of commodity OAuth and applications such as OpenID Connect, using anonymous attribute-based credentials based on algebraic Message Authentication Codes (aMACs). OAuth is one of the most widely used protocols on the Web, but it exposes each of the requests of a user for data by each relying party (RP) to the identity provider (IdP). Our approach allows for the creation of multiple persistent and unlinkable pseudo-identities and requires no change in the deployed code of relying parties, only in identity providers and the client.

2017-06-05
Fang, Yi, Godavarthy, Archana, Lu, Haibing.  2016.  A Utility Maximization Framework for Privacy Preservation of User Generated Content. Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval. :281–290.

The prodigious amount of user-generated content continues to grow at an enormous rate. While it greatly facilitates the flow of information and ideas among people and communities, it may pose a great threat to individual privacy. In this paper, we demonstrate that the private traits of individuals can be inferred from user-generated content by using text classification techniques. Specifically, we study three private attributes of Twitter users: religion, political leaning, and marital status. The ground-truth labels of the private traits can be readily collected from the Twitter bio field. Based on the tweets posted by the users and their corresponding bios, we show that text classification identifies these personal attributes with high accuracy, which poses a great privacy risk for user-generated content. We further propose a constrained utility maximization framework for preserving user privacy. The goal is to maximize the utility of the data when modifying the user-generated content, while degrading the prediction performance of the adversary. The KL divergence between the prior knowledge about the private attribute and the posterior probability after seeing the user-generated data is minimized. Based on this framework, we investigate several specific data sanitization operations for privacy preservation: adding, deleting, or replacing words in the tweets. We derive the exact transformation of the data under each operation. The experiments demonstrate the effectiveness of the proposed framework.
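To make the two moving parts concrete, the sketch below (i) trains a simple TF-IDF text classifier to infer a private attribute from short posts and (ii) measures, via KL divergence, how far the classifier's posterior moves from an assumed uniform prior. It is a hedged illustration with synthetic text, not the paper's sanitization framework or its exact objective.

```python
# Sketch: (i) infer a private attribute from short posts with a TF-IDF
# classifier, (ii) compare the model's posterior with a prior via KL
# divergence. Toy synthetic data; not the paper's sanitization framework.
import numpy as np
from scipy.stats import entropy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "blessed sunday service with family", "praying for everyone tonight",
    "great game last night with friends", "new gadget review is up",
    "church picnic this weekend was fun", "watching the match at the pub",
]
labels = [1, 1, 0, 0, 1, 0]   # 1 = attribute present, 0 = absent (toy labels)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

test_post = ["sunday service then the game"]
posterior = model.predict_proba(test_post)[0]
prior = np.array([0.5, 0.5])   # assumed uninformative prior over the attribute

# KL(posterior || prior): how much the post reveals beyond the prior.
print("posterior:", np.round(posterior, 3))
print("KL divergence:", entropy(posterior, prior))
```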

Hubaux, Jean-Pierre.  2016.  The Ultimate Frontier for Privacy and Security: Medicine. Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks. :1–1.

Personalized medicine brings the promise of better diagnoses, better treatments, a higher quality of life and increased longevity. To achieve these noble goals, it exploits a number of revolutionary technologies, including genome sequencing and DNA editing, as well as wearable devices and implantable or even edible biosensors. In parallel, the popularity of "quantified self" gadgets shows the willingness of citizens to be more proactive with respect to their own health. Yet, this evolution opens the door to all kinds of abuses, notably in terms of discrimination, blackmailing, stalking, and subversion of devices. After giving a general description of this situation, in this talk we will expound on some of the main concerns, including the temptation to permanently and remotely monitor the physical (and metabolic) activity of individuals. We will describe the potential and the limitations of techniques such as cryptography (including secure multi-party computation), trusted hardware and differential privacy. We will also discuss the notion of consent in the face of the intrinsic correlations of human data. We will argue in favor of a more systematic, principled and cross-disciplinary research effort in this field and will discuss the motives of the various stakeholders.

2017-05-30
Lu, Kangjie, Song, Chengyu, Kim, Taesoo, Lee, Wenke.  2016.  UniSan: Proactive Kernel Memory Initialization to Eliminate Data Leakages. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :920–932.

The operating system kernel is the de facto trusted computing base for most computer systems. To secure the OS kernel, many security mechanisms, e.g., kASLR and StackGuard, have been increasingly deployed to defend against attacks (e.g., code reuse attacks). However, the effectiveness of these protections has been proven inadequate: there are many information leak vulnerabilities in the kernel that leak the randomized pointer or canary, thus bypassing kASLR and StackGuard. Other sensitive data in the kernel, such as cryptographic keys and file caches, can also be leaked. According to our study, most kernel information leaks are caused by uninitialized data reads. Unfortunately, existing techniques like memory safety enforcement and dynamic access tracking tools are not adequate or efficient enough to mitigate this threat. In this paper, we propose UniSan, a novel, compiler-based approach to eliminate all information leaks caused by uninitialized reads in the OS kernel. UniSan achieves this goal using byte-level, flow-sensitive, context-sensitive, and field-sensitive initialization analysis and reachability analysis to check whether an allocation has been fully initialized when it leaves kernel space; if not, it automatically instruments the kernel to initialize this allocation. UniSan's analyses are conservative to avoid false negatives and are robust in preserving the semantics of the OS kernel. We have implemented UniSan as passes in LLVM and applied it to the latest Linux kernel (x86_64) and Android kernel (AArch64). Our evaluation showed that UniSan can successfully prevent 43 known and many new uninitialized data leak vulnerabilities. Further, 19 new vulnerabilities in the latest kernels have been confirmed by Linux and Google. Our extensive performance evaluation with LMBench, ApacheBench, Android benchmarks, and the SPEC benchmarks also showed that UniSan imposes negligible performance overhead.

2017-05-18
Shamsi, Zain, Loguinov, Dmitri.  2016.  Unsupervised Clustering Under Temporal Feature Volatility in Network Stack Fingerprinting. Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science. :127–138.

Maintaining and updating signature databases is a tedious task that normally requires a large amount of user effort. The problem becomes harder when features can be distorted by observation noise, which we call volatility. To address this issue, we propose algorithms and models to automatically generate signatures in the presence of noise, with a focus on stack fingerprinting, which is a research area that aims to discover the operating system (OS) of remote hosts using TCP/IP packets. Armed with this framework, we construct a database with 420 network stacks, label the signatures, develop a robust classifier for this database, and fingerprint 66M visible webservers on the Internet.
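As a loose illustration only: the sketch below clusters noisy observations of a few fictitious "network stack" feature vectors with DBSCAN, so that each discovered cluster can serve as an automatically generated signature. The paper's actual models and classifier are different; this simply shows unsupervised signature grouping under feature noise, with invented feature values.

```python
# Loose illustration: group noisy observations of fictitious network-stack
# feature vectors into signatures with DBSCAN. The paper's models differ;
# this only shows unsupervised grouping under observation noise (volatility).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)

# Three fictitious "true" stacks described by numeric features
# (e.g., initial TTL, TCP window size, an options-fingerprint bucket).
true_stacks = np.array([[64, 5840, 3], [128, 8192, 7], [255, 4128, 1]], float)

# Each stack is observed many times with feature volatility (noise).
observations = np.vstack([
    stack + rng.normal(0, [2, 200, 0.3], size=(200, 3)) for stack in true_stacks
])

# Standardize features, then cluster; label -1 marks unassigned noise points.
standardized = (observations - observations.mean(0)) / observations.std(0)
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(standardized)

print("discovered signatures:", sorted(set(labels) - {-1}))
print("noise points:", int((labels == -1).sum()))
```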

2017-05-17
Das, Aveek K., Pathak, Parth H., Chuah, Chen-Nee, Mohapatra, Prasant.  2016.  Uncovering Privacy Leakage in BLE Network Traffic of Wearable Fitness Trackers. Proceedings of the 17th International Workshop on Mobile Computing Systems and Applications. :99–104.

There has been a tremendous increase in the popularity and adoption of wearable fitness trackers. These fitness trackers predominantly use Bluetooth Low Energy (BLE) for communicating and syncing data with the user's smartphone. This paper presents a measurement-driven study of possible privacy leakage from BLE communication between the fitness tracker and the smartphone. Using real BLE traffic traces collected in the wild and in controlled experiments, we show that the majority of fitness trackers use an unchanged BLE address while advertising, making it feasible to track them. The BLE traffic of the fitness trackers is found to be correlated with the intensity of the user's activity, making it possible for an eavesdropper to determine the user's current activity (walking, sitting, idle or running) through BLE traffic analysis. Furthermore, we also demonstrate that the BLE traffic can represent the user's gait, which is known to be distinct from user to user. This makes it possible to identify a person (from a small group of users) based on the BLE traffic of her fitness tracker. As BLE-based wearable fitness trackers become widely adopted, our aim is to identify important privacy implications of their usage and discuss prevention strategies.
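To illustrate the kind of traffic-analysis inference the authors describe, the sketch below trains a classifier that maps simple BLE traffic-rate features (packets per second and mean packet size over a window) to an activity label. The features, data and model choice are invented for illustration; this is not the paper's measurement pipeline.

```python
# Sketch: infer user activity from coarse BLE traffic features (packets/sec,
# mean packet size per window). Synthetic data and a generic classifier;
# not the paper's actual measurement or analysis pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
# Hypothetical (rate, size) profiles per activity, used to generate samples.
activities = {"idle": (2, 20), "sitting": (4, 22), "walking": (9, 28), "running": (15, 31)}

X, y = [], []
for label, (rate, size) in activities.items():
    for _ in range(150):
        X.append([rng.normal(rate, 1.0), rng.normal(size, 2.0)])
        y.append(label)
X, y = np.array(X), np.array(y)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```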

Azriel, Leonid, Ginosar, Ran, Gueron, Shay, Mendelson, Avi.  2016.  Using Scan Side Channel for Detecting IP Theft. Proceedings of the Hardware and Architectural Support for Security and Privacy 2016. :1:1–1:8.

We present a process for detection of IP theft in VLSI devices that exploits the internal test scan chains. The IP owner learns implementation details in the suspect device to find evidence of the theft, while the top level function is public. The scan chains supply direct access to the internal registers in the device, thus making it possible to learn the logic functions of the internal combinational logic chunks. Our work introduces an innovative way of applying Boolean function analysis techniques for learning digital circuits with the goal of IP theft detection. By using Boolean function learning methods, the learner creates a partial dependency graph of the internal flip-flops. The graph is further partitioned using the SNN graph clustering method, and individual blocks of combinational logic are isolated. These blocks can be matched with known building blocks that compose the original function. This enables reconstruction of the function implementation to the level of pipeline structure. The IP owner can compare the resulting structure with his own implementation to confirm or refute that an IP violation has occurred. We demonstrate the power of the presented approach with a test case of an open source Bitcoin SHA-256 accelerator, containing more than 80,000 registers. With the presented method we discover the microarchitecture of the module, locate all the main components of the SHA-256 algorithm, and learn the module's flow control.
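The clustering step mentioned above (SNN, shared nearest neighbour) can be sketched generically: build a k-nearest-neighbour list for each node, connect two nodes when their neighbour lists overlap enough, and take connected components as clusters. The code below does this on random 2-D points; it is an illustration of SNN-style clustering, not the authors' netlist-partitioning tool.

```python
# Sketch of shared-nearest-neighbour (SNN) clustering: link two points when
# their k-NN lists overlap by at least `min_shared`, then take connected
# components. Random 2-D points; not the paper's netlist partitioning tool.
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(5)
points = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in ((0, 0), (4, 4), (8, 0))])

k, min_shared = 10, 5
_, idx = NearestNeighbors(n_neighbors=k + 1).fit(points).kneighbors(points)
neighbors = [set(row[1:]) for row in idx]          # drop self (first column)

g = nx.Graph()
g.add_nodes_from(range(len(points)))
for i in range(len(points)):
    for j in neighbors[i]:
        if len(neighbors[i] & neighbors[int(j)]) >= min_shared:
            g.add_edge(i, int(j))

clusters = [c for c in nx.connected_components(g) if len(c) > 1]
print("number of clusters:", len(clusters))
```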

2017-05-16
Zhang, Lin, Zhang, Zhenfeng, Hu, Xuexian.  2016.  UC-secure Two-Server Password-Based Authentication Protocol and Its Applications. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. :153–164.

A two-server password-based authentication (2PA) protocol is a special kind of authentication primitive that provides additional protection for the user's password. Through a 2PA protocol, a user can distribute his low-entropy password between two authentication servers in the initialization phase and authenticate himself merely via a matching password in the login phase. No single server can learn any information about the user's password, nor impersonate the legitimate user to authenticate to the honest server. In this paper, we first formulate and realize the security definition of two-server password-based authentication in the well-known universal composability (UC) framework, which provides desirable properties such as composable security. We show that our construction is suitable for the asymmetric communication model in which one server acts as the front-end server interacting directly with the user while the other stays backstage. Then, we show that our protocol can easily be extended to more complicated password-based cryptographic protocols, such as two-server password-authenticated key exchange (2PAKE) and two-server password-authenticated secret sharing (2PASS), which enjoy stronger security guarantees and better efficiency than existing schemes.

2017-04-21
Christopher Hannon, Illinois Institute of Technology, Dong Jin, Illinois Institute of Technology, Chen Chen, Argonne National Laboratory, Jianhui Wang, Argonne National Laboratory.  2017.  Ultimate Forwarding Resilience in OpenFlow Networks. ACM International Workshop on Security in Software Defined Networks & Network Function Virtualization (SDN-NFV Security 2017).

Software defined networking is a rapidly expanding networking paradigm that aims to separate the control logic from the forwarding devices. Through centralized control, network operators are able to deploy and manage more efficient forwarding strategies. Traditionally, when the network undergoes a change through maintenance, failure, or cyber attack, the centralized controller processes these events and deploys new forwarding rules reactively. This work provides a strategy that does not require a controller in order to maintain connectivity, using only features within the existing OpenFlow protocol, version 1.3 or greater. In this paper we illustrate why forwarding resiliency is desired in OpenFlow networks and provide an algorithm that computes the flow entries required to achieve maximal forwarding resiliency in the presence of both multiple link and controller failures on any arbitrary network.
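The OpenFlow 1.3 primitive that enables controller-free failover is the fast-failover group type, whose buckets are tried in order according to port liveness. The sketch below installs such a group from a Ryu controller app (Ryu and the specific port numbers are assumptions for illustration; the paper does not prescribe a controller framework). It shows the mechanism only, not the paper's resilience-maximizing algorithm.

```python
# Sketch (assumes the Ryu framework): install an OpenFlow 1.3 fast-failover
# group so the switch falls back to a backup port without controller help.
# Port numbers are illustrative; this is not the paper's algorithm.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class FailoverSketch(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser

        # Bucket 1 (primary, port 1) is used while port 1 is live;
        # bucket 2 (backup, port 2) takes over when port 1 goes down.
        buckets = [
            parser.OFPBucket(watch_port=1, watch_group=ofp.OFPG_ANY,
                             actions=[parser.OFPActionOutput(1)]),
            parser.OFPBucket(watch_port=2, watch_group=ofp.OFPG_ANY,
                             actions=[parser.OFPActionOutput(2)]),
        ]
        dp.send_msg(parser.OFPGroupMod(dp, ofp.OFPGC_ADD, ofp.OFPGT_FF,
                                       group_id=1, buckets=buckets))

        # Forward matching traffic through the failover group.
        match = parser.OFPMatch(in_port=3)
        inst = [parser.OFPInstructionActions(
            ofp.OFPIT_APPLY_ACTIONS, [parser.OFPActionGroup(group_id=1)])]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                      match=match, instructions=inst))
```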

2017-04-01
Weining Yang, Aiping Xiong, Jing Chen, Robert W. Proctor, Ninghui Li.  2017.  Use of Phishing Training to Improve Security Warning Compliance: Evidence from a Field Experiment.

The current approach to protect users from phishing attacks is to display a warning when the webpage is considered suspicious. We hypothesize that users are capable of making correct informed decisions when the warning also conveys the reasons why it is displayed. We chose to use traffic rankings of domains, which can be easily described to users, as a warning trigger and evaluated the effect of the phishing warning message and phishing training. The evaluation was conducted in a field experiment. We found that knowledge gained from the training enhances the effectiveness of phishing warnings, as the number of participants being phished was reduced. However, the knowledge by itself was not sufficient to provide phishing protection. We suggest that integrating training in the warning interface, involving traffic ranking in phishing detection, and explaining why warnings are generated will improve current phishing defense.
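A minimal sketch of the kind of warning trigger described above: look up a visited domain in a traffic-ranking list and show an explanatory warning when the domain is unranked or ranked below a cutoff. The cutoff, ranking source, and message wording are assumptions for illustration, not the study's actual implementation.

```python
# Minimal sketch: trigger an explanatory phishing warning when a domain is
# missing from (or ranked too low in) a traffic-ranking list. The cutoff,
# ranking source and message wording are illustrative assumptions.
RANK_CUTOFF = 100_000

def check_domain(domain: str, rankings: dict[str, int]) -> str | None:
    """Return a warning message for low-ranked or unranked domains, else None."""
    rank = rankings.get(domain)
    if rank is None or rank > RANK_CUTOFF:
        shown = rank if rank is not None else "unranked"
        return (f"Warning: {domain} is not a commonly visited site "
                f"(traffic rank: {shown}); it may be impersonating a well-known site.")
    return None

# In practice the rankings would come from a published top-sites list;
# this small dict is only a stand-in for illustration.
rankings = {"example.com": 120, "paypal.com": 350}
for site in ("paypal.com", "paypa1-login.example.net"):
    print(site, "->", check_domain(site, rankings) or "no warning")
```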

2017-03-29
Shi, Yang, Wei, Wujing, He, Zongjian, Fan, Hongfei.  2016.  An Ultra-lightweight White-box Encryption Scheme for Securing Resource-constrained IoT Devices. Proceedings of the 32nd Annual Conference on Computer Security Applications. :16–29.

Embedded devices with constrained computational resources, such as wireless sensor network nodes, electronic tag readers, roadside units in vehicular networks, and smart watches and wristbands, are widely used in the Internet of Things. Many such devices are deployed in untrustworthy environments, and others may be easy to lose, leading to possible capture by adversaries. Accordingly, in the context of security research, these devices run in the white-box attack context, where the adversary may have total visibility of the implementation of the built-in cryptosystem and full control over its execution. It is undoubtedly a significant challenge to deal with attacks from a powerful adversary in white-box attack contexts. Existing encryption algorithms for white-box attack contexts typically require large amounts of memory, varying from one to dozens of megabytes, and thus are not suitable for resource-constrained devices. As a countermeasure in such circumstances, we propose an ultra-lightweight encryption scheme for protecting the confidentiality of data in white-box attack contexts. The encryption is executed with secret components specialized for resource-constrained devices against white-box attacks, and the encryption algorithm requires a relatively small amount of static data, ranging from 48 to 92 KB. The security and efficiency of the proposed scheme have been theoretically analyzed with positive results, and experimental evaluations have indicated that the scheme satisfies the resource constraints in terms of limited memory use and low computational cost.

2017-03-20
Kumar, Sumeet, Carley, Kathleen M..  2016.  Understanding DDoS cyber-attacks using social media analytics. :231–236.

Cyber-attacks are cheap, easy to conduct and often pose little risk in terms of attribution, but their impact can be lasting. Attribution is poor because tracing cyber-attacks is primitive in the current network architecture. Moreover, even when attribution is known, the absence of enforcement provisions in international law makes cyber-attacks tough to litigate, and hence attribution is hardly a deterrent. Rather than attributing attacks, we can view cyber-attacks as societal events associated with social, political, economic and cultural (SPEC) motivations. Because it is possible to observe SPEC motives on the internet, social media data could be valuable in understanding cyber-attacks. In this research, we use sentiment in Twitter posts to observe country-to-country perceptions, and Arbor Networks data to build ground truth of country-to-country DDoS cyber-attacks. Using this dataset, this research makes three important contributions: (a) we evaluate the impact of heightened sentiment towards a country on the trend of cyber-attacks received by that country, and find that, for some countries, the probability of attack increases by up to 27% while they experience negative sentiment from other nations; (b) using the cyber-attack trend and the sentiment trend, we build a decision tree model to find attacks that could be related to extreme sentiments; and (c) to verify our model, we describe three examples in which cyber-attacks follow increased tension between nations, as perceived in social media.
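A stripped-down illustration of the decision-tree step: given per-week sentiment features for a country pair, fit a small tree that separates "attack spike" weeks from the rest. The features and labels are synthetic; this is not the Arbor/Twitter dataset or the authors' model.

```python
# Stripped-down sketch of the decision-tree step: relate sentiment features
# to attack spikes. Synthetic features and labels, not the Arbor/Twitter data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(6)
n = 600
neg_sentiment = rng.random(n)              # share of negative posts this week
sentiment_change = rng.normal(0, 0.2, n)   # week-over-week sentiment change
X = np.column_stack([neg_sentiment, sentiment_change])

# Toy ground truth: spikes are more likely under strongly negative, rising sentiment.
y = ((neg_sentiment > 0.7) & (sentiment_change > 0.05)) | (rng.random(n) < 0.05)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(export_text(tree, feature_names=["neg_sentiment", "sentiment_change"]))
print("held-out accuracy:", tree.score(X_te, y_te))
```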

Hinh, Robert, Shin, Sangmi, Taylor, Julia.  2016.  Using frame semantics in authorship attribution. :004093–004098.

Authorship attribution is a stylometric technique that associates texts with authors based on writing style. Researchers have looked for ways to analyze the context of these texts, in some cases with limited results. Most approaches view information at the syntactic and physical levels and tend to ignore information from the semantic level. In this paper, we present a technique that incorporates semantic frames as a method for authorship attribution. We hypothesize that this provides a deeper view into the semantic level of texts, which is an influencing factor in a writer's style. We use a variety of online resources in a pipeline fashion to extract information about frames within the text. The results show that our “bag of frames” approach can be used successfully for stylometry.
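The “bag of frames” idea can be sketched like a bag-of-words model whose vocabulary is semantic frames rather than tokens. In the sketch below, extract_frames is a hypothetical stand-in (a tiny hand-written word-to-frame lookup) for the frame-annotation pipeline the authors describe; the vectorization and classification steps are generic scikit-learn code on made-up documents.

```python
# Sketch of a "bag of frames" authorship model: count the semantic frames
# evoked by each document and train a classifier on those counts.
# `extract_frames` is a hypothetical stand-in for a real frame annotator.
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

WORD_TO_FRAME = {  # tiny illustrative lookup, not a real FrameNet resource
    "bought": "Commerce_buy", "sold": "Commerce_sell", "said": "Statement",
    "told": "Statement", "walked": "Self_motion", "ran": "Self_motion",
}

def extract_frames(text: str) -> Counter:
    """Hypothetical frame annotator: map words to frames and count them."""
    return Counter(WORD_TO_FRAME[w] for w in text.lower().split() if w in WORD_TO_FRAME)

docs = ["She bought a book and said thanks", "He ran home and told a story",
        "They sold the house and walked away", "I said nothing and walked out"]
authors = ["A", "B", "A", "B"]

model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit([extract_frames(d) for d in docs], authors)
print(model.predict([extract_frames("She sold a car and said goodbye")]))
```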

2017-03-07
Lu, Yen-Cheng, Wu, Chih-Wei, Lu, Chang-Tien, Lerch, Alexander.  2016.  An Unsupervised Approach to Anomaly Detection in Music Datasets. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. :749–752.

This paper presents an unsupervised method for systematically identifying anomalies in music datasets. The model integrates categorical regression and robust estimation techniques to infer anomalous scores in music clips. When applied to a music genre recognition dataset, the new method is able to detect corrupted, distorted, or mislabeled audio samples based on commonly used features in music information retrieval. The evaluation results show that the algorithm outperforms other anomaly detection methods and is capable of finding problematic samples identified by human experts. The proposed method introduces a preliminary framework for anomaly detection in music data that can serve as a useful tool to improve data integrity in the future.
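As a rough, swapped-in illustration of robust-estimation-based anomaly scoring on audio-style features (not the authors' categorical-regression model), the sketch below fits a robust covariance estimate to synthetic feature vectors and ranks clips by Mahalanobis distance, so corrupted or mislabeled clips surface at the top.

```python
# Rough illustration (not the paper's model): robust covariance estimation on
# audio-style feature vectors, ranking clips by Mahalanobis distance so that
# corrupted or mislabeled samples surface as anomalies. Synthetic features.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(7)
# 300 clips described by 8 features (e.g., MFCC-like summaries);
# the first 5 are "corrupted" by shifting their feature values.
features = rng.normal(0, 1, size=(300, 8))
features[:5] += 8

robust_cov = MinCovDet(random_state=0).fit(features)
scores = robust_cov.mahalanobis(features)      # higher = more anomalous

top = np.argsort(scores)[::-1][:5]
print("most anomalous clip indices:", top)
```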