Biblio
The explosion in Internet-connected household devices, such as light-bulbs, smoke-alarms, power-switches, and webcams, is creating new vectors for attacking "smart-homes" at an unprecedented scale. Common perception is that smart-home IoT devices are protected from Internet attacks by the perimeter security offered by home routers. In this paper we demonstrate how an attacker can infiltrate the home network via a doctored smart-phone app. Unbeknownst to the user, this app scouts for vulnerable IoT devices within the home, reports them to an external entity, and modifies the firewall to allow the external entity to directly attack the IoT device. The ability to infiltrate smart-homes via doctored smart-phone apps demonstrates that home routers are poor protection against Internet attacks and highlights the need for increased security for IoT devices.
Authenticating a user based on her unique behavioral biometric traits has been extensively researched over the past few years. The most researched behavioral biometric techniques are based on keystroke and mouse dynamics. These schemes, however, have been shown to be vulnerable to human-based and robotic attacks that attempt to mimic the user's behavioral pattern to impersonate the user. In this paper, we aim to verify the user's identity through the use of active, cognition-based user interaction in the authentication process. Such interaction promises two key advantages. First, it may enhance the security of the authentication process, as multiple rounds of active interaction serve as a mechanism to protect against several types of attacks, including zero-effort attacks, expert trained attackers, and automated attacks. Second, it may enhance the usability of the authentication process by actively engaging the user. We explore the cognitive authentication paradigm through very simple interactive challenges, called Dynamic Cognitive Games, which involve objects floating around within images; the user's task is to match the objects with their respective target(s) and drag/drop them to the target location(s). Specifically, we introduce, build, and study Gametrics ("Game-based biometrics"), an authentication mechanism based on the unique way the user solves such simple challenges, captured by multiple features related to her cognitive abilities and mouse dynamics. Based on a comprehensive dataset collected in both online and lab settings, we show that Gametrics can identify users with high accuracy (false negative rates, FNR, as low as 0.02) while rejecting zero-effort attackers (false positive rates, FPR, as low as 0.02). Moreover, Gametrics shows promising results in defending against expert attackers who try to learn and later mimic the user's pattern of solving the challenges (FPR for expert human attackers as low as 0.03). Furthermore, we argue that the proposed biometric is hard to replay or spoof by automated means, such as robots or malware.
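The abstract does not give implementation details, but the following minimal sketch illustrates the general flavor of game-based mouse-dynamics biometrics: extract a few timing and trajectory features from a drag-and-drop round and score a probe round against a per-user model. The feature set, the synthetic traces, and the use of scikit-learn's IsolationForest are illustrative assumptions, not the Gametrics implementation.

```python
# Minimal sketch (not the authors' method): per-round mouse-dynamics features
# plus a per-user one-class model for verification.
import numpy as np
from sklearn.ensemble import IsolationForest

def round_features(trace):
    """trace: array of (t, x, y) samples recorded while the user drags an object."""
    t, x, y = trace[:, 0], trace[:, 1], trace[:, 2]
    d = np.hypot(np.diff(x), np.diff(y))   # step lengths along the cursor path
    dt = np.diff(t) + 1e-9
    speed = d / dt
    return np.array([
        t[-1] - t[0],   # time to complete the drag
        d.sum(),        # total path length
        speed.mean(),   # mean cursor speed
        speed.std(),    # speed variability
    ])

# enrolment: several (here synthetic) rounds from the legitimate user
rng = np.random.default_rng(0)
enroll = np.array([round_features(np.cumsum(rng.random((50, 3)), axis=0))
                   for _ in range(20)])
model = IsolationForest(random_state=0).fit(enroll)

# verification: accept if the new round looks like the enrolled behaviour
probe = round_features(np.cumsum(rng.random((50, 3)), axis=0))
print("accept" if model.predict([probe])[0] == 1 else "reject")
```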
The function of query auto-completion in modern search engines is to help users formulate queries quickly and precisely. Conventional context-aware methods primarily rank candidate queries according to term- and query-level relationships to the context. However, most sessions are extremely short, and capturing search intents with such relationships becomes difficult when the context contains only a few queries. In this paper, we investigate the feasibility of discovering search intents within short contexts for query auto-completion. The class distribution of the search session (i.e., issued queries and click behavior) is derived as the search intent. Several distribution-based features are proposed to estimate the proximity between candidates and search intents. Finally, we apply learning-to-rank to predict the user's intended query according to these features. Moreover, we design an ensemble model to combine the benefits of our proposed features and conventional term-based approaches. Extensive experiments have been conducted on the publicly available AOL search engine log. The experimental results demonstrate that our approach significantly outperforms six competitive baselines. Performance at different numbers of keystrokes is also evaluated. Furthermore, an in-depth analysis is made to justify the usability of search intent classification for query auto-completion.
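As a rough illustration of distribution-based proximity between candidates and session intent (the exact features and class taxonomy in the paper are not reproduced here), the sketch below ranks hypothetical candidates by the cosine similarity between their topic-class distributions and the session's inferred intent distribution.

```python
# Illustrative sketch only: rank auto-completion candidates by how close their
# topic-class distribution is to the distribution inferred from the session.
import numpy as np

def cosine(p, q):
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12))

# hypothetical class distributions over 4 topic classes
session_intent = np.array([0.6, 0.3, 0.05, 0.05])   # from issued queries + clicks
candidates = {
    "facebook login":  np.array([0.7, 0.2, 0.05, 0.05]),
    "faces of death":  np.array([0.1, 0.1, 0.7, 0.1]),
    "fax cover sheet": np.array([0.2, 0.6, 0.1, 0.1]),
}

ranked = sorted(candidates, key=lambda c: cosine(candidates[c], session_intent),
                reverse=True)
print(ranked)   # candidates closest to the inferred search intent come first
```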
Important information regarding the learning experience and relative preparedness of Computer Science students can be obtained by analyzing their coding activity at a fine-grained level, using an online IDE that records student code editing, compiling, and testing activities down to the individual keystroke. We report results from analyses of student coding patterns using such an online IDE. In particular, we gather data from a group of students performing an assigned programming lab, using the online IDE to gather statistics. We extract high-level statistics from the student data and apply supervised learning techniques to identify those that are the most salient predictors of student success, as measured by later performance in the class. We use these results to make predictions of course performance for another student group and report on the reliability of those predictions.
In this paper, we consider side-channel mechanisms, specifically using smart device ambient light sensors, to capture information about user computing activity. We distinguish keyboard keystrokes using only the ambient light sensor readings from a smart watch worn on the user's non-dominant hand. Additionally, we investigate the feasibility of capturing screen emanations for determining user browser usage patterns. The experimental results expose privacy and security risks, as well as the potential for new mobile user interfaces and applications.
In traditional programming courses, students have usually been graded at least partly using pen-and-paper exams. One problem with such exams is that they only partially connect to the practice conducted within the courses. Testing students in a more practical environment has been constrained by the limited resources needed, for example, for authentication. In this work, we study whether students in a programming course can be identified in an exam setting based solely on their typing patterns. We replicate an earlier study indicating that keystroke analysis can be used to identify programmers. Then, we examine how a controlled machine-examination setting affects identification accuracy, i.e., whether students can be identified reliably in a machine exam based on typing profiles built from the students' programming assignments during the course. Finally, we investigate identification accuracy in an uncontrolled machine exam, where students can complete the exam at any time using any computer they want. Our results indicate that even though identification accuracy deteriorates when identifying students in an exam, the accuracy is high enough to reliably identify students if the identification is not required to be exact, but the top k closest matches are regarded as correct.
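A minimal sketch of keystroke-based identification in this spirit, assuming logs of (key, timestamp) events: build per-student digraph-latency profiles from course assignments and return the top-k closest profiles for an exam session. The distance measure and data format are assumptions for illustration, not the study's exact method.

```python
# Sketch: per-student mean digraph latencies, nearest-profile identification.
import numpy as np
from collections import defaultdict

def digraph_profile(events):
    """events: list of (key, time_ms) in typing order."""
    lat = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        lat[(k1, k2)].append(t2 - t1)
    return {dg: float(np.mean(v)) for dg, v in lat.items()}

def distance(p, q):
    shared = set(p) & set(q)
    if not shared:
        return float("inf")
    return float(np.mean([abs(p[d] - q[d]) for d in shared]))

def identify(exam_profile, course_profiles, k=3):
    """Return the k closest students; 'correct' if the true author is among them."""
    return sorted(course_profiles,
                  key=lambda s: distance(exam_profile, course_profiles[s]))[:k]

# toy usage with hypothetical typing logs
alice = digraph_profile([("h", 0), ("e", 120), ("l", 250), ("l", 370), ("o", 500)])
bob   = digraph_profile([("h", 0), ("e", 200), ("l", 420), ("l", 640), ("o", 860)])
exam  = digraph_profile([("h", 0), ("e", 115), ("l", 255), ("l", 372), ("o", 505)])
print(identify(exam, {"alice": alice, "bob": bob}, k=1))   # ['alice']
```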
Twitter is one of the most popular microblogging social systems, providing a set of distinctive posting services that operate in real time. The flexibility of these services has attracted unethical individuals, so-called "spammers", who aim to spread malicious, phishing, and misleading information. Unfortunately, the existence of spam results in non-negligible problems for search quality and user privacy. In the battle against spam, various detection methods have been designed that automate the detection process using the "features" concept combined with machine learning methods. However, the existing features are not effective enough to adapt to spammers' tactics because they are easy to manipulate. Also, graph-based features are not practical for Twitter-based applications, despite the high performance obtainable when applying them. In this paper, going beyond simple statistical features such as the number of hashtags and the number of URLs, we examine the time property by advancing the design of several features used in the literature and proposing new time-based features. The new features are divided into robust advanced statistical features that explicitly incorporate the time attribute, and behavioral features that identify posting behavior patterns. The experimental results show that the new form of features correctly classifies the majority of spammers with an accuracy higher than 93% when using the Random Forest learning algorithm, applied to a collected and annotated dataset. The results outperform the accuracy of state-of-the-art features by about 6%, demonstrating the significance of leveraging time in detecting spam accounts.
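The sketch below conveys the idea of time-aware account features fed to a Random Forest; the two features shown (inter-tweet interval statistics and the share of tweets in the account's most active hour) and the synthetic data are assumptions, not the paper's feature set.

```python
# Hedged sketch: time-based account features + Random Forest spam classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def time_features(timestamps):
    """timestamps: sorted posting times (seconds) for one account."""
    gaps = np.diff(timestamps)
    hours = (np.asarray(timestamps) // 3600) % 24
    top_hour_share = np.bincount(hours.astype(int), minlength=24).max() / len(timestamps)
    return [gaps.mean(), gaps.std(), gaps.min(), top_hour_share]

rng = np.random.default_rng(1)
# synthetic accounts: humans post irregularly, bots post on a near-fixed schedule
humans = [np.sort(rng.uniform(0, 7 * 86400, 200)) for _ in range(50)]
bots   = [np.sort(np.arange(200) * 300 + rng.normal(0, 5, 200)) for _ in range(50)]

X = np.array([time_features(t) for t in humans + bots])
y = np.array([0] * 50 + [1] * 50)        # 0 = legitimate, 1 = spammer
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.score(X, y))
```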
In the last few years, the wide adoption of service computing delivered over the Internet has created immense security challenges for service providers. Cyber criminals use advanced malware such as polymorphic botnets to infiltrate everyday online activities and to access personal details, credit card numbers, and banking credentials. The polymorphic botnet attack is one of the biggest attacks in the history of cybercrime, and millions of computers around the world are currently infected by botnet clients. A botnet attack is an intelligent and highly coordinated distributed attack consisting of a large number of bots that generate large volumes of spam e-mail and launch distributed denial-of-service (DDoS) attacks on victim machines in a heterogeneous network environment. It is therefore necessary to detect malicious bots and prevent their planned attacks in the cloud environment. A number of techniques for detecting malicious bots in a network have been developed in the literature. This paper recognizes the limitations of signature-based detection, network-traffic-based detection such as NetFlow or traffic-flow analysis, and anomaly-based detection. We propose a real-time malware detection methodology based on Domain Generation Algorithm (DGA) analysis. It increases throughput through early detection of malicious bots and high accuracy in identifying suspicious behavior.
Botnets play major roles in a vast number of threats to network security, such as DDoS attacks, generation of spam emails, and information theft. Detecting botnets is a difficult task due to the complexity and performance issues involved in analyzing the huge amounts of data from real large-scale networks. In major botnet malware, the use of Domain Generation Algorithms (DGAs) decreases the possibility of detection by whitelist/blacklist schemes, and thus DGA botnets have higher survivability. This paper proposes a DGA botnet detection scheme based on DNS traffic analysis which utilizes semantic measures such as entropy, the meaning level of the domain, the frequency of n-gram appearances, and the Mahalanobis distance for domain classification. The proposed method is an improvement of the Phoenix botnet detection mechanism, where in the classification phase a modified Mahalanobis distance is used instead of the original. The clustering phase is based on a modified k-means algorithm to achieve better effectiveness. The effectiveness of the proposed method was measured and compared with the Phoenix, Linguistic, and SVM Light methods. The experimental results show that the accuracy of the proposed botnet detection scheme ranges from 90% to 99.97% depending on the botnet type.
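A simplified illustration of the kind of semantic measures described, assuming only the feature names from the abstract: character entropy and bigram frequency of a queried domain, with the Mahalanobis distance to a cluster of known-legitimate domains used as the classification score. The tiny whitelist and the notion of what counts as "far" are illustrative.

```python
# Sketch: entropy + bigram features, Mahalanobis distance to legitimate domains.
import math
import numpy as np
from collections import Counter

def entropy(s):
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def bigram_score(s, bigram_freq):
    bigrams = [s[i:i + 2] for i in range(len(s) - 1)]
    return np.mean([bigram_freq.get(b, 1e-6) for b in bigrams])

legit = ["google", "facebook", "wikipedia", "youtube", "amazon", "twitter"]
bigram_freq = Counter(b for d in legit for b in (d[i:i + 2] for i in range(len(d) - 1)))

X = np.array([[entropy(d), bigram_score(d, bigram_freq)] for d in legit])
mu, cov_inv = X.mean(axis=0), np.linalg.pinv(np.cov(X.T))

def mahalanobis(x):
    diff = x - mu
    return float(np.sqrt(diff @ cov_inv @ diff))

for domain in ["netflix", "xjkq3vz9plm"]:   # benign-looking vs DGA-looking
    d = mahalanobis(np.array([entropy(domain), bigram_score(domain, bigram_freq)]))
    print(domain, round(d, 2))               # larger distance suggests a DGA domain
```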
The hyperlink structure of the World Wide Web is modeled as a huge, directed, dynamic web graph. Web graphs are analyzed for determining page rank, fighting web spam, detecting communities, and so on, by performing tasks such as clustering, classification, and reachability. These tasks involve operations such as graph navigation, checking link existence, and identifying active links, which demand scanning of entire graphs. Frequent scanning of very large graphs involves more I/O operations and memory overhead. To address these issues, several data structures have been proposed to represent graphs in a compact manner. Even though the problem of representing graphs has been actively studied in the literature, there has been much less focus on the representation of dynamic graphs. In this paper, we propose Tree-Dictionary-Representation (TDR), a compressed graph representation that supports the dynamic nature of graphs as well as the various graph operations. Our experimental study shows that this representation works efficiently with limited main memory and provides fast traversal of edges.
While email plays an increasingly important role on the Internet, we are faced with ever more severe challenges from compromised email accounts, especially for the administrators of institutional email service providers. Inspired by previous experience with spam filtering and compromised-account detection, we propose several criteria, such as Success Outdegree Proportion, Reverse PageRank, Recipient Clustering Coefficient, and Legitimate Recipient Proportion, for detecting compromised email accounts from the perspective of graph topology. Specifically, several widely used social network analysis metrics are adapted to the characteristics of mail log analysis. We evaluate our methods on a dataset constructed by mining one month (30 days) of mail logs from a university with 118,617 local users and 11,460,399 mail log entries. The experimental results demonstrate that our methods achieve very positive performance, and we also show that these methods can be efficiently applied to even larger datasets.
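To make two of the criteria concrete, the sketch below computes a Success Outdegree Proportion (here taken as the fraction of a sender's distinct recipients who ever reply) and Reverse PageRank on a toy sender-to-recipient graph using networkx; the precise definitions in the paper may differ, so treat this as an assumption-laden illustration.

```python
# Sketch of two graph criteria on a toy sender->recipient mail graph.
import networkx as nx

# (sender, recipient) edges from a hypothetical mail log
edges = [("alice", "bob"), ("bob", "alice"), ("alice", "carol"),
         ("mallory", "u1"), ("mallory", "u2"), ("mallory", "u3"), ("mallory", "u4")]
G = nx.DiGraph(edges)

def success_outdegree_proportion(g, node):
    """Fraction of the node's recipients that ever send mail back."""
    out = set(g.successors(node))
    replied = {r for r in out if g.has_edge(r, node)}
    return len(replied) / len(out) if out else 0.0

reverse_pr = nx.pagerank(G.reverse())   # PageRank on the reversed graph
for acct in ["alice", "mallory"]:
    print(acct, success_outdegree_proportion(G, acct), round(reverse_pr[acct], 3))
```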
Tremendous amounts of data are generated daily. Accordingly, unstructured text data distributed through news, blogs, and social media has gained much attention from researchers, as it contains abundant information about various consumers' opinions. However, as the usefulness of text data increases, attempts to gain profit by distorting text data maliciously or non-maliciously are also increasing. In this sense, various types of spam detection techniques have been studied to prevent the side effects of spamming. The most representative studies include e-mail spam detection, web spam detection, and opinion spam detection. "Spam" is recognized on the basis of three characteristics and actions: (1) if a certain user is recognized as a spammer, then all content created by that user should be recognized as spam; (2) if certain content is exposed to other users (regardless of the users' intention), then that content is recognized as spam; and (3) any content that contains malicious or non-malicious false information is recognized as spam. Many studies have addressed type (1) and type (2) spamming by analyzing various metadata, such as user networks and spam words. In the case of type (3), however, relatively few studies have been conducted because it is difficult to determine the veracity of a certain word or piece of information. In this study, we regard a hashtag that is irrelevant to the content of a blog post as spam and devise a methodology to detect such spam hashtags.
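One way to operationalize "a hashtag irrelevant to the post content" (an assumption for illustration, not the paper's actual model) is to score the TF-IDF cosine similarity between each hashtag and the post body and flag low-similarity tags:

```python
# Sketch: flag hashtags with low TF-IDF cosine similarity to the post body.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "new restaurant review pasta and wine in downtown",
    "travel blog hiking the national park trails",
    "smartphone camera comparison and battery life test",
]
post = "honest review of a small pasta restaurant downtown"
hashtags = ["restaurant", "pasta", "bitcoin"]          # last one is off-topic

vec = TfidfVectorizer().fit(corpus + [post])
post_v = vec.transform([post])
for tag in hashtags:
    sim = cosine_similarity(vec.transform([tag]), post_v)[0, 0]
    print(tag, "spam-hashtag" if sim < 0.1 else "ok", round(sim, 3))
```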
Text messaging is used by more people around the world than any other communications technology. As such, it presents a desirable medium for spammers. While this problem has been studied by many researchers over the years, the recent increase in legitimate bulk traffic (e.g., account verification, 2FA, etc.) has dramatically changed the mix of traffic seen in this space, reducing the effectiveness of previous spam classification efforts. This paper demonstrates the performance degradation of those detectors when used on a large-scale corpus of text messages containing both bulk and spam messages. Against our labeled dataset of text messages collected over 14 months, the precision and recall of past classifiers fall to 23.8% and 61.3% respectively. However, using our classification techniques and labeled clusters, precision and recall rise to 100% and 96.8%. We not only show that our collected dataset helps to correct many of the overtraining errors seen in previous studies, but also present insights into a number of current SMS spam campaigns.
Face recognition has attained greater importance in biometric authentication due to its non-intrusive property of identifying individuals at varying stand-off distances. Face recognition based on multi-spectral imaging has recently gained prime importance due to its ability to capture spatial and spectral information across the spectrum. Our first contribution in this paper is to apply extended multi-spectral face recognition to two different age groups. The second contribution is to show empirically the performance of face recognition for these two age groups. Thus, in this paper, we develop a multi-spectral imaging sensor to capture a facial database for two different age groups (≤ 15 years and ≥ 20 years) at nine different spectral bands covering the 530 nm to 1000 nm range. We then collect a new set of facial images corresponding to the two age groups, comprising 168 individuals. Extensive experimental evaluation is performed independently on the two age-group databases using four different state-of-the-art face recognition algorithms. We evaluate the verification and identification rates across individual spectral bands and the fused spectral band for the two age groups. The obtained evaluation results show a higher recognition rate for the ≥ 20 years group than for the ≤ 15 years group, which indicates the variation in face recognition across different age groups.
The face is the most dominant and distinct communication tool of human beings. Automatic analysis of facial behavior allows machines to understand and interpret a human's states and needs for natural interactions. This research focuses on developing advanced computer vision techniques to process and analyze facial images for the recognition of various facial behaviors. Specifically, this research consists of two parts: automatic facial landmark detection and tracking, and facial behavior analysis and recognition using the tracked facial landmark points. In the first part, we develop several facial landmark detection and tracking algorithms on facial images with varying conditions, such as varying facial expressions, head poses and facial occlusions. First, to handle facial expression and head pose variations, we introduce a hierarchical probabilistic face shape model and a discriminative deep face shape model to capture the spatial relationships among facial landmark points under different facial expressions and face poses to improve facial landmark detection. Second, to handle facial occlusion, we improve upon the effective cascade regression framework and propose the robust cascade regression framework for facial landmark detection, which iteratively predicts the landmark visibility probabilities and landmark locations. The second part of this research applies our facial landmark detection and tracking algorithms to facial behavior analysis, including facial action recognition and face pose estimation. For facial action recognition, we introduce a novel regression framework for joint facial landmark detection and facial action recognition. For head pose estimation, we are working on a robust algorithm that can perform head pose estimation under facial occlusion.
Recognizing the authenticity of facial expressions is quite difficult for humans. Therefore, it is an interesting topic for the computer vision community, as algorithms for estimating the authenticity of facial expressions may be used as indicators of deception. This paper discusses the state-of-the-art methods developed for smile veracity estimation and proposes a plan for the development and validation of a novel approach to automated discrimination between genuine and posed facial expressions. The proposed fully automated technique is based on extending high-dimensional Local Binary Patterns (LBP) to the spatio-temporal domain and combining them with the dynamics of facial landmark movements. The proposed technique will be validated on several existing smile databases and on a novel database created with the use of a high-speed camera. Finally, the developed framework will be applied to the detection of deception in real-life scenarios.
Automatic face recognition techniques applied to particular groups or mass databases introduce error cases. Error prevention is crucial for the court. Reranking recognition results based on anthropological analysis can significantly improve the accuracy of automatic methods. Previous studies focused on manual facial comparison. This paper proposes a weighted facial similarity computation method based on morphological analysis of component characteristics. The search sequence of face recognition is reranked according to similarity, while interference terms can be removed. Within this research project, standardized photographs, surveillance videos, 3D face images, and identity card photographs of 241 male subjects from China were acquired. Sequencing results were refined by modeling selected individual features from the DMV atlas. The improved method raises the accuracy of face recognition through anthropological and morphological theory.
The face is crucial for human identity, while face identification has become crucial to information security. It is important to understand the problems and challenges across the different aspects of facial feature extraction and face identification. In this tutorial, we identify and discuss four research challenges in current Face Detection/Recognition research and related research areas: (1) Unavoidable Facial Feature Alterations, (2) Voluntary Facial Feature Alterations, (3) Uncontrolled Environments, and (4) Accuracy Control on Large-scale Datasets. We also outline several different applications (spin-offs) of facial feature studies in the tutorial.
Augmented reality is poised to become a dominant computing paradigm over the next decade. With promises of three-dimensional graphics and interactive interfaces, augmented reality experiences will rival the very best science fiction novels. This breakthrough also brings in unique challenges on how users can authenticate one another to share rich content between augmented reality headsets. Traditional authentication protocols fall short when there is no common central entity or when access to the central authentication server is not available or desirable. Looks Good To Me (LGTM) is an authentication protocol that leverages the unique hardware and context provided with augmented reality headsets to bring innate human trust mechanisms into the digital world to solve authentication in a usable and secure way. LGTM works over point to point wireless communication so users can authenticate one another in a variety of circumstances and is designed with usability at its core, requiring users to perform only two actions: one to initiate and one to confirm. Users intuitively authenticate one another, using seemingly only each other's faces, but under the hood LGTM uses a combination of facial recognition and wireless localization to bootstrap trust from a wireless signal, to a location, to a face, for secure and usable authentication.
Heterogeneous face recognition aims to identify or verify a person's identity by matching facial images of different modalities. In practice, its performance is known to be highly influenced by modality inconsistency, appearance occlusions, illumination variations, and expressions. In this paper, a new method, an ensemble of sparse cross-modal metrics, is proposed to tackle these challenging issues. In particular, a weak sparse cross-modal metric learning method is first developed to measure distances between samples of two modalities. It learns to adjust rank-one cross-modal metrics to satisfy two sets of triplet-based cross-modal distance constraints in a compact form. Meanwhile, group-based feature selection is performed to enforce that features in the same position in the two modalities are selected simultaneously. By neglecting features that contribute to "noise" in the face regions (eyeglasses, expressions, and so on), the performance of the learned weak metrics can be markedly improved. Finally, an ensemble framework is incorporated to combine the results of the differently learned sparse metrics into a strong one. Extensive experiments on various face datasets demonstrate the benefit of such feature selection, especially when heavy occlusions exist. The proposed ensemble metric learning shows superiority over several state-of-the-art methods in heterogeneous face recognition.
Machine learning is enabling myriad innovations, including new algorithms for cancer diagnosis and self-driving cars. The broad use of machine learning makes it important to understand the extent to which machine-learning algorithms are subject to attack, particularly when used in applications where physical security or safety is at risk. In this paper, we focus on facial biometric systems, which are widely used in surveillance and access control. We define and investigate a novel class of attacks: attacks that are physically realizable and inconspicuous, and allow an attacker to evade recognition or impersonate another individual. We develop a systematic method to automatically generate such attacks, which are realized through printing a pair of eyeglass frames. When worn by the attacker whose image is supplied to a state-of-the-art face-recognition algorithm, the eyeglasses allow her to evade being recognized or to impersonate another individual. Our investigation focuses on white-box face-recognition systems, but we also demonstrate how similar techniques can be used in black-box scenarios, as well as to avoid face detection.
In the past three years, the Emotion Recognition in the Wild (EmotiW) Grand Challenge has drawn more and more attention due to its huge potential for applications. For the fourth challenge, aimed at the task of video-based emotion recognition, we propose a multi-clue emotion fusion (MCEF) framework that models human emotion from three mutually complementary sources: facial appearance texture, facial action, and audio. To extract high-level emotion features from sequential face images, we employ a CNN-RNN architecture, where the face image from each frame is first fed into a fine-tuned VGG-Face network to extract face features, and the features of all frames are then traversed sequentially by a bidirectional RNN to capture dynamic changes of facial textures. To attain more accurate facial actions, a facial landmark trajectory model is proposed to explicitly learn emotion variations of facial components. Further, audio signals are also modeled in a CNN framework by extracting low-level energy features from segmented audio clips and stacking them as an image-like map. Finally, we fuse the results generated from the three clues to boost the performance of emotion recognition. Our proposed MCEF achieves an overall accuracy of 56.66%, a large improvement of 16.19% over the baseline.
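The CNN-RNN branch can be sketched structurally as follows, assuming per-frame face features of VGG-Face-like dimensionality (4096) and seven emotion classes; this is an architectural illustration in PyTorch, not the trained EmotiW model.

```python
# Architectural sketch: per-frame CNN features -> bidirectional GRU -> emotion class.
import torch
import torch.nn as nn

class FaceSequenceEmotion(nn.Module):
    def __init__(self, feat_dim=4096, hidden=256, n_classes=7):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, frame_feats):          # (batch, n_frames, feat_dim)
        out, _ = self.rnn(frame_feats)       # (batch, n_frames, 2 * hidden)
        return self.fc(out.mean(dim=1))      # average over time, then classify

model = FaceSequenceEmotion()
video = torch.randn(2, 16, 4096)             # 2 clips, 16 frames of CNN features each
print(model(video).shape)                     # torch.Size([2, 7])
```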
Data mining is the process of extracting knowledge and interesting patterns from huge amounts of data. With the rapid increase of data storage, cloud, and service-based computing, the risk of data misuse has become a major concern. Protecting sensitive information present in the data is crucial and critical. Data perturbation plays an important role in privacy-preserving data mining. The major challenge of privacy preservation is balancing the privacy guarantee against data utility. We propose a data perturbation method that perturbs the data using fuzzy logic and random rotation. We also show that the perturbed data maintains a comparable level of quality to the original data. The comparisons are illustrated on different multivariate datasets. The experimental study shows that the model is better at achieving both a privacy guarantee for the data and data utility.
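The random-rotation component can be illustrated as below (the fuzzy-logic step is omitted): multiplying the data by a random orthogonal matrix hides individual attribute values while exactly preserving pairwise Euclidean distances, which is why distance-based mining over the perturbed data remains comparable. This is a generic sketch, not the authors' exact procedure.

```python
# Sketch: random-rotation perturbation preserves pairwise Euclidean distances.
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))                       # original records, 4 attributes

R = ortho_group.rvs(dim=4, random_state=42)         # random rotation matrix
X_pert = X @ R                                      # perturbed (released) data

def pairwise(A):
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

print(np.allclose(pairwise(X), pairwise(X_pert)))   # True: distances preserved
```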
Within a few years, Cloud computing has emerged as the most promising IT business model. Thanks to its various technical and financial advantages, Cloud computing continues to convince new users from scientific and industrial sectors every day. To satisfy the various users' requirements, Cloud providers must maximize the performance of their IT resources to ensure the best service at the lowest cost. Performance optimization efforts in the Cloud can be made at different levels and for different aspects. In this paper, we propose to introduce a fuzzy logic process into the scheduling strategy for the public Cloud in order to improve response time, processing time, and total cost. Indeed, fuzzy logic has proven its ability to solve optimization problems in several fields such as data mining, image processing, and networking.
Over the last few decades, accessibility scenarios have undergone a drastic change. Today, the way people access information and resources is quite different from the era before the Internet. The evolution of the Internet has brought remarkable, epoch-making changes and has become the backbone of the smart city. The vision of the smart city revolves around seamless connectivity. Constant connectivity can provide uninterrupted services to users such as e-governance, e-banking, e-marketing, e-shopping, e-payment, and communication through social media, and providing uninterrupted service for such applications to citizens is our prime concern. This paper therefore focuses on a smart handoff framework for next-generation heterogeneous networks in smart cities to provide all-time connectivity to anyone, anyhow, and anywhere. To achieve this, three strategies have been proposed for the handoff initialization phase: mobile-controlled, user-controlled, and network-controlled handoff initialization. Each strategy considers a different set of parameters. Results show that combining additional parameters with RSSI, an adaptive threshold, and hysteresis solves the ping-pong and corner-effect problems in the smart city.
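A toy sketch of a handoff-initialization rule of the kind described, where a handoff is triggered only when the serving cell's RSSI falls below an (adaptive) threshold and a candidate exceeds it by a hysteresis margin, which suppresses ping-pong switching; the parameter values are illustrative assumptions.

```python
# Sketch: RSSI-threshold-plus-hysteresis handoff initialization rule.
def should_handoff(serving_rssi, candidate_rssi, threshold=-85.0, hysteresis=5.0):
    """Hand off only if the serving signal is weak AND the candidate is clearly better."""
    return serving_rssi < threshold and candidate_rssi > serving_rssi + hysteresis

# a mobile moving away from its serving access point: (serving_rssi, candidate_rssi) in dBm
samples = [(-70, -90), (-80, -84), (-88, -86), (-90, -82)]
for serving, candidate in samples:
    print(serving, candidate, "handoff" if should_handoff(serving, candidate) else "stay")
```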