Biblio
Recent methods for learning vector space representations of words (word embeddings), such as GloVe and Word2Vec, have succeeded in capturing fine-grained semantic and syntactic regularities. We analyzed the effectiveness of these methods for e-commerce recommender systems by transforming the sequence of items generated by a user's browsing journey on an e-commerce website into a sentence of words. We examined the prediction of fine-grained item similarity (such as the item most similar to an iPhone 6 64GB smartphone) and item analogy (such as iPhone 5 is to iPhone 6 as Samsung S5 is to Samsung S6) using the real-life browsing history of users of an online European department store. Our results reveal that these methods outperform related models such as singular value decomposition (SVD) on item similarity and analogy tasks across different product categories. Furthermore, these methods produce a highly condensed item vector space representation (item embedding) with behaviorally meaningful substructure. These vectors can be used as features in a variety of recommender system applications. In particular, we used them as features in neural-network-based models for anonymous user recommendation based on a session's first few clicks. We found that a recurrent neural network, which preserves the order of the user's clicks, outperforms a standard neural network, item-to-item similarity, and SVD for this task (recall@10 of 42% based on the first three clicks).
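The pipeline this abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the click-event layout `(session_id, timestamp, item_id)`, all function names, and the hand-made two-dimensional vectors in `TOY_EMBEDDINGS` are assumptions made for illustration. In practice the item "sentences" would be passed to a Word2Vec or GloVe trainer, and the vectors would be learned rather than written by hand.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical, hand-made vectors (NOT learned embeddings), chosen so that
# the "next model" offset is the same for both brands.
TOY_EMBEDDINGS = {
    "iphone_5": [1.0, 1.0],
    "iphone_6": [1.0, 2.0],
    "samsung_s5": [2.0, 1.0],
    "samsung_s6": [2.0, 2.0],
    "toaster": [-1.0, 0.5],
}

def sessions_to_sentences(clicks):
    """Group (session_id, timestamp, item_id) events into time-ordered item lists."""
    by_session = defaultdict(list)
    for session_id, timestamp, item_id in clicks:
        by_session[session_id].append((timestamp, item_id))
    # One "sentence" per session, with items ordered by click time.
    return [[item for _, item in sorted(events)]
            for _, events in sorted(by_session.items())]

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query_vec, embeddings, exclude=(), topn=1):
    """Return the topn item ids whose vectors are closest to query_vec."""
    scored = [(cosine(query_vec, vec), item)
              for item, vec in embeddings.items() if item not in exclude]
    scored.sort(reverse=True)
    return [item for _, item in scored[:topn]]

def analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' via the nearest neighbor of b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    return most_similar(target, embeddings, exclude={a, b, c})[0]

def recall_at_k(ranked_lists, held_out_items, k=10):
    """Fraction of sessions whose held-out item appears in the top-k ranking."""
    hits = sum(1 for ranked, target in zip(ranked_lists, held_out_items)
               if target in ranked[:k])
    return hits / len(held_out_items)

if __name__ == "__main__":
    clicks = [("s1", 2, "iphone_6"), ("s2", 1, "toaster"), ("s1", 1, "iphone_5")]
    print(sessions_to_sentences(clicks))
    print(analogy("iphone_5", "iphone_6", "samsung_s5", TOY_EMBEDDINGS))
```

Here `recall_at_k` follows the common definition of recall@k for next-item prediction with a single held-out item per session; the abstract does not spell out its exact evaluation protocol, so this definition is also an assumption.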
In this paper we describe, and share with the research community, a significant smartphone dataset obtained from an ongoing long-term data collection experiment. The dataset currently contains 10 billion data records from 30 users collected over a period of 1.6 years and from an additional 20 users collected over 6 months (for a total of 50 active users currently participating in the experiment). The experiment involves two smartphone agents: SherLock and Moriarty. SherLock collects a wide variety of software and sensor data at a high sample rate. Moriarty perpetrates various attacks on the user and logs its activities, thus providing labels for the SherLock dataset. The primary purpose of the dataset is to help security professionals and academic researchers develop innovative methods of implicitly detecting malicious behavior in smartphones, specifically from data obtainable without superuser (root) privileges. To demonstrate possible uses of the dataset, we perform a basic malware analysis and evaluate a method of continuous user authentication.
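The idea that Moriarty's activity log supplies labels for SherLock's records can be sketched as a timestamp join. This is only an illustration under stated assumptions: the abstract does not describe the dataset's file layout, so the record timestamps, the `(start, end, attack_type)` log entries, the function name `label_records`, the `"benign"` default, and the assumption that attack intervals do not overlap are all hypothetical.

```python
from bisect import bisect_right

def label_records(record_timestamps, attack_sessions, default="benign"):
    """Label each sensor-record timestamp using logged attack intervals.

    attack_sessions: iterable of (start, end, attack_type) tuples, assumed
    (hypothetically) to be non-overlapping, with inclusive bounds.
    """
    sessions = sorted(attack_sessions)
    starts = [start for start, _, _ in sessions]
    labels = []
    for ts in record_timestamps:
        # Index of the last interval that starts at or before ts.
        i = bisect_right(starts, ts) - 1
        if i >= 0 and ts <= sessions[i][1]:
            labels.append(sessions[i][2])
        else:
            labels.append(default)
    return labels

if __name__ == "__main__":
    # "attack_A" is a placeholder name, not an attack type from the dataset.
    print(label_records([5, 10, 15, 25], [(10, 20, "attack_A")]))
```

Sorting the intervals once and using binary search keeps the labeling pass at O(n log m) for n records and m attack sessions, which matters for a dataset on the scale of billions of records.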