Bibliography
Despite recent initiatives and research efforts to increase user privacy in digital scenarios, identity-related cybercrimes such as identity theft, use of false identities, and surveillance of user transactions are growing. In particular, the blanket surveillance that Identity Providers (IdPs) could potentially carry out contradicts the data minimization principle laid out in the GDPR. User movements across Service Providers (SPs) may be tracked by malicious IdPs, which thereby become a dominant central entity as well as a single point of failure for privacy and security, putting users at risk when compromised. To address this issue, the OLYMPUS H2020 EU project is devising a truly privacy-preserving, yet user-friendly, distributed identity management system that tackles the data minimization challenge in both online and offline scenarios. OLYMPUS splits the role of the IdP among several authorities by relying on threshold cryptography, thereby preventing user impersonation and surveillance by malicious or nosy IdPs. This paper gives an overview of the OLYMPUS framework, including the requirements considered, the proposed architecture, a series of use cases, and a privacy analysis from a legal point of view.
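To illustrate the threshold idea behind this split-IdP design, the sketch below shows generic (t, n) Shamir secret sharing in Python. It is not the OLYMPUS protocol or codebase; the prime modulus, function names, and parameters are assumptions chosen purely for the demonstration.

    # Generic (t, n) Shamir secret sharing over a prime field.
    # Illustrative only: not the OLYMPUS implementation; prime and
    # parameters are assumptions for this demo.
    import random

    PRIME = 2**127 - 1  # a Mersenne prime, large enough for a demo secret

    def split_secret(secret, t, n, prime=PRIME):
        """Split `secret` into n shares; any t of them reconstruct it."""
        # Random polynomial of degree t-1 with the secret as constant term.
        coeffs = [secret] + [random.randrange(prime) for _ in range(t - 1)]
        def f(x):
            return sum(c * pow(x, i, prime) for i, c in enumerate(coeffs)) % prime
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares, prime=PRIME):
        """Recombine at least t shares via Lagrange interpolation at x = 0."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = (num * -xj) % prime
                    den = (den * (xi - xj)) % prime
            secret = (secret + yi * num * pow(den, -1, prime)) % prime
        return secret

    if __name__ == "__main__":
        shares = split_secret(secret=123456789, t=2, n=3)
        # Any 2 of the 3 authorities suffice; one share alone reveals nothing.
        assert reconstruct(shares[:2]) == 123456789

Note that in a real threshold protocol the secret is never reconstructed at a single party; the authorities combine partial results instead, which is what prevents any single IdP from impersonating or tracking users.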
At a time when all it takes to open a Twitter account is a mobile phone, authenticating information encountered on social media becomes very complex, especially when we lack measures to verify digital identities in the first place. Because the platform supports anonymity, fake news generated by dubious sources has been observed to travel much faster and farther than real news. Hence, valid measures to identify the authors of misinformation are needed to avert these consequences. Researchers have proposed different authorship attribution techniques for this kind of problem. However, because tweets are limited to 280 characters, finding a suitable authorship attribution technique is a challenge. This research aims to classify the authors of tweets by comparing machine learning methods, namely logistic regression and naive Bayes, and illustrates the authorship classification process end to end. The application proceeds in four stages: fetching tweets, pre-processing, feature extraction, and building a machine learning model for classification. In total, 46,895 tweets were used as training and testing data, and unique features specific to Twitter were extracted. The pre-processing phase included removal of short texts, removal of stop words and punctuation, tokenization, and stemming. The pre-processed data were then transformed into a set of feature vectors in Python. Logistic regression and naive Bayes algorithms were applied to these feature vectors to train and test the classifiers. The logistic regression classifier achieved the higher accuracy, 91.1%, compared with 89.8% for the naive Bayes classifier.
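As a rough illustration of that pipeline, the following is a minimal sketch assuming scikit-learn and NLTK. The toy corpus, author labels, and vectorizer settings are placeholders; the paper used 46,895 real tweets and Twitter-specific features that are not reproduced here.

    # Minimal sketch of the described pipeline: pre-process, vectorize,
    # then compare logistic regression and naive Bayes.
    # Assumptions: scikit-learn and NLTK are available; the corpus and
    # labels below are hypothetical stand-ins for the fetched tweets.
    import re
    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB

    stemmer = PorterStemmer()

    def preprocess(text):
        """Lowercase, drop punctuation and digits, tokenize, and stem."""
        tokens = re.findall(r"[a-z]+", text.lower())
        return " ".join(stemmer.stem(tok) for tok in tokens)

    # Placeholder tweets and author labels (hypothetical data).
    tweets = [
        "Breaking: markets rally as tech stocks surge today",
        "Just shipped a new release, check the changelog folks",
        "Markets dip again, investors watching the fed closely",
        "Debugging all night, coffee is the only dependency now",
        "Earnings season kicks off with strong bank results",
        "New blog post on writing cleaner unit tests is up",
    ]
    authors = ["finance_bot", "dev_girl", "finance_bot",
               "dev_girl", "finance_bot", "dev_girl"]

    # Drop short texts, then pre-process, mirroring the paper's steps.
    pairs = [(preprocess(t), a) for t, a in zip(tweets, authors) if len(t) > 20]
    docs, labels = zip(*pairs)

    # Bag-of-words feature vectors with English stop words removed.
    X = CountVectorizer(stop_words="english").fit_transform(docs)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.33, stratify=labels, random_state=0)

    # Compare the two classifiers reported in the paper.
    for model in (LogisticRegression(max_iter=1000), MultinomialNB()):
        preds = model.fit(X_tr, y_tr).predict(X_te)
        print(type(model).__name__, accuracy_score(y_te, preds))

A common variation for short texts such as tweets is to swap CountVectorizer for TfidfVectorizer or to add character n-grams, which often help when each document offers very few word tokens.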