Biblio

List
Filter

Found 2348 results

Filters: Keyword is privacy [Clear All Filters]

2021-11-08

Brown, Brandon, Richardson, Alexicia, Smith, Marcellus, Dozier, Gerry, King, Michael C.. 2020. The Adversarial UFP/UFN Attack: A New Threat to ML-based Fake News Detection Systems? 2020 IEEE Symposium Series on Computational Intelligence (SSCI). :1523–1527.

In this paper, we propose two new attacks: the Adversarial Universal False Positive (UFP) Attack and the Adversarial Universal False Negative (UFN) Attack. The objective of this research is to introduce a new class of attack using only feature vector information. The results show the potential weaknesses of five machine learning (ML) classifiers. These classifiers include k-Nearest Neighbor (KNN), Naive Bayes (NB), Random Forrest (RF), a Support Vector Machine (SVM) with a Radial Basis Function (RBF) Kernel, and XGBoost (XGB).

Afroz, Sabrina, Ariful Islam, S.M, Nawer Rafa, Samin, Islam, Maheen. 2020. A Two Layer Machine Learning System for Intrusion Detection Based on Random Forest and Support Vector Machine. 2020 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE). :300–303.

Unauthorized access or intrusion is a massive threatening issue in the modern era. This study focuses on designing a model for an ideal intrusion detection system capable of defending a network by alerting the admins upon detecting any sorts of malicious activities. The study proposes a two layered anomaly-based detection model that uses filter co-relation method for dimensionality reduction along with Random forest and Support Vector Machine as its classifiers. It achieved a very good detection rate against all sorts of attacks including a low rate of false alarms as well. The contribution of this study is that it could be of a major help to the computer scientists designing good intrusion detection systems to keep an industry or organization safe from the cyber threats as it has achieved the desired qualities of a functional IDS model.

Chang, Sang-Yoon, Park, Younghee, Kengalahalli, Nikhil Vijayakumar, Zhou, Xiaobo. 2020. Query-Crafting DoS Threats Against Internet DNS. 2020 IEEE Conference on Communications and Network Security (CNS). :1–9.

Domain name system (DNS) resolves the IP addresses of domain names and is critical for IP networking. Recent denial-of-service (DoS) attacks on Internet targeted the DNS system (e.g., Dyn), which has the cascading effect of denying the availability of the services and applications relying on the targeted DNS. In view of these attacks, we investigate the DoS on DNS system and introduce the query-crafting threats where the attacker controls the DNS query payload (the domain name) to maximize the threat impact per query (increasing the communications between the DNS servers and the threat time duration), which is orthogonal to other DoS approaches to increase the attack impact such as flooding and DNS amplification. We model the DNS system using a state diagram and comprehensively analyze the threat space, identifying the threat vectors which include not only the random/invalid domains but also those using the domain name structure to combine valid strings and random strings. Query-crafting DoS threats generate new domain-name payloads for each query and force increased complexity in the DNS query resolution. We test the query-crafting DoS threats by taking empirical measurements on the Internet and show that they amplify the DoS impact on the DNS system (recursive resolver) by involving more communications and taking greater time duration. To defend against such DoS or DDoS threats, we identify the relevant detection features specific to query-crafting threats and evaluate the defense using our prototype in CloudLab.

Rashid, Junaid, Mahmood, Toqeer, Nisar, Muhammad Wasif, Nazir, Tahira. 2020. Phishing Detection Using Machine Learning Technique. 2020 First International Conference of Smart Systems and Emerging Technologies (SMARTTECH). :43–46.

Today, everyone is highly dependent on the internet. Everyone performed online shopping and online activities such as online Bank, online booking, online recharge and more on internet. Phishing is a type of website threat and phishing is Illegally on the original website Information such as login id, password and information of credit card. This paper proposed an efficient machine learning based phishing detection technique. Overall, experimental results show that the proposed technique, when integrated with the Support vector machine classifier, has the best performance of accurately distinguishing 95.66% of phishing and appropriate websites using only 22.5% of the innovative functionality. The proposed technique exhibits optimistic results when benchmarking with a range of standard phishing datasets of the “University of California Irvine (UCI)” archive. Therefore, proposed technique is preferred and used for phishing detection based on machine learning.

Ma, Qicheng, Rastogi, Nidhi. 2020. DANTE: Predicting Insider Threat using LSTM on system logs. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1151–1156.

Insider threat is one of the most pernicious threat vectors to information and communication technologies (ICT) across the world due to the elevated level of trust and access that an insider is afforded. This type of threat can stem from both malicious users with a motive as well as negligent users who inadvertently reveal details about trade secrets, company information, or even access information to malignant players. In this paper, we propose a novel approach that uses system logs to detect insider behavior using a special recurrent neural network (RNN) model. Ground truth is established using DANTE and used as baseline for identifying anomalous behavior. For this, system logs are modeled as a natural language sequence and patterns are extracted from these sequences. We create workflows of sequences of actions that follow a natural language logic and control flow. These flows are assigned various categories of behaviors - malignant or benign. Any deviation from these sequences indicates the presence of a threat. We further classify threats into one of the five categories provided in the CERT insider threat dataset. Through experimental evaluation, we show that the proposed model can achieve 93% prediction accuracy.

Cai, Junhui, Li, Qianmu. 2020. Machine Learning-Based Threat Identification of Industrial Internet. 2020 IEEE International Conference on Progress in Informatics and Computing (PIC). :335–340.

In order to improve production and management efficiency, traditional industrial control systems are gradually connected to the Internet, and more likely to use advanced modern information technologies, such as cloud computing, big data technology, and artificial intelligence. Industrial control system is widely used in national key infrastructure. Meanwhile, a variety of attack threats and risks follow, and once the industrial control network suffers maliciously attack, the loss caused is immeasurable. In order to improve the security and stability of the industrial Internet, this paper studies the industrial control network traffic threat identification technology based on machine learning methods, including GK-SVDD, RNN and KPCA reconstruction error algorithm, and proposes a heuristic method for selecting Gaussian kernel width parameter in GK-SVDD to accelerate real-time threat detection in industrial control environments. Experiments were conducted on two public industrial control network traffic datasets. Compared with the existing methods, these methods can obtain faster detection efficiency and better threat identification performance.

Shaukat, Kamran, Luo, Suhuai, Chen, Shan, Liu, Dongxi. 2020. Cyber Threat Detection Using Machine Learning Techniques: A Performance Evaluation Perspective. 2020 International Conference on Cyber Warfare and Security (ICCWS). :1–6.

The present-day world has become all dependent on cyberspace for every aspect of daily living. The use of cyberspace is rising with each passing day. The world is spending more time on the Internet than ever before. As a result, the risks of cyber threats and cybercrimes are increasing. The term `cyber threat' is referred to as the illegal activity performed using the Internet. Cybercriminals are changing their techniques with time to pass through the wall of protection. Conventional techniques are not capable of detecting zero-day attacks and sophisticated attacks. Thus far, heaps of machine learning techniques have been developed to detect the cybercrimes and battle against cyber threats. The objective of this research work is to present the evaluation of some of the widely used machine learning techniques used to detect some of the most threatening cyber threats to the cyberspace. Three primary machine learning techniques are mainly investigated, including deep belief network, decision tree and support vector machine. We have presented a brief exploration to gauge the performance of these machine learning techniques in the spam detection, intrusion detection and malware detection based on frequently used and benchmark datasets.

2021-10-12

Jayabalan, Manoj. 2020. Towards an Approach of Risk Analysis in Access Control. 2020 13th International Conference on Developments in eSystems Engineering (DeSE). :287–292.

Information security provides a set of mechanisms to be implemented in the organisation to protect the disclosure of data to the unauthorised person. Access control is the primary security component that allows the user to authorise the consumption of resources and data based on the predefined permissions. However, the access rules are static in nature, which does not adapt to the dynamic environment includes but not limited to healthcare, cloud computing, IoT, National Security and Intelligence Arena and multi-centric system. There is a need for an additional countermeasure in access decision that can adapt to those working conditions to assess the threats and to ensure privacy and security are maintained. Risk analysis is an act of measuring the threats to the system through various means such as, analysing the user behaviour, evaluating the user trust, and security policies. It is a modular component that can be integrated into the existing access control to predict the risk. This study presents the different techniques and approaches applied for risk analysis in access control. Based on the insights gained, this paper formulates the taxonomy of risk analysis and properties that will allow researchers to focus on areas that need to be improved and new features that could be beneficial to stakeholders.

Adibi, Mahya, van der Woude, Jacob. 2020. Distributed Learning Control for Economic Power Dispatch: A Privacy Preserved Approach*. 2020 IEEE 29th International Symposium on Industrial Electronics (ISIE). :821–826.

We present a privacy-preserving distributed reinforcement learning-based control scheme to address the problem of frequency control and economic dispatch in power generation systems. The proposed control approach requires neither a priori system model knowledge nor the mathematical formulation of the generation cost functions. Due to not requiring the generation cost models, the control scheme is capable of dealing with scenarios in which the cost functions are hard to formulate and/or non-convex. Furthermore, it is privacy-preserving, i.e. none of the units in the network needs to communicate its cost function and/or control policy to its neighbors. To realize this, we propose an actor-critic algorithm with function approximation in which the actor step is performed individually by each unit with no need to infer the policies of others. Moreover, in the critic step each generation unit shares its estimate of the local measurements and the estimate of its cost function with the neighbors, and via performing a consensus algorithm, a consensual estimate is achieved. The performance of our proposed control scheme, in terms of minimizing the overall cost while persistently fulfilling the demand and fast reaction and convergence of our distributed algorithm, is demonstrated on a benchmark case study.

Ferraro, Angelo. 2020. When AI Gossips. 2020 IEEE International Symposium on Technology and Society (ISTAS). :69–71.

The concept of AI Gossip is presented. It is analogous to the traditional understanding of a pernicious human failing. It is made more egregious by the technology of AI, internet, current privacy policies, and practices. The recognition by the technological community of its complacency is critical to realizing its damaging influence on human rights. A current example from the medical field is provided to facilitate the discussion and illustrate the seriousness of AI Gossip. Further study and model development is encouraged to support and facilitate the need to develop standards to address the implications and consequences to human rights and dignity.

Zhou, Yimin, Zhang, Kai. 2020. DoS Vulnerability Verification of IPSec VPN. 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). :698–702.

This paper analyzes the vulnerability in the process of key negotiation between the main mode and aggressive mode of IKEv1 protocol in IPSec VPN, and proposes a DOS attack method based on OSPF protocol adjacent route spoofing. The experiment verifies the insecurity of IPSec VPN using IKEv1 protocol. This attack method has the advantages of lower cost and easier operation compared with using botnet.

Vinarskii, Evgenii, Demakov, Alexey, Kamkin, Alexander, Yevtushenko, Nina. 2020. Verifying cryptographic protocols by Tamarin Prover. 2020 Ivannikov Memorial Workshop (IVMEM). :69–75.

Cryptographic protocols are utilized for establishing a secure session between “honest” agents which communicate strictly according to the protocol rules as well as for ensuring the authenticated and confidential transmission of messages. The specification of a cryptographic protocol is usually presented as a set of requirements for the sequences of transmitted messages including the format of such messages. Note that protocol can describe several execution scenarios. All these requirements lead to a huge formal specification for a real cryptographic protocol and therefore, it is difficult to verify the security of the whole cryptographic protocol at once. In this paper, to overcome this problem, we suggest verifying the protocol security for its fragments. Namely, we verify the security properties for a special set of so-called traces of the cryptographic protocol. Intuitively, a trace of the cryptographic protocol is a sequence of computations, value checks, and transmissions on the sides of “honest” agents permitted by the protocol. In order to choose such set of traces, we introduce an Adversary model and the notion of a similarity relation for traces. We then verify the security properties of selected traces with Tamarin Prover. Experimental results for the EAP and Noise protocols clearly show that this approach can be promising for automatic verification of large protocols.

Hassan, Mehmood, Sultan, Aiman, Awan, Ali Afzal, Tahir, Shahzaib, Ihsan, Imran. 2020. An Enhanced and Secure Multiserver-based User Authentication Protocol. 2020 International Conference on Cyber Warfare and Security (ICCWS). :1–6.

The extensive use of the internet and web-based applications spot the multiserver authentication as a significant component. The users can get their services after authenticating with the service provider by using similar registration records. Various protocol schemes are developed for multiserver authentication, but the existing schemes are not secure and often lead towards various vulnerabilities and different security issues. Recently, Zhao et al. put forward a proposal for smart card and user's password-based authentication protocol for the multiserver environment and showed that their proposed protocol is efficient and secure against various security attacks. This paper points out that Zhao et al.'s authentication scheme is susceptive to traceability as well as anonymity attacks. Thus, it is not feasible for the multiserver environment. Furthermore, in their scheme, it is observed that a user while authenticating does not send any information with any mention of specific server identity. Therefore, this paper proposes an enhanced, efficient and secure user authentication scheme for use in any multiserver environment. The formal security analysis and verification of the protocol is performed using state-of-the-art tool “ProVerif” yielding that the proposed scheme provides higher levels of security.

Kai, Wang, Wei, Li, Tao, Chen, Longmei, Nan. 2020. Research on Secure JTAG Debugging Model Based on Schnorr Identity Authentication Protocol. 2020 IEEE 15th International Conference on Solid-State Integrated Circuit Technology (ICSICT). :1–3.

As a general interface for chip system testing and on-chip debugging, JTAG is facing serious security threats. By analyzing the typical JTAG attack model and security protection measures, this paper designs a secure JTAG debugging model based on Schnorr identity authentication protocol, and takes RISCV as an example to build a set of SoC prototype system to complete functional verification. Experiments show that this secure JTAG debugging model has high security, flexible implementation, and good portability. It can meet the JTAG security protection requirements in various application scenarios. The maximum clock frequency can reach 833MHZ, while the hardware overhead is only 47.93KGate.

Li, Yongjian, Cao, Taifeng, Jansen, David N., Pang, Jun, Wei, Xiaotao. 2020. Accelerated Verification of Parametric Protocols with Decision Trees. 2020 IEEE 38th International Conference on Computer Design (ICCD). :397–404.

Within a framework for verifying parametric network protocols through induction, one needs to find invariants based on a protocol instance of a small number of nodes. In this paper, we propose a new approach to accelerate parameterized verification by adopting decision trees to represent the state space of a protocol instance. Such trees can be considered as a knowledge base that summarizes all behaviors of the protocol instance. With this knowledge base, we are able to efficiently construct an oracle to effectively assess candidates of invariants of the protocol, which are suggested by an invariant finder. With the discovered invariants, a formal proof for the correctness of the protocol can be derived in the framework after proper generalization. The effectiveness of our method is demonstrated by experiments with typical benchmarks.

He, Leifeng, Liu, Guanjun. 2020. Petri Nets Based Verification of Epistemic Logic and Its Application on Protocols of Privacy and Security. 2020 IEEE World Congress on Services (SERVICES). :25–28.

Epistemic logic can specify many design requirements of privacy and security of multi-agent systems (MAS). The existing model checkers of epistemic logic use some programming languages to describe MAS, induce Kripke models as the behavioral representation of MAS, apply Ordered Binary Decision Diagrams (OBDD) to encode Kripke models to solve their state explosion problem and verify epistemic logic based on the encoded Kripke models. However, these programming languages are usually non-intuitive. More seriously, their OBDD-based model checking processes are often time-consuming due to their dynamic variable ordering for OBDD. Therefore, we define Knowledge-oriented Petri Nets (KPN) to intuitively describe MAS, induce similar reachability graphs as the behavioral representation of KPN, apply OBDD to encode all reachable states, and finally verify epistemic logic. Although we also use OBDD, we adopt a heuristic method for the computation of a static variable order instead of dynamic variable ordering. More importantly, while verifying an epistemic formula, we dynamically generate its needed similar relations, which makes our model checking process much more efficient. In this paper, we introduce our work.

Remlein, Piotr, Rogacki, Mikołaj, Stachowiak, Urszula. 2020. Tamarin software – the tool for protocols verification security. 2020 Baltic URSI Symposium (URSI). :118–123.

In order to develop safety-reliable standards for IoT (Internet of Things) networks, appropriate tools for their verification are needed. Among them there is a group of tools based on automated symbolic analysis. Such a tool is Tamarin software. Its usage for creating formal proofs of security protocols correctness has been presented in this paper using the simple example of an exchange of messages with asynchronous encryption between two agents. This model can be used in sensor networks or IoT e.g. in TLS protocol to provide a mechanism for secure cryptographic key exchange.

Naveed, Sarah, Sultan, Aiman, Mansoor, Khwaja. 2020. An Enhanced SIP Authentication Protocol for Preserving User Privacy. 2020 International Conference on Cyber Warfare and Security (ICCWS). :1–6.

Owing to the advancements in communication media and devices all over the globe, there has arisen a dire need for to limit the alarming number of attacks targeting these and to enhance their security. Multiple techniques have been incorporated in different researches and various protocols and schemes have been put forward to cater security issues of session initiation protocol (SIP). In 2008, Qiu et al. presented a proposal for SIP authentication which while effective than many existing schemes, was still found vulnerable to many security attacks. To overcome those issues, Zhang et al. proposed an authentication protocol. This paper presents the analysis of Zhang et al. authentication scheme and concludes that their proposed scheme is susceptible to user traceablity. It also presents an improved SIP authentication scheme that eliminates the possibility of traceability of user's activities. The proposed scheme is also verified by contemporary verification tool, ProVerif and it is found to be more secure, efficient and practical than many similar SIP authetication scheme.

Chang, Kai Chih, Nokhbeh Zaeem, Razieh, Barber, K. Suzanne. 2020. Is Your Phone You? How Privacy Policies of Mobile Apps Allow the Use of Your Personally Identifiable Information 2020 Second IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA). :256–262.

People continue to store their sensitive information in their smart-phone applications. Users seldom read an app's privacy policy to see how their information is being collected, used, and shared. In this paper, using a reference list of over 600 Personally Identifiable Information (PII) attributes, we investigate the privacy policies of 100 popular health and fitness mobile applications in both Android and iOS app markets to find the set of personal information these apps collect, use and share. The reference list of PII was independently built from a longitudinal study at The University of Texas investigating thousands of identity theft and fraud cases where PII attributes and associated value and risks were empirically quantified. This research leverages the reference PII list to identify and analyze the value of personal information collected by the mobile apps and the risk of disclosing this information. We found that the set of PII collected by these mobile apps covers 35% of the entire reference set of PII and, due to dependencies between PII attributes, these mobile apps have a likelihood of indirectly impacting 70% of the reference PII if breached. For a specific app, we discovered the monetary loss could reach \$1M if the set of sensitive data it collects is breached. We finally utilize Bayesian inference to measure risks of a set of PII gathered by apps: the probability that fraudsters can discover, impersonate and cause harm to the user by misusing only the PII the mobile apps collected.

Liao, Guocheng, Chen, Xu, Huang, Jianwei. 2020. Privacy Policy in Online Social Network with Targeted Advertising Business. IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. :934–943.

In an online social network, users exhibit personal information to enjoy social interaction. The social network provider (SNP) exploits users' information for revenue generation through targeted advertising. The SNP can present ads to proper users efficiently. Therefore, an advertiser is more willing to pay for targeted advertising. However, the over-exploitation of users' information would invade users' privacy, which would negatively impact users' social activeness. Motivated by this, we study the optimal privacy policy of the SNP with targeted advertising business. We characterize the privacy policy in terms of the fraction of users' information that the provider should exploit, and formulate the interactions among users, advertiser, and SNP as a three-stage Stackelberg game. By carefully leveraging supermodularity property, we reveal from the equilibrium analysis that higher information exploitation will discourage users from exhibiting information, lowering the overall amount of exploited information and harming advertising revenue. We further characterize the optimal privacy policy based on the connection between users' information levels and privacy policy. Numerical results reveal some useful insights that the optimal policy can well balance the users' trade-off between social benefit and privacy loss.

Faurie, Pascal, Moldovan, Arghir-Nicolae, Tal, Irina. 2020. Privacy Policy – ``I Agree''⁈ – Do Alternatives to Text-Based Policies Increase the Awareness of the Users? 2020 International Conference on Cyber Security and Protection of Digital Services (Cyber Security). :1–6.

Since GDPR was introduced, there is a reinforcement of the fact that users must give their consent before their personal data can be managed by any website. However, many studies have demonstrated that users often skip these policies and click the "I agree" button to continue browsing, being unaware of what the consent they gave was about, hence defeating the purpose of GDPR. This paper investigates if different ways of presenting users the privacy policy can change this behaviour and can lead to an increased awareness of the user in relation to what the user agrees with. Three different types of policies were used in the study: a full-text policy, a so-called usable policy, and a video-based policy. Results demonstrated that the type of policy has a direct influence on the user awareness and user satisfaction. The two alternatives to the text-based policy lead to a significant increase of user awareness in relation to the content of the policy and to a significant increase in the user satisfaction in relation to the usability of the policy.

Farooq, Emmen, Nawaz UI Ghani, M. Ahmad, Naseer, Zuhaib, Iqbal, Shaukat. 2020. Privacy Policies' Readability Analysis of Contemporary Free Healthcare Apps. 2020 14th International Conference on Open Source Systems and Technologies (ICOSST). :1–7.

mHealth apps have a vital role in facilitation of human health management. Users have to enter sensitive health related information in these apps to fully utilize their functionality. Unauthorized sharing of sensitive health information is undesirable by the users. mHealth apps also collect data other than that required for their functionality like surfing behavior of a user or hardware details of devices used. mHealth software and their developers also share such data with third parties for reasons other than medical support provision to the user, like advertisements of medicine and health insurance plans. Existence of a comprehensive and easy to understand data privacy policy, on user data acquisition, sharing and management is a salient requirement of modern user privacy protection demands. Readability is one parameter by which ease of understanding of privacy policy is determined. In this research, privacy policies of 27 free Android, medical apps are analyzed. Apps having user rating of 4.0 and downloads of 1 Million or more are included in data set of this research.RGL, Flesch-Kincaid Reading Grade Level, SMOG, Gunning Fox, Word Count, and Flesch Reading Ease of privacy policies are calculated. Average Reading Grade Level of privacy policies is 8.5. It is slightly greater than average adult RGL in the US. Free mHealth apps have a large number of users in other, less educated parts of the World. Privacy policies with an average RGL of 8.5 may be difficult to comprehend in less educated populations.

Al Omar, Abdullah, Jamil, Abu Kaisar, Nur, Md. Shakhawath Hossain, Hasan, Md Mahamudul, Bosri, Rabeya, Bhuiyan, Md Zakirul Alam, Rahman, Mohammad Shahriar. 2020. Towards A Transparent and Privacy-Preserving Healthcare Platform with Blockchain for Smart Cities. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1291–1296.

In smart cities, data privacy and security issues of Electronic Health Record(EHR) are grabbing importance day by day as cyber attackers have identified the weaknesses of EHR platforms. Besides, health insurance companies interacting with the EHRs play a vital role in covering the whole or a part of the financial risks of a patient. Insurance companies have specific policies for which patients have to pay them. Sometimes the insurance policies can be altered by fraudulent entities. Another problem that patients face in smart cities is when they interact with a health organization, insurance company, or others, they have to prove their identity to each of the organizations/companies separately. Health organizations or insurance companies have to ensure they know with whom they are interacting. To build a platform where a patient's personal information and insurance policy are handled securely, we introduce an application of blockchain to solve the above-mentioned issues. In this paper, we present a solution for the healthcare system that will provide patient privacy and transparency towards the insurance policies incorporating blockchain. Privacy of the patient information will be provided using cryptographic tools.

Martiny, Karsten, Denker, Grit. 2020. Partial Decision Overrides in a Declarative Policy Framework. 2020 IEEE 14th International Conference on Semantic Computing (ICSC). :271–278.

The ability to specify various policies with different overriding criteria allows for complex sets of sharing policies. This is particularly useful in situations in which data privacy depends on various properties of the data, and complex policies are needed to express the conditions under which data is protected. However, if overriding policy decisions constrain the affected data, decisions from overridden policies should not be suppressed completely, because they can still apply to subsets of the affected data. This article describes how a privacy policy framework can be extended with a mechanism to partially override decisions based on specified constraints. Our solution automatically generates complementary sets of decisions for both the overridden and the complementary, non-overridden subsets of the data, and thus, provides a means to specify a complex policies tailored to specific properties of the protected data.

Zaeem, Razieh Nokhbeh, Anya, Safa, Issa, Alex, Nimergood, Jake, Rogers, Isabelle, Shah, Vinay, Srivastava, Ayush, Barber, K. Suzanne. 2020. PrivacyCheck's Machine Learning to Digest Privacy Policies: Competitor Analysis and Usage Patterns. 2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT). :291–298.

Online privacy policies are lengthy and hard to comprehend. To address this problem, researchers have utilized machine learning (ML) to devise tools that automatically summarize online privacy policies for web users. One such tool is our free and publicly available browser extension, PrivacyCheck. In this paper, we enhance PrivacyCheck by adding a competitor analysis component-a part of PrivacyCheck that recommends other organizations in the same market sector with better privacy policies. We also monitored the usage patterns of about a thousand actual PrivacyCheck users, the first work to track the usage and traffic of an ML-based privacy analysis tool. Results show: (1) there is a good number of privacy policy URLs checked repeatedly by the user base; (2) the users are particularly interested in privacy policies of software services; and (3) PrivacyCheck increased the number of times a user consults privacy policies by 80%. Our work demonstrates the potential of ML-based privacy analysis tools and also sheds light on how these tools are used in practice to give users actionable knowledge they can use to pro-actively protect their privacy.