Bibliography
Data volumes, and the analyses performed on them, grow daily due to the pervasiveness of computing devices. Yet people are reluctant to share their information through online portals or surveys, fearing for their safety: sensitive information such as credit card numbers, medical conditions, and other personal details can be dangerous in the wrong hands. Privacy preservation has therefore become an obstacle to storing data in repositories, so data in the repository should be made indistinguishable: it is encrypted while stored and decrypted later when needed for data-mining analysis. When storing individuals' raw data, it is important to remove person-identifiable information such as names and employee IDs, while the remaining attributes pertaining to the person should be encrypted using the methodologies implemented here. These methodologies can secure the data in the repository and make the privacy-preserving data mining (PPDM) task easier.
As data in industry and government sectors grows exponentially, the `7V' concept of big data aims to create new value by broadly collecting and analyzing information from many fields. At the same time, as the ICT-industry ecosystem matures, big-data utilization is threatened by privacy attacks such as infringement arising from the sheer volume of data. To manage and sustain a controllable privacy level, recommended de-identification techniques are needed. This paper examines those de-identification processes and three commonly used privacy models. Furthermore, it presents use cases in which these technologies can be adopted and outlines future development directions.
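One of the most commonly used privacy models in de-identification work is k-anonymity, which requires every record to be indistinguishable from at least k-1 others on its quasi-identifiers. A minimal sketch, assuming invented attributes, bucket sizes, and records purely for illustration:

```python
from collections import Counter

def generalize(record):
    """Generalize quasi-identifiers: bucket the age, truncate the ZIP code."""
    age, zip_code, diagnosis = record
    decade = (age // 10) * 10
    return (f"{decade}-{decade + 9}", zip_code[:3] + "**", diagnosis)

def k_anonymity(records):
    """k = size of the smallest group sharing the same quasi-identifiers."""
    groups = Counter((age, z) for age, z, _ in records)
    return min(groups.values())

raw = [(34, "12345", "flu"),  (37, "12399", "cold"),
       (52, "54321", "flu"),  (58, "54377", "asthma")]
sanitized = [generalize(r) for r in raw]
print(k_anonymity(sanitized))  # -> 2: every generalized group has 2 members
```

Stronger models such as l-diversity additionally constrain the sensitive attribute (here, the diagnosis) within each group.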
In order to protect individuals' privacy, data have to be "well-sanitized" before sharing, i.e., any personal information has to be removed. However, it is not always clear when data should be deemed well-sanitized. In this paper, we argue that the evaluation of sanitized data should be based on whether the data allow the inference of sensitive information specific to an individual, instead of being centered on the concept of re-identification. We propose a framework to evaluate the effectiveness of different sanitization techniques on a given dataset by measuring how much an individual's record in the sanitized dataset influences the inference of his/her own sensitive attribute. Our intent is not to accurately predict any sensitive attribute but rather to measure the impact of a single record on the inference of sensitive information. We demonstrate our approach by sanitizing two real datasets under different privacy models and evaluating and comparing each sanitized dataset in our framework.
In the era of an ever-growing number of smart devices, fraudulent practices through phishing websites have become an increasingly severe threat to modern computers and internet security. These websites are designed to steal personal information and spread over the internet without the user's knowledge. They give a false impression of authenticity by mirroring real, trusted web pages, which leads to the loss of important user credentials. Detecting such fraudulent websites is therefore essential. In this paper, various classifiers were considered, and ensemble classifiers were found to predict with the greatest efficiency. The underlying question was whether a combined classifier model performs better than a single classifier model, leading to better efficiency and accuracy. For experimentation, three meta-classifiers, namely AdaBoostM1, Stacking, and Bagging, were taken into consideration for performance comparison. It was found that meta-classifiers built by combining simple classifiers outperform the simple classifiers themselves.
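The combined-versus-single-classifier idea can be caricatured with a toy majority-vote combiner. The URL features, thresholds, and base rules below are hypothetical; the paper's actual meta-classifiers (AdaBoostM1, Stacking, Bagging) combine learned models rather than hand-written rules:

```python
from collections import Counter

# Three toy base classifiers, each voting on one hypothetical URL feature.
def long_url(f):    return "phish" if f["length"] > 75 else "legit"
def has_at_sign(f): return "phish" if f["at_sign"] else "legit"
def uses_https(f):  return "legit" if f["https"] else "phish"

def ensemble(features, voters=(long_url, has_at_sign, uses_https)):
    """Majority vote over the base classifiers' predictions."""
    votes = Counter(v(features) for v in voters)
    return votes.most_common(1)[0][0]

suspect = {"length": 120, "at_sign": True, "https": True}
print(ensemble(suspect))  # -> "phish": two of the three voters flag it
```

The intuition carries over: an ensemble can be right even when individual voters (here, `uses_https`) are fooled.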
Nowadays, mobile devices have become one of the most popular instruments in a person's everyday life, mainly due to the importance of their applications. In that context, mobile devices store the user's personal information and more, becoming a personal tracker of daily activities that provides important information about the user. Many tools are available to extract this information from mobile devices, with the limitation that each tool only provides isolated information about a specific application or activity. Therefore, the present work proposes a tool that allows investigators to obtain a complete report and timeline of the activities performed on the device, incorporating the information provided by many sources into a single data set. An example is also presented to illustrate the operation of the solution, demonstrating the feasibility of the tool and the way investigators should apply it.
Nowadays, data security becomes more and more important, as people store a great deal of personal information on their phones. This stored information requires security and privacy, and encryption algorithms have become the main means of providing them. Thus, algorithmic complexity and encryption efficiency have become the main measures of whether an encryption algorithm is safe and practical. With the development of hardware, we now have many tools to improve such algorithms. Because modular exponentiation in the RSA algorithm can be divided into several parts mathematically, this paper introduces an approach that splits the encryption process and offloads it to the graphics processing unit (GPU). Using the GPU's capacity for parallel computing, the core of RSA can be accelerated by combining the central processing unit (CPU) and the GPU. Compute Unified Device Architecture (CUDA) is a platform that combines the CPU and GPU to enable GPU parallel programming, and it is the tool we use for our experiments on accelerating the RSA algorithm. The paper also builds a mathematical model to help explain the mechanism of RSA encryption.
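The core operation being parallelized is modular exponentiation. A serial square-and-multiply sketch, using a toy keypair from the primes 61 and 53 for illustration (real RSA keys are 2048 bits or more; the GPU version partitions this work, which the sketch does not show):

```python
def modexp(base, exp, mod):
    """Square-and-multiply modular exponentiation, the core RSA operation."""
    result, base = 1, base % mod
    while exp:
        if exp & 1:
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    return result

# Toy keypair from primes p=61, q=53: n=3233, e=17, d=2753.
n, e, d = 61 * 53, 17, 2753
cipher = modexp(42, e, n)    # encrypt the message 42
print(modexp(cipher, d, n))  # -> 42, the original message recovered
```

Processing one squaring per bit of the exponent is what makes the operation expensive for large keys, and hence a natural target for hardware acceleration.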
Phishing, one of the best-known cybercrime activities, is a deception of online users to steal their personal or confidential information by impersonating a legitimate website. Several machine-learning-based strategies have been proposed to detect phishing websites. These techniques depend on the features extracted from website samples. However, few studies have actually considered efficient feature selection for detecting phishing attacks. In this work, we investigate an agreement on the definitive features that should be used in phishing detection. We apply Fuzzy Rough Set (FRS) theory as a tool to select the most effective features from three benchmark data sets. The selected features are fed into three commonly used classifiers for phishing detection. To evaluate FRS feature selection for developing generalizable phishing detection, the classifiers are trained on a separate out-of-sample data set of 14,000 website samples. The maximum F-measure gained by FRS feature selection is 95%, using Random Forest classification. Also, there are 9 universal features selected by FRS across all three data sets. The F-measure using this universal feature set is approximately 93%, which is comparable to the full FRS performance. Since the universal feature set contains no features from third-party services, this finding implies that, without any queries to external sources, we can achieve faster phishing detection that is also robust against zero-day attacks.
Phishing is a security attack to acquire personal information such as passwords, credit card details, or other account details of a user by means of websites or emails. Phishing websites look similar to legitimate ones, which makes it difficult for a layperson to differentiate between them. As per the reports of the Anti-Phishing Working Group (APWG) published in December 2018, phishing against banking services and payment processors was high. Almost all phishing URLs use HTTPS and use redirects to avoid detection. This paper presents a focused literature survey of methods available to detect phishing websites. A comparative study of in-use anti-phishing tools was carried out and their limitations were acknowledged. We analyzed the URL-based features used in the past and improved their definitions as per the current scenario, which is our major contribution. Also, a stepwise procedure for designing an anti-phishing model is discussed in order to construct an efficient framework, which adds to our contribution. Observations made in this study are stated along with recommendations on existing systems.
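URL-based features of the kind such surveys discuss can be extracted with the standard library alone. The feature names and thresholds below are illustrative simplifications, not the survey's exact definitions:

```python
from urllib.parse import urlparse

def url_features(url):
    """A simplified, illustrative set of URL-based phishing features."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "uses_https": parsed.scheme == "https",
        "has_at_sign": "@" in url,          # '@' can hide the real host
        "num_dots": host.count("."),        # many subdomains is suspicious
        "long_url": len(url) > 75,
        "has_ip_host": host.replace(".", "").isdigit(),
    }

feats = url_features("https://secure-login.example.com.evil.io/account")
print(feats["num_dots"])  # -> 4: the pile of subdomains hints at impersonation
```

Note that `uses_https` is true here, illustrating the survey's point that HTTPS alone no longer signals legitimacy.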
Personal data are compiled and harnessed by a great number of establishments to carry out their legal activities, and establishments are legally bound to maintain the confidentiality and security of personal data. Hence it is a requirement to provide access logs for personal information. Depending on needs and capacity, personal data can be exposed to users via platforms such as file systems, databases, and web services. The web-service platform is a popular alternative since it is autonomous and can isolate the data source from the user. In this paper, the logging of personal data accessed via web services is discussed. As an alternative to the classical method, in which logs are recorded and saved by client applications, a mechanism for forming a central audit log with an API manager is investigated. By constructing a model policy to exemplify the central logging method, its advantages and disadvantages are explored. It is concluded that this model can be employed for centrally recording audit logs.
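The central-logging idea can be sketched as a wrapper sitting between the client and the data source. The log store, handler, and names below are stand-ins invented for illustration, not the paper's implementation or any API manager's real interface:

```python
import time

AUDIT_LOG = []  # stands in for the API manager's central log store

def audited(handler):
    """Record every personal-data access centrally before invoking the service."""
    def wrapper(user, resource):
        AUDIT_LOG.append({"ts": time.time(), "user": user, "resource": resource})
        return handler(user, resource)
    return wrapper

@audited
def get_person(user, resource):
    return {"resource": resource, "data": "..."}  # data source stays isolated

get_person("clerk01", "/persons/42")
print(len(AUDIT_LOG))  # -> 1: access logged without relying on the client
```

The design point is that the log entry is created at the gateway layer, so client applications cannot forget (or choose not) to record their accesses.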
Recently, IoT, 5G mobile, big data, and artificial intelligence have been increasingly used in the real world. These technologies converge in cyber-physical systems (CPS), which connect cyberspace with physical space and require core technologies that ensure reliability, real-time operation, safety, autonomy, and security. Attacks launched in cyberspace spill over into the real world and can cause substantial damage. The personal information handled in CPS is highly confidential, so policies and techniques are needed to guard against attacks in advance: if a CPS is attacked, not only personal information but also national confidential data can be leaked. To prevent this, risk is measured using the Factor Analysis of Information Risk (FAIR) model, which can measure risk by factor for situational awareness in a CPS environment. To reduce risk by preventing attacks in CPS, this paper measures risk after applying the concept of Crime Prevention Through Environmental Design (CPTED).
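FAIR decomposes risk into frequency and magnitude factors, which makes a before/after comparison straightforward to compute. A minimal numeric sketch, with invented frequencies, vulnerabilities, loss figures, and CPTED effect size:

```python
def fair_risk(threat_event_freq, vulnerability, loss_magnitude):
    """Simplified FAIR: loss-event frequency = threat-event frequency x
    vulnerability; risk = loss-event frequency x loss magnitude."""
    return threat_event_freq * vulnerability * loss_magnitude

baseline   = fair_risk(12, 0.50, 50_000)  # 12 attempts/yr, half succeed
with_cpted = fair_risk(12, 0.25, 50_000)  # assume CPTED halves vulnerability
print(baseline, with_cpted)  # -> 300000.0 150000.0
```

The full FAIR taxonomy breaks each factor down further (e.g. contact frequency, probability of action, resistance strength), but the multiplicative structure is the same.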
Today's emerging Industrial Internet of Things (IIoT) scenarios are characterized by the exchange of data between services across enterprises. Traditional access and usage control mechanisms are only able to determine whether data may be used by a subject, but lack an understanding of how it may be used. The ability to control how data is processed is, however, crucial for enterprises to guarantee (and provide evidence of) compliant processing of critical data, as well as for users who need to control whether their private data may be analyzed or linked with additional information - a major concern in IoT applications processing personal information. In this paper, we introduce LUCON, a data-centric security policy framework for distributed systems that considers data flows by controlling how messages may be routed across services and how they are combined and processed. LUCON policies prevent information leaks, bind data usage to obligations, and enforce data flows across services. Policy enforcement is based on a dynamic taint analysis at runtime and an upfront static verification of message routes against policies. We discuss the semantics of these two complementary enforcement models and illustrate how LUCON policies are compiled from a simple policy language into a first-order logic representation. We demonstrate the practical application of LUCON in a real-world IoT middleware and discuss its integration into Apache Camel. Finally, we evaluate the runtime impact of LUCON and discuss performance and scalability aspects.
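The flow-control idea can be caricatured in a few lines: messages carry taint labels, and routing checks those labels against a per-service policy. The policy table and service names below are invented; LUCON's real enforcement combines runtime taint propagation with static verification of routes, neither of which this sketch captures:

```python
POLICY = {"analytics": {"pii"}}  # service -> taint labels it must not receive

def route(message, service):
    """Deliver a message unless one of its taint labels is forbidden there."""
    if message["labels"] & POLICY.get(service, set()):
        raise PermissionError(f"{service} may not process {message['labels']}")
    return f"delivered to {service}"

msg = {"payload": "owner + sensor data", "labels": {"pii"}}
print(route(msg, "dashboard"))  # no rule for dashboard -> delivered
# route(msg, "analytics") would raise PermissionError: pii is blocked there
```

The key shift from classical access control is that the decision follows the data (its labels) rather than only the identity of the requesting subject.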
Many companies within the Internet of Things (IoT) sector rely on the personal data of users to deliver and monetize their services, creating a high demand for personal information. A user can be seen as making a series of transactions, each involving the exchange of personal data for a service. In this paper, we argue that privacy can be described quantitatively, using the game-theoretic concept of value of information (VoI), enabling us to assess whether each exchange is an advantageous one for the user. We introduce PrivacyGate, an extension to the Android operating system built for the purpose of studying privacy of IoT transactions. An example study, and its initial results, are provided to illustrate its capabilities.
Learning analytics open up a complex landscape of privacy and policy issues, which, in turn, influence how learning analytics systems and practices are designed. Research and development is governed by regulations for data storage and management, and by research ethics. Consequently, when moving solutions out of the research labs, implementers meet constraints defined in national laws and justified in privacy frameworks. This paper explores how the OECD, APEC, and EU privacy frameworks seek to regulate data privacy, with significant implications for the discourse of learning and, ultimately, an impact on the design of tools, architectures, and practices that are now on the drawing board. A detailed list of requirements for learning analytics systems is developed, based on the new legal requirements defined in the European General Data Protection Regulation, which will be enforced as European law from 2018. The paper also gives an initial account of how the privacy discourse in Europe, Japan, South Korea, and China is developing, and reflects upon the possible impact of the different privacy frameworks on the design of LA privacy solutions in these countries. This research contributes to knowledge of how concerns about privacy and data protection related to educational data can drive a discourse on new approaches to privacy engineering based on the principles of Privacy by Design. For the LAK community, this study represents the first attempt to conceptualise the issues of privacy and learning analytics in a cross-cultural context. The paper concludes with a plan to follow up this research on privacy policies and learning analytics systems development with a new international study.
Wireless sensor networks (WSNs) are implemented in various Internet-of-Things applications such as energy management systems. As these applications may involve personal information, they must be protected from attackers attempting to read information or control network devices, which makes research on WSN security essential. Studies in this domain propose solutions against such attacks, but they focus mainly on the security measures themselves rather than on the ease of implementing them in WSNs. In this paper, we propose a coordination middleware that provides an environment for constructing updatable WSNs for security. The middleware is based on LINC, a rule-based coordination middleware. The proposed approach allows the development of WSNs that attach or detach security modules when required. As case studies, we implemented three security modules on LINC and on a real network, and we evaluated the implementation costs while comparing the case studies.
Sensitive data such as text messages, contact lists, and personal information are stored on mobile devices. This makes authentication of paramount importance. More security is needed on mobile devices since, after point-of-entry authentication, the user can perform almost all tasks without having to re-authenticate. For this reason, many authentication methods have been suggested to improve the security of mobile devices in a transparent and continuous manner, providing a basis for convenient and secure user re-authentication. This paper presents a comprehensive analysis and literature review of transparent authentication systems for mobile device security. The review indicates a need to investigate when to authenticate the mobile user, by focusing on the sensitivity level of the application and understanding whether a given application requires protection or not.
The amount of personal information contributed by individuals to digital repositories such as social network sites has grown substantially. The existence of this data offers unprecedented opportunities for data analytics research in various domains of societal importance, including medicine and public policy. The results of these analyses can be considered a public good which benefits data contributors as well as individuals who do not make their data available. At the same time, the release of personal information carries perceived and actual privacy risks for the contributors. Our research addresses this problem area. In our work, we study a game-theoretic model in which individuals take control over participation in data analytics projects in two ways: 1) individuals can contribute data at a self-chosen level of precision, and 2) individuals can decide whether they want to contribute at all. From the analyst's perspective, we investigate the degree to which the analyst has flexibility to set requirements for data precision so that individuals are still willing to contribute to the project and the quality of the estimation improves. We study this tradeoff scenario for populations of homogeneous and heterogeneous individuals, and determine Nash equilibria that reflect the optimal level of participation and precision of contributions. We further prove that the analyst can substantially increase the accuracy of the analysis by imposing a lower bound on the precision of the data that users can reveal.
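The precision tradeoff can be mocked up with a simple pooled-mean model: each contributor perturbs the true value with noise at a self-chosen standard deviation, and the analyst averages the reports. The noise model and all figures below are illustrative assumptions, not the paper's mechanism or results:

```python
import random

def pooled_estimate(true_value, noise_sds, seed=0):
    """Each contributor reports true_value + Gaussian noise at a self-chosen
    standard deviation; the analyst averages the reports."""
    rng = random.Random(seed)
    reports = [true_value + rng.gauss(0, sd) for sd in noise_sds]
    return sum(reports) / len(reports)

chosen_sds = [0.5, 2.0, 8.0, 20.0, 40.0]  # self-chosen precision levels
unbounded = pooled_estimate(50.0, chosen_sds)
capped    = pooled_estimate(50.0, [min(sd, 5.0) for sd in chosen_sds])
# Capping the noise (i.e. imposing a lower bound on precision) tends to pull
# the pooled estimate closer to the true value of 50.0.
```

The paper's contribution is to show when such a lower bound is compatible with individuals' equilibrium willingness to contribute at all, which this sketch does not model.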