Biblio

Found 350 results

Filters: Keyword is data mining  [Clear All Filters]
2015-05-05
Koch, S., John, M., Worner, M., Muller, A., Ertl, T..  2014.  VarifocalReader #x2014; In-Depth Visual Analysis of Large Text Documents. Visualization and Computer Graphics, IEEE Transactions on. 20:1723-1732.

Interactive visualization provides valuable support for exploring, analyzing, and understanding textual documents. Certain tasks, however, require that insights derived from visual abstractions are verified by a human expert perusing the source text. So far, this problem is typically solved by offering overview-detail techniques, which present different views with different levels of abstractions. This often leads to problems with visual continuity. Focus-context techniques, on the other hand, succeed in accentuating interesting subsections of large text documents but are normally not suited for integrating visual abstractions. With VarifocalReader we present a technique that helps to solve some of these approaches' problems by combining characteristics from both. In particular, our method simplifies working with large and potentially complex text documents by simultaneously offering abstract representations of varying detail, based on the inherent structure of the document, and access to the text itself. In addition, VarifocalReader supports intra-document exploration through advanced navigation concepts and facilitates visual analysis tasks. The approach enables users to apply machine learning techniques and search mechanisms as well as to assess and adapt these techniques. This helps to extract entities, concepts and other artifacts from texts. In combination with the automatic generation of intermediate text levels through topic segmentation for thematic orientation, users can test hypotheses or develop interesting new research questions. To illustrate the advantages of our approach, we provide usage examples from literature studies.

2015-05-06
Sung-Hwan Ahn, Nam-Uk Kim, Tai-Myoung Chung.  2014.  Big data analysis system concept for detecting unknown attacks. Advanced Communication Technology (ICACT), 2014 16th International Conference on. :269-272.

Recently, threat of previously unknown cyber-attacks are increasing because existing security systems are not able to detect them. Past cyber-attacks had simple purposes of leaking personal information by attacking the PC or destroying the system. However, the goal of recent hacking attacks has changed from leaking information and destruction of services to attacking large-scale systems such as critical infrastructures and state agencies. In the other words, existing defence technologies to counter these attacks are based on pattern matching methods which are very limited. Because of this fact, in the event of new and previously unknown attacks, detection rate becomes very low and false negative increases. To defend against these unknown attacks, which cannot be detected with existing technology, we propose a new model based on big data analysis techniques that can extract information from a variety of sources to detect future attacks. We expect our model to be the basis of the future Advanced Persistent Threat(APT) detection and prevention system implementations.

Sumit, S., Mitra, D., Gupta, D..  2014.  Proposed Intrusion Detection on ZRP based MANET by effective k-means clustering method of data mining. Optimization, Reliabilty, and Information Technology (ICROIT), 2014 International Conference on. :156-160.

Mobile Ad-Hoc Networks (MANET) consist of peer-to-peer infrastructure less communicating nodes that are highly dynamic. As a result, routing data becomes more challenging. Ultimately routing protocols for such networks face the challenges of random topology change, nature of the link (symmetric or asymmetric) and power requirement during data transmission. Under such circumstances both, proactive as well as reactive routing are usually inefficient. We consider, zone routing protocol (ZRP) that adds the qualities of the proactive (IARP) and reactive (IERP) protocols. In ZRP, an updated topological map of zone centered on each node, is maintained. Immediate routes are available inside each zone. In order to communicate outside a zone, a route discovery mechanism is employed. The local routing information of the zones helps in this route discovery procedure. In MANET security is always an issue. It is possible that a node can turn malicious and hamper the normal flow of packets in the MANET. In order to overcome such issue we have used a clustering technique to separate the nodes having intrusive behavior from normal behavior. We call this technique as effective k-means clustering which has been motivated from k-means. We propose to implement Intrusion Detection System on each node of the MANET which is using ZRP for packet flow. Then we will use effective k-means to separate the malicious nodes from the network. Thus, our Ad-Hoc network will be free from any malicious activity and normal flow of packets will be possible.

Sumit, S., Mitra, D., Gupta, D..  2014.  Proposed Intrusion Detection on ZRP based MANET by effective k-means clustering method of data mining. Optimization, Reliabilty, and Information Technology (ICROIT), 2014 International Conference on. :156-160.

Mobile Ad-Hoc Networks (MANET) consist of peer-to-peer infrastructure less communicating nodes that are highly dynamic. As a result, routing data becomes more challenging. Ultimately routing protocols for such networks face the challenges of random topology change, nature of the link (symmetric or asymmetric) and power requirement during data transmission. Under such circumstances both, proactive as well as reactive routing are usually inefficient. We consider, zone routing protocol (ZRP) that adds the qualities of the proactive (IARP) and reactive (IERP) protocols. In ZRP, an updated topological map of zone centered on each node, is maintained. Immediate routes are available inside each zone. In order to communicate outside a zone, a route discovery mechanism is employed. The local routing information of the zones helps in this route discovery procedure. In MANET security is always an issue. It is possible that a node can turn malicious and hamper the normal flow of packets in the MANET. In order to overcome such issue we have used a clustering technique to separate the nodes having intrusive behavior from normal behavior. We call this technique as effective k-means clustering which has been motivated from k-means. We propose to implement Intrusion Detection System on each node of the MANET which is using ZRP for packet flow. Then we will use effective k-means to separate the malicious nodes from the network. Thus, our Ad-Hoc network will be free from any malicious activity and normal flow of packets will be possible.

Pathan, A.C., Potey, M.A..  2014.  Detection of Malicious Transaction in Database Using Log Mining Approach. Electronic Systems, Signal Processing and Computing Technologies (ICESC), 2014 International Conference on. :262-265.

Data mining is the process of finding correlations in the relational databases. There are different techniques for identifying malicious database transactions. Many existing approaches which profile is SQL query structures and database user activities to detect intrusion, the log mining approach is the automatic discovery for identifying anomalous database transactions. Mining of the Data is very helpful to end users for extracting useful business information from large database. Multi-level and multi-dimensional data mining are employed to discover data item dependency rules, data sequence rules, domain dependency rules, and domain sequence rules from the database log containing legitimate transactions. Database transactions that do not comply with the rules are identified as malicious transactions. The log mining approach can achieve desired true and false positive rates when the confidence and support are set up appropriately. The implemented system incrementally maintain the data dependency rule sets and optimize the performance of the intrusion detection process.
 

2015-04-30
Godwin, J.L., Matthews, P..  2014.  Rapid labelling of SCADA data to extract transparent rules using RIPPER. Reliability and Maintainability Symposium (RAMS), 2014 Annual. :1-7.

This paper addresses a robust methodology for developing a statistically sound, robust prognostic condition index and encapsulating this index as a series of highly accurate, transparent, human-readable rules. These rules can be used to further understand degradation phenomena and also provide transparency and trust for any underlying prognostic technique employed. A case study is presented on a wind turbine gearbox, utilising historical supervisory control and data acquisition (SCADA) data in conjunction with a physics of failure model. Training is performed without failure data, with the technique accurately identifying gearbox degradation and providing prognostic signatures up to 5 months before catastrophic failure occurred. A robust derivation of the Mahalanobis distance is employed to perform outlier analysis in the bivariate domain, enabling the rapid labelling of historical SCADA data on independent wind turbines. Following this, the RIPPER rule learner was utilised to extract transparent, human-readable rules from the labelled data. A mean classification accuracy of 95.98% of the autonomously derived condition was achieved on three independent test sets, with a mean kappa statistic of 93.96% reported. In total, 12 rules were extracted, with an independent domain expert providing critical analysis, two thirds of the rules were deemed to be intuitive in modelling fundamental degradation behaviour of the wind turbine gearbox.

Biedermann, S., Ruppenthal, T., Katzenbeisser, S..  2014.  Data-centric phishing detection based on transparent virtualization technologies. Privacy, Security and Trust (PST), 2014 Twelfth Annual International Conference on. :215-223.

We propose a novel phishing detection architecture based on transparent virtualization technologies and isolation of the own components. The architecture can be deployed as a security extension for virtual machines (VMs) running in the cloud. It uses fine-grained VM introspection (VMI) to extract, filter and scale a color-based fingerprint of web pages which are processed by a browser from the VM's memory. By analyzing the human perceptual similarity between the fingerprints, the architecture can reveal and mitigate phishing attacks which are based on redirection to spoofed web pages and it can also detect “Man-in-the-Browser” (MitB) attacks. To the best of our knowledge, the architecture is the first anti-phishing solution leveraging virtualization technologies. We explain details about the design and the implementation and we show results of an evaluation with real-world data.

2015-05-05
Jandel, M., Svenson, P., Johansson, R..  2014.  Fusing restricted information. Information Fusion (FUSION), 2014 17th International Conference on. :1-9.

Information fusion deals with the integration and merging of data and information from multiple (heterogeneous) sources. In many cases, the information that needs to be fused has security classification. The result of the fusion process is then by necessity restricted with the strictest information security classification of the inputs. This has severe drawbacks and limits the possible dissemination of the fusion results. It leads to decreased situational awareness: the organization knows information that would enable a better situation picture, but since parts of the information is restricted, it is not possible to distribute the most correct situational information. In this paper, we take steps towards defining fusion and data mining processes that can be used even when all the underlying data that was used cannot be disseminated. The method we propose here could be used to produce a classifier where all the sensitive information has been removed and where it can be shown that an antagonist cannot even in principle obtain knowledge about the classified information by using the classifier or situation picture.
 

Kaci, A., Kamwa, I., Dessaint, L.-A., Guillon, S..  2014.  Phase angles as predictors of network dynamic security limits and further implications. PES General Meeting | Conference Exposition, 2014 IEEE. :1-6.

In the United States, the number of Phasor Measurement Units (PMU) will increase from 166 networked devices in 2010 to 1043 in 2014. According to the Department of Energy, they are being installed in order to “evaluate and visualize reliability margin (which describes how close the system is to the edge of its stability boundary).” However, there is still a lot of debate in academia and industry around the usefulness of phase angles as unambiguous predictors of dynamic stability. In this paper, using 4-year of actual data from Hydro-Québec EMS, it is shown that phase angles enable satisfactory predictions of power transfer and dynamic security margins across critical interface using random forest models, with both explanation level and R-squares accuracy exceeding 99%. A generalized linear model (GLM) is next implemented to predict phase angles from day-ahead to hour-ahead time frames, using historical phase angles values and load forecast. Combining GLM based angles forecast with random forest mapping of phase angles to power transfers result in a new data-driven approach for dynamic security monitoring.
 

Marchal, S., Xiuyan Jiang, State, R., Engel, T..  2014.  A Big Data Architecture for Large Scale Security Monitoring. Big Data (BigData Congress), 2014 IEEE International Congress on. :56-63.

Network traffic is a rich source of information for security monitoring. However the increasing volume of data to treat raises issues, rendering holistic analysis of network traffic difficult. In this paper we propose a solution to cope with the tremendous amount of data to analyse for security monitoring perspectives. We introduce an architecture dedicated to security monitoring of local enterprise networks. The application domain of such a system is mainly network intrusion detection and prevention, but can be used as well for forensic analysis. This architecture integrates two systems, one dedicated to scalable distributed data storage and management and the other dedicated to data exploitation. DNS data, NetFlow records, HTTP traffic and honeypot data are mined and correlated in a distributed system that leverages state of the art big data solution. Data correlation schemes are proposed and their performance are evaluated against several well-known big data framework including Hadoop and Spark.

Lomotey, R.K., Deters, R..  2014.  Terms Mining in Document-Based NoSQL: Response to Unstructured Data. Big Data (BigData Congress), 2014 IEEE International Congress on. :661-668.

Unstructured data mining has become topical recently due to the availability of high-dimensional and voluminous digital content (known as "Big Data") across the enterprise spectrum. The Relational Database Management Systems (RDBMS) have been employed over the past decades for content storage and management, but, the ever-growing heterogeneity in today's data calls for a new storage approach. Thus, the NoSQL database has emerged as the preferred storage facility nowadays since the facility supports unstructured data storage. This creates the need to explore efficient data mining techniques from such NoSQL systems since the available tools and frameworks which are designed for RDBMS are often not directly applicable. In this paper, we focused on topics and terms mining, based on clustering, in document-based NoSQL. This is achieved by adapting the architectural design of an analytics-as-a-service framework and the proposal of the Viterbi algorithm to enhance the accuracy of the terms classification in the system. The results from the pilot testing of our work show higher accuracy in comparison to some previously proposed techniques such as the parallel search.

2015-05-06
Mokhtar, B., Eltoweissy, M..  2014.  Towards a Data Semantics Management System for Internet Traffic. New Technologies, Mobility and Security (NTMS), 2014 6th International Conference on. :1-5.

Although current Internet operations generate voluminous data, they remain largely oblivious of traffic data semantics. This poses many inefficiencies and challenges due to emergent or anomalous behavior impacting the vast array of Internet elements such as services and protocols. In this paper, we propose a Data Semantics Management System (DSMS) for learning Internet traffic data semantics to enable smarter semantics- driven networking operations. We extract networking semantics and build and utilize a dynamic ontology of network concepts to better recognize and act upon emergent or abnormal behavior. Our DSMS utilizes: (1) Latent Dirichlet Allocation algorithm (LDA) for latent features extraction and semantics reasoning; (2) big tables as a cloud-like data storage technique to maintain large-scale data; and (3) Locality Sensitive Hashing algorithm (LSH) for reducing data dimensionality. Our preliminary evaluation using real Internet traffic shows the efficacy of DSMS for learning behavior of normal and abnormal traffic data and for accurately detecting anomalies at low cost.
 

2015-05-05
Haoliang Lou, Yunlong Ma, Feng Zhang, Min Liu, Weiming Shen.  2014.  Data mining for privacy preserving association rules based on improved MASK algorithm. Computer Supported Cooperative Work in Design (CSCWD), Proceedings of the 2014 IEEE 18th International Conference on. :265-270.

With the arrival of the big data era, information privacy and security issues become even more crucial. The Mining Associations with Secrecy Konstraints (MASK) algorithm and its improved versions were proposed as data mining approaches for privacy preserving association rules. The MASK algorithm only adopts a data perturbation strategy, which leads to a low privacy-preserving degree. Moreover, it is difficult to apply the MASK algorithm into practices because of its long execution time. This paper proposes a new algorithm based on data perturbation and query restriction (DPQR) to improve the privacy-preserving degree by multi-parameters perturbation. In order to improve the time-efficiency, the calculation to obtain an inverse matrix is simplified by dividing the matrix into blocks; meanwhile, a further optimization is provided to reduce the number of scanning database by set theory. Both theoretical analyses and experiment results prove that the proposed DPQR algorithm has better performance.
 

Haoliang Lou, Yunlong Ma, Feng Zhang, Min Liu, Weiming Shen.  2014.  Data mining for privacy preserving association rules based on improved MASK algorithm. Computer Supported Cooperative Work in Design (CSCWD), Proceedings of the 2014 IEEE 18th International Conference on. :265-270.

With the arrival of the big data era, information privacy and security issues become even more crucial. The Mining Associations with Secrecy Konstraints (MASK) algorithm and its improved versions were proposed as data mining approaches for privacy preserving association rules. The MASK algorithm only adopts a data perturbation strategy, which leads to a low privacy-preserving degree. Moreover, it is difficult to apply the MASK algorithm into practices because of its long execution time. This paper proposes a new algorithm based on data perturbation and query restriction (DPQR) to improve the privacy-preserving degree by multi-parameters perturbation. In order to improve the time-efficiency, the calculation to obtain an inverse matrix is simplified by dividing the matrix into blocks; meanwhile, a further optimization is provided to reduce the number of scanning database by set theory. Both theoretical analyses and experiment results prove that the proposed DPQR algorithm has better performance.
 

2015-04-30
Srivastava, M..  2014.  In Sensors We Trust – A Realistic Possibility? Distributed Computing in Sensor Systems (DCOSS), 2014 IEEE International Conference on. :1-1.

Sensors of diverse capabilities and modalities, carried by us or deeply embedded in the physical world, have invaded our personal, social, work, and urban spaces. Our relationship with these sensors is a complicated one. On the one hand, these sensors collect rich data that are shared and disseminated, often initiated by us, with a broad array of service providers, interest groups, friends, and family. Embedded in this data is information that can be used to algorithmically construct a virtual biography of our activities, revealing intimate behaviors and lifestyle patterns. On the other hand, we and the services we use, increasingly depend directly and indirectly on information originating from these sensors for making a variety of decisions, both routine and critical, in our lives. The quality of these decisions and our confidence in them depend directly on the quality of the sensory information and our trust in the sources. Sophisticated adversaries, benefiting from the same technology advances as the sensing systems, can manipulate sensory sources and analyze data in subtle ways to extract sensitive knowledge, cause erroneous inferences, and subvert decisions. The consequences of these compromises will only amplify as our society increasingly complex human-cyber-physical systems with increased reliance on sensory information and real-time decision cycles.Drawing upon examples of this two-faceted relationship with sensors in applications such as mobile health and sustainable buildings, this talk will discuss the challenges inherent in designing a sensor information flow and processing architecture that is sensitive to the concerns of both producers and consumer. For the pervasive sensing infrastructure to be trusted by both, it must be robust to active adversaries who are deceptively extracting private information, manipulating beliefs and subverting decisions. While completely solving these challenges would require a new science of resilient, secure and trustworthy networked sensing and decision systems that would combine hitherto disciplines of distributed embedded systems, network science, control theory, security, behavioral science, and game theory, this talk will provide some initial ideas. These include an approach to enabling privacy-utility trade-offs that balance the tension between risk of information sharing to the producer and the value of information sharing to the consumer, and method to secure systems against physical manipulation of sensed information.

2015-05-05
Srivastava, M..  2014.  In Sensors We Trust – A Realistic Possibility? Distributed Computing in Sensor Systems (DCOSS), 2014 IEEE International Conference on. :1-1.

Sensors of diverse capabilities and modalities, carried by us or deeply embedded in the physical world, have invaded our personal, social, work, and urban spaces. Our relationship with these sensors is a complicated one. On the one hand, these sensors collect rich data that are shared and disseminated, often initiated by us, with a broad array of service providers, interest groups, friends, and family. Embedded in this data is information that can be used to algorithmically construct a virtual biography of our activities, revealing intimate behaviors and lifestyle patterns. On the other hand, we and the services we use, increasingly depend directly and indirectly on information originating from these sensors for making a variety of decisions, both routine and critical, in our lives. The quality of these decisions and our confidence in them depend directly on the quality of the sensory information and our trust in the sources. Sophisticated adversaries, benefiting from the same technology advances as the sensing systems, can manipulate sensory sources and analyze data in subtle ways to extract sensitive knowledge, cause erroneous inferences, and subvert decisions. The consequences of these compromises will only amplify as our society increasingly complex human-cyber-physical systems with increased reliance on sensory information and real-time decision cycles.Drawing upon examples of this two-faceted relationship with sensors in applications such as mobile health and sustainable buildings, this talk will discuss the challenges inherent in designing a sensor information flow and processing architecture that is sensitive to the concerns of both producers and consumer. For the pervasive sensing infrastructure to be trusted by both, it must be robust to active adversaries who are deceptively extracting private information, manipulating beliefs and subverting decisions. While completely solving these challenges would require a new science of resilient, secure and trustworthy networked sensing and decision systems that would combine hitherto disciplines of distributed embedded systems, network science, control theory, security, behavioral science, and game theory, this talk will provide some initial ideas. These include an approach to enabling privacy-utility trade-offs that balance the tension between risk of information sharing to the producer and the value of information sharing to the consumer, and method to secure systems against physical manipulation of sensed information.
 

Eun Hee Ko, Klabjan, D..  2014.  Semantic Properties of Customer Sentiment in Tweets. Advanced Information Networking and Applications Workshops (WAINA), 2014 28th International Conference on. :657-663.

An increasing number of people are using online social networking services (SNSs), and a significant amount of information related to experiences in consumption is shared in this new media form. Text mining is an emerging technique for mining useful information from the web. We aim at discovering in particular tweets semantic patterns in consumers' discussions on social media. Specifically, the purposes of this study are twofold: 1) finding similarity and dissimilarity between two sets of textual documents that include consumers' sentiment polarities, two forms of positive vs. negative opinions and 2) driving actual content from the textual data that has a semantic trend. The considered tweets include consumers' opinions on US retail companies (e.g., Amazon, Walmart). Cosine similarity and K-means clustering methods are used to achieve the former goal, and Latent Dirichlet Allocation (LDA), a popular topic modeling algorithm, is used for the latter purpose. This is the first study which discover semantic properties of textual data in consumption context beyond sentiment analysis. In addition to major findings, we apply LDA (Latent Dirichlet Allocations) to the same data and drew latent topics that represent consumers' positive opinions and negative opinions on social media.

Kaci, A., Kamwa, I., Dessaint, L.A., Guillon, S..  2014.  Synchrophasor Data Baselining and Mining for Online Monitoring of Dynamic Security Limits. Power Systems, IEEE Transactions on. 29:2681-2695.

When the system is in normal state, actual SCADA measurements of power transfers across critical interfaces are continuously compared with limits determined offline and stored in look-up tables or nomograms in order to assess whether the network is secure or insecure and inform the dispatcher to take preventive action in the latter case. However, synchrophasors could change this paradigm by enabling new features, the phase-angle differences, which are well-known measures of system stress, with the added potential to increase system visibility. The paper develops a systematic approach to baseline the phase-angles versus actual transfer limits across system interfaces and enable synchrophasor-based situational awareness (SBSA). Statistical methods are first used to determine seasonal exceedance levels of angle shifts that can allow real-time scoring and detection of atypical conditions. Next, key buses suitable for SBSA are identified using correlation and partitioning around medoid (PAM) clustering. It is shown that angle shifts of this subset of 15% of the network backbone buses can be effectively used as features in ensemble decision tree-based forecasting of seasonal security margins across critical interfaces.
 

2015-05-04
Yun Shen, Thonnard, O..  2014.  MR-TRIAGE: Scalable multi-criteria clustering for big data security intelligence applications. Big Data (Big Data), 2014 IEEE International Conference on. :627-635.

Security companies have recently realised that mining massive amounts of security data can help generate actionable intelligence and improve their understanding of Internet attacks. In particular, attack attribution and situational understanding are considered critical aspects to effectively deal with emerging, increasingly sophisticated Internet attacks. This requires highly scalable analysis tools to help analysts classify, correlate and prioritise security events, depending on their likely impact and threat level. However, this security data mining process typically involves a considerable amount of features interacting in a non-obvious way, which makes it inherently complex. To deal with this challenge, we introduce MR-TRIAGE, a set of distributed algorithms built on MapReduce that can perform scalable multi-criteria data clustering on large security data sets and identify complex relationships hidden in massive datasets. The MR-TRIAGE workflow is made of a scalable data summarisation, followed by scalable graph clustering algorithms in which we integrate multi-criteria evaluation techniques. Theoretical computational complexity of the proposed parallel algorithms are discussed and analysed. The experimental results demonstrate that the algorithms can scale well and efficiently process large security datasets on commodity hardware. Our approach can effectively cluster any type of security events (e.g., spam emails, spear-phishing attacks, etc) that are sharing at least some commonalities among a number of predefined features.
 

2015-05-01
Lu Wang, Yung, N.H.C., Lisheng Xu.  2014.  Multiple-Human Tracking by Iterative Data Association and Detection Update. Intelligent Transportation Systems, IEEE Transactions on. 15:1886-1899.

Multiple-object tracking is an important task in automated video surveillance. In this paper, we present a multiple-human-tracking approach that takes the single-frame human detection results as input and associates them to form trajectories while improving the original detection results by making use of reliable temporal information in a closed-loop manner. It works by first forming tracklets, from which reliable temporal information is extracted, and then refining the detection responses inside the tracklets, which also improves the accuracy of tracklets' quantities. After this, local conservative tracklet association is performed and reliable temporal information is propagated across tracklets so that more detection responses can be refined. The global tracklet association is done last to resolve association ambiguities. Experimental results show that the proposed approach improves both the association and detection results. Comparison with several state-of-the-art approaches demonstrates the effectiveness of the proposed approach.

2015-05-05
Mittal, D., Kaur, D., Aggarwal, A..  2014.  Secure Data Mining in Cloud Using Homomorphic Encryption. Cloud Computing in Emerging Markets (CCEM), 2014 IEEE International Conference on. :1-7.

With the advancement in technology, industry, e-commerce and research a large amount of complex and pervasive digital data is being generated which is increasing at an exponential rate and often termed as big data. Traditional Data Storage systems are not able to handle Big Data and also analyzing the Big Data becomes a challenge and thus it cannot be handled by traditional analytic tools. Cloud Computing can resolve the problem of handling, storage and analyzing the Big Data as it distributes the big data within the cloudlets. No doubt, Cloud Computing is the best answer available to the problem of Big Data storage and its analyses but having said that, there is always a potential risk to the security of Big Data storage in Cloud Computing, which needs to be addressed. Data Privacy is one of the major issues while storing the Big Data in a Cloud environment. Data Mining based attacks, a major threat to the data, allows an adversary or an unauthorized user to infer valuable and sensitive information by analyzing the results generated from computation performed on the raw data. This thesis proposes a secure k-means data mining approach assuming the data to be distributed among different hosts preserving the privacy of the data. The approach is able to maintain the correctness and validity of the existing k-means to generate the final results even in the distributed environment.
 

Sanger, J., Richthammer, C., Hassan, S., Pernul, G..  2014.  Trust and Big Data: A Roadmap for Research. Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on. :278-282.

We are currently living in the age of Big Data coming along with the challenge to grasp the golden opportunities at hand. This mixed blessing also dominates the relation between Big Data and trust. On the one side, large amounts of trust-related data can be utilized to establish innovative data-driven approaches for reputation-based trust management. On the other side, this is intrinsically tied to the trust we can put in the origins and quality of the underlying data. In this paper, we address both sides of trust and Big Data by structuring the problem domain and presenting current research directions and inter-dependencies. Based on this, we define focal issues which serve as future research directions for the track to our vision of Next Generation Online Trust within the FORSEC project.
 

2021-02-08
Geetha, C. R., Basavaraju, S., Puttamadappa, C..  2013.  Variable load image steganography using multiple edge detection and minimum error replacement method. 2013 IEEE Conference on Information Communication Technologies. :53—58.

This paper proposes a steganography method using the digital images. Here, we are embedding the data which is to be secured into the digital image. Human Visual System proved that the changes in the image edges are insensitive to human eyes. Therefore we are using edge detection method in steganography to increase data hiding capacity by embedding more data in these edge pixels. So, if we can increase number of edge pixels, we can increase the amount of data that can be hidden in the image. To increase the number of edge pixels, multiple edge detection is employed. Edge detection is carried out using more sophisticated operator like canny operator. To compensate for the resulting decrease in the PSNR because of increase in the amount of data hidden, Minimum Error Replacement [MER] method is used. Therefore, the main goal of image steganography i.e. security with highest embedding capacity and good visual qualities are achieved. To extract the data we need the original image and the embedding ratio. Extraction is done by taking multiple edges detecting the original image and the data is extracted corresponding to the embedding ratio.

Wang Xiao, Mi Hong, Wang Wei.  2010.  Inner edge detection of PET bottle opening based on the Balloon Snake. 2010 2nd International Conference on Advanced Computer Control. 4:56—59.

Edge detection of bottle opening is a primary section to the machine vision based bottle opening detection system. This paper, taking advantage of the Balloon Snake, on the PET (Polyethylene Terephthalate) images sampled at rotating bottle-blowing machine producing pipelines, extracts the opening. It first uses the grayscale weighting average method to calculate the centroid as the initial position of Snake and then based on the energy minimal theory, it extracts the opening. Experiments show that compared with the conventional edge detection and center location methods, Balloon Snake is robust and can easily step over the weak noise points. Edge extracted thorough Balloon Snake is more integral and continuous which provides a guarantee to correctly judge the opening.

2018-07-06
Du, Xiaojiang.  2004.  Using k-nearest neighbor method to identify poison message failure. IEEE Global Telecommunications Conference, 2004. GLOBECOM '04. 4:2113–2117Vol.4.

Poison message failure is a mechanism that has been responsible for large scale failures in both telecommunications and IP networks. The poison message failure can propagate in the network and cause an unstable network. We apply a machine learning, data mining technique in the network fault management area. We use the k-nearest neighbor method to identity the poison message failure. We also propose a "probabilistic" k-nearest neighbor method which outputs a probability distribution about the poison message. Through extensive simulations, we show that the k-nearest neighbor method is very effective in identifying the responsible message type.