Biblio

Found 200 results

Filters: Keyword is Data analysis
2017-03-07
Kim, J., Moon, I., Lee, K., Suh, S. C., Kim, I.  2015.  Scalable Security Event Aggregation for Situation Analysis. 2015 IEEE First International Conference on Big Data Computing Service and Applications. :14–23.

Cyber-attacks have evolved to become more sophisticated, employing combinations of attack methodologies with greater impact. For instance, Advanced Persistent Threats (APTs) employ a set of stealthy hacking processes running over a long period of time, making them much harder to detect. With this trend, big-data security analytics has attracted greater attention, since identifying such attacks requires large-scale data processing and analysis. In this paper, we present SEAS-MR (Security Event Aggregation System over MapReduce), which facilitates scalable security event aggregation for comprehensive situation analysis. The system provides three core functions: (i) periodic aggregation, (ii) on-demand aggregation, and (iii) query support for effective analysis. We describe our design and implementation of the system over MapReduce and high-level query languages, and report experimental results collected on a Hadoop cluster under extensive settings to evaluate performance and design impacts.
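
As a concrete illustration of the kind of periodic aggregation such a system performs, here is a minimal pure-Python sketch of a MapReduce-style count of security events per time window, source, and event type; the event fields and window size are illustrative assumptions, not the paper's actual schema.

```python
from collections import defaultdict

# Toy security events: (timestamp, source_ip, event_type).
events = [
    (1000, "10.0.0.5", "port_scan"),
    (1030, "10.0.0.5", "port_scan"),
    (1100, "10.0.0.9", "login_fail"),
    (4200, "10.0.0.5", "login_fail"),
]

WINDOW = 3600  # one-hour windows as a stand-in for periodic aggregation

def mapper(event):
    ts, src, etype = event
    # Key on (time window, source, event type); emit a count of 1.
    return ((ts // WINDOW, src, etype), 1)

def reducer(key, values):
    return (key, sum(values))

# The shuffle/sort phase, simulated with a dict.
shuffled = defaultdict(list)
for key, value in map(mapper, events):
    shuffled[key].append(value)

for (window, src, etype), count in sorted(reducer(k, v) for k, v in shuffled.items()):
    print(f"window={window} src={src} type={etype} count={count}")
```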

Kumar, B., Kumar, P., Mundra, A., Kabra, S.  2015.  DC scanner: Detecting phishing attack. 2015 Third International Conference on Image Information Processing (ICIIP). :271–276.

Data mining has been used in various applications in engineering, the sciences, and elsewhere to analyze system data and solve problems, and its applications extend to detecting cyber-attacks. We present a simple, low-effort approach, akin to data mining, that detects email-based phishing attacks. The work digs into the HTML content of emails and the web pages they reference; the domains and domain-authority details of these links, as well as the script code associated with the web pages, are analyzed to estimate the probability of a phishing attack.
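
A minimal sketch of this kind of link analysis, using only the Python standard library: it extracts links from an email's HTML body and flags anchors whose visible text names a different domain than the actual href target. The heuristic is our illustrative assumption, not the authors' exact rule set.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkExtractor(HTMLParser):
    """Collect (href, anchor-text) pairs from an email's HTML body."""
    def __init__(self):
        super().__init__()
        self.links, self._href = [], None
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
    def handle_data(self, data):
        if self._href:
            self.links.append((self._href, data.strip()))
            self._href = None

def looks_suspicious(href, text):
    """Anchor text resembles a URL whose domain differs from the real
    href domain -- a classic phishing tell."""
    href_dom = urlparse(href).netloc.lower()
    text_dom = urlparse(text if "://" in text else "http://" + text).netloc.lower()
    return bool(text_dom) and "." in text_dom and text_dom != href_dom

parser = LinkExtractor()
parser.feed('<a href="http://evil.example.ru/login">www.mybank.com</a>')
for href, text in parser.links:
    print(href, "->", "suspicious" if looks_suspicious(href, text) else "ok")
```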

2017-02-27
Dou, Huijing, Bian, Tingting.  2015.  An effective information filtering method based on the LTE network. 2015 4th International Conference on Computer Science and Network Technology (ICCSNT). 01:1428–1432.

With the rapid development of information technology, more and more high-speed networks have emerged, and the 4G LTE network, as a recent arrival, has gradually entered the mainstream of communication networks. This paper proposes an effective content-based information filtering method for the 4G LTE high-speed network that combines a content-based filter with a traditional simple filter. First, raw information is pre-processed by a five-tuple filter. Second, we determine the topics and character of the source data by k-nearest-neighbor text classification after minimum-risk Bayesian classification. Finally, an improved AdaBoost algorithm performs the four-level content-based information filtering. The experiments show that the method can be applied to network security, big-data analysis, and other fields, and has high research and market value.
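
A toy stand-in for the final AdaBoost stage of such a pipeline, assuming scikit-learn is available; the training snippets and labels are invented, and the five-tuple and Bayesian stages are omitted for brevity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline

train_docs = ["free crypto wallet giveaway", "quarterly network maintenance notice",
              "click to claim your prize", "minutes from the security meeting"]
train_labels = [1, 0, 1, 0]  # 1 = filter out, 0 = pass through

# TF-IDF features feeding boosted decision stumps.
clf = make_pipeline(TfidfVectorizer(), AdaBoostClassifier(n_estimators=50))
clf.fit(train_docs, train_labels)
print(clf.predict(["claim your free prize now"]))  # expected: [1]
```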

Latif, Z. A., Mohamad, M. H.  2015.  Mapping of Dengue Outbreak Distribution Using Spatial Statistics and Geographical Information System. 2015 2nd International Conference on Information Science and Security (ICISS). :1–6.

This study presents a spatial analysis of a Dengue Fever (DF) outbreak using a Geographic Information System (GIS) in the state of Selangor, Malaysia. DF is an Aedes mosquito-borne disease. The aim of the study is to map the spread of the DF outbreak in Selangor and to identify high-risk areas by producing a risk map with GIS tools. The data used were DF cases in 2012 obtained from the Ministry of Health, Malaysia. The analysis was carried out using Moran's I, Average Nearest Neighbor (ANN), Kernel Density Estimation (KDE), and buffer analysis in GIS. The Moran's I analysis indicates that the distribution pattern of DF in Selangor is clustered. The ANN analysis, by contrast, shows a dispersed pattern, with a ratio greater than 1. The third analysis used KDE to locate hot spots; it shows that several districts are classified as high-risk areas, namely Ampang, Damansara, Kapar, Kajang, Klang, Semenyih, Sungai Buloh, and Petaling. The buffer analysis, over areas ranging from 200 m to 500 m above sea level, shows a clustered pattern in which the most frequent cases in the year occur at the same locations. The study demonstrates that spatial statistics, spatial interpolation, and buffer analysis, with the aid of GIS, can be used to control and locate DF infection.
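
For readers unfamiliar with the first of these statistics, a small numpy sketch of global Moran's I over a toy binary contiguity matrix; the case counts and weights are invented, and the study itself used GIS tooling.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x under spatial weight matrix w."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return len(x) * (w * np.outer(z, z)).sum() / (w.sum() * (z ** 2).sum())

# Four areas along a line, each bordering the next (binary contiguity).
cases = [120, 110, 20, 15]  # hypothetical dengue case counts per area
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(round(morans_i(cases, w), 3))  # ~0.376: positive, i.e. clustered
```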

Chessa, M., Grossklags, J., Loiseau, P.  2015.  A Game-Theoretic Study on Non-monetary Incentives in Data Analytics Projects with Privacy Implications. 2015 IEEE 28th Computer Security Foundations Symposium. :90–104.

The amount of personal information contributed by individuals to digital repositories such as social network sites has grown substantially. This data offers unprecedented opportunities for data analytics research in domains of societal importance, including medicine and public policy, and the results of these analyses can be considered a public good that benefits data contributors as well as individuals who do not make their data available. At the same time, the release of personal information carries perceived and actual privacy risks for the contributors. Our research addresses this problem area. We study a game-theoretic model in which individuals control their participation in data analytics projects in two ways: 1) they can contribute data at a self-chosen level of precision, and 2) they can decide whether to contribute at all. From the analyst's perspective, we investigate to what degree the analyst has flexibility to set requirements for data precision so that individuals are still willing to contribute and the quality of the estimation improves. We study this tradeoff for populations of homogeneous and heterogeneous individuals, and determine Nash equilibria that reflect the optimal level of participation and precision of contributions. We further prove that the analyst can substantially increase the accuracy of the analysis by imposing a lower bound on the precision of the data that users can reveal.
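
A quick Monte-Carlo illustration (not the paper's equilibrium analysis) of why a precision floor helps: with a precision-weighted mean estimator, raising the lowest self-chosen precisions shrinks the estimator's variance. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, n = 5.0, 1000

def estimator_variance(precisions, trials=2000):
    """Monte-Carlo variance of the precision-weighted mean when user i
    reports true_mean plus noise of variance 1/precision_i."""
    lam = np.asarray(precisions, dtype=float)
    w = lam / lam.sum()
    estimates = [(w * (true_mean + rng.normal(0, 1 / np.sqrt(lam)))).sum()
                 for _ in range(trials)]
    return np.var(estimates)

self_chosen = rng.uniform(0.1, 1.0, n)     # freely chosen precisions
floored = np.clip(self_chosen, 0.5, None)  # analyst-imposed lower bound
print(estimator_variance(self_chosen), estimator_variance(floored))
# The floored population yields the smaller variance.
```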

2017-03-07
Hu, Zhiyong, Baynard, C. W., Hu, Hongda, Fazio, M.  2015.  GIS mapping and spatial analysis of cybersecurity attacks on a Florida university. 2015 23rd International Conference on Geoinformatics. :1–5.

As centers of knowledge, discovery, and intellectual exploration, US universities present appealing cybersecurity targets. Cyberattack origin patterns and relationships are not evident until the data are visualized in maps and tested with statistical models. The cybersecurity threat detection software used by the University of North Florida's IT department records large numbers of attacks and attempted intrusions by the minute. This paper presents GIS mapping and spatial analysis of cybersecurity attacks on UNF. First, the locations of cyberattack origins were detected with geographic Internet Protocol (GEO-IP) software. Second, GIS was used to map the cyberattack origin locations. Third, we used advanced spatial statistical analysis functions (exploratory spatial data analysis and spatial point pattern analysis) and R software to explore cyberattack patterns. The spatial perspective we promote is novel because few studies have employed location analytics and spatial statistics in cyberattack detection and prevention research.
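
A minimal sketch of one point-pattern statistic used in such analyses, the average nearest-neighbour (ANN) ratio, assuming scipy; the coordinates are toy stand-ins for geolocated attack origins, and the study itself used R.

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy geolocated attack origins as (lon, lat) points: two tight clusters.
pts = np.array([[-81.6, 30.3], [-81.5, 30.2],
                [116.4, 39.9], [116.5, 39.8], [116.3, 40.0]])

# Observed mean nearest-neighbour distance (k=2: self plus nearest).
d, _ = cKDTree(pts).query(pts, k=2)
observed = d[:, 1].mean()

# Expected mean distance under complete spatial randomness in the bbox.
area = np.ptp(pts[:, 0]) * np.ptp(pts[:, 1])
expected = 0.5 / np.sqrt(len(pts) / area)
print("ANN ratio:", observed / expected)  # << 1 suggests clustering
```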

2017-02-14
Oprea, A., Li, Z., Yen, T. F., Chin, S. H., Alrwais, S.  2015.  Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data. 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks. :45–56.

Recent years have seen the rise of sophisticated attacks, including advanced persistent threats (APTs), which pose severe risks to organizations and governments. Additionally, new malware strains appear at a higher rate than ever before. Since many of these strains evade existing security products, traditional defenses deployed by enterprises today often fail to detect infections at an early stage. We address the problem of detecting early-stage APT infection by proposing a new framework based on belief propagation, inspired by graph theory. We demonstrate that our techniques perform well on two large datasets. We achieve high accuracy on two months of DNS logs released by Los Alamos National Lab (LANL), which include APT infection attacks simulated by LANL domain experts. We also apply our algorithms to 38 TB of web proxy logs collected at the border of a large enterprise and identify hundreds of malicious domains overlooked by state-of-the-art security products.
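
A much-simplified score-propagation sketch in the spirit of (but not identical to) such a belief-propagation framework: suspicion flows from a seed domain to the hosts that contacted it and on to other domains those hosts touched. The graph and damping factors are invented.

```python
edges = [("hostA", "evil.example"), ("hostA", "cdn.example"),
         ("hostB", "evil.example"), ("hostC", "news.example")]
domain_score = {"evil.example": 1.0}  # known-bad seed domain
host_score = {}

for _ in range(3):  # a few propagation rounds
    # Hosts inherit a damped score from the domains they contacted...
    for h, d in edges:
        host_score[h] = max(host_score.get(h, 0.0), domain_score.get(d, 0.0) * 0.9)
    # ...and domains inherit a damped score from the hosts contacting them.
    for h, d in edges:
        domain_score[d] = max(domain_score.get(d, 0.0), host_score.get(h, 0.0) * 0.5)

print(sorted(domain_score.items()))  # cdn.example gains guilt by association
```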

2017-02-23
Ruan, H. M., Tsai, M. H., Huang, Y. N., Liao, Y. H., Lei, C. L.  2015.  Discovery of De-identification Policies Considering Re-identification Risks and Information Loss. 2015 10th Asia Joint Conference on Information Security. :69–76.

In data analysis, it is always difficult to strike a balance between the privacy and the applicability of the data. Because of the demand for individual privacy, data are obscured to a greater or lesser degree before being released or outsourced, to avoid possible privacy leakage; this process is called de-identification. In evaluating a de-identification policy, the two most important aspects are the re-identification risk and the information loss. In this paper, we introduce a novel policy search method that efficiently finds proper de-identification policies for an acceptable re-identification risk while retaining the information residing in the data. Using the UCI Machine Learning Repository as our real-world dataset, the computed re-identification risk reflects the true risk of the de-identified data under the de-identification policies. Moreover, using the proposed algorithm, one can efficiently obtain policies with higher information entropy.
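
A toy illustration of the trade-off being searched over: each policy generalizes the quasi-identifiers and is scored by worst-case re-identification risk and the entropy of the resulting groups. The records and policy encoding are illustrative assumptions.

```python
import math
from collections import Counter

# Quasi-identifiers: (birth year, sex, ZIP code).
records = [("1985", "F", "10001"), ("1985", "F", "10002"),
           ("1990", "M", "10001"), ("1990", "M", "10003")]

def generalize(rec, policy):
    """A policy here is the number of leading characters kept per field."""
    return tuple(value[:keep] for value, keep in zip(rec, policy))

def risk_and_entropy(policy):
    groups = Counter(generalize(r, policy) for r in records)
    risk = max(1 / c for c in groups.values())  # worst-case re-id probability
    n = len(records)
    entropy = -sum(c / n * math.log2(c / n) for c in groups.values())
    return risk, entropy

print(risk_and_entropy((4, 1, 3)))  # truncated ZIP: risk 0.5, entropy 1.0
print(risk_and_entropy((4, 1, 5)))  # full ZIP: risk 1.0 (all records unique)
```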

Ulusoy, H., Kantarcioglu, M., Thuraisingham, B., Khan, L.  2015.  Honeypot based unauthorized data access detection in MapReduce systems. 2015 IEEE International Conference on Intelligence and Security Informatics (ISI). :126–131.

The data processing capabilities of MapReduce systems, pioneered with the on-demand scalability of cloud computing, have enabled the Big Data revolution. However, data controllers and owners worry about the privacy and accountability implications of storing their data in cloud infrastructures, as existing cloud computing solutions provide very limited control over the underlying systems. The intuitive approach - encrypting data before uploading it to the cloud - is not applicable to MapReduce computation, because data analytics tasks are defined ad hoc in the MapReduce environment using general programming languages (e.g., Java), and homomorphic encryption methods that can scale to big data do not exist. In this paper, we address the challenges of determining and detecting unauthorized access to data stored in MapReduce-based cloud environments. To this end, we introduce alarm-raising honeypots distributed over the data that are never accessed by authorized MapReduce jobs, but only by attackers and/or unauthorized users. Our analysis shows that unauthorized data accesses can be detected with reasonable performance in MapReduce-based cloud environments.
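
A minimal sketch of the honeypot idea outside any real MapReduce runtime: plant records that no legitimate job should ever touch, then raise an alarm when one is read. The key names and monitoring hook are illustrative assumptions.

```python
import secrets

real = {f"user{i}": f"payload{i}" for i in range(4)}
honeypots = {f"user-{secrets.token_hex(4)}": "HONEYPOT" for _ in range(2)}
dataset = {**real, **honeypots}  # honeypots scattered among real records

def run_job(keys_requested):
    """Stand-in for a MapReduce job: returns the keys it actually read."""
    return [k for k in dataset if k in keys_requested]

def monitor(keys_read):
    hits = [k for k in keys_read if k in honeypots]
    if hits:
        print("ALARM: honeypot records accessed:", hits)

monitor(run_job({"user0", "user1"}))  # authorized scope: silent
monitor(run_job(set(dataset)))        # full scan trips the alarm
```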

2017-03-07
Adebayo, O. J., Suleiman, I., Ade, A. Y., Ganiyu, S. O., Alabi, I. O.  2015.  Digital Forensic analysis for enhancing information security. 2015 International Conference on Cyberspace (CYBER-Abuja). :38–44.

Digital Forensics is an area of forensic science that applies the scientific method to crime investigation. The thwarting of forensic evidence is known as anti-forensics, whose aim is ambiguous in that it can serve good or bad ends. The aim of this project is to simulate digital crime scenarios and carry out forensic and anti-forensic analysis to enhance security. The project uses several forensic and anti-forensic tools and techniques, and the data analyzed were obtained from the simulation results. The results reveal that although investigating digital crime can be difficult, it can be accomplished with the help of sophisticated forensic and anti-forensic tools.

2017-02-27
Njenga, K., Ndlovu, S.  2015.  Mobile banking and information security risks: Demand-side predilections of South African lead-users. 2015 Second International Conference on Information Security and Cyber Forensics (InfoSec). :86–92.

South African lead-users' predilection to tinker with and innovate mobile banking services is driven by various constructs. Advanced technologies have made mobile banking services easy to use, attractive, and beneficial. While this is welcome news to many, there are concerns that when lead-users tinker with these services, information security risks are exacerbated. The aim of this article is to present an insightful understanding of the demand-side predilections of South Africa's lead-users in such contexts. We assimilate the theory of Usage Control (UCON), the Technology Acceptance Model (TAM), and the Theory of Perceived Risk (TPR) to explain predilections over technology, and demonstrate that constructs derived from these theories can explain the general demand-side predilection to tinker with mobile banking services. A quantitative approach was used to test this. Data were collected from a sample of South African banking lead-users operating in the Gauteng province of South Africa and analysed with the help of a software package. We found, unexpectedly, that lead-users' predilection to tinker with mobile banking services was inhibited by perceived risk. Moreover, male lead-users dominated the tinkering process more than female lead-users. The implications are discussed and explained in the main body of the work.

2015-05-06
Khobragade, P. K., Malik, L. G.  2014.  Data Generation and Analysis for Digital Forensic Application Using Data Mining. Communication Systems and Network Technologies (CSNT), 2014 Fourth International Conference on. :458–462.

Cyber crime produces huge volumes of log and transactional data, which leaves plenty of data to store and analyze, and it is difficult for forensic investigators to devote enough time to finding clues in and analyzing all of it. Network forensic analysis involves network traces and the detection of attacks; the traces include Intrusion Detection System and firewall logs, logs generated by network services and applications, and packet captures by sniffers. Because a network generates data for every event or action, it is again difficult for investigators to find clues and analyze those data. Network forensics deals with the monitoring, capturing, recording, and analysis of network traffic in order to detect intrusions and investigate them. This paper focuses on data collection from the cyber system and the web browser. FTK 4.0 is discussed for memory forensic analysis and remote system forensics, the output of which is to be used as evidence for aiding investigation.
 

2015-05-05
Dey, L., Mahajan, D., Gupta, H.  2014.  Obtaining Technology Insights from Large and Heterogeneous Document Collections. Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on. 1:102–109.

Keeping up with rapid advances in research across various fields of engineering and technology is a challenging task. Decision makers, including academics, program managers, venture capital investors, industry leaders, and funding agencies, not only need to stay abreast of the latest developments but must also be able to assess the effect of growth in certain areas on their core business. Though analyst agencies like Gartner and McKinsey provide such reports for some areas, thought leaders still need to amass data from heterogeneous collections - research publications, analyst reports, patent applications, competitor information, and the like - to help them finalize their own strategies. Text mining and data analytics researchers have been looking at integrating statistics, text analytics, and information visualization to aid the process of retrieval and analytics. In this paper, we present our work on automated topical analysis and insight generation from large, heterogeneous text collections of publications and patents. While most earlier work in this area provides search-based platforms, ours is an integrated platform for search and analysis. We present several methods and techniques that help in the analysis and better comprehension of search results, along with methods for generating insights about emerging and popular research trends and about contextual differences between academic research and patenting profiles. We also present novel techniques for presenting topic evolution, helping users understand how a particular area has evolved over time.
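
A tiny sketch of the kind of term-trend signal such a platform surfaces, using only the standard library; the corpus is invented and the paper's topical analysis is of course far richer.

```python
from collections import Counter, defaultdict

corpus = [(2011, "sparse coding for image retrieval"),
          (2013, "deep learning for image retrieval"),
          (2014, "deep learning for speech"),
          (2014, "deep reinforcement learning")]

# Count term occurrences per publication year.
by_year = defaultdict(Counter)
for year, title in corpus:
    by_year[year].update(title.split())

for term in ("sparse", "deep"):
    trend = {year: counts[term] for year, counts in sorted(by_year.items())}
    print(term, trend)  # "deep" rises while "sparse" fades: an emerging topic
```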
 

2015-05-06
Jian Sun, Haitao Liao, Upadhyaya, B. R.  2014.  A Robust Functional-Data-Analysis Method for Data Recovery in Multichannel Sensor Systems. Cybernetics, IEEE Transactions on. 44:1420–1431.

Multichannel sensor systems are widely used in condition monitoring for effective failure prevention in critical equipment or processes. However, loss of sensor readings due to malfunctions of sensors and/or communication has long been a hurdle to reliable operation of such integrated systems. Moreover, asynchronous data sampling and/or limited data transmission are common across multiple sensor channels. To reliably perform fault diagnosis and prognosis in such operating environments, a data recovery method based on functional principal component analysis (FPCA) can be utilized. However, traditional FPCA methods are not robust to outliers, and their capabilities are limited in recovering signals with strongly skewed distributions (i.e., lacking symmetry). This paper provides a robust data-recovery method based on functional data analysis to enhance the reliability of multichannel sensor systems. The method not only considers the possibly skewed distribution of each channel's signal trajectories, but is also capable of recovering missing data for both individual and correlated sensor channels with asynchronous, possibly sparse data. In particular, grand median functions, rather than classical grand mean functions, are utilized for robust smoothing of sensor signals. Furthermore, the relationship between the functional scores of two correlated signals is modeled using multivariate functional regression to enhance the overall data-recovery capability. An experimental flow-control loop that mimics the operation of the coolant-flow loop in a multimodular integral pressurized water reactor is used to demonstrate the effectiveness and adaptability of the proposed method. The computational results illustrate that the proposed method is robust to outliers and more accurate than the existing FPCA-based method in recovering strongly skewed signals. In addition, turbofan engine data are analyzed to verify the method's capability to recover non-skewed signals.
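
A deliberately simplified numpy sketch of the cross-channel recovery idea: a dropout in one channel is reconstructed from a correlated channel via a fitted linear relation. The paper's method, robust FPCA with grand median functions and multivariate functional regression, is far more general.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
ch1 = np.sin(2 * np.pi * t) + rng.normal(0, 0.05, t.size)
ch2 = 0.8 * np.sin(2 * np.pi * t) + 0.1 + rng.normal(0, 0.05, t.size)
ch2[60:90] = np.nan  # simulated sensor dropout in channel 2

# Fit ch2 ~ a*ch1 + b where both channels are observed, then fill the gap.
ok = ~np.isnan(ch2)
a, b = np.polyfit(ch1[ok], ch2[ok], 1)
recovered = np.where(np.isnan(ch2), a * ch1 + b, ch2)

truth = 0.8 * np.sin(2 * np.pi * t[60:90]) + 0.1
print("max recovery error in gap:", np.abs(recovered[60:90] - truth).max())
```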
 


2015-05-05
Conglei Shi, Yingcai Wu, Shixia Liu, Hong Zhou, Huamin Qu.  2014.  LoyalTracker: Visualizing Loyalty Dynamics in Search Engines. Visualization and Computer Graphics, IEEE Transactions on. 20:1733-1742.

The huge amount of user log data collected by search engine providers creates new opportunities to understand user loyalty and defection behavior at an unprecedented scale. However, it also poses a great challenge to analyze such complex, large-scale data and glean insights from it. In this paper, we introduce LoyalTracker, a visual analytics system for tracking user loyalty and switching behavior across multiple search engines from vast amounts of user log data. We propose a new interactive visualization technique (the flow view), based on a flow metaphor, which conveys a visual summary of the loyalty dynamics of thousands of users over time. Two other visualization techniques, a density map and a word cloud, are integrated to enable analysts to gain further insights into the patterns identified by the flow view. Case studies and interviews with domain experts demonstrate the usefulness of our technique in understanding user loyalty and switching behavior in search engines.
 

Campbell, S.  2014.  Open science, open security. High Performance Computing Simulation (HPCS), 2014 International Conference on. :584–587.

We propose that, to address the growing problems with complexity and data volumes in HPC security, we need to refactor how we look at data by creating tools that not only select data, but also analyze and represent it in a manner well suited to intuitive analysis. We propose a set of rules describing what this means, and provide a number of production-quality tools that represent our current best effort at implementing these ideas.
 

2019-09-09
Endert, A.  2014.  Semantic Interaction for Visual Analytics: Toward Coupling Cognition and Computation. IEEE Computer Graphics and Applications. 34:8–15.

Alex Endert's dissertation "Semantic Interaction for Visual Analytics: Inferring Analytical Reasoning for Model Steering" described semantic interaction, a user interaction methodology for visual analytics (VA). It showed that user interaction embodies users' analytic process and can thus be mapped to model-steering functionality for "human-in-the-loop" system design. The dissertation contributed a framework (or pipeline) that describes such a process, a prototype VA system to test semantic interaction, and a user evaluation to demonstrate semantic interaction's impact on the analytic process. This research is influencing current VA research and has implications for future VA research.

2015-05-05
Gaff, Brian M., Sussman, Heather Egan, Geetter, Jennifer.  2014.  Privacy and Big Data. Computer. 47:7-9.

Big data's explosive growth has prompted the US government to release new reports that address the issues--particularly related to privacy--resulting from this growth. The Web extra at http://youtu.be/j49eoe5g8-c is an audio recording from the Computing and the Law column, in which authors Brian M. Gaff, Heather Egan Sussman, and Jennifer Geetter discuss how big data's explosive growth has prompted the US government to release new reports that address the issues--particularly related to privacy--resulting from this growth.
 

Mukkamala, R. R., Hussain, A., Vatrapu, R.  2014.  Towards a Set Theoretical Approach to Big Data Analytics. Big Data (BigData Congress), 2014 IEEE International Congress on. :629–636.

Formal methods, models, and tools for social big data analytics are largely limited to graph-theoretical approaches, such as social network analysis (SNA) informed by relational sociology; there are no other unified modeling approaches to social big data that integrate the conceptual, formal, and software realms. In this paper, we first present and discuss a theory and conceptual model of social data. Second, we outline a formal model based on set theory and discuss its semantics with a real-world social data example from Facebook. Third, we briefly present and discuss the Social Data Analytics Tool (SODATO), which realizes the conceptual model in software and provides social data analysis based on the conceptual and formal models. Fourth and last, based on the formal model and sentiment analysis of text, we present a method for profiling artifacts and actors, and apply this technique to the analysis of big social data collected from the Facebook page of the fast-fashion company H&M.
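
A toy, set-based rendering of the flavour of such a model, with invented actors, artifacts, and actions; it is not SODATO's implementation.

```python
# Social data as a set of (actor, artifact, action) triples.
actions = {("alice", "post1", "like"), ("bob", "post1", "comment"),
           ("alice", "post2", "comment"), ("carol", "post2", "like")}

actors = {a for a, _, _ in actions}
artifacts = {p for _, p, _ in actions}

def actor_profile(actor):
    """The subset of actions belonging to a single actor."""
    return {(p, act) for a, p, act in actions if a == actor}

print(sorted(actors), sorted(artifacts))
print(actor_profile("alice"))  # {('post1', 'like'), ('post2', 'comment')}
```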
 

2015-05-06
Ochian, A., Suciu, G., Fratu, O., Voicu, C., Suciu, V.  2014.  An overview of cloud middleware services for interconnection of healthcare platforms. Communications (COMM), 2014 10th International Conference on. :1–4.

Using heterogeneous clouds has been considered to improve performance of big-data analytics for healthcare platforms. However, the problem of the delay when transferring big-data over the network needs to be addressed. The purpose of this paper is to analyze and compare existing cloud computing environments (PaaS, IaaS) in order to implement middleware services. Understanding the differences and similarities between cloud technologies will help in the interconnection of healthcare platforms. The paper provides a general overview of the techniques and interfaces for cloud computing middleware services, and proposes a cloud architecture for healthcare. Cloud middleware enables heterogeneous devices to act as data sources and to integrate data from other healthcare platforms, but specific APIs need to be developed. Furthermore, security and management problems need to be addressed, given the heterogeneous nature of the communication and computing environment. The present paper fills a gap in the electronic healthcare register literature by providing an overview of cloud computing middleware services and standardized interfaces for the integration with medical devices.

2015-05-05
Mittal, D., Kaur, D., Aggarwal, A.  2014.  Secure Data Mining in Cloud Using Homomorphic Encryption. Cloud Computing in Emerging Markets (CCEM), 2014 IEEE International Conference on. :1–7.

With advances in technology, industry, e-commerce, and research, a large amount of complex and pervasive digital data is being generated, increasing at an exponential rate and often termed big data. Traditional data storage systems are not able to handle Big Data, and analyzing it is a challenge that traditional analytic tools cannot meet. Cloud Computing can resolve the problem of handling, storing, and analyzing Big Data, as it distributes the data within its cloudlets. No doubt Cloud Computing is the best available answer to the problem of Big Data storage and analysis, but there is always a potential risk to the security of Big Data stored in the cloud, which needs to be addressed. Data privacy is one of the major issues in storing Big Data in a cloud environment. Data-mining-based attacks, a major threat to the data, allow an adversary or an unauthorized user to infer valuable and sensitive information by analyzing the results generated from computations performed on the raw data. This thesis proposes a secure k-means data mining approach that assumes the data are distributed among different hosts, preserving the privacy of the data. The approach maintains the correctness and validity of the existing k-means in generating the final results, even in the distributed environment.
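
A numpy sketch of the structure of distributed k-means over partitioned data: each host reveals only per-cluster sums and counts, never raw rows. Unlike the paper, no homomorphic encryption is applied here; the aggregates are exchanged in the clear purely to show the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two hosts, each holding a private partition of the data.
hosts = [rng.normal(loc, 0.5, size=(50, 2)) for loc in ((0, 0), (5, 5))]
centroids = np.array([[0.5, 0.5], [4.0, 4.0]])

for _ in range(5):
    sums = np.zeros_like(centroids)
    counts = np.zeros(len(centroids))
    for data in hosts:  # each host computes only local aggregates
        labels = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(len(centroids)):
            sums[j] += data[labels == j].sum(axis=0)
            counts[j] += (labels == j).sum()
    centroids = sums / counts[:, None]  # coordinator sees aggregates only

print(centroids.round(2))  # close to the true means (0, 0) and (5, 5)
```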
 

Kotenko, I., Novikova, E.  2014.  Visualization of Security Metrics for Cyber Situation Awareness. Availability, Reliability and Security (ARES), 2014 Ninth International Conference on. :506–513.

One important direction of research in situational awareness is the implementation of visual analytics techniques, which can be applied efficiently when working with big security data in critical operational domains. The paper considers a visual analytics technique for displaying a set of security metrics used to assess overall network security status and to evaluate the efficiency of protection mechanisms. The technique can assist in solving security tasks that are important for security information and event management (SIEM) systems. The suggested approach is suitable for displaying security metrics of large networks and supports historical analysis of the data. To demonstrate and evaluate the usefulness of the proposed technique, we implemented a use case corresponding to the Olympic Games scenario.