Biblio

Filters: Keyword is data analytics
2023-06-09
Wang, Shuangbao Paul, Arafin, Md Tanvir, Osuagwu, Onyema, Wandji, Ketchiozo.  2022.  Cyber Threat Analysis and Trustworthy Artificial Intelligence. 2022 6th International Conference on Cryptography, Security and Privacy (CSP). :86–90.
Cyber threats can cause severe damage to computing infrastructure and systems as well as data breaches that make sensitive data vulnerable to attackers and adversaries. It is therefore imperative to discover those threats and stop them before bad actors penetrate the information systems. Threat-hunting algorithms based on machine learning have shown great advantage over classical methods. Reinforcement learning models are getting more accurate for identifying not only signature-based but also behavior-based threats. Quantum mechanics brings a new dimension in improving classification speed with exponential advantage. The accuracy of the AI/ML algorithms could be affected by many factors, ranging from the algorithm and the data to prejudicial or even intentional bias. As a result, AI/ML applications need to be non-biased and trustworthy. In this research, we developed a machine learning-based cyber threat detection and assessment tool. It uses a two-stage analyzing method (both unsupervised and supervised learning) on 822,226 log entries recorded from a web server on AWS cloud. The results show the algorithm has the ability to identify the threats with high confidence.
2023-03-31
Soderi, Mirco, Kamath, Vignesh, Breslin, John G..  2022.  A Demo of a Software Platform for Ubiquitous Big Data Engineering, Visualization, and Analytics, via Reconfigurable Micro-Services, in Smart Factories. 2022 IEEE International Conference on Smart Computing (SMARTCOMP). :1–3.
Intelligent, smart, Cloud, reconfigurable manufacturing, and remote monitoring, all intersect in modern industry and mark the path toward more efficient, effective, and sustainable factories. Many obstacles are found along the path, including legacy machineries and technologies, security issues, and software that is often hard, slow, and expensive to adapt to face unforeseen challenges and needs in this fast-changing ecosystem. Light-weight, portable, loosely coupled, easily monitored, variegated software components, supporting Edge, Fog and Cloud computing, that can be (re)created, (re)configured and operated from remote through Web requests in a matter of milliseconds, and that rely on libraries of ready-to-use tasks also extendable from remote through sub-second Web requests, constitute a fertile technological ground on top of which fourth-generation industries can be built. In this demo it will be shown how starting from a completely virgin Docker Engine, it is possible to build, configure, destroy, rebuild, operate, exclusively from remote, exclusively via API calls, computation networks that are capable to (i) raise alerts based on configured thresholds or trained ML models, (ii) transform Big Data streams, (iii) produce and persist Big Datasets on the Cloud, (iv) train and persist ML models on the Cloud, (v) use trained models for one-shot or stream predictions, (vi) produce tabular visualizations, line plots, pie charts, histograms, at real-time, from Big Data streams. Also, it will be shown how easily such computation networks can be upgraded with new functionalities at real-time, from remote, via API calls.
ISSN: 2693-8340
2022-11-18
Banasode, Praveen, Padmannavar, Sunita.  2021.  Evaluation of Performance for Big Data Security Using Advanced Cryptography Policy. 2021 International Conference on Forensics, Analytics, Big Data, Security (FABS). 1:1–5.
The revolution caused by the advanced analysis features of Internet of Things and big data have made a big turnaround in the digital world. Data analysis is not only limited to collect useful data but also useful in analyzing information quickly. Therefore, most of the variants of the shared system based on the parallel structural model are explored simultaneously as the appropriate big data storage library stimulates researchers’ interest in the distributed system. Due to the emerging digital technologies, different groups such as healthcare facilities, financial institutions, e-commerce, food service and supply chain management generate a surprising amount of information. Although the process of statistical analysis is essential, it can cause significant security and privacy issues. Therefore, the analysis of data privacy protection is very important. Using the platform, technology should focus on providing Advanced Cryptography Policy (ACP). This research explores different security risks, evolutionary mechanisms and risks of privacy protection. It further recommends the post-statistical modern privacy protection act to manage data privacy protection in binary format, because it is kept confidential by the user. The user authentication program has already filed access restrictions. To maintain this purpose, everyone’s attitude is to achieve a changing identity. This article is designed to protect the privacy of users and propose a new system of restoration of controls.
2022-05-19
Rabbani, Mustafa Raza, Bashar, Abu, Atif, Mohd, Jreisat, Ammar, Zulfikar, Zehra, Naseem, Yusra.  2021.  Text mining and visual analytics in research: Exploring the innovative tools. 2021 International Conference on Decision Aid Sciences and Application (DASA). :1087–1091.
The aim of the study is to present an advanced overview and potential application of the innovative tools, software and methods used for data visualization, text mining, scientific mapping, and bibliometric analysis. Text mining and data visualization have been a topic of research for several years for academic researchers and practitioners. With the advancement in technology and innovation in data analysis techniques, there are many online and offline software tools available for text mining and visualisation. The purpose of this study is to present an advanced overview of the latest, sophisticated, and innovative tools available for this purpose. The unique characteristic of this study is that it provides an overview with examples of the five most adopted software tools in social science research: VOSviewer, Biblioshiny, Gephi, HistCite and CiteSpace. This study will contribute to the academic literature and will help researchers and practitioners to apply these tools in future research to present their findings in a more scientific manner.
2021-09-07
Simud, Thikamporn, Ruengittinun, Somchoke, Surasvadi, Navaporn, Sanglerdsinlapachai, Nuttapong, Plangprasopchok, Anon.  2020.  A Conversational Agent for Database Query: A Use Case for Thai People Map and Analytics Platform. 2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP). :1–6.
Since 2018, Thai People Map and Analytics Platform (TPMAP) has been developed with the aims of supporting government officials and policy makers with integrated household and community data to analyze strategic plans, implement policies and decisions to alleviate poverty. However, to acquire complex information from the platform, non-technical users with no database background have to ask a programmer or a data scientist to query data for them. Such a process is time-consuming and might result in inaccurate information retrieved due to miscommunication between non-technical and technical users. In this paper, we have developed a Thai conversational agent on top of TPMAP to support self-service data analytics on complex queries. Users can simply use natural language to fetch information from our chatbot and the query results are presented to users in easy-to-use formats such as statistics and charts. The proposed conversational agent retrieves and transforms natural language queries into query representations with relevant entities, query intentions, and output formats of the query. We employ Rasa, an open-source conversational AI engine, for agent development. The results show that our system yields an F1-score of 0.9747 for intent classification and 0.7163 for entity extraction. The obtained intents and entities are then used to query target information from a graph database. Finally, our system achieves end-to-end performance with accuracies ranging from 57.5% to 80.0%, depending on query message complexity. The generated answers are then returned to users through a messaging channel.
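The F1-scores quoted in the abstract above combine precision and recall into one number. As a minimal sketch of that metric (the function name and the counts are hypothetical and are not taken from the paper), it can be computed from per-class counts as follows:

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if true_positives == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a single intent class (illustrative only).
example = f1_score(true_positives=90, false_positives=10, false_negatives=10)
print(round(example, 4))
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot reach a high F1-score by maximizing precision or recall alone.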
2021-05-25
Zanin, M., Menasalvas, E., González, A. Rodriguez, Smrz, P..  2020.  An Analytics Toolbox for Cyber-Physical Systems Data Analysis: Requirements and Challenges. 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO). :271–276.
The fast improvement in telecommunication technologies that has characterised the last decade is enabling a revolution centred on Cyber-Physical Systems (CPSs). Elements inside cities, from vehicles to cars, can now be connected and share data, describing both our environment and our behaviours. These data can also be used in an active way, by becoming the tenet of innovative services and products, i.e. of Cyber-Physical Products (CPPs). Still, having data is not tantamount to having knowledge, and an important overlooked topic is how they should be analysed. In this contribution we tackle the issue of the development of an analytics toolbox for processing CPS data. Specifically, we review and quantify the main requirements that should be fulfilled, both functional (e.g. flexibility or dependability) and technical (e.g. scalability, response time, etc.). We further propose an initial set of analyses that should be included in it. We finally review some challenges and open issues, including how security and privacy could be tackled by emerging new technologies.
2021-04-27
Khalid, O., Senthilananthan, S..  2020.  A review of data analytics techniques for effective management of big data using IoT. 2020 5th International Conference on Innovative Technologies in Intelligent Systems and Industrial Applications (CITISIA). :1–10.
IoT and big data are energetic technology of the world for quite a time, and both of these have become a necessity. On the one side where IoT is used to connect different objectives via the internet, the big data means having a large number of the set of structured, unstructured, and semi-structured data. The device used for processing based on the tools used. These tools help provide meaningful information used for effective management in different domains. Some of the commonly faced issues with the inadequate about the technologies are related to data privacy, insufficient analytical capabilities, and this issue is faced by in different domains related to the big data. Data analytics tools help discover the pattern of data and consumer preferences which is resulting in better decision making for the organizations. The major part of this work is to review different types of data analytics techniques for the effective management of big data using IoT. For the effective management of the ABD solution collection, analysis and control are used as the components. Each of the ingredients is described to find an effective way to manage big data. These components are considered and used in the validation criteria. The solution of effective data management is a stage towards the management of big data in IoT devices which will help the user to understand different types of elements of data management.
Reddy, C. B Manjunath, Reddy, U. K, Brumancia, E., Gomathi, R. M., Indira, K..  2020.  Integrative Approach Of Big Data And Network Attacks Analysis In Cloud Environment. 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI)(48184). :314–317.
Lately mining of information from online life is pulling in more consideration because of the blast in the development of Big Data. In security, Big Data manages an assortment of immense advanced data for investigating, envisioning and to draw the bits of knowledge for the expectation and anticipation of digital assaults. Big Data Analytics (BDA) is the term composed by experts to portray the art of dealing with, taking care of and gathering a great deal of data for future evaluation. Data is being made at an upsetting rate. The quick improvement of the Internet, Internet of Things (IoT) and other creative advances are the rule liable gatherings behind this proceeded with advancement. The data made is an impression of the earth, it is conveyed out of, along these lines can use the data got away from structures to understand the internal exercises of that system. This has become a significant element in cyber security where the objective is to secure resources. Moreover, the developing estimation of information has made large information a high worth objective. Right now, investigate ongoing exploration works in cyber security comparable to huge information and feature how Big information is secured and how huge information can likewise be utilized as a device for cyber security. Simultaneously, a Big Data based concentrated log investigation framework is actualized to distinguish the system traffic happened with assailants through DDoS, SQL Injection and Brute Force assault. The log record is naturally transmitted to the brought together cloud server and big information is started in the investigation process.
2021-02-22
Alzahrani, A., Feki, J..  2020.  Toward a Natural Language-Based Approach for the Specification of Decisional-Users Requirements. 2020 3rd International Conference on Computer Applications Information Security (ICCAIS). :1–6.
The number of organizations adopting the Data Warehouse (DW) technology along with data analytics in order to improve the effectiveness of their decision-making processes is permanently increasing. Despite the efforts invested, the DW design remains a great challenge research domain. More accurately, the design quality of the DW depends on several aspects; among them, the requirement-gathering phase is a critical and complex task. In this context, we propose a Natural language (NL) NL-template based design approach, which is twofold; firstly, it facilitates the involvement of decision-makers in the early step of the DW design; indeed, using NL is a good and natural means to encourage the decision-makers to express their requirements as query-like English sentences. Secondly, our approach aims to generate a DW multidimensional schema from a set of gathered requirements (as OLAP: On-Line-Analytical-Processing queries, written according to the NL suggested templates). This approach articulates around: (i) two NL-templates for specifying multidimensional components, and (ii) a set of five heuristic rules for extracting the multidimensional concepts from requirements. Really, we are developing a software prototype that accepts the decision-makers' requirements then automatically identifies the multidimensional components of the DW model.
2020-11-23
Jolfaei, A., Kant, K., Shafei, H..  2019.  Secure Data Streaming to Untrusted Road Side Units in Intelligent Transportation System. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :793–798.
The paper considers data security issues in vehicle-to-infrastructure communications, where vehicles stream data to a road side unit. We assume aggregated data in road side units can be stored or used for data analytics. In this environment, there are issues in regards to the scalability of key management and computation limitations at the edge of the network. To address these issues, we suggest the formation of groups in the vehicle layer, where a group leader is assigned to communicate with group devices and the road side unit. We propose a lightweight permutation mechanism for preserving the confidentiality of sensory data.
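The abstract above does not describe the permutation mechanism itself, so the following is only a generic sketch of the idea of a keyed permutation: a group key and a message counter (both hypothetical here) determine how the bytes of a sensor reading are reordered, and a receiver holding the same key can undo the reordering. It uses Python's non-cryptographic `random` module for brevity and is not the authors' scheme.

```python
import hashlib
import random


def _order(key: bytes, counter: int, length: int) -> list[int]:
    """Derive a position ordering from the key and a per-message counter."""
    seed = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
    rng = random.Random(seed)  # NOTE: not a cryptographically secure generator
    order = list(range(length))
    rng.shuffle(order)
    return order


def permute(data: bytes, key: bytes, counter: int) -> bytes:
    """Reorder the bytes of `data` according to the keyed ordering."""
    order = _order(key, counter, len(data))
    return bytes(data[i] for i in order)


def unpermute(permuted: bytes, key: bytes, counter: int) -> bytes:
    """Invert `permute` using the same key and counter."""
    order = _order(key, counter, len(permuted))
    original = bytearray(len(permuted))
    for position, source_index in enumerate(order):
        original[source_index] = permuted[position]
    return bytes(original)


# Hypothetical group key and sensor reading (illustrative only).
group_key = b"example-group-key"
reading = b"speed=57;lat=40.1;lon=-75.2"
hidden = permute(reading, group_key, counter=1)
assert unpermute(hidden, group_key, counter=1) == reading
```

A permutation only reorders bytes and leaves their values visible, so a real design would need a secure pseudorandom generator and a fresh ordering per message at the very least.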
2020-10-06
Dattana, Vishal, Gupta, Kishu, Kush, Ashwani.  2019.  A Probability based Model for Big Data Security in Smart City. 2019 4th MEC International Conference on Big Data and Smart City (ICBDSC). :1–6.
Smart technologies at hand have facilitated the generation and collection of huge volumes of data on a daily basis. It involves highly sensitive and diverse data like personal, organisational, environment, energy, transport and economic data. Data Analytics provides solutions for various issues faced by smart cities, such as crisis response, disaster resilience, emergency management and smart traffic management systems; it requires distribution of sensitive data among various entities within or outside the smart city. Sharing of sensitive data creates a need for efficient usage of smart city data to provide smart applications and utility to the end users in a trustworthy and safe mode. If leaked, this shared sensitive data can cause damage and severe risk to the city's resources. Fortification of critical data from unofficial disclosure is the biggest issue for the success of any project. Data Leakage Detection provides a set of tools and technology that can efficiently resolve the concerns related to smart city critical data. The paper showcases an approach to detect leakage that is caused intentionally or unintentionally. The model represents allotment of data objects between diverse agents using Bigraph. The objective is to make critical data secure by revealing the guilty agent who caused the data leakage.
2020-09-28
Madhan, E.S., Ghosh, Uttam, Tosh, Deepak K., Mandal, K., Murali, E., Ghosh, Soumalya.  2019.  An Improved Communications in Cyber Physical System Architecture, Protocols and Applications. 2019 16th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON). :1–6.
In recent trends, Cyber-Physical Systems (CPS) and the Internet of Things represent an evolution of computerized integration and connectivity. The specific research challenges in CPS include security, privacy, data analytics, participatory sensing and smart decision making. In addition, the challenges in Wireless Sensor Networks (WSN) include secure architecture, energy-efficient protocols and quality of service. In this paper, we present an architecture of CPS and its protocols and applications. We propose a software-related mobile sensing paradigm, namely Mobile Sensor Information Agent (MSIA). It works as a plug-in for CPS middleware and scalable applications on mobile devices. The working principle of MSIA is that it acts as an intermediary device, gathers data from various external sensors and uploads it to the cloud on demand. CPS needs tight integration between the cyber world and the man-made physical world to achieve stability, security, reliability, robustness, and efficiency in the system. Emerging software-defined networking (SDN) can be integrated as the communication infrastructure with CPS infrastructure to accomplish such a system. Thus we propose a possible SDN-based CPS framework to improve the performance of the system.
2020-09-14
Ortiz Garcés, Ivan, Cazares, Maria Fernada, Andrade, Roberto Omar.  2019.  Detection of Phishing Attacks with Machine Learning Techniques in Cognitive Security Architecture. 2019 International Conference on Computational Science and Computational Intelligence (CSCI). :366–370.
The number of phishing attacks has increased in Latin America, exceeding the operational skills of cybersecurity analysts. The cognitive security application proposes the use of big data, machine learning, and data analytics to improve response times in attack detection. This paper presents an investigation of the analysis of anomalous behavior related to phishing web attacks and how machine learning techniques can be an option to face the problem. This analysis is made with the use of contaminated data sets and Python tools for developing machine learning to detect phishing attacks through the analysis of URLs, determining whether they are good or bad URLs on the basis of specific characteristics of the URLs, with the goal of providing real-time information for taking proactive decisions that minimize the impact of an attack.
2020-08-28
Zobaed, S.M., Ahmad, Sahan, Gottumukkala, Raju, Salehi, Mohsen Amini.  2019.  ClustCrypt: Privacy-Preserving Clustering of Unstructured Big Data in the Cloud. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). :609–616.
Security and confidentiality of big data stored in the cloud are important concerns for many organizations to adopt cloud services. One common approach to address the concerns is client-side encryption where data is encrypted on the client machine before being stored in the cloud. Having encrypted data in the cloud, however, limits the ability of data clustering, which is a crucial part of many data analytics applications, such as search systems. To overcome the limitation, in this paper, we present an approach named ClustCrypt for efficient topic-based clustering of encrypted unstructured big data in the cloud. ClustCrypt dynamically estimates the optimal number of clusters based on the statistical characteristics of encrypted data. It also provides clustering approach for encrypted data. We deploy ClustCrypt within the context of a secure cloud-based semantic search system (S3BD). Experimental results obtained from evaluating ClustCrypt on three datasets demonstrate on average 60% improvement on clusters' coherency. ClustCrypt also decreases the search-time overhead by up to 78% and increases the accuracy of search results by up to 35%.
2020-08-07
Smith, Gary.  2019.  Artificial Intelligence and the Privacy Paradox of Opportunity, Big Data and The Digital Universe. 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE). :150–153.
Artificial Intelligence (AI) can and does use individual's data to make predictions about their wants, their needs, their influences on them and predict what they could do. The use of individual's data naturally raises privacy concerns. This article focuses on AI, the privacy issue against the backdrop of the endless growth of the Digital Universe where Big Data, AI, Data Analytics and 5G Technology live and grow in The Internet of Things (IoT).
2020-06-01
Vegh, Laura.  2018.  Cyber-physical systems security through multi-factor authentication and data analytics. 2018 IEEE International Conference on Industrial Technology (ICIT). :1369–1374.
We are living in a society where technology is present everywhere we go. We are striving towards smart homes, smart cities, Internet of Things, Internet of Everything. Not so long ago, a password was all you needed for secure authentication. Nowadays, even the most complicated passwords are not considered enough. Multi-factor authentication (MFA) is gaining more and more ground. Complex systems may also require more than one solution for real, strong security. The present paper proposes a framework with MFA as a basis for access control and data analytics. Events within a cyber-physical system are processed and analyzed in an attempt to detect, prevent and mitigate possible attacks.
2020-04-17
Almousa, May, Anwar, Mohd.  2019.  Detecting Exploit Websites Using Browser-based Predictive Analytics. 2019 17th International Conference on Privacy, Security and Trust (PST). :1–3.
The popularity of Web-based computing has given rise to browser-based cyberattacks. These cyberattacks use websites that exploit various web browser vulnerabilities. To help regular users avoid exploit websites and engage in safe online activities, we propose a methodology of building a machine learning-powered predictive analytical model that will measure the risk of attacks and privacy breaches associated with visiting different websites and performing online activities using web browsers. The model will learn risk levels from historical data and metadata scraped from web browsers.
2020-02-26
Tran, Geoffrey Phi, Walters, John Paul, Crago, Stephen.  2019.  Increased Fault-Tolerance and Real-Time Performance Resiliency for Stream Processing Workloads through Redundancy. 2019 IEEE International Conference on Services Computing (SCC). :51–55.
Data analytics and telemetry have become paramount to monitoring and maintaining quality-of-service in addition to business analytics. Stream processing, a model where a network of operators receives and processes continuously arriving discrete elements, is well-suited for these needs. Current and previous studies and frameworks have focused on continuity of operations and aggregate performance metrics. However, real-time performance and tail latency are also important. Timing errors caused by either performance or failed communication faults also affect real-time performance more drastically than aggregate metrics. In this paper, we introduce redundancy in the stream data to improve the real-time performance and resiliency to timing errors caused by either performance or failed communication faults. We also address limitations in previous solutions using a fine-grained acknowledgment tracking scheme to both increase the effectiveness for resiliency to performance faults and enable effectiveness for failed communication faults. Our results show that fine-grained acknowledgment schemes can improve the tail and mean latencies by approximately 30%. We also show that these schemes can improve resiliency to performance faults compared to existing work. Our improvements result in 47.4% to 92.9% fewer missed deadlines compared to 17.3% to 50.6% for comparable topologies and redundancy levels in the state of the art. Finally, we show that redundancies of 25% to 100% can reduce the number of data elements that miss their deadline constraints by 0.76% to 14.04% for applications with high fan-out and by 7.45% up to 50% for applications with no fan-out.
2019-08-26
Mavroeidis, V., Vishi, K., Jøsang, A..  2018.  A Framework for Data-Driven Physical Security and Insider Threat Detection. 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :1108–1115.
This paper presents PSO, an ontological framework and a methodology for improving physical security and insider threat detection. PSO can facilitate forensic data analysis and proactively mitigate insider threats by leveraging rule-based anomaly detection. In all too many cases, rule-based anomaly detection can detect employee deviations from organizational security policies. In addition, PSO can be considered a security provenance solution because of its ability to fully reconstruct attack patterns. Provenance graphs can be further analyzed to identify deceptive actions and overcome analytical mistakes that can result in bad decision-making, such as false attribution. Moreover, the information can be used to enrich the available intelligence (about intrusion attempts) that can form use cases to detect and remediate limitations in the system, such as loosely-coupled provenance graphs that in many cases indicate weaknesses in the physical security architecture. Ultimately, validation of the framework through use cases demonstrates and proves that PSO can improve an organization's security posture in terms of physical security and insider threat detection.
2019-06-24
Bessa, Ricardo J., Rua, David, Abreu, Cláudia, Machado, Paulo, Andrade, José R., Pinto, Rui, Gonçalves, Carla, Reis, Marisa.  2018.  Data Economy for Prosumers in a Smart Grid Ecosystem. Proceedings of the Ninth International Conference on Future Energy Systems. :622–630.
Smart grid technologies are enablers of new business models for domestic consumers with local flexibility (generation, loads, storage) and where access to data is a key requirement in the value stream. However, legislation on personal data privacy and protection imposes the need to develop local models for flexibility modeling and forecasting and to exchange models instead of personal data. This paper describes the functional architecture of a home energy management system (HEMS) and its optimization functions. A set of data-driven models, embedded in the HEMS, are discussed for improving renewable energy forecasting skill and modeling multi-period flexibility of distributed energy resources.
2019-03-28
McDermott, C. D., Petrovski, A. V., Majdani, F..  2018.  Towards Situational Awareness of Botnet Activity in the Internet of Things. 2018 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA). :1–8.
The following topics are dealt with: security of data; risk management; decision making; computer crime; invasive software; critical infrastructures; data privacy; insurance; Internet of Things; learning (artificial intelligence).
2019-03-06
Leung, C. K., Hoi, C. S. H., Pazdor, A. G. M., Wodi, B. H., Cuzzocrea, A..  2018.  Privacy-Preserving Frequent Pattern Mining from Big Uncertain Data. 2018 IEEE International Conference on Big Data (Big Data). :5101–5110.
As we are living in the era of big data, high volumes of wide varieties of data which may be of different veracity (e.g., precise data, imprecise and uncertain data) are easily generated or collected at a high velocity in many real-life applications. Embedded in these big data is valuable knowledge and useful information, which can be discovered by big data science solutions. As a popular data science task, frequent pattern mining aims to discover implicit, previously unknown and potentially useful information and valuable knowledge in terms of sets of frequently co-occurring merchandise items and/or events. Many of the existing frequent pattern mining algorithms use a transaction-centric mining approach to find frequent patterns from precise data. However, there are situations in which an item-centric mining approach is more appropriate, and there are also situations in which data are imprecise and uncertain. Hence, in this paper, we present an item-centric algorithm for mining frequent patterns from big uncertain data. In recent years, big data have been gaining the attention from the research community as driven by relevant technological innovations (e.g., clouds) and novel paradigms (e.g., social networks). As big data are typically published online to support knowledge management and fruition processes, these big data are usually handled by multiple owners with possible secure multi-part computation issues. Thus, privacy and security of big data has become a fundamental problem in this research context. In this paper, we present, not only an item-centric algorithm for mining frequent patterns from big uncertain data, but also a privacy-preserving algorithm. In other words, we present in this paper a privacy-preserving item-centric algorithm for mining frequent patterns from big uncertain data. Results of our analytical and empirical evaluation show the effectiveness of our algorithm in mining frequent patterns from big uncertain data in a privacy-preserving manner.
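To make "frequent patterns from uncertain data" more concrete: a common model in the uncertain-data mining literature attaches an existential probability to each item in a transaction and ranks itemsets by expected support. The sketch below assumes that model and assumes items are independent; it is not the item-centric or privacy-preserving algorithm of the paper above, and the data and names are hypothetical.

```python
from math import prod


def expected_support(itemset, transactions):
    """Sum, over all transactions, the probability that every item in
    `itemset` is present (items are assumed to be independent)."""
    return sum(
        prod(transaction.get(item, 0.0) for item in itemset)
        for transaction in transactions
    )


# Hypothetical uncertain database: item -> existential probability.
transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.6},
    {"a": 0.5, "b": 0.8, "c": 0.2},
]

# An itemset counts as frequent if its expected support reaches a threshold.
min_expected_support = 0.8
for candidate in [("a",), ("a", "b"), ("b", "c")]:
    support = expected_support(candidate, transactions)
    print(candidate, round(support, 2), support >= min_expected_support)
```

An item missing from a transaction contributes probability 0.0, so that transaction adds nothing to the itemset's expected support.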
Khan, Latifur.  2018.  Big IoT Data Stream Analytics with Issues in Privacy and Security. Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics. :22–22.
Internet of Things (IoT) Devices are monitoring and controlling systems that interact with the physical world by collecting, processing and transmitting data using the internet. IoT devices include home automation systems, smart grid, transportation systems, medical devices, building controls, manufacturing and industrial control systems. With the increase in deployment of IoT devices, there will be a corresponding increase in the amount of data generated by these devices, resulting in the need for large-scale data processing systems to process and extract information for efficient and impactful decision making that will improve quality of living.
2018-06-07
Jiang, Jun, Zhao, Xinghui, Wallace, Scott, Cotilla-Sanchez, Eduardo, Bass, Robert.  2017.  Mining PMU Data Streams to Improve Electric Power System Resilience. Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies. :95–102.
Phasor measurement units (PMUs) provide high-fidelity situational awareness of electric power grid operations. PMU data are used in real-time to inform wide area state estimation, monitor area control error, and event detection. As PMU data becomes more reliable, these devices are finding roles within control systems such as demand response programs and early fault detection systems. As with other cyber physical systems, maintaining data integrity and security are significant challenges for power system operators. In this paper, we present a comprehensive study of multiple machine learning techniques for detecting malicious data injection within PMU data streams. The two datasets used in this study are from the Bonneville Power Administration's PMU network and an inter-university PMU network among three universities, located in the U.S. Pacific Northwest. These datasets contain data from both the transmission level and the distribution level. Our results show that both SVM and ANN are generally effective in detecting spoofed data, and TensorFlow, the newly released tool, demonstrates potential for distributing the training workload and achieving higher performance. We expect these results to shed light on future work of adopting machine learning and data analytics techniques in the electric power industry.
2018-05-24
Sallam, A., Bertino, E..  2017.  Detection of Temporal Insider Threats to Relational Databases. 2017 IEEE 3rd International Conference on Collaboration and Internet Computing (CIC). :406–415.
The mitigation of insider threats against databases is a challenging problem as insiders often have legitimate access privileges to sensitive data. Therefore, conventional security mechanisms, such as authentication and access control, may be insufficient for the protection of databases against insider threats and need to be complemented with techniques that support real-time detection of access anomalies. The existing real-time anomaly detection techniques consider anomalies in references to the database entities and the amounts of accessed data. However, they are unable to track the access frequencies. According to recent security reports, an increase in the access frequency by an insider is an indicator of a potential data misuse and may be the result of malicious intents for stealing or corrupting the data. In this paper, we propose techniques for tracking users' access frequencies and detecting anomalous related activities in real-time. We present detailed algorithms for constructing accurate profiles that describe the access patterns of the database users and for matching subsequent accesses by these users to the profiles. Our methods report and log mismatches as anomalies that may need further investigation. We evaluated our techniques on the OLTP-Benchmark. The results of the evaluation indicate that our techniques are very effective in the detection of anomalies.
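The profile-and-match idea in the abstract above can be illustrated with a deliberately simple sketch. It assumes accesses are logged as (user, table) pairs grouped into fixed time windows, builds a profile from the highest count seen per pair in training, and flags later windows that exceed the profile by a slack factor. The window representation, the slack factor and all names are assumptions made for illustration; the paper's actual algorithms are not reproduced here.

```python
from collections import Counter


def build_profile(training_windows):
    """Record, per (user, table), the highest access count seen in any
    single training window."""
    profile = {}
    for window in training_windows:
        for key, count in Counter(window).items():
            profile[key] = max(profile.get(key, 0), count)
    return profile


def find_anomalies(window, profile, slack=1.5):
    """Return the (user, table) pairs whose count in `window` exceeds
    `slack` times the profiled maximum. Unseen pairs are always flagged."""
    counts = Counter(window)
    return sorted(
        key for key, count in counts.items()
        if count > slack * profile.get(key, 0)
    )


# Hypothetical access logs (illustrative only).
training = [
    [("alice", "orders")] * 3,
    [("alice", "orders")] * 4 + [("bob", "users")],
]
profile = build_profile(training)

current = [("alice", "orders")] * 10 + [("bob", "users")]
print(find_anomalies(current, profile))
```

Flagged pairs are candidates for further investigation, matching the abstract's framing of mismatches as anomalies to be reported and logged.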