Visible to the public Biblio

Filters: Keyword is data reduction  [Clear All Filters]
2021-02-23
Xia, H., Gao, N., Peng, J., Mo, J., Wang, J..  2020.  Binarized Attributed Network Embedding via Neural Networks. 2020 International Joint Conference on Neural Networks (IJCNN). :1—8.
Traditional attributed network embedding methods are designed to map structural and attribute information of networks jointly into a continuous Euclidean space, while recently a novel branch of them named binarized attributed network embedding has emerged to learn binary codes in Hamming space, aiming to save time and memory costs and to naturally fit node retrieval task. However, current binarized attributed network embedding methods are scarce and mostly ignore the local attribute similarity between each pair of nodes. Besides, none of them attempt to control the independency of each dimension(bit) of the learned binary representation vectors. As existing methods still need improving, we propose an unsupervised Neural-based Binarized Attributed Network Embedding (NBANE) approach. Firstly, we inherit the Weisfeiler-Lehman proximity matrix from predecessors to aggregate high-order features for each node. Secondly, we feed the aggregated features into an autoencoder with the attribute similarity penalizing term and the orthogonality term to make further dimension reduction. To solve the problem of integer optimization we adopt the relaxation-quantization method during the process of training neural networks. Empirically, we evaluate the performance of NBANE through node classification and clustering tasks on three real-world datasets and study a case on fast retrieval in academic networks. Our method achieves better performance over state- of-the-art baselines methods of various types.
2019-01-21
Tang, Yutao, Li, Ding, Li, Zhichun, Zhang, Mu, Jee, Kangkook, Xiao, Xusheng, Wu, Zhenyu, Rhee, Junghwan, Xu, Fengyuan, Li, Qun.  2018.  NodeMerge: Template Based Efficient Data Reduction For Big-Data Causality Analysis. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :1324–1337.
Today's enterprises are exposed to sophisticated attacks, such as Advanced Persistent Threats\textbackslashtextasciitilde(APT) attacks, which usually consist of stealthy multiple steps. To counter these attacks, enterprises often rely on causality analysis on the system activity data collected from a ubiquitous system monitoring to discover the initial penetration point, and from there identify previously unknown attack steps. However, one major challenge for causality analysis is that the ubiquitous system monitoring generates a colossal amount of data and hosting such a huge amount of data is prohibitively expensive. Thus, there is a strong demand for techniques that reduce the storage of data for causality analysis and yet preserve the quality of the causality analysis. To address this problem, in this paper, we propose NodeMerge, a template based data reduction system for online system event storage. Specifically, our approach can directly work on the stream of system dependency data and achieve data reduction on the read-only file events based on their access patterns. It can either reduce the storage cost or improve the performance of causality analysis under the same budget. Only with a reasonable amount of resource for online data reduction, it nearly completely preserves the accuracy for causality analysis. The reduced form of data can be used directly with little overhead. To evaluate our approach, we conducted a set of comprehensive evaluations, which show that for different categories of workloads, our system can reduce the storage capacity of raw system dependency data by as high as 75.7 times, and the storage capacity of the state-of-the-art approach by as high as 32.6 times. Furthermore, the results also demonstrate that our approach keeps all the causality analysis information and has a reasonably small overhead in memory and hard disk.
2018-09-28
Brandauer, C., Dorfinger, P., Paiva, P. Y. A..  2017.  Towards scalable and adaptable security monitoring. 2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC). :1–6.

A long time ago Industrial Control Systems were in a safe place due to the use of proprietary technology and physical isolation. This situation has changed dramatically and the systems are nowadays often prone to severe attacks executed from remote locations. In many cases, intrusions remain undetected for a long time and this allows the adversary to meticulously prepare an attack and maximize its destructiveness. The ability to detect an attack in its early stages thus has a high potential to significantly reduce its impact. To this end, we propose a holistic, multi-layered, security monitoring and mitigation framework spanning the physical- and cyber domain. The comprehensiveness of the approach demands for scalability measures built-in by design. In this paper we present how scalability is addressed by an architecture that enforces geographically decentralized data reduction approaches that can be dynamically adjusted to the currently perceived context. A specific focus is put on a robust and resilient solution to orchestrate dynamic configuration updates. Experimental results based on a prototype implementation show the feasibility of the approach.

2018-02-21
Foreman, J. C., Pacheco, F. E..  2017.  Aggregation architecture for data reduction and privacy in advanced metering infrastructure. 2017 IEEE PES Innovative Smart Grid Technologies Conference - Latin America (ISGT Latin America). :1–5.

Advanced Metering Infrastructure (AMI) have rapidly become a topic of international interest as governments have sponsored their deployment for the purposes of utility service reliability and efficiency, e.g., water and electricity conservation. Two problems plague such deployments. First is the protection of consumer privacy. Second is the problem of huge amounts of data from such deployments. A new architecture is proposed to address these problems through the use of Aggregators, which incorporate temporary data buffering and the modularization of utility grid analysis. These Aggregators are used to deliver anonymized summary data to the central utility while preserving billing and automated connection services.

2017-12-28
Liu, H., Ditzler, G..  2017.  A fast information-theoretic approximation of joint mutual information feature selection. 2017 International Joint Conference on Neural Networks (IJCNN). :4610–4617.

Feature selection is an important step in data analysis to address the curse of dimensionality. Such dimensionality reduction techniques are particularly important when if a classification is required and the model scales in polynomial time with the size of the feature (e.g., some applications include genomics, life sciences, cyber-security, etc.). Feature selection is the process of finding the minimum subset of features that allows for the maximum predictive power. Many of the state-of-the-art information-theoretic feature selection approaches use a greedy forward search; however, there are concerns with the search in regards to the efficiency and optimality. A unified framework was recently presented for information-theoretic feature selection that tied together many of the works in over the past twenty years. The work showed that joint mutual information maximization (JMI) is generally the best options; however, the complexity of greedy search for JMI scales quadratically and it is infeasible on high dimensional datasets. In this contribution, we propose a fast approximation of JMI based on information theory. Our approach takes advantage of decomposing the calculations within JMI to speed up a typical greedy search. We benchmarked the proposed approach against JMI on several UCI datasets, and we demonstrate that the proposed approach returns feature sets that are highly consistent with JMI, while decreasing the run time required to perform feature selection.

2017-05-30
Xu, Zhang, Wu, Zhenyu, Li, Zhichun, Jee, Kangkook, Rhee, Junghwan, Xiao, Xusheng, Xu, Fengyuan, Wang, Haining, Jiang, Guofei.  2016.  High Fidelity Data Reduction for Big Data Security Dependency Analyses. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :504–516.

Intrusive multi-step attacks, such as Advanced Persistent Threat (APT) attacks, have plagued enterprises with significant financial losses and are the top reason for enterprises to increase their security budgets. Since these attacks are sophisticated and stealthy, they can remain undetected for years if individual steps are buried in background "noise." Thus, enterprises are seeking solutions to "connect the suspicious dots" across multiple activities. This requires ubiquitous system auditing for long periods of time, which in turn causes overwhelmingly large amount of system audit events. Given a limited system budget, how to efficiently handle ever-increasing system audit logs is a great challenge. This paper proposes a new approach that exploits the dependency among system events to reduce the number of log entries while still supporting high-quality forensic analysis. In particular, we first propose an aggregation algorithm that preserves the dependency of events during data reduction to ensure the high quality of forensic analysis. Then we propose an aggressive reduction algorithm and exploit domain knowledge for further data reduction. To validate the efficacy of our proposed approach, we conduct a comprehensive evaluation on real-world auditing systems using log traces of more than one month. Our evaluation results demonstrate that our approach can significantly reduce the size of system logs and improve the efficiency of forensic analysis without losing accuracy.

2015-05-06
Mokhtar, B., Eltoweissy, M..  2014.  Towards a Data Semantics Management System for Internet Traffic. New Technologies, Mobility and Security (NTMS), 2014 6th International Conference on. :1-5.

Although current Internet operations generate voluminous data, they remain largely oblivious of traffic data semantics. This poses many inefficiencies and challenges due to emergent or anomalous behavior impacting the vast array of Internet elements such as services and protocols. In this paper, we propose a Data Semantics Management System (DSMS) for learning Internet traffic data semantics to enable smarter semantics- driven networking operations. We extract networking semantics and build and utilize a dynamic ontology of network concepts to better recognize and act upon emergent or abnormal behavior. Our DSMS utilizes: (1) Latent Dirichlet Allocation algorithm (LDA) for latent features extraction and semantics reasoning; (2) big tables as a cloud-like data storage technique to maintain large-scale data; and (3) Locality Sensitive Hashing algorithm (LSH) for reducing data dimensionality. Our preliminary evaluation using real Internet traffic shows the efficacy of DSMS for learning behavior of normal and abnormal traffic data and for accurately detecting anomalies at low cost.
 

2015-05-05
Zadeh, B.Q., Handschuh, S..  2014.  Random Manhattan Indexing. Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on. :203-208.

Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in text processing. In these models, high-dimensional, often sparse vectors represent text units. In an application, the similarity of vectors -- and hence the text units that they represent -- is computed by a distance formula. The high dimensionality of vectors, however, is a barrier to the performance of methods that employ VSMs. Consequently, a dimensionality reduction technique is employed to alleviate this problem. This paper introduces a new method, called Random Manhattan Indexing (RMI), for the construction of L1 normed VSMs at reduced dimensionality. RMI combines the construction of a VSM and dimension reduction into an incremental, and thus scalable, procedure. In order to attain its goal, RMI employs the sparse Cauchy random projections.

2015-05-04
Liu, J.N.K., Yanxing Hu, You, J.J., Yulin He.  2014.  An advancing investigation on reduct and consistency for decision tables in Variable Precision Rough Set models. Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference on. :1496-1503.

Variable Precision Rough Set (VPRS) model is one of the most important extensions of the Classical Rough Set (RS) theory. It employs a majority inclusion relation mechanism in order to make the Classical RS model become more fault tolerant, and therefore the generalization of the model is improved. This paper can be viewed as an extension of previous investigations on attribution reduction problem in VPRS model. In our investigation, we illustrated with examples that the previously proposed reduct definitions may spoil the hidden classification ability of a knowledge system by ignoring certian essential attributes in some circumstances. Consequently, by proposing a new β-consistent notion, we analyze the relationship between the structures of Decision Table (DT) and different definitions of reduct in VPRS model. Then we give a new notion of β-complement reduct that can avoid the defects of reduct notions defined in previous literatures. We also supply the method to obtain the β- complement reduct using a decision table splitting algorithm, and finally demonstrate the feasibility of our approach with sample instances.