Visible to the public Biblio

Filters: Keyword is Bipartite graph  [Clear All Filters]
2021-01-11
Zhao, F., Skums, P., Zelikovsky, A., Sevigny, E. L., Swahn, M. H., Strasser, S. M., Huang, Y., Wu, Y..  2020.  Computational Approaches to Detect Illicit Drug Ads and Find Vendor Communities Within Social Media Platforms. IEEE/ACM Transactions on Computational Biology and Bioinformatics. :1–1.
The opioid abuse epidemic represents a major public health threat to global populations. The role social media may play in facilitating illicit drug trade is largely unknown due to limited research. However, it is known that social media use among adults in the US is widespread, there is vast capability for online promotion of illegal drugs with delayed or limited deterrence of such messaging, and further, general commercial sale applications provide safeguards for transactions; however, they do not discriminate between legal and illegal sale transactions. These characteristics of the social media environment present challenges to surveillance which is needed for advancing knowledge of online drug markets and the role they play in the drug abuse and overdose deaths. In this paper, we present a computational framework developed to automatically detect illicit drug ads and communities of vendors.The SVM- and CNNbased methods for detecting illicit drug ads, and a matrix factorization based method for discovering overlapping communities have been extensively validated on the large dataset collected from Google+, Flickr and Tumblr. Pilot test results demonstrate that our computational methods can effectively identify illicit drug ads and detect vendor-community with accuracy. These methods hold promise to advance scientific knowledge surrounding the role social media may play in perpetuating the drug abuse epidemic.
2020-12-14
Ge, K., He, Y..  2020.  Detection of Sybil Attack on Tor Resource Distribution. 2020 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS). :328–332.
Tor anonymous communication system's resource publishing is vulnerable to enumeration attacks. Zhao determines users who requested resources are unavailable as suspicious malicious users, and gradually reduce the scope of suspicious users through several stages to reduce the false positive rate. However, it takes several stages to distinguish users. Although this method successfully detects the malicious user, the malicious user has acquired many resources in the previous stages, which reduce the availability of the anonymous communication system. This paper proposes a detection method based on Integer Linear Program to detect malicious users who perform enumeration attacks on resources in the process of resource distribution. First, we need construct a bipartite graph between the unavailable resources and the users who requested for these resources in the anonymous communication system; next we use Integer Linear Program to find the minimum malicious user set. We simulate the resource distribution process through computer program, we perform an experimental analysis of the method in this paper is carried out. Experimental results show that the accuracy of the method in this paper is above 80%, when the unavailable resources in the system account for no more than 50%. It is about 10% higher than Zhao's method.
2020-07-03
Jia, Guanbo, Miller, Paul, Hong, Xin, Kalutarage, Harsha, Ban, Tao.  2019.  Anomaly Detection in Network Traffic Using Dynamic Graph Mining with a Sparse Autoencoder. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :458—465.

Network based attacks on ecommerce websites can have serious economic consequences. Hence, anomaly detection in dynamic network traffic has become an increasingly important research topic in recent years. This paper proposes a novel dynamic Graph and sparse Autoencoder based Anomaly Detection algorithm named GAAD. In GAAD, the network traffic over contiguous time intervals is first modelled as a series of dynamic bipartite graph increments. One mode projection is performed on each bipartite graph increment and the adjacency matrix derived. Columns of the resultant adjacency matrix are then used to train a sparse autoencoder to reconstruct it. The sum of squared errors between the reconstructed approximation and original adjacency matrix is then calculated. An online learning algorithm is then used to estimate a Gaussian distribution that models the error distribution. Outlier error values are deemed to represent anomalous traffic flows corresponding to possible attacks. In the experiment, a network emulator was used to generate representative ecommerce traffic flows over a time period of 225 minutes with five attacks injected, including SYN scans, host emulation and DDoS attacks. ROC curves were generated to investigate the influence of the autoencoder hyper-parameters. It was found that increasing the number of hidden nodes and their activation level, and increasing sparseness resulted in improved performance. Analysis showed that the sparse autoencoder was unable to encode the highly structured adjacency matrix structures associated with attacks, hence they were detected as anomalies. In contrast, SVD and variants, such as the compact matrix decomposition, were found to accurately encode the attack matrices, hence they went undetected.

2020-05-11
Chae, Younghun, Katenka, Natallia, DiPippo, Lisa.  2019.  An Adaptive Threshold Method for Anomaly-based Intrusion Detection Systems. 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA). :1–4.
Anomaly-based Detection Systems (ADSs) attempt to learn the features of behaviors and events of a system and/or users over a period to build a profile of normal behaviors. There has been a growing interest in ADSs and typically conceived as more powerful systems One of the important factors for ADSs is an ability to distinguish between normal and abnormal behaviors in a given period. However, it is getting complicated due to the dynamic network environment that changes every minute. It is dangerous to distinguish between normal and abnormal behaviors with a fixed threshold in a dynamic environment because it cannot guarantee the threshold is always an indication of normal behaviors. In this paper, we propose an adaptive threshold for a dynamic environment with a trust management scheme for efficiently managing the profiles of normal and abnormal behaviors. Based on the assumption of the statistical analysis-based ADS that normal data instances occur in high probability regions while malicious data instances occur in low probability regions of a stochastic model, we set two adaptive thresholds for normal and abnormal behaviors. The behaviors between the two thresholds are classified as suspicious behaviors, and they are efficiently evaluated with a trust management scheme.
2020-03-18
Kalashnikov, A.O., Anikina, E.V..  2019.  Complex Network Cybersecurity Monitoring Method. 2019 Twelfth International Conference "Management of large-scale system development" (MLSD). :1–3.
This paper considers one of the methods of efficient allocation of limited resources in special-purpose devices (sensors) to monitor complex network unit cybersecurity.
2019-08-05
Kaiafas, G., Varisteas, G., Lagraa, S., State, R., Nguyen, C. D., Ries, T., Ourdane, M..  2018.  Detecting Malicious Authentication Events Trustfully. NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium. :1-6.

Anomaly detection on security logs is receiving more and more attention. Authentication events are an important component of security logs, and being able to produce trustful and accurate predictions minimizes the effort of cyber-experts to stop false attacks. Observed events are classified into Normal, for legitimate user behavior, and Malicious, for malevolent actions. These classes are consistently excessively imbalanced which makes the classification problem harder; in the commonly used Los Alamos dataset, the malicious class comprises only 0.00033% of the total. This work proposes a novel method to extract advanced composite features, and a supervised learning technique for classifying authentication logs trustfully; the models are Random Forest, LogitBoost, Logistic Regression, and ultimately Majority Voting which leverages the predictions of the previous models and gives the final prediction for each authentication event. We measure the performance of our experiments by using the False Negative Rate and False Positive Rate. In overall we achieve 0 False Negative Rate (i.e. no attack was missed), and on average a False Positive Rate of 0.0019.

2019-02-14
Zhang, S., Wolthusen, S. D..  2018.  Efficient Control Recovery for Resilient Control Systems. 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC). :1-6.

Resilient control systems should efficiently restore control into physical systems not only after the sabotage of themselves, but also after breaking physical systems. To enhance resilience of control systems, given an originally minimal-input controlled linear-time invariant(LTI) physical system, we address the problem of efficient control recovery into it after removing a known system vertex by finding the minimum number of inputs. According to the minimum input theorem, given a digraph embedded into LTI model and involving a precomputed maximum matching, this problem is modeled into recovering controllability of it after removing a known network vertex. Then, we recover controllability of the residual network by efficiently finding a maximum matching rather than recomputation. As a result, except for precomputing a maximum matching and the following removed vertex, the worst-case execution time of control recovery into the residual LTI physical system is linear.

2018-09-05
Li, C., Palanisamy, B., Joshi, J..  2017.  Differentially Private Trajectory Analysis for Points-of-Interest Recommendation. 2017 IEEE International Congress on Big Data (BigData Congress). :49–56.

Ubiquitous deployment of low-cost mobile positioning devices and the widespread use of high-speed wireless networks enable massive collection of large-scale trajectory data of individuals moving on road networks. Trajectory data mining finds numerous applications including understanding users' historical travel preferences and recommending places of interest to new visitors. Privacy-preserving trajectory mining is an important and challenging problem as exposure of sensitive location information in the trajectories can directly invade the location privacy of the users associated with the trajectories. In this paper, we propose a differentially private trajectory analysis algorithm for points-of-interest recommendation to users that aims at maximizing the accuracy of the recommendation results while protecting the privacy of the exposed trajectories with differential privacy guarantees. Our algorithm first transforms the raw trajectory dataset into a bipartite graph with nodes representing the users and the points-of-interest and the edges representing the visits made by the users to the locations, and then extracts the association matrix representing the bipartite graph to inject carefully calibrated noise to meet έ-differential privacy guarantees. A post-processing of the perturbed association matrix is performed to suppress noise prior to performing a Hyperlink-Induced Topic Search (HITS) on the transformed data that generates an ordered list of recommended points-of-interest. Extensive experiments on a real trajectory dataset show that our algorithm is efficient, scalable and demonstrates high recommendation accuracy while meeting the required differential privacy guarantees.

Palanisamy, B., Li, C., Krishnamurthy, P..  2017.  Group Differential Privacy-Preserving Disclosure of Multi-level Association Graphs. 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). :2587–2588.

Traditional privacy-preserving data disclosure solutions have focused on protecting the privacy of individual's information with the assumption that all aggregate (statistical) information about individuals is safe for disclosure. Such schemes fail to support group privacy where aggregate information about a group of individuals may also be sensitive and users of the published data may have different levels of access privileges entitled to them. We propose the notion ofεg-Group Differential Privacy that protects sensitive information of groups of individuals at various defined privacy levels, enabling data users to obtain the level of access entitled to them. We present a preliminary evaluation of the proposed notion of group privacy through experiments on real association graph data that demonstrate the guarantees on group privacy on the disclosed data.

2018-05-30
Moriano, Pablo, Pendleton, Jared, Rich, Steven, Camp, L Jean.  2017.  Insider Threat Event Detection in User-System Interactions. Proceedings of the 2017 International Workshop on Managing Insider Security Threats. :1–12.

Detection of insider threats relies on monitoring individuals and their interactions with organizational resources. Identification of anomalous insiders typically relies on supervised learning models that use labeled data. However, such labeled data is not easily obtainable. The labeled data that does exist is also limited by current insider threat detection methods and undetected insiders would not be included. These models also inherently assume that the insider threat is not rapidly evolving between model generation and use of the model in detection. Yet there is a large body of research that illustrates that the insider threat changes significantly after some types of precipitating events, such as layoffs, significant restructuring, and plant or facility closure. To capture this temporal evolution of user-system interactions, we use an unsupervised learning framework to evaluate whether potential insider threat events are triggered following precipitating events. The analysis leverages a bipartite graph of user and system interactions. The approach shows a clear correlation between precipitating events and the number of apparent anomalies. The results of our empirical analysis show a clear shift in behaviors after events which have previously been shown to increase insider activity, specifically precipitating events. We argue that this metadata about the level of insider threat behaviors validates the potential of the approach. We apply our method to a dataset that comprises interactions between engineers and software components in an enterprise version control system spanning more than 22 years. We use this unlabeled dataset and automatically detect statistically significant events. We show that there is statistically significant evidence that a subset of users diversify their committing behavior after precipitating events have been announced. Although these findings do not constitute detection of insider threat events per se, they do identify patterns of potentially malicious high-risk insider behavior. They reinforce the idea that insider operations can be motivated by the insiders' environment. Our proposed framework outperforms algorithms based on naive random approaches and algorithms using volume dependent statistics. This graph mining technique has potential for early detection of insider threat behavior in user-system interactions independent of the volume of interactions. The proposed method also enables organizations without a corpus of identified insider threats to train its own anomaly detection system.

2018-02-06
Eslami, M., Zheng, G., Eramian, H., Levchuk, G..  2017.  Anomaly Detection on Bipartite Graphs for Cyber Situational Awareness and Threat Detection. 2017 IEEE International Conference on Big Data (Big Data). :4741–4743.

Data from cyber logs can often be represented as a bipartite graph (e.g. internal IP-external IP, user-application, or client-server). State-of-the-art graph based anomaly detection often generalizes across all types of graphs — namely bipartite and non-bipartite. This confounds the interpretation and use of specific graph features such as degree, page rank, and eigencentrality that can provide a security analyst with rapid situational awareness of their network. Furthermore, graph algorithms applied to data collected from large, distributed enterprise scale networks require accompanying methods that allow them to scale to the data collected. In this paper, we provide a novel, scalable, directional graph projection framework that operates on cyber logs that can be represented as bipartite graphs. This framework computes directional graph projections and identifies a set of interpretable graph features that describe anomalies within each partite.

Palanisamy, B., Li, C., Krishnamurthy, P..  2017.  Group Privacy-Aware Disclosure of Association Graph Data. 2017 IEEE International Conference on Big Data (Big Data). :1043–1052.

In the age of Big Data, we are witnessing a huge proliferation of digital data capturing our lives and our surroundings. Data privacy is a critical barrier to data analytics and privacy-preserving data disclosure becomes a key aspect to leveraging large-scale data analytics due to serious privacy risks. Traditional privacy-preserving data publishing solutions have focused on protecting individual's private information while considering all aggregate information about individuals as safe for disclosure. This paper presents a new privacy-aware data disclosure scheme that considers group privacy requirements of individuals in bipartite association graph datasets (e.g., graphs that represent associations between entities such as customers and products bought from a pharmacy store) where even aggregate information about groups of individuals may be sensitive and need protection. We propose the notion of $ε$g-Group Differential Privacy that protects sensitive information of groups of individuals at various defined group protection levels, enabling data users to obtain the level of information entitled to them. Based on the notion of group privacy, we develop a suite of differentially private mechanisms that protect group privacy in bipartite association graphs at different group privacy levels based on specialization hierarchies. We evaluate our proposed techniques through extensive experiments on three real-world association graph datasets and our results demonstrate that the proposed techniques are effective, efficient and provide the required guarantees on group privacy.

2018-01-23
Eslami, M., Zheng, G., Eramian, H., Levchuk, G..  2017.  Deriving cyber use cases from graph projections of cyber data represented as bipartite graphs. 2017 IEEE International Conference on Big Data (Big Data). :4658–4663.

Graph analysis can capture relationships between network entities and can be used to identify and rank anomalous hosts, users, or applications from various types of cyber logs. It is often the case that the data in the logs can be represented as a bipartite graph (e.g. internal IP-external IP, user-application, or client-server). State-of-the-art graph based anomaly detection often generalizes across all types of graphs — namely bipartite and non-bipartite. This confounds the interpretation and use of specific graph features such as degree, page rank, and eigencentrality that can provide a security analyst with situational awareness and even insights to potential attacks on enterprise scale networks. Furthermore, graph algorithms applied to data collected from large, distributed enterprise scale networks require accompanying methods that allow them to scale to the data collected. In this paper, we provide a novel, scalable, directional graph projection framework that operates on cyber logs that can be represented as bipartite graphs. We also present methodologies to further narrow returned results to anomalous/outlier cases that may be indicative of a cyber security event. This framework computes directional graph projections and identifies a set of interpretable graph features that describe anomalies within each partite.

2017-12-12
Gamachchi, A., Boztas, S..  2017.  Insider Threat Detection Through Attributed Graph Clustering. 2017 IEEE Trustcom/BigDataSE/ICESS. :112–119.

While most organizations continue to invest in traditional network defences, a formidable security challenge has been brewing within their own boundaries. Malicious insiders with privileged access in the guise of a trusted source have carried out many attacks causing far reaching damage to financial stability, national security and brand reputation for both public and private sector organizations. Growing exposure and impact of the whistleblower community and concerns about job security with changing organizational dynamics has further aggravated this situation. The unpredictability of malicious attackers, as well as the complexity of malicious actions, necessitates the careful analysis of network, system and user parameters correlated with insider threat problem. Thus it creates a high dimensional, heterogeneous data analysis problem in isolating suspicious users. This research work proposes an insider threat detection framework, which utilizes the attributed graph clustering techniques and outlier ranking mechanism for enterprise users. Empirical results also confirm the effectiveness of the method by achieving the best area under curve value of 0.7648 for the receiver operating characteristic curve.

2017-11-03
Harrigan, M., Fretter, C..  2016.  The Unreasonable Effectiveness of Address Clustering. 2016 Intl IEEE Conferences on Ubiquitous Intelligence Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld). :368–373.

Address clustering tries to construct the one-to-many mapping from entities to addresses in the Bitcoin system. Simple heuristics based on the micro-structure of transactions have proved very effective in practice. In this paper we describe the primary reasons behind this effectiveness: address reuse, avoidable merging, super-clusters with high centrality,, the incremental growth of address clusters. We quantify their impact during Bitcoin's first seven years of existence.

2015-05-06
Kanizo, Y., Hay, D., Keslassy, I..  2015.  Maximizing the Throughput of Hash Tables in Network Devices with Combined SRAM/DRAM Memory. Parallel and Distributed Systems, IEEE Transactions on. 26:796-809.

Hash tables form a core component of many algorithms as well as network devices. Because of their large size, they often require a combined memory model, in which some of the elements are stored in a fast memory (for example, cache or on-chip SRAM) while others are stored in much slower memory (namely, the main memory or off-chip DRAM). This makes the implementation of real-life hash tables particularly delicate, as a suboptimal choice of the hashing scheme parameters may result in a higher average query time, and therefore in a lower throughput. In this paper, we focus on multiple-choice hash tables. Given the number of choices, we study the tradeoff between the load of a hash table and its average lookup time. The problem is solved by analyzing an equivalent problem: the expected maximum matching size of a random bipartite graph with a fixed left-side vertex degree. Given two choices, we provide exact results for any finite system, and also deduce asymptotic results as the fast memory size increases. In addition, we further consider other variants of this problem and model the impact of several parameters. Finally, we evaluate the performance of our models on Internet backbone traces, and illustrate the impact of the memories speed difference on the choice of parameters. In particular, we show that the common intuition of entirely avoiding slow memory accesses by using highly efficient schemes (namely, with many fast-memory choices) is not always optimal.