Visible to the public Biblio

Found 2493 results

Filters: First Letter Of Last Name is W  [Clear All Filters]
2021-02-01
Zhang, Y., Liu, Y., Chung, C.-L., Wei, Y.-C., Chen, C.-H..  2020.  Machine Learning Method Based on Stream Homomorphic Encryption Computing. 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan). :1–2.
This study proposes a machine learning method based on stream homomorphic encryption computing for improving security and reducing computational time. A case study of mobile positioning based on k nearest neighbors ( kNN) is selected to evaluate the proposed method. The results showed the proposed method can save computational resources than others.
2021-01-28
Wang, N., Song, H., Luo, T., Sun, J., Li, J..  2020.  Enhanced p-Sensitive k-Anonymity Models for Achieving Better Privacy. 2020 IEEE/CIC International Conference on Communications in China (ICCC). :148—153.

To our best knowledge, the p-sensitive k-anonymity model is a sophisticated model to resist linking attacks and homogeneous attacks in data publishing. However, if the distribution of sensitive values is skew, the model is difficult to defend against skew attacks and even faces sensitive attacks. In practice, the privacy requirements of different sensitive values are not always identical. The “one size fits all” unified privacy protection level may cause unnecessary information loss. To address these problems, the paper quantifies privacy requirements with the concept of IDF and concerns more about sensitive groups. Two enhanced anonymous models with personalized protection characteristic, that is, (p,αisg) -sensitive k-anonymity model and (pi,αisg)-sensitive k-anonymity model, are then proposed to resist skew attacks and sensitive attacks. Furthermore, two clustering algorithms with global search and local search are designed to implement our models. Experimental results show that the two enhanced models have outstanding advantages in better privacy at the expense of a little data utility.

Zhang, M., Wei, T., Li, Z., Zhou, Z..  2020.  A service-oriented adaptive anonymity algorithm. 2020 39th Chinese Control Conference (CCC). :7626—7631.

Recently, a large amount of research studies aiming at the privacy-preserving data publishing have been conducted. We find that most K-anonymity algorithms fail to consider the characteristics of attribute values distribution in data and the contribution value differences in quasi-identifier attributes when service-oriented. In this paper, the importance of distribution characteristics of attribute values and the differences in contribution value of quasi-identifier attributes to anonymous results are illustrated. In order to maximize the utility of released data, a service-oriented adaptive anonymity algorithm is proposed. We establish a model of reaction dispersion degree to quantify the characteristics of attribute value distribution and introduce the concept of utility weight related to the contribution value of quasi-identifier attributes. The priority coefficient and the characterization coefficient of partition quality are defined to optimize selection strategies of dimension and splitting value in anonymity group partition process adaptively, which can reduce unnecessary information loss so as to further improve the utility of anonymized data. The rationality and validity of the algorithm are verified by theoretical analysis and multiple experiments.

Wang, Y., Gao, W., Hei, X., Mungwarama, I., Ren, J..  2020.  Independent credible: Secure communication architecture of Android devices based on TrustZone. 2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics). :85—92.

The development of mobile internet has brought convenience to people, but the openness and diversity of mobile Internet make it face the security threat of communication privacy data disclosure. In this paper, a trusted android device security communication method based on TrustZone is proposed. Firstly, Elliptic Curve Diffie-Hellman (ECDH) key agreement algorithm is used to make both parties negotiate the session key in the Trusted Execution Environment (TEE), and then, we stored the key safely in the TEE. Finally, TEE completes the encryption and decryption of the transmitted data. This paper constructs a secure communication between mobile devices without a trusted third party and analyzes the feasibility of the method from time efficiency and security. The experimental results show that the method can resist malicious application monitoring in the process of data encryption and ensures the security of the session key. Compared with the traditional scheme, it is found that the performance of the scheme is not significantly reduced.

Goswami, U., Wang, K., Nguyen, G., Lagesse, B..  2020.  Privacy-Preserving Mobile Video Sharing using Fully Homomorphic Encryption. 2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). :1—3.

Increased availability of mobile cameras has led to more opportunities for people to record videos of significantly more of their lives. Many times people want to share these videos, but only to certain people who were co-present. Since the videos may be of a large event where the attendees are not necessarily known, we need a method for proving co-presence without revealing information before co-presence is proven. In this demonstration, we present a privacy-preserving method for comparing the similarity of two videos without revealing the contents of either video. This technique leverages the Similarity of Simultaneous Observation technique for detecting hidden webcams and modifies the existing algorithms so that they are computationally feasible to run under fully homomorphic encryption scheme on modern mobile devices. The demonstration will consist of a variety of devices preloaded with our software. We will demonstrate the video sharing software performing comparisons in real time. We will also make the software available to Android devices via a QR code so that participants can record and exchange their own videos.

Nweke, L. O., Weldehawaryat, G. Kahsay, Wolthusen, S. D..  2020.  Adversary Model for Attacks Against IEC 61850 Real-Time Communication Protocols. 2020 16th International Conference on the Design of Reliable Communication Networks DRCN 2020. :1—8.

Adversarial models are well-established for cryptographic protocols, but distributed real-time protocols have requirements that these abstractions are not intended to cover. The IEEE/IEC 61850 standard for communication networks and systems for power utility automation in particular not only requires distributed processing, but in case of the generic object oriented substation events and sampled value (GOOSE/SV) protocols also hard real-time characteristics. This motivates the desire to include both quality of service (QoS) and explicit network topology in an adversary model based on a π-calculus process algebraic formalism based on earlier work. This allows reasoning over process states, placement of adversarial entities and communication behaviour. We demonstrate the use of our model for the simple case of a replay attack against the publish/subscribe GOOSE/SV subprotocol, showing bounds for non-detectability of such an attack.

Wang, W., Tang, B., Zhu, C., Liu, B., Li, A., Ding, Z..  2020.  Clustering Using a Similarity Measure Approach Based on Semantic Analysis of Adversary Behaviors. 2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC). :1—7.

Rapidly growing shared information for threat intelligence not only helps security analysts reduce time on tracking attacks, but also bring possibilities to research on adversaries' thinking and decisions, which is important for the further analysis of attackers' habits and preferences. In this paper, we analyze current models and frameworks used in threat intelligence that suited to different modeling goals, and propose a three-layer model (Goal, Behavior, Capability) to study the statistical characteristics of APT groups. Based on the proposed model, we construct a knowledge network composed of adversary behaviors, and introduce a similarity measure approach to capture similarity degree by considering different semantic links between groups. After calculating similarity degrees, we take advantage of Girvan-Newman algorithm to discover community groups, clustering result shows that community structures and boundaries do exist by analyzing the behavior of APT groups.

2021-01-25
Malzahn, D., Birnbaum, Z., Wright-Hamor, C..  2020.  Automated Vulnerability Testing via Executable Attack Graphs. 2020 International Conference on Cyber Security and Protection of Digital Services (Cyber Security). :1–10.
Cyber risk assessments are an essential process for analyzing and prioritizing security issues. Unfortunately, many risk assessment methodologies are marred by human subjectivity, resulting in non-repeatable, inconsistent findings. The absence of repeatable and consistent results can lead to suboptimal decision making with respect to cyber risk reduction. There is a pressing need to reduce cyber risk assessment uncertainty by using tools that use well defined inputs, producing well defined results. This paper presents Automated Vulnerability and Risk Analysis (AVRA), an end-to-end process and tool for identifying and exploiting vulnerabilities, designed for use in cyber risk assessments. The approach presented is more comprehensive than traditional vulnerability scans due to its analysis of an entire network, integrating both host and network information. AVRA automatically generates a detailed model of the network and its individual components, which is used to create an attack graph. Then, AVRA follows individual attack paths, automatically launching exploits to reach a particular objective. AVRA was successfully tested within a virtual environment to demonstrate practicality and usability. The presented approach and resulting system enhances the cyber risk assessment process through rigor, repeatability, and objectivity.
Feng, Y., Sun, G., Liu, Z., Wu, C., Zhu, X., Wang, Z., Wang, B..  2020.  Attack Graph Generation and Visualization for Industrial Control Network. 2020 39th Chinese Control Conference (CCC). :7655–7660.
Attack graph is an effective way to analyze the vulnerabilities for industrial control networks. We develop a vulnerability correlation method and a practical visualization technology for industrial control network. First of all, we give a complete attack graph analysis for industrial control network, which focuses on network model and vulnerability context. Particularly, a practical attack graph algorithm is proposed, including preparing environments and vulnerability classification and correlation. Finally, we implement a three-dimensional interactive attack graph visualization tool. The experimental results show validation and verification of the proposed method.
2021-01-22
Burr, B., Wang, S., Salmon, G., Soliman, H..  2020.  On the Detection of Persistent Attacks using Alert Graphs and Event Feature Embeddings. NOMS 2020 - 2020 IEEE/IFIP Network Operations and Management Symposium. :1—4.
Intrusion Detection Systems (IDS) generate a high volume of alerts that security analysts do not have the resources to explore fully. Modelling attacks, especially the coordinated campaigns of Advanced Persistent Threats (APTs), in a visually-interpretable way is a useful approach for network security. Graph models combine multiple alerts and are well suited for visualization and interpretation, increasing security effectiveness. In this paper, we use feature embeddings, learned from network event logs, and community detection to construct and segment alert graphs of related alerts and networks hosts. We posit that such graphs can aid security analysts in investigating alerts and may capture multiple aspects of an APT attack. The eventual goal of this approach is to construct interpretable attack graphs and extract causality information to identify coordinated attacks.
2021-01-20
Li, Y., Yang, Y., Yu, X., Yang, T., Dong, L., Wang, W..  2020.  IoT-APIScanner: Detecting API Unauthorized Access Vulnerabilities of IoT Platform. 2020 29th International Conference on Computer Communications and Networks (ICCCN). :1—5.

The Internet of Things enables interaction between IoT devices and users through the cloud. The cloud provides services such as account monitoring, device management, and device control. As the center of the IoT platform, the cloud provides services to IoT devices and IoT applications through APIs. Therefore, the permission verification of the API is essential. However, we found that some APIs are unverified, which allows unauthorized users to access cloud resources or control devices; it could threaten the security of devices and cloud. To check for unauthorized access to the API, we developed IoT-APIScanner, a framework to check the permission verification of the cloud API. Through observation, we found there is a large amount of interactive information between IoT application and cloud, which include the APIs and related parameters, so we can extract them by analyzing the code of the IoT application, and use this for mutating API test cases. Through these test cases, we can effectively check the permissions of the API. In our research, we extracted a total of 5 platform APIs. Among them, the proportion of APIs without permission verification reached 13.3%. Our research shows that attackers could use the API without permission verification to obtain user privacy or control of devices.

Li, H., Xie, R., Kong, X., Wang, L., Li, B..  2020.  An Analysis of Utility for API Recommendation: Do the Matched Results Have the Same Efforts? 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS). :479—488.

The current evaluation of API recommendation systems mainly focuses on correctness, which is calculated through matching results with ground-truth APIs. However, this measurement may be affected if there exist more than one APIs in a result. In practice, some APIs are used to implement basic functionalities (e.g., print and log generation). These APIs can be invoked everywhere, and they may contribute less than functionally related APIs to the given requirements in recommendation. To study the impacts of correct-but-useless APIs, we use utility to measure them. Our study is conducted on more than 5,000 matched results generated by two specification-based API recommendation techniques. The results show that the matched APIs are heavily overlapped, 10% APIs compose more than 80% matched results. The selected 10% APIs are all correct, but few of them are used to implement the required functionality. We further propose a heuristic approach to measure the utility and conduct an online evaluation with 15 developers. Their reports confirm that the matched results with higher utility score usually have more efforts on programming than the lower ones.

Mindermann, K., Wagner, S..  2020.  Fluid Intelligence Doesn't Matter! Effects of Code Examples on the Usability of Crypto APIs. 2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). :306—307.

Context : Programmers frequently look for the code of previously solved problems that they can adapt for their own problem. Despite existing example code on the web, on sites like Stack Overflow, cryptographic Application Programming Interfaces (APIs) are commonly misused. There is little known about what makes examples helpful for developers in using crypto APIs. Analogical problem solving is a psychological theory that investigates how people use known solutions to solve new problems. There is evidence that the capacity to reason and solve novel problems a.k.a Fluid Intelligence (Gf) and structurally and procedurally similar solutions support problem solving. Aim: Our goal is to understand whether similarity and Gf also have an effect in the context of using cryptographic APIs with the help of code examples. Method : We conducted a controlled experiment with 76 student participants developing with or without procedurally similar examples, one of two Java crypto libraries and measured the Gf of the participants as well as the effect on usability (effectiveness, efficiency, satisfaction) and security bugs. Results: We observed a strong effect of code examples with a high procedural similarity on all dependent variables. Fluid intelligence Gf had no effect. It also made no difference which library the participants used. Conclusions: Example code must be more highly similar to a concrete solution, not very abstract and generic to have a positive effect in a development task.

Wang, H., Yang, J., Wang, X., Li, F., Liu, W., Liang, H..  2020.  Feature Fingerprint Extraction and Abnormity Diagnosis Method of the Vibration on the GIS. 2020 IEEE International Conference on High Voltage Engineering and Application (ICHVE). :1—4.

Mechanical faults of Gas Insulated Switchgear (GIS) often occurred, which may cause serious losses. Detecting vibration signal was effective for condition monitoring and fault diagnosis of GIS. The vibration characteristic of GIS in service was detected and researched based on a developed testing system in this paper, and feature fingerprint extraction method was proposed to evaluate vibration characteristics and diagnose mechanical defects. Through analyzing the spectrum of the vibration signal, we could see that vibration frequency of operating GIS was about 100Hz under normal condition. By means of the wavelet transformation, the vibration fingerprint was extracted for the diagnosis of mechanical vibration. The mechanical vibration characteristic of GIS including circuit breaker and arrester in service was detected, we could see that the frequency distribution of abnormal vibration signal was wider, it contained a lot of high harmonic components besides the 100Hz component, and the vibration acoustic fingerprint was totally different from the normal ones, that is, by comparing the frequency spectra and vibration fingerprint, the mechanical faults of GIS could be found effectively.

Lei, M., Jin, M., Huang, T., Guo, Z., Wang, Q., Wu, Z., Chen, Z., Chen, X., Zhang, J..  2020.  Ultra-wideband Fingerprinting Positioning Based on Convolutional Neural Network. 2020 International Conference on Computer, Information and Telecommunication Systems (CITS). :1—5.

The Global Positioning System (GPS) can determine the position of any person or object on earth based on satellite signals. But when inside the building, the GPS cannot receive signals, the indoor positioning system will determine the precise position. How to achieve more precise positioning is the difficulty of an indoor positioning system now. In this paper, we proposed an ultra-wideband fingerprinting positioning method based on a convolutional neural network (CNN), and we collect the dataset in a room to test the model, then compare our method with the existing method. In the experiment, our method can reach an accuracy of 98.36%. Compared with other fingerprint positioning methods our method has a great improvement in robustness. That results show that our method has good practicality while achieves higher accuracy.

2021-01-18
Yadav, M. K., Gugal, D., Matkar, S., Waghmare, S..  2019.  Encrypted Keyword Search in Cloud Computing using Fuzzy Logic. 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT). :1–4.
Research and Development, and information management professionals routinely employ simple keyword searches or more complex Boolean queries when using databases such as PubMed and Ovid and search engines like Google to find the information they need. While satisfying the basic needs of the researcher, basic search is limited which can adversely affect both precision and recall, decreasing productivity and damaging the researchers' ability to discover new insights. The cloud service providers who store user's data may access sensitive information without any proper authority. A basic approach to save the data confidentiality is to encrypt the data. Data encryption also demands the protection of keyword privacy since those usually contain very vital information related to the files. Encryption of keywords protects keyword safety. Fuzzy keyword search enhances system usability by matching the files perfectly or to the nearest possible files against the keywords entered by the user based on similar semantics. Encrypted keyword search in cloud using this logic provides the user, on entering keywords, to receive best possible files in a more secured manner, by protecting the user's documents.
Liu, J., Tong, X., Zhang, M., Wang, Z..  2020.  The Design of S-box Based on Combined Chaotic Map. 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE). :350–353.
The strength of the substitution box (S-box) determines the security of the cryptographic algorithm because it's the only nonlinear component in the block cipher. Because of the disadvantages of non-uniformity sequence and limited range in the one-dimension (1D) chaotic map, this paper constructs the logistic map and the sine map into a combined chaotic map, and a new S-box construction method based on this combined chaotic map is presented. Performance tests were performed on the S-box, including nonlinearity, linear probability, differential probability, strict avalanche criterion, bits independence criterion. Compared with others S-box, this result indicates that the S-box has more excellent cryptographic performance and can be used as a nonlinear component in the lightweight block cipher algorithm.
Huang, Y., Wang, S., Wang, Y., Li, H..  2020.  A New Four-Dimensional Chaotic System and Its Application in Speech Encryption. 2020 Information Communication Technologies Conference (ICTC). :171–175.
Traditional encryption algorithms are not suitable for modern mass speech situations, while some low-dimensional chaotic encryption algorithms are simple and easy to implement, but their key space often small, leading to poor security, so there is still a lot of room for improvement. Aiming at these problems, this paper proposes a new type of four-dimensional chaotic system and applies it to speech encryption. Simulation results show that the encryption scheme in this paper has higher key space and security, which can achieve the speech encryption goal.
2021-01-15
Brockschmidt, J., Shang, J., Wu, J..  2019.  On the Generality of Facial Forgery Detection. 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems Workshops (MASSW). :43—47.
A variety of architectures have been designed or repurposed for the task of facial forgery detection. While many of these designs have seen great success, they largely fail to address challenges these models may face in practice. A major challenge is posed by generality, wherein models must be prepared to perform in a variety of domains. In this paper, we investigate the ability of state-of-the-art facial forgery detection architectures to generalize. We first propose two criteria for generality: reliably detecting multiple spoofing techniques and reliably detecting unseen spoofing techniques. We then devise experiments which measure how a given architecture performs against these criteria. Our analysis focuses on two state-of-the-art facial forgery detection architectures, MesoNet and XceptionNet, both being convolutional neural networks (CNNs). Our experiments use samples from six state-of-the-art facial forgery techniques: Deepfakes, Face2Face, FaceSwap, GANnotation, ICface, and X2Face. We find MesoNet and XceptionNet show potential to generalize to multiple spoofing techniques but with a slight trade-off in accuracy, and largely fail against unseen techniques. We loosely extrapolate these results to similar CNN architectures and emphasize the need for better architectures to meet the challenges of generality.
Khalid, H., Woo, S. S..  2020.  OC-FakeDect: Classifying Deepfakes Using One-class Variational Autoencoder. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). :2794—2803.
An image forgery method called Deepfakes can cause security and privacy issues by changing the identity of a person in a photo through the replacement of his/her face with a computer-generated image or another person's face. Therefore, a new challenge of detecting Deepfakes arises to protect individuals from potential misuses. Many researchers have proposed various binary-classification based detection approaches to detect deepfakes. However, binary-classification based methods generally require a large amount of both real and fake face images for training, and it is challenging to collect sufficient fake images data in advance. Besides, when new deepfakes generation methods are introduced, little deepfakes data will be available, and the detection performance may be mediocre. To overcome these data scarcity limitations, we formulate deepfakes detection as a one-class anomaly detection problem. We propose OC-FakeDect, which uses a one-class Variational Autoencoder (VAE) to train only on real face images and detects non-real images such as deepfakes by treating them as anomalies. Our preliminary result shows that our one class-based approach can be promising when detecting Deepfakes, achieving a 97.5% accuracy on the NeuralTextures data of the well-known FaceForensics++ benchmark dataset without using any fake images for the training process.
Zhu, K., Wu, B., Wang, B..  2020.  Deepfake Detection with Clustering-based Embedding Regularization. 2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC). :257—264.

In recent months, AI-synthesized face swapping videos referred to as deepfake have become an emerging problem. False video is becoming more and more difficult to distinguish, which brings a series of challenges to social security. Some scholars are devoted to studying how to improve the detection accuracy of deepfake video. At the same time, in order to conduct better research, some datasets for deepfake detection are made. Companies such as Google and Facebook have also spent huge sums of money to produce datasets for deepfake video detection, as well as holding deepfake detection competitions. The continuous advancement of video tampering technology and the improvement of video quality have also brought great challenges to deepfake detection. Some scholars have achieved certain results on existing datasets, while the results on some high-quality datasets are not as good as expected. In this paper, we propose new method with clustering-based embedding regularization for deepfake detection. We use open source algorithms to generate videos which can simulate distinctive artifacts in the deepfake videos. To improve the local smoothness of the representation space, we integrate a clustering-based embedding regularization term into the classification objective, so that the obtained model learns to resist adversarial examples. We evaluate our method on three latest deepfake datasets. Experimental results demonstrate the effectiveness of our method.

2021-01-11
Wu, N., Farokhi, F., Smith, D., Kaafar, M. A..  2020.  The Value of Collaboration in Convex Machine Learning with Differential Privacy. 2020 IEEE Symposium on Security and Privacy (SP). :304–317.
In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the trained machine learning model using differentially-private gradient queries and the fitness of the trained machine model in the absence of any privacy concerns is inversely proportional to the size of the training datasets squared and the privacy budget squared. We successfully validate the performance prediction with the actual performance of the proposed privacy-aware learning algorithms, applied to: financial datasets for determining interest rates of loans using regression; and detecting credit card frauds using support vector machines.
Xin, B., Yang, W., Geng, Y., Chen, S., Wang, S., Huang, L..  2020.  Private FL-GAN: Differential Privacy Synthetic Data Generation Based on Federated Learning. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :2927–2931.
Generative Adversarial Network (GAN) has already made a big splash in the field of generating realistic "fake" data. However, when data is distributed and data-holders are reluctant to share data for privacy reasons, GAN's training is difficult. To address this issue, we propose private FL-GAN, a differential privacy generative adversarial network model based on federated learning. By strategically combining the Lipschitz limit with the differential privacy sensitivity, the model can generate high-quality synthetic data without sacrificing the privacy of the training data. We theoretically prove that private FL-GAN can provide strict privacy guarantee with differential privacy, and experimentally demonstrate our model can generate satisfactory data.
Wang, J., Wang, A..  2020.  An Improved Collaborative Filtering Recommendation Algorithm Based on Differential Privacy. 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS). :310–315.
In this paper, differential privacy protection method is applied to matrix factorization method that used to solve the recommendation problem. For centralized recommendation scenarios, a collaborative filtering recommendation model based on matrix factorization is established, and a matrix factorization mechanism satisfying ε-differential privacy is proposed. Firstly, the potential characteristic matrix of users and projects is constructed. Secondly, noise is added to the matrix by the method of target disturbance, which satisfies the differential privacy constraint, then the noise matrix factorization model is obtained. The parameters of the model are obtained by the stochastic gradient descent algorithm. Finally, the differential privacy matrix factorization model is used for score prediction. The effectiveness of the algorithm is evaluated on the public datasets including Movielens and Netflix. The experimental results show that compared with the existing typical recommendation methods, the new matrix factorization method with privacy protection can recommend within a certain range of recommendation accuracy loss while protecting the users' privacy information.
Zhao, F., Skums, P., Zelikovsky, A., Sevigny, E. L., Swahn, M. H., Strasser, S. M., Huang, Y., Wu, Y..  2020.  Computational Approaches to Detect Illicit Drug Ads and Find Vendor Communities Within Social Media Platforms. IEEE/ACM Transactions on Computational Biology and Bioinformatics. :1–1.
The opioid abuse epidemic represents a major public health threat to global populations. The role social media may play in facilitating illicit drug trade is largely unknown due to limited research. However, it is known that social media use among adults in the US is widespread, there is vast capability for online promotion of illegal drugs with delayed or limited deterrence of such messaging, and further, general commercial sale applications provide safeguards for transactions; however, they do not discriminate between legal and illegal sale transactions. These characteristics of the social media environment present challenges to surveillance which is needed for advancing knowledge of online drug markets and the role they play in the drug abuse and overdose deaths. In this paper, we present a computational framework developed to automatically detect illicit drug ads and communities of vendors.The SVM- and CNNbased methods for detecting illicit drug ads, and a matrix factorization based method for discovering overlapping communities have been extensively validated on the large dataset collected from Google+, Flickr and Tumblr. Pilot test results demonstrate that our computational methods can effectively identify illicit drug ads and detect vendor-community with accuracy. These methods hold promise to advance scientific knowledge surrounding the role social media may play in perpetuating the drug abuse epidemic.