Biblio

Filters: Author is Wang, Gang
2019-12-16
Guo, Wenbo, Mu, Dongliang, Xu, Jun, Su, Purui, Wang, Gang, Xing, Xinyu.  2018.  LEMNA: Explaining Deep Learning Based Security Applications. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :364–379.
While deep learning has shown great potential in various domains, the lack of transparency has limited its application in security or safety-critical areas. Existing research has attempted to develop explanation techniques to provide interpretable explanations for each classification decision. Unfortunately, current methods are optimized for non-security tasks (e.g., image analysis). Their key assumptions are often violated in security applications, leading to poor explanation fidelity. In this paper, we propose LEMNA, a high-fidelity explanation method dedicated to security applications. Given an input data sample, LEMNA generates a small set of interpretable features to explain how the input sample is classified. The core idea is to approximate a local area of the complex deep learning decision boundary using a simple interpretable model. The local interpretable model is specially designed to (1) handle feature dependency to better work with security applications (e.g., binary code analysis); and (2) handle nonlinear local boundaries to boost explanation fidelity. We evaluate our system using two popular deep learning applications in security (a malware classifier, and a function start detector for binary reverse-engineering). Extensive evaluations show that LEMNA's explanation has a much higher fidelity level compared to existing methods. In addition, we demonstrate practical use cases of LEMNA to help machine learning developers validate model behavior, troubleshoot classification errors, and automatically patch the errors of the target models.
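The idea of approximating a local area of a decision boundary with a simple interpretable model can be sketched as follows. This is a minimal illustration only: it fits a plain ridge-regularized linear surrogate, which is the simpler baseline and not LEMNA's specially designed model, so it handles neither feature dependency nor nonlinear local boundaries. The names `explain_locally`, `solve_linear_system`, and `toy_classifier` are hypothetical, and the classifier is a stand-in for a real deep learning model.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain_locally(predict, x, num_samples=500, ridge=1e-3, seed=0):
    """Fit a linear surrogate around x and rank features by |weight|.

    Perturbed samples are made by randomly zeroing features of x; the
    surrogate regresses the model output on the keep/drop mask.
    """
    rng = random.Random(seed)
    d = len(x)
    rows, targets = [], []
    for _ in range(num_samples):
        mask = [rng.randint(0, 1) for _ in range(d)]
        perturbed = [xi if keep else 0.0 for xi, keep in zip(x, mask)]
        rows.append(mask + [1])  # trailing 1 is the intercept column
        targets.append(predict(perturbed))
    k = d + 1
    # Normal equations: (X^T X + ridge * I) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    weights = solve_linear_system(xtx, xty)[:d]
    ranking = sorted(range(d), key=lambda i: abs(weights[i]), reverse=True)
    return ranking, weights


def toy_classifier(v):
    """Hypothetical black box: only features 0 and 2 influence the score."""
    return 1.0 / (1.0 + math.exp(-(3.0 * v[0] - 2.0 * v[2])))


ranking, weights = explain_locally(toy_classifier, [1.0, 1.0, 1.0, 1.0])
print(ranking[:2], [round(w, 2) for w in weights])
```

In this toy setting the two features the classifier actually uses should rank first, with a positive weight for feature 0 and a negative weight for feature 2.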
2019-11-26
Tian, Ke, Jan, Steve T. K., Hu, Hang, Yao, Danfeng, Wang, Gang.  2018.  Needle in a Haystack: Tracking Down Elite Phishing Domains in the Wild. Proceedings of the Internet Measurement Conference 2018. :429–442.

Today's phishing websites are constantly evolving to deceive users and evade detection. In this paper, we perform a measurement study on squatting phishing domains where the websites impersonate trusted entities not only at the page content level but also at the web domain level. To search for squatting phishing pages, we scanned five types of squatting domains over 224 million DNS records and identified 657K domains that are likely impersonating 702 popular brands. Then we build a novel machine learning classifier to detect phishing pages from both the web and mobile pages under the squatting domains. A key novelty is that our classifier is built on a careful measurement of evasive behaviors of phishing pages in practice. We introduce new features from visual analysis and optical character recognition (OCR) to overcome the heavy content obfuscation from attackers. In total, we discovered and verified 1,175 squatting phishing pages. We show that these phishing pages are used for various targeted scams, and are highly effective at evading detection. More than 90% of them successfully evaded popular blacklists for at least a month.

2019-02-22
Wang, Xiangwen, Peng, Peng, Wang, Chun, Wang, Gang.  2018.  You Are Your Photographs: Detecting Multiple Identities of Vendors in the Darknet Marketplaces. Proceedings of the 2018 on Asia Conference on Computer and Communications Security. :431–442.

Darknet markets are online services behind Tor where cybercriminals trade illegal goods and stolen datasets. In recent years, security analysts and law enforcement have started to investigate the darknet markets to study the cybercriminal networks and predict future incidents. However, vendors in these markets often create multiple accounts (i.e., Sybils), making it challenging to infer the relationships between cybercriminals and identify coordinated crimes. In this paper, we present a novel approach to link the multiple accounts of the same darknet vendors through photo analytics. The core idea is that darknet vendors often have to take their own product photos to prove the possession of the illegal goods, which can reveal their distinct photography styles. To fingerprint vendors, we construct a series of deep neural networks to model the photography styles. We apply transfer learning to the model training, which allows us to accurately fingerprint vendors with a limited number of photos. We evaluate the system using real-world datasets from 3 large darknet markets (7,641 vendors and 197,682 product photos). A ground-truth evaluation shows that the system achieves an accuracy of 97.5%, outperforming existing stylometry-based methods in both accuracy and coverage. In addition, our system identifies previously unknown Sybil accounts within the same markets (23) and across different markets (715 pairs). Further case studies reveal new insights into the coordinated Sybil activities such as price manipulation, buyer scam, and product stocking and reselling.

2018-11-19
Kedrowitsch, Alexander, Yao, Danfeng (Daphne), Wang, Gang, Cameron, Kirk.  2017.  A First Look: Using Linux Containers for Deceptive Honeypots. Proceedings of the 2017 Workshop on Automated Decision Making for Active Cyber Defense. :15–22.

The ever-increasing sophistication of malware has made malicious binary collection and analysis an absolute necessity for proactive defenses. Meanwhile, malware authors seek to harden their binaries against analysis by incorporating environment detection techniques, in order to identify if the binary is executing within a virtual environment or in the presence of monitoring tools. For security researchers, it is still an open question how to remove the artifacts from virtual machines to effectively build deceptive "honeypots" for malware collection and analysis. In this paper, we explore a completely different and yet promising approach by using Linux containers. Linux containers, in theory, have minimal virtualization artifacts and are easily deployable on low-power devices. Our work performs the first controlled experiments to compare Linux containers with bare metal and 5 major types of virtual machines. We seek to measure the deception capabilities offered by Linux containers to defeat mainstream virtual environment detection techniques. In addition, we empirically explore the potential weaknesses in Linux containers to help defenders make more informed design decisions.

2017-09-15
Wang, Gang, Zhang, Xinyi, Tang, Shiliang, Zheng, Haitao, Zhao, Ben Y.  2016.  Unsupervised Clickstream Clustering for User Behavior Analysis. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. :225–236.

Online services are increasingly dependent on user participation. Whether it's online social networks or crowdsourcing services, understanding user behavior is important yet challenging. In this paper, we build an unsupervised system to capture dominating user behaviors from clickstream data (traces of users' click events), and visualize the detected behaviors in an intuitive manner. Our system identifies "clusters" of similar users by partitioning a similarity graph (nodes are users; edges are weighted by clickstream similarity). The partitioning process leverages iterative feature pruning to capture the natural hierarchy within user clusters and produce intuitive features for visualizing and understanding captured user behaviors. For evaluation, we present case studies on two large-scale clickstream traces (142 million events) from real social networks. Our system effectively identifies previously unknown behaviors, e.g., dormant users, hostile chatters. Also, our user study shows people can easily interpret identified behaviors using our visualization tool.
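
The similarity graph described in this abstract (nodes are users; edges are weighted by clickstream similarity) can be illustrated with a small sketch. Everything specific here is an assumption for illustration: similarity is taken as the cosine similarity of click-event bigram counts, and the partitioning step is replaced by simple thresholded connected components rather than the paper's iterative feature pruning. The function names, the threshold, and the sample traces are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def bigram_counts(events):
    """Count consecutive event pairs in one user's clickstream."""
    return Counter(zip(events, events[1:]))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_similarity_graph(clickstreams):
    """Return {(user_a, user_b): weight} for every pair of users."""
    profiles = {user: bigram_counts(ev) for user, ev in clickstreams.items()}
    return {(u, v): cosine(profiles[u], profiles[v])
            for u, v in combinations(sorted(profiles), 2)}


def cluster_users(clickstreams, threshold=0.5):
    """Group users linked by edges with weight >= threshold (union-find)."""
    parent = {user: user for user in clickstreams}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for (u, v), weight in build_similarity_graph(clickstreams).items():
        if weight >= threshold:
            parent[find(u)] = find(v)

    clusters = {}
    for user in clickstreams:
        clusters.setdefault(find(user), set()).add(user)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical traces: two chat-oriented users and two posting-oriented users.
traces = {
    "alice": ["login", "browse", "browse", "chat", "logout"],
    "bob": ["login", "browse", "chat", "chat", "logout"],
    "carol": ["login", "post", "post", "post", "logout"],
    "dave": ["login", "post", "post", "logout"],
}
print(cluster_users(traces))
```

With these sample traces, alice and bob share three of four bigrams (similarity 0.75) and land in one cluster, while carol and dave form the other; the two groups share no bigrams at all.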