Style Counsel: Seeing the (Random) Forest for the Trees in Adversarial Code Stylometry

Title: Style Counsel: Seeing the (Random) Forest for the Trees in Adversarial Code Stylometry
Publication Type: Conference Paper
Year of Publication: 2018
Authors: McKnight, Christopher; Goldberg, Ian
Conference Name: Proceedings of the 2018 Workshop on Privacy in the Electronic Society
Date Published: January 2018
Publisher: ACM
ISBN Number: 978-1-4503-5989-4
Keywords: Adversarial Machine Learning, Human Behavior, Metrics, programmer privacy, pubcrawl, software authorship attribution, source code stylometry, stylometry
Abstract

The results of recent experiments have suggested that code stylometry can successfully identify the author of short programs from among hundreds of candidates with up to 98% precision. This potential ability to discern the programmer of a code sample from a large group of possible authors could have concerning consequences for the open-source community at large, particularly those contributors that may wish to remain anonymous. Recent international events have suggested the developers of certain anti-censorship and anti-surveillance tools are being targeted by their governments and forced to delete their repositories or face prosecution. In light of this threat to the freedom and privacy of individual programmers around the world, we devised a tool, Style Counsel, to aid programmers in obfuscating their inherent style and imitating another, overt, author's style in order to protect their anonymity from this forensic technique. Our system utilizes the implicit rules encoded in the decision points of a random forest ensemble in order to derive a set of recommendations to present to the user detailing how to achieve this obfuscation and mimicry attack.
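The idea in the abstract's final sentence, reading recommendations off the decision points of a random forest, can be illustrated with a toy sketch. This is not the authors' implementation: the tree representation (`Leaf`, `Split`), the helper functions, the feature names (`avg_line_length`, `tab_indent_ratio`, `comment_density`), the thresholds, and the authors "alice" and "bob" are all hypothetical and chosen only for illustration. For each tree, the sketch finds the root-to-leaf path ending in the target author that the sample violates at the fewest decision points, and reports those violated conditions as suggested changes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    author: str  # author predicted at this leaf


@dataclass
class Split:
    feature: str      # hypothetical style feature name
    threshold: float  # decision point
    left: "Node"      # branch taken when value <= threshold
    right: "Node"     # branch taken when value > threshold


Node = Union[Leaf, Split]


def predict(tree, sample):
    """Follow the decision points from the root to a leaf."""
    node = tree
    while isinstance(node, Split):
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.author


def paths_to(node, target, prefix=()):
    """Yield each root-to-leaf path, as (feature, op, threshold) tuples, ending in `target`."""
    if isinstance(node, Leaf):
        if node.author == target:
            yield prefix
        return
    yield from paths_to(node.left, target, prefix + ((node.feature, "<=", node.threshold),))
    yield from paths_to(node.right, target, prefix + ((node.feature, ">", node.threshold),))


def violations(path, sample):
    """Return the conditions on `path` that `sample` does not currently satisfy."""
    unmet = []
    for feature, op, threshold in path:
        value = sample[feature]
        satisfied = value <= threshold if op == "<=" else value > threshold
        if not satisfied:
            unmet.append((feature, op, threshold))
    return unmet


def recommend(forest, sample, target):
    """For each tree, pick the path to `target` that needs the fewest changes."""
    recommendations = []
    for tree in forest:
        candidates = [violations(p, sample) for p in paths_to(tree, target)]
        if candidates:
            recommendations.append(min(candidates, key=len))
    return recommendations


# Hypothetical two-tree forest and a sample currently attributed to "alice".
forest = [
    Split("avg_line_length", 40.0,
          Leaf("alice"),
          Split("tab_indent_ratio", 0.5, Leaf("bob"), Leaf("alice"))),
    Split("tab_indent_ratio", 0.5,
          Split("comment_density", 0.1, Leaf("bob"), Leaf("alice")),
          Leaf("alice")),
]
sample = {"avg_line_length": 35.0, "tab_indent_ratio": 0.9, "comment_density": 0.2}

for tree_index, changes in enumerate(recommend(forest, sample, "bob")):
    for feature, op, threshold in changes:
        print(f"tree {tree_index}: make {feature} {op} {threshold}")
```

A real tool would have to reconcile conflicting conditions across many trees and map feature-level changes back to concrete source edits; the sketch only shows how violated decision points on a path translate into per-feature suggestions.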

URL: https://dl.acm.org/doi/10.1145/3267323.3268951
DOI: 10.1145/3267323.3268951
Citation Key: mcknight_style_2018