
Filters: Author is Greenstadt, Rachel
2019-02-22
Dauber, Edwin, Caliskan, Aylin, Harang, Richard, Greenstadt, Rachel. 2018. Git Blame Who?: Stylistic Authorship Attribution of Small, Incomplete Source Code Fragments. Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings. pp. 356–357.

Program authorship attribution has implications for the privacy of programmers who wish to contribute code anonymously. While previous work has shown that complete, individually authored files can be attributed, these efforts have focused on idealized data sets such as the Google Code Jam data. We explore the problem of attribution "in the wild," examining source code obtained from open source version control systems, and investigate whether and how such contributions can be attributed to their authors, either individually or on a per-account basis. In this work, we show that open source contributor accounts containing short, incomplete, and typically uncompilable fragments can be effectively attributed.
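The abstract distinguishes attributing individual fragments from attributing whole accounts. One simple way to picture the per-account case is soft voting: run some per-fragment classifier, then average its author probabilities over every fragment in the account. The sketch below assumes such a classifier already exists and shows only the aggregation step. The function name `attribute_account` and the probability values are hypothetical, and this is not necessarily the aggregation rule used in the paper.

```python
from collections import defaultdict


def attribute_account(fragment_probs):
    """Average per-fragment author probabilities (soft voting) over one account.

    fragment_probs: list of dicts, one per code fragment, mapping a candidate
    author to the probability a (hypothetical) per-fragment classifier assigned.
    An author missing from a fragment's dict contributes 0 for that fragment.
    Returns (best_author, averaged_scores).
    """
    if not fragment_probs:
        raise ValueError("account has no fragments to attribute")
    totals = defaultdict(float)
    for probs in fragment_probs:
        for author, p in probs.items():
            totals[author] += p
    n = len(fragment_probs)
    averaged = {author: total / n for author, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged


if __name__ == "__main__":
    # Hypothetical classifier outputs for three fragments from one account.
    account = [
        {"alice": 0.5, "bob": 0.3, "carol": 0.2},
        {"alice": 0.2, "bob": 0.6, "carol": 0.2},  # alone, this fragment favors bob
        {"alice": 0.6, "bob": 0.3, "carol": 0.1},
    ]
    best, scores = attribute_account(account)
    print(best, scores)
```

Here the second fragment on its own points to "bob", but averaging over the whole account selects "alice", which illustrates how pooling several weak, noisy fragment-level guesses can yield a steadier account-level decision.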

2018-03-05
Greenstadt, Rachel. 2017. Using Stylometry to Attribute Programmers and Writers. Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security. p. 91.

In this talk, I will discuss my lab's work in the emerging field of adversarial stylometry and machine learning. Machine learning algorithms are increasingly being used in security and privacy domains, in areas that go beyond intrusion or spam detection. For example, in digital forensics, questions often arise about the authors of documents: their identity, their demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. We have applied stylometry to difficult domains such as underground hacker forums, open source projects (code), and tweets. I will discuss our Doppelgänger Finder algorithm, which enables us to group Sybil accounts on underground forums and detect blogs from Twitter feeds and Reddit comments. In addition, I will discuss our work attributing unknown source code and binaries.
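As a rough illustration of the general stylometry recipe (extract stylistic features, then compare them against samples from known authors), the sketch below uses character trigram frequencies and cosine similarity to assign an unknown sample to the closest known author. It is a toy example built on assumptions: the feature set, the nearest-neighbor rule, and all names (`char_ngrams`, `cosine`, `attribute`) are illustrative choices, not the lab's features, its classifiers, or the Doppelgänger Finder algorithm.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Relative frequencies of overlapping character n-grams (a simple style profile)."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1  # avoid division by zero for very short text
    return {g: c / total for g, c in grams.items()}


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(unknown, known_samples, n=3):
    """Return (best_author, scores): the known author whose sample is most similar."""
    target = char_ngrams(unknown, n)
    scores = {author: cosine(target, char_ngrams(sample, n))
              for author, sample in known_samples.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical writing samples from two known programmers.
    known = {
        "a": "for (int i = 0; i < n; i++) { total += x[i]; }",
        "b": "while idx < count:\n    acc = acc + values[idx]\n    idx += 1",
    }
    unknown = "for (int j = 0; j < m; j++) { sum += y[j]; }"
    print(attribute(unknown, known))
```

Practical attribution systems use much richer features and trained classifiers; the nearest-neighbor rule is used here only because it fits in a few lines.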