Biblio
In Machine Learning, White Box Adversarial Attacks rely on knowing underlying knowledge about the model attributes. This works focuses on discovering to distrinct pieces of model information: the underlying architecture and primary training dataset. With the process in this paper, a structured set of input probes and the output of the model become the training data for a deep classifier. Two subdomains in Machine Learning are explored - image based classifiers and text transformers with GPT-2. With image classification, the focus is on exploring commonly deployed architectures and datasets available in popular public libraries. Using a single transformer architecture with multiple levels of parameters, text generation is explored by fine tuning off different datasets. Each dataset explored in image and text are distinguishable from one another. Diversity in text transformer outputs implies further research is needed to successfully classify architecture attribution in text domain.
Although Stylometry has been effectively used for Authorship Attribution, there is a growing number of methods being developed that allow authors to mask their identity [2, 13]. In this paper, we investigate the usage of non-traditional feature sets for Authorship Attribution. By using non-traditional feature sets, one may be able to reveal the identity of adversarial authors who are attempting to evade detection from Authorship Attribution systems that are based on more traditional feature sets. In addition, we demonstrate how GEFeS (Genetic & Evolutionary Feature Selection) can be used to evolve high-performance hybrid feature sets composed of two non-traditional feature sets for Authorship Attribution: LIWC (Linguistic Inquiry & Word Count) and Sentiment Analysis. These hybrids were able to reduce the Adversarial Effectiveness on a test set presented in [2] by approximately 33.4%.
We are currently witnessing the development of increasingly effective author identification systems (AISs) that have the potential to track users across the internet based on their writing style. In this paper, we discuss two methods for providing user anonymity with respect to writing style: Adversarial Stylometry and Adversarial Authorship. With Adversarial Stylometry, a user attempts to obfuscate their writing style by consciously altering it. With Adversarial Authorship, a user can select an author cluster target (ACT) and write toward this target with the intention of subverting an AIS so that the user's writing sample will be misclassified Our results show that Adversarial Authorship via interactive evolutionary hill-climbing outperforms Adversarial Stylometry.