Authorship Attribution for Social Media Forensics
Title | Authorship Attribution for Social Media Forensics |
Publication Type | Journal Article |
Year of Publication | 2017 |
Authors | Rocha, A., Scheirer, W. J., Forstall, C. W., Cavalcante, T., Theophilo, A., Shen, B., Carvalho, A. R. B., Stamatatos, E. |
Journal | IEEE Transactions on Information Forensics and Security |
Volume | 12 |
Pagination | 5–33 |
ISSN | 1556-6013 |
Keywords | authorship attribution, authorship attribution algorithm, computational linguistics, Context, context exploitation, digital forensics, distributed networks, feature extraction, Forensics, habit peculiarities, Human Behavior, Internet, learning (artificial intelligence), machine learning, machine learning algorithms, Media, Metrics, multimodal data processing, prepaid SIM cards, pubcrawl, public Wi-Fi hotspots, smart phones, smartphones, social media, social media forensics, social networking (online), Speech, stylometry, supervised learning-based, user anonymity, user identification, Writing, writing style |
Abstract | The veil of anonymity provided by smartphones with pre-paid SIM cards, public Wi-Fi hotspots, and distributed networks like Tor has drastically complicated the task of identifying users of social media during forensic investigations. In some cases, the text of a single posted message will be the only clue to an author's identity. How can we accurately predict who that author might be when the message may never exceed 140 characters on a service like Twitter? For the past 50 years, linguists, computer scientists, and scholars of the humanities have been jointly developing automated methods to identify authors based on the style of their writing. All authors possess peculiarities of habit that influence the form and content of their written works. These characteristics can often be quantified and measured using machine learning algorithms. In this paper, we provide a comprehensive review of the methods of authorship attribution that can be applied to the problem of social media forensics. Furthermore, we examine emerging supervised learning-based methods that are effective for small sample sizes, and provide step-by-step explanations for several scalable approaches as instructional case studies for newcomers to the field. We argue that there is a significant need in forensics for new authorship attribution algorithms that can exploit context, can process multi-modal data, and are tolerant to incomplete knowledge of the space of all possible authors at training time. |
URL | http://ieeexplore.ieee.org/document/7555393/ |
DOI | 10.1109/TIFS.2016.2603960 |
Citation Key | rocha_authorship_2017 |