Authorship attribution techniques identify the author of an unsigned document, such as an e-mail, memo, or social media post, by comparing it against candidate authors' writing styles and looking for tell-tale "fingerprints" such as distinctive words and sentence structure. Everyone leaves these fingerprints in their writing. This creates a problem for people who need to remain anonymous, such as whistleblowers and journalists working in states hostile to their work. Because even a small sample of prose can expose a writer's identity, the guarantees of anonymity and confidentiality offered by government-sponsored whistleblower programs are difficult to sustain. Many strategies have been proposed for protecting writers from attackers who want to discover their identities. This research compares the effectiveness of these strategies and develops software to help individuals write documents in ways that minimize the information revealed about their identities.
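To make the idea of a stylistic fingerprint concrete, the following is a minimal sketch of one simple attribution approach. It assumes a fingerprint built from the relative frequencies of a few common function words plus the mean sentence length, compared by cosine similarity. The word list, the scaling factor, and the function names are illustrative assumptions and are not taken from the project; practical attribution systems use far richer features and trained classifiers.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]


def style_vector(text):
    """Return function-word relative frequencies plus a scaled mean sentence length."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    total = max(len(words), 1)
    freqs = [counts[w] / total for w in FUNCTION_WORDS]
    mean_sentence_len = len(words) / max(len(sentences), 1)
    # Divide by 100 (an arbitrary choice) so sentence length does not dominate.
    return freqs + [mean_sentence_len / 100.0]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unsigned_text, candidates):
    """Return the candidate whose known writing is stylistically closest.

    `candidates` maps an author name to a sample of that author's known writing.
    """
    target = style_vector(unsigned_text)
    return max(candidates, key=lambda name: cosine(target, style_vector(candidates[name])))
```

Under these assumptions, a defense succeeds to the extent that it moves a document's feature vector away from its true author's known writing, which is the kind of effect a simplified variant of English or automatic rewriting would aim for.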
This project characterizes the effectiveness of known defenses against authorship attribution attacks, including a novel defense that requires a source to write in a widely used simplified variant of English. The project develops software that gives interactive feedback to sources who need to write prose that does not reveal their identity. The software also supports automatic, non-interactive rewriting of English prose that preserves a document's semantic content while removing identifying stylistic fingerprints. The research yields deliverables of practical use to organizations and corporations running anonymous and confidential whistleblower programs, helping them better protect the identities of sources and, in doing so, make reporting suspected wrongdoing less risky. The project will involve graduate and undergraduate students.