Big Data Processing of School Shooting Archives

Title: Big Data Processing of School Shooting Archives
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Farag, Mohamed; Nakate, Pranav; Fox, Edward A.
Conference Name: Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries
Date Published: June 2016
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4229-2
Keywords: big data processing, classification, digital libraries, pubcrawl170201, web archives
Abstract

Web archives about school shootings consist of webpages that may or may not be relevant to the events of interest. This work has three main goals. The first is to clean the webpages by removing stop words and non-relevant parts of each page. The second is to select only the webpages relevant to the events of interest. The third is to upload the cleaned, relevant webpages to Apache Solr so that they are easily accessible. We show the details of all the steps required to achieve these goals. The results show that representative Web archives are noisy, with 2% to 40% relevant content. By cleaning the archives, we help researchers focus on relevant content for their analysis.
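
The three goals in the abstract (clean, select, upload) can be pictured as a small pipeline. The following is a minimal sketch only and not the authors' implementation: the stop-word list, the event keywords, the keyword-count relevance rule (a stand-in for whatever classifier the paper uses), and the Solr field names `id` and `text` are all assumptions made for illustration. It builds a JSON array of documents of the kind Solr's JSON update handler accepts, and does not contact a Solr server.

```python
from __future__ import annotations

import html
import json
import re

# Assumption: a tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "at", "in", "is", "of", "on", "the", "to", "was"}

# Assumption: hypothetical event keywords used as a crude relevance signal.
EVENT_TERMS = {"school", "shooting", "gunman", "victims"}


def clean_page(raw_html: str) -> list[str]:
    """Strip script/style blocks and tags, then drop stop words."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", raw_html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    text = html.unescape(text)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def is_relevant(tokens: list[str], min_hits: int = 2) -> bool:
    """Keep a page that mentions at least min_hits distinct event terms."""
    return len(EVENT_TERMS.intersection(tokens)) >= min_hits


def build_solr_payload(pages: dict[str, str]) -> str:
    """Return a JSON array of cleaned, relevant pages keyed by URL."""
    docs = []
    for url, raw_html in pages.items():
        tokens = clean_page(raw_html)
        if is_relevant(tokens):
            # Assumption: the target Solr schema has "id" and "text" fields.
            docs.append({"id": url, "text": " ".join(tokens)})
    return json.dumps(docs)


if __name__ == "__main__":
    sample = {
        "http://example.org/a": "<p>The gunman entered the school.</p>",
        "http://example.org/b": "<p>Weather is sunny today.</p>",
    }
    print(build_solr_payload(sample))
```

In practice the returned string would be sent as the body of a POST request to a Solr core's update endpoint; the core name and schema depend on the deployment and are not given in this record.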

URL: https://dl.acm.org/doi/10.1145/2910896.2925466
DOI: 10.1145/2910896.2925466
Citation Key: farag_big_2016