Big Data Processing of School Shooting Archives
Title | Big Data Processing of School Shooting Archives |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Farag, Mohamed, Nakate, Pranav, Fox, Edward A. |
Conference Name | Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries |
Date Published | June 2016 |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4229-2 |
Keywords | big data processing, classification, digital libraries, pubcrawl170201, web archives |
Abstract | Web archives about school shootings consist of webpages that may or may not be relevant to the events of interest. This work has three main goals. The first is to clean the webpages, which involves removing stop words and the non-relevant parts of each webpage. The second is to select only the webpages relevant to the events of interest. The third is to upload the cleaned, relevant webpages to Apache Solr so that they are easily accessible. We describe all the steps required to achieve these goals. The results show that representative Web archives are noisy, with 2% to 40% relevant content. By cleaning the archives, we help researchers focus on relevant content in their analysis. |
URL | https://dl.acm.org/doi/10.1145/2910896.2925466 |
DOI | 10.1145/2910896.2925466 |
Citation Key | farag_big_2016 |
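The three-stage pipeline named in the abstract (clean, filter for relevance, prepare for Solr) can be illustrated with a minimal sketch. This is not the authors' method. The record does not describe their classifier, so a simple keyword-count threshold stands in for it here. The stop-word list, the `event_terms` set, the `min_hits` threshold, the helper names, and the Solr field name `content` are all hypothetical. The output is a list of documents, each a dict with an `id` field, which is the general shape Solr's JSON update format accepts. The actual upload step is left out.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Set

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "at",
              "is", "was", "were", "for", "with"}


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_page(html: str) -> List[str]:
    """Stage 1 (sketch): strip markup, lowercase, tokenize, drop stop words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    tokens = re.findall(r"[a-z0-9]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def is_relevant(tokens: List[str], event_terms: Set[str], min_hits: int = 2) -> bool:
    """Stage 2 (stand-in for a trained classifier): count event-term hits."""
    hits = sum(1 for t in tokens if t in event_terms)
    return hits >= min_hits


def to_solr_docs(pages: Dict[str, str], event_terms: Set[str]) -> List[dict]:
    """Stage 3 (sketch): keep relevant pages as Solr-style documents."""
    docs = []
    for page_id, html in pages.items():
        tokens = clean_page(html)
        if is_relevant(tokens, event_terms):
            # "content" is a hypothetical field name; a real schema may differ.
            docs.append({"id": page_id, "content": " ".join(tokens)})
    return docs


pages = {
    "p1": "<html><head><style>body{}</style></head><body>"
          "<p>The shooting at the school was reported.</p>"
          "<script>var x=1;</script></body></html>",
    "p2": "<html><body><p>Buy cheap shoes online today</p></body></html>",
}
docs = to_solr_docs(pages, {"shooting", "school", "gunman"})
print(docs)  # [{'id': 'p1', 'content': 'shooting school reported'}]
```

In this toy run, the off-topic page `p2` is dropped, which mirrors the abstract's point that much of an archive's content is noise. A keyword threshold is used only because it is easy to inspect. A trained classifier, as the paper's keywords suggest, would replace `is_relevant`.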