Biblio

Filters: Author is Farag, Mohamed
2017-03-07
Farag, Mohamed, Nakate, Pranav, Fox, Edward A. 2016. Big Data Processing of School Shooting Archives. Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries. :271–272.

Web archives about school shootings consist of webpages that may or may not be relevant to the events of interest. This work has three main goals. The first is to clean the webpages, which involves removing stop words and non-relevant parts of each page. The second is to select only the webpages relevant to the events of interest. The third is to upload the cleaned, relevant webpages to Apache Solr so that they are easily accessible. We show the details of all the steps required to achieve these goals. The results show that representative Web archives are noisy, with only 2% to 40% relevant content. By cleaning the archives, we help researchers focus on relevant content in their analysis.
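
The three steps described in the abstract (clean, select, prepare for indexing) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the event keywords, the keyword-overlap relevance test, the `min_hits` threshold, and the `id` and `content` field names are all assumptions made for illustration. The sketch only builds the documents that would be sent to Apache Solr and performs no upload.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "at", "on", "for", "was"}

# Hypothetical event keywords used only for this illustration.
EVENT_KEYWORDS = {"shooting", "school", "gunman", "victims"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_page(html):
    """Step 1: strip markup, lowercase, tokenize, and drop stop words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    tokens = re.findall(r"[a-z0-9]+", " ".join(parser.parts).lower())
    return [t for t in tokens if t not in STOP_WORDS]


def is_relevant(tokens, min_hits=2):
    """Step 2: assumed relevance test, at least min_hits distinct event keywords."""
    return len(EVENT_KEYWORDS.intersection(tokens)) >= min_hits


def to_solr_doc(doc_id, tokens):
    """Step 3: shape a cleaned page as a dict with assumed field names."""
    return {"id": doc_id, "content": " ".join(tokens)}


def build_index_batch(pages):
    """Run all three steps over a mapping of page id to raw HTML."""
    batch = []
    for doc_id, html in pages.items():
        tokens = clean_page(html)
        if is_relevant(tokens):
            batch.append(to_solr_doc(doc_id, tokens))
    return batch
```

The resulting list of dictionaries could then be serialized as JSON and posted to a Solr collection's update handler; the collection name and schema would depend on the deployment and are not specified in the abstract.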