Visible to the public Biblio

Filters: Keyword is genre  [Clear All Filters]
2020-07-13
Agrawal, Shriyansh, Sanagavarapu, Lalit Mohan, Reddy, YR.  2019.  FACT - Fine grained Assessment of web page CredibiliTy. TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON). :1088–1097.
With more than a trillion web pages, there is a plethora of content available for consumption. Search Engine queries invariably lead to overwhelming information, parts of it relevant and some others irrelevant. Often the information provided can be conflicting, ambiguous, and inconsistent contributing to the loss of credibility of the content. In the past, researchers have proposed approaches for credibility assessment and enumerated factors influencing the credibility of web pages. In this work, we detailed a WEBCred framework for automated genre-aware credibility assessment of web pages. We developed a tool based on the proposed framework to extract web page features instances and identify genre a web page belongs to while assessing it's Genre Credibility Score ( GCS). We validated our approach on `Information Security' dataset of 8,550 URLs with 171 features across 7 genres. The supervised learning algorithm, Gradient Boosted Decision Tree classified genres with 88.75% testing accuracy over 10 fold cross-validation, an improvement over the current benchmark. We also examined our approach on `Health' domain web pages and had comparable results. The calculated GCS correlated 69% with crowdsourced Web Of Trust ( WOT) score and 13% with algorithm based Alexa ranking across 5 Information security groups. This variance in correlation states that our GCS approach aligns with human way ( WOT) as compared to algorithmic way (Alexa) of web assessment in both the experiments.
2017-03-07
Summers, Cameron, Tronel, Greg, Cramer, Jason, Vartakavi, Aneesh, Popp, Phillip.  2016.  GNMID14: A Collection of 110 Million Global Music Identification Matches. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. :693–696.

A new dataset is presented composed of music identification matches from Gracenote, a leading global music metadata company. Matches from January 1, 2014 to December 31, 2014 have been curated and made available as a public dataset called Gracenote Music Identification 2014, or GNMID14, at the following address: https://developer.gracenote.com/mid2014. This collection is the first significant music identification dataset and one of the largest music related datasets available containing more than 110M matches in 224 countries for 3M unique tracks, and 509K unique artists. It features geotemporal information (i.e. country and match date), genre and mood metadata. In this paper, we characterize the dataset and demonstrate its utility for Information Retrieval (IR) research.