FACT - Fine grained Assessment of web page CredibiliTy

Title: FACT - Fine grained Assessment of web page CredibiliTy
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Agrawal, Shriyansh; Sanagavarapu, Lalit Mohan; Reddy, YR
Conference Name: TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON)
Date Published: Oct
Keywords: Automated, automated genre-aware credibility assessment, composability, Credibility, crowdsourced Web, Decision trees, feature extraction, fine grained assessment, Framework, GCS, genre, genre credibility score, gradient boosted decision tree classified genres, health domain Web pages, information retrieval, Information security, Internet, learning (artificial intelligence), pattern classification, pubcrawl, quality, ranking, Resiliency, search engine queries, search engines, security of data, software engineering, supervised learning, text analysis, Tools, trillion web pages, web assessment, web of trust, Web page, Web page credibility, Web pages, Web sites, WEBCred framework, WOT
Abstract: With more than a trillion web pages, there is a plethora of content available for consumption. Search engine queries invariably return an overwhelming amount of information, some of it relevant and some irrelevant. Often the information provided is conflicting, ambiguous, or inconsistent, which undermines the credibility of the content. In the past, researchers have proposed approaches for credibility assessment and enumerated factors influencing the credibility of web pages. In this work, we detail WEBCred, a framework for automated genre-aware credibility assessment of web pages. We developed a tool based on the proposed framework to extract web page feature instances and identify the genre a web page belongs to while assessing its Genre Credibility Score (GCS). We validated our approach on an 'Information Security' dataset of 8,550 URLs with 171 features across 7 genres. The supervised learning algorithm, Gradient Boosted Decision Tree, classified genres with 88.75% testing accuracy over 10-fold cross-validation, an improvement over the current benchmark. We also examined our approach on 'Health' domain web pages and obtained comparable results. The calculated GCS correlated 69% with the crowdsourced Web Of Trust (WOT) score and 13% with the algorithm-based Alexa ranking across 5 Information Security groups. This difference in correlation indicates that our GCS approach aligns more closely with the human approach (WOT) than with the algorithmic approach (Alexa) to web assessment in both experiments.
DOI: 10.1109/TENCON.2019.8929515
Citation Key: agrawal_fact_2019
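
The abstract compares how strongly GCS correlates with a crowdsourced score (WOT) versus an algorithmic ranking (Alexa). The record does not say which correlation measure was used, so the sketch below assumes a plain Pearson coefficient purely for illustration. The function name `pearson` and all score values are hypothetical and are not taken from the paper or its dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up scores for five hypothetical pages (NOT data from the paper).
gcs = [0.82, 0.45, 0.67, 0.30, 0.91]
human_score = [80, 50, 60, 35, 95]      # stand-in for a crowdsourced score
algo_rank_score = [40, 70, 20, 55, 45]  # stand-in for an algorithmic ranking

r_human = pearson(gcs, human_score)
r_algo = pearson(gcs, algo_rank_score)
print(f"GCS vs human score: {r_human:.2f}")
print(f"GCS vs algorithmic score: {r_algo:.2f}")
```

With these invented numbers the GCS values track the human-style score far more closely than the algorithmic one, which mirrors the kind of comparison the abstract reports without reproducing its figures.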