Title | Crawling and cluster hidden web using crawler framework and fuzzy-KNN |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Rahayuda, I. G. S., Santiari, N. P. L. |
Conference Name | 2017 5th International Conference on Cyber and IT Service Management (CITSM) |
Keywords | Browsers, crawler framework, Crawlers, crawling framework, dark web, database classification process, database management systems, Databases, deep web, fuzzy set theory, fuzzy-KNN, fuzzy-KNN method, hidden web, hidden Web clustering, Human Behavior, human factors, Internet, pattern classification, pubcrawl, search engines, search process, Weapons, Web crawling, Web site level, Web sites, World Wide Web |
Abstract | Today almost everyone uses the Internet for daily activities, whether social, academic, work, or business. Few of us are aware, however, that the part of the Internet we generally access is only a small fraction of the whole. The Internet, or World Wide Web, is divided into several levels, such as the surface web, the deep web, and the dark web. Accessing the deep or dark web is dangerous. This research examines web content and deep web content. For a faster and safer search, it uses a crawler framework. The search process yields various kinds of data, which are stored in a database. A classification process is then applied to the database to determine the level of each website. The classification uses the fuzzy-KNN method, which classifies the crawling results held in the database. The crawler framework produces data such as URL addresses, page information, and other fields. The crawled data are compared with predefined sample data, and the fuzzy-KNN classification assigns each entry a web level based on the values of the words specified in the sample data. In tests on a set of test data, 20% of the entries were classified as surface web, 7.5% as bergie web, 20% as deep web, 22.5% as charter web, and 30% as dark web. Because the research used only a limited amount of test data, more data are needed to obtain better results. Better crawler frameworks could speed up crawling, especially at certain web levels, because not every crawler framework works at every web level; the Tor browser can be used, but the crawler framework sometimes fails to work. |
DOI | 10.1109/CITSM.2017.8089225 |
Citation Key | rahayuda_crawling_2017 |
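The abstract describes comparing crawled data with predefined sample data and assigning a web level with fuzzy-KNN. The following is a minimal sketch of one common formulation of fuzzy k-nearest-neighbour classification, not the authors' implementation. The feature vectors (hypothetical keyword counts per page), the `TRAINING` samples, the function name `fuzzy_knn`, and the parameters `k` and `m` are all assumptions made for illustration; the record does not specify them.

```python
import math
from collections import defaultdict

# Hypothetical sample data: (feature vector, web level).
# Each vector stands for counts of two illustrative keyword groups on a page.
TRAINING = [
    ((9.0, 0.0), "surface web"),
    ((8.0, 1.0), "surface web"),
    ((4.0, 4.0), "deep web"),
    ((1.0, 8.0), "dark web"),
    ((0.0, 9.0), "dark web"),
]


def fuzzy_knn(sample, training, k=3, m=2.0):
    """Return a {label: membership} dict for `sample`.

    Memberships are inverse-distance weighted votes of the k nearest
    training points and sum to 1. `m` > 1 is the fuzzifier: values closer
    to 1 give the nearest neighbours more influence.
    """
    # Keep the k training points closest to the sample (Euclidean distance).
    neighbours = sorted(
        ((math.dist(sample, features), label) for features, label in training),
        key=lambda pair: pair[0],
    )[:k]

    exponent = 2.0 / (m - 1.0)
    eps = 1e-9  # avoids division by zero when a neighbour coincides with the sample
    weights = [(1.0 / (dist ** exponent + eps), label) for dist, label in neighbours]
    total = sum(weight for weight, _ in weights)

    # Accumulate the normalised weight of each neighbour under its label.
    memberships = defaultdict(float)
    for weight, label in weights:
        memberships[label] += weight / total
    return dict(memberships)


if __name__ == "__main__":
    result = fuzzy_knn((7.0, 1.0), TRAINING, k=3)
    # The label with the highest membership is taken as the predicted web level.
    print(max(result, key=result.get), result)
```

Unlike plain KNN, which returns a single label by majority vote, this variant returns a degree of membership for each level, so a page close to several sample groups can be flagged as ambiguous rather than forced into one class.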