
Filters: Keyword is search engines
2018-03-26
Ma, H., Tao, O., Zhao, C., Li, P., Wang, L..  2017.  Impact of Replacement Policies on Static-Dynamic Query Results Cache in Web Search Engines. 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). :137–139.

Caching query results is an efficient technique for Web search engines. A state-of-the-art approach named Static-Dynamic Cache (SDC) is widely used in practice. The replacement policy is a key factor in the performance of a cache system, and policies such as LIRS, ARC, CLOCK, SKLRU, and RANDOM have been widely studied in different research areas. In this paper, we discuss replacement policies for the static-dynamic cache and conduct experiments on real large-scale query logs from two well-known commercial Web search engine companies. The experimental results show that the ARC replacement policy works well with the static-dynamic cache, especially for large-scale query results caches.
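
As a rough illustration of the SDC idea, the sketch below pairs a read-only static segment, prefilled with the most frequent queries from a training log, with a dynamic segment managed by a replacement policy. Plain LRU stands in for that policy here for brevity (the paper's finding is that ARC is a strong choice for this role), and `fetch` is a hypothetical backend call; none of this is the paper's actual implementation.

```python
from collections import Counter, OrderedDict

class StaticDynamicCache:
    """Minimal SDC sketch: a read-only static segment prefilled with the
    most frequent queries from a training log, plus a dynamic segment
    managed by a replacement policy (plain LRU here as a stand-in)."""

    def __init__(self, training_log, static_size, dynamic_size, fetch):
        self.fetch = fetch                            # hypothetical backend call
        top = Counter(training_log).most_common(static_size)
        self.static = {q: fetch(q) for q, _ in top}   # never evicted
        self.dynamic = OrderedDict()                  # maintains LRU order
        self.dynamic_size = dynamic_size

    def lookup(self, query):
        if query in self.static:                 # static-segment hit
            return self.static[query]
        if query in self.dynamic:                # dynamic-segment hit
            self.dynamic.move_to_end(query)      # mark as recently used
            return self.dynamic[query]
        results = self.fetch(query)              # miss: evaluate the query
        self.dynamic[query] = results
        if len(self.dynamic) > self.dynamic_size:
            self.dynamic.popitem(last=False)     # evict least recently used
        return results
```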

2018-03-19
Shahid, U., Farooqi, S., Ahmad, R., Shafiq, Z., Srinivasan, P., Zaffar, F..  2017.  Accurate Detection of Automatically Spun Content via Stylometric Analysis. 2017 IEEE International Conference on Data Mining (ICDM). :425–434.

Spammers use automated content spinning techniques to evade plagiarism detection by search engines. Text spinners help spammers evade plagiarism detectors by automatically restructuring sentences and replacing words or phrases with their synonyms. Prior work on spun-content detection relies on knowledge of the dictionary used by the text-spinning software. In this work, we propose an approach to detect spun content and its seed without needing the text spinner's dictionary. Our key idea is that text spinners introduce stylometric artifacts that can be leveraged to detect spun documents. We implement and evaluate our proposed approach on a corpus of spun documents generated using a popular text-spinning software. The results show that our approach can not only accurately detect whether a document is spun but also identify its source (or seed) document - all without needing the dictionary used by the text spinner.
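
A toy sketch of the underlying intuition, not the paper's feature set: build a crude stylometric profile (here just function-word rates and mean sentence length) and attribute a suspected spun document to its most similar candidate seed.

```python
import math
import re

# A tiny function-word list; real stylometric systems use far richer features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "was", "it", "for", "on", "with", "as", "be"]

def stylometric_vector(text):
    """Crude stylometric profile: function-word rates plus mean sentence length."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    vec = [words.count(w) / n for w in FUNCTION_WORDS]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    vec.append(sum(len(s.split()) for s in sentences) / max(len(sentences), 1))
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_seed(spun_doc, candidate_seeds):
    """Attribute a suspected spun document to the stylometrically
    most similar candidate seed document."""
    profile = stylometric_vector(spun_doc)
    return max(candidate_seeds,
               key=lambda s: cosine(profile, stylometric_vector(s)))
```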

2017-11-03
Iliou, C., Kalpakis, G., Tsikrika, T., Vrochidis, S., Kompatsiaris, I..  2016.  Hybrid Focused Crawling for Homemade Explosives Discovery on Surface and Dark Web. 2016 11th International Conference on Availability, Reliability and Security (ARES). :229–234.
This work proposes a generic focused crawling framework for discovering resources on any given topic that reside on the Surface or the Dark Web. The proposed crawler is able to seamlessly traverse the Surface Web and several darknets present in the Dark Web (i.e., Tor, I2P, and Freenet) during a single crawl by automatically adapting its crawling behavior and its classifier-guided hyperlink selection strategy based on the network type. This hybrid focused crawler is demonstrated for the discovery of Web resources containing recipes for producing homemade explosives. The evaluation experiments indicate the effectiveness of the proposed approach for both the Surface and the Dark Web.
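
A minimal best-first sketch of such a hybrid crawl, built on hypothetical hooks: `fetchers` maps each network type to a fetch function, `relevance` is the topic classifier applied to page text, `link_score` scores outgoing links, and pages are assumed to expose `.text` and `.links`.

```python
import heapq
from urllib.parse import urlparse

def network_type(url):
    """Infer the (dark)net a URL lives on from its host suffix."""
    host = urlparse(url).hostname or ""
    if host.endswith(".onion"):
        return "tor"
    if host.endswith(".i2p"):
        return "i2p"
    if "freenet" in host:        # placeholder heuristic; real Freenet keys differ
        return "freenet"
    return "surface"

def hybrid_focused_crawl(seeds, fetchers, relevance, link_score, budget=1000):
    """Single best-first crawl across networks: links are prioritized by a
    topic classifier's score, and the fetcher is swapped per network type."""
    frontier = [(-1.0, url) for url in seeds]    # max-heap via negated scores
    heapq.heapify(frontier)
    seen, hits = set(seeds), []
    while frontier and budget > 0:
        _, url = heapq.heappop(frontier)
        budget -= 1
        page = fetchers[network_type(url)](url)  # network-appropriate fetch
        if page is None:
            continue
        if relevance(page.text):                 # on-topic page found
            hits.append(url)
        for link in page.links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-link_score(page.text, link), link))
    return hits
```
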
2015-05-06
Xu, Y., Liu, Z., Zhang, Z., Chao, H.J..  2014.  High-Throughput and Memory-Efficient Multimatch Packet Classification Based on Distributed and Pipelined Hash Tables. IEEE/ACM Transactions on Networking. 22:982-995.

The emergence of new network applications, such as network intrusion detection systems and packet-level accounting, requires packet classification to report all matched rules instead of only the best matched rule. Although several schemes have been proposed recently to address the multimatch packet classification problem, most of them require either huge memory or expensive ternary content addressable memory (TCAM) to store the intermediate data structure, or they suffer from steep performance degradation under certain types of classifiers. In this paper, we decompose the operation of multimatch packet classification from a complicated multidimensional search into several single-dimensional searches, and present an asynchronous pipeline architecture based on a signature tree structure to combine the intermediate results returned from the single-dimensional searches. By spreading edges of the signature tree across multiple hash tables at different stages, the pipeline can achieve a high throughput via interstage parallel access to the hash tables. To further exploit intrastage parallelism, two edge-grouping algorithms are designed to evenly divide the edges associated with each stage into multiple work-conserving hash tables. To avoid collisions in hash table lookups, a hybrid perfect hash table construction scheme is proposed. Extensive simulation using realistic classifiers and traffic traces shows that the proposed pipeline architecture outperforms the HyperCuts and B2PC schemes in classification speed by at least one order of magnitude, while having a similar storage requirement. In particular, with different types of classifiers of 4K rules, the proposed pipeline architecture is able to achieve a throughput between 26.8 and 93.1 Gb/s using perfect hash tables.
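
To make the decomposition idea concrete, here is a deliberately simplified sketch: each field gets its own hash table from value to the set of matching rule IDs, and a packet's multimatch set is the intersection across fields. The naive set intersection stands in for the paper's signature-tree pipeline, and prefix/range matching is omitted.

```python
from collections import defaultdict

class MultimatchClassifier:
    """Simplified decomposition sketch: one single-dimensional hash
    lookup per field, then set intersection to recover ALL matching
    rules. Not the paper's pipelined signature-tree architecture."""

    def __init__(self, rules):
        # rules: list of (rule_id, {field: value}); a missing field or a
        # value of None means "wildcard" (matches anything).
        self.fields = sorted({f for _, spec in rules for f in spec})
        self.tables = {f: defaultdict(set) for f in self.fields}
        self.wildcards = {f: set() for f in self.fields}
        for rid, spec in rules:
            for f in self.fields:
                v = spec.get(f)
                if v is None:
                    self.wildcards[f].add(rid)   # matches any value
                else:
                    self.tables[f][v].add(rid)

    def classify(self, packet):
        """Return all matching rule IDs, not just the best match."""
        matched = None
        for f in self.fields:   # one single-dimensional lookup per field
            rids = self.tables[f].get(packet[f], set()) | self.wildcards[f]
            matched = rids if matched is None else matched & rids
        return matched or set()

# rules = [(1, {"proto": 6, "dport": 80}), (2, {"proto": 6, "dport": None})]
# MultimatchClassifier(rules).classify({"proto": 6, "dport": 80})  -> {1, 2}
```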

2015-05-05
Shi, C., Wu, Y., Liu, S., Zhou, H., Qu, H..  2014.  LoyalTracker: Visualizing Loyalty Dynamics in Search Engines. IEEE Transactions on Visualization and Computer Graphics. 20:1733-1742.

The huge amount of user log data collected by search engine providers creates new opportunities to understand user loyalty and defection behavior at an unprecedented scale. However, it also poses a great challenge to analyze this behavior and glean insights from such complex, large-scale data. In this paper, we introduce LoyalTracker, a visual analytics system for tracking user loyalty and switching behavior toward multiple search engines from vast amounts of user log data. We propose a new interactive visualization technique (flow view) based on a flow metaphor, which conveys a visual summary of the loyalty dynamics of thousands of users over time. Two other visualization techniques, a density map and a word cloud, are integrated to enable analysts to gain further insights into the patterns identified by the flow view. Case studies and interviews with domain experts demonstrate the usefulness of our technique in understanding user loyalty and switching behavior in search engines.
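
As a back-of-the-envelope illustration of the aggregation such a flow view renders, the sketch below bins log records into time windows, takes each user's dominant engine per window, and counts the user flows between engines across consecutive windows. The (user, day, engine) log schema is an assumption for illustration, not the paper's data model.

```python
from collections import Counter, defaultdict

def loyalty_flows(sessions, bin_days=7):
    """From (user_id, day, engine) records, compute each user's dominant
    engine per time bin and the user flows between consecutive bins --
    the aggregate a flow-style loyalty view would render."""
    per_bin = defaultdict(lambda: defaultdict(Counter))
    for user, day, engine in sessions:
        per_bin[day // bin_days][user][engine] += 1
    prefs = {b: {u: c.most_common(1)[0][0] for u, c in users.items()}
             for b, users in per_bin.items()}
    flows = Counter()   # (bin, engine_from, engine_to) -> number of users
    for b in sorted(prefs):
        nxt = prefs.get(b + 1)
        if nxt is None:
            continue
        for user, engine in prefs[b].items():
            if user in nxt:
                flows[(b, engine, nxt[user])] += 1
    return flows
```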

Mewara, B., Bairwa, S., Gajrani, J., Jain, V..  2014.  Enhanced browser defense for reflected Cross-Site Scripting. 2014 3rd International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions). :1-6.

Cross-Site Scripting (XSS) is a common attack technique in which an attacker injects code into a web page's output; the injected code is delivered to the visitor's web browser, where it executes automatically and steals sensitive information. To protect users from XSS attacks, many client-side solutions have been implemented; most of those in use are filters that sanitize malicious input. However, many of these filters provide no protection against newly designed sophisticated attacks, such as multiple points of injection or injection into scripts. This paper proposes and implements an approach based on encoding unfiltered reflections for detecting web applications vulnerable to the above-mentioned sophisticated attacks. Results show that the proposed approach provides an accurate, higher detection rate for exploits. In addition, an implementation that blocks the execution of malicious scripts has been contributed to XSS-Me, an open-source Mozilla Firefox security extension that detects reflected XSS vulnerabilities, and could be an effective solution if integrated inside the browser rather than enforced as an extension.
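
A minimal probe in the spirit of checking how reflections are encoded: send a uniquely identifiable marker wrapped in HTML metacharacters and inspect whether the response reflects it back unencoded. This is an illustrative heuristic, not the paper's tool, and should only be run against applications you are authorized to test.

```python
import uuid
import requests

def probe_reflected_xss(url, param):
    """Send a benign, uniquely identifiable payload and classify how the
    application reflects it back."""
    marker = uuid.uuid4().hex
    payload = f'"><x{marker}>'    # harmless tag with a unique marker
    resp = requests.get(url, params={param: payload}, timeout=10)
    if f"<x{marker}>" in resp.text:
        return "reflected unencoded: likely vulnerable"
    if marker in resp.text:
        return "reflected but encoded/filtered"
    return "not reflected"

# e.g. probe_reflected_xss("https://example.com/search", "q")
```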

2015-05-04
Swati, K., Patankar, A.J..  2014.  Effective personalized mobile search using KNN. 2014 International Conference on Data Science Engineering (ICDSE). :157-160.

Effective Personalized Mobile Search Using KNN implements an architecture to improve the effectiveness of personalization over large data sets while maintaining data security. User preferences are gathered through clickthrough data, which is sent to the server in encrypted form and classified into content concepts and location concepts. To improve classification and minimize processing time, the KNN (K-Nearest Neighbors) algorithm is used. The identified preferences (location and content) are merged to provide effective preferences to the user. The system uses four entropies to balance the weight between content concepts and location concepts, and follows a client-server architecture. The client's role is to collect user queries and maintain them in files for future reference. User preference privacy is ensured through privacy parameters as well as encryption techniques. The server is responsible for tasks such as training, reranking the obtained search results, and concept extraction. Experiments were carried out on an Android-based mobile device; the results show that the system gives significantly improved results over the previous algorithm for large data sets while maintaining security.
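
A bare-bones sketch of the KNN step as described: classify a feature vector derived from a clickthrough entry by majority vote over its k nearest labeled neighbors. The feature extraction and the "content"/"location" labels are assumptions for illustration.

```python
import math
from collections import Counter

def knn_classify(query_vec, labeled_vecs, k=5):
    """Plain k-nearest-neighbors vote: labeled_vecs is a list of
    (feature_vector, label) pairs built from past clickthrough data."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    neighbors = sorted(labeled_vecs, key=lambda fv: dist(query_vec, fv[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# e.g. label a new clickthrough feature vector as a "content" or
# "location" concept, given hypothetical training pairs:
# concept = knn_classify(extract_features(click), training_pairs)
```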

2015-04-30
Sen, S., Guha, S., Datta, A., Rajamani, S.K., Tsai, J., Wing, J.M..  2014.  Bootstrapping Privacy Compliance in Big Data Systems. 2014 IEEE Symposium on Security and Privacy (SP). :327-342.

With the rapid increase in cloud services collecting and using user data to offer personalized experiences, ensuring that these services comply with their privacy policies has become a business imperative for building user trust. However, most compliance efforts in industry today rely on manual review processes and audits designed to safeguard user data, and therefore are resource intensive and lack coverage. In this paper, we present our experience building and operating a system to automate privacy policy compliance checking in Bing. Central to the design of the system are (a) Legalease, a language that allows specification of privacy policies that impose restrictions on how user data is handled, and (b) Grok, a data inventory for MapReduce-like big data systems that tracks how user data flows among programs. Grok maps code-level schema elements to data types in Legalease, in essence annotating existing programs with information flow types with minimal human input. Compliance checking is thus reduced to information flow analysis of big data systems. The system, bootstrapped by a small team, checks compliance daily of millions of lines of ever-changing source code written by several thousand developers.
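
A stylized sketch of that reduction, not Legalease's actual syntax: the policy is modeled as a set of denied (data type, usage) pairs, the Grok-style labeling is modeled as a map from jobs to the data types flowing into them, and checking is a simple walk over those labels. The job names, types, and usages below are invented for illustration.

```python
# Policy: DENY these (data type, usage) combinations (illustrative only).
POLICY_DENY = {("IPAddress", "Advertising"), ("SearchQuery", "Sharing")}

def violations(jobs, labels):
    """jobs: {job_name: usage}; labels: {job_name: set of data types
    flowing into that job, from schema-to-type annotation}. Returns
    every (job, data type, usage) triple the policy denies."""
    found = []
    for job, usage in jobs.items():
        for dtype in labels.get(job, set()):
            if (dtype, usage) in POLICY_DENY:
                found.append((job, dtype, usage))
    return found

# Example: an ads job reads a column annotated as IPAddress.
jobs = {"AdsModelTrainer": "Advertising"}
labels = {"AdsModelTrainer": {"IPAddress", "QueryText"}}
print(violations(jobs, labels))  # [('AdsModelTrainer', 'IPAddress', 'Advertising')]
```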