Biblio

Filters: Keyword is search engines
2023-06-02
Nikoletos, Sotirios, Raftopoulou, Paraskevi.  2022.  Employing social network analysis to dark web communities. 2022 IEEE International Conference on Cyber Security and Resilience (CSR). :311—316.

The deep web refers to sites that cannot be found by search engines and makes up 96% of the digital world. The dark web is the part of the deep web that can only be accessed through specialised tools and anonymity networks. To avoid monitoring and control, communities that seek anonymity are moving to the dark web. In this work, we scrape five dark web forums and construct five graphs to model user connections. These networks are then studied and compared using data mining techniques and social network analysis tools; for each community we identify the key actors, we study the social connections and interactions, we observe the small-world effect, and we highlight the type of discussions among the users. Our results indicate that only a small subset of users is influential, while the rapid dissemination of information and resources between users may affect behaviours and shape ideas for future members.

2022-12-20
Gracia, Mulumba Banza, Malele, Vusumuzi, Ndlovu, Sphiwe Promise, Mathonsi, Topside Ehleketani, Maaka, Lebogang, Muchenje, Tonderai.  2022.  6G Security Challenges and Opportunities. 2022 IEEE 13th International Conference on Mechanical and Intelligent Manufacturing Technologies (ICMIMT). :339–343.
The Sixth Generation (6G) is currently under development as the planned successor of the Fifth Generation (5G). It is a new wireless communication technology expected to have greater coverage, significantly faster speeds, and a higher data rate. The aim of this paper is to examine the literature on the challenges and possible solutions of 6G's security, privacy and trust. It uses the systematic literature review technique, searching five research databases with precise keywords such as “6G,” “6G Wireless communication,” and “sixth generation”. The search produced a total of 1856 papers, from which the security, privacy and trust issues of 6G wireless communication were extracted. Two security issues, artificial intelligence and visible light communication, were apparent. In conclusion, there is a need for new paradigms that will provide clear 6G security solutions.
2022-11-18
Tian, Pu, Hatcher, William Grant, Liao, Weixian, Yu, Wei, Blasch, Erik.  2021.  FALIoTSE: Towards Federated Adversarial Learning for IoT Search Engine Resiliency. 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :290–297.
To improve efficiency and resource usage in data retrieval, an Internet of Things (IoT) search engine organizes a vast amount of scattered data and responds to client queries with processed results. Machine learning provides a deep understanding of complex patterns and enables enhanced feedback to users through well-trained models. Nonetheless, machine learning models are prone to adversarial attacks via the injection of elaborate perturbations, resulting in subverted outputs. In particular, adversarial attacks on time-series data demand urgent attention, as sensors in IoT systems are collecting an increasing volume of sequential data. This paper investigates adversarial attacks on time-series analysis in an IoT search engine (IoTSE) system. Specifically, we consider the Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) as our base model, implemented in a simulated federated learning scheme. We propose the Federated Adversarial Learning for IoT Search Engine (FALIoTSE) that exploits the shared parameters of the federated model as the target for adversarial example generation and resiliency. Using a real-world smart parking garage dataset, the impact of an attack on FALIoTSE is demonstrated under various levels of perturbation. The experiments show that the training error increases significantly with noise injected into the gradients.
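As a rough illustration of the attack surface targeted here, the sketch below (a minimal assumption-laden example, not the authors' FALIoTSE implementation; the client setup, noise scale, and least-squares model are illustrative) shows how one malicious participant in a federated-averaging round can perturb the shared parameters it uploads and thereby degrade the aggregated model.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(global_w, X, y, lr=0.1):
    """One step of least-squares gradient descent on a client's local data."""
    grad = 2 * X.T @ (X @ global_w - y) / len(y)
    return global_w - lr * grad

# Toy federated round: 3 honest clients, 1 adversarial client.
d = 4
global_w = np.zeros(d)
clients = [(rng.normal(size=(50, d)), rng.normal(size=50)) for _ in range(4)]

updates = []
for i, (X, y) in enumerate(clients):
    w = local_update(global_w, X, y)
    if i == 3:  # adversarial client targets the shared parameters
        w += 0.5 * np.sign(w) * rng.random(d)  # illustrative perturbation, not the paper's
    updates.append(w)

global_w = np.mean(updates, axis=0)  # FedAvg-style aggregation
print("aggregated weights after one poisoned round:", global_w)
```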
2022-04-13
Govindaraj, Logeswari, Sundan, Bose, Thangasamy, Anitha.  2021.  An Intrusion Detection and Prevention System for DDoS Attacks using a 2-Player Bayesian Game Theoretic Approach. 2021 4th International Conference on Computing and Communications Technologies (ICCCT). :319—324.

Distributed Denial-of-Service (DDoS) attacks pose a huge risk to the network and threaten its stability. A game-theoretic approach for intrusion detection and prevention is proposed to avoid DDoS attacks on the Internet. Game theory provides a control mechanism that automates the intrusion detection and prevention process within a network. In the proposed system, the system-subject interaction is modeled as a 2-player Bayesian signaling zero-sum game. The game's Nash equilibrium gives a strategy for the attacker and the system such that neither can increase their payoff by changing their strategy unilaterally. Moreover, the Intent, Objective and Strategy (IOS) of the attacker and the system are modeled and quantified using the concept of incentives. In the proposed system, the prevention subsystem consists of three components, namely a game engine for computing the Nash equilibrium, a database, and a search engine to store and search the database for the optimum defense strategy. The proposed framework is validated via simulations using the ns-3 network simulator and achieves over an 80% detection rate and a 90% prevention rate with 6% false positive alarms.
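For background on the game-engine component, the Nash equilibrium of a finite two-player zero-sum game can be computed with linear programming; the sketch below uses an illustrative 2x2 payoff matrix (an assumption, not taken from the paper) to recover the defender's optimal mixed strategy and the game value with SciPy.

```python
import numpy as np
from scipy.optimize import linprog

# Payoff matrix for the defender (row player) in a toy zero-sum detection game:
# rows = defender actions, columns = attacker actions. Values are illustrative.
A = np.array([[ 1.0, -0.5],
              [-1.0,  2.0]])

m, n = A.shape
# Decision variables: [x_1..x_m, v]; maximizing the game value v == minimizing -v.
c = np.zeros(m + 1); c[-1] = -1.0
# For every attacker column j:  v - sum_i A[i, j] * x_i <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)
# The defender's mixed strategy must sum to 1.
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, 1)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("defender mixed strategy:", res.x[:m], "game value:", res.x[-1])
```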

2021-03-09
Badawi, E., Jourdan, G.-V., Bochmann, G., Onut, I.-V..  2020.  An Automatic Detection and Analysis of the Bitcoin Generator Scam. 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). :407–416.

We investigate what we call the "Bitcoin Generator Scam" (BGS), a simple system in which the scammers promise to "generate" new bitcoins using the ones that were sent to them. A typical offer will suggest that, for a small fee, one could receive within minutes twice the amount of bitcoins submitted. BGS is clearly not a very sophisticated attack. The modus operandi is simply to put up a web page showing the address to which to send the money, and wait for the payback. The pages are then indexed by search engines, ready to be found by victims looking for free bitcoins. We describe here a generic system to find and analyze scams such as BGS. We have trained a classifier to detect these pages, and we have a crawler searching for instances using a series of search engines. We then monitor the instances that we find to trace payments and bitcoin addresses that are being used over time. Unlike most bitcoin-based scam monitoring systems, we do not rely on analyzing transactions on the blockchain to find scam instances. Instead, we proactively find these instances through the web pages advertising the scam. Thus our system is able to find addresses with very few transactions, or even none at all. Indeed, over half of the addresses that have eventually received funds were detected before receiving any transactions. The data for this paper was collected over four months, from November 2019 to February 2020. We have found more than 1,300 addresses directly associated with the scam, hosted on over 500 domains. Overall, these addresses have received (at least) over 5 million USD in payments to the scam, with an average of 47.3 USD per transaction.
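As a small illustration of the page-monitoring step, the sketch below (an assumption about how such extraction could be done, not the authors' crawler) pulls candidate legacy Bitcoin addresses out of scraped page text with a regular expression, so they can be watched for later transactions.

```python
import re

# Legacy (Base58) Bitcoin addresses start with 1 or 3 and never contain 0, O, I, or l.
BTC_ADDRESS_RE = re.compile(r'\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b')

def extract_btc_addresses(page_text):
    """Return candidate Bitcoin addresses found in a scraped page."""
    return set(BTC_ADDRESS_RE.findall(page_text))

sample = ("Send 0.01 BTC to 1BoatSLRHtKNngkdXEeobR76b53LETtpyT "
          "and receive 0.02 BTC within minutes!")
print(extract_btc_addresses(sample))
```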

2021-02-16
Mujib, M., Sari, R. F..  2020.  Performance Evaluation of Data Center Network with Network Micro-segmentation. 2020 12th International Conference on Information Technology and Electrical Engineering (ICITEE). :27—32.

Research on the design of data center infrastructure is increasing, both in academia and industry, due to the rapid development of cloud-based applications such as search engines, social networks, and large-scale computing. At a large scale, data centers can consist of hundreds to thousands of servers that require systems with high performance and low downtime. To meet the network's needs in a dynamic data center, the infrastructure of applications and services keeps growing, and the network topology must be designed so that it can guarantee availability and security. One way to achieve this is by implementing the zero trust security model based on micro-segmentation. Zero trust is a security idea based on the principle of "never trust, always verify", in which there are no notions of trusted and untrusted network traffic; all traffic is treated as untrusted. Micro-segmentation is a way to achieve zero trust by dividing a network into smaller logical segments to restrict traffic. In this research, the performance of a data center network based on software-defined networking with a zero trust security model using micro-segmentation has been evaluated on a testbed simulation of Cisco Application Centric Infrastructure by measuring round trip time, jitter, and packet loss during experiments. Performance evaluation results show that micro-segmentation adds an average round trip time of 4 μs and jitter of 11 μs without packet loss, so that security can be improved without significantly affecting network performance in the data center.

2021-01-18
Yadav, M. K., Gugal, D., Matkar, S., Waghmare, S..  2019.  Encrypted Keyword Search in Cloud Computing using Fuzzy Logic. 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT). :1–4.
Research and development and information management professionals routinely employ simple keyword searches or more complex Boolean queries when using databases such as PubMed and Ovid and search engines like Google to find the information they need. While satisfying the basic needs of the researcher, basic search is limited, which can adversely affect both precision and recall, decreasing productivity and damaging the researchers' ability to discover new insights. The cloud service providers who store users' data may access sensitive information without proper authority. A basic approach to preserve data confidentiality is to encrypt the data. Data encryption also demands the protection of keyword privacy, since keywords usually contain vital information related to the files; encrypting the keywords protects their privacy. Fuzzy keyword search enhances system usability by matching files exactly or to the nearest possible files against the keywords entered by the user, based on similar semantics. Encrypted keyword search in the cloud using this logic allows the user, on entering keywords, to receive the best possible files in a more secure manner, while protecting the user's documents.
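A common building block in fuzzy searchable-encryption schemes is to expand each keyword into a wildcard-based fuzzy set within edit distance 1 and to index keyed hashes of that set, so a mistyped query can still match. The sketch below illustrates that idea only; the trapdoor construction and key are assumptions, not this paper's exact scheme.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-index-key"  # illustrative key, not part of the paper

def fuzzy_set(word):
    """Wildcard-based fuzzy keyword set covering edit distance 1."""
    variants = {word}
    for i in range(len(word) + 1):
        variants.add(word[:i] + "*" + word[i:])          # insertion slot
        if i < len(word):
            variants.add(word[:i] + "*" + word[i + 1:])  # substitution/deletion slot
    return variants

def trapdoor(token):
    """Keyed hash so the server never sees plaintext keywords."""
    return hmac.new(SECRET_KEY, token.encode(), hashlib.sha256).hexdigest()

index = {trapdoor(t) for t in fuzzy_set("network")}
query = {trapdoor(t) for t in fuzzy_set("netwrk")}  # user typo: one deleted letter
print("match:", bool(index & query))
```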
2021-01-15
Park, W..  2020.  A Study on Analytical Visualization of Deep Web. 2020 22nd International Conference on Advanced Communication Technology (ICACT). :81—83.

Nowadays, there is a flood of data such as naked body photos and child pornography, which desensitizes people. In addition, people distribute drugs through unknown dark channels. In particular, most such transactions are made through the Deep Web, the dark path. “Deep Web refers to an encrypted network that is not detected on search engine like Google etc. Users must use Tor to visit sites on the dark web” [4]. In other words, the Dark Web uses Tor's encryption client. Therefore, users can visit multiple sites on the dark Web without knowing the initiator of a site. In this paper, we propose a key idea based on the current status of such crimes, and a crime information visualization system for the Deep Web has been developed. The status of the deep web is analyzed and the data are visualized using Java. It is expected that the program will help more efficient management and monitoring of crime on the unknown web, such as the deep web, torrents, etc.

2020-09-11
Kim, Donghoon, Sample, Luke.  2019.  Search Prevention with Captcha Against Web Indexing: A Proof of Concept. 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). :219—224.
A website appears in search results based on web indexing conducted by a search engine bot (e.g., a web crawler). Some webpages do not want to be found easily because they include sensitive information. There are several methods to prevent web crawlers from indexing pages in a search engine database. However, such webpages can still be indexed by malicious web crawlers. Through this study, we explore a paradoxical new use of captchas for search prevention: captchas are used to prevent web crawlers from indexing by converting sensitive words to captchas. We have implemented a web-based captcha conversion tool based on our search prevention algorithm. We also describe our proof of concept with a web-based chat application modified to utilize our algorithm. We have conducted an experiment to evaluate our idea on the Google search engine with two versions of webpages, one containing plain text and another containing sensitive words converted to captchas. The experimental results show that the sensitive words on the captcha version of the webpages cannot be found by Google's search engine, while those on the plain text versions can.
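The core conversion step can be sketched as follows (a minimal example under assumptions, using Pillow's default font rather than the authors' tool): render each sensitive word as an image so the text never appears in the page's indexable markup.

```python
from PIL import Image, ImageDraw

def word_to_captcha_image(word, path):
    """Render a sensitive word as an image so crawlers cannot index the text.
    A real captcha would also add distortion/noise; this shows only the conversion idea."""
    img = Image.new("RGB", (12 * len(word) + 20, 40), "white")
    draw = ImageDraw.Draw(img)
    draw.text((10, 12), word, fill="black")  # default bitmap font
    img.save(path)
    return path

# Replace occurrences of a sensitive word in HTML with an <img> tag.
html = "<p>The launch code is SENSITIVE_WORD.</p>"
img_path = word_to_captcha_image("SENSITIVE_WORD", "captcha_0.png")
print(html.replace("SENSITIVE_WORD", f'<img src="{img_path}" alt="">'))
```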
2020-08-17
Yang, Shiman, Shi, Yijie, Guo, Fenzhuo.  2019.  Risk Assessment of Industrial Internet System By Using Game-Attack Graphs. 2019 IEEE 5th International Conference on Computer and Communications (ICCC). :1660–1663.
In this paper, we propose a game-attack graph-based risk assessment model for industrial Internet systems. First, non-destructive asset profiling is used to scan the components and devices included in the system along with their open services and communication protocols. The CNVD and CVE databases are then compared to find vulnerabilities through a search engine keyword segment matching method, and an asset threat list is generated. Second, the attack rule base is built from the network information, and the system is modeled using an attribute attack graph. Third, game theory is combined with the established model. Finally, the analysis is optimized and quantified to obtain the best attack path and the best defense strategy.
2020-07-13
Agrawal, Shriyansh, Sanagavarapu, Lalit Mohan, Reddy, YR.  2019.  FACT - Fine grained Assessment of web page CredibiliTy. TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON). :1088–1097.
With more than a trillion web pages, there is a plethora of content available for consumption. Search engine queries invariably lead to overwhelming information, parts of it relevant and some of it irrelevant. Often the information provided can be conflicting, ambiguous, and inconsistent, contributing to a loss of credibility of the content. In the past, researchers have proposed approaches for credibility assessment and enumerated factors influencing the credibility of web pages. In this work, we detail the WEBCred framework for automated genre-aware credibility assessment of web pages. We developed a tool based on the proposed framework to extract web page feature instances and identify the genre a web page belongs to while assessing its Genre Credibility Score (GCS). We validated our approach on an 'Information Security' dataset of 8,550 URLs with 171 features across 7 genres. The supervised learning algorithm, a Gradient Boosted Decision Tree, classified genres with 88.75% testing accuracy over 10-fold cross-validation, an improvement over the current benchmark. We also examined our approach on 'Health' domain web pages and obtained comparable results. The calculated GCS correlated 69% with the crowdsourced Web Of Trust (WOT) score and 13% with the algorithm-based Alexa ranking across 5 information security groups. This variance in correlation indicates that our GCS approach aligns more with the human (WOT) than the algorithmic (Alexa) way of web assessment in both experiments.
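For readers unfamiliar with the classifier used, a minimal gradient-boosted-tree genre classifier might look like the sketch below; the synthetic features and labels are assumptions, whereas the real study uses 171 extracted web page features across 7 genres.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-ins for extracted web page features (e.g., ad counts,
# outlink counts, readability scores) and genre labels.
X = rng.normal(size=(500, 10))
y = rng.integers(0, 7, size=500)  # 7 genres

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3)
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
print("mean CV accuracy on synthetic data:", scores.mean())
```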
Mahmood, Shah.  2019.  The Anti-Data-Mining (ADM) Framework - Better Privacy on Online Social Networks and Beyond. 2019 IEEE International Conference on Big Data (Big Data). :5780–5788.
The unprecedented and enormous growth of cloud computing, especially online social networks, has resulted in numerous incidents of the loss of users' privacy. In this paper, we provide a framework, based on our anti-data-mining (ADM) principle, to enhance users' privacy against adversaries including: online social networks; search engines; financial terminal providers; ad networks; eavesdropping governments; and other parties who can monitor users' content from the point where the content leaves users' computers to within the data centers of these information accumulators. To achieve this goal, our framework proactively uses the principles of suppression of sensitive data and disinformation. Moreover, we use social-bots in a novel way for enhanced privacy and provide users with plausible deniability for their photos, audio, and video content uploaded online.
2020-07-10
Schäfer, Matthias, Fuchs, Markus, Strohmeier, Martin, Engel, Markus, Liechti, Marc, Lenders, Vincent.  2019.  BlackWidow: Monitoring the Dark Web for Cyber Security Information. 2019 11th International Conference on Cyber Conflict (CyCon). 900:1—21.

The Dark Web, a conglomerate of services hidden from search engines and regular users, is used by cyber criminals to offer all kinds of illegal services and goods. Multiple Dark Web offerings are highly relevant for the cyber security domain in anticipating and preventing attacks, such as information about zero-day exploits, stolen datasets with login information, or botnets available for hire. In this work, we analyze and discuss the challenges related to information gathering in the Dark Web for cyber security intelligence purposes. To facilitate information collection and the analysis of large amounts of unstructured data, we present BlackWidow, a highly automated modular system that monitors Dark Web services and fuses the collected data in a single analytics framework. BlackWidow relies on a Docker-based microservice architecture which permits the combination of both preexisting and customized machine learning tools. BlackWidow represents all extracted data and the corresponding relationships extracted from posts in a large knowledge graph, which is made available to its security analyst users for search and interactive visual exploration. Using BlackWidow, we conduct a study of seven popular services on the Deep and Dark Web across three different languages with almost 100,000 users. Within less than two days of monitoring time, BlackWidow managed to collect years of relevant information in the areas of cyber security and fraud monitoring. We show that BlackWidow can infer relationships between authors and forums and detect trends for cybersecurity-related topics. Finally, we discuss exemplary case studies surrounding leaked data and preparation for malicious activity.

2020-07-03
León, Raquel, Domínguez, Adrián, Carballo, Pedro P., Núñez, Antonio.  2019.  Deep Packet Inspection Through Virtual Platforms using System-On-Chip FPGAs. 2019 XXXIV Conference on Design of Circuits and Integrated Systems (DCIS). :1—6.

Virtual platforms provide a full hardware/software platform to study device limitations in the early stages of the design flow and to develop software without requiring a physical implementation. This paper describes the development process of a virtual platform for Deep Packet Inspection (DPI) hardware accelerators using Transaction Level Modeling (TLM). We propose two DPI architectures oriented to System-on-Chip FPGAs. The first, a CPU-DMA based architecture, is a hybrid CPU/FPGA design in which the packets are filtered in the software domain. The second, a Hardware-IP based architecture, is implemented mainly in the hardware domain. We have created two virtual platforms and performed the simulation, debugging and analysis of the hardware/software features in order to compare results for both architectures.

2020-05-22
Ito, Toshitaka, Itotani, Yuri, Wakabayashi, Shin'ichi, Nagayama, Shinobu, Inagi, Masato.  2018.  A Nearest Neighbor Search Engine Using Distance-Based Hashing. 2018 International Conference on Field-Programmable Technology (FPT). :150—157.
This paper proposes an FPGA-based nearest neighbor search engine for high-dimensional data, in which nearest neighbor search is performed based on distance-based hashing. The proposed hardware search engine implements a nearest neighbor search algorithm based on an extension of flexible distance-based hashing (FDH, for short), which finds an exact solution with high probability. The proposed engine is a parallel, pipelined circuit, so that search results can be obtained in a short execution time. Experimental results show the effectiveness and efficiency of the proposed engine.
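To give a feel for the underlying idea, the software sketch below (under assumed pivot counts and thresholds; it is not the FDH extension or the FPGA pipeline from the paper) buckets each vector by thresholding its distances to a few pivot points and searches only the colliding bucket.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 16))                      # high-dimensional points
pivots = data[rng.choice(len(data), 4, replace=False)]  # illustrative pivot choice
thresholds = np.median(np.linalg.norm(data[:, None] - pivots, axis=2), axis=0)

def dbh_key(x):
    """Hash a point by whether it is near or far from each pivot."""
    d = np.linalg.norm(pivots - x, axis=1)
    return tuple((d < thresholds).astype(int))

buckets = {}
for i, x in enumerate(data):
    buckets.setdefault(dbh_key(x), []).append(i)

query = rng.normal(size=16)
candidates = buckets.get(dbh_key(query), range(len(data)))  # fall back to a full scan
best = min(candidates, key=lambda i: np.linalg.norm(data[i] - query))
print("approximate nearest neighbor index:", best)
```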
2020-04-17
Stark, Emily, Sleevi, Ryan, Muminovic, Rijad, O'Brien, Devon, Messeri, Eran, Felt, Adrienne Porter, McMillion, Brendan, Tabriz, Parisa.  2019.  Does Certificate Transparency Break the Web? Measuring Adoption and Error Rate. 2019 IEEE Symposium on Security and Privacy (SP). :211–226.
Certificate Transparency (CT) is an emerging system for enabling the rapid discovery of malicious or misissued certificates. Initially standardized in 2013, CT is now finally beginning to see widespread support. Although CT provides desirable security benefits, web browsers cannot begin requiring all websites to support CT at once, due to the risk of breaking large numbers of websites. We discuss challenges for deployment, analyze the adoption of CT on the web, and measure the error rates experienced by users of the Google Chrome web browser. We find that CT has so far been widely adopted with minimal breakage and warnings. Security researchers often struggle with the tradeoff between security and user frustration: rolling out new security requirements often causes breakage. We view CT as a case study for deploying ecosystem-wide change while trying to minimize end user impact. We discuss the design properties of CT that made its success possible, as well as draw lessons from its risks and pitfalls that could be avoided in future large-scale security deployments.
2020-04-03
Sattar, Naw Safrin, Arifuzzaman, Shaikh, Zibran, Minhaz F., Sakib, Md Mohiuddin.  2019.  An Ensemble Approach for Suspicious Traffic Detection from High Recall Network Alerts. 2019 IEEE International Conference on Big Data (Big Data). :4299–4308.
Web services from large-scale systems are prevalent all over the world. However, these systems are naturally vulnerable and prone to intrusion by adversaries for illegal benefit. To detect anomalous events, previous works focus on inspecting raw system logs by identifying outliers in workflows or by relying on machine learning methods. Though those works successfully identify the anomalies, their models use large training sets and process whole system logs. To reduce the quantity of logs that needs to be processed, high-recall suspicious network alert systems can be applied to preprocess system logs, so that only the logs that trigger alerts are retrieved for further use. Given the universal use of network traffic alerts in Security Operations Centers, the anomaly detection problem can be transformed into classifying truly suspicious network traffic alerts versus false alerts. In this work, we propose an ensemble model to distinguish truly suspicious alerts from false alerts. Our model consists of two sub-models with different feature extraction strategies to ensure diversity and generalization. We use decision-tree-based boosters and deep neural networks to build ensemble models for classification. Finally, we evaluate our approach on the suspicious network alerts dataset provided by the 2019 IEEE BigData Cup: Suspicious Network Event Recognition. Under the metric of AUC scores, our model achieves 0.9068 on the whole testing set.
2020-02-10
Ding, Steven H. H., Fung, Benjamin C. M., Charland, Philippe.  2019.  Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization. 2019 IEEE Symposium on Security and Privacy (SP). :472–489.

Reverse engineering is a manually intensive but necessary technique for understanding the inner workings of new malware, finding vulnerabilities in existing systems, and detecting patent infringements in released software. An assembly clone search engine facilitates the work of reverse engineers by identifying those duplicated or known parts. However, it is challenging to design a robust clone search engine, since there exist various compiler optimization options and code obfuscation techniques that make logically similar assembly functions appear to be very different. A practical clone search engine relies on a robust vector representation of assembly code. However, the existing clone search approaches, which rely on a manual feature engineering process to form a feature vector for an assembly function, fail to consider the relationships between features and identify those unique patterns that can statistically distinguish assembly functions. To address this problem, we propose to jointly learn the lexical semantic relationships and the vector representation of assembly functions based on assembly code. We have developed an assembly code representation learning model, Asm2Vec. It only needs assembly code as input and does not require any prior knowledge such as the correct mapping between assembly functions. It can find and incorporate rich semantic relationships among tokens appearing in assembly code. We conduct extensive experiments and benchmark the learning model with state-of-the-art static and dynamic clone search approaches. We show that the learned representation is more robust and significantly outperforms existing methods against changes introduced by obfuscation and optimizations.

2020-01-28
Kurniawan, Agus, Kyas, Marcel.  2019.  Securing Machine Learning Engines in IoT Applications with Attribute-Based Encryption. 2019 IEEE International Conference on Intelligence and Security Informatics (ISI). :30–34.

Machine learning has been adopted widely to perform prediction and classification. Implementing machine learning increases security risks when the computation process involves sensitive data in training and testing. We present a proposed system to protect machine learning engines in an IoT environment without modifying the internal machine learning architecture. Our proposed system is designed to be passwordless and to eliminate third parties in executing machine learning transactions. To evaluate our proposed system, we conduct experiments with machine learning transactions on an IoT board and measure the computation time of each transaction. The experimental results show that our proposed system can address security issues in machine learning computation with low time consumption.

Monaco, John V..  2019.  Feasibility of a Keystroke Timing Attack on Search Engines with Autocomplete. 2019 IEEE Security and Privacy Workshops (SPW). :212–217.
Many websites induce the browser to send network traffic in response to user input events. This includes websites with autocomplete, a popular feature on search engines that anticipates the user's query while they are typing. Websites with this functionality require HTTP requests to be made as the query input field changes, such as when the user presses a key. The browser responds to input events by generating network traffic to retrieve the search predictions. The traffic emitted by the client can expose the timings of keyboard input events which may lead to a keylogging side channel attack whereby the query is revealed through packet inter-arrival times. We investigate the feasibility of such an attack on several popular search engines by characterizing the behavior of each website and measuring information leakage at the network level. Three out of the five search engines we measure preserve the mutual information between keystrokes and timings to within 1% of what it is on the host. We describe the ways in which two search engines mitigate this vulnerability with minimal effects on usability.
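The leakage measurement described here can be approximated in a few lines (a hedged sketch with synthetic timings and simple histogram binning, not the paper's estimator): compare the mutual information between key identities and host-side inter-keystroke times with the same quantity computed from network-side inter-arrival times.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(7)

# Synthetic data: which key was pressed, and the time since the previous key press.
keys = rng.integers(0, 26, size=5000)                        # key identities
host_gaps = 80 + 15 * keys + rng.normal(0, 10, size=5000)    # ms, key-dependent (assumed model)
net_gaps = host_gaps + rng.normal(0, 5, size=5000)           # network path adds jitter

def mi_bits(labels, timings, bins=30):
    """Mutual information (bits) between discrete labels and binned timings."""
    binned = np.digitize(timings, np.histogram_bin_edges(timings, bins=bins))
    return mutual_info_score(labels, binned) / np.log(2)

host_mi = mi_bits(keys, host_gaps)
net_mi = mi_bits(keys, net_gaps)
print(f"host MI: {host_mi:.2f} bits, network MI: {net_mi:.2f} bits "
      f"({100 * net_mi / host_mi:.1f}% preserved)")
```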
2019-10-02
McMahon, E., Patton, M., Samtani, S., Chen, H..  2018.  Benchmarking Vulnerability Assessment Tools for Enhanced Cyber-Physical System (CPS) Resiliency. 2018 IEEE International Conference on Intelligence and Security Informatics (ISI). :100–105.

Cyber-Physical Systems (CPSs) are engineered systems seamlessly integrating computational algorithms and physical components. CPS advances offer numerous benefits to domains such as health, transportation, smart homes and manufacturing. Despite these advances, the overall cybersecurity posture of CPS devices remains unclear. In this paper, we provide knowledge on how to improve CPS resiliency by evaluating and comparing the accuracy and scalability of two popular vulnerability assessment tools, Nessus and OpenVAS. Accuracy and suitability are evaluated with a diverse sample of pre-defined vulnerabilities in Industrial Control Systems (ICS), smart cars, smart home devices, and a smart water system. Scalability is evaluated using a large-scale vulnerability assessment of 1,000 Internet accessible CPS devices found on Shodan, the search engine for the Internet of Things (IoT). Assessment results indicate several CPS devices from major vendors suffer from critical vulnerabilities such as unsupported operating systems, OpenSSH vulnerabilities allowing unauthorized information disclosure, and PHP vulnerabilities susceptible to denial of service attacks.

2019-09-04
Vanjari, M. S. P., Balsaraf, M. K. P..  2018.  Efficient Exploration of Algorithm in Scholarly Big Data Document. 2018 International Conference on Information, Communication, Engineering and Technology (ICICET). :1–5.
Algorithms are developed, analyzed, and applied throughout the computer field and are used for developing new applications. They are used for finding solutions to problems under different conditions, transforming those problems into algorithmic ones to which standard algorithms can be applied. Scholarly digital documents are increasing day by day. AlgorithmSeer is a search engine for searching algorithms, and its main aim is to provide a large algorithm database. It automatically discovers and extracts algorithms from this big collection of documents, which enables algorithm indexing, searching, discovery, and analysis. An original set of scalable techniques used by AlgorithmSeer to identify and extract algorithm representations from a big collection of scholarly documents is proposed. Along with this, particularly important and relevant textual content can be accessed on the platform, and portions can be highlighted by users with different levels of knowledge. In support of lectures and self-learning, the highlighted documents can be shared with others. However, learners at different levels cannot use the highlighted text at the same level of understanding. We address the problem of predicting new highlights for partially highlighted documents.
Xiong, M., Li, A., Xie, Z., Jia, Y..  2018.  A Practical Approach to Answer Extraction for Constructing QA Solution. 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC). :398–404.
Question Answering (QA) systems play an increasingly important role in the Internet age. The proportion of Internet users relying on QA to obtain knowledge and solve problems is getting higher and higher, especially in the modern agricultural field. However, answer quality in QA varies widely due to differing levels of agricultural expertise, so answer quality assessment is important. Due to the lexical gap between questions and answers, the existing approaches are not quite satisfactory. A practical approach, RCAS, is proposed to rank candidate answers; it utilizes support sets to reduce the impact of the lexical gap between questions and answers. First, similar questions are retrieved and support sets are produced from their high-quality answers. Based on the assumption that high-quality answers have intrinsic similarity, the quality of candidate answers is then evaluated through their distance from the support sets. Second, unlike existing approaches, prior knowledge from similar question-answer pairs is used to bridge the lexical and semantic gaps between questions and answers. Experiments are conducted on approximately 0.15 million question-answer pairs about agriculture, dietetics and food from Yahoo! Answers. The results show that our approach can rank the candidate answers more precisely.
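The support-set idea can be illustrated with a simple bag-of-words version (a hedged sketch with toy sentences and TF-IDF cosine similarity; RCAS itself is not specified at this level of detail here): candidate answers closer to the high-quality answers of similar questions rank higher.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# High-quality answers to questions retrieved as similar to the user's question (toy data).
support_set = [
    "apply nitrogen fertilizer in split doses during the growing season",
    "split applications of nitrogen reduce leaching and improve yield",
]
candidates = [
    "split the nitrogen application into two or three doses",
    "my neighbour says the weather has been strange lately",
]

vec = TfidfVectorizer().fit(support_set + candidates)
centroid = np.asarray(vec.transform(support_set).mean(axis=0))  # support-set centroid

# Rank candidate answers by cosine similarity to the support-set centroid.
scores = cosine_similarity(centroid, vec.transform(candidates))[0]
for score, answer in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.3f}  {answer}")
```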
2018-09-12
Rahayuda, I. G. S., Santiari, N. P. L..  2017.  Crawling and cluster hidden web using crawler framework and fuzzy-KNN. 2017 5th International Conference on Cyber and IT Service Management (CITSM). :1–7.
Today almost everyone is using the internet for daily activities, whether social, academic, work or business. But only a few of us are aware that the internet we generally access is only a small part of the whole. The Internet, or the world wide web, is divided into several levels, such as the surface web, deep web and dark web, and accessing the deep or dark web is dangerous. This research examines web content and deep web content. For a faster and safer search, this research uses a crawler framework. The search process yields various kinds of data that are stored in a database, and a classification process is applied to the database to determine the level of each website. The classification is done using the fuzzy-KNN method, which classifies the results of the crawler framework contained in the database. The crawler framework generates data such as URL addresses, page info and other attributes. The crawled data are compared with predefined sample data, and the fuzzy-KNN classification yields the web level based on the words specified in the sample data. From the tests conducted on several datasets, 20% of sites were surface web, 7.5% bergie web, 20% deep web, 22.5% charter web and 30% dark web. The research was done on limited test data only; more data are needed to obtain better results. Better crawler frameworks can speed up crawling, especially at certain web levels, because not all crawler frameworks work at every level; the Tor browser can be used, but the crawler framework sometimes cannot.
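The classification step can be sketched with a minimal fuzzy k-NN in the style of Keller et al. (a hedged illustration with made-up two-dimensional features; the paper's actual features come from crawled URL and page data): each query receives a fuzzy membership degree in every class, weighted by inverse distance to its k nearest labelled samples.

```python
import numpy as np

def fuzzy_knn(train_X, train_y, query, k=3, m=2):
    """Fuzzy k-NN: return the membership degree of `query` in each class."""
    d = np.linalg.norm(train_X - query, axis=1)
    nn = np.argsort(d)[:k]                                # indices of k nearest samples
    w = 1.0 / np.maximum(d[nn], 1e-9) ** (2 / (m - 1))    # inverse-distance weights
    classes = np.unique(train_y)
    memberships = np.array([w[train_y[nn] == c].sum() for c in classes]) / w.sum()
    return dict(zip(classes, memberships))

# Toy labelled samples: 0 = surface web, 1 = deep web, 2 = dark web.
train_X = np.array([[0.1, 0.2], [0.2, 0.1], [0.5, 0.6],
                    [0.6, 0.5], [0.9, 0.9], [0.8, 1.0]])
train_y = np.array([0, 0, 1, 1, 2, 2])

print(fuzzy_knn(train_X, train_y, query=np.array([0.55, 0.55])))
```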
2018-06-11
Kondo, D., Silverston, T., Tode, H., Asami, T., Perrin, O..  2017.  Risk analysis of information-leakage through interest packets in NDN. 2017 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :360–365.

Information leakage is one of the most important security issues in the current Internet. In Named-Data Networking (NDN), Interest names introduce novel vulnerabilities that can be exploited. By setting up malware, Interest names can be used to encode critical information (embedded via steganography) and to leak information out of the network by generating anomalous Interest traffic. This security threat based on Interest names does not exist in IP networks, and it is essential to solve this issue to secure the NDN architecture. This paper performs a risk analysis of information leakage in NDN. We first describe vulnerabilities related to Interest names and, as countermeasures, we propose a name-based filter using search engine information and another filter using a one-class Support Vector Machine (SVM). We collected URLs from the data repository provided by Common Crawl and we evaluate the performance of our per-packet filters. We show that our filters can drastically choke the throughput of information leakage, which makes it easier to detect anomalous Interest traffic. It is therefore possible to mitigate information leakage in an NDN network, which is a strong incentive for future deployment of this architecture at Internet scale.
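A rough software analogue of the anomaly filter (a hedged sketch; the training names, character n-gram features and SVM parameters below are illustrative assumptions rather than the paper's configuration) fits a one-class SVM to n-grams of legitimate names and flags unusual, high-entropy names of the kind a steganographic leak might produce.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Names seen in legitimate traffic (stand-ins for URLs crawled from Common Crawl).
legit_names = [
    "/example.com/index.html", "/example.com/news/2017/04/article",
    "/video.example.org/stream/segment12", "/cdn.example.net/img/logo.png",
    "/example.com/css/main.css", "/example.com/js/app.js",
    "/news.example.org/world/politics", "/example.com/about/contact",
]

vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))
X = vec.fit_transform(legit_names)

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X)

suspect = ["/example.com/news/2017/04/article",      # looks like a normal name
           "/x7Qz.aa/9f3kq0weDqL1MZpR8vN4tYb2"]       # possible encoded payload
print(clf.predict(vec.transform(suspect)))            # +1 = normal, -1 = anomalous
```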