Biblio

List
Filter

Found 16998 results

2018-03-26

Aslan, Ö, Samet, R.. 2017. Mitigating Cyber Security Attacks by Being Aware of Vulnerabilities and Bugs. 2017 International Conference on Cyberworlds (CW). :222–225.

Because the Internet makes human lives easier, many devices are connected to the Internet daily. The private data of individuals and large companies, including health-related data, user bank accounts, and military and manufacturing data, are increasingly accessible via the Internet. Because almost all data is now accessible through the Internet, protecting these valuable assets has become a major concern. The goal of cyber security is to protect such assets from unauthorized use. Attackers use automated tools and manual techniques to penetrate systems by exploiting existing vulnerabilities and software bugs. To provide good enough security; attack methodologies, vulnerability concepts and defence strategies should be thoroughly investigated. The main purpose of this study is to show that the patches released for existing vulnerabilities at the operating system (OS) level and in software programs does not completely prevent cyber-attack. Instead, producing specific patches for each company and fixing software bugs by being aware of the software running on each specific system can provide a better result. This study also demonstrates that firewalls, antivirus software, Windows Defender and other prevention techniques are not sufficient to prevent attacks. Instead, this study examines different aspects of penetration testing to determine vulnerable applications and hosts using the Nmap and Metasploit frameworks. For a test case, a virtualized system is used that includes different versions of Windows and Linux OS.

Movahedi, Y., Cukier, M., Andongabo, A., Gashi, I.. 2017. Cluster-Based Vulnerability Assessment Applied to Operating Systems. 2017 13th European Dependable Computing Conference (EDCC). :18–25.

Organizations face the issue of how to best allocate their security resources. Thus, they need an accurate method for assessing how many new vulnerabilities will be reported for the operating systems (OSs) they use in a given time period. Our approach consists of clustering vulnerabilities by leveraging the text information within vulnerability records, and then simulating the mean value function of vulnerabilities by relaxing the monotonic intensity function assumption, which is prevalent among the studies that use software reliability models (SRMs) and nonhomogeneous Poisson process (NHPP) in modeling. We applied our approach to the vulnerabilities of four OSs: Windows, Mac, IOS, and Linux. For the OSs analyzed in terms of curve fitting and prediction capability, our results, compared to a power-law model without clustering issued from a family of SRMs, are more accurate in all cases we analyzed.

Zahilah, R., Tahir, F., Zainal, A., Abdullah, A. H., Ismail, A. S.. 2017. Unified Approach for Operating System Comparisons with Windows OS Case Study. 2017 IEEE Conference on Application, Information and Network Security (AINS). :91–96.

The advancement in technology has changed how people work and what software and hardware people use. From conventional personal computer to GPU, hardware technology and capability have dramatically improved so does the operating systems that come along. Unfortunately, current industry practice to compare OS is performed with single perspective. It is either benchmark the hardware level performance or performs penetration testing to check the security features of an OS. This rigid method of benchmarking does not really reflect the true performance of an OS as the performance analysis is not comprehensive and conclusive. To illustrate this deficiency, the study performed hardware level and operational level benchmarking on Windows XP, Windows 7 and Windows 8 and the results indicate that there are instances where Windows XP excels over its newer counterparts. Overall, the research shows Windows 8 is a superior OS in comparison to its predecessors running on the same hardware. Furthermore, the findings also show that the automated benchmarking tools are proved less efficient benchmark systems that run on Windows XP and older OS as they do not support DirectX 11 and other advanced features that the hardware supports. There lies the need to have a unified benchmarking approach to compare other aspects of OS such as user oriented tasks and security parameters to provide a complete comparison. Therefore, this paper is proposing a unified approach for Operating System (OS) comparisons with the help of a Windows OS case study. This unified approach includes comparison of OS from three aspects which are; hardware level, operational level performance and security tests.

d Krit, S., Haimoud, E.. 2017. Overview of Firewalls: Types and Policies: Managing Windows Embedded Firewall Programmatically. 2017 International Conference on Engineering MIS (ICEMIS). :1–7.

Due to the increasing threat of network attacks, Firewall has become crucial elements in network security, and have been widely deployed in most businesses and institutions for securing private networks. The function of a firewall is to examine each packet that passes through it and decide whether to letting them pass or halting them based on preconfigured rules and policies, so firewall now is the first defense line against cyber attacks. However most of people doesn't know how firewall works, and the most users of windows operating system doesn't know how to use the windows embedded firewall. This paper explains how firewall works, firewalls types, and all you need to know about firewall policies, then presents a novel application (QudsWall) developed by authors that manages windows embedded firewall and make it easy to use.

Shockley, Matt, Maixner, Chris, Johnson, Ryan, DeRidder, Mitch, Petullo, W. Michael. 2017. Using VisorFlow to Control Information Flow Without Modifying the Operating System Kernel or Its Userspace. Proceedings of the 2017 International Workshop on Managing Insider Security Threats. :13–24.

VisorFlow aims to monitor the flow of information between processes without requiring modifications to the operating system kernel or its userspace. VisorFlow runs in a privileged Xen domain and monitors the system calls executing in other domains running either Linux or Windows. VisorFlow uses its observations to prevent confidential information from leaving a local network. We describe the design and implementation of VisorFlow, describe how we used VisorFlow to confine na\"ıve users and malicious insiders during the 2017 Cyber-Defense Exercise, and provide performance measurements. We have released VisorFlow and its companion library, libguestrace, as open-source software.

Ashraf, Muhammed Naazer. 2017. Scratching the Surface of Windows Server 2016 and System Center Configuration Manager Current Branch. Proceedings of the 2017 ACM Annual Conference on SIGUCCS. :73–79.

Lehigh University has set a goal to implement System Center Configuration Manager by the end of 2017. This project is being spearheaded by one of our Senior Computing Consultants who has been researching and trained in the Microsoft Virtualization stack. We will discuss our roadmaps, results from our proof-of-concept environments, and discussions in driving this project.

Assaf, Eran, Basat, Ran Ben, Einziger, Gil, Friedman, Roy, Kassner, Yaron. 2017. Counting Distinct Elements over Sliding Windows. Proceedings of the 10th ACM International Systems and Storage Conference. :22:1–22:1.

In Distributed Denial of Service (DDoS) attacks, an attacker tries to disable a service with a flood of seemingly legitimate requests from multiple devices; this is usually accompanied by a sharp spike in the number of distinct IP addresses / flows accessing the system in a short time frame. Hence, the number of distinct elements over sliding windows is a fundamental signal in DDoS identification. Additionally, assessing whether a specific flow has recently accessed the system, known as the Set Membership problem, can help us identify the attacking parties. Here, we show how to extend the functionality of a state of the art algorithm for set membership over a W elements sliding window. We now also support estimation of the distinct flow count, using as little as log2 (W) additional bits.

Kim, Doowon, Kwon, Bum Jun, Dumitra\c s, Tudor. 2017. Certified Malware: Measuring Breaches of Trust in the Windows Code-Signing PKI. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. :1435–1448.

Digitally signed malware can bypass system protection mechanisms that install or launch only programs with valid signatures. It can also evade anti-virus programs, which often forego scanning signed binaries. Known from advanced threats such as Stuxnet and Flame, this type of abuse has not been measured systematically in the broader malware landscape. In particular, the methods, effectiveness window, and security implications of code-signing PKI abuse are not well understood. We propose a threat model that highlights three types of weaknesses in the code-signing PKI. We overcome challenges specific to code-signing measurements by introducing techniques for prioritizing the collection of code signing certificates that are likely abusive. We also introduce an algorithm for distinguishing among different types of threats. These techniques allow us to study threats that breach the trust encoded in the Windows code signing PKI. The threats include stealing the private keys associated with benign certificates and using them to sign malware or by impersonating legitimate companies that do not develop software and, hence, do not own code-signing certificates. Finally, we discuss the actionable implications of our findings and propose concrete steps for improving the security of the code-signing ecosystem.

Finkbeiner, Bernd, Müller, Christian, Seidl, Helmut, Z\u alinescu, Eugen. 2017. Verifying Security Policies in Multi-Agent Workflows with Loops. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. :633–645.

We consider the automatic verification of information flow security policies of web-based workflows, such as conference submission systems like EasyChair. Our workflow description language allows for loops, non-deterministic choice, and an unbounded number of participating agents. The information flow policies are specified in a temporal logic for hyperproperties. We show that the verification problem can be reduced to the satisfiability of a formula of first-order linear-time temporal logic, and provide decidability results for relevant classes of workflows and specifications. We report on experimental results obtained with an implementation of our approach on a series of benchmarks.

Wilson, Judson, Wahby, Riad S., Corrigan-Gibbs, Henry, Boneh, Dan, Levis, Philip, Winstein, Keith. 2017. Trust but Verify: Auditing the Secure Internet of Things. Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. :464–474.

Internet-of-Things devices often collect and transmit sensitive information like camera footage, health monitoring data, or whether someone is home. These devices protect data in transit with end-to-end encryption, typically using TLS connections between devices and associated cloud services. But these TLS connections also prevent device owners from observing what their own devices are saying about them. Unlike in traditional Internet applications, where the end user controls one end of a connection (e.g., their web browser) and can observe its communication, Internet-of-Things vendors typically control the software in both the device and the cloud. As a result, owners have no way to audit the behavior of their own devices, leaving them little choice but to hope that these devices are transmitting only what they should. This paper presents TLS–Rotate and Release (TLS-RaR), a system that allows device owners (e.g., consumers, security researchers, and consumer watchdogs) to authorize devices, called auditors, to decrypt and verify recent TLS traffic without compromising future traffic. Unlike prior work, TLS-RaR requires no changes to TLS's wire format or cipher suites, and it allows the device's owner to conduct a surprise inspection of recent traffic, without prior notice to the device that its communications will be audited.

Ali, Irfan, Hong, Jiwon, Kim, Sang-Wook. 2017. Exploiting Implicit and Explicit Signed Trust Relationships for Effective Recommendations. Proceedings of the Symposium on Applied Computing. :804–810.

Trust networks have been widely used to mitigate the data sparsity and cold-start problems of collaborative filtering. Recently, some approaches have been proposed which exploit explicit signed trust relationships, i.e., trust and distrust relationships. These approaches ignore the fact that users despite trusting/distrusting each other in a trust network may have different preferences in real-life. Most of these approaches also handle the notion of the transitivity of distrust as well as trust. However, other existing work observed that trust is transitive while distrust is intransitive. Moreover, explicit signed trust relationships are fairly sparse and may not contribute to infer true preferences of users. In this paper, we propose to create implicit signed trust relationships and exploit them along with explicit signed trust relationship to solve sparsity problem of trust relationships. We also confirm the similarity (resp. dissimilarity) of implicit and explicit trust (resp. distrust) relationships by using the similarity score between users so that users' true preferences can be inferred. In addition to these strategies, we also propose a matrix factorization model that simultaneously exploits implicit and explicit signed trust relationships along with rating information and also handles transitivity of trust and intransitivity of distrust. Extensive experiments on Epinions dataset show that the proposed approach outperforms existing approaches in terms of accuracy.

Goltzsche, David, Wulf, Colin, Muthukumaran, Divya, Rieck, Konrad, Pietzuch, Peter, Kapitza, Rüdiger. 2017. TrustJS: Trusted Client-Side Execution of JavaScript. Proceedings of the 10th European Workshop on Systems Security. :7:1–7:6.

Client-side JavaScript has become ubiquitous in web applications to improve user experience and reduce server load. However, since clients are untrusted, servers cannot rely on the confidentiality or integrity of client-side JavaScript code and the data that it operates on. For example, client-side input validation must be repeated at server side, and confidential business logic cannot be offloaded. In this paper, we present TrustJS, a framework that enables trustworthy execution of security-sensitive JavaScript inside commodity browsers. TrustJS leverages trusted hardware support provided by Intel SGX to protect the client-side execution of JavaScript, enabling a flexible partitioning of web application code. We present the design of TrustJS and provide initial evaluation results, showing that trustworthy JavaScript offloading can further improve user experience and conserve more server resources.

Durand, Arnaud, Gremaud, Pascal, Pasquier, Jacques. 2017. Decentralized Web of Trust and Authentication for the Internet of Things. Proceedings of the Seventh International Conference on the Internet of Things. :27:1–27:2.

As the Internet of Thing (IoT) matures, a lot of concerns are being raised about security, privacy and interoperability. The Web of Things (WoT) model leverages web technologies to improve interoperability. Due to its distributed components, the web scaled well beyond initial expectations. Still, secure authentication and communication across organization boundaries rely on the Public Key Infrastructure (PKI) which is a non-transparent, centralized single point of failure. We can improve transparency and reduce the chain of trust–-thus significantly improving the IoT security–-by empowering blockchain technology and web security standards. In this paper, we build a scalable, decentralized IoT-centric PKI and discuss how we can combine it with the emerging web authentication and authorization framework for constrained environments.

Pandey, M., Pandey, R., Chopra, U. K.. 2017. Rendering Trustability to Semantic Web Applications-Manchester Approach. 2017 International Conference on Infocom Technologies and Unmanned Systems (Trends and Future Directions) (ICTUS). :255–259.

The Semantic Web today is a web that allows for intelligent knowledge retrieval by means of semantically annotated tags. This web also known as Intelligent web aims to provide meaningful information to man and machines equally. However, the information thus provided lacks the component of trust. Therefore we propose a method to embed trust in semantic web documents by the concept of provenance which provides answers to who, when, where and by whom the documents were created or modified. This paper demonstrates the same using the Manchester approach of provenance implemented in a University Ontology.

Abuein, Q., Shatnawi, A., Al-Sheyab, H.. 2017. Trusted Recomendation System Based on Level of Trust(TRS_LoT). 2017 International Conference on Engineering and Technology (ICET). :1–5.

There are vast amounts of information in our world. Accessing the most accurate information in a speedy way is becoming more difficult and complicated. A lot of relevant information gets ignored which leads to much duplication of work and effort. The focuses tend to provide rapid and intelligent retrieval systems. Information retrieval (IR) is the process of searching for information that is related to some topics of interest. Due to the massive search results, the user will normally have difficulty in identifying the relevant ones. To alleviate this problem, a recommendation system is used. A recommendation system is a sort of filtering information system, which predicts the relevance of retrieved information to the user's needs according to some criteria. Hence, it can provide the user with the results that best fit their needs. The services provided through the web normally provide massive information about any requested item or service. An efficient recommendation system is required to classify this information result. A recommendation system can be further improved if augmented with a level of trust information. That is, recommendations are ranked according to their level of trust. In our research, we produced a recommendation system combined with an efficient level of trust system to guarantee that the posts, comments and feedbacks from users are trusted. We customized the concept of LoT (Level of Trust) [1] since it can cover medical, shopping and learning through social media. The proposed system TRS\_LoT provides trusted recommendations to the users with a high percentage of accuracy. Whereas a 300 post with more than 5000 comments from ``Amazon'' was selected to be used as a dataset, the experiment has been conducted by using same dataset based on ``post rating''.

Hosseinpourpia, M., Oskoei, M. A.. 2017. GA Based Parameter Estimation for Multi-Faceted Trust Model of Recommender Systems. 2017 5th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS). :160–165.

Recommender system is to suggest items that might be interest of the users in social networks. Collaborative filtering is an approach that works based on similarity and recommends items liked by other similar users. Trust model adopts users' trust network in place of similarity. Multi-faceted trust model considers multiple and heterogeneous trust relationship among the users and recommend items based on rating exist in the network of trustees of a specific facet. This paper applies genetic algorithm to estimate parameters of multi-faceted trust model, in which the trust weights are calculated based on the ratings and the trust network for each facet, separately. The model was built on Epinions data set that includes consumers' opinion, rating for items and the web of trust network. It was used to predict users' rating for items in different facets and root mean squared of prediction error (RMSE) was considered as a measure of performance. Empirical evaluations demonstrated that multi-facet models improve performance of the recommender system.

Alexopoulos, N., Daubert, J., Mühlhäuser, M., Habib, S. M.. 2017. Beyond the Hype: On Using Blockchains in Trust Management for Authentication. 2017 IEEE Trustcom/BigDataSE/ICESS. :546–553.

Trust Management (TM) systems for authentication are vital to the security of online interactions, which are ubiquitous in our everyday lives. Various systems, like the Web PKI (X.509) and PGP's Web of Trust are used to manage trust in this setting. In recent years, blockchain technology has been introduced as a panacea to our security problems, including that of authentication, without sufficient reasoning, as to its merits.In this work, we investigate the merits of using open distributed ledgers (ODLs), such as the one implemented by blockchain technology, for securing TM systems for authentication. We formally model such systems, and explore how blockchain can help mitigate attacks against them. After formal argumentation, we conclude that in the context of Trust Management for authentication, blockchain technology, and ODLs in general, can offer considerable advantages compared to previous approaches. Our analysis is, to the best of our knowledge, the first to formally model and argue about the security of TM systems for authentication, based on blockchain technology. To achieve this result, we first provide an abstract model for TM systems for authentication. Then, we show how this model can be conceptually encoded in a blockchain, by expressing it as a series of state transitions. As a next step, we examine five prevalent attacks on TM systems, and provide evidence that blockchain-based solutions can be beneficial to the security of such systems, by mitigating, or completely negating such attacks.

Chen, K., Mao, H., Shi, X., Xu, Y., Liu, A.. 2017. Trust-Aware and Location-Based Collaborative Filtering for Web Service QoS Prediction. 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC). 2:143–148.

The rapid development of cloud computing has resulted in the emergence of numerous web services on the Internet. Selecting a suitable cloud service is becoming a major problem for users especially non-professionals. Quality of Service (QoS) is considered to be the criterion for judging web services. There are several Collaborative Filtering (CF)-based QoS prediction methods proposed in recent years. QoS values among different users may vary largely due to the network and geographical location. Moreover, QoS data provided by untrusted users will definitely affect the prediction accuracy. However, most existing methods seldom take both facts into consideration. In this paper, we present a trust-aware and location-based approach for web service QoS prediction. A trust value for each user is evaluated before the similarity calculation and the location is taken into account in similar neighbors selecting. A series of experiments are performed based on a realworld QoS dataset including 339 service users and 5,825 services. The experimental analysis shows that the accuracy of our method is much higher than other CF-based methods.

Ma, H., Tao, O., Zhao, C., Li, P., Wang, L.. 2017. Impact of Replacement Policies on Static-Dynamic Query Results Cache in Web Search Engines. 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). :137–139.

Caching query results is an efficient technique for Web search engines. A state-of-the-art approach named Static-Dynamic Cache (SDC) is widely used in practice. Replacement policy is the key factor on the performance of cache system, and has been widely studied such as LIRS, ARC, CLOCK, SKLRU and RANDOM in different research areas. In this paper, we discussed replacement policies for static-dynamic cache and conducted the experiments on real large scale query logs from two famous commercial Web search engine companies. The experimental results show that ARC replacement policy could work well with static-dynamic cache, especially for large scale query results cache.

Hasslinger, G., Kunbaz, M., Hasslinger, F., Bauschert, T.. 2017. Web Caching Evaluation from Wikipedia Request Statistics. 2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt). :1–6.

Wikipedia is one of the most popular information platforms on the Internet. The user access pattern to Wikipedia pages depends on their relevance in the current worldwide social discourse. We use publically available statistics about the top-1000 most popular pages on each day to estimate the efficiency of caches for support of the platform. While the data volumes are moderate, the main goal of Wikipedia caches is to reduce access times for page views and edits. We study the impact of most popular pages on the achievable cache hit rate in comparison to Zipf request distributions and we include daily dynamics in popularity.

Mihindukulasooriya, Nandana, Rico, Mariano, Santana-Pérez, Idafen, Garc\'ıa-Castro, Raúl, Gómez-Pérez, Asunción. 2017. Repairing Hidden Links in Linked Data: Enhancing the Quality of RDF Knowledge Graphs. Proceedings of the Knowledge Capture Conference. :6:1–6:8.

Knowledge Graphs (KG) are becoming core components of most artificial intelligence applications. Linked Data, as a method of publishing KGs, allows applications to traverse within, and even out of, the graph thanks to global dereferenceable identifiers denoting entities, in the form of IRIs. However, as we show in this work, after analyzing several popular datasets (namely DBpedia, LOD Cache, and Web Data Commons JSON-LD data) many entities are being represented using literal strings where IRIs should be used, diminishing the advantages of using Linked Data. To remedy this, we propose an approach for identifying such strings and replacing them with their corresponding entity IRIs. The proposed approach is based on identifying relations between entities based on both ontological axioms as well as data profiling information and converting strings to entity IRIs based on the types of entities linked by each relation. Our approach showed 98% recall and 76% precision in identifying such strings and 97% precision in converting them to their corresponding IRI in the considered KG. Further, we analyzed how the connectivity of the KG is increased when new relevant links are added to the entities as a result of our method. Our experiments on a subset of the Spanish DBpedia data show that it could add 25% more links to the KG and improve the overall connectivity by 17%.

Kane, Andrew, Tompa, Frank Wm.. 2017. Small-Term Distribution for Disk-Based Search. Proceedings of the 2017 ACM Symposium on Document Engineering. :49–58.

A disk-based search system distributes a large index across multiple disks on one or more machines, where documents are typically assigned to disks at random in order to achieve load balancing. However, random distribution degrades clustering, which is required for efficient index compression. Using the GOV2 dataset, we demonstrate the effect of various ordering techniques on index compression, and then quantify the effect of various document distribution approaches on compression and load balancing. We explore runtime performance by simulating a disk-based search system for a scaled-out 10xGOV2 index over ten disks using two standard approaches, document and term distribution, as well as a hybrid approach: small-term distribution. We find that small-term distribution has the best performance, especially in the presence of list caching, and argue that this rarely discussed distribution approach can improve disk-based search performance for many real-world installations.

Sundarrajan, Aditya, Feng, Mingdong, Kasbekar, Mangesh, Sitaraman, Ramesh K.. 2017. Footprint Descriptors: Theory and Practice of Cache Provisioning in a Global CDN. Proceedings of the 13th International Conference on Emerging Networking EXperiments and Technologies. :55–67.

Modern CDNs cache and deliver a highly-diverse set of traffic classes, including web pages, images, videos and software downloads. It is economically advantageous for a CDN to cache and deliver all traffic classes using a shared distributed cache server infrastructure. However, such sharing of cache resources across multiple traffic classes poses significant cache provisioning challenges that are the focus of this paper. Managing a vast shared caching infrastructure requires careful modeling of user request sequences for each traffic class. Using extensive traces from Akamai's CDN, we show how each traffic class has drastically different object access patterns, object size distributions, and cache resource requirements. We introduce the notion of a footprint descriptor that is a succinct representation of the cache requirements of a request sequence. Leveraging novel connections to Fourier analysis, we develop a footprint descriptor calculus that allows us to predict the cache requirements when different traffic classes are added, subtracted and scaled to within a prediction error of 2.5%. We integrated our footprint calculus in the cache provisioning operations of the production CDN and show how it is used to solve key challenges in cache sizing, traffic mixing, and cache partitioning.

Jin, Boram, Kim, Daewoo, Yun, Se-Young, Shin, Jinwoo, Hong, Seongik, Lee, Byoung-Joon B.J., Yi, Yung. 2017. On the Delay Scaling Laws of Cache Networks. Proceedings of the 12th International Conference on Future Internet Technologies. :3:1–3:6.

The Internet is becoming more and more content-oriented. CDN (Content Distribution Networks) has been a popular architecture compatible with the current Internet, and a new revolutionary paradigm such as ICN (Information Centric Networking) has studied. One of the main components in both CDN and ICN is considering cache on network. Despite a surge of extensive use of cache in the current and future Internet architectures, analysis on the performance of general cache networks are still quite limited due to complex inter-plays among various components and thus analytical intractability. Due to mathematical tractability, we consider 'static' cache policies and study asymptotic delay performance of those policies in cache networks, in particular, focusing on the impact of heterogeneous content popularities and nodes' geographical 'importances' in caching policies. Furthermore, our simulation results suggest that they perform quite similarly as popular 'dynamic' policies such as LFU (Least-Frequently-Used) and LRU (Least-Recently-Used). We believe that our theoretical findings provide useful engineering implications such as when and how various factors have impact on caching performance.

Nishioka, Chifumi, Scherp, Ansgar. 2017. Keeping Linked Open Data Caches Up-to-Date by Predicting the Life-Time of RDF Triples. Proceedings of the International Conference on Web Intelligence. :73–80.

Many Linked Open Data applications require fresh copies of RDF data at their local repositories. Since RDF documents constantly change and those changes are not automatically propagated to the LOD applications, it is important to regularly visit the RDF documents to refresh the local copies and keep them up-to-date. For this purpose, crawling strategies determine which RDF documents should be preferentially fetched. Traditional crawling strategies rely only on how an RDF document has been modified in the past. In contrast, we predict on the triple level whether a change will occur in the future. We use the weekly snapshots of the DyLDO dataset as well as the monthly snapshots of the Wikidata dataset. First, we conduct an in-depth analysis of the life span of triples in RDF documents. Through the analysis, we identify which triples are stable and which are ephemeral. We introduce different features based on the triples and apply a simple but effective linear regression model. Second, we propose a novel crawling strategy based on the linear regression model. We conduct two experimental setups where we vary the amount of available bandwidth as well as iteratively observe the quality of the local copies over time. The results demonstrate that the novel crawling strategy outperforms the state of the art in both setups.