Biblio
Most Web sites rely on resources hosted by third parties such as CDNs. Third parties may be compromised or coerced into misbehaving, e.g., delivering a malicious script or stylesheet. Unexpected changes to resources hosted by third parties can be detected with the Subresource Integrity (SRI) mechanism. The focus of SRI is on scripts and stylesheets. Web fonts cannot be secured with that mechanism under all circumstances. The first contribution of this paper is to evaluate the potential for attacks using malicious fonts. With an instrumented browser we find that (1) more than 95% of the top 50,000 Web sites of the Tranco top list rely on resources hosted by third parties and that (2) only a small fraction employs SRI. Moreover, we find that more than 60% of the sites in our sample use fonts hosted by third parties, most of which are served by Google. The second contribution of the paper is a proof of concept of a malicious font as well as a tool for automatically generating such a font, which targets security-conscious users who are used to verifying cryptographic fingerprints. Software vendors publish such fingerprints along with their software packages to allow users to verify their integrity. Due to incomplete SRI support for Web fonts, a third party could force a browser to load our malicious font. The font targets a particular cryptographic fingerprint and renders it as a different fingerprint chosen by the attacker. This allows attackers to fool users into believing that they are downloading a genuine software package when they are actually downloading a maliciously modified version. Finally, we propose countermeasures that could be deployed to protect the integrity of Web fonts.
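As an illustration of the SRI mechanism this abstract refers to, the following is a minimal sketch of how an integrity value for a third-party script or stylesheet can be computed; the file name and CDN URL are hypothetical examples, not taken from the paper.

```python
# Minimal sketch: computing a Subresource Integrity (SRI) value for a resource.
# The file name "vendor.js" and the CDN URL below are hypothetical examples.
import base64
import hashlib

def sri_hash(path: str, algo: str = "sha384") -> str:
    """Return an integrity value such as 'sha384-...' for the given file."""
    digest = hashlib.new(algo, open(path, "rb").read()).digest()
    return f"{algo}-{base64.b64encode(digest).decode()}"

if __name__ == "__main__":
    integrity = sri_hash("vendor.js")
    # The value goes into the integrity attribute of a <script> or <link> tag,
    # so the browser rejects the resource if the third party changes it:
    print(f'<script src="https://cdn.example.com/vendor.js" '
          f'integrity="{integrity}" crossorigin="anonymous"></script>')
```

The paper's point is precisely that the corresponding protection is not consistently available for Web fonts loaded via CSS.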
The security of web browsers is of paramount importance, these days perhaps more than ever. Unfortunately, acquiring real data for security-related research is not an easy task, as access to sensitive information is rarely granted to researchers who are not members of a trusted security team. In this paper, we describe a method to mine security-related commits from open source software repositories, even if the reports of already fixed security issues have access restrictions, and we show the applicability of the method on two popular web browser projects. We also made the mined dataset available, listing more than 13,000 security-related commits, with which we hope to facilitate research on security-targeted bug prediction.
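A minimal sketch of the general idea of mining security-related commits by keyword matching on commit messages in a git history; the keyword list and the plain `git log` invocation are illustrative assumptions, not the authors' exact mining procedure.

```python
# Minimal sketch: flag potentially security-related commits by keyword matching
# on commit messages. Keyword list and repository path are illustrative only.
import re
import subprocess

SECURITY_KEYWORDS = re.compile(
    r"\b(security|vulnerab|exploit|overflow|use.after.free|CVE-\d{4}-\d+|XSS|CSRF)\b",
    re.IGNORECASE,
)

def security_commits(repo_path: str):
    """Yield (hash, subject) pairs whose commit subject matches a keyword."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%H\t%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in log.splitlines():
        commit_hash, _, subject = line.partition("\t")
        if SECURITY_KEYWORDS.search(subject):
            yield commit_hash, subject

if __name__ == "__main__":
    for commit_hash, subject in security_commits("."):
        print(commit_hash[:10], subject)
```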
The Web ecosystem has been evolving over the past years, and new Internet protocols, namely HTTP/2 over TLS/TCP and QUIC over UDP, are now used to deliver Web content. Similarly, CDNs (Content Delivery Networks) are deployed worldwide, caching content close to end users to optimize Web browsing quality. We present in this paper an analysis of the influence of these Internet protocols and of CDNs on the Top 10,000 Alexa websites, based on a 12-month measurement campaign (from April 2018 to April 2019) performed with our tool Web View [1]. Part of our measurements is made public on a monitoring website, showing the results for the Top 50 Alexa websites plus a few specific websites and 8 French websites suggested by the French agency in charge of regulating telecommunications. Our analysis of this long-term measurement campaign allows us to better understand the delivery of public websites. For instance, it shows that even if some argue that QUIC improves quality, this is not observed in real life, since QUIC is not widely deployed. Our method for analyzing CDN delivery during Web browsing allows us to evaluate its influence, which is important since CDN usage can decrease page loading time, on average by 43.1% with HTTP/2 and 38.5% with QUIC, when the same home page is requested a second time.
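A minimal sketch of the repeated-request idea behind these measurements: fetch a home page twice over HTTP/2 and compare timings. The URL and the use of httpx are assumptions for illustration; the actual campaign drives a full browser (Web View) and records complete page-load metrics, not a single HTML fetch.

```python
# Minimal sketch: time a first and a repeated fetch of a home page over HTTP/2.
# Requires the httpx package with its HTTP/2 extra (h2). The URL is hypothetical.
import time
import httpx

def fetch_time(client: httpx.Client, url: str) -> float:
    """Return the wall-clock time of one GET request."""
    start = time.perf_counter()
    client.get(url)
    return time.perf_counter() - start

with httpx.Client(http2=True, follow_redirects=True) as client:
    url = "https://www.example.com/"        # hypothetical target site
    first = fetch_time(client, url)
    second = fetch_time(client, url)        # may benefit from CDN / warm caches
    print(f"first: {first:.3f}s  second: {second:.3f}s  "
          f"gain: {100 * (first - second) / first:.1f}%")
```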
Analyzing web weaknesses and vulnerabilities in order to find security attacks has become more urgent these days. If there is a communication that violates the system security policies, a covert channel has been created. The attacker can easily disclose information from the victim's system with just one public access permission. Covert timing channels, unlike covert storage channels, do not have memory storage and therefore draw less attention. Different methods have been proposed for their identification, which generally exploit the shape of the traffic and the channel's regularity. In this article, an entropy-based detection method is designed and implemented. The attacker can adjust the amount of channel entropy by taking measures such as changing the channel's level or creating noise on the channel to evade the analyst's detection. As a result, the entropy threshold for detection is not always constant. By comparing the entropy at different levels of the channel and of the analyst, we conclude that the analyst must investigate traffic at all possible levels.
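A minimal sketch of the entropy-based detection idea: discretize packet inter-arrival times into bins, compute their Shannon entropy, and compare it against a threshold. The bin count and threshold value are illustrative assumptions; as the abstract notes, a fixed threshold is not sufficient in general.

```python
# Minimal sketch: Shannon entropy of binned inter-arrival times as a
# covert-timing-channel indicator. Bin count and threshold are illustrative.
import math
from collections import Counter

def shannon_entropy(inter_arrivals, bins: int = 16) -> float:
    """Entropy (bits) of inter-arrival times discretized into equal-width bins."""
    lo, hi = min(inter_arrivals), max(inter_arrivals)
    width = (hi - lo) / bins or 1.0
    counts = Counter(min(int((t - lo) / width), bins - 1) for t in inter_arrivals)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_covert_channel(inter_arrivals, threshold: float = 2.5) -> bool:
    # A regular, attacker-shaped channel tends to show entropy that deviates
    # from legitimate traffic observed at the same level.
    return shannon_entropy(inter_arrivals) < threshold
```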
Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem poses a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with metadata of category information within cQA pages for question retrieval, using two novel category-powered models. One is a basic category-powered model called MB-NET, and the other is an enhanced category-powered model called ME-NET, which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the Fisher kernel framework to transform them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further evaluate our approaches in large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved.
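A compact sketch of the Fisher-kernel step mentioned here: a diagonal-covariance GMM fitted over word embeddings turns a variable-length set of word vectors (one question) into a fixed-length vector. The embedding dimension, number of components, and use of scikit-learn are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch: Fisher-vector encoding of a variable-length set of word
# embeddings into a fixed-length vector via a diagonal-covariance GMM.
# Dimensions, component count, and scikit-learn are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(word_vectors: np.ndarray, gmm: GaussianMixture) -> np.ndarray:
    """Encode a (T, D) matrix of word vectors as a fixed-length 2*K*D vector."""
    T = word_vectors.shape[0]
    gamma = gmm.predict_proba(word_vectors)            # (T, K) posteriors
    mu, sigma, w = gmm.means_, np.sqrt(gmm.covariances_), gmm.weights_
    diff = (word_vectors[:, None, :] - mu) / sigma     # (T, K, D) normalized residuals
    g_mu = (gamma[:, :, None] * diff).sum(0) / (T * np.sqrt(w)[:, None])
    g_sigma = (gamma[:, :, None] * (diff ** 2 - 1)).sum(0) / (T * np.sqrt(2 * w)[:, None])
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])

# Usage: fit the GMM once on a large sample of word vectors, then encode each
# question (of any length) into a vector of identical dimension.
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(np.random.randn(1000, 50))
print(fisher_vector(np.random.randn(12, 50), gmm).shape)   # (800,) = 2 * 8 * 50
```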
One of the major threats against web applications is Cross-Site Scripting (XSS). The final target of XSS attacks is the client running a particular web browser. During the last decade, several competing web browsers (IE, Netscape, Chrome, Firefox) have evolved to support new features. In this paper, we explore whether the evolution of web browsers is accompanied by systematic security regression testing. Beginning with an analysis of their current degree of exposure to XSS, we extend the empirical study to a decade of the most popular web browser versions. We use XSS attack vectors as unit test cases and propose a new method, supported by a tool, to address this XSS vector testing issue. The analysis of a decade of releases of the most popular web browsers, including mobile ones, shows an urgent need for XSS regression testing. We advocate the use of a shared security testing benchmark as a good practice and propose a first set of publicly available XSS vectors as a basis to ensure that security is not sacrificed when a new version is delivered.
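A minimal sketch of the idea of treating XSS attack vectors as unit test cases against a browser, here driven through Selenium; the short vector list and the detection-by-alert approach are illustrative assumptions, not the paper's tool or benchmark.

```python
# Minimal sketch: run XSS vectors as browser unit tests and report which ones
# execute. The vector list is illustrative; a shared benchmark would supply more.
from urllib.parse import quote
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

XSS_VECTORS = [
    "<img src=x onerror=alert(1)>",
    "<svg onload=alert(1)>",
    "<script>alert(1)</script>",
]

def vector_executes(driver, vector: str) -> bool:
    """Return True if loading the vector triggers script execution (an alert)."""
    driver.get("data:text/html," + quote(vector))
    try:
        WebDriverWait(driver, 1).until(EC.alert_is_present())
        driver.switch_to.alert.accept()
        return True
    except TimeoutException:
        return False

if __name__ == "__main__":
    driver = webdriver.Firefox()
    for vector in XSS_VECTORS:
        status = "executed" if vector_executes(driver, vector) else "blocked"
        print(f"{status:9s} {vector}")
    driver.quit()
```

Running the same vector set against successive browser releases is what turns such checks into the regression tests the abstract calls for.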