Biblio
With the development of e-Science and data-intensive scientific discovery, scientific data must remain available over the long term so that valuable data can be discovered and re-used in downstream investigations, either alone or in combination with newly generated data. Preserving scientific data not only makes experiments reproducible and verifiable, but also lets other scientists raise new questions that promote research and innovation. In this paper, we focus on two main problems of digital preservation: format migration and preservation metadata. Format migration includes both format verification and object transformation. We present the system architecture for format migration and preservation metadata, analyze the mapping rules of object transformation, discuss data fixity, integrity, authenticity, and digital signatures, and show an example in detail.
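The fixity check mentioned in the abstract above can be illustrated with a short sketch. It is not the paper's system: the function names, the record layout, and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import hmac


def fixity_digest(data: bytes) -> str:
    """Return a SHA-256 digest to store in the preservation metadata."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    return hmac.compare_digest(fixity_digest(data), recorded_digest)


# Hypothetical preserved object and its metadata record.
original = b"temperature,pressure\n21.5,1013\n"
record = {"algorithm": "SHA-256", "digest": fixity_digest(original)}

intact = verify_fixity(original, record["digest"])
tampered = verify_fixity(original + b"x", record["digest"])
```

A digest alone only detects change. Authenticity would additionally need a digital signature over the record, which this sketch leaves out.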
Intentionally deceptive content presented under the guise of legitimate journalism is a worldwide information accuracy and integrity problem that affects opinion forming, decision making, and voting patterns. Most so-called 'fake news' is initially distributed over social media conduits like Facebook and Twitter and later finds its way onto mainstream media platforms such as traditional television and radio news. The fake news stories that are initially seeded over social media platforms share key linguistic characteristics, such as excessive use of unsubstantiated hyperbole and non-attributed quoted content. In this paper, the results of a fake news identification study that documents the performance of a fake news classifier are presented. The Textblob, Natural Language, and SciPy Toolkits were used to develop a novel fake news detector that uses quoted attribution in a Bayesian machine learning system as a key feature to estimate the likelihood that a news article is fake. The resulting process achieves a precision of 63.333% when assessing the likelihood that an article with quotes is fake. This process is called influence mining, and this novel technique is presented as a method that can be used to enable fake news and even propaganda detection. The research process, technical analysis, technical linguistics work, and classifier performance and results are presented. The paper concludes with a discussion of how the current system will evolve into an influence mining system.
In this paper, we present a combinatorial testing methodology for testing web applications with regard to SQL injection vulnerabilities. We describe three attack grammars that were developed and used to generate concrete attack vectors. Furthermore, we present and evaluate two different oracles used to observe the application's behavior when subjected to such attack vectors. We also present a prototype tool called SQLInjector capable of automated SQL injection vulnerability testing for web applications. The developed methodology can be applied to any web application that uses server-side scripting and HTML for handling user input and has a SQL database backend. Our approach relies on the use of a database proxy, making this a gray-box testing method. We establish the effectiveness of the proposed tool with the WAVSEP verification framework and conduct a case study on real-world web applications, where we are able to discover both known vulnerabilities and additional previously undiscovered flaws.
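A minimal sketch of deriving concrete test strings from an attack grammar follows. The grammar is a hypothetical toy, not one of the three grammars in the paper, and the expansion is exhaustive, whereas a combinatorial method would typically select a smaller covering subset.

```python
from itertools import product

# Hypothetical toy grammar: each slot lists its alternative terminals.
GRAMMAR = {
    "quote": ["'", '"'],
    "operator": [" OR ", " AND "],
    "condition": ["1=1", "'a'='a'"],
    "comment": ["--", "#"],
}


def attack_vectors(grammar):
    """Yield every concatenation of one alternative per slot, in slot order."""
    slots = list(grammar)
    for combo in product(*(grammar[slot] for slot in slots)):
        yield "".join(combo)


vectors = list(attack_vectors(GRAMMAR))
```

With two alternatives in each of four slots, the expansion yields 16 strings. Exhaustive expansion grows multiplicatively with the number of slots, which is the usual motivation for covering only t-way interactions.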
Data dependency flow has been reformulated as a context-free grammar (CFG) reachability problem, and the idea has been explored for detecting some web vulnerabilities, particularly cross-site scripting (XSS) and access control vulnerabilities. However, reformulating SQL injection vulnerability (SQLIV) detection as a grammar reachability problem has not been investigated. In this paper, the concept of data dependency flow is used to reformulate SQLIV detection as a CFG reachability problem. The paper consequently defines a reachability analysis strategy for SQLIV detection.
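The underlying idea can be sketched as a source-to-sink query over a data dependency graph. This is a deliberate simplification: it uses plain graph reachability and omits the grammar labels on edges that CFG reachability adds. The graph and node names are hypothetical.

```python
from collections import deque


def reachable(edges, source):
    """Return every node reachable from source by breadth-first search."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical dependency graph: an edge a -> b means b depends on a.
DEPENDENCIES = {
    "request_param": ["user_id"],
    "user_id": ["query_string"],
    "query_string": ["execute_sql"],
    "config_value": ["log_message"],
}

# User input reaching the SQL sink is reported; the config value is not.
flagged = "execute_sql" in reachable(DEPENDENCIES, "request_param")
safe = "execute_sql" in reachable(DEPENDENCIES, "config_value")
```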
For over two decades the OpenPGP format has provided the mainstay of email confidentiality and authenticity, and is currently being relied upon to provide authenticated package distributions in open source Unix systems. In this work, we provide the first language-theoretical analysis of the OpenPGP format, classifying it as a deterministic context-free language and establishing that an automatically generated parser can in principle be defined. However, we show that the number of rules required to describe it with a deterministic context-free grammar is prohibitively high, and we identify security vulnerabilities in the OpenPGP format specification. We identify possible attacks aimed at tampering with messages and certificates while retaining their syntactic and semantic validity. We evaluate the effectiveness of these attacks against the two OpenPGP implementations covering the overwhelming majority of uses, i.e., the GNU Privacy Guard (GPG) and Symantec PGP. The results of the evaluation show that both implementations are not vulnerable, owing to conservative choices in dealing with malicious input data. Finally, we provide guidelines to improve the OpenPGP specification.
Software systems nowadays communicate via a number of complex languages. This is often the cause of security vulnerabilities such as arbitrary code execution or injections. Injections such as cross-site scripting are widely known from textual languages such as HTML and JSON, which are constantly gaining popularity. These systems use parsers to read input and unparsers to write output, and this is where such security vulnerabilities arise. Correct parsing and unparsing of messages is therefore of the utmost importance when developing secure and reliable systems. Part of the challenge developers face is to correctly encode data during unparsing and decode it during parsing. This paper presents McHammerCoder, an (un)parser and encoding generator supporting textual and binary languages. The generated (un)parsers automatically apply the generated encoding, which is derived from the language's grammar. Manually defining and applying an encoding is therefore not required to effectively prevent injections when using McHammerCoder. By specifying the communication language within a grammar, McHammerCoder provides developers with correct input and output handling code for their custom language.
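The pairing of encoding during unparsing with decoding during parsing can be illustrated with a hand-written sketch. It does not use McHammerCoder or its API. The message language (fields separated by `;`, with backslash escapes) and all function names are assumptions made for illustration.

```python
def encode(field: str) -> str:
    """Escape the escape character first, then the field delimiter."""
    return field.replace("\\", "\\\\").replace(";", "\\;")


def unparse(fields):
    """Write the fields as one message, encoding each field on the way out."""
    return ";".join(encode(field) for field in fields)


def parse(message: str):
    """Read a message back into fields, decoding escapes on the way in."""
    fields, current, escaped = [], [], False
    for ch in message:
        if escaped:
            current.append(ch)  # an escaped character is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ";":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# The second field tries to smuggle in an extra field via the delimiter.
record = ["alice", "x;admin=true", "a\\b"]
wire = unparse(record)
decoded = parse(wire)
```

Without the `encode` step, the second field would split into two on parsing, which is the injection. Deriving the escape rules from the grammar, rather than writing them by hand as here, is the part the abstract attributes to the generator.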
The threat from insiders is an ever-growing concern for organisations, and in recent years the harm that insiders pose has been widely demonstrated. This paper describes our recent work on how we might support insider threat detection when actions are taken which can be immediately determined as of concern because they fall into one of two categories: they violate a policy which is specifically crafted to describe behaviours that are highly likely to be of concern if they are exhibited, or they exhibit behaviours which follow a pattern of a known insider threat attack. In particular, we view these concerning actions as something that we can detect by designing and implementing tripwires within a system. We then orchestrate these tripwires in conjunction with an anomaly detection system and present an approach to formalising tripwires of both categories. Our intention is that, by having a single framework for describing them, alongside a library of existing tripwires in use, we can provide the community of practitioners and researchers with the basis to document and evolve this common understanding of tripwires.
Malicious applications can be introduced to attack users and services so as to gain financial rewards, individuals' sensitive information, company and government intellectual property, and remote control of systems. However, traditional methods of malicious code detection, such as signature detection, behavior detection, virtual machine detection, and heuristic detection, have various weaknesses which make them unreliable. This paper reviews the existing technologies of malicious code detection and proposes a malicious code detection model based on behavior association. The behavior points of malicious code are first extracted through API monitoring technology and integrated into behaviors; then a relation between behaviors is established according to data dependence. Next, a behavior association model is built and a discrimination method is put forth using a pushdown automaton. Finally, real malicious code is taken as a sample in an experiment on behavior capture, association, and discrimination, showing that the theoretical model is viable.
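A stack-based recognizer in the spirit of a pushdown automaton gives a rough picture of discrimination over a captured API-call sequence. The rule below (a process handle is opened, written to, then used to start a remote thread) is a hypothetical example and not the model from the paper. The event names are plain strings and no API is actually invoked.

```python
def is_suspicious(events):
    """Flag a sequence in which an open handle is written to and then executed."""
    stack = []
    for call in events:
        if call == "OpenProcess":
            stack.append("handle")      # push a fresh, unwritten handle
        elif call == "WriteProcessMemory" and stack and stack[-1] == "handle":
            stack[-1] = "written"       # pop and push: the handle is now written
        elif call == "CreateRemoteThread" and stack and stack[-1] == "written":
            return True                 # accepting configuration reached
        elif call == "CloseHandle" and stack:
            stack.pop()                 # the handle is released
    return False


injected = is_suspicious(["OpenProcess", "WriteProcessMemory", "CreateRemoteThread"])
closed_first = is_suspicious(
    ["OpenProcess", "WriteProcessMemory", "CloseHandle", "CreateRemoteThread"]
)
```

The stack is what lets the recognizer keep behaviors associated with the handle they depend on, so a thread creation after the handle has been closed is not flagged.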
In this paper, we present our approach to the transformation of workflow applications based on institution theory. The workflow application is modeled with a UML Activity Diagram (UML AD). Then, for formal verification purposes, the graphical model is translated into an Event-B specification. Institution theory is used at two levels. First, we define a local semantics for UML AD and for Event-B specifications using a categorical description of each. Second, we define an institution comorphism to link the two defined institutions. Because institution theory is used throughout, the theoretical foundations of our approach are studied within a single mathematical framework. The Event-B specification that results from applying the transformation is used for the formal verification of functional properties and for verifying the absence of problems such as deadlock. Additionally, with the institution comorphism, we define the semantic correctness and coherence of the model transformation.