Biblio
Brute-force login attempts are common for every host on the public Internet. While most of them can be discarded as low-threat attacks, targeted attack campaigns often use a dictionary-based brute-force attack to establish a foothold in the network. Therefore, it is important to characterize the attackers' behavior to prioritize defensive measures and react to new threats quickly. In this paper we present a set of metrics that can support threat hunters in characterizing brute-force login attempts. Based on connection metadata, timing information, and the attacker's dictionary, these metrics can help to differentiate scans and to find common behavior across distinct IP addresses. We evaluated our novel metrics on a real-world data set of malicious login attempts collected by our honeypot Honeygrove. We highlight interesting metrics, show how clustering can be leveraged to reveal common behavior across IP addresses, and describe how selected metrics help to assess the threat level of attackers. Among other findings, we found strong indicators of collusion between ten otherwise unrelated IP addresses, confirming that clustering the right metrics can help to reveal coordinated attacks.
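A minimal sketch of the clustering idea described above, assuming per-IP feature vectors (e.g. mean inter-attempt delay, dictionary overlap, attempts per hour) have already been extracted; the feature names and the use of DBSCAN are illustrative assumptions, not the metrics defined in the paper.

```python
# Sketch: cluster per-IP brute-force metrics to surface IPs with common behavior.
# The feature names and DBSCAN parameters are illustrative assumptions.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# One row per attacking IP: [mean inter-attempt delay (s), dictionary overlap, attempts/hour]
ips = ["203.0.113.5", "203.0.113.9", "198.51.100.7", "198.51.100.23"]
features = np.array([
    [1.2, 0.91, 340.0],
    [1.3, 0.89, 355.0],
    [9.8, 0.12,  14.0],
    [1.1, 0.93, 348.0],
])

X = StandardScaler().fit_transform(features)
labels = DBSCAN(eps=0.8, min_samples=2).fit_predict(X)

for ip, label in zip(ips, labels):
    print(ip, "cluster" if label >= 0 else "noise", label)
# IPs sharing a cluster label exhibit similar attack behavior and may be coordinated.
```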
With the development of e-Science and data-intensive scientific discovery, it is necessary to ensure that scientific data remains available for the long term, so that valuable scientific data can be discovered and re-used for downstream investigations, either alone or in combination with newly generated data. As such, the preservation of scientific data not only allows experiments to be reproduced and verified, but also enables other scientists to raise new questions that promote research and innovation. In this paper, we focus on two main problems of digital preservation: format migration and preservation metadata. Format migration includes both format verification and object transformation. The system architecture for format migration and preservation metadata is presented, mapping rules for object transformation are analyzed, data fixity, integrity, authenticity, and digital signatures are discussed, and an example is shown in detail.
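The fixity check mentioned above is typically implemented by recording a checksum in the preservation metadata and re-verifying it after migration. A minimal sketch, assuming checksums are stored alongside the object; the file name and metadata field names are illustrative.

```python
# Sketch: record and verify fixity information for a preserved object.
# The metadata field names and file name are illustrative assumptions.
import hashlib, json
from pathlib import Path

def fixity_record(path: str) -> dict:
    """Compute a SHA-256 checksum to store in the preservation metadata."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return {"file": path, "algorithm": "SHA-256", "checksum": digest}

def verify_fixity(record: dict) -> bool:
    """Re-hash the object (e.g. after a migration run) and compare with the record."""
    current = hashlib.sha256(Path(record["file"]).read_bytes()).hexdigest()
    return current == record["checksum"]

record = fixity_record("observation_001.dat")
print(json.dumps(record, indent=2))
print("fixity intact:", verify_fixity(record))
```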
Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, tries, self-balancing search trees, and sparse arrays) have been developed as part of efforts to provide an efficient method for locating target data. However, how to choose an efficient indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation into metadata, metadata queries, and corresponding data structures. In this study, we perform a systematic study of the metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of the metadata in the dataset. We also study possible metadata queries based on the discovered metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on a real-world dataset suggests that a trie is a suitable data structure when prefix/suffix queries are required; otherwise, a hash table should be used. We conclude our study with a summary of our findings. These findings provide guidelines and offer insights for developing metadata indexing methodologies for scientific applications.
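A minimal sketch contrasting the two indexing choices discussed above: a hash table (Python dict) for exact-match metadata lookups versus a small trie for prefix queries. The trie implementation and the attribute values are simplified illustrations, not the structures evaluated in the study.

```python
# Sketch: exact-match lookups via a hash table vs. prefix queries via a trie.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.values = []          # object IDs whose attribute value ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key: str, obj_id: str):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.values.append(obj_id)

    def prefix_search(self, prefix: str):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect all values in the subtree below the prefix.
        results, stack = [], [node]
        while stack:
            n = stack.pop()
            results.extend(n.values)
            stack.extend(n.children.values())
        return results

# Hash table: constant-time exact match on a metadata attribute.
by_target = {"NGC-224": "obs_001", "NGC-253": "obs_002"}
print(by_target["NGC-224"])

# Trie: efficient prefix query ("all observations whose target starts with NGC-2").
trie = Trie()
for target, obj in by_target.items():
    trie.insert(target, obj)
print(trie.prefix_search("NGC-2"))
```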
Phishing is typically deployed as an attack vector in the initial stages of a hacking endeavour. Due to its low-risk, high-reward nature it has seen widespread adoption, and detecting it has become a challenge in recent times. This paper proposes a novel means of detecting phishing websites using a Generative Adversarial Network. Taking into account the internal structure and external metadata of a website, the proposed approach uses a generator network which generates both legitimate as well as synthetic phishing features to train a discriminator network. The latter then determines whether the features correspond to legitimate or phishing websites, before improving its detection accuracy based on the classification error. The proposed approach is evaluated using two different phishing datasets and is found to achieve a detection accuracy of up to 94%.
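A compact sketch of a generator/discriminator pair over website feature vectors, using PyTorch on toy data; the network sizes, losses, and training loop are generic GAN boilerplate and only assumptions about how such a detector could be trained, not the authors' architecture.

```python
# Sketch: tiny GAN over website feature vectors (toy data, illustrative sizes).
import torch
import torch.nn as nn

FEATURES, NOISE = 10, 16

generator = nn.Sequential(
    nn.Linear(NOISE, 32), nn.ReLU(),
    nn.Linear(32, FEATURES),
)
discriminator = nn.Sequential(
    nn.Linear(FEATURES, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),   # probability that a sample is "real"
)

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

real_features = torch.rand(64, FEATURES)   # stand-in for extracted website features

for step in range(200):
    # Train discriminator: real features -> 1, generated features -> 0.
    fake = generator(torch.randn(64, NOISE)).detach()
    d_loss = loss_fn(discriminator(real_features), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train generator: try to make the discriminator label its output as real.
    fake = generator(torch.randn(64, NOISE))
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```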
We propose a new spam detection approach based solely on metadata features derived from email headers. The approach achieves above 99% classification accuracy on the CSDMC2010 dataset, which matches or surpasses state-of-the-art spam classifiers. We utilize a static set of engineered features, supplemented with automatically extracted features. The approach is just as effective for spam detection under end-to-end encryption, as our feature set remains unchanged for encrypted emails. In contrast to most established spam detectors, we disregard the email body completely and can therefore deliver very high classification speeds, as computationally expensive text preprocessing is not necessary.
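A minimal sketch of header-only feature extraction in the spirit of the approach above, assuming raw RFC 822 messages as input; the specific features (hop count, subject length, presence of certain headers) are illustrative and not the paper's engineered feature set.

```python
# Sketch: extract body-independent features from email headers for a spam classifier.
# The chosen features are illustrative assumptions, not the paper's feature set.
from email import message_from_string

def header_features(raw_message: str) -> dict:
    msg = message_from_string(raw_message)
    return {
        "num_received_hops": len(msg.get_all("Received") or []),
        "subject_length": len(msg.get("Subject", "")),
        "has_reply_to": int(msg.get("Reply-To") is not None),
        "has_list_unsubscribe": int(msg.get("List-Unsubscribe") is not None),
        "from_equals_return_path": int(msg.get("From", "") == msg.get("Return-Path", "")),
    }

raw = (
    "Received: from mx1.example.org\r\n"
    "Received: from relay.example.net\r\n"
    "From: alice@example.org\r\n"
    "Subject: Quarterly report\r\n"
    "\r\n"
    "(body is ignored entirely)\r\n"
)
print(header_features(raw))
# These numeric features can be fed to any standard classifier (e.g. a random forest).
```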
Developing information systems for education and the labour market using web and grid service architectures enables their modularity, expandability, and interoperability. Applying ontologies to the web helps collect and select knowledge about a given field in a generic way, enabling different applications to understand, use, reuse, and share that knowledge. A necessary step before publishing computer-interpretable data on the public web is the implementation of common standards that will ensure the exchange of information. The Croatian Qualification Framework (CROQF) is a project to standardize occupations for the labour market, as well as sets of qualifications, skills, and competences and their mutual relations. This paper analyses a substantial body of research on the application of ontologies to information systems in education over the last decade. The main goal is to compare achieved results according to: 1) phases of development/classifications of education-related ontologies; 2) areas of education; and 3) standards and structures of metadata for educational systems. The collected information is used to provide insight into the building blocks of CROQF, both those well supported by experience and best practices and those that are not, together with guidelines for developing its own standards using ontological structures.
Traditional firewalls, Intrusion Detection Systems (IDS) and network analytics tools extensively use the `flow' connection concept, consisting of a five-tuple of source and destination IP addresses, ports, and protocol type, for classification and management of network activities. By analysing flows, information can be obtained from TCP/IP fields and packet content to give an understanding of what is being transferred within a single connection. As networks have evolved to incorporate more connections and greater bandwidth, particularly from ``always on'' IoT devices and video and data streaming, so too have malicious network threats, whose communication methods have increased in sophistication. As a result, the concept of the 5-tuple flow in isolation is unable to detect such threats and malicious behaviours. This is due to factors such as the length of time and amount of data required to understand the network traffic behaviour, which cannot be accomplished by observing a single connection. To alleviate this issue, this paper proposes the use of additional two-tuple and single-tuple flow types to associate multiple 5-tuple communications, with generated metadata used to profile individual connection behaviour. This proposed approach enables advanced linking of different connections and behaviours, developing a clearer picture as to what network activities have been taking place over a prolonged period of time. To demonstrate the capability of this approach, an expert system rule set has been developed to detect the presence of a multi-peered ZeuS botnet, which communicates by making multiple connections with multiple hosts and is thus undetectable to standard IDS systems observing 5-tuple flow types in isolation. Finally, as the solution is rule based, the implementation operates in real time and does not require the post-processing and analytics of other research solutions. This paper aims to demonstrate possible applications for next generation firewalls and methods to acquire additional information from network traffic.
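A small sketch of the multi-granularity flow idea: aggregating 5-tuple records under 2-tuple (source/destination pair) and 1-tuple (source only) keys so that metadata such as distinct peer counts can be profiled per host. The record format and the peer-count rule are illustrative assumptions, not the paper's expert system rule set.

```python
# Sketch: associate 5-tuple flows under coarser 2-tuple and 1-tuple keys.
# The flow records and the peer-count rule below are illustrative assumptions.
from collections import defaultdict

# (src_ip, dst_ip, src_port, dst_port, protocol)
flows = [
    ("10.0.0.5", "198.51.100.1", 49211, 8443, "TCP"),
    ("10.0.0.5", "198.51.100.2", 49212, 8443, "TCP"),
    ("10.0.0.5", "198.51.100.3", 49213, 8443, "TCP"),
    ("10.0.0.7", "203.0.113.10", 51000,  443, "TCP"),
]

two_tuple = defaultdict(list)   # (src, dst) -> 5-tuple flows between the pair
one_tuple = defaultdict(set)    # src        -> distinct peers contacted

for src, dst, sport, dport, proto in flows:
    two_tuple[(src, dst)].append((sport, dport, proto))
    one_tuple[src].add(dst)

# Example rule-style check: a host contacting many distinct peers over time is a
# candidate for peer-to-peer botnet behaviour that no single 5-tuple flow reveals.
for src, peers in one_tuple.items():
    if len(peers) >= 3:
        print(f"{src} contacted {len(peers)} distinct peers -> flag for inspection")
```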
The pervasive use of databases for the storage of critical and sensitive information in many organizations has led to an increase in the rate at which databases are exploited in computer crimes. While there are several techniques and tools available for database forensic analysis, such tools usually assume a priori database preparation, such as relying on tamper-detection software already being in place and the use of detailed logging. Further, such tools are built-in and thus can be compromised or corrupted along with the database itself. In practice, investigators need forensic and security audit tools that work on poorly configured systems and make no assumptions about the extent of damage or malicious hacking in a database. In this paper, we present our database forensics methods, which are capable of examining database content from a storage (disk or RAM) image without using any log or file system metadata. We describe how these methods can be used to detect security breaches in an untrusted environment where the security threat arose from a privileged user (or someone who has obtained such privileges). Finally, we argue that a comprehensive and independent audit framework is necessary in order to detect and counteract threats in an environment where the security breach originates from an administrator (either at database or operating system level).
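A toy sketch of carving candidate row data out of a raw storage image without relying on logs or file system metadata; the fixed page size, the image path, and the regular-expression record pattern are purely illustrative assumptions, not the authors' methods.

```python
# Sketch: scan a raw disk/RAM image for printable, row-like byte sequences
# without using any logs or file system metadata. The page size and the record
# pattern are illustrative assumptions.
import re

PAGE_SIZE = 8192                           # assumed database page size
ROW_PATTERN = re.compile(rb"[ -~]{16,}")   # runs of printable ASCII, 16+ bytes

def carve_strings(image_path: str):
    hits = []
    with open(image_path, "rb") as img:
        offset = 0
        while True:
            page = img.read(PAGE_SIZE)
            if not page:
                break
            for match in ROW_PATTERN.finditer(page):
                hits.append((offset + match.start(), match.group().decode("ascii")))
            offset += PAGE_SIZE
    return hits

for off, text in carve_strings("disk.img")[:10]:   # "disk.img" is a hypothetical image
    print(hex(off), text)
```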
This paper presents the theoretical background of our main research activity, focused on the evaluation of wiping/erasure standards, which are mostly implemented in specific software products developed for data wiping. The information saved on storage devices often includes metadata and trace data. These kinds of data in particular are very important in forensic analysis because they sometimes contain information about links to other files. Most people save their sensitive information on local storage devices and later want to securely erase these files, but this operation is usually problematic. Secure file destruction is one of many anti-forensic methods. The outcome of this paper is a definition of future research activities focused on establishing a suitable digital environment. This environment will be prepared for testing and evaluating selected wiping standards and appropriate eraser software.
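A simplified sketch of the kind of multi-pass overwrite that wiping standards prescribe (here a zeros pass, a ones pass, and a random pass before deletion); the pass sequence is a generic illustration, not any specific evaluated standard, and on modern SSDs logical overwrites do not guarantee physical erasure.

```python
# Sketch: naive multi-pass file overwrite before deletion.
# The pass sequence is a generic illustration, not a specific wiping standard,
# and logical overwrites give no physical-erasure guarantee on SSDs.
import os

def wipe_file(path: str, passes=(b"\x00", b"\xff", None)):
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for pattern in passes:
            f.seek(0)
            if pattern is None:
                f.write(os.urandom(size))   # random pass
            else:
                f.write(pattern * size)     # fixed-pattern pass
            f.flush()
            os.fsync(f.fileno())            # push the pass to the device
    os.remove(path)

# wipe_file("secret.docx")   # example invocation on a hypothetical file
```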
Cloud computing and big data are emerging technologies that go hand in hand. Due to the enormous size of big data, it is often impractical to keep it in local storage; even if we want to store it locally, we have to spend a great deal of money to build a data center. One way to save money is to store big data in a cloud storage service, which provides users with space and security to store their files. However, relying on a single cloud storage provider can cause trouble for the customer: a cloud storage provider (CSP) may stop its service at any time, so it is too risky for a data owner to host files with only one CSP. Also, the CSP is a third party that the user has to trust without verification; after deploying a file to the CSP, the user does not know who accesses it. Even though a CSP provides security mechanisms to prevent outsider attacks, how can the user ensure that there is no insider attack to steal or corrupt the file? This research proposes a way to minimize this risk and ensure data privacy and access control. The big data file is split into chunks that are distributed to multiple cloud storage providers. Even if there is an insider attack, the attacker obtains only part of the file and cannot reconstruct the whole file. After splitting the file, metadata is generated that keeps the chunk information, including chunk locations, access paths, and the data owner's username and password for connecting to each CSP. An asymmetric security concept is applied in this research: the metadata is encrypted and transferred to the user who requests access to the file. File access, monitoring, and metadata transfer are functions of a dew computing layer, an intermediate server between the users and the cloud services.
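A minimal sketch of the split-and-distribute idea, with the chunk metadata protected by a hybrid scheme (symmetric encryption of the metadata, the symmetric key wrapped with the requester's RSA public key); the metadata fields, chunk size, provider names, and the use of the `cryptography` package are assumptions for illustration, not the paper's exact scheme.

```python
# Sketch: split a file into chunks for different providers and protect the chunk
# metadata with a hybrid scheme (Fernet for the metadata, RSA to wrap the key).
# Field names, chunk size, and provider names are illustrative assumptions.
import json
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MiB per chunk (assumed)
providers = ["csp-a", "csp-b", "csp-c"]

def split_file(path: str) -> dict:
    metadata = {"file": path, "chunks": []}
    with open(path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            target = providers[index % len(providers)]
            # upload_chunk(target, chunk)  -- provider upload omitted in this sketch
            metadata["chunks"].append({"index": index, "provider": target, "size": len(chunk)})
            index += 1
    return metadata

def protect_metadata(metadata: dict, requester_public_key):
    sym_key = Fernet.generate_key()
    encrypted_metadata = Fernet(sym_key).encrypt(json.dumps(metadata).encode())
    wrapped_key = requester_public_key.encrypt(
        sym_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    return encrypted_metadata, wrapped_key

requester_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
meta = split_file("bigdata.bin")                                   # hypothetical input file
enc_meta, wrapped = protect_metadata(meta, requester_key.public_key())
```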
This paper aims to explain static analysis techniques in detail and to highlight the weaknesses and challenges they face. To this end, more than 80 static analysis-based frameworks have been studied, and in light of these, the process of detecting malicious applications has been divided into four phases, which are explained schematically. The features used in static analysis are also discussed in detail, divided into four categories: manifest-based features, code-based features, semantic features, and app metadata-based features. The challenges facing static analysis-based methods are likewise discussed in detail. Finally, a case study was conducted to test the resilience of several well-known commercial antivirus products and one state-of-the-art academic static analysis framework against obfuscation techniques used by developers of malicious applications. The results showed a significant impact on the performance of most of the tested antivirus products and frameworks, reflecting the urgent need for more accurate tools.
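A small sketch of one manifest-based feature extractor of the kind surveyed above, assuming the AndroidManifest.xml has already been decoded to plain XML (e.g. by apktool); the chosen feature, the set of requested permissions, is a common choice in the literature rather than any specific framework's design.

```python
# Sketch: extract requested permissions (a typical manifest-based feature)
# from a decoded, plain-text AndroidManifest.xml.
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def requested_permissions(manifest_path: str) -> set:
    root = ET.parse(manifest_path).getroot()
    return {
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
    }

perms = requested_permissions("AndroidManifest.xml")
# A binary feature vector could then flag suspicious combinations, for example:
suspicious = {"android.permission.SEND_SMS", "android.permission.READ_CONTACTS"}
print(sorted(perms & suspicious))
```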
The Internet plays a crucial role in today's life, and the usage of online social networks is monotonically increasing. People can share multimedia information quickly and easily keep in touch or communicate with friends across the world through online social networks. Security in authentication is a big challenge in online social networks, and authentication is the preliminary process for identifying legitimate users. Conventionally, alphanumeric text-based passwords are used for authentication. However, the main flaws of text-based passwords are their high vulnerability to attacks and the difficulty of recalling passwords at authentication time due to their irregular use. To overcome the shortcomings of text passwords, we propose graphical password authentication, an approach that authenticates using a combination of pictures. It is less vulnerable to attacks, and humans can recall pictures more easily than text, making graphical passwords a better alternative to text passwords. As the number of images users upload and share through online sites increases, privacy preservation has become a major problem. We therefore need a caption-based metadata stratification of images that delivers an automatic suggestion of a similar category already in the database; it works by comparing the caption metadata of an album with the caption metadata already in the database, or by extracting synonyms of the new album's caption metadata to check similarity with the caption metadata already in the database. This stratification offers enhanced automatic privacy prediction for images uploaded to online social networks; privacy is an essential factor for uploaded images, and privacy violation is a major concern. We therefore propose automatic policy prediction for uploaded images classified by caption metadata, a hassle-free privacy setting suggested to the user.
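A toy sketch of the caption-metadata comparison described above: captions are tokenized, optionally expanded with synonyms, and matched against albums already in the database by set overlap. The synonym table, album data, and Jaccard threshold are illustrative assumptions, not the paper's stratification method.

```python
# Sketch: match a new album's caption metadata against existing albums by
# synonym-expanded token overlap. Synonym table and threshold are assumptions.
SYNONYMS = {"beach": {"seaside", "shore"}, "kids": {"children"}}

def expand(tokens: set) -> set:
    expanded = set(tokens)
    for tok in tokens:
        expanded |= SYNONYMS.get(tok, set())
    return expanded

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

existing_albums = {
    "family_holiday": {"beach", "family", "summer"},
    "graduation":     {"ceremony", "friends", "campus"},
}

new_caption = {"seaside", "kids", "summer"}
expanded = expand(new_caption)

best = max(existing_albums, key=lambda name: jaccard(expanded, expand(existing_albums[name])))
score = jaccard(expanded, expand(existing_albums[best]))
if score >= 0.25:
    print(f"Suggest category '{best}' and its privacy policy (similarity {score:.2f})")
```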
Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with metadata of category information within cQA pages for question retrieval, using two novel category-powered models. One is a basic category-powered model called MB-NET and the other is an enhanced category-powered model called ME-NET, which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the Fisher kernel framework to transform them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further evaluate our approaches in large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved.
We present work undertaken at our institutional repository to enhance metadata and re-organize digital objects according to a new information architecture, in an effort to minimize administrative object management and processing and to improve object discovery and use. This work was partly motivated by the launch of a new discovery platform at our institution, which aggregates metadata and full text from our four open access repositories into a cohesive, consistent, and enhanced searching and browsing experience. The platform provides digital object identifier (DOI) assignment, metadata access via various formats, and an open metadata and full text application program interface (API) for researchers, amongst other features. The functionality of these platform features relies heavily on accurate object representation and metadata. This work facilitates and improves the discovery and engagement of the diverse digital objects available from our institution, so they can be used and analyzed in new, flexible, and innovative ways by a myriad of communities and disciplines.
Software discovery is a key management function to ensure that systems are free of vulnerabilities, comply with licensing requirements, and support advanced search for systems containing given software. Today, software is predominantly discovered by querying package management tools or using rules that check file metadata or contents. These approaches are inadequate, as not all software is installed through package managers, and agile development practices lead to frequent deployment of software. Other approaches to software discovery use machine learning methods requiring a training phase, or require maintaining knowledge bases. Columbus uses knowledge of the software packaging practices that have evolved over time, together with the information embedded in the file system impression created by a software package, to discover it. Columbus is able to discover software in 92% of all official Docker images. Further, Columbus can be used in problem diagnosis and drift detection situations to compare two different systems, or to determine the evolution of a system over time.
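A minimal sketch of discovery via file-system impressions in the spirit described above: each known package is represented by a small set of characteristic paths, and a system (or mounted image) is scanned for those fingerprints. The fingerprint table and matching ratio are illustrative assumptions, not Columbus's actual knowledge extraction.

```python
# Sketch: discover installed software by matching characteristic file-system paths.
# The fingerprint table and matching threshold are illustrative assumptions.
from pathlib import Path

FINGERPRINTS = {
    "nginx":      ["etc/nginx/nginx.conf", "usr/sbin/nginx"],
    "postgresql": ["usr/lib/postgresql", "etc/postgresql"],
    "python3":    ["usr/bin/python3"],
}

def discover(root: str, min_ratio: float = 0.5) -> dict:
    """Report packages whose fingerprint paths are sufficiently present under root."""
    base = Path(root)
    found = {}
    for package, paths in FINGERPRINTS.items():
        present = sum((base / p).exists() for p in paths)
        ratio = present / len(paths)
        if ratio >= min_ratio:
            found[package] = ratio
    return found

print(discover("/"))                        # scan the live system
# print(discover("/mnt/container_rootfs"))  # or a mounted image, e.g. for drift checks
```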
We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.
Nowadays, cloud computing is a power station for running multiple businesses, and it is accumulating more and more users every day. Database-as-a-service is a service model provided by cloud computing to store, manage, and process data on a cloud platform. Database-as-a-service has key characteristics such as availability, scalability, and elasticity. A customer does not have to worry about database installation and management; instead, the cloud database service provider takes responsibility for installing and maintaining the database. The real problem occurs when confidential or private information is stored in the cloud database: we cannot rely on the cloud database vendor, since a curious vendor may capture and leak secret information. Protected Database-as-a-service is a novel solution to this problem that provides provable and pragmatic privacy in the face of a compromised cloud database service provider. Protected Database-as-a-service defines various encryption schemes for choosing the encryption algorithm and encryption key used to encrypt and decrypt data. It also provides a "master key" to users, so that the metadata storage table can be decrypted only with the user's master key. As a result, the cloud service vendor never gets access to decrypted data, and even if all servers are compromised, the vendor will not be able to decrypt the data. The proposed Protected Database-as-a-service system allows multiple geographically distributed clients to execute concurrent and independent operations on encrypted data while preserving data confidentiality and consistency at the cloud level, eliminating any intermediate server between the client and the cloud database.
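A minimal sketch of the client-side idea above: metadata-table entries are encrypted under a key derived from the user's master key, so the provider only ever stores ciphertext. The KDF parameters, table layout, and use of the `cryptography` package are illustrative assumptions, not the paper's scheme.

```python
# Sketch: encrypt metadata-table entries client-side under a key derived from
# the user's master key. KDF parameters and table layout are illustrative assumptions.
import base64, os
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def key_from_master(master_key: bytes, salt: bytes) -> Fernet:
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=480_000)
    return Fernet(base64.urlsafe_b64encode(kdf.derive(master_key)))

salt = os.urandom(16)                       # stored alongside the encrypted table
cipher = key_from_master(b"user-master-key", salt)

# The client encrypts before upload; the provider only ever sees ciphertext.
metadata_row = b"patients.ssn -> AES, column_key_id=17"   # hypothetical table entry
stored = cipher.encrypt(metadata_row)

# Only a client holding the master key (and salt) can decrypt the metadata table.
print(key_from_master(b"user-master-key", salt).decrypt(stored))
```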