Biblio
Record linkage refers to the task of finding same entity across different databases. We propose a machine learning based record linkage algorithm for financial entity databases. Record linkage on financial databases are essential for information integration on certain financial entity, since those databases do not have common unified identifier. Our algorithm works in two steps to determine if a pair of record is same entity or not. First we check with proposed rules if the record pair can be exactly matched after cleaning the entity name and address. Second, inspired by earlier work on author name disambiguation, we train a binary Random Forest classifier to decide the linkage. To reduce and scale the computation, this process is done only for candidate pairs within a proposed heuristic. Initial evaluation for precision, recall and F1 measures on two different linking tasks in the Financial Entity Identification and Information Integration (FEIII) Challenge show promising results.
Recent advances in differential privacy make it possible to guarantee user privacy while preserving the main characteristics of the data. However, most differential privacy mechanisms assume that the underlying dataset is clean. This paper explores the link between data cleaning and differential privacy in a framework we call PrivateClean. PrivateClean includes a technique for creating private datasets of numerical and discrete-valued attributes, a formalism for privacy-preserving data cleaning, and techniques for answering sum, count, and avg queries after cleaning. We show: (1) how the degree of privacy affects subsequent aggregate query accuracy, (2) how privacy potentially amplifies certain types of errors in a dataset, and (3) how this analysis can be used to tune the degree of privacy. The key insight is to maintain a bipartite graph relating dirty values to clean values and use this graph to estimate biases due to the interaction between cleaning and privacy. We validate these results on four datasets with a variety of well-studied cleaning techniques including using functional dependencies, outlier filtering, and resolving inconsistent attributes.
Cryptographic protocols and algorithms are the strength of digital era in which we are living. Unluckily, the security of many confidential information and credentials has been compromised due to ignorance of required security services. As a result, various attacks have been introduced by talented attackers and many security issues like as financial loss, violations of personal privacy, and security threats to democracy. This research paper provides the secure design and architecture of cryptographic protocols and expedites the authentication of cryptographic system. Designing and developing a secure cryptographic system is like a game in which designer or developer tries to maintain the security while attacker tries to penetrate the security features to perform successful attack.
Web Service Architecture gives a compatible and scalable structure for web service interactions with performance, responsiveness, reliability and security to make a quality of software design. Systematic quantitative approaches have been discussed for designing and developing software systems that meet performance objectives. Many companies have successfully applied these techniques in different applications to achieve better performance in terms of financial, customer satisfaction, and other benefits. This paper describes the architecture, design, implementation, integration testing, performance and maintenance of new applications. The most successful best practices used in world class organizations are discussed. This will help the application, component, and software system designers to develop web applications and fine tune the existing methods in line with the best practices. In business process automation, many standard practices and technologies have been used to model and execute business processes. The emerging technology is web applications technology which provides a great flexibility for development of interoperable environment services. In this paper we propose a Case study of Automatic Gas Booking system, a business process development strategy and best practices used in development of software components used in web applications. The classification of QWS dataset with 2507 records, service invocations, integration and security for web applications have been discussed.
In this paper, we propose a practical and efficient word and phrase proximity searchable encryption protocols for cloud based relational databases. The proposed advanced searchable encryption protocols are provably secure. We formalize the security assurance with cryptographic security definitions and prove the security of our searchable encryption protocols under Shannon's perfect secrecy assumption. We have tested the proposed protocols comprehensively on Amazon's high performance computing server using mysql database and presented the results. The proposed protocols ensure that there is zero overhead of space and communication because cipher text size being equal to plaintext size. For the same reason, the database schema also does not change for existing applications. In this paper, we also present results of comprehensive analysis for Song, Wagner, and Perrig scheme.
Malicious domains are key components to a variety of cyber attacks. Several recent techniques are proposed to identify malicious domains through analysis of DNS data. The general approach is to build classifiers based on DNS-related local domain features. One potential problem is that many local features, e.g., domain name patterns and temporal patterns, tend to be not robust. Attackers could easily alter these features to evade detection without affecting much their attack capabilities. In this paper, we take a complementary approach. Instead of focusing on local features, we propose to discover and analyze global associations among domains. The key challenges are (1) to build meaningful associations among domains; and (2) to use these associations to reason about the potential maliciousness of domains. For the first challenge, we take advantage of the modus operandi of attackers. To avoid detection, malicious domains exhibit dynamic behavior by, for example, frequently changing the malicious domain-IP resolutions and creating new domains. This makes it very likely for attackers to reuse resources. It is indeed commonly observed that over a period of time multiple malicious domains are hosted on the same IPs and multiple IPs host the same malicious domains, which creates intrinsic association among them. For the second challenge, we develop a graph-based inference technique over associated domains. Our approach is based on the intuition that a domain having strong associations with known malicious domains is likely to be malicious. Carefully established associations enable the discovery of a large set of new malicious domains using a very small set of previously known malicious ones. Our experiments over a public passive DNS database show that the proposed technique can achieve high true positive rates (over 95%) while maintaining low false positive rates (less than 0.5%). Further, even with a small set of known malicious domains (a couple of hundreds), our technique can discover a large set of potential malicious domains (in the scale of up to tens of thousands).
Databases have become one of the most important components in modern software systems. For example, web services, cloud computing systems, and online transaction processing systems all rely heavily on databases. To abstract the complexity of accessing a database, developers make use of Object-Relational Mapping (ORM) frameworks. ORM frameworks provide an abstraction layer between the application logic and the underlying database. Such abstraction layer automatically maps objects in Object-Oriented Languages to database records, which significantly reduces the amount of boilerplate code that needs to be written. Despite the advantages of using ORM frameworks, we observe several difficulties in maintaining ORM code (i.e., code that makes use of ORM frameworks) when cooperating with our industrial partner. After conducting studies on other open source systems, we find that such difficulties are common in other Java systems. Our study finds that i) ORM cannot completely encapsulate database accesses in objects or abstract the underlying database technology, thus may cause ORM code changes more scattered; ii) ORM code changes are more frequent than regular code, but there is a lack of tools that help developers verify ORM code at compilation time; iii) we find that changes to ORM code are more commonly due to performance or security reasons; however, traditional static code analyzers need to be extended to capture the peculiarities of ORM code in order to detect such problems. Our study highlights the hidden maintenance costs of using ORM frameworks, and provides some initial insights about potential approaches to help maintain ORM code. Future studies should carefully examine ORM code, especially given the rising use of ORM in modern software systems.
In this paper, we focus on the definition of estimators to predict method calls in Android apps. Estimation models are based on information from requirements specification documents (e.g., number of actors, number of use cases, and number of classes in the conceptual model). We have used a dataset containing information on 23 Android apps. After performing data-cleaning, we applied linear regression to build estimation models on 21 data points. Results suggest that measures gathered from requirements specification documents can be considered good predictors to estimate the number of internal calls (i.e., methods invoking other methods present in the app) and external calls (i.e., invocations to API) as well as their sum.
3D die stacking and 2.5D interposer design are promising technologies to improve integration density, performance and cost. Current approaches face serious issues in dealing with emerging security challenges such as side channel attacks, hardware trojans, secure IC manufacturing and IP piracy. By utilizing intrinsic characteristics of 2.5D and 3D technologies, we propose novel opportunities in designing secure systems. We present: (i) a 3D architecture for shielding side-channel information; (ii) split fabrication using active interposers; (iii) circuit camouflage on monolithic 3D IC, and (iv) 3D IC-based security processing-in-memory (PIM). Advantages and challenges of these designs are discussed, showing that the new designs can improve existing countermeasures against security threats and further provide new security features.
3D die stacking and 2.5D interposer design are promising technologies to improve integration density, performance and cost. Current approaches face serious issues in dealing with emerging security challenges such as side channel attacks, hardware trojans, secure IC manufacturing and IP piracy. By utilizing intrinsic characteristics of 2.5D and 3D technologies, we propose novel opportunities in designing secure systems. We present: (i) a 3D architecture for shielding side-channel information; (ii) split fabrication using active interposers; (iii) circuit camouflage on monolithic 3D IC, and (iv) 3D IC-based security processing-in-memory (PIM). Advantages and challenges of these designs are discussed, showing that the new designs can improve existing countermeasures against security threats and further provide new security features.
Reputation systems in current electronic marketplaces can easily be manipulated by malicious sellers in order to appear more reputable than appropriate. We conducted a controlled experiment with 40 UK and 41 German participants on their ability to detect malicious behavior by means of an eBay-like feedback profile versus a novel interface involving an interactive visualization of reputation data. The results show that participants using the new interface could better detect and understand malicious behavior in three out of four attacks (the overall detection accuracy 77% in the new vs. 56% in the old interface). Moreover, with the new interface, only 7% of the users decided to buy from the malicious seller (the options being to buy from one of the available sellers or to abstain from buying), as opposed to 30% in the old interface condition.
Smartphones nowadays are customized to help users with their daily tasks such as storing important data or making transactions through the internet. With the sensitivity of the data involved, authentication mechanism such as fixed-text password, PIN, or unlock patterns are used to safeguard these data against intruders. However, these mechanisms have the risk from security threats such as cracking or shoulder surfing. To enhance mobile and/or information security, this study aimed to develop a free-form handwriting gesture user authentication for smartphones. It also tried to discover the static and dynamic handwriting features that significantly influence the recognition of a legitimate user. The experiment was then conducted by asking thirty (30) individuals to draw or swipe using their fingertip their desired free-form security pattern ten (10) times. These patterns were then cleaned and processed, and extracted seven (7) static and eleven (11) dynamic handwriting features. By means of Neural Network classifier of the RapidMiner data mining tool, these features were used to develop, validate, and test a model for user authentication. The model showed a very promising recognition rate of 96.67%. The model is further tested through a prototype, and it still gave a very satisfactory result.
This demo dramatically illustrates how replacing 'Classic' TCP congestion control (Reno, Cubic, etc.) with a 'Scalable' alternative like Data Centre TCP (DCTCP) keeps queuing delay ultra-low; not just for a select few light applications like voice or gaming, but even when a variety of interactive applications all heavily load the same (emulated) Internet access. DCTCP has so far been confined to data centres because it is too aggressive–-it starves Classic TCP flows. To allow DCTCP to be exploited on the public Internet, we developed DualQ Coupled Active Queue Management (AQM), which allows the two TCP types to safely co-exist. Visitors can test all these claims. As well as running Web-based apps, they can pan and zoom a panoramic video of a football stadium on a touch-screen, and experience how their personalized HD scene seems to stick to their finger, even though it is encoded on the fly on servers accessed via an emulated delay, representing 'the cloud'. A pair of VR goggles can be used at the same time, making a similar point. The demo provides a dashboard so that visitors can not only experience the interactivity of each application live, but they can also quantify it via a wide range of performance stats, updated live. It also includes controls so visitors can configure different TCP variants, AQMs, network parameters and background loads and immediately test the effect.
Arrays of nanosized hollow spheres of Ni were studied using micromagnetic simulation by the Object Oriented Micromagnetic Framework. Before all the results, we will present an analysis of the properties for an individual hollow sphere in order to separate the real effects due to the array. The results in this paper are divided into three parts in order to analyze the magnetic behaviors in the static and dynamic regimes. The first part presents calculations for the magnetic field applied parallel to the plane of the array; specifically, we present the magnetization for equilibrium configurations. The obtained magnetization curves show that decreasing the thickness of the shell decreases the coercive field and it is difficult to obtain magnetic saturation. The values of the coercive field obtained in our work are of the same order as reported in experimental studies in the literature. The magnetic response in our study is dominated by the shape effects and we obtained high values for the reduced remanence, Mr/MS = 0.8. In the second part of this paper, we have changed the orientation of the magnetic field and calculated hysteresis curves to study the angular dependence of the coercive field and remanence. In thin shells, we have observed how the moments are oriented tangentially to the spherical surface. For the inversion of the magnetic moments we have observed the formation of vortex and onion modes. In the third part of this paper, we present an analysis for the process of magnetization reversal in the dynamic regime. The analysis showed that inversion occurs in the nonhomogeneous configuration. We could see that self-demagnetizing effects are predominant in the magnetic properties of the array. We could also observe that there are two contributions: one due to the shell as an independent object and the other due to the effects of the array.
The inevitable temperature raise leads to the demagnetization of permanent magnet synchronous motor (PMSM), that is undesirable in the application of electrical vehicle. This paper presents a nonlinear demagnetization model taking into account temperature with the Wiener structure and neural network characteristics. The remanence and intrinsic coercivity are chosen as intermediate variables, thus the relationship between motor temperature and maximal permanent magnet flux is described by the proposed neural Wiener model. Simulation and experimental results demonstrate the precision of temperature dependent demagnetization model. This work makes the basis of temperature compensation for the output torque from PMSM.
The landscape of automotive in-vehicle networks is changing driven by the vast options for infotainment features and progress toward fully-autonomous vehicles. However, the security of automotive networks is lagging behind feature-driven technologies, and new vulnerabilities are constantly being discovered. In this paper, we introduce a road map towards a security solution for in-vehicle networks that can detect anomalous and failed states of the network and adaptively respond in real-time to maintain a fail-operational system.
In the modern-day development, projects use Continuous Integration Services (CISs) to execute the build for every change in the source code. To ensure that the project remains correct and deployable, a CIS performs a clean build each time. In a clean environment, a build system needs to retrieve the project's dependencies (e.g., guava.jar). The retrieval, however, can be costly due to dependency bloat: despite a project using only a few files from each library, the existing build systems still eagerly retrieve all the libraries at the beginning of the build. This paper presents a novel build system, Molly, which lazily retrieves parts of libraries (i.e., files) that are needed during the execution of a build target. For example, the compilation target needs only public interfaces of classes within the libraries and the test target needs only implementation of the classes that are being invoked by the tests. Additionally, Molly generates a transfer script that retrieves parts of libraries based on prior builds. Molly's design requires that we ignore the boundaries set by the library developers and look at the files within the libraries. We implemented Molly for Java and evaluated it on 17 popular open-source projects. We show that test targets (on average) depend on only 9.97% of files in libraries. A variant of Molly speeds up retrieval by 44.28%. Furthermore, the scripts generated by Molly retrieve dependencies, on average, 93.81% faster than the Maven build system.
Developers often wonder how to implement a certain functionality (e.g., how to parse XML files) using APIs. Obtaining an API usage sequence based on an API-related natural language query is very helpful in this regard. Given a query, existing approaches utilize information retrieval models to search for matching API sequences. These approaches treat queries and APIs as bags-of-words and lack a deep understanding of the semantics of the query. We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query. Instead of a bag-of-words assumption, it learns the sequence of words in a query and the sequence of associated APIs. DeepAPI adapts a neural language model named RNN Encoder-Decoder. It encodes a word sequence (user query) into a fixed-length context vector, and generates an API sequence based on the context vector. We also augment the RNN Encoder-Decoder by considering the importance of individual APIs. We empirically evaluate our approach with more than 7 million annotated code snippets collected from GitHub. The results show that our approach generates largely accurate API sequences and outperforms the related approaches.
We present NonDex, a tool for detecting and debugging wrong assumptions on Java APIs. Some APIs have underdetermined specifications to allow implementations to achieve different goals, e.g., to optimize performance. When clients of such APIs assume stronger-than-specified guarantees, the resulting client code can fail. For example, HashSet’s iteration order is underdetermined, and code assuming some implementation-specific iteration order can fail. NonDex helps to proactively detect and debug such wrong assumptions. NonDex performs detection by randomly exploring different behaviors of underdetermined APIs during test execution. When a test fails during exploration, NonDex searches for the invocation instance of the API that caused the failure. NonDex is open source, well-integrated with Maven, and also runs from the command line. During our experiments with the NonDex Maven plugin, we detected 21 new bugs in eight Java projects from GitHub, and, using the debugging feature of NonDex, we identified the underlying wrong assumptions for these 21 new bugs and 54 previously detected bugs. We opened 13 pull requests; developers already accepted 12, and one project changed the continuous-integration configuration to run NonDex on every push. The demo video is at: https://youtu.be/h3a9ONkC59c
Named Data Networking (NDN), a clean-slate data oriented Internet architecture targeting on replacing IP, brings many potential benefits for content distribution. Real deployment of NDN is crucial to verify this new architecture and promote academic research, but work in this field is at an early stage. Due to the fundamental design paradigm difference between NDN and IP, Deploying NDN as IP overlay causes high overhead and inefficient transmission, typically in streaming applications. Aiming at achieving efficient NDN streaming distribution, this paper proposes a transitional architecture of NDN/IP hybrid network dubbed Centaur, which embodies both NDN's smartness, scalability and IP's transmission efficiency and deployment feasibility. In Centaur, the upper NDN module acts as the smart head while the lower IP module functions as the powerful feet. The head is intelligent in content retrieval and self-control, while the IP feet are able to transport large amount of media data faster than that if NDN directly overlaying on IP. To evaluate the performance of our proposal, we implement a real streaming prototype in ndnSIM and compare it with both NDN-Hippo and P2P under various experiment scenarios. The result shows that Centaur can achieve better load balance with lower overhead, which is close to the performance that ideal NDN can achieve. All of these validate that our proposal is a promising choice for the incremental and compatible deployment of NDN.
In this paper we analyse possibilities of application of post-quantum code based signature schemes for message authentication purposes. An error-correcting code based digital signature algorithm is presented. There also shown results of computer simulation for this algorithm in case of Reed-Solomon codes and the estimated efficiency of its software implementation. We consider perspectives of error-correcting codes for message authentication and outline further research directions.
Early Access DOI: 10.1109/TCNS.2016.2619900