Biblio
Optical Coherence Tomography (OCT) has shown a great potential as a complementary imaging tool in the diagnosis of skin diseases. Speckle noise is the most prominent artifact present in OCT images and could limit the interpretation and detection capabilities. In this work we evaluate various denoising filters with high edge-preserving potential for the reduction of speckle noise in 256 dermatological OCT B-scans. Our results show that the Enhanced Sigma Filter and the Block Matching 3-D (BM3D) as 2D denoising filters and the Wavelet Multiframe algorithm considering adjacent B-scans achieved the best results in terms of the enhancement quality metrics used. Our results suggest that a combination of 2D filtering followed by a wavelet based compounding algorithm may significantly reduce speckle, increasing signal-to-noise and contrast-to-noise ratios, without the need of extra acquisitions of the same frame.
Segmentation of land and water regions is necessary in many applications involving analysis of remote sensing imagery. Not only is manual segmentation of these regions prone to considerable subjective variability, but the large volume of imagery collected by modern platforms makes manual segmentation extremely tedious to perform, particularly in applications that require frequent re-measurement. This paper examines a robust, semi-automated approach that utilizes simple and efficient machine learning algorithms to perform supervised classification of multi-spectral image data into land and water regions. By combining the four wavelength bands widely available in imaging platforms such as IKONOS, QuickBird, and GeoEye-1 with basic texture metrics, high quality segmentation can be achieved. An efficient workflow was created by constructing a Graphical User Interface (GUI) to these machine learning algorithms.
The limited battery lifetime and rapidly increasing functionality of portable multimedia devices demand energy-efficient designs. The filters employed mainly in these devices are based on Gaussian smoothing, which is slow and, severely affects the performance. In this paper, we propose a novel energy-efficient approximate 2D Gaussian smoothing filter (2D-GSF) architecture by exploiting "nearest pixel approximation" and rounding-off Gaussian kernel coefficients. The proposed architecture significantly improves Speed-Power-Area-Accuracy (SPAA) metrics in designing energy-efficient filters. The efficacy of the proposed approximate 2D-GSF is demonstrated on real application such as edge detection. The simulation results show 72%, 79% and 76% reduction in area, power and delay, respectively with acceptable 0.4dB loss in PSNR as compared to the well-known approximate 2D-GSF.
Next generation cellular networks will provide users better experiences by densely deploying smaller cells, which results in more complicated interferences environment. In order to coordinate interference, power control for uplink is particularly challenging due to random locations of uplink transmitter and dense deployment. In this paper, we address the uplink fractional power control (FPC) optimization problem from network optimization perspective. The relations between FPC parameters and network KPIs (Key Performance Indicators) are investigated. Rather than considering any single KPI in conventional approaches, multi-KPI optimization problem is formulated and solved. By relaxing the discrete optimization problem to a continuous one, the gradients of multiple KPIs with respect to FPC parameters are derived. The gradient enables efficiently searching for optimized FPC parameters which is particularly desirable for dense deployment of large number of cells. Simulation results show that the proposed scheme greatly outperforms the traditional one, in terms of network mean load, call drop & block ratio, and convergence speed.
We consider a class of robust optimization problems that we call “robust-to-dynamics optimization” (RDO). The input to an RDO problem is twofold: (i) a mathematical program (e.g., an LP, SDP, IP, etc.), and (ii) a dynamical system (e.g., a linear, nonlinear, discrete, or continuous dynamics). The objective is to maximize over the set of initial conditions that forever remain feasible under the dynamics. The focus of this paper is on the case where the optimization problem is a linear program and the dynamics are linear. We establish some structural properties of the feasible set and prove that if the linear system is asymptotically stable, then the RDO problem can be solved in polynomial time. We also outline a semidefinite programming based algorithm for providing upper bounds on robust-to-dynamics linear programs.
In this paper, we focus on energy management of distributed generators (DGs) and energy storage system (ESS) in microgrids (MG) considering uncertainties in renewable energy and load demand. The MG energy management problem is formulated as a two-stage stochastic programming model based on optimization principle. Then, the optimization model is decomposed into a mixed integer quadratic programming problem by using discrete stochastic scenarios to approximate the continuous random variables. A Scenarios generation approach based on time-homogeneous Markov chain model is proposed to generate simulated time-series of renewable energy generation and load demand. Finally, the proposed stochastic programming model is tested in a typical LV network and solved by Matlab optimization toolbox. The simulation results show that the proposed stochastic programming model has a better performance to obtain robust scheduling solutions and lower the operating cost compared to the deterministic optimization modeling methods.
Because of poor performance of heuristic algorithms on virtual machine placement problem in cloud environments, a multi-objective constraint optimization model of virtual machine placement is presented, which taking energy consumption and resource wastage as the objective. We solve the model based on the proposed discrete firefly algorithm. It takes firefly's location as the placement result, brightness as the objective value. Its movement strategy makes darker fireflies move to brighter fireflies in solution space. The continuous position after movement is discretized by the proposed discrete strategy. In order to speed up the search for solution, the local search mechanism for the optimal solution is introduced. The experimental results in OpenStack cloud platform show that the proposed algorithm makes less energy consumption and resource wastage compared with other algorithms.
Poor data quality has become a persistent challenge for organizations as data continues to grow in complexity and size. Existing data cleaning solutions focus on identifying repairs to the data to minimize either a cost function or the number of updates. These techniques, however, fail to consider underlying data privacy requirements that exist in many real data sets containing sensitive and personal information. In this demonstration, we present PARC, a Privacy-AwaRe data Cleaning system that corrects data inconsistencies w.r.t. a set of FDs, and limits the disclosure of sensitive values during the cleaning process. The system core contains modules that evaluate three key metrics during the repair search, and solves a multi-objective optimization problem to identify repairs that balance the privacy vs. utility tradeoff. This demonstration will enable users to understand: (1) the characteristics of a privacy-preserving data repair; (2) how to customize data cleaning and data privacy requirements using two real datasets; and (3) the distinctions among the repair recommendations via visualization summaries.
Recent advances in differential privacy make it possible to guarantee user privacy while preserving the main characteristics of the data. However, most differential privacy mechanisms assume that the underlying dataset is clean. This paper explores the link between data cleaning and differential privacy in a framework we call PrivateClean. PrivateClean includes a technique for creating private datasets of numerical and discrete-valued attributes, a formalism for privacy-preserving data cleaning, and techniques for answering sum, count, and avg queries after cleaning. We show: (1) how the degree of privacy affects subsequent aggregate query accuracy, (2) how privacy potentially amplifies certain types of errors in a dataset, and (3) how this analysis can be used to tune the degree of privacy. The key insight is to maintain a bipartite graph relating dirty values to clean values and use this graph to estimate biases due to the interaction between cleaning and privacy. We validate these results on four datasets with a variety of well-studied cleaning techniques including using functional dependencies, outlier filtering, and resolving inconsistent attributes.
Today's cellular core, which connects the radio access network to the Internet, relies on fixed hardware appliances placed at a few dedicated locations and uses relatively static routing policies. As such, today's core design has key limitations—it induces inefficient provisioning tradeoffs and is poorly equipped to handle overload, failure scenarios, and diverse application requirements. To address these limitations, ongoing efforts envision "clean slate" solutions that depart from cellular standards and routing protocols; e.g., via programmable switches at base stations and per-flow SDN-like orchestration. The driving question of this work is to ask if a clean-slate redesign is necessary and if not, how can we design a flexible cellular core that is minimally disruptive. We propose KLEIN, a design that stays within the confines of current cellular standards and addresses the above limitations by combining network functions virtualization with smart resource management. We address key challenges w.r.t. scalability and responsiveness in realizing KLEIN via backwards-compatible orchestration mechanisms. Our evaluations through data-driven simulations and real prototype experiments using OpenAirInterface show that KLEIN can scale to billions of devices and is close to optimal for wide variety of traffic and deployment parameters.
Cleaning spreadsheet data types is a common problem faced by millions of spreadsheet users. Data types such as date, time, name, and units are ubiquitous in spreadsheets, and cleaning transformations on these data types involve parsing and pretty printing their string representations. This presents many challenges to users because cleaning such data requires some background knowledge about the data itself and moreover this data is typically non-uniform, unstructured, and ambiguous. Spreadsheet systems and Programming Languages provide some UI-based and programmatic solutions for this problem but they are either insufficient for the user's needs or are beyond their expertise. In this paper, we present a programming by example methodology of cleaning data types that learns the desired transformation from a few input-output examples. We propose a domain specific language with probabilistic semantics that is parameterized with declarative data type definitions. The probabilistic semantics is based on three key aspects: (i) approximate predicate matching, (ii) joint learning of data type interpretation, and (iii) weighted branches. This probabilistic semantics enables the language to handle non-uniform, unstructured, and ambiguous data. We then present a synthesis algorithm that learns the desired program in this language from a set of input-output examples. We have implemented our algorithm as an Excel add-in and present its successful evaluation on 55 benchmark problems obtained from online help forums and Excel product team.
Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, that stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.
Databases can be corrupted with various errors such as missing, incorrect, or inconsistent values. Increasingly, modern data analysis pipelines involve Machine Learning, and the effects of dirty data can be difficult to debug.Dirty data is often sparse, and naive sampling solutions are not suited for high-dimensional models. We propose ActiveClean, a progressive framework for training Machine Learning models with data cleaning. Our framework updates a model iteratively as the analyst cleans small batches of data, and includes numerous optimizations such as importance weighting and dirty data detection. We designed a visual interface to wrap around this framework and demonstrate ActiveClean for a video classification problem and a topic modeling problem.
Flooding attacks are well-known security threats that can lead to a denial of service (DoS) in computer networks. These attacks consist of an excessive traffic generation, by which an attacker aim to disrupt or interrupt some services in the network. The impact of flooding attacks is not just about some nodes, it can be also the whole network. Many routing protocols are vulnerable to these attacks, especially those using reactive mechanism of route discovery, like AODV. In this paper, we propose a statistical approach to defense against RREQ flooding attacks in MANETs. Our detection mechanism can be applied on AODV-based ad hoc networks. Simulation results prove that these attacks can be detected with a low rate of false alerts.
Authorities like the Federal Financial Institutions Examination Council in the US and the European Central Bank in Europe have stepped up their expected minimum security requirements for financial institutions, including the requirements for risk analysis. In a previous article, we introduced a visual tool and a systematic way to estimate the probability of a successful incident response process, which we called an incident response tree (IRT). In this article, we present several scenarios using the IRT which could be used in a risk analysis of online financial services concerning fraud prevention. By minimizing the problem of underreporting, we are able to calculate the conditional probabilities of prevention, detection, and response in the incident response process of a financial institution. We also introduce a quantitative model for estimating expected loss from fraud, and conditional fraud value at risk, which enables a direct comparison of risk among online banking channels in a multi-channel environment.
The ExFAT file system is for large capacity flash memory medium. On the base of analyzing the characteristics of ExFAT file system, this paper presents a model of electronic data recovery forensics and judicial Identification based on ExFAT. The proposed model aims at different destroyed situation of data recovery medium. It uses the file location algorithm, file character code algorithm, document fragment reassembly algorithm for accurate, efficient recovery of electronic data for forensics and judicial Identification. The model implements the digital multi-signature, process monitoring, media mirror and Hash authentication in the data recovery process to improve the acceptability, weight of evidence and Legal effect of the electronic data in the lawsuit. The experimental results show that the model has good work efficiency based on accuracy.
Digital Forensics is an area of Forensics Science that uses the application of scientific method toward crime investigation. The thwarting of forensic evidence is known as anti-forensics, the aim of which is ambiguous in the sense that it could be bad or good. The aim of this project is to simulate digital crimes scenario and carry out forensic and anti-forensic analysis to enhance security. This project uses several forensics and anti-forensic tools and techniques to carry out this work. The data analyzed were gotten from result of the simulation. The results reveal that although it might be difficult to investigate digital crime but with the help of sophisticated forensic tools/anti-forensics tools it can be accomplished.
The rate at which cyber-attacks are increasing globally portrays a terrifying picture upfront. The main dynamics of such attacks could be studied in terms of the actions of attackers and defenders in a cyber-security game. However currently little research has taken place to study such interactions. In this paper we use behavioral game theory and try to investigate the role of certain actions taken by attackers and defenders in a simulated cyber-attack scenario of defacing a website. We choose a Reinforcement Learning (RL) model to represent a simulated attacker and a defender in a 2×4 cyber-security game where each of the 2 players could take up to 4 actions. A pair of model participants were computationally simulated across 1000 simulations where each pair played at most 30 rounds in the game. The goal of the attacker was to deface the website and the goal of the defender was to prevent the attacker from doing so. Our results show that the actions taken by both the attackers and defenders are a function of attention paid by these roles to their recently obtained outcomes. It was observed that if attacker pays more attention to recent outcomes then he is more likely to perform attack actions. We discuss the implication of our results on the evolution of dynamics between attackers and defenders in cyber-security games.
With the growth of the Internet, web applications are becoming very popular in the user communities. However, the presence of security vulnerabilities in the source code of these applications is raising cyber crime rate rapidly. It is required to detect and mitigate these vulnerabilities before their exploitation in the execution environment. Recently, Open Web Application Security Project (OWASP) and Common Vulnerabilities and Exposures (CWE) reported Cross-Site Scripting (XSS) as one of the most serious vulnerabilities in the web applications. Though many vulnerability detection approaches have been proposed in the past, existing detection approaches have the limitations in terms of false positive and false negative results. This paper proposes a context-sensitive approach based on static taint analysis and pattern matching techniques to detect and mitigate the XSS vulnerabilities in the source code of web applications. The proposed approach has been implemented in a prototype tool and evaluated on a public data set of 9408 samples. Experimental results show that proposed approach based tool outperforms over existing popular open source tools in the detection of XSS vulnerabilities.
Governments needs reliable data on crime in order to both devise adequate policies, and allocate the correct revenues so that the measures are cost-effective, i.e., The money spent in prevention, detection, and handling of security incidents is balanced with a decrease in losses from offences. The analysis of the actual scenario of government actions in cyber security shows that the availability of multiple contrasting figures on the impact of cyber-attacks is holding back the adoption of policies for cyber space as their cost-effectiveness cannot be clearly assessed. The most relevant literature on the topic is reviewed to highlight the research gaps and to determine the related future research issues that need addressing to provide a solid ground for future legislative and regulatory actions at national and international levels.
Mathematical formulae are essential in science, but face challenges of ambiguity, due to the use of a small number of identifiers to represent an immense number of concepts. Corresponding to word sense disambiguation in Natural Language Processing, we disambiguate mathematical identifiers. By regarding formulae and natural text as one monolithic information source, we are able to extract the semantics of identifiers in a process we term Mathematical Language Processing (MLP). As scientific communities tend to establish standard (identifier) notations, we use the document domain to infer the actual meaning of an identifier. Therefore, we adapt the software development concept of namespaces to mathematical notation. Thus, we learn namespace definitions by clustering the MLP results and mapping those clusters to subject classification schemata. In addition, this gives fundamental insights into the usage of mathematical notations in science, technology, engineering and mathematics. Our gold standard based evaluation shows that MLP extracts relevant identifier-definitions. Moreover, we discover that identifier namespaces improve the performance of automated identifier-definition extraction, and elevate it to a level that cannot be achieved within the document context alone.
In the modern-day development, projects use Continuous Integration Services (CISs) to execute the build for every change in the source code. To ensure that the project remains correct and deployable, a CIS performs a clean build each time. In a clean environment, a build system needs to retrieve the project's dependencies (e.g., guava.jar). The retrieval, however, can be costly due to dependency bloat: despite a project using only a few files from each library, the existing build systems still eagerly retrieve all the libraries at the beginning of the build. This paper presents a novel build system, Molly, which lazily retrieves parts of libraries (i.e., files) that are needed during the execution of a build target. For example, the compilation target needs only public interfaces of classes within the libraries and the test target needs only implementation of the classes that are being invoked by the tests. Additionally, Molly generates a transfer script that retrieves parts of libraries based on prior builds. Molly's design requires that we ignore the boundaries set by the library developers and look at the files within the libraries. We implemented Molly for Java and evaluated it on 17 popular open-source projects. We show that test targets (on average) depend on only 9.97% of files in libraries. A variant of Molly speeds up retrieval by 44.28%. Furthermore, the scripts generated by Molly retrieve dependencies, on average, 93.81% faster than the Maven build system.
In this paper, we focus on the definition of estimators to predict method calls in Android apps. Estimation models are based on information from requirements specification documents (e.g., number of actors, number of use cases, and number of classes in the conceptual model). We have used a dataset containing information on 23 Android apps. After performing data-cleaning, we applied linear regression to build estimation models on 21 data points. Results suggest that measures gathered from requirements specification documents can be considered good predictors to estimate the number of internal calls (i.e., methods invoking other methods present in the app) and external calls (i.e., invocations to API) as well as their sum.
This paper addresses the problem of estimating the quality of an image as it would be perceived by a human. A well accepted approach to assess perceptual quality of an image is to quantify its loss of structural information. We propose a blind image quality assessment method that aims at quantifying structural information loss in a given (possibly distorted) image by comparing its structures with those extracted from a database of clean images. We first construct a subspace from the clean natural images using (i) principal component analysis (PCA), and (ii) overcomplete dictionary learning with sparsity constraint. While PCA provides mathematical convenience, an overcomplete dictionary is known to capture the perceptually important structures resembling the simple cells in the primary visual cortex. The subspace learned from the clean images is called the source subspace. Similarly, a subspace, called the target subspace, is learned from the distorted image. In order to quantify the structural information loss, we use a subspace alignment technique which transforms the target subspace into the source by optimizing over a transformation matrix. This transformation matrix is subsequently used to measure the global and local (patch-based) quality score of the distorted image. The quality scores obtained by the proposed method are shown to correlate well with the subjective scores obtained from human annotators. Our method achieves competitive results when evaluated on three benchmark databases.
Mutex locks have traditionally been the most common mechanism for protecting shared data structures in parallel programs. However, the robustness of such locks against process failures has not been studied thoroughly. Most (user-level) mutex algorithms are designed around the assumption that processes are reliable, meaning that a process may not fail while executing the lock acquisition and release code, or while inside the critical section. If such a failure does occur, then the liveness properties of a conventional mutex lock may cease to hold until the application or operating system intervenes by cleaning up the internal structure of the lock. For example, a process that is attempting to acquire an otherwise starvation-free mutex may be blocked forever waiting for a failed process to release the critical section. Adding to the difficulty, if the failed process recovers and attempts to acquire the same mutex again without appropriate cleanup, then the mutex may become corrupted to the point where it loses safety, notably the mutual exclusion property. We address this challenge by formalizing the problem of recoverable mutual exclusion, and proposing several solutions that vary both in their assumptions regarding hardware support for synchronization, and in their time complexity. Compared to known solutions, our algorithms are more robust as they do not restrict where or when a process may crash, and provide stricter guarantees in terms of time complexity, which we define in terms of remote memory references.