Biblio
In this research paper, we present a function-based methodology to evaluate the resilience of gas pipeline systems under two different cyber-physical attack scenarios. The first attack scenario is a pressure integrity attack on a natural gas high-pressure transmission pipeline. Through simulations, we analyze cyber attacks that propagate from the cyber domain into the physical gas pipeline domain, the time within which the SCADA system should respond to such attacks, and finally an attack that prevents the system from responding. We combine the results of simulating a wireless mesh network of remote terminal units with those of a gas pipeline simulation to measure the shortest Time to Criticality (TTC): the time for an event to reach the failure state. The second attack scenario describes how the failure of a cyber node controlling power grid functionality propagates from the cyber system to the power grid and then to the gas pipeline system. We formulate this problem using a graph-theoretic approach and quantify the resilience of the networks by the percentage of connected nodes and the length of the shortest path between them. The results show that parameters such as the TTC, the power distribution capacity of the power grid nodes, and the percentage of each type of cyber node compromised regulate the efficiency and resilience of the power and gas networks. The analysis of such attack scenarios helps gas pipeline system administrators design attack remediation algorithms and improve the response of the system to an attack.
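The graph-theoretic resilience metrics mentioned above can be made concrete with a minimal sketch, assuming a toy three-layer topology (the real cyber/power/gas graphs and the failure model are not given in the abstract): after removing a set of failed nodes, compute the percentage of nodes still reachable from a control node and the average shortest-path length to them.

# Sketch: resilience of an interdependent network after node failures (toy topology).
import networkx as nx

def resilience_metrics(G, failed_nodes, source):
    """Return (% of nodes still reachable from `source`, average shortest-path length)."""
    H = G.copy()
    H.remove_nodes_from(failed_nodes)
    if source not in H:
        return 0.0, float("inf")
    lengths = nx.single_source_shortest_path_length(H, source)
    reachable = len(lengths) - 1                      # exclude the source itself
    pct_connected = 100.0 * reachable / (G.number_of_nodes() - 1)
    avg_path = sum(lengths.values()) / reachable if reachable else float("inf")
    return pct_connected, avg_path

# Hypothetical layers: cyber node c1 controls power node p1, which feeds gas nodes g1/g2.
G = nx.Graph([("c1", "p1"), ("p1", "g1"), ("p1", "g2"), ("c2", "p2"), ("p2", "g2")])
print(resilience_metrics(G, failed_nodes=["p1"], source="c1"))   # cascading disconnection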
Multipath TCP (MP-TCP) has the potential to greatly improve application performance by using multiple paths transparently. We propose a fluid model for a large class of MP-TCP algorithms and identify design criteria that guarantee the existence, uniqueness, and stability of system equilibrium. We clarify how algorithm parameters impact TCP-friendliness, responsiveness, and window oscillation and demonstrate an inevitable tradeoff among these properties. We discuss the implications of these properties on the behavior of existing algorithms and motivate our algorithm Balia (balanced linked adaptation), which generalizes existing algorithms and strikes a good balance among TCP-friendliness, responsiveness, and window oscillation. We have implemented Balia in the Linux kernel. We use our prototype to compare the new algorithm to existing MP-TCP algorithms.
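To make the coupled window dynamics concrete, here is a simplified sketch of the per-ACK increase rule of LIA (RFC 6356), one of the existing MP-TCP algorithms the paper compares against; Balia's own update rule differs, and the window/RTT values below are made up.

# LIA coupled increase (RFC 6356), simplified: per ACK on subflow i the window grows by
#   min(alpha / w_total, 1 / w_i)   segments, with
#   alpha = w_total * max_i(w_i / rtt_i^2) / (sum_i w_i / rtt_i)^2.
def lia_alpha(windows, rtts):
    w_total = sum(windows)
    num = w_total * max(w / r**2 for w, r in zip(windows, rtts))
    den = sum(w / r for w, r in zip(windows, rtts)) ** 2
    return num / den

def lia_increase(i, windows, rtts):
    """Window increment (in segments) for one ACK received on subflow i."""
    alpha = lia_alpha(windows, rtts)
    return min(alpha / sum(windows), 1.0 / windows[i])

# Hypothetical two-path flow: equal windows, unequal RTTs.
print(lia_increase(0, windows=[10.0, 10.0], rtts=[0.05, 0.20]))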
The phenomenon of metal-insulator transition (MIT) in strongly correlated oxides such as NbO2 has been shown in recent experiments to produce oscillatory behavior. In this work, an MIT-based two-terminal device is proposed as a compact oscillation neuron for the parallel read operation from a resistive synaptic array. The weighted sum is represented by the frequency of the oscillation neuron. Compared to the complex CMOS integrate-and-fire neuron with tens of transistors, the oscillation neuron achieves significant area reduction, thereby alleviating the column pitch-matching problem of the peripheral circuitry in resistive memories. Firstly, the impact of MIT device characteristics on the weighted-sum accuracy is investigated when the oscillation neuron is connected to a single resistive synaptic device. Secondly, the array-level performance is explored when the oscillation neurons are connected to the resistive synaptic array. To address the interference of oscillation between columns in simple cross-point arrays, a 2-transistor-1-resistor (2T1R) array architecture is proposed at a negligible increase in array area. Finally, a circuit-level benchmark of the proposed oscillation neuron against the CMOS neuron is performed. At the single-neuron level, the oscillation neuron shows a >12.5X reduction in area. At the 128×128 array level, the oscillation neuron shows a reduction of ~4% in total area, >30% in latency, ~5X in energy, and ~40X in leakage power, demonstrating its advantage for integration into the resistive synaptic array for neuro-inspired computing.
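A first-order sketch of how the oscillation frequency encodes the weighted sum: the neuron node charges through the effective column resistance between the MIT device's two switching thresholds, so a larger weighted-sum current yields a higher frequency. All device values below are hypothetical, not taken from the paper, and the metallic-state discharge is assumed fast.

# Relaxation-oscillator estimate for an MIT oscillation neuron (hypothetical parameters).
import math

def oscillation_frequency(R_eff, C_mem=1e-12, V_dd=1.0, V_L=0.3, V_H=0.7):
    # RC charging time from the metal->insulator threshold V_L up to the insulator->metal threshold V_H
    t_charge = R_eff * C_mem * math.log((V_dd - V_L) / (V_dd - V_H))
    return 1.0 / t_charge          # discharge through the metallic MIT state assumed negligible

# A column of parallel synaptic conductances: the weighted sum sets the effective resistance.
conductances = [1 / 200e3, 1 / 50e3, 1 / 100e3]      # three activated synapses, in siemens
R_eff = 1.0 / sum(conductances)
print(f"{oscillation_frequency(R_eff) / 1e6:.1f} MHz")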
We introduce a simplified island model with behavior similar to the λ (1+1) islands optimizing the Maze fitness function, and investigate the effects of the migration topology on the ability of the simplified island model to track the optimum of a dynamic fitness function. More specifically, we prove that there exist choices of model parameters for which using a unidirectional ring as the migration topology allows the model to track the oscillating optimum through n Maze-like phases with high probability, while using a complete graph as the migration topology results in the island model losing track of the optimum with overwhelming probability. Additionally, we prove that if migration occurs only rarely, denser migration topologies may be advantageous. This serves to illustrate that while a less-dense migration topology may be useful when optimizing dynamic functions with oscillating behavior, and requires less problem-specific knowledge to determine when migration may be allowed to occur, care must be taken to ensure that a sufficient amount of migration occurs during the optimization process.
The fabrication process introduces inherent variability to the attributes of transistors (in particular length, width, and oxide thickness). As a result, every chip is physically unique. This physical uniqueness of microelectronic components can be used for multiple security applications. Physically Unclonable Functions (PUFs) are built to extract the physical uniqueness of microelectronic components and make it usable for secure applications. However, the microelectronic components used by PUF designs suffer from external, environmental variations that impact the PUF behavior. Variations of temperature gradients during manufacturing can bias the PUF responses. Variations of temperature or thermal noise during PUF operation change the behavior of the circuit and can introduce errors in PUF responses. Detailed knowledge of the behavior of PUFs operating under various environmental factors is needed to reliably extract and demonstrate the uniqueness of the chips. In this work, we present a detailed and exhaustive analysis of the behavior of two PUF designs, a ring oscillator PUF and a timing path violation PUF. We have implemented both PUFs on FPGAs fabricated by Xilinx and analyzed their behavior while varying temperature and supply voltage. Our experiments quantify the robustness of each design, demonstrate their sensitivity to temperature, and show the impact that supply voltage has on the uniqueness of the analyzed PUFs.
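A minimal sketch of how a ring oscillator PUF derives response bits, assuming counter readings measured on the same FPGA; the counter values and pairing below are made up, and real designs add pairing strategies and error correction against the temperature and voltage effects studied in the paper.

# Ring-oscillator PUF: response bits from pairwise comparisons of RO frequency counts.
def ro_puf_response(counts, pairs):
    """counts: measured oscillation counts per RO; pairs: which ROs to compare."""
    return [1 if counts[a] > counts[b] else 0 for a, b in pairs]

counts = {0: 10321, 1: 10298, 2: 10410, 3: 10375}    # hypothetical counter readings
pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]
print(ro_puf_response(counts, pairs))                 # [1, 1, 0, 0]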
The Convolutional Neural Network (CNN) is a powerful technique widely used in computer vision, but it also demands much more computation and memory than traditional solutions. The emerging metal-oxide resistive random-access memory (RRAM) and RRAM crossbar have shown great potential for neuromorphic applications with high energy efficiency. However, the interfaces between analog RRAM crossbars and digital peripheral functions, namely Analog-to-Digital Converters (ADCs) and Digital-to-Analog Converters (DACs), consume most of the area and energy of an RRAM-based CNN design due to the large amount of intermediate data in a CNN. In this paper, we propose an energy-efficient structure for RRAM-based CNN. Based on an analysis of the data distribution, a quantization method is proposed to transfer the intermediate data into 1 bit and eliminate the DACs. An energy-efficient structure using input data as selection signals is proposed to reduce the ADC cost for merging results of multiple crossbars. The experimental results show that the proposed method and structure can save 80% of the area and more than 95% of the energy while maintaining the same or comparable classification accuracy of the CNN on MNIST.
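The idea of quantizing intermediate data to 1 bit so that DACs can be dropped is easy to sketch: each activation is compared against a per-layer threshold derived from the observed data distribution. Using the distribution mean as the threshold is an assumption for illustration, not the paper's exact rule.

# 1-bit quantization of intermediate CNN data (toy version of the DAC-elimination idea).
import numpy as np

def binarize_activations(x, threshold=None):
    t = np.mean(x) if threshold is None else threshold
    return (x > t).astype(np.uint8)            # 1-bit values directly drive crossbar word lines

acts = np.random.rand(4, 8).astype(np.float32) # hypothetical intermediate feature map
print(binarize_activations(acts))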
The current-mode Analog-to-Digital Converter (ADC) has drawn much attention due to its high operating speed and its immunity to power and ground noise. However, 2^n − 1 comparators are required in a traditional n-bit current-mode ADC design, leading to inevitably high power consumption and large chip area. In this work, we propose a low-power and compact current-mode Multi-Threshold Comparator (MTC) based on the giant Spin Hall Effect (SHE). The two threshold currents of the proposed SHE-MTC are 200μA and 250μA, respectively, with 1ns switching time. The proposed current-mode hybrid spin-CMOS flash ADC based on the SHE-MTC reduces the number of comparators almost by half (to 2^(n−1)), correspondingly reducing the required current mirror branches, total power consumption, and chip area. Moreover, due to the non-volatility of the SHE-MTC, the front-end analog circuits can be switched off when not required, further increasing power efficiency. The device dynamics of the SHE-MTC are simulated using a numerical device model based on the Landau-Lifshitz-Gilbert (LLG) equation with Spin-Transfer Torque (STT) and SHE terms. Device-circuit co-simulation in SPICE (45nm CMOS technology) has shown that the average power dissipation of the proposed ADC is 1.9mW, operating at 500MS/s with a 1.2 V power supply. The INL and DNL are within 0.23LSB and 0.32LSB, respectively.
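The comparator-count arithmetic behind the claimed roughly 2x reduction, worked out briefly: a conventional n-bit flash ADC needs 2^n − 1 single-threshold comparators, while a two-threshold SHE-MTC covers two decision levels, so about 2^(n−1) devices suffice.

# Comparator counts for an n-bit flash ADC: conventional vs. two-threshold SHE-MTC.
for n in range(3, 7):
    conventional = 2**n - 1
    she_mtc = 2**(n - 1)
    print(f"n={n}: {conventional} comparators -> {she_mtc} SHE-MTCs")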
Information retrieval methods, especially for multimedia data, have evolved towards the integration of multiple sources of evidence in the analysis of the relevance of items for a given user search task. In this context, to attenuate the semantic gap between low-level features extracted from the content of digital objects and high-level semantic concepts (objects, categories, etc.), and to make systems adaptive to different user needs, interactive models have brought the user closer to the retrieval loop, allowing user-system interaction mainly through implicit or explicit relevance feedback. Analogously, diversity promotion has emerged as an alternative for tackling ambiguous or underspecified queries. Additionally, several works have addressed the issue of minimizing the required user effort in providing relevance assessments while keeping an acceptable overall effectiveness. This thesis discusses, proposes, and experimentally analyzes multimodal and interactive diversity-oriented information retrieval methods. This work comprehensively covers the interactive information retrieval literature and also discusses recent advances, the great research challenges, and promising research opportunities. We have proposed and evaluated two relevance-diversity trade-off enhancement work-flows, which integrate multiple sources of information from images, such as visual features, textual metadata, geographic information, and user credibility descriptors. In turn, as an integration of interactive retrieval and diversity promotion techniques, for maximizing the coverage of multiple query interpretations/aspects and speeding up the information transfer between the user and the system, we have proposed and evaluated a multimodal online learning-to-rank method trained with relevance feedback over diversified results. Our experimental analysis shows that the joint usage of multiple information sources positively impacted the relevance-diversity balancing algorithms. Our results also suggest that the integration of multimodal-relevance-based filtering and reranking is effective in improving result relevance and also boosts diversity promotion methods. Furthermore, with a thorough experimental analysis we have investigated several research questions related to the possibility of improving result diversity and keeping or even improving relevance in interactive search sessions. Moreover, we analyze how much the diversification effort affects overall search session results and how different diversification approaches behave for the different data modalities. By analyzing the overall and per-feedback-iteration effectiveness, we show that introducing diversity may harm initial results, whereas it significantly enhances the overall session effectiveness, considering not only relevance and diversity but also how early the user is exposed to the same amount of relevant items and diversity.
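The relevance-diversity trade-off can be illustrated with Maximal Marginal Relevance (MMR), a classic re-ranker shown here only to make the trade-off concrete; the thesis' own work-flows are multimodal and interactive, which this toy single-feature sketch does not capture.

# MMR re-ranking: balance relevance to the query against redundancy with already-picked items.
import numpy as np

def mmr(query_sim, doc_sims, k, lamb=0.7):
    """query_sim: relevance score per document; doc_sims: document-document similarity matrix."""
    selected, candidates = [], list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(d):
            redundancy = max((doc_sims[d][s] for s in selected), default=0.0)
            return lamb * query_sim[d] - (1 - lamb) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

rel = np.array([0.9, 0.85, 0.4, 0.8])
sim = np.array([[1, .95, .1, .2], [.95, 1, .1, .3], [.1, .1, 1, .1], [.2, .3, .1, 1]])
print(mmr(rel, sim, k=3))        # picks relevant but mutually dissimilar items, e.g. [0, 3, 1]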
Users’ online behaviors such as ratings and examination of items are recognized as one of the most valuable sources of information for learning users’ preferences in order to make personalized recommendations. But most previous works focus on modeling only one type of users’ behaviors such as numerical ratings or browsing records, which are referred to as explicit feedback and implicit feedback, respectively. In this article, we study a Semisupervised Collaborative Recommendation (SSCR) problem with labeled feedback (for explicit feedback) and unlabeled feedback (for implicit feedback), in analogy to the well-known Semisupervised Learning (SSL) setting with labeled instances and unlabeled instances. SSCR is associated with two fundamental challenges, that is, heterogeneity of two types of users’ feedback and uncertainty of the unlabeled feedback. As a response, we design a novel Self-Transfer Learning (sTL) algorithm to iteratively identify and integrate likely positive unlabeled feedback, which is inspired by the general forward/backward process in machine learning. The merit of sTL is its ability to learn users’ preferences from heterogeneous behaviors in a joint and selective manner. We conduct extensive empirical studies of sTL and several very competitive baselines on three large datasets. The experimental results show that our sTL is significantly better than the state-of-the-art methods.
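The "iteratively identify and integrate likely-positive unlabeled feedback" idea behind sTL follows the general self-training pattern, sketched below in a scorer-agnostic form; the fit/score functions are placeholders, and the actual sTL model and its selection rule are specific to the paper.

# Generic self-training loop over labeled (explicit) and unlabeled (implicit) feedback.
def self_transfer(labeled_pos, unlabeled, fit, score, rounds=3, top_k=100):
    """fit(examples) -> model; score(model, example) -> confidence that the example is positive."""
    pseudo = []
    for _ in range(rounds):
        model = fit(labeled_pos + pseudo)                       # train on explicit + adopted feedback
        ranked = sorted(unlabeled, key=lambda e: score(model, e), reverse=True)
        adopted, unlabeled = ranked[:top_k], ranked[top_k:]     # adopt the most confident unlabeled items
        pseudo.extend(adopted)
    return fit(labeled_pos + pseudo)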
A power-efficient Delta-Sigma (ΔΣ) analog-to-digital converter (ADC) with an embedded programmable-gain control function for various smart sensor applications is presented. It consists of a programmable-gain switched-capacitor ΔΣ modulator followed by a digital decimation filter for down-sampling. The programmable function is realized with programmable loop-filter coefficients using a capacitor array. The coefficient control preserves the locations of the poles of the noise transfer function, so the stability of the designed closed-loop transfer function is assured. The proposed gain control method helps the ADC optimize its performance with varying input signal magnitude. The gain controllability requires a negligible additional energy-consuming or area-occupying block. The power-efficient programmable-gain ADC (PGADC) is well suited for sensor devices. The gain can be programmed from 0 to 18 dB in 6 dB steps. Measurements show that the PGADC achieves 15.2-bit resolution and 12.4-bit noise-free resolution with 99.9% reliability. The chip operates with a 3.3 V analog supply and a 1.8 V digital supply, while consuming only 97 μA of analog current and 37 μA of digital current. The analog core area is 0.064 mm² in a standard 0.18-μm CMOS process.
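A small worked example of the gain range: 0 to 18 dB in 6 dB steps corresponds to signal-gain factors of 1x, 2x, 4x, and 8x, which a switched-capacitor loop filter would realize with binary-weighted capacitor ratios (the exact capacitor-array mapping is an assumption, not taken from the paper).

# dB gain steps to linear gain factors.
gains_db = [0, 6, 12, 18]
factors = [round(10 ** (g / 20)) for g in gains_db]
print(factors)          # [1, 2, 4, 8]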
Head portraits are popular in traditional painting. Automating portrait painting is challenging, as the human visual system is sensitive to the slightest irregularities in human faces. Applying generic painting techniques often deforms facial structures. On the other hand, existing portrait painting techniques are mainly designed for the graphite style and/or are based on image analogies: an example painting as well as its original unpainted version are required. This limits their domain of applicability. We present a new technique for transferring the painting from one head portrait onto another. Unlike previous work, our technique requires only the example painting and is not restricted to a specific style. We impose novel spatial constraints by locally transferring the color distributions of the example painting. This better captures the painting texture and maintains the integrity of facial structures. We generate a solution through Convolutional Neural Networks, and we present an extension to video, where motion is exploited to reduce temporal inconsistencies and the shower-door effect. Our approach transfers the painting style while maintaining the identity of the input photograph. In addition, it significantly reduces facial deformations over the state of the art.
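The "locally transferring the color distributions" constraint can be sketched, under simplifying assumptions, as per-region mean/standard-deviation matching; real regions would be face-aligned patches, while the fixed tiles and random stand-in images below are only for illustration.

# Local color-distribution transfer by tile-wise mean/std matching (simplified sketch).
import numpy as np

def match_color_stats(target, example, eps=1e-6):
    """Shift/scale target channels so their mean and std match the example region."""
    t_mu, t_sd = target.mean((0, 1)), target.std((0, 1)) + eps
    e_mu, e_sd = example.mean((0, 1)), example.std((0, 1)) + eps
    return (target - t_mu) / t_sd * e_sd + e_mu

def transfer_by_tiles(photo, painting, tile=32):
    out = photo.astype(np.float32).copy()
    for y in range(0, photo.shape[0], tile):
        for x in range(0, photo.shape[1], tile):
            out[y:y + tile, x:x + tile] = match_color_stats(
                photo[y:y + tile, x:x + tile].astype(np.float32),
                painting[y:y + tile, x:x + tile].astype(np.float32))
    return np.clip(out, 0, 255).astype(np.uint8)

photo = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)      # stand-in images
painting = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(transfer_by_tiles(photo, painting).shape)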
Humans are often able to generalize knowledge learned from a single exemplar. In this paper, we present a novel integration of mental simulation and analogical generalization algorithms into a cognitive robotic architecture that enables a similarly rudimentary generalization capability in robots. Specifically, we show how a robot can generate variations of a given scenario and then use the results of those new scenarios run in a physics simulator to generate generalized action scripts using analogical mappings. The generalized action scripts then allow the robot to perform the originally learned activity in a wider range of scenarios with different types of objects without the need for additional exploration or practice. In a proof-of-concept demonstration we show how the robot can generalize from a previously learned pick-and-place action performed with a single arm on an object with a handle to a pick-and-place action of a cylindrical object with no handle with two arms.
Although computing students may enjoy when their instructors teach using analogies, it is unknown to what extent these analogies are useful for their learning. This study examines the value of analogies when used to introduce three introductory computing topics. The value of these analogies may be evident during the teaching process itself (short term), in subsequent exams (long term), or in students' ability to apply their understanding to related non-technical areas (transfer). Comparing results between an experimental group (analogy) and control group (no analogy), we find potential value for analogies in short term learning. However, no solid evidence was found to support analogies as valuable for students in the long term or for knowledge transfer. Specific demographic groups were examined and promising preliminary findings are presented.
Modern information extraction pipelines are typically constructed by (1) loading textual data from a database into a special-purpose application, (2) applying a myriad of text-analytics functions to the text, which produce a structured relational table, and (3) storing this table in a database. Obviously, this approach can lead to laborious development processes, complex and tangled programs, and inefficient control flows. Towards solving these deficiencies, we embark on an effort to lay the foundations of a new generation of text-centric database management systems. Concretely, we extend the relational model by incorporating into it the theory of document spanners which provides the means and methods for the model to engage the Information Extraction (IE) tasks. This extended model, called Spannerlog, provides a novel declarative method for defining and manipulating textual data, which makes possible the automation of the typical work method described above. In addition to formally defining Spannerlog and illustrating its usefulness for IE tasks, we also report on initial results concerning its expressive power.
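The document-spanner view can be illustrated with a tiny regex-based extractor that maps a document to a relation whose attributes are spans (start/end offsets); this is only an illustration of span relations, not Spannerlog's actual syntax or semantics.

# A regex "spanner": document in, relation of (span, matched text) tuples out.
import re

def spanner(pattern, doc):
    return [((m.start(), m.end()), m.group()) for m in re.finditer(pattern, doc)]

doc = "Alice met Bob in Paris. Bob later emailed Carol."
print(spanner(r"\b[A-Z][a-z]+\b", doc))   # e.g. [((0, 5), 'Alice'), ((10, 13), 'Bob'), ...]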
In many domains, a plethora of textual information is available on the web as news reports, blog posts, community portals, etc. Information extraction (IE) is the default technique to turn unstructured text into structured fact databases, but systematically applying IE techniques to web input requires highly complex systems, ranging from focused crawlers and quality assurance methods to cope with the HTML input to long pipelines of natural language processing and IE algorithms. Although a number of tools exist for each of these steps, their seamless, flexible, and scalable combination into a web-scale end-to-end text analytics system is still a true challenge. In this paper, we report our experiences from building such a system for comparing the "web view" on health-related topics with that derived from a controlled scientific corpus, i.e., Medline. The system combines a focused crawler, applying shallow text analysis and classification to maintain focus, with a sophisticated text analytic engine inside the Big Data processing system Stratosphere. We describe a practical approach to seed generation which led us to crawl a corpus of ~1 TB of web pages highly enriched for the biomedical domain. Pages were run through a complex pipeline of best-of-breed tools for a multitude of necessary tasks, such as HTML repair, boilerplate detection, sentence detection, linguistic annotation, parsing, and eventually named entity recognition for several types of entities. Results are compared with those from running the same pipeline (without the web-related tasks) on a corpus of 24 million scientific abstracts and a third corpus made of ~250K scientific full texts. We evaluate the scalability, quality, and robustness of the employed methods and tools. The focus of this paper is to provide a large, real-life use case to inspire future research into robust, easy-to-use, and scalable methods for domain-specific IE at web scale.
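A deliberately naive end-to-end skeleton of the pipeline stages named above (HTML cleanup, sentence detection, entity recognition), shown only so the staging is visible; each stand-in stage below is a toy, nothing like the best-of-breed tools combined in the actual system.

# Toy staged IE pipeline: HTML stripping -> sentence splitting -> dictionary-based NER.
import re

def strip_html(page):                    # crude stand-in for HTML repair + boilerplate removal
    return re.sub(r"<[^>]+>", " ", page)

def split_sentences(text):               # naive sentence detection
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tag_entities(sentence, gazetteer):   # dictionary-based NER stand-in
    return [term for term in gazetteer if term.lower() in sentence.lower()]

gazetteer = {"aspirin", "ibuprofen", "headache"}
page = "<html><body><p>Aspirin is often taken for headache relief. It works.</p></body></html>"
for sent in split_sentences(strip_html(page)):
    print(sent, "->", tag_entities(sent, gazetteer))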
Text mining has emerged as an essential tool for revealing the hidden value in data. It is an increasingly important technique for companies around the world, suitable both for large ongoing analyses and for one-off investigations, since there is a need to track disruptive technologies, explore internal knowledge bases, and review enormous data sets. Most of the information produced in conversation transcripts is in an unstructured format. These data contain ambiguity, redundancy, duplication, typographical errors, and more, which makes their processing and analysis a difficult task. However, several text mining techniques are available to extract keywords from such unstructured conversation transcripts. Keyword extraction is the process of identifying the most significant words in a text, which helps decisions to be made much faster. The main objective of the proposed work is to extract keywords from meeting transcripts using Swarm Intelligence (SI) techniques. Here, the Stochastic Diffusion Search (SDS) algorithm is used for keyword extraction and the Firefly algorithm is used for clustering. These techniques can be applied to an extensive range of optimization problems and produced better results than the existing technique in our experiments.
Many books and papers describe how to do data science. While those texts are useful, it can also be important to reflect on anti-patterns; i.e. common classes of errors seen when large communities of researchers and commercial software engineers use, and misuse data mining tools. This technical briefing will present those errors and show how to avoid them.
The web has been flooded with multimedia sources such as images, videos, animations, and audio, which has in turn led computer vision researchers to focus on extracting content from these sources. Scene text recognition basically involves two major steps, namely text localization and text recognition. This paper provides an end-to-end text recognition approach to extract the characters alone from a complex natural scene. Using Maximally Stable Extremal Regions (MSER), the various objects are localized; using the Canny edge detection method, edges are identified; binary classification is then done using a connected-component method, which segregates text and non-text objects; and finally a stroke analysis method is applied to analyze the style of the characters, leading to character recognition. Experimental results were obtained by testing the approach on the ICDAR2015 dataset, where text could be recognized from most of the scene images with a good precision value.
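A minimal sketch of the localization stages using OpenCV: MSER proposes candidate regions and Canny provides an edge map; the connected-component text/non-text classification and the stroke analysis steps are not reproduced here, and the input path is hypothetical.

# MSER + Canny candidate localization for scene text (sketch).
import cv2

img = cv2.imread("scene.jpg")                        # hypothetical input image path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)           # maximally stable extremal regions
edges = cv2.Canny(gray, 100, 200)                    # edge map used by later filtering stages

for (x, y, w, h) in bboxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imwrite("candidates.jpg", img)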
By reflecting the degree of proximity or remoteness of documents, similarity measures play a key role in text analytics. Traditional measures, e.g. cosine similarity, assume that documents are represented in an orthogonal space formed by words as dimensions. Words are considered independent from each other and document similarity is computed based on lexical overlap. This assumption is also made in the bag-of-concepts representation of documents, where the space is formed by concepts. This paper proposes new semantic similarity measures that do not rely on the orthogonality assumption. By employing Wikipedia as an external resource, we introduce five similarity measures using concept-concept relatedness. Experimental results on real text datasets reveal that eliminating the orthogonality assumption improves the quality of text clustering algorithms.
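Dropping the orthogonality assumption means document similarity can weight every pair of dimensions by their relatedness, as in the soft-cosine formulation below, shown as a generic stand-in (the paper defines five Wikipedia-based measures of its own): sim(d1, d2) = d1^T S d2 / sqrt((d1^T S d1)(d2^T S d2)).

# Soft cosine: similarity in a non-orthogonal concept space with relatedness matrix S.
import numpy as np

def soft_cosine(d1, d2, S):
    num = d1 @ S @ d2
    return num / np.sqrt((d1 @ S @ d1) * (d2 @ S @ d2))

# Toy 3-concept space where concepts 0 and 1 are strongly related.
S = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
d1 = np.array([1.0, 0.0, 0.0])
d2 = np.array([0.0, 1.0, 0.0])
print(soft_cosine(d1, d2, S))    # 0.8, whereas plain cosine of d1 and d2 would be 0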
The identification of relevant information in large text databases is a challenging task. One of the reasons is human beings' limitations in handling large volumes of data. A common solution for scavenging data from texts is the word cloud. A word cloud illustrates word usage in a document by resizing individual words proportionally to how frequently they appear. Even though word clouds are easy to understand, they are not particularly efficient, because they are static. In addition, the presented information lacks context, i.e., words are not explained and may lead to radically erroneous interpretations. To tackle these problems we developed VCloud, a tool that allows the user to interact with word clouds, thereby enabling informative and interactive data exploration. Furthermore, our tool also allows one to compare two data sets presented as word clouds. We evaluated VCloud using real data about the evolution of gastritis research through the years. The papers indexed by PubMed related to this medical context were selected for visualization and data analysis using VCloud. A domain expert explored these visualizations and was able to extract useful information from them. This illustrates how VCloud can be a valuable tool for visual text analytics.
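The basic word-cloud sizing rule, for reference, maps word frequency to font size; the interactive and comparative features of VCloud are of course not captured by this minimal sketch.

# Word-cloud sizing: font size proportional to word frequency.
from collections import Counter

def cloud_sizes(text, min_pt=10, max_pt=48):
    words = [w.lower() for w in text.split() if w.isalpha()]
    freq = Counter(words)
    top = freq.most_common(1)[0][1]
    return {w: min_pt + (max_pt - min_pt) * c / top for w, c in freq.items()}

print(cloud_sizes("gastritis research gastritis helicobacter research gastritis"))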
Analyzing and gaining insights from a large amount of textual conversations can be quite challenging for a user, especially when the discussions become very long. During my doctoral research, I have focused on integrating Information Visualization (InfoVis) with Natural Language Processing (NLP) techniques to better support the user's task of exploring and analyzing conversations. For this purpose, I have designed a visual text analytics system that supports the user exploration, starting from a possibly large set of conversations, then narrowing down to a subset of conversations, and eventually drilling-down to a set of comments of one conversation. While so far our approach is evaluated mainly based on lab studies, in my on-going and future work I plan to evaluate our approach via online longitudinal studies.
In this article, I present the questions that I seek to answer in my PhD research. I posit to analyze natural language text with the help of semantic annotations and mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with useful means to discover the locked knowledge in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: i. identifying important events; ii. semantic search; and iii. event analytics.
Online conversations, such as blogs, provide rich amount of information and opinions about popular queries. Given a query, traditional blog sites return a set of conversations often consisting of thousands of comments with complex thread structure. Since the interfaces of these blog sites do not provide any overview of the data, it becomes very difficult for the user to explore and analyze such a large amount of conversational data. In this paper, we present MultiConVis, a visual text analytics system designed to support the exploration of a collection of online conversations. Our system tightly integrates NLP techniques for topic modeling and sentiment analysis with information visualizations, by considering the unique characteristics of online conversations. The resulting interface supports the user exploration, starting from a possibly large set of conversations, then narrowing down to the subset of conversations, and eventually drilling-down to the set of comments of one conversation. Our evaluations through case studies with domain experts and a formal user study with regular blog readers illustrate the potential benefits of our approach, when compared to a traditional blog reading interface.
It is estimated that 50% of the global population lives in urban areas occupying just 0.4% of the Earth's surface. Understanding urban activity involves monitoring population density and its changes over time in urban environments. Currently, there are limited mechanisms to non-intrusively monitor population density in real time. The pervasive use of cellular phones in urban areas is one such mechanism that provides a unique opportunity to study population density by monitoring mobility patterns in near real time. Cellular carriers such as AT&T harvest such data through their cell towers; however, this data is proprietary and the carriers restrict access due to privacy concerns. In this work, we propose a system that passively senses the population density and infers mobility patterns in an urban area by monitoring power spectral density in cellular frequency bands using periodic beacons from each cellphone, without knowing who the users are or where they are located. A wireless sensor network platform is being developed to perform spectral monitoring along with environmental measurements. Algorithms are developed to generate real-time fine-resolution population estimates.
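The spectral-monitoring step can be sketched as estimating the power spectral density in a cellular uplink band and using total in-band power as a crude proxy for phone activity; the center frequency, sample rate, and synthetic samples below are placeholders for real radio captures, not the paper's processing chain.

# In-band power from a Welch PSD estimate over (stand-in) complex IQ samples.
import numpy as np
from scipy.signal import welch

fs = 10e6                                                            # hypothetical receiver sample rate
samples = np.random.randn(2**16) + 1j * np.random.randn(2**16)       # stand-in IQ capture
f, psd = welch(samples, fs=fs, nperseg=4096, return_onesided=False)
order = np.argsort(f)                                                # sort the two-sided spectrum
f, psd = f[order], psd[order]
in_band = np.abs(f) < 2.5e6                                          # band of interest around the tuned center
print("in-band power:", np.trapz(psd[in_band], f[in_band]))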
Spoofing is a serious threat to the widespread use of Global Navigation Satellite Systems (GNSSs) such as GPS and can be expected to play an important role in the security of many future IoT systems that rely on time, location, or navigation information. In this paper, we focus on the technique of multi-receiver GPS spoofing detection, so far only proposed theoretically. This technique promises to detect malicious spoofing signals by making use of the reported positions of several GPS receivers deployed in a fixed constellation. We scrutinize the assumptions of prior work, in particular the error models, and investigate how these models and their results can be improved due to the correlation of errors at co-located receiver positions. We show that by leveraging spatial noise correlations, the false acceptance rate of the countermeasure can be improved while preserving the sensitivity to attacks. As a result, receivers can be placed significantly closer together than previously expected, which broadens the applicability of the countermeasure. Based on theoretical and practical investigations, we build the first realization of a multi-receiver countermeasure and experimentally evaluate its performance both in authentic and in spoofing scenarios.
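The core test behind multi-receiver spoofing detection is easy to sketch: under a single spoofing transmitter all receivers report nearly the same position, so pairwise distances between reported fixes collapse instead of matching the known fixed constellation geometry. The threshold and noise values below are placeholders; the paper's contribution is precisely a better, correlation-aware error model for setting such thresholds.

# Flag spoofing when reported pairwise distances deviate from the known constellation geometry.
import numpy as np

def spoofing_suspected(reported, constellation, threshold):
    """reported / constellation: dict name -> np.array position in meters."""
    names = sorted(reported)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d_rep = np.linalg.norm(reported[a] - reported[b])
            d_ref = np.linalg.norm(constellation[a] - constellation[b])
            if abs(d_rep - d_ref) > threshold:
                return True
    return False

constellation = {"r1": np.array([0.0, 0.0]), "r2": np.array([5.0, 0.0]), "r3": np.array([0.0, 5.0])}
spoofed = {k: np.array([120.0, 80.0]) + 0.5 * np.random.randn(2) for k in constellation}  # all fixes collapse
print(spoofing_suspected(spoofed, constellation, threshold=2.0))   # True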