Biblio
Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, trie, self-balancing search trees, sparse array, etc.) have been developed as part of efforts to provide an efficient method for locating target data. However, efficient determination of an indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation on metadata, metadata queries, and corresponding data structures. In this study, we perform a systematic study of the metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of the metadata in the dataset. We also study possible metadata queries based on the discovery of the metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on real-world dataset suggests that trie is a suitable data structure when prefix/suffix query is required, otherwise hash table should be used. We conclude our study with a summary of our findings. These findings provide a guideline and offers insights in developing metadata indexing methodologies for scientific applications.
Image style transfer is an increasingly interesting topic in computer vision where the goal is to map images from one style to another. In this paper, we propose a new framework called Combined Layer GAN as a solution of dealing with image style transfer problem. Specifically, the edge-constraint and color-constraint are proposed and explored in the GAN based image translation method to improve the performance. The motivation of the work is that color and edge are fundamental vision factors for an image, while in the traditional deep network based approach, there is a lack of fine control of these factors in the process of translation and the performance is degraded consequently. Our experiments and evaluations show that our novel method with the edge and color constrains is more stable, and significantly improves the performance compared with the traditional methods.
We re-define multimodality and introduce a simple approach to multimodal and arbitrary style transfer. Conventionally, style transfer methods are limited to synthesizing a deterministic output based on a single style, and there has been no work that can generate multiple images of various details, or multimodality, given a single style. In this work, we explore a way to achieve multimodal and arbitrary style transfer by injecting noise to a unimodal method. This novel approach does not require any trainable parameters, and can be readily applied to any unimodal style transfer methods with separate style encoding sub-network in literature. Experimental results show that while being able to transfer an image to multiple domains in various ways, the image quality is highly competitive with contemporary models in style transfer.
To preserve anonymity and obfuscate their identity on online platforms users may morph their text and portray themselves as a different gender or demographic. Similarly, a chatbot may need to customize its communication style to improve engagement with its audience. This manner of changing the style of written text has gained significant attention in recent years. Yet these past research works largely cater to the transfer of single style attributes. The disadvantage of focusing on a single style alone is that this often results in target text where other existing style attributes behave unpredictably or are unfairly dominated by the new style. To counteract this behavior, it would be nice to have a style transfer mechanism that can transfer or control multiple styles simultaneously and fairly. Through such an approach, one could obtain obfuscated or written text incorporated with a desired degree of multiple soft styles such as female-quality, politeness, or formalness. To the best of our knowledge this work is the first that shows and attempt to solve the issues related to multiple style transfer. We also demonstrate that the transfer of multiple styles cannot be achieved by sequentially performing multiple single-style transfers. This is because each single style-transfer step often reverses or dominates over the style incorporated by a previous transfer step. We then propose a neural network architecture for fairly transferring multiple style attributes in a given text. We test our architecture on the Yelp dataset to demonstrate our superior performance as compared to existing one-style transfer steps performed in a sequence.
Re-drawing the image as a certain artistic style is considered to be a complicated task for computer machine. On the contrary, human can easily master the method to compose and describe the style between different images. In the past, many researchers studying on the deep neural networks had found an appropriate representation of the artistic style using perceptual loss and style reconstruction loss. In the previous works, Gatys et al. proposed an artificial system based on convolutional neural networks that creates artistic images of high perceptual quality. Whereas in terms of running speed, it was relatively time-consuming, thus it cannot apply to video style transfer. Recently, a feed-forward CNN approach has shown the potential of fast style transformation, which is an end-to-end system without hundreds of iteration while transferring. We combined the benefits of both approaches, optimized the feed-forward network and defined time loss function to make it possible to implement the style transfer on video in real time. In contrast to the past method, our method runs in real time with higher resolution while creating competitive visually pleasing and temporally consistent experimental results.
The digital low dropout regulators are widely used because it can operate at low supply voltage. In the digital low drop-out regulators, the high sampling frequency circuit has a short setup time, but it will produce overshoot, and then the output can be stabilized; although the low sampling frequency circuit output can be directly stabilized, the setup time is too long. This paper proposes a two sampling frequency circuit model, which aims to include the high and low sampling frequencies in the same circuit. By controlling the sampling frequency of the circuit under different conditions, this allows the circuit to combine the advantages of the circuit operating at different sampling frequencies. This shortens the circuit setup time and the stabilization time at the same time.
Malicious domain names are consistently changing. It is challenging to keep blacklists of malicious domain names up-to-date because of the time lag between its creation and detection. Even if a website is clean itself, it does not necessarily mean that it won't be used as a pivot point to redirect users to malicious destinations. To address this issue, this paper demonstrates how to use linkage analysis and open-source threat intelligence to visualize the relationship of malicious domain names whilst verifying their categories, i.e., drive-by download, unwanted software etc. Featured by a graph-based model that could present the inter-connectivity of malicious domain names in a dynamic fashion, the proposed approach proved to be helpful for revealing the group patterns of different kinds of malicious domain names. When applied to analyze a blacklisted set of URLs in a real enterprise network, it showed better effectiveness than traditional methods and yielded a clearer view of the common patterns in the data.
The rapid growth of Android malware apps poses a great security threat to users thus it is very important and urgent to detect Android malware effectively. What's more, the increasing unknown malware and evasion technique also call for novel detection method. In this paper, we focus on API feature and develop a novel method to detect Android malware. First, we propose a novel selection method for API feature related with the malware class. However, such API also has a legitimate use in benign app thus causing FP problem (misclassify benign as malware). Second, we further explore structure relationships between these APIs and map to a matrix interpreted as the hand-refined API-based feature graph. Third, a CNN-based classifier is developed for the API-based feature graph classification. Evaluations of a real-world dataset containing 3,697 malware apps and 3,312 benign apps demonstrate that selected API feature is effective for Android malware classification, just top 20 APIs can achieve high F1 of 94.3% under Random Forest classifier. When the available API features are few, classification performance including FPR indicator can achieve effective improvement effectively by complementing our further work.
Neural style transfer has drawn broad attention in recent years. However, most existing methods aim to explicitly model the transformation between different styles, and the learned model is thus not generalizable to new styles. We here attempt to separate the representations for styles and contents, and propose a generalized style transfer network consisting of style encoder, content encoder, mixer and decoder. The style encoder and content encoder are used to extract the style and content factors from the style reference images and content reference images, respectively. The mixer employs a bilinear model to integrate the above two factors and finally feeds it into a decoder to generate images with target style and content. To separate the style features and content features, we leverage the conditional dependence of styles and contents given an image. During training, the encoder network learns to extract styles and contents from two sets of reference images in limited size, one with shared style and the other with shared content. This learning framework allows simultaneous style transfer among multiple styles and can be deemed as a special 'multi-task' learning scenario. The encoders are expected to capture the underlying features for different styles and contents which is generalizable to new styles and contents. For validation, we applied the proposed algorithm to the Chinese Typeface transfer problem. Extensive experiment results on character generation have demonstrated the effectiveness and robustness of our method.
In this paper, we present an improved approach to transfer style for videos based on semantic segmentation. We segment foreground objects and background, and then apply different styles respectively. A fully convolutional neural network is used to perform semantic segmentation. We increase the reliability of the segmentation, and use the information of segmentation and the relationship between foreground objects and background to improve segmentation iteratively. We also use segmentation to improve optical flow, and apply different motion estimation methods between foreground objects and background. This improves the motion boundaries of optical flow, and solves the problems of incorrect and discontinuous segmentation caused by occlusion and shape deformation.
Nowadays, Microblog has become an important online social networking platform, and a large number of users share information through Microblog. Many malicious users have released various false news driven by various interests, which seriously affects the availability of Microblog platform. Therefore, the evaluation of Microblog user credibility has become an important research issue. This paper proposes a microblog user credibility evaluation algorithm based on trust propagation. In view of the high consumption and low precision caused by malicious users' attacking algorithms and manual selection of seed sets by establishing false social relationships, this paper proposes two optimization strategies: pruning algorithm based on social activity and similarity and based on The seed node selection algorithm of clustering. The pruning algorithm can trim off the attack edges established by malicious users and normal users. The seed node selection algorithm can efficiently select the highly available seed node set, and finally use the user social relationship graph to perform the two-way propagation trust scoring, so that the low trusted user has a lower trusted score and thus identifies the malicious user. The related experiments verify the effectiveness of the trustworthiness-based user credibility evaluation algorithm in the evaluation of Microblog user credibility.
Malware is pervasive and poses serious threats to normal operation of business processes in cloud. Cloud computing environments typically have hundreds of hosts that are connected to each other, often with high risk trust assumptions and/or protection mechanisms that are not difficult to break. Malware often exploits such weaknesses, as its immediate goal is often to spread itself to as many hosts as possible. Detecting this propagation is often difficult to address because the malware may reside in multiple components across the software or hardware stack. In this scenario, it is usually best to contain the malware to the smallest possible number of hosts, and it's also critical for system administration to resolve the issue in a timely manner. Furthermore, resolution often requires that several participants across different organizational teams scramble together to address the intrusion. In this vision paper, we define this problem in detail. We then present our vision of decentralized malware containment and the challenges and issues associated with this vision. The approach of containment involves detection and response using graph analytics coupled with a blockchain framework. We propose the use of a dominance frontier for profile nodes which must be involved in the containment process. Smart contracts are used to obtain consensus amongst the involved parties. The paper presents a basic implementation of this proposal. We have further discussed some open problems related to our vision.
With the development of modern High-Speed Railway (HSR) and mobile communication systems, network operators have a strong demand to provide high-quality on-board Internet services for HSR passengers. Multi-path TCP (MPTCP) provides a potential solution to aggregate available network bandwidth, greatly overcoming throughout degradation and severe jitter using single transmission path during the high-speed train moving. However, the choose of MPTCP algorithms, i.e., Coupled or Uncoupled, has a great impact on the performance. In this paper, we investigate this interesting issue in the practical datasets along multiple HSR lines. Particularly, we collect the first-hand network datasets and analyze the characteristics and category of traffic flows. Based on this statistics, we measure and analyze the transmission performance for both mice flows and elephant ones with different MPTCP congestion control algorithms in HSR scenarios. The simulation results show that, by comparing with the coupled MPTCP algorithms, i.e., Fully Coupled and LIA, the uncoupled EWTCP algorithm provides more stable throughput and balances congestion window distribution, more suitable for the HSR scenario for elephant flows. This work provides significant reference for the development of on-board devices in HSR network systems.
Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with,,,, existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we,,,, will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an MLbased method will help to achieve architectural optimization and efficiency improvement.
We consider the scenario where a cloud service provider (CSP) operates multiple geo-distributed datacenters to provide Internet-scale service. Our objective is to minimize the total electricity and bandwidth cost by jointly optimizing electricity procurement from wholesale markets and geographical load balancing (GLB), i.e., dynamically routing workloads to locations with cheaper electricity. Under the ideal setting where exact values of market prices and workloads are given, this problem reduces to a simple linear programming and is easy to solve. However, under the realistic setting where only distributions of these variables are available, the problem unfolds into a non-convex infinite-dimensional one and is challenging to solve. One of our main contributions is to develop an algorithm that is proven to solve the challenging problem optimally, by exploring the full design space of strategic bidding. Trace-driven evaluations corroborate our theoretical results, demonstrate fast convergence of our algorithm, and show that it can reduce the cost for the CSP by up to 20% as compared with baseline alternatives. This paper highlights the intriguing role of uncertainty in workloads and market prices, measured by their variances. While uncertainty in workloads deteriorates the cost-saving performance of joint electricity procurement and GLB, counter-intuitively, uncertainty in market prices can be exploited to achieve a cost reduction even larger than the setting without price uncertainty.
The steady decline of IP transit prices in the past two decades has helped fuel the growth of traffic demands in the Internet ecosystem. Despite the declining unit pricing, bandwidth costs remain significant due to ever-increasing scale and reach of the Internet, combined with the price disparity between the Internet's core hubs versus remote regions. In the meantime, cloud providers have been auctioning underutilized computing resources in their marketplace as spot instances for a much lower price, compared to their on-demand instances. This state of affairs has led the networking community to devote extensive efforts to cloud-assisted networks - the idea of offloading network functionality to cloud platforms, ultimately leading to more flexible and highly composable network service chains.We initiate a critical discussion on the economic and technological aspects of leveraging cloud-assisted networks for Internet-scale interconnections and data transfers. Namely, we investigate the prospect of constructing a large-scale virtualized network provider that does not own any fixed or dedicated resources and runs atop several spot instances. We construct a cloud-assisted overlay as a virtual network provider, by leveraging third-party cloud spot instances. We identify three use case scenarios where such approach will not only be economically and technologically viable but also provide performance benefits compared to current commercial offerings of connectivity and transit providers.