Biblio

Found 2356 results

Filters: Keyword is privacy
2018-02-02
Akram, R. N., Markantonakis, K., Mayes, K., Habachi, O., Sauveron, D., Steyven, A., Chaumette, S..  2017.  Security, privacy and safety evaluation of dynamic and static fleets of drones. 2017 IEEE/AIAA 36th Digital Avionics Systems Conference (DASC). :1–12.

Interconnected everyday objects, whether via public or private networks, are gradually becoming a reality of modern life - often referred to as the Internet of Things (IoT) or Cyber-Physical Systems (CPS). One stand-out example is systems based on Unmanned Aerial Vehicles (UAVs). Fleets of such vehicles (drones) are prophesied to assume multiple roles, from mundane to highly sensitive applications, such as prompt pizza or shopping deliveries to the home, or deployment for combat missions on the battlefield. Drones, which we refer to as UAVs in this paper, can operate either individually (solo missions) or as part of a fleet (group missions), with or without a constant connection to a base station. The base station acts as the command centre that manages the drones' activities; however, an independent, localised and effective fleet control mechanism, potentially based on swarm intelligence, is necessary for several reasons: 1) the increasing number of drone fleets; 2) fleet sizes that may reach tens of UAVs; 3) the need for such fleets to make time-critical decisions in the field; 4) potential communication congestion and latency; and 5) in some cases, operation in challenging terrain that hinders or mandates limited communication with a control centre, e.g. operations spanning long periods of time or military use of fleets in enemy territory. This self-aware, mission-focused and independent fleet of drones may utilise swarm intelligence for a) air-traffic or flight control management, b) obstacle avoidance, c) self-preservation (while maintaining the mission criteria), d) autonomous collaboration with other fleets in the field, and e) assuring the security, privacy and safety of physical (the drones themselves) and virtual (data, software) assets. In this paper, we investigate the challenges faced by fleets of drones and propose a potential course of action for overcoming them.

2018-06-20
Kosmidis, Konstantinos, Kalloniatis, Christos.  2017.  Machine Learning and Images for Malware Detection and Classification. Proceedings of the 21st Pan-Hellenic Conference on Informatics. :5:1–5:6.

Detecting malicious code by exact matching against collected datasets is becoming a large-scale identification problem due to the constant emergence of new malware variants. Being able to promptly and accurately identify new attacks enables security experts to respond effectively. My proposal is to develop an automated framework for the identification of unknown vulnerabilities by leveraging current neural network techniques. This has significant and immediate value for the security field, as current anti-virus software is typically able to recognize the malware type only after infection, and preventive measures are limited. Artificial Intelligence plays a major role in automatic malware classification: numerous machine-learning methods, both supervised and unsupervised, have been investigated to classify malware into families based on features acquired by static and dynamic analysis. The value of automated identification is clear, as feature engineering is both a time-consuming and time-sensitive task, with new malware needing to be studied as soon as it is observed in the wild.
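
Although the abstract does not spell out the pipeline, a common image-based approach (assumed here only for illustration, not taken from the paper) converts a binary's raw bytes into a grayscale image and classifies fixed-length features derived from it; the file path and classifier choice below are hypothetical.

```python
# Illustrative sketch, not the authors' exact pipeline: convert a binary's
# bytes into a 2-D grayscale "image" and derive a fixed-length feature vector.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def binary_to_image(path, width=256):
    """Read raw bytes and reshape them into a 2-D grayscale array."""
    data = np.frombuffer(open(path, "rb").read(), dtype=np.uint8)
    rows = len(data) // width
    return data[: rows * width].reshape(rows, width)

def image_features(img, size=64):
    """Crude fixed-length feature vector: a downsampled byte-intensity grid."""
    ys = np.linspace(0, img.shape[0] - 1, size).astype(int)
    xs = np.linspace(0, img.shape[1] - 1, size).astype(int)
    return img[np.ix_(ys, xs)].ravel()

# X: feature vectors for known samples, y: family labels (hypothetical data).
# clf = RandomForestClassifier(n_estimators=200).fit(X, y)
# clf.predict([image_features(binary_to_image("unknown_sample.bin"))])
```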

2017-10-19
Grushka - Cohen, Hagit, Sofer, Oded, Biller, Ofer, Shapira, Bracha, Rokach, Lior.  2016.  CyberRank: Knowledge Elicitation for Risk Assessment of Database Security. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :2009–2012.
Security systems for databases produce numerous alerts about anomalous activities and policy rule violations. Prioritizing these alerts will help security personnel focus their efforts on the most urgent ones. Currently, this is done manually by security experts who rank the alerts or define static risk scoring rules. Existing solutions are expensive, consume valuable expert time, and do not dynamically adapt to changes in policy. Adopting a learning approach for ranking alerts is complex due to the effort required from security experts to initially train such a model. The more features used, the more accurate the model is likely to be, but this requires collecting a greater amount of user feedback and prolongs the calibration process. In this paper, we propose CyberRank, a novel algorithm for automatic preference elicitation that is effective in situations with limited expert time and outperforms other algorithms for initial training of the system. We generate synthetic examples and annotate them using a model produced by the Analytic Hierarchy Process (AHP) to bootstrap a preference learning algorithm. We evaluate different approaches on a new dataset of expert-ranked pairs of database transactions, assessed in terms of their risk to the organization. Evaluated against manual risk assessments of transaction pairs, CyberRank outperforms all other methods in the cold-start scenario, with an error reduction of 20%.
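
A minimal sketch of the bootstrapping idea, assuming illustrative features, AHP-style weights, and a pairwise logistic ranker (none of which are the paper's exact choices): synthetic alerts are scored with expert-derived weights, and the induced preference pairs seed a preference-learning model that later expert feedback can refine.

```python
# Hedged sketch: AHP-style weights score synthetic alerts; the resulting
# preference pairs bootstrap a pairwise (RankNet/RankSVM-style) ranker.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
ahp_weights = np.array([0.5, 0.3, 0.2])        # e.g., sensitivity, volume, user role (assumed)
synthetic = rng.random((500, 3))               # synthetic alert feature vectors
risk = synthetic @ ahp_weights                 # AHP-style risk score

# Build preference pairs (a preferred over b) and learn on feature differences.
i, j = rng.integers(0, 500, 2000), rng.integers(0, 500, 2000)
keep = risk[i] != risk[j]
X = synthetic[i[keep]] - synthetic[j[keep]]
y = (risk[i[keep]] > risk[j[keep]]).astype(int)
ranker = LogisticRegression().fit(X, y)

# Real expert feedback can later be appended as more (X, y) pairs to refine the model.
```
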
Knote, Robin, Baraki, Harun, Söllner, Matthias, Geihs, Kurt, Leimeister, Jan Marco.  2016.  From Requirement to Design Patterns for Ubiquitous Computing Applications. Proceedings of the 21st European Conference on Pattern Languages of Programs. :26:1–26:11.
Ubiquitous Computing describes a concept in which computing appears around us at any time and in any location. Respective systems rely on context-sensitivity and adaptability. This means that they constantly collect data about the user and their context in order to adapt their functionality to certain situations. Hence, the development of Ubiquitous Computing systems is not only a technical issue and must also be considered from privacy, legal and usability perspectives. This indicates a need for several experts from different disciplines to participate in the development process, contributing requirements and evaluating design alternatives. In order to capture the knowledge of these interdisciplinary teams and make it reusable for similar problems, a pattern logic can be applied. In the early phase of a development project, requirement patterns are used to describe recurring requirements for similar problems, whereas in a more advanced development phase, design patterns are deployed to find a suitable design for recurring requirements. However, existing literature does not give sufficient insight into how both concepts are related and how the process of deriving design patterns from requirements (patterns) unfolds in practice. In our work, we give insights into how trust-related requirements for Ubiquitous Computing applications evolve into interdisciplinary design patterns. We elaborate on a six-step process using an example requirement pattern. With this contribution, we shed light on the relation between interdisciplinary requirement and design patterns and provide practitioners and scholars working on UC application development with a way to utilize patterns systematically and effectively.
2017-09-27
Bateman, Scott, Gutwin, Carl.  2016.  (The Lack of) Privacy Concerns with Sharing Web Activity at Work and the Implications for Collaborative Search. Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval. :43–52.
Collaborative information seeking frequently occurs in an opportunistic and loosely-coupled fashion that is supported by awareness of others' activities on the web. Automatically sharing traces of information about web activity could substantially improve these collaborative information tasks, but conventional wisdom suggests that people are very reluctant to share information about web usage. Because work settings have different rules and practices about privacy, we carried out the first systematic study of people's privacy concerns about sharing web activity within workgroups. To provide a better understanding of privacy concerns about sharing web activity at work, we conducted a two-week diary study with 18 participants. Our study system asked participants to report on their search tasks and privacy concerns. Surprisingly, our results showed that people have little concern about sharing the majority of their activities with their work colleagues, and had even fewer concerns with sharing work-related activities. Our results provide new insights into the possibilities of sharing web activities within workgroups, and provide evidence that tools based on automatic sharing of awareness information can be feasible.
2023-03-31
Rousseaux, Francis, Saurel, Pierre.  2016.  The legal debate about personal data privacy at a time of big data mining and searching: Making big data researchers cooperating with lawmakers to find solutions for the future. 2016 First IEEE International Conference on Computer Communication and the Internet (ICCCI). :354–357.
At the same time as Big Data technologies are being constantly refined, the legislation relating to data privacy is changing. The invalidation by the Court of Justice of the European Union, on October 6, 2015, of the agreement known as “Safe Harbor”, negotiated by the European Commission on behalf of the European Union with the United States, has two consequences. The first is to announce its replacement by a new, still fragile, program, the “Privacy Shield”, which is not yet definitive and which could also later be repealed by the Court of Justice of the European Union. For example, the opinion of the group of data protection authorities of the various states of the European Union, known as G29, is expected in mid-April 2016. The second is to mobilize the Big Data community to take control of the question of data privacy management and to put in place an adequate internal program.
2017-10-19
Zhang, Chenwei, Xie, Sihong, Li, Yaliang, Gao, Jing, Fan, Wei, Yu, Philip S..  2016.  Multi-source Hierarchical Prediction Consolidation. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :2251–2256.
In big data applications such as healthcare data mining, due to privacy concerns, it is necessary to collect predictions from multiple information sources for the same instance, with raw features being discarded or withheld when aggregating multiple predictions. In addition, crowd-sourced labels need to be aggregated to estimate the ground truth of the data. Due to the imperfection caused by predictive models or human crowdsourcing workers, noisy and conflicting information is ubiquitous and inevitable. Although state-of-the-art aggregation methods have been proposed to handle label spaces with flat structures, as the label space is becoming more and more complicated, aggregation under a hierarchical label structure becomes necessary but has been largely ignored. These label hierarchies can be quite informative as they are usually created by domain experts to make sense of highly complex label correlations such as protein functionality interactions or disease relationships. We propose a novel multi-source hierarchical prediction consolidation method that effectively exploits the complicated hierarchical label structures to resolve the noisy and conflicting information that inherently originates from multiple imperfect sources. We formulate the problem as an optimization problem with a closed-form solution. The consolidation result is inferred in a totally unsupervised, iterative fashion. Experimental results on both synthetic and real-world data sets show the effectiveness of the proposed method over existing alternatives.
Dupree, Janna Lynn, Devries, Richard, Berry, Daniel M., Lank, Edward.  2016.  Privacy Personas: Clustering Users via Attitudes and Behaviors Toward Security Practices. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. :5228–5239.
A primary goal of research in usable security and privacy is to understand the differences and similarities between users. While past researchers have clustered users into different groups, past categories of users have proven to be poor predictors of end-user behaviors. In this paper, we perform an alternative clustering of users based on their behaviors. Through the analysis of data from surveys and interviews of participants, we identify five user clusters that emerge from end-user behaviors: Fundamentalists, Lazy Experts, Technicians, Amateurs, and the Marginally Concerned. We examine the stability of our clusters through a survey-based study of an alternative sample, showing that clustering remains consistent. We conduct a small-scale design study to demonstrate the utility of our clusters in design. Finally, we argue that our clusters complement past work in understanding privacy choices, and that our categorization technique can aid in the design of new computer security technologies.
2017-10-10
Marxer, Claudio, Scherb, Christopher, Tschudin, Christian.  2016.  Access-Controlled In-Network Processing of Named Data. Proceedings of the 3rd ACM Conference on Information-Centric Networking. :77–82.

In content-based security, encrypted content as well as wrapped access keys are made freely available by an Information Centric Network: only those clients which are able to unwrap the encryption key can access the protected content. In this paper we extend this model to computation chains where derived data (e.g. produced by a Named Function Network) also has to comply with the content-based security approach. A central problem to solve is the synchronized on-demand publishing of encrypted results and wrapped keys, as well as defining the set of consumers who are authorized to access the derived data. In this paper we introduce "content-attendant policies" and report on a running prototype that demonstrates how to enforce data owner-defined access control policies despite fully decentralized and arbitrarily long computation chains.
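
The sketch below illustrates the general content-based-security pattern the paper builds on: content is encrypted with a symmetric key, and that key is wrapped per authorized consumer so both artifacts can be published openly. It uses generic primitives from the Python cryptography package and is not the prototype's actual NDN/NFN implementation.

```python
# Hedged sketch of content encryption plus per-consumer key wrapping.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

content_key = Fernet.generate_key()
encrypted_content = Fernet(content_key).encrypt(b"derived result bytes")

# Each authorized consumer has a key pair; the content key is wrapped for them.
consumer_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = consumer_private.public_key().encrypt(content_key, oaep)

# encrypted_content and wrapped_key can be published in the network; only the
# holder of consumer_private can unwrap the key and decrypt the content.
recovered_key = consumer_private.decrypt(wrapped_key, oaep)
assert Fernet(recovered_key).decrypt(encrypted_content) == b"derived result bytes"
```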

2017-08-18
Trivedi, Munesh Chandra, Sharma, Shivani, Yadav, Virendra Kumar.  2016.  Analysis of Several Image Steganography Techniques in Spatial Domain: A Survey. Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies. :84:1–84:7.

Steganography enables a user to hide confidential data in a digital medium in such a way that its existence cannot be detected by a third party. Considerable research is being conducted to improve the efficiency of steganography algorithms. Recent trends in computing technology use steganography as an important tool for hiding confidential data. This paper summarizes some of the research conducted in the field of image steganography in the spatial domain, along with the advantages and disadvantages of each technique. Future research directions and experimental results of some techniques are also discussed. The key goal is to show the powerful impact of steganography in the information hiding and image processing domains.
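
As a concrete illustration of the spatial-domain techniques such surveys cover, here is a minimal least-significant-bit (LSB) substitution sketch; it is the textbook baseline, not an algorithm proposed in the paper.

```python
# Minimal LSB substitution sketch: embed message bits into pixel LSBs.
import numpy as np

def embed_lsb(cover, message_bits):
    """Replace the LSB of the first len(message_bits) pixels with message bits."""
    stego = cover.flatten().copy()
    n = len(message_bits)
    stego[:n] = (stego[:n] & 0xFE) | message_bits
    return stego.reshape(cover.shape)

def extract_lsb(stego, n_bits):
    return stego.flatten()[:n_bits] & 1

cover = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
stego = embed_lsb(cover, bits)
assert np.array_equal(extract_lsb(stego, len(bits)), bits)
```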

2017-06-27
Obermaier, Johannes, Hutle, Martin.  2016.  Analyzing the Security and Privacy of Cloud-based Video Surveillance Systems. Proceedings of the 2nd ACM International Workshop on IoT Privacy, Trust, and Security. :22–28.

In the area of the Internet of Things, cloud-based camera surveillance systems are ubiquitously available for industrial and private environments. However, the sensitive nature of the surveillance use case imposes high requirements on privacy/confidentiality, authenticity, and availability of such systems. In this work, we investigate how currently available mass-market camera systems comply with these requirements. Considering two attacker models, we test the cameras for weaknesses and analyze their implications. We reverse-engineered the security implementation and discovered several vulnerabilities in every tested system. These weaknesses impair the users' privacy and, as a consequence, may also damage the camera system manufacturer's reputation. We demonstrate how an attacker can exploit these vulnerabilities to blackmail users and companies through denial-of-service attacks, injection of forged video streams, and eavesdropping on private video data - even without physical access to the device. Our analysis shows that current systems in practice lack the necessary care in implementing security for IoT devices.

2017-09-15
Ali Alatwi, Huda, Oh, Tae, Fokoue, Ernest, Stackpole, Bill.  2016.  Android Malware Detection Using Category-Based Machine Learning Classifiers. Proceedings of the 17th Annual Conference on Information Technology Education. :54–59.

Android malware has been growing dramatically, as have the diversity and complexity of its development techniques. Machine learning techniques have been applied to detect malware by modeling patterns of static features and dynamic behaviors of malware. The accuracy rates of machine learning classifiers differ depending on the quality of the features. We increase the quality of the features by relating an app's features to the features required to deliver its category's functionality. To build a benign reference, the features of the top-rated apps in a specific category are utilized to train a malware detection classifier for that category. Android app stores such as Google Play organize apps into different categories. Each category has its own distinct functionality, which means the apps under a specific category are similar in their static and dynamic features. In other words, benign apps under a certain category tend to share a common set of features. On the contrary, malicious apps tend to have abnormal features, which are uncommon for the category they belong to. This paper proposes category-based machine learning classifiers to enhance the performance of classification models at detecting malicious apps within a given category. Extensive machine learning experiments show that category-based classifiers achieve remarkably higher average performance than non-category-based ones.
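
A hedged sketch of the category-based setup (data loading and feature extraction are assumed, and the model choice is illustrative): one classifier is trained per app-store category, and prediction is dispatched by the app's declared category.

```python
# Illustrative per-category classifier training and dispatch.
from sklearn.ensemble import RandomForestClassifier

def train_category_classifiers(samples):
    """samples: iterable of (category, feature_vector, label), label 1 = malicious."""
    by_cat = {}
    for cat, x, y in samples:
        by_cat.setdefault(cat, ([], []))
        by_cat[cat][0].append(x)
        by_cat[cat][1].append(y)
    return {cat: RandomForestClassifier(n_estimators=100).fit(X, y)
            for cat, (X, y) in by_cat.items()}

def classify(app_category, features, classifiers):
    # Use the classifier trained on benign/malicious apps of this category.
    return classifiers[app_category].predict([features])[0]
```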

2017-04-24
Roegiest, Adam, Cormack, Gordon V..  2016.  An Architecture for Privacy-Preserving and Replicable High-Recall Retrieval Experiments. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. :1085–1088.

We demonstrate the infrastructure used in the TREC 2015 Total Recall track to facilitate controlled simulation of "assessor in the loop" high-recall retrieval experimentation. The implementation and corresponding design decisions are presented for this platform. This includes the necessary considerations to ensure that experiments are privacy-preserving when using test collections that cannot be distributed. Furthermore, we describe the use of virtual machines as a means of system submission in order to promote replicable experiments while also ensuring the security of system developers and data providers.

2017-08-02
Madi, Taous, Majumdar, Suryadipta, Wang, Yushun, Jarraya, Yosr, Pourzandi, Makan, Wang, Lingyu.  2016.  Auditing Security Compliance of the Virtualized Infrastructure in the Cloud: Application to OpenStack. Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy. :195–206.

Cloud service providers typically adopt the multi-tenancy model to optimize resource usage and achieve the promised cost-effectiveness. Sharing resources between different tenants and the underlying complex technology increase the necessity of transparency and accountability. In this regard, auditing security compliance of the provider's infrastructure against standards, regulations and customers' policies takes on increasing importance in the cloud to boost the trust between the stakeholders. However, virtualization and scalability make compliance verification challenging. In this work, we propose an automated framework that allows auditing the cloud infrastructure from the structural point of view while focusing on virtualization-related security properties and consistency between multiple control layers. Furthermore, to show the feasibility of our approach, we integrate our auditing system into OpenStack, one of the most widely used cloud infrastructure management systems. To show the scalability and validity of our framework, we present our experimental results on assessing several properties related to auditing inter-layer consistency, virtual machine co-residence, and virtual resource isolation.
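
To make one of the audited properties concrete, the sketch below checks virtual-machine co-residence against tenant isolation requirements from a simple inventory snapshot; the data layout and policy format are assumptions, and the paper's framework performs this kind of verification far more systematically across control layers.

```python
# Illustrative co-residence check over an assumed inventory snapshot.
def co_residence_violations(placements, conflicting_pairs):
    """placements: {host: [(vm_id, tenant), ...]}; conflicting_pairs: set of frozensets."""
    violations = []
    for host, vms in placements.items():
        tenants = {tenant for _, tenant in vms}
        for pair in conflicting_pairs:
            if pair <= tenants:                      # both isolated tenants share this host
                violations.append((host, tuple(sorted(pair))))
    return violations

placements = {"compute-1": [("vm-a", "tenant1"), ("vm-b", "tenant2")]}
print(co_residence_violations(placements, {frozenset({"tenant1", "tenant2"})}))
# [('compute-1', ('tenant1', 'tenant2'))]
```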

2017-04-20
Egner, Alexandru Ionut, Luu, Duc, den Hartog, Jerry, Zannone, Nicola.  2016.  An Authorization Service for Collaborative Situation Awareness. Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy. :136–138.

In international military coalitions, situation awareness is achieved by gathering critical intel from different authorities. Authorities want to retain control over their data, as they are sensitive by nature, and, thus, usually employ their own authorization solutions to regulate access to them. In this paper, we highlight that harmonizing authorization solutions at the coalition level raises many challenges. We demonstrate how we address authorization challenges in the context of a scenario defined by military experts using a prototype implementation of SAFAX, an XACML-based architectural framework tailored to the development of authorization services for distributed systems.

2017-06-05
Xu, Bin, Chang, Pamara, Welker, Christopher L., Bazarova, Natalya N., Cosley, Dan.  2016.  Automatic Archiving Versus Default Deletion: What Snapchat Tells Us About Ephemerality in Design. Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. :1662–1675.

Unlike most social media, where automatic archiving of data is the default, Snapchat defaults to ephemerality: deleting content shortly after it is viewed by a receiver. Interviews with 25 Snapchat users show that ephemerality plays a key role in shaping their practices. Along with friend-adding features that facilitate a network of mostly close relations, default deletion affords everyday, mundane talk and reduces self-consciousness while encouraging playful interaction. Further, although receivers can save content through screenshots, senders are notified; this selective saving with notification supports complex information norms that preserve the feel of ephemeral communication while supporting the capture of meaningful content. This dance of giving and taking, sharing and showing, and agency for both senders and receivers provides the basis for a rich design space of mechanisms, levels, and domains for ephemerality.

2017-09-05
Ranjan, Juhi, Whitehouse, Kamin.  2016.  Automatic Authentication of Smartphone Touch Interactions Using Smartwatch. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct. :361–364.

In this demo, we will present a smartphone authentication system that can automatically validate every touch interaction made on a smartphone using a smartwatch worn by the phone's owner. The IMU sensors on the smartwatch monitor the motion of the hand for specific signal characteristics, which are relayed to the phone. If the signal features match certain criteria, the touch is authenticated and the phone responds appropriately. If not, the phone's screen remains locked and unresponsive to the touch action. The challenge here is to be able to validate every touch gesture within acceptable limits of human perception.

2017-08-22
Junejo, Khurum Nazir, Goh, Jonathan.  2016.  Behaviour-Based Attack Detection and Classification in Cyber Physical Systems Using Machine Learning. Proceedings of the 2nd ACM International Workshop on Cyber-Physical System Security. :34–43.

Cyber-physical systems (CPS) are often network integrated to enable remote management, monitoring, and reporting. Such integration has made them vulnerable to cyber attacks originating from an untrusted network (e.g., the internet). Once an attacker breaches the network security, he could corrupt operations of the system in question, which may in turn lead to catastrophes. Hence there is a critical need to detect intrusions into mission-critical CPS. Signature-based detection may not work well for CPS, whose complexity may preclude the succinct signatures we would need. Specification-based detection requires accurate definitions of system behaviour that can similarly be hard to obtain, due to the CPS's complexity and dynamics, as well as inaccuracies and incompleteness of design documents or operation manuals. Formal models, to be tractable, are often oversimplified, in which case they will not support effective detection. In this paper, we study a behaviour-based machine learning (ML) approach to intrusion detection. Whereas prior unsupervised ML methods have suffered from high missed-detection or false-positive rates, we use a high-fidelity CPS testbed, which replicates all main physical and control components of a modern water treatment facility, to generate systematic training data for a supervised method. The method not only detects the occurrence of a cyber attack at the physical process layer, but also identifies the specific type of the attack. Its detection is fast and robust to noise. Furthermore, its adaptive system model can learn quickly to match the dynamics of the CPS and its operating environment. It exhibits a low false positive (FP) rate, yet high precision and recall.
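
A rough sketch of the supervised, behaviour-based setup (feature windows, label names, and the model are illustrative assumptions, not the paper's exact design): testbed readings are windowed into feature vectors labeled either normal or with a specific attack type, so a single multi-class classifier both detects and identifies attacks.

```python
# Hedged sketch: windowed statistics of process readings feed a multi-class classifier.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def window_features(readings, size=10):
    """Summarize each window of raw readings with simple statistics."""
    windows = [readings[i:i + size] for i in range(0, len(readings) - size + 1, size)]
    return np.array([[w.mean(), w.std(), w.min(), w.max()] for w in windows])

# readings: 1-D array of a sensor channel; labels: one class per window, e.g.
# 'normal', 'spoof_level_sensor', 'close_valve_attack' (hypothetical names).
# clf = GradientBoostingClassifier().fit(window_features(readings), labels)
# clf.predict(window_features(new_readings))
```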

2017-08-18
Boroumand, Mehdi, Fridrich, Jessica.  2016.  Boosting Steganalysis with Explicit Feature Maps. Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security. :149–157.

Explicit non-linear transformations of existing steganalysis features are shown to boost their ability to detect steganography in combination with existing simple classifiers, such as the FLD-ensemble. The non-linear transformations are learned from a small number of cover features using Nyström approximation on pilot vectors obtained with kernelized PCA. The best performance is achieved with the exponential form of the Hellinger kernel, which improves the detection accuracy by up to 2-3% for spatial-domain content-adaptive steganography. Since the non-linear map depends only on the cover source and its learning has a low computational complexity, the proposed approach is a practical and low-cost method for boosting the accuracy of existing detectors built as binary classifiers. The map can also be used to significantly reduce the feature dimensionality (by up to a factor of ten) without performance loss with respect to the non-transformed features.
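
The sketch below shows a generic Nyström explicit feature map built on pilot vectors, with one plausible form of an exponential Hellinger kernel; the kernel details and parameters are assumptions for illustration, not the paper's tuned configuration.

```python
# Hedged sketch of a Nyström explicit feature map for steganalysis features.
import numpy as np

def hellinger_exp_kernel(A, B, gamma=1.0):
    """exp(-gamma * squared Hellinger distance); expects non-negative features."""
    d = np.sqrt(A)[:, None, :] - np.sqrt(B)[None, :, :]
    return np.exp(-gamma * (d ** 2).sum(-1))

def nystrom_map(X, pilots, gamma=1.0):
    W = hellinger_exp_kernel(pilots, pilots, gamma)            # m x m pilot kernel
    evals, evecs = np.linalg.eigh(W)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-12))) @ evecs.T
    return hellinger_exp_kernel(X, pilots, gamma) @ inv_sqrt   # n x m explicit map

rng = np.random.default_rng(1)
covers = rng.random((200, 50))                 # stand-in for cover-image feature vectors
pilots = covers[rng.choice(200, 32, replace=False)]
transformed = nystrom_map(covers, pilots)      # feed to a simple classifier, e.g. FLD ensemble
```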

2017-04-24
Alan, Hasan Faik, Kaur, Jasleen.  2016.  Can Android Applications Be Identified Using Only TCP/IP Headers of Their Launch Time Traffic? Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks. :61–66.

The ability to identify mobile apps in network traffic has significant implications in many domains, including traffic management, malware detection, and maintaining user privacy. App identification methods in the literature typically use deep packet inspection (DPI) and analyze HTTP headers to extract app fingerprints. However, these methods cannot be used if HTTP traffic is encrypted. We investigate whether Android apps can be identified from their launch-time network traffic using only TCP/IP headers. We first capture network traffic of 86,109 app launches by repeatedly running 1,595 apps on 4 distinct Android devices. We then use supervised learning methods used previously in the web page identification literature to identify the apps that generated the traffic. We find that: (i) popular Android apps can be identified with 88% accuracy, by using the packet sizes of the first 64 packets they generate, when the learning methods are trained and tested on data collected from the same device; (ii) when data from an unseen device (but similar operating system/vendor) is used for testing, the apps can be identified with 67% accuracy; (iii) the app identification accuracy does not drop significantly even if the training data are stale by several days; and (iv) the accuracy does drop quite significantly if the operating system/vendor is very different. We discuss the implications of our findings as well as open issues.
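
A minimal sketch of the classification setup using the stated feature choice (sizes of the first 64 packets of a launch); the model and data handling below are simplified assumptions rather than the study's full methodology.

```python
# Illustrative sketch: fixed-length packet-size vectors as app fingerprints.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def trace_features(packet_sizes, n=64):
    """Pad/truncate the launch-time packet-size sequence to a fixed length."""
    v = np.zeros(n)
    v[: min(n, len(packet_sizes))] = packet_sizes[:n]
    return v

# traces: list of per-launch packet-size lists; labels: app names (hypothetical data).
# X = np.array([trace_features(t) for t in traces])
# clf = RandomForestClassifier(n_estimators=300).fit(X, labels)
# clf.predict([trace_features(new_launch_sizes)])
```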

2017-08-18
Abdulrahman, Hasan, Chaumont, Marc, Montesinos, Philippe, Magnier, Baptiste.  2016.  Color Image Steganalysis Based On Steerable Gaussian Filters Bank. Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security. :109–114.

This article deals with color image steganalysis based on machine learning. The proposed approach enriches the features from the Color Rich Model by adding new features obtained by applying steerable Gaussian filters and then computing the co-occurrence of pixel pairs. Adding these new features to those obtained from the Color Rich Model allows us to increase the detectability of hidden messages in color images. The Gaussian filters are angled in different directions to precisely compute the tangent of the gradient vector. Then, the gradient magnitude and the derivative of this tangent direction are estimated. This refined method of estimation enables us to unearth the minor changes that have occurred in the image when a message is embedded. The efficiency of the proposed framework is demonstrated on three steganographic algorithms designed to hide messages in images: S-UNIWARD, WOW, and Synch-HILL. Each algorithm is tested using different payload sizes. The proposed approach is compared to three color image steganalysis methods based on computed features and Ensemble Classifier classification: the Spatial Color Rich Model, the CFA-aware Rich Model and the RGB Geometric Color Rich Model.
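
The sketch below illustrates the feature idea: steer Gaussian derivative filters toward a set of directions, quantize the directional residuals, and summarize them with a co-occurrence histogram per color channel. The quantization step, truncation threshold, and set of directions are illustrative assumptions, not the paper's exact parameters.

```python
# Hedged sketch of steered-derivative co-occurrence features for one channel.
import numpy as np
from scipy.ndimage import gaussian_filter

def directional_residual(channel, theta, sigma=1.0):
    dy = gaussian_filter(channel, sigma, order=(1, 0))
    dx = gaussian_filter(channel, sigma, order=(0, 1))
    return np.cos(theta) * dx + np.sin(theta) * dy       # steered first derivative

def cooccurrence_features(channel, theta, q=0.05, T=2):
    r = np.clip(np.round(directional_residual(channel, theta) / q), -T, T).astype(int)
    hist = np.zeros((2 * T + 1, 2 * T + 1))
    np.add.at(hist, (r[:, :-1] + T, r[:, 1:] + T), 1)    # horizontally adjacent pixel pairs
    return (hist / hist.sum()).ravel()

channel = np.random.rand(64, 64)                          # stand-in for one color channel
features = np.concatenate([cooccurrence_features(channel, t)
                           for t in np.linspace(0, np.pi, 4, endpoint=False)])
```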

2017-09-15
Dhulipala, Laxman, Kabiljo, Igor, Karrer, Brian, Ottaviano, Giuseppe, Pupyrev, Sergey, Shalita, Alon.  2016.  Compressing Graphs and Indexes with Recursive Graph Bisection. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :1535–1544.

Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel theoretically sound reordering algorithm that is based on recursive graph bisection. Our experiments show a significant improvement of the compression rate of graphs and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which is demonstrated on graphs with billions of vertices and hundreds of billions of edges.
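
The skeleton below shows only the recursive-bisection structure with a placeholder split; in the paper, each split is refined by iterative vertex swaps that minimize an expected log-gap cost, which is the part that actually produces compression-friendly orderings.

```python
# Recursive-bisection reordering skeleton (heavily simplified, split is a placeholder).
def reorder(vertices, split, depth=0, max_depth=20):
    """Return a vertex ordering by recursive bisection."""
    if depth >= max_depth or len(vertices) <= 2:
        return list(vertices)
    left, right = split(vertices)                 # the paper refines this split by vertex swaps
    return (reorder(left, split, depth + 1, max_depth)
            + reorder(right, split, depth + 1, max_depth))

def naive_split(vertices):
    mid = len(vertices) // 2
    return vertices[:mid], vertices[mid:]

order = reorder(list(range(16)), naive_split)
new_id = {v: i for i, v in enumerate(order)}      # compression-friendly relabeling of vertices
```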

Silva, Rodrigo M., Gomes, Guilherme C.M., Alvim, Mário S., Gonçalves, Marcos A..  2016.  Compression-Based Selective Sampling for Learning to Rank. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :247–256.

Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can be later used to rank new query results. These training sets are very costly and laborious to produce, requiring human annotators to assess the relevance or order of the documents in relation to a query. Active learning (AL) algorithms are able to reduce the labeling effort by actively sampling an unlabeled set and choosing data instances that maximize the effectiveness of a learning function. But AL methods require constant supervision, as documents have to be labeled at each round of the process. In this paper, we propose that certain characteristics of unlabeled L2R datasets allow for an unsupervised, compression-based selection process to be used to create small and yet highly informative and effective initial sets that can later be labeled and used to bootstrap an L2R system. We implement our ideas through a novel unsupervised selective sampling method, which we call Cover, which has several advantages over AL methods tailored to L2R. First, it does not need an initial labeled seed set and can select documents from scratch. Second, selected documents do not need to be labeled as the iterations of the method progress since it is unsupervised (i.e., no learning model needs to be updated). Thus, an arbitrarily sized training set can be selected without human intervention depending on the available budget. Third, the method is efficient and can be run on unlabeled collections containing millions of query-document instances. We run various experiments with two important L2R benchmarking collections to show that the proposed method allows for the creation of small, yet very effective training sets. It achieves full training-like performance with less than 10% of the original sets selected, outperforming the baselines in both effectiveness and scalability.
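
As a hedged illustration of using a compressor as an unsupervised similarity signal (the paper's Cover method has its own selection criterion; this is not it), the sketch below greedily picks a small seed set of documents that are least redundant with respect to each other under normalized compression distance.

```python
# Illustrative compression-based selection of a small, diverse seed set.
import zlib

def c(x: bytes) -> int:
    return len(zlib.compress(x))

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two serialized documents."""
    ca, cb, cab = c(a), c(b), c(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def select(docs, budget):
    chosen = [0]                                   # start from an arbitrary document
    while len(chosen) < budget:
        # Greedily add the document farthest (least redundant) from the chosen set.
        best = max((i for i in range(len(docs)) if i not in chosen),
                   key=lambda i: min(ncd(docs[i], docs[j]) for j in chosen))
        chosen.append(best)
    return chosen

docs = [b"query doc features ..." * k for k in (1, 2, 3, 5, 8)]
print(select(docs, budget=3))
```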

Li, Zheng, Xia, Yuli, Ye, Ruiqi, Zhao, Junsuo.  2016.  Compressive Sensing for Space Image Compressing. Proceedings of the 2016 International Conference on Intelligent Information Processing. :23:1–23:5.

Compressive sensing is a new technique by which sparse signals are sampled and recovered from a few measurements. To address the disadvantages of traditional space image compression methods, a completely new compression scheme under the compressive sensing framework was developed in this paper. Firstly, in the coding stage, a simple binary measurement matrix was constructed to obtain signal measurements. Secondly, the input image was divided into small blocks. The image blocks were then used as training sets to learn a dictionary basis for sparse representation. Finally, a sparse reconstruction algorithm was used to recover the original input image. Experimental results show that both the compression rate and the quality of the recovered images are high. Besides, as the computation cost is very low in the sampling stage, the method is suitable for on-board applications in astronomy.
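
A toy version of the sensing and recovery loop under simplifying assumptions (no block splitting or dictionary learning, and a ±1 rather than 0/1 measurement matrix): a sparse signal is measured with a random binary matrix and recovered with orthogonal matching pursuit.

```python
# Toy compressive sensing sketch: binary measurements + OMP recovery.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, m, k = 256, 96, 8                                        # signal length, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)     # k-sparse test signal

Phi = rng.integers(0, 2, (m, n)) * 2.0 - 1.0                # simple binary (+1/-1) measurement matrix
y = Phi @ x                                                 # compressive measurements

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
x_hat = omp.fit(Phi, y).coef_
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))        # relative reconstruction error
```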

2017-06-27
Jordan, Michael I..  2016.  On Computational Thinking, Inferential Thinking and Data Science. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. :47–47.

The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level: in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and ways to exploit parallelism so as to trade off the speed and accuracy of inference.