Visible to the public Biblio

Found 1057 results

Filters: Keyword is machine learning  [Clear All Filters]
2019-03-06
Hess, S., Satam, P., Ditzler, G., Hariri, S..  2018.  Malicious HTML File Prediction: A Detection and Classification Perspective with Noisy Data. 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA). :1-7.

Cybersecurity plays a critical role in protecting sensitive information and the structural integrity of networked systems. As networked systems continue to expand in numbers as well as in complexity, so does the threat of malicious activity and the necessity for advanced cybersecurity solutions. Furthermore, both the quantity and quality of available data on malicious content as well as the fact that malicious activity continuously evolves makes automated protection systems for this type of environment particularly challenging. Not only is the data quality a concern, but the volume of the data can be quite small for some of the classes. This creates a class imbalance in the data used to train a classifier; however, many classifiers are not well equipped to deal with class imbalance. One such example is detecting malicious HMTL files from static features. Unfortunately, collecting malicious HMTL files is extremely difficult and can be quite noisy from HTML files being mislabeled. This paper evaluates a specific application that is afflicted by these modern cybersecurity challenges: detection of malicious HTML files. Previous work presented a general framework for malicious HTML file classification that we modify in this work to use a $\chi$2 feature selection technique and synthetic minority oversampling technique (SMOTE). We experiment with different classifiers (i.e., AdaBoost, Gentle-Boost, RobustBoost, RusBoost, and Random Forest) and a pure detection model (i.e., Isolation Forest). We benchmark the different classifiers using SMOTE on a real dataset that contains a limited number of malicious files (40) with respect to the normal files (7,263). It was found that the modified framework performed better than the previous framework's results. However, additional evidence was found to imply that algorithms which train on both the normal and malicious samples are likely overtraining to the malicious distribution. We demonstrate the likely overtraining by determining that a subset of the malicious files, while suspicious, did not come from a malicious source.

Calo, Seraphin, Verma, Dinesh, Chakraborty, Supriyo, Bertino, Elisa, Lupu, Emil, Cirincione, Gregory.  2018.  Self-Generation of Access Control Policies. Proceedings of the 23Nd ACM on Symposium on Access Control Models and Technologies. :39-47.

Access control for information has primarily focused on access statically granted to subjects by administrators usually in the context of a specific system. Even if mechanisms are available for access revocation, revocations must still be executed manually by an administrator. However, as physical devices become increasingly embedded and interconnected, access control needs to become an integral part of the resource being protected and be generated dynamically by resources depending on the context in which the resource is being used. In this paper, we discuss a set of scenarios for access control needed in current and future systems and use that to argue that an approach for resources to generate and manage their access control policies dynamically on their own is needed. We discuss some approaches for generating such access control policies that may address the requirements of the scenarios.

2019-03-04
Lin, F., Beadon, M., Dixit, H. D., Vunnam, G., Desai, A., Sankar, S..  2018.  Hardware Remediation at Scale. 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). :14–17.
Large scale services have automated hardware remediation to maintain the infrastructure availability at a healthy level. In this paper, we share the current remediation flow at Facebook, and how it is being monitored. We discuss a class of hardware issues that are transient and typically have higher rates during heavy load. We describe how our remediation system was enhanced to be efficient in detecting this class of issues. As hardware and systems change in response to the advancement in technology and scale, we have also utilized machine learning frameworks for hardware remediation to handle the introduction of new hardware failure modes. We present an ML methodology that uses a set of predictive thresholds to monitor remediation efficiency over time. We also deploy a recommendation system based on natural language processing, which is used to recommend repair actions for efficient diagnosis and repair. We also describe current areas of research that will enable us to improve hardware availability further.
Aborisade, O., Anwar, M..  2018.  Classification for Authorship of Tweets by Comparing Logistic Regression and Naive Bayes Classifiers. 2018 IEEE International Conference on Information Reuse and Integration (IRI). :269–276.

At a time when all it takes to open a Twitter account is a mobile phone, the act of authenticating information encountered on social media becomes very complex, especially when we lack measures to verify digital identities in the first place. Because the platform supports anonymity, fake news generated by dubious sources have been observed to travel much faster and farther than real news. Hence, we need valid measures to identify authors of misinformation to avert these consequences. Researchers propose different authorship attribution techniques to approach this kind of problem. However, because tweets are made up of only 280 characters, finding a suitable authorship attribution technique is a challenge. This research aims to classify authors of tweets by comparing machine learning methods like logistic regression and naive Bayes. The processes of this application are fetching of tweets, pre-processing, feature extraction, and developing a machine learning model for classification. This paper illustrates the text classification for authorship process using machine learning techniques. In total, there were 46,895 tweets used as both training and testing data, and unique features specific to Twitter were extracted. Several steps were done in the pre-processing phase, including removal of short texts, removal of stop-words and punctuations, tokenizing and stemming of texts as well. This approach transforms the pre-processed data into a set of feature vector in Python. Logistic regression and naive Bayes algorithms were applied to the set of feature vectors for the training and testing of the classifier. The logistic regression based classifier gave the highest accuracy of 91.1% compared to the naive Bayes classifier with 89.8%.

2019-02-25
Gupta, M., Bakliwal, A., Agarwal, S., Mehndiratta, P..  2018.  A Comparative Study of Spam SMS Detection Using Machine Learning Classifiers. 2018 Eleventh International Conference on Contemporary Computing (IC3). :1–7.
With technological advancements and increment in content based advertisement, the use of Short Message Service (SMS) on phones has increased to such a significant level that devices are sometimes flooded with a number of spam SMS. These spam messages can lead to loss of private data as well. There are many content-based machine learning techniques which have proven to be effective in filtering spam emails. Modern day researchers have used some stylistic features of text messages to classify them to be ham or spam. SMS spam detection can be greatly influenced by the presence of known words, phrases, abbreviations and idioms. This paper aims to compare different classifying techniques on different datasets collected from previous research works, and evaluate them on the basis of their accuracies, precision, recall and CAP Curve. The comparison has been performed between traditional machine learning techniques and deep learning methods.
Karamollaoglu, H., Dogru, İ A., Dorterler, M..  2018.  Detection of Spam E-mails with Machine Learning Methods. 2018 Innovations in Intelligent Systems and Applications Conference (ASYU). :1–5.

E-mail communication is one of today's indispensable communication ways. The widespread use of email has brought about some problems. The most important one of these problems are spam (unwanted) e-mails, often composed of advertisements or offensive content, sent without the recipient's request. In this study, it is aimed to analyze the content information of e-mails written in Turkish with the help of Naive Bayes Classifier and Vector Space Model from machine learning methods, to determine whether these e-mails are spam e-mails and classify them. Both methods are subjected to different evaluation criteria and their performances are compared.

Popovac, M., Karanovic, M., Sladojevic, S., Arsenovic, M., Anderla, A..  2018.  Convolutional Neural Network Based SMS Spam Detection. 2018 26th Telecommunications Forum (℡FOR). :1–4.
SMS spam refers to undesired text message. Machine Learning methods for anti-spam filters have been noticeably effective in categorizing spam messages. Dataset used in this research is known as Tiago's dataset. Crucial step in the experiment was data preprocessing, which involved reducing text to lower case, tokenization, removing stopwords. Convolutional Neural Network was the proposed method for classification. Overall model's accuracy was 98.4%. Obtained model can be used as a tool in many applications.
Peng, W., Huang, L., Jia, J., Ingram, E..  2018.  Enhancing the Naive Bayes Spam Filter Through Intelligent Text Modification Detection. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :849–854.

Spam emails have been a chronic issue in computer security. They are very costly economically and extremely dangerous for computers and networks. Despite of the emergence of social networks and other Internet based information exchange venues, dependence on email communication has increased over the years and this dependence has resulted in an urgent need to improve spam filters. Although many spam filters have been created to help prevent these spam emails from entering a user's inbox, there is a lack or research focusing on text modifications. Currently, Naive Bayes is one of the most popular methods of spam classification because of its simplicity and efficiency. Naive Bayes is also very accurate; however, it is unable to correctly classify emails when they contain leetspeak or diacritics. Thus, in this proposes, we implemented a novel algorithm for enhancing the accuracy of the Naive Bayes Spam Filter so that it can detect text modifications and correctly classify the email as spam or ham. Our Python algorithm combines semantic based, keyword based, and machine learning algorithms to increase the accuracy of Naive Bayes compared to Spamassassin by over two hundred percent. Additionally, we have discovered a relationship between the length of the email and the spam score, indicating that Bayesian Poisoning, a controversial topic, is actually a real phenomenon and utilized by spammers.

Ali, S. S., Maqsood, J..  2018.  .Net library for SMS spam detection using machine learning: A cross platform solution. 2018 15th International Bhurban Conference on Applied Sciences and Technology (IBCAST). :470–476.

Short Message Service is now-days the most used way of communication in the electronic world. While many researches exist on the email spam detection, we haven't had the insight knowledge about the spam done within the SMS's. This might be because the frequency of spam in these short messages is quite low than the emails. This paper presents different ways of analyzing spam for SMS and a new pre-processing way to get the actual dataset of spam messages. This dataset was then used on different algorithm techniques to find the best working algorithm in terms of both accuracy and recall. Random Forest algorithm was then implemented in a real world application library written in C\# for cross platform .Net development. This library is capable of using a prebuild model for classifying a new dataset for spam and ham.

2019-02-22
Mulinka, Pavol, Casas, Pedro.  2018.  Stream-Based Machine Learning for Network Security and Anomaly Detection. Proceedings of the 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks. :1-7.

Data Stream Machine Learning is rapidly gaining popularity within the network monitoring community as the big data produced by network devices and end-user terminals goes beyond the memory constraints of standard monitoring equipment. Critical network monitoring applications such as the detection of anomalies, network attacks and intrusions, require fast and continuous mechanisms for on-line analysis of data streams. In this paper we consider a stream-based machine learning approach for network security and anomaly detection, applying and evaluating multiple machine learning algorithms in the analysis of continuously evolving network data streams. The continuous evolution of the data stream analysis algorithms coming from the data stream mining domain, as well as the multiple evaluation approaches conceived for benchmarking such kind of algorithms makes it difficult to choose the appropriate machine learning model. Results of the different approaches may significantly differ and it is crucial to determine which approach reflects the algorithm performance the best. We therefore compare and analyze the results from the most recent evaluation approaches for sequential data on commonly used batch-based machine learning algorithms and their corresponding stream-based extensions, for the specific problem of on-line network security and anomaly detection. Similar to our previous findings when dealing with off-line machine learning approaches for network security and anomaly detection, our results suggest that adaptive random forests and stochastic gradient descent models are able to keep up with important concept drifts in the underlying network data streams, by keeping high accuracy with continuous re-training at concept drift detection times.

Petrík, Juraj, Chudá, Daniela.  2018.  Source Code Authorship Approaches Natural Language Processing. Proceedings of the 19th International Conference on Computer Systems and Technologies. :58-61.

This paper proposed method for source code authorship attribution using modern natural language processing methods. Our method based on text embedding with convolutional recurrent neural network reaches 94.5% accuracy within 500 authors in one dataset, which outperformed many state of the art models for authorship attribution. Our approach is dealing with source code as with natural language texts, so it is potentially programming language independent with more potential of future improving.

Dauber, Edwin, Caliskan, Aylin, Harang, Richard, Greenstadt, Rachel.  2018.  Git Blame Who?: Stylistic Authorship Attribution of Small, Incomplete Source Code Fragments Proceedings of the 40th International Conference on Software Engineering: Companion Proceeedings. :356-357.

Program authorship attribution has implications for the privacy of programmers who wish to contribute code anonymously. While previous work has shown that complete files that are individually authored can be attributed, these efforts have focused on ideal data sets such as the Google Code Jam data. We explore the problem of attribution "in the wild," examining source code obtained from open source version control systems, and investigate if and how such contributions can be attributed to their authors, either individually or on a per-account basis. In this work we show that accounts belonging to open source contributors containing short, incomplete, and typically uncompilable fragments can be effectively attributed.

2019-02-18
Wu, Siyan, Tong, Xiaojun, Wang, Wei, Xin, Guodong, Wang, Bailing, Zhou, Qi.  2018.  Website Defacements Detection Based on Support Vector Machine Classification Method. Proceedings of the 2018 International Conference on Computing and Data Engineering. :62–66.
Website defacements can inflict significant harm on the website owner through the loss of reputation, the loss of money, or the leakage of information. Due to the complexity and diversity of all kinds of web application systems, especially a lack of necessary security maintenance, website defacements increased year by year. In this paper, we focus on detecting whether the website has been defaced by extracting website features and website embedded trojan features. We use three kinds of classification learning algorithms which include Gradient Boosting Decision Tree (GBDT), Random Forest (RF) and Support Vector Machine (SVM) to do the classification experiments, and experimental results show that Support Vector Machine classifier performed better than two other classifiers. It can achieve an overall accuracy of 95%-96% in detecting website defacements.
Nazari, Zahra, Yu, Seong-Mi, Kang, Dongshik, Kawachi, Yousuke.  2018.  Comparative Study of Outlier Detection Algorithms for Machine Learning. Proceedings of the 2018 2Nd International Conference on Deep Learning Technologies. :47–51.
Outliers are unusual data points which are inconsistent with other observations. Human error, mechanical faults, fraudulent behavior, instrument error, and changes in the environment are some reasons to arise outliers. Several types of outlier detection algorithms are developed and a number of surveys and overviews are performed to distinguish their advantages and disadvantages. Multivariate outlier detection algorithms are widely used among other types, therefore we concentrate on this type. In this work a comparison between effects of multivariate outlier detection algorithms on machine learning problems is performed. For this purpose, three multivariate outlier detection algorithms namely distance based, statistical based and clustering based are evaluated. Benchmark datasets of Heart disease, Breast cancer and Liver disorder are used for the experiments. To identify the effectiveness of mentioned algorithms, the above datasets are classified by Support Vector Machines (SVM) before and after outlier detection. Finally a comparative review is performed to distinguish the advantages and disadvantages of each algorithm and their respective effects on accuracy of SVM classifiers.
2019-02-14
Tesfay, Welderufael B., Hofmann, Peter, Nakamura, Toru, Kiyomoto, Shinsaku, Serna, Jetzabel.  2018.  PrivacyGuide: Towards an Implementation of the EU GDPR on Internet Privacy Policy Evaluation. Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics. :15-21.

Nowadays Internet services have dramatically changed the way people interact with each other and many of our daily activities are supported by those services. Statistical indicators show that more than half of the world's population uses the Internet generating about 2.5 quintillion bytes of data on daily basis. While such a huge amount of data is useful in a number of fields, such as in medical and transportation systems, it also poses unprecedented threats for user's privacy. This is aggravated by the excessive data collection and user profiling activities of service providers. Yet, regulation require service providers to inform users about their data collection and processing practices. The de facto way of informing users about these practices is through the use of privacy policies. Unfortunately, privacy policies suffer from bad readability and other complexities which make them unusable for the intended purpose. To address this issue, we introduce PrivacyGuide, a privacy policy summarization tool inspired by the European Union (EU) General Data Protection Regulation (GDPR) and based on machine learning and natural language processing techniques. Our results show that PrivacyGuide is able to classify privacy policy content into eleven privacy aspects with a weighted average accuracy of 74% and further shed light on the associated risk level with an accuracy of 90%. This article is summarized in: the morning paper an interesting/influential/important paper from the world of CS every weekday morning, as selected by Adrian Colyer

2019-02-13
Feng, Y., Akiyama, H., Lu, L., Sakurai, K..  2018.  Feature Selection for Machine Learning-Based Early Detection of Distributed Cyber Attacks. 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech). :173–180.

It is well known that distributed cyber attacks simultaneously launched from many hosts have caused the most serious problems in recent years including problems of privacy leakage and denial of services. Thus, how to detect those attacks at early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command & Control) communication between compromised bots and the C&C server becomes a crucially important issue, because C&C communication is in the preparation phase of distributed attacks. Although attack detection based on signature has been practically applied since long ago, it is well-known that it cannot efficiently deal with new kinds of attacks. In recent years, ML(Machine learning)-based detection methods have been studied widely. In those methods, feature selection is obviously very important to the detection performance. We once utilized up to 55 features to pick out C&C traffic in order to accomplish early detection of DDoS attacks. In this work, we try to answer the question that "Are all of those features really necessary?" We mainly investigate how the detection performance moves as the features are removed from those having lowest importance and we try to make it clear that what features should be payed attention for early detection of distributed attacks. We use honeypot data collected during the period from 2008 to 2013. SVM(Support Vector Machine) and PCA(Principal Component Analysis) are utilized for feature selection and SVM and RF(Random Forest) are for building the classifier. We find that the detection performance is generally getting better if more features are utilized. However, after the number of features has reached around 40, the detection performance will not change much even more features are used. It is also verified that, in some specific cases, more features do not always means a better detection performance. We also discuss 10 important features which have the biggest influence on classification.

2019-02-08
Wang, M., Zhu, W., Yan, S., Wang, Q..  2018.  SoundAuth: Secure Zero-Effort Two-Factor Authentication Based on Audio Signals. 2018 IEEE Conference on Communications and Network Security (CNS). :1-9.

Two-factor authentication (2FA) popularly works by verifying something the user knows (a password) and something she possesses (a token, popularly instantiated with a smart phone). Conventional 2FA systems require extra interaction like typing a verification code, which is not very user-friendly. For improved user experience, recent work aims at zero-effort 2FA, in which a smart phone placed close to a computer (where the user enters her username/password into a browser to log into a server) automatically assists with the authentication. To prove her possession of the smart phone, the user needs to prove the phone is on the login spot, which reduces zero-effort 2FA to co-presence detection. In this paper, we propose SoundAuth, a secure zero-effort 2FA mechanism based on (two kinds of) ambient audio signals. SoundAuth looks for signs of proximity by having the browser and the smart phone compare both their surrounding sounds and certain unpredictable near-ultrasounds; if significant distinguishability is found, SoundAuth rejects the login request. For the ambient signals comparison, we regard it as a classification problem and employ a machine learning technique to analyze the audio signals. Experiments with real login attempts show that SoundAuth not only is comparable to existent schemes concerning utility, but also outperforms them in terms of resilience to attacks. SoundAuth can be easily deployed as it is readily supported by most smart phones and major browsers.

Nguyen, Sinh-Ngoc, Nguyen, Van-Quyet, Choi, Jintae, Kim, Kyungbaek.  2018.  Design and Implementation of Intrusion Detection System Using Convolutional Neural Network for DoS Detection. Proceedings of the 2Nd International Conference on Machine Learning and Soft Computing. :34-38.

Nowadays, network is one of the essential parts of life, and lots of primary activities are performed by using the network. Also, network security plays an important role in the administrator and monitors the operation of the system. The intrusion detection system (IDS) is a crucial module to detect and defend against the malicious traffics before the system is affected. This system can extract the information from the network system and quickly indicate the reaction which provides real-time protection for the protected system. However, detecting malicious traffics is very complicating because of their large quantity and variants. Also, the accuracy of detection and execution time are the challenges of some detection methods. In this paper, we propose an IDS platform based on convolutional neural network (CNN) called IDS-CNN to detect DoS attack. Experimental results show that our CNN based DoS detection obtains high accuracy at most 99.87%. Moreover, comparisons with other machine learning techniques including KNN, SVM, and Naïve Bayes demonstrate that our proposed method outperforms traditional ones.

Sisiaridis, D., Markowitch, O..  2018.  Reducing Data Complexity in Feature Extraction and Feature Selection for Big Data Security Analytics. 2018 1st International Conference on Data Intelligence and Security (ICDIS). :43-48.

Feature extraction and feature selection are the first tasks in pre-processing of input logs in order to detect cybersecurity threats and attacks by utilizing data mining techniques in the field of Artificial Intelligence. When it comes to the analysis of heterogeneous data derived from different sources, these tasks are found to be time-consuming and difficult to be managed efficiently. In this paper, we present an approach for handling feature extraction and feature selection utilizing machine learning algorithms for security analytics of heterogeneous data derived from different network sensors. The approach is implemented in Apache Spark, using its python API, named pyspark.

Clark, G., Doran, M., Glisson, W..  2018.  A Malicious Attack on the Machine Learning Policy of a Robotic System. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :516-521.

The field of robotics has matured using artificial intelligence and machine learning such that intelligent robots are being developed in the form of autonomous vehicles. The anticipated widespread use of intelligent robots and their potential to do harm has raised interest in their security. This research evaluates a cyberattack on the machine learning policy of an autonomous vehicle by designing and attacking a robotic vehicle operating in a dynamic environment. The primary contribution of this research is an initial assessment of effective manipulation through an indirect attack on a robotic vehicle using the Q learning algorithm for real-time routing control. Secondly, the research highlights the effectiveness of this attack along with relevant artifact issues.

Lee, D. ', La, W. Gyu, Kim, H..  2018.  Drone Detection and Identification System Using Artificial Intelligence. 2018 International Conference on Information and Communication Technology Convergence (ICTC). :1131-1133.

As drone attracts much interest, the drone industry has opened their market to ordinary people, making drones to be used in daily lives. However, as it got easier for drone to be used by more people, safety and security issues have raised as accidents are much more likely to happen: colliding into people by losing control or invading secured properties. For safety purposes, it is essential for observers and drone to be aware of an approaching drone. In this paper, we introduce a comprehensive drone detection system based on machine learning. This system is designed to be operable on drones with camera. Based on the camera images, the system deduces location on image and vendor model of drone based on machine classification. The system is actually built with OpenCV library. We collected drone imagery and information for learning process. The system's output shows about 89 percent accuracy.

Grieco, Gustavo, Dinaburg, Artem.  2018.  Toward Smarter Vulnerability Discovery Using Machine Learning. Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security. :48-56.

A Cyber Reasoning System (CRS) is designed to automatically find and exploit software vulnerabilities in complex software. To be effective, CRSs integrate multiple vulnerability detection tools (VDTs), such as symbolic executors and fuzzers. Determining which VDTs can best find bugs in a large set of target programs, and how to optimally configure those VDTs, remains an open and challenging problem. Current solutions are based on heuristics created by security analysts that rely on experience, intuition and luck. In this paper, we present Central Exploit Organizer (CEO), a proof-of-concept tool to optimize VDT selection. CEO uses machine learning to optimize the selection and configuration of the most suitable vulnerability detection tool. We show that CEO can predict the relative effectiveness of a given vulnerability detection tool, configuration, and initial input. The estimation accuracy presents an improvement between \$11%\$ and \$21%\$ over random selection. We are releasing CEO and our dataset as open source to encourage further research.

Liu, Xiaoyu, Huang, LiGuo, Ng, Vincent.  2018.  Effective API Recommendation Without Historical Software Repositories. Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. :282-292.
It is time-consuming and labor-intensive to learn and locate the correct API for programming tasks. Thus, it is beneficial to perform API recommendation automatically. The graph-based statistical model has been shown to recommend top-10 API candidates effectively. It falls short, however, in accurately recommending an actual top-1 API. To address this weakness, we propose RecRank, an approach and tool that applies a novel ranking-based discriminative approach leveraging API usage path features to improve top-1 API recommendation. Empirical evaluation on a large corpus of (1385+8) open source projects shows that RecRank significantly improves top-1 API recommendation accuracy and mean reciprocal rank when compared to state-of-the-art API recommendation approaches.
2019-01-31
Zhao, Jianxin, Mortier, Richard, Crowcroft, Jon, Wang, Liang.  2018.  Privacy-Preserving Machine Learning Based Data Analytics on Edge Devices. Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. :341–346.

Emerging Machine Learning (ML) techniques, such as Deep Neural Network, are widely used in today's applications and services. However, with social awareness of privacy and personal data rapidly rising, it becomes a pressing and challenging societal issue to both keep personal data private and benefit from the data analytics power of ML techniques at the same time. In this paper, we argue that to avoid those costs, reduce latency in data processing, and minimise the raw data revealed to service providers, many future AI and ML services could be deployed on users' devices at the Internet edge rather than putting everything on the cloud. Moving ML-based data analytics from cloud to edge devices brings a series of challenges. We make three contributions in this paper. First, besides the widely discussed resource limitation on edge devices, we further identify two other challenges that are not yet recognised in existing literature: lack of suitable models for users, and difficulties in deploying services for users. Second, we present preliminary work of the first systematic solution, i.e. Zoo, to fully support the construction, composing, and deployment of ML models on edge and local devices. Third, in the deployment example, ML service are proved to be easy to compose and deploy with Zoo. Evaluation shows its superior performance compared with state-of-art deep learning platforms and Google ML services.

Shahbar, K., Zincir-Heywood, A. N..  2018.  How Far Can We Push Flow Analysis to Identify Encrypted Anonymity Network Traffic? NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium. :1–6.

Anonymity networks provide privacy to the users by relaying their data to multiple destinations in order to reach the final destination anonymously. Multilayer of encryption is used to protect the users' privacy from attacks or even from the operators of the stations. In this research, we showed how flow analysis could be used to identify encrypted anonymity network traffic under four scenarios: (i) Identifying anonymity networks compared to normal background traffic; (ii) Identifying the type of applications used on the anonymity networks; (iii) Identifying traffic flow behaviors of the anonymity network users; and (iv) Identifying / profiling the users on an anonymity network based on the traffic flow behavior. In order to study these, we employ a machine learning based flow analysis approach and explore how far we can push such an approach.