Biblio
Filters: Keyword is Measurement [Clear All Filters]
Big Data Nearest Neighbor Similar Data Retrieval Algorithm based on Improved Random Forest. 2021 International Conference on Big Data Analysis and Computer Science (BDACS). :175—178.
.
2021. In the process of big data nearest neighbor similar data retrieval, affected by the way of data feature extraction, the retrieval accuracy is low. Therefore, this paper proposes the design of big data nearest neighbor similar data retrieval algorithm based on improved random forest. Through the improvement of random forest model and the construction of random decision tree, the characteristics of current nearest neighbor big data are clarified. Based on the improved random forest, the hash code is generated. Finally, combined with the Hamming distance calculation method, the nearest neighbor similar data retrieval of big data is realized. The experimental results show that: in the multi label environment, the retrieval accuracy is improved by 9% and 10%. In the single label environment, the similar data retrieval accuracy of the algorithm is improved by 12% and 28% respectively.
Privacy-Preserving near Neighbor Search via Sparse Coding with Ambiguation. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :2635—2639.
.
2021. In this paper, we propose a framework for privacy-preserving approximate near neighbor search via stochastic sparsifying encoding. The core of the framework relies on sparse coding with ambiguation (SCA) mechanism that introduces the notion of inherent shared secrecy based on the support intersection of sparse codes. This approach is ‘fairness-aware’, in the sense that any point in the neighborhood has an equiprobable chance to be chosen. Our approach can be applied to raw data, latent representation of autoencoders, and aggregated local descriptors. The proposed method is tested on both synthetic i.i.d data and real image databases.
K-mean Index Learning for Multimedia Datasets. 2021 13th International Conference on Knowledge and Smart Technology (KST). :6—11.
.
2021. Currently, one method to deal with the storage and computation of multimedia retrieval applications is an approximate nearest neighbor (ANN) search. Hashing algorithms and Vector quantization (VQ) are widely used in ANN search. So, K-mean clustering is a method of VQ that can solve those problems. With the increasing growth of multimedia data such as text view, image view, video view, audio view, and 3D view. Thus, it is a reason that why multimedia retrieval is very important. We can retrieve the results of each media type by inputting a query of that type. Even though many hashing algorithms and VQ techniques are proposed to produce a compact or short binary codes. In the real-time purposes the exhaustive search is impractical, and Hamming distance computation in the Hamming space suffers inaccurate results. The challenge of this paper is focusing on how to learn multimedia raw data or features representation to search on each media type for multimedia retrieval. So we propose a new search method that utilizes K-mean hash codes by computing the probability of a cluster in the index code. The proposed employs the index code from the K-mean cluster number that is converted to hash code. The inverted index table is constructed basing on the K-mean hash code. Then we can improve the original K-mean index accuracy and efficiency by learning a deep neural network (DNN). We performed the experiments on four benchmark multimedia datasets to retrieve each view such as 3D, image, video, text, and audio, where hash codes are produced by K-mean clustering methods. Our results show the effectiveness boost the performance on the baseline (exhaustive search).
k-Nearest Neighbor algorithm based on feature subspace. 2021 International Conference on Big Data Analysis and Computer Science (BDACS). :225—228.
.
2021. The traditional KNN algorithm takes insufficient consideration of the spatial distribution of training samples, which leads to low accuracy in processing high-dimensional data sets. Moreover, the generation of k nearest neighbors requires all known samples to participate in the distance calculation, resulting in high time overhead. To solve these problems, a feature subspace based KNN algorithm (Feature Subspace KNN, FSS-KNN) is proposed in this paper. First, the FSS-KNN algorithm solves all the feature subspaces according to the distribution of the training samples in the feature space, so as to ensure that the samples in the same subspace have higher similarity. Second, the corresponding feature subspace is matched for the test set samples. On this basis, the search of k nearest neighbors is carried out in the corresponding subspace first, thus improving the accuracy and efficiency of the algorithm. Experimental results show that compared with the traditional KNN algorithm, FSS-KNN algorithm improves the accuracy and efficiency on Kaggle data set and UCI data set. Compared with the other four classical machine learning algorithms, FSS-KNN algorithm can significantly improve the accuracy.
In-Memory Nearest Neighbor Search with FeFET Multi-Bit Content-Addressable Memories. 2021 Design, Automation Test in Europe Conference Exhibition (DATE). :1084—1089.
.
2021. Nearest neighbor (NN) search is an essential operation in many applications, such as one/few-shot learning and image classification. As such, fast and low-energy hardware support for accurate NN search is highly desirable. Ternary content-addressable memories (TCAMs) have been proposed to accelerate NN search for few-shot learning tasks by implementing \$L\$∞ and Hamming distance metrics, but they cannot achieve software-comparable accuracies. This paper proposes a novel distance function that can be natively evaluated with multi-bit content-addressable memories (MCAMs) based on ferroelectric FETs (Fe-FETs) to perform a single-step, in-memory NN search. Moreover, this approach achieves accuracies comparable to floating-point precision implementations in software for NN classification and one/few-shot learning tasks. As an example, the proposed method achieves a 98.34% accuracy for a 5-way, 5-shot classification task for the Omniglot dataset (only 0.8% lower than software-based implementations) with a 3-bit MCAM. This represents a 13% accuracy improvement over state-of-the-art TCAM-based implementations at iso-energy and iso-delay. The presented distance function is resilient to the effects of FeFET device-to-device variations. Furthermore, this work experimentally demonstrates a 2-bit implementation of FeFET MCAM using AND arrays from GLOBALFOUNDRIES to further validate proof of concept.
Enhancing Learned Index for A Higher Recall Trajectory K-Nearest Neighbor Search. 2021 IEEE International Conference on Big Data (Big Data). :6006—6007.
.
2021. Learned indices can significantly shorten the query response time of k-Nearest Neighbor search of points data. However, extending the learned index for k-Nearest Neighbor search of trajectory data may return incorrect results (low recall) and require longer pruning time. Thus, we introduce an enhancement for trajectory learned index which is a pruning step for a learned index to retrieve the k-Nearest Neighbors correctly by learning the query workload. The pruning utilizes a predicted range query that covers the correct neighbors. We show that that our approach has the potential to work effectively in a large real-world trajectory dataset.
Nearest Neighbor Search In Hyperspectral Data Using Binary Space Partitioning Trees. 2021 11th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS). :1—4.
.
2021. Fast search of hyperspectral data is crucial in many practical applications ranging from classification to finding duplicate fragments in images. In this paper, we evaluate two space partitioning data structures in the task of searching hyperspectral data. In particular, we consider vp-trees and ball-trees, study several tree construction algorithms, and compare these structures with the brute force approach. In addition, we evaluate vp-trees and ball-trees with four similarity measures, namely, Euclidean Distance, Spectral Angle Mapper Bhattacharyya Angle, and Hellinger distance.
Design of nearest neighbor search for dynamic interaction points. 2021 2nd International Conference on Big Data and Informatization Education (ICBDIE). :389—393.
.
2021. This article describes the definition, theoretical derivation, design ideas, and specific implementation of the nearest query algorithm for the acceleration of probabilistic optimization at first, and secondly gives an optimization conclusion that is generally applicable to high-dimensional Minkowski spaces with even-numbered feature parameters. Thirdly the operating efficiency and space sensitivity of this algorithm and the commonly used algorithms are compared from both theoretical and experimental aspects. Finally, the optimization direction is analyzed based on the results.
Accelerating Large-Scale Nearest Neighbor Search with Computational Storage Device. 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). :254—254.
.
2021. K-nearest neighbor algorithm that searches the K closest samples in a high dimensional feature space is one of the most fundamental tasks in machine learning and image retrieval applications. Computational storage device that combines computing unit and storage module on a single board becomes popular to address the data bandwidth bottleneck of the conventional computing system. In this paper, we propose a nearest neighbor search acceleration platform based on computational storage device, which can process a large-scale image dataset efficiently in terms of speed, energy, and cost. We believe that the proposed acceleration platform is promising to be deployed in cloud datacenters for data-intensive applications.
Analysis of Interest and Data Packet Behaviour in Vehicular Named Data Network. 2021 IEEE Madras Section Conference (MASCON). :1–5.
.
2021. Named Data Network (NDN) is considered to be the future of Internet architecture. The nature of NDN is to disseminate data based on the naming scheme rather than the location of the node. This feature caters to the need of vehicular applications, resulting in Vehicular Named Data Networks (VNDN). Although it is still in the initial stages of research, the collaboration has assured various advantages which attract the researchers to explore the architecture further. VNDN face challenges such as intermittent connectivity, mobility of nodes, design of efficient forwarding and naming schemes, among others. In order to develop effective forwarding strategies, behavior of data and interest packets under various circumstances needs to be studied. In this paper, propagation behavior of data and interest packets is analyzed by considering metrics such as Interest Satisfaction Ratio (ISR), Hop Count Difference (HCD) and Copies of Data Packets Processed (CDPP). These metrics are evaluated under network conditions such as varying network size, node mobility and amount of interest produced by each node. Simulation results show that data packets do not follow the reverse path of interest packets.
A Novel Assessment Metric for Intelligent Fault Diagnosis of Rolling Bearings with Different Fault Severities and Orientations. 2021 7th International Conference on Condition Monitoring of Machinery in Non-Stationary Operations (CMMNO). :225–228.
.
2021. The output of rolling bearings, as one of the most widely used support elements, has a significant impact on the equipment's stability and protection. Automatic and effective mining of features representing performance condition plays an important role in ensuring its reliability. However, in the actual process, there are often differences in the quality of features extracted from feature engineering, and this difference cannot be evaluated by commonly used methods, such as correlation metric and monotonicity metric. In order to accurately and automatically evaluate and select effective features, a novel assessment metric is established based on the attributes of the feature itself. Firstly, the features are extracted from different domains, which contain differential information, and a feature set is constructed. Secondly, the performances of the features are evaluated and selected based on internal distance and external distance, which is a novel feature evaluation model for classification task. Finally, an adaptive boosting strategy that combines multiple weak learners is adopted to achieve the fault identification at different severities and orientations. One experimental bearing dataset is adopted to analyze, and effectiveness and accuracy of proposed metric index is verified.
A Framework For Network Intrusion Detection Based on Unsupervised Learning. 2021 IEEE International Conference on Artificial Intelligence and Industrial Design (AIID). :188–193.
.
2021. Anomaly detection is the primary method of detecting intrusion. Unsupervised models, such as auto-encoders network, auto-encoder, and GMM, are currently the most widely used anomaly detection techniques. In reality, the samples used to train the unsupervised model may not be pure enough and may include some abnormal samples. However, the classification effect is poor since these approaches do not completely understand the association between reconstruction errors, reconstruction characteristics, and irregular sample density distribution. This paper proposes a novel intrusion detection system architecture that includes data collection, processing, and feature extraction by integrating data reconstruction features, reconstruction errors, auto-encoder parameters, and GMM. Our system outperforms other unsupervised learning-based detection approaches in terms of accuracy, recall, F1-score, and other assessment metrics after training and testing on multiple intrusion detection data sets.
A Comprehensive Data Sampling Analysis Applied to the Classification of Rare IoT Network Intrusion Types. 2021 IEEE 18th Annual Consumer Communications Networking Conference (CCNC). :1–2.
.
2021. With the rapid growth of Internet of Things (IoT) network intrusion attacks, there is a critical need for sophisticated and comprehensive intrusion detection systems (IDSs). Classifying infrequent intrusion types such as root-to-local (R2L) and user-to-root (U2R) attacks is a reoccurring problem for IDSs. In this study, various data sampling and class balancing techniques-Generative Adversarial Network (GAN)-based oversampling, k-nearest-neighbor (kNN) oversampling, NearMiss-1 undersampling, and class weights-were used to resolve the severe class imbalance affecting U2R and R2L attacks in the NSL-KDD intrusion detection dataset. Artificial Neural Networks (ANNs) were trained on the adjusted datasets, and their performances were evaluated with a multitude of classification metrics. Here, we show that using no data sampling technique (baseline), GAN-based oversampling, and NearMiss-l undersampling, all with class weights, displayed high performances in identifying R2L and U2R attacks. Of these, the baseline with class weights had the highest overall performance with an F1-score of 0.11 and 0.22 for the identification of U2R and R2L attacks, respectively.
Making Obfuscated PUFs Secure Against Power Side-Channel Based Modeling Attacks. 2021 Design, Automation Test in Europe Conference Exhibition (DATE). :1000–1005.
.
2021. To enhance the security of digital circuits, there is often a desire to dynamically generate, rather than statically store, random values used for identification and authentication purposes. Physically Unclonable Functions (PUFs) provide the means to realize this feature in an efficient and reliable way by utilizing commonly overlooked process variations that unintentionally occur during the manufacturing of integrated circuits (ICs) due to the imperfection of fabrication process. When given a challenge, PUFs produce a unique response. However, PUFs have been found to be vulnerable to modeling attacks where by using a set of collected challenge response pairs (CRPs) and training a machine learning model, the response can be predicted for unseen challenges. To combat this vulnerability, researchers have proposed techniques such as Challenge Obfuscation. However, as shown in this paper, this technique can be compromised via modeling the PUF's power side-channel. We first show the vulnerability of a state-of-the-art Challenge Obfuscated PUF (CO-PUF) against power analysis attacks by presenting our attack results on the targeted CO-PUF. Then we propose two countermeasures, as well as their hybrid version, that when applied to the CO-PUFs make them resilient against power side-channel based modeling attacks. We also provide some insights on the proper design metrics required to be taken when implementing these mitigations. Our simulation results show the high success of our attack in compromising the original Challenge Obfuscated PUFs (success rate textgreater 98%) as well as the significant improvement on resilience of the obfuscated PUFs against power side-channel based modeling when equipped with our countermeasures.
Translating Intrusion Alerts to Cyberattack Stages Using Pseudo-Active Transfer Learning (PATRL). 2021 IEEE Conference on Communications and Network Security (CNS). :110–118.
.
2021. Intrusion alerts continue to grow in volume, variety, and complexity. Its cryptic nature requires substantial time and expertise to interpret the intended consequence of observed malicious actions. To assist security analysts in effectively diagnosing what alerts mean, this work develops a novel machine learning approach that translates alert descriptions to intuitively interpretable Action-Intent-Stages (AIS) with only 1% labeled data. We combine transfer learning, active learning, and pseudo labels and develop the Pseudo-Active Transfer Learning (PATRL) process. The PATRL process begins with an unsupervised-trained language model using MITRE ATT&CK, CVE, and IDS alert descriptions. The language model feeds to an LSTM classifier to train with 1% labeled data and is further enhanced with active learning using pseudo labels predicted by the iteratively improved models. Our results suggest PATRL can predict correctly for 85% (top-1 label) and 99% (top-3 labels) of the remaining 99% unknown data. Recognizing the need to build confidence for the analysts to use the model, the system provides Monte-Carlo Dropout Uncertainty and Pseudo-Label Convergence Score for each of the predicted alerts. These metrics give the analyst insights to determine whether to directly trust the top-1 or top-3 predictions and whether additional pseudo labels are needed. Our approach overcomes a rarely tackled research problem where minimal amounts of labeled data do not reflect the truly unlabeled data's characteristics. Combining the advantages of transfer learning, active learning, and pseudo labels, the PATRL process translates the complex intrusion alert description for the analysts with confidence.
Automated Security Assessment for the Internet of Things. 2021 IEEE 26th Pacific Rim International Symposium on Dependable Computing (PRDC). :47–56.
.
2021. Internet of Things (IoT) based applications face an increasing number of potential security risks, which need to be systematically assessed and addressed. Expert-based manual assessment of IoT security is a predominant approach, which is usually inefficient. To address this problem, we propose an automated security assessment framework for IoT networks. Our framework first leverages machine learning and natural language processing to analyze vulnerability descriptions for predicting vulnerability metrics. The predicted metrics are then input into a two-layered graphical security model, which consists of an attack graph at the upper layer to present the network connectivity and an attack tree for each node in the network at the bottom layer to depict the vulnerability information. This security model automatically assesses the security of the IoT network by capturing potential attack paths. We evaluate the viability of our approach using a proof-of-concept smart building system model which contains a variety of real-world IoT devices and poten-tial vulnerabilities. Our evaluation of the proposed framework demonstrates its effectiveness in terms of automatically predicting the vulnerability metrics of new vulnerabilities with more than 90% accuracy, on average, and identifying the most vulnerable attack paths within an IoT network. The produced assessment results can serve as a guideline for cybersecurity professionals to take further actions and mitigate risks in a timely manner.
Cyberbullying Predictive Model: Implementation of Machine Learning Approach. 2021 Fifth International Conference on Information Retrieval and Knowledge Management (CAMP). :65–69.
.
2021. Machine learning is implemented extensively in various applications. The machine learning algorithms teach computers to do what comes naturally to humans. The objective of this study is to do comparison on the predictive models in cyberbullying detection between the basic machine learning system and the proposed system with the involvement of feature selection technique, resampling and hyperparameter optimization by using two classifiers; Support Vector Classification Linear and Decision Tree. Corpus from ASKfm used to extract word n-grams features before implemented into eight different experiments setup. Evaluation on performance metric shows that Decision Tree gives the best performance when tested using feature selection without resampling and hyperparameter optimization involvement. This shows that the proposed system is better than the basic setting in machine learning.
Investigating the Changes in Software Metrics after Vulnerability Is Fixed. 2021 IEEE International Conference on Big Data (Big Data). :5658–5663.
.
2021. Preventing software vulnerabilities while writing code is one of the most effective ways for avoiding cyber attacks on any developed system. Although developers follow some standard guiding principles for ensuring secure code, the code can still have security bottlenecks and be compromised by an attacker. Therefore, assessing software security while developing code can help developers in writing vulnerability free code. Researchers have already focused on metrics-based and text mining based software vulnerability prediction models. The metrics based models showed higher precision in predicting vulnerabilities although the recall rate is low. In addition, current research did not investigate the impact of individual software metric on the occurrences of vulnerabilities. The main objective of this paper is to track the changes in every software metric after the developer fixes a particular vulnerability. The results of our research will potentially motivate further research on building more accurate vulnerability prediction models based on the appropriate software metrics. In particular, we have compared a total of 250 files from Apache Tomcat and Apache CXF. These files were extracted from the Apache database and were chosen because Apache released these files as vulnerable in their publicly available security advisories. Using a static analysis tool, metrics of the targeted vulnerable files and relevant fixed files (files where vulnerable code is removed by the developers) were extracted and compared. We show that eight of the 40 metrics have an average increase of 2% from vulnerable to fixed files. These metrics include CountDeclClass, CountDeclClassMethod, CountDeclClassVariable, CountDeclInstanceVariable, CountDeclMethodDefault, CountLineCode, MaxCyclomaticStrict, MaxNesting. This study will help developers to assess software security through utilizing software metrics in secure coding practices.
Deep Convolutional Embedding for Digitized Painting Clustering. 2020 25th International Conference on Pattern Recognition (ICPR). :2708–2715.
.
2021. Clustering artworks is difficult for several reasons. On the one hand, recognizing meaningful patterns in accordance with domain knowledge and visual perception is extremely difficult. On the other hand, applying traditional clustering and feature reduction techniques to the highly dimensional pixel space can be ineffective. To address these issues, we propose to use a deep convolutional embedding model for digitized painting clustering, in which the task of mapping the raw input data to an abstract, latent space is jointly optimized with the task of finding a set of cluster centroids in this latent feature space. Quantitative and qualitative experimental results show the effectiveness of the proposed method. The model is also capable of outperforming other state-of-the-art deep clustering approaches to the same problem. The proposed method can be useful for several art-related tasks, in particular visual link retrieval and historical knowledge discovery in painting datasets.
Accelerating Federated Edge Learning via Optimized Probabilistic Device Scheduling. 2021 IEEE 22nd International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). :606–610.
.
2021. The popular federated edge learning (FEEL) framework allows privacy-preserving collaborative model training via frequent learning-updates exchange between edge devices and server. Due to the constrained bandwidth, only a subset of devices can upload their updates at each communication round. This has led to an active research area in FEEL studying the optimal device scheduling policy for minimizing communication time. However, owing to the difficulty in quantifying the exact communication time, prior work in this area can only tackle the problem partially by considering either the communication rounds or per-round latency, while the total communication time is determined by both metrics. To close this gap, we make the first attempt in this paper to formulate and solve the communication time minimization problem. We first derive a tight bound to approximate the communication time through cross-disciplinary effort involving both learning theory for convergence analysis and communication theory for per-round latency analysis. Building on the analytical result, an optimized probabilistic scheduling policy is derived in closed-form by solving the approximate communication time minimization problem. It is found that the optimized policy gradually turns its priority from suppressing the remaining communication rounds to reducing per-round latency as the training process evolves. The effectiveness of the proposed scheme is demonstrated via a use case on collaborative 3D objective detection in autonomous driving.
Automated Deviation Detection for Partially-Observable Human-Intensive Assembly Processes. 2021 IEEE 19th International Conference on Industrial Informatics (INDIN). :1–8.
.
2021. Unforeseen situations on the shopfloor cause the assembly process to divert from its expected progress. To be able to overcome these deviations in a timely manner, assembly process monitoring and early deviation detection are necessary. However, legal regulations and union policies often limit the direct monitoring of human-intensive assembly processes. Grounded in an industry use case, this paper outlines a novel approach that, based on indirect privacy-respecting monitored data from the shopfloor, enables the near real-time detection of multiple types of process deviations. In doing so, this paper specifically addresses uncertainties stemming from indirect shopfloor observations and how to reason in their presence.
Privacy-Preserving Collaborative Learning with Automatic Transformation Search. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :114–123.
.
2021. Collaborative learning has gained great popularity due to its benefit of data privacy protection: participants can jointly train a Deep Learning model without sharing their training sets. However, recent works discovered that an adversary can fully recover the sensitive training samples from the shared gradients. Such reconstruction attacks pose severe threats to collaborative learning. Hence, effective mitigation solutions are urgently desired.In this paper, we propose to leverage data augmentation to defeat reconstruction attacks: by preprocessing sensitive images with carefully-selected transformation policies, it becomes infeasible for the adversary to extract any useful information from the corresponding gradients. We design a novel search method to automatically discover qualified policies. We adopt two new metrics to quantify the impacts of transformations on data privacy and model usability, which can significantly accelerate the search speed. Comprehensive evaluations demonstrate that the policies discovered by our method can defeat existing reconstruction attacks in collaborative learning, with high efficiency and negligible impact on the model performance.
Residential Sector Demand Side Management: A Review. 2021 1st Odisha International Conference on Electrical Power Engineering, Communication and Computing Technology(ODICON). :1–6.
.
2021. Demand-side management (DSM) plays a significant function in the smart distribution system to make informed decisions from both the consumer and supplier side with regards to energy consumption to redesign the load profile and to decrease the peak load demand. This study extensively reviews the demand-side management (DSM) strategies along with both demand response and energy efficiency policies. The major objective of this paper is to enumerate the relevant features responsible to strengthen the DSM effectively, particularly for residential energy demand and the limits to energy indicators. Secondly, the large untapped and hidden potential and the associated barriers to energy efficiency enhancement are focused and surveyed for formulating a better number of potential policy responses. This further explores the portfolio approach with bundled strategies to reflect on the power market through enhancing the strength of individual residential measures through complementary policies to reduce the weaknesses. This concludes at last with the findings of possible holistic measures related to various approaches and attributes findings that reinforce the DSM strategies to enhance energy management and cost-effectiveness. Apart from that the architecture, formulation of optimization problems, and various approaches are presented to help the readers to develop research in this direction to maximize the total system peak demand, overall load factor, and utility revenue with the minimized customer electric bill.
A Blockchain-Based Evidential and Secure Bulk-Commodity Supervisory System. 2021 International Conference on Service Science (ICSS). :1–6.
.
2021. In recent years, the commodities industry has grown rapidly under the stimulus of domestic demand and the expansion of cross-border trade. It has also been combined with the rapid development of e-commerce technology in the same period to form a flexible and efficient e-commerce system for bulk commodities. However, the hasty combination of both has inspired a lack of effective regulatory measures in the bulk industry, leading to constant industry chaos. Among them, the problem of lagging evidence in regulatory platforms is particularly prominent. Based on this, we design a blockchain-based evidential and secure bulk-commodity supervisory system (abbr. BeBus). Setting different privacy protection policies for each participant in the system, the solution ensures effective forensics and tamper-proof evidence to meet the needs of the bulk business scenario.
A Model-Based Approach to Realize Privacy and Data Protection by Design. 2021 IEEE European Symposium on Security and Privacy Workshops (EuroS PW). :332–339.
.
2021. Telecommunications and data are pervasive in almost each aspect of our every-day life and new concerns progressively arise as a result of stakes related to privacy and data protection [1]. Indeed, systems development becomes data-centric leading to an ecosystem where a variety of players intervene (citizens, industry, regulators) and where the policies regarding data usage and utilization are far from consensual. The new General Data Protection Regulation (GDPR) enacted by the European Commission in 2018 has introduced new provisions including principles for lawfulness, fairness, transparency, etc. thus endorsing data subjects with new rights in regards to their personal data. In this context, a growing need for approaches that conceptualize and help engineers to integrate GDPR and privacy provisions at design time becomes paramount. This paper presents a comprehensive approach to support different phases of the design process with special attention to the integration of privacy and data protection principles. Among others, it is a generic model-based approach that can be specialized according to the specifics of different application domains.