Biblio

Filters: Keyword is Knowledge discovery
2023-08-11
Wang, Jing, Wu, Fengheng, Zhang, Tingbo, Wu, Xiaohua.  2022.  DPP: Data Privacy-Preserving for Cloud Computing based on Homomorphic Encryption. 2022 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :29–32.
Cloud computing has been widely used because of its low price, high reliability, and generality of services. However, considering that cloud computing transactions between users and service providers are usually asynchronous, data privacy involving users and service providers may lead to a crisis of trust, which in turn hinders the expansion of cloud computing applications. In this paper, we propose DPP, a data privacy-preserving cloud computing scheme based on homomorphic encryption, which achieves correctness, compatibility, and security. DPP preserves data privacy by introducing homomorphic encryption. To verify the security of DPP, we instantiate DPP based on the Paillier homomorphic encryption scheme and evaluate the performance. The experimental results show that the time consumption of the key steps in the DPP scheme is reasonable and acceptable.
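The abstract above names the Paillier scheme but does not show how its additive homomorphism works. The following is a minimal sketch of textbook Paillier with the common simplification g = n + 1. It is not the DPP scheme itself. The function names and the toy primes are assumptions chosen for illustration, and the key size is far too small for real use.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two primes (illustrative only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                      # common simplification of the generator
    mu = pow(lam, -1, n)           # with g = n + 1, mu is lam^-1 mod n
    return (n, g), (lam, mu)

def encrypt(pub, m: int) -> int:
    """Encrypt m in [0, n) as g^m * r^n mod n^2 with a random r coprime to n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c: int) -> int:
    """Recover m using L(u) = (u - 1) / n applied to c^lam mod n^2."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    n, _ = pub
    return (c1 * c2) % (n * n)

if __name__ == "__main__":
    pub, priv = keygen(1009, 1013)  # toy primes, assumed for the demo
    total = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
    print(decrypt(pub, priv, total))
```

A server holding only the public key can call `add_encrypted` on two ciphertexts without learning either plaintext, which is the property a scheme of this kind relies on.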
2023-05-12
Pupezescu, Valentin, Pupezescu, Marilena-Cătălina, Perișoară, Lucian-Andrei.  2022.  Optimizations of Database Management Systems for Real Time IoT Edge Applications. 2022 23rd International Carpathian Control Conference (ICCC). :171–176.

The exponential growth of IoT-type systems has led to a reconsideration of the field of database management systems in terms of storing and handling high-volume data. Recently, many real-time Database Management Systems (DBMS) have been developed to address issues such as security, managing concurrent access to stored data, and optimizing data query performance. This paper studies methods that reduce the temporal validity range for common DBMS. The primary purpose of IoT edge devices is to generate data and make it available for machine learning or statistical algorithms. This is achieved inside the Knowledge Discovery in Databases process. In order to visualize and obtain critical Data Mining results, all the device-generated data must be made available as fast as possible for selection, preprocessing and data transformation. In this research we investigate whether IoT edge devices can be used with properly configured common DBMS in order to access data fast, instead of working with Real Time DBMS. We will study what kind of transactions are needed in large IoT ecosystems and we will analyze the techniques of controlling concurrent access to common resources (stored data). For this purpose, we built a series of applications that are able to simulate concurrent writing operations to a common DBMS in order to investigate the performance of concurrent access to database resources. Another important procedure that will be tested with the developed applications will be to increase the availability of data for users and data mining applications. This will be achieved by using field indexing.
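The abstract above closes by mentioning field indexing as a way to make stored data available faster. The following is a minimal sketch of that idea using Python's built-in `sqlite3` module. The table name `readings`, its columns, and the index name are hypothetical, and SQLite merely stands in for whichever common DBMS the authors configured.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's textual query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device_id INTEGER, ts INTEGER, value REAL)")
    # Simulated device-generated rows: 50 devices, 20 readings each.
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [(d, t, d * 0.1 + t) for d in range(50) for t in range(20)],
    )
    sql = "SELECT value FROM readings WHERE device_id = ?"
    before = query_plan(conn, sql, (7,))   # no index yet: full table scan
    conn.execute("CREATE INDEX idx_readings_device ON readings (device_id)")
    after = query_plan(conn, sql, (7,))    # planner can now search the index
    conn.close()
    return before, after

if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

Comparing the two plans shows the lookup changing from a scan of every row to a search on the indexed field. An index speeds up reads but adds work to each write, so under heavy concurrent writing the trade-off would need to be measured.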

2022-08-12
Bendre, Nihar, Desai, Kevin, Najafirad, Peyman.  2021.  Show Why the Answer is Correct! Towards Explainable AI using Compositional Temporal Attention. 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). :3006–3012.
Visual Question Answering (VQA) models have achieved significant success in recent times. Despite the success of VQA models, they are mostly black-box models providing no reasoning about the predicted answer, thus raising questions about their applicability in safety-critical domains such as autonomous systems and cyber-security. Current state-of-the-art models fail to better understand complex questions and thus are unable to exploit compositionality. To minimize the black-box effect of these models and also to make them better exploit compositionality, we propose a Dynamic Neural Network (DMN), which can understand a particular question and then dynamically assemble various relatively shallow deep learning modules from a pool of modules to form a network. We incorporate compositional temporal attention into these deep learning based modules to increase compositionality exploitation. This results in a better understanding of complex questions and also provides reasoning as to why the module predicts a particular answer. Experimental analysis on the two benchmark datasets, VQA2.0 and CLEVR, shows that our model outperforms the previous approaches on the Visual Question Answering task and provides better reasoning, thus making it reliable for mission-critical applications like safety and security.
2022-07-12
Özdemir, Durmuş, Çelik, Dilek.  2021.  Analysis of Encrypted Image Data with Deep Learning Models. 2021 International Conference on Information Security and Cryptology (ISCTURKEY). :121–126.
While various encryption algorithms ensure data security, it is essential to determine the accuracy, loss values, and performance of analyses that attempt to identify encrypted data with deep learning. In this research, the analysis steps carried out by applying deep learning methods to encrypted CIFAR-10 image data are presented in practical terms. The VGG16, VGG19, and ResNet50 deep learning models were trained to predict the data. During training, the network's performance was measured, and the resulting accuracy and loss values were shown graphically.
2022-06-08
Imtiaz, Sayem Mohammad, Sultana, Kazi Zakia, Varde, Aparna S.  2021.  Mining Learner-friendly Security Patterns from Huge Published Histories of Software Applications for an Intelligent Tutoring System in Secure Coding. 2021 IEEE International Conference on Big Data (Big Data). :4869–4876.

Security patterns are proven solutions to recurring problems in software development. The growing importance of secure software development has introduced diverse research efforts on security patterns that mostly focused on classification schemes, evolution and evaluation of the patterns. Despite a huge mature history of research and popularity among researchers, security patterns have not fully penetrated software development practices. Besides, software security education has not been benefited by these patterns though a commonly stated motivation is the dissemination of expert knowledge and experience. This is because the patterns lack a simple embodiment to help students learn about vulnerable code, and to guide new developers on secure coding. In order to address this problem, we propose to conduct intelligent data mining in the context of software engineering to discover learner-friendly software security patterns. Our proposed model entails knowledge discovery from large scale published real-world vulnerability histories in software applications. We harness association rule mining for frequent pattern discovery to mine easily comprehensible and explainable learner-friendly rules, mainly of the type "flaw implies fix" and "attack type implies flaw", so as to enhance training in secure coding which in turn would augment secure software development. We propose to build a learner-friendly intelligent tutoring system (ITS) based on the newly discovered security patterns and rules explored. We present our proposed model based on association rule mining in secure software development with the goal of building this ITS. Our proposed model and prototype experiments are discussed in this paper along with challenges and ongoing work.
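The abstract above describes mining rules of the form "flaw implies fix" with association rule mining. The following is a minimal sketch of how such pairwise rules could be scored by support and confidence. The record format, the item labels, and the function name `mine_rules` are hypothetical and do not reproduce the authors' data or pipeline.

```python
from collections import Counter
from itertools import permutations

def mine_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = set(t)
        item_counts.update(items)
        # Ordered pairs, so (a, b) and (b, a) are scored as separate rules.
        pair_counts.update(permutations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                 # share of records containing both items
        confidence = count / item_counts[a] # share of records with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))

if __name__ == "__main__":
    # Hypothetical vulnerability-history records, one set of labels per report.
    records = [
        {"flaw:sql_injection", "fix:parameterized_query"},
        {"flaw:sql_injection", "fix:parameterized_query"},
        {"flaw:sql_injection", "fix:input_validation"},
        {"flaw:xss", "fix:output_encoding"},
    ]
    for a, b, s, c in mine_rules(records, min_support=0.5):
        print(f"{a} -> {b}  support={s:.2f} confidence={c:.2f}")
```

A full Apriori implementation would also grow larger itemsets level by level. Restricting the sketch to pairs keeps the rules as short as the "flaw implies fix" examples quoted in the abstract.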

2022-02-24
Castellano, Giovanna, Vessio, Gennaro.  2021.  Deep Convolutional Embedding for Digitized Painting Clustering. 2020 25th International Conference on Pattern Recognition (ICPR). :2708–2715.
Clustering artworks is difficult for several reasons. On the one hand, recognizing meaningful patterns in accordance with domain knowledge and visual perception is extremely difficult. On the other hand, applying traditional clustering and feature reduction techniques to the highly dimensional pixel space can be ineffective. To address these issues, we propose to use a deep convolutional embedding model for digitized painting clustering, in which the task of mapping the raw input data to an abstract, latent space is jointly optimized with the task of finding a set of cluster centroids in this latent feature space. Quantitative and qualitative experimental results show the effectiveness of the proposed method. The model is also capable of outperforming other state-of-the-art deep clustering approaches to the same problem. The proposed method can be useful for several art-related tasks, in particular visual link retrieval and historical knowledge discovery in painting datasets.
2021-09-30
Wang, Wei, Liu, Tieyuan, Chang, Liang, Gu, Tianlong, Zhao, Xuemei.  2020.  Convolutional Recurrent Neural Networks for Knowledge Tracing. 2020 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :287–290.
Knowledge Tracing (KT) is a task that aims to assess students' mastery level of knowledge and predict their performance on questions, and it has attracted widespread attention over the years. Recently, an increasing number of studies have applied deep learning techniques to knowledge tracing and have achieved great success over traditional Bayesian Knowledge Tracing methods. Most existing deep learning-based methods utilize either Recurrent Neural Networks (RNNs) or Convolutional Neural Networks (CNNs). However, it is worth noting that these two sorts of models are complementary in modeling abilities. Thus, in this paper, we propose a novel knowledge tracing model that takes advantage of both models by combining them into a single integrated model, named Convolutional Recurrent Knowledge Tracing (CRKT). Extensive experiments show that our model outperforms the state-of-the-art models on multiple KT datasets.
2021-05-25
Fang, Ying, Gu, Tianlong, Chang, Liang, Li, Long.  2020.  Algebraic Decision Diagram-Based CP-ABE with Constant Secret and Fast Decryption. 2020 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :98–106.
Ciphertext-policy attribute-based encryption (CP-ABE) is applied to many data service platforms to provide secure and fine-grained access control. In this paper, a new CP-ABE system based on the algebraic decision diagram (ADD) is presented. The new system makes full use of both the powerful description ability and the high calculating efficiency of ADD to improve the performance and efficiency of the algorithms contained in CP-ABE. First, the new system supports both positive and negative attributes in the description of access policies. Second, the size of the secret key is constant and is not affected by the number of attributes. Third, the time complexity of the key generation and decryption algorithms is O(1). Finally, this scheme allows visitors to have different access permissions to access shared data or files. At the same time, the PV operation is introduced into the CP-ABE framework for the first time to prevent resource conflicts caused by read and write operations on shared files. Compared with other schemes, the new scheme proposed in this paper performs better in function and efficiency.
2020-12-11
Correia, A., Fonseca, B., Paredes, H., Schneider, D., Jameel, S.  2019.  Development of a Crowd-Powered System Architecture for Knowledge Discovery in Scientific Domains. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). :1372–1377.
A substantial amount of work is often overlooked due to the exponential rate of growth in global scientific output across all disciplines. Current approaches for addressing this issue are usually limited in scope and often restrict the possibility of obtaining multidisciplinary views in practice. To tackle this problem, researchers can now leverage an ecosystem of citizens, volunteers and crowd workers to perform complex tasks that are difficult for either humans or machines to solve alone. Motivated by the idea that human crowds and computer algorithms have complementary strengths, we present an approach where the machine will learn from crowd behavior in an iterative way. This approach is embodied in the architecture of SciCrowd, a crowd-powered human-machine hybrid system designed to improve the analysis and processing of large amounts of publication records. To validate the proposal's feasibility, a prototype was developed and an initial evaluation was conducted to measure its robustness and reliability. We conclude this paper with a set of implications for design.
2020-10-05
Liu, Donglei, Niu, Zhendong, Zhang, Chunxia, Zhang, Jiadi.  2019.  Multi-Scale Deformable CNN for Answer Selection. IEEE Access. 7:164986–164995.

The answer selection task is one of the most important issues within the automatic question answering system, and it aims to automatically find accurate answers to questions. Traditional methods for this task use manually generated features based on tf-idf and n-gram models to represent texts, and then select the right answers according to the similarity between the representations of questions and the candidate answers. Nowadays, many question answering systems adopt deep neural networks such as the convolutional neural network (CNN) to generate the text features automatically, and obtain better performance than traditional methods. A CNN can extract consecutive n-gram features with fixed length by sliding fixed-length convolutional kernels over the whole word sequence. However, due to the complex semantic compositionality of natural language, there are many phrases that have variable lengths and are composed of non-consecutive words, such as phrases whose constituents are separated by other words within the same sentence. The traditional CNN is unable to extract variable-length n-gram features and non-consecutive n-gram features. In this paper, we propose a multi-scale deformable convolutional neural network to capture the non-consecutive n-gram features by adding an offset to the convolutional kernel, and also propose to stack multiple deformable convolutional layers to mine multi-scale n-gram features by generating longer n-grams in higher layers. Furthermore, we apply the proposed model to the task of answer selection. Experimental results on a public dataset demonstrate the effectiveness of our proposed model in answer selection.

2020-05-18
Han, Ying, Li, Kun, Ge, Fawei.  2019.  Multiple Fault Diagnosis for Sucker Rod Pumping Systems Based on Matter Element Analysis with F-statistics. 2019 IEEE 8th Data Driven Control and Learning Systems Conference (DDCLS). :66–70.
Dynamometer cards can reflect different down-hole working conditions of sucker rod pumping wells. Realizing multiple fault diagnosis is of great significance for actual oilfield production. In this paper, the extension theory is used to build a matter-element model to describe the fault diagnosis problem of sucker rod pumping wells. The correlation function is used to calculate the correlation degree between the diagnosed fault and many standard fault types. The diagnosed sample and many possible fault types are divided into different combinations according to the correlation degree; the F-statistic of each combination is calculated and the “unbiased transformation” is used to find the mean of interval vectors. A larger F-statistic means greater differences within the fault classification, and the minimum F-statistic reflects the real multiple fault types. A case study shows the effectiveness of the proposed method.
2019-11-25
Zuin, Gianlucca, Chaimowicz, Luiz, Veloso, Adriano.  2018.  Learning Transferable Features For Open-Domain Question Answering. 2018 International Joint Conference on Neural Networks (IJCNN). :1–8.

Corpora used to learn open-domain Question-Answering (QA) models are typically collected from a wide variety of topics or domains. Since QA requires understanding natural language, open-domain QA models generally need very large training corpora. A simple way to alleviate data demand is to restrict the domain covered by the QA model, leading thus to domain-specific QA models. While learning improved QA models for a specific domain is still challenging due to the lack of sufficient training data in the topic of interest, additional training data can be obtained from related topic domains. Thus, instead of learning a single open-domain QA model, we investigate domain adaptation approaches in order to create multiple improved domain-specific QA models. We demonstrate that this can be achieved by stratifying the source dataset, without the need of searching for complementary data unlike many other domain adaptation approaches. We propose a deep architecture that jointly exploits convolutional and recurrent networks for learning domain-specific features while transferring domain-shared features. That is, we use transferable features to enable model adaptation from multiple source domains. We consider different transference approaches designed to learn span-level and sentence-level QA models. We found that domain-adaptation greatly improves sentence-level QA performance, and span-level QA benefits from sentence information. Finally, we also show that a simple clustering algorithm may be employed when the topic domains are unknown and the resulting loss in accuracy is negligible.

2019-09-04
Xiong, M., Li, A., Xie, Z., Jia, Y.  2018.  A Practical Approach to Answer Extraction for Constructing QA Solution. 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC). :398–404.
Question Answering (QA) systems play an increasingly important role in the Internet age. Internet users increasingly turn to QA to obtain knowledge and solve problems, especially in the modern agricultural field. However, the answer quality in QA varies widely with the level of the agricultural expert, so answer quality assessment is important. Due to the lexical gap between questions and answers, the existing approaches are not quite satisfactory. A practical approach, RCAS, is proposed to rank the candidate answers; it utilizes support sets to reduce the impact of the lexical gap between questions and answers. First, similar questions are retrieved and support sets are produced from their high-quality answers. Based on the assumption that high-quality answers also have intrinsic similarity, the quality of candidate answers is then evaluated through their distance from the support sets. Second, unlike the existing approaches, previous knowledge from similar question-answer pairs is used to bridge the straight lexical and semantic gaps between questions and answers. Experiments are implemented on approximately 0.15 million question-answer pairs about agriculture, dietetics and food from Yahoo! Answers. The results show that our approach can rank the candidate answers more precisely.
Liang, J., Jiang, L., Cao, L., Li, L., Hauptmann, A.  2018.  Focal Visual-Text Attention for Visual Question Answering. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. :6135–6143.
Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photos, we have to look at whole collections with sequences of photos or videos. When answering questions from a large collection, a natural problem is to identify snippets to support the answer. In this paper, we describe a novel neural network called Focal Visual-Text Attention network (FVTA) for collective reasoning in visual question answering, where both visual and text sequence information such as images and text metadata are presented. FVTA introduces an end-to-end approach that makes use of a hierarchical process to dynamically determine what media and what time to focus on in the sequential data to answer the question. FVTA can not only answer the questions well but also provides the justifications which the system results are based upon to get the answers. FVTA achieves state-of-the-art performance on the MemexQA dataset and competitive results on the MovieQA dataset.
2018-09-28
Li-Xin, L., Yong-Shan, D., Jia-Yan, W.  2017.  Differential Privacy Data Protection Method Based on Clustering. 2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :11–16.

To enhance privacy protection and improve data availability, a differential privacy data protection method, ICMD-DP, is proposed. Based on an insensitive clustering algorithm, ICMD-DP applies differential privacy to the results of ICMD (insensitive clustering method for mixed data). The combination of clustering and differential privacy shifts the query sensitivity from a single record to a group of records. Meanwhile, it reduces the risk of information loss and information disclosure. In addition, to satisfy the requirement of maintaining differential privacy for mixed data, ICMD-DP uses different methods to calculate the distance and centroid of categorical and numerical attributes. Finally, experiments are given to illustrate the availability of the method.

2018-03-26
Liu, W., Chen, F., Hu, H., Cheng, G., Huo, S., Liang, H.  2017.  A Novel Framework for Zero-Day Attacks Detection and Response with Cyberspace Mimic Defense Architecture. 2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :50–53.

In cyberspace, unknown zero-day attacks can bring safety hazards. Traditional defense methods based on signatures are ineffective. Based on the Cyberspace Mimic Defense (CMD) architecture, the paper proposes a framework to detect the attacks and respond to them. Inputs are assigned to all online redundant heterogeneous functionally equivalent modules. Their independent outputs are compared and the outputs in the majority will be the final response. The abnormal outputs can be detected and so can the attack. The damaged executive modules with abnormal outputs will be replaced with new ones from the diverse executive module pool. By analyzing the abnormal outputs, the correspondence between inputs and abnormal outputs can be built and inputs leading to recurrent abnormal outputs will be written into the zero-day attack related database and their reuses cannot work any longer, as the suspicious malicious inputs can be detected and processed. Further responses include IP blacklisting and patching, etc. The framework also uses honeypot like executive module to confuse the attacker. The proposed method can prevent the recurrent attack based on the same exploit.
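The abstract above describes sending each input to redundant, functionally equivalent modules and taking the majority output as the response. The following is a minimal sketch of that voting step only. The function name `mimic_vote` and the toy executors are hypothetical, and replacing flagged modules from a pool is left out.

```python
from collections import Counter

def mimic_vote(request, executors):
    """Run every executor on the request and arbitrate by strict majority.

    Returns (majority_output, suspects), where suspects are the names of
    executors whose output disagreed with the majority.
    Outputs must be hashable so they can be counted.
    """
    outputs = {name: fn(request) for name, fn in executors.items()}
    tally = Counter(outputs.values())
    majority_output, votes = tally.most_common(1)[0]
    if votes <= len(executors) / 2:
        raise RuntimeError("no strict majority among executor outputs")
    suspects = sorted(name for name, out in outputs.items() if out != majority_output)
    return majority_output, suspects

if __name__ == "__main__":
    # Three toy executors; "c" stands in for a module affected by an exploit.
    executors = {
        "a": str.upper,
        "b": lambda s: s.upper(),
        "c": lambda s: s[::-1],
    }
    print(mimic_vote("ping", executors))
```

In the framework the abstract describes, the names returned as suspects would be candidates for replacement, and the input that triggered the disagreement would be recorded for later detection.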

2018-02-21
Zhang, G., Qiu, X., Chang, W.  2017.  Scheduling of Security Resources in Software Defined Security Architecture. 2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :494–503.

With the development of Software Defined Networking, its software programmability and openness bring new ideas for network security. As a result, many Software Defined Security architectures have emerged. Software Defined Security decouples the security control plane from the security data plane. In Software Defined Security architectures, underlying security devices are abstracted as security resources in a resource pool, and intelligent, automated security business management and orchestration can be realized through software programming in the security control plane. However, network management has become extremely complicated due to expanding network scale, varying network devices, lack of abstraction and, especially, network heterogeneity. Therefore, new-type open security devices are needed in the SDS architecture for unified management so that they can be conveniently abstracted as security resources in the resource pool. This paper first analyses why open security devices are needed in the SDS architecture and proposes a method of opening security devices. Considering that this new architecture requires a new security scheduling mechanism, this paper proposes a security resource scheduling algorithm which is used for managing and scheduling security resources in the resource pool according to the user's security demand. The security resource scheduling algorithm aims to allocate a security protection task to a suitable security resource in the resource pool so as to improve security protection efficiency. In the algorithm, we use a BP neural network to predict the execution time of security tasks to improve the performance of the algorithm. The simulation result shows that the algorithm has ideal performance. Finally, a usage scenario is given to illustrate the role of security resource scheduling in software defined security architecture.

2018-02-06
Shi, Y., Piao, C., Zheng, L.  2017.  Differential-Privacy-Based Correlation Analysis in Railway Freight Service Applications. 2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :35–39.

With the development of the modern logistics industry, the service mode of railway freight enterprises, the main traditional logistics enterprises, is facing many problems. In the era of big data, coordinated development and sharing of information resources have become requirements of the times for railway freight enterprises, while how to protect the privacy of citizens has become one of the focus issues of the public. To prevent the disclosure or abuse of citizens' private information, citizens' privacy needs to be preserved in the process of information opening and sharing. However, most of the existing privacy-preserving models cannot be used to resist attacks with continuously growing background knowledge. This paper presents a method of applying differential privacy to protect associated data, which can be shared as railway freight service association information. First, the original service data needs to be sliced by the optimal shard length; then the differential method and the Apriori algorithm are used to add Laplace noise to the candidate sets. Thus citizens' private information can be protected even if the attacker has strong background knowledge. Last, the associated data are shared with railway information resource partners. The steps and usefulness of the discussed privacy preservation method are illustrated by an example.
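The abstract above mentions adding Laplace noise to candidate sets. The following is a minimal sketch of the standard Laplace mechanism applied to a single count, such as the support count of one candidate itemset. The function names, the sensitivity of 1, and the epsilon value are assumptions for illustration, and this is not the authors' slicing procedure.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    The noise scale is sensitivity / epsilon: a smaller epsilon means
    stronger privacy and a noisier answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    # Hypothetical support count of one candidate itemset.
    print(private_count(120, epsilon=0.5, rng=rng))
```

Each released count consumes part of the privacy budget, so publishing noisy counts for many candidate sets would require splitting epsilon across them.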

2017-12-12
Zhou, G., Huang, J. X.  2017.  Modeling and Learning Distributed Word Representation with Metadata for Question Retrieval. IEEE Transactions on Knowledge and Data Engineering. 29:1226–1239.

Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find the existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with metadata of category information within cQA pages for question retrieval using two novel category powered models. One is a basic category powered model called MB-NET and the other one is an enhanced category powered model called ME-NET which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the framework of fisher kernel to transform them into the fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further conduct our approaches on large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved.

2017-05-19
Lira, Wallace, Gama, Fernando, Barbosa, Hivana, Alves, Ronnie, de Souza, Cleidson.  2016.  VCloud: Adding Interactiveness to Word Clouds for Knowledge Exploration in Large Unstructured Texts. Proceedings of the 31st Annual ACM Symposium on Applied Computing. :193–198.

The identification of relevant information in large text databases is a challenging task. One of the reasons is human beings' limitations in handling large volumes of data. A common solution for scavenging data from texts is the word cloud. A word cloud illustrates word usage in a document by resizing individual words proportionally to how frequently they appear. Even though word clouds are easy to understand, they are not particularly efficient, because they are static. In addition, the presented information lacks context, i.e., words are not explained and they may lead to radically erroneous interpretations. To tackle these problems we developed VCloud, a tool that allows the user to interact with word clouds, thereby allowing informative and interactive data exploration. Furthermore, our tool also allows one to compare two data sets presented as word clouds. We evaluated VCloud using real data about the evolution of gastritis research through the years. The papers indexed by PubMed related to this medical context were selected for visualization and data analysis using VCloud. A domain expert explored these visualizations and was able to extract useful information from them. This illustrates how VCloud can be a valuable tool for visual text analytics.
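The abstract above explains that a word cloud resizes words proportionally to how frequently they appear. The following is a minimal sketch of that sizing rule. The function name `word_sizes`, the font-size range, and the stopword handling are assumptions for illustration and are not taken from VCloud.

```python
import re
from collections import Counter

def word_sizes(text, min_size=10, max_size=40, stopwords=frozenset()):
    """Map each word to a font size that grows linearly with its frequency.

    The most frequent word gets max_size; other words are scaled by their
    count relative to that word, starting from min_size.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stopwords]
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    return {
        w: min_size + (max_size - min_size) * c / top
        for w, c in counts.items()
    }

if __name__ == "__main__":
    sample = "Gastritis research: gastritis and ulcer"
    print(word_sizes(sample, stopwords=frozenset({"and"})))
```

The result is a static mapping, which is the limitation the abstract points out. An interactive tool would additionally link each word back to the passages where it occurs.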

2015-05-05
Fernandez Arguedas, V., Pallotta, G., Vespe, M.  2014.  Automatic generation of geographical networks for maritime traffic surveillance. Information Fusion (FUSION), 2014 17th International Conference on. :1–8.

In this paper, an algorithm is proposed to automatically produce hierarchical graph-based representations of maritime shipping lanes extrapolated from historical vessel positioning data. Each shipping lane is generated based on the detection of the vessel behavioural changes and represented in a compact synthetic route composed of the network nodes and route segments. The outcome of the knowledge discovery process is a geographical maritime network that can be used in Maritime Situational Awareness (MSA) applications such as track reconstruction from missing information, situation/destination prediction, and detection of anomalous behaviour. Experimental results are presented, testing the algorithm in a specific scenario of interest, the Dover Strait.
 

Ling-Xi Peng, Tian-Wei Chen.  2014.  Automated Intrusion Response System Algorithm with Danger Theory. Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2014 International Conference on. :31–34.

Intrusion response is a new generation of technology based on the idea of active defence, and it is of prominent significance for the protection of network security. However, it is difficult for the existing automatic intrusion response systems to judge the real "danger" of an invasion or attack. In this study, an immune-inspired adaptive automated intrusion response system model, named AIAIM, is presented. With the descriptions of self, non-self, memory detector, mature detector and immature detector of the network transactions, the real-time network danger evaluation equations of host and network are built up. Then, the automated response policies are taken or adjusted according to the real-time danger and attack intensity, which not only solves the problem that the current automated response system models could not detect the true intrusions or attack actions, but also greatly reduces the response times and response costs. Theoretical analysis and experimental results show that AIAIM provides a positive and active network security method, which will help to overcome the limitations of traditional passive network security systems.