Biblio

Filters: Keyword is semantic similarity
2021-11-29
Hu, Shengze, He, Chunhui, Ge, Bin, Liu, Fang.  2020.  Enhanced Word Embedding Method in Text Classification. 2020 6th International Conference on Big Data and Information Analytics (BigDIA). :18–22.
In natural language processing (NLP), word embedding technology affects the accuracy of deep neural network algorithms. Current word embedding methods cannot represent words and phrases in the same vector space, so we propose an enhanced word embedding (EWE) method. Before the word embedding is trained, this method applies a sentence reorganization technique to rewrite all the sentences in the original training corpus. The original corpus and the reorganized corpus are then merged as the training corpus of the distributed word embedding model, so that words and phrases coexist in the same vector space. We carried out experiments to demonstrate the effectiveness of the EWE algorithm on three classic benchmark datasets. The results show that the EWE method can significantly improve the classification performance of the CNN model.
2020-05-22
Khadilkar, Kunal, Kulkarni, Siddhivinayak, Bone, Poojarani.  2018.  Plagiarism Detection Using Semantic Knowledge Graphs. 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA). :1–6.

Every day, huge amounts of unstructured text are generated. Most of this data is in the form of essays, research papers, patents, scholastic articles, book chapters, etc. Many plagiarism detection tools are being developed to reduce the stealing and plagiarizing of Intellectual Property (IP). Current tools mainly use string matching algorithms to detect text copied from another source. A drawback of some of these tools is their inability to detect plagiarism when the structure of the sentence is changed. They also fail to detect the replacement of keywords by their synonyms. This paper proposes a new method to detect such plagiarism using semantic knowledge graphs. The method uses Named Entity Recognition as well as semantic similarity between sentences to detect possible cases of plagiarism. The doubtful cases are visualized using semantic knowledge graphs for thorough analysis of authenticity. Rules for active and passive voice have also been considered in the proposed methodology.
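
The weakness of purely lexical matching described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method (which uses Named Entity Recognition and knowledge graphs); the synonym table `CANONICAL` and all function names are hypothetical, chosen only to show why synonym replacement defeats word-overlap comparison.

```python
# Hypothetical synonym table: maps a word to a canonical representative.
CANONICAL = {"automobile": "car", "purchased": "bought", "quick": "fast"}

def tokens(sentence):
    """Lowercase the sentence and strip surrounding punctuation from each word."""
    return [w.strip(".,;:!?").lower() for w in sentence.split()]

def jaccard(a, b):
    """Jaccard overlap between two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def lexical_similarity(s1, s2):
    """Plain word overlap, comparable in spirit to string matching."""
    return jaccard(tokens(s1), tokens(s2))

def synonym_aware_similarity(s1, s2):
    """Word overlap after mapping each word to its canonical synonym."""
    def normalize(s):
        return [CANONICAL.get(w, w) for w in tokens(s)]
    return jaccard(normalize(s1), normalize(s2))

original = "She bought a fast car."
suspect = "She purchased a quick automobile."
print(lexical_similarity(original, suspect))        # low overlap
print(synonym_aware_similarity(original, suspect))  # full overlap
```

In this toy pair, only "she" and "a" match lexically, while the synonym-aware comparison treats the two sentences as equivalent.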

2020-03-23
Xu, Yilin, Ge, Weimin, Li, Xiaohong, Feng, Zhiyong, Xie, Xiaofei, Bai, Yude.  2019.  A Co-Occurrence Recommendation Model of Software Security Requirement. 2019 International Symposium on Theoretical Aspects of Software Engineering (TASE). :41–48.
To guarantee the quality of software, specifying security requirements (SRs) is essential for developing systems, especially security-critical software systems. However, using security threats to determine detailed SRs according to the Common Criteria (CC) is quite difficult, because the CC is too confusing and technical for non-security specialists. In this paper, we propose a Co-occurrence Recommend Model (CoRM) to automatically recommend software SRs. In this model, the security threats of a product are extracted from the security target documents of software, in which the related security requirements are tagged. To establish relationships between software security threats and security requirements, semantic similarities between different security threats are calculated by the Skip-thoughts model. To evaluate our CoRM model, over 1000 security target documents covering 9 types of software products are used. The results suggest that building a CoRM model via semantic similarity is feasible and reliable.
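
The recommendation idea in this abstract (find known threats similar to a new one, then reuse the requirements tagged alongside them) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the CoRM implementation: a bag-of-words encoder stands in for the Skip-thoughts model, and the tagged corpus, the SR labels, the `threshold` parameter, and the similarity-weighted scoring are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def embed(text):
    """Stand-in sentence encoder: bag-of-words counts (NOT Skip-thoughts)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical corpus: threat descriptions with the SRs tagged next to them.
TAGGED = [
    ("attacker guesses user password", {"SR-authentication", "SR-audit"}),
    ("attacker reads stored user data", {"SR-encryption", "SR-access-control"}),
    ("attacker intercepts network traffic", {"SR-encryption"}),
]

def recommend(threat, tagged, threshold=0.3):
    """Rank SRs by the summed similarity of the known threats they co-occur with."""
    query = embed(threat)
    scores = defaultdict(float)
    for text, srs in tagged:
        sim = cosine(query, embed(text))
        if sim >= threshold:  # ignore threats that are too dissimilar
            for sr in srs:
                scores[sr] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda sr: (-scores[sr], sr))

print(recommend("attacker guesses admin password", TAGGED))
```

Swapping `embed` for a learned sentence encoder would leave the rest of the sketch unchanged, which is the main reason the encoder is isolated in its own function.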
2018-04-30
Kafali, Ö, Jones, J., Petruso, M., Williams, L., Singh, M. P..  2017.  How Good Is a Security Policy against Real Breaches? A HIPAA Case Study. 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). :530–540.

Policy design is an important part of software development. As security breaches increase in variety, designing a security policy that addresses all potential breaches becomes a nontrivial task. A complete security policy would specify rules to prevent breaches. Systematically determining which, if any, policy clause has been violated by a reported breach is a means for identifying gaps in a policy. Our research goal is to help analysts measure the gaps between security policies and reported breaches by developing a systematic process based on semantic reasoning. We propose SEMAVER, a framework for determining coverage of breaches by policies via comparison of individual policy clauses and breach descriptions. We represent a security policy as a set of norms. Norms (commitments, authorizations, and prohibitions) describe expected behaviors of users, and formalize who is accountable to whom and for what. A breach corresponds to a norm violation. We develop a semantic similarity metric for pairwise comparison between the norm that represents a policy clause and the norm that has been violated by a reported breach. We use the US Health Insurance Portability and Accountability Act (HIPAA) as a case study. Our investigation of a subset of the breaches reported by the US Department of Health and Human Services (HHS) reveals the gaps between HIPAA and reported breaches, leading to a coverage of 65%. Additionally, our classification of the 1,577 HHS breaches shows that 44% of the breaches are accidental misuses and 56% are malicious misuses. We find that HIPAA's gaps regarding accidental misuses are significantly larger than its gaps regarding malicious misuses.

2018-04-02
Jia, J., Chen, L..  2017.  (L, m, d)-Anonymity: A Resisting Similarity Attack Model for Multiple Sensitive Attributes. 2017 IEEE 2nd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). :756–760.

Preserving privacy is extremely important in data publishing. Existing privacy-preserving models are mostly oriented to a single sensitive attribute and cannot be applied to situations with multiple sensitive attributes. Moreover, they do not consider the semantic similarity between sensitive attribute values and may be vulnerable to similarity attacks. In this paper, we propose an (l, m, d)-anonymity model that resists similarity attacks on multiple sensitive attributes, where m is the dimension of the sensitive attributes. This model uses a semantic hierarchical tree to analyze and compute the semantic dissimilarity between sensitive attribute values, and each equivalence class must contain at least l sensitive attribute values that are d-different on each sensitive attribute dimension. Meanwhile, to keep the published data highly usable, our model adopts a distance-based measurement method to divide the equivalence classes. We carry out extensive experiments to verify that the (l, m, d)-anonymity model can significantly reduce the probability of sensitive information leakage and protect individual privacy more effectively.
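
The condition stated in this abstract can be sketched as a small Python check. Several details are assumptions, since the abstract does not give them: dissimilarity is taken as the number of levels from a value up to the lowest common ancestor divided by the value's depth (with all leaves at equal depth), "d-different" is read as pairwise dissimilarity of at least d, and the two hierarchies and all names are hypothetical.

```python
from itertools import combinations

# Hypothetical semantic hierarchies (child -> parent), one per sensitive attribute.
DISEASE = {"flu": "respiratory", "pneumonia": "respiratory",
           "gastritis": "digestive", "ulcer": "digestive",
           "respiratory": "any", "digestive": "any"}
JOB = {"nurse": "health", "doctor": "health",
       "teacher": "education", "professor": "education",
       "health": "any_job", "education": "any_job"}

def ancestors(value, parent):
    """Path from a value up to the root of its hierarchy."""
    path = [value]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def dissimilarity(u, v, parent):
    """Levels from u up to the lowest common ancestor, scaled by u's depth.
    Assumes u and v are in the same tree with leaves at equal depth."""
    pu, common = ancestors(u, parent), set(ancestors(v, parent))
    steps = next(i for i, node in enumerate(pu) if node in common)
    height = len(pu) - 1
    return steps / height if height else 0.0

def has_l_d_different(values, l, d, parent):
    """True if some l distinct values are pairwise at least d apart (brute force)."""
    distinct = sorted(set(values))
    return any(
        all(dissimilarity(u, v, parent) >= d for u, v in combinations(group, 2))
        for group in combinations(distinct, l)
    )

def satisfies_lmd(eq_class, l, d, hierarchies):
    """eq_class: records as tuples of m sensitive values; hierarchies: m parent maps."""
    return all(
        has_l_d_different([rec[k] for rec in eq_class], l, d, hierarchies[k])
        for k in range(len(hierarchies))
    )

diverse = [("flu", "nurse"), ("gastritis", "teacher"), ("pneumonia", "doctor")]
similar = [("flu", "nurse"), ("pneumonia", "doctor")]
print(satisfies_lmd(diverse, l=2, d=1.0, hierarchies=[DISEASE, JOB]))
print(satisfies_lmd(similar, l=2, d=1.0, hierarchies=[DISEASE, JOB]))
```

The second class holds two distinct diseases, yet both are respiratory, so an attacker still learns the disease category; that is the similarity attack the d threshold is meant to block. The brute-force search over combinations is only practical for small equivalence classes.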

2017-05-19
Wang, Xiangru, Nourashrafeddin, Seyednaser, Milios, Evangelos.  2016.  Relaxing Orthogonality Assumption in Conceptual Text Document Similarity. Proceedings of the 2016 ACM Symposium on Document Engineering. :69–78.

By reflecting the degree of proximity or remoteness of documents, similarity measures play a key role in text analytics. Traditional measures, e.g. cosine similarity, assume that documents are represented in an orthogonal space formed by words as dimensions. Words are considered independent of each other, and document similarity is computed based on lexical overlap. This assumption is also made in the bag-of-concepts representation of documents, where the space is formed by concepts. This paper proposes new semantic similarity measures that do not rely on the orthogonality assumption. By employing Wikipedia as an external resource, we introduce five similarity measures using concept-concept relatedness. Experimental results on real text datasets reveal that eliminating the orthogonality assumption improves the quality of text clustering algorithms.
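
To make the orthogonality assumption concrete, the Python sketch below contrasts standard cosine similarity with a "soft cosine" variant that weights dimension pairs by a relatedness matrix. Soft cosine is a well-known generalization used here for illustration; it is not claimed to be one of the paper's five measures, and the relatedness values in `S` are invented rather than derived from Wikipedia.

```python
import math

def cosine(a, b):
    """Standard cosine: every dimension is treated as orthogonal to the others."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def bilinear(a, S, b):
    """Compute a^T S b, where S[i][j] is the relatedness of concepts i and j."""
    return sum(a[i] * S[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def soft_cosine(a, b, S):
    """Cosine generalized with concept-concept relatedness S."""
    denom = math.sqrt(bilinear(a, S, a)) * math.sqrt(bilinear(b, S, b))
    return bilinear(a, S, b) / denom if denom else 0.0

# Hypothetical concepts: 0 = "car", 1 = "automobile", 2 = "banana".
S = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
doc_a = [1.0, 0.0, 0.0]  # mentions only "car"
doc_b = [0.0, 1.0, 0.0]  # mentions only "automobile"

print(cosine(doc_a, doc_b))          # no shared dimension
print(soft_cosine(doc_a, doc_b, S))  # related concepts contribute
```

When `S` is the identity matrix, `soft_cosine` reduces to `cosine`, which is exactly the orthogonal case described in the abstract.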