Biblio

Found 262 results

Filters: Keyword is Semantics
2022-05-19
Chen, Xiarun, Li, Qien, Yang, Zhou, Liu, Yongzhi, Shi, Shaosen, Xie, Chenglin, Wen, Weiping.  2021.  VulChecker: Achieving More Effective Taint Analysis by Identifying Sanitizers Automatically. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :774–782.
The automatic detection of vulnerabilities in Web applications using taint analysis is a hot research topic. However, existing taint analysis methods identify sanitizers too simplistically to find exploitable taint propagation chains effectively. They generally rely on pre-constructed dictionaries or simple keyword matching, which leads to large numbers of false positives and false negatives that in turn degrade the final results of the taint analysis. To address this, we summarize and classify the sanitizers commonly used in Web applications and propose an identification method based on semantic analysis. Our method accurately and completely identifies the sanitizers in a target Web application through static analysis. Specifically, we analyze the natural-language and program semantics of known sanitizers and use semantic analysis to discover additional ones in Web applications. We implemented a prototype of the method for PHP as a vulnerability detection tool called VulChecker and evaluated it on several popular open-source CMS frameworks. The results show that VulChecker identifies more sanitizers accurately; for vulnerability detection, it also achieves a lower false positive rate and a higher detection rate than existing methods. Finally, we used VulChecker to analyze recent PHP applications and identified several new suspicious taint propagation chains; before this paper was completed, we had confirmed four previously unreported vulnerabilities. Overall, these results show that our approach substantially improves vulnerability detection based on taint analysis.
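A minimal illustration of the idea (not the paper's implementation): taint propagates from sources to sinks unless a recognized sanitizer intervenes, so the set of identified sanitizers directly controls which chains are reported. All names below are hypothetical.

```python
# Hedged sketch: how sanitizer identification affects taint-chain reporting.
# The sets and function below are illustrative, not from VulChecker.

SOURCES = {"$_GET", "$_POST", "$_COOKIE"}      # attacker-controlled inputs
SINKS = {"echo", "mysql_query", "system"}      # dangerous operations
SANITIZERS = {"htmlspecialchars", "intval"}    # grows as identification improves

def is_tainted(chain):
    """A source-to-sink chain stays tainted only if no sanitizer appears on it."""
    return (chain[0] in SOURCES and chain[-1] in SINKS
            and not any(step in SANITIZERS for step in chain))

# A chain passing through htmlspecialchars() is suppressed (fewer false
# positives); the unsanitized chain is still reported.
print(is_tainted(["$_GET", "htmlspecialchars", "echo"]))  # False
print(is_tainted(["$_GET", "echo"]))                      # True
```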
Anusha, M, Leelavathi, R.  2021.  Analysis on Sentiment Analytics Using Deep Learning Techniques. 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :542–547.
Sentiment analytics is the process of applying natural language processing methods to text in order to identify and extract its subjective content. Natural language processing and text classification can work with limited corpus data, and semantic text representations and word embedding methods have attracted increasing attention. Deep learning is a powerful approach that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. In applications of sentiment analytics, deep learning methods are used at the sentence, document, and aspect levels. This review paper focuses on the main difficulties in the sentiment assessment stage that significantly affect sentiment scoring, pooling, and polarity detection. The most popular deep learning methods are Convolutional Neural Networks and Recurrent Neural Networks. Finally, a comparative study of deep learning models is made over a vast literature survey.
Qing-chao, Ni, Cong-jue, Yin, Dong-hua, Zhao.  2021.  Research on Small Sample Text Classification Based on Attribute Extraction and Data Augmentation. 2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA). :53–57.
With the development of deep learning and progress in natural language processing technology, as well as the continuous release of judicial data such as court documents, legal intelligence has gradually become a research hotspot. Crime classification is an important branch of text classification that can help legal professionals improve their work efficiency. In practice, however, the sample data is small and the distribution of crime categories is imbalanced. To solve these two problems, BERT is used as the encoder to cope with the small data volume, and an attribute extraction network is added to handle the imbalanced distribution. The model achieves an accuracy of 90.35% on the small-sample dataset with an F1 score of 67.62, close to the best model performance under sufficient data. Finally, a text augmentation method based on back-translation is proposed and evaluated with different models; the experiments show that it improves the LSTM model to some extent but does not improve BERT.
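As a sketch of back-translation augmentation under stated assumptions: `translate` below is a hypothetical stand-in for any machine-translation backend; the technique itself is just a round trip through a pivot language to generate paraphrases of under-represented classes.

```python
# Hedged sketch of back-translation data augmentation.
# `translate` is a hypothetical placeholder; substitute a real MT system.

def translate(text: str, src: str, dst: str) -> str:
    """Hypothetical placeholder; plug in a real machine-translation call."""
    return text  # identity stub so the sketch runs end-to-end

def back_translate(text: str, pivot: str = "en") -> str:
    """Paraphrase a sentence by translating to a pivot language and back."""
    return translate(translate(text, "zh", pivot), pivot, "zh")

def augment(dataset):
    """Add a back-translated paraphrase for each labeled example."""
    for text, label in list(dataset):
        dataset.append((back_translate(text), label))
    return dataset
```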
Zhang, Xiangyu, Yang, Jianfeng, Li, Xiumei, Liu, Minghao, Kang, Ruichun, Wang, Runmin.  2021.  Deeply Multi-channel guided Fusion Mechanism for Natural Scene Text Detection. 2021 7th International Conference on Big Data and Information Analytics (BigDIA). :149–156.
Scene text detection methods have advanced greatly in the past few years. However, owing to the diversity of text backgrounds in natural scenes, previous methods often fail on more complicated text instances (e.g., super-long text and arbitrarily shaped text). In this paper, a text detection method based on multi-channel bounding box fusion is designed to address this problem. First, a convolutional neural network is used as the base network for feature extraction, producing both shallow text feature maps and deep semantic text feature maps. Second, a fully convolutional network upsamples the feature maps and fuses them at each layer to obtain pixel-level text/non-text classification results. Two independent detection box channels are then designed: a bounding box regression channel and a channel that derives boxes directly from the score map. Finally, the result is obtained by applying the multi-channel bounding box fusion mechanism to the detection boxes from the two channels. Experiments on ICDAR2013 and ICDAR2015 demonstrate that the proposed method achieves competitive results in scene text detection.
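The per-layer upsample-and-fuse step can be sketched as a generic FPN-style fusion in PyTorch (not the authors' exact network; shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def fuse(deep_feat: torch.Tensor, shallow_feat: torch.Tensor) -> torch.Tensor:
    """Upsample a deep semantic map to the shallow map's resolution and concatenate."""
    up = F.interpolate(deep_feat, size=shallow_feat.shape[2:],
                       mode="bilinear", align_corners=False)
    return torch.cat([up, shallow_feat], dim=1)

# Assumed shapes: deep 1/32-scale semantic features, shallow 1/4-scale detail features.
deep = torch.randn(1, 256, 8, 8)
shallow = torch.randn(1, 64, 64, 64)
fused = fuse(deep, shallow)   # -> (1, 320, 64, 64), fed to a 1x1 conv for text scores
```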
Wu, Juan.  2021.  Long Text Filtering in English Translation based on LSTM Semantic Association. 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :740–743.
Translation studies is one of the fastest growing interdisciplinary research fields in the world today, and business English is a pressing research direction within it. To some extent, the quality of business English translation directly determines the success or failure of international trade and its economic benefits. Building on the LSTM sequence encoding-decoding model, this paper proposes a strategy that combines an attention mechanism with a bidirectional LSTM model to address feature extraction from text. The proposed method reduces semantic complexity and improves overall correlation accuracy, and the experimental results show its advantages.
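A minimal sketch of the combination in PyTorch, assuming standard attention pooling over bidirectional LSTM states (dimensions are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    """Bidirectional LSTM whose hidden states are pooled with learned attention."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden=64, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)    # scores each time step
        self.out = nn.Linear(2 * hidden, classes)

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))     # (batch, seq_len, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)   # attention weights over time
        ctx = (w * h).sum(dim=1)                 # weighted context vector
        return self.out(ctx)
```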
2022-05-10
Salaou, Allassane Issa, Ghomari, Abdelghani.  2021.  Fuzzy ontology-based complex and uncertain video surveillance events recognition. 2021 International Conference on Information Systems and Advanced Technologies (ICISAT). :1–5.

Nowadays, video surveillance systems are part of our daily life because of their role in ensuring the security of goods and people, and they generate a huge amount of video data. Several research efforts based on the ontology paradigm have therefore tried to develop efficient systems to index and precisely search very large volumes of video. Owing to their semantic expressiveness, ontologies have been in great demand in recent years in the field of video surveillance, since they help bridge the semantic gap between the interpretation of low-level data extracted from video and its high-level semantics. Despite this expressiveness, a classical ontology may not suffice to handle uncertainty properly, which is nevertheless pervasive in the video surveillance domain; hence the need for a new ontological approach that better represents uncertainty. Fuzzy logic is recognized as a powerful tool for dealing with vague, incomplete, imperfect, or uncertain data and information. In this work, we develop a new ontological approach based on fuzzy logic. All the relevant fuzzy concepts that can appear in a video surveillance domain, such as Video_Objects, Video_Events, and Video_Sequences, are represented together with their fuzzy ontology data properties and the fuzzy relations between them (ontology object properties). To achieve this, the new fuzzy video surveillance ontology is implemented in Fuzzy OWL 2, an extension of the standard Semantic Web ontology language OWL 2.
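To illustrate the underlying idea (a generic fuzzy-logic sketch, not the authors' ontology): a fuzzy concept assigns each individual a membership degree in [0, 1] rather than a crisp yes/no. The event and thresholds below are hypothetical.

```python
def loitering_degree(seconds_in_zone: float) -> float:
    """Trapezoidal membership for a hypothetical fuzzy 'Loitering' video event:
    degree 0 below 30 s, rising linearly to 1 at 120 s."""
    if seconds_in_zone <= 30:
        return 0.0
    if seconds_in_zone >= 120:
        return 1.0
    return (seconds_in_zone - 30) / 90

# A person observed for 75 s is a loitering event to degree 0.5,
# instead of a hard true/false classification.
print(loitering_degree(75))  # 0.5
```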

Wang, Ben, Chu, Hanting, Zhang, Pengcheng, Dong, Hai.  2021.  Smart Contract Vulnerability Detection Using Code Representation Fusion. 2021 28th Asia-Pacific Software Engineering Conference (APSEC). :564–565.
At present, most smart contract vulnerability detection relies on manually defined patterns, which is time-consuming and far from satisfactory. To address this issue, researchers have attempted to apply deep learning techniques to automatic vulnerability detection in smart contracts. Nevertheless, current work mostly relies on a single code representation, such as the AST (Abstract Syntax Tree) or code tokens, to learn vulnerability characteristics, which can leave the learned semantic information incomplete. In addition, the number of available vulnerability datasets is insufficient. To address these limitations, we first construct a dataset covering the most typical types of smart contract vulnerabilities, which indicates the specific line number where a vulnerability may exist. Second, we propose a novel way to fuse code characteristic information, called AFS (AST Fuse program Slicing), which fuses the structural information of the AST with program slicing information and detects vulnerabilities by learning the resulting vulnerability characteristics.
2022-05-09
Zobaed, Sakib M, Salehi, Mohsen Amini, Buyya, Rajkumar.  2021.  SAED: Edge-Based Intelligence for Privacy-Preserving Enterprise Search on the Cloud. 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid). :366–375.
Cloud-based enterprise search services (e.g., AWS Kendra) have been attracting big data owners by offering convenient, real-time search solutions. The problem, however, is that individuals and organizations possessing confidential big data are hesitant to embrace such services due to valid data privacy concerns. In addition, to offer intelligent search these services access the user's search history, which further jeopardizes privacy. To overcome the privacy problem, the main idea of this research is to separate the intelligence aspect of the search from its pattern-matching aspect. Under this design, search intelligence is provided by an on-premises edge tier, and the shared cloud tier serves only as an exhaustive pattern-matching search utility. We propose the Smartness at Edge (SAED) mechanism, which offers intelligence in the form of semantic and personalized search at the edge tier while maintaining the privacy of the search on the cloud tier. At the edge tier, SAED uses a knowledge-based lexical database to expand the query and cover its semantics, personalizes the search via an RNN model that learns the user's interests, and uses a word embedding model to retrieve documents based on their semantic relevance to the search query. SAED is generic: it can be plugged into existing enterprise search systems and enables them to offer intelligent, privacy-preserving search without requiring any change to those systems. Evaluation results on two enterprise search systems under real settings, verified by human users, demonstrate that SAED can improve the relevancy of the retrieved results by on average ≈24% for plain-text and ≈75% for encrypted generic datasets.
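Query expansion with a lexical database can be sketched as follows, using WordNet via NLTK as an assumed stand-in for SAED's knowledge base:

```python
# Hedged sketch of semantic query expansion at the edge tier.
# WordNet via NLTK stands in for whatever lexical database SAED actually uses.
from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")

def expand_query(term: str, limit: int = 5) -> list[str]:
    """Collect synonyms of a query term so the cloud tier's exact
    pattern matching also covers semantically related words."""
    synonyms = {term}
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " "))
    return sorted(synonyms)[:limit]

print(expand_query("attorney"))  # expansion includes e.g. "lawyer"
```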
2022-04-22
Zhang, Cuicui, Sun, Jiali, Lu, Ruixuan, Wang, Peng.  2021.  Anomaly Detection Model of Power Grid Data Based on STL Decomposition. 2021 IEEE 5th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). 5:1262–1265.
This paper designs a data anomaly detection method for power grid data centers. The method uses a cloud computing architecture to store and process the large volumes of data produced by these centers. The STL decomposition method is then applied to decompose the grid data, and the resulting residual component is analyzed to detect abnormal data. Finally, the feasibility of the method is verified through experiments.
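The residual-based detection step can be sketched with statsmodels; the period, threshold, and synthetic load series below are assumptions, since the paper does not specify them:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Hedged sketch: synthetic hourly load series with an assumed daily period of 24.
rng = np.random.default_rng(0)
t = np.arange(24 * 60)
load = 100 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)
load[700] += 15                                    # injected anomaly
series = pd.Series(load, index=pd.date_range("2021-01-01", periods=t.size, freq="H"))

resid = STL(series, period=24).fit().resid          # trend and seasonality removed
z = (resid - resid.mean()) / resid.std()
anomalies = series[np.abs(z) > 3]                   # 3-sigma rule on the residuals
print(anomalies)
```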
2022-04-19
Hemmati, Mojtaba, Hadavi, Mohammad Ali.  2021.  Using Deep Reinforcement Learning to Evade Web Application Firewalls. 2021 18th International ISC Conference on Information Security and Cryptology (ISCISC). :35–41.
Web application firewalls (WAFs) are the last line of defense protecting web applications from application-layer security threats such as SQL injection and cross-site scripting. Currently, most WAF evasion techniques are still developed manually. In this work, we propose a solution that automatically probes a WAF to find payloads that bypass it. Our solution uncovers rule defects, which can then be used to tune the rules of rule-based WAFs; it can also enrich the datasets used to retrain machine learning-based WAFs. To this end, we provide a reinforcement learning framework with an environment compatible with the OpenAI Gym toolset standards, used to train agents for WAF evasion tasks. The framework acts as an adversary and applies a set of mutation operators that alter a malicious payload syntactically without affecting its original semantics. We use Q-learning and proximal policy optimization algorithms with deep neural networks. Our solution successfully evades both signature-based and machine learning-based WAFs.
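The environment can be sketched as a standard Gym interface whose actions are mutation operators; the operators, stubbed WAF, and reward below are illustrative assumptions, not the paper's exact design:

```python
import gym
from gym import spaces

class WafEvasionEnv(gym.Env):
    """Hedged sketch: actions apply semantics-preserving mutations to a payload;
    the reward is 1 when the (stubbed) WAF no longer blocks it."""
    MUTATIONS = [
        lambda p: p.replace(" ", "/**/"),     # comment-based whitespace
        lambda p: p.replace("OR", "||"),      # equivalent logical operator
        lambda p: p.upper(),                  # case variation
    ]

    def __init__(self):
        self.action_space = spaces.Discrete(len(self.MUTATIONS))
        self.observation_space = spaces.Discrete(1)   # placeholder state encoding
        self.payload = "' OR 1=1 --"

    def _waf_blocks(self, payload: str) -> bool:
        return "OR 1=1" in payload                    # stub signature check

    def reset(self):
        self.payload = "' OR 1=1 --"
        return 0

    def step(self, action):
        self.payload = self.MUTATIONS[action](self.payload)
        done = not self._waf_blocks(self.payload)
        return 0, float(done), done, {"payload": self.payload}
```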
Sun, Dengdi, Lv, Xiangjie, Huang, Shilei, Yao, Lin, Ding, Zhuanlian.  2021.  Salient Object Detection Based on Multi-layer Cascade and Fine Boundary. 2021 17th International Conference on Computational Intelligence and Security (CIS). :299–303.
Due to the continuous improvement of deep learning, salient object detection based on deep learning has become a hot topic in computer vision, and fully convolutional networks (FCNs) have become the mainstream method for it. In this article, we propose a new end-to-end multi-level feature fusion module (MCFB), successfully achieving the goal of extracting rich multi-scale global information by integrating semantic and detail information. In our module, we obtain feature maps at different levels through convolution and then cascade them, fully accounting for global information, to produce a rough saliency map. We also propose an optimization module on top of the base module to further refine the feature maps. To obtain a clearer boundary, we use a self-defined loss function to guide the learning process, combining Intersection-over-Union (IoU), Binary Cross-Entropy (BCE), and Structural Similarity (SSIM) losses. The module extracts global information to a greater extent while producing clearer boundaries. Compared with several existing representative methods, this method achieves good results.
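The composite boundary-aware loss can be sketched as follows in PyTorch; the SSIM term here comes from the third-party pytorch-msssim package, which is an assumption (any differentiable SSIM would do), and the weights are illustrative:

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim   # assumed dependency; any differentiable SSIM works

def iou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Soft IoU loss over predicted saliency probabilities."""
    inter = (pred * target).sum(dim=(2, 3))
    union = (pred + target - pred * target).sum(dim=(2, 3))
    return (1 - inter / (union + 1e-6)).mean()

def hybrid_loss(pred, target, w_bce=1.0, w_iou=1.0, w_ssim=1.0):
    """BCE handles pixel accuracy, IoU the region, SSIM the local structure/boundary."""
    bce = F.binary_cross_entropy(pred, target)
    iou = iou_loss(pred, target)
    ss = 1 - ssim(pred, target, data_range=1.0)
    return w_bce * bce + w_iou * iou + w_ssim * ss
```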
2022-04-18
Yuan, Liu, Bai, Yude, Xing, Zhenchang, Chen, Sen, Li, Xiaohong, Deng, Zhidong.  2021.  Predicting Entity Relations across Different Security Databases by Using Graph Attention Network. 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC). :834–843.
Security databases such as Common Vulnerabilities and Exposures (CVE), Common Weakness Enumeration (CWE), and Common Attack Pattern Enumeration and Classification (CAPEC) maintain diverse, high-quality security concepts, which are treated as security entities. These entities are documented with many potential relation types that benefit security analysis and comprehension across the three popular databases. To support reasoning over security entity relationships, translation-based knowledge graph representation learning treats each triple independently when predicting entities; however, it neglects important semantic information about the entities neighboring a triple. To address this, we propose a text-enhanced graph attention network model (text-enhanced GAT). The model highlights the knowledge in the 2-hop neighborhood surrounding a triple while accounting for the diversity of each entity, so we can capture more structural and textual information from the knowledge graph built over the security databases. Extensive experiments evaluate the effectiveness of the proposed model on predicting security entity relationships; the results outperform the state of the art by 0.132 in Mean Reciprocal Rank (MRR) for detecting missing relationships.
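Stacking two graph attention layers is the standard way to aggregate a 2-hop neighborhood; a minimal sketch with PyTorch Geometric, with assumed dimensions (not the paper's text-enhanced variant):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class TwoHopGAT(torch.nn.Module):
    """Two stacked attention layers: each node's final embedding
    aggregates information from its 2-hop neighborhood."""
    def __init__(self, in_dim=128, hidden=64, out_dim=64, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden, heads=heads)
        self.gat2 = GATConv(hidden * heads, out_dim, heads=1)

    def forward(self, x, edge_index):
        x = F.elu(self.gat1(x, edge_index))   # 1-hop attention
        return self.gat2(x, edge_index)       # 2-hop attention

# The resulting entity embeddings can then score candidate
# (head, relation, tail) triples for link prediction.
```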
Bonatti, Piero A., Sauro, Luigi, Langens, Jonathan.  2021.  Representing Consent and Policies for Compliance. 2021 IEEE European Symposium on Security and Privacy Workshops (EuroS PW). :283–291.
Being compliant with the GDPR (and data protection regulations in general) is a difficult task that calls for manifold, computer-based automated support. In this context, several use cases related to the management and enforcement of privacy policies and consent call for a machine-understandable policy language equipped with reliable algorithms for compliance checking and explanation. In this paper, we outline a set of requirements for such languages and algorithms, and address those requirements with a framework based on a profile of OWL2 and a set of policy serializations based on popular formats such as ODRL and JSON. The "external" policy syntax is translated into the "internal" OWL2 syntax, thereby enabling semantic compliance checking and explanations using specialized OWL2 reasoners. We provide a precise definition of both the OWL2 profile and the JSON-based external policy language.
2022-04-13
Solanke, Abiodun A., Chen, Xihui, Ramírez-Cruz, Yunior.  2021.  Pattern Recognition and Reconstruction: Detecting Malicious Deletions in Textual Communications. 2021 IEEE International Conference on Big Data (Big Data). :2574–2582.
Digital forensic artifacts aim to provide evidence from digital sources for attributing blame to suspects, assessing their intents, corroborating their statements or alibis, etc. Textual data is a significant source of such artifacts and can take various forms, for instance communications: e-mails, memos, tweets, and text messages are all examples. Complex statistical, linguistic, and other scientific procedures can be applied manually to this data to uncover significant clues that point toward factual information. While expert investigators can undertake this task, critical information may be missed or overlooked. The primary objective of this work is to aid investigators by partially automating the detection of suspicious e-mail deletions. Our approach consists of building a dynamic graph to represent the temporal evolution of communications, and then using a Variational Graph Autoencoder to detect possible e-mail deletions in this graph. Our model uses multiple types of features to represent node and edge attributes; some are based on message metadata, and the rest are extracted from message contents using natural language processing and text mining techniques. We use the autoencoder to detect missing edges, which we interpret as potential deletions, and to reconstruct their features, from which we form hypotheses about the topics of deleted messages. An empirical evaluation of our model on the Enron e-mail dataset shows that it accurately detects a significant proportion of missing communications and reconstructs the corresponding topic vectors.
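Missing-edge detection with a variational graph autoencoder can be sketched with PyTorch Geometric (a generic VGAE, not the authors' exact feature set):

```python
import torch
from torch_geometric.nn import GCNConv, VGAE

class Encoder(torch.nn.Module):
    """Maps node features to the mean and log-std of latent embeddings."""
    def __init__(self, in_dim, hidden, latent):
        super().__init__()
        self.conv = GCNConv(in_dim, hidden)
        self.mu = GCNConv(hidden, latent)
        self.logstd = GCNConv(hidden, latent)

    def forward(self, x, edge_index):
        h = torch.relu(self.conv(x, edge_index))
        return self.mu(h, edge_index), self.logstd(h, edge_index)

model = VGAE(Encoder(in_dim=32, hidden=64, latent=16))

# Training (sketch): reconstruct observed communication edges plus a KL prior.
#   z = model.encode(x, edge_index)
#   loss = model.recon_loss(z, edge_index) + model.kl_loss()
# At inference, edge probabilities come from the inner-product decoder; a pair
# of nodes that "should" be linked but is not observed suggests a deletion:
#   prob = torch.sigmoid((z[i] * z[j]).sum())
```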
2022-04-01
Peng, Yu, Liu, Qin, Tian, Yue, Wu, Jie, Wang, Tian, Peng, Tao, Wang, Guojun.  2021.  Dynamic Searchable Symmetric Encryption with Forward and Backward Privacy. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :420–427.
Dynamic searchable symmetric encryption (DSSE), which enables a client to perform searches and updates on encrypted data, has been intensively studied in cloud computing. Recently, forward privacy and backward privacy have attracted significant attention as protections against the leakage caused by updates. However, research in this field has focused almost entirely on keyword-level updates; that is, the client needs to know the keywords of the documents in advance. In this paper, we propose a document-level update scheme, DBP, which supports immediate deletion while guaranteeing forward and backward privacy. Compared with existing forward- and backward-private DSSE schemes, DBP has the following merits: 1) practicality, achieving deletion based on document identifiers rather than document/keyword pairs; and 2) efficiency, using only lightweight primitives to realize backward privacy while supporting immediate deletion. Experimental evaluation on two real datasets demonstrates the practical efficiency of our scheme.
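Forward privacy hinges on making each update token unlinkable to previous search tokens; a common construction, shown here as a generic sketch rather than the DBP scheme itself, derives a fresh index label per update from a keyed PRF and a client-side counter:

```python
# Generic forward-privacy sketch (not the DBP construction): each update for a
# keyword gets a fresh, unlinkable label, so the server cannot match new
# insertions against tokens issued for earlier searches.
import hmac, hashlib, os

KEY = os.urandom(32)          # client-side secret
counters = {}                 # per-keyword update counter, kept by the client
index = {}                    # server-side index: label -> document identifier

def prf(msg: bytes) -> bytes:
    return hmac.new(KEY, msg, hashlib.sha256).digest()

def update(keyword: str, doc_id: str):
    c = counters.get(keyword, 0)
    label = prf(f"{keyword}|{c}".encode())   # fresh label per update
    index[label] = doc_id                    # value would be encrypted in practice
    counters[keyword] = c + 1

def search(keyword: str):
    """The client reveals the labels only at search time, one per past update."""
    return [index[prf(f"{keyword}|{i}".encode())]
            for i in range(counters.get(keyword, 0))]
```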
Walid, Redwan, Joshi, Karuna P., Choi, Seung Geol.  2021.  Secure Cloud EHR with Semantic Access Control, Searchable Encryption and Attribute Revocation. 2021 IEEE International Conference on Digital Health (ICDH). :38–47.
To ensure a secure cloud-based Electronic Health Record (EHR) system, we need to encrypt data and impose field-level access control to prevent malicious usage. Since the attributes of Users change over time, the encryption policies adopted may also vary. For large EHR systems, it is often necessary to search the encrypted data in real time and perform client-side computations without decrypting all patient records. This paper describes our novel cloud-based EHR system, which uses Attribute-Based Encryption (ABE) combined with Semantic Web technologies to facilitate differential access to an EHR, thereby ensuring that only Users with valid attributes can access a particular field of the EHR. The system also includes searchable encryption using a keyword index and search trapdoors, which allows querying EHR fields without decrypting the entire patient record. Attribute revocation is managed efficiently in our EHR by delegating the revision of the secret key and ciphertext to the Cloud Service Provider (CSP). Our methodology incorporates advanced security features that prevent malicious use of EHR data and contributes significantly toward ensuring secure digital health systems on the Cloud.
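The access-control idea behind ABE can be illustrated independently of the cryptography: a ciphertext carries a boolean policy over attributes, and only a key whose attribute set satisfies the policy can decrypt. A toy policy evaluator follows (purely illustrative; real ABE enforces this inside the ciphertext mathematically, and the attributes are hypothetical):

```python
# Toy illustration of ABE-style field-level access policies.

def satisfies(policy, attrs: set) -> bool:
    """Policy is a nested tuple ('AND'|'OR', subpolicies...) or an attribute string."""
    if isinstance(policy, str):
        return policy in attrs
    op, *subs = policy
    results = (satisfies(s, attrs) for s in subs)
    return all(results) if op == "AND" else any(results)

# Hypothetical field policy: a diagnosis field readable by cardiologists,
# or by nurses assigned to the cardiology ward.
policy = ("OR", "cardiologist", ("AND", "nurse", "cardiology_ward"))
print(satisfies(policy, {"nurse", "cardiology_ward"}))  # True
print(satisfies(policy, {"nurse", "oncology_ward"}))    # False
```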
2022-03-22
Feng, Weiqiang.  2021.  A Lightweight Anonymous Authentication Protocol For Smart Grid. 2021 13th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC). :87–90.
Recently, A. A. Khan et al. proposed a lightweight authentication and key agreement framework for the next generation of smart grids. The framework uses a third-party authentication server and the ECC algorithm, which offers certain advantages in anonymity, secure communication, and computational performance. However, our analysis in this paper finds that the method cannot meet the requirements of semantic security. We therefore propose an improved scheme on this basis, and through formal proof we verify that the scheme meets the smart grid's requirements for semantic security and anonymity.
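For context, the elliptic-curve key agreement at the heart of such frameworks looks as follows with the `cryptography` library; this is a generic ECDH-plus-KDF sketch, not the protocol of either paper:

```python
# Generic ECC key agreement (ECDH) with a KDF, as used in such frameworks.
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

meter_priv = ec.generate_private_key(ec.SECP256R1())     # smart meter
server_priv = ec.generate_private_key(ec.SECP256R1())    # authentication server

# Each side combines its private key with the peer's public key.
shared_meter = meter_priv.exchange(ec.ECDH(), server_priv.public_key())
shared_server = server_priv.exchange(ec.ECDH(), meter_priv.public_key())
assert shared_meter == shared_server

# Derive a session key; semantic security additionally requires that protocol
# transcripts reveal nothing that distinguishes encrypted messages.
session_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                   info=b"sg-session").derive(shared_meter)
```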
Xi, Lanlan, Xin, Yang, Luo, Shoushan, Shang, Yanlei, Tang, Qifeng.  2021.  Anomaly Detection Mechanism Based on Hierarchical Weights through Large-Scale Log Data. 2021 International Conference on Computer Communication and Artificial Intelligence (CCAI). :106–115.
In order to realize intelligent disaster recovery and break the traditional reactive backup mode, it is necessary to forecast potential system anomalies and proactively back up real-time data and configurations. System logs record the running status as well as critical events (including errors and warnings), which can help detect system performance problems, debug system faults, and analyze the causes of anomalies. Moreover, being real-time, hierarchical, and easy to access, log data is an ideal source for monitoring system status. To reduce the complexity and improve the robustness and practicality of existing log-based anomaly detection methods, we propose a new anomaly detection mechanism based on hierarchical weights that can handle unstable log data. We first extract semantic information from log strings and obtain word-level weights with the SIF algorithm to embed log strings into vectors, which are then fed into an attention-based Long Short-Term Memory (LSTM) deep learning network model. In addition to sentence-level weights, which explore the interdependence between different log sequences and improve accuracy, we use the attention weights to build a workflow for diagnosing abnormal points in the execution of a specific task. Our experimental results show that the hierarchical weight mechanism effectively improves the accuracy of the prediction task and reduces the complexity of the model, providing a feasible foundation for intelligent disaster recovery.
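The SIF word-level weighting can be sketched in plain NumPy; the weight formula a/(a + p(w)) follows the SIF algorithm the abstract cites, while the embeddings and frequencies here are illustrative stand-ins:

```python
# SIF (smooth inverse frequency) weighting: frequent words contribute less.
import numpy as np

def sif_embed(tokens, word_vecs, word_freq, a=1e-3):
    """Weighted average of word vectors with weight a / (a + p(w))."""
    vecs = [word_vecs[w] * (a / (a + word_freq.get(w, 0.0)))
            for w in tokens if w in word_vecs]
    return np.mean(vecs, axis=0)

# Toy data: 8-dim random vectors; "error" is rarer than "connection",
# so it dominates the log line's embedding.
rng = np.random.default_rng(0)
word_vecs = {w: rng.normal(size=8) for w in ["connection", "timeout", "error"]}
word_freq = {"connection": 0.02, "timeout": 0.005, "error": 0.001}

log_vec = sif_embed(["connection", "timeout", "error"], word_vecs, word_freq)
# The full SIF algorithm additionally removes the first principal component
# across all sentence vectors before feeding them to the LSTM.
```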
2022-03-10
Zhang, Zhongtang, Liu, Shengli, Yang, Qichao, Guo, Shichen.  2021.  Semantic Understanding of Source and Binary Code based on Natural Language Processing. 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). 4:2010–2016.
With the development of open source projects, a large amount of open source code is reused in binary software, and bugs in source code are thereby introduced into binary code. To detect reused open source code in binaries, it is sometimes necessary to compare and analyze the similarity between source code and binary code. One of the main challenges is that the compilation process can generate different binary representations for the same source code, owing to different compiler versions, compilation optimization options, and target architectures, which greatly increases the difficulty of semantic similarity detection between source and binary code. To eliminate the influence of the compilation process on code similarity comparison, this paper transforms both source code and binary code into LLVM intermediate representation (LLVM IR), a universal intermediate representation independent of either form. We perform semantic feature extraction and embedding training on LLVM IR based on a natural language processing model. Experimental results show that LLVM IR eliminates the syntactic differences between source code and binary code introduced by compilation, and that the semantic features of the code are well represented and preserved.
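Treating IR instructions as sentences for an NLP model can be sketched as follows; the normalization rules and the use of gensim's Word2Vec are illustrative assumptions, not the paper's exact pipeline:

```python
# Hedged sketch: normalize LLVM IR instructions, then train token embeddings.
import re
from gensim.models import Word2Vec

def normalize(ir_line: str) -> list[str]:
    """Replace SSA registers and literals so tokens generalize across functions."""
    line = re.sub(r"%[\w.]+", "<reg>", ir_line)   # %1, %tmp -> <reg>
    line = re.sub(r"\b\d+\b", "<const>", line)    # integer literals -> <const>
    return line.replace(",", " ").split()

ir_function = [
    "%1 = add i32 %a, 1",
    "%2 = icmp sgt i32 %1, 10",
    "br i1 %2, label %then, label %else",
]
sentences = [normalize(l) for l in ir_function]    # one "sentence" per instruction
model = Word2Vec(sentences=sentences, vector_size=64, window=3, min_count=1)
# Instruction or function embeddings (e.g., averaged token vectors) can then be
# compared across source-derived and binary-lifted IR with cosine similarity.
```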
2022-03-09
Shi, Di-Bo, Xie, Huan, Ji, Yi, Li, Ying, Liu, Chun-Ping.  2021.  Deep Content Guidance Network for Arbitrary Style Transfer. 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.
Arbitrary style transfer refers to generating a new image from any pair of existing images, where the generated image retains the content structure of one and the style pattern of the other. In terms of content retention and style transfer, recent arbitrary style transfer algorithms normally perform well on one, but find it difficult to strike a trade-off between the two. In this paper, we propose the Deep Content Guidance Network (DCGN), which is built by stacking content guidance (CG) layers. Each CG layer contains a position self-attention (pSA) module, a channel self-attention (cSA) module, and a content guidance attention (cGA) module. Specifically, the pSA module extracts more effective content information from the spatial layout of content images, and the cSA module enriches the style representation of style images in the channel dimension. In a non-local view, the cGA module uses content information to guide the distribution of style features, yielding a more detailed style expression. Moreover, we introduce a new permutation loss to generalize feature expression, obtaining rich feature expressions while maintaining content structure. Qualitative and quantitative experiments verify that our approach produces better stylized images than state-of-the-art methods.
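A position self-attention module in the usual non-local form can be sketched in PyTorch (a generic pSA, not necessarily the authors' exact variant):

```python
import torch
import torch.nn as nn

class PositionSelfAttention(nn.Module):
    """Generic position self-attention: every spatial location attends to
    every other, capturing long-range layout dependencies."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c/8)
        k = self.key(x).flatten(2)                     # (b, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw)
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual connection
```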
Jia, Ning, Gong, Xiaoyi, Zhang, Qiao.  2021.  Improvement of Style Transfer Algorithm based on Neural Network. 2021 International Conference on Computer Engineering and Application (ICCEA). :1–6.
In recent years, style transfer has been applied more and more widely. Traditional deep learning-based style transfer networks often suffer from problems such as image distortion, loss of detail, partial disappearance of content, and transfer errors. The style transfer network based on deep learning that we propose in this article is aimed at addressing these problems. Our method uses image edge information fusion and semantic segmentation to constrain the image structure before and after the transfer, so that the converted image maintains structural consistency and integrity. We have verified that this method successfully suppresses image conversion distortion in most scenarios and generates good results.
2022-02-25
Wittek, Kevin, Wittek, Neslihan, Lawton, James, Dohndorf, Iryna, Weinert, Alexander, Ionita, Andrei.  2021.  A Blockchain-Based Approach to Provenance and Reproducibility in Research Workflows. 2021 IEEE International Conference on Blockchain and Cryptocurrency (ICBC). :1–6.
The traditional Proof of Existence blockchain service on the Bitcoin network can be used to verify the existence of any research data at a specific point in time, and to validate the data's integrity without revealing its content. Several variants of the service certify the existence of data via cryptographic fingerprinting, enabling efficient verification of the authenticity of such certifications. However, research data today is continuously changed and modified through the different processing steps of most scientific research workflows, so certifications of individual data objects are constantly outdated in this setting. This paper describes how blockchain and distributed ledger technology can be used to form a new certification model that captures the research process as a whole in a more meaningful way, including descriptions of the data through its different stages, the associated computational pipeline, the analysis code, and the experimental design. The scientific blockchain infrastructure bloxberg, together with a deep learning-based analysis from the behavioral sciences, is used to show the applicability of the approach.
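The cryptographic fingerprinting that Proof of Existence relies on is simply a hash commitment; a minimal sketch (the file name is hypothetical):

```python
# Minimal sketch of the fingerprinting behind Proof of Existence services:
# only the hash is anchored on-chain, so the content stays private while any
# later modification of the data becomes detectable.
import hashlib

def fingerprint(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Anchor fingerprint("results.csv") in a blockchain transaction; re-deriving
# the same digest later proves the file existed unmodified at anchoring time.
# A workflow-level certification would chain such digests across every
# processing stage (raw data, pipeline code, analysis outputs).
```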
Abdelnabi, Sahar, Fritz, Mario.  2021.  Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding. 2021 IEEE Symposium on Security and Privacy (SP). :121–140.
Recent advances in natural language generation have introduced powerful language models with high-quality output text. However, this raises concerns about the potential misuse of such models for malicious purposes. In this paper, we study natural language watermarking as a defense to help better mark and trace the provenance of text. We introduce the Adversarial Watermarking Transformer (AWT), which combines a jointly trained encoder-decoder with adversarial training: given an input text and a binary message, it generates an output text that unobtrusively encodes the given message. We further study different training and inference strategies to achieve minimal changes to the semantics and correctness of the input text. AWT is the first end-to-end model to hide data in text by automatically learning, without ground truth, word substitutions along with their locations in order to encode the message. We empirically show that our model largely preserves text utility and decodes the watermark while hiding its presence from adversaries. Additionally, we demonstrate that our method is robust against a range of attacks.
Barthe, Gilles, Cauligi, Sunjay, Grégoire, Benjamin, Koutsos, Adrien, Liao, Kevin, Oliveira, Tiago, Priya, Swarn, Rezk, Tamara, Schwabe, Peter.  2021.  High-Assurance Cryptography in the Spectre Era. 2021 IEEE Symposium on Security and Privacy (SP). :1884–1901.
High-assurance cryptography leverages methods from program verification and cryptography engineering to deliver efficient cryptographic software with machine-checked proofs of memory safety, functional correctness, provable security, and absence of timing leaks. Traditionally, these guarantees are established under a sequential execution semantics. However, this semantics is not aligned with the behavior of modern processors, which use speculative execution to improve performance. This mismatch, combined with high-profile Spectre-style attacks that exploit speculative execution, naturally casts doubt on the robustness of high-assurance cryptography guarantees. In this paper, we dispel these doubts by showing that the benefits of high-assurance cryptography extend to speculative execution, at the cost of only a modest performance overhead. We build, atop the Jasmin verification framework, an end-to-end approach for proving properties of cryptographic software under speculative execution, and we validate our approach experimentally with efficient, functionally correct assembly implementations of ChaCha20 and Poly1305 that are secure against both traditional timing and speculative execution attacks.
2022-02-24
Castellano, Giovanna, Vessio, Gennaro.  2021.  Deep Convolutional Embedding for Digitized Painting Clustering. 2020 25th International Conference on Pattern Recognition (ICPR). :2708–2715.
Clustering artworks is difficult for several reasons. On the one hand, recognizing meaningful patterns in accordance with domain knowledge and visual perception is extremely difficult. On the other hand, applying traditional clustering and feature reduction techniques to the highly dimensional pixel space can be ineffective. To address these issues, we propose a deep convolutional embedding model for clustering digitized paintings, in which the task of mapping the raw input data to an abstract, latent space is jointly optimized with the task of finding a set of cluster centroids in this latent feature space. Quantitative and qualitative experimental results show the effectiveness of the proposed method, which also outperforms other state-of-the-art deep clustering approaches on the same problem. The proposed method can be useful for several art-related tasks, in particular visual link retrieval and historical knowledge discovery in painting datasets.
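The joint optimization can be sketched with the soft cluster assignment commonly used in deep embedded clustering (a generic DEC-style formulation assumed here for illustration, not taken from the paper): each embedded point is assigned to a centroid with a Student's t kernel, and a KL loss pulls the assignments toward a sharpened target distribution.

```python
import torch

def soft_assign(z: torch.Tensor, centroids: torch.Tensor, alpha: float = 1.0):
    """DEC-style soft assignment q_ij of embeddings to cluster centroids
    using a Student's t kernel."""
    d2 = torch.cdist(z, centroids) ** 2                 # (n, k) squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q: torch.Tensor):
    """Sharpened targets that emphasize confident assignments."""
    p = q ** 2 / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

z = torch.randn(100, 10, requires_grad=True)       # embeddings from the conv encoder
centroids = torch.randn(3, 10, requires_grad=True)  # learned cluster centroids
q = soft_assign(z, centroids)
loss = torch.nn.functional.kl_div(q.log(), target_distribution(q).detach(),
                                  reduction="batchmean")
# In the full model, this KL clustering loss is optimized jointly with the
# autoencoder's image reconstruction loss.
```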