Visible to the public Biblio

Filters: Keyword is meta-data  [Clear All Filters]
2020-11-16
Zhang, C., Xu, C., Xu, J., Tang, Y., Choi, B..  2019.  GEMˆ2-Tree: A Gas-Efficient Structure for Authenticated Range Queries in Blockchain. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :842–853.
Blockchain technology has attracted much attention due to the great success of the cryptocurrencies. Owing to its immutability property and consensus protocol, blockchain offers a new solution for trusted storage and computation services. To scale up the services, prior research has suggested a hybrid storage architecture, where only small meta-data are stored onchain and the raw data are outsourced to off-chain storage. To protect data integrity, a cryptographic proof can be constructed online for queries over the data stored in the system. However, the previous schemes only support simple key-value queries. In this paper, we take the first step toward studying authenticated range queries in the hybrid-storage blockchain. The key challenge lies in how to design an authenticated data structure (ADS) that can be efficiently maintained by the blockchain, in which a unique gas cost model is employed. By analyzing the performance of the existing techniques, we propose a novel ADS, called GEM2-tree, which is not only gas-efficient but also effective in supporting authenticated queries. To further reduce the ADS maintenance cost without sacrificing much the query performance, we also propose an optimized structure, GEM2*-tree, by designing a two-level index structure. Theoretical analysis and empirical evaluation validate the performance of the proposed ADSs.
2015-05-04
Pratanwanich, N., Lio, P..  2014.  Who Wrote This? Textual Modeling with Authorship Attribution in Big Data Data Mining Workshop (ICDMW), 2014 IEEE International Conference on. :645-652.

By representing large corpora with concise and meaningful elements, topic-based generative models aim to reduce the dimension and understand the content of documents. Those techniques originally analyze on words in the documents, but their extensions currently accommodate meta-data such as authorship information, which has been proved useful for textual modeling. The importance of learning authorship is to extract author interests and assign authors to anonymous texts. Author-Topic (AT) model, an unsupervised learning technique, successfully exploits authorship information to model both documents and author interests using topic representations. However, the AT model simplifies that each author has equal contribution on multiple-author documents. To overcome this limitation, we assumes that authors give different degrees of contributions on a document by using a Dirichlet distribution. This automatically transforms the unsupervised AT model to Supervised Author-Topic (SAT) model, which brings a novelty of authorship prediction on anonymous texts. The SAT model outperforms the AT model for identifying authors of documents written by either single authors or multiple authors with a better Receiver Operating Characteristic (ROC) curve and a significantly higher Area Under Curve (AUC). The SAT model not only achieves competitive performance to state-of-the-art techniques e.g. Random forests but also maintains the characteristics of the unsupervised models for information discovery i.e. Word distributions of topics, author interests, and author contributions.