Biblio

Filters: Keyword is indexing
2021-09-01
Walter, Dominik, Witterauf, Michael, Teich, Jürgen.  2020.  Real-time Scheduling of I/O Transfers for Massively Parallel Processor Arrays. 2020 18th ACM-IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE). :1–11.
The following topics are dealt with: formal verification; formal specification; cyber-physical systems; program verification; mobile robots; control engineering computing; temporal logic; security of data; Internet of Things; traffic engineering computing.
2021-08-05
Bogatu, Alex, Fernandes, Alvaro A. A., Paton, Norman W., Konstantinou, Nikolaos.  2020.  Dataset Discovery in Data Lakes. 2020 IEEE 36th International Conference on Data Engineering (ICDE). :709–720.
Data analytics stands to benefit from the increasing availability of datasets that are held without their conceptual relationships being explicitly known. When collected, these datasets form a data lake from which, by processes like data wrangling, specific target datasets can be constructed that enable value-adding analytics. Given the potential vastness of such data lakes, the issue arises of how to pull out of the lake those datasets that might contribute to wrangling out a given target. We refer to this as the problem of dataset discovery in data lakes and this paper contributes an effective and efficient solution to it. Our approach uses features of the values in a dataset to construct hash-based indexes that map those features into a uniform distance space. This makes it possible to define similarity distances between features and to take those distances as measurements of relatedness w.r.t. a target table. Given the latter (and exemplar tuples), our approach returns the most related tables in the lake. We provide a detailed description of the approach and report on empirical results for two forms of relatedness (unionability and joinability) comparing them with prior work, where pertinent, and showing significant improvements in all of precision, recall, target coverage, indexing and discovery times.
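
As a rough illustration of the value-feature hashing idea described above (a hedged sketch only, not the authors' implementation or parameter choices), the following Python fragment builds MinHash signatures over the value sets of columns and ranks candidate lake columns by estimated Jaccard overlap with a query column, a simple proxy for joinability/unionability:

    # Minimal sketch: hash the value sets of columns into MinHash signatures so
    # that signature similarity approximates Jaccard overlap between columns.
    import hashlib

    def minhash_signature(values, num_hashes=64):
        """Return a MinHash signature for a set of string values."""
        sig = []
        for seed in range(num_hashes):
            best = None
            for v in values:
                h = int(hashlib.md5(f"{seed}:{v}".encode()).hexdigest(), 16)
                best = h if best is None or h < best else best
            sig.append(best)
        return sig

    def estimated_jaccard(sig_a, sig_b):
        """Fraction of matching signature positions approximates Jaccard similarity."""
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

    # Toy example: a query column versus two candidate lake columns.
    query = {"alice", "bob", "carol", "dave"}
    lake = {"t1.name": {"alice", "bob", "carol", "eve"},
            "t2.city": {"paris", "london", "rome"}}
    q_sig = minhash_signature(query)
    ranked = sorted(lake, key=lambda c: -estimated_jaccard(q_sig, minhash_signature(lake[c])))
    print(ranked)  # t1.name should rank above t2.city
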
2021-02-22
Li, M., Zhang, Y., Sun, Y., Wang, W., Tsang, I. W., Lin, X..  2020.  I/O Efficient Approximate Nearest Neighbour Search based on Learned Functions. 2020 IEEE 36th International Conference on Data Engineering (ICDE). :289–300.
Approximate nearest neighbour search (ANNS) in high-dimensional space is a fundamental problem in many applications, such as multimedia databases, computer vision and information retrieval. Among many solutions, data-sensitive hashing-based methods are effective for this problem, yet few of them are designed for external storage scenarios and hence are not optimized for I/O efficiency during query processing. In this paper, we introduce a novel data-sensitive indexing and query processing framework for ANNS with an emphasis on optimizing I/O efficiency, especially sequential I/Os. The proposed index consists of several lists of point IDs, ordered by values that are obtained by learned hashing (i.e., mapping) functions on each corresponding data point. The functions are learned from the data and approximately preserve the order in the high-dimensional space. We consider two instantiations of the functions (linear and non-linear), both learned from the data with novel objective functions. We also develop an I/O-efficient ANNS framework based on the index. Comprehensive experiments on six benchmark datasets show that our proposed methods with the learned index structure perform much better than state-of-the-art external memory-based ANNS methods in terms of I/O efficiency and accuracy.
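
A minimal sketch of the index layout described in the abstract, with a random linear projection standing in for the paper's learned functions and objective (all names and parameters below are illustrative assumptions):

    # Project points onto a 1-D "ranking" function, store IDs sorted by that value,
    # and answer a query by sequentially scanning a contiguous window around the
    # query's position, which maps to sequential I/O on disk.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(10_000, 64))           # toy dataset
    w = rng.normal(size=64)                        # stand-in for a learned linear function
    keys = data @ w
    order = np.argsort(keys)                       # the on-disk list of point IDs
    sorted_keys = keys[order]

    def query(q, window=200):
        """Return approximate nearest neighbours of q using one contiguous scan."""
        pos = np.searchsorted(sorted_keys, q @ w)
        lo, hi = max(0, pos - window), min(len(order), pos + window)
        cand = order[lo:hi]                        # contiguous -> sequential reads
        d = np.linalg.norm(data[cand] - q, axis=1)
        return cand[np.argsort(d)[:10]]

    print(query(rng.normal(size=64)))
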
2020-12-11
Zhang, W., Byna, S., Niu, C., Chen, Y..  2019.  Exploring Metadata Search Essentials for Scientific Data Management. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). :83–92.

Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, tries, self-balancing search trees, and sparse arrays) have been developed as part of efforts to provide an efficient method for locating target data. However, how to choose an efficient indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation of metadata, metadata queries, and the corresponding data structures. In this study, we perform a systematic study of metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of its metadata. We also study possible metadata queries based on the discovered metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on a real-world dataset suggests that a trie is a suitable data structure when prefix/suffix queries are required; otherwise, a hash table should be used. We conclude our study with a summary of our findings. These findings provide a guideline and offer insights for developing metadata indexing methodologies for scientific applications.
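
The take-away about data-structure choice can be illustrated with a small sketch (illustrative only; the attribute values and IDs below are made up): a trie answers prefix queries over metadata values, while a plain hash table suffices for exact matches.

    class TrieNode:
        def __init__(self):
            self.children = {}
            self.ids = []            # object IDs whose attribute value ends here

    class Trie:
        def __init__(self):
            self.root = TrieNode()

        def insert(self, value, obj_id):
            node = self.root
            for ch in value:
                node = node.children.setdefault(ch, TrieNode())
            node.ids.append(obj_id)

        def prefix_search(self, prefix):
            node = self.root
            for ch in prefix:
                if ch not in node.children:
                    return []
                node = node.children[ch]
            out, stack = [], [node]
            while stack:                       # collect every ID below the prefix node
                n = stack.pop()
                out.extend(n.ids)
                stack.extend(n.children.values())
            return out

    trie, exact = Trie(), {}
    for obj_id, name in enumerate(["ngc1365_r.fits", "ngc1365_g.fits", "m31_r.fits"]):
        trie.insert(name, obj_id)
        exact[name] = obj_id

    print(trie.prefix_search("ngc1365"))   # prefix query -> trie
    print(exact["m31_r.fits"])             # exact-match query -> hash table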

2020-11-20
Moghaddam, F. F., Wieder, P., Yahyapour, R., Khodadadi, T..  2018.  A Reliable Ring Analysis Engine for Establishment of Multi-Level Security Management in Clouds. 2018 41st International Conference on Telecommunications and Signal Processing (TSP). :1–5.
Security and privacy challenges are the main obstacles to the advancement of cloud computing, and the erosion of trust boundaries already happening in organizations is amplified and accelerated by this emerging technology. Policy management frameworks are the most appropriate solutions for creating dedicated security levels based on the sensitivity of resources, according to the mapping process between the requirements of cloud customers and the capabilities of service providers. The most concerning issue in these frameworks is the rate of perfect matches between capabilities and requirements. In this paper, a reliable ring analysis engine is introduced to efficiently map the security requirements of cloud customers to the capabilities of the service provider and to enhance the rate of perfect matches between them for the establishment of different security levels in clouds. In the suggested model, a structural index receives the requirements and efficiently maps them to the most appropriate security mechanisms of the service provider. Our results show that this index-based engine enhances the rate of perfect matches considerably and decreases the detected conflicts in syntactic and semantic analysis.
2020-09-11
Kim, Donghoon, Sample, Luke.  2019.  Search Prevention with Captcha Against Web Indexing: A Proof of Concept. 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). :219–224.
A website appears in search results based on web indexing conducted by a search engine bot (e.g., a web crawler). Some webpages do not want to be found easily because they include sensitive information. There are several methods to prevent web crawlers from indexing webpages into a search engine's database. However, such webpages can still be indexed by malicious web crawlers. In this study, we explore a paradoxical new use of captchas for search prevention: sensitive words are converted to captchas so that web crawlers cannot index them. We have implemented a web-based captcha conversion tool based on our search prevention algorithm. We also describe our proof of concept with a web-based chat application modified to use our algorithm. We conducted an experiment to evaluate our idea on the Google search engine with two versions of webpages, one containing plain text and another containing sensitive words converted to captchas. The experimental results show that the sensitive words on the captcha version of the webpages cannot be found by Google's search engine, while those on the plain-text version can.
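
A hedged sketch of the conversion idea, not the authors' tool: sensitive words are replaced in the HTML by images that render the words, so a crawler indexing the page text never sees them. The helper names and the use of Pillow are assumptions, and a real deployment would add captcha-style distortion to the images.

    import re
    from PIL import Image, ImageDraw   # pip install Pillow

    def word_to_image(word, path):
        # Render the word onto a plain image; a real captcha would add noise/warping.
        img = Image.new("RGB", (10 * len(word) + 20, 30), "white")
        ImageDraw.Draw(img).text((10, 8), word, fill="black")   # default bitmap font
        img.save(path)

    def protect_html(html, sensitive_words):
        # Replace each sensitive word with an <img> tag referencing its rendered image.
        for i, word in enumerate(sensitive_words):
            path = f"captcha_{i}.png"
            word_to_image(word, path)
            html = re.sub(rf"\b{re.escape(word)}\b", f'<img src="{path}" alt="">', html)
        return html

    page = "<p>Contact John Doe, account 1234-5678.</p>"
    print(protect_html(page, ["John Doe", "1234-5678"]))
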
2020-05-22
Markchit, Sarawut, Chiu, Chih-Yi.  2019.  Hash Code Indexing in Cross-Modal Retrieval. 2019 International Conference on Content-Based Multimedia Indexing (CBMI). :1–4.

Cross-modal hashing, which searches nearest neighbors across different modalities in the Hamming space, has recently become a popular technique for overcoming the storage and computation barriers in multimedia retrieval. Although dozens of cross-modal hashing algorithms have been proposed to yield compact binary code representations, applying exhaustive search to a large-scale dataset is impractical for real-time purposes, and the Hamming distance computation suffers from inaccurate results. In this paper, we propose a novel index scheme over binary hash codes in cross-modal retrieval. The proposed indexing scheme exploits a few binary bits of the hash code as the index code. Based on the index code representation, we construct an inverted index structure to accelerate retrieval efficiency and train a neural network to improve indexing accuracy. Experiments are performed on two benchmark datasets for retrieval across image and text modalities, where hash codes are generated by three cross-modal hashing methods. Results show that the proposed method effectively boosts performance over the benchmark datasets and hash methods.
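
A minimal sketch of the index-code idea (the leading bits of each hash code stand in for the paper's learned index-code selection and neural re-ranking; sizes and names are illustrative):

    from collections import defaultdict
    import random

    CODE_BITS, INDEX_BITS = 32, 8
    random.seed(0)
    codes = {i: random.getrandbits(CODE_BITS) for i in range(100_000)}   # item -> hash code

    index = defaultdict(list)
    for item, code in codes.items():
        index[code >> (CODE_BITS - INDEX_BITS)].append(item)             # bucket by index code

    def query(q_code, radius=1):
        key = q_code >> (CODE_BITS - INDEX_BITS)
        probes = [key]
        if radius >= 1:                                                  # probe nearby buckets
            probes += [key ^ (1 << b) for b in range(INDEX_BITS)]
        cand = [i for k in probes for i in index.get(k, [])]
        # final ranking by full Hamming distance on the complete hash codes
        return sorted(cand, key=lambda i: bin(codes[i] ^ q_code).count("1"))[:10]

    print(query(random.getrandbits(CODE_BITS)))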

Rattaphun, Munlika, Prayoonwong, Amorntip, Chiu, Chih-Yi.  2019.  Indexing in k-Nearest Neighbor Graph by Hash-Based Hill-Climbing. 2019 16th International Conference on Machine Vision Applications (MVA). :1–4.
A main issue in approximate nearest neighbor search is to achieve an excellent tradeoff between search accuracy and computation cost. In this paper, we address this issue by leveraging a k-nearest neighbor graph and hill-climbing to accelerate vector quantization in the query assignment process. A modified hill-climbing algorithm is proposed to traverse the k-nearest neighbor graph to find the closest centroids for a query, rather than calculating the query distances to all centroids. Instead of using random seeds as in the original hill-climbing algorithm, we generate high-quality seeds based on a hashing technique. This boosts query assignment efficiency due to a better start-up in hill-climbing. We evaluate the proposed method on the SIFT1M and GIST1M benchmark datasets and show that the proposed hashing-based seed generation effectively improves search performance.
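
A simplified sketch of the query assignment procedure, assuming random-hyperplane sketches as the seeding hash (the paper's actual hashing technique and parameters may differ):

    # Hill-climb on a k-NN graph of centroids, starting from a hash-selected seed,
    # instead of computing the query's distance to every centroid.
    import numpy as np

    rng = np.random.default_rng(0)
    centroids = rng.normal(size=(256, 32))

    # k-NN graph over centroids (brute force here; built offline in practice).
    d = np.linalg.norm(centroids[:, None] - centroids[None], axis=2)
    knn = np.argsort(d, axis=1)[:, 1:9]                     # 8 neighbours each

    # hash-based seeding: bucket centroids by the signs of a few random projections
    planes = rng.normal(size=(8, 32))
    def sketch(x): return tuple((planes @ x > 0).astype(int))
    buckets = {}
    for idx, c in enumerate(centroids):
        buckets.setdefault(sketch(c), idx)                  # keep one seed per bucket

    def assign(query):
        cur = buckets.get(sketch(query), 0)                 # seed centroid
        while True:                                         # greedy hill-climbing
            cand = np.append(knn[cur], cur)
            best = cand[np.argmin(np.linalg.norm(centroids[cand] - query, axis=1))]
            if best == cur:
                return cur
            cur = best

    print(assign(rng.normal(size=32)))
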
2020-02-17
Nguyen, Trinh, Le, Cuong, Yoo, Myungsik.  2019.  A Self-Healing Mechanism for NFV By Leveraging Resource Information Indexing Technique. 2019 International Conference on Information and Communication Technology Convergence (ICTC). :802–804.
Network Functions Virtualization (NFV) enables network operators to develop and deploy services to the market in a flexible and timely manner that has never been achieved before. Although NFV has revolutionized the telco industry with the many advantages of virtualization technologies, there are challenges that need to be taken into account, especially fault recovery. Current implementations of NFV systems offer limited self-healing features, such as ping-based health checks, and the recovery procedure is expensive, taking down the whole virtual machine and replacing it with a new one. This article proposes a new self-healing mechanism for NFV that leverages a resource information indexing technique.
2020-02-10
Bansal, Bhawana, Sharma, Monika.  2019.  Client-Side Verification Framework for Offline Architecture of IoT. 2019 3rd International conference on Electronics, Communication and Aerospace Technology (ICECA). :1044–1050.
The Internet of Things (IoT) is a network formed between two or more devices over the Internet that helps in sharing data and resources. IoT is present everywhere and has many applications in our day-to-day life, such as smart homes, smart grid systems that help reduce energy consumption, smart garbage collection that keeps cities clean, and smart cities in general. It also has some limitations, such as concerns about the security of the network and the cost of installing the devices. Many studies have proposed various methods for improving IoT systems. In this paper, we discuss the scope and limitations of IoT in various fields, and we propose a technique to secure the offline architecture of IoT.
2019-09-04
Vanjari, M. S. P., Balsaraf, M. K. P..  2018.  Efficient Exploration of Algorithm in Scholarly Big Data Document. 2018 International Conference on Information, Communication, Engineering and Technology (ICICET). :1–5.
Algorithms are developed, analyzed, and applied throughout the computing field and are used for developing new applications. They are used for finding solutions to problems under different conditions, transforming those problems into algorithmic ones to which standard algorithms can be applied. Scholarly digital documents are increasing day by day. AlgorithmSeer is a search engine for algorithms whose main aim is to provide a large algorithm database; it automatically discovers and extracts algorithms from this big collection of documents, enabling algorithm indexing, searching, discovery, and analysis. A set of scalable techniques used by AlgorithmSeer to identify and extract algorithm representations from a big collection of scholarly documents is proposed. Along with this, anyone, at different levels of knowledge, can access the platform and highlight particularly important and relevant textual content, and in support of lectures and self-learning, the highlighted documents can be shared with others. However, learners at different levels cannot use the highlighted parts of the text at the same level of understanding. We address the problem of predicting new highlights for partially highlighted documents.
2019-05-01
Shirsat, S. D..  2018.  Demonstrating Different Phishing Attacks Using Fuzzy Logic. 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT). :57–61.

Phishing has increased tremendously over the last few years and has become a serious threat to global security and the economy. Existing literature dealing with the problem of phishing is scarce. Phishing is a deception technique that uses a combination of technology and social engineering to acquire sensitive information such as online banking passwords, credit card or bank account details [2]. Phishing can be carried out through emails and websites to collect confidential information. Phishers design fraudulent websites that look similar to legitimate websites and lure the user into visiting the malicious website. Therefore, users must be aware of malicious websites to protect their sensitive data [1]. However, it is very difficult to distinguish between legitimate and fake websites, especially for non-technical users [4]. Moreover, phishing sites are growing rapidly. The aim of this paper is to demonstrate phishing detection using fuzzy logic and to interpret the results using different defuzzification methods.
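
As a small illustration of fuzzy evaluation and defuzzification in this setting (a sketch under assumed features and rules, not the paper's rule base), the fragment below scores two URL features, fires one rule, and compares centroid and mean-of-maximum defuzzification:

    import numpy as np

    risk = np.linspace(0, 1, 101)                    # output universe: phishing risk
    low = np.clip((0.5 - risk) / 0.5, 0, 1)          # "low risk" membership
    high = np.clip((risk - 0.5) / 0.5, 0, 1)         # "high risk" membership

    # crisp inputs (assumed features): URL length and number of dots in the hostname
    url_len, dots = 95, 4
    long_url = min(max((url_len - 54) / 50, 0), 1)   # membership in "long URL"
    many_dots = min(max((dots - 2) / 3, 0), 1)       # membership in "many subdomains"

    fire_high = max(long_url, many_dots)             # rule: long OR many dots -> high risk
    fire_low = 1 - fire_high
    agg = np.maximum(np.minimum(high, fire_high), np.minimum(low, fire_low))

    centroid = (risk * agg).sum() / agg.sum()        # centroid defuzzification
    mom = risk[agg == agg.max()].mean()              # mean-of-maximum defuzzification
    print(f"centroid={centroid:.2f}  mean-of-max={mom:.2f}")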

2019-03-22
Quweider, M., Lei, H., Zhang, L., Khan, F..  2018.  Managing Big Data in Visual Retrieval Systems for DHS Applications: Combining Fourier Descriptors and Metric Space Indexing. 2018 1st International Conference on Data Intelligence and Security (ICDIS). :188–193.

Image retrieval systems have been an active area of research for more than thirty years, progressively producing improved algorithms that raise performance metrics, operate in different domains, take advantage of different features extracted from the images to be retrieved, and have different desirable invariance properties. With the ever-growing visual databases of images and videos produced by a myriad of devices comes the challenge of selecting effective features and performing fast retrieval on such databases. In this paper, we incorporate Fourier descriptors (FD) along with a metric-based balanced indexing tree as a viable solution to DHS (Department of Homeland Security) needs for quick identification and retrieval of weapon images. The FDs allow a simple but effective outline feature representation of an object, while the M-tree provides a dynamic, fast, and balanced search over such features. Motivated by looking for applications of interest to DHS, we have created a basic guns-and-rifles database that can be used to identify weapons in images and videos extracted from media sources. Our simulations show excellent performance in both representation and fast retrieval speed.
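
A minimal sketch of the Fourier-descriptor side of the pipeline (a brute-force scan stands in for the M-tree; the shapes and parameters are toy assumptions):

    # Represent an object outline as a complex sequence, take its FFT, and keep a
    # few magnitude-normalised coefficients as a translation-, scale- and
    # rotation-invariant Fourier descriptor.
    import numpy as np

    def fourier_descriptor(contour_xy, k=16):
        z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # boundary as complex numbers
        F = np.fft.fft(z)
        F[0] = 0                                       # drop DC term -> translation invariance
        mags = np.abs(F)                               # magnitudes -> rotation invariance
        mags = mags / (mags[1] + 1e-12)                # divide by first harmonic -> scale invariance
        return mags[1:k + 1]

    # toy outlines: a noisy circle should match the circle, not the squarish shape
    t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
    circle = np.c_[np.cos(t), np.sin(t)]
    noisy = circle + np.random.default_rng(0).normal(scale=0.02, size=circle.shape)
    squarish = np.c_[np.clip(np.cos(t) * 1.5, -1, 1), np.clip(np.sin(t) * 1.5, -1, 1)]

    q = fourier_descriptor(noisy)
    db = {"circle": fourier_descriptor(circle), "squarish": fourier_descriptor(squarish)}
    print(min(db, key=lambda name: np.linalg.norm(db[name] - q)))   # -> "circle"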

2018-06-11
Yang, C., Li, Z., Qu, W., Liu, Z., Qi, H..  2017.  Grid-Based Indexing and Search Algorithms for Large-Scale and High-Dimensional Data. 2017 14th International Symposium on Pervasive Systems, Algorithms and Networks & 2017 11th International Conference on Frontier of Computer Science and Technology & 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC). :46–51.

The rapid development of the Internet has recently resulted in massive information overload. This information is usually represented by high-dimensional feature vectors in many related applications such as recognition, classification and retrieval. These applications usually need efficient indexing and search methods for such large-scale and high-dimensional databases, which is typically a challenging task. Some efforts have been made that solve this problem to some extent. However, most of them are implemented on a single machine, which is not suitable for handling large-scale databases. In this paper, we present a novel data index structure and nearest neighbor search algorithm implemented on Apache Spark. We impose a grid on the database and index data by non-empty grid cells. This grid-based index structure is simple and easy to implement in parallel. Moreover, we propose to build a scalable KNN graph on the grids, which increases the efficiency of this index structure at a low cost in the parallel implementation. Finally, experiments are conducted on both public and synthetic databases, showing that the proposed methods achieve overall high performance in both efficiency and accuracy.
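
A single-machine sketch of the grid index (the paper parallelizes this on Apache Spark and additionally builds a KNN graph over the cells; cell size and data are illustrative assumptions):

    # Points are bucketed by their grid cell; a query only examines its own and
    # neighbouring cells instead of the whole database.
    from collections import defaultdict
    from itertools import product
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.uniform(size=(50_000, 3))
    CELL = 0.1

    grid = defaultdict(list)
    for idx, p in enumerate(data):
        grid[tuple((p // CELL).astype(int))].append(idx)     # only non-empty cells stored

    def knn(q, k=5):
        c = tuple((q // CELL).astype(int))
        cand = [i for off in product((-1, 0, 1), repeat=3)   # probe the 27 surrounding cells
                for i in grid.get(tuple(np.add(c, off)), [])]
        cand = np.array(cand)
        d = np.linalg.norm(data[cand] - q, axis=1)
        return cand[np.argsort(d)[:k]]

    print(knn(rng.uniform(size=3)))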

2018-03-26
Abuein, Q., Shatnawi, A., Al-Sheyab, H..  2017.  Trusted Recommendation System Based on Level of Trust (TRS_LoT). 2017 International Conference on Engineering and Technology (ICET). :1–5.

There are vast amounts of information in our world, and accessing the most accurate information quickly is becoming more difficult and complicated. A lot of relevant information gets ignored, which leads to much duplication of work and effort. The focus therefore tends to be on providing rapid and intelligent retrieval systems. Information retrieval (IR) is the process of searching for information related to some topic of interest. Due to the massive number of search results, the user will normally have difficulty identifying the relevant ones. To alleviate this problem, a recommendation system is used. A recommendation system is a kind of information filtering system that predicts the relevance of retrieved information to the user's needs according to some criteria, and hence can provide the user with the results that best fit their needs. The services provided through the web normally return massive amounts of information about any requested item or service, and an efficient recommendation system is required to classify these results. A recommendation system can be further improved if augmented with level-of-trust information, so that recommendations are ranked according to their level of trust. In our research, we produced a recommendation system combined with an efficient level-of-trust system to guarantee that the posts, comments and feedback from users are trusted. We customized the concept of LoT (Level of Trust) [1], since it can cover medical, shopping and learning through social media. The proposed system, TRS_LoT, provides trusted recommendations to users with a high percentage of accuracy. A dataset of 300 posts with more than 5000 comments from "Amazon" was selected, and the experiment was conducted on this dataset based on "post rating".

2018-01-16
Arasu, Arvind, Eguro, Ken, Kaushik, Raghav, Kossmann, Donald, Meng, Pingfan, Pandey, Vineet, Ramamurthy, Ravi.  2017.  Concerto: A High Concurrency Key-Value Store with Integrity. Proceedings of the 2017 ACM International Conference on Management of Data. :251–266.

Verifying the integrity of outsourced data is a classic, well-studied problem. However, current techniques have fundamental performance and concurrency limitations for update-heavy workloads. In this paper, we investigate the potential advantages of deferred and batched verification rather than the per-operation verification used in prior work. We present Concerto, a comprehensive key-value store designed around this idea. Using Concerto, we argue that deferred verification preserves the utility of online verification and improves concurrency, resulting in orders-of-magnitude performance improvements. On standard benchmarks, the performance of Concerto is within a factor of two of state-of-the-art key-value stores without integrity guarantees.
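
A hedged sketch of deferred, batched integrity verification in the spirit of the abstract, not Concerto's actual protocol: every read and write folds a record digest into running read-set/write-set accumulators, and a single check at the end of an epoch replaces per-operation verification. XOR stands in here for a proper incremental multiset hash.

    import hashlib

    def h(key, value, version):
        d = hashlib.sha256(f"{key}|{value}|{version}".encode()).digest()
        return int.from_bytes(d, "big")

    class DeferredVerifiedKV:
        """Toy store: integrity is checked once per epoch, not per operation."""
        def __init__(self):
            self.store = {}      # would live on untrusted storage in a real system
            self.rs = 0          # XOR-accumulated digest of every record read
            self.ws = 0          # XOR-accumulated digest of every record written

        def _read(self, key):
            if key not in self.store:
                self.store[key] = (None, 0)
                self.ws ^= h(key, None, 0)         # initial write of an empty record
            value, ver = self.store[key]
            self.rs ^= h(key, value, ver)
            return value, ver

        def _write(self, key, value, ver):
            self.store[key] = (value, ver)
            self.ws ^= h(key, value, ver)

        def put(self, key, value):
            _, ver = self._read(key)
            self._write(key, value, ver + 1)

        def get(self, key):
            value, ver = self._read(key)
            self._write(key, value, ver + 1)       # rewrite unchanged value, bump version
            return value

        def verify_epoch(self):
            # read back the final state; read and write digests must then cancel out
            final = 0
            for key, (value, ver) in self.store.items():
                final ^= h(key, value, ver)
            return (self.rs ^ final) == self.ws

    kv = DeferredVerifiedKV()
    kv.put("a", 1); kv.put("b", 2); kv.get("a"); kv.put("a", 3)
    print(kv.verify_epoch())   # True unless records were tampered with between operations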

Ba-Hutair, M. N., Kamel, I..  2016.  A New Scheme for Protecting the Privacy and Integrity of Spatial Data on the Cloud. 2016 IEEE Second International Conference on Multimedia Big Data (BigMM). :394–397.

As the amount of spatial data gets bigger, organizations have realized that it is cheaper and more flexible to keep their data on the Cloud rather than to establish and maintain huge in-house data centers. Though this saves a lot in IT costs, organizations are still concerned about the privacy and security of their data. Encrypting the whole database before uploading it to the Cloud solves the security issue, but querying the database then requires downloading and decrypting the data set, which is impractical. In this paper, we propose a new scheme for protecting the privacy and integrity of spatial data stored in the Cloud while being able to execute range queries efficiently. The proposed technique suggests a new index structure to support answering range queries over the encrypted data set. The proposed indexing scheme is based on the Z-curve. The paper describes a distributed algorithm for answering range queries over spatial data stored on the Cloud. We carried out many simulation experiments to measure the performance of the proposed scheme. The experimental results show that the proposed scheme outperforms the most recent schemes by Kim et al. in terms of data redundancy.
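
A minimal sketch of the Z-curve (Morton order) indexing idea only, without the encryption and the distributed query answering that the paper adds:

    def interleave(x, y, bits=16):
        """Morton code: interleave the bits of the two coordinates."""
        z = 0
        for i in range(bits):
            z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
        return z

    points = [(3, 7), (10, 2), (12, 13), (5, 5), (9, 9)]
    index = sorted((interleave(x, y), (x, y)) for x, y in points)   # 1-D sorted index

    def range_query(xlo, ylo, xhi, yhi):
        zlo, zhi = interleave(xlo, ylo), interleave(xhi, yhi)
        # candidate filter on the 1-D Z-value interval, then an exact refinement step
        cand = [p for z, p in index if zlo <= z <= zhi]
        return [(x, y) for x, y in cand if xlo <= x <= xhi and ylo <= y <= yhi]

    print(range_query(2, 4, 8, 8))   # -> [(3, 7), (5, 5)]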

2015-05-06
Jingkuan Song, Yi Yang, Xuelong Li, Zi Huang, Yang Yang.  2014.  Robust Hashing With Local Models for Approximate Similarity Search. Cybernetics, IEEE Transactions on. 44:1225–1236.

Similarity search plays an important role in many applications involving high-dimensional data. Due to the known dimensionality curse, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing ℓ2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets that have hash codes similar to the query hash code. Extensive experimental results on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.
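
As a small aside on the ℓ2,1-norm mentioned above: it sums the ℓ2 norms of a matrix's rows, so using it in a loss or penalty down-weights entire rows at once, which is the source of its robustness / row-sparsity effect. A tiny illustration, not RHLM itself:

    import numpy as np

    def l21_norm(W):
        return np.sum(np.linalg.norm(W, axis=1))    # sum of per-row l2 norms

    W = np.array([[3.0, 4.0],      # row norm 5
                  [0.0, 0.0],      # zero row contributes nothing
                  [1.0, 0.0]])     # row norm 1
    print(l21_norm(W))             # 6.0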
 

Kafai, M., Eshghi, K., Bhanu, B..  2014.  Discrete Cosine Transform Locality-Sensitive Hashes for Face Retrieval. Multimedia, IEEE Transactions on. 16:1090–1103.

Descriptors such as local binary patterns perform well for face recognition. Searching large databases using such descriptors has been problematic due to the cost of linear search and the inadequate performance of existing indexing methods. We present Discrete Cosine Transform (DCT) hashing for creating index structures for face descriptors. Hashes play the role of keywords: an index is created and queried to find the images most similar to the query image. Common hash suppression is used to improve retrieval efficiency and accuracy. Results are shown on a combination of six publicly available face databases (LFW, FERET, FEI, BioID, Multi-PIE, and RaFD). It is shown that DCT hashing has significantly better retrieval accuracy and is more efficient compared to other popular state-of-the-art hash algorithms.
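
A rough sketch of the flavour of DCT hashing (not the authors' exact scheme or parameters): low-frequency DCT coefficients of a face descriptor are sign-quantised into a hash that keys a bucket of candidate faces.

    import numpy as np

    def dct2_matrix(n):
        k = np.arange(n)[:, None]
        return np.cos(np.pi * (np.arange(n) + 0.5) * k / n)   # DCT-II basis

    def dct_hash(desc, bits=16):
        coeffs = dct2_matrix(len(desc)) @ desc
        return tuple((coeffs[1:bits + 1] > 0).astype(int))     # skip DC, sign-quantise

    rng = np.random.default_rng(0)
    gallery = {f"face_{i}": rng.normal(size=59) for i in range(1000)}   # random stand-ins for face descriptors
    index = {}
    for name, desc in gallery.items():
        index.setdefault(dct_hash(desc), []).append(name)      # hash -> candidate bucket

    probe = gallery["face_7"] + rng.normal(scale=0.01, size=59)
    print(index.get(dct_hash(probe), []))   # a near-duplicate usually lands in face_7's bucket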
 

2015-05-05
Zadeh, B.Q., Handschuh, S..  2014.  Random Manhattan Indexing. Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on. :203–208.

Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in text processing. In these models, high-dimensional, often sparse vectors represent text units. In an application, the similarity of vectors, and hence of the text units that they represent, is computed by a distance formula. The high dimensionality of vectors, however, is a barrier to the performance of methods that employ VSMs. Consequently, a dimensionality reduction technique is employed to alleviate this problem. This paper introduces a new method, called Random Manhattan Indexing (RMI), for the construction of L1-normed VSMs at reduced dimensionality. RMI combines the construction of a VSM and dimension reduction into an incremental, and thus scalable, procedure. In order to attain its goal, RMI employs sparse Cauchy random projections.
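
A minimal sketch of the Cauchy-projection principle behind RMI (a dense projection is used here for clarity; RMI's contribution includes a sparse, incremental construction, and all sizes below are illustrative):

    # Project term vectors with a Cauchy-distributed random matrix; the median of
    # the absolute coordinate-wise differences of two projected vectors estimates
    # their original L1 (Manhattan) distance.
    import numpy as np

    rng = np.random.default_rng(0)
    D, d = 10_000, 400                      # original and reduced dimensionality

    R = rng.standard_cauchy((D, d))         # dense Cauchy projection (RMI uses a sparse variant)

    a = rng.random(D) * (rng.random(D) < 0.01)   # two sparse "term frequency" vectors
    b = rng.random(D) * (rng.random(D) < 0.01)

    pa, pb = a @ R, b @ R
    estimate = np.median(np.abs(pa - pb))   # median estimator of the L1 distance
    print(round(estimate, 1), round(np.abs(a - b).sum(), 1))   # estimate vs. true distance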

Dey, L., Mahajan, D., Gupta, H..  2014.  Obtaining Technology Insights from Large and Heterogeneous Document Collections. Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on. 1:102–109.

Keeping up with rapid advances in research in various fields of engineering and technology is a challenging task. Decision makers, including academics, program managers, venture capital investors, industry leaders and funding agencies, not only need to be abreast of the latest developments but also be able to assess the effect of growth in certain areas on their core business. Though analyst agencies like Gartner, McKinsey, etc. provide such reports for some areas, thought leaders of all organisations still need to amass data from heterogeneous collections like research publications, analyst reports, patent applications, competitor information, etc. to help them finalize their own strategies. Text mining and data analytics researchers have been looking at integrating statistics, text analytics and information visualization to aid the process of retrieval and analytics. In this paper, we present our work on automated topical analysis and insight generation from large heterogeneous text collections of publications and patents. While most of the earlier work in this area provides search-based platforms, ours is an integrated platform for search and analysis. We have presented several methods and techniques that help in the analysis and better comprehension of search results. We have also presented methods for generating insights about emerging and popular trends in research, along with contextual differences between academic research and patenting profiles. We also present novel techniques to present topic evolution that help users understand how a particular area has evolved over time.
 

2015-05-01
Hammoud, R.I., Sahin, C.S., Blasch, E.P., Rhodes, B.J..  2014.  Multi-source Multi-modal Activity Recognition in Aerial Video Surveillance. Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on. :237–244.

Recognizing activities in wide aerial/overhead imagery remains a challenging problem, due in part to low-resolution video and cluttered scenes with a large number of moving objects. In the context of this research, we deal with two unsynchronized data sources collected in real-world operating scenarios: full-motion videos (FMV) and analyst call-outs (ACO) in the form of chat messages (voice-to-text) made by a human watching the streamed FMV from an aerial platform. We present a multi-source multi-modal activity/event recognition system for surveillance applications, consisting of: (1) detecting and tracking multiple dynamic targets from a moving platform, (2) representing FMV target tracks and chat messages as graphs of attributes, (3) associating FMV tracks and chat messages using a probabilistic graph-based matching approach, and (4) detecting spatial-temporal activity boundaries. We also present an activity pattern learning framework which uses the multi-source associated data as training data to index a large archive of FMV videos. Finally, we describe a multi-intelligence user interface for querying an index of activities of interest (AOIs) by movement type and geo-location, and for playing back a summary of associated text (ACO) and activity video segments of targets-of-interest (TOIs) (in both pixel and geo-coordinates). Such tools help the end-user to quickly search, browse, and prepare mission reports from multi-source data.