Biblio

Filters: Keyword is digital libraries
2023-02-03
Ouamour, S., Sayoud, H.  2022.  Computational Identification of Author Style on Electronic Libraries - Case of Lexical Features. 2022 5th International Symposium on Informatics and its Applications (ISIA). :1–4.
This work presents a thorough study of a digital library corpus, called the HAT corpus, for the purpose of authorship attribution. A dataset of 300 documents written by 100 different authors was extracted from the web digital library and processed for author style analysis. All the documents concern the travel topic and are written in Arabic. Three important rules in stylometry should be respected: a minimum document size, the same topic for all documents, and the same genre. We made a particular effort to respect those conditions during corpus preparation. Three lexical features (fixed-length words, rare words, and suffixes) are used and evaluated with a centroid-based Manhattan distance. The identification approach shows interesting results, with an accuracy of about 0.94.
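As a rough illustration of what a centroid-based Manhattan distance classifier looks like, the sketch below averages each author's feature vectors into a centroid and attributes a new document to the author whose centroid is nearest in L1 distance. It is not the authors' implementation: the function names, the author labels, and the three-value feature vectors are all hypothetical, and the extraction of the lexical features themselves is not shown.

```python
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def train_centroids(samples):
    """Build one centroid per author from (author, feature_vector) pairs."""
    by_author = defaultdict(list)
    for author, vector in samples:
        by_author[author].append(vector)
    return {author: centroid(vectors) for author, vectors in by_author.items()}


def attribute(centroids, vector):
    """Return the author whose centroid is closest to the given vector."""
    return min(centroids, key=lambda author: manhattan(centroids[author], vector))


# Hypothetical feature vectors (e.g. relative frequencies of lexical features).
training = [
    ("author_a", [0.30, 0.10, 0.05]),
    ("author_a", [0.28, 0.12, 0.07]),
    ("author_b", [0.10, 0.25, 0.20]),
    ("author_b", [0.12, 0.27, 0.18]),
]

centroids = train_centroids(training)
predicted = attribute(centroids, [0.29, 0.11, 0.06])
print(predicted)
```

With several documents per author, as in the corpus described above, each centroid summarizes an author's typical feature profile, so attribution needs one distance computation per author rather than one per training document.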
2017-12-12
Saundry, A.  2017.  Institutional Repository Digital Object Metadata Enhancement and Re-Architecting. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL). :1–3.

We present work undertaken at our institutional repository to enhance metadata and re-organize digital objects according to new information architecture, in an effort to minimize administrative object management and processing, and improve object discovery and use. This work was partly motivated by the launch of a new discovery platform at our institution, which aggregates metadata and full text from our four open access repositories into a cohesive, consistent, and enhanced searching and browsing experience. The platform provides digital object identifier (DOI) assignment, metadata access via various formats, and an open metadata and full text application program interface (API) for researchers, amongst other features. Functionality of these platform features relies heavily on accurate object representation and metadata. This work facilitates and improves the discovery and engagement of the diverse digital objects available from our institution, so they can be used and analyzed in new, flexible, and innovative ways by a myriad of communities and disciplines.

2017-03-07
Farag, Mohamed, Nakate, Pranav, Fox, Edward A.  2016.  Big Data Processing of School Shooting Archives. Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries. :271–272.

Web archives about school shootings consist of webpages that may or may not be relevant to the events of interest. This work has three main goals. The first is to clean the webpages, which involves removing stop words and non-relevant parts of each webpage. The second is to select only the webpages relevant to the events of interest. The third is to upload the cleaned and relevant webpages to Apache Solr so that they are easily accessible. We show the details of all the steps required to achieve these goals. The results show that representative Web archives are noisy, with 2% - 40% relevant content. By cleaning the archives, we help researchers focus on relevant content for their analysis.
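The first two goals can be pictured with a minimal sketch: strip stop words from each page, then keep only pages that mention enough event-related terms. This is an assumption-laden illustration, not the pipeline from the paper. The stop-word list, the keyword set, the `min_hits` threshold, and the sample pages are all hypothetical, and the upload to Apache Solr is left out.

```python
import re

# Illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "at", "on"}

# Hypothetical keywords standing in for "events of interest".
EVENT_KEYWORDS = {"shooting", "school", "gunman", "victims"}


def clean(text):
    """Lowercase the text, tokenize it, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def is_relevant(tokens, keywords=EVENT_KEYWORDS, min_hits=2):
    """Treat a page as relevant if it contains at least min_hits distinct keywords."""
    return len(keywords.intersection(tokens)) >= min_hits


# Hypothetical page texts, already extracted from their HTML.
pages = {
    "page1": "The gunman entered the school and the shooting began at noon.",
    "page2": "Subscribe to our newsletter and get the latest deals.",
}

cleaned = {name: clean(text) for name, text in pages.items()}
relevant = {name: tokens for name, tokens in cleaned.items() if is_relevant(tokens)}
print(sorted(relevant))
```

A keyword-overlap filter is only the simplest stand-in for relevance selection; the surviving pages in `relevant` are what a later step would index for search.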