Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 69

Search results for: metadata

69 Design and Implementation of Flexible Metadata Editing System for Digital Contents

Authors: K. W. Nam, B. J. Kim, S. J. Lee

Abstract:

Along with the development of network infrastructures, such as high-speed Internet and mobile environment, the explosion of multimedia data is expanding the range of multimedia services beyond voice and data services. Amid this flow, research is actively being done on the creation, management, and transmission of metadata on digital content to provide different services to users. This paper proposes a system for the insertion, storage, and retrieval of metadata about digital content. The metadata server with Binary XML was implemented for efficient storage space and retrieval speeds, and the transport data size required for metadata retrieval was simplified. With the proposed system, the metadata could be inserted into the moving objects in the video, and the unnecessary overlap could be minimized by improving the storage structure of the metadata. The proposed system can assemble metadata into one relevant topic, even if it is expressed in different media or in different forms. It is expected that the proposed system will handle complex network types of data.

Keywords: video, multimedia, metadata, editing tool, XML

Procedia PDF Downloads 123
68 Tool for Metadata Extraction and Content Packaging as Endorsed in OAIS Framework

Authors: Payal Abichandani, Rishi Prakash, Paras Nath Barwal, B. K. Murthy

Abstract:

Information generated from various computerization processes is a potential rich source of knowledge for its designated community. To pass this information from generation to generation without modifying the meaning is a challenging activity. To preserve and archive the data for future generations it’s very essential to prove the authenticity of the data. It can be achieved by extracting the metadata from the data which can prove the authenticity and create trust on the archived data. Subsequent challenge is the technology obsolescence. Metadata extraction and standardization can be effectively used to resolve and tackle this problem. Metadata can be categorized at two levels i.e. Technical and Domain level broadly. Technical metadata will provide the information that can be used to understand and interpret the data record, but only this level of metadata isn’t sufficient to create trustworthiness. We have developed a tool which will extract and standardize the technical as well as domain level metadata. This paper is about the different features of the tool and how we have developed this.

Keywords: digital preservation, metadata, OAIS, PDI, XML

Procedia PDF Downloads 357
67 Open educational Resources' Metadata: Towards the First Star to Quality of Open Educational Resources

Authors: Audrey Romero-Pelaez, Juan Carlos Morocho-Yunga

Abstract:

The increasing amount of open educational resources (OER) published on the web for consumption in teaching and learning environments also generates a growing need to ensure the quality of these resources. The low level of OER discovery is one of the most significant drawbacks when faced with its reuse, and as a consequence, high-quality educational resources can go unnoticed. Metadata enables the discovery of resources on the web. The purpose of this study is to lay the foundations for open educational resources to achieve their first quality star within the Quality4OER Framework. In this study, we evaluate the quality of OER metadata and establish the main guidelines on metadata quality in this context.

Keywords: open educational resources, OER quality, quality metadata

Procedia PDF Downloads 192
66 The Dynamic Metadata Schema in Neutron and Photon Communities: A Case Study of X-Ray Photon Correlation Spectroscopy

Authors: Amir Tosson, Mohammad Reza, Christian Gutt

Abstract:

Metadata stands at the forefront of advancing data management practices within research communities, with particular significance in the realms of neutron and photon scattering. This paper introduces a groundbreaking approach—dynamic metadata schema—within the context of X-ray Photon Correlation Spectroscopy (XPCS). XPCS, a potent technique unravelling nanoscale dynamic processes, serves as an illustrative use case to demonstrate how dynamic metadata can revolutionize data acquisition, sharing, and analysis workflows. This paper explores the challenges encountered by the neutron and photon communities in navigating intricate data landscapes and highlights the prowess of dynamic metadata in addressing these hurdles. Our proposed approach empowers researchers to tailor metadata definitions to the evolving demands of experiments, thereby facilitating streamlined data integration, traceability, and collaborative exploration. Through tangible examples from the XPCS domain, we showcase how embracing dynamic metadata standards bestows advantages, enhancing data reproducibility, interoperability, and the diffusion of knowledge. Ultimately, this paper underscores the transformative potential of dynamic metadata, heralding a paradigm shift in data management within the neutron and photon research communities.

Keywords: metadata, FAIR, data analysis, XPCS, IoT

Procedia PDF Downloads 26
65 Creating and Questioning Research-Oriented Digital Outputs to Manuscript Metadata: A Case-Based Methodological Investigation

Authors: Diandra Cristache

Abstract:

The transition of traditional manuscript studies into the digital framework closely affects the methodological premises upon which manuscript descriptions are modeled, created, and questioned for the purpose of research. This paper intends to explore the issue by presenting a methodological investigation into the process of modeling, creating, and questioning manuscript metadata. The investigation is founded on a close observation of the Polonsky Greek Manuscripts Project, a collaboration between the Universities of Cambridge and Heidelberg. More than just providing a realistic ground for methodological exploration, along with a complete metadata set for computational demonstration, the case study also contributes to a broader purpose: outlining general methodological principles for making the most out of manuscript metadata by means of research-oriented digital outputs. The analysis mainly focuses on the scholarly approach to manuscript descriptions, in the specific instance where the act of metadata recording does not have a programmatic research purpose. Close attention is paid to the encounter of 'traditional' practices in manuscript studies with the formal constraints of the digital framework: does the shift in practices (especially from the straight narrative of free writing towards the hierarchical constraints of the TEI encoding model) impact the structure of metadata and its capability to respond specific research questions? It is argued that flexible structure of TEI and traditional approaches to manuscript description lead to a proliferation of markup: does an 'encyclopedic' descriptive approach ensure the epistemological relevance of the digital outputs to metadata? To provide further insight on the computational approach to manuscript metadata, the metadata of the Polonsky project are processed with techniques of distant reading and data networking, thus resulting in a new group of digital outputs (relational graphs, geographic maps). The computational process and the digital outputs are thoroughly illustrated and discussed. Eventually, a retrospective analysis evaluates how the digital outputs respond to the scientific expectations of research, and the other way round, how the requirements of research questions feed back into the creation and enrichment of metadata in an iterative loop.

Keywords: digital manuscript studies, digital outputs to manuscripts metadata, metadata interoperability, methodological issues

Procedia PDF Downloads 103
64 Integration of Knowledge and Metadata for Complex Data Warehouses and Big Data

Authors: Jean Christian Ralaivao, Fabrice Razafindraibe, Hasina Rakotonirainy

Abstract:

This document constitutes a resumption of work carried out in the field of complex data warehouses (DW) relating to the management and formalization of knowledge and metadata. It offers a methodological approach for integrating two concepts, knowledge and metadata, within the framework of a complex DW architecture. The objective of the work considers the use of the technique of knowledge representation by description logics and the extension of Common Warehouse Metamodel (CWM) specifications. This will lead to a fallout in terms of the performance of a complex DW. Three essential aspects of this work are expected, including the representation of knowledge in description logics and the declination of this knowledge into consistent UML diagrams while respecting or extending the CWM specifications and using XML as pivot. The field of application is large but will be adapted to systems with heteroge-neous, complex and unstructured content and moreover requiring a great (re)use of knowledge such as medical data warehouses.

Keywords: data warehouse, description logics, integration, knowledge, metadata

Procedia PDF Downloads 94
63 Trimma: Trimming Metadata Storage and Latency for Hybrid Memory Systems

Authors: Yiwei Li, Boyu Tian, Mingyu Gao

Abstract:

Hybrid main memory systems combine both performance and capacity advantages from heterogeneous memory technologies. With larger capacities, higher associativities, and finer granularities, hybrid memory systems currently exhibit significant metadata storage and lookup overheads for flexibly remapping data blocks between the two memory tiers. To alleviate the inefficiencies of existing designs, we propose Trimma, the combination of a multi-level metadata structure and an efficient metadata cache design. Trimma uses a multilevel metadata table to only track truly necessary address remap entries. The saved memory space is effectively utilized as extra DRAM cache capacity to improve performance. Trimma also uses separate formats to store the entries with non-identity and identity mappings. This improves the overall remap cache hit rate, further boosting the performance. Trimma is transparent to software and compatible with various types of hybrid memory systems. When evaluated on a representative DDR4 + NVM hybrid memory system, Trimma achieves up to 2.4× and on average 58.1% speedup benefits, compared with a state-of-the-art design that only leverages the unallocated fast memory space for caching. Trimma addresses metadata management overheads and targets future scalable large-scale hybrid memory architectures.

Keywords: memory system, data cache, hybrid memory, non-volatile memory

Procedia PDF Downloads 11
62 Provenance in Scholarly Publications: Introducing the provCite Ontology

Authors: Maria Joseph Israel, Ahmed Amer

Abstract:

Our work aims to broaden the application of provenance technology beyond its traditional domains of scientific workflow management and database systems by offering a general provenance framework to capture richer and extensible metadata in unstructured textual data sources such as literary texts, commentaries, translations, and digital humanities. Specifically, we demonstrate the feasibility of capturing and representing expressive provenance metadata, including more of the context for citing scholarly works (e.g., the authors’ explicit or inferred intentions at the time of developing his/her research content for publication), while also supporting subsequent augmentation with similar additional metadata (by third parties, be they human or automated). To better capture the nature and types of possible citations, in our proposed provenance scheme metaScribe, we extend standard provenance conceptual models to form our proposed provCite ontology. This provides a conceptual framework which can accurately capture and describe more of the functional and rhetorical properties of a citation than can be achieved with any current models.

Keywords: knowledge representation, provenance architecture, ontology, metadata, bibliographic citation, semantic web annotation

Procedia PDF Downloads 77
61 Providing Open Access for Scholarly Information in Libya

Authors: Mohamed Abolgasem Arteimi, Ahlam Al-Tajori

Abstract:

This paper describes an ongoing project at the Libyan Academy. The project aims to build digital library for thesis and dissertations (ETD). The researchers developed a system based on Greenstone open source systems for building ETD digital library. A metadata for theses and dissertations was developed. The paper addresses issues related to project design, development and user satisfaction. Conclusions highlighted some important lessons learned to date.

Keywords: digital library, electronic theses and dissertations, open access, ETD, metadata

Procedia PDF Downloads 276
60 Knowledge Graph Development to Connect Earth Metadata and Standard English Queries

Authors: Gabriel Montague, Max Vilgalys, Catherine H. Crawford, Jorge Ortiz, Dava Newman

Abstract:

There has never been so much publicly accessible atmospheric and environmental data. The possibilities of these data are exciting, but the sheer volume of available datasets represents a new challenge for researchers. The task of identifying and working with a new dataset has become more difficult with the amount and variety of available data. Datasets are often documented in ways that differ substantially from the common English used to describe the same topics. This presents a barrier not only for new scientists, but for researchers looking to find comparisons across multiple datasets or specialists from other disciplines hoping to collaborate. This paper proposes a method for addressing this obstacle: creating a knowledge graph to bridge the gap between everyday English language and the technical language surrounding these datasets. Knowledge graph generation is already a well-established field, although there are some unique challenges posed by working with Earth data. One is the sheer size of the databases – it would be infeasible to replicate or analyze all the data stored by an organization like The National Aeronautics and Space Administration (NASA) or the European Space Agency. Instead, this approach identifies topics from metadata available for datasets in NASA’s Earthdata database, which can then be used to directly request and access the raw data from NASA. By starting with a single metadata standard, this paper establishes an approach that can be generalized to different databases, but leaves the challenge of metadata harmonization for future work. Topics generated from the metadata are then linked to topics from a collection of English queries through a variety of standard and custom natural language processing (NLP) methods. The results from this method are then compared to a baseline of elastic search applied to the metadata. This comparison shows the benefits of the proposed knowledge graph system over existing methods, particularly in interpreting natural language queries and interpreting topics in metadata. For the research community, this work introduces an application of NLP to the ecological and environmental sciences, expanding the possibilities of how machine learning can be applied in this discipline. But perhaps more importantly, it establishes the foundation for a platform that can enable common English to access knowledge that previously required considerable effort and experience. By making this public data accessible to the full public, this work has the potential to transform environmental understanding, engagement, and action.

Keywords: earth metadata, knowledge graphs, natural language processing, question-answer systems

Procedia PDF Downloads 111
59 Data Integration with Geographic Information System Tools for Rural Environmental Monitoring

Authors: Tamas Jancso, Andrea Podor, Eva Nagyne Hajnal, Peter Udvardy, Gabor Nagy, Attila Varga, Meng Qingyan

Abstract:

The paper deals with the conditions and circumstances of integration of remotely sensed data for rural environmental monitoring purposes. The main task is to make decisions during the integration process when we have data sources with different resolution, location, spectral channels, and dimension. In order to have exact knowledge about the integration and data fusion possibilities, it is necessary to know the properties (metadata) that characterize the data. The paper explains the joining of these data sources using their attribute data through a sample project. The resulted product will be used for rural environmental analysis.

Keywords: remote sensing, GIS, metadata, integration, environmental analysis

Procedia PDF Downloads 80
58 The Comparison of Open Source Software for Digital Libraries

Authors: Kanita Beširević

Abstract:

Open-source software development activities highly rely on Internet gathering communities volunteering in software development projects. Additionally, the libraries and cultural institutions share their metadata in the form of linked metadata to enable dissemination and enrichment. The open-source software provides free alternatives to traditional software solutions. The article aims to investigate the ever-increasing options for the digital library open source software adoption. The software available is presented and compared to other software solutions as well as to their previous versions. The top three open-source digital library software solutions are presented and compared. The comparison criteria are adopted from the UNESCO study by Bankier, J., & Gleason, K. Institutional Repository Software Comparison comprising of twelve criteria to appraise software, namely: infrastructure, front-end design, content discovery, publication tools, interoperability, and preservation. This article adopts a descriptive methodology based on data and information collected through selected software websites and the literature review.

Keywords: open source software, digital library, DSpace, Fedora, Greenstone

Procedia PDF Downloads 70
57 Represent Light and Shade of Old Beijing: Construction of Historical Picture Display Platform Based on Geographic Information System (GIS)

Authors: Li Niu, Jihong Liang, Lichao Liu, Huidi Chen

Abstract:

With the drawing of ancient palace painter, the layout of Beijing famous architect and the lens under photographers, a series of pictures which described whether emperors or ordinary people, whether gardens or Hutongs, whether historical events or life scenarios has emerged into our society. These precious resources are scattered around and preserved in different places Such as organizations like archives and libraries, along with individuals. The research combined decentralized photographic resources with Geographic Information System (GIS), focusing on the figure, event, time and location of the pictures to map them with geographic information in webpage and to display them productively. In order to meet the demand of reality, we designed a metadata description proposal, which is referred to DC and VRA standards. Another essential procedure is to formulate a four-tier classification system to correspond with the metadata proposals. As for visualization, we used Photo Waterfall and Time Line to display our resources in front end. Last but not the least, leading the Web 2.0 trend, the research developed an artistic, friendly, expandable, universal and user involvement platform to show the historical and culture precipitation of Beijing.

Keywords: historical picture, geographic information system, display platform, four-tier classification system

Procedia PDF Downloads 237
56 Categorical Metadata Encoding Schemes for Arteriovenous Fistula Blood Flow Sound Classification: Scaling Numerical Representations Leads to Improved Performance

Authors: George Zhou, Yunchan Chen, Candace Chien

Abstract:

Kidney replacement therapy is the current standard of care for end-stage renal diseases. In-center or home hemodialysis remains an integral component of the therapeutic regimen. Arteriovenous fistulas (AVF) make up the vascular circuit through which blood is filtered and returned. Naturally, AVF patency determines whether adequate clearance and filtration can be achieved and directly influences clinical outcomes. Our aim was to build a deep learning model for automated AVF stenosis screening based on the sound of blood flow through the AVF. A total of 311 patients with AVF were enrolled in this study. Blood flow sounds were collected using a digital stethoscope. For each patient, blood flow sounds were collected at 6 different locations along the patient’s AVF. The 6 locations are artery, anastomosis, distal vein, middle vein, proximal vein, and venous arch. A total of 1866 sounds were collected. The blood flow sounds are labeled as “patent” (normal) or “stenotic” (abnormal). The labels are validated from concurrent ultrasound. Our dataset included 1527 “patent” and 339 “stenotic” sounds. We show that blood flow sounds vary significantly along the AVF. For example, the blood flow sound is loudest at the anastomosis site and softest at the cephalic arch. Contextualizing the sound with location metadata significantly improves classification performance. How to encode and incorporate categorical metadata is an active area of research1. Herein, we study ordinal (i.e., integer) encoding schemes. The numerical representation is concatenated to the flattened feature vector. We train a vision transformer (ViT) on spectrogram image representations of the sound and demonstrate that using scalar multiples of our integer encodings improves classification performance. Models are evaluated using a 10-fold cross-validation procedure. The baseline performance of our ViT without any location metadata achieves an AuROC and AuPRC of 0.68 ± 0.05 and 0.28 ± 0.09, respectively. Using the following encodings of Artery:0; Arch: 1; Proximal: 2; Middle: 3; Distal 4: Anastomosis: 5, the ViT achieves an AuROC and AuPRC of 0.69 ± 0.06 and 0.30 ± 0.10, respectively. Using the following encodings of Artery:0; Arch: 10; Proximal: 20; Middle: 30; Distal 40: Anastomosis: 50, the ViT achieves an AuROC and AuPRC of 0.74 ± 0.06 and 0.38 ± 0.10, respectively. Using the following encodings of Artery:0; Arch: 100; Proximal: 200; Middle: 300; Distal 400: Anastomosis: 500, the ViT achieves an AuROC and AuPRC of 0.78 ± 0.06 and 0.43 ± 0.11. respectively. Interestingly, we see that using increasing scalar multiples of our integer encoding scheme (i.e., encoding “venous arch” as 1,10,100) results in progressively improved performance. In theory, the integer values do not matter since we are optimizing the same loss function; the model can learn to increase or decrease the weights associated with location encodings and converge on the same solution. However, in the setting of limited data and computation resources, increasing the importance at initialization either leads to faster convergence or helps the model escape a local minimum.

Keywords: arteriovenous fistula, blood flow sounds, metadata encoding, deep learning

Procedia PDF Downloads 42
55 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh

Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila

Abstract:

Data quality is a key component of any data-driven organization. Without data quality, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. Therefore, it is important for an organization to ensure that the data they use is of high quality. This is where the concept of data mesh comes in. Data mesh is an organizational and architectural decentralized approach to data management that can help organizations improve the quality of data. The concept of data mesh was first introduced in 2020. Its purpose is to decentralize data ownership, making it easier for domain experts to manage the data. This can help organizations improve data quality by reducing the reliance on centralized data teams and allowing domain experts to take charge of their data. This paper intends to discuss how a set of elements, including data mesh, are tools capable of increasing data quality. One of the key benefits of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata, thereby improving data quality. Another benefit of data mesh is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity in responsibilities. With data mesh, domain experts are responsible for managing their own data, which can help provide clarity in roles and responsibilities and improve data quality. Additionally, data mesh can also contribute to a new form of organization that is more agile and adaptable. By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn can help improve overall performance by allowing better insights into business as an effect of better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring, and analytics are decentralized, allowing domain experts to monitor and analyze their own data. This will help in identifying and addressing data quality problems in quick time, leading to improved data quality. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which can help create a data-driven culture within the organization. This can lead to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can help enhance data quality by automating many data-related tasks, like data cleaning and data validation. By integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts mentioned above are illustrated by AEKIDEN experience feedback. AEKIDEN is an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing their experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.

Keywords: data culture, data-driven organization, data mesh, data quality for business success

Procedia PDF Downloads 78
54 Digital Curriculum Preservation Planning, Actions, and Challenges

Authors: Misook Ahn

Abstract:

This study examined the Digital Curriculum Repository (DCR) project initiated at Defense Language Institute Foreign Language Center (DLIFLC). The purpose of the DCR is to build a centralized curriculum infrastructure, preserve all curriculum materials, and provide academic service to users (faculty, students, or other agencies). The DCR collection includes core language curriculum materials developed by each language school—foreign language textbooks, language survival kits, and audio files currently in or not in use at the schools. All core curriculum materials with audio and video files have been coded, collected, and preserved at the DCR. The DCR website was designed with MS SharePoint for easy accessibility by the DLIFLC’s faculty and students. All metadata for the collected curriculum materials have been input by language, code, year, book type, level, user, version, and current status (in use/not in use). The study documents digital curriculum preservation planning, actions, and challenges, including collecting, coding, collaborating, designing DCR SharePoint, and policymaking. DCR Survey data is also collected and analyzed for this research. Based on the finding, the study concludes that the mandatory policy for the DCR system and collaboration with school leadership are critical elements of a successful repository system. The sample collected items, metadata, and DCR SharePoint site are presented in the evaluation section.

Keywords: MS share point, digital preservation, repository, policy

Procedia PDF Downloads 120
53 Semantic-Based Collaborative Filtering to Improve Visitor Cold Start in Recommender Systems

Authors: Baba Mbaye

Abstract:

In collaborative filtering recommendation systems, a user receives suggested items based on the opinions and evaluations of a community of users. This type of recommendation system uses only the information (notes in numerical values) contained in a usage matrix as input data. This matrix can be constructed based on users' behaviors or by offering users to declare their opinions on the items they know. The cold start problem leads to very poor performance for new users. It is a phenomenon that occurs at the beginning of use, in the situation where the system lacks data to make recommendations. There are three types of cold start problems: cold start for a new item, a new system, and a new user. We are interested in this article at the cold start for a new user. When the system welcomes a new user, the profile exists but does not have enough data, and its communities with other users profiles are still unknown. This leads to recommendations not adapted to the profile of the new user. In this paper, we propose an approach that improves cold start by using the notions of similarity and semantic proximity between users profiles during cold start. We will use the cold-metadata available (metadata extracted from the new user's data) useful in positioning the new user within a community. The aim is to look for similarities and semantic proximities with the old and current user profiles of the system. Proximity is represented by close concepts considered to belong to the same group, while similarity groups together elements that appear similar. Similarity and proximity are two close but not similar concepts. This similarity leads us to the construction of similarity which is based on: a) the concepts (properties, terms, instances) independent of ontology structure and, b) the simultaneous representation of the two concepts (relations, presence of terms in a document, simultaneous presence of the authorities). We propose an ontology, OIVCSRS (Ontology of Improvement Visitor Cold Start in Recommender Systems), in order to structure the terms and concepts representing the meaning of an information field, whether by the metadata of a namespace, or the elements of a knowledge domain. This approach allows us to automatically attach the new user to a user community, partially compensate for the data that was not initially provided and ultimately to associate a better first profile with the cold start. Thus, the aim of this paper is to propose an approach to improving cold start using semantic technologies.

Keywords: visitor cold start, recommender systems, collaborative filtering, semantic filtering

Procedia PDF Downloads 185
52 Variables, Annotation, and Metadata Schemas for Early Modern Greek

Authors: Eleni Karantzola, Athanasios Karasimos, Vasiliki Makri, Ioanna Skouvara

Abstract:

Historical linguistics unveils the historical depth of languages and traces variation and change by analyzing linguistic variables over time. This field of linguistics usually deals with a closed data set that can only be expanded by the (re)discovery of previously unknown manuscripts or editions. In some cases, it is possible to use (almost) the entire closed corpus of a language for research, as is the case with the Thesaurus Linguae Graecae digital library for Ancient Greek, which contains most of the extant ancient Greek literature. However, concerning ‘dynamic’ periods when the production and circulation of texts in printed as well as manuscript form have not been fully mapped, representative samples and corpora of texts are needed. Such material and tools are utterly lacking for Early Modern Greek (16th-18th c.). In this study, the principles of the creation of EMoGReC, a pilot representative corpus of Early Modern Greek (16th-18th c.) are presented. Its design follows the fundamental principles of historical corpora. The selection of texts aims to create a representative and balanced corpus that gives insight into diachronic, diatopic and diaphasic variation. The pilot sample includes data derived from fully machine-readable vernacular texts, which belong to 4-5 different textual genres and come from different geographical areas. We develop a hierarchical linguistic annotation scheme, further customized to fit the characteristics of our text corpus. Regarding variables and their variants, we use as a point of departure the bundle of twenty-four features (or categories of features) for prose demotic texts of the 16th c. Tags are introduced bearing the variants [+old/archaic] or [+novel/vernacular]. On the other hand, further phenomena that are underway (cf. The Cambridge Grammar of Medieval and Early Modern Greek) are selected for tagging. The annotated texts are enriched with metalinguistic and sociolinguistic metadata to provide a testbed for the development of the first comprehensive set of tools for the Greek language of that period. Based on a relational management system with interconnection of data, annotations, and their metadata, the EMoGReC database aspires to join a state-of-the-art technological ecosystem for the research of observed language variation and change using advanced computational approaches.

Keywords: early modern Greek, variation and change, representative corpus, diachronic variables.

Procedia PDF Downloads 19
51 A Tool for Facilitating an Institutional Risk Profile Definition

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for the easy creation of an institutional risk profile for endangerment analysis of file formats. The main contribution of this work is the employment of data mining techniques to support risk factors set up with just the most important values that are important for a particular organisation. Subsequently, the risk profile employs fuzzy models and associated configurations for the file format metadata aggregator to support digital preservation experts with a semi-automatic estimation of endangerment level for file formats. Our goal is to make use of a domain expert knowledge base aggregated from a digital preservation survey in order to detect preservation risks for a particular institution. Another contribution is support for visualisation and analysis of risk factors for a requried dimension. The proposed methods improve the visibility of risk factor information and the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and automatically aggregated file format metadata from linked open data sources. To facilitate decision-making, the aggregated information about the risk factors is presented as a multidimensional vector. The goal is to visualise particular dimensions of this vector for analysis by an expert. The sample risk profile calculation and the visualisation of some risk factor dimensions is presented in the evaluation section.

Keywords: digital information management, file format, endangerment analysis, fuzzy models

Procedia PDF Downloads 368
50 Performance Analysis of Search Medical Imaging Service on Cloud Storage Using Decision Trees

Authors: González A. Julio, Ramírez L. Leonardo, Puerta A. Gabriel

Abstract:

Telemedicine services use a large amount of data, most of which are diagnostic images in Digital Imaging and Communications in Medicine (DICOM) and Health Level Seven (HL7) formats. Metadata is generated from each related image to support their identification. This study presents the use of decision trees for the optimization of information search processes for diagnostic images, hosted on the cloud server. To analyze the performance in the server, the following quality of service (QoS) metrics are evaluated: delay, bandwidth, jitter, latency and throughput in five test scenarios for a total of 26 experiments during the loading and downloading of DICOM images, hosted by the telemedicine group server of the Universidad Militar Nueva Granada, Bogotá, Colombia. By applying decision trees as a data mining technique and comparing it with the sequential search, it was possible to evaluate the search times of diagnostic images in the server. The results show that by using the metadata in decision trees, the search times are substantially improved, the computational resources are optimized and the request management of the telemedicine image service is improved. Based on the experiments carried out, search efficiency increased by 45% in relation to the sequential search, given that, when downloading a diagnostic image, false positives are avoided in management and acquisition processes of said information. It is concluded that, for the diagnostic images services in telemedicine, the technique of decision trees guarantees the accessibility and robustness in the acquisition and manipulation of medical images, in improvement of the diagnoses and medical procedures in patients.

Keywords: cloud storage, decision trees, diagnostic image, search, telemedicine

Procedia PDF Downloads 169
49 Another Beautiful Sounds: Building the Memory of Sound of Peddling in Beijing with Digital Technology

Authors: Dan Wang, Qing Ma, Xiaodan Wang, Tianjiao Qi

Abstract:

The sound of peddling in Beijing, also called “yo-heave-ho” or “cry of one's ware”, is a unique folk culture and usually found in Beijing hutong. For the civilians in Beijing, sound of peddling is part of their childhood. And for those who love the traditional culture of Beijing, it is an old song singing the local conditions and customs of the ancient city. For example, because of his great appreciation, the British poet Osbert Stewart once put sound of peddling which he had heard in Beijing as a street orchestra performance in the article named "Beijing's sound and color".This research aims to collect and integrate the voice/photo resources and historical materials of sound concerning peddling in Beijing by digital technology in order to protect the intangible cultural heritage and pass on the city memory. With the goal in mind, the next stage is to collect and record all the materials and resources based on the historical documents study and interviews with civilians or performers. Then set up a metadata scheme (which refers to the domestic and international standards such as "Audio Data Processing Standards in the National Library", DC, VRA, and CDWA, etc.) to describe, process and organize the sound of peddling into a database. In order to fully show the traditional culture of sound of peddling in Beijing, web design and GIS technology are utilized to establish a website and plan holding offline exhibitions and events for people to simulate and learn the sound of peddling by using VR/AR technology. All resources are opened to the public and civilians can share the digital memory through not only the offline experiential activities, but also the online interaction. With all the attempts, a multi-media narrative platform has been established to multi-dimensionally record the sound of peddling in old Beijing with text, images, audio, video and so on.

Keywords: sound of peddling, GIS, metadata scheme, VR/AR technology

Procedia PDF Downloads 266
48 Research and Development of Net-Centric Information Sharing Platform

Authors: Wang Xiaoqing, Fang Youyuan, Zheng Yanxing, Gu Tianyang, Zong Jianjian, Tong Jinrong

Abstract:

Compared with traditional distributed environment, the net-centric environment brings on more demanding challenges for information sharing with the characteristics of ultra-large scale and strong distribution, dynamic, autonomy, heterogeneity, redundancy. This paper realizes an information sharing model and a series of core services, through which provides an open, flexible and scalable information sharing platform.

Keywords: net-centric environment, information sharing, metadata registry and catalog, cross-domain data access control

Procedia PDF Downloads 525
47 Analysis of Citation Rate and Data Reuse for Openly Accessible Biodiversity Datasets on Global Biodiversity Information Facility

Authors: Nushrat Khan, Mike Thelwall, Kayvan Kousha

Abstract:

Making research data openly accessible has been mandated by most funders over the last 5 years as it promotes reproducibility in science and reduces duplication of effort to collect the same data. There are evidence that articles that publicly share research data have higher citation rates in biological and social sciences. However, how and whether shared data is being reused is not always intuitive as such information is not easily accessible from the majority of research data repositories. This study aims to understand the practice of data citation and how data is being reused over the years focusing on biodiversity since research data is frequently reused in this field. Metadata of 38,878 datasets including citation counts were collected through the Global Biodiversity Information Facility (GBIF) API for this purpose. GBIF was used as a data source since it provides citation count for datasets, not a commonly available feature for most repositories. Analysis of dataset types, citation counts, creation and update time of datasets suggests that citation rate varies for different types of datasets, where occurrence datasets that have more granular information have higher citation rates than checklist and metadata-only datasets. Another finding is that biodiversity datasets on GBIF are frequently updated, which is unique to this field. Majority of the datasets from the earliest year of 2007 were updated after 11 years, with no dataset that was not updated since creation. For each year between 2007 and 2017, we compared the correlations between update time and citation rate of four different types of datasets. While recent datasets do not show any correlations, 3 to 4 years old datasets show weak correlation where datasets that were updated more recently received high citations. The results are suggestive that it takes several years to cumulate citations for research datasets. However, this investigation found that when searched on Google Scholar or Scopus databases for the same datasets, the number of citations is often not the same as GBIF. Hence future aim is to further explore the citation count system adopted by GBIF to evaluate its reliability and whether it can be applicable to other fields of studies as well.

Keywords: data citation, data reuse, research data sharing, webometrics

Procedia PDF Downloads 131
46 Cognitive Footprints: Analytical and Predictive Paradigm for Digital Learning

Authors: Marina Vicario, Amadeo Argüelles, Pilar Gómez, Carlos Hernández

Abstract:

In this paper, the Computer Research Network of the National Polytechnic Institute of Mexico proposes a paradigmatic model for the inference of cognitive patterns in digital learning systems. This model leads to metadata architecture useful for analysis and prediction in online learning systems; especially on MOOc's architectures. The model is in the design phase and expects to be tested through an institutional of courses project which is going to develop for the MOOc.

Keywords: cognitive footprints, learning analytics, predictive learning, digital learning, educational computing, educational informatics

Procedia PDF Downloads 426
45 Regional Adjustment to the Analytical Attenuation Coefficient in the GMPM BSSA 14 for the Region of Spain

Authors: Gonzalez Carlos, Martinez Fransisco

Abstract:

There are various types of analysis that allow us to involve seismic phenomena that cause strong requirements for structures that are designed by society; one of them is a probabilistic analysis which works from prediction equations that have been created based on metadata seismic compiled in different regions. These equations form models that are used to describe the 5% damped pseudo spectra response for the various zones considering some easily known input parameters. The biggest problem for the creation of these models requires data with great robust statistics that support the results, and there are several places where this type of information is not available, for which the use of alternative methodologies helps to achieve adjustments to different models of seismic prediction.

Keywords: GMPM, 5% damped pseudo-response spectra, models of seismic prediction, PSHA

Procedia PDF Downloads 38
44 Integration of “FAIR” Data Principles in Longitudinal Mental Health Research in Africa: Lessons from a Landscape Analysis

Authors: Bylhah Mugotitsa, Jim Todd, Agnes Kiragga, Jay Greenfield, Evans Omondi, Lukoye Atwoli, Reinpeter Momanyi

Abstract:

The INSPIRE network aims to build an open, ethical, sustainable, and FAIR (Findable, Accessible, Interoperable, Reusable) data science platform, particularly for longitudinal mental health (MH) data. While studies have been done at the clinical and population level, there still exists limitations in data and research in LMICs, which pose a risk of underrepresentation of mental disorders. It is vital to examine the existing longitudinal MH data, focusing on how FAIR datasets are. This landscape analysis aimed to provide both overall level of evidence of availability of longitudinal datasets and degree of consistency in longitudinal studies conducted. Utilizing prompters proved instrumental in streamlining the analysis process, facilitating access, crafting code snippets, categorization, and analysis of extensive data repositories related to depression, anxiety, and psychosis in Africa. While leveraging artificial intelligence (AI), we filtered through over 18,000 scientific papers spanning from 1970 to 2023. This AI-driven approach enabled the identification of 228 longitudinal research papers meeting inclusion criteria. Quality assurance revealed 10% incorrectly identified articles and 2 duplicates, underscoring the prevalence of longitudinal MH research in South Africa, focusing on depression. From the analysis, evaluating data and metadata adherence to FAIR principles remains crucial for enhancing accessibility and quality of MH research in Africa. While AI has the potential to enhance research processes, challenges such as privacy concerns and data security risks must be addressed. Ethical and equity considerations in data sharing and reuse are also vital. There’s need for collaborative efforts across disciplinary and national boundaries to improve the Findability and Accessibility of data. Current efforts should also focus on creating integrated data resources and tools to improve Interoperability and Reusability of MH data. Practical steps for researchers include careful study planning, data preservation, machine-actionable metadata, and promoting data reuse to advance science and improve equity. Metrics and recognition should be established to incentivize adherence to FAIR principles in MH research

Keywords: longitudinal mental health research, data sharing, fair data principles, Africa, landscape analysis

Procedia PDF Downloads 8
43 Semantic Data Schema Recognition

Authors: Aïcha Ben Salem, Faouzi Boufares, Sebastiao Correia

Abstract:

The subject covered in this paper aims at assisting the user in its quality approach. The goal is to better extract, mix, interpret and reuse data. It deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, inculding the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. These links detected between columns offer a better understanding of the source and the alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies.

Keywords: schema recognition, semantic data profiling, meta-categorisation, semantic dependencies inter columns

Procedia PDF Downloads 383
42 Preservation Model to Process 'La Bomba Del Chota' as a Living Cultural Heritage

Authors: Lucia Carrion Gordon, Maria Gabriela Lopez Yanez

Abstract:

This project focuses on heritage concepts and their importance in every evolving and changing Digital Era where system solutions have to be sustainable, efficient and suitable to the basic needs. The prototype has to cover the principal requirements for the case studies. How to preserve the sociological ideas of dances in Ecuador like ‘La Bomba’ is the best example and challenge to preserve the intangible data. The same idea is applicable with books and music. The History and how to keep it, is the principal mission of Heritage Preservation. The dance of La Bomba is rooted on a specific movement system whose main part is the sideward hip movement. La Bomba´s movement system is the surface manifestation of a whole system of knowledge whose principal characteristics are the historical relation of Chote˜nos with their land and their families.

Keywords: digital preservation, heritage, IT management, data, metadata, ontology, serendipity

Procedia PDF Downloads 348
41 Prediction of Bariatric Surgery Publications by Using Different Machine Learning Algorithms

Authors: Senol Dogan, Gunay Karli

Abstract:

Identification of relevant publications based on a Medline query is time-consuming and error-prone. An all based process has the potential to solve this problem without any manual work. To the best of our knowledge, our study is the first to investigate the ability of machine learning to identify relevant articles accurately. 5 different machine learning algorithms were tested using 23 predictors based on several metadata fields attached to publications. We find that the Boosted model is the best-performing algorithm and its overall accuracy is 96%. In addition, specificity and sensitivity of the algorithm is 97 and 93%, respectively. As a result of the work, we understood that we can apply the same procedure to understand cancer gene expression big data.

Keywords: prediction of publications, machine learning, algorithms, bariatric surgery, comparison of algorithms, boosted, tree, logistic regression, ANN model

Procedia PDF Downloads 170
40 The Comparative Study of Binary Artifact Repository Managers

Authors: Evgeny Chugunnyy, Alena Gerasimova, Kirill Chernyavskiy, Alexander Krasnov

Abstract:

One of the primary component of Continuous deployment (CD) is a binary artifact repository — the place where artifacts are stored with metadata in a structured way. The binary artifact repository manager (BARM) is a software, which implements this repository logic and exposes a public application programming interface (API) for managing these artifacts. Almost every programming language ecosystem has its own artifact repository kind. During creating Artipie — BARM constructor and server, we analyzed and implemented a lot of different artifact repositories. In this paper we present criterias for comparing artifact repositories, and analyze the most popular repositories using these metrics. We also describe some of the notable features of different repositories. This paper aimed to help people who are creating, maintaining or optimizing software repository and CI tools.

Keywords: artifact, repository, continuous deployment, build automation, artifacts management

Procedia PDF Downloads 99