Search results for: metadata
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 69

Search results for: metadata

9 Tagging a corpus of Media Interviews with Diplomats: Challenges and Solutions

Authors: Roberta Facchinetti, Sara Corrizzato, Silvia Cavalieri

Abstract:

Increasing interconnection between data digitalization and linguistic investigation has given rise to unprecedented potentialities and challenges for corpus linguists, who need to master IT tools for data analysis and text processing, as well as to develop techniques for efficient and reliable annotation in specific mark-up languages that encode documents in a format that is both human and machine-readable. In the present paper, the challenges emerging from the compilation of a linguistic corpus will be taken into consideration, focusing on the English language in particular. To do so, the case study of the InterDiplo corpus will be illustrated. The corpus, currently under development at the University of Verona (Italy), represents a novelty in terms both of the data included and of the tag set used for its annotation. The corpus covers media interviews and debates with diplomats and international operators conversing in English with journalists who do not share the same lingua-cultural background as their interviewees. To date, this appears to be the first tagged corpus of international institutional spoken discourse and will be an important database not only for linguists interested in corpus analysis but also for experts operating in international relations. In the present paper, special attention will be dedicated to the structural mark-up, parts of speech annotation, and tagging of discursive traits, that are the innovational parts of the project being the result of a thorough study to find the best solution to suit the analytical needs of the data. Several aspects will be addressed, with special attention to the tagging of the speakers’ identity, the communicative events, and anthropophagic. Prominence will be given to the annotation of question/answer exchanges to investigate the interlocutors’ choices and how such choices impact communication. Indeed, the automated identification of questions, in relation to the expected answers, is functional to understand how interviewers elicit information as well as how interviewees provide their answers to fulfill their respective communicative aims. A detailed description of the aforementioned elements will be given using the InterDiplo-Covid19 pilot corpus. The data yielded by our preliminary analysis of the data will highlight the viable solutions found in the construction of the corpus in terms of XML conversion, metadata definition, tagging system, and discursive-pragmatic annotation to be included via Oxygen.

Keywords: spoken corpus, diplomats’ interviews, tagging system, discursive-pragmatic annotation, english linguistics

Procedia PDF Downloads 156
8 The Policia Internacional e de Defesa do Estado 1933–1969 and Valtiollinen Poliisi 1939–1948 on Screen: Comparing and Contrasting the Images of the Political Police in Portuguese and Finnish Films between the 1930s and the 1960s

Authors: Riikka Elina Kallio

Abstract:

“The walls have ears” phrase is defining the era of dictatorship in Portugal (1926–1974) and political unrest decades in Finland (1917–1948). The phrase is referring to the policing of the political, secret police, PIDE (Policia Internacional e de Defesa do Estado 1933–1969) in Portugal and VALPO (Valtiollinen Poliisi 1939–1948) in Finland. Free speech at any public space and even in private events could be fatal. The members of the PIDE/VALPO or informers/collaborators could be listening. Strict censorship under the Salazar´s regime was controlling media for example newspapers, music, and the film industry. Similarly, the politically affected censorship influenced the media in Finland in those unrest decades. This article examines the similarities and the differences in the images of the political police in Finland and Portugal, by analyzing Finnish and Portuguese films from the nineteen-thirties to nineteensixties. The text addresses two main research questions: what are the common and different features in the representations of the Finnish and Portuguese political police in films between the 1930s and 1960s, and how did the national censorship affect these representations? This study approach is interdisciplinary, and it combines film studies and criminology. Close reading is a practical qualitative method for analyzing films and in this study, close reading emphasizes the features of the police officer. Criminology provides the methodological tools for analysis of the police universal features and European common policies. The characterization of the police in this study is based on Robert Reiner´s 1980s and Timo Korander´s 2010s definitions of the police officer. The research material consisted of the Portuguese films from online film archives and Finnish films from Movie Making Finland -project´s metadata which offered suitable material by data mining the keywords such as poliisi, poliisipäällikkö and konstaapeli (police, police chief, police constable). The findings of this study suggest that even though there are common features of the images of the political police in Finland and Portugal, there are still national and cultural differences in the representations of the political police and policing.

Keywords: censorship, film studies, images, PIDE, political police, VALPO

Procedia PDF Downloads 35
7 Decolonizing Print Culture and Bibliography Through Digital Visualizations of Artists’ Books at the University of Miami

Authors: Alejandra G. Barbón, José Vila, Dania Vazquez

Abstract:

This study seeks to contribute to the advancement of library and archival sciences in the areas of records management, knowledge organization, and information architecture, particularly focusing on the enhancement of bibliographical description through the incorporation of visual interactive designs aimed to enrich the library users’ experience. In an era of heightened awareness about the legacy of hiddenness across special and rare collections in libraries and archives, along with the need for inclusivity in academia, the University of Miami Libraries has embarked on an innovative project that intersects the realms of print culture, decolonization, and digital technology. This proposal presents an exciting initiative to revitalize the study of Artists’ Books collections by employing digital visual representations to decolonize bibliographic records of some of the most unique materials and foster a more holistic understanding of cultural heritage. Artists' Books, a dynamic and interdisciplinary art form, challenge conventional bibliographic classification systems, making them ripe for the exploration of alternative approaches. This project involves the creation of a digital platform that combines multimedia elements for digital representations, interactive information retrieval systems, innovative information architecture, trending bibliographic cataloging and metadata initiatives, and collaborative curation to transform how we engage with and understand these collections. By embracing the potential of technology, we aim to transcend traditional constraints and address the historical biases that have influenced bibliographic practices. In essence, this study showcases a groundbreaking endeavor at the University of Miami Libraries that seeks to not only enhance bibliographic practices but also confront the legacy of hiddenness across special and rare collections in libraries and archives while strengthening conventional bibliographic description. By embracing digital visualizations, we aim to provide new pathways for understanding Artists' Books collections in a manner that is more inclusive, dynamic, and forward-looking. This project exemplifies the University’s dedication to fostering critical engagement, embracing technological innovation, and promoting diverse and equitable classifications and representations of cultural heritage.

Keywords: decolonizing bibliographic cataloging frameworks, digital visualizations information architecture platforms, collaborative curation and inclusivity for records management, engagement and accessibility increasing interaction design and user experience

Procedia PDF Downloads 46
6 DeepNIC a Method to Transform Each Tabular Variable into an Independant Image Analyzable by Basic CNNs

Authors: Nguyen J. M., Lucas G., Ruan S., Digonnet H., Antonioli D.

Abstract:

Introduction: Deep Learning (DL) is a very powerful tool for analyzing image data. But for tabular data, it cannot compete with machine learning methods like XGBoost. The research question becomes: can tabular data be transformed into images that can be analyzed by simple CNNs (Convolutional Neuron Networks)? Will DL be the absolute tool for data classification? All current solutions consist in repositioning the variables in a 2x2 matrix using their correlation proximity. In doing so, it obtains an image whose pixels are the variables. We implement a technology, DeepNIC, that offers the possibility of obtaining an image for each variable, which can be analyzed by simple CNNs. Material and method: The 'ROP' (Regression OPtimized) model is a binary and atypical decision tree whose nodes are managed by a new artificial neuron, the Neurop. By positioning an artificial neuron in each node of the decision trees, it is possible to make an adjustment on a theoretically infinite number of variables at each node. From this new decision tree whose nodes are artificial neurons, we created the concept of a 'Random Forest of Perfect Trees' (RFPT), which disobeys Breiman's concepts by assembling very large numbers of small trees with no classification errors. From the results of the RFPT, we developed a family of 10 statistical information criteria, Nguyen Information Criterion (NICs), which evaluates in 3 dimensions the predictive quality of a variable: Performance, Complexity and Multiplicity of solution. A NIC is a probability that can be transformed into a grey level. The value of a NIC depends essentially on 2 super parameters used in Neurops. By varying these 2 super parameters, we obtain a 2x2 matrix of probabilities for each NIC. We can combine these 10 NICs with the functions AND, OR, and XOR. The total number of combinations is greater than 100,000. In total, we obtain for each variable an image of at least 1166x1167 pixels. The intensity of the pixels is proportional to the probability of the associated NIC. The color depends on the associated NIC. This image actually contains considerable information about the ability of the variable to make the prediction of Y, depending on the presence or absence of other variables. A basic CNNs model was trained for supervised classification. Results: The first results are impressive. Using the GSE22513 public data (Omic data set of markers of Taxane Sensitivity in Breast Cancer), DEEPNic outperformed other statistical methods, including XGBoost. We still need to generalize the comparison on several databases. Conclusion: The ability to transform any tabular variable into an image offers the possibility of merging image and tabular information in the same format. This opens up great perspectives in the analysis of metadata.

Keywords: tabular data, CNNs, NICs, DeepNICs, random forest of perfect trees, classification

Procedia PDF Downloads 72
5 Studying Language of Immediacy and Language of Distance from a Corpus Linguistic Perspective: A Pilot Study of Evaluation Markers in French Television Weather Reports

Authors: Vince Liégeois

Abstract:

Language of immediacy and distance: Within their discourse theory, Koch & Oesterreicher establish a distinction between a language of immediacy and a language of distance. The former refers to those discourses which are oriented more towards a spoken norm, whereas the latter entails discourses oriented towards a written norm, regardless of whether they are realised phonically or graphically. This means that an utterance can be realised phonically but oriented more towards the written language norm (e.g., a scientific presentation or eulogy) or realised graphically but oriented towards a spoken norm (e.g., a scribble or chat messages). Research desiderata: The methodological approach from Koch & Oesterreicher has often been criticised for not providing a corpus-linguistic methodology, which makes it difficult to work with quantitative data or address large text collections within this research paradigm. Consequently, the Koch & Oesterreicher approach has difficulties gaining ground in those research areas which rely more on corpus linguistic research models, like text linguistics and LSP-research. A combinatory approach: Accordingly, we want to establish a combinatory approach with corpus-based linguistic methodology. To this end, we propose to (i) include data about the context of an utterance (e.g., monologicity/dialogicity, familiarity with the speaker) – which were called “conditions of communication” in the original work of Koch & Oesterreicher – and (ii) correlate the linguistic phenomenon at the centre of the inquiry (e.g., evaluation markers) to a group of linguistic phenomena deemed typical for either distance- or immediacy-language. Based on these two parameters, linguistic phenomena and texts could then be mapped on an immediacy-distance continuum. Pilot study: To illustrate the benefits of this approach, we will conduct a pilot study on evaluation phenomena in French television weather reports, a form of domain-sensitive discourse which has often been cited as an example of a “text genre”. Within this text genre, we will look at so-called “evaluation markers,” e.g., fixed strings like bad weather, stifling hot, and “no luck today!”. These evaluation markers help to communicate the coming weather situation towards the lay audience but have not yet been studied within the Koch & Oesterreicher research paradigm. Accordingly, we want to figure out whether said evaluation markers are more typical for those weather reports which tend more towards immediacy or those which tend more towards distance. To this aim, we collected a corpus with different kinds of television weather reports,e.g., as part of the news broadcast, including dialogue. The evaluation markers themselves will be studied according to the explained methodology, by correlating them to (i) metadata about the context and (ii) linguistic phenomena characterising immediacy-language: repetition, deixis (personal, spatial, and temporal), a freer choice of tense and right- /left-dislocation. Results: Our results indicate that evaluation markers are more dominantly present in those weather reports inclining towards immediacy-language. Based on the methodology established above, we have gained more insight into the working of evaluation markers in the domain-sensitive text genre of (television) weather reports. For future research, it will be interesting to determine whether said evaluation markers are also typical for immediacy-language-oriented in other domain-sensitive discourses.

Keywords: corpus-based linguistics, evaluation markers, language of immediacy and distance, weather reports

Procedia PDF Downloads 179
4 Destination Management Organization in the Digital Era: A Data Framework to Leverage Collective Intelligence

Authors: Alfredo Fortunato, Carmelofrancesco Origlia, Sara Laurita, Rossella Nicoletti

Abstract:

In the post-pandemic recovery phase of tourism, the role of a Destination Management Organization (DMO) as a coordinated management system of all the elements that make up a destination (attractions, access, marketing, human resources, brand, pricing, etc.) is also becoming relevant for local territories. The objective of a DMO is to maximize the visitor's perception of value and quality while ensuring the competitiveness and sustainability of the destination, as well as the long-term preservation of its natural and cultural assets, and to catalyze benefits for the local economy and residents. In carrying out the multiple functions to which it is called, the DMO can leverage a collective intelligence that comes from the ability to pool information, explicit and tacit knowledge, and relationships of the various stakeholders: policymakers, public managers and officials, entrepreneurs in the tourism supply chain, researchers, data journalists, schools, associations and committees, citizens, etc. The DMO potentially has at its disposal large volumes of data and many of them at low cost, that need to be properly processed to produce value. Based on these assumptions, the paper presents a conceptual framework for building an information system to support the DMO in the intelligent management of a tourist destination tested in an area of southern Italy. The approach adopted is data-informed and consists of four phases: (1) formulation of the knowledge problem (analysis of policy documents and industry reports; focus groups and co-design with stakeholders; definition of information needs and key questions); (2) research and metadatation of relevant sources (reconnaissance of official sources, administrative archives and internal DMO sources); (3) gap analysis and identification of unconventional information sources (evaluation of traditional sources with respect to the level of consistency with information needs, the freshness of information and granularity of data; enrichment of the information base by identifying and studying web sources such as Wikipedia, Google Trends, Booking.com, Tripadvisor, websites of accommodation facilities and online newspapers); (4) definition of the set of indicators and construction of the information base (specific definition of indicators and procedures for data acquisition, transformation, and analysis). The framework derived consists of 6 thematic areas (accommodation supply, cultural heritage, flows, value, sustainability, and enabling factors), each of which is divided into three domains that gather a specific information need to be represented by a scheme of questions to be answered through the analysis of available indicators. The framework is characterized by a high degree of flexibility in the European context, given that it can be customized for each destination by adapting the part related to internal sources. Application to the case study led to the creation of a decision support system that allows: •integration of data from heterogeneous sources, including through the execution of automated web crawling procedures for data ingestion of social and web information; •reading and interpretation of data and metadata through guided navigation paths in the key of digital story-telling; •implementation of complex analysis capabilities through the use of data mining algorithms such as for the prediction of tourist flows.

Keywords: collective intelligence, data framework, destination management, smart tourism

Procedia PDF Downloads 95
3 A Bibliometric Analysis of Ukrainian Research Articles on SARS-COV-2 (COVID-19) in Compliance with the Standards of Current Research Information Systems

Authors: Sabina Auhunas

Abstract:

These days in Ukraine, Open Science dramatically develops for the sake of scientists of all branches, providing an opportunity to take a more close look on the studies by foreign scientists, as well as to deliver their own scientific data to national and international journals. However, when it comes to the generalization of data on science activities by Ukrainian scientists, these data are often integrated into E-systems that operate inconsistent and barely related information sources. In order to resolve these issues, developed countries productively use E-systems, designed to store and manage research data, such as Current Research Information Systems that enable combining uncompiled data obtained from different sources. An algorithm for selecting SARS-CoV-2 research articles was designed, by means of which we collected the set of papers published by Ukrainian scientists and uploaded by August 1, 2020. Resulting metadata (document type, open access status, citation count, h-index, most cited documents, international research funding, author counts, the bibliographic relationship of journals) were taken from Scopus and Web of Science databases. The study also considered the info from COVID-19/SARS-CoV-2-related documents published from December 2019 to September 2020, directly from documents published by authors depending on territorial affiliation to Ukraine. These databases are enabled to get the necessary information for bibliometric analysis and necessary details: copyright, which may not be available in other databases (e.g., Science Direct). Search criteria and results for each online database were considered according to the WHO classification of the virus and the disease caused by this virus and represented (Table 1). First, we identified 89 research papers that provided us with the final data set after consolidation and removing duplication; however, only 56 papers were used for the analysis. The total number of documents by results from the WoS database came out at 21641 documents (48 affiliated to Ukraine among them) in the Scopus database came out at 32478 documents (41 affiliated to Ukraine among them). According to the publication activity of Ukrainian scientists, the following areas prevailed: Education, educational research (9 documents, 20.58%); Social Sciences, interdisciplinary (6 documents, 11.76%) and Economics (4 documents, 8.82%). The highest publication activity by institution types was reported in the Ministry of Education and Science of Ukraine (its percent of published scientific papers equals 36% or 7 documents), Danylo Halytsky Lviv National Medical University goes next (5 documents, 15%) and P. L. Shupyk National Medical Academy of Postgraduate Education (4 documents, 12%). Basically, research activities by Ukrainian scientists were funded by 5 entities: Belgian Development Cooperation, the National Institutes of Health (NIH, U.S.), The United States Department of Health & Human Services, grant from the Whitney and Betty MacMillan Center for International and Area Studies at Yale, a grant from the Yale Women Faculty Forum. Based on the results of the analysis, we obtained a set of published articles and preprints to be assessed on the variety of features in upcoming studies, including citation count, most cited documents, a bibliographic relationship of journals, reference linking. Further research on the development of the national scientific E-database continues using brand new analytical methods.

Keywords: content analysis, COVID-19, scientometrics, text mining

Procedia PDF Downloads 88
2 Voices of Dissent: Case Study of a Digital Archive of Testimonies of Political Oppression

Authors: Andrea Scapolo, Zaya Rustamova, Arturo Matute Castro

Abstract:

The “Voices in Dissent” initiative aims at collecting and making available in a digital format, testimonies, letters, and other narratives produced by victims of political oppression from different geographical spaces across the Atlantic. By recovering silenced voices behind the official narratives, this open-access online database will provide indispensable tools for rewriting the history of authoritarian regimes from the margins as memory debates continue to provoke controversy among academic and popular transnational circles. In providing an extensive database of non-hegemonic discourses in a variety of political and social contexts, the project will complement the existing European and Latin-American studies, and invite further interdisciplinary and trans-national research. This digital resource will be available to academic communities and the general audience and will be organized geographically and chronologically. “Voices in Dissent” will offer a first comprehensive study of these personal accounts of persecution and repression against determined historical backgrounds and their impact on collective memory formation in contemporary societies. The digitalization of these texts will allow to run metadata analyses and adopt comparatist approaches for a broad range of research endeavors. Most of the testimonies included in our archive are testimonies of trauma: the trauma of exile, imprisonment, torture, humiliation, censorship. The research on trauma has now reached critical mass and offers a broad spectrum of critical perspectives. By putting together testimonies from different geographical and historical contexts, our project will provide readers and scholars with an extraordinary opportunity to investigate how culture shapes individual and collective memories and provides or denies resources to make sense and cope with the trauma. For scholars dealing with the epistemological and rhetorical analysis of testimonies, an online open-access archive will prove particularly beneficial to test theories on truth status and the formation of belief as well as to study the articulation of discourse. An important aspect of this project is also its pedagogical applications since it will contribute to the creation of Open Educational Resources (OER) to support students and educators worldwide. Through collaborations with our Library System, the archive will form part of the Digital Commons database. The texts collected in this online archive will be made available in the original languages as well as in English translation. They will be accompanied by a critical apparatus that will contextualize them historically by providing relevant background information and bibliographical references. All these materials can serve as a springboard for a broad variety of educational projects and classroom activities. They can also be used to design specific content courses or modules. In conclusion, the desirable outcomes of the “Voices in Dissent” project are: 1. the collections and digitalization of political dissent testimonies; 2. the building of a network of scholars, educators, and learners involved in the design, development, and sustainability of the digital archive; 3. the integration of the content of the archive in both research and teaching endeavors, such as publication of scholarly articles, design of new upper-level courses, and integration of the materials in existing courses.

Keywords: digital archive, dissent, open educational resources, testimonies, transatlantic studies

Procedia PDF Downloads 87
1 A Tool to Provide Advanced Secure Exchange of Electronic Documents through Europe

Authors: Jesus Carretero, Mario Vasile, Javier Garcia-Blas, Felix Garcia-Carballeira

Abstract:

Supporting cross-border secure and reliable exchange of data and documents and to promote data interoperability is critical for Europe to enhance sector (like eFinance, eJustice and eHealth). This work presents the status and results of the European Project MADE, a Research Project funded by Connecting Europe facility Programme, to provide secure e-invoicing and e-document exchange systems among Europe countries in compliance with the eIDAS Regulation (Regulation EU 910/2014 on electronic identification and trust services). The main goal of MADE is to develop six new AS4 Access Points and SMP in Europe to provide secure document exchanges using the eDelivery DSI (Digital Service Infrastructure) amongst both private and public entities. Moreover, the project demonstrates the feasibility and interest of the solution provided by providing several months of interoperability among the providers of the six partners in different EU countries. To achieve those goals, we have followed a methodology setting first a common background for requirements in the partner countries and the European regulations. Then, the partners have implemented access points in each country, including their service metadata publisher (SMP), to allow the access to their clients to the pan-European network. Finally, we have setup interoperability tests with the other access points of the consortium. The tests will include the use of each entity production-ready Information Systems that process the data to confirm all steps of the data exchange. For the access points, we have chosen AS4 instead of other existing alternatives because it supports multiple payloads, native web services, pulling facilities, lightweight client implementations, modern crypto algorithms, and more authentication types, like username-password and X.509 authentication and SAML authentication. The main contribution of MADE project is to open the path for European companies to use eDelivery services with cross-border exchange of electronic documents following PEPPOL (Pan-European Public Procurement Online) based on the e-SENS AS4 Profile. It also includes the development/integration of new components, integration of new and existing logging and traceability solutions and maintenance tool support for PKI. Moreover, we have found that most companies are still not ready to support those profiles. Thus further efforts will be needed to promote this technology into the companies. The consortium includes the following 9 partners. From them, 2 are research institutions: University Carlos III of Madrid (Coordinator), and Universidad Politecnica de Valencia. The other 7 (EDICOM, BIZbrains, Officient, Aksesspunkt Norge, eConnect, LMT group, Unimaze) are private entities specialized in secure delivery of electronic documents and information integration brokerage in their respective countries. To achieve cross-border operativity, they will include AS4 and SMP services in their platforms according to the EU Core Service Platform. Made project is instrumental to test the feasibility of cross-border documents eDelivery in Europe. If successful, not only einvoices, but many other types of documents will be securely exchanged through Europe. It will be the base to extend the network to the whole Europe. This project has been funded under the Connecting Europe Facility Agreement number: INEA/CEF/ICT/A2016/1278042. Action No: 2016-EU-IA-0063.

Keywords: security, e-delivery, e-invoicing, e-delivery, e-document exchange, trust

Procedia PDF Downloads 225