Search results for: semi-structured documents
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 901

871 A Social Identity Analysis of Ottoman and Safavid Architects in the Historical Documents of the 16th to 17th Centuries

Authors: Farzaneh Farrokhfar, Mohammad Khazaie

Abstract:

The 16th and 17th centuries coincide with the classical age of Ottoman art history. During the same period, the Shiite Safavid state emerged on the eastern border of the Ottoman state and, despite political and religious differences with the Ottomans, played an important role in cultural and artistic exchanges with Anatolia. The harmony of the arts, including architecture, is one of the most important manifestations of cultural exchange between the two regions and reflects their intellectual commonalities. In parallel with the production of works of art, information on the identities of Ottoman and Safavid artists and craftsmen was recorded by many historians and biographers, some of whose works, fortunately, are available to us today and can be evaluated. This research first reads the historical documents and reports related to the architects of the Ottoman state in Anatolia and the Safavid state in Iran in the 16th and 17th centuries, and then examines how information on these architects was recorded and where such records are located in the two regions. The results reveal the names and identities of some Ottoman and Safavid architects of the 16th and 17th centuries and show how information was recorded in the documents of the two regions. The research follows a comparative historical method, and its sources were collected through documentary library research.

Keywords: classical era, Ottoman architecture, Safavid architecture, Central Asian historical documents

870 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to the problem of information overload. To support effective information retrieval, document clustering has become a popular research topic in text mining. Clustering is the unsupervised classification of data items into groups without the need for training data. Many conventional document clustering methods perform inefficiently on large document collections because they were originally designed for relational databases; they are therefore impractical for real-world document clustering, which requires special handling of high dimensionality and high volume. We build on FIHC (Frequent Itemset-based Hierarchical Clustering), a hierarchical clustering method developed for document clustering, whose intuition is that the documents in each cluster share some common words. FIHC uses such words to cluster documents and builds a hierarchical topic tree. In this paper, we combine the FIHC algorithm with an ontology to address the semantic problem and mine the meaning behind the words in documents. Furthermore, we use closed frequent itemsets instead of plain frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than well-known document clustering algorithms.

Keywords: FIHC, document clustering, ontology, closed frequent itemset
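The distinction between frequent and closed frequent itemsets can be illustrated with a small sketch. The abstract does not say which mining algorithm the authors use, so this example assumes a simple level-wise (Apriori-style) search over a hypothetical toy collection and omits FIHC's clustering and ontology steps; it is not the authors' implementation.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions (documents) that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for all itemsets with support >= min_support."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]
    while level:
        kept = []
        for s in level:
            sup = support(s, transactions)
            if sup >= min_support:
                frequent[s] = sup
                kept.append(s)
        # Join frequent k-itemsets whose union has exactly k + 1 items.
        level = list({a | b for a, b in combinations(kept, 2) if len(a | b) == len(a) + 1})
    return frequent


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {s: sup for s, sup in frequent.items()
            if not any(s < t and sup == frequent[t] for t in frequent)}


# Hypothetical toy collection: each document reduced to its set of terms.
docs = [frozenset(d) for d in (
    {"data", "mining", "cluster"},
    {"data", "mining"},
    {"data", "mining", "ontology"},
    {"cluster", "ontology"},
)]
closed = closed_itemsets(frequent_itemsets(docs, min_support=2))
print(closed)
```

Here {data} and {mining} are frequent but not closed, because their superset {data, mining} has the same support (3); only {data, mining}, {cluster} and {ontology} are kept, which is why working with closed itemsets shrinks the set of candidates to process.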

869 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to the rapid growth of text documents in digital form, automated text classification has become an important research area over the last two decades. The major challenges of text document representation are high dimensionality, sparsity, volume and semantics. Since terms are the only features that can be found in documents, the selection of good terms (features) plays a very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of the most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf), Mutual Information (MI), Information Gain (IG), Chi-Square (χ²), Term Frequency-Relevance Frequency (tfrf), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS), for classifying text documents. We evaluated all the feature selection methods on standard datasets such as 20 Newsgroups, the 4 University dataset and Reuters-21578.

Keywords: classifiers, feature selection, text classification
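As an illustration of one of the listed methods, the sketch below computes the Chi-Square (χ²) score of a term for a class from a 2x2 document contingency table and ranks terms by it. The toy documents, labels and function names are hypothetical, and the paper's own preprocessing and thresholds are not reproduced.

```python
def contingency(docs, labels, term, cls):
    """Counts (A, B, C, D) for a term/class pair.

    A: docs in cls containing term    B: docs outside cls containing term
    C: docs in cls without term       D: docs outside cls without term
    """
    A = B = C = D = 0
    for words, label in zip(docs, labels):
        has_term = term in words
        if label == cls:
            if has_term:
                A += 1
            else:
                C += 1
        else:
            if has_term:
                B += 1
            else:
                D += 1
    return A, B, C, D


def chi_square(A, B, C, D):
    """chi2(t, c) = N (AD - CB)^2 / ((A + C)(B + D)(A + B)(C + D))."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return 0.0 if denom == 0 else N * (A * D - C * B) ** 2 / denom


def rank_terms(docs, labels, cls, k):
    """Top-k terms by chi-square for one class; ties broken alphabetically."""
    vocab = sorted({w for words in docs for w in words})
    scores = {t: chi_square(*contingency(docs, labels, t, cls)) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Hypothetical labelled toy corpus.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "team"}, {"vote", "party"}]
labels = ["sport", "sport", "politics", "politics"]
print(rank_terms(docs, labels, "sport", 2))  # ['goal', 'vote']
```

Both "goal" and "vote" score 4.0 for the class "sport": χ² rewards terms that are strongly associated with a class in either direction, so a term that never occurs in the class can rank as highly as one that always does.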

868 Direct Blind Separation Methods for Convolutive Images Mixtures

Authors: Ahmed Hammed, Wady Naanaa

Abstract:

In this paper, we propose a general approach to the problem of a convolutive mixture of images. We use a direct blind source separation method that adds a single, non-statistically justified constraint describing the relationships between the different mixing matrices, with the aim of making the problem easy to solve. Provided that this constraint is known, the method can be applied to degraded documents affected by the overlapping of text patterns and images. Such overlapping is due to chemical and physical reactions of the materials (paper, inks, etc.) occurring as documents age, and to other unpredictable causes such as humidity, microorganism infestation and human handling. We demonstrate that this problem corresponds to a convolutive mixture of images, and we then validate our method through numerical examples. We can thus obtain clear images from unreadable ones, such as those caused by page superposition, a phenomenon found very often in archival documents.

Keywords: blind source separation, convoluted mixture, degraded documents, text-patterns overlapping
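The abstract does not spell out the separation algorithm or its constraint, but the forward model it inverts can be sketched: each observed image x_i is a sum of the source images s_j, each convolved with a filter h_ij. The 2x2 pages and 1x2 kernels below are hypothetical values chosen to mimic show-through from the opposite side of a page; this is only the mixing model, not the authors' separation method.

```python
def conv2d_full(image, kernel):
    """Full 2-D linear convolution of two lists of lists (implicit zero padding)."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (W + kw - 1) for _ in range(H + kh - 1)]
    for r in range(H):
        for c in range(W):
            for i in range(kh):
                for j in range(kw):
                    out[r + i][c + j] += image[r][c] * kernel[i][j]
    return out


def add_images(a, b):
    """Element-wise sum of two equally sized images."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def convolutive_mix(sources, filters):
    """Mixture i is sum_j h_ij * s_j, where * is 2-D convolution.

    Assumes all sources share one shape and all kernels share one shape,
    so every convolved term has the same size.
    """
    mixtures = []
    for row in filters:
        terms = [conv2d_full(s, h) for h, s in zip(row, sources)]
        acc = terms[0]
        for term in terms[1:]:
            acc = add_images(acc, term)
        mixtures.append(acc)
    return mixtures


# Hypothetical 2x2 "pages": a stroke on the recto and one on the verso.
recto = [[1.0, 0.0], [0.0, 0.0]]
verso = [[0.0, 0.0], [0.0, 1.0]]
# Each observation sees its own page directly and the other page shifted and attenuated.
filters = [
    [[[1.0, 0.0]], [[0.0, 0.5]]],
    [[[0.0, 0.5]], [[1.0, 0.0]]],
]
x1, x2 = convolutive_mix([recto, verso], filters)
print(x1)  # [[1.0, 0.0, 0.0], [0.0, 0.0, 0.5]]
```

Separation amounts to recovering `recto` and `verso` from `x1` and `x2` without knowing the filters, which is where the paper's additional constraint on the mixing matrices comes in.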

867 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment analysis is the process of detecting the contextual polarity of text; in other words, it determines whether a piece of writing is positive, negative or neutral. Sentiment analysis of documents holds great importance today, when vast amounts of information are stored in databases and on the World Wide Web. An efficient algorithm to elicit such information would be beneficial for social, economic and medical purposes. In this project, we have developed an algorithm that classifies a document as positive or negative. Using our algorithm, we obtained a feature set from the data and classified the documents based on this feature set. Importantly, the classification does not use the independence assumption made by many procedures such as Naive Bayes, which makes the algorithm more general in scope. Moreover, because of the sparsity and high dimensionality of such data, we did not use the empirical distribution for estimation, but developed a method based on the degree of close clustering of the data points. We applied our algorithm to a movie review data set obtained from IMDb and obtained satisfactory results.

Keywords: sentiment, runs test, cross validation, higher dimensional pmf estimation
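The keywords mention a runs test, but the abstract does not say how it is applied. As a hedged illustration only, the sketch below computes the standard Wald-Wolfowitz runs test statistic for a two-valued sequence using the normal approximation; the example sequence is hypothetical.

```python
import math


def runs_test(seq):
    """Wald-Wolfowitz runs test for a two-valued sequence.

    Returns (number of runs, z-score under the normal approximation).
    """
    values = sorted(set(seq))
    if len(values) != 2:
        raise ValueError("runs test needs exactly two distinct values")
    n1 = sum(1 for v in seq if v == values[0])
    n2 = len(seq) - n1
    n = n1 + n2
    # A new run starts wherever two neighbouring values differ.
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var == 0:
        raise ValueError("sequence too short for the normal approximation")
    return runs, (runs - mean) / math.sqrt(var)


# Hypothetical binary sequence (e.g. positive = 1, negative = 0).
runs, z = runs_test([1, 1, 1, 1, 0, 0, 0, 0])
print(runs, round(z, 2))  # 2 -2.29
```

A z-value far below zero, as here, indicates fewer runs than expected under randomness (clustered values); a large positive value indicates too many alternations.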

866 An Exploration of Policy-related Documents on District Heating and Cooling in Flanders: a Slow and Bottom-up Process

Authors: Isaura Bonneux

Abstract:

District heating and cooling (DHC) is increasingly recognized as a viable path towards sustainable heating and cooling. While some countries, such as Sweden and Denmark, have a longstanding tradition of DHC, Belgium is lagging behind. The northern part of Belgium, Flanders, had a total of only 95 heating networks in July 2023. Nevertheless, the region is increasingly exploring possibilities to expand the scope of DHC. DHC is a complex energy system that requires extensive collaboration between various stakeholders at various levels. It is therefore of interest to look more closely at policy-related documents at the Flemish (regional) level, as these policies set the scene for DHC development in the Flemish region. This kind of analysis has not been undertaken so far. This paper addresses the following research question: “Who talks about DHC, and in which way and context is DHC discussed in Flemish policy-related documents?” To answer this question, the Overton policy database was used to search for and retrieve relevant policy-related documents. Overton retrieves data from governments, think tanks, NGOs and IGOs. Out of the 244 original results, 117 documents dated between 2009 and 2023 were analyzed. Every selected document included theme keywords, policymaking department(s), a date and a document type; these elements were used for quantitative data description and visualization. Further, qualitative content analysis revealed patterns and main themes regarding DHC in Flanders. Four main conclusions can be drawn. First, the timeframe makes clear that DHC is a new topic in Flanders that still receives limited attention: 2014, 2016 and 2017 were the years with the most documents, yet even then the number reached only 12 documents. In addition, many documents mentioned DHC without discussing it in depth and painted it as a future scenario surrounded by considerable uncertainty. Most of the issuing government departments were linked to either energy or climate (e.g. the Flemish Environmental Agency) or policy (e.g. the Socio-Economic Council of Flanders). Second, DHC is mentioned most often in an ‘Environment and Sustainability’ context, followed by ‘General Policy and Regulation’. This is intuitive, as DHC is perceived as a sustainable heating and cooling technique and this analysis comprises policy-related documents. Third, Flanders seems mostly interested in using waste or residual heat as a heat source for DHC. Harbors and waste incineration plants are identified as potential and promising supply sources. This approach tries to reconcile environmental and economic incentives. Last, local councils are assigned a central role, and the initiative is mostly taken by them. The policy documents and policy advice demonstrate that Flanders opts for a bottom-up organization. As DHC is very dependent on local conditions, this seems a logical step. Nevertheless, it can prevent smaller councils from creating DHC networks and slow down the systematic and fast implementation of DHC throughout Flanders.

Keywords: district heating and cooling, Flanders, Overton database, policy analysis

865 Psychodidactic Strategies to Facilitate Flow of Logical Thinking in Preparation of Academic Documents

Authors: Deni Stincer Gomez, Zuraya Monroy Nasr, Luis Pérez Alvarez

Abstract:

The preparation of academic documents such as theses, articles and research projects is one of the requirements of higher education. These documents demand logical, argumentative thinking, which many students find difficult to develop and apply. To mitigate these difficulties, the authors designed a thesis seminar, which they have taught for seven years in a graduate program in Psychology at the National Autonomous University of Mexico. In this study, the authors use the Toulmin model as a mental heuristic and as the basis for a set of psychodidactic strategies that facilitate building the argument and completing the thesis. The degree completion rate in the groups exposed to the seminar has risen to 94%, compared with the 10% observed in cohorts that were not exposed to it. In this article, the authors emphasize the psychodidactic strategies used. The Toulmin model alone does not guarantee the success achieved; a set of psychological (almost psychotherapeutic) and didactic actions by the teacher also seems to contribute. These actions derive from an understanding of the psychological, epistemological and ontogenetic obstacles, and of the most frequent errors into which thought tends to fall when it is required to follow a logical course. The authors group the strategies into three categories: 1) strategies to facilitate logical thinking, 2) strategies to strengthen the scientific self and 3) strategies to facilitate the act of writing the text. In this work, the authors examine each of them in depth.

Keywords: psychodidactic strategies, logical thinking, academic documents, Toulmin model

864 On Exploring Search Heuristics for improving the efficiency in Web Information Extraction

Authors: Patricia Jiménez, Rafael Corchuelo

Abstract:

Nowadays, the World Wide Web is the most popular source of information, comprising billions of online documents. Web mining is used to crawl through these documents, collect the information of interest and process it with data mining tools, so that the gathered information can be used in the best interest of a business, which helps companies promote their own. Unfortunately, it is not easy to extract the information a website provides automatically when the site lacks an API that transforms the user-friendly data in its web documents into a structured, machine-readable format. Rule-based information extractors are tools intended to extract the information of interest automatically and offer it in a structured format that allows mining tools to process it. However, the performance of an information extractor strongly depends on the search heuristic employed, since bad choices regarding how to learn a rule may easily result in a loss of effectiveness and/or efficiency. Improving the efficiency of search heuristics is of utmost importance in the field of Web Information Extraction, since typical datasets are very large. In this paper, we employ an information extractor based on a classical top-down algorithm that uses the so-called Information Gain heuristic introduced by Quinlan and Cameron-Jones. Unfortunately, Information Gain suffers from some well-known problems, so we analyse an intuitive alternative, Termini, that is clearly more efficient; we also analyse other proposals in the literature and conclude that none of them outperforms this alternative.

Keywords: information extraction, search heuristics, semi-structured documents, web mining
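The Information Gain heuristic of Quinlan and Cameron-Jones is commonly written in the FOIL form Gain = t × (log2(p1 / (p1 + n1)) − log2(p0 / (p0 + n0))), which scores a candidate literal by how much it raises the proportion of positive examples covered by the rule being learned. This sketch assumes that form, with t approximated by p1 when not given; the candidate literal names and counts are hypothetical, and Termini itself is not reproduced because the abstract does not define it.

```python
import math


def foil_gain(p0, n0, p1, n1, t=None):
    """FOIL information gain of adding a literal to a rule.

    p0, n0: positive/negative examples covered before adding the literal
    p1, n1: positive/negative examples covered after adding it
    t: positives covered both before and after (defaults to p1, a common simplification)
    """
    if p1 == 0:
        return 0.0  # the specialised rule covers no positives
    if t is None:
        t = p1
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return t * (info_before - info_after)


def best_literal(p0, n0, candidates):
    """Greedy top-down step: pick the literal with the highest gain."""
    return max(candidates, key=lambda lit: foil_gain(p0, n0, *candidates[lit]))


# Hypothetical literals for an extraction rule, with (p1, n1) counts after adding each.
candidates = {
    "inside_td_tag": (8, 2),
    "preceded_by_colon": (10, 5),
    "next_token_is_digit": (3, 0),
}
print(best_literal(10, 10, candidates))  # inside_td_tag
```

Every candidate has to be scored against the covered examples at each refinement step, which is why the cost of the heuristic matters so much on the large datasets the abstract mentions.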

863 Mapping of Adrenal Gland Diseases Research in Middle East Countries: A Scientometric Analysis, 2007-2013

Authors: Zahra Emami, Mohammad Ebrahim Khamseh, Nahid Hashemi Madani, Iman Kermani

Abstract:

The aim of the study was to map scientific research on adrenal gland diseases in the Middle East countries through the Web of Science database using scientometric analysis. Data were analyzed with Excel, and HistCite was used to map the scientific texts. In total, 268 records were retrieved, published by 1125 authors from 328 institutions in 138 journals. Among 17 Middle East countries, Turkey ranked first with 164 documents (61.19%), Israel ranked second with 47 documents (15.53%) and Iran came third with 26 documents. Most of the publications (185 documents, 69.2%) were articles. Among the universities of the Middle East, Istanbul University had the highest science production rate (9.7%). The Journal of Clinical Endocrinology & Metabolism had the highest TGCS (243 citations). In the scientific mapping, 7 clusters were formed based on TLCS (Total Local Citation Score) and TGCS (Total Global Citation Score). Considering the study results, establishing scientific connections and collaboration with other countries and using publications on adrenal gland diseases from high-ranking universities can help develop this field and promote medical practice in this regard. Moreover, investigating the clusters formed in relation to congenital hyperplasia and puberty-related disorders can be a research priority for investigators.

Keywords: mapping, scientific research, adrenal gland diseases, scientometric

862 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents containing unstructured data and text have been created due to the rapid increase in the use of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, categorization was performed manually. However, with manual categorization, not only is the accuracy of the categorization not guaranteed, but the process also requires a large amount of time and high costs. Many studies have addressed the automatic creation of categories to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics, because they assume that each document belongs to exactly one category. To overcome this limitation, some studies have attempted to assign each document to multiple categories. However, these are also limited in that their learning process requires a multi-categorized training document set; they therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets are provided. To remove the requirement of a multi-categorized training set imposed by traditional multi-categorization algorithms, we previously proposed a new methodology that can extend the category of a single-categorized document to multiple categories by analyzing relationships among categories, topics and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: big data analysis, document classification, multi-category, text mining, topic analysis

861 A Bibliometric Analysis of Ukrainian Research Articles on SARS-COV-2 (COVID-19) in Compliance with the Standards of Current Research Information Systems

Authors: Sabina Auhunas

Abstract:

Open Science is developing rapidly in Ukraine to the benefit of scientists in all fields, giving them an opportunity to take a closer look at studies by foreign scientists and to publish their own scientific data in national and international journals. However, when it comes to aggregating data on the scientific activities of Ukrainian scientists, these data are often integrated into E-systems that draw on inconsistent and barely related information sources. To resolve such issues, developed countries productively use E-systems designed to store and manage research data, such as Current Research Information Systems, which make it possible to combine uncompiled data obtained from different sources. We designed an algorithm for selecting SARS-CoV-2 research articles and used it to collect the set of papers published by Ukrainian scientists and uploaded by August 1, 2020. The resulting metadata (document type, open access status, citation count, h-index, most cited documents, international research funding, author counts, bibliographic relationships of journals) were taken from the Scopus and Web of Science databases. The study also considered COVID-19/SARS-CoV-2-related documents published from December 2019 to September 2020, drawn directly from documents published by authors territorially affiliated with Ukraine. These databases provide the information needed for bibliometric analysis, as well as details such as copyright information that may not be available in other databases (e.g., ScienceDirect). The search criteria and results for each online database were defined according to the WHO classification of the virus and the disease it causes, and are presented in Table 1. We first identified 89 research papers, which gave the final data set after consolidation and removal of duplicates; however, only 56 papers were used for the analysis.
The WoS search returned 21,641 documents in total (48 of them affiliated with Ukraine), and the Scopus search returned 32,478 documents (41 affiliated with Ukraine). In terms of the publication activity of Ukrainian scientists, the following areas prevailed: Education, educational research (9 documents, 20.58%); Social Sciences, interdisciplinary (6 documents, 11.76%); and Economics (4 documents, 8.82%). By institution, the highest publication activity was reported for the Ministry of Education and Science of Ukraine (7 documents, 36% of published papers), followed by Danylo Halytsky Lviv National Medical University (5 documents, 15%) and the P. L. Shupyk National Medical Academy of Postgraduate Education (4 documents, 12%). Research by Ukrainian scientists was funded by 5 entities: the Belgian Development Cooperation, the National Institutes of Health (NIH, U.S.), the United States Department of Health & Human Services, the Whitney and Betty MacMillan Center for International and Area Studies at Yale (grant), and the Yale Women Faculty Forum (grant). Based on the results of the analysis, we obtained a set of published articles and preprints to be assessed on a variety of features in upcoming studies, including citation count, most cited documents, bibliographic relationships of journals and reference linking. Further research on the development of a national scientific E-database is continuing, using new analytical methods.

Keywords: content analysis, COVID-19, scientometrics, text mining

860 Project Design Deliverables Sequence (PDD)

Authors: Nahed Al-Hajeri

Abstract:

Project completion can be delayed for several reasons; one of the main ones is delay in processing deliverables, i.e. the submission and review of documents. Most project cycles start with a list of deliverables but no sequence for submitting them, and therefore without a clear direction, which leads to overlapping activities and more interdependencies. Project Design Deliverables (PDD) was therefore developed as a solution for organizing the transmittals (documents/drawings) received from contractors/consultants during the different phases of EPC (Engineering, Procurement, and Construction) projects. It gives the stakeholders proper direction from the beginning, reducing inter-discipline dependency, avoiding overlapping activities and providing a list of deliverables, a sequence of activities, etc. PDD provides a list and sequence of the engineering documents/drawings required during the different phases of a project, which benefits both the client and the contractor in performing planned activities through timely submission and review of deliverables. This helps improve quality and complete the project on time. Successful implementation begins with a detailed understanding of the specific challenges and requirements of the project. PDD helps stakeholders learn about vendor document submissions, including the general workflow and sequence, and monitor the submission and review of deliverables from the early stages of the project. It provides an overview of the deliverables to be submitted, in the proper sequence, by the parties concerned during the project. The goal of PDD is also to hold all stakeholders responsible and accountable during the complete project cycle. We believe that successful implementation of PDD, with a detailed list of documents and their sequence, will help organizations achieve their project targets.

Keywords: EPC (Engineering, Procurement, and Construction), project design deliverables (PDD), econometrics sciences, management sciences
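One way to derive a submission sequence like the one PDD aims to provide is to record, for each deliverable, which other deliverables it depends on and then order them topologically. The abstract does not describe PDD at this level, so the dependency graph below is a hypothetical example and the use of a topological sort is an assumption; the sketch relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical deliverables, each mapped to the deliverables it depends on.
dependencies = {
    "Process flow diagram": {"Design basis"},
    "P&ID": {"Process flow diagram"},
    "Equipment datasheets": {"P&ID"},
    "Plot plan": {"Design basis"},
    "Piping layout": {"P&ID", "Plot plan"},
    "Vendor drawings": {"Equipment datasheets"},
}


def submission_sequence(deps):
    """Return a submission order in which every deliverable follows its prerequisites."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # graphlib stores the detected cycle as the second element of args.
        raise ValueError(f"circular dependency between deliverables: {err.args[1]}") from err


order = submission_sequence(dependencies)
for step, deliverable in enumerate(order, start=1):
    print(step, deliverable)
```

A circular dependency, which would leave stakeholders waiting on each other, is reported as an error instead of producing a sequence; deliverables with no ordering constraint between them (here the plot plan and the process flow diagram) could be prepared in parallel.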

859 An Investigation of Migrants' Attitudes towards Their Ethnic Languages: A Study of Angolan Migrants in Namibia

Authors: Julia Indongo - Haiduwa

Abstract:

The study looks at the attitudes of Angolan migrants working in the informal sector towards their ethnic languages. The study assumes that most Angolan migrants speak Portuguese instead of their ethnic languages because they lack interest in them. The study was qualitative in nature, and 20 Angolan migrants operating in the informal sector were purposively selected for semi-structured interviews. The study revealed that many Angolan migrants have negative attitudes towards their ethnic languages because, even before migrating to Namibia, they used Portuguese rather than their ethnic languages to communicate. The ethnic languages are associated with older people and do not offer the migrants any economic benefits. The study recommends revitalizing Angolan ethnic languages in Namibia in order to maintain them and prevent them from dying out.

Keywords: ethnic languages, language attitude, language choice, language maintenance, multilingualism

858 Selection of Relevant Servers in Distributed Information Retrieval System

Authors: Benhamouda Sara, Guezouli Larbi

Abstract:

Nowadays, information is increasingly disseminated across distributed systems, and selecting the servers relevant to a user request is an important problem in distributed information retrieval. During the last decade, several research studies have sought optimal solutions to this issue, and many approaches to collection selection have been proposed. In this paper, we propose a new collection selection approach that takes into account the number of documents in a collection that contain the query terms and the weights of those terms in these documents. Our experiments show that this technique can compete with the state-of-the-art algorithms we chose as baselines for evaluating its performance.

Keywords: distributed information retrieval, relevance, server selection, collection selection
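The abstract names the two ingredients of the proposed score, the number of documents in a collection that contain query terms and the weights of those terms, but not the formula that combines them. The sketch below uses a hypothetical combination, log(1 + df) times the mean term weight, summed over query terms, purely to show how such a score can rank servers; the collections, weights and function names are invented for illustration.

```python
import math


def collection_score(collection, query_terms):
    """Hypothetical score: sum over query terms of log(1 + df) * mean weight.

    collection: list of documents, each a dict mapping term -> weight.
    df is the number of documents in the collection containing the term.
    """
    score = 0.0
    for term in query_terms:
        weights = [doc[term] for doc in collection if term in doc]
        if weights:
            score += math.log1p(len(weights)) * (sum(weights) / len(weights))
    return score


def select_servers(collections, query_terms, k):
    """Return the names of the k highest-scoring collections."""
    ranked = sorted(collections,
                    key=lambda name: collection_score(collections[name], query_terms),
                    reverse=True)
    return ranked[:k]


# Hypothetical servers with pre-computed term weights per document.
collections = {
    "server_a": [{"plasma": 0.9, "ink": 0.4}, {"plasma": 0.7}],
    "server_b": [{"ink": 0.8}, {"paper": 0.5}],
    "server_c": [{"football": 1.0}],
}
print(select_servers(collections, ["plasma", "ink"], k=2))  # ['server_a', 'server_b']
```

Only the selected servers would then receive the query, which is the point of collection selection: avoiding the cost of broadcasting every request to every server.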

857 The Second Column of Origen’s Hexapla and the Transcription of BGDKPT Consonants: A Confrontation with Transliterated Hebrew Names in Greek Documents

Authors: Isabella Maurizio

Abstract:

This research analyses the pronunciation of the Hebrew consonants 'bgdkpt' in Palestine in the 2nd and 3rd centuries C.E. by comparing two kinds of data: the fragments of the Old Testament transliterated into the Greek alphabet in the second column of Origen’s synopsis, the Hexapla, and Hebrew names transliterated in Greek documents, especially epigraphs. Origen is a very important author, and not only for his theological and exegetical works: the Hexapla, a six-column synopsis prepared for a critical edition of the Septuagint, plays a significant role in attempts to reconstruct the pronunciation of Hebrew before Masoretic punctuation. For this reason, it is important to begin by analyzing the column in order to study phonetic and linguistic phenomena. Among the most problematic data is the evidence of the bgdkpt consonants, which are always represented by Greek aspirated graphemes. This transcription raised the question of whether their only pronunciation was spirant and, consequently, whether the double pronunciation, that is, the stop/spirant contrast, was introduced by the Masoretes. However, the phonetic and linguistic examination of the column alone is not enough to establish the actual pronunciation of the language: this paper is significant because it compares the transliteration in the second column with Hebrew names found in Greek documents, mainly epigraphic ones. In the 2nd and 3rd centuries, Palestine was bilingual: Greek and Aramaic coexisted, the former as the official language and the latter as the main means of communication between people. For this reason, Hebrew names are often found in Greek documents from the same geographical area, and a close examination of the transliteration of bgdkpt can help clarify what the real pronunciation of these consonants was, or at least reveal a phonetic tendency.
Accordingly, the research considers documents contemporary with Origen as well as earlier ones: the former attest a specific stage of pronunciation, while the latter reflect the evolution of the phonemes. Alexandrian documents are also examined, since Origen came from Alexandria and the influence of the Greek spoken in his native city must be considered. The epigraphs have another implication: because of their popular origin, they are entirely free of the morphological criteria that Origen probably applied in his column. A comparison between the hexaplaric transliteration and Hebrew names is therefore essential in Hexapla studies: first, it can provide a second clue to a pronunciation already noted in the column; second, because of the specific nature of these documents, it is more likely to reflect real, everyday use of the language. The examination of the data shows a general tendency to use aspirated graphemes to transliterate the bgdkpt consonants. This probably means that they were closer to Greek aspirated consonants than to plosive ones. The exceptions are linked to the particular status of a name, i.e. its history and origin. In this way, the paper also contributes to onomastic studies, since the research may help verify the diffusion and treatment of Jewish names in the Hellenized world and in the koinè language.

Keywords: bgdkpt consonants, Greek epigraphs, Jewish names, Origen's Hexapla

856 Interactive, Topic-Oriented Search Support by a Centroid-Based Text Categorisation

Authors: Mario Kubek, Herwig Unger

Abstract:

Centroid terms are single words that semantically and topically characterise text documents and may thus serve as a very compact representation of them in automatic text processing. In the present paper, centroids are used to measure the relevance of text documents with respect to a given search query. On this basis, a new graph-based paradigm for searching texts in large corpora is proposed and evaluated against keyword-based methods. The first experimental results are promising and demonstrate the usefulness of the centroid-based search procedure. In particular, it is shown that the routing of search queries in interactive and decentralised search systems can be greatly improved by applying this approach. A detailed discussion of further fields of application completes this contribution.

Keywords: search algorithm, centroid, query, keyword, co-occurrence, categorisation
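A simplified sketch of a centroid term, assuming it is the node of a term co-occurrence graph with the smallest average distance to a document's terms: here edges link terms that occur in the same sentence and distance is the number of hops. The paper's own graph construction and edge weighting may differ, and the corpus and function names are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def cooccurrence_graph(sentences):
    """Undirected graph linking terms that appear in the same sentence."""
    graph = {}
    for sentence in sentences:
        terms = set(sentence)
        for t in terms:
            graph.setdefault(t, set())
        for a, b in combinations(terms, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hop_distances(graph, source):
    """Breadth-first search: number of hops from source to every reachable term."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def centroid_term(graph, doc_terms):
    """Graph node with the smallest mean hop distance to the document's terms.

    Nodes that cannot reach every document term are skipped; ties go to the
    alphabetically first node.
    """
    best, best_avg = None, math.inf
    for candidate in sorted(graph):
        dist = hop_distances(graph, candidate)
        if not all(t in dist for t in doc_terms):
            continue
        avg = sum(dist[t] for t in doc_terms) / len(doc_terms)
        if avg < best_avg:
            best, best_avg = candidate, avg
    return best


# Hypothetical corpus, already split into sentences of terms.
corpus = [
    ["search", "query", "index"],
    ["query", "routing", "peer"],
    ["index", "document", "term"],
    ["query", "document"],
]
graph = cooccurrence_graph(corpus)
print(centroid_term(graph, ["search", "routing", "document"]))  # query
```

Note that the centroid ("query") need not occur among the document's own terms, which is what lets a single word act as a compact topical representative for matching and routing queries.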

855 Behavior of Printing Inks on Historical Documents Subjected to Cold RF Plasma Discharges

Authors: Dorina Rusu, Emil Ghiocel Ioanid, Marta Ursescu, Ana Maria Vlad, Mihaela Popescu

Abstract:

Over recent decades, cold plasma discharges have been the subject of numerous studies on applications in the cultural heritage field, focusing especially on the ecological and non-invasive aspects of these conservation procedures. Conservation treatment using cold plasma is based, on the one hand, on the well-known property of plasma discharges to inactivate contaminating biological species and, on the other hand, on their surface cleaning effect. Moreover, the plasma discharge functionalizes the treated surface, allowing the subsequent deposition of protective layers. The paper presents the behavior of printing inks on historical documents treated in cold RF plasma. Two types of printing ink, red and black, used in a religious book published in the 19th century, were studied. SEM-EDX analysis identified the two inks as carbon black ink (C present in the EDX spectrum) and cinnabar-based red ink (Hg and S lines in the spectrum), a result confirmed by XRF analysis. The experiments were performed on paper samples written with laboratory-made inks of similar composition to the inks identified on the historical documents. The samples were subjected to an RF plasma discharge operating in a nitrogen atmosphere at a frequency of 1.2 MHz and low pressure (0.5 mbar), in self-designed equipment for applying conservation treatments to naturally aged paper supports. The impact of the plasma discharge on the inks was evaluated by SEM, XRD and color analysis. The color analysis revealed a slight discoloration of the cinnabar ink on the historical document. SEM and XRD analyses were carried out in an attempt to elucidate the process responsible for the color modification.

Keywords: RF plasma, printing inks, historical documents, surface cleaning effect

Procedia PDF Downloads 413
854 Popularization of the Communist Manifesto in 19th Century Europe

Authors: Xuanyu Bai

Abstract:

“The Communist Manifesto”, written by Karl Marx and Friedrich Engels, is one of the most significant documents in history, spanning fields including economics, politics, sociology, and philosophy. Rather than discussing the communist ideas presented in the Communist Manifesto, the essay explores the reasons for the document’s popularization and its influence on political revolutions in 19th-century Europe, concentrating on the document itself along with other primary and secondary sources and artwork of the period. Drawing on details from the Communist Manifesto and other documents, the essay argues that Marx’s writing style and word choice, his convincing notions of a new society dominated by the proletariat, and the revolutionary idea of class destruction led to the popularization of the Communist Manifesto and influenced later political revolutions.

Keywords: communist manifesto, Marx, Engels, capitalism

Procedia PDF Downloads 107
853 Modified Active (MA) Algorithm to Generate Semantic Web Related Clustered Hierarchy for Keyword Search

Authors: G. Leena Giri, Archana Mathur, S. H. Manjula, K. R. Venugopal, L. M. Patnaik

Abstract:

Keyword search in XML documents is based on the notion of lowest common ancestors in the labelled-tree model of XML documents and has recently attracted considerable research interest in the database community. In this paper, we propose the Modified Active (MA) algorithm, an improvement over the active clustering algorithm that takes into consideration the entity aspect of the nodes to find the level of the node pertaining to a particular keyword entered by the user. A portion of the bibliography database is used to evaluate the modified active algorithm experimentally, and the results show that it performs better than the active algorithm. Our modification improves the response time of the system and thereby increases its efficiency.
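
The lowest-common-ancestor notion that this line of work builds on can be illustrated with the standard smallest-LCA (SLCA) semantics. The sketch below is not the MA algorithm or the active clustering algorithm; it only shows how the deepest elements whose subtrees contain every query keyword are found, assuming that a keyword matches an element when it appears as a whitespace-separated word of that element's text. The function name and the bibliography-style sample are invented for illustration.

```python
import xml.etree.ElementTree as ET


def slca(root, keywords):
    """Return the smallest lowest common ancestors (SLCAs) of a keyword query.

    An element is an SLCA if its subtree contains every keyword and no
    child subtree already contains all of them. Matching is a simple,
    case-insensitive whole-word check on element text (an assumption).
    """
    wanted = {k.lower() for k in keywords}
    results = []

    def visit(node):
        # Keywords matched by this element's own text.
        covered = wanted & set((node.text or "").lower().split())
        child_has_all = False
        for child in node:
            child_covered = visit(child)
            covered |= child_covered
            if child_covered == wanted:
                child_has_all = True
        # Deepest element covering the whole query: report it.
        if covered == wanted and not child_has_all:
            results.append(node)
        return covered

    visit(root)
    return results


doc = ET.fromstring(
    "<bib>"
    "<article><author>Giri</author><title>keyword search in XML</title></article>"
    "<article><author>Mathur</author><title>clustering</title></article>"
    "</bib>"
)
hits = slca(doc, ["Giri", "XML"])
print([h.find("author").text for h in hits])  # ['Giri']
```

When the keywords fall in different articles, the only element covering all of them is the root, which is then returned; clustering approaches such as the one above aim to return more meaningful result levels than this baseline.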

Keywords: keyword matching patterns, MA algorithm, semantic search, knowledge management

Procedia PDF Downloads 376
852 Computer Fraud from the Perspective of Iran's Law and International Documents

Authors: Babak Pourghahramani

Abstract:

Computer fraud is one of the modern crimes against property and ownership in cyberspace. Despite being modern, this crime has its roots in the principles of religious jurisprudence. In some cases it is covered by traditional regulations, namely when the computer is considered a device for committing the crime; in addition, some computer frauds that take place in the context of electronic exchanges are criminalized under the E-commerce Law (approved in 2003). These regulations, however, are flawed, and until recent years there was no comprehensive law in this regard. The Computer Crime Act, approved on 26 May 2009, partly filled this legal vacuum. The present study investigates computer fraud under Iran's Computer Crime Act, taking international documents into consideration.

Keywords: fraud, cyber fraud, computer fraud, classic fraud, computer crime

Procedia PDF Downloads 304
851 Analysis of State Documents on Environmental Awareness Aspects in Kazakhstan

Authors: Y. A. Kumar

Abstract:

Environmental awareness in Kazakhstan is one of the most neglected topics, both among the public and in state rhetoric. Among official state documents, only two have so far been introduced in this area, both in 2021: the Environmental Code and the national program Zhasyl Kazakhstan. While the Environmental Code was introduced to modernize, frame, and set out the main legislative aspects of the various sectors of environmental law in Kazakhstan, the Zhasyl Kazakhstan Program has been implemented as a state program that addresses, through numerous environmental projects, issues ranging from air pollution to waste management, as well as ecological education and low environmental awareness. Accordingly, the main goal of this paper is to critically analyze the main content of both documents, with a particular focus on the sections related to raising environmental awareness. To that end, the paper applies a subjective content analysis to identify insights into regulatory legal aspects, future research streams, and possible improvements to the legislative framework for environmental awareness. In addition, five open-ended questions were sent to the Ministry of Ecology, Geology and Natural Resources to obtain primary data on the state’s view of past, current, and future aspects of environmental awareness in the country.

Keywords: Kazakhstan, environmental awareness, environmental code, Zhasyl Kazakhstan, content analysis

Procedia PDF Downloads 64
850 Visual Template Detection and Compositional Automatic Regular Expression Generation for Business Invoice Extraction

Authors: Anthony Proschka, Deepak Mishra, Merlyn Ramanan, Zurab Baratashvili

Abstract:

Small and medium-sized businesses receive over 160 billion invoices every year. Since these documents exhibit many subtle differences in layout and text, automatically extracting structured fields such as sender name, amount, and VAT rate from them is an open research question. In this paper, existing work in template-based document extraction is extended, and a system is devised that reliably extracts all required fields for up to 70% of the documents in the data set, more than any previously reported method. The paper describes approaches for 1) detecting, through visual features, which template a given document belongs to, 2) automatically generating extraction rules for a new template by composing regular expressions from multiple components, and 3) computing confidence scores that indicate the accuracy of the automatic extractions. The system can generate templates from as few as one training sample and requires only the ground-truth field values, rather than detailed annotations such as bounding boxes that are hard to obtain. The system is deployed and used inside commercial accounting software.
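
The idea of composing an extraction rule from components, learned from a single sample and its ground-truth value only, can be illustrated with a toy version: an anchor built from the words preceding the value, plus a value pattern obtained by generalizing each run of digits, letters, or whitespace in the ground-truth value. This is a hypothetical sketch, not the authors' system; `build_rule`, `generalize_value`, `extract`, and the invoice snippets are invented, and a real system would combine many more components and score the alternatives.

```python
import re

# Token classes used to generalize a ground-truth value into a pattern.
TOKEN = re.compile(r"(?P<num>\d+)|(?P<alpha>[A-Za-z]+)|(?P<space>\s+)|(?P<other>.)", re.S)
PIECES = {"num": r"\d+", "alpha": r"[A-Za-z]+", "space": r"\s+"}


def generalize_value(value):
    r"""Compose a value pattern from per-token components, e.g. '1,234.50' -> '\d+,\d+\.\d+'."""
    # Digit/letter/whitespace runs become classes; any other character is kept literally.
    return "".join(
        PIECES.get(m.lastgroup, re.escape(m.group())) for m in TOKEN.finditer(value)
    )


def build_rule(train_text, value, anchor_words=2):
    """Hypothetical rule: up to `anchor_words` words before the value + the value pattern."""
    idx = train_text.find(value)
    if idx < 0:
        return None  # value not present verbatim in the training text
    words = train_text[:idx].split()
    anchor = words[-anchor_words:] if anchor_words > 0 else []
    anchor_pattern = r"\s+".join(re.escape(w) for w in anchor)
    return anchor_pattern + r"\s*(" + generalize_value(value) + ")"


def extract(rule, text):
    m = re.search(rule, text)
    return m.group(1) if m else None


rule = build_rule("Invoice no. 4711\nTotal amount: 1,234.50 EUR", "1,234.50")
print(extract(rule, "ACME GmbH\nTotal amount: 98,765.00 EUR"))  # 98,765.00
```

The learned rule here is `Total\s+amount:\s*(\d+,\d+\.\d+)`, so an amount without a thousands separator would not match; covering such variants is where composing several alternative components would come in.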

Keywords: data mining, information retrieval, business, feature extraction, layout, business data processing, document handling, end-user trained information extraction, document archiving, scanned business documents, automated document processing, F1-measure, commercial accounting software

Procedia PDF Downloads 97
849 A Transformer-Based Question Answering Framework for Software Contract Risk Assessment

Authors: Qisheng Hu, Jianglei Han, Yue Yang, My Hoa Ha

Abstract:

When a company is considering purchasing software for commercial use, contract risk assessment is critical for identifying risks and mitigating their potential adverse business impact, e.g., security, financial, and regulatory risks. Contract risk assessment requires reviewers with specialized knowledge and time to evaluate the legal documents manually. Specifically, validating contracts for a software vendor requires the following steps: manual screening, interpreting legal documents, and extracting risk-prone segments. To automate the process, we propose a framework to assist in identifying risks in legal contract documents, leveraging pre-trained deep learning models and natural language processing techniques. Given a set of pre-defined risk evaluation problems, our framework uses pre-trained transformer-based question-answering models to identify risk-prone sections in a contract. The question-answering model encodes the concatenated question and contract text and predicts the start and end positions for clause extraction. Because little labelled data was available for training, we leveraged transfer learning by fine-tuning the models on the CUAD dataset. On a dataset comprising 287 contract documents and 2000 labelled samples, our best model achieved an F1 score of 0.687.
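
Extractive question answering of the kind described here typically ends with a span-selection step: the model emits a start score and an end score for every token, and the answer is the highest-scoring pair whose start does not come after its end. The sketch below shows that generic decoding step in plain Python; the token list, the logit values, and the `max_answer_len` limit are invented for illustration, and it is not claimed to be the authors' exact post-processing (real pipelines also mask the question tokens and handle unanswerable questions).

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logits[start] + end_logits[end],
    with start <= end and a span length of at most max_answer_len tokens."""
    best_score, best = float("-inf"), (0, 0)
    for i, s in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best_score, best = score, (i, j)
    return best


# Invented contract tokens and model scores, for illustration only.
tokens = ["The", "vendor", "may", "terminate", "this", "agreement", "at", "any", "time"]
start = [0.1, 0.2, 3.0, 0.5, 0.1, 0.0, 0.2, 0.1, 0.0]
end = [0.0, 0.1, 0.2, 0.4, 0.3, 2.5, 0.1, 0.2, 1.0]
s, e = best_span(start, end)
print(" ".join(tokens[s:e + 1]))  # may terminate this agreement
```

Scoring pairs jointly, rather than taking the two argmaxes independently, guarantees a well-formed span even when the highest end score falls before the highest start score.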

Keywords: contract risk assessment, NLP, transfer learning, question answering

Procedia PDF Downloads 95
848 Structural Challenges of Social Integration of Immigrants in Iran: Investigating the Status of Providing Citizenship and Social Services

Authors: Iman Shabanzadeh

Abstract:

In terms of its geopolitical position, Iran has been one of the main centers of migration movements in the world in recent decades. However, policy makers' lack of preparation in completing the cycle of social integration of these immigrants, especially the second and third generations, has left them constantly inclined to leave the country and emigrate to developed and industrialized countries. This research analyzes the integration of immigrants in Iran through four indicators: "Identity Documents", "Access to Banking Services", "Access to Health and Treatment Services", and "Obtaining a Driver's License". The research method is descriptive-analytical. Data were collected from library and documentary sources on the laws and regulations concerning immigrants' rights in Iran and from semi-structured interviews with experts. The findings show that none of the residence documents issued to immigrants in Iran guarantees them full enjoyment of basic citizenship rights. In fact, many of these identity documents, such as the census card and the educational support card, function only to prevent crossing the border, and none of them guarantees the basic rights of citizenship. Therefore, for many immigrants, the difference between legality and illegality lies only in the risk of crossing the border, and this has spread the habit of illegal presence among them. Yet there appears to be no clear and coherent policy framework for foreign immigrants in the country. This policy incoherence is evident in the diversity and plurality of identity and legal documents held by the residents of the country and in policy makers' lack of planning to integrate and organize the identity of this large group.
Examining the differences and socioeconomic inequalities between immigrants and the native Iranian population shows that immigrants have been poorly integrated into the structures of Iranian society from an economic and social point of view.

Keywords: immigrants, social integration, citizen services, structural inequality

Procedia PDF Downloads 22
847 BIM-Based Tool for Sustainability Assessment and Certification Documents Provision

Authors: Taki Eddine Seghier, Mohd Hamdan Ahmad, Yaik-Wah Lim, Samuel Opeyemi Williams

Abstract:

Assessing building sustainability to achieve a specific green benchmark and preparing the documents required to receive green building certification are both considered major challenges for green building design teams. However, this labor- and time-consuming process can take advantage of available Building Information Modeling (BIM) features such as material take-off and scheduling. Furthermore, the workflow can be automated to track potentially achievable credit points and provide rating feedback for several design options by using integrated Visual Programming (VP) to handle the parameters stored within the BIM model. Hence, this study proposes a BIM-based tool that uses the Green Building Index (GBI) rating system requirements as a unique input case to evaluate building sustainability in the design stage of the project life cycle. The tool comprises two key models: first, a model for data extraction, calculation, and classification of achievable credit points in a green template; second, a model for generating the documents required for green building certification. The tool was validated on a BIM model of a residential building, and it serves as a proof of concept that building sustainability under GBI certification can be automatically assessed and documented through BIM.

Keywords: green building rating system, GBRS, building information modeling, BIM, visual programming, VP, sustainability assessment

Procedia PDF Downloads 301
846 A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction

Authors: Smita Pallavi, Raj Ratn Pranesh, Sumit Kumar

Abstract:

Representing information as tables is a compact and concise method that eases searching, indexing, and storage. Extracting and cloning tables from parsable documents is easier and widely practiced; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from an OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to the corresponding cells of a dataframe, which can be stored as comma-separated values, in a database, as an Excel file, or in multiple other usable formats.
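
The final mapping step, placing recognized text into table cells, can be sketched as follows. It assumes that table detection has already produced the row and column boundary coordinates (for example from ruling lines found with morphological operations) and that OCR returns each word with the centre of its bounding box; the function names and sample coordinates are hypothetical. The grid is written out as CSV with the standard library, and the same list of rows could equally be loaded into a dataframe.

```python
import bisect
import csv
import io


def words_to_grid(words, col_edges, row_edges):
    """Assign OCR words to table cells.

    words: iterable of (text, x_center, y_center) tuples.
    col_edges, row_edges: sorted interior boundaries between columns/rows,
    assumed to come from an earlier table-detection step.
    """
    grid = [["" for _ in range(len(col_edges) + 1)] for _ in range(len(row_edges) + 1)]
    # Visit words top-to-bottom, then left-to-right, so multi-word cells read naturally.
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        row = bisect.bisect_right(row_edges, y)
        col = bisect.bisect_right(col_edges, x)
        grid[row][col] = (grid[row][col] + " " + text).strip()
    return grid


def grid_to_csv(grid):
    """Serialize the cell grid as comma-separated values."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()


words = [("Item", 10, 5), ("Qty", 60, 5), ("Blue", 10, 25), ("pen", 30, 25), ("3", 60, 25)]
print(grid_to_csv(words_to_grid(words, col_edges=[50], row_edges=[15])))
```

Binary search over the sorted boundaries keeps the cell lookup logarithmic in the number of rows and columns; finding those boundaries reliably is the harder, image-processing part of the problem.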

Keywords: table extraction, optical character recognition, image processing, text extraction, morphological transformation

Procedia PDF Downloads 117
845 Improving the Performance of Requisition Document Online System for Royal Thai Army by Using Time Series Model

Authors: D. Prangchumpol

Abstract:

This research presents a method for forecasting requisition-document demand for military units using exponential smoothing. The forecast is based on actual requisition-document data from the Adjutant General Department. The forecasting results show that the Holt–Winters trend and seasonality method with α = 0.1, β = 0, and γ = 0 is appropriate for, and fits, the requisition of documents. In addition, the researcher developed an online requisition system to improve the performance of document requisition in the Adjutant General Department and to ensure that the operation can be checked.
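
For reference, the additive form of the Holt–Winters method can be written in a few lines. This is a sketch under assumptions: the abstract does not say whether the additive or multiplicative variant was used, what the season length is, or how the initial values were chosen, so the additive form and a simple first-two-seasons initialization are assumed here, and the series below is synthetic. With the reported β = 0 and γ = 0, the trend and seasonal components stay at their initial values and only the level is smoothed, with α = 0.1.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing; returns `horizon` forecasts.

    y: observations, m: season length (assumed), alpha/beta/gamma: smoothing
    weights for level, trend and seasonality.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialize")
    # Simple initialization (an assumption): first-season mean as level,
    # average per-period change between the first two seasons as trend.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]

    for t, obs in enumerate(y):
        s_old = season[t % m]
        level_old, trend_old = level, trend
        level = alpha * (obs - s_old) + (1 - alpha) * (level_old + trend_old)
        trend = beta * (level - level_old) + (1 - beta) * trend_old
        season[t % m] = gamma * (obs - level_old - trend_old) + (1 - gamma) * s_old

    n = len(y)
    # Forecast = last level + h * trend + latest seasonal term for that position.
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]


# Synthetic series with a season length of 4 (invented); parameters as reported.
history = [55, 45, 53, 47] * 3
print(holt_winters_additive(history, m=4, alpha=0.1, beta=0.0, gamma=0.0, horizon=4))
```

A small α such as 0.1 makes the level react slowly to new observations, which suits a demand series that is mostly stable around a fixed seasonal pattern.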

Keywords: requisition, Holt–Winters, time series, Royal Thai Army

Procedia PDF Downloads 280
844 Slovenian Spatial Legislation over Time and Its Issues

Authors: Andreja Benko

Abstract:

The article presents a short overview of the architectural profession over time, outlining the work of architectural theorists. It then describes Slovenia’s former affiliations and the spatial planning documents that were in use until Slovenia joined Yugoslavia (the last part in 1919). This legislation from the former Austro-Hungarian monarchy remained valid until almost 1950, and in some parts of Yugoslavia even longer. Some currently valid Slovenian spatial documents are then mentioned and compared with German legislation. The article analyses the number of architects and spatial planners in Slovenia, both overall and by region. On that basis, it also gives figures from the Statistical Office of Slovenia on the number of buildings between 2007 and 2012 and describes the collapse of the major construction companies in Slovenia and its consequences. Finally, it outlines the morality and ethics of spatial interventions, the lack of an architectural law in Slovenia, and the problem of minimal collaboration between the Ministry of Infrastructure and Spatial Planning and the profession.

Keywords: architect, history, legislation, Slovenia

Procedia PDF Downloads 335
843 Evaluation of Environmental, Social, and Governance Factors by U.S. Tolling Authorities in Bond Issuance Disclosures

Authors: Nicolas D. Norboge

Abstract:

Purchasers of municipal bonds in primary and secondary markets increasingly expect issuers to disclose environmental, social, and governance (ESG) factors in issuance and continuing disclosure documents. U.S. tolling authorities are slowly catching up with other transportation sectors, such as public transit, in integrating ESG factors into their bond disclosure documents. A systematic mixed-methods evaluation of publicly available bond disclosure documents from 2010-2022 suggests that only a small number of U.S. tolling authorities disclosed all ESG factors; however, the pace accelerated significantly from 2020-2022. Because many tolling authorities have a direct financial stake in the growth of passenger vehicle miles traveled on their toll facilities, and in turn in the burning of more climate-warming fossil fuels, one crucial question that remains is how bond purchasers will view increased ESG transparency. Recent moves by large institutional investors, credit rating agencies, and regulators suggest that the expectation of ESG disclosure is a trend likely to endure. This research suggests that tolling authorities will need to proactively consider these emerging trends and carefully adapt their disclosure practices where possible. Building on these findings, this research also provides a basic sketch framework for how issuers can responsibly position themselves within the changing global municipal debt marketplace.

Keywords: debt policy, ESG, municipal bonds, public-private partnerships, public tolling authorities, transportation finance, and policy

Procedia PDF Downloads 149
842 Quantitative Method of Measurement for the Rights and Obligations of Contracting Parties in Standard Forms of Contract in Malaysia: A Case Study

Authors: Sim Nee Ting, Lan Eng Ng

Abstract:

Standard forms of contract in Malaysia are pre-written, printed contractual documents drafted by recognised authoritative bodies to describe the rights and obligations of the contracting parties in all construction projects in Malaysia. Studies and form revisions are usually conducted in a relatively random and qualitative manner, while the search for ideal contractual documents continues. It is not clear how these qualitative findings could help improve and re-draft contractual documents. This study aims to quantitatively and systematically analyse and evaluate the rights and obligations of the contracting parties as stated in the standard forms of contract. The Institution of Engineers Malaysia (IEM) published a new standard form of contract in 2012 with a total of 63 clauses, but the improvements and changes in the newly revised form have yet to be analysed. The IEM form is used as the case study. Every clause in this form was interpreted and analysed with respect to the parties involved, namely the contractor, engineer, and employer. Using an approach modified from the Matrix Method and the Likert scale, the results were analysed on a scale from 0 to 1 with five ratings, namely “Very Unbalance”, “Unbalance”, “Balance”, “Good Balance” and “Very Good Balance”. It is hoped that this quantitative method of form study can be used for future form revisions and the drafting of new forms, so as to reduce subjectivity in studies of standard forms of contract.
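
One way to turn the five ratings into numbers on the 0-to-1 scale is to space them evenly and average them per party across clauses. The equal spacing (0, 0.25, 0.5, 0.75, 1), the aggregation by mean, the function name, and the sample clause ratings below are assumptions for illustration; the abstract does not specify how the Matrix Method and the Likert scale were combined.

```python
# Assumed equal spacing of the five labels on the 0-1 scale.
RATING_SCORES = {
    "Very Unbalance": 0.0,
    "Unbalance": 0.25,
    "Balance": 0.5,
    "Good Balance": 0.75,
    "Very Good Balance": 1.0,
}


def party_scores(ratings):
    """ratings: {clause: {party: label}} -> {party: mean score over rated clauses}."""
    totals, counts = {}, {}
    for per_party in ratings.values():
        for party, label in per_party.items():
            totals[party] = totals.get(party, 0.0) + RATING_SCORES[label]
            counts[party] = counts.get(party, 0) + 1
    return {party: totals[party] / counts[party] for party in totals}


# Hypothetical ratings for two clauses.
sample = {
    "clause A": {"contractor": "Balance", "employer": "Good Balance"},
    "clause B": {"contractor": "Unbalance", "employer": "Very Good Balance"},
}
print(party_scores(sample))  # {'contractor': 0.375, 'employer': 0.875}
```

Averaging only over the clauses in which a party is actually rated keeps parties that appear in few clauses (such as the engineer) comparable with those that appear in many.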

Keywords: contracting parties, Malaysia, obligations, quantitative measurement, rights, standard form of contract

Procedia PDF Downloads 240