Search results for: historiography of documents
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 950

950 An Analysis of Methodological Approaches of Ahmed Cevdet and Fatma Aliye towards the Ottoman Historiography in a Comparative Context

Authors: Aysen Muderrisoglu Esiner

Abstract:

As an intellectual, scholar, bureaucrat, and statesman, Ahmed Cevdet Pasha (1822-1895) was a prominent figure of the “Tanzimat” (reorganization) reforms of the Ottoman State, while his daughter Fatma Aliye (1862-1936) was a novelist, columnist, essayist, and women’s rights activist. Her father wrote numerous books on law, grammar, linguistics, logic, and astronomy; Aliye, for her part, is accepted as the first female novelist in Turkish literature and the Islamic world. Although she is better known as a novelist, she also published works on philosophy, Islam, and poetry. In addition, Aliye, who was one of the pioneers of the Ottoman women’s movement, wrote historical works. These works, titled Tarih-i Osmaninin Bir Devre-i Mühimmesi Kosova Zaferi-Ankara Hezimeti (An Important Era of the Ottoman History: Kosova Victory-Ankara Defeat) and Ahmed Cevdet Paşa ve Zamanı (Ahmed Cevdet Pasha and His Time), have been generally ignored in the literature. However, Aliye’s works in the field of history are worth studying for her methodological approach to Ottoman historiography. On the other hand, works written by Ahmed Cevdet Pasha, such as Tarih-i Cevdet (History of Cevdet), Tezâkir (Memoir), Mâruzat (Reports, the events that took place between 1839-1876, 1890), Kısas-ı Enbiya ve Tevârîh-i Hulefa (Stories of the Prophets and the History of Caliphs), and Kırım ve Kafkas Tarihçesi (Crimean and Caucasian History), are the most important works in terms of historiography in the 19th century. In contrast to the traditional methodology, Cevdet Pasha brought a new understanding to Ottoman historiography by synthesizing traditional and modern methods.
In this research, the historical works of these two prominent figures of the Ottoman State will be analyzed in terms of their approaches to Ottoman historiography, addressing the following questions: to what extent do their use of local and foreign historical sources and their handling of historical events differ, and is it possible to speak of methodological similarities in terms of historiography?

Keywords: Ahmed Cevdet Pasha, Fatma Aliye, historiography, methodology

Procedia PDF Downloads 257
949 Precarious ID Cards - Studying Documentary Practices in India through the Lens of Internal Migration

Authors: Ambuja Raj

Abstract:

This research will attempt to understand how documents are materially indispensable civic artifacts for migrants in their encounters with the state. Documents such as ID cards are sites of mediation and bureaucratic manifestation which reveal the inherent dynamics of power between the state and a delocalized people. While ID cards allow the holder to retain a different identity and articulate their demands as a citizen, they at the same time transform subjects into ‘objects’ in the exercise of governmental power. The research is based on the study of internal migrants in India, who are ‘visible’ to the state through its host of ID documents such as the ‘Aadhaar card’, electoral IDs, Ration cards, and a variety of region-specific documents, without the possession of which, not only are they unable to access jobs, public goods and services, and accommodation, but are liable to exploitation from state forces and mediators. Through semi-structured interviews with social actors in the processes of documentation and welfare of migrants, as well as with settlements of migrants themselves located in the state of Kerala in India, the thesis will attempt to understand the salience of documentary practices in the lives of inter-state migrants who move within Indian states in the hope of bettering their economic conditions. The research will trace the material and evolving significance of ID cards in the tenacity of states dealing with these ‘illegible’ populations. It will try to bring theories of governmentality, biopolitics and Weberian bureaucracy into the migrant issue while critically grounding itself on secondary literature by scholars who have worked on South Asian ‘governments of paper’.

Keywords: migration, historiography of documents, anthropology of state, documentary practices

Procedia PDF Downloads 188
948 The Use of Technology in Mathematics Learning (1995-2024): A Bibliometric Analysis

Authors: Rahma Adinda Sartika

Abstract:

The use of technology in learning mathematics has received a positive response from both students and teachers, and many researchers have therefore studied this theme. This study found that 807 documents relevant to the theme were published in Scopus from 1995 to 2024. After the stages of identification, screening, eligibility, and inclusion, 227 documents met the criteria. These documents were then analyzed using the bibliometric method. The analysis shows that the most documents in the Scopus database were published in 2020, with 38 documents, and the fewest in 1996-2000 and 2004-2007, when no documents were published. The highest number of citations belongs to documents published in 2018, with a total of 349 citations, so the h-index for that year is higher than for the others. The country that published the most documents relevant to this theme is Indonesia, with a total of 91 documents. The second largest is the United States, with 28 published documents, and the third largest is China, with 15 documents. Indonesia and the United States have the most collaborative relationships with other countries. Research related to this theme focuses on 1) mathematics learning, 2) learning systems, 3) engineering education, 4) technology, and 5) mathematical concepts.
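
Since the abstract refers to the h-index, a minimal sketch of how that indicator is usually computed may help. The function name and the sample citation counts below are illustrative assumptions; they are not data from the study.

```python
def h_index(citations):
    """Return the largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` documents all have >= rank citations
        else:
            break
    return h


# Hypothetical citation counts for five documents (not from the study).
sample_citations = [10, 8, 5, 4, 3]
print(h_index(sample_citations))
```

Under this definition the h-index depends on how citations are spread across documents: a single document with 349 citations yields an h-index of 1, whereas the same citations spread over many documents can yield a much larger value.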

Keywords: technology, bibliometric, mathematics learning, mathematical concepts

Procedia PDF Downloads 56
947 Comparative Analysis of the Treatment of the Success of the First Crusade in Modern Arab and Western Historiography

Authors: Oleg Sokolov

Abstract:

Although the epoch of the Crusades ended more than 700 years ago, its legacy remains relevant both in the Middle East and in the West. This study compares the positions of the most prominent Western and Arab medievalists of the 20th and 21st centuries, using the example of their interpretations of the success of the First Crusade. The analyzed corpus consists of 70 works. In modern Arab historiography, it is often pointed out that the Seljuks' struggle against the crusaders of the First Crusade was seriously hampered by the raids of the Arab Bedouin tribes of Jazira. At the same time, it is emphasized that the Arab rulers of Northern Syria were ‘pleased’ with the defeats of the Turks and made peace with the Crusaders, refusing to fight them. It is also usually underlined that the Fatimid aggression against the Turks led both sides to defeat by the Crusaders and became one of the main reasons for the success of the First Crusade and the Muslims' loss of Jerusalem in 1099. The position of Western historians on the reasons for the success of the First Crusade differs significantly. First of all, Western historiography notes that the deaths of the Fatimid and Abbasid Caliphs and the Seljuk Sultan between 1092 and 1094 created a political vacuum just before the crusaders appeared in the Middle Eastern political arena. In 1097-1099, when the Crusaders advanced through Asia Minor, Syria and Palestine to Jerusalem, there was an active internecine struggle between the parts of the Seljuq state that had broken up by that time, and the crusaders were not perceived as a common threat to all Muslims of the region. It is also pointed out that the main goals of the Crusaders - Antioch, Edessa, and Jerusalem - were peripheral at that time, since the main struggle for power in the Middle East was then taking place in Iran.
Thus, Arab historians see the lack of support from the Arabs of Syria and Jazira and the aggression from Egypt as crucial factors that prevented the Seljuks from defeating the Crusaders, while their Western counterparts consider the internal power struggle among the Seljuks a more important reason for the success of the First Crusade. The reason for this divergence in the treatment of the events of the First Crusade is probably the idea, prevailing in much of Arab historiography, of the Franks as an enemy of all peoples and religions of the Middle East. In contemporary Western historiography, by contrast, the crusaders are described as only one of the many military and political forces that operated in the region at the end of the eleventh century.

Keywords: Arabs, Crusades, historiography, Turks

Procedia PDF Downloads 167
946 Words Spotting in the Images Handwritten Historical Documents

Authors: Issam Ben Jami

Abstract:

Information retrieval in digital libraries is very important because many well-known historical documents are of significant value. Word spotting in historical documents is very difficult because such documents are naturally cursive and show wide variability in the scale and translation of words within the same document. We first present a system for automatic recognition based on the extraction of interest points of words from the image model. In the extraction phase, the key points are chosen from a representation of the image as a synthetic description of the shape in a multidimensional space. We therefore use advanced methods that can find and describe interest points that are invariant to scale, rotation, and lighting and are linked to local configurations of pixels. We test this approach on documents from the 15th century. Our experiments give important results.

Keywords: feature matching, historical documents, pattern recognition, word spotting

Procedia PDF Downloads 274
945 Historiography of European Urbanism in the 20th Century in Slavic Languages

Authors: Aliaksandr Shuba, Max Welch Guerra, Martin Pekar

Abstract:

This research is dedicated to the historiography of European urbanism in the 20th century and offers a critical analysis of transnationally oriented sources in Slavic languages. Its goal is to give an overview of Slavic sources on this subject. The research analyses historians who wrote influential historiographies of 20th-century architecture and urbanism in Slavic languages from Eastern, Central and South-eastern Europe. The analysis covers diverse sources from around Europe whose authors examined European urbanism in the 20th century through a global prism or from their own perspectives. The main publications date from the second half of the 20th century and the early 21st century and carry Soviet and post-Soviet discourses. The need to analyse Slavic sources follows from the establishment of the historiography of urbanism as a discipline in the 20th century and from the work of Soviet, Czechoslovak, and Yugoslav academics, who created strong historiographic bases for the development of their urban historiographic schools, for wide studies and analysis of architectural and urban ideas and projects with their history, in the early 1970s. This is analysed within Slavic publications, which often have perspectives and discourses different from Anglo-Saxon ones; these bibliographic sources can bring a diversity of new ideas to contemporary academic discourse on European urban historiography. The publications in Slavic languages are analysed according to the following aspects: where, when, in which types, by whom, and for whom the sources were written. The critical analysis of essential sources on the historiography of European urbanism in the 20th century is accomplished through their comparison and interpretation.
The authors’ autonomy is analysed as a central point, along with the influence of the Communist Party and state control on the interpretation of the history of urbanism in Central, Eastern and South-eastern Europe, together with the dominant topics and ideas from the second half of the 20th century. Cross-national Slavic historiographic sources and their perspectives are compared with the main transnational Anglo-Saxon historiographic topics. Some of the dominant subjects, topics, and subtopics are hypothetically similar, while others have more locally or nationally oriented directions because of the authors’ autonomy and the influence of the Communist Party and state control in Slavic socialist countries, as illustrated in this research.

Keywords: European urbanism, historiography, different perspectives, 20th century

Procedia PDF Downloads 174
944 Thinking Historiographically in the 21st Century: The Case of Spanish Musicology, a History of Music without History

Authors: Carmen Noheda

Abstract:

This text provides a reflection on the way of thinking about the study of the history of music by examining the production of historiography in Spain at the turn of the century. Based on concepts developed by the historical theorist Jörn Rüsen, the article focuses on the following aspects: the theoretical artifacts that structure the interpretation of the limits of writing the history of music, the narrative patterns used to give meaning to the discourse of history, and the orientation context that functions as a source of criteria of significance for both interpretation and representation. This analysis intends to show that historical music theory is not only a means to abstractly explore the complex questions connected to the production of historical knowledge, but also a tool for obtaining concrete images about the intellectual practice of professional musicologists. Writing about the historiography of contemporary Spanish music is a task that requires both a knowledge of the history that is being written and investigated, as well as a familiarity with current theoretical trends and methodologies that allow for the recognition and definition of the different tendencies that have arisen in recent decades. With the objective of carrying out these premises, this project takes as its point of departure the 'immediate historiography' in relation to Spanish music at the beginning of the 21st century. The hesitation that Spanish musicology has shown in opening itself to new anthropological and sociological approaches, along with its rigidity in the face of the multiple shifts in dynamic forms of thinking about history, have produced a standstill whose consequences can be seen in the delayed reception of the historiographical revolutions that have emerged in the last century. Methodologically, this essay is underpinned by Rüsen’s notion of the disciplinary matrix, which is an important contribution to the understanding of historiography. 
Combined with his parallel conception of differing paradigms of historiography, it is useful for analyzing the present-day forms of thinking about the history of music. Following these theories, the article will in the first place address the characteristics and identification of present historiographical currents in Spanish musicology to thereby carry out an analysis based on the theories of Rüsen. Finally, it will establish some considerations for the future of musical historiography, whose atrophy has not only fostered the maintenance of an ingrained positivist tradition, but has also implied, in the case of Spain, an absence of methodological schools and an insufficient participation in international theoretical debates. An update of fundamental concepts has become necessary in order to understand that thinking historically about music demands that we remember that subjects are always linked by reciprocal interdependencies that structure and define what it is possible to create. In this sense, the fundamental aim of this research departs from the recognition that the history of music is embedded in the conditions that make it conceivable, communicable and comprehensible within a society.

Keywords: historiography, Jörn Rüsen, Spanish musicology, theory of history of music

Procedia PDF Downloads 190
943 Indecisiveness in 'The Road Not Taken' by Robert Frost: An Expressive Critical Analysis

Authors: Kurt S. Candilas

Abstract:

This expressive critical study is an effort to bring to light a new interpretation of Robert Frost's poem 'The Road Not Taken' as a reflection of his indecisiveness in life. Specifically, it aims to examine Frost’s inner being, emphasizing his own self and experiences in the poem. The study employs a qualitative research design that makes use of discourse analysis, with the critical theory of expressivism as the main guide. To acquire the data, the study draws on the art of historiography, using autobiographical and/or biographical notes, source documents, and web information. In carrying out these methods, it is observed that the poem shows naturalist implicatures, expressing Frost’s strong feelings and emotions at being devoid of free will, along with some confusion and ambiguity about his indecision in life.

Keywords: The Road Not Taken, expressivism, indecisiveness, naturalist implicatures

Procedia PDF Downloads 342
942 Data Gathering and Analysis for Arabic Historical Documents

Authors: Ali Dulla

Abstract:

This paper introduces a new dataset (and the methodology used to generate it) based on a wide range of historical Arabic documents containing clean data and simple, homogeneous page layouts. The experiments are carried out on printed and handwritten documents obtained from important libraries such as the Qatar Digital Library, the British Library, and the Library of Congress. We have gathered and annotated 150 archival document images from different locations and time periods, drawn from documents of the 17th to 19th centuries. The dataset comprises differing page layouts and degradations that challenge text line segmentation methods. Ground truth is produced using the Aletheia tool by PRImA and stored in an XML representation, in the PAGE (Page Analysis and Ground truth Elements) format. The dataset will be easily available to researchers worldwide for research into the obstacles facing historical Arabic documents, such as geometric correction.
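
To illustrate how ground truth stored in a PAGE-style XML file might be read back for text line segmentation experiments, here is a minimal sketch. The sample XML is hand-written for illustration and is not taken from the dataset; the namespace URI and the `points` attribute on `Coords` are assumptions about the PAGE schema version, and `text_line_polygons` and `local_name` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; it is not taken from the dataset.
# The namespace URI is an assumption; the parser below ignores namespaces.
SAMPLE_PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="sample.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="50,40 950,40 950,400 50,400"/>
      <TextLine id="l1">
        <Coords points="60,50 940,50 940,110 60,110"/>
      </TextLine>
      <TextLine id="l2">
        <Coords points="60,130 940,130 940,190 60,190"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def text_line_polygons(page_xml):
    """Return (line id, polygon) pairs, each polygon a list of (x, y) tuples."""
    root = ET.fromstring(page_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        for child in element:
            # Assumes the outline is stored as a 'points' attribute on Coords.
            if local_name(child.tag) == "Coords" and child.get("points"):
                polygon = [
                    tuple(int(value) for value in pair.split(","))
                    for pair in child.get("points").split()
                ]
                lines.append((element.get("id"), polygon))
    return lines


for line_id, polygon in text_line_polygons(SAMPLE_PAGE_XML):
    print(line_id, polygon)
```

Matching on local tag names keeps the sketch independent of the exact schema version, at the cost of not validating the file against the schema.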

Keywords: dataset production, ground truth production, historical documents, arbitrary warping, geometric correction

Procedia PDF Downloads 168
941 Finding Related Scientific Documents Using Formal Concept Analysis

Authors: Nadeem Akhtar, Hira Javed

Abstract:

An important aspect of research is the literature survey. The availability of a large amount of literature across different domains creates the need for optimized systems that provide relevant literature to researchers. We propose a keyword-based search system for text documents. This experimental approach gives the document corpus a hierarchical structure. The documents are labelled with keywords using KEA (Keyword Extraction Algorithm) and are automatically organized in a lattice structure using Formal Concept Analysis (FCA), which groups semantically related documents together. The keyword-based hierarchical structure returns only those documents that actually contain the given keywords. This approach opens the door to multi-domain research: documents across multiple domains that are indexed by similar keywords are grouped together, and a hierarchical relationship between keywords is obtained. To demonstrate the effectiveness of the approach, we carried out experiments and evaluation on the Semeval-2010 dataset. Results show that the presented method is considerably successful in indexing scientific papers.
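
The lattice construction can be sketched on a toy example. The following is a brute-force illustration of Formal Concept Analysis, assuming keywords have already been assigned to documents; it does not use KEA, the document names and keywords are invented, and `formal_concepts` is a hypothetical helper rather than the authors' implementation.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a small document/keyword context.

    context maps each document to its set of keywords. Each concept is a pair
    (extent, intent): a set of documents and the keywords they all share, such
    that no further document contains all of those keywords.
    """
    documents = list(context)
    all_keywords = set().union(*context.values()) if context else set()

    def shared_keywords(doc_subset):
        # Keywords common to every document in the subset (all keywords if empty).
        result = set(all_keywords)
        for doc in doc_subset:
            result &= context[doc]
        return result

    def documents_with(keywords):
        return {doc for doc in documents if keywords <= context[doc]}

    concepts = set()
    # Brute force over every subset of documents: fine for a toy corpus only.
    for size in range(len(documents) + 1):
        for subset in combinations(documents, size):
            intent = shared_keywords(subset)
            extent = documents_with(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Invented toy context; in the abstract the keywords would come from KEA.
toy_context = {
    "d1": {"lattice", "fca"},
    "d2": {"fca", "keyword"},
    "d3": {"keyword"},
}

for extent, intent in sorted(formal_concepts(toy_context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the concepts by extent inclusion gives the lattice: here the concept for `fca` groups `d1` and `d2`, and sits above the more specific concepts for each of those documents. The subset enumeration is exponential, so a real corpus would need a dedicated algorithm.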

Keywords: formal concept analysis, keyword extraction algorithm, scientific documents, lattice

Procedia PDF Downloads 332
940 Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition

Authors: L. Hamsaveni, Navya Prakash, Suresha

Abstract:

Document image analysis recognizes text and graphics in documents acquired as images. This paper adopts an approach without Optical Character Recognition (OCR) for degraded document image analysis. The technique uses document imaging methods such as image fusing and Speeded Up Robust Features (SURF) detection to identify and extract the degraded regions from a set of document images and obtain an original document with complete information. If a captured degraded document image is skewed, it has to be straightened (deskewed) before further processing. A special image storage format known as YCbCr is used as a tool to convert the grayscale image to RGB image format. The presented algorithm is tested on various types of degraded documents, such as printed documents, handwritten documents, old script documents, and handwritten image sketches in documents. The purpose of this research is to obtain an original document from a given set of degraded documents of the same source.

Keywords: grayscale image format, image fusing, RGB image format, SURF detection, YCbCr image format

Procedia PDF Downloads 377
939 The Platform for Digitization of Georgian Documents

Authors: Erekle Magradze, Davit Soselia, Levan Shughliashvili, Irakli Koberidze, Shota Tsiskaridze, Victor Kakhniashvili, Tamar Chaghiashvili

Abstract:

Since the beginning of active publishing in Georgia, a voluminous body of printed material has accumulated, and digitizing it is an important task. Digitized materials will be available to the public, and it will be possible to search their text and conduct various factual research. Digitizing documents means scanning them, extracting text from the scans, and processing the text with a corresponding language model to detect inaccuracies and grammatical errors. Implementing these stages requires a unified, scalable, and automated platform in which the digital service developed for each stage performs the task assigned to it; at the same time, it must be possible to develop these services dynamically so that the work of the platform is not interrupted.

Keywords: NLP, OCR, BERT, Kubernetes, transformers

Procedia PDF Downloads 144
938 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data mining, and web search. Measuring the similarity between documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: (1) the feature appears in both documents; (2) the feature appears in only one document; and (3) the feature appears in neither document. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in the banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.
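
The three cases can be made concrete with a small sketch. The scoring below is an illustrative assumption and not the measure proposed in the paper, whose formula the abstract does not give: a shared feature adds to the score, a feature present in only one document subtracts a penalty, and a feature absent from both is ignored. The function name, the `penalty` parameter, and the sample terms are all hypothetical.

```python
def three_case_similarity(doc_a, doc_b, vocabulary, penalty=1.0):
    """Toy presence-based similarity in [0, 1] that treats three cases separately.

    doc_a, doc_b: sets of features (e.g. terms) present in each document.
    vocabulary:   iterable of all features under consideration.
    penalty:      hypothetical weight for features found in only one document.
    """
    score = 0.0
    counted = 0
    for feature in vocabulary:
        in_a = feature in doc_a
        in_b = feature in doc_b
        if in_a and in_b:
            score += 1.0      # case 1: feature appears in both documents
            counted += 1
        elif in_a or in_b:
            score -= penalty  # case 2: feature appears in only one document
            counted += 1
        # case 3: feature appears in neither document and contributes nothing
    if counted == 0:
        return 0.0            # no evidence either way (an arbitrary convention)
    raw = score / counted     # lies in [-penalty, 1]
    return (raw + penalty) / (1.0 + penalty)  # rescale to [0, 1]


# Invented example terms, loosely in the banking domain.
vocabulary = {"loan", "rate", "risk", "scan"}
print(three_case_similarity({"loan", "rate"}, {"loan", "risk"}, vocabulary))
```

With `penalty=1.0` this sketch reduces to the Jaccard coefficient on feature presence; other penalty values change how strongly one-sided features count against similarity.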

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 518
937 Logic and Arabic Grammar Debates at Medieval Ages: A Quest for Muslim Contributions to Philosophical Development

Authors: Umar Sheikh Tahir

Abstract:

This paper focuses on the historiography of the relationship between logic and Arabic grammar in the Muslim medieval period (between 750 and 1100 CE / 150 and 500 AH). This relationship appears, among many other debates, in the famous debate on logic and its validity between the grammarians, represented by Abū Sa'id al-Sairafī, and the logicians, represented by Abū Bishr Mattā. This incident took place in Baghdad around 932 AD. This study samples these debates alone as the basis for the contributions of Islamic philosophers to the philosophy of language as well as to epistemology. The question that shapes this research is: what did Muslim thinkers contribute to the intellectual development of the philosophy of language through this debate? The current research addresses the debates between Arabic grammar and logic through a historiographical review that emphasizes Islamic philosophers’ concerns about this issue. The debate generated philosophical phenomena and resolutions in deep thinking. In addition, these dialogues left a linguistic imprint on philosophy in the Islamic world during the period under study. Islamic philosophers’ discourse on this phenomenon thus serves as a contribution to the philosophy of language.

Keywords: debates, epistemology, grammar and grammarians, Islamic philosophy, philosophy of language, logic

Procedia PDF Downloads 224
936 System of Quality Automation for Documents (SQAD)

Authors: R. Babi Saraswathi, K. Divya, A. Habeebur Rahman, D. B. Hari Prakash, S. Jayanth, T. Kumar, N. Vijayarangan

Abstract:

Document automation is the design of systems and workflows that assemble repetitive documents to meet specific business needs. In any organization or institution, documenting employees’ information is very important for both employees and management, as it shows an individual’s progress to the management. Many employee documents are on paper, so they are difficult to organize, and retrieving the exact document for future reference takes considerable time. It is also tedious to generate reports as needed. The process becomes even more difficult when approvals are required, and it lacks security. This project overcomes the above-stated issues. By storing the details in a database and maintaining e-documents, the automation system reduces manual work to a large extent. The approval process for important documents can then be carried out much more securely using digital signature and encryption techniques. Details are maintained in the database, e-documents are stored in specific folders, and various kinds of reports can be generated. Moreover, an efficient search method is implemented in the database. Automation supports document maintenance in many respects: it minimizes data entry, reduces the time spent on proofreading, avoids duplication, and reduces the risks associated with manual error.

Keywords: e-documents, automation, digital signature, encryption

Procedia PDF Downloads 391
935 Enhancement of Indexing Model for Heterogeneous Multimedia Documents: User Profile Based Approach

Authors: Aicha Aggoune, Abdelkrim Bouramoul, Mohamed Khiereddine Kholladi

Abstract:

Recent research shows that the user profile, as an important element, can improve heterogeneous information retrieval through its content. In this context, we present our indexing model for heterogeneous multimedia documents. This model integrates the user profile into the indexing process. The general idea of our proposal is to exploit the concepts common to the representation of a document and the definition of a user through his or her profile. These two elements are added as additional indexing entities to enrich the indexes of the heterogeneous corpus documents. We have developed the IRONTO domain ontology, which allows documents to be annotated. We also present the tool developed to validate the proposed model.

Keywords: indexing model, user profile, multimedia document, heterogeneity of sources, ontology

Procedia PDF Downloads 348
934 Procedure for Recommendation of Archival Documents

Authors: Marlon J. Remedios, Maria T. Morell, Jesse D. Cano

Abstract:

The diffusion and accessibility of historical collections is one of the main objectives of institutions that safeguard archival documents (General Archives). Several countries have Web applications that try to make the large number of documents they hold accessible and public. Each of these sites has a set of features to facilitate access, navigability, and search for information. Various information sources include Recommender Systems as a way of customizing content. This paper describes a process for producing archival documents relevant to the user. To this end, it discusses the characteristics governing archival description, the elements and main techniques that shape the design of Recommender Systems, a set of rules to follow, how these rules operate, and how they take advantage of domain knowledge. Finally, relevant issues in the design of the proposed tests are discussed and the results obtained are shown.

Keywords: archival document, recommender system, procedure, information management

Procedia PDF Downloads 514
933 On the Interactive Search with Web Documents

Authors: Mario Kubek, Herwig Unger

Abstract:

Due to the large amount of information in the World Wide Web (WWW, web) and the lengthy and usually linearly ordered result lists of web search engines that do not indicate semantic relationships between their entries, the search for topically similar and related documents can become a tedious task. In particular, formulating queries with proper terms that represent specific information needs requires much effort from the user. This problem gets even bigger when the user's knowledge of a subject and its technical terms is not sufficient to do so. This article presents the new and interactive search application DocAnalyser, which addresses this problem by enabling users to find similar and related web documents based on automatic query formulation and state-of-the-art search word extraction. Additionally, this tool can be used to track topics across semantically connected web documents.

Keywords: DocAnalyser, interactive web search, search word extraction, query formulation, source topic detection, topic tracking

Procedia PDF Downloads 393
932 Binarization and Recognition of Characters from Historical Degraded Documents

Authors: Bency Jacob, S.B. Waykar

Abstract:

Degradations in historical document images appear due to aging of the documents. It is very difficult to understand and retrieve text from badly degraded documents because of the variation between the document foreground and background. Thresholding such document images results in either broken characters or the detection of false text. Numerous algorithms exist that can separate text and background efficiently in the textual regions of a document, but portions of the background are mistaken for text in areas that hardly contain any text. This paper presents a way to overcome these problems with a robust binarization technique that recovers the text from severely degraded document images and thereby increases the accuracy of optical character recognition systems. The proposed document recovery algorithm efficiently removes degradations from document images. We use Otsu's method, local thresholding, and global thresholding, and after binarization we train on and recognize the characters in the degraded documents.
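
As a point of reference for the global thresholding step, the following is a minimal pure-Python sketch of Otsu's method, which picks the gray level that maximizes the between-class variance of the two resulting pixel groups. It is a generic textbook version, not the authors' pipeline; the function names and the synthetic pixel values are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t maximizing between-class variance.

    pixels: iterable of integer gray values in [0, levels).
    Pixels <= t form the dark class, pixels > t the light class.
    """
    histogram = [0] * levels
    total = 0
    for value in pixels:
        histogram[value] += 1
        total += 1

    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_dark = 0   # number of pixels with value <= t
    sum_dark = 0      # sum of gray values of those pixels
    best_threshold = 0
    best_variance = -1.0

    for t in range(levels):
        weight_dark += histogram[t]
        sum_dark += t * histogram[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break     # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = t
    return best_threshold


def binarize(pixels, threshold):
    """Map dark pixels (ink) to 0 and light pixels (paper) to 255."""
    return [0 if value <= threshold else 255 for value in pixels]


# Synthetic "page": dark ink around 20-30, light paper around 200-220.
synthetic = [20] * 50 + [30] * 50 + [200] * 100 + [220] * 100
threshold = otsu_threshold(synthetic)
print(threshold, binarize(synthetic, threshold)[:3])
```

A single global threshold like this works when ink and paper are well separated across the whole page; on unevenly degraded pages the same computation is often applied per window, which is the idea behind local thresholding.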

Keywords: binarization, denoising, global thresholding, local thresholding, thresholding

Procedia PDF Downloads 344
931 One-Class Support Vector Machine for Sentiment Analysis of Movie Review Documents

Authors: Chothmal, Basant Agarwal

Abstract:

Sentiment analysis classifies a given review document as positive or negative. Research on sentiment analysis has increased tremendously in recent times due to its large number of applications in industry and academia. Sentiment analysis models can be used to determine the opinion of the user towards any entity or product. E-commerce companies can use a sentiment analysis model to improve their products on the basis of users’ opinions. In this paper, we propose a new One-class Support Vector Machine (One-class SVM) based sentiment analysis model for movie review documents. In the proposed approach, we first extract features from one class of documents, and then use the one-class SVM model to test whether a given new document lies within the model or is an outlier. Experimental results show the effectiveness of the proposed sentiment analysis model.

Keywords: feature selection methods, machine learning, NB, one-class SVM, sentiment analysis, support vector machine

Procedia PDF Downloads 517
930 Model-Based Field Extraction from Different Class of Administrative Documents

Authors: Jinen Daghrir, Anis Kricha, Karim Kalti

Abstract:

The amount of incoming administrative documents is massive, and manually processing these documents is a costly task, especially in terms of time. This problem has led to a significant amount of research and development on automatically extracting fields from administrative documents, in order to reduce costs and to increase citizen satisfaction in administrations. In this context, we introduce an administrative document understanding system. Given a document in which a user selects the fields that have to be retrieved from a document class, a document model is automatically built. A document model is represented by an attributed relational graph (ARG) where nodes represent the fields to extract and edges represent the relations between them. Both vertices and edges carry feature vectors. When another document arrives at the system, the layout objects are extracted and an ARG is generated. Field extraction is translated into a problem of matching two ARGs, which relies mainly on the comparison of the spatial relationships between layout objects. Experimental results yield accuracy rates from 75% to 100%, tested on eight document classes. Our proposed method performs well given that the document model is constructed using only a single document.

Keywords: administrative document understanding, logical labelling, logical layout analysis, fields extraction from administrative documents

Procedia PDF Downloads 213
929 Providing a Secure, Reliable and Decentralized Document Management Solution Using Blockchain by a Virtual Identity Card

Authors: Meet Shah, Ankita Aditya, Dhruv Bindra, V. S. Omkar, Aashruti Seervi

Abstract:

In today's world, we need documents everywhere for a smooth workflow in the identification process or any other security aspect. The current systems and techniques used for identification need one thing, that is, ‘proof of existence’, which involves valid documents, for example, educational, financial, etc. The main issue with the current identity access management system and digital identification process is that the system is centralized in its network, which makes it inefficient. The paper presents a system that resolves these issues. It is based on ‘blockchain’ technology, which is a ‘decentralized system’. It allows transactions in a decentralized and immutable manner. The primary notion of the model is to ‘have everything with nothing’. It involves inter-linking the required documents of a person with a single identity card so that a person can go anywhere without carrying the required documents. The person just needs to be physically present at a place where documents are necessary, and the rest of the verification proceeds using a fingerprint impression and an iris scan. Furthermore, some technical overheads and advancements are listed. This paper also aims to lay out its far-vision scenario of blockchain and its impact on future trends.

Keywords: blockchain, decentralized system, fingerprint impression, identity management, iris scan

Procedia PDF Downloads 129
928 DocPro: A Framework for Processing Semantic and Layout Information in Business Documents

Authors: Ming-Jen Huang, Chun-Fang Huang, Chiching Wei

Abstract:

With the recent advances in deep neural networks, we observe new applications of NLP (natural language processing) and CV (computer vision) powered by deep neural networks for processing business documents. However, creating a real-world document processing system requires integrating several NLP and CV tasks, rather than treating them separately. There is a need for a unified approach to processing documents containing textual and graphical elements with rich formats, diverse layout arrangements, and distinct semantics. In this paper, a framework that fulfills this unified approach is presented. The framework includes a representation model definition for holding the information generated by various tasks and specifications defining the coordination between these tasks. The framework is a blueprint for building a system that can process documents with rich formats, styles, and multiple types of elements. The flexible and lightweight design of the framework can help build a system for diverse business scenarios, such as contract monitoring and reviewing.

Keywords: document processing, framework, formal definition, machine learning

Procedia PDF Downloads 216
927 Contribution of a Higher Education Institute towards Built Environment Sustainability

Authors: Tayyab Ahmad, Gerard Healey

Abstract:

The potential role of higher education institutes in sustainable development should not be underestimated. In this regard, it is important to investigate the established concept of sustainability in such institutes to explore the room for further improvement. In this paper, a case study of the University of Melbourne is conducted, and the institute’s commitments towards sustainability are examined through a detailed qualitative review of its policy and design standard documents. These documents are reviewed because, through them, the institute portrays its vision of the built environment facilities that it aspires to procure and use. The detailed review shows that these documents are updated at different times, creating the potential for mismatch between them. The occurrence of different goals and objectives in different documents is highlighted, and the interrelationships between different goals and operational objectives are explored. The role of the university's aspired goals and objectives in terms of built environment sustainability is discussed, and the gaps in the articulation of goals and operational objectives are highlighted. Recommendations are provided for enhancing built environment sustainability at the University of Melbourne.

Keywords: university, design standards, policy, sustainability, built environment

Procedia PDF Downloads 170
926 Arabic Text Classification: Review Study

Authors: M. Hijazi, A. Zeki, A. Ismail

Abstract:

An enormous amount of valuable human knowledge is preserved in documents. The rapid growth in the number of machine-readable documents for public or private access requires the use of automatic text classification. Text classification can be defined as assigning or structuring documents into a defined set of classes known in advance. Arabic text classification methods have emerged as a natural result of the existence of a massive amount of varied textual information written in the Arabic language on the web. This paper presents a review of the published research on Arabic text classification using classical data representation, bag of words (BoW), and using conceptual data representation based on semantic resources such as Arabic WordNet and Wikipedia.
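The bag-of-words representation mentioned in the abstract can be sketched as follows. This is a toy illustration only: the helper names `bag_of_words` and `vectorize` are hypothetical, tokenization is plain whitespace splitting, and the sample sentences are in English for readability. It does not reproduce any of the reviewed systems.

```python
from collections import Counter


def bag_of_words(text):
    """Count how often each whitespace-separated token occurs, ignoring order."""
    return Counter(text.lower().split())


def vectorize(docs):
    """Turn documents into count vectors over a shared, sorted vocabulary."""
    bags = [bag_of_words(d) for d in docs]
    vocab = sorted(set().union(*bags))
    # Counter returns 0 for tokens a document does not contain.
    vectors = [[bag[word] for word in vocab] for bag in bags]
    return vocab, vectors


# Hypothetical two-document corpus.
vocab, vectors = vectorize(["markets rise as markets open", "teams open the season"])
```

Each row of `vectors` can then be passed to any standard classifier; word order and semantic relations are lost, which is the limitation that the conceptual representations in the abstract aim to address.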

Keywords: Arabic text classification, Arabic WordNet, bag of words, conceptual representation, semantic relations

Procedia PDF Downloads 426
925 Application of Ontologies to Contract for Difference Documents

Authors: Renato Figueira Franco

Abstract:

This paper aims to create a representational information system applied to the securities market, particularly the development of an ontology applied to the analysis of the Key Information Documents of Contracts for Difference. The process of obtaining knowledge and its proper formal representation has attracted attention from both the scientific literature and the capital markets supervisory authorities. Formal knowledge representation is embodied in the construction of ontologies, which are responsible for defining the knowledge base structure of a given scientific domain, facilitating its understanding, and allowing its sharing among the scientific community. The scope of this study is restricted to the analysis of capital markets ontologies in order to capture their structure, semantics, and knowledge sharing between people and systems.

Keywords: ontology, financial markets, CFD, PRIIPs, key information documents

Procedia PDF Downloads 66
924 A Proposed Approach for Emotion Lexicon Enrichment

Authors: Amr Mansour Mohsen, Hesham Ahmed Hassan, Amira M. Idrees

Abstract:

Document analysis is an important research field that aims to gather information by analyzing the data in documents. Because one of the important targets in many fields is to understand what people actually want, sentimental analysis has become one of the vital fields tightly related to document analysis. This research focuses on analyzing text documents to classify each document according to its opinion. The aim of this research is to detect emotions in text documents by enriching the lexicon and adapting its content based on semantic pattern extraction. The proposed approach is presented, and different experiments are applied from different perspectives to reveal the positive impact of the proposed approach on the classification results.

Keywords: document analysis, sentimental analysis, emotion detection, WEKA tool, NRC lexicon

Procedia PDF Downloads 442
923 Framework for Detecting External Plagiarism from Monolingual Documents: Use of Shallow NLP and N-Gram Frequency Comparison

Authors: Saugata Bose, Ritambhra Korpal

Abstract:

The internet has increased copy-paste behavior among students as well as among researchers, leading to different levels of plagiarized documents. For this reason, much research is focused on detecting plagiarism automatically. In this paper, an initiative is discussed in which Natural Language Processing (NLP) techniques and supervised machine learning algorithms have been combined to detect plagiarized texts. Here, the major emphasis is on constructing a framework that successfully detects external plagiarism in monolingual texts. To detect the plagiarism, an n-gram frequency comparison approach has been implemented to construct the model framework. The framework is based on 120 characteristics, which have been extracted while pre-processing the documents using an NLP approach. Afterwards, filter metrics have been applied to select the most relevant characteristics, and then a supervised classification learning algorithm has been used to classify the documents into four levels of plagiarism. A confusion matrix was built to estimate the false positives and false negatives. Our plagiarism framework achieved a very high accuracy score.
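One simple way to compare word n-gram frequencies between a suspicious text and a source text is sketched below. The abstract does not specify the comparison measure, so the use of cosine similarity over trigram counts is an assumption of this example, as are the function names `word_ngrams` and `cosine_similarity`. It is not the authors' 120-feature framework.

```python
import math
from collections import Counter


def word_ngrams(text, n=3):
    """Count word n-grams (as tuples) in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram frequency profiles (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical source and suspicious passages.
source = "the quick brown fox jumps over the lazy dog"
suspect = "a quick brown fox jumps over a sleeping dog"
score = cosine_similarity(word_ngrams(source), word_ngrams(suspect))
```

A score near 1 indicates heavily shared phrasing and a score near 0 indicates little overlap. In a setup like the one described, such scores would be only one family of features fed to the supervised classifier.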

Keywords: lexical matching, shallow NLP, supervised machine learning algorithm, word n-gram

Procedia PDF Downloads 357
922 Wasting Human and Computer Resources

Authors: Mária Csernoch, Piroska Biró

Abstract:

The legends about “user-friendly” and “easy-to-use” birotical tools (computer-related office tools) have been spreading and misleading end-users. This approach has led to an extremely high number of incorrect documents, causing serious financial losses in the creating, modifying, and retrieving processes. Our research showed that there are at least two sources of this underachievement: (1) The lack of a definition of correctly edited, formatted documents. Consequently, end-users do not know whether their methods and results are correct or not. They are not aware of their ignorance, and that ignorance prevents them from realizing their lack of knowledge. (2) The end-users’ problem-solving methods. We have found that in non-traditional programming environments end-users apply, almost exclusively, surface approach metacognitive methods to carry out their computer-related activities, which have proved less effective than deep approach methods. Based on these findings, we have developed deep approach methods which are based on and adapted from traditional programming languages. In this study, we focus on the most popular type of birotical documents, the text-based documents. We have provided the definition of the correctly edited text and, based on this definition, adapted the debugging method known in programming. According to the method, before any actual text editing, already existing texts are thoroughly debugged and their errors are categorized. With this method, users learn the requirements of text-based documents and of correctly formatted text before they begin real text editing. The method has proved much more effective than the previously applied surface approach methods.
The advantages of the method are that real text handling requires far fewer human and computer resources than clicking aimlessly in the GUI (Graphical User Interface), and that data retrieval is much more effective than from error-prone documents.

Keywords: deep approach metacognitive methods, error-prone birotical documents, financial losses, human and computer resources

Procedia PDF Downloads 382
921 Lecture Video Indexing and Retrieval Using Topic Keywords

Authors: B. J. Sandesh, Saurabha Jirgi, S. Vidya, Prakash Eljer, Gowri Srinivasa

Abstract:

In this paper, we propose a framework to help users search for and retrieve the portions of a lecture video that interest them. This is achieved by temporally segmenting and indexing the lecture video using topic keywords. For this purpose, we use the transcribed text of the video and documents relevant to the video topic extracted from the web. The keywords for indexing are found by applying non-negative matrix factorization (NMF) topic modeling techniques to the web documents. Our proposed technique first creates indices on the transcribed documents using the topic keywords, and these are mapped to the video to find the start and end times of the portions of the video for a particular topic. This time information is stored in the index table along with the topic keyword, which is used to retrieve the specific portions of the video for the query provided by the user.
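The index table described in the abstract, which maps each topic keyword to start and end times in the video, could look roughly like the sketch below. It assumes the topic keywords are already available (for example, from the NMF step) and that the transcript arrives as timestamped segments. The function name `build_index` and the segment format are hypothetical.

```python
def build_index(segments, topic_keywords):
    """Map each topic keyword to the time spans in which it is spoken.

    segments: list of (start_sec, end_sec, text) tuples in chronological order.
    topic_keywords: lowercase single-word keywords (an assumption of this sketch).
    Back-to-back segments that mention the same keyword are merged into one span.
    """
    index = {}
    for start, end, text in segments:
        words = set(text.lower().split())
        for keyword in topic_keywords:
            if keyword not in words:
                continue
            spans = index.setdefault(keyword, [])
            if spans and spans[-1][1] == start:
                # Extend the previous span instead of starting a new one.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return index


# Hypothetical transcript of a two-minute lecture excerpt.
transcript = [
    (0, 30, "Today we introduce sorting"),
    (30, 60, "Quicksort is a sorting algorithm"),
    (60, 90, "Next we look at graphs"),
    (90, 120, "Topological sorting uses graphs"),
]
index = build_index(transcript, ["sorting", "graphs"])
```

Answering a query then reduces to looking up the keyword in `index` and seeking the player to the stored start times.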

Keywords: video indexing and retrieval, lecture videos, content based video search, multimodal indexing

Procedia PDF Downloads 250