Search results for: document archiving
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 758

728 System of Quality Automation for Documents (SQAD)

Authors: R. Babi Saraswathi, K. Divya, A. Habeebur Rahman, D. B. Hari Prakash, S. Jayanth, T. Kumar, N. Vijayarangan

Abstract:

Document automation is the design of systems and workflows that assemble repetitive documents to meet specific business needs. In any organization or institution, documenting employees’ information is very important for both employees and management, since it shows an individual’s progress to the management. Many employee documents are kept on paper, so they are difficult to organize, and retrieving the exact document for future reference takes considerable time. It is also tedious to generate reports on demand. Obtaining approvals makes the process even more difficult and leaves it lacking in security. This project addresses these issues. By storing the details in a database and maintaining e-documents, the automation system reduces manual work to a large extent. The approval of important documents can then be carried out much more securely using digital signature and encryption techniques. Details are maintained in the database, e-documents are stored in specific folders, and various kinds of reports can be generated. Moreover, an efficient search method is implemented in the database. Automated document maintenance helps to minimize data entry, reduce the time spent on proofreading, avoid duplication, and reduce the risks associated with manual error.

Keywords: e-documents, automation, digital signature, encryption

Procedia PDF Downloads 360
727 Association Rules Mining and NOSQL Oriented Document in Big Data

Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub

Abstract:

Big Data refers to recent technology for manipulating voluminous, unstructured data sets from multiple sources. NoSQL has emerged to handle the problem of unstructured data. Association rules mining is one of the popular data mining techniques for extracting hidden relationships from transactional databases. The problem of finding association dependencies can be solved well with MapReduce. The goal of our work is to reduce the time needed to generate frequent itemsets by using MapReduce and a document-oriented NoSQL database. A comparative study evaluates the performance of our algorithm against the classical Apriori algorithm.
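As a point of reference for the comparison mentioned above, the classical Apriori idea can be sketched in a few lines of Python. This is a single-machine illustration only, not the authors' MapReduce or MongoDB implementation; the function name `apriori`, the absolute `min_support` count, and the toy basket data are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets into k-itemset candidates and prune
        # any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Toy data, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

The support-counting step inside the loop is the part that is usually distributed: each mapper can count candidates over its share of the transactions and a reducer can sum the partial counts.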

Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL

Procedia PDF Downloads 132
726 Product Development Process to Obtain Community Standard Product Certificate: A Case of Bangkhonthi, Samut Songkhram, Thailand

Authors: Supattra Pranee

Abstract:

The objectives of this research were to study the product development process to obtain a community standard product certificate and to set a guideline for the product development process to obtain the community product certificate. Focus group discussion was conducted with many experts in the field, local government officials, and representatives from local producers in Bangkhonthi district. The findings revealed that there were eight important processes to obtain the community product certificate: 1) prepare the document, 2) submit the document, 3) set up an appointment for onsite inspection, 4) onsite inspection and sample collections, 5) evaluate samples, 6) obtain test result, and 7) obtain certificate.

Keywords: perceived values, tourist destination, visiting, product development

Procedia PDF Downloads 416
725 The Legality of the Individual Education Plan from the Teachers’ Perspective in Saudi Arabia

Authors: Sohil I. Alqazlan

Abstract:

Introduction and Objectives: The individual education plan (IEP) is the cornerstone of education for students with special education needs (SEN). The Saudi government supports students’ right to have an IEP, and their education is one of the primary goals of the Ministry of Education (MoE). However, the outcomes do not reflect the huge government investment. For example, some SEN students do not have an IEP, and poor communication has been found between IEP teams and students’ families. This study therefore investigated perspectives on and understandings of the IEP from the views of SEN teachers in the Saudi context. Methods: The study used a qualitative design, with in-depth semi-structured interviews conducted with 8 SEN teachers in schools in Riyadh (the capital city of Saudi Arabia). The researcher analysed the interview findings using the thematic analysis approach. Results and Conclusion: Participants were questioned mainly about the legality of the IEP and its consideration as a legal document in Saudi Arabia. All teachers agreed that the IEP is not considered a legal document in the Kingdom of Saudi Arabia and that it lacks the required legality with respect to its implementation. As a result, they did not use it for all their students with SEN. Such findings might have affected teaching quality and school outcomes, as all SEN students must be supported individually depending on their needs.

Keywords: individual education plan, special education, IEP, teachers

Procedia PDF Downloads 146
724 Popularization of the Communist Manifesto in 19th Century Europe

Authors: Xuanyu Bai

Abstract:

“The Communist Manifesto”, written by Karl Marx and Friedrich Engels, is one of the most significant documents in history, spanning fields including economics, politics, sociology, and philosophy. Instead of discussing the Communist ideas presented in the Communist Manifesto, the essay explores the reasons that contributed to the popularization of the document and its influence on political revolutions in 19th century Europe by concentrating on the document itself along with other primary and secondary sources and temporal artwork. Combining details from the Communist Manifesto and other documents, the essay argues that Marx’s writing style and word choice, his convincing notions about a new society dominated by the proletariat, and the revolutionary idea of class destruction led to the popularization of the Communist Manifesto and influenced later political revolutions.

Keywords: communist manifesto, Marx, Engels, capitalism

Procedia PDF Downloads 107
723 Mediation in Turkey

Authors: Ibrahim Ercan, Mustafa Arikan

Abstract:

In recent years, alternative dispute resolution methods have attracted the attention of legislators in many countries. Instead of resolving disputes by litigation, having the parties themselves put an end to a dispute is more important for the preservation of social peace. Therefore, alternative dispute resolution (ADR) methods have been discussed more intensively in Turkey as well as in the rest of the world. After these discussions, the Mediation Act was adopted on 07.06.2012 and entered into force on 21.06.2013. According to the Mediation Act, it is only possible to mediate issues arising from private law. Mediation is also not compulsory in Turkish law; it is optional. The parties are therefore completely free to choose mediation as a method of dispute resolution. Mediators must be lawyers with five years of experience, so a person who is not a lawyer cannot be a mediator. Beyond five years of experience, training and success in exams, especially on body language and psychology, are also very important for becoming a mediator. If the parties reach a compromise as a result of mediation, a document is issued. This document can also become enforceable under certain circumstances. Thus, the parties will not need to apply to the court again. On the contrary, they will have the opportunity to execute this document and so recover their debts. Although the Mediation Act has been in force for nearly two years, interest in mediation is not at the expected level. Therefore, making mediation mandatory for some disputes has been discussed recently. If mediation becomes mandatory and good results follow, this institution will be able to attract serious interest in Turkey. Otherwise, if the results are not satisfactory, the mediation method will be removed.

Keywords: alternative dispute resolution methods, mediation act, mediation, mediator, mediation in Turkey

Procedia PDF Downloads 340
722 Enhancement of Indexing Model for Heterogeneous Multimedia Documents: User Profile Based Approach

Authors: Aicha Aggoune, Abdelkrim Bouramoul, Mohamed Khiereddine Kholladi

Abstract:

Recent research shows that the user profile, as an important element, can improve heterogeneous information retrieval through its content. In this context, we present our indexing model for heterogeneous multimedia documents. The model integrates the user profile into the indexing process. The general idea of our proposal is to exploit the concepts common to the representation of a document and the definition of a user through his or her profile. These two elements are added as additional indexing entities to enrich the indexes of the heterogeneous corpus documents. We have developed the IRONTO domain ontology, which allows documents to be annotated. We also present the tool developed to validate the proposed model.

Keywords: indexing model, user profile, multimedia document, heterogeneous of sources, ontology

Procedia PDF Downloads 321
721 Cost of Outpatient Procedures for Ostomized Patients Treated in the Public Health Network in Brazil and Its Impact on the Budget of the Unified Health System

Authors: Karina Guimaraes, Lilian Santos

Abstract:

This study aims to plan and institute monitoring actions as a way of understanding the care provided to patients with a stoma treated in the public health network in Brazil from January to November 2016. It does so through the preparation of a technical document containing a survey of the number of procedures offered and the value of the ostomy services accredited in the Unified Health System (SUS). The purpose of this document is to improve the quality of these services through the efficient management of available financial resources, making it indispensable for creating strategies to implement care services for people with stomas, and a strategic tool for promotion, prevention, qualification, and efficiency in health care.

Keywords: health economic, management, ostomy, unified health system

Procedia PDF Downloads 282
720 Proposal for an Inspection Tool for Damaged Structures after Disasters

Authors: Karim Akkouche, Amine Nekmouche, Leyla Bouzid

Abstract:

This study focuses on the development of a multifunctional Expert System (ES) called the post-seismic damage inspection tool (PSDIT), a powerful tool that allows the evaluation, processing, and archiving of the data collected after earthquakes. PSDIT can be operated by two types of user: an ordinary user (engineer, expert, or architect) for visual damage inspection, and an administrative user for updating the knowledge and/or adding or removing ordinary users. Knowledge acquisition is driven by a hierarchical knowledge model, of which information from investigation reports and feedback from expert/engineer questionnaires form a part.

Keywords: disaster, damaged structures, damage assessment, expert system

Procedia PDF Downloads 51
719 Proposal of a Damage Inspection Tool After Earthquakes: Case of Algerian Buildings

Authors: Akkouche Karim, Nekmouche Aghiles, Bouzid Leyla

Abstract:

This study focuses on the development of a multifunctional Expert System (ES) called the post-seismic damage inspection tool (PSDIT), a powerful tool that allows the evaluation, processing, and archiving of the data collected after earthquakes. PSDIT can be operated by two types of user: an ordinary user (engineer, expert, or architect) for visual damage inspection, and an administrative user for updating the knowledge and/or adding or removing ordinary users. Knowledge acquisition is driven by a hierarchical knowledge model, of which information from investigation reports and feedback from expert/engineer questionnaires form a part.

Keywords: buildings, earthquake, seismic damage, damage assessment, expert system

Procedia PDF Downloads 45
718 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to the rapid growth of text documents in digital form, automated text classification has become an important research area over the last two decades. The major challenges of text document representation are high dimensionality, sparsity, volume, and semantics. Since terms are the only features that can be found in documents, the selection of good terms (features) plays a very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency, and accuracy. In this paper, we present a quantitative analysis of the most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf), Mutual Information (MI), Information Gain (IG), Chi-Square (χ²), Term Frequency-Relevance Frequency (tfrf), Term Strength (TS), Ambiguity Measure (AM), and Symbolic Feature Selection (SFS), to classify text documents. We evaluated all the feature selection methods on standard datasets such as 20 Newsgroups, the 4 University dataset, and Reuters-21578.
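To make the first of the listed methods concrete, the following Python sketch computes tf-idf weights for a tiny tokenized corpus. It is one common variant only (term frequency normalized by document length, idf taken as the natural log of N/df) and is not necessarily the weighting used in the paper; the function name `tfidf` and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        # weight = (count / document length) * ln(N / df)
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in tf.items()}
        )
    return weights

# Toy corpus, invented for illustration.
corpus = [["stock", "market"], ["stock", "football"]]
tfidf_weights = tfidf(corpus)
```

A term that appears in every document ("stock" here) receives weight zero, which is why tf-idf can serve as a feature selector: terms are ranked by weight and the lowest-ranked ones are dropped.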

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 423
717 Control Configuration System as a Key Element in Distributed Control System

Authors: Goodarz Sabetian, Sajjad Moshfe

Abstract:

The control system of a hi-tech industry can be understood both broadly and in depth through a dedicated document. Large heavy industries such as power plants, with a large number of I/O signals, are controlled by a distributed control system (DCS). This system comprises many parts, from field level to high control level, and junior instrument engineers may be confused by the sheer volume of information. The key document that can solve this problem is the “control configuration system diagram” for each type of DCS. It is a road map that covers all activities related to the control system in an industrial plant and must be studied by everyone concerned. It plays an important role from the start of control system design to the end, when the system is handed over for operation. It should be included in bid documents, contracts, and purchasing specifications, and used throughout the phases of an EPC (engineering, procurement, and construction) project. The separate parts of a DCS are categorized here in order of importance, with a brief description and some practical plans offered for each. This article could be useful for all instrument and control engineers who work on EPC projects.

Keywords: control, configuration, DCS, power plant, bus

Procedia PDF Downloads 464
716 Applications of Visual Ethnography in Public Anthropology

Authors: Subramaniam Panneerselvam, Gunanithi Perumal, KP Subin

Abstract:

Visual ethnography is used to document the culture of a community through visual means, either photography or audio-visual documentation. Visual ethnographic techniques are widely used in visual anthropology, where anthropologists use the camera to capture the cultural image of the community studied. There is scope for subjectivity when a culture is documented by an external person. However, the emergence of public anthropology provides an opportunity for participants to document their own culture. There is a need to equip participants with the skills of visual ethnography. Mobile phone technology gives everyone a visual documentation facility to capture moments instantly. Visual ethnography facilitates multiple interpretations by audiences. This study explores the effectiveness of visual ethnography among tribal youth from a public anthropology perspective. A case study was conducted to equip the tribal youth of Nilgiris with visual ethnography skills, and the outcome of the experiment is shared in this paper.

Keywords: visual ethnography, visual anthropology, public anthropology, multiple-interpretation, case study

Procedia PDF Downloads 137
715 Development of Distance Training Packages for Teacher on Education Management for Learners with Special Needs

Authors: Jareeluk Ratanaphan

Abstract:

The purposes of this research were: 1) to survey teachers’ needs for knowledge about special education management for students with special needs; 2) to develop distance training packages for teachers on special education management for students with special needs; 3) to study the effects of using the packages on trainees’ achievement; and 4) to study trainees’ opinions of the distance training packages. The study followed a research and development design. The survey sample comprised 86 teachers, and 22 teachers took part in the study of the packages’ effects on achievement and opinion. The research instruments comprised: 1) training packages on special education management for students with special needs; 2) an achievement test; and 3) a questionnaire. Mean, percentage, standard deviation, t-test, and content analysis were used for data analysis. The findings of the research were as follows: 1. Teachers needed knowledge about teaching learners with learning disability, mental retardation, autism, and physical and health impairment, and about research in special education. 2. The package comprised a document on special education management for students with special needs and a manual for the distance training packages. The document consisted of the name of the packages, an explanation for the educator, the content structure, concepts, objectives, content, and activities. The manual consisted of an explanation of the document, objectives, an explanation of how to use the package, a training schedule, and evaluation. The efficiency of the packages was established at 79.50/81.35. 3. Trainees’ average posttest achievement scores were higher than their pretest scores. 4. Trainees’ opinion of the package was at the highest level.

Keywords: distance training package, teacher, learner with special needs

Procedia PDF Downloads 462
714 Semantic Indexing Improvement for Textual Documents: Contribution of Classification by Fuzzy Association Rules

Authors: Mohsen Maraoui

Abstract:

To improve natural language processing applications such as information retrieval, machine translation, and lexical disambiguation, we focus on a statistical approach to semantic indexing of multilingual text documents based on the conceptual network formalism. We propose to use this formalism as an indexing language to represent the descriptive concepts and their weights. These concepts represent the content of the document. Our contribution consists of two steps. In the first step, we propose the extraction of index terms using the multilingual lexical resource Euro WordNet (EWN). In the second step, we move from the representation of index terms to the representation of index concepts through the conceptual network formalism. This network is generated using the EWN resource and passes through a classification step based on an association rules model (in an attempt to discover the non-taxonomic or contextual relations between the concepts of a document). These relations are latent relations buried in the text and carried by the semantic context of the co-occurrence of concepts in the document. Our proposed indexing approach can be applied to text documents in various languages because it is based on a linguistic method adapted to the language through a multilingual thesaurus. We then apply the same statistical process regardless of the language in order to extract the significant concepts and their associated weights. We show that the proposed indexing approach provides encouraging results.

Keywords: concept extraction, conceptual network formalism, fuzzy association rules, multilingual thesaurus, semantic indexing

Procedia PDF Downloads 118
713 Understanding Embryology in Promoting Peace Leadership: A Document Review

Authors: Vasudev Das

Abstract:

The specific problem is that many leaders of the 21st century do not understand that the extermination of embryos wreaks havoc on peace leadership. The purpose of the document review is to understand embryology in facilitating peace leadership. Extermination of human embryos generates a requital wave of violence which later falls on human society in the form of disturbances, considering that violence breeds further violence as a consequentiality. The study results reveal that a deep understanding of embryology facilitates peace leadership, given that minimizing embryo extermination enhances non-violence in the global village. Neo-Newtonians subscribe to the idea that every action has an equal and opposite reaction. The US Federal Government recognizes the embryo or fetus as a member of Homo sapiens. The social change implications of this study are that understanding human embryology promotes peace leadership, considering that the consequentiality of embryo extermination can serve as a deterrent for violence on embryos.

Keywords: consequentiality, Homo sapiens, neo-Newtonians, violence

Procedia PDF Downloads 107
712 Adaptation of Projection Profile Algorithm for Skewed Handwritten Text Line Detection

Authors: Kayode A. Olaniyi, Tola. M. Osifeko, Adeola A. Ogunleye

Abstract:

Text line segmentation is an important step in document image processing. It is a labeling process that assigns the same label to spatially aligned units using a distance metric probability. Text line detection techniques have been implemented successfully mainly for printed documents. However, processing handwritten text, especially unconstrained documents, has remained a key problem. This is because unconstrained handwritten text lines are often not uniformly skewed. The spaces between text lines may not be obvious, complicated by the nature of handwriting and by overlapping ascenders and/or descenders of some characters. Hence, text line detection and segmentation is a leading challenge in handwritten document image processing. Text line detection methods that rely on the traditional global projection profile of the text document cannot cope efficiently with variable skew angles between different text lines, so the formulation of a horizontal line as a separator is often not effective. This paper presents a technique to segment a handwritten document into distinct lines of text. The proposed algorithm starts by partitioning the initial text image across its width into vertical strips of about 5% each. For each vertical strip, the histogram of horizontal runs is projected. We work on the assumption that the text lines appearing in a single strip are almost parallel to each other. The algorithm slides a window through the first vertical strip on the left side of the page to identify each new minimum corresponding to a valley in the projection profile. Each valley represents the starting point of an orientation line, and the ending point is the minimum point on the projection profile of the next vertical strip. The derived text lines traverse around any obstructing handwritten connected component in a vertical strip by associating it with either the line above or the line below. The decision to associate such a connected component is made using the probability obtained from a distance metric. The technique outperforms the global projection profile for text line segmentation and is robust enough to handle skewed documents and those with lines running into each other.
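The first two steps described above (strip partitioning and valley detection) can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: the image is assumed to be a binary list of rows with 1 for ink, a valley is approximated as a run of rows whose ink count is at or below a `threshold`, and the function names are hypothetical. Linking valleys across strips and assigning connected components are not shown.

```python
def strip_profiles(image, strip_fraction=0.05):
    """Split a binary image into vertical strips and return, for each strip,
    its horizontal projection profile (ink pixels per row)."""
    height = len(image)
    width = len(image[0]) if height else 0
    strip_width = max(1, round(width * strip_fraction))
    profiles = []
    for x0 in range(0, width, strip_width):
        x1 = min(width, x0 + strip_width)
        profiles.append([sum(row[x0:x1]) for row in image])
    return profiles

def valleys(profile, threshold=0):
    """Return the centre row of each maximal run where profile <= threshold."""
    centres = []
    start = None
    # A sentinel above the threshold closes a run that reaches the last row.
    for y, value in enumerate(profile + [threshold + 1]):
        if value <= threshold:
            if start is None:
                start = y
        elif start is not None:
            centres.append((start + y - 1) // 2)
            start = None
    return centres

# Toy page, invented for illustration: two text lines (rows 1-2 and 4-5)
# separated by blank rows 0, 3 and 6, on a page 40 pixels wide.
page = [[0] * 40 if y in (0, 3, 6) else [1] * 40 for y in range(7)]
profiles = strip_profiles(page)
first_strip_valleys = valleys(profiles[0])
```

On a skewed page the valley rows drift from one strip to the next, and joining each valley to the nearest valley in the neighbouring strip yields the piecewise separators the abstract describes.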

Keywords: connected-component, projection-profile, segmentation, text-line

Procedia PDF Downloads 93
711 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient

Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart

Abstract:

Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The high demand for image annotation and archiving on the web has led researchers to develop many algorithms for this application domain. Existing IMC techniques have two drawbacks: neither the description of the elementary characteristics of the image nor the correlation between labels is taken into account. In this paper, we present an algorithm (MIML-HOGLPP) that handles both limitations simultaneously. The algorithm uses the histogram of gradients as the feature descriptor and applies the Label Priority Power-set as the multi-label transformation to address label correlation. Experiments show that MIML-HOGLPP performs better on some of the evaluation metrics than the two existing techniques it is compared with.
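The idea behind a power-set style problem transformation can be illustrated with the plain label power-set method, in which every distinct combination of labels becomes one class of an ordinary single-label problem. The sketch below shows only that basic transformation; it does not implement the priority variant named in the abstract, and the function name and sample labels are assumptions.

```python
def label_powerset_fit(label_sets):
    """Map each distinct label set to a class id.

    Returns (class_ids, id_to_labels): one class id per input sample,
    and the label set that each class id stands for.
    """
    to_id = {}
    id_to_labels = []
    class_ids = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in to_id:
            to_id[key] = len(id_to_labels)
            id_to_labels.append(key)
        class_ids.append(to_id[key])
    return class_ids, id_to_labels

# Toy annotations, invented for illustration.
annotations = [{"sky", "sea"}, {"sky"}, {"sea", "sky"}, set()]
class_ids, id_to_labels = label_powerset_fit(annotations)
```

Because co-occurring labels are kept together in one class, label correlation is preserved by construction; the cost is that the number of classes can grow quickly with the number of labels.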

Keywords: data mining, information retrieval system, multi-label, problem transformation, histogram of gradients

Procedia PDF Downloads 347
710 Predicting Success and Failure in Drug Development Using Text Analysis

Authors: Zhi Hao Chow, Cian Mulligan, Jack Walsh, Antonio Garzon Vico, Dimitar Krastev

Abstract:

Drug development is resource-intensive, time-consuming, and increasingly expensive with each developmental stage. The success rates of drug development are also relatively low, and the resources committed are wasted with each failed candidate. As such, a reliable method of predicting the success of drug development is in demand. The hypothesis was that some failed drug candidates are pushed through developmental pipelines based on false confidence and may possess common linguistic features identifiable through sentiment analysis. Here, the concept of using text analysis to discover such features in research publications and investor reports as predictors of success was explored. RStudio was used to perform text mining and lexicon-based sentiment analysis to identify affective phrases and determine their frequency in each document, and SPSS was then used to determine the relationship between our defined variables and the accuracy of predicting outcomes. A total of 161 publications were collected and categorised into 4 groups: (i) Cancer treatment, (ii) Neurodegenerative disease treatment, (iii) Vaccines, and (iv) Others (containing all other drugs that do not fit into the first 3 categories). Text analysis was then performed on each document within each drug category using 2 separate lexicons (BING and AFINN) in R to determine the frequency of positive or negative phrases in each document. Relative positivity and negativity values were then calculated by dividing the frequency of phrases by the word count of each document. Regression analysis was then performed with SPSS statistical software on each dataset (values obtained using the BING or AFINN lexicon during text analysis) using a random selection of 61 documents to construct a model. The remaining documents were then used to determine the predictive power of the models. The model constructed from BING predicted the outcome of drug performance in clinical trials with an overall accuracy of 65.3%. The AFINN model had a lower accuracy than the BING model, at 62.5%, and was not effective at predicting the failure of drugs in clinical trials. Overall, the study did not show significant efficacy of the model at predicting outcomes of drugs in development. Many improvements may need to be made in later iterations of the model to sufficiently increase its accuracy.
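The relative positivity and negativity values described above are simple ratios, which the following Python sketch reproduces for a single document. The study itself used R with the BING and AFINN lexicons; here the two word sets are tiny hypothetical stand-ins, the tokenizer is a plain regular expression, and matching is done on single words rather than phrases.

```python
import re

# Hypothetical toy lexicons; the real BING lexicon is far larger.
POSITIVE = {"promising", "effective", "robust"}
NEGATIVE = {"adverse", "failure", "toxic"}

def relative_sentiment(text):
    """Return (relative_positivity, relative_negativity) for one document:
    the number of lexicon hits divided by the document's word count."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    return positive_hits / len(words), negative_hits / len(words)

# Sample sentence, invented for illustration (10 words).
sample = "The promising compound showed robust efficacy but one adverse event"
positivity, negativity = relative_sentiment(sample)
```

Dividing by the word count keeps long and short documents comparable, so the two ratios can be used directly as predictor variables in a regression model.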

Keywords: data analysis, drug development, sentiment analysis, text-mining

Procedia PDF Downloads 124
709 Efficient Layout-Aware Pretraining for Multimodal Form Understanding

Authors: Armineh Nourbakhsh, Sameena Shah, Carolyn Rose

Abstract:

Layout-aware language models have been used to create multimodal representations for documents that are in image form, achieving relatively high accuracy in document understanding tasks. However, the large number of parameters in the resulting models makes building and using them prohibitive without access to high-performing processing units with large memory capacity. We propose an alternative approach that can create efficient representations without the need for a neural visual backbone. This leads to an 80% reduction in the number of parameters compared to the smallest SOTA model, widely expanding applicability. In addition, our layout embeddings are pre-trained on spatial and visual cues alone and only fused with text embeddings in downstream tasks, which can facilitate applicability to low-resource or multilingual domains. Despite using 2.5% of training data, we show competitive performance on two form understanding tasks: semantic labeling and link prediction.

Keywords: layout understanding, form understanding, multimodal document understanding, bias-augmented attention

Procedia PDF Downloads 116
708 HIS Integration Systems Using Modality Worklist and DICOM

Authors: Kulvinder Singh Mann

Abstract:

The usability and simulation of information systems for electronic medical records, known as the Hospital Information System (HIS), Radiology Information System (RIS), and Picture Archiving and Communication System (PACS), have had a good impact on actors in the hospital. The objective is to make their work easier, for example by helping a nurse or administrative staff member to record a patient’s medical records, or a patient to check their bill transparently. However, several limitations still exist in this area regarding the type of data stored in the system, the capacity for data transfer and storage, and the protocols that support communication between medical devices and digital images. This paper reports the simulation results of integrating several systems to cope with those limitations by using the Modality Worklist and the DICOM standard. It succeeds in documenting the reasons for failures, so that future research can gain a better understanding and be able to integrate those systems.

Keywords: HIS, RIS, PACS, modality worklist, DICOM, digital images

Procedia PDF Downloads 287
707 Teachers' Beliefs and Practices in Designing Negotiated English Lesson Plans

Authors: Joko Nurkamto

Abstract:

A lesson plan is a part of the planning phase in a learning and teaching system framing the scenario of pedagogical activities in the classroom. It informs a decision on what to teach and how to landscape classroom interaction. Regardless of these benefits, the writer has witnessed the fact that lesson plans are viewed merely as a teaching document. Therefore, this paper will explore teachers’ beliefs and practices in designing lesson plans. It focuses primarily on how both teachers and students negotiate lesson plans in which the students are deemed to be the agents of instructional innovations. Additionally, the paper will talk about how such lesson plans are enacted. To investigate these issues, document analysis, in-depth interviews, participant classroom observation, and focus group discussion will be deployed as data collection methods in this explorative case study. The benefits of the paper are to show different roles of lesson plans and to discover different ways to design and enact such plans from a socio-interactional perspective.

Keywords: instructional innovation, learning and teaching system, lesson plan, pedagogical activities, teachers' beliefs and practices

Procedia PDF Downloads 128
706 DocPro: A Framework for Processing Semantic and Layout Information in Business Documents

Authors: Ming-Jen Huang, Chun-Fang Huang, Chiching Wei

Abstract:

With recent advances in deep neural networks, we observe new applications of NLP (natural language processing) and CV (computer vision) powered by deep neural networks for processing business documents. However, creating a real-world document processing system requires integrating several NLP and CV tasks rather than treating them separately. There is a need for a unified approach to processing documents that contain textual and graphical elements with rich formats, diverse layout arrangements, and distinct semantics. In this paper, a framework that fulfills this unified approach is presented. The framework includes a representation model definition for holding the information generated by various tasks and specifications defining the coordination between these tasks. The framework is a blueprint for building a system that can process documents with rich formats, styles, and multiple types of elements. The flexible and lightweight design of the framework can help build a system for diverse business scenarios, such as contract monitoring and reviewing.

Keywords: document processing, framework, formal definition, machine learning

Procedia PDF Downloads 184
705 Digital Technology Relevance in Archival and Digitising Practices in the Republic of South Africa

Authors: Tashinga Matindike

Abstract:

By definition, digital artworks encompass an array of artistic productions that are expressed in a technological form as an essential part of a creative process. Examples include illustrations, photos, videos, sculptures, and installations. Within the context of the visual arts, the process of repatriation involves the return of once-appropriated goods. Archiving denotes the preservation of a commodity for storage purposes in order to nurture its continuity. The aforementioned definitions form the foundation of the academic framework and the premise of the argument outlined in this paper. This paper aims to define, discuss and decipher the complexities involved in digitising artworks, whilst explaining the benefits of the process, particularly within the South African context, which is rich in tangible and intangible traditional cultural material, objects, and performances. With the internet having been introduced to the African continent in the early 1990s, this new form of technology, in its own right, initiated a high degree of efficiency, which also resulted in the progressive transformation of computer-generated visual output. Subsequently, this had a revolutionary influence on the manner in which technological software was developed and utilised in art-making. Digital technology and the digitisation of creative processes then opened up new avenues of collating and recording information. One of the first visual artists to make use of digital technology software in his creative productions was United States-based artist John Whitney. His inventive work contributed greatly to the onset and development of digital animation. Comparable in technique and originality, South African contemporary visual artists who make digital artworks, both locally and internationally, include David Goldblatt, Katherine Bull, Fritha Langerman, David Masoga, Zinhle Sethebe, Alicia Mcfadzean, Ivan Van Der Walt, Siobhan Twomey, and Fhatuwani Mukheli.
In conclusion, the main objective of this paper is to address the following questions: In which ways has the South African community of visual artists made use of and benefited from technology, in its digital form, as a means to further advance creativity? What are the positive changes that have resulted in art production in South Africa since the onset and use of digital technological software? How has digitisation changed the manner in which we record, interpret, and archive both written and visual information? What is the role of South African art institutions in the development of digital technology and its use in the field of visual art? What role does digitisation play in the process of the repatriation of artworks and artefacts? The methodology in terms of the research process of this paper takes on a multifaceted form, inclusive of data analysis of information attained by means of qualitative and quantitative approaches.

Keywords: digital art, digitisation, technology, archiving, transformation and repatriation

Procedia PDF Downloads 25
704 Lexical Based Method for Opinion Detection on Tripadvisor Collection

Authors: Faiza Belbachir, Thibault Schienhinski

Abstract:

The massive development of online social networks allows users to post and share their opinions on various topics. With this huge volume of opinion, it is interesting to extract and interpret this information for different domains, e.g., product and service benchmarking, politics, and recommendation systems. This is why opinion detection is one of the most important research tasks. It consists in differentiating between opinion data and factual data. The difficulty of this task is to determine an approach that returns opinionated documents. Generally, there are two approaches used for opinion detection, i.e., lexical-based approaches and machine learning-based approaches. In lexical-based approaches, a dictionary of sentiment words is used, in which words are associated with weights. The opinion score of a document is derived from the occurrence of words from this dictionary. In machine learning approaches, a classifier is usually trained using a set of annotated documents containing sentiment, and features such as n-grams of words, part-of-speech tags, and logical forms. The majority of these works rely on the document text to determine the opinion score but do not take into account whether these texts are really correct. Thus, it is interesting to exploit other information to improve opinion detection. In our work, we develop a new way to consider the opinion score. We introduce the notion of a trust score. We determine not only opinionated documents but also whether these opinions are really trustworthy information in relation to topics. For that, we use the SentiWordNet lexicon to calculate opinion and trust scores, and we compute different features about users (number of their comments, number of their useful comments, average useful review). After that, we combine the opinion score and the trust score to obtain a final score. We applied our method to detect trusted opinions in the TRIPADVISOR collection. Our experimental results show that the combination of opinion score and trust score improves opinion detection.
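The scoring idea described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy lexicon stands in for SentiWordNet, the trust score is reduced to the share of a user's comments marked useful, and the linear combination with weight `alpha` is a hypothetical choice, since the abstract does not state how the two scores are combined.

```python
import re

# Hypothetical toy lexicon: word -> opinion weight in [0, 1].
# The paper uses SentiWordNet; these entries are invented for illustration.
TOY_LEXICON = {
    "great": 0.9,
    "terrible": 0.9,
    "lovely": 0.8,
    "dirty": 0.7,
    "friendly": 0.6,
}


def opinion_score(text, lexicon=TOY_LEXICON):
    """Average lexicon weight over all tokens; 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(lexicon.get(token, 0.0) for token in tokens) / len(tokens)


def trust_score(num_comments, num_useful_comments):
    """Share of a user's comments marked useful (an assumed trust proxy)."""
    if num_comments == 0:
        return 0.0
    return num_useful_comments / num_comments


def final_score(opinion, trust, alpha=0.5):
    """Assumed linear combination; alpha weights the opinion score."""
    return alpha * opinion + (1 - alpha) * trust


# Hypothetical review by a user with 20 comments, 15 of them marked useful.
review = "Great hotel, friendly staff"
score = final_score(opinion_score(review), trust_score(20, 15))
```

With `alpha=0.5` both signals count equally; a review with strongly opinionated wording from a user with few useful comments would then rank below the same wording from a user with a high useful-comment ratio.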

Keywords: Tripadvisor, opinion detection, SentiWordNet, trust score

Procedia PDF Downloads 165
703 Document-level Sentiment Analysis: An Exploratory Case Study of Low-resource Language Urdu

Authors: Ammarah Irum, Muhammad Ali Tahir

Abstract:

Document-level sentiment analysis in Urdu is a challenging Natural Language Processing (NLP) task due to the difficulty of working with lengthy texts in a language with constrained resources. Deep learning models, which are complex neural network architectures, are well-suited to text-based applications in addition to data formats like audio, image, and video. To investigate the potential of deep learning for Urdu sentiment analysis, we implemented five different deep learning models, including Bidirectional Long Short Term Memory (BiLSTM), Convolutional Neural Network (CNN), Convolutional Neural Network with Bidirectional Long Short Term Memory (CNN-BiLSTM), and Bidirectional Encoder Representation from Transformer (BERT). In this study, we developed a hybrid deep learning model called BiLSTM-Single Layer Multi Filter Convolutional Neural Network (BiLSTM-SLMFCNN) by fusing the BiLSTM and CNN architectures. The proposed and baseline techniques are applied to the Urdu Customer Support data set and the IMDB Urdu movie review data set using pre-trained Urdu word embeddings that are suitable for sentiment analysis at the document level. BiLSTM-SLMFCNN outperformed the baseline deep learning models and achieved 83%, 79%, 83% and 94% accuracy on the small, medium and large sized IMDB Urdu movie review data sets and the Urdu Customer Support data set, respectively.

Keywords: urdu sentiment analysis, deep learning, natural language processing, opinion mining, low-resource language

Procedia PDF Downloads 36
702 Recurrent Neural Networks with Deep Hierarchical Mixed Structures for Chinese Document Classification

Authors: Zhaoxin Luo, Michael Zhu

Abstract:

In natural languages, there are always complex semantic hierarchies. Obtaining a feature representation based on these complex semantic hierarchies becomes the key to the success of the model. Several RNN models have recently been proposed that use latent indicators to obtain the hierarchical structure of documents. However, a model that uses only a single-layer latent indicator cannot capture the true hierarchical structure of the language, especially a complex language like Chinese. In this paper, we propose a deep layered model that stacks arbitrarily many RNN layers equipped with latent indicators. By using EM and training it hierarchically, our model solves the computational problem of stacking RNN layers and makes it possible to stack arbitrarily many RNN layers. Our deep hierarchical model not only achieves results comparable to large pre-trained models on the Chinese short text classification problem but also achieves state-of-the-art results on the Chinese long text classification problem.

Keywords: natural language processing, recurrent neural network, hierarchical structure, document classification, Chinese

Procedia PDF Downloads 34
701 Multi-source Question Answering Framework Using Transformers for Attribute Extraction

Authors: Prashanth Pillai, Purnaprajna Mangsuli

Abstract:

Oil exploration and production companies invest considerable time and effort to extract essential well attributes (like well status, surface and target coordinates, wellbore depths, event timelines, etc.) from unstructured data sources like technical reports, which are often non-standardized, multimodal, and highly domain-specific by nature. It is also important to consider the context when extracting attribute values from reports that contain information on multiple wells/wellbores. Moreover, semantically similar information may often be depicted in different data syntax representations across multiple pages and document sources. We propose a hierarchical multi-source fact extraction workflow based on a deep learning framework to extract essential well attributes at scale. An information retrieval module based on the transformer architecture was used to rank relevant pages in a document source utilizing the page image embeddings and semantic text embeddings. A question answering framework utilizing the LayoutLM transformer was used to extract attribute-value pairs, incorporating the text semantics and layout information from the top relevant pages in a document. To better handle context while dealing with multi-well reports, we incorporate a dynamic query generation module to resolve ambiguities. The extracted attribute information from various pages and documents is standardized to a common representation using a parser module to facilitate information comparison and aggregation. Finally, we use a probabilistic approach to fuse information extracted from multiple sources into a coherent well record. The applicability of the proposed approach and related performance was studied on several real-life well technical reports.
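The final fusion step can be pictured with a small sketch. The abstract does not specify the probabilistic model, so this assumes a simple confidence-weighted vote: each extracted value carries a confidence, confidences for identical values are summed and normalised, and the highest-weighted value wins. The function name `fuse_attribute` and the sample data are hypothetical.

```python
from collections import defaultdict


def fuse_attribute(candidates):
    """Fuse (value, confidence) pairs extracted for one attribute.

    Confidences for identical values are summed and normalised so the
    weights sum to 1; the highest-weighted value is returned together
    with its normalised weight. This is an assumed scheme, not the
    method used in the paper.
    """
    if not candidates:
        return None, 0.0
    totals = defaultdict(float)
    for value, confidence in candidates:
        totals[value] += confidence
    norm = sum(totals.values())
    if norm == 0:
        return None, 0.0
    best = max(totals, key=totals.get)
    return best, totals[best] / norm


# Hypothetical extractions of "well status" from three report pages.
extractions = [("producing", 0.8), ("abandoned", 0.3), ("producing", 0.6)]
value, weight = fuse_attribute(extractions)
```

Here two pages agree on "producing", so their confidences reinforce each other and outweigh the single conflicting extraction. A real system would apply this only after the values have been standardized to a common representation, as the abstract describes.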

Keywords: natural language processing, deep learning, transformers, information retrieval

Procedia PDF Downloads 166
700 An Evaluation of 6th Grade History Curriculum in Ghana

Authors: Abigail Amoako Kayser, Brian Kayser

Abstract:

This study aimed to examine Ghana's 6th-grade Basic School history curriculum to determine how Ghanaian history is taught. We used qualitative methods and document analysis. The document analysis served two primary purposes: (1) to gain insight into what the curriculum materials covered and from whose perspectives, and (2) to triangulate with teacher interview data. Documents obtained included: (1) textbooks used by 6th-grade students, (2) the teacher pacing guide provided by the Department of Education in Ghana, and (3) student work samples. This study was guided by post-colonial theory and criticism to explore the remnants of colonial power and hegemony that persist in history curricula used in public schools in Ghana. We also applied African Feminist Thought and Black Feminist Thought to unpack the extent to which issues of patriarchy, race, traditions, underdevelopment, and sexuality impact how we see the experiences of people on the continent. The findings indicated that the remnants of colonial rule persisted in the contents of the history curriculum, and that the atrocities of slavery were overlooked or eliminated from the curriculum. The findings also indicated that Ghana's history centered on men's experiences.

Keywords: history, curriculum, decolonialization, culturally relevant pedagogy

Procedia PDF Downloads 38
699 Secure Text Steganography for Microsoft Word Document

Authors: Khan Farhan Rafat, M. Junaid Hussain

Abstract:

Steganography is the seamless modification of an entity to hide a message of significance inside its substance in such a manner that the embedding goes unnoticed by an observer. Together with today's pervasive computing frameworks, steganography has developed into a science that offers an assortment of strategies for stealth communication across the globe, which nevertheless need critical appraisal from a security-breach standpoint. Microsoft Word is among the most preferred word processing software and comes as part of the Microsoft Office suite. With a user-friendly graphical interface and rich text editing and formatting features, the documents produced through this software are also well suited for stealth communication. This research aimed not only to summarize the fundamental concepts of steganography but also to expound on the utilization of Microsoft Word documents as a carrier for furtive message exchange. The effort is to examine contemporary message hiding schemes from a security perspective so as to present the explorative findings and suggest enhancements, which may serve as a wellspring of information to encourage such future research endeavors.

Keywords: hiding information in plain sight, stealth communication, oblivious information exchange, conceal, steganography

Procedia PDF Downloads 215