Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 4025

Search results for: scanned business documents

4025 The Platform for Digitization of Georgian Documents

Authors: Erekle Magradze, Davit Soselia, Levan Shughliashvili, Irakli Koberidze, Shota Tsiskaridze, Victor Kakhniashvili, Tamar Chaghiashvili

Abstract:

Since the beginning of active publishing activity in Georgia, voluminous printed material has accumulated, and its digitization is an important task. Digitized materials will be available to the audience, and it will be possible to search their text and conduct various factual research. Digitizing documents means scanning them, extracting text from the scans, and processing the text with a corresponding language model to detect inaccuracies and grammatical errors. Implementing these stages requires a unified, scalable, and automated platform in which the digital service developed for each stage performs the task assigned to it; at the same time, it must be possible to develop these services dynamically so that the work of the platform is not interrupted.

Keywords: NLP, OCR, BERT, Kubernetes, transformers

Procedia PDF Downloads 144

4024 Visual Template Detection and Compositional Automatic Regular Expression Generation for Business Invoice Extraction

Authors: Anthony Proschka, Deepak Mishra, Merlyn Ramanan, Zurab Baratashvili

Abstract:

Small and medium-sized businesses receive over 160 billion invoices every year. Since these documents exhibit many subtle differences in layout and text, extracting structured fields such as sender name, amount, and VAT rate from them automatically is an open research question. In this paper, existing work in template-based document extraction is extended, and a system is devised that is able to reliably extract all required fields for up to 70% of all documents in the data set, more than any other previously reported method. The approaches are described for 1) detecting through visual features which template a given document belongs to, 2) automatically generating extraction rules for a given new template by composing regular expressions from multiple components, and 3) computing confidence scores that indicate the accuracy of the automatic extractions. The system can generate templates with as few as one training sample and only requires the ground truth field values instead of detailed annotations, such as bounding boxes, that are hard to obtain. The system is deployed and used inside commercial accounting software.
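The idea of composing extraction rules from regular-expression components can be illustrated with a minimal Python sketch. It is not the system described in the abstract: the component patterns, the `compose_rule` and `extract` helpers, the field labels, and the sample invoice text are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical reusable components; none of these come from the paper.
COMPONENTS = {
    "amount": r"(?P<amount>\d{1,3}(?:[.,]\d{3})*[.,]\d{2})",
    "percent": r"(?P<vat_rate>\d{1,2}(?:[.,]\d+)?)\s*%",
    "gap": r"[\s:]*",
}


def compose_rule(label, value_component):
    """Build a rule: literal label, optional separator, then the value pattern."""
    pattern = re.escape(label) + COMPONENTS["gap"] + COMPONENTS[value_component]
    return re.compile(pattern, re.IGNORECASE)


def extract(text, rules):
    """Apply each rule and keep the named group of its first match, if any."""
    found = {}
    for field, rule in rules.items():
        match = rule.search(text)
        if match:
            found[field] = match.group(field)
    return found


# Hypothetical template, as it might be derived from one labelled sample.
template_rules = {
    "amount": compose_rule("Total due", "amount"),
    "vat_rate": compose_rule("VAT", "percent"),
}

sample = "Invoice 17\nVAT: 19 %\nTotal due: 1,234.56 EUR"
fields = extract(sample, template_rules)
print(fields)
```

Keeping the label and the value pattern as separate components means a new template only needs a new label, while the value patterns are reused across templates.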

Keywords: data mining, information retrieval, business, feature extraction, layout, business data processing, document handling, end-user trained information extraction, document archiving, scanned business documents, automated document processing, F1-measure, commercial accounting software

Procedia PDF Downloads 130

4023 DocPro: A Framework for Processing Semantic and Layout Information in Business Documents

Authors: Ming-Jen Huang, Chun-Fang Huang, Chiching Wei

Abstract:

With the recent advance of the deep neural network, we observe new applications of NLP (natural language processing) and CV (computer vision) powered by deep neural networks for processing business documents. However, creating a real-world document processing system needs to integrate several NLP and CV tasks, rather than treating them separately. There is a need to have a unified approach for processing documents containing textual and graphical elements with rich formats, diverse layout arrangement, and distinct semantics. In this paper, a framework that fulfills this unified approach is presented. The framework includes a representation model definition for holding the information generated by various tasks and specifications defining the coordination between these tasks. The framework is a blueprint for building a system that can process documents with rich formats, styles, and multiple types of elements. The flexible and lightweight design of the framework can help build a system for diverse business scenarios, such as contract monitoring and reviewing.

Keywords: document processing, framework, formal definition, machine learning

Procedia PDF Downloads 217

4022 The Use of Technology in Mathematics Learning (1995-2024): A Bibliometric Analysis

Authors: Rahma Adinda Sartika

Abstract:

The use of technology in learning mathematics has received a positive response from both students and teachers, so many researchers have conducted research on this theme. According to the findings of this study, 807 documents relevant to this theme were published in Scopus from 1995 to 2024. After the stages of identification, screening, eligibility, and inclusion, 227 documents meet the criteria. These documents are then analyzed using the bibliometric method. The analysis shows that the most documents in the Scopus database were published in 2020, with 38 documents, and the fewest from 1996 to 2000 and from 2004 to 2007, when no documents were published. The highest number of citations is for documents published in 2018, with a total of 349 citations, so the h-index is higher than the others. The country that published the most documents relevant to this theme is Indonesia, with a total of 91 documents. The second largest is the United States, with a total of 28 published documents, and the third largest is China, with a total of 15 documents. Indonesia and the United States have more working relationships with other countries than any other country. The focus of research related to this theme is 1) mathematics learning, 2) learning systems, 3) engineering education, 4) technology, and 5) mathematical concepts.

Keywords: technology, bibliometric, mathematics learning, mathematical concepts

Procedia PDF Downloads 56

4021 System of Quality Automation for Documents (SQAD)

Authors: R. Babi Saraswathi, K. Divya, A. Habeebur Rahman, D. B. Hari Prakash, S. Jayanth, T. Kumar, N. Vijayarangan

Abstract:

Document automation is the design of systems and workflows that assemble repetitive documents to meet specific business needs. In any organization or institution, documenting employees’ information is very important for both employees and management. It shows an individual’s progress to the management. Many employee documents are on paper, so they are difficult to organize, and retrieving the exact document for future reference takes considerable time. It is also tedious to generate reports as needed. The process becomes even more difficult when approvals are required, and it therefore lacks security. This project overcomes the above-stated issues. By storing the details in a database and maintaining e-documents, the automation system reduces manual work to a large extent. The approval process for some important documents can then be carried out much more securely by using digital signature and encryption techniques. Details are maintained in the database, e-documents are stored in specific folders, and various kinds of reports can be generated. Moreover, an efficient search method is implemented in the database. Automation supports document maintenance in many respects: it minimizes data entry, reduces the time spent on proofreading, avoids duplication, and reduces the risks associated with manual error.

Keywords: e-documents, automation, digital signature, encryption

Procedia PDF Downloads 391

4020 Words Spotting in the Images Handwritten Historical Documents

Authors: Issam Ben Jami

Abstract:

Information retrieval in digital libraries is very important because most famous historical documents are of significant value. Word spotting in historical documents is a very difficult task because such documents are naturally cursive and show wide variability in the scale and translation of words within the same document. We first present a system for automatic recognition based on the extraction of interest points of words from the image model. The key points are extracted from a representation of the image as a synthetic description of the shape in a multidimensional space. To this end, we use advanced methods that can find and describe interest points that are invariant to scale, rotation, and lighting and that are linked to local configurations of pixels. We test this approach on documents of the 15th century. Our experiments give important results.

Keywords: feature matching, historical documents, pattern recognition, word spotting

Procedia PDF Downloads 274

4019 Data Gathering and Analysis for Arabic Historical Documents

Authors: Ali Dulla

Abstract:

This paper introduces a new dataset (and the methodology used to generate it) based on a wide range of historical Arabic documents containing clean data and simple, homogeneous page layouts. The experiments are implemented on printed and handwritten documents obtained from some important libraries such as the Qatar Digital Library, the British Library, and the Library of Congress. We have gathered and commented on 150 archival document images from different locations and time periods. It is based on different documents from the 17th-19th century. The dataset comprises differing page layouts and degradations that challenge text line segmentation methods. Ground truth is produced using the Aletheia tool by PRImA and stored in an XML representation, in the PAGE (Page Analysis and Ground truth Elements) format. The dataset presented will be easily available to researchers worldwide for research into the obstacles facing various historical Arabic documents, such as geometric correction.

Keywords: dataset production, ground truth production, historical documents, arbitrary warping, geometric correction

Procedia PDF Downloads 168

4018 Analyzing Business Model Choices and Sustainable Value Capturing: A Multiple Case Study of Sharing Economy Business Models

Authors: Minttu Laukkanen, Janne Huiskonen

Abstract:

This study investigates sharing economy business models as examples of sustainable business models. The aim is to contribute to the limited literature on the sharing economy in connection with sustainable business models by explaining how sharing economy business models capture value. Specifically, this research answers the following question: How do business model choices affect captured sustainable value? A multiple case study approach is applied in this study. Twenty different successful sharing economy business models focusing on consumer business and covering four main areas (accommodation, mobility, food, and consumer goods) are selected for analysis. The secondary data available on companies’ websites, previous research, reports, and other public documents are used. All twenty cases are analyzed through the sharing economy business model framework and the sustainable value analysis framework using qualitative data analysis. This study presents general sharing economy business model value attributes and their specifications, i.e. sustainable value propositions for different stakeholders, and further explains the sustainability impacts of different sharing economy business models through captured and uncaptured value. In conclusion, this study presents how business model choices affect sustainable value capturing through eight business model attributes identified in this study. This paper contributes to the research on sustainable business models and the sharing economy by examining how business model choices affect captured sustainable value. This study highlights the importance of careful analysis of business models and sustainability impacts, including the triple bottom line, multiple stakeholders, and value captured and uncaptured perspectives, as well as sustainability trade-offs. It is not self-evident that sharing economy business models advance sustainability, and business model choices do matter.

Keywords: sharing economy, sustainable business model innovation, sustainable value, value capturing

Procedia PDF Downloads 172

4017 Finding Related Scientific Documents Using Formal Concept Analysis

Authors: Nadeem Akhtar, Hira Javed

Abstract:

An important aspect of research is the literature survey. The availability of a large amount of literature across different domains triggers the need for optimized systems which provide relevant literature to researchers. We propose a search system based on keywords for text documents. This experimental approach provides a hierarchical structure to the document corpus. The documents are labelled with keywords using KEA (Keyword Extraction Algorithm) and are automatically organized in a lattice structure using Formal Concept Analysis (FCA). This groups the semantically related documents together. The hierarchical structure, based on keywords, returns only those documents which precisely contain them. This approach opens doors for multi-domain research. The documents across multiple domains which are indexed by similar keywords are grouped together. A hierarchical relationship between keywords is obtained. To demonstrate the effectiveness of the approach, we have carried out the experiment and evaluation on the Semeval-2010 dataset. Results show that the presented method is considerably successful in indexing scientific papers.
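How Formal Concept Analysis groups documents that share keywords can be shown on a toy example. The sketch below is a brute-force enumeration for very small inputs, not the authors' implementation; the document names and keyword sets are hypothetical (in the paper, the keywords come from KEA).

```python
from itertools import combinations

# Hypothetical document -> keyword assignments.
context = {
    "doc1": {"lattice", "clustering"},
    "doc2": {"lattice", "indexing"},
    "doc3": {"lattice", "clustering", "indexing"},
}


def common_keywords(docs):
    """Keywords shared by every document in docs (all keywords if docs is empty)."""
    if not docs:
        return set().union(*context.values())
    return set.intersection(*(context[d] for d in docs))


def documents_with(keywords):
    """Documents whose keyword set contains every keyword given."""
    return {d for d, kws in context.items() if keywords <= kws}


def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every subset of documents.

    This is exponential in the number of documents and only suits tiny inputs.
    """
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for docs in combinations(names, size):
            intent = common_keywords(docs)
            extent = documents_with(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts()
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each concept pairs a maximal set of documents with the exact set of keywords they all share; ordering concepts by extent inclusion yields the lattice.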

Keywords: formal concept analysis, keyword extraction algorithm, scientific documents, lattice

Procedia PDF Downloads 332

4016 Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition

Authors: L. Hamsaveni, Navya Prakash, Suresha

Abstract:

Document Image Analysis recognizes text and graphics in documents acquired as images. An approach without Optical Character Recognition (OCR) for degraded document image analysis has been adopted in this paper. The technique involves document imaging methods such as Image Fusing and Speeded Up Robust Features (SURF) Detection to identify and extract the degraded regions from a set of document images to obtain an original document with complete information. If the captured degraded document image is skewed, it has to be straightened (deskewed) before further processing. A special image storage format known as YCbCr is used as a tool to convert the grayscale image to RGB image format. The presented algorithm is tested on various types of degraded documents such as printed documents, handwritten documents, old script documents, and handwritten image sketches in documents. The purpose of this research is to obtain an original document from a given set of degraded documents of the same source.

Keywords: grayscale image format, image fusing, RGB image format, SURF detection, YCbCr image format

Procedia PDF Downloads 377

4015 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data-mining, and web search. Measuring the similarity between the documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature the proposed measure takes the following three cases into account: (1) The feature appears in both documents; (2) The feature appears in only one document and; (3) The feature appears in none of the documents. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.
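The abstract does not give the formula of the proposed measure, so the following Python sketch only illustrates how a per-feature similarity can treat the three cases differently. The `toy_similarity` function, its scoring rule, and the `penalty` parameter are assumptions made for illustration and are not the measure proposed in the paper.

```python
def toy_similarity(doc_a, doc_b, penalty=1.0):
    """Toy per-feature similarity in [0, 1]; NOT the measure from the paper.

    Each document maps a feature (e.g. a term) to a non-negative weight.
    """
    score = 0.0
    counted = 0
    for feature in set(doc_a) | set(doc_b):
        a = doc_a.get(feature, 0.0)
        b = doc_b.get(feature, 0.0)
        if a > 0 and b > 0:
            # Case 1: the feature appears in both; closer weights score higher.
            score += min(a, b) / max(a, b)
            counted += 1
        elif a > 0 or b > 0:
            # Case 2: the feature appears in only one document; fixed penalty.
            score -= penalty
            counted += 1
        # Case 3: the feature appears in neither; it is ignored entirely.
    if counted == 0:
        return 0.0
    # Map the average from [-penalty, 1] onto [0, 1].
    return (score / counted + penalty) / (1.0 + penalty)


print(toy_similarity({"loan": 2, "rate": 1}, {"loan": 1}))
```

Ignoring features absent from both documents keeps the score from being inflated by the many terms that neither document uses.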

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 518

4014 Quick Response (QR) Code for Vehicle Registration and Identification

Authors: S. Malarvizhi, S. Sadiq Basha, M. Santhosh Kumar, K. Saravanan, R. Sasikumar, R. Satheesh

Abstract:

This is a web-based application which provides authorization for vehicle identification and registration. It also provides mutual authentication between the police and users in order to avoid misuse. The QR code generation in this application overcomes the difficulty of manually registering vehicle documents. The generated QR code is placed on the number plate of the vehicle. The QR code is scanned using a QR reader installed on smart devices. Police officials can check the vehicle details and file cases on accidents, theft, and traffic rule violations using the QR code. In addition to handling vehicle insurance payments and renewals, the application sends a renewal alert to the vehicle owner about the payment deadline. Non-permitted vehicles can be blocked at the next check-post by sending alert messages.

Keywords: QR code, QR reader, registration, authentication, identification

Procedia PDF Downloads 494

4013 On Exploring Search Heuristics for improving the efficiency in Web Information Extraction

Authors: Patricia Jiménez, Rafael Corchuelo

Abstract:

Nowadays the World Wide Web is the most popular source of information, comprising billions of on-line documents. Web mining is used to crawl through these documents, collect the information of interest, and process it by applying data mining tools in order to use the gathered information in the best interest of a business, which enables companies to promote theirs. Unfortunately, it is not easy to automatically extract the information a web site provides when it lacks an API that transforms the user-friendly data provided in web documents into a structured, machine-readable format. Rule-based information extractors are tools intended to extract the information of interest automatically and offer it in a structured format that allows mining tools to process it. However, the performance of an information extractor strongly depends on the search heuristic employed, since bad choices regarding how to learn a rule may easily result in loss of effectiveness and/or efficiency. Improving the efficiency of search heuristics is of utmost importance in the field of Web Information Extraction since typical datasets are very large. In this paper, we employ an information extractor based on a classical top-down algorithm that uses the so-called Information Gain heuristic introduced by Quinlan and Cameron-Jones. Unfortunately, Information Gain suffers from some well-known problems, so we analyse an intuitive alternative, Termini, that is clearly more efficient; we also analyse other proposals in the literature and conclude that none of them outperforms the previous alternative.
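Assuming the Information Gain heuristic mentioned above is the gain used in Quinlan and Cameron-Jones's FOIL rule learner, a top-down learner scores each candidate refinement of a rule by how much it concentrates positive examples. The Python sketch below shows that scoring step only; the function name, the candidate conditions, and the counts are hypothetical, and Termini is not shown because the abstract does not define it.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL-style gain of refining a rule.

    p0, n0: positives/negatives covered before adding the condition.
    p1, n1: positives/negatives covered after adding it.
    Assumes every positive covered afterwards was also covered before,
    so the number of retained positives equals p1.
    """
    if p0 == 0 or p1 == 0:
        return 0.0
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


# Hypothetical candidate conditions with their (positives, negatives) coverage.
current = (10, 10)
candidates = {"cond_a": (8, 2), "cond_b": (10, 8)}

best = max(candidates, key=lambda name: foil_gain(*current, *candidates[name]))
print(best, foil_gain(*current, *candidates[best]))
```

Every candidate must be scored against the examples at each refinement step, which is why the cost of the heuristic matters on large datasets.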

Keywords: information extraction, search heuristics, semi-structured documents, web mining

Procedia PDF Downloads 335

4012 Business Logic and Environmental Policy, a Research Agenda for the Business-to-Citizen Business Model

Authors: Mats Nilsson

Abstract:

The European electricity markets have been changing from a regulated market to, in some places, a deregulated market, and are now experiencing a strong influence of renewable support systems. Firms that rely on subsidies have a different business logic than firms acting in a market context. The article proposes that an offshoot of the regular business models, the business-to-citizen model, should be used. The case of the European electricity market frames the concept of a business-to-citizen business model, and a research agenda for this concept is outlined.

Keywords: business logic, business model, subsidies, business-to-citizen

Procedia PDF Downloads 462

4011 Social Business: Opportunities and Challenges

Authors: Muhammad Mustafizur Rahaman

Abstract:

Social business is a new concept in the field of Business Economics and Capitalist Economy. Its importance in economic and social development has increased in emerging economies. Professor Muhammad Yunus is the founding father of the notion. While conventional business underscores profit maximization as a core business principle, social business calls for addressing social problems at the expense of profit. This underlying principle gives social business an advantageous position over conventional businesses in serving those who live at the bottom of the pyramid. It also poses grave challenges to social business, because social business sacrifices profit on the one hand and seeks financial sustainability on the other. For the sake of its financial sustainability, a social business might increase the price of its product or service, which might lower its social impact and thus make the business self-defeating. Therefore, social business should be more innovative in every business process, including production, marketing, and management. Otherwise, the business is likely to be driven out of society.

Keywords: innovativeness, self-defeat, social business, social problem

Procedia PDF Downloads 619

4010 Topology-Based Character Recognition Method for Coin Date Detection

Authors: Xingyu Pan, Laure Tougne

Abstract:

For recognizing coins, the engraved release date is important information for precisely identifying the monetary type. However, reading characters on coins faces many more obstacles than traditional character recognition tasks in other fields, such as reading scanned documents or license plates. To address this challenging issue in a numismatic context, we propose a training-free approach dedicated to the detection and recognition of the release date of the coin. In the first step, the date zone is detected by comparing histogram features; in the second step, a topology-based algorithm is introduced to recognize coin numbers with various font types represented by a binary gradient map. Our method obtained a recognition rate of 92% on synthetic data and of 44% on real noisy data.

Keywords: coin, detection, character recognition, topology

Procedia PDF Downloads 253

4009 Enhancement of Indexing Model for Heterogeneous Multimedia Documents: User Profile Based Approach

Authors: Aicha Aggoune, Abdelkrim Bouramoul, Mohamed Khiereddine Kholladi

Abstract:

Recent research shows that the user profile, as an important element, can improve heterogeneous information retrieval with its content. In this context, we present our indexing model for heterogeneous multimedia documents. This model is based on integrating the user profile into the indexing process. The general idea of our proposal is to exploit the concepts common to the representation of a document and the definition of a user through his profile. These two elements will be added as additional indexing entities to enrich the indexes of the heterogeneous document corpus. We have developed the IRONTO domain ontology, which allows annotation of documents. We also present the tool developed to validate the proposed model.

Keywords: indexing model, user profile, multimedia document, heterogeneous of sources, ontology

Procedia PDF Downloads 348

4008 Procedure for Recommendation of Archival Documents

Authors: Marlon J. Remedios, Maria T. Morell, Jesse D. Cano

Abstract:

Diffusion and accessibility of historical collections is one of the main objectives of the institutions that aim to safeguard archival documents (General Archives). Several countries have Web applications that try to make the large number of documents they guard accessible and public. Each of these sites has a set of features to facilitate access, navigability, and search for information. Different sources of information include Recommender Systems as a way of customizing content. This paper aims at describing a process for the production of archival documents relevant to the user. To this end, the paper discusses the characteristics governing archival description, the elements and main techniques that shape the design of Recommender Systems, a set of rules to follow, how these rules operate, and the way in which they take advantage of domain knowledge. Finally, relevant issues in the design of the proposed tests are discussed, and the results obtained are shown.

Keywords: archival document, recommender system, procedure, information management

Procedia PDF Downloads 514

4007 On the Interactive Search with Web Documents

Authors: Mario Kubek, Herwig Unger

Abstract:

Due to the large amount of information in the World Wide Web (WWW, web) and the lengthy and usually linearly ordered result lists of web search engines that do not indicate semantic relationships between their entries, the search for topically similar and related documents can become a tedious task. In particular, the process of formulating queries with proper terms representing specific information needs requires much effort from the user. This problem gets even bigger when the user's knowledge of a subject and its technical terms is not sufficient to do so. This article presents the new and interactive search application DocAnalyser, which addresses this problem by enabling users to find similar and related web documents based on automatic query formulation and state-of-the-art search word extraction. Additionally, this tool can be used to track topics across semantically connected web documents.

Keywords: DocAnalyser, interactive web search, search word extraction, query formulation, source topic detection, topic tracking

Procedia PDF Downloads 393

4006 Business Process Orientation: Case of Croatia

Authors: Ljubica Milanović Glavan

Abstract:

Because of increasing business pressures, companies must be adaptable and flexible in order to withstand them. Inadequate business processes and a low level of business process orientation, which at its core accentuates business processes as opposed to business functions and focuses on process performance and customer satisfaction, hinder the ability to adapt to a changing environment. Previous studies have shown that companies which have reached a higher business process maturity level consistently outperform those that have not. The aim of this paper is to provide a basic understanding of the business process orientation concept and the business process maturity model. In addition, the paper presents the state of business process orientation in Croatia, as captured by a study conducted in 2013. Based on the results, some practical implications and guidelines for managers are given.

Keywords: business process orientation, business process maturity, Croatia, maturity score

Procedia PDF Downloads 547

4005 Characteristics of Inclusive Circular Business Models in Social Entrepreneurship

Authors: Svitlana Yermak, Olubukola Aluko

Abstract:

The purpose of this study was a literature review on the topic of social entrepreneurship, a review of new trends and best practices, and the study of existing inclusive business models and their interaction with the principles of the circular economy, for possible implementation in Ukraine in war and post-war times under conditions of scarce resources. Thus, three research questions were identified and substantiated: to determine the characteristics of social entrepreneurship and consider its features in Ukraine and the UK; to highlight the criteria for inclusion in social entrepreneurship and its legal support; and to explore examples of existing inclusive circular business models to illustrate how the two concepts may be combined. A detailed review of the literature selected from the Scopus and Web of Science databases was carried out. The study revealed the defining features of social entrepreneurship, the main ones being doing business and making a profit, as well as the social orientation of the business, which is prescribed in the constituent documents of the enterprise immediately upon its creation. The characteristics of social entrepreneurship in the UK and Ukraine are considered. It has been established that in the UK, social entrepreneurship is clearly regulated by the state, with special legislative norms and support programs, in contrast to Ukraine, where these processes are only partially regulated. The study identified the main criteria for inclusion in inclusive circular business models: economic (sustainability and efficiency, job creation and economic growth, promotion of local development), social (accessibility, equity and fairness, inclusion and participation), and resources in their interconnection. It is substantiated that the resource criterion is especially important for this type of business model. It provides for the efficient and sustainable use of resources, as well as the cyclical nature of resources. It was concluded that the principles of the circular economy do not contradict but, on the contrary, complement and expand the inclusive business models on which social entrepreneurship is based.

Keywords: social entrepreneurship, inclusive business models, circular economy, inclusion criteria

Procedia PDF Downloads 101

4004 Jelly and Beans: Appropriate Use of Ultrasound in Acute Kidney Injury

Authors: Raja Ezman Raja Shariff

Abstract:

Acute kidney injury (AKI) is commonly seen in inpatients and places a great cost on the NHS and patients. Timely and appropriate management is both nephron sparing and potentially life-saving. Ultrasound scanning (USS) is a well-recognised method for stratifying patients. Subsequently, the NICE AKI guidance has defined groups in whom scanning is recommended within 6 hours of request (pyonephrosis), within 24 hours (obstruction/cause unknown), and in whom routine scanning isn't recommended (cause for AKI identified). The audit looks into whether Stockport NHS Trust USS practice was in line with such recommendations. The audit evaluated 92 patients with AKI who had USS between 01/01/14 and 30/04/14. Data collection was divided into 2 parts. Firstly, radiology request cards and the online imaging software (PACS) were evaluated. Then, the electronic case notes (ADVANTIS) were evaluated further. Based on request cards, 10% of requests were for pyonephrosis. Only 33% were scanned within 6 hours and a further 33% within 24 hours. 75% were requested for possible obstructions and unknown cause collectively. Of those due to possible obstruction, 71% of patients were scanned within 24 hours. Of those with unknown cause, 50% were scanned within 24 hours. 15% of requests had a cause declared and so potentially did not require scanning. Evaluation of the patients’ notes suggested further interesting findings. Firstly, potentially 39% of patients had a known cause for AKI and therefore did not need USS. Subsequently, the cohort of unknown cause and possible obstruction was collectively reduced to 45%. Alarmingly, the patient cohort with possible pyonephrosis went up to 16%, suggesting an under-recognition of this life-threatening condition. We plan to highlight these findings within our institution and make changes to encourage more appropriate requesting and timely scanning. Time will tell if we manage to save or increase our costs in this cost-conscious NHS. Patient benefits, though, seem to be guaranteed.

Keywords: AKI, ARF, kidney, renal

Procedia PDF Downloads 399

4003 Logistics Support as a Key Success Factor in Gastronomy

Authors: Hanna Zietara

Abstract:

Gastronomy is one of the oldest forms of commercial activity. It is currently one of the most popular and still dynamically developing branches of business. Socio-economic changes, its widespread occurrence, new techniques, or culinary styles affect the almost unlimited possibilities of its development. Importantly, regardless of the form of business adopted, food service is strongly related to logistics processes, and areas of food service that are closely linked to logistics are of strategic importance. Any inefficiency in logistics processes results in reduced chances for success and achieving competitive advantage by companies belonging to the catering industry. The aim of the paper is to identify the areas of logistic support occurring in the catering business, affecting the scope of the logistic processes implemented. The aim of the paper is realized through a plural homogeneous approach, based on: direct observation, text analysis of current documents, in-depth free targeted interviews.

Keywords: gastronomy, competitive advantage, logistics, logistics support

Procedia PDF Downloads 163

4002 Binarization and Recognition of Characters from Historical Degraded Documents

Authors: Bency Jacob, S.B. Waykar

Abstract:

Degradations in historical document images appear due to aging of the documents. It is very difficult to understand and retrieve text from badly degraded documents, as there is variation between the document foreground and background. Thresholding of such document images results in either broken characters or detection of false text. Numerous algorithms exist that can separate text and background efficiently in the textual regions of the document, but portions of the background are mistaken for text in areas that hardly contain any text. This paper presents a way to overcome these problems with a robust binarization technique that recovers the text from severely degraded document images and thereby increases the accuracy of optical character recognition systems. The proposed document recovery algorithm efficiently removes degradations from document images. We use Otsu's method, local thresholding, and global thresholding; after binarization, the characters in the degraded documents are used for training and recognition.
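
As an illustration of the global thresholding step named in this abstract, the following is a minimal sketch of Otsu's method in plain Python. It is not the authors' implementation: the function names `otsu_threshold` and `binarize`, the flat list of 8-bit gray levels as input, and the convention that dark pixels are text are all assumptions made for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes between-class variance (Otsu's method).

    `pixels` is a flat, non-empty list of integer gray levels in [0, levels).
    """
    # Build the gray-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_bg = 0   # number of pixels at or below the candidate threshold
    sum_bg = 0      # sum of gray levels at or below the candidate threshold
    best_t, best_var = 0, -1.0

    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue            # no pixels in the dark class yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break               # every pixel is already in the dark class
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Mark pixels at or below the threshold as text (1), the rest as background (0)."""
    return [1 if p <= threshold else 0 for p in pixels]


# Hypothetical sample: three dark "ink" pixels and three bright "paper" pixels.
sample = [10, 12, 11, 200, 210, 205]
t = otsu_threshold(sample)
mask = binarize(sample, t)
```

A single global threshold like this tends to fail on unevenly degraded pages, which is presumably why the abstract combines it with local thresholding; a local variant would apply the same idea per window rather than per image.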

Keywords: binarization, denoising, global thresholding, local thresholding, thresholding

Procedia PDF Downloads 344

4001 One-Class Support Vector Machine for Sentiment Analysis of Movie Review Documents

Authors: Chothmal, Basant Agarwal

Abstract:

Sentiment analysis classifies a given review document as positive or negative. Sentiment analysis research has increased tremendously in recent times due to its large number of applications in industry and academia. Sentiment analysis models can be used to determine the opinion of a user towards any entity or product. E-commerce companies can use sentiment analysis models to improve their products on the basis of users’ opinions. In this paper, we propose a new One-class Support Vector Machine (One-class SVM) based sentiment analysis model for movie review documents. In the proposed approach, we first extract features from one class of documents and then use the one-class SVM model to test whether a given new document lies within the model or is an outlier. Experimental results show the effectiveness of the proposed sentiment analysis model.

Keywords: feature selection methods, machine learning, NB, one-class SVM, sentiment analysis, support vector machine

Procedia PDF Downloads 517

4000 Model-Based Field Extraction from Different Class of Administrative Documents

Authors: Jinen Daghrir, Anis Kricha, Karim Kalti

Abstract:

The amount of incoming administrative documents is massive, and manually processing these documents is costly, especially in terms of time. This problem has led to a significant amount of research and development on automatically extracting fields from administrative documents, in order to reduce costs and increase citizen satisfaction with administrations. In this context, we introduce an administrative document understanding system. Given a document in which a user selects the fields that have to be retrieved from a document class, a document model is automatically built. A document model is represented by an attributed relational graph (ARG) where nodes represent fields to extract and edges represent the relations between them. Both vertices and edges carry feature vectors. When another document arrives at the system, the layout objects are extracted and an ARG is generated. Field extraction is translated into a problem of matching two ARGs, which relies mainly on the comparison of the spatial relationships between layout objects. Experimental results yield accuracy rates from 75% to 100% on eight document classes. The proposed method performs well given that the document model is constructed from a single document.
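
The idea of matching two graphs by comparing spatial relationships can be sketched as follows. This is a toy illustration under strong assumptions, not the system described in the abstract: each node is reduced to a single (x, y) centre, the only edge feature is the offset between two centres, and the match is found by brute force over all assignments. The names `match_fields`, `model_fields`, and `layout_objects` are hypothetical.

```python
from itertools import permutations


def offset(a, b):
    """Edge feature: the (dx, dy) vector from point a to point b."""
    return (b[0] - a[0], b[1] - a[1])


def match_fields(model_fields, layout_objects):
    """Assign each model field to a distinct layout object.

    The chosen assignment minimizes the total difference between pairwise
    offsets in the model and pairwise offsets among the assigned objects,
    so it is insensitive to a global translation of the page.
    """
    names = list(model_fields)
    labels = list(layout_objects)
    best_cost, best_assignment = float("inf"), None

    # Brute force: only practical for a handful of fields and objects.
    for candidate in permutations(labels, len(names)):
        cost = 0.0
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                mdx, mdy = offset(model_fields[names[i]], model_fields[names[j]])
                cdx, cdy = offset(layout_objects[candidate[i]],
                                  layout_objects[candidate[j]])
                cost += abs(mdx - cdx) + abs(mdy - cdy)
        if cost < best_cost:
            best_cost, best_assignment = cost, dict(zip(names, candidate))
    return best_assignment, best_cost


# Hypothetical model built from one annotated document.
model = {"date": (100, 50), "total": (400, 700), "sender": (100, 120)}
# Hypothetical layout objects from a new document: the same layout shifted
# by (20, 10), plus one unrelated object.
objects = {"o1": (120, 60), "o2": (420, 710), "o3": (120, 130), "o4": (300, 300)}

assignment, cost = match_fields(model, objects)
```

A practical matcher would need richer node and edge feature vectors and a search strategy that scales beyond a few nodes; the brute-force loop is used here only to keep the comparison of spatial relationships easy to read.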

Keywords: administrative document understanding, logical labelling, logical layout analysis, fields extraction from administrative documents

Procedia PDF Downloads 213

3999 Design Criteria for an Internal Information Technology Cost Allocation to Support Business Information Technology Alignment

Authors: Andrea Schnabl, Mario Bernhart

Abstract:

The controlling instrument of an internal cost allocation (IT chargeback) is commonly used to make IT costs transparent and controllable. Information Technology (IT) has become, especially for information industries, a central competitive factor. Consequently, the focus is not on minimizing IT costs but on the strategically aligned application of IT. Hence, an internal IT cost allocation should be designed to enhance business-IT alignment (strategic alignment of IT) in order to support the effective application of IT from a company’s point of view. To identify design criteria for an internal cost allocation that supports business alignment, a case study analysis is performed at a typical medium-sized firm in the information industry. Documents, Key Performance Indicators, and cost accounting data over a period of 10 years are analyzed, and interviews are performed. The derived design criteria are evaluated by 6 heads of IT departments from 6 different companies that have an internal IT cost allocation in use. By applying these design criteria, an internal cost allocation serves not only for cost controlling but also as an instrument in strategic IT management.

Keywords: accounting for IT services, Business IT Alignment, internal cost allocation, IT controlling, IT governance, strategic IT management

Procedia PDF Downloads 155

3998 Business Continuity Opportunities in the Cloud a Small to Medium Business Perspective

Authors: Donald Zullick, Cihan Varol

Abstract:

This research paper begins with a look at current work in business continuity as it relates to the cloud and small to medium business (SMB). While cloud services are an emerging paradigm that is quickly making an impact on business, there has been no substantive research applied to SMB. Seeing this gap, we have fused continuity and cloud research and applied it to the SMB market. The result is an initial reflection, with base framework guidelines as a starting point for implementation. In this approach, our research ties together existing work and fills the gap with an SMB outlook.

Keywords: business continuity, cloud services, medium size business, risk assessment, small business

Procedia PDF Downloads 404

3997 A Transformer-Based Question Answering Framework for Software Contract Risk Assessment

Authors: Qisheng Hu, Jianglei Han, Yue Yang, My Hoa Ha

Abstract:

When a company is considering purchasing software for commercial use, contract risk assessment is critical to identify risks to mitigate the potential adverse business impact, e.g., security, financial and regulatory risks. Contract risk assessment requires reviewers with specialized knowledge and time to evaluate the legal documents manually. Specifically, validating contracts for a software vendor requires the following steps: manual screening, interpreting legal documents, and extracting risk-prone segments. To automate the process, we proposed a framework to assist legal contract document risk identification, leveraging pre-trained deep learning models and natural language processing techniques. Given a set of pre-defined risk evaluation problems, our framework utilizes the pre-trained transformer-based models for question-answering to identify risk-prone sections in a contract. Furthermore, the question-answering model encodes the concatenated question-contract text and predicts the start and end position for clause extraction. Due to the limited labelled dataset for training, we leveraged transfer learning by fine-tuning the models with the CUAD dataset to enhance the model. On a dataset comprising 287 contract documents and 2000 labelled samples, our best model achieved an F1 score of 0.687.
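
The start and end position prediction mentioned in this abstract is commonly decoded by choosing the token span whose start score plus end score is highest. The sketch below shows only that decoding step in plain Python, assuming the per-token scores have already been produced by some question-answering model. The function name `best_span`, the `max_len` cap, and the example scores are hypothetical and are not taken from the paper.

```python
def best_span(start_scores, end_scores, max_len=30):
    """Return (start, end, score) for the highest-scoring valid token span.

    A span is valid when start <= end and it covers at most `max_len` tokens.
    The span score is the start score of its first token plus the end score
    of its last token.
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        # Only consider end positions at or after the start, within max_len.
        last = min(i + max_len, len(end_scores))
        for j in range(i, last):
            score = s + end_scores[j]
            if score > best_score:
                best_score, best = score, (i, j)
    return best[0], best[1], best_score


# Hypothetical per-token scores for a four-token passage.
start_scores = [0.1, 2.0, 0.3, 0.0]
end_scores = [0.0, 0.5, 1.5, 3.0]

short = best_span(start_scores, end_scores, max_len=2)
full = best_span(start_scores, end_scores, max_len=4)
```

Capping the span length keeps the decoder from pairing a confident start with a confident end that belongs to a different clause, which matters when the goal is to extract a single risk-prone segment.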

Keywords: contract risk assessment, NLP, transfer learning, question answering

Procedia PDF Downloads 129

3996 Investigation of Verbal Feedback and Learning Process for Oral Presentation

Authors: Nattawadee Sinpattanawong

Abstract:

Oral presentation is used mostly in business communication. A business presentation is carried out using audio and visual presentation materials such as statistical documents, projectors, etc. Common examples of business presentations are intra-organization and sales presentations. The study aims to investigate the functions, strategies, and contents of assessors’ verbal feedback on presenters’ oral presentations, and to explore presenters’ learning process and their specific views and expectations concerning assessors’ verbal feedback related to the delivery of the oral presentation. This study is designed as descriptive qualitative research; four master's students and one teacher in an English for Business and Industry Presentation Techniques class at a public university will be selected. The researcher hopes that an understanding of assessors’ verbal feedback on oral presentations and of the learning process may illuminate issues for other people. The data from this research may help to expand and facilitate readers’ understanding of assessors’ verbal feedback on oral presentations and of the learning process in their own situations. The research instruments include an audio recorder, a video recorder, and an interview. The students will be interviewed about their views and expectations concerning assessors’ verbal feedback related to the delivery of the oral presentation. After data collection is finished, the data will be transcribed and analyzed. The findings of this study are significant because they can give presenters knowledge to enhance their learning process and give teachers knowledge about providing verbal feedback on students’ oral presentations in a business context.

Keywords: business context, learning process, oral presentation, verbal feedback

Procedia PDF Downloads 194