Search results for: Chi-Ching Lee

3 DocPro: A Framework for Processing Semantic and Layout Information in Business Documents

Authors: Ming-Jen Huang, Chun-Fang Huang, Chiching Wei

Abstract:

With the recent advance of the deep neural network, we observe new applications of NLP (natural language processing) and CV (computer vision) powered by deep neural networks for processing business documents. However, creating a real-world document processing system needs to integrate several NLP and CV tasks, rather than treating them separately. There is a need to have a unified approach for processing documents containing textual and graphical elements with rich formats, diverse layout arrangement, and distinct semantics. In this paper, a framework that fulfills this unified approach is presented. The framework includes a representation model definition for holding the information generated by various tasks and specifications defining the coordination between these tasks. The framework is a blueprint for building a system that can process documents with rich formats, styles, and multiple types of elements. The flexible and lightweight design of the framework can help build a system for diverse business scenarios, such as contract monitoring and reviewing.

Keywords: document processing, framework, formal definition, machine learning

Procedia PDF Downloads 183

2 Neural Graph Matching for Modification Similarity Applied to Electronic Document Comparison

Authors: Po-Fang Hsu, Chiching Wei

Abstract:

In this paper, we present a novel neural graph matching approach applied to document comparison. Document comparison is a common task in the legal and financial industries. In some cases, the most important differences may be the addition or omission of words, sentences, clauses, or paragraphs. However, it is a challenging task without recording or tracing the whole edited process. Under many temporal uncertainties, we explore the potentiality of our approach to proximate the accurate comparison to make sure which element blocks have a relation of edition with others. In the beginning, we apply a document layout analysis that combines traditional and modern technics to segment layouts in blocks of various types appropriately. Then we transform this issue into a problem of layout graph matching with textual awareness. Regarding graph matching, it is a long-studied problem with a broad range of applications. However, different from previous works focusing on visual images or structural layout, we also bring textual features into our model for adapting this domain. Specifically, based on the electronic document, we introduce an encoder to deal with the visual presentation decoding from PDF. Additionally, because the modifications can cause the inconsistency of document layout analysis between modified documents and the blocks can be merged and split, Sinkhorn divergence is adopted in our neural graph approach, which tries to overcome both these issues with many-to-many block matching. We demonstrate this on two categories of layouts, as follows., legal agreement and scientific articles, collected from our real-case datasets.

Keywords: document comparison, graph matching, graph neural network, modification similarity, multi-modal

Procedia PDF Downloads 149

1 An Observation Approach of Reading Order for Single Column and Two Column Layout Template

Authors: In-Tsang Lin, Chiching Wei

Abstract:

Reading order is an important task in many digitization scenarios involving the preservation of the logical structure of a document. From the paper survey, it finds that the state-of-the-art algorithm could not fulfill to get the accurate reading order in the portable document format (PDF) files with rich formats, diverse layout arrangement. In recent years, most of the studies on the analysis of reading order have targeted the specific problem of associating layout components with logical labels, while less attention has been paid to the problem of extracting relationships the problem of detecting the reading order relationship between logical components, such as cross-references. Over 3 years of development, the company Foxit has demonstrated the layout recognition (LR) engine in revision 20601 to eager for the accuracy of the reading order. The bounding box of each paragraph can be obtained correctly by the Foxit LR engine, but the result of reading-order is not always correct for single-column, and two-column layout format due to the table issue, formula issue, and multiple mini separated bounding box and footer issue. Thus, the algorithm is developed to improve the accuracy of the reading order based on the Foxit LR structure. In this paper, a creative observation method (Here called the MESH method) is provided here to open a new chance in the research of the reading-order field. Here two important parameters are introduced, one parameter is the number of the bounding box on the right side of the present bounding box (NRight), and another parameter is the number of the bounding box under the present bounding box (Nunder). And the normalized x-value (x/the whole width), the normalized y-value (y/the whole height) of each bounding box, the x-, and y- position of each bounding box were also put into consideration. Initial experimental results of single column layout format demonstrate a 19.33% absolute improvement in accuracy of the reading-order over 7 PDF files (total 150 pages) using our proposed method based on the LR structure over the baseline method using the LR structure in 20601 revision, which its accuracy of the reading-order is 72%. And for two-column layout format, the preliminary results demonstrate a 44.44% absolute improvement in accuracy of the reading-order over 2 PDF files (total 18 pages) using our proposed method based on the LR structure over the baseline method using the LR structure in 20601 revision, which its accuracy of the reading-order is 0%. Until now, the footer issue and a part of multiple mini separated bounding box issue can be solved by using the MESH method. However, there are still three issues that cannot be solved, such as the table issue, formula issue, and the random multiple mini separated bounding boxes. But the detection of the table position and the recognition of the table structure are out of the scope in this paper, and there is needed another research. In the future, the tasks are chosen- how to detect the table position in the page and to extract the content of the table.

Keywords: document processing, reading order, observation method, layout recognition

Procedia PDF Downloads 149