Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 5454

Search results for: fields extraction from administrative documents

5454 Model-Based Field Extraction from Different Class of Administrative Documents

Authors: Jinen Daghrir, Anis Kricha, Karim Kalti

Abstract:

The volume of incoming administrative documents is massive, and processing them manually is costly, especially in terms of time. This problem has prompted a substantial amount of research and development on automatically extracting fields from administrative documents, in order to reduce costs and increase citizen satisfaction with administrations. To this end, we introduce an administrative document understanding system. Given a document in which a user selects the fields to be retrieved for a document class, a document model is built automatically. The document model is represented by an attributed relational graph (ARG) in which nodes represent the fields to extract and edges represent the relations between them. Both nodes and edges carry feature vectors. When another document arrives in the system, its layout objects are extracted and an ARG is generated. Field extraction then becomes a problem of matching two ARGs, which relies mainly on comparing the spatial relationships between layout objects. Experiments on eight document classes yield accuracy rates from 75% to 100%. The proposed method performs well given that the document model is constructed from a single document.
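As a rough illustration of the graph-matching idea in this abstract, the sketch below treats each document as a small graph whose nodes are layout objects and whose edge attributes are the relative offsets between object centres. The choice of features, the brute-force search over assignments, and all names (`edge_vectors`, `match_fields`, the sample coordinates) are assumptions made for illustration; the abstract does not specify the authors' features or matching algorithm.

```python
from itertools import permutations
from math import hypot

def edge_vectors(nodes):
    """Edge attributes: relative (dx, dy) offset for every ordered pair of nodes."""
    return {(a, b): (nodes[b][0] - nodes[a][0], nodes[b][1] - nodes[a][1])
            for a in nodes for b in nodes if a != b}

def match_fields(model, candidates):
    """Assign each model field to a distinct candidate object by minimising
    the total difference in pairwise spatial offsets (brute force, small graphs only)."""
    model_edges = edge_vectors(model)
    cand_edges = edge_vectors(candidates)
    fields = list(model)
    best, best_cost = None, float("inf")
    for perm in permutations(candidates, len(fields)):
        assign = dict(zip(fields, perm))
        cost = sum(
            hypot(dx - cand_edges[(assign[a], assign[b])][0],
                  dy - cand_edges[(assign[a], assign[b])][1])
            for (a, b), (dx, dy) in model_edges.items()
        )
        if cost < best_cost:
            best, best_cost = assign, cost
    return best

# Hypothetical document model: field name -> centre of its bounding box.
model = {"name": (100, 50), "date": (400, 50), "total": (400, 600)}
# Hypothetical layout objects extracted from a newly arrived document.
candidates = {"obj0": (410, 615), "obj1": (112, 60), "obj2": (405, 58), "obj3": (110, 300)}

print(match_fields(model, candidates))
```

The exhaustive search is only workable for a handful of fields; a practical system would need a more scalable matching strategy and richer node features than centre coordinates.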

Keywords: administrative document understanding, logical labelling, logical layout analysis, fields extraction from administrative documents

Procedia PDF Downloads 208

5453 Visual Template Detection and Compositional Automatic Regular Expression Generation for Business Invoice Extraction

Authors: Anthony Proschka, Deepak Mishra, Merlyn Ramanan, Zurab Baratashvili

Abstract:

Small and medium-sized businesses receive over 160 billion invoices every year. Since these documents exhibit many subtle differences in layout and text, extracting structured fields such as sender name, amount, and VAT rate from them automatically is an open research question. In this paper, existing work in template-based document extraction is extended, and a system is devised that is able to reliably extract all required fields for up to 70% of all documents in the data set, more than any other previously reported method. Approaches are described for 1) detecting through visual features which template a given document belongs to, 2) automatically generating extraction rules for a given new template by composing regular expressions from multiple components, and 3) computing confidence scores that indicate the accuracy of the automatic extractions. The system can generate templates with as few as one training sample and requires only the ground truth field values instead of detailed annotations such as bounding boxes, which are hard to obtain. The system is deployed and used inside a commercial accounting software product.
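To make the idea of composing regular expressions from components more concrete, here is a minimal sketch that builds an extraction rule from a single training sample and its ground-truth value. The component patterns, the "last word before the value" anchoring strategy, and the names `COMPONENTS` and `build_rule` are all hypothetical; the abstract does not describe the actual components or composition procedure.

```python
import re

# Hypothetical value components; a real system would curate or learn these.
COMPONENTS = {
    "amount": r"(?P<amount>\d{1,3}(?:[.,]\d{3})*[.,]\d{2})",
    "vat_rate": r"(?P<vat_rate>\d{1,2}(?:[.,]\d+)?)\s*%",
    "date": r"(?P<date>\d{2}[./-]\d{2}[./-]\d{4})",
}

def build_rule(sample_text, field, truth):
    """Compose a rule: literal anchor (word preceding the true value) + value component."""
    idx = sample_text.find(truth)
    if idx < 0:
        return None  # ground truth not found verbatim in the sample
    words = sample_text[:idx].split()
    anchor = words[-1] if words else ""
    return re.compile(re.escape(anchor) + r"\s*" + COMPONENTS[field])

# One training sample with only the ground-truth value, no bounding boxes.
sample = "Invoice total: 1.234,56 EUR  VAT 19 %"
rule = build_rule(sample, "amount", "1.234,56")

# Apply the generated rule to a new document assumed to share the template.
match = rule.search("Order 77 total: 89,90 EUR")
print(match.group("amount"))
```

Anchoring on a single preceding word is deliberately simplistic; it shows only how a literal context and a reusable value pattern can be concatenated into one rule.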

Keywords: data mining, information retrieval, business, feature extraction, layout, business data processing, document handling, end-user trained information extraction, document archiving, scanned business documents, automated document processing, F1-measure, commercial accounting software

Procedia PDF Downloads 126

5452 A Proposed Approach for Emotion Lexicon Enrichment

Authors: Amr Mansour Mohsen, Hesham Ahmed Hassan, Amira M. Idrees

Abstract:

Document analysis is an important research field that aims to gather information by analyzing the data in documents. Since one important goal in many fields is to understand what people actually want, sentiment analysis has become a vital field closely related to document analysis. This research focuses on analyzing text documents to classify each document according to the opinion it expresses. The aim is to detect emotions in text documents by enriching the lexicon and adapting its content based on semantic pattern extraction. The proposed approach is presented, and experiments from different perspectives reveal its positive impact on the classification results.

Keywords: document analysis, sentimental analysis, emotion detection, WEKA tool, NRC lexicon

Procedia PDF Downloads 433

5451 Words Spotting in the Images Handwritten Historical Documents

Authors: Issam Ben Jami

Abstract:

Information retrieval in digital libraries is very important because most famous historical documents are of significant value. Word spotting in historical documents is very difficult because the handwriting in such documents is naturally cursive and shows wide variability in the scale and translation of words within the same document. We first present a system for automatic recognition based on the extraction of interest points of words from the image model. The key points are extracted from a representation of the image as a synthetic description of the shape in a multidimensional space. We therefore use advanced methods that find and describe interest points that are invariant to scale, rotation and lighting and are linked to local configurations of pixels. We test this approach on documents from the 15th century. Our experiments give important results.

Keywords: feature matching, historical documents, pattern recognition, word spotting

Procedia PDF Downloads 269

5450 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services

Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme

Abstract:

Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.

Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing

Procedia PDF Downloads 104

5449 Finding Related Scientific Documents Using Formal Concept Analysis

Authors: Nadeem Akhtar, Hira Javed

Abstract:

An important aspect of research is the literature survey. The availability of a large amount of literature across different domains creates a need for optimized systems that provide relevant literature to researchers. We propose a keyword-based search system for text documents. This experimental approach gives a hierarchical structure to the document corpus. The documents are labelled with keywords using KEA (Keyword Extraction Algorithm) and are automatically organized in a lattice structure using Formal Concept Analysis (FCA). This groups semantically related documents together. The keyword-based hierarchical structure returns only those documents that contain exactly the given keywords. This approach opens doors for multi-domain research: documents across multiple domains that are indexed by similar keywords are grouped together, and a hierarchical relationship between keywords is obtained. To demonstrate the effectiveness of the approach, we carried out an experiment and evaluation on the Semeval-2010 dataset. The results show that the presented method is considerably successful in indexing scientific papers.
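The lattice construction mentioned above can be illustrated with a naive enumeration of formal concepts over a tiny document-keyword context. The toy context, the function names, and the exhaustive subset enumeration are assumptions for illustration only; the abstract does not say which FCA algorithm the authors use, and this approach is exponential in the number of keywords.

```python
from itertools import combinations

# Hypothetical formal context: document -> keywords assigned to it.
context = {
    "d1": {"lattice", "fca"},
    "d2": {"fca", "keywords"},
    "d3": {"lattice", "fca", "keywords"},
}

def extent(keywords):
    """Documents that contain every keyword in the given set."""
    return frozenset(d for d, ks in context.items() if keywords <= ks)

def intent(docs):
    """Keywords shared by every document in the given set."""
    all_keywords = set().union(*context.values())
    return frozenset(k for k in all_keywords if all(k in context[d] for d in docs))

def formal_concepts():
    """Enumerate (extent, intent) pairs that are closed under both derivations."""
    all_keywords = sorted(set().union(*context.values()))
    concepts = set()
    for r in range(len(all_keywords) + 1):
        for combo in combinations(all_keywords, r):
            docs = extent(set(combo))
            concepts.add((docs, intent(docs)))
    return concepts

for docs, keywords in sorted(formal_concepts(), key=lambda c: -len(c[0])):
    print(sorted(docs), sorted(keywords))
```

Ordering the concepts by extent size, as in the loop above, gives a rough top-down view of the hierarchy: broader concepts (more documents, fewer shared keywords) come first.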

Keywords: formal concept analysis, keyword extraction algorithm, scientific documents, lattice

Procedia PDF Downloads 324

5448 On the Interactive Search with Web Documents

Authors: Mario Kubek, Herwig Unger

Abstract:

Due to the large amount of information in the World Wide Web (WWW, web) and the lengthy and usually linearly ordered result lists of web search engines that do not indicate semantic relationships between their entries, the search for topically similar and related documents can become a tedious task. In particular, formulating queries with proper terms that represent specific information needs requires much effort from the user. This problem becomes even bigger when the user's knowledge of a subject and its technical terms is not sufficient to do so. This article presents the new and interactive search application DocAnalyser, which addresses this problem by enabling users to find similar and related web documents based on automatic query formulation and state-of-the-art search word extraction. Additionally, this tool can be used to track topics across semantically connected web documents.

Keywords: DocAnalyser, interactive web search, search word extraction, query formulation, source topic detection, topic tracking

Procedia PDF Downloads 389

5447 A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction

Authors: Smita Pallavi, Raj Ratn Pranesh, Sumit Kumar

Abstract:

Representing information as tables is a compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from an OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to the corresponding cell in a dataframe, which can be stored as comma-separated values, in a database, as an Excel file, or in multiple other usable formats.

Keywords: table extraction, optical character recognition, image processing, text extraction, morphological transformation

Procedia PDF Downloads 138

5446 Examines the Proportionality between the Needs of Industry and Technical and Vocational Training of Male and Female Vocational Schools

Authors: Khalil Aryanfar, Pariya Gholipor, Elmira Hafez

Abstract:

This study examines the proportionality between the needs of industry and the technical and vocational training offered by male and female vocational schools. The research method was descriptive and was conducted in two parts, documentary analysis and needs assessment; the Delphi method was used in the needs assessment. The statistical population of the study included 312 employers from the industry sector, 52 of whom were selected through stratified random sampling. Data were collected from upstream documents, including the document on the development of technical and vocational training, the Statistical Yearbook 1393 in Tehran, and the documents available in the Isfahan Planning Department. The findings indicate that there is near proportionality between the needs of industry and the vocational training of male and female vocational schools in the fields of welding, industrial electronics, electro technique, industrial drawing, auto mechanics, design, packaging, machine tool, metalworking, construction, accounting, computer graphics and administrative affairs. The findings also indicate that there is no proportionality between the needs of industry and the vocational training of male and female vocational schools in the fields of thermal-cooling systems, building electricity, building drawing, interior architecture, car electricity and motor repair.

Keywords: needs assessment, technical and vocational training, industry

Procedia PDF Downloads 450

5445 Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition

Authors: L. Hamsaveni, Navya Prakash, Suresha

Abstract:

Document Image Analysis recognizes text and graphics in documents acquired as images. This paper adopts an approach to degraded document image analysis that does not use Optical Character Recognition (OCR). The technique involves document imaging methods such as Image Fusing and Speeded Up Robust Features (SURF) Detection to identify and extract the degraded regions from a set of document images and obtain an original document with complete information. If the captured degraded document image is skewed, it has to be straightened (deskewed) before further processing. A special image storage format known as YCbCr is used as a tool to convert the grayscale image to RGB image format. The presented algorithm is tested on various types of degraded documents, such as printed documents, handwritten documents, old script documents and handwritten image sketches in documents. The purpose of this research is to obtain an original document from a given set of degraded documents of the same source.

Keywords: grayscale image format, image fusing, RGB image format, SURF detection, YCbCr image format

Procedia PDF Downloads 372

5444 On Exploring Search Heuristics for improving the efficiency in Web Information Extraction

Authors: Patricia Jiménez, Rafael Corchuelo

Abstract:

Nowadays the World Wide Web is the most popular source of information, and it relies on billions of on-line documents. Web mining is used to crawl through these documents, collect the information of interest and process it with data mining tools so that the gathered information can be used in the best interest of a business, which enables companies to promote theirs. Unfortunately, it is not easy to automatically extract the information a web site provides when the site lacks an API for transforming the user-friendly data in web documents into a structured, machine-readable format. Rule-based information extractors are tools intended to extract the information of interest automatically and offer it in a structured format that allows mining tools to process it. However, the performance of an information extractor strongly depends on the search heuristic employed, since bad choices regarding how to learn a rule may easily result in a loss of effectiveness and/or efficiency. Improving the efficiency of search heuristics is of utmost importance in the field of Web Information Extraction, since typical datasets are very large. In this paper, we employ an information extractor based on a classical top-down algorithm that uses the so-called Information Gain heuristic introduced by Quinlan and Cameron-Jones. Unfortunately, Information Gain suffers from some well-known problems, so we analyse an intuitive alternative, Termini, that is clearly more efficient; we also analyse other proposals in the literature and conclude that none of them outperforms the previous alternative.
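For readers unfamiliar with the heuristic named in this abstract, the sketch below computes the gain usually associated with Quinlan and Cameron-Jones's FOIL system when a rule is refined by one extra condition. It is a simplified version that counts covered examples rather than variable bindings and assumes every positive example covered after the refinement was also covered before it; the function name and the sample counts are hypothetical, and the abstract does not give the exact formulation the authors use. The Termini alternative is not described in the abstract and is not sketched here.

```python
from math import log2

def foil_gain(p0, n0, p1, n1):
    """FOIL-style information gain of refining a rule.

    p0, n0: positive/negative examples covered before adding a condition
    p1, n1: positive/negative examples covered after adding it
    Simplifying assumption: the positives still covered number p1.
    """
    if p1 == 0:
        return 0.0  # a refinement covering no positives is worthless
    before = log2(p0 / (p0 + n0))
    after = log2(p1 / (p1 + n1))
    return p1 * (after - before)

# Hypothetical candidates for refining a rule that covers 10 positives and 10 negatives.
print(foil_gain(10, 10, 8, 2))  # broad and fairly pure
print(foil_gain(10, 10, 2, 0))  # perfectly pure but covers few positives
```

A top-down rule learner evaluates this score for every candidate condition at every refinement step, which is why the cost of the heuristic matters on large datasets.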

Keywords: information extraction, search heuristics, semi-structured documents, web mining

Procedia PDF Downloads 330

5443 Literature Review on Text Comparison Techniques: Analysis of Text Extraction, Main Comparison and Visual Representation Tools

Authors: Andriana Mkrtchyan, Vahe Khlghatyan

Abstract:

The choice of a profession is one of the most important decisions people make in their lives. With the development of modern science, technology, and every other sphere of the modern world, more and more professions are emerging, which complicates the choice even further. Hence, there is a need for a guiding platform that helps people choose a profession and the right career path based on their interests, skills, and personality. This review analyzes existing methods of comparing PDF documents and suggests a three-stage approach to the comparison: 1) text extraction from the PDF documents, 2) comparison of the extracted text via NLP algorithms, and 3) representation of the comparison using a special shape and color psychology methodology.

Keywords: color psychology, data acquisition/extraction, data augmentation, disambiguation, natural language processing, outlier detection, semantic similarity, text-mining, user evaluation, visual search

Procedia PDF Downloads 64

5442 On the Road towards Effective Administrative Justice in Macedonia, Albania and Kosovo: Common Challenges and Problems

Authors: Arlinda Memetaj

Abstract:

A sound system of administrative justice is a vital element of democratic governance. Proper control of public administration consists not only of a sound civil service framework and legislative oversight, but also of empowering the public and the courts to hold public officials accountable for their decision-making through the application of fair administrative procedural rules and the use of appropriate administrative appeals processes and judicial review. Establishing both an effective public administration and an administrative justice system has long been among the most ‘important and urgent’ strategic objectives of almost every country in the Balkans region, including Macedonia, Albania and Kosovo. Closely related to this is their common strategic goal of joining the European Union, which requires fulfilling many criteria and standards incorporated in the EU acquis communautaire. This is presently done within the framework of the Stabilization and Association Agreement that each of these countries has concluded with the EU. To these ends, each of the three countries has so far adopted a large number of legislative and strategic documents covering all aspects of its administrative justice system. ‘Changes and reforms’ have thus been the most frequently used terms in this field in each of these countries. The three countries have already established their own national administrative judiciaries, while continually amending their laws on general administrative procedure and thereby introducing considerable innovations.
National administrative courts are expected to play a crucially important role in the broader judiciary-related reforms of these countries; they are designed to check the legality of decisions of the state administration in order to guarantee effective protection of the human rights and legitimate interests of private persons through a regular, compliant, fast and reasonable judicial administrative process. Further improvements in this field are presently an integral part of all the relevant national strategic documents, including those on judiciary reform and public administration reform adopted by each of the three countries; these strategic documents are designed, among other things, to provide effective protection of citizens' rights to administrative justice. On this basis, the paper highlights selected common challenges and problems of the three countries on their European road, while claiming (among other things) that the current status quo in each of them may be overcome only through proper implementation of administrative court decisions and a far stricter international monitoring process. A new approach and strong political commitment from the highest political leadership are thus absolutely needed to ensure the principles of transparency, accountability and merit in public administration. The main methods used in this paper are analytical and comparative, owing to the character of the paper itself.

Keywords: administrative courts, administrative justice, administrative procedure, benefit, effective administrative justice, human rights, implementation, monitoring, reform

Procedia PDF Downloads 150

5441 The Duties of the Immortals and the Name of Anauša or Anušiya

Authors: Behzad Moeini Sam, Sara Mohammadi Avandi

Abstract:

One of the reasons for the success of the Achaemenids was the innovation and precise organization they applied in the administrative and military fields. These organizations had their roots in previous governments and were changed in the course of being borrowed. The units of the Achaemenid army are also among the cases that have their origins in the ancient East. This article attempts to find the sources of the Immortal Army based on the writings of old and current authors and on archaeological documents, and to examine the name mentioned by Herodotus and rejected by some authors. Linguistic sources have also been used to reach better conclusions than the indicated sources alone allow, and the article emphasizes linguistic data to arrive at a better deduction. Thus, it was concluded that ‘anauša’ is more probable than anušiya.

Keywords: army, immortal, ten thousand, Anauša, Anušiya

Procedia PDF Downloads 68

5440 Systems and Procedures in Indonesian Administrative Law

Authors: Andhika Danesjvara

Abstract:

Governance of the Republic of Indonesia should be based on the principle of sovereignty and the rule of law. Based on these principles, all forms of decisions and/or actions of government administration should be based on the sovereignty of the people and the law. Decisions and/or actions affecting citizens should be based on the provisions of the legislation and the general principles of good governance. Control of these decisions and/or actions is a part of administrative review and also of judicial control. This control is part of the administrative justice system, which is intended for people affected by administrative decisions or actions, and it is the duty and authority of the government or of an independent administrative court. Therefore, systems and procedures for carrying out the tasks of governance and development must be regulated by law. Systems and procedures of governance are a subject studied in administrative law; therefore, the research also includes a review of the principles of law in administrative law. Administrative law procedure is important for government decision-making; the question is whether the procedures are part of the justice system itself.

Keywords: administrative court, administrative justice, administrative law, administrative procedures

Procedia PDF Downloads 282

5439 Effect of Electromagnetic Fields on Protein Extraction from Shrimp By-Products for Electrospinning Process

Authors: Guido Trautmann-Sáez, Mario Pérez-Won, Vilbett Briones, María José Bugueño, Gipsy Tabilo-Munizaga, Luis Gonzáles-Cavieres

Abstract:

Shrimp by-products are a valuable source of protein. However, traditional protein extraction methods have limitations in terms of their efficiency. In this work, protein extraction from shrimp (Pleuroncodes monodon) industrial by-products was assisted with ohmic heating (OH), microwave (MW) and pulsed electric field (PEF) treatments. The extraction was performed by a chemical method (using NaOH and HCl 2M) assisted with OH, MW and PEF in a continuous flow system (5 ml/s). Protein determination, differential scanning calorimetry (DSC) and Fourier-transform infrared (FTIR) analyses were carried out. The results indicate improvements in protein extraction efficiency of 19.25% (PEF), 3.65% (OH) and 28.19% (MW). The most efficient method was selected for the electrospinning process and for obtaining fiber.

Keywords: electrospinning process, emerging technology, protein extraction, shrimp by-products

Procedia PDF Downloads 83

5438 The Use of Technology in Mathematics Learning (1995-2024): A Bibliometric Analysis

Authors: Rahma Adinda Sartika

Abstract:

The use of technology in learning mathematics has received a positive response from both students and teachers, so many researchers have conducted research on this theme. This study found that 807 documents relevant to this theme were published in Scopus from 1995 to 2024. After the stages of identification, screening, eligibility, and inclusion, 227 documents met the criteria. These documents were then analyzed using the bibliometric method. The analysis shows that the most documents in the Scopus database were published in 2020, with 38 documents, and the fewest from 1996 to 2000 and from 2004 to 2007, when no documents were published. The highest number of citations belongs to documents published in 2018, with a total of 349 citations, so the h-index for that year is higher than for the others. The country that published the most documents relevant to this theme is Indonesia, with a total of 91 documents. The second largest is the United States, with 28 published documents, and the third largest is China, with 15 documents. Indonesia and the United States have the most working relationships with other countries. The focus of research related to this theme is 1) mathematics learning, 2) learning systems, 3) engineering education, 4) technology and 5) mathematical concepts.
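The abstract refers to the h-index without defining it. Under the usual definition (the largest h such that h documents each have at least h citations), it can be computed as in the short sketch below. The citation counts are made-up values for illustration and are not taken from the study.

```python
def h_index(citations):
    """Largest h such that at least h documents have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical per-document citation counts for one publication year.
print(h_index([25, 8, 5, 3, 3]))
```

Note that a high total citation count does not by itself imply a high h-index: a single heavily cited document raises the total but adds at most one to h.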

Keywords: technology, bibliometric, mathematics learning, mathematical concepts

Procedia PDF Downloads 36

5437 Improvement of Protein Extraction From Shrimp by Product Used for Electrospinning by Applying Emerging Technologies

Authors: Mario Pérez-Won, Vilbett Briones L., Guido Trautmann, María José Bugueño, Gipsy Tabilo-Munizaga, Luis Gonzalez-Cavieres

Abstract:

The fishing industry generates a significant amount of shrimp by-products, which often result in environmental contamination. Protein extraction from these by-products is a potential solution to minimize waste and revalue the by-products. To improve the extraction of proteins (by chemical method) from shrimp (Pleuroncodes monodon) by-products, the emerging technologies of ohmic heating (OH), microwaves (MW) and pulsed electric fields (PEF) were used. The results show that microwaves, electrical pulses, and ohmic heating improved performance by 28.19%, 19.25%, and 3.65%, respectively. Furthermore, conformational changes were studied by DSC and FTIR. Subsequently, the use of these proteins in electrospinning technology was evaluated. In conclusion, this study demonstrates that the application of emerging technologies can significantly improve the extraction yield of proteins from shrimp by-products.

Keywords: electrospinning, emerging technologies, improving extraction, shrimp by-products

Procedia PDF Downloads 70

5436 Mechanisms of Ginger Bioactive Compounds Extract Using Soxhlet and Accelerated Water Extraction

Authors: M. N. Azian, A. N. Ilia Anisa, Y. Iwai

Abstract:

Understanding the mechanism of extracting bioactive compounds from a plant matrix is essential for optimizing the extraction process. As a benchmark technique, Soxhlet extraction was used to discuss the mechanism and was compared with accelerated water extraction. The trends of both techniques show that the process involves extraction and degradation. The highest yields of 6-, 8-, 10-gingerols and 6-shogaol in Soxhlet extraction were 13.948, 7.12, 10.312 and 2.306 mg/g, respectively. The optimum yields of 6-, 8-, 10-gingerols and 6-shogaol from accelerated water extraction at 140 °C were 68.97±3.95 mg/g at 3 min, 18.98±3.04 mg/g at 5 min, 5.167±2.35 mg/g at 3 min and 14.57±6.27 mg/g at 3 min, respectively. The effect of temperature at 3 min shows that the concentration of 6-shogaol increased rapidly as the recovery of 6-gingerol decreased.

Keywords: mechanism, ginger bioactive compounds, soxhlet extraction, accelerated water extraction

Procedia PDF Downloads 428

5435 Building up of European Administrative Space at Central and Local Level as a Key Challenge for the Kosovo's Further State Building Process

Authors: Arlinda Memetaj

Abstract:

Building a well-functioning administrative justice system is one of the key prerequisites for ensuring an accountable and efficient public administration in Kosovo as well. To this end, the country has already established an almost comprehensive legislative and institutional framework. The latter derives from (among others) Kosovo's Stabilisation and Association Agreement with the EU of 2016. A series of efforts is still being undertaken by all relevant domestic and international stakeholders active in both Kosovo's public administration reform and the country's system of local self-government. Both systems are thus in a constant state of reform. Despite this, the country still has a series of shortcomings in this context. There is a large backlog of administrative cases in the Prishtina Administrative Court; there is a public lack in the judiciary; the public administration is organized in a fragmented way; the administrative laws are still not properly implemented at the local level; the municipalities' legislative and executive branches are not sufficiently transparent for ordinary citizens ... Against this brief background, the full paper first outlines the legislative and institutional framework of Kosovo's systems of administrative justice and local self-government (on the basis that public administration and local government are not separate fields). It then illustrates the key specific shortcomings in those fields, as seen from the perspective of the citizens' right to good administration. It finally claims that the current status quo in the country may be resolved (among others) by granting Kosovo the status of a full member state of the Council of Europe, or at least by granting it the temporary status of a contracting party to (among others) the European Human Rights Convention.
The latter would enable all Kosovo citizens whose human rights are violated by Kosovo's relevant administrative authorities, including the administrative courts, to bring their cases before the well-known Strasbourg-based European Human Rights Court, regardless of their ethnic or other origin. This would consequently place the State under a permanent and full monitoring process, with a view to obliging the country to properly implement the European Court's decisions (as adopted by the court in those cases). This would benefit, first of all, Kosovo's ordinary citizens, regardless of their ethnic or other background. It would also provide a particularly positive input to the ongoing efforts undertaken by Kosovo and Serbia within the EU-facilitated Dialogue, with a view to building an integral administrative justice system at central and local level throughout Kosovo's territory. The main methods used in this paper are descriptive, analytical and comparative.

Keywords: administrative courts, administrative justice, administrative procedure, benefit, European Human Rights Court, human rights, monitoring, reform

Procedia PDF Downloads 299

5434 BIM-Based Tool for Sustainability Assessment and Certification Documents Provision

Authors: Taki Eddine Seghier, Mohd Hamdan Ahmad, Yaik-Wah Lim, Samuel Opeyemi Williams

Abstract:

Assessing building sustainability against a specific green benchmark and preparing the documents required to receive a green building certification are both considered major challenges for green building design teams. However, this labor- and time-consuming process can take advantage of available Building Information Modeling (BIM) features such as material take-off and scheduling. Furthermore, the workflow can be automated to track potentially achievable credit points and provide rating feedback for several design options by using integrated Visual Programming (VP) to handle the parameters stored within the BIM model. Hence, this study proposes a BIM-based tool that uses the Green Building Index (GBI) rating system requirements as a unique input case to evaluate building sustainability in the design stage of the building project life cycle. The tool covers two key models: first, a model for data extraction, calculation and classification of achievable credit points in a green template; second, a model for generating the documents required for green building certification. The tool was validated on a BIM model of a residential building, and it serves as a proof of concept that building sustainability assessment for GBI certification can be automatically evaluated and documented through BIM.

Keywords: green building rating system, GBRS, building information modeling, BIM, visual programming, VP, sustainability assessment

Procedia PDF Downloads 322

5433 The Role of Named Entity Recognition for Information Extraction

Authors: Girma Yohannis Bade, Olga Kolesnikova, Grigori Sidorov

Abstract:

Named entity recognition (NER) is a building block for information extraction. Though the information extraction process has been automated using a variety of techniques to find and extract relevant information from unstructured documents, the discovery of targeted knowledge still poses a number of research difficulties because of the variability and lack of structure in Web data. NER, a subtask of information extraction (IE), emerged to ease this difficulty. It deals with finding proper names (named entities) in a document, such as the names of persons, countries, locations, organizations, dates, and events, and categorizing them under predetermined labels, which is an initial step in IE tasks. This survey paper presents the roles and importance of NER to IE from the perspective of different algorithms and application domains. It summarizes how researchers have implemented NER in particular application areas such as finance, medicine, defense, business, food science, archeology, and so on. It also outlines three types of sequence labeling algorithms for NER: feature-based, neural network-based, and rule-based. Finally, the state of the art and the evaluation metrics of NER are presented.

Keywords: the role of NER, named entity recognition, information extraction, sequence labeling algorithms, named entity application area

Procedia PDF Downloads 76

5432 Management Software for the Elaboration of an Electronic File in the Pharmaceutical Industry Following Mexican Regulations

Authors: M. Peña Aguilar Juan, Ríos Hernández Ezequiel, R. Valencia Luis

Abstract:

To be certified, certain goods of public interest, such as medicines and food, require the preparation and delivery of a dossier. Its elaboration requires legal and administrative knowledge, as well as organization of the process documents in an order that allows the file to be verified. Therefore, a virtual platform was developed to support the management and elaboration of the dossier, providing access to the information and interfaces that allow the user to know the status of projects. Developing the dossier system in the cloud allows the inclusion of the technical requirements for software management, including validation and manufacturing in the industry. The platform guides and facilitates the elaboration of the dossier (report, file or history) in accordance with Mexican legislation and regulations, and it also has auxiliary tools for its management. This technological alternative provides organizational support for documents and access to the information required for the successful development of a dossier. The platform is divided into the following modules: system control, catalog, dossier and enterprise management. The modules are designed according to the structure required in a dossier in those areas. However, the structure allows for flexibility, as its goal is to become a tool that facilitates and does not obstruct processes. The architecture and development of the software allow for future expansion to other fields, which would imply feeding the system with new regulations.

Keywords: electronic dossier, cloud management software, pharmaceutical industry, sanitary registration

Procedia PDF Downloads 290

5431 Bamboo Fibre Extraction and Its Reinforced Polymer Composite Material

Authors: P. Zakikhani, R. Zahari, M. T. H. Sultan, D. L. Majid

Abstract:

Polymeric composite materials reinforced with natural plant fibres have been used in many fields of our lives to protect the environment. In particular, bamboo fibres, owing to their environmental sustainability, mechanical properties, and recyclability, have been used as reinforcement in polymer matrix composites in the construction industry. This review summarizes the structure of bamboo and three methods of extracting fibres from it: mechanical, chemical, and combined mechanical and chemical. Each extraction method is chosen based on the intended application of the bamboo. In addition, bamboo fibre is compared with glass fibre in various respects, and in some of them it has advantages over glass fibre.

Keywords: bamboo fibres, natural fibres, bio composite, mechanical extraction, glass fibres

Procedia PDF Downloads 482

5430 On the Right an Effective Administrative Justice in the Republic of Macedonia: Challenges and Problems

Authors: Arlinda Memetaj

Abstract:

A sound system of administrative justice represents a vital element of democratic governance. The proper control of public administration consists not only of a sound civil service framework and legislative oversight, but also of the empowerment of the public and courts to hold public officials accountable for their decision-making through the application of fair administrative procedural rules and the use of appropriate administrative appeals processes and judicial review. Since the 1990s, the establishment of an effective public administration has been among the most 'important and urgent' final strategic objectives of the Republic of Macedonia. To this end, the country has so far adopted a large number of legislative and strategic documents related to all aspects of the administrative justice system. The latter is designed to strengthen the legal position of citizens, businesses, civic organizations, and other societal subjects. 'Changes and reforms' in this field have thus been the most frequently used terms in the country for more than 20 years. Several years ago the country established Administrative Courts, while repeatedly amending the Law on the General Administrative Procedure (LGAP). The new LGAP was adopted in 2015 and introduced considerable innovations in this area. The most recent input in this regard is the National Public Administration Reform Strategy 2017–2022, one of whose key expected results is the effective protection of citizens' rights. Nevertheless, there is still a series of interrelated shortcomings in this regard, such as (to mention just a few) the complex appeal procedure and delays in enforcing court rulings. Against this background, the paper first describes the Macedonian institutional and legislative framework in the above field, and then illustrates the shortcomings therein.
It finally claims that the current status quo may be overcome only if there is proper implementation of the administrative courts' decisions and a far stricter international monitoring process thereof. A new approach and strong political commitment from the highest political leadership are thus absolutely needed to ensure the principles of transparency, accountability and merit in public administration. The main method used in this paper is descriptive, analytical and comparative, owing to the very character of the paper itself.

Keywords: administrative justice, administrative procedure, administrative courts/disputes, European Human Rights Court, human rights, monitoring, reform, benefit.

Procedia PDF Downloads 149

5429 Using the Smith-Waterman Algorithm to Extract Features in the Classification of Obesity Status

Authors: Rosa Figueroa, Christopher Flores

Abstract:

Text categorization is the problem of assigning a new document to one of a set of predetermined categories, on the basis of a training set of free-text documents whose category membership is known. To train a classification model, it is necessary to extract characteristics in the form of tokens that facilitate the learning and classification process. In text categorization, the feature extraction process involves the use of word sequences, also known as N-grams. In general, it is expected that documents belonging to the same category share similar features. The Smith-Waterman (SW) algorithm is a dynamic programming algorithm that performs a local sequence alignment in order to determine similar regions between two strings or protein sequences. This work explores the use of the SW algorithm as an alternative for feature extraction in text categorization. The dataset used for this purpose contains 2,610 annotated documents with the classes Obese/Non-Obese. This dataset was represented in matrix form using the Bag of Words approach. The score selected to represent the occurrence of the tokens in each document was the term frequency-inverse document frequency (TF-IDF). In order to extract features for classification, four experiments were conducted: the first experiment used SW to extract features, the second used unigrams (single words), the third used bigrams (two-word sequences) and the last used a combination of unigrams and bigrams. To test the effectiveness of the extracted feature sets in the four experiments, a Support Vector Machine (SVM) classifier was tuned using 20% of the dataset. The remaining 80% of the dataset, together with 5-fold cross-validation, was used to evaluate and compare the performance of the four feature extraction experiments. Results from the tuning process suggest that SW performs better than N-gram based feature extraction.
These results were confirmed using the remaining 80% of the dataset, where SW performed best (accuracy = 97.10%, weighted average F-measure = 97.07%). The second-best result was obtained by the combination of unigrams and bigrams (accuracy = 96.04%, weighted average F-measure = 95.97%), closely followed by bigrams (accuracy = 94.56%, weighted average F-measure = 94.46%) and finally unigrams (accuracy = 92.96%, weighted average F-measure = 92.90%).
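As an illustration of the local alignment idea mentioned in the abstract above, the following is a minimal sketch of Smith-Waterman applied to word tokens. It is not the authors' implementation: the function name `smith_waterman`, the scoring values (match = 2, mismatch = -1, gap = -1), and the two example sentences are assumptions chosen only for demonstration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    `a` and `b` are sequences (here: lists of word tokens). The scoring
    values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(None)  # gap in b
            i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1])  # gap in a
            j -= 1
    return best, out_a[::-1], out_b[::-1]


# Hypothetical example sentences, tokenized on whitespace.
doc_a = "patient is morbidly obese with diabetes".split()
doc_b = "she is morbidly obese and hypertensive".split()
score, seg_a, seg_b = smith_waterman(doc_a, doc_b)
print(score, seg_a)
```

In a feature extraction setting, the shared segment recovered by the alignment (here a run of tokens common to both sentences) could serve as a candidate feature; how the paper turns alignments into features is not specified in the abstract.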

Keywords: comorbidities, machine learning, obesity, Smith-Waterman algorithm

Procedia PDF Downloads 292

5428 Semantic Indexing Improvement for Textual Documents: Contribution of Classification by Fuzzy Association Rules

Authors: Mohsen Maraoui

Abstract:

With the aim of improving natural language processing applications such as information retrieval, machine translation, and lexical disambiguation, we focus on a statistical approach to semantic indexing for multilingual text documents based on the conceptual network formalism. We propose to use this formalism as an indexing language to represent the descriptive concepts and their weighting. These concepts represent the content of the document. Our contribution is based on two steps. In the first step, we propose the extraction of index terms using the multilingual lexical resource Euro WordNet (EWN). In the second step, we pass from the representation of index terms to the representation of index concepts through the conceptual network formalism. This network is generated using the EWN resource and passes through a classification step based on an association rules model (in an attempt to discover the non-taxonomic or contextual relations between the concepts of a document). These relations are latent relations buried in the text and carried by the semantic context of the co-occurrence of concepts in the document. Our proposed indexing approach can be applied to text documents in various languages because it is based on a linguistic method adapted to the language through a multilingual thesaurus. Next, we apply the same statistical process regardless of the language in order to extract the significant concepts and their associated weights. We show that the proposed indexing approach provides encouraging results.

Keywords: concept extraction, conceptual network formalism, fuzzy association rules, multilingual thesaurus, semantic indexing

Procedia PDF Downloads 135

5427 Data Gathering and Analysis for Arabic Historical Documents

Authors: Ali Dulla

Abstract:

This paper introduces a new dataset (and the methodology used to generate it) based on a wide range of historical Arabic documents containing clean data and simple, homogeneous page layouts. The experiments are carried out on printed and handwritten documents obtained from important libraries such as the Qatar Digital Library, the British Library and the Library of Congress. We have gathered and commented on 150 archival document images from different locations and time periods, drawn from documents of the 17th to 19th centuries. The dataset comprises differing page layouts and degradations that challenge text line segmentation methods. Ground truth is produced using the Aletheia tool by PRImA and stored in an XML representation, in the PAGE (Page Analysis and Ground truth Elements) format. The dataset will be made easily available to researchers worldwide for research into the obstacles facing historical Arabic documents, such as geometric correction.
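To make the ground-truth format mentioned in the abstract above more concrete, here is a minimal sketch of reading text line polygons from a PAGE XML file with the Python standard library. The embedded sample is hand-written for illustration and is not taken from the dataset; the function name `read_text_lines` is hypothetical, and the sketch assumes a schema version in which each `TextLine` carries a `Coords` element with a `points` attribute (other PAGE versions may encode coordinates differently).

```python
import xml.etree.ElementTree as ET

# Hand-written illustrative sample (assumption: not from the actual dataset).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="sample.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,400 100,400"/>
      <TextLine id="l1">
        <Coords points="100,100 900,100 900,200 100,200"/>
      </TextLine>
      <TextLine id="l2">
        <Coords points="100,220 900,220 900,320 100,320"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def read_text_lines(xml_text):
    """Map each TextLine id to its polygon as a list of (x, y) integer tuples."""
    root = ET.fromstring(xml_text)
    # Reuse whatever namespace the root element declares, so the code
    # does not hard-code one schema version.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    lines = {}
    for line in root.iter(ns + "TextLine"):
        coords = line.find(ns + "Coords")
        if coords is None or not coords.get("points"):
            continue  # skip lines without a usable polygon
        lines[line.get("id")] = [
            tuple(int(v) for v in point.split(","))
            for point in coords.get("points").split()
        ]
    return lines


polygons = read_text_lines(SAMPLE)
for line_id, polygon in polygons.items():
    print(line_id, polygon)
```

Polygons read this way could be compared against the output of a text line segmentation method, which is the kind of evaluation the dataset is described as supporting.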

Keywords: dataset production, ground truth production, historical documents, arbitrary warping, geometric correction

Procedia PDF Downloads 161

5426 Analytical Study of Cobalt(II) and Nickel(II) Extraction with Salicylidene O-, M-, and P-Toluidine in Chloroform

Authors: Sana Almi, Djamel Barkat

Abstract:

The solvent extraction of cobalt(II) and nickel(II) from aqueous sulfate solutions was investigated by slope analysis, using salicylidene aniline and the three isomeric o-, m- and p-salicylidene toluidines diluted in chloroform at 25°C. From a statistical analysis of the extraction data, it was concluded that the extracted species are CoL2 with CoL2(HL) and NiL2 (HL denotes HSA, HSOT, HSMT, and HSPT). The extraction efficiency for Co(II) was higher than for Ni(II). This tendency is confirmed by the numerical extraction constants for each metal cation. Extraction efficiency followed the order HSMT > HSPT > HSOT > HSA for both Co2+ and Ni2+.

Keywords: solvent extraction, nickel(II), cobalt(II), salicylidene aniline, o-, m-, and p-salicylidene toluidine

Procedia PDF Downloads 476

5425 Extraction of Essential Oil From Orange Peels

Authors: Aayush Bhisikar, Neha Rajas, Aditya Bhingare, Samarth Bhandare, Amruta Amrurkar

Abstract:

Orange peels are currently thrown away as garbage in India after the edible parts of the fruit are consumed. However, the nation depends on essential oils for use in industries that produce goods including food, beverages, cosmetics, and medicines. This study was conducted to show how orange peel can be used effectively: through various extraction techniques, it can serve as a source of essential oil. Steam distillation, water distillation, and solvent extraction were the techniques considered in this paper. Because solvent extraction is relatively prevalent among the extraction techniques, Design Expert 7.0 was used to plan an experimental run for it. After extraction, the oil was examined to ascertain its physical and chemical characteristics. It was determined from the outcomes that the orange peels.

Keywords: orange peels, extraction, essential oil, distillation

Procedia PDF Downloads 75