Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 11090

Search results for: key information documents

11060 Binarization and Recognition of Characters from Historical Degraded Documents

Abstract:

Historical document images degrade as the documents age. Understanding and retrieving text from badly degraded documents is very difficult because of the variation between the document foreground and background. Thresholding such document images results in either broken characters or the detection of false text. Numerous algorithms can separate text and background efficiently in the textual regions of a document, but portions of the background are mistaken for text in areas that contain hardly any text. This paper presents a way to overcome these problems with a robust binarization technique that recovers the text from severely degraded document images and thereby increases the accuracy of optical character recognition systems. The proposed document recovery algorithm efficiently removes degradations from document images. We use Otsu's method together with local and global thresholding and, after binarization, train on and recognize the characters in the degraded documents.
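The abstract names Otsu's method but gives no implementation details. As a rough illustration only, the sketch below shows the textbook global form of Otsu's threshold on a flat list of 8-bit gray levels. The function names `otsu_threshold` and `binarize` are hypothetical, and this is not the authors' full pipeline, which also involves local thresholding and character recognition.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance.

    `pixels` is a flat iterable of integers in [0, levels). This is the
    textbook global Otsu method, assumed here for illustration.
    """
    pixels = list(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0         # sum of gray levels of those pixels
    best_t, best_var = 0, -1.0

    for t in range(levels):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue                      # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break                         # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 0 (dark, e.g. ink) or 1 (bright, e.g. paper)."""
    return [1 if p > threshold else 0 for p in pixels]
```

A single global threshold of this kind is exactly what tends to fail on unevenly degraded pages, which is presumably why the abstract combines it with local thresholding.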

Keywords: binarization, denoising, global thresholding, local thresholding, thresholding

Procedia PDF Downloads 312

11059 One-Class Support Vector Machine for Sentiment Analysis of Movie Review Documents

Authors: Chothmal, Basant Agarwal

Abstract:

Sentiment analysis classifies a given review document as a positive or negative polar document. Research on sentiment analysis has increased tremendously in recent times because of its large number of applications in industry and academia. Sentiment analysis models can be used to determine the opinion of a user towards any entity or product. E-commerce companies can use sentiment analysis models to improve their products on the basis of users’ opinions. In this paper, we propose a new One-class Support Vector Machine (One-class SVM) based sentiment analysis model for movie review documents. In the proposed approach, we first extract features from one class of documents and then use the one-class SVM model to test whether a given new document lies within the model or is an outlier. Experimental results show the effectiveness of the proposed sentiment analysis model.

Keywords: feature selection methods, machine learning, NB, one-class SVM, sentiment analysis, support vector machine

Procedia PDF Downloads 481

11058 A Method to Evaluate and Compare Web Information Extractors

Authors: Patricia Jiménez, Rafael Corchuelo, Hassan A. Sleiman

Abstract:

Web mining is gaining importance at an increasing pace. Currently, there are many complementary research topics under this umbrella. Their common theme is that they all focus on applying knowledge discovery techniques to data gathered from the Web. Sometimes these data are relatively easy to gather, chiefly when they come from server logs. Unfortunately, there are cases in which the data to be mined are the data displayed on a web document. In such cases, a pre-processing step is needed to first extract the information of interest from the web documents. Such pre-processing steps are performed using so-called information extractors, which are software components typically configured by means of rules tailored to extracting the information of interest from a web page and structuring it according to a pre-defined schema. Paramount to getting good mining results is that the technique used to extract the source information is exact, which requires evaluating and comparing the different proposals in the literature from an empirical point of view. According to Google Scholar, about 4 200 papers on information extraction have been published during the last decade. Unfortunately, they were not evaluated within a homogeneous framework, which makes it difficult to compare them empirically. In this paper, we report on an original information extraction evaluation method. Our contribution is three-fold: a) this is the first attempt to provide an evaluation method for proposals that work on semi-structured documents; the little existing work on this topic focuses on proposals that work on free text, which has little to do with extracting information from semi-structured documents; b) it provides a method that relies on statistically sound tests to support the conclusions drawn; the previous work does not provide clear guidelines or recommend statistically sound tests, but rather a survey that collects many features to take into account as well as related work; c) we provide a novel method to compute the performance measures of unsupervised proposals, which would otherwise require a user to compute them from the annotations on the evaluation sets and the information extracted. Our contributions will help researchers in this area make sure that they have advanced the state of the art not only conceptually but also from an empirical point of view; they will also help practitioners make informed decisions on which proposal is the most adequate for a particular problem. This conference is a good forum to discuss our ideas so that we can spread them to help improve the evaluation of information extraction proposals and gather valuable feedback from other researchers.
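The abstract does not say which performance measures are used, so the following is only a sketch under the assumption that they are the usual precision, recall, and F1, computed by comparing extracted (slot, value) pairs with the annotations of an evaluation set. The function name and the pair representation are hypothetical, and the statistical tests the paper relies on are not shown.

```python
def precision_recall_f1(extracted, annotated):
    """Compare extracted (slot, value) pairs with gold annotations.

    Both arguments are iterables of hashable items; exact matching is
    assumed, which is a simplification.
    """
    extracted, annotated = set(extracted), set(annotated)
    true_positives = len(extracted & annotated)

    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only: one extractor's output against one annotated page.
extracted = {("title", "A"), ("price", "10"), ("price", "12")}
annotated = {("title", "A"), ("price", "10"), ("author", "X"), ("year", "2001")}
precision, recall, f1 = precision_recall_f1(extracted, annotated)
```

Per-document scores of this kind are the sort of samples one would then feed into a statistical test when comparing two extractors.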

Keywords: web information extractors, information extraction evaluation method, Google scholar, web

Procedia PDF Downloads 226

11057 BIM-Based Tool for Sustainability Assessment and Certification Documents Provision

Authors: Taki Eddine Seghier, Mohd Hamdan Ahmad, Yaik-Wah Lim, Samuel Opeyemi Williams

Abstract:

Assessing building sustainability against a specific green benchmark and preparing the documents required for green building certification are both considered major challenges for a green building design team. However, this labor-intensive and time-consuming process can take advantage of available Building Information Modeling (BIM) features such as material take-off and scheduling. Furthermore, the workflow can be automated to track potentially achievable credit points and provide rating feedback for several design options by using integrated Visual Programming (VP) to handle the parameters stored within the BIM model. Hence, this study proposes a BIM-based tool that uses the Green Building Index (GBI) rating system requirements as a unique input case to evaluate building sustainability in the design stage of the building project life cycle. The tool covers two key models: first, a model for data extraction, calculation, and the classification of achievable credit points in a green template; second, a model for generating the documents required for green building certification. The tool was validated on a BIM model of a residential building, and it serves as a proof of concept that building sustainability assessment for GBI certification can be automatically evaluated and documented through BIM.

Keywords: green building rating system, GBRS, building information modeling, BIM, visual programming, VP, sustainability assessment

Procedia PDF Downloads 302

11056 Enhanced Arabic Semantic Information Retrieval System Based on Arabic Text Classification

Authors: A. Elsehemy, M. Abdeen, T. Nazmy

Abstract:

Since the appearance of the Semantic Web, many semantic search techniques and models have been proposed to exploit the information in ontologies to enhance traditional keyword-based search. Many advances have been made for languages such as English, German, French and Spanish. However, other languages such as Arabic are not yet fully supported. In this paper, we present a framework for ontology-based information retrieval for the Arabic language. Our system consists of four main modules, namely a query parser, an indexer, a search module and a ranking module. Our approach includes building a semantic index by linking ontology concepts to documents, including an annotation weight for each link, to be used in ranking the results. We also augmented the framework with an automatic document categorizer, which enhances the overall document ranking. We have built three Arabic domain ontologies, Sports, Economic and Politics, as examples for the Arabic language. We built a knowledge base that consists of 79 classes and more than 1456 instances. The system is evaluated using the precision and recall metrics. We ran many retrieval operations on a sample of 40,316 documents comprising 320 MB of pure text. The results show that semantic search enhanced with text classification performs better than the system without classification.
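The idea of a semantic index whose concept-to-document links carry annotation weights can be sketched as follows. This is a minimal illustration, assuming that a document's score is simply the sum of the weights of the query concepts linked to it; the abstract does not give the actual ranking formula, and all names and data here are hypothetical.

```python
from collections import defaultdict


def build_semantic_index(annotations):
    """Build concept -> {doc_id: weight} from (concept, doc_id, weight) links."""
    index = defaultdict(dict)
    for concept, doc_id, weight in annotations:
        index[concept][doc_id] = weight
    return index


def rank(index, query_concepts):
    """Score each document by summing the weights of matching concept links."""
    scores = defaultdict(float)
    for concept in query_concepts:
        for doc_id, weight in index.get(concept, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative links only; real concepts would come from the domain ontologies.
annotations = [
    ("Football", "d1", 0.9),
    ("Football", "d2", 0.4),
    ("Election", "d2", 0.8),
    ("Inflation", "d3", 0.7),
]
index = build_semantic_index(annotations)
results = rank(index, ["Football", "Election"])
```

In the described system, the query parser would map query words to ontology concepts before this lookup, and the document categorizer would further adjust the ranking.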

Keywords: Arabic text classification, ontology based retrieval, Arabic semantic web, information retrieval, Arabic ontology

Procedia PDF Downloads 499

11055 Model-Based Field Extraction from Different Class of Administrative Documents

Authors: Jinen Daghrir, Anis Kricha, Karim Kalti

Abstract:

The volume of incoming administrative documents is massive, and processing these documents manually is costly, especially in terms of time. This problem has led to a significant amount of research and development on automatically extracting fields from administrative documents, in order to reduce costs and increase citizen satisfaction with administrations. In this context, we introduce an administrative document understanding system. Given a document in which a user selects the fields to be retrieved from a document class, a document model is built automatically. A document model is represented by an attributed relational graph (ARG) in which nodes represent the fields to extract and edges represent the relations between them. Both vertices and edges carry feature vectors. When another document arrives in the system, its layout objects are extracted and an ARG is generated. Field extraction is thus translated into a problem of matching two ARGs, which relies mainly on comparing the spatial relationships between layout objects. Experiments on eight document classes yield accuracy rates from 75% to 100%. The proposed method performs well considering that the document model is constructed from a single document.

Keywords: administrative document understanding, logical labelling, logical layout analysis, fields extraction from administrative documents

Procedia PDF Downloads 184

11054 Providing a Secure, Reliable and Decentralized Document Management Solution Using Blockchain by a Virtual Identity Card

Authors: Meet Shah, Ankita Aditya, Dhruv Bindra, V. S. Omkar, Aashruti Seervi

Abstract:

In today's world, we need documents everywhere for a smooth workflow in identification processes and other security matters. Current identification systems and techniques need one thing, namely ‘proof of existence’, which involves valid documents, for example, educational or financial ones. The main issue with the current identity access management system and digital identification process is that the system is centralized in its network, which makes it inefficient. This paper presents a system that addresses these issues. It is based on ‘blockchain’ technology, which is a 'decentralized system' that allows transactions in a decentralized and immutable manner. The primary notion of the model is to ‘have everything with nothing’. It involves linking a person's required documents to a single identity card so that the person can go anywhere without carrying the required documents. The person just needs to be physically present at the place where the documents are necessary, and the rest of the verification proceeds using a fingerprint impression and an iris scan. Furthermore, some technical overheads and advancements are listed. This paper also aims to lay out a far-sighted scenario for blockchain and its impact on future trends.
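The immutability the abstract attributes to blockchain comes from each block storing the hash of its predecessor. The sketch below illustrates only that property, using a single in-memory list and SHA-256 from the standard library. It assumes a hypothetical block layout, involves no network, consensus, or biometric verification, and is not the paper's design.

```python
import hashlib
import json


def make_block(index, document_digest, prev_hash):
    """Create a block whose hash covers its index, payload and predecessor."""
    block = {"index": index, "document_digest": document_digest, "prev_hash": prev_hash}
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block


def build_chain(documents):
    """Link the SHA-256 digests of raw document bytes into a hash chain."""
    chain, prev_hash = [], "0" * 64      # all-zero hash marks the first block
    for i, raw in enumerate(documents):
        digest = hashlib.sha256(raw).hexdigest()
        block = make_block(i, digest, prev_hash)
        chain.append(block)
        prev_hash = block["hash"]
    return chain


def chain_is_valid(chain):
    """Recompute every hash and check every link to the previous block."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(payload).hexdigest() != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```

Only digests are stored in the blocks here, so the documents themselves would live elsewhere; altering any stored digest after the fact makes `chain_is_valid` return `False`.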

Keywords: blockchain, decentralized system, fingerprint impression, identity management, iris scan

Procedia PDF Downloads 101

11053 Methodologies for Deriving Semantic Technical Information Using an Unstructured Patent Text Data

Authors: Jaehyung An, Sungjoo Lee

Abstract:

Patent documents constitute an up-to-date and reliable source of knowledge reflecting technological advances, so patent analysis has been widely used to identify technological trends and formulate technology strategies. However, identifying technological information from patent data has limitations such as high cost, complexity, and inconsistency because it relies on experts’ knowledge. To overcome these limitations, researchers have applied quantitative analysis based on the keyword technique. This method can capture the technological implications of particular patent documents or extract keywords that indicate the important content. However, it relies on simple counting of keyword frequency, so it cannot take into account the semantic relationships between keywords or semantic information such as how technologies are used in their technology area and how they affect other technologies. To automatically analyze unstructured technological information in patents and extract semantic information, the text should be transformed into an abstracted form that includes the key technological concepts. The sentence structure ‘SAO’ (subject, action, object) has recently emerged as a way to represent ‘key concepts’ and can be extracted by natural language processing (NLP). An SAO structure can be organized in a problem-solution format if the action-object (AO) states the problem and the subject (S) forms the solution. In this paper, we propose a new methodology that extracts SAO structures through rules for extracting technical elements. Although sentence structures in patent text have a unique format, prior studies have depended on general NLP applied to common documents such as newspapers, research papers, and Twitter mentions, so they cannot take into account the sentence structure types specific to patent documents. To overcome this limitation, we identified the unique forms of patent sentences and defined the SAO structures in patent text data. There are four types of technical elements: technology adoption purpose, application area, tool for technology, and technical components. Each of these four types of sentence structure has its own word structure, determined by the location or sequence of parts of speech in each sentence. Finally, we developed algorithms for extracting SAOs, and the results offer insight into the technology innovation process by providing different perspectives on technology.

Keywords: NLP, patent analysis, SAO, semantic-analysis

Procedia PDF Downloads 243

11052 Contribution of a Higher Education Institute towards Built Environment Sustainability

Authors: Tayyab Ahmad, Gerard Healey

Abstract:

The potential role of higher education institutes in sustainable development cannot be underestimated. In this regard, it is important to investigate the established concept of sustainability in such institutes to explore the room for further improvement. In this paper, a case study of the University of Melbourne is conducted, and the institute’s commitments towards sustainability are examined through a detailed qualitative review of its policy and design standard documents. These documents are reviewed because, through them, the institute portrays its vision of the built environment facilities it aspires to procure and use. The detailed review shows that these documents are updated at different times, creating the potential for mismatch between them. The occurrence of different goals and objectives in different documents is highlighted, and the interrelationships between different goals and operational objectives are explored. The role of the university’s aspired goals and objectives in terms of built environment sustainability is discussed, and the gaps in the articulation of goals and operational objectives are highlighted. Recommendations are provided for enhancing the built environment sustainability at the University of Melbourne.

Keywords: university, design standards, policy, sustainability, built environment

Procedia PDF Downloads 146

11051 Bridging the Digital Divide in India: Issues and Challenges

Authors: Parveen Kumar

Abstract:

To cope with the rapid change of technology and to control the ephemeral rate of information generation, librarians and their professional colleagues need to equip themselves for the requirements of the electronic information society. E-learning is based purely on computer and communication technologies and is also known by terms such as computer-based learning. It is the delivery of content via all electronic media, including the Internet, intranets, extranets, television broadcasts, and CD-ROM documents. E-learning poses many issues in the transformation of literature or knowledge from the conventional medium to ICT-based formats and web-based services.

Keywords: e-learning, digital libraries, online learning, electronic information society

Procedia PDF Downloads 483

11050 Progressive Multimedia Collection Structuring via Scene Linking

Authors: Aman Berhe, Camille Guinaudeau, Claude Barras

Abstract:

In order to facilitate information seeking in large collections of multimedia documents with long and progressive content (such as broadcast news or TV series), one can extract the semantic links that exist between semantically coherent parts of documents, i.e., scenes. The links can then create a coherent collection of scenes from which it is easier to perform content analysis, topic extraction, or information retrieval. In this paper, we focus on TV series structuring and propose two approaches for scene linking at different levels of granularity (episode and season): a fuzzy online clustering technique and a graph-based community detection algorithm. When evaluated on the first two seasons of the TV series Game of Thrones, the fuzzy online clustering approach performed better than graph-based community detection at the episode level, while graph-based approaches performed better at the season level.

Keywords: multimedia collection structuring, progressive content, scene linking, fuzzy clustering, community detection

Procedia PDF Downloads 70

11049 A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction

Authors: Smita Pallavi, Raj Ratn Pranesh, Sumit Kumar

Abstract:

Representing information as tables is a compact and concise method that eases searching, indexing, and storage. Extracting and cloning tables from parsable documents is easier and widely used; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from an OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to the corresponding cell in a dataframe, which can be stored as comma-separated values, in a database, as an Excel file, or in many other usable formats.

Keywords: table extraction, optical character recognition, image processing, text extraction, morphological transformation

Procedia PDF Downloads 117

11048 Framework for Detecting External Plagiarism from Monolingual Documents: Use of Shallow NLP and N-Gram Frequency Comparison

Authors: Saugata Bose, Ritambhra Korpal

Abstract:

The internet has increased copy-paste behavior among students as well as researchers, leading to different levels of plagiarized documents. For this reason, much research focuses on detecting plagiarism automatically. In this paper, an initiative is discussed in which Natural Language Processing (NLP) techniques and supervised machine learning algorithms are combined to detect plagiarized texts. The major emphasis is on constructing a framework that successfully detects external plagiarism in monolingual texts. To detect plagiarism, an n-gram frequency comparison approach has been implemented to construct the model framework. The framework is based on 120 characteristics extracted while pre-processing the documents with the NLP approach. Afterwards, filter metrics are applied to select the most relevant characteristics, and a supervised classification learning algorithm is then used to classify the documents into four levels of plagiarism. A confusion matrix was built to estimate the false positives and false negatives. Our plagiarism framework achieved a very high accuracy score.
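An n-gram frequency comparison can be illustrated with word trigrams and a containment score, i.e., the share of the suspicious text's n-grams that also occur in the source. This is a sketch under assumptions: whitespace tokenization, lowercasing, and containment as the comparison measure are choices made here for illustration, not details taken from the paper, whose framework also uses 120 extracted characteristics and a supervised classifier.

```python
from collections import Counter


def word_ngrams(text, n=3):
    """Count the word n-grams of a text after lowercasing and splitting."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also appear in the source."""
    suspicious_grams = word_ngrams(suspicious, n)
    source_grams = word_ngrams(source, n)
    total = sum(suspicious_grams.values())
    if total == 0:
        return 0.0                       # text shorter than n words
    # Counter intersection keeps the minimum count of each shared n-gram.
    shared = sum((suspicious_grams & source_grams).values())
    return shared / total


source = "the quick brown fox jumps over the lazy dog"
suspicious = "the quick brown fox sleeps"
score = containment(suspicious, source)
```

A score like this could serve as one input feature for a classifier that assigns a document to one of several plagiarism levels.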

Keywords: lexical matching, shallow NLP, supervised machine learning algorithm, word n-gram

Procedia PDF Downloads 332

11047 Towards Learning Query Expansion

Authors: Ahlem Bouziri, Chiraz Latiri, Eric Gaussier

Abstract:

The steady growth in the size of textual document collections is a key progress driver for modern information retrieval techniques, whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, hampering their efficient exploitation by the user. In addition, retaining only relevant documents in a query answer is of paramount importance for effectively meeting the user's needs. In this situation, the query expansion technique offers an interesting solution for obtaining a complete answer while preserving the quality of the retained documents. This mainly relies on an accurate choice of the terms added to an initial query. Interestingly enough, query expansion takes advantage of large text volumes by extracting statistical information about index term co-occurrences and using it to make user queries better fit the real information needs. In this respect, a promising track consists in applying data mining methods to extract dependencies between terms, namely a generic basis of association rules between terms. The key feature of our approach is a better trade-off between the size of the mining result and the conveyed knowledge. Thus, faced with the huge number of derived association rules, and in order to select the optimal combination of query terms from the generic basis, we propose to model the problem as a classification problem and solve it using a supervised learning algorithm such as SVM or k-means. For this purpose, we first generate a training set using a genetic-algorithm-based approach that explores the association rule space in order to find an optimal set of expansion terms, improving the MAP of the search results. The experiments were performed on the SDA 95 collection, a data collection for information retrieval. The results were better in terms of both MAP and NDCG.
The main observation is that the hybridization of text mining techniques and query expansion in an intelligent way allows us to incorporate the good features of all of them. As this is a preliminary attempt in this direction, there is a large scope for enhancing the proposed method.
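How association rules between terms can drive query expansion is sketched below. Note the substitution: the paper selects expansion terms with a supervised classifier trained on data produced by a genetic algorithm, whereas this illustration uses a plain confidence threshold instead. The rule format, the function name, and the parameters are all hypothetical.

```python
def expand_query(query_terms, rules, min_confidence=0.5, max_new_terms=3):
    """Add consequents of rules whose antecedent is contained in the query.

    `rules` is an iterable of (antecedent_terms, consequent_term, confidence).
    A simple confidence cut-off stands in for the learned selection step.
    """
    query = set(query_terms)
    candidates = {}
    for antecedent, consequent, confidence in rules:
        if confidence < min_confidence:
            continue
        if set(antecedent) <= query and consequent not in query:
            # Keep the best confidence seen for each candidate term.
            candidates[consequent] = max(confidence, candidates.get(consequent, 0.0))
    best = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in best[:max_new_terms]]


# Illustrative rules only; real ones would be mined from the document collection.
rules = [
    (("car",), "vehicle", 0.9),
    (("car", "rental"), "hire", 0.7),
    (("car",), "wheel", 0.3),
    (("boat",), "ship", 0.95),
]
expanded = expand_query(["car", "rental"], rules)
```

Replacing the threshold with a trained classifier, as the abstract proposes, would amount to swapping the `confidence < min_confidence` test for a model's prediction on each candidate rule.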

Keywords: supervised learning, classification, query expansion, association rules

Procedia PDF Downloads 300

11046 Wasting Human and Computer Resources

Authors: Mária Csernoch, Piroska Biró

Abstract:

The legends about “user-friendly” and “easy-to-use” birotical tools (computer-related office tools) have been spreading and misleading end-users. This approach has led to an extremely high number of incorrect documents, causing serious financial losses in the creating, modifying, and retrieving processes. Our research showed that there are at least two sources of this underachievement: (1) The lack of a definition of correctly edited, formatted documents. Consequently, end-users do not know whether their methods and results are correct or not. They are not aware of their ignorance; indeed, their ignorance prevents them from realizing their lack of knowledge. (2) The end-users’ problem-solving methods. We have found that in non-traditional programming environments end-users apply, almost exclusively, surface approach metacognitive methods to carry out their computer-related activities, which have proved less effective than deep approach methods. Based on these findings, we have developed deep approach methods which are based on and adapted from traditional programming languages. In this study, we focus on the most popular type of birotical documents, text-based documents. We have provided a definition of correctly edited text and, based on this definition, adapted the debugging method known from programming. According to the method, before any actual text editing, existing texts are thoroughly debugged and their errors categorized. In this way, before real text editing, users learn the requirements of text-based documents and of correctly formatted text. The method has proved much more effective than the previously applied surface approach methods. Its advantages are that real text handling requires far fewer human and computer resources than clicking aimlessly in the GUI (Graphical User Interface), and that data retrieval is much more effective than from error-prone documents.

Keywords: deep approach metacognitive methods, error-prone birotical documents, financial losses, human and computer resources

Procedia PDF Downloads 360

11045 Precarious ID Cards - Studying Documentary Practices in India through the Lens of Internal Migration

Authors: Ambuja Raj

Abstract:

This research will attempt to understand how documents are materially indispensable civic artifacts for migrants in their encounters with the state. Documents such as ID cards are sites of mediation and bureaucratic manifestation which reveal the inherent dynamics of power between the state and a delocalized people. While ID cards allow the holder to retain a different identity and articulate their demands as a citizen, they at the same time transform subjects into ‘objects’ in the exercise of governmental power. The research is based on the study of internal migrants in India, who are ‘visible’ to the state through its host of ID documents such as the ‘Aadhaar card’, electoral IDs, Ration cards, and a variety of region-specific documents, without the possession of which, not only are they unable to access jobs, public goods and services, and accommodation, but are liable to exploitation from state forces and mediators. Through semi-structured interviews with social actors in the processes of documentation and welfare of migrants, as well as with settlements of migrants themselves located in the state of Kerala in India, the thesis will attempt to understand the salience of documentary practices in the lives of inter-state migrants who move within Indian states in the hope of bettering their economic conditions. The research will trace the material and evolving significance of ID cards in the tenacity of states dealing with these ‘illegible’ populations. It will try to bring theories of governmentality, biopolitics and Weberian bureaucracy into the migrant issue while critically grounding itself on secondary literature by scholars who have worked on South Asian ‘governments of paper’.

Keywords: migration, historiography of documents, anthropology of state, documentary practices

Procedia PDF Downloads 166

11044 Documents Emotions Classification Model Based on TF-IDF Weighting Measure

Authors: Amr Mansour Mohsen, Hesham Ahmed Hassan, Amira M. Idrees

Abstract:

Emotion classification of text documents is applied to reveal whether a document expresses a particular emotion of its writer. Since different supervised methods have previously been used for classifying emotions in documents, in this research we present a novel model that supports the classification algorithms in producing more accurate results by means of the TF-IDF measure. Different experiments were conducted to demonstrate the applicability of the proposed model. The model succeeds in raising accuracy according to the chosen metrics (precision, recall, and f-measure) by refining the lexicon, integrating lexicons from different perspectives, and applying the TF-IDF weighting measure to the classification features. The proposed model has also been compared with other research to demonstrate its competence in raising the accuracy of the results.
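For readers unfamiliar with the weighting measure, the sketch below computes one common TF-IDF variant: term frequency normalized by document length, multiplied by the natural logarithm of the inverse document frequency. The abstract does not state which variant the model uses, so this choice, the function name, and the toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    weight = (count / document length) * ln(N / document frequency)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))     # count each term once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy tokenized documents; real features would come from an emotion lexicon.
documents = [
    ["happy", "joy", "happy"],
    ["sad", "joy"],
    ["angry", "sad"],
]
weights = tf_idf(documents)
```

Terms that occur in every document receive a weight of zero under this variant, which is why TF-IDF emphasizes words that distinguish one document, or one emotion class, from the others.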

Keywords: emotion detection, TF-IDF, WEKA tool, classification algorithms

Procedia PDF Downloads 446

11043 Investigation of Topic Modeling-Based Semi-Supervised Interpretable Document Classifier

Authors: Dasom Kim, William Xiu Shun Wong, Yoonjin Hyun, Donghoon Lee, Minji Paek, Sungho Byun, Namgyu Kim

Abstract:

There has been much research on document classification for automatically classifying voluminous documents. Through document classification, we can assign a specific category to each unlabeled document on the basis of various machine learning algorithms. However, providing labeled documents manually requires considerable time and effort. To overcome this limitation, semi-supervised learning, which uses unlabeled documents as well as labeled documents, has been introduced. However, traditional document classifiers, whether supervised or semi-supervised, cannot sufficiently explain the reason for or the process of the classification. Thus, in this paper, we propose a methodology to visualize the major topics and class components of each document. We believe that our methodology for visualizing the topics and classes of each document can enhance the reliability and explanatory power of document classifiers.

Keywords: data mining, document classifier, text mining, topic modeling

Procedia PDF Downloads 363

11042 The Passive Recipient – How the Pupil Comes across in Local Swedish Health Policy Documents

Authors: Zofia Hammerin, Goran Basic, Disa Bergnehr

Abstract:

Ever since the Ottawa Charter in 1986, health promotion through schools has been stressed across the globe. In both the global and the national discourse, schools are made responsible not only for providing education but also for working with pupil health and well-being. In Sweden, where the study is set, it is emphasized in national directives that promoting pupil health should be part of the school practice. Since the Swedish school system is decentralized, these directives need to be interpreted and recontextualized locally. This study aims to explore how the student comes across in Swedish local health policy documents. The data consist of 37 such documents, called student health plans, collected from different high schools throughout Sweden. The analysis was inspired by critical discourse analysis, and tentative results are divided into two main themes: the invisible actor and the passive recipient. The pupil is largely invisible in the documents, and the discourse instead focuses on school health service staff and, to some extent, the teachers. When the pupils are visible, they mainly come across as passive recipients of health promoting actions. Since participation, taking action, and feeling empowered are key aspects of health promotion, the findings could impact the pupils’ possibilities for health and well-being.

Keywords: health promotion, high school, student, sweden

Procedia PDF Downloads 76

11041 Authentication of Physical Objects with Dot-Based 2D Code

Authors: Michał Glet, Kamil Kaczyński

Abstract:

Counterfeit goods and documents are a global problem that requires increasingly sophisticated countermeasures. Existing techniques using watermarking or embedding symbols on objects are not suitable for all use cases. To address those special needs, we created a complete system for authenticating paper documents and physical objects with flat surfaces. Objects are marked with 2D graphic codes, named DotAuth, that are orientation-independent and resistant to camera noise. Based on the identifier stored in the 2D code, the system can perform basic authentication and supports more sophisticated analysis methods, e.g., relying on augmented reality and physical properties of the object. In this paper, we present the complete architecture, algorithms, and applications of the proposed system. A feature comparison between the proposed solution and other products is presented as well, pointing to many advantages that increase usability and efficiency in protecting physical objects.

Keywords: anti-forgery, authentication, paper documents, security

Procedia PDF Downloads 105

11040 Building Information Modelling in Eastern Province Municipality of KSA

Authors: Banan Aljumaiah

Abstract:

In recent years, the construction industry has leveraged the information revolution: with the advent of Building Information Modelling (BIM), it is possible to view the entire construction process of new buildings before they are built. Although BIM integrates the building model with the data and documents about the building, its implementation is limited to individual buildings and misses the larger picture of the city infrastructure. This limitation of BIM led to the birth of City Information Modelling. Three years ago, Eastern Province Municipality (EPM) in Saudi Arabia mandated that all major projects be delivered with collaborative 3D BIM. After three years of implementation, EPM started to implement City Information Modelling (CIM) as a part of the Smart City Plan to link infrastructure and public services and to model how people move around and interact with the city. This paper presents a local case study of BIM implementation in EPM and its future as a part of project management automation; the paper also highlights the ambitious plan of EPM to transform CIM towards building smart cities.

Keywords: BIM, BIM to CIM

Procedia PDF Downloads 103

11039 Direct Blind Separation Methods for Convolutive Images Mixtures

Authors: Ahmed Hammed, Wady Naanaa

Abstract:

In this paper, we propose a general approach to the problem of a convolutive mixture of images. We use a direct blind source separation method that adds a single non-statistically justified constraint describing the relationships between the different mixing matrices, with the aim of making the problem easier to solve. This method can be applied, provided that this constraint is known, to degraded documents affected by the overlapping of text patterns and images. This overlapping is due to chemical and physical reactions of the materials (paper, inks, ...) occurring as the documents age, and to other unpredictable causes such as humidity, microorganism infestation, human handling, etc. We will demonstrate that this problem corresponds to a convolutive mixture of images. Subsequently, we will validate our method through numerical examples. We can thus obtain clear images from unreadable ones, where the unreadability can be caused by page superposition, a phenomenon that we find very often in archival documents.

Keywords: blind source separation, convoluted mixture, degraded documents, text-patterns overlapping

Procedia PDF Downloads 299

11038 Structural Challenges of Social Integration of Immigrants in Iran: Investigating the Status of Providing Citizenship and Social Services

Authors: Iman Shabanzadeh

Abstract:

Owing to its geopolitical position, Iran has been one of the main centers of migration movements in the world in recent decades. However, policy makers' lack of preparation for completing the cycle of social integration of these immigrants, especially the second and third generations, has left these people prone to leaving the country and immigrating to developed and industrialized countries. In this research, the integration of immigrants in Iran is analyzed from the perspective of four indicators: "Identity Documents", "Access to Banking Services", "Access to Health and Treatment Services" and "Obtaining a Driver's License". The research method is descriptive-analytical. To collect information, library and document sources on the laws and regulations related to immigrants' rights in Iran, as well as semi-structured interviews with experts, have been used. The findings of this study show that none of the residence documents of immigrants in Iran guarantees them the full enjoyment of basic citizenship rights. In fact, the function of many of these identity documents, such as the census card, educational support card, etc., is only to prevent crossing the border, and none of them guarantees the basic rights of citizenship. Therefore, for many immigrants, the difference between legality and illegality lies only in the risk of crossing the border, and this has led to the spread of the habit of illegal presence among them. Despite this, there appears to be no clear and coherent policy framework around the issue of foreign immigrants in the country. This policy incoherence can be clearly seen in the diversity and plurality of identity and legal documents of the citizens present in the country and in the policy makers' lack of planning to integrate and organize the identity of this large group. Examining the differences and socioeconomic inequalities between immigrants and the native Iranian population shows that immigrants have been poorly integrated into the structures of Iranian society from an economic and social point of view.

Keywords: immigrants, social integration, citizen services, structural inequality

Procedia PDF Downloads 23

11037 Resume Ranking Using Custom Word2vec and Rule-Based Natural Language Processing Techniques

Authors: Subodh Chandra Shakya, Rajendra Sapkota, Aakash Tamang, Shushant Pudasaini, Sujan Adhikari, Sajjan Adhikari

Abstract:

Considerable effort has gone into measuring the semantic similarity between the text corpora in documents, and techniques have evolved to measure the similarity of two documents. One such state-of-the-art technique in the field of Natural Language Processing (NLP) is the word-to-vector model, which converts words into their word embeddings and measures the similarity between the vectors. We found this to be quite useful for the task of resume ranking. This research paper therefore implements the word2vec model along with other Natural Language Processing techniques to rank resumes against a particular job description so as to automate the process of hiring. The paper presents the proposed system and the findings made while building it.

Keywords: chunking, document similarity, information extraction, natural language processing, word2vec, word embedding

Procedia PDF Downloads 129
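
As an illustration of the embedding-based ranking described in the resume-ranking abstract above, the following is a minimal sketch, not the authors' system. It assumes a tiny hand-written embedding table (`EMBEDDINGS`, hypothetical) in place of a trained custom word2vec model, represents each document as the average of its word vectors, and ranks resumes by cosine similarity to the job description. All names and values are illustrative assumptions.

```python
import math

DIM = 3  # dimensionality of the toy vectors (assumption for illustration)

# Hypothetical toy embeddings; a real system would load vectors
# from a trained word2vec model instead of this hand-written table.
EMBEDDINGS = {
    "python": [0.9, 0.1, 0.0],
    "java": [0.8, 0.2, 0.1],
    "developer": [0.7, 0.3, 0.0],
    "nurse": [0.0, 0.9, 0.2],
    "patient": [0.1, 0.8, 0.3],
    "care": [0.0, 0.7, 0.4],
}


def document_vector(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_resumes(job_description, resumes):
    """Return (name, score) pairs sorted from most to least similar."""
    job_vec = document_vector(job_description)
    scored = [
        (name, cosine(job_vec, document_vector(text)))
        for name, text in resumes.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


resumes = {
    "alice": "python developer",
    "bob": "nurse patient care",
}
ranking = rank_resumes("java developer", resumes)
```

With these toy vectors, the resume that shares the software vocabulary ranks first. Averaging word vectors is only one simple way to obtain a document vector; the abstract does not say which aggregation or rule-based steps the authors combine with word2vec.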

11036 An Exploration of Policy-related Documents on District Heating and Cooling in Flanders: a Slow and Bottom-up Process

Authors: Isaura Bonneux

Abstract:

District heating and cooling (DHC) is increasingly recognized as a viable path towards sustainable heating and cooling. While some countries like Sweden and Denmark have a longstanding tradition of DHC, Belgium is lagging behind. The northern part of Belgium, Flanders, had a total of only 95 heating networks in July 2023. Nevertheless, it is increasingly exploring possibilities to enhance the scope of DHC. DHC is a complex energy system, requiring a lot of collaboration between various stakeholders on various levels. Therefore, it is of interest to look closer at policy-related documents at the Flemish (regional) level, as these policies set the scene for DHC development in the Flemish region. This kind of analysis has not been undertaken so far. This paper has the following research question: “Who talks about DHC, and in which way and context is DHC discussed in Flemish policy-related documents?” To answer this question, the Overton policy database was used to search and retrieve relevant policy-related documents. Overton retrieves data from governments, think tanks, NGOs, and IGOs. In total, out of the 244 original results, 117 documents between 2009 and 2023 were analyzed. Every selected document included theme keywords, policymaking department(s), date, and document type. These elements were used for quantitative data description and visualization. Further, qualitative content analysis revealed patterns and main themes regarding DHC in Flanders. Four main conclusions can be drawn. First, it is obvious from the timeframe that DHC is a new topic in Flanders with still limited attention; 2014, 2016 and 2017 were the years with the most documents, yet this number is still only 12 documents. In addition, many documents talked about DHC but not in much depth and painted it as a future scenario with a lot of uncertainty around it. Most of the issuing government departments had a link to either energy or climate (e.g. Flemish Environmental Agency) or policy (e.g. Socio-Economic Council of Flanders). Second, DHC is mentioned most within an ‘Environment and Sustainability’ context, followed by ‘General Policy and Regulation’. This is intuitive, as DHC is perceived as a sustainable heating and cooling technique and this analysis comprises policy-related documents. Third, Flanders seems mostly interested in using waste or residual heat as a heating source for DHC. The harbors and waste incineration plants are identified as potential and promising supply sources. This approach tries to reconcile environmental and economic incentives. Last, local councils are assigned a central role, and the initiative is mostly taken by them. The policy documents and policy advice demonstrate that Flanders opts for a bottom-up organization. As DHC is very dependent on local conditions, this seems a logical step. Nevertheless, this can impede smaller councils from creating DHC networks and slow down systematic and fast implementation of DHC throughout Flanders.

Keywords: district heating and cooling, Flanders, Overton database, policy analysis

Procedia PDF Downloads 13

11035 Psychodidactic Strategies to Facilitate Flow of Logical Thinking in Preparation of Academic Documents

Authors: Deni Stincer Gomez, Zuraya Monroy Nasr, Luis Pérez Alvarez

Abstract:

The preparation of academic documents such as theses, articles and research projects is one of the requirements of higher education. These documents demand logical argumentative thinking, which is experienced and executed with difficulty. To mitigate these difficulties, this study designed a thesis seminar, with which the authors have seven years of experience. It is taught in a graduate program in Psychology at the National Autonomous University of Mexico. In this study the authors use the Toulmin model as a mental heuristic and as the basis for a set of psychodidactic strategies that facilitate the elaboration of the plot and the completion of the thesis. The efficiency in obtaining the degree in the groups exposed to the seminar has increased by 94% compared to the 10% that existed in the generations that were not exposed to the seminar. In this article the authors emphasize the psychodidactic strategies used. The Toulmin model alone does not guarantee the success achieved. A set of psychological (almost psychotherapeutic) and didactic actions by the teacher also seems to contribute. These actions derive from an understanding of the psychological, epistemological and ontogenetic obstacles and of the most frequent errors into which thought tends to fall when a logical course is demanded of it. The authors have grouped the strategies into three groups: 1) strategies to facilitate logical thinking, 2) strategies to strengthen the scientific self and 3) strategies to facilitate the act of writing the text. In this work the authors delve into each of them.

Keywords: psychodidactic strategies, logical thinking, academic documents, Toulmin model

Procedia PDF Downloads 155

11034 Assessing the Current State of Software Engineering and Information Technology in Ghana

Authors: David Yartel

Abstract:

Drawing on the current state of software engineering and information technology in Ghana, the study documents its significant contribution to the development of Ghanaian industries. The study focuses on the application of modern trends in technology and the barriers faced in the area of software engineering and information technology. A thorough analysis of a dozen interviews with stakeholders in software engineering and information technology reveals how modern trends in software engineering pose challenges to the industry in Ghana. Results show that, to meet the expectations of modern software engineering and information technology trends, stakeholders must have skilled professionals, adequate infrastructure, and enhanced support for technology startups. In addition, individuals should be encouraged to pursue a career in software engineering and information technology, as it has the potential to increase the efficiency and effectiveness of work-related activities. This study recommends that stakeholders in the software engineering and technology industries invest in training more professionals by collaborating with international institutions well versed in the area to organize frequent training sessions and seminars. The government should also provide funding opportunities for small businesses in the technology sector to drive creativity and bring about growth and development.

Keywords: software engineering, information technology, Ghana, development

Procedia PDF Downloads 59

11033 Semantic Indexing Improvement for Textual Documents: Contribution of Classification by Fuzzy Association Rules

Authors: Mohsen Maraoui

Abstract:

With the aim of improving natural language processing applications such as information retrieval, machine translation, and lexical disambiguation, we focus on a statistical approach to semantic indexing for multilingual text documents based on the conceptual network formalism. We propose to use this formalism as an indexing language to represent the descriptive concepts and their weighting. These concepts represent the content of the document. Our contribution is based on two steps. In the first step, we propose the extraction of index terms using the multilingual lexical resource Euro WordNet (EWN). In the second step, we pass from the representation of index terms to the representation of index concepts through the conceptual network formalism. This network is generated using the EWN resource and passes through a classification step based on an association rules model (in an attempt to discover the non-taxonomic, or contextual, relations between the concepts of a document). These relations are latent relations buried in the text and carried by the semantic context of the co-occurrence of concepts in the document. Our proposed indexing approach can be applied to text documents in various languages because it is based on a linguistic method adapted to the language through a multilingual thesaurus. Next, we apply the same statistical process regardless of the language in order to extract the significant concepts and their associated weights. We show that the proposed indexing approach provides encouraging results.

Keywords: concept extraction, conceptual network formalism, fuzzy association rules, multilingual thesaurus, semantic indexing

Procedia PDF Downloads 119
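
To make the association-rule step in the semantic indexing abstract above more concrete, here is a small sketch under stated assumptions. It is not the author's implementation: the abstract does not specify the fuzzy model, so the code assumes one common formulation in which each concept has a membership degree in each document, the fuzzy support of a concept set is the mean over documents of the minimum membership, and confidence is the ratio of two supports. The sample data, thresholds, and function names are hypothetical.

```python
from itertools import permutations

# Hypothetical input: each document maps a concept to a membership
# degree in [0, 1] (for example, a normalized concept weight).
documents = [
    {"bank": 0.9, "loan": 0.8, "river": 0.1},
    {"bank": 0.7, "loan": 0.6},
    {"bank": 0.4, "river": 0.9},
    {"loan": 0.5, "interest": 0.9},
]


def fuzzy_support(concepts, docs):
    """Mean over documents of the minimum membership (min t-norm)."""
    total = sum(min(doc.get(c, 0.0) for c in concepts) for doc in docs)
    return total / len(docs)


def fuzzy_confidence(antecedent, consequent, docs):
    """Support of both concepts divided by support of the antecedent."""
    base = fuzzy_support([antecedent], docs)
    if base == 0:
        return 0.0
    return fuzzy_support([antecedent, consequent], docs) / base


def mine_rules(docs, min_support=0.2, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for rules
    between single concepts that pass both thresholds."""
    concepts = sorted({c for doc in docs for c in doc})
    rules = []
    for a, b in permutations(concepts, 2):
        support = fuzzy_support([a, b], docs)
        confidence = fuzzy_confidence(a, b, docs)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


rules = mine_rules(documents)
```

On this toy data only the rules between "bank" and "loan" (in both directions) pass the thresholds, which is the kind of contextual, non-taxonomic link the abstract refers to. Other t-norms or rule-quality measures could be substituted without changing the overall structure.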

11032 Digital Preservation: Requirement of 21st Century

Authors: Gaurav Kumar, Shilpa

Abstract:

Digital libraries have been established all over the world to create, maintain, and preserve digital materials. This paper focuses on operational digital preservation systems, specifically in educational organizations in India. It considers a broad range of digital objects, including e-journals, technical reports, e-records, project documents, scientific data, etc. This paper describes the main objectives, processes, and technological issues involved in the preservation of digital materials. Digital preservation refers to the various methods of keeping digital materials alive for the future. It covers everything from electronic publications on CD-ROM to online databases and collections of experimental data in digital format, and it maintains the ability to display, retrieve, and use digital collections in the face of rapidly changing technological and organizational infrastructures. This paper sets out the importance and objectives of digital preservation. Preservation requires hardware and software technology to interpret the digital documents, and the paper discusses various aspects of digital preservation.

Keywords: preservation, digital preservation, digital dark age, conservation, archive, repository, document, information technology, hardware, software, organization, machine readable format

Procedia PDF Downloads 423

11031 Management Software for the Elaboration of an Electronic File in the Pharmaceutical Industry Following Mexican Regulations

Authors: M. Peña Aguilar Juan, Ríos Hernández Ezequiel, R. Valencia Luis

Abstract:

To certify certain goods of public interest, such as medicines and food, a dossier must be prepared and delivered. Its elaboration requires legal and administrative knowledge, as well as organization of the documents of the process and an order that allows the file to be verified. Therefore, a virtual platform was developed to support the management and elaboration of the dossier, providing access to the information and interfaces that allow the user to know the status of projects. Developing the dossier system in the cloud allows the inclusion of the technical requirements for software management, including validation and manufacturing in the industry. The platform guides and facilitates the elaboration of the dossier (report, file or history) in line with Mexican legislation and regulations, and it also has auxiliary tools for its management. This technological alternative provides organizational support for documents and access to the information required for the successful development of a dossier. The platform is divided into the following modules: system control, catalog, dossier and enterprise management. The modules are designed according to the structure required in a dossier in those areas. However, the structure allows for flexibility, as its goal is to become a tool that facilitates and does not obstruct processes. The architecture and development of the software allow flexibility for future expansion to other fields, which would require feeding the system with new regulations.

Keywords: electronic dossier, cloud management software, pharmaceutical industry, sanitary registration

Procedia PDF Downloads 266