Search results for: scene text
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 671

671 An Edge-based Text Region Extraction Algorithm for Indoor Mobile Robot Navigation

Authors: Jagath Samarabandu, Xiaoqing Liu

Abstract:

Using bottom-up image processing algorithms to predict human eye fixations and extract the relevant embedded information in images has been widely applied in the design of active machine vision systems. Scene text is an important feature to be extracted, especially in vision-based mobile robot navigation as many potential landmarks such as nameplates and information signs contain text. This paper proposes an edge-based text region extraction algorithm, which is robust with respect to font sizes, styles, color/intensity, orientations, and effects of illumination, reflections, shadows, perspective distortion, and the complexity of image backgrounds. Performance of the proposed algorithm is compared against a number of widely used text localization algorithms and the results show that this method can quickly and effectively localize and extract text regions from real scenes and can be used in mobile robot navigation under an indoor environment to detect text based landmarks.

Keywords: Landmarks, mobile robot navigation, scene text, text localization and extraction.

Downloads: 2862
670 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park

Abstract:

Scripts are one of the basic text resources for understanding broadcasting contents. Topic modeling is a method for summarizing broadcasting contents from their scripts. Generally, scripts represent contents descriptively with directions and speeches, and provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. However, because scene segments consist mainly of speeches, relatively few word co-occurrences are observed within them. This inevitably degrades the quality of topics learned by statistical methods. To tackle this problem, we propose a method to improve topic quality with additional word co-occurrence information obtained using scene similarities. The main idea is that knowing that two or more texts are topically related can be useful for learning high-quality topics. In turn, more accurate topical representations yield more accurate information about whether two texts are related. In this paper, we regard two scene segments as related if their topical similarity is high enough. We also consider words to co-occur if they appear together in topically related scene segments. By iteratively inferring topics and determining semantically neighboring scene segments, we draw a topic space that represents broadcasting contents well. In the experiments, we show that the proposed method generates higher-quality topics from Korean drama scripts than the baselines.

Keywords: Broadcasting contents, generalized Pólya urn model, scripts, text similarity, topic model.

Downloads: 1774
669 Continuous Text Translation Using Text Modeling in the Thetos System

Authors: Nina Suszczanska, Przemyslaw Szmal, Slawomir Kulikow

Abstract:

This paper discusses a method of modeling text for Polish. The method is aimed at transforming continuous input text into a text consisting of sentences in so-called canonical form, which is characterized by, among other things, a complete structure and the absence of anaphora and ellipses. The transformation is lossless with respect to the content of the text being transformed. The modeling method has been worked out for the needs of the Thetos system, which translates Polish written texts into the Polish sign language. We believe that the method can also be used in various applications that deal with natural language, e.g. in a text summary generator for Polish.

Keywords: anaphora, machine translation, NLP, sign language, text syntax.

Downloads: 1601
668 Partial 3D Reconstruction using Evolutionary Algorithms

Authors: Mónica Pérez-Meza, Rodrigo Montúfar-Chaveznava

Abstract:

When reconstructing a scenario, it is necessary to know the structure of the elements present in the scene to have an interpretation. In this work we link 3D scene reconstruction to evolutionary algorithms through vision stereo theory. We consider vision stereo as a method that provides the reconstruction of a scene using only a couple of images of the scene and performing some computation. Through several images of a scene, captured from different positions, vision stereo can give us an idea about the three-dimensional characteristics of the world. Vision stereo usually requires two cameras, making an analogy to the mammalian vision system. In this work we employ only one camera, which is translated along a path, capturing images at set distance intervals. As we cannot perform all computations required for an exhaustive reconstruction, we employ an evolutionary algorithm to partially reconstruct the scene in real time. The algorithm employed is the fly algorithm, which employs "flies" to reconstruct the principal characteristics of the world following certain evolutionary rules.

Keywords: 3D Reconstruction, Computer Vision, Evolutionary Algorithms, Vision Stereo.

Downloads: 1827
667 A Bionic Approach to Dynamic, Multimodal Scene Perception and Interpretation in Buildings

Authors: Rosemarie Velik, Dietmar Bruckner

Abstract:

Today, building automation is advancing from simple monitoring and control tasks of lighting and heating towards more and more complex applications that require a dynamic perception and interpretation of different scenes occurring in a building. Current approaches cannot handle these newly emerging demands. In this article, a bionically inspired approach for multimodal, dynamic scene perception and interpretation is presented, which is based on neuroscientific and neuro-psychological research findings about the perceptual system of the human brain. This approach is based on data from diverse sensory modalities being processed in a so-called neuro-symbolic network. With its parallel structure and with its basic elements being information processing and storing units at the same time, a very efficient method for scene perception is provided, overcoming the problems and bottlenecks of classical dynamic scene interpretation systems.

Keywords: building automation, biomimetrics, dynamic scene interpretation, human-like perception, neuro-symbolic networks.

Downloads: 1565
666 OCR/ICR Text Recognition Using ABBYY FineReader as an Example Text

Authors: A. R. Bagirzade, A. Sh. Najafova, S. M. Yessirkepova, E. S. Albert

Abstract:

This article describes a text recognition method based on Optical Character Recognition (OCR). The features of the OCR method were examined using the ABBYY FineReader program. The article describes automatic text recognition in images. OCR is necessary because optical input devices can only deliver raster graphics as their output. Text recognition is the task of identifying the letters shown in an image and assigning each of them the numerical value defined by the usual text encoding (ASCII, Unicode). Using ABBYY FineReader as an example, the study confirmed and demonstrated in practice the improvement of digital text recognition platforms developed by Electronic Publication.

Keywords: ABBYY FineReader system, algorithm symbol recognition, OCR/ICR techniques, recognition technologies.

Downloads: 690
665 Effect of Scene Changing on Image Sequences Compression Using Zero Tree Coding

Authors: Mbainaibeye Jérôme, Noureddine Ellouze

Abstract:

We study in this paper the effect of scene changing on an image sequence coding system using Embedded Zerotree Wavelet (EZW). The scene changing considered here is the full motion which may occur. A special image sequence is generated where the scene changing occurs randomly. Two scenarios are considered. In the first scenario, the system must provide the best possible reconstruction quality by managing the bit rate (BR) while the scene changing occurs. In the second scenario, the system must keep the bit rate as constant as possible by managing the reconstruction quality. The first scenario may be motivated by the availability of a large band pass transmission channel where an increase of the bit rate may be possible to keep the reconstruction quality up to a given threshold. The second scenario may apply to a narrow band pass transmission channel where an increase of the bit rate is not possible. In this last case, applications for which the reconstruction quality is not a constraint may be considered. The simulations are performed with a five-scale wavelet decomposition using the 9/7-tap biorthogonal wavelet filter bank. The entropy coding is performed using a specifically defined binary code book and the EZW algorithm. Experimental results are presented and compared to LEAD H263 EVAL. It is shown that if the reconstruction quality is the constraint, the system increases the bit rate to obtain the required quality. In the case where the bit rate must be constant, the system is unable to provide the required quality if a scene change occurs; however, the system is able to improve the quality once the scene changing disappears.

Keywords: Image Sequence Compression, Wavelet Transform, Scene Changing, Zero Tree, Bit Rate, Quality.

Downloads: 1310
664 On-Road Text Detection Platform for Driver Assistance Systems

Authors: Guezouli Larbi, Belkacem Soundes

Abstract:

The automation of the text detection process can help the human in his driving task. Its application can be very useful to help drivers to have more information about their environment by facilitating the reading of road signs such as directional signs, events, stores, etc. In this paper, a system consisting of two stages has been proposed. In the first one, we used pseudo-Zernike moments to pinpoint areas of the image that may contain text. The architecture of this part is based on three main steps, region of interest (ROI) detection, text localization, and non-text region filtering. Then, in the second step, we present a convolutional neural network architecture (On-Road Text Detection Network - ORTDN) which is considered as a classification phase. The results show that the proposed framework achieved ≈ 35 fps and an mAP of ≈ 90%, thus a low computational time with competitive accuracy.

Keywords: Text detection, CNN, PZM, deep learning.

Downloads: 81
663 Powerful Tool to Expand Business Intelligence: Text Mining

Authors: Li Gao, Elizabeth Chang, Song Han

Abstract:

With the extensive inclusion of document, especially text, in the business systems, data mining does not cover the full scope of Business Intelligence. Data mining cannot deliver its impact on extracting useful details from the large collection of unstructured and semi-structured written materials based on natural languages. The most pressing issue is to draw the potential business intelligence from text. In order to gain competitive advantages for the business, it is necessary to develop the new powerful tool, text mining, to expand the scope of business intelligence. In this paper, we will work out the strong points of text mining in extracting business intelligence from huge amount of textual information sources within business systems. We will apply text mining to each stage of Business Intelligence systems to prove that text mining is the powerful tool to expand the scope of BI. After reviewing basic definitions and some related technologies, we will discuss the relationship and the benefits of these to text mining. Some examples and applications of text mining will also be given. The motivation behind is to develop new approach to effective and efficient textual information analysis. Thus we can expand the scope of Business Intelligence using the powerful tool, text mining.

Keywords: Business intelligence, document warehouse, text mining.

Downloads: 2607
662 Optimal Classifying and Extracting Fuzzy Relationship from Query Using Text Mining Techniques

Authors: Faisal Alshuwaier, Ali Areshey

Abstract:

Text mining techniques are generally applied for classifying text and for finding fuzzy relations and structures in data sets. This research provides a range of text mining capabilities. One common application is text classification and event extraction, which encompasses deducing specific knowledge concerning incidents referred to in texts. The main contribution of this paper is the clarification of a concept graph generation mechanism, which is based on text classification and optimal fuzzy relationship extraction. Furthermore, the work presented in this paper explains the application of fuzzy relationship extraction and the branch and bound (BB) method to simplify the texts.

Keywords: Extraction, Max-Prod, Fuzzy Relations, Text Mining, Memberships, Classification.

Downloads: 2130
661 A Web Text Mining Flexible Architecture

Authors: M. Castellano, G. Mastronardi, A. Aprile, G. Tarricone

Abstract:

Text Mining is an important step of the Knowledge Discovery process. It is used to extract hidden information from unstructured or semi-structured data. This aspect is fundamental because much of the Web information is semi-structured due to the nested structure of HTML code, much of it is linked, and much of it is redundant. Web Text Mining supports the whole knowledge mining process in the mining, extraction and integration of useful data, information and knowledge from Web page contents. In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multi-organization environment. The Web Text Mining process is based on a flexible architecture and is implemented in four steps able to examine web content and to extract useful hidden information through mining techniques. Our Web Text Mining prototype starts from the retrieval of Web job offers, from which a Text Mining process draws out information useful for their fast classification; this information is, essentially, the job offer location and skills.

Keywords: Web text mining, flexible architecture, knowledge discovery.

Downloads: 2611
660 Key Based Text Watermarking of E-Text Documents in an Object Based Environment Using Z-Axis for Watermark Embedding

Authors: Mussarat Abdullah, Fazal Wahab

Abstract:

Hiding data in text documents is inherently complex due to the nature of text documents. A robust text watermarking scheme targeting an object based environment is presented in this research. The heart of the proposed solution is the concept of watermarking an object based text document in which each text string is treated as a separate object with its own set of properties. Taking advantage of the z-ordering of objects, the watermark is applied along the z-axis, causing zero fidelity disturbance to the text. The watermark bit sequence generated from the user key is hashed with selected properties of the given document to determine the bit sequence to embed. Bits are embedded along the z-axis, and the document has no fidelity issues when printed, scanned or photocopied.

Keywords: Digital Watermarking, Object Based Environment, Watermark, z-ordering.

Downloads: 1650
659 Application of Smooth Ergodic Hidden Markov Model in Text to Speech Systems

Authors: Armin Ghayoori, Faramarz Hendessi, Asrar Sheikh

Abstract:

In developing a text-to-speech system, it is well known that the accuracy of information extracted from a text is crucial to produce high quality synthesized speech. In this paper, a new scheme for converting text into its equivalent phonetic spelling is introduced and developed. This method is applicable to many applications in text to speech converting systems and has many advantages over other methods. The proposed method can also complement the other methods with a purpose of improving their performance. The proposed method is a probabilistic model and is based on Smooth Ergodic Hidden Markov Model. This model can be considered as an extension to HMM. The proposed method is applied to Persian language and its accuracy in converting text to speech phonetics is evaluated using simulations.

Keywords: Hidden Markov Models, text, synthesis.

Downloads: 1505
658 RB-Matcher: String Matching Technique

Authors: Rajender Singh Chillar, Barjesh Kochar

Abstract:

All text processing systems allow their users to search for a string pattern in a given text. String matching is fundamental to database and text processing applications. Every text editor must contain a mechanism to search the current document for arbitrary strings. Spelling checkers scan an input text for words in the dictionary and reject any strings that do not match. We store our information in databases so that we can retrieve it later, and this retrieval can be done using various string matching algorithms. This paper describes a new string matching algorithm for various applications. The new algorithm has been designed with the help of the Rabin-Karp matcher to improve the string matching process.
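
For orientation, the classic Rabin-Karp matcher that the abstract names as its starting point can be sketched as follows. This is a minimal illustration of the textbook rolling-hash technique, not the RB-Matcher algorithm itself, whose details the abstract does not give; the function name and the `base` and `modulus` values are assumptions chosen for the example.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, modulus: int = 1_000_003) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character inside a window of length m.
    high = pow(base, m - 1, modulus)

    # Hash the pattern and the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus

    matches = []
    for start in range(n - m + 1):
        # Equal hashes only suggest a match; compare the characters to rule out collisions.
        if p_hash == t_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the hash: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base + ord(text[start + m])) % modulus
    return matches


print(rabin_karp("abracadabra", "abra"))
```

The rolling update makes each window hash cost constant time, so the expected running time is linear in the text length when hash collisions are rare.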

Keywords: Algorithm, Complexity, Matching-patterns, Pattern, Rabin-Karp, String, text-processing.

Downloads: 1720
657 An Semantic Algorithm for Text Categoritation

Authors: Xu Zhao

Abstract:

Text categorization techniques are widely used in many Information Retrieval (IR) applications. In this paper, we propose a simple but efficient method that can automatically find the relationship between any pair of terms and documents; an indexing matrix is also established for text categorization. We call this method Indexing Matrix Categorization Machine (IMCM). Several experiments are conducted to show the efficiency and robustness of our algorithm.

Keywords: Text categorization, Sub-space learning, Latent Semantic Space

Downloads: 1414
656 Enhanced Multi-Intensity Analysis in Multi-Scenery Classification-Based Macro and Micro Elements

Authors: R. Bremananth

Abstract:

Several computationally challenging issues are encountered while classifying complex natural scenes. In this paper, we address the problems that are encountered in rotation invariance with multi-intensity analysis for multi-scene overlapping. In the present literature, various algorithms proposed techniques for multi-intensity analysis, but there are several restrictions in these algorithms while deploying them in multi-scene overlapping classifications. In order to resolve the problem of multi-scenery overlapping classifications, we present a framework that is based on macro and micro basis functions. This algorithm conquers the minimum classification false alarm while pigeonholing multi-scene overlapping. Furthermore, a quadrangle multi-intensity decay is invoked. Several parameters are utilized to analyze invariance for multi-scenery classifications such as rotation, classification, correlation, contrast, homogeneity, and energy. Benchmark datasets were collected for complex natural scenes and experimented for the framework. The results depict that the framework achieves a significant improvement on gray-level matrix of co-occurrence features for overlapping in diverse degree of orientations while pigeonholing multi-scene overlapping.

Keywords: Automatic classification, contrast, homogeneity, invariant analysis, multi-scene analysis, overlapping.

Downloads: 1068
655 Binarization of Text Region based on Fuzzy Clustering and Histogram Distribution in Signboards

Authors: Jonghyun Park, Toan Nguyen Dinh, Gueesang Lee

Abstract:

In this paper, we present a novel approach to accurately detect text regions, including the shop name, in signboard images with complex backgrounds for mobile system applications. The proposed method is based on the combination of text detection using edge profiles and region segmentation using the fuzzy c-means method. In the first step, we apply an elaborate Canny edge operator to extract all possible object edges. Then, edge profile analysis in the vertical and horizontal directions is performed on these edge pixels to detect potential text regions containing the shop name in a signboard. The edge profile and geometrical characteristics of each object contour are carefully examined to construct candidate text regions and separate the main text region from the background. Finally, the fuzzy c-means algorithm is applied to segment and binarize the detected text region. Experimental results show that our proposed method is robust in text detection with respect to different character sizes and colors and can provide reliable text binarization results.
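
The final binarization step relies on fuzzy c-means. The sketch below shows the standard fuzzy c-means update on scalar pixel intensities with two clusters; it is an illustration under assumptions, not the authors' implementation. The function names, the fuzzifier `m = 2`, the fixed iteration count, and the initial centers are all hypothetical choices.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, iterations=50):
    """Standard fuzzy c-means on scalar values; returns (centers, memberships)."""
    centers = [float(c) for c in centers]
    memberships = []
    for _ in range(iterations):
        memberships = []
        for v in values:
            dists = [abs(v - c) for c in centers]
            if 0.0 in dists:
                # The value coincides with a center: split full membership among those centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                # u_j is proportional to d_j^(-2/(m-1)), normalized over all clusters.
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                row = [w / sum(inv) for w in inv]
            memberships.append(row)
        for j in range(len(centers)):
            # New center: mean of the values weighted by membership^m.
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total > 0.0:
                centers[j] = sum(w * v for w, v in zip(weights, values)) / total
    return centers, memberships


def binarize(values, centers=(0.0, 255.0)):
    """Label each value with the index of its highest-membership cluster."""
    _, memberships = fuzzy_c_means_1d(values, centers)
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


pixels = [10, 12, 11, 200, 205, 198]
print(binarize(pixels))
```

In a real image the values would be the intensities (or colors) inside the detected text region, and the cluster with the fewer or darker pixels would typically be taken as text; that choice depends on the signboard and is not specified in the abstract.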

Keywords: Text detection, edge profile, signboard image, fuzzy clustering.

Downloads: 2180
654 Gender Differences in Spatial Navigation

Authors: Bia Kim, Sewon Lee, Jaesik Lee

Abstract:

This study aims to investigate gender differences in spatial navigation using the tasks of 2-D matrix navigation and recognition of a real driving scene. The results can be summarized as follows. First, female subjects responded faster in the 2-D matrix navigation task than male subjects when landmark instructions were provided. Second, in the recognition task, male subjects recognized the key elements involved in the past driving scene more accurately than female subjects. In particular, female subjects tended to miss peripheral information. These results suggest the possibility of gender differences in spatial navigation.

Keywords: Gender differences, Spatial navigation, 2-D matrix navigation, Recognition of driving scene.

Downloads: 2684
653 Text Retrieval Relevance Feedback Techniques for Bag of Words Model in CBIR

Authors: Nhu Van NGUYEN, Jean-Marc OGIER, Salvatore TABBONE, Alain BOUCHER

Abstract:

The state-of-the-art Bag of Words model in Content-Based Image Retrieval has been used for years, but the relevance feedback strategies for this model have not been fully investigated. Inspired by text retrieval, the Bag of Words model has the ability to use the wealth of knowledge and practices available in text retrieval. We study and experiment with the relevance feedback model from text retrieval in order to adapt it to image retrieval. The experiments show that the techniques from text retrieval give good results for image retrieval and that further improvements are possible.

Keywords: Relevance feedback, bag of words model, probabilistic model, vector space model, image retrieval

Downloads: 2057
652 A System to Adapt Techniques of Text Summarizing to Polish

Authors: Marcin Ciura, Damian Grund, S

Abstract:

This paper describes a system, in which various methods of text summarizing can be adapted to Polish. A structure of the system is presented. A modular construction of the system and access to the system via the Internet are signaled.

Keywords: Automatic summary generation, linguistic analysis, text generation.

Downloads: 1502
651 A Proposed Hybrid Approach for Feature Selection in Text Document Categorization

Authors: M. F. Zaiyadi, B. Baharudin

Abstract:

Text document categorization involves a large amount of data or features. The high dimensionality of features is troublesome and can affect the performance of the classification. Therefore, feature selection is considered one of the crucial parts of text document categorization. Selecting the best features to represent documents can reduce the dimensionality of the feature space and hence increase performance. Many approaches have been implemented by various researchers to overcome this problem. This paper proposes a novel hybrid approach for feature selection in text document categorization based on Ant Colony Optimization (ACO) and Information Gain (IG). We also present state-of-the-art algorithms by several other researchers.
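
As a reminder of the Information Gain half of the hybrid, the sketch below scores a term by how much knowing its presence or absence reduces the entropy of the class labels. It covers only the textbook IG criterion, not the ACO search or the way the authors combine the two; the function names and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(term) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)].

    docs is a list of token sets, labels the matching list of class labels.
    """
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "election"}, {"vote", "party"}]
labels = ["sport", "sport", "politics", "politics"]
ranked = sorted({t for d in docs for t in d},
                key=lambda t: information_gain(docs, labels, t), reverse=True)
print(ranked[:2])
```

A filter-style selector would keep the top-ranked terms; in a hybrid scheme such scores could serve as heuristic information for a search procedure, though how the paper does this is not stated in the abstract.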

Keywords: Ant colony optimization, feature selection, information gain, text categorization, text representation.

Downloads: 2011
650 Graph-Based Text Similarity Measurement by Exploiting Wikipedia as Background Knowledge

Authors: Lu Zhang, Chunping Li, Jun Liu, Hui Wang

Abstract:

Text similarity measurement is a fundamental issue in many textual applications such as document clustering, classification, summarization and question answering. However, prevailing approaches based on Vector Space Model (VSM) more or less suffer from the limitation of Bag of Words (BOW), which ignores the semantic relationship among words. Enriching document representation with background knowledge from Wikipedia is proven to be an effective way to solve this problem, but most existing methods still cannot avoid similar flaws of BOW in a new vector space. In this paper, we propose a novel text similarity measurement which goes beyond VSM and can find semantic affinity between documents. Specifically, it is a unified graph model that exploits Wikipedia as background knowledge and synthesizes both document representation and similarity computation. The experimental results on two different datasets show that our approach significantly improves VSM-based methods in both text clustering and classification.
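
The Bag of Words limitation mentioned above is easy to see in a small sketch. The code below computes plain cosine similarity over term-count vectors, the usual VSM baseline rather than the graph model proposed in the paper; the function name and the whitespace tokenization are assumptions for the example.

```python
import math
from collections import Counter


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# Synonyms share no surface form, so they contribute nothing to the score.
print(cosine_bow("car", "automobile"))
print(cosine_bow("the car is fast", "the automobile is quick"))
```

Because the score depends only on exact word overlap, the second pair is rated similar solely through "the" and "is", which is the kind of gap that background knowledge is meant to close.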

Keywords: Text classification, Text clustering, Text similarity, Wikipedia

Downloads: 2058
649 Target Detection using Adaptive Progressive Thresholding Based Shifted Phase-Encoded Fringe-Adjusted Joint Transform Correlator

Authors: Inder K. Purohit, M. Nazrul Islam, K. Vijayan Asari, Mohammad A. Karim

Abstract:

A new target detection technique is presented in this paper for the identification of small boats in coastal surveillance. The proposed technique employs an adaptive progressive thresholding (APT) scheme to first process the given input scene to separate any objects present in the scene from the background. The preprocessing step results in an image having only the foreground objects, such as boats, trees and other cluttered regions, and hence reduces the search region for the correlation step significantly. The processed image is then fed to the shifted phase-encoded fringe-adjusted joint transform correlator (SPFJTC) technique which produces single and delta-like correlation peak for a potential target present in the input scene. A post-processing step involves using a peak-to-clutter ratio (PCR) to determine whether the boat in the input scene is authorized or unauthorized. Simulation results are presented to show that the proposed technique can successfully determine the presence of an authorized boat and identify any intruding boat present in the given input scene.

Keywords: Adaptive progressive thresholding, fringe adjusted filters, image segmentation, joint transform correlation, synthetic discriminant function

Downloads: 1167
648 Dynamic Decompression for Text Files

Authors: Ananth Kamath, Ankit Kant, Aravind Srivatsa, Harisha J.A

Abstract:

Compression algorithms reduce the redundancy in data representation to decrease the storage required for that data. Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv (LZ) family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), and Burrows-Wheeler Transform (BWT) based algorithms. Decompression is also required to retrieve the original data by lossless means. This paper presents a compression scheme for text files coupled with the principle of dynamic decompression, which decompresses only the section of the compressed text file required by the user instead of decompressing the entire text file. Dynamically decompressed files offer better disk space utilization due to higher compression ratios compared to most of the currently available text file formats.
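
One simple way to get the "decompress only what the user needs" behavior is to compress the file in independent fixed-size blocks. The sketch below does this with Python's `zlib`; it illustrates the general idea only and is not the file format described in the paper. The function names and the 1024-byte block size are assumptions.

```python
import zlib


def compress_blocks(text: str, block_size: int = 1024) -> list[bytes]:
    """Split the UTF-8 bytes into fixed-size blocks and compress each independently."""
    data = text.encode("utf-8")
    return [zlib.compress(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def read_range(blocks: list[bytes], start: int, length: int, block_size: int = 1024) -> bytes:
    """Return bytes [start, start + length) while decompressing only the overlapping blocks."""
    if length <= 0:
        return b""
    first = start // block_size
    last = (start + length - 1) // block_size
    chunk = b"".join(zlib.decompress(b) for b in blocks[first:last + 1])
    offset = start - first * block_size
    return chunk[offset:offset + length]


text = "0123456789" * 500
blocks = compress_blocks(text)
# Only blocks 0 and 1 are decompressed to serve this request.
print(read_range(blocks, 1020, 10))
```

Independent blocks trade some compression ratio for random access, since each block starts without shared history; a production format would also need an index of block offsets on disk.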

Keywords: Compression, Dynamic Decompression, Text file format, Portable Document Format, Compression Ratio.

Downloads: 1712
647 Journals Subheadlines Text Extraction Using Wavelet Thresholding and New Projection Profile

Authors: Davod Zaravi, Habib Rostami, Alireza Malahzaheh, S. S. Mortazavi

Abstract:

In this paper, a new robust and efficient algorithm for automatic text extraction from colored book and journal cover sheets is proposed. First, we perform a wavelet transform. Next, we use a dynamic threshold to detect edges from the detail wavelet coefficients. By blurring the approximation coefficients with an alternative heuristic thresholding, we obtain effective edges. Afterward, a binary image is obtained with an ROI technique. Finally, text boxes are extracted with a new projection profile.

Keywords: Text extraction, colored cover sheet, wavelet threshold, region of interest.

Downloads: 1585
646 Text-Mining Approach for Evaluation of Affective Management Practices

Authors: Masaaki Saito, Qin Tang, Hiroyuki Umemuro

Abstract:

The purpose of this paper is to propose a text mining approach to evaluate companies' practices on affective management. Affective management argues that it is critical to take stakeholders' affects into consideration during the decision-making process, along with the traditional numerical and rational indices. CSR reports published by companies were collected as source information. Indices were proposed based on the frequency and collocation of words relevant to the affective management concept, using a text mining approach to analyze the text information of CSR reports. In addition, the relationships between the results obtained using the proposed indices and traditional indicators of business performance were investigated using correlation analysis. Those correlations were also compared between manufacturing and non-manufacturing companies. The results of this study revealed the possibility of evaluating affective management practices of companies based on publicly available text documents.

Keywords: Affective management, Affect, Stakeholder, Text mining.

Downloads: 1803
645 Meta-Classification using SVM Classifiers for Text Documents

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. In this paper, we investigated three approaches to building a meta-classifier in order to increase the classification accuracy. The basic idea is to learn a meta-classifier that selects the best component classifier for each data point. The experimental results show that combining classifiers can significantly improve the accuracy of classification and that our meta-classification strategy gives better results than each individual classifier. For 7083 Reuters text documents we obtained classification accuracies of up to 92.04%.

Keywords: Meta-classification, Learning with Kernels, Support Vector Machine, and Performance Evaluation.
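The idea of selecting one component classifier per data point can be sketched as follows. The abstract does not describe its three approaches, so this is only one hypothetical selection scheme: it remembers which component classifier was correct on each validation point and reuses the classifier associated with the nearest remembered point. The toy threshold rules stand in for the SVM components, and all names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestNeighbourSelector:
    """Pick, for each point, the classifier that was right on the nearest known point."""

    def __init__(self, classifiers: List[Classifier]):
        self.classifiers = list(classifiers)
        self.memory: List[Tuple[Tuple[float, ...], int]] = []

    def fit(self, points, labels) -> "NearestNeighbourSelector":
        for point, label in zip(points, labels):
            for idx, clf in enumerate(self.classifiers):
                if clf(point) == label:
                    # Remember the first classifier that got this point right.
                    self.memory.append((tuple(point), idx))
                    break
        return self

    def predict(self, point: Sequence[float]) -> int:
        if not self.memory:                      # nothing learned: fall back
            return self.classifiers[0](point)
        _, idx = min(self.memory, key=lambda item: squared_distance(item[0], point))
        return self.classifiers[idx](point)


# Two toy component classifiers, each reliable in a different region (third feature).
clf_a: Classifier = lambda x: int(x[0] > 0.5)
clf_b: Classifier = lambda x: int(x[1] > 0.5)

validation_points = [(1, 0, 0), (0, 1, 0), (1, 0, 10), (0, 1, 10)]
validation_labels = [1, 0, 0, 1]

selector = NearestNeighbourSelector([clf_a, clf_b]).fit(validation_points, validation_labels)
print(selector.predict((0.9, 0.1, 0)), selector.predict((0.9, 0.1, 10)))
```

Neither toy classifier is correct on all four validation points, but the selector is, which is the effect a meta-classifier aims for.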

Downloads: 1558
644 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in the data mining community and has a number of applications, such as document archive filtering, document organization, topic detection, and subject tracing. In the real world, some of the already clustered documents may no longer be of importance, while new documents of more significance may appear. Most of the work done so far on clustering unstructured text documents overlooks this aspect of clustering. This paper addresses the issue by using a Fading Function. The unstructured text documents are clustered, and for each cluster a statistical structure called a Cluster Profile (CP) is maintained. The cluster profile incorporates the Fading Function, which keeps account of the time-dependent importance of the cluster. The work proposes a novel algorithm, the Clustering n-ary Merge Algorithm (CnMA), for unstructured text documents that uses the Cluster Profile and the Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured Text Documents, Fading Function.
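A time-decayed cluster weight of the kind described above can be sketched as follows. The abstract does not give the form of its Fading Function or the contents of its Cluster Profile, so the exponential decay 2^(-λ·Δt) (a common choice in stream clustering) and the fields of the `ClusterProfile` class below are assumptions made for illustration.

```python
from dataclasses import dataclass


def fading(elapsed: float, decay_rate: float) -> float:
    """Assumed fading function: the weight halves every 1/decay_rate time units."""
    return 2.0 ** (-decay_rate * elapsed)


@dataclass
class ClusterProfile:
    """Hypothetical per-cluster summary holding a time-decayed document weight."""
    decay_rate: float
    weight: float = 0.0
    last_update: float = 0.0

    def weight_at(self, now: float) -> float:
        """Importance of the cluster at time `now`, after fading."""
        return self.weight * fading(now - self.last_update, self.decay_rate)

    def add_document(self, now: float) -> None:
        """Fade the stored weight up to `now`, then count the new document."""
        self.weight = self.weight_at(now) + 1.0
        self.last_update = now


profile = ClusterProfile(decay_rate=1.0)
profile.add_document(now=0.0)   # weight becomes 1.0
profile.add_document(now=1.0)   # old weight fades to 0.5, new document adds 1.0
print(profile.weight, profile.weight_at(3.0))
```

A cluster whose faded weight drops below some threshold could then be treated as outdated, while clusters that keep receiving documents retain a high weight.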

Downloads: 1938
643 A Content Vector Model for Text Classification

Authors: Eric Jiang

Abstract:

As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies texts by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification, and experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.

Keywords: Feature Selection, Latent Semantic Indexing, Text Classification, Vector Space Model.

Downloads: 1838
642 Narrative and Expository Text Reading Comprehension by Fourth Grade Spanish-Speaking Children

Authors: Mariela V. De Mier, Veronica S. Sanchez Abchi, Ana M. Borzone

Abstract:

This work aims to explore the factors that affect the reading comprehension process across different types of texts. In a recent study with 2nd, 3rd and 4th grade children, it was observed that reading comprehension of narrative texts was better than comprehension of expository texts. Nevertheless, it seems that not only the type of text but also other textual factors would account for comprehension, depending on the cognitive processing demands posed by the text. In order to explore this assumption, three narrative and three expository texts were prepared with different degrees of complexity. A group of 40 fourth grade Spanish-speaking children took part in the study. Children were asked to read the texts and orally answer three literal and three inferential questions for each text. The quantitative and qualitative analysis of the children's responses showed that children had difficulties with both narrative and expository texts. The difficulty lay in answering questions that involved establishing complex relationships among information units that were present in the text or that had to be activated from the children's previous knowledge to make an inference. Considering the data analysis, it could be concluded that there is some interaction between the type of text and the cognitive processing load of a specific text.

Keywords: comprehension, textual factors, type of text, processing demands.

Downloads: 1344