Search results for: Text extraction
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1285


1255 CFD Simulation of Dense Gas Extraction through Polymeric Membranes

Authors: Azam Marjani, Saeed Shirazian

Abstract:

This study presents a general methodology to predict the performance of a continuous near-critical fluid extraction process for removing compounds from aqueous solutions using hollow fiber membrane contactors. A comprehensive 2D mathematical model was developed to study the Porocritical extraction process. The system studied in this work is a membrane-based extractor of ethanol and acetone from aqueous solutions using near-critical CO2. Extraction percentages predicted by the simulations were compared with the experimental values reported by Bothun et al. [5]. The simulated extraction percentages of ethanol and acetone show average differences of 9.3% and 6.5% from the experimental data, respectively. The more accurate predictions for acetone could be explained by a better estimation of the transport properties in the aqueous phase, which controls the extraction of this solute.

Keywords: Solvent extraction, Membrane, Mass transfer, Dense gas, Modeling

Downloads: 1558
1254 A Supervised Text-Independent Speaker Recognition Approach

Authors: Tudor Barbu

Abstract:

We provide a supervised speech-independent voice recognition technique in this paper. In the feature extraction stage we propose a mel-cepstral based approach. Our feature vector classification method uses a special nonlinear metric, derived from the Hausdorff distance for sets, and a minimum mean distance classifier.

Keywords: Text-independent speaker recognition, mel cepstral analysis, speech feature vector, Hausdorff-based metric, supervised classification.
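The classification stage described in the abstract above can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a set of feature vectors (the mel-cepstral extraction itself is not shown), and it uses the classical Hausdorff distance as a stand-in, because the abstract does not specify the paper's own nonlinear metric. The names `hausdorff`, `classify`, `training` and `unknown`, and the toy two-dimensional vectors, are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def directed_hausdorff(set_a, set_b):
    """Largest distance from any vector in set_a to its nearest vector in set_b."""
    return max(min(euclidean(a, b) for b in set_b) for a in set_a)


def hausdorff(set_a, set_b):
    """Classical (symmetric) Hausdorff distance between two vector sets."""
    return max(directed_hausdorff(set_a, set_b), directed_hausdorff(set_b, set_a))


def classify(sample, training):
    """Minimum mean distance classifier: pick the speaker whose training
    utterances are, on average, closest to the sample."""
    best_label, best_mean = None, math.inf
    for label, examples in training.items():
        mean_dist = sum(hausdorff(sample, ex) for ex in examples) / len(examples)
        if mean_dist < best_mean:
            best_label, best_mean = label, mean_dist
    return best_label


# Toy data (assumed): two speakers, two training utterances each.
training = {
    "speaker_a": [[(0.0, 0.0), (1.0, 0.0)], [(0.1, 0.0), (1.1, 0.1)]],
    "speaker_b": [[(5.0, 5.0), (6.0, 5.0)], [(5.2, 5.1), (6.1, 4.9)]],
}
unknown = [(0.05, 0.0), (0.9, 0.1)]
print(classify(unknown, training))
```

Comparing whole sets of vectors, rather than single vectors, is what lets utterances of different lengths be matched without alignment.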

Downloads: 1804
1253 Thermodynamic Study of Seed Oil Extraction by Organic Solvents

Authors: Zhila Safari, Ali Ashrafizadeh, Najaf Hedayat

Abstract:

The thermodynamic characterization of sesame oil extraction by acetone, hexane and benzene has been evaluated. The 120-hour experimental data were described by a simple mathematical model. According to the simulation results and the essential criteria, acetone is superior to the other solvents, but under certain conditions where oil extraction takes place hexane is the superior catalyst.

Keywords: Liquid-solid extraction, seed oil, Thermodynamic study.

Downloads: 2040
1252 Binarization of Text Region based on Fuzzy Clustering and Histogram Distribution in Signboards

Authors: Jonghyun Park, Toan Nguyen Dinh, Gueesang Lee

Abstract:

In this paper, we present a novel approach to accurately detect text regions, including the shop name, in signboard images with complex backgrounds for mobile system applications. The proposed method combines text detection using edge profiles with region segmentation using the fuzzy c-means method. In the first step, we apply an elaborate Canny edge operator to extract all possible object edges. Then, edge profile analysis in the vertical and horizontal directions is performed on these edge pixels to detect the potential text region containing the shop name in a signboard. The edge profile and geometrical characteristics of each object contour are carefully examined to construct candidate text regions and to separate the main text region from the background. Finally, the fuzzy c-means algorithm is applied to segment and binarize the detected text region. Experimental results show that the proposed method is robust in text detection with respect to different character sizes and colors and can provide reliable text binarization results.

Keywords: Text detection, edge profile, signboard image, fuzzy clustering.
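The final binarization step in the abstract above can be sketched with a plain fuzzy c-means run over grayscale intensities. This is only an illustration under stated assumptions: two clusters, fuzzifier m = 2, text darker than background, and a flat list of pixel values standing in for a detected text region. The function names and sample pixels are hypothetical, and the edge-profile detection stage is not shown.

```python
def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, iterations=50):
    """Fuzzy c-means on scalar values; returns (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly over the intensity range.
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for v in values:
            # Guard against a zero distance when a value sits on a center.
            dists = [abs(v - c) or 1e-9 for c in centers]
            # Standard FCM membership update.
            row = [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]
            memberships.append(row)
        # Center update: membership-weighted mean of the values.
        centers = [
            sum((u[k] ** m) * v for u, v in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(n_clusters)
        ]
    return centers, memberships


def binarize(pixels):
    """Mark a pixel as text (1) when it belongs mostly to the darker cluster."""
    centers, memberships = fuzzy_c_means_1d(pixels)
    dark = centers.index(min(centers))
    return [1 if u[dark] >= 0.5 else 0 for u in memberships]


# Toy region (assumed): dark strokes around 12-25, bright background around 200.
pixels = [12, 20, 15, 200, 210, 190, 25, 205]
print(binarize(pixels))
```

Unlike a fixed global threshold, the soft memberships adapt to the intensity distribution of each region, which is one reason clustering is attractive for signboards with varied colors.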

Downloads: 2202
1251 Effect of Enzyme and Heat Pretreatment on Sunflower Oil Recovery Using Aqueous and Hexane Extractions

Authors: E. Danso-Boateng

Abstract:

The effects of enzyme action and heat pretreatment on oil extraction yield from sunflower kernels were analysed using hexane extraction with a Soxhlet apparatus and aqueous extraction with an incubator shaker. Ground raw and heat-treated kernels, each with and without Viscozyme treatment, were used. Microscopic images of the kernels were taken to analyse the visible effects of each treatment on the cotyledon cell structure of the kernels. Heat pretreatment of the kernels before both extraction processes produced higher oil extraction yields than the control, with steam explosion being the most efficient. In hexane extraction, applying a combination of steam explosion and Viscozyme treatments to the kernels before the extraction gave the maximum oil extractable in 1 hour, while for aqueous extraction, raw kernels treated with Viscozyme gave the highest oil extraction yield. Remarkable cotyledon cell disruption was evident in kernels treated with Viscozyme, whereas steam-exploded and conventionally heat-treated kernels showed similar effects.

Keywords: Enzyme-assisted aqueous and hexane extraction, heat pretreatment, sunflower cotyledon structure, sunflower oil extraction

Downloads: 3445
1250 A Framework for Urdu Language Translation using LESSA

Authors: Imran Sarwar Bajwa

Abstract:

The Internet is one of the major sources of information for people from almost all fields of life. The major language used to publish information on the Internet is English. This becomes a problem in a country like Pakistan, where Urdu is the national language. Only 10% of the Pakistani population can understand English. As a result, millions of people are deprived of the precious information available on the Internet. This paper presents a system for translation from English to Urdu. A module, LESSA, uses a rule-based algorithm to read the input text in English, understand it and translate it into Urdu. The designed approach was further incorporated to translate a complete website from English to Urdu. An option appears in the browser to translate the webpage in a new window. The designed system will help millions of Internet users to benefit from the Internet and to access the latest information and knowledge posted on it daily.

Keywords: Natural Language Translation, Text Understanding, Knowledge extraction, Text Processing

Downloads: 2640
1249 Subcritical Water Extraction of Mannitol from Olive Leaves

Authors: S. M. Ghoreishi, R. Gholami Shahrestani, S. H. Ghaziaskar

Abstract:

Subcritical water extraction was investigated as a novel, alternative technology in the food and pharmaceutical industry for the separation of mannitol from olive leaves, and its results were compared with those of Soxhlet extraction. The effects of the temperature, pressure, and flow rate of water, and of dimensionless momentum and mass transfer variables such as the Reynolds and Peclet numbers, on extraction yield and equilibrium partition coefficient were investigated. The water operating conditions were pressures of 30-110 bar, temperatures of 60-150°C, and flow rates of 0.2-2 mL/min. The results revealed that the highest mannitol yield was obtained at 100°C and 50 bar. However, the extraction of mannitol was not influenced by variations in flow rate. Mathematical modeling of the experimental measurements was also investigated, and the model predicts the experimental measurements very well. In addition, the results indicated a higher extraction yield for subcritical water extraction than for the Soxhlet method.

Keywords: Extraction, Mannitol, Modeling, Olive leaves, Soxhlet extraction, Subcritical water.

Downloads: 3034
1248 A Text Mining Technique Using Association Rules Extraction

Authors: Hany Mahgoub, Dietmar Rösner, Nabil Ismail, Fawzy Torkey

Abstract:

This paper describes a text mining technique for automatically extracting association rules from collections of textual documents. The technique is called Extracting Association Rules from Text (EART). It relies on keyword features to discover association rules among the keywords labeling the documents. In this work, the EART system ignores the order in which the words occur, focusing instead on the words and their statistical distributions in documents. The main contributions of the technique are that it integrates XML technology with an Information Retrieval scheme (TFIDF), used for keyword/feature selection that automatically selects the most discriminative keywords for association rule generation, and that it uses a Data Mining technique for association rule discovery. It consists of three phases: a Text Preprocessing phase (transformation, filtration, stemming and indexing of the documents), an Association Rule Mining (ARM) phase (applying our designed algorithm for Generating Association Rules based on Weighting scheme, GARW) and a Visualization phase (visualization of results). Experiments were carried out on web-page news documents related to the outbreak of the bird flu disease. The extracted association rules contain important features and describe the informative news included in the document collection. The performance of the EART system was compared with that of another system that uses the Apriori algorithm, with respect to execution time and by evaluating the extracted association rules.

Keywords: Text mining, data mining, association rule mining
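The two ideas named in the abstract above, TF-IDF keyword selection followed by association rule discovery over the selected keywords, can be sketched generically as follows. This is not the paper's GARW algorithm, whose details are not given here; it is a plain support/confidence computation over keyword pairs. The function names, thresholds and toy keyword sets are assumptions.

```python
import math
from itertools import combinations


def tfidf_keywords(docs, top_n=3):
    """Pick the top_n highest TF-IDF terms of each tokenized document."""
    n_docs = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    selected = []
    for doc in docs:
        scores = {
            t: (doc.count(t) / len(doc)) * math.log(n_docs / df[t]) for t in set(doc)
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected.append(set(ranked[:top_n]))
    return selected


def association_rules(keyword_sets, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for single-keyword rules."""
    n = len(keyword_sets)

    def support(items):
        return sum(1 for s in keyword_sets if items <= s) / n

    vocab = sorted(set().union(*keyword_sets))
    rules = []
    for a, b in combinations(vocab, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Toy keyword sets (assumed), one per news document.
keyword_sets = [
    {"flu", "bird", "outbreak"},
    {"flu", "bird"},
    {"flu", "vaccine"},
    {"bird", "flu", "poultry"},
]
for lhs, rhs, sup, conf in association_rules(keyword_sets):
    print(f"{lhs} -> {rhs} (support={sup:.2f}, confidence={conf:.2f})")
```

Because each document is treated as an unordered keyword set, word order plays no role, which matches the order-free view of text the abstract describes.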

Downloads: 4398
1247 Text Retrieval Relevance Feedback Techniques for Bag of Words Model in CBIR

Authors: Nhu Van NGUYEN, Jean-Marc OGIER, Salvatore TABBONE, Alain BOUCHER

Abstract:

The state-of-the-art Bag of Words model in Content-Based Image Retrieval has been used for years, but the relevance feedback strategies for this model have not been fully investigated. Inspired by text retrieval, the Bag of Words model can draw on the wealth of knowledge and practices available in that field. We study and experiment with the relevance feedback model in text retrieval in order to adapt it to image retrieval. The experiments show that the techniques from text retrieval give good results for image retrieval and that further improvements are possible.

Keywords: Relevance feedback, bag of words model, probabilistic model, vector space model, image retrieval
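As an illustration of relevance feedback in the vector space model, the sketch below applies the classic Rocchio update from text retrieval to bag-of-words histograms. The abstract does not say which feedback formulas the authors adapted, so Rocchio is an assumed example, and the weights `alpha`, `beta` and `gamma` are conventional illustrative values rather than values from the paper.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant items and away from non-relevant ones."""
    dims = len(query)

    def centroid(vectors):
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    rel_c, non_c = centroid(relevant), centroid(non_relevant)
    updated = [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel_c, non_c)]
    # Negative term weights are usually clipped to zero.
    return [max(0.0, w) for w in updated]


# Toy visual-word histograms (assumed): three visual words.
query = [1, 0, 0]
marked_relevant = [[0, 1, 0]]
marked_non_relevant = [[0, 0, 1]]
print(rocchio(query, marked_relevant, marked_non_relevant))
```

In an image retrieval setting the vector components would count quantized local descriptors ("visual words") instead of text terms; the update rule itself is unchanged.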

Downloads: 2078
1246 A System to Adapt Techniques of Text Summarizing to Polish

Authors: Marcin Ciura, Damian Grund, S

Abstract:

This paper describes a system in which various methods of text summarizing can be adapted to Polish. The structure of the system is presented, and its modular construction and access via the Internet are outlined.

Keywords: Automatic summary generation, linguistic analysis, text generation.

Downloads: 1523
1245 A Proposed Hybrid Approach for Feature Selection in Text Document Categorization

Authors: M. F. Zaiyadi, B. Baharudin

Abstract:

Text document categorization involves a large amount of data or features. The high dimensionality of the features is troublesome and can affect the performance of the classification. Therefore, feature selection is regarded as one of the crucial parts of text document categorization. Selecting the best features to represent documents can reduce the dimensionality of the feature space and hence increase performance. Many approaches have been implemented by various researchers to overcome this problem. This paper proposes a novel hybrid approach for feature selection in text document categorization based on Ant Colony Optimization (ACO) and Information Gain (IG). We also present state-of-the-art algorithms by several other researchers.

Keywords: Ant colony optimization, feature selection, information gain, text categorization, text representation.
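The Information Gain half of the hybrid approach can be made concrete with a short sketch. It scores a term by how much knowing its presence or absence reduces the entropy of the class labels. The ACO search is not shown, and the document sets, labels and function names below are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting documents on the presence of term."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(part) for part in (with_term, without) if part
    )
    return entropy(labels) - conditional


# Toy corpus (assumed): each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "match"}]
labels = ["sport", "sport", "politics", "politics"]
for term in ("goal", "match"):
    print(term, information_gain(docs, labels, term))
```

Ranking terms by this score and keeping the top few is the usual filter-style use of IG; a hybrid method would presumably let a search procedure such as ACO refine that ranked subset.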

Downloads: 2035
1244 Effect of Wheat Flour Extraction Rates on Flour Composition, Farinographic Characteristics and Sensory Perception of Sourdough Naans

Authors: Ghulam Mueen-ud-Din, Salim-ur-Rehman, Faqir M. Anjum, Haq Nawaz, Mian A. Murtaza

Abstract:

The effect of wheat flour extraction rates on flour composition, farinographic characteristics and the quality of sourdough naans was investigated. The results indicated that as the extraction rate increased, the amounts of protein, fiber, fat and ash increased, whereas the moisture content decreased. Farinographic characteristics such as water absorption and dough development time increased with the flour extraction rate, but the dough stabilities and tolerance indices were reduced. Titratable acidity for both sourdough and sourdough naans also increased with the flour extraction rate. The study showed that the overall quality of sourdough naans was affected by both the flour extraction rate and the starter culture used. Sensory analysis of sourdough naans revealed that the desirable extraction rate for sourdough naan was 76%.

Keywords: Extraction rates, Farinographic characteristics, Flour composition, Sourdough naans, Wheat flour.

Downloads: 4650
1243 Graph-Based Text Similarity Measurement by Exploiting Wikipedia as Background Knowledge

Authors: Lu Zhang, Chunping Li, Jun Liu, Hui Wang

Abstract:

Text similarity measurement is a fundamental issue in many textual applications such as document clustering, classification, summarization and question answering. However, prevailing approaches based on Vector Space Model (VSM) more or less suffer from the limitation of Bag of Words (BOW), which ignores the semantic relationship among words. Enriching document representation with background knowledge from Wikipedia is proven to be an effective way to solve this problem, but most existing methods still cannot avoid similar flaws of BOW in a new vector space. In this paper, we propose a novel text similarity measurement which goes beyond VSM and can find semantic affinity between documents. Specifically, it is a unified graph model that exploits Wikipedia as background knowledge and synthesizes both document representation and similarity computation. The experimental results on two different datasets show that our approach significantly improves VSM-based methods in both text clustering and classification.

Keywords: Text classification, Text clustering, Text similarity, Wikipedia
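The Bag of Words limitation mentioned in the abstract above is easy to demonstrate. The sketch below computes cosine similarity between raw term-count vectors; two sentences with closely related meaning but no shared words score zero. It shows only the baseline that the paper argues against, not the proposed graph model, and the function name and example sentences are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between whitespace-tokenized term-count vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Related meaning, disjoint vocabulary: the BOW vectors are orthogonal.
print(cosine_similarity("physician treats patient", "doctor cures sick man"))
# Shared surface words raise the score regardless of meaning.
print(cosine_similarity("physician treats patient", "physician bills patient"))
```

Background knowledge such as Wikipedia is one way to connect "physician" and "doctor", which a pure term-overlap measure cannot do.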

Downloads: 2082
1242 A New Recognition Scheme for Machine-Printed Arabic Texts based on Neural Networks

Authors: Z. Shaaban

Abstract:

This paper presents a new approach to the problem of recognizing machine-printed Arabic texts. Because of the difficulty of recognizing cursive Arabic words, the text has to be normalized and segmented before the recognition stage. The new scheme for recognizing Arabic characters depends on a classifier built from multiple parallel neural networks. The classifier has two phases. The first phase categorizes the input character into one of eight groups. The second phase classifies the character into one of the Arabic character classes in that group. The system achieved a high recognition rate.

Keywords: Neural Networks, character recognition, feature extraction, multiple networks, Arabic text.

Downloads: 1452
1241 Extraction Condition of Phaseolus vulgaris

Authors: Ratchadaporn Oonsivilai, Jutarat Manatwiyangkool, Anant Oonsivilai

Abstract:

The optimal extraction condition of dried Phaseolus vulgaris powder was studied. The three independent variables are raw material concentration, shaking time and centrifugal time. The dependent variables are the yield percentage of crude extract and the alpha-amylase enzyme inhibition activity. The experimental design was based on a Box-Behnken design. The highest yield percentage of crude extract was obtained from the extraction condition at concentration of 1, 0,1, a concentration of 0.15 M, an extraction time of 2 hour, and a separation time of 60 min. Moreover, the crude extract with the highest alpha-amylase enzyme inhibition activity was obtained at a concentration of 0.10 M, an extraction time of 2 min, and a separation time of 45 min.

Keywords: Extraction time, Optimal condition, Alpha-amylase enzyme inhibition activity

Downloads: 2518
1240 A New Method for Rapid DNA Extraction from Artemia (Branchiopoda, Crustacea)

Authors: R. Manaffar, R. Maleki, S. Zare, N. Agh, S. Soltanian, B. Sehatnia, P. Sorgeloos, P. Bossier, G. Van Stappen

Abstract:

Artemia is one of the most conspicuous invertebrates associated with aquaculture. It can be considered a model organism, offering numerous advantages for comprehensive and multidisciplinary studies using morphologic or molecular methods. Since DNA extraction is an important step of any molecular experiment, a new and rapid method of DNA extraction from adult Artemia is described in this study. The efficiency of this technique was also compared with two widely used alternative techniques, namely the Chelex® 100 resin and SDS-chloroform methods. Data analysis revealed that the new method is the easiest and most cost-effective of the methods compared, allowing quick and efficient extraction of DNA from the adult animal.

Keywords: APD, Artemia, DNA extraction, Molecular experiments

Downloads: 3163
1239 Optimization for Subcritical Water Extraction of Phenolic Compounds from Rambutan Peels

Authors: Nuttawan Yoswathana, M. N. Eshtiaghi

Abstract:

Rambutan is a tropical fruit whose peel possesses antioxidant properties. This work was conducted to optimize the extraction conditions of phenolic compounds from rambutan peel. Response surface methodology (RSM) was adopted to optimize subcritical water extraction (SWE) with respect to temperature, extraction time and percent solvent mixture. The results demonstrated that the optimum conditions for SWE were a temperature of 160°C, an extraction time of 20 min and an ethanol concentration of 50%. A comparison of the phenolic compounds extracted from the rambutan peels by maceration for 6 h, Soxhlet for 4 h, and SWE for 20 min indicated that the total phenolic content (using Folin-Ciocalteu's phenol reagent) was 26.42, 70.29, and 172.47 mg of tannic acid equivalent (TAE) per g of dry rambutan peel, respectively. The comparative study concluded that SWE is a promising technique for extracting phenolic compounds from rambutan peel, because it gave more than twice the yield of the conventional techniques in a shorter extraction time.

Keywords: Subcritical water extraction, Rambutan peel, phenolic compounds, response surface methodology

Downloads: 3620
1238 Optimization of Deglet-Nour Date (Phoenix dactylifera L.) Phenol Extraction Conditions

Authors: Lekbir Adel, Alloui-Lombarkia Ourida, Mekentichi Sihem, Noui Yassine, Baississe Salima

Abstract:

The objective of this study was to optimize the extraction conditions for phenolic compounds, total flavonoids, and antioxidant activity from the Deglet-Nour variety. The extraction of active components from natural sources depends on different factors. Knowledge of the effects of different extraction parameters is useful for optimizing the process, as well as for predicting the extraction yield. The effects of the extraction variables, namely type of solvent (methanol, ethanol and acetone) and extraction time (1 h, 6 h, 12 h and 24 h), on the phenolics extraction yield were evaluated. It was shown that the extraction time and the type of solvent have a statistically significant influence on the extraction of phenolic compounds from the Deglet-Nour variety. The optimised conditions, methanol as solvent and an extraction time of 6 hours, yielded values of 80.19 ± 6.37 mg GAE/100 g FW for TPC, 2.34 ± 0.27 mg QE/100 g FW for TFC and 90.20 ± 1.29% for antioxidant activity. According to the results obtained in this study, the Deglet-Nour variety can be considered a natural source of phenolic compounds with good antioxidant capacity.

Keywords: Deglet-Nour variety, Date palm Fruit, Phenolic compounds, Total flavonoids, Antioxidant activity, Extraction, Optimization.

Downloads: 2640
1237 Dynamic Decompression for Text Files

Authors: Ananth Kamath, Ankit Kant, Aravind Srivatsa, Harisha J.A

Abstract:

Compression algorithms reduce the redundancy in data representation to decrease the storage required for that data. Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv (LZ) family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), and Burrows-Wheeler Transform (BWT) based algorithms. Decompression is also required to retrieve the original data by lossless means. A compression scheme for text files is presented, coupled with the principle of dynamic decompression, which decompresses only the section of the compressed text file required by the user instead of the entire text file. Dynamically decompressed files offer better disk space utilization due to higher compression ratios compared to most of the currently available text file formats.

Keywords: Compression, Dynamic Decompression, Text file format, Portable Document Format, Compression Ratio.
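The principle of decompressing only the requested section can be sketched with independently compressed fixed-size blocks. This is a generic illustration using Python's standard `zlib` module, not the file format proposed in the paper; the block size, function names and sample text are assumptions, and offsets are byte offsets into the UTF-8 encoded text.

```python
import zlib


def compress_blocks(text, block_size=64):
    """Split the encoded text into fixed-size blocks and compress each one separately."""
    data = text.encode("utf-8")
    return [zlib.compress(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def read_range(blocks, start, end, block_size=64):
    """Return bytes [start, end) by decompressing only the blocks that cover them."""
    first, last = start // block_size, (end - 1) // block_size
    chunk = b"".join(zlib.decompress(blocks[i]) for i in range(first, last + 1))
    offset = start - first * block_size
    return chunk[offset:offset + (end - start)]


# Toy file (assumed): 100 lines of 9 bytes each.
text = "".join(f"line {i:03d}\n" for i in range(100))
blocks = compress_blocks(text)
# Only one of the 15 blocks is decompressed to serve this request.
print(read_range(blocks, 90, 99))
```

The trade-off in this naive version is that small independent blocks compress less well than the whole file would, so a practical scheme has to balance block size against random-access cost.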

Downloads: 1730
1236 Automatic Extraction of Features and Opinion-Oriented Sentences from Customer Reviews

Authors: Khairullah Khan, Baharum B. Baharudin, Aurangzeb Khan, Fazal_e_Malik

Abstract:

Opinion extraction about products from customer reviews is becoming an interesting area of research. Customer reviews about products are nowadays available from blogs and review sites. Tools are also being developed for extracting opinions from these reviews to help users as well as merchants track the most suitable choice of product. Efficient methods and techniques are therefore needed to extract opinions from reviews and blogs. As product reviews mostly contain discussion about features, functions and services, efficient techniques are required to extract user comments about the desired features, functions and services. In this paper we propose a novel idea for finding product features from user reviews in an efficient way. Our focus is to obtain the features and opinion-oriented words about products from text through auxiliary verbs (AV) {is, was, are, were, has, have, had}. From the results of our experiments we found that 82% of features and 85% of opinion-oriented sentences include AVs. Thus these AVs are good indicators of features and opinion orientation in customer reviews.

Keywords: Classification, Customer Reviews, Helping Verbs, Opinion Mining.
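The auxiliary-verb cue described in the abstract above lends itself to a very small filter. The sketch below keeps the sentences of a review that contain at least one of the seven listed auxiliary verbs, as candidates for feature and opinion extraction. The sentence splitter, the tokenizer and the function names are simplifying assumptions, and the later steps of the paper's method are not shown.

```python
import re

# The auxiliary verbs listed in the abstract.
AUXILIARY_VERBS = {"is", "was", "are", "were", "has", "have", "had"}


def split_sentences(review):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def sentences_with_av(review):
    """Return (sentence, matched auxiliary verbs) for each candidate sentence."""
    hits = []
    for sentence in split_sentences(review):
        words = re.findall(r"[a-z']+", sentence.lower())
        found = [w for w in words if w in AUXILIARY_VERBS]
        if found:
            hits.append((sentence, found))
    return hits


# Toy review (assumed).
review = "The battery is excellent. I bought it last week. The screen was too dim!"
for sentence, verbs in sentences_with_av(review):
    print(verbs, "->", sentence)
```

In both retained sentences the noun phrase before the auxiliary verb names a product feature and the phrase after it carries the opinion, which is the pattern the reported 82% and 85% figures suggest is common.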

Downloads: 2062
1235 Optimization and Kinetic Study of Gaharu Oil Extraction

Authors: Muhammad Hazwan H., Azlina M.F., Hasfalina C.M., Zurina Z.A., Hishamuddin J

Abstract:

Gaharu, produced by Aquilaria spp., is classified as one of the most valuable forest products traded internationally, as it is a very resinous, fragrant and highly valuable heartwood. Gaharu has been widely used in aromatherapy, medicine, perfume and religious practices. This work aimed to determine the factors affecting solid-liquid extraction of gaharu oil using hexane as the solvent under experimental conditions. The kinetics of extraction were assumed to follow, and were verified against, a second-order mechanism. The effects of three main factors, namely temperature, reaction time and solvent to solid ratio, were investigated to achieve maximum oil yield. The optimum conditions were found to be a temperature of 65°C, a reaction time of 9 hours and a solvent to solid ratio of 12:1, giving a 14.5% oil yield. The experimental kinetic data agreed well with the second-order extraction model. The initial extraction rate (h) was 0.0115 g mL⁻¹ min⁻¹, the extraction capacity (Cs) was 1.282 g mL⁻¹, the second-order extraction constant (k) was 0.007 mL g⁻¹ min⁻¹ and the coefficient of determination (R²) was 0.945.

Keywords: Gaharu, solid liquid extraction, optimization, kinetics.
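The reported kinetic constants can be cross-checked with a short calculation. The sketch assumes the commonly used second-order rate law dC/dt = k (Cs - C)², whose integrated form is C(t) = Cs² k t / (1 + Cs k t) and whose initial rate is h = k Cs². The abstract names a second-order model but does not write it out, so this form is an assumption, and the function names are hypothetical.

```python
def initial_rate(k, cs):
    """Initial extraction rate h = k * Cs^2 under the assumed second-order law."""
    return k * cs ** 2


def concentration(t, k, cs):
    """Oil concentration in the solvent after t minutes (integrated rate law)."""
    return (cs ** 2) * k * t / (1.0 + cs * k * t)


K = 0.007   # second-order extraction constant, mL g^-1 min^-1 (from the abstract)
CS = 1.282  # extraction capacity, g mL^-1 (from the abstract)

h = initial_rate(K, CS)
print(f"h = {h:.4f} g mL^-1 min^-1")
# Concentration at the reported optimum reaction time of 9 hours (540 min).
print(f"C(540 min) = {concentration(540, K, CS):.3f} g mL^-1")
```

The computed initial rate rounds to the 0.0115 reported in the abstract, so the three published constants are mutually consistent with this form of the model.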

Downloads: 3238
1234 Ultrasound Assisted Extraction and Microwave Assisted Extraction of Carotenoids from Melon Shells

Authors: A. Brinda Lakshmi, J. Lakshmi Priya

Abstract:

Cantaloupes (muskmelon and watermelon) contain biologically active molecules such as carotenoids, which are natural pigments used as food colorants that also afford health benefits. β-carotene is the major source of carotenoids present in muskmelon and watermelon shell. Carotenoids were extracted using microwave assisted extraction (MAE) and ultrasound assisted extraction (UAE) with organic lipophilic solvents such as acetone, methanol, and hexane. The extraction conditions (feed-solvent ratio, microwave power, ultrasound frequency, temperature and particle size) were varied and optimized. It was found that the yield of carotenoids was higher using UAE than MAE, and muskmelon had the highest yield of carotenoids when ethanol was used as the solvent with a 0.5 mm particle size.

Keywords: Carotenoids, extraction, muskmelon shell, watermelon shell.

Downloads: 941
1233 A Proposed Approach for Emotion Lexicon Enrichment

Authors: Amr Mansour Mohsen, Hesham Ahmed Hassan, Amira M. Idrees

Abstract:

Document analysis is an important research field that aims to gather information by analyzing the data in documents. Since an important goal in many fields is to understand what people actually want, sentimental analysis has become one of the vital fields closely related to document analysis. This research focuses on analyzing text documents to classify each document according to its opinion. Its aim is to detect emotions in text documents by enriching the lexicon, adapting its content through semantic pattern extraction. The proposed approach is presented, and different experiments are conducted from different perspectives to reveal the positive impact of the proposed approach on the classification results.

Keywords: Document analysis, sentimental analysis, emotion detection, WEKA tool, NRC Lexicon.

Downloads: 1426
1232 Text-Mining Approach for Evaluation of Affective Management Practices

Authors: Masaaki Saito, Qin Tang, Hiroyuki Umemuro

Abstract:

The purpose of this paper is to propose a text mining approach to evaluate companies' practices in affective management. Affective management argues that it is critical to take stakeholders' affects into consideration during the decision-making process, along with the traditional numerical and rational indices. CSR reports published by companies were collected as source information. Indices were proposed based on the frequency and collocation of words relevant to the affective management concept, using a text mining approach to analyze the text of the CSR reports. In addition, the relationships between the results obtained using the proposed indices and traditional indicators of business performance were investigated using correlation analysis. Those correlations were also compared between manufacturing and non-manufacturing companies. The results of this study revealed the possibility of evaluating the affective management practices of companies based on publicly available text documents.

Keywords: Affective management, Affect, Stakeholder, Text mining.

Downloads: 1825
1231 Meta-Classification using SVM Classifiers for Text Documents

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. In this paper, we investigate three approaches to building a meta-classifier in order to increase the classification accuracy. The basic idea is to learn a meta-classifier that optimally selects the best component classifier for each data point. The experimental results show that combining classifiers can significantly improve classification accuracy and that our meta-classification strategy gives better results than each individual classifier. For 7083 Reuters text documents we obtained classification accuracies of up to 92.04%.

Keywords: Meta-classification, Learning with Kernels, Support Vector Machine, and Performance Evaluation.

Downloads: 1585
1230 Semi-Automatic Analyzer to Detect Authorial Intentions in Scientific Documents

Authors: Kanso Hassan, Elhore Ali, Soule-dupuy Chantal, Tazi Said

Abstract:

Information Retrieval aims to study models and build systems that allow a user to find the relevant documents matching his information need. Information search remains a difficult problem because of the difficulty of representing and processing natural language phenomena such as polysemy. Intentional structures promise to be a new paradigm for extending existing document structures and enhancing the different phases of document processing, such as creation, editing, search and retrieval. Recognizing the intentions of the authors of texts can reduce the scale of this problem. In this article, we present an intention recognition system based on a semi-automatic method for extracting intentional information from a text corpus. The system is also able to update the ontology of intentions in order to enrich the knowledge base containing all possible intentions of a domain. This approach uses the construction of a semi-formal ontology, which is considered the conceptualization of the intentional information contained in a text. An experiment on scientific publications in the field of computer science was conducted to validate this approach.

Keywords: Information research, text analysis, intentional structure, segmentation, ontology, natural language processing.

Downloads: 1614
1229 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in the data mining community and has a number of applications, such as document archive filtering, document organization, topic detection and subject tracing. In the real world, some of the already clustered documents may lose importance while new documents of greater significance may emerge. Most of the work done so far on clustering unstructured text documents overlooks this aspect of clustering. This paper addresses the issue by using a Fading Function. The unstructured text documents are clustered, and for each cluster a statistics structure called the Cluster Profile (CP) is maintained. The Cluster Profile incorporates the Fading Function, which keeps account of the time-dependent importance of the cluster. The work proposes a novel algorithm for unstructured text documents, the Clustering n-ary Merge Algorithm (CnMA), which uses the Cluster Profile and the Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured Text Documents, Fading Function.

Downloads 1960
1228 A Content Vector Model for Text Classification

Authors: Eric Jiang

Abstract:

As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies texts by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification, and experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.
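
As a rough illustration of LSI-based classification, the sketch below builds one rank-reduced space with a truncated SVD, folds documents into it, and assigns the label of the most similar class centroid. This is a simplified stand-in, not the paper's model: the abstract describes multiple augmented category LSI spaces, whereas this sketch uses a single shared space. The toy term-document matrix, the labels, and the helper names `lsi_fit`, `lsi_project` and `classify` are all assumptions.

```python
import numpy as np


def lsi_fit(term_doc: np.ndarray, k: int):
    """Truncated SVD of a term-by-document matrix; returns (U_k, s_k)."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k]


def lsi_project(U_k: np.ndarray, s_k: np.ndarray, doc_vec: np.ndarray) -> np.ndarray:
    """Fold a term-count vector into the k-dimensional LSI space: d^T U_k S_k^-1."""
    return (doc_vec @ U_k) / s_k


def classify(U_k, s_k, centroids: dict, doc_vec: np.ndarray) -> str:
    """Return the label whose centroid has the highest cosine similarity."""
    q = lsi_project(U_k, s_k, doc_vec)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    return max(centroids, key=lambda label: cosine(q, centroids[label]))


# Toy data: rows are the terms free, money, meeting, report; columns are documents.
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
labels = ["spam", "spam", "ham", "ham"]

U_k, s_k = lsi_fit(A, k=2)
projected = [lsi_project(U_k, s_k, A[:, j]) for j in range(A.shape[1])]
centroids = {
    label: np.mean([p for p, l in zip(projected, labels) if l == label], axis=0)
    for label in set(labels)
}

print(classify(U_k, s_k, centroids, np.array([1.0, 1.0, 0.0, 0.0])))
print(classify(U_k, s_k, centroids, np.array([0.0, 0.0, 1.0, 0.0])))
```

With this toy matrix the first query shares terms only with the spam documents and the second only with the ham documents, so the two calls select those labels respectively.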

Keywords: Feature Selection, Latent Semantic Indexing, Text Classification, Vector Space Model.

Downloads 1861
1227 Automatic Extraction of Water Bodies Using Whole-R Method

Authors: Nikhat Nawaz, S. Srinivasulu, P. Kesava Rao

Abstract:

Feature extraction plays an important role in many remote sensing applications. Automatic extraction of water bodies is of great significance in applications such as change detection and image retrieval. This paper presents a procedure for the automatic extraction of water information from remote sensing images. The algorithm uses the relative location of the R color component in the chromaticity diagram. This method is then combined with the spatial scale transformation of the whole method, which is based on a water index fitted from a spectral library. Experimental results demonstrate the improved accuracy and effectiveness of the integrated method for automatic extraction of water bodies.
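
The chromaticity step can be sketched as follows, under stated assumptions. The red chromaticity coordinate r = R / (R + G + B) is standard, but the abstract does not say how its location is used, so the decision rule here (flag pixels with r below a threshold) and the value `r_max=0.25` are purely hypothetical, as are the function names and the toy pixel values. The water-index part of the whole method is not shown, and a real classifier would also need to separate water from other low-r surfaces such as vegetation.

```python
import numpy as np


def r_chromaticity(rgb: np.ndarray) -> np.ndarray:
    """Return r = R / (R + G + B) per pixel for an (H, W, 3) image; 0 where the sum is 0."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=2)
    return np.divide(rgb[..., 0], total, out=np.zeros_like(total), where=total > 0)


def water_mask(rgb: np.ndarray, r_max: float = 0.25) -> np.ndarray:
    """Hypothetical rule: flag non-black pixels whose red chromaticity is below r_max."""
    non_black = rgb.astype(np.float64).sum(axis=2) > 0
    return (r_chromaticity(rgb) < r_max) & non_black


# Toy 2x2 image: bluish water, brown soil, bright sand, black (no data).
image = np.array([[[20, 60, 120], [120, 90, 60]],
                  [[200, 180, 140], [0, 0, 0]]], dtype=np.uint8)

mask = water_mask(image)
print(mask)
```

Only the bluish pixel (r = 20 / 200 = 0.1) falls below the assumed threshold; the soil and sand pixels have r of roughly 0.44 and 0.38, and the black pixel is excluded explicitly.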

Keywords: Chromaticity, Feature Extraction, Remote Sensing, Spectral library, Water Index.

Downloads 3333
1226 Narrative and Expository Text Reading Comprehension by Fourth Grade Spanish-Speaking Children

Authors: Mariela V. De Mier, Veronica S. Sanchez Abchi, Ana M. Borzone

Abstract:

This work explores the factors that affect the reading comprehension process across different types of texts. In a recent study with 2nd, 3rd and 4th grade children, reading comprehension of narrative texts was observed to be better than comprehension of expository texts. Nevertheless, it seems that not only the type of text but also other textual factors account for comprehension, depending on the cognitive processing demands posed by the text. To explore this assumption, three narrative and three expository texts were written with different degrees of complexity. A group of 40 fourth grade Spanish-speaking children took part in the study. The children were asked to read the texts and orally answer three literal and three inferential questions for each text. The quantitative and qualitative analysis of the children's responses showed that they had difficulties with both narrative and expository texts. The main difficulty lay in answering questions that required establishing complex relationships among information units that were present in the text or that had to be activated from the children's previous knowledge to make an inference. Based on the data analysis, it can be concluded that there is some interaction between the type of text and the cognitive processing load of a specific text.

Keywords: comprehension, textual factors, type of text, processing demands.

Downloads 1366