Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5

Text categorization Related Publications

5 An Semantic Algorithm for Text Categoritation

Authors: Xu Zhao

Abstract:

Text categorization techniques are widely used to many Information Retrieval (IR) applications. In this paper, we proposed a simple but efficient method that can automatically find the relationship between any pair of terms and documents, also an indexing matrix is established for text categorization. We call this method Indexing Matrix Categorization Machine (IMCM). Several experiments are conducted to show the efficiency and robust of our algorithm.

Keywords: Text categorization, Sub-space learning, Latent Semantic Space

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1176
4 A Proposed Hybrid Approach for Feature Selection in Text Document Categorization

Authors: M. F. Zaiyadi, B. Baharudin

Abstract:

Text document categorization involves large amount of data or features. The high dimensionality of features is a troublesome and can affect the performance of the classification. Therefore, feature selection is strongly considered as one of the crucial part in text document categorization. Selecting the best features to represent documents can reduce the dimensionality of feature space hence increase the performance. There were many approaches has been implemented by various researchers to overcome this problem. This paper proposed a novel hybrid approach for feature selection in text document categorization based on Ant Colony Optimization (ACO) and Information Gain (IG). We also presented state-of-the-art algorithms by several other researchers.

Keywords: Ant colony optimization, Feature selection, Text categorization, text representation, information gain

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1751
3 A Knowledge-Based E-mail System Using Semantic Categorization and Rating Mechanisms

Authors: Azleena Mohd Kassim, Muhamad Rashidi A. Rahman, Yu-N. Cheah

Abstract:

Knowledge-based e-mail systems focus on incorporating knowledge management approach in order to enhance the traditional e-mail systems. In this paper, we present a knowledgebased e-mail system called KS-Mail where people do not only send and receive e-mail conventionally but are also able to create a sense of knowledge flow. We introduce semantic processing on the e-mail contents by automatically assigning categories and providing links to semantically related e-mails. This is done to enrich the knowledge value of each e-mail as well as to ease the organization of the e-mails and their contents. At the application level, we have also built components like the service manager, evaluation engine and search engine to handle the e-mail processes efficiently by providing the means to share and reuse knowledge. For this purpose, we present the KS-Mail architecture, and elaborate on the details of the e-mail server and the application server. We present the ontology mapping technique used to achieve the e-mail content-s categorization as well as the protocols that we have developed to handle the transactions in the e-mail system. Finally, we discuss further on the implementation of the modules presented in the KS-Mail architecture.

Keywords: Text categorization, knowledge-based system, E-mail rating, ontology mapping

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1140
2 The Influence of Preprocessing Parameters on Text Categorization

Authors: Jan Pomikalek, Radim Rehurek

Abstract:

Text categorization (the assignment of texts in natural language into predefined categories) is an important and extensively studied problem in Machine Learning. Currently, popular techniques developed to deal with this task include many preprocessing and learning algorithms, many of which in turn require tuning nontrivial internal parameters. Although partial studies are available, many authors fail to report values of the parameters they use in their experiments, or reasons why these values were used instead of others. The goal of this work then is to create a more thorough comparison of preprocessing parameters and their mutual influence, and report interesting observations and results.

Keywords: Machine Learning, classification, Text categorization, electronic documents

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1271
1 Hybrid Machine Learning Approach for Text Categorization

Authors: Nerijus Remeikis, Ignas Skucas, Vida Melninkaite

Abstract:

Text categorization - the assignment of natural language documents to one or more predefined categories based on their semantic content - is an important component in many information organization and management tasks. Performance of neural networks learning is known to be sensitive to the initial weights and architecture. This paper discusses the use multilayer neural network initialization with decision tree classifier for improving text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree from root node until a final leave is used for initialization of multilayer neural network. The experimental evaluation demonstrates this approach provides better classification accuracy with Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with multilayer neural network initialized with traditional random method and decision tree classifiers.

Keywords: Neural Networks, Machine Learning, Text categorization, Decision trees

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1508