Search results for: Document Classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1357

1297 Computer-aided Lenke Classification of Scoliotic Spines

Authors: Neila Mezghani, Philippe Phan, Hubert Labelle, Carl Eric Aubin, Jacques de Guise

Abstract:

The identification and classification of spine deformity play an important role in surgical planning for adolescent patients with idiopathic scoliosis. The subject of this article is the Lenke classification of scoliotic spines using Cobb angle measurements. The purpose is two-fold: (1) to design a rule-based diagram to assist clinicians in the classification process and (2) to investigate a computer classifier that improves classification time and accuracy. The efficiency of the rule-based diagram was evaluated in a series of scoliotic classifications by 10 clinicians. The computer classifier was tested on a radiographic measurement database of 603 patients. Classification accuracy was 93% using the rule-based diagram and 99% for the computer classifier. Both the computer classifier and the rule-based diagram can efficiently assist clinicians in their Lenke classification of spine scoliosis.

Keywords: Scoliosis, Lenke model, decision-rules, computer aided classifier.

Downloads: 1636
1296 Dataset Analysis Using Membership-Deviation Graph

Authors: Itgel Bayarsaikhan, Jimin Lee, Sejong Oh

Abstract:

Classification is one of the primary themes in computational biology. The accuracy of classification strongly depends on the quality of the dataset, and a method is needed to evaluate this quality. In this paper, we propose a new graphical analysis method using a 'Membership-Deviation Graph (MDG)' for analyzing the quality of a dataset. The MDG represents the degree of membership and the deviations for instances of a class in the dataset. The result of MDG analysis is used for understanding specific features and for selecting the best features for classification.

Keywords: feature, classification, machine learning algorithm.

Downloads: 1445
1295 Unsupervised Texture Classification and Segmentation

Authors: V.P.Subramanyam Rallabandi, S.K.Sett

Abstract:

An unsupervised classification algorithm is derived by modeling observed data as a mixture of several mutually exclusive classes that are each described by linear combinations of independent non-Gaussian densities. The algorithm estimates the data density in each class by using parametric nonlinear functions that fit to the non-Gaussian structure of the data. This improves classification accuracy compared with standard Gaussian mixture models. When applied to textures, the algorithm can learn basis functions for images that capture the statistically significant structure intrinsic in the images. We apply this technique to the problem of unsupervised texture classification and segmentation.

Keywords: Gaussian Mixture Model, Independent Component Analysis, Segmentation, Unsupervised Classification.

Downloads: 1591
1294 Evaluating some Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both time and memory requirements can be prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. Four feature selection methods are evaluated: Random Selection, Information Gain (IG), Support Vector Machine (called SVM_FS) and Genetic Algorithm with SVM (GA_FS). We show that the best results were obtained with the SVM_FS and GA_FS methods for a relatively small feature vector dimension, compared with the IG method, which needs longer vectors to reach quite similar classification accuracies. We also present a novel method to better correlate the SVM kernel's parameters (Polynomial or Gaussian kernel).
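
As a minimal sketch of the Information Gain (IG) criterion named above, the snippet below ranks the terms of a toy corpus by how much knowing "term present / term absent" reduces class entropy, and keeps the top-scoring terms. It is not the authors' implementation: the corpus, the labels and the helper names (`entropy`, `information_gain`, `top_k`) are assumptions made purely for illustration.

```python
import math

def entropy(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    classes = sorted(set(labels))

    def class_counts(indices):
        return [sum(1 for i in indices if labels[i] == c) for c in classes]

    all_idx = range(len(docs))
    with_term = [i for i in all_idx if term in docs[i]]
    without_term = [i for i in all_idx if term not in docs[i]]
    gain = entropy(class_counts(all_idx))
    # Subtract the entropy remaining after splitting on the term.
    for subset in (with_term, without_term):
        if subset:
            gain -= len(subset) / len(docs) * entropy(class_counts(subset))
    return gain

# Hypothetical toy corpus: each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]

vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)
top_k = ranked[:2]  # reduced feature set
```

On this toy data, "goal" and "vote" separate the two classes perfectly and are kept, while "team" occurs equally in both classes and scores zero.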

Keywords: Features selection, learning with kernels, support vector machine, genetic algorithms and classification.

Downloads: 1538
1293 Automatic Enhanced Update Summary Generation System for News Documents

Authors: S. V. Kogilavani, C. S. Kanimozhiselvi, S. Malliga

Abstract:

Fast-changing knowledge systems on the Internet can be accessed more efficiently with the help of automatic document summarization and updating techniques. The aim of multi-document update summary generation is to construct a summary unfolding the mainstream of data from a collection of documents, based on the hypothesis that the user has already read a set of previous documents. In order to provide more semantic information from the documents, deeper linguistic or semantic analysis of the source documents was used instead of relying only on document word frequencies to select important concepts. In order to produce a responsive summary, meaning-oriented structural analysis is needed. To address this issue, the proposed system presents a document summarization approach based on sentence annotation with aspects, prepositions and named entities. A semantic element extraction strategy is used to select important concepts from the documents, which are then used to generate an enhanced semantic summary.

Keywords: Aspects, named entities, prepositions, update summary.

Downloads: 2134
1292 Performance Analysis of Artificial Neural Network Based Land Cover Classification

Authors: Najam Aziz, Nasru Minallah, Ahmad Junaid, Kashaf Gul

Abstract:

Land cover classification using automated classification techniques on remotely sensed multi-spectral imagery is a promising area of research. Different land conditions at different times are captured by satellite and monitored by applying different classification algorithms in a specific environment. In this paper, a SPOT-5 image provided by SUPARCO has been studied and classified in the Environment for Visualizing Images (ENVI), a tool widely used in remote sensing. Then, an Artificial Neural Network (ANN) classification technique is used to detect the land cover changes in Abbottabad district. The results are compared with a pixel-based distance classifier. They show that the ANN gives a better overall accuracy of 99.20% and a Kappa coefficient of 0.98 compared with the Mahalanobis Distance Classifier.
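
For readers unfamiliar with the two figures reported above, the following sketch shows how overall accuracy and Cohen's Kappa coefficient are usually derived from a confusion matrix. The matrix values are invented for illustration and have no connection to the SPOT-5 experiment; the function name is likewise an assumption.

```python
def overall_accuracy_and_kappa(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = sum(sum(row) for row in confusion)
    size = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / n
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical 3-class confusion matrix (100 samples in total).
confusion = [
    [48, 2, 0],
    [1, 28, 1],
    [0, 1, 19],
]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
```

For this made-up matrix the overall accuracy is 0.95 and Kappa is about 0.92; Kappa is lower because it discounts the agreement expected by chance.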

Keywords: Landcover classification, artificial neural network, remote sensing, SPOT-5.

Downloads: 1607
1291 Genetic Programming Approach for Multi-Category Pattern Classification Applied to Network Intrusions Detection

Authors: K.M. Faraoun, A. Boukelif

Abstract:

This paper describes a new approach to classification using genetic programming. The proposed technique consists of genetically coevolving a population of non-linear transformations on the input data to be classified, and mapping them to a new space with a reduced dimension, in order to get maximum inter-class discrimination. The classification of new samples is then performed on the transformed data, and so becomes much easier. Contrary to existing GP-classification techniques, the proposed one uses a dynamic repartition of the transformed data into separate intervals; the efficacy of a given interval repartition is handled by the fitness criterion, with maximum class discrimination. Experiments were first performed using Fisher's Iris dataset, and then the KDD-99 Cup dataset was used to study the intrusion detection and classification problem. The results demonstrate that the proposed genetic approach outperforms the existing GP-classification methods [1], [2] and [3], and gives very acceptable results compared to other existing techniques proposed in [4], [5], [6], [7] and [8].

Keywords: Genetic programming, patterns classification, intrusion detection

Downloads: 1711
1290 Lithofacies Classification from Well Log Data Using Neural Networks, Interval Neutrosophic Sets and Quantification of Uncertainty

Authors: Pawalai Kraipeerapun, Chun Che Fung, Kok Wai Wong

Abstract:

This paper proposes a novel approach to lithofacies classification based on an assessment of the uncertainty in the classification results. In the proposed approach, multiple neural networks (NN) and interval neutrosophic sets (INS) are used to classify the input well log data into multiple classes of lithofacies. A pair of n-class neural networks is used to predict n degrees of truth membership and n degrees of false membership. Indeterminacy memberships, or uncertainties in the predictions, are estimated using a multidimensional interpolation method. These three memberships form the INS used to support the confidence in the results of multiclass classification. Based on the experimental data, our approach improves the classification performance compared with an existing technique applied only to the truth membership. In addition, our approach can provide a measure of uncertainty in multiclass classification.

Keywords: Multiclass classification, feed-forward backpropagation neural network, interval neutrosophic sets, uncertainty.

Downloads: 1633
1289 Support Vector Machine Approach for Classification of Cancerous Prostate Regions

Authors: Metehan Makinacı

Abstract:

The objective of this paper is to apply the support vector machine (SVM) approach to the classification of cancerous and normal regions of prostate images. Three kinds of textural features are extracted and used for the analysis: parameters of the Gauss-Markov random field (GMRF), correlation function and relative entropy. Prostate images are acquired by a system consisting of a microscope, a video camera and a digitizing board. Cross-validated classification over a database of 46 images is implemented to evaluate the performance. In SVM classification, sensitivity and specificity of 96.2% and 97.0%, respectively, are achieved for the 32x32 pixel block sized data, with an overall accuracy of 96.6%. Classification performance is compared with artificial neural network and k-nearest neighbor classifiers. Experimental results demonstrate that the SVM approach gives the best performance.
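
The three reported figures are related through the confusion counts, as the short sketch below illustrates. The counts are hypothetical: they are chosen only so that the ratios reproduce the percentages quoted above, and the assumption of equally many cancerous and normal blocks is ours, not something stated in the abstract.

```python
def sensitivity_specificity_accuracy(tp, fn, tn, fp):
    """Standard definitions for a two-class (cancerous vs. normal) problem."""
    sensitivity = tp / (tp + fn)            # cancerous blocks correctly detected
    specificity = tn / (tn + fp)            # normal blocks correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts: 500 cancerous and 500 normal blocks (assumed, not from the paper).
sens, spec, acc = sensitivity_specificity_accuracy(tp=481, fn=19, tn=485, fp=15)
```

With equal class sizes the overall accuracy is the mean of sensitivity and specificity, which is consistent with 96.6% lying midway between 96.2% and 97.0%.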

Keywords: Computer-aided diagnosis, support vector machines, Gauss-Markov random fields, texture classification.

Downloads: 1792
1288 Establishment of Air Quality Zones in Italy

Authors: M. G. Dirodi, G. Gugliotta, C. Leonardi

Abstract:

Member States shall establish zones and agglomerations throughout their territory to assess and manage air quality in order to comply with European directives. In Italy, Decree 155/2010, transposing Directive 2008/50/EC on ambient air quality and cleaner air for Europe, merged into a single act the previous provisions on ambient air quality assessment and management, including those resulting from the implementation of Directive 2004/107/EC relating to arsenic, cadmium, nickel, mercury and polycyclic aromatic hydrocarbons in ambient air. Decree 155/2010 introduced stricter rules for identifying zones on the basis of the characteristics of the territory instead of pollution levels, as was done in the past. The implementation of these new criteria has reduced the great variability of the previous zoning, leading to a significant reduction in the total number of zones and to a complete and uniform ambient air quality assessment and management throughout the country. The present document concerns the new definition of zones in Italy according to Decree 155/2010. In particular, the paper describes and analyzes the outcome of zoning and classification.

Keywords: Zones, agglomerations, air quality assessment, classification.

Downloads: 2129
1287 Comparison between Different Classifications of Periodontal Diseases and Their Advantages

Authors: Ilma Robo, Saimir Heta, Merilda Tarja, Sonila Kapaj, Eduart Kapaj, Geriona Lasku

Abstract:

The classification of periodontal diseases has changed significantly in favor of simplifying the protocol of diagnosis and periodontal treatment. This review study aims to highlight the latest publications on the new periodontal disease classification, discussing the most significant differences from the old classification and the advantages or disadvantages for clinical application. The study also addresses the growing tendency to link the classification of periodontal diseases with predetermined protocols of periodontal treatment for the diagnoses included in the classification. The new classification of periodontal diseases is rather comprehensive in its subdivisions, as the disease is viewed in its entirety: its biological dimensions, its degree of aggravation and progression, its relation to risk factors, the patient's susceptibility, and its impact on the general health status of the patient.

Keywords: Periodontal diseases, clinical application, periodontal treatment, oral diagnosis.

Downloads: 597
1286 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and can be used for data filtering in artificial intelligence. Its broad applicability to all kinds of data has led to its use in nearly every field of modern life. Classification helps us group items according to the features judged interesting and useful. In this paper, we compare two classification methods, Naive Bayes and ADTree, used to detect spam e-mail. This choice is motivated by the fact that the Naive Bayes algorithm is based on probability calculus while the ADTree algorithm is based on decision trees. The parameter settings of these classifiers are chosen to maximize the true positive rate and minimize the false positive rate. The experimental results present classification accuracy and cost analysis with a view to choosing the optimal classifier for spam detection. The number of attributes is also examined to find a tradeoff between the attribute count and the classification accuracy.

Keywords: Classification, data mining, spam filtering, naive Bayes, decision tree.

Downloads: 1499
1285 An Efficient Classification Method for Inverse Synthetic Aperture Radar Images

Authors: Sang-Hong Park

Abstract:

This paper proposes an efficient method to classify inverse synthetic aperture radar (ISAR) images. Because ISAR images can be translated and rotated in the 2-dimensional image plane, invariance to these two factors is indispensable for successful classification. The proposed method achieves invariance to translation and rotation of ISAR images using a combination of the two-dimensional Fourier transform, polar mapping and correlation-based alignment of the image. Classification is conducted using a simple matching score classifier. In simulations using real ISAR images of five scaled models measured in a compact range, the proposed method yields classification ratios higher than 97%.
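
The translation-invariance part of this idea rests on a standard property: the magnitude of the two-dimensional discrete Fourier transform does not change under a circular shift of the image. The sketch below checks that property on a tiny made-up image using a naive DFT. It covers only translation (rotation handling via polar mapping and alignment is not shown), and the image and function names are assumptions for illustration, not the paper's code.

```python
import cmath

def dft2_magnitude(image):
    """Magnitude of the naive 2-D DFT of a small image (list of rows)."""
    rows, cols = len(image), len(image[0])
    result = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    phase = -2j * cmath.pi * (u * y / rows + v * x / cols)
                    acc += image[y][x] * cmath.exp(phase)
            out_row.append(abs(acc))
        result.append(out_row)
    return result

def circular_shift(image, dy, dx):
    """Shift the image by (dy, dx) pixels with wrap-around."""
    rows, cols = len(image), len(image[0])
    return [[image[(y - dy) % rows][(x - dx) % cols] for x in range(cols)]
            for y in range(rows)]

# Hypothetical 4x4 "target" image.
image = [
    [0, 0, 0, 0],
    [0, 5, 2, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
shifted = circular_shift(image, 1, 2)

# Largest difference between the two magnitude spectra (zero up to rounding).
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(dft2_magnitude(image), dft2_magnitude(shifted))
    for a, b in zip(row_a, row_b)
)
```

A shift only multiplies each Fourier coefficient by a unit-modulus phase factor, so the magnitudes agree and a classifier fed with them sees the same input wherever the target sits in the frame.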

Keywords: Radar, ISAR, radar target classification, radar imaging.

Downloads: 2194
1284 A New Model for Question Answering Systems

Authors: Mohammad Reza Kangavari, Samira Ghandchi, Manak Golpour

Abstract:

Most Question Answering (QA) systems are composed of three main modules: question processing, document processing and answer processing. The question processing module plays an important role in QA systems; if it does not work properly, it causes problems for the other modules. Moreover, answer processing is an emerging topic in Question Answering, where systems are often required to rank and validate candidate answers. Techniques aimed at finding short and precise answers are often based on semantic classification. This paper discusses a new model for question answering that improves two main modules, question processing and answer processing. Question processing rests on two important components. The first is question classification, which specifies the types of question and answer. The second is reformulation, which converts the user's question into a question the QA system can understand in a specific domain. The answer processing module consists of candidate answer filtering and candidate answer ordering components, and also has a validation section for interacting with the user. This makes the module better suited to finding the exact answer. In this paper we describe the question and answer processing modules, including modeling, implementing and evaluating the system. The system was implemented in two versions. Results show that 'Version No.1' gave correct answers to 70% of questions (30 correct answers to 50 asked questions) and 'Version No.2' gave correct answers to 94% of questions (47 correct answers to 50 asked questions).

Keywords: Answer Processing, Classification, Question Answering and Query Reformulation.

Downloads: 2125
1283 A New Approach for Fingerprint Classification based on Minutiae Distribution

Authors: Jayant V Kulkarni, Jayadevan R, Suresh N Mali, Hemant K Abhyankar, Raghunath S Holambe

Abstract:

The paper describes a new approach to fingerprint classification based on the distribution of local features (minute details, or minutiae) of the fingerprints. The main advantage is that fingerprint classification provides an indexing scheme to facilitate efficient matching in a large fingerprint database. A set of rules based on a heuristic approach is proposed. The area around the core point is treated as the area of interest for extracting the minutiae features, as there are substantial variations around the core point compared with the areas away from it. The core point of a fingerprint is located at the point of maximum curvature. The experimental results report an overall average accuracy of 86.57% in fingerprint classification.

Keywords: Minutiae distribution, Minutiae, Classification, Orientation, Heuristic.

Downloads: 1568
1282 Content-based Indoor/Outdoor Video Classification System for a Mobile Platform

Authors: Mitko Veta, Tomislav Kartalov, Zoran Ivanovski

Abstract:

Organization of video databases is becoming a difficult task as the amount of video content increases. Video classification based on the content of videos can significantly increase the speed of tasks such as browsing and searching for a particular video in a database. In this paper, a content-based video classification system for the classes indoor and outdoor is presented. The system is intended to be used on a mobile platform with modest resources. The algorithm makes use of the temporal redundancy in videos, which allows the use of an uncomplicated classification model while still achieving reasonable accuracy. The training and evaluation were done on a database of 443 videos downloaded from a video sharing service. A total accuracy of 87.36% was achieved.

Keywords: Indoor/outdoor, video classification, image classification

Downloads: 1523
1281 Chilean Wines Classification based only on Aroma Information

Authors: Nicolás H. Beltrán, Manuel A. Duarte-Mermoud, Víctor A. Soto, Sebastián A. Salah, and Matías A. Bustos

Abstract:

Results of Chilean wine classification based on the information provided by an electronic nose are reported in this paper. The classification scheme consists of two stages. In the first stage, Principal Component Analysis is used as a feature extraction method to reduce the dimensionality of the original information. In the second, Radial Basis Function Neural Networks are used as a pattern recognition technique to perform the classification. The objective of this study is to classify different Cabernet Sauvignon, Merlot and Carménère wine samples from different years, valleys and vineyards of Chile.

Keywords: Feature extraction techniques, Pattern recognition techniques, Principal component analysis, Radial basis functions neural networks, Wine classification.

Downloads: 1547
1280 Restoration of Noisy Document Images with an Efficient Bi-Level Adaptive Thresholding

Authors: Abhijit Mitra

Abstract:

An effective approach for extracting document images from a noisy background is introduced. The entire scheme is divided into three sub-techniques: the initial preprocessing operations for noise cluster tightening, the introduction of a new thresholding method by maximizing the ratio of standard deviations of the combined effect on the image to the sum of weighted classes, and finally the image restoration phase by image binarization utilizing the proposed optimum threshold level. The proposed method is found to be efficient compared to the existing schemes in terms of computational complexity as well as speed, with better noise rejection.

Keywords: Document image extraction, Preprocessing, Ratio of standard deviations, Bi-level adaptive thresholding.

Downloads: 1457
1279 Classification of Prostate Cell Nuclei using Artificial Neural Network Methods

Authors: M. Sinecen, M. Makinacı

Abstract:

The purpose of this paper is to assess the value of neural networks for the classification of cancer and noncancer prostate cells. Gauss-Markov Random Field, Fourier entropy and wavelet average deviation features are calculated from 80 noncancer and 80 cancer prostate cell nuclei. For classification, three artificial neural network techniques are used: multilayer perceptron, radial basis function and learning vector quantization. Two configurations are used for the multilayer perceptron. The first has a single hidden layer with between 3 and 15 nodes; the second has two hidden layers, each with between 3 and 15 nodes. An overall classification rate of 86.88% is achieved.

Keywords: Artificial neural networks, texture classification, cancer diagnosis.

Downloads: 1591
1278 Fast Document Segmentation Using Contour and X-Y Cut Technique

Authors: Boontee Kruatrachue, Narongchai Moongfangklang, Kritawan Siriboon

Abstract:

This paper describes a fast and efficient method for page segmentation of documents containing nonrectangular blocks. The segmentation is based on an edge-following algorithm using a small window of 16 by 32 pixels. This segmentation is very fast since only the border pixels of a paragraph are used, without scanning the whole page. Still, the segmentation may contain errors if the space between blocks is smaller than the window used in edge following. Consequently, this paper reduces this error by first identifying the missed segmentation points using direction information in edge following, and then using an X-Y cut at each missed segmentation point to separate the connected columns. The advantage of the proposed method is the fast identification of missed segmentation points. This methodology is faster, with less overhead, than other algorithms that need to access many more pixels of a document.

Keywords: Contour Direction Technique, Missed Segmentation Points, Page Segmentation, Recursive X-Y Cut Technique

Downloads: 2784
1277 Improving RBF Networks Classification Performance by using K-Harmonic Means

Authors: Z. Zainuddin, W. K. Lye

Abstract:

In this paper, a clustering algorithm named K-Harmonic Means (KHM) was employed in the training of Radial Basis Function Networks (RBFNs). KHM organized the data into clusters and determined the centres of the basis functions. The popular clustering algorithms K-means (KM) and Fuzzy c-means (FCM) are highly dependent on the initial choice of elements that represent each cluster. KHM avoids this problem, which leads to improved classification performance compared with the other clustering algorithms. The classification accuracy of KM, FCM and KHM was compared on the benchmark data sets Iris Plant, Diabetes and Breast Cancer. RBFN training with the KHM algorithm shows better classification accuracy.
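
To make the difference between the two criteria concrete, the sketch below evaluates the K-means objective (squared distance to the nearest centre) next to the K-harmonic means performance function as it is usually stated in the KHM literature (the harmonic average of the distances to all centres, raised to a power p). Only the objectives are computed, not the centre-update rules; the data points, centre positions and the choice p = 2 are assumptions for illustration and are unrelated to the paper's experiments.

```python
import math

def kmeans_objective(points, centres):
    """Sum over points of the squared distance to the nearest centre."""
    return sum(min(math.dist(x, c) ** 2 for c in centres) for x in points)

def khm_objective(points, centres, p=2, eps=1e-12):
    """Sum over points of K / sum_k (1 / d(x, c_k)^p), a harmonic average."""
    k = len(centres)
    total = 0.0
    for x in points:
        # eps guards against division by zero when a point sits on a centre.
        inverse_sum = sum(1.0 / max(math.dist(x, c), eps) ** p for c in centres)
        total += k / inverse_sum
    return total

# Hypothetical 2-D data forming two groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
good_centres = [(0.0, 0.5), (5.5, 5.0)]   # one centre per group
poor_centres = [(0.0, 0.5), (0.5, 0.5)]   # both centres in the same group

khm_good = khm_objective(points, good_centres)
khm_poor = khm_objective(points, poor_centres)
```

Because every centre contributes to every point's term, the harmonic average gives a smooth objective in which a badly placed centre is still influenced by distant points. This is the usual explanation given for KHM's lower sensitivity to initialization.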

Keywords: Neural networks, Radial basis functions, Clustering method, K-harmonic means.

Downloads: 1850
1276 Automatic Fingerprint Classification Using Graph Theory

Authors: Mana Tarjoman, Shaghayegh Zarei

Abstract:

Efficient classification methods are necessary for automatic fingerprint recognition systems. This paper introduces a new structural approach to fingerprint classification that uses the directional image of fingerprints to increase the number of subclasses. In this method, the directional image of a fingerprint is segmented into regions consisting of pixels with the same direction. Afterwards, the relational graph of the segmented image is constructed and, from it, a super graph containing the prominent information of this graph is formed. Finally, we apply a matching technique to compare the obtained graph with the model graphs in order to classify fingerprints using a cost function. Increasing the number of subclasses with acceptable classification accuracy and faster processing in fingerprint recognition makes this system superior.

Keywords: Classification, Directional image, Fingerprint, Graph, Super graph.

Downloads: 3634
1275 Wavelet-Based Classification of Outdoor Natural Scenes by Resilient Neural Network

Authors: Amitabh Wahi, Sundaramurthy S.

Abstract:

Natural outdoor scene classification is an active and promising research area around the globe. In this study, the classification is carried out in two phases. In the first phase, features are extracted from the images by the wavelet decomposition method and stored in a database as feature vectors. In the second phase, neural classifiers, namely a back-propagation neural network (BPNN) and a resilient back-propagation neural network (RPNN), are employed for the classification of scenes. Four hundred color images of two classes, forest and street, are taken from the MIT database. A comparative study has been carried out on the performance of the two neural classifiers, BPNN and RPNN, with an increasing number of test samples. RPNN showed better classification results than BPNN on the large test samples.

Keywords: BPNN, Classification, Feature extraction, RPNN, Wavelet.

Downloads: 1943
1274 The Development of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications

Authors: Mohamed R. Mhereeg

Abstract:

The paper investigates the feasibility of constructing a software multi-agent based monitoring and classification system and utilizing it to provide automated and accurate classification of end users developing applications in the spreadsheet domain. The agents function autonomously to provide continuous and periodic monitoring of Excel spreadsheet workbooks. This resulted in the development of the Multi-Agent Classification System (MACS), which complies with the specifications of the Foundation for Intelligent Physical Agents (FIPA). Different technologies have been brought together to build MACS. The strength of the system is the integration of agent technology and the FIPA specifications with other technologies: Windows Communication Foundation (WCF) services, Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). Microsoft .NET Windows service based agents were used to develop the monitoring agents of MACS. The .NET WCF services, together with the SOA approach, allowed distribution and communication between agents over the WWW in order to support monitoring and classification of multiple developers. ODM was used to automate the classification phase of MACS.

Keywords: Autonomous, Classification, MACS, Multi-Agent, SOA, WCF.

Downloads: 1589
1273 The Classification Model for Hard Disk Drive Functional Tests under Sparse Data Conditions

Authors: S. Pattanapairoj, D. Chetchotsak

Abstract:

This paper proposes classification models to be used as a proxy for hard disk drive (HDD) functional test equipment, which requires approximately two weeks or more to classify HDD status as either "Pass" or "Fail". These models were constructed using a committee network consisting of a number of single neural networks. This paper also includes a method, called the "enforce learning method", to solve the problem of sparse data for failed parts. Our results reveal that the classification models constructed with the proposed method perform well under sparse data conditions, and thus the models, which take a few seconds for HDD classification, could be used to substitute for the HDD functional tests.

Keywords: Sparse data, Classifications, Committee network

Downloads: 1736
1272 Selection of Appropriate Classification Technique for Lithological Mapping of Gali Jagir Area, Pakistan

Authors: Khunsa Fatima, Umar K. Khattak, Allah Bakhsh Kausar

Abstract:

Satellite image interpretation and analysis assist geologists by providing valuable information about the geology and minerals of an area to be surveyed. A test site in Fatejang of district Attock has been studied using Landsat ETM+ and ASTER satellite images for lithological mapping. Five different supervised image classification techniques, namely maximum likelihood, parallelepiped, minimum distance to mean, Mahalanobis distance and spectral angle mapper, were applied to both satellite images to find the most suitable classification technique for lithological mapping in the study area. The results of these five techniques were compared with the geological map produced by the Geological Survey of Pakistan. The maximum likelihood classification applied to the ASTER satellite image has the highest correlation, 0.66, with the geological map. Field observations and XRD spectra of field samples also verified the results. A lithological map was then prepared based on the maximum likelihood classification of the ASTER satellite image.
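
Two of the five techniques listed above are simple enough to sketch in a few lines: minimum distance to mean assigns a pixel to the class whose mean spectrum is closest in Euclidean distance, while the spectral angle mapper compares only the direction of the spectra and so ignores overall brightness. The class names, mean spectra and pixel values below are hypothetical and chosen to show a case where the two rules disagree; none of them come from the Gali Jagir study.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norms = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    return math.acos(max(-1.0, min(1.0, dot / norms)))  # clamp against rounding

def classify(pixel, class_means, measure):
    """Return the class whose mean spectrum minimizes the given measure."""
    return min(class_means, key=lambda name: measure(pixel, class_means[name]))

# Hypothetical mean reflectance spectra for two rock classes (three bands).
class_means = {
    "limestone": [0.30, 0.40, 0.50],
    "shale": [0.20, 0.15, 0.10],
}
# A shaded pixel: same spectral shape as "limestone" but half as bright.
pixel = [0.15, 0.20, 0.25]

by_distance = classify(pixel, class_means, math.dist)     # minimum distance to mean
by_angle = classify(pixel, class_means, spectral_angle)   # spectral angle mapper
```

Here the darker pixel lies closer to the "shale" mean in Euclidean terms, yet its spectral angle to "limestone" is zero, so the two classifiers return different labels. Differences of this kind are one reason such techniques are compared against a reference map.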

Keywords: ASTER, Landsat-ETM+, Satellite, Image classification.

Downloads: 2920
1271 A Content Vector Model for Text Classification

Authors: Eric Jiang

Abstract:

As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies texts by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification, and experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.

Keywords: Feature Selection, Latent Semantic Indexing, Text Classification, Vector Space Model.

Downloads 1885
1270 Specialized Web Robot for Objectionable Web Content Classification

Authors: SuGil Choi, SeungWan Han, Chi-Yoon Jeong, TaekYong Nam

Abstract:

This paper proposes a specialized Web robot that automatically collects objectionable Web content for use in an objectionable Web content classification system, which creates a URL database (DB) of objectionable Web content. It aims to shorten the update period of the DB, increase the number of URLs in the DB, and improve the accuracy of the information in the DB.

Keywords: Web robot, objectionable Web content classification, URL database, URL rating

Downloads 1885
1269 Web Search Engine Based Naming Procedure for Independent Topic

Authors: Takahiro Nishigaki, Takashi Onoda

Abstract:

In recent years, the volume of document data has been increasing with the spread of the Internet. Many methods have been studied for extracting topics from large document collections. We proposed Independent Topic Analysis (ITA) to extract mutually independent topics from large document data such as newspaper data. ITA extracts independent topics from document data by using Independent Component Analysis. A topic extracted by ITA is represented by a set of words. However, such a set of words is quite different from the topics the user imagines. For example, the top five words with high independence for one topic are as follows: Topic1 = {"scor", "game", "lead", "quarter", "rebound"}. Topic 1 can be considered to represent the topic "SPORTS", but this name has to be attached by the user, because ITA cannot name topics. Therefore, in this research, we propose a method that uses a web search engine to turn the word sets given by ITA into topics that are easy for people to understand. Specifically, we search for the set of topical words and take the title of the homepage in the search result as the topic name. We also apply the proposed method to some data and verify its effectiveness.
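The naming step described above can be sketched as follows, under stated assumptions. The abstract does not say which search engine or API is used, or how a title is chosen among several results, so the search function is injected as a parameter, the first returned title is taken as the name, and `fake_search` with its canned titles is a purely hypothetical stand-in for a real engine.

```python
from typing import Callable, List, Sequence

# A search function maps a query string to page titles, best match first.
SearchFn = Callable[[str], List[str]]


def name_topic(words: Sequence[str], search: SearchFn,
               fallback: str = "UNNAMED") -> str:
    """Query the search engine with the topic words and use the top title."""
    query = " ".join(words)
    titles = search(query)
    return titles[0] if titles else fallback


def fake_search(query: str) -> List[str]:
    """Hypothetical stand-in for a web search engine; returns canned titles."""
    canned = {
        "scor game lead quarter rebound": [
            "SPORTS news and results",
            "Basketball box scores",
        ],
    }
    return canned.get(query, [])


topic1 = ["scor", "game", "lead", "quarter", "rebound"]
named = name_topic(topic1, fake_search)
unnamed = name_topic(["unseen", "words"], fake_search)
print(named)
print(unnamed)
```

Passing the search function as an argument keeps the naming logic independent of any particular engine, so a real client could be substituted without changing `name_topic`.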

Keywords: Independent topic analysis, topic extraction, topic naming, web search engine.

Downloads 500
1268 An Enhanced Support Vector Machine-Based Approach for Sentiment Classification of Arabic Tweets of Different Dialects

Authors: Gehad S. Kaseb, Mona F. Ahmed

Abstract:

Arabic Sentiment Analysis (SA) is one of the most common research fields with many open areas. This paper proposes different pre-processing steps and a modified methodology to improve accuracy using standard Support Vector Machine (SVM) classification. The paper uses two datasets, Arabic Sentiment Tweets Dataset (ASTD) and Extended Arabic Tweets Sentiment Dataset (Extended-ATSD), which are publicly available for academic use. The results show that the classification accuracy approaches 86%.

Keywords: Arabic, hybrid classification, sentiment analysis, tweets.

Downloads 475