Search results for: Web page classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1205

1205 Feature Selection for Web Page Classification Using Swarm Optimization

Authors: B. Leela Devi, A. Sankar

Abstract:

The growing popularity of the web has produced a huge amount of information, which makes automated web page classification systems essential for improving search engine performance. Web pages have many features, such as HTML or XML tags, hyperlinks, URLs and text content, that can be considered during automated classification. Hyperlinks are known to enhance web page classification because they reflect the linkage between pages. The aim of this study is to reduce the number of features used while improving the accuracy of web page classification. In this paper, a novel feature selection method using an improved Particle Swarm Optimization (PSO) based on the principle of evolution is proposed. The extracted features were tested on the WebKB dataset using a parallel Neural Network to reduce the computational cost.
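As a rough illustration of swarm-based feature selection, the sketch below runs a standard binary PSO over 0/1 feature masks. It is not the improved, evolution-based variant the abstract proposes; the inertia and acceleration weights, the velocity clamp, and the toy fitness function (which stands in for classifier accuracy on WebKB) are all assumptions made for the example.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=12, n_iters=40, seed=0):
    """Standard binary PSO: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration weights
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, vel[i][d]))  # assumed clamp
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Hypothetical fitness: reward three "useful" features, penalize mask size.
USEFUL = {0, 3, 5}

def toy_fitness(mask):
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)

best_mask, best_fit = binary_pso(toy_fitness, n_features=8)
```

In a real pipeline the fitness call would train and score a classifier on the selected TF-IDF columns, which is where most of the computational cost would go.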

Keywords: Web page classification, WebKB Dataset, Term Frequency-Inverse Document Frequency (TF-IDF), Particle Swarm Optimization (PSO).

Downloads: 3259
1204 Comparative Analysis of Different Page Ranking Algorithms

Authors: S. Prabha, K. Duraiswamy, J. Indhumathi

Abstract:

Search engines play an important role on the internet in retrieving relevant documents from among a huge number of web pages. However, a search engine retrieves a large number of documents, all of which are relevant to the search topics. To retrieve the most meaningful documents related to the search topics, ranking algorithms are used in information retrieval. Ranking the retrieved documents is one of the issues in data mining and one of the practical problems of information retrieval. This paper covers various page ranking algorithms and page segmentation algorithms, and compares those algorithms as used for information retrieval. Diverse Page Rank based algorithms like Page Rank (PR), Weighted Page Rank (WPR), Weight Page Content Rank (WPCR), Hyperlink Induced Topic Search (HITS), Distance Rank, Eigen Rumor, Distance Rank Time Rank, Tag Rank, Relational Based Page Rank and Query Dependent Ranking algorithms are discussed and compared.
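Of the algorithms listed, HITS is compact enough to sketch. The version below is a minimal textbook formulation, assuming a small hypothetical link graph and a fixed iteration count. A page's authority score sums the hub scores of pages linking to it, and its hub score sums the authority scores of the pages it links to.

```python
def hits(links, n_iters=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    def normalize(scores):
        # Scale to unit Euclidean length so scores stay bounded.
        norm = sum(v * v for v in scores.values()) ** 0.5
        return {p: v / norm for p, v in scores.items()} if norm else scores

    for _ in range(n_iters):
        auth = normalize({p: sum(hub[s] for s in pages if p in links.get(s, ()))
                          for p in pages})
        hub = normalize({p: sum(auth[q] for q in links.get(p, ()))
                         for p in pages})
    return hub, auth

# Hypothetical graph: A and B both point to C, and C points to D.
hub, auth = hits({"A": ["C"], "B": ["C"], "C": ["D"]})
```

On this graph, C ends up as the strongest authority and A and B as equally strong hubs.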

Keywords: Information Retrieval, Web Page Ranking, search engine, web mining, page segmentations.

Downloads: 4288
1203 Combining ILP with Semi-supervised Learning for Web Page Categorization

Authors: Nuanwan Soonthornphisaj, Boonserm Kijsirikul

Abstract:

This paper presents a semi-supervised learning algorithm called Iterative-Cross Training (ICT) to solve Web page classification problems. We apply Inductive Logic Programming (ILP) as a strong learner in ICT. The objective of this research is to evaluate the potential of the strong learner to boost the performance of the weak learner of ICT. We compare the result with supervised Naive Bayes, which is a well-known algorithm for the text classification problem. The performance of our learning algorithm is also compared with other semi-supervised learning algorithms, namely Co-Training and EM. The experimental results show that the ICT algorithm outperforms those algorithms and that the performance of the weak learner can be enhanced by the ILP system.

Keywords: Inductive Logic Programming, Semi-supervised Learning, Web Page Categorization

Downloads: 1642
1202 An Empirical Analysis of Arabic WebPages Classification using Fuzzy Operators

Authors: Ahmad T. Al-Taani, Noor Aldeen K. Al-Awad

Abstract:

In this study, a fuzzy similarity approach for Arabic web page classification is presented. The approach uses a fuzzy term-category relation by manipulating the membership degree for the training data and the degree value for a test web page. Six measures are used and compared in this study: the Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy and Bounded Difference approaches. These measures are applied and compared using 50 different Arabic web pages. The Einstein measure gave the best performance among the measures. An analysis of these measures and concluding remarks are presented in this study.
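Several of the named measures correspond to well-known fuzzy conjunction operators (t-norms). The sketch below gives their standard textbook definitions for two membership degrees in [0, 1]. How the study combines them into a page-to-category similarity is not stated in the abstract, so the example only evaluates the operators themselves on assumed values.

```python
def algebraic_product(a, b):
    return a * b

def einstein_product(a, b):
    return (a * b) / (2.0 - (a + b - a * b))

def hamacher_product(a, b):
    # Defined as 0 when both degrees are 0 to avoid division by zero.
    return 0.0 if a == 0 and b == 0 else (a * b) / (a + b - a * b)

def bounded_difference(a, b):
    return max(0.0, a + b - 1.0)

def minimum(a, b):
    return min(a, b)

# Assumed example degrees: term membership in a category and in a test page.
a, b = 0.6, 0.7
scores = {
    "bounded": bounded_difference(a, b),
    "einstein": einstein_product(a, b),
    "algebraic": algebraic_product(a, b),
    "hamacher": hamacher_product(a, b),
    "min": minimum(a, b),
}
```

For any pair of degrees these operators are ordered bounded difference ≤ Einstein ≤ algebraic ≤ Hamacher ≤ minimum, so they differ in how strictly they penalize a weak match.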

Keywords: Text classification, HTML documents, Web pages, Machine learning, Fuzzy logic, Arabic Web pages.

Downloads: 1906
1201 A Comparative Study of Web-pages Classification Methods using Fuzzy Operators Applied to Arabic Web-pages

Authors: Ahmad T. Al-Taani, Noor Aldeen K. Al-Awad

Abstract:

In this study, a fuzzy similarity approach for Arabic web page classification is presented. The approach uses a fuzzy term-category relation by manipulating the membership degree for the training data and the degree value for a test web page. Six measures are used and compared in this study: the Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy and Bounded Difference approaches. These measures are applied and compared using 50 different Arabic web pages. The Einstein measure gave the best performance among the measures. An analysis of these measures and concluding remarks are presented in this study.

Keywords: Text classification, HTML, web pages, machine learning, fuzzy logic, Arabic web pages.

Downloads: 2235
1200 Predictability of the Two Commonly Used Models to Represent the Thin-layer Re-wetting Characteristics of Barley

Authors: M. A. Basunia

Abstract:

Thirty-three re-wetting tests were conducted with barley at different combinations of temperature (5.7-46.3 °C) and relative humidity (48.2-88.6%). The two most commonly used thin-layer drying and re-wetting models, i.e. Page and Diffusion, were compared for their ability to fit the experimental re-wetting data based on the standard error of estimate (SEE) of the measured and simulated moisture contents. The comparison shows that both the Page and Diffusion models fit the experimental re-wetting data of barley well. The average SEE values for the Page and Diffusion models were 0.176 % d.b. and 0.199 % d.b., respectively. The Page and Diffusion models were found to be the most suitable equations to describe the thin-layer re-wetting characteristics of barley over a typical five-day re-wetting period. These two models can be used for the simulation of deep-bed re-wetting of barley occurring during ventilated storage and deep-bed drying.
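For orientation, the Page model is usually written as MR = exp(-k·t^n), where MR is the moisture ratio and k and n are fitted constants. The sketch below evaluates it and computes a standard error of estimate. The SEE definition used here (residual sum of squares over N minus the number of fitted parameters) is the common one and is an assumption about the study, and the sample values are made up.

```python
import math

def page_model(t, k, n):
    """Page thin-layer model: moisture ratio after time t."""
    return math.exp(-k * t ** n)

def standard_error_of_estimate(measured, predicted, n_params):
    """sqrt(sum of squared residuals / degrees of freedom)."""
    dof = len(measured) - n_params
    rss = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(rss / dof)

# Made-up readings, for illustration only.
times = [0.0, 1.0, 2.0, 4.0]
predicted = [page_model(t, k=0.3, n=0.8) for t in times]
measured = [1.0, 0.75, 0.60, 0.41]
see = standard_error_of_estimate(measured, predicted, n_params=2)
```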

Keywords: Thin-layer, barley, re-wetting parameters, temperature, relative humidity.

Downloads: 1495
1199 Classification Influence Index and its Application for k-Nearest Neighbor Classifier

Authors: Sejong Oh

Abstract:

Classification is an important topic in machine learning and bioinformatics. Many datasets have been introduced for classification tasks. A dataset contains multiple features, and the quality of features influences the classification accuracy of the dataset. The power of classification for each feature differs. In this study, we suggest the Classification Influence Index (CII) as an indicator of classification power for each feature. CII enables evaluation of the features in a dataset and improved classification accuracy by transformation of the dataset. By conducting experiments using CII and the k-nearest neighbor classifier to analyze real datasets, we confirmed that the proposed index provided meaningful improvement of the classification accuracy.

Keywords: accuracy, classification, dataset, data preprocessing

Downloads: 1495
1198 Automatic Detection and Spatio-temporal Analysis of Commercial Accumulations Using Digital Yellow Page Data

Authors: Yuki Akiyama, Hiroaki Sengoku, Ryosuke Shibasaki

Abstract:

In this study, the locations and areas of commercial accumulations were detected by using digital yellow page data. An original buffering method that can accurately create polygons of commercial accumulations is proposed in this paper; by using this method, the distribution of commercial accumulations can be easily created and monitored over a wide area. The locations, areas, and time-series changes of commercial accumulations in the South Kanto region can be monitored by integrating polygons of commercial accumulations with the time series of digital yellow page data. The circumstances of commercial accumulations were shown to vary according to area, that is, highly urbanized regions such as the city center of Tokyo and prefectural capitals, suburban areas near large cities, and suburban and rural areas.

Keywords: Commercial accumulations, Spatio-temporal analysis, Urban monitoring, Yellow page data

Downloads: 1262
1197 Using a Semantic Self-Organising Web Page-Ranking Mechanism for Public Administration and Education

Authors: Marios Poulos, Sozon Papavlasopoulos, V. S. Belesiotis

Abstract:

In the proposed method for Web page-ranking, a novel theoretic model is introduced and tested by examples of order relationships among IP addresses. Ranking is induced using a convexity feature, which is learned according to these examples using a self-organizing procedure. We consider the problem of self-organizing learning from IP data to be represented by a semi-random convex polygon procedure, in which the vertices correspond to IP addresses. Based on recent developments in our regularization theory for convex polygons and corresponding Euclidean distance based methods for classification, we develop an algorithmic framework for learning ranking functions based on a Computational Geometric Theory. We show that our algorithm is generic, and present experimental results explaining the potential of our approach. In addition, we explain the generality of our approach by showing its possible use as a visualization tool for data obtained from diverse domains, such as Public Administration and Education.

Keywords: Computational Geometry, Education, e-Governance, Semantic Web.

Downloads: 1755
1196 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Associative classification (AC) is a data mining approach that combines association rule mining and classification to build classification models (classifiers). AC has attracted significant attention from researchers, mainly because it derives accurate classifiers that contain simple yet effective rules. In the last decade, a number of associative classification algorithms have been proposed, such as Classification based Association (CBA), Classification based on Multiple Association Rules (CMAR), Class based Associative Classification (CACA), and Classification based on Predicted Association Rule (CPAR). This paper surveys major AC algorithms and compares the steps and methods performed in each algorithm, including rule learning, rule sorting, rule pruning, classifier building, and class prediction.
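The rule learning and rule sorting steps rest on two quantities, the support and the confidence of a class association rule. The sketch below computes both on a toy training set and sorts rules by confidence and then support, which is one common ordering and not necessarily the one used by every surveyed algorithm. The data and rule names are hypothetical.

```python
def rule_stats(rows, antecedent, label):
    """rows: list of (set_of_items, class_label) training examples."""
    covered = [lab for items, lab in rows if antecedent <= items]
    hits = sum(1 for lab in covered if lab == label)
    support = hits / len(rows)                       # fraction of all rows
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence

# Hypothetical training data.
rows = [
    ({"a", "b"}, "yes"),
    ({"a"}, "yes"),
    ({"a", "c"}, "no"),
    ({"b"}, "no"),
]
rules = [(frozenset({"a"}), "yes"), (frozenset({"a", "b"}), "yes")]

# Rule sorting: higher confidence first, then higher support.
ranked = sorted(
    rules,
    key=lambda r: rule_stats(rows, r[0], r[1])[::-1],
    reverse=True,
)
```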

Keywords: Associative Classification, Classification, Data Mining, Learning, Rule Ranking, Rule Pruning, Prediction.

Downloads: 6633
1195 Data Gathering and Analysis for Arabic Historical Documents

Authors: Ali Dulla

Abstract:

This paper introduces a new dataset (and the methodology used to generate it) based on a wide range of historical Arabic documents containing clean data and simple, homogeneous page layouts. The experiments are carried out on printed and handwritten documents obtained from important libraries such as the Qatar Digital Library, the British Library and the Library of Congress. We have gathered and annotated 150 archival document images from different locations and time periods. The dataset is based on different documents from the 17th-19th centuries. It comprises differing page layouts and degradations that challenge text line segmentation methods. Ground truth is produced using the Aletheia tool by PRImA and stored in an XML representation, in the PAGE (Page Analysis and Ground truth Elements) format. The dataset will be easily available to researchers worldwide for research into the obstacles posed by historical Arabic documents, such as geometric correction.

Keywords: Dataset production, ground truth production, historical documents, arbitrary warping, geometric correction.

Downloads: 865
1194 Sensitive Analysis of the ZF Model for ABC Multi Criteria Inventory Classification

Authors: Makram Ben Jeddou

Abstract:

ABC classification is widely used by managers for inventory control. The classical ABC classification is based on the Pareto principle and on the single criterion of annual use value. Single-criterion classification is often insufficient for close inventory control. Multi-criteria inventory classification models have been proposed by researchers in order to consider other important criteria. From these models, we consider a specific model in order to perform a sensitivity analysis on the composite score calculated for each item. This score, based on a normalized average between a good and a bad optimized index, can affect the ABC-item classification. We focus on items assigned to different classes and then propose a classification compromise.
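The classical single-criterion procedure can be sketched in a few lines: rank items by annual use value and assign classes by cumulative share. The 80% and 95% cut-offs and the item values below are assumptions for illustration, and the ZF model's composite score is not reproduced here.

```python
def abc_classify(annual_use_value, a_cut=0.80, b_cut=0.95):
    """Classical ABC: rank by annual use value, cut on cumulative share."""
    total = sum(annual_use_value.values())
    classes, cumulative = {}, 0.0
    ranked = sorted(annual_use_value.items(), key=lambda kv: kv[1], reverse=True)
    for item, value in ranked:
        cumulative += value
        share = cumulative / total
        classes[item] = "A" if share <= a_cut else "B" if share <= b_cut else "C"
    return classes

# Hypothetical annual use values (unit cost times annual demand).
items = {"i1": 700, "i2": 100, "i3": 120, "i4": 50, "i5": 30}
classes = abc_classify(items)
```

Changing either cut-off, or replacing the annual use value with a composite score, can move items between classes, which is the kind of sensitivity the abstract examines.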

Keywords: ABC classification, Multi criteria inventory classification models, ZF-model.

Downloads: 2517
1193 Techniques with Statistics for Web Page Watermarking

Authors: Mohamed Lahcen BenSaad, Sun XingMing

Abstract:

Information hiding, especially watermarking, is a promising technique for the protection of intellectual property rights. This technology is well advanced for multimedia, but the same has not been done for text. Web pages, like other documents, need protection against piracy. In this paper, some techniques are proposed to show how to hide information in web pages using features of the markup language used to describe these pages. Most of the techniques proposed here use white space to hide information, or use variations the language allows in representing elements. Experiments on a very small page and an analysis of five thousand web pages show that these techniques have a wide bandwidth available for information hiding, and that they might form a solid base for developing a robust algorithm for web page watermarking.
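One simple way to picture white-space hiding is to encode one bit per source line as a trailing space (0) or a trailing tab (1). This scheme is a hypothetical illustration and not necessarily one of the techniques the paper proposes. It relies on the assumption that browsers collapse such trailing white space when rendering, which holds outside preformatted content.

```python
def embed_bits(html, bits):
    """Hide one bit per line as trailing white space: space = 0, tab = 1."""
    lines = html.split("\n")
    if len(bits) > len(lines):
        raise ValueError("not enough lines to carry the payload")
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip(" \t")  # drop any existing trailing white space
        if i < len(bits):
            line += "\t" if bits[i] else " "
        out.append(line)
    return "\n".join(out)

def extract_bits(html, n_bits):
    """Read the payload back from the first n_bits lines."""
    return [1 if line.endswith("\t") else 0 for line in html.split("\n")[:n_bits]]

page = "<html>\n<body>\n<p>Hello</p>\n</body>\n</html>"
marked = embed_bits(page, [1, 0, 1, 1])
recovered = extract_bits(marked, 4)
```

The capacity of this scheme is one bit per line, and it is fragile: any tool that strips trailing white space destroys the mark.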

Keywords: Digital Watermarking, Information Hiding, Markup Language, Text watermarking, Software Watermarking.

Downloads: 1793
1192 A Multiresolution Approach for Noised Texture Classification based on the Co-occurrence Matrix and First Order Statistics

Authors: M. Ben Othmen, M. Sayadi, F. Fnaiech

Abstract:

The wavelet transform provides several important characteristics that can be used in texture analysis and classification. In this work, an efficient texture classification method, which combines concepts from wavelets and co-occurrence matrices, is presented. A Euclidean distance classifier is used to evaluate the various methods of classification. A comparative study is essential to determine the ideal method. On this basis, we developed a novel feature set for texture classification and demonstrate its effectiveness.
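The co-occurrence part of such a pipeline can be sketched without any wavelet step: count how often gray level i appears next to gray level j, derive a few statistics, and classify by Euclidean distance to class prototypes. The offset, the two features chosen, and the prototype values are assumptions for illustration, and the multiresolution stage of the paper is omitted.

```python
import math

def glcm(image, levels, dr=0, dc=1):
    """Gray-level co-occurrence counts for pixel pairs at offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

def glcm_features(counts):
    """Return (contrast, energy) from the normalized co-occurrence matrix."""
    total = sum(sum(row) for row in counts)
    contrast = energy = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            p = n / total
            contrast += p * (i - j) ** 2
            energy += p * p
    return contrast, energy

def nearest_class(features, prototypes):
    """Euclidean distance classifier over hypothetical class prototypes."""
    return min(prototypes, key=lambda name: math.dist(features, prototypes[name]))

image = [[0, 0, 1], [1, 1, 0]]
feats = glcm_features(glcm(image, levels=2))
label = nearest_class(feats, {"smooth": (0.0, 1.0), "rough": (0.5, 0.25)})
```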

Keywords: Classification, Wavelet, Co-occurrence, Euclidian Distance, Classifier, Texture.

Downloads: 1480
1191 Web Pages Aesthetic Evaluation Using Low-Level Visual Features

Authors: Maryam Mirdehghani, S. Amirhassan Monadjemi

Abstract:

Web sites are rapidly becoming the preferred medium for daily tasks such as information search, company presentation, shopping, and so on. At the same time, we live in a period where visual appearance plays an increasingly important role in daily life. In spite of designers' efforts to develop a web site that is both user-friendly and attractive, it is difficult to ensure the aesthetic quality of the outcome, since visual appearance is a matter of individual perception and opinion. This study attempts to develop an automatic system for the aesthetic evaluation of web pages, which are the building blocks of web sites. Based on image processing techniques and artificial neural networks, the proposed method is able to categorize an input web page according to its visual appearance and aesthetic quality. The employed features are multiscale/multidirectional textural and perceptual color properties of the web pages, fed to a perceptron ANN that has been trained as the evaluator. The method was tested using university web sites, and the results suggest that it performs well in web page aesthetic evaluation tasks, with around 90% correct categorization.

Keywords: Web Page Design, Web Page Aesthetic, Color Spaces, Texture, Neural Networks

Downloads: 1633
1190 Fast Document Segmentation Using Contour and X-Y Cut Technique

Authors: Boontee Kruatrachue, Narongchai Moongfangklang, Kritawan Siriboon

Abstract:

This paper describes a fast and efficient method for page segmentation of documents containing nonrectangular blocks. The segmentation is based on an edge-following algorithm using a small window of 16 by 32 pixels. This segmentation is very fast since only the border pixels of a paragraph are used, without scanning the whole page. Still, the segmentation may contain errors if the space between blocks is smaller than the window used in edge following. Consequently, this paper reduces this error by first identifying the missed segmentation point using direction information from edge following, and then applying an X-Y cut at the missed segmentation point to separate the connected columns. The advantage of the proposed method is the fast identification of missed segmentation points. This methodology is faster, with less overhead, than other algorithms that need to access many more pixels of a document.

Keywords: Contour Direction Technique, Missed Segmentation Points, Page Segmentation, Recursive X-Y Cut Technique

Downloads: 2782
1189 The Use of Performance Indicators for Evaluating Models of Drying Jackfruit (Artocarpus heterophyllus L.): Page, Midilli, and Lewis

Authors: D. S. C. Soares, D. G. Costa, J. T. S., A. K. S. Abud, T. P. Nunes, A. M. Oliveira Júnior

Abstract:

Mathematical models of drying are used to understand the drying process and to determine important parameters for the design and operation of the dryer. The jackfruit is a highly perishable fruit with high consumption in the Northeast. It is necessary to apply techniques that improve its conservation for longer so that it can be distributed to regions with low consumption. This study aimed to analyze several mathematical models (Page, Lewis, and Midilli) to indicate the one that best fits the conditions of the convective drying process, using performance indicators associated with each model: accuracy (Af) and noise factors (Bf), root mean square error (RMSE) and standard error of prediction (% SEP). Jackfruit drying was carried out in a convective tray dryer at a temperature of 50°C for 9 hours. The Midilli model was found to be the most accurate, with Af: 1.39, Bf: 1.33, RMSE: 0.01%, and SEP: 5.34. However, the Midilli model is not appropriate for process control purposes because it needs four tuning parameters. With the performance indicators used in this paper, the Page model showed similar results with only two parameters. It is concluded that the best correlation between the experimental and estimated data is given by the Page model.
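The parameter-count argument is easier to see with the three models side by side. In their usual textbook forms, Lewis has one parameter, Page has two, and Midilli has four, and each simpler model is a special case of the next. The sketch below uses those standard forms and an ordinary RMSE. The parameter values are arbitrary, and the study's own fitted constants are not given in the abstract.

```python
import math

def lewis(t, k):
    """Lewis model: MR = exp(-k t). One parameter."""
    return math.exp(-k * t)

def page(t, k, n):
    """Page model: MR = exp(-k t^n). Two parameters."""
    return math.exp(-k * t ** n)

def midilli(t, a, k, n, b):
    """Midilli model: MR = a exp(-k t^n) + b t. Four parameters."""
    return a * math.exp(-k * t ** n) + b * t

def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Arbitrary values, only to show how the models nest.
t, k, n = 3.0, 0.2, 1.1
page_value = page(t, k, n)
midilli_value = midilli(t, a=1.0, k=k, n=n, b=0.0)  # reduces to Page
lewis_value = page(t, k, 1.0)                       # Page with n = 1 is Lewis
```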

Keywords: Drying, models, jackfruit.

Downloads: 2421
1188 Classification of Attaks over Cloud Environment

Authors: Karim Abouelmehdi, Loubna Dali, Elmoutaoukkil Abdelmajid, Hoda Elsayed Eladnani Fatiha, Benihssane Abderahim

Abstract:

The security of cloud services is the concern of cloud service providers. In this paper, we will mention different classifications of cloud attacks referred by specialized organizations. Each agency has its classification of well-defined properties. The purpose is to present a high-level classification of current research in cloud computing security. This classification is organized around attack strategies and corresponding defenses.

Keywords: Cloud computing, security, classification, risk.

Downloads: 2082
1187 Multi-Label Hierarchical Classification for Protein Function Prediction

Authors: Helyane B. Borges, Julio Cesar Nievola

Abstract:

Hierarchical classification is a problem with applications in many areas, such as protein function prediction, where the data are hierarchically structured. It is therefore necessary to develop algorithms able to induce hierarchical classification models. This paper presents experiments using an algorithm for hierarchical classification called Multi-label Hierarchical Classification using a Competitive Neural Network (MHC-CNN). It was tested on ten datasets from the Gene Ontology (GO) Cellular Component Domain. The results are compared with Clus-HMC and Clus-HSC using the hF-Measure.
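The hF-measure is commonly defined by extending both the true and the predicted label sets with all their ancestors and then computing precision and recall on the extended sets. The sketch below uses that common micro-averaged definition, which is assumed to match the one in the paper, on a small hypothetical hierarchy.

```python
def with_ancestors(labels, parent):
    """Extend a label set with every ancestor (the root itself is excluded)."""
    extended = set()
    for label in labels:
        while label is not None:
            extended.add(label)
            label = parent.get(label)
    return extended

def hf_measure(true_sets, pred_sets, parent):
    """Micro-averaged hierarchical F-measure over a list of examples."""
    overlap = n_pred = n_true = 0
    for true_labels, pred_labels in zip(true_sets, pred_sets):
        t = with_ancestors(true_labels, parent)
        p = with_ancestors(pred_labels, parent)
        overlap += len(t & p)
        n_pred += len(p)
        n_true += len(t)
    if overlap == 0:
        return 0.0
    h_precision, h_recall = overlap / n_pred, overlap / n_true
    return 2 * h_precision * h_recall / (h_precision + h_recall)

# Hypothetical two-level hierarchy; top-level classes have no parent entry.
parent = {"plasma membrane": "membrane", "nucleus": "organelle"}
score = hf_measure([{"plasma membrane"}], [{"membrane"}], parent)
```

Predicting the correct parent class earns partial credit here (hP = 1, hR = 0.5), which a flat F-measure would not give.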

Keywords: Hierarchical Classification, Competitive Neural Network, Global Classifier.

Downloads: 2380
1186 An Evaluation of Pesticide Stress Induced Proteins in three Cyanobacterial Species-Anabaena Fertilissima, Aulosira Fertilissima and Westiellopsis Prolifica using SDS-PAGE

Authors: Nirmal Kumar, Rita N. Kumar, Anubhuti Bora, Manmeet Kaur Amb

Abstract:

The whole-cell protein-profiling technique was evaluated for studying differences in the banding pattern of three species of Cyanobacteria, i.e. Anabaena fertilissima, Aulosira fertilissima and Westiellopsis prolifica, under the influence of four pesticides: 2,4-D (Ethyl Ester of 2,4-Dichloro Phenoxy Acetic Acid), Pencycuron (N-[(4-chlorophenyl)methyl]-N-cyclopentyl-N'-phenylurea), Endosulfan (6,7,8,9,10,10-hexachloro-1,5,5a,6,9,9a-hexahydro-6,9-methano-2,4,3-benzodioxathiepine-3-oxide) and Tebuconazole (1-(4-Chlorophenyl)-4,4-dimethyl-3-(1,2,4-triazol-1-ylmethyl)pentan-3-ol). Whole-cell extracts were obtained by sonication (Sonifier cell disruptor, Branson Digital Sonifier S-450D, USA) and were analyzed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). SDS-PAGE analyses of the total protein profile of Anabaena fertilissima, Aulosira fertilissima and Westiellopsis prolifica showed a linear decrease in protein content with increasing pesticide stress when the species were exposed to different concentrations of 2,4-D, Pencycuron, Endosulfan and Tebuconazole. The results indicate that different stressors exert specific effects on cyanobacterial protein synthesis.

Keywords: Cyanobacteria, pesticide, SDS-PAGE

Downloads: 2508
1185 Detection and Classification of Power Quality Disturbances Using S-Transform and Wavelet Algorithm

Authors: Mohamed E. Salem Abozaed

Abstract:

Detection and classification of power quality (PQ) disturbances is an important consideration for electrical utilities and many industrial customers, so that diagnosis and mitigation of such disturbances can be implemented quickly. The S-transform and the continuous wavelet transform (CWT) are time-frequency algorithms, and both are powerful in the detection and classification of PQ disturbances. This paper presents detection and classification of PQ disturbances using the S-transform and CWT algorithms. The results show that the S-transform is more accurate than the CWT algorithm in detection and classification for most PQ disturbances, whereas the CWT algorithm is more powerful in detecting some disturbances, such as notching.

Keywords: CWT, Disturbances classification, Disturbances detection, Power quality, S-transform.

Downloads: 2599
1184 GA Based Optimal Feature Extraction Method for Functional Data Classification

Authors: Jun Wan, Zehua Chen, Yingwu Chen, Zhidong Bai

Abstract:

Classification is an interesting problem in functional data analysis (FDA), because many scientific and applied problems end up as classification problems, such as recognition, prediction, control, decision making, management, etc. Because of the high dimension and high correlation in functional data (FD), a key problem is to extract features from FD while keeping its global characteristics, which strongly affects classification efficiency and precision. In this paper, a novel automatic method that combines a Genetic Algorithm (GA) with a classification algorithm to extract classification features is proposed. In this method, the optimal features and classification model are approached step by step via evolutionary learning. Theoretical analysis and experimental tests show that this method improves classification efficiency, precision and robustness while using fewer features, and that the dimension of the extracted classification features can be controlled.

Keywords: Classification, functional data, feature extraction, genetic algorithm, wavelet.

Downloads: 1554
1183 Meta-Classification using SVM Classifiers for Text Documents

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. In this paper, we investigated three approaches to building a meta-classifier in order to increase classification accuracy. The basic idea is to learn a meta-classifier that optimally selects the best component classifier for each data point. The experimental results show that combining classifiers can significantly improve classification accuracy and that our meta-classification strategy gives better results than each individual classifier. For 7083 Reuters text documents we obtained classification accuracies of up to 92.04%.

Keywords: Meta-classification, Learning with Kernels, Support Vector Machine, and Performance Evaluation.

Downloads: 1614
1182 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas

Abstract:

Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms on other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for the more challenging task of hierarchical classification, and evaluates it on a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes a meta-learning approach for recommending the best hierarchical classification algorithm for a hierarchical classification dataset. This work's contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.

Keywords: Algorithm recommendation, meta-learning, bioinformatics, hierarchical classification.

Downloads: 1370
1181 A Comparative Study of Page Ranking Algorithms for Information Retrieval

Authors: Ashutosh Kumar Singh, Ravi Kumar P

Abstract:

This paper gives an introduction to Web mining, then describes Web Structure mining in detail, and explores the data structure used by the Web. This paper also explores different Page Rank algorithms and compares those algorithms as used for Information Retrieval. The basics of Web mining and the Web mining categories are explained. Different Page Rank based algorithms like PageRank (PR), WPR (Weighted PageRank), HITS (Hyperlink-Induced Topic Search), DistanceRank and DirichletRank are discussed and compared. PageRanks are calculated with the PageRank and Weighted PageRank algorithms for a given hyperlink structure. A simulation program is developed for the PageRank algorithm because PageRank is the only ranking algorithm implemented in the search engine (Google). The outputs are shown in table and chart format.

Keywords: Web Mining, Web Structure, Web Graph, Link Analysis, PageRank, Weighted PageRank, HITS, DistanceRank, DirichletRank
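Calculating PageRank for a given hyperlink structure can be sketched as a short power iteration. The version below uses the normalized form in which ranks sum to 1, a damping factor of 0.85, and an even redistribution of rank from pages without outgoing links. These are common conventions and assumptions here, not details taken from the paper's simulation program, and the three-page graph is hypothetical.

```python
def pagerank(links, damping=0.85, n_iters=100):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(n_iters):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # A page splits its rank evenly among its outgoing links.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # Dangling page: spread its rank over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank

# Hypothetical hyperlink structure: A -> B, A -> C, B -> C, C -> A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

For this graph the ranks settle near A ≈ 0.388, B ≈ 0.215 and C ≈ 0.397. C ranks highest because it receives links from both other pages.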

Downloads: 2834
1180 Binary Classification Tree with Tuned Observation-based Clustering

Authors: Maythapolnun Athimethphat, Boontarika Lerteerawong

Abstract:

There are several approaches for handling multiclass classification. Aside from one-against-one (OAO) and one-against-all (OAA), the hierarchical classification technique is also commonly used. A binary classification tree is a hierarchical classification structure that breaks down a k-class problem into binary sub-problems, each solved by a binary classifier. In each node, a set of classes is divided into two subsets. A good class partition should be able to group similar classes together. Many algorithms measure similarity in terms of the distance between class centroids. Classes are grouped together by a clustering algorithm when the distances between their centroids are small. In this paper, we present a binary classification tree with tuned observation-based clustering (BCT-TOB) that finds a class partition by performing clustering on observations instead of class centroids. A merging step is introduced to merge any insignificant class split. The experiment shows that the performance of BCT-TOB is comparable to other algorithms.

Keywords: multiclass classification, hierarchical classification, binary classification tree, clustering, observation-based clustering

Downloads: 1730
1179 Pose Normalization Network for Object Classification

Authors: Bingquan Shen

Abstract:

Convolutional Neural Networks (CNN) have demonstrated their effectiveness in synthesizing 3D views of object instances at various viewpoints. Given the problem where one has limited viewpoints of a particular object for classification, we present a pose normalization architecture to transform the object to existing viewpoints in the training dataset before classification to yield better classification performance. We have demonstrated that this Pose Normalization Network (PNN) can capture the style of the target object and is able to re-render it to a desired viewpoint. Moreover, we have shown that the PNN improves the classification result for the 3D chairs dataset and the ShapeNet airplanes dataset when given only images at limited viewpoints, as compared to a CNN baseline.

Keywords: Convolutional neural networks, object classification, pose normalization, viewpoint invariant.

Downloads 1120
1178 Lean Models Classification: Towards a Holistic View

Authors: Y. Tiamaz, N. Souissi

Abstract:

The purpose of this paper is to present a classification of Lean models that aims to capture all the concepts related to this approach and thus facilitate its implementation. This classification allows the identification of the most relevant models according to several dimensions. From this perspective, we present a review and an analysis of the Lean models literature, and we propose dimensions for classifying the current proposals while respecting, among others, the axes of the Lean approach, the maturity of the models, and their application domains. This classification allowed us to conclude that researchers essentially consider the Lean approach as a toolbox and that they design their models to solve problems related to a specific environment. Since the Lean approach is no longer intended only for the automotive sector, where it was invented, but for all fields (IT, hospitals, ...), we consider that this approach requires a generic model that is capable of being implemented in all areas.

Keywords: Lean approach, lean models, classification, dimensions, holistic view.

Downloads 1248
1177 Obstacle Classification Method Based On 2D LIDAR Database

Authors: Moohyun Lee, Soojung Hur, Yongwan Park

Abstract:

We propose an obstacle classification method based on a 2D LIDAR database. The existing obstacle classification method based on 2D LIDAR has advantages in terms of accuracy and shorter calculation time. However, it was difficult to classify the type of obstacle, and therefore accurate path planning was not possible. To overcome this problem, a method of classifying obstacle type based on the width data of the obstacle was proposed. However, width data alone was not sufficient to improve accuracy. In this paper, a database is established from width and intensity data. The first classification stage uses the width data and the second uses the intensity data; each stage compares the measurement against the database, and the classification result is the entry with the highest similarity value. An experiment using an actual autonomous vehicle in a real environment shows that calculation time declined in comparison to 3D LIDAR and that it was possible to classify obstacles using a single 2D LIDAR.
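The two-stage lookup described in this abstract (width first, then intensity, then highest similarity) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' method: the database entries, the tolerance `width_tol`, and the `similarity` function are all hypothetical, since the abstract does not specify them.

```python
# Hypothetical database: labels and values are invented for illustration only.
DATABASE = [
    {"label": "pedestrian", "width_m": 0.5, "intensity": 0.30},
    {"label": "car", "width_m": 1.8, "intensity": 0.80},
    {"label": "wall", "width_m": 1.8, "intensity": 0.40},
]


def similarity(a, b):
    """Assumed similarity measure: 1 for identical values, falling towards 0."""
    return 1.0 / (1.0 + abs(a - b))


def classify(width_m, intensity, width_tol=0.3):
    """Two-stage classification against DATABASE.

    Stage 1 keeps entries whose width is within width_tol of the measurement.
    Stage 2 picks, among those, the entry with the most similar intensity.
    Returns None when no entry passes the width stage.
    """
    candidates = [e for e in DATABASE
                  if abs(e["width_m"] - width_m) <= width_tol]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: similarity(e["intensity"], intensity))
    return best["label"]
```

Here width alone cannot separate the hypothetical "car" and "wall" entries, which is the situation the abstract gives as motivation for adding intensity as a second stage.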

Keywords: Obstacle, Classification, LIDAR, Segmentation, Width, Intensity, Database.

Downloads 3444
1176 An Efficient Obstacle Detection Algorithm Using Colour and Texture

Authors: Chau Nguyen Viet, Ian Marshall

Abstract:

This paper presents a new classification algorithm using colour and texture for obstacle detection. Colour information is computationally cheap to learn and process. However, in many cases colour alone does not provide enough information for classification. Texture information can improve classification performance but is usually computationally expensive. Our algorithm uses both colour and texture features, but texture is only needed when colour is unreliable. During the training stage, texture features are learned specifically to improve the performance of a colour classifier. The algorithm learns a set of simple texture features, and only the most effective features are used in the classification stage. Therefore, our algorithm has a very good classification rate while still being fast enough to run on a limited computer platform. The proposed algorithm was tested with a challenging outdoor image set. Test results show that the algorithm achieves a much better trade-off between classification performance and efficiency than a typical colour classifier.
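The "texture only when colour is unreliable" idea can be sketched as a two-stage cascade. The Python below is an assumption-laden illustration rather than the paper's algorithm: `cascade_classify`, the `margin` threshold, and the two scoring callables are hypothetical, and the abstract does not say how unreliability is measured.

```python
def cascade_classify(features, colour_score, texture_score, margin=0.2):
    """Return (is_obstacle, stage_used) for one image patch.

    colour_score and texture_score are caller-supplied functions mapping
    features to an obstacle probability in [0, 1]. Assumption: colour is
    treated as unreliable when its score lies within `margin` of 0.5.
    """
    p = colour_score(features)
    if abs(p - 0.5) >= margin:
        # Colour is confident: skip the expensive texture computation.
        return p >= 0.5, "colour"
    # Colour is ambiguous: fall back to the costlier texture classifier.
    t = texture_score(features)
    return t >= 0.5, "texture"


# Stub scorers for demonstration; a real system would use trained models.
texture_calls = []


def demo_colour(features):
    return features["colour_p"]


def demo_texture(features):
    texture_calls.append(features)
    return features["texture_p"]


confident = cascade_classify({"colour_p": 0.9, "texture_p": 0.1},
                             demo_colour, demo_texture)
ambiguous = cascade_classify({"colour_p": 0.55, "texture_p": 0.8},
                             demo_colour, demo_texture)
```

In this sketch the texture scorer runs only for the ambiguous patch, which is where the efficiency gain of such a cascade would come from.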

Keywords: Colour, texture, classification, obstacle detection.

Downloads 1822