Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2134

Search results for: hierarchical classification

2134 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas


Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms on other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for the more challenging task of hierarchical classification, and evaluates it on a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes a meta-learning approach for recommending the best hierarchical classification algorithm for a hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.
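
To make the idea of algorithm recommendation concrete, the following is a minimal sketch of a meta-learner. It is not the paper's method: the abstract mentions decision-tree meta-models, whereas this sketch swaps in a simple nearest-neighbour lookup over meta-instances. The meta-features (`n_classes`, `depth`) and the algorithm names are hypothetical.

```python
import math

# Hypothetical meta-dataset: one meta-instance per previously seen dataset.
# Each entry pairs that dataset's meta-features with the algorithm that
# performed best on it. All names and values are illustrative assumptions.
META_INSTANCES = [
    ({"n_classes": 20.0, "depth": 3.0}, "local_per_node"),
    ({"n_classes": 250.0, "depth": 6.0}, "global_model"),
    ({"n_classes": 35.0, "depth": 4.0}, "local_per_node"),
]


def distance(a, b):
    """Euclidean distance between two meta-feature dictionaries."""
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))


def recommend(meta_features):
    """Recommend the algorithm that won on the most similar past dataset."""
    _, algorithm = min(META_INSTANCES, key=lambda inst: distance(inst[0], meta_features))
    return algorithm
```

In practice the meta-features would need scaling so that no single feature dominates the distance; that step is omitted here for brevity.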

Keywords: algorithm recommendation, meta-learning, bioinformatics, hierarchical classification

2133 The Use of Layered Neural Networks for Classifying Hierarchical Scientific Fields of Study

Authors: Colin Smith, Linsey S Passarella


Due to the proliferation and decentralized nature of academic publication, to the best of the authors’ knowledge no widely accepted scheme exists for organizing papers by their scientific field of study (FoS). While many academic journals require author-provided keywords for papers, these keywords vary widely in scope and are not consistent across papers, journals, or field domains, necessitating alternative approaches to paper classification. Past attempts to perform FoS classification on scientific texts have largely used non-hierarchical FoS schemas or ignored the schema’s inherently hierarchical structure, e.g. by compressing the structure into a single layer for multi-label classification. In this paper, we introduce an application of a Layered Neural Network (LNN) to the problem of performing supervised hierarchical FoS classification on research papers. In this approach, paper embeddings from a pretrained language model are fed into a top-down LNN. Beginning with a single neural network (NN) for the highest layer of the class hierarchy, each node uses a separate local NN to classify the subsequent subfield child node(s) for an input embedding of concatenated paper titles and abstracts. We compare our LNN-FOS method to other recent machine learning methods using the Microsoft Academic Graph (MAG) FoS hierarchy and find that the LNN-FOS offers increased classification accuracy at each FoS hierarchical level.
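
The top-down scheme described above, one local classifier per parent node, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the hierarchy and keyword lists are hypothetical, and a keyword counter stands in for each local neural network (which would operate on embeddings rather than raw text).

```python
# Hypothetical two-level field-of-study hierarchy (parent -> children).
HIERARCHY = {
    "root": ["Computer Science", "Biology"],
    "Computer Science": ["Machine Learning", "Databases"],
    "Biology": ["Genetics", "Ecology"],
}

# Hypothetical keyword lists used by the stand-in local classifiers.
KEYWORDS = {
    "Computer Science": ["network", "database", "algorithm"],
    "Biology": ["gene", "species"],
    "Machine Learning": ["neural", "gradient"],
    "Databases": ["sql", "query"],
    "Genetics": ["gene", "dna"],
    "Ecology": ["species", "habitat"],
}


def local_classifier(parent):
    """Build a stand-in for the local model that chooses among `parent`'s children."""
    children = HIERARCHY[parent]

    def classify(text):
        words = text.lower().split()
        # Pick the child whose keywords occur most often in the text.
        return max(children, key=lambda c: sum(words.count(k) for k in KEYWORDS[c]))

    return classify


# One local model per non-leaf node of the hierarchy.
LOCAL_MODELS = {parent: local_classifier(parent) for parent in HIERARCHY}


def predict_path(text):
    """Walk the hierarchy top-down, applying the local model at each node."""
    node, path = "root", []
    while node in LOCAL_MODELS:
        node = LOCAL_MODELS[node](text)
        path.append(node)
    return path
```

For example, `predict_path("neural network training with gradient descent")` descends through "Computer Science" to "Machine Learning". A known drawback of this design is that an error at a high level propagates to every level below it.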

Keywords: hierarchical classification, layer neural network, scientific field of study, scientific taxonomy

2132 Scene Classification Using Hierarchy Neural Network, Directed Acyclic Graph Structure, and Label Relations

Authors: Po-Jen Chen, Jian-Jiun Ding, Hung-Wei Hsu, Chien-Yao Wang, Jia-Ching Wang


A more accurate scene classification algorithm using label relations and the hierarchy neural network was developed in this work. Many classification algorithms assume that the labels are mutually exclusive. This assumption holds in some specific problems; for scene classification, however, it is not reasonable. Because a photographic image contains a variety of objects, it is more practical to assign multiple labels to an image. In this paper, two label relations, the exclusive relation and the hierarchical relation, were adopted in the classification process to achieve more accurate multi-label classification results. Moreover, the hierarchy neural network (hierarchy NN) is applied to classify the image, and the directed acyclic graph structure is used to predict a more reasonable result that obeys the exclusive and hierarchical relations. Simulations show that, with these techniques, a much more accurate scene classification result can be achieved.
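
As a rough illustration of what it means for a multi-label prediction to obey hierarchical and exclusive relations, the sketch below post-processes per-label scores. The label names, the threshold, and the tie-breaking rule are all assumptions made for this example; the abstract does not specify how the authors resolve conflicts.

```python
# Hypothetical label relations for scene images.
PARENT = {"beach": "outdoor", "forest": "outdoor", "kitchen": "indoor"}
EXCLUSIVE = [("indoor", "outdoor")]


def ancestors(label):
    """Return every ancestor of `label`, nearest first."""
    found = []
    while label in PARENT:
        label = PARENT[label]
        found.append(label)
    return found


def support(label, chosen, scores):
    """Best score among `label` itself and its chosen descendants."""
    members = [l for l in chosen if l == label or label in ancestors(l)]
    return max(scores.get(l, 0.0) for l in members)


def consistent_labels(scores, threshold=0.5):
    """Threshold the scores, then repair violations of the label relations."""
    chosen = {label for label, score in scores.items() if score >= threshold}
    # Hierarchical relation: a predicted child implies all of its ancestors.
    for label in list(chosen):
        chosen.update(ancestors(label))
    # Exclusive relation: keep the better-supported side and drop the other
    # side together with its descendants.
    for a, b in EXCLUSIVE:
        if a in chosen and b in chosen:
            loser = a if support(a, chosen, scores) < support(b, chosen, scores) else b
            chosen = {l for l in chosen if l != loser and loser not in ancestors(l)}
    return chosen
```

With scores of 0.9 for "beach" and 0.6 for "indoor", the strong "beach" score pulls in "outdoor" and the weaker, conflicting "indoor" label is dropped.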

Keywords: convolutional neural network, label relation, hierarchy neural network, scene classification

2131 Identification of Spam Keywords Using Hierarchical Category in C2C E-Commerce

Authors: Shao Bo Cheng, Yong-Jin Han, Se Young Park, Seong-Bae Park


Consumer-to-Consumer (C2C) e-commerce has been growing very rapidly in recent years. Since identical or nearly identical kinds of products compete with one another through keyword search in C2C e-commerce, some sellers describe their products with spam keywords that are popular but not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead consumers and waste their time. This problem has been reported in many commercial services such as eBay and Taobao, but there has been little research on solving it. As a solution, this paper proposes a method to classify whether the keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is commonly observed in the specifications of products that are the same as, or of the same kind as, the given product. This is because the hierarchical category of a product is in general determined precisely by the seller of the product, and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, the degree of reliability is determined differently for each layer. Hence, the degrees of reliability from different layers of a hierarchical category become features for keywords, and they are used together with features derived only from specifications to classify the keywords. Support Vector Machines are adopted as the basic classifier using these features, since they are powerful and widely used in many classification tasks. In the experiments, the proposed method is evaluated on a gold-standard dataset from Yi-han-wang, a Chinese C2C e-commerce site, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which indicates that spam keywords are effectively identified by a hierarchical category in C2C e-commerce.
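
One plausible reading of the per-layer reliability features is sketched below: for each layer of a product's category path, compute the fraction of products in that category whose specification contains the keyword. The catalogue, the field names, and the exact formula are assumptions for illustration; the abstract does not give the authors' definition.

```python
# Hypothetical catalogue: each product has a category path (general -> specific)
# and the set of words found in its specification.
PRODUCTS = [
    {"path": ("Electronics", "Phones", "Smartphones"), "spec": {"android", "5g", "oled"}},
    {"path": ("Electronics", "Phones", "Smartphones"), "spec": {"ios", "5g", "oled"}},
    {"path": ("Electronics", "Phones", "Feature phones"), "spec": {"2g", "keypad"}},
    {"path": ("Electronics", "Cameras", "DSLR"), "spec": {"lens", "sensor"}},
]


def layer_reliability(keyword, path):
    """Return one reliability value per layer of `path`, most general first.

    Each value is the share of products under that category prefix whose
    specification mentions `keyword`.
    """
    degrees = []
    for depth in range(1, len(path) + 1):
        peers = [p for p in PRODUCTS if p["path"][:depth] == path[:depth]]
        hits = sum(1 for p in peers if keyword in p["spec"])
        degrees.append(hits / len(peers) if peers else 0.0)
    return degrees
```

For a smartphone listing, the keyword "5g" becomes more reliable at more specific layers, while "lens" scores zero below the top layer, which is the kind of signal a downstream classifier could use as features.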

Keywords: spam keyword, e-commerce, keyword features, spam filtering

2130 Image Segmentation Using 2-D Histogram in RGB Color Space in Digital Libraries

Authors: El Asnaoui Khalid, Aksasse Brahim, Ouanan Mohammed


This paper presents an unsupervised color image segmentation method based on a hierarchical analysis of a 2-D histogram in RGB color space. This histogram minimizes the storage space required for images and thus facilitates operations between them. The improved segmentation approach identifies objects in a color image more accurately while remaining fast.
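
For readers unfamiliar with the data structure, a 2-D color histogram counts pixels by a pair of channel values instead of all three. The sketch below uses the (R, G) pair and four bins per axis; both choices are assumptions, since the abstract does not say which channel pairs or bin counts the authors use.

```python
from collections import Counter


def rg_histogram(pixels, bins=4):
    """Count pixels per (R bin, G bin) cell; `pixels` holds 8-bit (r, g, b) tuples."""
    # value * bins // 256 maps 0..255 onto bin indices 0..bins-1.
    return Counter((r * bins // 256, g * bins // 256) for r, g, _b in pixels)
```

Dense cells of such a histogram correspond to dominant colors, which a segmentation method could then treat as candidate region classes.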

Keywords: image segmentation, hierarchical analysis, 2-D histogram, classification

2129 Evaluating Classification with Efficacy Metrics

Authors: Guofan Shao, Lina Tang, Hao Zhang


The values of image classification accuracy are affected by class size distributions and classification schemes, making it difficult to compare the performance of classification algorithms across different remote sensing data sources and classification systems. Based on the term efficacy from medicine and pharmacology, we have developed the metrics of image classification efficacy at the map and class levels. The novelty of this approach is that a baseline classification is involved in computing image classification efficacies so that the effects of class statistics are reduced. Furthermore, the image classification efficacies are interpretable and comparable, and thus strengthen the assessment of image data classification methods. We use real-world and hypothetical examples to explain the use of image classification efficacies. The metrics of image classification efficacy meet the critical need to rectify the strategy for the assessment of image classification performance as image classification methods are becoming more diversified.
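
The sketch below shows one generic way to discount a baseline from an accuracy value. It is an assumption made for illustration and may differ from the authors' exact definitions: the baseline here is the expected accuracy of assigning classes at random in proportion to class size, and the function names are hypothetical.

```python
def proportional_random_baseline(class_counts):
    """Expected accuracy when labels are assigned at random in proportion to class size."""
    total = sum(class_counts)
    return sum((n / total) ** 2 for n in class_counts)


def baseline_adjusted(accuracy, baseline):
    """Share of the achievable improvement over the baseline that was realised."""
    return (accuracy - baseline) / (1.0 - baseline)
```

With this adjustment, an accuracy of 0.91 on a 90/10 class split yields about 0.5, while the same accuracy on a 50/50 split yields about 0.82, showing how class size distribution changes what a raw accuracy value means.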

Keywords: accuracy assessment, efficacy, image classification, machine learning, uncertainty

2128 Hybrid Hierarchical Clustering Approach for Community Detection in Social Network

Authors: Radhia Toujani, Jalel Akaichi


Social networks generally present a hierarchy of communities. To determine these communities and the relationships between them, detection algorithms should be applied. Most of the existing algorithms proposed for hierarchical community identification are based on either agglomerative clustering or divisive clustering. In this paper, we present a hybrid hierarchical clustering approach for community detection based on both bottom-up and top-down clustering. Our approach provides a more relevant community structure than hierarchical methods that consider only divisive or only agglomerative clustering to identify communities. Moreover, we performed comparative experiments to evaluate the quality of the clustering results and to show the effectiveness of our algorithm.

Keywords: agglomerative hierarchical clustering, community structure, divisive hierarchical clustering, hybrid hierarchical clustering, opinion mining, social network, social network analysis

2127 New Approach to Construct Phylogenetic Tree

Authors: Ouafae Baida, Najma Hamzaoui, Maha Akbib, Abdelfettah Sedqui, Abdelouahid Lyhyaoui


Numerous scientific works present various methods to analyze data in several domains, especially the comparison of classifications. In our recent work, we presented a new approach to help the user choose the best classification method from the results obtained by every method, based on the distances between the classification trees. The result of our approach took the form of a dendrogram containing the methods as a succession of connections. This approach is much needed in phylogenetic analysis. This discipline analyzes the sequences of biological macromolecules for information on the evolutionary history of living beings, including their relationships. The product of phylogenetic analysis is a phylogenetic tree. In this paper, we recommend a new method for constructing the phylogenetic tree based on the comparison of different classifications obtained from different molecular genes.

Keywords: hierarchical classification, classification methods, structure of tree, genes, phylogenetic analysis

2126 Hierarchical Clustering Algorithms in Data Mining

Authors: Z. Abdullah, A. R. Hamdan


Clustering is a process of grouping objects and data into clusters such that data objects from the same cluster are similar to each other. Clustering is one of the areas of data mining, and its algorithms can be classified into partitioning, hierarchical, density-based, and grid-based methods. In this paper, we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON, and BIRCH. The resulting state of the art of these algorithms will help in eliminating current problems, as well as in deriving more robust and scalable clustering algorithms.

Keywords: clustering, unsupervised learning, algorithms, hierarchical

2125 A Novel PSO Based Decision Tree Classification

Authors: Ali Farzan


Classification of data objects or patterns is a major part of most decision-making systems. One of the most popular and commonly used classification methods is the Decision Tree (DT). It is a hierarchical decision-making system in which a binary tree is constructed and, starting from the root, some of the classes are rejected at each node until a leaf node is reached. Each leaf node represents one specific class. Finding the splitting criterion at each node when constructing or training the tree is a major problem. Particle Swarm Optimization (PSO) has been adopted as a metaheuristic search method for finding the best splitting criteria. Results of evaluating the proposed method on benchmark datasets indicate the higher accuracy of the new PSO-based decision tree.
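
The following is a minimal sketch of how PSO could search for a split at a single tree node. It is a simplification under stated assumptions, not the paper's algorithm: it searches one threshold on one numeric feature, scores candidates by weighted Gini impurity, and uses commonly seen PSO coefficients (inertia 0.7, acceleration 1.5) chosen only for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_impurity(threshold, xs, ys):
    """Weighted Gini impurity after splitting on x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def pso_threshold(xs, ys, n_particles=20, n_iter=30, seed=0):
    """Search for a low-impurity threshold with a basic particle swarm."""
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]  # each particle's personal best position
    best_val = [split_impurity(p, xs, ys) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]  # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (best_pos[i] - pos[i])
                      + 1.5 * r2 * (g_pos - pos[i]))
            # Keep particles inside the observed feature range.
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))
            val = split_impurity(pos[i], xs, ys)
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
            if val < g_val:
                g_pos, g_val = pos[i], val
    return g_pos, g_val
```

Calling `pso_threshold([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1])` returns a threshold between the two groups. For a single feature an exhaustive scan of candidate thresholds is cheaper; a swarm search becomes more interesting when the split depends on several parameters at once.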

Keywords: decision tree, particle swarm optimization, splitting criteria, metaheuristic

2124 Urban Land Cover from GF-2 Satellite Images Using Object Based and Neural Network Classifications

Authors: Lamyaa Gamal El-Deen Taha, Ashraf Sharawi


China launched the GF-2 satellite in 2014. This study compares nearest neighbor object-based classification and neural network classification for classifying the fused GF-2 image. Firstly, the GF-2 image was rectified. Secondly, nearest neighbor object-based classification and neural network classification of the fused GF-2 image were compared. Thirdly, the overall classification accuracy and kappa index were calculated. Results indicate that nearest neighbor object-based classification is better than neural network classification for urban mapping.
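
Overall accuracy and the kappa index are both computed from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa; the function name and the example matrix are illustrative and not taken from the study.

```python
def overall_accuracy_and_kappa(matrix):
    """Compute overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i assigned to class j.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For the matrix `[[50, 10], [5, 35]]`, the overall accuracy is 0.85 and kappa is roughly 0.69, since chance agreement accounts for 0.51 of the total.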

Keywords: GF-2 images, feature extraction-rectification, nearest neighbour object based classification, segmentation algorithms, neural network classification, multilayer perceptron

2123 Knowledge Discovery from Production Databases for Hierarchical Process Control

Authors: Pavol Tanuska, Pavel Vazan, Michal Kebisek, Dominika Jurovata


The paper gives the results of a project focused on the use of knowledge discovery from production systems for the needs of hierarchical process control. One of the main project goals was to propose a knowledge discovery model for process control. Specific data mining methods and techniques were used for defined process control problems. The gained knowledge was applied to a real production system, and the proposed solution was thereby verified. The paper documents how newly discovered knowledge can be applied in real hierarchical process control. Opportunities for applying the proposed knowledge discovery model to hierarchical process control are also specified.

Keywords: hierarchical process control, knowledge discovery from databases, neural network, process control

2122 Arabic Text Representation and Classification Methods: Current State of the Art

Authors: Rami Ayadi, Mohsen Maraoui, Mounir Zrigui


In this paper, we present a brief overview of the current state of the art in Arabic text representation and classification methods. We divide work on Arabic text classification (TC) into four categories. First, we describe some algorithms applied to the classification of Arabic text. Second, we cite the major works comparing classification algorithms applied to Arabic text. Third, we mention some authors who propose new classification methods. Finally, we investigate the impact of preprocessing on Arabic TC.

Keywords: text classification, Arabic, impact of preprocessing, classification algorithms

2121 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu


Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to an information overflow problem. To address this problem for effective information retrieval, document clustering in text mining has become a popular research topic. Clustering is the unsupervised classification of data items into groups without the need for training data. Many conventional document clustering methods perform inefficiently on large document collections because they were originally designed for relational databases. They are therefore impractical for real-world document clustering and require special handling for high dimensionality and high volume. We propose the FIHC (Frequent Itemset-based Hierarchical Clustering) method, a hierarchical clustering method developed for document clustering, whose intuition is that there exist some common words for each cluster. FIHC uses such words to cluster documents and builds a hierarchical topic tree. In this paper, we combine the FIHC algorithm with an ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use closed frequent itemsets instead of all frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than well-known document clustering algorithms.
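
A frequent itemset is closed when no proper superset has the same support, so keeping only closed itemsets removes redundant candidates. The brute-force sketch below illustrates the definition on documents represented as sets of words; it is exponential in the vocabulary size and is not the mining procedure used in the paper. The example documents are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate every itemset contained in at least `min_support` transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other and support == other_support
                   for other, other_support in frequent.items())
    }
```

Given four documents in which "apple" and "fruit" always appear together three times, the frequent itemsets are {apple}, {fruit}, and {apple, fruit}, but only {apple, fruit} is closed, so a single candidate cluster label remains instead of three.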

Keywords: FIHC, documents clustering, ontology, closed frequent itemset

2120 Why Do We Need Hierarchical Linear Models?

Authors: Mustafa Aydın, Ali Murat Sunbul


Hierarchical or nested data structures are commonly seen in many research areas. In the field of education in particular, most studies involve nested structures: students in classes, classes in schools, schools in cities, and cities in regions. In a hierarchical structure, students in the same class share the same physical conditions and similar experiences and learn from the same teachers, so they tend to behave more like one another than like students in other classes.
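
For reference, a standard two-level hierarchical linear model for students nested in classes can be written as follows. The notation is the conventional textbook form and is not taken from the abstract: $Y_{ij}$ is the outcome of student $i$ in class $j$, $X_{ij}$ a student-level predictor, and $W_j$ a class-level predictor.

```latex
\begin{aligned}
\text{Level 1 (students):}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij},
  & r_{ij} &\sim \mathcal{N}(0, \sigma^2) \\
\text{Level 2 (classes):}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j} \\
 & \beta_{1j} = \gamma_{10} + u_{1j}
\end{aligned}
```

The class-level random effects $u_{0j}$ and $u_{1j}$ capture the similarity among students in the same class, which an ordinary single-level regression would wrongly treat as independent noise.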

Keywords: hierarchical linear modeling, nested data, hierarchical structure, data structure

2119 Hydrothermally Fabricated 3-D Nanostructure Metal Oxide Sensors

Authors: Mohammad Alenezi


Hierarchical nanostructures with higher dimensionality, consisting of nanostructure building blocks such as nanowires, nanotubes, or nanosheets, are very attractive. They offer valuable properties such as a high surface-to-volume ratio and well-ordered porous structures, which can be very challenging to attain with mono-morphological nanostructures. Well-ordered hierarchical nanostructures with high surface-to-volume ratios facilitate gas diffusion into their surfaces as well as scattering of light. Therefore, hierarchical nanostructures are expected to perform well as gas sensors. A multistage controlled hydrothermal synthesis method to fabricate a high-performance single ZnO brush-like hierarchical nanostructure gas sensor from initial nanowires is reported. The performance of the sensor based on the brush-like hierarchical nanostructure is analyzed and compared to that of a nanowire gas sensor. The hierarchical gas sensor demonstrated high sensitivity toward low concentrations of acetone with a fast response. The enhancement in the hierarchical sensor performance is attributed to the increased surface-to-volume ratio, reduction in dimensionality of the nanowire building blocks, formation of junctions between the initial nanowire and the secondary nanowires, and enhanced gas diffusion into the surfaces of the hierarchical nanostructures.

Keywords: metal oxide, nanostructure, hydrothermal, sensor

2118 A Model Based Metaheuristic for Hybrid Hierarchical Community Structure in Social Networks

Authors: Radhia Toujani, Jalel Akaichi


In recent years, the study of community detection in social networks has received great attention. The hierarchical structure of the network leads to convergence to a locally optimal community structure. In this paper, we aim to avoid this local optimum in the introduced hybrid hierarchical method. To this end, we present an objective function that incorporates modularity based on structural and semantic similarity, and a metaheuristic, namely the bee colony algorithm, to optimize this objective function at both hierarchical levels, divisive and agglomerative. In order to assess the efficiency and accuracy of the introduced hybrid bee colony model, we perform an extensive experimental evaluation on both synthetic and real networks.

Keywords: social network, community detection, agglomerative hierarchical clustering, divisive hierarchical clustering, similarity, modularity, metaheuristic, bee colony

2117 Sensitive Analysis of the ZF Model for ABC Multi Criteria Inventory Classification

Authors: Makram Ben Jeddou


The ABC classification is widely used by managers for inventory control. The classical ABC classification is based on the Pareto principle and on the single criterion of annual use value. Single-criterion classification is often insufficient for close inventory control. Multi-criteria inventory classification models have been proposed by researchers in order to take other important criteria into account. Among these models, we consider the ZF model and conduct a sensitivity analysis on the composite score calculated for each item. This score, which is based on a normalized average of a good and a bad optimized index, can affect the ABC classification of items. We then focus on the weights assigned to each index and propose a classification compromise.
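
To illustrate why the weights matter, the sketch below combines a "good" and a "bad" index per item into a composite score. It assumes min-max normalization of each index and a single weight between 0 and 1; these details, the function name, and the example values are assumptions, as the abstract only describes the score as a normalized average.

```python
def composite_scores(good, bad, weight=0.5):
    """Weighted average of min-max-normalized good and bad indices, one score per item."""

    def normalise(values):
        lo, hi = min(values), max(values)
        # If all values are equal, every item gets 0.0 for this index.
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    g, b = normalise(good), normalise(bad)
    return [weight * gi + (1.0 - weight) * bi for gi, bi in zip(g, b)]
```

With `good = [1.0, 0.5, 0.0]` and `bad = [0.2, 0.8, 0.5]`, the second item ranks first when `weight=0.5`, but the first item overtakes it when `weight=0.9`. A shift like this can move items between the A, B, and C classes.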

Keywords: ABC classification, multi criteria inventory classification models, ZF-model

2116 Use of Hierarchical Temporal Memory Algorithm in Heart Attack Detection

Authors: Tesnim Charrad, Kaouther Nouira, Ahmed Ferchichi


In order to reduce the number of deaths due to heart problems, we propose the use of the Hierarchical Temporal Memory (HTM) algorithm, which is a real-time anomaly detection algorithm. HTM is a cortical learning algorithm modelled on the neocortex and used for anomaly detection. In other words, it is based on a conceptual theory of how the human brain may work. It is powerful in predicting unusual patterns, anomaly detection and classification. In this paper, HTM has been implemented and tested on ECG datasets in order to detect cardiac anomalies. Experiments showed good performance in terms of specificity, sensitivity and execution time.
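
Sensitivity and specificity, the two detection metrics mentioned above, can be computed from binary labels as in this short sketch. It covers only the evaluation step, not HTM itself, and assumes that 1 marks an anomalous segment and 0 a normal one.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 means anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the share of true anomalies that were detected, and specificity is the share of normal segments that were correctly left unflagged.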

Keywords: cardiac anomalies, ECG, HTM, real time anomaly detection

2115 An E-Assessment Website to Implement Hierarchical Aggregate Assessment

Authors: M. Lesage, G. Raîche, M. Riopel, F. Fortin, D. Sebkhi


This paper describes a Web server implementation of the hierarchical aggregate assessment process in the field of education. This process is a form of teamwork assessment in which teams can have multiple levels of hierarchy and supervision. It is widely applied and is part of the management, education, assessment and computer science fields. The E-Assessment website named “Cluster” records in its database the students, the course material, the teams and the hierarchical relationships between the students. For the present research, the hierarchical relationships are team member, team leader and group administrator appointments. The group administrators are responsible for supervising team leaders. The application was tested with high school students in geology courses and with Canadian army cadets on team navigation patrols. This research extends the work of Nance, which uses a hierarchical aggregation process similar to the one implemented in the “Cluster” application.

Keywords: e-learning, e-assessment, teamwork assessment, hierarchical aggregate assessment

2114 A New Approach for Improving Accuracy of Multi Label Stream Data

Authors: Kunal Shah, Swati Patel


Many real-world problems involve data that can be considered multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. Classification is used to predict the class of an unseen instance as accurately as possible. Multi-label classification is a variant of single-label classification in which a set of labels is associated with a single instance. Multi-label classification is used by modern applications such as text classification, functional genomics, image classification, and music categorization. This paper introduces the task of multi-label classification, methods for multi-label classification, and evaluation measures for multi-label classification. In addition, a comparative analysis of multi-label classification methods was carried out, first on the basis of a theoretical study and then on the basis of simulations on various data sets.
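
Hamming loss is one commonly used measure for multi-label classification. The sketch below computes it for label sets; the abstract does not say which measures the authors use, so this is only an example of the kind of measure involved.

```python
def hamming_loss(true_sets, pred_sets, all_labels):
    """Fraction of (instance, label) pairs on which truth and prediction disagree."""
    # The symmetric difference holds the labels that are wrongly added or missed.
    errors = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return errors / (len(true_sets) * len(all_labels))
```

For two instances over four labels, with one missed label and one spurious label, the loss is 2 / 8 = 0.25. Lower values are better, and 0 means every label set was predicted exactly.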

Keywords: binary relevance, concept drift, data stream mining, MLSC, multiple window with buffer

2113 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures

Authors: Salima Kouici, Abdelkader Khelladi


In this work, we begin by presenting the Tθ family of common similarity measures for multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of using different inter-element measures on the results of agglomerative hierarchical clustering methods is studied.
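
The sketch below assumes that Tθ denotes the family a / (a + θ(b + c)), as defined by Gower and Legendre, where a counts positions in which both binary vectors are 1 and b + c counts positions in which they differ. If the authors use a different parametrization, the code would need adjusting. Returning 1.0 for two all-zero vectors is a convention chosen here, not part of the definition.

```python
def t_theta(x, y, theta=1.0):
    """Similarity a / (a + theta * (b + c)) between two equal-length binary vectors."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    mismatches = sum(1 for xi, yi in zip(x, y) if xi != yi)  # this is b + c
    denominator = a + theta * mismatches
    # Two all-zero vectors give 0/0; treat them as identical by convention.
    return a / denominator if denominator else 1.0
```

Under this assumption, θ = 1 gives the Jaccard coefficient, θ = 1/2 the Dice coefficient, and θ = 2 the Sokal-Sneath coefficient. Varying θ therefore changes the pairwise similarities fed to the clustering and can change the resulting dendrogram.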

Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering

2112 Unseen Classes: The Paradigm Shift in Machine Learning

Authors: Vani Singhal, Jitendra Parmar, Satyendra Singh Chouhan


Unseen class discovery has become an important part of machine learning algorithms that must handle new classes. Unseen classes are classes on which the machine learning model has not been trained. With the advancement of technology and AI replacing humans, the amount of data has increased substantially. So while applying a model to real-world examples, we come across unseen new classes. Our aim is to find the number of unseen classes by using a hierarchical-based active learning algorithm. The algorithm is based on hierarchical clustering as well as active sampling. The number of clusters obtained at the end gives the number of unseen classes. The full set of clusters will also contain some clusters that hold unseen classes. Instead of first discovering unseen classes and then finding their number, we directly calculate the number by applying the algorithm. The dataset used is for intent classification. The target data is the intent of the corresponding query. We conclude that when the machine learning model encounters real-world data, it will automatically find the number of unseen classes. In future work, we plan to label these unseen classes correctly.

Keywords: active sampling, hierarchical clustering, open world learning, unseen class discovery

2111 A Hierarchical Method for Multi-Class Probabilistic Classification Vector Machines

Authors: P. Byrnes, F. A. DiazDelaO


The Support Vector Machine (SVM) has become widely recognised as one of the leading algorithms in machine learning for both regression and binary classification. It expresses predictions in terms of a linear combination of kernel functions centred on a subset of the training points, referred to as support vectors. Despite its popularity amongst practitioners, SVM has some limitations, the most significant being the generation of point predictions as opposed to predictive distributions. Stemming from this issue, a probabilistic model, namely Probabilistic Classification Vector Machines (PCVM), has been proposed, which respects the original functional form of SVM whilst also providing a predictive distribution. As physical system designs become more complex, an increasing number of classification tasks involving industrial applications consist of more than two classes. Consequently, this research proposes a framework which allows for the extension of PCVM to a multi class setting. Additionally, the original PCVM framework relies on the use of type II maximum likelihood to provide estimates for both the kernel hyperparameters and model evidence. In a high dimensional multi class setting, however, this approach has been shown to be ineffective due to poor scaling as the number of classes increases. Accordingly, we propose the application of Markov Chain Monte Carlo (MCMC) based methods to provide a posterior distribution over both parameters and hyperparameters. The proposed framework will be validated against current multi class classifiers through synthetic and real life implementations.
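
The functional form mentioned above, a weighted sum of kernel functions evaluated against the support vectors, can be sketched as follows. The RBF kernel, the weights, and the support vectors are illustrative assumptions; a trained SVM would supply its own, and PCVM would place a distribution over the weights rather than using fixed values.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def decision_value(x, support_vectors, weights, bias=0.0):
    """Weighted sum of kernel evaluations against each support vector, plus a bias."""
    return sum(w * rbf_kernel(x, sv) for sv, w in zip(support_vectors, weights)) + bias
```

With support vectors at (0, 0) and (2, 2) and weights +1 and -1, the decision value is positive near the first point, negative near the second, and zero at the midpoint (1, 1). The sign gives the predicted class, but the value is a point prediction without an attached uncertainty.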

Keywords: probabilistic classification vector machines, multi class classification, MCMC, support vector machines

2110 Classification of Attacks Over Cloud Environment

Authors: Karim Abouelmehdi, Loubna Dali, Elmoutaoukkil Abdelmajid, Hoda Elsayed, Eladnani Fatiha, Benihssane Abderahim


The security of cloud services is a concern of cloud service providers. In this paper, we review different classifications of cloud attacks reported by specialized organizations. Each agency has its own classification with well-defined properties. The purpose is to present a high-level classification of current research in cloud computing security. This classification is organized around attack strategies and corresponding defenses.

Keywords: cloud computing, classification, risk, security

2109 Digital Geography and Geographic Information System in Schools: Towards a Hierarchical Geospatial Approach

Authors: Mary Fargher


This paper examines the opportunities of using a more hierarchical approach to geospatial enquiry in using GIS in school geography. A case is made that it is not just the lack of teacher technological knowledge that is stopping some teachers from using GIS in the classroom but that there is a gap in their understanding of how to link GIS use more specifically to the pedagogy of teaching geography with GIS. Using a hierarchical approach to geospatial enquiry as a theoretical framework, the analysis shows clearly how concepts of spatial distribution, interaction, relation, comparison, and temporal relationships can be used by teachers more explicitly to capitalise on the analytical power of GIS and to construct what can be interpreted as powerful geographical knowledge. An exemplar illustrating this approach on the topic of geo-hazards is then presented for critical analysis and discussion. Recommendations are then made for a model of progression for geography teacher education with GIS through hierarchical geospatial enquiry that takes into account beginner, intermediate, and more advanced users.

Keywords: digital geography, GIS, education, hierarchical geospatial enquiry, powerful geographical knowledge

2108 An Empirical Study to Predict Myocardial Infarction Using K-Means and Hierarchical Clustering

Authors: Md. Minhazul Islam, Shah Ashisul Abed Nipun, Majharul Islam, Md. Abdur Rakib Rahat, Jonayet Miah, Salsavil Kayyum, Anwar Shadaab, Faiz Al Faisal


The target of this research is to predict Myocardial Infarction using unsupervised Machine Learning algorithms. Predicting Myocardial Infarction related to heart disease is a challenge faced by doctors and hospitals. In this prediction, accuracy plays a vital role. With this concern, the authors analyzed a myocardial dataset to predict myocardial infarction using two popular Machine Learning algorithms, K-Means and Hierarchical Clustering. This research includes the collection of data and the classification of data using Machine Learning algorithms. The authors collected 345 instances with 26 attributes from different hospitals in Bangladesh. These data were collected from patients suffering from myocardial infarction along with other symptoms. This model would be able to find and mine hidden facts from historical Myocardial Infarction cases. The aim of this study is to analyze the level of accuracy with which Myocardial Infarction can be predicted using Machine Learning techniques.
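
As a reminder of how K-Means works, here is a minimal one-dimensional sketch that alternates between assigning values to the nearest centroid and recomputing each centroid as its cluster mean. The single numeric attribute, the starting centroids, and the fixed iteration count are assumptions for illustration; the study's data has 26 attributes.

```python
def kmeans_1d(values, centroids, n_iter=10):
    """Run `n_iter` rounds of K-Means on 1-D data from the given starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean (unchanged if empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters
```

Because K-Means is unsupervised, the resulting clusters carry no diagnosis labels by themselves. Mapping clusters to outcomes such as infarction or no infarction requires comparing them with known cases afterwards.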

Keywords: Machine Learning, K-means, Hierarchical Clustering, Myocardial Infarction, Heart Disease

Procedia PDF Downloads 131
2107 High Resolution Satellite Imagery and Lidar Data for Object-Based Tree Species Classification in Quebec, Canada

Authors: Bilel Chalghaf, Mathieu Varin


Forest characterization in Quebec, Canada, is usually assessed based on photo-interpretation at the stand level. For species identification, this often results in a lack of precision. Very high spatial resolution imagery, such as DigitalGlobe imagery, and Light Detection and Ranging (LiDAR) data have the potential to overcome the limitations of aerial imagery. To date, few studies have used these data to map a large number of species at the tree level using machine learning techniques. The main objective of this study is to map 11 tall tree species (> 17 m) at the tree level using an object-based approach in the broadleaf forest of Kenauk Nature, Quebec. For the individual tree crown segmentation, three canopy-height models (CHMs) from LiDAR data were assessed: 1) the original, 2) a filtered, and 3) a corrected model. The corrected CHM gave the best accuracy and was then coupled with imagery to refine tree species crown identification. When compared with photo-interpretation, 90% of the objects represented a single species. For modeling, 313 variables were derived from 16-band WorldView-3 imagery and LiDAR data, using radiance, reflectance, pixel, and object-based calculation techniques. Variable selection procedures were employed to reduce their number from 313 to 16, using only 11 bands to aid reproducibility. For classification, a global approach using all 11 species was compared to a semi-hierarchical hybrid classification approach at two levels: (1) tree type (broadleaf/conifer) and (2) individual broadleaf (five) and conifer (six) species. Five different modeling techniques were used: (1) support vector machine (SVM), (2) classification and regression tree (CART), (3) random forest (RF), (4) k-nearest neighbors (k-NN), and (5) linear discriminant analysis (LDA). Each model was tuned separately for all approaches and levels. For the global approach, the best model was the SVM using eight variables (overall accuracy (OA): 80%, Kappa: 0.77). With the semi-hierarchical hybrid approach, at the tree type level, the best model was the k-NN using six variables (OA: 100% and Kappa: 1.00). At the level of identifying broadleaf and conifer species, the best model was the SVM, with OA of 80% and 97% and Kappa values of 0.74 and 0.97, respectively, using seven variables for both models. This paper demonstrates that a hybrid classification approach gives better results and that using 16-band WorldView-3 imagery with LiDAR data leads to more precise predictions for tree segmentation and classification, especially when the number of tree species is large.
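The two-level scheme described in this abstract (tree type first, then species within the predicted type) can be sketched as follows. This is only an illustration under stated assumptions: a simple nearest-centroid classifier stands in for the SVM and k-NN models the authors actually used, and the class `NearestCentroid`, the helper functions, the two-dimensional features, and the species labels are hypothetical.

```python
from math import dist


class NearestCentroid:
    """Toy stand-in classifier (assumption; the paper used SVM, k-NN, etc.)."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # One centroid per class: the column-wise mean of its samples.
        self.centroids = {
            label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


def fit_two_level(X, tree_types, species):
    """Level 1: one model for tree type. Level 2: one species model per type."""
    type_model = NearestCentroid().fit(X, tree_types)
    species_models = {}
    for t in set(tree_types):
        rows = [(x, s) for x, tt, s in zip(X, tree_types, species) if tt == t]
        species_models[t] = NearestCentroid().fit(
            [x for x, _ in rows], [s for _, s in rows]
        )
    return type_model, species_models


def predict_two_level(type_model, species_models, x):
    """Predict the tree type, then route to that type's species model."""
    t = type_model.predict_one(x)
    return t, species_models[t].predict_one(x)


# Hypothetical two-variable features per tree crown (not the paper's data).
X = [(1.0, 1.0), (1.2, 0.9), (2.0, 2.0), (2.1, 1.9),
     (8.0, 8.0), (8.1, 7.9), (9.0, 9.0), (9.2, 9.1)]
tree_types = ["broadleaf"] * 4 + ["conifer"] * 4
species = ["maple", "maple", "oak", "oak", "pine", "pine", "spruce", "spruce"]

type_model, species_models = fit_two_level(X, tree_types, species)
print(predict_two_level(type_model, species_models, (1.9, 2.0)))  # → ('broadleaf', 'oak')
print(predict_two_level(type_model, species_models, (8.9, 9.0)))  # → ('conifer', 'spruce')
```

One consequence of this design is that a level-1 error cannot be recovered at level 2, so the scheme pays off mainly when the top-level split is easy, as the reported 100% accuracy at the tree type level suggests it was here.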

Keywords: tree species, object-based, classification, multispectral, machine learning, WorldView-3, LiDAR

Procedia PDF Downloads 62
2106 Semi-Supervised Hierarchical Clustering Given a Reference Tree of Labeled Documents

Authors: Ying Zhao, Xingyan Bin


Semi-supervised clustering algorithms have been shown to improve the clustering process effectively, even with limited supervision. However, semi-supervised hierarchical clustering remains challenging due to the complexities of expressing constraints for agglomerative clustering algorithms. This paper proposes novel semi-supervised agglomerative clustering algorithms to build a hierarchy based on a known reference tree. We prove that by enforcing distance constraints defined by a reference tree during the process of hierarchical clustering, the resultant tree is guaranteed to be consistent with the reference tree. We also propose a framework that makes the hierarchical tree generation aware of the levels of the agglomerative tree under creation, so that metric weights can be learned and adopted at each level in a recursive fashion. The experimental evaluation shows that the additional cost of our constraint-based semi-supervised hierarchical agglomerative clustering (HAC) algorithm is negligible, and that our combined semi-supervised HAC algorithm outperforms the state-of-the-art algorithms on real-world datasets. The experiments also show that our proposed methods can improve clustering performance even with a small number of unevenly distributed labeled data.

Keywords: semi-supervised clustering, hierarchical agglomerative clustering, reference trees, distance constraints

Procedia PDF Downloads 439
2105 Hierarchical Filtering Method of Threat Alerts Based on Correlation Analysis

Authors: Xudong He, Jian Wang, Jiqiang Liu, Lei Han, Yang Yu, Shaohua Lv


Nowadays, the threats on the internet are enormous and increasing; however, the classification of the huge number of alert messages generated in this environment is relatively monotonous. This affects the accuracy of network situation assessment and makes it harder for security managers to deal with emergencies. To deal with potential network threats effectively and to provide more useful data for improving network situation awareness, it is essential to build a hierarchical filtering method to prevent the threats. This paper establishes a model for data monitoring that filters the original data systematically to obtain threat grades, which are stored for reuse. First, it filters by vulnerable resources, open ports of host devices, and services. It then uses entropy theory to calculate the performance changes of the host devices at the time the threat occurs and filters again. Finally, it sorts the changes in the performance values at the time the threat occurs. Alerts and performance data collected in a real network environment are used for evaluation and analysis. The comparative experimental analysis shows that the method can filter threat alerts effectively.
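The three filtering stages outlined in this abstract can be sketched as below. The abstract does not say how entropy is applied, so this example makes several loud assumptions: performance samples are binned and compared by Shannon entropy before and during the alert, alerts below a threshold `min_change` are dropped, and the record layout (`id`, `host`, `port`) and all data are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(samples, bin_width=10):
    """Shannon entropy (bits) of samples after binning (binning is an assumption)."""
    bins = Counter(int(s // bin_width) for s in samples)
    total = len(samples)
    return -sum((n / total) * log2(n / total) for n in bins.values())


def filter_alerts(alerts, open_ports, metrics, min_change=0.5):
    """Return (entropy_change, alert_id) pairs, largest change first."""
    ranked = []
    for alert in alerts:
        # Stage 1: keep only alerts that target a port actually open on the host.
        if alert["port"] not in open_ports.get(alert["host"], set()):
            continue
        # Stage 2: entropy change of host performance, before vs. during the alert.
        before, during = metrics[alert["id"]]
        change = abs(shannon_entropy(during) - shannon_entropy(before))
        if change < min_change:
            continue
        ranked.append((change, alert["id"]))
    # Stage 3: sort the surviving alerts by the size of the performance change.
    ranked.sort(reverse=True)
    return ranked


# Hypothetical alerts, host inventory, and CPU-load samples (percent).
alerts = [
    {"id": "a1", "host": "h1", "port": 22},
    {"id": "a2", "host": "h1", "port": 8080},  # port not open: dropped in stage 1
    {"id": "a3", "host": "h2", "port": 80},    # no load change: dropped in stage 2
]
open_ports = {"h1": {22, 443}, "h2": {80}}
metrics = {
    "a1": ([20, 21, 22, 23], [20, 45, 70, 95]),
    "a3": ([30, 31, 32, 33], [30, 31, 32, 34]),
}

ranked = filter_alerts(alerts, open_ports, metrics)
print([alert_id for _, alert_id in ranked])  # → ['a1']
```

Running the cheap inventory check before the entropy computation means the more expensive stage only sees alerts that could plausibly affect the host, which is the usual motivation for ordering filter stages this way.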

Keywords: correlation analysis, hierarchical filtering, multisource data, network security

Procedia PDF Downloads 126