Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 416

Search results for: supervised classifiers

416 Investigation of Topic Modeling-Based Semi-Supervised Interpretable Document Classifier

Authors: Dasom Kim, William Xiu Shun Wong, Yoonjin Hyun, Donghoon Lee, Minji Paek, Sungho Byun, Namgyu Kim

Abstract:

There have been many researches on document classification for classifying voluminous documents automatically. Through document classification, we can assign a specific category to each unlabeled document on the basis of various machine learning algorithms. However, providing labeled documents manually requires considerable time and effort. To overcome the limitations, the semi-supervised learning which uses unlabeled document as well as labeled documents has been invented. However, traditional document classifiers, regardless of supervised or semi-supervised ones, cannot sufficiently explain the reason or the process of the classification. Thus, in this paper, we proposed a methodology to visualize major topics and class components of each document. We believe that our methodology for visualizing topics and classes of each document can enhance the reliability and explanatory power of document classifiers.

Keywords: data mining, document classifier, text mining, topic modeling

Procedia PDF Downloads 310
415 2D Point Clouds Features from Radar for Helicopter Classification

Authors: Danilo Habermann, Aleksander Medella, Carla Cremon, Yusef Caceres

Abstract:

This paper aims to analyze the ability of 2d point clouds features to classify different models of helicopters using radars. This method does not need to estimate the blade length, the number of blades of helicopters, and the period of their micro-Doppler signatures. It is also not necessary to generate spectrograms (or any other image based on time and frequency domain). This work transforms a radar return signal into a 2D point cloud and extracts features of it. Three classifiers are used to distinguish 9 different helicopter models in order to analyze the performance of the features used in this work. The high accuracy obtained with each of the classifiers demonstrates that the 2D point clouds features are very useful for classifying helicopters from radar signal.

Keywords: helicopter classification, point clouds features, radar, supervised classifiers

Procedia PDF Downloads 103
414 Lipschitz Classifiers Ensembles: Usage for Classification of Target Events in C-OTDR Monitoring Systems

Authors: Andrey V. Timofeev

Abstract:

This paper introduces an original method for guaranteed estimation of the accuracy of an ensemble of Lipschitz classifiers. The solution was obtained as a finite closed set of alternative hypotheses, which contains an object of classification with a probability of not less than the specified value. Thus, the classification is represented by a set of hypothetical classes. In this case, the smaller the cardinality of the discrete set of hypothetical classes is, the higher is the classification accuracy. Experiments have shown that if the cardinality of the classifiers ensemble is increased then the cardinality of this set of hypothetical classes is reduced. The problem of the guaranteed estimation of the accuracy of an ensemble of Lipschitz classifiers is relevant in the multichannel classification of target events in C-OTDR monitoring systems. Results of suggested approach practical usage to accuracy control in C-OTDR monitoring systems are present.

Keywords: Lipschitz classifiers, confidence set, C-OTDR monitoring, classifiers accuracy, classifiers ensemble

Procedia PDF Downloads 410
413 DDoS Attacks Detection Using Ensemble Learning

Authors: Nidhi Aildasani, Himanshu Gauttam, Saumya Bhadauria

Abstract:

The advancement of virtual networks such as cloud computing, SDN, NFV, etc., have enabled ease to access information from anywhere around the globe with their capability for providing on-the-go services. However, it also poses a massive threat to the information and data available on these virtual networks. DDoS attacks are one of the most frequent threats to network security that cause severe damage, which hamper network performance. In this, the attacker blocks the network resources on the victim's side by sending an enormous number of packets or connection requests to the server using bots to damage the server. The existing works explore only single supervised learning algorithms or a comparison of multiple individual classifiers for DDoS attack detection. Hence, this work proposed a combination of supervised machine learning techniques called ensemble learning to combat DDoS attacks. Ensemble learning utilizes Random Forest, Support Vector Machine, and AdaBoost techniques to detect and identify the type of DDoS attack.

Keywords: DDoS attacks, ensemble learning supervised machine learning, virtual network

Procedia PDF Downloads 8
412 Semi-Supervised Hierarchical Clustering Given a Reference Tree of Labeled Documents

Authors: Ying Zhao, Xingyan Bin

Abstract:

Semi-supervised clustering algorithms have been shown effective to improve clustering process with even limited supervision. However, semi-supervised hierarchical clustering remains challenging due to the complexities of expressing constraints for agglomerative clustering algorithms. This paper proposes novel semi-supervised agglomerative clustering algorithms to build a hierarchy based on a known reference tree. We prove that by enforcing distance constraints defined by a reference tree during the process of hierarchical clustering, the resultant tree is guaranteed to be consistent with the reference tree. We also propose a framework that allows the hierarchical tree generation be aware of levels of levels of the agglomerative tree under creation, so that metric weights can be learned and adopted at each level in a recursive fashion. The experimental evaluation shows that the additional cost of our contraint-based semi-supervised hierarchical clustering algorithm (HAC) is negligible, and our combined semi-supervised HAC algorithm outperforms the state-of-the-art algorithms on real-world datasets. The experiments also show that our proposed methods can improve clustering performance even with a small number of unevenly distributed labeled data.

Keywords: semi-supervised clustering, hierarchical agglomerative clustering, reference trees, distance constraints

Procedia PDF Downloads 437
411 Using Classifiers to Predict Student Outcome at Higher Institute of Telecommunication

Authors: Fuad M. Alkoot

Abstract:

We aim at highlighting the benefits of classifier systems especially in supporting educational management decisions. The paper aims at using classifiers in an educational application where an outcome is predicted based on given input parameters that represent various conditions at the institute. We present a classifier system that is designed using a limited training set with data for only one semester. The achieved system is able to reach at previously known outcomes accurately. It is also tested on new input parameters representing variations of input conditions to see its prediction on the possible outcome value. Given the supervised expectation of the outcome for the new input we find the system is able to predict the correct outcome. Experiments were conducted on one semester data from two departments only, Switching and Mathematics. Future work on other departments with larger training sets and wider input variations will show additional benefits of classifier systems in supporting the management decisions at an educational institute.

Keywords: machine learning, pattern recognition, classifier design, educational management, outcome estimation

Procedia PDF Downloads 195
410 Large-Scale Electroencephalogram Biometrics through Contrastive Learning

Authors: Mostafa ‘Neo’ Mohsenvand, Mohammad Rasool Izadi, Pattie Maes

Abstract:

EEG-based biometrics (user identification) has been explored on small datasets of no more than 157 subjects. Here we show that the accuracy of modern supervised methods falls rapidly as the number of users increases to a few thousand. Moreover, supervised methods require a large amount of labeled data for training which limits their applications in real-world scenarios where acquiring data for training should not take more than a few minutes. We show that using contrastive learning for pre-training, it is possible to maintain high accuracy on a dataset of 2130 subjects while only using a fraction of labels. We compare 5 different self-supervised tasks for pre-training of the encoder where our proposed method achieves the accuracy of 96.4%, improving the baseline supervised models by 22.75% and the competing self-supervised model by 3.93%. We also study the effects of the length of the signal and the number of channels on the accuracy of the user-identification models. Our results reveal that signals from temporal and frontal channels contain more identifying features compared to other channels.

Keywords: brainprint, contrastive learning, electroencephalo-gram, self-supervised learning, user identification

Procedia PDF Downloads 83
409 Bilingualism: A Case Study of Assamese and Bodo Classifiers

Authors: Samhita Bharadwaj

Abstract:

This is an empirical study of classifiers in Assamese and Bodo, two genetically unrelated languages of India. The objective of the paper is to address the language contact between Assamese and Bodo as reflected in classifiers. The data has been collected through fieldwork in Bodo recording narratives and folk tales and eliciting specific data from the speakers. The data for Assamese is self-produced as native speaker of the language. Assamese is the easternmost New-Indo-Aryan (henceforth NIA) language mainly spoken in the Brahmaputra valley of Assam and some other north-eastern states of India. It is the lingua franca of Assam and is creolised in the neighbouring state of Nagaland. Bodo, on the other hand, is a Tibeto-Burman (henceforth TB) language of the Bodo-Garo group. It has the highest number of speakers among the TB languages of Assam. However, compared to Assamese, it is still a lesser documented language and due to the prestige of Assamese, all the Bodo speakers are fluent bi-lingual in Assamese, though the opposite isn’t the case. With this context, classifiers, a characteristic phenomenon of TB languages, but not so much of NIA languages, presents an interesting case study on language contact caused by bilingualism. Assamese, as a result of its language contact with the TB languages which are rich in classifiers; has developed the richest classifier system among the IA languages in India. Yet, as a part of rampant borrowing of Assamese words and patterns into Bodo; Bodo is seen to borrow even Assamese classifiers into its system. This paper analyses the borrowed classifiers of Bodo and finds the route of this borrowing phenomenon in the number system of the languages. As the Bodo speakers start replacing the higher numbers from five with Assamese ones, they also choose the Assamese classifiers to attach to these numbers. Thus, the partial loss of number in Bodo as a result of language contact and bilingualism in Assamese is found to be the reason behind the borrowing of classifiers in Bodo. The significance of the study lies in exploring an interesting aspect of language contact in Assam. It is hoped that this will attract further research on bilingualism and classifiers in Assam.

Keywords: Assamese, bi-lingual, Bodo, borrowing, classifier, language contact

Procedia PDF Downloads 124
408 Use of Fractal Geometry in Machine Learning

Authors: Fuad M. Alkoot

Abstract:

The main component of a machine learning system is the classifier. Classifiers are mathematical models that can perform classification tasks for a specific application area. Additionally, many classifiers are combined using any of the available methods to reduce the classifier error rate. The benefits gained from the combination of multiple classifier designs has motivated the development of diverse approaches to multiple classifiers. We aim to investigate using fractal geometry to develop an improved classifier combiner. Initially we experiment with measuring the fractal dimension of data and use the results in the development of a combiner strategy.

Keywords: fractal geometry, machine learning, classifier, fractal dimension

Procedia PDF Downloads 89
407 Sentiment Analysis of Ensemble-Based Classifiers for E-Mail Data

Authors: Muthukumarasamy Govindarajan

Abstract:

Detection of unwanted, unsolicited mails called spam from email is an interesting area of research. It is necessary to evaluate the performance of any new spam classifier using standard data sets. Recently, ensemble-based classifiers have gained popularity in this domain. In this research work, an efficient email filtering approach based on ensemble methods is addressed for developing an accurate and sensitive spam classifier. The proposed approach employs Naive Bayes (NB), Support Vector Machine (SVM) and Genetic Algorithm (GA) as base classifiers along with different ensemble methods. The experimental results show that the ensemble classifier was performing with accuracy greater than individual classifiers, and also hybrid model results are found to be better than the combined models for the e-mail dataset. The proposed ensemble-based classifiers turn out to be good in terms of classification accuracy, which is considered to be an important criterion for building a robust spam classifier.

Keywords: accuracy, arcing, bagging, genetic algorithm, Naive Bayes, sentiment mining, support vector machine

Procedia PDF Downloads 60
406 Comparing Effects of Supervised Exercise Therapy versus Home-Based Exercise Therapy on Low Back Pain Severity, Muscle Strength and Anthropometric Parameters in Patients with Nonspecific Chronic Low Back Pain

Authors: Haleh Dadgostar, Faramarz Akbari, Hosien Vahid Tari, Masoud Solaymani-Dodaran, Mohammad Razi

Abstract:

Introduction: There are a number of exercises-protocols have been applied to improve low back pain. We compared the effect of supervised exercise therapy and home-based exercise therapy among patients with nonspecific chronic low back pain. Methods: 70 patients with nonspecific chronic low back pain were randomly (using a random number generator, excel) divided into two groups to compare the effects of two types of exercise therapy. After a common educational session to learn how to live with low back pain as well as to use core training protocols to strengthen the muscles, the subjects were randomly assigned to follow supervised exercise therapy (n = 31) or home-based exercise therapy (n = 34) for 20 weeks. Results: Although both types of exercise programs resulted in reduced pain, this factor decreased more significantly in supervised exercise program. All scores of fitness improved significantly in supervised exercise group. But only knee extensor strength score was increased in the home base exercise group. Conclusion: Comparing between two types of exercise, supervised group exercise showed more effective than the other one. Reduction in low back pain severity and improvement in muscle flexibility and strength can be more achieved by using a 20-week supervised exercise program compared to the home-based exercise program in patients with nonspecific chronic low back pain.

Keywords: low back pain, anthropometric parameters, supervised exercise therapy, home-based exercise therapy

Procedia PDF Downloads 224
405 Assessment of Planet Image for Land Cover Mapping Using Soft and Hard Classifiers

Authors: Lamyaa Gamal El-Deen Taha, Ashraf Sharawi

Abstract:

Planet image is a new data source from planet lab. This research is concerned with the assessment of Planet image for land cover mapping. Two pixel based classifiers and one subpixel based classifier were compared. Firstly, rectification of Planet image was performed. Secondly, a comparison between minimum distance, maximum likelihood and neural network classifications for classification of Planet image was performed. Thirdly, the overall accuracy of classification and kappa coefficient were calculated. Results indicate that neural network classification is best followed by maximum likelihood classifier then minimum distance classification for land cover mapping.

Keywords: planet image, land cover mapping, rectification, neural network classification, multilayer perceptron, soft classifiers, hard classifiers

Procedia PDF Downloads 104
404 Multi-Sensor Target Tracking Using Ensemble Learning

Authors: Bhekisipho Twala, Mantepu Masetshaba, Ramapulana Nkoana

Abstract:

Multiple classifier systems combine several individual classifiers to deliver a final classification decision. However, an increasingly controversial question is whether such systems can outperform the single best classifier, and if so, what form of multiple classifiers system yields the most significant benefit. Also, multi-target tracking detection using multiple sensors is an important research field in mobile techniques and military applications. In this paper, several multiple classifiers systems are evaluated in terms of their ability to predict a system’s failure or success for multi-sensor target tracking tasks. The Bristol Eden project dataset is utilised for this task. Experimental and simulation results show that the human activity identification system can fulfill requirements of target tracking due to improved sensors classification performances with multiple classifier systems constructed using boosting achieving higher accuracy rates.

Keywords: single classifier, ensemble learning, multi-target tracking, multiple classifiers

Procedia PDF Downloads 135
403 Efficient Schemes of Classifiers for Remote Sensing Satellite Imageries of Land Use Pattern Classifications

Authors: S. S. Patil, Sachidanand Kini

Abstract:

Classification of land use patterns is compelling in complexity and variability of remote sensing imageries data. An imperative research in remote sensing application exploited to mine some of the significant spatially variable factors as land cover and land use from satellite images for remote arid areas in Karnataka State, India. The diverse classification techniques, unsupervised and supervised consisting of maximum likelihood, Mahalanobis distance, and minimum distance are applied in Bellary District in Karnataka State, India for the classification of the raw satellite images. The accuracy evaluations of results are compared visually with the standard maps with ground-truths. We initiated with the maximum likelihood technique that gave the finest results and both minimum distance and Mahalanobis distance methods over valued agriculture land areas. In meanness of mislaid few irrelevant features due to the low resolution of the satellite images, high-quality accord between parameters extracted automatically from the developed maps and field observations was found.

Keywords: Mahalanobis distance, minimum distance, supervised, unsupervised, user classification accuracy, producer's classification accuracy, maximum likelihood, kappa coefficient

Procedia PDF Downloads 100
402 A Video Surveillance System Using an Ensemble of Simple Neural Network Classifiers

Authors: Rodrigo S. Moreira, Nelson F. F. Ebecken

Abstract:

This paper proposes a maritime vessel tracker composed of an ensemble of WiSARD weightless neural network classifiers. A failure detector analyzes vessel movement with a Kalman filter and corrects the tracking, if necessary, using FFT matching. The use of the WiSARD neural network to track objects is uncommon. The additional contributions of the present study include a performance comparison with four state-of-art trackers, an experimental study of the features that improve maritime vessel tracking, the first use of an ensemble of classifiers to track maritime vessels and a new quantization algorithm that compares the values of pixel pairs.

Keywords: ram memory, WiSARD weightless neural network, object tracking, quantization

Procedia PDF Downloads 220
401 PatchMix: Learning Transferable Semi-Supervised Representation by Predicting Patches

Authors: Arpit Rai

Abstract:

In this work, we propose PatchMix, a semi-supervised method for pre-training visual representations. PatchMix mixes patches of two images and then solves an auxiliary task of predicting the label of each patch in the mixed image. Our experiments on the CIFAR-10, 100 and the SVHN dataset show that the representations learned by this method encodes useful information for transfer to new tasks and outperform the baseline Residual Network encoders by on CIFAR 10 by 12% on ResNet 101 and 2% on ResNet-56, by 4% on CIFAR-100 on ResNet101 and by 6% on SVHN dataset on the ResNet-101 baseline model.

Keywords: self-supervised learning, representation learning, computer vision, generalization

Procedia PDF Downloads 13
400 Performance Assessment of Multi-Level Ensemble for Multi-Class Problems

Authors: Rodolfo Lorbieski, Silvia Modesto Nassar

Abstract:

Many supervised machine learning tasks require decision making across numerous different classes. Multi-class classification has several applications, such as face recognition, text recognition and medical diagnostics. The objective of this article is to analyze an adapted method of Stacking in multi-class problems, which combines ensembles within the ensemble itself. For this purpose, a training similar to Stacking was used, but with three levels, where the final decision-maker (level 2) performs its training by combining outputs from the tree-based pair of meta-classifiers (level 1) from Bayesian families. These are in turn trained by pairs of base classifiers (level 0) of the same family. This strategy seeks to promote diversity among the ensembles forming the meta-classifier level 2. Three performance measures were used: (1) accuracy, (2) area under the ROC curve, and (3) time for three factors: (a) datasets, (b) experiments and (c) levels. To compare the factors, ANOVA three-way test was executed for each performance measure, considering 5 datasets by 25 experiments by 3 levels. A triple interaction between factors was observed only in time. The accuracy and area under the ROC curve presented similar results, showing a double interaction between level and experiment, as well as for the dataset factor. It was concluded that level 2 had an average performance above the other levels and that the proposed method is especially efficient for multi-class problems when compared to binary problems.

Keywords: stacking, multi-layers, ensemble, multi-class

Procedia PDF Downloads 196
399 Training in Psychology in Brazil: Reflections on the Role of Early Supervised Internships in Undergraduate Courses

Authors: Ana Paula Melchiors Stahlschmidt, Cristina Py de Pinto Gomes Mairesse

Abstract:

This paper presents observations on the early supervised internships in Psychology, currently called basic internships in Brazil, and its importance in professional training. The work is an experience report and focuses on the Professional training, illustrated by the reality of a Brazilian institution, used as a case study. It was developed from the authors' experience as academic supervisors of this kind of practice throughout this undergraduate course, combined with aspects investigated in the post-doctoral research of one of them. Theoretical references on the subject and related national legislation are analyzed, as well as reports of students who experienced at least one semester of this type of practice, articulated to the observations of the authors. The results demonstrate the importance of the early supervised internships as a way of creating opportunities for the students of a first contact with the professional reality and the practice of psychologists in different fields of insertion, preparing them for further experiments that require more involvement in activities of training and practices in Psychology.

Keywords: training of psychologists, internships in psychology, supervised internships, combination of theory and practice

Procedia PDF Downloads 370
398 Random Subspace Ensemble of CMAC Classifiers

Authors: Somaiyeh Dehghan, Mohammad Reza Kheirkhahan Haghighi

Abstract:

The rapid growth of domains that have data with a large number of features, while the number of samples is limited has caused difficulty in constructing strong classifiers. To reduce the dimensionality of the feature space becomes an essential step in classification task. Random subspace method (or attribute bagging) is an ensemble classifier that consists of several classifiers that each base learner in ensemble has subset of features. In the present paper, we introduce Random Subspace Ensemble of CMAC neural network (RSE-CMAC), each of which has training with subset of features. Then we use this model for classification task. For evaluation performance of our model, we compare it with bagging algorithm on 36 UCI datasets. The results reveal that the new model has better performance.

Keywords: classification, random subspace, ensemble, CMAC neural network

Procedia PDF Downloads 250
397 A Supervised Approach for Detection of Singleton Spam Reviews

Authors: Atefeh Heydari, Mohammadali Tavakoli, Naomie Salim

Abstract:

In recent years, we have witnessed that online reviews are the most important source of customers’ opinion. They are progressively more used by individuals and organisations to make purchase and business decisions. Unfortunately, for the reason of profit or fame, frauds produce deceptive reviews to hoodwink potential customers. Their activities mislead not only potential customers to make appropriate purchasing decisions and organisations to reshape their business, but also opinion mining techniques by preventing them from reaching accurate results. Spam reviews could be divided into two main groups, i.e. multiple and singleton spam reviews. Detecting a singleton spam review that is the only review written by a user ID is extremely challenging due to lack of clue for detection purposes. Singleton spam reviews are very harmful and various features and proofs used in multiple spam reviews detection are not applicable in this case. Current research aims to propose a novel supervised technique to detect singleton spam reviews. To achieve this, various features are proposed in this study and are to be combined with the most appropriate features extracted from literature and employed in a classifier. In order to compare the performance of different classifiers, SVM and naive Bayes classification algorithms were used for model building. The results revealed that SVM was more accurate than naive Bayes and our proposed technique is capable to detect singleton spam reviews effectively.

Keywords: classification algorithms, Naïve Bayes, opinion review spam detection, singleton review spam detection, support vector machine

Procedia PDF Downloads 222
396 Performance Evaluation of Contemporary Classifiers for Automatic Detection of Epileptic EEG

Authors: K. E. Ch. Vidyasagar, M. Moghavvemi, T. S. S. T. Prabhat

Abstract:

Epilepsy is a global problem, and with seizures eluding even the smartest of diagnoses a requirement for automatic detection of the same using electroencephalogram (EEG) would have a huge impact in diagnosis of the disorder. Among a multitude of methods for automatic epilepsy detection, one should find the best method out, based on accuracy, for classification. This paper reasons out, and rationalizes, the best methods for classification. Accuracy is based on the classifier, and thus this paper discusses classifiers like quadratic discriminant analysis (QDA), classification and regression tree (CART), support vector machine (SVM), naive Bayes classifier (NBC), linear discriminant analysis (LDA), K-nearest neighbor (KNN) and artificial neural networks (ANN). Results show that ANN is the most accurate of all the above stated classifiers with 97.7% accuracy, 97.25% specificity and 98.28% sensitivity in its merit. This is followed closely by SVM with 1% variation in result. These results would certainly help researchers choose the best classifier for detection of epilepsy.

Keywords: classification, seizure, KNN, SVM, LDA, ANN, epilepsy

Procedia PDF Downloads 424
395 A Study on the Acquisition of Chinese Classifiers by Vietnamese Learners

Authors: Quoc Hung Le Pham

Abstract:

In the field of language study, classifier is an interesting research feature. In the world’s languages, some languages have classifier system, some do not. Mandarin Chinese and Vietnamese languages are a rich classifier system, however, because of the language system, the cognitive, cultural differences, so that the syntactic structure of classifier of them also dissimilar. When using Mandarin Chinese classifiers must collocate with nouns or verbs, in the lexical category it is not like nouns or verbs, belong to the open class. But some scholars believe that Mandarin Chinese measure words are similar to English and other Indo European languages. The word hanging on the structure and word formation (suffix), is a closed class. Compared to other languages, such as Chinese, Vietnamese, Thai and other Asian languages are still belonging to the classifier language’s second type, this type of language is classifier, it is in the majority of quantity must exist, and following deictic, anaphoric or quantity appearing together, not separation between its modified noun, also known as numeral classifier language. Main syntactic structure of Chinese classifiers are as follows: ‘quantity+measure+noun’, ‘pronoun+measure+noun’, ‘pronoun+quantity+measure+noun’, ‘prefix+quantity+measure +noun’, ‘quantity +adjective + measure +noun’, ‘ quantity (above 10 whole number), + duo (多)measure +noun’, ‘ quantity (around 10) + measure + duo (多) +noun’. Main syntactic structure of Vietnamese classifiers are: ‘quantity+measure+noun’, ‘ measure+noun+pronoun’, ‘quantity+measure+noun+pronoun’, ‘measure+noun+prefix+ quantity’, ‘quantity+measure+noun+adjective', ‘duo (多) +quanlity+measure+noun’, ‘quantity+measure+adjective+pronoun (quantity word could not be 1)’, ‘measure+adjective+pronoun’, ‘measure+pronoun’. In daily life, classifiers are commonly used, if Chinese learners failed to standardize this using catergory, because the negative impact might occur on their verbal communication. The richness of the Chinese classifier system contributes to the complexity in the study of the system by foreign learners, especially in the inter language of Vietnamese learners. As above mentioned, Vietnamese language also has a rich system of classifiers, however, the basic structure order of two languages are similar but both still have differences. These similarities and dissimilarities between Chinese and Vietnamese classifier systems contribute significantly to the common errors made by Vietnamese students while they acquire Chinese, which are distinct from the errors made by students from the other language background. This article from a comparative perspective of language, has an orientation towards Chinese and Vietnamese languages commonly used in classifiers semantics and structural form two aspects. This comparative study aims to identity Vietnamese students while learning Chinese classifiers may face some negative transference of mother language, beside that through the analysis of the classifiers questionnaire, find out the causes and patterns of the errors they made. As the preliminary analysis shows, Vietnamese students while learning Chinese classifiers made some errors such as: overuse classifier ‘ge’(个); misuse the other classifiers ‘*yi zhang ri ji’(yi pian ri ji), ‘*yi zuo fang zi’(yi jian fang zi), ‘*si zhang jin pai’(si mei jin pai); homonym words ‘dui, shuang, fu, tao’ (对、双、副、套), ‘ke, li’ (颗、粒).

Keywords: acquisition, classifiers, negative transfer, Vietnamse learners

Procedia PDF Downloads 358
394 The Use of Classifiers in Image Analysis of Oil Wells Profiling Process and the Automatic Identification of Events

Authors: Jaqueline Maria Ribeiro Vieira

Abstract:

Different strategies and tools are available at the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be possible and convenient with small images and few data, it may become difficult and suitable to errors when big databases of images must be treated. While the patterns may differ among the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). Previously we developed and proposed a novel strategy capable of detecting patterns at borehole images that may point to regions that have tension and breakout characteristics, based on segmented images. In this work we propose the inclusion of data-mining classification strategies in order to create a knowledge database of the segmented curves. These classifiers allow that, after some time using and manually pointing parts of borehole images that correspond to tension regions and breakout areas, the system will indicate and suggest automatically new candidate regions, with higher accuracy. We suggest the use of different classifiers methods, in order to achieve different knowledge data set configurations.

Keywords: image segmentation, oil well visualization, classifiers, data-mining, visual computer

Procedia PDF Downloads 229
393 The Effect of Feature Selection on Pattern Classification

Authors: Chih-Fong Tsai, Ya-Han Hu

Abstract:

The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) making the classifier perform better than the one without feature selection. Since there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies consider examining the effect of performing different feature selection algorithms on the classification performances by different classifiers over different types of datasets. In this paper, two widely used algorithms, which are the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. On the other hand, three well-known classifiers are constructed, which are the CART decision tree (DT), multi-layer perceptron (MLP) neural network, and support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.

Keywords: data mining, feature selection, pattern classification, dimensionality reduction

Procedia PDF Downloads 558
392 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction

Procedia PDF Downloads 264
391 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.

Keywords: artificial neural networks, breast cancer, classifiers, cervical cancer, f-score, machine learning, precision, recall

Procedia PDF Downloads 209
390 Contextual Sentiment Analysis with Untrained Annotators

Authors: Lucas A. Silva, Carla R. Aguiar

Abstract:

This work presents a proposal to perform contextual sentiment analysis using a supervised learning algorithm and disregarding the extensive training of annotators. To achieve this goal, a web platform was developed to perform the entire procedure outlined in this paper. The main contribution of the pipeline described in this article is to simplify and automate the annotation process through a system of analysis of congruence between the notes. This ensured satisfactory results even without using specialized annotators in the context of the research, avoiding the generation of biased training data for the classifiers. For this, a case study was conducted in a blog of entrepreneurship. The experimental results were consistent with the literature related annotation using formalized process with experts.

Keywords: sentiment analysis, untrained annotators, naive bayes, entrepreneurship, contextualized classifier

Procedia PDF Downloads 306
389 Experiments on Weakly-Supervised Learning on Imperfect Data

Authors: Yan Cheng, Yijun Shao, James Rudolph, Charlene R. Weir, Beth Sahlmann, Qing Zeng-Treitler

Abstract:

Supervised predictive models require labeled data for training purposes. Complete and accurate labeled data, i.e., a ‘gold standard’, is not always available, and imperfectly labeled data may need to serve as an alternative. An important question is if the accuracy of the labeled data creates a performance ceiling for the trained model. In this study, we trained several models to recognize the presence of delirium in clinical documents using data with annotations that are not completely accurate (i.e., weakly-supervised learning). In the external evaluation, the support vector machine model with a linear kernel performed best, achieving an area under the curve of 89.3% and accuracy of 88%, surpassing the 80% accuracy of the training sample. We then generated a set of simulated data and carried out a series of experiments which demonstrated that models trained on imperfect data can (but do not always) outperform the accuracy of the training data, e.g., the area under the curve for some models is higher than 80% when trained on the data with an error rate of 40%. Our experiments also showed that the error resistance of linear modeling is associated with larger sample size, error type, and linearity of the data (all p-values < 0.001). In conclusion, this study sheds light on the usefulness of imperfect data in clinical research via weakly-supervised learning.

Keywords: weakly-supervised learning, support vector machine, prediction, delirium, simulation

Procedia PDF Downloads 98
388 Comparing the Efficacy of Minimally Supervised Home-Based and Closely Supervised Gym Based Exercise Programs on Weight Reduction and Insulin Resistance after Bariatric Surgery

Authors: Haleh Dadgostar, Sara Kaviani, Hanieh Adib, Ali Mazaherinezhad, Masoud Solaymani-Dodaran, Fahimeh Soheilipour, Abdolreza Pazouki

Abstract:

Background and Objectives: Effectiveness of various exercise protocols in weight reduction after bariatric surgery has not been sufficiently explored in the literature. We compared the effect of minimally supervised home-based and closely supervised Gym based exercise programs on weight reduction and insulin resistance after bariatric surgery. Methods: Women undergoing gastric bypass surgery were invited to participate in an exercise program and were randomly allocated into two groups. They were either offered a minimally supervised home-based (MSHB) or closely supervised Gym-based (CSGB) exercise program. The CSGB protocol constitute two sessions per week of training under ACSM guidelines. In the MSHB protocol participants received a notebook containing a list of recommended aerobic and resistance exercises, a log to record their activity and a schedule of follow up phone calls and clinic visits. Both groups received a pedometer. We measured their weight, BMI, lipid profile, FBS, and insulin level at the baseline and after 20 weeks of exercise and were compared at the end of the study. Results: A total of 80 patients completed our study (MSHB=38 and CSGB=42). The baseline comparison showed that the two groups are similar. Using the ANCOVA method of analysis the mean change in BMI (covariate: BMI at the beginning of the study) was slightly better in CSGB compared with the MSHB (between-group mean difference: 3.33 (95%CI 4.718 to 1.943, F: 22.844 p < 0.001)). Conclusion: Our results showed that both MSHB and CSGB exercise methods are somewhat equally effective in improvement of studied factors in the two groups. With considerably lower costs of Minimally Supervised Home Based exercise programs, these methods should be considered when adequate funding are not available.

Keywords: postoperative exercise, insulin resistance, bariatric surgery, morbid obesity

Procedia PDF Downloads 211
387 A Framework for Chinese Domain-Specific Distant Supervised Named Entity Recognition

Authors: Qin Long, Li Xiaoge

Abstract:

The Knowledge Graphs have now become a new form of knowledge representation. However, there is no consensus in regard to a plausible and definition of entities and relationships in the domain-specific knowledge graph. Further, in conjunction with several limitations and deficiencies, various domain-specific entities and relationships recognition approaches are far from perfect. Specifically, named entity recognition in Chinese domain is a critical task for the natural language process applications. However, a bottleneck problem with Chinese named entity recognition in new domains is the lack of annotated data. To address this challenge, a domain distant supervised named entity recognition framework is proposed. The framework is divided into two stages: first, the distant supervised corpus is generated based on the entity linking model of graph attention neural network; secondly, the generated corpus is trained as the input of the distant supervised named entity recognition model to train to obtain named entities. The link model is verified in the ccks2019 entity link corpus, and the F1 value is 2% higher than that of the benchmark method. The re-pre-trained BERT language model is added to the benchmark method, and the results show that it is more suitable for distant supervised named entity recognition tasks. Finally, it is applied in the computer field, and the results show that this framework can obtain domain named entities.

Keywords: distant named entity recognition, entity linking, knowledge graph, graph attention neural network

Procedia PDF Downloads 23