Search results for: classification and regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5031

Search results for: classification and regression

4881 Detection and Classification of Mammogram Images Using Principle Component Analysis and Lazy Classifiers

Authors: Rajkumar Kolangarakandy

Abstract:

Feature extraction and selection is the primary part of any mammogram classification algorithms. The choice of feature, attribute or measurements have an important influence in any classification system. Discrete Wavelet Transformation (DWT) coefficients are one of the prominent features for representing images in frequency domain. The features obtained after the decomposition of the mammogram images using wavelet transformations have higher dimension. Even though the features are higher in dimension, they were highly correlated and redundant in nature. The dimensionality reduction techniques play an important role in selecting the optimum number of features from the higher dimension data, which are highly correlated. PCA is a mathematical tool that reduces the dimensionality of the data while retaining most of the variation in the dataset. In this paper, a multilevel classification of mammogram images using reduced discrete wavelet transformation coefficients and lazy classifiers is proposed. The classification is accomplished in two different levels. In the first level, mammogram ROIs extracted from the dataset is classified as normal and abnormal types. In the second level, all the abnormal mammogram ROIs is classified into benign and malignant too. A further classification is also accomplished based on the variation in structure and intensity distribution of the images in the dataset. The Lazy classifiers called Kstar, IBL and LWL are used for classification. The classification results obtained with the reduced feature set is highly promising and the result is also compared with the performance obtained without dimension reduction.

Keywords: PCA, wavelet transformation, lazy classifiers, Kstar, IBL, LWL

Procedia PDF Downloads 316
4880 Fuzzy Logic Classification Approach for Exponential Data Set in Health Care System for Predication of Future Data

Authors: Manish Pandey, Gurinderjit Kaur, Meenu Talwar, Sachin Chauhan, Jagbir Gill

Abstract:

Health-care management systems are a unit of nice connection as a result of the supply a straightforward and fast management of all aspects relating to a patient, not essentially medical. What is more, there are unit additional and additional cases of pathologies during which diagnosing and treatment may be solely allotted by victimization medical imaging techniques. With associate ever-increasing prevalence, medical pictures area unit directly acquired in or regenerate into digital type, for his or her storage additionally as sequent retrieval and process. Data Mining is the process of extracting information from large data sets through using algorithms and Techniques drawn from the field of Statistics, Machine Learning and Data Base Management Systems. Forecasting may be a prediction of what's going to occur within the future, associated it's an unsure method. Owing to the uncertainty, the accuracy of a forecast is as vital because the outcome foretold by foretelling the freelance variables. A forecast management should be wont to establish if the accuracy of the forecast is within satisfactory limits. Fuzzy regression strategies have normally been wont to develop shopper preferences models that correlate the engineering characteristics with shopper preferences relating to a replacement product; the patron preference models offer a platform, wherever by product developers will decide the engineering characteristics so as to satisfy shopper preferences before developing the merchandise. Recent analysis shows that these fuzzy regression strategies area units normally will not to model client preferences. We tend to propose a Testing the strength of Exponential Regression Model over regression toward the mean Model.

Keywords: health-care management systems, fuzzy regression, data mining, forecasting, fuzzy membership function

Procedia PDF Downloads 254
4879 Characterization, Classification and Fertility Capability Classification of Three Rice Zones of Ebonyi State, Southeastern Nigeria

Authors: Sunday Nathaniel Obasi, Chiamak Chinasa Obasi

Abstract:

Soil characterization and classification provide the basic information necessary to create a functional evaluation and soil classification schemes. Fertility capability classification (FCC) on the other hand is a technical system that groups the soils according to kinds of problems they present for management of soil physical and chemical properties. This research was carried out in Ebonyi state, southeastern Nigeria, which is an agrarian state and a leading rice producing part of southeastern Nigeria. In order to maximize the soil and enhance the productivity of rice in Ebonyi soils, soil classification, and fertility classification information need to be supplied. The state was grouped into three locations according to their agricultural zones namely; Ebonyi north, Ebonyi central and Ebonyi south representing Abakaliki, Ikwo and Ivo locations respectively. Major rice growing areas of the soils were located and two profile pits were sunk in each of the studied zones from which soils were characterized, classified and fertility capability classification (FCC) developed. Soil classification was done using United State Department of Agriculture (USDA) Soil Taxonomy and correlated with World Reference Base for soil resources. Results obtained classified Abakaliki 1 and Abakaliki 2 as Typic Fluvaquents (Ochric Fluvisols). Ikwo 1 was classified as Vertic Eutrudepts (Eutric Vertisols) while Ikwo 2 was classified as Typic Eutrudepts (Eutric Cambisols). Ivo 1 and Ivo 2 were both classified as Aquic Eutrudepts (Gleyic Leptosols). Fertility capability classification (FCC) revealed that all studied soils had mostly loamy topsoils and subsoils except Ikwo 1 with clayey topsoil. Limitations encountered in the studied soils include; dryness (d), low ECEC (e), low nutrient capital reserve (k) and water logging/ anaerobic condition (gley). Thus, FCC classifications were Ldek for Abakaliki 1 and 2, Ckv for Ikwo 1, LCk for Ikwo 2 while Ivo 1 and 2 were Legk and Lgk respectively.

Keywords: soil classification, soil fertility, limitations, modifiers, Southeastern Nigeria

Procedia PDF Downloads 110
4878 Land Cover Classification Using Sentinel-2 Image Data and Random Forest Algorithm

Authors: Thanh Noi Phan, Martin Kappas, Jan Degener

Abstract:

The currently launched Sentinel 2 (S2) satellite (June, 2015) bring a great potential and opportunities for land use/cover map applications, due to its fine spatial resolution multispectral as well as high temporal resolutions. So far, there are handful studies using S2 real data for land cover classification. Especially in northern Vietnam, to our best knowledge, there exist no studies using S2 data for land cover map application. The aim of this study is to provide the preliminary result of land cover classification using Sentinel -2 data with a rising state – of – art classifier, Random Forest. A case study with heterogeneous land use/cover in the eastern of Hanoi Capital – Vietnam was chosen for this study. All 10 spectral bands of 10 and 20 m pixel size of S2 images were used, the 10 m bands were resampled to 20 m. Among several classified algorithms, supervised Random Forest classifier (RF) was applied because it was reported as one of the most accuracy methods of satellite image classification. The results showed that the red-edge and shortwave infrared (SWIR) bands play an important role in land cover classified results. A very high overall accuracy above 90% of classification results was achieved.

Keywords: classify algorithm, classification, land cover, random forest, sentinel 2, Vietnam

Procedia PDF Downloads 351
4877 Classification of Cochannel Signals Using Cyclostationary Signal Processing and Deep Learning

Authors: Bryan Crompton, Daniel Giger, Tanay Mehta, Apurva Mody

Abstract:

The task of classifying radio frequency (RF) signals has seen recent success in employing deep neural network models. In this work, we present a combined signal processing and machine learning approach to signal classification for cochannel anomalous signals. The power spectral density and cyclostationary signal processing features of a captured signal are computed and fed into a neural net to produce a classification decision. Our combined signal preprocessing and machine learning approach allows for simpler neural networks with fast training times and small computational resource requirements for inference with longer preprocessing time.

Keywords: signal processing, machine learning, cyclostationary signal processing, signal classification

Procedia PDF Downloads 75
4876 Using Data Mining Technique for Scholarship Disbursement

Authors: J. K. Alhassan, S. A. Lawal

Abstract:

This work is on decision tree-based classification for the disbursement of scholarship. Tree-based data mining classification technique is used in other to determine the generic rule to be used to disburse the scholarship. The system based on the defined rules from the tree is able to determine the class (status) to which an applicant shall belong whether Granted or Not Granted. The applicants that fall to the class of granted denote a successful acquirement of scholarship while those in not granted class are unsuccessful in the scheme. An algorithm that can be used to classify the applicants based on the rules from tree-based classification was also developed. The tree-based classification is adopted because of its efficiency, effectiveness, and easy to comprehend features. The system was tested with the data of National Information Technology Development Agency (NITDA) Abuja, a Parastatal of Federal Ministry of Communication Technology that is mandated to develop and regulate information technology in Nigeria. The system was found working according to the specification. It is therefore recommended for all scholarship disbursement organizations.

Keywords: classification, data mining, decision tree, scholarship

Procedia PDF Downloads 346
4875 Synthetic Aperture Radar Remote Sensing Classification Using the Bag of Visual Words Model to Land Cover Studies

Authors: Reza Mohammadi, Mahmod R. Sahebi, Mehrnoosh Omati, Milad Vahidi

Abstract:

Classification of high resolution polarimetric Synthetic Aperture Radar (PolSAR) images plays an important role in land cover and land use management. Recently, classification algorithms based on Bag of Visual Words (BOVW) model have attracted significant interest among scholars and researchers in and out of the field of remote sensing. In this paper, BOVW model with pixel based low-level features has been implemented to classify a subset of San Francisco bay PolSAR image, acquired by RADARSAR 2 in C-band. We have used segment-based decision-making strategy and compared the result with the result of traditional Support Vector Machine (SVM) classifier. 90.95% overall accuracy of the classification with the proposed algorithm has shown that the proposed algorithm is comparable with the state-of-the-art methods. In addition to increase in the classification accuracy, the proposed method has decreased undesirable speckle effect of SAR images.

Keywords: Bag of Visual Words (BOVW), classification, feature extraction, land cover management, Polarimetric Synthetic Aperture Radar (PolSAR)

Procedia PDF Downloads 181
4874 Novel Inference Algorithm for Gaussian Process Classification Model with Multiclass and Its Application to Human Action Classification

Authors: Wanhyun Cho, Soonja Kang, Sangkyoon Kim, Soonyoung Park

Abstract:

In this paper, we propose a novel inference algorithm for the multi-class Gaussian process classification model that can be used in the field of human behavior recognition. This algorithm can drive simultaneously both a posterior distribution of a latent function and estimators of hyper-parameters in a Gaussian process classification model with multi-class. Our algorithm is based on the Laplace approximation (LA) technique and variational EM framework. This is performed in two steps: called expectation and maximization steps. First, in the expectation step, using the Bayesian formula and LA technique, we derive approximately the posterior distribution of the latent function indicating the possibility that each observation belongs to a certain class in the Gaussian process classification model. Second, in the maximization step, using a derived posterior distribution of latent function, we compute the maximum likelihood estimator for hyper-parameters of a covariance matrix necessary to define prior distribution for latent function. These two steps iteratively repeat until a convergence condition satisfies. Moreover, we apply the proposed algorithm with human action classification problem using a public database, namely, the KTH human action data set. Experimental results reveal that the proposed algorithm shows good performance on this data set.

Keywords: bayesian rule, gaussian process classification model with multiclass, gaussian process prior, human action classification, laplace approximation, variational EM algorithm

Procedia PDF Downloads 308
4873 An Application to Predict the Best Study Path for Information Technology Students in Learning Institutes

Authors: L. S. Chathurika

Abstract:

Early prediction of student performance is an important factor to be gained academic excellence. Whatever the study stream in secondary education, students lay the foundation for higher studies during the first year of their degree or diploma program in Sri Lanka. The information technology (IT) field has certain improvements in the education domain by selecting specialization areas to show the talents and skills of students. These specializations can be software engineering, network administration, database administration, multimedia design, etc. After completing the first-year, students attempt to select the best path by considering numerous factors. The purpose of this experiment is to predict the best study path using machine learning algorithms. Five classification algorithms: decision tree, support vector machine, artificial neural network, Naïve Bayes, and logistic regression are selected and tested. The support vector machine obtained the highest accuracy, 82.4%. Then affecting features are recognized to select the best study path.

Keywords: algorithm, classification, evaluation, features, testing, training

Procedia PDF Downloads 101
4872 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities

Authors: Retius Chifurira

Abstract:

Logistic regression model is the most used regression model to predict meteorological drought probabilities. When the dependent variable is extreme, the logistic model fails to adequately capture drought probabilities. In order to adequately predict drought probabilities, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. The method maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and the GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models are assessed using the goodness-of-fit tests, namely; relative root mean square error (RRMSE) and relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.

Keywords: generalized extreme value distribution, general linear model, mean annual rainfall, meteorological drought probabilities

Procedia PDF Downloads 163
4871 Polarimetric Synthetic Aperture Radar Data Classification Using Support Vector Machine and Mahalanobis Distance

Authors: Najoua El Hajjaji El Idrissi, Necip Gokhan Kasapoglu

Abstract:

Polarimetric Synthetic Aperture Radar-based imaging is a powerful technique used for earth observation and classification of surfaces. Forest evolution has been one of the vital areas of attention for the remote sensing experts. The information about forest areas can be achieved by remote sensing, whether by using active radars or optical instruments. However, due to several weather constraints, such as cloud cover, limited information can be recovered using optical data and for that reason, Polarimetric Synthetic Aperture Radar (PolSAR) is used as a powerful tool for forestry inventory. In this [14paper, we applied support vector machine (SVM) and Mahalanobis distance to the fully polarimetric AIRSAR P, L, C-bands data from the Nezer forest areas, the classification is based in the separation of different tree ages. The classification results were evaluated and the results show that the SVM performs better than the Mahalanobis distance and SVM achieves approximately 75% accuracy. This result proves that SVM classification can be used as a useful method to evaluate fully polarimetric SAR data with sufficient value of accuracy.

Keywords: classification, synthetic aperture radar, SAR polarimetry, support vector machine, mahalanobis distance

Procedia PDF Downloads 106
4870 Classification of Land Cover Usage from Satellite Images Using Deep Learning Algorithms

Authors: Shaik Ayesha Fathima, Shaik Noor Jahan, Duvvada Rajeswara Rao

Abstract:

Earth's environment and its evolution can be seen through satellite images in near real-time. Through satellite imagery, remote sensing data provide crucial information that can be used for a variety of applications, including image fusion, change detection, land cover classification, agriculture, mining, disaster mitigation, and monitoring climate change. The objective of this project is to propose a method for classifying satellite images according to multiple predefined land cover classes. The proposed approach involves collecting data in image format. The data is then pre-processed using data pre-processing techniques. The processed data is fed into the proposed algorithm and the obtained result is analyzed. Some of the algorithms used in satellite imagery classification are U-Net, Random Forest, Deep Labv3, CNN, ANN, Resnet etc. In this project, we are using the DeepLabv3 (Atrous convolution) algorithm for land cover classification. The dataset used is the deep globe land cover classification dataset. DeepLabv3 is a semantic segmentation system that uses atrous convolution to capture multi-scale context by adopting multiple atrous rates in cascade or in parallel to determine the scale of segments.

Keywords: area calculation, atrous convolution, deep globe land cover classification, deepLabv3, land cover classification, resnet 50

Procedia PDF Downloads 120
4869 Classification of Opaque Exterior Walls of Buildings from a Sustainable Point of View

Authors: Michelle Sánchez de León Brajkovich, Nuria Martí Audi

Abstract:

The envelope is one of the most important elements when one analyzes the operation of the building in terms of sustainability. Taking this into consideration, this research focuses on setting a classification system of the envelopes opaque systems, crossing the knowledge and parameters of construction systems with requirements in terms of sustainability that they may have, to have a better understanding of how these systems work with respect to their sustainable contribution to the building. Therefore, this paper evaluates the importance of the envelope design on the building sustainability. It analyses the parameters that make the construction systems behave differently in terms of sustainability. At the same time it explains the classification process generated from this analysis that results in a classification where all opaque vertical envelope construction systems enter.

Keywords: sustainable, exterior walls, envelope, facades, construction systems, energy efficiency

Procedia PDF Downloads 540
4868 Multi-Classification Deep Learning Model for Diagnosing Different Chest Diseases

Authors: Bandhan Dey, Muhsina Bintoon Yiasha, Gulam Sulaman Choudhury

Abstract:

Chest disease is one of the most problematic ailments in our regular life. There are many known chest diseases out there. Diagnosing them correctly plays a vital role in the process of treatment. There are many methods available explicitly developed for different chest diseases. But the most common approach for diagnosing these diseases is through X-ray. In this paper, we proposed a multi-classification deep learning model for diagnosing COVID-19, lung cancer, pneumonia, tuberculosis, and atelectasis from chest X-rays. In the present work, we used the transfer learning method for better accuracy and fast training phase. The performance of three architectures is considered: InceptionV3, VGG-16, and VGG-19. We evaluated these deep learning architectures using public digital chest x-ray datasets with six classes (i.e., COVID-19, lung cancer, pneumonia, tuberculosis, atelectasis, and normal). The experiments are conducted on six-classification, and we found that VGG16 outperforms other proposed models with an accuracy of 95%.

Keywords: deep learning, image classification, X-ray images, Tensorflow, Keras, chest diseases, convolutional neural networks, multi-classification

Procedia PDF Downloads 61
4867 Experimental Study of Hyperparameter Tuning a Deep Learning Convolutional Recurrent Network for Text Classification

Authors: Bharatendra Rai

Abstract:

The sequence of words in text data has long-term dependencies and is known to suffer from vanishing gradient problems when developing deep learning models. Although recurrent networks such as long short-term memory networks help to overcome this problem, achieving high text classification performance is a challenging problem. Convolutional recurrent networks that combine the advantages of long short-term memory networks and convolutional neural networks can be useful for text classification performance improvements. However, arriving at suitable hyperparameter values for convolutional recurrent networks is still a challenging task where fitting a model requires significant computing resources. This paper illustrates the advantages of using convolutional recurrent networks for text classification with the help of statistically planned computer experiments for hyperparameter tuning.

Keywords: long short-term memory networks, convolutional recurrent networks, text classification, hyperparameter tuning, Tukey honest significant differences

Procedia PDF Downloads 92
4866 3D Receiver Operator Characteristic Histogram

Authors: Xiaoli Zhang, Xiongfei Li, Yuncong Feng

Abstract:

ROC curves, as a widely used evaluating tool in machine learning field, are the tradeoff of true positive rate and negative rate. However, they are blamed for ignoring some vital information in the evaluation process, such as the amount of information about the target that each instance carries, predicted score given by each classification model to each instance. Hence, in this paper, a new classification performance method is proposed by extending the Receiver Operator Characteristic (ROC) curves to 3D space, which is denoted as 3D ROC Histogram. In the histogram, the

Keywords: classification, performance evaluation, receiver operating characteristic histogram, hardness prediction

Procedia PDF Downloads 290
4865 The Extended Skew Gaussian Process for Regression

Authors: M. T. Alodat

Abstract:

In this paper, we propose a generalization to the Gaussian process regression(GPR) model called the extended skew Gaussian process for regression(ESGPr) model. The ESGPR model works better than the GPR model when the errors are skewed. We derive the predictive distribution for the ESGPR model at a new input. Also we apply the ESGPR model to FOREX data and we find that it fits the Forex data better than the GPR model.

Keywords: extended skew normal distribution, Gaussian process for regression, predictive distribution, ESGPr model

Procedia PDF Downloads 525
4864 Combined Odd Pair Autoregressive Coefficients for Epileptic EEG Signals Classification by Radial Basis Function Neural Network

Authors: Boukari Nassim

Abstract:

This paper describes the use of odd pair autoregressive coefficients (Yule _Walker and Burg) for the feature extraction of electroencephalogram (EEG) signals. In the classification: the radial basis function neural network neural network (RBFNN) is employed. The RBFNN is described by his architecture and his characteristics: as the RBF is defined by the spread which is modified for improving the results of the classification. Five types of EEG signals are defined for this work: Set A, Set B for normal signals, Set C, Set D for interictal signals, set E for ictal signal (we can found that in Bonn university). In outputs, two classes are given (AC, AD, AE, BC, BD, BE, CE, DE), the best accuracy is calculated at 99% for the combined odd pair autoregressive coefficients. Our method is very effective for the diagnosis of epileptic EEG signals.

Keywords: epilepsy, EEG signals classification, combined odd pair autoregressive coefficients, radial basis function neural network

Procedia PDF Downloads 324
4863 Integrated Nested Laplace Approximations For Quantile Regression

Authors: Kajingulu Malandala, Ranganai Edmore

Abstract:

The asymmetric Laplace distribution (ADL) is commonly used as the likelihood function of the Bayesian quantile regression, and it offers different families of likelihood method for quantile regression. Notwithstanding their popularity and practicality, ADL is not smooth and thus making it difficult to maximize its likelihood. Furthermore, Bayesian inference is time consuming and the selection of likelihood may mislead the inference, as the Bayes theorem does not automatically establish the posterior inference. Furthermore, ADL does not account for greater skewness and Kurtosis. This paper develops a new aspect of quantile regression approach for count data based on inverse of the cumulative density function of the Poisson, binomial and Delaporte distributions using the integrated nested Laplace Approximations. Our result validates the benefit of using the integrated nested Laplace Approximations and support the approach for count data.

Keywords: quantile regression, Delaporte distribution, count data, integrated nested Laplace approximation

Procedia PDF Downloads 137
4862 The Use of Geographically Weighted Regression for Deforestation Analysis: Case Study in Brazilian Cerrado

Authors: Ana Paula Camelo, Keila Sanches

Abstract:

The Geographically Weighted Regression (GWR) was proposed in geography literature to allow relationship in a regression model to vary over space. In Brazil, the agricultural exploitation of the Cerrado Biome is the main cause of deforestation. In this study, we propose a methodology using geostatistical methods to characterize the spatial dependence of deforestation in the Cerrado based on agricultural production indicators. Therefore, it was used the set of exploratory spatial data analysis tools (ESDA) and confirmatory analysis using GWR. It was made the calibration a non-spatial model, evaluation the nature of the regression curve, election of the variables by stepwise process and multicollinearity analysis. After the evaluation of the non-spatial model was processed the spatial-regression model, statistic evaluation of the intercept and verification of its effect on calibration. In an analysis of Spearman’s correlation the results between deforestation and livestock was +0.783 and with soybeans +0.405. The model presented R²=0.936 and showed a strong spatial dependence of agricultural activity of soybeans associated to maize and cotton crops. The GWR is a very effective tool presenting results closer to the reality of deforestation in the Cerrado when compared with other analysis.

Keywords: deforestation, geographically weighted regression, land use, spatial analysis

Procedia PDF Downloads 334
4861 Automatic Classification Using Dynamic Fuzzy C Means Algorithm and Mathematical Morphology: Application in 3D MRI Image

Authors: Abdelkhalek Bakkari

Abstract:

Image segmentation is a critical step in image processing and pattern recognition. In this paper, we proposed a new robust automatic image classification based on a dynamic fuzzy c-means algorithm and mathematical morphology. The proposed segmentation algorithm (DFCM_MM) has been applied to MR perfusion images. The obtained results show the validity and robustness of the proposed approach.

Keywords: segmentation, classification, dynamic, fuzzy c-means, MR image

Procedia PDF Downloads 446
4860 Classification of Construction Projects

Authors: M. Safa, A. Sabet, S. MacGillivray, M. Davidson, K. Kaczmarczyk, C. T. Haas, G. E. Gibson, D. Rayside

Abstract:

To address construction project requirements and specifications, scholars and practitioners need to establish a taxonomy according to a scheme that best fits their need. While existing characterization methods are continuously being improved, new ones are devised to cover project properties which have not been previously addressed. One such method, the Project Definition Rating Index (PDRI), has received limited consideration strictly as a classification scheme. Developed by the Construction Industry Institute (CII) in 1996, the PDRI has been refined over the last two decades as a method for evaluating a project's scope definition completeness during front-end planning (FEP). The main contribution of this study is a review of practical project classification methods, and a discussion of how PDRI can be used to classify projects based on their readiness in the FEP phase. The proposed model has been applied to 59 construction projects in Ontario, and the results are discussed.

Keywords: project classification, project definition rating index (PDRI), risk, project goals alignment

Procedia PDF Downloads 653
4859 New Approach to Construct Phylogenetic Tree

Authors: Ouafae Baida, Najma Hamzaoui, Maha Akbib, Abdelfettah Sedqui, Abdelouahid Lyhyaoui

Abstract:

Numerous scientific works present various methods to analyze the data for several domains, specially the comparison of classifications. In our recent work, we presented a new approach to help the user choose the best classification method from the results obtained by every method, by basing itself on the distances between the trees of classification. The result of our approach was in the form of a dendrogram contains methods as a succession of connections. This approach is much needed in phylogeny analysis. This discipline is intended to analyze the sequences of biological macro molecules for information on the evolutionary history of living beings, including their relationship. The product of phylogeny analysis is a phylogenetic tree. In this paper, we recommend the use of a new method of construction the phylogenetic tree based on comparison of different classifications obtained by different molecular genes.

Keywords: hierarchical classification, classification methods, structure of tree, genes, phylogenetic analysis

Procedia PDF Downloads 477
4858 Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique

Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan

Abstract:

In this paper, the comparison between k-Nearest Neighbor (kNN) algorithms for classifying the 3D EEG model in brain balancing is presented. The EEG signal recording was conducted on 51 healthy subjects. Development of 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, maximum PSD values were extracted as features from the model. There are three indexes for the balanced brain; index 3, index 4 and index 5. There are significant different of the EEG signals due to the brain balancing index (BBI). Alpha-α (8–13 Hz) and beta-β (13–30 Hz) were used as input signals for the classification model. The k-NN classification result is 88.46% accuracy. These results proved that k-NN can be used in order to predict the brain balancing application.

Keywords: power spectral density, 3D EEG model, brain balancing, kNN

Procedia PDF Downloads 458
4857 Weighted Rank Regression with Adaptive Penalty Function

Authors: Kang-Mo Jung

Abstract:

The use of regularization for statistical methods has become popular. The least absolute shrinkage and selection operator (LASSO) framework has become the standard tool for sparse regression. However, it is well known that the LASSO is sensitive to outliers or leverage points. We consider a new robust estimation which is composed of the weighted loss function of the pairwise difference of residuals and the adaptive penalty function regulating the tuning parameter for each variable. Rank regression is resistant to regression outliers, but not to leverage points. By adopting a weighted loss function, the proposed method is robust to leverage points of the predictor variable. Furthermore, the adaptive penalty function gives us good statistical properties in variable selection such as oracle property and consistency. We develop an efficient algorithm to compute the proposed estimator using basic functions in program R. We used an optimal tuning parameter based on the Bayesian information criterion (BIC). Numerical simulation shows that the proposed estimator is effective for analyzing real data set and contaminated data.

Keywords: adaptive penalty function, robust penalized regression, variable selection, weighted rank regression

Procedia PDF Downloads 434
4856 MapReduce Logistic Regression Algorithms with RHadoop

Authors: Byung Ho Jung, Dong Hoon Lim

Abstract:

Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. Logistic regression is used extensively in numerous disciplines, including the medical and social science fields. In this paper, we address the problem of estimating parameters in the logistic regression based on MapReduce framework with RHadoop that integrates R and Hadoop environment applicable to large scale data. There exist three learning algorithms for logistic regression, namely Gradient descent method, Cost minimization method and Newton-Rhapson's method. The Newton-Rhapson's method does not require a learning rate, while gradient descent and cost minimization methods need to manually pick a learning rate. The experimental results demonstrated that our learning algorithms using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also compared the performance of our Newton-Rhapson's method with gradient descent and cost minimization methods. The results showed that our newton's method appeared to be the most robust to all data tested.

Keywords: big data, logistic regression, MapReduce, RHadoop

Procedia PDF Downloads 249
4855 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.

Keywords: fake news detection, natural language processing, machine learning, classification techniques.

Procedia PDF Downloads 133
4854 Classifying and Predicting Efficiencies Using Interval DEA Grid Setting

Authors: Yiannis G. Smirlis

Abstract:

The classification and the prediction of efficiencies in Data Envelopment Analysis (DEA) is an important issue, especially in large scale problems or when new units frequently enter the under-assessment set. In this paper, we contribute to the subject by proposing a grid structure based on interval segmentations of the range of values for the inputs and outputs. Such intervals combined, define hyper-rectangles that partition the space of the problem. This structure, exploited by Interval DEA models and a dominance relation, acts as a DEA pre-processor, enabling the classification and prediction of efficiency scores, without applying any DEA models.

Keywords: data envelopment analysis, interval DEA, efficiency classification, efficiency prediction

Procedia PDF Downloads 144
4853 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction

Procedia PDF Downloads 315
4852 Interference among Lambsquarters and Oil Rapeseed Cultivars

Authors: Reza Siyami, Bahram Mirshekari

Abstract:

Seed and oil yield of rapeseed is considerably affected by weeds interference including mustard (Sinapis arvensis L.), lambsquarters (Chenopodium album L.) and redroot pigweed (Amaranthus retroflexus L.) throughout the East Azerbaijan province in Iran. To formulate the relationship between four independent growth variables measured in our experiment with a dependent variable, multiple regression analysis was carried out for the weed leaves number per plant (X1), green cover percentage (X2), LAI (X3) and leaf area per plant (X4) as independent variables and rapeseed oil yield as a dependent variable. The multiple regression equation is shown as follows: Seed essential oil yield (kg/ha) = 0.156 + 0.0325 (X1) + 0.0489 (X2) + 0.0415 (X3) + 0.133 (X4). Furthermore, the stepwise regression analysis was also carried out for the data obtained to test the significance of the independent variables affecting the oil yield as a dependent variable. The resulted stepwise regression equation is shown as follows: Oil yield = 4.42 + 0.0841 (X2) + 0.0801 (X3); R2 = 81.5. The stepwise regression analysis verified that the green cover percentage and LAI of weed had a marked increasing effect on the oil yield of rapeseed.

Keywords: green cover percentage, independent variable, interference, regression

Procedia PDF Downloads 390