Search results for: large margin nearest neighbor regression
3192 Feature Reduction of Nearest Neighbor Classifiers using Genetic Algorithm
Authors: M. Analoui, M. Fadavi Amiri
Abstract:
The design of a pattern classifier includes an attempt to select, among a set of possible features, a minimum subset of weakly correlated features that better discriminate the pattern classes. This is usually a difficult task in practice, normally requiring the application of heuristic knowledge about the specific problem domain. The selection and quality of the features representing each pattern have a considerable bearing on the success of subsequent pattern classification. Feature extraction is the process of deriving new features from the original features in order to reduce the cost of feature measurement, increase classifier efficiency, and allow higher classification accuracy. Many current feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and increasing classification efficiency, it does not necessarily reduce the number of features that must be measured since each new feature may be a linear combination of all of the features in the original pattern vector. In this paper a new approach is presented to feature extraction in which feature selection, feature extraction, and classifier training are performed simultaneously using a genetic algorithm. In this approach each feature value is first normalized by a linear equation, then scaled by the associated weight prior to training, testing, and classification. A knn classifier is used to evaluate each set of feature weights. The genetic algorithm optimizes a vector of feature weights, which are used to scale the individual features in the original pattern vectors in either a linear or a nonlinear fashion. By this approach, the number of features used in classifying can be finely reduced.Keywords: Feature reduction, genetic algorithm, pattern classification, nearest neighbor rule classifiers (k-NNR).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17683191 Educational Data Mining: The Case of Department of Mathematics and Computing in the Period 2009-2018
Authors: M. Sitoe, O. Zacarias
Abstract:
University education is influenced by several factors that range from the adoption of strategies to strengthen the whole process to the academic performance improvement of the students themselves. This work uses data mining techniques to develop a predictive model to identify students with a tendency to evasion and retention. To this end, a database of real students’ data from the Department of University Admission (DAU) and the Department of Mathematics and Informatics (DMI) was used. The data comprised 388 undergraduate students admitted in the years 2009 to 2014. The Weka tool was used for model building, using three different techniques, namely: K-nearest neighbor, random forest, and logistic regression. To allow for training on multiple train-test splits, a cross-validation approach was employed with a varying number of folds. To reduce bias variance and improve the performance of the models, ensemble methods of Bagging and Stacking were used. After comparing the results obtained by the three classifiers, Logistic Regression using Bagging with seven folds obtained the best performance, showing results above 90% in all evaluated metrics: accuracy, rate of true positives, and precision. Retention is the most common tendency.
Keywords: Evasion and retention, cross validation, bagging, stacking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1213190 Topological Properties of an Exponential Random Geometric Graph Process
Authors: Yilun Shang
Abstract:
In this paper we consider a one-dimensional random geometric graph process with the inter-nodal gaps evolving according to an exponential AR(1) process. The transition probability matrix and stationary distribution are derived for the Markov chains concerning connectivity and the number of components. We analyze the algorithm for hitting time regarding disconnectivity. In addition to dynamical properties, we also study topological properties for static snapshots. We obtain the degree distributions as well as asymptotic precise bounds and strong law of large numbers for connectivity threshold distance and the largest nearest neighbor distance amongst others. Both exact results and limit theorems are provided in this paper.Keywords: random geometric graph, autoregressive process, degree, connectivity, Markovian, wireless network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14583189 Designing Early Warning System: Prediction Accuracy of Currency Crisis by Using k-Nearest Neighbour Method
Authors: Nor Azuana Ramli, Mohd Tahir Ismail, Hooy Chee Wooi
Abstract:
Developing a stable early warning system (EWS) model that is capable to give an accurate prediction is a challenging task. This paper introduces k-nearest neighbour (k-NN) method which never been applied in predicting currency crisis before with the aim of increasing the prediction accuracy. The proposed k-NN performance depends on the choice of a distance that is used where in our analysis; we take the Euclidean distance and the Manhattan as a consideration. For the comparison, we employ three other methods which are logistic regression analysis (logit), back-propagation neural network (NN) and sequential minimal optimization (SMO). The analysis using datasets from 8 countries and 13 macro-economic indicators for each country shows that the proposed k-NN method with k = 4 and Manhattan distance performs better than the other methods.
Keywords: Currency crisis, k-nearest neighbour method, logit, neural network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22973188 Improved Dynamic Bayesian Networks Applied to Arabic on Line Characters Recognition
Authors: Redouane Tlemsani, Abdelkader Benyettou
Abstract:
Work is in on line Arabic character recognition and the principal motivation is to study the Arab manuscript with on line technology.
This system is a Markovian system, which one can see as like a Dynamic Bayesian Network (DBN). One of the major interests of these systems resides in the complete models training (topology and parameters) starting from training data.
Our approach is based on the dynamic Bayesian Networks formalism. The DBNs theory is a Bayesians networks generalization to the dynamic processes. Among our objective, amounts finding better parameters, which represent the links (dependences) between dynamic network variables.
In applications in pattern recognition, one will carry out the fixing of the structure, which obliges us to admit some strong assumptions (for example independence between some variables). Our application will relate to the Arabic isolated characters on line recognition using our laboratory database: NOUN. A neural tester proposed for DBN external optimization.
The DBN scores and DBN mixed are respectively 70.24% and 62.50%, which lets predict their further development; other approaches taking account time were considered and implemented until obtaining a significant recognition rate 94.79%.
Keywords: Arabic on line character recognition, dynamic Bayesian network, pattern recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17813187 Using Historical Data for Stock Prediction of a Tech Company
Authors: Sofia Stoica
Abstract:
In this paper, we use historical data to predict the stock price of a tech company. To this end, we use a dataset consisting of the stock prices over the past five years of 10 major tech companies: Adobe, Amazon, Apple, Facebook, Google, Microsoft, Netflix, Oracle, Salesforce, and Tesla. We implemented and tested three models – a linear regressor model, a k-nearest neighbor model (KNN), and a sequential neural network – and two algorithms – Multiplicative Weight Update and AdaBoost. We found that the sequential neural network performed the best, with a testing error of 0.18%. Interestingly, the linear model performed the second best with a testing error of 0.73%. These results show that using historical data is enough to obtain high accuracies, and a simple algorithm like linear regression has a performance similar to more sophisticated models while taking less time and resources to implement.
Keywords: Finance, machine learning, opening price, stock market.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6773186 Predicting Groundwater Areas Using Data Mining Techniques: Groundwater in Jordan as Case Study
Authors: Faisal Aburub, Wael Hadi
Abstract:
Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships; or to predict unseen objects to one of the predefined groups. In this paper, we aim to investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed other algorithms in terms of classification accuracy, precision and F1 evaluation measures using the datasets of groundwater areas that were collected from Jordanian Ministry of Water and Irrigation.Keywords: Classification, data mining, evaluation measures, groundwater.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25953185 Identification of Cardiac Arrhythmias using Natural Resonance Complex Frequencies
Authors: Moustafa A. Bani-Hasan, Yasser M. Kadah, Fatma M. El-Hefnawi
Abstract:
An electrocardiogram (ECG) feature extraction system based on the calculation of the complex resonance frequency employing Prony-s method is developed. Prony-s method is applied on five different classes of ECG signals- arrhythmia as a finite sum of exponentials depending on the signal-s poles and the resonant complex frequencies. Those poles and resonance frequencies of the ECG signals- arrhythmia are evaluated for a large number of each arrhythmia. The ECG signals of lead II (ML II) were taken from MIT-BIH database for five different types. These are the ventricular couplet (VC), ventricular tachycardia (VT), ventricular bigeminy (VB), and ventricular fibrillation (VF) and the normal (NR). This novel method can be extended to any number of arrhythmias. Different classification techniques were tried using neural networks (NN), K nearest neighbor (KNN), linear discriminant analysis (LDA) and multi-class support vector machine (MC-SVM).Keywords: Arrhythmias analysis, electrocardiogram, featureextraction, statistical classifiers.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20753184 Determination of Neighbor Node in Consideration of the Imaging Range of Cameras in Automatic Human Tracking System
Authors: Kozo Tanigawa, Tappei Yotsumoto, Kenichi Takahashi, Takao Kawamura, Kazunori Sugahara
Abstract:
A automatic human tracking system using mobile agent technology is realized because a mobile agent moves in accordance with a migration of a target person. In this paper, we propose a method for determining the neighbor node in consideration of the imaging range of cameras.
Keywords: Human tracking, Mobile agent, Pan/Tilt/Zoom, Neighbor relation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16763183 A Machine Learning Approach for Earthquake Prediction in Various Zones Based on Solar Activity
Authors: Viacheslav Shkuratskyy, Aminu Bello Usman, Michael O’Dea, Mujeeb Ur Rehman, Saifur Rahman Sabuj
Abstract:
This paper examines relationships between solar activity and earthquakes, it applied machine learning techniques: K-nearest neighbour, support vector regression, random forest regression, and long short-term memory network. Data from the SILSO World Data Center, the NOAA National Center, the GOES satellite, NASA OMNIWeb, and the United States Geological Survey were used for the experiment. The 23rd and 24th solar cycles, daily sunspot number, solar wind velocity, proton density, and proton temperature were all included in the dataset. The study also examined sunspots, solar wind, and solar flares, which all reflect solar activity, and earthquake frequency distribution by magnitude and depth. The findings showed that the long short-term memory network model predicts earthquakes more correctly than the other models applied in the study, and solar activity is more likely to effect earthquakes of lower magnitude and shallow depth than earthquakes of magnitude 5.5 or larger with intermediate depth and deep depth
.Keywords: K-Nearest Neighbour, Support Vector Regression, Random Forest Regression, Long Short-Term Memory Network, earthquakes, solar activity, sunspot number, solar wind, solar flares.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2053182 Lung Cancer Detection and Multi Level Classification Using Discrete Wavelet Transform Approach
Authors: V. Veeraprathap, G. S. Harish, G. Narendra Kumar
Abstract:
Uncontrolled growth of abnormal cells in the lung in the form of tumor can be either benign (non-cancerous) or malignant (cancerous). Patients with Lung Cancer (LC) have an average of five years life span expectancy provided diagnosis, detection and prediction, which reduces many treatment options to risk of invasive surgery increasing survival rate. Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance Imaging (MRI) for earlier detection of cancer are common. Gaussian filter along with median filter used for smoothing and noise removal, Histogram Equalization (HE) for image enhancement gives the best results without inviting further opinions. Lung cavities are extracted and the background portion other than two lung cavities is completely removed with right and left lungs segmented separately. Region properties measurements area, perimeter, diameter, centroid and eccentricity measured for the tumor segmented image, while texture is characterized by Gray-Level Co-occurrence Matrix (GLCM) functions, feature extraction provides Region of Interest (ROI) given as input to classifier. Two levels of classifications, K-Nearest Neighbor (KNN) is used for determining patient condition as normal or abnormal, while Artificial Neural Networks (ANN) is used for identifying the cancer stage is employed. Discrete Wavelet Transform (DWT) algorithm is used for the main feature extraction leading to best efficiency. The developed technology finds encouraging results for real time information and on line detection for future research.
Keywords: ANN, DWT, GLCM, KNN, ROI, artificial neural networks, discrete wavelet transform, gray-level co-occurrence matrix, k-nearest neighbor, region of interest.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9603181 Comparing Machine Learning Estimation of Fuel Consumption of Heavy-Duty Vehicles
Authors: Victor Bodell, Lukas Ekstrom, Somayeh Aghanavesi
Abstract:
Fuel consumption (FC) is one of the key factors in determining expenses of operating a heavy-duty vehicle. A customer may therefore request an estimate of the FC of a desired vehicle. The modular design of heavy-duty vehicles allows their construction by specifying the building blocks, such as gear box, engine and chassis type. If the combination of building blocks is unprecedented, it is unfeasible to measure the FC, since this would first r equire the construction of the vehicle. This paper proposes a machine learning approach to predict FC. This study uses around 40,000 vehicles specific and o perational e nvironmental c onditions i nformation, such as road slopes and driver profiles. A ll v ehicles h ave d iesel engines and a mileage of more than 20,000 km. The data is used to investigate the accuracy of machine learning algorithms Linear regression (LR), K-nearest neighbor (KNN) and Artificial n eural n etworks (ANN) in predicting fuel consumption for heavy-duty vehicles. Performance of the algorithms is evaluated by reporting the prediction error on both simulated data and operational measurements. The performance of the algorithms is compared using nested cross-validation and statistical hypothesis testing. The statistical evaluation procedure finds that ANNs have the lowest prediction error compared to LR and KNN in estimating fuel consumption on both simulated and operational data. The models have a mean relative prediction error of 0.3% on simulated data, and 4.2% on operational data.Keywords: Artificial neural networks, fuel consumption, machine learning, regression, statistical tests.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8333180 DWT Based Image Steganalysis
Authors: Indradip Banerjee, Souvik Bhattacharyya, Gautam Sanyal
Abstract:
‘Steganalysis’ is one of the challenging and attractive interests for the researchers with the development of information hiding techniques. It is the procedure to detect the hidden information from the stego created by known steganographic algorithm. In this paper, a novel feature based image steganalysis technique is proposed. Various statistical moments have been used along with some similarity metric. The proposed steganalysis technique has been designed based on transformation in four wavelet domains, which include Haar, Daubechies, Symlets and Biorthogonal. Each domain is being subjected to various classifiers, namely K-nearest-neighbor, K* Classifier, Locally weighted learning, Naive Bayes classifier, Neural networks, Decision trees and Support vector machines. The experiments are performed on a large set of pictures which are available freely in image database. The system also predicts the different message length definitions.
Keywords: Steganalysis, Moments, Wavelet Domain, KNN, K*, LWL, Naive Bayes Classifier, Neural networks, Decision trees, SVM.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25723179 Multiple Regression based Graphical Modeling for Images
Authors: Pavan S., Sridhar G., Sridhar V.
Abstract:
Super resolution is one of the commonly referred inference problems in computer vision. In the case of images, this problem is generally addressed using a graphical model framework wherein each node represents a portion of the image and the edges between the nodes represent the statistical dependencies. However, the large dimensionality of images along with the large number of possible states for a node makes the inference problem computationally intractable. In this paper, we propose a representation wherein each node can be represented as acombination of multiple regression functions. The proposed approach achieves a tradeoff between the computational complexity and inference accuracy by varying the number of regression functions for a node.
Keywords: Belief propagation, Graphical model, Regression, Super resolution.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15473178 Evaluation of the Impact of Dataset Characteristics for Classification Problems in Biological Applications
Authors: Kanthida Kusonmano, Michael Netzer, Bernhard Pfeifer, Christian Baumgartner, Klaus R. Liedl, Armin Graber
Abstract:
Availability of high dimensional biological datasets such as from gene expression, proteomic, and metabolic experiments can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and separate between predefined classes such as patients with a special disease versus healthy controls. However, most of the existing research only focuses on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, which are Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest based on mock datasets. We mimic common biological scenarios simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The result shows that SVM performs quite stable and reaches a higher AUC compared to other methods. This may be explained due to the ability of SVM to minimize the probability of error. Moreover, Decision Tree with its good applicability for diagnosis and prognosis shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better when having a higher number of discriminators.
Keywords: Classification, High dimensional data, Machine learning
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23843177 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach
Authors: Rajvir Kaur, Jeewani Anupama Ginige
Abstract:
With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15543176 Real Time Lidar and Radar High-Level Fusion for Obstacle Detection and Tracking with Evaluation on a Ground Truth
Authors: Hatem Hajri, Mohamed-Cherif Rahal
Abstract:
Both Lidars and Radars are sensors for obstacle detection. While Lidars are very accurate on obstacles positions and less accurate on their velocities, Radars are more precise on obstacles velocities and less precise on their positions. Sensor fusion between Lidar and Radar aims at improving obstacle detection using advantages of the two sensors. The present paper proposes a real-time Lidar/Radar data fusion algorithm for obstacle detection and tracking based on the global nearest neighbour standard filter (GNN). This algorithm is implemented and embedded in an automative vehicle as a component generated by a real-time multisensor software. The benefits of data fusion comparing with the use of a single sensor are illustrated through several tracking scenarios (on a highway and on a bend) and using real-time kinematic sensors mounted on the ego and tracked vehicles as a ground truth.Keywords: Ground truth, Hungarian algorithm, lidar Radar data fusion, global nearest neighbor filter.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9633175 Investigation of Wave Atom Sub-Bands via Breast Cancer Classification
Authors: Nebi Gedik, Ayten Atasoy
Abstract:
This paper investigates successful sub-bands of wave atom transform via classification of mammograms, when the coefficients of sub-bands are used as features. A computer-aided diagnosis system is constructed by using wave atom transform, support vector machine and k-nearest neighbor classifiers. Two-class classification is studied in detail using two data sets, separately. The successful sub-bands are determined according to the accuracy rates, coefficient numbers, and sensitivity rates.
Keywords: Breast cancer, wave atom transform, SVM, k-NN.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10723174 Improved Safety Science: Utilizing a Design Hierarchy
Authors: Ulrica Pettersson
Abstract:
Collection of information on incidents is regularly done through pre-printed incident report forms. These tend to be incomplete and frequently lack essential information. ne consequence is that reports with inadequate information, that do not fulfil analysts’ requirements, are transferred into the analysis process. To improve an incident reporting form, theory in design science, witness psychology and interview and questionnaire research has been used. Previously three experiments have been conducted to evaluate the form and shown significant improved results. The form has proved to capture knowledge, regardless of the incidents’ character or context. The aim in this paper is to describe how design science, in more detail a design hierarchy can be used to construct a collection form for improvements in safety science.
Keywords: Design science, data collection, form, incident report, safety science.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8343173 On Speeding Up Support Vector Machines: Proximity Graphs Versus Random Sampling for Pre-Selection Condensation
Authors: Xiaohua Liu, Juan F. Beltran, Nishant Mohanchandra, Godfried T. Toussaint
Abstract:
Support vector machines (SVMs) are considered to be the best machine learning algorithms for minimizing the predictive probability of misclassification. However, their drawback is that for large data sets the computation of the optimal decision boundary is a time consuming function of the size of the training set. Hence several methods have been proposed to speed up the SVM algorithm. Here three methods used to speed up the computation of the SVM classifiers are compared experimentally using a musical genre classification problem. The simplest method pre-selects a random sample of the data before the application of the SVM algorithm. Two additional methods use proximity graphs to pre-select data that are near the decision boundary. One uses k-Nearest Neighbor graphs and the other Relative Neighborhood Graphs to accomplish the task.Keywords: Machine learning, data mining, support vector machines, proximity graphs, relative-neighborhood graphs, k-nearestneighbor graphs, random sampling, training data condensation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19193172 Relationship between Sums of Squares in Linear Regression and Semi-parametric Regression
Authors: Dursun Aydın, Bilgin Senel
Abstract:
In this paper, the sum of squares in linear regression is reduced to sum of squares in semi-parametric regression. We indicated that different sums of squares in the linear regression are similar to various deviance statements in semi-parametric regression. In addition to, coefficient of the determination derived in linear regression model is easily generalized to coefficient of the determination of the semi-parametric regression model. Then, it is made an application in order to support the theory of the linear regression and semi-parametric regression. In this way, study is supported with a simulated data example.Keywords: Semi-parametric regression, Penalized LeastSquares, Residuals, Deviance, Smoothing Spline.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18563171 Comparing SVM and Naïve Bayes Classifier for Automatic Microaneurysm Detections
Authors: A. Sopharak, B. Uyyanonvara, S. Barman
Abstract:
Diabetic retinopathy is characterized by the development of retinal microaneurysms. The damage can be prevented if disease is treated in its early stages. In this paper, we are comparing Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers for automatic microaneurysm detection in images acquired through non-dilated pupils. The Nearest Neighbor classifier is used as a baseline for comparison. Detected microaneurysms are validated with expert ophthalmologists’ hand-drawn ground-truths. The sensitivity, specificity, precision and accuracy of each method are also compared.
Keywords: Diabetic retinopathy, microaneurysm, Naïve Bayes classifier, SVM classifier.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 61073170 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison
Authors: Xiangtuo Chen, Paul-Henry Cournéde
Abstract:
Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.Keywords: Crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11763169 Comparison of Phylogenetic Trees of Multiple Protein Sequence Alignment Methods
Authors: Khaddouja Boujenfa, Nadia Essoussi, Mohamed Limam
Abstract:
Multiple sequence alignment is a fundamental part in many bioinformatics applications such as phylogenetic analysis. Many alignment methods have been proposed. Each method gives a different result for the same data set, and consequently generates a different phylogenetic tree. Hence, the chosen alignment method affects the resulting tree. However in the literature, there is no evaluation of multiple alignment methods based on the comparison of their phylogenetic trees. This work evaluates the following eight aligners: ClustalX, T-Coffee, SAGA, MUSCLE, MAFFT, DIALIGN, ProbCons and Align-m, based on their phylogenetic trees (test trees) produced on a given data set. The Neighbor-Joining method is used to estimate trees. Three criteria, namely, the dNNI, the dRF and the Id_Tree are established to test the ability of different alignment methods to produce closer test tree compared to the reference one (true tree). Results show that the method which produces the most accurate alignment gives the nearest test tree to the reference tree. MUSCLE outperforms all aligners with respect to the three criteria and for all datasets, performing particularly better when sequence identities are within 10-20%. It is followed by T-Coffee at lower sequence identity (<10%), Align-m at 20-30% identity, and ClustalX and ProbCons at 30-50% identity. Also, it is noticed that when sequence identities are higher (>30%), trees scores of all methods become similar.Keywords: Multiple alignment methods, phylogenetic trees, Neighbor-Joining method, Robinson-Foulds distance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18273168 MarginDistillation: Distillation for Face Recognition Neural Networks with Margin-Based Softmax
Authors: Svitov David, Alyamkin Sergey
Abstract:
The usage of convolutional neural networks (CNNs) in conjunction with the margin-based softmax approach demonstrates the state-of-the-art performance for the face recognition problem. Recently, lightweight neural network models trained with the margin-based softmax have been introduced for the face identification task for edge devices. In this paper, we propose a distillation method for lightweight neural network architectures that outperforms other known methods for the face recognition task on LFW, AgeDB-30 and Megaface datasets. The idea of the proposed method is to use class centers from the teacher network for the student network. Then the student network is trained to get the same angles between the class centers and face embeddings predicted by the teacher network.Keywords: ArcFace, distillation, face recognition, margin-based softmax.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6323167 A Finite Point Method Based on Directional Derivatives for Diffusion Equation
Authors: Guixia Lv, Longjun Shen
Abstract:
This paper presents a finite point method based on directional derivatives for diffusion equation on 2D scattered points. To discretize the diffusion operator at a given point, a six-point stencil is derived by employing explicit numerical formulae of directional derivatives, namely, for the point under consideration, only five neighbor points are involved, the number of which is the smallest for discretizing diffusion operator with first-order accuracy. A method for selecting neighbor point set is proposed, which satisfies the solvability condition of numerical derivatives. Some numerical examples are performed to show the good performance of the proposed method.Keywords: Finite point method, directional derivatives, diffusionequation, method for selecting neighbor point set.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13523166 A Method for Quality Inspection of Motors by Detecting Abnormal Sound
Authors: Tadatsugu Kitamoto
Abstract:
Recently, a quality of motors is inspected by human ears. In this paper, I propose two systems using a method of speech recognition for automation of the inspection. The first system is based on a method of linear processing which uses K-means and Nearest Neighbor method, and the second is based on a method of non-linear processing which uses neural networks. I used motor sounds in these systems, and I successfully recognize 86.67% of motor sounds in the linear processing system and 97.78% in the non-linear processing system.Keywords: Acoustical diagnosis, Neural networks, K-means, Short-time Fourier transformation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17013165 Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique
Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan
Abstract:
In this paper, the comparison between k-Nearest Neighbor (kNN) algorithms for classifying the 3D EEG model in brain balancing is presented. The EEG signal recording was conducted on 51 healthy subjects. Development of 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, maximum PSD values were extracted as features from the model. There are three indexes for balanced brain; index 3, index 4 and index 5. There are significant different of the EEG signals due to the brain balancing index (BBI). Alpha-α (8–13 Hz) and beta-β (13–30 Hz) were used as input signals for the classification model. The k-NN classification result is 88.46% accuracy. These results proved that k-NN can be used in order to predict the brain balancing application.
Keywords: Brain balancing, kNN, power spectral density, 3D EEG model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 26303164 Margin-Based Feed-Forward Neural Network Classifiers
Authors: Han Xiao, Xiaoyan Zhu
Abstract:
Margin-Based Principle has been proposed for a long time, it has been proved that this principle could reduce the structural risk and improve the performance in both theoretical and practical aspects. Meanwhile, feed-forward neural network is a traditional classifier, which is very hot at present with a deeper architecture. However, the training algorithm of feed-forward neural network is developed and generated from Widrow-Hoff Principle that means to minimize the squared error. In this paper, we propose a new training algorithm for feed-forward neural networks based on Margin-Based Principle, which could effectively promote the accuracy and generalization ability of neural network classifiers with less labelled samples and flexible network. We have conducted experiments on four UCI open datasets and achieved good results as expected. In conclusion, our model could handle more sparse labelled and more high-dimension dataset in a high accuracy while modification from old ANN method to our method is easy and almost free of work.Keywords: Max-Margin Principle, Feed-Forward Neural Network, Classifier.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17273163 Comparison of SVC and STATCOM in Static Voltage Stability Margin Enhancement
Authors: Mehrdad Ahmadi Kamarposhti, Mostafa Alinezhad
Abstract:
One of the major causes of voltage instability is the reactive power limit of the system. Improving the system's reactive power handling capacity via Flexible AC transmission System (FACTS) devices is a remedy for prevention of voltage instability and hence voltage collapse. In this paper, the effects of SVC and STATCOM in Static Voltage Stability Margin Enhancement will be studied. AC and DC representations of SVC and STATCOM are used in the continuation power flow process in static voltage stability study. The IEEE-14 bus system is simulated to test the increasing loadability. It is found that these controllers significantly increase the loadability margin of power systems.
Keywords: SVC, STATCOM, Voltage Collapse, Maximum Loading Point.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6374