Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1923

Search results for: missing at random (MAR)

1923 Survival Data with Incomplete Missing Categorical Covariates

Authors: Madaki Umar Yusuf, Mohd Rizam B. Abubakar

Abstract:

Censored survival data with incomplete covariate data are a common occurrence in many studies in which the outcome is survival time. When the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM algorithm by the method of weights. The survival outcome is modeled within the class of generalized linear models, and this method requires estimating the parameters of the distribution of the covariates. In this paper, we consider clinical trials with five covariates, four of which have some missing values, which clearly show that they were fully censored data.

Keywords: EM algorithm, incomplete categorical covariates, ignorable missing data, missing at random (MAR), Weibull Distribution

Procedia PDF Downloads 310
1922 Deadline Missing Prediction for Mobile Robots through the Use of Historical Data

Authors: Edwaldo R. B. Monteiro, Patricia D. M. Plentz, Edson R. De Pieri

Abstract:

Mobile robotics is gaining an increasingly important role in modern society. Several tasks that are potentially dangerous or laborious for humans are assigned to mobile robots, which are increasingly capable. Many of these tasks need to be performed within a specified period, i.e., they must meet a deadline. Missing the deadline can result in financial and/or material losses. Mechanisms for predicting missed deadlines are fundamental because corrective actions can be taken to avoid or minimize the resulting losses. In this work, we propose a simple but reliable deadline missing prediction mechanism for mobile robots through the use of historical data. For experiments and simulations, we use the Pioneer 3-DX, one of the most popular robots in academia.

Keywords: deadline missing, historical data, mobile robots, prediction mechanism

Procedia PDF Downloads 328
1921 Estimation of Missing Values in Aggregate Level Spatial Data

Authors: Amitha Puranik, V. S. Binu, Seena Biju

Abstract:

Missing data is a common problem in spatial analysis, especially at the aggregate level. At a given location, values can be missing in the covariates, in the response variable, or in both. Many techniques are available to estimate missing data values, but not all of them can be applied to spatial data, since the data are autocorrelated. Hence, there is a need for a method that estimates the missing values in both the response variable and the covariates in spatial data by taking the spatial autocorrelation into account. The present study aims to develop a model to estimate the missing data points at the aggregate level in spatial data by accounting for (a) spatial autocorrelation of the response variable, (b) spatial autocorrelation of the covariates, and (c) correlation between the covariates and the response variable. Estimating the missing values of spatial data requires a model that explicitly accounts for the spatial autocorrelation. The proposed model not only accounts for spatial autocorrelation but also utilizes the correlation that exists between covariates, within covariates, and between the response variable and the covariates. Precise estimation of the missing data points in spatial data will increase the precision of the estimated effects of independent variables on the response variable in spatial regression analysis.

Keywords: spatial regression, missing data estimation, spatial autocorrelation, simulation analysis

Procedia PDF Downloads 275
1920 A Neural Network Based Clustering Approach for Imputing Multivariate Values in Big Data

Authors: S. Nickolas, Shobha K.

Abstract:

The treatment of incomplete data is an important step in data pre-processing. Missing values create a noisy environment in all applications and are an unavoidable problem in big data management and analysis. Researchers have introduced numerous techniques for handling missing data, such as discarding rows with missing values, mean imputation, expectation maximization, neural networks with evolutionary algorithms or optimized techniques, and hot deck imputation. Among these, imputation techniques play a positive role in filling missing values when it is necessary to use all records in the data rather than discard records with missing values. In this paper, we propose a novel artificial neural network based clustering algorithm, Adaptive Resonance Theory-2 (ART2), for imputation of missing values in mixed attribute data sets. ART2 can recognize learned models quickly and adapt to new objects rapidly. It carries out model-based clustering by using competitive learning and a self-steady mechanism in a dynamic environment without supervision. The proposed approach not only imputes the missing values but also provides information about handling outliers.

Keywords: ART2, data imputation, clustering, missing data, neural network, pre-processing

Procedia PDF Downloads 204
1919 A Review of Methods for Handling Missing Data in the Form of Dropouts in Longitudinal Clinical Trials

Authors: A. Satty, H. Mwambi

Abstract:

Much research based on clinical trial data is characterized by the unavoidable problem of dropout as a result of missing or erroneous values. This paper reviews some of the techniques used to address dropout in longitudinal clinical trials. The fundamental concepts of the patterns and mechanisms of dropout are discussed. This study presents five general techniques for handling dropout: (1) deletion methods; (2) imputation-based methods; (3) data augmentation methods; (4) likelihood-based methods; and (5) MNAR-based methods. Under each technique, several methods that are commonly used to deal with dropout are presented, including a review of the existing literature in which we examine the effectiveness of these methods in the analysis of incomplete data. Two application examples are presented to study the potential strengths or weaknesses of some of the methods under certain dropout mechanisms, as well as to assess the sensitivity of the modelling assumptions.

Keywords: incomplete longitudinal clinical trials, missing at random (MAR), imputation, weighting methods, sensitivity analysis

Procedia PDF Downloads 321
1918 Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second

Authors: P. V. Pramila, V. Mahesh

Abstract:

Pulmonary function tests are important non-invasive diagnostic tests that assess respiratory impairments and provide quantifiable measures of lung function. Spirometry is the most frequently used measure of lung function and plays an essential role in the diagnosis and management of pulmonary diseases. However, the test requires considerable patient effort and cooperation, markedly related to the age of patients, resulting in incomplete data sets. This paper presents a nonlinear model built using multivariate adaptive regression splines (MARS) and a random forest (RF) regression model to predict the missing spirometric features. Random forest based feature selection is used to enhance both the generalization capability and the model interpretability. In the present study, flow-volume data are recorded for N = 198 subjects. The ranked order of the feature importance index calculated by the random forest model shows that the spirometric features FVC, FEF 25, PEF, FEF 25-75, FEF 50, and the demographic parameter height are the important descriptors. A comparison of the performance of both models shows that the prediction ability of MARS with the top two ranked features, namely FVC and FEF 25, is higher, yielding a model fit of R² = 0.96 and R² = 0.99 for normal and abnormal subjects, respectively. The root mean square error analysis of the RF model and the MARS model also shows that the latter is capable of predicting the missing values of FEV1 with a notably lower error value of 0.0191 (normal subjects) and 0.0106 (abnormal subjects). It is concluded that combining feature selection with a prediction model provides a minimum subset of predominant features to train the model, yielding better prediction performance. This analysis can assist clinicians with an intelligent support system in medical diagnosis and the improvement of clinical care.

Keywords: FEV, multivariate adaptive regression splines, pulmonary function test, random forest

Procedia PDF Downloads 231
1917 Effect of Genuine Missing Data Imputation on Prediction of Urinary Incontinence

Authors: Suzan Arslanturk, Mohammad-Reza Siadat, Theophilus Ogunyemi, Ananias Diokno

Abstract:

Missing data is a common challenge in statistical analyses of most clinical survey datasets. A variety of methods have been developed to enable analysis of survey data with missing values. Imputation is the most commonly used of these methods. However, in order to minimize the bias introduced by imputation, one must choose the right imputation technique and apply it to the correct type of missing data. In this paper, we have identified different types of missing values: missing data due to skip pattern (SPMD), undetermined missing data (UMD), and genuine missing data (GMD), and we have applied rough set imputation only to the GMD portion of the missing data. We have used rough set imputation to evaluate the effect of such imputation on prediction by generating several simulation datasets based on an existing epidemiological dataset (MESA). To measure how well each dataset lends itself to the prediction model (logistic regression), we have used p-values from the Wald test. To evaluate the accuracy of the prediction, we have considered the width of the 95% confidence interval for the probability of incontinence. Both imputed and non-imputed simulation datasets were fit to the prediction model, and both turned out to be significant (p-value < 0.05). However, the Wald score shows a better fit for the imputed than for the non-imputed datasets (28.7 vs. 23.4). The average confidence interval width decreased by 10.4% when the imputed dataset was used, meaning higher precision. The results show that using the rough set method for missing data imputation on GMD improves the predictive capability of the logistic regression. Further studies are required to generalize this conclusion to other clinical survey datasets.

Keywords: rough set, imputation, clinical survey data simulation, genuine missing data, predictive index

Procedia PDF Downloads 79
1916 Two-Phase Sampling for Estimating a Finite Population Total in Presence of Missing Values

Authors: Daniel Fundi Murithi

Abstract:

Missing data is a real bane in many surveys. To overcome the problems caused by missing data, partial deletion and single imputation methods, among others, have been proposed. However, these methods are associated with problems such as discarding usable data and inaccuracy in reproducing known population parameters and standard errors. Regression and stochastic imputation assume that there is a variable with complete cases that can be used as a predictor for estimating missing values in the other variable, and that the relationship between the two variables is linear, which might not be realistic in practice. In this project, we estimate the population total in the presence of missing values in two-phase sampling. Instead of regression or stochastic models, a nonparametric model-based regression is used to impute missing values. An empirical study showed that nonparametric model-based regression imputation is better than mean, median, regression, and stochastic imputation at reproducing the variance of the population total estimate obtained when there were no missing values. Although regression and stochastic imputation were better than nonparametric model-based imputation at reproducing the population total estimates obtained when there were no missing values in one of the sample sizes considered, nonparametric model-based imputation may be used when the relationship between outcome and predictor variables is not linear.

Keywords: finite population total, missing data, model-based imputation, two-phase sampling

Procedia PDF Downloads 42
1915 Prediction Modeling of Alzheimer’s Disease and Its Prodromal Stages from Multimodal Data with Missing Values

Authors: M. Aghili, S. Tabarestani, C. Freytes, M. Shojaie, M. Cabrerizo, A. Barreto, N. Rishe, R. E. Curiel, D. Loewenstein, R. Duara, M. Adjouadi

Abstract:

A major challenge in medical studies, especially longitudinal ones, is the problem of missing measurements, which hinders the effective application of many machine learning algorithms. Furthermore, recent Alzheimer's disease (AD) studies have focused on the delineation of Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI) from cognitively normal controls (CN), which is essential for developing effective and early treatment methods. To address these challenges, this paper explores the potential of the eXtreme Gradient Boosting (XGBoost) algorithm for handling missing values in multiclass classification. We seek a generalized classification scheme where all prodromal stages of the disease are considered simultaneously in the classification and decision-making processes. Given the large number of subjects (1631) included in this study and the presence of almost 28% missing values, we investigated the performance of XGBoost on the classification of the four classes AD, CN, EMCI, and LMCI. Using 10-fold cross-validation, XGBoost is shown to outperform other state-of-the-art classification algorithms by 3% in terms of accuracy and F-score. Our model achieved an accuracy of 80.52%, a precision of 80.62%, and a recall of 80.51%, supporting the more natural and promising multiclass classification.

Keywords: eXtreme gradient boosting, missing data, Alzheimer disease, early mild cognitive impairment, late mild cognitive impairment, multiclass classification, ADNI, support vector machine, random forest

Procedia PDF Downloads 83
1914 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge for bioinformaticians because of the complexity of applying statistical and machine learning techniques. The challenge is doubled if the microarray data sets contain missing data, which happens regularly, because these techniques cannot deal with missing data. One of the most important data analysis processes on a microarray data set is feature selection. This process finds the most important genes that affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 420
1913 Stochastic Simulation of Random Numbers Using Linear Congruential Method

Authors: Melvin Ballera, Aldrich Olivar, Mary Soriano

Abstract:

Digital computers nowadays need a utility capable of generating random numbers. Usually, computer-generated random numbers are not truly random: given predefined values such as the starting point and end points, the sequence is almost predictable. Random numbers have many applications, such as business simulation, manufacturing, the services domain, the entertainment sector, and other equally important areas, making it worthwhile to design a unique method that allows unpredictable random numbers. Applying stochastic simulation with the linear congruential algorithm shows that, as the seed and the range increase, the numbers randomly produced or selected by the computer become unique. If this is implemented in an environment where random numbers are very much needed, the reliability of the random numbers is guaranteed.
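
To make the linear congruential method concrete, here is a minimal Python sketch. The abstract does not give the authors' parameters, so the multiplier, increment, and modulus below are assumptions (a commonly cited 32-bit parameter set), and the helper names `lcg` and `lcg_in_range` are hypothetical.

```python
from itertools import islice


def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield an endless stream of integers in [0, m).

    Recurrence: state = (a * state + c) mod m.
    The default constants are an assumption, not taken from the paper.
    """
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state


def lcg_in_range(seed, low, high, count, m=2**32):
    """Return `count` integers in [low, high] drawn from the LCG stream.

    Scaling by the span uses the high-order bits of each state, which are
    better distributed than the low-order bits when m is a power of two.
    """
    span = high - low + 1
    return [low + (x * span) // m for x in islice(lcg(seed, m=m), count)]


if __name__ == "__main__":
    print(lcg_in_range(seed=2024, low=1, high=100, count=10))
```

The same seed always reproduces the same sequence, which is why such generators are called pseudorandom: the output only looks random to an observer who does not know the seed and the parameters.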

Keywords: stochastic simulation, random numbers, linear congruential algorithm, pseudorandomness

Procedia PDF Downloads 224
1912 Bias-Corrected Estimation Methods for Receiver Operating Characteristic Surface

Authors: Khanh To Duc, Monica Chiogna, Gianfranco Adimari

Abstract:

With three diagnostic categories, assessment of the performance of diagnostic tests is achieved by the analysis of the receiver operating characteristic (ROC) surface, which generalizes the ROC curve for binary diagnostic outcomes. The volume under the ROC surface (VUS) is a summary index usually employed for measuring the overall diagnostic accuracy. When the true disease status can be exactly assessed by means of a gold standard (GS) test, unbiased nonparametric estimators of the ROC surface and VUS are easily obtained. In practice, unfortunately, disease status verification via the GS test could be unavailable for all study subjects, due to the expensiveness or invasiveness of the GS test. Thus, often only a subset of patients undergoes disease verification. Statistical evaluations of diagnostic accuracy based only on data from subjects with verified disease status are typically biased. This bias is known as verification bias. Here, we consider the problem of correcting for verification bias when continuous diagnostic tests for three-class disease status are considered. We assume that selection for disease verification does not depend on disease status, given test results and other observed covariates, i.e., we assume that the true disease status, when missing, is missing at random. Under this assumption, we discuss several solutions for ROC surface analysis based on imputation and re-weighting methods. In particular, verification bias-corrected estimators of the ROC surface and of VUS are proposed, namely, full imputation, mean score imputation, inverse probability weighting and semiparametric efficient estimators. Consistency and asymptotic normality of the proposed estimators are established, and their finite sample behavior is investigated by means of Monte Carlo simulation studies. Two illustrations using real datasets are also given.
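
As a rough illustration of the inverse probability weighting idea mentioned above, the following Python sketch estimates a class prevalence from verified subjects only. It is not the authors' ROC surface estimator: the function name `ipw_proportion`, the normalized (ratio) form of the estimator, and the assumption that verification probabilities are already known are all choices made for this example.

```python
def ipw_proportion(indicators, verified, probs):
    """Inverse-probability-weighted estimate of a proportion.

    indicators: 1/0 disease-class indicator per subject (ignored when unverified)
    verified:   True if the subject's disease status was verified
    probs:      probability of verification per subject, assumed known here and
                depending only on observed data (the MAR assumption)
    """
    num = 0.0
    den = 0.0
    for d, v, p in zip(indicators, verified, probs):
        if v:
            weight = 1.0 / p  # verified subjects stand in for similar unverified ones
            num += weight * d
            den += weight
    if den == 0.0:
        raise ValueError("no verified subjects")
    return num / den


if __name__ == "__main__":
    d = [1, 0, 1, None]            # the last subject was never verified
    v = [True, True, True, False]
    p = [0.5, 0.25, 0.5, 0.25]
    print(ipw_proportion(d, v, p))
```

Subjects who were unlikely to be verified receive larger weights, which is how the estimator compensates for the selective verification.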

Keywords: imputation, missing at random, inverse probability weighting, ROC surface analysis

Procedia PDF Downloads 324
1911 Comparison of Statistical Methods for Estimating Missing Precipitation Data in the River Subbasin Lenguazaque, Colombia

Authors: Miguel Cañon, Darwin Mena, Ivan Cabeza

Abstract:

This work compared and evaluated the applicability of statistical methods for estimating missing precipitation data in the basin of the river Lenguazaque, located in the departments of Cundinamarca and Boyacá, Colombia. The methods used were simple linear regression, distance rate, local averages, mean rates, correlation with nearby stations, and multiple regression. The effectiveness of the methods was assessed using three statistical tools: the correlation coefficient (r²), the standard error of estimation, and the Bland and Altman test of agreement. The analysis was performed using real rainfall values that were removed randomly in each of the seasons and then estimated with the methodologies mentioned to complete the missing data values. It was determined that, under the conditions considered, the methods with the highest performance and accuracy in the estimation of data are multiple regression with three nearby stations and a random application scheme supported by the precipitation behavior of related data sets.

Keywords: statistical comparison, precipitation data, river subbasin, Bland and Altman

Procedia PDF Downloads 398
1910 Existence Result of Third Order Functional Random Integro-Differential Inclusion

Authors: D. S. Palimkar

Abstract:

The functional random integro-differential inclusion (FRIGDI) seems to be new and includes, as special cases, several known random differential inclusions already studied in the literature for various aspects of their solutions. In this paper, we prove an existence result for the FRIGDI in the non-convex case of the multi-valued function involved in it, using a random fixed point theorem of B. C. Dhage and the Carathéodory condition. This result is new to the theory of differential inclusions.

Keywords: Carathéodory condition, random differential inclusion, random solution, integro-differential inclusion

Procedia PDF Downloads 336
1909 Existence Theory for First Order Functional Random Differential Equations

Authors: Rajkumar N. Ingle

Abstract:

In this paper, the existence of a solution of nonlinear functional random differential equations (N.F.R.D.E.) of the first order is proved under the Carathéodory condition. The study of functional random differential equations has gained importance in the random analysis of the dynamical systems of universal phenomena. Objectives: Nonlinear functional random differential equations are useful to scientists, engineers, and mathematicians who are engaged in analyzing universal random phenomena governed by nonlinear random initial value problems of differential equations. Applications arise in the theory of diffusion or heat conduction. Methodology: Using the concepts of probability theory and functional analysis, the existence theorems for the nonlinear F.R.D.E. are generally proved with tools such as fixed point theorems. The significance of the study: Our contribution will be the generalization of some well-known results in the theory of nonlinear F.R.D.E.s. Further, it seems that our study will be useful to scientists, engineers, economists, and mathematicians in their endeavors to analyze the nonlinear random problems of the universe in a better way.

Keywords: Random Fixed Point Theorem, functional random differential equation, N.F.R.D.E., universal random phenomenon

Procedia PDF Downloads 386
1908 Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance

Authors: Loai AbdAllah, Mahmoud Kaiyal

Abstract:

Missing values in real-world datasets are a common problem. Many algorithms have been developed to deal with this problem; most of them replace the missing values with a fixed value computed from the observed values. In our work, we used a distance function based on the Bhattacharyya distance, which measures the similarity of two probability distributions, to measure the distance between objects with missing values. The proposed distance distinguishes between known and unknown values: the distance between two known values is the Mahalanobis distance, whereas when one of them is missing, the distance is computed based on the distribution of the known values for the coordinate that contains the missing value. This method was integrated with Wikaya, a digital health company developing a platform that helps to improve the prevention of chronic diseases such as diabetes and cancer. For Wikaya’s recommendation system to work, the distance between users needs to be measured. Since there are missing values in the collected data, there is a need for a distance function that measures distances between incomplete user profiles. To evaluate the accuracy of the proposed distance function in reflecting the actual similarity between different objects when some of them contain missing values, we integrated it within the framework of the k nearest neighbors (kNN) classifier, since its computation is based only on the similarity between objects. To validate this, we ran the algorithm over diabetes and breast cancer datasets, standard benchmark datasets from the UCI repository. Our experiments show that the kNN classifier using our proposed distance function outperforms kNN using other existing methods.
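
For readers unfamiliar with the Bhattacharyya distance, the sketch below computes its textbook form for two discrete probability distributions. It is only the building block named in the abstract, not the authors' distance function for incomplete records, and the function name is hypothetical.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions.

    BC = sum_i sqrt(p_i * q_i) is the Bhattacharyya coefficient (overlap),
    and the distance is D_B = -ln(BC). Both inputs must be probability
    vectors over the same outcomes.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if bc == 0.0:
        return math.inf  # no overlap at all
    return -math.log(bc)


if __name__ == "__main__":
    print(bhattacharyya_distance([0.5, 0.5], [0.5, 0.5]))  # identical distributions
    print(bhattacharyya_distance([1.0, 0.0], [0.5, 0.5]))  # partial overlap
```

Identical distributions have full overlap and therefore distance zero, while distributions with disjoint support are infinitely far apart.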

Keywords: missing values, incomplete data, distance, incomplete diabetes data

Procedia PDF Downloads 132
1907 A Very Efficient Pseudo-Random Number Generator Based On Chaotic Maps and S-Box Tables

Authors: M. Hamdi, R. Rhouma, S. Belghith

Abstract:

Random number generation is mainly used to create secret keys or random sequences, and it can be carried out by various techniques. In this paper, we present a very simple and efficient pseudo-random number generator (PRNG) based on chaotic maps and S-Box tables. The technique uses two main operations: the first generates chaotic values using two logistic maps, and the second transforms them into binary words using random S-Box tables. The simulation analysis indicates that our PRNG possesses excellent statistical and cryptographic properties.
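
The two operations described above can be sketched in Python as follows. The abstract does not specify the map parameter, the quantization, how the two maps are combined, or how the S-Box tables are built, so everything in this sketch (the value r = 3.99, the XOR combination, the shuffled 256-entry table, and all function names) is an assumption made for illustration. It is a toy and has not been vetted for cryptographic use.

```python
import random


def logistic(x, r=3.99):
    """One step of the logistic map x -> r * x * (1 - x); r is an assumed value."""
    return r * x * (1.0 - x)


def make_sbox(seed):
    """Build a random 8-bit substitution table (a permutation of 0..255)."""
    rng = random.Random(seed)
    box = list(range(256))
    rng.shuffle(box)
    return box


def chaotic_bytes(x0, y0, sbox, count, burn_in=100):
    """Generate `count` bytes from two logistic maps and an S-Box lookup."""
    x, y = x0, y0
    for _ in range(burn_in):          # discard the transient part of both orbits
        x, y = logistic(x), logistic(y)
    out = []
    for _ in range(count):
        x, y = logistic(x), logistic(y)
        bx = int(x * 256) % 256       # quantize each chaotic value to 8 bits
        by = int(y * 256) % 256
        out.append(sbox[bx ^ by])     # combine the two values, then substitute
    return bytes(out)


if __name__ == "__main__":
    sbox = make_sbox(seed=12345)
    print(chaotic_bytes(0.3141, 0.2718, sbox, count=16).hex())
```

The initial values of the two maps and the S-Box seed together play the role of the key in this sketch: the same inputs always reproduce the same byte stream.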

Keywords: Random Numbers, Chaotic map, S-box, cryptography, statistical tests

Procedia PDF Downloads 276
1906 Heuristic to Generate Random X-Monotone Polygons

Authors: Kamaljit Pati, Manas Kumar Mohanty, Sanjib Sadhu

Abstract:

A heuristic has been designed to generate a random simple monotone polygon from a given set of ‘n’ points lying on a 2-dimensional plane. Our heuristic generates a random monotone polygon in O(n) time after O(n log n) preprocessing time, which improves on the previous work, where a random monotone polygon is produced in the same O(n) time but the preprocessing time is O(k) for n < k < n². However, our heuristic does not generate all possible random polygons with uniform probability. The space complexity of our proposed heuristic is O(n).
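
The abstract does not describe the heuristic itself, so the Python sketch below shows only a standard deterministic construction with the same cost profile: an O(n log n) sort followed by an O(n) pass that splits the points into a lower and an upper chain. It is not the authors' randomized heuristic, the function name is hypothetical, and the points are assumed to be in general position (distinct x-coordinates, no three collinear).

```python
def x_monotone_polygon(points):
    """Return the vertices of a simple x-monotone polygon through all points.

    Points are (x, y) tuples, assumed to be in general position.
    """
    pts = sorted(set(points))                 # O(n log n) preprocessing
    if len(pts) < 3:
        raise ValueError("need at least three distinct points")
    left, right = pts[0], pts[-1]

    def side(p):
        # Cross product sign: > 0 if p lies above the line from left to right.
        return ((right[0] - left[0]) * (p[1] - left[1])
                - (right[1] - left[1]) * (p[0] - left[0]))

    # O(n) pass: points below the line form the lower chain (left to right),
    # points above it form the upper chain (traversed right to left).
    lower = [p for p in pts[1:-1] if side(p) <= 0]
    upper = [p for p in pts[1:-1] if side(p) > 0]
    return [left] + lower + [right] + upper[::-1]


if __name__ == "__main__":
    print(x_monotone_polygon([(3, 1), (0, 0), (2, -1), (4, 0), (1, 2)]))
```

Because each chain visits its points in sorted x order and the two chains lie on opposite sides of the line joining the extreme points, every vertical line crosses the boundary at most twice, which is the defining property of an x-monotone polygon.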

Keywords: sorting, monotone polygon, visibility, chain

Procedia PDF Downloads 353
1905 Modern Imputation Technique for Missing Data in Linear Functional Relationship Model

Authors: Adilah Abdul Ghapor, Yong Zulina Zubairi, Rahmatullah Imon

Abstract:

The missing value problem is common in statistics and has been of interest for years. This article considers two modern techniques for handling missing data in the linear functional relationship model (LFRM), namely the Expectation-Maximization (EM) algorithm and the Expectation-Maximization with Bootstrapping (EMB) algorithm, using three performance indicators: the mean absolute error (MAE), the root mean square error (RMSE), and the estimated bias (EB). In this study, we applied the methods of imputing missing values in the LFRM. Results of the simulation study suggest that the EMB algorithm performs much better than the EM algorithm in both models. We also illustrate the applicability of the approach on a real data set.

Keywords: expectation-maximization, expectation-maximization with bootstrapping, linear functional relationship model, performance indicators

Procedia PDF Downloads 305
1904 Missing Link Data Estimation with Recurrent Neural Network: An Application Using Speed Data of Daegu Metropolitan Area

Authors: JaeHwan Yang, Da-Woon Jeong, Seung-Young Kho, Dong-Kyu Kim

Abstract:

In ITS, information on link characteristics is an essential factor for planning or operation. In practice, however, not every link has sensors installed on it. A link that does not have data is called a “Missing Link”. The purpose of this study is to impute data for these missing links. To obtain these data, this study applies a machine learning method. With machine learning, and deep learning in particular, missing link data can be estimated from the data of links that are present. For the deep learning process, this study uses a “Recurrent Neural Network” to handle time-series road data. As input, Dedicated Short-range Communications (DSRC) data from Dalgubul-daero in the Daegu Metropolitan Area were fed into the learning process. The neural network structure takes 17 links with present data as input, has 2 hidden layers, and outputs data for 1 missing link. As a result, the forecasted data of the target link show about 94% accuracy compared with the actual data.

Keywords: data estimation, link data, machine learning, road network

Procedia PDF Downloads 430
1903 Handling Missing Data by Using Expectation-Maximization and Expectation-Maximization with Bootstrapping for Linear Functional Relationship Model

Authors: Adilah Abdul Ghapor, Yong Zulina Zubairi, A. H. M. R. Imon

Abstract:

The missing value problem is common in statistics and has been of interest for years. This article considers two modern techniques for handling missing data in the linear functional relationship model (LFRM), namely the Expectation-Maximization (EM) algorithm and the Expectation-Maximization with Bootstrapping (EMB) algorithm, using three performance indicators: the mean absolute error (MAE), the root mean square error (RMSE), and the estimated bias (EB). In this study, we applied the methods of imputing missing values in two types of LFRM, namely the full model of LFRM and the LFRM in which the slope is estimated using a nonparametric method. Results of the simulation study suggest that the EMB algorithm performs much better than the EM algorithm in both models. We also illustrate the applicability of the approach on a real data set.
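
Two of the three performance indicators named above have standard definitions, sketched here in Python for a set of imputed values compared against the values that were removed. The estimated bias (EB) is left out because the abstract does not define it; the function names and the sample numbers are hypothetical.

```python
import math


def mae(actual, imputed):
    """Mean absolute error between true values and their imputed replacements."""
    return sum(abs(a - b) for a, b in zip(actual, imputed)) / len(actual)


def rmse(actual, imputed):
    """Root mean square error between true values and their imputed replacements."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, imputed)) / len(actual))


if __name__ == "__main__":
    true_values = [1.0, 2.0, 3.0]     # values that were deliberately removed
    imputed_values = [1.0, 2.0, 5.0]  # values produced by an imputation method
    print(mae(true_values, imputed_values), rmse(true_values, imputed_values))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of imputation quality.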

Keywords: expectation-maximization, expectation-maximization with bootstrapping, linear functional relationship model, performance indicators

Procedia PDF Downloads 373
1902 University Students’ Fear of Missing out and Night Eating Syndrome. A Descriptive Correlational Study

Authors: Mohammed Qutishat, Omar Al-Omari, Kholoud Al-Damery, Mohammed Al-Qadiri

Abstract:

Objective: The current study aims to explore the relationship between Night Eating Syndrome and experiences of Fear of Missing Out (FOMO) among college students in Oman. Methods: The study adopted a descriptive correlational design. The total sample was 366, based on defined inclusion criteria. The questionnaires were distributed over one month during the spring semester of 2020. We used a self-report instrument to investigate the extent of the research phenomena; it consists of two major sections: the Fear of Missing Out Questionnaire and the Night Eating Questionnaire. Results: The respondents' ages ranged between 18 and 30. The majority of the participants were female, 76.7% (204); single, 97.7% (266); in their third academic year, 28.6% (76); and living on campus, 57.1% (152). The findings of this study showed that fear of missing out experiences are significantly correlated with age (P = .010), gender (P = .005), and daily sleeping hours (P = .007). Night eating experiences, however, are significantly associated with age (P = .018), living arrangement (P = .017), and sleeping hours (P = .000). Conclusion: This article can define a limiting aspect of the relationship between fear of missing out and night eating behaviors. During academic life, students may find themselves overloaded and use their smartphones to do even the simplest tasks, leading them to skip meals frequently and disrupting their eating patterns and psychological function. Health awareness programs, or standards for healthy eating and technology use, can be introduced for undergraduates.

Keywords: fear of missing out, night eating syndrome, smartphone, addiction

Procedia PDF Downloads 109
1901 Determining Optimal Number of Trees in Random Forests

Authors: Songul Cinaroglu

Abstract:

Background: Random Forest is an efficient, multi-class machine learning method used for classification, regression, and other tasks. The method operates by constructing each tree from a different bootstrap sample of the data. Determining the number of trees in random forests is an open question in the literature on improving the classification performance of random forests. Aim: The aim of this study is to analyze whether there is an optimal number of trees in Random Forests and how the performance of Random Forests changes as the number of trees increases, using sample health data sets in the R programme. Method: In this study, we analyzed the performance of Random Forests as the number of trees grows, doubling the number of trees at every iteration, using the “random forest” package in the R programme. To determine the minimum and the optimal number of trees, we used the McNemar test and the Area Under the ROC Curve, respectively. Results: The analysis found that a growing number of trees does not always mean that the forest performs better than forests with fewer trees. In other words, a larger number of trees increases computational costs without necessarily improving performance. Conclusion: Although the general practice with random forests is to generate a large number of trees to obtain high performance, this study shows that increasing the number of trees does not always improve performance. Future studies can compare different kinds of data sets and different performance measures to test whether Random Forest performance changes as the number of trees increases.

Keywords: classification methods, decision trees, number of trees, random forest

Procedia PDF Downloads 305
1900 [Keynote Talk]: Existence of Random Fixed Point Theorem for Contractive Mappings

Authors: D. S. Palimkar

Abstract:

Random fixed point theory has received much attention in recent years, and it is needed for the study of various classes of random equations. The study of random fixed point theorems was initiated by the Prague school of probabilists in the 1950s. The existence and uniqueness of fixed points for self-maps of a metric space, obtained by altering distances between the points with the use of a control function, is an interesting aspect of classical fixed point theory. A new category of fixed point problems for a single self-map was introduced with the help of a control function that alters the distance between two points in a metric space, called an altering distance function. In this paper, we prove the existence and uniqueness of a random common fixed point for a pair of random mappings under a weakly contractive condition for a generalized altering distance function in Polish spaces, using the Random Common Fixed Point Theorem for Generalized Weakly Contractions.
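
For orientation, the two notions named above can be written in their usual deterministic textbook form. The symbols \psi, \varphi, T, X and d are introduced here for illustration only; the random, generalized condition actually proved in the paper is not given in the abstract and may differ.

```latex
% Altering distance (control) function, standard definition
\psi : [0,\infty) \to [0,\infty) \ \text{continuous and nondecreasing},
\qquad \psi(t) = 0 \iff t = 0 .

% Weakly contractive self-map T of a metric space (X, d),
% with \varphi of the same type as \psi
d(Tx, Ty) \;\le\; d(x, y) - \varphi\bigl(d(x, y)\bigr)
\qquad \text{for all } x, y \in X .
```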

Keywords: Polish space, random common fixed point theorem, weakly contractive mapping, altering function

Procedia PDF Downloads 195
1899 Attitude Stabilization of Satellites Using Random Dither Quantization

Authors: Kazuma Okada, Tomoaki Hashimoto, Hirokazu Tahara

Abstract:

Recently, the effectiveness of the random dither quantization method for linear feedback control systems has been shown in several papers. However, the method has not yet been applied to nonlinear feedback control systems. The objective of this paper is to verify the effectiveness of the random dither quantization method for nonlinear feedback control systems. For this purpose, we consider the attitude stabilization problem of satellites using discrete-level actuators. Specifically, this paper provides a control method based on random dither quantization for stabilizing the attitude of satellites with discrete-level actuators.
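
The idea behind random dither quantization can be sketched in a few lines of Python. The sketch assumes a uniform quantizer with step size `step` and a dither drawn uniformly from one quantization interval; the abstract does not state which quantizer or dither distribution the authors use, so the function names and these choices are illustrative only. With this kind of dither, the quantized output equals the continuous command on average, even though every individual output is one of the discrete actuator levels.

```python
import math
import random


def quantize(value, step):
    """Uniform quantizer: round value to the nearest multiple of step."""
    return step * math.floor(value / step + 0.5)


def random_dither_quantize(value, step, rng):
    """Add a uniform dither on [-step/2, step/2) and then quantize."""
    dither = (rng.random() - 0.5) * step
    return quantize(value + dither, step)


def mean_output(value, step, samples, seed=0):
    """Average of many dithered quantizer outputs for a fixed input."""
    rng = random.Random(seed)
    total = sum(random_dither_quantize(value, step, rng) for _ in range(samples))
    return total / samples
```

For example, a command of 0.3 with unit step is always quantized to 0 without dither, whereas with dither the outputs are 0 or 1 and their long-run average approaches 0.3.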

Keywords: quantized control, nonlinear systems, random dither quantization

Procedia PDF Downloads 150
1898 Analyzing the Performance of Machine Learning Models to Predict Alzheimer's Disease and its Stages Addressing Missing Value Problem

Authors: Carlos Theran, Yohn Parra Bautista, Victor Adankai, Richard Alo, Jimwi Liu, Clement G. Yedjou

Abstract:

Alzheimer's disease (AD) is a neurodegenerative disorder primarily characterized by deteriorating cognitive functions. AD has gained considerable attention in the last decade. An estimated 24 million people worldwide suffered from this disease by 2011. In 2016 an estimated 40 million were diagnosed with AD, and by 2050 the number of people affected is expected to reach 131 million. Therefore, detecting and confirming AD at its different stages is a priority for medical practice in order to provide adequate and accurate treatment. Recently, Machine Learning (ML) models have been used to study AD's stages while handling missing values in a multiclass setting, focusing on the delineation of Early Mild Cognitive Impairment (EMCI), Late Mild Cognitive Impairment (LMCI), and cognitively normal (CN) subjects. However, to the best of our knowledge, robust performance information for these models, together with an analysis of the missing data, has not been presented in the literature. In this paper, we study the performance of five different machine learning models for multiclass prediction of AD's stages in terms of accuracy, precision, and F1-score. An analysis of three imputation methods for handling the missing value problem is also presented. A framework that integrates an ML model for multiclass prediction of AD's stages is proposed, achieving an average accuracy of 84%.
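
The three reported metrics can be computed for a multiclass problem as in the following Python sketch. The abstract does not say how per-class precision and F1 are averaged, so macro-averaging (an unweighted mean over classes) is an assumption here, as are the function name `multiclass_scores` and the toy labels in the usage note.

```python
def multiclass_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro F1) for label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions = []
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Define a metric as 0.0 when its denominator is empty.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        f1_scores.append(f1)

    macro_precision = sum(precisions) / len(labels)
    macro_f1 = sum(f1_scores) / len(labels)
    return accuracy, macro_precision, macro_f1
```

A call such as `multiclass_scores(["CN", "EMCI", "LMCI"], ["CN", "EMCI", "CN"])` returns the three scores for a made-up three-subject example; the labels mirror the stages named in the abstract but the data are invented.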

Keywords: Alzheimer's disease, missing value, machine learning, performance evaluation

Procedia PDF Downloads 38
1897 Asymptotic Spectral Theory for Nonlinear Random Fields

Authors: Karima Kimouche

Abstract:

In this paper, we consider the asymptotic problems in spectral analysis of stationary causal random fields. We impose conditions only involving (conditional) moments, which are easily verifiable for a variety of nonlinear random fields. Limiting distributions of periodograms and smoothed periodogram spectral density estimates are obtained and applications to the spectral domain bootstrap are given.
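
As a concrete reference point for the periodogram mentioned above, the following Python sketch evaluates the raw periodogram of a field observed on a rectangular grid at a single frequency pair. The normalization by (2*pi)^2 times the number of grid points is one common convention and is an assumption here, as is the function name `periodogram_2d`; the sketch does no mean-centering or smoothing, so it is not the smoothed spectral density estimate studied in the paper.

```python
import cmath
import math


def periodogram_2d(field, omega1, omega2):
    """Raw periodogram of a rectangular field at frequency (omega1, omega2).

    field is a list of equal-length rows; the value at row s, column t
    is weighted by exp(-i * (omega1 * s + omega2 * t)).
    """
    n1 = len(field)
    n2 = len(field[0])
    total = 0j
    for s, row in enumerate(field):
        for t, value in enumerate(row):
            total += value * cmath.exp(-1j * (omega1 * s + omega2 * t))
    return abs(total) ** 2 / ((2 * math.pi) ** 2 * n1 * n2)
```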

Keywords: spatial nonlinear processes, spectral estimators, GMC condition, bootstrap method

Procedia PDF Downloads 358
1896 Non-Universality in Barkhausen Noise Signatures of Thin Iron Films

Authors: Arnab Roy, P. S. Anil Kumar

Abstract:

We discuss changes to the Barkhausen noise signatures of thin epitaxial Fe films upon altering the angle of the applied field. We observe a sub-critical to critical phase transition in the hysteresis loop of the sample upon increasing the out-of-plane component of the applied field. The observations are discussed in the light of simulations of a 2D Gaussian Random Field Ising Model, with reference to a reducible form of the Random Anisotropy Ising Model.

Keywords: Barkhausen noise, Planar Hall effect, Random Field Ising Model, Random Anisotropy Ising Model

Procedia PDF Downloads 298
1895 Effect of Correlation of Random Variables on Structural Reliability Index

Authors: Agnieszka Dudzik

Abstract:

The problem of correlation between random variables in structural reliability analysis has been extensively discussed in the literature. The cases considered have usually involved correlation between random variables on one side of the ultimate limit state: correlation between particular loads applied to a structure, or correlation between the resistances of particular members of a structure treated as a system. It has been shown that positive correlation between these random variables reduces the reliability of the structure and increases the probability of failure. In this paper, correlation between random variables on both sides of the limit state equation is considered. The simplest case, in which these random variables are normally distributed, is addressed. The degree of correlation is described by the covariance or the coefficient of correlation. Special attention is paid to two questions: how much does this correlation change the reliability level, and can it be ignored? The reliability analysis uses well-known methods for assessing the failure probability: the Hasofer-Lind reliability index and the Monte Carlo method adapted to the problem of correlation. The main purpose of this work is to present how the correlation of random variables influences the reliability index of steel bar structures. Structural design parameters are defined as deterministic values and random variables; the latter are correlated. The criterion of structural failure is expressed by limit functions related to the ultimate and serviceability limit states. Only the normal distribution is used to describe the random variables. The sensitivity of the reliability index to the random variables is determined. If the sensitivity of the reliability index to a random variable X is low compared with other variables, the impact of this variable on the failure probability is small, so in successive computations it can be treated as a deterministic parameter. Sensitivity analysis simplifies the description of the mathematical model and allows new limit functions and new values of the Hasofer-Lind reliability index to be determined. In the examples, the NUMPRESS software is used for the reliability analysis.
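
The effect of correlation across the limit state equation can be illustrated with the simplest textbook case: a linear limit function g = R - S with normally distributed resistance R and load effect S. For this case the reliability index has a closed form and coincides with the Hasofer-Lind index. The Python sketch below is not the NUMPRESS computation from the paper; the function names, the symbols R, S and rho, and the numbers in the usage note are assumptions chosen for illustration.

```python
import math


def reliability_index(mu_r, sigma_r, mu_s, sigma_s, rho):
    """Reliability index for g = R - S with correlated normal R and S.

    rho is the correlation coefficient between R and S.
    """
    margin_var = sigma_r ** 2 + sigma_s ** 2 - 2.0 * rho * sigma_r * sigma_s
    if margin_var <= 0.0:
        raise ValueError("safety margin has zero variance")
    return (mu_r - mu_s) / math.sqrt(margin_var)


def failure_probability(beta):
    """Failure probability Phi(-beta) for a normally distributed margin."""
    return 0.5 * math.erfc(beta / math.sqrt(2.0))
```

With mean resistance 10, mean load effect 6, and standard deviations 3 and 4, the index is 0.8 for rho = 0 and about 1.11 for rho = 0.5: in this simple case a positive correlation between the two sides of the equation shrinks the variance of the safety margin and raises the index, which shows why such correlation cannot be ignored without checking.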

Keywords: correlation of random variables, reliability index, sensitivity of reliability index, steel structure

Procedia PDF Downloads 147
1894 On Deterministic Chaos: Disclosing the Missing Mathematics from the Lorenz-Haken Equations

Authors: Meziane Belkacem

Abstract:

We aim at converting the original 3D Lorenz-Haken equations, which describe laser dynamics in terms of self-pulsing and chaos, into two second-order differential equations, from which we extract the so far missing mathematics and corroborations with respect to nonlinear interactions. Leaning on basic trigonometry, we derive important outcomes; a fundamental result attributes chaos to forbidden periodic solutions inside a precisely delimited region of the control parameter space that governs the bewildering dynamics.

Keywords: Physics, optics, nonlinear dynamics, chaos

Procedia PDF Downloads 58