Search results for: sparse regression
3291 Analysis of Factors Affecting the Number of Infant and Maternal Mortality in East Java with Geographically Weighted Bivariate Generalized Poisson Regression Method
Authors: Luh Eka Suryani, Purhadi
Abstract:
Poisson regression is a non-linear regression model with response variable in the form of count data that follows Poisson distribution. Modeling for a pair of count data that show high correlation can be analyzed by Poisson Bivariate Regression. Data, the number of infant mortality and maternal mortality, are count data that can be analyzed by Poisson Bivariate Regression. The Poisson regression assumption is an equidispersion where the mean and variance values are equal. However, the actual count data has a variance value which can be greater or less than the mean value (overdispersion and underdispersion). Violations of this assumption can be overcome by applying Generalized Poisson Regression. Characteristics of each regency can affect the number of cases occurred. This issue can be overcome by spatial analysis called geographically weighted regression. This study analyzes the number of infant mortality and maternal mortality based on conditions in East Java in 2016 using Geographically Weighted Bivariate Generalized Poisson Regression (GWBGPR) method. Modeling is done with adaptive bisquare Kernel weighting which produces 3 regency groups based on infant mortality rate and 5 regency groups based on maternal mortality rate. Variables that significantly influence the number of infant and maternal mortality are the percentages of pregnant women visit health workers at least 4 times during pregnancy, pregnant women get Fe3 tablets, obstetric complication handled, clean household and healthy behavior, and married women with the first marriage age under 18 years.Keywords: adaptive bisquare kernel, GWBGPR, infant mortality, maternal mortality, overdispersion
Procedia PDF Downloads 1593290 Determining the Causality Variables in Female Genital Mutilation: A Factor Screening Approach
Authors: Ekele Alih, Enejo Jalija
Abstract:
Female Genital Mutilation (FGM) is made up of three types namely: Clitoridectomy, Excision and Infibulation. In this study, we examine the factors responsible for FGM in order to identify the causality variables in a logistic regression approach. From the result of the survey conducted by the Public Health Division, Nigeria Institute of Medical Research, Yaba, Lagos State, the tau statistic, τ was used to screen 9 factors that causes FGM in order to select few of the predictors before multiple regression equation is obtained. The need for this may be that the sample size may not be able to sustain having a regression with all the predictors or to avoid multi-collinearity. A total of 300 respondents, comprising 150 adult males and 150 adult females were selected for the household survey based on the multi-stage sampling procedure. The tau statistic,Keywords: female genital mutilation, logistic regression, tau statistic, African society
Procedia PDF Downloads 2613289 A Monte Carlo Fuzzy Logistic Regression Framework against Imbalance and Separation
Authors: Georgios Charizanos, Haydar Demirhan, Duygu Icen
Abstract:
Two of the most impactful issues in classical logistic regression are class imbalance and complete separation. These can result in model predictions heavily leaning towards the imbalanced class on the binary response variable or over-fitting issues. Fuzzy methodology offers key solutions for handling these problems. However, most studies propose the transformation of the binary responses into a continuous format limited within [0,1]. This is called the possibilistic approach within fuzzy logistic regression. Following this approach is more aligned with straightforward regression since a logit-link function is not utilized, and fuzzy probabilities are not generated. In contrast, we propose a method of fuzzifying binary response variables that allows for the use of the logit-link function; hence, a probabilistic fuzzy logistic regression model with the Monte Carlo method. The fuzzy probabilities are then classified by selecting a fuzzy threshold. Different combinations of fuzzy and crisp input, output, and coefficients are explored, aiming to understand which of these perform better under different conditions of imbalance and separation. We conduct numerical experiments using both synthetic and real datasets to demonstrate the performance of the fuzzy logistic regression framework against seven crisp machine learning methods. The proposed framework shows better performance irrespective of the degree of imbalance and presence of separation in the data, while the considered machine learning methods are significantly impacted.Keywords: fuzzy logistic regression, fuzzy, logistic, machine learning
Procedia PDF Downloads 723288 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco
Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui
Abstract:
The logistic regression (LR) and multivariate adaptive regression spline (MarSpline) are applied and verified for analysis of landslide susceptibility map in Oudka, Morocco, using geographical information system. From spatial database containing data such as landslide mapping, topography, soil, hydrology and lithology, the eight factors related to landslides such as elevation, slope, aspect, distance to streams, distance to road, distance to faults, lithology map and Normalized Difference Vegetation Index (NDVI) were calculated or extracted. Using these factors, landslide susceptibility indexes were calculated by the two mentioned methods. Before the calculation, this database was divided into two parts, the first for the formation of the model and the second for the validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The result of this verification was that the MarSpline model is the best model with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than the LR model (success rate AUC = 0.918, rate prediction AUC = 0.901).Keywords: landslide susceptibility mapping, regression logistic, multivariate adaptive regression spline, Oudka, Taounate
Procedia PDF Downloads 1883287 Robust Variable Selection Based on Schwarz Information Criterion for Linear Regression Models
Authors: Shokrya Saleh A. Alshqaq, Abdullah Ali H. Ahmadini
Abstract:
The Schwarz information criterion (SIC) is a popular tool for selecting the best variables in regression datasets. However, SIC is defined using an unbounded estimator, namely, the least-squares (LS), which is highly sensitive to outlying observations, especially bad leverage points. A method for robust variable selection based on SIC for linear regression models is thus needed. This study investigates the robustness properties of SIC by deriving its influence function and proposes a robust SIC based on the MM-estimation scale. The aim of this study is to produce a criterion that can effectively select accurate models in the presence of vertical outliers and high leverage points. The advantages of the proposed robust SIC is demonstrated through a simulation study and an analysis of a real dataset.Keywords: influence function, robust variable selection, robust regression, Schwarz information criterion
Procedia PDF Downloads 1393286 Generalized Additive Model for Estimating Propensity Score
Authors: Tahmidul Islam
Abstract:
Propensity Score Matching (PSM) technique has been widely used for estimating causal effect of treatment in observational studies. One major step of implementing PSM is estimating the propensity score (PS). Logistic regression model with additive linear terms of covariates is most used technique in many studies. Logistics regression model is also used with cubic splines for retaining flexibility in the model. However, choosing the functional form of the logistic regression model has been a question since the effectiveness of PSM depends on how accurately the PS been estimated. In many situations, the linearity assumption of linear logistic regression may not hold and non-linear relation between the logit and the covariates may be appropriate. One can estimate PS using machine learning techniques such as random forest, neural network etc for more accuracy in non-linear situation. In this study, an attempt has been made to compare the efficacy of Generalized Additive Model (GAM) in various linear and non-linear settings and compare its performance with usual logistic regression. GAM is a non-parametric technique where functional form of the covariates can be unspecified and a flexible regression model can be fitted. In this study various simple and complex models have been considered for treatment under several situations (small/large sample, low/high number of treatment units) and examined which method leads to more covariate balance in the matched dataset. It is found that logistic regression model is impressively robust against inclusion quadratic and interaction terms and reduces mean difference in treatment and control set equally efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in both simple and complex models. The analysis also suggests that larger proportion of controls than treatment units leads to better balance for both of the methods.Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching
Procedia PDF Downloads 3673285 A Comparison of Neural Network and DOE-Regression Analysis for Predicting Resource Consumption of Manufacturing Processes
Authors: Frank Kuebler, Rolf Steinhilper
Abstract:
Artificial neural networks (ANN) as well as Design of Experiments (DOE) based regression analysis (RA) are mainly used for modeling of complex systems. Both methodologies are commonly applied in process and quality control of manufacturing processes. Due to the fact that resource efficiency has become a critical concern for manufacturing companies, these models needs to be extended to predict resource-consumption of manufacturing processes. This paper describes an approach to use neural networks as well as DOE based regression analysis for predicting resource consumption of manufacturing processes and gives a comparison of the achievable results based on an industrial case study of a turning process.Keywords: artificial neural network, design of experiments, regression analysis, resource efficiency, manufacturing process
Procedia PDF Downloads 5243284 Logistic Regression Model versus Additive Model for Recurrent Event Data
Authors: Entisar A. Elgmati
Abstract:
Recurrent infant diarrhea is studied using daily data collected in Salvador, Brazil over one year and three months. A logistic regression model is fitted instead of Aalen's additive model using the same covariates that were used in the analysis with the additive model. The model gives reasonably similar results to that using additive regression model. In addition, the problem with the estimated conditional probabilities not being constrained between zero and one in additive model is solved here. Also martingale residuals that have been used to judge the goodness of fit for the additive model are shown to be useful for judging the goodness of fit of the logistic model.Keywords: additive model, cumulative probabilities, infant diarrhoea, recurrent event
Procedia PDF Downloads 6353283 Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principals in Noisy Environments
Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic
Abstract:
Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principals of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane using spectrograms or cochleargrams as the image. In this paper atomic decomposition implemented by matching pursuit performs a transform from time series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in “weight space” where each populated T-F position contains an amplitude weight. The weight space vector along with the atomic dictionary represents a denoised, compressed version of the original signal. The arraignment or of the atomic indices in the T-F vector are used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speak has 10 sentences. Two are used for training and 8 for testing. Atomic index probabilities are created for each training sentence and also for each test sentence. Classification is performed by finding the lowest Euclidean distance between then probabilities from the training sentences and the test sentences. Training is done at a 30dB Signal-to-Noise Ratio (SNR). Testing is performed at SNR’s of 0 dB, 5 dB, 10 dB and 30dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN and produces ~93% accuracy at 0dB SNR.Keywords: time-frequency plane, atomic decomposition, envelope sampling, Gabor atoms, matching pursuit, sparse dictionary learning, sparse autoencoder
Procedia PDF Downloads 2893282 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data
Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone
Abstract:
This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as a ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease data set, the study successfully identified key factors, and the results were consistent with previous studies.Keywords: lyme disease, Poisson generalized linear model, ridge regression, lasso regression, elastic net regression
Procedia PDF Downloads 1373281 Genetic Divergence and Morphogenic Analysis of Sugarcane Red Rot Pathogen Colletotrichum falcatum under South Gujarat Condition
Authors: Prittesh Patel, Ramar Krishnamurthy
Abstract:
In the present study, nine strains of C. falcatum obtained from different places and cultivars were characterized for sporulation, growth rate, and 18S rRNA gene sequence. All isolates had characteristic fast-growing sparse and fleecy aerial mycelia on potato dextrose agar with sickle shape conidia (length x width: varied from 20.0 X 3.89 to 25.52 X 5.34 μm) and blackish to orange acervuli with setae (length x width: varied from 112.37X 2.78 to 167.66 X 6.73 μm). They could be divided into two groups on the base of morphology; P1, dense mycelia with concentric growth and P2, sparse mycelia with uneven growth. Genomic DNA isolation followed by PCR amplification with ITS1 and ITS4 primer produced ~550bp amplicons for all isolates. Phylogeny generated by 18S rRNA gene sequence confirmed the variation in isolates and mainly grouped into two clusters; cluster 1 contained CoC671 isolates (cfNAV and cfPAR) and Co86002 isolate (cfTIM). Other isolates cfMAD, cfKAM, and cfMAR were grouped into cluster 2. Remaining isolates did not fall into any cluster. Isolate cfGAN, collected from Co86032 was found highly diverse of all the nine isolates. In a nutshell, we found considerable genetic divergence and morphological variation within C. falcatum accessions collected from different areas of south Gujarat, India and these can be used for the breeding program.Keywords: Colletotrichum falcatum, ITS, morphology, red rot, sugarcane
Procedia PDF Downloads 1273280 An Analysis of the Effect of Sharia Financing and Work Relation Founding towards Non-Performing Financing in Islamic Banks in Indonesia
Authors: Muhammad Bahrul Ilmi
Abstract:
The purpose of this research is to analyze the influence of Islamic financing and work relation founding simultaneously and partially towards non-performing financing in Islamic banks. This research was regression quantitative field research, and had been done in Muammalat Indonesia Bank and Islamic Danamon Bank in 3 months. The populations of this research were 15 account officers of Muammalat Indonesia Bank and Islamic Danamon Bank in Surakarta, Indonesia. The techniques of collecting data used in this research were documentation, questionnaire, literary study and interview. Regression analysis result shows that Islamic financing and work relation founding simultaneously has positive and significant effect towards non performing financing of two Islamic Banks. It is obtained with probability value 0.003 which is less than 0.05 and F value 9.584. The analysis result of Islamic financing regression towards non performing financing shows the significant effect. It is supported by double linear regression analysis with probability value 0.001 which is less than 0.05. The regression analysis of work relation founding effect towards non-performing financing shows insignificant effect. This is shown in the double linear regression analysis with probability value 0.161 which is bigger than 0.05.Keywords: Syariah financing, work relation founding, non-performing financing (NPF), Islamic Bank
Procedia PDF Downloads 4313279 A Kolmogorov-Smirnov Type Goodness-Of-Fit Test of Multinomial Logistic Regression Model in Case-Control Studies
Authors: Chen Li-Ching
Abstract:
The multinomial logistic regression model is used popularly for inferring the relationship of risk factors and disease with multiple categories. This study based on the discrepancy between the nonparametric maximum likelihood estimator and semiparametric maximum likelihood estimator of the cumulative distribution function to propose a Kolmogorov-Smirnov type test statistic to assess adequacy of the multinomial logistic regression model for case-control data. A bootstrap procedure is presented to calculate the critical value of the proposed test statistic. Empirical type I error rates and powers of the test are performed by simulation studies. Some examples will be illustrated the implementation of the test.Keywords: case-control studies, goodness-of-fit test, Kolmogorov-Smirnov test, multinomial logistic regression
Procedia PDF Downloads 4563278 A Study on Inference from Distance Variables in Hedonic Regression
Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro
Abstract:
In urban area, several landmarks may affect housing price and rents, hedonic analysis should employ distance variables corresponding to each landmarks. Unfortunately, the effects of distances to landmarks on housing prices are generally not consistent with the true price. These distance variables may cause magnitude error in regression, pointing a problem of spatial multicollinearity. In this paper, we provided some approaches for getting the samples with less bias and method on locating the specific sampling area to avoid the multicollinerity problem in two specific landmarks case.Keywords: landmarks, hedonic regression, distance variables, collinearity, multicollinerity
Procedia PDF Downloads 4523277 Forecasting of Grape Juice Flavor by Using Support Vector Regression
Authors: Ren-Jieh Kuo, Chun-Shou Huang
Abstract:
The research of juice flavor forecasting has become more important in China. Due to the fast economic growth in China, many different kinds of juices have been introduced to the market. If a beverage company can understand their customers’ preference well, the juice can be served more attractively. Thus, this study intends to introduce the basic theory and computing process of grapes juice flavor forecasting based on support vector regression (SVR). Applying SVR, BPN and LR to forecast the flavor of grapes juice in real data, the result shows that SVR is more suitable and effective at predicting performance.Keywords: flavor forecasting, artificial neural networks, Support Vector Regression, China
Procedia PDF Downloads 4923276 Estimation of Coefficients of Ridge and Principal Components Regressions with Multicollinear Data
Authors: Rajeshwar Singh
Abstract:
The presence of multicollinearity is common in handling with several explanatory variables simultaneously due to exhibiting a linear relationship among them. A great problem arises in understanding the impact of explanatory variables on the dependent variable. Thus, the method of least squares estimation gives inexact estimates. In this case, it is advised to detect its presence first before proceeding further. Using the ridge regression degree of its occurrence is reduced but principal components regression gives good estimates in this situation. This paper discusses well-known techniques of the ridge and principal components regressions and applies to get the estimates of coefficients by both techniques. In addition to it, this paper also discusses the conflicting claim on the discovery of the method of ridge regression based on available documents.Keywords: conflicting claim on credit of discovery of ridge regression, multicollinearity, principal components and ridge regressions, variance inflation factor
Procedia PDF Downloads 4183275 Performance Comparison of Wideband Covariance Matrix Sparse Representation (W-CMSR) with Other Wideband DOA Estimation Methods
Authors: Sandeep Santosh, O. P. Sahu
Abstract:
In this paper, performance comparison of wideband covariance matrix sparse representation (W-CMSR) method with other existing wideband Direction of Arrival (DOA) estimation methods has been made.W-CMSR relies less on a priori information of the incident signal number than the ordinary subspace based methods.Consider the perturbation free covariance matrix of the wideband array output. The diagonal covariance elements are contaminated by unknown noise variance. The covariance matrix of array output is conjugate symmetric i.e its upper right triangular elements can be represented by lower left triangular ones.As the main diagonal elements are contaminated by unknown noise variance,slide over them and align the lower left triangular elements column by column to obtain a measurement vector.Simulation results for W-CMSR are compared with simulation results of other wideband DOA estimation methods like Coherent signal subspace method (CSSM), Capon, l1-SVD, and JLZA-DOA. W-CMSR separate two signals very clearly and CSSM, Capon, L1-SVD and JLZA-DOA fail to separate two signals clearly and an amount of pseudo peaks exist in the spectrum of L1-SVD.Keywords: W-CMSR, wideband direction of arrival (DOA), covariance matrix, electrical and computer engineering
Procedia PDF Downloads 4713274 Physical Activity and Mental Health: A Cross-Sectional Investigation into the Relationship of Specific Physical Activity Domains and Mental Well-Being
Authors: Katja Siefken, Astrid Junge
Abstract:
Background: Research indicates that physical activity (PA) protects us from developing mental disorders. The knowledge regarding optimal domain, intensity, type, context, and amount of PA promotion for the prevention of mental disorders is sparse and incoherent. The objective of this study is to determine the relationship between PA domains and mental well-being, and whether associations vary by domain, amount, context, intensity, and type of PA. Methods: 310 individuals (age: 25 yrs., SD 7; 73% female) completed a questionnaire on personal patterns of their PA behaviour (IPQA) and their mental health (Centre of Epidemiologic Studies Depression Scale (CES-D), Generalized Anxiety Disorder (GAD-7) scale, the subjective physical well-being (FEW-16)). Linear and multiple regression were used for analysis. Findings: Individuals who met the PA recommendation (N=269) reported higher scores on subjective physical well-being than those who did not meet the PA recommendations (N=41). Whilst vigorous intensity PA predicts subjective well-being (β = .122, p = .028), it also correlates with depression. The more vigorously physically active a person is, the higher the depression score (β = .127, p = .026). The strongest impact of PA on mental well-being can be seen in the transport domain. A positive linear correlation on subjective physical well-being (β =.175, p = .002), and a negative linear correlation for anxiety (β =-.142, p = .011) and depression (β = -.164, p = .004) was found. Multiple regression analysis indicates similar results: Time spent in active transport on the bicycle significantly lowers anxiety and depression scores and enhances subjective physical well-being. The more time a participant spends using the bicycle for transport, the lower the depression (β = -.143, p = .013) and anxiety scores (β = -.111,p = .050). Conclusions: Meeting the PA recommendations enhances subjective physical well-being. Active transport has a substantial impact on mental well-being. Findings have implications for policymakers, employers, public health experts and civil society. A stronger focus on the promotion and protection of health through active transport is recommended. Inter-sectoral exchange, outside the health sector, is required. Health systems must engage other sectors in adopting policies that maximize possible health gains.Keywords: active transport, mental well-being, health promotion, psychological disorders
Procedia PDF Downloads 3203273 Estimate of Maximum Expected Intensity of One-Half-Wave Lines Dancing
Authors: A. Bekbaev, M. Dzhamanbaev, R. Abitaeva, A. Karbozova, G. Nabyeva
Abstract:
In this paper, the regression dependence of dancing intensity from wind speed and length of span was established due to the statistic data obtained from multi-year observations on line wires dancing accumulated by power systems of Kazakhstan and the Russian Federation. The lower and upper limitations of the equations parameters were estimated, as well as the adequacy of the regression model. The constructed model will be used in research of dancing phenomena for the development of methods and means of protection against dancing and for zoning plan of the territories of line wire dancing.Keywords: power lines, line wire dancing, dancing intensity, regression equation, dancing area intensity
Procedia PDF Downloads 3113272 Incorporating Anomaly Detection in a Digital Twin Scenario Using Symbolic Regression
Authors: Manuel Alves, Angelica Reis, Armindo Lobo, Valdemar Leiras
Abstract:
In industry 4.0, it is common to have a lot of sensor data. In this deluge of data, hints of possible problems are difficult to spot. The digital twin concept aims to help answer this problem, but it is mainly used as a monitoring tool to handle the visualisation of data. Failure detection is of paramount importance in any industry, and it consumes a lot of resources. Any improvement in this regard is of tangible value to the organisation. The aim of this paper is to add the ability to forecast test failures, curtailing detection times. To achieve this, several anomaly detection algorithms were compared with a symbolic regression approach. To this end, Isolation Forest, One-Class SVM and an auto-encoder have been explored. For the symbolic regression PySR library was used. The first results show that this approach is valid and can be added to the tools available in this context as a low resource anomaly detection method since, after training, the only requirement is the calculation of a polynomial, a useful feature in the digital twin context.Keywords: anomaly detection, digital twin, industry 4.0, symbolic regression
Procedia PDF Downloads 1203271 Impact of Infrastructural Development on Socio-Economic Growth: An Empirical Investigation in India
Authors: Jonardan Koner
Abstract:
The study attempts to find out the impact of infrastructural investment on state economic growth in India. It further tries to determine the magnitude of the impact of infrastructural investment on economic indicator, i.e., per-capita income (PCI) in Indian States. The study uses panel regression technique to measure the impact of infrastructural investment on per-capita income (PCI) in Indian States. Panel regression technique helps incorporate both the cross-section and time-series aspects of the dataset. In order to analyze the difference in impact of the explanatory variables on the explained variables across states, the study uses Fixed Effect Panel Regression Model. The conclusions of the study are that infrastructural investment has a desirable impact on economic development and that the impact is different for different states in India. We analyze time series data (annual frequency) ranging from 1991 to 2010. The study reveals that the infrastructural investment significantly explains the variation of economic indicators.Keywords: infrastructural investment, multiple regression, panel regression techniques, economic development, fixed effect dummy variable model
Procedia PDF Downloads 3713270 A Quadratic Model to Early Predict the Blastocyst Stage with a Time Lapse Incubator
Authors: Cecile Edel, Sandrine Giscard D'Estaing, Elsa Labrune, Jacqueline Lornage, Mehdi Benchaib
Abstract:
Introduction: The use of incubator equipped with time-lapse technology in Artificial Reproductive Technology (ART) allows a continuous surveillance. With morphocinetic parameters, algorithms are available to predict the potential outcome of an embryo. However, the different proposed time-lapse algorithms do not take account the missing data, and then some embryos could not be classified. The aim of this work is to construct a predictive model even in the case of missing data. Materials and methods: Patients: A retrospective study was performed, in biology laboratory of reproduction at the hospital ‘Femme Mère Enfant’ (Lyon, France) between 1 May 2013 and 30 April 2015. Embryos (n= 557) obtained from couples (n=108) were cultured in a time-lapse incubator (Embryoscope®, Vitrolife, Goteborg, Sweden). Time-lapse incubator: The morphocinetic parameters obtained during the three first days of embryo life were used to build the predictive model. Predictive model: A quadratic regression was performed between the number of cells and time. N = a. T² + b. T + c. N: number of cells at T time (T in hours). The regression coefficients were calculated with Excel software (Microsoft, Redmond, WA, USA), a program with Visual Basic for Application (VBA) (Microsoft) was written for this purpose. The quadratic equation was used to find a value that allows to predict the blastocyst formation: the synthetize value. The area under the curve (AUC) obtained from the ROC curve was used to appreciate the performance of the regression coefficients and the synthetize value. A cut-off value has been calculated for each regression coefficient and for the synthetize value to obtain two groups where the difference of blastocyst formation rate according to the cut-off values was maximal. The data were analyzed with SPSS (IBM, Il, Chicago, USA). Results: Among the 557 embryos, 79.7% had reached the blastocyst stage. The synthetize value corresponds to the value calculated with time value equal to 99, the highest AUC was then obtained. The AUC for regression coefficient ‘a’ was 0.648 (p < 0.001), 0.363 (p < 0.001) for the regression coefficient ‘b’, 0.633 (p < 0.001) for the regression coefficient ‘c’, and 0.659 (p < 0.001) for the synthetize value. The results are presented as follow: blastocyst formation rate under cut-off value versus blastocyst rate formation above cut-off value. For the regression coefficient ‘a’ the optimum cut-off value was -1.14.10-3 (61.3% versus 84.3%, p < 0.001), 0.26 for the regression coefficient ‘b’ (83.9% versus 63.1%, p < 0.001), -4.4 for the regression coefficient ‘c’ (62.2% versus 83.1%, p < 0.001) and 8.89 for the synthetize value (58.6% versus 85.0%, p < 0.001). Conclusion: This quadratic regression allows to predict the outcome of an embryo even in case of missing data. Three regression coefficients and a synthetize value could represent the identity card of an embryo. ‘a’ regression coefficient represents the acceleration of cells division, ‘b’ regression coefficient represents the speed of cell division. We could hypothesize that ‘c’ regression coefficient could represent the intrinsic potential of an embryo. This intrinsic potential could be dependent from oocyte originating the embryo. These hypotheses should be confirmed by studies analyzing relationship between regression coefficients and ART parameters.Keywords: ART procedure, blastocyst formation, time-lapse incubator, quadratic model
Procedia PDF Downloads 3063269 Two-Phase Sampling for Estimating a Finite Population Total in Presence of Missing Values
Authors: Daniel Fundi Murithi
Abstract:
Missing data is a real bane in many surveys. To overcome the problems caused by missing data, partial deletion, and single imputation methods, among others, have been proposed. However, problems such as discarding usable data and inaccuracy in reproducing known population parameters and standard errors are associated with them. For regression and stochastic imputation, it is assumed that there is a variable with complete cases to be used as a predictor in estimating missing values in the other variable, and the relationship between the two variables is linear, which might not be realistic in practice. In this project, we estimate population total in presence of missing values in two-phase sampling. Instead of regression or stochastic models, non-parametric model based regression model is used in imputing missing values. Empirical study showed that nonparametric model-based regression imputation is better in reproducing variance of population total estimate obtained when there were no missing values compared to mean, median, regression, and stochastic imputation methods. Although regression and stochastic imputation were better than nonparametric model-based imputation in reproducing population total estimates obtained when there were no missing values in one of the sample sizes considered, nonparametric model-based imputation may be used when the relationship between outcome and predictor variables is not linear.Keywords: finite population total, missing data, model-based imputation, two-phase sampling
Procedia PDF Downloads 1303268 A Novel Approach towards Test Case Prioritization Technique
Authors: Kamna Solanki, Yudhvir Singh, Sandeep Dalal
Abstract:
Software testing is a time and cost intensive process. A scrutiny of the code and rigorous testing is required to identify and rectify the putative bugs. The process of bug identification and its consequent correction is continuous in nature and often some of the bugs are removed after the software has been launched in the market. This process of code validation of the altered software during the maintenance phase is termed as Regression testing. Regression testing ubiquitously considers resource constraints; therefore, the deduction of an appropriate set of test cases, from the ensemble of the entire gamut of test cases, is a critical issue for regression test planning. This paper presents a novel method for designing a suitable prioritization process to optimize fault detection rate and performance of regression test on predefined constraints. The proposed method for test case prioritization m-ACO alters the food source selection criteria of natural ants and is basically a modified version of Ant Colony Optimization (ACO). The proposed m-ACO approach has been coded in 'Perl' language and results are validated using three examples by computation of Average Percentage of Faults Detected (APFD) metric.Keywords: regression testing, software testing, test case prioritization, test suite optimization
Procedia PDF Downloads 3383267 Prediction of the Thermodynamic Properties of Hydrocarbons Using Gaussian Process Regression
Authors: N. Alhazmi
Abstract:
Knowing the thermodynamics properties of hydrocarbons is vital when it comes to analyzing the related chemical reaction outcomes and understanding the reaction process, especially in terms of petrochemical industrial applications, combustions, and catalytic reactions. However, measuring the thermodynamics properties experimentally is time-consuming and costly. In this paper, Gaussian process regression (GPR) has been used to directly predict the main thermodynamic properties - standard enthalpy of formation, standard entropy, and heat capacity -for more than 360 cyclic and non-cyclic alkanes, alkenes, and alkynes. A simple workflow has been proposed that can be applied to directly predict the main properties of any hydrocarbon by knowing its descriptors and chemical structure and can be generalized to predict the main properties of any material. The model was evaluated by calculating the statistical error R², which was more than 0.9794 for all the predicted properties.Keywords: thermodynamic, Gaussian process regression, hydrocarbons, regression, supervised learning, entropy, enthalpy, heat capacity
Procedia PDF Downloads 2223266 Solving Single Machine Total Weighted Tardiness Problem Using Gaussian Process Regression
Authors: Wanatchapong Kongkaew
Abstract:
This paper proposes an application of probabilistic technique, namely Gaussian process regression, for estimating an optimal sequence of the single machine with total weighted tardiness (SMTWT) scheduling problem. In this work, the Gaussian process regression (GPR) model is utilized to predict an optimal sequence of the SMTWT problem, and its solution is improved by using an iterated local search based on simulated annealing scheme, called GPRISA algorithm. The results show that the proposed GPRISA method achieves a very good performance and a reasonable trade-off between solution quality and time consumption. Moreover, in the comparison of deviation from the best-known solution, the proposed mechanism noticeably outperforms the recently existing approaches.Keywords: Gaussian process regression, iterated local search, simulated annealing, single machine total weighted tardiness
Procedia PDF Downloads 3093265 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation
Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski
Abstract:
Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.Keywords: bootstrap, edgeworth approximation, IID, quantile
Procedia PDF Downloads 1593264 A Picture is worth a Billion Bits: Real-Time Image Reconstruction from Dense Binary Pixels
Authors: Tal Remez, Or Litany, Alex Bronstein
Abstract:
The pursuit of smaller pixel sizes at ever increasing resolution in digital image sensors is mainly driven by the stringent price and form-factor requirements of sensors and optics in the cellular phone market. Recently, Eric Fossum proposed a novel concept of an image sensor with dense sub-diffraction limit one-bit pixels (jots), which can be considered a digital emulation of silver halide photographic film. This idea has been recently embodied as the EPFL Gigavision camera. A major bottleneck in the design of such sensors is the image reconstruction process, producing a continuous high dynamic range image from oversampled binary measurements. The extreme quantization of the Poisson statistics is incompatible with the assumptions of most standard image processing and enhancement frameworks. The recently proposed maximum-likelihood (ML) approach addresses this difficulty, but suffers from image artifacts and has impractically high computational complexity. In this work, we study a variant of a sensor with binary threshold pixels and propose a reconstruction algorithm combining an ML data fitting term with a sparse synthesis prior. We also show an efficient hardware-friendly real-time approximation of this inverse operator. Promising results are shown on synthetic data as well as on HDR data emulated using multiple exposures of a regular CMOS sensor.Keywords: binary pixels, maximum likelihood, neural networks, sparse coding
Procedia PDF Downloads 2013263 A Hybrid Classical-Quantum Algorithm for Boundary Integral Equations of Scattering Theory
Authors: Damir Latypov
Abstract:
A hybrid classical-quantum algorithm to solve boundary integral equations (BIE) arising in problems of electromagnetic and acoustic scattering is proposed. The quantum speed-up is due to a Quantum Linear System Algorithm (QLSA). The original QLSA of Harrow et al. provides an exponential speed-up over the best-known classical algorithms but only in the case of sparse systems. Due to the non-local nature of integral operators, matrices arising from discretization of BIEs, are, however, dense. A QLSA for dense matrices was introduced in 2017. Its runtime as function of the system's size N is bounded by O(√Npolylog(N)). The run time of the best-known classical algorithm for an arbitrary dense matrix scales as O(N².³⁷³). Instead of exponential as in case of sparse matrices, here we have only a polynomial speed-up. Nevertheless, sufficiently high power of this polynomial, ~4.7, should make QLSA an appealing alternative. Unfortunately for the QLSA, the asymptotic separability of the Green's function leads to high compressibility of the BIEs matrices. Classical fast algorithms such as Multilevel Fast Multipole Method (MLFMM) take advantage of this fact and reduce the runtime to O(Nlog(N)), i.e., the QLSA is only quadratically faster than the MLFMM. To be truly impactful for computational electromagnetics and acoustics engineers, QLSA must provide more substantial advantage than that. We propose a computational scheme which combines elements of the classical fast algorithms with the QLSA to achieve the required performance.Keywords: quantum linear system algorithm, boundary integral equations, dense matrices, electromagnetic scattering theory
Procedia PDF Downloads 1543262 Time Series Regression with Meta-Clusters
Authors: Monika Chuchro
Abstract:
This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain a subgroups of time series data with normal distribution from inflow into waste water treatment plant data which Composed of several groups differing by mean value. Two simple algorithms: K-mean and EM were chosen as a clustering method. The rand index was used to measure the similarity. After simple meta-clustering, regression model was performed for each subgroups. The final model was a sum of subgroups models. The quality of obtained model was compared with the regression model made using the same explanatory variables but with no clustering of data. Results were compared by determination coefficient (R2), measure of prediction accuracy mean absolute percentage error (MAPE) and comparison on linear chart. Preliminary results allows to foresee the potential of the presented technique.Keywords: clustering, data analysis, data mining, predictive models
Procedia PDF Downloads 466