Search results for: multivariate regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3507

3477 Introduction of Robust Multivariate Process Capability Indices

Authors: Behrooz Khalilloo, Hamid Shahriari, Emad Roghanian

Abstract:

Process capability indices (PCIs) are central to statistical quality control: they measure how capable a process is and how well it meets given specifications. An important issue in statistical quality control is parameter estimation. Under the assumption of multivariate normality, the distribution parameters (the mean vector and the variance-covariance matrix) must be estimated when they are unknown. Classical estimation methods such as the method of moments (MME) and maximum likelihood estimation (MLE) estimate the population parameters well when the data are not contaminated. When outliers are present, however, MME and MLE yield poor estimates of the population parameters, so estimators that remain reliable in the presence of outliers are needed. In this work, robust M-estimators are used to estimate these parameters, and robust process capability indices are introduced based on the robust parameter estimates. The performance of these robust estimators in the presence of outliers, and their effect on the process capability indices, is evaluated using real and simulated multivariate data. The results indicate that the proposed robust capability indices perform much better than the existing process capability indices.
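To illustrate why an M-estimator resists outliers where the sample mean does not, here is a minimal sketch of a Huber location estimate computed by iterative reweighting. It is a univariate simplification, not the multivariate estimator of the paper; the function name `huber_location`, the tuning constant `c=1.345`, and the MAD-based scale are assumptions chosen for illustration.

```python
import statistics


def huber_location(xs, c=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging.

    Assumption: the scale is fixed at the normal-consistent MAD
    (1.4826 * median absolute deviation) rather than estimated jointly.
    """
    mu = statistics.median(xs)
    mad = statistics.median(abs(x - mu) for x in xs)
    scale = 1.4826 * mad if mad > 0 else 1.0
    cutoff = c * scale
    for _ in range(max_iter):
        # Points within the cutoff get full weight; distant points are
        # down-weighted in proportion to their distance from the estimate.
        weights = [
            1.0 if abs(x - mu) <= cutoff else cutoff / abs(x - mu)
            for x in xs
        ]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

On a hypothetical sample such as `[9.8, 10.1, 10.0, 9.9, 10.2, 50.0]`, the sample mean is pulled far toward the single outlier, while the Huber estimate stays close to the bulk of the data.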

Keywords: multivariate process capability indices, robust M-estimator, outlier, multivariate quality control, statistical quality control

Procedia PDF Downloads 250
3476 A Non-parametric Clustering Approach for Multivariate Geostatistical Data

Authors: Francky Fouedjio

Abstract:

Multivariate geostatistical data have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are more similar to each other while the clusters themselves differ, in some sense. Spatially contiguous clusters can significantly improve interpretation, since the resulting clusters can be read as meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the spatial dependence structure of the data. It integrates existing methods to find the optimal number of clusters and to evaluate the contribution of variables to the clustering. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is assessed using a bivariate synthetic dataset and a multivariate geochemical dataset. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.

Keywords: clustering, geostatistics, multivariate data, non-parametric

Procedia PDF Downloads 454
3475 Hospital Malnutrition and its Impact on 30-day Mortality in Hospitalized General Medicine Patients in a Tertiary Hospital in South India

Authors: Vineet Agrawal, Deepanjali S., Medha R., Subitha L.

Abstract:

Background. Hospital malnutrition is a highly prevalent issue and is known to increase morbidity, mortality, length of hospital stay, and cost of care. In India, studies on hospital malnutrition have been restricted to ICU, post-surgical, and cancer patients. We designed this study to assess the impact of hospital malnutrition on 30-day post-discharge and in-hospital mortality in patients admitted to the general medicine department, irrespective of diagnosis. Methodology. All patients aged above 18 years admitted to the medicine wards, excluding medico-legal cases, were enrolled in the study. Nutritional assessment was done within 72 h of admission using Subjective Global Assessment (SGA), which classifies patients into three categories: severely malnourished, mildly/moderately malnourished, and normal/well-nourished. Anthropometric measurements such as Body Mass Index (BMI), triceps skin-fold thickness (TSF), and mid-upper arm circumference (MUAC) were also performed. Patients were followed up during their hospital stay and 30 days after discharge through telephone interview, and their final diagnosis, comorbidities, and cause of death were noted. Multivariate logistic regression and a Cox regression model were used to determine whether nutritional status at admission independently impacted mortality at one month. Results. The prevalence of malnourishment by SGA in our study was 67.3% among 395 hospitalized patients, of whom 155 patients (39.2%) were moderately malnourished and 111 (28.1%) were severely malnourished. Of 395 patients, 61 patients (15.4%) expired, of whom 30 died in the hospital and 31 died within 1 month of discharge from hospital. On univariate analysis, malnourished patients had significantly higher mortality (24.3% in 111 Cat C patients) than well-nourished patients (10.1% in 129 Cat A patients), with OR 9.17, p-value 0.007.
On multivariate logistic regression, age and higher Charlson Comorbidity Index (CCI) were independently associated with mortality. Higher CCI indicates higher burden of comorbidities on admission, and the CCI in the expired patient group (mean=4.38) was significantly higher than that of the alive cohort (mean=2.85). Though malnutrition significantly contributed to higher mortality on univariate analysis, it was not an independent predictor of outcome on multivariate logistic regression. Length of hospitalisation was also longer in the malnourished group (mean= 9.4 d) compared to the well-nourished group (mean= 8.03 d) with a trend towards significance (p=0.061). None of the anthropometric measurements like BMI, MUAC, or TSF showed any association with mortality or length of hospitalisation. Inference. The results of our study highlight the issue of hospital malnutrition in medicine wards and reiterate that malnutrition contributes significantly to patient outcomes. We found that SGA performs better than anthropometric measurements in assessing under-nutrition. We are of the opinion that the heterogeneity of the study population by diagnosis was probably the primary reason why malnutrition by SGA was not found to be an independent risk factor for mortality. Strategies to identify high-risk patients at admission and treat malnutrition in the hospital and post-discharge are needed.

Keywords: hospitalization outcome, length of hospital stay, mortality, malnutrition, subjective global assessment (SGA)

Procedia PDF Downloads 118
3474 On the Bootstrap P-Value Method in Identifying out of Control Signals in Multivariate Control Chart

Authors: O. Ikpotokin

Abstract:

In any production process, every product is meant to attain a certain standard, but the presence of assignable causes of variability affects the process and leads to low product quality. Identifying and removing this type of variability reduces its overall effect and thereby improves the quality of the product. When a univariate control chart signals, it is easy to detect the problem and provide a solution, since the signal relates to a single quality characteristic. With a multivariate control chart, however, the problems are the violation of the multivariate normality assumption and the difficulty of identifying the quality characteristic(s) responsible for the out-of-control signals. The purpose of this paper is to examine the use of a non-parametric control chart (the bootstrap approach) for obtaining control limits that overcome the problem of the multivariate distributional assumption, and the p-value method for detecting out-of-control signals. Results from a performance study show that the proposed bootstrap method yields control limits that enhance the detection of out-of-control signals, while the p-value method improves the identification of out-of-control variables.
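The idea of a bootstrap control limit can be sketched as follows. This is one common percentile-style construction under stated assumptions, not necessarily the procedure used in the paper: `stats` stands for a hypothetical list of monitoring statistics (for example, one value per in-control sample), and the limit is taken as the average of the resampled upper quantiles, so no distributional form is assumed.

```python
import random


def bootstrap_control_limit(stats, alpha=0.05, n_boot=2000, seed=0):
    """Upper control limit as the mean bootstrap (1 - alpha) quantile.

    `stats` is assumed to hold in-control values of a monitoring statistic;
    `alpha`, `n_boot` and `seed` are illustrative defaults.
    """
    rng = random.Random(seed)
    n = len(stats)
    # Index of the empirical (1 - alpha) quantile in a sorted resample.
    idx = min(n - 1, int((1 - alpha) * n))
    quantiles = []
    for _ in range(n_boot):
        # Resample with replacement and record the upper quantile.
        resample = sorted(rng.choice(stats) for _ in range(n))
        quantiles.append(resample[idx])
    return sum(quantiles) / n_boot
```

A new observation whose statistic exceeds the returned limit would be flagged as a potential out-of-control signal; identifying which variable caused it is a separate step.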

Keywords: bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics

Procedia PDF Downloads 320
3473 Full Mini Nutritional Assessment Questionnaire and the Risk of Malnutrition and Mortality in Elderly, Hospitalized Patients: A Cross-Sectional Study

Authors: Christos E. Lampropoulos, Maria Konsta, Tamta Sirbilatze, Ifigenia Apostolou, Vicky Dradaki, Konstantina Panouria, Irini Dri, Christina Kordali, Vaggelis Lambas, Georgios Mavras

Abstract:

Objectives: The full Mini Nutritional Assessment (MNA) questionnaire is one of the most useful tools for diagnosing malnutrition in hospitalized patients, a condition related to increased morbidity and mortality. The purpose of our study was to assess the nutritional status of elderly, hospitalized patients and examine the hypothesis that MNA may predict mortality and extension of hospitalization. Methods: One hundred fifty patients (78 men, 72 women, mean age 80±8.2) were included in this cross-sectional study. The following data were taken into account in the analysis: anthropometric and laboratory data, physical activity (International Physical Activity Questionnaires, IPAQ), smoking status, dietary habits, cause and duration of current admission, and medical history (co-morbidities, previous admissions). Primary endpoints were mortality (from admission until 6 months afterwards) and duration of admission. The latter was compared to national guidelines for closed consolidated medical expenses. Logistic regression and linear regression analyses were performed in order to identify independent predictors of mortality and extended hospitalization, respectively. Results: According to MNA, nutrition was normal in 54/150 (36%) of patients, 46/150 (30.7%) were at risk of malnutrition, and the remaining 50/150 (33.3%) were malnourished. After performing multivariate logistic regression analysis, we found that the odds of death decreased by 20% for each unit increase in full MNA score (OR=0.8, 95% CI 0.74-0.89, p < 0.0001). Patients admitted due to cancer were 23 times more likely to die compared to those with infection (OR=23, 95% CI 3.8-141.6, p=0.001). Similarly, patients admitted due to stroke were 7 times more likely to die (OR=7, 95% CI 1.4-34.5, p=0.02), while those with all other causes of admission were less likely to die (OR=0.2, 95% CI 0.06-0.8, p=0.03), compared to patients with infection.
According to multivariate linear regression analysis, each unit increase in full MNA score decreased the admission duration by 0.3 days on average (b: -0.3, 95% CI -0.45 to -0.15, p < 0.0001). Patients admitted due to cancer had on average a 6.8-day longer extension of hospitalization compared to those admitted for infection (b: 6.8, 95% CI 3.2-10.3, p < 0.0001). Conclusion: Mortality and extension of hospitalization are significantly increased in elderly, malnourished patients. Full MNA score is a useful diagnostic tool for malnutrition.
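The reported effect sizes can be read with a little arithmetic. The sketch below shows how an odds ratio per unit translates into a percent change in odds, and how a linear coefficient scales over several units. The helper names are hypothetical, and both calculations assume the effect is constant per unit, as the fitted models imply.

```python
def percent_change_in_odds(odds_ratio, units=1):
    """Percent change in the odds for a `units`-step increase in a predictor.

    Odds ratios multiply, so a k-unit increase scales the odds by OR ** k.
    """
    return (odds_ratio ** units - 1.0) * 100.0


def expected_change(coefficient, units=1):
    """Expected change in a linear-regression outcome for a `units`-step increase."""
    return coefficient * units


# OR = 0.8 per MNA point: odds of death fall by about 20% per point.
one_point = percent_change_in_odds(0.8)
# Over five points the odds fall by about 67%, not 100%, because the
# ratios compound multiplicatively rather than adding up.
five_points = percent_change_in_odds(0.8, units=5)
# b = -0.3 days per point: ten points correspond to about 3 fewer days.
ten_point_days = expected_change(-0.3, units=10)
```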

Keywords: duration of admission, malnutrition, mini nutritional assessment score, prognostic factors for mortality

Procedia PDF Downloads 290
3472 Replicating Brain’s Resting State Functional Connectivity Network Using a Multi-Factor Hub-Based Model

Authors: B. L. Ho, L. Shi, D. F. Wang, V. C. T. Mok

Abstract:

The brain’s functional connectivity, while temporally non-stationary, does express consistency at a macro spatial level. The study of stable resting state connectivity patterns hence provides opportunities for identifying diseases if such stability is severely perturbed. A mathematical model replicating the brain’s spatial connections will be useful for understanding the brain’s representative geometry and complements the empirical model where it falls short. Empirical computations tend to involve large matrices and become infeasible with fine parcellation. However, the proposed analytical model has no such computational problems. To improve replicability, data from 92 subjects were obtained from two open sources. The proposed methodology, inspired by financial theory, uses multivariate regression to find relationships of every cortical region of interest (ROI) with some pre-identified hubs. These hubs act as representatives for the entire cortical surface. A variance-covariance framework of all ROIs is then built based on these relationships to link up all the ROIs. The result is a high level of match between model and empirical correlations, in the range of 0.59 to 0.66 after adjusting for sample size, an increase of almost forty percent. More significantly, the model framework provides an intuitive way to distinguish systemic drivers from idiosyncratic noise while reducing dimensions by more than 30-fold, hence providing a way to conduct attribution analysis. Due to its analytical nature and simple structure, the model is useful as a standalone toolkit for network dependency analysis or as a module for other mathematical models.

Keywords: functional magnetic resonance imaging, multivariate regression, network hubs, resting state functional connectivity

Procedia PDF Downloads 127
3471 Orthogonal Regression for Nonparametric Estimation of Errors-In-Variables Models

Authors: Anastasiia Yu. Timofeeva

Abstract:

Two new algorithms for nonparametric estimation of errors-in-variables models are proposed. The first algorithm is based on a penalized regression spline. The spline is represented as a piecewise-linear function, and orthogonal regression is estimated for each linear portion. This algorithm is iterative. The second algorithm involves locally weighted regression estimation. When the independent variable is measured with error, such estimation is a complex nonlinear optimization problem. The simulation results show the advantage of the second algorithm under the assumption that the true values of the smoothing parameters are known. Nevertheless, selecting the smoothing parameters with some indexes of fit gives similar results and has an oversmoothing effect.
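For a single linear portion, orthogonal regression has a closed form. The sketch below fits a straight line by minimizing perpendicular distances, which corresponds to assuming equal error variance in both variables. It illustrates only this building block, not the spline-based or locally weighted algorithms proposed in the paper, and the function name is hypothetical.

```python
import math


def orthogonal_regression(xs, ys):
    """Fit y = intercept + slope * x by minimizing perpendicular distances.

    Assumes equal measurement-error variance in x and y and a non-zero
    sample covariance (otherwise the slope is not defined by this formula).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("covariance is zero; slope is undefined")
    # Closed-form slope of the first principal axis of the centered data.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Unlike ordinary least squares, this fit treats x and y symmetrically, which is the point when the independent variable is itself measured with error.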

Keywords: grade point average, orthogonal regression, penalized regression spline, locally weighted regression

Procedia PDF Downloads 383
3470 A Learning-Based EM Mixture Regression Algorithm

Authors: Yi-Cheng Tian, Miin-Shen Yang

Abstract:

The mixture likelihood approach to clustering is a popular clustering method in which the expectation and maximization (EM) algorithm is the most widely used mixture likelihood method. In the literature, the EM algorithm has been used for mixture regression models. However, these EM mixture regression algorithms are sensitive to initial values and require an a priori number of clusters. In this paper, to resolve these drawbacks, we construct a learning-based schema for the EM mixture regression algorithm such that it is free of initializations and can automatically obtain an approximately optimal number of clusters. Some numerical examples and comparisons demonstrate the superiority and usefulness of the proposed learning-based EM mixture regression algorithm.

Keywords: clustering, EM algorithm, Gaussian mixture model, mixture regression model

Procedia PDF Downloads 478
3469 The Prognostic Prediction Value of Positive Lymph Nodes Numbers for the Hypopharyngeal Squamous Cell Carcinoma

Authors: Wendu Pang, Yaxin Luo, Junhong Li, Yu Zhao, Danni Cheng, Yufang Rao, Minzi Mao, Ke Qiu, Yijun Dong, Fei Chen, Jun Liu, Jian Zou, Haiyang Wang, Wei Xu, Jianjun Ren

Abstract:

We aimed to compare the prognostic prediction value of positive lymph node number (PLNN) to the American Joint Committee on Cancer (AJCC) tumor, lymph node, and metastasis (TNM) staging system for patients with hypopharyngeal squamous cell carcinoma (HPSCC). A total of 826 patients with HPSCC from the Surveillance, Epidemiology, and End Results database (2004–2015) were identified and split into two independent cohorts: training (n=461) and validation (n=365). Univariate and multivariate Cox regression analyses were used to evaluate the prognostic effects of PLNN in patients with HPSCC. We further applied six Cox regression models to compare the survival predictive values of the PLNN and AJCC TNM staging system. PLNN showed a significant association with overall survival (OS) and cancer-specific survival (CSS) (P < 0.001) in both univariate and multivariable analyses, and was divided into three groups (PLNN 0, PLNN 1-5, and PLNN>5). In the training cohort, multivariate analysis revealed that the increased PLNN of HPSCC gave rise to significantly poor OS and CSS after adjusting for age, sex, tumor size, and cancer stage; this trend was also verified by the validation cohort. Additionally, the survival model incorporating a composite of PLNN and TNM classification (C-index, 0.705, 0.734) performed better than the PLNN and AJCC TNM models. PLNN can serve as a powerful survival predictor for patients with HPSCC and is a surrogate supplement for cancer staging systems.

Keywords: hypopharyngeal squamous cell carcinoma, positive lymph nodes number, prognosis, prediction models, survival predictive values

Procedia PDF Downloads 113
3468 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture

Authors: Thrivikraman Aswathi, S. Advaith

Abstract:

As there can be cases where the use of real data is somehow limited, such as when it is hard to get access to a large volume of real data, we need to go for synthetic data generation. This produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data since the MTS is now being gathered more often in various real-world systems. Furthermore, the GAN-generated MTS data is fed into a transformer-based deep learning architecture that carries out the data categorization into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.

Keywords: GAN, transformer, classification, multivariate time series

Procedia PDF Downloads 90
3467 Risk of Androgen Deprivation Therapy-Induced Metabolic Syndrome-Related Complications for Prostate Cancer in Taiwan

Authors: Olivia Rachel Hwang, Yu-Hsuan Joni Shao

Abstract:

Androgen Deprivation Therapy (ADT) has been a primary treatment for patients with advanced prostate cancer. However, it is associated with numerous adverse effects related to Metabolic Syndrome (MetS), including hypertension, diabetes, hyperlipidemia, heart diseases and ischemic strokes. Complications associated with ADT for prostate cancer in Taiwan are not well documented. The purpose of this study is to use data from the NHIRD (National Health Insurance Research Database) to examine the trajectory changes of MetS-related complications in men receiving ADT. The risks of developing complications after the treatment were analyzed with a multivariate Cox regression model. Covariates included in the model were the complications before the diagnosis of prostate cancer, the age, and the year at cancer diagnosis. A total of 17268 patients from 1997-2013 were included in this study. The exclusion criteria were patients with any other types of cancer or with existing MetS-related complications. Changes in MetS-related complications were observed in two treatment groups: 1) ADT (n=9042), and 2) non-ADT (n=8226). The ADT group appeared to have an increased risk of hypertension (hazard ratio 1.08, 95% confidence interval 1.03-1.13, P = 0.001) and hyperlipidemia (hazard ratio 1.09, 95% confidence interval 1.01-1.17, P = 0.02) when compared with the non-ADT group in the multivariate Cox regression analyses. For diabetes, heart diseases, and ischemic strokes, the ADT group appeared to have an increased but not significant hazard ratio. In conclusion, ADT was associated with an increased risk of hypertension and hyperlipidemia in prostate cancer patients in Taiwan. The risk of hypertension and hyperlipidemia should be considered when deciding on ADT, especially for patients with a known history of hypertension and hyperlipidemia.

Keywords: androgen deprivation therapy, ADT, complications, metabolic syndrome, MetS, prostate cancer

Procedia PDF Downloads 261
3466 Multivariate Dependent Frequency-Severity Modeling of Insurance Claims: A Vine Copula Approach

Authors: Islem Kedidi, Rihab Bedoui Bensalem, Faysal Manssouri

Abstract:

In traditional models of insurance data, the number and size of claims are assumed to be independent. Relaxing the independence assumption, this article explores the vine copula to model the dependence structure between multivariate frequency and average severity of insurance claims. To illustrate this approach, we use data from the Wisconsin local government property insurance fund, which offers several insurance protections for motor vehicles, property and contractor’s equipment claims. Results show that the C-vine copula can better characterize the multivariate dependence structure between frequency and severity. Furthermore, we find significant dependencies, especially between frequency and average severity among different coverage types.

Keywords: dependency modeling, government insurance, insurance claims, vine copula

Procedia PDF Downloads 172
3465 Prediction of Energy Storage Areas for Static Photovoltaic System Using Irradiation and Regression Modelling

Authors: Kisan Sarda, Bhavika Shingote

Abstract:

This paper evaluates regression modelling for predicting the energy storage of solar photovoltaic (PV) systems using semi-parametric regression techniques, chosen because some parameters are known while others, such as humidity and dust, are unknown. Solar irradiation differs from place to place depending on latitude, so identifying the areas that yield more storage indicates where PV systems can be installed to meet energy needs. The regression modelling is carried out for daily, monthly and seasonal prediction of solar energy storage. R modules were used to design the algorithm. The algorithm is expected to give better comparative results than other regression models for solar PV cell energy storage.

Keywords: semi parametric regression, photovoltaic (PV) system, regression modelling, irradiation

Procedia PDF Downloads 350
3464 On the Impact of Oil Price Fluctuations on Stock Markets: A Multivariate Long-Memory GARCH Framework

Authors: Manel Youssef, Lotfi Belkacem

Abstract:

This paper employs multivariate long memory GARCH models to simultaneously estimate mean and conditional variance spillover effects between oil prices and different financial markets. Since different financial assets are traded based on these market sector returns, it’s important for financial market participants to understand the volatility transmission mechanism over time and across these series in order to make optimal portfolio allocation decisions. We examine weekly returns from January 1, 2003 to November 30, 2012 and find evidence of significant transmission of shocks and volatilities between oil prices and some of the examined financial markets. The findings support the idea of cross-market hedging and sharing of common information by investors.

Keywords: oil prices, stock indices returns, oil volatility, contagion, DCC-multivariate (FI) GARCH

Procedia PDF Downloads 501
3463 Illustrative Effects of Social Capital on Perceived Health Status and Quality of Life among Older Adult in India: Evidence from WHO-Study on Global AGEing and Adults Health India

Authors: Himansu, Bedanga Talukdar

Abstract:

The aim of the present study is to investigate the prevalence of various health outcomes and quality of life, and to analyze the moderating role of social capital on health outcomes (i.e., self-rated good health (SRH), depression, functional health and quality of life) among the elderly in India. Using data from the WHO Study on Global AGEing and Adult Health (SAGE), a sample of 6559 older adults aged 50 and above (mean age = 61.81, SD = 9.00) was selected for analysis. Multivariate analysis assessed the prevalence of SRH, depression, functional limitation and quality of life among older adults. Logistic regression evaluated the effect of social capital, along with other confounders, on SRH, depression, and functional limitation, whereas linear regression evaluated the effect of social capital with other confounders on quality of life (QoL) among the elderly. Empirical results reveal that 74% of respondents were married, 70% had low social action, 46% medium sociability, 45% low trust-solidarity, 58% high safety, 65% medium civic engagement, and 37% reported medium psychological resources. The multivariate analysis shows that SRH is associated with age, being female, having education, higher social action, greater trust, safety and greater psychological resources. Depression among the elderly is strongly related to age, sex, education, higher wealth, higher sociability, and having psychological resources. QoL is negatively associated with age, sex, and being Muslim, whereas it is positively associated with higher education, being currently married, civic engagement, having wealth, social action, trust and solidarity, safety, and strong psychological resources.

Keywords: depressive symptom, functional limitation, older adults, quality of life, self rated health, social capital

Procedia PDF Downloads 196
3462 New Segmentation of Piecewise Linear Regression Models Using Reversible Jump MCMC Algorithm

Authors: Suparman

Abstract:

Piecewise linear regression models are very flexible models for data. When a piecewise linear regression model is fitted to data, its parameters are generally not known. This paper studies the problem of parameter estimation for piecewise linear regression models. The parameters are estimated with a Bayesian method, but the Bayes estimator cannot be found analytically. To overcome this problem, the reversible jump MCMC algorithm is proposed. The reversible jump MCMC algorithm generates a Markov chain that converges to its limiting distribution, the posterior distribution of the parameters of the piecewise linear regression model. The resulting Markov chain is used to calculate the Bayes estimator for the parameters of the piecewise linear regression model.

Keywords: regression, piecewise, Bayesian, reversible Jump MCMC

Procedia PDF Downloads 489
3461 Application Difference between Cox and Logistic Regression Models

Authors: Idrissa Kayijuka

Abstract:

The logistic regression and Cox regression (proportional hazards) models are currently employed in the analysis of prospective epidemiologic research into risk factors for chronic diseases, and the theoretical relationship between the two models has been studied. By definition, the Cox regression model, also called the Cox proportional hazards model, is a procedure for modeling time-to-event data in which censored cases exist. The logistic regression model, by contrast, is mostly applicable where the independent variables consist of numerical as well as nominal values while the outcome variable is binary (dichotomous). The arguments and findings of many researchers have focused on the overview of Cox and logistic regression models and their applications in different areas. In this work, the analysis is done on secondary data, the SPSS exercise data on breast cancer with a sample size of 1121 women, and the main objective is to show the difference in application between the Cox regression model and the logistic regression model based on factors that cause women to die of breast cancer. Some of the analysis was done manually, i.e. on lymph node status, and SPSS software was used to analyze the data. This study found that the two models differ in application: the Cox regression model is used to analyze data that also include the follow-up time, whereas the logistic regression model analyzes data without follow-up time. They also use different measures of association: the hazard ratio for the Cox model and the odds ratio for the logistic regression model. A similarity between the two models is that both are applicable to predicting the outcome of a categorical variable, i.e. a variable that can take only a restricted number of categories.
In conclusion, the Cox regression model differs from logistic regression by assessing a rate instead of a proportion. The two models can be applied in many other studies since they are suitable methods for analyzing data, but the Cox regression model is the more recommended of the two.
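The distinction between a proportion-based and a rate-based measure can be made concrete with crude, unadjusted calculations. The sketch below computes an odds ratio from a 2x2 table and an incidence rate ratio from events per unit of follow-up time. The rate ratio is used here only as a simple stand-in for the idea behind a hazard ratio; it is not a Cox model, and the function names and numbers are hypothetical.

```python
def odds_ratio(exposed_events, exposed_nonevents,
               unexposed_events, unexposed_nonevents):
    """Crude odds ratio from a 2x2 table; ignores follow-up time entirely."""
    return (exposed_events * unexposed_nonevents) / (
        exposed_nonevents * unexposed_events
    )


def rate_ratio(exposed_events, exposed_person_time,
               unexposed_events, unexposed_person_time):
    """Crude incidence rate ratio; uses events per unit of follow-up time."""
    exposed_rate = exposed_events / exposed_person_time
    unexposed_rate = unexposed_events / unexposed_person_time
    return exposed_rate / unexposed_rate


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed subjects die.
crude_or = odds_ratio(20, 80, 10, 90)
# With equal total follow-up (500 person-years per group) the rate ratio is 2.0.
crude_rr = rate_ratio(20, 500.0, 10, 500.0)
```

If the exposed group accumulated less follow-up time for the same number of deaths, the rate ratio would rise while the odds ratio stayed the same, which is the practical reason follow-up time matters.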

Keywords: logistic regression model, Cox regression model, survival analysis, hazard ratio

Procedia PDF Downloads 423
3460 Developing and Evaluating Clinical Risk Prediction Models for Coronary Artery Bypass Graft Surgery

Authors: Mohammadreza Mohebbi, Masoumeh Sanagou

Abstract:

The ability to predict clinical outcomes is of great importance to physicians and clinicians. A number of different methods have been used in an effort to accurately predict these outcomes. These methods include the development of scoring systems based on multivariate statistical modelling, and models involving the use of classification and regression trees. The process usually consists of two consecutive phases, namely model development and external validation. The model development phase consists of building a multivariate model and evaluating its predictive performance by examining calibration and discrimination, and internal validation. External validation tests the predictive performance of a model by assessing its calibration and discrimination in different but plausibly related patients. A motivating example, prediction modeling in a sample of patients who have undergone coronary artery bypass graft (CABG) surgery, is used for illustrative purposes, and a set of primary considerations for evaluating prediction model studies is proposed, using specific quality indicators as criteria to help stakeholders evaluate the quality of a prediction model study.
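Discrimination and calibration can each be summarized by a simple statistic. The sketch below computes the concordance (C) statistic for a binary outcome by comparing every event/non-event pair, and calibration-in-the-large as the gap between mean predicted risk and observed event rate. These are common summary measures chosen for illustration, not necessarily the ones used in the paper, and the function names are hypothetical.

```python
def c_statistic(predicted, outcomes):
    """Share of event/non-event pairs in which the event has the higher risk.

    Ties in predicted risk count as half-concordant. Requires at least one
    event (1) and one non-event (0) in `outcomes`.
    """
    event_risks = [p for p, y in zip(predicted, outcomes) if y == 1]
    nonevent_risks = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not event_risks or not nonevent_risks:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in event_risks:
        for pn in nonevent_risks:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(event_risks) * len(nonevent_risks))


def calibration_in_the_large(predicted, outcomes):
    """Mean predicted risk minus observed event rate (0 means no overall bias)."""
    return sum(predicted) / len(predicted) - sum(outcomes) / len(outcomes)
```

A C statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation; a positive calibration-in-the-large value means the model overpredicts risk on average.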

Keywords: clinical prediction models, clinical decision rule, prognosis, external validation, model calibration, biostatistics

Procedia PDF Downloads 267
3459 Model of Optimal Centroids Approach for Multivariate Data Classification

Authors: Pham Van Nha, Le Cam Binh

Abstract:

Particle swarm optimization (PSO) is a population-based stochastic optimization algorithm. PSO was inspired by the natural behavior of birds and fish in migration and foraging for food. PSO is considered a multidisciplinary optimization model that can be applied to various optimization problems. PSO’s ideas are simple and easy to understand, but PSO has so far been applied only to simple model problems. We think that in order to expand the applicability of PSO to complex problems, PSO should be described more explicitly in the form of a mathematical model. In this paper, we represent PSO as a mathematical model and apply it to multivariate data classification. First, PSO’s general mathematical model (MPSO) is analyzed as a universal optimization model. Then, the Model of Optimal Centroids (MOC) is proposed for multivariate data classification. Experiments were conducted on some benchmark data sets to demonstrate the effectiveness of MOC compared with several previously proposed schemes.

Keywords: analysis of optimization, artificial intelligence based optimization, optimization for learning and data analysis, global optimization

Procedia PDF Downloads 176
3458 Stock Market Prediction by Regression Model with Social Moods

Authors: Masahiro Ohmura, Koh Kakusho, Takeshi Okadome

Abstract:

This paper presents a regression model with autocorrelated errors in which the inputs are social moods obtained by analyzing the adjectives in Twitter posts using a document topic model. The regression model predicts the Dow Jones Industrial Average (DJIA) more precisely than autoregressive moving-average models.

Keywords: stock market prediction, social moods, regression model, DJIA

Procedia PDF Downloads 519
3457 Statistical Model of Water Quality in Estero El Macho, Machala-El Oro

Authors: Rafael Zhindon Almeida

Abstract:

Surface water quality is an important concern for the evaluation and prediction of water quality conditions. The objective of this study is to develop a statistical model that can accurately predict the water quality of the El Macho estuary in the city of Machala, El Oro province. The methodology employed in this study is of a basic type that involves a thorough search for theoretical foundations to improve the understanding of statistical modeling for water quality analysis. The research design is correlational, using a multivariate statistical model involving multiple linear regression and principal component analysis. The results indicate that water quality parameters such as fecal coliforms, biochemical oxygen demand, chemical oxygen demand, iron and dissolved oxygen exceed the allowable limits. The water of the El Macho estuary is determined to be below the required water quality criteria. The multiple linear regression model, based on chemical oxygen demand and total dissolved solids, explains 99.9% of the variance of the dependent variable. In addition, principal component analysis shows that the model has an explanatory power of 86.242%. The study successfully developed a statistical model to evaluate the water quality of the El Macho estuary. The estuary did not meet the water quality criteria, with several parameters exceeding the allowable limits. The multiple linear regression model and principal component analysis provide valuable information on the relationship between the various water quality parameters. The findings of the study emphasize the need for immediate action to improve the water quality of the El Macho estuary to ensure the preservation and protection of this valuable natural resource.

Keywords: statistical modeling, water quality, multiple linear regression, principal components, statistical models

Procedia PDF Downloads 52
3456 Model-Based Software Regression Test Suite Reduction

Authors: Shiwei Deng, Yang Bao

Abstract:

In this paper, we present a model-based regression test suite reduction approach that uses EFSM model dependence analysis and a probability-driven greedy algorithm to reduce software regression test suites. The approach automatically identifies the difference between the original model and the modified model as a set of elementary model modifications. The EFSM dependence analysis is performed for each elementary modification to reduce the regression test suite, and then the probability-driven greedy algorithm is adopted to select the minimum set of test cases from the reduced regression test suite that cover all interaction patterns. Our initial experience shows that the approach may significantly reduce the size of regression test suites.
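The selection step can be pictured as a set-cover problem. The sketch below uses a plain greedy heuristic, which is a simplification: it is not the probability-driven variant of the paper, and a greedy choice gives a small covering set without guaranteeing the minimum. The test names and interaction patterns are hypothetical.

```python
def greedy_reduce(test_coverage):
    """Pick tests until every interaction pattern is covered.

    test_coverage maps a test name to the set of patterns it covers.
    At each step the test covering the most still-uncovered patterns wins;
    ties are broken by test name so the result is deterministic.
    """
    uncovered = set().union(*test_coverage.values())
    selected = []
    while uncovered:
        best = max(sorted(test_coverage),
                   key=lambda t: len(test_coverage[t] & uncovered))
        selected.append(best)
        uncovered -= test_coverage[best]
    return selected


# Hypothetical suite: four tests, four interaction patterns.
suite = {
    "t1": {"p1", "p2"},
    "t2": {"p2", "p3"},
    "t3": {"p3", "p4"},
    "t4": {"p1", "p2", "p3"},
}
print(greedy_reduce(suite))
```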

Keywords: dependence analysis, EFSM model, greedy algorithm, regression test

Procedia PDF Downloads 398
3455 Segmentation of Piecewise Polynomial Regression Model by Using Reversible Jump MCMC Algorithm

Authors: Suparman

Abstract:

The piecewise polynomial regression model is a very flexible model for data. When a piecewise polynomial regression model is fitted to data, its parameters are generally unknown. This paper studies the parameter estimation problem for the piecewise polynomial regression model. The parameters are estimated with a Bayesian method. Unfortunately, the Bayes estimator cannot be found analytically, so a reversible jump MCMC algorithm is proposed to solve this problem. The reversible jump MCMC algorithm generates a Markov chain whose limit distribution is the posterior distribution of the piecewise polynomial regression model parameters. The resulting Markov chain is used to calculate the Bayes estimator of the parameters of the piecewise polynomial regression model.

Keywords: piecewise regression, bayesian, reversible jump MCMC, segmentation

Procedia PDF Downloads 340
3454 A Fuzzy Linear Regression Model Based on Dissemblance Index

Authors: Shih-Pin Chen, Shih-Syuan You

Abstract:

Fuzzy regression models are useful for investigating the relationship between explanatory variables and responses in fuzzy environments. To overcome the deficiencies of previous models and increase the explanatory power of fuzzy data, the graded mean integration (GMI) representation is applied to determine representative crisp regression coefficients. A fuzzy regression model is constructed based on the modified dissemblance index (MDI), which can precisely measure the actual total error. Compared with previous studies based on the proposed MDI and distance criterion, the results from commonly used test examples show that the proposed fuzzy linear regression model has higher explanatory power and forecasting accuracy.

Keywords: dissemblance index, fuzzy linear regression, graded mean integration, mathematical programming

Procedia PDF Downloads 407
3453 Modeling and Analysis of Occupant Behavior on Heating and Air Conditioning Systems in a Higher Education and Vocational Training Building in a Mediterranean Climate

Authors: Abderrahmane Soufi

Abstract:

The building sector is the largest consumer of energy in France, accounting for 44% of French consumption. To reduce energy consumption and improve energy efficiency, France implemented an energy transition law targeting 40% energy savings by 2030 in the tertiary building sector. Building simulation tools are used to predict the energy performance of buildings but the reliability of these tools is hampered by discrepancies between the real and simulated energy performance of a building. This performance gap lies in the simplified assumptions of certain factors, such as the behavior of occupants on air conditioning and heating, which is considered deterministic when setting a fixed operating schedule and a fixed interior comfort temperature. However, the behavior of occupants on air conditioning and heating is stochastic, diverse, and complex because it can be affected by many factors. Probabilistic models are an alternative to deterministic models. These models are usually derived from statistical data and express occupant behavior by assuming a probabilistic relationship to one or more variables. In the literature, logistic regression has been used to model the behavior of occupants with regard to heating and air conditioning systems by considering univariate logistic models in residential buildings; however, few studies have developed multivariate models for higher education and vocational training buildings in a Mediterranean climate. Therefore, in this study, occupant behavior on heating and air conditioning systems was modeled using logistic regression. Occupant behavior related to turning on the heating and air conditioning systems was studied through experimental measurements collected over a period of one year (June 2023–June 2024) in three classrooms occupied by several groups of students in engineering schools and professional training. Instrumentation was provided to collect indoor temperature and indoor relative humidity in 10-min intervals.
Furthermore, the state of the heating/air conditioning system (off or on) and the set point were determined. The outdoor air temperature, relative humidity, and wind speed were collected as weather data. The number of occupants, age, and sex were also considered. Logistic regression was used for modeling an occupant turning on the heating and air conditioning systems. The results yielded a proposed model that can be used in building simulation tools to predict the energy performance of teaching buildings. Based on the first months (summer and early autumn) of the investigations, the results illustrate that occupant behavior with respect to the air conditioning systems is affected by the indoor relative humidity and temperature in June, July, and August and by the indoor relative humidity, temperature, and number of occupants in September and October. Occupant behavior was analyzed monthly, and univariate and multivariate models were developed.

Keywords: occupant behavior, logistic regression, behavior model, mediterranean climate, air conditioning, heating

Procedia PDF Downloads 33
3452 Neutral Heavy Scalar Searches via Standard Model Gauge Boson Decays at the Large Hadron Electron Collider with Multivariate Techniques

Authors: Luigi Delle Rose, Oliver Fischer, Ahmed Hammad

Abstract:

In this article, we study the prospects of the proposed Large Hadron electron Collider (LHeC) in the search for heavy neutral scalar particles. We consider a minimal model with one additional complex scalar singlet that interacts with the Standard Model (SM) via mixing with the Higgs doublet, giving rise to an SM-like Higgs boson and a heavy scalar particle. Both scalar particles are produced via vector boson fusion and can be tested via their decays into pairs of SM particles, analogously to the SM Higgs boson. Using multivariate techniques, we show that the LHeC is sensitive to heavy scalars with masses between 200 and 800 GeV down to scalar mixing of order 0.01.

Keywords: beyond the standard model, large hadron electron collider, multivariate analysis, scalar singlet

Procedia PDF Downloads 106
3451 The Theory behind Logistic Regression

Authors: Jan Henrik Wosnitza

Abstract:

The logistic regression has developed into a standard approach for estimating conditional probabilities in a wide range of applications including credit risk prediction. The article at hand contributes to the current literature on logistic regression fourfold: First, it is demonstrated that the binary logistic regression automatically meets its model assumptions under very general conditions. This result explains, at least in part, the logistic regression's popularity. Second, the requirement of homoscedasticity in the context of binary logistic regression is theoretically substantiated. The variances among the groups of defaulted and non-defaulted obligors have to be the same across the level of the aggregated default indicators in order to achieve linear logits. Third, this article sheds some light on the question why nonlinear logits might be superior to linear logits in case of a small amount of data. Fourth, an innovative methodology for estimating correlations between obligor-specific log-odds is proposed. In order to crystallize the key ideas, this paper focuses on the example of credit risk prediction. However, the results presented in this paper can easily be transferred to any other field of application.
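The link between equal group variances and linear logits can be seen in a standard textbook setting. The derivation below is a sketch under an assumption not stated in the abstract: a single score $x$ that is normally distributed within each group, with $D=1$ for defaulted and $D=0$ for non-defaulted obligors and prior default probability $\pi$. It is not the article's own argument.

```latex
% Assumption (illustrative): x | D=1 ~ N(\mu_1, \sigma_1^2),  x | D=0 ~ N(\mu_0, \sigma_0^2),  P(D=1) = \pi.
\begin{align*}
\operatorname{logit} P(D=1 \mid x)
  &= \ln\frac{\pi}{1-\pi} + \ln\frac{\sigma_0}{\sigma_1}
     - \frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_0)^2}{2\sigma_0^2} \\
% The x^2 coefficient is 1/(2\sigma_0^2) - 1/(2\sigma_1^2); it vanishes only if the variances are equal.
\intertext{With equal variances, $\sigma_0 = \sigma_1 = \sigma$, the quadratic terms cancel:}
\operatorname{logit} P(D=1 \mid x)
  &= \ln\frac{\pi}{1-\pi} - \frac{\mu_1^2 - \mu_0^2}{2\sigma^2}
     + \frac{\mu_1 - \mu_0}{\sigma^2}\, x .
\end{align*}
```

Under this assumption the logit is linear in $x$ exactly when the two group variances coincide; otherwise a term in $x^2$ remains.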

Keywords: correlation, credit risk estimation, default correlation, homoscedasticity, logistic regression, nonlinear logistic regression

Procedia PDF Downloads 394
3450 Multivariate Statistical Process Monitoring of Base Metal Flotation Plant Using Dissimilarity Scale-Based Singular Spectrum Analysis

Authors: Syamala Krishnannair

Abstract:

A multivariate statistical process monitoring methodology using dissimilarity scale-based singular spectrum analysis (SSA) is proposed for the detection and diagnosis of process faults in a base metal flotation plant. Process faults are detected based on the multi-level decomposition of process signals by SSA using the dissimilarity structure of the process data and the subsequent monitoring of the multiscale signals using the unified monitoring index which combines T² with SPE. Contribution plots are used to identify the root causes of the process faults. The overall results indicated that the proposed technique outperformed the conventional multivariate techniques in the detection and diagnosis of the process faults in the flotation plant.

Keywords: fault detection, fault diagnosis, process monitoring, dissimilarity scale

Procedia PDF Downloads 175
3449 Effect of Pregnancy Intention, Postnatal Depressive Symptoms and Social Support on Early Childhood Stunting: Findings from India

Authors: Swati Srivastava, Ashish Kumar Upadhyay

Abstract:

Background: According to the United Nations Children’s Fund, an estimated 165 million children worldwide were stunted in 2012, and India alone accounts for 38% of the global burden of stunting. India is home to more than 60 million stunted children. Our study aims to examine the effect of pregnancy intention and maternal postnatal depressive symptoms on early childhood stunting in India. We hypothesized that the effects of pregnancy intention and postnatal maternal depressive symptoms were mediated by social support. Methods: We used data from the first wave of the Young Lives Study India. Out of 2011 children recruited in the original cohort, 1833 children had complete information on pregnancy intention, maternal depression and other variables. A series of multivariate logistic regression models was used to examine the effect of pregnancy intention and postnatal depressive symptoms on early childhood stunting. Results: Bivariate results indicate that a higher percentage of children born after unintended pregnancy (40%) were stunted than children born after intended pregnancy (26%). Likewise, the proportion of stunted children was higher among women with high postnatal depressive symptoms (35%) than among those with a low level of depression (24%). Results of the multivariate logistic regression models indicate that children born after unintended pregnancy were significantly more likely to be stunted than children born after intended pregnancy (Coefficient: 1.70, CI: 1.17, 2.48). Likewise, early childhood stunting was also associated with maternal postnatal depressive symptoms (Coefficient: 1.48, CI: 1.16, 1.88). The effect of pregnancy intention and postnatal depressive symptoms on early childhood stunting remains unchanged after controlling for social support and other variables.
Conclusions: The findings of this study provide conclusive evidence regarding the consequences of pregnancy intention and postnatal depressive symptoms for early childhood stunting in India. Therefore, there is a need to identify women with unintended pregnancies and to incorporate the promotion of mental health into the national reproductive and child health programme.
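As a rough check on the bivariate percentages reported in the Results, crude (unadjusted) odds ratios can be computed directly from them. This sketch assumes the rounded percentages are exact and that the reported "Coefficient" values are adjusted odds ratios; the abstract does not say so explicitly, so the comparison is only indicative.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1.0 - p)


def odds_ratio(p_exposed, p_unexposed):
    """Crude odds ratio from two group proportions."""
    return odds(p_exposed) / odds(p_unexposed)


# Stunting: 40% after unintended vs 26% after intended pregnancy.
or_intention = odds_ratio(0.40, 0.26)
# Stunting: 35% with high vs 24% with low postnatal depressive symptoms.
or_depression = odds_ratio(0.35, 0.24)
print(round(or_intention, 2), round(or_depression, 2))
```

Crude ratios of this kind ignore covariates, which is why they need not match the values from the multivariate models.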

Keywords: pregnancy intention, postnatal depressive symptoms, social support, childhood stunting, young lives study, India

Procedia PDF Downloads 269
3448 Confidence Envelopes for Parametric Model Selection Inference and Post-Model Selection Inference

Authors: I. M. L. Nadeesha Jayaweera, Adao Alex Trindade

Abstract:

In choosing a candidate model in likelihood-based modeling via an information criterion, the practitioner is often faced with the difficult task of deciding just how far up the ranked list to look. Motivated by this pragmatic necessity, we construct an uncertainty band for a generalized (model selection) information criterion (GIC), defined as a criterion for which the limit in probability is identical to that of the normalized log-likelihood. This includes common special cases such as AIC and BIC. The method starts from the asymptotic normality of the GIC for the joint distribution of the candidate models in an independent and identically distributed (IID) data framework and proceeds by deriving the (asymptotically) exact distribution of the minimum. The calculation of an upper quantile for its distribution then involves the computation of multivariate Gaussian integrals, which is amenable to efficient implementation via the R package "mvtnorm". The performance of the methodology is tested on simulated data by checking the coverage probability of nominal upper quantiles and compared to the bootstrap. Both methods give coverages close to nominal for large samples, but the bootstrap is two orders of magnitude slower. The methodology is subsequently extended to two other commonly used model structures: regression and time series. In the regression case, we derive the corresponding asymptotically exact distribution of the minimum GIC invoking Lindeberg-Feller type conditions for triangular arrays and are thus able to similarly calculate upper quantiles for its distribution via multivariate Gaussian integration. The bootstrap once again provides a default competing procedure, and we find that similar comparison performance metrics hold as for the IID case. The time series case is complicated by a far more intricate asymptotic regime for the joint distribution of the model GIC statistics.
Under a Gaussian likelihood, the default in most packages, one needs to derive the limiting distribution of a normalized quadratic form for a realization from a stationary series. Under conditions on the process satisfied by ARMA models, a multivariate normal limit is once again achieved. The bootstrap can, however, be employed for its computation, whence we are once again in the multivariate Gaussian integration paradigm for upper quantile evaluation. Comparisons of this bootstrap-aided semi-exact method with the full-blown bootstrap once again reveal a similar performance but faster computation speeds. One of the most difficult problems in contemporary statistical methodological research is to be able to account for the extra variability introduced by model selection uncertainty, the so-called post-model selection inference (PMSI). We explore ways in which the GIC uncertainty band can be inverted to make inferences on the parameters. This is being attempted in the IID case by pivoting the CDF of the asymptotically exact distribution of the minimum GIC. For inference one parameter at a time and a small number of candidate models, this works well, whence the attained PMSI confidence intervals are wider than the MLE-based Wald, as expected.
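The ranked list mentioned at the start of this abstract can be made concrete with the two special cases it names. The sketch below uses the usual unnormalized definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L; the model names, log-likelihoods, parameter counts and sample size are hypothetical, and the paper's GIC and its uncertainty band are not reproduced here.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for maximised log-likelihood loglik and k parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for n observations."""
    return k * math.log(n) - 2 * loglik


# Hypothetical candidates: name -> (maximised log-likelihood, number of parameters).
candidates = {
    "model_A": (-120.3, 2),
    "model_B": (-118.9, 4),
    "model_C": (-118.7, 6),
}
n_obs = 100  # hypothetical sample size

ranked_by_aic = sorted(candidates, key=lambda m: aic(*candidates[m]))
ranked_by_bic = sorted(candidates, key=lambda m: bic(*candidates[m], n_obs))
for name in ranked_by_aic:
    print(name, round(aic(*candidates[name]), 1))
```

When the top criterion values lie close together, as in this toy example, a point ranking alone says little about which models are distinguishable, which is the practical question an uncertainty band is meant to address.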

Keywords: model selection inference, generalized information criteria, post model selection, asymptotic theory

Procedia PDF Downloads 61