Search results for: regression estimators
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3219

Search results for: regression estimators

3039 A Modified Estimating Equations in Derivation of the Causal Effect on the Survival Time with Time-Varying Covariates

Authors: Yemane Hailu Fissuh, Zhongzhan Zhang

Abstract:

a systematic observation from a defined time of origin up to certain failure or censor is known as survival data. Survival analysis is a major area of interest in biostatistics and biomedical researches. At the heart of understanding, the most scientific and medical research inquiries lie for a causality analysis. Thus, the main concern of this study is to investigate the causal effect of treatment on survival time conditional to the possibly time-varying covariates. The theory of causality often differs from the simple association between the response variable and predictors. A causal estimation is a scientific concept to compare a pragmatic effect between two or more experimental arms. To evaluate an average treatment effect on survival outcome, the estimating equation was adjusted for time-varying covariates under the semi-parametric transformation models. The proposed model intuitively obtained the consistent estimators for unknown parameters and unspecified monotone transformation functions. In this article, the proposed method estimated an unbiased average causal effect of treatment on survival time of interest. The modified estimating equations of semiparametric transformation models have the advantage to include the time-varying effect in the model. Finally, the finite sample performance characteristics of the estimators proved through the simulation and Stanford heart transplant real data. To this end, the average effect of a treatment on survival time estimated after adjusting for biases raised due to the high correlation of the left-truncation and possibly time-varying covariates. The bias in covariates was restored, by estimating density function for left-truncation. Besides, to relax the independence assumption between failure time and truncation time, the model incorporated the left-truncation variable as a covariate. Moreover, the expectation-maximization (EM) algorithm iteratively obtained unknown parameters and unspecified monotone transformation functions. To summarize idea, the ratio of cumulative hazards functions between the treated and untreated experimental group has a sense of the average causal effect for the entire population.

Keywords: a modified estimation equation, causal effect, semiparametric transformation models, survival analysis, time-varying covariate

Procedia PDF Downloads 155
3038 Estimating Bridge Deterioration for Small Data Sets Using Regression and Markov Models

Authors: Yina F. Muñoz, Alexander Paz, Hanns De La Fuente-Mella, Joaquin V. Fariña, Guilherme M. Sales

Abstract:

The primary approach for estimating bridge deterioration uses Markov-chain models and regression analysis. Traditional Markov models have problems in estimating the required transition probabilities when a small sample size is used. Often, reliable bridge data have not been taken over large periods, thus large data sets may not be available. This study presents an important change to the traditional approach by using the Small Data Method to estimate transition probabilities. The results illustrate that the Small Data Method and traditional approach both provide similar estimates; however, the former method provides results that are more conservative. That is, Small Data Method provided slightly lower than expected bridge condition ratings compared with the traditional approach. Considering that bridges are critical infrastructures, the Small Data Method, which uses more information and provides more conservative estimates, may be more appropriate when the available sample size is small. In addition, regression analysis was used to calculate bridge deterioration. Condition ratings were determined for bridge groups, and the best regression model was selected for each group. The results obtained were very similar to those obtained when using Markov chains; however, it is desirable to use more data for better results.

Keywords: concrete bridges, deterioration, Markov chains, probability matrix

Procedia PDF Downloads 326
3037 Computational Tool for Surface Electromyography Analysis; an Easy Way for Non-Engineers

Authors: Fabiano Araujo Soares, Sauro Emerick Salomoni, Joao Paulo Lima da Silva, Igor Luiz Moura, Adson Ferreira da Rocha

Abstract:

This paper presents a tool developed in the Matlab platform. It was developed to simplify the analysis of surface electromyography signals (S-EMG) in a way accessible to users that are not familiarized with signal processing procedures. The tool receives data by commands in window fields and generates results as graphics and excel tables. The underlying math of each S-EMG estimator is presented. Setup window and result graphics are presented. The tool was presented to four non-engineer users and all of them managed to appropriately use it after a 5 minutes instruction period.

Keywords: S-EMG estimators, electromyography, surface electromyography, ARV, RMS, MDF, MNF, CV

Procedia PDF Downloads 535
3036 A New Method to Estimate the Low Income Proportion: Monte Carlo Simulations

Authors: Encarnación Álvarez, Rosa M. García-Fernández, Juan F. Muñoz

Abstract:

Estimation of a proportion has many applications in economics and social studies. A common application is the estimation of the low income proportion, which gives the proportion of people classified as poor into a population. In this paper, we present this poverty indicator and propose to use the logistic regression estimator for the problem of estimating the low income proportion. Various sampling designs are presented. Assuming a real data set obtained from the European Survey on Income and Living Conditions, Monte Carlo simulation studies are carried out to analyze the empirical performance of the logistic regression estimator under the various sampling designs considered in this paper. Results derived from Monte Carlo simulation studies indicate that the logistic regression estimator can be more accurate than the customary estimator under the various sampling designs considered in this paper. The stratified sampling design can also provide more accurate results.

Keywords: poverty line, risk of poverty, auxiliary variable, ratio method

Procedia PDF Downloads 438
3035 An Overbooking Model for Car Rental Service with Different Types of Cars

Authors: Naragain Phumchusri, Kittitach Pongpairoj

Abstract:

Overbooking is a very useful revenue management technique that could help reduce costs caused by either undersales or oversales. In this paper, we propose an overbooking model for two types of cars that can minimize the total cost for car rental service. With two types of cars, there is an upgrade possibility for lower type to upper type. This makes the model more complex than one type of cars scenario. We have found that convexity can be proved in this case. Sensitivity analysis of the parameters is conducted to observe the effects of relevant parameters on the optimal solution. Model simplification is proposed using multiple linear regression analysis, which can help estimate the optimal overbooking level using appropriate independent variables. The results show that the overbooking level from multiple linear regression model is relatively close to the optimal solution (with the adjusted R-squared value of at least 72.8%). To evaluate the performance of the proposed model, the total cost was compared with the case where the decision maker uses a naïve method for the overbooking level. It was found that the total cost from optimal solution is only 0.5 to 1 percent (on average) lower than the cost from regression model, while it is approximately 67% lower than the cost obtained by the naïve method. It indicates that our proposed simplification method using regression analysis can effectively perform in estimating the overbooking level.

Keywords: overbooking, car rental industry, revenue management, stochastic model

Procedia PDF Downloads 156
3034 Coverage Probability Analysis of WiMAX Network under Additive White Gaussian Noise and Predicted Empirical Path Loss Model

Authors: Chaudhuri Manoj Kumar Swain, Susmita Das

Abstract:

This paper explores a detailed procedure of predicting a path loss (PL) model and its application in estimating the coverage probability in a WiMAX network. For this a hybrid approach is followed in predicting an empirical PL model of a 2.65 GHz WiMAX network deployed in a suburban environment. Data collection, statistical analysis, and regression analysis are the phases of operations incorporated in this approach and the importance of each of these phases has been discussed properly. The procedure of collecting data such as received signal strength indicator (RSSI) through experimental set up is demonstrated. From the collected data set, empirical PL and RSSI models are predicted with regression technique. Furthermore, with the aid of the predicted PL model, essential parameters such as PL exponent as well as the coverage probability of the network are evaluated. This research work may assist in the process of deployment and optimisation of any cellular network significantly.

Keywords: WiMAX, RSSI, path loss, coverage probability, regression analysis

Procedia PDF Downloads 155
3033 Regression Analysis in Estimating Stream-Flow and the Effect of Hierarchical Clustering Analysis: A Case Study in Euphrates-Tigris Basin

Authors: Goksel Ezgi Guzey, Bihrat Onoz

Abstract:

The scarcity of streamflow gauging stations and the increasing effects of global warming cause designing water management systems to be very difficult. This study is a significant contribution to assessing regional regression models for estimating streamflow. In this study, simulated meteorological data was related to the observed streamflow data from 1971 to 2020 for 33 stream gauging stations of the Euphrates-Tigris Basin. Ordinary least squares regression was used to predict flow for 2020-2100 with the simulated meteorological data. CORDEX- EURO and CORDEX-MENA domains were used with 0.11 and 0.22 grids, respectively, to estimate climate conditions under certain climate scenarios. Twelve meteorological variables simulated by two regional climate models, RCA4 and RegCM4, were used as independent variables in the ordinary least squares regression, where the observed streamflow was the dependent variable. The variability of streamflow was then calculated with 5-6 meteorological variables and watershed characteristics such as area and height prior to the application. Of the regression analysis of 31 stream gauging stations' data, the stations were subjected to a clustering analysis, which grouped the stations in two clusters in terms of their hydrometeorological properties. Two streamflow equations were found for the two clusters of stream gauging stations for every domain and every regional climate model, which increased the efficiency of streamflow estimation by a range of 10-15% for all the models. This study underlines the importance of homogeneity of a region in estimating streamflow not only in terms of the geographical location but also in terms of the meteorological characteristics of that region.

Keywords: hydrology, streamflow estimation, climate change, hydrologic modeling, HBV, hydropower

Procedia PDF Downloads 108
3032 The Impact of Governance on Happiness: Evidence from Quantile Regressions

Authors: Chiung-Ju Huang

Abstract:

This study utilizes the quantile regression analysis to examine the impact of governance (including democratic quality and technical quality) on happiness in 101 countries worldwide, classified as “developed countries” and “developing countries”. The empirical results show that the impact of democratic quality and technical quality on happiness is significantly positive for “developed countries”, while is insignificant for “developing countries”. The results suggest that the authorities in developed countries can enhance the level of individual happiness by means of improving the democracy quality and technical quality. However, for developing countries, promoting the quality of governance in order to enhance the level of happiness may not be effective. Policy makers in developed countries may pay more attention on increasing real GDP per capita instead of promoting the quality of governance to enhance individual happiness.

Keywords: governance, happiness, multiple regression, quantile regression

Procedia PDF Downloads 263
3031 The Normal-Generalized Hyperbolic Secant Distribution: Properties and Applications

Authors: Hazem M. Al-Mofleh

Abstract:

In this paper, a new four-parameter univariate continuous distribution called the Normal-Generalized Hyperbolic Secant Distribution (NGHS) is defined and studied. Some general and structural distributional properties are investigated and discussed, including: central and non-central n-th moments and incomplete moments, quantile and generating functions, hazard function, Rényi and Shannon entropies, shapes: skewed right, skewed left, and symmetric, modality regions: unimodal and bimodal, maximum likelihood (MLE) estimators for the parameters. Finally, two real data sets are used to demonstrate empirically its flexibility and prove the strength of the new distribution.

Keywords: bimodality, estimation, hazard function, moments, Shannon’s entropy

Procedia PDF Downloads 324
3030 The Impact of Female Education on Fertility: A Natural Experiment from Egypt

Authors: Fatma Romeh, Shiferaw Gurmu

Abstract:

This paper examines the impact of female education on fertility, using the change in length of primary schooling in Egypt in 1988-89 as the source of exogenous variation in schooling. In particular, beginning in 1988, children had to attend primary school for only five years rather than six years. This change was applicable to all individuals born on or after October 1977. Using a nonparametric regression discontinuity approach, we compare education and fertility of women born just before and after October 1977. The results show that female education significantly reduces the number of children born per woman and delays the time until first birth. Applying a robust regression discontinuity approach, however, the impact of education on the number of children is no longer significant. The impact on the timing of first birth remained significant under the robust approach. Each year of female education postponed childbearing by three months, on average.

Keywords: Egypt, female education, fertility, robust regression discontinuity

Procedia PDF Downloads 324
3029 Breast Cancer Mortality and Comorbidities in Portugal: A Predictive Model Built with Real World Data

Authors: Cecília M. Antão, Paulo Jorge Nogueira

Abstract:

Breast cancer (BC) is the first cause of cancer mortality among Portuguese women. This retrospective observational study aimed at identifying comorbidities associated with BC female patients admitted to Portuguese public hospitals (2010-2018), investigating the effect of comorbidities on BC mortality rate, and building a predictive model using logistic regression. Results showed that the BC mortality in Portugal decreased in this period and reached 4.37% in 2018. Adjusted odds ratio indicated that secondary malignant neoplasms of liver, of bone and bone marrow, congestive heart failure, and diabetes were associated with an increased chance of dying from breast cancer. Although the Lisbon district (the most populated area) accounted for the largest percentage of BC patients, the logistic regression model showed that, besides patient’s age, being resident in Bragança, Castelo Branco, or Porto districts was directly associated with an increase of the mortality rate.

Keywords: breast cancer, comorbidities, logistic regression, adjusted odds ratio

Procedia PDF Downloads 72
3028 Assessing Relationships between Glandularity and Gray Level by Using Breast Phantoms

Authors: Yun-Xuan Tang, Pei-Yuan Liu, Kun-Mu Lu, Min-Tsung Tseng, Liang-Kuang Chen, Yuh-Feng Tsai, Ching-Wen Lee, Jay Wu

Abstract:

Breast cancer is predominant of malignant tumors in females. The increase in the glandular density increases the risk of breast cancer. BI-RADS is a frequently used density indicator in mammography; however, it significantly overestimates the glandularity. Therefore, it is very important to accurately and quantitatively assess the glandularity by mammography. In this study, 20%, 30% and 50% glandularity phantoms were exposed using a mammography machine at 28, 30 and 31 kVp, and 30, 55, 80 and 105 mAs, respectively. The regions of interest (ROIs) were drawn to assess the gray level. The relationship between the glandularity and gray level under various compression thicknesses, kVp, and mAs was established by the multivariable linear regression. A phantom verification was performed with automatic exposure control (AEC). The regression equation was obtained with an R-square value of 0.928. The average gray levels of the verification phantom were 8708, 8660 and 8434 for 0.952, 0.963 and 0.985 g/cm3, respectively. The percent differences of glandularity to the regression equation were 3.24%, 2.75% and 13.7%. We concluded that the proposed method could be clinically applied in mammography to improve the glandularity estimation and further increase the importance of breast cancer screening.

Keywords: mammography, glandularity, gray value, BI-RADS

Procedia PDF Downloads 474
3027 Efficient Frontier: Comparing Different Volatility Estimators

Authors: Tea Poklepović, Zdravka Aljinović, Mario Matković

Abstract:

Modern Portfolio Theory (MPT) according to Markowitz states that investors form mean-variance efficient portfolios which maximizes their utility. Markowitz proposed the standard deviation as a simple measure for portfolio risk and the lower semi-variance as the only risk measure of interest to rational investors. This paper uses a third volatility estimator based on intraday data and compares three efficient frontiers on the Croatian Stock Market. The results show that range-based volatility estimator outperforms both mean-variance and lower semi-variance model.

Keywords: variance, lower semi-variance, range-based volatility, MPT

Procedia PDF Downloads 498
3026 An Analysis of the Regression Hypothesis from a Shona Broca’s Aphasci Perspective

Authors: Esther Mafunda, Simbarashe Muparangi

Abstract:

The present paper tests the applicability of the Regression Hypothesis on the pathological language dissolution of a Shona male adult with Broca’s aphasia. It particularly assesses the prediction of the Regression Hypothesis, which states that the process according to which language is forgotten will be the reversal of the process according to which it will be acquired. The main aim of the paper is to find out whether mirror symmetries between L1 acquisition and L1 dissolution of tense in Shona and, if so, what might cause these regression patterns. The paper also sought to highlight the practical contributions that Linguistic theory can make to solving language-related problems. Data was collected from a 46-year-old male adult with Broca’s aphasia who was receiving speech therapy at St Giles Rehabilitation Centre in Harare, Zimbabwe. The primary data elicitation method was experimental, using the probe technique. The TART (Test for Assessing Reference Time) Shona version in the form of sequencing pictures was used to access tense by Broca’s aphasic and 3.5-year-old child. Using the SPSS (Statistical Package for Social Studies) and Excel analysis, it was established that the use of the future tense was impaired in Shona Broca’s aphasic whilst the present and past tense was intact. However, though the past tense was intact in the male adult with Broca’s aphasic, a reference to the remote past was made. The use of the future tense was also found to be difficult for the 3,5-year-old speaking child. No difficulties were encountered in using the present and past tenses. This means that mirror symmetries were found between L1 acquisition and L1 dissolution of tense in Shona. On the basis of the results of this research, it can be concluded that the use of tense in a Shona adult with Broca’s aphasia supports the Regression Hypothesis. The findings of this study are important in terms of speech therapy in the context of Zimbabwe. The study also contributes to Bantu linguistics in general and to Shona linguistics in particular. Further studies could also be done focusing on the rest of the Bantu language varieties in terms of aphasia.

Keywords: Broca’s Aphasia, regression hypothesis, Shona, language dissolution

Procedia PDF Downloads 77
3025 Exploring the Applications of Neural Networks in the Adaptive Learning Environment

Authors: Baladitya Swaika, Rahul Khatry

Abstract:

Computer Adaptive Tests (CATs) is one of the most efficient ways for testing the cognitive abilities of students. CATs are based on Item Response Theory (IRT) which is based on item selection and ability estimation using statistical methods of maximum information selection/selection from posterior and maximum-likelihood (ML)/maximum a posteriori (MAP) estimators respectively. This study aims at combining both classical and Bayesian approaches to IRT to create a dataset which is then fed to a neural network which automates the process of ability estimation and then comparing it to traditional CAT models designed using IRT. This study uses python as the base coding language, pymc for statistical modelling of the IRT and scikit-learn for neural network implementations. On creation of the model and on comparison, it is found that the Neural Network based model performs 7-10% worse than the IRT model for score estimations. Although performing poorly, compared to the IRT model, the neural network model can be beneficially used in back-ends for reducing time complexity as the IRT model would have to re-calculate the ability every-time it gets a request whereas the prediction from a neural network could be done in a single step for an existing trained Regressor. This study also proposes a new kind of framework whereby the neural network model could be used to incorporate feature sets, other than the normal IRT feature set and use a neural network’s capacity of learning unknown functions to give rise to better CAT models. Categorical features like test type, etc. could be learnt and incorporated in IRT functions with the help of techniques like logistic regression and can be used to learn functions and expressed as models which may not be trivial to be expressed via equations. This kind of a framework, when implemented would be highly advantageous in psychometrics and cognitive assessments. This study gives a brief overview as to how neural networks can be used in adaptive testing, not only by reducing time-complexity but also by being able to incorporate newer and better datasets which would eventually lead to higher quality testing.

Keywords: computer adaptive tests, item response theory, machine learning, neural networks

Procedia PDF Downloads 162
3024 Apricot Insurance Portfolio Risk

Authors: Kasirga Yildirak, Ismail Gur

Abstract:

We propose a model to measure hail risk of an Agricultural Insurance portfolio. Hail is one of the major catastrophic event that causes big amount of loss to an insurer. Moreover, it is very hard to predict due to its strange atmospheric characteristics. We make use of parcel based claims data on apricot damage collected by the Turkish Agricultural Insurance Pool (TARSIM). As our ultimate aim is to compute the loadings assigned to specific parcels, we build a portfolio risk model that makes use of PD and the severity of the exposures. PD is computed by Spherical-Linear and Circular –Linear regression models as the data carries coordinate information and seasonality. Severity is mapped into integer brackets so that Probability Generation Function could be employed. Individual regressions are run on each clusters estimated on different criteria. Loss distribution is constructed by Panjer Recursion technique. We also show that one risk-one crop model can easily be extended to the multi risk–multi crop model by assuming conditional independency.

Keywords: hail insurance, spherical regression, circular regression, spherical clustering

Procedia PDF Downloads 238
3023 Estimating The Population Mean by Using Stratified Double Extreme Ranked Set Sample

Authors: Mahmoud I. Syam, Kamarulzaman Ibrahim, Amer I. Al-Omari

Abstract:

Stratified double extreme ranked set sampling (SDERSS) method is introduced and considered for estimating the population mean. The SDERSS is compared with the simple random sampling (SRS), stratified ranked set sampling (SRSS) and stratified simple set sampling (SSRS). It is shown that the SDERSS estimator is an unbiased of the population mean and more efficient than the estimators using SRS, SRSS and SSRS when the underlying distribution of the variable of interest is symmetric or asymmetric.

Keywords: double extreme ranked set sampling, extreme ranked set sampling, ranked set sampling, stratified double extreme ranked set sampling

Procedia PDF Downloads 442
3022 Enhancing the Interpretation of Group-Level Diagnostic Results from Cognitive Diagnostic Assessment: Application of Quantile Regression and Cluster Analysis

Authors: Wenbo Du, Xiaomei Ma

Abstract:

With the empowerment of Cognitive Diagnostic Assessment (CDA), various domains of language testing and assessment have been investigated to dig out more diagnostic information. What is noticeable is that most of the extant empirical CDA-based research puts much emphasis on individual-level diagnostic purpose with very few concerned about learners’ group-level performance. Even though the personalized diagnostic feedback is the unique feature that differentiates CDA from other assessment tools, group-level diagnostic information cannot be overlooked in that it might be more practical in classroom setting. Additionally, the group-level diagnostic information obtained via current CDA always results in a “flat pattern”, that is, the mastery/non-mastery of all tested skills accounts for the two highest proportion. In that case, the outcome does not bring too much benefits than the original total score. To address these issues, the present study attempts to apply cluster analysis for group classification and quantile regression analysis to pinpoint learners’ performance at different proficiency levels (beginner, intermediate and advanced) thus to enhance the interpretation of the CDA results extracted from a group of EFL learners’ reading performance on a diagnostic reading test designed by PELDiaG research team from a key university in China. The results show that EM method in cluster analysis yield more appropriate classification results than that of CDA, and quantile regression analysis does picture more insightful characteristics of learners with different reading proficiencies. The findings are helpful and practical for instructors to refine EFL reading curriculum and instructional plan tailored based on the group classification results and quantile regression analysis. Meanwhile, these innovative statistical methods could also make up the deficiencies of CDA and push forward the development of language testing and assessment in the future.

Keywords: cognitive diagnostic assessment, diagnostic feedback, EFL reading, quantile regression

Procedia PDF Downloads 135
3021 The Factors of Supply Chain Collaboration

Authors: Ghada Soltane

Abstract:

The objective of this study was to identify factors impacting supply chain collaboration. a quantitative study was carried out on a sample of 84 Tunisian industrial companies. To verify the research hypotheses and test the direct effect of these factors on supply chain collaboration a multiple regression method was used using SPSS 26 software. The results show that there are four factors direct effects that affect supply chain collaboration in a meaningful and positive way, including: trust, engagement, information sharing and information quality

Keywords: supply chain collaboration, factors of collaboration, principal component analysis, multiple regression

Procedia PDF Downloads 22
3020 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques

Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas

Abstract:

The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is needed to test how to properly preprocess the data for this research by preparing pipelines which fits each algorithm the best. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.

Keywords: Artificial Neural network, Competitive dynamics, Logistic Regression, Text classification, Text mining

Procedia PDF Downloads 103
3019 The Usage of Bridge Estimator for Hegy Seasonal Unit Root Tests

Authors: Huseyin Guler, Cigdem Kosar

Abstract:

The aim of this study is to propose Bridge estimator for seasonal unit root tests. Seasonality is an important factor for many economic time series. Some variables may contain seasonal patterns and forecasts that ignore important seasonal patterns have a high variance. Therefore, it is very important to eliminate seasonality for seasonal macroeconomic data. There are some methods to eliminate the impacts of seasonality in time series. One of them is filtering the data. However, this method leads to undesired consequences in unit root tests, especially if the data is generated by a stochastic seasonal process. Another method to eliminate seasonality is using seasonal dummy variables. Some seasonal patterns may result from stationary seasonal processes, which are modelled using seasonal dummies but if there is a varying and changing seasonal pattern over time, so the seasonal process is non-stationary, deterministic seasonal dummies are inadequate to capture the seasonal process. It is not suitable to use seasonal dummies for modeling such seasonally nonstationary series. Instead of that, it is necessary to take seasonal difference if there are seasonal unit roots in the series. Different alternative methods are proposed in the literature to test seasonal unit roots, such as Dickey, Hazsa, Fuller (DHF) and Hylleberg, Engle, Granger, Yoo (HEGY) tests. HEGY test can be also used to test the seasonal unit root in different frequencies (monthly, quarterly, and semiannual). Another issue in unit root tests is the lag selection. Lagged dependent variables are added to the model in seasonal unit root tests as in the unit root tests to overcome the autocorrelation problem. In this case, it is necessary to choose the lag length and determine any deterministic components (i.e., a constant and trend) first, and then use the proper model to test for seasonal unit roots. However, this two-step procedure might lead size distortions and lack of power in seasonal unit root tests. Recent studies show that Bridge estimators are good in selecting optimal lag length while differentiating nonstationary versus stationary models for nonseasonal data. The advantage of this estimator is the elimination of the two-step nature of conventional unit root tests and this leads a gain in size and power. In this paper, the Bridge estimator is proposed to test seasonal unit roots in a HEGY model. A Monte-Carlo experiment is done to determine the efficiency of this approach and compare the size and power of this method with HEGY test. Since Bridge estimator performs well in model selection, our approach may lead to some gain in terms of size and power over HEGY test.

Keywords: bridge estimators, HEGY test, model selection, seasonal unit root

Procedia PDF Downloads 317
3018 Study on Optimal Control Strategy of PM2.5 in Wuhan, China

Authors: Qiuling Xie, Shanliang Zhu, Zongdi Sun

Abstract:

In this paper, we analyzed the correlation relationship among PM2.5 from other five Air Quality Indices (AQIs) based on the grey relational degree, and built a multivariate nonlinear regression equation model of PM2.5 and the five monitoring indexes. For the optimal control problem of PM2.5, we took the partial large Cauchy distribution of membership equation as satisfaction function. We established a nonlinear programming model with the goal of maximum performance to price ratio. And the optimal control scheme is given.

Keywords: grey relational degree, multiple linear regression, membership function, nonlinear programming

Procedia PDF Downloads 282
3017 SVM-Based Modeling of Mass Transfer Potential of Multiple Plunging Jets

Authors: Surinder Deswal, Mahesh Pal

Abstract:

The paper investigates the potential of support vector machines based regression approach to model the mass transfer capacity of multiple plunging jets, both vertical (θ = 90°) and inclined (θ = 60°). The data set used in this study consists of four input parameters with a total of eighty eight cases. For testing, tenfold cross validation was used. Correlation coefficient values of 0.971 and 0.981 (root mean square error values of 0.0025 and 0.0020) were achieved by using polynomial and radial basis kernel functions based support vector regression respectively. Results suggest an improved performance by radial basis function in comparison to polynomial kernel based support vector machines. The estimated overall mass transfer coefficient, by both the kernel functions, is in good agreement with actual experimental values (within a scatter of ±15 %); thereby suggesting the utility of support vector machines based regression approach.

Keywords: mass transfer, multiple plunging jets, support vector machines, ecological sciences

Procedia PDF Downloads 447
3016 The Modelling of Real Time Series Data

Authors: Valeria Bondarenko

Abstract:

We proposed algorithms for: estimation of parameters fBm (volatility and Hurst exponent) and for the approximation of random time series by functional of fBm. We proved the consistency of the estimators, which constitute the above algorithms, and proved the optimal forecast of approximated time series. The adequacy of estimation algorithms, approximation, and forecasting is proved by numerical experiment. During the process of creating software, the system has been created, which is displayed by the hierarchical structure. The comparative analysis of proposed algorithms with the other methods gives evidence of the advantage of approximation method. The results can be used to develop methods for the analysis and modeling of time series describing the economic, physical, biological and other processes.

Keywords: mathematical model, random process, Wiener process, fractional Brownian motion

Procedia PDF Downloads 341
3015 Supervised-Component-Based Generalised Linear Regression with Multiple Explanatory Blocks: THEME-SCGLR

Authors: Bry X., Trottier C., Mortier F., Cornu G., Verron T.

Abstract:

We address component-based regularization of a Multivariate Generalized Linear Model (MGLM). A set of random responses Y is assumed to depend, through a GLM, on a set X of explanatory variables, as well as on a set T of additional covariates. X is partitioned into R conceptually homogeneous blocks X1, ... , XR , viewed as explanatory themes. Variables in each Xr are assumed many and redundant. Thus, Generalised Linear Regression (GLR) demands regularization with respect to each Xr. By contrast, variables in T are assumed selected so as to demand no regularization. Regularization is performed searching each Xr for an appropriate number of orthogonal components that both contribute to model Y and capture relevant structural information in Xr. We propose a very general criterion to measure structural relevance (SR) of a component in a block, and show how to take SR into account within a Fisher-scoring-type algorithm in order to estimate the model. We show how to deal with mixed-type explanatory variables. The method, named THEME-SCGLR, is tested on simulated data.

Keywords: Component-Model, Fisher Scoring Algorithm, GLM, PLS Regression, SCGLR, SEER, THEME

Procedia PDF Downloads 384
3014 Parameter Estimation via Metamodeling

Authors: Sergio Haram Sarmiento, Arcady Ponosov

Abstract:

Based on appropriate multivariate statistical methodology, we suggest a generic framework for efficient parameter estimation for ordinary differential equations and the corresponding nonlinear models. In this framework classical linear regression strategies is refined into a nonlinear regression by a locally linear modelling technique (known as metamodelling). The approach identifies those latent variables of the given model that accumulate most information about it among all approximations of the same dimension. The method is applied to several benchmark problems, in particular, to the so-called ”power-law systems”, being non-linear differential equations typically used in Biochemical System Theory.

Keywords: principal component analysis, generalized law of mass action, parameter estimation, metamodels

Procedia PDF Downloads 495
3013 Development of Computational Approach for Calculation of Hydrogen Solubility in Hydrocarbons for Treatment of Petroleum

Authors: Abdulrahman Sumayli, Saad M. AlShahrani

Abstract:

For the hydrogenation process, knowing the solubility of hydrogen (H2) in hydrocarbons is critical to improve the efficiency of the process. We investigated the H2 solubility computation in four heavy crude oil feedstocks using machine learning techniques. Temperature, pressure, and feedstock type were considered as the inputs to the models, while the hydrogen solubility was the sole response. Specifically, we employed three different models: Support Vector Regression (SVR), Gaussian process regression (GPR), and Bayesian ridge regression (BRR). To achieve the best performance, the hyper-parameters of these models are optimized using the whale optimization algorithm (WOA). We evaluated the models using a dataset of solubility measurements in various feedstocks, and we compared their performance based on several metrics. Our results show that the WOA-SVR model tuned with WOA achieves the best performance overall, with an RMSE of 1.38 × 10− 2 and an R-squared of 0.991. These findings suggest that machine learning techniques can provide accurate predictions of hydrogen solubility in different feedstocks, which could be useful in the development of hydrogen-related technologies. Besides, the solubility of hydrogen in the four heavy oil fractions is estimated in different ranges of temperatures and pressures of 150 ◦C–350 ◦C and 1.2 MPa–10.8 MPa, respectively

Keywords: temperature, pressure variations, machine learning, oil treatment

Procedia PDF Downloads 54
3012 Representativity Based Wasserstein Active Regression

Authors: Benjamin Bobbia, Matthias Picard

Abstract:

In recent years active learning methodologies based on the representativity of the data seems more promising to limit overfitting. The presented query methodology for regression using the Wasserstein distance measuring the representativity of our labelled dataset compared to the global distribution. In this work a crucial use of GroupSort Neural Networks is made therewith to draw a double advantage. The Wasserstein distance can be exactly expressed in terms of such neural networks. Moreover, one can provide explicit bounds for their size and depth together with rates of convergence. However, heterogeneity of the dataset is also considered by weighting the Wasserstein distance with the error of approximation at the previous step of active learning. Such an approach leads to a reduction of overfitting and high prediction performance after few steps of query. After having detailed the methodology and algorithm, an empirical study is presented in order to investigate the range of our hyperparameters. The performances of this method are compared, in terms of numbers of query needed, with other classical and recent query methods on several UCI datasets.

Keywords: active learning, Lipschitz regularization, neural networks, optimal transport, regression

Procedia PDF Downloads 69
3011 A Machine Learning Approach for Earthquake Prediction in Various Zones Based on Solar Activity

Authors: Viacheslav Shkuratskyy, Aminu Bello Usman, Michael O’Dea, Saifur Rahman Sabuj

Abstract:

This paper examines relationships between solar activity and earthquakes; it applied machine learning techniques: K-nearest neighbour, support vector regression, random forest regression, and long short-term memory network. Data from the SILSO World Data Center, the NOAA National Center, the GOES satellite, NASA OMNIWeb, and the United States Geological Survey were used for the experiment. The 23rd and 24th solar cycles, daily sunspot number, solar wind velocity, proton density, and proton temperature were all included in the dataset. The study also examined sunspots, solar wind, and solar flares, which all reflect solar activity and earthquake frequency distribution by magnitude and depth. The findings showed that the long short-term memory network model predicts earthquakes more correctly than the other models applied in the study, and solar activity is more likely to affect earthquakes of lower magnitude and shallow depth than earthquakes of magnitude 5.5 or larger with intermediate depth and deep depth.

Keywords: k-nearest neighbour, support vector regression, random forest regression, long short-term memory network, earthquakes, solar activity, sunspot number, solar wind, solar flares

Procedia PDF Downloads 58
3010 A Hybrid Fuzzy Clustering Approach for Fertile and Unfertile Analysis

Authors: Shima Soltanzadeh, Mohammad Hosain Fazel Zarandi, Mojtaba Barzegar Astanjin

Abstract:

Diagnosis of male infertility by the laboratory tests is expensive and, sometimes it is intolerable for patients. Filling out the questionnaire and then using classification method can be the first step in decision-making process, so only in the cases with a high probability of infertility we can use the laboratory tests. In this paper, we evaluated the performance of four classification methods including naive Bayesian, neural network, logistic regression and fuzzy c-means clustering as a classification, in the diagnosis of male infertility due to environmental factors. Since the data are unbalanced, the ROC curves are most suitable method for the comparison. In this paper, we also have selected the more important features using a filtering method and examined the impact of this feature reduction on the performance of each methods; generally, most of the methods had better performance after applying the filter. We have showed that using fuzzy c-means clustering as a classification has a good performance according to the ROC curves and its performance is comparable to other classification methods like logistic regression.

Keywords: classification, fuzzy c-means, logistic regression, Naive Bayesian, neural network, ROC curve

Procedia PDF Downloads 319