Search results for: simple regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5917


5797 Statistical Comparison of Ensemble Based Storm Surge Forecasting Models

Authors: Amin Salighehdar, Ziwen Ye, Mingzhe Liu, Ionut Florescu, Alan F. Blumberg

Abstract:

Storm surge is an abnormal water level caused by a storm. Accurate prediction of a storm surge is a challenging problem. Researchers have developed various ensemble modeling techniques that combine several individual forecasts to produce an overall, presumably better, forecast. Some simple ensemble modeling techniques exist in the literature. For instance, Model Output Statistics (MOS) and running mean-bias removal are widely used techniques in the storm surge prediction domain. However, these methods have some drawbacks; for instance, MOS is based on multiple linear regression and needs a long period of training data. To overcome the shortcomings of these simple methods, researchers have proposed more advanced methods. For instance, ENSURF (Ensemble SURge Forecast) is a multi-model application for sea level forecasting that creates a better forecast of sea level using a combination of several instances of Bayesian Model Averaging (BMA). An ensemble dressing method is based on identifying the best member forecast and using it for prediction. Our contribution in this paper can be summarized as follows. First, we investigate whether ensemble models perform better than any single forecast, which requires identifying the single best forecast; we present a methodology based on a simple Bayesian selection method to select it. Second, we present several new and simple ways to construct ensemble models, using correlation and standard deviation as weights in combining different forecast models. Third, we use these ensembles and compare them with several existing models in the literature for forecasting storm surge level. We then investigate whether developing a complex ensemble model is indeed needed. To this end, we use a simple average (one of the simplest and most widely used ensemble models) as a benchmark.
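
The abstract does not give the exact weighting formulas, so the following is only a minimal sketch, assuming weights proportional to each member's non-negative Pearson correlation with observations over a training window, or to the inverse standard deviation of its training errors. The function names are hypothetical and NumPy is assumed; rows are ensemble members and columns are time steps.

```python
import numpy as np

def simple_average(forecasts):
    """Equal-weight ensemble: mean over members (rows = members, cols = time)."""
    return np.asarray(forecasts, float).mean(axis=0)

def correlation_weighted(train_fc, train_obs, test_fc):
    """Weights proportional to each member's correlation with the observations.

    Negative or undefined correlations are set to zero; if every weight is
    zero, fall back to equal weights (the simple-average benchmark).
    """
    train_fc = np.asarray(train_fc, float)
    corr = np.array([np.corrcoef(f, train_obs)[0, 1] for f in train_fc])
    w = np.clip(np.nan_to_num(corr), 0.0, None)
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    return w @ np.asarray(test_fc, float)

def inverse_std_weighted(train_fc, train_obs, test_fc):
    """Weights proportional to 1 / (std of each member's training error)."""
    errors = np.asarray(train_fc, float) - np.asarray(train_obs, float)
    w = 1.0 / errors.std(axis=1)
    w = w / w.sum()
    return w @ np.asarray(test_fc, float)
```

Both weighted schemes reduce to the simple-average benchmark when all members have identical skill, which makes the comparison against that benchmark direct.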
Predicting the peak surge level during a storm, as well as the precise time at which this peak occurs, is crucial; thus, we develop a statistical platform to compare the performance of various ensemble methods. This statistical analysis is based on the root mean square error of the ensemble forecast during the testing period and on the magnitude and timing of the forecasted peak surge compared to the actual peak and its timing. In this work, we analyze four hurricanes: hurricanes Irene and Lee in 2011, hurricane Sandy in 2012, and hurricane Joaquin in 2015. Since hurricane Irene developed at the end of August 2011 and hurricane Lee started just after Irene at the beginning of September 2011, we consider them as a single contiguous hurricane event in this study. The data set used for this study is generated by the New York Harbor Observing and Prediction System (NYHOPS). We find that even the simplest possible way of creating an ensemble produces results superior to any single forecast. We also show that the ensemble models we propose generally perform better than the simple average ensemble technique.
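
The three evaluation criteria named above (RMSE over the test period, peak magnitude error, and peak timing error) can be sketched as follows. This is an illustrative sketch with hypothetical function names, assuming NumPy and taking the peak as the single maximum of each series; errors are reported as forecast minus observed.

```python
import numpy as np

def rmse(forecast, observed):
    """Root mean square error between two equally long series."""
    f = np.asarray(forecast, float)
    o = np.asarray(observed, float)
    return float(np.sqrt(np.mean((f - o) ** 2)))

def peak_errors(forecast, observed, times):
    """Return (peak magnitude error, peak timing error), forecast minus observed."""
    f = np.asarray(forecast, float)
    o = np.asarray(observed, float)
    i_f, i_o = int(np.argmax(f)), int(np.argmax(o))
    return float(f[i_f] - o[i_o]), times[i_f] - times[i_o]
```

A positive timing error means the forecast peak arrives later than the observed peak.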

Keywords: Bayesian learning, ensemble model, statistical analysis, storm surge prediction

Procedia PDF Downloads 287
5796 Climate Related Variability and Stock-Recruitment Relationship of the North Pacific Albacore Tuna

Authors: Ashneel Ajay Singh, Naoki Suzuki, Kazumi Sakuramoto

Abstract:

The North Pacific albacore (Thunnus alalunga) is a temperate tuna species distributed in the North Pacific that is of significant economic importance to the Pacific Island Nations and Territories. Despite its importance, there are still gaps in knowledge of the stock dynamics and ecological characteristics of albacore. The stock-recruitment relationship of the North Pacific stock of albacore tuna was investigated for different density-dependent effects and for a regime shift in the stock characteristics in response to changes in environmental and climatic conditions. Linear regression analyses of recruits per spawning biomass (RPS) and recruitment (R) against female spawning stock biomass (SSB) were significant, indicating the presence of different density-dependent effects, and were positive for a regime shift in the stock time series. Application of Deming regression to RPS against SSB, assuming observation and process errors in both the dependent and independent variables, confirmed the results of simple regression. However, the results for R against SSB disagreed with the linear regression results for a variance level of < 3 and agreed with them for a variance level of ≥ 3. Assuming the presence of different density-dependent effects in the albacore tuna time series, environmental and climatic condition variables were compared with R, RPS, and SSB. Significant relationships of R, RPS and SSB were found with sea surface temperature (SST), the Pacific Decadal Oscillation (PDO) and the multivariate El Niño Southern Oscillation (ENSO), with SST being the principal variable exhibiting a significantly similar trend with R and RPS. Recruitment is significantly influenced by the dynamics of the SSB as well as by environmental conditions, which demonstrates that the stock-recruitment relationship is multidimensional. Further investigation of North Pacific albacore tuna age classes and structure is necessary to further support the results presented here.
It is important for fishery managers and decision makers to be vigilant about regime shifts in environmental conditions relating to albacore tuna, as these may cause regime shifts in albacore R and RPS, which should be taken into account to effectively and sustainably formulate harvesting plans and manage the species in the North Pacific oceanic region.
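
Deming regression, used above for RPS and R against SSB, has a closed-form slope once the ratio of error variances is fixed. The sketch below uses the textbook estimator with `delta` = (variance of errors in y) / (variance of errors in x); treating the abstract's "variance level" threshold of 3 as this `delta` is an assumption, and the function name is hypothetical.

```python
import numpy as np

def deming(x, y, delta=1.0):
    """Deming regression. delta = var(errors in y) / var(errors in x).

    Returns (slope, intercept). As delta grows (x nearly error-free),
    the slope approaches the ordinary least squares slope.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    sxx = np.mean((x - mx) ** 2)
    syy = np.mean((y - my) ** 2)
    sxy = np.mean((x - mx) * (y - my))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    d = syy - delta * sxx
    slope = (d + np.sqrt(d ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return float(slope), float(my - slope * mx)
```

Comparing `deming(ssb, r, delta)` across several `delta` values against the OLS fit mirrors the kind of sensitivity check described in the abstract.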

Keywords: Albacore tuna, Thunnus alalunga, recruitment, spawning stock biomass, recruits per spawning biomass, sea surface temperature, pacific decadal oscillation, El Niño southern oscillation, density-dependent effects, regime shift

Procedia PDF Downloads 275
5795 Instability Index Method and Logistic Regression to Assess Landslide Susceptibility in County Route 89, Taiwan

Authors: Y. H. Wu, Ji-Yuan Lin, Yu-Ming Liou

Abstract:

This study aims to produce a landslide susceptibility map of County Route 89 in Ren-Ai Township, Nantou County, using the Instability Index Method and logistic regression. Seven susceptibility factors, namely Slope Angle, Aspect, Elevation, Distance to Fold, Distance to River, Distance to Road and Accumulated Rainfall, were obtained by GIS based on the Typhoon Toraji landslide area identified by the Industrial Technology Research Institute in 2001. The Instability Index Method was used to calculate the landslide percentage for each factor, derive the factor weights, and grade the grid cells. In this study, landslide susceptibility was classified into four grades (high, medium high, medium low and low) in order to determine the advantages and disadvantages of the two models. The precision of the models was verified using a classification error matrix and the SRC curve. These results suggest that the logistic regression model is preferable to the instability index method for assessing landslide susceptibility, and that it is suitable for future landslide prediction and precaution in this area.
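
The logistic regression half of the comparison can be sketched with plain NumPy: fit landslide occurrence (1/0 per grid cell) against the susceptibility factors, then cut the predicted probabilities into four grades. Plain gradient descent and quartile-based grade boundaries are assumptions made for illustration, not the authors' settings, and the function names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Logistic regression by batch gradient descent (intercept added)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)   # gradient of mean log-loss
    return w

def predict_proba(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def grade(prob, labels=("low", "medium low", "medium high", "high")):
    """Assign four susceptibility grades using quartiles of the probabilities."""
    edges = np.quantile(prob, [0.25, 0.5, 0.75])
    return [labels[i] for i in np.digitize(prob, edges)]
```

In practice the factors would be standardized first, and the grades would be checked against the error matrix and success-rate style curve mentioned in the abstract.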

Keywords: instability index method, logistic regression, landslide susceptibility, SRC curve

Procedia PDF Downloads 260
5794 Regret-Regression for Multi-Armed Bandit Problem

Authors: Deyadeen Ali Alshibani

Abstract:

In the literature, the multi-armed bandit problem is treated as a statistical decision model of an agent trying to optimize its decisions while simultaneously improving its information. Several different algorithms and models, along with their applications, have been proposed for this problem. In this paper, we evaluate regret-regression by comparing it with the Q-learning method. A simulation for determining an optimal treatment regime is presented in detail.
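
For context on the Q-learning baseline, the single-state (bandit) case reduces to learning one action value per arm. The sketch below uses epsilon-greedy exploration with incremental sample-mean updates and tracks cumulative regret. It does not implement regret-regression, and the Gaussian rewards with unit variance and the parameter values are assumptions for illustration.

```python
import random

def epsilon_greedy_bandit(true_means, n_rounds=2000, epsilon=0.1, seed=0):
    """Sample-average action-value learning on a Gaussian multi-armed bandit.

    Returns (estimated values, pull counts, cumulative expected regret).
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k          # action-value estimates
    counts = [0] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            a = rng.randrange(k)                      # explore
        else:
            a = max(range(k), key=lambda i: q[i])     # exploit
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]           # incremental mean update
        regret += best - true_means[a]
    return q, counts, regret
```

Cumulative regret is the natural common yardstick when comparing this baseline with a regret-based method.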

Keywords: optimal, bandit problem, optimization, dynamic programming

Procedia PDF Downloads 424
5793 The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models

Authors: Jihye Jeon

Abstract:

This paper analyzes the conceptual framework of three statistical methods: multiple regression, path analysis, and structural equation models. When building a statistical model of a complex social phenomenon, it is important to know the strengths and limitations of these three statistical models. This study explored the characteristics, strengths, and limitations of each approach and suggested strategies for accurately explaining or predicting the causal relationships among variables. In particular, common research modeling mistakes in studies of depression and mental health were discussed.
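
One concrete difference between multiple regression and path analysis is that path analysis chains regressions and decomposes effects. The sketch below estimates a simple mediation model X -> M -> Y with OLS; the variable roles are hypothetical. For OLS with intercepts, the total effect equals the direct effect plus the indirect effect a*b exactly.

```python
import numpy as np

def ols(y, *predictors):
    """OLS coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation_paths(x, m, y):
    """Path coefficients for X -> M -> Y plus the direct path X -> Y."""
    a = ols(m, x)[1]                   # X -> M
    direct, b = ols(y, x, m)[1:3]      # X -> Y controlling for M, and M -> Y
    total = ols(y, x)[1]               # X -> Y without the mediator
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}
```

A single multiple regression of Y on X and M reports only `direct` and `b`; the indirect path is what path analysis adds.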

Keywords: multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon

Procedia PDF Downloads 603
5792 QSRR Analysis of 17-Picolyl and 17-Picolinylidene Androstane Derivatives Based on Partial Least Squares and Principal Component Regression

Authors: Sanja Podunavac-Kuzmanović, Strahinja Kovačević, Lidija Jevrić, Evgenija Djurendić, Jovana Ajduković

Abstract:

There are several methods for determining the lipophilicity of biologically active compounds; however, chromatography has proven to be a very suitable method for this purpose. Chromatographic (C18-RP-HPLC) analysis of a series of 24 17-picolyl and 17-picolinylidene androstane derivatives was carried out. The obtained retention indices (logk, methanol (90%) / water (10%)) were correlated with calculated physicochemical and lipophilicity descriptors. The QSRR analysis was carried out by applying principal component regression (PCR) and partial least squares regression (PLS). The PCR and PLS models were selected on the basis of the highest explained variance and the lowest root mean square error of cross-validation. The obtained PCR and PLS models successfully correlate the calculated molecular descriptors with the logk parameter, indicating the significance of compound lipophilicity in the chromatographic process. On the basis of the obtained results, it can be concluded that the logk parameters of the analyzed androstane derivatives can be considered as their chromatographic lipophilicity. These results are part of project No. 114-451-347/2015-02, financially supported by the Provincial Secretariat for Science and Technological Development of Vojvodina and CMST COST Action CM1105.
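
The PCR model-selection step (choosing the number of components by the lowest root mean square error of cross-validation) can be sketched as below. Leave-one-out cross-validation is an assumption here, the function names are hypothetical, and PLS is not shown.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression on the first k PCs of centred X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:k].T                       # loadings of the first k components
    T = Xc @ V                         # component scores
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, V @ gamma   # coefficients in descriptor space

def pcr_predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def rmsecv(X, y, k):
    """Leave-one-out root mean square error of cross-validation for k PCs."""
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pcr_fit(X[keep], y[keep], k)
        errs.append(pcr_predict(model, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errs))))
```

Scanning `rmsecv(X, y, k)` over k and keeping the minimum is one common way to pick the component count.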

Keywords: androstane derivatives, chromatography, molecular structure, principal component regression, partial least squares regression

Procedia PDF Downloads 242
5791 Detecting Earnings Management via Statistical and Neural Networks Techniques

Authors: Mohammad Namazi, Mohammad Sadeghzadeh Maharluie

Abstract:

Predicting earnings management is vital for capital market participants, financial analysts and managers. This research addresses the following question: is there a significant difference between the regression model and neural network models in predicting earnings management, and which one yields a superior prediction? To approach this question, a Linear Regression (LR) model was compared with two neural networks: a Multi-Layer Perceptron (MLP) and a Generalized Regression Neural Network (GRNN). The population of this study includes 94 companies listed on the Tehran Stock Exchange (TSE) from 2003 to 2011. After the results of all models were obtained, ANOVA was applied to test the hypotheses. Overall, the statistical results showed that the precision of GRNN did not differ significantly from that of MLP. In addition, the mean square errors of the MLP and GRNN differed significantly from that of the multivariable LR model. These findings support the notion that earnings management behaves nonlinearly. Therefore, it is more appropriate for capital market participants to analyze earnings management using neural network techniques rather than linear regression models.
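
A GRNN is, in essence, Gaussian-kernel (Nadaraya-Watson) regression: each prediction is a weighted average of training targets, with weights decaying with squared distance. The sketch below assumes a single smoothing parameter `sigma` and is not the authors' configuration; the function name is hypothetical.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_new, sigma=0.5):
    """Generalized regression neural network (Gaussian kernel regression)."""
    X_train = np.asarray(X_train, float)
    X_new = np.asarray(X_new, float)
    # Squared Euclidean distance between every new point and every training point
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return (w @ np.asarray(y_train, float)) / w.sum(axis=1)
```

Because the fit is local, a GRNN can follow nonlinear patterns that a linear regression misses, which is the kind of MSE gap the abstract reports; `sigma` controls the smoothness and is usually tuned by validation.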

Keywords: earnings management, generalized linear regression, neural networks, multi-layer perceptron, Tehran stock exchange

Procedia PDF Downloads 397
5790 Edge Enhancement Visual Methodology for Fat Amount and Distribution Assessment in Dry-Cured Ham Slices

Authors: Silvia Grassi, Stefano Schiavon, Ernestina Casiraghi, Cristina Alamprese

Abstract:

Dry-cured ham is an uncooked meat product particularly appreciated for its distinctive sensory traits, among which the lipid component plays a key role in defining quality and, consequently, consumer acceptability. Usually, fat content and distribution are determined chemically by expensive, time-consuming, and destructive analyses. Moreover, different sensory techniques are applied to assess product conformity to desired standards. In this context, visual systems are gaining a foothold in the meat market, promising more reliable and time-saving assessment of food quality traits. The present work aims at developing a simple but systematic and objective visual methodology to assess the fat amount of dry-cured ham slices, in terms of total, intermuscular and intramuscular fractions. To this end, 160 slices from 80 PDO dry-cured hams were evaluated by digital image analysis and Soxhlet extraction. RGB images were captured by a flatbed scanner, converted into grey-scale images, and segmented based on intensity histograms as well as on a multi-stage algorithm aimed at edge enhancement. The latter was performed by applying the Canny algorithm, which consists of image noise reduction, calculation of the intensity gradient for each image, spurious response removal, actual thresholding on corrected images, and confirmation of strong edge boundaries. The approach allowed for the automatic calculation of total, intermuscular and intramuscular fat fractions as percentages of the total slice area. Linear regression models were run to estimate the relationships between the image analysis results and the chemical data, thus allowing the total, intermuscular and intramuscular fat content to be predicted from the dry-cured ham images. The goodness of fit of the obtained models was confirmed in terms of the coefficient of determination (R²), hypothesis testing and the pattern of residuals.
Good regression models were found, with R² values of 0.73, 0.82, and 0.73 for the total fat, the sum of intermuscular and intramuscular fat, and the intermuscular fraction, respectively. In conclusion, the edge enhancement visual procedure achieved good fat segmentation, making the visual approach for quantifying the different fat fractions in dry-cured ham slices sufficiently simple, accurate and precise. The presented image analysis approach steers towards the development of instruments that can overcome the need for destructive, tedious and time-consuming chemical determinations. As a future perspective, the results of the proposed image analysis methodology will be compared with those of sensory tests in order to develop a fast grading method for dry-cured hams based on fat distribution. In this way, the system will be able not only to predict the actual fat content but also to reflect the visual appearance of samples as perceived by consumers.
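
The histogram-based part of the pipeline (thresholding a grey-scale slice, expressing fat as a percentage of slice area, then checking the linear fit against chemical data with R²) can be sketched as follows. Otsu's method is used here as one standard histogram threshold; it is an assumption, as is treating fat as the brighter class. The Canny edge-enhancement stage is not shown, and the function names are hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Grey level that maximises between-class variance of an 8-bit histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # class-0 probability per threshold
    mu = np.cumsum(p * np.arange(256))        # class-0 cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(sigma_b))

def fat_percentage(gray, slice_mask):
    """Percent of slice pixels brighter than the threshold (fat assumed brighter)."""
    pixels = gray[slice_mask]
    t = otsu_threshold(pixels)
    return 100.0 * np.count_nonzero(pixels > t) / np.count_nonzero(slice_mask)

def r_squared(x, y):
    """Coefficient of determination of a straight-line fit y ~ x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
```

Here `r_squared(image_fat_percent, soxhlet_fat)` would play the role of the goodness-of-fit check between image results and chemical data.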

Keywords: dry-cured ham, edge detection algorithm, fat content, image analysis

Procedia PDF Downloads 151
5789 The Relationship between Employee Commitment, Job Satisfaction and External Market Orientation in Vietnamese Joint-Stock Commercial Banks

Authors: Nguyen Ngoc Que Tran

Abstract:

Purpose: The purpose of this paper is to investigate the relationship between internal market orientation, external market orientation, employee commitment and job satisfaction. Design/methodology/approach: This study collected data through a survey and used simple linear regression and multiple regression analysis to determine whether there was support for the research hypotheses. Findings: Using data from 256 employees of four leading joint-stock banks in Vietnam, the empirical results indicate that employee commitment is positively related to external market orientation, job satisfaction is positively related to employee commitment, and employee commitment and job satisfaction are positively related to external market orientation. However, job satisfaction has no significant positive effect on external market orientation. Theoretical contribution: The primary contribution to marketing theory arising from this study is the integration of job satisfaction, employee commitment, and external market orientation in a single research model. Practical implications: The major contribution to practice is that an externally market-oriented bank has to respond rapidly to the future needs and preferences of its customers. This could result in high levels of commitment to the service process and thereby provide Vietnamese joint-stock commercial banks with a competitive advantage. The finding is important for the banking service sector in general and the Vietnamese banking industry in particular.
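
Testing a hypothesis such as "employee commitment is positively related to external market orientation" with simple linear regression amounts to estimating the slope and its t statistic for H0: slope = 0. A dependency-free sketch follows; the function name is hypothetical, and converting t to a p-value (with n - 2 degrees of freedom) is left to a statistics library.

```python
import math

def simple_regression(x, y):
    """Return (slope, intercept, t statistic for H0: slope == 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    if s2 == 0:
        return slope, intercept, math.copysign(math.inf, slope)
    return slope, intercept, slope / math.sqrt(s2 / sxx)
```

For example, `simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])` gives a slope of 0.6, an intercept of 2.2 and t of about 2.12.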

Keywords: employee commitment, job satisfaction, external market orientation, Vietnam, bank

Procedia PDF Downloads 383
5788 Comparative Study of Three Artificial Intelligence Techniques for Rain Domain in Precipitation Forecast

Authors: Nabilah Filzah Mohd Radzuan, Andi Putra, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan

Abstract:

Precipitation forecasting is important for avoiding natural disasters that can cause losses in the affected area. This paper reviews three techniques used for precipitation forecasting: logistic regression, decision tree, and random forest. Combining these techniques through the vector autoregression (VAR) model helps identify the advantages and strengths of each technique in the forecasting process. The dataset contains variables from the rain domain. Adapting artificial intelligence techniques to the rain domain makes precipitation forecasting easier and more systematic.
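
The abstract does not say how the VAR model is combined with the three classifiers, so the sketch below covers only the VAR part: a least-squares VAR(1) fit, y_t = c + A y_{t-1} + e_t, and iterated multi-step forecasts. The lag order of one and the function names are assumptions.

```python
import numpy as np

def fit_var1(Y):
    """Least-squares VAR(1). Y has shape (T, k); returns (c, A)."""
    Y = np.asarray(Y, float)
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)   # B has shape (k + 1, k)
    return B[0], B[1:].T                             # intercept c, matrix A

def forecast_var1(c, A, y_last, steps):
    """Iterate y_{t+1} = c + A y_t for the requested number of steps."""
    out, y = [], np.asarray(y_last, float)
    for _ in range(steps):
        y = c + A @ y
        out.append(y)
    return np.array(out)
```

The VAR forecasts of the weather variables could then serve as inputs to a rain / no-rain classifier, which is one plausible reading of the combination described above.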

Keywords: logistic regression, decision tree, random forest, VAR model

Procedia PDF Downloads 418
5787 A Study of User Awareness and Attitudes Towards Civil-ID Authentication in Oman’s Electronic Services

Authors: Raya Al Khayari, Rasha Al Jassim, Muna Al Balushi, Fatma Al Moqbali, Said El Hajjar

Abstract:

This study uses linear regression analysis to investigate the correlation between user account passwords and the probability of civil ID exposure, offering statistical insights into civil ID security. The study employs multiple linear regression (MLR) analysis to further investigate the factors that influence consumers’ views of civil ID security, with the aim of increasing awareness and improving preventive measures. The results obtained from the MLR analysis provide a thorough understanding and can guide specific educational and awareness campaigns aimed at promoting improved security procedures. In summary, the study’s results offer significant insights for improving existing security measures and developing more efficient tactics to reduce risks related to civil ID security in Oman. By identifying key factors that affect consumers’ perceptions, organizations can tailor their strategies to address vulnerabilities effectively. Additionally, the findings can inform policymakers about potential regulatory changes to enhance civil ID security in the country.
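
"Identifying key factors" from an MLR model is often done by comparing standardized (beta) coefficients, because raw coefficients depend on each predictor's units. This is a minimal sketch of that idea with a hypothetical function name. It is not necessarily how the study ranked factors.

```python
import numpy as np

def standardized_coefficients(X, y):
    """OLS beta weights: regression of z-scored y on z-scored predictors."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # No intercept column is needed because both sides are centred
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta
```

Each beta equals the raw coefficient times std(x_j) / std(y), so a predictor measured on a large scale is not penalized or favoured by its units alone.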

Keywords: civil-id disclosure, awareness, linear regression, multiple regression

Procedia PDF Downloads 11
5786 A Research on Inference from Multiple Distance Variables in Hedonic Regression Focus on Three Variables

Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro

Abstract:

In an urban context, urban nodes such as amenities or hazards affect house prices, and classic hedonic analysis employs distance variables measured from each urban node. However, the estimated effects of distances to facilities on house prices generally do not represent the true price of the property. Distance variables measured on the same surface suffer from a problem called multicollinearity, which usually appears in regression as unstable magnitudes, variances and mean values of the estimated coefficients, that is, errors caused by instability. In this paper, we provide a theoretical framework to identify and gather data with less bias, as well as a specific sampling method for locating the sample region to avoid the spatial multicollinearity problem in the case of three distance variables.
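
A standard diagnostic for the multicollinearity described above is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing distance variable j on the other distance variables. The sketch below is a generic diagnostic with a hypothetical function name, not the paper's sampling method.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, float)
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

A common rule of thumb treats VIF values above about 10 as a sign of problematic collinearity; a sampling region chosen to decorrelate the three distances should bring all three VIFs close to 1.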

Keywords: hedonic regression, urban node, distance variables, multicollinearity, collinearity

Procedia PDF Downloads 440
5785 The Efficacy of Clobazam for Landau-Kleffner Syndrome

Authors: Nino Gogatishvili, Davit Kvernadze, Giorgi Japharidze

Abstract:

Background and aims: Landau-Kleffner syndrome (LKS) is a rare disorder with epileptic seizures and acquired aphasia. It usually starts in initially healthy children. The first symptoms are language regression and behavioral disturbances, and the sleep EEG reveals abnormal epileptiform activity. The aim was to discuss the efficacy of Clobazam for Landau-Kleffner syndrome. Case report: We report the case of an 11-year-old boy born after an uneventful pregnancy and delivery. He began to walk at 11 months and to speak in simple phrases at the age of 2.5 years. At the age of 18 months, he had febrile convulsions; at the age of 5 years, his parents noticed language regression, stuttering, and serious behavioral dysfunction, including hyperactivity and temper outbursts. No epileptic seizures were noticed. MRI showed no abnormality. Neuropsychological testing revealed verbal auditory agnosia. Sleep EEG showed abundant left fronto-temporal spikes, reaching over 85% during non-rapid eye movement sleep (non-REM sleep). Treatment was started with Clobazam. After ten weeks, the EEG had improved, and stuttering and behavior had also improved. Results: Since the start of Clobazam treatment, stuttering and behavior have improved. He is now 11 years old and off antiseizure medication. Sleep EEG shows left-sided fronto-temporal spikes in 10-49% of non-REM sleep, bioccipital spikes, and slow-wave discharges and spike-waves. Conclusions: This case provides further support for the efficacy of Clobazam in patients with LKS.

Keywords: Landau-Kleffner syndrome, antiseizure medication, stuttering, aphasia

Procedia PDF Downloads 45
5784 Calculating Shear Strength Parameter from Simple Shear Apparatus

Authors: G. Nitesh

Abstract:

The shear strength of soils is a crucial parameter in stability analysis. Therefore, it is important to determine reliable values for accurate stability analysis. Direct shear tests are mostly performed to determine the shear strength of cohesionless soils. The major limitation of the direct shear test is that failure is forced to occur along a predefined plane, whereas in the actual shearing mechanism in the field, failure occurs along the weakest plane. This leads to overestimation of the strength parameter; hence, a new apparatus, the simple shear apparatus, which simulates field conditions, was developed and used in this study to determine the shear strength parameter.
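
Whichever apparatus is used, the angle of shear resistance is commonly extracted by fitting the Mohr-Coulomb envelope tau = c + sigma tan(phi) to peak shear stress versus normal stress. The sketch below shows both the general fit and the through-origin fit often used for cohesionless soils. How the paper reduces simple shear data to the stresses on the failure plane is not stated, so feeding these functions with simple shear measurements is an assumption, and the function names are hypothetical.

```python
import math
import numpy as np

def mohr_coulomb_fit(normal_stress, shear_stress):
    """Fit tau = c + sigma * tan(phi); return (c, phi in degrees)."""
    slope, c = np.polyfit(normal_stress, shear_stress, 1)
    return float(c), math.degrees(math.atan(slope))

def friction_angle_through_origin(normal_stress, shear_stress):
    """Cohesionless case (c = 0): least-squares slope through the origin."""
    s = np.asarray(normal_stress, float)
    t = np.asarray(shear_stress, float)
    return math.degrees(math.atan(np.sum(s * t) / np.sum(s * s)))
```

Running both fits on direct shear and simple shear results for the same soil would make the overestimation discussed above directly visible as a difference in phi.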

Keywords: direct shear, simple shear, angle of shear resistance, cohesionless soils

Procedia PDF Downloads 385
5783 Urban Energy Demand Modelling: Spatial Analysis Approach

Authors: Hung-Chu Chen, Han Qi, Bauke de Vries

Abstract:

Energy consumption in the urban environment has attracted numerous studies in recent decades. However, it is comparatively rare to find works that investigate 3D spatial analysis of urban energy demand modelling. In order to comprehensively analyze the spatial correlation between urban morphology and energy demand, this paper investigates their relationship using spatial regression tools, namely ordinary least squares (OLS) regression and geographically weighted regression (GWR). The Normalized Difference Built-up Index (NDBI), Normalized Difference Vegetation Index (NDVI), and building volume describe urban morphology and act as the independent variables of the Energy-land use (E-L) model. NDBI and NDVI are used as indices to describe five types of land use: urban area (U), open space (O), artificial green area (G), natural green area (V), and water body (W). Annual electricity demand, gas demand and energy demand are the dependent variables of the E-L model. The analysis of the E-L model reveals that energy demand and urban morphology are closely connected; possible causes and practical uses are discussed. In addition, the OLS and GWR spatial analysis methods are compared.
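
The sketch below illustrates the ingredients named above: the standard band-ratio definitions of NDVI and NDBI, and a basic GWR that fits a separate weighted least squares regression at every location. The fixed Gaussian kernel and the hand-picked bandwidth are assumptions (GWR software usually selects the bandwidth by AICc or cross-validation), and the function names are hypothetical.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index = (NIR - Red) / (NIR + Red)."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (nir - red) / (nir + red)

def ndbi(swir, nir):
    """Normalized Difference Built-up Index = (SWIR - NIR) / (SWIR + NIR)."""
    swir, nir = np.asarray(swir, float), np.asarray(nir, float)
    return (swir - nir) / (swir + nir)

def gwr_coefficients(coords, X, y, bandwidth):
    """Local coefficients [intercept, b1, ...] at each observation location."""
    coords = np.asarray(coords, float)
    Xb = np.column_stack([np.ones(len(y)), X])
    betas = []
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)      # Gaussian distance decay
        XtW = Xb.T * w                               # equals X^T W with W = diag(w)
        betas.append(np.linalg.solve(XtW @ Xb, XtW @ y))
    return np.array(betas)
```

With a very large bandwidth every weight approaches 1 and GWR collapses to global OLS, which is a convenient way to see how the two methods being compared relate.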

Keywords: energy demand model, geographically weighted regression, normalized difference built-up index, normalized difference vegetation index, spatial statistics

Procedia PDF Downloads 118
5782 Modeling Aeration of Sharp Crested Weirs by Using Support Vector Machines

Authors: Arun Goel

Abstract:

The present paper investigates the prediction of the air entrainment rate and aeration efficiency of free over-fall jets issuing from a triangular sharp-crested weir using regression-based modelling. Empirical equations, support vector machine (polynomial and radial basis function) models, and linear regression techniques were applied to triangular sharp-crested weirs, relating the air entrainment rate and the aeration efficiency to the input parameters, namely drop height, discharge, and vertex angle. Good agreement was observed between the measured values and the values obtained using the empirical equations, support vector machine (polynomial and RBF) models, and linear regression techniques. The test results demonstrated that the SVM-based (polynomial and RBF) models, along with the empirical equations and linear regression techniques, provided acceptable predictions of the measured air entrainment rate and aeration efficiency with reasonable accuracy. A sensitivity analysis was also performed to study the impact of each input parameter on the outputs, namely the air entrainment rate and aeration efficiency.
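
The two SVM variants differ only in their kernel. The sketch below defines the polynomial and RBF kernels and, to stay dependency-light, plugs them into kernel ridge regression rather than support vector regression. Kernel ridge uses a squared loss instead of the epsilon-insensitive SVR loss, so this is a stand-in that illustrates the kernels, not the paper's models. Parameter values and function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def poly_kernel(A, B, degree=2, coef0=1.0):
    """k(a, b) = (a . b + coef0) ** degree."""
    return (A @ B.T + coef0) ** degree

def kernel_ridge_fit_predict(X_train, y_train, X_new, kernel, lam=1e-3, **kw):
    """Kernel ridge regression (stand-in for SVR) with the given kernel."""
    K = kernel(X_train, X_train, **kw)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)
    return kernel(X_new, X_train, **kw) @ alpha
```

With real data, the inputs (drop height, discharge, vertex angle) should be scaled before applying either kernel, since both are sensitive to feature magnitudes.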

Keywords: air entrainment rate, dissolved oxygen, weir, SVM, regression

Procedia PDF Downloads 405
5781 Predicting COVID-19 Severity Using Simple Parameters in Resource-Limited Settings

Authors: Sireethorn Nimitvilai, Ussanee Poolvivatchaikarn, Nuchanart Tomeun

Abstract:

Objective: To identify simple laboratory parameters that predict disease severity among COVID-19 patients in resource-limited settings. Material and methods: A retrospective cohort study was conducted at Nakhonpathom Hospital, a 722-bed tertiary care hospital with an average of 50,000 admissions per year, between April 15 and May 15, 2021. Eligible patients were adults aged ≥ 15 years who were hospitalized with COVID-19. Baseline characteristics, comorbid conditions and laboratory findings at admission were collected. Predictive factors for severe COVID-19 infection were analyzed. Result: There were 207 patients (79 male and 128 female), and the mean (SD) age was 46.7 (16.8) years. Of these, 39 cases (18.8%) were severe and 168 (81.2%) were non-severe. Factors associated with severe COVID-19 were a neutrophil to lymphocyte ratio ≥ 4 (OR 8.1, 95% CI 2.3-20.3, P < 0.001) and a C-reactive protein to albumin ratio ≥ 10 (OR 3.49, 95% CI 1.3-9.1, P = 0.01). Conclusions: Complete blood counts, C-reactive protein and albumin are simple, inexpensive, widely available tests that can be used to predict severe COVID-19 in resource-limited settings.
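
The two predictors are simple ratios, and the reported effect sizes are odds ratios with 95% confidence intervals. The sketch below computes the ratios and a crude (unadjusted) odds ratio with a Woolf log-scale interval from a 2x2 table. The study may have reported adjusted odds ratios from logistic regression, the counts in any example are hypothetical, and the CRP/albumin cut-off depends on the units used, which the abstract does not state.

```python
import math

def nlr(neutrophils, lymphocytes):
    """Neutrophil to lymphocyte ratio (same units for both counts)."""
    return neutrophils / lymphocytes

def car(crp, albumin):
    """C-reactive protein to albumin ratio (cut-off depends on units)."""
    return crp / albumin

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude OR with Woolf 95% CI for the 2x2 table [[a, b], [c, d]].

    Rows: marker positive / negative; columns: severe / non-severe.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)
```

For a hypothetical table with a = 10, b = 20, c = 5, d = 40, the crude OR is 4.0.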

Keywords: COVID-19, predictor of severity, resource-limited settings, simple laboratory parameters

Procedia PDF Downloads 148
5780 A Simple Approach for the Analysis of First Vibration Mode of Layered Soil Profiles

Authors: Haizhong Zhang, Yan-Gang Zhao

Abstract:

Fundamental period, mode shape, and participation factor are important basic information for understanding the earthquake response of the ground. In this study, a simple approach is presented to calculate these quantities for layered soil profiles. To develop this method, closed-form equations are first derived for the free vibration analysis of layered soil profiles, based on the equilibrium between inertial and elastic forces. Then, by combining these equations with the Madera procedure developed for estimating the fundamental period, a simple method that can directly determine the fundamental period, mode shape and participation factor is proposed. The proposed approach can be conveniently implemented in simple spreadsheets and easily used by practicing engineers. In addition, the accuracy of the proposed approach is investigated by analyzing the first vibration mode of 67 representative layered soil profiles; the results of the proposed method agree very well with accurate results.
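
As a point of reference for these quantities, the sketch below shows the common quarter-wavelength (travel-time) estimate of the fundamental period, T ≈ 4 Σ h_i / Vs_i, which is exact for a uniform layer on rigid base (T = 4H/Vs). It also shows the participation factor Γ = Σ m φ / Σ m φ² of a discretized mode shape. This is not the Madera procedure or the paper's method, and the function names are hypothetical.

```python
import numpy as np

def fundamental_period_travel_time(thicknesses, vs):
    """Quarter-wavelength estimate T = 4 * sum(h_i / Vs_i), in seconds."""
    h = np.asarray(thicknesses, float)
    v = np.asarray(vs, float)
    return 4.0 * float(np.sum(h / v))

def participation_factor(phi, mass):
    """Gamma = sum(m * phi) / sum(m * phi**2) for a discretised profile."""
    phi = np.asarray(phi, float)
    mass = np.asarray(mass, float)
    return float(np.sum(mass * phi) / np.sum(mass * phi ** 2))
```

For a uniform layer, the first mode is φ(z) = cos(πz / 2H) with z measured down from the surface, and the participation factor evaluates to 4/π ≈ 1.27, a useful check on any layered-profile method.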

Keywords: layered soil profile, natural vibration, fundamental period, fundamental mode shape

Procedia PDF Downloads 293
5779 Use of Regression Analysis in Determining the Length of Plastic Hinge in Reinforced Concrete Columns

Authors: Mehmet Alpaslan Köroğlu, Musa Hakan Arslan, Muslu Kazım Körez

Abstract:

The basic objective of this study is to create a regression analysis method that can estimate the length of the plastic hinge, an important design parameter, by making use of the outcomes (lateral load-lateral displacement hysteretic curves) of experimental studies conducted on square reinforced concrete columns. For this aim, the results of 170 different square reinforced concrete column tests have been collected from the existing literature. The parameters thought to affect the plastic hinge length, such as cross-section properties, features of the material used, axial loading level, confinement of the column, and longitudinal reinforcement bars in the column, have been obtained from these 170 tests. In the study, regression analyses for determining the plastic hinge length from the experimental test results have been carried out separately and compared with each other. In addition, the plastic hinge lengths of the reinforced concrete columns obtained with these methods have been compared with other methods available in the literature.
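
Comparing a regression-based estimate with existing expressions can be sketched as follows. The widely cited Paulay and Priestley expression, Lp = 0.08 L + 0.022 d_b f_y (L and d_b in mm, f_y in MPa), serves as one example of a literature method. The linear model form and its predictors are hypothetical choices for illustration and are not the paper's regression.

```python
import numpy as np

def lp_paulay_priestley(L_mm, db_mm, fy_mpa):
    """Plastic hinge length Lp = 0.08 L + 0.022 d_b f_y (mm)."""
    return 0.08 * L_mm + 0.022 * db_mm * fy_mpa

def fit_linear_model(X, lp):
    """OLS fit lp ~ intercept + X; returns [intercept, coefficients...]."""
    Xb = np.column_stack([np.ones(len(lp)), X])
    coef, *_ = np.linalg.lstsq(Xb, lp, rcond=None)
    return coef

def rmse(pred, obs):
    """Root mean square error used to rank competing expressions."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))
```

With measured hinge lengths from the collected tests, `rmse` of the fitted model and of `lp_paulay_priestley` on the same columns gives a like-for-like comparison.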

Keywords: columns, plastic hinge length, regression analysis, reinforced concrete

Procedia PDF Downloads 449
5778 Measurement Errors and Misclassifications in Covariates in Logistic Regression: Bayesian Adjustment of Main and Interaction Effects and the Sample Size Implications

Authors: Shahadut Hossain

Abstract:

Measurement errors in continuous covariates and/or misclassifications in categorical covariates are common in epidemiological studies. Regression analysis that ignores such mismeasurements can seriously bias the estimated main and interaction effects of covariates on the outcome of interest. Thus, adjustments for such mismeasurements are necessary. In this research, we propose a Bayesian parametric framework for eliminating the deleterious impacts of covariate mismeasurements in logistic regression. The proposed adjustment method is unified and thus can be applied to any generalized linear or non-linear regression model. Furthermore, adjustment for covariate mismeasurements requires validation data, usually in the form of either gold standard measurements or replicates of the mismeasured covariates on a subset of the study population. Initial investigation shows that the adequacy of such adjustment depends on the sizes of the main and validation samples, especially when the prevalences of the categorical covariates are low. Thus, we investigate the impact of main and validation sample sizes on the adjusted estimates and provide general guidelines for these sample sizes based on simulation studies.
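
The bias that motivates this work is easiest to see in the linear case. Classical measurement error attenuates the slope by the reliability ratio λ = var(X) / var(W), and replicate measurements let λ be estimated. The sketch below uses this simple regression-calibration correction in place of the paper's Bayesian framework for logistic regression, and it assumes independent errors in the two replicates. The function name is hypothetical.

```python
import numpy as np

def naive_and_corrected_slope(w1, w2, y):
    """Slope of y on error-prone w1, and its reliability-ratio correction.

    w2 is a replicate measurement of the same true covariate with an
    independent error, so cov(w1, w2) estimates var(true covariate).
    """
    w1, w2, y = (np.asarray(a, float) for a in (w1, w2, y))
    naive = np.cov(w1, y)[0, 1] / np.var(w1, ddof=1)
    reliability = np.cov(w1, w2)[0, 1] / np.var(w1, ddof=1)
    return naive, naive / reliability
```

With a true slope of 2 and error variance equal to a quarter of the covariate variance, the naive slope lands near 1.6 (λ = 0.8), while the corrected slope recovers roughly 2.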

Keywords: measurement errors, misclassification, mismeasurement, validation sample, Bayesian adjustment

Procedia PDF Downloads 386
5777 Quantitative Structure-Activity Relationship Study of Some Quinoline Derivatives as Antimalarial Agents

Authors: M. Ouassaf, S. Belaid

Abstract:

A series of quinoline derivatives with antimalarial activity were subjected to two-dimensional quantitative structure-activity relationship (2D-QSAR) studies. Three models were built using multiple linear regression (MLR), partial least squares regression (PLS), and multiple nonlinear regression (MNLR) to determine which descriptors are closely related to the biological activity. The analysis also relied on principal component analysis (PCA). A comparison of the quality of the MLR, PLS, and MNLR models shows that the MNLR model (R = 0.914, R² = 0.835, RCV = 0.853) has substantially better predictive capability than the MLR (R = 0.835, R² = 0.752, RCV = 0.601) and PLS (R = 0.742, R² = 0.552, RCV = 0.550) models. The MNLR model gave statistically significant results and showed good stability to data variation in leave-one-out cross-validation. These results suggest that the proposed MNLR model may be useful for predicting the biological activity of quinoline derivatives.
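Leave-one-out validation statistics of the kind reported above can be computed by refitting the model with each compound held out in turn. The sketch below assumes a single-descriptor linear model and uses the common definition Q² = 1 - PRESS/SS_tot; the abstract's RCV may be defined differently (for example, as a correlation between held-out predictions and observations), and the descriptor and activity values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def total_ss(y):
    """Total sum of squares about the mean."""
    my = sum(y) / len(y)
    return sum((b - my) ** 2 for b in y)


def r_squared(x, y):
    """Coefficient of determination of the fit on all compounds."""
    b0, b1 = fit_line(x, y)
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return 1.0 - ss_res / total_ss(y)


def q2_loo(x, y):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot: refit without compound i, then predict it."""
    press = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0 + b1 * x[i])) ** 2
    return 1.0 - press / total_ss(y)


descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical descriptor values
activity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]  # hypothetical activities
print(r_squared(descriptor, activity), q2_loo(descriptor, activity))
```

For least squares, each held-out residual is at least as large as the corresponding in-sample residual, so Q² never exceeds R²; a large gap between the two (as for the MLR model above) signals weaker predictive stability.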

Keywords: antimalarial, quinoline, QSAR, PCA, MLR, MNLR

Procedia PDF Downloads 126
5776 In and Out-Of-Sample Performance of Non-Symmetric Models in International Price Differential Forecasting in a Commodity Country Framework

Authors: Nicola Rubino

Abstract:

This paper analyzes the nominal exchange rate movements of a group of commodity-exporting countries relative to the US dollar. Using a series of unrestricted self-exciting threshold autoregressive (SETAR) models, we model and evaluate sixteen national CPI price differentials relative to the US CPI. Out-of-sample forecast accuracy is evaluated by computing mean absolute error measures on the basis of two hundred and fifty-three months of rolling-window forecasts, and the evaluation is extended to three additional models: a logistic smooth transition regression (LSTAR), an additive nonlinear autoregressive model (AAR), and a simple linear neural network model (NNET). Our preliminary results confirm the presence of some form of TAR nonlinearity in the majority of the countries analyzed, with a relatively better goodness of fit than the linear AR(1) benchmark in five of the sixteen countries considered. Although no model appears to prevail statistically over the others, our final out-of-sample forecast exercise shows that SETAR models tend to have rather poor relative forecasting performance, especially when compared with alternative nonlinear specifications. Finally, by analyzing the implied half-lives of the autoregressive coefficients, our results confirm the presence, in the spirit of arbitrage band adjustment, of band convergence with inner unit root behaviour in five of the sixteen countries analyzed.
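A two-regime SETAR(1) model with delay 1 can be estimated by least squares with a grid search over the threshold, and the half-life of an AR(1) coefficient rho follows from ln(0.5)/ln|rho|. The sketch below assumes that specification, a 15% trimming of candidate thresholds, and a synthetic series; it is not the author's estimation code and omits the inner-band unit-root analysis.

```python
import math
import random


def half_life(rho: float) -> float:
    """Periods for a deviation to halve under y_t = rho * y_{t-1}; needs 0 < |rho| < 1."""
    return math.log(0.5) / math.log(abs(rho))


def ols_ar1(pairs):
    """OLS fit of y_t = c + rho * y_{t-1} on (y_{t-1}, y_t) pairs; returns (c, rho, ssr)."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    rho = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    c = my - rho * mx
    ssr = sum((b - c - rho * a) ** 2 for a, b in zip(xs, ys))
    return c, rho, ssr


def fit_setar(series, trim=0.15, step=5):
    """Two-regime SETAR(1), delay 1: grid-search the threshold minimising total SSR.
    Returns (threshold, rho_low, rho_high)."""
    pairs = list(zip(series[:-1], series[1:]))
    lags = sorted(p[0] for p in pairs)
    candidates = lags[int(trim * len(lags)):int((1 - trim) * len(lags)):step]
    best = None
    for gamma in candidates:
        _, r_low, s_low = ols_ar1([p for p in pairs if p[0] <= gamma])
        _, r_high, s_high = ols_ar1([p for p in pairs if p[0] > gamma])
        if best is None or s_low + s_high < best[0]:
            best = (s_low + s_high, gamma, r_low, r_high)
    return best[1], best[2], best[3]


# Synthetic series: persistent (rho = 0.9) at or below 0, no persistence above 0.
rng = random.Random(7)
y = [0.0]
for _ in range(1000):
    prev = y[-1]
    y.append((0.9 if prev <= 0 else 0.0) * prev + rng.gauss(0.0, 1.0))
gamma, rho_low, rho_high = fit_setar(y[200:])  # drop burn-in
print(gamma, rho_low, rho_high, half_life(rho_low))
```

For example, rho = 0.9 implies a half-life of about 6.6 periods (months, for monthly data), while a regime whose rho is close to 1 behaves like the inner unit-root band described above.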

Keywords: transition regression model, real exchange rate, nonlinearities, price differentials, PPP, commodity points

Procedia PDF Downloads 254
5775 Early Impact Prediction and Key Factors Study of Artificial Intelligence Patents: A Method Based on LightGBM and Interpretable Machine Learning

Authors: Xingyu Gao, Qiang Wu

Abstract:

Patents play a crucial role in protecting innovation and intellectual property. Early prediction of the impact of artificial intelligence (AI) patents helps researchers and companies allocate resources and make better decisions, and understanding the key factors that influence patent impact can help researchers better understand the evolution of AI technology and innovation trends. Identifying highly impactful patents early and supporting them is therefore valuable for accelerating technological progress, reducing research and development costs, and mitigating market positioning risks. Despite extensive research on AI patents, accurately predicting their early impact remains a challenge. Traditional methods often consider only single factors or simple combinations of factors, and so fail to reflect the actual impact of patents comprehensively and accurately. This paper used the artificial intelligence patent database of the United States Patent and Trademark Office and the Lens.org patent retrieval platform to obtain detailed information on 35,708 AI patents. Using six machine learning models (Multiple Linear Regression, Random Forest Regression, XGBoost Regression, LightGBM Regression, Support Vector Machine Regression, and K-Nearest Neighbors Regression) with early patent indicators as features, the paper predicted the impact of patents from three aspects: technical, social, and economic. These aspects are represented by the technical leadership of patents, the number of citations they receive, and their shared value. SHAP (SHapley Additive exPlanations) values were used to explain the predictions of the best model by quantifying the contribution of each feature to the model's predictions. The experimental results on the AI patent dataset indicate that LightGBM regression shows the best predictive performance for all three target variables.
Specifically, patent novelty has the greatest impact on predicting the technical impact of patents, and its effect is positive. The number of owners, the number of backward citations, and the number of independent claims are also crucial and have a positive influence on predicted technical impact. In predicting the social impact of patents, the number of applicants is the most critical input variable, but its effect is negative, while the number of independent claims, the number of owners, and the number of backward citations are also important predictors with a positive effect. For predicting the economic impact of patents, the number of independent claims is the most important factor and has a positive effect; the number of owners, the number of sibling countries or regions, and the size of the extended patent family also have a positive influence. The study relies primarily on United States Patent and Trademark Office data for AI patents, so future research could draw on more comprehensive AI patent data sources from a global perspective. Although the study takes various factors into account, other important features may not have been considered; future work may include factors such as patent implementation and market applications, which could also affect patent influence.
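SHAP attributions are Shapley values of a model's prediction. For a handful of features they can be computed exactly by enumerating coalitions, as in the brute-force sketch below, which replaces features outside a coalition with a single baseline value (one simplifying assumption among several used in practice). The `shap` library computes these values efficiently for tree ensembles such as LightGBM; the toy model, feature names, and values here are hypothetical stand-ins, not the paper's trained model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical stand-in for a trained regressor: novelty, owners, applicants, claims
    novelty, owners, applicants, claims = z
    return 2.0 * novelty + 0.5 * owners - 1.0 * applicants + 0.3 * novelty * claims


x = [0.8, 3.0, 2.0, 5.0]
baseline = [0.5, 1.0, 1.0, 3.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])
```

The attributions sum to `toy_model(x) - toy_model(baseline)`, and a negative value (here for the hypothetical "applicants" feature) is how a feature with a negative effect, like the number of applicants for social impact, shows up.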

Keywords: patent influence, interpretable machine learning, predictive models, SHAP

Procedia PDF Downloads 19
5774 Agile Software Effort Estimation Using Regression Techniques

Authors: Mikiyas Adugna

Abstract:

Effort estimation is one of the activities carried out in software development processes, and an accurate estimation model contributes to project success. Agile effort estimation is a complex task because of the dynamic nature of software development, and researchers are still working to improve its prediction accuracy. For these reasons, we propose a model based on LASSO and Elastic Net regression to enhance estimation accuracy. The proposed model has four major components: preprocessing, train-test split, training with default parameters, and cross-validation. During preprocessing, the entire dataset is normalized. The normalized dataset is then split into a training set (80%) and a testing set (20%). Following the split, the two algorithms (Elastic Net and LASSO regression) are trained in two phases. In the first phase, both are trained with their default parameters and evaluated on the testing data. In the second phase, grid search (used to tune and select optimal parameters) and 5-fold cross-validation are used to obtain the final trained model, which is then evaluated on the testing set. The experiments use an agile story point dataset of 21 software projects collected from six firms. The results show that both Elastic Net and LASSO regression outperformed the methods they were compared with. Of the two proposed algorithms, LASSO regression achieved better predictive performance, with PRED(8%) and PRED(25%) of 100.0, MMRE of 0.0491, MMER of 0.0551, MdMRE of 0.0593, MdMER of 0.063, and MSE of 0.0007. These results suggest that the LASSO-trained model is the most acceptable and outperforms the estimation performance reported in the literature.
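The accuracy measures quoted above are usually defined from the magnitude of relative error, MRE = |actual - predicted| / actual, and the magnitude of error relative to the estimate, MER = |actual - predicted| / predicted; PRED(x) is the percentage of estimates with MRE at most x/100. The sketch below assumes these common definitions, which the abstract does not spell out, and uses hypothetical story-point values.

```python
from statistics import mean, median


def effort_metrics(actual, predicted, pred_levels=(8, 25)):
    """Common effort-estimation accuracy measures; actual and predicted must be positive."""
    mre = [abs(a - p) / a for a, p in zip(actual, predicted)]  # error relative to actual
    mer = [abs(a - p) / p for a, p in zip(actual, predicted)]  # error relative to estimate
    out = {
        "MMRE": mean(mre),
        "MdMRE": median(mre),
        "MMER": mean(mer),
        "MdMER": median(mer),
        "MSE": mean((a - p) ** 2 for a, p in zip(actual, predicted)),
    }
    for level in pred_levels:
        # PRED(x): percentage of estimates whose MRE is within x percent
        out[f"PRED({level})"] = 100.0 * sum(m <= level / 100 for m in mre) / len(mre)
    return out


# Hypothetical story points: actual vs. estimated
print(effort_metrics([10, 20, 40], [11, 18, 40]))
```

Because MRE divides by the actual value and MER by the estimate, the two penalize over- and under-estimation differently, which is why both are usually reported; if the data were normalized before training, the metrics should be computed on the same scale as the reported values.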

Keywords: agile software development, effort estimation, elastic net regression, LASSO

Procedia PDF Downloads 27
5773 Robustified Asymmetric Logistic Regression Model for Global Fish Stock Assessment

Authors: Osamu Komori, Shinto Eguchi, Hiroshi Okamura, Momoko Ichinokawa

Abstract:

Long time-series data on population assessments are essential for global ecosystem assessment because the temporal change of biomass in such a database properly reflects the status of the global ecosystem. However, the available assessment data usually have limited sample sizes, and the ratio of populations with low biomass abundance (collapsed) to those with high abundance (non-collapsed) is highly imbalanced. To allow for the imbalance and uncertainty involved in ecological data, we propose a binary regression model with mixed effects for inferring ecosystem status through an asymmetric logistic model. In the estimating equation, we observe that the weights for the non-collapsed populations are relatively reduced, which in turn puts more weight on the small number of observations of collapsed populations. Moreover, we extend the asymmetric logistic regression model with a propensity score to allow for the sample biases observed between the labeled and unlabeled datasets. This robustified the estimation procedure and improved the model fit.

Keywords: double robust estimation, ecological binary data, mixed effect logistic regression model, propensity score

Procedia PDF Downloads 237
5772 Urban-Rural Inequality in Mexico after NAFTA: A Quantile Regression Analysis

Authors: Rene Valdiviezo-Issa

Abstract:

In this paper, we use Mexico’s Household Income and Expenditure Survey (ENIGH) to explain the behaviour of the urban-rural expenditure gap since Mexico’s incorporation into the North American Free Trade Agreement (NAFTA) in 1994, and we compare it with the latest available survey, conducted in 2014. We use real trimestral (quarterly) expenditure per capita (RTEPC) as the measure of welfare, and we use quantile regressions and a quantile regression decomposition to describe the gap between the urban and rural distributions of log RTEPC. We find that the decrease in the difference between the urban and rural distributions of log RTEPC, that is, in inequality, is driven by a deterioration of the urban areas in very specific characteristics rather than by an improvement of the rural areas. The decomposition shows that the gap is primarily driven by differences in returns to covariates between urban and rural areas.
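Quantile regression estimates each conditional quantile by minimizing the check (pinball) loss rho_tau(u) = u(tau - 1{u < 0}) rather than squared error. The sketch below shows only the intercept-only case, where minimizing the loss recovers a sample tau-quantile; a full analysis would minimize the same loss over a linear predictor in the covariates, and the decomposition step is not shown. The data are hypothetical.

```python
def check_loss(u: float, tau: float) -> float:
    """Pinball loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(data, tau):
    """Intercept-only quantile regression: the constant q minimizing total check loss.

    A minimizer can always be found among the observations, so searching them suffices.
    """
    return min(data, key=lambda q: sum(check_loss(y - q, tau) for y in data))


log_expenditure = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical values
print(quantile_by_check_loss(log_expenditure, 0.25))  # 3
```

Because the loss weights residuals above and below the fit asymmetrically, fitting it at several values of tau describes the whole distribution of log expenditure, which is what allows the urban-rural gap to be compared quantile by quantile.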

Keywords: quantile regression, urban-rural inequality, inequality in Mexico, income decomposition

Procedia PDF Downloads 256
5771 Simplified Analysis on Steel Frame Infill with FRP Composite Panel

Authors: HyunSu Seo, HoYoung Son, Sungjin Kim, WooYoung Jung

Abstract:

To understand the seismic behavior of a steel frame structure with an infill FRP composite panel, simple models for simulating the steel frame with the panel system were developed in this study. To obtain a simple design method for the steel-framed structure with the damping panel system, 2-D finite element analyses using spring and dashpot models were conducted in ABAQUS. The expected hysteretic energy responses of the steel frame with the damping panel system were investigated under various applied spring stiffnesses and dashpot coefficients. The proposed simple design method, which determines the stiffness and damping, makes it possible to select the FRP and damping materials for a steel frame system.
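For a spring and a linear dashpot acting in parallel, F = k*u + c*du/dt, the energy dissipated per cycle of harmonic displacement u = u0*sin(omega*t) is E = pi*c*omega*u0^2, and the spring contributes nothing net over a full cycle. The sketch below checks this by numerically integrating F*du/dt over one period; it is a hand calculation under assumed parameter values, not the paper's ABAQUS model.

```python
import math


def energy_per_cycle(k, c, u0, omega, steps=10_000):
    """Integrate F * du/dt over one period of u = u0 sin(omega t) with the midpoint rule."""
    period = 2.0 * math.pi / omega
    dt = period / steps
    energy = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        u = u0 * math.sin(omega * t)
        v = u0 * omega * math.cos(omega * t)
        force = k * u + c * v  # spring and dashpot in parallel
        energy += force * v * dt
    return energy


# Assumed, illustrative values (SI units): stiffness, damping, amplitude, frequency.
k, c, u0, omega = 2.0e6, 5.0e4, 0.01, 2.0 * math.pi
numeric = energy_per_cycle(k, c, u0, omega)
closed_form = math.pi * c * omega * u0 ** 2
print(numeric, closed_form)  # both about 98.7 J
```

Since the dissipated energy grows linearly with c and is independent of k, a design procedure can pick the stiffness for drift control and the dashpot coefficient for a target hysteretic energy largely separately.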

Keywords: numerical analysis, FEM, infill, GFRP, damping

Procedia PDF Downloads 394
5770 Knowledge and Eating Behavior of Teenage Pregnancy

Authors: Udomporn Yingpaisuk, Premwadee Karuhadej

Abstract:

The purpose of this research was to study the eating habits of pregnant teenagers and their relationship to knowledge of nutrition during pregnancy. A sample of 100 pregnant teenagers in Bangkae District was drawn by simple random sampling. Data were collected with a questionnaire with a reliability of 0.8 and were analyzed by multiple regression in SPSS for Windows, yielding percentages, means, and the relationship between knowledge of eating and eating behavior. The results revealed that the respondents' nutrition knowledge averaged 4.07, that the eating habit they mentioned most was refraining from alcohol and caffeine (82%), and that nutrition knowledge influenced their eating habits at 54%, at a statistical significance level of 0.001.
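The abstract does not say which reliability coefficient was used; a common choice for questionnaires is Cronbach's alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of total score), for k items. The sketch below assumes that coefficient and uses hypothetical Likert-style responses.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one row per respondent, one column per questionnaire item."""
    k = len(responses[0])
    items = list(zip(*responses))  # transpose to per-item columns
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical answers of four respondents to three items on a 1-5 scale
responses = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5]]
print(round(cronbach_alpha(responses), 3))  # 0.923
```

Values around 0.8, as reported for this questionnaire, are conventionally read as acceptable internal consistency.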

Keywords: teenage pregnancy, knowledge of eating, eating behavior, alcohol, caffeine

Procedia PDF Downloads 322
5769 Improvement of Data Transfer over Simple Object Access Protocol (SOAP)

Authors: Khaled Ahmed Kadouh, Kamal Ali Albashiri

Abstract:

This paper presents an algorithm designed to improve data transfer over the Simple Object Access Protocol (SOAP). The aim of this work is to establish whether using SOAP to exchange XML messages offers any added advantages. The results showed that XML messages sent without SOAP take longer and consume more memory, especially with binary data.
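The abstract does not detail the algorithm. As a rough, hypothetical illustration of where message size goes, the sketch below places the same base64-encoded binary payload in a bare XML element and inside a SOAP 1.1 `Envelope`/`Body`, using only Python's standard library; it measures serialized size only, not transfer time or memory, and is not the authors' method.

```python
import base64
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_ENV_NS)


def plain_message(payload: bytes) -> bytes:
    """Bare XML document carrying base64-encoded binary data."""
    root = ET.Element("data")
    root.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root)


def soap_message(payload: bytes) -> bytes:
    """The same payload wrapped in a SOAP 1.1 Envelope/Body."""
    env = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV_NS}}}Body")
    ET.SubElement(body, "data").text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(env)


payload = bytes(range(256)) * 12  # 3072 bytes of hypothetical binary data
plain, soap = plain_message(payload), soap_message(payload)
print(len(payload), len(plain), len(soap))
```

In both forms, base64 inflates binary data by a factor of 4/3, which dominates the fixed envelope overhead for large payloads; this is one reason binary data stresses XML-based exchange in particular.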

Keywords: JAX-WS, SMTP, SOAP, web service, XML

Procedia PDF Downloads 463
5768 Developing Variable Repetitive Group Sampling Control Chart Using Regression Estimator

Authors: Liaquat Ahmad, Muhammad Aslam, Muhammad Azam

Abstract:

In this article, we propose a control chart for the location parameter based on a repetitive group sampling (RGS) scheme. The chart is based on the regression estimator, an estimator that capitalizes on the relationship between the variables of interest to provide more sensitive control than the commonly used individual variables. The control limit coefficients have been estimated for different sample sizes, for both weakly and highly correlated variables. Monitoring of the production process follows the procedure of Shewhart's x-bar control chart. The chart's performance is evaluated through average run length (ARL) calculations when a shift occurs in the mean of the estimator. It has been observed that weakly correlated variables produce false alarms more rapidly.
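The regression estimator of the mean of y, using an auxiliary variable x with known mean mu_x, is ybar + b*(mu_x - xbar); to a first-order approximation its variance is sigma_y^2 * (1 - rho^2) / n, which is why stronger correlation gives a more sensitive chart. For an RGS chart with outer limits +/-k_out and inner limits +/-k_in on the standardized statistic, a common ARL expression is 1/P_out with P_out = p_out / (p_out + p_in). The sketch below assumes a normally distributed statistic and illustrative limit coefficients, not the coefficients tabulated in the paper.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def regression_estimate(ybar, xbar, mu_x, b):
    """Regression estimator of the mean of y using auxiliary x with known mean mu_x."""
    return ybar + b * (mu_x - xbar)


def rgs_arl(k_out, k_in, shift=0.0):
    """ARL of an RGS chart on a standardized normal statistic.

    Beyond +/-k_out: declare out of control; within +/-k_in: declare in control;
    otherwise take a new sample group. `shift` is the mean shift in standard-deviation units.
    """
    p_out = (1.0 - phi(k_out - shift)) + phi(-k_out - shift)
    p_in = phi(k_in - shift) - phi(-k_in - shift)
    # Conditioning on a final (non-repeat) decision: P(out) = p_out / (p_out + p_in)
    return (p_out + p_in) / p_out


print(rgs_arl(3.0, 3.0))             # no repeat zone: the Shewhart case, about 370.4
print(rgs_arl(3.0, 2.0, shift=1.0))  # shorter run length after a one-sigma shift
```

Setting k_in = k_out removes the resampling zone and recovers the familiar Shewhart in-control ARL, which is a convenient check before tabulating coefficients for a target ARL.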

Keywords: average run length, control charts, process shift, regression estimators, repetitive group sampling

Procedia PDF Downloads 536