Search results for: nonparametric regression model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 18218

Search results for: nonparametric regression model

17918 Electroencephalogram Based Approach for Mental Stress Detection during Gameplay with Level Prediction

Authors: Priyadarsini Samal, Rajesh Singla

Abstract:

Many mobile games come with the benefits of entertainment by introducing stress to the human brain. In recognizing this mental stress, the brain-computer interface (BCI) plays an important role. It has various neuroimaging approaches which help in analyzing the brain signals. Electroencephalogram (EEG) is the most commonly used method among them as it is non-invasive, portable, and economical. Here, this paper investigates the pattern in brain signals when introduced with mental stress. Two healthy volunteers played a game whose aim was to search hidden words from the grid, and the levels were chosen randomly. The EEG signals during gameplay were recorded to investigate the impacts of stress with the changing levels from easy to medium to hard. A total of 16 features of EEG were analyzed for this experiment which includes power band features with relative powers, event-related desynchronization, along statistical features. Support vector machine was used as the classifier, which resulted in an accuracy of 93.9% for three-level stress analysis; for two levels, the accuracy of 92% and 98% are achieved. In addition to that, another game that was similar in nature was played by the volunteers. A suitable regression model was designed for prediction where the feature sets of the first and second game were used for testing and training purposes, respectively, and an accuracy of 73% was found.

Keywords: brain computer interface, electroencephalogram, regression model, stress, word search

Procedia PDF Downloads 164
17917 Gender Estimation by Means of Quantitative Measurements of Foramen Magnum: An Analysis of CT Head Images

Authors: Thilini Hathurusinghe, Uthpalie Siriwardhana, W. M. Ediri Arachchi, Ranga Thudugala, Indeewari Herath, Gayani Senanayake

Abstract:

The foramen magnum is more prone to protect than other skeletal remains during high impact and severe disruptive injuries. Therefore, it is worthwhile to explore whether these measurements can be used to determine the human gender which is vital in forensic and anthropological studies. The idea was to find out the ability to use quantitative measurements of foramen magnum as an anatomical indicator for human gender estimation and to evaluate the gender-dependent variations of foramen magnum using quantitative measurements. Randomly selected 113 subjects who underwent CT head scans at Sri Jayawardhanapura General Hospital of Sri Lanka within a period of six months, were included in the study. The sample contained 58 males (48.76 ± 14.7 years old) and 55 females (47.04 ±15.9 years old). Maximum length of the foramen magnum (LFM), maximum width of the foramen magnum (WFM), minimum distance between occipital condyles (MnD) and maximum interior distance between occipital condyles (MxID) were measured. Further, AreaT and AreaR were also calculated. The gender was estimated using binomial logistic regression. The mean values of all explanatory variables (LFM, WFM, MnD, MxID, AreaT, and AreaR) were greater among male than female. All explanatory variables except MnD (p=0.669) were statistically significant (p < 0.05). Significant bivariate correlations were demonstrated by AreaT and AreaR with the explanatory variables. The results evidenced that WFM and MxID were the best measurements in predicting gender according to binomial logistic regression. The estimated model was: log (p/1-p) =10.391-0.136×MxID-0.231×WFM, where p is the probability of being a female. The classification accuracy given by the above model was 65.5%. The quantitative measurements of foramen magnum can be used as a reliable anatomical marker for human gender estimation in the Sri Lankan context.

Keywords: foramen magnum, forensic and anthropological studies, gender estimation, logistic regression

Procedia PDF Downloads 125
17916 Kernel-Based Double Nearest Proportion Feature Extraction for Hyperspectral Image Classification

Authors: Hung-Sheng Lin, Cheng-Hsuan Li

Abstract:

Over the past few years, kernel-based algorithms have been widely used to extend some linear feature extraction methods such as principal component analysis (PCA), linear discriminate analysis (LDA), and nonparametric weighted feature extraction (NWFE) to their nonlinear versions, kernel principal component analysis (KPCA), generalized discriminate analysis (GDA), and kernel nonparametric weighted feature extraction (KNWFE), respectively. These nonlinear feature extraction methods can detect nonlinear directions with the largest nonlinear variance or the largest class separability based on the given kernel function. Moreover, they have been applied to improve the target detection or the image classification of hyperspectral images. The double nearest proportion feature extraction (DNP) can effectively reduce the overlap effect and have good performance in hyperspectral image classification. The DNP structure is an extension of the k-nearest neighbor technique. For each sample, there are two corresponding nearest proportions of samples, the self-class nearest proportion and the other-class nearest proportion. The term “nearest proportion” used here consider both the local information and other more global information. With these settings, the effect of the overlap between the sample distributions can be reduced. Usually, the maximum likelihood estimator and the related unbiased estimator are not ideal estimators in high dimensional inference problems, particularly in small data-size situation. Hence, an improved estimator by shrinkage estimation (regularization) is proposed. Based on the DNP structure, LDA is included as a special case. In this paper, the kernel method is applied to extend DNP to kernel-based DNP (KDNP). In addition to the advantages of DNP, KDNP surpasses DNP in the experimental results. According to the experiments on the real hyperspectral image data sets, the classification performance of KDNP is better than that of PCA, LDA, NWFE, and their kernel versions, KPCA, GDA, and KNWFE.

Keywords: feature extraction, kernel method, double nearest proportion feature extraction, kernel double nearest feature extraction

Procedia PDF Downloads 299
17915 Analytical Authentication of Butter Using Fourier Transform Infrared Spectroscopy Coupled with Chemometrics

Authors: M. Bodner, M. Scampicchio

Abstract:

Fourier Transform Infrared (FT-IR) spectroscopy coupled with chemometrics was used to distinguish between butter samples and non-butter samples. Further, quantification of the content of margarine in adulterated butter samples was investigated. Fingerprinting region (1400-800 cm–1) was used to develop unsupervised pattern recognition (Principal Component Analysis, PCA), supervised modeling (Soft Independent Modelling by Class Analogy, SIMCA), classification (Partial Least Squares Discriminant Analysis, PLS-DA) and regression (Partial Least Squares Regression, PLS-R) models. PCA of the fingerprinting region shows a clustering of the two sample types. All samples were classified in their rightful class by SIMCA approach; however, nine adulterated samples (between 1% and 30% w/w of margarine) were classified as belonging both at the butter class and at the non-butter one. In the two-class PLS-DA model’s (R2 = 0.73, RMSEP, Root Mean Square Error of Prediction = 0.26% w/w) sensitivity was 71.4% and Positive Predictive Value (PPV) 100%. Its threshold was calculated at 7% w/w of margarine in adulterated butter samples. Finally, PLS-R model (R2 = 0.84, RMSEP = 16.54%) was developed. PLS-DA was a suitable classification tool and PLS-R a proper quantification approach. Results demonstrate that FT-IR spectroscopy combined with PLS-R can be used as a rapid, simple and safe method to identify pure butter samples from adulterated ones and to determine the grade of adulteration of margarine in butter samples.

Keywords: adulterated butter, margarine, PCA, PLS-DA, PLS-R, SIMCA

Procedia PDF Downloads 114
17914 Spatiotemporal Variability in Rainfall Trends over Sinai Peninsula Using Nonparametric Methods and Discrete Wavelet Transforms

Authors: Mosaad Khadr

Abstract:

Knowledge of the temporal and spatial variability of rainfall trends has been of great concern for efficient water resource planning, management. In this study annual, seasonal and monthly rainfall trends over the Sinai Peninsula were analyzed by using absolute homogeneity tests, nonparametric Mann–Kendall (MK) test and Sen’s slope estimator methods. The homogeneity of rainfall time-series was examined using four absolute homogeneity tests namely, the Pettitt test, standard normal homogeneity test, Buishand range test, and von Neumann ratio test. Further, the sequential change in the trend of annual and seasonal rainfalls is conducted using sequential MK (SQMK) method. Then the trend analysis based on discrete wavelet transform technique (DWT) in conjunction with SQMK method is performed. The spatial patterns of the detected rainfall trends were investigated using a geostatistical and deterministic spatial interpolation technique. The results achieved from the Mann–Kendall test to the data series (using the 5% significance level) highlighted that rainfall was generally decreasing in January, February, March, November, December, wet season, and annual rainfall. A significant decreasing trend in the winter and annual rainfall with significant levels were inferred based on the Mann-Kendall rank statistics and linear trend. Further, the discrete wavelet transform (DWT) analysis reveal that in general, intra- and inter-annual events (up to 4 years) are more influential in affecting the observed trends. The nature of the trend captured by both methods is similar for all of the cases. On the basis of spatial trend analysis, significant rainfall decreases were also noted in the investigated stations. Overall, significant downward trends in winter and annual rainfall over the Sinai Peninsula was observed during the study period.

Keywords: trend analysis, rainfall, Mann–Kendall test, discrete wavelet transform, Sinai Peninsula

Procedia PDF Downloads 140
17913 An Application of the Single Equation Regression Model

Authors: S. K. Ashiquer Rahman

Abstract:

Recently, oil has become more influential in almost every economic sector as a key material. As can be seen from the news, when there are some changes in an oil price or OPEC announces a new strategy, its effect spreads to every part of the economy directly and indirectly. That’s a reason why people always observe the oil price and try to forecast the changes of it. The most important factor affecting the price is its supply which is determined by the number of wildcats drilled. Therefore, a study about the number of wellheads and other economic variables may give us some understanding of the mechanism indicated by the amount of oil supplies. In this paper, we will consider a relationship between the number of wellheads and three key factors: the price of the wellhead, domestic output, and GNP constant dollars. We also add trend variables in the models because the consumption of oil varies from time to time. Moreover, this paper will use an econometrics method to estimate parameters in the model, apply some tests to verify the result we acquire, and then conclude the model.

Keywords: price, domestic output, GNP, trend variable, wildcat activity

Procedia PDF Downloads 24
17912 Using Machine Learning to Enhance Win Ratio for College Ice Hockey Teams

Authors: Sadixa Sanjel, Ahmed Sadek, Naseef Mansoor, Zelalem Denekew

Abstract:

Collegiate ice hockey (NCAA) sports analytics is different from the national level hockey (NHL). We apply and compare multiple machine learning models such as Linear Regression, Random Forest, and Neural Networks to predict the win ratio for a team based on their statistics. Data exploration helps determine which statistics are most useful in increasing the win ratio, which would be beneficial to coaches and team managers. We ran experiments to select the best model and chose Random Forest as the best performing. We conclude with how to bridge the gap between the college and national levels of sports analytics and the use of machine learning to enhance team performance despite not having a lot of metrics or budget for automatic tracking.

Keywords: NCAA, NHL, sports analytics, random forest, regression, neural networks, game predictions

Procedia PDF Downloads 81
17911 Development of a Data-Driven Method for Diagnosing the State of Health of Battery Cells, Based on the Use of an Electrochemical Aging Model, with a View to Their Use in Second Life

Authors: Desplanches Maxime

Abstract:

Accurate estimation of the remaining useful life of lithium-ion batteries for electronic devices is crucial. Data-driven methodologies encounter challenges related to data volume and acquisition protocols, particularly in capturing a comprehensive range of aging indicators. To address these limitations, we propose a hybrid approach that integrates an electrochemical model with state-of-the-art data analysis techniques, yielding a comprehensive database. Our methodology involves infusing an aging phenomenon into a Newman model, leading to the creation of an extensive database capturing various aging states based on non-destructive parameters. This database serves as a robust foundation for subsequent analysis. Leveraging advanced data analysis techniques, notably principal component analysis and t-Distributed Stochastic Neighbor Embedding, we extract pivotal information from the data. This information is harnessed to construct a regression function using either random forest or support vector machine algorithms. The resulting predictor demonstrates a 5% error margin in estimating remaining battery life, providing actionable insights for optimizing usage. Furthermore, the database was built from the Newman model calibrated for aging and performance using data from a European project called Teesmat. The model was then initialized numerous times with different aging values, for instance, with varying thicknesses of SEI (Solid Electrolyte Interphase). This comprehensive approach ensures a thorough exploration of battery aging dynamics, enhancing the accuracy and reliability of our predictive model. Of particular importance is our reliance on the database generated through the integration of the electrochemical model. This database serves as a crucial asset in advancing our understanding of aging states. Beyond its capability for precise remaining life predictions, this database-driven approach offers valuable insights for optimizing battery usage and adapting the predictor to various scenarios. This underscores the practical significance of our method in facilitating better decision-making regarding lithium-ion battery management.

Keywords: Li-ion battery, aging, diagnostics, data analysis, prediction, machine learning, electrochemical model, regression

Procedia PDF Downloads 33
17910 Machine Learning Approach for Stress Detection Using Wireless Physical Activity Tracker

Authors: B. Padmaja, V. V. Rama Prasad, K. V. N. Sunitha, E. Krishna Rao Patro

Abstract:

Stress is a psychological condition that reduces the quality of sleep and affects every facet of life. Constant exposure to stress is detrimental not only for mind but also body. Nevertheless, to cope with stress, one should first identify it. This paper provides an effective method for the cognitive stress level detection by using data provided from a physical activity tracker device Fitbit. This device gathers people’s daily activities of food, weight, sleep, heart rate, and physical activities. In this paper, four major stressors like physical activities, sleep patterns, working hours and change in heart rate are used to assess the stress levels of individuals. The main motive of this system is to use machine learning approach in stress detection with the help of Smartphone sensor technology. Individually, the effect of each stressor is evaluated using logistic regression and then combined model is built and assessed using variants of ordinal logistic regression models like logit, probit and complementary log-log. Then the quality of each model is evaluated using Akaike Information Criterion (AIC) and probit is assessed as the more suitable model for our dataset. This system is experimented and evaluated in a real time environment by taking data from adults working in IT and other sectors in India. The novelty of this work lies in the fact that stress detection system should be less invasive as possible for the users.

Keywords: physical activity tracker, sleep pattern, working hours, heart rate, smartphone sensor

Procedia PDF Downloads 211
17909 Stock Prediction and Portfolio Optimization Thesis

Authors: Deniz Peksen

Abstract:

This thesis aims to predict trend movement of closing price of stock and to maximize portfolio by utilizing the predictions. In this context, the study aims to define a stock portfolio strategy from models created by using Logistic Regression, Gradient Boosting and Random Forest. Recently, predicting the trend of stock price has gained a significance role in making buy and sell decisions and generating returns with investment strategies formed by machine learning basis decisions. There are plenty of studies in the literature on the prediction of stock prices in capital markets using machine learning methods but most of them focus on closing prices instead of the direction of price trend. Our study differs from literature in terms of target definition. Ours is a classification problem which is focusing on the market trend in next 20 trading days. To predict trend direction, fourteen years of data were used for training. Following three years were used for validation. Finally, last three years were used for testing. Training data are between 2002-06-18 and 2016-12-30 Validation data are between 2017-01-02 and 2019-12-31 Testing data are between 2020-01-02 and 2022-03-17 We determine Hold Stock Portfolio, Best Stock Portfolio and USD-TRY Exchange rate as benchmarks which we should outperform. We compared our machine learning basis portfolio return on test data with return of Hold Stock Portfolio, Best Stock Portfolio and USD-TRY Exchange rate. We assessed our model performance with the help of roc-auc score and lift charts. We use logistic regression, Gradient Boosting and Random Forest with grid search approach to fine-tune hyper-parameters. As a result of the empirical study, the existence of uptrend and downtrend of five stocks could not be predicted by the models. When we use these predictions to define buy and sell decisions in order to generate model-based-portfolio, model-based-portfolio fails in test dataset. It was found that Model-based buy and sell decisions generated a stock portfolio strategy whose returns can not outperform non-model portfolio strategies on test dataset. We found that any effort for predicting the trend which is formulated on stock price is a challenge. We found same results as Random Walk Theory claims which says that stock price or price changes are unpredictable. Our model iterations failed on test dataset. Although, we built up several good models on validation dataset, we failed on test dataset. We implemented Random Forest, Gradient Boosting and Logistic Regression. We discovered that complex models did not provide advantage or additional performance while comparing them with Logistic Regression. More complexity did not lead us to reach better performance. Using a complex model is not an answer to figure out the stock-related prediction problem. Our approach was to predict the trend instead of the price. This approach converted our problem into classification. However, this label approach does not lead us to solve the stock prediction problem and deny or refute the accuracy of the Random Walk Theory for the stock price.

Keywords: stock prediction, portfolio optimization, data science, machine learning

Procedia PDF Downloads 52
17908 On Reliability of a Credit Default Swap Contract during the EMU Debt Crisis

Authors: Petra Buzkova, Milos Kopa

Abstract:

Reliability of the credit default swap market had been questioned repeatedly during the EMU debt crisis. The article examines whether this development influenced sovereign EMU CDS prices in general. We regress the CDS market price on a model risk neutral CDS price obtained from an adopted reduced form valuation model in the 2009-2013 period. We look for a break point in the single-equation and multi-equation econometric models in order to show the changes in relations between CDS market and model prices. Our results differ according to the risk profile of a country. We find that in the case of riskier countries, the relationship between the market and model price changed when market participants started to question the ability of CDS contracts to protect their buyers. Specifically, it weakened after the change. In the case of less risky countries, the change happened earlier and the effect of a weakened relationship is not observed.

Keywords: chow stability test, credit default swap, debt crisis, reduced form valuation model, seemingly unrelated regression

Procedia PDF Downloads 232
17907 Inequality of Opportunities in the Health of the Adult Population of Russia

Authors: Marina Kartseva, Polina Kuznetsova

Abstract:

In our work, we estimate the contribution of inequality of opportunity to inequality in the health of the Russian population aged 25 to 74 years. The empirical basis of the study is the nationally representative data of the RLMS for 2018. Individual health is measured using a self-reported status on five-point scale. The startconditions are characterized by parental education and place of birth (country, type of settlement). Personal efforts to maintain health include the level of education, smoking status, and physical activity. To understand how start opportunities affect an individual's health, we use the methodology proposed in (Trannoy et al., 2010), which takes into account both direct and indirect (through the influence on efforts) effects. Regression analysis shows that all other things being equal, the starting capabilities of individuals have a significant impact on their health. In particular, parental education has a positive effect on self-reported health. Birth in another country, in another settlement, and in an urban area, on the contrary, reduceself-reported health. This allows to conclude that there exists an unfair inequality in health, namely inequality caused by factors that are independent of a person's own efforts. We estimate the contribution of inequality of opportunity to inequality in health using a nonparametric approach (Checchi, Peragine, 2010; Lazar, 2013). According to the obtained results, the contribution of unfair inequality as 72-74% for the population as a whole, being slightly higher for women (62-74% and 60-69% for men and women, respectively) and for older age (59- 62% and 67-75% for groups 25-44 years old and 45-74 years old, respectively). The obtained estimates are comparable with the results for other countries and indicate the importance of the problem of inequality of opportunities in health in Russia.

Keywords: inequality of opportunity, inequality in health, self-reported health, efforts, health-related lifestyle, Russia, RLMS

Procedia PDF Downloads 148
17906 Automatic API Regression Analyzer and Executor

Authors: Praveena Sridhar, Nihar Devathi, Parikshit Chakraborty

Abstract:

As the software product changes versions across releases, there are changes to the API’s and features and the upgrades become necessary. Hence, it becomes imperative to get the impact of upgrading the dependent components. This tool finds out API changes across two versions and their impact on other API’s followed by execution of the automated regression suites relevant to updates and their impacted areas. This tool has 4 layer architecture, each layer with its own unique pre-assigned capability which it does and sends the required information to next layer. This are the 4 layers. 1) Comparator: Compares the two versions of API. 2) Analyzer: Analyses the API doc and gives the modified class and its dependencies along with implemented interface details. 3) Impact Filter: Find the impact of the modified class on the other API methods. 4) Auto Executer: Based on the output given by Impact Filter, Executor will run the API regression Suite. Tool reads the java doc and extracts the required information of classes, interfaces and enumerations. The extracted information is saved into a data structure which shows the class details and its dependencies along with interfaces and enumerations that are listed in the java doc.

Keywords: automation impact regression, java doc, executor, analyzer, layers

Procedia PDF Downloads 461
17905 Impact Factor Analysis for Spatially Varying Aerosol Optical Depth in Wuhan Agglomeration

Authors: Wenting Zhang, Shishi Liu, Peihong Fu

Abstract:

As an indicator of air quality and directly related to concentration of ground PM2.5, the spatial-temporal variation and impact factor analysis of Aerosol Optical Depth (AOD) have been a hot spot in air pollution. This paper concerns the non-stationarity and the autocorrelation (with Moran’s I index of 0.75) of the AOD in Wuhan agglomeration (WHA), in central China, uses the geographically weighted regression (GRW) to identify the spatial relationship of AOD and its impact factors. The 3 km AOD product of Moderate Resolution Imaging Spectrometer (MODIS) is used in this study. Beyond the economic-social factor, land use density factors, vegetable cover, and elevation, the landscape metric is also considered as one factor. The results suggest that the GWR model is capable of dealing with spatial varying relationship, with R square, corrected Akaike Information Criterion (AICc) and standard residual better than that of ordinary least square (OLS) model. The results of GWR suggest that the urban developing, forest, landscape metric, and elevation are the major driving factors of AOD. Generally, the higher AOD trends to located in the place with higher urban developing, less forest, and flat area.

Keywords: aerosol optical depth, geographically weighted regression, land use change, Wuhan agglomeration

Procedia PDF Downloads 331
17904 Partial Least Square Regression for High-Dimentional and High-Correlated Data

Authors: Mohammed Abdullah Alshahrani

Abstract:

The research focuses on investigating the use of partial least squares (PLS) methodology for addressing challenges associated with high-dimensional correlated data. Recent technological advancements have led to experiments producing data characterized by a large number of variables compared to observations, with substantial inter-variable correlations. Such data patterns are common in chemometrics, where near-infrared (NIR) spectrometer calibrations record chemical absorbance levels across hundreds of wavelengths, and in genomics, where thousands of genomic regions' copy number alterations (CNA) are recorded from cancer patients. PLS serves as a widely used method for analyzing high-dimensional data, functioning as a regression tool in chemometrics and a classification method in genomics. It handles data complexity by creating latent variables (components) from original variables. However, applying PLS can present challenges. The study investigates key areas to address these challenges, including unifying interpretations across three main PLS algorithms and exploring unusual negative shrinkage factors encountered during model fitting. The research presents an alternative approach to addressing the interpretation challenge of predictor weights associated with PLS. Sparse estimation of predictor weights is employed using a penalty function combining a lasso penalty for sparsity and a Cauchy distribution-based penalty to account for variable dependencies. The results demonstrate sparse and grouped weight estimates, aiding interpretation and prediction tasks in genomic data analysis. High-dimensional data scenarios, where predictors outnumber observations, are common in regression analysis applications. Ordinary least squares regression (OLS), the standard method, performs inadequately with high-dimensional and highly correlated data. Copy number alterations (CNA) in key genes have been linked to disease phenotypes, highlighting the importance of accurate classification of gene expression data in bioinformatics and biology using regularized methods like PLS for regression and classification.

Keywords: partial least square regression, genetics data, negative filter factors, high dimensional data, high correlated data

Procedia PDF Downloads 9
17903 Modeling Karachi Dengue Outbreak and Exploration of Climate Structure

Authors: Syed Afrozuddin Ahmed, Junaid Saghir Siddiqi, Sabah Quaiser

Abstract:

Various studies have reported that global warming causes unstable climate and many serious impact to physical environment and public health. The increasing incidence of dengue incidence is now a priority health issue and become a health burden of Pakistan. In this study it has been investigated that spatial pattern of environment causes the emergence or increasing rate of dengue fever incidence that effects the population and its health. The climatic or environmental structure data and the Dengue Fever (DF) data was processed by coding, editing, tabulating, recoding, restructuring in terms of re-tabulating was carried out, and finally applying different statistical methods, techniques, and procedures for the evaluation. Five climatic variables which we have studied are precipitation (P), Maximum temperature (Mx), Minimum temperature (Mn), Humidity (H) and Wind speed (W) collected from 1980-2012. The dengue cases in Karachi from 2010 to 2012 are reported on weekly basis. Principal component analysis is applied to explore the climatic variables and/or the climatic (structure) which may influence in the increase or decrease in the number of dengue fever cases in Karachi. PC1 for all the period is General atmospheric condition. PC2 for dengue period is contrast between precipitation and wind speed. PC3 is the weighted difference between maximum temperature and wind speed. PC4 for dengue period contrast between maximum and wind speed. Negative binomial and Poisson regression model are used to correlate the dengue fever incidence to climatic variable and principal component score. Relative humidity is estimated to positively influence on the chances of dengue occurrence by 1.71% times. Maximum temperature positively influence on the chances dengue occurrence by 19.48% times. Minimum temperature affects positively on the chances of dengue occurrence by 11.51% times. Wind speed is effecting negatively on the weekly occurrence of dengue fever by 7.41% times.

Keywords: principal component analysis, dengue fever, negative binomial regression model, poisson regression model

Procedia PDF Downloads 407
17902 Exploring Factors Affecting Electricity Production in Malaysia

Authors: Endang Jati Mat Sahid, Hussain Ali Bekhet

Abstract:

Ability to supply reliable and secure electricity has been one of the crucial components of economic development for any country. Forecasting of electricity production is therefore very important for accurate investment planning of generation power plants. In this study, we aim to examine and analyze the factors that affect electricity generation. Multiple regression models were used to find the relationship between various variables and electricity production. The models will simultaneously determine the effects of the variables on electricity generation. Many variables influencing electricity generation, i.e. natural gas (NG), coal (CO), fuel oil (FO), renewable energy (RE), gross domestic product (GDP) and fuel prices (FP), were examined for Malaysia. The results demonstrate that NG, CO, and FO were the main factors influencing electricity generation growth. This study then identified a number of policy implications resulting from the empirical results.

Keywords: energy policy, energy security, electricity production, Malaysia, the regression model

Procedia PDF Downloads 125
17901 Bayesian Variable Selection in Quantile Regression with Application to the Health and Retirement Study

Authors: Priya Kedia, Kiranmoy Das

Abstract:

There is a rich literature on variable selection in regression setting. However, most of these methods assume normality for the response variable under consideration for implementing the methodology and establishing the statistical properties of the estimates. In many real applications, the distribution for the response variable may be non-Gaussian, and one might be interested in finding the best subset of covariates at some predetermined quantile level. We develop dynamic Bayesian approach for variable selection in quantile regression framework. We use a zero-inflated mixture prior for the regression coefficients, and consider the asymmetric Laplace distribution for the response variable for modeling different quantiles of its distribution. An efficient Gibbs sampler is developed for our computation. Our proposed approach is assessed through extensive simulation studies, and real application of the proposed approach is also illustrated. We consider the data from health and retirement study conducted by the University of Michigan, and select the important predictors when the outcome of interest is out-of-pocket medical cost, which is considered as an important measure for financial risk. Our analysis finds important predictors at different quantiles of the outcome, and thus enhance our understanding on the effects of different predictors on the out-of-pocket medical cost.

Keywords: variable selection, quantile regression, Gibbs sampler, asymmetric Laplace distribution

Procedia PDF Downloads 125
17900 Calibration Model of %Titratable Acidity (Citric Acid) for Intact Tomato by Transmittance SW-NIR Spectroscopy

Authors: K. Petcharaporn, S. Kumchoo

Abstract:

The acidity (citric acid) is one of the chemical contents that can refer to the internal quality and the maturity index of tomato. The titratable acidity (%TA) can be predicted by a non-destructive method prediction by using the transmittance short wavelength (SW-NIR). Spectroscopy in the wavelength range between 665-955 nm. The set of 167 tomato samples divided into groups of 117 tomatoes sample for training set and 50 tomatoes sample for test set were used to establish the calibration model to predict and measure %TA by partial least squares regression (PLSR) technique. The spectra were pretreated with MSC pretreatment and it gave the optimal result for calibration model as (R = 0.92, RMSEC = 0.03%) and this model obtained high accuracy result to use for %TA prediction in test set as (R = 0.81, RMSEP = 0.05%). From the result of prediction in test set shown that the transmittance SW-NIR spectroscopy technique can be used for a non-destructive method for %TA prediction of tomatoes.

Keywords: tomato, quality, prediction, transmittance, titratable acidity, citric acid

Procedia PDF Downloads 241
17899 Development and Validation of a Coronary Heart Disease Risk Score in Indian Type 2 Diabetes Mellitus Patients

Authors: Faiz N. K. Yusufi, Aquil Ahmed, Jamal Ahmad

Abstract:

Diabetes in India is growing at an alarming rate and the complications caused by it need to be controlled. Coronary heart disease (CHD) is one of the complications that will be discussed for prediction in this study. India has the second most number of diabetes patients in the world. To the best of our knowledge, there is no CHD risk score for Indian type 2 diabetes patients. Any form of CHD has been taken as the event of interest. A sample of 750 was determined and randomly collected from the Rajiv Gandhi Centre for Diabetes and Endocrinology, J.N.M.C., A.M.U., Aligarh, India. Collected variables include patients data such as sex, age, height, weight, body mass index (BMI), blood sugar fasting (BSF), post prandial sugar (PP), glycosylated haemoglobin (HbA1c), diastolic blood pressure (DBP), systolic blood pressure (SBP), smoking, alcohol habits, total cholesterol (TC), triglycerides (TG), high density lipoprotein (HDL), low density lipoprotein (LDL), very low density lipoprotein (VLDL), physical activity, duration of diabetes, diet control, history of antihypertensive drug treatment, family history of diabetes, waist circumference, hip circumference, medications, central obesity and history of CHD. Predictive risk scores of CHD events are designed by cox proportional hazard regression. Model calibration and discrimination is assessed from Hosmer Lemeshow and area under receiver operating characteristic (ROC) curve. Overfitting and underfitting of the model is checked by applying regularization techniques and best method is selected between ridge, lasso and elastic net regression. Youden’s index is used to choose the optimal cut off point from the scores. Five year probability of CHD is predicted by both survival function and Markov chain two state model and the better technique is concluded. The risk scores for CHD developed can be calculated by doctors and patients for self-control of diabetes. Furthermore, the five-year probabilities can be implemented as well to forecast and maintain the condition of patients.

Keywords: coronary heart disease, cox proportional hazard regression, ROC curve, type 2 diabetes Mellitus

Procedia PDF Downloads 189
17898 Analysis of Attention to the Confucius Institute from Domestic and Foreign Mainstream Media

Authors: Wei Yang, Xiaohui Cui, Weiping Zhu, Liqun Liu

Abstract:

The rapid development of the Confucius Institute is attracting more and more attention from mainstream media around the world. Mainstream media plays a large role in public information dissemination and public opinion. This study presents efforts to analyze the correlation and functional relationship between domestic and foreign mainstream media by analyzing the amount of reports on the Confucius Institute. Three kinds of correlation calculation methods, the Pearson correlation coefficient (PCC), the Spearman correlation coefficient (SCC), and the Kendall rank correlation coefficient (KCC), were applied to analyze the correlations among mainstream media from three regions: mainland of China; Hong Kong and Macao (the two special administration regions of China denoted as SARs); and overseas countries excluding China, such as the United States, England, and Canada. Further, the paper measures the functional relationships among the regions using a regression model. The experimental analyses found high correlations among mainstream media from the different regions. Additionally, we found that there is a linear relationship between the mainstream media of overseas countries and those of the SARs by analyzing the amount of reports on the Confucius Institute based on a data set obtained by crawling the websites of 106 mainstream media during the years 2004 to 2014.

Keywords: mainstream media, Confucius institute, correlation analysis, regression model

Procedia PDF Downloads 287
17897 Application of Grey Theory in the Forecast of Facility Maintenance Hours for Office Building Tenants and Public Areas

Authors: Yen Chia-Ju, Cheng Ding-Ruei

Abstract:

This study took case office building as subject and explored the responsive work order repair request of facilities and equipment in offices and public areas by gray theory, with the purpose of providing for future related office building owners, executive managers, property management companies, mechanical and electrical companies as reference for deciding and assessing forecast model. Important conclusions of this study are summarized as follows according to the study findings: 1. Grey Relational Analysis discusses the importance of facilities repair number of six categories, namely, power systems, building systems, water systems, air conditioning systems, fire systems and manpower dispatch in order. In terms of facilities maintenance importance are power systems, building systems, water systems, air conditioning systems, manpower dispatch and fire systems in order. 2. GM (1,N) and regression method took maintenance hours as dependent variables and repair number, leased area and tenants number as independent variables and conducted single month forecast based on 12 data from January to December 2011. The mean absolute error and average accuracy of GM (1,N) from verification results were 6.41% and 93.59%; the mean absolute error and average accuracy of regression model were 4.66% and 95.34%, indicating that they have highly accurate forecast capability.

Keywords: rey theory, forecast model, Taipei 101, office buildings, property management, facilities, equipment

Procedia PDF Downloads 410
17896 On Estimating the Headcount Index by Using the Logistic Regression Estimator

Authors: Encarnación Álvarez, Rosa M. García-Fernández, Juan F. Muñoz, Francisco J. Blanco-Encomienda

Abstract:

The problem of estimating a proportion has important applications in the field of economics, and in general, in many areas such as social sciences. A common application in economics is the estimation of the headcount index. In this paper, we define the general headcount index as a proportion. Furthermore, we introduce a new quantitative method for estimating the headcount index. In particular, we suggest to use the logistic regression estimator for the problem of estimating the headcount index. Assuming a real data set, results derived from Monte Carlo simulation studies indicate that the logistic regression estimator can be more accurate than the traditional estimator of the headcount index.

Keywords: poverty line, poor, risk of poverty, Monte Carlo simulations, sample

Procedia PDF Downloads 397
17895 The Role of Urban Development Patterns for Mitigating Extreme Urban Heat: The Case Study of Doha, Qatar

Authors: Yasuyo Makido, Vivek Shandas, David J. Sailor, M. Salim Ferwati

Abstract:

Mitigating extreme urban heat is challenging in a desert climate such as Doha, Qatar, since outdoor daytime temperature area often too high for the human body to tolerate. Recent studies demonstrate that cities in arid and semiarid areas can exhibit ‘urban cool islands’ - urban areas that are cooler than the surrounding desert. However, the variation of temperatures as a result of the time of day and factors leading to temperature change remain at the question. To address these questions, we examined the spatial and temporal variation of air temperature in Doha, Qatar by conducting multiple vehicle-base local temperature observations. We also employed three statistical approaches to model surface temperatures using relevant predictors: (1) Ordinary Least Squares, (2) Regression Tree Analysis and (3) Random Forest for three time periods. Although the most important determinant factors varied by day and time, distance to the coast was the significant determinant at midday. A 70%/30% holdout method was used to create a testing dataset to validate the results through Pearson’s correlation coefficient. The Pearson’s analysis suggests that the Random Forest model more accurately predicts the surface temperatures than the other methods. We conclude with recommendations about the types of development patterns that show the greatest potential for reducing extreme heat in air climates.

Keywords: desert cities, tree-structure regression model, urban cool Island, vehicle temperature traverse

Procedia PDF Downloads 365
17894 Mapping of Urban Micro-Climate in Lyon (France) by Integrating Complementary Predictors at Different Scales into Multiple Linear Regression Models

Authors: Lucille Alonso, Florent Renard

Abstract:

The characterizations of urban heat island (UHI) and their interactions with climate change and urban climates are the main research and public health issue, due to the increasing urbanization of the population. These solutions require a better knowledge of the UHI and micro-climate in urban areas, by combining measurements and modelling. This study is part of this topic by evaluating microclimatic conditions in dense urban areas in the Lyon Metropolitan Area (France) using a combination of data traditionally used such as topography, but also from LiDAR (Light Detection And Ranging) data, Landsat 8 satellite observation and Sentinel and ground measurements by bike. These bicycle-dependent weather data collections are used to build the database of the variable to be modelled, the air temperature, over Lyon’s hyper-center. This study aims to model the air temperature, measured during 6 mobile campaigns in Lyon in clear weather, using multiple linear regressions based on 33 explanatory variables. They are of various categories such as meteorological parameters from remote sensing, topographic variables, vegetation indices, the presence of water, humidity, bare soil, buildings, radiation, urban morphology or proximity and density to various land uses (water surfaces, vegetation, bare soil, etc.). The acquisition sources are multiple and come from the Landsat 8 and Sentinel satellites, LiDAR points, and cartographic products downloaded from an open data platform in Greater Lyon. Regarding the presence of low, medium, and high vegetation, the presence of buildings and ground, several buffers close to these factors were tested (5, 10, 20, 25, 50, 100, 200 and 500m). The buffers with the best linear correlations with air temperature for ground are 5m around the measurement points, for low and medium vegetation, and for building 50m and for high vegetation is 100m. The explanatory model of the dependent variable is obtained by multiple linear regression of the remaining explanatory variables (Pearson correlation matrix with a |r| < 0.7 and VIF with < 5) by integrating a stepwise sorting algorithm. Moreover, holdout cross-validation is performed, due to its ability to detect over-fitting of multiple regression, although multiple regression provides internal validation and randomization (80% training, 20% testing). Multiple linear regression explained, on average, 72% of the variance for the study days, with an average RMSE of only 0.20°C. The impact on the model of surface temperature in the estimation of air temperature is the most important variable. Other variables are recurrent such as distance to subway stations, distance to water areas, NDVI, digital elevation model, sky view factor, average vegetation density, or building density. Changing urban morphology influences the city's thermal patterns. The thermal atmosphere in dense urban areas can only be analysed on a microscale to be able to consider the local impact of trees, streets, and buildings. There is currently no network of fixed weather stations sufficiently deployed in central Lyon and most major urban areas. Therefore, it is necessary to use mobile measurements, followed by modelling to characterize the city's multiple thermal environments.

Keywords: air temperature, LIDAR, multiple linear regression, surface temperature, urban heat island

Procedia PDF Downloads 107
17893 Removal of Phenol from Aqueous Solution Using Watermelon (Citrullus C. lanatus) Rind

Authors: Fidelis Chigondo

Abstract:

This study focuses on investigating the effectiveness of watermelon rind in phenol removal from aqueous solution. The effects of various parameters (pH, initial phenol concentration, biosorbent dosage and contact time) on phenol adsorption were investigated. The pH of 2, initial phenol concentration of 40 ppm, the biosorbent dosage of 0.6 g and contact time of 6 h also deduced to be the optimum conditions for the adsorption process. The maximum phenol removal under optimized conditions was 85%. The sorption data fitted to the Freundlich isotherm with a regression coefficient of 0.9824. The kinetics was best described by the intraparticle diffusion model and Elovich Equation with regression coefficients of 1 and 0.8461 respectively showing that the reaction is chemisorption on a heterogeneous surface and the intraparticle diffusion rate only is the rate determining step. The study revealed that watermelon rind has a potential of removing phenol from industrial wastewaters.

Keywords: biosorption, phenol, biosorbent, watermelon rind

Procedia PDF Downloads 222
17892 Low-Cost, Portable Optical Sensor with Regression Algorithm Models for Accurate Monitoring of Nitrites in Environments

Authors: David X. Dong, Qingming Zhang, Meng Lu

Abstract:

Nitrites enter waterways as runoff from croplands and are discharged from many industrial sites. Excessive nitrite inputs to water bodies lead to eutrophication. On-site rapid detection of nitrite is of increasing interest for managing fertilizer application and monitoring water source quality. Existing methods for detecting nitrites use spectrophotometry, ion chromatography, electrochemical sensors, ion-selective electrodes, chemiluminescence, and colorimetric methods. However, these methods either suffer from high cost or provide low measurement accuracy due to their poor selectivity to nitrites. Therefore, it is desired to develop an accurate and economical method to monitor nitrites in environments. We report a low-cost optical sensor, in conjunction with a machine learning (ML) approach to enable high-accuracy detection of nitrites in water sources. The sensor works under the principle of measuring molecular absorptions of nitrites at three narrowband wavelengths (295 nm, 310 nm, and 357 nm) in the ultraviolet (UV) region. These wavelengths are chosen because they have relatively high sensitivity to nitrites; low-cost light-emitting devices (LEDs) and photodetectors are also available at these wavelengths. A regression model is built, trained, and utilized to minimize cross-sensitivities of these wavelengths to the same analyte, thus achieving precise and reliable measurements with various interference ions. The measured absorbance data is input to the trained model that can provide nitrite concentration prediction for the sample. The sensor is built with i) a miniature quartz cuvette as the test cell that contains a liquid sample under test, ii) three low-cost UV LEDs placed on one side of the cell as light sources, with each LED providing a narrowband light, and iii) a photodetector with a built-in amplifier and an analog-to-digital converter placed on the other side of the test cell to measure the power of transmitted light. This simple optical design allows measuring the absorbance data of the sample at the three wavelengths. To train the regression model, absorbances of nitrite ions and their combination with various interference ions are first obtained at the three UV wavelengths using a conventional spectrophotometer. Then, the spectrophotometric data are inputs to different regression algorithm models for training and evaluating high-accuracy nitrite concentration prediction. Our experimental results show that the proposed approach enables instantaneous nitrite detection within several seconds. The sensor hardware costs about one hundred dollars, which is much cheaper than a commercial spectrophotometer. The ML algorithm helps to reduce the average relative errors to below 3.5% over a concentration range from 0.1 ppm to 100 ppm of nitrites. The sensor has been validated to measure nitrites at three sites in Ames, Iowa, USA. This work demonstrates an economical and effective approach to the rapid, reagent-free determination of nitrites with high accuracy. The integration of the low-cost optical sensor and ML data processing can find a wide range of applications in environmental monitoring and management.

Keywords: optical sensor, regression model, nitrites, water quality

Procedia PDF Downloads 45
17891 The Impact of Simulation-based Learning on the Clinical Self-efficacy and Adherence to Infection Control Practices of Nursing Students

Authors: Raeed Alanazi

Abstract:

Introduction: Nursing students have a crucial role to play in the inhibition of infectious diseases and, therefore, must be trained in infection control and prevention modules prior to entering clinical settings. Simulations have been found to have a positive impact on infection control skills and the use of standard precautions. Aim: The purpose of this study was to use the four sources of self-efficacy in explaining the level of clinical self-efficacy and adherence to infection control practices in Saudi nursing students during simulation practice. Method: A cross-sectional design with convenience sampling was used. This study was conducted in all Saudi nursing schools, with a total number of 197 students participated in this study. Three scales were used simulation self- efficacy Scale (SSES), the four sources of self-efficacy scale (SSES), and Compliance with Standard Precautions Scale (CSPS). Multiple linear regression was used to test the use of the four sources of self-efficacy (SSES) in explaining level of clinical self-efficacy and adherence to infection control in nursing students. Results: The vicarious experience subscale (p =.044) was statistically significant. The regression model indicated that for every one unit increase in vicarious experience (observation and reflection in simulation), the participants’ adherence to infection control increased by .13 units (β =.22, t = 2.03, p =.044). In addition, the regression model indicated that for every one unit increase in education level, the participants’ adherence to infection control increased by 1.82 units (beta=.34= 3.64, p <.001). Also, the mastery experience subscale (p <.001) and vicarious experience subscale (p = .020) were shared significant associations with clinical self-efficacy. Conclusion: The findings of this research support the idea that simulation-based learning can be a valuable teaching-learning method to help nursing students develop clinical competence, which is essential in providing quality and safe nursing care.

Keywords: simulation-based learning, clinical self-efficacy, infection control, nursing students

Procedia PDF Downloads 43
17890 Quantitative Structure Activity Relationship Model for Predicting the Aromatase Inhibition Activity of 1,2,3-Triazole Derivatives

Authors: M. Ouassaf, S. Belaidi

Abstract:

Aromatase is an estrogen biosynthetic enzyme belonging to the cytochrome P450 family, which catalyzes the limiting step in the conversion of androgens to estrogens. As it is relevant for the promotion of tumor cell growth. A set of thirty 1,2,3-triazole derivatives was used in the quantitative structure activity relationship (QSAR) study using regression multiple linear (MLR), We divided the data into two training and testing groups. The results showed a good predictive ability of the MLR model, the models were statistically robust internally (R² = 0.982) and the predictability of the model was tested by several parameters. including external criteria (R²pred = 0.851, CCC = 0.946). The knowledge gained in this study should provide relevant information that contributes to the origins of aromatase inhibitory activity and, therefore, facilitates our ongoing quest for aromatase inhibitors with robust properties.

Keywords: aromatase inhibitors, QSAR, MLR, 1, 2, 3-triazole

Procedia PDF Downloads 87
17889 Assessing the Structure of Non-Verbal Semantic Knowledge: The Evaluation and First Results of the Hungarian Semantic Association Test

Authors: Alinka Molnár-Tóth, Tímea Tánczos, Regina Barna, Katalin Jakab, Péter Klivényi

Abstract:

Supported by neuroscientific findings, the so-called Hub-and-Spoke model of the human semantic system is based on two subcomponents of semantic cognition, namely the semantic control process and semantic representation. Our semantic knowledge is multimodal in nature, as the knowledge system stored in relation to a conception is extensive and broad, while different aspects of the conception may be relevant depending on the purpose. The motivation of our research is to develop a new diagnostic measurement procedure based on the preservation of semantic representation, which is appropriate to the specificities of the Hungarian language and which can be used to compare the non-verbal semantic knowledge of healthy and aphasic persons. The development of the test will broaden the Hungarian clinical diagnostic toolkit, which will allow for more specific therapy planning. The sample of healthy persons (n=480) was determined by the last census data for the representativeness of the sample. Based on the concept of the Pyramids and Palm Tree Test, and according to the characteristics of the Hungarian language, we have elaborated a test based on different types of semantic information, in which the subjects are presented with three pictures: they have to choose the one that best fits the target word above from the two lower options, based on the semantic relation defined. We have measured 5 types of semantic knowledge representations: associative relations, taxonomy, motional representations, concrete as well as abstract verbs. As the first step in our data analysis, we examined the normal distribution of our results, and since it was not normally distributed (p < 0.05), we used nonparametric statistics further into the analysis. Using descriptive statistics, we could determine the frequency of the correct and incorrect responses, and with this knowledge, we could later adjust and remove the items of questionable reliability. The reliability was tested using Cronbach’s α, and it can be safely said that all the results were in an acceptable range of reliability (α = 0.6-0.8). We then tested for the potential gender differences using the Mann Whitney-U test, however, we found no difference between the two (p < 0.05). Likewise, we didn’t see that the age had any effect on the results using one-way ANOVA (p < 0.05), however, the level of education did influence the results (p > 0.05). The relationships between the subtests were observed by the nonparametric Spearman’s rho correlation matrix, showing statistically significant correlation between the subtests (p > 0.05), signifying a linear relationship between the measured semantic functions. A margin of error of 5% was used in all cases. The research will contribute to the expansion of the clinical diagnostic toolkit and will be relevant for the individualised therapeutic design of treatment procedures. The use of a non-verbal test procedure will allow an early assessment of the most severe language conditions, which is a priority in the differential diagnosis. The measurement of reaction time is expected to advance prodrome research, as the tests can be easily conducted in the subclinical phase.

Keywords: communication disorders, diagnostic toolkit, neurorehabilitation, semantic knowlegde

Procedia PDF Downloads 69