Search results for: poisson regression model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 18875

18515 Applying Multiplicative Weight Update to Skin Cancer Classifiers

Authors: Animish Jain

Abstract:

This study applies the Multiplicative Weight Update method to machine learning models that diagnose skin cancer from microscopic images of cancer samples. The method combines the predictions of multiple models in an attempt to obtain more accurate results. Logistic Regression, Convolutional Neural Network (CNN), and Support Vector Machine Classifier (SVMC) models are employed within the Multiplicative Weight Update system. These models are trained on pictures of skin cancer from the ISIC-Archive to look for patterns and label unseen scans as either benign or malignant. The models are then used in a multiplicative weight update algorithm, which takes into account the precision and accuracy of each model through each successive guess to apply weights to their guesses. These guesses and weights are then analyzed together to try to obtain the correct predictions. The research hypothesis stated that there would be a significant difference in accuracy between the three models and the Multiplicative Weight Update system. The SVMC model had an accuracy of 77.88%, the CNN model 85.30%, and the Logistic Regression model 79.09%. The Multiplicative Weight Update algorithm achieved an accuracy of 72.27%. The study concluded that there was a significant difference in accuracy between the three models and the Multiplicative Weight Update system, and that a CNN model would be a better option for this problem than a Multiplicative Weight Update system. This may be because Multiplicative Weight Update is not effective in a binary setting where there are only two possible classifications. 
In a categorical setting with multiple classes and groupings, a Multiplicative Weight Update system might become more proficient as it takes into account the strengths of multiple different models to classify images into multiple categories rather than only two categories, as shown in this study. This experimentation and computer science project can help to create better algorithms and models for the future of artificial intelligence in the medical imaging field.
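
As an illustration of the weighting idea described in this abstract, the following is a minimal sketch of a multiplicative weights vote over binary classifiers. It is not the author's implementation: the function name, the learning rate `eta`, the tie-breaking rule, and the toy predictions are all assumptions made for the example.

```python
def multiplicative_weights(predictions, labels, eta=0.5):
    """Combine binary predictions from several models with multiplicative weights.

    predictions: list of per-model lists of 0/1 guesses (same length as labels).
    labels: list of true 0/1 labels, revealed one at a time.
    eta: hypothetical learning rate in (0, 1); a model's weight is
         multiplied by (1 - eta) each time it guesses wrong.
    Returns (ensemble_guesses, final_weights).
    """
    weights = [1.0] * len(predictions)
    ensemble = []
    for t, truth in enumerate(labels):
        # Weighted vote: total weight behind class 1 versus class 0.
        vote_one = sum(w for w, p in zip(weights, predictions) if p[t] == 1)
        vote_zero = sum(weights) - vote_one
        # Ties go to class 1 (an arbitrary choice for this sketch).
        ensemble.append(1 if vote_one >= vote_zero else 0)
        # Penalize every model that was wrong on this sample.
        weights = [w * (1 - eta) if p[t] != truth else w
                   for w, p in zip(weights, predictions)]
    return ensemble, weights


# Toy data (hypothetical): 1 = malignant, 0 = benign.
labels = [1, 0, 1, 1, 0, 1]
model_a = [1, 0, 1, 1, 0, 1]  # correct on every sample
model_b = [0, 1, 0, 1, 0, 1]  # wrong on the first three samples
model_c = [0, 1, 1, 0, 1, 1]  # wrong on four samples
guesses, weights = multiplicative_weights([model_a, model_b, model_c], labels)
```

In this toy run the ensemble is outvoted on the first two samples, then follows the reliable model once the weights of the other two have shrunk.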

Keywords: artificial intelligence, machine learning, multiplicative weight update, skin cancer

Procedia PDF Downloads 79
18514 Prediction on Housing Price Based on Deep Learning

Authors: Li Yu, Chenlu Jiao, Hongrun Xin, Yan Wang, Kaiyang Wang

Abstract:

To study the impact of various factors on housing prices, we propose building different deep learning prediction models on existing real estate data in order to predict housing prices, or their future trends, more accurately. Because the factors that affect housing prices vary widely, the proposed prediction models fall into two categories. The first is based on multiple characteristic factors of the real estate. We built a Convolutional Neural Network (CNN) prediction model and a Long Short-Term Memory (LSTM) neural network prediction model based on deep learning, and implemented a logistic regression model to compare the three models. The second category is the time series model. Based on deep learning, we proposed an LSTM-1 model that relies purely on the time series, then implemented and compared the LSTM model and the Auto-Regressive Moving Average (ARMA) model. In this paper, a comprehensive study of second-hand housing prices in Beijing was conducted in three stages: data crawling and analysis, housing price prediction, and result comparison. Ultimately, the best model was produced, which is of great significance to the evaluation and prediction of housing prices in the real estate industry.

Keywords: deep learning, convolutional neural network, LSTM, housing prediction

Procedia PDF Downloads 306
18513 Educational Data Mining: The Case of the Department of Mathematics and Computing in the Period 2009-2018

Authors: Mário Ernesto Sitoe, Orlando Zacarias

Abstract:

University education is influenced by several factors, ranging from the adoption of strategies that strengthen the whole process to improvements in the academic performance of the students themselves. This work uses data mining techniques to develop a predictive model that identifies students with a tendency toward evasion and retention. To this end, a database of real student data from the Department of University Admission (DAU) and the Department of Mathematics and Informatics (DMI) was used. The data comprised 388 undergraduate students admitted in the years 2009 to 2014. The Weka tool was used for model building with three different techniques: K-nearest neighbor, random forest, and logistic regression. To allow for training on multiple train-test splits, a cross-validation approach was employed with a varying number of folds. To reduce bias and variance and improve the performance of the models, the ensemble methods Bagging and Stacking were used. After comparing the results obtained by the three classifiers, Logistic Regression using Bagging with seven folds obtained the best performance, showing results above 90% in all evaluated metrics: accuracy, rate of true positives, and precision. Retention is the most common tendency.

Keywords: evasion and retention, cross-validation, bagging, stacking

Procedia PDF Downloads 82
18512 Using Linear Logistic Regression to Evaluation the Patient and System Delay and Effective Factors in Mortality of Patients with Acute Myocardial Infarction

Authors: Firouz Amani, Adalat Hoseinian, Sajjad Hakimian

Abstract:

Background: Mortality due to Myocardial Infarction (MI) often occurs during the first hours after the onset of symptoms. A timely hospital visit can therefore help patients receive the necessary treatment and decrease the mortality rate. The aim of this study was to investigate the factors affecting mortality in MI patients using Linear Logistic Regression. Materials and Methods: In this case-control study, all patients with acute MI who were referred to the Ardabil city hospital were studied. All patients who died were considered the case group (n=27), and we selected 27 matched patients without acute MI as the control group. Data were collected for all patients in both groups using the same checklist and then analyzed in SPSS version 24 using statistical methods. We used the linear logistic regression model to determine the factors affecting mortality in MI patients. Results: The mean age of patients in the case group was significantly higher than in the control group (75.1±11.7 vs. 63.1±11.6, p=0.001). A history of non-cardinal diseases was significantly more frequent in the case group (44.4%) than in the control group (7.4%) (p=0.002). The number of performed PCIs was significantly lower in the case group (40.7%) than in the control group (74.1%) (p=0.013). The time between hospital admission and performed PCI was significantly longer in the case group (110.9 min) than in the control group (56 min) (p=0.001). The mean delay from onset of symptoms to hospital admission (patient delay) and the mean delay from hospital admission to receiving treatment (system delay) were similar between the two groups. Using the logistic regression model, we found that a history of non-cardinal diseases (OR=283) and the number of performed PCIs (OR=24.5) had a significant impact on the mortality of MI patients compared to other factors. 
Conclusion: The results of this study showed that, of all the studied factors, the number of performed PCIs, a history of non-cardinal illness, and the interval between onset of symptoms and performed PCI have a significant relation with the mortality of MI patients, while the other factors were not meaningful. Further studies with a larger sample that investigate other factors, such as smoking and weather, are therefore recommended.

Keywords: acute MI, mortality, heart failure, arrhythmia

Procedia PDF Downloads 122
18511 Benchmarking Machine Learning Approaches for Forecasting Hotel Revenue

Authors: Rachel Y. Zhang, Christopher K. Anderson

Abstract:

A critical aspect of revenue management is a firm’s ability to predict demand as a function of price. Historically, hotels have used simple time series models (regression and/or pick-up based models) owing to the complexities of trying to build causal models of demand. Machine learning approaches are slowly attracting attention owing to their flexibility in modeling relationships. This study provides an overview of approaches to forecasting hospitality demand, focusing on the opportunities created by machine learning approaches, including K-Nearest-Neighbors, Support Vector Machine, Regression Tree, and Artificial Neural Network algorithms. The out-of-sample performance of these approaches to forecasting hotel demand is illustrated using a proprietary sample of market-level (24 properties) transactional data for Las Vegas, NV. Causal predictive models can be built and evaluated owing to the availability of market-level (versus firm-level) data. This research also compares and contrasts the accuracy of firm-level models (i.e., predictive models for hotel A using only hotel A’s data) with models using market-level data (prices, review scores, location, chain scale, etc. for all hotels within the market). The proposed models will be valuable for predicting hotel revenue given the basic characteristics of a hotel property, and can also be applied to evaluate the performance of an existing hotel. The findings will unveil the features that play key roles in a hotel’s revenue performance, which would have considerable potential usefulness in both revenue prediction and evaluation.

Keywords: hotel revenue, k-nearest-neighbors, machine learning, neural network, prediction model, regression tree, support vector machine

Procedia PDF Downloads 132
18510 Estimation of Dynamic Characteristics of a Middle Rise Steel Reinforced Concrete Building Using Long-Term

Authors: Fumiya Sugino, Naohiro Nakamura, Yuji Miyazu

Abstract:

In the earthquake-resistant design of buildings, evaluating vibration characteristics is important. In recent years, with the increase in super high-rise buildings, evaluating the response has become important not only for the first mode but also for higher modes. Knowledge of the vibration characteristics of buildings is mostly limited to the first mode, and knowledge of higher modes is still insufficient. In this paper, the characteristics of the first and second modes were studied using earthquake observation records of an SRC building, applying a frequency filter to an ARX model. First, we studied the change in the eigenfrequency and the damping ratio during the 3.11 earthquake. The eigenfrequency gradually decreases from the time of earthquake occurrence and is almost stable after about 150 seconds. At this time, the decreasing rates of the 1st and 2nd eigenfrequencies are both about 0.7. Although the damping ratio has a larger error than the eigenfrequency, both the 1st and 2nd damping ratios are 3 to 5%. There is also a strong correlation between the 1st and 2nd eigenfrequencies, with a regression line of y=3.17x. For the damping ratio, the regression line is y=0.90x, so the 1st and 2nd damping ratios are approximately the same. Next, we studied the eigenfrequency and damping ratio from 1998 to after the 3.11 earthquake, with 2014 as the final year. All the considered earthquakes were connected in order of occurrence. The eigenfrequency declined slowly from immediately after completion and tended to stabilize after several years, although it declined greatly after the 3.11 earthquake. The decreasing rates of both the 1st and 2nd eigenfrequencies until about 7 years later are about 0.8. For the damping ratio, both the 1st and 2nd are about 1 to 6%. After the 3.11 earthquake, the 1st increases by about 1% and the 2nd increases by less than 1%. 
For the eigenfrequency, there is a strong correlation between the 1st and 2nd, and the regression line is y=3.17x. For the damping ratio, the regression line is y=1.01x. Therefore, the 1st and 2nd damping ratios can be said to be approximately the same. Based on the above results, changes in eigenfrequency and damping ratio are summarized as follows. In the long-term study of the eigenfrequency, both the 1st and 2nd gradually declined from immediately after completion and tended to stabilize after a few years. They declined further after the 3.11 earthquake. In addition, there is a strong correlation between the 1st and 2nd, and their declining times and decreasing rates are about the same. In the long-term study of the damping ratio, both the 1st and 2nd are about 1 to 6%. After the 3.11 earthquake, the 1st increases by about 1% and the 2nd increases by less than 1%. The 1st and 2nd are also approximately the same.
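
The regression lines quoted in this abstract (y=3.17x, y=0.90x, y=1.01x) have no intercept term. One common way to obtain such a line is a least-squares fit constrained through the origin, sketched below. The abstract does not say how the authors fitted their lines, so the method, the function name, and the sample frequencies are assumptions for illustration only.

```python
def slope_through_origin(x, y):
    """Least-squares slope b for the no-intercept model y = b * x."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    sum_xx = sum(xi * xi for xi in x)
    if sum_xx == 0:
        raise ValueError("x must contain at least one non-zero value")
    # Minimizing sum((y - b*x)^2) over b gives b = sum(x*y) / sum(x*x).
    return sum(xi * yi for xi, yi in zip(x, y)) / sum_xx


# Hypothetical 1st-mode frequencies in Hz; the 2nd-mode values are
# generated to lie exactly on y = 3.17x so the fit can be checked.
first_mode_hz = [1.0, 0.9, 0.8, 0.7]
second_mode_hz = [3.17 * f for f in first_mode_hz]
slope = slope_through_origin(first_mode_hz, second_mode_hz)
```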

Keywords: eigenfrequency, damping ratio, ARX model, earthquake observation records

Procedia PDF Downloads 217
18509 Predicting Football Player Performance: Integrating Data Visualization and Machine Learning

Authors: Saahith M. S., Sivakami R.

Abstract:

In the realm of football analytics, particularly focusing on predicting football player performance, the ability to forecast player success accurately is of paramount importance for teams, managers, and fans. This study introduces an elaborate examination of predicting football player performance through the integration of data visualization methods and machine learning algorithms. The research entails the compilation of an extensive dataset comprising player attributes, conducting data preprocessing, feature selection, model selection, and model training to construct predictive models. The analysis within this study will involve delving into feature significance using methodologies like Select Best and Recursive Feature Elimination (RFE) to pinpoint pertinent attributes for predicting player performance. Various machine learning algorithms, including Random Forest, Decision Tree, Linear Regression, Support Vector Regression (SVR), and Artificial Neural Networks (ANN), will be explored to develop predictive models. The evaluation of each model's performance utilizing metrics such as Mean Squared Error (MSE) and R-squared will be executed to gauge their efficacy in predicting player performance. Furthermore, this investigation will encompass a top player analysis to recognize the top-performing players based on the anticipated overall performance scores. Nationality analysis will entail scrutinizing the player distribution based on nationality and investigating potential correlations between nationality and player performance. Positional analysis will concentrate on examining the player distribution across various positions and assessing the average performance of players in each position. Age analysis will evaluate the influence of age on player performance and identify any discernible trends or patterns associated with player age groups. 
The primary objective is to predict a football player's overall performance accurately based on their individual attributes, leveraging data-driven insights to enrich the comprehension of player success on the field. By amalgamating data visualization and machine learning methodologies, the aim is to furnish valuable tools for teams, managers, and fans to effectively analyze and forecast player performance. This research contributes to the progression of sports analytics by showcasing the potential of machine learning in predicting football player performance and offering actionable insights for diverse stakeholders in the football industry.
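
The two evaluation metrics named in this abstract, Mean Squared Error and R-squared, can be computed as in the sketch below. This uses the standard textbook definitions rather than the authors' code, and the player scores are hypothetical values chosen for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical overall performance scores for three players.
actual = [70, 80, 90]
predicted = [72, 80, 88]
mse = mean_squared_error(actual, predicted)  # (4 + 0 + 4) / 3
r2 = r_squared(actual, predicted)            # 1 - 8 / 200
```

A lower MSE and an R-squared closer to 1 both indicate predictions that track the actual scores more closely.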

Keywords: football analytics, player performance prediction, data visualization, machine learning algorithms, random forest, decision tree, linear regression, support vector regression, artificial neural networks, model evaluation, top player analysis, nationality analysis, positional analysis

Procedia PDF Downloads 38
18508 Analytical Modelling of Surface Roughness during Compacted Graphite Iron Milling Using Ceramic Inserts

Authors: Ş. Karabulut, A. Güllü, A. Güldaş, R. Gürbüz

Abstract:

This study investigates the effects of the lead angle and chip thickness variation on surface roughness during the machining of compacted graphite iron using ceramic cutting tools under dry cutting conditions. Analytical models were developed for predicting the surface roughness values of the specimens after the face milling process. Experimental data was collected and imported to the artificial neural network model. A multilayer perceptron model was used with the back propagation algorithm employing the input parameters of lead angle, cutting speed and feed rate in connection with chip thickness. Furthermore, analysis of variance was employed to determine the effects of the cutting parameters on surface roughness. Artificial neural network and regression analysis were used to predict surface roughness. The values thus predicted were compared with the collected experimental data, and the corresponding percentage error was computed. Analysis results revealed that the lead angle is the dominant factor affecting surface roughness. Experimental results indicated an improvement in the surface roughness value with decreasing lead angle value from 88° to 45°.

Keywords: CGI, milling, surface roughness, ANN, regression, modeling, analysis

Procedia PDF Downloads 448
18507 Use of Front-Face Fluorescence Spectroscopy and Multiway Analysis for the Prediction of Olive Oil Quality Features

Authors: Omar Dib, Rita Yaacoub, Luc Eveleigh, Nathalie Locquet, Hussein Dib, Ali Bassal, Christophe B. Y. Cordella

Abstract:

The potential of front-face fluorescence coupled with chemometric techniques, namely parallel factor analysis (PARAFAC) and multiple linear regression (MLR) as a rapid analysis tool to characterize Lebanese virgin olive oils was investigated. Fluorescence fingerprints were acquired directly on 102 Lebanese virgin olive oil samples in the range of 280-540 nm in excitation and 280-700 nm in emission. A PARAFAC model with seven components was considered optimal with a residual of 99.64% and core consistency value of 78.65. The model revealed seven main fluorescence profiles in olive oil and was mainly associated with tocopherols, polyphenols, chlorophyllic compounds and oxidation/hydrolysis products. 23 MLR regression models based on PARAFAC scores were generated, the majority of which showed a good correlation coefficient (R > 0.7 for 12 predicted variables), thus satisfactory prediction performances. Acid values, peroxide values, and Delta K had the models with the highest predictions, with R values of 0.89, 0.84 and 0.81 respectively. Among fatty acids, linoleic and oleic acids were also highly predicted with R values of 0.8 and 0.76, respectively. Factors contributing to the model's construction were related to common fluorophores found in olive oil, mainly chlorophyll, polyphenols, and oxidation products. This study demonstrates the interest of front-face fluorescence as a promising tool for quality control of Lebanese virgin olive oils.

Keywords: front-face fluorescence, Lebanese virgin olive oils, multiple Linear regressions, PARAFAC analysis

Procedia PDF Downloads 452
18506 Automatic API Regression Analyzer and Executor

Authors: Praveena Sridhar, Nihar Devathi, Parikshit Chakraborty

Abstract:

As a software product changes versions across releases, its APIs and features change and upgrades become necessary. Hence, it becomes imperative to assess the impact of upgrading on the dependent components. This tool finds the API changes between two versions and their impact on other APIs, then executes the automated regression suites relevant to the updates and their impacted areas. The tool has a four-layer architecture in which each layer performs its own pre-assigned task and sends the required information to the next layer. The four layers are: 1) Comparator: compares the two versions of the API. 2) Analyzer: analyses the API doc and gives the modified class and its dependencies along with implemented interface details. 3) Impact Filter: finds the impact of the modified class on the other API methods. 4) Auto Executor: based on the output given by the Impact Filter, runs the API regression suite. The tool reads the java doc and extracts the required information on classes, interfaces, and enumerations. The extracted information is saved into a data structure that shows the class details and their dependencies, along with the interfaces and enumerations listed in the java doc.

Keywords: automation impact regression, java doc, executor, analyzer, layers

Procedia PDF Downloads 488
18505 Gender Estimation by Means of Quantitative Measurements of Foramen Magnum: An Analysis of CT Head Images

Authors: Thilini Hathurusinghe, Uthpalie Siriwardhana, W. M. Ediri Arachchi, Ranga Thudugala, Indeewari Herath, Gayani Senanayake

Abstract:

The foramen magnum is more likely to be preserved than other skeletal remains during high-impact and severely disruptive injuries. Therefore, it is worthwhile to explore whether its measurements can be used to determine human gender, which is vital in forensic and anthropological studies. The aim was to assess whether quantitative measurements of the foramen magnum can serve as an anatomical indicator for human gender estimation and to evaluate the gender-dependent variations of the foramen magnum using quantitative measurements. A total of 113 randomly selected subjects who underwent CT head scans at Sri Jayawardhanapura General Hospital of Sri Lanka within a period of six months were included in the study. The sample contained 58 males (48.76 ± 14.7 years old) and 55 females (47.04 ± 15.9 years old). The maximum length of the foramen magnum (LFM), maximum width of the foramen magnum (WFM), minimum distance between occipital condyles (MnD), and maximum interior distance between occipital condyles (MxID) were measured. Further, AreaT and AreaR were also calculated. Gender was estimated using binomial logistic regression. The mean values of all explanatory variables (LFM, WFM, MnD, MxID, AreaT, and AreaR) were greater among males than females. All explanatory variables except MnD (p=0.669) were statistically significant (p < 0.05). Significant bivariate correlations were demonstrated by AreaT and AreaR with the explanatory variables. The results showed that WFM and MxID were the best measurements for predicting gender according to binomial logistic regression. The estimated model was: log(p/(1-p)) = 10.391 - 0.136×MxID - 0.231×WFM, where p is the probability of being a female. The classification accuracy given by the above model was 65.5%. The quantitative measurements of the foramen magnum can be used as a reliable anatomical marker for human gender estimation in the Sri Lankan context.
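
The estimated model in this abstract can be turned into a probability by inverting the logit. The sketch below uses the coefficients exactly as reported, and assumes that "log" is the natural logarithm (the usual convention in logistic regression) and that a 0.5 cut-off is used for classification. The abstract does not state the measurement units or the cut-off, so the function names, the threshold, and any input values are hypothetical.

```python
import math

# Coefficients as reported in the abstract.
INTERCEPT = 10.391
COEF_MXID = -0.136
COEF_WFM = -0.231


def probability_female(mxid, wfm):
    """Invert log(p / (1 - p)) = 10.391 - 0.136*MxID - 0.231*WFM for p."""
    logit = INTERCEPT + COEF_MXID * mxid + COEF_WFM * wfm
    return 1.0 / (1.0 + math.exp(-logit))


def estimate_gender(mxid, wfm, threshold=0.5):
    """Classify as female when p meets the (assumed) 0.5 threshold."""
    return "female" if probability_female(mxid, wfm) >= threshold else "male"
```

Because both coefficients are negative, larger MxID or WFM values lower the predicted probability of being female, which matches the reported finding that the mean measurements were greater among males.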

Keywords: foramen magnum, forensic and anthropological studies, gender estimation, logistic regression

Procedia PDF Downloads 151
18504 Optimal Production and Maintenance Policy for a Partially Observable Production System with Stochastic Demand

Authors: Leila Jafari, Viliam Makis

Abstract:

In this paper, the joint optimization of the economic manufacturing quantity (EMQ), safety stock level, and condition-based maintenance (CBM) is presented for a partially observable, deteriorating system subject to random failure. The demand is stochastic and it is described by a Poisson process. The stochastic model is developed and the optimization problem is formulated in the semi-Markov decision process framework. A modification of the policy iteration algorithm is developed to find the optimal policy. A numerical example is presented to compare the optimal policy with the policy considering zero safety stock.
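
To make the demand assumption in this abstract concrete, the sketch below simulates a homogeneous Poisson demand process by drawing exponential inter-arrival times. It only illustrates the demand model, not the paper's optimization method, and the rate, horizon, and function name are hypothetical.

```python
import random


def simulate_poisson_demand(rate, horizon, seed=None):
    """Return the arrival times of a Poisson demand process on [0, horizon).

    rate: expected number of demand arrivals per unit time (must be > 0).
    horizon: length of the simulated period.
    seed: optional seed for reproducible runs.
    """
    rng = random.Random(seed)
    arrivals = []
    # Gaps between consecutive arrivals are exponential with mean 1 / rate.
    t = rng.expovariate(rate)
    while t < horizon:
        arrivals.append(t)
        t += rng.expovariate(rate)
    return arrivals


# Hypothetical example: 2 demands per unit time over 1000 time units.
demand_times = simulate_poisson_demand(rate=2.0, horizon=1000.0, seed=42)
```

The number of arrivals over the horizon is Poisson distributed with mean rate × horizon, so the example should produce roughly 2000 demand events.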

Keywords: condition-based maintenance, economic manufacturing quantity, safety stock, stochastic demand

Procedia PDF Downloads 464
18503 A Predictive Machine Learning Model of the Survival of Female-led and Co-Led Small and Medium Enterprises in the UK

Authors: Mais Khader, Xingjie Wei

Abstract:

This research sheds light on female entrepreneurs by providing new insights on the survival predictions of companies led by females in the UK. This study aims to build a predictive machine learning model of the survival of female-led & co-led small & medium enterprises (SMEs) in the UK over the period 2000-2020. The predictive model built utilised a combination of financial and non-financial features related to both companies and their directors to predict SMEs' survival. These features were studied in terms of their contribution to the resultant predictive model. Five machine learning models are used in the modelling: Decision tree, AdaBoost, Naïve Bayes, Logistic regression and SVM. The AdaBoost model had the highest performance of the five models, with an accuracy of 73% and an AUC of 80%. The results show high feature importance in predicting companies' survival for company size, management experience, financial performance, industry, region, and females' percentage in management.

Keywords: company survival, entrepreneurship, females, machine learning, SMEs

Procedia PDF Downloads 101
18502 Developing and Evaluating Clinical Risk Prediction Models for Coronary Artery Bypass Graft Surgery

Authors: Mohammadreza Mohebbi, Masoumeh Sanagou

Abstract:

The ability to predict clinical outcomes is of great importance to physicians and clinicians. A number of different methods have been used in an effort to accurately predict these outcomes. These methods include the development of scoring systems based on multivariate statistical modelling, and models involving the use of classification and regression trees. The process usually consists of two consecutive phases, namely model development and external validation. The model development phase consists of building a multivariate model and evaluating its predictive performance by examining calibration and discrimination, and internal validation. External validation tests the predictive performance of a model by assessing its calibration and discrimination in different but plausibly related patients. A motivating example focusing on prediction modeling with a sample of patients who had undergone coronary artery bypass graft (CABG) surgery is used for illustrative purposes, and a set of primary considerations for evaluating prediction model studies, using specific quality indicators as criteria, is proposed to help stakeholders evaluate the quality of a prediction model study.

Keywords: clinical prediction models, clinical decision rule, prognosis, external validation, model calibration, biostatistics

Procedia PDF Downloads 297
18501 Using Machine Learning to Enhance Win Ratio for College Ice Hockey Teams

Authors: Sadixa Sanjel, Ahmed Sadek, Naseef Mansoor, Zelalem Denekew

Abstract:

Collegiate ice hockey (NCAA) sports analytics is different from the national level hockey (NHL). We apply and compare multiple machine learning models such as Linear Regression, Random Forest, and Neural Networks to predict the win ratio for a team based on their statistics. Data exploration helps determine which statistics are most useful in increasing the win ratio, which would be beneficial to coaches and team managers. We ran experiments to select the best model and chose Random Forest as the best performing. We conclude with how to bridge the gap between the college and national levels of sports analytics and the use of machine learning to enhance team performance despite not having a lot of metrics or budget for automatic tracking.

Keywords: NCAA, NHL, sports analytics, random forest, regression, neural networks, game predictions

Procedia PDF Downloads 114
18500 Electroencephalogram Based Approach for Mental Stress Detection during Gameplay with Level Prediction

Authors: Priyadarsini Samal, Rajesh Singla

Abstract:

Many mobile games provide entertainment by introducing stress to the human brain. The brain-computer interface (BCI) plays an important role in recognizing this mental stress. It offers various neuroimaging approaches that help in analyzing brain signals. The electroencephalogram (EEG) is the most commonly used method among them, as it is non-invasive, portable, and economical. This paper investigates the pattern in brain signals under mental stress. Two healthy volunteers played a game whose aim was to find hidden words in a grid, with the levels chosen randomly. The EEG signals during gameplay were recorded to investigate the impact of stress as the levels changed from easy to medium to hard. A total of 16 EEG features were analyzed for this experiment, including power band features with relative powers and event-related desynchronization, along with statistical features. A support vector machine was used as the classifier, which resulted in an accuracy of 93.9% for three-level stress analysis; for two levels, accuracies of 92% and 98% were achieved. In addition, the volunteers played another game that was similar in nature. A suitable regression model was designed for prediction, in which the feature sets of the first and second games were used for testing and training purposes, respectively, and an accuracy of 73% was found.

Keywords: brain computer interface, electroencephalogram, regression model, stress, word search

Procedia PDF Downloads 187
18499 A Multi-Release Software Reliability Growth Models Incorporating Imperfect Debugging and Change-Point under the Simulated Testing Environment and Software Release Time

Authors: Sujit Kumar Pradhan, Anil Kumar, Vijay Kumar

Abstract:

The testing process of the software during the software development time is a crucial step as it makes the software more efficient and dependable. To estimate software’s reliability through the mean value function, many software reliability growth models (SRGMs) were developed under the assumption that operating and testing environments are the same. Practically, it is not true because when the software works in a natural field environment, the reliability of the software differs. This article discussed an SRGM comprising change-point and imperfect debugging in a simulated testing environment. Later on, we extended it in a multi-release direction. Initially, the software was released to the market with few features. According to the market’s demand, the software company upgraded the current version by adding new features as time passed. Therefore, we have proposed a generalized multi-release SRGM where change-point and imperfect debugging concepts have been addressed in a simulated testing environment. The failure-increasing rate concept has been adopted to determine the change point for each software release. Based on nine goodness-of-fit criteria, the proposed model is validated on two real datasets. The results demonstrate that the proposed model fits the datasets better. We have also discussed the optimal release time of the software through a cost model by assuming that the testing and debugging costs are time-dependent.

Keywords: software reliability growth models, non-homogeneous Poisson process, multi-release software, mean value function, change-point, environmental factors

Procedia PDF Downloads 74
18498 The Relationship between Lithological and Geomechanical Properties of Carbonate Rocks. Case study: Arab-D Reservoir Outcrop Carbonate, Central Saudi Arabia

Authors: Ammar Juma Abdlmutalib, Osman Abdullatif

Abstract:

The Upper Jurassic Arab-D Reservoir is considered the largest oil reservoir in Saudi Arabia. The equivalent outcrop is exposed near Riyadh. The study investigates the relationships between changes in lithofacies properties and the geomechanical properties of the Arab-D Reservoir at outcrop scale. The methods used included integrated field observations and laboratory measurements. Schmidt Hammer Rebound Hardness and Point Load Index tests were carried out to estimate the strength of the samples, and an ultrasonic wave velocity test was applied to measure P-wave velocity, S-wave velocity, and dynamic Poisson's ratio. Thin sections have been analyzed and described. The results show that geomechanical properties vary between the Arab-D member and the Upper Jubaila Formation at outcrop scale, while changes in texture or grain size have little or no effect on these properties. This is because of the clear effect of diagenesis, which changes the strength of the samples. The results also show an inverse correlation between porosity and geomechanical properties. As for strength, dolomitic mudstone and wackestone within the Upper Jubaila Formation have higher Schmidt hammer values, and wavy rippled sandy grainstone, which is rich in quartz, has higher point load index values, while laminated mudstone and breccia facies have lower strength. This emphasizes the role of mineral content in the geomechanical properties of Arab-D reservoir lithofacies.
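The dynamic Poisson's ratio mentioned above is conventionally derived from the measured P-wave and S-wave velocities using the standard isotropic elastic relation ν = (Vp² - 2Vs²) / (2(Vp² - Vs²)). The sketch below applies that relation; the abstract does not state which formula the authors used, and the function name and sample velocities are hypothetical.

```python
def dynamic_poisson_ratio(vp, vs):
    """Dynamic Poisson's ratio from P-wave (vp) and S-wave (vs) velocities.

    Both velocities must use the same units (for example m/s).
    Assumes an isotropic, linearly elastic medium.
    """
    if vs <= 0 or vp <= vs:
        raise ValueError("expected vp > vs > 0")
    return (vp ** 2 - 2.0 * vs ** 2) / (2.0 * (vp ** 2 - vs ** 2))


if __name__ == "__main__":
    # Illustrative velocities only; they are not measurements from the study.
    print(dynamic_poisson_ratio(5000.0, 2800.0))
```

A velocity ratio Vp/Vs of √3 gives ν = 0.25, which is a convenient sanity check for the formula.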

Keywords: geomechanical properties, Arab-D reservoir, lithofacies changes, Poisson's ratio, diagenesis

Procedia PDF Downloads 397
18497 Analytical Authentication of Butter Using Fourier Transform Infrared Spectroscopy Coupled with Chemometrics

Authors: M. Bodner, M. Scampicchio

Abstract:

Fourier Transform Infrared (FT-IR) spectroscopy coupled with chemometrics was used to distinguish between butter samples and non-butter samples. Further, quantification of the content of margarine in adulterated butter samples was investigated. The fingerprint region (1400-800 cm⁻¹) was used to develop unsupervised pattern recognition (Principal Component Analysis, PCA), supervised modeling (Soft Independent Modelling by Class Analogy, SIMCA), classification (Partial Least Squares Discriminant Analysis, PLS-DA) and regression (Partial Least Squares Regression, PLS-R) models. PCA of the fingerprint region shows a clustering of the two sample types. All samples were assigned to their correct class by the SIMCA approach; however, nine adulterated samples (between 1% and 30% w/w of margarine) were classified as belonging to both the butter class and the non-butter class. For the two-class PLS-DA model (R² = 0.73; Root Mean Square Error of Prediction, RMSEP = 0.26% w/w), sensitivity was 71.4% and Positive Predictive Value (PPV) was 100%. Its threshold was calculated at 7% w/w of margarine in adulterated butter samples. Finally, a PLS-R model (R² = 0.84, RMSEP = 16.54%) was developed. PLS-DA was a suitable classification tool and PLS-R a proper quantification approach. Results demonstrate that FT-IR spectroscopy combined with PLS-R can be used as a rapid, simple and safe method to distinguish pure butter samples from adulterated ones and to determine the degree of margarine adulteration in butter samples.
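For readers less familiar with the two classification metrics quoted above, the sketch below shows how sensitivity and PPV are computed from confusion-matrix counts. The counts in the demo are hypothetical: they are chosen only because they happen to reproduce percentages of the same size, and the abstract does not report the underlying counts.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives that the classifier detects."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of predicted positives that are actually positive."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    tp, fn, fp = 5, 2, 0
    print(round(100 * sensitivity(tp, fn), 1))
    print(round(100 * positive_predictive_value(tp, fp), 1))
```

A PPV of 100% alongside a lower sensitivity means the model raises no false alarms but misses some positive samples.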

Keywords: adulterated butter, margarine, PCA, PLS-DA, PLS-R, SIMCA

Procedia PDF Downloads 143
18496 Bayesian Variable Selection in Quantile Regression with Application to the Health and Retirement Study

Authors: Priya Kedia, Kiranmoy Das

Abstract:

There is a rich literature on variable selection in the regression setting. However, most of these methods assume normality of the response variable when implementing the methodology and establishing the statistical properties of the estimates. In many real applications, the distribution of the response variable may be non-Gaussian, and one might be interested in finding the best subset of covariates at some predetermined quantile level. We develop a dynamic Bayesian approach for variable selection in the quantile regression framework. We use a zero-inflated mixture prior for the regression coefficients and consider the asymmetric Laplace distribution for the response variable to model different quantiles of its distribution. An efficient Gibbs sampler is developed for our computation. Our proposed approach is assessed through extensive simulation studies, and a real application of the proposed approach is also illustrated. We consider data from the Health and Retirement Study conducted by the University of Michigan, and select the important predictors when the outcome of interest is out-of-pocket medical cost, which is considered an important measure of financial risk. Our analysis finds important predictors at different quantiles of the outcome, and thus enhances our understanding of the effects of different predictors on out-of-pocket medical cost.
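The link between the asymmetric Laplace distribution and quantiles can be illustrated with the check (pinball) loss: maximizing an asymmetric Laplace likelihood in its location parameter is equivalent to minimizing this loss. The sketch below is a toy, intercept-only illustration with hypothetical function names and made-up data; it does not implement the Bayesian variable selection or the Gibbs sampler described in the abstract.

```python
def check_loss(u, tau):
    """Check (pinball) loss for residual u at quantile level tau in (0, 1)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(y, tau):
    """Sample point that minimizes the total check loss.

    For an intercept-only model, a minimizer can always be found among
    the observed values, so a search over the sample is sufficient.
    """
    return min(y, key=lambda q: sum(check_loss(v - q, tau) for v in y))


if __name__ == "__main__":
    # Made-up, skewed data to show that quantiles respond differently to outliers.
    data = [1.0, 2.0, 3.0, 4.0, 100.0]
    print(quantile_by_check_loss(data, 0.5))
    print(quantile_by_check_loss(data, 0.9))
```

With covariates, the same loss is minimized over regression coefficients instead of a single constant.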

Keywords: variable selection, quantile regression, Gibbs sampler, asymmetric Laplace distribution

Procedia PDF Downloads 156
18495 Effect of pH-Dependent Surface Charge on the Electroosmotic Flow through Nanochannel

Authors: Partha P. Gopmandal, Somnath Bhattacharyya, Naren Bag

Abstract:

In this article, we have studied the effect of pH-regulated surface charge on the electroosmotic flow (EOF) through a nanochannel filled with a binary symmetric electrolyte solution. The channel wall possesses either an acidic or a basic functional group. Going beyond the widely employed Debye-Hückel linearization, we develop a mathematical model based on the Nernst-Planck equation for the charged species, the Poisson equation for the induced potential, and the Stokes equation for fluid flow. A finite volume based numerical algorithm is adopted to study the effect of key parameters on the EOF. We have solved the coupled governing equations through the finite volume method, and our results are found to be in good agreement with the analytical solution obtained from the corresponding linear model based on the low surface charge condition or strong electrolyte solution. The influence of the surface charge density, reaction constant of the functional groups, bulk pH, and concentration of the electrolyte solution on the overall flow rate is studied extensively. We find that the effect of surface charge diminishes with the increase in electrolyte concentration. In addition, for a strong electrolyte, the surface charge becomes independent of pH due to complete dissociation of the functional groups.

Keywords: electroosmosis, finite volume method, functional group, surface charge

Procedia PDF Downloads 419
18494 Partial Least Square Regression for High-Dimentional and High-Correlated Data

Authors: Mohammed Abdullah Alshahrani

Abstract:

The research focuses on investigating the use of partial least squares (PLS) methodology for addressing challenges associated with high-dimensional correlated data. Recent technological advancements have led to experiments producing data characterized by a large number of variables compared to observations, with substantial inter-variable correlations. Such data patterns are common in chemometrics, where near-infrared (NIR) spectrometer calibrations record chemical absorbance levels across hundreds of wavelengths, and in genomics, where thousands of genomic regions' copy number alterations (CNA) are recorded from cancer patients. PLS serves as a widely used method for analyzing high-dimensional data, functioning as a regression tool in chemometrics and a classification method in genomics. It handles data complexity by creating latent variables (components) from original variables. However, applying PLS can present challenges. The study investigates key areas to address these challenges, including unifying interpretations across three main PLS algorithms and exploring unusual negative shrinkage factors encountered during model fitting. The research presents an alternative approach to addressing the interpretation challenge of predictor weights associated with PLS. Sparse estimation of predictor weights is employed using a penalty function combining a lasso penalty for sparsity and a Cauchy distribution-based penalty to account for variable dependencies. The results demonstrate sparse and grouped weight estimates, aiding interpretation and prediction tasks in genomic data analysis. High-dimensional data scenarios, where predictors outnumber observations, are common in regression analysis applications. Ordinary least squares regression (OLS), the standard method, performs inadequately with high-dimensional and highly correlated data. 
Copy number alterations (CNA) in key genes have been linked to disease phenotypes, highlighting the importance of accurate classification of gene expression data in bioinformatics and biology using regularized methods like PLS for regression and classification.

Keywords: partial least square regression, genetics data, negative filter factors, high dimensional data, high correlated data

Procedia PDF Downloads 49
18493 Currency Exchange Rate Forecasts Using Quantile Regression

Authors: Yuzhi Cai

Abstract:

In this paper, we discuss a Bayesian approach to quantile autoregressive (QAR) time series model estimation and forecasting. Together with a combining forecasts technique, we then predict USD to GBP currency exchange rates. Combined forecasts contain all the information captured by the fitted QAR models at different quantile levels and are therefore better than those obtained from individual models. Our results show that an unequally weighted combining method performs better than other forecasting methods. We found that a median AR model can perform well in point forecasting when the predictive density functions are symmetric. However, in practice, using the median AR model alone may involve the loss of information about the data captured by other QAR models. We recommend that combined forecasts should be used whenever possible.

Keywords: combining forecasts, MCMC, predictive density functions, quantile forecasting, quantile modelling

Procedia PDF Downloads 256
18492 Exploring Factors Affecting Electricity Production in Malaysia

Authors: Endang Jati Mat Sahid, Hussain Ali Bekhet

Abstract:

Ability to supply reliable and secure electricity has been one of the crucial components of economic development for any country. Forecasting of electricity production is therefore very important for accurate investment planning of generation power plants. In this study, we aim to examine and analyze the factors that affect electricity generation. Multiple regression models were used to find the relationship between various variables and electricity production. The models will simultaneously determine the effects of the variables on electricity generation. Many variables influencing electricity generation, i.e. natural gas (NG), coal (CO), fuel oil (FO), renewable energy (RE), gross domestic product (GDP) and fuel prices (FP), were examined for Malaysia. The results demonstrate that NG, CO, and FO were the main factors influencing electricity generation growth. This study then identified a number of policy implications resulting from the empirical results.

Keywords: energy policy, energy security, electricity production, Malaysia, the regression model

Procedia PDF Downloads 163
18491 Machine Learning Approach for Stress Detection Using Wireless Physical Activity Tracker

Authors: B. Padmaja, V. V. Rama Prasad, K. V. N. Sunitha, E. Krishna Rao Patro

Abstract:

Stress is a psychological condition that reduces the quality of sleep and affects every facet of life. Constant exposure to stress is detrimental not only to the mind but also to the body. To cope with stress, one should first identify it. This paper provides an effective method for cognitive stress level detection using data from a Fitbit physical activity tracker. This device records people’s daily food intake, weight, sleep, heart rate, and physical activities. In this paper, four major stressors, namely physical activities, sleep patterns, working hours and change in heart rate, are used to assess the stress levels of individuals. The main aim of this system is to use a machine learning approach to stress detection with the help of smartphone sensor technology. The effect of each stressor is first evaluated individually using logistic regression, and then a combined model is built and assessed using variants of ordinal logistic regression with logit, probit and complementary log-log links. The quality of each model is then evaluated using the Akaike Information Criterion (AIC), and probit is assessed as the most suitable model for our dataset. The system is tested and evaluated in a real-time environment using data from adults working in IT and other sectors in India. The novelty of this work lies in making the stress detection system as non-invasive as possible for users.
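The model comparison step described above relies on the Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood; the candidate with the lowest AIC is preferred. The sketch below shows only that selection rule. The model names, log-likelihoods and parameter counts are placeholders, not values from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(fits):
    """Return the name of the candidate with the lowest AIC.

    fits maps a model name to a (maximized log-likelihood, parameter count) pair.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


if __name__ == "__main__":
    # Placeholder fit summaries; not results reported in the abstract.
    fits = {
        "model_a": (-120.0, 6),
        "model_b": (-118.5, 6),
        "model_c": (-119.0, 8),
    }
    for name, (loglik, k) in fits.items():
        print(name, aic(loglik, k))
    print("preferred:", best_by_aic(fits))
```

Because the three link functions use the same predictors, their parameter counts are usually equal, so the comparison then reduces to comparing log-likelihoods.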

Keywords: physical activity tracker, sleep pattern, working hours, heart rate, smartphone sensor

Procedia PDF Downloads 256
18490 Stock Prediction and Portfolio Optimization Thesis

Authors: Deniz Peksen

Abstract:

This thesis aims to predict the trend movement of stock closing prices and to maximize portfolio returns by utilizing the predictions. In this context, the study aims to define a stock portfolio strategy from models created using Logistic Regression, Gradient Boosting and Random Forest. Recently, predicting the trend of stock prices has gained a significant role in making buy and sell decisions and in generating returns with investment strategies based on machine learning decisions. There are plenty of studies in the literature on the prediction of stock prices in capital markets using machine learning methods, but most of them focus on closing prices instead of the direction of the price trend. Our study differs from the literature in terms of target definition. Ours is a classification problem that focuses on the market trend in the next 20 trading days. To predict trend direction, fourteen years of data were used for training. The following three years were used for validation. Finally, the last three years were used for testing. Training data are between 2002-06-18 and 2016-12-30. Validation data are between 2017-01-02 and 2019-12-31. Testing data are between 2020-01-02 and 2022-03-17. We use the Hold Stock Portfolio, the Best Stock Portfolio and the USD-TRY exchange rate as benchmarks that we should outperform. We compared the return of our machine-learning-based portfolio on test data with the returns of the Hold Stock Portfolio, the Best Stock Portfolio and the USD-TRY exchange rate. We assessed our model performance with the help of the ROC-AUC score and lift charts. We use Logistic Regression, Gradient Boosting and Random Forest with a grid search approach to fine-tune hyper-parameters. As a result of the empirical study, the existence of uptrends and downtrends of five stocks could not be predicted by the models. When we use these predictions to define buy and sell decisions in order to generate a model-based portfolio, the model-based portfolio fails on the test dataset.
It was found that model-based buy and sell decisions generated a stock portfolio strategy whose returns cannot outperform non-model portfolio strategies on the test dataset. We found that any effort to predict a trend formulated on stock price is a challenge. Our results agree with the Random Walk Theory, which claims that stock prices or price changes are unpredictable. Although we built several good models on the validation dataset, our model iterations failed on the test dataset. We implemented Random Forest, Gradient Boosting and Logistic Regression. We discovered that complex models did not provide an advantage or additional performance compared with Logistic Regression. More complexity did not lead to better performance. Using a complex model is not an answer to the stock-related prediction problem. Our approach was to predict the trend instead of the price. This approach converted our problem into classification. However, this labeling approach does not solve the stock prediction problem, nor does it refute the Random Walk Theory for stock prices.

Keywords: stock prediction, portfolio optimization, data science, machine learning

Procedia PDF Downloads 80
18489 Impact Factor Analysis for Spatially Varying Aerosol Optical Depth in Wuhan Agglomeration

Authors: Wenting Zhang, Shishi Liu, Peihong Fu

Abstract:

Aerosol Optical Depth (AOD) is an indicator of air quality and is directly related to the concentration of ground-level PM2.5, so its spatio-temporal variation and impact factor analysis have become a focus of air pollution research. This paper addresses the non-stationarity and the spatial autocorrelation (Moran’s I index of 0.75) of AOD in the Wuhan agglomeration (WHA), in central China, and uses geographically weighted regression (GWR) to identify the spatial relationship between AOD and its impact factors. The 3 km AOD product of the Moderate Resolution Imaging Spectroradiometer (MODIS) is used in this study. Beyond socio-economic factors, land use density factors, vegetation cover, and elevation, the landscape metric is also considered as a factor. The results suggest that the GWR model is capable of dealing with spatially varying relationships, with R-squared, corrected Akaike Information Criterion (AICc) and standardized residuals better than those of the ordinary least squares (OLS) model. The GWR results suggest that urban development, forest, landscape metric, and elevation are the major driving factors of AOD. Generally, higher AOD tends to be located in places with more urban development, less forest, and flat terrain.
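The Moran’s I statistic quoted above measures global spatial autocorrelation: I = (n / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ - x̄)(xⱼ - x̄) / Σᵢ (xᵢ - x̄)², where wᵢⱼ are spatial weights and W is their sum. The sketch below computes it for a toy one-dimensional chain of locations. The weight scheme and values are hypothetical and unrelated to the AOD data; the abstract does not say how the authors defined their weights.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    spread = sum(d * d for d in dev)
    return (n / w_total) * (cross / spread)


def chain_weights(n):
    """Binary weights for n locations in a row: 1 for adjacent neighbours, else 0."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    w = chain_weights(4)
    print(morans_i([1.0, 2.0, 3.0, 4.0], w))    # smooth gradient, positive I
    print(morans_i([1.0, -1.0, 1.0, -1.0], w))  # alternating pattern, negative I
```

Values near +1 indicate that similar values cluster in space, which is the situation that motivates a locally varying model such as GWR over a single global OLS fit.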

Keywords: aerosol optical depth, geographically weighted regression, land use change, Wuhan agglomeration

Procedia PDF Downloads 357
18488 Development of a Data-Driven Method for Diagnosing the State of Health of Battery Cells, Based on the Use of an Electrochemical Aging Model, with a View to Their Use in Second Life

Authors: Desplanches Maxime

Abstract:

Accurate estimation of the remaining useful life of lithium-ion batteries for electronic devices is crucial. Data-driven methodologies encounter challenges related to data volume and acquisition protocols, particularly in capturing a comprehensive range of aging indicators. To address these limitations, we propose a hybrid approach that integrates an electrochemical model with state-of-the-art data analysis techniques, yielding a comprehensive database. Our methodology involves infusing an aging phenomenon into a Newman model, leading to the creation of an extensive database capturing various aging states based on non-destructive parameters. This database serves as a robust foundation for subsequent analysis. Leveraging advanced data analysis techniques, notably principal component analysis and t-Distributed Stochastic Neighbor Embedding, we extract pivotal information from the data. This information is harnessed to construct a regression function using either random forest or support vector machine algorithms. The resulting predictor demonstrates a 5% error margin in estimating remaining battery life, providing actionable insights for optimizing usage. Furthermore, the database was built from the Newman model calibrated for aging and performance using data from a European project called Teesmat. The model was then initialized numerous times with different aging values, for instance, with varying thicknesses of SEI (Solid Electrolyte Interphase). This comprehensive approach ensures a thorough exploration of battery aging dynamics, enhancing the accuracy and reliability of our predictive model. Of particular importance is our reliance on the database generated through the integration of the electrochemical model. This database serves as a crucial asset in advancing our understanding of aging states. 
Beyond its capability for precise remaining life predictions, this database-driven approach offers valuable insights for optimizing battery usage and adapting the predictor to various scenarios. This underscores the practical significance of our method in facilitating better decision-making regarding lithium-ion battery management.

Keywords: Li-ion battery, aging, diagnostics, data analysis, prediction, machine learning, electrochemical model, regression

Procedia PDF Downloads 69
18487 An Application of the Single Equation Regression Model

Authors: S. K. Ashiquer Rahman

Abstract:

Recently, oil has become more influential in almost every economic sector as a key material. As can be seen from the news, when the oil price changes or OPEC announces a new strategy, the effect spreads to every part of the economy directly and indirectly. That is why people always observe the oil price and try to forecast its changes. The most important factor affecting the price is supply, which is determined by the number of wildcats drilled. Therefore, a study of the number of wellheads and other economic variables may give us some understanding of the mechanism behind the amount of oil supplied. In this paper, we consider the relationship between the number of wellheads and three key factors: the price of the wellhead, domestic output, and GNP in constant dollars. We also add trend variables to the models because the consumption of oil varies over time. Moreover, this paper uses an econometric method to estimate the parameters of the model, applies tests to verify the results, and then draws conclusions about the model.

Keywords: price, domestic output, GNP, trend variable, wildcat activity

Procedia PDF Downloads 62
18486 Use of Protection Motivation Theory to Assess Preventive Behaviors of COVID-19

Authors: Maryam Khazaee-Pool, Tahereh Pashaei, Koen Ponnet

Abstract:

Background: The global prevalence and morbidity of Coronavirus disease 2019 (COVID-19) are high. Preventive behaviors are proven to reduce the damage caused by the disease. There is a paucity of information on determinants of preventive behaviors in response to COVID-19 in Mazandaran province, north of Iran. So, we aimed to evaluate the protection motivation theory (PMT) in promoting preventive behaviors of COVID-19 in Mazandaran province. Materials and Methods: In this descriptive cross-sectional study, 1220 individuals participated. They were selected via social networks using convenience sampling in 2020. Data were collected online using a demographic questionnaire and a valid and reliable scale based on PMT. Data analysis was done using the Pearson correlation coefficient and linear regression in SPSS V24. Results: The mean age of the participants was 39.34±8.74 years. The regression model showed perceived threat (β = 0.033, P = 0.007), perceived costs (β = 0.039, P = 0.045), perceived self-efficacy (β = 0.116, P < 0.001), and perceived fear (β = 0.131, P < 0.001) as the significant predictors of COVID-19 preventive behaviors. This model accounted for 78% of the variance in these behaviors. Conclusion: According to the constructs of the PMT associated with protection against COVID-19, educational programs and health promotion based on the theory and benefiting from social networks could be helpful in increasing the motivation of people towards protective behaviors against COVID-19.

Keywords: questionnaire development, validation, intention, prevention, covid-19

Procedia PDF Downloads 42