Search results for: robust regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4375

4135 Regression Analysis in Estimating Stream-Flow and the Effect of Hierarchical Clustering Analysis: A Case Study in Euphrates-Tigris Basin

Authors: Goksel Ezgi Guzey, Bihrat Onoz

Abstract:

The scarcity of streamflow gauging stations and the increasing effects of global warming make designing water management systems very difficult. This study contributes to the assessment of regional regression models for estimating streamflow. In this study, simulated meteorological data were related to the observed streamflow data from 1971 to 2020 for 33 stream gauging stations in the Euphrates-Tigris Basin. Ordinary least squares regression was used to predict flow for 2020-2100 from the simulated meteorological data. The CORDEX-EURO and CORDEX-MENA domains were used, with 0.11° and 0.22° grids respectively, to estimate climate conditions under certain climate scenarios. Twelve meteorological variables simulated by two regional climate models, RCA4 and RegCM4, were used as independent variables in the ordinary least squares regression, with the observed streamflow as the dependent variable. Streamflow variability was then explained with 5-6 meteorological variables plus watershed characteristics such as area and elevation prior to the application. After the regression analysis of the data from 31 stream gauging stations, the stations were subjected to a clustering analysis, which grouped them into two clusters according to their hydrometeorological properties. Two streamflow equations, one per cluster, were derived for every domain and every regional climate model, which increased the efficiency of streamflow estimation by 10-15% for all the models. This study underlines the importance of the homogeneity of a region in estimating streamflow, not only in terms of geographical location but also in terms of the region's meteorological characteristics.

Keywords: hydrology, streamflow estimation, climate change, hydrologic modeling, HBV, hydropower
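
The cluster-then-regress idea can be illustrated with a minimal, single-predictor sketch: fit one ordinary least squares line to all stations pooled, then one line per cluster, and compare the residual sums of squares. The study itself regresses on 5-6 meteorological variables plus watershed characteristics; the station records, cluster labels, and function names below are hypothetical.

```python
def fit_ols(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sse(xs, ys, coef):
    """Residual sum of squares of a fitted line."""
    b0, b1 = coef
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def pooled_vs_clustered(records):
    """records: list of (cluster_label, predictor, flow) tuples (hypothetical)."""
    xs = [x for _, x, _ in records]
    ys = [y for _, _, y in records]
    pooled = sse(xs, ys, fit_ols(xs, ys))

    clustered = 0.0
    for label in {c for c, _, _ in records}:
        cx = [x for c, x, _ in records if c == label]
        cy = [y for c, _, y in records if c == label]
        clustered += sse(cx, cy, fit_ols(cx, cy))
    return pooled, clustered


# Hypothetical stations: cluster "A" responds strongly to precipitation,
# cluster "B" weakly, so a single pooled equation fits both poorly.
records = [("A", p, 2.0 * p) for p in (10, 20, 30, 40)] + \
          [("B", p, 0.5 * p + 30.0) for p in (10, 20, 30, 40)]
pooled, clustered = pooled_vs_clustered(records)
print(f"pooled SSE = {pooled:.2f}, per-cluster SSE = {clustered:.2f}")
# prints: pooled SSE = 675.00, per-cluster SSE = 0.00
```

With real data, the gain from per-cluster equations would be judged on held-out years rather than on in-sample residuals.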

4134 DWT-SATS Based Detection of Image Region Cloning

Authors: Michael Zimba

Abstract:

A duplicated image region may be subjected to attacks such as noise addition, compression, reflection, rotation, and scaling, with the intention of either merely fitting it to its target neighborhood or preventing its detection. In this paper, we present an effective and robust method of detecting duplicated regions, including those affected by such attacks. To reduce the dimension of the image, the proposed algorithm first applies the discrete wavelet transform (DWT) to the suspicious image. Unlike most existing copy-move image forgery (CMIF) detection algorithms operating in the DWT domain, which extract only the low-frequency sub-band of the suspicious image's DWT and thereby ignore the valuable information in the other three sub-bands, the proposed algorithm extracts features from all four sub-bands simultaneously. The extracted features are not only a more accurate representation of image regions but also robust to additive noise, JPEG compression, and affine transformation. Principal component analysis with eigenvalue decomposition (PCA-EVD) is then applied to reduce the dimension of the features. The extracted features are sorted using the computationally efficient radix sort algorithm. Finally, same affine transformation selection (SATS), a duplication verification method, is applied to detect duplicated regions. The proposed algorithm is not only fast but also more robust to attacks than related CMIF detection algorithms. The experimental results show high detection rates.

Keywords: affine transformation, discrete wavelet transform, radix sort, SATS
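
The sorting step can be pictured as a least-significant-digit radix sort over quantized feature vectors: each feature position acts as one digit, and one stable bucket pass per position leaves the rows in lexicographic order, so near-identical blocks end up adjacent and only neighbours need to be compared. The quantization to small non-negative integers, the `base` parameter, and the neighbour test are assumptions for illustration, not the paper's exact procedure.

```python
def radix_sort_rows(rows, base=256):
    """LSD radix sort of equal-length tuples of ints in [0, base)."""
    if not rows:
        return []
    width = len(rows[0])
    ordered = list(rows)
    # Process the least significant position first; each bucket pass is
    # stable, so orderings from earlier passes survive within ties.
    for pos in reversed(range(width)):
        buckets = [[] for _ in range(base)]
        for row in ordered:
            buckets[row[pos]].append(row)
        ordered = [row for bucket in buckets for row in bucket]
    return ordered


def candidate_pairs(rows):
    """Adjacent identical feature rows after sorting (hypothetical duplicate test)."""
    ordered = radix_sort_rows(rows)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if a == b]


# Hypothetical quantized (e.g. PCA-reduced) block features.
features = [(3, 1, 7), (0, 2, 2), (3, 1, 7), (1, 9, 0)]
print(radix_sort_rows(features))  # [(0, 2, 2), (1, 9, 0), (3, 1, 7), (3, 1, 7)]
print(candidate_pairs(features))  # [((3, 1, 7), (3, 1, 7))]
```

In an actual copy-move detector each row would also carry its block coordinates, and matched pairs would then go to the verification step (SATS in the paper).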

4133 Robust Heart Rate Estimation from Multiple Cardiovascular and Non-Cardiovascular Physiological Signals Using Signal Quality Indices and Kalman Filter

Authors: Shalini Rankawat, Mansi Rankawat, Rahul Dubey, Mazad Zaveri

Abstract:

Physiological signals such as the electrocardiogram (ECG) and arterial blood pressure (ABP) in the intensive care unit (ICU) are often seriously corrupted by noise, artifacts, and missing data, which lead to errors in heart rate (HR) estimation and to false alarms from ICU monitors. Clinical support in the ICU requires the most reliable heart rate estimation possible. Cardiac activity, because of its relatively high electrical energy, may introduce artifacts into electroencephalogram (EEG), electrooculogram (EOG), and electromyogram (EMG) recordings. This paper presents a robust heart rate estimation method that detects the R-peaks of ECG artifacts in EEG, EMG, and EOG signals using an energy-based function and a novel signal quality index (SQI) assessment technique. The SQIs of the EEG, EMG, and EOG signals were obtained by correlating the nonlinear energy operator (Teager energy) of each signal with either the ECG or the ABP signal. HR is estimated from each of the ECG, ABP, EEG, EMG, and EOG signals by a separate Kalman filter driven by that signal's SQI. The individual HR estimates were then fused by weighting each estimate by its Kalman filter's SQI-modified innovation. The fused HR estimate is more accurate and robust than any of the individual HR estimates. The method was evaluated on the MIMIC II database of PhysioNet, recorded from bedside monitors of ICU patients, and provides an accurate HR estimate even in the presence of noise and artifacts.

Keywords: ECG, ABP, EEG, EMG, EOG, ECG artifacts, Teager-Kaiser energy, heart rate, signal quality index, Kalman filter, data fusion
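
A minimal sketch of per-signal tracking followed by fusion, assuming a scalar random-walk Kalman filter whose measurement-noise variance is inflated as the SQI falls (here `r0 * exp(1/sqi**2 - 1)`, one common choice) and fusion weights of `sqi**2 / innovation**2`. Both formulas, the parameters `q`, `r0`, and `eps`, and the HR values are assumptions; the paper's exact SQI-modified innovation may differ.

```python
import math


def kalman_hr(measurements, sqis, q=1.0, r0=1.0):
    """Track HR from one signal; return (final estimate, last innovation).

    Assumed model: HR follows a random walk with process variance q, and the
    measurement variance is inflated as r0 * exp(1/sqi**2 - 1) when SQI drops.
    """
    x, p = measurements[0], r0            # initialise from the first sample
    innovation = 0.0
    for z, sqi in zip(measurements, sqis):
        p += q                             # predict step (random walk)
        sqi = min(max(sqi, 0.05), 1.0)     # clamp SQI to [0.05, 1] to avoid overflow
        r = r0 * math.exp(1.0 / sqi ** 2 - 1.0)  # low SQI -> distrust measurement
        innovation = z - x
        gain = p / (p + r)
        x += gain * innovation             # update step
        p *= 1.0 - gain
    return x, innovation


def fuse(channels, eps=1e-6):
    """Fuse per-channel estimates, weighting by sqi**2 / innovation**2 (assumed)."""
    estimates, weights = [], []
    for measurements, sqis in channels:
        x, innovation = kalman_hr(measurements, sqis)
        estimates.append(x)
        weights.append(min(max(sqis[-1], 0.05), 1.0) ** 2 / (innovation ** 2 + eps))
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)


# Hypothetical HR series (bpm) and SQIs for two channels.
ecg = ([72.0] * 10, [0.95] * 10)
abp = ([140.0, 72.0, 30.0, 150.0, 72.0] * 2, [0.2] * 10)
print(round(fuse([ecg, abp]), 1))  # 72.0: the clean ECG channel dominates
```

A low SQI both slows a channel's own filter and, through its large innovation, shrinks that channel's share of the fused estimate.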

4132 Further Analysis of Global Robust Stability of Neural Networks with Multiple Time Delays

Authors: Sabri Arik

Abstract:

In this paper, we study the global asymptotic robust stability of delayed neural networks with norm-bounded uncertainties. By employing Lyapunov stability theory and the homeomorphic mapping theorem, we derive some new sufficient conditions ensuring the existence, uniqueness, and global asymptotic stability of the equilibrium point for the class of neural networks with discrete time delays under parameter uncertainties and with respect to continuous and slope-bounded activation functions. An important aspect of our results is their low computational complexity, as the reported results can be verified by checking some properties of symmetric matrices associated with the uncertainty sets of network parameters. The obtained results are shown to be generalizations of some previously published results. Comparative numerical examples are also constructed to compare our results with closely related results in the existing literature.

Keywords: neural networks, delayed systems, Lyapunov functionals, stability analysis

4131 The Impact of Governance on Happiness: Evidence from Quantile Regressions

Authors: Chiung-Ju Huang

Abstract:

This study uses quantile regression analysis to examine the impact of governance (including democratic quality and technical quality) on happiness in 101 countries worldwide, classified as “developed countries” and “developing countries”. The empirical results show that the impact of democratic quality and technical quality on happiness is significantly positive for developed countries but insignificant for developing countries. The results suggest that the authorities in developed countries can enhance individual happiness by improving democratic quality and technical quality. For developing countries, however, promoting the quality of governance in order to enhance happiness may not be effective. Policy makers in developing countries may therefore pay more attention to increasing real GDP per capita than to promoting the quality of governance to enhance individual happiness.

Keywords: governance, happiness, multiple regression, quantile regression
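
Quantile regression replaces squared error with the asymmetric pinball (check) loss. For an intercept-only model the minimizer of that loss is the sample τ-quantile, which the short sketch below confirms numerically; the scores are hypothetical, and a full quantile regression would minimize the same loss over the regression coefficients (for example by linear programming).

```python
def pinball_loss(tau, ys, q):
    """Sum of check-function losses rho_tau(y - q)."""
    total = 0.0
    for y in ys:
        r = y - q
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total


def intercept_only_quantile(tau, ys):
    """Minimise the pinball loss over constant predictions.

    The loss is piecewise linear and convex with kinks at the data points,
    so a minimiser always lies at an observed value.
    """
    return min(ys, key=lambda q: pinball_loss(tau, ys, q))


happiness = [3.1, 4.0, 4.4, 5.2, 5.5, 5.9, 6.3, 6.8, 7.4]  # hypothetical scores
print(intercept_only_quantile(0.5, happiness))  # 5.5 (median)
print(intercept_only_quantile(0.9, happiness))  # 7.4 (upper tail)
```

Because the loss weights under- and over-prediction by τ and 1 - τ, fitting at several τ values describes how covariate effects differ between less happy and happier countries.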

4130 Breast Cancer Mortality and Comorbidities in Portugal: A Predictive Model Built with Real World Data

Authors: Cecília M. Antão, Paulo Jorge Nogueira

Abstract:

Breast cancer (BC) is the leading cause of cancer mortality among Portuguese women. This retrospective observational study aimed to identify comorbidities in female BC patients admitted to Portuguese public hospitals (2010-2018), to investigate the effect of comorbidities on the BC mortality rate, and to build a predictive model using logistic regression. The results showed that BC mortality in Portugal decreased over this period, reaching 4.37% in 2018. Adjusted odds ratios indicated that secondary malignant neoplasms of the liver and of bone and bone marrow, congestive heart failure, and diabetes were associated with increased odds of dying from breast cancer. Although the Lisbon district (the most populated area) accounted for the largest percentage of BC patients, the logistic regression model showed that, besides the patient’s age, residence in the Bragança, Castelo Branco, or Porto districts was directly associated with an increased mortality rate.

Keywords: breast cancer, comorbidities, logistic regression, adjusted odds ratio
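
An adjusted odds ratio is the exponentiated coefficient of a covariate in a multivariable logistic regression, and a Wald confidence interval follows by exponentiating β ± z·SE. The sketch shows only that conversion; the coefficient and standard error are hypothetical, not values from the study.

```python
import math


def adjusted_odds_ratio(beta, se, z=1.959964):
    """Convert a logistic-regression coefficient to an OR with a Wald 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for a comorbidity indicator (e.g. heart failure).
or_, lo, hi = adjusted_odds_ratio(beta=0.85, se=0.12)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An OR above 1 whose interval excludes 1 indicates higher odds of death for patients with that comorbidity, holding the other covariates in the model fixed.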

4129 Assessing Relationships between Glandularity and Gray Level by Using Breast Phantoms

Authors: Yun-Xuan Tang, Pei-Yuan Liu, Kun-Mu Lu, Min-Tsung Tseng, Liang-Kuang Chen, Yuh-Feng Tsai, Ching-Wen Lee, Jay Wu

Abstract:

Breast cancer is the most common malignant tumor in females, and increased glandular density raises the risk of breast cancer. BI-RADS is a frequently used density indicator in mammography; however, it significantly overestimates glandularity. It is therefore very important to assess glandularity accurately and quantitatively from mammograms. In this study, phantoms of 20%, 30%, and 50% glandularity were exposed on a mammography machine at 28, 30, and 31 kVp and at 30, 55, 80, and 105 mAs. Regions of interest (ROIs) were drawn to assess the gray level. The relationship between glandularity and gray level under various compression thicknesses, kVp, and mAs was established by multivariable linear regression. A phantom verification was performed with automatic exposure control (AEC). The regression equation had an R-square value of 0.928. The average gray levels of the verification phantom were 8708, 8660, and 8434 for 0.952, 0.963, and 0.985 g/cm³, respectively. The percent differences between the glandularity and the regression-equation estimates were 3.24%, 2.75%, and 13.7%. We conclude that the proposed method could be applied clinically in mammography to improve glandularity estimation and further increase the value of breast cancer screening.

Keywords: mammography, glandularity, gray value, BI-RADS

4128 Data-Driven Dynamic Overbooking Model for Tour Operators

Authors: Kannapha Amaruchkul

Abstract:

We formulate a dynamic overbooking model for a tour operator, in which most reservations contain at least two people. The cancellation rate and the timing of cancellation may depend on the group size. We propose two overbooking policies, namely economic-based and service-based. An economic-based policy minimizes the expected oversold and underused costs, whereas a service-based policy ensures that the probability of an oversold situation does not exceed a pre-specified threshold. To illustrate the applicability of our approach, we use 2016-2018 tour package data from a tour operator in Thailand to build a data-driven robust optimization model, and we test the proposed overbooking policy on 2019 data. We also compare the data-driven approach with the conventional approach of fitting a probability distribution to the data.

Keywords: applied stochastic model, data-driven robust optimization, overbooking, revenue management, tour operator
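
A static, single-period sketch of both policies, assuming that each reservation is a whole group that either shows up or cancels together, with a show-up probability that depends on group size. The headcount distribution is built by repeated convolution; the service-based limit is the largest number of bookings whose oversold probability stays within `alpha`, and the economic-based limit minimizes expected oversold plus underused cost. The dynamic (timing-of-cancellation) part of the paper is omitted, and the group mix, costs, and function names are hypothetical.

```python
def booking_pmf(group_mix):
    """Headcount pmf of one booking.

    group_mix maps group size -> (share of bookings, show-up probability);
    shares should sum to 1, and a group is assumed to show or cancel as a whole.
    """
    pmf = {}
    for size, (share, show) in group_mix.items():
        pmf[0] = pmf.get(0, 0.0) + share * (1.0 - show)
        pmf[size] = pmf.get(size, 0.0) + share * show
    return pmf


def headcount_pmfs(max_bookings, group_mix):
    """Yield (n, pmf of total show-ups) for n = 0..max_bookings independent bookings."""
    per_booking = booking_pmf(group_mix)
    dist = {0: 1.0}
    yield 0, dist
    for n in range(1, max_bookings + 1):
        nxt = {}
        for total, p in dist.items():          # convolve with one more booking
            for size, q in per_booking.items():
                nxt[total + size] = nxt.get(total + size, 0.0) + p * q
        dist = nxt
        yield n, dist


def service_limit(capacity, group_mix, alpha, max_bookings=100):
    """Service-based: largest n with P(show-ups > capacity) <= alpha."""
    best = 0
    for n, dist in headcount_pmfs(max_bookings, group_mix):
        if sum(p for h, p in dist.items() if h > capacity) > alpha:
            break                                # oversold risk only grows with n
        best = n
    return best


def economic_limit(capacity, group_mix, c_over, c_under, max_bookings=100):
    """Economic-based: n minimising expected oversold + underused cost."""
    best_n, best_cost = 0, float("inf")
    for n, dist in headcount_pmfs(max_bookings, group_mix):
        cost = sum(p * (c_over * max(h - capacity, 0) + c_under * max(capacity - h, 0))
                   for h, p in dist.items())
        if cost < best_cost:
            best_n, best_cost = n, cost
    return best_n


# Hypothetical mix: 70% couples (90% show up), 30% groups of four (80% show up).
mix = {2: (0.7, 0.9), 4: (0.3, 0.8)}
print(service_limit(40, mix, alpha=0.05), economic_limit(40, mix, c_over=3.0, c_under=1.0))
```

A data-driven variant would replace the assumed show-up probabilities with empirical frequencies from the historical bookings.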

4127 An Analysis of the Regression Hypothesis from a Shona Broca’s Aphasic Perspective

Authors: Esther Mafunda, Simbarashe Muparangi

Abstract:

The present paper tests the applicability of the Regression Hypothesis to the pathological language dissolution of a Shona-speaking male adult with Broca’s aphasia. In particular, it assesses the prediction of the Regression Hypothesis that language is forgotten in the reverse order of its acquisition. The main aim of the paper is to find out whether there are mirror symmetries between L1 acquisition and L1 dissolution of tense in Shona and, if so, what might cause these regression patterns. The paper also seeks to highlight the practical contributions that linguistic theory can make to solving language-related problems. Data were collected from a 46-year-old male adult with Broca’s aphasia who was receiving speech therapy at St Giles Rehabilitation Centre in Harare, Zimbabwe. The primary data elicitation method was experimental, using the probe technique. The Shona version of the TART (Test for Assessing Reference Time), in the form of picture sequencing, was used to assess tense in the adult with Broca’s aphasia and in a 3.5-year-old child. Using SPSS (Statistical Package for the Social Sciences) and Excel, it was established that the use of the future tense was impaired in the Shona speaker with Broca’s aphasia, whilst the present and past tenses were intact. However, although the past tense was intact in the male adult with Broca’s aphasia, he made reference to the remote past. The use of the future tense was also found to be difficult for the 3.5-year-old child. No difficulties were encountered in using the present and past tenses. Thus, mirror symmetries were found between L1 acquisition and L1 dissolution of tense in Shona. On the basis of these results, it can be concluded that the use of tense by a Shona-speaking adult with Broca’s aphasia supports the Regression Hypothesis. The findings of this study are important for speech therapy in the context of Zimbabwe.
The study also contributes to Bantu linguistics in general and to Shona linguistics in particular. Further studies could examine aphasia in speakers of other Bantu language varieties.

Keywords: Broca’s Aphasia, regression hypothesis, Shona, language dissolution

4126 Apricot Insurance Portfolio Risk

Authors: Kasirga Yildirak, Ismail Gur

Abstract:

We propose a model to measure the hail risk of an agricultural insurance portfolio. Hail is one of the major catastrophic events that cause large losses to insurers. Moreover, it is very hard to predict because of its unusual atmospheric characteristics. We use parcel-based claims data on apricot damage collected by the Turkish Agricultural Insurance Pool (TARSIM). As our ultimate aim is to compute the loadings assigned to specific parcels, we build a portfolio risk model that uses the PD and the severity of the exposures. The PD is computed with spherical-linear and circular-linear regression models, as the data carry coordinate information and seasonality. Severity is mapped into integer brackets so that the probability generating function can be employed. Individual regressions are run on each of the clusters estimated under different criteria. The loss distribution is constructed with the Panjer recursion technique. We also show that the one-risk, one-crop model can easily be extended to a multi-risk, multi-crop model by assuming conditional independence.

Keywords: hail insurance, spherical regression, circular regression, spherical clustering
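
The Panjer recursion computes an aggregate-loss distribution exactly when claim severities take integer values, which is why severities are mapped into integer brackets. A minimal sketch for a compound Poisson claim count is below; the abstract does not state the counting distribution, so the Poisson assumption, the rate `lam`, and the severity probabilities are illustrative only.

```python
import math


def panjer_poisson(lam, severity, max_loss):
    """Aggregate loss pmf g[0..max_loss] for a compound Poisson model.

    severity[k] = P(a single claim falls in integer bracket k), k = 0, 1, ...
    Poisson is the (a, b, 0) member with a = 0, b = lam, so the recursion is
    g[s] = lam / s * sum_{j=1..s} j * f[j] * g[s - j].
    """
    f = list(severity) + [0.0] * (max_loss + 1 - len(severity))
    g = [0.0] * (max_loss + 1)
    g[0] = math.exp(-lam * (1.0 - f[0]))  # P(aggregate loss = 0)
    for s in range(1, max_loss + 1):
        g[s] = lam / s * sum(j * f[j] * g[s - j] for j in range(1, s + 1))
    return g


# Hypothetical: 3 expected claims, severities in brackets 1..3.
g = panjer_poisson(3.0, [0.0, 0.5, 0.3, 0.2], max_loss=40)
print(sum(g[11:]))  # probability that the aggregate loss exceeds bracket 10
```

With a degenerate severity at bracket 1 the recursion reproduces the Poisson pmf itself, which is a convenient sanity check.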

4125 Enhancing the Interpretation of Group-Level Diagnostic Results from Cognitive Diagnostic Assessment: Application of Quantile Regression and Cluster Analysis

Authors: Wenbo Du, Xiaomei Ma

Abstract:

With the empowerment of cognitive diagnostic assessment (CDA), various domains of language testing and assessment have been investigated to extract more diagnostic information. Notably, most extant empirical CDA-based research emphasizes individual-level diagnosis, with very few studies concerned with learners’ group-level performance. Even though personalized diagnostic feedback is the unique feature that differentiates CDA from other assessment tools, group-level diagnostic information cannot be overlooked, as it may be more practical in classroom settings. Additionally, the group-level diagnostic information obtained via current CDA always results in a “flat pattern”; that is, mastery or non-mastery of all tested skills accounts for the two highest proportions. In that case, the outcome offers little benefit over the original total score. To address these issues, the present study applies cluster analysis for group classification and quantile regression analysis to pinpoint learners’ performance at different proficiency levels (beginner, intermediate, and advanced), thereby enhancing the interpretation of the CDA results from a group of EFL learners’ performance on a diagnostic reading test designed by the PELDiaG research team at a key university in China. The results show that the EM method in cluster analysis yields more appropriate classification results than CDA does, and that quantile regression analysis captures more insightful characteristics of learners with different reading proficiencies. The findings can help instructors refine EFL reading curricula and instructional plans tailored to the group classification and quantile regression results. Meanwhile, these innovative statistical methods could also make up for the deficiencies of CDA and push forward the development of language testing and assessment in the future.

Keywords: cognitive diagnostic assessment, diagnostic feedback, EFL reading, quantile regression

4124 Machine Learning Framework: Competitive Intelligence and Key Drivers Identification of Market Share Trends among Healthcare Facilities

Authors: Anudeep Appe, Bhanu Poluparthi, Lakshmi Kasivajjula, Udai Mv, Sobha Bagadi, Punya Modi, Aditya Singh, Hemanth Gunupudi, Spenser Troiano, Jeff Paul, Justin Stovall, Justin Yamamoto

Abstract:

The need for data-driven decisions in healthcare strategy formulation is rapidly increasing. A reliable framework that helps identify the factors affecting the market share of a healthcare provider facility or hospital (hereafter termed a facility) is of key importance. This pilot study aims to develop a data-driven machine learning regression framework that helps strategists make key decisions to improve a facility’s market share, which in turn helps improve the quality of healthcare services. The US (United States) healthcare business is chosen for the study, and about 3 years of historical data spanning 60 key facilities in Washington State are considered. In the current analysis, market share is defined as the ratio of the facility’s encounters to the total encounters among the group of potential competitor facilities. The study proposes a two-pronged approach, competitor identification and regression, to evaluate and predict market share, respectively. The model-agnostic technique SHAP is leveraged to quantify the relative importance of the features affecting market share. Typical techniques in the literature quantify the degree of competitiveness among facilities with an empirical method that calculates a competitive factor to interpret the severity of competition. The proposed method identifies a pool of competitors, develops directed acyclic graphs (DAGs) and feature-level word vectors, and evaluates the key connected components at the facility level. This technique is robust because it is data-driven, which minimizes the bias of empirical techniques. The DAGs factor in partial correlations at various segregations and key demographics of facilities, along with a placeholder for various business rules (e.g., quantifying patient exchanges, provider references, and sister facilities). Multiple groups of competitors are identified among the facilities.
Leveraging the identified competitors, a random forest regression model was developed and fine-tuned to predict market share. To identify the key drivers of market share at an overall level, the permutation feature importance of the attributes was calculated. For relative quantification of features at the facility level, SHAP (SHapley Additive exPlanations), a model-agnostic explainer, was incorporated. This helped to identify and rank, for each facility, the attributes that affect its market share. The approach combines two popular and efficient modeling practices, namely machine learning with graphs and tree-based regression, to reduce bias. With these, we helped drive strategic business decisions.

Keywords: competition, DAGs, facility, healthcare, machine learning, market share, random forest, SHAP
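
Permutation feature importance measures how much a fitted model's error grows when one feature column is shuffled, which breaks that feature's link to the target while keeping its marginal distribution. A model-agnostic sketch with a pluggable `predict` function is below; the toy model and data are hypothetical stand-ins for the fitted random forest and the facility attributes.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[col] for row in X]
            rng.shuffle(column)                  # break the feature-target link
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model: share depends on feature 0 only."""
    return 0.3 * row[0]


X = [[i / 10.0, float((i * 7) % 5)] for i in range(50)]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))  # feature 1 scores exactly 0.0
```

This gives a global ranking; facility-level attributions such as SHAP values answer a different question (why one prediction came out as it did) and are not shown here.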

4123 The Factors of Supply Chain Collaboration

Authors: Ghada Soltane

Abstract:

The objective of this study was to identify the factors affecting supply chain collaboration. A quantitative study was carried out on a sample of 84 Tunisian industrial companies. To verify the research hypotheses and test the direct effect of these factors on supply chain collaboration, multiple regression analysis was performed using SPSS 26. The results show that four factors have significant, positive direct effects on supply chain collaboration: trust, engagement, information sharing, and information quality.

Keywords: supply chain collaboration, factors of collaboration, principal component analysis, multiple regression

4122 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques

Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas

Abstract:

The aim of this research is to develop an algorithm capable of classifying news articles from the automobile industry according to the competitive actions they entail, using text mining (TM) methods. This requires testing how best to preprocess the data by preparing the pipeline that best fits each algorithm. The pipelines are tested with nine different classification algorithms from the families of regression, support vector machines, and neural networks. Preliminary testing to identify the optimal pipelines and algorithms led to the selection of two algorithms with two different pipelines: logistic regression (LR) and an artificial neural network (ANN). These algorithms are optimized further by testing several parameters of each. The best result is achieved with the ANN: the final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. After three classes that created noise are removed, the final algorithm reaches an accuracy of 94%.

Keywords: Artificial Neural network, Competitive dynamics, Logistic Regression, Text classification, Text mining

4121 Multi-Point Dieless Forming Product Defect Reduction Using Reliability-Based Robust Process Optimization

Authors: Misganaw Abebe Baye, Ji-Woo Park, Beom-Soo Kang

Abstract:

The product quality of multi-point dieless forming (MDF) depends on the process parameters. Moreover, variations in friction and material properties may substantially degrade the final product quality. This study proposes a way to compensate for MDF product defects by minimizing their sensitivity to noise-parameter variations, using a reliability-based robust optimization (RRO) technique to obtain the optimal settings of the controllable process parameters. Initially, two MDF finite element (FE) simulations of an AA3003-H14 saddle shape showed substantial dimpling, wrinkling, and shape error. FE analyses were then performed in the commercial software ABAQUS to obtain the correlation between the control process settings and noise variation with regard to the product defects. The best prediction models were chosen from a family of metamodels to replace the computationally expensive FE simulation. A genetic algorithm (GA) was applied to determine the optimal settings of the control parameters, and Monte Carlo analysis (MCA) was performed to determine how the noise parameter variation affects the final product quality. Finally, the RRO FE simulation and the experimental results show that adjusting the control parameters in the final forming process leads to a considerably better-quality product.

Keywords: dimpling, multi-point dieless forming, reliability-based robust optimization, shape error, variation, wrinkling

4120 Study on Optimal Control Strategy of PM2.5 in Wuhan, China

Authors: Qiuling Xie, Shanliang Zhu, Zongdi Sun

Abstract:

In this paper, we analyzed the correlation between PM2.5 and five other air quality indices (AQIs) based on the grey relational degree, and built a multivariate nonlinear regression model relating PM2.5 to the five monitoring indices. For the optimal control problem of PM2.5, we took the partial-large Cauchy distribution membership function as the satisfaction function. We then established a nonlinear programming model that maximizes the performance-to-price ratio, and the optimal control scheme is given.

Keywords: grey relational degree, multiple linear regression, membership function, nonlinear programming
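
Grey relational analysis scores how closely each comparison series (here, another air quality index) tracks a reference series (PM2.5). A minimal sketch of Deng's grey relational degree is below, assuming mean normalization and the customary distinguishing coefficient ρ = 0.5; the paper's exact normalization is not stated, and the series are hypothetical.

```python
def grey_relational_degrees(reference, series, rho=0.5):
    """Deng's grey relational degree of each series to the reference."""
    def normalise(xs):
        m = sum(xs) / len(xs)
        return [x / m for x in xs]           # mean normalisation (assumed)

    ref = normalise(reference)
    deltas = [[abs(r - c) for r, c in zip(ref, normalise(s))] for s in series]
    d_min = min(min(d) for d in deltas)      # global minimum difference
    d_max = max(max(d) for d in deltas)      # global maximum difference
    if d_max == 0:
        return [1.0] * len(series)           # every series matches exactly
    degrees = []
    for d in deltas:
        coeffs = [(d_min + rho * d_max) / (dk + rho * d_max) for dk in d]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


pm25 = [35.0, 48.0, 52.0, 40.0, 61.0]                  # hypothetical daily PM2.5
others = {"PM10": [60.0, 80.0, 85.0, 70.0, 98.0],
          "NO2": [30.0, 31.0, 29.0, 33.0, 30.0]}
degrees = grey_relational_degrees(pm25, list(others.values()))
for name, d in sorted(zip(others, degrees), key=lambda t: -t[1]):
    print(f"{name}: {d:.3f}")
```

A degree near 1 means the index moves almost in step with PM2.5, which makes it a natural candidate regressor for the subsequent model.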

4119 SVM-Based Modeling of Mass Transfer Potential of Multiple Plunging Jets

Authors: Surinder Deswal, Mahesh Pal

Abstract:

The paper investigates the potential of a support vector machine (SVM) based regression approach to model the mass transfer capacity of multiple plunging jets, both vertical (θ = 90°) and inclined (θ = 60°). The data set used in this study consists of four input parameters and a total of eighty-eight cases. Tenfold cross-validation was used for testing. Correlation coefficients of 0.971 and 0.981 (root mean square errors of 0.0025 and 0.0020) were achieved by support vector regression with polynomial and radial basis function kernels, respectively. The results suggest that the radial basis function kernel performs better than the polynomial kernel. The overall mass transfer coefficient estimated with both kernel functions is in good agreement with the actual experimental values (within a scatter of ±15%), suggesting the utility of the SVM-based regression approach.

Keywords: mass transfer, multiple plunging jets, support vector machines, ecological sciences
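
The evaluation protocol, tenfold cross-validation scored by the correlation coefficient and RMSE, can be sketched independently of the learner. Below, `fit` is any function that returns a predictor; pooling the out-of-fold predictions before scoring is one common convention and an assumption here, and the one-dimensional least-squares line is a hypothetical placeholder for the polynomial- or RBF-kernel SVR. The data are synthetic.

```python
import math
import random


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_validate(fit, X, y, k=10):
    """Collect out-of-fold predictions, then return (correlation, RMSE)."""
    preds = [0.0] * len(y)
    for fold in kfold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = model(X[i])
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y))
    return pearson_r(preds, y), rmse


def fit_line(xs, ys):
    """Placeholder learner (1-D least squares) standing in for an SVR."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return lambda x: my + slope * (x - mx)


# Synthetic data: 88 cases, echoing the size of the study's data set.
xs = [i / 10.0 for i in range(88)]
ys = [0.02 * x + 0.001 * math.sin(7.0 * x) for x in xs]
print(cross_validate(fit_line, xs, ys))
```

Swapping in a kernel regressor only changes `fit`; the fold assignment and scoring stay identical, which keeps the polynomial-versus-RBF comparison fair.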

4118 Fast and Robust Long-term Tracking with Effective Searching Model

Authors: Thang V. Kieu, Long P. Nguyen

Abstract:

Kernelized correlation filter (KCF) based trackers have recently attracted considerable attention because of their accuracy and fast computation. However, the KCF algorithm is not robust when the object is lost because of a sudden change of direction, occlusion, or leaving the field of view. To improve KCF performance in long-term tracking, this paper proposes an anomaly detection method that warns of target loss by analyzing the response map of each frame, and a random fern classifier that provides a reliable target re-location mechanism. In experiments on the Visual Tracker Benchmark and Visual Object Tracking datasets, the precision and success rate of the proposed algorithm were 2.92 and 2.61 times higher, respectively, than those of the original KCF algorithm. Moreover, the proposed tracker handles occlusion better than many state-of-the-art long-term tracking methods while running at 60 frames per second.

Keywords: correlation filter, long-term tracking, random fern, real-time tracking
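
One common way to flag target loss from a correlation-filter response map is the peak-to-sidelobe ratio (PSR): a sharp, isolated peak gives a high PSR, while a flat or multi-modal map, typical of occlusion, gives a low one. The abstract does not say which statistic its anomaly detector uses, so PSR, the 11×11 exclusion window, and the threshold below are assumptions.

```python
import statistics


def peak_to_sidelobe_ratio(response, exclude=5):
    """PSR of a 2-D response map given as a list of equal-length rows."""
    rows, cols = len(response), len(response[0])
    peak_val, pr, pc = max((v, r, c) for r, row in enumerate(response)
                           for c, v in enumerate(row))
    # Sidelobe = everything outside a (2*exclude+1)^2 window around the peak.
    sidelobe = [response[r][c] for r in range(rows) for c in range(cols)
                if abs(r - pr) > exclude or abs(c - pc) > exclude]
    mean = statistics.fmean(sidelobe)
    std = statistics.pstdev(sidelobe)
    return (peak_val - mean) / std if std > 0 else float("inf")


def target_lost(response, threshold=7.0):
    """Hypothetical loss warning: trigger re-detection when PSR is low."""
    return peak_to_sidelobe_ratio(response) < threshold
```

When `target_lost` fires, the tracker would stop updating the filter and hand over to the re-detection classifier (random ferns in the paper) instead of drifting onto the occluder.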

4117 Supervised-Component-Based Generalised Linear Regression with Multiple Explanatory Blocks: THEME-SCGLR

Authors: Bry X., Trottier C., Mortier F., Cornu G., Verron T.

Abstract:

We address component-based regularization of a Multivariate Generalized Linear Model (MGLM). A set of random responses Y is assumed to depend, through a GLM, on a set X of explanatory variables, as well as on a set T of additional covariates. X is partitioned into R conceptually homogeneous blocks X1, ..., XR, viewed as explanatory themes. The variables in each Xr are assumed to be numerous and redundant, so Generalised Linear Regression (GLR) demands regularization with respect to each Xr. By contrast, the variables in T are assumed to have been selected so as to demand no regularization. Regularization is performed by searching each Xr for an appropriate number of orthogonal components that both contribute to modelling Y and capture relevant structural information in Xr. We propose a very general criterion to measure the structural relevance (SR) of a component in a block, and show how to take SR into account within a Fisher-scoring-type algorithm in order to estimate the model. We also show how to deal with mixed-type explanatory variables. The method, named THEME-SCGLR, is tested on simulated data.

Keywords: Component-Model, Fisher Scoring Algorithm, GLM, PLS Regression, SCGLR, SEER, THEME

4116 Parameter Estimation via Metamodeling

Authors: Sergio Haram Sarmiento, Arcady Ponosov

Abstract:

Based on appropriate multivariate statistical methodology, we suggest a generic framework for efficient parameter estimation for ordinary differential equations and the corresponding nonlinear models. In this framework, classical linear regression strategies are refined into nonlinear regression by a locally linear modelling technique (known as metamodelling). The approach identifies the latent variables of the given model that accumulate the most information about it among all approximations of the same dimension. The method is applied to several benchmark problems, in particular to the so-called “power-law systems”, nonlinear differential equations typically used in Biochemical Systems Theory.

Keywords: principal component analysis, generalized law of mass action, parameter estimation, metamodels

4115 Development of Computational Approach for Calculation of Hydrogen Solubility in Hydrocarbons for Treatment of Petroleum

Authors: Abdulrahman Sumayli, Saad M. AlShahrani

Abstract:

For the hydrogenation process, knowing the solubility of hydrogen (H2) in hydrocarbons is critical to improving the efficiency of the process. We investigated the computation of H2 solubility in four heavy crude oil feedstocks using machine learning techniques. Temperature, pressure, and feedstock type were the model inputs, and hydrogen solubility was the sole response. Specifically, we employed three models: support vector regression (SVR), Gaussian process regression (GPR), and Bayesian ridge regression (BRR). To achieve the best performance, the hyperparameters of these models were optimized using the whale optimization algorithm (WOA). We evaluated the models on a dataset of solubility measurements in various feedstocks and compared their performance on several metrics. Our results show that the WOA-tuned SVR model (WOA-SVR) achieves the best overall performance, with an RMSE of 1.38 × 10⁻² and an R-squared of 0.991. These findings suggest that machine learning techniques can accurately predict hydrogen solubility in different feedstocks, which could be useful in developing hydrogen-related technologies. In addition, the solubility of hydrogen in the four heavy oil fractions is estimated over temperature and pressure ranges of 150–350 °C and 1.2–10.8 MPa, respectively.

Keywords: temperature, pressure variations, machine learning, oil treatment

Procedia PDF Downloads 42
4114 Representativity Based Wasserstein Active Regression

Authors: Benjamin Bobbia, Matthias Picard

Abstract:

In recent years, active learning methodologies based on the representativity of the data have appeared more promising for limiting overfitting. The presented query methodology for regression uses the Wasserstein distance to measure the representativity of the labelled dataset relative to the global distribution. In this work, GroupSort neural networks play a crucial role and bring a double advantage: the Wasserstein distance can be expressed exactly in terms of such networks, and explicit bounds on their size and depth can be provided together with rates of convergence. In addition, the heterogeneity of the dataset is taken into account by weighting the Wasserstein distance with the approximation error at the previous step of active learning. This approach reduces overfitting and yields high prediction performance after only a few query steps. After the methodology and algorithm are detailed, an empirical study investigates the range of the hyperparameters. The performance of the method is compared, in terms of the number of queries needed, with other classical and recent query methods on several UCI datasets.
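In one dimension, the Wasserstein-1 distance between two empirical distributions has a closed form: the area between their empirical CDFs. No GroupSort network is needed in that case. The sketch below uses this shortcut to illustrate a representativity-driven query rule. A hypothetical `greedy_query` adds the pool point that brings the labelled set closest, in W1, to the whole pool. It omits the error weighting described above and is a simplification, not the authors' algorithm.

```python
import bisect


def wasserstein_1d(a, b):
    """Exact W1 distance between two 1-D empirical distributions.

    Computes the integral of |F_a(t) - F_b(t)| dt, where F_a and F_b are the
    empirical CDFs, which are piecewise constant between sorted sample points.
    """
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        fa = bisect.bisect_right(a, left) / len(a)  # F_a on [left, right)
        fb = bisect.bisect_right(b, left) / len(b)  # F_b on [left, right)
        total += abs(fa - fb) * (right - left)
    return total


def greedy_query(labelled, pool):
    """Hypothetical query rule: pick the unlabelled point minimising W1(labelled + [x], pool)."""
    candidates = [x for x in pool if x not in labelled]
    return min(candidates, key=lambda x: wasserstein_1d(labelled + [x], pool))


pool = [float(i) for i in range(11)]
labelled = [0.0]
for _ in range(3):
    labelled.append(greedy_query(labelled, pool))
print(labelled)
```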

Keywords: active learning, Lipschitz regularization, neural networks, optimal transport, regression

Procedia PDF Downloads 56
4113 A Machine Learning Approach for Earthquake Prediction in Various Zones Based on Solar Activity

Authors: Viacheslav Shkuratskyy, Aminu Bello Usman, Michael O’Dea, Saifur Rahman Sabuj

Abstract:

This paper examines relationships between solar activity and earthquakes by applying four machine learning techniques: k-nearest neighbours, support vector regression, random forest regression, and a long short-term memory (LSTM) network. Data from the SILSO World Data Center, the NOAA National Center, the GOES satellite, NASA OMNIWeb, and the United States Geological Survey were used for the experiment. The dataset covered the 23rd and 24th solar cycles and included the daily sunspot number, solar wind velocity, proton density, and proton temperature. The study also examined sunspots, solar wind, and solar flares, which all reflect solar activity, together with the frequency distribution of earthquakes by magnitude and depth. The findings showed that the LSTM network predicts earthquakes more accurately than the other models applied in the study. They also showed that solar activity is more likely to affect earthquakes of lower magnitude and shallow depth than earthquakes of magnitude 5.5 or larger at intermediate and deep focal depths.
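Of the four methods listed, k-nearest-neighbour regression is the simplest to show. The sketch below predicts a target as the mean of the k closest training targets under Euclidean distance. The feature layout is hypothetical, for example a standardized sunspot number and solar wind speed paired with a daily earthquake count. The paper's actual features, scaling and k are not stated in the abstract.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict the mean target of the k training points nearest to `query`.

    Features should be on comparable scales (e.g. standardized), since
    Euclidean distance weights every dimension equally.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return sum(y for _, y in dists[:k]) / k


# Hypothetical standardized features and targets, for illustration only
X = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
Y = [1.0, 3.0, 100.0]
print(knn_regress(X, Y, [0.2, 0.2], k=2))  # 2.0
```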

Keywords: k-nearest neighbour, support vector regression, random forest regression, long short-term memory network, earthquakes, solar activity, sunspot number, solar wind, solar flares

Procedia PDF Downloads 36
4112 A Hybrid Fuzzy Clustering Approach for Fertile and Unfertile Analysis

Authors: Shima Soltanzadeh, Mohammad Hosain Fazel Zarandi, Mojtaba Barzegar Astanjin

Abstract:

Diagnosing male infertility with laboratory tests is expensive and sometimes intolerable for patients. Filling out a questionnaire and then applying a classification method can be the first step in the decision-making process, so that laboratory tests are used only in cases with a high probability of infertility. In this paper, we evaluated the performance of four classification methods in diagnosing male infertility due to environmental factors: naive Bayes, a neural network, logistic regression, and fuzzy c-means clustering used as a classifier. Since the data are imbalanced, ROC curves are the most suitable method for comparison. We also selected the more important features using a filtering method and examined the impact of this feature reduction on the performance of each method; in general, most methods performed better after the filter was applied. We show that fuzzy c-means clustering used as a classifier performs well according to the ROC curves and that its performance is comparable to that of other classification methods such as logistic regression.
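ROC curves suit imbalanced data because the true-positive and false-positive rates are each computed within one class. The sketch below sweeps a decision threshold over classifier scores and integrates the resulting curve with the trapezoidal rule. The scores are hypothetical; in the paper's setting they could be, for example, a logistic-regression probability or a fuzzy c-means membership in the "infertile" cluster.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the threshold over all distinct scores.

    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = sum(1 for lab in labels if lab == 1)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for lab, s in zip(labels, scores) if s >= thr and lab == 1)
        fp = sum(1 for lab, s in zip(labels, scores) if s >= thr and lab == 0)
        pts.append((fp / neg, tp / pos))
    return pts


def auc(points):
    """Area under a ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc(pts))  # 0.75
```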

Keywords: classification, fuzzy c-means, logistic regression, Naive Bayesian, neural network, ROC curve

Procedia PDF Downloads 304
4111 Empirical Evidence to Beliefs and Perceptions About Mental Health Disorder and Substance Abuse: The Role of a Social Worker

Authors: Helena Baffoe

Abstract:

Context: In the United States, there have been significant advancements in programs aimed at improving the lives of individuals with mental health disorders and substance abuse problems. However, public attitudes and beliefs regarding these issues have not improved correspondingly. This study explores the perceptions and beliefs surrounding mental health disorders and substance abuse from the perspective of data analytics in social work. Research Aim: The aim of this research is to provide empirical evidence on beliefs and perceptions regarding mental health disorders and substance abuse. Specifically, the study asks whether being diagnosed with a mental disorder implies a diagnosis of substance abuse. Additionally, the research analyzes the specific roles that social workers can play in supporting individuals with mental disorders. Methodology: This research adopts a data-driven methodology, using comprehensive data from the Substance Abuse and Mental Health Services Administration (SAMHSA). A noteworthy causal connection between mental disorders and substance abuse exists, but current literature tends not to examine it critically. To address this gap, we applied logistic regression with an instrumental variable (IV) approach, which mitigates potential endogeneity in order to obtain robust and unbiased results. This methodology allows a rigorous examination of the relationship between mental disorders and substance abuse. Empirical Findings: The analysis reveals that depressive, anxiety, and trauma/stressor-related disorders are the most common mental disorders in the United States. However, the study finds no statistically significant evidence that being diagnosed with these mental disorders necessarily implies a diagnosis of substance abuse. 
This suggests a public misconception about the relationship between mental health disorders and substance abuse. Theoretical Importance: The research contributes to the existing literature by providing empirical evidence that challenges prevailing beliefs and perceptions regarding mental health disorders and substance abuse. By using a novel methodological approach and analyzing new US data, the study sheds light on the cultural and social factors that influence these attitudes.
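The author's model is a logistic regression with an instrument; the abstract does not specify the estimator. The core IV idea is easiest to see in the linear, just-identified case, where the Wald estimator cov(z, y) / cov(z, x) replaces the OLS slope cov(x, y) / var(x). The simulation below is entirely hypothetical:

- u is an unobserved confounder that drives both x and y;
- z is an instrument that shifts x but is independent of u;
- the true effect of x on y is 2.

OLS absorbs the confounding and is biased, while the IV estimate is not. For a binary outcome, an approach such as two-stage residual inclusion would be needed instead.

```python
import random


def cov(u, v):
    """Sample covariance with an (n - 1) denominator."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (biased when x is endogenous)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified instrumental-variable (Wald) estimator of the effect of x on y."""
    return cov(z, y) / cov(z, x)


rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]                        # instrument
u = [rng.gauss(0, 1) for _ in range(n)]                        # hidden confounder
x = [zi + ui + rng.gauss(0, 0.5) for zi, ui in zip(z, u)]      # endogenous regressor
y = [2.0 * xi + ui for xi, ui in zip(x, u)]                    # true slope = 2
print(ols_slope(x, y), iv_slope(z, x, y))
```

In this design, OLS should land near 2 + 1/2.25 ≈ 2.44, while the IV estimate should land near 2.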

Keywords: mental health disorder, substance abuse, empirical evidence, logistic regression with IV

Procedia PDF Downloads 30
4110 Linear Regression Estimation of Tactile Comfort for Denim Fabrics Based on In-Plane Shear Behavior

Authors: Nazli Uren, Ayse Okur

Abstract:

Tactile comfort of a textile product is an essential property and a major concern for customer perceptions and preferences. The subjective nature of comfort and the difficulty of simulating the sensory feelings of the human hand make it hard to establish a well-accepted link between tactile comfort and objective evaluations. On the other hand, the shear behavior of a fabric is a mechanical parameter that can be measured by various objective test methods. The principal aim of this study is to determine the tactile comfort of commercially available denim fabrics by subjective measurements, create a tactile score database for denim fabrics, and investigate the relations between tactile comfort and shear behavior. The in-plane shear behavior of 17 commercially available denim fabrics with a variety of raw materials and weave structures was measured with a custom-designed shear frame and the conventional bias extension method in two corresponding diagonal directions. The tactile comfort of the denim fabrics was also determined via subjective customer evaluations. These relations were investigated statistically and expressed as regression equations. The analyses of the relations between tactile comfort and shear behavior revealed considerably high correlation coefficients, and the suggested regression equations were also found to be statistically significant. Accordingly, it was concluded that the tactile comfort of denim fabrics can be estimated with high precision from in-plane shear behavior measurements.
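The study's exact regression equations are not reproduced in the abstract. A single-predictor version of the analysis can be sketched as follows:

- a least-squares fit of a tactile score on one shear measurement;
- the Pearson correlation coefficient r;
- the usual t statistic for testing r against zero, with n - 2 degrees of freedom.

Variable names and data are hypothetical.

```python
import math


def linear_fit(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r), r = Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    r = sxy / math.sqrt(sxx * syy)
    return b0, b1, r


def correlation_t(r, n):
    """t statistic for H0: no linear correlation (requires |r| < 1), df = n - 2."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical shear measurements (x) and mean tactile scores (y)
b0, b1, r = linear_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(b0, b1, r)  # 1.0 2.0 1.0
```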

Keywords: denim fabrics, in-plane shear behavior, linear regression estimation, tactile comfort

Procedia PDF Downloads 273
4109 A Statistical Approach to Predict and Classify the Commercial Hatchability of Chickens Using Extrinsic Parameters of Breeders and Eggs

Authors: M. S. Wickramarachchi, L. S. Nawarathna, C. M. B. Dematawewa

Abstract:

Hatchery performance is critical for the profitability of poultry breeder operations. Some extrinsic parameters of eggs and breeders increase or decrease hatchability. This study aims to identify the extrinsic parameters affecting the commercial hatchability of local chicken eggs and to determine the most efficient model for classifying whether the hatchability rate exceeds 90%. In this study, seven extrinsic parameters were considered: egg weight, moisture loss, breeder age, number of fertilised eggs, shell width, shell length, and shell thickness. Multiple linear regression was performed to determine the most influential variable on hatchability. First, the correlation between each parameter and hatchability was checked. Then a multiple regression model was developed, and the accuracy of the fitted model was evaluated. Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF) algorithms were applied to classify hatchability. This grouping was performed using binary classification techniques. Hatchability was negatively correlated with egg weight, breeder age, shell width, and shell length, and positively correlated with moisture loss, number of fertilised eggs, and shell thickness. Multiple linear regression models were more accurate than simple linear regression models, with the highest coefficient of determination (R² = 94%) and the lowest AIC and BIC values. According to the classification results, RF, CART, and kNN achieved the highest accuracies, at 0.99, 0.975, and 0.972, respectively, for the commercial hatchery process. Therefore, RF is the most appropriate machine learning algorithm for classifying breeder outcomes as economically profitable or not in a commercial hatchery.
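The model comparison above uses R², AIC and BIC. For a Gaussian linear model with k estimated coefficients, AIC and BIC can be computed from the residual sum of squares:

- AIC = n·ln(RSS/n) + 2k
- BIC = n·ln(RSS/n) + k·ln(n)

Both are given up to an additive constant that cancels when models are fitted to the same data. The sketch assumes k counts the regression coefficients including the intercept. Some software also counts the error variance, which shifts every model's value equally and does not change rankings. Lower AIC/BIC and higher R² indicate the preferred model.

```python
import math


def fit_metrics(y, y_hat, k):
    """R^2, AIC and BIC (up to a constant) for a Gaussian linear model with k coefficients."""
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # must be > 0 for the log
    tss = sum((a - mean_y) ** 2 for a in y)
    r2 = 1.0 - rss / tss
    aic = n * math.log(rss / n) + 2 * k
    bic = n * math.log(rss / n) + k * math.log(n)
    return r2, aic, bic


# Hypothetical observed hatchability values and fitted values from a 2-coefficient model
r2, aic, bic = fit_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], k=2)
print(r2, aic, bic)
```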

Keywords: classification models, egg weight, fertilised eggs, multiple linear regression

Procedia PDF Downloads 58
4108 Non-Methane Hydrocarbons Emission during the Photocopying Process

Authors: Kiurski S. Jelena, Aksentijević M. Snežana, Kecić S. Vesna, Oros B. Ivana

Abstract:

The proliferation of electronic equipment in photocopying environments has not only improved work efficiency but also changed indoor air quality. Given the number of photocopiers in use, indoor air quality might be worse than in general office environments. Determining the contribution of any type of equipment to indoor air pollution is a complex matter. Non-methane hydrocarbons are known to play an important role in air quality because of their high reactivity. Hazardous pollutants were detected in the indoor air of a photocopying shop in Novi Sad, Serbia. Air samples were collected at three sampling points over five days, in three time intervals during the 8-h working day, and then analyzed. Using a multiple linear regression model and the STATISTICA 10 software package, the concentrations of occupational hazards and the microclimate parameters were correlated with each other. The obtained multiple coefficients of determination (0.3751, 0.2389, and 0.1975) indicated a weak positive correlation between the observed variables. Small values of the F statistic indicated that there was no statistically significant difference between the concentration levels of non-methane hydrocarbons and the microclimate parameters. The results showed that the variables could be represented by the general regression model $y_i = b_0 + b_1 x_{i1} + b_2 x_{i2}$. The obtained regression equations allow the quantitative agreement between the variations of the variables to be measured, giving more accurate knowledge of their mutual relations.
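The general model quoted above, with two predictors, can be fitted by ordinary least squares. The sketch below solves the 3×3 normal equations (XᵀX)b = Xᵀy by Gaussian elimination. It then computes the multiple coefficient of determination R² and the overall F statistic for p = 2 predictors. The assignment of x1 and x2 to particular microclimate parameters (for example temperature and relative humidity) is hypothetical, and the data are synthetic.

```python
def fit_two_predictors(x1, x2, y):
    """OLS estimates [b0, b1, b2] for y_i = b0 + b1*x_i1 + b2*x_i2."""
    rows = [(1.0, a, c) for a, c in zip(x1, x2)]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(3)]
    m = [xtx[i] + [xty[i]] for i in range(3)]  # augmented matrix [X^T X | X^T y]
    # Forward elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, 3):
            f = m[k][col] / m[col][col]
            for c in range(col, 4):
                m[k][c] -= f * m[col][c]
    # Back substitution
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]
    return b


def r2_and_f(x1, x2, y, b):
    """R^2 and overall F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) with p = 2 (needs R^2 < 1)."""
    n, p = len(y), 2
    y_hat = [b[0] + b[1] * a + b[2] * c for a, c in zip(x1, x2)]
    mean_y = sum(y) / n
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - rss / tss
    return r2, (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Synthetic data from y = 1 + 2*x1 + 3*x2
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * a + 3.0 * c for a, c in zip(x1, x2)]
print(fit_two_predictors(x1, x2, y))
```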

Keywords: non-methane hydrocarbons, photocopying process, multiple regression analysis, indoor air quality, pollutant emission

Procedia PDF Downloads 350
4107 Principal Component Regression in Amylose Content on the Malaysian Market Rice Grains Using Near Infrared Reflectance Spectroscopy

Authors: Syahira Ibrahim, Herlina Abdul Rahim

Abstract:

The amylose content is an essential factor in determining the texture and taste of rice grains. This paper evaluates the use of visible and shortwave near-infrared spectroscopy (VIS-SWNIRS) for estimating the amylose content of seven rice varieties available in the Malaysian market. Each variety consists of 30 samples, and all samples are scanned by spectroscopy over the 680–1000 nm range. A Savitzky-Golay (SG) smoothing filter is applied to each sample's spectrum before principal component regression (PCR) is used to analyze the data and produce a single predicted value for each sample. This value is then compared with reference values obtained from the standard iodine colorimetric test in terms of the coefficient of determination, R². The results show that this technique produced low R² values, below 0.50. To improve the results, the spectral range should be extended to include 1100–2500 nm, and the number of samples should be increased.
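The abstract does not state the SG window length or polynomial order. The sketch below assumes the common 5-point, second-order choice, whose convolution weights are (-3, 12, 17, 12, -3)/35. Because the filter fits a local quadratic, any quadratic sequence passes through unchanged. For real spectra, `scipy.signal.savgol_filter(x, window_length, polyorder)` offers the general case with edge handling.

```python
def savgol_5_quadratic(spectrum):
    """5-point, 2nd-order Savitzky-Golay smoothing with weights (-3, 12, 17, 12, -3) / 35.

    The two points at each end are returned unchanged (no edge handling).
    """
    w = (-3, 12, 17, 12, -3)
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(wk * spectrum[i + k - 2] for k, wk in enumerate(w)) / 35.0
    return out


# A quadratic "spectrum" is reproduced exactly by the local quadratic fit.
quad = [float(t * t) for t in range(10)]
print(savgol_5_quadratic(quad))
```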

Keywords: amylose content, diffuse reflectance, Malaysia rice grain, principal component regression (PCR), Visible and Shortwave near-infrared spectroscopy (VIS-SWNIRS)

Procedia PDF Downloads 354
4106 Investigating The Nexus Between Energy Deficiency, Environmental Sustainability and Renewable Energy: The Role of Energy Trade in Global Perspectives

Authors: Fahim Ullah, Muhammad Usman

Abstract:

Energy consumption and environmental sustainability are major challenges of the 21st century. Energy abundance increases environmental pollution, while energy poverty hinders economic growth. Considering these two aspects, the present study calculates energy deficiency and examines the role of renewable energy in overcoming rising energy deficiency and carbon emissions for selected countries from 1990 to 2021. For the empirical analysis, this study uses method-of-moments panel quantile regression and, to check robustness, a panel quantile robustness analysis. Graphical analysis indicates that global energy deficiency, in which energy consumption exceeds energy production, has risen over the last three decades. Empirical results show that renewable energy is a significant factor in reducing energy deficiency. Second, energy deficiency increases carbon emissions, whereas renewable energy decreases them. This study recommends that global energy deficiency and rising carbon emissions be controlled through structural change, namely an energy transition that replaces non-renewable resources with renewable ones.
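Method-of-moments quantile regression estimates conditional quantiles through a location-scale model, and that estimator is not reproduced here. As a simpler illustration of what a "quantile" fit minimises, the sketch below shows the check (pinball) loss used by standard quantile regression, ρ_τ(u) = u·(τ − 1[u < 0]). It also shows that, in the intercept-only case, minimising that loss over the data points recovers an empirical τ-quantile. The data are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def quantile_by_check_loss(values, tau):
    """Intercept-only quantile regression: the data point minimising the total check loss."""
    return min(values, key=lambda q: sum(pinball_loss(v - q, tau) for v in values))


data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(quantile_by_check_loss(data, 0.5))  # 3.0
```

The median (τ = 0.5) ignores how extreme the outlier 100 is. That insensitivity to extreme values is why quantile methods are often preferred for heterogeneous panels.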

Keywords: energy deficiency, renewable energy, carbon emission, energy trade, PQL analysis

Procedia PDF Downloads 22