Search results for: Cox regression model
18422 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course
Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu
Abstract:
This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN
Procedia PDF Downloads 4418421 Transport Related Air Pollution Modeling Using Artificial Neural Network
Authors: K. D. Sharma, M. Parida, S. S. Jain, Anju Saini, V. K. Katiyar
Abstract:
Air quality models form one of the most important components of an urban air quality management plan. Various statistical modeling techniques (regression, multiple regression and time series analysis) have been used to predict air pollution concentrations in the urban environment. These models calculate pollution concentrations due to observed traffic, meteorological and pollution data after an appropriate relationship has been obtained empirically between these parameters. Artificial neural network (ANN) is increasingly used as an alternative tool for modeling the pollutants from vehicular traffic particularly in urban areas. In the present paper, an attempt has been made to model traffic air pollution, specifically CO concentration using neural networks. In case of CO concentration, two scenarios were considered. First, with only classified traffic volume input and the second with both classified traffic volume and meteorological variables. The results showed that CO concentration can be predicted with good accuracy using artificial neural network (ANN).Keywords: air quality management, artificial neural network, meteorological variables, statistical modeling
Procedia PDF Downloads 52418420 Impact of Improved Beehive on Income of Rural Households: Evidence from Bugina District of Northern Ethiopia
Authors: Wondmnew Derebe
Abstract:
Increased adoption of modern beehives improves the livelihood of smallholder farmers whose income largely depends on mixed crop-livestock farming. Improved beehives have been disseminated to farmers in many parts of Ethiopia. However, its impact on income is less investigated. Thus, this study estimates how adopting improved beehives impacts rural households' income. Survey data were collected from 350 randomly selected households' and analyzed using an endogenous switching regression model. The result revealed that the adoption of improved beehives is associated with a higher annual income. On average, improved beehive adopters earned about 6,077 (ETB) more money than their counterparts. However, the impact of adoption would have been larger for actual non-adopters, as reflected in the negative transitional heterogeneity effect of 1792 (ETB). The result also indicated that the decision to adopt or not to adopt improved beehives was subjected to individual self-selection. Improved beehive adoption can increase farmers' income and can be used as an alternative poverty reduction strategy.Keywords: impact, adoption, endogenous switching regression, income, improved
Procedia PDF Downloads 7418419 Early Impact Prediction and Key Factors Study of Artificial Intelligence Patents: A Method Based on LightGBM and Interpretable Machine Learning
Authors: Xingyu Gao, Qiang Wu
Abstract:
Patents play a crucial role in protecting innovation and intellectual property. Early prediction of the impact of artificial intelligence (AI) patents helps researchers and companies allocate resources and make better decisions. Understanding the key factors that influence patent impact can assist researchers in gaining a better understanding of the evolution of AI technology and innovation trends. Therefore, identifying highly impactful patents early and providing support for them holds immeasurable value in accelerating technological progress, reducing research and development costs, and mitigating market positioning risks. Despite the extensive research on AI patents, accurately predicting their early impact remains a challenge. Traditional methods often consider only single factors or simple combinations, failing to comprehensively and accurately reflect the actual impact of patents. This paper utilized the artificial intelligence patent database from the United States Patent and Trademark Office and the Len.org patent retrieval platform to obtain specific information on 35,708 AI patents. Using six machine learning models, namely Multiple Linear Regression, Random Forest Regression, XGBoost Regression, LightGBM Regression, Support Vector Machine Regression, and K-Nearest Neighbors Regression, and using early indicators of patents as features, the paper comprehensively predicted the impact of patents from three aspects: technical, social, and economic. These aspects include the technical leadership of patents, the number of citations they receive, and their shared value. The SHAP (Shapley Additive exPlanations) metric was used to explain the predictions of the best model, quantifying the contribution of each feature to the model's predictions. The experimental results on the AI patent dataset indicate that, for all three target variables, LightGBM regression shows the best predictive performance. Specifically, patent novelty has the greatest impact on predicting the technical impact of patents and has a positive effect. Additionally, the number of owners, the number of backward citations, and the number of independent claims are all crucial and have a positive influence on predicting technical impact. In predicting the social impact of patents, the number of applicants is considered the most critical input variable, but it has a negative impact on social impact. At the same time, the number of independent claims, the number of owners, and the number of backward citations are also important predictive factors, and they have a positive effect on social impact. For predicting the economic impact of patents, the number of independent claims is considered the most important factor and has a positive impact on economic impact. The number of owners, the number of sibling countries or regions, and the size of the extended patent family also have a positive influence on economic impact. The study primarily relies on data from the United States Patent and Trademark Office for artificial intelligence patents. Future research could consider more comprehensive data sources, including artificial intelligence patent data, from a global perspective. While the study takes into account various factors, there may still be other important features not considered. In the future, factors such as patent implementation and market applications may be considered as they could have an impact on the influence of patents.Keywords: patent influence, interpretable machine learning, predictive models, SHAP
Procedia PDF Downloads 5018418 Pattern Synthesis of Nonuniform Linear Arrays Including Mutual Coupling Effects Based on Gaussian Process Regression and Genetic Algorithm
Authors: Ming Su, Ziqiang Mu
Abstract:
This paper proposes a synthesis method for nonuniform linear antenna arrays that combine Gaussian process regression (GPR) and genetic algorithm (GA). In this method, the GPR model can be used to calculate the array radiation pattern in the presence of mutual coupling effects, and then the GA is used to optimize the excitations and locations of the elements so as to generate the desired radiation pattern. In this paper, taking a 9-element nonuniform linear array as an example and the desired radiation pattern corresponding to a Chebyshev distribution as the optimization objective, optimize the excitations and locations of the elements. Finally, the optimization results are verified by electromagnetic simulation software CST, which shows that the method is effective.Keywords: nonuniform linear antenna arrays, GPR, GA, mutual coupling effects, active element pattern
Procedia PDF Downloads 11018417 Analysing the Interactive Effects of Factors Influencing Sand Production on Drawdown Time in High Viscosity Reservoirs
Authors: Gerald Gwamba, Bo Zhou, Yajun Song, Dong Changyin
Abstract:
The challenges that sand production presents to the oil and gas industry, particularly while working in poorly consolidated reservoirs, cannot be overstated. From restricting production to blocking production tubing, sand production increases the costs associated with production as it elevates the cost of servicing production equipment over time. Production in reservoirs that present with high viscosities, flow rate, cementation, clay content as well as fine sand contents is even more complex and challenging. As opposed to the one-factor at a-time testing, investigating the interactive effects arising from a combination of several factors offers increased reliability of results as well as representation of actual field conditions. It is thus paramount to investigate the conditions leading to the onset of sanding during production to ensure the future sustainability of hydrocarbon production operations under viscous conditions. We adopt the Design of Experiments (DOE) to analyse, using Taguchi factorial designs, the most significant interactive effects of sanding. We propose an optimized regression model to predict the drawdown time at sand production. The results obtained underscore that reservoirs characterized by varying (high and low) levels of viscosity, flow rate, cementation, clay, and fine sand content have a resulting impact on sand production. The only significant interactive effect recorded arises from the interaction between BD (fine sand content and flow rate), while the main effects included fluid viscosity and cementation, with percentage significances recorded as 31.3%, 37.76%, and 30.94%, respectively. The drawdown time model presented could be useful for predicting the time to reach the maximum drawdown pressure under viscous conditions during the onset of sand production.Keywords: factorial designs, DOE optimization, sand production prediction, drawdown time, regression model
Procedia PDF Downloads 15218416 Performance Comparison of Different Regression Methods for a Polymerization Process with Adaptive Sampling
Authors: Florin Leon, Silvia Curteanu
Abstract:
Developing complete mechanistic models for polymerization reactors is not easy, because complex reactions occur simultaneously; there is a large number of kinetic parameters involved and sometimes the chemical and physical phenomena for mixtures involving polymers are poorly understood. To overcome these difficulties, empirical models based on sampled data can be used instead, namely regression methods typical of machine learning field. They have the ability to learn the trends of a process without any knowledge about its particular physical and chemical laws. Therefore, they are useful for modeling complex processes, such as the free radical polymerization of methyl methacrylate achieved in a batch bulk process. The goal is to generate accurate predictions of monomer conversion, numerical average molecular weight and gravimetrical average molecular weight. This process is associated with non-linear gel and glass effects. For this purpose, an adaptive sampling technique is presented, which can select more samples around the regions where the values have a higher variation. Several machine learning methods are used for the modeling and their performance is compared: support vector machines, k-nearest neighbor, k-nearest neighbor and random forest, as well as an original algorithm, large margin nearest neighbor regression. The suggested method provides very good results compared to the other well-known regression algorithms.Keywords: batch bulk methyl methacrylate polymerization, adaptive sampling, machine learning, large margin nearest neighbor regression
Procedia PDF Downloads 30518415 Generalized Correlation Coefficient in Genome-Wide Association Analysis of Cognitive Ability in Twins
Authors: Afsaneh Mohammadnejad, Marianne Nygaard, Jan Baumbach, Shuxia Li, Weilong Li, Jesper Lund, Jacob v. B. Hjelmborg, Lene Christensen, Qihua Tan
Abstract:
Cognitive impairment in the elderly is a key issue affecting the quality of life. Despite a strong genetic background in cognition, only a limited number of single nucleotide polymorphisms (SNPs) have been found. These explain a small proportion of the genetic component of cognitive function, thus leaving a large proportion unaccounted for. We hypothesize that one reason for this missing heritability is the misspecified modeling in data analysis concerning phenotype distribution as well as the relationship between SNP dosage and the phenotype of interest. In an attempt to overcome these issues, we introduced a model-free method based on the generalized correlation coefficient (GCC) in a genome-wide association study (GWAS) of cognitive function in twin samples and compared its performance with two popular linear regression models. The GCC-based GWAS identified two genome-wide significant (P-value < 5e-8) SNPs; rs2904650 near ZDHHC2 on chromosome 8 and rs111256489 near CD6 on chromosome 11. The kinship model also detected two genome-wide significant SNPs, rs112169253 on chromosome 4 and rs17417920 on chromosome 7, whereas no genome-wide significant SNPs were found by the linear mixed model (LME). Compared to the linear models, more meaningful biological pathways like GABA receptor activation, ion channel transport, neuroactive ligand-receptor interaction, and the renin-angiotensin system were found to be enriched by SNPs from GCC. The GCC model outperformed the linear regression models by identifying more genome-wide significant genetic variants and more meaningful biological pathways related to cognitive function. Moreover, GCC-based GWAS was robust in handling genetically related twin samples, which is an important feature in handling genetic confounding in association studies.Keywords: cognition, generalized correlation coefficient, GWAS, twins
Procedia PDF Downloads 12418414 Modelling Conceptual Quantities Using Support Vector Machines
Authors: Ka C. Lam, Oluwafunmibi S. Idowu
Abstract:
Uncertainty in cost is a major factor affecting performance of construction projects. To our knowledge, several conceptual cost models have been developed with varying degrees of accuracy. Incorporating conceptual quantities into conceptual cost models could improve the accuracy of early predesign cost estimates. Hence, the development of quantity models for estimating conceptual quantities of framed reinforced concrete structures using supervised machine learning is the aim of the current research. Using measured quantities of structural elements and design variables such as live loads and soil bearing pressures, response and predictor variables were defined and used for constructing conceptual quantities models. Twenty-four models were developed for comparison using a combination of non-parametric support vector regression, linear regression, and bootstrap resampling techniques. R programming language was used for data analysis and model implementation. Gross soil bearing pressure and gross floor loading were discovered to have a major influence on the quantities of concrete and reinforcement used for foundations. Building footprint and gross floor loading had a similar influence on beams and slabs. Future research could explore the modelling of other conceptual quantities for walls, finishes, and services using machine learning techniques. Estimation of conceptual quantities would assist construction planners in early resource planning and enable detailed performance evaluation of early cost predictions.Keywords: bootstrapping, conceptual quantities, modelling, reinforced concrete, support vector regression
Procedia PDF Downloads 20618413 Innovation and Economic Growth Model of East Asian Countries: The Adaptability of the Model in Ethiopia
Authors: Khalid Yousuf Ahmed
Abstract:
At the beginning of growth period, East Asian countries achieved impressive economic growth for the decades. They transformed from agricultural economy toward industrialization and contributed to dynamic structural transformation. The achievements were driven by government-led development policies that implemented effective innovation policy to boost technological capability of local firms. Recently, most Sub-Saharan African have been showing sustainable growth. Exceptionally, Ethiopia has been recording double-digit growth for a decade. Hence, Ethiopia has claimed to follow the footstep of East Asia development model. The study is going to examine whether Ethiopia can replicate innovation and economic growth model of East Asia by using Japan, Taiwan, South Korea and China as a case to illustrate their model of growth. This research will be based on empirical data gathering and extended theory of national innovation system and economic growth theory. Moreover, the methodology is based on Knowledge Assessment Methodology (KAM) and also employing cross-countries regression analysis. The results explained that there is a significant relationship between innovation indicators and economic growth in East Asian countries while the relationship is non-existing for Ethiopia except implementing similar policies and achieving similar growth trend. Therefore, Ethiopia needs to introduce inclusive policies that give priority to improving human capital and invest on the knowledge-based economy to replicate East Asian Model.Keywords: economic growth, FDI, endogenous growth theory, East Asia model
Procedia PDF Downloads 27518412 Marginalized Two-Part Joint Models for Generalized Gamma Family of Distributions
Authors: Mohadeseh Shojaei Shahrokhabadi, Ding-Geng (Din) Chen
Abstract:
Positive continuous outcomes with a substantial number of zero values and incomplete longitudinal follow-up are quite common in medical cost data. To jointly model semi-continuous longitudinal cost data and survival data and to provide marginalized covariate effect estimates, a marginalized two-part joint model (MTJM) has been developed for outcome variables with lognormal distributions. In this paper, we propose MTJM models for outcome variables from a generalized gamma (GG) family of distributions. The GG distribution constitutes a general family that includes approximately all of the most frequently used distributions like the Gamma, Exponential, Weibull, and Log Normal. In the proposed MTJM-GG model, the conditional mean from a conventional two-part model with a three-parameter GG distribution is parameterized to provide the marginal interpretation for regression coefficients. In addition, MTJM-gamma and MTJM-Weibull are developed as special cases of MTJM-GG. To illustrate the applicability of the MTJM-GG, we applied the model to a set of real electronic health record data recently collected in Iran, and we provided SAS code for application. The simulation results showed that when the outcome distribution is unknown or misspecified, which is usually the case in real data sets, the MTJM-GG consistently outperforms other models. The GG family of distribution facilitates estimating a model with improved fit over the MTJM-gamma, standard Weibull, or Log-Normal distributions.Keywords: marginalized two-part model, zero-inflated, right-skewed, semi-continuous, generalized gamma
Procedia PDF Downloads 17618411 Determinants of Travel to Western Countries by Kuwaiti Nationals
Authors: Yvette Reisinger
Abstract:
Relatively little is known about the Arab travel market, especially the outbound travel market from Arab countries in the Middle East. The Kuwaiti travel market is the smallest yet fastest growing in the Gulf Cooperation Council (GCC) region. The Kuwaiti travel market represents a great potential for the international tourism industry. Kuwaiti nationals have a very high spending power due to the Kuwaiti dinar being the highest-valued currency unit in the world. Although Europe, North America, and Asia/Pacific try to attract the Arab tourist market the number of Kuwaiti travellers attracted to these destinations is very low. The success in attracting the Kuwaiti travel market to Western countries must be guided by an analysis of the factors that affect its travel decisions. The objective of the study is to identify major factors that influence Kuwaiti nationals’ intentions to travel to Western countries. A model is developed and empirically tested on a sample of 343 Kuwaiti nationals. A series of regression analyses are run to determine the effects of different factors on Kuwaiti’s travel decisions. A Herman’s single factor test and Durbin-Watson test are used to assess the validity of the regression model. Analysis is controlled for socio-demographics. The results show that the Muslim friendly amenities and destination cognitive image exert significant effects on Kuwaiti nationals’ intentions to travel to Western countries. The study provides a better understanding of the factors that attract Kuwaiti tourists to Western countries. By knowing what encourages Kuwaitis to travel to Western countries marketers can plan and promote these countries accordingly. The study provides a foundation of future empirical research into the Kuwaiti/Arab travel market.Keywords: Kuwaiti travel market, travel decisions, Western countries
Procedia PDF Downloads 19218410 Optimal Evaluation of Weather Risk Insurance for Wheat
Authors: Slim Amami
Abstract:
A model is developed to prevent the risks related to climate conditions in the agricultural sector. It will determine the yearly optimum premium to be paid by a farmer in order to reach his required turnover. The model is mainly based on both climatic stability and 'soft' responses of usually grown species to average climate variations at the same place and inside a safety ball which can be determined from past meteorological data. This allows the use of linear regression expression for dependence of production result in terms of driving meteorological parameters, main ones of which are daily average sunlight, rainfall and temperature. By a simple best parameter fit from the expert table drawn with professionals, optimal representation of yearly production is deduced from records of previous years, and yearly payback is evaluated from minimum yearly produced turnover. Optimal premium is then deduced, and gives the producer a useful bound for negotiating an offer by insurance companies to effectively protect their harvest. The application to wheat production in the French Oise department illustrates the reliability of the present model with as low as 6% difference between predicted and real data. The model can be adapted to almost every agricultural field by changing state parameters and calibrating their associated coefficients.Keywords: agriculture, database, meteorological factors, production model, optimal price
Procedia PDF Downloads 22218409 Investigating the Impacts on Cyclist Casualty Severity at Roundabouts: A UK Case Study
Authors: Nurten Akgun, Dilum Dissanayake, Neil Thorpe, Margaret C. Bell
Abstract:
Cycling has gained a great attention with comparable speeds, low cost, health benefits and reducing the impact on the environment. The main challenge associated with cycling is the provision of safety for the people choosing to cycle as their main means of transport. From the road safety point of view, cyclists are considered as vulnerable road users because they are at higher risk of serious casualty in the urban network but more specifically at roundabouts. This research addresses the development of an enhanced mathematical model by including a broad spectrum of casualty related variables. These variables were geometric design measures (approach number of lanes and entry path radius), speed limit, meteorological condition variables (light, weather, road surface) and socio-demographic characteristics (age and gender), as well as contributory factors. Contributory factors included driver’s behavior related variables such as failed to look properly, sudden braking, a vehicle passing too close to a cyclist, junction overshot, failed to judge other person’s path, restart moving off at the junction, poor turn or manoeuvre and disobeyed give-way. Tyne and Wear in the UK were selected as a case study area. The cyclist casualty data was obtained from UK STATS19 National dataset. The reference categories for the regression model were set to slight and serious cyclist casualties. Therefore, binary logistic regression was applied. Binary logistic regression analysis showed that approach number of lanes was statistically significant at the 95% level of confidence. A higher number of approach lanes increased the probability of severity of cyclist casualty occurrence. In addition, sudden braking statistically significantly increased the cyclist casualty severity at the 95% level of confidence. The result concluded that cyclist casualty severity was highly related to approach a number of lanes and sudden braking. Further research should be carried out an in-depth analysis to explore connectivity of sudden braking and approach number of lanes in order to investigate the driver’s behavior at approach locations. The output of this research will inform investment in measure to improve the safety of cyclists at roundabouts.Keywords: binary logistic regression, casualty severity, cyclist safety, roundabout
Procedia PDF Downloads 17818408 Stature and Gender Estimation Using Foot Measurements in South Indian Population
Authors: Jagadish Rao Padubidri, Mehak Bhandary, Sowmya J. Rao
Abstract:
Introduction: The significance of the human foot and its measurements in identifying an individual has been proved a lot of times by different studies in different geographical areas and its association to the stature and gender of the individual has been justified by many researches. In our study we have used different foot measurements including the length, width, malleol height and navicular height for establishing its association to stature and gender and to find out its accuracy. The purpose of this study is to show the relation of foot measurements with stature and gender, and to derive Multiple and Logistic regression equations for stature and gender estimation in South Indian population. Materials and Methods: The subjects for this study were 200 South Indian students out of which 100 were females and 100 were males, aged between 18 to 24 years. The data for the present study included the stature, foot length, foot breath, foot malleol height, foot navicular height of both right and left foot. Descriptive statistics, T-test and Pearson correlation coefficients were derived between stature, gender and foot measurements. The stature was estimated from right and left foot measurements for both male and female South Indian population using multiple regression analysis and logistic regression analysis for gender estimation. Results: The means, standard deviation, stature, right and left foot measurements and T-test in male population were higher than in females. LFL (Left foot length) is more than RFL (Right Foot length) in male groups, but in female groups the length of both foot are almost equal [RFL=226.6, LFL=227.1]. There is not much of difference in means of RFW (Right foot width) and LFW (Left foot width) in both the genders. Significant difference were seen in mean values of malleol and navicular height of right and left feet in male gender. No such difference was seen in female subjects. Conclusions: The study has successfully demonstrated the correlation of foot length in stature estimation in all the three study groups in both right and left foot. Next in parameters are Foot width and malleol height in estimating stature among male and female groups. Navicular height of both right and left foot showed poor relationship with stature estimation in both male and female groups. Multiple regression equations for both right and left foot measurements to estimate stature were derived with standard error ranging from 11-12 cm in males and 10-11 cm in females. The SEE was 5.8 when both male and female groups were pooled together. The logistic regression model which was derived to determine gender showed 85% accuracy and 92.5% accuracy using right and left foot measurements respectively. We believe that stature and gender can be estimated with foot measurements in South Indian population.Keywords: foot length, gender, stature, South Indian
Procedia PDF Downloads 33518407 Regression for Doubly Inflated Multivariate Poisson Distributions
Authors: Ishapathik Das, Sumen Sen, N. Rao Chaganty, Pooja Sengupta
Abstract:
Dependent multivariate count data occur in several research studies. These data can be modeled by a multivariate Poisson or Negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells are much larger than other cells, then the copula based multivariate Poisson (or Negative binomial) distribution may not fit well and it is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher compared to the other cells, and develop a doubly inflated multivariate Poisson distribution function using multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. For illustrating the proposed methodologies, we present a real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation to estimate unknown parameters of the models.Keywords: copula, Gaussian copula, multivariate distributions, inflated distributios
Procedia PDF Downloads 15618406 Modeling of the Effect of Explosives, Geological and Geotechnical Parameters on the Stability of Rock Masses Case of Marrakech: Agadir Highway, Morocco
Authors: Taoufik Benchelha, Toufik Remmal, Rachid El Hamdouni, Hamou Mansouri, Houssein Ejjaouani, Halima Jounaid, Said Benchelha
Abstract:
During the earthworks for the construction of Marrakech-Agadir highway in southern Morocco, which crosses mountainous areas of the High Western Atlas, the main problem faced is the stability of the slopes. Indeed, the use of explosives as a means of excavation associated with the geological structure of the terrain encountered can trigger major ruptures and cause damage which depends on the intrinsic characteristics of the rock mass. The study consists of a geological and geotechnical analysis of several unstable zones located along the route, mobilizing millions of cubic meters of rock, with deduction of the parameters influencing slope stability. From this analysis, a predictive model for rock mass stability is carried out, based on a statistic method of logistic regression, in order to predict the geomechanical behavior of the rock slopes constrained by earthworks.Keywords: explosive, logistic regression, rock mass, slope stability
Procedia PDF Downloads 37618405 Effects of Cash Transfers Mitigation Impacts in the Face of Socioeconomic External Shocks: Evidence from Egypt
Authors: Basma Yassa
Abstract:
Evidence on cash transfers’ effectiveness in mitigating macro and idiosyncratic shocks’ impacts has been mixed and is mostly concentrated in Latin America, Sub-Saharan Africa, and South Asia with very limited evidence from the MENA region. Yet conditional cash transfers schemes have been continually used, especially in Egypt, as the main social protection tool in response to the recent socioeconomic crises and macro shocks. We use 2 panel datasets and 1 cross-sectional dataset to estimate the effectiveness of cash transfers as a shock-mitigative mechanism in the Egyptian context. In this paper, the results from the different models (Panel Fixed Effects model and the Regression Discontinuity Design (RDD) model) confirm that micro and macro shocks lead to significant decline in several household-level welfare outcomes and that Takaful cash transfers have a significant positive impact in mitigating the negative shock impacts, especially on households’ debt incidence, debt levels, and asset ownership, but not necessarily on food, and non-food expenditure levels. The results indicate large positive significant effects on decreasing household incidence of debt by up to 12.4 percent and lowered the debt size by approximately 18 percent among Takaful beneficiaries compared to non-beneficiaries’. Similar evidence is found on asset ownership levels, as the RDD model shows significant positive effects on total asset ownership and productive asset ownership, but the model failed to detect positive impacts on per capita food and non-food expenditures. Further extensions are still in progress to compare the models’ results with the DID model results when using a nationally representative ELMPS panel data (2018/2024) rounds. Finally, our initial analysis suggests that conditional cash transfers are effective in buffering the negative shock impacts on certain welfare indicators even after successive macro-economic shocks in 2022 and 2023 in the Egyptian Context.Keywords: cash transfers, fixed effects, household welfare, household debt, micro shocks, regression discontinuity design
Procedia PDF Downloads 4618404 Homeless Population Modeling and Trend Prediction Through Identifying Key Factors and Machine Learning
Authors: Shayla He
Abstract:
Background and Purpose: According to Chamie (2017), it’s estimated that no less than 150 million people, or about 2 percent of the world’s population, are homeless. The homeless population in the United States has grown rapidly in the past four decades. In New York City, the sheltered homeless population has increased from 12,830 in 1983 to 62,679 in 2020. Knowing the trend on the homeless population is crucial at helping the states and the cities make affordable housing plans, and other community service plans ahead of time to better prepare for the situation. This study utilized the data from New York City, examined the key factors associated with the homelessness, and developed systematic modeling to predict homeless populations of the future. Using the best model developed, named HP-RNN, an analysis on the homeless population change during the months of 2020 and 2021, which were impacted by the COVID-19 pandemic, was conducted. Moreover, HP-RNN was tested on the data from Seattle. Methods: The methodology involves four phases in developing robust prediction methods. Phase 1 gathered and analyzed raw data of homeless population and demographic conditions from five urban centers. Phase 2 identified the key factors that contribute to the rate of homelessness. In Phase 3, three models were built using Linear Regression, Random Forest, and Recurrent Neural Network (RNN), respectively, to predict the future trend of society's homeless population. Each model was trained and tuned based on the dataset from New York City for its accuracy measured by Mean Squared Error (MSE). In Phase 4, the final phase, the best model from Phase 3 was evaluated using the data from Seattle that was not part of the model training and tuning process in Phase 3. Results: Compared to the Linear Regression based model used by HUD et al (2019), HP-RNN significantly improved the prediction metrics of Coefficient of Determination (R2) from -11.73 to 0.88 and MSE by 99%. HP-RNN was then validated on the data from Seattle, WA, which showed a peak %error of 14.5% between the actual and the predicted count. Finally, the modeling results were collected to predict the trend during the COVID-19 pandemic. It shows a good correlation between the actual and the predicted homeless population, with the peak %error less than 8.6%. Conclusions and Implications: This work is the first work to apply RNN to model the time series of the homeless related data. The Model shows a close correlation between the actual and the predicted homeless population. There are two major implications of this result. First, the model can be used to predict the homeless population for the next several years, and the prediction can help the states and the cities plan ahead on affordable housing allocation and other community service to better prepare for the future. Moreover, this prediction can serve as a reference to policy makers and legislators as they seek to make changes that may impact the factors closely associated with the future homeless population trend.Keywords: homeless, prediction, model, RNN
Procedia PDF Downloads 12118403 Separating Landform from Noise in High-Resolution Digital Elevation Models through Scale-Adaptive Window-Based Regression
Authors: Anne M. Denton, Rahul Gomes, David W. Franzen
Abstract:
High-resolution elevation data are becoming increasingly available, but typical approaches for computing topographic features, like slope and curvature, still assume small sliding windows, for example, of size 3x3. That means that the digital elevation model (DEM) has to be resampled to the scale of the landform features that are of interest. Any higher resolution is lost in this resampling. When the topographic features are computed through regression that is performed at the resolution of the original data, the accuracy can be much higher, and the reported result can be adjusted to the length scale that is relevant locally. Slope and variance are calculated for overlapping windows, meaning that one regression result is computed per raster point. The number of window centers per area is the same for the output as for the original DEM. Slope and variance are computed by performing regression on the points in the surrounding window. Such an approach is computationally feasible because of the additive nature of regression parameters and variance. Any doubling of window size in each direction only takes a single pass over the data, corresponding to a logarithmic scaling of the resulting algorithm as a function of the window size. Slope and variance are stored for each aggregation step, allowing the reported slope to be selected to minimize variance. The approach thereby adjusts the effective window size to the landform features that are characteristic to the area within the DEM. Starting with a window size of 2x2, each iteration aggregates 2x2 non-overlapping windows from the previous iteration. Regression results are stored for each iteration, and the slope at minimal variance is reported in the final result. As such, the reported slope is adjusted to the length scale that is characteristic of the landform locally. The length scale itself and the variance at that length scale are also visualized to aid in interpreting the results for slope. The relevant length scale is taken to be half of the window size of the window over which the minimum variance was achieved. The resulting process was evaluated for 1-meter DEM data and for artificial data that was constructed to have defined length scales and added noise. A comparison with ESRI ArcMap was performed and showed the potential of the proposed algorithm. The resolution of the resulting output is much higher and the slope and aspect much less affected by noise. Additionally, the algorithm adjusts to the scale of interest within the region of the image. These benefits are gained without additional computational cost in comparison with resampling the DEM and computing the slope over 3x3 images in ESRI ArcMap for each resolution. In summary, the proposed approach extracts slope and aspect of DEMs at the lengths scales that are characteristic locally. The result is of higher resolution and less affected by noise than existing techniques.Keywords: high resolution digital elevation models, multi-scale analysis, slope calculation, window-based regression
Procedia PDF Downloads 12918402 Lifestyle Factors Associated With Overweight/obesity Status In Croatian Adolescents: A Population-Based Study
Authors: Lovro Štefan
Abstract:
The main purpose of the present study was to investigate the associations between the overweight/obesity status and lifestyle factors. In this cross-sectional study, participants were 1950 urban secondary-school students (54.7% of female students) aged 17-18 years old. Dependent variable was body-mass index status derived from self-reported height and weight. The outcome was binarised, where participants with value <25 kg/m2 were collapsed into „normal“, while those ≥25 kg/m2 into „overweight/obesity“ category. Independent variables were gender, type of school, physical activity, sedentary behaviour, self-rated health, self-perceived socioeconomic status and psychological distress. The associations between the dependent and independent variables were analyzed by using multiple logistic regression analysis. In the univariate model, being overweight/obese was significantly associated with being a male student (OR 0.31; 95% CI 0.23 to 0.42), attending a vocational school (OR 1.87; 95% CI 1.42 to 2.48), not meeting the recommendations for moderate-to-vigorous physical activity (OR 0.44; 95% CI 0.22 to 0.88), more time spending in sedentary behaviour (OR 1.53; 95% CI 1.07 to 2.19), poor self-rated health (OR 0.35, 95% CI 0.20 to 0.56) and lower socioeconomic status (OR 0.63; 95% CI 0.48 to 0.84). In the multivariate model, the same associations occured between the dependent and independent variable. In both models, psychological distress was not associated with being overweight/obese. In conclusion, our findings suggest, that lifestyle factors are independently associated with body-mass indexKeywords: body mass index, secondary-school students, Croatia, physical activity, sedentary behaviour, logistic regression
Procedia PDF Downloads 8918401 Automating and Optimization Monitoring Prognostics for Rolling Bearing
Authors: H. Hotait, X. Chiementin, L. Rasolofondraibe
Abstract:
This paper presents a continuous work to detect the abnormal state in the rolling bearing by studying the vibration signature analysis and calculation of the remaining useful life. To achieve these aims, two methods; the first method is the classification to detect the degradation state by the AOM-OPTICS (Acousto-Optic Modulator) method. The second one is the prediction of the degradation state using least-squares support vector regression and then compared with the linear degradation model. An experimental investigation on ball-bearing was conducted to see the effectiveness of the used method by applying the acquired vibration signals. The proposed model for predicting the state of bearing gives us accurate results with the experimental and numerical data.Keywords: bearings, automatization, optimization, prognosis, classification, defect detection
Procedia PDF Downloads 12018400 Foreign Investment, Technological Diffusion and Competiveness of Exports: A Case for Textile Industry in Pakistan
Authors: Syed Toqueer Akhter, Muhammad Awais
Abstract:
Pakistan is a country which is gifted by naturally abundant resources these resources are a pioneer towards a prospect and developed country. Pakistan is the fourth largest exporter of the textile in the world and with the passage of time the competitiveness of these exports is subject to a decline. With a lot of International players in the textile world like China, Bangladesh, India, and Sri Lanka, Pakistan needs to put up a lot of effort to compete with these countries. This research paper would determine the impact of Foreign Direct Investment upon technological diffusion and that how significantly it may be affecting on export performance of the country. It would also demonstrate that with the increase in Foreign Direct Investment, technological diffusion, strong property rights, and using different policy tools, export competitiveness of the country could be improved. The research has been carried out using time series data from 1995 to 2013 and the results have been estimated by using competing Econometrics modes such as Robust regression and Generalized least squares so that to consolidate the impact of the Foreign Investments and Technological diffusion upon export competitiveness comprehensively. Distributed Lag model has also been used to encompass the lagged effect of policy tools variables used by the government. Model estimates entail that 'FDI' and 'Technological Diffusion' do have a significant impact on the competitiveness of the exports of Pakistan. It may also be inferred that competitiveness of Textile Sector requires integrated policy framework, primarily including the reduction in interest rates, providing subsides, and manufacturing of value added products.Keywords: high technology export, robust regression, patents, technological diffusion, export competitiveness
Procedia PDF Downloads 50118399 Spatial Differentiation Patterns and Influencing Mechanism of Urban Greening in China: Based on Data of 289 Cities
Authors: Fangzheng Li, Xiong Li
Abstract:
Significant differences in urban greening have occurred in Chinese cities, which accompanied with China's rapid urbanization. However, few studies focused on the spatial differentiation of urban greening in China with large amounts of data. The spatial differentiation pattern, spatial correlation characteristics and the distribution shape of urban green space ratio, urban green coverage rate and public green area per capita were calculated and analyzed, using Global and Local Moran's I using data from 289 cities in 2014. We employed Spatial Lag Model and Spatial Error Model to assess the impacts of urbanization process on urban greening of China. Then we used Geographically Weighted Regression to estimate the spatial variations of the impacts. The results showed: 1. a significant spatial dependence and heterogeneity existed in urban greening values, and the differentiation patterns were featured by the administrative grade and the spatial agglomeration simultaneously; 2. it revealed that urbanization has a negative correlation with urban greening in Chinese cities. Among the indices, the the proportion of secondary industry, urbanization rate, population and the scale of urban land use has significant negative correlation with the urban greening of China. Automobile density and per capita Gross Domestic Product has no significant impact. The results of GWR modeling showed that the relationship between urbanization and urban greening was not constant in space. Further, the local parameter estimates suggested significant spatial variation in the impacts of various urbanization factors on urban greening.Keywords: China’s urbanization, geographically weighted regression, spatial differentiation pattern, urban greening
Procedia PDF Downloads 46118398 Support Vector Regression for Retrieval of Soil Moisture Using Bistatic Scatterometer Data at X-Band
Authors: Dileep Kumar Gupta, Rajendra Prasad, Pradeep Kumar, Varun Narayan Mishra, Ajeet Kumar Vishwakarma, Prashant K. Srivastava
Abstract:
An approach was evaluated for the retrieval of soil moisture of bare soil surface using bistatic scatterometer data in the angular range of 200 to 700 at VV- and HH- polarization. The microwave data was acquired by specially designed X-band (10 GHz) bistatic scatterometer. The linear regression analysis was done between scattering coefficients and soil moisture content to select the suitable incidence angle for retrieval of soil moisture content. The 250 incidence angle was found more suitable. The support vector regression analysis was used to approximate the function described by the input-output relationship between the scattering coefficient and corresponding measured values of the soil moisture content. The performance of support vector regression algorithm was evaluated by comparing the observed and the estimated soil moisture content by statistical performance indices %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE). The values of %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE) were found 2.9451, 1.0986, and 0.9214, respectively at HH-polarization. At VV- polarization, the values of %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE) were found 3.6186, 0.9373, and 0.9428, respectively.Keywords: bistatic scatterometer, soil moisture, support vector regression, RMSE, %Bias, NSE
Procedia PDF Downloads 42818397 Young Adult Gay Men's Healthcare Access in the Era of the Affordable Care Act
Authors: Marybec Griffin
Abstract:
Purpose: The purpose of this cross-sectional study was to get a better understanding of healthcare usage and satisfaction among young adult gay men (YAGM), including the facility used as the usual source of healthcare, preference for coordinated healthcare, and if their primary care provider (PCP) adequately addressed the health needs of gay men. Methods: Interviews were conducted among n=800 YAGM in New York City (NYC). Participants were surveyed about their sociodemographic characteristics and healthcare usage and satisfaction access using multivariable logistic regression models. The surveys were conducted between November 2015 and June 2016. Results: The mean age of the sample was 24.22 years old (SD=4.26). The racial and ethnic background of the participants is as follows: 35.8% (n=286) Black Non-Hispanic, 31.9% (n=225) Hispanic/Latino, 20.5% (n=164) White Non-Hispanic, 4.4% (n=35) Asian/Pacific Islander, and 6.9% (n=55) reporting some other racial or ethnic background. 31.1% (n=249) of the sample had an income below $14,999. 86.7% (n=694) report having either public or private health insurance. For usual source of healthcare, 44.6% (n=357) of the sample reported a private doctor’s office, 16.3% (n=130) reported a community health center, and 7.4% (n=59) reported an urgent care facility, and 7.6% (n=61) reported not having a usual source of healthcare. 56.4% (n=451) of the sample indicated a preference for coordinated healthcare. 54% (n=334) of the sample were very satisfied with their healthcare. Findings from multivariable logistical regression models indicate that participants with higher incomes (AOR=0.54, 95% CI 0.36-0.81, p < 0.01) and participants with a PCP (AOR=0.12, 95% CI 0.07-0.20, p < 0.001) were less likely to use a walk-in facility as their usual source of healthcare. Results from the second multivariable logistic regression model indicated that participants who experienced discrimination in a healthcare setting were less likely to prefer coordinated healthcare (AOR=0.63, 95% CI 0.42-0.96, p < 0.05). In the final multivariable logistic model, results indicated that participants who had disclosed their sexual orientation to their PCP (AOR=2.57, 95% CI 1.25-5.21, p < 0.01) and were comfortable discussing their sexual activity with their PCP (AOR=8.04, 95% CI 4.76-13.58, p < 0.001) were more likely to agree that their PCP adequately addressed the healthcare needs of gay men. Conclusion: Understanding healthcare usage and satisfaction among YAGM is necessary as the healthcare landscape changes, especially given the relatively recent addition of urgent care facilities. The type of healthcare facility used as a usual source of care influences the ability to seek comprehensive and coordinated healthcare services. While coordinated primary and sexual healthcare may be ideal, individual preference for this coordination among YAGM is desired but may be limited due to experiences of discrimination in primary care settings.Keywords: healthcare policy, gay men, healthcare access, Affordable Care Act
Procedia PDF Downloads 23918396 Application of Logistics Regression Model to Ascertain the Determinants of Food Security among Households in Maiduguri, Metropolis, Borno State, Nigeria
Authors: Abdullahi Yahaya Musa, Harun Rann Bakari
Abstract:
The study examined the determinants of food security among households in Maiduguri, Metropolis, Borno State, Nigeria. The objectives of the study are to: examine the determinants of food security among households; identify the coping strategies employed by food-insecure households in Maiduguri, Metropolis, Borno State, Nigeria. The population of the study is 843,964 respondents out of which 400 respondents were sampled. The study used a self-developed questionnaire to collect data from four hundred (400) respondents. Four hundred (400) copies of questionnaires were administered and all were retrieved, making 100% return rate. The study employed descriptive and inferential statistics for data analysis. Descriptive statistics (frequency counts and percentages) was used to analyze the socio-economic characteristics of the respondents and objective four, while inferential statistics (logit regression analysis) was used to analyze one. Four hundred (400) copies of questionnaires were administered and all the four hundred (400) were retrieved, making a 100% return rate. The results were presented in tables and discussed according to the research objectives. The study revealed that HHA, HHE, HHSZ, HHSX, HHAS, HHI, HHFS, HHFE, HHAC and HHCDR were the determinants of food security in Maiduguri Metropolis. Relying on less preferred foods, purchasing food on credit, limiting food intake to ensure children get enough, borrowing money to buy foodstuffs, relying on help from relatives or friends outside the household, adult family members skipping or reducing a meal because of insufficient finances and ration money to household members to buy street food were the coping strategies employed by food-insecure households in Maiduguri metropolis. The study recommended that Nigeria Government should intensify the fight against the Boko haram insurgency. This will put an end to Boko Haram Insurgency and enable farmers to return to farming in Borno state.Keywords: internally displaced persons, food security, coping strategies, descriptive statistics, logistics regression model, odd ratio
Procedia PDF Downloads 14718395 Assessment of Soil Salinity through Remote Sensing Technique in the Coastal Region of Bangladesh
Abstract:
Soil salinity is a major problem for the coastal region of Bangladesh, which has been increasing for the last four decades. Determination of soil salinity is essential for proper land use planning for agricultural crop production. The aim of the research is to estimate and monitor the soil salinity in the study area. Remote sensing can be an effective tool for detecting soil salinity in data-scarce conditions. In the research, Landsat 8 is used, which required atmospheric and radiometric correction, and nine soil salinity indices are applied to develop a soil salinity map. Ground soil salinity data, i.e., EC value, is collected as a printed map which is then scanned and digitized to develop a point shapefile. Linear regression is made between satellite-based generated map and ground soil salinity data, i.e., EC value. The results show that maximum R² value is found for salinity index SI 7 = G*R/B representing 0.022. This minimal R² value refers that there is a negligible relationship between ground EC value and salinity index generated value. Hence, these indices are not appropriate to assess soil salinity though many studies used those soil salinity indices successfully. Therefore, further research is necessary to formulate a model for determining the soil salinity in the coastal of Bangladesh.Keywords: soil salinity, EC, Landsat 8, salinity indices, linear regression, remote sensing
Procedia PDF Downloads 34218394 Deep Learning for Qualitative and Quantitative Grain Quality Analysis Using Hyperspectral Imaging
Authors: Ole-Christian Galbo Engstrøm, Erik Schou Dreier, Birthe Møller Jespersen, Kim Steenstrup Pedersen
Abstract:
Grain quality analysis is a multi-parameterized problem that includes a variety of qualitative and quantitative parameters such as grain type classification, damage type classification, and nutrient regression. Currently, these parameters require human inspection, a multitude of instruments employing a variety of sensor technologies, and predictive model types or destructive and slow chemical analysis. This paper investigates the feasibility of applying near-infrared hyperspectral imaging (NIR-HSI) to grain quality analysis. For this study two datasets of NIR hyperspectral images in the wavelength range of 900 nm - 1700 nm have been used. Both datasets contain images of sparsely and densely packed grain kernels. The first dataset contains ~87,000 image crops of bulk wheat samples from 63 harvests where protein value has been determined by the FOSS Infratec NOVA which is the golden industry standard for protein content estimation in bulk samples of cereal grain. The second dataset consists of ~28,000 image crops of bulk grain kernels from seven different wheat varieties and a single rye variety. In the first dataset, protein regression analysis is the problem to solve while variety classification analysis is the problem to solve in the second dataset. Deep convolutional neural networks (CNNs) have the potential to utilize spatio-spectral correlations within a hyperspectral image to simultaneously estimate the qualitative and quantitative parameters. CNNs can autonomously derive meaningful representations of the input data reducing the need for advanced preprocessing techniques required for classical chemometric model types such as artificial neural networks (ANNs) and partial least-squares regression (PLS-R). A comparison between different CNN architectures utilizing 2D and 3D convolution is conducted. These results are compared to the performance of ANNs and PLS-R. Additionally, a variety of preprocessing techniques from image analysis and chemometrics are tested. These include centering, scaling, standard normal variate (SNV), Savitzky-Golay (SG) filtering, and detrending. The results indicate that the combination of NIR-HSI and CNNs has the potential to be the foundation for an automatic system unifying qualitative and quantitative grain quality analysis within a single sensor technology and predictive model type.Keywords: deep learning, grain analysis, hyperspectral imaging, preprocessing techniques
Procedia PDF Downloads 9918393 A Comparative Analysis of Machine Learning Techniques for PM10 Forecasting in Vilnius
Authors: Mina Adel Shokry Fahim, Jūratė Sužiedelytė Visockienė
Abstract:
With the growing concern over air pollution (AP), it is clear that this has gained more prominence than ever before. The level of consciousness has increased and a sense of knowledge now has to be forwarded as a duty by those enlightened enough to disseminate it to others. This realisation often comes after an understanding of how poor air quality indices (AQI) damage human health. The study focuses on assessing air pollution prediction models specifically for Lithuania, addressing a substantial need for empirical research within the region. Concentrating on Vilnius, it specifically examines particulate matter concentrations 10 micrometers or less in diameter (PM10). Utilizing Gaussian Process Regression (GPR) and Regression Tree Ensemble, and Regression Tree methodologies, predictive forecasting models are validated and tested using hourly data from January 2020 to December 2022. The study explores the classification of AP data into anthropogenic and natural sources, the impact of AP on human health, and its connection to cardiovascular diseases. The study revealed varying levels of accuracy among the models, with GPR achieving the highest accuracy, indicated by an RMSE of 4.14 in validation and 3.89 in testing.Keywords: air pollution, anthropogenic and natural sources, machine learning, Gaussian process regression, tree ensemble, forecasting models, particulate matter
Procedia PDF Downloads 53