Search results for: regression models
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8845

Search results for: regression models

8605 A Kolmogorov-Smirnov Type Goodness-Of-Fit Test of Multinomial Logistic Regression Model in Case-Control Studies

Authors: Chen Li-Ching

Abstract:

The multinomial logistic regression model is used popularly for inferring the relationship of risk factors and disease with multiple categories. This study based on the discrepancy between the nonparametric maximum likelihood estimator and semiparametric maximum likelihood estimator of the cumulative distribution function to propose a Kolmogorov-Smirnov type test statistic to assess adequacy of the multinomial logistic regression model for case-control data. A bootstrap procedure is presented to calculate the critical value of the proposed test statistic. Empirical type I error rates and powers of the test are performed by simulation studies. Some examples will be illustrated the implementation of the test.

Keywords: case-control studies, goodness-of-fit test, Kolmogorov-Smirnov test, multinomial logistic regression

Procedia PDF Downloads 425
8604 Analysis of Atomic Models in High School Physics Textbooks

Authors: Meng-Fei Cheng, Wei Fneg

Abstract:

New Taiwan high school standards emphasize employing scientific models and modeling practices in physics learning. However, to our knowledge. Few studies address how scientific models and modeling are approached in current science teaching, and they do not examine the views of scientific models portrayed in the textbooks. To explore the views of scientific models and modeling in textbooks, this study investigated the atomic unit in different textbook versions as an example and provided suggestions for modeling curriculum. This study adopted a quantitative analysis of qualitative data in the atomic units of four mainstream version of Taiwan high school physics textbooks. The models were further analyzed using five dimensions of the views of scientific models (nature of models, multiple models, purpose of the models, testing models, and changing models); each dimension had three levels (low, medium, high). Descriptive statistics were employed to compare the frequency of describing the five dimensions of the views of scientific models in the atomic unit to understand the emphasis of the views and to compare the frequency of the eight scientific models’ use to investigate the atomic model that was used most often in the textbooks. Descriptive statistics were further utilized to investigate the average levels of the five dimensions of the views of scientific models to examine whether the textbooks views were close to the scientific view. The average level of the five dimensions of the eight atomic models were also compared to examine whether the views of the eight atomic models were close to the scientific views. The results revealed the following three major findings from the atomic unit. (1) Among the five dimensions of the views of scientific models, the most portrayed dimension was the 'purpose of models,' and the least portrayed dimension was 'multiple models.' The most diverse view was the 'purpose of models,' and the most sophisticated scientific view was the 'nature of models.' The least sophisticated scientific view was 'multiple models.' (2) Among the eight atomic models, the most mentioned model was the atomic nucleus model, and the least mentioned model was the three states of matter. (3) Among the correlations between the five dimensions, the dimension of 'testing models' was highly related to the dimension of 'changing models.' In short, this study examined the views of scientific models based on the atomic units of physics textbooks to identify the emphasized and disregarded views in the textbooks. The findings suggest how future textbooks and curriculum can provide a thorough view of scientific models to enhance students' model-based learning.

Keywords: atomic models, textbooks, science education, scientific model

Procedia PDF Downloads 123
8603 A Study on Inference from Distance Variables in Hedonic Regression

Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro

Abstract:

In urban area, several landmarks may affect housing price and rents, hedonic analysis should employ distance variables corresponding to each landmarks. Unfortunately, the effects of distances to landmarks on housing prices are generally not consistent with the true price. These distance variables may cause magnitude error in regression, pointing a problem of spatial multicollinearity. In this paper, we provided some approaches for getting the samples with less bias and method on locating the specific sampling area to avoid the multicollinerity problem in two specific landmarks case.

Keywords: landmarks, hedonic regression, distance variables, collinearity, multicollinerity

Procedia PDF Downloads 423
8602 Performance of the Cmip5 Models in Simulation of the Present and Future Precipitation over the Lake Victoria Basin

Authors: M. A. Wanzala, L. A. Ogallo, F. J. Opijah, J. N. Mutemi

Abstract:

The usefulness and limitations in climate information are due to uncertainty inherent in the climate system. For any given region to have sustainable development it is important to apply climate information into its socio-economic strategic plans. The overall objective of the study was to assess the performance of the Coupled Model Inter-comparison Project (CMIP5) over the Lake Victoria Basin. The datasets used included the observed point station data, gridded rainfall data from Climate Research Unit (CRU) and hindcast data from eight CMIP5. The methodology included trend analysis, spatial analysis, correlation analysis, Principal Component Analysis (PCA) regression analysis, and categorical statistical skill score. Analysis of the trends in the observed rainfall records indicated an increase in rainfall variability both in space and time for all the seasons. The spatial patterns of the individual models output from the models of MPI, MIROC, EC-EARTH and CNRM were closest to the observed rainfall patterns.

Keywords: categorical statistics, coupled model inter-comparison project, principal component analysis, statistical downscaling

Procedia PDF Downloads 340
8601 Forecasting of Grape Juice Flavor by Using Support Vector Regression

Authors: Ren-Jieh Kuo, Chun-Shou Huang

Abstract:

The research of juice flavor forecasting has become more important in China. Due to the fast economic growth in China, many different kinds of juices have been introduced to the market. If a beverage company can understand their customers’ preference well, the juice can be served more attractively. Thus, this study intends to introduce the basic theory and computing process of grapes juice flavor forecasting based on support vector regression (SVR). Applying SVR, BPN and LR to forecast the flavor of grapes juice in real data, the result shows that SVR is more suitable and effective at predicting performance.

Keywords: flavor forecasting, artificial neural networks, Support Vector Regression, China

Procedia PDF Downloads 450
8600 Generalized Correlation Coefficient in Genome-Wide Association Analysis of Cognitive Ability in Twins

Authors: Afsaneh Mohammadnejad, Marianne Nygaard, Jan Baumbach, Shuxia Li, Weilong Li, Jesper Lund, Jacob v. B. Hjelmborg, Lene Christensen, Qihua Tan

Abstract:

Cognitive impairment in the elderly is a key issue affecting the quality of life. Despite a strong genetic background in cognition, only a limited number of single nucleotide polymorphisms (SNPs) have been found. These explain a small proportion of the genetic component of cognitive function, thus leaving a large proportion unaccounted for. We hypothesize that one reason for this missing heritability is the misspecified modeling in data analysis concerning phenotype distribution as well as the relationship between SNP dosage and the phenotype of interest. In an attempt to overcome these issues, we introduced a model-free method based on the generalized correlation coefficient (GCC) in a genome-wide association study (GWAS) of cognitive function in twin samples and compared its performance with two popular linear regression models. The GCC-based GWAS identified two genome-wide significant (P-value < 5e-8) SNPs; rs2904650 near ZDHHC2 on chromosome 8 and rs111256489 near CD6 on chromosome 11. The kinship model also detected two genome-wide significant SNPs, rs112169253 on chromosome 4 and rs17417920 on chromosome 7, whereas no genome-wide significant SNPs were found by the linear mixed model (LME). Compared to the linear models, more meaningful biological pathways like GABA receptor activation, ion channel transport, neuroactive ligand-receptor interaction, and the renin-angiotensin system were found to be enriched by SNPs from GCC. The GCC model outperformed the linear regression models by identifying more genome-wide significant genetic variants and more meaningful biological pathways related to cognitive function. Moreover, GCC-based GWAS was robust in handling genetically related twin samples, which is an important feature in handling genetic confounding in association studies.

Keywords: cognition, generalized correlation coefficient, GWAS, twins

Procedia PDF Downloads 95
8599 Estimation of Coefficients of Ridge and Principal Components Regressions with Multicollinear Data

Authors: Rajeshwar Singh

Abstract:

The presence of multicollinearity is common in handling with several explanatory variables simultaneously due to exhibiting a linear relationship among them. A great problem arises in understanding the impact of explanatory variables on the dependent variable. Thus, the method of least squares estimation gives inexact estimates. In this case, it is advised to detect its presence first before proceeding further. Using the ridge regression degree of its occurrence is reduced but principal components regression gives good estimates in this situation. This paper discusses well-known techniques of the ridge and principal components regressions and applies to get the estimates of coefficients by both techniques. In addition to it, this paper also discusses the conflicting claim on the discovery of the method of ridge regression based on available documents.

Keywords: conflicting claim on credit of discovery of ridge regression, multicollinearity, principal components and ridge regressions, variance inflation factor

Procedia PDF Downloads 372
8598 Identifying and Quantifying Factors Affecting Traffic Crash Severity under Heterogeneous Traffic Flow

Authors: Praveen Vayalamkuzhi, Veeraragavan Amirthalingam

Abstract:

Studies on safety on highways are becoming the need of the hour as over 400 lives are lost every day in India due to road crashes. In order to evaluate the factors that lead to different levels of crash severity, it is necessary to investigate the level of safety of highways and their relation to crashes. In the present study, an attempt is made to identify the factors that contribute to road crashes and to quantify their effect on the severity of road crashes. The study was carried out on a four-lane divided rural highway in India. The variables considered in the analysis includes components of horizontal alignment of highway, viz., straight or curve section; time of day, driveway density, presence of median; median opening; gradient; operating speed; and annual average daily traffic. These variables were considered after a preliminary analysis. The major complexities in the study are the heterogeneous traffic and the speed variation between different classes of vehicles along the highway. To quantify the impact of each of these factors, statistical analyses were carried out using Logit model and also negative binomial regression. The output from the statistical models proved that the variables viz., horizontal components of the highway alignment; driveway density; time of day; operating speed as well as annual average daily traffic show significant relation with the severity of crashes viz., fatal as well as injury crashes. Further, the annual average daily traffic has significant effect on the severity compared to other variables. The contribution of highway horizontal components on crash severity is also significant. Logit models can predict crashes better than the negative binomial regression models. The results of the study will help the transport planners to look into these aspects at the planning stage itself in the case of highways operated under heterogeneous traffic flow condition.

Keywords: geometric design, heterogeneous traffic, road crash, statistical analysis, level of safety

Procedia PDF Downloads 263
8597 Young Adult Gay Men's Healthcare Access in the Era of the Affordable Care Act

Authors: Marybec Griffin

Abstract:

Purpose: The purpose of this cross-sectional study was to get a better understanding of healthcare usage and satisfaction among young adult gay men (YAGM), including the facility used as the usual source of healthcare, preference for coordinated healthcare, and if their primary care provider (PCP) adequately addressed the health needs of gay men. Methods: Interviews were conducted among n=800 YAGM in New York City (NYC). Participants were surveyed about their sociodemographic characteristics and healthcare usage and satisfaction access using multivariable logistic regression models. The surveys were conducted between November 2015 and June 2016. Results: The mean age of the sample was 24.22 years old (SD=4.26). The racial and ethnic background of the participants is as follows: 35.8% (n=286) Black Non-Hispanic, 31.9% (n=225) Hispanic/Latino, 20.5% (n=164) White Non-Hispanic, 4.4% (n=35) Asian/Pacific Islander, and 6.9% (n=55) reporting some other racial or ethnic background. 31.1% (n=249) of the sample had an income below $14,999. 86.7% (n=694) report having either public or private health insurance. For usual source of healthcare, 44.6% (n=357) of the sample reported a private doctor’s office, 16.3% (n=130) reported a community health center, and 7.4% (n=59) reported an urgent care facility, and 7.6% (n=61) reported not having a usual source of healthcare. 56.4% (n=451) of the sample indicated a preference for coordinated healthcare. 54% (n=334) of the sample were very satisfied with their healthcare. Findings from multivariable logistical regression models indicate that participants with higher incomes (AOR=0.54, 95% CI 0.36-0.81, p < 0.01) and participants with a PCP (AOR=0.12, 95% CI 0.07-0.20, p < 0.001) were less likely to use a walk-in facility as their usual source of healthcare. Results from the second multivariable logistic regression model indicated that participants who experienced discrimination in a healthcare setting were less likely to prefer coordinated healthcare (AOR=0.63, 95% CI 0.42-0.96, p < 0.05). In the final multivariable logistic model, results indicated that participants who had disclosed their sexual orientation to their PCP (AOR=2.57, 95% CI 1.25-5.21, p < 0.01) and were comfortable discussing their sexual activity with their PCP (AOR=8.04, 95% CI 4.76-13.58, p < 0.001) were more likely to agree that their PCP adequately addressed the healthcare needs of gay men. Conclusion: Understanding healthcare usage and satisfaction among YAGM is necessary as the healthcare landscape changes, especially given the relatively recent addition of urgent care facilities. The type of healthcare facility used as a usual source of care influences the ability to seek comprehensive and coordinated healthcare services. While coordinated primary and sexual healthcare may be ideal, individual preference for this coordination among YAGM is desired but may be limited due to experiences of discrimination in primary care settings.

Keywords: healthcare policy, gay men, healthcare access, Affordable Care Act

Procedia PDF Downloads 203
8596 Statistical Time-Series and Neural Architecture of Malaria Patients Records in Lagos, Nigeria

Authors: Akinbo Razak Yinka, Adesanya Kehinde Kazeem, Oladokun Oluwagbenga Peter

Abstract:

Time series data are sequences of observations collected over a period of time. Such data can be used to predict health outcomes, such as disease progression, mortality, hospitalization, etc. The Statistical approach is based on mathematical models that capture the patterns and trends of the data, such as autocorrelation, seasonality, and noise, while Neural methods are based on artificial neural networks, which are computational models that mimic the structure and function of biological neurons. This paper compared both parametric and non-parametric time series models of patients treated for malaria in Maternal and Child Health Centres in Lagos State, Nigeria. The forecast methods considered linear regression, Integrated Moving Average, ARIMA and SARIMA Modeling for the parametric approach, while Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) Network were used for the non-parametric model. The performance of each method is evaluated using the Mean Absolute Error (MAE), R-squared (R2) and Root Mean Square Error (RMSE) as criteria to determine the accuracy of each model. The study revealed that the best performance in terms of error was found in MLP, followed by the LSTM and ARIMA models. In addition, the Bootstrap Aggregating technique was used to make robust forecasts when there are uncertainties in the data.

Keywords: ARIMA, bootstrap aggregation, MLP, LSTM, SARIMA, time-series analysis

Procedia PDF Downloads 29
8595 Computational Study of Chromatographic Behavior of a Series of S-Triazine Pesticides Based on Their in Silico Biological and Lipophilicity Descriptors

Authors: Lidija R. Jevrić, Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević

Abstract:

In this paper, quantitative structure-retention relationships (QSRR) analysis was applied in order to correlate in silico biological and lipophilicity molecular descriptors with retention values for the set of selected s-triazine herbicides. In silico generated biological and lipophilicity descriptors were discriminated using generalized pair correlation method (GPCM). According to this method, the significant difference between independent variables can be noticed regardless almost equal correlation with dependent variable. Using established multiple linear regression (MLR) models some biological characteristics could be predicted. Established MLR models were evaluated statistically and the most suitable models were selected and ranked using sum of ranking differences (SRD) method. In this method, as reference values, average experimentally obtained values are used. Additionally, using SRD method, similarities among investigated s-triazine herbicides can be noticed. These analysis were conducted in order to characterize selected s-triazine herbicides for future investigations regarding their biodegradability. This study is financially supported by COST action TD1305.

Keywords: descriptors, generalized pair correlation method, pesticides, sum of ranking differences

Procedia PDF Downloads 261
8594 Calculation of Pressure-Varying Langmuir and Brunauer-Emmett-Teller Isotherm Adsorption Parameters

Authors: Trevor C. Brown, David J. Miron

Abstract:

Gas-solid physical adsorption methods are central to the characterization and optimization of the effective surface area, pore size and porosity for applications such as heterogeneous catalysis, and gas separation and storage. Properties such as adsorption uptake, capacity, equilibrium constants and Gibbs free energy are dependent on the composition and structure of both the gas and the adsorbent. However, challenges remain, in accurately calculating these properties from experimental data. Gas adsorption experiments involve measuring the amounts of gas adsorbed over a range of pressures under isothermal conditions. Various constant-parameter models, such as Langmuir and Brunauer-Emmett-Teller (BET) theories are used to provide information on adsorbate and adsorbent properties from the isotherm data. These models typically do not provide accurate interpretations across the full range of pressures and temperatures. The Langmuir adsorption isotherm is a simple approximation for modelling equilibrium adsorption data and has been effective in estimating surface areas and catalytic rate laws, particularly for high surface area solids. The Langmuir isotherm assumes the systematic filling of identical adsorption sites to a monolayer coverage. The BET model is based on the Langmuir isotherm and allows for the formation of multiple layers. These additional layers do not interact with the first layer and the energetics are equal to the adsorbate as a bulk liquid. This BET method is widely used to measure the specific surface area of materials. Both Langmuir and BET models assume that the affinity of the gas for all adsorption sites are identical and so the calculated adsorbent uptake at the monolayer and equilibrium constant are independent of coverage and pressure. Accurate representations of adsorption data have been achieved by extending the Langmuir and BET models to include pressure-varying uptake capacities and equilibrium constants. These parameters are determined using a novel regression technique called flexible least squares for time-varying linear regression. For isothermal adsorption the adsorption parameters are assumed to vary slowly and smoothly with increasing pressure. The flexible least squares for pressure-varying linear regression (FLS-PVLR) approach assumes two distinct types of discrepancy terms, dynamic and measurement for all parameters in the linear equation used to simulate the data. Dynamic terms account for pressure variation in successive parameter vectors, and measurement terms account for differences between observed and theoretically predicted outcomes via linear regression. The resultant pressure-varying parameters are optimized by minimizing both dynamic and measurement residual squared errors. Validation of this methodology has been achieved by simulating adsorption data for n-butane and isobutane on activated carbon at 298 K, 323 K and 348 K and for nitrogen on mesoporous alumina at 77 K with pressure-varying Langmuir and BET adsorption parameters (equilibrium constants and uptake capacities). This modeling provides information on the adsorbent (accessible surface area and micropore volume), adsorbate (molecular areas and volumes) and thermodynamic (Gibbs free energies) variations of the adsorption sites.

Keywords: Langmuir adsorption isotherm, BET adsorption isotherm, pressure-varying adsorption parameters, adsorbate and adsorbent properties and energetics

Procedia PDF Downloads 190
8593 Estimate of Maximum Expected Intensity of One-Half-Wave Lines Dancing

Authors: A. Bekbaev, M. Dzhamanbaev, R. Abitaeva, A. Karbozova, G. Nabyeva

Abstract:

In this paper, the regression dependence of dancing intensity from wind speed and length of span was established due to the statistic data obtained from multi-year observations on line wires dancing accumulated by power systems of Kazakhstan and the Russian Federation. The lower and upper limitations of the equations parameters were estimated, as well as the adequacy of the regression model. The constructed model will be used in research of dancing phenomena for the development of methods and means of protection against dancing and for zoning plan of the territories of line wire dancing.

Keywords: power lines, line wire dancing, dancing intensity, regression equation, dancing area intensity

Procedia PDF Downloads 285
8592 Using Simulation Modeling Approach to Predict USMLE Steps 1 and 2 Performances

Authors: Chau-Kuang Chen, John Hughes, Jr., A. Dexter Samuels

Abstract:

The prediction models for the United States Medical Licensure Examination (USMLE) Steps 1 and 2 performances were constructed by the Monte Carlo simulation modeling approach via linear regression. The purpose of this study was to build robust simulation models to accurately identify the most important predictors and yield the valid range estimations of the Steps 1 and 2 scores. The application of simulation modeling approach was deemed an effective way in predicting student performances on licensure examinations. Also, sensitivity analysis (a/k/a what-if analysis) in the simulation models was used to predict the magnitudes of Steps 1 and 2 affected by changes in the National Board of Medical Examiners (NBME) Basic Science Subject Board scores. In addition, the study results indicated that the Medical College Admission Test (MCAT) Verbal Reasoning score and Step 1 score were significant predictors of the Step 2 performance. Hence, institutions could screen qualified student applicants for interviews and document the effectiveness of basic science education program based on the simulation results.

Keywords: prediction model, sensitivity analysis, simulation method, USMLE

Procedia PDF Downloads 307
8591 Effects of Machining Parameters on the Surface Roughness and Vibration of the Milling Tool

Authors: Yung C. Lin, Kung D. Wu, Wei C. Shih, Jui P. Hung

Abstract:

High speed and high precision machining have become the most important technology in manufacturing industry. The surface roughness of high precision components is regarded as the important characteristics of the product quality. However, machining chatter could damage the machined surface and restricts the process efficiency. Therefore, selection of the appropriate cutting conditions is of importance to prevent the occurrence of chatter. In addition, vibration of the spindle tool also affects the surface quality, which implies the surface precision can be controlled by monitoring the vibration of the spindle tool. Based on this concept, this study was aimed to investigate the influence of the machining conditions on the surface roughness and the vibration of the spindle tool. To this end, a series of machining tests were conducted on aluminum alloy. In tests, the vibration of the spindle tool was measured by using the acceleration sensors. The surface roughness of the machined parts was examined using white light interferometer. The response surface methodology (RSM) was employed to establish the mathematical models for predicting surface finish and tool vibration, respectively. The correlation between the surface roughness and spindle tool vibration was also analyzed by ANOVA analysis. According to the machining tests, machined surface with or without chattering was marked on the lobes diagram as the verification of the machining conditions. Using multivariable regression analysis, the mathematical models for predicting the surface roughness and tool vibrations were developed based on the machining parameters, cutting depth (a), feed rate (f) and spindle speed (s). The predicted roughness is shown to agree well with the measured roughness, an average percentage of errors of 10%. The average percentage of errors of the tool vibrations between the measurements and the predictions of mathematical model is about 7.39%. In addition, the tool vibration under various machining conditions has been found to have a positive influence on the surface roughness (r=0.78). As a conclusion from current results, the mathematical models were successfully developed for the predictions of the surface roughness and vibration level of the spindle tool under different cutting condition, which can help to select appropriate cutting parameters and to monitor the machining conditions to achieve high surface quality in milling operation.

Keywords: machining parameters, machining stability, regression analysis, surface roughness

Procedia PDF Downloads 200
8590 Power MOSFET Models Including Quasi-Saturation Effect

Authors: Abdelghafour Galadi

Abstract:

In this paper, accurate power MOSFET models including quasi-saturation effect are presented. These models have no internal node voltages determined by the circuit simulator and use one JFET or one depletion mode MOSFET transistors controlled by an “effective” gate voltage taking into account the quasi-saturation effect. The proposed models achieve accurate simulation results with an average error percentage less than 9%, which is an improvement of 21 percentage points compared to the commonly used standard power MOSFET model. In addition, the models can be integrated in any available commercial circuit simulators by using their analytical equations. A description of the models will be provided along with the parameter extraction procedure.

Keywords: power MOSFET, drift layer, quasi-saturation effect, SPICE model

Procedia PDF Downloads 167
8589 Documentation Project on Boat Models from Saqqara, in the Grand Egyptian Museum

Authors: Ayman Aboelkassem, Mohamoud Ali, Rezq Diab

Abstract:

This project aims to document and preserve boat models which were discovered in the Saqqara by Czech Institute of Egyptology archeological mission at Saqqara (GEM numbers, 46007, 46008, 46009). These boat models dates back to Egyptian Old Kingdom and have been transferred to the Conservation Center of the Grand Egyptian Museum, to be displayed at the new museum.The project objectives making such boat models more visible to visitors through the use of 3D reconstructed models and high resolution photos which describe the history of using the boats during the Ancient Egyptian history. Especially, The Grand Egyptian Museum is going to exhibit the second boat of King Khufu from Old kingdom. The project goals are to document the boat models and arrange an exhibition, where such Models going to be displayed next to the Khufu Second Boat. The project shows the importance of using boats in Ancient Egypt, and connecting their usage through Ancient Egyptian periods till now. The boat models had a unique Symbolized in ancient Egypt and connect the public with their kings. The Egyptian kings allowed high ranked employees to put boat models in their tombs which has a great meaning that they hope to fellow their kings in the journey of the afterlife.

Keywords: archaeology, boat models, 3D digital tools for heritage management, museums

Procedia PDF Downloads 96
8588 Incorporating Anomaly Detection in a Digital Twin Scenario Using Symbolic Regression

Authors: Manuel Alves, Angelica Reis, Armindo Lobo, Valdemar Leiras

Abstract:

In industry 4.0, it is common to have a lot of sensor data. In this deluge of data, hints of possible problems are difficult to spot. The digital twin concept aims to help answer this problem, but it is mainly used as a monitoring tool to handle the visualisation of data. Failure detection is of paramount importance in any industry, and it consumes a lot of resources. Any improvement in this regard is of tangible value to the organisation. The aim of this paper is to add the ability to forecast test failures, curtailing detection times. To achieve this, several anomaly detection algorithms were compared with a symbolic regression approach. To this end, Isolation Forest, One-Class SVM and an auto-encoder have been explored. For the symbolic regression PySR library was used. The first results show that this approach is valid and can be added to the tools available in this context as a low resource anomaly detection method since, after training, the only requirement is the calculation of a polynomial, a useful feature in the digital twin context.

Keywords: anomaly detection, digital twin, industry 4.0, symbolic regression

Procedia PDF Downloads 87
8587 Evaluation of the CRISP-DM Business Understanding Step: An Approach for Assessing the Predictive Power of Regression versus Classification for the Quality Prediction of Hydraulic Test Results

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Digitalisation in production technology is a driver for the application of machine learning methods. Through the application of predictive quality, the great potential for saving necessary quality control can be exploited through the data-based prediction of product quality and states. However, the serial use of machine learning applications is often prevented by various problems. Fluctuations occur in real production data sets, which are reflected in trends and systematic shifts over time. To counteract these problems, data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets to extract stable features. Successful process control of the target variables aims to centre the measured values around a mean and minimise variance. Competitive leaders claim to have mastered their processes. As a result, much of the real data has a relatively low variance. For the training of prediction models, the highest possible generalisability is required, which is at least made more difficult by this data availability. The implementation of a machine learning application can be interpreted as a production process. The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model with six phases that describes the life cycle of data science. As in any process, the costs to eliminate errors increase significantly with each advancing process phase. For the quality prediction of hydraulic test steps of directional control valves, the question arises in the initial phase whether a regression or a classification is more suitable. In the context of this work, the initial phase of the CRISP-DM, the business understanding, is critically compared for the use case at Bosch Rexroth with regard to regression and classification. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. Suitable methods for leakage volume flow regression and classification for inspection decision are applied. Impressively, classification is clearly superior to regression and achieves promising accuracies.

Keywords: classification, CRISP-DM, machine learning, predictive quality, regression

Procedia PDF Downloads 109
8586 A Machine Learning Approach for Intelligent Transportation System Management on Urban Roads

Authors: Ashish Dhamaniya, Vineet Jain, Rajesh Chouhan

Abstract:

Traffic management is one of the gigantic issue in most of the urban roads in al-most all metropolitan cities in India. Speed is one of the critical traffic parameters for effective Intelligent Transportation System (ITS) implementation as it decides the arrival rate of vehicles on an intersection which are majorly the point of con-gestions. The study aimed to leverage Machine Learning (ML) models to produce precise predictions of speed on urban roadway links. The research objective was to assess how categorized traffic volume and road width, serving as variables, in-fluence speed prediction. Four tree-based regression models namely: Decision Tree (DT), Random Forest (RF), Extra Tree (ET), and Extreme Gradient Boost (XGB)are employed for this purpose. The models' performances were validated using test data, and the results demonstrate that Random Forest surpasses other machine learning techniques and a conventional utility theory-based model in speed prediction. The study is useful for managing the urban roadway network performance under mixed traffic conditions and effective implementation of ITS.

Keywords: stream speed, urban roads, machine learning, traffic flow

Procedia PDF Downloads 23
8585 Understanding the Impact of Climate-Induced Rural-Urban Migration on the Technical Efficiency of Maize Production in Malawi

Authors: Innocent Pangapanga-Phiri, Eric Dada Mungatana

Abstract:

This study estimates the effect of climate-induced rural-urban migrants (RUM) on maize productivity. It uses panel data gathered by the National Statistics Office and the World Bank to understand the effect of RUM on the technical efficiency of maize production in rural Malawi. The study runs the two-stage Tobit regression to isolate the real effect of rural-urban migration on the technical efficiency of maize production. The results show that RUM significantly reduces the technical efficiency of maize production. However, the interaction of RUM and climate-smart agriculture has a positive and significant influence on the technical efficiency of maize production, suggesting the need for re-investing migrants’ remittances in agricultural activities.

Keywords: climate-smart agriculture, farm productivity, rural-urban migration, panel stochastic frontier models, two-stage Tobit regression

Procedia PDF Downloads 83
8584 Applying Genetic Algorithm in Exchange Rate Models Determination

Authors: Mehdi Rostamzadeh

Abstract:

Genetic Algorithms (GAs) are an adaptive heuristic search algorithm premised on the evolutionary ideas of natural selection and genetic. In this study, we apply GAs for fundamental and technical models of exchange rate determination in exchange rate market. In this framework, we estimated absolute and relative purchasing power parity, Mundell-Fleming, sticky and flexible prices (monetary models), equilibrium exchange rate and portfolio balance model as fundamental models and Auto Regressive (AR), Moving Average (MA), Auto-Regressive with Moving Average (ARMA) and Mean Reversion (MR) as technical models for Iranian Rial against European Union’s Euro using monthly data from January 1992 to December 2014. Then, we put these models into the genetic algorithm system for measuring their optimal weight for each model. These optimal weights have been measured according to four criteria i.e. R-Squared (R2), mean square error (MSE), mean absolute percentage error (MAPE) and root mean square error (RMSE).Based on obtained Results, it seems that for explaining of Iranian Rial against EU Euro exchange rate behavior, fundamental models are better than technical models.

Keywords: exchange rate, genetic algorithm, fundamental models, technical models

Procedia PDF Downloads 244
8583 Use of Predictive Food Microbiology to Determine the Shelf-Life of Foods

Authors: Fatih Tarlak

Abstract:

Predictive microbiology can be considered as an important field in food microbiology in which it uses predictive models to describe the microbial growth in different food products. Predictive models estimate the growth of microorganisms quickly, efficiently, and in a cost-effective way as compared to traditional methods of enumeration, which are long-lasting, expensive, and time-consuming. The mathematical models used in predictive microbiology are mainly categorised as primary and secondary models. The primary models are the mathematical equations that define the growth data as a function of time under a constant environmental condition. The secondary models describe the effects of environmental factors, such as temperature, pH, and water activity (aw) on the parameters of the primary models, including the maximum specific growth rate and lag phase duration, which are the most critical growth kinetic parameters. The combination of primary and secondary models provides valuable information to set limits for the quantitative detection of the microbial spoilage and assess product shelf-life.

Keywords: shelf-life, growth model, predictive microbiology, simulation

Procedia PDF Downloads 164
8582 Impact of Infrastructural Development on Socio-Economic Growth: An Empirical Investigation in India

Authors: Jonardan Koner

Abstract:

The study attempts to find out the impact of infrastructural investment on state economic growth in India. It further tries to determine the magnitude of the impact of infrastructural investment on economic indicator, i.e., per-capita income (PCI) in Indian States. The study uses panel regression technique to measure the impact of infrastructural investment on per-capita income (PCI) in Indian States. Panel regression technique helps incorporate both the cross-section and time-series aspects of the dataset. In order to analyze the difference in impact of the explanatory variables on the explained variables across states, the study uses Fixed Effect Panel Regression Model. The conclusions of the study are that infrastructural investment has a desirable impact on economic development and that the impact is different for different states in India. We analyze time series data (annual frequency) ranging from 1991 to 2010. The study reveals that the infrastructural investment significantly explains the variation of economic indicators.

Keywords: infrastructural investment, multiple regression, panel regression techniques, economic development, fixed effect dummy variable model

Procedia PDF Downloads 342
8581 Comparison of Applicability of Time Series Forecasting Models VAR, ARCH and ARMA in Management Science: Study Based on Empirical Analysis of Time Series Techniques

Authors: Muhammad Tariq, Hammad Tahir, Fawwad Mahmood Butt

Abstract:

Purpose: This study attempts to examine the best forecasting methodologies in the time series. The time series forecasting models such as VAR, ARCH and the ARMA are considered for the analysis. Methodology: The Bench Marks or the parameters such as Adjusted R square, F-stats, Durban Watson, and Direction of the roots have been critically and empirically analyzed. The empirical analysis consists of time series data of Consumer Price Index and Closing Stock Price. Findings: The results show that the VAR model performed better in comparison to other models. Both the reliability and significance of VAR model is highly appreciable. In contrary to it, the ARCH model showed very poor results for forecasting. However, the results of ARMA model appeared double standards i.e. the AR roots showed that model is stationary and that of MA roots showed that the model is invertible. Therefore, the forecasting would remain doubtful if it made on the bases of ARMA model. It has been concluded that VAR model provides best forecasting results. Practical Implications: This paper provides empirical evidences for the application of time series forecasting model. This paper therefore provides the base for the application of best time series forecasting model.

Keywords: forecasting, time series, auto regression, ARCH, ARMA

Procedia PDF Downloads 301
8580 A Quadratic Model to Early Predict the Blastocyst Stage with a Time Lapse Incubator

Authors: Cecile Edel, Sandrine Giscard D'Estaing, Elsa Labrune, Jacqueline Lornage, Mehdi Benchaib

Abstract:

Introduction: The use of incubator equipped with time-lapse technology in Artificial Reproductive Technology (ART) allows a continuous surveillance. With morphocinetic parameters, algorithms are available to predict the potential outcome of an embryo. However, the different proposed time-lapse algorithms do not take account the missing data, and then some embryos could not be classified. The aim of this work is to construct a predictive model even in the case of missing data. Materials and methods: Patients: A retrospective study was performed, in biology laboratory of reproduction at the hospital ‘Femme Mère Enfant’ (Lyon, France) between 1 May 2013 and 30 April 2015. Embryos (n= 557) obtained from couples (n=108) were cultured in a time-lapse incubator (Embryoscope®, Vitrolife, Goteborg, Sweden). Time-lapse incubator: The morphocinetic parameters obtained during the three first days of embryo life were used to build the predictive model. Predictive model: A quadratic regression was performed between the number of cells and time. N = a. T² + b. T + c. N: number of cells at T time (T in hours). The regression coefficients were calculated with Excel software (Microsoft, Redmond, WA, USA), a program with Visual Basic for Application (VBA) (Microsoft) was written for this purpose. The quadratic equation was used to find a value that allows to predict the blastocyst formation: the synthetize value. The area under the curve (AUC) obtained from the ROC curve was used to appreciate the performance of the regression coefficients and the synthetize value. A cut-off value has been calculated for each regression coefficient and for the synthetize value to obtain two groups where the difference of blastocyst formation rate according to the cut-off values was maximal. The data were analyzed with SPSS (IBM, Il, Chicago, USA). Results: Among the 557 embryos, 79.7% had reached the blastocyst stage. The synthetize value corresponds to the value calculated with time value equal to 99, the highest AUC was then obtained. The AUC for regression coefficient ‘a’ was 0.648 (p < 0.001), 0.363 (p < 0.001) for the regression coefficient ‘b’, 0.633 (p < 0.001) for the regression coefficient ‘c’, and 0.659 (p < 0.001) for the synthetize value. The results are presented as follow: blastocyst formation rate under cut-off value versus blastocyst rate formation above cut-off value. For the regression coefficient ‘a’ the optimum cut-off value was -1.14.10-3 (61.3% versus 84.3%, p < 0.001), 0.26 for the regression coefficient ‘b’ (83.9% versus 63.1%, p < 0.001), -4.4 for the regression coefficient ‘c’ (62.2% versus 83.1%, p < 0.001) and 8.89 for the synthetize value (58.6% versus 85.0%, p < 0.001). Conclusion: This quadratic regression allows to predict the outcome of an embryo even in case of missing data. Three regression coefficients and a synthetize value could represent the identity card of an embryo. ‘a’ regression coefficient represents the acceleration of cells division, ‘b’ regression coefficient represents the speed of cell division. We could hypothesize that ‘c’ regression coefficient could represent the intrinsic potential of an embryo. This intrinsic potential could be dependent from oocyte originating the embryo. These hypotheses should be confirmed by studies analyzing relationship between regression coefficients and ART parameters.

Keywords: ART procedure, blastocyst formation, time-lapse incubator, quadratic model

Procedia PDF Downloads 283
8579 Early Impact Prediction and Key Factors Study of Artificial Intelligence Patents: A Method Based on LightGBM and Interpretable Machine Learning

Authors: Xingyu Gao, Qiang Wu

Abstract:

Patents play a crucial role in protecting innovation and intellectual property. Early prediction of the impact of artificial intelligence (AI) patents helps researchers and companies allocate resources and make better decisions. Understanding the key factors that influence patent impact can assist researchers in gaining a better understanding of the evolution of AI technology and innovation trends. Therefore, identifying highly impactful patents early and providing support for them holds immeasurable value in accelerating technological progress, reducing research and development costs, and mitigating market positioning risks. Despite the extensive research on AI patents, accurately predicting their early impact remains a challenge. Traditional methods often consider only single factors or simple combinations, failing to comprehensively and accurately reflect the actual impact of patents. This paper utilized the artificial intelligence patent database from the United States Patent and Trademark Office and the Len.org patent retrieval platform to obtain specific information on 35,708 AI patents. Using six machine learning models, namely Multiple Linear Regression, Random Forest Regression, XGBoost Regression, LightGBM Regression, Support Vector Machine Regression, and K-Nearest Neighbors Regression, and using early indicators of patents as features, the paper comprehensively predicted the impact of patents from three aspects: technical, social, and economic. These aspects include the technical leadership of patents, the number of citations they receive, and their shared value. The SHAP (Shapley Additive exPlanations) metric was used to explain the predictions of the best model, quantifying the contribution of each feature to the model's predictions. The experimental results on the AI patent dataset indicate that, for all three target variables, LightGBM regression shows the best predictive performance. Specifically, patent novelty has the greatest impact on predicting the technical impact of patents and has a positive effect. Additionally, the number of owners, the number of backward citations, and the number of independent claims are all crucial and have a positive influence on predicting technical impact. In predicting the social impact of patents, the number of applicants is considered the most critical input variable, but it has a negative impact on social impact. At the same time, the number of independent claims, the number of owners, and the number of backward citations are also important predictive factors, and they have a positive effect on social impact. For predicting the economic impact of patents, the number of independent claims is considered the most important factor and has a positive impact on economic impact. The number of owners, the number of sibling countries or regions, and the size of the extended patent family also have a positive influence on economic impact. The study primarily relies on data from the United States Patent and Trademark Office for artificial intelligence patents. Future research could consider more comprehensive data sources, including artificial intelligence patent data, from a global perspective. While the study takes into account various factors, there may still be other important features not considered. In the future, factors such as patent implementation and market applications may be considered as they could have an impact on the influence of patents.

Keywords: patent influence, interpretable machine learning, predictive models, SHAP

Procedia PDF Downloads 16
8578 Interactions between Residential Mobility, Car Ownership and Commute Mode: The Case for Melbourne

Authors: Solmaz Jahed Shiran, John Hearne, Tayebeh Saghapour

Abstract:

Daily travel behavior is strongly influenced by the location of the places of residence, education, and employment. Hence a change in those locations due to a move or changes in an occupation leads to a change in travel behavior. Given the interventions of housing mobility and travel behaviors, the hypothesis is that a mobile housing market allows households to move as a result of any change in their life course, allowing them to be closer to central services, public transport facilities and workplace and hence reducing the time spent by individuals on daily travel. Conversely, household’s immobility may lead to longer commutes of residents, for example, after a change of a job or a need for new services such as schools for children who have reached their school age. This paper aims to investigate the association between residential mobility and travel behavior. The Victorian Integrated Survey of Travel and Activity (VISTA) data is used for the empirical analysis. Car ownership and journey to work time and distance of employed people are used as indicators of travel behavior. Change of usual residence within the last five years used to identify movers and non-movers. Statistical analysis, including regression models, is used to compare the travel behavior of movers and non-movers. The results show travel time, and the distance does not differ for movers and non-movers. However, this is not the case when taking into account the residence tenure-type. In addition, car ownership rate and number found to be significantly higher for non-movers. It is hoped that the results from this study will contribute to a better understanding of factors other than common socioeconomic and built environment features influencing travel behavior.

Keywords: journey to work, regression models, residential mobility, commute mode, car ownership

Procedia PDF Downloads 95
8577 Understanding the Linkages of Human Development and Fertility Change in Districts of Uttar Pradesh

Authors: Mamta Rajbhar, Sanjay K. Mohanty

Abstract:

India's progress in achieving replacement level of fertility is largely contingent on fertility reduction in the state of Uttar Pradesh as it accounts 17% of India's population with a low level of development. Though the TFR in the state has declined from 5.1 in 1991 to 3.4 by 2011, it conceals large differences in fertility level across districts. Using data from multiple sources this paper tests the hypothesis that the improvement in human development significantly reduces the fertility levels in districts of Uttar Pradesh. The unit of analyses is district, and fertility estimates are derived using the reverse survival method(RSM) while human development indices(HDI) are are estimated using uniform methodology adopted by UNDP for three period. The correlation and linear regression models are used to examine the relationship of fertility change and human development indices across districts. Result show the large variation and significant change in fertility level among the districts of Uttar Pradesh. During 1991-2011, eight districts had experienced a decline of TFR by 10-20%, 30 districts by 20-30% and 32 districts had experienced decline of more than 30%. On human development aspect, 17 districts recorded increase of more than 0.170 in HDI, 18 districts in the range of 0.150-0.170, 29 districts between 0.125-0.150 and six districts in the range of 0.1-0.125 during 1991-2011. Study shows significant negative relationship between HDI and TFR. HDI alone explains 70% variation in TFR. Also, the regression coefficient of TFR and HDI has become stronger over time; from -0.524 in 1991, -0.7477 by 2001 and -0.7181 by 2010. The regression analyses indicate that 0.1 point increase in HDI value will lead to 0.78 point decline in TFR. The HDI alone explains 70% variation in TFR. Improving the HDI will certainly reduce the fertility level in the districts.

Keywords: Fertility, HDI, Uttar Pradesh

Procedia PDF Downloads 215
8576 Plot Scale Estimation of Crop Biophysical Parameters from High Resolution Satellite Imagery

Authors: Shreedevi Moharana, Subashisa Dutta

Abstract:

The present study focuses on the estimation of crop biophysical parameters like crop chlorophyll, nitrogen and water stress at plot scale in the crop fields. To achieve these, we have used high-resolution satellite LISS IV imagery. A new methodology has proposed in this research work, the spectral shape function of paddy crop is employed to get the significant wavelengths sensitive to paddy crop parameters. From the shape functions, regression index models were established for the critical wavelength with minimum and maximum wavelengths of multi-spectrum high-resolution LISS IV data. Moreover, the functional relationships were utilized to develop the index models. From these index models crop, biophysical parameters were estimated and mapped from LISS IV imagery at plot scale in crop field level. The result showed that the nitrogen content of the paddy crop varied from 2-8%, chlorophyll from 1.5-9% and water content variation observed from 40-90% respectively. It was observed that the variability in rice agriculture system in India was purely a function of field topography.

Keywords: crop parameters, index model, LISS IV imagery, plot scale, shape function

Procedia PDF Downloads 137