Search results for: nonparametric geographically weighted regression

3478 A Weighted Sum Particle Swarm Approach (WPSO) Combined with a Novel Feasibility-Based Ranking Strategy for Constrained Multi-Objective Optimization of Compact Heat Exchangers

Authors: Milad Yousefi, Moslem Yousefi, Ricarpo Poley, Amer Nordin Darus

Abstract:

Design optimization of heat exchangers is a complicated task that has traditionally been carried out through a trial-and-error procedure. To overcome the difficulties of conventional design approaches, especially when a large number of variables, constraints and objectives are involved, a new method based on a well-established evolutionary algorithm, particle swarm optimization (PSO), a weighted sum approach and a novel constraint handling strategy is presented in this study. Since the conventional constraint handling strategies are neither effective nor easy to implement in multi-objective algorithms, a novel feasibility-based ranking strategy is introduced which is both extremely user-friendly and effective. A case study from industry has been investigated to illustrate the performance of the presented approach. The results show that the proposed algorithm can find the near-Pareto-optimal front with higher accuracy when compared to the conventional non-dominated sorting genetic algorithm II (NSGA-II). Moreover, the difficulties of a trial-and-error process for setting the penalty parameters are eliminated in this algorithm.
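
A minimal sketch (not the authors' code) of the two ideas the abstract combines: weighted-sum scalarization of two objectives and a feasibility-based ranking rule that needs no penalty parameters. The objectives, constraint, weights, and PSO settings below are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.array([0.6, 0.4])                       # assumed objective weights

def objectives(x):                             # placeholder bi-objective problem
    return np.array([np.sum(x**2), np.sum((x - 2.0)**2)])

def violation(x):                              # placeholder constraint g(x) <= 0
    return max(0.0, np.sum(x) - 3.0)

def better(xa, xb):
    """Feasibility-based ranking: a feasible point beats an infeasible one; two
    infeasible points are compared by total violation; two feasible points by
    the weighted-sum objective. No penalty parameter is required."""
    va, vb = violation(xa), violation(xb)
    if va == 0.0 and vb == 0.0:
        return W @ objectives(xa) < W @ objectives(xb)
    if va == 0.0 or vb == 0.0:
        return va == 0.0
    return va < vb

# Bare-bones PSO loop using the ranking rule to update personal and global bests.
dim, n, iters = 4, 30, 200
x = rng.uniform(-5, 5, (n, dim)); v = np.zeros((n, dim))
pbest = x.copy()
gbest = pbest[0].copy()
for p in pbest[1:]:
    if better(p, gbest):
        gbest = p.copy()
for _ in range(iters):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
    x = x + v
    for i in range(n):
        if better(x[i], pbest[i]):
            pbest[i] = x[i]
        if better(pbest[i], gbest):
            gbest = pbest[i].copy()
print("best point:", gbest.round(3), "objectives:", objectives(gbest).round(3))
```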

Keywords: heat exchanger, multi-objective optimization, particle swarm optimization, NSGA-II, constraint handling

Procedia PDF Downloads 531
3477 Regression Analysis in Estimating Stream-Flow and the Effect of Hierarchical Clustering Analysis: A Case Study in Euphrates-Tigris Basin

Authors: Goksel Ezgi Guzey, Bihrat Onoz

Abstract:

The scarcity of streamflow gauging stations and the increasing effects of global warming make the design of water management systems very difficult. This study is a significant contribution to assessing regional regression models for estimating streamflow. In this study, simulated meteorological data were related to the observed streamflow data from 1971 to 2020 for 33 stream gauging stations of the Euphrates-Tigris Basin. Ordinary least squares regression was used to predict flow for 2020-2100 with the simulated meteorological data. The CORDEX-EURO and CORDEX-MENA domains were used with 0.11 and 0.22 grids, respectively, to estimate climate conditions under certain climate scenarios. Twelve meteorological variables simulated by two regional climate models, RCA4 and RegCM4, were used as independent variables in the ordinary least squares regression, where the observed streamflow was the dependent variable. The variability of streamflow was then explained with five to six meteorological variables and watershed characteristics such as area and elevation prior to the application. Following the regression analysis of 31 stream gauging stations' data, the stations were subjected to a clustering analysis, which grouped the stations into two clusters in terms of their hydrometeorological properties. Two streamflow equations were found for the two clusters of stream gauging stations for every domain and every regional climate model, which increased the efficiency of streamflow estimation by 10-15% for all the models. This study underlines the importance of the homogeneity of a region in estimating streamflow, not only in terms of geographical location but also in terms of the meteorological characteristics of that region.
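
A minimal sketch (not the study's code) of the workflow described above: ordinary least squares regression of observed streamflow on simulated meteorological variables, fitted separately for station clusters obtained by hierarchical clustering. All data, station attributes, and variable names below are synthetic placeholders.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_stations, n_years = 12, 40
# Hypothetical station attributes used for clustering (e.g. area, elevation).
stations = pd.DataFrame({"area_km2": rng.uniform(100, 5000, n_stations),
                         "elevation_m": rng.uniform(300, 2500, n_stations)})
labels = fcluster(linkage(stations.to_numpy(), method="ward"), t=2, criterion="maxclust")

# Hypothetical simulated meteorology (precipitation, temperature) and observed flow.
records = []
for s, lab in enumerate(labels):
    precip = rng.normal(800, 150, n_years)
    temp = rng.normal(12, 2, n_years)
    flow = 0.4 * precip - 5.0 * temp + rng.normal(0, 30, n_years) + 50 * lab
    records.append(pd.DataFrame({"station": s, "cluster": lab,
                                 "precip": precip, "temp": temp, "flow": flow}))
data = pd.concat(records, ignore_index=True)

# One OLS streamflow equation per cluster, mirroring the study's two-cluster setup.
for lab, grp in data.groupby("cluster"):
    model = LinearRegression().fit(grp[["precip", "temp"]], grp["flow"])
    r2 = model.score(grp[["precip", "temp"]], grp["flow"])
    print(f"cluster {lab}: coefficients = {model.coef_.round(2)}, R² = {r2:.2f}")
```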

Keywords: hydrology, streamflow estimation, climate change, hydrologic modeling, HBV, hydropower

Procedia PDF Downloads 95
3476 Undernutrition Among Children Below Five Years of Age in Uganda: A Deep Dive into Space and Time

Authors: Vallence Ngabo Maniragaba

Abstract:

This study aimed at examining the variations in undernutrition among children below 5 years of age in Uganda. The spatial and spatiotemporal analysis approach helped in identifying cluster patterns, hot spots and emerging hot spots. Data from the six Uganda Demographic and Health Surveys spanning 1990 to 2016 were used, with the main outcome variable being undernutrition among children under 5 years of age. All data relevant to this study were retrieved from the survey datasets and combined with the 214 shape files for the districts of Uganda to enable spatial and spatiotemporal analysis. Spatial maps showing the distribution of the prevalence of undernutrition, both in space and time, were generated using ArcGIS Pro version 2.8. Moran’s I, an index of spatial autocorrelation, was used to rule out spatial randomness and to identify spatially clustered patterns of hot or cold spot areas. Furthermore, space-time cubes were generated to establish the trend in undernutrition as well as to mirror its variations over time and across Uganda. Moreover, emerging hot spot analysis was done to help identify the patterns of undernutrition over time. The results indicate a heterogeneous distribution of undernutrition across Uganda, and the same variations were also evident over time. Moran’s I confirmed spatially clustered patterns as opposed to random distributions of undernutrition prevalence. Four hot spot areas, namely the Karamoja, Sebei, West Nile and Toro regions, were significantly evident; most of the central parts of Uganda were identified as cold spot clusters, while most of Western Uganda and the Acholi and Lango regions had no statistically significant spatial patterns by the year 2016. The spatiotemporal analysis identified the Karamoja and Sebei regions as clusters of persistent, consecutive and intensifying hot spots, the West Nile region was identified as a sporadic hot spot area, while the Toro region showed both sporadic and emerging hot spots. In conclusion, undernutrition is a silent pandemic that needs to be addressed decisively. At 31.2 percent, the prevalence is still alarmingly high. The distribution across the country is nonuniform, with areas such as the Karamoja, West Nile, Sebei and Toro regions being epicenters of undernutrition in Uganda. Over time, the same areas have exhibited high undernutrition prevalence. Policymakers, as well as implementers, should bear in mind the spatial variations across the country and prioritize hot spot areas in order to have efficient, timely and region-specific interventions.
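
A minimal sketch (not the study's ArcGIS workflow) of global Moran's I, the spatial autocorrelation index used above to test whether district-level prevalence is clustered rather than random. The prevalence values and the binary neighbour matrix are illustrative placeholders.

```python
import numpy as np

prevalence = np.array([0.38, 0.35, 0.31, 0.12, 0.10, 0.14, 0.29, 0.33])  # hypothetical districts
# Hypothetical symmetric contiguity matrix: w[i, j] = 1 if districts i and j share a border.
w = np.array([
    [0, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 1, 0],
], dtype=float)

z = prevalence - prevalence.mean()
n, s0 = len(prevalence), w.sum()
moran_i = (n / s0) * (z @ w @ z) / (z @ z)   # I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2
expected = -1.0 / (n - 1)                    # expectation of I under spatial randomness
print(f"Moran's I = {moran_i:.3f} (E[I] under randomness = {expected:.3f})")
# Values well above E[I] indicate clustering of similar prevalence (hot/cold spots);
# a permutation test or the analytical variance would give the significance used in the study.
```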

Keywords: undernutrition, spatial autocorrelation, hotspots analysis, geographically weighted regressions, emerging hotspots analysis, under-fives, Uganda

Procedia PDF Downloads 52
3475 The Impact of Governance on Happiness: Evidence from Quantile Regressions

Authors: Chiung-Ju Huang

Abstract:

This study utilizes quantile regression analysis to examine the impact of governance (including democratic quality and technical quality) on happiness in 101 countries worldwide, classified as “developed countries” and “developing countries”. The empirical results show that the impact of democratic quality and technical quality on happiness is significantly positive for developed countries, while it is insignificant for developing countries. The results suggest that the authorities in developed countries can enhance the level of individual happiness by improving democratic quality and technical quality. However, for developing countries, promoting the quality of governance in order to enhance the level of happiness may not be effective. Policy makers in developing countries may pay more attention to increasing real GDP per capita instead of promoting the quality of governance to enhance individual happiness.
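
A minimal sketch (not the study's model) of quantile regression, the technique used above: the effect of a governance score on happiness is estimated at several quantiles of the happiness distribution rather than only at the mean. The data and variable names are synthetic placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 300
governance = rng.normal(0, 1, n)                 # hypothetical governance quality index
gdp = rng.normal(0, 1, n)                        # hypothetical log GDP per capita
happiness = 5 + 0.4 * governance + 0.6 * gdp + rng.standard_t(5, n)
df = pd.DataFrame({"happiness": happiness, "governance": governance, "gdp": gdp})

for q in (0.10, 0.50, 0.90):
    fit = smf.quantreg("happiness ~ governance + gdp", df).fit(q=q)
    print(f"q={q:.2f}: governance coefficient = {fit.params['governance']:.3f} "
          f"(p = {fit.pvalues['governance']:.3f})")
# Comparing coefficients across quantiles shows whether governance matters more for
# low-happiness or high-happiness countries, mirroring the analysis described above.
```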

Keywords: governance, happiness, multiple regression, quantile regression

Procedia PDF Downloads 249
3474 Conflation Methodology Applied to Flood Recovery

Authors: Eva L. Suarez, Daniel E. Meeroff, Yan Yong

Abstract:

Current flood risk modeling focuses on resilience, defined as the probability of recovery from a severe flooding event. However, the damage to property and well-being caused by nuisance flooding and its long-term effects on communities are not typically included in risk assessments. An approach was developed that addresses the probability of recovering from a severe flooding event combined with the probability of community performance during a nuisance event. A consolidated model, namely the conflation flooding recovery (CFR) model, evaluates risk-coping mitigation strategies for communities based on the recovery time from catastrophic events, such as hurricanes or extreme surges, and from everyday nuisance flooding events. The CFR model assesses the variation contribution of each independent input and generates a weighted output that favors the distribution with minimum variation. This approach is especially useful if the input distributions have dissimilar variances. The CFR is defined as a single distribution resulting from the product of the individual probability density functions. The resulting conflated distribution resides between the parent distributions, and it infers the recovery time required by a community to return to basic functions, such as power, utilities, transportation, and civil order, after a flooding event. The CFR model is more accurate than averaging individual observations before calculating the mean and variance, or than averaging the probabilities evaluated at the input values, which assigns the same weighted variation to each input distribution. The main disadvantage of these traditional methods is that the resulting measure of central tendency is exactly equal to the average of the input distributions' means, without the additional information provided by each individual distribution's variance. When dealing with exponential distributions, such as resilience from severe flooding events and from nuisance flooding events, conflation results are equivalent to the weighted least squares method or best linear unbiased estimation. The combination of severe flooding risk with nuisance flooding improves flood risk management for highly populated coastal communities, such as in South Florida, USA, and provides a method to estimate community flood recovery time more accurately from two different sources: severe flooding events and nuisance flooding events.
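
A minimal numerical sketch (not the authors' model) of conflation as described above: the product of the input probability density functions, renormalized. Two Gaussian recovery-time estimates with different variances are used purely for illustration; for normals the conflated mean is the precision-weighted average, i.e. the output favours the input with the smaller variance.

```python
import numpy as np

t = np.linspace(0.0, 60.0, 6001)                          # recovery time grid (days)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

f1 = normal_pdf(t, mu=30.0, sigma=10.0)                   # hypothetical severe-event estimate
f2 = normal_pdf(t, mu=20.0, sigma=3.0)                    # hypothetical nuisance-flooding estimate

dt = t[1] - t[0]
product = f1 * f2                                          # unnormalized conflation
conflated = product / (product.sum() * dt)                 # renormalize to a proper pdf

mean_conflated = (t * conflated).sum() * dt
precision_weighted = (30.0 / 10.0**2 + 20.0 / 3.0**2) / (1 / 10.0**2 + 1 / 3.0**2)
print(f"conflated mean         = {mean_conflated:.2f}")
print(f"precision-weighted avg = {precision_weighted:.2f}")   # matches, ~20.83:
# the conflation is pulled toward the lower-variance input, unlike a simple average of means.
```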

Keywords: community resilience, conflation, flood risk, nuisance flooding

Procedia PDF Downloads 61
3473 Breast Cancer Mortality and Comorbidities in Portugal: A Predictive Model Built with Real World Data

Authors: Cecília M. Antão, Paulo Jorge Nogueira

Abstract:

Breast cancer (BC) is the leading cause of cancer mortality among Portuguese women. This retrospective observational study aimed at identifying comorbidities associated with female BC patients admitted to Portuguese public hospitals (2010-2018), investigating the effect of comorbidities on the BC mortality rate, and building a predictive model using logistic regression. Results showed that BC mortality in Portugal decreased over this period and reached 4.37% in 2018. Adjusted odds ratios indicated that secondary malignant neoplasms of the liver, of bone and bone marrow, congestive heart failure, and diabetes were associated with an increased chance of dying from breast cancer. Although the Lisbon district (the most populated area) accounted for the largest percentage of BC patients, the logistic regression model showed that, besides the patient's age, being resident in the Bragança, Castelo Branco, or Porto districts was directly associated with an increased mortality rate.
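
A minimal sketch (not the study's model) of how adjusted odds ratios are obtained from a logistic regression of in-hospital death on comorbidities and age. The synthetic dataset and variable names below are placeholders for the hospital-admission records described above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
df = pd.DataFrame({
    "age": rng.integers(30, 95, n),
    "liver_mets": rng.binomial(1, 0.08, n),       # secondary malignant neoplasm of the liver
    "heart_failure": rng.binomial(1, 0.10, n),
    "diabetes": rng.binomial(1, 0.15, n),
})
logit_p = -6 + 0.05 * df["age"] + 1.2 * df["liver_mets"] + 0.7 * df["heart_failure"] + 0.4 * df["diabetes"]
df["died"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

X = sm.add_constant(df[["age", "liver_mets", "heart_failure", "diabetes"]])
fit = sm.Logit(df["died"], X).fit(disp=0)

odds_ratios = np.exp(fit.params)                  # adjusted OR = exp(coefficient)
conf_int = np.exp(fit.conf_int())                 # 95% CI on the odds-ratio scale
print(pd.concat([odds_ratios.rename("OR"),
                 conf_int.rename(columns={0: "2.5%", 1: "97.5%"})], axis=1))
```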

Keywords: breast cancer, comorbidities, logistic regression, adjusted odds ratio

Procedia PDF Downloads 56
3472 Assessing Relationships between Glandularity and Gray Level by Using Breast Phantoms

Authors: Yun-Xuan Tang, Pei-Yuan Liu, Kun-Mu Lu, Min-Tsung Tseng, Liang-Kuang Chen, Yuh-Feng Tsai, Ching-Wen Lee, Jay Wu

Abstract:

Breast cancer is the most prevalent malignant tumor in females. An increase in glandular density increases the risk of breast cancer. BI-RADS is a frequently used density indicator in mammography; however, it significantly overestimates glandularity. Therefore, it is very important to assess glandularity accurately and quantitatively by mammography. In this study, 20%, 30% and 50% glandularity phantoms were exposed using a mammography machine at 28, 30 and 31 kVp, and 30, 55, 80 and 105 mAs. Regions of interest (ROIs) were drawn to assess the gray level. The relationship between glandularity and gray level under various compression thicknesses, kVp, and mAs settings was established by multivariable linear regression. A phantom verification was performed with automatic exposure control (AEC). The regression equation was obtained with an R-squared value of 0.928. The average gray levels of the verification phantom were 8708, 8660 and 8434 for 0.952, 0.963 and 0.985 g/cm³, respectively. The percent differences in glandularity relative to the regression equation were 3.24%, 2.75% and 13.7%. We concluded that the proposed method could be applied clinically in mammography to improve glandularity estimation and further strengthen breast cancer screening.

Keywords: mammography, glandularity, gray value, BI-RADS

Procedia PDF Downloads 464
3471 An Analysis of the Regression Hypothesis from a Shona Broca’s Aphasia Perspective

Authors: Esther Mafunda, Simbarashe Muparangi

Abstract:

The present paper tests the applicability of the Regression Hypothesis to the pathological language dissolution of a Shona male adult with Broca’s aphasia. It particularly assesses the prediction of the Regression Hypothesis, which states that the process by which language is forgotten is the reverse of the process by which it was acquired. The main aim of the paper is to find out whether mirror symmetries exist between L1 acquisition and L1 dissolution of tense in Shona and, if so, what might cause these regression patterns. The paper also sought to highlight the practical contributions that linguistic theory can make to solving language-related problems. Data were collected from a 46-year-old male adult with Broca’s aphasia who was receiving speech therapy at St Giles Rehabilitation Centre in Harare, Zimbabwe. The primary data elicitation method was experimental, using the probe technique. The TART (Test for Assessing Reference Time) Shona version, in the form of sequencing pictures, was used to assess tense in the Broca’s aphasic and a 3.5-year-old child. Using SPSS (Statistical Package for the Social Sciences) and Excel analysis, it was established that the use of the future tense was impaired in the Shona Broca’s aphasic, whilst the present and past tenses were intact. However, though the past tense was intact in the male adult with Broca’s aphasia, reference was made to the remote past. The use of the future tense was also found to be difficult for the 3.5-year-old Shona-speaking child, while no difficulties were encountered in using the present and past tenses. This means that mirror symmetries were found between L1 acquisition and L1 dissolution of tense in Shona. On the basis of the results of this research, it can be concluded that the use of tense in a Shona adult with Broca’s aphasia supports the Regression Hypothesis. The findings of this study are important for speech therapy in the context of Zimbabwe. The study also contributes to Bantu linguistics in general and to Shona linguistics in particular. Further studies could also be done focusing on the rest of the Bantu language varieties in terms of aphasia.

Keywords: Broca’s Aphasia, regression hypothesis, Shona, language dissolution

Procedia PDF Downloads 60
3470 Apricot Insurance Portfolio Risk

Authors: Kasirga Yildirak, Ismail Gur

Abstract:

We propose a model to measure the hail risk of an agricultural insurance portfolio. Hail is one of the major catastrophic events, causing large losses to insurers. Moreover, it is very hard to predict due to its peculiar atmospheric characteristics. We make use of parcel-based claims data on apricot damage collected by the Turkish Agricultural Insurance Pool (TARSIM). As our ultimate aim is to compute the loadings assigned to specific parcels, we build a portfolio risk model that makes use of the PD and the severity of the exposures. PD is computed by spherical-linear and circular-linear regression models, as the data carry coordinate information and seasonality. Severity is mapped into integer brackets so that the probability generating function can be employed. Individual regressions are run on each cluster, with clusters estimated on different criteria. The loss distribution is constructed by the Panjer recursion technique. We also show that the one risk-one crop model can easily be extended to a multi risk-multi crop model by assuming conditional independence.
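
A minimal sketch (not the authors' portfolio model) of the Panjer recursion used above to build the aggregate loss distribution from a claim-count distribution and a severity distribution mapped onto integer brackets. A Poisson claim count and an illustrative severity vector are assumed.

```python
import numpy as np

lam = 2.5                                   # assumed Poisson claim frequency per parcel
# Assumed severity pmf on integer loss brackets 0, 1, 2, ... (must sum to 1).
f = np.array([0.00, 0.50, 0.30, 0.15, 0.05])
max_loss = 60                               # truncation point for the aggregate loss

g = np.zeros(max_loss + 1)
g[0] = np.exp(lam * (f[0] - 1.0))           # P(S = 0) for a compound Poisson sum
for k in range(1, max_loss + 1):
    j_max = min(k, len(f) - 1)
    j = np.arange(1, j_max + 1)
    # Panjer recursion with Poisson parameters a = 0, b = lam:
    # g_k = sum_{j >= 1} (lam * j / k) * f_j * g_{k-j}
    g[k] = np.sum((lam * j / k) * f[j] * g[k - j])

mean_loss = np.sum(np.arange(max_loss + 1) * g)
print(f"total probability captured: {g.sum():.4f}")
print(f"mean aggregate loss: {mean_loss:.3f} "
      f"(check: lam * E[severity] = {lam * np.sum(np.arange(len(f)) * f):.3f})")
```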

Keywords: hail insurance, spherical regression, circular regression, spherical clustering

Procedia PDF Downloads 228
3469 Use of Thrombolytics for Acute Myocardial Infarctions in Resource-Limited Settings, Globally: A Systematic Literature Review

Authors: Sara Zelman, Courtney Meyer, Hiren Patel, Lisa Philpotts, Sue Lahey, Thomas Burke

Abstract:

Background: As the global burden of disease shifts from infectious diseases to noncommunicable diseases, there is growing urgency to provide treatment for time-sensitive illnesses, such as ST-Elevation Myocardial Infarctions (STEMIs). The standard of care for STEMIs in developed countries is Percutaneous Coronary Intervention (PCI). However, this is inaccessible in resource-limited settings. Before the discovery of PCI, Streptokinase (STK) and other thrombolytic drugs were first-line treatments for STEMIs. STK has been recognized as a cost-effective and safe treatment for STEMIs; however, in settings which lack access to PCI, it has not become the established second-line therapy. A systematic literature review was conducted to geographically map the use of STK for STEMIs in resource-limited settings. Methods: Our literature review group searched the databases Cinhal, Embase, Ovid, Pubmed, Web of Science, and WHO’s Index Medicus. The search terms included ‘thrombolytics’ AND ‘myocardial infarction’ AND ‘resource-limited’ and were restricted to human studies and papers written in English. A considerable number of studies came from Latin America; however, these studies were not written in English and were excluded. The initial search yielded 3,487 articles, which was reduced to 3,196 papers after titles were screened. Three medical professionals then screened abstracts, from which 291 articles were selected for full-text review and 94 papers were chosen for final inclusion. These articles were then analyzed and mapped geographically. Results: This systematic literature review revealed that STK has been used for the treatment of STEMIs in 33 resource-limited countries, with 18 of 94 studies taking place in India. Furthermore, 13 studies occurred in Pakistan, followed by Iran (6), Sri Lanka (5), Brazil (4), China (4), and South Africa (4). Conclusion: Our systematic review revealed that STK has been used for the treatment of STEMIs in 33 resource-limited countries, with the highest utilization occurring in India. This demonstrates that even though STK has high utility for STEMI treatment in resource-limited settings, it still has not become the standard of care. Future research should investigate the barriers preventing the establishment of STK use as second-line treatment after PCI.

Keywords: cardiovascular disease, global health, resource-limited setting, ST-Elevation Myocardial Infarction, Streptokinase

Procedia PDF Downloads 122
3468 Enhancing the Interpretation of Group-Level Diagnostic Results from Cognitive Diagnostic Assessment: Application of Quantile Regression and Cluster Analysis

Authors: Wenbo Du, Xiaomei Ma

Abstract:

With the development of Cognitive Diagnostic Assessment (CDA), various domains of language testing and assessment have been investigated to extract more diagnostic information. What is noticeable is that most of the extant empirical CDA-based research places much emphasis on individual-level diagnosis, with very few studies concerned with learners' group-level performance. Even though personalized diagnostic feedback is the unique feature that differentiates CDA from other assessment tools, group-level diagnostic information cannot be overlooked, in that it might be more practical in a classroom setting. Additionally, the group-level diagnostic information obtained via current CDA always results in a “flat pattern”, that is, mastery or non-mastery of all tested skills accounts for the two highest proportions. In that case, the outcome brings little more benefit than the original total score. To address these issues, the present study attempts to apply cluster analysis for group classification and quantile regression analysis to pinpoint learners' performance at different proficiency levels (beginner, intermediate and advanced), and thus to enhance the interpretation of the CDA results extracted from a group of EFL learners' reading performance on a diagnostic reading test designed by the PELDiaG research team from a key university in China. The results show that the EM method in cluster analysis yields more appropriate classification results than CDA, and that quantile regression analysis pictures more insightful characteristics of learners with different reading proficiencies. The findings are helpful and practical for instructors seeking to refine EFL reading curricula and instructional plans tailored to the group classification results and the quantile regression analysis. Meanwhile, these innovative statistical methods could also make up for the deficiencies of CDA and push forward the development of language testing and assessment in the future.
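
A minimal sketch (not the PELDiaG team's analysis) of the grouping step described above: an EM-based Gaussian mixture model clusters learners on their skill-mastery probabilities so that group-level diagnostic profiles can be reported instead of a flat all-mastery/non-mastery pattern. The mastery probabilities below are synthetic placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(11)
# Hypothetical posterior mastery probabilities for 5 reading sub-skills, 300 learners.
low = rng.beta(2, 5, (100, 5))           # mostly low mastery
mid = rng.beta(4, 4, (100, 5))           # mixed profiles
high = rng.beta(6, 2, (100, 5))          # mostly high mastery
X = np.vstack([low, mid, high])

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)
groups = gmm.predict(X)

for g in range(3):
    profile = X[groups == g].mean(axis=0).round(2)
    print(f"group {g} (n={np.sum(groups == g):3d}): mean mastery per skill = {profile}")
# Each group's mean mastery profile is the group-level diagnostic feedback; a quantile
# regression within each group could then relate sub-skill mastery to total score.
```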

Keywords: cognitive diagnostic assessment, diagnostic feedback, EFL reading, quantile regression

Procedia PDF Downloads 120
3467 A Geographical Information System Supported Method for Determining Urban Transformation Areas in the Scope of Disaster Risks in Kocaeli

Authors: Tayfun Salihoğlu

Abstract:

Following Law No. 6306 on the Transformation of Disaster Risk Areas, urban transformation in Turkey found its legal basis. In best practices all over the world, urban transformation has been shaped as part of comprehensive social programs through discourses of renewing the economically, socially and physically degraded parts of the city, producing spaces resistant to earthquakes and other possible disasters, and creating a livable environment. In Turkish practice, a contradictory process is observed. In this study, it is aimed to develop a method for a better understanding of urban space in terms of disaster risks, in order to constitute a basis for decisions in the Kocaeli Urban Transformation Master Plan, which is being prepared by Kocaeli Metropolitan Municipality. The spatial unit used in the study is a 50x50 meter grid. In order to reflect the multidimensionality of urban transformation, three basic components that have spatial data in Kocaeli were identified. These components were named 'Problems in Built-up Areas', 'Disaster Risks arising from Geological Conditions of the Ground and Problems of Buildings', and 'Inadequacy of Urban Services'. Each component was weighted and scored for each grid cell. In order to delimit urban transformation zones, Optimized Outlier Analysis (Local Moran's I) in ArcGIS 10.6.1 was conducted to test the type of distribution (clustered or scattered) and its significance on the grid cells, taking the weighted total score of each grid cell as the input feature. As a result of this analysis, it was found that the weighted total scores were not significantly clustered at all grid cells in the urban space. The grid cells in which the input feature was significantly clustered were exported as a new database for use in further mappings. The total score map reflects the significant clusters in terms of the weighted total scores of 'Problems in Built-up Areas', 'Disaster Risks arising from Geological Conditions of the Ground and Problems of Buildings' and 'Inadequacy of Urban Services'. The resulting grid cells with the highest scores are the most likely candidates for urban transformation in this citywide study. To categorize urban space in terms of urban transformation, Grouping Analysis in ArcGIS 10.6.1 was conducted on data that include the component scores of the significantly clustered grid cells. Based on pseudo statistics and box plots, six groups with the highest F statistics were extracted. As a result of mapping the groups, it can be said that the six groups can be interpreted in a more meaningful manner in relation to the urban space. The method presented in this study can be extended as more spatial data become available. By integrating it with other data to be obtained during the planning process, this method can contribute to the research and decision-making processes of urban transformation master plans on a more consistent basis.

Keywords: urban transformation, GIS, disaster risk assessment, Kocaeli

Procedia PDF Downloads 94
3466 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) has been used for the classification of a diabetes disease dataset. The aims of this study are to reduce the variance within the attributes of the diabetes dataset and to improve the classification accuracy of classifier algorithms by transforming non-linearly separable datasets into linearly separable ones. The Pima Indians Diabetes dataset has two classes, comprising normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of the K-means clustering method and is one of the most used clustering methods in data mining and machine learning applications. In this study, as the first stage, the fuzzy C-means clustering process was used for finding the centers of the attributes in the Pima Indians diabetes dataset, and the dataset was then weighted according to the ratios of the means of the attributes to their centers. Secondly, after the weighting process, classifier algorithms including the support vector machine (SVM) and k-NN (k-nearest neighbor) classifiers were used for classifying the weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of the Pima Indians diabetes dataset.
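
A minimal sketch (not Polat's original code) of one reading of the FCM-based attribute weighting described above: for each attribute, a small 1-D fuzzy c-means run finds cluster centers, and the attribute is rescaled by the ratio of its mean to the mean of those centers before classification. The number of clusters (2) and the fuzzifier m are assumptions, and the random data stand in for the Pima Indians attributes.

```python
import numpy as np

def fuzzy_cmeans_1d(x, c=2, m=2.0, iters=100, seed=0):
    """Plain 1-D fuzzy c-means; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=c, replace=False).astype(float)
    for _ in range(iters):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12          # distances to centers
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
        centers = (u ** m).T @ x / np.sum(u ** m, axis=0)           # weighted center update
    return centers

def fcm_attribute_weighting(X):
    Xw = X.astype(float).copy()
    weights = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        centers = fuzzy_cmeans_1d(X[:, j])
        weights[j] = X[:, j].mean() / centers.mean()                # ratio of attribute mean to its centers
        Xw[:, j] *= weights[j]
    return Xw, weights

rng = np.random.default_rng(5)
X = rng.normal(loc=[120, 70, 30], scale=[30, 10, 8], size=(200, 3))  # placeholder attributes
Xw, weights = fcm_attribute_weighting(X)
print("attribute weights:", weights.round(3))
# The weighted matrix Xw would then be fed to SVM or k-NN as in the study.
```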

Keywords: fuzzy C-means clustering, fuzzy C-means clustering based attribute weighting, Pima Indians diabetes, SVM

Procedia PDF Downloads 382
3465 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques

Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas

Abstract:

The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is necessary to test how to properly preprocess the data for this research by preparing pipelines that fit each algorithm best. The pipelines are tested along with nine different classification algorithms in the realms of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.
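
A minimal sketch (not the authors' pipeline) of the kind of text-mining setup described above: a preprocessing pipeline feeding a Logistic Regression classifier that labels news snippets by the competitive action they describe. The tiny corpus and labels below are invented placeholders.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts = [
    "Automaker cuts prices on its electric sedan lineup",
    "Manufacturer announces new SUV model for next year",
    "Brand expands dealership network into new markets",
    "Company lowers lease rates to boost quarterly sales",
    "New compact model unveiled at the international auto show",
    "Carmaker opens additional plants in two countries",
]
labels = ["pricing", "product", "expansion", "pricing", "product", "expansion"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipeline, texts, labels, cv=2)   # tiny-sample illustration only
print("cross-validated accuracy:", scores.round(2))
pipeline.fit(texts, labels)
print(pipeline.predict(["Rival slashes prices on hybrid models"]))
```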

Keywords: artificial neural network, competitive dynamics, logistic regression, text classification, text mining

Procedia PDF Downloads 93
3464 Multi-Temporal Cloud Detection and Removal in Satellite Imagery for Land Resources Investigation

Authors: Feng Yin

Abstract:

Clouds are inevitable contaminants in optical satellite imagery and prevent satellite imaging systems from acquiring a clear view of the earth's surface. The presence of clouds in satellite imagery has negative consequences for remote sensing land resource investigation. As a consequence, detecting the locations of clouds in satellite imagery is an essential preprocessing step, and further removing the existing clouds is crucial for the application of the imagery. In this paper, a multi-temporal satellite imagery cloud detection and removal method is proposed, which will be used for large-scale land resource investigation. The proposed method is mainly composed of four steps. First, cloud masks are generated for cloud-contaminated images by single-temporal cloud detection based on multiple spectral features. Then, a cloud-free reference image of the target areas is synthesized by weighted averaging of time-series images, in which cloud pixels are ignored. Thirdly, refined cloud detection results are acquired by multi-temporal analysis based on the reference image. Finally, detected clouds are removed via multi-temporal linear regression. The results of a case application in Hubei province indicate that the proposed multi-temporal cloud detection and removal method is effective and promising for large-scale land resource investigation.

Keywords: cloud detection, cloud removal, multi-temporal imagery, land resources investigation

Procedia PDF Downloads 252
3463 Study on Optimal Control Strategy of PM2.5 in Wuhan, China

Authors: Qiuling Xie, Shanliang Zhu, Zongdi Sun

Abstract:

In this paper, we analyzed the correlations between PM2.5 and the other five Air Quality Indices (AQIs) based on the grey relational degree, and built a multivariate nonlinear regression model of PM2.5 and the five monitoring indices. For the optimal control problem of PM2.5, we took the partial-large Cauchy distribution membership function as the satisfaction function. We established a nonlinear programming model with the goal of maximizing the performance-to-price ratio, and the optimal control scheme is given.

Keywords: grey relational degree, multiple linear regression, membership function, nonlinear programming

Procedia PDF Downloads 261
3462 SVM-Based Modeling of Mass Transfer Potential of Multiple Plunging Jets

Authors: Surinder Deswal, Mahesh Pal

Abstract:

The paper investigates the potential of a support vector machine-based regression approach to model the mass transfer capacity of multiple plunging jets, both vertical (θ = 90°) and inclined (θ = 60°). The data set used in this study consists of four input parameters with a total of eighty-eight cases. For testing, tenfold cross-validation was used. Correlation coefficient values of 0.971 and 0.981 (root mean square error values of 0.0025 and 0.0020) were achieved by using polynomial and radial basis kernel function based support vector regression, respectively. The results suggest an improved performance by the radial basis function in comparison to the polynomial kernel based support vector machines. The estimated overall mass transfer coefficient, by both kernel functions, is in good agreement with actual experimental values (within a scatter of ±15%), thereby suggesting the utility of the support vector machines based regression approach.
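
A minimal sketch (not the authors' model) of support vector regression with polynomial and radial basis kernels evaluated by tenfold cross-validation, as in the plunging-jet study above. The four input parameters and the target are synthetic placeholders for the 88 experimental cases.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, (88, 4))                   # e.g. jet velocity, angle, number of jets, depth
y = 0.01 + 0.02 * X[:, 0] ** 2 + 0.01 * X[:, 1] * X[:, 2] + rng.normal(0, 0.001, 88)

for kernel in ("poly", "rbf"):
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=10.0, epsilon=0.0005))
    r2 = cross_val_score(model, X, y, cv=10, scoring="r2")
    rmse = -cross_val_score(model, X, y, cv=10, scoring="neg_root_mean_squared_error")
    print(f"{kernel:>4s} kernel: mean R² = {r2.mean():.3f}, mean RMSE = {rmse.mean():.4f}")
```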

Keywords: mass transfer, multiple plunging jets, support vector machines, ecological sciences

Procedia PDF Downloads 424
3461 Reducing Uncertainty in Climate Projections over Uganda by Numerical Models Using Bias Correction

Authors: Isaac Mugume

Abstract:

Since the beginning of the 21st century, climate change has been a pressing issue due to the reported rise in global temperature and changes in the frequency as well as the severity of extreme weather and climatic events. The changing climate has been attributed to rising concentrations of greenhouse gases and to environmental changes such as those in ecosystems and land use. Climatic projections have been carried out under the auspices of the Intergovernmental Panel on Climate Change, where a number of models have been run to inform us about the likelihood of future climates. Since one of the major forcings driving the changing climate is the emission of greenhouse gases, different scenarios have been proposed and future climates for different periods presented. The global climate models project different areas to experience different impacts. While regional modeling is being carried out for high-impact studies, bias correction is less documented. Yet the regional climate models suffer from bias, which introduces uncertainty. This is addressed in this study by bias correcting the regional models. This study uses the Weather Research and Forecasting model under different representative concentration pathways and corrects the products of these models using observed climatic data. This study notes that bias correction (e.g., running-mean bias correction, the best easy systematic estimator method, the simple linear regression method, nearest neighborhood, and weighted mean) improves the climatic projection skill and therefore reduces the uncertainty inherent in the climatic projections.
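
A minimal sketch (not the study's implementation) of the simple linear-regression bias correction mentioned above: model output for a historical period is regressed against observations, and the fitted relation is then applied to the projection period. The temperature series below are synthetic placeholders for WRF output and station observations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
obs_hist = 22 + 3 * np.sin(np.linspace(0, 20, 365)) + rng.normal(0, 0.8, 365)
model_hist = 1.10 * obs_hist - 1.5 + rng.normal(0, 0.8, 365)              # biased model, history
model_future = 1.10 * (obs_hist + 1.8) - 1.5 + rng.normal(0, 0.8, 365)    # biased projection

# Fit observed = a * model + b on the overlap period, then correct the projection.
reg = LinearRegression().fit(model_hist.reshape(-1, 1), obs_hist)
corrected_hist = reg.predict(model_hist.reshape(-1, 1))
corrected_future = reg.predict(model_future.reshape(-1, 1))

print(f"raw model bias over history:        {np.mean(model_hist - obs_hist):+.2f} °C")
print(f"corrected bias over history:        {np.mean(corrected_hist - obs_hist):+.2f} °C")
print(f"projected warming after correction: {np.mean(corrected_future - obs_hist):+.2f} °C")
```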

Keywords: bias correction, climatic projections, numerical models, representative concentration pathways

Procedia PDF Downloads 87
3460 Supervised-Component-Based Generalised Linear Regression with Multiple Explanatory Blocks: THEME-SCGLR

Authors: Bry X., Trottier C., Mortier F., Cornu G., Verron T.

Abstract:

We address component-based regularization of a Multivariate Generalized Linear Model (MGLM). A set of random responses Y is assumed to depend, through a GLM, on a set X of explanatory variables, as well as on a set T of additional covariates. X is partitioned into R conceptually homogeneous blocks X1, ..., XR, viewed as explanatory themes. Variables in each Xr are assumed to be numerous and redundant. Thus, Generalised Linear Regression (GLR) demands regularization with respect to each Xr. By contrast, variables in T are assumed to be selected so as to demand no regularization. Regularization is performed by searching each Xr for an appropriate number of orthogonal components that both contribute to modelling Y and capture relevant structural information in Xr. We propose a very general criterion to measure the structural relevance (SR) of a component in a block, and show how to take SR into account within a Fisher-scoring-type algorithm in order to estimate the model. We also show how to deal with mixed-type explanatory variables. The method, named THEME-SCGLR, is tested on simulated data.

Keywords: Component-Model, Fisher Scoring Algorithm, GLM, PLS Regression, SCGLR, SEER, THEME

Procedia PDF Downloads 372
3459 A Moving Target: Causative Factors for Geographic Variation in a Handed Flower

Authors: Celeste De Kock, Bruce Anderson, Corneile Minnaar

Abstract:

Geographic variation in the floral morphology of a flower species has often been assumed to result from co-variation in the availability of regionally specific functional pollinator types, giving rise to plant ecotypes that are adapted to the morphology of the main pollinator types in that area. Wachendorfia paniculata is a geographically variable enantiostylous (handed) flower, with preliminary observations suggesting that differences in pollinator community composition might be driving differences in the degree of herkogamy (spatial separation of the stigma and anthers on the same flower) across its geographic range. This study aimed to determine whether pollinator-related variables such as visitation rate and pollinator type could explain differences in floral morphology seen in different populations. To assess pollinator community composition, pollinator visitation rates, the degree of herkogamy and flower size, flowers from 13 populations were observed and measured across the Western Cape, South Africa. Multiple regression analyses indicated that pollinator-related variables had no significant effect on the degree of herkogamy between sites. However, the degree of herkogamy was strongly negatively associated with the time of measurement. It remains possible that pollinators have had an effect on the development of herkogamy throughout the evolutionary history of different W. paniculata populations, but not necessarily to the fine-scale degree predicted for this study. Annual fluctuations in pollinator community composition, paired with recent disturbances such as urbanization and the overabundance of artificially introduced honeybee hives, might also result in the signal of pollinator adaptation being lost. Surprisingly, differences in herkogamy between populations could largely be explained by the time of day at which flowers were measured, suggesting a significant narrowing of the distance between reproductive parts throughout the day. We propose that this floral movement could be an adaptation to ensure pollination when pollinator visitation to a flower has not been sufficient earlier in the day; this will be explored in subsequent studies.

Keywords: enantiostyly, floral movement, geographic variation, ecotypes

Procedia PDF Downloads 249
3458 Parameter Estimation via Metamodeling

Authors: Sergio Haram Sarmiento, Arcady Ponosov

Abstract:

Based on appropriate multivariate statistical methodology, we suggest a generic framework for efficient parameter estimation for ordinary differential equations and the corresponding nonlinear models. In this framework, classical linear regression strategies are refined into a nonlinear regression by a locally linear modelling technique (known as metamodelling). The approach identifies those latent variables of the given model that accumulate the most information about it among all approximations of the same dimension. The method is applied to several benchmark problems, in particular to the so-called “power-law systems”, which are non-linear differential equations typically used in Biochemical Systems Theory.

Keywords: principal component analysis, generalized law of mass action, parameter estimation, metamodels

Procedia PDF Downloads 479
3457 Development of Computational Approach for Calculation of Hydrogen Solubility in Hydrocarbons for Treatment of Petroleum

Authors: Abdulrahman Sumayli, Saad M. AlShahrani

Abstract:

For the hydrogenation process, knowing the solubility of hydrogen (H2) in hydrocarbons is critical to improving the efficiency of the process. We investigated the computation of H2 solubility in four heavy crude oil feedstocks using machine learning techniques. Temperature, pressure, and feedstock type were considered as the inputs to the models, while the hydrogen solubility was the sole response. Specifically, we employed three different models: support vector regression (SVR), Gaussian process regression (GPR), and Bayesian ridge regression (BRR). To achieve the best performance, the hyper-parameters of these models were optimized using the whale optimization algorithm (WOA). We evaluated the models using a dataset of solubility measurements in various feedstocks, and we compared their performance based on several metrics. Our results show that the SVR model tuned with WOA achieves the best performance overall, with an RMSE of 1.38 × 10⁻² and an R-squared of 0.991. These findings suggest that machine learning techniques can provide accurate predictions of hydrogen solubility in different feedstocks, which could be useful in the development of hydrogen-related technologies. In addition, the solubility of hydrogen in the four heavy oil fractions is estimated over temperature and pressure ranges of 150 °C–350 °C and 1.2 MPa–10.8 MPa, respectively.

Keywords: temperature, pressure variations, machine learning, oil treatment

Procedia PDF Downloads 42
3456 Representativity Based Wasserstein Active Regression

Authors: Benjamin Bobbia, Matthias Picard

Abstract:

In recent years, active learning methodologies based on the representativity of the data have seemed more promising for limiting overfitting. The presented query methodology for regression uses the Wasserstein distance to measure the representativity of our labelled dataset compared to the global distribution. In this work, crucial use is made of GroupSort neural networks, which brings a double advantage: the Wasserstein distance can be exactly expressed in terms of such neural networks, and explicit bounds for their size and depth can be provided together with rates of convergence. Furthermore, the heterogeneity of the dataset is considered by weighting the Wasserstein distance with the approximation error at the previous step of active learning. Such an approach leads to a reduction of overfitting and high prediction performance after only a few query steps. After detailing the methodology and algorithm, an empirical study is presented in order to investigate the range of our hyperparameters. The performance of this method is compared, in terms of the number of queries needed, with other classical and recent query methods on several UCI datasets.

Keywords: active learning, Lipschitz regularization, neural networks, optimal transport, regression

Procedia PDF Downloads 56
3455 A Machine Learning Approach for Earthquake Prediction in Various Zones Based on Solar Activity

Authors: Viacheslav Shkuratskyy, Aminu Bello Usman, Michael O’Dea, Saifur Rahman Sabuj

Abstract:

This paper examines relationships between solar activity and earthquakes by applying machine learning techniques: k-nearest neighbour, support vector regression, random forest regression, and a long short-term memory network. Data from the SILSO World Data Center, the NOAA National Center, the GOES satellite, NASA OMNIWeb, and the United States Geological Survey were used for the experiment. The 23rd and 24th solar cycles, daily sunspot number, solar wind velocity, proton density, and proton temperature were all included in the dataset. The study also examined sunspots, solar wind, and solar flares, which all reflect solar activity, as well as the earthquake frequency distribution by magnitude and depth. The findings showed that the long short-term memory network model predicts earthquakes more accurately than the other models applied in the study, and that solar activity is more likely to affect earthquakes of lower magnitude and shallow depth than earthquakes of magnitude 5.5 or larger at intermediate and deep depths.

Keywords: k-nearest neighbour, support vector regression, random forest regression, long short-term memory network, earthquakes, solar activity, sunspot number, solar wind, solar flares

Procedia PDF Downloads 36
3454 Robust Adaptation to Background Noise in Multichannel C-OTDR Monitoring Systems

Authors: Andrey V. Timofeev, Viktor M. Denisov

Abstract:

A robust sequential nonparametric method is proposed for real-time adaptation to background noise parameters. The distribution of the background noise was modelled as a Huber contamination mixture. The method is designed to operate as an adaptation unit included inside the detection subsystem of an integrated multichannel monitoring system. The proposed method guarantees a given size of the nonasymptotic confidence set for the noise parameters. The properties of the suggested method are rigorously proved. The proposed algorithm has been successfully tested in the real conditions of a functioning C-OTDR monitoring system designed to monitor railways.

Keywords: guaranteed estimation, multichannel monitoring systems, non-asymptotic confidence set, contamination mixture

Procedia PDF Downloads 396
3453 A Hybrid Fuzzy Clustering Approach for Fertile and Unfertile Analysis

Authors: Shima Soltanzadeh, Mohammad Hosain Fazel Zarandi, Mojtaba Barzegar Astanjin

Abstract:

Diagnosis of male infertility by laboratory tests is expensive and sometimes intolerable for patients. Filling out a questionnaire and then using a classification method can be the first step in the decision-making process, so that laboratory tests are used only in cases with a high probability of infertility. In this paper, we evaluated the performance of four classification methods, including naive Bayesian, neural network, logistic regression and fuzzy c-means clustering used as a classifier, in the diagnosis of male infertility due to environmental factors. Since the data are unbalanced, ROC curves are the most suitable method for the comparison. In this paper, we also selected the more important features using a filtering method and examined the impact of this feature reduction on the performance of each method; generally, most of the methods performed better after applying the filter. We have shown that using fuzzy c-means clustering as a classifier performs well according to the ROC curves, and its performance is comparable to that of other classification methods such as logistic regression.

Keywords: classification, fuzzy c-means, logistic regression, Naive Bayesian, neural network, ROC curve

Procedia PDF Downloads 304
3452 Sensitivity Based Robust Optimization Using 9 Level Orthogonal Array and Stepwise Regression

Authors: K. K. Lee, H. W. Han, H. L. Kang, T. A. Kim, S. H. Han

Abstract:

For the robust optimization of manufacturing product design, there are design objectives that must be achieved, such as minimization of the mean and standard deviation of the objective functions within the required sensitivity constraints. The authors utilized the sensitivity of the objective functions and constraints with respect to the effective design variables to reduce the computational burden associated with the evaluation of the probabilities. The individual mean and sensitivity values could be estimated easily by using 9-level orthogonal array based response surface models optimized by stepwise regression. The present study evaluates the proposed procedure through the robust optimization of rubber domes that are commonly used for keyboard switching, using the 9-level orthogonal array and stepwise regression along with a desirability function. In addition, a new robust optimization process, I2GEO (Identify, Integrate, Generate, Explore and Optimize), was proposed on the basis of the robust optimization of the rubber domes. The optimized results from the response surface models and the results estimated by finite element analysis were consistent within a small margin of error. The standard deviation of the objective function decreased by 54.17% with the suggested sensitivity-based robust optimization. (Business for Cooperative R&D between Industry, Academy, and Research Institute, funded by the Korea Small and Medium Business Administration in 2017, S2455569)

Keywords: objective function, orthogonal array, response surface model, robust optimization, stepwise regression

Procedia PDF Downloads 261
3451 Linear Regression Estimation of Tactile Comfort for Denim Fabrics Based on In-Plane Shear Behavior

Authors: Nazli Uren, Ayse Okur

Abstract:

Tactile comfort of a textile product is an essential property and a major concern when it comes to customer perceptions and preferences. The subjective nature of comfort and the difficulties of simulating human hand sensory feelings make it hard to establish a well-accepted link between tactile comfort and objective evaluations. On the other hand, the shear behavior of a fabric is a mechanical parameter that can be measured by various objective test methods. The principal aim of this study is to determine the tactile comfort of commercially available denim fabrics by subjective measurements, create a tactile score database for denim fabrics and investigate the relations between tactile comfort and shear behavior. The in-plane shear behaviors of 17 commercially available denim fabrics with a variety of raw materials and weave structures were measured by a custom-designed shear frame and the conventional bias extension method in two corresponding diagonal directions. The tactile comfort of the denim fabrics was also determined via subjective customer evaluations. The aforesaid relations were statistically investigated and expressed as regression equations. The analyses of the relations between tactile comfort and shear behavior showed considerably high correlation coefficients, and the suggested regression equations were likewise found to be statistically significant. Accordingly, it was concluded that the tactile comfort of denim fabrics can be estimated with high precision based on the results of in-plane shear behavior measurements.

Keywords: denim fabrics, in-plane shear behavior, linear regression estimation, tactile comfort

Procedia PDF Downloads 272
3450 A Statistical Approach to Predict and Classify the Commercial Hatchability of Chickens Using Extrinsic Parameters of Breeders and Eggs

Authors: M. S. Wickramarachchi, L. S. Nawarathna, C. M. B. Dematawewa

Abstract:

Hatchery performance is critical for the profitability of poultry breeder operations. Some extrinsic parameters of eggs and breeders increase or decrease hatchability. This study aims to identify the extrinsic parameters affecting the commercial hatchability of local chickens' eggs and to determine the most efficient classification model with a hatchability rate greater than 90%. In this study, seven extrinsic parameters were considered: egg weight, moisture loss, breeder age, number of fertilised eggs, shell width, shell length, and shell thickness. Multiple linear regression was performed to determine the variables most influencing hatchability. First, the correlation between each parameter and hatchability was checked. Then a multiple regression model was developed, and the accuracy of the fitted model was evaluated. Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF) algorithms were applied to classify the hatchability. This grouping process was conducted using binary classification techniques. Hatchability was negatively correlated with egg weight, breeder age, shell width and shell length, while positive correlations were identified with moisture loss, number of fertilised eggs, and shell thickness. Multiple linear regression models were more accurate than single linear models, with the highest coefficient of determination (R²) of 94% and the minimum AIC and BIC values. According to the classification results, RF, CART, and kNN achieved the highest accuracy values of 0.99, 0.975, and 0.972, respectively, for the commercial hatchery process. Therefore, RF is the most appropriate machine learning algorithm for classifying whether breeder outcomes are economically profitable or not in a commercial hatchery.

Keywords: classification models, egg weight, fertilised eggs, multiple linear regression

Procedia PDF Downloads 58
3449 Scholastic Ability and Achievement as Predictors of College Performance among Selected Second Year College Students at University of Perpetual Help System DALTA, Calamba

Authors: Shielilo R. Amihan, Ederliza De Jesus

Abstract:

The study determined the predictors of college performance of second-year students of UPHSD-Calamba. This quantitative study conducted a survey using the Scholastic Abilities Test for Adults (SATA) and retrieved the entrance examination results and current General Weighted Average (GWA) of the 242 randomly selected respondents. Mean, Pearson r and multiple regression analyses through SPSS revealed that students are capable of verbal, non-verbal and quantitative reasoning, reading vocabulary, comprehension, math calculation, and writing mechanics but have difficulty in math application and writing composition. The study found that scholastic ability and achievement, except in mathematics, are significantly related to college performance. It concludes that students with high ability and achievement may perform better in college. However, only the English subtest results in the entrance exam predict the academic success of students in college, while the SATA and the Math entrance exam results do not. The study recommends providing pre-college Math and Writing courses as requisites in college. It also suggests implementing formative curriculum-based enhancement programs in specific priority areas, profiling programs towards informed individual academic decision-making, revising the entrance examinations, monitoring the development of the students, and exploring other predictors of college academic performance such as non-cognitive factors.

Keywords: scholastic ability, scholastic achievement, entrance exam, college performance

Procedia PDF Downloads 234