Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 29589

Search results for: panel data regression models

29289 Monitoring Large-Coverage Forest Canopy Height by Integrating LiDAR and Sentinel-2 Images

Authors: Xiaobo Liu, Rakesh Mishra, Yun Zhang

Abstract:

Continuous monitoring of forest canopy height with large coverage is essential for obtaining forest carbon stocks and emissions, quantifying biomass estimation, analyzing vegetation coverage, and determining biodiversity. LiDAR can be used to collect accurate woody vegetation structure such as canopy height. However, LiDAR’s coverage is usually limited because of its high cost and limited maneuverability, which constrains its use for dynamic and large area forest canopy monitoring. On the other hand, optical satellite images, like Sentinel-2, have the ability to cover large forest areas with a high repeat rate, but they do not have height information. Hence, exploring the solution of integrating LiDAR data and Sentinel-2 images to enlarge the coverage of forest canopy height prediction and increase the prediction repeat rate has been an active research topic in the environmental remote sensing community. In this study, we explore the potential of training a Random Forest Regression (RFR) model and a Convolutional Neural Network (CNN) model, respectively, to develop two predictive models for predicting and validating the forest canopy height of the Acadia Forest in New Brunswick, Canada, with a 10m ground sampling distance (GSD), for the year 2018 and 2021. Two 10m airborne LiDAR-derived canopy height models, one for 2018 and one for 2021, are used as ground truth to train and validate the RFR and CNN predictive models. To evaluate the prediction performance of the trained RFR and CNN models, two new predicted canopy height maps (CHMs), one for 2018 and one for 2021, are generated using the trained RFR and CNN models and 10m Sentinel-2 images of 2018 and 2021, respectively. The two 10m predicted CHMs from Sentinel-2 images are then compared with the two 10m airborne LiDAR-derived canopy height models for accuracy assessment. The validation results show that the mean absolute error (MAE) for year 2018 of the RFR model is 2.93m, CNN model is 1.71m; while the MAE for year 2021 of the RFR model is 3.35m, and the CNN model is 3.78m. These demonstrate the feasibility of using the RFR and CNN models developed in this research for predicting large-coverage forest canopy height at 10m spatial resolution and a high revisit rate.

Keywords: remote sensing, forest canopy height, LiDAR, Sentinel-2, artificial intelligence, random forest regression, convolutional neural network

Procedia PDF Downloads 68

29288 Consumption Insurance against the Chronic Illness: Evidence from Thailand

Authors: Yuthapoom Thanakijborisut

Abstract:

This paper studies consumption insurance against the chronic illness in Thailand. The study estimates the impact of household consumption in the chronic illness on consumption growth. Chronic illness is the health care costs of a person or a household’s decision in treatment for the long term; the causes and effects of the household’s ability for smooth consumption. The chronic illnesses are measured in health status when at least one member within the household faces the chronic illness. The data used is from the Household Social Economic Panel Survey conducted during 2007 and 2012. The survey collected data from approximately 6,000 households from every province, both inside and outside municipal areas in Thailand. The study estimates the change in household consumption by using an ordinary least squares (OLS) regression model. The result shows that the members within the household facing the chronic illness would reduce the consumption by around 4%. This case indicates that consumption insurance in Thailand is quite sufficient against chronic illness.

Keywords: consumption insurance, chronic illness, health care, Thailand

Procedia PDF Downloads 224

29287 Predicting Data Center Resource Usage Using Quantile Regression to Conserve Energy While Fulfilling the Service Level Agreement

Authors: Ahmed I. Alutabi, Naghmeh Dezhabad, Sudhakar Ganti

Abstract:

Data centers have been growing in size and dema nd continuously in the last two decades. Planning for the deployment of resources has been shallow and always resorted to over-provisioning. Data center operators try to maximize the availability of their services by allocating multiple of the needed resources. One resource that has been wasted, with little thought, has been energy. In recent years, programmable resource allocation has paved the way to allow for more efficient and robust data centers. In this work, we examine the predictability of resource usage in a data center environment. We use a number of models that cover a wide spectrum of machine learning categories. Then we establish a framework to guarantee the client service level agreement (SLA). Our results show that using prediction can cut energy loss by up to 55%.

Keywords: machine learning, artificial intelligence, prediction, data center, resource allocation, green computing

Procedia PDF Downloads 92

29286 Optimal Designof Brush Roll for Semiconductor Wafer Using CFD Analysis

Authors: Byeong-Sam Kim, Kyoungwoo Park

Abstract:

This research analyzes structure of flat panel display (FPD) such as LCD as quantitative through CFD analysis and modeling change to minimize the badness rate and rate of production decrease by damage of large scale plater at wafer heating chamber at semi-conductor manufacturing process. This glass panel and wafer device with atmospheric pressure or chemical vapor deposition equipment for transporting and transferring wafers, robot hands carry these longer and wider wafers can also be easily handled. As a contact handling system composed of several problems in increased potential for fracture or warping. A non-contact handling system is required to solve this problem. The panel and wafer warping makes it difficult to carry out conventional contact to analysis. We propose a new non-contact transportation system with combining air suction and blowout. The numerical analysis and experimental is, therefore, should be performed to obtain compared to results achieved with non-contact solutions. This wafer panel noncontact handler shows its strength in maintaining high cleanliness levels for semiconductor production processes.

Keywords: flat panel display, non contact transportation, heat treatment process, CFD analysis

Procedia PDF Downloads 399

29285 An Application of Quantile Regression to Large-Scale Disaster Research

Authors: Katarzyna Wyka, Dana Sylvan, JoAnn Difede

Abstract:

Background and significance: The following disaster, population-based screening programs are routinely established to assess physical and psychological consequences of exposure. These data sets are highly skewed as only a small percentage of trauma-exposed individuals develop health issues. Commonly used statistical methodology in post-disaster mental health generally involves population-averaged models. Such models aim to capture the overall response to the disaster and its aftermath; however, they may not be sensitive enough to accommodate population heterogeneity in symptomatology, such as post-traumatic stress or depressive symptoms. Methods: We use an archival longitudinal data set from Weill-Cornell 9/11 Mental Health Screening Program established following the World Trade Center (WTC) terrorist attacks in New York in 2001. Participants are rescue and recovery workers who participated in the site cleanup and restoration (n=2960). The main outcome is the post-traumatic stress symptoms (PTSD) severity score assessed via clinician interviews (CAPS). For a detailed understanding of response to the disaster and its aftermath, we are adapting quantile regression methodology with particular focus on predictors of extreme distress and resilience to trauma. Results: The response variable was defined as the quantile of the CAPS score for each individual under two different scenarios specifying the unconditional quantiles based on: 1) clinically meaningful CAPS cutoff values and 2) CAPS distribution in the population. We present graphical summaries of the differential effects. For instance, we found that the effect of the WTC exposures, namely seeing bodies and feeling that life was in danger during rescue/recovery work was associated with very high PTSD symptoms. A similar effect was apparent in individuals with prior psychiatric history. Differential effects were also present for age and education level of the individuals. Conclusion: We evaluate the utility of quantile regression in disaster research in contrast to the commonly used population-averaged models. We focused on assessing the distribution of risk factors for post-traumatic stress symptoms across quantiles. This innovative approach provides a comprehensive understanding of the relationship between dependent and independent variables and could be used for developing tailored training programs and response plans for different vulnerability groups.

Keywords: disaster workers, post traumatic stress, PTSD, quantile regression

Procedia PDF Downloads 269

29284 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and founding constructs in Latent Semantic Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations present on Item Response Theory. More precisely, we propose initially reducing dimensionality with specific use of Principal Component Analysis for the linguistic data and then, producing axes of groups made from a clustering analysis of the semantic data. This approach allows the user to give meaning to previous clusters and found the real latent structure presented by data. The methodology is applied in a set of real semantic data presenting impressive results for the coherence, speed and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

Procedia PDF Downloads 423

29283 Modeling Geogenic Groundwater Contamination Risk with the Groundwater Assessment Platform (GAP)

Authors: Joel Podgorski, Manouchehr Amini, Annette Johnson, Michael Berg

Abstract:

One-third of the world’s population relies on groundwater for its drinking water. Natural geogenic arsenic and fluoride contaminate ~10% of wells. Prolonged exposure to high levels of arsenic can result in various internal cancers, while high levels of fluoride are responsible for the development of dental and crippling skeletal fluorosis. In poor urban and rural settings, the provision of drinking water free of geogenic contamination can be a major challenge. In order to efficiently apply limited resources in the testing of wells, water resource managers need to know where geogenically contaminated groundwater is likely to occur. The Groundwater Assessment Platform (GAP) fulfills this need by providing state-of-the-art global arsenic and fluoride contamination hazard maps as well as enabling users to create their own groundwater quality models. The global risk models were produced by logistic regression of arsenic and fluoride measurements using predictor variables of various soil, geological and climate parameters. The maps display the probability of encountering concentrations of arsenic or fluoride exceeding the World Health Organization’s (WHO) stipulated concentration limits of 10 µg/L or 1.5 mg/L, respectively. In addition to a reconsideration of the relevant geochemical settings, these second-generation maps represent a great improvement over the previous risk maps due to a significant increase in data quantity and resolution. For example, there is a 10-fold increase in the number of measured data points, and the resolution of predictor variables is generally 60 times greater. These same predictor variable datasets are available on the GAP platform for visualization as well as for use with a modeling tool. The latter requires that users upload their own concentration measurements and select the predictor variables that they wish to incorporate in their models. In addition, users can upload additional predictor variable datasets either as features or coverages. Such models can represent an improvement over the global models already supplied, since (a) users may be able to use their own, more detailed datasets of measured concentrations and (b) the various processes leading to arsenic and fluoride groundwater contamination can be isolated more effectively on a smaller scale, thereby resulting in a more accurate model. All maps, including user-created risk models, can be downloaded as PDFs. There is also the option to share data in a secure environment as well as the possibility to collaborate in a secure environment through the creation of communities. In summary, GAP provides users with the means to reliably and efficiently produce models specific to their region of interest by making available the latest datasets of predictor variables along with the necessary modeling infrastructure.

Keywords: arsenic, fluoride, groundwater contamination, logistic regression

Procedia PDF Downloads 327

29282 Effect of Corrosion on the Shear Buckling Strength

Authors: Myoung-Jin Lee, Sung-Jin Lee, Young-Kon Park, Jin-Wook Kim, Bo-Kyoung Kim, Song-Hun Chong, Sun-Ii Kim

Abstract:

The ability to resist the shear strength arises mainly from the web panel of steel girders and as such, the shear buckling strength of these girders has been extensively investigated. For example, Blaser’s reported that when buckling occurs, the tension field has an effect after the buckling strength of the steel is reached. The findings of these studies have been applied by AASHTO, AISC, and to the European Code that provides guidelines for designs aimed at preventing shear buckling. Steel girders are susceptible to corrosion resulting from exposure to natural elements such as rainfall, humidity, and temperature. This corrosion leads to a reduction in the size of the web panel section, thereby resulting in a decrease in the shear strength. The decrease in the panel section has a significant effect on the maintenance section of the bridge. However, in most conventional designs, the influence of corrosion is overlooked during the calculation of the shear buckling strength and hence over-design is common. Therefore, in this study, a steel girder with an A/D of 1:1, as well as a 6-mm-, 16-mm-, and 12-mm-thick web panel, flange, and intermediate reinforcing material, respectively, were used. The total length was set to that (3200 mm) of the default model. The effect of corrosion shear buckling was investigated by determining the volume amount of corrosion, shape of the erosion patterns, and the angular change in the tensile field of the shear buckling strength. This study provides the basic data that will enable designs that incorporate values closer (than those used in most conventional designs) to the actual shear buckling strength.

Keywords: corrosion, shear buckling strength, steel girder, shear strength

Procedia PDF Downloads 355

29281 Bank, Stock Market Efficiency and Economic Growth: Lessons for ASEAN-5

Authors: Tan Swee Liang

Abstract:

This paper estimates bank and stock market efficiency associations with real per capita GDP growth by examining panel-data across three different regions using Panel-Corrected Standard Errors (PCSE) regression developed by Beck and Katz (1995). Data from five economies in ASEAN (Singapore, Malaysia, Thailand, Philippines, and Indonesia), five economies in Asia (Japan, China, Hong Kong SAR, South Korea, and India) and seven economies in OECD (Australia, Canada, Denmark, Norway, Sweden, United Kingdom U.K., and United States U.S.), between 1990 and 2017 are used. Empirical findings suggest one, for Asia-5 high bank net interest margin means greater bank profitability, hence spurring economic growth. Two, for OECD-7 low bank overhead costs (as a share of total assets) may reflect weak competition and weak investment in providing superior banking services, hence dampening economic growth. Three, stock market turnover ratio has negative association with OECD-7 economic growth, but a positive association with Asia-5, which suggest the relationship between liquidity and growth is ambiguous. Lastly, for ASEAN-5 high bank overhead costs (as a share of total assets) may suggest expenses have not been channelled efficiently to income generating activities. One practical implication of the findings is that policy makers should take necessary measures toward financial liberalisation policies that boost growth through the efficiency channel, so that funds are efficiently allocated through the financial system between financial and real sectors.

Keywords: financial development, banking system, capital markets, economic growth

Procedia PDF Downloads 119

29280 Towards the Development of Uncertainties Resilient Business Model for Driving the Solar Panel Industry in Nigeria Power Sector

Authors: Balarabe Z. Ahmad, Anne-Lorène Vernay

Abstract:

The emergence of electricity in Nigeria was dated back to 1896. The power plants have the potential to generate 12,522 MW of electric power. Whereas current dispatch is about 4,000 MW, access to electrification is about 60%, with consumption at 0.14 MWh/capita. The government embarked on energy reforms to mitigate energy poverty. The reform targeted the provision of electricity access to 75% of the population by 2020 and 90% by 2030. Growth of total electricity demand by a factor of 5 by 2035 had been projected. This means that Nigeria will require almost 530 TWh of electricity which can be delivered through generators with a capacity of 65 GW. Analogously, the geographical location of Nigeria has placed it in an advantageous position as the source of solar energy; the availability of a high sunshine belt is obvious in the country. The implication is that the far North, where energy poverty is high, equally has about twice the solar radiation as against southern Nigeria. Hence, the chance of generating solar electricity is 66% possible at 11850 x 103 GWh per year, which is one hundred times the current electricity consumption rate in the country. Harvesting these huge potentials may be a mirage if the entrepreneurs in the solar panel business are left with the conventional business models that are not uncertainty resilient. Currently, business entities in RE in Nigeria are uncertain of; accessing the national grid, purchasing potentials of cooperating organizations, currency fluctuation and interest rate increases. Uncertainties such as the security of projects and government policy are issues entrepreneurs must navigate to remain sustainable in the solar panel industry in Nigeria. The aim of this paper is to identify how entrepreneurial firms consider uncertainties in developing workable business models for commercializing solar energy projects in Nigeria. In an attempt to develop a novel business model, the paper investigated how entrepreneurial firms assess and navigate uncertainties. The roles of key stakeholders in helping entrepreneurs to manage uncertainties in the Nigeria RE sector were probed in the ongoing study. The study explored empirical uncertainties that are peculiar to RE entrepreneurs in Nigeria. A mixed-mode of research was embraced using qualitative data from face-to-face interviews conducted on the Solar Energy Entrepreneurs and the experts drawn from key stakeholders. Content analysis of the interview was done using Atlas. It is a nine qualitative tool. The result suggested that all stakeholders are required to synergize in developing an uncertainty resilient business model. It was opined that the RE entrepreneurs need modifications in the business recommendations encapsulated in the energy policy in Nigeria to strengthen their capability in delivering solar energy solutions to the yawning Nigerians.

Keywords: uncertainties, entrepreneurial, business model, solar-panel

Procedia PDF Downloads 130

29279 A Quadratic Model to Early Predict the Blastocyst Stage with a Time Lapse Incubator

Authors: Cecile Edel, Sandrine Giscard D'Estaing, Elsa Labrune, Jacqueline Lornage, Mehdi Benchaib

Abstract:

Introduction: The use of incubator equipped with time-lapse technology in Artificial Reproductive Technology (ART) allows a continuous surveillance. With morphocinetic parameters, algorithms are available to predict the potential outcome of an embryo. However, the different proposed time-lapse algorithms do not take account the missing data, and then some embryos could not be classified. The aim of this work is to construct a predictive model even in the case of missing data. Materials and methods: Patients: A retrospective study was performed, in biology laboratory of reproduction at the hospital ‘Femme Mère Enfant’ (Lyon, France) between 1 May 2013 and 30 April 2015. Embryos (n= 557) obtained from couples (n=108) were cultured in a time-lapse incubator (Embryoscope®, Vitrolife, Goteborg, Sweden). Time-lapse incubator: The morphocinetic parameters obtained during the three first days of embryo life were used to build the predictive model. Predictive model: A quadratic regression was performed between the number of cells and time. N = a. T² + b. T + c. N: number of cells at T time (T in hours). The regression coefficients were calculated with Excel software (Microsoft, Redmond, WA, USA), a program with Visual Basic for Application (VBA) (Microsoft) was written for this purpose. The quadratic equation was used to find a value that allows to predict the blastocyst formation: the synthetize value. The area under the curve (AUC) obtained from the ROC curve was used to appreciate the performance of the regression coefficients and the synthetize value. A cut-off value has been calculated for each regression coefficient and for the synthetize value to obtain two groups where the difference of blastocyst formation rate according to the cut-off values was maximal. The data were analyzed with SPSS (IBM, Il, Chicago, USA). Results: Among the 557 embryos, 79.7% had reached the blastocyst stage. The synthetize value corresponds to the value calculated with time value equal to 99, the highest AUC was then obtained. The AUC for regression coefficient ‘a’ was 0.648 (p < 0.001), 0.363 (p < 0.001) for the regression coefficient ‘b’, 0.633 (p < 0.001) for the regression coefficient ‘c’, and 0.659 (p < 0.001) for the synthetize value. The results are presented as follow: blastocyst formation rate under cut-off value versus blastocyst rate formation above cut-off value. For the regression coefficient ‘a’ the optimum cut-off value was -1.14.10-3 (61.3% versus 84.3%, p < 0.001), 0.26 for the regression coefficient ‘b’ (83.9% versus 63.1%, p < 0.001), -4.4 for the regression coefficient ‘c’ (62.2% versus 83.1%, p < 0.001) and 8.89 for the synthetize value (58.6% versus 85.0%, p < 0.001). Conclusion: This quadratic regression allows to predict the outcome of an embryo even in case of missing data. Three regression coefficients and a synthetize value could represent the identity card of an embryo. ‘a’ regression coefficient represents the acceleration of cells division, ‘b’ regression coefficient represents the speed of cell division. We could hypothesize that ‘c’ regression coefficient could represent the intrinsic potential of an embryo. This intrinsic potential could be dependent from oocyte originating the embryo. These hypotheses should be confirmed by studies analyzing relationship between regression coefficients and ART parameters.

Keywords: ART procedure, blastocyst formation, time-lapse incubator, quadratic model

Procedia PDF Downloads 293

29278 Comparison of Different k-NN Models for Speed Prediction in an Urban Traffic Network

Authors: Seyoung Kim, Jeongmin Kim, Kwang Ryel Ryu

Abstract:

A database that records average traffic speeds measured at five-minute intervals for all the links in the traffic network of a metropolitan city. While learning from this data the models that can predict future traffic speed would be beneficial for the applications such as the car navigation system, building predictive models for every link becomes a nontrivial job if the number of links in a given network is huge. An advantage of adopting k-nearest neighbor (k-NN) as predictive models is that it does not require any explicit model building. Instead, k-NN takes a long time to make a prediction because it needs to search for the k-nearest neighbors in the database at prediction time. In this paper, we investigate how much we can speed up k-NN in making traffic speed predictions by reducing the amount of data to be searched for without a significant sacrifice of prediction accuracy. The rationale behind this is that we had a better look at only the recent data because the traffic patterns not only repeat daily or weekly but also change over time. In our experiments, we build several different k-NN models employing different sets of features which are the current and past traffic speeds of the target link and the neighbor links in its up/down-stream. The performances of these models are compared by measuring the average prediction accuracy and the average time taken to make a prediction using various amounts of data.

Keywords: big data, k-NN, machine learning, traffic speed prediction

Procedia PDF Downloads 337

29277 Predicting the Impact of Scope Changes on Project Cost and Schedule Using Machine Learning Techniques

Authors: Soheila Sadeghi

Abstract:

In the dynamic landscape of project management, scope changes are an inevitable reality that can significantly impact project performance. These changes, whether initiated by stakeholders, external factors, or internal project dynamics, can lead to cost overruns and schedule delays. Accurately predicting the consequences of these changes is crucial for effective project control and informed decision-making. This study aims to develop predictive models to estimate the impact of scope changes on project cost and schedule using machine learning techniques. The research utilizes a comprehensive dataset containing detailed information on project tasks, including the Work Breakdown Structure (WBS), task type, productivity rate, estimated cost, actual cost, duration, task dependencies, scope change magnitude, and scope change timing. Multiple machine learning models are developed and evaluated to predict the impact of scope changes on project cost and schedule. These models include Linear Regression, Decision Tree, Ridge Regression, Random Forest, Gradient Boosting, and XGBoost. The dataset is split into training and testing sets, and the models are trained using the preprocessed data. Cross-validation techniques are employed to assess the robustness and generalization ability of the models. The performance of the models is evaluated using metrics such as Mean Squared Error (MSE) and R-squared. Residual plots are generated to assess the goodness of fit and identify any patterns or outliers. Hyperparameter tuning is performed to optimize the XGBoost model and improve its predictive accuracy. The feature importance analysis reveals the relative significance of different project attributes in predicting the impact on cost and schedule. Key factors such as productivity rate, scope change magnitude, task dependencies, estimated cost, actual cost, duration, and specific WBS elements are identified as influential predictors. The study highlights the importance of considering both cost and schedule implications when managing scope changes. The developed predictive models provide project managers with a data-driven tool to proactively assess the potential impact of scope changes on project cost and schedule. By leveraging these insights, project managers can make informed decisions, optimize resource allocation, and develop effective mitigation strategies. The findings of this research contribute to improved project planning, risk management, and overall project success.

Keywords: cost impact, machine learning, predictive modeling, schedule impact, scope changes

Procedia PDF Downloads 17

29276 Evaluating Traffic Congestion Using the Bayesian Dirichlet Process Mixture of Generalized Linear Models

Authors: Ren Moses, Emmanuel Kidando, Eren Ozguven, Yassir Abdelrazig

Abstract:

This study applied traffic speed and occupancy to develop clustering models that identify different traffic conditions. Particularly, these models are based on the Dirichlet Process Mixture of Generalized Linear regression (DML) and change-point regression (CR). The model frameworks were implemented using 2015 historical traffic data aggregated at a 15-minute interval from an Interstate 295 freeway in Jacksonville, Florida. Using the deviance information criterion (DIC) to identify the appropriate number of mixture components, three traffic states were identified as free-flow, transitional, and congested condition. Results of the DML revealed that traffic occupancy is statistically significant in influencing the reduction of traffic speed in each of the identified states. Influence on the free-flow and the congested state was estimated to be higher than the transitional flow condition in both evening and morning peak periods. Estimation of the critical speed threshold using CR revealed that 47 mph and 48 mph are speed thresholds for congested and transitional traffic condition during the morning peak hours and evening peak hours, respectively. Free-flow speed thresholds for morning and evening peak hours were estimated at 64 mph and 66 mph, respectively. The proposed approaches will facilitate accurate detection and prediction of traffic congestion for developing effective countermeasures.

Keywords: traffic congestion, multistate speed distribution, traffic occupancy, Dirichlet process mixtures of generalized linear model, Bayesian change-point detection

Procedia PDF Downloads 274

29275 The Use of Geographically Weighted Regression for Deforestation Analysis: Case Study in Brazilian Cerrado

Authors: Ana Paula Camelo, Keila Sanches

Abstract:

The Geographically Weighted Regression (GWR) was proposed in geography literature to allow relationship in a regression model to vary over space. In Brazil, the agricultural exploitation of the Cerrado Biome is the main cause of deforestation. In this study, we propose a methodology using geostatistical methods to characterize the spatial dependence of deforestation in the Cerrado based on agricultural production indicators. Therefore, it was used the set of exploratory spatial data analysis tools (ESDA) and confirmatory analysis using GWR. It was made the calibration a non-spatial model, evaluation the nature of the regression curve, election of the variables by stepwise process and multicollinearity analysis. After the evaluation of the non-spatial model was processed the spatial-regression model, statistic evaluation of the intercept and verification of its effect on calibration. In an analysis of Spearman’s correlation the results between deforestation and livestock was +0.783 and with soybeans +0.405. The model presented R²=0.936 and showed a strong spatial dependence of agricultural activity of soybeans associated to maize and cotton crops. The GWR is a very effective tool presenting results closer to the reality of deforestation in the Cerrado when compared with other analysis.

Keywords: deforestation, geographically weighted regression, land use, spatial analysis

Procedia PDF Downloads 342

29274 Regional Flood-Duration-Frequency Models for Norway

Authors: Danielle M. Barna, Kolbjørn Engeland, Thordis Thorarinsdottir, Chong-Yu Xu

Abstract:

Design flood values give estimates of flood magnitude within a given return period and are essential to making adaptive decisions around land use planning, infrastructure design, and disaster mitigation. Often design flood values are needed at locations with insufficient data. Additionally, in hydrologic applications where flood retention is important (e.g., floodplain management and reservoir design), design flood values are required at different flood durations. A statistical approach to this problem is a development of a regression model for extremes where some of the parameters are dependent on flood duration in addition to being covariate-dependent. In hydrology, this is called a regional flood-duration-frequency (regional-QDF) model. Typically, the underlying statistical distribution is chosen to be the Generalized Extreme Value (GEV) distribution. However, as the support of the GEV distribution depends on both its parameters and the range of the data, special care must be taken with the development of the regional model. In particular, we find that the GEV is problematic when developing a GAMLSS-type analysis due to the difficulty of proposing a link function that is independent of the unknown parameters and the observed data. We discuss these challenges in the context of developing a regional QDF model for Norway.

Keywords: design flood values, bayesian statistics, regression modeling of extremes, extreme value analysis, GEV

Procedia PDF Downloads 60

29273 Foreign Direct Investment, Economic Growth and CO2 Emissions: Evidence from WAIFEM Member Countries

Authors: Nasiru Inuwa, Haruna Usman Modibbo, Yahya Zakari Abdullahi

Abstract:

The purpose of this paper is to investigate the effects of foreign direct investment (FDI), economic growth on carbon emissions in context of WAIFEM member countries. The Im-Pesaran-Shin panel unit root test, Kao residual based test panel cointegration technique and panel Granger causality tests over the period 1980-2012 within a multivariate framework were applied. The results of cointegration test revealed a long run equilibrium relationship among CO2 emissions, economic growth and foreign direct investment. The results of Granger causality tests revealed a unidirectional causality running from economic growth to CO2 emissions for the panel of WAIFEM countries at the 5% level. Also, Granger causality runs from economic growth to foreign direct investment without feedback. However, no causality relationship between foreign direct investment and CO2 emissions for the panel of WAIFEM countries was observed. The study therefore, suggest that policy makers from WAIFEM member countries should design policies aim at attracting more foreign direct investments inflow as well the adoption of cleaner production technologies in order to reduce CO2 emissions.

Keywords: economic growth, CO2 emissions, causality, WAIFEM

Procedia PDF Downloads 551

29272 Technical and Practical Aspects of Sizing a Autonomous PV System

Authors: Abdelhak Bouchakour, Mustafa Brahami, Layachi Zaghba

Abstract:

The use of photovoltaic energy offers an inexhaustible supply of energy but also a clean and non-polluting energy, which is a definite advantage. The geographical location of Algeria promotes the development of the use of this energy. Indeed, given the importance of the intensity of the radiation received and the duration of sunshine. For this reason, the objective of our work is to develop a data-processing tool (software) of calculation and optimization of dimensioning of the photovoltaic installations. Our approach of optimization is basing on mathematical models, which amongst other things describe the operation of each part of the installation, the energy production, the storage and the consumption of energy.

Keywords: solar panel, solar radiation, inverter, optimization

Procedia PDF Downloads 588

29271 The Effect of Particle Porosity in Mixed Matrix Membrane Permeation Models

Authors: Z. Sadeghi, M. R. Omidkhah, M. E. Masoomi

Abstract:

The purpose of this paper is to examine gas transport behavior of mixed matrix membranes (MMMs) combined with porous particles. Main existing models are categorized in two main groups; two-phase (ideal contact) and three-phase (non-ideal contact). A new coefficient, J, was obtained to express equations for estimating effect of the particle porosity in two-phase and three-phase models. Modified models evaluates with existing models and experimental data using Matlab software. Comparison of gas permeability of proposed modified models with existing models in different MMMs shows a better prediction of gas permeability in MMMs.

Keywords: mixed matrix membrane, permeation models, porous particles, porosity

Procedia PDF Downloads 368

29270 Prediction of Mechanical Strength of Multiscale Hybrid Reinforced Cementitious Composite

Authors: Salam Alrekabi, A. B. Cundy, Mohammed Haloob Al-Majidi

Abstract:

Novel multiscale hybrid reinforced cementitious composites based on carbon nanotubes (MHRCC-CNT), and carbon nanofibers (MHRCC-CNF) are new types of cement-based material fabricated with micro steel fibers and nanofilaments, featuring superior strain hardening, ductility, and energy absorption. This study focused on established models to predict the compressive strength, and direct and splitting tensile strengths of the produced cementitious composites. The analysis was carried out based on the experimental data presented by the previous author’s study, regression analysis, and the established models that available in the literature. The obtained models showed small differences in the predictions and target values with experimental verification indicated that the estimation of the mechanical properties could be achieved with good accuracy.

Keywords: multiscale hybrid reinforced cementitious composites, carbon nanotubes, carbon nanofibers, mechanical strength prediction

Procedia PDF Downloads 148

29269 Big Data Analysis with Rhipe

Authors: Byung Ho Jung, Ji Eun Shin, Dong Hoon Lim

Abstract:

Rhipe that integrates R and Hadoop environment made it possible to process and analyze massive amounts of data using a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe with various data sizes of actual data. Experimental results for comparing the performance of our Rhipe with stats and biglm packages available on bigmemory, showed that our Rhipe was more fast than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases. We also compared the computing speeds of pseudo-distributed and fully-distributed modes for configuring Hadoop cluster. The results showed that fully-distributed mode was faster than pseudo-distributed mode, and computing speeds of fully-distributed mode were faster as the number of data nodes increases.

Keywords: big data, Hadoop, Parallel regression analysis, R, Rhipe

Procedia PDF Downloads 487

29268 Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second

Authors: P. V. Pramila , V. Mahesh

Abstract:

Pulmonary Function Tests are important non-invasive diagnostic tests to assess respiratory impairments and provides quantifiable measures of lung function. Spirometry is the most frequently used measure of lung function and plays an essential role in the diagnosis and management of pulmonary diseases. However, the test requires considerable patient effort and cooperation, markedly related to the age of patients esulting in incomplete data sets. This paper presents, a nonlinear model built using Multivariate adaptive regression splines and Random forest regression model to predict the missing spirometric features. Random forest based feature selection is used to enhance both the generalization capability and the model interpretability. In the present study, flow-volume data are recorded for N= 198 subjects. The ranked order of feature importance index calculated by the random forests model shows that the spirometric features FVC, FEF 25, PEF,FEF 25-75, FEF50, and the demographic parameter height are the important descriptors. A comparison of performance assessment of both models prove that, the prediction ability of MARS with the `top two ranked features namely the FVC and FEF 25 is higher, yielding a model fit of R2= 0.96 and R2= 0.99 for normal and abnormal subjects. The Root Mean Square Error analysis of the RF model and the MARS model also shows that the latter is capable of predicting the missing values of FEV1 with a notably lower error value of 0.0191 (normal subjects) and 0.0106 (abnormal subjects). It is concluded that combining feature selection with a prediction model provides a minimum subset of predominant features to train the model, yielding better prediction performance. This analysis can assist clinicians with a intelligence support system in the medical diagnosis and improvement of clinical care.

Keywords: FEV, multivariate adaptive regression splines pulmonary function test, random forest

Procedia PDF Downloads 289

29267 The Effects of Corporate Governance on Firm’s Financial Performance: A Study of Family and Non-family Owned Firms in Pakistan

Authors: Saad Bin Nasir

Abstract:

This research will examine the impact of corporate governance on firm performance in family and non-family owned firms in Pakistan. For the purpose of this research, corporate governance mechanisms which included are board size, board composition, leadership structure, board meetings are taken as independent variable and firm performance taken as dependent variable and it will be measured with return on asset and return on equity. Firm size and firm’s age will be taken as control variables. Secondary data will collect from audited annul reports of companies and panel data regression model will applied, to check the impact of corporate governance on firm performance.

Keywords: board size, board composition, Leadership Structure, board meetings, firm performance, family and non-family owned firms

Procedia PDF Downloads 359

29266 Using Machine Learning to Enhance Win Ratio for College Ice Hockey Teams

Authors: Sadixa Sanjel, Ahmed Sadek, Naseef Mansoor, Zelalem Denekew

Abstract:

Collegiate ice hockey (NCAA) sports analytics is different from the national level hockey (NHL). We apply and compare multiple machine learning models such as Linear Regression, Random Forest, and Neural Networks to predict the win ratio for a team based on their statistics. Data exploration helps determine which statistics are most useful in increasing the win ratio, which would be beneficial to coaches and team managers. We ran experiments to select the best model and chose Random Forest as the best performing. We conclude with how to bridge the gap between the college and national levels of sports analytics and the use of machine learning to enhance team performance despite not having a lot of metrics or budget for automatic tracking.

Keywords: NCAA, NHL, sports analytics, random forest, regression, neural networks, game predictions

Procedia PDF Downloads 100

29265 A Survey on Quasi-Likelihood Estimation Approaches for Longitudinal Set-ups

Authors: Naushad Mamode Khan

Abstract:

The Com-Poisson (CMP) model is one of the most popular discrete generalized linear models (GLMS) that handles both equi-, over- and under-dispersed data. In longitudinal context, an integer-valued autoregressive (INAR(1)) process that incorporates covariate specification has been developed to model longitudinal CMP counts. However, the joint likelihood CMP function is difficult to specify and thus restricts the likelihood based estimating methodology. The joint generalized quasilikelihood approach (GQL-I) was instead considered but is rather computationally intensive and may not even estimate the regression effects due to a complex and frequently ill conditioned covariance structure. This paper proposes a new GQL approach for estimating the regression parameters (GQLIII) that are based on a single score vector representation. The performance of GQL-III is compared with GQL-I and separate marginal GQLs (GQL-II) through some simulation experiments and is proved to yield equally efficient estimates as GQL-I and is far more computationally stable.

Keywords: longitudinal, com-Poisson, ill-conditioned, INAR(1), GLMS, GQL

Procedia PDF Downloads 341

29264 Impact of Board Characteristics on Financial Performance: A Study of Manufacturing Sector of Pakistan

Authors: Saad Bin Nasir

Abstract:

The research will examine the role of corporate governance (CG) practices on firm’s financial performance. Population of this research will be manufacture sector of Pakistan. For the purposes of measurement of impact of corporate governance practices such as board size, board independence, ceo/chairman duality, will take as independent variables and for the measurement of firm’s performance return on assets and return on equity will take as dependent variables. Panel data regression model will be used to estimate the impact of CG on firm performance.

Keywords: corporate governance, board size, board independence, leadership

Procedia PDF Downloads 503

29263 Validation of Escherichia coli O157:H7 Inactivation on Apple-Carrot Juice Treated with Manothermosonication by Kinetic Models

Authors: Ozan Kahraman, Hao Feng

Abstract:

Several models such as Weibull, Modified Gompertz, Biphasic linear, and Log-logistic models have been proposed in order to describe non-linear inactivation kinetics and used to fit non-linear inactivation data of several microorganisms for inactivation by heat, high pressure processing or pulsed electric field. First-order kinetic parameters (D-values and z-values) have often been used in order to identify microbial inactivation by non-thermal processing methods such as ultrasound. Most ultrasonic inactivation studies employed first-order kinetic parameters (D-values and z-values) in order to describe the reduction on microbial survival count. This study was conducted to analyze the E. coli O157:H7 inactivation data by using five microbial survival models (First-order, Weibull, Modified Gompertz, Biphasic linear and Log-logistic). First-order, Weibull, Modified Gompertz, Biphasic linear and Log-logistic kinetic models were used for fitting inactivation curves of Escherichia coli O157:H7. The residual sum of squares and the total sum of squares criteria were used to evaluate the models. The statistical indices of the kinetic models were used to fit inactivation data for E. coli O157:H7 by MTS at three temperatures (40, 50, and 60 0C) and three pressures (100, 200, and 300 kPa). Based on the statistical indices and visual observations, the Weibull and Biphasic models were best fitting of the data for MTS treatment as shown by high R2 values. The non-linear kinetic models, including the Modified Gompertz, First-order, and Log-logistic models did not provide any better fit to data from MTS compared the Weibull and Biphasic models. It was observed that the data found in this study did not follow the first-order kinetics. It is possibly because of the cells which are sensitive to ultrasound treatment were inactivated first, resulting in a fast inactivation period, while those resistant to ultrasound were killed slowly. The Weibull and biphasic models were found as more flexible in order to determine the survival curves of E. coli O157:H7 treated by MTS on apple-carrot juice.

Keywords: Weibull, Biphasic, MTS, kinetic models, E.coli O157:H7

Procedia PDF Downloads 348

29262 Geographic Information System for District Level Energy Performance Simulations

Authors: Avichal Malhotra, Jerome Frisch, Christoph van Treeck

Abstract:

The utilization of semantic, cadastral and topological data from geographic information systems (GIS) has exponentially increased for building and urban-scale energy performance simulations. Urban planners, simulation scientists, and researchers use virtual 3D city models for energy analysis, algorithms and simulation tools. For dynamic energy simulations at city and district level, this paper provides an overview of the available GIS data models and their levels of detail. Adhering to different norms and standards, these models also intend to describe building and construction industry data. For further investigations, CityGML data models are considered for simulations. Though geographical information modelling has considerably many different implementations, extensions of virtual city data can also be made for domain specific applications. Highlighting the use of the extended CityGML models for energy researches, a brief introduction to the Energy Application Domain Extension (ADE) along with its significance is made. Consequently, addressing specific input simulation data, a workflow using Modelica underlining the usage of GIS information and the quantification of its significance over annual heating energy demand is presented in this paper.

Keywords: CityGML, EnergyADE, energy performance simulation, GIS

Procedia PDF Downloads 152

29261 Partial Least Square Regression for High-Dimentional and High-Correlated Data

Authors: Mohammed Abdullah Alshahrani

Abstract:

The research focuses on investigating the use of partial least squares (PLS) methodology for addressing challenges associated with high-dimensional correlated data. Recent technological advancements have led to experiments producing data characterized by a large number of variables compared to observations, with substantial inter-variable correlations. Such data patterns are common in chemometrics, where near-infrared (NIR) spectrometer calibrations record chemical absorbance levels across hundreds of wavelengths, and in genomics, where thousands of genomic regions' copy number alterations (CNA) are recorded from cancer patients. PLS serves as a widely used method for analyzing high-dimensional data, functioning as a regression tool in chemometrics and a classification method in genomics. It handles data complexity by creating latent variables (components) from original variables. However, applying PLS can present challenges. The study investigates key areas to address these challenges, including unifying interpretations across three main PLS algorithms and exploring unusual negative shrinkage factors encountered during model fitting. The research presents an alternative approach to addressing the interpretation challenge of predictor weights associated with PLS. Sparse estimation of predictor weights is employed using a penalty function combining a lasso penalty for sparsity and a Cauchy distribution-based penalty to account for variable dependencies. The results demonstrate sparse and grouped weight estimates, aiding interpretation and prediction tasks in genomic data analysis. High-dimensional data scenarios, where predictors outnumber observations, are common in regression analysis applications. Ordinary least squares regression (OLS), the standard method, performs inadequately with high-dimensional and highly correlated data. Copy number alterations (CNA) in key genes have been linked to disease phenotypes, highlighting the importance of accurate classification of gene expression data in bioinformatics and biology using regularized methods like PLS for regression and classification.

Keywords: partial least square regression, genetics data, negative filter factors, high dimensional data, high correlated data

Procedia PDF Downloads 33

29260 Thin-Layer Drying Characteristics and Modelling of Instant Coffee Solution

Authors: Apolinar Picado, Ronald Solís, Rafael Gamero

Abstract:

The thin-layer drying characteristics of instant coffee solution were investigated in a laboratory tunnel dryer. Drying experiments were carried out at three temperatures (80, 100 and 120 °C) and an air velocity of 1.2 m/s. Drying experimental data obtained are fitted to six (6) thin-layer drying models using the non-linear least squares regression analysis. The acceptability of the thin-layer drying model has been based on a value of the correlation coefficient that should be close to one, and low values for root mean square error (RMSE) and chi-square (x²). According to this evaluation, the most suitable model for describing drying process of thin-layer instant coffee solution is the Page model. Further, the effective moisture diffusivity and the activation energy were computed employing the drying experimental data. The effective moisture diffusivity values varied from 1.6133 × 10⁻⁹ to 1.6224 × 10⁻⁹ m²/s over the temperature range studied and the activation energy was estimated to be 162.62 J/mol.

Keywords: activation energy, diffusivity, instant coffee, thin-layer models

Procedia PDF Downloads 240