Search results for: regression models
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8992

Search results for: regression models

8362 Using Machine-Learning Methods for Allergen Amino Acid Sequence's Permutations

Authors: Kuei-Ling Sun, Emily Chia-Yu Su

Abstract:

Allergy is a hypersensitive overreaction of the immune system to environmental stimuli, and a major health problem. These overreactions include rashes, sneezing, fever, food allergies, anaphylaxis, asthmatic, shock, or other abnormal conditions. Allergies can be caused by food, insect stings, pollen, animal wool, and other allergens. Their development of allergies is due to both genetic and environmental factors. Allergies involve immunoglobulin E antibodies, a part of the body’s immune system. Immunoglobulin E antibodies will bind to an allergen and then transfer to a receptor on mast cells or basophils triggering the release of inflammatory chemicals such as histamine. Based on the increasingly serious problem of environmental change, changes in lifestyle, air pollution problem, and other factors, in this study, we both collect allergens and non-allergens from several databases and use several machine learning methods for classification, including logistic regression (LR), stepwise regression, decision tree (DT) and neural networks (NN) to do the model comparison and determine the permutations of allergen amino acid’s sequence.

Keywords: allergy, classification, decision tree, logistic regression, machine learning

Procedia PDF Downloads 286
8361 Kenaf MDF Panels with Soy Based Adhesive. The Influence of Preparation Parameters on Physciomechanical Properties

Authors: Imtiaz Ali, Krishnan Jayaraman, Debes Bhattacharyya

Abstract:

Soybean concentrate is abundant material and renewable product that is recently been explored as an alternative to conventional formaldehyde based resins in wood based products. The main goal of this study is to evaluate the technical feasibility of manufacturing environment friendly MDF panels from renewable resources. The panels are made by using kenaf bast fibers (KB) as wood substitute and soy based adhesive as bonding material. Second order response surface regression models are used to understand the effects and interactions of resin content (RC) and pressing time (PT) on the mechanical and water soaking properties of kenaf panels. The mechanical and water soaking properties are significantly improved as the RC increased and reached at the highest level at maximum resin loading (12%). The effect of pressing time is significant in the first phase when the pressing time increased from 4 to 6 min; however the effect was not as significant when pressing time further increased to 8 min. The second order regression equations further confirm that the variation in process parameters has strong relationship with the physciomechanical properties. The MDF panels the minimum requirements of internal bond strength, modulus of rupture and modulus of elasticity as recommended by US wood MDF standard specifications for G110, G120, G130 and G140 grade MDF panels. However, the thickness swelling results are considerably poorer than the recommended values of general purpose standard requirements. This deficiency can be counterbalanced by the advantage of being formaldehyde free panels made from renewable sources and by making them suitable alternative for less humid environment applications.

Keywords: kenaf, Medium density fibreboard, soy adhesive, mechanical properties, water soaking properties

Procedia PDF Downloads 359
8360 Plant Identification Using Convolution Neural Network and Vision Transformer-Based Models

Authors: Virender Singh, Mathew Rees, Simon Hampton, Sivaram Annadurai

Abstract:

Plant identification is a challenging task that aims to identify the family, genus, and species according to plant morphological features. Automated deep learning-based computer vision algorithms are widely used for identifying plants and can help users narrow down the possibilities. However, numerous morphological similarities between and within species render correct classification difficult. In this paper, we tested custom convolution neural network (CNN) and vision transformer (ViT) based models using the PyTorch framework to classify plants. We used a large dataset of 88,000 provided by the Royal Horticultural Society (RHS) and a smaller dataset of 16,000 images from the PlantClef 2015 dataset for classifying plants at genus and species levels, respectively. Our results show that for classifying plants at the genus level, ViT models perform better compared to CNN-based models ResNet50 and ResNet-RS-420 and other state-of-the-art CNN-based models suggested in previous studies on a similar dataset. ViT model achieved top accuracy of 83.3% for classifying plants at the genus level. For classifying plants at the species level, ViT models perform better compared to CNN-based models ResNet50 and ResNet-RS-420, with a top accuracy of 92.5%. We show that the correct set of augmentation techniques plays an important role in classification success. In conclusion, these results could help end users, professionals and the general public alike in identifying plants quicker and with improved accuracy.

Keywords: plant identification, CNN, image processing, vision transformer, classification

Procedia PDF Downloads 78
8359 Sensitivity and Uncertainty Analysis of One Dimensional Shape Memory Alloy Constitutive Models

Authors: A. B. M. Rezaul Islam, Ernur Karadogan

Abstract:

Shape memory alloys (SMAs) are known for their shape memory effect and pseudoelasticity behavior. Their thermomechanical behaviors are modeled by numerous researchers using microscopic thermodynamic and macroscopic phenomenological point of view. Tanaka, Liang-Rogers and Ivshin-Pence models are some of the most popular SMA macroscopic phenomenological constitutive models. They describe SMA behavior in terms of stress, strain and temperature. These models involve material parameters and they have associated uncertainty present in them. At different operating temperatures, the uncertainty propagates to the output when the material is subjected to loading followed by unloading. The propagation of uncertainty while utilizing these models in real-life application can result in performance discrepancies or failure at extreme conditions. To resolve this, we used probabilistic approach to perform the sensitivity and uncertainty analysis of Tanaka, Liang-Rogers, and Ivshin-Pence models. Sobol and extended Fourier Amplitude Sensitivity Testing (eFAST) methods have been used to perform the sensitivity analysis for simulated isothermal loading/unloading at various operating temperatures. As per the results, it is evident that the models vary due to the change in operating temperature and loading condition. The average and stress-dependent sensitivity indices present the most significant parameters at several temperatures. This work highlights the sensitivity and uncertainty analysis results and shows comparison of them at different temperatures and loading conditions for all these models. The analysis presented will aid in designing engineering applications by eliminating the probability of model failure due to the uncertainty in the input parameters. Thus, it is recommended to have a proper understanding of sensitive parameters and the uncertainty propagation at several operating temperatures and loading conditions as per Tanaka, Liang-Rogers, and Ivshin-Pence model.

Keywords: constitutive models, FAST sensitivity analysis, sensitivity analysis, sobol, shape memory alloy, uncertainty analysis

Procedia PDF Downloads 124
8358 Measuring Environmental Efficiency of Energy in OPEC Countries

Authors: Bahram Fathi, Seyedhossein Sajadifar, Naser Khiabani

Abstract:

Data envelopment analysis (DEA) has recently gained popularity in energy efficiency analysis. A common feature of the previously proposed DEA models for measuring energy efficiency performance is that they treat energy consumption as an input within a production framework without considering undesirable outputs. However, energy use results in the generation of undesirable outputs as byproducts of producing desirable outputs. Within a joint production framework of both desirable and undesirable outputs, this paper presents several DEA-type linear programming models for measuring energy efficiency performance. In addition to considering undesirable outputs, our models treat different energy sources as different inputs so that changes in energy mix could be accounted for in evaluating energy efficiency. The proposed models are applied to measure the energy efficiency performances of 12 OPEC countries and the results obtained are presented.

Keywords: energy efficiency, undesirable outputs, data envelopment analysis

Procedia PDF Downloads 715
8357 Enhancing Model Interoperability and Reuse by Designing and Developing a Unified Metamodel Standard

Authors: Arash Gharibi

Abstract:

Mankind has always used models to solve problems. Essentially, models are simplified versions of reality, whose need stems from having to deal with complexity; many processes or phenomena are too complex to be described completely. Thus a fundamental model requirement is that it contains the characteristic features that are essential in the context of the problem to be solved or described. Models are used in virtually every scientific domain to deal with various problems. During the recent decades, the number of models has increased exponentially. Publication of models as part of original research has traditionally been in in scientific periodicals, series, monographs, agency reports, national journals and laboratory reports. This makes it difficult for interested groups and communities to stay informed about the state-of-the-art. During the modeling process, many important decisions are made which impact the final form of the model. Without a record of these considerations, the final model remains ill-defined and open to varying interpretations. Unfortunately, the details of these considerations are often lost or in case there is any existing information about a model, it is likely to be written intuitively in different layouts and in different degrees of detail. In order to overcome these issues, different domains have attempted to implement their own approaches to preserve their models’ information in forms of model documentation. The most frequently cited model documentation approaches show that they are domain specific, not to applicable to the existing models and evolutionary flexibility and intrinsic corrections and improvements are not possible with the current approaches. These issues are all because of a lack of unified standards for model documentation. As a way forward, this research will propose a new standard for capturing and managing models’ information in a unified way so that interoperability and reusability of models become possible. This standard will also be evolutionary, meaning members of modeling realm could contribute to its ongoing developments and improvements. In this paper, the current 3 of the most common metamodels are reviewed and according to pros and cons of each, a new metamodel is proposed.

Keywords: metamodel, modeling, interoperability, reuse

Procedia PDF Downloads 182
8356 Implied Adjusted Volatility by Leland Option Pricing Models: Evidence from Australian Index Options

Authors: Mimi Hafizah Abdullah, Hanani Farhah Harun, Nik Ruzni Nik Idris

Abstract:

With the implied volatility as an important factor in financial decision-making, in particular in option pricing valuation, and also the given fact that the pricing biases of Leland option pricing models and the implied volatility structure for the options are related, this study considers examining the implied adjusted volatility smile patterns and term structures in the S&P/ASX 200 index options using the different Leland option pricing models. The examination of the implied adjusted volatility smiles and term structures in the Australian index options market covers the global financial crisis in the mid-2007. The implied adjusted volatility was found to escalate approximately triple the rate prior the crisis.

Keywords: implied adjusted volatility, financial crisis, Leland option pricing models, Australian index options

Procedia PDF Downloads 361
8355 On Improving Breast Cancer Prediction Using GRNN-CP

Authors: Kefaya Qaddoum

Abstract:

The aim of this study is to predict breast cancer and to construct a supportive model that will stimulate a more reliable prediction as a factor that is fundamental for public health. In this study, we utilize general regression neural networks (GRNN) to replace the normal predictions with prediction periods to achieve a reasonable percentage of confidence. The mechanism employed here utilises a machine learning system called conformal prediction (CP), in order to assign consistent confidence measures to predictions, which are combined with GRNN. We apply the resulting algorithm to the problem of breast cancer diagnosis. The results show that the prediction constructed by this method is reasonable and could be useful in practice.

Keywords: neural network, conformal prediction, cancer classification, regression

Procedia PDF Downloads 268
8354 High Speed Rail vs. Other Factors Affecting the Tourism Market in Italy

Authors: F. Pagliara, F. Mauriello

Abstract:

The objective of this paper is to investigate the relationship between the increase of accessibility brought by high speed rail (HSR) systems and the tourism market in Italy. The impacts of HSR projects on tourism can be quantified in different ways. In this manuscript, an empirical analysis has been carried out with the aid of a dataset containing information both on tourism and transport for 99 Italian provinces during the 2006-2016 period. Panel data regression models have been considered, since they allow modelling a wide variety of correlation patterns. Results show that HSR has an impact on the choice of a given destination for Italian tourists while the presence of a second level hub mainly affects foreign tourists. Attraction variables are also significant for both categories and the variables concerning security, such as number of crimes registered in a given destination, have a negative impact on the choice of a destination.

Keywords: tourists, overnights, high speed rail, attractions, security

Procedia PDF Downloads 142
8353 Evaluation of Environmental, Technical, and Economic Indicators of a Fused Deposition Modeling Process

Authors: M. Yosofi, S. Ezeddini, A. Ollivier, V. Lavaste, C. Mayousse

Abstract:

Additive manufacturing processes have changed significantly in a wide range of industries and their application progressed from rapid prototyping to production of end-use products. However, their environmental impact is still a rather open question. In order to support the growth of this technology in the industrial sector, environmental aspects should be considered and predictive models may help monitor and reduce the environmental footprint of the processes. This work presents predictive models based on a previously developed methodology for the environmental impact evaluation combined with a technical and economical assessment. Here we applied the methodology to the Fused Deposition Modeling process. First, we present the predictive models relative to different types of machines. Then, we present a decision-making tool designed to identify the optimum manufacturing strategy regarding technical, economic, and environmental criteria.

Keywords: additive manufacturing, decision-makings, environmental impact, predictive models

Procedia PDF Downloads 110
8352 Testing a Dose-Response Model of Intergenerational Transmission of Family Violence

Authors: Katherine Maurer

Abstract:

Background and purpose: Violence that occurs within families is a global social problem. Children who are victims or witness to family violence are at risk for many negative effects both proximally and distally. One of the most disconcerting long-term effects occurs when child victims become adult perpetrators: the intergenerational transmission of family violence (ITFV). Early identification of those children most at risk for ITFV is needed to inform interventions to prevent future family violence perpetration and victimization. Only about 25-30% of child family violence victims become perpetrators of adult family violence (either child abuse, partner abuse, or both). Prior research has primarily been conducted using dichotomous measures of exposure (yes; no) to predict ITFV, given the low incidence rate in community samples. It is often assumed that exposure to greater amounts of violence predicts greater risk of ITFV. However, no previous longitudinal study with a community sample has tested a dose-response model of exposure to physical child abuse and parental physical intimate partner violence (IPV) using count data of frequency and severity of violence to predict adult ITFV. The current study used advanced statistical methods to test if increased childhood exposure would predict greater risk of ITFV. Methods: The study utilized 3 panels of prospective data from a cohort of 15 year olds (N=338) from the Project on Human Development in Chicago Neighborhoods longitudinal study. The data were comprised of a stratified probability sample of seven ethnic/racial categories and three socio-economic status levels. Structural equation modeling was employed to test a hurdle regression model of dose-response to predict ITFV. A version of the Conflict Tactics Scale was used to measure physical violence victimization, witnessing parental IPV and young adult IPV perpetration and victimization. Results: Consistent with previous findings, past 12 months incidence rates severity and frequency of interpersonal violence were highly skewed. While rates of parental and young adult IPV were about 40%, an unusually high rate of physical child abuse (57%) was reported. The vast majority of a number of acts of violence, whether minor or severe, were in the 1-3 range in the past 12 months. Reported frequencies of more than 5 times in the past year were rare, with less than 10% of those reporting more than six acts of minor or severe physical violence. As expected, minor acts of violence were much more common than acts of severe violence. Overall, regression analyses were not significant for the dose-response model of ITFV. Conclusions and implications: The results of the dose-response model were not significant due to a lack of power in the final sample (N=338). Nonetheless, the value of the approach was confirmed for the future research given the bi-modal nature of the distributions which suggest that in the context of both child physical abuse and physical IPV, there are at least two classes when frequency of acts is considered. Taking frequency into account in predictive models may help to better understand the relationship of exposure to ITFV outcomes. Further testing using hurdle regression models is suggested.

Keywords: intergenerational transmission of family violence, physical child abuse, intimate partner violence, structural equation modeling

Procedia PDF Downloads 226
8351 Multicollinearity and MRA in Sustainability: Application of the Raise Regression

Authors: Claudia García-García, Catalina B. García-García, Román Salmerón-Gómez

Abstract:

Much economic-environmental research includes the analysis of possible interactions by using Moderated Regression Analysis (MRA), which is a specific application of multiple linear regression analysis. This methodology allows analyzing how the effect of one of the independent variables is moderated by a second independent variable by adding a cross-product term between them as an additional explanatory variable. Due to the very specification of the methodology, the moderated factor is often highly correlated with the constitutive terms. Thus, great multicollinearity problems arise. The appearance of strong multicollinearity in a model has important consequences. Inflated variances of the estimators may appear, there is a tendency to consider non-significant regressors that they probably are together with a very high coefficient of determination, incorrect signs of our coefficients may appear and also the high sensibility of the results to small changes in the dataset. Finally, the high relationship among explanatory variables implies difficulties in fixing the individual effects of each one on the model under study. These consequences shifted to the moderated analysis may imply that it is not worth including an interaction term that may be distorting the model. Thus, it is important to manage the problem with some methodology that allows for obtaining reliable results. After a review of those works that applied the MRA among the ten top journals of the field, it is clear that multicollinearity is mostly disregarded. Less than 15% of the reviewed works take into account potential multicollinearity problems. To overcome the issue, this work studies the possible application of recent methodologies to MRA. Particularly, the raised regression is analyzed. This methodology mitigates collinearity from a geometrical point of view: the collinearity problem arises because the variables under study are very close geometrically, so by separating both variables, the problem can be mitigated. Raise regression maintains the available information and modifies the problematic variables instead of deleting variables, for example. Furthermore, the global characteristics of the initial model are also maintained (sum of squared residuals, estimated variance, coefficient of determination, global significance test and prediction). The proposal is implemented to data from countries of the European Union during the last year available regarding greenhouse gas emissions, per capita GDP and a dummy variable that represents the topography of the country. The use of a dummy variable as the moderator is a special variant of MRA, sometimes called “subgroup regression analysis.” The main conclusion of this work is that applying new techniques to the field can improve in a substantial way the results of the analysis. Particularly, the use of raised regression mitigates great multicollinearity problems, so the researcher is able to rely on the interaction term when interpreting the results of a particular study.

Keywords: multicollinearity, MRA, interaction, raise

Procedia PDF Downloads 88
8350 Leveraging Unannotated Data to Improve Question Answering for French Contract Analysis

Authors: Touila Ahmed, Elie Louis, Hamza Gharbi

Abstract:

State of the art question answering models have recently shown impressive performance especially in a zero-shot setting. This approach is particularly useful when confronted with a highly diverse domain such as the legal field, in which it is increasingly difficult to have a dataset covering every notion and concept. In this work, we propose a flexible generative question answering approach to contract analysis as well as a weakly supervised procedure to leverage unannotated data and boost our models’ performance in general, and their zero-shot performance in particular.

Keywords: question answering, contract analysis, zero-shot, natural language processing, generative models, self-supervision

Procedia PDF Downloads 164
8349 Dow Polyols near Infrared Chemometric Model Reduction Based on Clustering: Reducing Thirty Global Hydroxyl Number (OH) Models to Less Than Five

Authors: Wendy Flory, Kazi Czarnecki, Matthijs Mercy, Mark Joswiak, Mary Beth Seasholtz

Abstract:

Polyurethane Materials are present in a wide range of industrial segments such as Furniture, Building and Construction, Composites, Automotive, Electronics, and more. Dow is one of the leaders for the manufacture of the two main raw materials, Isocyanates and Polyols used to produce polyurethane products. Dow is also a key player for the manufacture of Polyurethane Systems/Formulations designed for targeted applications. In 1990, the first analytical chemometric models were developed and deployed for use in the Dow QC labs of the polyols business for the quantification of OH, water, cloud point, and viscosity. Over the years many models have been added; there are now over 140 models for quantification and hundreds for product identification, too many to be reasonable for support. There are 29 global models alone for the quantification of OH across > 70 products at many sites. An attempt was made to consolidate these into a single model. While the consolidated model proved good statistics across the entire range of OH, several products had a bias by ASTM E1655 with individual product validation. This project summary will show the strategy for global model updates for OH, to reduce the number of models for quantification from over 140 to 5 or less using chemometric methods. In order to gain an understanding of the best product groupings, we identify clusters by reducing spectra to a few dimensions via Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). Results from these cluster analyses and a separate validation set allowed dow to reduce the number of models for predicting OH from 29 to 3 without loss of accuracy.

Keywords: hydroxyl, global model, model maintenance, near infrared, polyol

Procedia PDF Downloads 119
8348 Text Similarity in Vector Space Models: A Comparative Study

Authors: Omid Shahmirzadi, Adam Lugowski, Kenneth Younge

Abstract:

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models to perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. Otherwise, TFIDF performs surprisingly well in other cases: in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.

Keywords: big data, patent, text embedding, text similarity, vector space model

Procedia PDF Downloads 157
8347 Non-Destructive Prediction System Using near Infrared Spectroscopy for Crude Palm Oil

Authors: Siti Nurhidayah Naqiah Abdull Rani, Herlina Abdul Rahim

Abstract:

Near infrared (NIR) spectroscopy has always been of great interest in the food and agriculture industries. The development of predictive models has facilitated the estimation process in recent years. In this research, 176 crude palm oil (CPO) samples acquired from Felda Johor Bulker Sdn Bhd were studied. A FOSS NIRSystem was used to tak e absorbance measurements from the sample. The wavelength range for the spectral measurement is taken at 1600nm to 1900nm. Partial Least Square Regression (PLSR) prediction model with 50 optimal number of principal components was implemented to study the relationship between the measured Free Fatty Acid (FFA) values and the measured spectral absorption. PLSR showed predictive ability of FFA values with correlative coefficient (R) of 0.9808 for the training set and 0.9684 for the testing set.

Keywords: palm oil, fatty acid, NIRS, PLSR

Procedia PDF Downloads 195
8346 Geographic Information System for District Level Energy Performance Simulations

Authors: Avichal Malhotra, Jerome Frisch, Christoph van Treeck

Abstract:

The utilization of semantic, cadastral and topological data from geographic information systems (GIS) has exponentially increased for building and urban-scale energy performance simulations. Urban planners, simulation scientists, and researchers use virtual 3D city models for energy analysis, algorithms and simulation tools. For dynamic energy simulations at city and district level, this paper provides an overview of the available GIS data models and their levels of detail. Adhering to different norms and standards, these models also intend to describe building and construction industry data. For further investigations, CityGML data models are considered for simulations. Though geographical information modelling has considerably many different implementations, extensions of virtual city data can also be made for domain specific applications. Highlighting the use of the extended CityGML models for energy researches, a brief introduction to the Energy Application Domain Extension (ADE) along with its significance is made. Consequently, addressing specific input simulation data, a workflow using Modelica underlining the usage of GIS information and the quantification of its significance over annual heating energy demand is presented in this paper.

Keywords: CityGML, EnergyADE, energy performance simulation, GIS

Procedia PDF Downloads 153
8345 Impact of Leadership Styles on Work Motivation and Organizational Commitment among Faculty Members of Public Sector Universities in Punjab

Authors: Wajeeha Shahid

Abstract:

The study was designed to assess the impact of transformational and transactional leadership styles on work motivation and organizational commitment among faculty members of universities of Punjab. 713 faculty members were selected as sample through convenient random sampling technique. Three self-constructed questionnaires namely Leadership Styles Questionnaire (LSQ), Work Motivation Questionnaire (WMQ) and Organizational Commitment Questionnaire (OCMQ) were used as research instruments. Major objectives of the study included assessing the effect and impact of transformational and transactional leadership styles on work motivation and organizational commitment. Theoretical frame work of the study included Idealized Influence, Inspirational Motivation, Intellectual Stimulation, Individualized Consideration, Contingent Rewards and Management by Exception as independent variables and Extrinsic motivation, Intrinsic motivation, Affective commitment, Continuance commitment and Normative commitment as dependent variables. SPSS Version 21 was used to analyze and tabulate data. Cronbach's Alpha reliability, Pearson Correlation and Multiple regression analysis were applied as statistical treatments for the analysis. Results revealed that Idealized Influence correlated significantly with intrinsic motivation and Affective commitment whereas Contingent rewards had a strong positive correlation with extrinsic motivation and affective commitment. Multiple regression models revealed a variance of 85% for transformational leadership style over work motivation and organizational commitment. Whereas transactional style as a predictor manifested a variance of 79% for work motivation and 76% for organizational commitment. It was suggested that changing organizational cultures are demanding more from their leadership. All organizations need to consider transformational leadership style as an important part of their equipment in leveraging both soft and hard organizational targets.

Keywords: leadership styles, work motivation, organizational commitment, faculty member

Procedia PDF Downloads 291
8344 Bayesian Reliability of Weibull Regression with Type-I Censored Data

Authors: Al Omari Moahmmed Ahmed

Abstract:

In the Bayesian, we developed an approach by using non-informative prior with covariate and obtained by using Gauss quadrature method to estimate the parameters of the covariate and reliability function of the Weibull regression distribution with Type-I censored data. The maximum likelihood seen that the estimators obtained are not available in closed forms, although they can be solved it by using Newton-Raphson methods. The comparison criteria are the MSE and the performance of these estimates are assessed using simulation considering various sample size, several specific values of shape parameter. The results show that Bayesian with non-informative prior is better than Maximum Likelihood Estimator.

Keywords: non-informative prior, Bayesian method, type-I censoring, Gauss quardature

Procedia PDF Downloads 485
8343 Multi-Model Super Ensemble Based Advanced Approaches for Monsoon Rainfall Prediction

Authors: Swati Bhomia, C. M. Kishtawal, Neeru Jaiswal

Abstract:

Traditionally, monsoon forecasts have encountered many difficulties that stem from numerous issues such as lack of adequate upper air observations, mesoscale nature of convection, proper resolution, radiative interactions, planetary boundary layer physics, mesoscale air-sea fluxes, representation of orography, etc. Uncertainties in any of these areas lead to large systematic errors. Global circulation models (GCMs), which are developed independently at different institutes, each of which carries somewhat different representation of the above processes, can be combined to reduce the collective local biases in space, time, and for different variables from different models. This is the basic concept behind the multi-model superensemble and comprises of a training and a forecast phase. The training phase learns from the recent past performances of models and is used to determine statistical weights from a least square minimization via a simple multiple regression. These weights are then used in the forecast phase. The superensemble forecasts carry the highest skill compared to simple ensemble mean, bias corrected ensemble mean and the best model out of the participating member models. This approach is a powerful post-processing method for the estimation of weather forecast parameters reducing the direct model output errors. Although it can be applied successfully to the continuous parameters like temperature, humidity, wind speed, mean sea level pressure etc., in this paper, this approach is applied to rainfall, a parameter quite difficult to handle with standard post-processing methods, due to its high temporal and spatial variability. The present study aims at the development of advanced superensemble schemes comprising of 1-5 day daily precipitation forecasts from five state-of-the-art global circulation models (GCMs), i.e., European Centre for Medium Range Weather Forecasts (Europe), National Center for Environmental Prediction (USA), China Meteorological Administration (China), Canadian Meteorological Centre (Canada) and U.K. Meteorological Office (U.K.) obtained from THORPEX Interactive Grand Global Ensemble (TIGGE), which is one of the most complete data set available. The novel approaches include the dynamical model selection approach in which the selection of the superior models from the participating member models at each grid and for each forecast step in the training period is carried out. Multi-model superensemble based on the training using similar conditions is also discussed in the present study, which is based on the assumption that training with the similar type of conditions may provide the better forecasts in spite of the sequential training which is being used in the conventional multi-model ensemble (MME) approaches. Further, a variety of methods that incorporate a 'neighborhood' around each grid point which is available in literature to allow for spatial error or uncertainty, have also been experimented with the above mentioned approaches. The comparison of these schemes with respect to the observations verifies that the newly developed approaches provide more unified and skillful prediction of the summer monsoon (viz. June to September) rainfall compared to the conventional multi-model approach and the member models.

Keywords: multi-model superensemble, dynamical model selection, similarity criteria, neighborhood technique, rainfall prediction

Procedia PDF Downloads 121
8342 Talent-to-Vec: Using Network Graphs to Validate Models with Data Sparsity

Authors: Shaan Khosla, Jon Krohn

Abstract:

In a recruiting context, machine learning models are valuable for recommendations: to predict the best candidates for a vacancy, to match the best vacancies for a candidate, and compile a set of similar candidates for any given candidate. While useful to create these models, validating their accuracy in a recommendation context is difficult due to a sparsity of data. In this report, we use network graph data to generate useful representations for candidates and vacancies. We use candidates and vacancies as network nodes and designate a bi-directional link between them based on the candidate interviewing for the vacancy. After using node2vec, the embeddings are used to construct a validation dataset with a ranked order, which will help validate new recommender systems.

Keywords: AI, machine learning, NLP, recruiting

Procedia PDF Downloads 74
8341 Bridging the Gap between Different Interfaces for Business Process Modeling

Authors: Katalina Grigorova, Kaloyan Mironov

Abstract:

The paper focuses on the benefits of business process modeling. Although this discipline is developing for many years, there is still necessity of creating new opportunities to meet the ever-increasing users’ needs. Because one of these needs is related to the conversion of business process models from one standard to another, the authors have developed a converter between BPMN and EPC standards using workflow patterns as intermediate tool. Nowadays there are too many systems for business process modeling. The variety of output formats is almost the same as the systems themselves. This diversity additionally hampers the conversion of the models. The presented study is aimed at discussing problems due to differences in the output formats of various modeling environments.

Keywords: business process modeling, business process modeling standards, workflow patterns, converting models

Procedia PDF Downloads 566
8340 Analysis of Ferroresonant Overvoltages in Cable-fed Transformers

Authors: George Eduful, Ebenezer A. Jackson, Kingsford A. Atanga

Abstract:

This paper investigates the impacts of cable length and capacity of transformer on ferroresonant overvoltage in cable-fed transformers. The study was conducted by simulation using the EMTP RV. Results show that ferroresonance can cause dangerous overvoltages ranging from 2 to 5 per unit. These overvoltages impose stress on insulations of transformers and cables and subsequently result in system failures. Undertaking Basic Multiple Regression Analysis (BMR) on the results obtained, a statistical model was obtained in terms of cable length and transformer capacity. The model is useful for ferroresonant prediction and control in cable-fed transformers.

Keywords: ferroresonance, cable-fed transformers, EMTP RV, regression analysis

Procedia PDF Downloads 515
8339 Hybrid Project Management Model Based on Lean and Agile Approach

Authors: Fatima-Zahra Eddoug, Jamal Benhra, Rajaa Benabbou

Abstract:

Several project management models exist in the literature and the most used ones are the hybrids for their multiple advantages. Our objective in this paper is to analyze the existing models, which are based on the Lean and Agile approaches and to propose a novel framework with the convenient tools that will allow efficient management of a general project. To create the desired framework, we were based essentially on 7 existing models. Only the Scrum tool among the agile tools was identified by several authors to be appropriate for project management. In contrast, multiple lean tools were proposed in different phases of the project.

Keywords: agility, hybrid project management, lean, scrum

Procedia PDF Downloads 118
8338 Towards an Effective Approach for Modelling near Surface Air Temperature Combining Weather and Satellite Data

Authors: Nicola Colaninno, Eugenio Morello

Abstract:

The urban environment affects local-to-global climate and, in turn, suffers global warming phenomena, with worrying impacts on human well-being, health, social and economic activities. Physic-morphological features of the built-up space affect urban air temperature, locally, causing the urban environment to be warmer compared to surrounding rural. This occurrence, typically known as the Urban Heat Island (UHI), is normally assessed by means of air temperature from fixed weather stations and/or traverse observations or based on remotely sensed Land Surface Temperatures (LST). The information provided by ground weather stations is key for assessing local air temperature. However, the spatial coverage is normally limited due to low density and uneven distribution of the stations. Although different interpolation techniques such as Inverse Distance Weighting (IDW), Ordinary Kriging (OK), or Multiple Linear Regression (MLR) are used to estimate air temperature from observed points, such an approach may not effectively reflect the real climatic conditions of an interpolated point. Quantifying local UHI for extensive areas based on weather stations’ observations only is not practicable. Alternatively, the use of thermal remote sensing has been widely investigated based on LST. Data from Landsat, ASTER, or MODIS have been extensively used. Indeed, LST has an indirect but significant influence on air temperatures. However, high-resolution near-surface air temperature (NSAT) is currently difficult to retrieve. Here we have experimented Geographically Weighted Regression (GWR) as an effective approach to enable NSAT estimation by accounting for spatial non-stationarity of the phenomenon. The model combines on-site measurements of air temperature, from fixed weather stations and satellite-derived LST. The approach is structured upon two main steps. First, a GWR model has been set to estimate NSAT at low resolution, by combining air temperature from discrete observations retrieved by weather stations (dependent variable) and the LST from satellite observations (predictor). At this step, MODIS data, from Terra satellite, at 1 kilometer of spatial resolution have been employed. Two time periods are considered according to satellite revisit period, i.e. 10:30 am and 9:30 pm. Afterward, the results have been downscaled at 30 meters of spatial resolution by setting a GWR model between the previously retrieved near-surface air temperature (dependent variable), the multispectral information as provided by the Landsat mission, in particular the albedo, and Digital Elevation Model (DEM) from the Shuttle Radar Topography Mission (SRTM), both at 30 meters. Albedo and DEM are now the predictors. The area under investigation is the Metropolitan City of Milan, which covers an area of approximately 1,575 km2 and encompasses a population of over 3 million inhabitants. Both models, low- (1 km) and high-resolution (30 meters), have been validated according to a cross-validation that relies on indicators such as R2, Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). All the employed indicators give evidence of highly efficient models. In addition, an alternative network of weather stations, available for the City of Milano only, has been employed for testing the accuracy of the predicted temperatures, giving and RMSE of 0.6 and 0.7 for daytime and night-time, respectively.

Keywords: urban climate, urban heat island, geographically weighted regression, remote sensing

Procedia PDF Downloads 179
8337 Seismic Perimeter Surveillance System (Virtual Fence) for Threat Detection and Characterization Using Multiple ML Based Trained Models in Weighted Ensemble Voting

Authors: Vivek Mahadev, Manoj Kumar, Neelu Mathur, Brahm Dutt Pandey

Abstract:

Perimeter guarding and protection of critical installations require prompt intrusion detection and assessment to take effective countermeasures. Currently, visual and electronic surveillance are the primary methods used for perimeter guarding. These methods can be costly and complicated, requiring careful planning according to the location and terrain. Moreover, these methods often struggle to detect stealthy and camouflaged insurgents. The object of the present work is to devise a surveillance technique using seismic sensors that overcomes the limitations of existing systems. The aim is to improve intrusion detection, assessment, and characterization by utilizing seismic sensors. Most of the similar systems have only two types of intrusion detection capability viz., human or vehicle. In our work we could even categorize further to identify types of intrusion activity such as walking, running, group walking, fence jumping, tunnel digging and vehicular movements. A virtual fence of 60 meters at GCNEP, Bahadurgarh, Haryana, India, was created by installing four underground geophones at a distance of 15 meters each. The signals received from these geophones are then processed to find unique seismic signatures called features. Various feature optimization and selection methodologies, such as LightGBM, Boruta, Random Forest, Logistics, Recursive Feature Elimination, Chi-2 and Pearson Ratio were used to identify the best features for training the machine learning models. The trained models were developed using algorithms such as supervised support vector machine (SVM) classifier, kNN, Decision Tree, Logistic Regression, Naïve Bayes, and Artificial Neural Networks. These models were then used to predict the category of events, employing weighted ensemble voting to analyze and combine their results. The models were trained with 1940 training events and results were evaluated with 831 test events. It was observed that using the weighted ensemble voting increased the efficiency of predictions. In this study we successfully developed and deployed the virtual fence using geophones. Since these sensors are passive, do not radiate any energy and are installed underground, it is impossible for intruders to locate and nullify them. Their flexibility, quick and easy installation, low costs, hidden deployment and unattended surveillance make such systems especially suitable for critical installations and remote facilities with difficult terrain. This work demonstrates the potential of utilizing seismic sensors for creating better perimeter guarding and protection systems using multiple machine learning models in weighted ensemble voting. In this study the virtual fence achieved an intruder detection efficiency of over 97%.

Keywords: geophone, seismic perimeter surveillance, machine learning, weighted ensemble method

Procedia PDF Downloads 64
8336 Towards Automatic Calibration of In-Line Machine Processes

Authors: David F. Nettleton, Elodie Bugnicourt, Christian Wasiak, Alejandro Rosales

Abstract:

In this presentation, preliminary results are given for the modeling and calibration of two different industrial winding MIMO (Multiple Input Multiple Output) processes using machine learning techniques. In contrast to previous approaches which have typically used ‘black-box’ linear statistical methods together with a definition of the mechanical behavior of the process, we use non-linear machine learning algorithms together with a ‘white-box’ rule induction technique to create a supervised model of the fitting error between the expected and real force measures. The final objective is to build a precise model of the winding process in order to control de-tension of the material being wound in the first case, and the friction of the material passing through the die, in the second case. Case 1, Tension Control of a Winding Process. A plastic web is unwound from a first reel, goes over a traction reel and is rewound on a third reel. The objectives are: (i) to train a model to predict the web tension and (ii) calibration to find the input values which result in a given tension. Case 2, Friction Force Control of a Micro-Pullwinding Process. A core+resin passes through a first die, then two winding units wind an outer layer around the core, and a final pass through a second die. The objectives are: (i) to train a model to predict the friction on die2; (ii) calibration to find the input values which result in a given friction on die2. Different machine learning approaches are tested to build models, Kernel Ridge Regression, Support Vector Regression (with a Radial Basis Function Kernel) and MPART (Rule Induction with continuous value as output). As a previous step, the MPART rule induction algorithm was used to build an explicative model of the error (the difference between expected and real friction on die2). The modeling of the error behavior using explicative rules is used to help improve the overall process model. Once the models are built, the inputs are calibrated by generating Gaussian random numbers for each input (taking into account its mean and standard deviation) and comparing the output to a target (desired) output until a closest fit is found. The results of empirical testing show that a high precision is obtained for the trained models and for the calibration process. The learning step is the slowest part of the process (max. 5 minutes for this data), but this can be done offline just once. The calibration step is much faster and in under one minute obtained a precision error of less than 1x10-3 for both outputs. To summarize, in the present work two processes have been modeled and calibrated. A fast processing time and high precision has been achieved, which can be further improved by using heuristics to guide the Gaussian calibration. Error behavior has been modeled to help improve the overall process understanding. This has relevance for the quick optimal set up of many different industrial processes which use a pull-winding type process to manufacture fibre reinforced plastic parts. Acknowledgements to the Openmind project which is funded by Horizon 2020 European Union funding for Research & Innovation, Grant Agreement number 680820

Keywords: data model, machine learning, industrial winding, calibration

Procedia PDF Downloads 227
8335 Estimating Lost Digital Video Frames Using Unidirectional and Bidirectional Estimation Based on Autoregressive Time Model

Authors: Navid Daryasafar, Nima Farshidfar

Abstract:

In this article, we make attempt to hide error in video with an emphasis on the time-wise use of autoregressive (AR) models. To resolve this problem, we assume that all information in one or more video frames is lost. Then, lost frames are estimated using analogous Pixels time information in successive frames. Accordingly, after presenting autoregressive models and how they are applied to estimate lost frames, two general methods are presented for using these models. The first method which is the same standard method of autoregressive models estimates lost frame in unidirectional form. Usually, in such condition, previous frames information is used for estimating lost frame. Yet, in the second method, information from the previous and next frames is used for estimating the lost frame. As a result, this method is known as bidirectional estimation. Then, carrying out a series of tests, performance of each method is assessed in different modes. And, results are compared.

Keywords: error steganography, unidirectional estimation, bidirectional estimation, AR linear estimation

Procedia PDF Downloads 516
8334 Validating Condition-Based Maintenance Algorithms through Simulation

Authors: Marcel Chevalier, Léo Dupont, Sylvain Marié, Frédérique Roffet, Elena Stolyarova, William Templier, Costin Vasile

Abstract:

Industrial end-users are currently facing an increasing need to reduce the risk of unexpected failures and optimize their maintenance. This calls for both short-term analysis and long-term ageing anticipation. At Schneider Electric, we tackle those two issues using both machine learning and first principles models. Machine learning models are incrementally trained from normal data to predict expected values and detect statistically significant short-term deviations. Ageing models are constructed by breaking down physical systems into sub-assemblies, then determining relevant degradation modes and associating each one to the right kinetic law. Validating such anomaly detection and maintenance models is challenging, both because actual incident and ageing data are rare and distorted by human interventions, and incremental learning depends on human feedback. To overcome these difficulties, we propose to simulate physics, systems, and humans -including asset maintenance operations- in order to validate the overall approaches in accelerated time and possibly choose between algorithmic alternatives.

Keywords: degradation models, ageing, anomaly detection, soft sensor, incremental learning

Procedia PDF Downloads 113
8333 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and founding constructs in Latent Semantic Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations present on Item Response Theory. More precisely, we propose initially reducing dimensionality with specific use of Principal Component Analysis for the linguistic data and then, producing axes of groups made from a clustering analysis of the semantic data. This approach allows the user to give meaning to previous clusters and found the real latent structure presented by data. The methodology is applied in a set of real semantic data presenting impressive results for the coherence, speed and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

Procedia PDF Downloads 424