Search results for: k-fold cross validation technique
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 10920

Search results for: k-fold cross validation technique

10920 Invention of Novel Technique of Process Scale Up by Using Solid Dosage Form

Authors: Shashank Tiwari, S. P. Mahapatra

Abstract:

The aim of this technique is to reduce the steps of process scales up, save time & cost of the industries. This technique will minimise the steps of process scale up. The new steps are, Novel Lab Scale, Novel Lab Scale Trials, Novel Trial Batches, Novel Exhibit Batches, Novel Validation Batches. In these steps, it is not divided to validation batches in three parts but the data of trials batches, Exhibit Batches and Validation batches are use and compile for production and used for validation. It also increases the batch size of the trial, exhibit batches. The new size of trials batches is not less than fifty Thousand, the exhibit batches increase up to two lack and the validation batches up to five lack. After preparing the batches all their data & drugs use for stability & maintain the validation record and compile data for the technology transfer in production department for preparing the marketed size batches.

Keywords: batches, technique, preparation, scale up, validation

Procedia PDF Downloads 321
10919 Efficient Model Selection in Linear and Non-Linear Quantile Regression by Cross-Validation

Authors: Yoonsuh Jung, Steven N. MacEachern

Abstract:

Check loss function is used to define quantile regression. In the prospect of cross validation, it is also employed as a validation function when underlying truth is unknown. However, our empirical study indicates that the validation with check loss often leads to choosing an over estimated fits. In this work, we suggest a modified or L2-adjusted check loss which rounds the sharp corner in the middle of check loss. It has a large effect of guarding against over fitted model in some extent. Through various simulation settings of linear and non-linear regressions, the improvement of check loss by L2 adjustment is empirically examined. This adjustment is devised to shrink to zero as sample size grows.

Keywords: cross-validation, model selection, quantile regression, tuning parameter selection

Procedia PDF Downloads 406
10918 A Validation Technique for Integrated Ontologies

Authors: Neli P. Zlatareva

Abstract:

Ontology validation is an important part of web applications’ development, where knowledge integration and ontological reasoning play a fundamental role. It aims to ensure the consistency and correctness of ontological knowledge and to guarantee that ontological reasoning is carried out in a meaningful way. Existing approaches to ontology validation address more or less specific validation issues, but the overall process of validating web ontologies has not been formally established yet. As the size and the number of web ontologies continue to grow, the necessity to validate and ensure their consistency and interoperability is becoming increasingly important. This paper presents a validation technique intended to test the consistency of independent ontologies utilized by a common application.

Keywords: knowledge engineering, ontological reasoning, ontology validation, semantic web

Procedia PDF Downloads 292
10917 Efficient Tuning Parameter Selection by Cross-Validated Score in High Dimensional Models

Authors: Yoonsuh Jung

Abstract:

As DNA microarray data contain relatively small sample size compared to the number of genes, high dimensional models are often employed. In high dimensional models, the selection of tuning parameter (or, penalty parameter) is often one of the crucial parts of the modeling. Cross-validation is one of the most common methods for the tuning parameter selection, which selects a parameter value with the smallest cross-validated score. However, selecting a single value as an "optimal" value for the parameter can be very unstable due to the sampling variation since the sample sizes of microarray data are often small. Our approach is to choose multiple candidates of tuning parameter first, then average the candidates with different weights depending on their performance. The additional step of estimating the weights and averaging the candidates rarely increase the computational cost, while it can considerably improve the traditional cross-validation. We show that the selected value from the suggested methods often lead to stable parameter selection as well as improved detection of significant genetic variables compared to the tradition cross-validation via real data and simulated data sets.

Keywords: cross validation, parameter averaging, parameter selection, regularization parameter search

Procedia PDF Downloads 383
10916 Preliminary Study of Standardization and Validation of Micronuclei Technique to Assess the DNA Damages Cause for the X-Rays

Authors: L. J. Díaz, M. A. Hernández, A. K. Molina, A. Bermúdez, C. Crane, V. M. Pabón

Abstract:

One of the most important biological indicators that show the exposure to the radiation is the micronuclei (MN). This technique is using to determinate the radiation effects in blood cultures as a biological control and a complement to the physics dosimetry. In Colombia the necessity to apply this analysis has emerged due to the current biological indicator most used is the chromosomal aberrations (CA), that is why it is essential the MN technique’s standardization and validation to have enough tools to improve the radioprotection topic in the country. Besides, this technique will be applied on the construction of a dose-response curve, that allow measure an approximately dose to irradiated people according to MN frequency found. Inside the steps that carried out to accomplish the standardization and validation is the statistic analysis from the lectures of “in vitro” peripheral blood cultures with different analysts, also it was determinate the best culture medium and conditions for the MN can be detected easily.

Keywords: micronuclei, radioprotection, standardization, validation

Procedia PDF Downloads 459
10915 Validation of the X-Ray Densitometry Method for Radial Density Pattern Determination of Acacia seyal var. seyal Tree Species

Authors: Hanadi Mohamed Shawgi Gamal, Claus Thomas Bues

Abstract:

Wood density is a variable influencing many of the technological and quality properties of wood. Understanding the pattern of wood density radial variation is important for its end-use. The X-ray technique, traditionally applied to softwood species to assess the wood quality properties, due to its simple and relatively uniform wood structure. On the other hand, very limited information is available about the validation of using this technique for hardwood species. The suitability of using the X-ray technique for the determination of hardwood density has a special significance in countries like Sudan, where only a few timbers are well known. This will not only save the time consumed by using the traditional methods, but it will also enhance the investigations of the great number of the lesser known species, the thing which will fill the huge cap of lake information of hardwood species growing in Sudan. The current study aimed to evaluate the validation of using the X-ray densitometry technique to determine the radial variation of wood density of Acacia seyal var. seyal. To this, a total of thirty trees were collected randomly from four states in Sudan. The wood density radial trend was determined using the basic density as well as density obtained by the X-ray densitometry method in order to assess the validation of X-ray technique in wood density radial variation determination. The results showed that the pattern of radial trend of density obtained by X-ray technique is very similar to that achieved by basic density. These results confirmed the validation of using the X-ray technique for Acacia seyal var. seyal density radial trend determination. It also promotes the suitability of using this method in other hardwood species.

Keywords: x-ray densitometry, wood density, Acacia seyal var. seyal, radial variation

Procedia PDF Downloads 114
10914 Specific Emitter Identification Based on Refined Composite Multiscale Dispersion Entropy

Authors: Shaoying Guo, Yanyun Xu, Meng Zhang, Weiqing Huang

Abstract:

The wireless communication network is developing rapidly, thus the wireless security becomes more and more important. Specific emitter identification (SEI) is an vital part of wireless communication security as a technique to identify the unique transmitters. In this paper, a SEI method based on multiscale dispersion entropy (MDE) and refined composite multiscale dispersion entropy (RCMDE) is proposed. The algorithms of MDE and RCMDE are used to extract features for identification of five wireless devices and cross-validation support vector machine (CV-SVM) is used as the classifier. The experimental results show that the total identification accuracy is 99.3%, even at low signal-to-noise ratio(SNR) of 5dB, which proves that MDE and RCMDE can describe the communication signal series well. In addition, compared with other methods, the proposed method is effective and provides better accuracy and stability for SEI.

Keywords: cross-validation support vector machine, refined com- posite multiscale dispersion entropy, specific emitter identification, transient signal, wireless communication device

Procedia PDF Downloads 107
10913 Classifying Students for E-Learning in Information Technology Course Using ANN

Authors: Sirilak Areerachakul, Nat Ployong, Supayothin Na Songkla

Abstract:

This research’s objective is to select the model with most accurate value by using Neural Network Technique as a way to filter potential students who enroll in IT course by electronic learning at Suan Suanadha Rajabhat University. It is designed to help students selecting the appropriate courses by themselves. The result showed that the most accurate model was 100 Folds Cross-validation which had 73.58% points of accuracy.

Keywords: artificial neural network, classification, students, e-learning

Procedia PDF Downloads 391
10912 Selection of Variogram Model for Environmental Variables

Authors: Sheikh Samsuzzhan Alam

Abstract:

The present study investigates the selection of variogram model in analyzing spatial variations of environmental variables with the trend. Sometimes, the autofitted theoretical variogram does not really capture the true nature of the empirical semivariogram. So proper exploration and analysis are needed to select the best variogram model. For this study, an open source data collected from California Soil Resource Lab1 is used to explain the problems when fitting a theoretical variogram. Five most commonly used variogram models: Linear, Gaussian, Exponential, Matern, and Spherical were fitted to the experimental semivariogram. Ordinary kriging methods were considered to evaluate the accuracy of the selected variograms through cross-validation. This study is beneficial for selecting an appropriate theoretical variogram model for environmental variables.

Keywords: anisotropy, cross-validation, environmental variables, kriging, variogram models

Procedia PDF Downloads 297
10911 Estimation of the Acute Toxicity of Halogenated Phenols Using Quantum Chemistry Descriptors

Authors: Khadidja Bellifa, Sidi Mohamed Mekelleche

Abstract:

Phenols and especially halogenated phenols represent a substantial part of the chemicals produced worldwide and are known as aquatic pollutants. Quantitative structure–toxicity relationship (QSTR) models are useful for understanding how chemical structure relates to the toxicity of chemicals. In the present study, the acute toxicities of 45 halogenated phenols to Tetrahymena Pyriformis are estimated using no cost semi-empirical quantum chemistry methods. QSTR models were established using the multiple linear regression technique and the predictive ability of the models was evaluated by the internal cross-validation, the Y-randomization and the external validation. Their structural chemical domain has been defined by the leverage approach. The results show that the best model is obtained with the AM1 method (R²= 0.91, R²CV= 0.90, SD= 0.20 for the training set and R²= 0.96, SD= 0.11 for the test set). Moreover, all the Tropsha’ criteria for a predictive QSTR model are verified.

Keywords: halogenated phenols, toxicity mechanism, hydrophobicity, electrophilicity index, quantitative stucture-toxicity relationships

Procedia PDF Downloads 265
10910 Analysis of Active Compounds in Thai Herbs by near Infrared Spectroscopy

Authors: Chaluntorn Vichasilp, Sutee Wangtueai

Abstract:

This study aims to develop a new method to detect active compounds in Thai herbs (1-deoxynojirimycin (DNJ) in mulberry leave, anthocyanin in Mao and curcumin in turmeric) using near infrared spectroscopy (NIRs). NIRs is non-destructive technique that rapid, non-chemical involved and low-cost determination. By NIRs and chemometrics technique, it was found that the DNJ prediction equation conducted with partial least square regression with cross-validation had low accuracy R2 (0.42) and SEP (31.87 mg/100g). On the other hand, the anthocyanin prediction equation showed moderate good results (R2 and SEP of 0.78 and 0.51 mg/g) with Multiplication scattering correction at wavelength of 2000-2200 nm. The high absorption could be observed at wavelength of 2047 nm and this model could be used as screening level. For curcumin prediction, the good result was obtained when applied original spectra with smoothing technique. The wavelength of 1400-2500 nm was created regression model with R2 (0.68) and SEP (0.17 mg/g). This model had high NIRs absorption at a wavelength of 1476, 1665, 1986 and 2395 nm, respectively. NIRs showed prospective technique for detection of some active compounds in Thai herbs.

Keywords: anthocyanin, curcumin, 1-deoxynojirimycin (DNJ), near infrared spectroscopy (NIRs)

Procedia PDF Downloads 349
10909 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation

Procedia PDF Downloads 197
10908 Surface-Enhanced Raman Spectroscopy on Gold Nanoparticles in the Kidney Disease

Authors: Leonardo C. Pacheco-Londoño, Nataly J Galan-Freyle, Lisandro Pacheco-Lugo, Antonio Acosta-Hoyos, Elkin Navarro, Gustavo Aroca-Martinez, Karin Rondón-Payares, Alberto C. Espinosa-Garavito, Samuel P. Hernández-Rivera

Abstract:

At the Life Science Research Center at Simon Bolivar University, a primary focus is the diagnosis of various diseases, and the use of gold nanoparticles (Au-NPs) in diverse biomedical applications is continually expanding. In the present study, Au-NPs were employed as substrates for Surface-Enhanced Raman Spectroscopy (SERS) aimed at diagnosing kidney diseases arising from Lupus Nephritis (LN), preeclampsia (PC), and Hypertension (H). Discrimination models were developed for distinguishing patients with and without kidney diseases based on the SERS signals from urine samples by partial least squares-discriminant analysis (PLS-DA). A comparative study of the Raman signals across the three conditions was conducted, leading to the identification of potential metabolite signals. Model performance was assessed through cross-validation and external validation, determining parameters like sensitivity and specificity. Additionally, a secondary analysis was performed using machine learning (ML) models, wherein different ML algorithms were evaluated for their efficiency. Models’ validation was carried out using cross-validation and external validation, and other parameters were determined, such as sensitivity and specificity; the models showed average values of 0.9 for both parameters. Additionally, it is not possible to highlight this collaborative effort involved two university research centers and two healthcare institutions, ensuring ethical treatment and informed consent of patient samples.

Keywords: SERS, Raman, PLS-DA, kidney diseases

Procedia PDF Downloads 13
10907 Molecular Topology and TLC Retention Behaviour of s-Triazines: QSRR Study

Authors: Lidija R. Jevrić, Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević

Abstract:

Quantitative structure-retention relationship (QSRR) analysis was used to predict the chromatographic behavior of s-triazine derivatives by using theoretical descriptors computed from the chemical structure. Fundamental basis of the reported investigation is to relate molecular topological descriptors with chromatographic behavior of s-triazine derivatives obtained by reversed-phase (RP) thin layer chromatography (TLC) on silica gel impregnated with paraffin oil and applied ethanol-water (φ = 0.5-0.8; v/v). Retention parameter (RM0) of 14 investigated s-triazine derivatives was used as dependent variable while simple connectivity index different orders were used as independent variables. The best QSRR model for predicting RM0 value was obtained with simple third order connectivity index (3χ) in the second-degree polynomial equation. Numerical values of the correlation coefficient (r=0.915), Fisher's value (F=28.34) and root mean square error (RMSE = 0.36) indicate that model is statistically significant. In order to test the predictive power of the QSRR model leave-one-out cross-validation technique has been applied. The parameters of the internal cross-validation analysis (r2CV=0.79, r2adj=0.81, PRESS=1.89) reflect the high predictive ability of the generated model and it confirms that can be used to predict RM0 value. Multivariate classification technique, hierarchical cluster analysis (HCA), has been applied in order to group molecules according to their molecular connectivity indices. HCA is a descriptive statistical method and it is the most frequently used for important area of data processing such is classification. The HCA performed on simple molecular connectivity indices obtained from the 2D structure of investigated s-triazine compounds resulted in two main clusters in which compounds molecules were grouped according to the number of atoms in the molecule. This is in agreement with the fact that these descriptors were calculated on the basis of the number of atoms in the molecule of the investigated s-triazine derivatives.

Keywords: s-triazines, QSRR, chemometrics, chromatography, molecular descriptors

Procedia PDF Downloads 364
10906 Characterization and Geographical Differentiation of Yellow Prickly Pear Produced in Different Mediterranean Countries

Authors: Artemis Louppis, Michalis Constantinou, Ioanna Kosma, Federica Blando, Michael Kontominas, Anastasia Badeka

Abstract:

The aim of the present study was to differentiate yellow prickly pear according to geographical origin based on the combination of mineral content, physicochemical parameters, vitamins and antioxidants. A total of 240 yellow prickly pear samples from Cyprus, Spain, Italy and Greece were analyzed for pH, titratable acidity, electrical conductivity, protein, moisture, ash, fat, antioxidant activity, individual antioxidants, sugars and vitamins by UPLC-MS/MS as well as minerals by ICP-MS. Statistical treatment of the data included multivariate analysis of variance followed by linear discriminant analysis. Based on results, a correct classification of 66.7% was achieved using the cross validation by mineral content while 86.1% was achieved using the cross validation method by combination of all analytical parameters.

Keywords: geographical differentiation, prickly pear, chemometrics, analytical techniques

Procedia PDF Downloads 115
10905 A Discrete Logit Survival Model with a Smooth Baseline Hazard for Age at First Alcohol Intake among Students at Tertiary Institutions in Thohoyandou, South Africa

Authors: A. Bere, H. G. Sithuba, K. Kyei, C. Sigauke

Abstract:

We employ a discrete logit survival model to investigate the risk factors for early alcohol intake among students at two tertiary institutions in Thohoyandou, South Africa. Data were collected from a sample of 744 students using a self-administered questionnaire. Significant covariates were arrived at through a regularization algorithm implemented using the glmmLasso package. The tuning parameter was determined using a five-fold cross-validation algorithm. The baseline hazard was modelled as a smooth function of time through the use of spline functions. The results show that the hazard of initial alcohol intake peaks at the age of about 16 years and that at any given time, being of a male gender, prior use of other drugs, having drinking peers, having experienced negative life events and physical abuse are associated with a higher risk of alcohol intake debut.

Keywords: cross-validation, discrete hazard model, LASSO, smooth baseline hazard

Procedia PDF Downloads 157
10904 Validation of the Linear Trend Estimation Technique for Prediction of Average Water and Sewerage Charge Rate Prices in the Czech Republic

Authors: Aneta Oblouková, Eva Vítková

Abstract:

The article deals with the issue of water and sewerage charge rate prices in the Czech Republic. The research is specifically focused on the analysis of the development of the average prices of water and sewerage charge rate in the Czech Republic in the years 1994-2021 and on the validation of the chosen methodology relevant for the prediction of the development of the average prices of water and sewerage charge rate in the Czech Republic. The research is based on data collection. The data for this research was obtained from the Czech Statistical Office. The aim of the paper is to validate the relevance of the mathematical linear trend estimate technique for the calculation of the predicted average prices of water and sewerage charge rates. The real values of the average prices of water and sewerage charge rates in the Czech Republic in the years 1994-2018 were obtained from the Czech Statistical Office and were converted into a mathematical equation. The same type of real data was obtained from the Czech Statistical Office for the years 2019-2021. Prediction of the average prices of water and sewerage charge rates in the Czech Republic in the years 2019-2021 were also calculated using a chosen method -a linear trend estimation technique. The values obtained from the Czech Statistical Office and the values calculated using the chosen methodology were subsequently compared. The research result is a validation of the chosen mathematical technique to be a suitable technique for this research.

Keywords: Czech Republic, linear trend estimation, price prediction, water and sewerage charge rate

Procedia PDF Downloads 95
10903 An Enhanced Approach in Validating Analytical Methods Using Tolerance-Based Design of Experiments (DoE)

Authors: Gule Teri

Abstract:

The effective validation of analytical methods forms a crucial component of pharmaceutical manufacturing. However, traditional validation techniques can occasionally fail to fully account for inherent variations within datasets, which may result in inconsistent outcomes. This deficiency in validation accuracy is particularly noticeable when quantifying low concentrations of active pharmaceutical ingredients (APIs), excipients, or impurities, introducing a risk to the reliability of the results and, subsequently, the safety and effectiveness of the pharmaceutical products. In response to this challenge, we introduce an enhanced, tolerance-based Design of Experiments (DoE) approach for the validation of analytical methods. This approach distinctly measures variability with reference to tolerance or design margins, enhancing the precision and trustworthiness of the results. This method provides a systematic, statistically grounded validation technique that improves the truthfulness of results. It offers an essential tool for industry professionals aiming to guarantee the accuracy of their measurements, particularly for low-concentration components. By incorporating this innovative method, pharmaceutical manufacturers can substantially advance their validation processes, subsequently improving the overall quality and safety of their products. This paper delves deeper into the development, application, and advantages of this tolerance-based DoE approach and demonstrates its effectiveness using High-Performance Liquid Chromatography (HPLC) data for verification. This paper also discusses the potential implications and future applications of this method in enhancing pharmaceutical manufacturing practices and outcomes.

Keywords: tolerance-based design, design of experiments, analytical method validation, quality control, biopharmaceutical manufacturing

Procedia PDF Downloads 39
10902 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Model

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the strength of the direct and indirect effects of variables. One or more structural regression equations are used to estimate a series of parameters in order to find the better fit of data. Sometimes, exogenous variables do not show a significant strength of their direct and indirect effect when the assumption of classical regression (ordinary least squares (OLS)) are violated by the nature of the data. The main motive of this article is to investigate the efficacy of the copula-based regression approach over the classical regression approach and calculate the direct and indirect effects of variables when data violates the OLS assumption and variables are linked through an elliptical copula. We perform this study using a well-organized numerical scheme. Finally, a real data application is also presented to demonstrate the performance of the superiority of the copula approach.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 45
10901 Development of a Decision-Making Method by Using Machine Learning Algorithms in the Early Stage of School Building Design

Authors: Pegah Eshraghi, Zahra Sadat Zomorodian, Mohammad Tahsildoost

Abstract:

Over the past decade, energy consumption in educational buildings has steadily increased. The purpose of this research is to provide a method to quickly predict the energy consumption of buildings using separate evaluation of zones and decomposing the building to eliminate the complexity of geometry at the early design stage. To produce this framework, machine learning algorithms such as Support vector regression (SVR) and Artificial neural network (ANN) are used to predict energy consumption and thermal comfort metrics in a school as a case. The database consists of more than 55000 samples in three climates of Iran. Cross-validation evaluation and unseen data have been used for validation. In a specific label, cooling energy, it can be said the accuracy of prediction is at least 84% and 89% in SVR and ANN, respectively. The results show that the SVR performed much better than the ANN.

Keywords: early stage of design, energy, thermal comfort, validation, machine learning

Procedia PDF Downloads 44
10900 Creation and Validation of a Measurement Scale of E-Management: An Exploratory and Confirmatory Study

Authors: Hamadi Khlif

Abstract:

This paper deals with the understanding of the concept of e-management and the development of a measuring instrument adapted to the new problems encountered during the application of this new practice within the modern enterprise. Two principal e-management factors have been isolated in an exploratory study carried out among 260 participants. A confirmatory study applied to a second sample of 270 participants has been established in a cross-validation of the scale of measurement. The study presents the literature review specifically dedicated to e-management and the results of the exploratory and confirmatory phase of the development of this scale, which demonstrates satisfactory psychometric qualities. The e-management has two dimensions: a managerial dimension and a technological dimension.

Keywords: e-management, management, ICT deployment, mode of management

Procedia PDF Downloads 286
10899 Optical Flow Technique for Supersonic Jet Measurements

Authors: Haoxiang Desmond Lim, Jie Wu, Tze How Daniel New, Shengxian Shi

Abstract:

This paper outlines the development of a novel experimental technique in quantifying supersonic jet flows, in an attempt to avoid seeding particle problems frequently associated with particle-image velocimetry (PIV) techniques at high Mach numbers. Based on optical flow algorithms, the idea behind the technique involves using high speed cameras to capture Schlieren images of the supersonic jet shear layers, before they are subjected to an adapted optical flow algorithm based on the Horn-Schnuck method to determine the associated flow fields. The proposed method is capable of offering full-field unsteady flow information with potentially higher accuracy and resolution than existing point-measurements or PIV techniques. Preliminary study via numerical simulations of a circular de Laval jet nozzle successfully reveals flow and shock structures typically associated with supersonic jet flows, which serve as useful data for subsequent validation of the optical flow based experimental results. For experimental technique, a Z-type Schlieren setup is proposed with supersonic jet operated in cold mode, stagnation pressure of 8.2 bar and exit velocity of Mach 1.5. High-speed single-frame or double-frame cameras are used to capture successive Schlieren images. As implementation of optical flow technique to supersonic flows remains rare, the current focus revolves around methodology validation through synthetic images. The results of validation test offers valuable insight into how the optical flow algorithm can be further improved to improve robustness and accuracy. Details of the methodology employed and challenges faced will be further elaborated in the final conference paper should the abstract be accepted. Despite these challenges however, this novel supersonic flow measurement technique may potentially offer a simpler way to identify and quantify the fine spatial structures within the shock shear layer.

Keywords: Schlieren, optical flow, supersonic jets, shock shear layer

Procedia PDF Downloads 287
10898 Development of a Decision-Making Method by Using Machine Learning Algorithms in the Early Stage of School Building Design

Authors: Rajaian Hoonejani Mohammad, Eshraghi Pegah, Zomorodian Zahra Sadat, Tahsildoost Mohammad

Abstract:

Over the past decade, energy consumption in educational buildings has steadily increased. The purpose of this research is to provide a method to quickly predict the energy consumption of buildings using separate evaluation of zones and decomposing the building to eliminate the complexity of geometry at the early design stage. To produce this framework, machine learning algorithms such as Support vector regression (SVR) and Artificial neural network (ANN) are used to predict energy consumption and thermal comfort metrics in a school as a case. The database consists of more than 55000 samples in three climates of Iran. Cross-validation evaluation and unseen data have been used for validation. In a specific label, cooling energy, it can be said the accuracy of prediction is at least 84% and 89% in SVR and ANN, respectively. The results show that the SVR performed much better than the ANN.

Keywords: early stage of design, energy, thermal comfort, validation, machine learning

Procedia PDF Downloads 28
10897 Modeling and Validation of Microspheres Generation in the Modified T-Junction Device

Authors: Lei Lei, Hongbo Zhang, Donald J. Bergstrom, Bing Zhang, K. Y. Song, W. J. Zhang

Abstract:

This paper presents a model for a modified T-junction device for microspheres generation. The numerical model is developed using a commercial software package: COMSOL Multiphysics. In order to test the accuracy of the numerical model, multiple variables, such as the flow rate of cross-flow, fluid properties, structure, and geometry of the microdevice are applied. The results from the model are compared with the experimental results in the diameter of the microsphere generated. The comparison shows a good agreement. Therefore the model is useful in further optimization of the device and feedback control of microsphere generation if any.

Keywords: CFD modeling, validation, microsphere generation, modified T-junction

Procedia PDF Downloads 672
10896 Agile Software Effort Estimation Using Regression Techniques

Authors: Mikiyas Adugna

Abstract:

Effort estimation is among the activities carried out in software development processes. An accurate model of estimation leads to project success. The method of agile effort estimation is a complex task because of the dynamic nature of software development. Researchers are still conducting studies on agile effort estimation to enhance prediction accuracy. Due to these reasons, we investigated and proposed a model on LASSO and Elastic Net regression to enhance estimation accuracy. The proposed model has major components: preprocessing, train-test split, training with default parameters, and cross-validation. During the preprocessing phase, the entire dataset is normalized. After normalization, a train-test split is performed on the dataset, setting training at 80% and testing set to 20%. We chose two different phases for training the two algorithms (Elastic Net and LASSO) regression following the train-test-split. In the first phase, the two algorithms are trained using their default parameters and evaluated on the testing data. In the second phase, the grid search technique (the grid is used to search for tuning and select optimum parameters) and 5-fold cross-validation to get the final trained model. Finally, the final trained model is evaluated using the testing set. The experimental work is applied to the agile story point dataset of 21 software projects collected from six firms. The results show that both Elastic Net and LASSO regression outperformed the compared ones. Compared to the proposed algorithms, LASSO regression achieved better predictive performance and has acquired PRED (8%) and PRED (25%) results of 100.0, MMRE of 0.0491, MMER of 0.0551, MdMRE of 0.0593, MdMER of 0.063, and MSE of 0.0007. The result implies LASSO regression algorithm trained model is the most acceptable, and higher estimation performance exists in the literature.

Keywords: agile software development, effort estimation, elastic net regression, LASSO

Procedia PDF Downloads 19
10895 Educational Data Mining: The Case of the Department of Mathematics and Computing in the Period 2009-2018

Authors: Mário Ernesto Sitoe, Orlando Zacarias

Abstract:

University education is influenced by several factors that range from the adoption of strategies to strengthen the whole process to the academic performance improvement of the students themselves. This work uses data mining techniques to develop a predictive model to identify students with a tendency to evasion and retention. To this end, a database of real students’ data from the Department of University Admission (DAU) and the Department of Mathematics and Informatics (DMI) was used. The data comprised 388 undergraduate students admitted in the years 2009 to 2014. The Weka tool was used for model building, using three different techniques, namely: K-nearest neighbor, random forest, and logistic regression. To allow for training on multiple train-test splits, a cross-validation approach was employed with a varying number of folds. To reduce bias variance and improve the performance of the models, ensemble methods of Bagging and Stacking were used. After comparing the results obtained by the three classifiers, Logistic Regression using Bagging with seven folds obtained the best performance, showing results above 90% in all evaluated metrics: accuracy, rate of true positives, and precision. Retention is the most common tendency.

Keywords: evasion and retention, cross-validation, bagging, stacking

Procedia PDF Downloads 52
10894 Kou Jump Diffusion Model: An Application to the SP 500; Nasdaq 100 and Russell 2000 Index Options

Authors: Wajih Abbassi, Zouhaier Ben Khelifa

Abstract:

The present research points towards the empirical validation of three options valuation models, the ad-hoc Black-Scholes model as proposed by Berkowitz (2001), the constant elasticity of variance model of Cox and Ross (1976) and the Kou jump-diffusion model (2002). Our empirical analysis has been conducted on a sample of 26,974 options written on three indexes, the S&P 500, Nasdaq 100 and the Russell 2000 that were negotiated during the year 2007 just before the sub-prime crisis. We start by presenting the theoretical foundations of the models of interest. Then we use the technique of trust-region-reflective algorithm to estimate the structural parameters of these models from cross-section of option prices. The empirical analysis shows the superiority of the Kou jump-diffusion model. This superiority arises from the ability of this model to portray the behavior of market participants and to be closest to the true distribution that characterizes the evolution of these indices. Indeed the double-exponential distribution covers three interesting properties that are: the leptokurtic feature, the memory less property and the psychological aspect of market participants. Numerous empirical studies have shown that markets tend to have both overreaction and under reaction over good and bad news respectively. Despite of these advantages there are not many empirical studies based on this model partly because probability distribution and option valuation formula are rather complicated. This paper is the first to have used the technique of nonlinear curve-fitting through the trust-region-reflective algorithm and cross-section options to estimate the structural parameters of the Kou jump-diffusion model.

Keywords: jump-diffusion process, Kou model, Leptokurtic feature, trust-region-reflective algorithm, US index options

Procedia PDF Downloads 402
10893 Cross-Validation of the Data Obtained for ω-6 Linoleic and ω-3 α-Linolenic Acids Concentration of Hemp Oil Using Jackknife and Bootstrap Resampling

Authors: Vibha Devi, Shabina Khanam

Abstract:

Hemp (Cannabis sativa) possesses a rich content of ω-6 linoleic and ω-3 linolenic essential fatty acid in the ratio of 3:1, which is a rare and most desired ratio that enhances the quality of hemp oil. These components are beneficial for the development of cell and body growth, strengthen the immune system, possess anti-inflammatory action, lowering the risk of heart problem owing to its anti-clotting property and a remedy for arthritis and various disorders. The present study employs supercritical fluid extraction (SFE) approach on hemp seed at various conditions of parameters; temperature (40 - 80) °C, pressure (200 - 350) bar, flow rate (5 - 15) g/min, particle size (0.430 - 1.015) mm and amount of co-solvent (0 - 10) % of solvent flow rate through central composite design (CCD). CCD suggested 32 sets of experiments, which was carried out. As SFE process includes large number of variables, the present study recommends the application of resampling techniques for cross-validation of the obtained data. Cross-validation refits the model on each data to achieve the information regarding the error, variability, deviation etc. Bootstrap and jackknife are the most popular resampling techniques, which create a large number of data through resampling from the original dataset and analyze these data to check the validity of the obtained data. Jackknife resampling is based on the eliminating one observation from the original sample of size N without replacement. For jackknife resampling, the sample size is 31 (eliminating one observation), which is repeated by 32 times. Bootstrap is the frequently used statistical approach for estimating the sampling distribution of an estimator by resampling with replacement from the original sample. For bootstrap resampling, the sample size is 32, which was repeated by 100 times. Estimands for these resampling techniques are considered as mean, standard deviation, variation coefficient and standard error of the mean. For ω-6 linoleic acid concentration, mean value was approx. 58.5 for both resampling methods, which is the average (central value) of the sample mean of all data points. Similarly, for ω-3 linoleic acid concentration, mean was observed as 22.5 through both resampling. Variance exhibits the spread out of the data from its mean. Greater value of variance exhibits the large range of output data, which is 18 for ω-6 linoleic acid (ranging from 48.85 to 63.66 %) and 6 for ω-3 linoleic acid (ranging from 16.71 to 26.2 %). Further, low value of standard deviation (approx. 1 %), low standard error of the mean (< 0.8) and low variance coefficient (< 0.2) reflect the accuracy of the sample for prediction. All the estimator value of variance coefficients, standard deviation and standard error of the mean are found within the 95 % of confidence interval.

Keywords: resampling, supercritical fluid extraction, hemp oil, cross-validation

Procedia PDF Downloads 119
10892 Application of Support Vector Machines in Forecasting Non-Residential

Authors: Wiwat Kittinaraporn, Napat Harnpornchai, Sutja Boonyachut

Abstract:

This paper deals with the application of a novel neural network technique, so-called Support Vector Machine (SVM). The objective of this study is to explore the variable and parameter of forecasting factors in the construction industry to build up forecasting model for construction quantity in Thailand. The scope of the research is to study the non-residential construction quantity in Thailand. There are 44 sets of yearly data available, ranging from 1965 to 2009. The correlation between economic indicators and construction demand with the lag of one year was developed by Apichat Buakla. The selected variables are used to develop SVM models to forecast the non-residential construction quantity in Thailand. The parameters are selected by using ten-fold cross-validation method. The results are indicated in term of Mean Absolute Percentage Error (MAPE). The MAPE value for the non-residential construction quantity predicted by Epsilon-SVR in corporation with Radial Basis Function (RBF) of kernel function type is 5.90. Analysis of the experimental results show that the support vector machine modelling technique can be applied to forecast construction quantity time series which is useful for decision planning and management purpose.

Keywords: forecasting, non-residential, construction, support vector machines

Procedia PDF Downloads 403
10891 Predictive Analytics of Student Performance Determinants

Authors: Mahtab Davari, Charles Edward Okon, Somayeh Aghanavesi

Abstract:

Every institute of learning is usually interested in the performance of enrolled students. The level of these performances determines the approach an institute of study may adopt in rendering academic services. The focus of this paper is to evaluate students' academic performance in given courses of study using machine learning methods. This study evaluated various supervised machine learning classification algorithms such as Logistic Regression (LR), Support Vector Machine, Random Forest, Decision Tree, K-Nearest Neighbors, Linear Discriminant Analysis, and Quadratic Discriminant Analysis, using selected features to predict study performance. The accuracy, precision, recall, and F1 score obtained from a 5-Fold Cross-Validation were used to determine the best classification algorithm to predict students’ performances. SVM (using a linear kernel), LDA, and LR were identified as the best-performing machine learning methods. Also, using the LR model, this study identified students' educational habits such as reading and paying attention in class as strong determinants for a student to have an above-average performance. Other important features include the academic history of the student and work. Demographic factors such as age, gender, high school graduation, etc., had no significant effect on a student's performance.

Keywords: student performance, supervised machine learning, classification, cross-validation, prediction

Procedia PDF Downloads 85