Search results for: Probabilistic regression automata
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 995

Search results for: Probabilistic regression automata

815 On Estimating the Headcount Index by Using the Logistic Regression Estimator

Authors: Encarnación Álvarez, Rosa M. García-Fernández, Juan F. Muñoz, Francisco J. Blanco-Encomienda

Abstract:

The problem of estimating a proportion has important applications in the field of economics, and in general, in many areas such as social sciences. A common application in economics is the estimation of the headcount index. In this paper, we define the general headcount index as a proportion. Furthermore, we introduce a new quantitative method for estimating the headcount index. In particular, we suggest to use the logistic regression estimator for the problem of estimating the headcount index. Assuming a real data set, results derived from Monte Carlo simulation studies indicate that the logistic regression estimator can be more accurate than the traditional estimator of the headcount index.

Keywords: Poverty line, poor, risk of poverty, sample, Monte Carlo simulations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2093
814 Decision Trees for Predicting Risk of Mortality using Routinely Collected Data

Authors: Tessy Badriyah, Jim S. Briggs, Dave R. Prytherch

Abstract:

It is well known that Logistic Regression is the gold standard method for predicting clinical outcome, especially predicting risk of mortality. In this paper, the Decision Tree method has been proposed to solve specific problems that commonly use Logistic Regression as a solution. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital from 1 January to 31 December 2001 was divided into four subsets. One subset of training data was used to generate a model, and the model obtained was then applied to three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under ROC curve or c-index). The experiment presented that both methods have reasonable results in the case of the c-index. However, in some cases the calibration value (χ2) obtained quite a high result. After conducting experiments and investigating the advantages and disadvantages of each method, we can conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.

Keywords: Decision Trees, Logistic Regression, clinical outcome, risk of mortality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2522
813 System Identification Based on Stepwise Regression for Dynamic Market Representation

Authors: Alexander Efremov

Abstract:

A system for market identification (SMI) is presented. The resulting representations are multivariable dynamic demand models. The market specifics are analyzed. Appropriate models and identification techniques are chosen. Multivariate static and dynamic models are used to represent the market behavior. The steps of the first stage of SMI, named data preprocessing, are mentioned. Next, the second stage, which is the model estimation, is considered in more details. Stepwise linear regression (SWR) is used to determine the significant cross-effects and the orders of the model polynomials. The estimates of the model parameters are obtained by a numerically stable estimator. Real market data is used to analyze SMI performance. The main conclusion is related to the applicability of multivariate dynamic models for representation of market systems.

Keywords: market identification, dynamic models, stepwise regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1617
812 Two New Relative Efficiencies of Linear Weighted Regression

Authors: Shuimiao Wan, Chao Yuan, Baoguang Tian

Abstract:

In statistics parameter theory, usually the parameter estimations have two kinds, one is the least-square estimation (LSE), and the other is the best linear unbiased estimation (BLUE). Due to the determining theorem of minimum variance unbiased estimator (MVUE), the parameter estimation of BLUE in linear model is most ideal. But since the calculations are complicated or the covariance is not given, people are hardly to get the solution. Therefore, people prefer to use LSE rather than BLUE. And this substitution will take some losses. To quantize the losses, many scholars have presented many kinds of different relative efficiencies in different views. For the linear weighted regression model, this paper discusses the relative efficiencies of LSE of β to BLUE of β. It also defines two new relative efficiencies and gives their lower bounds.

Keywords: Linear weighted regression, Relative efficiency, Lower bound, Parameter estimation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2116
811 The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models

Authors: Jihye Jeon

Abstract:

This paper analyzes the conceptual framework of three statistical methods, multiple regression, path analysis, and structural equation models. When establishing research model of the statistical modeling of complex social phenomenon, it is important to know the strengths and limitations of three statistical models. This study explored the character, strength, and limitation of each modeling and suggested some strategies for accurate explaining or predicting the causal relationships among variables. Especially, on the studying of depression or mental health, the common mistakes of research modeling were discussed.

Keywords: Multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9248
810 Symmetry Breaking and the Emergence of Branching Structures in Morphogenesis: Minimal Conditions and Mechanical Interactions between Cells

Authors: M. Margarida Costa, Jorge Simão

Abstract:

The minimal condition for symmetry breaking in morphogenesis of cellular population was investigated using cellular automata based on reaction-diffusion dynamics. In particular, the study looked for the possibility of the emergence of branching structures due to mechanical interactions. The model used two types of cells an external gradient. The results showed that the external gradient influenced movement of cell type-I, also revealed that clusters formed by cells type-II worked as barrier to movement of cells type-I.

Keywords: Morphogenesis, branching structures, symmetrybreaking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1242
809 Segmentation Free Nastalique Urdu OCR

Authors: Sobia T. Javed, Sarmad Hussain, Ameera Maqbool, Samia Asloob, Sehrish Jamil, Huma Moin

Abstract:

The electronically available Urdu data is in image form which is very difficult to process. Printed Urdu data is the root cause of problem. So for the rapid progress of Urdu language we need an OCR systems, which can help us to make Urdu data available for the common person. Research has been carried out for years to automata Arabic and Urdu script. But the biggest hurdle in the development of Urdu OCR is the challenge to recognize Nastalique Script which is taken as standard for writing Urdu language. Nastalique script is written diagonally with no fixed baseline which makes the script somewhat complex. Overlap is present not only in characters but in the ligatures as well. This paper proposes a method which allows successful recognition of Nastalique Script.

Keywords: HMM, Image processing, Optical CharacterRecognition, Urdu OCR.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2157
808 Ensembling Adaptively Constructed Polynomial Regression Models

Authors: Gints Jekabsons

Abstract:

The approach of subset selection in polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset that is sufficient to describe the target relation sufficiently well. However, in most cases the necessary set of basis functions is not known and needs to be guessed – a potentially non-trivial (and long) trial and error process. In our research we consider a potentially more efficient approach – Adaptive Basis Function Construction (ABFC). It lets the model building method itself construct the basis functions necessary for creating a model of arbitrary complexity with adequate predictive performance. However, there are two issues that to some extent plague the methods of both the subset selection and the ABFC, especially when working with relatively small data samples: the selection bias and the selection instability. We try to correct these issues by model post-evaluation using Cross-Validation and model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used method of subset selection, as well as to some other well-known regression modeling methods, using publicly available data sets.

Keywords: Basis function construction, heuristic search, modelensembles, polynomial regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1672
807 Harmonics Elimination in Multilevel Inverter Using Linear Fuzzy Regression

Authors: A. K. Al-Othman, H. A. Al-Mekhaizim

Abstract:

Multilevel inverters supplied from equal and constant dc sources almost don-t exist in practical applications. The variation of the dc sources affects the values of the switching angles required for each specific harmonic profile, as well as increases the difficulty of the harmonic elimination-s equations. This paper presents an extremely fast optimal solution of harmonic elimination of multilevel inverters with non-equal dc sources using Tanaka's fuzzy linear regression formulation. A set of mathematical equations describing the general output waveform of the multilevel inverter with nonequal dc sources is formulated. Fuzzy linear regression is then employed to compute the optimal solution set of switching angles.

Keywords: Multilevel converters, harmonics, pulse widthmodulation (PWM), optimal control.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1796
806 Improving Taint Analysis of Android Applications Using Finite State Machines

Authors: Assad Maalouf, Lunjin Lu, James Lynott

Abstract:

We present a taint analysis that can automatically detect when string operations result in a string that is free of taints, where all the tainted patterns have been removed. This is an improvement on the conservative behavior of previous taint analyzers, where a string operation on a tainted string always leads to a tainted string unless the operation is manually marked as a sanitizer. The taint analysis is built on top of a string analysis that uses finite state automata to approximate the sets of values that string variables can take during the execution of a program. The proposed approach has been implemented as an extension of FlowDroid and experimental results show that the resulting taint analyzer is much more precise than the original FlowDroid.

Keywords: Android, static analysis, string analysis, taint analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 660
805 Burning Rate Response of Solid Fuels in Laminar Boundary Layer

Authors: A. M. Tahsini

Abstract:

Solid fuel transient burning behavior under oxidizer gas flow is numerically investigated. It is done using analysis of the regression rate responses to the imposed sudden and oscillatory variation at inflow properties. The conjugate problem is considered by simultaneous solution of flow and solid phase governing equations to compute the fuel regression rate. The advection upstream splitting method is used as flow computational scheme in finite volume method. The ignition phase is completely simulated to obtain the exact initial condition for response analysis. The results show that the transient burning effects which lead to the combustion instabilities and intermittent extinctions could be observed in solid fuels as the solid propellants.

Keywords: Extinction, Oscillation, Regression rate, Response, Transient burning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2363
804 A Model for Test Case Selection in the Software-Development Life Cycle

Authors: Adtha Lawanna

Abstract:

Software maintenance is one of the essential processes of Software-Development Life Cycle. The main philosophies of retaining software concern the improvement of errors, the revision of codes, the inhibition of future errors, and the development in piece and capacity. While the adjustment has been employing, the software structure has to be retested to an upsurge a level of assurance that it will be prepared due to the requirements. According to this state, the test cases must be considered for challenging the revised modules and the whole software. A concept of resolving this problem is ongoing by regression test selection such as the retest-all selections, random/ad-hoc selection and the safe regression test selection. Particularly, the traditional techniques concern a mapping between the test cases in a test suite and the lines of code it executes. However, there are not only the lines of code as one of the requirements that can affect the size of test suite but including the number of functions and faulty versions. Therefore, a model for test case selection is developed to cover those three requirements by the integral technique which can produce the smaller size of the test cases when compared with the traditional regression selection techniques.

Keywords: Software maintenance, regression test selection, test case.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1696
803 A Model for Test Case Selection in the Software-Development Life Cycle

Authors: Adtha Lawanna

Abstract:

Software maintenance is one of the essential processes of Software-Development Life Cycle. The main philosophies of retaining software concern the improvement of errors, the revision of codes, the inhibition of future errors, and the development in piece and capacity. While the adjustment has been employing, the software structure has to be retested to an upsurge a level of assurance that it will be prepared due to the requirements. According to this state, the test cases must be considered for challenging the revised modules and the whole software. A concept of resolving this problem is ongoing by regression test selection such as the retest-all selections, random/ad-hoc selection and the safe regression test selection. Particularly, the traditional techniques concern a mapping between the test cases in a test suite and the lines of code it executes. However, there are not only the lines of code as one of the requirements that can affect the size of test suite but including the number of functions and faulty versions. Therefore, a model for test case selection is developed to cover those three requirements by the integral technique which can produce the smaller size of the test cases when compared with the traditional regression selection techniques.

Keywords: Software maintenance, regression test selection, test case.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1598
802 Mathematical Modeling to Predict Surface Roughness in CNC Milling

Authors: Ab. Rashid M.F.F., Gan S.Y., Muhammad N.Y.

Abstract:

Surface roughness (Ra) is one of the most important requirements in machining process. In order to obtain better surface roughness, the proper setting of cutting parameters is crucial before the process take place. This research presents the development of mathematical model for surface roughness prediction before milling process in order to evaluate the fitness of machining parameters; spindle speed, feed rate and depth of cut. 84 samples were run in this study by using FANUC CNC Milling α-Τ14ιE. Those samples were randomly divided into two data sets- the training sets (m=60) and testing sets(m=24). ANOVA analysis showed that at least one of the population regression coefficients was not zero. Multiple Regression Method was used to determine the correlation between a criterion variable and a combination of predictor variables. It was established that the surface roughness is most influenced by the feed rate. By using Multiple Regression Method equation, the average percentage deviation of the testing set was 9.8% and 9.7% for training data set. This showed that the statistical model could predict the surface roughness with about 90.2% accuracy of the testing data set and 90.3% accuracy of the training data set.

Keywords: Surface roughness, regression analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2130
801 Stature Prediction Model Based On Hand Anthropometry

Authors: Arunesh Chandra, Pankaj Chandna, Surinder Deswal, Rajesh Kumar Mishra, Rajender Kumar

Abstract:

The arm length, hand length, hand breadth and middle finger length of 1540 right-handed industrial workers of Haryana state was used to assess the relationship between the upper limb dimensions and stature. Initially, the data were analyzed using basic univariate analysis and independent t-tests; then simple and multiple linear regression models were used to estimate stature using SPSS (version 17). There was a positive correlation between upper limb measurements (hand length, hand breadth, arm length and middle finger length) and stature (p < 0.01), which was highest for hand length. The accuracy of stature prediction ranged from ± 54.897 mm to ± 58.307 mm. The use of multiple regression equations gave better results than simple regression equations. This study provides new forensic standards for stature estimation from the upper limb measurements of male industrial workers of Haryana (India). The results of this research indicate that stature can be determined using hand dimensions with accuracy, when only upper limb is available due to any reasons likewise explosions, train/plane crashes, mutilated bodies, etc. The regression formula derived in this study will be useful for anatomists, archaeologists, anthropologists, design engineers and forensic scientists for fairly prediction of stature using regression equations.

Keywords: Anthropometric dimensions, Forensic identification, Industrial workers, Stature prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2959
800 Use of Regression Analysis in Determining the Length of Plastic Hinge in Reinforced Concrete Columns

Authors: Mehmet Alpaslan Köroğlu, Musa Hakan Arslan, Muslu Kazım Körez

Abstract:

Basic objective of this study is to create a regression analysis method that can estimate the length of a plastic hinge which is an important design parameter, by making use of the outcomes of (lateral load-lateral displacement hysteretic curves) the experimental studies conducted for the reinforced square concrete columns. For this aim, 170 different square reinforced concrete column tests results have been collected from the existing literature. The parameters which are thought affecting the plastic hinge length such as crosssection properties, features of material used, axial loading level, confinement of the column, longitudinal reinforcement bars in the columns etc. have been obtained from these 170 different square reinforced concrete column tests. In the study, when determining the length of plastic hinge, using the experimental test results, a regression analysis have been separately tested and compared with each other. In addition, the outcome of mentioned methods on determination of plastic hinge length of the reinforced concrete columns has been compared to other methods available in the literature.

Keywords: Columns, plastic hinge length, regression analysis, reinforced concrete.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4270
799 Modeling Aeration of Sharp Crested Weirs by Using Support Vector Machines

Authors: Arun Goel

Abstract:

The present paper attempts to investigate the prediction of air entrainment rate and aeration efficiency of a free overfall jets issuing from a triangular sharp crested weir by using regression based modelling. The empirical equations, Support vector machine (polynomial and radial basis function) models and the linear regression techniques were applied on the triangular sharp crested weirs relating the air entrainment rate and the aeration efficiency to the input parameters namely drop height, discharge, and vertex angle. It was observed that there exists a good agreement between the measured values and the values obtained using empirical equations, Support vector machine (Polynomial and rbf) models and the linear regression techniques. The test results demonstrated that the SVM based (Poly & rbf) model also provided acceptable prediction of the measured values with reasonable accuracy along with empirical equations and linear regression techniques in modelling the air entrainment rate and the aeration efficiency of a free overfall jets issuing from triangular sharp crested weir. Further sensitivity analysis has also been performed to study the impact of input parameter on the output in terms of air entrainment rate and aeration efficiency.

Keywords: Air entrainment rate, dissolved oxygen, regression, SVM, weir.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1955
798 Landfill Failure Mobility Analysis: A Probabilistic Approach

Authors: Ali Jahanfar, Brajesh Dubey, Bahram Gharabaghi, Saber Bayat Movahed

Abstract:

Ever increasing population growth of major urban centers and environmental challenges in siting new landfills have resulted in a growing trend in design of mega-landfills some with extraordinary heights and dangerously steep slopes. Landfill failure mobility risk analysis is one of the most uncertain types of dynamic rheology models due to very large inherent variabilities in the heterogeneous solid waste material shear strength properties. The waste flow of three historic dumpsite and two landfill failures were back-analyzed using run-out modeling with DAN-W model. The travel distances of the waste flow during landfill failures were calculated approach by taking into account variability in material shear strength properties. The probability distribution function for shear strength properties of the waste material were grouped into four major classed based on waste material compaction (landfills versus dumpsites) and composition (high versus low quantity) of high shear strength waste materials such as wood, metal, plastic, paper and cardboard in the waste. This paper presents a probabilistic method for estimation of the spatial extent of waste avalanches, after a potential landfill failure, to create maps of vulnerability scores to inform property owners and residents of the level of the risk.

Keywords: Landfill failure, waste flow, Voellmy rheology, friction coefficient, waste compaction and type.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2285
797 A Research on Inference from Multiple Distance Variables in Hedonic Regression – Focus on Three Variables

Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro

Abstract:

In urban context, urban nodes such as amenity or hazard will certainly affect house price, while classic hedonic analysis will employ distance variables measured from each urban nodes. However, effects from distances to facilities on house prices generally do not represent the true price of the property. Distance variables measured on the same surface are suffering a problem called multicollinearity, which is usually presented as magnitude variance and mean value in regression, errors caused by instability. In this paper, we provided a theoretical framework to identify and gather the data with less bias, and also provided specific sampling method on locating the sample region to avoid the spatial multicollinerity problem in three distance variable’s case.

Keywords: Hedonic regression, urban node, distance variables, multicollinerity, collinearity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1992
796 The Effect of the Hourly Compensation on the Unemployment Rate: Comparative Analysis of United States, Canada and the United Kingdom Using Panel Data Regression Analysis

Authors: Ashiquer Rahman, Hares Mohammad, Ummey Salma

Abstract:

A country’s hourly compensation and unemployment rates are two of its most crucial components. They are not merely statistics but they have profound effects on individual, families, country, and the economy. They are inversely related to one another. The increased hourly compensation in the manufacturing sector can have a favorable effect on job changing issues. Moreover, the relationship between hourly compensation and unemployment is complex and influenced by broader economic factors. In this paper, in order to determine the effect of hourly compensation on unemployment rate, we use the panel data regression models and evaluate the expected link between hourly compensation and unemployment rate. We estimate the fixed effects model (FEM), evaluate the error components model (ECM), and determine which model (the FEM or ECM) is better through pooling all 60 observations. We then analyze and review the data by comparing countries (United States, Canada and the United Kingdom) using panel data regression models. Finally, we provide result, analysis and a summary of this extensive research on how the hourly compensation affects unemployment rate. Additionally, this paper offers relevant and useful guideline for the government and academic community to use an econometrics and social approach for the hourly compensation on unemployment rate to eliminate the problem.

Keywords: Hourly compensation, unemployment rate, panel data regression models, dummy variables, random effects model, fixed effects model, the linear regression model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 61
795 Mimicking Morphogenesis for Robust Behaviour of Cellular Architectures

Authors: David Jones, Richard McWilliam, Alan Purvis

Abstract:

Morphogenesis is the process that underpins the selforganised development and regeneration of biological systems. The ability to mimick morphogenesis in artificial systems has great potential for many engineering applications, including production of biological tissue, design of robust electronic systems and the co-ordination of parallel computing. Previous attempts to mimick these complex dynamics within artificial systems have relied upon the use of evolutionary algorithms that have limited their size and complexity. This paper will present some insight into the underlying dynamics of morphogenesis, then show how to, without the assistance of evolutionary algorithms, design cellular architectures that converge to complex patterns.

Keywords: Morphogenesis, regeneration, robustness, convergence, cellular automata.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1490
794 Implementation of Response Surface Methodology using in Small Brown Rice Peeling Machine: Part I

Authors: S. Bangphan, P. Bangphan, T.Boonkang

Abstract:

Implementation of response surface methodology (RSM) was employed to study the effects of two factor (rubber clearance and round per minute) in brown rice peeling machine of The optimal BROKENS yield (19.02, average of three repeats),.The optimized composition derived from RSM regression was analyzed using Regression analysis and Analysis of Variance (ANOVA). At a significant level α = 0.05, the values of Regression coefficient, R 2 (adj)were 97.35 % and standard deviation were 1.09513. The independent variables are initial rubber clearance, and round per minute parameters namely. The investigating responses are final rubber clearance, and round per minute (RPM). The restriction of the optimization is the designated.

Keywords: Brown rice, Response surface methodology(RSM), Rubber clearance, Round per minute (RPM), Peeling machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1968
793 Climate Change in Albania and Its Effect on Cereal Yield

Authors: L. Basha, E. Gjika

Abstract:

This study is focused on analyzing climate change in Albania and its potential effects on cereal yields. Initially, monthly temperature and rainfalls in Albania were studied for the period 1960-2021. Climacteric variables are important variables when trying to model cereal yield behavior, especially when significant changes in weather conditions are observed. For this purpose, in the second part of the study, linear and nonlinear models explaining cereal yield are constructed for the same period, 1960-2021. The multiple linear regression analysis and lasso regression method are applied to the data between cereal yield and each independent variable: average temperature, average rainfall, fertilizer consumption, arable land, land under cereal production, and nitrous oxide emissions. In our regression model, heteroscedasticity is not observed, data follow a normal distribution, and there is a low correlation between factors, so we do not have the problem of multicollinearity. Machine learning methods, such as Random Forest (RF), are used to predict cereal yield responses to climacteric and other variables. RF showed high accuracy compared to the other statistical models in the prediction of cereal yield. We found that changes in average temperature negatively affect cereal yield. The coefficients of fertilizer consumption, arable land, and land under cereal production are positively affecting production. Our results show that the RF method is an effective and versatile machine-learning method for cereal yield prediction compared to the other two methods: multiple linear regression and lasso regression method.

Keywords: Cereal yield, climate change, machine learning, multiple regression model, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 245
792 Clustering Mixed Data Using Non-normal Regression Tree for Process Monitoring

Authors: Youngji Yoo, Cheong-Sool Park, Jun Seok Kim, Young-Hak Lee, Sung-Shick Kim, Jun-Geol Baek

Abstract:

In the semiconductor manufacturing process, large amounts of data are collected from various sensors of multiple facilities. The collected data from sensors have several different characteristics due to variables such as types of products, former processes and recipes. In general, Statistical Quality Control (SQC) methods assume the normality of the data to detect out-of-control states of processes. Although the collected data have different characteristics, using the data as inputs of SQC will increase variations of data, require wide control limits, and decrease performance to detect outof- control. Therefore, it is necessary to separate similar data groups from mixed data for more accurate process control. In the paper, we propose a regression tree using split algorithm based on Pearson distribution to handle non-normal distribution in parametric method. The regression tree finds similar properties of data from different variables. The experiments using real semiconductor manufacturing process data show improved performance in fault detecting ability.

Keywords: Semiconductor, non-normal mixed process data, clustering, Statistical Quality Control (SQC), regression tree, Pearson distribution system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1778
791 Infrastructure Change Monitoring Using Multitemporal Multispectral Satellite Images

Authors: U. Datta

Abstract:

The main objective of this study is to find a suitable approach to monitor the land infrastructure growth over a period of time using multispectral satellite images. Bi-temporal change detection method is unable to indicate the continuous change occurring over a long period of time. To achieve this objective, the approach used here estimates a statistical model from series of multispectral image data over a long period of time, assuming there is no considerable change during that time period and then compare it with the multispectral image data obtained at a later time. The change is estimated pixel-wise. Statistical composite hypothesis technique is used for estimating pixel based change detection in a defined region. The generalized likelihood ratio test (GLRT) is used to detect the changed pixel from probabilistic estimated model of the corresponding pixel. The changed pixel is detected assuming that the images have been co-registered prior to estimation. To minimize error due to co-registration, 8-neighborhood pixels around the pixel under test are also considered. The multispectral images from Sentinel-2 and Landsat-8 from 2015 to 2018 are used for this purpose. There are different challenges in this method. First and foremost challenge is to get quite a large number of datasets for multivariate distribution modelling. A large number of images are always discarded due to cloud coverage. Due to imperfect modelling there will be high probability of false alarm. Overall conclusion that can be drawn from this work is that the probabilistic method described in this paper has given some promising results, which need to be pursued further.

Keywords: Co-registration, GLRT, infrastructure growth, multispectral, multitemporal, pixel-based change detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 726
790 Performance Comparison of Different Regression Methods for a Polymerization Process with Adaptive Sampling

Authors: Florin Leon, Silvia Curteanu

Abstract:

Developing complete mechanistic models for polymerization reactors is not easy, because complex reactions occur simultaneously; there is a large number of kinetic parameters involved and sometimes the chemical and physical phenomena for mixtures involving polymers are poorly understood. To overcome these difficulties, empirical models based on sampled data can be used instead, namely regression methods typical of machine learning field. They have the ability to learn the trends of a process without any knowledge about its particular physical and chemical laws. Therefore, they are useful for modeling complex processes, such as the free radical polymerization of methyl methacrylate achieved in a batch bulk process. The goal is to generate accurate predictions of monomer conversion, numerical average molecular weight and gravimetrical average molecular weight. This process is associated with non-linear gel and glass effects. For this purpose, an adaptive sampling technique is presented, which can select more samples around the regions where the values have a higher variation. Several machine learning methods are used for the modeling and their performance is compared: support vector machines, k-nearest neighbor, k-nearest neighbor and random forest, as well as an original algorithm, large margin nearest neighbor regression. The suggested method provides very good results compared to the other well-known regression algorithms.

Keywords: Adaptive sampling, batch bulk methyl methacrylate polymerization, large margin nearest neighbor regression, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1399
789 A Comparative Analysis of Machine Learning Techniques for PM10 Forecasting in Vilnius

Authors: M. A. S. Fahim, J. Sužiedelytė Visockienė

Abstract:

With the growing concern over air pollution (AP), it is clear that this has gained more prominence than ever before. The level of consciousness has increased and a sense of knowledge now has to be forwarded as a duty by those enlightened enough to disseminate it to others. This realization often comes after an understanding of how poor air quality indices (AQI) damage human health. The study focuses on assessing air pollution prediction models specifically for Lithuania, addressing a substantial need for empirical research within the region. Concentrating on Vilnius, it specifically examines particulate matter concentrations 10 micrometers or less in diameter (PM10). Utilizing Gaussian Process Regression (GPR) and Regression Tree Ensemble, and Regression Tree methodologies, predictive forecasting models are validated and tested using hourly data from January 2020 to December 2022. The study explores the classification of AP data into anthropogenic and natural sources, the impact of AP on human health, and its connection to cardiovascular diseases. The study revealed varying levels of accuracy among the models, with GPR achieving the highest accuracy, indicated by an RMSE of 4.14 in validation and 3.89 in testing.

Keywords: Air pollution, anthropogenic and natural sources, machine learning, Gaussian process regression, tree ensemble, forecasting models, particulate matter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 112
788 Rapid Study on Feature Extraction and Classification Models in Healthcare Applications

Authors: S. Sowmyayani

Abstract:

The advancement of computer-aided design helps the medical force and security force. Some applications include biometric recognition, elderly fall detection, face recognition, cancer recognition, tumor recognition, etc. This paper deals with different machine learning algorithms that are more generically used for any health care system. The most focused problems are classification and regression. With the rise of big data, machine learning has become particularly important for solving problems. Machine learning uses two types of techniques: supervised learning and unsupervised learning. The former trains a model on known input and output data and predicts future outputs. Classification and regression are supervised learning techniques. Unsupervised learning finds hidden patterns in input data. Clustering is one such unsupervised learning technique. The above-mentioned models are discussed briefly in this paper.

Keywords: Supervised learning, unsupervised learning, regression, neural network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 345
787 Methodologies for Crack Initiation in Welded Joints Applied to Inspection Planning

Authors: Guang Zou, Kian Banisoleiman, Arturo González

Abstract:

Crack initiation and propagation threatens structural integrity of welded joints and normally inspections are assigned based on crack propagation models. However, the approach based on crack propagation models may not be applicable for some high-quality welded joints, because the initial flaws in them may be so small that it may take long time for the flaws to develop into a detectable size. This raises a concern regarding the inspection planning of high-quality welded joins, as there is no generally acceptable approach for modeling the whole fatigue process that includes the crack initiation period. In order to address the issue, this paper reviews treatment methods for crack initiation period and initial crack size in crack propagation models applied to inspection planning. Generally, there are four approaches, by: 1) Neglecting the crack initiation period and fitting a probabilistic distribution for initial crack size based on statistical data; 2) Extrapolating the crack propagation stage to a very small fictitious initial crack size, so that the whole fatigue process can be modeled by crack propagation models; 3) Assuming a fixed detectable initial crack size and fitting a probabilistic distribution for crack initiation time based on specimen tests; and, 4) Modeling the crack initiation and propagation stage separately using small crack growth theories and Paris law or similar models. The conclusion is that in view of trade-off between accuracy and computation efforts, calibration of a small fictitious initial crack size to S-N curves is the most efficient approach.

Keywords: Crack initiation, fatigue reliability, inspection planning, welded joints.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1396
786 Probabilistic Damage Tolerance Methodology for Solid Fan Blades and Discs

Authors: Andrej Golowin, Viktor Denk, Axel Riepe

Abstract:

Solid fan blades and discs in aero engines are subjected to high combined low and high cycle fatigue loads especially around the contact areas between blade and disc. Therefore, special coatings (e.g. dry film lubricant) and surface treatments (e.g. shot peening or laser shock peening) are applied to increase the strength with respect to combined cyclic fatigue and fretting fatigue, but also to improve damage tolerance capability. The traditional deterministic damage tolerance assessment based on fracture mechanics analysis, which treats service damage as an initial crack, often gives overly conservative results especially in the presence of vibratory stresses. A probabilistic damage tolerance methodology using crack initiation data has been developed for fan discs exposed to relatively high vibratory stresses in cross- and tail-wind conditions at certain resonance speeds for limited time periods. This Monte-Carlo based method uses a damage databank from similar designs, measured vibration levels at typical aircraft operations and wind conditions and experimental crack initiation data derived from testing of artificially damaged specimens with representative surface treatment under combined fatigue conditions. The proposed methodology leads to a more realistic prediction of the minimum damage tolerance life for the most critical locations applicable to modern fan disc designs.

Keywords: Damage tolerance, Monte-Carlo method, fan blade and disc, laser shock peening.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1575