Search results for: nonparametric statistical model.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8207

Search results for: nonparametric statistical model.

8207 A Comparative Study of Additive and Nonparametric Regression Estimators and Variable Selection Procedures

Authors: Adriano Z. Zambom, Preethi Ravikumar

Abstract:

One of the biggest challenges in nonparametric regression is the curse of dimensionality. Additive models are known to overcome this problem by estimating only the individual additive effects of each covariate. However, if the model is misspecified, the accuracy of the estimator compared to the fully nonparametric one is unknown. In this work the efficiency of completely nonparametric regression estimators such as the Loess is compared to the estimators that assume additivity in several situations, including additive and non-additive regression scenarios. The comparison is done by computing the oracle mean square error of the estimators with regards to the true nonparametric regression function. Then, a backward elimination selection procedure based on the Akaike Information Criteria is proposed, which is computed from either the additive or the nonparametric model. Simulations show that if the additive model is misspecified, the percentage of time it fails to select important variables can be higher than that of the fully nonparametric approach. A dimension reduction step is included when nonparametric estimator cannot be computed due to the curse of dimensionality. Finally, the Boston housing dataset is analyzed using the proposed backward elimination procedure and the selected variables are identified.

Keywords: Additive models, local polynomial regression, residuals, mean square error, variable selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 978
8206 A Comparison of the Nonparametric Regression Models using Smoothing Spline and Kernel Regression

Authors: Dursun Aydin

Abstract:

This paper study about using of nonparametric models for Gross National Product data in Turkey and Stanford heart transplant data. It is discussed two nonparametric techniques called smoothing spline and kernel regression. The main goal is to compare the techniques used for prediction of the nonparametric regression models. According to the results of numerical studies, it is concluded that smoothing spline regression estimators are better than those of the kernel regression.

Keywords: Kernel regression, Nonparametric models, Prediction, Smoothing spline.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3057
8205 A Brief Study about Nonparametric Adherence Tests

Authors: Vinicius R. Domingues, Luan C. S. M. Ozelim

Abstract:

The statistical study has become indispensable for various fields of knowledge. Not any different, in Geotechnics the study of probabilistic and statistical methods has gained power considering its use in characterizing the uncertainties inherent in soil properties. One of the situations where engineers are constantly faced is the definition of a probability distribution that represents significantly the sampled data. To be able to discard bad distributions, goodness-of-fit tests are necessary. In this paper, three non-parametric goodness-of-fit tests are applied to a data set computationally generated to test the goodness-of-fit of them to a series of known distributions. It is shown that the use of normal distribution does not always provide satisfactory results regarding physical and behavioral representation of the modeled parameters.

Keywords: Kolmogorov-Smirnov, Anderson-Darling, Cramer-Von-Mises, Nonparametric adherence tests.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1791
8204 A Bathtub Curve from Nonparametric Model

Authors: Eduardo C. Guardia, Jose W. M. Lima, Afonso H. M. Santos

Abstract:

This paper presents a nonparametric method to obtain the hazard rate “Bathtub curve” for power system components. The model is a mixture of the three known phases of a component life, the decreasing failure rate (DFR), the constant failure rate (CFR) and the increasing failure rate (IFR) represented by three parametric Weibull models. The parameters are obtained from a simultaneous fitting process of the model to the Kernel nonparametric hazard rate curve. From the Weibull parameters and failure rate curves the useful lifetime and the characteristic lifetime were defined. To demonstrate the model the historic time-to-failure of distribution transformers were used as an example. The resulted “Bathtub curve” shows the failure rate for the equipment lifetime which can be applied in economic and replacement decision models.

Keywords: Bathtub curve, failure analysis, lifetime estimation, parameter estimation, Weibull distribution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2187
8203 Nonparametric Control Chart Using Density Weighted Support Vector Data Description

Authors: Myungraee Cha, Jun Seok Kim, Seung Hwan Park, Jun-Geol Baek

Abstract:

In manufacturing industries, development of measurement leads to increase the number of monitoring variables and eventually the importance of multivariate control comes to the fore. Statistical process control (SPC) is one of the most widely used as multivariate control chart. Nevertheless, SPC is restricted to apply in processes because its assumption of data as following specific distribution. Unfortunately, process data are composed by the mixture of several processes and it is hard to estimate as one certain distribution. To alternative conventional SPC, therefore, nonparametric control chart come into the picture because of the strength of nonparametric control chart, the absence of parameter estimation. SVDD based control chart is one of the nonparametric control charts having the advantage of flexible control boundary. However,basic concept of SVDD has been an oversight to the important of data characteristic, density distribution. Therefore, we proposed DW-SVDD (Density Weighted SVDD) to cover up the weakness of conventional SVDD. DW-SVDD makes a new attempt to consider dense of data as introducing the notion of density Weight. We extend as control chart using new proposed SVDD and a simulation study of various distributional data is conducted to demonstrate the improvement of performance.

Keywords: Density estimation, Multivariate control chart, Oneclass classification, Support vector data description (SVDD)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2069
8202 A Bootstrap's Reliability Measure on Tests of Hypotheses

Authors: Al Jefferson J. Pabelic, Dennis A. Tarepe

Abstract:

Bootstrapping has gained popularity in different tests of hypotheses as an alternative in using asymptotic distribution if one is not sure of the distribution of the test statistic under a null hypothesis. This method, in general, has two variants – the parametric and the nonparametric approaches. However, issues on reliability of this method always arise in many applications. This paper addresses the issue on reliability by establishing a reliability measure in terms of quantiles with respect to asymptotic distribution, when this is approximately correct. The test of hypotheses used is Ftest. The simulated results show that using nonparametric bootstrapping in F-test gives better reliability than parametric bootstrapping with relatively higher degrees of freedom.

Keywords: F-test, nonparametric bootstrapping, parametric bootstrapping, reliability measure, tests of hypotheses.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1647
8201 Efficient System for Speech Recognition using General Regression Neural Network

Authors: Abderrahmane Amrouche, Jean Michel Rouvaen

Abstract:

In this paper we present an efficient system for independent speaker speech recognition based on neural network approach. The proposed architecture comprises two phases: a preprocessing phase which consists in segmental normalization and features extraction and a classification phase which uses neural networks based on nonparametric density estimation namely the general regression neural network (GRNN). The relative performances of the proposed model are compared to the similar recognition systems based on the Multilayer Perceptron (MLP), the Recurrent Neural Network (RNN) and the well known Discrete Hidden Markov Model (HMM-VQ) that we have achieved also. Experimental results obtained with Arabic digits have shown that the use of nonparametric density estimation with an appropriate smoothing factor (spread) improves the generalization power of the neural network. The word error rate (WER) is reduced significantly over the baseline HMM method. GRNN computation is a successful alternative to the other neural network and DHMM.

Keywords: Speech Recognition, General Regression NeuralNetwork, Hidden Markov Model, Recurrent Neural Network, ArabicDigits.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2134
8200 CART Method for Modeling the Output Power of Copper Bromide Laser

Authors: Iliycho P. Iliev, Desislava S. Voynikova, Snezhana G. Gocheva-Ilieva

Abstract:

This paper examines the available experiment data for a copper bromide vapor laser (CuBr laser), emitting at two wavelengths - 510.6 and 578.2nm. Laser output power is estimated based on 10 independent input physical parameters. A classification and regression tree (CART) model is obtained which describes 97% of data. The resulting binary CART tree specifies which input parameters influence considerably each of the classification groups. This allows for a technical assessment that indicates which of these are the most significant for the manufacture and operation of the type of laser under consideration. The predicted values of the laser output power are also obtained depending on classification. This aids the design and development processes considerably.

Keywords: Classification and regression trees (CART), Copper Bromide laser (CuBr laser), laser generation, nonparametric statistical model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1777
8199 Comparison of Parametric and Nonparametric Techniques for Non-peak Traffic Forecasting

Authors: Yang Zhang, Yuncai Liu

Abstract:

Accurately predicting non-peak traffic is crucial to daily traffic for all forecasting models. In the paper, least squares support vector machines (LS-SVMs) are investigated to solve such a practical problem. It is the first time to apply the approach and analyze the forecast performance in the domain. For comparison purpose, two parametric and two non-parametric techniques are selected because of their effectiveness proved in past research. Having good generalization ability and guaranteeing global minima, LS-SVMs perform better than the others. Providing sufficient improvement in stability and robustness reveals that the approach is practically promising.

Keywords: Parametric and Nonparametric Techniques, Non-peak Traffic Forecasting

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2257
8198 Adaptive Nonparametric Approach for Guaranteed Real-Time Detection of Targeted Signals in Multichannel Monitoring Systems

Authors: Andrey V. Timofeev

Abstract:

An adaptive nonparametric method is proposed for stable real-time detection of seismoacoustic sources in multichannel C-OTDR systems with a significant number of channels. This method guarantees given upper boundaries for probabilities of Type I and Type II errors. Properties of the proposed method are rigorously proved. The results of practical applications of the proposed method in a real C-OTDR-system are presented in this report.

Keywords: Adaptive detection, change point, interval estimation, guaranteed detection, multichannel monitoring systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1836
8197 Wind Farm Power Performance Verification Using Non-Parametric Statistical Inference

Authors: M. Celeska, K. Najdenkoski, V. Dimchev, V. Stoilkov

Abstract:

Accurate determination of wind turbine performance is necessary for economic operation of a wind farm. At present, the procedure to carry out the power performance verification of wind turbines is based on a standard of the International Electrotechnical Commission (IEC). In this paper, nonparametric statistical inference is applied to designing a simple, inexpensive method of verifying the power performance of a wind turbine. A statistical test is explained, examined, and the adequacy is tested over real data. The methods use the information that is collected by the SCADA system (Supervisory Control and Data Acquisition) from the sensors embedded in the wind turbines in order to carry out the power performance verification of a wind farm. The study has used data on the monthly output of wind farm in the Republic of Macedonia, and the time measuring interval was from January 1, 2016, to December 31, 2016. At the end, it is concluded whether the power performance of a wind turbine differed significantly from what would be expected. The results of the implementation of the proposed methods showed that the power performance of the specific wind farm under assessment was acceptable.

Keywords: Canonical correlation analysis, power curve, power performance, wind energy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 966
8196 Developing a Statistical Model for Electromagnetic Environment for Mobile Wireless Networks

Authors: C. Temaneh Nyah

Abstract:

The analysis of electromagnetic environment using deterministic mathematical models is characterized by the impossibility of analyzing a large number of interacting network stations with a priori unknown parameters, and this is characteristic, for example, of mobile wireless communication networks. One of the tasks of the tools used in designing, planning and optimization of mobile wireless network is to carry out simulation of electromagnetic environment based on mathematical modelling methods, including computer experiment, and to estimate its effect on radio communication devices. This paper proposes the development of a statistical model of electromagnetic environment of a mobile wireless communication network by describing the parameters and factors affecting it including the propagation channel and their statistical models.

Keywords: Electromagnetic Environment, Statistical model, Wireless communication network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1860
8195 Orthogonal Regression for Nonparametric Estimation of Errors-in-Variables Models

Authors: Anastasiia Yu. Timofeeva

Abstract:

Two new algorithms for nonparametric estimation of errors-in-variables models are proposed. The first algorithm is based on penalized regression spline. The spline is represented as a piecewise-linear function and for each linear portion orthogonal regression is estimated. This algorithm is iterative. The second algorithm involves locally weighted regression estimation. When the independent variable is measured with error such estimation is a complex nonlinear optimization problem. The simulation results have shown the advantage of the second algorithm under the assumption that true smoothing parameters values are known. Nevertheless the use of some indexes of fit to smoothing parameters selection gives the similar results and has an oversmoothing effect.

Keywords: Grade point average, orthogonal regression, penalized regression spline, locally weighted regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2091
8194 Generation Expansion Planning Strategies on Power System: A Review

Authors: V. Phupha, T. Lantharthong, N. Rugthaicharoencheep

Abstract:

The problem of generation expansion planning (GEP) has been extensively studied for many years. This paper presents three topics in GEP as follow: statistical model, models for generation expansion, and expansion problem. In the topic of statistical model, the main stages of the statistical modeling are briefly explained. Some works on models for GEP are reviewed in the topic of models for generation expansion. Finally for the topic of expansion problem, the major issues in the development of a longterm expansion plan are summarized.

Keywords: Generation expansion planning, strategies, power system

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3173
8193 Statistical Models of Network Traffic

Authors: Barath Kumar, Oliver Niggemann, Juergen Jasperneite

Abstract:

Model-based approaches have been applied successfully to a wide range of tasks such as specification, simulation, testing, and diagnosis. But one bottleneck often prevents the introduction of these ideas: Manual modeling is a non-trivial, time-consuming task. Automatically deriving models by observing and analyzing running systems is one possible way to amend this bottleneck. To derive a model automatically, some a-priori knowledge about the model structure–i.e. about the system–must exist. Such a model formalism would be used as follows: (i) By observing the network traffic, a model of the long-term system behavior could be generated automatically, (ii) Test vectors can be generated from the model, (iii) While the system is running, the model could be used to diagnose non-normal system behavior. The main contribution of this paper is the introduction of a model formalism called 'probabilistic regression automaton' suitable for the tasks mentioned above.

Keywords: Model-based approach, Probabilistic regression automata, Statistical models and Timed automata.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1485
8192 Short-Term Electric Load Forecasting Using Multiple Gaussian Process Models

Authors: Tomohiro Hachino, Hitoshi Takata, Seiji Fukushima, Yasutaka Igarashi

Abstract:

This paper presents a Gaussian process model-based short-term electric load forecasting. The Gaussian process model is a nonparametric model and the output of the model has Gaussian distribution with mean and variance. The multiple Gaussian process models as every hour ahead predictors are used to forecast future electric load demands up to 24 hours ahead in accordance with the direct forecasting approach. The separable least-squares approach that combines the linear least-squares method and genetic algorithm is applied to train these Gaussian process models. Simulation results are shown to demonstrate the effectiveness of the proposed electric load forecasting.

Keywords: Direct method, electric load forecasting, Gaussian process model, genetic algorithm, separable least-squares method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1932
8191 Parametric and Nonparametric Analysis of Breast Cancer Treatments

Authors: Chunling Cong, Chris.P.Tsokos

Abstract:

The objective of the present research manuscript is to perform parametric, nonparametric, and decision tree analysis to evaluate two treatments that are being used for breast cancer patients. Our study is based on utilizing real data which was initially used in “Tamoxifen with or without breast irradiation in women of 50 years of age or older with early breast cancer" [1], and the data is supplied to us by N.A. Ibrahim “Decision tree for competing risks survival probability in breast cancer study" [2]. We agree upon certain aspects of our findings with the published results. However, in this manuscript, we focus on relapse time of breast cancer patients instead of survival time and parametric analysis instead of semi-parametric decision tree analysis is applied to provide more precise recommendations of effectiveness of the two treatments with respect to reoccurrence of breast cancer.

Keywords: decision tree, breast cancer treatments, parametricanalysis, non-parametric analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1990
8190 Gaussian Process Model Identification Using Artificial Bee Colony Algorithm and Its Application to Modeling of Power Systems

Authors: Tomohiro Hachino, Hitoshi Takata, Shigeru Nakayama, Ichiro Iimura, Seiji Fukushima, Yasutaka Igarashi

Abstract:

This paper presents a nonparametric identification of continuous-time nonlinear systems by using a Gaussian process (GP) model. The GP prior model is trained by artificial bee colony algorithm. The nonlinear function of the objective system is estimated as the predictive mean function of the GP, and the confidence measure of the estimated nonlinear function is given by the predictive covariance of the GP. The proposed identification method is applied to modeling of a simplified electric power system. Simulation results are shown to demonstrate the effectiveness of the proposed method.

Keywords: Artificial bee colony algorithm, Gaussian process model, identification, nonlinear system, electric power system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1521
8189 A Heuristic Statistical Model for Lifetime Distribution Analysis of Complicated Systems in the Reliability Centered Maintenance

Authors: Mojtaba Mahdavi, Mohamad Mahdavi, Maryam Yazdani

Abstract:

A heuristic conceptual model for to develop the Reliability Centered Maintenance (RCM), especially in preventive strategy, has been explored during this paper. In most real cases which complicity of system obligates high degree of reliability, this model proposes a more appropriate reliability function between life time distribution based and another which is based on relevant Extreme Value (EV) distribution. A statistical and mathematical approach is used to estimate and verify these two distribution functions. Then best one is chosen just among them, whichever is more reliable. A numeric Industrial case study will be reviewed to represent the concepts of this paper, more clearly.

Keywords: Lifetime distribution, Reliability, Estimation, Extreme value, Improving model, Series, Parallel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1434
8188 Simultaneous Term Structure Estimation of Hazard and Loss Given Default with a Statistical Model using Credit Rating and Financial Information

Authors: Tomohiro Ando, Satoshi Yamashita

Abstract:

The objective of this study is to propose a statistical modeling method which enables simultaneous term structure estimation of the risk-free interest rate, hazard and loss given default, incorporating the characteristics of the bond issuing company such as credit rating and financial information. A reduced form model is used for this purpose. Statistical techniques such as spline estimation and Bayesian information criterion are employed for parameter estimation and model selection. An empirical analysis is conducted using the information on the Japanese bond market data. Results of the empirical analysis confirm the usefulness of the proposed method.

Keywords: Empirical Bayes, Hazard term structure, Loss given default.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1622
8187 The Recreation Technique Model from the Perspective of Environmental Quality Elements

Authors: G. Gradinaru, S. Olteanu

Abstract:

The quality improvements of the environmental elements could increase the recreational opportunities in a certain area (destination). The technique of the need for recreation focuses on choosing certain destinations for recreational purposes. The basic exchange taken into consideration is the one between the satisfaction gained after staying in that area and the value expressed in money and time allocated. The number of tourists in the respective area, the duration of staying and the money spent including transportation provide information on how individuals rank the place or certain aspects of the area (such as the quality of the environmental elements). For the statistical analysis of the environmental benefits offered by an area through the need of recreation technique, the following stages are suggested: - characterization of the reference area based on the statistical variables considered; - estimation of the environmental benefit through comparing the reference area with other similar areas (having the same environmental characteristics), from the perspective of the statistical variables considered. The model compared in recreation technique faced with a series of difficulties which refers to the reference area and correct transformation of time in money.

Keywords: Comparison in recreation technique, the quality of the environmental elements, statistical analysis model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1046
8186 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class- Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

The problems arising from unbalanced data sets generally appear in real world applications. Due to unequal class distribution, many researchers have found that the performance of existing classifiers tends to be biased towards the majority class. The k-nearest neighbors’ nonparametric discriminant analysis is a method that was proposed for classifying unbalanced classes with good performance. In this study, the methods of discriminant analysis are of interest in investigating misclassification error rates for classimbalanced data of three diabetes risk groups. The purpose of this study was to compare the classification performance between parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification of class-imbalanced data of diabetes risk groups. Data from a project maintaining healthy conditions for 599 employees of a government hospital in Bangkok were obtained for the classification problem. The employees were divided into three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data including the variables of diabetes risk group, age, gender, blood glucose, and BMI were analyzed and bootstrapped for 50 and 100 samples, 599 observations per sample, for additional estimation of the misclassification error rate. Each data set was explored for the departure of multivariate normality and the equality of covariance matrices of the three risk groups. Both the original data and the bootstrap samples showed nonnormality and unequal covariance matrices. The parametric linear discriminant function, quadratic discriminant function, and the nonparametric k-nearest neighbors’ discriminant function were performed over 50 and 100 bootstrap samples and applied to the original data. Searching the optimal classification rule, the choices of prior probabilities were set up for both equal proportions (0.33: 0.33: 0.33) and unequal proportions of (0.90:0.05:0.05), (0.80: 0.10: 0.10) and (0.70, 0.15, 0.15). The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach when k=3 or k=4 and the defined prior probabilities of non-risk: risk: diabetic as 0.90: 0.05:0.05 or 0.80:0.10:0.10 gave the smallest error rate of misclassification. The k-nearest neighbors approach would be suggested for classifying a three-class-imbalanced data of diabetes risk groups.

Keywords: Bootstrap, diabetes risk groups, error rate, k-nearest neighbors.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1963
8185 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation

Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski

Abstract:

Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.

Keywords: Bootstrap, Edgeworth approximation, independent and Identical distributed, quantile.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 373
8184 Structural Parsing of Natural Language Text in Tamil Using Phrase Structure Hybrid Language Model

Authors: Selvam M, Natarajan. A M, Thangarajan R

Abstract:

Parsing is important in Linguistics and Natural Language Processing to understand the syntax and semantics of a natural language grammar. Parsing natural language text is challenging because of the problems like ambiguity and inefficiency. Also the interpretation of natural language text depends on context based techniques. A probabilistic component is essential to resolve ambiguity in both syntax and semantics thereby increasing accuracy and efficiency of the parser. Tamil language has some inherent features which are more challenging. In order to obtain the solutions, lexicalized and statistical approach is to be applied in the parsing with the aid of a language model. Statistical models mainly focus on semantics of the language which are suitable for large vocabulary tasks where as structural methods focus on syntax which models small vocabulary tasks. A statistical language model based on Trigram for Tamil language with medium vocabulary of 5000 words has been built. Though statistical parsing gives better performance through tri-gram probabilities and large vocabulary size, it has some disadvantages like focus on semantics rather than syntax, lack of support in free ordering of words and long term relationship. To overcome the disadvantages a structural component is to be incorporated in statistical language models which leads to the implementation of hybrid language models. This paper has attempted to build phrase structured hybrid language model which resolves above mentioned disadvantages. In the development of hybrid language model, new part of speech tag set for Tamil language has been developed with more than 500 tags which have the wider coverage. A phrase structured Treebank has been developed with 326 Tamil sentences which covers more than 5000 words. A hybrid language model has been trained with the phrase structured Treebank using immediate head parsing technique. Lexicalized and statistical parser which employs this hybrid language model and immediate head parsing technique gives better results than pure grammar and trigram based model.

Keywords: Hybrid Language Model, Immediate Head Parsing, Lexicalized and Statistical Parsing, Natural Language Processing, Parts of Speech, Probabilistic Context Free Grammar, Tamil Language, Tree Bank.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3588
8183 The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models

Authors: Jihye Jeon

Abstract:

This paper analyzes the conceptual framework of three statistical methods, multiple regression, path analysis, and structural equation models. When establishing research model of the statistical modeling of complex social phenomenon, it is important to know the strengths and limitations of three statistical models. This study explored the character, strength, and limitation of each modeling and suggested some strategies for accurate explaining or predicting the causal relationships among variables. Especially, on the studying of depression or mental health, the common mistakes of research modeling were discussed.

Keywords: Multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9090
8182 An Evolutionary Statistical Learning Theory

Authors: Sung-Hae Jun, Kyung-Whan Oh

Abstract:

Statistical learning theory was developed by Vapnik. It is a learning theory based on Vapnik-Chervonenkis dimension. It also has been used in learning models as good analytical tools. In general, a learning theory has had several problems. Some of them are local optima and over-fitting problems. As well, statistical learning theory has same problems because the kernel type, kernel parameters, and regularization constant C are determined subjectively by the art of researchers. So, we propose an evolutionary statistical learning theory to settle the problems of original statistical learning theory. Combining evolutionary computing into statistical learning theory, our theory is constructed. We verify improved performances of an evolutionary statistical learning theory using data sets from KDD cup.

Keywords: Evolutionary computing, Local optima, Over-fitting, Statistical learning theory

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1724
8181 GMDH Modeling Based on Polynomial Spline Estimation and Its Applications

Authors: LI qiu-min, TIAN yi-xiang, ZHANG gao-xun

Abstract:

GMDH algorithm can well describe the internal structure of objects. In the process of modeling, automatic screening of model structure and variables ensure the convergence rate.This paper studied a new GMDH model based on polynomial spline  stimation. The polynomial spline function was used to instead of the transfer function of GMDH to characterize the relationship between the input variables and output variables. It has proved that the algorithm has the optimal convergence rate under some conditions. The empirical results show that the algorithm can well forecast Consumer Price Index (CPI).

Keywords: spline, GMDH, nonparametric, bias, forecast.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2067
8180 The Sequential Estimation of the Seismoacoustic Source Energy in C-OTDR Monitoring Systems

Authors: Andrey V. Timofeev, Dmitry V. Egorov

Abstract:

The practical efficient approach is suggested for estimation of the seismoacoustic sources energy in C-OTDR monitoring systems. This approach is represents the sequential plan for confidence estimation both the seismoacoustic sources energy, as well the absorption coefficient of the soil. The sequential plan delivers the non-asymptotic guaranteed accuracy of obtained estimates in the form of non-asymptotic confidence regions with prescribed sizes. These confidence regions are valid for a finite sample size when the distributions of the observations are unknown. Thus, suggested estimates are non-asymptotic and nonparametric, and also these estimates guarantee the prescribed estimation accuracy in form of prior prescribed size of confidence regions, and prescribed confidence coefficient value.

Keywords: C-OTDR-system, guaranteed estimates, nonparametric estimation, sequential confidence estimation, multichannel monitoring systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2051
8179 A New Approach for Classifying Large Number of Mixed Variables

Authors: Hashibah Hamid

Abstract:

The issue of classifying objects into one of predefined groups when the measured variables are mixed with different types of variables has been part of interest among statisticians in many years. Some methods for dealing with such situation have been introduced that include parametric, semi-parametric and nonparametric approaches. This paper attempts to discuss on a problem in classifying a data when the number of measured mixed variables is larger than the size of the sample. A propose idea that integrates a dimensionality reduction technique via principal component analysis and a discriminant function based on the location model is discussed. The study aims in offering practitioners another potential tool in a classification problem that is possible to be considered when the observed variables are mixed and too large.

Keywords: classification, location model, mixed variables, principal component analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1506
8178 Evaluating Hourly Sulphur Dioxide and Ground Ozone Simulated with the Air Quality Model in Lima, Peru

Authors: Odón R. Sánchez-Ccoyllo, Elizabeth Ayma-Choque, Alan Llacza

Abstract:

Sulphur dioxide (SO₂) and surface-ozone (O₃) concentrations are associated with diseases. The objective of this research is to evaluate the effectiveness of the air-quality Weather Research and Forecasting model coupled to Chemistry (WRF-Chem) model with a horizontal resolution of 5 km x 5 km. For this purpose, the measurements of the hourly SO₂ and O₃ concentrations available in three air quality monitoring stations in Lima, Peru were used for the purpose of validating the simulations of the SO₂ and O₃ concentrations obtained with the WRF-Chem model in February 2018. For the quantitative evaluation of the simulations of these gases, statistical techniques were implemented, such as the average of the simulations; the average of the measurements; the Mean Bias (MeB); the Mean Error (MeE); and the Root Mean Square Error (RMSE). The results of these statistical metrics indicated that the simulated SO₂ and O₃ values over-predicted the SO₂ and O₃ measurements. For the SO₂ concentration, the MeB values varied from 0.58 to 26.35 µg/m³; the MeE values varied from 8.75 to 26.5 µg/m³; the RMSE values varied from 13.3 to 31.79 µg/m³; while for O₃ concentrations the statistical values of the MeB varied from 37.52 to 56.29 µg/m³; the MeE values varied from 37.54 to 56.70 µg/m³; the RMSE values varied from 43.05 to 69.56 µg/m³.

Keywords: Ground-ozone, Lima, Sulphur dioxide, WRF-Chem.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 289