Search results for: weibull regression distribution
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7846

Search results for: weibull regression distribution

7696 Confidence Envelopes for Parametric Model Selection Inference and Post-Model Selection Inference

Authors: I. M. L. Nadeesha Jayaweera, Adao Alex Trindade

Abstract:

In choosing a candidate model in likelihood-based modeling via an information criterion, the practitioner is often faced with the difficult task of deciding just how far up the ranked list to look. Motivated by this pragmatic necessity, we construct an uncertainty band for a generalized (model selection) information criterion (GIC), defined as a criterion for which the limit in probability is identical to that of the normalized log-likelihood. This includes common special cases such as AIC & BIC. The method starts from the asymptotic normality of the GIC for the joint distribution of the candidate models in an independent and identically distributed (IID) data framework and proceeds by deriving the (asymptotically) exact distribution of the minimum. The calculation of an upper quantile for its distribution then involves the computation of multivariate Gaussian integrals, which is amenable to efficient implementation via the R package "mvtnorm". The performance of the methodology is tested on simulated data by checking the coverage probability of nominal upper quantiles and compared to the bootstrap. Both methods give coverages close to nominal for large samples, but the bootstrap is two orders of magnitude slower. The methodology is subsequently extended to two other commonly used model structures: regression and time series. In the regression case, we derive the corresponding asymptotically exact distribution of the minimum GIC invoking Lindeberg-Feller type conditions for triangular arrays and are thus able to similarly calculate upper quantiles for its distribution via multivariate Gaussian integration. The bootstrap once again provides a default competing procedure, and we find that similar comparison performance metrics hold as for the IID case. The time series case is complicated by far more intricate asymptotic regime for the joint distribution of the model GIC statistics. Under a Gaussian likelihood, the default in most packages, one needs to derive the limiting distribution of a normalized quadratic form for a realization from a stationary series. Under conditions on the process satisfied by ARMA models, a multivariate normal limit is once again achieved. The bootstrap can, however, be employed for its computation, whence we are once again in the multivariate Gaussian integration paradigm for upper quantile evaluation. Comparisons of this bootstrap-aided semi-exact method with the full-blown bootstrap once again reveal a similar performance but faster computation speeds. One of the most difficult problems in contemporary statistical methodological research is to be able to account for the extra variability introduced by model selection uncertainty, the so-called post-model selection inference (PMSI). We explore ways in which the GIC uncertainty band can be inverted to make inferences on the parameters. This is being attempted in the IID case by pivoting the CDF of the asymptotically exact distribution of the minimum GIC. For inference one parameter at a time and a small number of candidate models, this works well, whence the attained PMSI confidence intervals are wider than the MLE-based Wald, as expected.

Keywords: model selection inference, generalized information criteria, post model selection, Asymptotic Theory

Procedia PDF Downloads 61
7695 A Distribution Free Test for Censored Matched Pairs

Authors: Ayman Baklizi

Abstract:

This paper discusses the problem of testing hypotheses about the lifetime distributions of a matched pair based on censored data. A distribution free test based on a runs statistic is proposed. Its null distribution and power function are found in a simple convenient form. Some properties of the test statistic and its power function are studied.

Keywords: censored data, distribution free, matched pair, runs statistics

Procedia PDF Downloads 255
7694 Arabic Character Recognition Using Regression Curves with the Expectation Maximization Algorithm

Authors: Abdullah A. AlShaher

Abstract:

In this paper, we demonstrate how regression curves can be used to recognize 2D non-rigid handwritten shapes. Each shape is represented by a set of non-overlapping uniformly distributed landmarks. The underlying models utilize 2nd order of polynomials to model shapes within a training set. To estimate the regression models, we need to extract the required coefficients which describe the variations for a set of shape class. Hence, a least square method is used to estimate such modes. We then proceed by training these coefficients using the apparatus Expectation Maximization algorithm. Recognition is carried out by finding the least error landmarks displacement with respect to the model curves. Handwritten isolated Arabic characters are used to evaluate our approach.

Keywords: character recognition, regression curves, handwritten Arabic letters, expectation maximization algorithm

Procedia PDF Downloads 115
7693 Reminiscence Therapy for Alzheimer’s Disease Restrained on Logistic Regression Based Linear Bootstrap Aggregating

Authors: P. S. Jagadeesh Kumar, Mingmin Pan, Xianpei Li, Yanmin Yuan, Tracy Lin Huan

Abstract:

Researchers are doing enchanting research into the inherited features of Alzheimer’s disease and probable consistent therapies. In Alzheimer’s, memories are extinct in reverse order; memories formed lately are more transitory than those from formerly. Reminiscence therapy includes the conversation of past actions, trials and knowledges with another individual or set of people, frequently with the help of perceptible reminders such as photos, household and other acquainted matters from the past, music and collection of tapes. In this manuscript, the competence of reminiscence therapy for Alzheimer’s disease is measured using logistic regression based linear bootstrap aggregating. Logistic regression is used to envisage the experiential features of the patient’s memory through various therapies. Linear bootstrap aggregating shows better stability and accuracy of reminiscence therapy used in statistical classification and regression of memories related to validation therapy, supportive psychotherapy, sensory integration and simulated presence therapy.

Keywords: Alzheimer’s disease, linear bootstrap aggregating, logistic regression, reminiscence therapy

Procedia PDF Downloads 273
7692 Predicting Survival in Cancer: How Cox Regression Model Compares to Artifial Neural Networks?

Authors: Dalia Rimawi, Walid Salameh, Amal Al-Omari, Hadeel AbdelKhaleq

Abstract:

Predication of Survival time of patients with cancer, is a core factor that influences oncologist decisions in different aspects; such as offered treatment plans, patients’ quality of life and medications development. For a long time proportional hazards Cox regression (ph. Cox) was and still the most well-known statistical method to predict survival outcome. But due to the revolution of data sciences; new predication models were employed and proved to be more flexible and provided higher accuracy in that type of studies. Artificial neural network is one of those models that is suitable to handle time to event predication. In this study we aim to compare ph Cox regression with artificial neural network method according to data handling and Accuracy of each model.

Keywords: Cox regression, neural networks, survival, cancer.

Procedia PDF Downloads 158
7691 Comparison between Some of Robust Regression Methods with OLS Method with Application

Authors: Sizar Abed Mohammed, Zahraa Ghazi Sadeeq

Abstract:

The use of the classic method, least squares (OLS) to estimate the linear regression parameters, when they are available assumptions, and capabilities that have good characteristics, such as impartiality, minimum variance, consistency, and so on. The development of alternative statistical techniques to estimate the parameters, when the data are contaminated with outliers. These are powerful methods (or resistance). In this paper, three of robust methods are studied, which are: Maximum likelihood type estimate M-estimator, Modified Maximum likelihood type estimate MM-estimator and Least Trimmed Squares LTS-estimator, and their results are compared with OLS method. These methods applied to real data taken from Duhok company for manufacturing furniture, the obtained results compared by using the criteria: Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE) and Mean Sum of Absolute Error (MSAE). Important conclusions that this study came up with are: a number of typical values detected by using four methods in the furniture line and very close to the data. This refers to the fact that close to the normal distribution of standard errors, but typical values in the doors line data, using OLS less than that detected by the powerful ways. This means that the standard errors of the distribution are far from normal departure. Another important conclusion is that the estimated values of the parameters by using the lifeline is very far from the estimated values using powerful methods for line doors, gave LTS- destined better results using standard MSE, and gave the M- estimator better results using standard MAPE. Moreover, we noticed that using standard MSAE, and MM- estimator is better. The programs S-plus (version 8.0, professional 2007), Minitab (version 13.2) and SPSS (version 17) are used to analyze the data.

Keywords: Robest, LTS, M estimate, MSE

Procedia PDF Downloads 208
7690 Machine Vision System for Measuring the Quality of Bulk Sun-dried Organic Raisins

Authors: Navab Karimi, Tohid Alizadeh

Abstract:

An intelligent vision-based system was designed to measure the quality and purity of raisins. A machine vision setup was utilized to capture the images of bulk raisins in ranges of 5-50% mixed pure-impure berries. The textural features of bulk raisins were extracted using Grey-level Histograms, Co-occurrence Matrix, and Local Binary Pattern (a total of 108 features). Genetic Algorithm and neural network regression were used for selecting and ranking the best features (21 features). As a result, the GLCM features set was found to have the highest accuracy (92.4%) among the other sets. Followingly, multiple feature combinations of the previous stage were fed into the second regression (linear regression) to increase accuracy, wherein a combination of 16 features was found to be the optimum. Finally, a Support Vector Machine (SVM) classifier was used to differentiate the mixtures, producing the best efficiency and accuracy of 96.2% and 97.35%, respectively.

Keywords: sun-dried organic raisin, genetic algorithm, feature extraction, ann regression, linear regression, support vector machine, south azerbaijan.

Procedia PDF Downloads 46
7689 Discrete State Prediction Algorithm Design with Self Performance Enhancement Capacity

Authors: Smail Tigani, Mohamed Ouzzif

Abstract:

This work presents a discrete quantitative state prediction algorithm with intelligent behavior making it able to self-improve some performance aspects. The specificity of this algorithm is the capacity of self-rectification of the prediction strategy before the final decision. The auto-rectification mechanism is based on two parallel mathematical models. In one hand, the algorithm predicts the next state based on event transition matrix updated after each observation. In the other hand, the algorithm extracts its residues trend with a linear regression representing historical residues data-points in order to rectify the first decision if needs. For a normal distribution, the interactivity between the two models allows the algorithm to self-optimize its performance and then make better prediction. Designed key performance indicator, computed during a Monte Carlo simulation, shows the advantages of the proposed approach compared with traditional one.

Keywords: discrete state, Markov Chains, linear regression, auto-adaptive systems, decision making, Monte Carlo Simulation

Procedia PDF Downloads 475
7688 An Application of Quantile Regression to Large-Scale Disaster Research

Authors: Katarzyna Wyka, Dana Sylvan, JoAnn Difede

Abstract:

Background and significance: The following disaster, population-based screening programs are routinely established to assess physical and psychological consequences of exposure. These data sets are highly skewed as only a small percentage of trauma-exposed individuals develop health issues. Commonly used statistical methodology in post-disaster mental health generally involves population-averaged models. Such models aim to capture the overall response to the disaster and its aftermath; however, they may not be sensitive enough to accommodate population heterogeneity in symptomatology, such as post-traumatic stress or depressive symptoms. Methods: We use an archival longitudinal data set from Weill-Cornell 9/11 Mental Health Screening Program established following the World Trade Center (WTC) terrorist attacks in New York in 2001. Participants are rescue and recovery workers who participated in the site cleanup and restoration (n=2960). The main outcome is the post-traumatic stress symptoms (PTSD) severity score assessed via clinician interviews (CAPS). For a detailed understanding of response to the disaster and its aftermath, we are adapting quantile regression methodology with particular focus on predictors of extreme distress and resilience to trauma. Results: The response variable was defined as the quantile of the CAPS score for each individual under two different scenarios specifying the unconditional quantiles based on: 1) clinically meaningful CAPS cutoff values and 2) CAPS distribution in the population. We present graphical summaries of the differential effects. For instance, we found that the effect of the WTC exposures, namely seeing bodies and feeling that life was in danger during rescue/recovery work was associated with very high PTSD symptoms. A similar effect was apparent in individuals with prior psychiatric history. Differential effects were also present for age and education level of the individuals. Conclusion: We evaluate the utility of quantile regression in disaster research in contrast to the commonly used population-averaged models. We focused on assessing the distribution of risk factors for post-traumatic stress symptoms across quantiles. This innovative approach provides a comprehensive understanding of the relationship between dependent and independent variables and could be used for developing tailored training programs and response plans for different vulnerability groups.

Keywords: disaster workers, post traumatic stress, PTSD, quantile regression

Procedia PDF Downloads 258
7687 Evaluation and Analysis of Light Emitting Diode Distribution in an Indoor Visible Light Communication

Authors: Olawale J. Olaluyi, Ayodele S. Oluwole, O. Akinsanmi, Johnson O. Adeogo

Abstract:

Communication using visible light VLC is considered a cutting-edge technology used for data transmission and illumination since it uses less energy than radio frequency (RF) technology and has a large bandwidth, extended lifespan, and high security. The room's irregular distribution of small base stations, or LED array distribution, is the cause of the obscured area, minimum signal-to-noise ratio (SNR), and received power. In order to maximize the received power distribution and SNR at the center of the room for an indoor VLC system, the researchers offer an innovative model for the placement of eight LED array distributions in this work. We have investigated the arrangement of the LED array distribution with regard to receiving power to fill the open space in the center of the room. The suggested LED array distribution saved 36.2% of the transmitted power, according to the simulation findings. Aside from that, the entire room was equally covered. This leads to an increase in both received power and SNR.

Keywords: visible light communication (VLC), light emitted diodes (LED), optical power distribution, signal-to-noise ratio (SNR).

Procedia PDF Downloads 47
7686 Assessment of the Spatio-Temporal Distribution of Pteridium aquilinum (Bracken Fern) Invasion on the Grassland Plateau in Nyika National Park

Authors: Andrew Kanzunguze, Lusayo Mwabumba, Jason K. Gilbertson, Dominic B. Gondwe, George Z. Nxumayo

Abstract:

Knowledge about the spatio-temporal distribution of invasive plants in protected areas provides a base from which hypotheses explaining proliferation of plant invasions can be made alongside development of relevant invasive plant monitoring programs. The aim of this study was to investigate the spatio-temporal distribution of bracken fern on the grassland plateau of Nyika National Park over the past 30 years (1986-2016) as well as to determine the current extent of the invasion. Remote sensing, machine learning, and statistical modelling techniques (object-based image analysis, image classification and linear regression analysis) in geographical information systems were used to determine both the spatial and temporal distribution of bracken fern in the study area. Results have revealed that bracken fern has been increasing coverage on the Nyika plateau at an estimated annual rate of 87.3 hectares since 1986. This translates to an estimated net increase of 2,573.1 hectares, which was recorded from 1,788.1 hectares (1986) to 4,361.9 hectares (2016). As of 2017 bracken fern covered 20,940.7 hectares, approximately 14.3% of the entire grassland plateau. Additionally, it was observed that the fern was distributed most densely around Chelinda camp (on the central plateau) as well as in forest verges and roadsides across the plateau. Based on these results it is recommended that Ecological Niche Modelling approaches be employed to (i) isolate the most important factors influencing bracken fern proliferation as well as (ii) identify and prioritize areas requiring immediate control interventions so as to minimize bracken fern proliferation in Nyika National Park.

Keywords: bracken fern, image classification, Landsat-8, Nyika National Park, spatio-temporal distribution

Procedia PDF Downloads 152
7685 A Multi Agent Based Protection Scheme for Smart Distribution Network in Presence of Distributed Energy Resources

Authors: M. R. Ebrahimi, B. Mahdaviani

Abstract:

Conventional electric distribution systems are radial in nature, supplied at one end through a main source. These networks generally have a simple protection system usually implemented using fuses, re-closers, and over-current relays. Recently, great attention has been paid to applying Distributed energy resources (DERs) throughout electric distribution systems. Presence of such generation in a network leads to losing coordination of protection devices. Therefore, it is desired to develop an algorithm which is capable of protecting distribution systems that include DER. On the other hand smart grid brings opportunities to the power system. Fast advancement in communication and measurement techniques accelerates the development of multi agent system (MAS). So in this paper, a new approach for the protection of distribution networks in the presence of DERs is presented base on MAS. The proposed scheme has been implemented on a sample 27-bus distribution network.

Keywords: distributed energy resource, distribution network, protection, smart grid, multi agent system

Procedia PDF Downloads 572
7684 Kinetic Study of Physical Quality Changes on Jumbo Squid (Dosidicus gigas) Slices during Application High-Pressure Impregnation

Authors: Mario Perez-Won, Roberto Lemus-Mondaca, Fernanda Marin, Constanza Olivares

Abstract:

This study presents the simultaneous application of high hydrostatic pressure (HHP) and osmotic dehydration of jumbo squid (Dosidicus gigas) slice. Diffusion coefficients for both components water and solids were improved by the process pressure, being influenced by pressure level. The working conditions were different pressures such as 100, 250, 400 MPa and pressure atmospheric (0.1 MPa) for time intervals from 30 to 300 seconds and a 15% NaCl concentration. The mathematical expressions used for mass transfer simulations both water and salt were those corresponding to Newton, Henderson and Pabis, Page and Weibull models, where the Weibull and Henderson-Pabis models presented the best fitted to the water and salt experimental data, respectively. The values for water diffusivity coefficients varied from 1.62 to 8.10x10⁻⁹ m²/s whereas that for salt varied among 14.18 to 36.07x10⁻⁹ m²/s for selected conditions. Finally, as to quality parameters studied under the range of experimental conditions studied, the treatment at 250 MPa yielded on the samples a minimum hardness, whereas springiness, cohesiveness and chewiness at 100, 250 and 400 MPa treatments presented statistical differences regarding to unpressurized samples. The colour parameters L* (lightness) increased, however, but b* (yellowish) and a* (reddish) parameters decreased when increasing pressure level. This way, samples presented a brighter aspect and a mildly cooked appearance. The results presented in this study can support the enormous potential of hydrostatic pressure application as a technique important for compounds impregnation under high pressure.

Keywords: colour, diffusivity, high pressure, jumbo squid, modelling, texture

Procedia PDF Downloads 315
7683 Nonparametric Quantile Regression for Multivariate Spatial Data

Authors: S. H. Arnaud Kanga, O. Hili, S. Dabo-Niang

Abstract:

Spatial prediction is an issue appealing and attracting several fields such as agriculture, environmental sciences, ecology, econometrics, and many others. Although multiple non-parametric prediction methods exist for spatial data, those are based on the conditional expectation. This paper took a different approach by examining a non-parametric spatial predictor of the conditional quantile. The study especially observes the stationary multidimensional spatial process over a rectangular domain. Indeed, the proposed quantile is obtained by inverting the conditional distribution function. Furthermore, the proposed estimator of the conditional distribution function depends on three kernels, where one of them controls the distance between spatial locations, while the other two control the distance between observations. In addition, the almost complete convergence and the convergence in mean order q of the kernel predictor are obtained when the sample considered is alpha-mixing. Such approach of the prediction method gives the advantage of accuracy as it overcomes sensitivity to extreme and outliers values.

Keywords: conditional quantile, kernel, nonparametric, stationary

Procedia PDF Downloads 125
7682 Reliability Analysis: A Case Study in Designing Power Distribution System of Tehran Oil Refinery

Authors: A. B. Arani, R. Shojaee

Abstract:

Electrical power distribution system is one of the vital infrastructures of an oil refinery, which requires wide area of study and planning before construction. In this paper, power distribution reliability of Tehran Refinery’s KHDS/GHDS unit has been taken into consideration to investigate the importance of these kinds of studies and evaluate the designed system. In this regard, the authors chose and evaluated different configurations of electrical power distribution along with the existing configuration with the aim of finding the most suited configuration which satisfies the conditions of minimum cost of electrical system construction, minimum cost imposed by loss of load, and maximum power system reliability.

Keywords: power distribution system, oil refinery, reliability, investment cost, interruption cost

Procedia PDF Downloads 843
7681 Regional Hydrological Extremes Frequency Analysis Based on Statistical and Hydrological Models

Authors: Hadush Kidane Meresa

Abstract:

The hydrological extremes frequency analysis is the foundation for the hydraulic engineering design, flood protection, drought management and water resources management and planning to utilize the available water resource to meet the desired objectives of different organizations and sectors in a country. This spatial variation of the statistical characteristics of the extreme flood and drought events are key practice for regional flood and drought analysis and mitigation management. For different hydro-climate of the regions, where the data set is short, scarcity, poor quality and insufficient, the regionalization methods are applied to transfer at-site data to a region. This study aims in regional high and low flow frequency analysis for Poland River Basins. Due to high frequent occurring of hydrological extremes in the region and rapid water resources development in this basin have caused serious concerns over the flood and drought magnitude and frequencies of the river in Poland. The magnitude and frequency result of high and low flows in the basin is needed for flood and drought planning, management and protection at present and future. Hydrological homogeneous high and low flow regions are formed by the cluster analysis of site characteristics, using the hierarchical and C- mean clustering and PCA method. Statistical tests for regional homogeneity are utilized, by Discordancy and Heterogeneity measure tests. In compliance with results of the tests, the region river basin has been divided into ten homogeneous regions. In this study, frequency analysis of high and low flows using AM for high flow and 7-day minimum low flow series is conducted using six statistical distributions. The use of L-moment and LL-moment method showed a homogeneous region over entire province with Generalized logistic (GLOG), Generalized extreme value (GEV), Pearson type III (P-III), Generalized Pareto (GPAR), Weibull (WEI) and Power (PR) distributions as the regional drought and flood frequency distributions. The 95% percentile and Flow duration curves of 1, 7, 10, 30 days have been plotted for 10 stations. However, the cluster analysis performed two regions in west and east of the province where L-moment and LL-moment method demonstrated the homogeneity of the regions and GLOG and Pearson Type III (PIII) distributions as regional frequency distributions for each region, respectively. The spatial variation and regional frequency distribution of flood and drought characteristics for 10 best catchment from the whole region was selected and beside the main variable (streamflow: high and low) we used variables which are more related to physiographic and drainage characteristics for identify and delineate homogeneous pools and to derive best regression models for ungauged sites. Those are mean annual rainfall, seasonal flow, average slope, NDVI, aspect, flow length, flow direction, maximum soil moisture, elevation, and drainage order. The regional high-flow or low-flow relationship among one streamflow characteristics with (AM or 7-day mean annual low flows) some basin characteristics is developed using Generalized Linear Mixed Model (GLMM) and Generalized Least Square (GLS) regression model, providing a simple and effective method for estimation of flood and drought of desired return periods for ungauged catchments.

Keywords: flood , drought, frequency, magnitude, regionalization, stochastic, ungauged, Poland

Procedia PDF Downloads 565
7680 Determining the Causality Variables in Female Genital Mutilation: A Factor Screening Approach

Authors: Ekele Alih, Enejo Jalija

Abstract:

Female Genital Mutilation (FGM) is made up of three types namely: Clitoridectomy, Excision and Infibulation. In this study, we examine the factors responsible for FGM in order to identify the causality variables in a logistic regression approach. From the result of the survey conducted by the Public Health Division, Nigeria Institute of Medical Research, Yaba, Lagos State, the tau statistic, τ was used to screen 9 factors that causes FGM in order to select few of the predictors before multiple regression equation is obtained. The need for this may be that the sample size may not be able to sustain having a regression with all the predictors or to avoid multi-collinearity. A total of 300 respondents, comprising 150 adult males and 150 adult females were selected for the household survey based on the multi-stage sampling procedure. The tau statistic,

Keywords: female genital mutilation, logistic regression, tau statistic, African society

Procedia PDF Downloads 227
7679 A Monte Carlo Fuzzy Logistic Regression Framework against Imbalance and Separation

Authors: Georgios Charizanos, Haydar Demirhan, Duygu Icen

Abstract:

Two of the most impactful issues in classical logistic regression are class imbalance and complete separation. These can result in model predictions heavily leaning towards the imbalanced class on the binary response variable or over-fitting issues. Fuzzy methodology offers key solutions for handling these problems. However, most studies propose the transformation of the binary responses into a continuous format limited within [0,1]. This is called the possibilistic approach within fuzzy logistic regression. Following this approach is more aligned with straightforward regression since a logit-link function is not utilized, and fuzzy probabilities are not generated. In contrast, we propose a method of fuzzifying binary response variables that allows for the use of the logit-link function; hence, a probabilistic fuzzy logistic regression model with the Monte Carlo method. The fuzzy probabilities are then classified by selecting a fuzzy threshold. Different combinations of fuzzy and crisp input, output, and coefficients are explored, aiming to understand which of these perform better under different conditions of imbalance and separation. We conduct numerical experiments using both synthetic and real datasets to demonstrate the performance of the fuzzy logistic regression framework against seven crisp machine learning methods. The proposed framework shows better performance irrespective of the degree of imbalance and presence of separation in the data, while the considered machine learning methods are significantly impacted.

Keywords: fuzzy logistic regression, fuzzy, logistic, machine learning

Procedia PDF Downloads 36
7678 Base Change for Fisher Metrics: Case of the q-Gaussian Inverse Distribution

Authors: Gabriel I. Loaiza Ossa, Carlos A. Cadavid Moreno, Juan C. Arango Parra

Abstract:

It is known that the Riemannian manifold determined by the family of inverse Gaussian distributions endowed with the Fisher metric has negative constant curvature κ= -1/2, as does the family of usual Gaussian distributions. In the present paper, firstly, we arrive at this result by following a different path, much simpler than the previous ones. We first put the family in exponential form, thus endowing the family with a new set of parameters, or coordinates, θ₁, θ₂; then we determine the matrix of the Fisher metric in terms of these parameters; and finally we compute this matrix in the original parameters. Secondly, we define the inverse q-Gaussian distribution family (q < 3) as the family obtained by replacing the usual exponential function with the Tsallis q-exponential function in the expression for the inverse Gaussian distribution and observe that it supports two possible geometries, the Fisher and the q-Fisher geometry. And finally, we apply our strategy to obtain results about the Fisher and q-Fisher geometry of the inverse q-Gaussian distribution family, similar to the ones obtained in the case of the inverse Gaussian distribution family.

Keywords: base of changes, information geometry, inverse Gaussian distribution, inverse q-Gaussian distribution, statistical manifolds

Procedia PDF Downloads 211
7677 Pattern Synthesis of Nonuniform Linear Arrays Including Mutual Coupling Effects Based on Gaussian Process Regression and Genetic Algorithm

Authors: Ming Su, Ziqiang Mu

Abstract:

This paper proposes a synthesis method for nonuniform linear antenna arrays that combine Gaussian process regression (GPR) and genetic algorithm (GA). In this method, the GPR model can be used to calculate the array radiation pattern in the presence of mutual coupling effects, and then the GA is used to optimize the excitations and locations of the elements so as to generate the desired radiation pattern. In this paper, taking a 9-element nonuniform linear array as an example and the desired radiation pattern corresponding to a Chebyshev distribution as the optimization objective, optimize the excitations and locations of the elements. Finally, the optimization results are verified by electromagnetic simulation software CST, which shows that the method is effective.

Keywords: nonuniform linear antenna arrays, GPR, GA, mutual coupling effects, active element pattern

Procedia PDF Downloads 81
7676 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco

Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui

Abstract:

The logistic regression (LR) and multivariate adaptive regression spline (MarSpline) are applied and verified for analysis of landslide susceptibility map in Oudka, Morocco, using geographical information system. From spatial database containing data such as landslide mapping, topography, soil, hydrology and lithology, the eight factors related to landslides such as elevation, slope, aspect, distance to streams, distance to road, distance to faults, lithology map and Normalized Difference Vegetation Index (NDVI) were calculated or extracted. Using these factors, landslide susceptibility indexes were calculated by the two mentioned methods. Before the calculation, this database was divided into two parts, the first for the formation of the model and the second for the validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The result of this verification was that the MarSpline model is the best model with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than the LR model (success rate AUC = 0.918, rate prediction AUC = 0.901).

Keywords: landslide susceptibility mapping, regression logistic, multivariate adaptive regression spline, Oudka, Taounate

Procedia PDF Downloads 160
7675 Influence of Harmonics on Medium Voltage Distribution System: A Case Study for Residential Area

Authors: O. Arikan, C. Kocatepe, G. Ucar, Y. Hacialiefendioglu

Abstract:

In this paper, influence of harmonics on medium voltage distribution system of Bogazici Electricity Distribution Inc. (BEDAS) which takes place at Istanbul/Turkey is investigated. A ring network consisting of residential loads is taken into account for this study. Real system parameters and measurement results are used for simulations. Also, probable working conditions of the system are analyzed for %50, %75 and %100 loading of transformers with similar harmonic contents. Results of the study are exhibited the influence of nonlinear loads on %THDV, P.F. and technical losses of the medium voltage distribution system.

Keywords: distribution system, harmonic, technical losses, power factor, total harmonic distortion, residential load, medium voltage

Procedia PDF Downloads 547
7674 Robust Variable Selection Based on Schwarz Information Criterion for Linear Regression Models

Authors: Shokrya Saleh A. Alshqaq, Abdullah Ali H. Ahmadini

Abstract:

The Schwarz information criterion (SIC) is a popular tool for selecting the best variables in regression datasets. However, SIC is defined using an unbounded estimator, namely, the least-squares (LS), which is highly sensitive to outlying observations, especially bad leverage points. A method for robust variable selection based on SIC for linear regression models is thus needed. This study investigates the robustness properties of SIC by deriving its influence function and proposes a robust SIC based on the MM-estimation scale. The aim of this study is to produce a criterion that can effectively select accurate models in the presence of vertical outliers and high leverage points. The advantages of the proposed robust SIC is demonstrated through a simulation study and an analysis of a real dataset.

Keywords: influence function, robust variable selection, robust regression, Schwarz information criterion

Procedia PDF Downloads 114
7673 Generalized Additive Model for Estimating Propensity Score

Authors: Tahmidul Islam

Abstract:

Propensity Score Matching (PSM) technique has been widely used for estimating causal effect of treatment in observational studies. One major step of implementing PSM is estimating the propensity score (PS). Logistic regression model with additive linear terms of covariates is most used technique in many studies. Logistics regression model is also used with cubic splines for retaining flexibility in the model. However, choosing the functional form of the logistic regression model has been a question since the effectiveness of PSM depends on how accurately the PS been estimated. In many situations, the linearity assumption of linear logistic regression may not hold and non-linear relation between the logit and the covariates may be appropriate. One can estimate PS using machine learning techniques such as random forest, neural network etc for more accuracy in non-linear situation. In this study, an attempt has been made to compare the efficacy of Generalized Additive Model (GAM) in various linear and non-linear settings and compare its performance with usual logistic regression. GAM is a non-parametric technique where functional form of the covariates can be unspecified and a flexible regression model can be fitted. In this study various simple and complex models have been considered for treatment under several situations (small/large sample, low/high number of treatment units) and examined which method leads to more covariate balance in the matched dataset. It is found that logistic regression model is impressively robust against inclusion quadratic and interaction terms and reduces mean difference in treatment and control set equally efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in both simple and complex models. The analysis also suggests that larger proportion of controls than treatment units leads to better balance for both of the methods.

Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching

Procedia PDF Downloads 332
7672 Forecast of the Small Wind Turbines Sales with Replacement Purchases and with or without Account of Price Changes

Authors: V. Churkin, M. Lopatin

Abstract:

The purpose of the paper is to estimate the US small wind turbines market potential and forecast the small wind turbines sales in the US. The forecasting method is based on the application of the Bass model and the generalized Bass model of innovations diffusion under replacement purchases. In the work an exponential distribution is used for modeling of replacement purchases. Only one parameter of such distribution is determined by average lifetime of small wind turbines. The identification of the model parameters is based on nonlinear regression analysis on the basis of the annual sales statistics which has been published by the American Wind Energy Association (AWEA) since 2001 up to 2012. The estimation of the US average market potential of small wind turbines (for adoption purchases) without account of price changes is 57080 (confidence interval from 49294 to 64866 at P = 0.95) under average lifetime of wind turbines 15 years, and 62402 (confidence interval from 54154 to 70648 at P = 0.95) under average lifetime of wind turbines 20 years. In the first case the explained variance is 90,7%, while in the second - 91,8%. The effect of the wind turbines price changes on their sales was estimated using generalized Bass model. This required a price forecast. To do this, the polynomial regression function, which is based on the Berkeley Lab statistics, was used. The estimation of the US average market potential of small wind turbines (for adoption purchases) in that case is 42542 (confidence interval from 32863 to 52221 at P = 0.95) under average lifetime of wind turbines 15 years, and 47426 (confidence interval from 36092 to 58760 at P = 0.95) under average lifetime of wind turbines 20 years. In the first case the explained variance is 95,3%, while in the second –95,3%.

Keywords: bass model, generalized bass model, replacement purchases, sales forecasting of innovations, statistics of sales of small wind turbines in the United States

Procedia PDF Downloads 324
7671 A Comparison of Neural Network and DOE-Regression Analysis for Predicting Resource Consumption of Manufacturing Processes

Authors: Frank Kuebler, Rolf Steinhilper

Abstract:

Artificial neural networks (ANN) as well as Design of Experiments (DOE) based regression analysis (RA) are mainly used for modeling of complex systems. Both methodologies are commonly applied in process and quality control of manufacturing processes. Due to the fact that resource efficiency has become a critical concern for manufacturing companies, these models needs to be extended to predict resource-consumption of manufacturing processes. This paper describes an approach to use neural networks as well as DOE based regression analysis for predicting resource consumption of manufacturing processes and gives a comparison of the achievable results based on an industrial case study of a turning process.

Keywords: artificial neural network, design of experiments, regression analysis, resource efficiency, manufacturing process

Procedia PDF Downloads 492
7670 Progressive Type-I Interval Censoring with Binomial Removal-Estimation and Its Properties

Authors: Sonal Budhiraja, Biswabrata Pradhan

Abstract:

This work considers statistical inference based on progressive Type-I interval censored data with random removal. The scheme of progressive Type-I interval censoring with random removal can be described as follows. Suppose n identical items are placed on a test at time T0 = 0 under k pre-fixed inspection times at pre-specified times T1 < T2 < . . . < Tk, where Tk is the scheduled termination time of the experiment. At inspection time Ti, Ri of the remaining surviving units Si, are randomly removed from the experiment. The removal follows a binomial distribution with parameters Si and pi for i = 1, . . . , k, with pk = 1. In this censoring scheme, the number of failures in different inspection intervals and the number of randomly removed items at pre-specified inspection times are observed. Asymptotic properties of the maximum likelihood estimators (MLEs) are established under some regularity conditions. A β-content γ-level tolerance interval (TI) is determined for two parameters Weibull lifetime model using the asymptotic properties of MLEs. The minimum sample size required to achieve the desired β-content γ-level TI is determined. The performance of the MLEs and TI is studied via simulation.

Keywords: asymptotic normality, consistency, regularity conditions, simulation study, tolerance interval

Procedia PDF Downloads 217
7669 Predicting Football Player Performance: Integrating Data Visualization and Machine Learning

Authors: Saahith M. S., Sivakami R.

Abstract:

In the realm of football analytics, particularly focusing on predicting football player performance, the ability to forecast player success accurately is of paramount importance for teams, managers, and fans. This study introduces an elaborate examination of predicting football player performance through the integration of data visualization methods and machine learning algorithms. The research entails the compilation of an extensive dataset comprising player attributes, conducting data preprocessing, feature selection, model selection, and model training to construct predictive models. The analysis within this study will involve delving into feature significance using methodologies like Select Best and Recursive Feature Elimination (RFE) to pinpoint pertinent attributes for predicting player performance. Various machine learning algorithms, including Random Forest, Decision Tree, Linear Regression, Support Vector Regression (SVR), and Artificial Neural Networks (ANN), will be explored to develop predictive models. The evaluation of each model's performance utilizing metrics such as Mean Squared Error (MSE) and R-squared will be executed to gauge their efficacy in predicting player performance. Furthermore, this investigation will encompass a top player analysis to recognize the top-performing players based on the anticipated overall performance scores. Nationality analysis will entail scrutinizing the player distribution based on nationality and investigating potential correlations between nationality and player performance. Positional analysis will concentrate on examining the player distribution across various positions and assessing the average performance of players in each position. Age analysis will evaluate the influence of age on player performance and identify any discernible trends or patterns associated with player age groups. The primary objective is to predict a football player's overall performance accurately based on their individual attributes, leveraging data-driven insights to enrich the comprehension of player success on the field. By amalgamating data visualization and machine learning methodologies, the aim is to furnish valuable tools for teams, managers, and fans to effectively analyze and forecast player performance. This research contributes to the progression of sports analytics by showcasing the potential of machine learning in predicting football player performance and offering actionable insights for diverse stakeholders in the football industry.

Keywords: football analytics, player performance prediction, data visualization, machine learning algorithms, random forest, decision tree, linear regression, support vector regression, artificial neural networks, model evaluation, top player analysis, nationality analysis, positional analysis

Procedia PDF Downloads 14
7668 Logistic Regression Model versus Additive Model for Recurrent Event Data

Authors: Entisar A. Elgmati

Abstract:

Recurrent infant diarrhea is studied using daily data collected in Salvador, Brazil over one year and three months. A logistic regression model is fitted instead of Aalen's additive model using the same covariates that were used in the analysis with the additive model. The model gives reasonably similar results to that using additive regression model. In addition, the problem with the estimated conditional probabilities not being constrained between zero and one in additive model is solved here. Also martingale residuals that have been used to judge the goodness of fit for the additive model are shown to be useful for judging the goodness of fit of the logistic model.

Keywords: additive model, cumulative probabilities, infant diarrhoea, recurrent event

Procedia PDF Downloads 605
7667 Paraoxonase 1 (PON 1) Arylesterase Activity and Apolipoprotein B: Predictors of Myocardial Infarction

Authors: Mukund Ramchandra Mogarekar, Pankaj Kumar, Shraddha Vilas More

Abstract:

Background: Myocardial infarction (MI) is defined as myocardial cell death due to prolonged ischemia as a consequence of atherosclerosis. TC, low-density lipoprotein cholesterol (LDL-C), very low-density lipoprotein cholesterol (VLDL-C), Apo B, and lipoprotein(a) was found as atherogenic factors while high-density lipoprotein cholesterol (HDL-C) was anti-atherogenic. Methods and Results: The study group consists of 40, MI subjects and 40 healthy individuals in control group. PON 1 Arylesterase activity (ARE) was measured by using phenylacetate. Phenotyping was done by double substrate method, serum AOPP by using chloramine T and Apo B by Turbidimetric immunoassay. PON 1 ARE activities were significantly lower (p< 0.05) and AOPPs & Apo B were higher in MI subjects (p> 0.05). Trimodal distribution of QQ, QR, and RR phenotypes of study population showed no significant difference among cases and controls (p> 0.05). Univariate binary logistic regression analysis showed independent association of TC, HDL, LDL, AOPP, Apo B, and PON 1 ARE activity with MI and multiple forward binary logistic regression showed PON 1 ARE activity and serum Apo B as an independent predictor of MI. Conclusions: Decrease in PON 1 ARE activity in MI subjects than in controls suggests increased oxidative stress in MI which is reflected by significantly increased AOPP and Apo B. PON1 polymorphism of QQ, QR and RR showed no significant difference in protection against MI. Univariate and multiple binary logistic regression showed PON1 ARE activity and serum Apo B as an independent predictor of MI.

Keywords: advanced oxidation protein product, apolipoprotein B, PON 1 arylesterase activity, myocardial infarction

Procedia PDF Downloads 235