Search results for: random intercepts model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 18230

18200 Different Sampling Schemes for Semi-Parametric Frailty Model

Authors: Nursel Koyuncu, Nihal Ata Tutkun

Abstract:

A frailty model is a survival model that accounts for unobserved heterogeneity when exploring the relationship between an individual's survival and several covariates. In recent years, proposed survival models have become more complex, which causes convergence problems, especially in large data sets. Selecting a sample from these large data sets is therefore very important for parameter estimation. In the sampling literature, some authors have defined new sampling schemes to estimate parameters accurately. To this end, we examine the effect of the sampling design on the semi-parametric frailty model. We conducted a simulation study in R to estimate the parameters of the semi-parametric frailty model for different sample sizes and censoring rates under classical simple random sampling and ranked set sampling schemes. In the simulation study, we used as the population a data set recording 17260 male civil servants aged 40–64 years with complete 10-year follow-up. Time to death from coronary heart disease is treated as the survival time, and age and systolic blood pressure are used as covariates. We select 1000 samples from the population using the different sampling schemes and estimate the parameters. From the simulation study, we conclude that the ranked set sampling design performs better than simple random sampling in every scenario.
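
The comparison of sampling designs described in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation: no frailty model is fitted, the population is synthetic, and the ranking variable (blood pressure), the set size of 3, and the names `ranked_set_sample`, `srs_var` and `rss_var` are assumptions made only for illustration. It compares the variance of a sample mean under simple random sampling and balanced ranked set sampling.

```python
import random

def ranked_set_sample(population, rank_key, set_size, cycles, rng):
    """Draw a balanced ranked set sample.

    In each cycle, `set_size` independent sets of `set_size` units are drawn;
    the i-th set is ranked by `rank_key` and only its i-th smallest unit is kept.
    """
    sample = []
    for _ in range(cycles):
        for i in range(set_size):
            candidates = rng.sample(population, set_size)
            candidates.sort(key=rank_key)
            sample.append(candidates[i])
    return sample

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)
# Synthetic stand-in population (assumption): (age, systolic blood pressure).
population = [(rng.uniform(40, 64), rng.gauss(135, 20)) for _ in range(17260)]

n = 30  # set_size 3 x 10 cycles, so both designs measure 30 units
srs_means, rss_means = [], []
for _ in range(1000):
    srs = rng.sample(population, n)
    rss = ranked_set_sample(population, lambda u: u[1], set_size=3, cycles=10, rng=rng)
    srs_means.append(mean([u[1] for u in srs]))
    rss_means.append(mean([u[1] for u in rss]))

srs_var = variance(srs_means)
rss_var = variance(rss_means)
print("variance of the mean under SRS:", srs_var)
print("variance of the mean under RSS:", rss_var)
```

Ranking here uses the measured variable itself (perfect ranking), which is the most favourable case for ranked set sampling; ranking on an imperfect concomitant variable would narrow the gap.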

Keywords: frailty model, ranked set sampling, efficiency, simple random sampling

18199 On the Use of Analytical Performance Models to Design a High-Performance Active Queue Management Scheme

Authors: Shahram Jamali, Samira Hamed

Abstract:

One of the open issues in the Random Early Detection (RED) algorithm is how to set its parameters to achieve high performance under the dynamic conditions of the network. Although the original RED uses fixed values for its parameters, this paper follows a model-based approach to improve the performance of the RED algorithm. It models the router's queue behavior with a Markov model and uses this model to predict future conditions of the queue. This prediction helps the proposed algorithm tune RED's parameters and provide better efficiency and performance. Extensive packet-level simulations confirm that the proposed algorithm, called Markov-RED, outperforms RED and FARED in terms of queue stability, bottleneck utilization and dropped packet count.
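
For readers unfamiliar with the fixed parameters this abstract refers to, the following is a minimal sketch of the classic RED drop rule, not of the Markov-RED scheme. The default values for `weight`, `min_th`, `max_th` and `max_p` are illustrative assumptions, and the count-based correction of the drop probability used in the original algorithm is omitted for brevity.

```python
def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the instantaneous queue length."""
    return (1.0 - weight) * avg + weight * queue_len

def red_drop_probability(avg, min_th=5.0, max_th=15.0, max_p=0.1):
    """Classic RED: no drops below min_th, certain drop at or above max_th,
    and a linear ramp up to max_p in between."""
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)

# Feed a constant backlog of 12 packets and watch the average catch up.
avg = 0.0
for _ in range(2000):
    avg = update_average(avg, queue_len=12)
print(avg, red_drop_probability(avg))
```

The four constants above are exactly the kind of fixed settings that a model-based scheme would adjust as network conditions change.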

Keywords: active queue management, RED, Markov model, random early detection algorithm

18198 Modeling Of The Random Impingement Erosion Due To The Impact Of The Solid Particles

Authors: Siamack A. Shirazi, Farzin Darihaki

Abstract:

Solid particles can be found in many multiphase flows, including those in transport pipelines and pipe fittings. Such particles interact with the pipe material and cause erosion, which threatens the integrity of the system. Therefore, predicting the erosion rate is an important factor in the design and monitoring of such systems. Mechanistic models can provide reliable predictions for many conditions while demanding only relatively low computational cost. Mechanistic models utilize a representative particle trajectory to predict the impact characteristics of the majority of the particle impacts that cause the maximum erosion rate in the domain. The erosion caused by particle impacts is due not only to direct impacts but also to random impingements. In the present study, an alternative model is introduced to describe the erosion due to random impingement of particles. The present model provides a realistic trend for erosion with changes in the particle size and particle Stokes number. The present model is examined against experimental data and CFD simulation results and shows better agreement with the data in comparison to the available models in the literature.

Keywords: erosion, mechanistic modeling, particles, multiphase flow, gas-liquid-solid

18197 Global Direct Search Optimization of a Tuned Liquid Column Damper Subject to Stochastic Load

Authors: Mansour H. Alkmim, Adriano T. Fabro, Marcus V. G. De Morais

Abstract:

In this paper, a global direct search optimization algorithm to reduce vibration of a tuned liquid column damper (TLCD), a class of passive structural control device, is presented. The objective is to find optimized parameters for the TLCD under stochastic load from different wind power spectral density. A verification is made considering the analytical solution of an undamped primary system under white noise excitation. Finally, a numerical example considering a simplified wind turbine model is given to illustrate the efficacy of the TLCD. Results from the random vibration analysis are shown for four types of random excitation wind model where the response PSDs obtained showed good vibration attenuation.

Keywords: generalized pattern search, parameter optimization, random vibration analysis, vibration suppression

18196 Fast Bayesian Inference of Multivariate Block-Nearest Neighbor Gaussian Process (NNGP) Models for Large Data

Authors: Carlos Gonzales, Zaida Quiroz, Marcos Prates

Abstract:

Several spatial variables collected at the same location that share a common spatial distribution can be modeled simultaneously through a multivariate geostatistical model that takes into account the correlation between these variables and the spatial autocorrelation. The main goal of this model is to perform spatial prediction of these variables in the region of study. Here we focus on a geostatistical multivariate formulation that relies on sharing common spatial random effect terms. In particular, the first response variable can be modeled by a mean that incorporates a shared random spatial effect, while the other response variables depend on this shared spatial term in addition to specific random spatial effects. Each spatial random effect is defined through a Gaussian process with a valid covariance function, but to improve computational efficiency when the data are large, each Gaussian process is approximated by a Gaussian random Markov field (GRMF), specifically the block nearest neighbor Gaussian process (Block-NNGP). This approach involves dividing the spatial domain into several dependent blocks under certain constraints, where the cross blocks capture the spatial dependence on a large scale, while each individual block captures the spatial dependence on a smaller scale. The multivariate geostatistical model belongs to the class of Latent Gaussian Models; thus, to achieve fast Bayesian inference, the integrated nested Laplace approximation (INLA) method is used. The good performance of the proposed model is shown through simulations and applications to massive data.

Keywords: Block-NNGP, geostatistics, Gaussian process, GRMF, INLA, multivariate models

18195 Optimizing Skill Development in Golf Putting: An Investigation of Blocked, Random, and Increasing Practice Schedules

Authors: John White

Abstract:

This study investigated the effects of practice schedules on learning and performance in golf putting, specifically focusing on the impact of increasing contextual interference (CI). University students (n=7) were randomly assigned to blocked, random, or increasing practice schedules. During acquisition, participants performed 135 putting trials using different weighted golf balls. The blocked group followed a specific sequence of ball weights, while the random group practiced with the balls in a random order. The increasing group started with a blocked schedule, transitioned to a serial schedule, and concluded with a random schedule. Retention and transfer tests were conducted 24 hours later. The results indicated that high levels of CI (random practice) were more beneficial for learning than low levels of CI (blocked practice). The increasing practice schedule, incorporating blocked, serial, and random practice, demonstrated advantages over traditional blocked and random schedules. Additionally, EEG was used to explore the neurophysiological effects of the increasing practice schedule.

Keywords: skill acquisition, motor control, learning, contextual interference

18194 Random Vertical Seismic Vibrations of the Long Span Cantilever Beams

Authors: Sergo Esadze

Abstract:

Seismic resistance norms require that cantilevers be calculated for the vertical components of the base seismic acceleration. Long span cantilevers, as a rule, must be calculated as a separate construction element. Depending on the architectural-planning solution, functional purpose and environmental conditions of the building or structure being designed, long span cantilever constructions may be of very different types, both by main bearing element (beam, truss, slab) and by material (reinforced concrete, steel). The choice among these is always linked to the bearing construction system of the building. Research on the vertical seismic vibration of these constructions requires an individual approach for each (which is not specified in the norms), in correlation with the model of the seismic load. The latter may be given either as a deterministic load or as a random process. A loading model as a random process is more adequate for this problem. In the presented paper, two types of long span (from 6 m up to 12 m) reinforced concrete cantilever beams are considered: a) the bearing elements of the cantilevers, i.e., the elements in which they are fixed, have large cross-sections and the cantilevers are made with a haunch; b) a cantilever beam with a load-bearing rod element. Calculation models are suggested separately for types a) and b). They are presented as systems with a finite number of degrees of freedom (concentrated masses). The conditions for fixing the ends correspond to their types. Vertical acceleration and the vertical component of the angular acceleration act on the masses. The model is based on the assumption of translational-rotational motion of the building in the vertical plane, caused by vertical seismic acceleration. Seismic accelerations are considered as random processes and represented as the product of a deterministic envelope function and a stationary random process. The problem is solved within the framework of the correlation theory of random processes. Solved numerical examples are given. The method is effective for solving the specific problems.

Keywords: cantilever, random process, seismic load, vertical acceleration

18193 Determining Optimal Number of Trees in Random Forests

Authors: Songul Cinaroglu

Abstract:

Background: Random Forest is an efficient, multi-class machine learning method used for classification, regression and other tasks. The method operates by constructing each tree from a different bootstrap sample of the data. Determining the number of trees in random forests is an open question in the literature on improving the classification performance of random forests. Aim: The aim of this study is to analyze whether there is an optimal number of trees in Random Forests and how the performance of Random Forests differs as the number of trees increases, using sample health data sets in R. Method: In this study we analyzed the performance of Random Forests as the number of trees grows, doubling the number of trees at every iteration, using the “random forest” package in R. To determine the minimum and optimal number of trees we used the McNemar test and the Area Under the ROC Curve, respectively. Results: The analysis found that a growing number of trees does not always mean that the forest performs better than forests with fewer trees. In other words, a larger number of trees only increases computational cost without improving performance. Conclusion: Although the general practice with random forests is to generate a large number of trees to obtain high performance, this study shows that increasing the number of trees doesn’t always improve performance. Future studies can compare different kinds of data sets and different performance measures to test whether Random Forest performance changes as the number of trees increases.

Keywords: classification methods, decision trees, number of trees, random forest

18192 Three-Stage Multivariate Stratified Sample Surveys with Probabilistic Cost Constraint and Random Variance

Authors: Sanam Haseen, Abdul Bari

Abstract:

In this paper a three stage multivariate programming problem with random survey cost and variances as random variables has been formulated as a non-linear stochastic programming problem. The problem has been converted into an equivalent deterministic form using chance constraint programming and modified E-modeling. An empirical study of the problem has been done at the end of the paper using R-simulation.

Keywords: chance constraint programming, modified E-model, stochastic programming, stratified sample surveys, three stage sample surveys

18191 Optimization of Machine Learning Regression Results: An Application on Health Expenditures

Authors: Songul Cinaroglu

Abstract:

Machine learning regression methods are recommended as an alternative to classical regression methods when variables are difficult to model. Health expenditure data are typically non-normal and have a heavily skewed distribution. This study aims to compare machine learning regression methods under hyperparameter tuning for predicting health expenditure per capita. A multiple regression model was fitted, and the performance results of Lasso Regression, Random Forest Regression and Support Vector Machine Regression were recorded under different hyperparameter settings. The lambda (λ) value for Lasso Regression, the number of trees for Random Forest Regression, and the epsilon (ε) value for Support Vector Regression were chosen as hyperparameters. Results obtained using k-fold cross validation, with k varied from 5 to 50, indicate that the differences between machine learning regression results in terms of R², RMSE and MAE values are statistically significant (p < 0.001). Study results reveal that Random Forest Regression (R² > 0.7500, RMSE ≤ 0.6000 and MAE ≤ 0.4000) outperforms the other machine learning regression methods. It is highly advisable to use machine learning regression methods for modelling health expenditures.

Keywords: machine learning, lasso regression, random forest regression, support vector regression, hyperparameter tuning, health expenditure

18190 [Keynote Talk]: Existence of Random Fixed Point Theorem for Contractive Mappings

Authors: D. S. Palimkar

Abstract:

Random fixed point theory has received much attention in recent years, and it is needed for the study of various classes of random equations. The study of random fixed point theorems was initiated by the Prague school of probabilists in the 1950s. The existence and uniqueness of fixed points for self-maps of a metric space, obtained by altering distances between points with the use of a control function, is an interesting aspect of classical fixed point theory. A new category of fixed point problems for a single self-map was introduced with the help of a control function that alters the distance between two points in a metric space, called an altering distance function. In this paper, we prove the existence and uniqueness of a random common fixed point for a pair of random mappings under a weakly contractive condition with a generalized altering distance function in Polish spaces, using the Random Common Fixed Point Theorem for Generalized Weakly Contractions.

Keywords: Polish space, random common fixed point theorem, weakly contractive mapping, altering function

18189 Joint Modeling of Bottle Use, Daily Milk Intake from Bottles, and Daily Energy Intake in Toddlers

Authors: Yungtai Lo

Abstract:

The current study follows an educational intervention on bottle-weaning to simultaneously evaluate the effect of the intervention on reducing bottle use, daily milk intake from bottles, and daily energy intake in toddlers aged 11 to 13 months. A shared parameter model and a random effects model are used to jointly model bottle use, daily milk intake from bottles, and daily energy intake. We show in the two joint models that the bottle-weaning intervention promotes bottle-weaning and reduces both daily milk intake from bottles in toddlers not off bottles and daily energy intake. We also show that the odds of drinking from a bottle were positively associated with the amount of milk intake from bottles, and that increased daily milk intake from bottles was associated with increased daily energy intake. The effect of bottle use on daily energy intake operates through its effect on increasing daily milk intake from bottles, which in turn increases daily energy intake.

Keywords: two-part model, semi-continuous variable, joint model, gamma regression, shared parameter model, random effects model

18188 On Four Models of a Three Server Queue with Optional Server Vacations

Authors: Kailash C. Madan

Abstract:

We study four models of a three-server queueing system with Bernoulli schedule optional server vacations. Customers arriving at the system one by one in a Poisson process are provided identical exponential service by three parallel servers according to a first-come, first-served queue discipline. In model A, all three servers may be allowed a vacation at one time; in model B, at most two of the three servers may be allowed a vacation at one time; in model C, at most one server is allowed a vacation; and in model D, no server is allowed a vacation. We study the steady-state behavior of the four models and obtain steady-state probability generating functions for the queue size at a random point of time for all states of the system. In model D, a known result for a three-server queueing system without server vacations is derived.
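
The no-vacation case (model D) corresponds to the standard M/M/c queue with c = 3, whose steady-state distribution is given by the textbook Erlang formulas. The sketch below computes it; the arrival rate, the service rate and the function names are illustrative assumptions, and the vacation models A to C are not covered.

```python
from math import factorial

def mmc_steady_state(arrival_rate, service_rate, servers, max_n):
    """Steady-state probabilities p_0..p_max_n of an M/M/c queue.

    Requires utilisation rho = arrival_rate / (servers * service_rate) < 1.
    """
    a = arrival_rate / service_rate   # offered load
    rho = a / servers                 # utilisation per server
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilisation must be below 1")
    p0 = 1.0 / (
        sum(a ** n / factorial(n) for n in range(servers))
        + a ** servers / (factorial(servers) * (1.0 - rho))
    )
    probs = []
    for n in range(max_n + 1):
        if n < servers:
            probs.append(p0 * a ** n / factorial(n))
        else:
            probs.append(p0 * a ** servers / factorial(servers) * rho ** (n - servers))
    return probs

def mean_queue_length(arrival_rate, service_rate, servers):
    """Expected number of customers waiting (not in service)."""
    a = arrival_rate / service_rate
    rho = a / servers
    p0 = mmc_steady_state(arrival_rate, service_rate, servers, 0)[0]
    return p0 * a ** servers * rho / (factorial(servers) * (1.0 - rho) ** 2)

probs = mmc_steady_state(arrival_rate=2.0, service_rate=1.0, servers=3, max_n=200)
lq = mean_queue_length(2.0, 1.0, 3)
print(probs[0], lq)
```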

Keywords: a three server queue, Bernoulli schedule server vacations, queue size distribution at a random epoch, steady state

18187 A Sequential Approach for Random-Effects Meta-Analysis

Authors: Samson Henry Dogo, Allan Clark, Elena Kulinskaya

Abstract:

The objective of meta-analysis is to combine results from several independent studies in order to generalize and to provide an evidence base for decision making. However, recent studies show that the magnitude of effect size estimates reported in many areas of research changes with the year of publication, and this can impair the results and conclusions of a meta-analysis. A number of sequential methods have been proposed for monitoring the effect size estimates in meta-analysis. However, they are based on statistical theory applicable to the fixed effect model (FEM). For the random-effects model (REM), the analysis incorporates the heterogeneity variance, tau-squared, and its estimation creates complications. This paper proposes the use of the Gombay and Serbian (2005) truncated CUSUM-type test with asymptotically valid critical values for sequential monitoring of the REM. Simulation results show that the test does not control the Type I error well and is not recommended. Further work is required to derive an appropriate test in this important area of application.
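
To make the role of the heterogeneity variance concrete, here is a minimal sketch of a random-effects pooled estimate using the DerSimonian-Laird moment estimator of tau-squared. The abstract does not say which estimator the authors use, so this choice is an assumption, and the sequential CUSUM-type test itself is not implemented. The input values are made up for illustration.

```python
def dersimonian_laird(effects, variances):
    """Return (tau_squared, pooled_effect) for a random-effects meta-analysis.

    effects:   per-study effect size estimates
    variances: per-study within-study variances
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return tau2, pooled

tau2, pooled = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
print(tau2, pooled)
```

A sequential analysis would repeat such an estimate each time a new study is added, which is where the instability of tau-squared for a small number of studies becomes a complication.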

Keywords: meta-analysis, random-effects model, sequential test, temporal changes in effect sizes

18186 An Information Matrix Goodness-of-Fit Test of the Conditional Logistic Model for Matched Case-Control Studies

Authors: Li-Ching Chen

Abstract:

The case-control design has been widely applied in clinical and epidemiological studies to investigate the association between risk factors and a given disease. The retrospective design can be easily implemented and is more economical than prospective studies. To adjust effects for confounding factors, methods such as stratification at the design stage may be adopted. When some major confounding factors are difficult to quantify, a matching design provides an opportunity for researchers to control the confounding effects. The matching effects can be parameterized by the intercepts of logistic models, and conditional logistic regression analysis is then adopted. This study presents an information-matrix-based goodness-of-fit statistic to test the validity of the logistic regression model for matched case-control data. The asymptotic null distribution of the proposed test statistic is derived. The test requires neither simulation to evaluate its critical values nor partitioning of the covariate space. The asymptotic power of the test statistic is also derived. The performance of the proposed method is assessed through simulation studies. A real data set is also used to illustrate the implementation of the proposed method.

Keywords: conditional logistic model, goodness-of-fit, information matrix, matched case-control studies

18185 Anisotropic Total Fractional Order Variation Model in Seismic Data Denoising

Authors: Jianwei Ma, Diriba Gemechu

Abstract:

In seismic data processing, attenuation of random noise is the basic step for improving data quality before seismic data are applied further in exploration and development in the oil and gas industries. The signal-to-noise ratio largely determines the quality of seismic data. This factor affects the reliability as well as the accuracy of the seismic signal during interpretation for different purposes in different companies. To use seismic data for further application and interpretation, we need to improve the signal-to-noise ratio while attenuating random noise effectively. To improve the signal-to-noise ratio and attenuate seismic random noise while preserving important features and information about seismic signals, we introduce an anisotropic total fractional order denoising algorithm. The anisotropic total fractional order variation model, defined in the space of fractional order bounded variation, is proposed as a regularization in seismic denoising. The split Bregman algorithm is employed to solve the minimization problem of the anisotropic total fractional order variation model, and the corresponding denoising algorithm for the proposed method is derived. We test the effectiveness of the proposed method on synthetic and real seismic data sets, and the denoised result is compared with F-X deconvolution and the non-local means denoising algorithm.

Keywords: anisotropic total fractional order variation, fractional order bounded variation, seismic random noise attenuation, split Bregman algorithm

18184 Parallel Random Number Generation for the Modern Supercomputer Architectures

Authors: Roman Snytsar

Abstract:

Pseudo-random numbers are often used in scientific computing, for example in Monte Carlo simulations or quantum-inspired optimization. Requirements for a parallel random number generator running in a modern multi-core vector environment are more stringent than those for sequential random number generators. As well as passing the usual quality tests, the output of the parallel random number generator must be verifiable and reproducible throughout the concurrent execution. We propose a family of vectorized Permuted Congruential Generators. Implementations are available for multiple modern vector computer architectures. Besides demonstrating good single-core performance, the generators scale easily across many processor cores and multiple distributed nodes. We provide performance and parallel speedup analysis and comparisons between the implementations.
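
As background, the sketch below shows a scalar PCG32 generator (the XSH-RR variant of the Permuted Congruential Generator family) with a per-stream increment. It is not the vectorized implementation proposed in the abstract; the class name and the seed and stream values are assumptions chosen for illustration. Giving each worker its own stream identifier is one simple way to obtain reproducible, distinct sequences in parallel code.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
MULTIPLIER = 6364136223846793005  # 64-bit LCG multiplier used by PCG32

class PCG32:
    """Scalar PCG32 (XSH-RR): 64-bit LCG state, 32-bit permuted output."""

    def __init__(self, seed, stream):
        self.state = 0
        self.inc = ((stream << 1) | 1) & MASK64  # increment must be odd
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self):
        old = self.state
        # Advance the underlying linear congruential generator.
        self.state = (old * MULTIPLIER + self.inc) & MASK64
        # Output permutation: xorshift the high bits, then a random rotation.
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32

# One generator per (hypothetical) worker, each on its own stream.
workers = [PCG32(seed=42, stream=worker_id) for worker_id in range(4)]
for worker_id, gen in enumerate(workers):
    print(worker_id, [gen.next_u32() for _ in range(3)])
```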

Keywords: pseudo-random numbers, quantum optimization, SIMD, parallel computing

18183 Attitude Stabilization of Satellites Using Random Dither Quantization

Authors: Kazuma Okada, Tomoaki Hashimoto, Hirokazu Tahara

Abstract:

Recently, the effectiveness of random dither quantization method for linear feedback control systems has been shown in several papers. However, the random dither quantization method has not yet been applied to nonlinear feedback control systems. The objective of this paper is to verify the effectiveness of random dither quantization method for nonlinear feedback control systems. For this purpose, we consider the attitude stabilization problem of satellites using discrete-level actuators. Namely, this paper provides a control method based on the random dither quantization method for stabilizing the attitude of satellites using discrete-level actuators.
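
The basic idea of random dither quantization can be sketched in a few lines: adding uniform noise before rounding to the actuator's discrete levels makes the quantized signal match the continuous command on average. This is an illustrative sketch only; the step size, the command value and the function names are assumptions, and no satellite attitude dynamics are modelled.

```python
import math
import random

def quantize(x, step):
    """Round x to the nearest multiple of step (uniform mid-tread quantizer)."""
    return step * math.floor(x / step + 0.5)

def dither_quantize(x, step, rng):
    """Add uniform dither on [-step/2, step/2] before quantizing."""
    return quantize(x + rng.uniform(-step / 2.0, step / 2.0), step)

rng = random.Random(1)
command = 0.3   # hypothetical continuous control input
step = 1.0      # spacing of the discrete actuator levels

plain = quantize(command, step)
samples = [dither_quantize(command, step, rng) for _ in range(20000)]
average = sum(samples) / len(samples)
print("plain quantizer output:", plain)
print("average dithered output:", average)
```

Without dither the command 0.3 is always rounded to 0, so the actuator never acts; with dither each output is still a discrete level, but the time average is close to the commanded value.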

Keywords: quantized control, nonlinear systems, random dither quantization

18182 Second Order Statistics of Dynamic Response of Structures Using Gamma Distributed Damping Parameters

Authors: Badreddine Chemali, Boualem Tiliouine

Abstract:

This article presents the main results of a numerical investigation of the uncertainty of the dynamic response of structures with statistically correlated, Gamma-distributed random damping. A computational method based on a Linear Statistical Model (LSM) is implemented to predict second order statistics for the response of a typical industrial building structure. The significance of random damping with correlated parameters and its implications for the sensitivity of structural peak response in the neighborhood of a resonant frequency are discussed in light of considerable ranges of damping uncertainties and correlation coefficients. The results are compared to those generated using Monte Carlo simulation techniques. The numerical results obtained show the importance of damping uncertainty and statistical correlation of damping coefficients when obtaining accurate probabilistic estimates of the dynamic response of structures. Furthermore, the effectiveness of the LSM in efficiently predicting uncertainty propagation for structural dynamic problems with correlated damping parameters is demonstrated.

Keywords: correlated random damping, linear statistical model, Monte Carlo simulation, uncertainty of dynamic response

18181 Manufacturing Anomaly Detection Using a Combination of Gated Recurrent Unit Network and Random Forest Algorithm

Authors: Atinkut Atinafu Yilma, Eyob Messele Sefene

Abstract:

Anomaly detection is one of the essential mechanisms to control and reduce production loss, especially in today's smart manufacturing. Quick anomaly detection aids in reducing the cost of production by minimizing the possibility of producing defective products. However, developing an anomaly detection model that can rapidly detect a production change is challenging. This paper proposes a Gated Recurrent Unit (GRU) combined with a Random Forest (RF) to quickly detect anomalies in the production process in real time. The GRU is used as a feature detector, and the RF as a classifier using the input features from the GRU. The model was tested using various synthetic and real-world datasets against benchmark methods. The results show that the proposed GRU-RF outperforms the benchmark methods, with the shortest time taken to detect anomalies in the production process. Based on the investigation from the study, the proposed model can eliminate or reduce unnecessary production costs and bring a competitive advantage to manufacturing industries.

Keywords: anomaly detection, multivariate time series data, smart manufacturing, gated recurrent unit network, random forest

18180 Steady-State Behavior of a Multi-Phase M/M/1 Queue in Random Evolution Subject to Catastrophe Failure

Authors: Reni M. Sagayaraj, Anand Gnana S. Selvam, Reynald R. Susainathan

Abstract:

In this paper, we consider stochastic queueing models for the steady-state behavior of a multi-phase M/M/1 queue in random evolution subject to catastrophe failure. The arrival flow of customers is described by a marked Markovian arrival process. The service times of different types of customers have phase-type distributions with different parameters. To facilitate the investigation of the system, we use a generalized phase-type service time distribution. The model contains a repair state; when a catastrophe occurs, the system is transferred to the failure state. The paper focuses on the steady-state equation and analyzes the steady-state behavior of the underlying queueing model along with the average queue size.

Keywords: M/G/1 queuing system, multi-phase, random evolution, steady-state equation, catastrophe failure

18179 Tabu Random Algorithm for Guiding Mobile Robots

Authors: Kevin Worrall, Euan McGookin

Abstract:

The use of optimization algorithms is common across a large number of diverse fields. This work presents the use of a hybrid optimization algorithm applied to a mobile robot tasked with carrying out a search of an unknown environment. The algorithm is then applied to the multiple robots case, which results in a reduction in the time taken to carry out the search. The hybrid algorithm is a Random Search Algorithm fused with a Tabu mechanism. The work shows that the algorithm locates the desired points in a quicker time than a brute force search. The Tabu Random algorithm is shown to work within a simulated environment using a validated mathematical model. The simulation was run using three different environments with varying numbers of targets. As an algorithm, the Tabu Random is small, clear and can be implemented with minimal resources. The power of the algorithm is the speed at which it locates points of interest and the robustness to the number of robots involved. The number of robots can vary with no changes to the algorithm resulting in a flexible algorithm.

Keywords: algorithms, control, multi-agent, search and rescue

18178 Using Predictive Analytics to Identify First-Year Engineering Students at Risk of Failing

Authors: Beng Yew Low, Cher Liang Cha, Cheng Yong Teoh

Abstract:

Due to a lack of continual assessment or grade related data, identifying first-year engineering students in a polytechnic education at risk of failing is challenging. Our experience over the years tells us that there is no strong correlation between having good entry grades in Mathematics and the Sciences and excelling in hardcore engineering subjects. Hence, identifying students at risk of failure cannot be on the basis of entry grades in Mathematics and the Sciences alone. These factors compound the difficulty of early identification and intervention. This paper describes the development of a predictive analytics model in the early detection of students at risk of failing and evaluates its effectiveness. Data from continual assessments conducted in term one, supplemented by data of student psychological profiles such as interests and study habits, were used. Three classification techniques, namely Logistic Regression, K Nearest Neighbour, and Random Forest, were used in our predictive model. Based on our findings, Random Forest was determined to be the strongest predictor with an Area Under the Curve (AUC) value of 0.994. Correspondingly, the Accuracy, Precision, Recall, and F-Score were also highest among these three classifiers. Using this Random Forest Classification technique, students at risk of failure could be identified at the end of term one. They could then be assigned to a Learning Support Programme at the beginning of term two. This paper gathers the results of our findings. It also proposes further improvements that can be made to the model.

Keywords: continual assessment, predictive analytics, random forest, student psychological profile

Procedia PDF Downloads 134
18177 Estimation of a Finite Population Mean under Random Non Response Using Improved Nadaraya and Watson Kernel Weights

Authors: Nelson Bii, Christopher Ouma, John Odhiambo

Abstract:

Non-response is a potential source of errors in sample surveys. It introduces bias and large variance in the estimation of finite population parameters. Regression models have been recognized as one technique for reducing the bias and variance due to random non-response by using auxiliary data. In this study, it is assumed that random non-response occurs in the survey variable at the second stage of cluster sampling and that full auxiliary information is available throughout. Auxiliary information is used at the estimation stage via a regression model to address the problem of random non-response. In particular, the auxiliary information is used via an improved Nadaraya-Watson kernel regression technique to compensate for random non-response. The asymptotic bias and mean squared error of the proposed estimator are derived. In addition, a simulation study indicates that the proposed estimator has smaller bias and smaller mean squared error than existing estimators of the finite population mean. The proposed estimator is also shown to have shorter confidence intervals at a 95% coverage rate. The results obtained in this study are useful, for instance, in choosing efficient estimators of the finite population mean in demographic sample surveys.
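As a rough illustration of how kernel regression on auxiliary data can compensate for non-response, the sketch below uses the standard Nadaraya-Watson estimator with a Gaussian kernel. It is not the improved weighting scheme proposed in the paper, and it ignores the two-stage cluster design; the function names and the use of `None` to mark a non-respondent are assumptions.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys at the point x0 (standard NW estimator)."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def mean_with_imputed_nonresponse(xs, ys, bandwidth):
    """Estimate the mean of y, imputing missing values (None) from the auxiliary x."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if not observed:
        raise ValueError("no respondents to learn from")
    obs_x = [x for x, _ in observed]
    obs_y = [y for _, y in observed]
    completed = [y if y is not None else nadaraya_watson(x, obs_x, obs_y, bandwidth)
                 for x, y in zip(xs, ys)]
    return sum(completed) / len(completed)
```

The bandwidth controls how local the imputation is: a small value leans on the nearest respondents, while a large value approaches the plain respondent mean.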

Keywords: mean squared error, random non-response, two-stage cluster sampling, confidence interval lengths

Procedia PDF Downloads 140
18176 Geo-Additive Modeling of Family Size in Nigeria

Authors: Oluwayemisi O. Alaba, John O. Olaomi

Abstract:

The 2013 Nigerian Demographic Health Survey (NDHS) data were used to investigate the determinants of family size in Nigeria using a geo-additive model. The fixed effects of categorical covariates were modelled using diffuse priors, the nonlinear effect of the continuous variable using a P-spline with a second-order random walk prior, and the spatial effects using Markov random field priors, while exchangeable normal priors were used for the random effects of community and household. The Negative Binomial distribution was used to handle overdispersion of the dependent variable. Inference was fully Bayesian. Results showed that secondary and higher education of the mother, belonging to the Yoruba tribe, Christianity, family planning, giving birth by caesarean section and having a partner with secondary education were associated with smaller family size. Large family size is positively associated with age at first birth, number of daughters in a household, being gainfully employed, being married and living with a partner, and community and household effects.
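The model components listed in this abstract can be written out compactly. The display below is a sketch under stated assumptions rather than the authors' exact specification: the log link, the index convention (community $i$, household $j$) and all symbol names are assumed for illustration.

```latex
\begin{aligned}
y_{ij} \mid \mu_{ij}, \delta &\sim \mathrm{NB}(\mu_{ij}, \delta),
  \qquad \operatorname{Var}(y_{ij}) = \mu_{ij} + \mu_{ij}^{2}/\delta, \\
\log \mu_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + f(z_{ij})
  + f_{\mathrm{spat}}(s_{ij}) + b_{i} + b_{ij}, \\
\beta_{t} &= 2\beta_{t-1} - \beta_{t-2} + u_{t},
  \qquad u_{t} \sim N(0, \tau^{2}), \\
b_{i} &\sim N(0, \sigma_{c}^{2}), \qquad b_{ij} \sim N(0, \sigma_{h}^{2}).
\end{aligned}
```

Here $\boldsymbol{\gamma}$ collects the fixed effects with diffuse priors, $f$ is the P-spline whose coefficients $\beta_t$ follow the second-order random walk, $f_{\mathrm{spat}}$ is the spatial effect with a Markov random field prior, and $b_i$, $b_{ij}$ are the exchangeable community and household effects. The variance term shows how the Negative Binomial allows the variance to exceed the mean.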

Keywords: Bayesian analysis, family size, geo-additive model, negative binomial

Procedia PDF Downloads 541
18175 Leveraging SHAP Values for Effective Feature Selection in Peptide Identification

Authors: Sharon Li, Zhonghang Xia

Abstract:

Post-database search is an essential phase in peptide identification using tandem mass spectrometry (MS/MS) to refine peptide-spectrum matches (PSMs) produced by database search engines. These engines frequently face difficulty differentiating between correct and incorrect peptide assignments. Despite advances in statistical and machine learning methods aimed at improving the accuracy of peptide identification, challenges remain in selecting critical features for these models. In this study, two machine learning models—a random forest tree and a support vector machine—were applied to three datasets to enhance PSMs. SHAP values were utilized to determine the significance of each feature within the models. The experimental results indicate that the random forest model consistently outperformed the SVM across all datasets. Further analysis of SHAP values revealed that the importance of features varies depending on the dataset, indicating that a feature's role in model predictions can differ significantly. This variability in feature selection can lead to substantial differences in model performance, with false discovery rate (FDR) differences exceeding 50% between different feature combinations. Through SHAP value analysis, the most effective feature combinations were identified, significantly enhancing model performance.

Keywords: peptide identification, SHAP value, feature selection, random forest tree, support vector machine

Procedia PDF Downloads 23
18174 A Statistical Model for the Dynamics of Single Cathode Spot in Vacuum Cylindrical Cathode

Authors: Po-Wen Chen, Jin-Yu Wu, Md. Manirul Ali, Yang Peng, Chen-Te Chang, Der-Jun Jan

Abstract:

The dynamics of cathode spots have become a major topic in vacuum arc discharge research because of their high academic interest and wide application potential. In this article, using a three-dimensional statistical model, we simulate the distribution of the ignition probability of a new cathode spot under different magnetic pressures on the surface of an old cathode spot and at different arcing times. The model for the ignition probability of a new cathode spot is proposed for two typical situations: a pure isotropic random walk in the absence of an external magnetic field, and retrograde motion in an external magnetic field parallel to the cathode surface. We mainly focus on the relationship developed between the ignition probability density distribution of a new cathode spot and the external magnetic field.
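The two regimes named in this abstract can be illustrated with a toy two-dimensional simulation. This is not the authors' three-dimensional statistical model and it does not compute an ignition probability; the function name, the fixed step length and the representation of retrograde motion as a constant drift vector are assumptions made for the example.

```python
import math
import random


def simulate_spot_path(n_steps, step_length=1.0, drift=(0.0, 0.0), seed=0):
    """Spot positions for an isotropic random walk with an optional constant drift.

    drift == (0, 0) mimics the field-free isotropic case; a non-zero drift is a
    stand-in (assumption) for directed motion in an external magnetic field.
    """
    rng = random.Random(seed)
    x = y = 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        angle = rng.uniform(0.0, 2.0 * math.pi)  # isotropic jump direction
        x += step_length * math.cos(angle) + drift[0]
        y += step_length * math.sin(angle) + drift[1]
        path.append((x, y))
    return path
```

Averaging many such paths with different seeds would show the spread growing around the origin in the field-free case and around a steadily shifting centre when a drift is present.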

Keywords: cathode spot, vacuum arc discharge, transverse magnetic field, random walk

Procedia PDF Downloads 434
18173 Loan Repayment Prediction Using Machine Learning: Model Development, Django Web Integration and Cloud Deployment

Authors: Seun Mayowa Sunday

Abstract:

Loan prediction is one of the most significant and recognised fields of research in the banking, insurance, and financial security industries. Some prediction systems on the market are built as static software. However, because static software operates only with strictly regulated rules, it cannot aid customers beyond these limitations. The application of several machine learning (ML) techniques is required for loan prediction. Four separate machine learning models, random forest (RF), decision tree (DT), k-nearest neighbour (KNN), and logistic regression, are used to create the loan prediction model. Using Anaconda Navigator and the required ML libraries, the models are created and evaluated using appropriate metrics. The findings show that the random forest achieves the highest accuracy, 80.17%, and it was subsequently integrated into the Django framework. For real-time testing, the web application is deployed on Alibaba Cloud, which is among the four biggest cloud computing providers. To the best of our knowledge, this research is the first academic paper that combines model development and the Django framework with deployment on the Alibaba cloud computing platform.

Keywords: k-nearest neighbor, random forest, logistic regression, decision tree, django, cloud computing, alibaba cloud

Procedia PDF Downloads 136
18172 Asymptotic Spectral Theory for Nonlinear Random Fields

Authors: Karima Kimouche

Abstract:

In this paper, we consider the asymptotic problems in spectral analysis of stationary causal random fields. We impose conditions only involving (conditional) moments, which are easily verifiable for a variety of nonlinear random fields. Limiting distributions of periodograms and smoothed periodogram spectral density estimates are obtained and applications to the spectral domain bootstrap are given.

Keywords: spatial nonlinear processes, spectral estimators, GMC condition, bootstrap method

Procedia PDF Downloads 453
18171 Inference for Compound Truncated Poisson Lognormal Model with Application to Maximum Precipitation Data

Authors: M. Z. Raqab, Debasis Kundu, M. A. Meraou

Abstract:

In this paper, we analyze maximum precipitation data for a particular period of time obtained from different stations in the Global Historical Climatological Network of the USA. One important point is that some stations are shut down on certain days for one reason or another. Hence, the maximum values are recorded by excluding those readings. It is assumed that the number of stations that operate follows a zero-truncated Poisson distribution, and that the daily precipitation follows a lognormal distribution. We call this model a compound truncated Poisson lognormal model. The proposed model has three unknown parameters, and it can take a variety of shapes. The maximum likelihood estimators can be obtained quite conveniently using the Expectation-Maximization (EM) algorithm. Approximate maximum likelihood estimators are also derived. The associated confidence intervals can also be obtained from the observed Fisher information matrix. Simulation experiments have been performed to check the performance of the EM algorithm, and it is observed that the EM algorithm works quite well in this case. When we analyze the precipitation data set using the proposed model, it is observed that the proposed model provides a better fit than some of the existing models.
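Under the assumptions stated in this abstract, the distribution of the recorded maximum has a closed form. If N is zero-truncated Poisson with rate lam and each reading has lognormal CDF F, then P(max <= x) = E[F(x)^N] = (exp(lam * F(x)) - 1) / (exp(lam) - 1). The sketch below evaluates this CDF and draws samples from the model; it does not implement the EM estimation, and the function and parameter names (`lam`, `mu`, `sigma`) are assumptions.

```python
import math
import random


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal variable with log-mean mu and log-sd sigma."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def max_cdf(x, lam, mu, sigma):
    """CDF of the maximum of N lognormal readings, N zero-truncated Poisson(lam)."""
    f = lognormal_cdf(x, mu, sigma)
    return math.expm1(lam * f) / math.expm1(lam)


def sample_zero_truncated_poisson(lam, rng):
    """Draw N >= 1 by inverting the zero-truncated Poisson CDF."""
    u = rng.random()
    n = 1
    p = lam * math.exp(-lam) / (1.0 - math.exp(-lam))  # P(N = 1)
    cumulative = p
    while u > cumulative and p > 0.0:
        n += 1
        p *= lam / n  # P(N = n) from P(N = n - 1)
        cumulative += p
    return n


def sample_max(lam, mu, sigma, rng):
    """Draw one maximum reading from the compound model."""
    n = sample_zero_truncated_poisson(lam, rng)
    return max(rng.lognormvariate(mu, sigma) for _ in range(n))
```

The three arguments `lam`, `mu` and `sigma` correspond to the three unknown parameters mentioned in the abstract, and comparing the empirical CDF of simulated maxima with `max_cdf` is a quick sanity check on any fitted values.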

Keywords: compound Poisson lognormal distribution, EM algorithm, maximum likelihood estimation, approximate maximum likelihood estimation, Fisher information, skew distribution

Procedia PDF Downloads 108