Search results for: Bayesian
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 294

114 Parallel Fuzzy Rough Support Vector Machine for Data Classification in Cloud Environment

Authors: Arindam Chaudhuri

Abstract:

Data classification is widely used as an effective and efficient means of conveying knowledge and information to users. The primary focus has always been on techniques for extracting useful knowledge from data such that returns are maximized. With the emergence of huge datasets, existing classification techniques often fail to produce desirable results. The challenge lies in analyzing and understanding the characteristics of massive data sets by retrieving useful geometric and statistical patterns. We propose a supervised parallel fuzzy rough support vector machine (PFRSVM) for data classification in a cloud environment. The classification is performed by PFRSVM using a hyperbolic tangent kernel. The fuzzy rough set model accounts for sensitivity to noisy samples and handles imprecision in training samples, bringing robustness to the results. The membership function is a function of the center and radius of each class in feature space and is represented with a kernel. It plays an important role in sampling the decision surface. The success of PFRSVM is governed by the choice of appropriate parameter values. The training samples are either linearly or nonlinearly separable. Different input points make unique contributions to the decision surface. The algorithm is parallelized to reduce training times. The system is built on a support vector machine library using the Hadoop implementation of MapReduce. The algorithm is tested on large data sets to check its feasibility and convergence. The performance of the classifier is also assessed in terms of the number of support vectors. The challenges encountered in implementing big data classification in machine learning frameworks are also discussed. The experiments are done on the cloud environment available at University of Technology and Management, India. The results are illustrated for Gaussian RBF and Bayesian kernels. 
The effect of variability in prediction and generalization of PFRSVM is examined with respect to values of the parameter C. PFRSVM effectively resolves the effects of outliers, class imbalance and overlapping class problems, generalizes to unseen data and relaxes the dependency between features and labels. The average classification accuracy of PFRSVM is better than that of other classifiers for both Gaussian RBF and Bayesian kernels. The experimental results on both synthetic and real data sets demonstrate the superiority of the proposed technique.
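
The abstract describes a membership function built from the center and radius of each class in feature space and expressed through a kernel. The following is a minimal sketch of one common way to do that, assuming the classic fuzzy SVM rule (membership = 1 - distance / (radius + delta)) and an RBF kernel. The function names, the `gamma` and `delta` values, and the membership rule itself are illustrative assumptions, not the PFRSVM formulation from the paper.

```python
import numpy as np


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_memberships(x_class, kernel=rbf_kernel, delta=1e-3):
    """Fuzzy membership of each sample of one class (assumed rule).

    The squared feature-space distance to the class mean m is
    ||phi(x) - m||^2 = K(x, x) - (2/n) sum_i K(x, x_i) + (1/n^2) sum_ij K(x_i, x_j).
    The class radius is the largest such distance within the class.
    """
    k = kernel(x_class, x_class)
    d2 = np.diag(k) - 2.0 * k.mean(axis=1) + k.mean()
    dist = np.sqrt(np.maximum(d2, 0.0))
    radius = dist.max()
    # delta > 0 keeps the membership of the farthest sample strictly positive.
    return 1.0 - dist / (radius + delta)
```

Under this rule a sample far from its class center, such as a likely outlier, receives a membership close to zero and so contributes little to the decision surface.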

Keywords: FRSVM, Hadoop, MapReduce, PFRSVM

Procedia PDF Downloads 454
113 The Postcognitivist Era in Cognitive Psychology

Authors: C. Jameke

Abstract:

During the cognitivist era in cognitive psychology, a theory of internal rules and symbolic representations was posited as an account of human cognition. This type of cognitive architecture had its heyday during the 1970s and 80s, but it has now been largely abandoned in favour of subsymbolic architectures (e.g. connectionism), non-representational frameworks (e.g. dynamical systems theory), and statistical approaches such as Bayesian theory. In this presentation I describe this changing landscape of research, and comment on the increasing influence of neuroscience on cognitive psychology. I then briefly review a few recent developments in connectionism and neurocomputation relevant to cognitive psychology, and critically discuss the assumption made by some researchers in these frameworks that higher-level aspects of human cognition are simply emergent properties of massively large distributed neural networks.

Keywords: connectionism, emergentism, postcognitivist, representations, subsymbolic architecture

Procedia PDF Downloads 538
112 Ensemble Sampler For Infinite-Dimensional Inverse Problems

Authors: Jeremie Coullon, Robert J. Webber

Abstract:

We introduce a Markov chain Monte Carlo (MCMC) sampler for infinite-dimensional inverse problems. Our sampler is based on the affine invariant ensemble sampler, which uses interacting walkers to adapt to the covariance structure of the target distribution. We extend this ensemble sampler for the first time to infinite-dimensional function spaces, yielding a highly efficient gradient-free MCMC algorithm. Because our ensemble sampler does not require gradients or posterior covariance estimates, it is simple to implement and broadly applicable. In many Bayesian inverse problems, Markov chain Monte Carlo (MCMC) methods are needed to approximate distributions on infinite-dimensional function spaces, for example, in groundwater flow, medical imaging, and traffic flow. Yet designing efficient MCMC methods for function spaces has proved challenging. Recent gradient-based MCMC methods, preconditioned MCMC methods, and SMC methods have improved the computational efficiency of functional random walk. However, these samplers require gradients or posterior covariance estimates that may be challenging to obtain. Calculating gradients is difficult or impossible in many high-dimensional inverse problems involving a numerical integrator with a black-box code base. Additionally, accurately estimating posterior covariances can require a lengthy pilot run or adaptation period. These concerns raise the question: is there a functional sampler that outperforms functional random walk without requiring gradients or posterior covariance estimates? To address this question, we consider a gradient-free sampler that avoids explicit covariance estimation yet adapts naturally to the covariance structure of the sampled distribution. This sampler works by considering an ensemble of walkers and interpolating and extrapolating between walkers to make a proposal. 
This is called the affine invariant ensemble sampler (AIES), which is easy to tune, easy to parallelize, and efficient at sampling spaces of moderate dimensionality (less than 20). The main contribution of this work is to propose a functional ensemble sampler (FES) that combines functional random walk and AIES. To apply this sampler, we first calculate the Karhunen–Loève (KL) expansion for the Bayesian prior distribution, assumed to be Gaussian and trace-class. Then, we use AIES to sample the posterior distribution on the low-wavenumber KL components and use the functional random walk to sample the posterior distribution on the high-wavenumber KL components. Alternating between AIES and functional random walk updates, we obtain our functional ensemble sampler that is efficient and easy to use without requiring detailed knowledge of the target distribution. In past work, several authors have proposed splitting the Bayesian posterior into low-wavenumber and high-wavenumber components and then applying enhanced sampling to the low-wavenumber components. Yet compared to these other samplers, FES is unique in its simplicity and broad applicability. FES does not require any derivatives, and the need for derivative-free samplers has previously been emphasized. FES also eliminates the requirement for posterior covariance estimates. Lastly, FES is more efficient than other gradient-free samplers in our tests. In two numerical examples, we apply FES to challenging inverse problems that involve estimating a functional parameter and one or more scalar parameters. We compare the performance of functional random walk, FES, and an alternative derivative-free sampler that explicitly estimates the posterior covariance matrix. We conclude that FES is the fastest available gradient-free sampler for these challenging and multimodal test problems.
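
The "interpolating and extrapolating between walkers" proposal mentioned in the abstract can be illustrated with the stretch move of the finite-dimensional affine invariant ensemble sampler. The sketch below covers only that AIES building block, not the functional ensemble sampler itself. The function name, the serial update order and the default stretch parameter `a = 2.0` are assumptions made for illustration.

```python
import numpy as np


def stretch_move_step(walkers, log_prob, rng, a=2.0):
    """One sweep of stretch moves over an ensemble of walkers.

    walkers  : array of shape (n_walkers, dim)
    log_prob : callable returning the unnormalized log density of one point
    rng      : numpy random Generator
    a        : stretch scale parameter (> 1)
    """
    walkers = np.array(walkers, dtype=float)
    n, d = walkers.shape
    logp = np.array([log_prob(w) for w in walkers])
    for k in range(n):
        # Pick a partner walker j different from k.
        j = rng.integers(n - 1)
        if j >= k:
            j += 1
        # Draw z on [1/a, a] with density proportional to 1/sqrt(z).
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        # z < 1 interpolates towards the partner, z > 1 extrapolates away from it.
        proposal = walkers[j] + z * (walkers[k] - walkers[j])
        logp_new = log_prob(proposal)
        # The z**(d - 1) factor keeps the move in detailed balance.
        log_accept = (d - 1) * np.log(z) + logp_new - logp[k]
        if np.log(rng.random()) < log_accept:
            walkers[k] = proposal
            logp[k] = logp_new
    return walkers
```

Because each proposal is built only from the positions of other walkers, no gradients or covariance estimates are needed, which is the property the abstract emphasizes.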

Keywords: Bayesian inverse problems, Markov chain Monte Carlo, infinite-dimensional inverse problems, dimensionality reduction

Procedia PDF Downloads 119
111 New Segmentation of Piecewise Moving-Average Model by Using Reversible Jump MCMC Algorithm

Authors: Suparman

Abstract:

This paper addresses the problem of signal segmentation within a Bayesian framework by using the reversible jump MCMC algorithm. The signal is modelled by a piecewise constant Moving-Average (MA) model in which the number of segments, the positions of the change-points, and the order and coefficients of the MA model for each segment are unknown. The reversible jump MCMC algorithm is then used to generate samples distributed according to the joint posterior distribution of the unknown parameters. These samples allow features of interest of the posterior distribution to be calculated. The performance of the methodology is illustrated via several simulation results.

Keywords: piecewise, moving-average model, reversible jump MCMC, signal segmentation

Procedia PDF Downloads 193
110 Winter – Not Spring – Climate Drives Annual Adult Survival in Common Passerines: A Country-Wide, Multi-Species Modeling Exercise

Authors: Manon Ghislain, Timothée Bonnet, Olivier Gimenez, Olivier Dehorter, Pierre-Yves Henry

Abstract:

Climatic fluctuations affect the demography of animal populations, generating changes in population size, phenology, distribution and community assemblages. However, very few studies have identified the underlying demographic processes. For short-lived species, like common passerine birds, are these changes generated by changes in adult survival or in fecundity and recruitment? This study tests for an effect of annual climatic conditions (spring and winter) on annual, local adult survival at very large spatial (a country, 252 sites), temporal (25 years) and biological (25 species) scales. The Constant Effort Site ringing has allowed the collection of capture-mark-recapture data for 100,000 adult individuals since 1989, over metropolitan France, thus documenting annual, local survival rates of the most common passerine birds. We specifically developed a set of multi-year, multi-species, multi-site Bayesian models describing variations in local survival and recapture probabilities. This method allows for a statistically powerful hierarchical assessment (global versus species-specific) of the effects of climate variables on survival. A major part of between-year variations in survival rate was common to all species (74% of between-year variance), whereas only 26% of temporal variation was species-specific. Although changing spring climate is commonly invoked as a cause of population size fluctuations, spring climatic anomalies (mean precipitation or temperature for March-August) do not impact adult survival: only 1% of between-year variation of species survival is explained by spring climatic anomalies. However, for sedentary birds, winter climatic anomalies (North Atlantic Oscillation) had a significant, quadratic effect on adult survival, birds surviving less during intermediate years than during more extreme years. For migratory birds, we do not detect an effect of winter climatic anomalies (Sahel Rainfall). 
We will analyze the life history traits (migration, habitat, thermal range) that could explain a different sensitivity of species to winter climate anomalies. Overall, we conclude that changes in population sizes for passerine birds are unlikely to be the consequences of climate-driven mortality (or emigration) in spring but could be induced by other demographic parameters, like fecundity.

Keywords: Bayesian approach, capture-recapture, climate anomaly, constant effort sites scheme, passerine, seasons, survival

Procedia PDF Downloads 263
109 Multi-Criteria Evolutionary Algorithm to Develop Efficient Schedules for Complex Maintenance Problems

Authors: Sven Tackenberg, Sönke Duckwitz, Andreas Petz, Christopher M. Schlick

Abstract:

This paper introduces an extension to the well-established Resource-Constrained Project Scheduling Problem (RCPSP) to apply it to complex maintenance problems. The problem is to assign technicians to a team which has to process several tasks with multi-level skill requirements during a work shift. Here, several alternative activities for a task allow both the temporal shift of activities and the reallocation of technicians and tools. As a result, switches from one valid work process variant to another can be considered and may be selected by the developed evolutionary algorithm based on the present skill level of technicians or the available tools. An additional complication of the observed scheduling problem is that the locations of the construction sites are only temporarily accessible during a day. Due to intensive rail traffic, the available time slots for maintenance and repair works are extremely short and are often distributed throughout the day. To identify efficient working periods, a first concept of a Bayesian network is introduced and is integrated into the extended RCPSP with pre-emptive and non-pre-emptive tasks. The Bayesian network is used to calculate the probability that a maintenance task can be processed during a specific period of the shift. Focusing on the maintenance of railway infrastructure in metropolitan areas as the most unproductive implementation process at construction sites, the paper illustrates how the extended RCPSP can be applied for maintenance planning support. A multi-criteria evolutionary algorithm with a problem representation is introduced which is capable of revising technician-task allocations, where the duration of a task may be stochastic. The approach uses a novel activity list representation to ensure easily describable and modifiable elements which can be converted into detailed shift schedules. 
The main objective is to develop a shift plan which maximizes the utilization of each technician by minimizing the waiting times caused by rail traffic. The results of the already implemented core algorithm illustrate a fast convergence towards an optimal team composition for a shift, an efficient sequence of tasks and a high probability of the subsequent implementation due to the stochastic durations of the tasks. In the paper, the algorithm for the extended RCPSP is analyzed in an experimental evaluation using real-world example problems of various sizes, resource complexity, tightness and so forth.

Keywords: maintenance management, scheduling, resource constrained project scheduling problem, genetic algorithms

Procedia PDF Downloads 200
108 Currency Exchange Rate Forecasts Using Quantile Regression

Authors: Yuzhi Cai

Abstract:

In this paper, we discuss a Bayesian approach to quantile autoregressive (QAR) time series model estimation and forecasting. Together with a combining forecasts technique, we then predict USD to GBP currency exchange rates. Combined forecasts contain all the information captured by the fitted QAR models at different quantile levels and are therefore better than those obtained from individual models. Our results show that an unequally weighted combining method performs better than other forecasting methodology. We found that a median AR model can perform well in point forecasting when the predictive density functions are symmetric. However, in practice, using the median AR model alone may involve the loss of information about the data captured by other QAR models. We recommend that combined forecasts should be used whenever possible.
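
Quantile models of the kind discussed above are usually built around the check (pinball) loss, whose minimizer at level tau is the tau-th quantile. The sketch below illustrates only that loss on a constant predictor. It is not the Bayesian QAR estimation or the forecast-combining scheme of the paper, and the function names and the grid search are assumptions made for illustration.

```python
import numpy as np


def check_loss(residual, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    residual = np.asarray(residual, dtype=float)
    return residual * (tau - (residual < 0))


def best_constant_quantile(y, tau, grid):
    """Grid value q that minimizes the total check loss of y - q."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - q, tau).sum() for q in grid]
    return grid[int(np.argmin(losses))]
```

With tau = 0.5 the loss is half the absolute error, so the minimizer is a median, which matches the abstract's remark that a median AR model is a natural choice for point forecasts when predictive densities are symmetric.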

Keywords: combining forecasts, MCMC, predictive density functions, quantile forecasting, quantile modelling

Procedia PDF Downloads 222
107 Non-Linear Causality Inference Using BAMLSS and Bi-CAM in Finance

Authors: Flora Babongo, Valerie Chavez

Abstract:

Inferring causality from observational data is one of the fundamental subjects, especially in quantitative finance. So far most of the papers analyze additive noise models with either linearity, nonlinearity or Gaussian noise. We fill in the gap by providing a nonlinear and non-Gaussian causal multiplicative noise model that aims to distinguish the cause from the effect using a two-step method based on Bayesian additive models for location, scale and shape (BAMLSS) and on causal additive models (CAM). We have tested our method on simulated and real data and we reached an accuracy of 0.86 on average. As real data, we considered the causality between financial indices such as S&P 500, Nasdaq, CAC 40 and Nikkei, and companies' log-returns. Our results can be useful in inferring causality when the data is heteroskedastic or non-injective.

Keywords: causal inference, DAGs, BAMLSS, financial index

Procedia PDF Downloads 117
106 Investigating the Behavior of Individual Business Taxpayers: Behavioral Economics Approach

Authors: Yeganeh Mousavi Jahromi, Sahar Dehghan

Abstract:

In the Direct Tax Act, penalties and incentives are two strategies for realizing the expected tax revenues. In this study, the interaction between the behavior of individual business taxpayers and the National Tax Administration is investigated using prospect theory, which is based on the behavioral economics approach. For this purpose, the structure of the tax compliance of these taxpayers is evaluated via changes in penalty and incentive rates. A special questionnaire regarding the items of the individual businesses sector of the Direct Tax Act was designed for tax compliance evaluation, and the results were obtained using a Bayesian hierarchical method. The results indicate that the investigated individual business taxpayers, at all income levels, were more sensitive to incentive rates, a result that can be useful for tax policymakers.

Keywords: behavioral economics, prospect theory, tax compliance, penalties, incentives

Procedia PDF Downloads 32
105 Choosing between the Regression Correlation, the Rank Correlation, and the Correlation Curve

Authors: Roger L. Goodwin

Abstract:

This paper presents a rank correlation curve. The traditional correlation coefficient is valid for both continuous variables and for integer variables using rank statistics. Since the correlation coefficient has already been established in rank statistics by Spearman, such a calculation can be extended to the correlation curve. This paper presents two survey questions. The survey collected non-continuous variables. We will show weak to moderate correlation. Obviously, one question has a negative effect on the other. A review of the qualitative literature can answer which question and why. The rank correlation curve shows which collection of responses has a positive slope and which collection of responses has a negative slope. Such information is unavailable from the flat, "first-glance" correlation statistics.
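
As background for the rank statistics mentioned above, Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below shows only that standard global coefficient, not the rank correlation curve proposed in the paper. The function names are illustrative.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        # Extend j to the end of the run of values tied with sorted_x[i].
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

A single coefficient like this summarizes the whole sample, whereas the correlation curve described in the abstract is intended to show how the association changes across subsets of responses.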

Keywords: Bayesian estimation, regression model, rank statistics, correlation, correlation curve

Procedia PDF Downloads 422
104 RAD-Seq Data Reveals Evidence of Local Adaptation between Upstream and Downstream Populations of Australian Glass Shrimp

Authors: Sharmeen Rahman, Daniel Schmidt, Jane Hughes

Abstract:

Paratya australiensis Kemp (Decapoda: Atyidae) is a widely distributed indigenous freshwater shrimp, highly abundant in eastern Australia. This species has been considered as a model stream organism to study genetics, dispersal, biology, behaviour and evolution in Atyids. Paratya has a filter feeding and scavenging habit which plays a significant role in the formation of lotic community structure. It has been shown to reduce periphyton and sediment from hard substrates of coastal streams and hence acts as a strongly-interacting ecosystem macroconsumer. Besides, Paratya is one of the major food sources for stream dwelling fishes. Paratya australiensis is a cryptic species complex consisting of 9 highly divergent mitochondrial DNA lineages. Among them, one lineage has been observed to favour upstream sites at higher altitudes, with cooler water temperatures. This study aims to identify local adaptation in upstream and downstream populations of this lineage in three streams in the Conondale Range, North-eastern Brisbane, Queensland, Australia. Two populations (up and down stream) from each stream have been chosen to test for local adaptation, and a parallel pattern of adaptation is expected across all streams. Six populations each consisting of 24 individuals were sequenced using the Restriction Site Associated DNA-seq (RAD-seq) technique. Genetic markers (SNPs) were developed using double digest RAD sequencing (ddRAD-seq). These were used for de novo assembly of the Paratya genome. De novo assembly was done using the Stacks program and produced 56,344 loci for 47 individuals from one stream. Among these individuals, 39 individuals shared 5819 loci, and these markers are being used to test for local adaptation using Fst outlier tests (Arlequin) and Bayesian analysis (BayeScan) between up and downstream populations. The Fst outlier test detected 27 loci likely to be under selection and the Bayesian analysis also detected 27 loci as under selection. 
Among these 27 loci, 3 loci showed evidence of selection at a significance level using the BayeScan program. On the other hand, up and downstream populations are strongly diverged at neutral loci with Fst = 0.37. Similar analysis will be done with all six populations to determine if there is a parallel pattern of adaptation across all streams. Furthermore, multi-locus among population covariance analysis will be done to identify potential markers under selection as well as to compare single locus versus multi-locus approaches for detecting local adaptation. Adaptive genes identified in this study can be used for future studies to design primers and test for adaptation in related crustacean species.

Keywords: Paratya australiensis, rainforest streams, selection, single nucleotide polymorphism (SNPs)

Procedia PDF Downloads 225
103 Comparison of Quality of Life One Year after Bariatric Intervention: Systematic Review of the Literature with Bayesian Network Meta-Analysis

Authors: Piotr Tylec, Alicja Dudek, Grzegorz Torbicz, Magdalena Mizera, Natalia Gajewska, Michael Su, Tanawat Vongsurbchart, Tomasz Stefura, Magdalena Pisarska, Mateusz Rubinkiewicz, Piotr Malczak, Piotr Major, Michal Pedziwiatr

Abstract:

Introduction: Quality of life after bariatric surgery is an important factor when evaluating the final result of the treatment. Considering the vast surgical options, we tried to globally compare available methods in terms of quality of life following surgery. The aim of the study is to compare the quality of life a year after bariatric intervention using network meta-analysis methods. Material and Methods: We performed a systematic review according to PRISMA guidelines with Bayesian network meta-analysis. Inclusion criteria were: studies comparing at least two methods of weight loss treatment of which at least one is surgical, and assessment of the quality of life one year after surgery by validated questionnaires. The primary outcome was quality of life one year after the bariatric procedure. The following aspects of quality of life were analyzed: physical, emotional, general health, vitality, role physical, social, mental, and bodily pain. All questionnaires were standardized and pooled to a single scale. Lifestyle intervention was considered as the reference point. Results: An initial reference search yielded 5636 articles. 18 studies were evaluated. In the comparison of total quality of life score, we observed that laparoscopic sleeve gastrectomy (LSG) (median (M): 3.606, Credible Interval 97.5% (CrI): 1.039; 6.191), laparoscopic Roux en-Y gastric by-pass (LRYGB) (M: 4.973, CrI: 2.627; 7.317) and open Roux en-Y gastric by-pass (RYGB) (M: 9.735, CrI: 6.708; 12.760) had better results than other bariatric interventions in relation to lifestyle interventions. In the analysis of the physical aspects of quality of life, we noticed better results in LSG (M: 3.348, CrI: 0.548; 6.147) and in the LRYGB procedure (M: 5.070, CrI: 2.896; 7.208) than in the control intervention, and the worst results in open RYGB (M: -9.212, CrI: -11.610; -6.844). Analyzing emotional aspects, we found better results than the control intervention in LSG, in LRYGB, in open RYGB, and in laparoscopic gastric plication. 
In general health, better results were found in LSG (M: 9.144, CrI: 4.704; 13.470), in LRYGB (M: 6.451, CrI: 10.240; 13.830) and in single-anastomosis gastric by-pass (M: 8.671, CrI: 1.986; 15.310), and the worst results in open RYGB (M: -4.048, CrI: -7.984; -0.305). In social and vital aspects of quality of life, better results were observed in LSG and LRYGB than in the control intervention. We did not find any differences between bariatric interventions in physical role, mental and bodily aspects of quality of life. Conclusion: The network meta-analysis revealed that the best total quality of life scores one year after bariatric intervention were obtained after LSG, LRYGB and open RYGB. In physical and general health aspects, the worst quality of life was found after the open RYGB procedure. Other interventions did not significantly affect the quality of life after a year compared to dietary intervention.

Keywords: bariatric surgery, network meta-analysis, quality of life, one year follow-up

Procedia PDF Downloads 113
102 Reinforcement Learning the Born Rule from Photon Detection

Authors: Rodrigo S. Piera, Jailson Sales Araújo, Gabriela B. Lemos, Matthew B. Weiss, John B. DeBrota, Gabriel H. Aguilar, Jacques L. Pienaar

Abstract:

The Born rule was historically viewed as an independent axiom of quantum mechanics until Gleason derived it in 1957 by assuming the Hilbert space structure of quantum measurements [1]. In subsequent decades there have been diverse proposals to derive the Born rule starting from even more basic assumptions [2]. In this work, we demonstrate that a simple reinforcement-learning algorithm, having no pre-programmed assumptions about quantum theory, will nevertheless converge to a behaviour pattern that accords with the Born rule, when tasked with predicting the output of a quantum optical implementation of a symmetric informationally-complete measurement (SIC). Our findings support a hypothesis due to QBism (the subjective Bayesian approach to quantum theory), which states that the Born rule can be thought of as a normative rule for making decisions in a quantum world [3].

Keywords: quantum Bayesianism, quantum theory, quantum information, quantum measurement

Procedia PDF Downloads 44
101 Comparison of Various Classification Techniques Using WEKA for Colon Cancer Detection

Authors: Beema Akbar, Varun P. Gopi, V. Suresh Babu

Abstract:

Colon cancer causes the deaths of about half a million people every year. The common method of detection is histopathological tissue analysis, which is tiring and adds to the workload of the pathologist. A novel method is proposed that combines structural and statistical pattern recognition for the detection of colon cancer. This paper presents a comparison among different classifiers, namely Multilayer Perceptron (MLP), Sequential Minimal Optimization (SMO), Bayesian Logistic Regression (BLR) and K-star, using classification accuracy and error rate based on the percentage split method. The results show that the best algorithm in WEKA is the MLP classifier, with an accuracy of 83.333% and a kappa statistic of 0.625. The MLP classifier, which has the lowest error rate, is therefore preferred for its stronger classification capability.
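
The kappa statistic reported alongside accuracy corrects the observed agreement for the agreement expected by chance. The sketch below computes Cohen's kappa from a confusion matrix. The function name is illustrative, and the 12-sample matrix in the usage note is hypothetical: it was chosen only because it happens to match the two reported figures, and the paper's actual confusion matrix is not given in the abstract.

```python
import numpy as np


def cohen_kappa(confusion):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for a square confusion matrix."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_o = np.trace(c) / n                                    # observed agreement (accuracy)
    p_e = (c.sum(axis=1) * c.sum(axis=0)).sum() / n ** 2     # agreement expected by chance
    return float((p_o - p_e) / (1.0 - p_e))
```

For the hypothetical matrix `[[3, 1], [1, 7]]`, accuracy is 10/12 and `cohen_kappa` gives 0.625, which shows how a kappa noticeably lower than the accuracy can arise when the classes are imbalanced.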

Keywords: colon cancer, histopathological image, structural and statistical pattern recognition, multilayer perceptron

Procedia PDF Downloads 545
100 Electroencephalogram Based Alzheimer Disease Classification using Machine and Deep Learning Methods

Authors: Carlos Roncero-Parra, Alfonso Parreño-Torres, Jorge Mateo Sotos, Alejandro L. Borja

Abstract:

In this research, different methods based on machine/deep learning algorithms are presented for the classification and diagnosis of patients with mental disorders such as Alzheimer's disease. For this purpose, the signals obtained from 32 unipolar electrodes by non-invasive EEG were examined, and their basic properties were obtained. More specifically, different well-known machine learning based classifiers have been used, i.e., support vector machine (SVM), Bayesian linear discriminant analysis (BLDA), decision tree (DT), Gaussian Naïve Bayes (GNB), K-nearest neighbor (KNN) and Convolutional Neural Network (CNN). A total of 668 patients from five different hospitals were studied in the period from 2011 to 2021. The best accuracy obtained was around 93% in both ADM and ADA classifications. It can be concluded that such a classification will enable the training of algorithms that can be used to identify and classify different mental disorders with high accuracy.

Keywords: Alzheimer, machine learning, deep learning, EEG

Procedia PDF Downloads 75
99 Estimation and Forecasting with a Quantile AR Model for Financial Returns

Authors: Yuzhi Cai

Abstract:

This talk presents a Bayesian approach to quantile autoregressive (QAR) time series model estimation and forecasting. We establish that the joint posterior distribution of the model parameters and future values is well defined. The associated MCMC algorithm for parameter estimation and forecasting converges to the posterior distribution quickly. We also present a combining forecasts technique to produce more accurate out-of-sample forecasts by using a weighted sequence of fitted QAR models. A moving window method to check the quality of the estimated conditional quantiles is developed. We verify our methodology using simulation studies and then apply it to currency exchange rate data. An application of the method to the USD to GBP daily currency exchange rates will also be discussed. The results obtained show that an unequally weighted combining method performs better than other forecasting methodology.

Keywords: combining forecasts, MCMC, quantile modelling, quantile forecasting, predictive density functions

Procedia PDF Downloads 312
98 A Time-Varying and Non-Stationary Convolution Spectral Mixture Kernel for Gaussian Process

Authors: Kai Chen, Shuguang Cui, Feng Yin

Abstract:

A Gaussian process (GP) with a spectral mixture (SM) kernel demonstrates flexible non-parametric Bayesian learning ability in modeling an unknown function. In this work, a novel time-varying and non-stationary convolution spectral mixture (TN-CSM) kernel is introduced, with significantly enhanced interpretability obtained by using process convolution. A way of decomposing the SM component into an auto-convolution of a base SM component and parameterizing it to be input dependent is outlined. Performing a convolution between two base SM components then yields a novel structure of non-stationary SM component with much better generalized expression and interpretation. The TN-CSM remains compatible with the stationary SM kernel in terms of kernel form and spectral base, aspects that were ignored or confused by previous non-stationary kernels. On synthetic and real-world datasets, experiments show the time-varying characteristics of the hyper-parameters in TN-CSM and compare the learning performance of TN-CSM with popular and representative non-stationary GPs.
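
For orientation, the stationary SM kernel that TN-CSM generalizes can be written in one dimension as k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi mu_q tau), where w_q, mu_q and v_q are the weight, mean frequency and variance of the q-th spectral component. The sketch below implements only this standard stationary form, assuming one-dimensional inputs. It is not the TN-CSM kernel, and the function and argument names are illustrative.

```python
import numpy as np


def spectral_mixture_kernel(tau, weights, means, variances):
    """Stationary 1-D spectral mixture kernel evaluated at lag(s) tau.

    tau       : scalar or array of input differences x - x'
    weights   : mixture weights w_q
    means     : spectral means mu_q (frequencies)
    variances : spectral variances v_q
    """
    tau = np.asarray(tau, dtype=float)[..., None]   # broadcast over components
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    envelope = np.exp(-2.0 * np.pi ** 2 * tau ** 2 * v)
    return np.sum(w * envelope * np.cos(2.0 * np.pi * tau * mu), axis=-1)
```

Here the hyper-parameters are constants. The abstract's contribution is to let such parameters vary with the input, which this stationary sketch does not attempt.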

Keywords: Gaussian process, spectral mixture, non-stationary, convolution

Procedia PDF Downloads 156
97 Gaussian Particle Flow Bernoulli Filter for Single Target Tracking

Authors: Hyeongbok Kim, Lingling Zhao, Xiaohong Su, Junjie Wang

Abstract:

The Bernoulli filter is a precise Bayesian filter for single target tracking based on the random finite set theory. The standard Bernoulli filter often underestimates the number of targets. This study proposes a Gaussian particle flow (GPF) Bernoulli filter employing particle flow to migrate particles from prior to posterior positions to improve the performance of the standard Bernoulli filter. By employing the particle flow filter, the computational speed of the Bernoulli filters is significantly improved. In addition, the GPF Bernoulli filter provides a more accurate estimation compared with that of the standard Bernoulli filter. Simulation results confirm the improved tracking performance and computational speed in two- and three-dimensional scenarios compared with other algorithms.

Keywords: Bernoulli filter, particle filter, particle flow filter, random finite sets, target tracking

Procedia PDF Downloads 51
96 Wireless Sensor Anomaly Detection Using Soft Computing

Authors: Mouhammd Alkasassbeh, Alaa Lasasmeh

Abstract:

We live in an era of rapid development driven by significant scientific growth. Among other technologies, wireless sensor networks (WSNs) play one of the main roles. Built on WSNs, ZigBee adds many features to devices, such as minimal cost and power consumption and increased range and connectivity of sensor nodes. ZigBee technology has come to be used in various fields, including science, engineering, and networks, and even in medical aspects of intelligent buildings. In this work, we generated two main datasets, the first based on a tree topology and the second on a star topology. The datasets were evaluated by three machine learning (ML) algorithms: J48, meta.j48 and multilayer perceptron (MLP). The traffic in each topology was classified as normal or abnormal (attack) network traffic. The datasets used in our work contained simulated data from Network Simulator 2 (NS2). For both datasets, the Bayesian network meta.j48 classifier achieved the highest accuracy among the classifiers, at 99.7% and 99.2%, respectively.

Keywords: IDS, Machine learning, WSN, ZigBee technology

Procedia PDF Downloads 509
95 Forecasting Model to Predict Dengue Incidence in Malaysia

Authors: W. H. Wan Zakiyatussariroh, A. A. Nasuhar, W. Y. Wan Fairos, Z. A. Nazatul Shahreen

Abstract:

Forecasting dengue incidence in a population can provide useful information to facilitate the planning of public health interventions. Many studies on dengue cases in Malaysia have been conducted, but they are limited in modeling outbreaks and forecasting incidence. This article attempts to propose the most appropriate time series model to explain the behavior of dengue incidence in Malaysia for the purpose of forecasting future dengue outbreaks. Several seasonal auto-regressive integrated moving average (SARIMA) models were developed to model Malaysia’s weekly dengue incidence using data collected from January 2001 to December 2011. The SARIMA(2,1,1)(1,1,1)52 model, with a seasonal period of 52 weeks, was found to be the most suitable model for Malaysia’s dengue incidence, with the lowest values of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) for in-sample fitting. The models were further evaluated for out-of-sample forecast accuracy using four different accuracy measures. The results indicate that SARIMA(2,1,1)(1,1,1)52 performed well in both in-sample fitting and out-of-sample evaluation.
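The selection rule described above, choosing the candidate with the lowest AIC and BIC, can be sketched as follows. The formulas are the standard definitions; the candidate names, log-likelihoods, parameter counts, and sample size are hypothetical and are not taken from the study.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fitted candidates: name -> (maximized log-likelihood, parameter count)
candidates = {
    "model_a": (-1510.0, 4),
    "model_b": (-1500.0, 5),
    "model_c": (-1499.5, 8),
}
n_obs = 500  # hypothetical number of observations used in fitting

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))
```

Because BIC penalizes each parameter by ln n rather than 2, it tends to favor smaller models than AIC once n exceeds about 7; agreement between the two, as the abstract reports, is a stronger signal than either alone.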

Keywords: time series modeling, Box-Jenkins, SARIMA, forecasting

Procedia PDF Downloads 439
94 A Geographic Information System Mapping Method for Creating Improved Satellite Solar Radiation Dataset Over Qatar

Authors: Sachin Jain, Daniel Perez-Astudillo, Dunia A. Bachour, Antonio P. Sanfilippo

Abstract:

The future of solar energy in Qatar is evolving steadily. Hence, high-quality spatial solar radiation data is of the utmost importance for any planning and commissioning of solar technology. Generally, two types of solar radiation data are available: satellite data and ground observations. Satellite solar radiation data is derived from physical and statistical models. Ground data is collected by solar radiation measurement stations. The ground data is of high quality; however, it is limited to scattered point locations because of the high cost of installing and maintaining ground stations. Satellite solar radiation data, on the other hand, is continuous and available across geographical locations, but it is less accurate than ground data. To combine the advantages of both data types, a product has been developed here that provides spatial continuity and higher accuracy than either data source alone. The popular satellite database National Solar Radiation Data Base, NSRDB (PSM V3 model, spatial resolution: 4 km), is chosen here for merging with ground-measured solar radiation data in Qatar. The spatial distribution of ground solar radiation measurement stations is comprehensive in Qatar, with a network of 13 ground stations. The monthly average of the daily total Global Horizontal Irradiation (GHI) component from ground and satellite data is used for error analysis. Normalized root mean square error (NRMSE) values of 3.31%, 6.53%, and 6.63% were observed for October, November, and December 2019, respectively, when comparing in-situ and NSRDB data. The method is based on the Empirical Bayesian Kriging Regression Prediction model available in ArcGIS, ESRI. The workflow of the algorithm combines regression and kriging methods. A regression model (OLS, ordinary least squares) is fitted between the ground and NSRDB data points.
A semi-variogram model is fitted to the experimental semi-variogram obtained from the residuals. The kriging residuals obtained after fitting the semi-variogram model were added to the NSRDB values predicted by the regression model to obtain the final predicted values. The NRMSE values obtained after merging are 1.84%, 1.28%, and 1.81% for October, November, and December 2019, respectively. One more explanatory variable, the ground elevation, has been incorporated in the regression and kriging methods to reduce the error and to provide higher spatial resolution (30 m). The final GHI maps have been created after merging, and NRMSE values of 1.24%, 1.28%, and 1.28% have been observed for October, November, and December 2019, respectively. The proposed merging method has proven to be highly accurate. An additional method is also proposed here to generate calibrated maps using the regression and kriging model, and then to use the calibrated model to generate solar radiation maps from the explanatory variable alone when not enough historical ground data is available for long-term analysis. The NRMSE values obtained by comparing the calibrated maps with ground data are 5.60% and 5.31% for November and December 2019, respectively.
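The regression step and the error metric in this workflow can be sketched as below. This is a minimal illustration, not the ArcGIS implementation: the kriging of residuals is omitted, NRMSE is normalized by the mean of the ground values (an assumption, since the abstract does not state the normalization), and all GHI values are hypothetical.

```python
import math

def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def nrmse_percent(observed, predicted):
    """RMSE divided by the mean of the observations, in percent (assumed normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)

# Hypothetical monthly mean daily GHI (kWh/m^2) at four stations
satellite = [5.0, 5.5, 6.0, 6.5]
ground = [4.8, 5.4, 5.9, 6.6]

intercept, slope = fit_ols(satellite, ground)
regressed = [intercept + slope * s for s in satellite]
residuals = [g - r for g, r in zip(ground, regressed)]  # these would be kriged in the full method

error_before = nrmse_percent(ground, satellite)
error_after = nrmse_percent(ground, regressed)
```

In the full method, the residuals would be interpolated by kriging and added back to the regression prediction at every grid cell, which is where the spatial continuity comes from.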

Keywords: global horizontal irradiation, GIS, empirical Bayesian kriging regression prediction, NSRDB

Procedia PDF Downloads 55
93 The First Complete Mitochondrial Genome of Melon Thrips, Thrips palmi (Thripinae: Thysanoptera): Vector for Tospoviruses

Authors: Kaomud Tyagi, Rajasree Chakraborty, Shantanu Kundu, Devkant Singha, Kailash Chandra, Vikas Kumar

Abstract:

The melon thrips, Thrips palmi, is a serious pest of a wide range of agricultural crops and also acts as a vector for plant viruses (genus Tospovirus, family Bunyaviridae). More molecular data on this species are required to understand its cryptic speciation and evolutionary affiliations. Mitochondrial genomes have been widely used in phylogenetic and evolutionary studies of insects. So far, mitogenomes of five thrips species (Anaphothrips obscurus, Frankliniella intonsa, Frankliniella occidentalis, Scirtothrips dorsalis and Thrips imaginis) are available in the GenBank database. In this study, we sequenced the first complete mitogenome of T. palmi and compared it with the available thrips mitogenomes. We assembled the mitogenome from whole genome sequencing data generated using Illumina Hiseq2500. Annotation was performed using the MITOS web server to estimate the locations of protein-coding genes (PCGs), transfer RNAs (tRNAs), ribosomal RNAs (rRNAs) and their secondary structures. The boundaries of PCGs and rRNAs were confirmed manually in NCBI. Phylogenetic analyses were performed on the 13 PCGs using maximum likelihood (ML) in PAUP and Bayesian inference (BI) in MrBayes 3.2. The complete mitogenome of T. palmi was 15,333 base pairs (bp), which was larger than the genomes of A. obscurus (14,890 bp), F. intonsa (15,215 bp), F. occidentalis (14,889 bp) and S. dorsalis South Asia strain (SA1) (14,283 bp), but smaller than the genomes of T. imaginis (15,407 bp) and S. dorsalis East Asia strain (EA1) (15,343 bp). As in other thrips species, the mitochondrial genome of T. palmi comprised 37 genes, including 13 PCGs, large and small ribosomal RNA (rrnL and rrnS) genes, 22 transfer RNA (tRNA) genes (with one extra gene for trn-Serine) and two A+T-rich control regions (CR1 and CR2). Thirty-one genes were observed on the heavy (H) strand and six genes on the light (L) strand.
Six tRNA genes (trnG, trnK, trnY, trnW, trnF, and trnH) were found to be conserved in all thrips mitogenomes in their locations relative to a protein-coding or rRNA gene upstream or downstream. The gene arrangement of T. palmi is very close to that of T. imaginis except for rearrangements in tRNA genes: trnR (arginine) and trnE (glutamic acid), which are located between cox3 and CR2 in T. imaginis, were translocated between atp6 and CR1 in T. palmi; trnL1 (leucine) and trnS1 (serine), which are located between atp6 and CR1 in T. imaginis, were translocated between cox3 and CR2 in T. palmi. The location of CR1 upstream of the nad5 gene, suggested to be the ancestral condition of thrips species in the subfamily Thripinae, was also observed in T. palmi. The maximum likelihood (ML) and Bayesian inference (BI) phylogenetic trees had similar topologies. T. palmi clustered with T. imaginis. We conclude that more molecular data on diverse thrips species from different hierarchical levels are needed to understand the phylogenetic and evolutionary relationships among them.

Keywords: thrips, comparative mitogenomics, gene rearrangements, phylogenetic analysis

Procedia PDF Downloads 134
92 Performance and Limitations of Likelihood Based Information Criteria and Leave-One-Out Cross-Validation Approximation Methods

Authors: M. A. C. S. Sampath Fernando, James M. Curran, Renate Meyer

Abstract:

Model assessment, in the Bayesian context, involves evaluating goodness-of-fit and comparing several candidate models for predictive accuracy and improvements. In posterior predictive checks, data simulated under the fitted model are compared with the actual data. Predictive model accuracy is estimated using information criteria such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the deviance information criterion (DIC), and the Watanabe-Akaike information criterion (WAIC). The goal of an information criterion is to obtain an unbiased measure of out-of-sample prediction error. Since posterior checks use the data twice, once for model estimation and once for testing, a bias correction that penalises model complexity is incorporated in these criteria. Cross-validation (CV) is another method for examining out-of-sample prediction accuracy. Leave-one-out cross-validation (LOO-CV) is the most computationally expensive CV variant, as it fits as many models as there are observations. Importance sampling (IS), truncated importance sampling (TIS) and Pareto-smoothed importance sampling (PSIS) are generally used as approximations to exact LOO-CV; they reuse the existing MCMC results and so avoid the expensive refitting. The reciprocals of the predictive densities calculated over posterior draws for each observation are treated as the raw importance weights. These are in turn used to calculate the approximate LOO-CV of the observation as a weighted average of posterior densities. In IS-LOO, the raw weights are used directly. In contrast, the larger weights are replaced by their modified truncated weights in calculating TIS-LOO and PSIS-LOO. Although information criteria and LOO-CV are unable to reflect goodness-of-fit in an absolute sense, their differences can be used to measure the relative performance of the models of interest.
However, the use of these measures is only valid under specific circumstances. This study developed 11 models using normal, log-normal, gamma, and Student’s t distributions to improve PCR stutter prediction with forensic data. These models comprise four with profile-wide variances, four with locus-specific variances, and three two-component mixture models. The mean stutter ratio in each model is modeled as a locus-specific simple linear regression against a feature of the alleles under study known as the longest uninterrupted sequence (LUS). The use of AIC, BIC, DIC, and WAIC in model comparison has some practical limitations. Even though IS-LOO, TIS-LOO, and PSIS-LOO are considered approximations of exact LOO-CV, the study observed some drastic deviations in the results. However, there are some interesting relationships among the logarithms of pointwise predictive densities (lppd) calculated under WAIC and the LOO approximation methods. The estimated overall lppd is a relative measure that reflects the overall goodness-of-fit of the model. Parallel log-likelihood profiles for the models conditional on equal posterior variances in lppds were observed. This study illustrates the limitations of the information criteria in practical model comparison problems. In addition, the relationships among LOO-CV approximation methods and WAIC with their limitations are discussed. Finally, useful recommendations that may help in practical model comparisons with these methods are provided.
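The importance-sampling approximation described in this abstract can be sketched for a single observation as follows. The raw weights are the reciprocals of the predictive densities over posterior draws; the truncation cap of sqrt(S) times the mean weight is the commonly cited TIS rule and is an assumption here, as are the function names and the density values. PSIS is not shown.

```python
import math

def lppd_pointwise(densities):
    """Log pointwise predictive density: log of the mean density over posterior draws."""
    return math.log(sum(densities) / len(densities))

def is_loo_pointwise(densities, truncate=False):
    """Approximate log LOO predictive density for one observation.

    densities: p(y_i | theta_s) for each posterior draw s.
    truncate=False gives IS-LOO; truncate=True caps large weights (TIS-LOO).
    """
    n_draws = len(densities)
    weights = [1.0 / p for p in densities]  # raw importance weights
    if truncate:
        cap = math.sqrt(n_draws) * sum(weights) / n_draws  # assumed truncation rule
        weights = [min(w, cap) for w in weights]
    weighted_sum = sum(w * p for w, p in zip(weights, densities))
    return math.log(weighted_sum / sum(weights))

# Hypothetical densities for one observation; the last draw fits it very poorly
densities = [0.5, 0.5, 0.5, 0.001]
lppd_i = lppd_pointwise(densities)
is_loo_i = is_loo_pointwise(densities)
tis_loo_i = is_loo_pointwise(densities, truncate=True)
```

With raw weights the estimate is the harmonic mean of the densities, so a single poorly fitting draw can dominate it; truncation limits that influence, which is one way the three approximations can diverge from each other. A practical implementation would work in log space to avoid overflow.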

Keywords: cross-validation, importance sampling, information criteria, predictive accuracy

Procedia PDF Downloads 359
91 Assessment of Potential Chemical Exposure to Betamethasone Valerate and Clobetasol Propionate in Pharmaceutical Manufacturing Laboratories

Authors: Nadeen Felemban, Hamsa Banjer, Rabaah Jaafari

Abstract:

One of the most common hazards in the pharmaceutical industry is the chemical hazard, which can cause harm or lead to occupational diseases through chronic exposure to hazardous substances. Therefore, a chemical agent management system is required, including hazard identification, risk assessment, controls for specific hazards, and inspections, to keep the workplace healthy and safe. Routine management monitoring is also required to verify the effectiveness of the control measures. Betamethasone Valerate and Clobetasol Propionate are APIs (Active Pharmaceutical Ingredients) with a highly hazardous classification, Occupational Hazard Category 4 (OHC 4), which requires full containment (ECA-D) during handling to avoid chemical exposure. According to the Safety Data Sheet, these chemicals are reproductive toxicants (reprotoxicant H360D), which may affect female workers’ health and cause fatal damage to an unborn child or impair fertility. In this study, a qualitative chemical risk assessment (qCRA) was conducted to assess chemical exposure during the handling of Betamethasone Valerate and Clobetasol Propionate in pharmaceutical laboratories. The qCRA identified a risk of potential chemical exposure (risk rating 8, Amber risk). Therefore, immediate actions were taken to ensure that interim controls (according to the hierarchy of controls) are in place and in use to minimize the risk of chemical exposure. No open handling should be done outside the Steroid Glove Box Isolator (SGB), and the required Personal Protective Equipment (PPE) should be worn. The PPE includes a coverall, nitrile gloves, safety shoes and a powered air-purifying respirator (PAPR). Furthermore, a quantitative assessment (personal air sampling) was conducted to verify the effectiveness of the engineering controls (SGB Isolator) and to confirm whether there is chemical exposure, as indicated earlier by the qCRA.
Three personal air samples were collected using an air sampling pump and filter (IOM2 filters, 25 mm glass fiber media). The collected samples were analyzed by HPLC in the BV lab, and the measured concentrations were reported in µg/m³ with reference to the 8-hour Occupational Exposure Limits (8hr OELs, 8hr TWA) for each analyte. The analytical results are expressed as 8hr TWA (8-hour time-weighted average) values so that they can be analyzed using Bayesian statistics (IHDataAnalyst). The Bayesian Likelihood Graph indicates category 0, which means that exposures are de minimis, trivial, or non-existent: employees have little to no exposure. These results also indicate that the three samples are representative, with very low variation (SD = 0.0014). In conclusion, the engineering controls were effective in protecting the operators from such exposure. However, routine chemical monitoring is required every 3 years unless there is a change in the process or type of chemicals. Also, frequent management monitoring (daily, weekly, and monthly) is required to ensure the control measures are in place and in use. Furthermore, a Similar Exposure Group (SEG) was identified in this activity and included in the annual health surveillance for health monitoring.
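The conversion of sampled concentrations to an 8-hour time-weighted average, mentioned above, follows the standard formula: the sum of concentration times duration, divided by 8 hours. The sketch below assumes zero exposure during any unsampled part of the shift; the sample values are hypothetical and are not the study's measurements.

```python
def eight_hour_twa(samples):
    """8-hour TWA from (concentration in ug/m3, duration in hours) pairs.

    Unsampled time within the 8 hours is treated as zero exposure (assumption).
    """
    total_hours = sum(duration for _, duration in samples)
    if total_hours > 8.0:
        raise ValueError("sampled time exceeds the 8-hour reference period")
    return sum(conc * duration for conc, duration in samples) / 8.0

# Hypothetical personal air samples for one operator
samples = [(0.02, 4.0), (0.01, 4.0)]
twa = eight_hour_twa(samples)
```

The resulting TWA is what would be compared against the 8-hour OEL for each analyte.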

Keywords: occupational health and safety, risk assessment, chemical exposure, hierarchy of control, reproductive

Procedia PDF Downloads 145
90 A Game of Information in Defense/Attack Strategies: Case of Poisson Attacks

Authors: Asma Ben Yaghlane, Mohamed Naceur Azaiez

Abstract:

In this paper, we briefly introduce the concept of Poisson attacks in the case of defense/attack strategies where attacks are assumed to be continuous. We suggest a game model in which the attacker launches an attack only when two criteria are both met: a sufficient confidence level of a successful attack and a reasonably small estimation error. Here, estimation error arises from assessing the system failure upon attack using aggregate data at the system level. The corresponding error is referred to as aggregation error. The defender, on the other hand, will attempt to deter an attack by making one or both criteria inapplicable. The defender will build his/her strategy by both strengthening the targeted system and increasing the size of the error. We formulate the defender's problem using appropriate optimization models. The attacker will opt for Bayesian updating in assessing the impact of the improvement made by the defender. Then, the attacker will evaluate the feasibility of the attack before deciding whether or not to launch it. We provide illustrations to better explain the process.

Keywords: attacker, defender, game theory, information

Procedia PDF Downloads 424
89 Modeling Food Popularity Dependencies Using Social Media Data

Authors: Devashish Khulbe, Manu Pathak

Abstract:

The rise in popularity of major social media platforms has enabled people to share photos and textual information about their daily lives. One of the popular topics about which information is shared is food. Since much of the media about food is attributed to particular locations and restaurants, information such as the spatio-temporal popularity of various cuisines can be analyzed. Tracking the popularity of food types and retail locations across space and time can also be useful for business owners and restaurant investors. In this work, we present an approach using off-the-shelf machine learning techniques to identify trends and popularity of cuisine types in an area using geo-tagged data from social media, Google images and Yelp. After adjusting for time, we use Kernel Density Estimation to find hot spots across the location and model the dependencies among cuisine popularities using Bayesian Networks. We consider the Manhattan borough of New York City as the location for our analyses, but the approach can be used for any area with social media data and information about retail businesses.
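The hot-spot step can be illustrated with a small kernel density estimate over geo-tagged points. This is a sketch under stated assumptions: an isotropic Gaussian kernel with a fixed bandwidth, planar coordinates in kilometres, and hypothetical post locations; the abstract does not specify the kernel, bandwidth, or coordinate handling the authors used.

```python
import math

def gaussian_kde_2d(query, points, bandwidth):
    """Isotropic 2-D Gaussian kernel density estimate at `query`."""
    qx, qy = query
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        sq_dist = (qx - px) ** 2 + (qy - py) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total

# Hypothetical geo-tagged posts for one cuisine (projected x, y in km)
posts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (3.0, 3.0)]
dense = gaussian_kde_2d((0.1, 0.1), posts, bandwidth=0.5)
sparse = gaussian_kde_2d((1.5, 1.5), posts, bandwidth=0.5)
```

Evaluating the estimate on a grid and keeping the cells with the highest density gives candidate hot spots; the bandwidth controls how far each post's influence spreads.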

Keywords: Web Mining, Geographic Information Systems, Business popularity, Spatial Data Analyses

Procedia PDF Downloads 75
88 Development of Computational Approach for Calculation of Hydrogen Solubility in Hydrocarbons for Treatment of Petroleum

Authors: Abdulrahman Sumayli, Saad M. AlShahrani

Abstract:

For the hydrogenation process, knowing the solubility of hydrogen (H2) in hydrocarbons is critical to improving the efficiency of the process. We investigated the computation of H2 solubility in four heavy crude oil feedstocks using machine learning techniques. Temperature, pressure, and feedstock type were considered as the inputs to the models, while the hydrogen solubility was the sole response. Specifically, we employed three different models: support vector regression (SVR), Gaussian process regression (GPR), and Bayesian ridge regression (BRR). To achieve the best performance, the hyper-parameters of these models were optimized using the whale optimization algorithm (WOA). We evaluated the models using a dataset of solubility measurements in various feedstocks and compared their performance on several metrics. Our results show that the WOA-tuned SVR model (WOA-SVR) achieves the best performance overall, with an RMSE of 1.38 × 10⁻² and an R-squared of 0.991. These findings suggest that machine learning techniques can provide accurate predictions of hydrogen solubility in different feedstocks, which could be useful in the development of hydrogen-related technologies. In addition, the solubility of hydrogen in the four heavy oil fractions is estimated over temperature and pressure ranges of 150 °C–350 °C and 1.2 MPa–10.8 MPa, respectively.

Keywords: temperature, pressure variations, machine learning, oil treatment

Procedia PDF Downloads 35
87 Medical Knowledge Management since the Integration of Heterogeneous Data until the Knowledge Exploitation in a Decision-Making System

Authors: Nadjat Zerf Boudjettou, Fahima Nader, Rachid Chalal

Abstract:

Knowledge management consists of acquiring and representing knowledge relevant to a domain, a task or a specific organization in order to facilitate its access, reuse and evolution. This usually means building, maintaining and evolving an explicit representation of knowledge. The next step is to provide access to that knowledge, that is, to disseminate it so that it can be used effectively. Knowledge management in the medical field aims to improve the performance of the medical organization by allowing individuals in the care facility (doctors, nurses, paramedics, etc.) to capture, share and apply collective knowledge in order to make optimal decisions in real time. In this paper, we propose a knowledge management approach that combines a technique for integrating heterogeneous data in the medical field by creating a data warehouse, a technique for extracting knowledge from medical data by choosing a data mining technique, and finally a technique for exploiting that knowledge in a case-based reasoning system.

Keywords: data warehouse, data mining, knowledge discovery in database, KDD, medical knowledge management, Bayesian networks

Procedia PDF Downloads 353
86 Optimizing the Capacity of a Convolutional Neural Network for Image Segmentation and Pattern Recognition

Authors: Yalong Jiang, Zheru Chi

Abstract:

In this paper, we study the factors that determine the capacity of a Convolutional Neural Network (CNN) model and propose ways to evaluate and adjust the capacity of a CNN model to best match a specific pattern recognition task. Firstly, a scheme is proposed to adjust the number of independent functional units within a CNN model so that it better fits a task. Secondly, the number of independent functional units in the capsule network is adjusted to fit it to the training dataset. Thirdly, a method based on Bayesian GAN is proposed to enrich the variances in the current dataset to increase its complexity. Experimental results on the PASCAL VOC 2010 Person Part dataset and the MNIST dataset show that, in both conventional CNN models and capsule networks, the number of independent functional units is an important factor that determines the capacity of a network model. By adjusting the number of functional units, the capacity of a model can better match the complexity of a dataset.

Keywords: CNN, convolutional neural network, capsule network, capacity optimization, character recognition, data augmentation, semantic segmentation

Procedia PDF Downloads 114
85 Facility Anomaly Detection with Gaussian Mixture Model

Authors: Sunghoon Park, Hank Kim, Jinwon An, Sungzoon Cho

Abstract:

The Internet of Things allows one to collect data from facilities, which are then used to monitor them and even predict malfunctions in advance. Conventional quality control methods focus on setting a normal range for a sensor value, defined between a lower control limit and an upper control limit, and declaring anything falling outside it an anomaly. However, interactions among sensor values are ignored, leading to suboptimal performance. We propose a multivariate approach that takes many sensor values into account at the same time. In particular, a Gaussian Mixture Model is used, trained to maximize the likelihood using the Expectation-Maximization algorithm. The number of Gaussian component distributions is determined by the Bayesian Information Criterion. The negative log-likelihood value is used as an anomaly score. The actual usage scenario is as follows. For each instance of sensor values from a facility, an anomaly score is computed. If it is larger than a threshold, an alarm goes off and a human expert intervenes and checks the system. Real-world data from a building energy system were used to test the model.
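The scoring rule in this abstract, negative log-likelihood under a fitted mixture compared against a threshold, can be sketched as follows. For brevity the sketch is one-dimensional, whereas the paper's approach is multivariate, and it takes the mixture parameters as given rather than fitting them with Expectation-Maximization; the parameter values and the threshold are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def anomaly_score(x, weights, means, stds):
    """Negative log-likelihood of x under a 1-D Gaussian mixture."""
    density = sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))
    return -math.log(density)

def is_anomaly(x, weights, means, stds, threshold):
    """Raise an alarm when the anomaly score exceeds the threshold."""
    return anomaly_score(x, weights, means, stds) > threshold

# Hypothetical fitted mixture for a temperature sensor (degrees C)
weights = [0.6, 0.4]
means = [20.0, 25.0]
stds = [1.0, 2.0]
threshold = 6.0  # hypothetical alarm threshold

normal_reading = is_anomaly(20.0, weights, means, stds, threshold)
odd_reading = is_anomaly(40.0, weights, means, stds, threshold)
```

Readings near either mixture component get a low score, while readings far from both get a high one. For extreme inputs the density can underflow to zero, so a production version would compute the score in log space.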

Keywords: facility anomaly detection, Gaussian mixture model, anomaly score, expectation maximization algorithm

Procedia PDF Downloads 238