Search results for: Naïve Bayesian
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 443

173 The Regulation of the Cancer Epigenetic Landscape Lies in the Realm of the Long Non-coding RNAs

Authors: Ricardo Alberto Chiong Zevallos, Eduardo Moraes Rego Reis

Abstract:

Pancreatic ductal adenocarcinoma (PDAC) patients have a less than 10% 5-year survival rate, and PDAC has no well-defined diagnostic or prognostic biomarkers. Gemcitabine is the first-line drug in PDAC and several other cancers. Long non-coding RNAs (lncRNAs) contribute to tumorigenesis and are potential biomarkers for PDAC. Although lncRNAs are not translated into proteins, they have important functions: they can decoy or recruit proteins of the epigenetic machinery, act as microRNA sponges, participate in protein translocation between cellular compartments, and even promote chemoresistance. The chromatin remodeling enzyme EZH2 is a histone methyltransferase that catalyzes the methylation of histone 3 at lysine 27, silencing local gene expression. EZH2 is ambivalent: it can also activate gene expression independently of its histone methyltransferase activity. EZH2 is overexpressed in several cancers and interacts with lncRNAs, which recruit it to specific loci, where it can activate an oncogene or silence a tumor suppressor. The misregulation of lncRNAs in cancer can therefore result in differential recruitment of EZH2 and a distinct epigenetic landscape, promoting chemoresistance. The relevance of the EZH2-lncRNA interaction to chemoresistant PDAC was assessed by real-time quantitative PCR (RT-qPCR) and RNA immunoprecipitation (RIP) experiments with naïve and gemcitabine-resistant PDAC cells. The expression of several lncRNAs and EZH2 gene targets was evaluated in naïve versus resistant cells, with candidate genes selected by bioinformatic analysis and literature curation. Indeed, the resistant cell line showed higher expression of chemoresistance-associated lncRNAs and protein-coding genes. RIP detected lncRNAs interacting with EZH2 at varying intensities in the cell lines. During RIP, the nuclear fraction of the cells was incubated with an antibody against EZH2 and with magnetic beads. The RNA precipitated with the bead-antibody-EZH2 complex was isolated and reverse transcribed. Candidate lncRNAs were detected by RT-qPCR, and enrichment was calculated relative to the INPUT (a total-lysate control sample collected before RIP). Enrichment levels varied across the lncRNAs and cell lines. The EZH2-lncRNA interaction might be responsible for the regulation of chemoresistance-associated genes in multiple cancers. The relevance of the lncRNA-EZH2 interaction to PDAC was further assessed by siRNA knockdown of a lncRNA, followed by analysis of EZH2 target expression by RT-qPCR. Chromatin immunoprecipitation (ChIP) of EZH2 and H3K27me3, followed by RT-qPCR with primers for EZH2 targets, also assessed the specificity of EZH2 recruitment by the lncRNA. This is the first report of the interaction of EZH2 with the lncRNAs HOTTIP and PVT1 in chemoresistant PDAC. HOTTIP and PVT1 have been described as promoting chemoresistance in several cancers, but the role of EZH2 has not been clarified. For the first time, the lncRNA LINC01133 was detected in a chemoresistant cancer. The interaction of EZH2 with LINC02577, LINC00920, LINC00941, and LINC01559 has never been reported in any context. These novel lncRNA-EZH2 interactions regulate chemoresistance-associated genes in PDAC and might be relevant to other cancers. Therapies targeting EZH2 alone have not been successful, and a combinatorial approach that also targets the lncRNAs interacting with it might be key to overcoming chemoresistance in several cancers.
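
For illustration, a minimal sketch of one common way to compute RIP enrichment relative to INPUT (the percent-input method) is shown below; the 1% input fraction and the Ct values are hypothetical, and the abstract does not specify the exact formula used.

```python
import numpy as np

def rip_percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent-input enrichment for one RIP-qPCR target.

    ct_ip: Ct of the target in the EZH2-immunoprecipitated RNA.
    ct_input: Ct of the same target in the INPUT (total lysate) sample.
    input_fraction: fraction of lysate saved as INPUT (hypothetical 1%).
    """
    # Adjust the INPUT Ct as if 100% of the lysate had been assayed.
    ct_input_adj = ct_input - np.log2(1.0 / input_fraction)
    # Percent-input formula: 100 * 2^(adjusted input Ct - IP Ct).
    return 100.0 * 2.0 ** (ct_input_adj - ct_ip)

# Hypothetical Ct values for one lncRNA in naive vs. gemcitabine-resistant cells.
print(rip_percent_input(ct_ip=27.1, ct_input=24.5))
print(rip_percent_input(ct_ip=25.3, ct_input=24.4))
```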

Keywords: epigenetics, chemoresistance, long non-coding RNAs, pancreatic cancer, histone modification

Procedia PDF Downloads 61
172 Gaussian Particle Flow Bernoulli Filter for Single Target Tracking

Authors: Hyeongbok Kim, Lingling Zhao, Xiaohong Su, Junjie Wang

Abstract:

The Bernoulli filter is a precise Bayesian filter for single-target tracking based on random finite set theory. The standard Bernoulli filter often underestimates the number of targets. This study proposes a Gaussian particle flow (GPF) Bernoulli filter that employs particle flow to migrate particles from prior to posterior positions in order to improve the performance of the standard Bernoulli filter. By employing the particle flow filter, the computational speed of the Bernoulli filter is significantly improved. In addition, the GPF Bernoulli filter provides more accurate estimation than the standard Bernoulli filter. Simulation results confirm the improved tracking performance and computational speed in two- and three-dimensional scenarios compared with other algorithms.
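
For illustration, a minimal sketch of a particle-flow migration step is shown below, using the exact Daum-Huang flow for a linear-Gaussian measurement model; the prior, measurement matrix and noise values are hypothetical, and the paper's Gaussian particle flow implementation may differ in detail.

```python
import numpy as np

def edh_particle_flow(particles, x_bar, P, H, R, z, n_steps=20):
    """Migrate particles from prior to posterior with the exact
    Daum-Huang flow for the linear-Gaussian measurement z = H x + v."""
    d_lambda = 1.0 / n_steps
    lambdas = np.arange(n_steps) * d_lambda + 0.5 * d_lambda  # midpoint rule
    x = particles.copy()
    I = np.eye(P.shape[0])
    for lam in lambdas:
        S = lam * H @ P @ H.T + R
        A = -0.5 * P @ H.T @ np.linalg.solve(S, H)
        b = (I + 2 * lam * A) @ ((I + lam * A) @ P @ H.T @ np.linalg.solve(R, z) + A @ x_bar)
        # Euler step of dx/dlambda = A x + b for every particle.
        x = x + d_lambda * (x @ A.T + b)
    return x

# Toy 2-D example: position-only measurement of a single target.
rng = np.random.default_rng(0)
P = np.diag([4.0, 1.0])                    # prior covariance (assumed)
x_bar = np.array([0.0, 1.0])               # prior mean
H = np.array([[1.0, 0.0]])                 # observe position only
R = np.array([[0.5]])
z = np.array([1.8])                        # hypothetical measurement
prior_particles = rng.multivariate_normal(x_bar, P, size=500)
posterior_particles = edh_particle_flow(prior_particles, x_bar, P, H, R, z)
print(posterior_particles.mean(axis=0))
```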

Keywords: Bernoulli filter, particle filter, particle flow filter, random finite sets, target tracking

Procedia PDF Downloads 56
171 Cross Project Software Fault Prediction at Design Phase

Authors: Pradeep Singh, Shrish Verma

Abstract:

Software fault prediction models are created using source code, metrics computed from the same or a previous version of the code, and the related fault data. Some companies do not store or track all the artifacts required for software fault prediction. For such companies, training data from other projects can be one potential solution, and the earlier a fault is predicted, the less it costs to correct. The training data consist of metrics data and related fault data at the function/module level. This paper investigates fault prediction at an early stage using cross-project data, focusing on design metrics. In this study, an empirical analysis is carried out to validate design metrics for cross-project fault prediction. The machine learning technique used for evaluation is Naïve Bayes. The design-phase metrics of other projects can be used as an initial guideline for projects where no previous fault data are available. We analyze seven data sets from the NASA Metrics Data Program, which offer design as well as code metrics. Overall, the results of cross-project prediction are comparable to within-company learning.
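
For illustration, a minimal sketch of cross-project fault prediction with Naïve Bayes is shown below, assuming two NASA MDP-style CSV files with design metrics and a binary 'defective' column; the file names and column name are hypothetical.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

source = pd.read_csv("project_A_design_metrics.csv")   # training project
target = pd.read_csv("project_B_design_metrics.csv")   # project with no fault history

features = [c for c in source.columns if c != "defective"]

model = GaussianNB()
model.fit(source[features], source["defective"])        # learn from another project

pred = model.predict(target[features])                  # cross-project prediction
print(classification_report(target["defective"], pred))
```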

Keywords: software metrics, fault prediction, cross project, within project

Procedia PDF Downloads 309
170 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance for diabetic retinopathy (DR). To classify diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed into the classifier algorithms. The dataset used in this study was taken from the University of California, Irvine (UCI) Machine Learning Repository. Feature weighting methods, including fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compared with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k-nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on the combination of subtractive clustering based feature weighting and the decision tree classifier obtained a classification accuracy of 100% in the screening of DR. These results demonstrate that the proposed hybrid scheme is very promising for medical data set classification.
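
For illustration, a minimal sketch of clustering-center-based feature weighting followed by a decision tree classifier is shown below; the weighting rule (scaling each feature by the ratio of its overall magnitude to the magnitude of the k-means cluster centers) is only one common variant and an assumption here, since the abstract does not give the exact fuzzy c-means, subtractive, or Gaussian mixture formulas, and stand-in data replace the UCI dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1151, n_features=19, random_state=0)  # stand-in data

# Per-feature weights derived from k-means cluster centers (assumed variant).
centers = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).cluster_centers_
weights = np.abs(X).mean(axis=0) / (np.abs(centers).mean(axis=0) + 1e-12)
X_weighted = X * weights

score = cross_val_score(DecisionTreeClassifier(random_state=0), X_weighted, y, cv=10).mean()
print(f"10-fold CV accuracy with weighted features: {score:.3f}")
```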

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 301
169 Homing of B Cells via Afferent Lymphatics

Authors: Sara Pereira-Nogueira, Tim Worbs, Marc Permanyer-Bosser, Reinhold Förster

Abstract:

While the mechanisms by which lymphocytes enter the lymph node via the blood are well described, it is still largely unknown how cells that arrive via afferent lymphatics enter lymph nodes. In order to address this, our group has established a micro-injection technique in mice through which cells are delivered directly into the lymphatic vessel immediately afferent to the popliteal lymph node. Injected cells can then be tracked via multi-colour fluorescence or 2-photon microscopy, and their localization can be analysed within the popliteal or downstream lymph nodes by immunohistology. Since naïve B cells express the chemokine receptor CXCR5, we intra-lymphatically co-injected B cells derived from wildtype and Cxcr5-deficient mice. While CXCR5 does not play a role in guiding B cells out of the subcapsular sinus, it affects their positioning within the lymph node parenchyma, since CXCR5-deficient B cells are impaired in migrating into the B cell follicle. The knowledge obtained by studying B-cell migration may prove beneficial in clinical settings regarding tumor metastasis or autoimmune diseases.

Keywords: afferent lymphatics, B cell migration, chemokine, intra-lymphatic injection

Procedia PDF Downloads 234
168 The Best Prediction Data Mining Model for Breast Cancer Probability in Women Residents in Kabul

Authors: Mina Jafari, Kobra Hamraee, Saied Hossein Hosseini

Abstract:

The prediction of breast cancer is one of the challenges in medicine. In this paper, we collected 528 records of women living in Kabul, including demographic, lifestyle, diet and pregnancy data. There are many classification algorithms for breast cancer prediction, and we tried to find the best model with the most accurate results and the lowest error rate. We evaluated common supervised data mining algorithms to find the best model for predicting breast cancer among Afghan women living in Kabul, with the mammography result as the target variable. For evaluating these algorithms, we used cross-validation, a reliable method for measuring the performance of models. After comparing the error rate and accuracy of three models (Decision Tree, Naive Bayes and Rule Induction), the Decision Tree, with an accuracy of 94.06% and an error rate of 15%, was found to be the best model for predicting breast cancer based on the health care records.
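
For illustration, a minimal sketch of comparing classifiers with cross-validation in the spirit of the study is shown below; the survey file and column names are hypothetical, since the Kabul data are not public.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("kabul_breast_cancer_survey.csv")      # hypothetical file
X = df.drop(columns=["mammography_result"])
y = df["mammography_result"]

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: accuracy {acc:.3f}, error rate {1 - acc:.3f}")
```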

Keywords: decision tree, breast cancer, probability, data mining

Procedia PDF Downloads 108
167 Wireless Sensor Anomaly Detection Using Soft Computing

Authors: Mouhammd Alkasassbeh, Alaa Lasasmeh

Abstract:

We live in an era of rapid development as a result of significant scientific growth. Among these technologies, wireless sensor networks (WSNs) play one of the main roles. Built on WSNs, ZigBee adds many features to devices, such as minimal cost and power consumption, and increases the range and connectivity of sensor nodes. ZigBee technology has come to be used in various fields, including science, engineering and networking, and even in medical applications and intelligent buildings. In this work, we generated two main datasets, the first based on a tree topology and the second on a star topology. The datasets were evaluated by three machine learning (ML) algorithms: J48, meta.J48 and multilayer perceptron (MLP). For each topology, network traffic was classified as normal or abnormal (attack). The dataset used in our work contained simulated data from Network Simulator 2 (NS2). In each dataset, the Bayesian network meta.J48 classifier achieved the highest accuracy among the classifiers, 99.7% and 99.2%, respectively.

Keywords: IDS, machine learning, WSN, ZigBee technology

Procedia PDF Downloads 515
166 Intrathecal Fentanyl with 0.5% Bupivacaine Heavy in Chronic Opium Abusers

Authors: Suneet Kathuria, Shikha Gupta, Kapil Dev, Sunil Katyal

Abstract:

Chronic use of opioids in opium abusers can cause poor pain control and increased analgaesic requirements. We compared the duration of spinal anaesthesia in chronic opium abusers and non-abusers. This prospective randomised study included 60 American Society of Anesthesiologists (ASA) Grade I or II adults undergoing surgery under spinal anaesthesia with 10 mg bupivacaine, and 25 μg fentanyl in non-opium abusers (Group A) and chronic opium abusers (Group B), or 40 μg fentanyl in chronic opium abusers (Group C). Patients were assessed for the onset and duration of sensory and motor blockade and the duration of effective analgesia. Mean time to onset of adequate analgesia was significantly longer in chronic opium abusers than in opium-naïve patients. The duration of sensory and motor block was significantly shorter in chronic opium abusers than in non-opium abusers. The duration of effective analgesia in groups A, B and C was 255.55 ± 26.84, 217.85 ± 15.15, and 268.20 ± 18.25 minutes, respectively; this difference was statistically significant. In chronic opium abusers, the duration of spinal anaesthesia is significantly shorter than in opium non-abusers. The duration of spinal anaesthesia with bupivacaine and fentanyl in chronic opium abusers can be improved by increasing the intrathecal fentanyl dose from 25 μg to 40 μg.

Keywords: bupivacaine, chronic opium abusers, fentanyl, intrathecal

Procedia PDF Downloads 265
165 On Estimating the Low Income Proportion with Several Auxiliary Variables

Authors: Juan F. Muñoz-Rosas, Rosa M. García-Fernández, Encarnación Álvarez-Verdejo, Pablo J. Moya-Fernández

Abstract:

Poverty measurement is a very important topic in many studies in the social sciences. One of the most important indicators when measuring poverty is the low income proportion, which gives the proportion of people in a population classified as poor. This indicator is generally unknown and is therefore estimated from survey data obtained through official surveys carried out by statistical agencies such as Eurostat. A key feature of such survey data is that they contain several variables. The variable used to estimate the low income proportion is called the variable of interest. The survey data may also contain several additional variables, known as auxiliary variables, related to the variable of interest; when this is the case, they can be used to improve the estimation of the low income proportion. In this paper, we use Monte Carlo simulation studies to analyze numerically the performance of estimators based on several auxiliary variables. In this simulation study, we considered real data sets obtained from the 2011 European Union Survey on Income and Living Conditions. The results indicate that the estimators based on auxiliary variables are more accurate than the naive estimator.
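
For illustration, a minimal Monte Carlo sketch contrasting the naive estimator of the low income proportion with a simple ratio estimator based on one auxiliary variable is shown below; the synthetic population and poverty line stand in for the EU-SILC data used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
N, n, n_sims = 50_000, 500, 2_000

income = rng.lognormal(mean=10, sigma=0.6, size=N)       # variable of interest
aux = income * rng.normal(1.0, 0.15, size=N)             # correlated auxiliary variable
poverty_line = 0.6 * np.median(income)                   # common relative poverty line
y = (income < poverty_line).astype(float)                # poor / not poor indicator
x = (aux < 0.6 * np.median(aux)).astype(float)           # auxiliary poverty indicator
P_y, P_x = y.mean(), x.mean()                            # true proportions

naive, ratio = [], []
for _ in range(n_sims):
    s = rng.choice(N, size=n, replace=False)             # simple random sample
    naive.append(y[s].mean())
    ratio.append(y[s].mean() / x[s].mean() * P_x)        # ratio estimator (P_x known)

print("true proportion:", round(P_y, 4))
print("naive RMSE:", round(np.sqrt(np.mean((np.array(naive) - P_y) ** 2)), 5))
print("ratio RMSE:", round(np.sqrt(np.mean((np.array(ratio) - P_y) ** 2)), 5))
```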

Keywords: inclusion probability, poverty, poverty line, survey sampling

Procedia PDF Downloads 421
164 Forecasting Model to Predict Dengue Incidence in Malaysia

Authors: W. H. Wan Zakiyatussariroh, A. A. Nasuhar, W. Y. Wan Fairos, Z. A. Nazatul Shahreen

Abstract:

Forecasting dengue incidence in a population can provide useful information to facilitate the planning of public health interventions. Many studies on dengue cases in Malaysia have been conducted, but they are limited in modeling outbreaks and forecasting incidence. This article attempts to propose the most appropriate time series model to explain the behavior of dengue incidence in Malaysia for the purpose of forecasting future dengue outbreaks. Several seasonal auto-regressive integrated moving average (SARIMA) models were developed to model the weekly number of dengue cases in Malaysia using data collected from January 2001 to December 2011. The SARIMA(2,1,1)(1,1,1)52 model was found to be the most suitable for Malaysia's dengue incidence, with the lowest Akaike information criterion (AIC) and Bayesian information criterion (BIC) values for in-sample fitting. The models were further evaluated for out-of-sample forecast accuracy using four different accuracy measures. The results indicate that SARIMA(2,1,1)(1,1,1)52 performed well in both in-sample fitting and out-of-sample evaluation.
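
For illustration, a minimal sketch of fitting the SARIMA(2,1,1)(1,1,1)52 model with statsmodels is shown below; a synthetic weekly series stands in for the Malaysian surveillance data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(1)
weeks = pd.date_range("2001-01-07", "2011-12-25", freq="W")
seasonal = 50 * np.sin(2 * np.pi * np.arange(len(weeks)) / 52.0)
cases = pd.Series(200 + seasonal + rng.normal(0, 20, len(weeks)), index=weeks)

train, test = cases[:-52], cases[-52:]                     # hold out the last year
model = SARIMAX(train, order=(2, 1, 1), seasonal_order=(1, 1, 1, 52))
fit = model.fit(disp=False)
print("AIC:", fit.aic, "BIC:", fit.bic)                    # in-sample criteria

forecast = fit.get_forecast(steps=52).predicted_mean       # out-of-sample forecast
rmse = np.sqrt(((forecast.values - test.values) ** 2).mean())
print("out-of-sample RMSE:", round(rmse, 2))
```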

Keywords: time series modeling, Box-Jenkins, SARIMA, forecasting

Procedia PDF Downloads 447
163 A Geographic Information System Mapping Method for Creating Improved Satellite Solar Radiation Dataset Over Qatar

Authors: Sachin Jain, Daniel Perez-Astudillo, Dunia A. Bachour, Antonio P. Sanfilippo

Abstract:

The future of solar energy in Qatar is evolving steadily; hence, high-quality spatial solar radiation data are of the utmost importance for any planning and commissioning of solar technology. Generally, two types of solar radiation data are available: satellite data and ground observations. Satellite solar radiation data are derived from physical and statistical models, while ground data are collected by solar radiation measurement stations. Ground data are of high quality; however, they are limited to distributed point locations, and the ground stations are costly to install and maintain. On the other hand, satellite solar radiation data are continuous and available across geographical locations, but they are relatively less accurate than ground data. To utilize the advantages of both, a product has been developed here which provides spatial continuity and higher accuracy than either data source alone. The popular satellite database, the National Solar Radiation Database (NSRDB; PSM V3 model, spatial resolution 4 km), is chosen here for merging with ground-based solar radiation measurements in Qatar. The spatial distribution of ground solar radiation measurement stations is comprehensive in Qatar, with a network of 13 ground stations. The monthly average of the daily total Global Horizontal Irradiation (GHI) component from ground and satellite data is used for error analysis. Normalized root mean square error (NRMSE) values of 3.31%, 6.53%, and 6.63% were observed for October, November, and December 2019, respectively, when comparing in-situ and NSRDB data. The method is based on the Empirical Bayesian Kriging Regression Prediction model available in ArcGIS (ESRI). The workflow of the algorithm is based on the combination of regression and kriging methods: a regression model (ordinary least squares, OLS) is fitted between the ground and NSRDB data points, a semi-variogram model is fitted to the experimental semi-variogram obtained from the residuals, and the kriged residuals are then added to the values predicted by the regression model from the NSRDB data to obtain the final predicted values. The NRMSE values obtained after merging are 1.84%, 1.28%, and 1.81% for October, November, and December 2019, respectively. An additional explanatory variable, the ground elevation, has been incorporated in the regression and kriging methods to reduce the error and to provide a higher spatial resolution (30 m). The final GHI maps created after merging show NRMSE values of 1.24%, 1.28%, and 1.28% for October, November, and December 2019, respectively. The proposed merging method has thus proven to be highly accurate. An additional method is also proposed here to generate calibrated maps using the regression and kriging models, and then to use the calibrated model to generate solar radiation maps from the explanatory variables only, when not enough historical ground data are available for long-term analysis. The NRMSE values obtained after comparison of the calibrated maps with ground data are 5.60% and 5.31% for November and December 2019, respectively.
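
For illustration, a minimal regression-kriging sketch in the spirit of the merging workflow is shown below (OLS between ground and NSRDB GHI, ordinary kriging of the residuals, and recombination); the station coordinates and values are hypothetical, and the paper itself uses the ArcGIS Empirical Bayesian Kriging Regression Prediction tool rather than this simplified version.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from pykrige.ok import OrdinaryKriging

rng = np.random.default_rng(7)
lon = rng.uniform(50.8, 51.6, 13)                 # 13 ground stations (synthetic)
lat = rng.uniform(24.5, 26.2, 13)
nsrdb_ghi = rng.uniform(5.0, 7.0, 13)             # satellite monthly-mean daily GHI (kWh/m2)
ground_ghi = 0.95 * nsrdb_ghi + 0.2 + rng.normal(0, 0.05, 13)

# Step 1: OLS regression between ground and satellite values.
ols = LinearRegression().fit(nsrdb_ghi.reshape(-1, 1), ground_ghi)
residuals = ground_ghi - ols.predict(nsrdb_ghi.reshape(-1, 1))

# Step 2: ordinary kriging of the OLS residuals with a spherical variogram.
ok = OrdinaryKriging(lon, lat, residuals, variogram_model="spherical")

# Step 3: merged estimate at new points = regression prediction + kriged residual.
grid_lon, grid_lat = np.array([51.0, 51.3]), np.array([25.0, 25.6])
grid_nsrdb = np.array([6.1, 6.4])                 # satellite values at those points
kriged_resid, _ = ok.execute("points", grid_lon, grid_lat)
merged = ols.predict(grid_nsrdb.reshape(-1, 1)) + kriged_resid
print(merged)
```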

Keywords: global horizontal irradiation, GIS, empirical Bayesian kriging regression prediction, NSRDB

Procedia PDF Downloads 60
162 A Stochastic Volatility Model for Optimal Market-Making

Authors: Zubier Arfan, Paul Johnson

Abstract:

The electronification of financial markets and the rise of algorithmic trading have sparked considerable interest from the mathematical community, in the market-making problem in particular. The research presented in this short paper solves the classic stochastic control problem in order to derive the strategy for a market-maker. It also shows how to calibrate and simulate the strategy with real limit order book data for back-testing. The ambiguity of limit-order priority in back-testing is dealt with by considering optimistic and pessimistic priority scenarios. The model, although it outperforms a naive strategy, assumes constant volatility and is therefore not best suited to the LOB data. The Heston model is introduced to describe the price and variance process of the asset. The trader's constant absolute risk aversion utility function is optimised by numerically solving a 3-dimensional Hamilton-Jacobi-Bellman partial differential equation to find the optimal limit order quotes. The results show that the stochastic volatility market-making model is more suitable for a risk-averse trader and is also less sensitive to calibration error than the constant volatility model.
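
For illustration, a minimal sketch of the constant-volatility benchmark that the stochastic-volatility model extends, essentially the closed-form Avellaneda-Stoikov reservation price and spread, is shown below; the parameter values are hypothetical, and this is the benchmark rather than the Heston-based model of the paper.

```python
import numpy as np

def avellaneda_stoikov_quotes(s, q, t, T, gamma, sigma, k):
    """Constant-volatility market-making quotes (Avellaneda-Stoikov).

    s: mid-price, q: current inventory, t: current time, T: terminal time,
    gamma: CARA risk aversion, sigma: volatility, k: order-arrival decay.
    """
    reservation = s - q * gamma * sigma**2 * (T - t)        # inventory-adjusted price
    spread = gamma * sigma**2 * (T - t) + (2.0 / gamma) * np.log(1.0 + gamma / k)
    bid = reservation - spread / 2.0
    ask = reservation + spread / 2.0
    return bid, ask

# Hypothetical parameters: long 3 units of inventory, halfway to the horizon.
print(avellaneda_stoikov_quotes(s=100.0, q=3, t=0.5, T=1.0, gamma=0.1, sigma=2.0, k=1.5))
```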

Keywords: market-making, market microstructure, stochastic volatility, quantitative trading

Procedia PDF Downloads 115
161 The First Complete Mitochondrial Genome of Melon Thrips, Thrips palmi (Thripinae: Thysanoptera): Vector for Tospoviruses

Authors: Kaomud Tyagi, Rajasree Chakraborty, Shantanu Kundu, Devkant Singha, Kailash Chandra, Vikas Kumar

Abstract:

The melon thrips, Thrips palmi, is a serious pest of a wide range of agricultural crops and also acts as a vector for plant viruses (genus Tospovirus, family Bunyaviridae). More molecular data on this species are required to understand cryptic speciation and evolutionary affiliations. Mitochondrial genomes have been widely used in phylogenetic and evolutionary studies of insects. So far, the mitogenomes of five thrips species (Anaphothrips obscurus, Frankliniella intonsa, Frankliniella occidentalis, Scirtothrips dorsalis and Thrips imaginis) are available in the GenBank database. In this study, we sequenced the first complete mitogenome of T. palmi and compared it with the available thrips mitogenomes. We assembled the mitogenome from whole genome sequencing data generated using an Illumina HiSeq 2500. Annotation was performed using the MITOS web server to estimate the locations of protein-coding genes (PCGs), transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs) and their secondary structures. The boundaries of the PCGs and rRNAs were confirmed manually in NCBI. Phylogenetic analyses were performed on the 13 PCGs using maximum likelihood (ML) in PAUP and Bayesian inference (BI) in MrBayes 3.2. The complete mitogenome of T. palmi was 15,333 base pairs (bp), which is larger than the genomes of A. obscurus (14,890 bp), F. intonsa (15,215 bp), F. occidentalis (14,889 bp) and the S. dorsalis South Asia strain (SA1) (14,283 bp), but smaller than the genomes of T. imaginis (15,407 bp) and the S. dorsalis East Asia strain (EA1) (15,343 bp). As in other thrips species, the mitochondrial genome of T. palmi contains 37 genes, including 13 PCGs, the large and small ribosomal RNA (rrnL and rrnS) genes, 22 transfer RNA (tRNA) genes (with one extra gene for trn-Serine) and two A+T-rich control regions (CR1 and CR2). Thirty-one genes were observed on the heavy (H) strand and six genes on the light (L) strand. Six tRNA genes (trnG, trnK, trnY, trnW, trnF, and trnH) were found to be conserved in all thrips mitogenomes in their locations relative to a protein-coding or rRNA gene upstream or downstream. The gene arrangement of T. palmi is very similar to that of T. imaginis, except for rearrangements in tRNA genes: trnR (arginine) and trnE (glutamic acid), located between cox3 and CR2 in T. imaginis, are translocated between atp6 and CR1 in T. palmi, while trnL1 (leucine) and trnS1 (serine), located between atp6 and CR1 in T. imaginis, are translocated between cox3 and CR2 in T. palmi. The location of CR1 upstream of the nad5 gene, suggested to be the ancestral condition of the thrips species in the subfamily Thripinae, was also observed in T. palmi. The maximum likelihood (ML) and Bayesian inference (BI) phylogenetic trees resulted in similar topologies, with T. palmi clustering with T. imaginis. We conclude that more molecular data on diverse thrips species from different hierarchical levels are needed to understand the phylogenetic and evolutionary relationships among them.
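
For illustration, a minimal sketch of extracting and comparing mitochondrial gene order from annotated GenBank records with Biopython is shown below; the file names are hypothetical placeholders for the T. palmi and T. imaginis records.

```python
from Bio import SeqIO

def gene_order(genbank_path):
    """Return the ordered list of (gene name, strand) for one mitogenome."""
    record = SeqIO.read(genbank_path, "genbank")
    features = [f for f in record.features if f.type in ("CDS", "tRNA", "rRNA")]
    features.sort(key=lambda f: int(f.location.start))
    return [
        (f.qualifiers.get("gene", f.qualifiers.get("product", ["?"]))[0],
         "+" if f.location.strand == 1 else "-")
        for f in features
    ]

palmi = gene_order("thrips_palmi_mitogenome.gb")          # hypothetical files
imaginis = gene_order("thrips_imaginis_mitogenome.gb")
for a, b in zip(palmi, imaginis):
    flag = "" if a[0] == b[0] else "   <-- rearranged"
    print(f"{a[0]:>10} ({a[1]})   {b[0]:>10} ({b[1]}){flag}")
```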

Keywords: thrips, comparative mitogenomics, gene rearrangements, phylogenetic analysis

Procedia PDF Downloads 138
160 Hawking Radiation of the Grumiller Black Hole

Authors: Sherwan Kher Alden Yakub Alsofy

Abstract:

In this paper, we consider the relativistic Hamilton-Jacobi (HJ) equation and study the Hawking radiation (HR) of scalar particles from the uncharged Grumiller black hole (GBH), which is amenable to testing in astrophysics. The GBH is also known as the Rindler-modified Schwarzschild BH. Our aim is not only to investigate the effect of the Rindler parameter A on the Hawking temperature (TH), but also to examine whether there is any discrepancy between the computed horizon temperature and the standard TH. For this purpose, in addition to its naive coordinate system, we study three regular coordinate systems: Painlevé-Gullstrand (PG), ingoing Eddington-Finkelstein (IEF) and Kruskal-Szekeres (KS) coordinates. In all coordinate systems, we calculate the tunneling probabilities of incoming and outgoing scalar particles from the event horizon using the HJ equation. It is shown in detail that the HJ method reproduces the conventional TH in all these coordinate systems without giving rise to the famous factor-2 problem. Furthermore, in the PG coordinates the Parikh-Wilczek tunneling (PWT) method is employed to show how quantum gravity (QG) corrections can be incorporated into the semiclassical tunneling rate by including the effects of self-gravitation and back-reaction. We then show how these corrections yield a modification of TH.

Keywords: ingoing Eddington-Finkelstein coordinates, Parikh-Wilczek tunneling, Hamilton-Jacobi equation

Procedia PDF Downloads 586
159 Performance and Limitations of Likelihood Based Information Criteria and Leave-One-Out Cross-Validation Approximation Methods

Authors: M. A. C. S. Sampath Fernando, James M. Curran, Renate Meyer

Abstract:

Model assessment, in the Bayesian context, involves evaluation of the goodness-of-fit and the comparison of several alternative candidate models for predictive accuracy and improvements. In posterior predictive checks, data simulated under the fitted model are compared with the actual data. Predictive model accuracy is estimated using information criteria such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the deviance information criterion (DIC), and the Watanabe-Akaike information criterion (WAIC). The goal of an information criterion is to obtain an unbiased measure of the out-of-sample prediction error. Since posterior checks use the data twice, once for model estimation and once for testing, a bias correction that penalises model complexity is incorporated into these criteria. Cross-validation (CV) is another method used for examining out-of-sample prediction accuracy. Leave-one-out cross-validation (LOO-CV) is the most computationally expensive CV variant, as it fits as many models as there are observations. Importance sampling (IS), truncated importance sampling (TIS) and Pareto-smoothed importance sampling (PSIS) are generally used as approximations to the exact LOO-CV and utilise the existing MCMC results, avoiding expensive recomputation. The reciprocals of the predictive densities calculated over posterior draws for each observation are treated as the raw importance weights. These are in turn used to calculate the approximate LOO-CV of the observation as a weighted average of posterior densities. In IS-LOO, the raw weights are used directly; in contrast, the larger weights are replaced by their truncated or modified counterparts in calculating TIS-LOO and PSIS-LOO. Although information criteria and LOO-CV are unable to reflect goodness-of-fit in an absolute sense, their differences can be used to measure the relative performance of the models of interest. However, the use of these measures is only valid under specific circumstances. This study has developed 11 models using normal, log-normal, gamma, and Student's t distributions to improve PCR stutter prediction with forensic data. These models comprise four with profile-wide variances, four with locus-specific variances, and three two-component mixture models. The mean stutter ratio in each model is modeled as a locus-specific simple linear regression against a feature of the alleles under study known as the longest uninterrupted sequence (LUS). The use of AIC, BIC, DIC, and WAIC in model comparison has some practical limitations. Even though IS-LOO, TIS-LOO, and PSIS-LOO are considered to be approximations of the exact LOO-CV, the study observed some drastic deviations in the results. However, there are some interesting relationships among the logarithms of the pointwise predictive densities (lppd) calculated under WAIC and the LOO approximation methods. The estimated overall lppd is a relative measure that reflects the overall goodness-of-fit of the model. Parallel log-likelihood profiles were observed for the models, conditional on equal posterior variances in the lppds. This study illustrates the limitations of the information criteria in practical model comparison problems. In addition, the relationships among the LOO-CV approximation methods and WAIC, with their limitations, are discussed. Finally, useful recommendations that may help in practical model comparisons with these methods are provided.
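
For illustration, a minimal numpy sketch of computing WAIC and (unsmoothed) IS-LOO from an S x N matrix of pointwise log-likelihoods is shown below, following the standard definitions; the matrix is a random stand-in for the MCMC output of one of the stutter models.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
log_lik = rng.normal(-1.0, 0.3, size=(4000, 200))      # hypothetical draws x observations

S = log_lik.shape[0]
lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(S))  # log pointwise predictive density
p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))       # effective number of parameters
waic = -2.0 * (lppd - p_waic)

# IS-LOO: raw importance weights are the reciprocals of the pointwise densities,
# so loo_i = log(S) - logsumexp(-log_lik_i).
loo_pointwise = np.log(S) - logsumexp(-log_lik, axis=0)
is_loo = -2.0 * np.sum(loo_pointwise)

print(f"WAIC = {waic:.1f} (p_waic = {p_waic:.1f}), IS-LOO deviance = {is_loo:.1f}")
```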

Keywords: cross-validation, importance sampling, information criteria, predictive accuracy

Procedia PDF Downloads 362
158 Assessment of Potential Chemical Exposure to Betamethasone Valerate and Clobetasol Propionate in Pharmaceutical Manufacturing Laboratories

Authors: Nadeen Felemban, Hamsa Banjer, Rabaah Jaafari

Abstract:

One of the most common hazards in the pharmaceutical industry is the chemical hazard, which can cause harm or lead to occupational diseases/illnesses due to chronic exposure to hazardous substances. Therefore, a chemical agent management system is required, including hazard identification, risk assessment, controls for specific hazards and inspections, to keep the workplace healthy and safe. However, routine management monitoring is also required to verify the effectiveness of the control measures. Betamethasone Valerate and Clobetasol Propionate are APIs (Active Pharmaceutical Ingredients) with a highly hazardous classification, Occupational Hazard Category 4 (OHC 4), which requires full containment (ECA-D) during handling to avoid chemical exposure. According to their Safety Data Sheets, these chemicals are reproductive toxicants (H360D), which may affect female workers' health, cause damage to an unborn child, or impair fertility. In this study, a qualitative chemical risk assessment (qCRA) was conducted to assess the chemical exposure during handling of Betamethasone Valerate and Clobetasol Propionate in pharmaceutical laboratories. The outcomes of the qCRA identified a risk of potential chemical exposure (risk rating 8, amber risk). Therefore, immediate actions were taken to ensure interim controls (according to the hierarchy of controls) are in place and in use to minimize the risk of chemical exposure. No open handling should be performed outside the Steroid Glove Box Isolator (SGB), and the required personal protective equipment (PPE) must be used. The PPE includes a coverall, nitrile gloves, safety shoes and a powered air-purifying respirator (PAPR). Furthermore, a quantitative assessment (personal air sampling) was conducted to verify the effectiveness of the engineering controls (SGB isolator) and to confirm whether there was chemical exposure, as indicated earlier by the qCRA. Three personal air samples were collected using an air sampling pump and filter (IOM2 filters, 25 mm glass fiber media). The collected samples were analyzed by HPLC in the BV lab, and the measured concentrations were reported in µg/m3 with reference to the occupational exposure limits (8-hr OELs, 8-hr TWA) for each analyte. The analytical results, expressed as 8-hr TWA (time-weighted average), were analyzed using Bayesian statistics (IHDataAnalyst). The Bayesian likelihood graph indicates category 0, meaning exposures are de minimis (trivial or non-existent) and employees have little to no exposure. The results also indicate that the three samples are representative, with very low variation (SD = 0.0014). In conclusion, the engineering controls were effective in protecting the operators from such exposure. However, routine chemical monitoring is required every 3 years unless there is a change in the process or type of chemicals. Also, frequent management monitoring (daily, weekly, and monthly) is required to ensure the control measures are in place and in use. Furthermore, a Similar Exposure Group (SEG) was identified in this activity and included in the annual health surveillance for health monitoring.

Keywords: occupational health and safety, risk assessment, chemical exposure, hierarchy of control, reproductive

Procedia PDF Downloads 149
157 Detection and Classification of Myocardial Infarction Using New Extracted Features from Standard 12-Lead ECG Signals

Authors: Naser Safdarian, Nader Jafarnia Dabanloo

Abstract:

In this paper, we use four features, i.e., the Q-wave integral, QRS complex integral, T-wave integral and total integral, extracted from normal and patient ECG signals, to detect and localize myocardial infarction (MI) in the left ventricle of the heart. In our research, we focus on the detection and localization of MI in the standard ECG. We use the Q-wave and T-wave integrals because these features carry important information for the detection of MI. We used pattern recognition methods such as Artificial Neural Networks (ANN) to detect and localize MI, because these methods have good accuracy for the classification of normal and abnormal signals. We used one type of Radial Basis Function (RBF) network, the Probabilistic Neural Network (PNN), because of its nonlinearity, together with other classifiers such as k-Nearest Neighbors (KNN), Multilayer Perceptron (MLP) and Naive Bayes. We used the PhysioNet database for our training and test data. We reached over 80% test accuracy for localization and over 95% for detection of MI. The main advantages of our method are its simplicity and good accuracy, and the classification accuracy can be further improved by adding more features. A simple method based on only four features extracted from the standard ECG is presented, with good accuracy in MI localization.
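
For illustration, a minimal sketch of computing the integral features named above from one ECG lead is shown below; the sample indices delimiting each wave are hypothetical and would normally come from a delineation algorithm applied to the PhysioNet records.

```python
import numpy as np

fs = 360                                   # sampling frequency (Hz), hypothetical
t = np.arange(0, 0.8, 1.0 / fs)            # one beat (~0.8 s)
ecg = np.sin(2 * np.pi * 1.25 * t)         # synthetic stand-in for a real beat

# Hypothetical wave boundaries (sample indices) from a delineator.
q_on, q_off = 60, 75
qrs_on, qrs_off = 60, 100
t_on, t_off = 180, 250

def wave_integral(signal, start, stop, fs):
    """Riemann-sum approximation of the area under the wave segment."""
    return signal[start:stop].sum() / fs

features = np.array([
    wave_integral(ecg, q_on, q_off, fs),       # Q-wave integral
    wave_integral(ecg, qrs_on, qrs_off, fs),   # QRS-complex integral
    wave_integral(ecg, t_on, t_off, fs),       # T-wave integral
    wave_integral(ecg, 0, len(ecg), fs),       # total integral
])
print(features)                                # feature vector fed to KNN/MLP/PNN/Naive Bayes
```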

Keywords: ECG signal processing, myocardial infarction, features extraction, pattern recognition

Procedia PDF Downloads 427
156 A Game of Information in Defense/Attack Strategies: Case of Poisson Attacks

Authors: Asma Ben Yaghlane, Mohamed Naceur Azaiez

Abstract:

In this paper, we briefly introduce the concept of Poisson attacks in the case of defense/attack strategies where attacks are assumed to be continuous. We suggest a game model in which the attacker will combine two criteria, a sufficient confidence level of a successful attack and a reasonably small estimation error, in order to launch an attack. Here, the estimation error arises from assessing the system failure upon attack using aggregate data at the system level; the corresponding error is referred to as aggregation error. On the other hand, the defender will attempt to deter the attack by making one or both criteria inapplicable. The defender will build his/her strategy by both strengthening the targeted system and increasing the size of the error. We formulate the defender's problem based on appropriate optimization models. The attacker will opt for Bayesian updating in assessing the impact of the improvements made by the defender and will then evaluate the feasibility of the attack before deciding whether or not to launch it. We provide illustrations to better explain the process.
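
For illustration only, a minimal sketch of Bayesian updating for a Poisson attack/failure rate with a conjugate gamma prior is shown below; the prior parameters and observation window are hypothetical, and the paper's game formulation is richer than this single update.

```python
from scipy import stats

alpha_prior, beta_prior = 2.0, 4.0        # gamma prior on the failure rate (per period)
observed_failures, periods = 3, 10        # aggregate data at the system level

# Conjugate update: Gamma(alpha + k, beta + t) after k events over t periods.
alpha_post = alpha_prior + observed_failures
beta_post = beta_prior + periods
posterior = stats.gamma(a=alpha_post, scale=1.0 / beta_post)

print("posterior mean failure rate:", round(posterior.mean(), 3))
print("95% credible interval:", [round(v, 3) for v in posterior.interval(0.95)])
# The attacker could compare the interval's width (estimation error) and the
# implied success probability against its two launch criteria.
```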

Keywords: attacker, defender, game theory, information

Procedia PDF Downloads 427
155 Stackelberg Security Game for Optimizing Security of Federated Internet of Things Platform Instances

Authors: Violeta Damjanovic-Behrendt

Abstract:

This paper presents an approach for optimal cyber security decisions to protect instances of a federated Internet of Things (IoT) platform in the cloud. The presented solution implements the repeated Stackelberg Security Game (SSG) and a model called the Stochastic Human behaviour model with AttRactiveness and Probability weighting (SHARP). SHARP employs the Subjective Utility Quantal Response (SUQR) for formulating a subjective utility function, which is based on the evaluation of alternative solutions during decision-making. We augment the repeated SSG (including SHARP and SUQR) with a reinforcement learning algorithm called Naïve Q-Learning, which belongs to the category of active and model-free machine learning (ML) techniques in which the agent (either the defender or the attacker) attempts to find an optimal security solution. In this way, we combine game-theoretic and ML algorithms for discovering optimal cyber security policies. The proposed security optimization components will be validated in a collaborative cloud platform that is based on the Industrial Internet Reference Architecture (IIRA) and its recently published security model.
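
For illustration, a minimal tabular Q-learning sketch of the kind the defender could use to learn a protection action per threat state is shown below; the toy environment (states, actions, rewards) is entirely hypothetical and stands in for the repeated Stackelberg security game described here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3          # e.g. threat levels x defensive actions
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    """Toy transition: stronger defenses lower the chance of a costly breach."""
    breach = rng.random() < 0.5 - 0.15 * action
    reward = -10.0 if breach else -1.0 * action     # breach cost vs. defense cost
    return rng.integers(n_states), reward

state = 0
for episode in range(5000):
    if rng.random() < epsilon:                      # epsilon-greedy exploration
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Q-learning update toward the Bellman target.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print("learned defensive action per threat state:", Q.argmax(axis=1))
```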

Keywords: security, internet of things, cloud computing, stackelberg game, machine learning, naive q-learning

Procedia PDF Downloads 326
154 Modeling Food Popularity Dependencies Using Social Media Data

Authors: Devashish Khulbe, Manu Pathak

Abstract:

The rise in popularity of major social media platforms has enabled people to share photos and textual information about their daily lives. One of the popular topics about which information is shared is food. Since much of the media about food is attributed to particular locations and restaurants, information such as the spatio-temporal popularity of various cuisines can be analyzed. Tracking the popularity of food types and retail locations across space and time can also be useful for business owners and restaurant investors. In this work, we present an approach using off-the-shelf machine learning techniques to identify trends and the popularity of cuisine types in an area using geo-tagged data from social media, Google Images and Yelp. After adjusting for time, we use kernel density estimation to obtain hot spots across the location and model the dependencies among cuisine popularities using Bayesian networks. We consider the Manhattan borough of New York City as the location for our analyses, but the approach can be used for any area with social media data and information about retail businesses.
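
For illustration, a minimal sketch of the kernel-density hot-spot step is shown below, assuming geo-tagged food posts have already been filtered to one cuisine and projected to planar coordinates in metres; random points stand in for the Manhattan data.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(3)
# Hypothetical projected coordinates (metres) of posts mentioning one cuisine.
posts = np.vstack([
    rng.normal([1000, 2000], 150, size=(200, 2)),   # a strong cluster in one neighbourhood
    rng.normal([4000, 5000], 300, size=(100, 2)),   # a weaker cluster elsewhere
])

kde = KernelDensity(kernel="gaussian", bandwidth=250.0).fit(posts)

# Score a coarse grid and report the densest cell as the cuisine's hot spot.
xs, ys = np.meshgrid(np.linspace(0, 6000, 61), np.linspace(0, 6000, 61))
grid = np.column_stack([xs.ravel(), ys.ravel()])
density = np.exp(kde.score_samples(grid))
print("hot spot (x, y):", grid[density.argmax()])
```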

Keywords: web mining, geographic information systems, business popularity, spatial data analyses

Procedia PDF Downloads 83
153 Development of Computational Approach for Calculation of Hydrogen Solubility in Hydrocarbons for Treatment of Petroleum

Authors: Abdulrahman Sumayli, Saad M. AlShahrani

Abstract:

For the hydrogenation process, knowing the solubility of hydrogen (H2) in hydrocarbons is critical to improving the efficiency of the process. We investigated the computation of H2 solubility in four heavy crude oil feedstocks using machine learning techniques. Temperature, pressure, and feedstock type were considered as the inputs to the models, while the hydrogen solubility was the sole response. Specifically, we employed three different models: Support Vector Regression (SVR), Gaussian process regression (GPR), and Bayesian ridge regression (BRR). To achieve the best performance, the hyper-parameters of these models are optimized using the whale optimization algorithm (WOA). We evaluated the models using a dataset of solubility measurements in various feedstocks and compared their performance based on several metrics. Our results show that the SVR model tuned with WOA (WOA-SVR) achieves the best performance overall, with an RMSE of 1.38 × 10⁻² and an R-squared of 0.991. These findings suggest that machine learning techniques can provide accurate predictions of hydrogen solubility in different feedstocks, which could be useful in the development of hydrogen-related technologies. In addition, the solubility of hydrogen in the four heavy oil fractions is estimated over temperature and pressure ranges of 150 °C–350 °C and 1.2 MPa–10.8 MPa, respectively.
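
For illustration, a minimal sketch of the three regressors compared in the study (SVR, GPR and Bayesian ridge) fitted on (temperature, pressure, feedstock) inputs is shown below; the data are synthetic stand-ins, and a simple grid search replaces the whale optimization algorithm used for hyper-parameter tuning.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
T = rng.uniform(150, 350, 200)              # temperature (degrees C)
P = rng.uniform(1.2, 10.8, 200)             # pressure (MPa)
feed = rng.integers(0, 4, 200)              # feedstock index (0-3)
X = np.column_stack([T, P, feed])
y = 0.01 * P + 5e-5 * T + 0.002 * feed + rng.normal(0, 0.002, 200)   # mock solubility

svr = GridSearchCV(                          # grid search as a stand-in for WOA tuning
    make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    {"svr__C": [1, 10, 100], "svr__epsilon": [1e-3, 1e-2]},
    cv=5,
)
models = {"SVR": svr, "GPR": GaussianProcessRegressor(), "BRR": BayesianRidge()}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: cross-validated R^2 = {r2:.3f}")
```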

Keywords: temperature, pressure variations, machine learning, oil treatment

Procedia PDF Downloads 41
152 Statistical Analysis with Prediction Models of User Satisfaction in Software Project Factors

Authors: Katawut Kaewbanjong

Abstract:

We analyzed a large volume of data and identified software project factors significantly associated with user satisfaction. A statistical significance analysis (logistic regression) and a collinearity analysis determined the significant factors from a group of 71 pre-defined factors across 191 software projects in ISBSG Release 12. The eight models used for testing the prediction potential of these factors were the neural network, k-NN, Naïve Bayes, random forest, decision tree, gradient boosted tree, linear regression and logistic regression prediction models. Fifteen pre-defined factors were truly significant in predicting user satisfaction, and they provided 82.71% prediction accuracy when used with a neural network prediction model. These factors were client-server, personnel changes, total defects delivered, project inactive time, industry sector, application type, development type, how the methodology was acquired, development techniques, decision-making process, intended market, size estimate approach, size estimate method, cost recording method, and effort estimate method. These findings may benefit software development managers considerably.

Keywords: prediction model, statistical analysis, software project, user satisfaction factor

Procedia PDF Downloads 89
151 Predicting Relative Performance of Sector Exchange Traded Funds Using Machine Learning

Authors: Jun Wang, Ge Zhang

Abstract:

Machine learning has been used in many areas today. It excels at reviewing large volumes of data and identifying patterns and trends that might not be apparent to a human. Given the huge potential benefit and the amount of data available in the financial market, it is not surprising to see machine learning applied to various financial products. While future prices of financial securities are extremely difficult to forecast, we study them from a different angle: instead of trying to forecast future prices, we apply machine learning algorithms to predict the direction of future price movement, in particular, whether a sector Exchange Traded Fund (ETF) will outperform or underperform the market in the next week or in the next month. We apply several machine learning algorithms for this prediction: Linear Discriminant Analysis (LDA), k-Nearest Neighbors (KNN), Decision Tree (DT), Gaussian Naive Bayes (GNB), and Neural Networks (NN). We show that these machine learning algorithms, most notably GNB and NN, have some predictive power in forecasting out-performance and under-performance out of sample. We also explore whether it is possible to utilize the predictions from these algorithms to outperform the buy-and-hold strategy of the S&P 500 index. The trading strategy that exploits the out-performance predictions does not perform very well, but the trading strategy that exploits the under-performance predictions can earn higher returns than simply holding the S&P 500 index out of sample.
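
For illustration, a minimal sketch of framing the task as a direction-of-relative-performance classification is shown below: each week is labeled by whether a sector ETF beats the S&P 500, and the label is predicted with Gaussian Naive Bayes from lagged relative returns; the tickers, features and data file are hypothetical stand-ins.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

prices = pd.read_csv("weekly_prices.csv", index_col=0, parse_dates=True)  # columns: XLK, SPY
rel = prices["XLK"].pct_change() - prices["SPY"].pct_change()             # weekly relative return

df = pd.DataFrame({
    "lag1": rel.shift(1), "lag2": rel.shift(2), "lag4": rel.shift(4),     # lagged features
    "label": (rel.shift(-1) > 0).astype(int),                             # outperform next week?
}).dropna()

split = int(len(df) * 0.7)                                                # walk-forward style split
train, test = df.iloc[:split], df.iloc[split:]
model = GaussianNB().fit(train[["lag1", "lag2", "lag4"]], train["label"])
pred = model.predict(test[["lag1", "lag2", "lag4"]])
print("out-of-sample accuracy:", round(accuracy_score(test["label"], pred), 3))
```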

Keywords: machine learning, ETF prediction, dynamic trading, asset allocation

Procedia PDF Downloads 57
150 Medical Knowledge Management since the Integration of Heterogeneous Data until the Knowledge Exploitation in a Decision-Making System

Authors: Nadjat Zerf Boudjettou, Fahima Nader, Rachid Chalal

Abstract:

Knowledge management aims to acquire and represent knowledge relevant to a domain, a task or a specific organization in order to facilitate its access, reuse and evolution. This usually means building, maintaining and evolving an explicit representation of knowledge. The next step is to provide access to that knowledge, that is, to disseminate it in order to enable effective use. Knowledge management in the medical field aims to improve the performance of the medical organization by allowing individuals in the care facility (doctors, nurses, paramedics, etc.) to capture, share and apply collective knowledge in order to make optimal decisions in real time. In this paper, we propose a knowledge management approach that integrates heterogeneous medical data by creating a data warehouse, extracts knowledge from the medical data with a chosen data mining technique, and finally exploits that knowledge in a case-based reasoning system.

Keywords: data warehouse, data mining, knowledge discovery in database, KDD, medical knowledge management, Bayesian networks

Procedia PDF Downloads 358
149 Exploring Data Leakage in EEG Based Brain-Computer Interfaces: Overfitting Challenges

Authors: Khalida Douibi, Rodrigo Balp, Solène Le Bars

Abstract:

In the medical field, applications related to human experiments are frequently associated with small sample sizes, which makes the training of machine learning models quite sensitive and the resulting models neither very robust nor generalizable. This is notably the case in Brain-Computer Interface (BCI) studies, where the sample size rarely exceeds 20 subjects or a small number of trials. To address this problem, several resampling approaches are often used during the data preparation phase, which is a critical step in the data science analysis process. One naive approach usually applied by data scientists consists of transforming the entire dataset before the resampling phase. However, this can cause the model's performance to be incorrectly estimated when making predictions on unseen data. In this paper, we explored the effect of data leakage observed during our BCI experiments for device control through the real-time classification of SSVEPs (Steady State Visually Evoked Potentials). We also studied potential ways to ensure optimal validation of the classifiers during the calibration phase to avoid overfitting. The results show that the scaling step is crucial for some algorithms, and it should be applied after the resampling phase to avoid data leakage and improve results.
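
For illustration, a minimal sketch of the leakage issue described here is shown below: fitting the scaler on the full dataset before cross-validation leaks test-set statistics into training, whereas placing the scaler inside a Pipeline refits it on each training fold only; synthetic data stand in for the SSVEP features.

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=120, n_features=50, n_informative=5, random_state=0)

# Leaky protocol: scale the entire dataset, then cross-validate.
X_leaky = StandardScaler().fit_transform(X)
leaky = cross_val_score(SVC(), X_leaky, y, cv=5).mean()

# Correct protocol: the scaler is fitted inside each training fold only.
clean = cross_val_score(make_pipeline(StandardScaler(), SVC()), X, y, cv=5).mean()

print(f"scaling before CV (leaky): {leaky:.3f}")
print(f"scaling inside CV folds:   {clean:.3f}")
```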

Keywords: data leakage, data science, machine learning, SSVEP, BCI, overfitting

Procedia PDF Downloads 120
148 Optimizing the Capacity of a Convolutional Neural Network for Image Segmentation and Pattern Recognition

Authors: Yalong Jiang, Zheru Chi

Abstract:

In this paper, we study the factors that determine the capacity of a Convolutional Neural Network (CNN) model and propose ways to evaluate and adjust the capacity of a CNN model to best match a specific pattern recognition task. Firstly, a scheme is proposed to adjust the number of independent functional units within a CNN model to better fit it to a task. Secondly, the number of independent functional units in the capsule network is adjusted to fit it to the training dataset. Thirdly, a method based on a Bayesian GAN is proposed to enrich the variance in the current dataset and increase its complexity. Experimental results on the PASCAL VOC 2010 Person Part dataset and the MNIST dataset show that, in both conventional CNN models and capsule networks, the number of independent functional units is an important factor that determines the capacity of a network model. By adjusting the number of functional units, the capacity of a model can better match the complexity of a dataset.

Keywords: CNN, convolutional neural network, capsule network, capacity optimization, character recognition, data augmentation, semantic segmentation

Procedia PDF Downloads 118
147 Facility Anomaly Detection with Gaussian Mixture Model

Authors: Sunghoon Park, Hank Kim, Jinwon An, Sungzoon Cho

Abstract:

The Internet of Things allows one to collect data from facilities, which are then used to monitor them and even predict malfunctions in advance. Conventional quality control methods focus on setting a normal range for a sensor value, defined between a lower control limit and an upper control limit, and declaring as an anomaly anything falling outside it. However, interactions among sensor values are ignored, leading to suboptimal performance. We propose a multivariate approach that takes many sensor values into account at the same time. In particular, a Gaussian mixture model is used, trained to maximize the likelihood using the Expectation-Maximization algorithm. The number of Gaussian component distributions is determined by the Bayesian Information Criterion. The negative log-likelihood is used as the anomaly score. A typical usage scenario is as follows: for each instance of sensor values from a facility, an anomaly score is computed; if it is larger than a threshold, an alarm goes off and a human expert intervenes and checks the system. Real-world data from a building energy system were used to test the model.
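
For illustration, a minimal sketch of the described approach is shown below: the number of Gaussian components is chosen by BIC, and the negative log-likelihood of each multivariate sensor reading is used as its anomaly score; synthetic sensor data stand in for the building energy system measurements.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
normal = rng.multivariate_normal([20.0, 50.0, 0.8], np.diag([1.0, 4.0, 0.01]), size=2000)

# Model selection: fit GMMs with EM and keep the component count with lowest BIC.
candidates = {k: GaussianMixture(n_components=k, random_state=0).fit(normal) for k in range(1, 6)}
best_k, gmm = min(candidates.items(), key=lambda kv: kv[1].bic(normal))
print("components selected by BIC:", best_k)

# Anomaly score = negative log-likelihood; threshold at a high percentile of
# the scores seen on normal training data.
train_scores = -gmm.score_samples(normal)
threshold = np.percentile(train_scores, 99.5)

new_readings = np.array([[20.5, 51.0, 0.81],     # plausible reading
                         [20.5, 90.0, 0.20]])    # inter-sensor relationship broken
scores = -gmm.score_samples(new_readings)
print(np.c_[scores, scores > threshold])         # anomaly score and alarm flag per reading
```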

Keywords: facility anomaly detection, gaussian mixture model, anomaly score, expectation maximization algorithm

Procedia PDF Downloads 243
146 Optimal Maintenance Policy for a Partially Observable Two-Unit System

Authors: Leila Jafari, Viliam Makis, G. B. Akram Khaleghei

Abstract:

In this paper, we present a maintenance model of a two-unit series system with economic dependence. Unit 1, which is considered to be more expensive and more important, is subject to condition monitoring (CM) at equidistant, discrete time epochs, while unit 2, which is not subject to CM, has a general lifetime distribution. The multivariate observation vectors obtained through condition monitoring carry partial information about the hidden state of unit 1, which can be in a healthy or a warning state while operating. Only the failure state is assumed to be observable for both units. The objective is to find an optimal opportunistic maintenance policy minimizing the long-run expected average cost per unit time. The problem is formulated and solved in the partially observable semi-Markov decision process framework. An effective computational algorithm for finding the optimal policy and the minimum average cost is developed and illustrated by a numerical example.

Keywords: condition-based maintenance, semi-Markov decision process, multivariate Bayesian control chart, partially observable system, two-unit system

Procedia PDF Downloads 433
145 A Decision Support System to Detect the Lumbar Disc Disease on the Basis of Clinical MRI

Authors: Yavuz Unal, Kemal Polat, H. Erdinc Kocer

Abstract:

In this study, a decision support system comprising three stages is proposed to detect disc abnormalities of the lumbar region. In the first stage, feature extraction, T2-weighted sagittal and axial magnetic resonance images (MRI) were taken from 55 people, and 27 appearance and shape features were acquired from both the sagittal and transverse images. In the second stage, the feature weighting process, the k-means clustering based feature weighting (KMCBFW) method proposed by Gunes et al. was applied. Finally, in the third stage, the classification process, classifier algorithms including the multi-layer perceptron (MLP neural network), support vector machine (SVM), Naïve Bayes, and decision tree were used to classify whether the subject has lumbar disc disease or not. In order to test the performance of the proposed method, the classification accuracy (%), sensitivity, specificity, precision, recall, f-measure, kappa value, and computation times have been used. The best hybrid model is the combination of k-means clustering based feature weighting and the decision tree in the detection of lumbar disc disease based on both sagittal and axial MR images.

Keywords: lumbar disc abnormality, lumbar MRI, lumbar spine, hybrid models, hybrid features, k-means clustering based feature weighting

Procedia PDF Downloads 496
144 Statistical Classification, Downscaling and Uncertainty Assessment for Global Climate Model Outputs

Authors: Queen Suraajini Rajendran, Sai Hung Cheung

Abstract:

Statistical downscaling models are required to connect global climate model outputs with local weather variables for climate change impact prediction. For reliable climate change impact studies, the uncertainty associated with the model, including natural variability, uncertainty in the climate model(s) and downscaling model, model inadequacy, and uncertainty in the predicted results, should be quantified appropriately. In this work, a new approach is developed by the authors for statistical classification, statistical downscaling and uncertainty assessment, and is applied to Singapore rainfall. It is a robust Bayesian uncertainty analysis methodology, with accompanying tools, that couples dependent modeling errors with the classification and statistical downscaling models, so that the dependency among modeling errors affects both model calibration and the uncertainty analysis for future prediction. Singapore data are considered here, and the uncertainty and prediction results are obtained. From the results obtained, directions of research for improvement are briefly presented.

Keywords: statistical downscaling, global climate model, climate change, uncertainty

Procedia PDF Downloads 336