Search results for: Bayesian
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 147


27 Trust Management for Pervasive Computing Environments

Authors: Denis Trcek

Abstract:

Trust is essential for the further and wider acceptance of contemporary e-services. It was first addressed almost thirty years ago in the Trusted Computer System Evaluation Criteria standard by the US DoD, but this and other approaches proposed in that period actually addressed security rather than trust. Roughly ten years ago, methodologies emerged that addressed the trust phenomenon at its core; they were based on Bayesian statistics and its derivatives, while some approaches were based on game theory. However, trust is a manifestation of judgment and reasoning processes, so it has to be treated accordingly and adequately supported in cyber environments. Building on results from psychology and our own findings, we have developed a methodology called qualitative algebra, which deals with previously overlooked elements of the trust phenomenon. It complements existing methodologies and provides a basis for a practical technical solution that supports trust management in contemporary computing environments. Such a solution is also presented at the end of this paper.
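
For readers unfamiliar with the earlier Bayesian trust models the abstract contrasts against, the sketch below shows one common form: a beta-distribution reputation estimate. It is an illustration chosen for this listing, not the author's qualitative algebra, and the `BetaTrust` name and uniform Beta(1, 1) prior are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BetaTrust:
    """Beta-distribution trust estimate for one agent (hypothetical helper).

    Assumes a uniform Beta(1, 1) prior updated with counts of positive
    and negative interaction outcomes.
    """
    positive: int = 0
    negative: int = 0

    def update(self, outcome_ok: bool) -> None:
        # Each interaction is treated as one Bernoulli observation.
        if outcome_ok:
            self.positive += 1
        else:
            self.negative += 1

    def expected_trust(self) -> float:
        # Posterior is Beta(r + 1, s + 1); its mean is (r + 1) / (r + s + 2).
        return (self.positive + 1) / (self.positive + self.negative + 2)


if __name__ == "__main__":
    t = BetaTrust()
    for ok in [True, True, True, False]:
        t.update(ok)
    print(round(t.expected_trust(), 3))  # 4 / 6 -> 0.667
```

A purely quantitative estimate like this is exactly what the abstract argues is incomplete, since it ignores the judgment and reasoning side of trust.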

Keywords: internet security, trust management, multi-agent systems, reasoning and judgment, modeling and simulation, qualitative algebra

PDF Downloads: 1528

26 First Studies of the Influence of Single Gene Perturbations on the Inference of Genetic Networks

Authors: Frank Emmert-Streib, Matthias Dehmer

Abstract:

Inferring network structure from time series data is a hard problem, especially if the time series are short and noisy. DNA microarray technology, which monitors the mRNA concentrations of thousands of genes simultaneously, produces data with exactly these characteristics. In this study we investigate the influence of the experimental design on the quality of the result. More precisely, we investigate the influence of two different types of random single gene perturbations on the inference of genetic networks from time series data. To obtain an objective quality measure for this influence, we simulate gene expression values with a biologically plausible model of a known network structure. Within this framework we compare single gene knock-outs with linearly controlled expression of single genes in terms of the quality of the inferred network structure.

Keywords: Dynamic Bayesian networks, microarray data, structure learning, Markov chain Monte Carlo.

PDF Downloads: 1506

25 Intelligent Multi-Agent Middleware for Ubiquitous Home Networking Environments

Authors: Minwoo Son, Seung-Hun Lee, Dongkyoo Shin, Dongil Shin

Abstract:

The next stage of the home networking environment is expected to be ubiquitous, with each item equipped with an RFID (Radio Frequency Identification) tag. To fully support the ubiquitous environment, home networking middleware should be able to recommend home services based on a user's interests and efficiently manage information on users' service usage profiles. Therefore, USN (Ubiquitous Sensor Network) technology, which recognizes and manages an appliance's state information (location, capabilities and so on) by connecting RFID tags, is considered. The Intelligent Multi-Agent Middleware (IMAM) architecture was proposed to intelligently manage mobile RFID-based home networking and to automatically supply information about home services that match a user's interests. Evaluation results for IMAM personalization services using Bayesian-Net and Decision Trees are presented.

Keywords: Intelligent Agents, Home Network, Mobile RFID, Intelligent Middleware.

PDF Downloads: 1404

24 Optimizing the Capacity of a Convolutional Neural Network for Image Segmentation and Pattern Recognition

Authors: Yalong Jiang, Zheru Chi

Abstract:

In this paper, we study the factors that determine the capacity of a Convolutional Neural Network (CNN) model and propose ways to evaluate and adjust that capacity so that the model best matches a specific pattern recognition task. First, a scheme is proposed to adjust the number of independent functional units within a CNN model so that it better fits a task. Second, the number of independent functional units in a capsule network is adjusted to fit it to the training dataset. Third, a method based on a Bayesian GAN is proposed to enrich the variance of the current dataset and thereby increase its complexity. Experimental results on the PASCAL VOC 2010 Person Part dataset and the MNIST dataset show that, in both conventional CNN models and capsule networks, the number of independent functional units is an important factor in determining a network model's capacity. By adjusting the number of functional units, a model's capacity can be matched more closely to the complexity of a dataset.

Keywords: CNN, capsule network, capacity optimization, character recognition, data augmentation, semantic segmentation.

PDF Downloads: 647

23 Modeling the Symptom-Disease Relationship by Using Rough Set Theory and Formal Concept Analysis

Authors: Mert Bal, Hayri Sever, Oya Kalıpsız

Abstract:

Medical Decision Support Systems (MDSSs) are sophisticated, intelligent systems that can perform inference under missing information and uncertainty. In such systems, various soft computing methods are used to model uncertainty, such as Bayesian networks, rough sets, artificial neural networks, fuzzy logic, inductive logic programming, genetic algorithms, and hybrid methods formed by combining several of these. In this study, symptom-disease relationships are represented in a framework modeled with formal concept analysis theory, in which diseases are the objects and symptoms are the attributes. After a concept lattice is formed, Bayes' theorem can be used to determine the relationships between attributes and objects. A discernibility relation, which forms the basis of rough sets, can be applied to attribute datasets to reduce attributes and decrease computational complexity.
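
The two ingredients named above, formal concepts and Bayes' theorem, can be sketched on a toy context. Everything below is hypothetical illustration data (the disease names, priors and likelihoods are invented), and the helpers are assumptions rather than the authors' framework: `extent` and `intent` are the standard FCA derivation operators, and `posterior` applies Bayes' theorem to one observed symptom.

```python
# Toy formal context (hypothetical data): diseases are objects, symptoms are attributes.
CONTEXT = {
    "flu": {"fever", "cough", "fatigue"},
    "cold": {"cough", "sneezing"},
    "covid": {"fever", "cough", "loss_of_smell"},
}


def extent(symptoms):
    """FCA derivation: the diseases that exhibit every symptom in `symptoms`."""
    return {d for d, attrs in CONTEXT.items() if set(symptoms) <= attrs}


def intent(diseases):
    """FCA derivation: the symptoms shared by every disease in `diseases`."""
    if not diseases:
        return set().union(*CONTEXT.values())
    return set.intersection(*(CONTEXT[d] for d in diseases))


def posterior(prior, likelihood, symptom):
    """Bayes' theorem: P(d | s) = P(s | d) P(d) / sum over d' of P(s | d') P(d')."""
    joint = {d: likelihood[d].get(symptom, 0.0) * p for d, p in prior.items()}
    evidence = sum(joint.values())
    return {d: v / evidence for d, v in joint.items()}


# Hypothetical priors and symptom likelihoods, for illustration only.
PRIOR = {"flu": 0.5, "cold": 0.4, "covid": 0.1}
LIKELIHOOD = {
    "flu": {"fever": 0.9},
    "cold": {"fever": 0.1},
    "covid": {"fever": 0.8},
}

if __name__ == "__main__":
    concept_extent = extent({"fever", "cough"})
    print(sorted(concept_extent), sorted(intent(concept_extent)))
    print(posterior(PRIOR, LIKELIHOOD, "fever"))
```

The pair ({flu, covid}, {fever, cough}) is a formal concept because each set derives exactly the other; a lattice node like this gives a natural group over which to compute posteriors.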

Keywords: Formal Concept Analysis, Rough Set Theory, Granular Computing, Medical Decision Support System.

PDF Downloads: 1767

22 Adaptive Network Intrusion Detection Learning: Attribute Selection and Classification

Authors: Dewan Md. Farid, Jerome Darmont, Nouria Harbi, Nguyen Huu Hoa, Mohammad Zahidur Rahman

Abstract:

In this paper, a new learning approach for network intrusion detection using a naïve Bayesian classifier and the ID3 algorithm is presented. It identifies effective attributes from the training dataset, calculates the conditional probabilities for the best attribute values, and then correctly classifies all the examples of the training and testing datasets. Most current intrusion detection datasets are dynamic and complex, and contain a large number of attributes. Some of the attributes may be redundant or contribute little to detection. Testing has shown that selecting significant attributes is important when designing real-world intrusion detection systems (IDS). The purpose of this study is to identify effective attributes from the training dataset to build a classifier for network intrusion detection using data mining algorithms. The experimental results on the KDD99 benchmark intrusion detection dataset demonstrate that this new approach achieves high classification rates and reduces false positives using limited computational resources.
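
The two steps described above, ranking attributes by the ID3 information-gain criterion and then classifying with conditional probabilities, can be sketched as follows. This is a minimal illustration under stated assumptions: the toy rows (protocol, connection flag) are invented rather than drawn from KDD99, and the add-one (Laplace) smoothing is a choice made here, not necessarily the authors'.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """ID3 criterion: entropy reduction from splitting on column `attr`."""
    n = len(labels)
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


class NaiveBayes:
    """Categorical naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels, attrs):
        self.attrs = attrs
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[c][a][v] = number of class-c rows with value v in column a
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)
        for row, y in zip(rows, labels):
            for a in attrs:
                self.counts[y][a][row[a]] += 1
                self.values[a].add(row[a])
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        for c, nc in self.class_counts.items():
            score = math.log(nc / self.n)  # log prior P(c)
            for a in self.attrs:
                k = len(self.values[a])
                # smoothed log P(value | class)
                score += math.log((self.counts[c][a][row[a]] + 1) / (nc + k))
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy data: (protocol, connection flag) -> label.
ROWS = [("tcp", "S0"), ("tcp", "S0"), ("udp", "S0"),
        ("tcp", "SF"), ("udp", "SF"), ("udp", "SF")]
LABELS = ["attack", "attack", "attack", "normal", "normal", "normal"]

if __name__ == "__main__":
    best = max(range(2), key=lambda a: information_gain(ROWS, LABELS, a))
    model = NaiveBayes().fit(ROWS, LABELS, [best])
    print(best, model.predict(("udp", "S0")))  # the flag column is selected
```

Here the flag column separates the classes perfectly (gain 1 bit) while protocol barely helps, so only the flag is passed to the classifier, mirroring the "select effective attributes, then classify" idea.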

Keywords: Attribute selection, Conditional probabilities, information gain, network intrusion detection.

PDF Downloads: 2646

21 Data Mining Classification Methods Applied in Drug Design

Authors: Mária Stachová, Lukáš Sobíšek

Abstract:

Data mining incorporates a group of statistical methods used to analyze a set of information, or a dataset. It operates with models and algorithms, which are powerful tools with great potential. They can help people understand the patterns in a given body of information, so data mining tools have a wide range of applications. For example, in theoretical chemistry, data mining tools can be used to predict molecule properties or to improve computer-assisted drug design. Classification analysis is one of the major data mining methodologies. The aim of this contribution is to create a classification model able to deal with a huge dataset with high accuracy. For this purpose, logistic regression, Bayesian logistic regression and random forest models were built using R software. A Bayesian logistic regression model was also built in Latent GOLD software. These classification methods belong to supervised learning methods. It was necessary to reduce the dimension of the data matrix before constructing the models, so factor analysis (FA) was used. The models were applied to predict the biological activity of molecules that are potential new drug candidates.

Keywords: data mining, classification, drug design, QSAR

PDF Downloads: 2796

20 Optimal Maintenance Policy for a Partially Observable Two-Unit System

Authors: Leila Jafari, Viliam Makis, Akram Khaleghei G.B.

Abstract:

In this paper, we present a maintenance model of a two-unit series system with economic dependence. Unit #1, which is considered more expensive and more important, is subject to condition monitoring (CM) at equidistant, discrete time epochs, while unit #2, which is not subject to CM, has a general lifetime distribution. The multivariate observation vectors obtained through condition monitoring carry partial information about the hidden state of unit #1, which can be in a healthy or a warning state while operating. Only the failure state is assumed to be observable for both units. The objective is to find an optimal opportunistic maintenance policy that minimizes the long-run expected average cost per unit time. The problem is formulated and solved in the partially observable semi-Markov decision process framework. An effective computational algorithm for finding the optimal policy and the minimum average cost is developed and illustrated by a numerical example.
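
The hidden healthy/warning state of unit #1 is typically tracked through the posterior probability of the warning state, updated after each CM observation. A minimal sketch of one such Bayesian filtering step follows; it assumes scalar Gaussian observations with hypothetical means, a constant per-epoch transition probability, and an absorbing warning state. The paper's multivariate semi-Markov formulation is not reproduced here.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def update_warning_prob(pi, y, p_transition, healthy=(0.0, 1.0), warning=(2.0, 1.0)):
    """One Bayesian filtering step for a two-state hidden unit (hypothetical helper).

    pi: current posterior probability that the unit is in the warning state.
    y: new condition-monitoring observation (scalar here; the paper uses vectors).
    p_transition: assumed probability of moving healthy -> warning per epoch.
    healthy / warning: assumed (mean, sd) of the observation in each state.
    The warning state is assumed absorbing until failure or maintenance.
    """
    # Predict: probability of being in the warning state before seeing y.
    prior_w = pi + (1.0 - pi) * p_transition
    # Correct: weight each state by the likelihood of y (Bayes' theorem).
    lw = prior_w * normal_pdf(y, *warning)
    lh = (1.0 - prior_w) * normal_pdf(y, *healthy)
    return lw / (lw + lh)


if __name__ == "__main__":
    pi = 0.0
    for y in [0.1, -0.2, 2.5, 2.1]:
        pi = update_warning_prob(pi, y, p_transition=0.05)
        print(round(pi, 3))
```

In a control-limit policy of the kind often studied for such models, preventive maintenance would be triggered once this probability crosses a threshold; the paper's optimal policy also exploits the opportunity offered by unit #2 failures.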

Keywords: Condition-Based Maintenance, Semi-Markov Decision Process, Multivariate Bayesian Control Chart, Partially Observable System, Two-unit System.

PDF Downloads: 2250

19 Investigation on Performance of Change Point Algorithm in Time Series Dynamical Regimes and Effect of Data Characteristics

Authors: Farhad Asadi, Mohammad Javad Mollakazemi

Abstract:

In this paper, Bayesian online inference in time series models is carried out with a change-point algorithm, which separates the observed time series into independent segments and studies changes in the data regime together with the related statistical characteristics. Variation in the statistical characteristics of time series data often reflects distinct phenomena in a dynamical system, such as a change of brain state reflected in EEG measurements or a change in an important data regime in many other dynamical systems. In this paper, a prediction algorithm for locating change points in time series data is simulated. It is verified that the distribution assumed for the data is an important factor in obtaining simpler and smoother fluctuation of the hazard rate parameter and in better identifying change point locations. Finally, the conditions under which the time series distribution affects the factors of this approach are explained and validated with different time series databases for several dynamical systems.
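
The abstract does not spell out its algorithm, so the sketch below uses the widely known Bayesian online change-point recursion of Adams and MacKay as a stand-in. It assumes Gaussian segments with unknown mean (conjugate normal prior), known observation variance and a constant hazard rate; the function name and defaults are hypothetical. It returns the run-length posterior, whose maximum tracks the time since the last change.

```python
import numpy as np


def bocpd_gaussian(data, hazard=0.01, mu0=0.0, var0=10.0, var_x=1.0):
    """Bayesian online change-point detection (Adams & MacKay style sketch).

    Assumes each segment is Gaussian with unknown mean (prior N(mu0, var0))
    and known observation variance var_x, plus a constant hazard rate.
    Returns R with R[t, r] = P(run length r | x_1..x_t).
    """
    T = len(data)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    # Posterior mean/variance of the segment mean for each run length.
    mu = np.array([mu0])
    var = np.array([var0])
    for t, x in enumerate(data):
        # Predictive density of x under each current run length.
        pred_var = var + var_x
        pred = np.exp(-0.5 * (x - mu) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        growth = R[t, : t + 1] * pred * (1 - hazard)    # run continues
        cp = np.sum(R[t, : t + 1] * pred * hazard)      # run resets to 0
        R[t + 1, 1 : t + 2] = growth
        R[t + 1, 0] = cp
        R[t + 1] /= R[t + 1].sum()
        # Conjugate update of each run's mean posterior; a fresh prior
        # is prepended for the new run of length 0.
        new_var = 1.0 / (1.0 / var + 1.0 / var_x)
        new_mu = new_var * (mu / var + x / var_x)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(5.0, 1.0, 50)])
    R = bocpd_gaussian(data)
    print(int(np.argmax(R[-1])))  # most probable run length at the end, close to 50
```

The hazard rate and the assumed data distribution enter directly into every step, which is consistent with the abstract's finding that the data distribution affects how smoothly change points are identified.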

Keywords: Time series, fluctuation in statistical characteristics, optimal learning.

PDF Downloads: 1758

18 Multiscale Syntheses of Knee Collateral Ligament Stresses: Aggregate Mechanics as a Function of Molecular Properties

Authors: Raouf Mbarki, Fadi Al Khatib, Malek Adouni

Abstract:

Knee collateral ligaments play a significant role in restraining excessive frontal motion (varus/valgus rotations). In this investigation, a multiscale framework was developed based on the structural hierarchy of the collateral ligaments, from the bottom level (the tropocollagen molecule) up to the level at which the fiber-reinforced structure is established. Experimental data from tensile failure tests were the principal driver of the model. Because of the high number of unknown parameters, the model was calibrated statistically using Bayesian calibration. The model was then scaled up to fit the real structure of the collateral ligaments and simulated under realistic boundary conditions. Its predictions successfully described the observed transient response of the collateral ligaments during tensile tests under pre- and post-damage loading conditions. The maximum stresses and strengths of the collateral ligaments were observed near the femoral insertions, a result in good agreement with experimental investigations. Also, for the first time, damage initiation and propagation were documented with this model as a function of the cross-link density between tropocollagen molecules.

Keywords: Multiscale model, tropocollagen, fibrils, ligaments.

PDF Downloads: 554

17 Comparison of Machine Learning Techniques for Single Imputation on Audiograms

Authors: Sarah Beaver, Renee Bryce

Abstract:

Audiograms detect hearing impairment, but missing values pose problems. This work explores imputation in an attempt to improve accuracy. It implements Linear Regression, Lasso, Linear Support Vector Regression, Bayesian Ridge, K Nearest Neighbors (KNN) and Random Forest machine learning techniques to impute audiogram frequencies ranging from 125 Hz to 8000 Hz. The data come from patients who had, or were candidates for, cochlear implants. Accuracy is compared across two different k values for nested cross-validation. Over 4000 audiograms from 800 unique patients were used. Additionally, training on combined left- and right-ear audiograms is compared with training on single-ear audiograms. For the best Random Forest models, Root Mean Square Error (RMSE) ranges from 4.74 to 6.37 and R² from .91 to .96. For the best KNN models, RMSE ranges from 5.00 to 7.72 and R² from .89 to .95. The best imputation models achieved R² between .89 and .96 and RMSE values below 8 dB. We also show that classification models performed better with our imputation models than with constant imputation, a two percent increase in accuracy.
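
The two reported metrics, and the idea behind KNN imputation, can be sketched in a few lines of plain Python. The `knn_impute` helper is hypothetical: it assumes complete reference audiograms, measures distance only over the frequencies observed in the incomplete audiogram, and averages the k nearest rows. It is not the authors' pipeline, which used nested cross-validation over several models.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the data (dB for audiograms)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_impute(target, pool, k=2):
    """Fill None entries in `target` with the mean of the k nearest rows of `pool`.

    Distance uses only the frequencies observed in `target`; rows in `pool`
    are assumed complete (hypothetical helper).
    """
    observed = [i for i, v in enumerate(target) if v is not None]

    def dist(row):
        return math.sqrt(sum((row[i] - target[i]) ** 2 for i in observed))

    neighbours = sorted(pool, key=dist)[:k]
    return [v if v is not None else sum(r[i] for r in neighbours) / k
            for i, v in enumerate(target)]


if __name__ == "__main__":
    print(rmse([20, 40, 60, 80], [25, 35, 65, 75]))       # 5.0
    print(knn_impute([11, None, 31], [[10, 20, 30], [12, 22, 32], [60, 70, 80]]))
```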

Keywords: Machine Learning, audiograms, data imputations, single imputations.

PDF Downloads: 24

16 Application of Machine Learning Methods to Online Test Error Detection in Semiconductor Test

Authors: Matthias Kirmse, Uwe Petersohn, Elief Paffrath

Abstract:

Because test costs in today's semiconductor industry can make up to 50 percent of total production costs, efficient test error detection is becoming more and more important. In this paper, we present a new machine learning approach to test error detection that should provide faster recognition of test system faults as well as improved test error recall. The key idea is to learn a classifier ensemble that detects typical test error patterns in wafer test results immediately after these tests finish. Since test error detection has not yet been discussed in the machine learning community, we define central problem-relevant terms and provide an analysis of important domain properties. Finally, we present comparative studies of the failure detection performance of three individual classifiers and three ensemble methods based on them. As base classifiers we chose a decision tree learner, a support vector machine and a Bayesian network, while the compared ensemble methods were simple and weighted majority vote as well as stacking. For the evaluation, we used cross-validation and a specially designed practical simulation. By implementing our approach in a semiconductor test department to observe two products, we demonstrated its practical applicability.
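
Simple and weighted majority voting, two of the three ensemble methods compared above, differ only in how much each base classifier's vote counts. A minimal sketch follows; the labels and weights are hypothetical, and using validation accuracy as the weight is one common choice, not necessarily the one the authors made.

```python
from collections import defaultdict


def majority_vote(predictions, weights=None):
    """Combine base-classifier labels for one sample by (weighted) majority vote.

    predictions: one label per base classifier.
    weights: optional per-classifier weights (e.g. validation accuracy);
    None gives the simple, unweighted majority vote.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    # The label with the highest total weight wins; ties go to the label seen first.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    votes = ["test_error", "ok", "test_error"]  # e.g. tree, SVM, Bayesian network
    print(majority_vote(votes))                   # simple vote
    print(majority_vote(votes, [0.2, 0.9, 0.3]))  # weighted vote
```

Stacking, the third method, replaces the fixed voting rule with a learned meta-classifier trained on the base classifiers' outputs.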

Keywords: Ensemble methods, fault detection, machine learning, semiconductor test.

PDF Downloads: 2218

15 Texture Feature-Based Language Identification Using Wavelet-Domain BDIP and BVLC Features and FFT Feature

Authors: Ick Hoon Jang, Hoon Jae Lee, Dae Hoon Kwon, Ui Young Pak

Abstract:

In this paper, we propose a texture feature-based language identification method using wavelet-domain BDIP (block difference of inverse probabilities) and BVLC (block variance of local correlation coefficients) features and an FFT (fast Fourier transform) feature. In the proposed method, wavelet subbands are first obtained by wavelet transform from a test image and denoised by Donoho's soft-thresholding. BDIP and BVLC operators are next applied to the wavelet subbands. FFT blocks are also obtained by 2D (two-dimensional) FFT from the blocks into which the test image is partitioned. Some significant FFT coefficients in each block are selected, and the magnitude operator is applied to them. Moments for each BDIP and BVLC subband and for each magnitude of the significant FFT coefficients are then computed and fused into a feature vector. In classification, a stabilized Bayesian classifier, which adopts variance thresholding, searches for the training feature vector most similar to the test feature vector. Experimental results show that the proposed method with the three operations yields excellent language identification even with a rather low feature dimension.

Keywords: BDIP, BVLC, FFT, language identification, texture feature, wavelet transform.

PDF Downloads: 2106

14 Learning to Recognize Faces by Local Feature Design and Selection

Authors: Yanwei Pang, Lei Zhang, Zhengkai Liu

Abstract:

Studies in neuroscience suggest that both global and local feature information are crucial for the perception and recognition of faces. It is widely believed that local features are less sensitive to variations caused by illumination and expression. In this paper, we aim to design and learn local features for face recognition. We designed three types of local features: a semi-global feature, a local patch feature and a tangent shape feature. The semi-global feature is designed to exploit global-like information while avoiding suppression of the AdaBoost algorithm when boosting weak classifiers built from small local patches. The local patch feature is designed to select discriminative features automatically, which differs from traditional approaches in which local patches are usually selected manually to cover the salient facial components. A shape feature is also considered for frontal-view face recognition. These features are selected and combined within a boosting framework with a cascade structure. The experimental results demonstrate that the proposed approach outperforms the standard eigenface method and the Bayesian method. Moreover, the selected local features and the observations from the experiments are informative for research on local feature design in face recognition.

Keywords: Face recognition, local feature, AdaBoost, subspace analysis.

PDF Downloads: 1547

13 The Reproducibility and Repeatability of Modified Likelihood Ratio for Forensics Handwriting Examination

Authors: O. Abiodun Adeyinka, B. Adeyemo Adesesan

Abstract:

The forensic use of handwriting depends on the analysis, comparison and evaluation decisions made by forensic document examiners. When biometric technology is used in forensic applications, it is necessary to compute the Likelihood Ratio (LR) to quantify the strength of evidence under two competing hypotheses, namely the prosecution and defense hypotheses, for which a set of assumptions and methods is adopted for a given dataset. It is therefore important to know how repeatable and reproducible the estimated LR is. This paper evaluates the accuracy and reproducibility of examiners' decisions. Confidence intervals for the estimated LR are presented so that an incorrect estimate is not used to deliver a wrong judgment in a court of law. LR estimation is fundamentally a Bayesian concept, and this paper uses two LR estimators, namely Logistic Regression (LoR) and a Kernel Density Estimator (KDE). The repeatability evaluation was carried out by repeating the initial experiment after an interval of six months to observe whether examiners would repeat their decisions for the estimated LR. The experimental results, based on a handwriting dataset, show that LR has different confidence intervals in different regions, which implies that LR cannot be estimated with the same certainty everywhere. Although LoR performed better than KDE when tested on the same dataset, the two LR estimators showed a consistent region in which the LR value can be estimated confidently. These two findings advance our understanding of LR when used to compute the strength of evidence in forensic handwriting examination.
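
Of the two estimators mentioned, the KDE route is easy to sketch: estimate the density of comparison scores under each hypothesis and take their ratio at the observed score. In Bayesian terms, posterior odds equal this LR times the prior odds. The scores, the fixed bandwidth and the function names below are hypothetical illustration choices; the paper's actual scores and bandwidth selection are not given in the abstract.

```python
import numpy as np


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a callable."""
    samples = np.asarray(samples, dtype=float)

    def density(x):
        z = (x - samples) / bandwidth
        return np.exp(-0.5 * z ** 2).sum() / (len(samples) * bandwidth * np.sqrt(2 * np.pi))

    return density


def likelihood_ratio(score, same_writer_scores, diff_writer_scores, bandwidth=0.1):
    """LR = p(score | same writer, H_p) / p(score | different writers, H_d)."""
    f_same = gaussian_kde(same_writer_scores, bandwidth)
    f_diff = gaussian_kde(diff_writer_scores, bandwidth)
    return f_same(score) / f_diff(score)


if __name__ == "__main__":
    same = [0.7, 0.75, 0.8, 0.85, 0.9]   # hypothetical similarity scores
    diff = [0.1, 0.15, 0.2, 0.25, 0.3]
    for s in (0.2, 0.5, 0.8):
        print(s, likelihood_ratio(s, same, diff))
```

With so few reference scores, the LR far from the data is driven by kernel tails, which illustrates why confidence in an estimated LR varies by region.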

Keywords: Logistic Regression (LoR), Kernel Density Estimator (KDE), Handwriting, Confidence Interval, Repeatability, Reproducibility.

PDF Downloads: 405

12 Jeffrey's Prior for Unknown Sinusoidal Noise Model via Cramer-Rao Lower Bound

Authors: Samuel A. Phillips, Emmanuel A. Ayanlowo, Rasaki O. Olanrewaju, Olayode Fatoki

Abstract:

This paper employs the Jeffrey's prior technique to estimate the periodograms and frequency of a sinusoidal model for unknown noisy time-varying or oscillating events (data) in a Bayesian setting. The non-informative Jeffrey's prior was adopted for the posterior trigonometric function of the sinusoidal model, and Cramer-Rao Lower Bound (CRLB) inference was used to derive the minimum variance needed to curb the invariance structure effect for unknown noisy time observations and repeated circular patterns. A series of average monthly oscillating temperatures measured in degrees Celsius (°C) from 1901 to 2014 was subjected to the posterior solution for the unknown noisy events of the sinusoidal model via Markov Chain Monte Carlo (MCMC). It was deduced not only that a two-minute period is required to complete a cycle of changing temperature from one particular degree Celsius to another, but also that the sinusoidal model via the CRLB-Jeffrey's prior for unknown noisy events produced a smaller posterior Maximum A Posteriori (MAP) estimate compared to known noisy events.
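
The periodogram is the classical starting point for the frequency estimate discussed above. The sketch below covers only that step on a synthetic monthly series (the data, amplitude and noise level are invented). It does not reproduce the Jeffreys-prior posterior, the CRLB analysis or the MCMC sampling described in the abstract.

```python
import numpy as np


def periodogram(x):
    """Classical periodogram |DFT|^2 / n at the non-negative Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the mean so the zero-frequency bin is ~0
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per sample (per month here)
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    return freqs, power


def dominant_period(x):
    """Period (in samples) of the largest periodogram peak, skipping frequency 0."""
    freqs, power = periodogram(x)
    k = 1 + np.argmax(power[1:])
    return 1.0 / freqs[k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(1200)  # 100 years of synthetic monthly values
    x = 10 + 8 * np.cos(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)
    print(dominant_period(x))  # close to 12 samples
```

A Bayesian treatment would place priors on the amplitude, frequency and noise scale and sample their joint posterior, with the periodogram peak serving as a sensible initial value.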

Keywords: Cramer-Rao Lower Bound (CRLB), Jeffrey's prior, Sinusoidal, Maximum A Posteriori (MAP), Markov Chain Monte Carlo (MCMC), Periodograms.

PDF Downloads: 602

11 Novel Hybrid Method for Gene Selection and Cancer Prediction

Authors: Liping Jing, Michael K. Ng, Tieyong Zeng

Abstract:

Microarray data profile gene expression on a whole-genome scale and therefore provide a good way to study associations between gene expression and the occurrence or progression of cancer. More and more researchers have realized that microarray data are helpful for predicting cancer samples. However, the number of genes is much larger than the sample size, which makes this task very difficult. Therefore, identifying the significant genes causing cancer has become an urgent, popular and hard research topic. Many feature selection algorithms proposed in the past focus on improving cancer predictive accuracy at the expense of ignoring correlations between features. In this work, a novel framework (named SGS) is presented for stable gene selection and efficient cancer prediction. The proposed framework first applies a clustering algorithm to find gene groups in which genes have higher correlation coefficients, then selects significant genes within each group with Bayesian Lasso and important gene groups with group Lasso, and finally builds a prediction model on the shrunken gene space with an efficient classification algorithm (such as SVM, 1NN or regression). Experimental results on real-world data show that the proposed framework often outperforms existing feature selection and prediction methods such as SAM, IG and Lasso-type prediction models.

Keywords: Gene Selection, Cancer Prediction, Lasso, Clustering, Classification.

PDF Downloads: 1994

10 Laser Data Based Automatic Generation of Lane-Level Road Map for Intelligent Vehicles

Authors: Zehai Yu, Hui Zhu, Linglong Lin, Huawei Liang, Biao Yu, Weixin Huang

Abstract:

With the development of intelligent vehicle systems, high-precision road maps are increasingly needed in many respects. Automatic lane line extraction and modeling are the most essential steps in generating a precise lane-level road map. In this paper, an automatic lane-level road map generation system is proposed. To extract road markings on the ground, the multi-region Otsu thresholding method is applied; it computes the laser intensity threshold that maximizes the between-class variance of background and road markings. The extracted road marking points are then projected onto a raster image and clustered using a two-stage clustering algorithm. Lane lines are subsequently recognized from these clusters by the shape features of their minimum bounding rectangles. To ensure the storage efficiency of the map, the lane lines are approximated by cubic polynomial curves using a Bayesian estimation approach. The proposed system has been tested under urban and expressway conditions in Hefei, China. The experimental results show that the method achieves excellent extraction and clustering performance, and the fitted lines reach high positional accuracy, with an error of less than 10 cm.
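
Two steps named above can be sketched with NumPy: a single-threshold Otsu cut on intensity values, and a Bayesian cubic fit computed as the posterior mean under a zero-mean Gaussian prior on the coefficients. Both are simplified stand-ins: the paper uses a multi-region Otsu variant, and its exact Bayesian estimator, noise variance and prior variance are not stated, so `noise_var` and `prior_var` here are assumptions.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Single-threshold Otsu: the cut that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                 # class-0 probability for each cut
    w1 = 1.0 - w0
    m0_sum = np.cumsum(p * centers)   # unnormalized class-0 mean
    total_mean = m0_sum[-1]
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    between[valid] = (total_mean * w0[valid] - m0_sum[valid]) ** 2 / (w0[valid] * w1[valid])
    k = np.argmax(between)
    return edges[k + 1]               # values below this edge form class 0


def fit_cubic_bayes(x, y, noise_var=0.01, prior_var=1e4):
    """Posterior mean of [a0, a1, a2, a3] in y = a0 + a1 x + a2 x^2 + a3 x^3,
    assuming Gaussian noise and an N(0, prior_var I) coefficient prior."""
    X = np.vander(np.asarray(x, dtype=float), 4, increasing=True)
    A = X.T @ X / noise_var + np.eye(4) / prior_var
    b = X.T @ np.asarray(y, dtype=float) / noise_var
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    intensity = np.concatenate([rng.normal(20, 3, 1000), rng.normal(80, 5, 200)])
    print(otsu_threshold(intensity))  # falls in the gap between the two modes
    x = np.linspace(0, 10, 50)
    print(fit_cubic_bayes(x, 0.5 + 0.2 * x - 0.03 * x**2 + 0.001 * x**3))
```

Storing four coefficients per lane segment instead of raw points is what gives the storage efficiency the abstract mentions; the prior mainly stabilizes the fit when a segment has few points.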

Keywords: Curve fitting, lane-level road map, line recognition, multi-thresholding, two-stage clustering.

PDF Downloads: 462

9 Performance Assessment of Multi-Level Ensemble for Multi-Class Problems

Authors: Rodolfo Lorbieski, Silvia Modesto Nassar

Abstract:

Many supervised machine learning tasks require decisions across numerous classes. Multi-class classification has several applications, such as face recognition, text recognition and medical diagnostics. The objective of this article is to analyze an adaptation of Stacking for multi-class problems that combines ensembles within the ensemble itself. For this purpose, a Stacking-like training scheme with three levels was used: the final decision-maker (level 2) is trained by combining the outputs of pairs of meta-classifiers (level 1) from the tree-based and Bayesian families, which are in turn trained on pairs of base classifiers (level 0) of the same family. This strategy seeks to promote diversity among the ensembles that feed the level-2 meta-classifier. Three performance measures were used, (1) accuracy, (2) area under the ROC curve and (3) time, for three factors: (a) datasets, (b) experiments and (c) levels. To compare the factors, a three-way ANOVA was performed for each performance measure, considering 5 datasets, 25 experiments and 3 levels. A triple interaction between factors was observed only for time. Accuracy and area under the ROC curve showed similar results, with a double interaction between level and experiment, as well as for the dataset factor. It was concluded that level 2 performed better on average than the other levels and that the proposed method is especially efficient for multi-class problems compared to binary problems.

Keywords: Stacking, multi-layers, ensemble, multi-class.

PDF Downloads: 1044

8 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of breast cancer patients using the latest version of the Surveillance, Epidemiology, and End Results (SEER) Program database of the National Cancer Institute. The system consists of two main components: feature selection and the classifier ensemble. The feature selection component divides the features in the SEER database into four groups and then searches for the most important features among the four groups, namely those that maximize the weighted average F-score of a given classification algorithm. The ensemble component uses three different classifiers, each of which models a different set of SEER features chosen by the feature selection module. On top of them, another classifier makes the final decision based on the output decisions and confidence scores of the underlying classifiers. Different classification algorithms were examined; the best setup uses decision tree, Bayesian network and Naïve Bayes algorithms for the underlying classifiers and Naïve Bayes for the ensemble step. The system outperforms all systems published to date when evaluated on exactly the same SEER data (period 1973-2002), giving an 87.39% weighted average F-score compared to 85.82% and 81.34% for the other published systems. When the data are extended to cover the whole database (period 1973-2014), the overall weighted average F-score rises to 92.4% on the held-out test set.

Keywords: Classifier ensemble, breast cancer survivability, data mining, SEER.

PDF Downloads: 1621

7 Inferring User Preference Using Distance Dependent Chinese Restaurant Process and Weighted Distribution for a Content Based Recommender System

Authors: Bagher Rahimpour Cami, Hamid Hassanpour, Hoda Mashayekhi

Abstract:

Nowadays websites provide a vast number of resources for users. Recommender systems have been developed as an essential element of these websites to provide a personalized environment, helping users retrieve resources of interest from large sets of available resources. Because user preferences change over time, constructing an appropriate model to estimate them is the major task of recommender systems. Profile matching and latent factors are the two main approaches to identifying user preference. In this paper, we employ latent factors to cluster the user profile and profile matching to identify user preference. The method uses the Distance Dependent Chinese Restaurant Process as a Bayesian nonparametric framework to extract latent factors from the user profile. These latent factors are mapped to user interests, and a weighted distribution is used to identify user preferences. We evaluate the proposed method on a real-world dataset containing news tweets from a news agency (BBC). The experimental results and comparisons show that the proposed approach achieves better recommendation accuracy than existing methods and can evolve effectively over time.
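
In the distance dependent Chinese Restaurant Process, each item links to another item with weight given by a decay function of their distance, or to itself with weight alpha, and clusters are the connected groups of linked items. The sketch below draws one partition from that prior in the sequential case (links only to earlier items, for example earlier tweets in time) with an assumed exponential decay. Posterior inference, which the method needs to extract latent factors, is not shown, and the function name is hypothetical.

```python
import math
import random


def ddcrp_sample(positions, alpha=1.0, decay=1.0, seed=None):
    """Draw one partition from a sequential distance dependent CRP prior.

    Item i links to an earlier item j with weight exp(-|d_ij| / decay) or to
    itself with weight alpha. positions: 1-D coordinates such as timestamps.
    Returns, for each item, the index of the root item of its cluster.
    """
    rng = random.Random(seed)
    n = len(positions)
    links = []
    for i in range(n):
        # Only earlier items are candidates in the sequential variant.
        weights = [math.exp(-abs(positions[i] - positions[j]) / decay) for j in range(i)]
        weights.append(alpha)  # self-link starts a new cluster
        targets = list(range(i)) + [i]
        links.append(rng.choices(targets, weights=weights)[0])
    # With backward-only links, following them always ends at a self-link (the root).
    labels = [0] * n
    for i in range(n):
        j = i
        while links[j] != j:
            j = links[j]
        labels[i] = j
    return labels


if __name__ == "__main__":
    timestamps = [0.0, 0.1, 0.2, 5.0, 5.1]
    print(ddcrp_sample(timestamps, alpha=0.5, decay=0.5, seed=1))
```

Because link weights shrink with distance, items that are far apart in time rarely share a cluster, which is what lets such a model track interests that drift over time.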

Keywords: Content-based recommender systems, dynamic user modeling, extracting user interests, predicting user preference.
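In the standard Distance Dependent Chinese Restaurant Process, each item links to another item with probability proportional to a decay function of their distance, or to itself with probability proportional to a concentration parameter alpha; clusters ("tables") are the connected components of these links. The sketch below samples from this prior only. The time-based distance, the exponential decay and all function names are hypothetical choices for illustration, not the authors' inference procedure.

```python
import math
import random


def sample_ddcrp_links(dist, alpha, decay, rng):
    """Draw one customer link per item from the ddCRP prior.

    Item i links to itself with weight alpha and to item j != i with
    weight decay(dist[i][j]); the link is sampled proportionally.
    """
    n = len(dist)
    links = []
    for i in range(n):
        weights = [alpha if j == i else decay(dist[i][j]) for j in range(n)]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])
    return links


def tables_from_links(links):
    """Clusters are the connected components of the (undirected) link graph."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(links))]


# Hypothetical tweet timestamps in hours; distance is the time gap.
times = [0.0, 0.5, 1.0, 30.0, 30.5]
dist = [[abs(a - b) for b in times] for a in times]
rng = random.Random(0)
links = sample_ddcrp_links(dist, alpha=1.0, decay=lambda d: math.exp(-d / 2.0), rng=rng)
print(tables_from_links(links))
```

Because the decay shrinks with distance, items close in time tend to share a cluster, which is what lets the partition drift as a user's interests change.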

6 Behavioral Analysis of Team Members in Virtual Organization based on Trust Dimension and Learning

Authors: Indiramma M., K. R. Anandakumar

Abstract:

Trust management and reputation models are becoming an integral part of Internet-based applications such as CSCW, e-commerce and grid computing. Trust is also a significant social structure and key to social relations within a collaborative community. Collaborative Decision Making (CDM) is a difficult task in a distributed environment, where information spans different geographical locations, and in settings that involve multidisciplinary decisions, such as a Virtual Organization (VO). To aid team decision making in a VO, decision support system and social network analysis approaches are integrated. In such situations, social learning helps an organization with relationships, team formation, partner selection, etc. In this paper we focus on trust learning, which is an important activity for information exchange, negotiation, collaboration and trust assessment in cooperation among virtual team members. We propose a reinforcement learning approach that enhances the trust decision-making capability of interacting agents during collaborative problem solving. The computational trust model with learning that we present is adapted to select the best alternative for a new project in the organization. We verify our model in a multi-agent simulation in which the agents in the community learn to identify trustworthy members as well as inconsistent and conflicting behavior of agents.

Keywords: Collaborative Decision making, Trust, Multi Agent System (MAS), Bayesian Network, Reinforcement Learning.
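The abstract does not give the learning rule, so the following is only a generic reinforcement learning sketch, assuming an epsilon-greedy partner choice and an incremental update that moves each trust value toward the observed interaction outcome. The class name, parameters and reliability figures are all hypothetical.

```python
import random


class TrustLearner:
    """Hypothetical agent that learns a trust value for each potential partner."""

    def __init__(self, partners, learning_rate=0.1, epsilon=0.1, initial_trust=0.5, seed=None):
        self.trust = {p: initial_trust for p in partners}
        self.learning_rate = learning_rate
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose_partner(self):
        # Explore a random partner with probability epsilon; otherwise
        # exploit the partner with the highest current trust value.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.trust))
        return max(self.trust, key=self.trust.get)

    def update(self, partner, reward):
        # Move the trust estimate toward the observed outcome (reward in [0, 1]).
        self.trust[partner] += self.learning_rate * (reward - self.trust[partner])


# Simulated community: each partner cooperates with a fixed, hypothetical probability.
reliability = {"reliable": 0.9, "inconsistent": 0.5, "conflicting": 0.1}
agent = TrustLearner(reliability, seed=42)
env = random.Random(7)
for _ in range(500):
    partner = agent.choose_partner()
    reward = 1.0 if env.random() < reliability[partner] else 0.0
    agent.update(partner, reward)
print(agent.trust)
```

The exploration rate trades off re-checking partners whose behavior may have changed against exploiting the currently most trusted one.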

5 An Automatic Bayesian Classification System for File Format Selection

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for classifying unstructured format descriptions in order to identify file formats. The main contribution of this work is the use of data mining techniques to support file format selection based only on an unstructured text description that comprises the most important format features for a particular organisation. The file format identification method then employs a file format classifier and associated configurations to provide digital preservation experts with an estimate of the required file format. Our goal is to use a format specification knowledge base aggregated from different Web sources in order to select a file format for a particular institution. Using the naive Bayes method, the decision support system recommends to an expert the file format for the expert's institution. The proposed methods facilitate file format selection and improve the quality of the digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and file format specifications. To facilitate decision making, the aggregated information about the file formats is presented as a file format vocabulary that comprises the most common terms characteristic of all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. A sample file format calculation and the calculation results, including probabilities, are presented in the evaluation section.

Keywords: Data mining, digital libraries, digital preservation, file format.
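A naive Bayes classifier over a term vocabulary can be sketched as a multinomial model with Laplace smoothing that returns posterior probabilities per format. This is a generic illustration, not the authors' configuration; the training descriptions and format labels below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (description, format_label) pairs."""
    label_counts = Counter(label for _, label in docs)
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab


def posterior(text, model, alpha=1.0):
    """Posterior P(format | description) with Laplace (add-alpha) smoothing."""
    label_counts, token_counts, vocab = model
    n_docs = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        total = sum(token_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for tok in text.lower().split():
            if tok in vocab:  # ignore terms outside the vocabulary
                score += math.log((token_counts[label][tok] + alpha) / (total + alpha * len(vocab)))
        log_scores[label] = score
    # Normalise in log space to avoid underflow.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {label: math.exp(s - m) / z for label, s in log_scores.items()}


docs = [
    ("lossless raster image uncompressed archival", "TIFF"),
    ("lossy compressed raster image web", "JPEG"),
    ("document text layout archival embedded fonts", "PDF/A"),
]
model = train_nb(docs)
probs = posterior("lossy web image", model)
print(max(probs, key=probs.get))  # JPEG
```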

4 Region Segmentation based on Gaussian Dirichlet Process Mixture Model and its Application to 3D Geometric Structure Detection

Authors: Jonghyun Park, Soonyoung Park, Sanggyun Kim, Wanhyun Cho, Sunworl Kim

Abstract:

In general, image-based 3D scenes can now be found in many popular vision systems, computer games and virtual reality tours. It is therefore important to segment the ROI (region of interest) from input scenes as a preprocessing step for geometric structure detection in a 3D scene. In this paper, we propose a method for segmenting the ROI based on tensor voting and a Dirichlet process mixture model. In particular, to estimate geometric structure information for a 3D scene from a single outdoor image, we apply tensor voting and the Dirichlet process mixture model to image segmentation. Tensor voting is used because homogeneous regions in an image are usually close together on a smooth region, so the tokens corresponding to the centers of these regions have high saliency values. The proposed approach is a novel nonparametric Bayesian segmentation method that uses a Gaussian Dirichlet process mixture model to automatically segment various natural scenes. Finally, our method labels regions of the input image with the coarse categories “ground”, “sky” and “vertical” for 3D applications. The experimental results show that our method successfully segments coarse regions in many complex natural scene images for 3D.

Keywords: Region segmentation, tensor voting, image-based 3D, geometric structure, Gaussian Dirichlet process mixture model
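What makes a Dirichlet process mixture "nonparametric" is that the number of components is not fixed in advance. The sketch below shows only the generative side, assuming a truncated stick-breaking construction and one-dimensional Gaussian components with hypothetical prior settings; the paper's segmentation would additionally require posterior inference (e.g. Gibbs sampling or variational methods), which is not shown.

```python
import random


def stick_breaking_weights(concentration, truncation, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, concentration),
    w_k = v_k * prod_{l<k} (1 - v_l); the last weight takes the remaining stick."""
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def sample_dpmm(n, concentration, truncation, rng, mean_prior_sd=5.0, obs_sd=1.0):
    """Draw n 1-D points from a truncated Gaussian DP mixture (generative model only)."""
    weights = stick_breaking_weights(concentration, truncation, rng)
    means = [rng.gauss(0.0, mean_prior_sd) for _ in weights]  # component means from the base measure
    labels = rng.choices(range(truncation), weights=weights, k=n)
    points = [rng.gauss(means[z], obs_sd) for z in labels]
    return points, labels, weights


rng = random.Random(0)
points, labels, weights = sample_dpmm(200, concentration=1.0, truncation=20, rng=rng)
print(len(set(labels)), "components used out of", len(weights))
```

A small concentration parameter concentrates mass on a few components, so only a handful of "regions" are typically occupied even though the truncation allows many.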

3 Dengue Disease Mapping with Standardized Morbidity Ratio and Poisson-gamma Model: An Analysis of Dengue Disease in Perak, Malaysia

Authors: N. A. Samat, S. H. Mohd Imam Ma’arof

Abstract:

Dengue disease is an infectious vector-borne viral disease that is commonly found in tropical and sub-tropical regions around the world, including Malaysia, especially in urban and semi-urban areas. There is currently no available vaccine or chemotherapy for the prevention or treatment of dengue disease. Therefore, prevention and treatment of the disease depend on vector surveillance and control measures. Disease risk mapping has been recognized as an important tool in prevention and control strategies for diseases. The choice of statistical model used for relative risk estimation is important, as a good model will subsequently produce a good disease risk map. Therefore, the aim of this study is to estimate the relative risk for dengue disease based initially on the most common statistic used in disease mapping, the Standardized Morbidity Ratio (SMR), and on one of the earliest applications of Bayesian methodology, the Poisson-gamma model. This paper begins by providing a review of the SMR method, which we then apply to dengue data from Perak, Malaysia. We then fit an extension of the SMR method, the Poisson-gamma model. Both results are displayed and compared using graphs, tables and maps. The results of the analysis show that the latter method gives better relative risk estimates than the SMR. The Poisson-gamma model has been shown to overcome the problem of the SMR when there are no observed dengue cases in certain regions. However, covariate adjustment in this model is difficult, and it cannot allow for spatial correlation between risks in adjacent areas. These drawbacks have motivated many researchers to propose alternative methods for estimating the risk.

Keywords: Dengue disease, Disease mapping, Standardized Morbidity Ratio, Poisson-gamma model, Relative risk.
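The two estimators can be written in a few lines. For region i with observed count O_i and expected count E_i, the SMR is O_i / E_i. In the standard Poisson-gamma model, O_i ~ Poisson(E_i θ_i) with θ_i ~ Gamma(α, β) (shape α, rate β), so the posterior is Gamma(α + O_i, β + E_i) and its mean is (O_i + α) / (E_i + β). The sketch assumes fixed α and β for simplicity; in practice they are usually estimated from the data. The district counts are hypothetical, not the Perak data.

```python
def smr(observed, expected):
    """Standardized Morbidity Ratio: O_i / E_i."""
    return observed / expected


def poisson_gamma_rr(observed, expected, alpha, beta):
    """Posterior mean of theta_i under O_i ~ Poisson(E_i * theta_i),
    theta_i ~ Gamma(shape=alpha, rate=beta)."""
    return (observed + alpha) / (expected + beta)


# Hypothetical (observed, expected) dengue counts for three districts.
districts = {"A": (6, 3.0), "B": (0, 2.0), "C": (10, 10.0)}
alpha, beta = 2.0, 2.0  # assumed prior with mean alpha / beta = 1
for name, (o, e) in districts.items():
    print(name, smr(o, e), poisson_gamma_rr(o, e, alpha, beta))
```

District B illustrates the problem mentioned above: with no observed cases the SMR is exactly zero, whereas the Poisson-gamma estimate is shrunk toward the prior mean and stays positive.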

2 A Bayesian Classification System for Facilitating an Institutional Risk Profile Definition

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for the easy creation and classification of institutional risk profiles that supports endangerment analysis of file formats. The main contribution of this work is the use of data mining techniques to support the setup of the most important risk factors. Risk profiles then employ a risk factor classifier and associated configurations to provide digital preservation experts with a semi-automatic estimate of the endangerment group for file format risk profiles. Our goal is to use an expert knowledge base, acquired through a digital preservation survey, in order to detect preservation risks for a particular institution. Another contribution is support for visualising risk factors along a required dimension for analysis. Using the naive Bayes method, the decision support system recommends to an expert the matching risk profile group for the previously selected institutional risk profile. The proposed methods improve the visibility of risk factor values and the quality of the digital preservation process. The presented approach is designed to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and values of file format risk profiles. To facilitate decision making, the aggregated information about the risk factors is presented as a multidimensional vector. The goal is to visualise particular dimensions of this vector for analysis by an expert and to define its profile group. A sample risk profile calculation and the visualisation of some risk factor dimensions are presented in the evaluation section.

Keywords: linked open data, information integration, digital libraries, data mining.
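When each risk profile is a vector of discrete risk factor values, naive Bayes can be applied in its categorical form: each factor contributes an independent, smoothed likelihood per profile group. The sketch below is a generic illustration under that assumption; the factor names, values and group labels are hypothetical, not taken from the paper's survey.

```python
from collections import Counter, defaultdict


def train_categorical_nb(profiles):
    """profiles: list of (dict mapping risk factor -> value, group label)."""
    group_counts = Counter(group for _, group in profiles)
    value_counts = defaultdict(Counter)  # (group, factor) -> counts of values
    factor_values = defaultdict(set)     # factor -> set of observed values
    for factors, group in profiles:
        for factor, value in factors.items():
            value_counts[(group, factor)][value] += 1
            factor_values[factor].add(value)
    return group_counts, value_counts, factor_values


def classify(factors, model, alpha=1.0):
    """Return normalised P(group | risk factor values) with add-alpha smoothing."""
    group_counts, value_counts, factor_values = model
    total = sum(group_counts.values())
    scores = {}
    for group, n_group in group_counts.items():
        p = n_group / total  # prior
        for factor, value in factors.items():
            if factor not in factor_values:
                continue  # skip factors never seen in training
            k = len(factor_values[factor])
            p *= (value_counts[(group, factor)][value] + alpha) / (n_group + alpha * k)
        scores[group] = p
    z = sum(scores.values())
    return {group: s / z for group, s in scores.items()}


profiles = [
    ({"software_support": "low", "adoption": "low"}, "endangered"),
    ({"software_support": "low", "adoption": "high"}, "endangered"),
    ({"software_support": "high", "adoption": "high"}, "safe"),
    ({"software_support": "high", "adoption": "high"}, "safe"),
]
model = train_categorical_nb(profiles)
result = classify({"software_support": "low", "adoption": "low"}, model)
print(max(result, key=result.get))  # endangered
```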

1 Improving 99mTc-tetrofosmin Myocardial Perfusion Images by Time Subtraction Technique

Authors: Yasuyuki Takahashi, Hayato Ishimura, Masao Miyagawa, Teruhito Mochizuki

Abstract:

Quantitative measurement of myocardial perfusion is possible with single photon emission computed tomography (SPECT) using a semiconductor detector. However, accumulation of 99mTc-tetrofosmin in the liver may make it difficult to assess perfusion accurately in the inferior myocardium. Our idea is to reduce the high accumulation in the liver by using dynamic SPECT imaging and a technique called time subtraction. We evaluated the performance of a new SPECT system with a cadmium-zinc-telluride solid-state semiconductor detector (Discovery NM 530c; GE Healthcare). Our system acquired list-mode raw data over 10 minutes for a typical patient. From the data, ten SPECT images were reconstructed, one for every minute of acquired data. Reconstruction with the semiconductor detector was based on an implementation of a 3-D iterative Bayesian reconstruction algorithm. We studied 20 patients with coronary artery disease (mean age 75.4 ± 12.1 years; range 42-86; 16 males and 4 females). In each subject, 259 MBq of 99mTc-tetrofosmin was injected intravenously. We performed both a phantom study and a clinical study using dynamic SPECT. An approximation to a liver-only image is obtained by reconstructing an image from the early projections, during which the liver accumulation dominates (0.5-2.5 minute SPECT image minus 5-10 minute SPECT image). The extracted liver-only image is then subtracted from a later SPECT image that shows both the liver and the myocardial uptake (5-10 minute SPECT image minus liver-only image). Time subtraction of the liver was possible in both the phantom and the clinical study, and the visualization of the inferior myocardium was improved. In past reports, myocardium showing higher accumulation because of overlap with the liver was undiagnosable. Using our time subtraction method, the image quality of the 99mTc-tetrofosmin myocardial SPECT image is considerably improved.

Keywords: 99mTc-tetrofosmin, dynamic SPECT, time subtraction, semiconductor detector.
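The two subtraction steps described in the abstract are simple pixel-wise arithmetic: liver-only = early image minus late image, then corrected = late image minus liver-only. The sketch below assumes both images are already on comparable count scales (e.g. count rates per frame duration), which the abstract does not specify, and clips negative values to zero; the clipping and all function names are assumptions for illustration, not the authors' implementation.

```python
def subtract(a, b):
    """Pixel-wise difference of two equally sized 2-D images (lists of rows)."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def clip_negative(image):
    """Assumed post-processing: negative counts are set to zero."""
    return [[max(0.0, v) for v in row] for row in image]


def time_subtraction(early, late):
    """early: 0.5-2.5 min image; late: 5-10 min image (same scale assumed)."""
    liver_only = clip_negative(subtract(early, late))      # early minus late
    corrected = clip_negative(subtract(late, liver_only))  # late minus liver-only
    return liver_only, corrected


# Hypothetical 1x2 images: pixel 0 is liver-dominated, pixel 1 is myocardium.
early = [[10.0, 2.0]]
late = [[4.0, 5.0]]
liver_only, corrected = time_subtraction(early, late)
print(liver_only, corrected)  # [[6.0, 0.0]] [[0.0, 5.0]]
```

In this toy example the liver-dominated pixel is suppressed while the myocardial pixel keeps its late-phase value, which is the effect the method aims for in the inferior wall.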
