Search results for: Bayesian Kernel
33 Support Vector Machine based Intelligent Watermark Decoding for Anticipated Attack
Authors: Syed Fahad Tahir, Asifullah Khan, Abdul Majid, Anwar M. Mirza
Abstract:
In this paper, we present an innovative scheme of blindly extracting message bits from an image distorted by an attack. Support Vector Machine (SVM) is used to nonlinearly classify the bits of the embedded message. Traditionally, a hard decoder is used with the assumption that the underlying modeling of the Discrete Cosine Transform (DCT) coefficients does not appreciably change. In case of an attack, the distribution of the image coefficients is heavily altered. The distribution of the sufficient statistics at the receiving end corresponding to the antipodal signals overlap and a simple hard decoder fails to classify them properly. We are considering message retrieval of antipodal signal as a binary classification problem. Machine learning techniques like SVM is used to retrieve the message, when certain specific class of attacks is most probable. In order to validate SVM based decoding scheme, we have taken Gaussian noise as a test case. We generate a data set using 125 images and 25 different keys. Polynomial kernel of SVM has achieved 100 percent accuracy on test data.Keywords: Bit Correct Ratio (BCR), Grid Search, Intelligent Decoding, Jackknife Technique, Support Vector Machine (SVM), Watermarking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 166932 Least-Squares Support Vector Machine for Characterization of Clusters of Microcalcifications
Authors: Baljit Singh Khehra, Amar Partap Singh Pharwaha
Abstract:
Clusters of Microcalcifications (MCCs) are most frequent symptoms of Ductal Carcinoma in Situ (DCIS) recognized by mammography. Least-Square Support Vector Machine (LS-SVM) is a variant of the standard SVM. In the paper, LS-SVM is proposed as a classifier for classifying MCCs as benign or malignant based on relevant extracted features from enhanced mammogram. To establish the credibility of LS-SVM classifier for classifying MCCs, a comparative evaluation of the relative performance of LS-SVM classifier for different kernel functions is made. For comparative evaluation, confusion matrix and ROC analysis are used. Experiments are performed on data extracted from mammogram images of DDSM database. A total of 380 suspicious areas are collected, which contain 235 malignant and 145 benign samples, from mammogram images of DDSM database. A set of 50 features is calculated for each suspicious area. After this, an optimal subset of 23 most suitable features is selected from 50 features by Particle Swarm Optimization (PSO). The results of proposed study are quite promising.
Keywords: Clusters of Microcalcifications, Ductal Carcinoma in Situ, Least-Square Support Vector Machine, Particle Swarm Optimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 181231 Moving Object Detection Using Histogram of Uniformly Oriented Gradient
Authors: Wei-Jong Yang, Yu-Siang Su, Pau-Choo Chung, Jar-Ferr Yang
Abstract:
Moving object detection (MOD) is an important issue in advanced driver assistance systems (ADAS). There are two important moving objects, pedestrians and scooters in ADAS. In real-world systems, there exist two important challenges for MOD, including the computational complexity and the detection accuracy. The histogram of oriented gradient (HOG) features can easily detect the edge of object without invariance to changes in illumination and shadowing. However, to reduce the execution time for real-time systems, the image size should be down sampled which would lead the outlier influence to increase. For this reason, we propose the histogram of uniformly-oriented gradient (HUG) features to get better accurate description of the contour of human body. In the testing phase, the support vector machine (SVM) with linear kernel function is involved. Experimental results show the correctness and effectiveness of the proposed method. With SVM classifiers, the real testing results show the proposed HUG features achieve better than classification performance than the HOG ones.
Keywords: Moving object detection, histogram of oriented gradient histogram of oriented gradient, histogram of uniformly-oriented gradient, linear support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 123330 Mixtures of Monotone Networks for Prediction
Authors: Marina Velikova, Hennie Daniels, Ad Feelders
Abstract:
In many data mining applications, it is a priori known that the target function should satisfy certain constraints imposed by, for example, economic theory or a human-decision maker. In this paper we consider partially monotone prediction problems, where the target variable depends monotonically on some of the input variables but not on all. We propose a novel method to construct prediction models, where monotone dependences with respect to some of the input variables are preserved by virtue of construction. Our method belongs to the class of mixture models. The basic idea is to convolute monotone neural networks with weight (kernel) functions to make predictions. By using simulation and real case studies, we demonstrate the application of our method. To obtain sound assessment for the performance of our approach, we use standard neural networks with weight decay and partially monotone linear models as benchmark methods for comparison. The results show that our approach outperforms partially monotone linear models in terms of accuracy. Furthermore, the incorporation of partial monotonicity constraints not only leads to models that are in accordance with the decision maker's expertise, but also reduces considerably the model variance in comparison to standard neural networks with weight decay.Keywords: mixture models, monotone neural networks, partially monotone models, partially monotone problems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 124529 Multiscale Syntheses of Knee Collateral Ligament Stresses: Aggregate Mechanics as a Function of Molecular Properties
Authors: Raouf Mbarki, Fadi Al Khatib, Malek Adouni
Abstract:
Knee collateral ligaments play a significant role in restraining excessive frontal motion (varus/valgus rotations). In this investigation, a multiscale frame was developed based on structural hierarchies of the collateral ligaments starting from the bottom (tropocollagen molecule) to up where the fibred reinforced structure established. Experimental data of failure tensile test were considered as the principal driver of the developed model. This model was calibrated statistically using Bayesian calibration due to the high number of unknown parameters. Then the model is scaled up to fit the real structure of the collateral ligaments and simulated under realistic boundary conditions. Predications have been successful in describing the observed transient response of the collateral ligaments during tensile test under pre- and post-damage loading conditions. Collateral ligaments maximum stresses and strengths were observed near to the femoral insertions, a results that is in good agreement with experimental investigations. Also for the first time, damage initiation and propagation were documented with this model as a function of the cross-link density between tropocollagen molecules.
Keywords: Multiscale model, tropocollagen, fibrils, ligaments.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 59928 Predictive Analytics of Student Performance Determinants in Education
Authors: Mahtab Davari, Charles Edward Okon, Somayeh Aghanavesi
Abstract:
Every institute of learning is usually interested in the performance of enrolled students. The level of these performances determines the approach an institute of study may adopt in rendering academic services. The focus of this paper is to evaluate students' academic performance in given courses of study using machine learning methods. This study evaluated various supervised machine learning classification algorithms such as Logistic Regression (LR), Support Vector Machine (SVM), Random Forest, Decision Tree, K-Nearest Neighbors, Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis, using selected features to predict study performance. The accuracy, precision, recall, and F1 score obtained from a 5-Fold Cross-Validation were used to determine the best classification algorithm to predict students’ performances. SVM (using a linear kernel), LDA, and LR were identified as the best-performing machine learning methods. Also, using the LR model, this study identified students' educational habits such as reading and paying attention in class as strong determinants for a student to have an above-average performance. Other important features include the academic history of the student and work. Demographic factors such as age, gender, high school graduation, etc., had no significant effect on a student's performance.
Keywords: Student performance, supervised machine learning, prediction, classification, cross-validation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 54827 Gas Detection via Machine Learning
Authors: Walaa Khalaf, Calogero Pace, Manlio Gaudioso
Abstract:
We present an Electronic Nose (ENose), which is aimed at identifying the presence of one out of two gases, possibly detecting the presence of a mixture of the two. Estimation of the concentrations of the components is also performed for a volatile organic compound (VOC) constituted by methanol and acetone, for the ranges 40-400 and 22-220 ppm (parts-per-million), respectively. Our system contains 8 sensors, 5 of them being gas sensors (of the class TGS from FIGARO USA, INC., whose sensing element is a tin dioxide (SnO2) semiconductor), the remaining being a temperature sensor (LM35 from National Semiconductor Corporation), a humidity sensor (HIH–3610 from Honeywell), and a pressure sensor (XFAM from Fujikura Ltd.). Our integrated hardware–software system uses some machine learning principles and least square regression principle to identify at first a new gas sample, or a mixture, and then to estimate the concentrations. In particular we adopt a training model using the Support Vector Machine (SVM) approach with linear kernel to teach the system how discriminate among different gases. Then we apply another training model using the least square regression, to predict the concentrations. The experimental results demonstrate that the proposed multiclassification and regression scheme is effective in the identification of the tested VOCs of methanol and acetone with 96.61% correctness. The concentration prediction is obtained with 0.979 and 0.964 correlation coefficient for the predicted versus real concentrations of methanol and acetone, respectively.Keywords: Electronic nose, Least square regression, Mixture ofgases, Support Vector Machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 253826 Comparison of Machine Learning Techniques for Single Imputation on Audiograms
Authors: Sarah Beaver, Renee Bryce
Abstract:
Audiograms detect hearing impairment, but missing values pose problems. This work explores imputations in an attempt to improve accuracy. This work implements Linear Regression, Lasso, Linear Support Vector Regression, Bayesian Ridge, K Nearest Neighbors (KNN), and Random Forest machine learning techniques to impute audiogram frequencies ranging from 125 Hz to 8000 Hz. The data contain patients who had or were candidates for cochlear implants. Accuracy is compared across two different Nested Cross-Validation k values. Over 4000 audiograms were used from 800 unique patients. Additionally, training on data combines and compares left and right ear audiograms versus single ear side audiograms. The accuracy achieved using Root Mean Square Error (RMSE) values for the best models for Random Forest ranges from 4.74 to 6.37. The R2 values for the best models for Random Forest ranges from .91 to .96. The accuracy achieved using RMSE values for the best models for KNN ranges from 5.00 to 7.72. The R2 values for the best models for KNN ranges from .89 to .95. The best imputation models received R2 between .89 to .96 and RMSE values less than 8dB. We also show that the accuracy of classification predictive models performed better with our imputation models versus constant imputations by a two percent increase.
Keywords: Machine Learning, audiograms, data imputations, single imputations.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15825 Application of Machine Learning Methods to Online Test Error Detection in Semiconductor Test
Authors: Matthias Kirmse, Uwe Petersohn, Elief Paffrath
Abstract:
As in today's semiconductor industries test costs can make up to 50 percent of the total production costs, an efficient test error detection becomes more and more important. In this paper, we present a new machine learning approach to test error detection that should provide a faster recognition of test system faults as well as an improved test error recall. The key idea is to learn a classifier ensemble, detecting typical test error patterns in wafer test results immediately after finishing these tests. Since test error detection has not yet been discussed in the machine learning community, we define central problem-relevant terms and provide an analysis of important domain properties. Finally, we present comparative studies reflecting the failure detection performance of three individual classifiers and three ensemble methods based upon them. As base classifiers we chose a decision tree learner, a support vector machine and a Bayesian network, while the compared ensemble methods were simple and weighted majority vote as well as stacking. For the evaluation, we used cross validation and a specially designed practical simulation. By implementing our approach in a semiconductor test department for the observation of two products, we proofed its practical applicability.
Keywords: Ensemble methods, fault detection, machine learning, semiconductor test.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 227324 Effect of Rollers Differential Speed and Paddy Moisture Content on Performance of Rubber Roll Husker
Authors: S. Firouzi, M.R. Alizadeh, S. Minaei
Abstract:
A study was carried out at the Rice Research Institute of Iran (RRII) to investigate the effect of rollers differential peripheral speed of commercial rubber roll husker and paddy moisture content on the husking index and percentage of broken rice. The experiment was conducted at six levels of rollers differential speed (1.5, 2.2, 2.9, 3.6, 4.3 and 5 m/s) and three levels of paddy moisture content (8-9, 10-11 and 12-13% w.b.). Two common paddy varieties namely, Binam and Khazer, were selected for this study. Results revealed that the effect of rollers differential speed and moisture content significantly (P<0.01) affected percentage of broken brown rice and paddy husking index. Average broken kernel percentage increased from 13 to 14.61% while husking index decreased from 71.64 to 61.81%, as paddy moisture content increased from 8-9 to 12-13%. It was observed that amount of broken rice decreased from 18.83 to 9.97%, when rollers differential speed varied from 1.5 to 5 m/s, while the husking index initially increased and then started to decrease. The mean value of husking index for Khazar variety (64.71%) was significantly lower than that for Binam variety (69.2%). It was concluded that rollers differential speed of 2.9 m/s and moisture content of 8-9% was the most appropriate combination for paddy husking of Binam and Khazar varieties in rubber roll husker.Keywords: husking index, moisture content, paddy, rubber roll husker.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 328723 Texture Feature-Based Language Identification Using Wavelet-Domain BDIP and BVLC Features and FFT Feature
Authors: Ick Hoon Jang, Hoon Jae Lee, Dae Hoon Kwon, Ui Young Pak
Abstract:
In this paper, we propose a texture feature-based language identification using wavelet-domain BDIP (block difference of inverse probabilities) and BVLC (block variance of local correlation coefficients) features and FFT (fast Fourier transform) feature. In the proposed method, wavelet subbands are first obtained by wavelet transform from a test image and denoised by Donoho-s soft-thresholding. BDIP and BVLC operators are next applied to the wavelet subbands. FFT blocks are also obtained by 2D (twodimensional) FFT from the blocks into which the test image is partitioned. Some significant FFT coefficients in each block are selected and magnitude operator is applied to them. Moments for each subband of BDIP and BVLC and for each magnitude of significant FFT coefficients are then computed and fused into a feature vector. In classification, a stabilized Bayesian classifier, which adopts variance thresholding, searches the training feature vector most similar to the test feature vector. Experimental results show that the proposed method with the three operations yields excellent language identification even with rather low feature dimension.Keywords: BDIP, BVLC, FFT, language identification, texture feature, wavelet transform.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 214922 Learning to Recognize Faces by Local Feature Design and Selection
Authors: Yanwei Pang, Lei Zhang, Zhengkai Liu
Abstract:
Studies in neuroscience suggest that both global and local feature information are crucial for perception and recognition of faces. It is widely believed that local feature is less sensitive to variations caused by illumination, expression and illumination. In this paper, we target at designing and learning local features for face recognition. We designed three types of local features. They are semi-global feature, local patch feature and tangent shape feature. The designing of semi-global feature aims at taking advantage of global-like feature and meanwhile avoiding suppressing AdaBoost algorithm in boosting weak classifies established from small local patches. The designing of local patch feature targets at automatically selecting discriminative features, and is thus different with traditional ways, in which local patches are usually selected manually to cover the salient facial components. Also, shape feature is considered in this paper for frontal view face recognition. These features are selected and combined under the framework of boosting algorithm and cascade structure. The experimental results demonstrate that the proposed approach outperforms the standard eigenface method and Bayesian method. Moreover, the selected local features and observations in the experiments are enlightening to researches in local feature design in face recognition.Keywords: Face recognition, local feature, AdaBoost, subspace analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 159621 Development of a Cost Effective Two Wheel Tractor Mounted Mobile Maize Sheller for Small Farmers in Bangladesh
Authors: M. Israil Hossain, T. P. Tiwari, Ashrafuzzaman Gulandaz, Nusrat Jahan
Abstract:
Two-wheel tractor (power tiller) is a common tillage tool in Bangladesh agriculture for easy access in fragmented land with affordable price of small farmers. Traditional maize sheller needs to be carried from place to place by hooking with two-wheel tractor (2WT) and set up again for shelling operation which takes longer time for preparation of maize shelling. The mobile maize sheller eliminates the transportation problem and can start shelling operation instantly any place as it is attached together with 2WT. It is counterclockwise rotating cylinder, axial flow type sheller, and grain separated with a frictional force between spike tooth and concave. The maize sheller is attached with nuts and bolts in front of the engine base of 2WT. The operating power of the sheller comes from the fly wheel of the engine of the tractor through ‘V” belt pulley arrangement. The average shelling capacity of the mobile sheller is 2.0 t/hr, broken kernel 2.2%, and shelling efficiency 97%. The average maize shelling cost is Tk. 0.22/kg and traditional custom hire rate is Tk.1.0/kg, respectively (1 US$=Tk.78.0). The service provider of the 2WT can transport the mobile maize sheller long distance in operator’s seating position. The manufacturers started the fabrication of mobile maize sheller. This mobile maize sheller is also compatible for the other countries where 2WT is available for farming operation.
Keywords: Cost effective, mobile maize sheller, maize shelling capacity, small farmers, two-wheel tractor.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 90620 Effect of Modified Atmosphere Packaging and Storage Temperatures on Quality of Shelled Raw Walnuts
Authors: M. Javanmard
Abstract:
This study was aimed at analyzing the effects of packaging (MAP) and preservation conditions on the packaged fresh walnut kernel quality. The central composite plan was used for evaluating the effect of oxygen (0–10%), carbon dioxide (0-10%), and temperature (4-26 °C) on qualitative characteristics of walnut kernels. Also, the response level technique was used to find the optimal conditions for interactive effects of factors, as well as estimating the best conditions of process using least amount of testing. Measured qualitative parameters were: peroxide index, color, decreased weight, mould and yeast counting test, and sensory evaluation. The results showed that the defined model for peroxide index, color, weight loss, and sensory evaluation is significant (p < 0.001), so that increase of temperature causes the peroxide value, color variation, and weight loss to increase and it reduces the overall acceptability of walnut kernels. An increase in oxygen percentage caused the color variation level and peroxide value to increase and resulted in lower overall acceptability of the walnuts. An increase in CO2 percentage caused the peroxide value to decrease, but did not significantly affect other indices (p ≥ 0.05). Mould and yeast were not found in any samples. Optimal packaging conditions to achieve maximum quality of walnuts include: 1.46% oxygen, 10% carbon dioxide, and temperature of 4 °C.
Keywords: Shelled walnut, MAP, quality, storage temperature.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 113919 Jeffrey's Prior for Unknown Sinusoidal Noise Model via Cramer-Rao Lower Bound
Authors: Samuel A. Phillips, Emmanuel A. Ayanlowo, Rasaki O. Olanrewaju, Olayode Fatoki
Abstract:
This paper employs the Jeffrey's prior technique in the process of estimating the periodograms and frequency of sinusoidal model for unknown noisy time variants or oscillating events (data) in a Bayesian setting. The non-informative Jeffrey's prior was adopted for the posterior trigonometric function of the sinusoidal model such that Cramer-Rao Lower Bound (CRLB) inference was used in carving-out the minimum variance needed to curb the invariance structure effect for unknown noisy time observational and repeated circular patterns. An average monthly oscillating temperature series measured in degree Celsius (0C) from 1901 to 2014 was subjected to the posterior solution of the unknown noisy events of the sinusoidal model via Markov Chain Monte Carlo (MCMC). It was not only deduced that two minutes period is required before completing a cycle of changing temperature from one particular degree Celsius to another but also that the sinusoidal model via the CRLB-Jeffrey's prior for unknown noisy events produced a miniature posterior Maximum A Posteriori (MAP) compare to a known noisy events.
Keywords: Cramer-Rao Lower Bound (CRLB), Jeffrey's prior, Sinusoidal, Maximum A Posteriori (MAP), Markov Chain Monte Carlo (MCMC), Periodograms.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 65818 Novel Hybrid Method for Gene Selection and Cancer Prediction
Authors: Liping Jing, Michael K. Ng, Tieyong Zeng
Abstract:
Microarray data profiles gene expression on a whole genome scale, therefore, it provides a good way to study associations between gene expression and occurrence or progression of cancer. More and more researchers realized that microarray data is helpful to predict cancer sample. However, the high dimension of gene expressions is much larger than the sample size, which makes this task very difficult. Therefore, how to identify the significant genes causing cancer becomes emergency and also a hot and hard research topic. Many feature selection algorithms have been proposed in the past focusing on improving cancer predictive accuracy at the expense of ignoring the correlations between the features. In this work, a novel framework (named by SGS) is presented for stable gene selection and efficient cancer prediction . The proposed framework first performs clustering algorithm to find the gene groups where genes in each group have higher correlation coefficient, and then selects the significant genes in each group with Bayesian Lasso and important gene groups with group Lasso, and finally builds prediction model based on the shrinkage gene space with efficient classification algorithm (such as, SVM, 1NN, Regression and etc.). Experiment results on real world data show that the proposed framework often outperforms the existing feature selection and prediction methods, say SAM, IG and Lasso-type prediction model.Keywords: Gene Selection, Cancer Prediction, Lasso, Clustering, Classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 204317 Laser Data Based Automatic Generation of Lane-Level Road Map for Intelligent Vehicles
Authors: Zehai Yu, Hui Zhu, Linglong Lin, Huawei Liang, Biao Yu, Weixin Huang
Abstract:
With the development of intelligent vehicle systems, a high-precision road map is increasingly needed in many aspects. The automatic lane lines extraction and modeling are the most essential steps for the generation of a precise lane-level road map. In this paper, an automatic lane-level road map generation system is proposed. To extract the road markings on the ground, the multi-region Otsu thresholding method is applied, which calculates the intensity value of laser data that maximizes the variance between background and road markings. The extracted road marking points are then projected to the raster image and clustered using a two-stage clustering algorithm. Lane lines are subsequently recognized from these clusters by the shape features of their minimum bounding rectangle. To ensure the storage efficiency of the map, the lane lines are approximated to cubic polynomial curves using a Bayesian estimation approach. The proposed lane-level road map generation system has been tested on urban and expressway conditions in Hefei, China. The experimental results on the datasets show that our method can achieve excellent extraction and clustering effect, and the fitted lines can reach a high position accuracy with an error of less than 10 cm.
Keywords: Curve fitting, lane-level road map, line recognition, multi-thresholding, two-stage clustering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 51216 Performance Assessment of Multi-Level Ensemble for Multi-Class Problems
Authors: Rodolfo Lorbieski, Silvia Modesto Nassar
Abstract:
Many supervised machine learning tasks require decision making across numerous different classes. Multi-class classification has several applications, such as face recognition, text recognition and medical diagnostics. The objective of this article is to analyze an adapted method of Stacking in multi-class problems, which combines ensembles within the ensemble itself. For this purpose, a training similar to Stacking was used, but with three levels, where the final decision-maker (level 2) performs its training by combining outputs from the tree-based pair of meta-classifiers (level 1) from Bayesian families. These are in turn trained by pairs of base classifiers (level 0) of the same family. This strategy seeks to promote diversity among the ensembles forming the meta-classifier level 2. Three performance measures were used: (1) accuracy, (2) area under the ROC curve, and (3) time for three factors: (a) datasets, (b) experiments and (c) levels. To compare the factors, ANOVA three-way test was executed for each performance measure, considering 5 datasets by 25 experiments by 3 levels. A triple interaction between factors was observed only in time. The accuracy and area under the ROC curve presented similar results, showing a double interaction between level and experiment, as well as for the dataset factor. It was concluded that level 2 had an average performance above the other levels and that the proposed method is especially efficient for multi-class problems when compared to binary problems.Keywords: Stacking, multi-layers, ensemble, multi-class.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 109315 Breast Cancer Survivability Prediction via Classifier Ensemble
Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia
Abstract:
This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.Keywords: Classifier ensemble, breast cancer survivability, data mining, SEER.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 167114 Inferring User Preference Using Distance Dependent Chinese Restaurant Process and Weighted Distribution for a Content Based Recommender System
Authors: Bagher Rahimpour Cami, Hamid Hassanpour, Hoda Mashayekhi
Abstract:
Nowadays websites provide a vast number of resources for users. Recommender systems have been developed as an essential element of these websites to provide a personalized environment for users. They help users to retrieve interested resources from large sets of available resources. Due to the dynamic feature of user preference, constructing an appropriate model to estimate the user preference is the major task of recommender systems. Profile matching and latent factors are two main approaches to identify user preference. In this paper, we employed the latent factor and profile matching to cluster the user profile and identify user preference, respectively. The method uses the Distance Dependent Chines Restaurant Process as a Bayesian nonparametric framework to extract the latent factors from the user profile. These latent factors are mapped to user interests and a weighted distribution is used to identify user preferences. We evaluate the proposed method using a real-world data-set that contains news tweets of a news agency (BBC). The experimental results and comparisons show the superior recommendation accuracy of the proposed approach related to existing methods, and its ability to effectively evolve over time.Keywords: Content-based recommender systems, dynamic user modeling, extracting user interests, predicting user preference.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 81513 Study of the Thermal Performance of Bio-Sourced Materials Used as Thermal Insulation in Buildings under Humid Tropical Climate
Authors: Guarry Montrose, Ted Soubdhan
Abstract:
In the fight against climate change, the energy consuming building sector must also be taken into account to solve this problem. In this case thermal insulation of buildings using bio-based materials is an interesting solution. Therefore, the thermal performance of some materials of this type has been studied. The advantages of these natural materials of plant origin are multiple, biodegradable, low economic cost, renewable and readily available. The use of biobased materials is widespread in the building sector in order to replace conventional insulation materials with natural materials. Vegetable fibers are very important because they have good thermal behaviour and good insulating properties. The aim of using bio-sourced materials is in line with the logic of energy control and environmental protection, the approach is to make the inhabitants of the houses comfortable and reduce their energy consumption (energy efficiency). In this research we will present the results of studies carried out on the thermal conductivity of banana leaves, latan leaves, vetivers fibers, palm kernel fibers, sargassum, coconut leaves, sawdust and bulk sugarcane leaves. The study on thermal conductivity was carried out in two ways, on the one hand using the flash method, and on the other hand a so-called hot box experiment was carried out. We will discuss and highlight a number of influential factors such as moisture and air pockets present in the samples on the thermophysical properties of these materials, in particular thermal conductivity. Finally, the result of a thermal performance test of banana leaves on a roof in Haiti will also be presented in this work.
Keywords: Buildings, insulating properties, natural materials of plant origin, thermal performance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 95612 Behavioral Analysis of Team Members in Virtual Organization based on Trust Dimension and Learning
Authors: Indiramma M., K. R. Anandakumar
Abstract:
Trust management and Reputation models are becoming integral part of Internet based applications such as CSCW, E-commerce and Grid Computing. Also the trust dimension is a significant social structure and key to social relations within a collaborative community. Collaborative Decision Making (CDM) is a difficult task in the context of distributed environment (information across different geographical locations) and multidisciplinary decisions are involved such as Virtual Organization (VO). To aid team decision making in VO, Decision Support System and social network analysis approaches are integrated. In such situations social learning helps an organization in terms of relationship, team formation, partner selection etc. In this paper we focus on trust learning. Trust learning is an important activity in terms of information exchange, negotiation, collaboration and trust assessment for cooperation among virtual team members. In this paper we have proposed a reinforcement learning which enhances the trust decision making capability of interacting agents during collaboration in problem solving activity. Trust computational model with learning that we present is adapted for best alternate selection of new project in the organization. We verify our model in a multi-agent simulation where the agents in the community learn to identify trustworthy members, inconsistent behavior and conflicting behavior of agents.Keywords: Collaborative Decision making, Trust, Multi Agent System (MAS), Bayesian Network, Reinforcement Learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 189211 An Automatic Bayesian Classification System for File Format Selection
Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan
Abstract:
This paper presents an approach for the classification of an unstructured format description for identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection with just the unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format indentification method employs file format classifier and associated configurations to support digital preservation experts with an estimation of required file format. Our goal is to make use of a format specification knowledge base aggregated from a different Web sources in order to select file format for a particular institution. Using the naive Bayes method, the decision support system recommends to an expert, the file format for his institution. The proposed methods facilitate the selection of file format and the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and specifications of file formats. To facilitate decision-making, the aggregated information about the file formats is presented as a file format vocabulary that comprises most common terms that are characteristic for all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. The sample file format calculation and the calculation results including probabilities are presented in the evaluation section.Keywords: Data mining, digital libraries, digital preservation, file format.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 165910 Region Segmentation based on Gaussian Dirichlet Process Mixture Model and its Application to 3D Geometric Stricture Detection
Authors: Jonghyun Park, Soonyoung Park, Sanggyun Kim, Wanhyun Cho, Sunworl Kim
Abstract:
In general, image-based 3D scenes can now be found in many popular vision systems, computer games and virtual reality tours. So, It is important to segment ROI (region of interest) from input scenes as a preprocessing step for geometric stricture detection in 3D scene. In this paper, we propose a method for segmenting ROI based on tensor voting and Dirichlet process mixture model. In particular, to estimate geometric structure information for 3D scene from a single outdoor image, we apply the tensor voting and Dirichlet process mixture model to a image segmentation. The tensor voting is used based on the fact that homogeneous region in an image are usually close together on a smooth region and therefore the tokens corresponding to centers of these regions have high saliency values. The proposed approach is a novel nonparametric Bayesian segmentation method using Gaussian Dirichlet process mixture model to automatically segment various natural scenes. Finally, our method can label regions of the input image into coarse categories: “ground", “sky", and “vertical" for 3D application. The experimental results show that our method successfully segments coarse regions in many complex natural scene images for 3D.
Keywords: Region segmentation, tensor voting, image-based 3D, geometric structure, Gaussian Dirichlet process mixture model
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18919 Spatial Mapping of Dengue Incidence: A Case Study in Hulu Langat District, Selangor, Malaysia
Authors: Er, A. C., Rosli, M. H., Asmahani A., Mohamad Naim M. R., Harsuzilawati M.
Abstract:
Dengue is a mosquito-borne infection that has peaked to an alarming rate in recent decades. It can be found in tropical and sub-tropical climate. In Malaysia, dengue has been declared as one of the national health threat to the public. This study aimed to map the spatial distributions of dengue cases in the district of Hulu Langat, Selangor via a combination of Geographic Information System (GIS) and spatial statistic tools. Data related to dengue was gathered from the various government health agencies. The location of dengue cases was geocoded using a handheld GPS Juno SB Trimble. A total of 197 dengue cases occurring in 2003 were used in this study. Those data then was aggregated into sub-district level and then converted into GIS format. The study also used population or demographic data as well as the boundary of Hulu Langat. To assess the spatial distribution of dengue cases three spatial statistics method (Moran-s I, average nearest neighborhood (ANN) and kernel density estimation) were applied together with spatial analysis in the GIS environment. Those three indices were used to analyze the spatial distribution and average distance of dengue incidence and to locate the hot spot of dengue cases. The results indicated that the dengue cases was clustered (p < 0.01) when analyze using Moran-s I with z scores 5.03. The results from ANN analysis showed that the average nearest neighbor ratio is less than 1 which is 0.518755 (p < 0.0001). From this result, we can expect the dengue cases pattern in Hulu Langat district is exhibiting a cluster pattern. The z-score for dengue incidence within the district is -13.0525 (p < 0.0001). It was also found that the significant spatial autocorrelation of dengue incidences occurs at an average distance of 380.81 meters (p < 0.0001). Several locations especially residential area also had been identified as the hot spots of dengue cases in the district.
Keywords: Dengue, geographic information system (GIS), spatial analysis, spatial statistics
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 53668 Dimensionality Reduction in Modal Analysis for Structural Health Monitoring
Authors: Elia Favarelli, Enrico Testi, Andrea Giorgetti
Abstract:
Autonomous structural health monitoring (SHM) of many structures and bridges became a topic of paramount importance for maintenance purposes and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and detection of anomalies in a bridge from vibrational data and compare different feature extraction schemes to increase the accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of the extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by time-domain filtering (tracking). The fundamental frequencies extracted are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (intelligent multiplexer) that tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., entropy, variance, kurtosis), and feature extraction (auto-associative neural network (ANN)) that combine the fundamental frequencies to extract new damage sensitive features in a low dimensional feature space. Finally, one-class classification (OCC) algorithms perform anomaly detection, trained with standard condition points, and tested with normal and anomaly ones. In particular, principal component analysis (PCA), kernel principal component analysis (KPCA), and autoassociative neural network (ANN) are presented and their performance are compared. It is also shown that, by evaluating the correct features, the anomaly can be detected with accuracy and an F1 score greater than 95%.
Keywords: Anomaly detection, dimensionality reduction, frequencies selection, modal analysis, neural network, structural health monitoring, vibration measurement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7087 Dengue Disease Mapping with Standardized Morbidity Ratio and Poisson-gamma Model: An Analysis of Dengue Disease in Perak, Malaysia
Authors: N. A. Samat, S. H. Mohd Imam Ma’arof
Abstract:
Dengue disease is an infectious vector-borne viral disease that is commonly found in tropical and sub-tropical regions, especially in urban and semi-urban areas, around the world and including Malaysia. There is no currently available vaccine or chemotherapy for the prevention or treatment of dengue disease. Therefore prevention and treatment of the disease depend on vector surveillance and control measures. Disease risk mapping has been recognized as an important tool in the prevention and control strategies for diseases. The choice of statistical model used for relative risk estimation is important as a good model will subsequently produce a good disease risk map. Therefore, the aim of this study is to estimate the relative risk for dengue disease based initially on the most common statistic used in disease mapping called Standardized Morbidity Ratio (SMR) and one of the earliest applications of Bayesian methodology called Poisson-gamma model. This paper begins by providing a review of the SMR method, which we then apply to dengue data of Perak, Malaysia. We then fit an extension of the SMR method, which is the Poisson-gamma model. Both results are displayed and compared using graph, tables and maps. Results of the analysis shows that the latter method gives a better relative risk estimates compared with using the SMR. The Poisson-gamma model has been demonstrated can overcome the problem of SMR when there is no observed dengue cases in certain regions. However, covariate adjustment in this model is difficult and there is no possibility for allowing spatial correlation between risks in adjacent areas. The drawbacks of this model have motivated many researchers to propose other alternative methods for estimating the risk.
Keywords: Dengue disease, Disease mapping, Standardized Morbidity Ratio, Poisson-gamma model, Relative risk.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 32946 Low Sulfur Diesel Like Fuel Oil from Quick Remediation Process of Waste Oil Sludge
Authors: Isam A. H. Al Zubaidi
Abstract:
Low sulfur diesel like fuel oil was produced from a quick remediation process of waste oil sludge (WOS). This quick process will reduce the volume of the WOS in petroleum refineries as well as oil fields by transferring the waste to more beneficial product. The practice includes mixing process of WOS with commercial diesel fuel. Different ratios of WOS to diesel fuel were prepared ranging 1:1 to 20:1 by mass. The mixture was continuously mixed for 10 minutes using a bench-type overhead stirrer, and followed by the filtration process to separate the soil waste from filtrate oil product. The quantity and the physical properties of the oil filtrate were measured. It was found that the addition of up to 15% WOS to diesel fuel was accepted without dramatic changes to the properties of diesel fuel. The amount of WOS was decreased by about 60% by mass. This means that about 60% of the mass of sludge was recovered as light fuel oil. The physical properties of the resulting fuel from 10% sludge mixing ratio showed that the specific gravity, ash content, carbon residue, asphaltene content, viscosity, diesel index, cetane number, and calorific value were affected slightly. The color was changed to light black. The sulfur content was increased also. This requires another process to reduce the sulfur content of resulting light fuel. A desulfurization process was achieved using adsorption techniques with activated biomaterial to reduce the sulfur content to acceptable limits. Adsorption process by ZnCl2 activated date palm kernel powder was effective for improvement of the physical properties of diesel like fuel. The final sulfur content was increased to 0.185 wt%. This diesel like fuel can be used in all tractors, buses, tracks inside and outside the refineries. The solid remaining seems to be smooth and can be mixed with asphalt mixture for asphalting the roads or can be used with other materials as asphalt coating material for constructed buildings. Through this process, valuable fuel has been recovered, and the amount of waste material had decreased.
Keywords: Oil sludge, diesel fuel, blending process, filtration process.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3425 Screening of Factors Affecting the Enzymatic Hydrolysis of Empty Fruit Bunches in Aqueous Ionic Liquid and Locally Produced Cellulase System
Authors: Md. Z. Alam, Amal A. Elgharbawy, Muhammad Moniruzzaman, Nassereldeen A. Kabbashi, Parveen Jamal
Abstract:
The enzymatic hydrolysis of lignocellulosic biomass is one of the obstacles in the process of sugar production, due to the presence of lignin that protects the cellulose molecules against cellulases. Although the pretreatment of lignocellulose in ionic liquid (IL) system has been receiving a lot of interest; however, it requires IL removal with an anti-solvent in order to proceed with the enzymatic hydrolysis. At this point, introducing a compatible cellulase enzyme seems more efficient in this process. A cellulase enzyme that was produced by Trichoderma reesei on palm kernel cake (PKC) exhibited a promising stability in several ILs. The enzyme called PKC-Cel was tested for its optimum pH and temperature as well as its molecular weight. One among evaluated ILs, 1,3-diethylimidazolium dimethyl phosphate [DEMIM] DMP was applied in this study. Evaluation of six factors was executed in Stat-Ease Design Expert V.9, definitive screening design, which are IL/ buffer ratio, temperature, hydrolysis retention time, biomass loading, cellulase loading and empty fruit bunches (EFB) particle size. According to the obtained data, IL-enzyme system shows the highest sugar concentration at 70 °C, 27 hours, 10% IL-buffer, 35% biomass loading, 60 Units/g cellulase and 200 μm particle size. As concluded from the obtained data, not only the PKC-Cel was stable in the presence of the IL, also it was actually stable at a higher temperature than its optimum one. The reducing sugar obtained was 53.468±4.58 g/L which was equivalent to 0.3055 g reducing sugar/g EFB. This approach opens an insight for more studies in order to understand the actual effect of ILs on cellulases and their interactions in the aqueous system. It could also benefit in an efficient production of bioethanol from lignocellulosic biomass.Keywords: Cellulase, hydrolysis, lignocellulose, pretreatment, stability.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14864 A Bayesian Classification System for Facilitating an Institutional Risk Profile Definition
Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan
Abstract:
This paper presents an approach for easy creation and classification of institutional risk profiles supporting endangerment analysis of file formats. The main contribution of this work is the employment of data mining techniques to support set up of the most important risk factors. Subsequently, risk profiles employ risk factors classifier and associated configurations to support digital preservation experts with a semi-automatic estimation of endangerment group for file format risk profiles. Our goal is to make use of an expert knowledge base, accuired through a digital preservation survey in order to detect preservation risks for a particular institution. Another contribution is support for visualisation of risk factors for a requried dimension for analysis. Using the naive Bayes method, the decision support system recommends to an expert the matching risk profile group for the previously selected institutional risk profile. The proposed methods improve the visibility of risk factor values and the quality of a digital preservation process. The presented approach is designed to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and values of file format risk profiles. To facilitate decision-making, the aggregated information about the risk factors is presented as a multidimensional vector. The goal is to visualise particular dimensions of this vector for analysis by an expert and to define its profile group. The sample risk profile calculation and the visualisation of some risk factor dimensions is presented in the evaluation section.Keywords: linked open data, information integration, digital libraries, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 729