Search results for: classifiers ensemble
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 343

Search results for: classifiers ensemble

223 Classification of Potential Biomarkers in Breast Cancer Using Artificial Intelligence Algorithms and Anthropometric Datasets

Authors: Aref Aasi, Sahar Ebrahimi Bajgani, Erfan Aasi

Abstract:

Breast cancer (BC) continues to be the most frequent cancer in females and causes the highest number of cancer-related deaths in women worldwide. Inspired by recent advances in studying the relationship between different patient attributes and features and the disease, in this paper, we have tried to investigate the different classification methods for better diagnosis of BC in the early stages. In this regard, datasets from the University Hospital Centre of Coimbra were chosen, and different machine learning (ML)-based and neural network (NN) classifiers have been studied. For this purpose, we have selected favorable features among the nine provided attributes from the clinical dataset by using a random forest algorithm. This dataset consists of both healthy controls and BC patients, and it was noted that glucose, BMI, resistin, and age have the most importance, respectively. Moreover, we have analyzed these features with various ML-based classifier methods, including Decision Tree (DT), K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM) along with NN-based Multi-Layer Perceptron (MLP) classifier. The results revealed that among different techniques, the SVM and MLP classifiers have the most accuracy, with amounts of 96% and 92%, respectively. These results divulged that the adopted procedure could be used effectively for the classification of cancer cells, and also it encourages further experimental investigations with more collected data for other types of cancers.

Keywords: breast cancer, diagnosis, machine learning, biomarker classification, neural network

Procedia PDF Downloads 96
222 Parkinson’s Disease Detection Analysis through Machine Learning Approaches

Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee

Abstract:

Machine learning and data mining are crucial in health care, as well as medical information and detection. Machine learning approaches are now being utilized to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID 19 identification, and so on. Parkinson’s disease is basically a disease for our senior citizens in Bangladesh. Parkinson's Disease indications often seem progressive and get worst with time. People got affected trouble walking and communicating with the condition advances. Patients can also have psychological and social vagaries, nap problems, hopelessness, reminiscence loss, and weariness. Parkinson's disease can happen in both men and women. Though men are affected by the illness at a proportion that is around partial of them are women. In this research, we have to get out the accurate ML algorithm to find out the disease with a predictable dataset and the model of the following machine learning classifiers. Therefore, nine ML classifiers are secondhand to portion study to use machine learning approaches like as follows, Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest classifier, XBG Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier are used.

Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XBG classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier

Procedia PDF Downloads 100
221 Detection of Powdery Mildew Disease in Strawberry Using Image Texture and Supervised Classifiers

Authors: Sultan Mahmud, Qamar Zaman, Travis Esau, Young Chang

Abstract:

Strawberry powdery mildew (PM) is a serious disease that has a significant impact on strawberry production. Field scouting is still a major way to find PM disease, which is not only labor intensive but also almost impossible to monitor disease severity. To reduce the loss caused by PM disease and achieve faster automatic detection of the disease, this paper proposes an approach for detection of the disease, based on image texture and classified with support vector machines (SVMs) and k-nearest neighbors (kNNs). The methodology of the proposed study is based on image processing which is composed of five main steps including image acquisition, pre-processing, segmentation, features extraction and classification. Two strawberry fields were used in this study. Images of healthy leaves and leaves infected with PM (Sphaerotheca macularis) disease under artificial cloud lighting condition. Colour thresholding was utilized to segment all images before textural analysis. Colour co-occurrence matrix (CCM) was introduced for extraction of textural features. Forty textural features, related to a physiological parameter of leaves were extracted from CCM of National television system committee (NTSC) luminance, hue, saturation and intensity (HSI) images. The normalized feature data were utilized for training and validation, respectively, using developed classifiers. The classifiers have experimented with internal, external and cross-validations. The best classifier was selected based on their performance and accuracy. Experimental results suggested that SVMs classifier showed 98.33%, 85.33%, 87.33%, 93.33% and 95.0% of accuracy on internal, external-I, external-II, 4-fold cross and 5-fold cross-validation, respectively. Whereas, kNNs results represented 90.0%, 72.00%, 74.66%, 89.33% and 90.3% of classification accuracy, respectively. The outcome of this study demonstrated that SVMs classified PM disease with a highest overall accuracy of 91.86% and 1.1211 seconds of processing time. Therefore, overall results concluded that the proposed study can significantly support an accurate and automatic identification and recognition of strawberry PM disease with SVMs classifier.

Keywords: powdery mildew, image processing, textural analysis, color co-occurrence matrix, support vector machines, k-nearest neighbors

Procedia PDF Downloads 95
220 Accelerating Quantum Chemistry Calculations: Machine Learning for Efficient Evaluation of Electron-Repulsion Integrals

Authors: Nishant Rodrigues, Nicole Spanedda, Chilukuri K. Mohan, Arindam Chakraborty

Abstract:

A crucial objective in quantum chemistry is the computation of the energy levels of chemical systems. This task requires electron-repulsion integrals as inputs, and the steep computational cost of evaluating these integrals poses a major numerical challenge in efficient implementation of quantum chemical software. This work presents a moment-based machine-learning approach for the efficient evaluation of electron-repulsion integrals. These integrals were approximated using linear combinations of a small number of moments. Machine learning algorithms were applied to estimate the coefficients in the linear combination. A random forest approach was used to identify promising features using a recursive feature elimination approach, which performed best for learning the sign of each coefficient but not the magnitude. A neural network with two hidden layers were then used to learn the coefficient magnitudes along with an iterative feature masking approach to perform input vector compression, identifying a small subset of orbitals whose coefficients are sufficient for the quantum state energy computation. Finally, a small ensemble of neural networks (with a median rule for decision fusion) was shown to improve results when compared to a single network.

Keywords: quantum energy calculations, atomic orbitals, electron-repulsion integrals, ensemble machine learning, random forests, neural networks, feature extraction

Procedia PDF Downloads 68
219 Artificial Intelligence Assisted Sentiment Analysis of Hotel Reviews Using Topic Modeling

Authors: Sushma Ghogale

Abstract:

With a surge in user-generated content or feedback or reviews on the internet, it has become possible and important to know consumers' opinions about products and services. This data is important for both potential customers and businesses providing the services. Data from social media is attracting significant attention and has become the most prominent channel of expressing an unregulated opinion. Prospective customers look for reviews from experienced customers before deciding to buy a product or service. Several websites provide a platform for users to post their feedback for the provider and potential customers. However, the biggest challenge in analyzing such data is in extracting latent features and providing term-level analysis of the data. This paper proposes an approach to use topic modeling to classify the reviews into topics and conduct sentiment analysis to mine the opinions. This approach can analyse and classify latent topics mentioned by reviewers on business sites or review sites, or social media using topic modeling to identify the importance of each topic. It is followed by sentiment analysis to assess the satisfaction level of each topic. This approach provides a classification of hotel reviews using multiple machine learning techniques and comparing different classifiers to mine the opinions of user reviews through sentiment analysis. This experiment concludes that Multinomial Naïve Bayes classifier produces higher accuracy than other classifiers.

Keywords: latent Dirichlet allocation, topic modeling, text classification, sentiment analysis

Procedia PDF Downloads 75
218 Classification Using Worldview-2 Imagery of Giant Panda Habitat in Wolong, Sichuan Province, China

Authors: Yunwei Tang, Linhai Jing, Hui Li, Qingjie Liu, Xiuxia Li, Qi Yan, Haifeng Ding

Abstract:

The giant panda (Ailuropoda melanoleuca) is an endangered species, mainly live in central China, where bamboos act as the main food source of wild giant pandas. Knowledge of spatial distribution of bamboos therefore becomes important for identifying the habitat of giant pandas. There have been ongoing studies for mapping bamboos and other tree species using remote sensing. WorldView-2 (WV-2) is the first high resolution commercial satellite with eight Multi-Spectral (MS) bands. Recent studies demonstrated that WV-2 imagery has a high potential in classification of tree species. The advanced classification techniques are important for utilising high spatial resolution imagery. It is generally agreed that object-based image analysis is a more desirable method than pixel-based analysis in processing high spatial resolution remotely sensed data. Classifiers that use spatial information combined with spectral information are known as contextual classifiers. It is suggested that contextual classifiers can achieve greater accuracy than non-contextual classifiers. Thus, spatial correlation can be incorporated into classifiers to improve classification results. The study area is located at Wuyipeng area in Wolong, Sichuan Province. The complex environment makes it difficult for information extraction since bamboos are sparsely distributed, mixed with brushes, and covered by other trees. Extensive fieldworks in Wuyingpeng were carried out twice. The first one was on 11th June, 2014, aiming at sampling feature locations for geometric correction and collecting training samples for classification. The second fieldwork was on 11th September, 2014, for the purposes of testing the classification results. In this study, spectral separability analysis was first performed to select appropriate MS bands for classification. Also, the reflectance analysis provided information for expanding sample points under the circumstance of knowing only a few. Then, a spatially weighted object-based k-nearest neighbour (k-NN) classifier was applied to the selected MS bands to identify seven land cover types (bamboo, conifer, broadleaf, mixed forest, brush, bare land, and shadow), accounting for spatial correlation within classes using geostatistical modelling. The spatially weighted k-NN method was compared with three alternatives: the traditional k-NN classifier, the Support Vector Machine (SVM) method and the Classification and Regression Tree (CART). Through field validation, it was proved that the classification result obtained using the spatially weighted k-NN method has the highest overall classification accuracy (77.61%) and Kappa coefficient (0.729); the producer’s accuracy and user’s accuracy achieve 81.25% and 95.12% for the bamboo class, respectively, also higher than the other methods. Photos of tree crowns were taken at sample locations using a fisheye camera, so the canopy density could be estimated. It is found that it is difficult to identify bamboo in the areas with a large canopy density (over 0.70); it is possible to extract bamboos in the areas with a median canopy density (from 0.2 to 0.7) and in a sparse forest (canopy density is less than 0.2). In summary, this study explores the ability of WV-2 imagery for bamboo extraction in a mountainous region in Sichuan. The study successfully identified the bamboo distribution, providing supporting knowledge for assessing the habitats of giant pandas.

Keywords: bamboo mapping, classification, geostatistics, k-NN, worldview-2

Procedia PDF Downloads 289
217 Performance Study of Classification Algorithms for Consumer Online Shopping Attitudes and Behavior Using Data Mining

Authors: Rana Alaa El-Deen Ahmed, M. Elemam Shehab, Shereen Morsy, Nermeen Mekawie

Abstract:

With the growing popularity and acceptance of e-commerce platforms, users face an ever increasing burden in actually choosing the right product from the large number of online offers. Thus, techniques for personalization and shopping guides are needed by users. For a pleasant and successful shopping experience, users need to know easily which products to buy with high confidence. Since selling a wide variety of products has become easier due to the popularity of online stores, online retailers are able to sell more products than a physical store. The disadvantage is that the customers might not find products they need. In this research the customer will be able to find the products he is searching for, because recommender systems are used in some ecommerce web sites. Recommender system learns from the information about customers and products and provides appropriate personalized recommendations to customers to find the needed product. In this paper eleven classification algorithms are comparatively tested to find the best classifier fit for consumer online shopping attitudes and behavior in the experimented dataset. The WEKA knowledge analysis tool, which is an open source data mining workbench software used in comparing conventional classifiers to get the best classifier was used in this research. In this research by using the data mining tool (WEKA) with the experimented classifiers the results show that decision table and filtered classifier gives the highest accuracy and the lowest accuracy classification via clustering and simple cart.

Keywords: classification, data mining, machine learning, online shopping, WEKA

Procedia PDF Downloads 329
216 Machine Learning Classification of Fused Sentinel-1 and Sentinel-2 Image Data Towards Mapping Fruit Plantations in Highly Heterogenous Landscapes

Authors: Yingisani Chabalala, Elhadi Adam, Khalid Adem Ali

Abstract:

Mapping smallholder fruit plantations using optical data is challenging due to morphological landscape heterogeneity and crop types having overlapped spectral signatures. Furthermore, cloud covers limit the use of optical sensing, especially in subtropical climates where they are persistent. This research assessed the effectiveness of Sentinel-1 (S1) and Sentinel-2 (S2) data for mapping fruit trees and co-existing land-use types by using support vector machine (SVM) and random forest (RF) classifiers independently. These classifiers were also applied to fused data from the two sensors. Feature ranks were extracted using the RF mean decrease accuracy (MDA) and forward variable selection (FVS) to identify optimal spectral windows to classify fruit trees. Based on RF MDA and FVS, the SVM classifier resulted in relatively high classification accuracy with overall accuracy (OA) = 0.91.6% and kappa coefficient = 0.91% when applied to the fused satellite data. Application of SVM to S1, S2, S2 selected variables and S1S2 fusion independently produced OA = 27.64, Kappa coefficient = 0.13%; OA= 87%, Kappa coefficient = 86.89%; OA = 69.33, Kappa coefficient = 69. %; OA = 87.01%, Kappa coefficient = 87%, respectively. Results also indicated that the optimal spectral bands for fruit tree mapping are green (B3) and SWIR_2 (B10) for S2, whereas for S1, the vertical-horizontal (VH) polarization band. Including the textural metrics from the VV channel improved crop discrimination and co-existing land use cover types. The fusion approach proved robust and well-suited for accurate smallholder fruit plantation mapping.

Keywords: smallholder agriculture, fruit trees, data fusion, precision agriculture

Procedia PDF Downloads 19
215 Credit Card Fraud Detection with Ensemble Model: A Meta-Heuristic Approach

Authors: Gong Zhilin, Jing Yang, Jian Yin

Abstract:

The purpose of this paper is to develop a novel system for credit card fraud detection based on sequential modeling of data using hybrid deep learning models. The projected model encapsulates five major phases are pre-processing, imbalance-data handling, feature extraction, optimal feature selection, and fraud detection with an ensemble classifier. The collected raw data (input) is pre-processed to enhance the quality of the data through alleviation of the missing data, noisy data as well as null values. The pre-processed data are class imbalanced in nature, and therefore they are handled effectively with the K-means clustering-based SMOTE model. From the balanced class data, the most relevant features like improved Principal Component Analysis (PCA), statistical features (mean, median, standard deviation) and higher-order statistical features (skewness and kurtosis). Among the extracted features, the most optimal features are selected with the Self-improved Arithmetic Optimization Algorithm (SI-AOA). This SI-AOA model is the conceptual improvement of the standard Arithmetic Optimization Algorithm. The deep learning models like Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and optimized Quantum Deep Neural Network (QDNN). The LSTM and CNN are trained with the extracted optimal features. The outcomes from LSTM and CNN will enter as input to optimized QDNN that provides the final detection outcome. Since the QDNN is the ultimate detector, its weight function is fine-tuned with the Self-improved Arithmetic Optimization Algorithm (SI-AOA).

Keywords: credit card, data mining, fraud detection, money transactions

Procedia PDF Downloads 102
214 Comparison of Deep Learning and Machine Learning Algorithms to Diagnose and Predict Breast Cancer

Authors: F. Ghazalnaz Sharifonnasabi, Iman Makhdoom

Abstract:

Breast cancer is a serious health concern that affects many people around the world. According to a study published in the Breast journal, the global burden of breast cancer is expected to increase significantly over the next few decades. The number of deaths from breast cancer has been increasing over the years, but the age-standardized mortality rate has decreased in some countries. It’s important to be aware of the risk factors for breast cancer and to get regular check- ups to catch it early if it does occur. Machin learning techniques have been used to aid in the early detection and diagnosis of breast cancer. These techniques, that have been shown to be effective in predicting and diagnosing the disease, have become a research hotspot. In this study, we consider two deep learning approaches including: Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN). We also considered the five-machine learning algorithm titled: Decision Tree (C4.5), Naïve Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) Algorithm and XGBoost (eXtreme Gradient Boosting) on the Breast Cancer Wisconsin Diagnostic dataset. We have carried out the process of evaluating and comparing classifiers involving selecting appropriate metrics to evaluate classifier performance and selecting an appropriate tool to quantify this performance. The main purpose of the study is predicting and diagnosis breast cancer, applying the mentioned algorithms and also discovering of the most effective with respect to confusion matrix, accuracy and precision. It is realized that CNN outperformed all other classifiers and achieved the highest accuracy (0.982456). The work is implemented in the Anaconda environment based on Python programing language.

Keywords: breast cancer, multi-layer perceptron, Naïve Bayesian, SVM, decision tree, convolutional neural network, XGBoost, KNN

Procedia PDF Downloads 43
213 Evaluation of Random Forest and Support Vector Machine Classification Performance for the Prediction of Early Multiple Sclerosis from Resting State FMRI Connectivity Data

Authors: V. Saccà, A. Sarica, F. Novellino, S. Barone, T. Tallarico, E. Filippelli, A. Granata, P. Valentino, A. Quattrone

Abstract:

The work aim was to evaluate how well Random Forest (RF) and Support Vector Machine (SVM) algorithms could support the early diagnosis of Multiple Sclerosis (MS) from resting-state functional connectivity data. In particular, we wanted to explore the ability in distinguishing between controls and patients of mean signals extracted from ICA components corresponding to 15 well-known networks. Eighteen patients with early-MS (mean-age 37.42±8.11, 9 females) were recruited according to McDonald and Polman, and matched for demographic variables with 19 healthy controls (mean-age 37.55±14.76, 10 females). MRI was acquired by a 3T scanner with 8-channel head coil: (a)whole-brain T1-weighted; (b)conventional T2-weighted; (c)resting-state functional MRI (rsFMRI), 200 volumes. Estimated total lesion load (ml) and number of lesions were calculated using LST-toolbox from the corrected T1 and FLAIR. All rsFMRIs were pre-processed using tools from the FMRIB's Software Library as follows: (1) discarding of the first 5 volumes to remove T1 equilibrium effects, (2) skull-stripping of images, (3) motion and slice-time correction, (4) denoising with high-pass temporal filter (128s), (5) spatial smoothing with a Gaussian kernel of FWHM 8mm. No statistical significant differences (t-test, p < 0.05) were found between the two groups in the mean Euclidian distance and the mean Euler angle. WM and CSF signal together with 6 motion parameters were regressed out from the time series. We applied an independent component analysis (ICA) with the GIFT-toolbox using the Infomax approach with number of components=21. Fifteen mean components were visually identified by two experts. The resulting z-score maps were thresholded and binarized to extract the mean signal of the 15 networks for each subject. Statistical and machine learning analysis were then conducted on this dataset composed of 37 rows (subjects) and 15 features (mean signal in the network) with R language. The dataset was randomly splitted into training (75%) and test sets and two different classifiers were trained: RF and RBF-SVM. We used the intrinsic feature selection of RF, based on the Gini index, and recursive feature elimination (rfe) for the SVM, to obtain a rank of the most predictive variables. Thus, we built two new classifiers only on the most important features and we evaluated the accuracies (with and without feature selection) on test-set. The classifiers, trained on all the features, showed very poor accuracies on training (RF:58.62%, SVM:65.52%) and test sets (RF:62.5%, SVM:50%). Interestingly, when feature selection by RF and rfe-SVM were performed, the most important variable was the sensori-motor network I in both cases. Indeed, with only this network, RF and SVM classifiers reached an accuracy of 87.5% on test-set. More interestingly, the only misclassified patient resulted to have the lowest value of lesion volume. We showed that, with two different classification algorithms and feature selection approaches, the best discriminant network between controls and early MS, was the sensori-motor I. Similar importance values were obtained for the sensori-motor II, cerebellum and working memory networks. These findings, in according to the early manifestation of motor/sensorial deficits in MS, could represent an encouraging step toward the translation to the clinical diagnosis and prognosis.

Keywords: feature selection, machine learning, multiple sclerosis, random forest, support vector machine

Procedia PDF Downloads 216
212 Climate Change Effects in a Mediterranean Island and Streamflow Changes for a Small Basin Using Euro-Cordex Regional Climate Simulations Combined with the SWAT Model

Authors: Pier Andrea Marras, Daniela Lima, Pedro Matos Soares, Rita Maria Cardoso, Daniela Medas, Elisabetta Dore, Giovanni De Giudici

Abstract:

Climate change effects on the hydrologic cycle are the main concern for the evaluation of water management strategies. Climate models project scenarios of precipitation changes in the future, considering greenhouse emissions. In this study, the EURO-CORDEX (European Coordinated Regional Downscaling Experiment) climate models were first evaluated in a Mediterranean island (Sardinia) against observed precipitation for a historical reference period (1976-2005). A weighted multi-model ensemble (ENS) was built, weighting the single models based on their ability to reproduce observed rainfall. Future projections (2071-2100) were carried out using the 8.5 RCP emissions scenario to evaluate changes in precipitations. ENS was then used as climate forcing for the SWAT model (Soil and Water Assessment Tool), with the aim to assess the consequences of such projected changes on streamflow and runoff of two small catchments located in the South-West Sardinia. Results showed that a decrease of mean rainfall values, up to -25 % at yearly scale, is expected for the future, along with an increase of extreme precipitation events. Particularly in the eastern and southern areas, extreme events are projected to increase by 30%. Such changes reflect on the hydrologic cycle with a decrease of mean streamflow and runoff, except in spring, when runoff is projected to increase by 20-30%. These results stress that the Mediterranean is a hotspot for climate change, and the use of model tools can provide very useful information to adopt water and land management strategies to deal with such changes.

Keywords: EURO-CORDEX, climate change, hydrology, SWAT model, Sardinia, multi-model ensemble

Procedia PDF Downloads 184
211 Molecular Dynamic Simulation of CO2 Absorption into Mixed Aqueous Solutions MDEA/PZ

Authors: N. Harun, E. E. Masiren, W. H. W. Ibrahim, F. Adam

Abstract:

Amine absorption process is an approach for mitigation of CO2 from flue gas that produces from power plant. This process is the most common system used in chemical and oil industries for gas purification to remove acid gases. On the challenges of this process is high energy requirement for solvent regeneration to release CO2. In the past few years, mixed alkanolamines have received increasing attention. In most cases, the mixtures contain N-methyldiethanolamine (MDEA) as the base amine with the addition of one or two more reactive amines such as PZ. The reason for the application of such blend amine is to take advantage of high reaction rate of CO2 with the activator combined with the advantages of the low heat of regeneration of MDEA. Several experimental and simulation studies have been undertaken to understand this process using blend MDEA/PZ solvent. Despite those studies, the mechanism of CO2 absorption into the aqueous MDEA is not well understood and available knowledge within the open literature is limited. The aim of this study is to investigate the intermolecular interaction of the blend MDEA/PZ using Molecular Dynamics (MD) simulation. MD simulation was run under condition 313K and 1 atm using NVE ensemble at 200ps and NVT ensemble at 1ns. The results were interpreted in term of Radial Distribution Function (RDF) analysis through two system of interest i.e binary and tertiary. The binary system will explain the interaction between amine and water molecule while tertiary system used to determine the interaction between the amine and CO2 molecule. For the binary system, it was observed that the –OH group of MDEA is more attracted to water molecule compared to –NH group of MDEA. The –OH group of MDEA can form the hydrogen bond with water that will assist the solubility of MDEA in water. The intermolecular interaction probability of –OH and –NH group of MDEA with CO2 in blended MDEA/PZ is higher than using single MDEA. This findings show that PZ molecule act as an activator to promote the intermolecular interaction between MDEA and CO2.Thus, blend of MDEA with PZ is expecting to increase the absorption rate of CO2 and reduce the heat regeneration requirement.

Keywords: amine absorption process, blend MDEA/PZ, CO2 capture, molecular dynamic simulation, radial distribution function

Procedia PDF Downloads 263
210 A Multi-Output Network with U-Net Enhanced Class Activation Map and Robust Classification Performance for Medical Imaging Analysis

Authors: Jaiden Xuan Schraut, Leon Liu, Yiqiao Yin

Abstract:

Computer vision in medical diagnosis has achieved a high level of success in diagnosing diseases with high accuracy. However, conventional classifiers that produce an image to-label result provides insufficient information for medical professionals to judge and raise concerns over the trust and reliability of a model with results that cannot be explained. In order to gain local insight into cancerous regions, separate tasks such as imaging segmentation need to be implemented to aid the doctors in treating patients, which doubles the training time and costs which renders the diagnosis system inefficient and difficult to be accepted by the public. To tackle this issue and drive AI-first medical solutions further, this paper proposes a multi-output network that follows a U-Net architecture for image segmentation output and features an additional convolutional neural networks (CNN) module for auxiliary classification output. Class activation maps are a method of providing insight into a convolutional neural network’s feature maps that leads to its classification but in the case of lung diseases, the region of interest is enhanced by U-net-assisted Class Activation Map (CAM) visualization. Therefore, our proposed model combines image segmentation models and classifiers to crop out only the lung region of a chest X-ray’s class activation map to provide a visualization that improves the explainability and is able to generate classification results simultaneously which builds trust for AI-led diagnosis systems. The proposed U-Net model achieves 97.61% accuracy and a dice coefficient of 0.97 on testing data from the COVID-QU-Ex Dataset which includes both diseased and healthy lungs.

Keywords: multi-output network model, U-net, class activation map, image classification, medical imaging analysis

Procedia PDF Downloads 161
209 Multimodal Biometric Cryptography Based Authentication in Cloud Environment to Enhance Information Security

Authors: D. Pugazhenthi, B. Sree Vidya

Abstract:

Cloud computing is one of the emerging technologies that enables end users to use the services of cloud on ‘pay per usage’ strategy. This technology grows in a fast pace and so is its security threat. One among the various services provided by cloud is storage. In this service, security plays a vital factor for both authenticating legitimate users and protection of information. This paper brings in efficient ways of authenticating users as well as securing information on the cloud. Initial phase proposed in this paper deals with an authentication technique using multi-factor and multi-dimensional authentication system with multi-level security. Unique identification and slow intrusive formulates an advanced reliability on user-behaviour based biometrics than conventional means of password authentication. By biometric systems, the accounts are accessed only by a legitimate user and not by a nonentity. The biometric templates employed here do not include single trait but multiple, viz., iris and finger prints. The coordinating stage of the authentication system functions on Ensemble Support Vector Machine (SVM) and optimization by assembling weights of base SVMs for SVM ensemble after individual SVM of ensemble is trained by the Artificial Fish Swarm Algorithm (AFSA). Thus it helps in generating a user-specific secure cryptographic key of the multimodal biometric template by fusion process. Data security problem is averted and enhanced security architecture is proposed using encryption and decryption system with double key cryptography based on Fuzzy Neural Network (FNN) for data storing and retrieval in cloud computing . The proposing scheme aims to protect the records from hackers by arresting the breaking of cipher text to original text. This improves the authentication performance that the proposed double cryptographic key scheme is capable of providing better user authentication and better security which distinguish between the genuine and fake users. Thus, there are three important modules in this proposed work such as 1) Feature extraction, 2) Multimodal biometric template generation and 3) Cryptographic key generation. The extraction of the feature and texture properties from the respective fingerprint and iris images has been done initially. Finally, with the help of fuzzy neural network and symmetric cryptography algorithm, the technique of double key encryption technique has been developed. As the proposed approach is based on neural networks, it has the advantage of not being decrypted by the hacker even though the data were hacked already. The results prove that authentication process is optimal and stored information is secured.

Keywords: artificial fish swarm algorithm (AFSA), biometric authentication, decryption, encryption, fingerprint, fusion, fuzzy neural network (FNN), iris, multi-modal, support vector machine classification

Procedia PDF Downloads 227
208 An Adaptive Oversampling Technique for Imbalanced Datasets

Authors: Shaukat Ali Shahee, Usha Ananthakumar

Abstract:

A data set exhibits class imbalance problem when one class has very few examples compared to the other class, and this is also referred to as between class imbalance. The traditional classifiers fail to classify the minority class examples correctly due to its bias towards the majority class. Apart from between-class imbalance, imbalance within classes where classes are composed of a different number of sub-clusters with these sub-clusters containing different number of examples also deteriorates the performance of the classifier. Previously, many methods have been proposed for handling imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithmic based, cost-based methods and ensemble of classifier. Data preprocessing techniques have shown great potential as they attempt to improve data distribution rather than the classifier. Data preprocessing technique handles class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples lead to loss of information and also when minority class has an absolute rarity, removing the majority class examples is generally not recommended. Existing methods available for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between class imbalance and within class imbalance simultaneously for binary classification problem. Removing between class imbalance and within class imbalance simultaneously eliminates the biases of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of sub-clusters. The method also takes into consideration the scatter of the data in the feature space and also adaptively copes up with unseen test data using Lowner-John ellipsoid for increasing the accuracy of the classifier. In this study, neural network is being used as this is one such classifier where the total error is minimized and removing the between-class imbalance and within class imbalance simultaneously help the classifier in giving equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to other methods in terms of various accuracy measures. Thus the proposed method can serve as a good alternative to handle various problem domains like credit scoring, customer churn prediction, financial distress, etc., that typically involve imbalanced data sets.

Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling

Procedia PDF Downloads 389
207 Gradient Boosted Trees on Spark Platform for Supervised Learning in Health Care Big Data

Authors: Gayathri Nagarajan, L. D. Dhinesh Babu

Abstract:

Health care is one of the prominent industries that generate voluminous data thereby finding the need of machine learning techniques with big data solutions for efficient processing and prediction. Missing data, incomplete data, real time streaming data, sensitive data, privacy, heterogeneity are few of the common challenges to be addressed for efficient processing and mining of health care data. In comparison with other applications, accuracy and fast processing are of higher importance for health care applications as they are related to the human life directly. Though there are many machine learning techniques and big data solutions used for efficient processing and prediction in health care data, different techniques and different frameworks are proved to be effective for different applications largely depending on the characteristics of the datasets. In this paper, we present a framework that uses ensemble machine learning technique gradient boosted trees for data classification in health care big data. The framework is built on Spark platform which is fast in comparison with other traditional frameworks. Unlike other works that focus on a single technique, our work presents a comparison of six different machine learning techniques along with gradient boosted trees on datasets of different characteristics. Five benchmark health care datasets are considered for experimentation, and the results of different machine learning techniques are discussed in comparison with gradient boosted trees. The metric chosen for comparison is misclassification error rate and the run time of the algorithms. The goal of this paper is to i) Compare the performance of gradient boosted trees with other machine learning techniques in Spark platform specifically for health care big data and ii) Discuss the results from the experiments conducted on datasets of different characteristics thereby drawing inference and conclusion. The experimental results show that the accuracy is largely dependent on the characteristics of the datasets for other machine learning techniques whereas gradient boosting trees yields reasonably stable results in terms of accuracy without largely depending on the dataset characteristics.

Keywords: big data analytics, ensemble machine learning, gradient boosted trees, Spark platform

Procedia PDF Downloads 216
206 Experimental Modeling of Spray and Water Sheet Formation Due to Wave Interactions with Vertical and Slant Bow-Shaped Model

Authors: Armin Bodaghkhani, Bruce Colbourne, Yuri S. Muzychka

Abstract:

The process of spray-cloud formation and flow kinematics produced from breaking wave impact on vertical and slant lab-scale bow-shaped models were experimentally investigated. Bubble Image Velocimetry (BIV) and Image Processing (IP) techniques were applied to study the various types of wave-model impacts. Different wave characteristics were generated in a tow tank to investigate the effects of wave characteristics, such as wave phase velocity, wave steepness on droplet velocities, and behavior of the process of spray cloud formation. The phase ensemble-averaged vertical velocity and turbulent intensity were computed. A high-speed camera and diffused LED backlights were utilized to capture images for further post processing. Various pressure sensors and capacitive wave probes were used to measure the wave impact pressure and the free surface profile at different locations of the model and wave-tank, respectively. Droplet sizes and velocities were measured using BIV and IP techniques to trace bubbles and droplets in order to measure their velocities and sizes by correlating the texture in these images. The impact pressure and droplet size distributions were compared to several previously experimental models, and satisfactory agreements were achieved. The distribution of droplets in front of both models are demonstrated. Due to the highly transient process of spray formation, the drag coefficient for several stages of this transient displacement for various droplet size ranges and different Reynolds number were calculated based on the ensemble average method. From the experimental results, the slant model produces less spray in comparison with the vertical model, and the droplet velocities generated from the wave impact with the slant model have a lower velocity as compared with the vertical model.

Keywords: spray charachteristics, droplet size and velocity, wave-body interactions, bubble image velocimetry, image processing

Procedia PDF Downloads 277
205 Solar Power Forecasting for the Bidding Zones of the Italian Electricity Market with an Analog Ensemble Approach

Authors: Elena Collino, Dario A. Ronzio, Goffredo Decimi, Maurizio Riva

Abstract:

The rapid increase of renewable energy in Italy is led by wind and solar installations. The 2017 Italian energy strategy foresees a further development of these sustainable technologies, especially solar. This fact has resulted in new opportunities, challenges, and different problems to deal with. The growth of renewables allows to meet the European requirements regarding energy and environmental policy, but these types of sources are difficult to manage because they are intermittent and non-programmable. Operationally, these characteristics can lead to instability on the voltage profile and increasing uncertainty on energy reserve scheduling. The increasing renewable production must be considered with more and more attention especially by the Transmission System Operator (TSO). The TSO, in fact, every day provides orders on energy dispatch, once the market outcome has been determined, on extended areas, defined mainly on the basis of power transmission limitations. In Italy, six market zone are defined: Northern-Italy, Central-Northern Italy, Central-Southern Italy, Southern Italy, Sardinia, and Sicily. An accurate hourly renewable power forecasting for the day-ahead on these extended areas brings an improvement both in terms of dispatching and reserve management. In this study, an operational forecasting tool of the hourly solar output for the six Italian market zones is presented, and the performance is analysed. The implementation is carried out by means of a numerical weather prediction model, coupled with a statistical post-processing in order to derive the power forecast on the basis of the meteorological projection. The weather forecast is obtained from the limited area model RAMS on the Italian territory, initialized with IFS-ECMWF boundary conditions. The post-processing calculates the solar power production with the Analog Ensemble technique (AN). This statistical approach forecasts the production using a probability distribution of the measured production registered in the past when the weather scenario looked very similar to the forecasted one. The similarity is evaluated for the components of the solar radiation: global (GHI), diffuse (DIF) and direct normal (DNI) irradiation, together with the corresponding azimuth and zenith solar angles. These are, in fact, the main factors that affect the solar production. Considering that the AN performance is strictly related to the length and quality of the historical data a training period of more than one year has been used. The training set is made by historical Numerical Weather Prediction (NWP) forecasts at 12 UTC for the GHI, DIF and DNI variables over the Italian territory together with corresponding hourly measured production for each of the six zones. The AN technique makes it possible to estimate the aggregate solar production in the area, without information about the technologic characteristics of the all solar parks present in each area. Besides, this information is often only partially available. Every day, the hourly solar power forecast for the six Italian market zones is made publicly available through a website.

Keywords: analog ensemble, electricity market, PV forecast, solar energy

Procedia PDF Downloads 124
204 FracXpert: Ensemble Machine Learning Approach for Localization and Classification of Bone Fractures in Cricket Athletes

Authors: Madushani Rodrigo, Banuka Athuraliya

Abstract:

In today's world of medical diagnosis and prediction, machine learning stands out as a strong tool, transforming old ways of caring for health. This study analyzes the use of machine learning in the specialized domain of sports medicine, with a focus on the timely and accurate detection of bone fractures in cricket athletes. Failure to identify bone fractures in real time can result in malunion or non-union conditions. To ensure proper treatment and enhance the bone healing process, accurately identifying fracture locations and types is necessary. When interpreting X-ray images, it relies on the expertise and experience of medical professionals in the identification process. Sometimes, radiographic images are of low quality, leading to potential issues. Therefore, it is necessary to have a proper approach to accurately localize and classify fractures in real time. The research has revealed that the optimal approach needs to address the stated problem and employ appropriate radiographic image processing techniques and object detection algorithms. These algorithms should effectively localize and accurately classify all types of fractures with high precision and in a timely manner. In order to overcome the challenges of misidentifying fractures, a distinct model for fracture localization and classification has been implemented. The research also incorporates radiographic image enhancement and preprocessing techniques to overcome the limitations posed by low-quality images. A classification ensemble model has been implemented using ResNet18 and VGG16. In parallel, a fracture segmentation model has been implemented using the enhanced U-Net architecture. Combining the results of these two implemented models, the FracXpert system can accurately localize exact fracture locations along with fracture types from the available 12 different types of fracture patterns, which include avulsion, comminuted, compressed, dislocation, greenstick, hairline, impacted, intraarticular, longitudinal, oblique, pathological, and spiral. This system will generate a confidence score level indicating the degree of confidence in the predicted result. Using ResNet18 and VGG16 architectures, the implemented fracture segmentation model, based on the U-Net architecture, achieved a high accuracy level of 99.94%, demonstrating its precision in identifying fracture locations. Simultaneously, the classification ensemble model achieved an accuracy of 81.0%, showcasing its ability to categorize various fracture patterns, which is instrumental in the fracture treatment process. In conclusion, FracXpert has become a promising ML application in sports medicine, demonstrating its potential to revolutionize fracture detection processes. By leveraging the power of ML algorithms, this study contributes to the advancement of diagnostic capabilities in cricket athlete healthcare, ensuring timely and accurate identification of bone fractures for the best treatment outcomes.

Keywords: multiclass classification, object detection, ResNet18, U-Net, VGG16

Procedia PDF Downloads 29
203 A Single-Channel BSS-Based Method for Structural Health Monitoring of Civil Infrastructure under Environmental Variations

Authors: Yanjie Zhu, André Jesus, Irwanda Laory

Abstract:

Structural Health Monitoring (SHM), involving data acquisition, data interpretation and decision-making system aim to continuously monitor the structural performance of civil infrastructures under various in-service circumstances. The main value and purpose of SHM is identifying damages through data interpretation system. Research on SHM has been expanded in the last decades and a large volume of data is recorded every day owing to the dramatic development in sensor techniques and certain progress in signal processing techniques. However, efficient and reliable data interpretation for damage detection under environmental variations is still a big challenge. Structural damages might be masked because variations in measured data can be the result of environmental variations. This research reports a novel method based on single-channel Blind Signal Separation (BSS), which extracts environmental effects from measured data directly without any prior knowledge of the structure loading and environmental conditions. Despite the successful application in audio processing and bio-medical research fields, BSS has never been used to detect damage under varying environmental conditions. This proposed method optimizes and combines Ensemble Empirical Mode Decomposition (EEMD), Principal Component Analysis (PCA) and Independent Component Analysis (ICA) together to separate structural responses due to different loading conditions respectively from a single channel input signal. The ICA is applying on dimension-reduced output of EEMD. Numerical simulation of a truss bridge, inspired from New Joban Line Arakawa Railway Bridge, is used to validate this method. All results demonstrate that the single-channel BSS-based method can recover temperature effects from mixed structural response recorded by a single sensor with a convincing accuracy. This will be the foundation of further research on direct damage detection under varying environment.

Keywords: damage detection, ensemble empirical mode decomposition (EEMD), environmental variations, independent component analysis (ICA), principal component analysis (PCA), structural health monitoring (SHM)

Procedia PDF Downloads 280
202 Comparing SVM and Naïve Bayes Classifier for Automatic Microaneurysm Detections

Authors: A. Sopharak, B. Uyyanonvara, S. Barman

Abstract:

Diabetic retinopathy is characterized by the development of retinal microaneurysms. The damage can be prevented if disease is treated in its early stages. In this paper, we are comparing Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers for automatic microaneurysm detection in images acquired through non-dilated pupils. The Nearest Neighbor classifier is used as a baseline for comparison. Detected microaneurysms are validated with expert ophthalmologists’ hand-drawn ground-truths. The sensitivity, specificity, precision and accuracy of each method are also compared.

Keywords: diabetic retinopathy, microaneurysm, naive Bayes classifier, SVM classifier

Procedia PDF Downloads 304
201 The Optimization of Decision Rules in Multimodal Decision-Level Fusion Scheme

Authors: Andrey V. Timofeev, Dmitry V. Egorov

Abstract:

This paper introduces an original method of parametric optimization of the structure for multimodal decision-level fusion scheme which combines the results of the partial solution of the classification task obtained from assembly of the mono-modal classifiers. As a result, a multimodal fusion classifier which has the minimum value of the total error rate has been obtained.

Keywords: classification accuracy, fusion solution, total error rate, multimodal fusion classifier

Procedia PDF Downloads 434
200 Ensemble Methods in Machine Learning: An Algorithmic Approach to Derive Distinctive Behaviors of Criminal Activity Applied to the Poaching Domain

Authors: Zachary Blanks, Solomon Sonya

Abstract:

Poaching presents a serious threat to endangered animal species, environment conservations, and human life. Additionally, some poaching activity has even been linked to supplying funds to support terrorist networks elsewhere around the world. Consequently, agencies dedicated to protecting wildlife habitats have a near intractable task of adequately patrolling an entire area (spanning several thousand kilometers) given limited resources, funds, and personnel at their disposal. Thus, agencies need predictive tools that are both high-performing and easily implementable by the user to help in learning how the significant features (e.g. animal population densities, topography, behavior patterns of the criminals within the area, etc) interact with each other in hopes of abating poaching. This research develops a classification model using machine learning algorithms to aid in forecasting future attacks that is both easy to train and performs well when compared to other models. In this research, we demonstrate how data imputation methods (specifically predictive mean matching, gradient boosting, and random forest multiple imputation) can be applied to analyze data and create significant predictions across a varied data set. Specifically, we apply these methods to improve the accuracy of adopted prediction models (Logistic Regression, Support Vector Machine, etc). Finally, we assess the performance of the model and the accuracy of our data imputation methods by learning on a real-world data set constituting four years of imputed data and testing on one year of non-imputed data. This paper provides three main contributions. First, we extend work done by the Teamcore and CREATE (Center for Risk and Economic Analysis of Terrorism Events) research group at the University of Southern California (USC) working in conjunction with the Department of Homeland Security to apply game theory and machine learning algorithms to develop more efficient ways of reducing poaching. This research introduces ensemble methods (Random Forests and Stochastic Gradient Boosting) and applies it to real-world poaching data gathered from the Ugandan rain forest park rangers. Next, we consider the effect of data imputation on both the performance of various algorithms and the general accuracy of the method itself when applied to a dependent variable where a large number of observations are missing. Third, we provide an alternate approach to predict the probability of observing poaching both by season and by month. The results from this research are very promising. We conclude that by using Stochastic Gradient Boosting to predict observations for non-commercial poaching by season, we are able to produce statistically equivalent results while being orders of magnitude faster in computation time and complexity. Additionally, when predicting potential poaching incidents by individual month vice entire seasons, boosting techniques produce a mean area under the curve increase of approximately 3% relative to previous prediction schedules by entire seasons.

Keywords: ensemble methods, imputation, machine learning, random forests, statistical analysis, stochastic gradient boosting, wildlife protection

Procedia PDF Downloads 263
199 Active Features Determination: A Unified Framework

Authors: Meenal Badki

Abstract:

We address the issue of active feature determination, where the objective is to determine the set of examples on which additional data (such as lab tests) needs to be gathered, given a large number of examples with some features (such as demographics) and some examples with all the features (such as the complete Electronic Health Record). We note that certain features may be more costly, unique, or laborious to gather. Our proposal is a general active learning approach that is independent of classifiers and similarity metrics. It allows us to identify examples that differ from the full data set and obtain all the features for the examples that match. Our comprehensive evaluation shows the efficacy of this approach, which is driven by four authentic clinical tasks.

Keywords: feature determination, classification, active learning, sample-efficiency

Procedia PDF Downloads 30
198 Predicting Aggregation Propensity from Low-Temperature Conformational Fluctuations

Authors: Hamza Javar Magnier, Robin Curtis

Abstract:

There have been rapid advances in the upstream processing of protein therapeutics, which has shifted the bottleneck to downstream purification and formulation. Finding liquid formulations with shelf lives of up to two years is increasingly difficult for some of the newer therapeutics, which have been engineered for activity, but their formulations are often viscous, can phase separate, and have a high propensity for irreversible aggregation1. We explore means to develop improved predictive ability from a better understanding of how protein-protein interactions on formulation conditions (pH, ionic strength, buffer type, presence of excipients) and how these impact upon the initial steps in protein self-association and aggregation. In this work, we study the initial steps in the aggregation pathways using a minimal protein model based on square-well potentials and discontinuous molecular dynamics. The effect of model parameters, including range of interaction, stiffness, chain length, and chain sequence, implies that protein models fold according to various pathways. By reducing the range of interactions, the folding- and collapse- transition come together, and follow a single-step folding pathway from the denatured to the native state2. After parameterizing the model interaction-parameters, we developed an understanding of low-temperature conformational properties and fluctuations, and the correlation to the folding transition of proteins in isolation. The model fluctuations increase with temperature. We observe a low-temperature point, below which large fluctuations are frozen out. This implies that fluctuations at low-temperature can be correlated to the folding transition at the melting temperature. Because proteins “breath” at low temperatures, defining a native-state as a single structure with conserved contacts and a fixed three-dimensional structure is misleading. Rather, we introduce a new definition of a native-state ensemble based on our understanding of the core conservation, which takes into account the native fluctuations at low temperatures. This approach permits the study of a large range of length and time scales needed to link the molecular interactions to the macroscopically observed behaviour. In addition, these models studied are parameterized by fitting to experimentally observed protein-protein interactions characterized in terms of osmotic second virial coefficients.

Keywords: protein folding, native-ensemble, conformational fluctuation, aggregation

Procedia PDF Downloads 333
197 A Computer-Aided System for Tooth Shade Matching

Authors: Zuhal Kurt, Meral Kurt, Bilge T. Bal, Kemal Ozkan

Abstract:

Shade matching and reproduction is the most important element of success in prosthetic dentistry. Until recently, shade matching procedure was implemented by dentists visual perception with the help of shade guides. Since many factors influence visual perception; tooth shade matching using visual devices (shade guides) is highly subjective and inconsistent. Subjective nature of this process has lead to the development of instrumental devices. Nowadays, colorimeters, spectrophotometers, spectroradiometers and digital image analysing systems are used for instrumental shade selection. Instrumental devices have advantages that readings are quantifiable, can obtain more rapidly and simply, objectively and precisely. However, these devices have noticeable drawbacks. For example, translucent structure and irregular surfaces of teeth lead to defects on measurement with these devices. Also between the results acquired by devices with different measurement principles may make inconsistencies. So, its obligatory to search for new methods for dental shade matching process. A computer-aided system device; digital camera has developed rapidly upon today. Currently, advances in image processing and computing have resulted in the extensive use of digital cameras for color imaging. This procedure has a much cheaper process than the use of traditional contact-type color measurement devices. Digital cameras can be taken by the place of contact-type instruments for shade selection and overcome their disadvantages. Images taken from teeth show morphology and color texture of teeth. In last decades, a new method was recommended to compare the color of shade tabs taken by a digital camera using color features. This method showed that visual and computer-aided shade matching systems should be used as concatenated. Recently using methods of feature extraction techniques are based on shape description and not used color information. However, color is mostly experienced as an essential property in depicting and extracting features from objects in the world around us. When local feature descriptors with color information are extended by concatenating color descriptor with the shape descriptor, that descriptor will be effective on visual object recognition and classification task. Therefore, the color descriptor is to be used in combination with a shape descriptor it does not need to contain any spatial information, which leads us to use local histograms. This local color histogram method is remain reliable under variation of photometric changes, geometrical changes and variation of image quality. So, coloring local feature extraction methods are used to extract features, and also the Scale Invariant Feature Transform (SIFT) descriptor used to for shape description in the proposed method. After the combination of these descriptors, the state-of-art descriptor named by Color-SIFT will be used in this study. Finally, the image feature vectors obtained from quantization algorithm are fed to classifiers such as Nearest Neighbor (KNN), Naive Bayes or Support Vector Machines (SVM) to determine label(s) of the visual object category or matching. In this study, SVM are used as classifiers for color determination and shade matching. Finally, experimental results of this method will be compared with other recent studies. It is concluded from the study that the proposed method is remarkable development on computer aided tooth shade determination system.

Keywords: classifiers, color determination, computer-aided system, tooth shade matching, feature extraction

Procedia PDF Downloads 403
196 Spectrogram Pre-Processing to Improve Isotopic Identification to Discriminate Gamma and Neutrons Sources

Authors: Mustafa Alhamdi

Abstract:

Industrial application to classify gamma rays and neutron events is investigated in this study using deep machine learning. The identification using a convolutional neural network and recursive neural network showed a significant improvement in predication accuracy in a variety of applications. The ability to identify the isotope type and activity from spectral information depends on feature extraction methods, followed by classification. The features extracted from the spectrum profiles try to find patterns and relationships to present the actual spectrum energy in low dimensional space. Increasing the level of separation between classes in feature space improves the possibility to enhance classification accuracy. The nonlinear nature to extract features by neural network contains a variety of transformation and mathematical optimization, while principal component analysis depends on linear transformations to extract features and subsequently improve the classification accuracy. In this paper, the isotope spectrum information has been preprocessed by finding the frequencies components relative to time and using them as a training dataset. Fourier transform implementation to extract frequencies component has been optimized by a suitable windowing function. Training and validation samples of different isotope profiles interacted with CdTe crystal have been simulated using Geant4. The readout electronic noise has been simulated by optimizing the mean and variance of normal distribution. Ensemble learning by combing voting of many models managed to improve the classification accuracy of neural networks. The ability to discriminate gamma and neutron events in a single predication approach using deep machine learning has shown high accuracy using deep learning. The paper findings show the ability to improve the classification accuracy by applying the spectrogram preprocessing stage to the gamma and neutron spectrums of different isotopes. Tuning deep machine learning models by hyperparameter optimization of neural network models enhanced the separation in the latent space and provided the ability to extend the number of detected isotopes in the training database. Ensemble learning contributed significantly to improve the final prediction.

Keywords: machine learning, nuclear physics, Monte Carlo simulation, noise estimation, feature extraction, classification

Procedia PDF Downloads 121
195 The Power of the Proper Orthogonal Decomposition Method

Authors: Charles Lee

Abstract:

The Principal Orthogonal Decomposition (POD) technique has been used as a model reduction tool for many applications in engineering and science. In principle, one begins with an ensemble of data, called snapshots, collected from an experiment or laboratory results. The beauty of the POD technique is that when applied, the entire data set can be represented by the smallest number of orthogonal basis elements. It is the such capability that allows us to reduce the complexity and dimensions of many physical applications. Mathematical formulations and numerical schemes for the POD method will be discussed along with applications in NASA’s Deep Space Large Antenna Arrays, Satellite Image Reconstruction, Cancer Detection with DNA Microarray Data, Maximizing Stock Return, and Medical Imaging.

Keywords: reduced-order methods, principal component analysis, cancer detection, image reconstruction, stock portfolios

Procedia PDF Downloads 49
194 Ribotaxa: Combined Approaches for Taxonomic Resolution Down to the Species Level from Metagenomics Data Revealing Novelties

Authors: Oshma Chakoory, Sophie Comtet-Marre, Pierre Peyret

Abstract:

Metagenomic classifiers are widely used for the taxonomic profiling of metagenomic data and estimation of taxa relative abundance. Small subunit rRNA genes are nowadays a gold standard for the phylogenetic resolution of complex microbial communities, although the power of this marker comes down to its use as full-length. We benchmarked the performance and accuracy of rRNA-specialized versus general-purpose read mappers, reference-targeted assemblers and taxonomic classifiers. We then built a pipeline called RiboTaxa to generate a highly sensitive and specific metataxonomic approach. Using metagenomics data, RiboTaxa gave the best results compared to other tools (Kraken2, Centrifuge (1), METAXA2 (2), PhyloFlash (3)) with precise taxonomic identification and relative abundance description, giving no false positive detection. Using real datasets from various environments (ocean, soil, human gut) and from different approaches (metagenomics and gene capture by hybridization), RiboTaxa revealed microbial novelties not seen by current bioinformatics analysis opening new biological perspectives in human and environmental health. In a study focused on corals’ health involving 20 metagenomic samples (4), an affiliation of prokaryotes was limited to the family level with Endozoicomonadaceae characterising healthy octocoral tissue. RiboTaxa highlighted 2 species of uncultured Endozoicomonas which were dominant in the healthy tissue. Both species belonged to a genus not yet described, opening new research perspectives on corals’ health. Applied to metagenomics data from a study on human gut and extreme longevity (5), RiboTaxa detected the presence of an uncultured archaeon in semi-supercentenarians (aged 105 to 109 years) highlighting an archaeal genus, not yet described, and 3 uncultured species belonging to the Enorma genus that could be species of interest participating in the longevity process. RiboTaxa is user-friendly, rapid, allowing microbiota structure description from any environment and the results can be easily interpreted. This software is freely available at https://github.com/oschakoory/RiboTaxa under the GNU Affero General Public License 3.0.

Keywords: metagenomics profiling, microbial diversity, SSU rRNA genes, full-length phylogenetic marker

Procedia PDF Downloads 88