Search results for: categorical datasets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 816

456 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study

Authors: Salima Smiti, Ines Gasmi, Makram Soui

Abstract:

Credit risk is the most important issue for financial institutions. Its assessment is an important task, used to predict defaulting customers and to classify customers as good or bad payers. To this end, numerous techniques have been applied to credit risk assessment. However, to our knowledge, several of these techniques are black-box models, such as neural networks and SVMs, which generate applicants’ classes without any explanation. In this paper, we propose to assess credit risk using a rule-based classification method. Our output is a set of rules that describe and explain the decision. We compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART and genetic programming (GP)), where the goal is to find the best rules with respect to three criteria: accuracy, sensitivity, and specificity. The results confirm the efficiency of the GP algorithm on the German and Australian datasets compared with the other rule-based techniques for predicting credit risk.
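The three criteria named in this abstract (accuracy, sensitivity, specificity) can be computed from a confusion matrix. The following is a minimal sketch, not the authors' evaluation code; the function name, the label strings and the toy predictions are assumptions made for illustration, with "bad" payers treated as the positive class.

```python
def classification_metrics(y_true, y_pred, positive="bad"):
    """Return accuracy, sensitivity and specificity for a binary task.

    `positive` is the label treated as the positive class (an assumption:
    here, a "bad" payer, i.e. a defaulter).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # defaulter correctly flagged
            else:
                fp += 1  # good payer wrongly flagged
        else:
            if truth == positive:
                fn += 1  # defaulter missed
            else:
                tn += 1  # good payer correctly accepted
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels, for illustration only.
truth = ["bad", "bad", "good", "good", "good", "bad"]
predicted = ["bad", "good", "good", "good", "bad", "bad"]
print(classification_metrics(truth, predicted))
```

Reporting sensitivity and specificity alongside accuracy matters for credit data because the two classes are rarely balanced, so accuracy alone can hide poor detection of defaulters.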

Keywords: credit risk assessment, classification algorithms, data mining, rule extraction

Procedia PDF Downloads 154
455 A New Concept for Deriving the Expected Value of Fuzzy Random Variables

Authors: Liang-Hsuan Chen, Chia-Jung Chang

Abstract:

Fuzzy random variables (FRVs) have been introduced as an imprecise concept of numeric values for characterizing imprecise knowledge. Descriptive parameters can be used to describe the primary features of a set of fuzzy random observations. In fuzzy environments, expected values are usually represented as fuzzy-valued, interval-valued or numeric-valued descriptive parameters using various metrics. Instead of the area metric usually adopted in the relevant studies, this study proposes a numeric expected value based on a distance metric and on two characteristics of FRVs (fuzziness and randomness). Compared with existing measures, the results show that the proposed numeric expected value is the same as those obtained with the other metrics when only triangular membership functions are used. However, the proposed approach has the advantages of intuitiveness and computational efficiency when the membership functions are not triangular. An example with three datasets is provided to verify the proposed approach.

Keywords: fuzzy random variables, distance measure, expected value, descriptive parameters

Procedia PDF Downloads 322
454 Robust Pattern Recognition via Correntropy Generalized Orthogonal Matching Pursuit

Authors: Yulong Wang, Yuan Yan Tang, Cuiming Zou, Lina Yang

Abstract:

This paper presents a novel sparse representation method for robust pattern classification. Generalized orthogonal matching pursuit (GOMP) is a recently proposed efficient sparse representation technique. However, GOMP adopts the mean square error (MSE) criterion and assigns the same weight to all measurements, including both severely and slightly corrupted ones. To mitigate this limitation, we propose an information-theoretic GOMP (ITGOMP) method that exploits the correntropy induced metric. The results show that ITGOMP can adaptively assign small weights to severely contaminated measurements and large weights to clean ones. An ITGOMP based classifier is further developed for robust pattern classification. Experiments on public real datasets demonstrate the efficacy of the proposed approach.
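To illustrate why a correntropy-style criterion downweights corrupted measurements where MSE does not, the sketch below applies the Gaussian-kernel weighting commonly associated with correntropy, w_i = exp(-r_i^2 / (2 sigma^2)), to a list of residuals. This is an assumption-laden illustration, not the ITGOMP algorithm itself: the function names, the kernel width `sigma` and the residual values are hypothetical.

```python
import math


def correntropy_weights(residuals, sigma=1.0):
    """Gaussian-kernel weights: near 1 for small residuals, near 0 for outliers."""
    return [math.exp(-(r * r) / (2.0 * sigma * sigma)) for r in residuals]


def weighted_squared_error(residuals, weights):
    """Weighted mean of squared residuals; uniform weights recover plain MSE."""
    return sum(w * r * r for w, r in zip(weights, residuals)) / sum(weights)


# Hypothetical residuals: two clean measurements and one severely corrupted one.
residuals = [0.0, 0.1, 5.0]
weights = correntropy_weights(residuals, sigma=1.0)
plain_mse = weighted_squared_error(residuals, [1.0] * len(residuals))
robust_error = weighted_squared_error(residuals, weights)
print(weights, plain_mse, robust_error)
```

With uniform weights the single corrupted measurement dominates the error, whereas the kernel weights reduce its contribution to almost nothing.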

Keywords: correntropy induced metric, matching pursuit, pattern classification, sparse representation

Procedia PDF Downloads 334
453 Cotton Crops Vegetative Indices Based Assessment Using Multispectral Images

Authors: Muhammad Shahzad Shifa, Amna Shifa, Muhammad Omar, Aamir Shahzad, Rahmat Ali Khan

Abstract:

Many applications of remote sensing to vegetation and crop response depend on the spectral properties of individual leaves and plants. Vegetation indices are usually determined to estimate crop biophysical parameters, such as crop canopies and crop leaf area indices, with the help of remote sensing. In this study, cotton crops are assessed using vegetative indices derived from remotely sensed images from an optical multispectral radiometer, the MSR5. The interpretation is based on the fact that different materials reflect and absorb light differently at different wavelengths. Non-normalized and normalized forms of these datasets are analyzed using two complementary data mining algorithms: K-means and K-nearest neighbor (KNN). Our analysis shows that normalized reflectance data and vegetative indices are suitable for automated assessment and decision making.

Keywords: cotton, condition assessment, KNN algorithm, clustering, MSR5, vegetation indices

Procedia PDF Downloads 308
452 Credit Risk Evaluation Using Genetic Programming

Authors: Ines Gasmi, Salima Smiti, Makram Soui, Khaled Ghedira

Abstract:

Credit risk is considered one of the important issues for financial institutions, as it causes great losses for banks. Numerous methods for credit risk evaluation have therefore been proposed. Many evaluation methods are black-box models that cannot adequately reveal information hidden in the data. However, several works have focused on building transparent rule-based models. For credit risk assessment, generated rules must be not only highly accurate but also highly interpretable. In this paper, we aim to build a credit risk evaluation model that is both accurate and transparent and that proposes a set of classification rules. We treat credit risk evaluation as an optimization problem solved with a genetic programming (GP) algorithm, where the goal is to maximize the accuracy of the generated rules. We evaluate the proposed approach on the German and Australian credit datasets. We compare our findings with some existing works; the results show that the proposed GP outperforms the other models.

Keywords: credit risk assessment, rule generation, genetic programming, feature selection

Procedia PDF Downloads 325
451 Impact of COVID-19 on Radiology Training in Australia and New Zealand

Authors: Preet Gill, Danus Ravindran

Abstract:

The COVID-19 pandemic resulted in widespread implications for medical specialist training programs worldwide, including radiology. The objective of this study was to investigate the impact of COVID-19 on the Australian and New Zealand radiology trainee experience and well-being, as well as to compare the Australasian experience with that reported by other countries. An anonymised electronic online questionnaire was disseminated to all training members of the Royal Australian and New Zealand College of Radiologists who were radiology trainees during the 2020 – 2022 clinical years. Trainees were questioned about their experience from the beginning of the COVID-19 pandemic in Australasia (March 2020) to the time of survey completion. Participation was voluntary. Questions assessed the impact of the pandemic across multiple domains, including workload (inpatient/outpatient & individual modality volume), teaching, supervision, external learning opportunities, redeployment and trainee wellbeing. Survey responses were collated and compared with other peer-reviewed publications. Answer options were primarily in categorical format (nominal and ordinal subtypes, as appropriate). An opportunity to provide free text answers to a minority of questions was provided. While our results mirror those of other countries, which demonstrated reduced case exposure and increased remote teaching and supervision, responses showed variation in the methods utilised by training sites during the height of the pandemic. A significant number of trainees were affected by examination cancellations/postponements and had subspecialty training rotations postponed. The majority of trainees felt that the pandemic had a negative effect on their training. In conclusion, the COVID-19 pandemic has had a significant impact on radiology trainees across Australia and New Zealand. The present study has highlighted the extent of these effects, with most aspects of training impacted. Opportunities exist to utilise this information to create robust workplace strategies to mitigate these negative effects should the need arise in the future.

Keywords: COVID-19, radiology, training, pandemic

Procedia PDF Downloads 45
450 A Large Dataset Imputation Approach Applied to Country Conflict Prediction Data

Authors: Benjamin Leiby, Darryl Ahner

Abstract:

This study demonstrates an alternative stochastic imputation approach for large datasets when preferred commercial packages struggle to iterate due to numerical problems. A large country conflict dataset motivates the search to impute missing values well over a common threshold of 20% missingness. The methodology capitalizes on correlation while using model residuals to provide the uncertainty in estimating unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Static tolerances common in most packages are replaced with tailorable tolerances that exploit residuals to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the country conflict dataset illustrates promise with modeling first-order interactions while presenting a need for further refinement that mimics predictive mean matching.
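The core idea described here, using a fitted model for the prediction and its residuals for the uncertainty, can be sketched with textbook stochastic regression imputation. The code below is a simplified single-predictor illustration and not the authors' methodology: the function name, the use of one fully observed predictor `x`, and the Gaussian noise assumption are all choices made for this sketch.

```python
import math
import random


def stochastic_regression_impute(x, y, seed=0):
    """Fill missing entries of y (None) with a linear prediction from x
    plus Gaussian noise scaled by the residual standard deviation."""
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(observed)
    if n < 3:
        raise ValueError("need at least three complete cases")
    mean_x = sum(xi for xi, _ in observed) / n
    mean_y = sum(yi for _, yi in observed) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    if sxx == 0:
        raise ValueError("predictor has no variance among complete cases")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in observed)
    resid_sd = math.sqrt(sse / (n - 2))
    return [
        yi if yi is not None else intercept + slope * xi + rng.gauss(0.0, resid_sd)
        for xi, yi in zip(x, y)
    ]


# Hypothetical data with two missing responses.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 11.8]
print(stochastic_regression_impute(x, y, seed=42))
```

Adding the residual-scaled noise, rather than imputing the bare prediction, keeps the imputed values from understating the variability of the observed data.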

Keywords: correlation, country conflict, imputation, stochastic regression

Procedia PDF Downloads 101
449 Tree Species Classification Using Effective Features of Polarimetric SAR and Hyperspectral Images

Authors: Milad Vahidi, Mahmod R. Sahebi, Mehrnoosh Omati, Reza Mohammadi

Abstract:

Forest management organizations need information to perform their work effectively. Remote sensing is an effective method to acquire information from the Earth. Two datasets of remote sensing images were used to classify forested regions. Firstly, all extractable features were extracted from the hyperspectral and PolSAR images. The optical features were spectral indexes related to chemical and water contents, structural indexes, effective bands and absorption features. The PolSAR features were the original data, target decomposition components, and SAR discriminator features. Secondly, particle swarm optimization (PSO) and genetic algorithms (GA) were applied to select optimal features. Furthermore, a support vector machine (SVM) classifier was used to classify the image. The results showed that the combination of PSO and SVM had higher overall accuracy than the other cases, at about 90.56%. The effective features were the spectral index, the bands in the shortwave infrared (SWIR) and visible ranges, and certain PolSAR features.

Keywords: hyperspectral, PolSAR, feature selection, SVM

Procedia PDF Downloads 394
448 Image Retrieval Based on Multi-Feature Fusion for Heterogeneous Image Databases

Authors: N. W. U. D. Chathurani, Shlomo Geva, Vinod Chandran, Proboda Rajapaksha

Abstract:

Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach, which is free from the effect of features' dimensions, ranges, internal feature normalization and the distance measure. This approach can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and Oliva and Torralba) and compared with existing approaches. The performance of the proposed approach is confirmed with the significantly improved performance in comparison with the independently evaluated baseline of the previously proposed feature fusion approaches.

Keywords: feature fusion, image retrieval, membership function, normalization

Procedia PDF Downloads 326
447 RGB-D SLAM Algorithm Based on pixel level Dense Depth Map

Authors: Hao Zhang, Hongyang Yu

Abstract:

Scale uncertainty is a well-known challenging problem in visual SLAM. Because an RGB-D sensor provides depth information, RGB-D SLAM mitigates this scale uncertainty problem. However, due to the limitations of physical hardware, the depth map output by an RGB-D sensor usually contains large areas of missing depth values. This missing depth information affects the accuracy and robustness of RGB-D SLAM. In order to reduce these effects, this paper completes the missing areas of the depth map output by the RGB-D sensor and then fuses the completed dense depth map into ORB SLAM2. By adding the process of obtaining pixel-level dense depth maps, a better RGB-D visual SLAM algorithm is obtained. In the process of obtaining dense depth maps, a deep learning model of indoor scenes is adopted. Experiments are conducted on public datasets and in real-world indoor environments. Experimental results show that the proposed SLAM algorithm has better robustness than ORB SLAM2.

Keywords: RGB-D, SLAM, dense depth, depth map

Procedia PDF Downloads 115
446 Integrative System of GDP, Emissions, Health Services and Population Health in Vietnam: Dynamic Panel Data Estimation

Authors: Ha Hai Duong, Amnon Levy Livermore, Kankesu Jayanthakumaran, Oleg Yerokhin

Abstract:

The issues of economic development, the environment and human health have been investigated since the 1990s. Previous researchers have found differing empirical evidence on the relationship between income and environmental pollution, on health as a determinant of economic growth, and on the effects of income and environmental pollution on health in various regions of the world. This paper concentrates on an integrative analysis of the relationships among GDP, carbon dioxide emissions, health services and population health in the context of Vietnam. We applied dynamic generalized method of moments (GMM) estimation to datasets of Vietnam’s sixty-three provinces for the years 2000-2010. Our results show a significant positive effect of GDP on emissions and the dependence of population health on emissions and health services. We find a significant relationship between population health and GDP. Additionally, health services are significantly affected by population health and GDP. Finally, population size is another important determinant of both emissions and GDP.

Keywords: economic development, emissions, environmental pollution, health

Procedia PDF Downloads 596
445 Study of Three-Dimensional Computed Tomography of Frontoethmoidal Cells Using International Frontal Sinus Anatomy Classification

Authors: Prabesh Karki, Shyam Thapa Chettri, Bajarang Prasad Sah, Manoj Bhattarai, Sudeep Mishra

Abstract:

Introduction: The frontal sinus is frequently described as the most difficult sinus to access surgically due to its proximity to the cribriform plate, orbit, and anterior ethmoid artery. Frontal sinus surgery requires a detailed understanding of the cellular structure and FSDP unique to each patient, making high-resolution CT scans an indispensable tool to assess the difficulty of planned sinus surgery. The International Frontal Sinus Anatomy Classification (IFAC) was developed to provide a more precise nomenclature for cells in the frontal recess, classifying cells based on their anatomic origin. Objectives: To assess the proportion of frontal cell variants defined by IFAC and their variation with respect to age and gender. Methods: 54 cases were enrolled after a detailed clinical history, thorough general and physical examinations, and a CT report ordered in a film. The presence of frontal cells according to the IFAC was assessed and tabulated. The prevalence of each cell type was calculated, and data were entered in MS Excel and analyzed using Statistical Package for the Social Sciences (SPSS). Descriptive statistics and frequencies were defined for categorical and numerical variables. Frequency, percentage, mean and standard deviation were calculated. Result: Among 54 patients, 30 (55.6%) were male and 24 (44.4%) were female. The patients enrolled ranged in age from 18 to 78 years. The majority, 33.3% (n=18), were in the age group of >50 years. According to IFAC, Agger nasi cells (92.6%) were most common, whereas supraorbital ethmoidal cells were least common, 16 (29.6%). Prevalence of the other frontoethmoidal cells among the 54 cases was SAC 57.4%, SAFC 38.9%, SBC 74.1%, SBFC 33.3%, and FSC 38.9%. Conclusion: IFAC is an international consensus document that describes an anatomically precise nomenclature for classifying frontoethmoidal cell anatomy. This study has defined the prevalence, symmetry and reliability of frontoethmoidal cells as established by the IFAC system, as in other parts of the world.

Keywords: frontal sinus, frontoethmoidal cells, international frontal sinus anatomy classification

Procedia PDF Downloads 73
444 3D Point Cloud Model Color Adjustment by Combining Terrestrial Laser Scanner and Close Range Photogrammetry Datasets

Authors: M. Pepe, S. Ackermann, L. Fregonese, C. Achille

Abstract:

3D models obtained with advanced survey techniques such as close-range photogrammetry and laser scanning are nowadays particularly appreciated in the Cultural Heritage and Archaeology fields. In order to produce high quality models representing archaeological evidence and anthropological artifacts, the appearance of the model (i.e. color), beyond its geometric accuracy, is not a negligible aspect. The integration of close-range photogrammetry survey techniques with laser scanning is still a topic of study and research. Combining point cloud data sets of the same object generated with both technologies, or with the same technology but registered at different moments and/or under different natural light conditions, can produce a final point cloud with accentuated color dissimilarities. In this paper, a methodology is presented to make the different data sets uniform, improve the chromatic quality and highlight further details by balancing the point color.

Keywords: color models, cultural heritage, laser scanner, photogrammetry

Procedia PDF Downloads 261
443 Large-Scale Electroencephalogram Biometrics through Contrastive Learning

Authors: Mostafa ‘Neo’ Mohsenvand, Mohammad Rasool Izadi, Pattie Maes

Abstract:

EEG-based biometrics (user identification) has been explored on small datasets of no more than 157 subjects. Here we show that the accuracy of modern supervised methods falls rapidly as the number of users increases to a few thousand. Moreover, supervised methods require a large amount of labeled data for training which limits their applications in real-world scenarios where acquiring data for training should not take more than a few minutes. We show that using contrastive learning for pre-training, it is possible to maintain high accuracy on a dataset of 2130 subjects while only using a fraction of labels. We compare 5 different self-supervised tasks for pre-training of the encoder where our proposed method achieves the accuracy of 96.4%, improving the baseline supervised models by 22.75% and the competing self-supervised model by 3.93%. We also study the effects of the length of the signal and the number of channels on the accuracy of the user-identification models. Our results reveal that signals from temporal and frontal channels contain more identifying features compared to other channels.

Keywords: brainprint, contrastive learning, electroencephalogram, self-supervised learning, user identification

Procedia PDF Downloads 137
442 Multi-Classification Deep Learning Model for Diagnosing Different Chest Diseases

Authors: Bandhan Dey, Muhsina Bintoon Yiasha, Gulam Sulaman Choudhury

Abstract:

Chest disease is one of the most problematic ailments in our regular life. There are many known chest diseases, and diagnosing them correctly plays a vital role in the process of treatment. Many methods have been developed explicitly for different chest diseases, but the most common approach for diagnosing these diseases is through X-ray. In this paper, we propose a multi-classification deep learning model for diagnosing COVID-19, lung cancer, pneumonia, tuberculosis, and atelectasis from chest X-rays. We use the transfer learning method for better accuracy and a faster training phase. The performance of three architectures is considered: InceptionV3, VGG-16, and VGG-19. We evaluated these deep learning architectures using public digital chest X-ray datasets with six classes (i.e., COVID-19, lung cancer, pneumonia, tuberculosis, atelectasis, and normal). The experiments are conducted on six-class classification, and we found that VGG-16 outperforms the other proposed models with an accuracy of 95%.

Keywords: deep learning, image classification, X-ray images, Tensorflow, Keras, chest diseases, convolutional neural networks, multi-classification

Procedia PDF Downloads 65
441 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousands of gene expression data simultaneously and plays a very important role in drug development and testing, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithms have been used as an alternative approach to identifying structures in gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify a normalized matrix of gene expression data, followed by the Mixed-Clustering algorithm and the Lift algorithm, inspired by the node-deletion and node-addition phases proposed by Cheng and Church and based on Agglomerative Hierarchical Clustering (AHC). An experimental study on standard datasets demonstrated the effectiveness of the algorithm on gene expression data.

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 258
440 User Intention Generation with Large Language Models Using Chain-of-Thought Prompting

Authors: Gangmin Li, Fan Yang

Abstract:

Personalized recommendation is crucial for any recommendation system. One of the techniques for personalized recommendation is to identify the user's intention. Traditional user intention identification uses the user’s selection when facing multiple items. This modeling relies primarily on historical behaviour data, resulting in challenges such as the cold start, unintended choices, and failure to capture intention when items are new. Motivated by recent advancements in Large Language Models (LLMs) like ChatGPT, we present an approach for user intention identification by embracing LLMs with Chain-of-Thought (CoT) prompting. We use the initial user profile as input to LLMs and design a collection of prompts to align the LLM's response through various recommendation tasks encompassing rating prediction, search and browse history, user clarification, etc. Our tests on real-world datasets demonstrate improvements in recommendation from explicitly identifying user intention and merging that intention into a user model.

Keywords: personalized recommendation, generative user modelling, user intention identification, large language models, chain-of-thought prompting

Procedia PDF Downloads 23
439 A Genetic Algorithm Based Permutation and Non-Permutation Scheduling Heuristics for Finite Capacity Material Requirement Planning Problem

Authors: Watchara Songserm, Teeradej Wuttipornpun

Abstract:

This paper presents genetic algorithm based permutation and non-permutation scheduling heuristics (GAPNP) to solve a multi-stage finite capacity material requirement planning (FCMRP) problem in an automotive assembly flow shop with unrelated parallel machines. In the algorithm, the sequences of orders are iteratively improved by the GA, whereas the required operations are scheduled based on the presented permutation and non-permutation heuristics. Finally, linear programming is applied to minimize the total cost. The presented GAPNP algorithm is evaluated using real datasets from automotive companies. The required parameters for GAPNP are carefully tuned to obtain a common parameter setting for all case studies. The results show that GAPNP significantly outperforms the benchmark algorithm, by about 30% on average.

Keywords: capacitated MRP, genetic algorithm, linear programming, automotive industries, flow shop, application in industry

Procedia PDF Downloads 470
438 Model for Introducing Products to New Customers through Decision Tree Using Algorithm C4.5 (J-48)

Authors: Komol Phaisarn, Anuphan Suttimarn, Vitchanan Keawtong, Kittisak Thongyoun, Chaiyos Jamsawang

Abstract:

This article analyzes insurance data containing information on customers' decisions when purchasing a life insurance pay package. The data were analyzed in order to present new customers with the Life Insurance Perfect Pay package that meets their needs as closely as possible. The basic data on the insurance pay package were collected for data mining, thus reducing the scattering of information. The data were then classified to obtain a decision model, or decision tree, using Algorithm C4.5 (J-48). In the classification, WEKA tools were used to build the model, and testing datasets were used to test the accuracy of the decision tree. Validation of the model showed that 68.43% of predictions were accurate while 31.25% were errors. The same dataset was then tested with other models, i.e. Naive Bayes and Zero R. The results showed that the J-48 method predicted more accurately. The researcher therefore applied the decision tree in writing a program that introduces the product to new customers, supporting their decision to purchase the insurance package that best meets their needs.

Keywords: decision tree, data mining, customers, life insurance pay package

Procedia PDF Downloads 408
437 A Comparative Study between Digital Mammography, B Mode Ultrasound, Shear-Wave and Strain Elastography to Distinguish Benign and Malignant Breast Masses

Authors: Arjun Prakash, Samanvitha H.

Abstract:

BACKGROUND: Breast cancer is the commonest malignancy among women globally, with an estimated incidence of 2.3 million new cases as of 2020, representing 11.7% of all malignancies. As per Globocan data 2020, it accounted for 13.5% of all cancers and 10.6% of all cancer deaths in India. Early diagnosis and treatment can improve the overall morbidity and mortality, which necessitates the importance of differentiating benign from malignant breast masses. OBJECTIVE: The objective of the present study was to evaluate and compare the role of Digital Mammography (DM), B mode Ultrasound (USG), Shear Wave Elastography (SWE) and Strain Elastography (SE) in differentiating benign and malignant breast masses (ACR BI-RADS 3 - 5). Histo-Pathological Examination (HPE) was considered the Gold standard. MATERIALS & METHODS: We conducted a cross-sectional study on 53 patients with 64 breast masses over a period of 10 months. All patients underwent DM, USG, SWE and SE. These modalities were individually assessed to know their accuracy in differentiating benign and malignant masses. All Digital Mammograms were done using the Fujifilm AMULET Innovality Digital Mammography system and all Ultrasound examinations were performed on SAMSUNG RS 80 EVO Ultrasound system equipped with 2 to 9 MHz and 3 – 16 MHz linear transducers. All masses were subjected to HPE. Independent t-test and Chi-square or Fisher’s exact test were used to assess continuous and categorical variables, respectively. ROC analysis was done to assess the accuracy of diagnostic tests. RESULTS: Of 64 lesions, 51 (79.68%) were malignant and 13 (20.31%) (p < 0.0001) were benign. SE was the most specific (100%) (p < 0.0001) and USG (98%) (p < 0.0001) was the most sensitive of all the modalities. E max, E mean, E max ratio, E mean ratio and Strain Ratio of the malignant masses significantly differed from those of the benign masses. 
Maximum SWE value showed the highest sensitivity (88.2%) (p < 0.0001) among the elastography parameters. A combination of USG, SE and SWE had good sensitivity (86%) (p < 0.0001). CONCLUSION: A combination of USG, SE and SWE improves overall diagnostic yield in differentiating benign and malignant breast masses. Early diagnosis and treatment of breast carcinoma will reduce patient mortality and morbidity.

Keywords: digital mammography, breast cancer, ultrasound, elastography

Procedia PDF Downloads 85
436 Large Neural Networks Learning From Scratch With Very Few Data and Without Explicit Regularization

Authors: Christoph Linse, Thomas Martinetz

Abstract:

Recent findings have shown that Neural Networks also generalize in over-parametrized regimes with zero training error. This is surprising, since it runs completely against traditional machine learning wisdom. In our empirical study we fortify these findings in the domain of fine-grained image classification. We show that very large Convolutional Neural Networks with millions of weights do learn with only a handful of training samples and without image augmentation, explicit regularization or pretraining. We train the architectures ResNet18, ResNet101 and VGG19 on subsets of the difficult benchmark datasets Caltech101, CUB_200_2011, FGVCAircraft, Flowers102 and StanfordCars with 100 classes and more, perform a comprehensive comparative study and draw implications for the practical application of CNNs. Finally, we show that VGG19 with 140 million weights learns to distinguish airplanes and motorbikes with up to 95% accuracy using only 20 training samples per class.

Keywords: convolutional neural networks, fine-grained image classification, generalization, image recognition, over-parameterized, small data sets

Procedia PDF Downloads 66
435 Improved Rare Species Identification Using Focal Loss Based Deep Learning Models

Authors: Chad Goldsworthy, B. Rajeswari Matam

Abstract:

The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing the accuracy of manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it was able to increase the F1-score by 0.06 and improve the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as resulting in an improved overall accuracy of 2.96%.
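The difference between the two loss functions compared in this abstract can be seen per sample. The sketch below uses the standard focal loss formulation, FL(p_t) = -alpha (1 - p_t)^gamma log(p_t), where p_t is the predicted probability of the true class; the parameter values (`gamma=2.0`, `alpha=1.0`) and the example probabilities are assumptions for illustration and are not taken from the paper.

```python
import math


def cross_entropy(p_true):
    """Cross entropy for one sample; p_true is the probability of the true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: cross entropy scaled by (1 - p_true) ** gamma.

    With gamma = 0 and alpha = 1 this reduces to plain cross entropy.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# Hypothetical predictions: an easy (majority-class) sample and a hard
# (minority-class) sample.
for label, p in [("easy", 0.9), ("hard", 0.1)]:
    print(label, cross_entropy(p), focal_loss(p))
```

The modulating factor shrinks the loss of well-classified samples far more than that of misclassified ones, so hard (often minority-class) samples account for a larger share of the gradient than they would under cross entropy.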

Keywords: convolutional neural networks, data imbalance, deep learning, focal loss, species classification, wildlife conservation

Procedia PDF Downloads 163
434 Studying Relationship between Local Geometry of Decision Boundary with Network Complexity for Robustness Analysis with Adversarial Perturbations

Authors: Tushar K. Routh

Abstract:

If inputs are engineered in certain ways, they can degrade the performance of deep neural networks (DNNs) by inducing misclassifications, a phenomenon known as adversarial attacks that exposes the networks’ vulnerability. Recent studies have uncovered a relationship between the vulnerability of such networks and their complexity. In this paper, the distinctive influence of additional convolutional layers on the decision boundaries of several DNN architectures was investigated. To engineer inputs from widely known image datasets such as MNIST, Fashion-MNIST, and CIFAR-10, we applied the One Step Spectral Attack (OSSA) and Fast Gradient Method (FGM) techniques. The effects of adding layers on the robustness of the architectures have been analyzed. To explain these effects, the separation width from linear class partitions and the local geometry (curvature) near the decision boundary have been examined. The results reveal that model complexity plays a significant role in adjusting relative distances from margins, as well as the local features of decision boundaries, which impact robustness.
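
As a rough illustration of how a Fast Gradient Method perturbation is built, the sketch below moves an input by a step of L2 length eps along the normalized gradient of the loss with respect to that input. A tiny logistic-regression model stands in for the DNNs studied in the paper; the model, weights, input and eps are all hypothetical.

```python
import math


def predict_proba(x, w, b):
    """Probability of class 1 under a logistic-regression stand-in model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss_grad_wrt_input(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict_proba(x, w, b)
    return [(p - y) * wi for wi in w]


def fgm_perturb(x, y, w, b, eps):
    """Move x by a step of L2 length eps in the direction that increases the loss."""
    grad = loss_grad_wrt_input(x, y, w, b)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(x)  # no gradient signal, leave the input unchanged
    return [xi + eps * g / norm for xi, g in zip(x, grad)]


# Hypothetical weights and input, chosen only for illustration.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = fgm_perturb(x, y, w, b, eps=1.0)
print(predict_proba(x, w, b), predict_proba(x_adv, w, b))
```

In this toy setting the perturbed input crosses the linear decision boundary, so the model's confidence in the true class falls below 0.5. For a deep network the same step would use a gradient obtained by backpropagation.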

Keywords: DNN robustness, decision boundary, local curvature, network complexity

Procedia PDF Downloads 49
433 The Impact of Total Parenteral Nutrition on Pediatric Stem Cell Transplantation and Its Complications

Authors: R. Alramyan, S. Alsalamah, R. Alrashed, R. Alakel, F. Altheyeb, M. Alessa

Abstract:

Background: Nutritional support with total parenteral nutrition (TPN) is usually commenced in hematopoietic stem cell transplantation (HSCT) patients. However, it has both benefits and risks. Complications related to the central venous catheter, such as infections, and metabolic disturbances, including abnormal liver function, are usually of concern in such patients. Methods: A retrospective chart review was conducted of all pediatric patients who underwent HSCT between 2015 and 2018 in a tertiary hospital in Riyadh, Saudi Arabia. Patients' demographics, types of conditioning, type of nutrition, and patients' outcomes were collected. Statistical analysis was conducted using SPSS version 22. Frequencies and percentages were used to describe categorical variables. Mean and standard deviation were used for continuous variables. A P-value of less than 0.05 was considered statistically significant. Results: A total of 162 HSCTs were identified during the study period. Indications for allogeneic transplant included hemoglobinopathy in 50 patients (31%) and acute lymphoblastic leukemia in 21 patients (13%). TPN was used in 96 patients (59.30%) for a median of 14 days, nasogastric tube feeding (NGT) in 16 patients (9.90%) for a median of 11 days, and 71 patients (43.80%) were able to tolerate oral feeding. Of the 96 patients (59.30%) who were dependent on TPN, 64 (66.7%) had severe mucositis, compared with 17 patients (25.8%) who were either on NGT or tolerated oral intake (P-value = 0.00). Sinusoidal obstruction syndrome (SOS) was seen in 14 patients (14.6%) who were receiving TPN, compared with none of the non-TPN patients (P-value = 0.001). Moreover, the majority of patients who had SOS received myeloablative conditioning therapy for non-malignant disease (hemoglobinopathy). However, there were no statistically significant differences in graft-versus-host disease (both acute and chronic), bacteremia, or patient outcome between the two groups.
Conclusions: Nutritional support using TPN is used in the majority of patients, especially after myeloablative conditioning associated with severe mucositis. TPN was associated with SOS (also known as veno-occlusive disease, VOD), especially in hemoglobinopathy patients who received myeloablative therapy. This may underscore the need for preventive measures such as fluid restriction, diuretics, or defibrotide in high-risk patients.

Keywords: hematopoietic stem cell transplant, HSCT, stem cell transplant, sinusoidal obstruction syndrome, total parenteral nutrition

Procedia PDF Downloads 132
432 Deep learning with Noisy Labels : Learning True Labels as Discrete Latent Variable

Authors: Azeddine El-Hassouny, Chandrashekhar Meshram, Geraldin Nanfack

Abstract:

In recent years, learning from data with noisy labels (label noise) has been a major concern in supervised learning. This problem has become even more pressing in deep learning, whose generalization capabilities have recently been questioned. Indeed, deep learning requires a large amount of data, which is generally collected via search engines that frequently return data with unreliable labels. In this paper, we investigate label noise in deep learning using variational inference. Our contributions are: (1) we exploit the label noise concept, where the true labels are learnt using reparameterization variational inference while the observed labels are learnt discriminatively; (2) the noise transition matrix is learnt during training without any particular process, heuristic, or preliminary phase. The theoretical results show how the true label distribution can be learned by variational inference in any discriminative neural network, and the effectiveness of our approach is demonstrated on several target datasets, such as MNIST and CIFAR32.
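
The role of a noise transition matrix can be sketched as follows. The sketch assumes the usual convention that entry T[i][j] is the probability that a sample whose true class is i is observed with label j, so the observed-label distribution is the true-label distribution multiplied by T. The matrix values and function name are hypothetical, and in the paper the matrix is learnt during training rather than fixed by hand.

```python
def noisy_label_distribution(true_probs, transition):
    """Map a distribution over true labels to a distribution over observed labels.

    transition[i][j] is assumed to be P(observed label = j | true label = i).
    """
    n_true = len(true_probs)
    n_observed = len(transition[0])
    return [
        sum(true_probs[i] * transition[i][j] for i in range(n_true))
        for j in range(n_observed)
    ]


# Hypothetical two-class noise: class 0 is flipped 20% of the time,
# class 1 is flipped 30% of the time.
T = [[0.8, 0.2],
     [0.3, 0.7]]
print(noisy_label_distribution([0.5, 0.5], T))
```

Each row of T sums to 1, so the output is again a valid probability distribution. An identity matrix corresponds to noise-free labels.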

Keywords: label noise, deep learning, discrete latent variable, variational inference, MNIST, CIFAR32

Procedia PDF Downloads 102
431 Images Selection and Best Descriptor Combination for Multi-Shot Person Re-Identification

Authors: Yousra Hadj Hassen, Walid Ayedi, Tarek Ouni, Mohamed Jallouli

Abstract:

To re-identify a person is to check whether he/she has already been seen across a camera network. Recently, re-identifying people across large public camera networks has become a crucial task for ensuring public security. The vision community has investigated this area of research in depth. Most existing research relies only on the spatial appearance information from either one or multiple person images. In practice, real person re-identification is a multi-shot scenario. However, efficiently modeling a person’s appearance and choosing the best samples remain challenging problems. In this work, an extensive comparison of state-of-the-art descriptors combined with the proposed frame selection method is carried out. Specifically, we evaluate the sample selection approach using multiple proposed descriptors. We show the effectiveness and advantages of the proposed method through extensive comparisons with related state-of-the-art approaches on two standard datasets, PRID2011 and iLIDS-VID.

Keywords: camera network, descriptor, model, multi-shot, person re-identification, selection

Procedia PDF Downloads 258
430 Effectiveness of Adrenal Venous Sampling in the Management of Primary Aldosteronism: Single Centered Cohort Study at a Tertiary Care Hospital in Sri Lanka

Authors: Balasooriya B. M. C. M., Sujeeva N., Thowfeek Z., Siddiqa Omo, Liyanagunawardana J. E., Jayawardana Saiu, Manathunga S. S., Katulanda G. W.

Abstract:

Introduction and objectives: Adrenal venous sampling (AVS) is the gold standard to discriminate unilateral primary aldosteronism (UPA) from bilateral disease (BPA). AVS is technically demanding and only performed in a limited number of centers worldwide. To the best of our knowledge, except for one study conducted in India, no other research studies on this area have been conducted in South Asia. This study aimed to evaluate the effectiveness of AVS in the management of primary aldosteronism. Methods: A total of 32 patients who underwent AVS at the National Hospital of Sri Lanka from April 2021 to April 2023 were enrolled. Demographic, clinical and laboratory data were obtained retrospectively. A procedure was considered successful when adequate cannulation of both adrenal veins was demonstrated. The cortisol gradient between the adrenal vein (AV) and the peripheral vein was used to establish the success of venous cannulation. Lateralization was determined by the aldosterone gradient between the two sides. Continuous variables were summarized with mean and SD, and categorical variables with proportions. The mean and standard deviation of the contralateral suppression index (CSI) were estimated with an intercept-only Bayesian inference model. Results: Of the 32 patients, the average age was 52.47 ± 26.14 and 19 (59.4%) were males. Both AVs were successfully cannulated in 12 (37.5%). Among them, lateralization was demonstrated in 11 (91.7%), and one was diagnosed with bilateral disease. There were no total failures. Right AV cannulation was unsuccessful in 18 (56.25%), of which lateralization was demonstrated in 9 (50%), and the others were inconclusive. Left AV cannulation was unsuccessful in only 2 (6.25%); one was lateralized, and the other remained inconclusive. The estimated mean of the CSI was 0.33 (89% credible interval 0.11-0.86). Seven patients underwent unilateral adrenalectomy and demonstrated significant improvement in blood pressure during follow-up.
Two patients await surgery. The others were treated medically. Conclusions: Despite failures due to procedural difficulties, AVS remained useful in the management of patients with PA. Moreover, the procedure requires experienced hands and advanced equipment to achieve optimal outcomes in PA.

Keywords: adrenal venous sampling, lateralization, contralateral suppression index, primary aldosteronism

Procedia PDF Downloads 46
429 Representativity Based Wasserstein Active Regression

Authors: Benjamin Bobbia, Matthias Picard

Abstract:

In recent years, active learning methodologies based on the representativity of the data have seemed more promising for limiting overfitting. The presented query methodology for regression uses the Wasserstein distance to measure the representativity of our labelled dataset compared to the global distribution. This work makes crucial use of GroupSort neural networks, which offers a double advantage: the Wasserstein distance can be expressed exactly in terms of such networks, and one can provide explicit bounds for their size and depth together with rates of convergence. The heterogeneity of the dataset is also considered, by weighting the Wasserstein distance with the approximation error at the previous step of active learning. Such an approach leads to a reduction of overfitting and high prediction performance after a few query steps. After the methodology and algorithm are detailed, an empirical study is presented to investigate the range of our hyperparameters. The performance of this method is compared, in terms of the number of queries needed, with other classical and recent query methods on several UCI datasets.
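
To make the notion of representativity concrete, the sketch below computes the Wasserstein-1 distance between two one-dimensional samples of equal size, where it reduces to the mean absolute difference of the sorted values. This closed form is a simplification chosen for illustration; the paper instead expresses the distance through GroupSort neural networks, and the sample values and function name here are hypothetical.

```python
def wasserstein_1d(sample_a, sample_b):
    """Wasserstein-1 distance between two equal-size one-dimensional samples.

    For equal-size empirical distributions on the real line, the optimal
    transport plan matches sorted values pairwise.
    """
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


# Hypothetical feature values: a reference sample from the pool and two
# candidate labelled sets of the same size.
pool = [0.0, 1.0, 2.0, 3.0]
spread_out = [0.5, 1.0, 2.0, 2.5]
clustered = [0.0, 0.1, 0.2, 0.3]
print(wasserstein_1d(pool, spread_out), wasserstein_1d(pool, clustered))
```

A labelled set that covers the pool evenly yields a smaller distance than one concentrated in a corner of the feature space, which is the intuition behind querying points that improve representativity.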

Keywords: active learning, Lipschitz regularization, neural networks, optimal transport, regression

Procedia PDF Downloads 65
428 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms

Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang

Abstract:

Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving bioassay predictions. The first step is instance selection, in which the dataset is partitioned into training, testing, and validation sets. The second step is discretization, which partitions the data in consideration of accuracy vs. precision. The third step is normalization, where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection, where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by machine learning algorithms in various tools, including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combinations of preprocessing steps and machine learning algorithms in producing more consistent and accurate predictions.
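
The normalization step described above can be sketched with min-max scaling, one common way to map values into the range 0 to 1. The abstract does not say which scaling the authors used, so the choice of min-max scaling, the function name and the sample values are assumptions.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical descriptor column, for example a molecular weight.
print(min_max_normalize([2.0, 4.0, 6.0]))
```

In practice the minimum and maximum would be computed on the training set only and then reused for the testing and validation sets, so that no information leaks across the split made in the first step.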

Keywords: bioassay, machine learning, preprocessing, virtual screen

Procedia PDF Downloads 254
427 Automatic Classification of Periodic Heart Sounds Using Convolutional Neural Network

Authors: Jia Xin Low, Keng Wah Choo

Abstract:

This paper presents an automatic normal and abnormal heart sound classification model based on a deep learning algorithm. The MITHSDB heart sound datasets obtained from the 2016 PhysioNet/Computing in Cardiology Challenge database were used in this research, with the assumption that the electrocardiograms (ECG) were recorded simultaneously with the heart sounds (phonocardiogram, PCG). The PCG time series are segmented per heartbeat, and each sub-segment is converted into a square intensity matrix and classified using convolutional neural network (CNN) models. This approach removes the need to provide classification features for the supervised machine learning algorithm. Instead, the features are determined automatically through training, from the time series provided. The results show that the prediction model is able to provide reasonable and comparable classification accuracy despite its simple implementation. This approach can be used for real-time classification of heart sounds in the Internet of Medical Things (IoMT), e.g. remote monitoring applications of PCG signals.
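
One possible way to turn a per-heartbeat segment into a square intensity matrix is sketched below. The abstract does not describe the conversion, so everything here is an assumption: the segment is truncated or zero-padded to side * side samples, absolute amplitudes are scaled to the range 0 to 1, and the result is reshaped row by row. The function name and sample values are hypothetical.

```python
def to_square_matrix(segment, side):
    """Convert a 1-D signal segment into a side x side matrix of intensities in [0, 1]."""
    n = side * side
    # Truncate long segments, zero-pad short ones (an assumed convention).
    padded = list(segment[:n]) + [0.0] * max(0, n - len(segment))
    peak = max(abs(v) for v in padded)
    scale = peak if peak > 0 else 1.0
    return [
        [abs(padded[row * side + col]) / scale for col in range(side)]
        for row in range(side)
    ]


# Hypothetical three-sample segment mapped to a 2 x 2 matrix.
for row in to_square_matrix([1.0, -2.0, 3.0], side=2):
    print(row)
```

A matrix of this shape can be fed to a CNN like a single-channel image, which is what lets the network learn its own features from the raw time series.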

Keywords: convolutional neural network, discrete wavelet transform, deep learning, heart sound classification

Procedia PDF Downloads 326