Search results for: datasets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 695

Search results for: datasets

425 An Epsilon Hierarchical Fuzzy Twin Support Vector Regression

Authors: Arindam Chaudhuri

Abstract:

The research presents epsilon- hierarchical fuzzy twin support vector regression (epsilon-HFTSVR) based on epsilon-fuzzy twin support vector regression (epsilon-FTSVR) and epsilon-twin support vector regression (epsilon-TSVR). Epsilon-FTSVR is achieved by incorporating trapezoidal fuzzy numbers to epsilon-TSVR which takes care of uncertainty existing in forecasting problems. Epsilon-FTSVR determines a pair of epsilon-insensitive proximal functions by solving two related quadratic programming problems. The structural risk minimization principle is implemented by introducing regularization term in primal problems of epsilon-FTSVR. This yields dual stable positive definite problems which improves regression performance. Epsilon-FTSVR is then reformulated as epsilon-HFTSVR consisting of a set of hierarchical layers each containing epsilon-FTSVR. Experimental results on both synthetic and real datasets reveal that epsilon-HFTSVR has remarkable generalization performance with minimum training time.

Keywords: regression, epsilon-TSVR, epsilon-FTSVR, epsilon-HFTSVR

Procedia PDF Downloads 332
424 Solving Crimes through DNA Methylation Analysis

Authors: Ajay Kumar Rana

Abstract:

Predicting human behaviour, discerning monozygotic twins or left over remnant tissues/fluids of a single human source remains a big challenge in forensic science. Recent advances in the field of DNA methylations which are broadly chemical hallmarks in response to environmental factors can certainly help to identify and discriminate various single-source DNA samples collected from the crime scenes. In this review, cytosine methylation of DNA has been methodologically discussed with its broad applications in many challenging forensic issues like body fluid identification, race/ethnicity identification, monozygotic twins dilemma, addiction or behavioural prediction, age prediction, or even authenticity of the human DNA. With the advent of next-generation sequencing techniques, blooming of DNA methylation datasets and together with standard molecular protocols, the prospect of investigating and solving the above issues and extracting the exact nature of the truth for reconstructing the crime scene events would be undoubtedly helpful in defending and solving the critical crime cases.

Keywords: DNA methylation, differentially methylated regions, human identification, forensics

Procedia PDF Downloads 293
423 The Continuous Facility Location Problem and Transportation Mode Selection in the Supply Chain under Sustainability

Authors: Abdulaziz Alageel, Martino Luis, Shuya Zhong

Abstract:

The main focus of this research study is on the challenges faced in decision-making in a supply chain network regarding the facility location while considering carbon emissions. The study aims (i) to locate facilities (i.e., distribution centeres) in a continuous space considering limitations of capacity and the costs associated with opening and (ii) to reduce the cost of carbon emissions by selecting the mode of transportation. The problem is formulated as mixed-integer linear programming. This study hybridised a greedy randomised adaptive search (GRASP) and variable neighborhood search (VNS) to deal with the problem. Well-known datasets from the literature (Brimberg et al. 2001) are used and adapted in order to assess the performance of the proposed method. The proposed hybrid method produces encouraging results based on computational analysis. The study also highlights some research avenues for future recommendations.

Keywords: supply chain, facility location, weber problem, sustainability

Procedia PDF Downloads 75
422 Reliable Soup: Reliable-Driven Model Weight Fusion on Ultrasound Imaging Classification

Authors: Shuge Lei, Haonan Hu, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Yan Tong

Abstract:

It remains challenging to measure reliability from classification results from different machine learning models. This paper proposes a reliable soup optimization algorithm based on the model weight fusion algorithm Model Soup, aiming to improve reliability by using dual-channel reliability as the objective function to fuse a series of weights in the breast ultrasound classification models. Experimental results on breast ultrasound clinical datasets demonstrate that reliable soup significantly enhances the reliability of breast ultrasound image classification tasks. The effectiveness of the proposed approach was verified via multicenter trials. The results from five centers indicate that the reliability optimization algorithm can enhance the reliability of the breast ultrasound image classification model and exhibit low multicenter correlation.

Keywords: breast ultrasound image classification, feature attribution, reliability assessment, reliability optimization

Procedia PDF Downloads 48
421 Fuzzy Sentiment Analysis of Customer Product Reviews

Authors: Samaneh Nadali, Masrah Azrifah Azmi Murad

Abstract:

As a result of the growth of the web, people are able to express their views and opinions. They can now post reviews of products at merchant sites and express their views on almost anything in internet forums, discussion groups, and blogs. Therefore, the number of product reviews has grown rapidly. The large numbers of reviews make it difficult for manufacturers or businesses to automatically classify them into different semantic orientations (positive, negative, and neutral). For sentiment classification, most existing methods utilize a list of opinion words whereas this paper proposes a fuzzy approach for evaluating sentiments expressed in customer product reviews, to predict the strength levels (e.g. very weak, weak, moderate, strong and very strong) of customer product reviews by combinations of adjective, adverb and verb. The proposed fuzzy approach has been tested on eight benchmark datasets and obtained 74% accuracy, which leads to help the organization with a more clear understanding of customer's behavior in support of business planning process.

Keywords: fuzzy logic, customer product review, sentiment analysis

Procedia PDF Downloads 331
420 A Comparative Study of Malware Detection Techniques Using Machine Learning Methods

Authors: Cristina Vatamanu, Doina Cosovan, Dragos Gavrilut, Henri Luchian

Abstract:

In the past few years, the amount of malicious software increased exponentially and, therefore, machine learning algorithms became instrumental in identifying clean and malware files through semi-automated classification. When working with very large datasets, the major challenge is to reach both a very high malware detection rate and a very low false positive rate. Another challenge is to minimize the time needed for the machine learning algorithm to do so. This paper presents a comparative study between different machine learning techniques such as linear classifiers, ensembles, decision trees or various hybrids thereof. The training dataset consists of approximately 2 million clean files and 200.000 infected files, which is a realistic quantitative mixture. The paper investigates the above mentioned methods with respect to both their performance (detection rate and false positive rate) and their practicability.

Keywords: ensembles, false positives, feature selection, one side class algorithm

Procedia PDF Downloads 262
419 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study

Authors: Salima Smiti, Ines Gasmi, Makram Soui

Abstract:

Credit risk is the most important issue for financial institutions. Its assessment becomes an important task used to predict defaulter customers and classify customers as good or bad payers. To this objective, numerous techniques have been applied for credit risk assessment. However, to our knowledge, several evaluation techniques are black-box models such as neural networks, SVM, etc. They generate applicants’ classes without any explanation. In this paper, we propose to assess credit risk using rules classification method. Our output is a set of rules which describe and explain the decision. To this end, we will compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART and Genetic programming (GP)) where the goal is to find the best rules satisfying many criteria: accuracy, sensitivity, and specificity. The obtained results confirm the efficiency of the GP algorithm for German and Australian datasets compared to other rule-based techniques to predict the credit risk.

Keywords: credit risk assessment, classification algorithms, data mining, rule extraction

Procedia PDF Downloads 145
418 A New Concept for Deriving the Expected Value of Fuzzy Random Variables

Authors: Liang-Hsuan Chen, Chia-Jung Chang

Abstract:

Fuzzy random variables have been introduced as an imprecise concept of numeric values for characterizing the imprecise knowledge. The descriptive parameters can be used to describe the primary features of a set of fuzzy random observations. In fuzzy environments, the expected values are usually represented as fuzzy-valued, interval-valued or numeric-valued descriptive parameters using various metrics. Instead of the concept of area metric that is usually adopted in the relevant studies, the numeric expected value is proposed by the concept of distance metric in this study based on two characters (fuzziness and randomness) of FRVs. Comparing with the existing measures, although the results show that the proposed numeric expected value is same with those using the different metric, if only triangular membership functions are used. However, the proposed approach has the advantages of intuitiveness and computational efficiency, when the membership functions are not triangular types. An example with three datasets is provided for verifying the proposed approach.

Keywords: fuzzy random variables, distance measure, expected value, descriptive parameters

Procedia PDF Downloads 314
417 Robust Pattern Recognition via Correntropy Generalized Orthogonal Matching Pursuit

Authors: Yulong Wang, Yuan Yan Tang, Cuiming Zou, Lina Yang

Abstract:

This paper presents a novel sparse representation method for robust pattern classification. Generalized orthogonal matching pursuit (GOMP) is a recently proposed efficient sparse representation technique. However, GOMP adopts the mean square error (MSE) criterion and assign the same weights to all measurements, including both severely and slightly corrupted ones. To reduce the limitation, we propose an information-theoretic GOMP (ITGOMP) method by exploiting the correntropy induced metric. The results show that ITGOMP can adaptively assign small weights on severely contaminated measurements and large weights on clean ones, respectively. An ITGOMP based classifier is further developed for robust pattern classification. The experiments on public real datasets demonstrate the efficacy of the proposed approach.

Keywords: correntropy induced metric, matching pursuit, pattern classification, sparse representation

Procedia PDF Downloads 328
416 Cotton Crops Vegetative Indices Based Assessment Using Multispectral Images

Authors: Muhammad Shahzad Shifa, Amna Shifa, Muhammad Omar, Aamir Shahzad, Rahmat Ali Khan

Abstract:

Many applications of remote sensing to vegetation and crop response depend on spectral properties of individual leaves and plants. Vegetation indices are usually determined to estimate crop biophysical parameters like crop canopies and crop leaf area indices with the help of remote sensing. Cotton crops assessment is performed with the help of vegetative indices. Remotely sensed images from an optical multispectral radiometer MSR5 are used in this study. The interpretation is based on the fact that different materials reflect and absorb light differently at different wavelengths. Non-normalized and normalized forms of these datasets are analyzed using two complementary data mining algorithms; K-means and K-nearest neighbor (KNN). Our analysis shows that the use of normalized reflectance data and vegetative indices are suitable for an automated assessment and decision making.

Keywords: cotton, condition assessment, KNN algorithm, clustering, MSR5, vegetation indices

Procedia PDF Downloads 292
415 Credit Risk Evaluation Using Genetic Programming

Authors: Ines Gasmi, Salima Smiti, Makram Soui, Khaled Ghedira

Abstract:

Credit risk is considered as one of the important issues for financial institutions. It provokes great losses for banks. To this objective, numerous methods for credit risk evaluation have been proposed. Many evaluation methods are black box models that cannot adequately reveal information hidden in the data. However, several works have focused on building transparent rules-based models. For credit risk assessment, generated rules must be not only highly accurate, but also highly interpretable. In this paper, we aim to build both, an accurate and transparent credit risk evaluation model which proposes a set of classification rules. In fact, we consider the credit risk evaluation as an optimization problem which uses a genetic programming (GP) algorithm, where the goal is to maximize the accuracy of generated rules. We evaluate our proposed approach on the base of German and Australian credit datasets. We compared our finding with some existing works; the result shows that the proposed GP outperforms the other models.

Keywords: credit risk assessment, rule generation, genetic programming, feature selection

Procedia PDF Downloads 316
414 A Large Dataset Imputation Approach Applied to Country Conflict Prediction Data

Authors: Benjamin Leiby, Darryl Ahner

Abstract:

This study demonstrates an alternative stochastic imputation approach for large datasets when preferred commercial packages struggle to iterate due to numerical problems. A large country conflict dataset motivates the search to impute missing values well over a common threshold of 20% missingness. The methodology capitalizes on correlation while using model residuals to provide the uncertainty in estimating unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Static tolerances common in most packages are replaced with tailorable tolerances that exploit residuals to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the country conflict dataset illustrates promise with modeling first-order interactions while presenting a need for further refinement that mimics predictive mean matching.

Keywords: correlation, country conflict, imputation, stochastic regression

Procedia PDF Downloads 90
413 Tree Species Classification Using Effective Features of Polarimetric SAR and Hyperspectral Images

Authors: Milad Vahidi, Mahmod R. Sahebi, Mehrnoosh Omati, Reza Mohammadi

Abstract:

Forest management organizations need information to perform their work effectively. Remote sensing is an effective method to acquire information from the Earth. Two datasets of remote sensing images were used to classify forested regions. Firstly, all of extractable features from hyperspectral and PolSAR images were extracted. The optical features were spectral indexes related to the chemical, water contents, structural indexes, effective bands and absorption features. Also, PolSAR features were the original data, target decomposition components, and SAR discriminators features. Secondly, the particle swarm optimization (PSO) and the genetic algorithms (GA) were applied to select optimization features. Furthermore, the support vector machine (SVM) classifier was used to classify the image. The results showed that the combination of PSO and SVM had higher overall accuracy than the other cases. This combination provided overall accuracy about 90.56%. The effective features were the spectral index, the bands in shortwave infrared (SWIR) and the visible ranges and certain PolSAR features.

Keywords: hyperspectral, PolSAR, feature selection, SVM

Procedia PDF Downloads 387
412 Image Retrieval Based on Multi-Feature Fusion for Heterogeneous Image Databases

Authors: N. W. U. D. Chathurani, Shlomo Geva, Vinod Chandran, Proboda Rajapaksha

Abstract:

Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach, which is free from the effect of features' dimensions, ranges, internal feature normalization and the distance measure. This approach can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and Oliva and Torralba) and compared with existing approaches. The performance of the proposed approach is confirmed with the significantly improved performance in comparison with the independently evaluated baseline of the previously proposed feature fusion approaches.

Keywords: feature fusion, image retrieval, membership function, normalization

Procedia PDF Downloads 321
411 RGB-D SLAM Algorithm Based on pixel level Dense Depth Map

Authors: Hao Zhang, Hongyang Yu

Abstract:

Scale uncertainty is a well-known challenging problem in visual SLAM. Because RGB-D sensor provides depth information, RGB-D SLAM improves this scale uncertainty problem. However, due to the limitation of physical hardware, the depth map output by RGB-D sensor usually contains a large area of missing depth values. These missing depth information affect the accuracy and robustness of RGB-D SLAM. In order to reduce these effects, this paper completes the missing area of the depth map output by RGB-D sensor and then fuses the completed dense depth map into ORB SLAM2. By adding the process of obtaining pixel-level dense depth maps, a better RGB-D visual SLAM algorithm is finally obtained. In the process of obtaining dense depth maps, a deep learning model of indoor scenes is adopted. Experiments are conducted on public datasets and real-world environments of indoor scenes. Experimental results show that the proposed SLAM algorithm has better robustness than ORB SLAM2.

Keywords: RGB-D, SLAM, dense depth, depth map

Procedia PDF Downloads 108
410 Integrative System of GDP, Emissions, Health Services and Population Health in Vietnam: Dynamic Panel Data Estimation

Authors: Ha Hai Duong, Amnon Levy Livermore, Kankesu Jayanthakumaran, Oleg Yerokhin

Abstract:

The issues of economic development, the environment and human health have been investigated since 1990s. Previous researchers have found different empirical evidences of the relationship between income and environmental pollution, health as determinant of economic growth, and the effects of income and environmental pollution on health in various regions of the world. This paper concentrates on integrative relationship analysis of GDP, carbon dioxide emissions, and health services and population health in context of Vietnam. We applied the dynamic generalized method of moments (GMM) estimation on datasets of Vietnam’s sixty-three provinces for the years 2000-2010. Our results show the significant positive effect of GDP on emissions and the dependence of population health on emissions and health services. We find the significant relationship between population health and GDP. Additionally, health services are significantly affected by population health and GDP. Finally, the population size too is other important determinant of both emissions and GDP.

Keywords: economic development, emissions, environmental pollution, health

Procedia PDF Downloads 587
409 3D Point Cloud Model Color Adjustment by Combining Terrestrial Laser Scanner and Close Range Photogrammetry Datasets

Authors: M. Pepe, S. Ackermann, L. Fregonese, C. Achille

Abstract:

3D models obtained with advanced survey techniques such as close-range photogrammetry and laser scanner are nowadays particularly appreciated in Cultural Heritage and Archaeology fields. In order to produce high quality models representing archaeological evidences and anthropological artifacts, the appearance of the model (i.e. color) beyond the geometric accuracy, is not a negligible aspect. The integration of the close-range photogrammetry survey techniques with the laser scanner is still a topic of study and research. By combining point cloud data sets of the same object generated with both technologies, or with the same technology but registered in different moment and/or natural light condition, could construct a final point cloud with accentuated color dissimilarities. In this paper, a methodology to uniform the different data sets, to improve the chromatic quality and to highlight further details by balancing the point color will be presented.

Keywords: color models, cultural heritage, laser scanner, photogrammetry

Procedia PDF Downloads 254
408 Large-Scale Electroencephalogram Biometrics through Contrastive Learning

Authors: Mostafa ‘Neo’ Mohsenvand, Mohammad Rasool Izadi, Pattie Maes

Abstract:

EEG-based biometrics (user identification) has been explored on small datasets of no more than 157 subjects. Here we show that the accuracy of modern supervised methods falls rapidly as the number of users increases to a few thousand. Moreover, supervised methods require a large amount of labeled data for training which limits their applications in real-world scenarios where acquiring data for training should not take more than a few minutes. We show that using contrastive learning for pre-training, it is possible to maintain high accuracy on a dataset of 2130 subjects while only using a fraction of labels. We compare 5 different self-supervised tasks for pre-training of the encoder where our proposed method achieves the accuracy of 96.4%, improving the baseline supervised models by 22.75% and the competing self-supervised model by 3.93%. We also study the effects of the length of the signal and the number of channels on the accuracy of the user-identification models. Our results reveal that signals from temporal and frontal channels contain more identifying features compared to other channels.

Keywords: brainprint, contrastive learning, electroencephalo-gram, self-supervised learning, user identification

Procedia PDF Downloads 131
407 Multi-Classification Deep Learning Model for Diagnosing Different Chest Diseases

Authors: Bandhan Dey, Muhsina Bintoon Yiasha, Gulam Sulaman Choudhury

Abstract:

Chest disease is one of the most problematic ailments in our regular life. There are many known chest diseases out there. Diagnosing them correctly plays a vital role in the process of treatment. There are many methods available explicitly developed for different chest diseases. But the most common approach for diagnosing these diseases is through X-ray. In this paper, we proposed a multi-classification deep learning model for diagnosing COVID-19, lung cancer, pneumonia, tuberculosis, and atelectasis from chest X-rays. In the present work, we used the transfer learning method for better accuracy and fast training phase. The performance of three architectures is considered: InceptionV3, VGG-16, and VGG-19. We evaluated these deep learning architectures using public digital chest x-ray datasets with six classes (i.e., COVID-19, lung cancer, pneumonia, tuberculosis, atelectasis, and normal). The experiments are conducted on six-classification, and we found that VGG16 outperforms other proposed models with an accuracy of 95%.

Keywords: deep learning, image classification, X-ray images, Tensorflow, Keras, chest diseases, convolutional neural networks, multi-classification

Procedia PDF Downloads 59
406 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousand gene expression data simultaneously and a very important task for drug development and test, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithm has been used as an alternative approach to identifying structures from gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify normalized matrix of gene expression data followed by Mixed-Clustering algorithm and the Lift algorithm, inspired in the node-deletion and node-addition phases proposed by Cheng and Church based on Agglomerative Hierarchical Clustering (AHC). Experimental study on standard datasets demonstrated the effectiveness of the algorithm in gene expression data.

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 250
405 A Genetic Algorithm Based Permutation and Non-Permutation Scheduling Heuristics for Finite Capacity Material Requirement Planning Problem

Authors: Watchara Songserm, Teeradej Wuttipornpun

Abstract:

This paper presents a genetic algorithm based permutation and non-permutation scheduling heuristics (GAPNP) to solve a multi-stage finite capacity material requirement planning (FCMRP) problem in automotive assembly flow shop with unrelated parallel machines. In the algorithm, the sequences of orders are iteratively improved by the GA characteristics, whereas the required operations are scheduled based on the presented permutation and non-permutation heuristics. Finally, a linear programming is applied to minimize the total cost. The presented GAPNP algorithm is evaluated by using real datasets from automotive companies. The required parameters for GAPNP are intently tuned to obtain a common parameter setting for all case studies. The results show that GAPNP significantly outperforms the benchmark algorithm about 30% on average.

Keywords: capacitated MRP, genetic algorithm, linear programming, automotive industries, flow shop, application in industry

Procedia PDF Downloads 463
404 Model for Introducing Products to New Customers through Decision Tree Using Algorithm C4.5 (J-48)

Authors: Komol Phaisarn, Anuphan Suttimarn, Vitchanan Keawtong, Kittisak Thongyoun, Chaiyos Jamsawang

Abstract:

This article is intended to analyze insurance information which contains information on the customer decision when purchasing life insurance pay package. The data were analyzed in order to present new customers with Life Insurance Perfect Pay package to meet new customers’ needs as much as possible. The basic data of insurance pay package were collect to get data mining; thus, reducing the scattering of information. The data were then classified in order to get decision model or decision tree using Algorithm C4.5 (J-48). In the classification, WEKA tools are used to form the model and testing datasets are used to test the decision tree for the accurate decision. The validation of this model in classifying showed that the accurate prediction was 68.43% while 31.25% were errors. The same set of data were then tested with other models, i.e. Naive Bayes and Zero R. The results showed that J-48 method could predict more accurately. So, the researcher applied the decision tree in writing the program used to introduce the product to new customers to persuade customers’ decision making in purchasing the insurance package that meets the new customers’ needs as much as possible.

Keywords: decision tree, data mining, customers, life insurance pay package

Procedia PDF Downloads 402
403 Large Neural Networks Learning From Scratch With Very Few Data and Without Explicit Regularization

Authors: Christoph Linse, Thomas Martinetz

Abstract:

Recent findings have shown that Neural Networks generalize also in over-parametrized regimes with zero training error. This is surprising, since it is completely against traditional machine learning wisdom. In our empirical study we fortify these findings in the domain of fine-grained image classification. We show that very large Convolutional Neural Networks with millions of weights do learn with only a handful of training samples and without image augmentation, explicit regularization or pretraining. We train the architectures ResNet018, ResNet101 and VGG19 on subsets of the difficult benchmark datasets Caltech101, CUB_200_2011, FGVCAircraft, Flowers102 and StanfordCars with 100 classes and more, perform a comprehensive comparative study and draw implications for the practical application of CNNs. Finally, we show that VGG19 with 140 million weights learns to distinguish airplanes and motorbikes with up to 95% accuracy using only 20 training samples per class.

Keywords: convolutional neural networks, fine-grained image classification, generalization, image recognition, over-parameterized, small data sets

Procedia PDF Downloads 56
402 Improved Rare Species Identification Using Focal Loss Based Deep Learning Models

Authors: Chad Goldsworthy, B. Rajeswari Matam

Abstract:

The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing the accuracy of manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it was able to increase the F1-score by 0.06 and improve the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as resulting in an improved overall accuracy of 2.96%.

Keywords: convolutional neural networks, data imbalance, deep learning, focal loss, species classification, wildlife conservation

Procedia PDF Downloads 148
401 Studying Relationship between Local Geometry of Decision Boundary with Network Complexity for Robustness Analysis with Adversarial Perturbations

Authors: Tushar K. Routh

Abstract:

If inputs are engineered in certain manners, they can influence deep neural networks’ (DNN) performances by facilitating misclassifications, a phenomenon well-known as adversarial attacks that question networks’ vulnerability. Recent studies have unfolded the relationship between vulnerability of such networks with their complexity. In this paper, the distinctive influence of additional convolutional layers at the decision boundaries of several DNN architectures was investigated. Here, to engineer inputs from widely known image datasets like MNIST, Fashion MNIST, and Cifar 10, we have exercised One Step Spectral Attack (OSSA) and Fast Gradient Method (FGM) techniques. The aftermaths of adding layers to the robustness of the architectures have been analyzed. For reasoning, separation width from linear class partitions and local geometry (curvature) near the decision boundary have been examined. The result reveals that model complexity has significant roles in adjusting relative distances from margins, as well as the local features of decision boundaries, which impact robustness.

Keywords: DNN robustness, decision boundary, local curvature, network complexity

Procedia PDF Downloads 44
400 Deep learning with Noisy Labels : Learning True Labels as Discrete Latent Variable

Authors: Azeddine El-Hassouny, Chandrashekhar Meshram, Geraldin Nanfack

Abstract:

In recent years, learning from data with noisy labels (Label Noise) has been a major concern in supervised learning. This problem has become even more worrying in Deep Learning, where the generalization capabilities have been questioned lately. Indeed, deep learning requires a large amount of data that is generally collected by search engines, which frequently return data with unreliable labels. In this paper, we investigate the Label Noise in Deep Learning using variational inference. Our contributions are : (1) exploiting Label Noise concept where the true labels are learnt using reparameterization variational inference, while observed labels are learnt discriminatively. (2) the noise transition matrix is learnt during the training without any particular process, neither heuristic nor preliminary phases. The theoretical results shows how true label distribution can be learned by variational inference in any discriminate neural network, and the effectiveness of our approach is proved in several target datasets, such as MNIST and CIFAR32.

Keywords: label noise, deep learning, discrete latent variable, variational inference, MNIST, CIFAR32

Procedia PDF Downloads 87
399 Images Selection and Best Descriptor Combination for Multi-Shot Person Re-Identification

Authors: Yousra Hadj Hassen, Walid Ayedi, Tarek Ouni, Mohamed Jallouli

Abstract:

To re-identify a person is to check if he/she has been already seen over a cameras network. Recently, re-identifying people over large public cameras networks has become a crucial task of great importance to ensure public security. The vision community has deeply investigated this area of research. Most existing researches rely only on the spatial appearance information from either one or multiple person images. Actually, the real person re-id framework is a multi-shot scenario. However, to efficiently model a person’s appearance and to choose the best samples to remain a challenging problem. In this work, an extensive comparison of descriptors of state of the art associated with the proposed frame selection method is studied. Specifically, we evaluate the samples selection approach using multiple proposed descriptors. We show the effectiveness and advantages of the proposed method by extensive comparisons with related state-of-the-art approaches using two standard datasets PRID2011 and iLIDS-VID.

Keywords: camera network, descriptor, model, multi-shot, person re-identification, selection

Procedia PDF Downloads 254
398 Representativity Based Wasserstein Active Regression

Authors: Benjamin Bobbia, Matthias Picard

Abstract:

In recent years active learning methodologies based on the representativity of the data seems more promising to limit overfitting. The presented query methodology for regression using the Wasserstein distance measuring the representativity of our labelled dataset compared to the global distribution. In this work a crucial use of GroupSort Neural Networks is made therewith to draw a double advantage. The Wasserstein distance can be exactly expressed in terms of such neural networks. Moreover, one can provide explicit bounds for their size and depth together with rates of convergence. However, heterogeneity of the dataset is also considered by weighting the Wasserstein distance with the error of approximation at the previous step of active learning. Such an approach leads to a reduction of overfitting and high prediction performance after few steps of query. After having detailed the methodology and algorithm, an empirical study is presented in order to investigate the range of our hyperparameters. The performances of this method are compared, in terms of numbers of query needed, with other classical and recent query methods on several UCI datasets.

Keywords: active learning, Lipschitz regularization, neural networks, optimal transport, regression

Procedia PDF Downloads 56
397 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms

Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang

Abstract:

Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving the bioassay predictions. The first step is instance selection in which dataset is categorized into training, testing, and validation sets. The second step is discretization that partitions the data in consideration of accuracy vs. precision. The third step is normalization where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combination of preprocessing steps and machine learning algorithms in more consistent and accurate prediction.

Keywords: bioassay, machine learning, preprocessing, virtual screen

Procedia PDF Downloads 250
396 Automatic Classification of Periodic Heart Sounds Using Convolutional Neural Network

Authors: Jia Xin Low, Keng Wah Choo

Abstract:

This paper presents an automatic normal and abnormal heart sound classification model developed based on deep learning algorithm. MITHSDB heart sounds datasets obtained from the 2016 PhysioNet/Computing in Cardiology Challenge database were used in this research with the assumption that the electrocardiograms (ECG) were recorded simultaneously with the heart sounds (phonocardiogram, PCG). The PCG time series are segmented per heart beat, and each sub-segment is converted to form a square intensity matrix, and classified using convolutional neural network (CNN) models. This approach removes the need to provide classification features for the supervised machine learning algorithm. Instead, the features are determined automatically through training, from the time series provided. The result proves that the prediction model is able to provide reasonable and comparable classification accuracy despite simple implementation. This approach can be used for real-time classification of heart sounds in Internet of Medical Things (IoMT), e.g. remote monitoring applications of PCG signal.

Keywords: convolutional neural network, discrete wavelet transform, deep learning, heart sound classification

Procedia PDF Downloads 318