Search results for: time-series phosphoproteomic datasets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 701

Search results for: time-series phosphoproteomic datasets

431 4D Monitoring of Subsurface Conditions in Concrete Infrastructure Prior to Failure Using Ground Penetrating Radar

Authors: Lee Tasker, Ali Karrech, Jeffrey Shragge, Matthew Josh

Abstract:

Monitoring for the deterioration of concrete infrastructure is an important assessment tool for an engineer and difficulties can be experienced with monitoring for deterioration within an infrastructure. If a failure crack, or fluid seepage through such a crack, is observed from the surface often the source location of the deterioration is not known. Geophysical methods are used to assist engineers with assessing the subsurface conditions of materials. Techniques such as Ground Penetrating Radar (GPR) provide information on the location of buried infrastructure such as pipes and conduits, positions of reinforcements within concrete blocks, and regions of voids/cavities behind tunnel lining. This experiment underlines the application of GPR as an infrastructure-monitoring tool to highlight and monitor regions of possible deterioration within a concrete test wall due to an increase in the generation of fractures; in particular, during a time period of applied load to a concrete wall up to and including structural failure. A three-point load was applied to a concrete test wall of dimensions 1700 x 600 x 300 mm³ in increments of 10 kN, until the wall structurally failed at 107.6 kN. At each increment of applied load, the load was kept constant and the wall was scanned using GPR along profile lines across the wall surface. The measured radar amplitude responses of the GPR profiles, at each applied load interval, were reconstructed into depth-slice grids and presented at fixed depth-slice intervals. The corresponding depth-slices were subtracted from each data set to compare the radar amplitude response between datasets and monitor for changes in the radar amplitude response. At lower values of applied load (i.e., 0-60 kN), few changes were observed in the difference of radar amplitude responses between data sets. At higher values of applied load (i.e., 100 kN), closer to structural failure, larger differences in radar amplitude response between data sets were highlighted in the GPR data; up to 300% increase in radar amplitude response at some locations between the 0 kN and 100 kN radar datasets. Distinct regions were observed in the 100 kN difference dataset (i.e., 100 kN-0 kN) close to the location of the final failure crack. The key regions observed were a conical feature located between approximately 3.0-12.0 cm depth from surface and a vertical linear feature located approximately 12.1-21.0 cm depth from surface. These key regions have been interpreted as locations exhibiting an increased change in pore-space due to increased mechanical loading, or locations displaying an increase in volume of micro-cracks, or locations showing the development of a larger macro-crack. The experiment showed that GPR is a useful geophysical monitoring tool to assist engineers with highlighting and monitoring regions of large changes of radar amplitude response that may be associated with locations of significant internal structural change (e.g. crack development). GPR is a non-destructive technique that is fast to deploy in a production setting. GPR can assist with reducing risk and costs in future infrastructure maintenance programs by highlighting and monitoring locations within the structure exhibiting large changes in radar amplitude over calendar-time.

Keywords: 4D GPR, engineering geophysics, ground penetrating radar, infrastructure monitoring

Procedia PDF Downloads 143
430 Clinical Validation of an Automated Natural Language Processing Algorithm for Finding COVID-19 Symptoms and Complications in Patient Notes

Authors: Karolina Wieczorek, Sophie Wiliams

Abstract:

Introduction: Patient data is often collected in Electronic Health Record Systems (EHR) for purposes such as providing care as well as reporting data. This information can be re-used to validate data models in clinical trials or in epidemiological studies. Manual validation of automated tools is vital to pick up errors in processing and to provide confidence in the output. Mentioning a disease in a discharge letter does not necessarily mean that a patient suffers from this disease. Many of them discuss a diagnostic process, different tests, or discuss whether a patient has a certain disease. The COVID-19 dataset in this study used natural language processing (NLP), an automated algorithm which extracts information related to COVID-19 symptoms, complications, and medications prescribed within the hospital. Free-text patient clinical patient notes are rich sources of information which contain patient data not captured in a structured form, hence the use of named entity recognition (NER) to capture additional information. Methods: Patient data (discharge summary letters) were exported and screened by an algorithm to pick up relevant terms related to COVID-19. Manual validation of automated tools is vital to pick up errors in processing and to provide confidence in the output. A list of 124 Systematized Nomenclature of Medicine (SNOMED) Clinical Terms has been provided in Excel with corresponding IDs. Two independent medical student researchers were provided with a dictionary of SNOMED list of terms to refer to when screening the notes. They worked on two separate datasets called "A” and "B”, respectively. Notes were screened to check if the correct term had been picked-up by the algorithm to ensure that negated terms were not picked up. Results: Its implementation in the hospital began on March 31, 2020, and the first EHR-derived extract was generated for use in an audit study on June 04, 2020. The dataset has contributed to large, priority clinical trials (including International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC) by bulk upload to REDcap research databases) and local research and audit studies. Successful sharing of EHR-extracted datasets requires communicating the provenance and quality, including completeness and accuracy of this data. The results of the validation of the algorithm were the following: precision (0.907), recall (0.416), and F-score test (0.570). Percentage enhancement with NLP extracted terms compared to regular data extraction alone was low (0.3%) for relatively well-documented data such as previous medical history but higher (16.6%, 29.53%, 30.3%, 45.1%) for complications, presenting illness, chronic procedures, acute procedures respectively. Conclusions: This automated NLP algorithm is shown to be useful in facilitating patient data analysis and has the potential to be used in more large-scale clinical trials to assess potential study exclusion criteria for participants in the development of vaccines.

Keywords: automated, algorithm, NLP, COVID-19

Procedia PDF Downloads 66
429 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling

Authors: Mekamı Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed

Abstract:

Pose estimation is an important task in computer vision. Though the majority of the existing solutions provide good accuracy results, they are often overly complex and computationally expensive. In this perspective, we propose the use of dimensionality reduction techniques to address the problem of facial pose estimation. Firstly, a face image is converted into one-dimensional time series using Hilbert space filling curve, then the approach converts these time series data to a symbolic representation. Furthermore, a distance matrix is calculated between symbolic series of an input learning dataset of images, to generate classifiers of frontal vs. profile face pose. The proposed method is evaluated with three public datasets. Experimental results have shown that our approach is able to achieve a correct classification rate exceeding 97% with K-NN algorithm.

Keywords: machine learning, pattern recognition, facial pose classification, time series

Procedia PDF Downloads 321
428 Enhancing Predictive Accuracy in Pharmaceutical Sales through an Ensemble Kernel Gaussian Process Regression Approach

Authors: Shahin Mirshekari, Mohammadreza Moradi, Hossein Jafari, Mehdi Jafari, Mohammad Ensaf

Abstract:

This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matern, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matern, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an R² score near 1.0, and significantly lower values in MSE, MAE, and RMSE. These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.

Keywords: Gaussian process regression, ensemble kernels, bayesian optimization, pharmaceutical sales analysis, time series forecasting, data analysis

Procedia PDF Downloads 29
427 An Epsilon Hierarchical Fuzzy Twin Support Vector Regression

Authors: Arindam Chaudhuri

Abstract:

The research presents epsilon- hierarchical fuzzy twin support vector regression (epsilon-HFTSVR) based on epsilon-fuzzy twin support vector regression (epsilon-FTSVR) and epsilon-twin support vector regression (epsilon-TSVR). Epsilon-FTSVR is achieved by incorporating trapezoidal fuzzy numbers to epsilon-TSVR which takes care of uncertainty existing in forecasting problems. Epsilon-FTSVR determines a pair of epsilon-insensitive proximal functions by solving two related quadratic programming problems. The structural risk minimization principle is implemented by introducing regularization term in primal problems of epsilon-FTSVR. This yields dual stable positive definite problems which improves regression performance. Epsilon-FTSVR is then reformulated as epsilon-HFTSVR consisting of a set of hierarchical layers each containing epsilon-FTSVR. Experimental results on both synthetic and real datasets reveal that epsilon-HFTSVR has remarkable generalization performance with minimum training time.

Keywords: regression, epsilon-TSVR, epsilon-FTSVR, epsilon-HFTSVR

Procedia PDF Downloads 332
426 Solving Crimes through DNA Methylation Analysis

Authors: Ajay Kumar Rana

Abstract:

Predicting human behaviour, discerning monozygotic twins or left over remnant tissues/fluids of a single human source remains a big challenge in forensic science. Recent advances in the field of DNA methylations which are broadly chemical hallmarks in response to environmental factors can certainly help to identify and discriminate various single-source DNA samples collected from the crime scenes. In this review, cytosine methylation of DNA has been methodologically discussed with its broad applications in many challenging forensic issues like body fluid identification, race/ethnicity identification, monozygotic twins dilemma, addiction or behavioural prediction, age prediction, or even authenticity of the human DNA. With the advent of next-generation sequencing techniques, blooming of DNA methylation datasets and together with standard molecular protocols, the prospect of investigating and solving the above issues and extracting the exact nature of the truth for reconstructing the crime scene events would be undoubtedly helpful in defending and solving the critical crime cases.

Keywords: DNA methylation, differentially methylated regions, human identification, forensics

Procedia PDF Downloads 293
425 The Continuous Facility Location Problem and Transportation Mode Selection in the Supply Chain under Sustainability

Authors: Abdulaziz Alageel, Martino Luis, Shuya Zhong

Abstract:

The main focus of this research study is on the challenges faced in decision-making in a supply chain network regarding the facility location while considering carbon emissions. The study aims (i) to locate facilities (i.e., distribution centeres) in a continuous space considering limitations of capacity and the costs associated with opening and (ii) to reduce the cost of carbon emissions by selecting the mode of transportation. The problem is formulated as mixed-integer linear programming. This study hybridised a greedy randomised adaptive search (GRASP) and variable neighborhood search (VNS) to deal with the problem. Well-known datasets from the literature (Brimberg et al. 2001) are used and adapted in order to assess the performance of the proposed method. The proposed hybrid method produces encouraging results based on computational analysis. The study also highlights some research avenues for future recommendations.

Keywords: supply chain, facility location, weber problem, sustainability

Procedia PDF Downloads 75
424 Reliable Soup: Reliable-Driven Model Weight Fusion on Ultrasound Imaging Classification

Authors: Shuge Lei, Haonan Hu, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Yan Tong

Abstract:

It remains challenging to measure reliability from classification results from different machine learning models. This paper proposes a reliable soup optimization algorithm based on the model weight fusion algorithm Model Soup, aiming to improve reliability by using dual-channel reliability as the objective function to fuse a series of weights in the breast ultrasound classification models. Experimental results on breast ultrasound clinical datasets demonstrate that reliable soup significantly enhances the reliability of breast ultrasound image classification tasks. The effectiveness of the proposed approach was verified via multicenter trials. The results from five centers indicate that the reliability optimization algorithm can enhance the reliability of the breast ultrasound image classification model and exhibit low multicenter correlation.

Keywords: breast ultrasound image classification, feature attribution, reliability assessment, reliability optimization

Procedia PDF Downloads 48
423 Fuzzy Sentiment Analysis of Customer Product Reviews

Authors: Samaneh Nadali, Masrah Azrifah Azmi Murad

Abstract:

As a result of the growth of the web, people are able to express their views and opinions. They can now post reviews of products at merchant sites and express their views on almost anything in internet forums, discussion groups, and blogs. Therefore, the number of product reviews has grown rapidly. The large numbers of reviews make it difficult for manufacturers or businesses to automatically classify them into different semantic orientations (positive, negative, and neutral). For sentiment classification, most existing methods utilize a list of opinion words whereas this paper proposes a fuzzy approach for evaluating sentiments expressed in customer product reviews, to predict the strength levels (e.g. very weak, weak, moderate, strong and very strong) of customer product reviews by combinations of adjective, adverb and verb. The proposed fuzzy approach has been tested on eight benchmark datasets and obtained 74% accuracy, which leads to help the organization with a more clear understanding of customer's behavior in support of business planning process.

Keywords: fuzzy logic, customer product review, sentiment analysis

Procedia PDF Downloads 331
422 A Comparative Study of Malware Detection Techniques Using Machine Learning Methods

Authors: Cristina Vatamanu, Doina Cosovan, Dragos Gavrilut, Henri Luchian

Abstract:

In the past few years, the amount of malicious software increased exponentially and, therefore, machine learning algorithms became instrumental in identifying clean and malware files through semi-automated classification. When working with very large datasets, the major challenge is to reach both a very high malware detection rate and a very low false positive rate. Another challenge is to minimize the time needed for the machine learning algorithm to do so. This paper presents a comparative study between different machine learning techniques such as linear classifiers, ensembles, decision trees or various hybrids thereof. The training dataset consists of approximately 2 million clean files and 200.000 infected files, which is a realistic quantitative mixture. The paper investigates the above mentioned methods with respect to both their performance (detection rate and false positive rate) and their practicability.

Keywords: ensembles, false positives, feature selection, one side class algorithm

Procedia PDF Downloads 262
421 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study

Authors: Salima Smiti, Ines Gasmi, Makram Soui

Abstract:

Credit risk is the most important issue for financial institutions. Its assessment becomes an important task used to predict defaulter customers and classify customers as good or bad payers. To this objective, numerous techniques have been applied for credit risk assessment. However, to our knowledge, several evaluation techniques are black-box models such as neural networks, SVM, etc. They generate applicants’ classes without any explanation. In this paper, we propose to assess credit risk using rules classification method. Our output is a set of rules which describe and explain the decision. To this end, we will compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART and Genetic programming (GP)) where the goal is to find the best rules satisfying many criteria: accuracy, sensitivity, and specificity. The obtained results confirm the efficiency of the GP algorithm for German and Australian datasets compared to other rule-based techniques to predict the credit risk.

Keywords: credit risk assessment, classification algorithms, data mining, rule extraction

Procedia PDF Downloads 145
420 A New Concept for Deriving the Expected Value of Fuzzy Random Variables

Authors: Liang-Hsuan Chen, Chia-Jung Chang

Abstract:

Fuzzy random variables have been introduced as an imprecise concept of numeric values for characterizing the imprecise knowledge. The descriptive parameters can be used to describe the primary features of a set of fuzzy random observations. In fuzzy environments, the expected values are usually represented as fuzzy-valued, interval-valued or numeric-valued descriptive parameters using various metrics. Instead of the concept of area metric that is usually adopted in the relevant studies, the numeric expected value is proposed by the concept of distance metric in this study based on two characters (fuzziness and randomness) of FRVs. Comparing with the existing measures, although the results show that the proposed numeric expected value is same with those using the different metric, if only triangular membership functions are used. However, the proposed approach has the advantages of intuitiveness and computational efficiency, when the membership functions are not triangular types. An example with three datasets is provided for verifying the proposed approach.

Keywords: fuzzy random variables, distance measure, expected value, descriptive parameters

Procedia PDF Downloads 314
419 Robust Pattern Recognition via Correntropy Generalized Orthogonal Matching Pursuit

Authors: Yulong Wang, Yuan Yan Tang, Cuiming Zou, Lina Yang

Abstract:

This paper presents a novel sparse representation method for robust pattern classification. Generalized orthogonal matching pursuit (GOMP) is a recently proposed efficient sparse representation technique. However, GOMP adopts the mean square error (MSE) criterion and assign the same weights to all measurements, including both severely and slightly corrupted ones. To reduce the limitation, we propose an information-theoretic GOMP (ITGOMP) method by exploiting the correntropy induced metric. The results show that ITGOMP can adaptively assign small weights on severely contaminated measurements and large weights on clean ones, respectively. An ITGOMP based classifier is further developed for robust pattern classification. The experiments on public real datasets demonstrate the efficacy of the proposed approach.

Keywords: correntropy induced metric, matching pursuit, pattern classification, sparse representation

Procedia PDF Downloads 328
418 Cotton Crops Vegetative Indices Based Assessment Using Multispectral Images

Authors: Muhammad Shahzad Shifa, Amna Shifa, Muhammad Omar, Aamir Shahzad, Rahmat Ali Khan

Abstract:

Many applications of remote sensing to vegetation and crop response depend on spectral properties of individual leaves and plants. Vegetation indices are usually determined to estimate crop biophysical parameters like crop canopies and crop leaf area indices with the help of remote sensing. Cotton crops assessment is performed with the help of vegetative indices. Remotely sensed images from an optical multispectral radiometer MSR5 are used in this study. The interpretation is based on the fact that different materials reflect and absorb light differently at different wavelengths. Non-normalized and normalized forms of these datasets are analyzed using two complementary data mining algorithms; K-means and K-nearest neighbor (KNN). Our analysis shows that the use of normalized reflectance data and vegetative indices are suitable for an automated assessment and decision making.

Keywords: cotton, condition assessment, KNN algorithm, clustering, MSR5, vegetation indices

Procedia PDF Downloads 292
417 Credit Risk Evaluation Using Genetic Programming

Authors: Ines Gasmi, Salima Smiti, Makram Soui, Khaled Ghedira

Abstract:

Credit risk is considered as one of the important issues for financial institutions. It provokes great losses for banks. To this objective, numerous methods for credit risk evaluation have been proposed. Many evaluation methods are black box models that cannot adequately reveal information hidden in the data. However, several works have focused on building transparent rules-based models. For credit risk assessment, generated rules must be not only highly accurate, but also highly interpretable. In this paper, we aim to build both, an accurate and transparent credit risk evaluation model which proposes a set of classification rules. In fact, we consider the credit risk evaluation as an optimization problem which uses a genetic programming (GP) algorithm, where the goal is to maximize the accuracy of generated rules. We evaluate our proposed approach on the base of German and Australian credit datasets. We compared our finding with some existing works; the result shows that the proposed GP outperforms the other models.

Keywords: credit risk assessment, rule generation, genetic programming, feature selection

Procedia PDF Downloads 317
416 A Large Dataset Imputation Approach Applied to Country Conflict Prediction Data

Authors: Benjamin Leiby, Darryl Ahner

Abstract:

This study demonstrates an alternative stochastic imputation approach for large datasets when preferred commercial packages struggle to iterate due to numerical problems. A large country conflict dataset motivates the search to impute missing values well over a common threshold of 20% missingness. The methodology capitalizes on correlation while using model residuals to provide the uncertainty in estimating unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Static tolerances common in most packages are replaced with tailorable tolerances that exploit residuals to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the country conflict dataset illustrates promise with modeling first-order interactions while presenting a need for further refinement that mimics predictive mean matching.

Keywords: correlation, country conflict, imputation, stochastic regression

Procedia PDF Downloads 90
415 Tree Species Classification Using Effective Features of Polarimetric SAR and Hyperspectral Images

Authors: Milad Vahidi, Mahmod R. Sahebi, Mehrnoosh Omati, Reza Mohammadi

Abstract:

Forest management organizations need information to perform their work effectively. Remote sensing is an effective method to acquire information from the Earth. Two datasets of remote sensing images were used to classify forested regions. Firstly, all of extractable features from hyperspectral and PolSAR images were extracted. The optical features were spectral indexes related to the chemical, water contents, structural indexes, effective bands and absorption features. Also, PolSAR features were the original data, target decomposition components, and SAR discriminators features. Secondly, the particle swarm optimization (PSO) and the genetic algorithms (GA) were applied to select optimization features. Furthermore, the support vector machine (SVM) classifier was used to classify the image. The results showed that the combination of PSO and SVM had higher overall accuracy than the other cases. This combination provided overall accuracy about 90.56%. The effective features were the spectral index, the bands in shortwave infrared (SWIR) and the visible ranges and certain PolSAR features.

Keywords: hyperspectral, PolSAR, feature selection, SVM

Procedia PDF Downloads 387
414 Image Retrieval Based on Multi-Feature Fusion for Heterogeneous Image Databases

Authors: N. W. U. D. Chathurani, Shlomo Geva, Vinod Chandran, Proboda Rajapaksha

Abstract:

Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach, which is free from the effect of features' dimensions, ranges, internal feature normalization and the distance measure. This approach can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and Oliva and Torralba) and compared with existing approaches. The performance of the proposed approach is confirmed with the significantly improved performance in comparison with the independently evaluated baseline of the previously proposed feature fusion approaches.

Keywords: feature fusion, image retrieval, membership function, normalization

Procedia PDF Downloads 321
413 RGB-D SLAM Algorithm Based on pixel level Dense Depth Map

Authors: Hao Zhang, Hongyang Yu

Abstract:

Scale uncertainty is a well-known challenging problem in visual SLAM. Because RGB-D sensor provides depth information, RGB-D SLAM improves this scale uncertainty problem. However, due to the limitation of physical hardware, the depth map output by RGB-D sensor usually contains a large area of missing depth values. These missing depth information affect the accuracy and robustness of RGB-D SLAM. In order to reduce these effects, this paper completes the missing area of the depth map output by RGB-D sensor and then fuses the completed dense depth map into ORB SLAM2. By adding the process of obtaining pixel-level dense depth maps, a better RGB-D visual SLAM algorithm is finally obtained. In the process of obtaining dense depth maps, a deep learning model of indoor scenes is adopted. Experiments are conducted on public datasets and real-world environments of indoor scenes. Experimental results show that the proposed SLAM algorithm has better robustness than ORB SLAM2.

Keywords: RGB-D, SLAM, dense depth, depth map

Procedia PDF Downloads 108
412 Integrative System of GDP, Emissions, Health Services and Population Health in Vietnam: Dynamic Panel Data Estimation

Authors: Ha Hai Duong, Amnon Levy Livermore, Kankesu Jayanthakumaran, Oleg Yerokhin

Abstract:

The issues of economic development, the environment and human health have been investigated since 1990s. Previous researchers have found different empirical evidences of the relationship between income and environmental pollution, health as determinant of economic growth, and the effects of income and environmental pollution on health in various regions of the world. This paper concentrates on integrative relationship analysis of GDP, carbon dioxide emissions, and health services and population health in context of Vietnam. We applied the dynamic generalized method of moments (GMM) estimation on datasets of Vietnam’s sixty-three provinces for the years 2000-2010. Our results show the significant positive effect of GDP on emissions and the dependence of population health on emissions and health services. We find the significant relationship between population health and GDP. Additionally, health services are significantly affected by population health and GDP. Finally, the population size too is other important determinant of both emissions and GDP.

Keywords: economic development, emissions, environmental pollution, health

Procedia PDF Downloads 587
411 3D Point Cloud Model Color Adjustment by Combining Terrestrial Laser Scanner and Close Range Photogrammetry Datasets

Authors: M. Pepe, S. Ackermann, L. Fregonese, C. Achille

Abstract:

3D models obtained with advanced survey techniques such as close-range photogrammetry and laser scanner are nowadays particularly appreciated in Cultural Heritage and Archaeology fields. In order to produce high quality models representing archaeological evidences and anthropological artifacts, the appearance of the model (i.e. color) beyond the geometric accuracy, is not a negligible aspect. The integration of the close-range photogrammetry survey techniques with the laser scanner is still a topic of study and research. By combining point cloud data sets of the same object generated with both technologies, or with the same technology but registered in different moment and/or natural light condition, could construct a final point cloud with accentuated color dissimilarities. In this paper, a methodology to uniform the different data sets, to improve the chromatic quality and to highlight further details by balancing the point color will be presented.

Keywords: color models, cultural heritage, laser scanner, photogrammetry

Procedia PDF Downloads 254
410 Large-Scale Electroencephalogram Biometrics through Contrastive Learning

Authors: Mostafa ‘Neo’ Mohsenvand, Mohammad Rasool Izadi, Pattie Maes

Abstract:

EEG-based biometrics (user identification) has been explored on small datasets of no more than 157 subjects. Here we show that the accuracy of modern supervised methods falls rapidly as the number of users increases to a few thousand. Moreover, supervised methods require a large amount of labeled data for training which limits their applications in real-world scenarios where acquiring data for training should not take more than a few minutes. We show that using contrastive learning for pre-training, it is possible to maintain high accuracy on a dataset of 2130 subjects while only using a fraction of labels. We compare 5 different self-supervised tasks for pre-training of the encoder where our proposed method achieves the accuracy of 96.4%, improving the baseline supervised models by 22.75% and the competing self-supervised model by 3.93%. We also study the effects of the length of the signal and the number of channels on the accuracy of the user-identification models. Our results reveal that signals from temporal and frontal channels contain more identifying features compared to other channels.

Keywords: brainprint, contrastive learning, electroencephalo-gram, self-supervised learning, user identification

Procedia PDF Downloads 131
409 Multi-Classification Deep Learning Model for Diagnosing Different Chest Diseases

Authors: Bandhan Dey, Muhsina Bintoon Yiasha, Gulam Sulaman Choudhury

Abstract:

Chest disease is one of the most problematic ailments in our regular life. There are many known chest diseases out there. Diagnosing them correctly plays a vital role in the process of treatment. There are many methods available explicitly developed for different chest diseases. But the most common approach for diagnosing these diseases is through X-ray. In this paper, we proposed a multi-classification deep learning model for diagnosing COVID-19, lung cancer, pneumonia, tuberculosis, and atelectasis from chest X-rays. In the present work, we used the transfer learning method for better accuracy and fast training phase. The performance of three architectures is considered: InceptionV3, VGG-16, and VGG-19. We evaluated these deep learning architectures using public digital chest x-ray datasets with six classes (i.e., COVID-19, lung cancer, pneumonia, tuberculosis, atelectasis, and normal). The experiments are conducted on six-classification, and we found that VGG16 outperforms other proposed models with an accuracy of 95%.

Keywords: deep learning, image classification, X-ray images, Tensorflow, Keras, chest diseases, convolutional neural networks, multi-classification

Procedia PDF Downloads 59
408 User Intention Generation with Large Language Models (LLMs) Using Chain-of-Thought Prompting

Authors: Gangmin Li, Fan Yang

Abstract:

Personalized recommendation is crucial for any recommendation system. One of the techniques for personalized recommendation is to identify the intention. Traditional user intention identification uses the user’s selection when facing multiple items. This modeling relies primarily on historical behaviour data resulting in challenges such as the cold start, unintended choice, and failure to capture intention when items are new. Motivated by recent advancements in Large Language Models (LLMs) like ChatGPT, we present an approach for user intention identification by embracing LLMs with Chain-of-Thought (CoT) prompting. We use the initial user profile as input to LLMs and design a collection of prompts to align the LLM's response through various recommendation tasks encompassing rating prediction, search and browse history, user clarification etc. Our tests on real-world datasets demonstrate the improvements in recommendation by explicit user intention identification and, with that intention, merged into a user model.

Keywords: personalized recommendation, generative user modelling, user intention identification, large language models, chain-of-thought prompting

Procedia PDF Downloads 4
407 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousand gene expression data simultaneously and a very important task for drug development and test, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithm has been used as an alternative approach to identifying structures from gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify normalized matrix of gene expression data followed by Mixed-Clustering algorithm and the Lift algorithm, inspired in the node-deletion and node-addition phases proposed by Cheng and Church based on Agglomerative Hierarchical Clustering (AHC). Experimental study on standard datasets demonstrated the effectiveness of the algorithm in gene expression data.

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 250
406 A Genetic Algorithm Based Permutation and Non-Permutation Scheduling Heuristics for Finite Capacity Material Requirement Planning Problem

Authors: Watchara Songserm, Teeradej Wuttipornpun

Abstract:

This paper presents a genetic algorithm based permutation and non-permutation scheduling heuristics (GAPNP) to solve a multi-stage finite capacity material requirement planning (FCMRP) problem in automotive assembly flow shop with unrelated parallel machines. In the algorithm, the sequences of orders are iteratively improved by the GA characteristics, whereas the required operations are scheduled based on the presented permutation and non-permutation heuristics. Finally, a linear programming is applied to minimize the total cost. The presented GAPNP algorithm is evaluated by using real datasets from automotive companies. The required parameters for GAPNP are intently tuned to obtain a common parameter setting for all case studies. The results show that GAPNP significantly outperforms the benchmark algorithm about 30% on average.

Keywords: capacitated MRP, genetic algorithm, linear programming, automotive industries, flow shop, application in industry

Procedia PDF Downloads 463
405 Model for Introducing Products to New Customers through Decision Tree Using Algorithm C4.5 (J-48)

Authors: Komol Phaisarn, Anuphan Suttimarn, Vitchanan Keawtong, Kittisak Thongyoun, Chaiyos Jamsawang

Abstract:

This article is intended to analyze insurance information which contains information on the customer decision when purchasing life insurance pay package. The data were analyzed in order to present new customers with Life Insurance Perfect Pay package to meet new customers’ needs as much as possible. The basic data of insurance pay package were collect to get data mining; thus, reducing the scattering of information. The data were then classified in order to get decision model or decision tree using Algorithm C4.5 (J-48). In the classification, WEKA tools are used to form the model and testing datasets are used to test the decision tree for the accurate decision. The validation of this model in classifying showed that the accurate prediction was 68.43% while 31.25% were errors. The same set of data were then tested with other models, i.e. Naive Bayes and Zero R. The results showed that J-48 method could predict more accurately. So, the researcher applied the decision tree in writing the program used to introduce the product to new customers to persuade customers’ decision making in purchasing the insurance package that meets the new customers’ needs as much as possible.

Keywords: decision tree, data mining, customers, life insurance pay package

Procedia PDF Downloads 402
404 Large Neural Networks Learning From Scratch With Very Few Data and Without Explicit Regularization

Authors: Christoph Linse, Thomas Martinetz

Abstract:

Recent findings have shown that Neural Networks generalize also in over-parametrized regimes with zero training error. This is surprising, since it is completely against traditional machine learning wisdom. In our empirical study we fortify these findings in the domain of fine-grained image classification. We show that very large Convolutional Neural Networks with millions of weights do learn with only a handful of training samples and without image augmentation, explicit regularization or pretraining. We train the architectures ResNet018, ResNet101 and VGG19 on subsets of the difficult benchmark datasets Caltech101, CUB_200_2011, FGVCAircraft, Flowers102 and StanfordCars with 100 classes and more, perform a comprehensive comparative study and draw implications for the practical application of CNNs. Finally, we show that VGG19 with 140 million weights learns to distinguish airplanes and motorbikes with up to 95% accuracy using only 20 training samples per class.

Keywords: convolutional neural networks, fine-grained image classification, generalization, image recognition, over-parameterized, small data sets

Procedia PDF Downloads 56
403 Improved Rare Species Identification Using Focal Loss Based Deep Learning Models

Authors: Chad Goldsworthy, B. Rajeswari Matam

Abstract:

The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing the accuracy of manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it was able to increase the F1-score by 0.06 and improve the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as resulting in an improved overall accuracy of 2.96%.

Keywords: convolutional neural networks, data imbalance, deep learning, focal loss, species classification, wildlife conservation

Procedia PDF Downloads 148
402 Studying Relationship between Local Geometry of Decision Boundary with Network Complexity for Robustness Analysis with Adversarial Perturbations

Authors: Tushar K. Routh

Abstract:

If inputs are engineered in certain manners, they can influence deep neural networks’ (DNN) performances by facilitating misclassifications, a phenomenon well-known as adversarial attacks that question networks’ vulnerability. Recent studies have unfolded the relationship between vulnerability of such networks with their complexity. In this paper, the distinctive influence of additional convolutional layers at the decision boundaries of several DNN architectures was investigated. Here, to engineer inputs from widely known image datasets like MNIST, Fashion MNIST, and Cifar 10, we have exercised One Step Spectral Attack (OSSA) and Fast Gradient Method (FGM) techniques. The aftermaths of adding layers to the robustness of the architectures have been analyzed. For reasoning, separation width from linear class partitions and local geometry (curvature) near the decision boundary have been examined. The result reveals that model complexity has significant roles in adjusting relative distances from margins, as well as the local features of decision boundaries, which impact robustness.

Keywords: DNN robustness, decision boundary, local curvature, network complexity

Procedia PDF Downloads 44