Search results for: dataset development
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 17201

Search results for: dataset development

16991 Data Science-Based Key Factor Analysis and Risk Prediction of Diabetic

Authors: Fei Gao, Rodolfo C. Raga Jr.

Abstract:

This research proposal will ascertain the major risk factors for diabetes and to design a predictive model for risk assessment. The project aims to improve diabetes early detection and management by utilizing data science techniques, which may improve patient outcomes and healthcare efficiency. The phase relation values of each attribute were used to analyze and choose the attributes that might influence the examiner's survival probability using Diabetes Health Indicators Dataset from Kaggle’s data as the research data. We compare and evaluate eight machine learning algorithms. Our investigation begins with comprehensive data preprocessing, including feature engineering and dimensionality reduction, aimed at enhancing data quality. The dataset, comprising health indicators and medical data, serves as a foundation for training and testing these algorithms. A rigorous cross-validation process is applied, and we assess their performance using five key metrics like accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). After analyzing the data characteristics, investigate their impact on the likelihood of diabetes and develop corresponding risk indicators.

Keywords: diabetes, risk factors, predictive model, risk assessment, data science techniques, early detection, data analysis, Kaggle

Procedia PDF Downloads 75
16990 Electrocardiogram-Based Heartbeat Classification Using Convolutional Neural Networks

Authors: Jacqueline Rose T. Alipo-on, Francesca Isabelle F. Escobar, Myles Joshua T. Tan, Hezerul Abdul Karim, Nouar Al Dahoul

Abstract:

Electrocardiogram (ECG) signal analysis and processing are crucial in the diagnosis of cardiovascular diseases, which are considered one of the leading causes of mortality worldwide. However, the traditional rule-based analysis of large volumes of ECG data is time-consuming, labor-intensive, and prone to human errors. With the advancement of the programming paradigm, algorithms such as machine learning have been increasingly used to perform an analysis of ECG signals. In this paper, various deep learning algorithms were adapted to classify five classes of heartbeat types. The dataset used in this work is the synthetic MIT-BIH Arrhythmia dataset produced from generative adversarial networks (GANs). Various deep learning models such as ResNet-50 convolutional neural network (CNN), 1-D CNN, and long short-term memory (LSTM) were evaluated and compared. ResNet-50 was found to outperform other models in terms of recall and F1 score using a five-fold average score of 98.88% and 98.87%, respectively. 1-D CNN, on the other hand, was found to have the highest average precision of 98.93%.

Keywords: heartbeat classification, convolutional neural network, electrocardiogram signals, generative adversarial networks, long short-term memory, ResNet-50

Procedia PDF Downloads 128
16989 Wearable Antenna for Diagnosis of Parkinson’s Disease Using a Deep Learning Pipeline on Accelerated Hardware

Authors: Subham Ghosh, Banani Basu, Marami Das

Abstract:

Background: The development of compact, low-power antenna sensors has resulted in hardware restructuring, allowing for wireless ubiquitous sensing. The antenna sensors can create wireless body-area networks (WBAN) by linking various wireless nodes across the human body. WBAN and IoT applications, such as remote health and fitness monitoring and rehabilitation, are becoming increasingly important. In particular, Parkinson’s disease (PD), a common neurodegenerative disorder, presents clinical features that can be easily misdiagnosed. As a mobility disease, it may greatly benefit from the antenna’s nearfield approach with a variety of activities that can use WBAN and IoT technologies to increase diagnosis accuracy and patient monitoring. Methodology: This study investigates the feasibility of leveraging a single patch antenna mounted (using cloth) on the wrist dorsal to differentiate actual Parkinson's disease (PD) from false PD using a small hardware platform. The semi-flexible antenna operates at the 2.4 GHz ISM band and collects reflection coefficient (Γ) data from patients performing five exercises designed for the classification of PD and other disorders such as essential tremor (ET) or those physiological disorders caused by anxiety or stress. The obtained data is normalized and converted into 2-D representations using the Gabor wavelet transform (GWT). Data augmentation is then used to expand the dataset size. A lightweight deep-learning (DL) model is developed to run on the GPU-enabled NVIDIA Jetson Nano platform. The DL model processes the 2-D images for feature extraction and classification. Findings: The DL model was trained and tested on both the original and augmented datasets, thus doubling the dataset size. To ensure robustness, a 5-fold stratified cross-validation (5-FSCV) method was used. The proposed framework, utilizing a DL model with 1.356 million parameters on the NVIDIA Jetson Nano, achieved optimal performance in terms of accuracy of 88.64%, F1-score of 88.54, and recall of 90.46%, with a latency of 33 seconds per epoch.

Keywords: antenna, deep-learning, GPU-hardware, Parkinson’s disease

Procedia PDF Downloads 7
16988 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: cancer classification, feature selection, deep learning, genetic algorithm

Procedia PDF Downloads 111
16987 Research on the Status Quo and Countermeasures of Professional Development of Engineering Teachers in China

Authors: Wang Xiu Xiu

Abstract:

The professional development of engineering teachers in universities is the key to the construction of outstanding engineers in China, which is related to the quality and prospects of the entire engineering education. This study investigated 2789 teachers' professional development in different regions of China, which outlines the current situation of the professional development of engineering teachers from three perspectives: professional development needs, professional development methods and professional development effects. Data results show that engineering teachers have the strongest demand for the improvement of subject knowledge and teaching ability. Engineering faculty with 0-5 years of teaching experience, under 35 years of age and a doctorate degree have the strongest demand for development. The frequency of engineering teachers' participation in various professional development activities is low, especially in school-enterprise cooperation-related activities. There are significant differences in the participation frequency of professional development activities among engineering faculty with different teaching ages, ages, professional titles, degrees and administrative positions in schools. The professional development of engineering faculty has been improved to a certain extent and is positively affected by professional development needs and participation in professional development. In this regard, we can constantly improve the professional development system of engineering teachers from three aspects: training on demand, stimulating motivation, and optimizing resource allocation, to enhance the professional development level of engineering teachers.

Keywords: engineering teachers in universities, professional development, status quo, countermeasures

Procedia PDF Downloads 17
16986 Bank Concentration and Industry Structure: Evidence from China

Authors: Jingjing Ye, Cijun Fan, Yan Dong

Abstract:

The development of financial sector plays an important role in shaping industrial structure. However, evidence on the micro-level channels through which this relation manifest remains relatively sparse, particularly for developing countries. In this paper, we compile an industry-by-city dataset based on manufacturing firms and registered banks in 287 Chinese cities from 1998 to 2008. Based on a difference-in-difference approach, we find the highly concentrated banking sector decreases the competitiveness of firms in each manufacturing industry. There are two main reasons: i) bank accessibility successfully fosters firm expansion within each industry, however, only for sufficiently large enterprises; ii) state-owned enterprises are favored by the banking industry in China. The results are robust after considering alternative concentration and external finance dependence measures.

Keywords: bank concentration, China, difference-in-difference, industry structure

Procedia PDF Downloads 388
16985 Alloy Design of Single Crystal Ni-base Superalloys by Combined Method of Neural Network and CALPHAD

Authors: Mehdi Montakhabrazlighi, Ercan Balikci

Abstract:

The neural network (NN) method is applied to alloy development of single crystal Ni-base Superalloys with low density and improved mechanical strength. A set of 1200 dataset which includes chemical composition of the alloys, applied stress and temperature as inputs and density and time to rupture as outputs is used for training and testing the network. Thermodynamic phase diagram modeling of the screened alloys is performed with Thermocalc software to model the equilibrium phases and also microsegregation in solidification processing. The model is first trained by 80% of the data and the 20% rest is used to test it. Comparing the predicted values and the experimental ones showed that a well-trained network is capable of accurately predicting the density and time to rupture strength of the Ni-base superalloys. Modeling results is used to determine the effect of alloying elements, stress, temperature and gamma-prime phase volume fraction on rupture strength of the Ni-base superalloys. This approach is in line with the materials genome initiative and integrated computed materials engineering approaches promoted recently with the aim of reducing the cost and time for development of new alloys for critical aerospace components. This work has been funded by TUBITAK under grant number 112M783.

Keywords: neural network, rupture strength, superalloy, thermocalc

Procedia PDF Downloads 313
16984 Disaggregation the Daily Rainfall Dataset into Sub-Daily Resolution in the Temperate Oceanic Climate Region

Authors: Mohammad Bakhshi, Firas Al Janabi

Abstract:

High resolution rain data are very important to fulfill the input of hydrological models. Among models of high-resolution rainfall data generation, the temporal disaggregation was chosen for this study. The paper attempts to generate three different rainfall resolutions (4-hourly, hourly and 10-minutes) from daily for around 20-year record period. The process was done by DiMoN tool which is based on random cascade model and method of fragment. Differences between observed and simulated rain dataset are evaluated with variety of statistical and empirical methods: Kolmogorov-Smirnov test (K-S), usual statistics, and Exceedance probability. The tool worked well at preserving the daily rainfall values in wet days, however, the generated data are cumulated in a shorter time period and made stronger storms. It is demonstrated that the difference between generated and observed cumulative distribution function curve of 4-hourly datasets is passed the K-S test criteria while in hourly and 10-minutes datasets the P-value should be employed to prove that their differences were reasonable. The results are encouraging considering the overestimation of generated high-resolution rainfall data.

Keywords: DiMoN Tool, disaggregation, exceedance probability, Kolmogorov-Smirnov test, rainfall

Procedia PDF Downloads 201
16983 Optimizing Pediatric Pneumonia Diagnosis with Lightweight MobileNetV2 and VAE-GAN Techniques in Chest X-Ray Analysis

Authors: Shriya Shukla, Lachin Fernando

Abstract:

Pneumonia, a leading cause of mortality in young children globally, presents significant diagnostic challenges, particularly in resource-limited settings. This study presents an approach to diagnosing pediatric pneumonia using Chest X-Ray (CXR) images, employing a lightweight MobileNetV2 model enhanced with synthetic data augmentation. Addressing the challenge of dataset scarcity and imbalance, the study used a Variational Autoencoder-Generative Adversarial Network (VAE-GAN) to generate synthetic CXR images, improving the representation of normal cases in the pediatric dataset. This approach not only addresses the issues of data imbalance and scarcity prevalent in medical imaging but also provides a more accessible and reliable diagnostic tool for early pneumonia detection. The augmented data improved the model’s accuracy and generalization, achieving an overall accuracy of 95% in pneumonia detection. These findings highlight the efficacy of the MobileNetV2 model, offering a computationally efficient yet robust solution well-suited for resource-constrained environments such as mobile health applications. This study demonstrates the potential of synthetic data augmentation in enhancing medical image analysis for critical conditions like pediatric pneumonia.

Keywords: pneumonia, MobileNetV2, image classification, GAN, VAE, deep learning

Procedia PDF Downloads 125
16982 ECG Based Reliable User Identification Using Deep Learning

Authors: R. N. Begum, Ambalika Sharma, G. K. Singh

Abstract:

Identity theft has serious ramifications beyond data and personal information loss. This necessitates the implementation of robust and efficient user identification systems. Therefore, automatic biometric recognition systems are the need of the hour, and ECG-based systems are unquestionably the best choice due to their appealing inherent characteristics. The CNNs are the recent state-of-the-art techniques for ECG-based user identification systems. However, the results obtained are significantly below standards, and the situation worsens as the number of users and types of heartbeats in the dataset grows. As a result, this study proposes a highly accurate and resilient ECG-based person identification system using CNN's dense learning framework. The proposed research explores explicitly the calibre of dense CNNs in the field of ECG-based human recognition. The study tests four different configurations of dense CNN which are trained on a dataset of recordings collected from eight popular ECG databases. With the highest FAR of 0.04 percent and the highest FRR of 5%, the best performing network achieved an identification accuracy of 99.94 percent. The best network is also tested with various train/test split ratios. The findings show that DenseNets are not only extremely reliable but also highly efficient. Thus, they might also be implemented in real-time ECG-based human recognition systems.

Keywords: Biometrics, Dense Networks, Identification Rate, Train/Test split ratio

Procedia PDF Downloads 160
16981 Communication Development for Development Communication: Prospects and Challenges of New Media Technologies in South East Zone, Nigeria

Authors: O. I. Ekwueme

Abstract:

New media technologies are noted for their immense contributions in various sectors of the economy which are believed to have resulted in the development of European countries. There is an assumption that we cannot have development communication without communication development, but we are not sure if new media technologies contribute to development in the South-East zone, Nigeria. The study employed mixed method and discovered that new media technologies have a very minimal relationship to development in the South-East zone, Nigeria. It was discovered that the media report on development news is basically informative instead of interactive. The South-East zone is scarcely covered unlike other zones. It argued that the communication technologies introduced in Nigeria was as a result of their struggle for independence. It was recommended that media organisations in the South-East zone should give adequate coverage to the zone, and be more interactive.

Keywords: communication, development, new media, technologies

Procedia PDF Downloads 340
16980 Quantitative Analysis of Contract Variations Impact on Infrastructure Project Performance

Authors: Soheila Sadeghi

Abstract:

Infrastructure projects often encounter contract variations that can significantly deviate from the original tender estimates, leading to cost overruns, schedule delays, and financial implications. This research aims to quantitatively assess the impact of changes in contract variations on project performance by conducting an in-depth analysis of a comprehensive dataset from the Regional Airport Car Park project. The dataset includes tender budget, contract quantities, rates, claims, and revenue data, providing a unique opportunity to investigate the effects of variations on project outcomes. The study focuses on 21 specific variations identified in the dataset, which represent changes or additions to the project scope. The research methodology involves establishing a baseline for the project's planned cost and scope by examining the tender budget and contract quantities. Each variation is then analyzed in detail, comparing the actual quantities and rates against the tender estimates to determine their impact on project cost and schedule. The claims data is utilized to track the progress of work and identify deviations from the planned schedule. The study employs statistical analysis using R to examine the dataset, including tender budget, contract quantities, rates, claims, and revenue data. Time series analysis is applied to the claims data to track progress and detect variations from the planned schedule. Regression analysis is utilized to investigate the relationship between variations and project performance indicators, such as cost overruns and schedule delays. The research findings highlight the significance of effective variation management in construction projects. The analysis reveals that variations can have a substantial impact on project cost, schedule, and financial outcomes. The study identifies specific variations that had the most significant influence on the Regional Airport Car Park project's performance, such as PV03 (additional fill, road base gravel, spray seal, and asphalt), PV06 (extension to the commercial car park), and PV07 (additional box out and general fill). These variations contributed to increased costs, schedule delays, and changes in the project's revenue profile. The study also examines the effectiveness of project management practices in managing variations and mitigating their impact. The research suggests that proactive risk management, thorough scope definition, and effective communication among project stakeholders can help minimize the negative consequences of variations. The findings emphasize the importance of establishing clear procedures for identifying, assessing, and managing variations throughout the project lifecycle. The outcomes of this research contribute to the body of knowledge in construction project management by demonstrating the value of analyzing tender, contract, claims, and revenue data in variation impact assessment. However, the research acknowledges the limitations imposed by the dataset, particularly the absence of detailed contract and tender documents. This constraint restricts the depth of analysis possible in investigating the root causes and full extent of variations' impact on the project. Future research could build upon this study by incorporating more comprehensive data sources to further explore the dynamics of variations in construction projects.

Keywords: contract variation impact, quantitative analysis, project performance, claims analysis

Procedia PDF Downloads 40
16979 Mental Health Diagnosis through Machine Learning Approaches

Authors: Md Rafiqul Islam, Ashir Ahmed, Anwaar Ulhaq, Abu Raihan M. Kamal, Yuan Miao, Hua Wang

Abstract:

Mental health of people is equally important as of their physical health. Mental health and well-being are influenced not only by individual attributes but also by the social circumstances in which people find themselves and the environment in which they live. Like physical health, there is a number of internal and external factors such as biological, social and occupational factors that could influence the mental health of people. People living in poverty, suffering from chronic health conditions, minority groups, and those who exposed to/or displaced by war or conflict are generally more likely to develop mental health conditions. However, to authors’ best knowledge, there is dearth of knowledge on the impact of workplace (especially the highly stressed IT/Tech workplace) on the mental health of its workers. This study attempts to examine the factors influencing the mental health of tech workers. A publicly available dataset containing more than 65,000 cells and 100 attributes is examined for this purpose. Number of machine learning techniques such as ‘Decision Tree’, ‘K nearest neighbor’ ‘Support Vector Machine’ and ‘Ensemble’, are then applied to the selected dataset to draw the findings. It is anticipated that the analysis reported in this study would contribute in presenting useful insights on the attributes contributing in the mental health of tech workers using relevant machine learning techniques.

Keywords: mental disorder, diagnosis, occupational stress, IT workplace

Procedia PDF Downloads 288
16978 Brain Tumor Detection and Classification Using Pre-Trained Deep Learning Models

Authors: Aditya Karade, Sharada Falane, Dhananjay Deshmukh, Vijaykumar Mantri

Abstract:

Brain tumors pose a significant challenge in healthcare due to their complex nature and impact on patient outcomes. The application of deep learning (DL) algorithms in medical imaging have shown promise in accurate and efficient brain tumour detection. This paper explores the performance of various pre-trained DL models ResNet50, Xception, InceptionV3, EfficientNetB0, DenseNet121, NASNetMobile, VGG19, VGG16, and MobileNet on a brain tumour dataset sourced from Figshare. The dataset consists of MRI scans categorizing different types of brain tumours, including meningioma, pituitary, glioma, and no tumour. The study involves a comprehensive evaluation of these models’ accuracy and effectiveness in classifying brain tumour images. Data preprocessing, augmentation, and finetuning techniques are employed to optimize model performance. Among the evaluated deep learning models for brain tumour detection, ResNet50 emerges as the top performer with an accuracy of 98.86%. Following closely is Xception, exhibiting a strong accuracy of 97.33%. These models showcase robust capabilities in accurately classifying brain tumour images. On the other end of the spectrum, VGG16 trails with the lowest accuracy at 89.02%.

Keywords: brain tumour, MRI image, detecting and classifying tumour, pre-trained models, transfer learning, image segmentation, data augmentation

Procedia PDF Downloads 74
16977 Analysis of the Lung Microbiome in Cystic Fibrosis Patients Using 16S Sequencing

Authors: Manasvi Pinnaka, Brianna Chrisman

Abstract:

Cystic fibrosis patients often develop lung infections that range anywhere in severity from mild to life-threatening due to the presence of thick and sticky mucus that fills their airways. Since many of these infections are chronic, they not only affect a patient’s ability to breathe but also increase the chances of mortality by respiratory failure. With a publicly available dataset of DNA sequences from bacterial species in the lung microbiome of cystic fibrosis patients, the correlations between different microbial species in the lung and the extent of deterioration of lung function were investigated. 16S sequencing technologies were used to determine the microbiome composition of the samples in the dataset. For the statistical analyses, referencing helped distinguish between taxonomies, and the proportions of certain taxa relative to another were determined. It was found that the Fusobacterium, Actinomyces, and Leptotrichia microbial types all had a positive correlation with the FEV1 score, indicating the potential displacement of these species by pathogens as the disease progresses. However, the dominant pathogens themselves, including Pseudomonas aeruginosa and Staphylococcus aureus, did not have statistically significant negative correlations with the FEV1 score as described by past literature. Examining the lung microbiology of cystic fibrosis patients can help with the prediction of the current condition of lung function, with the potential to guide doctors when designing personalized treatment plans for patients.

Keywords: bacterial infections, cystic fibrosis, lung microbiome, 16S sequencing

Procedia PDF Downloads 99
16976 Perceptions on Development of the Deaf in Higher Education Level: The Case of Special Education Students in Tiaong, Quezon, Philippines

Authors: Ashley Venerable, Rosario Tatlonghari

Abstract:

This study identified how college deaf students of Bartimaeus Center for Alternative Learning in Tiaong, Quezon, Philippines view development using visual communication techniques and generating themes from responses. Complete enumeration was employed. Guided by Constructivist Theory of Perception, past experiences and stored information influenced perception. These themes of development emerged: social development; pleasant environment; interpersonal relationships; availability of resources; employment; infrastructure development; values; and peace and security. Using the National Economic and Development Authority development indicators, findings showed the deaf students’ views on development were similar from the mainstream views. Responses also became more meaningful through visual communication techniques.

Keywords: deaf, development, perception, development indicators, visual communication

Procedia PDF Downloads 431
16975 Comparison of Existing Predictor and Development of Computational Method for S- Palmitoylation Site Identification in Arabidopsis Thaliana

Authors: Ayesha Sanjana Kawser Parsha

Abstract:

S-acylation is an irreversible bond in which cysteine residues are linked to fatty acids palmitate (74%) or stearate (22%), either at the COOH or NH2 terminal, via a thioester linkage. There are several experimental methods that can be used to identify the S-palmitoylation site; however, since they require a lot of time, computational methods are becoming increasingly necessary. There aren't many predictors, however, that can locate S- palmitoylation sites in Arabidopsis Thaliana with sufficient accuracy. This research is based on the importance of building a better prediction tool. To identify the type of machine learning algorithm that predicts this site more accurately for the experimental dataset, several prediction tools were examined in this research, including the GPS PALM 6.0, pCysMod, GPS LIPID 1.0, CSS PALM 4.0, and NBA PALM. These analyses were conducted by constructing the receiver operating characteristics plot and the area under the curve score. An AI-driven deep learning-based prediction tool has been developed utilizing the analysis and three sequence-based input data, such as the amino acid composition, binary encoding profile, and autocorrelation features. The model was developed using five layers, two activation functions, associated parameters, and hyperparameters. The model was built using various combinations of features, and after training and validation, it performed better when all the features were present while using the experimental dataset for 8 and 10-fold cross-validations. While testing the model with unseen and new data, such as the GPS PALM 6.0 plant and pCysMod mouse, the model performed better, and the area under the curve score was near 1. It can be demonstrated that this model outperforms the prior tools in predicting the S- palmitoylation site in the experimental data set by comparing the area under curve score of 10-fold cross-validation of the new model with the established tools' area under curve score with their respective training sets. The objective of this study is to develop a prediction tool for Arabidopsis Thaliana that is more accurate than current tools, as measured by the area under the curve score. Plant food production and immunological treatment targets can both be managed by utilizing this method to forecast S- palmitoylation sites.

Keywords: S- palmitoylation, ROC PLOT, area under the curve, cross- validation score

Procedia PDF Downloads 77
16974 Evaluation and Compression of Different Language Transformer Models for Semantic Textual Similarity Binary Task Using Minority Language Resources

Authors: Ma. Gracia Corazon Cayanan, Kai Yuen Cheong, Li Sha

Abstract:

Training a language model for a minority language has been a challenging task. The lack of available corpora to train and fine-tune state-of-the-art language models is still a challenge in the area of Natural Language Processing (NLP). Moreover, the need for high computational resources and bulk data limit the attainment of this task. In this paper, we presented the following contributions: (1) we introduce and used a translation pair set of Tagalog and English (TL-EN) in pre-training a language model to a minority language resource; (2) we fine-tuned and evaluated top-ranking and pre-trained semantic textual similarity binary task (STSB) models, to both TL-EN and STS dataset pairs. (3) then, we reduced the size of the model to offset the need for high computational resources. Based on our results, the models that were pre-trained to translation pairs and STS pairs can perform well for STSB task. Also, having it reduced to a smaller dimension has no negative effect on the performance but rather has a notable increase on the similarity scores. Moreover, models that were pre-trained to a similar dataset have a tremendous effect on the model’s performance scores.

Keywords: semantic matching, semantic textual similarity binary task, low resource minority language, fine-tuning, dimension reduction, transformer models

Procedia PDF Downloads 211
16973 Use of Gaussian-Euclidean Hybrid Function Based Artificial Immune System for Breast Cancer Diagnosis

Authors: Cuneyt Yucelbas, Seral Ozsen, Sule Yucelbas, Gulay Tezel

Abstract:

Due to the fact that there exist only a small number of complex systems in artificial immune system (AIS) that work out nonlinear problems, nonlinear AIS approaches, among the well-known solution techniques, need to be developed. Gaussian function is usually used as similarity estimation in classification problems and pattern recognition. In this study, diagnosis of breast cancer, the second type of the most widespread cancer in women, was performed with different distance calculation functions that euclidean, gaussian and gaussian-euclidean hybrid function in the clonal selection model of classical AIS on Wisconsin Breast Cancer Dataset (WBCD), which was taken from the University of California, Irvine Machine-Learning Repository. We used 3-fold cross validation method to train and test the dataset. According to the results, the maximum test classification accuracy was reported as 97.35% by using of gaussian-euclidean hybrid function for fold-3. Also, mean of test classification accuracies for all of functions were obtained as 94.78%, 94.45% and 95.31% with use of euclidean, gaussian and gaussian-euclidean, respectively. With these results, gaussian-euclidean hybrid function seems to be a potential distance calculation method, and it may be considered as an alternative distance calculation method for hard nonlinear classification problems.

Keywords: artificial immune system, breast cancer diagnosis, Euclidean function, Gaussian function

Procedia PDF Downloads 435
16972 JaCoText: A Pretrained Model for Java Code-Text Generation

Authors: Jessica Lopez Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane, El Hassane Ettifouri

Abstract:

Pretrained transformer-based models have shown high performance in natural language generation tasks. However, a new wave of interest has surged: automatic programming language code generation. This task consists of translating natural language instructions to a source code. Despite the fact that well-known pre-trained models on language generation have achieved good performance in learning programming languages, effort is still needed in automatic code generation. In this paper, we introduce JaCoText, a model based on Transformer neural network. It aims to generate java source code from natural language text. JaCoText leverages the advantages of both natural language and code generation models. More specifically, we study some findings from state of the art and use them to (1) initialize our model from powerful pre-trained models, (2) explore additional pretraining on our java dataset, (3) lead experiments combining the unimodal and bimodal data in training, and (4) scale the input and output length during the fine-tuning of the model. Conducted experiments on CONCODE dataset show that JaCoText achieves new state-of-the-art results.

Keywords: java code generation, natural language processing, sequence-to-sequence models, transformer neural networks

Procedia PDF Downloads 284
16971 The Reproducibility and Repeatability of Modified Likelihood Ratio for Forensics Handwriting Examination

Authors: O. Abiodun Adeyinka, B. Adeyemo Adesesan

Abstract:

The forensic use of handwriting depends on the analysis, comparison, and evaluation decisions made by forensic document examiners. When using biometric technology in forensic applications, it is necessary to compute Likelihood Ratio (LR) for quantifying strength of evidence under two competing hypotheses, namely the prosecution and the defense hypotheses wherein a set of assumptions and methods for a given data set will be made. It is therefore important to know how repeatable and reproducible our estimated LR is. This paper evaluated the accuracy and reproducibility of examiners' decisions. Confidence interval for the estimated LR were presented so as not get an incorrect estimate that will be used to deliver wrong judgment in the court of Law. The estimate of LR is fundamentally a Bayesian concept and we used two LR estimators, namely Logistic Regression (LoR) and Kernel Density Estimator (KDE) for this paper. The repeatability evaluation was carried out by retesting the initial experiment after an interval of six months to observe whether examiners would repeat their decisions for the estimated LR. The experimental results, which are based on handwriting dataset, show that LR has different confidence intervals which therefore implies that LR cannot be estimated with the same certainty everywhere. Though the LoR performed better than the KDE when tested using the same dataset, the two LR estimators investigated showed a consistent region in which LR value can be estimated confidently. These two findings advance our understanding of LR when used in computing the strength of evidence in handwriting using forensics.

Keywords: confidence interval, handwriting, kernel density estimator, KDE, logistic regression LoR, repeatability, reproducibility

Procedia PDF Downloads 124
16970 Content-Aware Image Augmentation for Medical Imaging Applications

Authors: Filip Rusak, Yulia Arzhaeva, Dadong Wang

Abstract:

Machine learning based Computer-Aided Diagnosis (CAD) is gaining much popularity in medical imaging and diagnostic radiology. However, it requires a large amount of high quality and labeled training image datasets. The training images may come from different sources and be acquired from different radiography machines produced by different manufacturers, digital or digitized copies of film radiographs, with various sizes as well as different pixel intensity distributions. In this paper, a content-aware image augmentation method is presented to deal with these variations. The results of the proposed method have been validated graphically by plotting the removed and added seams of pixels on original images. Two different chest X-ray (CXR) datasets are used in the experiments. The CXRs in the datasets defer in size, some are digital CXRs while the others are digitized from analog CXR films. With the proposed content-aware augmentation method, the Seam Carving algorithm is employed to resize CXRs and the corresponding labels in the form of image masks, followed by histogram matching used to normalize the pixel intensities of digital radiography, based on the pixel intensity values of digitized radiographs. We implemented the algorithms, resized the well-known Montgomery dataset, to the size of the most frequently used Japanese Society of Radiological Technology (JSRT) dataset and normalized our digital CXRs for testing. This work resulted in the unified off-the-shelf CXR dataset composed of radiographs included in both, Montgomery and JSRT datasets. The experimental results show that even though the amount of augmentation is large, our algorithm can preserve the important information in lung fields, local structures, and global visual effect adequately. The proposed method can be used to augment training and testing image data sets so that the trained machine learning model can be used to process CXRs from various sources, and it can be potentially used broadly in any medical imaging applications.

Keywords: computer-aided diagnosis, image augmentation, lung segmentation, medical imaging, seam carving

Procedia PDF Downloads 222
16969 Assessment of Psychomotor Development of Preschool Children: A Review of Eight Psychomotor Developmental Tools

Authors: Viola Hubačová Pirová

Abstract:

The assessment of psychomotor development allows us to identify children with motor delays, helps us to monitor progress in time and prepare suitable intervention programs. The foundation of psychomotor development lies in pre-school age and is crucial for child´s further cognitive and social development. Many assessment tools of psychomotor development have been developed over the years. Some of them are easy screening tools; others are more complex and sophisticated. The purpose of this review is to describe the history of psychomotor assessment, specify preschool children´s psychomotor evaluation and review eight psychomotor development assessment tools for preschool children (Denver II., DEMOST-PRE, TGMD -2/3, BOT-2, MABC-2, PDMS-2, KTK, MOT 4-6). The selection of test depends on purpose and context in which is the assessment planned.

Keywords: assessment of psychomotor development, preschool children, psychomotor development, review of assessment tools

Procedia PDF Downloads 167
16968 The Education-Development Nexus: The Vision of International Organizations

Authors: Thibaut Lauwerier

Abstract:

This presentation will cover the vision of international organizations on the link between development and education. This issue is very relevant to address the general topic of the conference. 'Educating for development' is indeed at the heart of their discourse. For most of international organizations involved in education, it is important to invest in this field since it is at the service of development. The idea of this presentation is to better understand the vision of development according to these international organizations and how education can contribute to this type of development. To address this issue, we conducted a comparative study of three major international organizations (OECD, UNESCO and World Bank) influencing education policy at the international level. The data come from the strategic reports of these organizations over the period 1990-2015. The results show that the visions of development refer mainly to the neoliberal agenda, despite evolutions, even contradictions. And so, education must increase productivity, improve economic growth, etc. UNESCO, which has a less narrow conception of the development and therefore the aims of education, does not have the same means as the two other organizations to advocate for an alternative vision.

Keywords: development, education, international organizations, poilcy

Procedia PDF Downloads 221
16967 FreGsd: A Framework for Golbal Software Requirement Engineering

Authors: Alsahli Abdulaziz Abdullah, Hameed Ullah Khan

Abstract:

Software development nowadays is more and more using global ways of development instead of normal development enviroment where development occur in one location. This paper is a aimed to propose a Requirement Engineering framework to support Global Software Development environment with regards to all requirment engineering activities from elicitation to fially magning requirment change. Global software enviroment is more and more gaining better reputation in software developmet with better quality is resulting from developing in this eviroment yet with lower cost.However, failure rate developing in this enviroment is high due to inapproprate requirment development and managment.This paper will add to the software engineering development envrioments discipline and many developers in GSD will benefit from it.

Keywords: global software development environment, GSD, requirement engineering, FreGsd, computer engineering

Procedia PDF Downloads 549
16966 Microfinance and Microenterprise Development: Evidence from Bangladesh

Authors: Rahat Dewan

Abstract:

The debate surrounding the efficacy of microfinance and the importance of microenterprise is fierce, lengthy and multifaceted. This paper reviews key issues, theory and evidence surrounding microfinance and microenterprise development for poverty alleviation. We report on a recently completed, large-scale microenterprise development intervention in Bangladesh using the rudimentary data available to us, and also our own qualitative field research. We find reasonable evidence for significant returns to several development outcomes.

Keywords: Bangladesh, development, microenterprise, microfinance

Procedia PDF Downloads 234
16965 A Study of Agile Based Approaches to Improve Software Quality

Authors: Gurmeet Kaur

Abstract:

Agile software development methods are being recognized as popular, and efficient approach to the development of software system that has a short delivery period with high quality also that meets customer requirements with zero defect. In agile software development, quality means quality of code where in the quality is maintained through the use of methods or approaches like refactoring, test driven development, behavior driven development, acceptance test driven development, and demand driven development. Software quality is measured in term of metrics such as the number of defects during development of software. Usage of above mentioned methods or approaches, reduces the possibilities of defects in developed software, and hence improve quality. This paper focuses on study of agile based quality methods or approaches for software development that ensures improved quality of software as well as reduced cost, and customer satisfaction.

Keywords: ATDD, BDD, DDD, TDD

Procedia PDF Downloads 172
16964 Statistical Models and Time Series Forecasting on Crime Data in Nepal

Authors: Dila Ram Bhandari

Abstract:

Throughout the 20th century, new governments were created where identities such as ethnic, religious, linguistic, caste, communal, tribal, and others played a part in the development of constitutions and the legal system of victim and criminal justice. Acute issues with extremism, poverty, environmental degradation, cybercrimes, human rights violations, crime against, and victimization of both individuals and groups have recently plagued South Asian nations. Everyday massive number of crimes are steadfast, these frequent crimes have made the lives of common citizens restless. Crimes are one of the major threats to society and also for civilization. Crime is a bone of contention that can create a societal disturbance. The old-style crime solving practices are unable to live up to the requirement of existing crime situations. Crime analysis is one of the most important activities of the majority of intelligent and law enforcement organizations all over the world. The South Asia region lacks such a regional coordination mechanism, unlike central Asia of Asia Pacific regions, to facilitate criminal intelligence sharing and operational coordination related to organized crime, including illicit drug trafficking and money laundering. There have been numerous conversations in recent years about using data mining technology to combat crime and terrorism. The Data Detective program from Sentient as a software company, uses data mining techniques to support the police (Sentient, 2017). The goals of this internship are to test out several predictive model solutions and choose the most effective and promising one. First, extensive literature reviews on data mining, crime analysis, and crime data mining were conducted. Sentient offered a 7-year archive of crime statistics that were daily aggregated to produce a univariate dataset. Moreover, a daily incidence type aggregation was performed to produce a multivariate dataset. Each solution's forecast period lasted seven days. Statistical models and neural network models were the two main groups into which the experiments were split. For the crime data, neural networks fared better than statistical models. This study gives a general review of the applied statistics and neural network models. A detailed image of each model's performance on the available data and generalizability is provided by a comparative analysis of all the models on a comparable dataset. Obviously, the studies demonstrated that, in comparison to other models, Gated Recurrent Units (GRU) produced greater prediction. The crime records of 2005-2019 which was collected from Nepal Police headquarter and analysed by R programming. In conclusion, gated recurrent unit implementation could give benefit to police in predicting crime. Hence, time series analysis using GRU could be a prospective additional feature in Data Detective.

Keywords: time series analysis, forecasting, ARIMA, machine learning

Procedia PDF Downloads 164
16963 A Comparative Asessment of Some Algorithms for Modeling and Forecasting Horizontal Displacement of Ialy Dam, Vietnam

Authors: Kien-Trinh Thi Bui, Cuong Manh Nguyen

Abstract:

In order to simulate and reproduce the operational characteristics of a dam visually, it is necessary to capture the displacement at different measurement points and analyze the observed movement data promptly to forecast the dam safety. The accuracy of forecasts is further improved by applying machine learning methods to data analysis progress. In this study, the horizontal displacement monitoring data of the Ialy hydroelectric dam was applied to machine learning algorithms: Gaussian processes, multi-layer perceptron neural networks, and the M5-rules algorithm for modelling and forecasting of horizontal displacement of the Ialy hydropower dam (Vietnam), respectively, for analysing. The database which used in this research was built by collecting time series of data from 2006 to 2021 and divided into two parts: training dataset and validating dataset. The final results show all three algorithms have high performance for both training and model validation, but the MLPs is the best model. The usability of them are further investigated by comparison with a benchmark models created by multi-linear regression. The result show the performance which obtained from all the GP model, the MLPs model and the M5-Rules model are much better, therefore these three models should be used to analyze and predict the horizontal displacement of the dam.

Keywords: Gaussian processes, horizontal displacement, hydropower dam, Ialy dam, M5-Rules, multi-layer perception neural networks

Procedia PDF Downloads 210
16962 Network and Sentiment Analysis of U.S. Congressional Tweets

Authors: Chaitanya Kanakamedala, Hansa Pradhan, Carter Gilbert

Abstract:

Social media platforms, such as Twitter, are excellent datasets for understanding human interactions and sentiments. This report explores social dynamics among US Congressional members through a network analysis applied to a dataset of tweets spanning 2008 to 2017 from the ’US Congressional Tweets Dataset’. In this report, we preform network analysis where connections between users (edges) are established based on a similarity threshold: two tweets are connected if the tweets they post are similar. By utilizing the Natural Language Toolkit (NLTK) and NetworkX, we quantified tweet similarity and constructed a graph comprising various interconnected components. Each component represents a cluster of users with closely aligned content. We then preform sentiment analysis on each cluster to explore the prevalent emotions and opinions within these groups. Our findings reveal that despite the initial expectation of distinct ideological divisions typically aligning with party lines, the analysis exposed a high degree of topical convergence across tweets from different political affiliations. The analysis preformed in this report not only highlights the potential of social media as a tool for political communication but also suggests a complex layer of interaction that transcends traditional partisan boundaries, reflecting a complicated landscape of politics in the digital age.

Keywords: natural language processing, sentiment analysis, centrality analysis, topic modeling

Procedia PDF Downloads 33