Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 456

Search results for: Pima Indians diabetes dataset

336 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Clustering is an active area of data mining research that assists in biological pattern recognition and the extraction of new knowledge from raw data. Clustering is the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects in other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters, yielding an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers, and a poor choice may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy for initializing the K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results show that efficiency is significantly improved.

Keywords: Microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization.
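
The sensitivity to initial centers described in this abstract can be illustrated with a small sketch. The abstract does not say which initialization strategy the authors propose, so the farthest-first seeding below is a stand-in technique chosen for illustration, and the one-dimensional data points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Stand-in seeding (not the paper's method): start from the first point,
    then repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep empty clusters in place.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical data: three well-separated pairs on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]

# Poor choice: two of the three initial centers fall in the same group.
poor = kmeans(points, [(0.0,), (1.0,), (10.0,)])
seeded = kmeans(points, farthest_first_centers(points, 3))

print(sse(points, poor), sse(points, seeded))
```

With the poor initialization the iterations settle on a local optimum whose criterion value is far worse than the one reached from the spread-out seeds, which is the failure mode the abstract refers to.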

Downloads: 1237

335 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well-known data mining technique used in pattern recognition and information retrieval. The dataset to be clustered can contain either categorical or numeric data, and each type of data has its own clustering algorithm: k-means for numeric datasets and k-modes for categorical datasets. A main problem encountered in data mining applications is clustering the categorical data that is so prevalent in real datasets. One way to cluster categorical values is to transform the categorical attributes into numeric measures and apply the k-means algorithm directly instead of k-modes. This paper experiments with such an approach, transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previous method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are evaluated. The results show that the proposed method outperforms the binary method in all cases.

Keywords: Clustering, k-means, categorical datasets, pattern recognition, unsupervised learning, knowledge discovery.
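
As a rough illustration of the transformation described in this abstract, the sketch below replaces each categorical value with the relative frequency of that value within its attribute. The function name and the sample rows are hypothetical, and the paper's exact procedure may differ.

```python
from collections import Counter


def relative_frequency_encode(rows):
    """Replace each categorical value by its relative frequency in its column."""
    n = len(rows)
    # One frequency table per attribute (column).
    tables = [Counter(column) for column in zip(*rows)]
    return [[tables[j][value] / n for j, value in enumerate(row)] for row in rows]


# Hypothetical categorical dataset with two attributes.
rows = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("red", "small"),
]

encoded = relative_frequency_encode(rows)
for original, numeric in zip(rows, encoded):
    print(original, numeric)
```

The encoded rows are plain numeric vectors, so a standard k-means implementation can be applied to them directly.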

Downloads: 3496

334 A New DIDS Design Based on a Combination Feature Selection Approach

Authors: Adel Sabry Eesa, Adnan Mohsin Abdulazeez Brifcani, Zeynep Orman

Abstract:

Feature selection has been used in many fields such as classification, data mining and object recognition, and has proven effective for removing irrelevant and redundant features from the original dataset. This paper proposes a new design for a distributed intrusion detection system using a combination feature selection model based on the Bees Algorithm and a decision tree. The Bees Algorithm is used as the search strategy to find the optimal subset of features, whereas the decision tree is used to judge the selected features. Both the produced features and the generated rules are used by the Decision Making Mobile Agent to decide whether there is an attack in the network. The Decision Making Mobile Agent migrates through the network, moving from one node to another; if it finds an attack on one of the nodes, it alerts the user through the User Interface Agent or takes some action through the Action Mobile Agent. The KDD Cup 99 dataset is used to test the effectiveness of the proposed system. The results show that even if only four features are used, the proposed system performs better than it does using all 41 features.

Keywords: Distributed intrusion detection system, mobile agent, feature selection, Bees Algorithm, decision tree.

Downloads: 1888

333 NDENet: End-to-End Nighttime Dehazing and Enhancement

Authors: H. Baskar, A. S. Chakravarthy, P. Garg, D. Goel, A. S. Raj, K. Kumar, Lakshya, R. Parvatham, V. Sushant, B. Kumar Rout

Abstract:

In this paper, we present a computer vision task called nighttime dehaze-enhancement. This task aims to jointly perform dehazing and lightness enhancement. Our task fundamentally differs from nighttime dehazing: our goal is to jointly dehaze and enhance scenes, while nighttime dehazing aims to dehaze scenes under a nighttime setting. In order to facilitate further research on this task, we release a benchmark dataset called the Reside-β Night dataset, consisting of 4122 nighttime hazed images from 2061 scenes and 2061 ground truth images. Moreover, we also propose a network called NDENet (Nighttime Dehaze-Enhancement Network), which jointly performs dehazing and low-light enhancement in an end-to-end manner. We evaluate our method on the proposed benchmark and achieve a Structural Similarity Index (SSIM) of 0.8962 and a Peak Signal-to-Noise Ratio (PSNR) of 26.25. We also compare our network with other baseline networks on our benchmark to demonstrate the effectiveness of our approach. We believe that nighttime dehaze-enhancement is an essential task, particularly for autonomous navigation applications, and hope that our work will open up new frontiers in research. The code for our network is made publicly available.

Keywords: Dehazing, image enhancement, nighttime, computer vision.
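
For readers unfamiliar with the PSNR metric reported in this abstract, the following sketch computes it from the standard definition, PSNR = 10 log10(MAX^2 / MSE). The tiny grayscale "images" and the 8-bit peak value of 255 are assumptions for illustration; this is not the authors' evaluation code.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in decibels for two equally sized 2-D images."""
    squared_errors = [
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 grayscale images; every pixel differs by exactly 1.
ground_truth = [[10, 20], [30, 40]]
restored = [[11, 19], [31, 39]]

print(round(psnr(ground_truth, restored), 2))
```

Higher values indicate a restored image closer to the ground truth.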

Downloads: 561

332 Impact of Social Media on the Functioning of the Indian Government: A Critical Analysis

Authors: Priya Sepaha

Abstract:

Social media has emerged as the most effective tool in recent times for flagging the causes, content, opinions and direction of any social movement, and has demonstrated that it will have a far-reaching effect on government as well. This study focuses on India, which has emerged as the fastest growing community on social media. Social movement activists, in particular, have extensively used the power of digital social media to make social protest on particular issues more effective through successful mass mobilizations. This research analyses the role and impact of social media as a force that catalyzes social movements in India, and further seeks to describe how certain social movements are resisted, subverted, co-opted and/or deployed by social media. The impact assessment draws on cases, policies and social movements in India that asserted numerous social issues troubling the public and eventually paved the way for remarkable judicial decisions. The paper concludes that, despite its pros and cons, social media has already become an indispensable tool in the hands of social media-savvy Indians who are committed to bringing about a desired change, as its impact on the functioning of the Indian Government demonstrates.

Keywords: Impact, Indian government, misuse, social media, social movement.

Downloads: 931

331 Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings

Authors: Leong Lee, Cyriac Kandoth, Jennifer L. Leopold, Ronald L. Frank

Abstract:

Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms has rarely exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. Although this demonstrated that RT-RICO might be a promising approach for predicting secondary structure, the algorithm's computational complexity and program running time limited its use. Herein a parallelized implementation of a slightly modified RT-RICO approach is presented. This new version of the algorithm facilitated the testing of a much larger dataset of 396 protein domains [2]. Parallelized RT-RICO achieved a Q3 score of 74.6%, which is higher than the consensus prediction accuracy of 72.9% that was achieved for the same test dataset by a combination of four secondary structure prediction methods [2].

Keywords: data mining, protein secondary structure prediction, parallelization.
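
The Q3 score quoted in this abstract is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal sketch follows; the two label strings are hypothetical and are not taken from the paper's datasets.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical labels for a ten-residue segment (H = helix, E = strand, C = coil).
observed = "HHHHEEECCC"
predicted = "HHHEEEECCH"

print(q3_accuracy(predicted, observed))
```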

Downloads: 1562

330 Improved Blood Glucose-Insulin Monitoring with Dual-Layer Predictive Control Design

Authors: Vahid Nademi

Abstract:

Although wearable medical devices equipped with a continuous glucose monitor (CGM) and an insulin pump are widely used, advanced control methods are still needed to get the full benefit of these devices. Unlike costly clinical trials, implementing effective insulin-glucose control strategies can contribute significantly to patients suffering from chronic diseases such as diabetes. This study examines a two-layer insulin-glucose regulator based on a model predictive control (MPC) scheme, designed so that the patient’s predicted glucose profile complies with the insulin level injected automatically through the insulin pump. This is achieved with an iterative optimization algorithm, the integrated perturbation analysis and sequential quadratic programming (IPA-SQP) solver, which handles uncertainties due to unexpected variations in glucose-insulin values and the body’s characteristics. The feasibility of the control approach is evaluated through numerical simulations of two case scenarios using measured data. The results verify the superior and reliable performance of the proposed control scheme, with no negative impact on patient safety.

Keywords: Blood glucose monitoring, insulin pump, optimization, predictive control, diabetes disease.

Downloads: 708

329 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very large datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important large genomic, non-coding datasets representing genome sequences. In this paper, a hybrid method for the classification of miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features is high relative to the number of samples, and the data are imbalanced. A feature selection method is used to select the features best able to distinguish classes and to eliminate obscure features. Afterward, a Convolutional Neural Network (CNN) classifier is used to classify cancer types, with a Genetic Algorithm employed to find optimized hyper-parameters for the CNN. To speed up classification by the CNN, a Graphics Processing Unit (GPU) is recommended for evaluating the mathematical equations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: Cancer classification, feature selection, deep learning, genetic algorithm.

Downloads: 1185

328 Citizenship Norms and the Participation of Young Adults in a Democracy

Authors: Samsudin A. Rahim, Latiffah Pawanteh, Ali Salman

Abstract:

This paper explores the changing trend in citizenship norms among young citizens from various ethnic groups in Malaysia and the extent to which it influences the participation of young citizens in political and civil issues. Embedded in democratic constitutions are the rights and freedoms that accompany citizenship, and these rights and freedoms include participation. Participation in democracies should go beyond voting; it should include taking part in the governance process. The political process is not at risk even though politics does not work as it did in the past. A national sample of 1697 respondents between the ages of 21 and 40 years were interviewed in January 2011. The findings show that respondents embrace an engaged-citizenship norm more than they do the traditional duty-citizen norm. Among the ethnic groups, the Chinese show lower means in both citizenship norms compared with other ethnic groups, namely, the Malays and the Indians. The duty-citizen norm correlates higher with political participation than with civic participation. On the other hand, the engaged-citizen norm correlates higher with civic participation than with political participation.

Keywords: citizenship norms, political participation, civic participation, youths, globalization

Downloads: 2484

327 Disaggregation the Daily Rainfall Dataset into Sub-Daily Resolution in the Temperate Oceanic Climate Region

Authors: Mohammad Bakhshi, Firas Al Janabi

Abstract:

High-resolution rainfall data are very important as input to hydrological models. Among the methods for generating high-resolution rainfall data, temporal disaggregation was chosen for this study. The paper attempts to generate three rainfall resolutions (4-hourly, hourly and 10-minute) from daily data for a record period of around 20 years. The process was carried out with the DiMoN tool, which is based on a random cascade model and the method of fragments. Differences between the observed and simulated rainfall datasets are evaluated with a variety of statistical and empirical methods: the Kolmogorov-Smirnov (K-S) test, usual statistics, and exceedance probability. The tool preserved the daily rainfall values on wet days well; however, the generated data are concentrated in a shorter time period and produce stronger storms. The difference between the generated and observed cumulative distribution function curves of the 4-hourly datasets passes the K-S test criteria, while for the hourly and 10-minute datasets the P-value should be used to show that the differences are reasonable. The results are encouraging considering the overestimation of the generated high-resolution rainfall data.

Keywords: DiMoN tool, disaggregation, exceedance probability, Kolmogorov-Smirnov Test, rainfall.
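
The K-S comparison mentioned in this abstract rests on the largest vertical gap between two empirical cumulative distribution functions. The sketch below computes that two-sample statistic only, without the p-value or critical-value step. The rainfall depths are hypothetical and the DiMoN tool itself is not reproduced here.

```python
from bisect import bisect_right


def ks_statistic(observed, simulated):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute difference
    between the empirical CDFs of the two samples."""
    obs, sim = sorted(observed), sorted(simulated)
    return max(
        abs(bisect_right(obs, x) / len(obs) - bisect_right(sim, x) / len(sim))
        for x in obs + sim
    )


# Hypothetical 4-hourly rainfall depths in millimetres.
observed_mm = [1.0, 2.0, 3.0, 4.0]
simulated_mm = [3.0, 4.0, 5.0, 6.0]

print(ks_statistic(observed_mm, simulated_mm))
```

A smaller statistic means the simulated distribution tracks the observed one more closely. Deciding whether the gap is acceptable requires comparing it with a critical value or computing a p-value.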

Downloads: 947

326 Analysis of Metallothionein Gene MT1A (rs11076161) and MT2A (rs10636) Polymorphisms as a Molecular Marker in Type 2 Diabetes Mellitus among Malay Population

Authors: Norsakinah Mohammad Osman, Ali Etemad, Patimah Ismail

Abstract:

Type 2 diabetes mellitus (T2DM) is a complex metabolic disorder characterized by high blood glucose resulting from insulin resistance and insulin insufficiency due to deterioration of Langerhans β-cell function. T2DM is commonly caused by a combination of inherited genetic variations and lifestyle. Metallothionein (MT) is a cysteine-rich protein that helps maintain zinc homeostasis, which is important in insulin signaling and secretion, and that protects the body from reactive oxygen species (ROS). MT scavenges ROS and free radicals, which are among the causes of T2DM and its complications. The objective of this study was to investigate the association of MT1A and MT2A polymorphisms with T2DM by comparing T2DM and control subjects in the Malay population. This study involved 150 T2DM and 120 healthy individuals of Malay ethnicity and mixed gender. Genomic DNA was extracted from buccal cells and amplified for the MT1A and MT2A loci; the 347bp and 238bp banding patterns, respectively, were produced by means of the Polymerase Chain Reaction (PCR). The PCR products were digested with the MluCI and Tsp45I restriction enzymes, producing fragment lengths of (158/189/347bp) and (103/135/238bp), respectively. An ANOVA test showed a significant difference between diabetic and control subjects for age, BMI, WHR, SBP, FPG, HBA1C, LDL, TG, TC and family history (P<0.05), while HDL, CVD risk ratio and DBP did not show any significant difference (P>0.05). The genotype frequencies for AA, AG and GG of the MT1A polymorphism were 72.7%, 22.7% and 4.7% in cases and 15%, 55% and 30% in controls, respectively. For MT2A, the genotype frequencies of GG, GC and CC were 42.7%, 27.3% and 30% in cases and 5%, 40% and 55% in controls, respectively. Both polymorphisms show a significant difference between the two investigated groups (P=0.000). A post hoc test shows a significant difference between the genotypes within each polymorphism (P=0.000). The MT1A and MT2A polymorphisms are believed to be reliable molecular markers for distinguishing T2DM subjects from healthy individuals in the Malay population.

Keywords: Type 2 Diabetes Mellitus (T2DM), Metallothionein (MT), MT1A (rs11076161), MT2A (rs10636), Malay, Genetic Polymorphism.
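
The genotype frequencies reported in this abstract can be turned into allele frequencies with the usual counting rule, in which each genotype contributes half of its share to each of its two alleles. This worked example is an added illustration rather than an analysis from the paper. It uses only the MT1A percentages quoted above and normalizes by their sum, because the rounded case percentages add up to 100.1.

```python
def allele_frequencies(genotype_percentages):
    """Allele frequencies from genotype percentages of a biallelic locus.
    Each genotype is a two-letter string such as "AG"."""
    total = sum(genotype_percentages.values())
    shares = {}
    for genotype, percentage in genotype_percentages.items():
        for allele in genotype:
            # Each of the two alleles in a genotype carries half of its share.
            shares[allele] = shares.get(allele, 0.0) + percentage / 2
    return {allele: share / total for allele, share in shares.items()}


# MT1A genotype percentages as reported in the abstract.
mt1a_cases = {"AA": 72.7, "AG": 22.7, "GG": 4.7}
mt1a_controls = {"AA": 15.0, "AG": 55.0, "GG": 30.0}

print(allele_frequencies(mt1a_cases))
print(allele_frequencies(mt1a_controls))
```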

Downloads: 2278

325 ECG-Based Heartbeat Classification Using Convolutional Neural Networks

Authors: Jacqueline R. T. Alipo-on, Francesca I. F. Escobar, Myles J. T. Tan, Hezerul Abdul Karim, Nouar AlDahoul

Abstract:

Electrocardiogram (ECG) signal analysis and processing are crucial in the diagnosis of cardiovascular diseases, which are considered one of the leading causes of mortality worldwide. However, the traditional rule-based analysis of large volumes of ECG data is time-consuming, labor-intensive, and prone to human error. With the advancement of the programming paradigm, algorithms such as machine learning have been increasingly used to analyze ECG signals. In this paper, various deep learning algorithms were adapted to classify five classes of heartbeat types. The dataset used in this work is the synthetic MIT-Beth Israel Hospital (MIT-BIH) Arrhythmia dataset produced from generative adversarial networks (GANs). Various deep learning models such as the ResNet-50 convolutional neural network (CNN), 1-D CNN, and long short-term memory (LSTM) were evaluated and compared. ResNet-50 was found to outperform the other models in terms of recall and F1 score, with five-fold average scores of 98.88% and 98.87%, respectively. 1-D CNN, on the other hand, was found to have the highest average precision of 98.93%.

Keywords: Heartbeat classification, convolutional neural network, electrocardiogram signals, ECG signals, generative adversarial networks, long short-term memory, LSTM, ResNet-50.

Downloads: 125

324 JaCoText: A Pretrained Model for Java Code-Text Generation

Authors: Jessica Lòpez Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane, El Hassane Ettifouri

Abstract:

Pretrained transformer-based models have shown high performance in natural language generation tasks. However, a new wave of interest has surged: automatic programming language generation. This task consists of translating natural language instructions into programming code. Although well-known pretrained language generation models have achieved good performance in learning programming languages, effort is still needed in automatic code generation. In this paper, we introduce JaCoText, a model based on the Transformer neural network. It aims to generate Java source code from natural language text. JaCoText leverages the advantages of both natural language and code generation models. More specifically, we study some findings from the state of the art and use them to (1) initialize our model from powerful pretrained models, (2) explore additional pretraining on our Java dataset, (3) carry out experiments combining unimodal and bimodal data in training, and (4) scale the input and output length during the fine-tuning of the model. Experiments conducted on the CONCODE dataset show that JaCoText achieves new state-of-the-art results.

Keywords: Java code generation, Natural Language Processing, Sequence-to-sequence Models, Transformers Neural Networks.

Downloads: 742

323 Use of Gaussian-Euclidean Hybrid Function Based Artificial Immune System for Breast Cancer Diagnosis

Authors: Cuneyt Yucelbas, Seral Ozsen, Sule Yucelbas, Gulay Tezel

Abstract:

Because only a small number of artificial immune system (AIS) approaches can handle nonlinear problems, nonlinear AIS approaches need to be developed alongside the well-known solution techniques. The Gaussian function is commonly used for similarity estimation in classification and pattern recognition problems. In this study, diagnosis of breast cancer, the second most widespread cancer in women, was performed with three different distance calculation functions (Euclidean, Gaussian and a Gaussian-Euclidean hybrid) in the clonal selection model of classical AIS on the Wisconsin Breast Cancer Dataset (WBCD), which was taken from the University of California, Irvine Machine Learning Repository. We used 3-fold cross-validation to train and test on the dataset. According to the results, the maximum test classification accuracy was 97.35%, obtained with the Gaussian-Euclidean hybrid function on fold 3. The mean test classification accuracies were 94.78%, 94.45% and 95.31% for the Euclidean, Gaussian and Gaussian-Euclidean functions, respectively. With these results, the Gaussian-Euclidean hybrid function appears to be a promising distance calculation method, and it may be considered an alternative for hard nonlinear classification problems.

Keywords: Artificial Immune System, Breast Cancer Diagnosis, Euclidean Function, Gaussian Function.
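
The 3-fold cross-validation protocol mentioned in this abstract can be sketched as follows. The abstract does not describe how samples were assigned to folds, so the interleaved assignment below is an assumption, and the sample count of nine is purely illustrative.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.
    Samples are assigned to folds in an interleaved fashion (an assumption)."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out, test_indices in enumerate(folds):
        train_indices = [
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        ]
        yield train_indices, test_indices


# Illustrative run with nine samples and three folds.
splits = list(k_fold_indices(9, 3))
for train_indices, test_indices in splits:
    print("train:", train_indices, "test:", test_indices)
```

Each sample is used for testing exactly once, and the reported mean accuracy is the average of the per-fold test accuracies.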

Downloads: 2081

322 The Reproducibility and Repeatability of Modified Likelihood Ratio for Forensics Handwriting Examination

Authors: O. Abiodun Adeyinka, B. Adeyemo Adesesan

Abstract:

The forensic use of handwriting depends on the analysis, comparison, and evaluation decisions made by forensic document examiners. When using biometric technology in forensic applications, it is necessary to compute the Likelihood Ratio (LR) to quantify the strength of evidence under two competing hypotheses, namely the prosecution and the defense hypotheses, for which a set of assumptions and methods is applied to a given data set. It is therefore important to know how repeatable and reproducible the estimated LR is. This paper evaluated the accuracy and reproducibility of examiners' decisions. Confidence intervals for the estimated LR are presented to avoid relying on an incorrect estimate that could lead to a wrong judgment in a court of law. The LR is fundamentally a Bayesian concept, and two LR estimators are used in this paper: Logistic Regression (LoR) and the Kernel Density Estimator (KDE). Repeatability was evaluated by repeating the initial experiment after an interval of six months to observe whether examiners would repeat their decisions for the estimated LR. The experimental results, which are based on a handwriting dataset, show that the LR has different confidence intervals, which implies that the LR cannot be estimated with the same certainty everywhere. Although LoR performed better than KDE when tested on the same dataset, the two LR estimators showed a consistent region in which the LR value can be estimated confidently. These two findings advance our understanding of the LR when it is used to compute the strength of handwriting evidence in forensics.

Keywords: Logistic Regression LoR, Kernel Density Estimator KDE, Handwriting, Confidence Interval, Repeatability, Reproducibility.
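
A KDE-based likelihood ratio of the kind this abstract refers to can be sketched as the ratio of two estimated densities evaluated at a comparison score. The similarity scores, the Gaussian kernel and the fixed bandwidth below are all assumptions made for illustration, not the authors' data or settings.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with a Gaussian kernel."""
    normalizer = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / normalizer

    return density


def likelihood_ratio(score, same_writer_scores, different_writer_scores, bandwidth=0.1):
    """LR = p(score | same writer) / p(score | different writers)."""
    p_same = gaussian_kde(same_writer_scores, bandwidth)(score)
    p_different = gaussian_kde(different_writer_scores, bandwidth)(score)
    return p_same / p_different


# Hypothetical similarity scores in [0, 1] from known comparisons.
same_writer = [0.80, 0.85, 0.90]
different_writer = [0.20, 0.25, 0.30]

print(likelihood_ratio(0.82, same_writer, different_writer))
print(likelihood_ratio(0.28, same_writer, different_writer))
```

An LR above 1 supports the same-writer hypothesis and an LR below 1 supports the different-writer hypothesis. In the tails, where both densities are small, the ratio becomes unstable, which is one reason confidence intervals on the LR matter.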

Downloads: 418

321 The Antidiabetic Properties of Indonesian Swietenia mahagoni in Alloxan-Induced Diabetic Rats

Authors: T. Wresdiyati, S. Sa’diah, A. Winarto

Abstract:

Diabetes mellitus (DM) is a metabolic disease indicated by a high level of blood glucose. The objective of this study was to observe the antidiabetic properties of the ethanolic extract of Indonesian Swietenia mahagoni Jacq. seed on the profile of pancreatic superoxide dismutase and β-cells in alloxan-induced experimental diabetic rats. The Swietenia mahagoni seed was obtained from Leuwiliang-Bogor, Indonesia. Extraction of Swietenia mahagoni was done using ethanol with the maceration method. A total of 25 male Sprague-Dawley rats were divided into five groups: (a) negative control group, (b) positive control group (DM), (c) DM group treated with Swietenia mahagoni seed extract, (d) DM group treated with acarbose, and (e) non-DM group treated with Swietenia mahagoni seed extract. Diabetes was induced in the DM groups with alloxan (110 mg/kg BW). The extract was orally administered to diabetic rats at 500 mg/kg BW/day for 28 days. The extract showed a hypoglycemic effect, increased body weight, increased the content of superoxide dismutase in the pancreatic tissue, and delayed the rate of β-cell damage in experimental diabetic rats. These results suggest that the ethanolic extract of Indonesian Swietenia mahagoni Jacq. seed could be proposed as a potential antidiabetic agent.

Keywords: β-cell, diabetes mellitus, superoxide dismutase, Swietenia mahagoni.

Downloads: 1418

320 Incidence of Chronic Disease and Lipid Profile in Veteran Rugby Athletes

Authors: Mike Climstein, Joe Walsh, John Best, Ian Timothy Heazlewood, Stephen Burke, Jyrki Kettunen, Kent Adams, Mark DeBeliso

Abstract:

Recently, the health of retired National Football League players, particularly linemen, has been investigated. A number of studies have reported increased cardiometabolic risk, premature cardiovascular disease and incidence of type 2 diabetes. Rugby union players have somatotypes very similar to National Football League players, which suggests that rugby players may have similar health risks. The International Golden Oldies World Rugby Festival (GORF) provided a unique opportunity to investigate the demographics of veteran rugby players. METHODOLOGIES: A cross-sectional, observational study was completed using an online web-based questionnaire that consisted of medical history and physiological measures. Data analysis was completed using a one sample t-test (<50yrs versus >50yrs) and Chi-square test. RESULTS: A total of 216 veteran rugby competitors (response rate = 6.8%) representing 10 countries, aged 35-72 yrs (mean 51.2, S.D. ±8.0), participated in the online survey. As a group, the incidence of current smokers was low at 8.8% (avg 72.4 cigs/wk), whilst the percentage consuming alcohol was high (93.1%, avg 11.2 drinks/wk). Competitors reported the following top six chronic diseases/disorders: hypertension (18.6%), arthritis (OA/RA, 11.5%), asthma (9.3%), hyperlipidemia (8.2%), diabetes (all types, 7.5%) and gout (6%). There were significant differences between groups with regard to cancer (all types) and migraines. When compared to the Australian general population (Australian Bureau of Statistics data, n=18,000), GORF competitors had a significantly lower incidence of anxiety (p<0.01), arthritis (p<0.06) and depression (p<0.01); however, they had a significantly higher incidence of diabetes (p<0.03) and hypertension (p<0.01). The GORF competitors also reported taking the following prescribed medications: antihypertensives (13%), hypolipidemics (8%), non-steroidal anti-inflammatories (6%), and anticoagulants (4%). Significant differences between groups were observed in antihypertensives, anticoagulants and hypolipidemics. There were significant (p<0.05) differences between groups (<50yrs versus >50yrs) with regard to height (180 vs 177cm), weight (97.6 vs 93.1kg), BMI (30 vs 29.7kg/m2) and waist circumference (85.7 vs 93.1cm); however, there were no differences in the subsequent parameters of systolic blood pressure, diastolic blood pressure, total cholesterol, HDL-C, LDL-C, triglycerides-C or fasting plasma glucose. CONCLUSIONS: This represents the first collection of demographics on this cohort. GORF participants demonstrated increased cardiometabolic risk with regard to the incidence of hypercholesterolemia, hypertension and type 2 diabetes. Preventative strategies should be developed to reduce this risk, with education about these risks for future participants.

Keywords: Masters athlete, rugby union, risk factors, chronic disease

Downloads: 2022

319 Static Analysis of Security Issues of the Python Packages Ecosystem

Authors: Adam Gorine, Faten Spondon

Abstract:

Python is considered the most popular programming language and offers its own ecosystem for archiving and maintaining open-source software packages. This system is called the Python Package Index (PyPI), the repository of this programming language. Unfortunately, one-third of these software packages have vulnerabilities that allow attackers to execute code automatically when a vulnerable or malicious package is installed. This paper contributes to large-scale empirical studies investigating security issues in the Python ecosystem by evaluating package vulnerabilities. The findings provide a series of implications that can help secure software ecosystems by improving the process of discovering, fixing, and managing package vulnerabilities. The vulnerability dataset is generated using the National Vulnerability Database (NVD) and the Snyk vulnerability dataset. In addition, we evaluated 807 vulnerability reports in the NVD and 3900 publicly known security vulnerabilities in the Python Package Manager (Pip) from the Snyk database, covering 2002 to 2022. The results show that most Python vulnerabilities are of high severity, followed by medium severity. The most problematic areas have been improper input validation and denial of service attacks. A hybrid scanning tool that combines three scanners, Bandit, Snyk and Dlint, and provides a clear report of code vulnerabilities is also described.

Keywords: Python vulnerabilities, Bandit, Snyk, Dlint, Python Package Index, ecosystem, static analysis, malicious attacks.

Downloads: 146

318 Face Recognition Using Principal Component Analysis, K-Means Clustering, and Convolutional Neural Network

Authors: Zukisa Nante, Wang Zenghui

Abstract:

Face recognition is the problem of identifying or recognizing individuals in an image. This paper investigates a possible solution to this problem. The method combines Principal Component Analysis (PCA), K-Means clustering, and a Convolutional Neural Network (CNN) in a face recognition system. It is trained and evaluated using the ORL dataset. This dataset consists of 400 different faces with 40 classes of 10 face images per class. Firstly, PCA enables the use of a smaller network, which reduces the training time of the CNN. Thus, we get rid of the redundancy and preserve the variance with a smaller number of coefficients. Secondly, the K-Means clustering model is trained on the PCA-compressed data, which selects K-Means clustering centers with better characteristics. Lastly, the K-Means characteristics or features serve as initial values of the CNN and act as input data. The accuracy and the performance of the proposed method were tested in comparison to other Face Recognition (FR) techniques, namely PCA, Support Vector Machine (SVM), and K-Nearest Neighbour (kNN). During experimentation, the accuracy and the performance of our suggested method after 90 epochs achieved the highest performance: 99% accuracy F1-Score, 99% precision, and 99% recall in 463.934 seconds. It outperformed PCA, which obtained 97%, and kNN, with 84%, during the conducted experiments. Therefore, this method proved to be efficient in identifying faces in the images.

Keywords: Face recognition, Principal Component Analysis, PCA, Convolutional Neural Network, CNN, Rectified Linear Unit, ReLU, feature extraction.

Downloads: 438

317 Anti-Diabetic Effect of Bryophyllum pinnatum Leaves

Authors: E. F. Aransiola, M.O. Daramola, E. O. Iwalewa, A. M. Seluwa, O. O. Olufowobi

Abstract:

Diabetes is a chronic metabolic disorder that affects the quality of life in terms of physical health, social and psychological well-being. In spite of the enormous progress in the treatment of diabetes using existing commercial drugs, such as insulin and oral hypoglycemic agents, the search for new drugs is imperative due to several limitations of the commercial drugs. In addition, the existing diabetic drugs are expensive and unaffordable for the rural populace in developing countries. The present study demonstrates the anti-diabetic property of aqueous extract of Bryophyllum pinnatum (BP) leaves using diabetic rats (albino rats) as models. At the same time, the anti-diabetic effect of the aqueous extract was compared to that of a sample containing a mixture of the extract and a commercial diabetic medicine, glibenclamide. A specified dosage of aqueous extract of BP leaves was administered to the experimental diabetic rats, and their blood glucose level (BGL) was measured and recorded. The results showed a significant drop in the BGL of the diabetic rats to a value close to the normal blood glucose level within 120 minutes when only aqueous extract from BP leaves was used. When a sample containing a mixture of the aqueous extract and glibenclamide was administered, a further drop in BGL was observed. Therefore, the results reveal that aqueous extract of Bryophyllum pinnatum leaves has significant anti-diabetic properties, and that the performance of the existing drugs (glibenclamide) could be enhanced with the use of the aqueous extract.

Keywords: Anti-diabetics, Bryophyllum pinnatum, Blood glucose level, albino rats.

Downloads: 5193

316 Classification of Potential Biomarkers in Breast Cancer Using Artificial Intelligence Algorithms and Anthropometric Datasets

Authors: Aref Aasi, Sahar Ebrahimi Bajgani, Erfan Aasi

Abstract:

Breast cancer (BC) continues to be the most frequent cancer in females and causes the highest number of cancer-related deaths in women worldwide. Inspired by recent advances in studying the relationship between different patient attributes and features and the disease, in this paper we investigate different classification methods for better diagnosis of BC in the early stages. In this regard, datasets from the University Hospital Centre of Coimbra were chosen, and different machine learning (ML)-based and neural network (NN) classifiers have been studied. For this purpose, we have selected favorable features among the nine provided attributes from the clinical dataset by using a random forest algorithm. This dataset consists of both healthy controls and BC patients, and it was noted that glucose, BMI, resistin, and age have the most importance, respectively. Moreover, we have analyzed these features with various ML-based classifier methods, including Decision Tree (DT), K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM), along with the NN-based Multi-Layer Perceptron (MLP) classifier. The results revealed that among the different techniques, the SVM and MLP classifiers achieve the highest accuracy, 96% and 92%, respectively. These results indicate that the adopted procedure could be used effectively for the classification of cancer cells, and they encourage further experimental investigations with more collected data for other types of cancers.

Keywords: Breast cancer, health diagnosis, Machine Learning, biomarker classification, Neural Network.

Downloads: 235

315 Empirical Roughness Progression Models of Heavy Duty Rural Pavements

Authors: Nahla H. Alaswadko, Rayya A. Hassan, Bayar N. Mohammed

Abstract:

Empirical deterministic models have been developed to predict roughness progression of heavy duty spray sealed pavements for a dataset representing rural arterial roads. The dataset provides a good representation of the relevant network and covers a wide range of operating and environmental conditions. A large sample of historical time series data for many pavement sections has been collected and prepared for use in multilevel regression analysis. The modelling parameters include road roughness as the performance parameter and traffic loading, time, initial pavement strength, reactivity level of subgrade soil, climate condition, and condition of drainage system as predictor parameters. The purpose of this paper is to report the approaches adopted for model development and validation. The study presents multilevel models that can account for the correlation among time series data of the same section and capture the effect of unobserved variables. Study results show that the models fit the data very well. The contribution and significance of relevant influencing factors in predicting roughness progression are presented and explained. The paper concludes that the models developed with this analysis approach are accurate and reliable, as shown by their good fit to the validation data.

Keywords: Roughness progression, empirical model, pavement performance, heavy duty pavement.

Downloads: 758

314 ECG Based Reliable User Identification Using Deep Learning

Authors: R. N. Begum, Ambalika Sharma, G. K. Singh

Abstract:

Identity theft has serious ramifications beyond data and personal information loss. This necessitates the implementation of robust and efficient user identification systems. Therefore, automatic biometric recognition systems are the need of the hour, and electrocardiogram (ECG)-based systems are unquestionably the best choice due to their appealing inherent characteristics. Convolutional Neural Networks (CNNs) are the current state-of-the-art technique for ECG-based user identification systems. However, the results obtained are significantly below standards, and the situation worsens as the number of users and types of heartbeats in the dataset grows. As a result, this study proposes a highly accurate and resilient ECG-based person identification system using CNN's dense learning framework. The proposed research explicitly explores the capability of dense CNNs in the field of ECG-based human recognition. The study tests four different configurations of dense CNN, which are trained on a dataset of recordings collected from eight popular ECG databases. With the highest False Acceptance Rate (FAR) of 0.04% and the highest False Rejection Rate (FRR) of 5%, the best performing network achieved an identification accuracy of 99.94%. The best network is also tested with various train/test split ratios. The findings show that DenseNets are not only extremely reliable, but also highly efficient. Thus, they might also be implemented in real-time ECG-based human recognition systems.
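For readers unfamiliar with the two error rates quoted above, the following is a minimal sketch of how FAR and FRR are commonly computed from match scores. The abstract does not state the exact evaluation protocol, so the thresholded-score setting, the function name `far_frr` and the toy scores are all assumptions made for illustration.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for a system that accepts any score >= threshold.

    FAR: fraction of impostor attempts wrongly accepted.
    FRR: fraction of genuine attempts wrongly rejected.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Toy similarity scores, invented for this example only.
genuine = [0.91, 0.85, 0.78, 0.40]
impostor = [0.10, 0.35, 0.52, 0.20, 0.05]
far, frr = far_frr(genuine, impostor, threshold=0.5)
```

Raising the threshold lowers FAR at the cost of a higher FRR, which is why the two rates are usually reported together.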

Keywords: Biometrics, dense networks, identification rate, train/test split ratio.

Downloads: 478

313 Linear Prediction System in Measuring Glucose Level in Blood

Authors: Intan Maisarah Abd Rahim, Herlina Abdul Rahim, Rashidah Ghazali

Abstract:

Diabetes is a medical condition that can lead to various diseases such as stroke, heart disease, blindness and obesity. In clinical practice, the concern of diabetic patients about blood glucose examination is rather alarming, as some individuals describe it as painful, like a pinprick or a pinch. For patients with a high glucose level, pricking the fingers multiple times a day with a conventional glucose meter for close monitoring can be tiresome, time consuming and painful. With these concerns, several non-invasive techniques have been used by researchers to measure the glucose level in blood, including ultrasonic sensor implementation, multisensory systems, absorbance of transmittance, bio-impedance, voltage intensity, and thermography. This paper discusses the application of near-infrared (NIR) spectroscopy as a non-invasive method for measuring the glucose level and the implementation of a linear system identification model for predicting the output data of the NIR measurement. In this study, the wavelengths considered are 1450 nm and 1950 nm. Both of these wavelengths showed the most reliable information on the presence of glucose in blood. Then, the linear Autoregressive Moving Average Exogenous (ARMAX) model, with both un-regularized and regularized methods, was implemented to predict the output of the NIR measurement in order to investigate the practicality of the linear system in this study. However, the system achieved only 50.11% accuracy, which is far from satisfactory.
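For reference, the general textbook form of an ARMAX model is shown below. The abstract does not give the model orders or variable assignments, so the symbols here are assumptions: $y(t)$ is the predicted output, $u(t)$ the measured input, $e(t)$ white noise, $q^{-1}$ the backward shift operator, and $n_a$, $n_b$, $n_c$ the polynomial orders.

```latex
A(q)\,y(t) = B(q)\,u(t) + C(q)\,e(t),
\qquad
\begin{aligned}
A(q) &= 1 + a_1 q^{-1} + \dots + a_{n_a} q^{-n_a},\\
B(q) &= b_1 q^{-1} + \dots + b_{n_b} q^{-n_b},\\
C(q) &= 1 + c_1 q^{-1} + \dots + c_{n_c} q^{-n_c}.
\end{aligned}
```

The model is linear in $y$ and $u$, so a strongly nonlinear relationship between the NIR response and glucose concentration would be one plausible reason for a poor fit.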

Keywords: Diabetes, glucose level, linear, near-infrared (NIR), non-invasive, prediction system.

Downloads: 830

312 Generative Adversarial Network Based Fingerprint Anti-Spoofing Limitations

Authors: Yehjune Heo

Abstract:

Fingerprint Anti-Spoofing approaches have been actively developed and applied in real-world applications. One of the main problems of Fingerprint Anti-Spoofing is that it is not robust to unseen samples, especially in real-world scenarios. A possible solution is to generate artificial but realistic fingerprint samples and use them for training in order to achieve good generalization. This paper contains experimental and comparative results with currently popular GAN based methods and uses realistic synthesis of fingerprints in training in order to increase the performance. Among various GAN models, the most popular, StyleGAN, is used for the experiments. The CNN models were first trained with the dataset that did not contain generated fake images, and the accuracy along with the mean average error rate were recorded. Then, the fake generated images (fake images of live fingerprints and fake images of spoof fingerprints) were each combined with the original images (real images of live fingerprints and real images of spoof fingerprints), and various CNN models were trained. For each CNN model trained with the dataset containing generated fake images, the best performance was noted, and each time the accuracy and the mean average error rate were recorded. We observe that current GAN based approaches need significant improvements for the Anti-Spoofing performance, although the overall quality of the synthesized fingerprints seems to be reasonable. We include the analysis of this performance degradation, especially with a small number of samples. In addition, we suggest several approaches towards improved generalization with a small number of samples, by focusing on what GAN based approaches should learn and should not learn.

Keywords: Anti-spoofing, CNN, fingerprint recognition, GAN.

Downloads: 543

311 Evaluation of the Impact of Dataset Characteristics for Classification Problems in Biological Applications

Authors: Kanthida Kusonmano, Michael Netzer, Bernhard Pfeifer, Christian Baumgartner, Klaus R. Liedl, Armin Graber

Abstract:

Availability of high dimensional biological datasets such as from gene expression, proteomic, and metabolic experiments can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and separate between predefined classes such as patients with a special disease versus healthy controls. However, most of the existing research only focuses on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, namely Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest, based on mock datasets. We mimic common biological scenarios simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The result shows that SVM performs quite stably and reaches a higher AUC compared to other methods. This may be explained by the ability of SVM to minimize the probability of error. Moreover, Decision Tree, with its good applicability for diagnosis and prognosis, shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better when having a higher number of discriminators.
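Since the comparison above is reported in terms of AUC, a short sketch of that metric may help. It uses the pairwise definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function name `auc` and the toy labels and scores are assumptions for illustration and do not come from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = patient, 0 = healthy control (invented values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc(labels, scores)
```

This quadratic-time version is only meant to show the definition; for large datasets a rank-based computation is the usual choice.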

Keywords: Classification, High dimensional data, Machine learning

Downloads: 2337

310 Comparison of Methods of Estimation for Use in Goodness of Fit Tests for Binary Multilevel Models

Authors: I. V. Pinto, M. R. Sooriyarachchi

Abstract:

It can be frequently observed that the data arising in our environment have a hierarchical or nested structure. Multilevel modelling is a modern approach to handle this kind of data. When multilevel modelling is combined with a binary response, the estimation methods become complex, and the usual techniques are derived from the quasi-likelihood method. The estimation methods compared in this study are marginal quasi-likelihood (order 1 & order 2) (MQL1, MQL2) and penalized quasi-likelihood (order 1 & order 2) (PQL1, PQL2). A statistical model is of no use if it does not reflect the given dataset. Therefore, checking the adequacy of the fitted model through a goodness-of-fit (GOF) test is an essential stage in any modelling procedure. However, prior to usage, it is equally important to confirm that the GOF test performs well and is suitable for the given model. This study assesses the suitability of the GOF test developed for binary response multilevel models with respect to the method used in model estimation. An extensive set of simulations was conducted using MLwiN (v 2.19) with varying numbers of clusters, cluster sizes and intra-cluster correlations. The test maintained the desirable Type-I error for models estimated using PQL2, and it failed for almost all the combinations of MQL. Power of the test was adequate for most of the combinations in all estimation methods except MQL1. Moreover, models were fitted using the four methods to a real-life dataset, and the performance of the test was compared for each model.
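The abstract does not write out the model, so as a hedged illustration, a two-level random-intercept logistic model in its usual textbook form is given below. The symbols are assumptions, not taken from the paper: $y_{ij}$ is the binary response of unit $i$ in cluster $j$, $x_{ij}$ a covariate, and $u_j$ the cluster-level random effect. The second line gives the intra-cluster correlation under the common latent-variable formulation.

```latex
\begin{aligned}
y_{ij} \sim \mathrm{Bernoulli}(\pi_{ij}), \qquad
\operatorname{logit}(\pi_{ij}) &= \beta_0 + \beta_1 x_{ij} + u_j, \qquad
u_j \sim N(0, \sigma_u^2),\\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}.
\end{aligned}
```

Here $\pi^2/3$ is the variance of the standard logistic distribution, so a larger $\sigma_u^2$ corresponds to a stronger intra-cluster correlation in the simulated data.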

Keywords: Goodness-of-fit test, marginal quasi-likelihood, multilevel modelling, type-I error, penalized quasi-likelihood, power, quasi-likelihood.

Downloads: 700

309 Time Series Forecasting Using Various Deep Learning Models

Authors: Jimeng Shi, Mahek Jain, Giri Narasimhan

Abstract:

Time Series Forecasting (TSF) is used to predict the target variables at a future time point based on the learning from previous time points. To keep the problem tractable, learning methods use data from a fixed length window in the past as an explicit input. In this paper, we study how the performance of predictive models changes as a function of different look-back window sizes and different amounts of time to predict into the future. We also consider the performance of the recent attention-based transformer models, which had good success in the image processing and natural language processing domains. In all, we compare four different deep learning methods (Recurrent Neural Network (RNN), Long Short-term Memory (LSTM), Gated Recurrent Units (GRU), and Transformer) along with a baseline method. The dataset (hourly) we used is the Beijing Air Quality Dataset from the website of University of California, Irvine (UCI), which includes a multivariate time series of many factors measured on an hourly basis for a period of 5 years (2010-14). For each model, we also report on the relationship between the performance and the look-back window sizes and the number of predicted time points into the future. Our experiments suggest that Transformer models have the best performance with the lowest Mean Absolute Errors (MAE = 14.599, 23.273) and Root Mean Square Errors (RMSE = 23.573, 38.131) for most of our single-step and multi-step predictions. The best size for the look-back window to predict 1 hour into the future appears to be one day, while 2 or 4 days perform the best to predict 3 hours into the future.
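The look-back window and the two error metrics described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names, the toy series and the persistence baseline (predict the last observed value) are hypothetical, and the abstract does not say which baseline method the authors used.

```python
import math


def make_windows(series, look_back, horizon):
    """Pair each look_back-long input window with the value `horizon` steps after it."""
    pairs = []
    for start in range(len(series) - look_back - horizon + 1):
        window = series[start:start + look_back]
        target = series[start + look_back + horizon - 1]
        pairs.append((window, target))
    return pairs


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy hourly readings, invented for this example.
series = [10, 12, 15, 14, 13, 17, 20, 18]
pairs = make_windows(series, look_back=3, horizon=2)

# Assumed persistence baseline: forecast the last value seen in the window.
actual = [target for _, target in pairs]
predicted = [window[-1] for window, _ in pairs]
baseline_mae = mae(actual, predicted)
baseline_rmse = rmse(actual, predicted)
```

Varying `look_back` and `horizon` in `make_windows` corresponds to the two quantities whose effect on performance the study reports.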

Keywords: Air quality prediction, deep learning algorithms, time series forecasting, look-back window.

Downloads: 1047

308 Fully Automated Methods for the Detection and Segmentation of Mitochondria in Microscopy Images

Authors: Blessing Ojeme, Frederick Quinn, Russell Karls, Shannon Quinn

Abstract:

The detection and segmentation of mitochondria from fluorescence microscopy is crucial for understanding the complex structure of the nervous system. However, the constant fission and fusion of mitochondria and image distortion in the background make the task of detection and segmentation challenging. Although there exist a number of open-source software tools and artificial intelligence (AI) methods designed for analyzing mitochondrial images, the scarcity of combined expertise in the medical field and AI required to use these tools poses a challenge to their full adoption and use in clinical settings. Motivated by the advantages of automated methods in terms of good performance, minimum detection time, ease of implementation, and cross-platform compatibility, this study proposes a fully automated framework for the detection and segmentation of mitochondria using both image shape information and descriptive statistics. Using the low-cost, open-source Python and OpenCV library, the algorithms are implemented in three stages: pre-processing; image binarization; and coarse-to-fine segmentation. The proposed model is validated using the fluorescence mitochondrial dataset. Ground truth labels generated using Labkit were also used to evaluate the performance of our detection and segmentation model using precision, recall and Rand index. The study produces good detection and segmentation results and reports the challenges encountered during the image analysis of mitochondrial morphology from the fluorescence mitochondrial dataset. A discussion on the methods and future perspectives of fully automated frameworks concludes the paper.
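The three evaluation measures named above can be illustrated on flattened pixel labels. This is a minimal sketch with hypothetical function names and toy masks; it does not reproduce the authors' evaluation code, and it assumes label 1 marks mitochondria pixels.

```python
from itertools import combinations


def precision_recall(truth, predicted):
    """Pixel-wise precision and recall, treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)


def rand_index(truth, predicted):
    """Fraction of pixel pairs on which both labelings agree (same or different segment)."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = predicted[i] == predicted[j]
        agree += int(same_truth == same_pred)
        total += 1
    return agree / total


# Toy 2x2 masks flattened row by row (invented values).
truth_mask = [0, 0, 1, 1]
pred_mask = [1, 1, 0, 1]
precision, recall = precision_recall(truth_mask, pred_mask)
ri = rand_index(truth_mask, pred_mask)
```

Unlike precision and recall, the Rand index does not depend on which segment is called "positive", so it is unchanged if the segment labels are swapped.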

Keywords: 2D, Binarization, CLAHE, detection, fluorescence microscopy, mitochondria, segmentation.

Downloads: 371

307 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find an efficient way to predict the yield of corn based on meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with the environment as dynamical systems. However, calibrating the dynamic system is difficult, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of dataset (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process, but it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods: methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-fold cross-validation process and two accuracy metrics, root mean square error of prediction (RMSEP) and mean absolute error of prediction (MAEP), were used to evaluate the crop prediction capacity.
The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from easily accessible datasets offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.
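The evaluation procedure described above, k-fold cross-validation scored with RMSEP and MAEP, can be sketched as follows. Several things are assumptions here: a one-variable least-squares line stands in for the regression methods, the folds are contiguous and unshuffled, the data are synthetic, and the errors are in absolute units. The abstract reports MAEP as a percentage, and the exact normalisation is not stated, so it is not reproduced.

```python
import math


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def cross_validate(xs, ys, k=5):
    """Return (RMSEP, MAEP) computed over all held-out predictions."""
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.extend(ys[i] - (a + b * xs[i]) for i in fold)
    rmsep = math.sqrt(sum(e * e for e in errors) / len(errors))
    maep = sum(abs(e) for e in errors) / len(errors)
    return rmsep, maep


# Synthetic, exactly linear data so the expected errors are essentially zero.
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
rmsep, maep = cross_validate(xs, ys, k=5)
```

With real records, the indices would normally be shuffled before splitting so that each fold is representative of the whole dataset.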

Keywords: Crop yield prediction, crop model, sensitivity analysis, parameter estimation, particle swarm optimization, random forest.

Downloads: 1134