Search results for: Pima Indians diabetes dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 456

426 Causal Modeling of the Glucose-Insulin System in Type-I Diabetic Patients

Authors: J. Fernandez, N. Aguilar, R. Fernandez de Canete, J. C. Ramos-Diaz

Abstract:

In this paper, a simulation model of the glucose-insulin system for a patient with Type 1 diabetes is developed using a causal modeling approach under system dynamics. The OpenModelica simulation environment was used to build the so-called causal model, and the glucose-insulin model parameters were adjusted to fit recorded mean data from a diabetic patient database. Model results were obtained under different conditions of a three-meal glucose ingestion pattern and exogenous insulin patterns. This simulation model can be useful for evaluating glucose-insulin performance in several circumstances, including insulin infusion algorithms in open loop and decision support systems in closed loop.

Keywords: Causal modeling, diabetes, glucose-insulin system, OpenModelica software.

Downloads: 1383

425 Diagnosis of Diabetes Using Computer Methods: Soft Computing Methods for Diabetes Detection Using Iris

Authors: Piyush Samant, Ravinder Agarwal

Abstract:

Complementary and Alternative Medicine (CAM) techniques are quite popular and effective for chronic diseases. Iridology is a CAM technique more than 150 years old that analyzes the patterns, tissue weakness, color, shape, structure, etc. of the iris for disease diagnosis. The objective of this paper is to validate the use of iridology for the diagnosis of diabetes. The suggested model was applied to a systemic disease with ocular effects. Data from 200 subjects, 100 diabetic and 100 non-diabetic, were evaluated. The complete procedure was kept very simple and free from the involvement of any iridologist. The region of interest was cropped from the normalized iris. In total, 63 features were extracted using statistical analysis, texture analysis, and the two-dimensional discrete wavelet transform. A comparison of the accuracies of six different classifiers is presented. The results show 89.66% accuracy for the random forest classifier.

Keywords: Complementary and alternative medicine, Iridology, iris, feature extraction, classification, disease prediction.

Downloads: 1786

424 Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach

Authors: K. Thangavel, R. Rathipriya

Abstract:

Over the past decade, biclustering has become a popular data mining technique, not only in biological data analysis but also in other applications with high-dimensional two-way datasets, such as text mining and market data analysis. Biclustering clusters both the rows and the columns of a dataset simultaneously, as opposed to traditional clustering, which clusters either rows or columns. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. The Firefly Algorithm (FA) is a recently proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of a discrete version of FA (DFA) for the task of mining coherent, large-volume biclusters from web usage datasets. The experiments were conducted on two web usage datasets from a public dataset repository, and the performance of FA was compared with that of another population-based metaheuristic, binary Particle Swarm Optimization (PSO). The results demonstrate the usefulness of DFA in tackling the biclustering problem.

Keywords: Biclustering, Binary Particle Swarm Optimization, Discrete Firefly Algorithm, Firefly Algorithm, Usage profile, Web usage mining.

Downloads: 2037

423 Contributory Factors to Diabetes Dietary Regimen Non Adherence in Adults with Diabetes

Authors: Okolie Uchenna, Ehiemere Ijeoma, Ezenduka Pauline, Ogbu Sylvester

Abstract:

A cross-sectional survey design was used to collect data from 370 diabetic patients. Two instruments were used to obtain data: an in-depth interview guide and a researcher-developed questionnaire. Fisher's exact test was used to investigate the association between the identified factors and non-adherence. The factors identified were: socio-demographic factors such as gender, age, marital status, educational level and occupation; psychosocial obstacles such as non-affordability of the prescribed diet, frustration due to the restriction, limited spousal support, feelings of deprivation, the feeling that temptation is inevitable, difficulty in adhering at social gatherings and difficulty in revealing to a host that one is diabetic; and health care provider obstacles, namely poor attitude of health workers, irregular diabetes education in clinics, a limited number of nutrition education sessions, inability of patients to estimate the desired quantity of food, no reminder postcards or phone calls about upcoming appointments, and delayed start of appointments and time wasting in clinics.

Keywords: Behavior change, diabetes mellitus, dietary management, diet adherence.

Downloads: 3384

422 Role of Oxidative DNA Damage in Pathogenesis of Diabetic Neuropathy

Authors: Ireneusz Majsterek, Anna Merecz, Agnieszka Sliwinska, Marcin Kosmalski, Jacek Kasznicki, Jozef Drzewoski

Abstract:

Oxidative stress is considered to cause the onset and progression of type 2 diabetes mellitus (T2DM) and its complications, including neuropathy. It is a deleterious process that can be an important mediator of damage to cell structures: proteins, lipids and DNA. Data suggest that in patients with diabetes and diabetic neuropathy DNA repair is impaired, which prevents effective removal of lesions. Objective: The aim of our study was to evaluate the association of the hOGG1 (326 Ser/Cys) and XRCC1 (194 Arg/Trp, 399 Arg/Gln) gene polymorphisms, whose protein products are involved in the base excision repair (BER) pathway, with DNA repair efficiency in patients with type 2 diabetes and diabetic neuropathy compared to healthy subjects. Genotypes were determined by PCR-RFLP analysis in 385 subjects, including 117 with type 2 diabetes, 56 with diabetic neuropathy and 212 with normal glucose metabolism. The polymorphisms studied include codon 326 of hOGG1 and codons 194 and 399 of XRCC1 in the BER genes. The comet assay was carried out using peripheral blood lymphocytes from the patients and controls. This test enabled the evaluation of DNA damage in cells exposed to hydrogen peroxide alone and in combination with endonuclease III (Nth). The results of the polymorphism analysis were statistically examined by calculating odds ratios (OR) and their 95% confidence intervals (95% CI) using χ² tests. Our data indicate that patients with type 2 diabetes mellitus (including those with neuropathy) had a higher frequency of the homozygous (GG) genotype of the XRCC1 399 Arg/Gln polymorphism (OR: 1.85 [95% CI: 1.07-3.22], P=0.3) and an increased frequency of the 399Gln (G) allele (OR: 1.38 [95% CI: 1.03-1.83], P=0.3). No association was found between the other polymorphisms and an increased risk of diabetes or diabetic neuropathy. In T2DM patients with neuropathy, repair of oxidative DNA damage induced by hydrogen peroxide was less efficient in both the presence and the absence of the Nth enzyme.
The results of our study suggest that the XRCC1 399 Arg/Gln polymorphism is a significant risk factor for T2DM in the Polish population. The data obtained suggest that decreased efficiency of DNA repair in cells from patients with diabetes and neuropathy may be associated with oxidative stress. Additionally, patients with neuropathy are characterized by even greater sensitivity to oxidative damage than patients with diabetes alone, which suggests that free radicals participate in the pathogenesis of neuropathy.
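
The odds ratio and 95% confidence interval reported above can be illustrated with a minimal sketch. It assumes a 2x2 genotype-by-status table and the Wald (Woolf) interval on the log odds ratio; the abstract does not state which interval method the authors used, and the counts below are hypothetical, not taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: cases with the genotype,    b: cases without it
    c: controls with the genotype, d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only.
or_value, ci_low, ci_high = odds_ratio_ci(a=40, b=60, c=30, d=90)
print(f"OR = {or_value:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that excludes 1 is usually read as evidence of association at the corresponding significance level.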

Keywords: Diabetic neuropathy, oxidative stress, gene polymorphisms, oxidative DNA damage.

Downloads: 2031

421 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class-Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

The problems arising from unbalanced data sets commonly appear in real-world applications. Because of unequal class distribution, many researchers have found that the performance of existing classifiers tends to be biased towards the majority class. The k-nearest neighbors nonparametric discriminant analysis is a method that was proposed for classifying unbalanced classes with good performance. In this study, discriminant analysis methods are used to investigate misclassification error rates for class-imbalanced data of three diabetes risk groups. The purpose of this study was to compare the classification performance of parametric and nonparametric discriminant analysis in a three-class classification of class-imbalanced data of diabetes risk groups. Data from a project maintaining healthy conditions for 599 employees of a government hospital in Bangkok were obtained for the classification problem. The employees were divided into three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data, including the variables diabetes risk group, age, gender, blood glucose, and BMI, were analyzed and bootstrapped for 50 and 100 samples, 599 observations per sample, for additional estimation of the misclassification error rate. Each data set was examined for departure from multivariate normality and for equality of the covariance matrices of the three risk groups. Both the original data and the bootstrap samples showed non-normality and unequal covariance matrices. The parametric linear discriminant function, the quadratic discriminant function, and the nonparametric k-nearest neighbors discriminant function were applied to the 50 and 100 bootstrap samples and to the original data. In searching for the optimal classification rule, the prior probabilities were set to both equal proportions (0.33:0.33:0.33) and unequal proportions of (0.90:0.05:0.05), (0.80:0.10:0.10) and (0.70:0.15:0.15).
The results from the 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach with k=3 or k=4 and prior probabilities for non-risk:risk:diabetic of 0.90:0.05:0.05 or 0.80:0.10:0.10 gave the smallest misclassification error rate. The k-nearest neighbors approach is therefore suggested for classifying three-class imbalanced data of diabetes risk groups.
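
The combination of bootstrap resampling with a prior-weighted k-nearest neighbors rule can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' procedure: the class score `prior_c * k_c / n_c`, the function names, the toy two-feature data, and the choice to evaluate each bootstrap-trained rule on the original data are all assumptions made for the example.

```python
import random
from collections import Counter

def knn_predict(train, x, k, priors):
    """Classify x with a prior-weighted k-NN rule (assumed scoring rule).

    train: list of (features, label); priors: dict label -> prior probability.
    The score for class c is prior_c * k_c / n_c, where k_c is the number of
    the k nearest neighbours in class c and n_c is the size of class c.
    """
    class_sizes = Counter(label for _, label in train)
    nearest = sorted(
        train, key=lambda row: sum((p - q) ** 2 for p, q in zip(row[0], x))
    )[:k]
    votes = Counter(label for _, label in nearest)
    return max(class_sizes, key=lambda c: priors[c] * votes[c] / class_sizes[c])

def bootstrap_error(data, k, priors, n_boot, seed=0):
    """Mean misclassification rate over rules trained on bootstrap resamples.

    Each rule is evaluated on the original data, so the estimate is optimistic
    for points that also appear in the resample.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        wrong = sum(knn_predict(sample, x, k, priors) != y for x, y in data)
        rates.append(wrong / len(data))
    return sum(rates) / n_boot

# Toy imbalanced data (90/5/5), purely synthetic.
gen = random.Random(42)
centers = {"non-risk": (0.0, 0.0), "risk": (4.0, 4.0), "diabetic": (8.0, 0.0)}
sizes = {"non-risk": 90, "risk": 5, "diabetic": 5}
data = [((cx + gen.gauss(0, 1), cy + gen.gauss(0, 1)), label)
        for label, (cx, cy) in centers.items() for _ in range(sizes[label])]
priors = {"non-risk": 0.90, "risk": 0.05, "diabetic": 0.05}
error = bootstrap_error(data, k=3, priors=priors, n_boot=50)
print(f"mean bootstrap misclassification rate: {error:.3f}")
```

Changing `priors` or `k` and rerunning with the same seed gives a like-for-like comparison of the kind the abstract describes.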

Keywords: Bootstrap, diabetes risk groups, error rate, k-nearest neighbors.

Downloads: 1974

420 Performance Analysis of Traffic Classification with Machine Learning

Authors: Htay Htay Yi, Zin May Aye

Abstract:

Network security plays a key role in the ICT environment because the number of malicious users is continually growing in education, business, and other ICT-related fields. Network security violations are typically described and examined centrally using a security event management system. Firewalls, Intrusion Detection Systems (IDS), and Intrusion Prevention Systems are becoming essential for monitoring or preventing potential violations, attack incidents, and imminent threats. In this system, firewall rules are set only where the system policies require them. The datasets used in this system are derived from the testbed environment. DoS and PortScan traffic is applied in the testbed, which implements a firewall and an IDS. Network traffic in the testbed environment is classified as normal or attack using six machine learning classification methods. Testing is required to obtain the datasets for DoS and PortScan. The dataset is based on CICIDS2017, with some features added. The system tested 26 features from the applied dataset. It aims to reduce false positive rates and improve accuracy in the implemented testbed design. The system also shows good performance when important features are selected and when it is compared against an existing dataset using machine learning classifiers.

Keywords: False negative rate, intrusion detection system, machine learning methods, performance.

Downloads: 1021

419 Multi-Layer Perceptron and Radial Basis Function Neural Network Models for Classification of Diabetic Retinopathy Disease Using Video-Oculography Signals

Authors: Ceren Kaya, Okan Erkaymaz, Orhan Ayar, Mahmut Özer

Abstract:

Diabetes Mellitus (Diabetes) is a disease based on insulin hormone disorders that causes high blood glucose. Clinical findings indicate that diabetes can be diagnosed from electrophysiological signals obtained from the vital organs. 'Diabetic Retinopathy' is one of the most common eye diseases resulting from diabetes, and it is the leading cause of vision loss due to structural alteration of the retinal layer vessels. In this study, features of horizontal and vertical Video-Oculography (VOG) signals are used to classify non-proliferative and proliferative diabetic retinopathy. Twenty-five features are extracted by applying the discrete wavelet transform to VOG signals taken from 21 subjects. Two models, based on the multi-layer perceptron (MLP) and the radial basis function (RBF), are recommended for the diagnosis of Diabetic Retinopathy. The proposed models can also detect the level of the disease. We compare the classification performance of the proposed models. Our results show that the proposed RBF model (100%) achieves better classification performance than the MLP model (94%).
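
As a rough illustration of how wavelet-based features can be derived from a one-dimensional signal, the sketch below applies one level of the Haar discrete wavelet transform and computes simple statistics per sub-band. The Haar wavelet, the single decomposition level, the chosen statistics, and the sample values are all assumptions; the abstract does not specify them.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def subband_features(coeffs):
    """Mean, population standard deviation, and energy of one sub-band."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / n)
    energy = sum(c * c for c in coeffs)
    return mean, std, energy

# Made-up samples standing in for a short signal segment.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_dwt(signal)
features = subband_features(approx) + subband_features(detail)
print(features)
```

Because the Haar transform is orthonormal, the energies of the two sub-bands sum to the energy of the input signal, which gives a quick sanity check on the decomposition.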

Keywords: Diabetic retinopathy, discrete wavelet transform, multi-layer perceptron, radial basis function, video-oculography.

Downloads: 1308

418 Using Satellite Images Datasets for Road Intersection Detection in Route Planning

Authors: Fatma El-zahraa El-taher, Ayman Taha, Jane Courtney, Susan Mckeever

Abstract:

Understanding road networks plays an important role in navigation applications such as self-driving vehicles and route planning for individual journeys. Intersections of roads are essential components of road networks. Understanding the features of an intersection, from a simple T-junction to larger multi-road junctions, is critical to decisions such as crossing roads or selecting the safest routes. The identification and profiling of intersections from satellite images is a challenging task. While deep learning approaches offer state-of-the-art performance in image classification and detection, the availability of training datasets is a bottleneck in this approach. In this paper, a labelled satellite image dataset for the intersection recognition problem is presented. It consists of 14,692 satellite images of Washington DC, USA. To support other users of the dataset, an automated download and labelling script is provided for dataset replication. The challenges of constructing a satellite image dataset and labelling its fine-grained features are examined, including the issue of how to address features that are spread across multiple images. Finally, the accuracy of detection of intersections in satellite images is evaluated.

Keywords: Satellite images, remote sensing images, data acquisition, autonomous vehicles, robot navigation, route planning, road intersections.

Downloads: 647

417 Back Stepping Sliding Mode Control of Blood Glucose for Type I Diabetes

Authors: N. Tadrisi Parsa, A. R. Vali, R. Ghasemi

Abstract:

Diabetes is a growing health problem worldwide. Patients with Type 1 diabetes in particular need strict glycemic control because they are deficient in insulin production. This paper attempts to control blood glucose based on a mathematical model of the body. The Bergman minimal mathematical model is used to develop the nonlinear controller. A novel back-stepping based sliding mode control (B-SMC) strategy is proposed as a solution that guarantees practical tracking of a desired glucose concentration. To show the performance of the proposed design, it is compared with conventional linear and fuzzy controllers from previous research. The numerical simulation results show the advantages of the back-stepping sliding mode controller design over the linear and fuzzy controllers.
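
For orientation, a commonly cited three-state form of the Bergman minimal model can be simulated in a few lines. The sketch below uses forward Euler integration; the parameter values, the input signals, and the function name are illustrative assumptions rather than values from the paper, and no controller is included.

```python
def simulate_bergman(g0, x0, i0, minutes, dt=0.1,
                     insulin_input=lambda t: 0.0, meal_input=lambda t: 0.0,
                     p1=0.028, p2=0.025, p3=1.3e-5, n=0.09, gb=80.0, ib=7.0):
    """Forward-Euler simulation of a three-state Bergman minimal model.

    States: g  plasma glucose (mg/dL), x  remote insulin action (1/min),
            i  plasma insulin (mU/L).
    gb and ib are basal glucose and insulin; all parameter values here are
    illustrative assumptions.
    """
    g, x, i = g0, x0, i0
    for step in range(int(round(minutes / dt))):
        t = step * dt
        dg = -p1 * (g - gb) - x * g + meal_input(t)   # glucose dynamics
        dx = -p2 * x + p3 * (i - ib)                  # insulin action
        di = -n * (i - ib) + insulin_input(t)         # insulin kinetics
        g, x, i = g + dt * dg, x + dt * dx, i + dt * di
    return g, x, i

# Open-loop runs: no insulin, then a constant (hypothetical) infusion.
g_free, x_free, i_free = simulate_bergman(200.0, 0.0, 7.0, minutes=600)
g_infused, _, _ = simulate_bergman(200.0, 0.0, 7.0, minutes=600,
                                   insulin_input=lambda t: 1.0)
print(f"glucose without infusion: {g_free:.1f} mg/dL")
print(f"glucose with infusion:    {g_infused:.1f} mg/dL")
```

A controller such as the one the abstract proposes would replace the constant `insulin_input` with a feedback law computed from the tracking error.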

Keywords: Back stepping, Bergman Model, Nonlinear control, Sliding mode control.

Downloads: 3503

416 Reducing the Imbalance Penalty through Artificial Intelligence Methods Geothermal Production Forecasting: A Case Study for Turkey

Authors: H. Anıl, G. Kar

Abstract:

In addition to being rich in renewable energy resources, Turkey is one of the countries with promising potential in geothermal energy production, owing to its high installed power, low cost, and sustainability. Increasing imbalance penalties become an economic burden for organizations, since geothermal generation plants cannot maintain the balance of supply and demand when the production forecasts given in the day-ahead market are inadequate. A better production forecast reduces the imbalance penalties of market participants and provides a better balance in the day-ahead market. In this study, machine learning, deep learning and time series methods were used to predict the total generation of the power plants belonging to Zorlu Doğal Electricity Generation, which has a high installed geothermal capacity, for the first week and the first two weeks of March. The imbalance penalties were then calculated from these estimates and compared with the real values. The modeling was carried out on two datasets: the basic dataset and a dataset created by extracting new features from it through feature engineering. According to the results, Support Vector Regression, a traditional machine learning model, outperformed the other models. In addition, the estimates on the feature engineering dataset showed lower error rates than those on the basic dataset. It was concluded that the estimated imbalance penalty calculated for the selected organization is lower than the actual imbalance penalty, which makes it the more profitable outcome.

Keywords: Machine learning, deep learning, time series models, feature engineering, geothermal energy production forecasting.

Downloads: 145

415 An Educational Data Mining System for Advising Higher Education Students

Authors: Heba Mohammed Nagy, Walid Mohamed Aly, Osama Fathy Hegazy

Abstract:

Educational data mining is a specific data mining field applied to data originating from educational environments. It relies on different approaches to discover hidden knowledge from the available data. Among these approaches are machine learning techniques, which are used to build a system that learns from previous data. Machine learning can be applied to solve different regression, classification, clustering and optimization problems.

In our research, we propose a “Student Advisory Framework” that utilizes classification and clustering to build an intelligent system. This system can be used to advise a first-year university student to pursue the education track in which he/she is likely to succeed, aiming to decrease the high rate of academic failure among these students. A real case study at the Cairo Higher Institute for Engineering, Computer Science and Management is presented, using a real dataset collected from 2000-2012. The dataset has two main components: a pre-higher-education dataset and a first-year course results dataset. The results demonstrate the efficiency of the suggested framework.

Keywords: Classification, Clustering, Educational Data Mining (EDM), Machine Learning.

Downloads: 5174

414 Bottom Up Text Mining through Hierarchical Document Representation

Authors: Y. Djouadi, F. Souam

Abstract:

Most existing text mining approaches are proposed with the transaction database model in mind. Thus, the mined dataset is structured using just one concept, the "transaction", whereas the whole dataset is modeled using the "set" abstract type. In such cases, the structure of the whole dataset and the relationships among the transactions themselves are not modeled and, consequently, not considered in the mining process. We believe that taking into account the structural properties of hierarchically structured information (e.g. textual documents) in the mining process can lead to better results. For this purpose, a hierarchical association rule mining approach for textual documents is proposed in this paper, and the classical set-oriented mining approach is reconsidered in favor of a Directed Acyclic Graph (DAG) oriented approach. Natural language processing techniques are used to obtain the DAG structure. Based on this graph model, a hierarchical bottom-up algorithm is proposed. The main idea is that each node is mined with its parent node.

Keywords: Graph based association rules mining, Hierarchical document structure, Text mining.

Downloads: 2021

413 A Large Dataset Imputation Approach Applied to Country Conflict Prediction Data

Authors: Benjamin D. Leiby, Darryl K. Ahner

Abstract:

This study demonstrates an alternative stochastic imputation approach for large datasets when preferred commercial packages struggle to iterate due to numerical problems. A large country conflict dataset motivates the search to impute missing values well over a common threshold of 20% missingness. The methodology capitalizes on correlation while using model residuals to provide the uncertainty in estimating unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Static tolerances common in most packages are replaced with tailorable tolerances that exploit residuals to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the country conflict dataset illustrates promise with modeling first-order interactions, while presenting a need for further refinement that mimics predictive mean matching.

Keywords: Correlation, country conflict, imputation, stochastic regression.

Downloads: 367

412 Prediction of Research Topics Using Ensemble of Best Predictors from Similar Dataset

Authors: Indra Budi, Rizal Fathoni Aji, Agus Widodo

Abstract:

Prediction of future research topics using time series analysis, either statistical or machine learning based, has been conducted previously by several researchers. Several methods have been proposed to combine the forecasting results into a single forecast. These methods use a fixed combination of individual forecasts to obtain the final forecast. In this paper, a quite different approach is employed to select the forecasting methods: every point to forecast is calculated using the methods that performed best on a similar validation dataset. The dataset used in the experiment is a time series derived from research reports in Garuda, an online site belonging to the Ministry of Education in Indonesia, over the past 20 years. The experimental results demonstrate that the proposed method may perform better than the fixed combination of predictors. In addition, based on the prediction results, we can forecast emerging research topics for the next few years.

Keywords: Combination, emerging topics, ensemble, forecasting, machine learning, prediction, research topics, similarity measure, time series.

Downloads: 2077

411 Bayesian Geostatistical Modelling of COVID-19 Datasets

Authors: I. Oloyede

Abstract:

The COVID-19 dataset was obtained by extracting weather, longitude, latitude, ISO 3166 codes, and cases and deaths of coronavirus patients across the globe. The data were extracted for a period of eight days, choosing a uniform time within the specified period. Cases and deaths were then mapped with reference to continents. Bayesian geostatistical modelling was carried out on the dataset. The study found that countries in the tropical region suffered fewer deaths and cases than countries in the temperate region, which the study attributes to the high temperature in the tropical region.

Keywords: COVID-19, Bayesian, geostatistical modelling, prior, posterior.

Downloads: 409

410 Evaluation of Some Prominent Biomarkers in Rural Type – 2 Diabetes Mellitus Cases in Kanyakumari District, Tamil Nadu, India

Authors: Murugan A., Jerlin Nirmala F.

Abstract:

Life is beautiful. But it is decided by genes, environment and the individual, and shattered by natural and/or invited problems. Most of the world's helpless rural masses are struggling for survival, since they are neglected in all aspects of life, including health. Amidst a countless number of miserable human diseases, diabetes is becoming a dreaded killer and is spreading across the entire globe at jet speed. Diabetes control remains a Herculean task for the scientific community and modern society in the 21st century. T2DM is not restricted to any age and can develop even during childhood. This multifactorial disease abruptly changes the activities of certain vital biomarkers in the present rural T2DM cases. Remarkable variation in the levels of biomarkers such as AST, ALT, GGT, ALP, LDH, HbA1C, C-peptide, fasting sugar, post-prandial sugar, sodium, potassium, BUN, creatinine and insulin shows the rampant nature of T2DM in this physically active rural agrarian community.

Keywords: Alanine aminotransferase, Aspartate aminotransferase, Blood urea nitrogen, Glycated haemoglobin, Thyroid stimulating hormone

Downloads: 1545

409 Integrating Geographic Information into Diabetes Disease Management

Authors: Tsu-Yun Chiu, Tsung-Hsueh Lu, Tain-Junn Cheng

Abstract:

Background: Traditional chronic disease management has paid little attention to the effects of geographic factors on compliance with treatment regimes, which has resulted in geographic inequality in the outcomes of chronic disease management. This study aims to examine the geographic distribution and clustering of quality indicators of diabetes care. Method: We first extracted the addresses, demographic information and quality of care indicators (number of visits, complications, prescriptions and laboratory records) of patients with diabetes for 2014 from the medical information system of a medical center in Tainan City, Taiwan, and transformed the patients' addresses into district- and village-level data. We then compared the geographic distribution and clustering of quality of care indicators between districts and villages. In addition to the descriptive results, rate ratios and 95% confidence intervals (CI) were estimated for the indices of care in order to compare the quality of diabetes care among different areas. Results: A total of 23,588 patients with diabetes were extracted from the hospital data system, of whom 12,716 had information and medical records that were included in the subsequent analysis. More than half of the subjects in this study were male and between 60 and 79 years old. Furthermore, the quality of diabetes care did indeed vary by geographical level. At the smaller level, clustered areas could be identified more specifically. Fuguo Village (of Yongkang District) and Zhiyi Village (of Sinhua District) were found to be “hotspots” for nephropathy and cerebrovascular disease, while Wangliau Village and Erwang Village (of Yongkang District) were “coldspots” with the lowest proportion of ≥80% compliance with blood lipid examination. On the other hand, Yuping Village (in Anping District) was the area with the lowest proportion of ≥80% compliance with all laboratory examinations.
Conclusion: In addition to examining the geographic distribution, calculating rate ratios and their 95% CI can be a useful and consistent method to test the association. This information is useful for health planners, diabetes case managers and other affiliated practitioners in directing care resources to the areas that need them most.

Keywords: Geocoding, chronic disease management, quality of diabetes care, rate ratio.

Downloads: 951

408 Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset

Authors: Essam Al Daoud

Abstract:

Gradient boosting methods have proven to be a very important strategy. Many successful machine learning solutions have been developed using XGBoost and its derivatives. The aim of this study is to investigate and compare the efficiency of three gradient boosting methods. The Home Credit dataset used in this work contains 219 features and 356,251 records. In addition, new features are generated and several techniques are used to rank and select the best features. The implementation indicates that LightGBM is faster and more accurate than CatBoost and XGBoost across varying numbers of features and records.

Keywords: Gradient boosting, XGBoost, LightGBM, CatBoost, home credit.

Downloads: 9065

407 Diabetes Mellitus and Food Balance in the Kingdom of Saudi Arabia

Authors: Aljabryn Dalal Hamad

Abstract:

The present explanatory study examines the relation between diabetes mellitus and food balance in the Kingdom of Saudi Arabia during 2005-2010, using published data. The results showed that, during 2005-2007, the daily protein consumption (DPC, g/capita/day) of Saudi citizens was 15.27% higher than the global average, daily fat consumption (DFC) was 24.56% higher, and daily energy consumption (DEC) was 16.93% higher. These figures exceeded the levels recommended by International Nutrition Organizations (INO) by 56% for protein, 60.49% for fat and 27.37% for energy. DPC per capita in Saudi Arabia decreased during 2008-2010 from 88.3 to 82.36 g/day. Moreover, DFC per capita decreased during 2008-2010 from 3247.90 to 3176.43 Cal/capita/day, while DEC remained 16.93% above world consumption and 27.37% above the INO recommendation. Despite these decreases, DPC, DFC and DEC per capita in Saudi Arabia are still higher than the world mean. The results also showed the number of diabetic patients in Saudi Arabia during the same period (2005-2010): the curve rises steadily, with the level ranging from 7.10% in 2005 to 12.44% in 2010. It is essential to devise Saudi national programs that educate the public about the relation between food balance and diabetes, so that the disease can be avoided, and that provide citizens with tables of healthy dietary balances.

Keywords: Diabetes Mellitus, Food Balance, Energy, Fat, Protein, Saudi Arabia.

Downloads 1483
406 Combining the Deep Neural Network with the K-Means for Traffic Accident Prediction

Authors: Celso L. Fernando, Toshio Yoshii, Takahiro Tsubota

Abstract:

Understanding the causes of road accidents and predicting their occurrence is key to preventing deaths and serious injuries. Traditional statistical methods such as Poisson and logistic regression have been used to find the association between traffic environmental factors and accident occurrence; recently, the artificial neural network (ANN), a computational technique that learns from historical data to make more accurate predictions, has emerged. Despite its ability to make accurate predictions, the ANN has difficulty dealing with a highly unbalanced distribution of attribute patterns in the training dataset; in such circumstances, the ANN treats the minority group as noise. However, in real-world data, the minority group is often the group of interest; e.g., in road traffic accident data, the accident events are the group of interest. This study proposes a combination of k-means with the ANN to improve the predictive ability of the neural network model by alleviating the effect of the unbalanced distribution of attribute patterns in the training dataset. The results show that the proposed method improves the ability of the neural network to make predictions on a dataset with highly unbalanced attribute patterns; however, on a dataset with evenly distributed attribute patterns, the proposed method performs almost like a standard neural network.
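The abstract does not say how k-means and the ANN are combined. One common way to use clustering against class imbalance is to replace the majority class with its k-means centroids before training, and the sketch below illustrates only that idea. The function names (`kmeans`, `rebalance`) and the choice of k equal to the minority-class size are assumptions made for illustration, not the authors' method.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # keep the old centroid if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def rebalance(majority, minority, seed=0):
    """Assumed scheme: shrink the majority class to len(minority) centroids."""
    k = min(len(minority), len(majority))
    return kmeans(majority, k, seed=seed), list(minority)
```

The rebalanced pair of lists could then be fed to any classifier; the abstract's own pipeline may differ in how clusters are formed and used.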

Keywords: Accident risk estimation, artificial neural network, deep learning, K-means, road safety.

Downloads 875
405 Understanding Physical Activity Behavior of Type 2 Diabetics Using the Theory of Planned Behavior and Structural Equation Modeling

Authors: D. O. Omondi, M. K. Walingo, G. M. Mbagaya, L. O. A. Othuon

Abstract:

Understanding patient factors related to physical activity behavior is important in the management of Type 2 diabetes. This study applied the Theory of Planned Behavior model to understand physical activity behavior among sampled Type 2 diabetics in Kenya. The study was conducted within the diabetic clinic at Kisii Level 5 Hospital and adopted a sequential mixed methods design, beginning with a qualitative phase and ending with a quantitative phase. Qualitative data were analyzed using the grounded theory analysis method. Structural equation modeling using maximum likelihood was used to analyze quantitative data. The common fit indices revealed that the Theory of Planned Behavior fitted the data acceptably well among the Type 2 diabetics and within physical activity behavior {χ² = 213, df = 84, n = 230, p = .061, χ²/df = 2.53; TLI = .97; CFI = .96; RMSEA (90% CI) = .073 (.029, .08)}. This theory proved to be useful in understanding physical activity behavior among Type 2 diabetics.

Keywords: Physical activity, Theory of Planned Behavior, Type 2 diabetes, Kenya.

Downloads 1945
404 Evaluation of the Possible Effect of Gender, Age and Duration of Diabetes on the Serum Zinc Levels of Diabetic Patients in Murzuk Area-Libya

Authors: Mukhtar H. Hassan, Muhammed A. Basher, Elhadi E. Saad, Almahdi M. Almahdi

Abstract:

The aim of this study was to demonstrate the possible effect of some variables such as age, gender, blood sugar level, and duration of diabetes on the serum level of zinc in diabetic individuals from the Murzuk area. Serum zinc (Zn), fasting blood sugar (FBS), and glycated hemoglobin (HbA1c) were evaluated in 46 type I diabetic subjects (group 1), 48 type II diabetic subjects (group 2) and 43 healthy individuals (control) of both genders aged 30-81 years. Data showed that both diabetic groups had significantly higher (p<0.05) serum levels of Zn, FBS and HbA1c compared with controls. No significant (p>0.05) differences in serum Zn levels were observed between males and females. Serum Zn levels decreased non-significantly with increasing age. In type II diabetic subjects, serum Zn levels decreased non-significantly with increasing duration of disease, whereas those in type I subjects increased non-significantly.

Keywords: Blood sugar, diabetes, HbA1c, zinc.

Downloads 1551
403 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein

Abstract:

The goal of a network-based intrusion detection system is to classify network traffic activities into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning play an important role in many sciences, including intrusion detection systems (IDS), using both supervised and unsupervised techniques. One of the essential steps of data mining is feature selection, which helps to improve the efficiency, performance and prediction rate of the proposed approach. This paper applies the unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified version of the KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a two-class classification (Normal, Attack) has been implemented. The Weka framework, a Java-based open source software package consisting of a collection of machine learning algorithms for data mining tasks, has been used in the testing process. The experimental results show that the proposed approach is very accurate, with a low false positive rate and a high true positive rate, and that it takes less learning time than the same algorithm using the full feature set of the dataset.
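As a rough illustration of the information gain (IG) ranking step mentioned above, the following sketch computes IG for categorical features and sorts them from most to least informative. It is a minimal example under stated assumptions: the function names are hypothetical, the paper itself used Weka rather than this code, and continuous features would first need to be discretized.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - H(labels | feature) for one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (feature name, IG) pairs sorted by decreasing IG."""
    gains = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

Features near the top of the ranking would be kept for clustering and the rest dropped; how many to keep is a tuning choice that the abstract does not specify.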

Keywords: Information Gain (IG), Intrusion Detection System (IDS), K-means Clustering, Weka.

Downloads 2726
402 Visfatin and Apelin Are New Interrelated Adipokines Playing Role in the Pathogenesis of Type 2 Diabetes Mellitus Associated Coronary Artery Disease in Postmenopausal Women

Authors: Hala O. El-Mesallamy, Salwa M. Suwailem, Mae M. Seleem

Abstract:

Visfatin and apelin are two new adipokines that have recently gained special interest in diabetes research. This study was conducted to examine the interplay between these two adipokines and their correlation with other inflammatory and biochemical parameters in type 2 diabetic (T2D) postmenopausal women with coronary artery disease (CAD). Visfatin and apelin were measured by enzyme-linked immunoassay (ELISA). Visfatin was found to be significantly higher in the following groups when compared to the control group: T2D patients without CAD, and non-obese and obese T2D patients with CAD. Apelin was found to be significantly lower in non-obese and obese T2D patients with CAD when compared to the control group. Visfatin and apelin were found to be significantly associated with each other and with other biochemical parameters. The current study provides evidence for the interplay between visfatin and apelin through the inflammatory milieu characteristic of T2D and their possible role in the pathogenesis of CAD as a complication of T2D.

Keywords: Apelin, Coronary artery disease, Inflammation, Type 2 diabetes, Visfatin.

Downloads 2124
401 Culturally Enhanced Collaborative Filtering

Authors: Mahboobe Zardosht, Nasser Ghasem-Aghaee

Abstract:

We propose an enhanced collaborative filtering method using Hofstede's cultural dimensions, calculated for 111 countries. We employ 4 of these dimensions, which are correlated with customers' buying behavior, in order to detect users' preferences for items. In addition, several advantages of this method are demonstrated for data sparseness and cold-start users, which are important challenges in collaborative filtering. We present experiments using a real dataset, the Book Crossing Dataset. Experimental results show that the proposed algorithm provides significant advantages in terms of improving recommendation quality.
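The abstract does not give the formula used to combine cultural dimensions with rating data. The sketch below shows one plausible arrangement, assumed purely for illustration: user similarity is a weighted blend of rating similarity and cultural similarity, and it falls back to cultural similarity alone when two users share no rated items (the cold-start case). The dictionary layout, the `weight` parameter and all function names are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rating_similarity(ratings_a, ratings_b):
    """Cosine similarity over co-rated items, or None if there are none."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if not common:
        return None
    return cosine([ratings_a[i] for i in common], [ratings_b[i] for i in common])


def blended_similarity(user_a, user_b, weight=0.5):
    """Blend rating and cultural similarity; cultural only for cold-start pairs."""
    cultural = cosine(user_a["culture"], user_b["culture"])
    rating = rating_similarity(user_a["ratings"], user_b["ratings"])
    if rating is None:
        return cultural
    return weight * rating + (1 - weight) * cultural
```

Here `user["culture"]` would hold the four selected dimension scores of the user's country and `user["ratings"]` a mapping from item id to rating; a neighborhood-based recommender could then use `blended_similarity` in place of a ratings-only similarity.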

Keywords: Collaborative filtering, Cross-cultural, E-commerce, Recommender systems

Downloads 1813
400 Attenuation of Pancreatic Histology, Hematology and Biochemical Parameters in Type 2 Diabetic Rats Treated with Azadirachta excelsa

Authors: S. Nurdiana, A. S. Nor Haziqah, M. K. Nur Ezwa Khairunnisa, S. Nurul 'Izzati, Y. Siti Amna, M. J. Norashirene, I. Nur Hilwani

Abstract:

Azadirachta excelsa, locally known as sentang, is frequently used as a traditional medicine by diabetes patients in Malaysia. However, less attention has been given to its toxicity. Thus, this study is an attempt to examine the protective effect of A. excelsa on the pancreas and to determine possible toxicity mediated by the extract. Diabetes was induced experimentally in rats by a high-fat diet for 16 weeks followed by intraperitoneal injection of streptozotocin at a dosage of 35 mg/kg of body weight. A decline in the fasting blood glucose level was observed after continuous administration of A. excelsa twice daily for 14 days. This is due to the refinement of the structure of the pancreas. However, surprisingly, the plant extract reduced the leukocytes, erythrocytes, hemoglobin, MCHC and lymphocytes. In addition, the rats treated with the plant extract exhibited an increase in AST and eosinocyte levels. Overall, the findings show that A. excelsa possesses antidiabetic activity by improving the structure of the pancreatic islets of Langerhans but is involved in ameliorating hematological and biochemical parameters.

Keywords: Azadirachta excelsa, diabetes, pancreas, hematobiochemical parameters.

Downloads 2247
399 Biometric Authentication Using Fast Correlation of Near Infrared Hand Vein Patterns

Authors: Mohamed Shahin, Ahmed Badawi, Mohamed Kamel

Abstract:

This paper presents a hand vein authentication system using fast spatial correlation of hand vein patterns. In order to evaluate the system performance, a prototype was designed and a dataset was acquired from 50 persons of different genders and of different ages above 16, with 10 images per person taken at different intervals: 5 images of the left hand and 5 images of the right hand. In the verification testing analysis, we used 3 images to represent the templates and 2 images for testing. Each of the 2 images is matched against the existing 3 templates. A FAR of 0.02% and an FRR of 3.00% were reported at a threshold of 80. The system efficiency at this threshold was found to be 99.95%. The system can operate at a 97% genuine acceptance rate and a 99.98% genuine reject rate at the corresponding threshold of 80. The EER was reported as 0.25% at a threshold of 77. We verified that no similarity exists between the right and left hand vein patterns of the same person over the acquired dataset sample. Finally, this dataset sample of 100 distinct hand vein patterns can be accessed by researchers and students upon request for testing other methods of hand vein matching.
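The error rates quoted above can be related to matching scores with a short calculation. The sketch below is a minimal example, assuming that a higher correlation score means a better match and that a comparison is accepted when its score is at or above the threshold; the abstract does not state this convention, and the function names are hypothetical.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when a score >= threshold is accepted."""
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    return (
        false_accepts / len(impostor_scores),
        false_rejects / len(genuine_scores),
    )


def approximate_eer(genuine_scores, impostor_scores, thresholds):
    """Find the threshold where FAR and FRR are closest.

    Returns (threshold, mean of FAR and FRR at that threshold).
    """
    def gap(t):
        far, frr = far_frr(genuine_scores, impostor_scores, t)
        return abs(far - frr)

    best = min(thresholds, key=gap)
    far, frr = far_frr(genuine_scores, impostor_scores, best)
    return best, (far + frr) / 2
```

Sweeping `thresholds` over the score range traces the trade-off between false acceptances and false rejections; the equal error rate is read off where the two curves cross.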

Keywords: Biometrics, Verification, Hand Veins, Patterns Similarity, Statistical Performance.

Downloads 3441
398 An Enhanced Support Vector Machine-Based Approach for Sentiment Classification of Arabic Tweets of Different Dialects

Authors: Gehad S. Kaseb, Mona F. Ahmed

Abstract:

Arabic Sentiment Analysis (SA) is one of the most common research fields with many open areas. This paper proposes different pre-processing steps and a modified methodology to improve the accuracy using normal Support Vector Machine (SVM) classification. The paper works on two datasets, Arabic Sentiment Tweets Dataset (ASTD) and Extended Arabic Tweets Sentiment Dataset (Extended-ATSD), which are publicly available for academic use. The results show that the classification accuracy approaches 86%.

Keywords: Arabic, hybrid classification, sentiment analysis, tweets.

Downloads 411
397 The New Educators: The Reasons for Saudi Arabia to Invest More in Student Counseling Programs

Authors: Turki Alotaibi

Abstract:

Student counseling programs can provide many benefits to students in schools all around the world. In theory, the government of the Kingdom of Saudi Arabia (Saudi Arabia) has committed itself to school counseling programs in educational institutions throughout the country. Student counselors face a number of burdens and obstacles that impact student counseling programs. It is also widely known that Saudi Arabia has extremely high prevalence rates for overweight and obesity, anxiety and depression, and diabetes in children. It has also been demonstrated that teachers and staff are inadequately prepared when dealing with health issues relating to diabetes in schools in Saudi Arabia. This study will clearly demonstrate how student counselors in Saudi Arabia could become 'New Educators' in Saudi schools in relation to these health issues. This would allow them to leverage their position as student counselor to improve the management of these health issues in Saudi schools, to improve the quality of care provided to school children, and to overcome burdens and obstacles that are currently negatively affecting student counseling in Saudi schools.

Keywords: Anxiety, depression, diabetes, overweight, obesity, policy recommendations, student counseling, The Kingdom of Saudi Arabia.

Downloads 1632