Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 2488

Search results for: genomic prediction

1618 Interpretable Deep Learning Models for Medical Condition Identification

Authors: Dongping Fang, Lian Duan, Xiaojing Yuan, Mike Xu, Allyn Klunder, Kevin Tan, Suiting Cao, Yeqing Ji

Abstract:

Accurate prediction of a medical condition with straight clinical evidence is a long-sought topic in the medical management and health insurance field. Although great progress has been made with machine learning algorithms, the medical community is still, to a certain degree, suspicious about the model's accuracy and interpretability. This paper presents an innovative hierarchical attention deep learning model to achieve good prediction and clear interpretability that can be easily understood by medical professionals. This deep learning model uses a hierarchical attention structure that matches naturally with the medical history data structure and reflects the member’s encounter (date of service) sequence. The model attention structure consists of 3 levels: (1) attention on the medical code types (diagnosis codes, procedure codes, lab test results, and prescription drugs), (2) attention on the sequential medical encounters within a type, (3) attention on the medical codes within an encounter and type. This model is applied to predict the occurrence of stage 3 chronic kidney disease (CKD3), using three years’ medical history of Medicare Advantage (MA) members from a top health insurance company. The model takes members’ medical events, both claims and electronic medical record (EMR) data, as input, makes a prediction of CKD3 and calculates the contribution from individual events to the predicted outcome. The model outcome can be easily explained with the clinical evidence identified by the model algorithm. Here are examples: Member A had 36 medical encounters in the past three years: multiple office visits, lab tests and medications. The model predicts member A has a high risk of CKD3 with the following well-contributed clinical events - multiple high ‘Creatinine in Serum or Plasma’ tests and multiple low kidneys functioning ‘Glomerular filtration rate’ tests. Among the abnormal lab tests, more recent results contributed more to the prediction. 
The model also indicates regular office visits, no abnormal findings of medical examinations, and taking proper medications decreased the CKD3 risk. Member B had 104 medical encounters in the past 3 years and was predicted to have a low risk of CKD3, because the model didn’t identify diagnoses, procedures, or medications related to kidney disease, and many lab test results, including ‘Glomerular filtration rate’ were within the normal range. The model accurately predicts members A and B and provides interpretable clinical evidence that is validated by clinicians. Without extra effort, the interpretation is generated directly from the model and presented together with the occurrence date. Our model uses the medical data in its most raw format without any further data aggregation, transformation, or mapping. This greatly simplifies the data preparation process, mitigates the chance for error and eliminates post-modeling work needed for traditional model explanation. To our knowledge, this is the first paper on an interpretable deep-learning model using a 3-level attention structure, sourcing both EMR and claim data, including all 4 types of medical data, on the entire Medicare population of a big insurance company, and more importantly, directly generating model interpretation to support user decision. In the future, we plan to enrich the model input by adding patients’ demographics and information from free-texted physician notes.
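As a rough illustration of how one attention level can turn per-item scores into contribution weights, the following minimal Python sketch pools a set of vectors with softmax weights. It is not the authors' model: the function names, the toy vectors, and the fixed scores are assumptions made for illustration, and in a trained network the scores would be learned and three such levels would be stacked.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, scores):
    """Weighted average of vectors; the weights act as per-item contributions."""
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights

# Hypothetical example: three encounter embeddings with hand-picked scores.
encounters = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(encounters, [0.2, 1.5, 0.2])
print(pooled, weights)
```

Because the weights sum to 1, each weight can be read as the share of the pooled representation that comes from the corresponding item, which is the property that makes attention convenient for interpretation.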

Keywords: deep learning, interpretability, attention, big data, medical conditions

Procedia PDF Downloads 91

1617 Machine Learning Approaches Based on Recency, Frequency, Monetary (RFM) and K-Means for Predicting Electrical Failures and Voltage Reliability in Smart Cities

Authors: Panaya Sudta, Wanchalerm Patanacharoenwong, Prachya Bumrungkun

Abstract:

With the evolution of smart grids, ensuring the reliability and efficiency of electrical systems in smart cities has become crucial. This paper proposes a distinct approach that combines advanced machine learning techniques to accurately predict electrical failures and address voltage reliability issues. This approach aims to improve the accuracy and efficiency of reliability evaluations in smart cities. The aim of this research is to develop a comprehensive predictive model that accurately predicts electrical failures and voltage reliability in smart cities. This model integrates RFM analysis, K-means clustering, and LSTM networks to achieve this objective. The research utilizes RFM analysis, traditionally used in customer value assessment, to categorize and analyze electrical components based on their failure recency, frequency, and monetary impact. K-means clustering is employed to segment electrical components into distinct groups with similar characteristics and failure patterns. LSTM networks are used to capture the temporal dependencies and patterns in customer data. This integration of RFM, K-means, and LSTM results in a robust predictive tool for electrical failures and voltage reliability. The proposed model has been tested and validated on diverse electrical utility datasets. The results show a significant improvement in prediction accuracy and reliability compared to traditional methods, achieving an accuracy of 92.78% and an F1-score of 0.83. This research contributes to the proactive maintenance and optimization of electrical infrastructures in smart cities. It also enhances overall energy management and sustainability. The integration of advanced machine learning techniques in the predictive model demonstrates the potential for transforming the landscape of electrical system management within smart cities. The research utilizes diverse electrical utility datasets to develop and validate the predictive model.
RFM analysis, K-means clustering, and LSTM networks are applied to these datasets to analyze and predict electrical failures and voltage reliability. The research addresses the question of how accurately electrical failures and voltage reliability can be predicted in smart cities. It also investigates the effectiveness of integrating RFM analysis, K-means clustering, and LSTM networks in achieving this goal. The proposed approach presents a distinct, efficient, and effective solution for predicting and mitigating electrical failures and voltage issues in smart cities. It significantly improves prediction accuracy and reliability compared to traditional methods. This advancement contributes to the proactive maintenance and optimization of electrical infrastructures, overall energy management, and sustainability in smart cities.
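The RFM step described above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' pipeline: the record layout `(component_id, failure_date, cost)`, the function name `rfm_table`, and the sample events are all assumptions. The resulting triples are the kind of features that could then be passed to K-means clustering.

```python
from collections import defaultdict
from datetime import date

def rfm_table(events, today):
    """Build (recency_days, frequency, monetary) per component.

    events: iterable of (component_id, failure_date, cost) tuples (assumed layout).
    """
    last_failure = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for component, day, cost in events:
        # Recency is measured from the most recent failure of each component.
        if component not in last_failure or day > last_failure[component]:
            last_failure[component] = day
        frequency[component] += 1
        monetary[component] += cost
    return {
        c: ((today - last_failure[c]).days, frequency[c], monetary[c])
        for c in last_failure
    }

# Hypothetical failure log for two transformers.
events = [
    ("T1", date(2024, 1, 10), 500.0),
    ("T1", date(2024, 3, 1), 200.0),
    ("T2", date(2023, 12, 25), 1000.0),
]
print(rfm_table(events, today=date(2024, 3, 11)))
```

A component with low recency, high frequency, and high monetary impact would be a natural candidate for a high-risk cluster, though how the clusters are interpreted depends on the data.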

Keywords: electrical state prediction, smart grids, data-driven method, long short-term memory, RFM, k-means, machine learning

Procedia PDF Downloads 56

1616 Heart Rate Variability Analysis for Early Stage Prediction of Sudden Cardiac Death

Authors: Reeta Devi, Hitender Kumar Tyagi, Dinesh Kumar

Abstract:

In the present scenario, cardiovascular problems are a growing challenge for researchers and physiologists. As heart disease has no geographic, gender, or socioeconomic specific causes, detecting cardiac irregularities at an early stage, followed by quick and correct treatment, is very important. The electrocardiogram is the finest tool for continuous monitoring of heart activity. Heart rate variability (HRV) is used to measure naturally occurring oscillations between consecutive cardiac cycles. Analysis of this variability is carried out using time domain, frequency domain, and non-linear parameters. This paper presents HRV analysis of an online dataset for normal sinus rhythm (taken as healthy subjects) and sudden cardiac death (SCD subjects) using all three methods, computing values for parameters such as the standard deviation of normal-to-normal intervals (SDNN), the root mean square of successive differences between adjacent RR intervals (RMSSD), and the mean of R-to-R intervals (mean RR) in the time domain; very low-frequency (VLF), low-frequency (LF), high-frequency (HF), and the ratio of low to high frequency (LF/HF ratio) in the frequency domain; and the Poincaré plot for non-linear analysis. To differentiate the HRV of healthy subjects from that of subjects who died of SCD, a k-nearest neighbor (k-NN) classifier has been used because of its high accuracy. Results show highly reduced values for all stated parameters for SCD subjects as compared to healthy ones. As the dataset used for SCD patients is a recording of their ECG signal one hour prior to their death, it is verified with an accuracy of 95% that the proposed algorithm can identify the mortality risk of a patient one hour before death. The identification of a patient’s mortality risk at such an early stage may prevent sudden death if timely and correct treatment is given by the doctor.
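The three time-domain parameters named in this abstract have standard definitions that can be sketched directly. The Python below is a minimal illustration, assuming RR intervals are supplied in milliseconds and that the sample (n - 1) form of the standard deviation is wanted; the function name and the toy series are hypothetical, and no preprocessing such as ectopic-beat removal is attempted.

```python
import math

def hrv_time_domain(rr_ms):
    """Return mean RR, SDNN and RMSSD (all in ms) for a list of RR intervals."""
    n = len(rr_ms)
    if n < 2:
        raise ValueError("need at least two RR intervals")
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of the intervals.
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))
    # RMSSD: root mean square of differences between adjacent intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd}

# Toy series, not taken from any dataset.
print(hrv_time_domain([800, 810, 790, 800]))
```

Values computed this way could serve as the feature vector for a k-NN classifier, with the frequency-domain and Poincaré features appended as further dimensions.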

Keywords: early stage prediction, heart rate variability, linear and non-linear analysis, sudden cardiac death

Procedia PDF Downloads 342

1615 Implementation of Deep Neural Networks for Pavement Condition Index Prediction

Authors: M. Sirhan, S. Bekhor, A. Sidess

Abstract:

In-service pavements deteriorate with time due to traffic wheel loads, environment, and climate conditions. Pavement deterioration leads to a reduction in their serviceability and structural behavior. Consequently, proper maintenance and rehabilitation (M&R) are necessary actions to keep the in-service pavement network at the desired level of serviceability. Due to resource and financial constraints, the pavement management system (PMS) prioritizes roads most in need of maintenance and rehabilitation action. It recommends a suitable action for each pavement based on the performance and surface condition of each road in the network. The pavement performance and condition are usually quantified and evaluated by different types of roughness-based and stress-based indices. Examples of such indices are Pavement Serviceability Index (PSI), Pavement Serviceability Ratio (PSR), Mean Panel Rating (MPR), Pavement Condition Rating (PCR), Ride Number (RN), Profile Index (PI), International Roughness Index (IRI), and Pavement Condition Index (PCI). PCI is commonly used in PMS as an indicator of the extent of the distresses on the pavement surface. PCI values range between 0 and 100; where 0 and 100 represent a highly deteriorated pavement and a newly constructed pavement, respectively. The PCI value is a function of distress type, severity, and density (measured as a percentage of the total pavement area). PCI is usually calculated iteratively using the 'Paver' program developed by the US Army Corps. The use of soft computing techniques, especially Artificial Neural Network (ANN), has become increasingly popular in the modeling of engineering problems. ANN techniques have successfully modeled the performance of the in-service pavements, due to its efficiency in predicting and solving non-linear relationships and dealing with an uncertain large amount of data. 
Typical regression models, which require a pre-defined relationship, can be replaced by ANN, which was found to be an appropriate tool for predicting the different pavement performance indices versus different factors as well. Subsequently, the objective of the presented study is to develop and train an ANN model that predicts the PCI values. The model’s input consists of percentage areas of 11 different damage types; alligator cracking, swelling, rutting, block cracking, longitudinal/transverse cracking, edge cracking, shoving, raveling, potholes, patching, and lane drop off, at three severity levels (low, medium, high) for each. The developed model was trained using 536,000 samples and tested on 134,000 samples. The samples were collected and prepared by The National Transport Infrastructure Company. The predicted results yielded satisfactory compliance with field measurements. The proposed model predicted PCI values with relatively low standard deviations, suggesting that it could be incorporated into the PMS for PCI determination. It is worth mentioning that the most influencing variables for PCI prediction are damages related to alligator cracking, swelling, rutting, and potholes.

Keywords: artificial neural networks, computer programming, pavement condition index, pavement management, performance prediction

Procedia PDF Downloads 137

1614 Validation of Nutritional Assessment Scores in Prediction of Mortality and Duration of Admission in Elderly, Hospitalized Patients: A Cross-Sectional Study

Authors: Christos Lampropoulos, Maria Konsta, Vicky Dradaki, Irini Dri, Konstantina Panouria, Tamta Sirbilatze, Ifigenia Apostolou, Vaggelis Lambas, Christina Kordali, Georgios Mavras

Abstract:

Objectives: Malnutrition in hospitalized patients is related to increased morbidity and mortality. The purpose of our study was to compare various nutritional scores in order to detect the most suitable one for assessing the nutritional status of elderly, hospitalized patients and to correlate them with mortality and extension of admission duration due to patients’ critical condition. Methods: The sample population included 150 patients (78 men, 72 women, mean age 80±8.2). Nutritional status was assessed by the Mini Nutritional Assessment (MNA full, short-form), the Malnutrition Universal Screening Tool (MUST), and the short Nutritional Appetite Questionnaire (sNAQ). Sensitivity, specificity, positive and negative predictive values, and ROC curves were assessed after adjustment for the cause of current admission, a known prognostic factor according to previously applied multivariate models. Primary endpoints were mortality (from admission until 6 months afterwards) and duration of hospitalization, compared to national guidelines for closed consolidated medical expenses. Results: Concerning mortality, MNA (short-form and full) and sNAQ had similar, low sensitivity (25.8%, 25.8%, and 35.5% respectively), while MUST had higher sensitivity (48.4%). In contrast, all the questionnaires had high specificity (94%-97.5%). Short-form MNA and sNAQ had the best positive predictive value (72.7% and 78.6% respectively), whereas all the questionnaires had similar negative predictive value (83.2%-87.5%). MUST had the highest ROC curve (0.83) in contrast to the remaining questionnaires (0.73-0.77). With regard to extension of admission duration, all four scores had relatively low sensitivity (48.7%-56.7%), specificity (68.4%-77.6%), positive predictive value (63.1%-69.6%), negative predictive value (61%-63%), and ROC curve (0.67-0.69). Conclusion: The MUST questionnaire is more advantageous in predicting mortality due to its higher sensitivity and ROC curve.
None of the nutritional scores is suitable for prediction of extended hospitalization.
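For readers less familiar with the four screening measures reported above, the short Python sketch below computes them from a 2x2 confusion table. The counts used are invented purely for illustration and are not the study's data; the function name is likewise an assumption, and the sketch assumes no row or column of the table is empty.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases flagged by the score
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),          # share of flagged patients who are true cases
        "npv": tn / (tn + fn),          # share of cleared patients who are non-cases
    }

# Hypothetical counts: 50 true cases and 50 non-cases.
print(diagnostic_metrics(tp=40, fp=5, fn=10, tn=45))
```

A score can combine high specificity with low sensitivity, as several questionnaires in this abstract do, when it flags few patients overall: most flags are then correct, but many true cases are missed.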

Keywords: duration of admission, malnutrition, nutritional assessment scores, prognostic factors for mortality

Procedia PDF Downloads 346

1613 Advances in Sesame Molecular Breeding: A Comprehensive Review

Authors: Micheale Yifter Weldemichael

Abstract:

Sesame (Sesamum indicum L.) is among the most important oilseed crops for its high edible oil quality and quantity. Sesame is grown for food, medicinal, pharmaceutical, and industrial uses. Sesame is also cultivated as a main cash crop in Asia and Africa by smallholder farmers. Despite the global exponential increase in sesame cultivation area, its production and productivity remain low, mainly due to biotic and abiotic constraints. Notwithstanding the efforts to solve these problems, a low level of genetic variation and inadequate genomic resources hinder the progress of sesame improvement. The objective of this paper is, therefore, to review recent advances in the area of molecular breeding and transformation that could overcome major production constraints and result in enhanced and sustained sesame production. This paper reviews the research conducted to date on molecular breeding and genetic transformation in sesame, focusing on molecular markers used in assessing the available online database resources, genes responsible for key agronomic traits, as well as transgenic technology and genome editing. The review concentrates on quantitative and semi-quantitative studies on molecular breeding for key agronomic traits such as improvement of yield components, oil and oil-related traits, disease and insect/pest resistance, and drought, waterlogging, and salt tolerance, as well as sesame genetic transformation and genome editing techniques. Pitfalls and limitations of existing studies and methodologies are identified, and some priorities for future research directions in sesame genetic improvement are outlined in this review.

Keywords: abiotic stress, biotic stress, improvement, molecular breeding, oil, sesame, shattering

Procedia PDF Downloads 35

1612 Advancements in Predicting Diabetes Biomarkers: A Machine Learning Epigenetic Approach

Authors: James Ladzekpo

Abstract:

Background: The urgent need to identify new pharmacological targets for diabetes treatment and prevention has been amplified by the disease's extensive impact on individuals and healthcare systems. A deeper insight into the biological underpinnings of diabetes is crucial for the creation of therapeutic strategies aimed at these biological processes. Current predictive models based on genetic variations fall short of accurately forecasting diabetes. Objectives: Our study aims to pinpoint key epigenetic factors that predispose individuals to diabetes. These factors will inform the development of an advanced predictive model that estimates diabetes risk from genetic profiles, utilizing state-of-the-art statistical and data mining methods. Methodology: We have implemented a recursive feature elimination with cross-validation using the support vector machine (SVM) approach for refined feature selection. Building on this, we developed six machine learning models, including logistic regression, k-Nearest Neighbors (k-NN), Naive Bayes, Random Forest, Gradient Boosting, and Multilayer Perceptron Neural Network, to evaluate their performance. Findings: The Gradient Boosting Classifier excelled, achieving a median recall of 92.17% and outstanding metrics such as area under the receiver operating characteristics curve (AUC) with a median of 68%, alongside median accuracy and precision scores of 76%. Through our machine learning analysis, we identified 31 genes significantly associated with diabetes traits, highlighting their potential as biomarkers and targets for diabetes management strategies. Conclusion: Particularly noteworthy were the Gradient Boosting Classifier and Multilayer Perceptron Neural Network, which demonstrated potential in diabetes outcome prediction. We recommend future investigations to incorporate larger cohorts and a wider array of predictive variables to enhance the models' predictive capabilities.

Keywords: diabetes, machine learning, prediction, biomarkers

Procedia PDF Downloads 55

1611 The Prediction of Evolutionary Process of Coloured Vision in Mammals: A System Biology Approach

Authors: Shivani Sharma, Prashant Saxena, Inamul Hasan Madar

Abstract:

Since the time of Darwin, it has been considered that genetic change is the direct indicator of variation in phenotype. However, a few studies in systems biology in the past years have proposed that epigenetic developmental processes also affect the phenotype, thus shifting the focus from a linear genotype-phenotype map to a non-linear G-P map. In this paper, we attempt to explain the evolution of colour vision in mammals by taking the long-wave sensitive (LWS) gene under consideration.

Keywords: evolution, phenotypes, epigenetics, LWS gene, G-P map

Procedia PDF Downloads 521

1610 Prognostic Implication of NRAS Gene Mutations in Egyptian Adult Acute Myeloid Leukemia

Authors: Doaa M. Elghannam, Nashwa Khayrat Abousamra, Doaa A. Shahin, Enas F. Goda, Hanan Azzam, Emad Azmy, Manal Salah El-Din

Abstract:

Background: The pathogenesis of acute myeloid leukemia (AML) involves the cooperation of mutations promoting proliferation/survival and those impairing differentiation. Point mutations of the NRAS gene are the most frequent somatic mutations causing aberrant signal transduction in AML. Aim: The present work was conducted to study the frequency and prognostic significance of NRAS gene mutations (NRASmut) in de novo Egyptian adult AML. Material and methods: Bone marrow specimens from 150 patients with de novo acute myeloid leukemia and controls were analyzed by genomic PCR-SSCP at codons 12, 13 (exon 1), and 61 (exon 2) for NRAS mutations. Results: NRAS gene mutations were found in 19/150 (12.7%) AML cases, represented more frequently in the FAB subtype M4eo (P = 0.028) and at codons 12 and 13 (14 of 19; 73.7%). Patients with NRASmut had significantly lower peripheral and marrow blasts (P = 0.004, P = 0.03) and a non-significantly improved clinical outcome compared with patients without the mutation. The complete remission rate was 63.2% vs 56.5% (P = 0.46), resistant disease 15.8% vs 23.6% (P = 0.51), three-year overall survival 44% vs 42% (P = 0.85), and disease-free survival 42.1% vs 38.9% (P = 0.74). Multivariate analysis showed that age was the strongest unfavorable factor for overall survival (relative risk [RR], 1.9; P = 0.002), followed by cytogenetics (P = 0.004). FAB types, NRAS mutation, and leukocytosis were less important. Conclusions: NRAS gene mutation frequency and spectrum differ between biologically distinct subtypes of AML but do not significantly influence prognosis and clinical outcome.

Keywords: NRAS gene, Egyptian adult, acute myeloid leukemia, cytogenetics

Procedia PDF Downloads 99

1609 16S rRNA Based Metagenomic Analysis of Palm Sap Samples From Bangladesh

Authors: Ágota Ábrahám, Md Nurul Islam, Karimane Zeghbib, Gábor Kemenesi, Sazeda Akter

Abstract:

Collecting palm sap as a food source is an everyday practice in some parts of the world. However, the consumption of palm juice has been associated with regular infections and epidemics in parts of Bangladesh. This is attributed to fruit-eating bats and other vertebrates or invertebrates native to the area contaminating the food with their body secretions during the collection process. The frequent intake of palm juice, whether as a processed food product or in its unprocessed form, is a common phenomenon in large areas. The range of pathogens capable of infecting humans through this practice is not yet fully understood. Additionally, the high sugar content of the liquid makes it an ideal culture medium for certain bacteria, which can easily propagate and potentially harm consumers. Rapid diagnostics, especially in remote locations, could mitigate health risks associated with palm juice consumption. The primary objective of this research is the rapid genomic detection and risk assessment of bacteria that may cause infections in humans through the consumption of palm juice. Utilizing state-of-the-art third-generation Nanopore metagenomic sequencing technology based on 16S rRNA, we identified bacteria primarily involved in fermenting processes. The swift metagenomic analysis, coupled with the widespread availability and portability of Nanopore products (including real-time analysis options), proves advantageous for detecting harmful pathogens in food sources without relying on extensive industry resources and testing.

Keywords: raw date palm sap, NGS, metabarcoding, food safety

Procedia PDF Downloads 55

1608 Applying Semi-Automatic Digital Aerial Survey Technology and Canopy Characters Classification for Surface Vegetation Interpretation of Archaeological Sites

Authors: Yung-Chung Chuang

Abstract:

The cultural layers of archaeological sites are mainly affected by surface land use, land cover, and the root systems of surface vegetation. For this reason, continuous monitoring of land use and land cover change is important for the protection and management of archaeological sites. However, in actual operation, on-site investigation and orthogonal photograph interpretation require a lot of time and manpower. It is therefore necessary to develop a good alternative for surveying surface vegetation in an automated or semi-automated manner. In this study, we applied semi-automatic digital aerial survey technology and canopy characters classification with very high-resolution aerial photographs for surface vegetation interpretation of archaeological sites. The main idea is that different landscape or forest types can easily be distinguished by canopy characters (e.g., specific texture distribution, shadow effects, and gap characters) extracted by semi-automatic image classification. A novel methodology to classify the shape of canopy characters using landscape indices and multivariate statistics was also proposed. Non-hierarchical cluster analysis was used to assess the optimal number of canopy character clusters, and canonical discriminant analysis was used to generate the discriminant functions for canopy character classification (seven categories). People could therefore easily predict the forest type and vegetation land cover from the corresponding canopy character category. The results showed that the semi-automatic classification could effectively extract the canopy characters of forest and vegetation land cover. As for forest type and vegetation type prediction, the average prediction accuracy ranged from 80.3% to 91.7% with different sizes of test frame. This indicates that the technology is useful for archaeological site surveys and can improve the classification efficiency and data update rate.

Keywords: digital aerial survey, canopy characters classification, archaeological sites, multivariate statistics

Procedia PDF Downloads 142

1607 A Generalized Weighted Loss for Support Vector Classification and Multilayer Perceptron

Authors: Filippo Portera

Abstract:

Standard algorithms usually employ a loss in which each error is the mere absolute difference between the true value and the prediction, in the case of a regression task. In the present work, we present several error weighting schemes that generalize this consolidated routine. We study both a binary classification model for Support Vector Classification and a regression net for Multilayer Perceptron. Results show that the error is never worse than with the standard procedure and is several times better.
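The idea of weighting each error can be illustrated with a minimal Python sketch. The abstract does not state which weighting schemes the author uses, so the per-sample weights below are purely an assumption for illustration; the only point shown is that uniform weights recover the standard absolute-error loss.

```python
def weighted_absolute_loss(y_true, y_pred, weights=None):
    """Sum of per-sample absolute errors, each scaled by its own weight.

    With weights=None every weight is 1.0, which is the standard absolute loss.
    """
    if weights is None:
        weights = [1.0] * len(y_true)
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("y_true, y_pred and weights must have equal length")
    return sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred))

y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]
print(weighted_absolute_loss(y_true, y_pred))                   # standard loss
print(weighted_absolute_loss(y_true, y_pred, [2.0, 1.0, 3.0]))  # hypothetical weights
```

Since the unweighted loss is one member of the weighted family, a search over weighting schemes that includes the uniform one can always fall back on the standard procedure.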

Keywords: loss, binary-classification, MLP, weights, regression

Procedia PDF Downloads 95

1606 CMPD: Cancer Mutant Proteome Database

Authors: Po-Jung Huang, Chi-Ching Lee, Bertrand Chin-Ming Tan, Yuan-Ming Yeh, Julie Lichieh Chu, Tin-Wen Chen, Cheng-Yang Lee, Ruei-Chi Gan, Hsuan Liu, Petrus Tang

Abstract:

Whole-exome sequencing, which focuses on the protein-coding regions of disease/cancer-associated genes based on a priori knowledge, is the most cost-effective method to study the association between genetic alterations and disease. Recent advances in high-throughput sequencing technologies and proteomic techniques have provided an opportunity to integrate genomics and proteomics, allowing mutated peptides corresponding to mutated genes to be readily detected. Since sequence database search is the most widely used method for protein identification using mass spectrometry (MS)-based proteomics technology, a mutant proteome database is required to better approximate the real protein pool and improve the identification of disease-associated mutated proteins. Large-scale whole exome/genome sequencing studies were launched by the National Cancer Institute (NCI), the Broad Institute, and The Cancer Genome Atlas (TCGA), which provide not only a comprehensive report on the analysis of coding variants in diverse samples and cell lines but also an invaluable resource for the wider research community. No existing database is available for the collection of mutant protein sequences related to the identified variants in these studies. CMPD is designed to address this issue, serving as a bridge between genomic data and proteomic studies and focusing on protein sequence-altering variations originating from both germline and cancer-associated somatic variations.

Keywords: TCGA, cancer, mutant, proteome

Procedia PDF Downloads 593

1605 Analysis of Biomarkers Intractable Epileptogenic Brain Networks with Independent Component Analysis and Deep Learning Algorithms: A Comprehensive Framework for Scalable Seizure Prediction with Unimodal Neuroimaging Data in Pediatric Patients

Authors: Bliss Singhal

Abstract:

Epilepsy is a prevalent neurological disorder affecting approximately 50 million individuals worldwide and 1.2 million Americans. There exist millions of pediatric patients with intractable epilepsy, a condition in which seizures fail to come under control. The occurrence of seizures can result in physical injury, disorientation, unconsciousness, and additional symptoms that could impede children's ability to participate in everyday tasks. Predicting seizures can help parents and healthcare providers take precautions, prevent risky situations, and mentally prepare children to minimize anxiety and nervousness associated with the uncertainty of a seizure. This research proposes a comprehensive framework to predict seizures in pediatric patients by evaluating machine learning algorithms on unimodal neuroimaging data consisting of electroencephalogram signals. Bandpass filtering and independent component analysis proved to be effective in reducing the noise and artifacts in the dataset. The performance of various machine learning algorithms is evaluated on important metrics such as accuracy, precision, specificity, sensitivity, F1 score, and Matthews correlation coefficient (MCC). The results show that the deep learning algorithms are more successful in predicting seizures than logistic regression and k-nearest neighbors. The recurrent neural network (RNN) gave the highest precision and F1 score, long short-term memory (LSTM) outperformed RNN in accuracy, and the convolutional neural network (CNN) resulted in the highest specificity. This research has significant implications for healthcare providers in proactively managing seizure occurrence in pediatric patients, potentially transforming clinical practices, and improving pediatric care.
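Two of the metrics listed in this abstract, the F1 score and MCC, are less self-explanatory than accuracy, so a short Python sketch of their standard definitions may help. The counts are invented for illustration and do not come from the study; the function name is an assumption, and MCC is returned as 0.0 when its denominator is zero, which is a common convention rather than part of the definition.

```python
import math

def f1_and_mcc(tp, fp, fn, tn):
    """F1 score and Matthews correlation coefficient from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the same quantity as sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC uses all four cells, so it stays informative on imbalanced data.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc

# Hypothetical counts for a seizure / no-seizure classifier.
print(f1_and_mcc(tp=40, fp=5, fn=10, tn=45))
```

Unlike F1, MCC also rewards correct negatives, which matters when seizure segments are far rarer than non-seizure segments.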

Keywords: intractable epilepsy, seizure, deep learning, prediction, electroencephalogram channels

Procedia PDF Downloads 84

1604 Gradient Boosted Trees on Spark Platform for Supervised Learning in Health Care Big Data

Authors: Gayathri Nagarajan, L. D. Dhinesh Babu

Abstract:

Health care is one of the prominent industries that generate voluminous data, creating a need for machine learning techniques combined with big data solutions for efficient processing and prediction. Missing data, incomplete data, real-time streaming data, sensitive data, privacy, and heterogeneity are a few of the common challenges to be addressed for efficient processing and mining of health care data. In comparison with other applications, accuracy and fast processing are of higher importance for health care applications, as they relate directly to human life. Though many machine learning techniques and big data solutions are used for efficient processing and prediction in health care data, different techniques and frameworks have proved effective for different applications, largely depending on the characteristics of the datasets. In this paper, we present a framework that uses the ensemble machine learning technique of gradient boosted trees for data classification in health care big data. The framework is built on the Spark platform, which is fast in comparison with other traditional frameworks. Unlike other works that focus on a single technique, our work presents a comparison of six different machine learning techniques along with gradient boosted trees on datasets of different characteristics. Five benchmark health care datasets are considered for experimentation, and the results of the different machine learning techniques are discussed in comparison with gradient boosted trees. The metrics chosen for comparison are the misclassification error rate and the run time of the algorithms. The goal of this paper is to i) compare the performance of gradient boosted trees with other machine learning techniques on the Spark platform specifically for health care big data and ii) discuss the results from the experiments conducted on datasets of different characteristics, thereby drawing inferences and conclusions.
The experimental results show that the accuracy is largely dependent on the characteristics of the datasets for other machine learning techniques whereas gradient boosting trees yields reasonably stable results in terms of accuracy without largely depending on the dataset characteristics.
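
The two comparison metrics named in this abstract, misclassification error rate and run time, can be illustrated with a minimal Python sketch. It is not the authors' Spark implementation: the function names, the toy labels, and the majority-class stand-in classifier are all assumptions made for illustration.

```python
import time
from collections import Counter


def misclassification_error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def majority_class_predict(train_labels, n_samples):
    """Hypothetical stand-in classifier: always predicts the most common label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_samples


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Toy labels (invented for illustration, not from any health care dataset).
train = [1, 1, 0, 1]
test_true = [1, 0, 1, 0]
test_pred, seconds = timed(majority_class_predict, train, len(test_true))
error = misclassification_error_rate(test_true, test_pred)
```

Any classifier that returns a list of predicted labels could replace the stand-in, so the same two helpers would let several techniques be compared on equal terms.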

Keywords: big data analytics, ensemble machine learning, gradient boosted trees, Spark platform

Procedia PDF Downloads 241

1603 Validation of Asymptotic Techniques to Predict Bistatic Radar Cross Section

Authors: M. Pienaar, J. W. Odendaal, J. C. Smit, J. Joubert

Abstract:

Simulations are commonly used to predict the bistatic radar cross section (RCS) of military targets since characterization measurements can be expensive and time consuming. It is thus important to accurately predict the bistatic RCS of targets. Computational electromagnetic (CEM) methods can be used for bistatic RCS prediction. CEM methods are divided into full-wave and asymptotic methods. Full-wave methods are numerical approximations to the exact solution of Maxwell’s equations. These methods are very accurate but are computationally very intensive and time consuming. Asymptotic techniques make simplifying assumptions in solving Maxwell's equations and are thus less accurate but require less computational resources and time. Asymptotic techniques can thus be very valuable for the prediction of bistatic RCS of electrically large targets, due to the decreased computational requirements. This study extends previous work by validating the accuracy of asymptotic techniques to predict bistatic RCS through comparison with full-wave simulations as well as measurements. Validation is done with canonical structures as well as complex realistic aircraft models instead of only looking at a complex slicy structure. The slicy structure is a combination of canonical structures, including cylinders, corner reflectors and cubes. Validation is done over large bistatic angles and at different polarizations. Bistatic RCS measurements were conducted in a compact range, at the University of Pretoria, South Africa. The measurements were performed at different polarizations from 2 GHz to 6 GHz. Fixed bistatic angles of β = 30.8°, 45° and 90° were used. The measurements were calibrated with an active calibration target. The EM simulation tool FEKO was used to generate simulated results. The full-wave multi-level fast multipole method (MLFMM) simulated results together with the measured data were used as reference for validation. 
The accuracy of physical optics (PO) and geometrical optics (GO) was investigated. Differences relating to amplitude, lobing structure and null positions were observed between the asymptotic, full-wave and measured data. PO and GO were more accurate at angles close to the specular scattering directions and the accuracy seemed to decrease as the bistatic angle increased. At large bistatic angles PO did not perform well due to the shadow regions not being treated appropriately. PO also did not perform well for canonical structures where multi-bounce was the main scattering mechanism. PO and GO do not account for diffraction but these inaccuracies tended to decrease as the electrical size of objects increased. It was evident that both asymptotic techniques do not properly account for bistatic structural shadowing. Specular scattering was calculated accurately even if targets did not meet the electrically large criteria. It was evident that the bistatic RCS prediction performance of PO and GO depends on incident angle, frequency, target shape and observation angle. The improved computational efficiency of the asymptotic solvers yields a major advantage over full-wave solvers and measurements; however, there is still much room for improvement of the accuracy of these asymptotic techniques.

Keywords: asymptotic techniques, bistatic RCS, geometrical optics, physical optics

Procedia PDF Downloads 258

1602 Field Prognostic Factors on Discharge Prediction of Traumatic Brain Injuries

Authors: Mohammad Javad Behzadnia, Amir Bahador Boroumand

Abstract:

Introduction: Limited-facility situations require allocating the available resources to benefit the most casualties. Accordingly, patients with traumatic brain injury (TBI) may need to be transported as soon as possible. In a mass casualty event, such decisions are hard when facilities are restricted. The Extended Glasgow Outcome Score (GOSE) has been introduced to assess the global outcome after brain injuries. Therefore, we aimed to evaluate the prognostic factors associated with GOSE. Materials and Methods: This multicenter cross-sectional study was conducted on 144 patients with TBI admitted to trauma emergency centers. All patients with isolated TBI who were mentally and physically healthy before the trauma entered the study. The patients' information was evaluated, including demographic characteristics, duration of hospital stay, mechanical ventilation, on-admission laboratory measurements, and on-admission vital signs. We recorded the patients’ TBI-related symptoms and brain computed tomography (CT) scan findings. Results: GOSE assessments showed an increasing trend across the on-discharge (7.47 ± 1.30), within a month (7.51 ± 1.30), and within three months (7.58 ± 1.21) evaluations (P < 0.001). On discharge, GOSE was positively correlated with Glasgow Coma Scale (GCS) (r = 0.729, P < 0.001) and motor GCS (r = 0.812, P < 0.001), and inversely with age (r = −0.261, P = 0.002), hospitalization period (r = −0.678, P < 0.001), pulse rate (r = −0.256, P = 0.002) and white blood cell (WBC) count. Among imaging signs and trauma-related symptoms in univariate analysis, intracranial hemorrhage (ICH), intraventricular hemorrhage (IVH) (P = 0.006), subarachnoid hemorrhage (SAH) (P = 0.06; marginally at P < 0.1), subdural hemorrhage (SDH) (P = 0.032), and epidural hemorrhage (EDH) (P = 0.037) were significantly associated with GOSE at discharge in multivariable analysis. 
Conclusion: Our study identified some predictive factors that could help decide which casualties should be transported earlier to a trauma center. According to the current study findings, GCS, pulse rate, WBC, and, among imaging signs and trauma-related symptoms, ICH, IVH, SAH, SDH, and EDH are significant independent predictors of GOSE at discharge in TBI patients.

Keywords: field, Glasgow outcome score, prediction, traumatic brain injury.

Procedia PDF Downloads 76

1601 Identification of Crimean-Congo Hemorrhagic Fever Virus in Patients Referred to Ahvaz and Gilan Hospitals in Iran by real-time PCR Technique

Authors: Najmeh Jafari, Sona Rostampour Yasouri

Abstract:

Crimean-Congo hemorrhagic fever (CCHF) is an acute hemorrhagic disease. It is a zoonotic disease, transmitted through tick bites or contact with the blood, secretions, or carcasses of infected animals and humans. CCHF is more common in people who work with livestock, such as ranchers, butchers, farmers, and slaughterhouse workers, as well as in healthcare workers. Its hospital prevalence is also very high. Considering that CCHF can be transmitted through the consumption of food such as beef and sheep meat, this study aims to quickly identify and diagnose the Crimean-Congo hemorrhagic fever virus (CCHFV) in suspected patients using the real-time PCR technique. In the summer of 1402, 20 blood samples were collected separately from Ahvaz and Gilan hospitals. An extraction kit was used to extract the viral RNA. Primers and probes were designed based on the S genomic region, the conserved region in CCHFV. Then, real-time PCR was performed with the specific primers and probes. The technique was repeated several times. Four of the examined samples tested positive by real-time PCR. This technique has high sensitivity and specificity and allows rapid detection of CCHFV. Therefore, the method is a good candidate for quick disease diagnosis. With an early diagnosis, treatment can begin sooner, and the best prevention methods can be used to control the disease and prevent the death of patients.

Keywords: Ahvaz, Crimean-Congo hemorrhagic fever, Gilan, real-time PCR

Procedia PDF Downloads 74

1600 PPRA Regulates DNA Replication Initiation and Cell Morphology in Escherichia coli

Authors: Ganesh K. Maurya, Reema Chaudhary, Neha Pandey, Hari S. Misra

Abstract:

PprA, a pleiotropic protein participating in radioresistance, has been reported for its roles in DNA replication initiation, genome segregation, cell division and DNA repair in the polyextremophile Deinococcus radiodurans. Interestingly, expression of deinococcal PprA in E. coli suppresses its growth by reducing the number of colony forming units and provides better resistance against γ-radiation than the control. We employed different biochemical and cell biology studies using PprA and its DNA binding/polymerization mutants (K133E & W183R) in E. coli. Cells expressing wild type PprA or its K133E mutant showed a reduction in the amount of genomic DNA as well as chromosome copy number in comparison to the W183R mutant of PprA and control cells, which suggests a role for the PprA protein in the regulation of DNA replication initiation in E. coli. Further, E. coli cells expressing PprA or its mutants exhibited a different impact on cell morphology than the control. Expression of PprA or the K133E mutant led to a significant increase in cell length, up to 5-fold, while the W183R mutant showed cell length similar to uninduced control cells. We checked the interaction of deinococcal PprA and its mutants with E. coli DnaA using a bacterial two-hybrid system and co-immunoprecipitation. We observed a functional interaction of EcDnaA with PprA and the K133E mutant but not with the W183R mutant of PprA. Further, PprA or the K133E mutant suppressed the ATPase activity of EcDnaA, but the W183R mutant of PprA failed to do so. These observations suggest that the PprA protein regulates DNA replication initiation and cell morphology of the surrogate E. coli.

Keywords: DNA replication, radioresistance, protein-protein interaction, cell morphology, ATPase activity

Procedia PDF Downloads 69

1599 PPRA Controls DNA Replication and Cell Growth in Escherichia coli

Authors: Ganesh K. Maurya, Reema Chaudhary, Neha Pandey, Hari S. Misra

Abstract:

PprA, a pleiotropic protein participating in radioresistance, has been reported for its roles in DNA replication initiation, genome segregation, cell division and DNA repair in the polyextremophile Deinococcus radiodurans. Interestingly, expression of deinococcal PprA in E. coli suppresses its growth by reducing the number of colony forming units and provides better resistance against γ-radiation than the control. We employed different biochemical and cell biology studies using PprA and its DNA binding/polymerization mutants (K133E & W183R) in E. coli. Cells expressing wild type PprA or its K133E mutant showed a reduction in the amount of genomic DNA as well as chromosome copy number in comparison to the W183R mutant of PprA and control cells, which suggests a role for the PprA protein in the regulation of DNA replication initiation in E. coli. Further, E. coli cells expressing PprA or its mutants exhibited a different impact on cell morphology than the control. Expression of PprA or the K133E mutant led to a significant increase in cell length, up to 5-fold, while the W183R mutant showed cell length similar to uninduced control cells. We checked the interaction of deinococcal PprA and its mutants with E. coli DnaA using a bacterial two-hybrid system and co-immunoprecipitation. We observed a functional interaction of EcDnaA with PprA and the K133E mutant but not with the W183R mutant of PprA. Further, PprA or the K133E mutant suppressed the ATPase activity of EcDnaA, but the W183R mutant of PprA failed to do so. These observations suggest that the PprA protein regulates DNA replication initiation and cell morphology of the surrogate E. coli.

Keywords: DNA replication, radioresistance, protein-protein interaction, cell morphology, ATPase activity

Procedia PDF Downloads 70

1598 Estimation of Fragility Curves Using Proposed Ground Motion Selection and Scaling Procedure

Authors: Esra Zengin, Sinan Akkar

Abstract:

Reliable and accurate prediction of nonlinear structural response requires specification of appropriate earthquake ground motions to be used in nonlinear time history analysis. Current research has mainly focused on the selection and manipulation of real earthquake records, which can be seen as the most critical step in the performance-based seismic design and assessment of structures. Using amplitude-scaled ground motions that match the target spectra is a commonly used technique for the estimation of nonlinear structural response. Representative ground motion ensembles are selected to match a target spectrum such as a scenario-based spectrum derived from ground motion prediction equations, the Uniform Hazard Spectrum (UHS), the Conditional Mean Spectrum (CMS) or the Conditional Spectrum (CS). The methodologies developed to select and scale ground motions use different sets of criteria, with the objective of obtaining a robust estimation of structural performance. This study presents a ground motion selection and scaling procedure that considers the spectral variability at the target demand together with the level of ground motion dispersion. The proposed methodology provides a set of ground motions whose response spectra match the target median and corresponding variance within a specified period interval. An efficient and simple algorithm is used to assemble the ground motion sets. The scaling stage is based on the minimization of the error between the scaled median and the target spectra, where the dispersion of the earthquake shaking is preserved along the period interval. The impact of the spectral variability on the nonlinear response distribution is investigated at the level of inelastic single degree of freedom systems. In order to see the effect of different selection and scaling methodologies on fragility curve estimations, the results are compared with those obtained by the CMS-based scaling methodology. 
The variability in fragility curves due to the consideration of dispersion in the ground motion selection process is also examined.
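
The scaling idea described in this abstract, matching the median of a record set to a target spectrum while leaving the dispersion untouched, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the abstract does not state the error measure, so the sketch assumes a sum of squared log-differences over the period interval, for which a single common scale factor has a closed form. All names and numbers are hypothetical.

```python
import math
from statistics import median, pstdev


def common_scale_factor(record_spectra, target_spectrum):
    """Single factor s minimizing sum_i (ln(s * median_i) - ln(target_i))^2.

    ASSUMPTION: squared log-error, chosen here only for illustration.
    record_spectra: list of spectra, each a list of spectral ordinates
    sampled at the same periods as target_spectrum.
    """
    n_periods = len(target_spectrum)
    medians = [median(rec[i] for rec in record_spectra) for i in range(n_periods)]
    # Setting the derivative to zero gives ln(s) = mean(ln target - ln median).
    log_s = sum(math.log(t) - math.log(m)
                for t, m in zip(target_spectrum, medians)) / n_periods
    return math.exp(log_s)


def log_dispersion(record_spectra, period_index):
    """Standard deviation of ln(ordinate) across records at one period."""
    return pstdev(math.log(rec[period_index]) for rec in record_spectra)


# Hypothetical spectral ordinates (g) for three records at two periods.
records = [[0.2, 0.4], [0.4, 0.8], [0.8, 1.6]]
target = [0.8, 1.6]

s = common_scale_factor(records, target)
scaled = [[s * value for value in rec] for rec in records]
```

Because every record is multiplied by the same factor, the spread of the log-ordinates is unchanged, which is one simple way to keep the record-to-record dispersion while shifting the median toward the target.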

Keywords: ground motion selection, scaling, uncertainty, fragility curve

Procedia PDF Downloads 583

1597 A Long Short-Term Memory Based Deep Learning Model for Corporate Bond Price Predictions

Authors: Vikrant Gupta, Amrit Goswami

Abstract:

The fixed income market forms the basis of the modern financial market. All other assets in financial markets derive their value from the bond market. Owing to their over-the-counter nature, corporate bonds have relatively little publicly available data and are thus researched far less than equities. Bond price prediction is a complex financial time series forecasting problem and is considered crucial in the domain of finance. Bond prices are highly volatile and full of noise, which makes it very difficult for traditional statistical time-series models to capture the complexity in series patterns and leads to inefficient forecasts. To overcome the inefficiencies of statistical models, various machine learning techniques were initially used in the literature for more accurate forecasting of time series. However, simple machine learning methods such as linear regression, support vectors, and random forests fail to provide efficient results when tested on highly complex sequences such as stock prices and bond prices. Hence, to capture these intricate sequence patterns, various deep learning-based methodologies have been discussed in the literature. In this study, a recurrent neural network-based deep learning model using long short-term memory networks for the prediction of corporate bond prices is discussed. Long short-term memory (LSTM) networks have been widely used in the literature for various sequence learning tasks in domains such as machine translation and speech recognition. In recent years, various studies have discussed the effectiveness of LSTMs in forecasting complex time-series sequences and have shown promising results when compared to other methodologies. LSTMs are a special kind of recurrent neural network capable of learning long-term dependencies, owing to a memory function that traditional neural networks lack. 
In this study, a simple LSTM, a Stacked LSTM and a Masked LSTM based model are discussed with respect to varying input sequences (three days, seven days and 14 days). In order to facilitate faster learning and to gradually decompose the complexity of the bond price sequence, Empirical Mode Decomposition (EMD) has been used, which improved the accuracy of the standalone LSTM model. With a variety of technical indicators and EMD-decomposed time series, the Masked LSTM outperformed its two counterparts in terms of prediction accuracy. To benchmark the proposed model, the results have been compared with traditional time series models (ARIMA), shallow neural networks and the three LSTM models discussed above. In summary, our results show that the use of LSTM models provides more accurate results and should be explored more within the asset management industry.
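
The "varying input sequences (three days, seven days and 14 days)" mentioned in this abstract correspond to look-back windows over the price series. The following minimal sketch shows how such windows are commonly built before being fed to a sequence model. It is an illustration under assumptions: the function name and the toy price list are hypothetical, and the abstract does not describe the authors' actual preprocessing.

```python
def make_windows(series, window):
    """Split a series into (input window, next value) training pairs.

    Each input is `window` consecutive observations; the target is the
    observation that immediately follows the window.
    """
    if window < 1 or window >= len(series):
        raise ValueError("window must be at least 1 and shorter than the series")
    n_pairs = len(series) - window
    inputs = [series[i:i + window] for i in range(n_pairs)]
    targets = [series[i + window] for i in range(n_pairs)]
    return inputs, targets


# Toy daily prices (invented for illustration).
prices = [100.0 + 0.5 * day for day in range(20)]

# One dataset per look-back length named in the abstract.
datasets = {days: make_windows(prices, days) for days in (3, 7, 14)}
```

Longer windows give the model more history per sample but leave fewer training pairs from the same series, which is one reason to compare several window lengths.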

Keywords: bond prices, long short-term memory, time series forecasting, empirical mode decomposition

Procedia PDF Downloads 136

1596 A Study of Body Weight and Type Traits Recorded on Hairy Goat in Punjab, Pakistan

Authors: A. Qayyum, G. Bilal, H. M. Waheed

Abstract:

The objectives of the study were to determine phenotypic variations in Hairy goats for quantitative and qualitative traits and to analyze the relationship between different body measurements and body weight in Hairy goats. Data were collected from the Barani Livestock Production Research Institute (BLPRI) at Kherimurat, Attock and potential farmers who were raising Hairy goats in the Potohar region. Twelve (12) phenotypic parameters were measured on 99 adult Hairy goats (18 male and 81 female). Four qualitative and 8 quantitative traits were investigated. Qualitative traits were visually observed and expressed as percentages. Descriptive analysis was done on quantitative variables. Hairy goats had a predominantly black body coat color (72%), whereas white (11%) and brown (11%) body coat colors were also observed. Both pigmented (45.5%) and non-pigmented (54.5%) types of body skin were observed in the breed. Horns were present in the majority (91%) of animals. Most of the animals (83%) had straight facial head profiles. Analysis was performed in SAS OnDemand for Academics using the PROC MIXED procedure. Overall means ± SD of body weight (BW), body length (BL), height at wither (HAW), ear length (EL), head length (HL), heart girth (HG), tail length (TL) and muzzle circumference (MC) were 41.44 ± 12.21 kg, 66.40 ± 7.87 cm, 75.17 ± 7.83 cm, 22.99 ± 6.75 cm, 15.07 ± 3.44 cm, 76.54 ± 8.80 cm, 18.28 ± 4.18 cm, and 26.24 ± 5.192 cm, respectively. Sex had a significant effect on BL and HG (P < 0.05), whereas BW, HAW, EL, HL, TL, and MC were not significantly affected (P > 0.05). The herd had a significant effect on BW, BL, HAW, HL, HG, and TL (P < 0.05) but not on EL and MC (P > 0.05). Hairy goats appear to have the potential for selection as a mutton breed in the Potohar region of Punjab. The findings of the present study would help in the characterization and conservation of Hairy goats using genetic and genomic tools in the future.

Keywords: body weight, Hairy goat, type traits, Punjab, Pakistan

Procedia PDF Downloads 67

1595 Polymorphisms in the Prolactin Gene (C576A) and Its Effect on Milk Production Traits in Crossbred Anglo-Nubian Dairy Goats

Authors: Carlo Stephen O. Moneva, Sharon Rose M. Tabugo

Abstract:

The present study aims to assess polymorphism in the prolactin (C576A) gene and determine the influence of different prolactin (PRL) genotypes on milk yield performance in crossbred Anglo-Nubian dairy goats raised in Awang, Opol, Misamis Oriental and Talay, Dumaguete City, Negros Oriental. Genomic DNA was extracted from hair follicles, and Polymerase Chain Reaction – Restriction Fragment Length Polymorphism (PCR-RFLP) was performed to genotype the C576A polymorphism located in exon 5 of the goat prolactin gene using the Eco241 restriction enzyme. Genotypic and allelic frequencies of 0.56 for AA, 0.44 for AB, 0.78 for A, and 0.22 for B were recorded. Observed heterozygosity values were higher than the expected heterozygosity. All populations followed the Hardy–Weinberg principle at p>0.05, except for dairy goats from Farm A located in Opol, Misamis Oriental. A two-way factorial (2 x 4) in a Randomized Complete Block Design was used to evaluate the relationship between genotypes and milk yield performance. PRL genotypes and parity were used as main factors and farm as the blocking factor. AB genotype goats produced significantly higher average daily milk yield and total milk production than AA genotype goats (p<0.05), an indication that the polymorphism in the caprine PRL (C576A) gene influenced milk yield performance in the population of crossbred Anglo-Nubian goats from Opol, Misamis Oriental and Dumaguete City, Negros Oriental. However, these results have to be validated in other dairy goat breeds.
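
The allele frequencies and the heterozygosity comparison reported in this abstract follow from the genotype frequencies by standard population-genetics arithmetic. The short sketch below reproduces that arithmetic. The function names are illustrative, and the BB frequency of zero is inferred from the fact that the reported AA and AB frequencies sum to 1; the abstract does not state it explicitly.

```python
def allele_frequencies(f_aa, f_ab, f_bb):
    """Allele frequencies from genotype frequencies at a biallelic locus."""
    p = f_aa + f_ab / 2  # frequency of allele A
    q = f_bb + f_ab / 2  # frequency of allele B
    return p, q


def expected_heterozygosity(p, q):
    """Heterozygote frequency expected under Hardy-Weinberg proportions."""
    return 2 * p * q


# Genotype frequencies from the abstract: AA = 0.56, AB = 0.44.
# ASSUMPTION: BB = 0.0, inferred because 0.56 + 0.44 = 1.
p, q = allele_frequencies(0.56, 0.44, 0.0)
h_expected = expected_heterozygosity(p, q)
h_observed = 0.44
```

With these inputs, p is 0.78 and q is 0.22, matching the reported allele frequencies, and the expected heterozygosity is 2 × 0.78 × 0.22 ≈ 0.343, which is below the observed 0.44, consistent with the statement that observed heterozygosity exceeded the expected value.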

Keywords: polymorphism, prolactin, milk yield, Anglo-Nubian, PCR-RFLP

Procedia PDF Downloads 106

1594 Measuring Enterprise Growth: Pitfalls and Implications

Authors: N. Šarlija, S. Pfeifer, M. Jeger, A. Bilandžić

Abstract:

Enterprise growth is generally considered a key driver of competitiveness, employment, economic development and social inclusion. As such, it is perceived to be a highly desirable outcome of entrepreneurship for scholars and decision makers. Extensive academic debate has resulted in a multitude of theoretical frameworks focused on explaining growth stages, determinants and future prospects. It has been widely accepted that enterprise growth is most likely nonlinear, temporal and related to a variety of factors which reflect the individual, firm, organizational, industry or environmental determinants of growth. However, factors that affect growth are not easily captured, instruments to measure those factors are often arbitrary, and causality between variables and growth is elusive, indicating that growth is not easily modeled. Furthermore, in line with the heterogeneous nature of the growth phenomenon, there is a vast number of measurement constructs assessing growth which are used interchangeably. Differences among various growth measures, at the conceptual as well as the operationalization level, can hinder theory development, which emphasizes the need for more empirically robust studies. In line with these highlights, the main purpose of this paper is threefold. Firstly, to compare the structure and performance of three growth prediction models based on the main growth measures: revenues, employment and assets growth. Secondly, to explore the prospects of financial indicators, set as exact, visible, standardized and accessible variables, to serve as determinants of enterprise growth. Finally, to contribute to the understanding of the implications for research results and recommendations for growth caused by different growth measures. The models include a range of financial indicators as lagged determinants of the enterprises’ performance during 2008-2013, extracted from the national register of the financial statements of SMEs in Croatia. 
The design and testing stages of the modeling used logistic regression procedures. Findings confirm that growth prediction models based on different measures of growth have different sets of predictors. Moreover, the relationship between particular predictors and the growth measure is inconsistent: the same predictor positively related to one growth measure may exert a negative effect on a different growth measure. Overall, financial indicators alone can serve as a good proxy of growth and yield adequate predictive power of the models. The paper sheds light on both the methodology and the conceptual framework of enterprise growth by using a range of variables which serve as a proxy for the multitude of internal and external determinants but, unlike them, are accessible, available, exact and free of perceptual nuances in building up the model. Selection of the growth measure seems to have a significant impact on the implications and recommendations related to growth. Furthermore, the paper points out potential pitfalls of measuring and predicting growth. Overall, the results and the implications of the study are relevant for advancing academic debates on growth-related methodology, and can contribute to evidence-based decisions of policy makers.

Keywords: growth measurement constructs, logistic regression, prediction of growth potential, small and medium-sized enterprises

Procedia PDF Downloads 252

1593 Harnessing Deep-Level Metagenomics to Explore the Three Dynamic One Health Areas: Healthcare, Domiciliary and Veterinary

Authors: Christina Killian, Katie Wall, Séamus Fanning, Guerrino Macori

Abstract:

Deep-level metagenomics offers a useful technical approach to explore the three dynamic One Health axes: healthcare, domiciliary and veterinary. There is currently limited understanding of the composition of complex biofilms, the natural abundance of AMR genes and the occurrence of gene transfer in these ecological niches. By using a newly established small-scale complex biofilm model, COMBAT has the potential to provide new information on microbial diversity, antimicrobial resistance (AMR)-encoding gene abundance, and their transfer in complex biofilms of importance to these three One Health axes. Shotgun metagenomics has been used to sample the genomes of all microbes comprising the complex communities found in each biofilm source. A comparative analysis between untreated and biocide-treated biofilms is described. The basic steps include the purification of genomic DNA, followed by library preparation, sequencing, and finally, data analysis. The use of long-read sequencing facilitates the completion of metagenome-assembled genomes (MAG). Samples were sequenced using a PromethION platform, and following quality checks, binning methods, and bespoke bioinformatics pipelines, we describe the recovery of individual MAGs to identify mobile genetic elements (MGE) and the corresponding AMR genotypes that map to these structures. High-throughput sequencing strategies have been deployed to characterize these communities. Accurately defining the profiles of these niches is an essential step towards elucidating the impact of the microbiota on each niche biofilm environment and their evolution.

Keywords: COMBAT, biofilm, metagenomics, high-throughput sequencing

Procedia PDF Downloads 56

1592 Lineup Optimization Model of Basketball Players Based on the Prediction of Recursive Neural Networks

Authors: Wang Yichen, Haruka Yamashita

Abstract:

In recent years, decision making in sports, such as player selection and game strategy, has increasingly been based on the analysis of accumulated sports data. In fact, in the NBA, where the world's top players gather, teams analyze data using various statistical techniques to win games. However, it is difficult to analyze the game data for each play, such as ball tracking or the motion of the players, because the game situation changes rapidly and the structure of the data is complicated. Therefore, an analysis method for real-time game play data is needed. In this research, we propose an analytical model for "determining the optimal lineup composition" using real-time play data, a task considered difficult for all coaches. Because replacing the entire lineup is too complicated, the practical questions for player replacement are "whether or not the lineup should be changed" and "whether or not a Small Ball lineup is adopted". Therefore, we propose an analytical model for the optimal player selection problem based on Small Ball lineups. In basketball, we can accumulate scoring data for each play, which indicates a player's contribution to the game, and the scoring data can be treated as time series data. In order to compare the importance of players in different situations and lineups, we combine an RNN (Recurrent Neural Network) model, which can analyze time series data, and an NN (Neural Network) model, which can analyze the situation on the field, to build the score prediction model. This model is capable of identifying the current optimal lineup for different situations. In this research, we collected the accumulated NBA data from 2019-2020. We then apply the method to the actual basketball play data to verify the reliability of the proposed model.

Keywords: recurrent neural network, players lineup, basketball data, decision making model

Procedia PDF Downloads 133

1591 Biophysical Characterization of Archaeal Cyclophilin Like Chaperone Protein

Authors: Vineeta Kaushik, Manisha Goel

Abstract:

Chaperones are proteins that help other proteins fold correctly, and are found in all domains of life, i.e., prokaryotes, eukaryotes and archaea. Various comparative genomic studies have suggested that the archaeal protein folding machinery appears to be highly similar to that found in eukaryotes. In protein folding, slow rotation of the peptide prolyl-imide bond is often the rate-limiting step. Formation of the prolyl-imide bond during the folding of a protein requires the assistance of other proteins, termed peptide prolyl cis-trans isomerases (PPIases). Cyclophilins constitute a class of peptide prolyl isomerases with a wide range of biological functions such as protein folding, signaling and chaperoning. Most cyclophilins exhibit PPIase enzymatic activity and play an active role in substrate protein folding, which classifies them as a category of molecular chaperones. To date, little data is available in the literature on archaeal cyclophilins. We aim to compare the structural and biochemical features of cyclophilin proteins from the three domains to elucidate the features affecting their stability and enzyme activity. In the present study, we carry out in-silico analysis of the cyclophilin proteins to predict their conserved residues and sites under positive selection, and compare these proteins to their bacterial and eukaryotic counterparts to predict functional divergence. We also aim to clone and express these proteins in a heterologous system and study their biophysical characteristics in detail using techniques such as CD and fluorescence spectroscopy. Overall, we aim to understand the features contributing to the folding, stability and dynamics of the archaeal cyclophilin proteins.

Keywords: biophysical characterization, x-ray crystallography, chaperone-like activity, cyclophilin, PPIase activity

Procedia PDF Downloads 213

1590 Comparing Performance of Neural Network and Decision Tree in Prediction of Myocardial Infarction

Authors: Reza Safdari, Goli Arji, Robab Abdolkhani, Maryam Zahmatkeshan

Abstract:

Background and purpose: Cardiovascular diseases are among the most common diseases in all societies. The most important step in minimizing myocardial infarction and its complications is to minimize its risk factors. The amount of medical data is growing rapidly. Medical data mining has great potential for transforming these data into information. Using data mining techniques to generate predictive models that identify those at risk is very helpful for reducing the effects of the disease. The present study aimed to collect data related to risk factors of myocardial infarction from patients’ medical records and to develop predictive models using data mining algorithms. Methods: The present work was an analytical study conducted on a database containing 350 records. Data were related to patients admitted to Shahid Rajaei specialized cardiovascular hospital, Iran, in 2011. Data were collected using a four-section data collection form. Data analysis was performed using SPSS and Clementine version 12. Seven predictive algorithms and one algorithm-based model for predicting association rules were applied to the data. Accuracy, precision, sensitivity, specificity, as well as positive and negative predictive values were determined and the final model was obtained. Results: Five parameters, including hypertension, DLP, tobacco smoking, diabetes, and A+ blood group, were the most critical risk factors of myocardial infarction. Among the models, the neural network model was found to have the highest sensitivity, indicating its ability to successfully diagnose the disease. Conclusion: Risk prediction models have great potential in facilitating the management of a patient with a specific disease. Therefore, health interventions or lifestyle changes can be planned based on these models to improve the health conditions of the individuals at risk.
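
The evaluation measures listed in this abstract (accuracy, sensitivity, specificity, and positive and negative predictive values) are all derived from the four cells of a binary confusion matrix. The sketch below shows those standard definitions. The counts are hypothetical and are not the study's results; precision is the same quantity as positive predictive value.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fn: diseased cases predicted positive/negative.
    tn/fp: healthy cases predicted negative/positive.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # share of diseased cases detected
        "specificity": tn / (tn + fp),   # share of healthy cases cleared
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
```

Sensitivity is the measure the abstract uses to rank the neural network first, since a model with high sensitivity misses few true cases of the disease.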

Keywords: decision trees, neural network, myocardial infarction, data mining

Procedia PDF Downloads 429

1589 Machine Learning Approach for Predicting Students’ Academic Performance and Study Strategies Based on Their Motivation

Authors: Fidelia A. Orji, Julita Vassileva

Abstract:

This research aims to develop machine learning models for students' academic performance and study strategy prediction, which could be generalized to all courses in higher education. Key learning attributes (intrinsic, extrinsic, autonomy, relatedness, competence, and self-esteem) used in building the models are chosen based on prior studies, which revealed that the attributes are essential in students’ learning process. Previous studies revealed the individual effects of each of these attributes on students’ learning progress. However, few studies have investigated the combined effect of the attributes in predicting student study strategy and academic performance to reduce the dropout rate. To bridge this gap, we used scikit-learn in Python to build five machine learning models (Decision Tree, K-Nearest Neighbour, Random Forest, Linear/Logistic Regression, and Support Vector Machine) for both regression and classification tasks. The models were trained, evaluated, and tested for accuracy using data from 924 university dentistry students, collected by Chilean authors through a quantitative research design. A comparative analysis of the models revealed that the tree-based models, such as the random forest (with a prediction accuracy of 94.9%) and the decision tree, showed the best results compared to the linear, support vector, and k-nearest neighbour models. The models built in this research can be used in predicting student performance and study strategy so that appropriate interventions can be implemented to improve student learning progress. Thus, incorporating strategies that could improve diverse student learning attributes in the design of online educational systems may increase the likelihood of students continuing with their learning tasks as required. Moreover, the results show that the attributes could be modelled together and used to adapt/personalize the learning process.
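A comparison of this kind could be set up in scikit-learn roughly as follows. This is only a sketch under stated assumptions: it uses synthetic data from `make_classification` in place of the students' data set, default hyperparameters, a single hold-out split, and the classification task only. None of these choices are taken from the paper, and the resulting accuracies say nothing about the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 924 samples and 6 features, loosely
# mirroring the sample size and the six motivation attributes in the abstract.
X, y = make_classification(
    n_samples=924, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The five model families named in the abstract, with default settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbour": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(accuracies.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.3f}")
```

For the regression task, the same loop would apply with the corresponding regressor classes (for example `LinearRegression` and `RandomForestRegressor`) and a regression metric in place of accuracy.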

Keywords: classification models, learning strategy, predictive modeling, regression models, student academic performance, student motivation, supervised machine learning

Procedia PDF Downloads 128