Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 2271

Search results for: seizure prediction

1491 Machine Learning Approaches Based on Recency, Frequency, Monetary (RFM) and K-Means for Predicting Electrical Failures and Voltage Reliability in Smart Cities

Authors: Panaya Sudta, Wanchalerm Patanacharoenwong, Prachya Bumrungkun

Abstract:

As With the evolution of smart grids, ensuring the reliability and efficiency of electrical systems in smart cities has become crucial. This paper proposes a distinct approach that combines advanced machine learning techniques to accurately predict electrical failures and address voltage reliability issues. This approach aims to improve the accuracy and efficiency of reliability evaluations in smart cities. The aim of this research is to develop a comprehensive predictive model that accurately predicts electrical failures and voltage reliability in smart cities. This model integrates RFM analysis, K-means clustering, and LSTM networks to achieve this objective. The research utilizes RFM analysis, traditionally used in customer value assessment, to categorize and analyze electrical components based on their failure recency, frequency, and monetary impact. K-means clustering is employed to segment electrical components into distinct groups with similar characteristics and failure patterns. LSTM networks are used to capture the temporal dependencies and patterns in customer data. This integration of RFM, K-means, and LSTM results in a robust predictive tool for electrical failures and voltage reliability. The proposed model has been tested and validated on diverse electrical utility datasets. The results show a significant improvement in prediction accuracy and reliability compared to traditional methods, achieving an accuracy of 92.78% and an F1-score of 0.83. This research contributes to the proactive maintenance and optimization of electrical infrastructures in smart cities. It also enhances overall energy management and sustainability. The integration of advanced machine learning techniques in the predictive model demonstrates the potential for transforming the landscape of electrical system management within smart cities. The research utilizes diverse electrical utility datasets to develop and validate the predictive model. RFM analysis, K-means clustering, and LSTM networks are applied to these datasets to analyze and predict electrical failures and voltage reliability. The research addresses the question of how accurately electrical failures and voltage reliability can be predicted in smart cities. It also investigates the effectiveness of integrating RFM analysis, K-means clustering, and LSTM networks in achieving this goal. The proposed approach presents a distinct, efficient, and effective solution for predicting and mitigating electrical failures and voltage issues in smart cities. It significantly improves prediction accuracy and reliability compared to traditional methods. This advancement contributes to the proactive maintenance and optimization of electrical infrastructures, overall energy management, and sustainability in smart cities.

Keywords: electrical state prediction, smart grids, data-driven method, long short-term memory, RFM, k-means, machine learning

Procedia PDF Downloads 49

1490 Heart Rate Variability Analysis for Early Stage Prediction of Sudden Cardiac Death

Authors: Reeta Devi, Hitender Kumar Tyagi, Dinesh Kumar

Abstract:

In present scenario, cardiovascular problems are growing challenge for researchers and physiologists. As heart disease have no geographic, gender or socioeconomic specific reasons; detecting cardiac irregularities at early stage followed by quick and correct treatment is very important. Electrocardiogram is the finest tool for continuous monitoring of heart activity. Heart rate variability (HRV) is used to measure naturally occurring oscillations between consecutive cardiac cycles. Analysis of this variability is carried out using time domain, frequency domain and non-linear parameters. This paper presents HRV analysis of the online dataset for normal sinus rhythm (taken as healthy subject) and sudden cardiac death (SCD subject) using all three methods computing values for parameters like standard deviation of node to node intervals (SDNN), square root of mean of the sequences of difference between adjacent RR intervals (RMSSD), mean of R to R intervals (mean RR) in time domain, very low-frequency (VLF), low-frequency (LF), high frequency (HF) and ratio of low to high frequency (LF/HF ratio) in frequency domain and Poincare plot for non linear analysis. To differentiate HRV of healthy subject from subject died with SCD, k –nearest neighbor (k-NN) classifier has been used because of its high accuracy. Results show highly reduced values for all stated parameters for SCD subjects as compared to healthy ones. As the dataset used for SCD patients is recording of their ECG signal one hour prior to their death, it is therefore, verified with an accuracy of 95% that proposed algorithm can identify mortality risk of a patient one hour before its death. The identification of a patient’s mortality risk at such an early stage may prevent him/her meeting sudden death if in-time and right treatment is given by the doctor.

Keywords: early stage prediction, heart rate variability, linear and non-linear analysis, sudden cardiac death

Procedia PDF Downloads 337

1489 Implementation of Deep Neural Networks for Pavement Condition Index Prediction

Authors: M. Sirhan, S. Bekhor, A. Sidess

Abstract:

In-service pavements deteriorate with time due to traffic wheel loads, environment, and climate conditions. Pavement deterioration leads to a reduction in their serviceability and structural behavior. Consequently, proper maintenance and rehabilitation (M&R) are necessary actions to keep the in-service pavement network at the desired level of serviceability. Due to resource and financial constraints, the pavement management system (PMS) prioritizes roads most in need of maintenance and rehabilitation action. It recommends a suitable action for each pavement based on the performance and surface condition of each road in the network. The pavement performance and condition are usually quantified and evaluated by different types of roughness-based and stress-based indices. Examples of such indices are Pavement Serviceability Index (PSI), Pavement Serviceability Ratio (PSR), Mean Panel Rating (MPR), Pavement Condition Rating (PCR), Ride Number (RN), Profile Index (PI), International Roughness Index (IRI), and Pavement Condition Index (PCI). PCI is commonly used in PMS as an indicator of the extent of the distresses on the pavement surface. PCI values range between 0 and 100; where 0 and 100 represent a highly deteriorated pavement and a newly constructed pavement, respectively. The PCI value is a function of distress type, severity, and density (measured as a percentage of the total pavement area). PCI is usually calculated iteratively using the 'Paver' program developed by the US Army Corps. The use of soft computing techniques, especially Artificial Neural Network (ANN), has become increasingly popular in the modeling of engineering problems. ANN techniques have successfully modeled the performance of the in-service pavements, due to its efficiency in predicting and solving non-linear relationships and dealing with an uncertain large amount of data. Typical regression models, which require a pre-defined relationship, can be replaced by ANN, which was found to be an appropriate tool for predicting the different pavement performance indices versus different factors as well. Subsequently, the objective of the presented study is to develop and train an ANN model that predicts the PCI values. The model’s input consists of percentage areas of 11 different damage types; alligator cracking, swelling, rutting, block cracking, longitudinal/transverse cracking, edge cracking, shoving, raveling, potholes, patching, and lane drop off, at three severity levels (low, medium, high) for each. The developed model was trained using 536,000 samples and tested on 134,000 samples. The samples were collected and prepared by The National Transport Infrastructure Company. The predicted results yielded satisfactory compliance with field measurements. The proposed model predicted PCI values with relatively low standard deviations, suggesting that it could be incorporated into the PMS for PCI determination. It is worth mentioning that the most influencing variables for PCI prediction are damages related to alligator cracking, swelling, rutting, and potholes.

Keywords: artificial neural networks, computer programming, pavement condition index, pavement management, performance prediction

Procedia PDF Downloads 131

1488 Validation of Nutritional Assessment Scores in Prediction of Mortality and Duration of Admission in Elderly, Hospitalized Patients: A Cross-Sectional Study

Authors: Christos Lampropoulos, Maria Konsta, Vicky Dradaki, Irini Dri, Konstantina Panouria, Tamta Sirbilatze, Ifigenia Apostolou, Vaggelis Lambas, Christina Kordali, Georgios Mavras

Abstract:

Objectives: Malnutrition in hospitalized patients is related to increased morbidity and mortality. The purpose of our study was to compare various nutritional scores in order to detect the most suitable one for assessing the nutritional status of elderly, hospitalized patients and correlate them with mortality and extension of admission duration, due to patients’ critical condition. Methods: Sample population included 150 patients (78 men, 72 women, mean age 80±8.2). Nutritional status was assessed by Mini Nutritional Assessment (MNA full, short-form), Malnutrition Universal Screening Tool (MUST) and short Nutritional Appetite Questionnaire (sNAQ). Sensitivity, specificity, positive and negative predictive values and ROC curves were assessed after adjustment for the cause of current admission, a known prognostic factor according to previously applied multivariate models. Primary endpoints were mortality (from admission until 6 months afterwards) and duration of hospitalization, compared to national guidelines for closed consolidated medical expenses. Results: Concerning mortality, MNA (short-form and full) and SNAQ had similar, low sensitivity (25.8%, 25.8% and 35.5% respectively) while MUST had higher sensitivity (48.4%). In contrast, all the questionnaires had high specificity (94%-97.5%). Short-form MNA and sNAQ had the best positive predictive value (72.7% and 78.6% respectively) whereas all the questionnaires had similar negative predictive value (83.2%-87.5%). MUST had the highest ROC curve (0.83) in contrast to the rest questionnaires (0.73-0.77). With regard to extension of admission duration, all four scores had relatively low sensitivity (48.7%-56.7%), specificity (68.4%-77.6%), positive predictive value (63.1%-69.6%), negative predictive value (61%-63%) and ROC curve (0.67-0.69). Conclusion: MUST questionnaire is more advantageous in predicting mortality due to its higher sensitivity and ROC curve. None of the nutritional scores is suitable for prediction of extended hospitalization.

Keywords: duration of admission, malnutrition, nutritional assessment scores, prognostic factors for mortality

Procedia PDF Downloads 338

1487 Advancements in Predicting Diabetes Biomarkers: A Machine Learning Epigenetic Approach

Authors: James Ladzekpo

Abstract:

Background: The urgent need to identify new pharmacological targets for diabetes treatment and prevention has been amplified by the disease's extensive impact on individuals and healthcare systems. A deeper insight into the biological underpinnings of diabetes is crucial for the creation of therapeutic strategies aimed at these biological processes. Current predictive models based on genetic variations fall short of accurately forecasting diabetes. Objectives: Our study aims to pinpoint key epigenetic factors that predispose individuals to diabetes. These factors will inform the development of an advanced predictive model that estimates diabetes risk from genetic profiles, utilizing state-of-the-art statistical and data mining methods. Methodology: We have implemented a recursive feature elimination with cross-validation using the support vector machine (SVM) approach for refined feature selection. Building on this, we developed six machine learning models, including logistic regression, k-Nearest Neighbors (k-NN), Naive Bayes, Random Forest, Gradient Boosting, and Multilayer Perceptron Neural Network, to evaluate their performance. Findings: The Gradient Boosting Classifier excelled, achieving a median recall of 92.17% and outstanding metrics such as area under the receiver operating characteristics curve (AUC) with a median of 68%, alongside median accuracy and precision scores of 76%. Through our machine learning analysis, we identified 31 genes significantly associated with diabetes traits, highlighting their potential as biomarkers and targets for diabetes management strategies. Conclusion: Particularly noteworthy were the Gradient Boosting Classifier and Multilayer Perceptron Neural Network, which demonstrated potential in diabetes outcome prediction. We recommend future investigations to incorporate larger cohorts and a wider array of predictive variables to enhance the models' predictive capabilities.

Keywords: diabetes, machine learning, prediction, biomarkers

Procedia PDF Downloads 49

1486 The Prediction of Evolutionary Process of Coloured Vision in Mammals: A System Biology Approach

Authors: Shivani Sharma, Prashant Saxena, Inamul Hasan Madar

Abstract:

Since the time of Darwin, it has been considered that genetic change is the direct indicator of variation in phenotype. But a few studies in system biology in the past years have proposed that epigenetic developmental processes also affect the phenotype thus shifting the focus from a linear genotype-phenotype map to a non-linear G-P map. In this paper, we attempt at explaining the evolution of colour vision in mammals by taking LWS/ Long-wave sensitive gene under consideration.

Keywords: evolution, phenotypes, epigenetics, LWS gene, G-P map

Procedia PDF Downloads 513

1485 Applying Semi-Automatic Digital Aerial Survey Technology and Canopy Characters Classification for Surface Vegetation Interpretation of Archaeological Sites

Authors: Yung-Chung Chuang

Abstract:

The cultural layers of archaeological sites are mainly affected by surface land use, land cover, and root system of surface vegetation. For this reason, continuous monitoring of land use and land cover change is important for archaeological sites protection and management. However, in actual operation, on-site investigation and orthogonal photograph interpretation require a lot of time and manpower. For this reason, it is necessary to perform a good alternative for surface vegetation survey in an automated or semi-automated manner. In this study, we applied semi-automatic digital aerial survey technology and canopy characters classification with very high-resolution aerial photographs for surface vegetation interpretation of archaeological sites. The main idea is based on different landscape or forest type can easily be distinguished with canopy characters (e.g., specific texture distribution, shadow effects and gap characters) extracted by semi-automatic image classification. A novel methodology to classify the shape of canopy characters using landscape indices and multivariate statistics was also proposed. Non-hierarchical cluster analysis was used to assess the optimal number of canopy character clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy character classification (seven categories). Therefore, people could easily predict the forest type and vegetation land cover by corresponding to the specific canopy character category. The results showed that the semi-automatic classification could effectively extract the canopy characters of forest and vegetation land cover. As for forest type and vegetation type prediction, the average prediction accuracy reached 80.3%~91.7% with different sizes of test frame. It represented this technology is useful for archaeological site survey, and can improve the classification efficiency and data update rate.

Keywords: digital aerial survey, canopy characters classification, archaeological sites, multivariate statistics

Procedia PDF Downloads 136

1484 A Generalized Weighted Loss for Support Vextor Classification and Multilayer Perceptron

Authors: Filippo Portera

Abstract:

Usually standard algorithms employ a loss where each error is the mere absolute difference between the true value and the prediction, in case of a regression task. In the present, we present several error weighting schemes that are a generalization of the consolidated routine. We study both a binary classification model for Support Vextor Classification and a regression net for Multylayer Perceptron. Results proves that the error is never worse than the standard procedure and several times it is better.

Keywords: loss, binary-classification, MLP, weights, regression

Procedia PDF Downloads 86

1483 Gradient Boosted Trees on Spark Platform for Supervised Learning in Health Care Big Data

Authors: Gayathri Nagarajan, L. D. Dhinesh Babu

Abstract:

Health care is one of the prominent industries that generate voluminous data thereby finding the need of machine learning techniques with big data solutions for efficient processing and prediction. Missing data, incomplete data, real time streaming data, sensitive data, privacy, heterogeneity are few of the common challenges to be addressed for efficient processing and mining of health care data. In comparison with other applications, accuracy and fast processing are of higher importance for health care applications as they are related to the human life directly. Though there are many machine learning techniques and big data solutions used for efficient processing and prediction in health care data, different techniques and different frameworks are proved to be effective for different applications largely depending on the characteristics of the datasets. In this paper, we present a framework that uses ensemble machine learning technique gradient boosted trees for data classification in health care big data. The framework is built on Spark platform which is fast in comparison with other traditional frameworks. Unlike other works that focus on a single technique, our work presents a comparison of six different machine learning techniques along with gradient boosted trees on datasets of different characteristics. Five benchmark health care datasets are considered for experimentation, and the results of different machine learning techniques are discussed in comparison with gradient boosted trees. The metric chosen for comparison is misclassification error rate and the run time of the algorithms. The goal of this paper is to i) Compare the performance of gradient boosted trees with other machine learning techniques in Spark platform specifically for health care big data and ii) Discuss the results from the experiments conducted on datasets of different characteristics thereby drawing inference and conclusion. The experimental results show that the accuracy is largely dependent on the characteristics of the datasets for other machine learning techniques whereas gradient boosting trees yields reasonably stable results in terms of accuracy without largely depending on the dataset characteristics.

Keywords: big data analytics, ensemble machine learning, gradient boosted trees, Spark platform

Procedia PDF Downloads 233

1482 Validation of Asymptotic Techniques to Predict Bistatic Radar Cross Section

Authors: M. Pienaar, J. W. Odendaal, J. C. Smit, J. Joubert

Abstract:

Simulations are commonly used to predict the bistatic radar cross section (RCS) of military targets since characterization measurements can be expensive and time consuming. It is thus important to accurately predict the bistatic RCS of targets. Computational electromagnetic (CEM) methods can be used for bistatic RCS prediction. CEM methods are divided into full-wave and asymptotic methods. Full-wave methods are numerical approximations to the exact solution of Maxwell’s equations. These methods are very accurate but are computationally very intensive and time consuming. Asymptotic techniques make simplifying assumptions in solving Maxwell's equations and are thus less accurate but require less computational resources and time. Asymptotic techniques can thus be very valuable for the prediction of bistatic RCS of electrically large targets, due to the decreased computational requirements. This study extends previous work by validating the accuracy of asymptotic techniques to predict bistatic RCS through comparison with full-wave simulations as well as measurements. Validation is done with canonical structures as well as complex realistic aircraft models instead of only looking at a complex slicy structure. The slicy structure is a combination of canonical structures, including cylinders, corner reflectors and cubes. Validation is done over large bistatic angles and at different polarizations. Bistatic RCS measurements were conducted in a compact range, at the University of Pretoria, South Africa. The measurements were performed at different polarizations from 2 GHz to 6 GHz. Fixed bistatic angles of β = 30.8°, 45° and 90° were used. The measurements were calibrated with an active calibration target. The EM simulation tool FEKO was used to generate simulated results. The full-wave multi-level fast multipole method (MLFMM) simulated results together with the measured data were used as reference for validation. The accuracy of physical optics (PO) and geometrical optics (GO) was investigated. Differences relating to amplitude, lobing structure and null positions were observed between the asymptotic, full-wave and measured data. PO and GO were more accurate at angles close to the specular scattering directions and the accuracy seemed to decrease as the bistatic angle increased. At large bistatic angles PO did not perform well due to the shadow regions not being treated appropriately. PO also did not perform well for canonical structures where multi-bounce was the main scattering mechanism. PO and GO do not account for diffraction but these inaccuracies tended to decrease as the electrical size of objects increased. It was evident that both asymptotic techniques do not properly account for bistatic structural shadowing. Specular scattering was calculated accurately even if targets did not meet the electrically large criteria. It was evident that the bistatic RCS prediction performance of PO and GO depends on incident angle, frequency, target shape and observation angle. The improved computational efficiency of the asymptotic solvers yields a major advantage over full-wave solvers and measurements; however, there is still much room for improvement of the accuracy of these asymptotic techniques.

Keywords: asymptotic techniques, bistatic RCS, geometrical optics, physical optics

Procedia PDF Downloads 251

1481 Field Prognostic Factors on Discharge Prediction of Traumatic Brain Injuries

Authors: Mohammad Javad Behzadnia, Amir Bahador Boroumand

Abstract:

Introduction: Limited facility situations require allocating the most available resources for most casualties. Accordingly, Traumatic Brain Injury (TBI) is the one that may need to transport the patient as soon as possible. In a mass casualty event, deciding when the facilities are restricted is hard. The Extended Glasgow Outcome Score (GOSE) has been introduced to assess the global outcome after brain injuries. Therefore, we aimed to evaluate the prognostic factors associated with GOSE. Materials and Methods: In a multicenter cross-sectional study conducted on 144 patients with TBI admitted to trauma emergency centers. All the patients with isolated TBI who were mentally and physically healthy before the trauma entered the study. The patient’s information was evaluated, including demographic characteristics, duration of hospital stays, mechanical ventilation on admission laboratory measurements, and on-admission vital signs. We recorded the patients’ TBI-related symptoms and brain computed tomography (CT) scan findings. Results: GOSE assessments showed an increasing trend by the comparison of on-discharge (7.47 ± 1.30), within a month (7.51 ± 1.30), and within three months (7.58 ± 1.21) evaluations (P < 0.001). On discharge, GOSE was positively correlated with Glasgow Coma Scale (GCS) (r = 0.729, P < 0.001) and motor GCS (r = 0.812, P < 0.001), and inversely with age (r = −0.261, P = 0.002), hospitalization period (r = −0.678, P < 0.001), pulse rate (r = −0.256, P = 0.002) and white blood cell (WBC). Among imaging signs and trauma-related symptoms in univariate analysis, intracranial hemorrhage (ICH), interventricular hemorrhage (IVH) (P = 0.006), subarachnoid hemorrhage (SAH) (P = 0.06; marginally at P < 0.1), subdural hemorrhage (SDH) (P = 0.032), and epidural hemorrhage (EDH) (P = 0.037) were significantly associated with GOSE at discharge in multivariable analysis. Conclusion: Our study showed some predictive factors that could help to decide which casualty should transport earlier to a trauma center. According to the current study findings, GCS, pulse rate, WBC, and among imaging signs and trauma-related symptoms, ICH, IVH, SAH, SDH, and EDH are significant independent predictors of GOSE at discharge in TBI patients.

Keywords: field, Glasgow outcome score, prediction, traumatic brain injury.

Procedia PDF Downloads 69

1480 Estimation of Fragility Curves Using Proposed Ground Motion Selection and Scaling Procedure

Authors: Esra Zengin, Sinan Akkar

Abstract:

Reliable and accurate prediction of nonlinear structural response requires specification of appropriate earthquake ground motions to be used in nonlinear time history analysis. The current research has mainly focused on selection and manipulation of real earthquake records that can be seen as the most critical step in the performance based seismic design and assessment of the structures. Utilizing amplitude scaled ground motions that matches with the target spectra is commonly used technique for the estimation of nonlinear structural response. Representative ground motion ensembles are selected to match target spectrum such as scenario-based spectrum derived from ground motion prediction equations, Uniform Hazard Spectrum (UHS), Conditional Mean Spectrum (CMS) or Conditional Spectrum (CS). Different sets of criteria exist among those developed methodologies to select and scale ground motions with the objective of obtaining robust estimation of the structural performance. This study presents ground motion selection and scaling procedure that considers the spectral variability at target demand with the level of ground motion dispersion. The proposed methodology provides a set of ground motions whose response spectra match target median and corresponding variance within a specified period interval. The efficient and simple algorithm is used to assemble the ground motion sets. The scaling stage is based on the minimization of the error between scaled median and the target spectra where the dispersion of the earthquake shaking is preserved along the period interval. The impact of the spectral variability on nonlinear response distribution is investigated at the level of inelastic single degree of freedom systems. In order to see the effect of different selection and scaling methodologies on fragility curve estimations, results are compared with those obtained by CMS-based scaling methodology. The variability in fragility curves due to the consideration of dispersion in ground motion selection process is also examined.

Keywords: ground motion selection, scaling, uncertainty, fragility curve

Procedia PDF Downloads 581

1479 A Long Short-Term Memory Based Deep Learning Model for Corporate Bond Price Predictions

Authors: Vikrant Gupta, Amrit Goswami

Abstract:

The fixed income market forms the basis of the modern financial market. All other assets in financial markets derive their value from the bond market. Owing to its over-the-counter nature, corporate bonds have relatively less data publicly available and thus is researched upon far less compared to Equities. Bond price prediction is a complex financial time series forecasting problem and is considered very crucial in the domain of finance. The bond prices are highly volatile and full of noise which makes it very difficult for traditional statistical time-series models to capture the complexity in series patterns which leads to inefficient forecasts. To overcome the inefficiencies of statistical models, various machine learning techniques were initially used in the literature for more accurate forecasting of time-series. However, simple machine learning methods such as linear regression, support vectors, random forests fail to provide efficient results when tested on highly complex sequences such as stock prices and bond prices. hence to capture these intricate sequence patterns, various deep learning-based methodologies have been discussed in the literature. In this study, a recurrent neural network-based deep learning model using long short term networks for prediction of corporate bond prices has been discussed. Long Short Term networks (LSTM) have been widely used in the literature for various sequence learning tasks in various domains such as machine translation, speech recognition, etc. In recent years, various studies have discussed the effectiveness of LSTMs in forecasting complex time-series sequences and have shown promising results when compared to other methodologies. LSTMs are a special kind of recurrent neural networks which are capable of learning long term dependencies due to its memory function which traditional neural networks fail to capture. In this study, a simple LSTM, Stacked LSTM and a Masked LSTM based model has been discussed with respect to varying input sequences (three days, seven days and 14 days). In order to facilitate faster learning and to gradually decompose the complexity of bond price sequence, an Empirical Mode Decomposition (EMD) has been used, which has resulted in accuracy improvement of the standalone LSTM model. With a variety of Technical Indicators and EMD decomposed time series, Masked LSTM outperformed the other two counterparts in terms of prediction accuracy. To benchmark the proposed model, the results have been compared with traditional time series models (ARIMA), shallow neural networks and above discussed three different LSTM models. In summary, our results show that the use of LSTM models provide more accurate results and should be explored more within the asset management industry.

Keywords: bond prices, long short-term memory, time series forecasting, empirical mode decomposition

Procedia PDF Downloads 133

1478 Measuring Enterprise Growth: Pitfalls and Implications

Authors: N. Šarlija, S. Pfeifer, M. Jeger, A. Bilandžić

Abstract:

Enterprise growth is generally considered as a key driver of competitiveness, employment, economic development and social inclusion. As such, it is perceived to be a highly desirable outcome of entrepreneurship for scholars and decision makers. The huge academic debate resulted in the multitude of theoretical frameworks focused on explaining growth stages, determinants and future prospects. It has been widely accepted that enterprise growth is most likely nonlinear, temporal and related to the variety of factors which reflect the individual, firm, organizational, industry or environmental determinants of growth. However, factors that affect growth are not easily captured, instruments to measure those factors are often arbitrary, causality between variables and growth is elusive, indicating that growth is not easily modeled. Furthermore, in line with heterogeneous nature of the growth phenomenon, there is a vast number of measurement constructs assessing growth which are used interchangeably. Differences among various growth measures, at conceptual as well as at operationalization level, can hinder theory development which emphasizes the need for more empirically robust studies. In line with these highlights, the main purpose of this paper is twofold. Firstly, to compare structure and performance of three growth prediction models based on the main growth measures: Revenues, employment and assets growth. Secondly, to explore the prospects of financial indicators, set as exact, visible, standardized and accessible variables, to serve as determinants of enterprise growth. Finally, to contribute to the understanding of the implications on research results and recommendations for growth caused by different growth measures. The models include a range of financial indicators as lag determinants of the enterprises’ performances during the 2008-2013, extracted from the national register of the financial statements of SMEs in Croatia. The design and testing stage of the modeling used the logistic regression procedures. Findings confirm that growth prediction models based on different measures of growth have different set of predictors. Moreover, the relationship between particular predictors and growth measure is inconsistent, namely the same predictor positively related to one growth measure may exert negative effect on a different growth measure. Overall, financial indicators alone can serve as good proxy of growth and yield adequate predictive power of the models. The paper sheds light on both methodology and conceptual framework of enterprise growth by using a range of variables which serve as a proxy for the multitude of internal and external determinants, but are unlike them, accessible, available, exact and free of perceptual nuances in building up the model. Selection of the growth measure seems to have significant impact on the implications and recommendations related to growth. Furthermore, the paper points out to potential pitfalls of measuring and predicting growth. Overall, the results and the implications of the study are relevant for advancing academic debates on growth-related methodology, and can contribute to evidence-based decisions of policy makers.

Keywords: growth measurement constructs, logistic regression, prediction of growth potential, small and medium-sized enterprises

Procedia PDF Downloads 247

1477 Lineup Optimization Model of Basketball Players Based on the Prediction of Recursive Neural Networks

Authors: Wang Yichen, Haruka Yamashita

Abstract:

In recent years, in the field of sports, decision making such as member in the game and strategy of the game based on then analysis of the accumulated sports data are widely attempted. In fact, in the NBA basketball league where the world's highest level players gather, to win the games, teams analyze the data using various statistical techniques. However, it is difficult to analyze the game data for each play such as the ball tracking or motion of the players in the game, because the situation of the game changes rapidly, and the structure of the data should be complicated. Therefore, it is considered that the analysis method for real time game play data is proposed. In this research, we propose an analytical model for "determining the optimal lineup composition" using the real time play data, which is considered to be difficult for all coaches. In this study, because replacing the entire lineup is too complicated, and the actual question for the replacement of players is "whether or not the lineup should be changed", and “whether or not Small Ball lineup is adopted”. Therefore, we propose an analytical model for the optimal player selection problem based on Small Ball lineups. In basketball, we can accumulate scoring data for each play, which indicates a player's contribution to the game, and the scoring data can be considered as a time series data. In order to compare the importance of players in different situations and lineups, we combine RNN (Recurrent Neural Network) model, which can analyze time series data, and NN (Neural Network) model, which can analyze the situation on the field, to build the prediction model of score. This model is capable to identify the current optimal lineup for different situations. In this research, we collected all the data of accumulated data of NBA from 2019-2020. Then we apply the method to the actual basketball play data to verify the reliability of the proposed model.

Keywords: recurrent neural network, players lineup, basketball data, decision making model

Procedia PDF Downloads 126

1476 Fastidious Enteric Pathogens in HIV

Authors: S. Pathak, R. Lazarus

Abstract:

A 25-year-old male HIV patient (CD4 cells 20/µL and HIV viral load 14200000 copies/ml) with a past medical history of duodenal ulcer, pneumocystis carinii pneumonia, oesophageal candidiasis presented with fever and a seizure to hospital. The only recent travel had been a religious pilgrimage from Singapore to Malaysia 5 days prior; during the trip he sustained skin abrasions. The patient had recently started highly active antiretroviral therapy 2 months prior. Clinical examination was unremarkable other than a temperature of 38.8°C and perianal warts. Laboratory tests showed a leukocyte count 12.5x109 cells/L, haemoglobin 9.4 g/dL, normal biochemistry and a C-reactive protein 121 mg/L. CT head and MRI head were unremarkable and cerebrospinal fluid analysis performed after a delay (due to technical difficulties) of 11 days was unremarkable. Blood cultures (three sets) taken on admission showed Gram-negative rods in the anaerobic bottles only at the end of incubation with culture result confirmed by molecular sequencing showing Helicobacter cinaedi. The patient was treated empirically with ceftriaxone for seven days and this was converted to oral co-amoxiclav for a further seven days after the blood cultures became positive. A Transthoracic echocardiogram was unremarkable. The patient made a full recovery. Helicobacter cinaedi is a gram-negative anaerobic fastidious organism affecting patients with comorbidity. Infection may manifest as cellulitius, colitis or as in this case as bloodstream infection – the latter is often attributed to faeco-oral infection. Laboratory identification requires prolonged culture. Therapeutic options may be limited by resistance to macrolides and fluoroquinolones. The likely pathogen inoculation routes in the case described include gastrointestinal translocation due to proctitis at the site of perianal warts, or breach of the skin via abrasions occurring during the pilgrimage. Such organisms are increasing in prevalence as our patient population ages and patients have multiple comorbidities including HIV. It may be necessary in patients with unexplained fever to prolong incubation of sterile sites including blood in order to identify this unusual fastidious organism.

Keywords: fastidious, Helicobacter cinaedi, HIV, immunocompromised

Procedia PDF Downloads 373

1475 Comparing Performance of Neural Network and Decision Tree in Prediction of Myocardial Infarction

Authors: Reza Safdari, Goli Arji, Robab Abdolkhani Maryam zahmatkeshan

Abstract:

Background and purpose: Cardiovascular diseases are among the most common diseases in all societies. The most important step in minimizing myocardial infarction and its complications is to minimize its risk factors. The amount of medical data is increasingly growing. Medical data mining has a great potential for transforming these data into information. Using data mining techniques to generate predictive models for identifying those at risk for reducing the effects of the disease is very helpful. The present study aimed to collect data related to risk factors of heart infarction from patients’ medical record and developed predicting models using data mining algorithm. Methods: The present work was an analytical study conducted on a database containing 350 records. Data were related to patients admitted to Shahid Rajaei specialized cardiovascular hospital, Iran, in 2011. Data were collected using a four-sectioned data collection form. Data analysis was performed using SPSS and Clementine version 12. Seven predictive algorithms and one algorithm-based model for predicting association rules were applied to the data. Accuracy, precision, sensitivity, specificity, as well as positive and negative predictive values were determined and the final model was obtained. Results: five parameters, including hypertension, DLP, tobacco smoking, diabetes, and A+ blood group, were the most critical risk factors of myocardial infarction. Among the models, the neural network model was found to have the highest sensitivity, indicating its ability to successfully diagnose the disease. Conclusion: Risk prediction models have great potentials in facilitating the management of a patient with a specific disease. Therefore, health interventions or change in their life style can be conducted based on these models for improving the health conditions of the individuals at risk.

Keywords: decision trees, neural network, myocardial infarction, Data Mining

Procedia PDF Downloads 424

1474 Cost-Effectiveness of a Certified Service or Hearing Dog Compared to a Regular Companion Dog

Authors: Lundqvist M., Alwin J., Levin L-A.

Abstract:

Background: Assistance dogs are dogs trained to assist persons with functional impairment or chronic diseases. The assistance dog concept includes different types: guide dogs, hearing dogs, and service dogs. The service dog can further be divided into subgroups of physical services dogs, diabetes alert dogs, and seizure alert dogs. To examine the long-term effects of health care interventions, both in terms of resource use and health outcomes, cost-effectiveness analyses can be conducted. This analysis can provide important input to decision-makers when setting priorities. Little is known when it comes to the cost-effectiveness of assistance dogs. The study aimed to assess the cost-effectiveness of certified service or hearing dogs in comparison to regular companion dogs. Methods: The main data source for the analysis was the “service and hearing dog project”. It was a longitudinal interventional study with a pre-post design that incorporated fifty-five owners and their dogs. Data on all relevant costs affected by the use of a service dog such as; municipal services, health care costs, costs of sick leave, and costs of informal care were collected. Health-related quality of life was measured with the standardized instrument EQ-5D-3L. A decision-analytic Markov model was constructed to conduct the cost-effectiveness analysis. Outcomes were estimated over a 10-year time horizon. The incremental cost-effectiveness ratio expressed as cost per gained quality-adjusted life year was the primary outcome. The analysis employed a societal perspective. Results: The result of the cost-effectiveness analysis showed that compared to a regular companion dog, a certified dog is cost-effective with both lower total costs [-32,000 USD] and more quality-adjusted life-years [0.17]. Also, we will present subgroup results analyzing the cost-effectiveness of physicals service dogs and diabetes alert dogs. Conclusions: The study shows that a certified dog is cost-effective in comparison with a regular companion dog for individuals with functional impairments or chronic diseases. Analyses of uncertainty imply that further studies are needed.

Keywords: service dogs, hearing dogs, health economics, Markov model, quality-adjusted, life years

Procedia PDF Downloads 143

1473 Machine Learning Approach for Predicting Students’ Academic Performance and Study Strategies Based on Their Motivation

Authors: Fidelia A. Orji, Julita Vassileva

Abstract:

This research aims to develop machine learning models for students' academic performance and study strategy prediction, which could be generalized to all courses in higher education. Key learning attributes (intrinsic, extrinsic, autonomy, relatedness, competence, and self-esteem) used in building the models are chosen based on prior studies, which revealed that the attributes are essential in students’ learning process. Previous studies revealed the individual effects of each of these attributes on students’ learning progress. However, few studies have investigated the combined effect of the attributes in predicting student study strategy and academic performance to reduce the dropout rate. To bridge this gap, we used Scikit-learn in python to build five machine learning models (Decision Tree, K-Nearest Neighbour, Random Forest, Linear/Logistic Regression, and Support Vector Machine) for both regression and classification tasks to perform our analysis. The models were trained, evaluated, and tested for accuracy using 924 university dentistry students' data collected by Chilean authors through quantitative research design. A comparative analysis of the models revealed that the tree-based models such as the random forest (with prediction accuracy of 94.9%) and decision tree show the best results compared to the linear, support vector, and k-nearest neighbours. The models built in this research can be used in predicting student performance and study strategy so that appropriate interventions could be implemented to improve student learning progress. Thus, incorporating strategies that could improve diverse student learning attributes in the design of online educational systems may increase the likelihood of students continuing with their learning tasks as required. Moreover, the results show that the attributes could be modelled together and used to adapt/personalize the learning process.

Keywords: classification models, learning strategy, predictive modeling, regression models, student academic performance, student motivation, supervised machine learning

Procedia PDF Downloads 120

1472 Artificial Neural Networks and Hidden Markov Model in Landslides Prediction

Authors: C. S. Subhashini, H. L. Premaratne

Abstract:

Landslides are the most recurrent and prominent disaster in Sri Lanka. Sri Lanka has been subjected to a number of extreme landslide disasters that resulted in a significant loss of life, material damage, and distress. It is required to explore a solution towards preparedness and mitigation to reduce recurrent losses associated with landslides. Artificial Neural Networks (ANNs) and Hidden Markov Model (HMMs) are now widely used in many computer applications spanning multiple domains. This research examines the effectiveness of using Artificial Neural Networks and Hidden Markov Model in landslides predictions and the possibility of applying the modern technology to predict landslides in a prominent geographical area in Sri Lanka. A thorough survey was conducted with the participation of resource persons from several national universities in Sri Lanka to identify and rank the influencing factors for landslides. A landslide database was created using existing topographic; soil, drainage, land cover maps and historical data. The landslide related factors which include external factors (Rainfall and Number of Previous Occurrences) and internal factors (Soil Material, Geology, Land Use, Curvature, Soil Texture, Slope, Aspect, Soil Drainage, and Soil Effective Thickness) are extracted from the landslide database. These factors are used to recognize the possibility to occur landslides by using an ANN and HMM. The model acquires the relationship between the factors of landslide and its hazard index during the training session. These models with landslide related factors as the inputs will be trained to predict three classes namely, ‘landslide occurs’, ‘landslide does not occur’ and ‘landslide likely to occur’. Once trained, the models will be able to predict the most likely class for the prevailing data. Finally compared two models with regards to prediction accuracy, False Acceptance Rates and False Rejection rates and This research indicates that the Artificial Neural Network could be used as a strong decision support system to predict landslides efficiently and effectively than Hidden Markov Model.

Keywords: landslides, influencing factors, neural network model, hidden markov model

Procedia PDF Downloads 378

1471 Predicting Food Waste and Losses Reduction for Fresh Products in Modified Atmosphere Packaging

Authors: Matar Celine, Gaucel Sebastien, Gontard Nathalie, Guilbert Stephane, Guillard Valerie

Abstract:

To increase the very short shelf life of fresh fruits and vegetable, Modified Atmosphere Packaging (MAP) allows an optimal atmosphere composition to be maintained around the product and thus prevent its decay. This technology relies on the modification of internal packaging atmosphere due to equilibrium between production/consumption of gases by the respiring product and gas permeation through the packaging material. While, to the best of our knowledge, benefit of MAP for fresh fruits and vegetable has been widely demonstrated in the literature, its effect on shelf life increase has never been quantified and formalized in a clear and simple manner leading difficult to anticipate its economic and environmental benefit, notably through the decrease of food losses. Mathematical modelling of mass transfers in the food/packaging system is the basis for a better design and dimensioning of the food packaging system. But up to now, existing models did not permit to estimate food quality nor shelf life gain reached by using MAP. However, shelf life prediction is an indispensable prerequisite for quantifying the effect of MAP on food losses reduction. The objective of this work is to propose an innovative approach to predict shelf life of MAP food product and then to link it to a reduction of food losses and wastes. In this purpose, a ‘Virtual MAP modeling tool’ was developed by coupling a new predictive deterioration model (based on visual surface prediction of deterioration encompassing colour, texture and spoilage development) with models of the literature for respiration and permeation. A major input of this modelling tool is the maximal percentage of deterioration (MAD) which was assessed from dedicated consumers’ studies. Strawberries of the variety Charlotte were selected as the model food for its high perishability, high respiration rate; 50-100 ml CO₂/h/kg produced at 20°C, allowing it to be a good representative of challenging post-harvest storage. A value of 13% was determined as a limit of acceptability for the consumers, permitting to define products’ shelf life. The ‘Virtual MAP modeling tool’ was validated in isothermal conditions (5, 10 and 20°C) and in dynamic temperature conditions mimicking commercial post-harvest storage of strawberries. RMSE values were systematically lower than 3% for respectively, O₂, CO₂ and deterioration profiles as a function of time confirming the goodness of model fitting. For the investigated temperature profile, a shelf life gain of 0.33 days was obtained in MAP compared to the conventional storage situation (no MAP condition). Shelf life gain of more than 1 day could be obtained for optimized post-harvest conditions as numerically investigated. Such shelf life gain permitted to anticipate a significant reduction of food losses at the distribution and consumer steps. This food losses' reduction as a function of shelf life gain has been quantified using a dedicated mathematical equation that has been developed for this purpose.

Keywords: food losses and wastes, modified atmosphere packaging, mathematical modeling, shelf life prediction

Procedia PDF Downloads 180

1470 Abridging Pharmaceutical Analysis and Drug Discovery via LC-MS-TOF, NMR, in-silico Toxicity-Bioactivity Profiling for Therapeutic Purposing Zileuton Impurities: Need of Hour

Authors: Saurabh B. Ganorkar, Atul A. Shirkhedkar

Abstract:

The need for investigations protecting against toxic impurities though seems to be a primary requirement; the impurities which may prove non - toxic can be explored for their therapeutic potential if any to assist advanced drug discovery. The essential role of pharmaceutical analysis can thus be extended effectively to achieve it. The present study successfully achieved these objectives with characterization of major degradation products as impurities for Zileuton which has been used for to treat asthma since years. The forced degradation studies were performed to identify the potential degradation products using Ultra-fine Liquid-chromatography. Liquid-chromatography-Mass spectrometry (Time of Flight) and Proton Nuclear Magnetic Resonance Studies were utilized effectively to characterize the drug along with five major oxidative and hydrolytic degradation products (DP’s). The mass fragments were identified for Zileuton and path for the degradation was investigated. The characterized DP’s were subjected to In-Silico studies as XP Molecular Docking to compare the gain or loss in binding affinity with 5-Lipooxygenase enzyme. One of the impurity of was found to have the binding affinity more than the drug itself indicating for its potential to be more bioactive as better Antiasthmatic. The close structural resemblance has the ability to potentiate or reduce bioactivity and or toxicity. The chances of being active biologically at other sites cannot be denied and the same is achieved to some extent by predictions for probability of being active with Prediction of Activity Spectrum for Substances (PASS) The impurities found to be bio-active as Antineoplastic, Antiallergic, and inhibitors of Complement Factor D. The toxicological abilities as Ames-Mutagenicity, Carcinogenicity, Developmental Toxicity and Skin Irritancy were evaluated using Toxicity Prediction by Komputer Assisted Technology (TOPKAT). Two of the impurities were found to be non-toxic as compared to original drug Zileuton. As the drugs are purposed and repurposed effectively the impurities can also be; as they can have more binding affinity; less toxicity and better ability to be bio-active at other biological targets.

Keywords: UFLC, LC-MS-TOF, NMR, Zileuton, impurities, toxicity, bio-activity

Procedia PDF Downloads 189

1469 Comparison of Different Reanalysis Products for Predicting Extreme Precipitation in the Southern Coast of the Caspian Sea

Authors: Parvin Ghafarian, Mohammadreza Mohammadpur Panchah, Mehri Fallahi

Abstract:

Synoptic patterns from surface up to tropopause are very important for forecasting the weather and atmospheric conditions. There are many tools to prepare and analyze these maps. Reanalysis data and the outputs of numerical weather prediction models, satellite images, meteorological radar, and weather station data are used in world forecasting centers to predict the weather. The forecasting extreme precipitating on the southern coast of the Caspian Sea (CS) is the main issue due to complex topography. Also, there are different types of climate in these areas. In this research, we used two reanalysis data such as ECMWF Reanalysis 5th Generation Description (ERA5) and National Centers for Environmental Prediction /National Center for Atmospheric Research (NCEP/NCAR) for verification of the numerical model. ERA5 is the latest version of ECMWF. The temporal resolution of ERA5 is hourly, and the NCEP/NCAR is every six hours. Some atmospheric parameters such as mean sea level pressure, geopotential height, relative humidity, wind speed and direction, sea surface temperature, etc. were selected and analyzed. Some different type of precipitation (rain and snow) was selected. The results showed that the NCEP/NCAR has more ability to demonstrate the intensity of the atmospheric system. The ERA5 is suitable for extract the value of parameters for specific point. Also, ERA5 is appropriate to analyze the snowfall events over CS (snow cover and snow depth). Sea surface temperature has the main role to generate instability over CS, especially when the cold air pass from the CS. Sea surface temperature of NCEP/NCAR product has low resolution near coast. However, both data were able to detect meteorological synoptic patterns that led to heavy rainfall over CS. However, due to the time lag, they are not suitable for forecast centers. The application of these two data is for research and verification of meteorological models. Finally, ERA5 has a better resolution, respect to NCEP/NCAR reanalysis data, but NCEP/NCAR data is available from 1948 and appropriate for long term research.

Keywords: synoptic patterns, heavy precipitation, reanalysis data, snow

Procedia PDF Downloads 113

1468 Physics Informed Deep Residual Networks Based Type-A Aortic Dissection Prediction

Authors: Joy Cao, Min Zhou

Abstract:

Purpose: Acute Type A aortic dissection is a well-known cause of extremely high mortality rate. A highly accurate and cost-effective non-invasive predictor is critically needed so that the patient can be treated at earlier stage. Although various CFD approaches have been tried to establish some prediction frameworks, they are sensitive to uncertainty in both image segmentation and boundary conditions. Tedious pre-processing and demanding calibration procedures requirement further compound the issue, thus hampering their clinical applicability. Using the latest physics informed deep learning methods to establish an accurate and cost-effective predictor framework are amongst the main goals for a better Type A aortic dissection treatment. Methods: Via training a novel physics-informed deep residual network, with non-invasive 4D MRI displacement vectors as inputs, the trained model can cost-effectively calculate all these biomarkers: aortic blood pressure, WSS, and OSI, which are used to predict potential type A aortic dissection to avoid the high mortality events down the road. Results: The proposed deep learning method has been successfully trained and tested with both synthetic 3D aneurysm dataset and a clinical dataset in the aortic dissection context using Google colab environment. In both cases, the model has generated aortic blood pressure, WSS, and OSI results matching the expected patient’s health status. Conclusion: The proposed novel physics-informed deep residual network shows great potential to create a cost-effective, non-invasive predictor framework. Additional physics-based de-noising algorithm will be added to make the model more robust to clinical data noises. Further studies will be conducted in collaboration with big institutions such as Cleveland Clinic with more clinical samples to further improve the model’s clinical applicability.

Keywords: type-a aortic dissection, deep residual networks, blood flow modeling, data-driven modeling, non-invasive diagnostics, deep learning, artificial intelligence.

Procedia PDF Downloads 82

1467 Synchronization of a Perturbed Satellite Attitude Motion

Authors: Sadaoui Djaouida

Abstract:

In this paper, the predictive control method is proposed to control the synchronization of two perturbed satellites attitude motion. Based on delayed feedback control of continuous-time systems combines with the prediction-based method of discrete-time systems, this approach only needs a single controller to realize synchronization, which has considerable significance in reducing the cost and complexity for controller implementation.

Keywords: predictive control, synchronization, satellite attitude, control engineering

Procedia PDF Downloads 550

1466 Multiscale Analysis of Shale Heterogeneity in Silurian Longmaxi Formation from South China

Authors: Xianglu Tang, Zhenxue Jiang, Zhuo Li

Abstract:

Characterization of shale multi scale heterogeneity is an important part to evaluate size and space distribution of shale gas reservoirs in sedimentary basins. The origin of shale heterogeneity has always been a hot research topic for it determines shale micro characteristics description and macro quality reservoir prediction. Shale multi scale heterogeneity was discussed based on thin section observation, FIB-SEM, QEMSCAN, TOC, XRD, mercury intrusion porosimetry (MIP), and nitrogen adsorption analysis from 30 core samples in Silurian Longmaxi formation. Results show that shale heterogeneity can be characterized by pore structure and mineral composition. The heterogeneity of shale pore is showed by different size pores at nm-μm scale. Macropores (pore diameter > 50 nm) have a large percentage of pore volume than mesopores (pore diameter between 2~ 50 nm) and micropores (pore diameter < 2nm). However, they have a low specific surface area than mesopores and micropores. Fractal dimensions of the pores from nitrogen adsorption data are higher than 2.7, what are higher than 2.8 from MIP data, showing extremely complex pore structure. This complexity in pore structure is mainly due to the organic matter and clay minerals with complex pore network structures, and diagenesis makes it more complicated. The heterogeneity of shale minerals is showed by mineral grains, lamina, and different lithology at nm-km scale under the continuous changing horizon. Through analyzing the change of mineral composition at each scale, random arrangement of mineral equal proportion, seasonal climate changes, large changes of sedimentary environment, and provenance supply are considered to be the main reasons that cause shale minerals heterogeneity from microcosmic to macroscopic. Due to scale effect, the change of shale multi scale heterogeneity is a discontinuous process, and there is a transformation boundary between homogeneous and in homogeneous. Therefore, a shale multi scale heterogeneity changing model is established by defining four types of homogeneous unit at different scales, which can be used to guide the prediction of shale gas distribution from micro scale to macro scale.

Keywords: heterogeneity, homogeneous unit, multiscale, shale

Procedia PDF Downloads 444

1465 Fatigue Analysis and Life Estimation of the Helicopter Horizontal Tail under Cyclic Loading by Using Finite Element Method

Authors: Defne Uz

Abstract:

Horizontal Tail of helicopter is exposed to repeated oscillatory loading generated by aerodynamic and inertial loads, and bending moments depending on operating conditions and maneuvers of the helicopter. In order to ensure that maximum stress levels do not exceed certain fatigue limit of the material and to prevent damage, a numerical analysis approach can be utilized through the Finite Element Method. Therefore, in this paper, fatigue analysis of the Horizontal Tail model is studied numerically to predict high-cycle and low-cycle fatigue life related to defined loading. The analysis estimates the stress field at stress concentration regions such as around fastener holes where the maximum principal stresses are considered for each load case. Critical element identification of the main load carrying structural components of the model with rivet holes is performed as a post-process since critical regions with high-stress values are used as an input for fatigue life calculation. Once the maximum stress is obtained at the critical element and the related mean and alternating components, it is compared with the endurance limit by applying Soderberg approach. The constant life straight line provides the limit for several combinations of mean and alternating stresses. The life calculation based on S-N (Stress-Number of Cycles) curve is also applied with fully reversed loading to determine the number of cycles corresponds to the oscillatory stress with zero means. The results determine the appropriateness of the design of the model for its fatigue strength and the number of cycles that the model can withstand for the calculated stress. The effect of correctly determining the critical rivet holes is investigated by analyzing stresses at different structural parts in the model. In the case of low life prediction, alternative design solutions are developed, and flight hours can be estimated for the fatigue safe operation of the model.

Keywords: fatigue analysis, finite element method, helicopter horizontal tail, life prediction, stress concentration

Procedia PDF Downloads 141

1464 Performance and Limitations of Likelihood Based Information Criteria and Leave-One-Out Cross-Validation Approximation Methods

Authors: M. A. C. S. Sampath Fernando, James M. Curran, Renate Meyer

Abstract:

Model assessment, in the Bayesian context, involves evaluation of the goodness-of-fit and the comparison of several alternative candidate models for predictive accuracy and improvements. In posterior predictive checks, the data simulated under the fitted model is compared with the actual data. Predictive model accuracy is estimated using information criteria such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Deviance information criterion (DIC), and the Watanabe-Akaike information criterion (WAIC). The goal of an information criterion is to obtain an unbiased measure of out-of-sample prediction error. Since posterior checks use the data twice; once for model estimation and once for testing, a bias correction which penalises the model complexity is incorporated in these criteria. Cross-validation (CV) is another method used for examining out-of-sample prediction accuracy. Leave-one-out cross-validation (LOO-CV) is the most computationally expensive variant among the other CV methods, as it fits as many models as the number of observations. Importance sampling (IS), truncated importance sampling (TIS) and Pareto-smoothed importance sampling (PSIS) are generally used as approximations to the exact LOO-CV and utilise the existing MCMC results avoiding expensive computational issues. The reciprocals of the predictive densities calculated over posterior draws for each observation are treated as the raw importance weights. These are in turn used to calculate the approximate LOO-CV of the observation as a weighted average of posterior densities. In IS-LOO, the raw weights are directly used. In contrast, the larger weights are replaced by their modified truncated weights in calculating TIS-LOO and PSIS-LOO. Although, information criteria and LOO-CV are unable to reflect the goodness-of-fit in absolute sense, the differences can be used to measure the relative performance of the models of interest. However, the use of these measures is only valid under specific circumstances. This study has developed 11 models using normal, log-normal, gamma, and student’s t distributions to improve the PCR stutter prediction with forensic data. These models are comprised of four with profile-wide variances, four with locus specific variances, and three which are two-component mixture models. The mean stutter ratio in each model is modeled as a locus specific simple linear regression against a feature of the alleles under study known as the longest uninterrupted sequence (LUS). The use of AIC, BIC, DIC, and WAIC in model comparison has some practical limitations. Even though, IS-LOO, TIS-LOO, and PSIS-LOO are considered to be approximations of the exact LOO-CV, the study observed some drastic deviations in the results. However, there are some interesting relationships among the logarithms of pointwise predictive densities (lppd) calculated under WAIC and the LOO approximation methods. The estimated overall lppd is a relative measure that reflects the overall goodness-of-fit of the model. Parallel log-likelihood profiles for the models conditional on equal posterior variances in lppds were observed. This study illustrates the limitations of the information criteria in practical model comparison problems. In addition, the relationships among LOO-CV approximation methods and WAIC with their limitations are discussed. Finally, useful recommendations that may help in practical model comparisons with these methods are provided.

Keywords: cross-validation, importance sampling, information criteria, predictive accuracy

Procedia PDF Downloads 385

1463 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on $k$-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 161

1462 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0 %, 80.5 %, 80.5 %, 63.6 %, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 147