Search results for: statistical models
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9783

Search results for: statistical models

9513 Focus-Latent Dirichlet Allocation for Aspect-Level Opinion Mining

Authors: Mohsen Farhadloo, Majid Farhadloo

Abstract:

Aspect-level opinion mining that aims at discovering aspects (aspect identification) and their corresponding ratings (sentiment identification) from customer reviews have increasingly attracted attention of researchers and practitioners as it provides valuable insights about products/services from customer's points of view. Instead of addressing aspect identification and sentiment identification in two separate steps, it is possible to simultaneously identify both aspects and sentiments. In recent years many graphical models based on Latent Dirichlet Allocation (LDA) have been proposed to solve both aspect and sentiment identifications in a single step. Although LDA models have been effective tools for the statistical analysis of document collections, they also have shortcomings in addressing some unique characteristics of opinion mining. Our goal in this paper is to address one of the limitations of topic models to date; that is, they fail to directly model the associations among topics. Indeed in many text corpora, it is natural to expect that subsets of the latent topics have higher probabilities. We propose a probabilistic graphical model called focus-LDA, to better capture the associations among topics when applied to aspect-level opinion mining. Our experiments on real-life data sets demonstrate the improved effectiveness of the focus-LDA model in terms of the accuracy of the predictive distributions over held out documents. Furthermore, we demonstrate qualitatively that the focus-LDA topic model provides a natural way of visualizing and exploring unstructured collection of textual data.

Keywords: aspect-level opinion mining, document modeling, Latent Dirichlet Allocation, LDA, sentiment analysis

Procedia PDF Downloads 74
9512 Use of Cyber-Physical Devices for the Implementation of Virtual and Augmented Realities in Bridge Construction

Authors: Muhammmad Fawad

Abstract:

The bridge construction industry has been revolutionized by the applications of Virtual Reality (VR) and Augmented Reality (AR). In this article, the author has focused on the field applications of digital technologies in structural, especially in bridge engineering. This research analyzed the use of VR/AR for the assessment of bridge concepts. For this purpose, the author has used Cyber-Physical Devices, i.e., Oculus Quest (OQ) for the implementation of VR, Trimble Microsoft HoloLens (THL), and Trimble Site Vision (TSV) for the implementation of AR/MR by visualizing the models of bridge planned to be constructed in Poland. The visualization of the models in Extended Reality (XR) is based on the development of BIM models of the bridge, which are further uploaded to the platforms required to implement these models in XR. This research helped to implement the models in MR so a bridge with a 1:1 scale at the exact location was placed, and authorities were presented with the possibility to visualize the exact scale and location of the bridge before its construction.

Keywords: augmented reality, virtual reality, HoloLens, BIM, bridges

Procedia PDF Downloads 86
9511 Public Spending and Economic Growth: An Empirical Analysis of Developed Countries

Authors: Bernur Acikgoz

Abstract:

The purpose of this paper is to investigate the effects of public spending on economic growth and examine the sources of economic growth in developed countries since the 1990s. This paper analyses whether public spending effect on economic growth based on Cobb-Douglas Production Function with the two econometric models with Autoregressive Distributed Lag (ARDL) and Dynamic Fixed Effect (DFE) for 21 developed countries (high-income OECD countries), over the period 1990-2013. Our models results are parallel to each other and the models support that public spending has an important role for economic growth. This result is accurate with theories and previous empirical studies.

Keywords: public spending, economic growth, panel data, ARDL models

Procedia PDF Downloads 328
9510 Space Telemetry Anomaly Detection Based On Statistical PCA Algorithm

Authors: Bassem Nassar, Wessam Hussein, Medhat Mokhtar

Abstract:

The crucial concern of satellite operations is to ensure the health and safety of satellites. The worst case in this perspective is probably the loss of a mission but the more common interruption of satellite functionality can result in compromised mission objectives. All the data acquiring from the spacecraft are known as Telemetry (TM), which contains the wealth information related to the health of all its subsystems. Each single item of information is contained in a telemetry parameter, which represents a time-variant property (i.e. a status or a measurement) to be checked. As a consequence, there is a continuous improvement of TM monitoring systems in order to reduce the time required to respond to changes in a satellite's state of health. A fast conception of the current state of the satellite is thus very important in order to respond to occurring failures. Statistical multivariate latent techniques are one of the vital learning tools that are used to tackle the aforementioned problem coherently. Information extraction from such rich data sources using advanced statistical methodologies is a challenging task due to the massive volume of data. To solve this problem, in this paper, we present a proposed unsupervised learning algorithm based on Principle Component Analysis (PCA) technique. The algorithm is particularly applied on an actual remote sensing spacecraft. Data from the Attitude Determination and Control System (ADCS) was acquired under two operation conditions: normal and faulty states. The models were built and tested under these conditions and the results shows that the algorithm could successfully differentiate between these operations conditions. Furthermore, the algorithm provides competent information in prediction as well as adding more insight and physical interpretation to the ADCS operation.

Keywords: space telemetry monitoring, multivariate analysis, PCA algorithm, space operations

Procedia PDF Downloads 391
9509 Effect of Drag Coefficient Models concerning Global Air-Sea Momentum Flux in Broad Wind Range including Extreme Wind Speeds

Authors: Takeshi Takemoto, Naoya Suzuki, Naohisa Takagaki, Satoru Komori, Masako Terui, George Truscott

Abstract:

Drag coefficient is an important parameter in order to correctly estimate the air-sea momentum flux. However, The parameterization of the drag coefficient hasn’t been established due to the variation in the field data. Instead, a number of drag coefficient model formulae have been proposed, even though almost all these models haven’t discussed the extreme wind speed range. With regards to such models, it is unclear how the drag coefficient changes in the extreme wind speed range as the wind speed increased. In this study, we investigated the effect of the drag coefficient models concerning the air-sea momentum flux in the extreme wind range on a global scale, comparing two different drag coefficient models. Interestingly, one model didn’t discuss the extreme wind speed range while the other model considered it. We found that the difference of the models in the annual global air-sea momentum flux was small because the occurrence frequency of strong wind was approximately 1% with a wind speed of 20m/s or more. However, we also discovered that the difference of the models was shown in the middle latitude where the annual mean air-sea momentum flux was large and the occurrence frequency of strong wind was high. In addition, the estimated data showed that the difference of the models in the drag coefficient was large in the extreme wind speed range and that the largest difference became 23% with a wind speed of 35m/s or more. These results clearly show that the difference of the two models concerning the drag coefficient has a significant impact on the estimation of a regional air-sea momentum flux in an extreme wind speed range such as that seen in a tropical cyclone environment. Furthermore, we estimated each air-sea momentum flux using several kinds of drag coefficient models. We will also provide data from an observation tower and result from CFD (Computational Fluid Dynamics) concerning the influence of wind flow at and around the place.

Keywords: air-sea interaction, drag coefficient, air-sea momentum flux, CFD (Computational Fluid Dynamics)

Procedia PDF Downloads 343
9508 Volatility Switching between Two Regimes

Authors: Josip Visković, Josip Arnerić, Ante Rozga

Abstract:

Based on the fact that volatility is time varying in high frequency data and that periods of high volatility tend to cluster, the most successful and popular models in modelling time varying volatility are GARCH type models. When financial returns exhibit sudden jumps that are due to structural breaks, standard GARCH models show high volatility persistence, i.e. integrated behaviour of the conditional variance. In such situations models in which the parameters are allowed to change over time are more appropriate. This paper compares different GARCH models in terms of their ability to describe structural changes in returns caused by financial crisis at stock markets of six selected central and east European countries. The empirical analysis demonstrates that Markov regime switching GARCH model resolves the problem of excessive persistence and outperforms uni-regime GARCH models in forecasting volatility when sudden switching occurs in response to financial crisis.

Keywords: central and east European countries, financial crisis, Markov switching GARCH model, transition probabilities

Procedia PDF Downloads 200
9507 Agriculture Yield Prediction Using Predictive Analytic Techniques

Authors: Nagini Sabbineni, Rajini T. V. Kanth, B. V. Kiranmayee

Abstract:

India’s economy primarily depends on agriculture yield growth and their allied agro industry products. The agriculture yield prediction is the toughest task for agricultural departments across the globe. The agriculture yield depends on various factors. Particularly countries like India, majority of agriculture growth depends on rain water, which is highly unpredictable. Agriculture growth depends on different parameters, namely Water, Nitrogen, Weather, Soil characteristics, Crop rotation, Soil moisture, Surface temperature and Rain water etc. In our paper, lot of Explorative Data Analysis is done and various predictive models were designed. Further various regression models like Linear, Multiple Linear, Non-linear models are tested for the effective prediction or the forecast of the agriculture yield for various crops in Andhra Pradesh and Telangana states.

Keywords: agriculture yield growth, agriculture yield prediction, explorative data analysis, predictive models, regression models

Procedia PDF Downloads 275
9506 A 7 Dimensional-Quantitative Structure-Activity Relationship Approach Combining Quantum Mechanics Based Grid and Solvation Models to Predict Hotspots and Kinetic Properties of Mutated Enzymes: An Enzyme Engineering Perspective

Authors: R. Pravin Kumar, L. Roopa

Abstract:

Enzymes are molecular machines used in various industries such as pharmaceuticals, cosmetics, food and animal feed, paper and leather processing, biofuel, and etc. Nevertheless, this has been possible only by the breath-taking efforts of the chemists and biologists to evolve/engineer these mysterious biomolecules to work the needful. Main agenda of this enzyme engineering project is to derive screening and selection tools to obtain focused libraries of enzyme variants with desired qualities. The methodologies for this research include the well-established directed evolution, rational redesign and relatively less established yet much faster and accurate insilico methods. This concept was initiated as a Receptor Rependent-4Dimensional Quantitative Structure Activity Relationship (RD-4D-QSAR) to predict kinetic properties of enzymes and extended here to study transaminase by a 7D QSAR approach. Induced-fit scenarios were explored using Quantum Mechanics/Molecular Mechanics (QM/MM) simulations which were then placed in a grid that stores interactions energies derived from QM parameters (QMgrid). In this study, the mutated enzymes were immersed completely inside the QMgrid and this was combined with solvation models to predict descriptors. After statistical screening of descriptors, QSAR models showed > 90% specificity and > 85% sensitivity towards the experimental activity. Mapping descriptors on the enzyme structure revealed hotspots important to enhance the enantioselectivity of the enzyme.

Keywords: QMgrid, QM/MM simulations, RD-4D-QSAR, transaminase

Procedia PDF Downloads 112
9505 Graphical Modeling of High Dimension Processes with an Environmental Application

Authors: Ali S. Gargoum

Abstract:

Graphical modeling plays an important role in providing efficient probability calculations in high dimensional problems (computational efficiency). In this paper, we address one of such problems where we discuss fragmenting puff models and some distributional assumptions concerning models for the instantaneous, emission readings and for the fragmenting process. A graphical representation in terms of a junction tree of the conditional probability breakdown of puffs and puff fragments is proposed.

Keywords: graphical models, influence diagrams, junction trees, Bayesian nets

Procedia PDF Downloads 371
9504 Dynamics of the Landscape in the Different Colonization Models Implemented in the Legal Amazon

Authors: Valdir Moura, FranciléIa De Oliveira E. Silva, Erivelto Mercante, Ranieli Dos Anjos De Souza, Jerry Adriani Johann

Abstract:

Several colonization projects were implemented in the Brazilian Legal Amazon in the 1970s and 1980s. Among all of these colonization projects, the most prominent were those with the Fishbone and Topographic models. Within this scope, the projects of settlements known as Anari and Machadinho were created, which stood out because they are contiguous areas with different models and structure of occupation and colonization. The main objective of this work was to evaluate the dynamics of Land-Use and Land-Cover (LULC) in two different colonization models, implanted in the State of Rondonia in the 1980s. The Fishbone and Topographic models were implanted in the Anari and Machadinho settlements respectively. The understanding of these two forms of occupation will help in future colonization programs of the Brazilian Legal Amazon. These settlements are contiguous areas with different occupancy structures. A 32-year Landsat time series (1984-2016) was used to evaluate the rates and trends in the LULC process in the different colonization models. In the different occupation models analyzed, the results showed a rapid loss of primary and secondary forests (deforestation), mainly due to the dynamics of use, established by the Agriculture/Pasture (A/P) relation and, with heavy dependence due to road construction.

Keywords: land-cover, deforestation, rate fragments, remote sensing, secondary succession

Procedia PDF Downloads 113
9503 Probability Sampling in Matched Case-Control Study in Drug Abuse

Authors: Surya R. Niraula, Devendra B Chhetry, Girish K. Singh, S. Nagesh, Frederick A. Connell

Abstract:

Background: Although random sampling is generally considered to be the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of sampling. Method: We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug users (controls) who then identified “friend cases.” Models to predict drug abuse based on risk factors were developed for each data set using conditional logistic regression. We compared the precision of each model using bootstrapping method and the predictive properties of each model using receiver operating characteristics (ROC) curves. Results: Analysis of 100 random bootstrap samples drawn from the snowball-sample data set showed a wide variation in the standard errors of the beta coefficients of the predictive model, none of which achieved statistical significance. One the other hand, bootstrap analysis of the random-sample data set showed less variation, and did not change the significance of the predictors at the 5% level when compared to the non-bootstrap analysis. Comparison of the area under the ROC curves using the model derived from the random-sample data set was similar when fitted to either data set (0.93, for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p < .001). Conclusion: The proposed method of random sampling of controls appears to be superior from a statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.

Keywords: drug abuse, matched case-control study, non-probability sampling, probability sampling

Procedia PDF Downloads 469
9502 Simulations in Structural Masonry Walls with Chases Horizontal Through Models in State Deformation Plan (2D)

Authors: Raquel Zydeck, Karina Azzolin, Luis Kosteski, Alisson Milani

Abstract:

This work presents numerical models in plane deformations (2D), using the Discrete Element Method formedbybars (LDEM) andtheFiniteElementMethod (FEM), in structuralmasonrywallswith horizontal chasesof 20%, 30%, and 50% deep, located in the central part and 1/3 oftheupperpartofthewall, withcenteredandeccentricloading. Differentcombinationsofboundaryconditionsandinteractionsbetweenthemethodswerestudied.

Keywords: chases in structural masonry walls, discrete element method formed by bars, finite element method, numerical models, boundary condition

Procedia PDF Downloads 135
9501 Stability Analysis of Modelling the Effect of Vaccination and Novel Quarantine-Adjusted Incidence on the Spread of Newcastle Disease

Authors: Nurudeen O. Lasisi, Sirajo Abdulrahman, Abdulkareem A. Ibrahim

Abstract:

Newcastle disease is an infection of domestic poultry and other bird species with the virulent Newcastle disease virus (NDV). In this paper, we study the dynamics of the modeling of the Newcastle disease virus (NDV) using a novel quarantine-adjusted incidence. The comparison of Vaccination, linear incident rate and novel quarantine-adjusted incident rate in the models are discussed. The dynamics of the models yield disease-free and endemic equilibrium states.The effective reproduction numbers of the models are computed in order to measure the relative impact of an individual bird or combined intervention for effective disease control. We showed the local and global stability of endemic equilibrium states of the models and we found that the stability of endemic equilibrium states of models are globally asymptotically stable if the effective reproduction numbers of the models equations are greater than a unit.

Keywords: effective reproduction number, Endemic state, Mathematical model, Newcastle disease virus, novel quarantine-adjusted incidence, stability analysis

Procedia PDF Downloads 67
9500 A Comparative Assessment of Information Value, Fuzzy Expert System Models for Landslide Susceptibility Mapping of Dharamshala and Surrounding, Himachal Pradesh, India

Authors: Kumari Sweta, Ajanta Goswami, Abhilasha Dixit

Abstract:

Landslide is a geomorphic process that plays an essential role in the evolution of the hill-slope and long-term landscape evolution. But its abrupt nature and the associated catastrophic forces of the process can have undesirable socio-economic impacts, like substantial economic losses, fatalities, ecosystem, geomorphologic and infrastructure disturbances. The estimated fatality rate is approximately 1person /100 sq. Km and the average economic loss is more than 550 crores/year in the Himalayan belt due to landslides. This study presents a comparative performance of a statistical bivariate method and a machine learning technique for landslide susceptibility mapping in and around Dharamshala, Himachal Pradesh. The final produced landslide susceptibility maps (LSMs) with better accuracy could be used for land-use planning to prevent future losses. Dharamshala, a part of North-western Himalaya, is one of the fastest-growing tourism hubs with a total population of 30,764 according to the 2011 census and is amongst one of the hundred Indian cities to be developed as a smart city under PM’s Smart Cities Mission. A total of 209 landslide locations were identified in using high-resolution linear imaging self-scanning (LISS IV) data. The thematic maps of parameters influencing landslide occurrence were generated using remote sensing and other ancillary data in the GIS environment. The landslide causative parameters used in the study are slope angle, slope aspect, elevation, curvature, topographic wetness index, relative relief, distance from lineaments, land use land cover, and geology. LSMs were prepared using information value (Info Val), and Fuzzy Expert System (FES) models. Info Val is a statistical bivariate method, in which information values were calculated as the ratio of the landslide pixels per factor class (Si/Ni) to the total landslide pixel per parameter (S/N). Using this information values all parameters were reclassified and then summed in GIS to obtain the landslide susceptibility index (LSI) map. The FES method is a machine learning technique based on ‘mean and neighbour’ strategy for the construction of fuzzifier (input) and defuzzifier (output) membership function (MF) structure, and the FR method is used for formulating if-then rules. Two types of membership structures were utilized for membership function Bell-Gaussian (BG) and Trapezoidal-Triangular (TT). LSI for BG and TT were obtained applying membership function and if-then rules in MATLAB. The final LSMs were spatially and statistically validated. The validation results showed that in terms of accuracy, Info Val (83.4%) is better than BG (83.0%) and TT (82.6%), whereas, in terms of spatial distribution, BG is best. Hence, considering both statistical and spatial accuracy, BG is the most accurate one.

Keywords: bivariate statistical techniques, BG and TT membership structure, fuzzy expert system, information value method, machine learning technique

Procedia PDF Downloads 102
9499 Examining Statistical Monitoring Approach against Traditional Monitoring Techniques in Detecting Data Anomalies during Conduct of Clinical Trials

Authors: Sheikh Omar Sillah

Abstract:

Introduction: Monitoring is an important means of ensuring the smooth implementation and quality of clinical trials. For many years, traditional site monitoring approaches have been critical in detecting data errors but not optimal in identifying fabricated and implanted data as well as non-random data distributions that may significantly invalidate study results. The objective of this paper was to provide recommendations based on best statistical monitoring practices for detecting data-integrity issues suggestive of fabrication and implantation early in the study conduct to allow implementation of meaningful corrective and preventive actions. Methodology: Electronic bibliographic databases (Medline, Embase, PubMed, Scopus, and Web of Science) were used for the literature search, and both qualitative and quantitative studies were sought. Search results were uploaded into Eppi-Reviewer Software, and only publications written in the English language from 2012 were included in the review. Gray literature not considered to present reproducible methods was excluded. Results: A total of 18 peer-reviewed publications were included in the review. The publications demonstrated that traditional site monitoring techniques are not efficient in detecting data anomalies. By specifying project-specific parameters such as laboratory reference range values, visit schedules, etc., with appropriate interactive data monitoring, statistical monitoring can offer early signals of data anomalies to study teams. The review further revealed that statistical monitoring is useful to identify unusual data patterns that might be revealing issues that could impact data integrity or may potentially impact study participants' safety. However, subjective measures may not be good candidates for statistical monitoring. Conclusion: The statistical monitoring approach requires a combination of education, training, and experience sufficient to implement its principles in detecting data anomalies for the statistical aspects of a clinical trial.

Keywords: statistical monitoring, data anomalies, clinical trials, traditional monitoring

Procedia PDF Downloads 45
9498 Electricity Price Forecasting: A Comparative Analysis with Shallow-ANN and DNN

Authors: Fazıl Gökgöz, Fahrettin Filiz

Abstract:

Electricity prices have sophisticated features such as high volatility, nonlinearity and high frequency that make forecasting quite difficult. Electricity price has a volatile and non-random character so that, it is possible to identify the patterns based on the historical data. Intelligent decision-making requires accurate price forecasting for market traders, retailers, and generation companies. So far, many shallow-ANN (artificial neural networks) models have been published in the literature and showed adequate forecasting results. During the last years, neural networks with many hidden layers, which are referred to as DNN (deep neural networks) have been using in the machine learning community. The goal of this study is to investigate electricity price forecasting performance of the shallow-ANN and DNN models for the Turkish day-ahead electricity market. The forecasting accuracy of the models has been evaluated with publicly available data from the Turkish day-ahead electricity market. Both shallow-ANN and DNN approach would give successful result in forecasting problems. Historical load, price and weather temperature data are used as the input variables for the models. The data set includes power consumption measurements gathered between January 2016 and December 2017 with one-hour resolution. In this regard, forecasting studies have been carried out comparatively with shallow-ANN and DNN models for Turkish electricity markets in the related time period. The main contribution of this study is the investigation of different shallow-ANN and DNN models in the field of electricity price forecast. All models are compared regarding their MAE (Mean Absolute Error) and MSE (Mean Square) results. DNN models give better forecasting performance compare to shallow-ANN. Best five MAE results for DNN models are 0.346, 0.372, 0.392, 0,402 and 0.409.

Keywords: deep learning, artificial neural networks, energy price forecasting, turkey

Procedia PDF Downloads 265
9497 Comparison Of Data Mining Models To Predict Future Bridge Conditions

Authors: Pablo Martinez, Emad Mohamed, Osama Mohsen, Yasser Mohamed

Abstract:

Highway and bridge agencies, such as the Ministry of Transportation in Ontario, use the Bridge Condition Index (BCI) which is defined as the weighted condition of all bridge elements to determine the rehabilitation priorities for its bridges. Therefore, accurate forecasting of BCI is essential for bridge rehabilitation budgeting planning. The large amount of data available in regard to bridge conditions for several years dictate utilizing traditional mathematical models as infeasible analysis methods. This research study focuses on investigating different classification models that are developed to predict the bridge condition index in the province of Ontario, Canada based on the publicly available data for 2800 bridges over a period of more than 10 years. The data preparation is a key factor to develop acceptable classification models even with the simplest one, the k-NN model. All the models were tested, compared and statistically validated via cross validation and t-test. A simple k-NN model showed reasonable results (within 0.5% relative error) when predicting the bridge condition in an incoming year.

Keywords: asset management, bridge condition index, data mining, forecasting, infrastructure, knowledge discovery in databases, maintenance, predictive models

Procedia PDF Downloads 164
9496 Social Entrepreneurship on Islamic Perspective: Identifying Research Gap

Authors: Mohd Adib Abd Muin, Shuhairimi Abdullah, Azizan Bahari

Abstract:

Problem: The research problem is lacking of model on social entrepreneurship that focus on Islamic perspective. Objective: The objective of this paper is to analyse the existing model on social entrepreneurship and to identify the research gap on Islamic perspective from existing models. Research Methodology: The research method used in this study is literature review and comparative analysis from 6 existing models of social entrepreneurship. Finding: The research finding shows that 6 existing models on social entrepreneurship has been analysed and it shows that the existing models on social entrepreneurship do not emphasize on Islamic perspective.

Keywords: social entrepreneurship, Islamic perspective, research gap, business management

Procedia PDF Downloads 326
9495 A-Score, Distress Prediction Model with Earning Response during the Financial Crisis: Evidence from Emerging Market

Authors: Sumaira Ashraf, Elisabete G.S. Félix, Zélia Serrasqueiro

Abstract:

Traditional financial distress prediction models performed well to predict bankrupt and insolvent firms of the developed markets. Previous studies particularly focused on the predictability of financial distress, financial failure, and bankruptcy of firms. This paper contributes to the literature by extending the definition of financial distress with the inclusion of early warning signs related to quotation of face value, dividend/bonus declaration, annual general meeting, and listing fee. The study used five well-known distress prediction models to see if they have the ability to predict early warning signs of financial distress. Results showed that the predictive ability of the models varies over time and decreases specifically for the sample with early warning signs of financial distress. Furthermore, the study checked the differences in the predictive ability of the models with respect to the financial crisis. The results conclude that the predictive ability of the traditional financial distress prediction models decreases for the firms with early warning signs of financial distress and during the time of financial crisis. The study developed a new model comprising significant variables from the five models and one new variable earning response. This new model outperforms the old distress prediction models before, during and after the financial crisis. Thus, it can be used by researchers, organizations and all other concerned parties to indicate early warning signs for the emerging markets.

Keywords: financial distress, emerging market, prediction models, Z-Score, logit analysis, probit model

Procedia PDF Downloads 219
9494 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on $k$-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 133
9493 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0 %, 80.5 %, 80.5 %, 63.6 %, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 123
9492 Regeneration of Geological Models Using Support Vector Machine Assisted by Principal Component Analysis

Authors: H. Jung, N. Kim, B. Kang, J. Choe

Abstract:

History matching is a crucial procedure for predicting reservoir performances and making future decisions. However, it is difficult due to uncertainties of initial reservoir models. Therefore, it is important to have reliable initial models for successful history matching of highly heterogeneous reservoirs such as channel reservoirs. In this paper, we proposed a novel scheme for regenerating geological models using support vector machine (SVM) and principal component analysis (PCA). First, we perform PCA for figuring out main geological characteristics of models. Through the procedure, permeability values of each model are transformed to new parameters by principal components, which have eigenvalues of large magnitude. Secondly, the parameters are projected into two-dimensional plane by multi-dimensional scaling (MDS) based on Euclidean distances. Finally, we train an SVM classifier using 20% models which show the most similar or dissimilar well oil production rates (WOPR) with the true values (10% for each). Then, the other 80% models are classified by trained SVM. We select models on side of low WOPR errors. One hundred channel reservoir models are initially generated by single normal equation simulation. By repeating the classification process, we can select models which have similar geological trend with the true reservoir model. The average field of the selected models is utilized as a probability map for regeneration. Newly generated models can preserve correct channel features and exclude wrong geological properties maintaining suitable uncertainty ranges. History matching with the initial models cannot provide trustworthy results. It fails to find out correct geological features of the true model. However, history matching with the regenerated ensemble offers reliable characterization results by figuring out proper channel trend. Furthermore, it gives dependable prediction of future performances with reduced uncertainties. We propose a novel classification scheme which integrates PCA, MDS, and SVM for regenerating reservoir models. The scheme can easily sort out reliable models which have similar channel trend with the reference in lowered dimension space.

Keywords: history matching, principal component analysis, reservoir modelling, support vector machine

Procedia PDF Downloads 135
9491 Unveiling Karst Features in Miocene Carbonate Reservoirs of Central Luconia-Malaysia: Case Study of F23 Field's Karstification

Authors: Abd Al-Salam Al-Masgari, Haylay Tsegab, Ismailalwali Babikir, Monera A. Shoieb

Abstract:

We present a study of Malaysia's Central Luconia region, which is an essential deposit of Miocene carbonate reservoirs. This study aims to identify and map areas of selected carbonate platforms, develop high-resolution statistical karst models, and generate comprehensive karst geobody models for selected carbonate fields. This study uses seismic characterization and advanced geophysical surveys to identify karst signatures in Miocene carbonate reservoirs. The results highlight the use of variance, RMS, RGB colour blending, and 3D visualization Prop seismic sequence stratigraphy seismic attributes to visualize the karstified areas across the F23 field of Central Luconia. The offshore karst model serves as a powerful visualization tool to reveal the karstization of carbonate sediments of interest. The results of this study contribute to a better understanding of the karst distribution of Miocene carbonate reservoirs in Central Luconia, which are essential for hydrocarbon exploration and production. This is because these features significantly impact the reservoir geometry, flow path and characteristics.

Keywords: karst, central Luconia, seismic attributes, Miocene carbonate build-ups

Procedia PDF Downloads 45
9490 Group Sequential Covariate-Adjusted Response Adaptive Designs for Survival Outcomes

Authors: Yaxian Chen, Yeonhee Park

Abstract:

Driven by evolving FDA recommendations, modern clinical trials demand innovative designs that strike a balance between statistical rigor and ethical considerations. Covariate-adjusted response-adaptive (CARA) designs bridge this gap by utilizing patient attributes and responses to skew treatment allocation in favor of the treatment that is best for an individual patient’s profile. However, existing CARA designs for survival outcomes often hinge on specific parametric models, constraining their applicability in clinical practice. In this article, we address this limitation by introducing a CARA design for survival outcomes (CARAS) based on the Cox model and a variance estimator. This method addresses issues of model misspecification and enhances the flexibility of the design. We also propose a group sequential overlapweighted log-rank test to preserve type I error rate in the context of group sequential trials using extensive simulation studies to demonstrate the clinical benefit, statistical efficiency, and robustness to model misspecification of the proposed method compared to traditional randomized controlled trial designs and response-adaptive randomization designs.

Keywords: cox model, log-rank test, optimal allocation ratio, overlap weight, survival outcome

Procedia PDF Downloads 28
9489 Resistance and Sub-Resistances of RC Beams Subjected to Multiple Failure Modes

Authors: F. Sangiorgio, J. Silfwerbrand, G. Mancini

Abstract:

Geometric and mechanical properties all influence the resistance of RC structures and may, in certain combination of property values, increase the risk of a brittle failure of the whole system. This paper presents a statistical and probabilistic investigation on the resistance of RC beams designed according to Eurocodes 2 and 8, and subjected to multiple failure modes, under both the natural variation of material properties and the uncertainty associated with cross-section and transverse reinforcement geometry. A full probabilistic model based on JCSS Probabilistic Model Code is derived. Different beams are studied through material nonlinear analysis via Monte Carlo simulations. The resistance model is consistent with Eurocode 2. Both a multivariate statistical evaluation and the data clustering analysis of outcomes are then performed. Results show that the ultimate load behaviour of RC beams subjected to flexural and shear failure modes seems to be mainly influenced by the combination of the mechanical properties of both longitudinal reinforcement and stirrups, and the tensile strength of concrete, of which the latter appears to affect the overall response of the system in a nonlinear way. The model uncertainty of the resistance model used in the analysis plays undoubtedly an important role in interpreting results.

Keywords: modelling, Monte Carlo simulations, probabilistic models, data clustering, reinforced concrete members, structural design

Procedia PDF Downloads 447
9488 Stability Analysis of Endemic State of Modelling the Effect of Vaccination and Novel Quarantine-Adjusted Incidence on the Spread of Newcastle Disease Virus

Authors: Nurudeen Oluwasola Lasisi, Abdulkareem Afolabi Ibrahim

Abstract:

Newcastle disease is an infection of domestic poultry and other bird species with virulent Newcastle disease virus (NDV). In this paper, we study the dynamics of modeling the Newcastle disease virus (NDV) using a novel quarantine-adjusted incidence. We do a comparison of Vaccination, linear incident rate, and novel quarantine adjusted incident rate in the models. The dynamics of the models yield disease free and endemic equilibrium states. The effective reproduction numbers of the models are computed in order to measure the relative impact for the individual bird or combined intervention for effective disease control. We showed the local and global stability of endemic equilibrium states of the models, and we found that stability of endemic equilibrium states of models are globally asymptotically stable if the effective reproduction numbers of the models equations are greater than a unit.

Keywords: effective reproduction number, endemic state, mathematical model, Newcastle disease virus, novel quarantine-adjusted incidence, stability analysis

Procedia PDF Downloads 216
9487 Reservoir Fluids: Occurrence, Classification, and Modeling

Authors: Ahmed El-Banbi

Abstract:

Several PVT models exist to represent how PVT properties are handled in sub-surface and surface engineering calculations for oil and gas production. The most commonly used models include black oil, modified black oil (MBO), and compositional models. These models are used in calculations that allow engineers to optimize and forecast well and reservoir performance (e.g., reservoir simulation calculations, material balance, nodal analysis, surface facilities, etc.). The choice of which model is dependent on fluid type and the production process (e.g., depletion, water injection, gas injection, etc.). Based on close to 2,000 reservoir fluid samples collected from different basins and locations, this paper presents some conclusions on the occurrence of reservoir fluids. It also reviews the common methods used to classify reservoir fluid types. Based on new criteria related to the production behavior of different fluids and economic considerations, an updated classification of reservoir fluid types is presented in the paper. Recommendations on the use of different PVT models to simulate the behavior of different reservoir fluid types are discussed. Each PVT model requirement is highlighted. Available methods for the calculation of PVT properties from each model are also discussed. Practical recommendations and tips on how to control the calculations to achieve the most accurate results are given.

Keywords: PVT models, fluid types, PVT properties, fluids classification

Procedia PDF Downloads 43
9486 Extraction of Compound Words in Malay Sentences Using Linguistic and Statistical Approaches

Authors: Zamri Abu Bakar Zamri, Normaly Kamal Ismail Normaly, Mohd Izani Mohamed Rawi Izani

Abstract:

Malay noun compound are phrases that consist of two or more nouns. The key characteristic behind noun compounds lies on its frequent occurrences within the text. Therefore, extracting these noun compounds is essential for several domains of research such as Information Retrieval, Sentiment Analysis and Question Answering. Many research efforts have been proposed in terms of extracting Malay noun compounds using linguistic and statistical approaches. Most of the existing methods have concentrated on the extraction of bi-gram noun+noun compound. However, extracting noun+verb, noun+adjective and noun+prepositional is challenging due to the difficulty of selecting an appropriate method with effective results. Thus, there is still room for improvement in terms of enhancing the effectiveness of compound word extraction. Therefore, this study proposed a combination of linguistic approach and statistical measures in order to enhance the extraction of compound words. Several preprocessing steps are involved including normalization, tokenization, and stemming. The linguistic approach that has been used in this study is Part-of-Speech (POS) tagging. In addition, a new linguistic pattern for named entities has been utilized using a list of Malays named entities in order to enhance the linguistic approach in terms of noun compound recognition. The proposed statistical measures consists of NC-value, NTC-value and NLC value.

Keywords: Compound Word, Noun Compound, Linguistic Approach, Statistical Approach

Procedia PDF Downloads 317
9485 Modeling Curriculum for High School Students to Learn about Electric Circuits

Authors: Meng-Fei Cheng, Wei-Lun Chen, Han-Chang Ma, Chi-Che Tsai

Abstract:

Recent K–12 Taiwan Science Education Curriculum Guideline emphasize the essential role of modeling curriculum in science learning; however, few modeling curricula have been designed and adopted in current science teaching. Therefore, this study aims to develop modeling curriculum on electric circuits to investigate any learning difficulties students have with modeling curriculum and further enhance modeling teaching. This study was conducted with 44 10th-grade students in Central Taiwan. Data collection included a students’ understanding of models in science (SUMS) survey that explored the students' epistemology of scientific models and modeling and a complex circuit problem to investigate the students’ modeling abilities. Data analysis included the following: (1) Paired sample t-tests were used to examine the improvement of students’ modeling abilities and conceptual understanding before and after the curriculum was taught. (2) Paired sample t-tests were also utilized to determine the students’ modeling abilities before and after the modeling activities, and a Pearson correlation was used to understand the relationship between students’ modeling abilities during the activities and on the posttest. (3) ANOVA analysis was used during different stages of the modeling curriculum to investigate the differences between the students’ who developed microscopic models and macroscopic models after the modeling curriculum was taught. (4) Independent sample t-tests were employed to determine whether the students who changed their models had significantly different understandings of scientific models than the students who did not change their models. The results revealed the following: (1) After the modeling curriculum was taught, the students had made significant progress in both their understanding of the science concept and their modeling abilities. In terms of science concepts, this modeling curriculum helped the students overcome the misconception that electric currents reduce after flowing through light bulbs. In terms of modeling abilities, this modeling curriculum helped students employ macroscopic or microscopic models to explain their observed phenomena. (2) Encouraging the students to explain scientific phenomena in different context prompts during the modeling process allowed them to convert their models to microscopic models, but it did not help them continuously employ microscopic models throughout the whole curriculum. The students finally consistently employed microscopic models when they had help visualizing the microscopic models. (3) During the modeling process, the students who revised their own models better understood that models can be changed than the students who did not revise their own models. Also, the students who revised their models to explain different scientific phenomena tended to regard models as explanatory tools. In short, this study explored different strategies to facilitate students’ modeling processes as well as their difficulties with the modeling process. The findings can be used to design and teach modeling curricula and help students enhance their modeling abilities.

Keywords: electric circuits, modeling curriculum, science learning, scientific model

Procedia PDF Downloads 430
9484 A Structuring and Classification Method for Assigning Application Areas to Suitable Digital Factory Models

Authors: R. Hellmuth

Abstract:

The method of factory planning has changed a lot, especially when it is about planning the factory building itself. Factory planning has the task of designing products, plants, processes, organization, areas, and the building of a factory. Regular restructuring is becoming more important in order to maintain the competitiveness of a factory. Restrictions in new areas, shorter life cycles of product and production technology as well as a VUCA world (Volatility, Uncertainty, Complexity and Ambiguity) lead to more frequent restructuring measures within a factory. A digital factory model is the planning basis for rebuilding measures and becomes an indispensable tool. Furthermore, digital building models are increasingly being used in factories to support facility management and manufacturing processes. The main research question of this paper is, therefore: What kind of digital factory model is suitable for the different areas of application during the operation of a factory? First, different types of digital factory models are investigated, and their properties and usabilities for use cases are analysed. Within the scope of investigation are point cloud models, building information models, photogrammetry models, and these enriched with sensor data are examined. It is investigated which digital models allow a simple integration of sensor data and where the differences are. Subsequently, possible application areas of digital factory models are determined by means of a survey and the respective digital factory models are assigned to the application areas. Finally, an application case from maintenance is selected and implemented with the help of the appropriate digital factory model. It is shown how a completely digitalized maintenance process can be supported by a digital factory model by providing information. Among other purposes, the digital factory model is used for indoor navigation, information provision, and display of sensor data. In summary, the paper shows a structuring of digital factory models that concentrates on the geometric representation of a factory building and its technical facilities. A practical application case is shown and implemented. Thus, the systematic selection of digital factory models with the corresponding application cases is evaluated.

Keywords: building information modeling, digital factory model, factory planning, maintenance

Procedia PDF Downloads 84