Search results for: multivariate regression tree
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4459

Search results for: multivariate regression tree

4429 Some Generalized Multivariate Estimators for Population Mean under Multi Phase Stratified Systematic Sampling

Authors: Muqaddas Javed, Muhammad Hanif

Abstract:

The generalized multivariate ratio and regression type estimators for population mean are suggested under multi-phase stratified systematic sampling (MPSSS) using multi auxiliary information. Estimators are developed under the two different situations of availability of auxiliary information. The expressions of bias and mean square error (MSE) are developed. Special cases of suggested estimators are also discussed and simulation study is conducted to observe the performance of estimators.

Keywords: generalized estimators, multi-phase sampling, stratified random sampling, systematic sampling

Procedia PDF Downloads 731
4428 Heart Ailment Prediction Using Machine Learning Methods

Authors: Abhigyan Hedau, Priya Shelke, Riddhi Mirajkar, Shreyash Chaple, Mrunali Gadekar, Himanshu Akula

Abstract:

The heart is the coordinating centre of the major endocrine glandular structure of the body, which produces hormones that profoundly affect the operations of the body, and diagnosing cardiovascular disease is a difficult but critical task. By extracting knowledge and information about the disease from patient data, data mining is a more practical technique to help doctors detect disorders. We use a variety of machine learning methods here, including logistic regression and support vector classifiers (SVC), K-nearest neighbours Classifiers (KNN), Decision Tree Classifiers, Random Forest classifiers and Gradient Boosting classifiers. These algorithms are applied to patient data containing 13 different factors to build a system that predicts heart disease in less time with more accuracy.

Keywords: logistic regression, support vector classifier, k-nearest neighbour, decision tree, random forest and gradient boosting

Procedia PDF Downloads 52
4427 Behind Fuzzy Regression Approach: An Exploration Study

Authors: Lavinia B. Dulla

Abstract:

The exploration study of the fuzzy regression approach attempts to present that fuzzy regression can be used as a possible alternative to classical regression. It likewise seeks to assess the differences and characteristics of simple linear regression and fuzzy regression using the width of prediction interval, mean absolute deviation, and variance of residuals. Based on the simple linear regression model, the fuzzy regression approach is worth considering as an alternative to simple linear regression when the sample size is between 10 and 20. As the sample size increases, the fuzzy regression approach is not applicable to use since the assumption regarding large sample size is already operating within the framework of simple linear regression. Nonetheless, it can be suggested for a practical alternative when decisions often have to be made on the basis of small data.

Keywords: fuzzy regression approach, minimum fuzziness criterion, interval regression, prediction interval

Procedia PDF Downloads 302
4426 Application of Deep Learning in Top Pair and Single Top Quark Production at the Large Hadron Collider

Authors: Ijaz Ahmed, Anwar Zada, Muhammad Waqas, M. U. Ashraf

Abstract:

We demonstrate the performance of a very efficient tagger applies on hadronically decaying top quark pairs as signal based on deep neural network algorithms and compares with the QCD multi-jet background events. A significant enhancement of performance in boosted top quark events is observed with our limited computing resources. We also compare modern machine learning approaches and perform a multivariate analysis of boosted top-pair as well as single top quark production through weak interaction at √s = 14 TeV proton-proton Collider. The most relevant known background processes are incorporated. Through the techniques of Boosted Decision Tree (BDT), likelihood and Multlayer Perceptron (MLP) the analysis is trained to observe the performance in comparison with the conventional cut based and count approach

Keywords: top tagger, multivariate, deep learning, LHC, single top

Procedia PDF Downloads 111
4425 A Dynamic Round Robin Routing for Z-Fat Tree

Authors: M. O. Adda

Abstract:

In this paper, we propose a topology called Zoned fat tree (Z-Fat tree) which is a further extension to the classical fat trees. The extension relates to the provision of extra degree of connectivity to maximize the number of deployed ports per routing nodes, and hence increases the bisection bandwidth especially for slimmed fat trees. The extra links, when classical routing is used, tend, in deterministic environment, to be under-utilized for some traffic patterns, hence achieving poor performance. We suggest two versions of a dynamic round robin scheme that outperforms the classical D-mod-k and S-mod-K routing and show by simulation that our proposal utilize all the extra added links to the classical fat tree, and achieve better performance for general applications.

Keywords: deterministic routing, fat tree, interconnection, traffic pattern

Procedia PDF Downloads 486
4424 Statistical Analysis with Prediction Models of User Satisfaction in Software Project Factors

Authors: Katawut Kaewbanjong

Abstract:

We analyzed a volume of data and found significant user satisfaction in software project factors. A statistical significance analysis (logistic regression) and collinearity analysis determined the significance factors from a group of 71 pre-defined factors from 191 software projects in ISBSG Release 12. The eight prediction models used for testing the prediction potential of these factors were Neural network, k-NN, Naïve Bayes, Random forest, Decision tree, Gradient boosted tree, linear regression and logistic regression prediction model. Fifteen pre-defined factors were truly significant in predicting user satisfaction, and they provided 82.71% prediction accuracy when used with a neural network prediction model. These factors were client-server, personnel changes, total defects delivered, project inactive time, industry sector, application type, development type, how methodology was acquired, development techniques, decision making process, intended market, size estimate approach, size estimate method, cost recording method, and effort estimate method. These findings may benefit software development managers considerably.

Keywords: prediction model, statistical analysis, software project, user satisfaction factor

Procedia PDF Downloads 124
4423 Historical Landscape Affects Present Tree Density in Paddy Field

Authors: Ha T. Pham, Shuichi Miyagawa

Abstract:

Ongoing landscape transformation is one of the major causes behind disappearance of traditional landscapes, and lead to species and resource loss. Tree in paddy fields in the northeast of Thailand is one of those traditional landscapes. Using three different historical time layers, we acknowledged the severe deforestation and rapid urbanization happened in the region. Despite the general thinking of decline in tree density as consequences, the heterogeneous trend of changes in total tree density in three studied landscapes denied the hypothesis that number of trees in paddy field depend on the length of land use practice. On the other hand, due to selection of planting new trees on levees, existence of trees in paddy field are now rely on their values for human use. Besides, changes in land use and landscape structure had a significant impact on decision of which tree density level is considered as suitable for the landscape.

Keywords: aerial photographs, land use change, traditional landscape, tree in paddy fields

Procedia PDF Downloads 421
4422 Multivariate Genome-Wide Association Studies for Identifying Additional Loci for Myopia

Authors: Qiao Fan, Xiaobo Guo, Junxian Zhu, Xiaohu Ding, Ching-Yu Cheng, Tien-Yin Wong, Mingguang He, Heping Zhang, Xueqin Wang

Abstract:

A systematic, simultaneous analysis of multiple phenotypes in genome-wide association studies (GWASs) draws a great attention to integrate the signals from single phenotypes with increased power. However, lacking an interpretable and efficient multivariate GWAS analysis impede the application of such approach. In this study, we propose to decompose the multivariate model into a series of simple univariate models. This transformation illuminates what exactly the individual trait contributes to the significant signals from the multivariate analyses. By employing our approach in the analysis of three myopia-related endophenotypes from the Singapore Malay Eye Study (SIMES), we identify novel candidate loci which were successfully validated in an independent Guangzhou Twin Eye Study (GTES).

Keywords: GWAS multivariate, multiple traits, myopia, association

Procedia PDF Downloads 225
4421 Unconventional Dating of Old Peepal Tree of Chandigarh (India) Using Optically Stimulated Luminescence

Authors: Rita Rani, Ramesh Kumar

Abstract:

The intend of the current study is to date an old grand Peepal tree that is still alive. The tree is situated in Kalibard village, Sector 9, Chandigarh (India). Due to its huge structure, it has got the status of ‘Heritage tree.’ Optically Stimulated Luminescence of sediments beneath the roots is used to determine the age of the tree. Optical dating is preferred over conventional dating methods due to more precession. The methodology includes OSL of quartz grain using SAR protocol for accumulated dose measurement. The age determination of an alive tree using sedimentary quartz is in close agreement with the approximated age provided by the related agency. This is the first attempt at using optically stimulated luminescence in the age determination of alive trees in this region. The study concludes that the Luminescence dating of alive trees is the nondestructive and more precise method.

Keywords: luminescence, dose rate, optical dating, sediments

Procedia PDF Downloads 176
4420 Fuzzy Approach for Fault Tree Analysis of Water Tube Boiler

Authors: Syed Ahzam Tariq, Atharva Modi

Abstract:

This paper presents a probabilistic analysis of the safety of water tube boilers using fault tree analysis (FTA). A fault tree has been constructed by considering all possible areas where a malfunction could lead to a boiler accident. Boiler accidents are relatively rare, causing a scarcity of data. The fuzzy approach is employed to perform a quantitative analysis, wherein theories of fuzzy logic are employed in conjunction with expert elicitation to calculate failure probabilities. The Fuzzy Fault Tree Analysis (FFTA) provides a scientific and contingent method to forecast and prevent accidents.

Keywords: fault tree analysis water tube boiler, fuzzy probability score, failure probability

Procedia PDF Downloads 128
4419 Determination of the Bank's Customer Risk Profile: Data Mining Applications

Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge

Abstract:

In this study, the clients who applied to a bank branch for loan were analyzed through data mining. The study was composed of the information such as amounts of loans received by personal and SME clients working with the bank branch, installment numbers, number of delays in loan installments, payments available in other banks and number of banks to which they are in debt between 2010 and 2013. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, 5 different types of customers have been determined on the decision tree. The classification of these types of customers has been created with the rating of those posing a risk for the bank branch and the customers have been classified according to the risk ratings.

Keywords: client classification, loan suitability, risk rating, CART analysis

Procedia PDF Downloads 338
4418 Predictive Analysis of the Stock Price Market Trends with Deep Learning

Authors: Suraj Mehrotra

Abstract:

The stock market is a volatile, bustling marketplace that is a cornerstone of economics. It defines whether companies are successful or in spiral. A thorough understanding of it is important - many companies have whole divisions dedicated to analysis of both their stock and of rivaling companies. Linking the world of finance and artificial intelligence (AI), especially the stock market, has been a relatively recent development. Predicting how stocks will do considering all external factors and previous data has always been a human task. With the help of AI, however, machine learning models can help us make more complete predictions in financial trends. Taking a look at the stock market specifically, predicting the open, closing, high, and low prices for the next day is very hard to do. Machine learning makes this task a lot easier. A model that builds upon itself that takes in external factors as weights can predict trends far into the future. When used effectively, new doors can be opened up in the business and finance world, and companies can make better and more complete decisions. This paper explores the various techniques used in the prediction of stock prices, from traditional statistical methods to deep learning and neural networks based approaches, among other methods. It provides a detailed analysis of the techniques and also explores the challenges in predictive analysis. For the accuracy of the testing set, taking a look at four different models - linear regression, neural network, decision tree, and naïve Bayes - on the different stocks, Apple, Google, Tesla, Amazon, United Healthcare, Exxon Mobil, J.P. Morgan & Chase, and Johnson & Johnson, the naïve Bayes model and linear regression models worked best. For the testing set, the naïve Bayes model had the highest accuracy along with the linear regression model, followed by the neural network model and then the decision tree model. The training set had similar results except for the fact that the decision tree model was perfect with complete accuracy in its predictions, which makes sense. This means that the decision tree model likely overfitted the training set when used for the testing set.

Keywords: machine learning, testing set, artificial intelligence, stock analysis

Procedia PDF Downloads 96
4417 Study on Optimal Control Strategy of PM2.5 in Wuhan, China

Authors: Qiuling Xie, Shanliang Zhu, Zongdi Sun

Abstract:

In this paper, we analyzed the correlation relationship among PM2.5 from other five Air Quality Indices (AQIs) based on the grey relational degree, and built a multivariate nonlinear regression equation model of PM2.5 and the five monitoring indexes. For the optimal control problem of PM2.5, we took the partial large Cauchy distribution of membership equation as satisfaction function. We established a nonlinear programming model with the goal of maximum performance to price ratio. And the optimal control scheme is given.

Keywords: grey relational degree, multiple linear regression, membership function, nonlinear programming

Procedia PDF Downloads 301
4416 Commuters Trip Purpose Decision Tree Based Model of Makurdi Metropolis, Nigeria and Strategic Digital City Project

Authors: Emmanuel Okechukwu Nwafor, Folake Olubunmi Akintayo, Denis Alcides Rezende

Abstract:

Decision tree models are versatile and interpretable machine learning algorithms widely used for both classification and regression tasks, which can be related to cities, whether physical or digital. The aim of this research is to assess how well decision tree algorithms can predict trip purposes in Makurdi, Nigeria, while also exploring their connection to the strategic digital city initiative. The research methodology involves formalizing household demographic and trips information datasets obtained from extensive survey process. Modelling and Prediction were achieved using Python Programming Language and the evaluation metrics like R-squared and mean absolute error were used to assess the decision tree algorithm's performance. The results indicate that the model performed well, with accuracies of 84% and 68%, and low MAE values of 0.188 and 0.314, on training and validation data, respectively. This suggests the model can be relied upon for future prediction. The conclusion reiterates that This model will assist decision-makers, including urban planners, transportation engineers, government officials, and commuters, in making informed decisions on transportation planning and management within the framework of a strategic digital city. Its application will enhance the efficiency, sustainability, and overall quality of transportation services in Makurdi, Nigeria.

Keywords: decision tree algorithm, trip purpose, intelligent transport, strategic digital city, travel pattern, sustainable transport

Procedia PDF Downloads 23
4415 Parameter Estimation via Metamodeling

Authors: Sergio Haram Sarmiento, Arcady Ponosov

Abstract:

Based on appropriate multivariate statistical methodology, we suggest a generic framework for efficient parameter estimation for ordinary differential equations and the corresponding nonlinear models. In this framework classical linear regression strategies is refined into a nonlinear regression by a locally linear modelling technique (known as metamodelling). The approach identifies those latent variables of the given model that accumulate most information about it among all approximations of the same dimension. The method is applied to several benchmark problems, in particular, to the so-called ”power-law systems”, being non-linear differential equations typically used in Biochemical System Theory.

Keywords: principal component analysis, generalized law of mass action, parameter estimation, metamodels

Procedia PDF Downloads 518
4414 High Resolution Satellite Imagery and Lidar Data for Object-Based Tree Species Classification in Quebec, Canada

Authors: Bilel Chalghaf, Mathieu Varin

Abstract:

Forest characterization in Quebec, Canada, is usually assessed based on photo-interpretation at the stand level. For species identification, this often results in a lack of precision. Very high spatial resolution imagery, such as DigitalGlobe, and Light Detection and Ranging (LiDAR), have the potential to overcome the limitations of aerial imagery. To date, few studies have used that data to map a large number of species at the tree level using machine learning techniques. The main objective of this study is to map 11 individual high tree species ( > 17m) at the tree level using an object-based approach in the broadleaf forest of Kenauk Nature, Quebec. For the individual tree crown segmentation, three canopy-height models (CHMs) from LiDAR data were assessed: 1) the original, 2) a filtered, and 3) a corrected model. The corrected CHM gave the best accuracy and was then coupled with imagery to refine tree species crown identification. When compared with photo-interpretation, 90% of the objects represented a single species. For modeling, 313 variables were derived from 16-band WorldView-3 imagery and LiDAR data, using radiance, reflectance, pixel, and object-based calculation techniques. Variable selection procedures were employed to reduce their number from 313 to 16, using only 11 bands to aid reproducibility. For classification, a global approach using all 11 species was compared to a semi-hierarchical hybrid classification approach at two levels: (1) tree type (broadleaf/conifer) and (2) individual broadleaf (five) and conifer (six) species. Five different model techniques were used: (1) support vector machine (SVM), (2) classification and regression tree (CART), (3) random forest (RF), (4) k-nearest neighbors (k-NN), and (5) linear discriminant analysis (LDA). Each model was tuned separately for all approaches and levels. For the global approach, the best model was the SVM using eight variables (overall accuracy (OA): 80%, Kappa: 0.77). With the semi-hierarchical hybrid approach, at the tree type level, the best model was the k-NN using six variables (OA: 100% and Kappa: 1.00). At the level of identifying broadleaf and conifer species, the best model was the SVM, with OA of 80% and 97% and Kappa values of 0.74 and 0.97, respectively, using seven variables for both models. This paper demonstrates that a hybrid classification approach gives better results and that using 16-band WorldView-3 with LiDAR data leads to more precise predictions for tree segmentation and classification, especially when the number of tree species is large.

Keywords: tree species, object-based, classification, multispectral, machine learning, WorldView-3, LiDAR

Procedia PDF Downloads 136
4413 Estimation of Functional Response Model by Supervised Functional Principal Component Analysis

Authors: Hyon I. Paek, Sang Rim Kim, Hyon A. Ryu

Abstract:

In functional linear regression, one typical problem is to reduce dimension. Compared with multivariate linear regression, functional linear regression is regarded as an infinite-dimensional case, and the main task is to reduce dimensions of functional response and functional predictors. One common approach is to adapt functional principal component analysis (FPCA) on functional predictors and then use a few leading functional principal components (FPC) to predict the functional model. The leading FPCs estimated by the typical FPCA explain a major variation of the functional predictor, but these leading FPCs may not be mostly correlated with the functional response, so they may not be significant in the prediction for response. In this paper, we propose a supervised functional principal component analysis method for a functional response model with FPCs obtained by considering the correlation of the functional response. Our method would have a better prediction accuracy than the typical FPCA method.

Keywords: supervised, functional principal component analysis, functional response, functional linear regression

Procedia PDF Downloads 77
4412 Handshake Algorithm for Minimum Spanning Tree Construction

Authors: Nassiri Khalid, El Hibaoui Abdelaaziz et Hajar Moha

Abstract:

In this paper, we introduce and analyse a probabilistic distributed algorithm for a construction of a minimum spanning tree on network. This algorithm is based on the handshake concept. Firstly, each network node is considered as a sub-spanning tree. And at each round of the execution of our algorithm, a sub-spanning trees are merged. The execution continues until all sub-spanning trees are merged into one. We analyze this algorithm by a stochastic process.

Keywords: Spanning tree, Distributed Algorithm, Handshake Algorithm, Matching, Probabilistic Analysis

Procedia PDF Downloads 660
4411 Simultaneous Determination of Six Characterizing/Quality Parameters of Biodiesels via 1H NMR and Multivariate Calibration

Authors: Gustavo G. Shimamoto, Matthieu Tubino

Abstract:

The characterization and the quality of biodiesel samples are checked by determining several parameters. Considering a large number of analysis to be performed, as well as the disadvantages of the use of toxic solvents and waste generation, multivariate calibration is suggested to reduce the number of tests. In this work, hydrogen nuclear magnetic resonance (1H NMR) spectra were used to build multivariate models, from partial least squares (PLS) regression, in order to determine simultaneously six important characterizing and/or quality parameters of biodiesels: density at 20 ºC, kinematic viscosity at 40 ºC, iodine value, acid number, oxidative stability, and water content. Biodiesels from twelve different oils sources were used in this study: babassu, brown flaxseed, canola, corn, cottonseed, macauba almond, microalgae, palm kernel, residual frying, sesame, soybean, and sunflower. 1H NMR reflects the structures of the compounds present in biodiesel samples and showed suitable correlations with the six parameters. The PLS models were constructed with latent variables between 5 and 7, the obtained values of r(cal) and r(val) were greater than 0.994 and 0.989, respectively. In addition, the models were considered suitable to predict all the six parameters for external samples, taking into account the analytical speed to perform it. Thus, the alliance between 1H NMR and PLS showed to be appropriate to characterize and evaluate the quality of biodiesels, reducing significantly analysis time, the consumption of reagents/solvents, and waste generation. Therefore, the proposed methods can be considered to adhere to the principles of green chemistry.

Keywords: biodiesel, multivariate calibration, nuclear magnetic resonance, quality parameters

Procedia PDF Downloads 541
4410 Predicting Growth of Eucalyptus Marginata in a Mediterranean Climate Using an Individual-Based Modelling Approach

Authors: S.K. Bhandari, E. Veneklaas, L. McCaw, R. Mazanec, K. Whitford, M. Renton

Abstract:

Eucalyptus marginata, E. diversicolor and Corymbia calophylla form widespread forests in south-west Western Australia (SWWA). These forests have economic and ecological importance, and therefore, tree growth and sustainable management are of high priority. This paper aimed to analyse and model the growth of these species at both stand and individual levels, but this presentation will focus on predicting the growth of E. Marginata at the individual tree level. More specifically, the study wanted to investigate how well individual E. marginata tree growth could be predicted by considering the diameter and height of the tree at the start of the growth period, and whether this prediction could be improved by also accounting for the competition from neighbouring trees in different ways. The study also wanted to investigate how many neighbouring trees or what neighbourhood distance needed to be considered when accounting for competition. To achieve this aim, the Pearson correlation coefficient was examined among competition indices (CIs), between CIs and dbh growth, and selected the competition index that can best predict the diameter growth of individual trees of E. marginata forest managed under different thinning regimes at Inglehope in SWWA. Furthermore, individual tree growth models were developed using simple linear regression, multiple linear regression, and linear mixed effect modelling approaches. Individual tree growth models were developed for thinned and unthinned stand separately. The developed models were validated using two approaches. In the first approach, models were validated using a subset of data that was not used in model fitting. In the second approach, the model of the one growth period was validated with the data of another growth period. Tree size (diameter and height) was a significant predictor of growth. This prediction was improved when the competition was included in the model. The fit statistic (coefficient of determination) of the model ranged from 0.31 to 0.68. The model with spatial competition indices validated as being more accurate than with non-spatial indices. The model prediction can be optimized if 10 to 15 competitors (by number) or competitors within ~10 m (by distance) from the base of the subject tree are included in the model, which can reduce the time and cost of collecting the information about the competitors. As competition from neighbours was a significant predictor with a negative effect on growth, it is recommended including neighbourhood competition when predicting growth and considering thinning treatments to minimize the effect of competition on growth. These model approaches are likely to be useful tools for the conservations and sustainable management of forests of E. marginata in SWWA. As a next step in optimizing the number and distance of competitors, further studies in larger size plots and with a larger number of plots than those used in the present study are recommended.

Keywords: competition, growth, model, thinning

Procedia PDF Downloads 128
4409 Decision Tree Based Scheduling for Flexible Job Shops with Multiple Process Plans

Authors: H.-H. Doh, J.-M. Yu, Y.-J. Kwon, J.-H. Shin, H.-W. Kim, S.-H. Nam, D.-H. Lee

Abstract:

This paper suggests a decision tree based approach for flexible job shop scheduling with multiple process plans, i. e. each job can be processed through alternative operations, each of which can be processed on alternative machines. The main decision variables are: (a) selecting operation/machine pair; and (b) sequencing the jobs assigned to each machine. As an extension of the priority scheduling approach that selects the best priority rule combination after many simulation runs, this study suggests a decision tree based approach in which a decision tree is used to select a priority rule combination adequate for a specific system state and hence the burdens required for developing simulation models and carrying out simulation runs can be eliminated. The decision tree based scheduling approach consists of construction and scheduling modules. In the construction module, a decision tree is constructed using a four-stage algorithm, and in the scheduling module, a priority rule combination is selected using the decision tree. To show the performance of the decision tree based approach suggested in this study, a case study was done on a flexible job shop with reconfigurable manufacturing cells and a conventional job shop, and the results are reported by comparing it with individual priority rule combinations for the objectives of minimizing total flow time and total tardiness.

Keywords: flexible job shop scheduling, decision tree, priority rules, case study

Procedia PDF Downloads 358
4408 Semi-Supervised Hierarchical Clustering Given a Reference Tree of Labeled Documents

Authors: Ying Zhao, Xingyan Bin

Abstract:

Semi-supervised clustering algorithms have been shown effective to improve clustering process with even limited supervision. However, semi-supervised hierarchical clustering remains challenging due to the complexities of expressing constraints for agglomerative clustering algorithms. This paper proposes novel semi-supervised agglomerative clustering algorithms to build a hierarchy based on a known reference tree. We prove that by enforcing distance constraints defined by a reference tree during the process of hierarchical clustering, the resultant tree is guaranteed to be consistent with the reference tree. We also propose a framework that allows the hierarchical tree generation be aware of levels of levels of the agglomerative tree under creation, so that metric weights can be learned and adopted at each level in a recursive fashion. The experimental evaluation shows that the additional cost of our contraint-based semi-supervised hierarchical clustering algorithm (HAC) is negligible, and our combined semi-supervised HAC algorithm outperforms the state-of-the-art algorithms on real-world datasets. The experiments also show that our proposed methods can improve clustering performance even with a small number of unevenly distributed labeled data.

Keywords: semi-supervised clustering, hierarchical agglomerative clustering, reference trees, distance constraints

Procedia PDF Downloads 548
4407 Using Data Mining Technique for Scholarship Disbursement

Authors: J. K. Alhassan, S. A. Lawal

Abstract:

This work is on decision tree-based classification for the disbursement of scholarship. Tree-based data mining classification technique is used in other to determine the generic rule to be used to disburse the scholarship. The system based on the defined rules from the tree is able to determine the class (status) to which an applicant shall belong whether Granted or Not Granted. The applicants that fall to the class of granted denote a successful acquirement of scholarship while those in not granted class are unsuccessful in the scheme. An algorithm that can be used to classify the applicants based on the rules from tree-based classification was also developed. The tree-based classification is adopted because of its efficiency, effectiveness, and easy to comprehend features. The system was tested with the data of National Information Technology Development Agency (NITDA) Abuja, a Parastatal of Federal Ministry of Communication Technology that is mandated to develop and regulate information technology in Nigeria. The system was found working according to the specification. It is therefore recommended for all scholarship disbursement organizations.

Keywords: classification, data mining, decision tree, scholarship

Procedia PDF Downloads 378
4406 Spatial Interactions Between Earthworm Abundance and Tree Growth Characteristics in Western Niger Delta

Authors: Olatunde Sunday Eludoyin, Charles Obiechina Olisa

Abstract:

The study examined the spatial interactions between earthworm abundance (EA) and tree growth characteristics in ecological belts of Western Niger Delta, Nigeria. Eight 20m x 20m quadrat were delimited in the natural vegetation in each of the rainforest (RF), mangrove (M), fresh water swamp (FWS), and guinea savanna (GS) ecological belts to gather data about the tree species (TS) characteristics which included individual number of tree species (IN), diversity (Di), density (De) and richness (Ri). Three quadrats of 1m x 1m were delineated in each of the 20m x 20m quadrats to collect earthworm species the topsoil (0-15cm), and subsoil (15-30cm) and were taken to laboratory for further analysis. Descriptive statistics and inferential statistics were used for data analysis. Findings showed that a total of 19 earthworm species was found, with 58.5% individual species recorded in the topsoil and 41.5% recorded in the subsoil. The total population ofEudriliuseugeniae was predominantly highest in both topsoil (38.4%) and subsoil (27.1%). The total population of individual species of earthworm was least in GS in the topsoil (11.9%) and subsoil (8.4%). A total of 40 different species of TS was recorded, of which 55.5% were recorded in FWS, while RF was significantly highest in the species diversity(0.5971). Regression analysis revealed that Ri, IN, DBH, Di, and De of trees explained 65.9% of the variability of EA in the topsoil, while 46.9 % of the variability of earthworm abundance was explained by the floristic parameters in the subsoil.Similarly, correlation statistics revealed that in the topsoil, EA is positively and significantly correlated with Ri (r=0.35; p<0.05), IN (r=0.523; p<0.05) and De (r=0.469; p<0.05) while DBH was negatively and significantly correlated with earthworm abundance (r=-0.437; p<0.05). In the subsoil, only Ri and DBH correlated significantly with EA. The study concluded that EA in the study locations was highly influenced by tree growth species especially Ri, IN, DBH, Di, and De. The study recommended that the TSabundance should be improved in the study locations to ensure the survival of earthworms for ecosystem functions.

Keywords: interactions, earthworm abundance, tree growth, ecological zones, western niger delta

Procedia PDF Downloads 102
4405 Optimization of Machine Learning Regression Results: An Application on Health Expenditures

Authors: Songul Cinaroglu

Abstract:

Machine learning regression methods are recommended as an alternative to classical regression methods in the existence of variables which are difficult to model. Data for health expenditure is typically non-normal and have a heavily skewed distribution. This study aims to compare machine learning regression methods by hyperparameter tuning to predict health expenditure per capita. A multiple regression model was conducted and performance results of Lasso Regression, Random Forest Regression and Support Vector Machine Regression recorded when different hyperparameters are assigned. Lambda (λ) value for Lasso Regression, number of trees for Random Forest Regression, epsilon (ε) value for Support Vector Regression was determined as hyperparameters. Study results performed by using 'k' fold cross validation changed from 5 to 50, indicate the difference between machine learning regression results in terms of R², RMSE and MAE values that are statistically significant (p < 0.001). Study results reveal that Random Forest Regression (R² ˃ 0.7500, RMSE ≤ 0.6000 ve MAE ≤ 0.4000) outperforms other machine learning regression methods. It is highly advisable to use machine learning regression methods for modelling health expenditures.

Keywords: machine learning, lasso regression, random forest regression, support vector regression, hyperparameter tuning, health expenditure

Procedia PDF Downloads 226
4404 HIV Disclosure Status and Factors among Women to Their Sexual Partner in Victory plus, Yogyakarta, Indonesia

Authors: Dwi Kartika Rukmi, Miftafu Darussalam

Abstract:

Background: The disclosure of women’s HIV status toward their sexual partners is an important issue that should be regarded as one of the efforts to prevent and control the spread of HIV. Research on the disclosure of seropositive HIV status as well as women-related factors in Indonesia, especially Yogyakarta is only a few. Methods: This is a correlational descriptive research along with its cross-sectional approach on 329 women with HIV/AIDS at the Victory Plus NGO from June to July 2016. This research used a purposive sampling method and a questionnaire as the data collection technique. The bivariate analysis test was undertaken by using a chi-square and multivariate test along with a logistic regression. Result: The multivariate analysis and logistic regression show five independent variables related to the disclosure of seropositive HIV status of women with HIV/AIDS toward their sexual partners, namely ethnicity (aOR = 36,859; 95% CI; (6,544-207,616)) religion (aOR =0,255; 95%CI; (0,075-0,868)), discussion with partners prior to the HIV test (aOR =0,069; 95%CI; (0,065-0,438)) , types of sexual partners (aOR = 0.191; 95% CI; (0.082-0,445)) and knowledge on the partners’ HIV status (aOR = 0.036; 95% CI; (0.008-0.160)). The highest level of reason for seropositive HIV women not to be open about their partners’ status is the fear of being rejected by their partners and the environmental stigma of HIV AIDS disease. Conclusion: The disclosure of seropositive HIV status in women with HIV/AIDS in the Victory Plus NGO of Yogyakarta was 79.4% or classified as a high category with some related factors such as ethnicity, religion, discussion with partners prior to the HIV test, types of partners and knowledge on the partners’ HIV status.

Keywords: women, HIV, disclosure, sexual partner

Procedia PDF Downloads 261
4403 Dynamic Fault Tree Analysis of Dynamic Positioning System through Monte Carlo Approach

Authors: A. S. Cheliyan, S. K. Bhattacharyya

Abstract:

Dynamic Positioning System (DPS) is employed in marine vessels of the offshore oil and gas industry. It is a computer controlled system to automatically maintain a ship’s position and heading by using its own thrusters. Reliability assessment of the same can be analyzed through conventional fault tree. However, the complex behaviour like sequence failure, redundancy management and priority of failing of events cannot be analyzed by the conventional fault trees. The Dynamic Fault Tree (DFT) addresses these shortcomings of conventional Fault Tree by defining additional gates called dynamic gates. Monte Carlo based simulation approach has been adopted for the dynamic gates. This method of realistic modeling of DPS gives meaningful insight into the system reliability and the ability to improve the same.

Keywords: dynamic positioning system, dynamic fault tree, Monte Carlo simulation, reliability assessment

Procedia PDF Downloads 775
4402 Multivariate Output-Associative RVM for Multi-Dimensional Affect Predictions

Authors: Achut Manandhar, Kenneth D. Morton, Peter A. Torrione, Leslie M. Collins

Abstract:

The current trends in affect recognition research are to consider continuous observations from spontaneous natural interactions in people using multiple feature modalities, and to represent affect in terms of continuous dimensions, incorporate spatio-temporal correlation among affect dimensions, and provide fast affect predictions. These research efforts have been propelled by a growing effort to develop affect recognition system that can be implemented to enable seamless real-time human-computer interaction in a wide variety of applications. Motivated by these desired attributes of an affect recognition system, in this work a multi-dimensional affect prediction approach is proposed by integrating multivariate Relevance Vector Machine (MVRVM) with a recently developed Output-associative Relevance Vector Machine (OARVM) approach. The resulting approach can provide fast continuous affect predictions by jointly modeling the multiple affect dimensions and their correlations. Experiments on the RECOLA database show that the proposed approach performs competitively with the OARVM while providing faster predictions during testing.

Keywords: dimensional affect prediction, output-associative RVM, multivariate regression, fast testing

Procedia PDF Downloads 286
4401 Prediction of Coronary Artery Stenosis Severity Based on Machine Learning Algorithms

Authors: Yu-Jia Jian, Emily Chia-Yu Su, Hui-Ling Hsu, Jian-Jhih Chen

Abstract:

Coronary artery is the major supplier of myocardial blood flow. When fat and cholesterol are deposit in the coronary arterial wall, narrowing and stenosis of the artery occurs, which may lead to myocardial ischemia and eventually infarction. According to the World Health Organization (WHO), estimated 740 million people have died of coronary heart disease in 2015. According to Statistics from Ministry of Health and Welfare in Taiwan, heart disease (except for hypertensive diseases) ranked the second among the top 10 causes of death from 2013 to 2016, and it still shows a growing trend. According to American Heart Association (AHA), the risk factors for coronary heart disease including: age (> 65 years), sex (men to women with 2:1 ratio), obesity, diabetes, hypertension, hyperlipidemia, smoking, family history, lack of exercise and more. We have collected a dataset of 421 patients from a hospital located in northern Taiwan who received coronary computed tomography (CT) angiography. There were 300 males (71.26%) and 121 females (28.74%), with age ranging from 24 to 92 years, and a mean age of 56.3 years. Prior to coronary CT angiography, basic data of the patients, including age, gender, obesity index (BMI), diastolic blood pressure, systolic blood pressure, diabetes, hypertension, hyperlipidemia, smoking, family history of coronary heart disease and exercise habits, were collected and used as input variables. The output variable of the prediction module is the degree of coronary artery stenosis. The output variable of the prediction module is the narrow constriction of the coronary artery. In this study, the dataset was randomly divided into 80% as training set and 20% as test set. Four machine learning algorithms, including logistic regression, stepwise regression, neural network and decision tree, were incorporated to generate prediction results. We used area under curve (AUC) / accuracy (Acc.) to compare the four models, the best model is neural network, followed by stepwise logistic regression, decision tree, and logistic regression, with 0.68 / 79 %, 0.68 / 74%, 0.65 / 78%, and 0.65 / 74%, respectively. Sensitivity of neural network was 27.3%, specificity was 90.8%, stepwise Logistic regression sensitivity was 18.2%, specificity was 92.3%, decision tree sensitivity was 13.6%, specificity was 100%, logistic regression sensitivity was 27.3%, specificity 89.2%. From the result of this study, we hope to improve the accuracy by improving the module parameters or other methods in the future and we hope to solve the problem of low sensitivity by adjusting the imbalanced proportion of positive and negative data.

Keywords: decision support, computed tomography, coronary artery, machine learning

Procedia PDF Downloads 229
4400 CanVis: Towards a Web Platform for Cancer Progression Tree Analysis

Authors: Michael Aupetit, Mahmoud Al-ismail, Khaled Mohamed

Abstract:

Cancer is a major public health problem all over the world. Breast cancer has the highest incidence rate over all cancers for women in Qatar making its study a top priority of the country. Human cancer is a dynamic disease that develops over an extended period through the accumulation of a series of genetic alterations. A Darwinian process drives the tumor cells toward higher malignancy growing the branches of a progression tree in the space of genes expression. Although it is not possible to track these genetic alterations dynamically for one patient, it is possible to reconstruct the progression tree from the aggregation of thousands of tumor cells’ genetic profiles from thousands of different patients at different stages of the disease. Analyzing the progression tree is a way to detect pivotal molecular events that drive the malignant evolution and to provide a guide for the development of cancer diagnostics, prognostics and targeted therapeutics. In this work we present the development of a Visual Analytic web platform CanVis enabling users to upload gene-expression data and analyze their progression tree. The server computes the progression tree based on state-of-the-art techniques and allows an interactive visual exploration of this tree and the gene-expression data along its branching structure helping to discover potential driver genes.

Keywords: breast cancer, progression tree, visual analytics, web platform

Procedia PDF Downloads 419