Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 23

random forest Related Abstracts

23 Comparative Study od Three Artificial Intelligence Techniques for Rain Domain in Precipitation Forecast

Authors: Zalinda Othman, Abdul Razak Hamdan, Azuraliza Abu Bakar, Nabilah Filzah Mohd Radzuan, Andi Putra

Abstract:

Precipitation forecast is important to avoid natural disaster incident which can cause losses in the involved area. This paper reviews three techniques logistic regression, decision tree, and random forest which are used in making precipitation forecast. These combination techniques through the vector auto-regression (VAR) model help in finding the advantages and strengths of each technique in the forecast process. The data-set contains variables of the rain’s domain. Adaptation of artificial intelligence techniques involved in rain domain enables the forecast process to be easier and systematic for precipitation forecast.

Keywords: Logistic Regression, decisions tree, random forest, VAR model

Procedia PDF Downloads 240
22 A Study of Classification Models to Predict Drill-Bit Breakage Using Degradation Signals

Authors: Bharatendra Rai

Abstract:

Cutting tools are widely used in manufacturing processes and drilling is the most commonly used machining process. Although drill-bits used in drilling may not be expensive, their breakage can cause damage to expensive work piece being drilled and at the same time has major impact on productivity. Predicting drill-bit breakage, therefore, is important in reducing cost and improving productivity. This study uses twenty features extracted from two degradation signals viz., thrust force and torque. The methodology used involves developing and comparing decision tree, random forest, and multinomial logistic regression models for classifying and predicting drill-bit breakage using degradation signals.

Keywords: random forest, degradation signal, drill-bit breakage, multinomial logistic regression

Procedia PDF Downloads 215
21 Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second

Authors: P. V. Pramila, V. Mahesh

Abstract:

Pulmonary Function Tests are important non-invasive diagnostic tests to assess respiratory impairments and provides quantifiable measures of lung function. Spirometry is the most frequently used measure of lung function and plays an essential role in the diagnosis and management of pulmonary diseases. However, the test requires considerable patient effort and cooperation, markedly related to the age of patients esulting in incomplete data sets. This paper presents, a nonlinear model built using Multivariate adaptive regression splines and Random forest regression model to predict the missing spirometric features. Random forest based feature selection is used to enhance both the generalization capability and the model interpretability. In the present study, flow-volume data are recorded for N= 198 subjects. The ranked order of feature importance index calculated by the random forests model shows that the spirometric features FVC, FEF 25, PEF,FEF 25-75, FEF50, and the demographic parameter height are the important descriptors. A comparison of performance assessment of both models prove that, the prediction ability of MARS with the `top two ranked features namely the FVC and FEF 25 is higher, yielding a model fit of R2= 0.96 and R2= 0.99 for normal and abnormal subjects. The Root Mean Square Error analysis of the RF model and the MARS model also shows that the latter is capable of predicting the missing values of FEV1 with a notably lower error value of 0.0191 (normal subjects) and 0.0106 (abnormal subjects). It is concluded that combining feature selection with a prediction model provides a minimum subset of predominant features to train the model, yielding better prediction performance. This analysis can assist clinicians with a intelligence support system in the medical diagnosis and improvement of clinical care.

Keywords: random forest, FEV, multivariate adaptive regression splines pulmonary function test

Procedia PDF Downloads 178
20 Determining Optimal Number of Trees in Random Forests

Authors: Songul Cinaroglu

Abstract:

Background: Random Forest is an efficient, multi-class machine learning method using for classification, regression and other tasks. This method is operating by constructing each tree using different bootstrap sample of the data. Determining the number of trees in random forests is an open question in the literature for studies about improving classification performance of random forests. Aim: The aim of this study is to analyze whether there is an optimal number of trees in Random Forests and how performance of Random Forests differ according to increase in number of trees using sample health data sets in R programme. Method: In this study we analyzed the performance of Random Forests as the number of trees grows and doubling the number of trees at every iteration using “random forest” package in R programme. For determining minimum and optimal number of trees we performed Mc Nemar test and Area Under ROC Curve respectively. Results: At the end of the analysis it was found that as the number of trees grows, it does not always means that the performance of the forest is better than forests which have fever trees. In other words larger number of trees only increases computational costs but not increases performance results. Conclusion: Despite general practice in using random forests is to generate large number of trees for having high performance results, this study shows that increasing number of trees doesn’t always improves performance. Future studies can compare different kinds of data sets and different performance measures to test whether Random Forest performance results change as number of trees increase or not.

Keywords: Classification Methods, Decision trees, random forest, number of trees

Procedia PDF Downloads 240
19 Towards Integrating Statistical Color Features for Human Skin Detection

Authors: Mohd Aizaini Maarof, Mohd Zamri Osman, Mohd Foad Rohani

Abstract:

Human skin detection recognized as the primary step in most of the applications such as face detection, illicit image filtering, hand recognition and video surveillance. The performance of any skin detection applications greatly relies on the two components: feature extraction and classification method. Skin color is the most vital information used for skin detection purpose. However, color feature alone sometimes could not handle images with having same color distribution with skin color. A color feature of pixel-based does not eliminate the skin-like color due to the intensity of skin and skin-like color fall under the same distribution. Hence, the statistical color analysis will be exploited such mean and standard deviation as an additional feature to increase the reliability of skin detector. In this paper, we studied the effectiveness of statistical color feature for human skin detection. Furthermore, the paper analyzed the integrated color and texture using eight classifiers with three color spaces of RGB, YCbCr, and HSV. The experimental results show that the integrating statistical feature using Random Forest classifier achieved a significant performance with an F1-score 0.969.

Keywords: Neural Network, random forest, skin detection, color space, statistical feature

Procedia PDF Downloads 249
18 Land Cover Classification Using Sentinel-2 Image Data and Random Forest Algorithm

Authors: Thanh Noi Phan, Martin Kappas, Jan Degener

Abstract:

The currently launched Sentinel 2 (S2) satellite (June, 2015) bring a great potential and opportunities for land use/cover map applications, due to its fine spatial resolution multispectral as well as high temporal resolutions. So far, there are handful studies using S2 real data for land cover classification. Especially in northern Vietnam, to our best knowledge, there exist no studies using S2 data for land cover map application. The aim of this study is to provide the preliminary result of land cover classification using Sentinel -2 data with a rising state – of – art classifier, Random Forest. A case study with heterogeneous land use/cover in the eastern of Hanoi Capital – Vietnam was chosen for this study. All 10 spectral bands of 10 and 20 m pixel size of S2 images were used, the 10 m bands were resampled to 20 m. Among several classified algorithms, supervised Random Forest classifier (RF) was applied because it was reported as one of the most accuracy methods of satellite image classification. The results showed that the red-edge and shortwave infrared (SWIR) bands play an important role in land cover classified results. A very high overall accuracy above 90% of classification results was achieved.

Keywords: classification, Land Cover, random forest, Vietnam, classify algorithm, sentinel 2

Procedia PDF Downloads 192
17 Intrusion Detection in Cloud Computing Using Machine Learning

Authors: Faiza Babur Khan, Sohail Asghar

Abstract:

With an emergence of distributed environment, cloud computing is proving to be the most stimulating computing paradigm shift in computer technology, resulting in spectacular expansion in IT industry. Many companies have augmented their technical infrastructure by adopting cloud resource sharing architecture. Cloud computing has opened doors to unlimited opportunities from application to platform availability, expandable storage and provision of computing environment. However, from a security viewpoint, an added risk level is introduced from clouds, weakening the protection mechanisms, and hardening the availability of privacy, data security and on demand service. Issues of trust, confidentiality, and integrity are elevated due to multitenant resource sharing architecture of cloud. Trust or reliability of cloud refers to its capability of providing the needed services precisely and unfailingly. Confidentiality is the ability of the architecture to ensure authorization of the relevant party to access its private data. It also guarantees integrity to protect the data from being fabricated by an unauthorized user. So in order to assure provision of secured cloud, a roadmap or model is obligatory to analyze a security problem, design mitigation strategies, and evaluate solutions. The aim of the paper is twofold; first to enlighten the factors which make cloud security critical along with alleviation strategies and secondly to propose an intrusion detection model that identifies the attackers in a preventive way using machine learning Random Forest classifier with an accuracy of 99.8%. This model uses less number of features. A comparison with other classifiers is also presented.

Keywords: Machine Learning, Cloud Security, classification, Threats, random forest

Procedia PDF Downloads 213
16 A Quantitative Structure-Adsorption Study on Novel and Emerging Adsorbent Materials

Authors: Marc Sader, Michiel Stock, Bernard De Baets

Abstract:

Considering a large amount of adsorption data of adsorbate gases on adsorbent materials in literature, it is interesting to predict such adsorption data without experimentation. A quantitative structure-activity relationship (QSAR) is developed to correlate molecular characteristics of gases and existing knowledge of materials with their respective adsorption properties. The application of Random Forest, a machine learning method, on a set of adsorption isotherms at a wide range of partial pressures and concentrations is studied. The predicted adsorption isotherms are fitted to several adsorption equations to estimate the adsorption properties. To impute the adsorption properties of desired gases on desired materials, leave-one-out cross-validation is employed. Extensive experimental results for a range of settings are reported.

Keywords: Adsorption, predictive modeling, random forest, QSAR

Procedia PDF Downloads 93
15 [Keynote Speech]: Feature Selection and Predictive Modeling of Housing Data Using Random Forest

Authors: Bharatendra Rai

Abstract:

Predictive data analysis and modeling involving machine learning techniques become challenging in presence of too many explanatory variables or features. Presence of too many features in machine learning is known to not only cause algorithms to slow down, but they can also lead to decrease in model prediction accuracy. This study involves housing dataset with 79 quantitative and qualitative features that describe various aspects people consider while buying a new house. Boruta algorithm that supports feature selection using a wrapper approach build around random forest is used in this study. This feature selection process leads to 49 confirmed features which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios and their impact on model accuracy are captured using coefficient of determination (r-square) and root mean square error (rsme).

Keywords: Feature selection, random forest, root mean square error, housing data, Boruta algorithm

Procedia PDF Downloads 195
14 Evaluation of Random Forest and Support Vector Machine Classification Performance for the Prediction of Early Multiple Sclerosis from Resting State FMRI Connectivity Data

Authors: V. Sacca, A. Sarica, F. Novellino, S. Barone, T. Tallarico, E. Filippelli, A. Granata, P. Valentino, A. Quattrone

Abstract:

The work aim was to evaluate how well Random Forest (RF) and Support Vector Machine (SVM) algorithms could support the early diagnosis of Multiple Sclerosis (MS) from resting-state functional connectivity data. In particular, we wanted to explore the ability in distinguishing between controls and patients of mean signals extracted from ICA components corresponding to 15 well-known networks. Eighteen patients with early-MS (mean-age 37.42±8.11, 9 females) were recruited according to McDonald and Polman, and matched for demographic variables with 19 healthy controls (mean-age 37.55±14.76, 10 females). MRI was acquired by a 3T scanner with 8-channel head coil: (a)whole-brain T1-weighted; (b)conventional T2-weighted; (c)resting-state functional MRI (rsFMRI), 200 volumes. Estimated total lesion load (ml) and number of lesions were calculated using LST-toolbox from the corrected T1 and FLAIR. All rsFMRIs were pre-processed using tools from the FMRIB's Software Library as follows: (1) discarding of the first 5 volumes to remove T1 equilibrium effects, (2) skull-stripping of images, (3) motion and slice-time correction, (4) denoising with high-pass temporal filter (128s), (5) spatial smoothing with a Gaussian kernel of FWHM 8mm. No statistical significant differences (t-test, p < 0.05) were found between the two groups in the mean Euclidian distance and the mean Euler angle. WM and CSF signal together with 6 motion parameters were regressed out from the time series. We applied an independent component analysis (ICA) with the GIFT-toolbox using the Infomax approach with number of components=21. Fifteen mean components were visually identified by two experts. The resulting z-score maps were thresholded and binarized to extract the mean signal of the 15 networks for each subject. Statistical and machine learning analysis were then conducted on this dataset composed of 37 rows (subjects) and 15 features (mean signal in the network) with R language. The dataset was randomly splitted into training (75%) and test sets and two different classifiers were trained: RF and RBF-SVM. We used the intrinsic feature selection of RF, based on the Gini index, and recursive feature elimination (rfe) for the SVM, to obtain a rank of the most predictive variables. Thus, we built two new classifiers only on the most important features and we evaluated the accuracies (with and without feature selection) on test-set. The classifiers, trained on all the features, showed very poor accuracies on training (RF:58.62%, SVM:65.52%) and test sets (RF:62.5%, SVM:50%). Interestingly, when feature selection by RF and rfe-SVM were performed, the most important variable was the sensori-motor network I in both cases. Indeed, with only this network, RF and SVM classifiers reached an accuracy of 87.5% on test-set. More interestingly, the only misclassified patient resulted to have the lowest value of lesion volume. We showed that, with two different classification algorithms and feature selection approaches, the best discriminant network between controls and early MS, was the sensori-motor I. Similar importance values were obtained for the sensori-motor II, cerebellum and working memory networks. These findings, in according to the early manifestation of motor/sensorial deficits in MS, could represent an encouraging step toward the translation to the clinical diagnosis and prognosis.

Keywords: Machine Learning, Multiple Sclerosis, Feature selection, random forest, support vector machine

Procedia PDF Downloads 108
13 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: Particle Swarm Optimization, Sensitivity Analysis, Crop Model, random forest, crop yield prediction, paramater estimation

Procedia PDF Downloads 110
12 Segmentation of Liver Using Random Forest Classifier

Authors: Gajendra Kumar Mourya, Dinesh Bhatia, Akash Handique, Sunita Warjri, Syed Achaab Amir

Abstract:

Nowadays, Medical imaging has become an integral part of modern healthcare. Abdominal CT images are an invaluable mean for abdominal organ investigation and have been widely studied in the recent years. Diagnosis of liver pathologies is one of the major areas of current interests in the field of medical image processing and is still an open problem. To deeply study and diagnose the liver, segmentation of liver is done to identify which part of the liver is mostly affected. Manual segmentation of the liver in CT images is time-consuming and suffers from inter- and intra-observer differences. However, automatic or semi-automatic computer aided segmentation of the Liver is a challenging task due to inter-patient Liver shape and size variability. In this paper, we present a technique for automatic segmenting the liver from CT images using Random Forest Classifier. Random forests or random decision forests are an ensemble learning method for classification that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes of the individual trees. After comparing with various other techniques, it was found that Random Forest Classifier provide a better segmentation results with respect to accuracy and speed. We have done the validation of our results using various techniques and it shows above 89% accuracy in all the cases.

Keywords: Segmentation, random forest, CT images, image validation

Procedia PDF Downloads 155
11 Identifying Promoters and Their Types Based on a Two-Layer Approach

Authors: Bin Liu

Abstract:

Prokaryotic promoter, consisted of two short DNA sequences located at in -35 and -10 positions, is responsible for controlling the initiation and expression of gene expression. Different types of promoters have different functions, and their consensus sequences are similar. In addition, their consensus sequences may be different for the same type of promoter, which poses difficulties for promoter identification. Unfortunately, all existing computational methods treat promoter identification as a binary classification task and can only identify whether a query sequence belongs to a specific promoter type. It is desired to develop computational methods for effectively identifying promoters and their types. Here, a two-layer predictor is proposed to try to deal with the problem. The first layer is designed to predict whether a given sequence is a promoter and the second layer predicts the type of promoter that is judged as a promoter. Meanwhile, we also analyze the importance of feature and sequence conversation in two aspects: promoter identification and promoter type identification. To the best knowledge of ours, it is the first computational predictor to detect promoters and their types.

Keywords: random forest, promoter, promoter type, sequence information

Procedia PDF Downloads 70
10 Stock Price Prediction with 'Earnings' Conference Call Sentiment

Authors: Sungzoon Cho, Hye Jin Lee, Sungwhan Jeon, Dongyoung Min, Sungwon Lyu

Abstract:

Major public corporations worldwide use conference calls to report their quarterly earnings. These 'earnings' conference calls allow for questions from stock analysts. We investigated if it is possible to identify sentiment from the call script and use it to predict stock price movement. We analyzed call scripts from six companies, two each from Korea, China and Indonesia during six years 2011Q1 – 2017Q2. Random forest with Frequency-based sentiment scores using Loughran MacDonald Dictionary did better than control model with only financial indicators. When the stock prices went up 20 days from earnings release, our model predicted correctly 77% of time. When the model predicted 'up,' actual stock prices went up 65% of time. This preliminary result encourages us to investigate advanced sentiment scoring methodologies such as topic modeling, auto-encoder, and word2vec variants.

Keywords: sentiment analysis, random forest, earnings call script, stock price prediction

Procedia PDF Downloads 57
9 Monitoring Future Climate Changes Pattern over Major Cities in Ghana Using Coupled Modeled Intercomparison Project Phase 5, Support Vector Machine, and Random Forest Modeling

Authors: Stephen Dankwa, Zheng Wenfeng, Xiaolu Li

Abstract:

Climate change is recently gaining the attention of many countries across the world. Climate change, which is also known as global warming, referring to the increasing in average surface temperature has been a concern to the Environmental Protection Agency of Ghana. Recently, Ghana has become vulnerable to the effect of the climate change as a result of the dependence of the majority of the population on agriculture. The clearing down of trees to grow crops and burning of charcoal in the country has been a contributing factor to the rise in temperature nowadays in the country as a result of releasing of carbon dioxide and greenhouse gases into the air. Recently, petroleum stations across the cities have been on fire due to this climate changes and which have position Ghana in a way not able to withstand this climate event. As a result, the significant of this research paper is to project how the rise in the average surface temperature will be like at the end of the mid-21st century when agriculture and deforestation are allowed to continue for some time in the country. This study uses the Coupled Modeled Intercomparison Project phase 5 (CMIP5) experiment RCP 8.5 model output data to monitor the future climate changes from 2041-2050, at the end of the mid-21st century over the ten (10) major cities (Accra, Bolgatanga, Cape Coast, Koforidua, Kumasi, Sekondi-Takoradi, Sunyani, Ho, Tamale, Wa) in Ghana. In the models, Support Vector Machine and Random forest, where the cities as a function of heat wave metrics (minimum temperature, maximum temperature, mean temperature, heat wave duration and number of heat waves) assisted to provide more than 50% accuracy to predict and monitor the pattern of the surface air temperature. The findings identified were that the near-surface air temperature will rise between 1°C-2°C (degrees Celsius) over the coastal cities (Accra, Cape Coast, Sekondi-Takoradi). The temperature over Kumasi, Ho and Sunyani by the end of 2050 will rise by 1°C. In Koforidua, it will rise between 1°C-2°C. The temperature will rise in Bolgatanga, Tamale and Wa by 0.5°C by 2050. This indicates how the coastal and the southern part of the country are becoming hotter compared with the north, even though the northern part is the hottest. During heat waves from 2041-2050, Bolgatanga, Tamale, and Wa will experience the highest mean daily air temperature between 34°C-36°C. Kumasi, Koforidua, and Sunyani will experience about 34°C. The coastal cities (Accra, Cape Coast, Sekondi-Takoradi) will experience below 32°C. Even though, the coastal cities will experience the lowest mean temperature, they will have the highest number of heat waves about 62. Majority of the heat waves will last between 2 to 10 days with the maximum 30 days. The surface temperature will continue to rise by the end of the mid-21st century (2041-2050) over the major cities in Ghana and so needs to be addressed to the Environmental Protection Agency in Ghana in order to mitigate this problem.

Keywords: Climate Changes, random forest, SVM, Ghana, heat waves, CMIP5

Procedia PDF Downloads 86
8 Object-Based Image Analysis for Gully-Affected Area Detection in the Hilly Loess Plateau Region of China Using Unmanned Aerial Vehicle

Authors: Kai Liu, Hu Ding, Guoan Tang

Abstract:

The Chinese Loess Plateau suffers from serious gully erosion induced by natural and human causes. Gully features detection including gully-affected area and its two dimension parameters (length, width, area et al.), is a significant task not only for researchers but also for policy-makers. This study aims at gully-affected area detection in three catchments of Chinese Loess Plateau, which were selected in Changwu, Ansai, and Suide by using unmanned aerial vehicle (UAV). The methodology includes a sequence of UAV data generation, image segmentation, feature calculation and selection, and random forest classification. Two experiments were conducted to investigate the influences of segmentation strategy and feature selection. Results showed that vertical and horizontal root-mean-square errors were below 0.5 and 0.2 m, respectively, which were ideal for the Loess Plateau region. The segmentation strategy adopted in this paper, which considers the topographic information, and optimal parameter combination can improve the segmentation results. Besides, the overall extraction accuracy in Changwu, Ansai, and Suide achieved was 84.62%, 86.46%, and 93.06%, respectively, which indicated that the proposed method for detecting gully-affected area is more objective and effective than traditional methods. This study demonstrated that UAV can bridge the gap between field measurement and satellite-based remote sensing, obtaining a balance in resolution and efficiency for catchment-scale gully erosion research.

Keywords: random forest, unmanned aerial vehicle (UAV), object-analysis image analysis, gully erosion, gully-affected area, Loess Plateau

Procedia PDF Downloads 97
7 The Employment of Unmanned Aircraft Systems for Identification and Classification of Helicopter Landing Zones and Airdrop Zones in Calamity Situations

Authors: Marielcio Lacerda, Angelo Paulino, Elcio Shiguemori, Alvaro Damiao, Lamartine Guimaraes, Camila Anjos

Abstract:

Accurate information about the terrain is extremely important in disaster management activities or conflict. This paper proposes the use of the Unmanned Aircraft Systems (UAS) at the identification of Airdrop Zones (AZs) and Helicopter Landing Zones (HLZs). In this paper we consider the AZs the zones where troops or supplies are dropped by parachute, and HLZs areas where victims can be rescued. The use of digital image processing enables the automatic generation of an orthorectified mosaic and an actual Digital Surface Model (DSM). This methodology allows obtaining this fundamental information to the terrain’s comprehension post-disaster in a short amount of time and with good accuracy. In order to get the identification and classification of AZs and HLZs images from DJI drone, model Phantom 4 have been used. The images were obtained with the knowledge and authorization of the responsible sectors and were duly registered in the control agencies. The flight was performed on May 24, 2017, and approximately 1,300 images were obtained during approximately 1 hour of flight. Afterward, new attributes were generated by Feature Extraction (FE) from the original images. The use of multispectral images and complementary attributes generated independently from them increases the accuracy of classification. The attributes of this work include the Declivity Map and Principal Component Analysis (PCA). For the classification four distinct classes were considered: HLZ 1 – small size (18m x 18m); HLZ 2 – medium size (23m x 23m); HLZ 3 – large size (28m x 28m); AZ (100m x 100m). The Decision Tree method Random Forest (RF) was used in this work. RF is a classification method that uses a large collection of de-correlated decision trees. Different random sets of samples are used as sampled objects. The results of classification from each tree and for each object is called a class vote. The resulting classification is decided by a majority of class votes. In this case, we used 200 trees for the execution of RF in the software WEKA 3.8. The classification result was visualized on QGIS Desktop 2.12.3. Through the methodology used, it was possible to classify in the study area: 6 areas as HLZ 1, 6 areas as HLZ 2, 4 areas as HLZ 3; and 2 areas as AZ. It should be noted that an area classified as AZ covers the classifications of the other classes, and may be used as AZ, HLZ of large size (HLZ3), medium size (HLZ2) and small size helicopters (HLZ1). Likewise, an area classified as HLZ for large rotary wing aircraft (HLZ3) covers the smaller area classifications, and so on. It was concluded that images obtained through small UAV are of great use in calamity situations since they can provide data with high accuracy, with low cost, low risk and ease and agility in obtaining aerial photographs. This allows the generation, in a short time, of information about the features of the terrain in order to serve as an important decision support tool.

Keywords: Disaster Management, unmanned aircraft systems, random forest, helicopter landing zones, airdrop zones

Procedia PDF Downloads 46
6 Classification for Obstructive Sleep Apnea Syndrome Based on Random Forest

Authors: Cheng-Yu Tsai, Wen-Te Liu, Shin-Mei Hsu, Yin-Tzu Lin, Chi Wu

Abstract:

Background: Obstructive Sleep apnea syndrome (OSAS) is a common respiratory disorder during sleep. In addition, Body parameters were identified high predictive importance for OSAS severity. However, the effects of body parameters on OSAS severity remain unclear. Objective: In this study, the objective is to establish a prediction model for OSAS by using body parameters and investigate the effects of body parameters in OSAS. Methodologies: Severity was quantified as the polysomnography and the mean hourly number of greater than 3% dips in oxygen saturation during examination in a hospital in New Taipei City (Taiwan). Four levels of OSAS severity were classified by the apnea and hypopnea index (AHI) with American Academy of Sleep Medicine (AASM) guideline. Body parameters, including neck circumference, waist size, and body mass index (BMI) were obtained from questionnaire. Next, dividing the collecting subjects into two groups: training and testing groups. The training group was used to establish the random forest (RF) to predicting, and test group was used to evaluated the accuracy of classification. Results: There were 3330 subjects recruited in this study, whom had been done polysomnography for evaluating severity for OSAS. A RF of 1000 trees achieved correctly classified 79.94 % of test cases. When further evaluated on the test cohort, RF showed the waist and BMI as the high import factors in OSAS. Conclusion It is possible to provide patient with prescreening by body parameters which can pre-evaluate the health risks.

Keywords: random forest, obstructive sleep apnea syndrome, apnea and hypopnea index, Body parameters

Procedia PDF Downloads 14
5 Relevant LMA Features for Human Motion Recognition

Authors: Insaf Ajili, Malik Mallem, Jean-Yves Didier

Abstract:

Motion recognition from videos is actually a very complex task due to the high variability of motions. This paper describes the challenges of human motion recognition, especially motion representation step with relevant features. Our descriptor vector is inspired from Laban Movement Analysis method. We propose discriminative features using the Random Forest algorithm in order to remove redundant features and make learning algorithms operate faster and more effectively. We validate our method on MSRC-12 and UTKinect datasets.

Keywords: random forest, features reduction, human motion recognition, discriminative LMA features

Procedia PDF Downloads 30
4 Examination of Public Hospital Unions Technical Efficiencies Using Data Envelopment Analysis and Machine Learning Techniques

Authors: Songul Cinaroglu

Abstract:

Regional planning in health has gained speed for developing countries in recent years. In Turkey, 89 different Public Hospital Unions (PHUs) were conducted based on provincial levels. In this study technical efficiencies of 89 PHUs were examined by using Data Envelopment Analysis (DEA) and machine learning techniques by dividing them into two clusters in terms of similarities of input and output indicators. Number of beds, physicians and nurses determined as input variables and number of outpatients, inpatients and surgical operations determined as output indicators. Before performing DEA, PHUs were grouped into two clusters. It is seen that the first cluster represents PHUs which have higher population, demand and service density than the others. The difference between clusters was statistically significant in terms of all study variables (p ˂ 0.001). After clustering, DEA was performed for general and for two clusters separately. It was found that 11% of PHUs were efficient in general, additionally 21% and 17% of them were efficient for the first and second clusters respectively. It is seen that PHUs, which are representing urban parts of the country and have higher population and service density, are more efficient than others. Random forest decision tree graph shows that number of inpatients is a determinative factor of efficiency of PHUs, which is a measure of service density. It is advisable for public health policy makers to use statistical learning methods in resource planning decisions to improve efficiency in health care.

Keywords: Data Envelopment Analysis, Efficiency, random forest, public hospital unions

Procedia PDF Downloads 21
3 Prediction Modeling of Alzheimer’s Disease and Its Prodromal Stages from Multimodal Data with Missing Values

Authors: M. Aghili, S. Tabarestani, C. Freytes, M. Shojaie, M. Cabrerizo, A. Barreto, N. Rishe, R. E. Curiel, D. Loewenstein, R. Duara, M. Adjouadi

Abstract:

A major challenge in medical studies, especially those that are longitudinal, is the problem of missing measurements which hinders the effective application of many machine learning algorithms. Furthermore, recent Alzheimer's Disease studies have focused on the delineation of Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI) from cognitively normal controls (CN) which is essential for developing effective and early treatment methods. To address the aforementioned challenges, this paper explores the potential of using the eXtreme Gradient Boosting (XGBoost) algorithm in handling missing values in multiclass classification. We seek a generalized classification scheme where all prodromal stages of the disease are considered simultaneously in the classification and decision-making processes. Given the large number of subjects (1631) included in this study and in the presence of almost 28% missing values, we investigated the performance of XGBoost on the classification of the four classes of AD, NC, EMCI, and LMCI. Using 10-fold cross validation technique, XGBoost is shown to outperform other state-of-the-art classification algorithms by 3% in terms of accuracy and F-score. Our model achieved an accuracy of 80.52%, a precision of 80.62% and recall of 80.51%, supporting the more natural and promising multiclass classification.

Keywords: Alzheimer Disease, Multiclass Classification, random forest, support vector machine, missing data, eXtreme gradient boosting, early mild cognitive impairment, late mild cognitive impair, ADNI

Procedia PDF Downloads 12
2 Machine Learning Techniques for Estimating Ground Motion Parameters

Authors: Farid Khosravikia, Patricia Clayton

Abstract:

The main objective of this study is to evaluate the advantages and disadvantages of various machine learning techniques in forecasting ground-motion intensity measures given source characteristics, source-to-site distance, and local site condition. Intensity measures such as peak ground acceleration and velocity (PGA and PGV, respectively) as well as 5% damped elastic pseudospectral accelerations at different periods (PSA), are indicators of the strength of shaking at the ground surface. Estimating these variables for future earthquake events is a key step in seismic hazard assessment and potentially subsequent risk assessment of different types of structures. Typically, linear regression-based models, with pre-defined equations and coefficients, are used in ground motion prediction. However, due to the restrictions of the linear regression methods, such models may not capture more complex nonlinear behaviors that exist in the data. Thus, this study comparatively investigates potential benefits from employing other machine learning techniques as a statistical method in ground motion prediction such as Artificial Neural Network, Random Forest, and Support Vector Machine. The algorithms are adjusted to quantify event-to-event and site-to-site variability of the ground motions by implementing them as random effects in the proposed models to reduce the aleatory uncertainty. All the algorithms are trained using a selected database of 4,528 ground-motions, including 376 seismic events with magnitude 3 to 5.8, recorded over the hypocentral distance range of 4 to 500 km in Oklahoma, Kansas, and Texas since 2005. The main reason of the considered database stems from the recent increase in the seismicity rate of these states attributed to petroleum production and wastewater disposal activities, which necessities further investigation in the ground motion models developed for these states. Accuracy of the models in predicting intensity measures, generalization capability of the models for future data, as well as usability of the models are discussed in the evaluation process. The results indicate the algorithms satisfy some physically sound characteristics such as magnitude scaling distance dependency without requiring pre-defined equations or coefficients. Moreover, it is shown that, when sufficient data is available, all the alternative algorithms tend to provide more accurate estimates compared to the conventional linear regression-based method, and particularly, Random Forest outperforms the other algorithms. However, the conventional method is a better tool when limited data is available.

Keywords: Machine Learning, Artificial Neural Network, random forest, support vector machine, ground-motion models

Procedia PDF Downloads 2
1 Machine Learning Techniques in Seismic Risk Assessment of Structures

Authors: Farid Khosravikia, Patricia Clayton

Abstract:

The main objective of this work is to evaluate the advantages and disadvantages of various machine learning techniques in two key steps of seismic hazard and risk assessment of different types of structures. The first step is the development of ground-motion models, which are used for forecasting ground-motion intensity measures (IM) given source characteristics, source-to-site distance, and local site condition for future events. IMs such as peak ground acceleration and velocity (PGA and PGV, respectively) as well as 5% damped elastic pseudospectral accelerations at different periods (PSA), are indicators of the strength of shaking at the ground surface. Typically, linear regression-based models, with pre-defined equations and coefficients, are used in ground motion prediction. However, due to the restrictions of the linear regression methods, such models may not capture more complex nonlinear behaviors that exist in the data. Thus, this study comparatively investigates potential benefits from employing other machine learning techniques as statistical method in ground motion prediction such as Artificial Neural Network, Random Forest, and Support Vector Machine. The results indicate the algorithms satisfy some physically sound characteristics such as magnitude scaling distance dependency without requiring pre-defined equations or coefficients. Moreover, it is shown that, when sufficient data is available, all the alternative algorithms tend to provide more accurate estimates compared to the conventional linear regression-based method, and particularly, Random Forest outperforms the other algorithms. However, the conventional method is a better tool when limited data is available. Second, it is investigated how machine learning techniques could be beneficial for developing probabilistic seismic demand models (PSDMs), which provide the relationship between the structural demand responses (e.g., component deformations, accelerations, internal forces, etc.) and the ground motion IMs. In the risk framework, such models are used to develop fragility curves estimating exceeding probability of damage for pre-defined limit states, and therefore, control the reliability of the predictions in the risk assessment. In this study, machine learning algorithms like artificial neural network, random forest, and support vector machine are adopted and trained on the demand parameters to derive PSDMs for them. It is observed that such models can provide more accurate estimates of prediction in relatively shorter about of time compared to conventional methods. Moreover, they can be used for sensitivity analysis of fragility curves with respect to many modeling parameters without necessarily requiring more intense numerical response-history analysis.

Keywords: Machine Learning, Seismic Hazard Analysis, Seismic Risk Analysis, Artificial Neural Network, random forest, support vector machine

Procedia PDF Downloads 2