Search results for: random forest
2791 Random Forest Classification for Population Segmentation
Authors: Regina Chua
Abstract:
To reduce the costs of re-fielding a large survey, a Random Forest classifier was applied to measure the accuracy of classifying individuals into their assigned segments with the fewest possible questions. Given a long survey, one needed to determine the most predictive ten or fewer questions that would accurately assign new individuals to custom segments. Furthermore, the solution needed to be quick in its classification and usable in non-Python environments. In this paper, a supervised Random Forest classifier was modeled on a dataset with 7,000 individuals, 60 questions, and 254 features. The Random Forest consisted of an iterative collection of individual decision trees that result in a predicted segment with robust precision and recall scores compared to a single tree. A random 70-30 stratified sampling for training the algorithm was used, and accuracy trade-offs at different depths for each segment were identified. Ultimately, the Random Forest classifier performed at 87% accuracy at a depth of 10 with 20 instead of 254 features and 10 instead of 60 questions. With an acceptable accuracy in prioritizing feature selection, new tools were developed for non-Python environments: a worksheet with a formulaic version of the algorithm and an embedded function to predict the segment of an individual in real-time. Random Forest was determined to be an optimal classification model by its feature selection, performance, processing speed, and flexible application in other environments.Keywords: machine learning, supervised learning, data science, random forest, classification, prediction, predictive modeling
Procedia PDF Downloads 912790 Determining Optimal Number of Trees in Random Forests
Authors: Songul Cinaroglu
Abstract:
Background: Random Forest is an efficient, multi-class machine learning method using for classification, regression and other tasks. This method is operating by constructing each tree using different bootstrap sample of the data. Determining the number of trees in random forests is an open question in the literature for studies about improving classification performance of random forests. Aim: The aim of this study is to analyze whether there is an optimal number of trees in Random Forests and how performance of Random Forests differ according to increase in number of trees using sample health data sets in R programme. Method: In this study we analyzed the performance of Random Forests as the number of trees grows and doubling the number of trees at every iteration using “random forest” package in R programme. For determining minimum and optimal number of trees we performed Mc Nemar test and Area Under ROC Curve respectively. Results: At the end of the analysis it was found that as the number of trees grows, it does not always means that the performance of the forest is better than forests which have fever trees. In other words larger number of trees only increases computational costs but not increases performance results. Conclusion: Despite general practice in using random forests is to generate large number of trees for having high performance results, this study shows that increasing number of trees doesn’t always improves performance. Future studies can compare different kinds of data sets and different performance measures to test whether Random Forest performance results change as number of trees increase or not.Keywords: classification methods, decision trees, number of trees, random forest
Procedia PDF Downloads 3942789 [Keynote Speech]: Feature Selection and Predictive Modeling of Housing Data Using Random Forest
Authors: Bharatendra Rai
Abstract:
Predictive data analysis and modeling involving machine learning techniques become challenging in presence of too many explanatory variables or features. Presence of too many features in machine learning is known to not only cause algorithms to slow down, but they can also lead to decrease in model prediction accuracy. This study involves housing dataset with 79 quantitative and qualitative features that describe various aspects people consider while buying a new house. Boruta algorithm that supports feature selection using a wrapper approach build around random forest is used in this study. This feature selection process leads to 49 confirmed features which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios and their impact on model accuracy are captured using coefficient of determination (r-square) and root mean square error (rsme).Keywords: housing data, feature selection, random forest, Boruta algorithm, root mean square error
Procedia PDF Downloads 3212788 Predictive Modeling of Bridge Conditions Using Random Forest
Authors: Miral Selim, May Haggag, Ibrahim Abotaleb
Abstract:
The aging of transportation infrastructure presents significant challenges, particularly concerning the monitoring and maintenance of bridges. This study investigates the application of Random Forest algorithms for predictive modeling of bridge conditions, utilizing data from the US National Bridge Inventory (NBI). The research is significant as it aims to improve bridge management through data-driven insights that can enhance maintenance strategies and contribute to overall safety. Random Forest is chosen for its robustness, ability to handle complex, non-linear relationships among variables, and its effectiveness in feature importance evaluation. The study begins with comprehensive data collection and cleaning, followed by the identification of key variables influencing bridge condition ratings, including age, construction materials, environmental factors, and maintenance history. Random Forest is utilized to examine the relationships between these variables and the predicted bridge conditions. The dataset is divided into training and testing subsets to evaluate the model's performance. The findings demonstrate that the Random Forest model effectively enhances the understanding of factors affecting bridge conditions. By identifying bridges at greater risk of deterioration, the model facilitates proactive maintenance strategies, which can help avoid costly repairs and minimize service disruptions. Additionally, this research underscores the value of data-driven decision-making, enabling better resource allocation to prioritize maintenance efforts where they are most necessary. In summary, this study highlights the efficiency and applicability of Random Forest in predictive modeling for bridge management. Ultimately, these findings pave the way for more resilient and proactive management of bridge systems, ensuring their longevity and reliability for future use.Keywords: data analysis, random forest, predictive modeling, bridge management
Procedia PDF Downloads 202787 Forecasting the Fluctuation of Currency Exchange Rate Using Random Forest
Authors: Lule Basha, Eralda Gjika
Abstract:
The exchange rate is one of the most important economic variables, especially for a small, open economy such as Albania. Its effect is noticeable in one country's competitiveness, trade and current account, inflation, wages, domestic economic activity, and bank stability. This study investigates the fluctuation of Albania’s exchange rates using monthly average foreign currency, Euro (Eur) to Albanian Lek (ALL) exchange rate with a time span from January 2008 to June 2021, and the macroeconomic factors that have a significant effect on the exchange rate. Initially, the Random Forest Regression algorithm is constructed to understand the impact of economic variables on the behavior of monthly average foreign currencies exchange rates. Then the forecast of macro-economic indicators for 12 months was performed using time series models. The predicted values received are placed in the random forest model in order to obtain the average monthly forecast of the Euro to Albanian Lek (ALL) exchange rate for the period July 2021 to June 2022.Keywords: exchange rate, random forest, time series, machine learning, prediction
Procedia PDF Downloads 992786 Segmentation of Liver Using Random Forest Classifier
Authors: Gajendra Kumar Mourya, Dinesh Bhatia, Akash Handique, Sunita Warjri, Syed Achaab Amir
Abstract:
Nowadays, Medical imaging has become an integral part of modern healthcare. Abdominal CT images are an invaluable mean for abdominal organ investigation and have been widely studied in the recent years. Diagnosis of liver pathologies is one of the major areas of current interests in the field of medical image processing and is still an open problem. To deeply study and diagnose the liver, segmentation of liver is done to identify which part of the liver is mostly affected. Manual segmentation of the liver in CT images is time-consuming and suffers from inter- and intra-observer differences. However, automatic or semi-automatic computer aided segmentation of the Liver is a challenging task due to inter-patient Liver shape and size variability. In this paper, we present a technique for automatic segmenting the liver from CT images using Random Forest Classifier. Random forests or random decision forests are an ensemble learning method for classification that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes of the individual trees. After comparing with various other techniques, it was found that Random Forest Classifier provide a better segmentation results with respect to accuracy and speed. We have done the validation of our results using various techniques and it shows above 89% accuracy in all the cases.Keywords: CT images, image validation, random forest, segmentation
Procedia PDF Downloads 3112785 Land Cover Classification Using Sentinel-2 Image Data and Random Forest Algorithm
Authors: Thanh Noi Phan, Martin Kappas, Jan Degener
Abstract:
The currently launched Sentinel 2 (S2) satellite (June, 2015) bring a great potential and opportunities for land use/cover map applications, due to its fine spatial resolution multispectral as well as high temporal resolutions. So far, there are handful studies using S2 real data for land cover classification. Especially in northern Vietnam, to our best knowledge, there exist no studies using S2 data for land cover map application. The aim of this study is to provide the preliminary result of land cover classification using Sentinel -2 data with a rising state – of – art classifier, Random Forest. A case study with heterogeneous land use/cover in the eastern of Hanoi Capital – Vietnam was chosen for this study. All 10 spectral bands of 10 and 20 m pixel size of S2 images were used, the 10 m bands were resampled to 20 m. Among several classified algorithms, supervised Random Forest classifier (RF) was applied because it was reported as one of the most accuracy methods of satellite image classification. The results showed that the red-edge and shortwave infrared (SWIR) bands play an important role in land cover classified results. A very high overall accuracy above 90% of classification results was achieved.Keywords: classify algorithm, classification, land cover, random forest, sentinel 2, Vietnam
Procedia PDF Downloads 3842784 Using Machine Learning to Enhance Win Ratio for College Ice Hockey Teams
Authors: Sadixa Sanjel, Ahmed Sadek, Naseef Mansoor, Zelalem Denekew
Abstract:
Collegiate ice hockey (NCAA) sports analytics is different from the national level hockey (NHL). We apply and compare multiple machine learning models such as Linear Regression, Random Forest, and Neural Networks to predict the win ratio for a team based on their statistics. Data exploration helps determine which statistics are most useful in increasing the win ratio, which would be beneficial to coaches and team managers. We ran experiments to select the best model and chose Random Forest as the best performing. We conclude with how to bridge the gap between the college and national levels of sports analytics and the use of machine learning to enhance team performance despite not having a lot of metrics or budget for automatic tracking.Keywords: NCAA, NHL, sports analytics, random forest, regression, neural networks, game predictions
Procedia PDF Downloads 1132783 Using Combination of Sets of Features of Molecules for Aqueous Solubility Prediction: A Random Forest Model
Authors: Muhammet Baldan, Emel Timuçin
Abstract:
Generally, absorption and bioavailability increase if solubility increases; therefore, it is crucial to predict them in drug discovery applications. Molecular descriptors and Molecular properties are traditionally used for the prediction of water solubility. There are various key descriptors that are used for this purpose, namely Drogan Descriptors, Morgan Descriptors, Maccs keys, etc., and each has different prediction capabilities with differentiating successes between different data sets. Another source for the prediction of solubility is structural features; they are commonly used for the prediction of solubility. However, there are little to no studies that combine three or more properties or descriptors for prediction to produce a more powerful prediction model. Unlike available models, we used a combination of those features in a random forest machine learning model for improved solubility prediction to better predict and, therefore, contribute to drug discovery systems.Keywords: solubility, random forest, molecular descriptors, maccs keys
Procedia PDF Downloads 452782 Optimization of Machine Learning Regression Results: An Application on Health Expenditures
Authors: Songul Cinaroglu
Abstract:
Machine learning regression methods are recommended as an alternative to classical regression methods in the existence of variables which are difficult to model. Data for health expenditure is typically non-normal and have a heavily skewed distribution. This study aims to compare machine learning regression methods by hyperparameter tuning to predict health expenditure per capita. A multiple regression model was conducted and performance results of Lasso Regression, Random Forest Regression and Support Vector Machine Regression recorded when different hyperparameters are assigned. Lambda (λ) value for Lasso Regression, number of trees for Random Forest Regression, epsilon (ε) value for Support Vector Regression was determined as hyperparameters. Study results performed by using 'k' fold cross validation changed from 5 to 50, indicate the difference between machine learning regression results in terms of R², RMSE and MAE values that are statistically significant (p < 0.001). Study results reveal that Random Forest Regression (R² ˃ 0.7500, RMSE ≤ 0.6000 ve MAE ≤ 0.4000) outperforms other machine learning regression methods. It is highly advisable to use machine learning regression methods for modelling health expenditures.Keywords: machine learning, lasso regression, random forest regression, support vector regression, hyperparameter tuning, health expenditure
Procedia PDF Downloads 2232781 Application of Random Forest Model in The Prediction of River Water Quality
Authors: Turuganti Venkateswarlu, Jagadeesh Anmala
Abstract:
Excessive runoffs from various non-point source land uses, and other point sources are rapidly contaminating the water quality of streams in the Upper Green River watershed, Kentucky, USA. It is essential to maintain the stream water quality as the river basin is one of the major freshwater sources in this province. It is also important to understand the water quality parameters (WQPs) quantitatively and qualitatively along with their important features as stream water is sensitive to climatic events and land-use practices. In this paper, a model was developed for predicting one of the significant WQPs, Fecal Coliform (FC) from precipitation, temperature, urban land use factor (ULUF), agricultural land use factor (ALUF), and forest land-use factor (FLUF) using Random Forest (RF) algorithm. The RF model, a novel ensemble learning algorithm, can even find out advanced feature importance characteristics from the given model inputs for different combinations. This model’s outcomes showed a good correlation between FC and climate events and land use factors (R2 = 0.94) and precipitation and temperature are the primary influencing factors for FC.Keywords: water quality, land use factors, random forest, fecal coliform
Procedia PDF Downloads 1942780 Artificial Intelligence-Based Detection of Individuals Suffering from Vestibular Disorder
Authors: Dua Hişam, Serhat İkizoğlu
Abstract:
Identifying the problem behind balance disorder is one of the most interesting topics in the medical literature. This study has considerably enhanced the development of artificial intelligence (AI) algorithms applying multiple machine learning (ML) models to sensory data on gait collected from humans to classify between normal people and those suffering from Vestibular System (VS) problems. Although AI is widely utilized as a diagnostic tool in medicine, AI models have not been used to perform feature extraction and identify VS disorders through training on raw data. In this study, three machine learning (ML) models, the Random Forest Classifier (RF), Extreme Gradient Boosting (XGB), and K-Nearest Neighbor (KNN), have been trained to detect VS disorder, and the performance comparison of the algorithms has been made using accuracy, recall, precision, and f1-score. With an accuracy of 95.28 %, Random Forest Classifier (RF) was the most accurate model.Keywords: vestibular disorder, machine learning, random forest classifier, k-nearest neighbor, extreme gradient boosting
Procedia PDF Downloads 672779 Relevant LMA Features for Human Motion Recognition
Authors: Insaf Ajili, Malik Mallem, Jean-Yves Didier
Abstract:
Motion recognition from videos is actually a very complex task due to the high variability of motions. This paper describes the challenges of human motion recognition, especially motion representation step with relevant features. Our descriptor vector is inspired from Laban Movement Analysis method. We propose discriminative features using the Random Forest algorithm in order to remove redundant features and make learning algorithms operate faster and more effectively. We validate our method on MSRC-12 and UTKinect datasets.Keywords: discriminative LMA features, features reduction, human motion recognition, random forest
Procedia PDF Downloads 1922778 Using Predictive Analytics to Identify First-Year Engineering Students at Risk of Failing
Authors: Beng Yew Low, Cher Liang Cha, Cheng Yong Teoh
Abstract:
Due to a lack of continual assessment or grade related data, identifying first-year engineering students in a polytechnic education at risk of failing is challenging. Our experience over the years tells us that there is no strong correlation between having good entry grades in Mathematics and the Sciences and excelling in hardcore engineering subjects. Hence, identifying students at risk of failure cannot be on the basis of entry grades in Mathematics and the Sciences alone. These factors compound the difficulty of early identification and intervention. This paper describes the development of a predictive analytics model in the early detection of students at risk of failing and evaluates its effectiveness. Data from continual assessments conducted in term one, supplemented by data of student psychological profiles such as interests and study habits, were used. Three classification techniques, namely Logistic Regression, K Nearest Neighbour, and Random Forest, were used in our predictive model. Based on our findings, Random Forest was determined to be the strongest predictor with an Area Under the Curve (AUC) value of 0.994. Correspondingly, the Accuracy, Precision, Recall, and F-Score were also highest among these three classifiers. Using this Random Forest Classification technique, students at risk of failure could be identified at the end of term one. They could then be assigned to a Learning Support Programme at the beginning of term two. This paper gathers the results of our findings. It also proposes further improvements that can be made to the model.Keywords: continual assessment, predictive analytics, random forest, student psychological profile
Procedia PDF Downloads 1322777 Community Forest Management Practice in Nepal: Public Understanding of Forest Benefit
Authors: Chandralal Shrestha
Abstract:
In the developing countries like Nepal, the community based forest management approach has often been glorified as one of the best forest management alternatives to maximize the forest benefits. Though the approach has succeeded to construct a local level institution and conserve the forest biodiversity, how the local communities perceived about the forest benefits, the question always remains silent among the researchers and policy makers. The paper aims to explore the understanding of forest benefits from the perspective of local communities who used the forests in terms of institutional stability, equity and livelihood opportunity, and ecological stability. The paper revealed that the local communities have mixed understanding over the forest benefits. The institutional and ecological activities carried out by the local communities indicated that they have better understanding over the forest benefits. However, inequality while sharing the forest benefits, low pricing strategy and its negative consequences in valuation of forest products and limited livelihood opportunities indicated the poor understanding.Keywords: community based forest management, forest benefits, lowland, Nepal
Procedia PDF Downloads 3102776 Machine Learning-Driven Prediction of Cardiovascular Diseases: A Supervised Approach
Authors: Thota Sai Prakash, B. Yaswanth, Jhade Bhuvaneswar, Marreddy Divakar Reddy, Shyam Ji Gupta
Abstract:
Across the globe, there are a lot of chronic diseases, and heart disease stands out as one of the most perilous. Sadly, many lives are lost to this condition, even though early intervention could prevent such tragedies. However, identifying heart disease in its initial stages is not easy. To address this challenge, we propose an automated system aimed at predicting the presence of heart disease using advanced techniques. By doing so, we hope to empower individuals with the knowledge needed to take proactive measures against this potentially fatal illness. Our approach towards this problem involves meticulous data preprocessing and the development of predictive models utilizing classification algorithms such as Support Vector Machines (SVM), Decision Tree, and Random Forest. We assess the efficiency of every model based on metrics like accuracy, ensuring that we select the most reliable option. Additionally, we conduct thorough data analysis to reveal the importance of different attributes. Among the models considered, Random Forest emerges as the standout performer with an accuracy rate of 96.04% in our study.Keywords: support vector machines, decision tree, random forest
Procedia PDF Downloads 372775 Application of Machine Learning on Google Earth Engine for Forest Fire Severity, Burned Area Mapping and Land Surface Temperature Analysis: Rajasthan, India
Authors: Alisha Sinha, Laxmi Kant Sharma
Abstract:
Forest fires are a recurring issue in many parts of the world, including India. These fires can have various causes, including human activities (such as agricultural burning, campfires, or discarded cigarettes) and natural factors (such as lightning). This study presents a comprehensive and advanced methodology for assessing wildfire susceptibility by integrating diverse environmental variables and leveraging cutting-edge machine learning techniques across Rajasthan, India. The primary goal of the study is to utilize Google Earth Engine to compare locations in Sariska National Park, Rajasthan (India), before and after forest fires. High-resolution satellite data were used to assess the amount and types of changes caused by forest fires. The present study meticulously analyzes various environmental variables, i.e., slope orientation, elevation, normalized difference vegetation index (NDVI), drainage density, precipitation, and temperature, to understand landscape characteristics and assess wildfire susceptibility. In addition, a sophisticated random forest regression model is used to predict land surface temperature based on a set of environmental parameters.Keywords: wildfire susceptibility mapping, LST, random forest, GEE, MODIS, climatic parameters
Procedia PDF Downloads 182774 Predictive Analysis of Chest X-rays Using NLP and Large Language Models with the Indiana University Dataset and Random Forest Classifier
Authors: Azita Ramezani, Ghazal Mashhadiagha, Bahareh Sanabakhsh
Abstract:
This study researches the combination of Random. Forest classifiers with large language models (LLMs) and natural language processing (NLP) to improve diagnostic accuracy in chest X-ray analysis using the Indiana University dataset. Utilizing advanced NLP techniques, the research preprocesses textual data from radiological reports to extract key features, which are then merged with image-derived data. This improved dataset is analyzed with Random Forest classifiers to predict specific clinical results, focusing on the identification of health issues and the estimation of case urgency. The findings reveal that the combination of NLP, LLMs, and machine learning not only increases diagnostic precision but also reliability, especially in quickly identifying critical conditions. Achieving an accuracy of 99.35%, the model shows significant advancements over conventional diagnostic techniques. The results emphasize the large potential of machine learning in medical imaging, suggesting that these technologies could greatly enhance clinician judgment and patient outcomes by offering quicker and more precise diagnostic approximations.Keywords: natural language processing (NLP), large language models (LLMs), random forest classifier, chest x-ray analysis, medical imaging, diagnostic accuracy, indiana university dataset, machine learning in healthcare, predictive modeling, clinical decision support systems
Procedia PDF Downloads 422773 Manufacturing Anomaly Detection Using a Combination of Gated Recurrent Unit Network and Random Forest Algorithm
Authors: Atinkut Atinafu Yilma, Eyob Messele Sefene
Abstract:
Anomaly detection is one of the essential mechanisms to control and reduce production loss, especially in today's smart manufacturing. Quick anomaly detection aids in reducing the cost of production by minimizing the possibility of producing defective products. However, developing an anomaly detection model that can rapidly detect a production change is challenging. This paper proposes Gated Recurrent Unit (GRU) combined with Random Forest (RF) to detect anomalies in the production process in real-time quickly. The GRU is used as a feature detector, and RF as a classifier using the input features from GRU. The model was tested using various synthesis and real-world datasets against benchmark methods. The results show that the proposed GRU-RF outperforms the benchmark methods with the shortest time taken to detect anomalies in the production process. Based on the investigation from the study, this proposed model can eliminate or reduce unnecessary production costs and bring a competitive advantage to manufacturing industries.Keywords: anomaly detection, multivariate time series data, smart manufacturing, gated recurrent unit network, random forest
Procedia PDF Downloads 1162772 Community Forestry Programme through the Local Forest Users Group, Nepal
Authors: Daniyal Neupane
Abstract:
Establishment of community forestry in Nepal is a successful step in the conservation of forests. Community forestry programme through the local forest users group has shown its positive impacts in the society. This paper discusses an overview of the present scenario of the community forestry in Nepal. It describes the brief historical background, some important forest legislations, and organization of forest. The paper also describes the internal conflicts between forest users and district forest offices, and possible resolution. It also suggests some of the aspects of community forestry in which the research needs to be focused for the better management of the forests in Nepal.Keywords: community forest, conservation of forest, local forest users group, better management, Nepal
Procedia PDF Downloads 3052771 Single Imputation for Audiograms
Authors: Sarah Beaver, Renee Bryce
Abstract:
Audiograms detect hearing impairment, but missing values pose problems. This work explores imputations in an attempt to improve accuracy. This work implements Linear Regression, Lasso, Linear Support Vector Regression, Bayesian Ridge, K Nearest Neighbors (KNN), and Random Forest machine learning techniques to impute audiogram frequencies ranging from 125Hz to 8000Hz. The data contains patients who had or were candidates for cochlear implants. Accuracy is compared across two different Nested Cross-Validation k values. Over 4000 audiograms were used from 800 unique patients. Additionally, training on data combines and compares left and right ear audiograms versus single ear side audiograms. The accuracy achieved using Root Mean Square Error (RMSE) values for the best models for Random Forest ranges from 4.74 to 6.37. The R\textsuperscript{2} values for the best models for Random Forest ranges from .91 to .96. The accuracy achieved using RMSE values for the best models for KNN ranges from 5.00 to 7.72. The R\textsuperscript{2} values for the best models for KNN ranges from .89 to .95. The best imputation models received R\textsuperscript{2} between .89 to .96 and RMSE values less than 8dB. We also show that the accuracy of classification predictive models performed better with our best imputation models versus constant imputations by a two percent increase.Keywords: machine learning, audiograms, data imputations, single imputations
Procedia PDF Downloads 802770 Climate Changes in Albania and Their Effect on Cereal Yield
Authors: Lule Basha, Eralda Gjika
Abstract:
This study is focused on analyzing climate change in Albania and its potential effects on cereal yields. Initially, monthly temperature and rainfalls in Albania were studied for the period 1960-2021. Climacteric variables are important variables when trying to model cereal yield behavior, especially when significant changes in weather conditions are observed. For this purpose, in the second part of the study, linear and nonlinear models explaining cereal yield are constructed for the same period, 1960-2021. The multiple linear regression analysis and lasso regression method are applied to the data between cereal yield and each independent variable: average temperature, average rainfall, fertilizer consumption, arable land, land under cereal production, and nitrous oxide emissions. In our regression model, heteroscedasticity is not observed, data follow a normal distribution, and there is a low correlation between factors, so we do not have the problem of multicollinearity. Machine-learning methods, such as random forest, are used to predict cereal yield responses to climacteric and other variables. Random Forest showed high accuracy compared to the other statistical models in the prediction of cereal yield. We found that changes in average temperature negatively affect cereal yield. The coefficients of fertilizer consumption, arable land, and land under cereal production are positively affecting production. Our results show that the Random Forest method is an effective and versatile machine-learning method for cereal yield prediction compared to the other two methods.Keywords: cereal yield, climate change, machine learning, multiple regression model, random forest
Procedia PDF Downloads 882769 Loan Repayment Prediction Using Machine Learning: Model Development, Django Web Integration and Cloud Deployment
Authors: Seun Mayowa Sunday
Abstract:
Loan prediction is one of the most significant and recognised fields of research in the banking, insurance, and the financial security industries. Some prediction systems on the market include the construction of static software. However, due to the fact that static software only operates with strictly regulated rules, they cannot aid customers beyond these limitations. Application of many machine learning (ML) techniques are required for loan prediction. Four separate machine learning models, random forest (RF), decision tree (DT), k-nearest neighbour (KNN), and logistic regression, are used to create the loan prediction model. Using the anaconda navigator and the required machine learning (ML) libraries, models are created and evaluated using the appropriate measuring metrics. From the finding, the random forest performs with the highest accuracy of 80.17% which was later implemented into the Django framework. For real-time testing, the web application is deployed on the Alibabacloud which is among the top 4 biggest cloud computing provider. Hence, to the best of our knowledge, this research will serve as the first academic paper which combines the model development and the Django framework, with the deployment into the Alibaba cloud computing application.Keywords: k-nearest neighbor, random forest, logistic regression, decision tree, django, cloud computing, alibaba cloud
Procedia PDF Downloads 1322768 Monitoring Large-Coverage Forest Canopy Height by Integrating LiDAR and Sentinel-2 Images
Authors: Xiaobo Liu, Rakesh Mishra, Yun Zhang
Abstract:
Continuous monitoring of forest canopy height with large coverage is essential for obtaining forest carbon stocks and emissions, quantifying biomass estimation, analyzing vegetation coverage, and determining biodiversity. LiDAR can be used to collect accurate woody vegetation structure such as canopy height. However, LiDAR’s coverage is usually limited because of its high cost and limited maneuverability, which constrains its use for dynamic and large area forest canopy monitoring. On the other hand, optical satellite images, like Sentinel-2, have the ability to cover large forest areas with a high repeat rate, but they do not have height information. Hence, exploring the solution of integrating LiDAR data and Sentinel-2 images to enlarge the coverage of forest canopy height prediction and increase the prediction repeat rate has been an active research topic in the environmental remote sensing community. In this study, we explore the potential of training a Random Forest Regression (RFR) model and a Convolutional Neural Network (CNN) model, respectively, to develop two predictive models for predicting and validating the forest canopy height of the Acadia Forest in New Brunswick, Canada, with a 10m ground sampling distance (GSD), for the year 2018 and 2021. Two 10m airborne LiDAR-derived canopy height models, one for 2018 and one for 2021, are used as ground truth to train and validate the RFR and CNN predictive models. To evaluate the prediction performance of the trained RFR and CNN models, two new predicted canopy height maps (CHMs), one for 2018 and one for 2021, are generated using the trained RFR and CNN models and 10m Sentinel-2 images of 2018 and 2021, respectively. The two 10m predicted CHMs from Sentinel-2 images are then compared with the two 10m airborne LiDAR-derived canopy height models for accuracy assessment. The validation results show that the mean absolute error (MAE) for year 2018 of the RFR model is 2.93m, CNN model is 1.71m; while the MAE for year 2021 of the RFR model is 3.35m, and the CNN model is 3.78m. These demonstrate the feasibility of using the RFR and CNN models developed in this research for predicting large-coverage forest canopy height at 10m spatial resolution and a high revisit rate.Keywords: remote sensing, forest canopy height, LiDAR, Sentinel-2, artificial intelligence, random forest regression, convolutional neural network
Procedia PDF Downloads 902767 A Study of Classification Models to Predict Drill-Bit Breakage Using Degradation Signals
Authors: Bharatendra Rai
Abstract:
Cutting tools are widely used in manufacturing processes and drilling is the most commonly used machining process. Although drill-bits used in drilling may not be expensive, their breakage can cause damage to expensive work piece being drilled and at the same time has major impact on productivity. Predicting drill-bit breakage, therefore, is important in reducing cost and improving productivity. This study uses twenty features extracted from two degradation signals viz., thrust force and torque. The methodology used involves developing and comparing decision tree, random forest, and multinomial logistic regression models for classifying and predicting drill-bit breakage using degradation signals.Keywords: degradation signal, drill-bit breakage, random forest, multinomial logistic regression
Procedia PDF Downloads 3502766 Simulation of Forest Fire Using Wireless Sensor Network
Authors: Mohammad F. Fauzi, Nurul H. Shahba M. Shahrun, Nurul W. Hamzah, Mohd Noah A. Rahman, Afzaal H. Seyal
Abstract:
In this paper, we proposed a simulation system using Wireless Sensor Network (WSN) that will be distributed around the forest for early forest fire detection and to locate the areas affected. In Brunei Darussalam, approximately 78% of the nation is covered by forest. Since the forest is Brunei’s most precious natural assets, it is very important to protect and conserve our forest. The hot climate in Brunei Darussalam can lead to forest fires which can be a fatal threat to the preservation of our forest. The process consists of getting data from the sensors, analyzing the data and producing an alert. The key factors that we are going to analyze are the surrounding temperature, wind speed and wind direction, humidity of the air and soil.Keywords: forest fire monitor, humidity, wind direction, wireless sensor network
Procedia PDF Downloads 4512765 Economic Benefits in Community Based Forest Management from Users Perspective in Community Forestry, Nepal
Authors: Sovit Pujari
Abstract:
In the developing countries like Nepal, the community-based forest management approach has often been glorified as one of the best forest management alternatives to maximize the forest benefits. Though the approach has succeeded to construct a local level institution and conserve the forest biodiversity, how the local communities perceived about the forest benefits, the question always remains silent among the researchers and policy makers. The paper aims to explore the understanding of forest benefits from the perspective of local communities who used the forests in terms of institutional stability, equity and livelihood opportunity, and ecological stability. The paper revealed that the local communities have mixed understanding over the forest benefits. The institutional and ecological activities carried out by the local communities indicated that they have a better understanding over the forest benefits. However, inequality while sharing the forest benefits, low pricing strategy and its negative consequences in the valuation of forest products and limited livelihood opportunities indicating the poor understanding.Keywords: community based forest management, low pricing strategy, forest benefits, livelihood opportunities, Nepal
Procedia PDF Downloads 3432764 Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second
Authors: P. V. Pramila , V. Mahesh
Abstract:
Pulmonary Function Tests are important non-invasive diagnostic tests to assess respiratory impairments and provides quantifiable measures of lung function. Spirometry is the most frequently used measure of lung function and plays an essential role in the diagnosis and management of pulmonary diseases. However, the test requires considerable patient effort and cooperation, markedly related to the age of patients esulting in incomplete data sets. This paper presents, a nonlinear model built using Multivariate adaptive regression splines and Random forest regression model to predict the missing spirometric features. Random forest based feature selection is used to enhance both the generalization capability and the model interpretability. In the present study, flow-volume data are recorded for N= 198 subjects. The ranked order of feature importance index calculated by the random forests model shows that the spirometric features FVC, FEF 25, PEF,FEF 25-75, FEF50, and the demographic parameter height are the important descriptors. A comparison of performance assessment of both models prove that, the prediction ability of MARS with the `top two ranked features namely the FVC and FEF 25 is higher, yielding a model fit of R2= 0.96 and R2= 0.99 for normal and abnormal subjects. The Root Mean Square Error analysis of the RF model and the MARS model also shows that the latter is capable of predicting the missing values of FEV1 with a notably lower error value of 0.0191 (normal subjects) and 0.0106 (abnormal subjects). It is concluded that combining feature selection with a prediction model provides a minimum subset of predominant features to train the model, yielding better prediction performance. This analysis can assist clinicians with a intelligence support system in the medical diagnosis and improvement of clinical care.Keywords: FEV, multivariate adaptive regression splines pulmonary function test, random forest
Procedia PDF Downloads 3082763 Comparative Study od Three Artificial Intelligence Techniques for Rain Domain in Precipitation Forecast
Authors: Nabilah Filzah Mohd Radzuan, Andi Putra, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan
Abstract:
Precipitation forecast is important to avoid natural disaster incident which can cause losses in the involved area. This paper reviews three techniques logistic regression, decision tree, and random forest which are used in making precipitation forecast. These combination techniques through the vector auto-regression (VAR) model help in finding the advantages and strengths of each technique in the forecast process. The data-set contains variables of the rain’s domain. Adaptation of artificial intelligence techniques involved in rain domain enables the forecast process to be easier and systematic for precipitation forecast.Keywords: logistic regression, decisions tree, random forest, VAR model
Procedia PDF Downloads 4442762 Perceptions of Climate Change Risk to Forest Ecosystems: A Case Study of Patale Community Forestry User Group, Nepal
Authors: N. R. P Withana, E. Auch
Abstract:
The purpose of this study was to investigate perceptions of climate change risk to forest ecosystems and forest-based communities as well as perceived effectiveness of adaptation strategies for climate change as well as challenges for adaptation. Data was gathered using a pre-tested semi-structured questionnaire. Simple random selection technique was applied. For the majority of issues, the responses were obtained on multi-point Likert scales, and the scores provided were, in turn, used to estimate the means and other useful estimates. A composite knowledge index developed using correct responses to a set of self-rated statements were used to evaluate the issues. The mean of the knowledge index was 0.64. Also all respondents recorded values of the knowledge index above 0.25. Increase forest fire was perceived by respondents as the greatest risk to forest eco-system. Decrease access to water supplies was perceived as the greatest risk to livelihoods of forest based communities. The most effective adaptation strategy relevant to climate change risks to forest eco-systems and forest based communities livelihoods in Kathmandu valley in Nepal as perceived by the respondents was reforestation and afforestation. As well, lack of public awareness was perceived as the major limitation for climate change adaptation. However, perceived risks as well as effective adaptation strategies showed an inconsistent association with knowledge indicators and social-cultural variables. The results provide useful information to any party who involve with climate change issues in Nepal, since such attempts would be more effective once the people’s perceptions on these aspects are taken into account.Keywords: climate change, risk perceptions, forest ecosystems, forest-based communities
Procedia PDF Downloads 397