Search results for: outliers
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 79

Search results for: outliers

49 Anomaly Detection Based Fuzzy K-Mode Clustering for Categorical Data

Authors: Murat Yazici

Abstract:

Anomalies are irregularities found in data that do not adhere to a well-defined standard of normal behavior. The identification of outliers or anomalies in data has been a subject of study within the statistics field since the 1800s. Over time, a variety of anomaly detection techniques have been developed in several research communities. The cluster analysis can be used to detect anomalies. It is the process of associating data with clusters that are as similar as possible while dissimilar clusters are associated with each other. Many of the traditional cluster algorithms have limitations in dealing with data sets containing categorical properties. To detect anomalies in categorical data, fuzzy clustering approach can be used with its advantages. The fuzzy k-Mode (FKM) clustering algorithm, which is one of the fuzzy clustering approaches, by extension to the k-means algorithm, is reported for clustering datasets with categorical values. It is a form of clustering: each point can be associated with more than one cluster. In this paper, anomaly detection is performed on two simulated data by using the FKM cluster algorithm. As a significance of the study, the FKM cluster algorithm allows to determine anomalies with their abnormality degree in contrast to numerous anomaly detection algorithms. According to the results, the FKM cluster algorithm illustrated good performance in the anomaly detection of data, including both one anomaly and more than one anomaly.

Keywords: fuzzy k-mode clustering, anomaly detection, noise, categorical data

Procedia PDF Downloads 13
48 A Neural Network Based Clustering Approach for Imputing Multivariate Values in Big Data

Authors: S. Nickolas, Shobha K.

Abstract:

The treatment of incomplete data is an important step in the data pre-processing. Missing values creates a noisy environment in all applications and it is an unavoidable problem in big data management and analysis. Numerous techniques likes discarding rows with missing values, mean imputation, expectation maximization, neural networks with evolutionary algorithms or optimized techniques and hot deck imputation have been introduced by researchers for handling missing data. Among these, imputation techniques plays a positive role in filling missing values when it is necessary to use all records in the data and not to discard records with missing values. In this paper we propose a novel artificial neural network based clustering algorithm, Adaptive Resonance Theory-2(ART2) for imputation of missing values in mixed attribute data sets. The process of ART2 can recognize learned models fast and be adapted to new objects rapidly. It carries out model-based clustering by using competitive learning and self-steady mechanism in dynamic environment without supervision. The proposed approach not only imputes the missing values but also provides information about handling the outliers.

Keywords: ART2, data imputation, clustering, missing data, neural network, pre-processing

Procedia PDF Downloads 247
47 Non-Parametric Regression over Its Parametric Couterparts with Large Sample Size

Authors: Jude Opara, Esemokumo Perewarebo Akpos

Abstract:

This paper is on non-parametric linear regression over its parametric counterparts with large sample size. Data set on anthropometric measurement of primary school pupils was taken for the analysis. The study used 50 randomly selected pupils for the study. The set of data was subjected to normality test, and it was discovered that the residuals are not normally distributed (i.e. they do not follow a Gaussian distribution) for the commonly used least squares regression method for fitting an equation into a set of (x,y)-data points using the Anderson-Darling technique. The algorithms for the nonparametric Theil’s regression are stated in this paper as well as its parametric OLS counterpart. The use of a programming language software known as “R Development” was used in this paper. From the analysis, the result showed that there exists a significant relationship between the response and the explanatory variable for both the parametric and non-parametric regression. To know the efficiency of one method over the other, the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) are used, and it is discovered that the nonparametric regression performs better than its parametric regression counterparts due to their lower values in both the AIC and BIC. The study however recommends that future researchers should study a similar work by examining the presence of outliers in the data set, and probably expunge it if detected and re-analyze to compare results.

Keywords: Theil’s regression, Bayesian information criterion, Akaike information criterion, OLS

Procedia PDF Downloads 275
46 Geospatial Network Analysis Using Particle Swarm Optimization

Authors: Varun Singh, Mainak Bandyopadhyay, Maharana Pratap Singh

Abstract:

The shortest path (SP) problem concerns with finding the shortest path from a specific origin to a specified destination in a given network while minimizing the total cost associated with the path. This problem has widespread applications. Important applications of the SP problem include vehicle routing in transportation systems particularly in the field of in-vehicle Route Guidance System (RGS) and traffic assignment problem (in transportation planning). Well known applications of evolutionary methods like Genetic Algorithms (GA), Ant Colony Optimization, Particle Swarm Optimization (PSO) have come up to solve complex optimization problems to overcome the shortcomings of existing shortest path analysis methods. It has been reported by various researchers that PSO performs better than other evolutionary optimization algorithms in terms of success rate and solution quality. Further Geographic Information Systems (GIS) have emerged as key information systems for geospatial data analysis and visualization. This research paper is focused towards the application of PSO for solving the shortest path problem between multiple points of interest (POI) based on spatial data of Allahabad City and traffic speed data collected using GPS. Geovisualization of results of analysis is carried out in GIS.

Keywords: particle swarm optimization, GIS, traffic data, outliers

Procedia PDF Downloads 447
45 Corporate Socially Responsible and Financial Performance in the Tourism-Related Industries

Authors: Yu Shan Wang

Abstract:

Different from other industries, the structure of the tourism industry depends to a large degree the environmental and cultural resources. The industry has to undertake social responsibilities for its commercial behaviour. This paper refers to the seven dimensions of the KLD STATS in 1991-2011 as the indicator to CSR practices. The purpose is to investigate what CSR activities create significant impacts on accounting-based financials and firm values by delving into different CSR dimensions. Meanwhile, this paper takes into consideration S&P 500 and control variables (firm sizes and financial leverage). In fact, the commercial behavior of the tourism-related industry may result in negative impacts on the economy and the society. Therefore, this paper classifies a positive set of CSR elements and a negative set of CSR elements for the tourism-related industry in order to examine their respective effects on short-term profitability and long-term firm values. This can shed light on which CSR dimensions exhibit significant impacts on CFP better than holistic CSR indicators, and hence provide more useful information to investors and corporates. This paper uses quantile regressions to avoid the impact of outliers in the data set. This helps to offer specific information so that companies can make informed decisions.

Keywords: corporate social responsibility, CSR, firm value, tourism, corporate financial performance, CFP

Procedia PDF Downloads 250
44 A Robust and Adaptive Unscented Kalman Filter for the Air Fine Alignment of the Strapdown Inertial Navigation System/GPS

Authors: Jian Shi, Baoguo Yu, Haonan Jia, Meng Liu, Ping Huang

Abstract:

Adapting to the flexibility of war, a large number of guided weapons launch from aircraft. Therefore, the inertial navigation system loaded in the weapon needs to undergo an alignment process in the air. This article proposes the following methods to the problem of inaccurate modeling of the system under large misalignment angles, the accuracy reduction of filtering caused by outliers, and the noise changes in GPS signals: first, considering the large misalignment errors of Strapdown Inertial Navigation System (SINS)/GPS, a more accurate model is made rather than to make a small-angle approximation, and the Unscented Kalman Filter (UKF) algorithms are used to estimate the state; then, taking into account the impact of GPS noise changes on the fine alignment algorithm, the innovation adaptive filtering algorithm is introduced to estimate the GPS’s noise in real-time; at the same time, in order to improve the anti-interference ability of the air fine alignment algorithm, a robust filtering algorithm based on outlier detection is combined with the air fine alignment algorithm to improve the robustness of the algorithm. The algorithm can improve the alignment accuracy and robustness under interference conditions, which is verified by simulation.

Keywords: air alignment, fine alignment, inertial navigation system, integrated navigation system, UKF

Procedia PDF Downloads 125
43 Long-Term Subcentimeter-Accuracy Landslide Monitoring Using a Cost-Effective Global Navigation Satellite System Rover Network: Case Study

Authors: Vincent Schlageter, Maroua Mestiri, Florian Denzinger, Hugo Raetzo, Michel Demierre

Abstract:

Precise landslide monitoring with differential global navigation satellite system (GNSS) is well known, but technical or economic reasons limit its application by geotechnical companies. This study demonstrates the reliability and the usefulness of Geomon (Infrasurvey Sàrl, Switzerland), a stand-alone and cost-effective rover network. The system permits deploying up to 15 rovers, plus one reference station for differential GNSS. A dedicated radio communication links all the modules to a base station, where an embedded computer automatically provides all the relative positions (L1 phase, open-source RTKLib software) and populates an Internet server. Each measure also contains information from an internal inclinometer, battery level, and position quality indices. Contrary to standard GNSS survey systems, which suffer from a limited number of beacons that must be placed in areas with good GSM signal, Geomon offers greater flexibility and permits a real overview of the whole landslide with good spatial resolution. Each module is powered with solar panels, ensuring autonomous long-term recordings. In this study, we have tested the system on several sites in the Swiss mountains, setting up to 7 rovers per site, for an 18 month-long survey. The aim was to assess the robustness and the accuracy of the system in different environmental conditions. In one case, we ran forced blind tests (vertical movements of a given amplitude) and compared various session parameters (duration from 10 to 90 minutes). Then the other cases were a survey of real landslides sites using fixed optimized parameters. Sub centimetric-accuracy with few outliers was obtained using the best parameters (session duration of 60 minutes, baseline 1 km or less), with the noise level on the horizontal component half that of the vertical one. The performance (percent of aborting solutions, outliers) was reduced with sessions shorter than 30 minutes. The environment also had a strong influence on the percent of aborting solutions (ambiguity search problem), due to multiple reflections or satellites obstructed by trees and mountains. The length of the baseline (distance reference-rover, single baseline processing) reduced the accuracy above 1 km but had no significant effect below this limit. In critical weather conditions, the system’s robustness was limited: snow, avalanche, and frost-covered some rovers, including the antenna and vertically oriented solar panels, leading to data interruption; and strong wind damaged a reference station. The possibility of changing the sessions’ parameters remotely was very useful. In conclusion, the rover network tested provided the foreseen sub-centimetric-accuracy while providing a dense spatial resolution landslide survey. The ease of implementation and the fully automatic long-term survey were timesaving. Performance strongly depends on surrounding conditions, but short pre-measures should allow moving a rover to a better final placement. The system offers a promising hazard mitigation technique. Improvements could include data post-processing for alerts and automatic modification of the duration and numbers of sessions based on battery level and rover displacement velocity.

Keywords: GNSS, GSM, landslide, long-term, network, solar, spatial resolution, sub-centimeter.

Procedia PDF Downloads 89
42 Radiological Assessment of Fish Samples Due to Natural Radionuclides in River Yobe, North Eastern Nigeria

Authors: H. T. Abba, Abbas Baba Kura

Abstract:

Assessment of natural radioactivity of some fish samples in river Yobe was conducted, using gamma spectroscopy method with NaI(TI) detector. Radioactivity is phenomenon that leads to production of radiations, whereas radiation is known to trigger or induce cancer. The fish were analyzed to estimate the radioactivity (activity) concentrations due to natural radionuclides (Radium 222(226Ra), Thorium 232 (232Th) and Potassium 40 (40K)). The obtained result show that the activity concentration for (226Ra), in all the fish samples collected ranges from 15.23±2.45 BqKg-1 to 67.39±2.13 BqKg-1 with an average value of 34.13±1.34 BqKg-1. That of 232Th, ranges from 42.66±0.81 BqKg-1 to 201.18±3.82 BqKg-1, and the average value stands at 96.01±3.82 BqKg-1. The activity concentration for 40K, ranges between 243.3±1.56 BqKg-1 to 618.2±2.81 BqKg-1 and the average is 413.92±1.7 BqKg-1. This study indicated that average daily intake due to natural activity from the fish is valued at 0.913 Bq/day, 2.577Bq/day and 11.088 Bq/day for 226Ra, 232Th and 40K respectively. This shows that the activity concentration values for fish, shows a promising result with most of the fish activity concentrations been within the acceptable limits. However locations (F02, F07 and F12) fish, became outliers with significant values of 112.53μSvy-1, 121.11μSvy-1 and 114.32μSvy-1 effective Dose. This could be attributed to variation in geological formations within the river as while as the feeding habits of these fish. The work shows that consumers of fish from River Yobe have no risk of radioactivity ingestion, even though no amount of radiation is assumed to be totally safe.

Keywords: radiation, radio-activity, dose, radionuclides, river Yobe

Procedia PDF Downloads 283
41 Pyramidal Lucas-Kanade Optical Flow Based Moving Object Detection in Dynamic Scenes

Authors: Hyojin Lim, Cuong Nguyen Khac, Yeongyu Choi, Ho-Youl Jung

Abstract:

In this paper, we propose a simple moving object detection, which is based on motion vectors obtained from pyramidal Lucas-Kanade optical flow. The proposed method detects moving objects such as pedestrians, the other vehicles and some obstacles at the front-side of the host vehicle, and it can provide the warning to the driver. Motion vectors are obtained by using pyramidal Lucas-Kanade optical flow, and some outliers are eliminated by comparing the amplitude of each vector with the pre-defined threshold value. The background model is obtained by calculating the mean and the variance of the amplitude of recent motion vectors in the rectangular shaped local region called the cell. The model is applied as the reference to classify motion vectors of moving objects and those of background. Motion vectors are clustered to rectangular regions by using the unsupervised clustering K-means algorithm. Labeling method is applied to label groups which is close to each other, using by distance between each center points of rectangular. Through the simulations tested on four kinds of scenarios such as approaching motorbike, vehicle, and pedestrians to host vehicle, we prove that the proposed is simple but efficient for moving object detection in parking lots.

Keywords: moving object detection, dynamic scene, optical flow, pyramidal optical flow

Procedia PDF Downloads 313
40 Re-Imagining and De-Constructing the Global Security Architecture

Authors: Smita Singh

Abstract:

The paper develops a critical framework to the hegemonic discourses resorted to by the dominant powers in the global security architecture. Within this framework, security is viewed as a discourse through which identities and threats are represented and produced to legitimize the security concerns of few at the cost of others. International security have long been driven and dominated by power relations. Since the end of the Cold War, the global transformations have triggered contestations to the idea of security at both theoretical and practical level. These widening and deepening of the concept of security have challenged the existing power hierarchies at the theoretical level but not altered the substance and actors defining it. When discourses are introduced into security studies, several critical questions erupt: how has power shaped security policies of the globe through language? How does one understand the meanings and impact of those discourses? Who decides the agenda, rules, players and outliers of the security? Language as a symbolic system and form of power is fluid and not fixed. Over the years the dominant Western powers, led by the United States of America have employed various discursive practices such as humanitarian intervention, responsibility to protect, non proliferation, human rights, war on terror and so on to reorient the constitution of identities and interests and hence the policies that need to be adopted for its actualization. These power relations are illustrated in this paper through the narratives used in the nonproliferation regime. The hierarchical security dynamics is a manifestation of the global power relations driven by many factors including discourses.

Keywords: hegemonic discourse, global security, non-proliferation regime, power politics

Procedia PDF Downloads 288
39 Hybrid GNN Based Machine Learning Forecasting Model For Industrial IoT Applications

Authors: Atish Bagchi, Siva Chandrasekaran

Abstract:

Background: According to World Bank national accounts data, the estimated global manufacturing value-added output in 2020 was 13.74 trillion USD. These manufacturing processes are monitored, modelled, and controlled by advanced, real-time, computer-based systems, e.g., Industrial IoT, PLC, SCADA, etc. These systems measure and manipulate a set of physical variables, e.g., temperature, pressure, etc. Despite the use of IoT, SCADA etc., in manufacturing, studies suggest that unplanned downtime leads to economic losses of approximately 864 billion USD each year. Therefore, real-time, accurate detection, classification and prediction of machine behaviour are needed to minimise financial losses. Although vast literature exists on time-series data processing using machine learning, the challenges faced by the industries that lead to unplanned downtimes are: The current algorithms do not efficiently handle the high-volume streaming data from industrial IoTsensors and were tested on static and simulated datasets. While the existing algorithms can detect significant 'point' outliers, most do not handle contextual outliers (e.g., values within normal range but happening at an unexpected time of day) or subtle changes in machine behaviour. Machines are revamped periodically as part of planned maintenance programmes, which change the assumptions on which original AI models were created and trained. Aim: This research study aims to deliver a Graph Neural Network(GNN)based hybrid forecasting model that interfaces with the real-time machine control systemand can detect, predict machine behaviour and behavioural changes (anomalies) in real-time. This research will help manufacturing industries and utilities, e.g., water, electricity etc., reduce unplanned downtimes and consequential financial losses. Method: The data stored within a process control system, e.g., Industrial-IoT, Data Historian, is generally sampled during data acquisition from the sensor (source) and whenpersistingin the Data Historian to optimise storage and query performance. The sampling may inadvertently discard values that might contain subtle aspects of behavioural changes in machines. This research proposed a hybrid forecasting and classification model which combines the expressive and extrapolation capability of GNN enhanced with the estimates of entropy and spectral changes in the sampled data and additional temporal contexts to reconstruct the likely temporal trajectory of machine behavioural changes. The proposed real-time model belongs to the Deep Learning category of machine learning and interfaces with the sensors directly or through 'Process Data Historian', SCADA etc., to perform forecasting and classification tasks. Results: The model was interfaced with a Data Historianholding time-series data from 4flow sensors within a water treatment plantfor45 days. The recorded sampling interval for a sensor varied from 10 sec to 30 min. Approximately 65% of the available data was used for training the model, 20% for validation, and the rest for testing. The model identified the anomalies within the water treatment plant and predicted the plant's performance. These results were compared with the data reported by the plant SCADA-Historian system and the official data reported by the plant authorities. The model's accuracy was much higher (20%) than that reported by the SCADA-Historian system and matched the validated results declared by the plant auditors. Conclusions: The research demonstrates that a hybrid GNN based approach enhanced with entropy calculation and spectral information can effectively detect and predict a machine's behavioural changes. The model can interface with a plant's 'process control system' in real-time to perform forecasting and classification tasks to aid the asset management engineers to operate their machines more efficiently and reduce unplanned downtimes. A series of trialsare planned for this model in the future in other manufacturing industries.

Keywords: GNN, Entropy, anomaly detection, industrial time-series, AI, IoT, Industry 4.0, Machine Learning

Procedia PDF Downloads 114
38 General Architecture for Automation of Machine Learning Practices

Authors: U. Borasi, Amit Kr. Jain, Rakesh, Piyush Jain

Abstract:

Data collection, data preparation, model training, model evaluation, and deployment are all processes in a typical machine learning workflow. Training data needs to be gathered and organised. This often entails collecting a sizable dataset and cleaning it to remove or correct any inaccurate or missing information. Preparing the data for use in the machine learning model requires pre-processing it after it has been acquired. This often entails actions like scaling or normalising the data, handling outliers, selecting appropriate features, reducing dimensionality, etc. This pre-processed data is then used to train a model on some machine learning algorithm. After the model has been trained, it needs to be assessed by determining metrics like accuracy, precision, and recall, utilising a test dataset. Every time a new model is built, both data pre-processing and model training—two crucial processes in the Machine learning (ML) workflow—must be carried out. Thus, there are various Machine Learning algorithms that can be employed for every single approach to data pre-processing, generating a large set of combinations to choose from. Example: for every method to handle missing values (dropping records, replacing with mean, etc.), for every scaling technique, and for every combination of features selected, a different algorithm can be used. As a result, in order to get the optimum outcomes, these tasks are frequently repeated in different combinations. This paper suggests a simple architecture for organizing this largely produced “combination set of pre-processing steps and algorithms” into an automated workflow which simplifies the task of carrying out all possibilities.

Keywords: machine learning, automation, AUTOML, architecture, operator pool, configuration, scheduler

Procedia PDF Downloads 20
37 Effect of a Mindfulness Application on Graduate Nursing Student’s Stress and Anxiety

Authors: Susan K. Steele-Moses, Aimee Badeaux

Abstract:

Background Literature: Nurse anesthesia education placed high demands on students both personally and professionally. High levels of anxiety affect student’s mental, emotional, and physical well-being, which impacts their student success. Whereas more research has focused on the health and well-being of graduate students, far less has focused specifically on nurse anesthesia students (SNRAs), who may experience higher levels of anxiety due to the rigor of their academic program. Current literature describes stressors experienced by SRNAs that cause anxiety and affect their performance, including personal, academic, clinical, interpersonal, emotional, and financial. Sample: DNP-NA 2025 and DNP-NA 2024 cohorts (N = 36). Eighteen (66.7%) students participated in the study. Instrumentation: The DASS-21 was used to measure stress (7 items; α = .87) and anxiety (7 items; α = .74) from the participants. Intervention: The mind-shift meditation app, based on cognitive behavioral therapy, is being used daily before clinical and exams to decrease nurse anesthesia students’ stress and anxiety over time. Results: At baseline, the students exhibited a moderate level of stress, but their anxiety levels were low. The range of scores was 4-21 (out of 28) for stress (M = 12.88; SD = 5.40) and 0-16 (out of 28) for anxiety (M = 6.81; SD = 5.04). Both stress and anxiety were normally distributed [SW = .242 (stress); SW = .210 (anxiety)] without any outliers. There was a significant difference between their stress and anxiety levels (t = 5.55; p < .001) at baseline. Stress and anxiety will be measured over time, with the change analyzed using repeated measures ANOVA. Implications for Practice: The use of purposeful mindfulness meditation has been shown to decrease stress and anxiety in nursing students.

Keywords: mindfulness, meditation, graduate nursing education, nursing education

Procedia PDF Downloads 47
36 Identification of Outliers in Flood Frequency Analysis: Comparison of Original and Multiple Grubbs-Beck Test

Authors: Ayesha S. Rahman, Khaled Haddad, Ataur Rahman

Abstract:

At-site flood frequency analysis is used to estimate flood quantiles when at-site record length is reasonably long. In Australia, FLIKE software has been introduced for at-site flood frequency analysis. The advantage of FLIKE is that, for a given application, the user can compare a number of most commonly adopted probability distributions and parameter estimation methods relatively quickly using a windows interface. The new version of FLIKE has been incorporated with the multiple Grubbs and Beck test which can identify multiple numbers of potentially influential low flows. This paper presents a case study considering six catchments in eastern Australia which compares two outlier identification tests (original Grubbs and Beck test and multiple Grubbs and Beck test) and two commonly applied probability distributions (Generalized Extreme Value (GEV) and Log Pearson type 3 (LP3)) using FLIKE software. It has been found that the multiple Grubbs and Beck test when used with LP3 distribution provides more accurate flood quantile estimates than when LP3 distribution is used with the original Grubbs and Beck test. Between these two methods, the differences in flood quantile estimates have been found to be up to 61% for the six study catchments. It has also been found that GEV distribution (with L moments) and LP3 distribution with the multiple Grubbs and Beck test provide quite similar results in most of the cases; however, a difference up to 38% has been noted for flood quantiles for annual exceedance probability (AEP) of 1 in 100 for one catchment. These findings need to be confirmed with a greater number of stations across other Australian states.

Keywords: floods, FLIKE, probability distributions, flood frequency, outlier

Procedia PDF Downloads 408
35 Controlling the Process of a Chicken Dressing Plant through Statistical Process Control

Authors: Jasper Kevin C. Dionisio, Denise Mae M. Unsay

Abstract:

In a manufacturing firm, controlling the process ensures that optimum efficiency, productivity, and quality in an organization are achieved. An operation with no standardized procedure yields a poor productivity, inefficiency, and an out of control process. This study focuses on controlling the small intestine processing of a chicken dressing plant through the use of Statistical Process Control (SPC). Since the operation does not employ a standard procedure and does not have an established standard time, the process through the assessment of the observed time of the overall operation of small intestine processing, through the use of X-Bar R Control Chart, is found to be out of control. In the solution of this problem, the researchers conduct a motion and time study aiming to establish a standard procedure for the operation. The normal operator was picked through the use of Westinghouse Rating System. Instead of utilizing the traditional motion and time study, the researchers used the X-Bar R Control Chart in determining the process average of the process that is used for establishing the standard time. The observed time of the normal operator was noted and plotted to the X-Bar R Control Chart. Out of control points that are due to assignable cause were removed and the process average, or the average time the normal operator conducted the process, which was already in control and free form any outliers, was obtained. The process average was then used in determining the standard time of small intestine processing. As a recommendation, the researchers suggest the implementation of the standard time established which is with consonance to the standard procedure which was adopted from the normal operator. With that recommendation, the whole operation will induce a 45.54 % increase in their productivity.

Keywords: motion and time study, process controlling, statistical process control, X-Bar R Control chart

Procedia PDF Downloads 178
34 Examining the Relationship between Family Functioning and Perceived Self-Efficacy

Authors: Fenni Sim

Abstract:

Objectives: The purpose of the study is to examine the relationship between family functioning and level of self-efficacy: how family functioning can potentially affect self-efficacy which will eventually lead to better clinical outcomes. The hypothesis was ‘Patients on haemodialysis with perceived higher family functioning are more likely to have higher perceived level of self-efficacy’. Methods: The study was conducted with a mixed methodology of quantitative and qualitative data collection of survey and semi-structured interview respectively. The General Self-Efficacy scale and SCORE-15 were self-administered by participants. The data will be analysed with correlation analysis method using Microsoft Excel. 79 patients were recruited for the study through random sampling. 6 participants whose results did not reflect the hypothesis were then recruited for the qualitative study. Interpretive phemenological analysis was then used to analyse the qualitative data. Findings: The hypothesis was accepted that higher family functioning leads to higher perceived self-efficacy. The correlation coefficient of -0.21 suggested a mild correlation between the two variables. However, only 4.6% of the variation in perceived self-efficacy is accounted by the variation in family functioning. The qualitative study extrapolated three themes that might explain the variations in the outliers: (1) level of physical functioning affects perceived self-efficacy, (2) instrumental support from family influenced perceived level of family functioning, and self-efficacy, (3) acceptance of illness reflects higher level of self-efficacy. Conclusion: While family functioning does have an impact on perceived self-efficacy, there are many intrapersonal and physical factors that may affect self-efficacy. The concepts of family functioning and self-efficacy are more appropriately seen as complementing each other to help a patient in managing his illness. Healthcare social workers can look at how family functioning is supporting the individual needs of patients with different trajectory of ESRD and the support we can provide to improve one’s self-efficacy.

Keywords: chronic kidney disease, coping of illness, family functioning, self efficacy

Procedia PDF Downloads 145
33 An Improved Robust Algorithm Based on Cubature Kalman Filter for Single-Frequency Global Navigation Satellite System/Inertial Navigation Tightly Coupled System

Authors: Hao Wang, Shuguo Pan

Abstract:

The Global Navigation Satellite System (GNSS) signal received by the dynamic vehicle in the harsh environment will be frequently interfered with and blocked, which generates gross error affecting the positioning accuracy of the GNSS/Inertial Navigation System (INS) integrated navigation. Therefore, this paper put forward an improved robust Cubature Kalman filter (CKF) algorithm for single-frequency GNSS/INS tightly coupled system ambiguity resolution. Firstly, the dynamic model and measurement model of a single-frequency GNSS/INS tightly coupled system was established, and the method for GNSS integer ambiguity resolution with INS aided is studied. Then, we analyzed the influence of pseudo-range observation with gross error on GNSS/INS integrated positioning accuracy. To reduce the influence of outliers, this paper improved the CKF algorithm and realized an intelligent selection of robust strategies by judging the ill-conditioned matrix. Finally, a field navigation test was performed to demonstrate the effectiveness of the proposed algorithm based on the double-differenced solution mode. The experiment has proved the improved robust algorithm can greatly weaken the influence of separate, continuous, and hybrid observation anomalies for enhancing the reliability and accuracy of GNSS/INS tightly coupled navigation solutions.

Keywords: GNSS/INS integrated navigation, ambiguity resolution, Cubature Kalman filter, Robust algorithm

Procedia PDF Downloads 63
32 Comparison between Some of Robust Regression Methods with OLS Method with Application

Authors: Sizar Abed Mohammed, Zahraa Ghazi Sadeeq

Abstract:

The use of the classic method, least squares (OLS) to estimate the linear regression parameters, when they are available assumptions, and capabilities that have good characteristics, such as impartiality, minimum variance, consistency, and so on. The development of alternative statistical techniques to estimate the parameters, when the data are contaminated with outliers. These are powerful methods (or resistance). In this paper, three of robust methods are studied, which are: Maximum likelihood type estimate M-estimator, Modified Maximum likelihood type estimate MM-estimator and Least Trimmed Squares LTS-estimator, and their results are compared with OLS method. These methods applied to real data taken from Duhok company for manufacturing furniture, the obtained results compared by using the criteria: Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE) and Mean Sum of Absolute Error (MSAE). Important conclusions that this study came up with are: a number of typical values detected by using four methods in the furniture line and very close to the data. This refers to the fact that close to the normal distribution of standard errors, but typical values in the doors line data, using OLS less than that detected by the powerful ways. This means that the standard errors of the distribution are far from normal departure. Another important conclusion is that the estimated values of the parameters by using the lifeline is very far from the estimated values using powerful methods for line doors, gave LTS- destined better results using standard MSE, and gave the M- estimator better results using standard MAPE. Moreover, we noticed that using standard MSAE, and MM- estimator is better. The programs S-plus (version 8.0, professional 2007), Minitab (version 13.2) and SPSS (version 17) are used to analyze the data.

Keywords: Robest, LTS, M estimate, MSE

Procedia PDF Downloads 206
31 Interpersonal Variation of Salivary Microbiota Using Denaturing Gradient Gel Electrophoresis

Authors: Manjula Weerasekera, Chris Sissons, Lisa Wong, Sally Anderson, Ann Holmes, Richard Cannon

Abstract:

The aim of this study was to characterize bacterial population and yeasts in saliva by Polymerase chain reaction followed by denaturing gradient gel electrophoresis (PCR-DGGE) and measure yeast levels by culture. PCR-DGGE was performed to identify oral bacteria and yeasts in 24 saliva samples. DNA was extracted and used to generate DNA amplicons of the V2–V3 hypervariable region of the bacterial 16S rDNA gene using PCR. Further universal primers targeting the large subunit rDNA gene (25S-28S) of fungi were used to amplify yeasts present in human saliva. Resulting PCR products were subjected to denaturing gradient gel electrophoresis using Universal mutation detection system. DGGE bands were extracted and sequenced using Sanger method. A potential relationship was evaluated between groups of bacteria identified by cluster analysis of DGGE fingerprints with the yeast levels and with their diversity. Significant interpersonal variation of salivary microbiome was observed. Cluster and principal component analysis of the bacterial DGGE patterns yielded three significant major clusters, and outliers. Seventeen of the 24 (71%) saliva samples were yeast positive going up to 10³ cfu/mL. Predominately, C. albicans, and six other species of yeast were detected. The presence, amount and species of yeast showed no clear relationship to the bacterial clusters. Microbial community in saliva showed a significant variation between individuals. A lack of association between yeasts and the bacterial fingerprints in saliva suggests the significant ecological person-specific independence in highly complex oral biofilm systems under normal oral conditions.

Keywords: bacteria, denaturing gradient gel electrophoresis, oral biofilm, yeasts

Procedia PDF Downloads 194
30 Modeling Average Paths Traveled by Ferry Vessels Using AIS Data

Authors: Devin Simmons

Abstract:

At the USDOT’s Bureau of Transportation Statistics, a biannual census of ferry operators in the U.S. is conducted, with results such as route mileage used to determine federal funding levels for operators. AIS data allows for the possibility of using GIS software and geographical methods to confirm operator-reported mileage for individual ferry routes. As part of the USDOT’s work on the ferry census, an algorithm was developed that uses AIS data for ferry vessels in conjunction with known ferry terminal locations to model the average route travelled for use as both a cartographic product and confirmation of operator-reported mileage. AIS data from each vessel is first analyzed to determine individual journeys based on the vessel’s velocity, and changes in velocity over time. These trips are then converted to geographic linestring objects. Using the terminal locations, the algorithm then determines whether the trip represented a known ferry route. Given a large enough dataset, routes will be represented by multiple trip linestrings, which are then filtered by DBSCAN spatial clustering to remove outliers. Finally, these remaining trips are ready to be averaged into one route. The algorithm interpolates the point on each trip linestring that represents the start point. From these start points, a centroid is calculated, and the first point of the average route is determined. Each trip is interpolated again to find the point that represents one percent of the journey’s completion, and the centroid of those points is used as the next point in the average route, and so on until 100 points have been calculated. Routes created using this algorithm have shown demonstrable improvement over previous methods, which included the implementation of a LOESS model. Additionally, the algorithm greatly reduces the amount of manual digitizing needed to visualize ferry activity.

Keywords: ferry vessels, transportation, modeling, AIS data

Procedia PDF Downloads 131
29 Implementation of Algorithm K-Means for Grouping District/City in Central Java Based on Macro Economic Indicators

Authors: Nur Aziza Luxfiati

Abstract:

Clustering is partitioning data sets into sub-sets or groups in such a way that elements certain properties have shared property settings with a high level of similarity within one group and a low level of similarity between groups. . The K-Means algorithm is one of thealgorithmsclustering as a grouping tool that is most widely used in scientific and industrial applications because the basic idea of the kalgorithm is-means very simple. In this research, applying the technique of clustering using the k-means algorithm as a method of solving the problem of national development imbalances between regions in Central Java Province based on macroeconomic indicators. The data sample used is secondary data obtained from the Central Java Provincial Statistics Agency regarding macroeconomic indicator data which is part of the publication of the 2019 National Socio-Economic Survey (Susenas) data. score and determine the number of clusters (k) using the elbow method. After the clustering process is carried out, the validation is tested using themethodsBetween-Class Variation (BCV) and Within-Class Variation (WCV). The results showed that detection outlier using z-score normalization showed no outliers. In addition, the results of the clustering test obtained a ratio value that was not high, namely 0.011%. There are two district/city clusters in Central Java Province which have economic similarities based on the variables used, namely the first cluster with a high economic level consisting of 13 districts/cities and theclustersecondwith a low economic level consisting of 22 districts/cities. And in the cluster second, namely, between low economies, the authors grouped districts/cities based on similarities to macroeconomic indicators such as 20 districts of Gross Regional Domestic Product, with a Poverty Depth Index of 19 districts, with 5 districts in Human Development, and as many as Open Unemployment Rate. 10 districts.

Keywords: clustering, K-Means algorithm, macroeconomic indicators, inequality, national development

Procedia PDF Downloads 127
28 Constructing the Joint Mean-Variance Regions for Univariate and Bivariate Normal Distributions: Approach Based on the Measure of Cumulative Distribution Functions

Authors: Valerii Dashuk

Abstract:

The usage of the confidence intervals in economics and econometrics is widespread. To be able to investigate a random variable more thoroughly, joint tests are applied. One of such examples is joint mean-variance test. A new approach for testing such hypotheses and constructing confidence sets is introduced. Exploring both the value of the random variable and its deviation with the help of this technique allows checking simultaneously the shift and the probability of that shift (i.e., portfolio risks). Another application is based on the normal distribution, which is fully defined by mean and variance, therefore could be tested using the introduced approach. This method is based on the difference of probability density functions. The starting point is two sets of normal distribution parameters that should be compared (whether they may be considered as identical with given significance level). Then the absolute difference in probabilities at each 'point' of the domain of these distributions is calculated. This measure is transformed to a function of cumulative distribution functions and compared to the critical values. Critical values table was designed from the simulations. The approach was compared with the other techniques for the univariate case. It differs qualitatively and quantitatively in easiness of implementation, computation speed, accuracy of the critical region (theoretical vs. real significance level). Stable results when working with outliers and non-normal distributions, as well as scaling possibilities, are also strong sides of the method. The main advantage of this approach is the possibility to extend it to infinite-dimension case, which was not possible in the most of the previous works. At the moment expansion to 2-dimensional state is done and it allows to test jointly up to 5 parameters. Therefore the derived technique is equivalent to classic tests in standard situations but gives more efficient alternatives in nonstandard problems and on big amounts of data.

Keywords: confidence set, cumulative distribution function, hypotheses testing, normal distribution, probability density function

Procedia PDF Downloads 145
27 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education

Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue

Abstract:

In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.

Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education

Procedia PDF Downloads 73
26 Contrasted Mean and Median Models in Egyptian Stock Markets

Authors: Mai A. Ibrahim, Mohammed El-Beltagy, Motaz Khorshid

Abstract:

Emerging Markets return distributions have shown significance departure from normality were they are characterized by fatter tails relative to the normal distribution and exhibit levels of skewness and kurtosis that constitute a significant departure from normality. Therefore, the classical Markowitz Mean-Variance is not applicable for emerging markets since it assumes normally-distributed returns (with zero skewness and kurtosis) and a quadratic utility function. Moreover, the Markowitz mean-variance analysis can be used in cases of moderate non-normality and it still provides a good approximation of the expected utility, but it may be ineffective under large departure from normality. Higher moments models and median models have been suggested in the literature for asset allocation in this case. Higher moments models have been introduced to account for the insufficiency of the description of a portfolio by only its first two moments while the median model has been introduced as a robust statistic which is less affected by outliers than the mean. Tail risk measures such as Value-at Risk (VaR) and Conditional Value-at-Risk (CVaR) have been introduced instead of Variance to capture the effect of risk. In this research, higher moment models including the Mean-Variance-Skewness (MVS) and Mean-Variance-Skewness-Kurtosis (MVSK) are formulated as single-objective non-linear programming problems (NLP) and median models including the Median-Value at Risk (MedVaR) and Median-Mean Absolute Deviation (MedMAD) are formulated as a single-objective mixed-integer linear programming (MILP) problems. The higher moment models and median models are compared to some benchmark portfolios and tested on real financial data in the Egyptian main Index EGX30. The results show that all the median models outperform the higher moment models were they provide higher final wealth for the investor over the entire period of study. In addition, the results have confirmed the inapplicability of the classical Markowitz Mean-Variance to the Egyptian stock market as it resulted in very low realized profits.

Keywords: Egyptian stock exchange, emerging markets, higher moment models, median models, mixed-integer linear programming, non-linear programming

Procedia PDF Downloads 281
25 Principal Component Analysis Combined Machine Learning Techniques on Pharmaceutical Samples by Laser Induced Breakdown Spectroscopy

Authors: Kemal Efe Eseller, Göktuğ Yazici

Abstract:

Laser-induced breakdown spectroscopy (LIBS) is a rapid optical atomic emission spectroscopy which is used for material identification and analysis with the advantages of in-situ analysis, elimination of intensive sample preparation, and micro-destructive properties for the material to be tested. LIBS delivers short pulses of laser beams onto the material in order to create plasma by excitation of the material to a certain threshold. The plasma characteristics, which consist of wavelength value and intensity amplitude, depends on the material and the experiment’s environment. In the present work, medicine samples’ spectrum profiles were obtained via LIBS. Medicine samples’ datasets include two different concentrations for both paracetamol based medicines, namely Aferin and Parafon. The spectrum data of the samples were preprocessed via filling outliers based on quartiles, smoothing spectra to eliminate noise and normalizing both wavelength and intensity axis. Statistical information was obtained and principal component analysis (PCA) was incorporated to both the preprocessed and raw datasets. The machine learning models were set based on two different train-test splits, which were 70% training – 30% test and 80% training – 20% test. Cross-validation was preferred to protect the models against overfitting; thus the sample amount is small. The machine learning results of preprocessed and raw datasets were subjected to comparison for both splits. This is the first time that all supervised machine learning classification algorithms; consisting of Decision Trees, Discriminant, naïve Bayes, Support Vector Machines (SVM), k-NN(k-Nearest Neighbor) Ensemble Learning and Neural Network algorithms; were incorporated to LIBS data of paracetamol based pharmaceutical samples, and their different concentrations on preprocessed and raw dataset in order to observe the effect of preprocessing.

Keywords: machine learning, laser-induced breakdown spectroscopy, medicines, principal component analysis, preprocessing

Procedia PDF Downloads 63
24 Determination of Community Based Reference Interval of Aspartate Aminotransferase to Platelet Ratio Index (APRI) among Healthy Populations in Mekelle City Tigray, Northern Ethiopia

Authors: Getachew Belay Kassahun

Abstract:

Background: Aspartate aminotransferase to Platelet Ratio Index (APRI) currently becomes a biomarker for screening liver fibrosis since liver biopsy procedure is invasive and variation in pathological interpretation. Clinical Laboratory Standard Institute recommends establishing age, sex and environment specific reference interval for biomarkers in a homogenous population. The current study was aimed to derive community based reference interval of APRI aged between 12 and 60 years old in Mekelle city Tigrai, Northern Ethiopia. Method: Six hundred eighty eight study participants were collected from three districts in Mekelle city. The 3 districts were selected through random sampling technique and sample size to kebelles (small administration) were distributed proportional to household number in each district. Lottery method was used at household level if more than 2 study participants to each age partition were found. A community based cross sectional in a total of 534 study participants, 264 male and 270 females, were included in the final laboratory and data analysis but around 154 study participants were excluded through exclusion criteria. Aspartate aminotransferase was analyzed through Biosystem chemistry analyzer and Sysmix machine was used to analyze platelet. Man Whitney U test non parametric stastical tool was used to appreciate stastical difference among gender after excluding the outliers through Box and Whisker. Result: The study appreciated stastical difference among gender for APRI reference interval. The combined, male and female reference interval in the current study was 0.098-0.390, 0.133-0.428 and 0.090-0.319 respectively. The upper and lower reference interval of males was higher than females in all age partition and there was no stastical difference (p-value (<0.05)) between age partition. Conclusion: The current study showed using sex specific reference interval is significant to APRI biomarker in clinical practice for result interpretation.

Keywords: reference interval, aspartate aminotransferase to platelet ratio Index, Ethiopia, tigray

Procedia PDF Downloads 57
23 Constructing a Semi-Supervised Model for Network Intrusion Detection

Authors: Tigabu Dagne Akal

Abstract:

While advances in computer and communications technology have made the network ubiquitous, they have also rendered networked systems vulnerable to malicious attacks devised from a distance. These attacks or intrusions start with attackers infiltrating a network through a vulnerable host and then launching further attacks on the local network or Intranet. Nowadays, system administrators and network professionals can attempt to prevent such attacks by developing intrusion detection tools and systems using data mining technology. In this study, the experiments were conducted following the Knowledge Discovery in Database Process Model. The Knowledge Discovery in Database Process Model starts from selection of the datasets. The dataset used in this study has been taken from Massachusetts Institute of Technology Lincoln Laboratory. After taking the data, it has been pre-processed. The major pre-processing activities include fill in missed values, remove outliers; resolve inconsistencies, integration of data that contains both labelled and unlabelled datasets, dimensionality reduction, size reduction and data transformation activity like discretization tasks were done for this study. A total of 21,533 intrusion records are used for training the models. For validating the performance of the selected model a separate 3,397 records are used as a testing set. For building a predictive model for intrusion detection J48 decision tree and the Naïve Bayes algorithms have been tested as a classification approach for both with and without feature selection approaches. The model that was created using 10-fold cross validation using the J48 decision tree algorithm with the default parameter values showed the best classification accuracy. The model has a prediction accuracy of 96.11% on the training datasets and 93.2% on the test dataset to classify the new instances as normal, DOS, U2R, R2L and probe classes. The findings of this study have shown that the data mining methods generates interesting rules that are crucial for intrusion detection and prevention in the networking industry. Future research directions are forwarded to come up an applicable system in the area of the study.

Keywords: intrusion detection, data mining, computer science, data mining

Procedia PDF Downloads 266
22 Improved Computational Efficiency of Machine Learning Algorithm Based on Evaluation Metrics to Control the Spread of Coronavirus in the UK

Authors: Swathi Ganesan, Nalinda Somasiri, Rebecca Jeyavadhanam, Gayathri Karthick

Abstract:

The COVID-19 crisis presents a substantial and critical hazard to worldwide health. Since the occurrence of the disease in late January 2020 in the UK, the number of infected people confirmed to acquire the illness has increased tremendously across the country, and the number of individuals affected is undoubtedly considerably high. The purpose of this research is to figure out a predictive machine learning archetypal that could forecast COVID-19 cases within the UK. This study concentrates on the statistical data collected from 31st January 2020 to 31st March 2021 in the United Kingdom. Information on total COVID cases registered, new cases encountered on a daily basis, total death registered, and patients’ death per day due to Coronavirus is collected from World Health Organisation (WHO). Data preprocessing is carried out to identify any missing values, outliers, or anomalies in the dataset. The data is split into 8:2 ratio for training and testing purposes to forecast future new COVID cases. Support Vector Machines (SVM), Random Forests, and linear regression algorithms are chosen to study the model performance in the prediction of new COVID-19 cases. From the evaluation metrics such as r-squared value and mean squared error, the statistical performance of the model in predicting the new COVID cases is evaluated. Random Forest outperformed the other two Machine Learning algorithms with a training accuracy of 99.47% and testing accuracy of 98.26% when n=30. The mean square error obtained for Random Forest is 4.05e11, which is lesser compared to the other predictive models used for this study. From the experimental analysis Random Forest algorithm can perform more effectively and efficiently in predicting the new COVID cases, which could help the health sector to take relevant control measures for the spread of the virus.

Keywords: COVID-19, machine learning, supervised learning, unsupervised learning, linear regression, support vector machine, random forest

Procedia PDF Downloads 86
21 Methodology for the Multi-Objective Analysis of Data Sets in Freight Delivery

Authors: Dale Dzemydiene, Aurelija Burinskiene, Arunas Miliauskas, Kristina Ciziuniene

Abstract:

Data flow and the purpose of reporting the data are different and dependent on business needs. Different parameters are reported and transferred regularly during freight delivery. This business practices form the dataset constructed for each time point and contain all required information for freight moving decisions. As a significant amount of these data is used for various purposes, an integrating methodological approach must be developed to respond to the indicated problem. The proposed methodology contains several steps: (1) collecting context data sets and data validation; (2) multi-objective analysis for optimizing freight transfer services. For data validation, the study involves Grubbs outliers analysis, particularly for data cleaning and the identification of statistical significance of data reporting event cases. The Grubbs test is often used as it measures one external value at a time exceeding the boundaries of standard normal distribution. In the study area, the test was not widely applied by authors, except when the Grubbs test for outlier detection was used to identify outsiders in fuel consumption data. In the study, the authors applied the method with a confidence level of 99%. For the multi-objective analysis, the authors would like to select the forms of construction of the genetic algorithms, which have more possibilities to extract the best solution. For freight delivery management, the schemas of genetic algorithms' structure are used as a more effective technique. Due to that, the adaptable genetic algorithm is applied for the description of choosing process of the effective transportation corridor. In this study, the multi-objective genetic algorithm methods are used to optimize the data evaluation and select the appropriate transport corridor. The authors suggest a methodology for the multi-objective analysis, which evaluates collected context data sets and uses this evaluation to determine a delivery corridor for freight transfer service in the multi-modal transportation network. In the multi-objective analysis, authors include safety components, the number of accidents a year, and freight delivery time in the multi-modal transportation network. The proposed methodology has practical value in the management of multi-modal transportation processes.

Keywords: multi-objective, analysis, data flow, freight delivery, methodology

Procedia PDF Downloads 153
20 A Comparative Study of Optimization Techniques and Models to Forecasting Dengue Fever

Authors: Sudha T., Naveen C.

Abstract:

Dengue is a serious public health issue that causes significant annual economic and welfare burdens on nations. However, enhanced optimization techniques and quantitative modeling approaches can predict the incidence of dengue. By advocating for a data-driven approach, public health officials can make informed decisions, thereby improving the overall effectiveness of sudden disease outbreak control efforts. The National Oceanic and Atmospheric Administration and the Centers for Disease Control and Prevention are two of the U.S. Federal Government agencies from which this study uses environmental data. Based on environmental data that describe changes in temperature, precipitation, vegetation, and other factors known to affect dengue incidence, many predictive models are constructed that use different machine learning methods to estimate weekly dengue cases. The first step involves preparing the data, which includes handling outliers and missing values to make sure the data is prepared for subsequent processing and the creation of an accurate forecasting model. In the second phase, multiple feature selection procedures are applied using various machine learning models and optimization techniques. During the third phase of the research, machine learning models like the Huber Regressor, Support Vector Machine, Gradient Boosting Regressor (GBR), and Support Vector Regressor (SVR) are compared with several optimization techniques for feature selection, such as Harmony Search and Genetic Algorithm. In the fourth stage, the model's performance is evaluated using Mean Square Error (MSE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) as assistance. Selecting an optimization strategy with the least number of errors, lowest price, biggest productivity, or maximum potential results is the goal. In a variety of industries, including engineering, science, management, mathematics, finance, and medicine, optimization is widely employed. An effective optimization method based on harmony search and an integrated genetic algorithm is introduced for input feature selection, and it shows an important improvement in the model's predictive accuracy. The predictive models with Huber Regressor as the foundation perform the best for optimization and also prediction.

Keywords: deep learning model, dengue fever, prediction, optimization

Procedia PDF Downloads 16