Search results for: outliers
53 Anomaly Detection Based Fuzzy K-Mode Clustering for Categorical Data
Authors: Murat Yazici
Abstract:
Anomalies are irregularities found in data that do not adhere to a well-defined standard of normal behavior. The identification of outliers or anomalies in data has been a subject of study within the statistics field since the 1800s. Over time, a variety of anomaly detection techniques have been developed in several research communities. Cluster analysis can be used to detect anomalies. It is the process of grouping data into clusters whose members are as similar as possible, while the clusters themselves are as dissimilar from each other as possible. Many traditional clustering algorithms have limitations in dealing with data sets containing categorical attributes. To detect anomalies in categorical data, a fuzzy clustering approach can be used to advantage. The fuzzy k-modes (FKM) clustering algorithm, one of the fuzzy clustering approaches and an extension of the k-means algorithm, has been reported for clustering datasets with categorical values. It is a form of soft clustering: each point can be associated with more than one cluster. In this paper, anomaly detection is performed on two simulated data sets using the FKM clustering algorithm. A notable contribution of the study is that, in contrast to numerous anomaly detection algorithms, the FKM clustering algorithm determines anomalies together with their degree of abnormality. According to the results, the FKM clustering algorithm showed good performance in detecting both a single anomaly and multiple anomalies in the data.
Keywords: fuzzy k-mode clustering, anomaly detection, noise, categorical data
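As a rough illustration of the membership idea described above, the following Python sketch runs a bare-bones fuzzy k-modes pass and reads the complement of the maximum membership as an abnormality degree. The fuzziness parameter, toy data, and update details are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def matching_dissim(mode, X):
    """Simple matching dissimilarity: number of mismatched categorical attributes."""
    return (X != mode).sum(axis=1)

def fuzzy_kmodes(X, k=2, m=1.7, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    modes = X[rng.choice(len(X), size=k, replace=False)]
    e = 1.0 / (m - 1.0)
    for _ in range(n_iter):
        d = np.array([matching_dissim(z, X) for z in modes]) + 1e-9  # (k, n)
        u = d ** -e / (d ** -e).sum(axis=0)        # fuzzy memberships, columns sum to 1
        for c in range(k):                          # update each mode attribute-wise
            for j in range(X.shape[1]):
                cats, inv = np.unique(X[:, j], return_inverse=True)
                weight = np.bincount(inv, weights=u[c] ** m, minlength=len(cats))
                modes[c, j] = cats[np.argmax(weight)]
    return u, modes

# toy categorical records; the last one is an injected anomaly
X = np.array([list("aab"), list("aab"), list("abb"), list("aab"), list("zzz")])
u, _ = fuzzy_kmodes(X)
print(1.0 - u.max(axis=0))   # abnormality degree: high when no cluster claims the point
```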
Procedia PDF Downloads 53
52 A Neural Network Based Clustering Approach for Imputing Multivariate Values in Big Data
Authors: S. Nickolas, Shobha K.
Abstract:
The treatment of incomplete data is an important step in data pre-processing. Missing values create a noisy environment in all applications and are an unavoidable problem in big data management and analysis. Numerous techniques, such as discarding rows with missing values, mean imputation, expectation maximization, neural networks with evolutionary algorithms or optimized techniques, and hot deck imputation, have been introduced by researchers for handling missing data. Among these, imputation techniques play a positive role in filling missing values when it is necessary to use all records in the data rather than discard records with missing values. In this paper, we propose a novel artificial neural network based clustering algorithm, Adaptive Resonance Theory-2 (ART2), for imputation of missing values in mixed-attribute data sets. ART2 can recognize learned models quickly and adapt to new objects rapidly. It carries out model-based clustering by using competitive learning and a self-stabilizing mechanism in dynamic environments without supervision. The proposed approach not only imputes the missing values but also provides information about handling outliers.
Keywords: ART2, data imputation, clustering, missing data, neural network, pre-processing
Procedia PDF Downloads 274
51 Non-Parametric Regression over Its Parametric Counterparts with Large Sample Size
Authors: Jude Opara, Esemokumo Perewarebo Akpos
Abstract:
This paper compares non-parametric linear regression with its parametric counterpart for a large sample size. A data set on anthropometric measurements of 50 randomly selected primary school pupils was used for the analysis. The data were subjected to a normality test using the Anderson-Darling technique, which showed that the residuals from the commonly used least squares method of fitting an equation to a set of (x, y) data points are not normally distributed (i.e., they do not follow a Gaussian distribution). The algorithms for nonparametric Theil's regression are stated in this paper, as well as those of its parametric OLS counterpart. The R programming language was used for the analysis. The results showed that there exists a significant relationship between the response and the explanatory variable for both the parametric and non-parametric regressions. To assess the efficiency of one method over the other, the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were used, and it was found that the nonparametric regression performs better than its parametric counterpart due to its lower values of both AIC and BIC. The study recommends that future researchers examine similar data for the presence of outliers, expunge them if detected, and re-analyze to compare results.
Keywords: Theil's regression, Bayesian information criterion, Akaike information criterion, OLS
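The paper's analysis was done in R; as a language-agnostic illustration of the stated algorithm, here is a minimal Python sketch of Theil's regression on invented anthropometric-style numbers.

```python
import numpy as np
from itertools import combinations

def theil_regression(x, y):
    """Theil's nonparametric line: the slope is the median of all pairwise
    slopes and the intercept is the median residual offset, so no normality
    assumption on the errors is required."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    b1 = np.median(slopes)
    b0 = np.median(y - b1 * x)
    return b0, b1

# invented anthropometric-style data: height (cm) vs. weight (kg)
x = np.array([110.0, 115, 120, 125, 130, 135, 140])
y = np.array([18.5, 20.1, 21.8, 23.2, 25.0, 26.9, 29.0])
print(theil_regression(x, y))
```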
Procedia PDF Downloads 305
50 Geospatial Network Analysis Using Particle Swarm Optimization
Authors: Varun Singh, Mainak Bandyopadhyay, Maharana Pratap Singh
Abstract:
The shortest path (SP) problem concerns finding the shortest path from a specific origin to a specified destination in a given network while minimizing the total cost associated with the path. This problem has widespread applications. Important applications of the SP problem include vehicle routing in transportation systems, particularly in the field of in-vehicle Route Guidance Systems (RGS), and the traffic assignment problem in transportation planning. Well-known evolutionary methods like Genetic Algorithms (GA), Ant Colony Optimization, and Particle Swarm Optimization (PSO) have been applied to complex optimization problems to overcome the shortcomings of existing shortest path analysis methods. It has been reported by various researchers that PSO performs better than other evolutionary optimization algorithms in terms of success rate and solution quality. Further, Geographic Information Systems (GIS) have emerged as key information systems for geospatial data analysis and visualization. This research paper focuses on the application of PSO for solving the shortest path problem between multiple points of interest (POI) based on spatial data of Allahabad City and traffic speed data collected using GPS. Geovisualization of the results of the analysis is carried out in GIS.
Keywords: particle swarm optimization, GIS, traffic data, outliers
Procedia PDF Downloads 483
49 Corporate Social Responsibility and Financial Performance in the Tourism-Related Industries
Authors: Yu Shan Wang
Abstract:
Unlike other industries, the structure of the tourism industry depends to a large degree on environmental and cultural resources. The industry has to undertake social responsibilities for its commercial behaviour. This paper uses the seven dimensions of the KLD STATS in 1991-2011 as the indicator of CSR practices. The purpose is to investigate which CSR activities create significant impacts on accounting-based financials and firm values by delving into different CSR dimensions. Meanwhile, this paper takes into consideration the S&P 500 and control variables (firm size and financial leverage). In fact, the commercial behavior of the tourism-related industry may result in negative impacts on the economy and the society. Therefore, this paper classifies a positive set of CSR elements and a negative set of CSR elements for the tourism-related industry in order to examine their respective effects on short-term profitability and long-term firm values. This can shed light on which CSR dimensions exhibit significant impacts on CFP better than holistic CSR indicators, and hence provide more useful information to investors and corporates. This paper uses quantile regressions to avoid the impact of outliers in the data set. This helps to offer specific information so that companies can make informed decisions.
Keywords: corporate social responsibility, CSR, firm value, tourism, corporate financial performance, CFP
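To make the outlier-robust estimator concrete, here is a minimal quantile-regression sketch with statsmodels on invented data; the variable names and coefficients are illustrative stand-ins, not the paper's KLD data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# invented stand-ins for the paper's variables: a CSR score plus the two
# controls (firm size, leverage) explaining a financial-performance measure
rng = np.random.default_rng(1)
df = pd.DataFrame({"csr": rng.normal(size=200),
                   "size": rng.normal(size=200),
                   "lev": rng.normal(size=200)})
df["cfp"] = 0.5 * df["csr"] + 0.2 * df["size"] - 0.1 * df["lev"] \
            + rng.standard_t(3, 200)          # heavy-tailed noise with outliers

# median regression (tau = 0.5) is far less sensitive to outliers than OLS;
# other quantiles show how effects differ across the CFP distribution
for q in (0.25, 0.5, 0.75):
    fit = smf.quantreg("cfp ~ csr + size + lev", df).fit(q=q)
    print(q, fit.params["csr"].round(3))
```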
Procedia PDF Downloads 288
48 A Robust and Adaptive Unscented Kalman Filter for the Air Fine Alignment of the Strapdown Inertial Navigation System/GPS
Authors: Jian Shi, Baoguo Yu, Haonan Jia, Meng Liu, Ping Huang
Abstract:
Adapting to the flexibility of modern warfare, a large number of guided weapons are launched from aircraft. Therefore, the inertial navigation system loaded in the weapon needs to undergo an alignment process in the air. This article proposes the following methods to address inaccurate modeling of the system under large misalignment angles, the reduction of filtering accuracy caused by outliers, and noise changes in GPS signals: first, considering the large misalignment errors of the Strapdown Inertial Navigation System (SINS)/GPS, a more accurate model is built rather than making a small-angle approximation, and the Unscented Kalman Filter (UKF) algorithm is used to estimate the state; then, taking into account the impact of GPS noise changes on the fine alignment algorithm, an innovation-based adaptive filtering algorithm is introduced to estimate the GPS noise in real time; at the same time, in order to improve the anti-interference ability of the air fine alignment algorithm, a robust filtering algorithm based on outlier detection is combined with the air fine alignment algorithm to improve its robustness. The algorithm can improve the alignment accuracy and robustness under interference conditions, which is verified by simulation.
Keywords: air alignment, fine alignment, inertial navigation system, integrated navigation system, UKF
Procedia PDF Downloads 166
47 Long-Term Subcentimeter-Accuracy Landslide Monitoring Using a Cost-Effective Global Navigation Satellite System Rover Network: Case Study
Authors: Vincent Schlageter, Maroua Mestiri, Florian Denzinger, Hugo Raetzo, Michel Demierre
Abstract:
Precise landslide monitoring with differential global navigation satellite system (GNSS) is well known, but technical or economic reasons limit its application by geotechnical companies. This study demonstrates the reliability and the usefulness of Geomon (Infrasurvey Sàrl, Switzerland), a stand-alone and cost-effective rover network. The system permits deploying up to 15 rovers, plus one reference station for differential GNSS. A dedicated radio communication links all the modules to a base station, where an embedded computer automatically provides all the relative positions (L1 phase, open-source RTKLib software) and populates an Internet server. Each measurement also contains information from an internal inclinometer, battery level, and position quality indices. Contrary to standard GNSS survey systems, which suffer from a limited number of beacons that must be placed in areas with good GSM signal, Geomon offers greater flexibility and permits a real overview of the whole landslide with good spatial resolution. Each module is powered with solar panels, ensuring autonomous long-term recordings. In this study, we have tested the system on several sites in the Swiss mountains, setting up to 7 rovers per site, for an 18-month-long survey. The aim was to assess the robustness and the accuracy of the system in different environmental conditions. In one case, we ran forced blind tests (vertical movements of a given amplitude) and compared various session parameters (duration from 10 to 90 minutes). The other cases were surveys of real landslide sites using fixed, optimized parameters. Sub-centimeter accuracy with few outliers was obtained using the best parameters (session duration of 60 minutes, baseline of 1 km or less), with the noise level on the horizontal component half that of the vertical one. The performance (percentage of aborted solutions, outliers) was reduced with sessions shorter than 30 minutes. The environment also had a strong influence on the percentage of aborted solutions (ambiguity search problem), due to multiple reflections or satellites obstructed by trees and mountains. The length of the baseline (reference-rover distance, single baseline processing) reduced the accuracy above 1 km but had no significant effect below this limit. In critical weather conditions, the system's robustness was limited: snow, avalanches, and frost covered some rovers, including the antennas and vertically oriented solar panels, leading to data interruption, and strong wind damaged a reference station. The possibility of changing the session parameters remotely was very useful. In conclusion, the rover network tested provided the foreseen sub-centimeter accuracy while providing a landslide survey of dense spatial resolution. The ease of implementation and the fully automatic long-term survey were time-saving. Performance strongly depends on surrounding conditions, but short pre-measurements should allow moving a rover to a better final placement. The system offers a promising hazard mitigation technique. Improvements could include data post-processing for alerts and automatic modification of the duration and number of sessions based on battery level and rover displacement velocity.
Keywords: GNSS, GSM, landslide, long-term, network, solar, spatial resolution, sub-centimeter
Procedia PDF Downloads 111
46 Radiological Assessment of Fish Samples Due to Natural Radionuclides in River Yobe, North Eastern Nigeria
Authors: H. T. Abba, Abbas Baba Kura
Abstract:
An assessment of the natural radioactivity of some fish samples in river Yobe was conducted using the gamma spectroscopy method with a NaI(Tl) detector. Radioactivity is a phenomenon that leads to the production of radiation, and radiation is known to trigger or induce cancer. The fish were analyzed to estimate the activity concentrations due to the natural radionuclides Radium-226 (226Ra), Thorium-232 (232Th) and Potassium-40 (40K). The obtained results show that the activity concentration for 226Ra in all the fish samples collected ranges from 15.23±2.45 Bq kg-1 to 67.39±2.13 Bq kg-1, with an average value of 34.13±1.34 Bq kg-1. That of 232Th ranges from 42.66±0.81 Bq kg-1 to 201.18±3.82 Bq kg-1, and the average value stands at 96.01±3.82 Bq kg-1. The activity concentration for 40K ranges between 243.3±1.56 Bq kg-1 and 618.2±2.81 Bq kg-1, and the average is 413.92±1.7 Bq kg-1. This study indicated that the average daily intake due to natural activity from the fish is 0.913 Bq/day, 2.577 Bq/day and 11.088 Bq/day for 226Ra, 232Th and 40K, respectively. Most of the fish activity concentrations are within the acceptable limits. However, fish from locations F02, F07 and F12 were outliers, with significant effective dose values of 112.53 μSv y-1, 121.11 μSv y-1 and 114.32 μSv y-1, respectively. This could be attributed to variation in geological formations within the river as well as the feeding habits of these fish. The work shows that consumers of fish from River Yobe face no risk of radioactivity ingestion, even though no amount of radiation is assumed to be totally safe.
Keywords: radiation, radio-activity, dose, radionuclides, river Yobe
Procedia PDF Downloads 318
45 Pyramidal Lucas-Kanade Optical Flow Based Moving Object Detection in Dynamic Scenes
Authors: Hyojin Lim, Cuong Nguyen Khac, Yeongyu Choi, Ho-Youl Jung
Abstract:
In this paper, we propose a simple moving object detection method based on motion vectors obtained from pyramidal Lucas-Kanade optical flow. The proposed method detects moving objects such as pedestrians, other vehicles, and obstacles at the front-side of the host vehicle, and it can provide a warning to the driver. Motion vectors are obtained using pyramidal Lucas-Kanade optical flow, and outliers are eliminated by comparing the amplitude of each vector with a pre-defined threshold value. The background model is obtained by calculating the mean and the variance of the amplitudes of recent motion vectors in a rectangular local region called a cell. The model is applied as the reference for classifying motion vectors as belonging to moving objects or to the background. Motion vectors are clustered into rectangular regions using the unsupervised K-means algorithm. A labeling method is applied to merge groups that are close to each other, based on the distance between the center points of their rectangles. Through simulations of four kinds of scenarios, such as a motorbike, a vehicle, and pedestrians approaching the host vehicle, we show that the proposed method is simple but efficient for moving object detection in parking lots.
Keywords: moving object detection, dynamic scene, optical flow, pyramidal optical flow
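The pipeline above (pyramidal LK, amplitude thresholding, K-means grouping) maps cleanly onto OpenCV primitives; a minimal sketch on synthetic frames follows. The threshold, cluster count, and synthetic shift are assumptions for illustration, not the paper's settings.

```python
import cv2
import numpy as np

# minimal sketch with synthetic frames: a textured image shifted by a few pixels
rng = np.random.default_rng(0)
prev = (rng.random((240, 320)) * 255).astype(np.uint8)
curr = np.roll(prev, shift=(2, 3), axis=(0, 1))

# track corners between frames with pyramidal Lucas-Kanade
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=300, qualityLevel=0.01, minDistance=7)
p1, status, _err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                            winSize=(21, 21), maxLevel=3)

ok = status.ravel() == 1
flow = (p1 - p0).reshape(-1, 2)[ok]
pts = p0.reshape(-1, 2)[ok]

# discard weak vectors (illustrative amplitude threshold), then K-means the rest
amp = np.linalg.norm(flow, axis=1)
moving = np.float32(pts[amp > 1.0])
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(moving, 3, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)
print(centers)   # centers of candidate moving-object regions
```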
Procedia PDF Downloads 349
44 Re-Imagining and De-Constructing the Global Security Architecture
Authors: Smita Singh
Abstract:
The paper develops a critical framework for the hegemonic discourses resorted to by the dominant powers in the global security architecture. Within this framework, security is viewed as a discourse through which identities and threats are represented and produced to legitimize the security concerns of a few at the cost of others. International security has long been driven and dominated by power relations. Since the end of the Cold War, global transformations have triggered contestations of the idea of security at both the theoretical and practical levels. This widening and deepening of the concept of security has challenged the existing power hierarchies at the theoretical level but has not altered the substance and actors defining it. When discourses are introduced into security studies, several critical questions emerge: how has power shaped the security policies of the globe through language? How does one understand the meanings and impact of those discourses? Who decides the agenda, rules, players and outliers of security? Language, as a symbolic system and form of power, is fluid and not fixed. Over the years, the dominant Western powers, led by the United States of America, have employed various discursive practices such as humanitarian intervention, responsibility to protect, non-proliferation, human rights, the war on terror and so on to reorient the constitution of identities and interests and hence the policies that need to be adopted for their actualization. These power relations are illustrated in this paper through the narratives used in the non-proliferation regime. The hierarchical security dynamic is a manifestation of global power relations driven by many factors, including discourses.
Keywords: hegemonic discourse, global security, non-proliferation regime, power politics
Procedia PDF Downloads 318
43 Hybrid GNN Based Machine Learning Forecasting Model for Industrial IoT Applications
Authors: Atish Bagchi, Siva Chandrasekaran
Abstract:
Background: According to World Bank national accounts data, the estimated global manufacturing value-added output in 2020 was 13.74 trillion USD. These manufacturing processes are monitored, modelled, and controlled by advanced, real-time, computer-based systems, e.g., Industrial IoT, PLC, SCADA, etc. These systems measure and manipulate a set of physical variables, e.g., temperature, pressure, etc. Despite the use of IoT, SCADA, etc., in manufacturing, studies suggest that unplanned downtime leads to economic losses of approximately 864 billion USD each year. Therefore, real-time, accurate detection, classification and prediction of machine behaviour are needed to minimise financial losses. Although vast literature exists on time-series data processing using machine learning, the challenges faced by the industries that lead to unplanned downtimes are: the current algorithms do not efficiently handle the high-volume streaming data from industrial IoT sensors and were tested on static and simulated datasets; while the existing algorithms can detect significant 'point' outliers, most do not handle contextual outliers (e.g., values within the normal range but occurring at an unexpected time of day) or subtle changes in machine behaviour; and machines are revamped periodically as part of planned maintenance programmes, which changes the assumptions on which the original AI models were created and trained. Aim: This research study aims to deliver a Graph Neural Network (GNN) based hybrid forecasting model that interfaces with the real-time machine control system and can detect and predict machine behaviour and behavioural changes (anomalies) in real time. This research will help manufacturing industries and utilities, e.g., water, electricity, etc., reduce unplanned downtimes and consequential financial losses. Method: The data stored within a process control system, e.g., Industrial IoT, Data Historian, is generally sampled during data acquisition from the sensor (source) and when persisting in the Data Historian to optimise storage and query performance. The sampling may inadvertently discard values that might contain subtle aspects of behavioural changes in machines. This research proposes a hybrid forecasting and classification model which combines the expressive and extrapolation capability of GNNs, enhanced with estimates of entropy and spectral changes in the sampled data and additional temporal contexts, to reconstruct the likely temporal trajectory of machine behavioural changes. The proposed real-time model belongs to the Deep Learning category of machine learning and interfaces with the sensors directly or through a 'Process Data Historian', SCADA, etc., to perform forecasting and classification tasks. Results: The model was interfaced with a Data Historian holding time-series data from 4 flow sensors within a water treatment plant for 45 days. The recorded sampling interval for a sensor varied from 10 sec to 30 min. Approximately 65% of the available data was used for training the model, 20% for validation, and the rest for testing. The model identified the anomalies within the water treatment plant and predicted the plant's performance. These results were compared with the data reported by the plant SCADA-Historian system and the official data reported by the plant authorities. The model's accuracy was much higher (by 20%) than that reported by the SCADA-Historian system and matched the validated results declared by the plant auditors.
Conclusions: The research demonstrates that a hybrid GNN based approach enhanced with entropy calculation and spectral information can effectively detect and predict a machine's behavioural changes. The model can interface with a plant's process control system in real time to perform forecasting and classification tasks to aid asset management engineers in operating their machines more efficiently and reducing unplanned downtimes. A series of trials is planned for this model in the future in other manufacturing industries.
Keywords: GNN, entropy, anomaly detection, industrial time-series, AI, IoT, Industry 4.0, machine learning
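The entropy and spectral enhancements are described only at a high level; the sketch below shows the kind of per-window features such an approach might fuse with GNN embeddings. The histogram binning, FFT summary, and all names are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import entropy

def behaviour_features(window, bins=16):
    """Per-window summary features: a histogram entropy estimate (flags
    distributional change) and a spectral centroid (flags frequency change),
    both of which can expose contextual anomalies that a raw-value threshold
    would miss."""
    hist, _ = np.histogram(window, bins=bins, density=True)
    h = entropy(hist + 1e-12)
    spectrum = np.abs(np.fft.rfft(window - window.mean()))
    centroid = (spectrum * np.arange(len(spectrum))).sum() / (spectrum.sum() + 1e-12)
    return h, centroid

# illustrative flow-sensor window (irregular sampling assumed resampled beforehand)
rng = np.random.default_rng(6)
signal = np.sin(np.linspace(0, 20, 256)) + 0.1 * rng.normal(size=256)
print(behaviour_features(signal))
```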
Procedia PDF Downloads 150
42 General Architecture for Automation of Machine Learning Practices
Authors: U. Borasi, Amit Kr. Jain, Rakesh, Piyush Jain
Abstract:
Data collection, data preparation, model training, model evaluation, and deployment are all processes in a typical machine learning workflow. Training data needs to be gathered and organised. This often entails collecting a sizable dataset and cleaning it to remove or correct any inaccurate or missing information. Preparing the data for use in the machine learning model requires pre-processing it after it has been acquired. This often entails actions like scaling or normalising the data, handling outliers, selecting appropriate features, reducing dimensionality, etc. This pre-processed data is then used to train a model with some machine learning algorithm. After the model has been trained, it needs to be assessed by determining metrics like accuracy, precision, and recall, utilising a test dataset. Every time a new model is built, both data pre-processing and model training, two crucial processes in the machine learning (ML) workflow, must be carried out. Various machine learning algorithms can be employed with every single approach to data pre-processing, generating a large set of combinations to choose from. For example: for every method of handling missing values (dropping records, replacing with the mean, etc.), for every scaling technique, and for every combination of features selected, a different algorithm can be used. As a result, in order to get the optimum outcomes, these tasks are frequently repeated in different combinations. This paper suggests a simple architecture for organizing this large "combination set of pre-processing steps and algorithms" into an automated workflow which simplifies the task of carrying out all possibilities.
Keywords: machine learning, automation, AutoML, architecture, operator pool, configuration, scheduler
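A minimal scikit-learn sketch of the "combination set" idea: a grid search enumerates every pairing of pre-processing operators and algorithms, which is the space the paper's scheduler would automate. The specific operator pool here is illustrative, not the paper's.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# an "operator pool" of interchangeable pre-processing steps and models;
# GridSearchCV tries every combination with cross-validation
pipe = Pipeline([("impute", SimpleImputer()),
                 ("scale", StandardScaler()),
                 ("model", DecisionTreeClassifier())])

grid = {"impute__strategy": ["mean", "median"],
        "scale": [StandardScaler(), MinMaxScaler()],
        "model": [DecisionTreeClassifier(), SVC()]}

X, y = load_breast_cancer(return_X_y=True)
search = GridSearchCV(pipe, grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```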
Procedia PDF Downloads 57
41 Activation of Spermidine/Spermine N1-Acetyltransferase 1 (SSAT-1) as Biomarker in Breast Cancer
Authors: Rubina Ghani, Sehrish Zia, Afifa Fatima Rafique, Shaista Emad
Abstract:
Background: Cancer is a leading cause of death worldwide, with breast cancer being the most common cancer in women. Pakistan has the highest rate of breast cancer cases among Asian countries. Early and accurate diagnosis is crucial for treatment outcomes and quality of life. Method: This is a case-control study with a sample size of 150: 100 suspected cancer cases, 25 healthy controls, and 25 diagnosed breast cancer cases. To analyze SSAT-1 mRNA expression in whole blood, Zymo Research Quick-RNA Miniprep and innuSCRIPT One-Step RT-PCR SYBR Green kits were used. Result: The total mRNA was isolated, and the expression of SSAT-1 was measured using RT-qPCR. The threshold cycle (Ct) values were used to determine the amount of each mRNA. ΔCt values were calculated by taking the difference between Ct(SSAT-1) and Ct(GAPDH), and the ΔCt values were further summarized with the median absolute deviation for all the samples within the same experimental group. Samples that did not correlate with the results were taken as outliers and excluded from the analysis. The relative fold change is shown as 2^-ΔΔCt values. Suspected cases showed a maximum fold change of 32.24, against a control fold change of 1.31. Conclusion: The study reveals an overexpression of SSAT-1 in breast cancer. Furthermore, SSAT-1 can be used as a diagnostic, prognostic, and therapeutic marker for early diagnosis of cancer.
Keywords: breast cancer, spermidine/spermine, qPCR, mRNA
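For readers unfamiliar with the 2^-ΔΔCt arithmetic above, here is a small worked sketch. The Ct values, the 3-MAD outlier cutoff, and the control ΔCt are invented for illustration, not the study's data.

```python
import numpy as np

def fold_change(ct_ssat1, ct_gapdh, control_delta_ct):
    """Relative SSAT-1 expression by the 2^-ddCt method:
    dCt = Ct(SSAT-1) - Ct(GAPDH); ddCt = dCt - control dCt."""
    d_ct = np.asarray(ct_ssat1) - np.asarray(ct_gapdh)
    # flag outliers with the median absolute deviation (3-MAD rule assumed here)
    mad = np.median(np.abs(d_ct - np.median(d_ct)))
    keep = np.abs(d_ct - np.median(d_ct)) <= 3 * 1.4826 * mad
    dd_ct = d_ct[keep] - control_delta_ct
    return 2.0 ** (-dd_ct)

# illustrative Ct values: the third sample is excluded as a dCt outlier
print(fold_change([22.1, 21.8, 27.0], [18.0, 17.9, 18.1], control_delta_ct=5.6))
```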
Procedia PDF Downloads 37
40 Effect of a Mindfulness Application on Graduate Nursing Students' Stress and Anxiety
Authors: Susan K. Steele-Moses, Aimee Badeaux
Abstract:
Background Literature: Nurse anesthesia education places high demands on students, both personally and professionally. High levels of anxiety affect students' mental, emotional, and physical well-being, which impacts their success. Whereas much research has focused on the health and well-being of graduate students, far less has focused specifically on student registered nurse anesthetists (SRNAs), who may experience higher levels of anxiety due to the rigor of their academic program. Current literature describes stressors experienced by SRNAs that cause anxiety and affect their performance, including personal, academic, clinical, interpersonal, emotional, and financial stressors. Sample: DNP-NA 2025 and DNP-NA 2024 cohorts (N = 36). Eighteen (66.7%) students participated in the study. Instrumentation: The DASS-21 was used to measure the participants' stress (7 items; α = .87) and anxiety (7 items; α = .74). Intervention: The MindShift meditation app, based on cognitive behavioral therapy, is being used daily before clinicals and exams to decrease nurse anesthesia students' stress and anxiety over time. Results: At baseline, the students exhibited a moderate level of stress, but their anxiety levels were low. The range of scores was 4-21 (out of 28) for stress (M = 12.88; SD = 5.40) and 0-16 (out of 28) for anxiety (M = 6.81; SD = 5.04). Both stress and anxiety were normally distributed [SW = .242 (stress); SW = .210 (anxiety)] without any outliers. There was a significant difference between the stress and anxiety levels (t = 5.55; p < .001) at baseline. Stress and anxiety will be measured over time, with the change analyzed using repeated measures ANOVA. Implications for Practice: The use of purposeful mindfulness meditation has been shown to decrease stress and anxiety in nursing students.
Keywords: mindfulness, meditation, graduate nursing education, nursing education
Procedia PDF Downloads 83
39 Improving Cheon-Kim-Kim-Song (CKKS) Performance with Vector Computation and GPU Acceleration
Authors: Smaran Manchala
Abstract:
Homomorphic Encryption (HE) enables computations on encrypted data without requiring decryption, mitigating data vulnerability during processing. Usable Fully Homomorphic Encryption (FHE) could revolutionize secure data operations across cloud computing, AI training, and healthcare, providing both privacy and functionality; however, the computational inefficiency of schemes like Cheon-Kim-Kim-Song (CKKS) hinders their widespread practical use. This study focuses on optimizing CKKS for faster matrix operations through the implementation of vector computation parallelization and GPU acceleration. The variable effects of vector parallelization on GPUs were explored, recognizing that while parallelization typically accelerates operations, it can introduce overhead that results in slower runtimes, especially in smaller, less computationally demanding operations. To assess performance, two neural network models, MLPN and CNN, were tested on the MNIST dataset using both ARM and x86-64 architectures, with the CNN chosen for its higher computational demands. Each test was repeated 1,000 times, and outliers were removed via Z-score analysis to measure the effect of vector parallelization on CKKS performance. Model accuracy was also evaluated under CKKS encryption to ensure the optimizations did not compromise results. According to the results of the trial runs, applying vector parallelization yielded a 2.63X overall efficiency increase, with a 1.83X performance advantage for the x86-64 architecture over ARM. Overall, these results suggest that the application of vector parallelization in tandem with GPU acceleration significantly improves the efficiency of CKKS, even when accounting for vector parallelization overhead, with potential impact on future zero-trust operations.
Keywords: CKKS scheme, runtime efficiency, fully homomorphic encryption (FHE), GPU acceleration, vector parallelization
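The Z-score trial-cleaning step mentioned above is simple to reproduce; a minimal sketch follows, with invented runtimes and an assumed cutoff of |z| <= 3.

```python
import numpy as np

def trimmed_mean_runtime(runtimes, z_cut=3.0):
    """Average benchmark runtime after Z-score outlier removal, mirroring the
    cleaning applied to the 1,000 repetitions described in the abstract."""
    r = np.asarray(runtimes, dtype=float)
    z = (r - r.mean()) / r.std(ddof=1)
    return r[np.abs(z) <= z_cut].mean()

# illustrative runtimes (ms) with a few stragglers from OS jitter
rng = np.random.default_rng(7)
runs = np.concatenate([rng.normal(120, 3, 995), [190, 210, 185, 240, 205]])
print(trimmed_mean_runtime(runs))
```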
Procedia PDF Downloads 23
38 Identification of Outliers in Flood Frequency Analysis: Comparison of Original and Multiple Grubbs-Beck Test
Authors: Ayesha S. Rahman, Khaled Haddad, Ataur Rahman
Abstract:
At-site flood frequency analysis is used to estimate flood quantiles when the at-site record length is reasonably long. In Australia, the FLIKE software has been introduced for at-site flood frequency analysis. The advantage of FLIKE is that, for a given application, the user can compare a number of the most commonly adopted probability distributions and parameter estimation methods relatively quickly using a Windows interface. The new version of FLIKE incorporates the multiple Grubbs and Beck test, which can identify multiple potentially influential low flows. This paper presents a case study of six catchments in eastern Australia which compares two outlier identification tests (the original Grubbs and Beck test and the multiple Grubbs and Beck test) and two commonly applied probability distributions (Generalized Extreme Value (GEV) and Log Pearson type 3 (LP3)) using the FLIKE software. It has been found that the multiple Grubbs and Beck test, when used with the LP3 distribution, provides more accurate flood quantile estimates than the LP3 distribution used with the original Grubbs and Beck test. Between these two methods, the differences in flood quantile estimates have been found to be up to 61% for the six study catchments. It has also been found that the GEV distribution (with L moments) and the LP3 distribution with the multiple Grubbs and Beck test provide quite similar results in most of the cases; however, a difference of up to 38% has been noted in the flood quantiles for an annual exceedance probability (AEP) of 1 in 100 for one catchment. These findings need to be confirmed with a greater number of stations across other Australian states.
Keywords: floods, FLIKE, probability distributions, flood frequency, outlier
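For illustration, a sketch of the single (original) Grubbs-Beck low-outlier screen on a log-transformed flood series. The K_N approximation follows the commonly cited Bulletin 17B form for the 10% significance level, and the flows are invented, so treat this only as a sketch of the test's shape, not a FLIKE reimplementation.

```python
import numpy as np

def grubbs_beck_low_outliers(flows):
    """Flag potentially influential low floods: values whose log10 falls below
    mean(logQ) - K_N * sd(logQ), with K_N from the Bulletin 17B approximation."""
    flows = np.sort(np.asarray(flows, dtype=float))
    logq = np.log10(flows)
    n = len(logq)
    kn = -0.9043 + 3.345 * np.sqrt(np.log10(n)) - 0.4046 * np.log10(n)
    threshold = logq.mean() - kn * logq.std(ddof=1)
    return flows[logq < threshold]

# illustrative annual-maximum series (m^3/s), not from the study catchments
flows = [2.0, 110, 150, 180, 220, 260, 300, 340, 410, 520, 640, 800]
print(grubbs_beck_low_outliers(flows))   # flags the unusually low flood
```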
Procedia PDF Downloads 450
37 Controlling the Process of a Chicken Dressing Plant through Statistical Process Control
Authors: Jasper Kevin C. Dionisio, Denise Mae M. Unsay
Abstract:
In a manufacturing firm, controlling the process ensures that optimum efficiency, productivity, and quality are achieved in the organization. An operation with no standardized procedure yields poor productivity, inefficiency, and an out-of-control process. This study focuses on controlling the small intestine processing of a chicken dressing plant through the use of Statistical Process Control (SPC). Since the operation does not employ a standard procedure and does not have an established standard time, the process, assessed through the observed times of the overall small intestine processing operation using an X-bar R control chart, is found to be out of control. To solve this problem, the researchers conducted a motion and time study aiming to establish a standard procedure for the operation. The normal operator was selected through the use of the Westinghouse Rating System. Instead of utilizing the traditional motion and time study, the researchers used the X-bar R control chart in determining the process average that is used for establishing the standard time. The observed times of the normal operator were noted and plotted on the X-bar R control chart. Out-of-control points due to assignable causes were removed, and the process average, or the average time in which the normal operator conducted the process, now in control and free from any outliers, was obtained. The process average was then used in determining the standard time of small intestine processing. As a recommendation, the researchers suggest the implementation of the established standard time, which is in consonance with the standard procedure adopted from the normal operator. With that recommendation, the whole operation should achieve a 45.54% increase in productivity.
Keywords: motion and time study, process controlling, statistical process control, X-bar R control chart
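A minimal sketch of the X-bar and R chart limits used in the screening step above; the subgroup size, Shewhart constants for n = 5, and the cycle times are illustrative, not the plant's data.

```python
import numpy as np

# standard Shewhart chart constants for subgroup size n = 5
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(samples):
    """Control limits for X-bar and R charts from rational subgroups. Points
    outside the limits (assignable causes) would be removed before the
    process average is used to set the standard time."""
    samples = np.asarray(samples)           # shape: (subgroups, n)
    xbar = samples.mean(axis=1)
    r = samples.max(axis=1) - samples.min(axis=1)
    xbarbar, rbar = xbar.mean(), r.mean()
    return {"xbar_limits": (xbarbar - A2 * rbar, xbarbar + A2 * rbar),
            "r_limits": (D3 * rbar, D4 * rbar),
            "process_average": xbarbar}

# illustrative observed cycle times (minutes) for small-intestine processing
obs = [[4.1, 4.3, 4.0, 4.2, 4.4], [4.0, 4.1, 4.2, 4.3, 4.1],
       [4.5, 4.6, 4.4, 4.7, 4.5], [4.1, 4.0, 4.2, 4.1, 4.3]]
print(xbar_r_limits(obs))
```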
Procedia PDF Downloads 217
36 Examining the Relationship between Family Functioning and Perceived Self-Efficacy
Authors: Fenni Sim
Abstract:
Objectives: The purpose of the study is to examine the relationship between family functioning and level of self-efficacy: how family functioning can potentially affect self-efficacy, which will eventually lead to better clinical outcomes. The hypothesis was: 'Patients on haemodialysis with perceived higher family functioning are more likely to have a higher perceived level of self-efficacy.' Methods: The study was conducted with a mixed methodology, combining quantitative and qualitative data collection through a survey and semi-structured interviews, respectively. The General Self-Efficacy Scale and SCORE-15 were self-administered by participants. The data were analysed with the correlation analysis method using Microsoft Excel. 79 patients were recruited for the study through random sampling. 6 participants whose results did not reflect the hypothesis were then recruited for the qualitative study. Interpretive phenomenological analysis was used to analyse the qualitative data. Findings: The hypothesis that higher family functioning leads to higher perceived self-efficacy was accepted. The correlation coefficient of -0.21 suggested a mild correlation between the two variables. However, only 4.6% of the variation in perceived self-efficacy is accounted for by the variation in family functioning. The qualitative study yielded three themes that might explain the variations in the outliers: (1) level of physical functioning affects perceived self-efficacy, (2) instrumental support from family influenced the perceived level of family functioning and self-efficacy, and (3) acceptance of illness reflects a higher level of self-efficacy. Conclusion: While family functioning does have an impact on perceived self-efficacy, there are many intrapersonal and physical factors that may affect self-efficacy. The concepts of family functioning and self-efficacy are more appropriately seen as complementing each other in helping a patient manage his illness. Healthcare social workers can look at how family functioning supports the individual needs of patients with different trajectories of end-stage renal disease (ESRD) and the support they can provide to improve one's self-efficacy.
Keywords: chronic kidney disease, coping with illness, family functioning, self efficacy
Procedia PDF Downloads 173
35 An Improved Robust Algorithm Based on Cubature Kalman Filter for Single-Frequency Global Navigation Satellite System/Inertial Navigation Tightly Coupled System
Authors: Hao Wang, Shuguo Pan
Abstract:
The Global Navigation Satellite System (GNSS) signal received by a dynamic vehicle in a harsh environment is frequently interfered with and blocked, which generates gross errors affecting the positioning accuracy of GNSS/Inertial Navigation System (INS) integrated navigation. Therefore, this paper puts forward an improved robust Cubature Kalman Filter (CKF) algorithm for ambiguity resolution in a single-frequency GNSS/INS tightly coupled system. Firstly, the dynamic model and measurement model of a single-frequency GNSS/INS tightly coupled system were established, and the method for GNSS integer ambiguity resolution with INS aiding is studied. Then, we analyzed the influence of pseudo-range observations with gross errors on GNSS/INS integrated positioning accuracy. To reduce the influence of outliers, this paper improved the CKF algorithm and realized an intelligent selection of robust strategies by judging the ill-conditioned matrix. Finally, a field navigation test was performed to demonstrate the effectiveness of the proposed algorithm based on the double-differenced solution mode. The experiment proved that the improved robust algorithm can greatly weaken the influence of separate, continuous, and hybrid observation anomalies, enhancing the reliability and accuracy of GNSS/INS tightly coupled navigation solutions.
Keywords: GNSS/INS integrated navigation, ambiguity resolution, Cubature Kalman filter, robust algorithm
Procedia PDF Downloads 97
34 Comparison of Some Robust Regression Methods with the OLS Method, with Application
Authors: Sizar Abed Mohammed, Zahraa Ghazi Sadeeq
Abstract:
The classic least squares (OLS) method is used to estimate linear regression parameters when its assumptions hold, and the resulting estimators then have good properties such as unbiasedness, minimum variance, consistency, and so on. Alternative statistical techniques, known as robust (or resistant) methods, have been developed to estimate the parameters when the data are contaminated with outliers. In this paper, three robust methods are studied: the maximum likelihood type estimator (M-estimator), the modified maximum likelihood type estimator (MM-estimator) and the least trimmed squares estimator (LTS-estimator), and their results are compared with the OLS method. These methods were applied to real data taken from the Duhok company for manufacturing furniture, and the obtained results were compared using the criteria Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE) and Mean Sum of Absolute Error (MSAE). Important conclusions of this study are: for the furniture line, the outlying values detected by the four methods were similar in number and very close to the data, which indicates that the distribution of the standard errors is close to normal; but in the doors line data, OLS detected fewer outlying values than the robust methods, which means that the distribution of the standard errors departs far from normality. Another important conclusion is that, for the doors line, the parameter values estimated by OLS are very far from those estimated by the robust methods; the LTS-estimator gave better results according to the MSE criterion, the M-estimator gave better results according to the MAPE criterion, and, according to the MSAE criterion, the MM-estimator was better. The programs S-Plus (version 8.0, Professional 2007), Minitab (version 13.2) and SPSS (version 17) were used to analyze the data.
Keywords: robust regression, LTS, M-estimator, MSE
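The paper's analysis used S-Plus, Minitab and SPSS; for a rough Python analogue, statsmodels provides M-estimation via iteratively reweighted least squares. It does not ship MM- or LTS-estimators, so Tukey's biweight is used below only as a redescending stand-in, and the contaminated data are invented.

```python
import numpy as np
import statsmodels.api as sm

# contaminated toy data standing in for the furniture measurements
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 40)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 40)
y[::10] += 15                                   # inject outliers

X = sm.add_constant(x)
fits = [("OLS", sm.OLS(y, X).fit()),
        ("Huber M", sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()),
        ("Tukey biweight", sm.RLM(y, X, M=sm.robust.norms.TukeyBiweight()).fit())]

for name, res in fits:
    mse = np.mean((y - res.fittedvalues) ** 2)   # same comparison criterion idea
    print(name, np.round(res.params, 3), "MSE:", round(mse, 2))
```

The robust fits stay near the true line (intercept 2, slope 1.5) while OLS is pulled toward the injected outliers, which is the contrast the paper's criteria quantify.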
Procedia PDF Downloads 232
33 Interpersonal Variation of Salivary Microbiota Using Denaturing Gradient Gel Electrophoresis
Authors: Manjula Weerasekera, Chris Sissons, Lisa Wong, Sally Anderson, Ann Holmes, Richard Cannon
Abstract:
The aim of this study was to characterize the bacterial populations and yeasts in saliva by polymerase chain reaction followed by denaturing gradient gel electrophoresis (PCR-DGGE) and to measure yeast levels by culture. PCR-DGGE was performed to identify oral bacteria and yeasts in 24 saliva samples. DNA was extracted and used to generate amplicons of the V2–V3 hypervariable region of the bacterial 16S rDNA gene using PCR. Further, universal primers targeting the large subunit rDNA gene (25S-28S) of fungi were used to amplify yeasts present in human saliva. The resulting PCR products were subjected to denaturing gradient gel electrophoresis using a universal mutation detection system. DGGE bands were extracted and sequenced using the Sanger method. A potential relationship was evaluated between the groups of bacteria identified by cluster analysis of DGGE fingerprints and the yeast levels and diversity. Significant interpersonal variation of the salivary microbiome was observed. Cluster and principal component analysis of the bacterial DGGE patterns yielded three significant major clusters, plus outliers. Seventeen of the 24 (71%) saliva samples were yeast-positive, with counts of up to 10³ cfu/mL. Predominantly, C. albicans and six other species of yeast were detected. The presence, amount and species of yeast showed no clear relationship to the bacterial clusters. The microbial community in saliva showed significant variation between individuals. The lack of association between yeasts and the bacterial fingerprints in saliva suggests significant person-specific ecological independence in highly complex oral biofilm systems under normal oral conditions.
Keywords: bacteria, denaturing gradient gel electrophoresis, oral biofilm, yeasts
Procedia PDF Downloads 222
32 Modeling Average Paths Traveled by Ferry Vessels Using AIS Data
Authors: Devin Simmons
Abstract:
At the USDOT’s Bureau of Transportation Statistics, a biannual census of ferry operators in the U.S. is conducted, with results such as route mileage used to determine federal funding levels for operators. AIS data allows for the possibility of using GIS software and geographical methods to confirm operator-reported mileage for individual ferry routes. As part of the USDOT’s work on the ferry census, an algorithm was developed that uses AIS data for ferry vessels in conjunction with known ferry terminal locations to model the average route travelled for use as both a cartographic product and confirmation of operator-reported mileage. AIS data from each vessel is first analyzed to determine individual journeys based on the vessel’s velocity, and changes in velocity over time. These trips are then converted to geographic linestring objects. Using the terminal locations, the algorithm then determines whether the trip represented a known ferry route. Given a large enough dataset, routes will be represented by multiple trip linestrings, which are then filtered by DBSCAN spatial clustering to remove outliers. Finally, these remaining trips are ready to be averaged into one route. The algorithm interpolates the point on each trip linestring that represents the start point. From these start points, a centroid is calculated, and the first point of the average route is determined. Each trip is interpolated again to find the point that represents one percent of the journey’s completion, and the centroid of those points is used as the next point in the average route, and so on until 100 points have been calculated. Routes created using this algorithm have shown demonstrable improvement over previous methods, which included the implementation of a LOESS model. Additionally, the algorithm greatly reduces the amount of manual digitizing needed to visualize ferry activity.
Keywords: ferry vessels, transportation, modeling, AIS data
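The fractional-interpolation-plus-centroid step described above translates directly into a few lines of shapely; a minimal sketch on invented trip geometries (the DBSCAN filtering stage is omitted here).

```python
import numpy as np
from shapely.geometry import LineString

def average_route(trips, n_points=100):
    """Average several trip linestrings for one ferry route: interpolate each
    trip at the same fractional positions along its length, then take the
    centroid of the interpolated points at every fraction."""
    route = []
    for frac in np.linspace(0.0, 1.0, n_points):
        pts = [trip.interpolate(frac, normalized=True) for trip in trips]
        route.append((np.mean([p.x for p in pts]), np.mean([p.y for p in pts])))
    return LineString(route)

# three illustrative GPS trips between the same two terminals
trips = [LineString([(0, 0), (5, 1.2), (10, 0)]),
         LineString([(0, 0.1), (5, 0.8), (10, -0.1)]),
         LineString([(0, -0.1), (5, 1.0), (10, 0.1)])]
print(average_route(trips).length)   # mileage of the averaged route
```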
Procedia PDF Downloads 176
31 Enhancement of Density-Based Spatial Clustering Algorithm with Noise for Fire Risk Assessment and Warning in Metro Manila
Authors: Pinky Mae O. De Leon, Franchezka S. P. Flores
Abstract:
This study focuses on applying an enhanced density-based spatial clustering algorithm with noise (DBSCAN) for fire risk assessment and warning in Metro Manila. Unlike other clustering algorithms, DBSCAN is known for its ability to identify arbitrarily shaped clusters and its resistance to noise. However, its performance diminishes when handling high-dimensional data, where it can read noise points as relevant data points. Also, the algorithm is dependent on the parameters (eps and minPts) set by the user; choosing the wrong parameters can greatly affect its clustering result. To overcome these challenges, the study proposes three key enhancements: first, to utilize multiple MinHash and locality-sensitive hashing to decrease the dimensionality of the data set; second, to implement Jaccard similarity before applying the parameter epsilon to ensure that only similar data points are considered neighbors; and third, to use the concept of the Jaccard neighborhood along with the parameter minPts to improve the classification of core points and the identification of noise in the data set. The results show that the modified DBSCAN algorithm outperformed three other clustering methods: it achieved fewer outliers, which facilitated a clearer identification of fire-prone areas; a high silhouette score, indicating well-separated clusters that distinctly identify areas with potential fire hazards; and an exceptionally low Davies-Bouldin index together with a high Calinski-Harabasz score, highlighting its ability to form compact and well-defined clusters, making it an effective tool for assessing fire hazard zones. This study is intended for assessing the areas in Metro Manila that are most prone to fire risk.
Keywords: DBSCAN, clustering, Jaccard similarity, MinHash LSH, fires
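A plain-DBSCAN analogue of the Jaccard-neighborhood idea can be sketched with scikit-learn, which accepts a Jaccard distance metric on boolean features; the MinHash/LSH dimensionality-reduction speed-up from the paper is omitted, and the data and parameters are invented.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# illustrative one-hot incident features per grid cell of the city
rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(60, 12)).astype(bool)

# neighbours are points whose Jaccard distance (1 - Jaccard similarity) is
# within eps, so only genuinely similar cells count toward the minPts
# core-point condition
db = DBSCAN(eps=0.4, min_samples=4, metric="jaccard").fit(X)
labels = db.labels_                       # -1 marks noise / potential outliers
print("clusters:", len(set(labels)) - (1 if -1 in labels else 0),
      "noise points:", int((labels == -1).sum()))
```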
Procedia PDF Downloads 0
30 Implementation of Algorithm K-Means for Grouping District/City in Central Java Based on Macro Economic Indicators
Authors: Nur Aziza Luxfiati
Abstract:
Clustering is the partitioning of data sets into sub-sets or groups in such a way that elements within one group share properties with a high level of similarity, while the similarity between groups is low. The K-Means algorithm is one of the most widely used clustering algorithms as a grouping tool in scientific and industrial applications because its basic idea is very simple. This research applies the clustering technique using the K-Means algorithm as a method of addressing the problem of national development imbalances between regions in Central Java Province based on macroeconomic indicators. The data sample used is secondary data obtained from the Central Java Provincial Statistics Agency regarding macroeconomic indicators, which is part of the publication of the 2019 National Socio-Economic Survey (Susenas) data. The data were normalized using the z-score, and the number of clusters (k) was determined using the elbow method. After the clustering process was carried out, validation was tested using the Between-Class Variation (BCV) and Within-Class Variation (WCV) methods. The results showed that outlier detection using z-score normalization found no outliers. In addition, the clustering test obtained a ratio value that was not high, namely 0.011%. There are two district/city clusters in Central Java Province which have economic similarities based on the variables used: the first cluster, with a high economic level, consists of 13 districts/cities, and the second cluster, with a low economic level, consists of 22 districts/cities. Within the second, low-economy cluster, the author grouped districts/cities based on similarities in individual macroeconomic indicators: 20 districts by Gross Regional Domestic Product, 19 districts by Poverty Depth Index, 5 districts by Human Development Index, and 10 districts by Open Unemployment Rate.
Keywords: clustering, K-Means algorithm, macroeconomic indicators, inequality, national development
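A minimal sketch of the z-score + elbow + K-means pipeline described above, with invented indicator values standing in for the Susenas figures.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# illustrative macroeconomic indicators per district (GRDP, poverty depth
# index, HDI, open unemployment rate) -- not the Susenas data
rng = np.random.default_rng(4)
X = np.vstack([rng.normal([8, 4, 72, 6], 1.0, (13, 4)),    # higher-economy group
               rng.normal([3, 9, 64, 9], 1.0, (22, 4))])   # lower-economy group

Xz = StandardScaler().fit_transform(X)      # z-score normalization

# elbow method: inspect inertia for k = 1..6, then fit the chosen k
for k in range(1, 7):
    print(k, round(KMeans(n_clusters=k, n_init=10, random_state=0)
                   .fit(Xz).inertia_, 1))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xz)
print(np.bincount(labels))   # cluster sizes, e.g. 13 vs. 22 districts/cities
```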
Procedia PDF Downloads 158
29 Constructing the Joint Mean-Variance Regions for Univariate and Bivariate Normal Distributions: Approach Based on the Measure of Cumulative Distribution Functions
Authors: Valerii Dashuk
Abstract:
The usage of confidence intervals in economics and econometrics is widespread. To investigate a random variable more thoroughly, joint tests are applied. One such example is the joint mean-variance test. A new approach for testing such hypotheses and constructing confidence sets is introduced. Exploring both the value of the random variable and its deviation with the help of this technique allows simultaneously checking the shift and the probability of that shift (i.e., portfolio risks). Another application is based on the normal distribution, which is fully defined by its mean and variance and can therefore be tested using the introduced approach. This method is based on the difference of probability density functions. The starting point is two sets of normal distribution parameters that should be compared (whether they may be considered identical at a given significance level). Then the absolute difference in probabilities at each 'point' of the domain of these distributions is calculated. This measure is transformed into a function of cumulative distribution functions and compared to the critical values. A table of critical values was designed from simulations. The approach was compared with other techniques for the univariate case. It differs qualitatively and quantitatively in ease of implementation, computation speed, and accuracy of the critical region (theoretical vs. real significance level). Stable results when working with outliers and non-normal distributions, as well as scaling possibilities, are also strong sides of the method. The main advantage of this approach is the possibility of extending it to the infinite-dimensional case, which was not possible in most of the previous works. At the moment, the expansion to the 2-dimensional case has been completed, allowing up to 5 parameters to be tested jointly. The derived technique is therefore equivalent to classic tests in standard situations but gives more efficient alternatives in nonstandard problems and on large amounts of data.
Keywords: confidence set, cumulative distribution function, hypotheses testing, normal distribution, probability density function
Procedia PDF Downloads 174
28 Improving Student Performance Prediction Using a Majority Vote Ensemble Model for Higher Education
Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue
Abstract:
In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information in students' learning behavior, particularly to uncover the early symptoms of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may lead to incorrect conclusions. By identifying the features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after pre-processing the data, and optimizing the hyperparameters, this paper aims to develop a reliable student performance prediction model for higher education institutions. Data were gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian state university. The cases of 4413 students were used in this article. The process includes data collection, data integration, data pre-processing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are the ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are the supervised learning techniques. Hyperparameters for the ensemble learning systems were fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from the e-learning and student information systems using Majority Vote produced better outcomes than the other ensemble techniques.
Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education
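A minimal scikit-learn sketch of the majority-vote idea over the base learners named above; the synthetic data stands in for the merged e-learning and student-information features, and the hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# stand-in for the merged e-learning + student-information features
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# hard voting: each base learner casts one vote per student
vote = VotingClassifier(estimators=[
    ("dt", DecisionTreeClassifier(max_depth=8)),
    ("svm", make_pipeline(StandardScaler(), SVC())),
    ("rf", RandomForestClassifier(n_estimators=200)),
    ("ann", make_pipeline(StandardScaler(), MLPClassifier(max_iter=500))),
], voting="hard")

print(cross_val_score(vote, X, y, cv=5).mean())
```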
Procedia PDF Downloads 107
27 Contrasted Mean and Median Models in Egyptian Stock Markets
Authors: Mai A. Ibrahim, Mohammed El-Beltagy, Motaz Khorshid
Abstract:
Emerging market return distributions have shown significant departures from normality: they are characterized by fatter tails relative to the normal distribution and exhibit levels of skewness and kurtosis that constitute a significant departure from normality. Therefore, the classical Markowitz mean-variance analysis is not applicable to emerging markets, since it assumes normally distributed returns (with zero skewness and kurtosis) and a quadratic utility function. Moreover, the Markowitz mean-variance analysis can be used in cases of moderate non-normality, where it still provides a good approximation of the expected utility, but it may be ineffective under large departures from normality. Higher-moment models and median models have been suggested in the literature for asset allocation in this case. Higher-moment models have been introduced to account for the insufficiency of describing a portfolio by only its first two moments, while the median model has been introduced as a robust statistic which is less affected by outliers than the mean. Tail risk measures such as Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) have been introduced instead of variance to capture the effect of risk. In this research, higher-moment models, including Mean-Variance-Skewness (MVS) and Mean-Variance-Skewness-Kurtosis (MVSK), are formulated as single-objective non-linear programming (NLP) problems, and median models, including Median-Value-at-Risk (MedVaR) and Median-Mean Absolute Deviation (MedMAD), are formulated as single-objective mixed-integer linear programming (MILP) problems. The higher-moment models and median models are compared to some benchmark portfolios and tested on real financial data from the Egyptian main index EGX30. The results show that all the median models outperform the higher-moment models, as they provide higher final wealth for the investor over the entire period of study. In addition, the results confirm the inapplicability of the classical Markowitz mean-variance analysis to the Egyptian stock market, as it resulted in very low realized profits.
Keywords: Egyptian stock exchange, emerging markets, higher moment models, median models, mixed-integer linear programming, non-linear programming
Procedia PDF Downloads 314
26 Principal Component Analysis Combined Machine Learning Techniques on Pharmaceutical Samples by Laser Induced Breakdown Spectroscopy
Authors: Kemal Efe Eseller, Göktuğ Yazici
Abstract:
Laser-induced breakdown spectroscopy (LIBS) is a rapid optical atomic emission spectroscopy technique used for material identification and analysis, with the advantages of in-situ analysis, elimination of intensive sample preparation, and only micro-destructive effects on the material to be tested. LIBS delivers short laser pulses onto the material in order to create a plasma by exciting the material above a certain threshold. The plasma characteristics, which consist of wavelength values and intensity amplitudes, depend on the material and the experimental environment. In the present work, the spectrum profiles of medicine samples were obtained via LIBS. The medicine datasets include two different concentrations of each of the paracetamol-based medicines Aferin and Parafon. The spectral data of the samples were preprocessed by filling outliers based on quartiles, smoothing the spectra to eliminate noise, and normalizing both the wavelength and intensity axes. Statistical information was obtained, and principal component analysis (PCA) was applied to both the preprocessed and raw datasets. The machine learning models were set up with two different train-test splits: 70% training with 30% test, and 80% training with 20% test. Cross-validation was preferred to protect the models against overfitting, since the sample amount is small. The machine learning results for the preprocessed and raw datasets were compared for both splits. This is the first time that all supervised machine learning classification algorithms (Decision Trees, Discriminant Analysis, naïve Bayes, Support Vector Machines (SVM), k-Nearest Neighbor (k-NN), Ensemble Learning and Neural Network algorithms) have been applied to LIBS data of paracetamol-based pharmaceutical samples at different concentrations, on both preprocessed and raw datasets, in order to observe the effect of preprocessing.
Keywords: machine learning, laser-induced breakdown spectroscopy, medicines, principal component analysis, preprocessing
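A minimal sketch of the preprocessing pipeline named above (quartile-based outlier handling, smoothing, normalization) followed by PCA; the Savitzky-Golay smoother, window settings, and synthetic spectra are assumptions standing in for the paper's choices.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA

def preprocess_spectrum(intensity):
    """One spectrum: clip quartile-based outliers (Tukey fences), smooth with
    a Savitzky-Golay filter (an assumed smoother), and min-max normalize."""
    q1, q3 = np.percentile(intensity, [25, 75])
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    clipped = np.clip(intensity, lo, hi)
    smooth = savgol_filter(clipped, window_length=11, polyorder=3)
    return (smooth - smooth.min()) / (smooth.max() - smooth.min())

# illustrative spectra: rows are samples, columns are wavelength channels
rng = np.random.default_rng(8)
spectra = rng.gamma(2.0, 1.0, size=(20, 500))
X = np.vstack([preprocess_spectrum(s) for s in spectra])
scores = PCA(n_components=3).fit_transform(X)   # features for the classifiers
print(scores.shape)
```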
Procedia PDF Downloads 87
25 Determination of Community Based Reference Interval of Aspartate Aminotransferase to Platelet Ratio Index (APRI) among Healthy Populations in Mekelle City Tigray, Northern Ethiopia
Authors: Getachew Belay Kassahun
Abstract:
Background: The aspartate aminotransferase to platelet ratio index (APRI) has become a biomarker for screening liver fibrosis, since the liver biopsy procedure is invasive and subject to variation in pathological interpretation. The Clinical and Laboratory Standards Institute recommends establishing age-, sex- and environment-specific reference intervals for biomarkers in a homogeneous population. The current study aimed to derive a community-based reference interval of APRI for people aged between 12 and 60 years in Mekelle city, Tigray, Northern Ethiopia. Method: Six hundred eighty-eight study participants were recruited from three districts in Mekelle city. The 3 districts were selected through a random sampling technique, and the sample size was distributed to kebelles (small administrative units) in proportion to the number of households in each district. The lottery method was used at the household level if more than 2 study participants for an age partition were found. In this community-based cross-sectional study, a total of 534 study participants, 264 males and 270 females, were included in the final laboratory and data analysis; around 154 study participants were excluded through the exclusion criteria. Aspartate aminotransferase was analyzed on a BioSystems chemistry analyzer, and a Sysmex machine was used to analyze platelets. The Mann-Whitney U test, a non-parametric statistical tool, was used to assess statistical differences between genders after excluding outliers through box-and-whisker plots. Result: The study found a statistical difference between genders for the APRI reference interval. The combined, male and female reference intervals in the current study were 0.098-0.390, 0.133-0.428 and 0.090-0.319, respectively. The upper and lower reference limits of males were higher than those of females in all age partitions, and there was no statistically significant difference between age partitions. Conclusion: The current study showed that using sex-specific reference intervals is important for the APRI biomarker in clinical practice for result interpretation.
Keywords: reference interval, aspartate aminotransferase to platelet ratio index, Ethiopia, Tigray
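A minimal sketch of the computation the study describes: the APRI formula, Tukey-fence (box-and-whisker) outlier removal, a nonparametric reference interval, and the Mann-Whitney U comparison. The AST upper limit of normal (40 U/L) and all values are assumptions, not the study's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def apri(ast, ast_uln, platelets):
    """APRI = (AST / upper limit of normal) / platelet count (10^9/L) x 100."""
    return (np.asarray(ast) / ast_uln) / np.asarray(platelets) * 100

def reference_interval(values):
    """Nonparametric 2.5th-97.5th percentile interval after Tukey-fence
    outlier removal (the box-and-whisker rule used in the study)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    kept = values[(values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)]
    return np.percentile(kept, [2.5, 97.5])

# illustrative values; ULN of 40 U/L is an assumed convention
rng = np.random.default_rng(5)
males = apri(rng.normal(24, 6, 264), 40, rng.normal(250, 40, 264))
females = apri(rng.normal(20, 5, 270), 40, rng.normal(270, 40, 270))
print("male RI:", reference_interval(males).round(3))
print("female RI:", reference_interval(females).round(3))
print("Mann-Whitney U p:", mannwhitneyu(males, females).pvalue)
```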
Procedia PDF Downloads 114
24 Constructing a Semi-Supervised Model for Network Intrusion Detection
Authors: Tigabu Dagne Akal
Abstract:
While advances in computer and communications technology have made the network ubiquitous, they have also rendered networked systems vulnerable to malicious attacks devised from a distance. These attacks or intrusions start with attackers infiltrating a network through a vulnerable host and then launching further attacks on the local network or intranet. Nowadays, system administrators and network professionals can attempt to prevent such attacks by developing intrusion detection tools and systems using data mining technology. In this study, the experiments were conducted following the Knowledge Discovery in Databases (KDD) process model, which starts with the selection of the datasets. The dataset used in this study was taken from the Massachusetts Institute of Technology Lincoln Laboratory. After selection, the data were pre-processed. The major pre-processing activities included filling in missing values, removing outliers, resolving inconsistencies, integrating data containing both labelled and unlabelled records, dimensionality reduction, size reduction, and data transformation activities such as discretization. A total of 21,533 intrusion records were used for training the models. For validating the performance of the selected model, a separate set of 3,397 records was used as a testing set. For building a predictive model for intrusion detection, the J48 decision tree and the naïve Bayes algorithms were tested as classification approaches, both with and without feature selection. The model created using 10-fold cross-validation with the J48 decision tree algorithm and the default parameter values showed the best classification accuracy. The model has a prediction accuracy of 96.11% on the training datasets and 93.2% on the test dataset in classifying new instances into the normal, DOS, U2R, R2L and probe classes. The findings of this study show that data mining methods generate interesting rules that are crucial for intrusion detection and prevention in the networking industry. Future research directions are suggested toward an applicable system in the area of study.
Keywords: intrusion detection, data mining, computer science
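The study ran J48 (Weka's C4.5 implementation) under 10-fold cross-validation; a rough scikit-learn analogue uses an entropy-based decision tree, shown here on synthetic 5-class data standing in for the pre-processed KDD-style records.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# stand-in for the pre-processed records; the real study used 21,533 training
# and 3,397 test records labelled normal/DOS/U2R/R2L/probe
X, y = make_classification(n_samples=2000, n_features=25, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# J48 is Weka's C4.5; an entropy-criterion scikit-learn tree is the closest analogue
models = [("decision tree", DecisionTreeClassifier(criterion="entropy")),
          ("naive Bayes", GaussianNB())]
for name, clf in models:
    print(name, cross_val_score(clf, X, y, cv=10).mean().round(3))
```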
Procedia PDF Downloads 296