Search results for: imbalanced
65 A Survey in Techniques for Imbalanced Intrusion Detection System Datasets
Authors: Najmeh Abedzadeh, Matthew Jacobs
Abstract:
An intrusion detection system (IDS) is a software application that monitors malicious activities and generates alerts if any are detected. However, most network activities in IDS datasets are normal, and the relatively few numbers of attacks make the available data imbalanced. Consequently, cyber-attacks can hide inside a large number of normal activities, and machine learning algorithms have difficulty learning and classifying the data correctly. In this paper, a comprehensive literature review is conducted on different types of algorithms for both implementing the IDS and methods in correcting the imbalanced IDS dataset. The most famous algorithms are machine learning (ML), deep learning (DL), synthetic minority over-sampling technique (SMOTE), and reinforcement learning (RL). Most of the research use the CSE-CIC-IDS2017, CSE-CIC-IDS2018, and NSL-KDD datasets for evaluating their algorithms.Keywords: IDS, imbalanced datasets, sampling algorithms, big data
Procedia PDF Downloads 32964 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data
Authors: Ruchika Malhotra, Megha Khanna
Abstract:
The development of change prediction models can help the software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. A data with very few minority outcome categories leads to inefficient learning process and a classification model developed from the imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling the imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with the imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling method and robust performance measures.Keywords: change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics
Procedia PDF Downloads 41863 Towards a Balancing Medical Database by Using the Least Mean Square Algorithm
Authors: Kamel Belammi, Houria Fatrim
Abstract:
imbalanced data set, a problem often found in real world application, can cause seriously negative effect on classification performance of machine learning algorithms. There have been many attempts at dealing with classification of imbalanced data sets. In medical diagnosis classification, we often face the imbalanced number of data samples between the classes in which there are not enough samples in rare classes. In this paper, we proposed a learning method based on a cost sensitive extension of Least Mean Square (LMS) algorithm that penalizes errors of different samples with different weight and some rules of thumb to determine those weights. After the balancing phase, we applythe different classifiers (support vector machine (SVM), k- nearest neighbor (KNN) and multilayer neuronal networks (MNN)) for balanced data set. We have also compared the obtained results before and after balancing method.Keywords: multilayer neural networks, k- nearest neighbor, support vector machine, imbalanced medical data, least mean square algorithm, diabetes
Procedia PDF Downloads 53462 A Priority Based Imbalanced Time Minimization Assignment Problem: An Iterative Approach
Authors: Ekta Jain, Kalpana Dahiya, Vanita Verma
Abstract:
This paper discusses a priority based imbalanced time minimization assignment problem dealing with the allocation of n jobs to m < n persons in which the project is carried out in two stages, viz. Stage-I and Stage-II. Stage-I consists of n1 ( < m) primary jobs and Stage-II consists of remaining (n-n1) secondary jobs which are commenced only after primary jobs are finished. Each job is to be allocated to exactly one person, and each person has to do at least one job. It is assumed that nature of the Stage-I jobs is such that one person can do exactly one primary job whereas a person can do more than one secondary job in Stage-II. In a particular stage, all persons start doing the jobs simultaneously, but if a person is doing more than one job, he does them one after the other in any order. The aim of the proposed study is to find the feasible assignment which minimizes the total time for the two stage execution of the project. For this, an iterative algorithm is proposed, which at each iteration, solves a constrained imbalanced time minimization assignment problem to generate a pair of Stage-I and Stage-II times. For solving this constrained problem, an algorithm is developed in the current paper. Later, alternate combinations based method to solve the priority based imbalanced problem is also discussed and a comparative study is carried out. Numerical illustrations are provided in support of the theory.Keywords: assignment, imbalanced, priority, time minimization
Procedia PDF Downloads 23561 Improved Classification Procedure for Imbalanced and Overlapped Situations
Authors: Hankyu Lee, Seoung Bum Kim
Abstract:
The issue with imbalance and overlapping in the class distribution becomes important in various applications of data mining. The imbalanced dataset is a special case in classification problems in which the number of observations of one class (i.e., major class) heavily exceeds the number of observations of the other class (i.e., minor class). Overlapped dataset is the case where many observations are shared together between the two classes. Imbalanced and overlapped data can be frequently found in many real examples including fraud and abuse patients in healthcare, quality prediction in manufacturing, text classification, oil spill detection, remote sensing, and so on. The class imbalance and overlap problem is the challenging issue because this situation degrades the performance of most of the standard classification algorithms. In this study, we propose a classification procedure that can effectively handle imbalanced and overlapped datasets by splitting data space into three parts: nonoverlapping, light overlapping, and severe overlapping and applying the classification algorithm in each part. These three parts were determined based on the Hausdorff distance and the margin of the modified support vector machine. An experiments study was conducted to examine the properties of the proposed method and compared it with other classification algorithms. The results showed that the proposed method outperformed the competitors under various imbalanced and overlapped situations. Moreover, the applicability of the proposed method was demonstrated through the experiment with real data.Keywords: classification, imbalanced data with class overlap, split data space, support vector machine
Procedia PDF Downloads 30860 One vs. Rest and Error Correcting Output Codes Principled Rebalancing Schemes for Solving Imbalanced Multiclass Problems
Authors: Alvaro Callejas-Ramos, Lorena Alvarez-Perez, Alexander Benitez-Buenache, Anibal R. Figueiras-Vidal
Abstract:
This contribution presents a promising formulation which allows to extend the principled binary rebalancing procedures, also known as neutral re-balancing mechanisms in the sense that they do not alter the likelihood ratioKeywords: Bregman divergences, imbalanced multiclass classifi-cation, informed re-balancing, invariant likelihood ratio
Procedia PDF Downloads 21859 Adaptive Swarm Balancing Algorithms for Rare-Event Prediction in Imbalanced Healthcare Data
Authors: Jinyan Li, Simon Fong, Raymond Wong, Mohammed Sabah, Fiaidhi Jinan
Abstract:
Clinical data analysis and forecasting have make great contributions to disease control, prevention and detection. However, such data usually suffer from highly unbalanced samples in class distributions. In this paper, we target at the binary imbalanced dataset, where the positive samples take up only the minority. We investigate two different meta-heuristic algorithms, particle swarm optimization and bat-inspired algorithm, and combine both of them with the synthetic minority over-sampling technique (SMOTE) for processing the datasets. One approach is to process the full dataset as a whole. The other is to split up the dataset and adaptively process it one segment at a time. The experimental results reveal that while the performance improvements obtained by the former methods are not scalable to larger data scales, the later one, which we call Adaptive Swarm Balancing Algorithms, leads to significant efficiency and effectiveness improvements on large datasets. We also find it more consistent with the practice of the typical large imbalanced medical datasets. We further use the meta-heuristic algorithms to optimize two key parameters of SMOTE. Leading to more credible performances of the classifier, and shortening the running time compared with the brute-force method.Keywords: Imbalanced dataset, meta-heuristic algorithm, SMOTE, big data
Procedia PDF Downloads 44358 An Ensemble Deep Learning Architecture for Imbalanced Classification of Thoracic Surgery Patients
Authors: Saba Ebrahimi, Saeed Ahmadian, Hedie Ashrafi
Abstract:
Selecting appropriate patients for surgery is one of the main issues in thoracic surgery (TS). Both short-term and long-term risks and benefits of surgery must be considered in the patient selection criteria. There are some limitations in the existing datasets of TS patients because of missing values of attributes and imbalanced distribution of survival classes. In this study, a novel ensemble architecture of deep learning networks is proposed based on stacking different linear and non-linear layers to deal with imbalance datasets. The categorical and numerical features are split using different layers with ability to shrink the unnecessary features. Then, after extracting the insight from the raw features, a novel biased-kernel layer is applied to reinforce the gradient of the minority class and cause the network to be trained better comparing the current methods. Finally, the performance and advantages of our proposed model over the existing models are examined for predicting patient survival after thoracic surgery using a real-life clinical data for lung cancer patients.Keywords: deep learning, ensemble models, imbalanced classification, lung cancer, TS patient selection
Procedia PDF Downloads 14657 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection
Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada
Abstract:
With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.Keywords: machine learning, imbalanced data, data mining, big data
Procedia PDF Downloads 13256 An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique
Authors: Ghada A. Alfattni
Abstract:
Analysing unbalanced datasets is one of the challenges that practitioners in machine learning field face. However, many researches have been carried out to determine the effectiveness of the use of the synthetic minority over-sampling technique (SMOTE) to address this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested with multiple datasets, then the same datasets were oversampled by using SMOTE and applied again to the three models to compare the differences in the performances. Results of experiments show that the highest number of nearest neighbours gives lower values of error rates.Keywords: imbalanced datasets, SMOTE, machine learning, logistic regression, support vector machine, nearest neighbour
Procedia PDF Downloads 35255 Selection of Soil Quality Indicators of Rice Cropping Systems Using Minimum Data Set Influenced by Imbalanced Fertilization
Authors: Theresa K., Shanmugasundaram R., Kennedy J. S.
Abstract:
Nutrient supplements are indispensable for raising crops and to reap determining productivity. The nutrient imbalance between replenishment and crop uptake is attempted through the input of inorganic fertilizers. Excessive dumping of inorganic nutrients in soil cause stagnant and decline in yield. Imbalanced N-P-K ratio in the soil exacerbates and agitates the soil ecosystems. The study evaluated the fertilization practices of conventional (CFs), organic and Integrated Nutrient Management system (INM) on soil quality using key indicators and soil quality indices. Twelve rice farming fields of which, ten fields were having conventional cultivation practices, one field each was organic farming based and INM based cultivated under monocropping sequence in the Thondamuthur block of Coimbatore district were fixed and properties viz., physical, chemical and biological were studied for four cropping seasons to determine soil quality index (SQI). SQI was computed for conventional, organic and INM fields. Comparing conventional farming (CF) with organic and INM, CF was recorded with a lower soil quality index. While in organic and INM fields, the higher SQI value of 0.99 and 0.88 respectively were registered. CF₄ received with a super-optimal dose of N (250%) showed a lesser SQI value (0.573) as well as the yield (3.20 t ha⁻¹) and the CF6 which received 125 % N recorded the highest SQI (0.715) and yield (6.20 t ha⁻¹). Likewise, most of the CFs received higher N beyond the level of 125 % except CF₃ and CF₉, which recorded lower yields. CFs which received super-optimal P in the order of CF₆&CF₇>CF₁&CF₁₀ recorded lesser yields except for CF₆. Super-optimal K application also recorded lesser yield in CF₄, CF₇ and CF₉.Keywords: rice cropping system, soil quality indicators, imbalanced fertilization, yield
Procedia PDF Downloads 15954 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification
Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike
Abstract:
Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased towards imbalanced datasets because of the skewness of the distribution of such datasets. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and prevents the reduction of majority observations in imbalance datasets classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets- cancer dataset, german credit dataset, and banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that for some of the weights of our proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results compared to that of the ID3 decision tree and decision tree induced with minority entropy for all three datasets.Keywords: data mining, decision tree, classification, imbalance dataset
Procedia PDF Downloads 13953 Artificial Reproduction System and Imbalanced Dataset: A Mendelian Classification
Authors: Anita Kushwaha
Abstract:
We propose a new evolutionary computational model called Artificial Reproduction System which is based on the complex process of meiotic reproduction occurring between male and female cells of the living organisms. Artificial Reproduction System is an attempt towards a new computational intelligence approach inspired by the theoretical reproduction mechanism, observed reproduction functions, principles and mechanisms. A reproductive organism is programmed by genes and can be viewed as an automaton, mapping and reducing so as to create copies of those genes in its off springs. In Artificial Reproduction System, the binding mechanism between male and female cells is studied, parameters are chosen and a network is constructed also a feedback system for self regularization is established. The model then applies Mendel’s law of inheritance, allele-allele associations and can be used to perform data analysis of imbalanced data, multivariate, multiclass and big data. In the experimental study Artificial Reproduction System is compared with other state of the art classifiers like SVM, Radial Basis Function, neural networks, K-Nearest Neighbor for some benchmark datasets and comparison results indicates a good performance.Keywords: bio-inspired computation, nature- inspired computation, natural computing, data mining
Procedia PDF Downloads 27452 Supervised/Unsupervised Mahalanobis Algorithm for Improving Performance for Cyberattack Detection over Communications Networks
Authors: Radhika Ranjan Roy
Abstract:
Deployment of machine learning (ML)/deep learning (DL) algorithms for cyberattack detection in operational communications networks (wireless and/or wire-line) is being delayed because of low-performance parameters (e.g., recall, precision, and f₁-score). If datasets become imbalanced, which is the usual case for communications networks, the performance tends to become worse. Complexities in handling reducing dimensions of the feature sets for increasing performance are also a huge problem. Mahalanobis algorithms have been widely applied in scientific research because Mahalanobis distance metric learning is a successful framework. In this paper, we have investigated the Mahalanobis binary classifier algorithm for increasing cyberattack detection performance over communications networks as a proof of concept. We have also found that high-dimensional information in intermediate features that are not utilized as much for classification tasks in ML/DL algorithms are the main contributor to the state-of-the-art of improved performance of the Mahalanobis method, even for imbalanced and sparse datasets. With no feature reduction, MD offers uniform results for precision, recall, and f₁-score for unbalanced and sparse NSL-KDD datasets.Keywords: Mahalanobis distance, machine learning, deep learning, NS-KDD, local intrinsic dimensionality, chi-square, positive semi-definite, area under the curve
Procedia PDF Downloads 7951 An Adaptive Oversampling Technique for Imbalanced Datasets
Authors: Shaukat Ali Shahee, Usha Ananthakumar
Abstract:
A data set exhibits class imbalance problem when one class has very few examples compared to the other class, and this is also referred to as between class imbalance. The traditional classifiers fail to classify the minority class examples correctly due to its bias towards the majority class. Apart from between-class imbalance, imbalance within classes where classes are composed of a different number of sub-clusters with these sub-clusters containing different number of examples also deteriorates the performance of the classifier. Previously, many methods have been proposed for handling imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithmic based, cost-based methods and ensemble of classifier. Data preprocessing techniques have shown great potential as they attempt to improve data distribution rather than the classifier. Data preprocessing technique handles class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples lead to loss of information and also when minority class has an absolute rarity, removing the majority class examples is generally not recommended. Existing methods available for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between class imbalance and within class imbalance simultaneously for binary classification problem. Removing between class imbalance and within class imbalance simultaneously eliminates the biases of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of sub-clusters. The method also takes into consideration the scatter of the data in the feature space and also adaptively copes up with unseen test data using Lowner-John ellipsoid for increasing the accuracy of the classifier. In this study, neural network is being used as this is one such classifier where the total error is minimized and removing the between-class imbalance and within class imbalance simultaneously help the classifier in giving equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to other methods in terms of various accuracy measures. Thus the proposed method can serve as a good alternative to handle various problem domains like credit scoring, customer churn prediction, financial distress, etc., that typically involve imbalanced data sets.Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling
Procedia PDF Downloads 41850 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class- Unbalanced Data of Diabetes Risk Groups
Authors: Lily Ingsrisawang, Tasanee Nacharoen
Abstract:
Introduction: The problems of unbalanced data sets generally appear in real world applications. Due to unequal class distribution, many research papers found that the performance of existing classifier tends to be biased towards the majority class. The k -nearest neighbors’ nonparametric discriminant analysis is one method that was proposed for classifying unbalanced classes with good performance. Hence, the methods of discriminant analysis are of interest to us in investigating misclassification error rates for class-imbalanced data of three diabetes risk groups. Objective: The purpose of this study was to compare the classification performance between parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification application of class-imbalanced data of diabetes risk groups. Methods: Data from a healthy project for 599 staffs in a government hospital in Bangkok were obtained for the classification problem. The staffs were diagnosed into one of three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data along with the variables; diabetes risk group, age, gender, cholesterol, and BMI was analyzed and bootstrapped up to 50 and 100 samples, 599 observations per sample, for additional estimation of misclassification error rate. Each data set was explored for the departure of multivariate normality and the equality of covariance matrices of the three risk groups. Both the original data and the bootstrap samples show non-normality and unequal covariance matrices. The parametric linear discriminant function, quadratic discriminant function, and the nonparametric k-nearest neighbors’ discriminant function were performed over 50 and 100 bootstrap samples and applied to the original data. In finding the optimal classification rule, the choices of prior probabilities were set up for both equal proportions (0.33: 0.33: 0.33) and unequal proportions with three choices of (0.90:0.05:0.05), (0.80: 0.10: 0.10) or (0.70, 0.15, 0.15). Results: The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach when k = 3 or k = 4 and the prior probabilities of {non-risk:risk:diabetic} as {0.90:0.05:0.05} or {0.80:0.10:0.10} gave the smallest error rate of misclassification. Conclusion: The k-nearest neighbors approach would be suggested for classifying a three-class-imbalanced data of diabetes risk groups.Keywords: error rate, bootstrap, diabetes risk groups, k-nearest neighbors
Procedia PDF Downloads 43649 Functioning of a Temporarily Single Parent Family System Due to Migration from the Perspective of Adolescents with Cerebral Palsy
Authors: A. Gagat-Matuła
Abstract:
There is a definite lack – in Poland, as well as around the world – of empirical studies of families raising handicapped child, in which one parent migrates. In diagnostics of the functioning of such families emphasis should be placed not only on the difficulties, but most of all it should be indicated what possibilities are there for the family and how it overcomes the difficulties. Migration of a parent on the one hand is a chance to improve the family’s material situation. In certain circumstances this may only be an “escape” into work from the issues associated with the upbringing and rehabilitation of a handicapped child. The aim of the study was to learn the functioning of a temporarily single parent family system as a result of migration of a parent from the perspective of adolescents with cerebral palsy. The study was conducted in the year 2013 in the area of Eastern Poland. It involved an analysis of 70 persons (with cerebral palsy in an intellectual capacity) from families in which at least one of the parents migrates. The study incorporated the diagnostic survey method. These tools were used: Family Evaluation Scales (SOR) adapted for Poland by Andrzej Margasiński. The explorations in this study indicate, that 47% of studied temporarily single parent families are balanced models. This is evidence of the resources at the disposal of the family which, despite the disability of the child and temporary separation, is able to function properly. The conducted studies show, that 37% of temporarily single parent families are imbalanced models in the perception of adolescents with cerebral palsy. These families experience functional difficulties and require psychological and pedagogical support. There is a need for building skills related to effective coping with family stress. Especially considering, that families of an imbalanced type do not use the internal and external resources of the family system. Such a situation may deepen the disarrangement of family life. In intermediate families (16%) there are also temporary difficulties in functioning. Separation anxiety experienced by mothers may disrupt relations and introduce additional stress factors. For that reason it is important to provide support for women with difficulties coping with the emotions associated with raising handicapped adolescents and migratory separation.Keywords: child with cerebral palsy, family, migration, parents
Procedia PDF Downloads 41948 Fair Federated Learning in Wireless Communications
Authors: Shayan Mohajer Hamidi
Abstract:
Federated Learning (FL) has emerged as a promising paradigm for training machine learning models on distributed data without the need for centralized data aggregation. In the realm of wireless communications, FL has the potential to leverage the vast amounts of data generated by wireless devices to improve model performance and enable intelligent applications. However, the fairness aspect of FL in wireless communications remains largely unexplored. This abstract presents an idea for fair federated learning in wireless communications, addressing the challenges of imbalanced data distribution, privacy preservation, and resource allocation. Firstly, the proposed approach aims to tackle the issue of imbalanced data distribution in wireless networks. In typical FL scenarios, the distribution of data across wireless devices can be highly skewed, resulting in unfair model updates. To address this, we propose a weighted aggregation strategy that assigns higher importance to devices with fewer samples during the aggregation process. By incorporating fairness-aware weighting mechanisms, the proposed approach ensures that each participating device's contribution is proportional to its data distribution, thereby mitigating the impact of data imbalance on model performance. Secondly, privacy preservation is a critical concern in federated learning, especially in wireless communications where sensitive user data is involved. The proposed approach incorporates privacy-enhancing techniques, such as differential privacy, to protect user privacy during the model training process. By adding carefully calibrated noise to the gradient updates, the proposed approach ensures that the privacy of individual devices is preserved without compromising the overall model accuracy. Moreover, the approach considers the heterogeneity of devices in terms of computational capabilities and energy constraints, allowing devices to adaptively adjust the level of privacy preservation to strike a balance between privacy and utility. Thirdly, efficient resource allocation is crucial for federated learning in wireless communications, as devices operate under limited bandwidth, energy, and computational resources. The proposed approach leverages optimization techniques to allocate resources effectively among the participating devices, considering factors such as data quality, network conditions, and device capabilities. By intelligently distributing the computational load, communication bandwidth, and energy consumption, the proposed approach minimizes resource wastage and ensures a fair and efficient FL process in wireless networks. To evaluate the performance of the proposed fair federated learning approach, extensive simulations and experiments will be conducted. The experiments will involve a diverse set of wireless devices, ranging from smartphones to Internet of Things (IoT) devices, operating in various scenarios with different data distributions and network conditions. The evaluation metrics will include model accuracy, fairness measures, privacy preservation, and resource utilization. The expected outcomes of this research include improved model performance, fair allocation of resources, enhanced privacy preservation, and a better understanding of the challenges and solutions for fair federated learning in wireless communications. The proposed approach has the potential to revolutionize wireless communication systems by enabling intelligent applications while addressing fairness concerns and preserving user privacy.Keywords: federated learning, wireless communications, fairness, imbalanced data, privacy preservation, resource allocation, differential privacy, optimization
Procedia PDF Downloads 7647 Enhancing Fault Detection in Rotating Machinery Using Wiener-CNN Method
Authors: Mohamad R. Moshtagh, Ahmad Bagheri
Abstract:
Accurate fault detection in rotating machinery is of utmost importance to ensure optimal performance and prevent costly downtime in industrial applications. This study presents a robust fault detection system based on vibration data collected from rotating gears under various operating conditions. The considered scenarios include: (1) both gears being healthy, (2) one healthy gear and one faulty gear, and (3) introducing an imbalanced condition to a healthy gear. Vibration data was acquired using a Hentek 1008 device and stored in a CSV file. Python code implemented in the Spider environment was used for data preprocessing and analysis. Winner features were extracted using the Wiener feature selection method. These features were then employed in multiple machine learning algorithms, including Convolutional Neural Networks (CNN), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), and Random Forest, to evaluate their performance in detecting and classifying faults in both the training and validation datasets. The comparative analysis of the methods revealed the superior performance of the Wiener-CNN approach. The Wiener-CNN method achieved a remarkable accuracy of 100% for both the two-class (healthy gear and faulty gear) and three-class (healthy gear, faulty gear, and imbalanced) scenarios in the training and validation datasets. In contrast, the other methods exhibited varying levels of accuracy. The Wiener-MLP method attained 100% accuracy for the two-class training dataset and 100% for the validation dataset. For the three-class scenario, the Wiener-MLP method demonstrated 100% accuracy in the training dataset and 95.3% accuracy in the validation dataset. The Wiener-KNN method yielded 96.3% accuracy for the two-class training dataset and 94.5% for the validation dataset. In the three-class scenario, it achieved 85.3% accuracy in the training dataset and 77.2% in the validation dataset. The Wiener-Random Forest method achieved 100% accuracy for the two-class training dataset and 85% for the validation dataset, while in the three-class training dataset, it attained 100% accuracy and 90.8% accuracy for the validation dataset. The exceptional accuracy demonstrated by the Wiener-CNN method underscores its effectiveness in accurately identifying and classifying fault conditions in rotating machinery. The proposed fault detection system utilizes vibration data analysis and advanced machine learning techniques to improve operational reliability and productivity. By adopting the Wiener-CNN method, industrial systems can benefit from enhanced fault detection capabilities, facilitating proactive maintenance and reducing equipment downtime.Keywords: fault detection, gearbox, machine learning, wiener method
Procedia PDF Downloads 8146 EDM for Prediction of Academic Trends and Patterns
Authors: Trupti Diwan
Abstract:
Predicting student failure at school has changed into a difficult challenge due to both the large number of factors that can affect the reduced performance of students and the imbalanced nature of these kinds of data sets. This paper surveys the two elements needed to make prediction on Students’ Academic Performances which are parameters and methods. This paper also proposes a framework for predicting the performance of engineering students. Genetic programming can be used to predict student failure/success. Ranking algorithm is used to rank students according to their credit points. The framework can be used as a basis for the system implementation & prediction of students’ Academic Performance in Higher Learning Institute.Keywords: classification, educational data mining, student failure, grammar-based genetic programming
Procedia PDF Downloads 42345 Predictive Modelling of Aircraft Component Replacement Using Imbalanced Learning and Ensemble Method
Authors: Dangut Maren David, Skaf Zakwan
Abstract:
Adequate monitoring of vehicle component in other to obtain high uptime is the goal of predictive maintenance, the major challenge faced by businesses in industries is the significant cost associated with a delay in service delivery due to system downtime. Most of those businesses are interested in predicting those problems and proactively prevent them in advance before it occurs, which is the core advantage of Prognostic Health Management (PHM) application. The recent emergence of industry 4.0 or industrial internet of things (IIoT) has led to the need for monitoring systems activities and enhancing system-to-system or component-to- component interactions, this has resulted to a large generation of data known as big data. Analysis of big data represents an increasingly important, however, due to complexity inherently in the dataset such as imbalance classification problems, it becomes extremely difficult to build a model with accurate high precision. Data-driven predictive modeling for condition-based maintenance (CBM) has recently drowned research interest with growing attention to both academics and industries. The large data generated from industrial process inherently comes with a different degree of complexity which posed a challenge for analytics. Thus, imbalance classification problem exists perversely in industrial datasets which can affect the performance of learning algorithms yielding to poor classifier accuracy in model development. Misclassification of faults can result in unplanned breakdown leading economic loss. In this paper, an advanced approach for handling imbalance classification problem is proposed and then a prognostic model for predicting aircraft component replacement is developed to predict component replacement in advanced by exploring aircraft historical data, the approached is based on hybrid ensemble-based method which improves the prediction of the minority class during learning, we also investigate the impact of our approach on multiclass imbalance problem. We validate the feasibility and effectiveness in terms of the performance of our approach using real-world aircraft operation and maintenance datasets, which spans over 7 years. Our approach shows better performance compared to other similar approaches. We also validate our approach strength for handling multiclass imbalanced dataset, our results also show good performance compared to other based classifiers.Keywords: prognostics, data-driven, imbalance classification, deep learning
Procedia PDF Downloads 17544 Unbalanced Mean-Time and Buffer Effects in Lines Suffering Breakdown
Authors: Sabry Shaaban, Tom McNamara, Sarah Hudson
Abstract:
This article studies the performance of unpaced serial production lines that are subject to breakdown and are imbalanced in terms of both of their processing time means (MTs) and buffer storage capacities (BCs). Simulation results show that the best pattern in terms of throughput is a balanced line with respect to average buffer level; the best configuration is a monotone decreasing MT order, together with an ascending BC arrangement. Statistical analysis shows that BC, patterns of MT and BC imbalance, line length and degree of imbalance all contribute significantly to performance. Results show that unbalanced lines cope well with unreliability.Keywords: unreliable unpaced serial lines, simulation, unequal mean operation times, uneven buffer capacities, patterns of imbalance, throughput, average buffer level
Procedia PDF Downloads 47443 Association Between Malnutrition and Dental Caries in Children
Authors: Mohammed Khalid Mahmood, Delphine Tardivo, Romain Lan
Abstract:
Dental caries is one of the most common diseases in the world, affecting billions of people and significantly lowering the quality of life. Malnutrition, on the other hand, is defined as inadequate, imbalanced, or excessive consumption of macronutrients, micronutrients, or both, which is characterized as an abnormal physiological condition. Oral health is impacted by malnutrition, and malnutrition can result from poor oral health. The objective of this paper was to study the association of serum Vitamin D level and body mass index as representatives of malnutrition at micro and macro levels, respectively, on dental caries. Results showed that: 1. The majority of the population studied (70%) are Vitamin D deficient. 2. Having a normal and even a sufficient level of serum Vitamin D and having a normal body mass index increase the chances of children being caries-free and having a lower caries index.Keywords: children, dental Caries, malnutrition, vitamin D
Procedia PDF Downloads 8242 Application of Model Tree in the Prediction of TBM Rate of Penetration with Synthetic Minority Oversampling Technique
Authors: Ehsan Mehryaar
Abstract:
The rate of penetration is (RoP) one of the vital factors in the cost and time of tunnel boring projects; therefore, predicting it can lead to a substantial increase in the efficiency of the project. RoP is heavily dependent geological properties of the project site and TBM properties. In this study, 151-point data from Queen’s water tunnel is collected, which includes unconfined compression strength, peak slope index, angle with weak planes, and distance between planes of weaknesses. Since the size of the data is small, it was observed that it is imbalanced. To solve that problem synthetic minority oversampling technique is utilized. The model based on the model tree is proposed, where each leaf consists of a support vector machine model. Proposed model performance is then compared to existing empirical equations in the literature.Keywords: Model tree, SMOTE, rate of penetration, TBM(tunnel boring machine), SVM
Procedia PDF Downloads 17441 Efficacy of DAPG Producing Fluorescent Pseudomonas for Enhancing Nutrient Use Efficacy, Bio-Control of Soil-Borne Diseases and Yield of Groundnut
Authors: Basavaraj Yenagi, P. Nagaraju, C. R. Patil
Abstract:
Groundnut (Arachis hypohaea L.) is called as “King of oilseeds” and one of the most important food and cash crops in Indian subcontinent. Yield and quality of oil are negatively correlated with poor or imbalanced nutrition and constant exposure to both biotic and abiotic stress factors. Variety of diseases affect groundnut plant, most of them are caused by fungi and lead to severe yield loss. Imbalanced nutrition increases the concerns of environmental deterioration which includes soil fertility. Among different microbial antagonists, Pseudomonas is common member of the plant growth promoting rhizobacteria microflora present in the rhizosphere of groundnut. These are known to produce a beneficial effect on groundnut due to their high metabolic activity leading to the production of enzymes, exopolysaccharides, secondary metabolites, and antibiotics. The ability of pseudomonas lies on their ability to produce antibiotic metabolites such as 2, 4-diacetylphloroglucinol (DAPG). DAPG can inhibit the growth of fungal pathogens namely collar rot and stem rot and also increase the availability of plant nutrients through increased solubilization and uptake of nutrients. Hence, the present study was conducted for three consecutive years (2014 to 2016) in vertisol during the rainy season to assess the efficacy of DAPG producing fluorescent pseudomonas for enhancing nutrient use efficacy, bio-control of soil-borne diseases and yield of groundnut at University of Agricultural Sciences, Dharwad farm. The experiment was laid out in an RCBD with three replications and seven treatments. The mean of three years data revealed that the effect of DAPG-producing producing fluorescent pseudomonas enhanced groundnut yield, uptake of nitrogen and phosphorus and nutrient use efficiency and also found to be effective in bio-control of collar rot and stem rot incidence leading to increase pod yield of groundnut. Higher dry pod yield of groundnut was obtained with DAPG 2(3535 kg ha-1) closely followed by DAPG 4(3492 kg ha-1), FP 98(3443 kg ha-1), DAPG 1(3414 kg ha-1), FP 86(3361 kg ha-1) and Trichoderma spp. (3380 kg ha-1) over control(3173 kg ha-1). A similar trend was obtained with other growth and yield attributing parameters. N uptake ranged from 8.21 percent to FP 86 to 17.91 percent with DAPG 2 and P uptake ranged between 5.56 percent with FP 86 to 16.67 percent with DAPG 2 over control. The first year, there was no incidence of collar rot. During the second year, the control plot recorded 2.51 percent incidence and it ranged from 0.82 percent to 1.43 percent in different DAPG-producing fluorescent pseudomonas treatments. The similar trend was noticed in the third year with lower incidence. The stem rot incidence was recorded during all the three years. Mean data indicated that the control plot recorded 2.65 percent incidence and it ranged from 0.71 percent to 1.23 percent in different DAPG-producing fluorescent pseudomonas treatments. The increase in net monetary benefits ranged from Rs.5975 ha-1 to Rs.11407 ha 1 in different treatments. Hence, as a low-cost technology, seed treatment with available DAPG-producing fluorescent pseudomonas has a beneficial effect on groundnut for enhancing groundnut yield, nutrient use efficiency and bio-control of soil-borne diseases.Keywords: groundnut, DAPG, fluorescent pseudomonas, nutrient use efficiency, collar rot, stem rot
Procedia PDF Downloads 18140 Prediction of All-Beta Protein Secondary Structure Using Garnier-Osguthorpe-Robson Method
Authors: K. Tejasri, K. Suvarna Vani, S. Prathyusha, S. Ramya
Abstract:
Proteins are chained sequences of amino acids which are brought together by the peptide bonds. Many varying formations of the chains are possible due to multiple combinations of amino acids and rotation in numerous positions along the chain. Protein structure prediction is one of the crucial goals worked towards by the members of bioinformatics and theoretical chemistry backgrounds. Among the four different structure levels in proteins, we emphasize mainly the secondary level structure. Generally, the secondary protein basically comprises alpha-helix and beta-sheets. Multi-class classification problem of data with disparity is truly a challenge to overcome and has to be addressed for the beta strands. Imbalanced data distribution constitutes a couple of the classes of data having very limited training samples collated with other classes. The secondary structure data is extracted from the protein primary sequence, and the beta-strands are predicted using suitable machine learning algorithms.Keywords: proteins, secondary structure elements, beta-sheets, beta-strands, alpha-helices, machine learning algorithms
Procedia PDF Downloads 9439 Rotor Dynamic Analysis for a Shaft Train by Using Finite Element Method
Authors: M. Najafi
Abstract:
In the present paper, a large turbo-generator shaft train including a heavy-duty gas turbine engine, a coupling, and a generator is established. The method of analysis is based on finite element simplified model for lateral and torsional vibration calculation. The basic elements of rotor are the shafts and the disks which are represented as circular cross section flexible beams and rigid body elements, respectively. For more accurate results, the gyroscopic effect and bearing dynamics coefficients and function of rotation are taken into account, and for the influence of shear effect, rotor has been modeled in the form of Timoshenko beam. Lateral critical speeds, critical speed map, damped mode shapes, Campbell diagram, zones of instability, amplitudes, phase angles response due to synchronous forces of excitation and amplification factor are calculated. Also, in the present paper, the effect of imbalanced rotor and effects of changing in internal force and temperature are studied.Keywords: rotor dynamic analysis, finite element method, shaft train, Campbell diagram
Procedia PDF Downloads 13638 Spillover Effect of Husbands' Lifestyle on Their Wives' Marital Satisfaction in China
Authors: Xitong Liu, Yutong Huang, Shu-Ching Yang
Abstract:
The phenomena of hypergamous and hypogamous marriages have become popular due to the imbalanced sex ratio caused by Chinese social preference for sons. Our research explores the spillover effect of husbands' lifestyles on their wives' marital satisfaction in China. Both personal and spouse lifestyle elements are utilized to develop regression models to study husbands' spillover effects on women's marital satisfaction. With data from China Family Panel Study and Stata for analysis, we tested our hypothesis that both smoking and substance use by a spouse will negatively impact women's marital satisfaction. Our empirical findings suggest that substance use has negative implications on marriage satisfaction. In particular, husbands' substance use is more critical to wives' marriage satisfaction than wives' behaviours. Conversely, another behavior indicating bad habits, the number of times the spouse drank alcohol, had no significant effect on the wife's marital satisfaction. We concluded our investigation and provided future implications for scholars in the family economics field.Keywords: Asian/Pacific Islander families, family economics, housework/division of labor, spillover
Procedia PDF Downloads 12537 Robustified Asymmetric Logistic Regression Model for Global Fish Stock Assessment
Authors: Osamu Komori, Shinto Eguchi, Hiroshi Okamura, Momoko Ichinokawa
Abstract:
The long time-series data on population assessments are essential for global ecosystem assessment because the temporal change of biomass in such a database reflects the status of global ecosystem properly. However, the available assessment data usually have limited sample sizes and the ratio of populations with low abundance of biomass (collapsed) to those with high abundance (non-collapsed) is highly imbalanced. To allow for the imbalance and uncertainty involved in the ecological data, we propose a binary regression model with mixed effects for inferring ecosystem status through an asymmetric logistic model. In the estimation equation, we observe that the weights for the non-collapsed populations are relatively reduced, which in turn puts more importance on the small number of observations of collapsed populations. Moreover, we extend the asymmetric logistic regression model using propensity score to allow for the sample biases observed in the labeled and unlabeled datasets. It robustified the estimation procedure and improved the model fitting.Keywords: double robust estimation, ecological binary data, mixed effect logistic regression model, propensity score
Procedia PDF Downloads 26836 Energy Efficiency Analysis of Discharge Modes of an Adiabatic Compressed Air Energy Storage System
Authors: Shane D. Inder, Mehrdad Khamooshi
Abstract:
Efficient energy storage is a crucial factor in facilitating the uptake of renewable energy resources. Among the many options available for energy storage systems required to balance imbalanced supply and demand cycles, compressed air energy storage (CAES) is a proven technology in grid-scale applications. This paper reviews the current state of micro scale CAES technology and describes a micro-scale advanced adiabatic CAES (A-CAES) system, where heat generated during compression is stored for use in the discharge phase. It will also describe a thermodynamic model, developed in EES (Engineering Equation Solver) to evaluate the performance and critical parameters of the discharge phase of the proposed system. Three configurations are explained including: single turbine without preheater, two turbines with preheaters, and three turbines with preheaters. It is shown that the micro-scale A-CAES is highly dependent upon key parameters including; regulator pressure, air pressure and volume, thermal energy storage temperature and flow rate and the number of turbines. It was found that a micro-scale AA-CAES, when optimized with an appropriate configuration, could deliver energy input to output efficiency of up to 70%.Keywords: CAES, adiabatic compressed air energy storage, expansion phase, micro generation, thermodynamic
Procedia PDF Downloads 311