Search results for: churn
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 21

21 Churn Prediction for Savings Bank Customers: A Machine Learning Approach

Authors: Prashant Verma

Abstract:

Commercial banks are facing immense pressures, including financial disintermediation, interest rate volatility and digital modes of finance. Retaining an existing customer is 5 to 25 times less expensive than acquiring a new one. This paper explores customer churn prediction based on various statistical and machine learning models and uses under-sampling to improve the predictive power of these models. The results show that, among the machine learning models tested, Random Forest, which predicts churn with 78% accuracy, is the most powerful model for the scenario. Customer vintage, customer’s age, average balance, occupation code, population code, average withdrawal amount, and average number of transactions were found to be the variables with high predictive power for the churn prediction model. Commercial banks can deploy the model to avoid customer churn and thereby retain the funds kept by savings bank (SB) customers. The article suggests that commercial banks initiate a customized campaign to avoid SB customer churn. By providing better customer satisfaction and experience, commercial banks can limit customer churn and maintain their deposits.
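
The under-sampling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain random under-sampling down to the size of the rarest class; the abstract does not say which variant the authors used, and the function name and toy data below are hypothetical.

```python
import random
from collections import Counter

def random_undersample(rows, labels, seed=0):
    """Randomly drop examples until every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Size of the rarest class; every class is cut down to this size.
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical toy data: 1 = churned, 0 = retained (10 churned, 90 retained).
rows = [{"customer_id": i} for i in range(100)]
labels = [1 if i < 10 else 0 for i in range(100)]
balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```

The balanced set would then be passed to whichever classifier is being trained; the held-out test set is normally left at its original class ratio so that reported accuracy reflects the real population.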

Keywords: savings bank, customer churn, customer retention, random forests, machine learning, under-sampling

Procedia PDF Downloads 102
20 Uplift Segmentation Approach for Targeting Customers in a Churn Prediction Model

Authors: Shivahari Revathi Venkateswaran

Abstract:

Segmenting customers plays a significant role in churn prediction. It helps the marketing team with proactive and reactive customer retention. For reactive retention, the retention team reaches out with special offers to customers who have already shown intent to disconnect. For proactive retention, the marketing team uses a churn prediction model, which ranks each customer from 1 to 100, where rank 1 indicates the highest risk of churn/disconnect (top-ranked customers have a high propensity to churn). The churn prediction model is built using XGBoost. However, with the churn rank alone, the marketing team can only reach out to customers based on their individual ranks. Profiling different groups of customers and framing different marketing strategies for targeted groups is not possible with the churn ranks. For this, the customers must be grouped into segments based on their profiles, such as demographics and other non-controllable attributes. This helps the marketing team frame different offer groups for the targeted audience and prevent them from disconnecting (proactive retention). For segmentation, machine learning approaches like k-means clustering will not form unique customer segments in which all customers share the same attributes. This paper proposes an alternative approach that finds all combinations of unique segments that can be formed from the user attributes and then identifies the segments that have uplift (a churn rate higher than the baseline churn rate). For this, search algorithms like fast search and recursive search are used. Further, within each segment, customers can be targeted using individual churn ranks from the churn prediction model. Finally, a UI (User Interface) is developed for the marketing team to interactively search for meaningful segments and target the right audience in future marketing campaigns to prevent them from disconnecting.
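
The idea of enumerating attribute-based segments and keeping those with uplift can be sketched as follows. This is a brute-force enumeration used purely for illustration; it is not the fast search or recursive search the abstract refers to, and the attribute names and customer records are hypothetical.

```python
from itertools import combinations

def uplift_segments(customers, attributes, min_size=1):
    """Return the baseline churn rate and all segments whose churn rate exceeds it.

    A segment is a set of attribute=value conditions shared by all its members.
    """
    baseline = sum(c["churned"] for c in customers) / len(customers)
    results = []
    for r in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, r):
            # Group churn flags by the values taken on this attribute combination.
            groups = {}
            for c in customers:
                key = tuple(c[a] for a in attrs)
                groups.setdefault(key, []).append(c["churned"])
            for key, flags in groups.items():
                rate = sum(flags) / len(flags)
                if len(flags) >= min_size and rate > baseline:
                    results.append((dict(zip(attrs, key)), len(flags), rate))
    # Highest churn rate first.
    results.sort(key=lambda item: item[2], reverse=True)
    return baseline, results

# Hypothetical customer records.
customers = [
    {"region": "north", "plan": "basic", "churned": 1},
    {"region": "north", "plan": "basic", "churned": 1},
    {"region": "north", "plan": "premium", "churned": 0},
    {"region": "south", "plan": "basic", "churned": 0},
    {"region": "south", "plan": "premium", "churned": 0},
    {"region": "south", "plan": "premium", "churned": 1},
]
baseline, segments = uplift_segments(customers, ["region", "plan"])
for conditions, size, rate in segments:
    print(conditions, size, round(rate, 3))
```

Exhaustive enumeration grows quickly with the number of attributes, which is presumably why dedicated search algorithms are needed at production scale.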

Keywords: churn prediction modeling, XGBoost model, uplift segments, proactive marketing, search algorithms, retention, k-means clustering

Procedia PDF Downloads 39
19 Cluster Analysis of Customer Churn in Telecom Industry

Authors: Abbas Al-Refaie

Abstract:

The research examines the factors that affect customer churn (CC) in the Jordanian telecom industry. A total of 700 surveys were distributed. Cluster analysis revealed three main clusters. Results showed that CC and customer satisfaction (CS) were the key determinants in forming the three clusters. In two clusters, the center values of CC were high, indicating that the customers were loyal and that switching cost (SC) was high, with switching being time- and energy-consuming. Still, the mobile service provider (MSP) should enhance its communication (COM) and value-added services (VASs), as well as its customer complaint management systems (CCMS). Finally, for the third cluster, the center of CC indicates a poor level of loyalty, which makes it easier for customers to churn to another MSP. The results of this study provide valuable feedback for MSP decision makers regarding approaches to improving their performance and reducing CC.

Keywords: cluster analysis, telecom industry, switching cost, customer churn

Procedia PDF Downloads 297
18 Cost Sensitive Feature Selection in Decision-Theoretic Rough Set Models for Customer Churn Prediction: The Case of Telecommunication Sector Customers

Authors: Emel Kızılkaya Aydogan, Mihrimah Ozmen, Yılmaz Delice

Abstract:

The telecommunications sector in the global market has recently been undergoing change and ongoing development. In this sector, churn analysis techniques are commonly used to analyse why some customers terminate their service subscriptions prematurely. Customer churn is of the utmost significance in this sector since it causes substantial business loss. Many companies conduct research in order to prevent losses while increasing customer loyalty. Although a large quantity of accumulated data is available in this sector, its usefulness is limited by data quality and relevance. In this paper, a cost-sensitive feature selection framework is developed to obtain feature reducts for predicting customer churn. The framework is a cost-based optional pre-processing stage that removes redundant features for churn management. In addition, this cost-based feature selection algorithm is applied at a telecommunication company in Turkey, and the results obtained with the algorithm are reported.

Keywords: churn prediction, data mining, decision-theoretic rough set, feature selection

Procedia PDF Downloads 410
17 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial component of maintaining a customer-oriented business, as in the telecom industry, is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this highly competitive market, especially the needs of those who are looking to switch service providers. Churn prediction is therefore now a mandatory requirement for retaining those customers, and machine learning can be used to accomplish it. Churn prediction has become a very important machine learning classification topic in the telecommunications industry. Understanding the factors behind customer churn and how customers behave is very important for building an effective churn prediction model. This paper aims to predict churn and identify the factors of customers’ churn based on their past service usage history. To this end, the study makes use of feature selection, normalization, and feature engineering. The study then compares the performance of four machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Performance was evaluated using the F1 score and ROC-AUC. Compared with existing models, the approach in this study produced better results. Gradient Boosting with the feature selection technique outperformed the other models, achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score

Procedia PDF Downloads 100
16 Comparative Analysis of Predictive Models for Customer Churn Prediction in the Telecommunication Industry

Authors: Deepika Christopher, Garima Anand

Abstract:

To determine the best model for churn prediction in the telecom industry, this paper compares 11 machine learning algorithms, namely Logistic Regression, Support Vector Machine, Random Forest, Decision Tree, XGBoost, LightGBM, CatBoost, AdaBoost, Extra Trees, Deep Neural Network, and Hybrid Model (MLPClassifier). It also aims to pinpoint the top three factors that lead to customer churn and conducts customer segmentation to identify vulnerable groups. According to the data, the Logistic Regression model performs the best, with an F1 score of 0.6215, 81.76% accuracy, 68.95% precision, and 56.57% recall. The top three attributes that cause churn are found to be tenure, Internet Service Fiber optic, and Internet Service DSL, while the three best-performing models in this article are Logistic Regression, Deep Neural Network, and AdaBoost. The K-means algorithm is applied to establish and analyze four different customer clusters. This study has effectively identified customers that are at risk of churn and may be used to develop and execute strategies that lower customer attrition.
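
The reported metrics can be cross-checked, since the F1 score is the harmonic mean of precision and recall. The short sketch below uses only the precision and recall figures quoted in the abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision (68.95%) and recall (56.57%) as reported for Logistic Regression.
computed_f1 = f1_score(0.6895, 0.5657)
print(round(computed_f1, 4))
```

Rounded to four decimal places, the computed value agrees with the reported F1 score of 0.6215, so the three figures are mutually consistent.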

Keywords: attrition, retention, predictive modeling, customer segmentation, telecommunications

Procedia PDF Downloads 16
15 Churn Prediction for Telecommunication Industry Using Artificial Neural Networks

Authors: Ulas Vural, M. Ergun Okay, E. Mesut Yildiz

Abstract:

Telecommunication service providers demand accurate and precise prediction of customer churn probabilities to increase the effectiveness of their customer relation services. The large amount of customer data owned by the service providers is suitable for analysis by machine learning methods. In this study, expenditure data of customers are analyzed using an artificial neural network (ANN). The ANN model is applied to the data of customers with different billing durations. The proposed model predicts churn with 83% accuracy using only three months of expenditure data, and the prediction accuracy increases to 89% when nine months of data are used. The experiments also show that the accuracy of the ANN model increases on an extended feature set that includes information about changes in the bill amounts.

Keywords: customer relationship management, churn prediction, telecom industry, deep learning, artificial neural networks

Procedia PDF Downloads 116
14 Computational Fluid Dynamics Modeling of Flow Properties Fluctuations in Slug-Churn Flow through Pipe Elbow

Authors: Nkemjika Chinenye-Kanu, Mamdud Hossain, Ghazi Droubi

Abstract:

Prediction of multiphase flow induced forces, void fraction and pressure is crucial at both design and operating stages of practical energy and process pipe systems. In this study, transient numerical simulations of upward slug-churn flow through a vertical 90-degree elbow have been conducted. The volume of fluid (VOF) method was used to model the two-phase flows while the K-epsilon Reynolds-Averaged Navier-Stokes (RANS) equations were used to model turbulence in the flows. The simulation results were validated using experimental results. Void fraction signal, peak frequency and maximum magnitude of void fraction fluctuation of the slug-churn flow validation case studies compared well with experimental results. The x and y direction force fluctuation signals at the elbow control volume were obtained by carrying out force balance calculations using the directly extracted time domain signals of flow properties through the control volume in the numerical simulation. The computed force signal compared well with experiment for the slug and churn flow validation case studies. Hence, the present numerical simulation technique was able to predict the behaviours of the one-way flow induced forces and void fraction fluctuations.

Keywords: computational fluid dynamics, flow induced vibration, slug-churn flow, void fraction and force fluctuation

Procedia PDF Downloads 127
13 Learning Dynamic Representations of Nodes in Temporally Variant Graphs

Authors: Sandra Mitrovic, Gaurav Singh

Abstract:

In many industries, including telecommunications, churn prediction has been a topic of active research. Much attention has been devoted to devising the most informative features, and this area of research has gained even more focus with the spread of (social) network analytics. Call detail records (CDRs) have been used to construct customer networks and extract potentially useful features. However, to the best of our knowledge, no studies including network features have yet proposed a generic way of representing network information. Instead, ad-hoc and dataset-dependent solutions have been suggested. In this work, we build upon a recently presented method (node2vec) to obtain representations for nodes in the observed network. The proposed approach is generic and applicable to any network and domain. Unlike node2vec, which assumes a static network, we consider a dynamic and time-evolving network. To account for this, we propose an approach that constructs the feature representation of each node by generating its node2vec representations at different timestamps, concatenating them and finally compressing them using an auto-encoder-like method in order to retain reasonably long and informative feature vectors. We test the proposed method on a churn prediction task in the telco domain. To predict churners at timestamp ts+1, we construct training and testing datasets consisting of feature vectors from time intervals [t1, ts-1] and [t2, ts] respectively, and use traditional supervised classification models like SVM and Logistic Regression. Observed results show the effectiveness of the proposed approach as compared to ad-hoc feature-selection-based approaches and static node2vec.
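
The sliding training and testing windows described above can be sketched in a few lines. The sketch assumes the per-timestamp node embeddings have already been computed (the node2vec step) and omits the auto-encoder-like compression; the embedding values, node name and helper function are hypothetical.

```python
def window_features(embeddings, node, first, last):
    """Concatenate a node's per-timestamp embeddings for timestamps first..last."""
    vector = []
    for t in range(first, last + 1):
        vector.extend(embeddings[t][node])
    return vector

# Hypothetical 2-dimensional embeddings of node "a" at timestamps 1..4 (s = 4).
embeddings = {
    1: {"a": [0.1, 0.2]},
    2: {"a": [0.3, 0.4]},
    3: {"a": [0.5, 0.6]},
    4: {"a": [0.7, 0.8]},
}
s = 4
train_vector = window_features(embeddings, "a", 1, s - 1)  # interval [t1, ts-1]
test_vector = window_features(embeddings, "a", 2, s)       # interval [t2, ts]
print(train_vector)
print(test_vector)
```

Both windows span the same number of timestamps, so the training and testing vectors have equal length and can be fed to the same classifier.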

Keywords: churn prediction, dynamic networks, node2vec, auto-encoders

Procedia PDF Downloads 282
12 Customer Churn Analysis in Telecommunication Industry Using Data Mining Approach

Authors: Burcu Oralhan, Zeki Oralhan, Nilsun Sariyer, Kumru Uyar

Abstract:

Data mining has become increasingly important and has found a wide range of applications in recent years. Data mining is the process of finding hidden and unknown patterns in big data. One of the applied fields of data mining is Customer Relationship Management. Understanding the relationships between products and customers is crucial for every business. Customer Relationship Management is an approach that focuses on developing customer relationships, retaining customers, and increasing customer satisfaction. In this study, we apply data mining methods to customer relationship management in telecommunications. The study aims to determine the profile of customers who are likely to leave the system and to develop marketing strategies and customized campaigns for customers. Data are grouped by applying classification techniques in order to determine the churners. As a result of this study, we will obtain knowledge from the international telecommunication industry and contribute to the understanding and development of this subject in Customer Relationship Management.

Keywords: customer churn analysis, customer relationship management, data mining, telecommunication industry

Procedia PDF Downloads 281
11 Customer Relationship Management - “Is It a Myth or a Reality in Indian Consumer Context”

Authors: Manish Manohar Hingorani

Abstract:

The purpose of the research is to find out the level of understanding, adoption, and implementation of CRM in Indian businesses, whether product or service, and the processes that should be followed to ensure minimal to no customer churn and further enhance loyalty. The study used comprehensive qualitative interviews with 36 respondents across mid- and senior-level management in product and services organizations of Indian origin. The findings of the study exhibit a gap between the understanding, adoption and implementation of CRM in the Indian context. Different industries show different levels of understanding and adoption, and limited implementation. Studies on CRM in the Indian context exist in different industries, but failing to understand the true meaning of CRM at the grass-roots level, followed by non-adoption and non-implementation, will have an adverse effect on customer loyalty and customer satisfaction, leading to customer churn. As this was a qualitative approach, the analysis was content-based and discourse-based. The responses were taken from mid- to very senior management decision-makers in organizations of Indian origin.

Keywords: customer relationship management, Indian consumer, customer loyalty, customer experience, customer satisfaction

Procedia PDF Downloads 56
10 Big Data Strategy for Telco: Network Transformation

Authors: F. Amin, S. Feizi

Abstract:

Big data has the potential to improve the quality of services; enable the infrastructure that businesses depend on to adapt continually and efficiently; improve the performance of employees; help organizations better understand customers; and reduce liability risks. The analytics and marketing models of fixed and mobile operators are falling short in combating churn and declining revenue per user. Big Data presents a new way to reverse this trend and improve profitability. The benefits of Big Data and next-generation networks, however, extend beyond improved customer relationship management. Next-generation networks are in a prime position to monetize rich supplies of customer information, while being mindful of legal and privacy issues. As data assets are transformed into new revenue streams, they will become integral to high performance.

Keywords: big data, next generation networks, network transformation, strategy

Procedia PDF Downloads 328
9 Identification of the Main Transition Velocities in a Bubble Column Based on a Modified Shannon Entropy

Authors: Stoyan Nedeltchev, Markus Schubert

Abstract:

The gas holdup fluctuations in a bubble column (0.15 m in ID) have been recorded by means of a conductivity wire-mesh sensor in order to extract information about the main transition velocities. These parameters are very important for bubble column design, operation and scale-up. For this purpose, the classical definition of the Shannon entropy was modified and used to identify both the onset (at UG=0.034 m/s) of the transition flow regime and the beginning (at UG=0.089 m/s) of the churn-turbulent flow regime. The results were compared with the Kolmogorov entropy (KE) results. A slight discrepancy was found, namely the transition velocities identified by means of the KE were shifted to somewhat higher (0.045 and 0.101 m/s) superficial gas velocities UG.
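
For reference, the classical Shannon entropy of a discretized signal can be computed as in the sketch below. This is the unmodified textbook definition only; the abstract does not describe the authors' modification, and the sample values, bin count and helper names are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Classical Shannon entropy, in bits, of a sequence of discrete symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def discretize(signal, bins, low, high):
    """Map each sample to a bin index in [0, bins - 1]."""
    width = (high - low) / bins
    return [min(int((x - low) / width), bins - 1) for x in signal]

# Hypothetical gas holdup samples, not data from the study.
signal = [0.10, 0.12, 0.30, 0.31, 0.55, 0.52, 0.90, 0.88]
entropy_bits = shannon_entropy(discretize(signal, bins=4, low=0.0, high=1.0))
print(entropy_bits)
```

In a regime-identification setting, an entropy value of this kind would be evaluated at each superficial gas velocity and the transition points read off from changes in its trend.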

Keywords: bubble column, gas holdup fluctuations, modified Shannon entropy, Kolmogorov entropy

Procedia PDF Downloads 291
8 Family Business Succession through the Eye of the Upper Echelon Theory: A Phenomenological Approach

Authors: Ruswiati Suryasaputra, Linda Salim

Abstract:

This concept paper, initially a proposal for the completion of the degree of Doctor of Philosophy, seeks to gain more understanding of family business succession in order to extend the average lifespan of family businesses, which has shrunk significantly over the past 20 years. While a multitude of studies have been done on family business succession, the average lifespan of a family business has continued to decline sharply over the past two decades, to only 24 years, or 1.5 generations, in 2010, from 50-60 years, equivalent to 3 generations, as recently as 1990. While the qualitative approach of this study will not churn out a theoretical framework unique to the family business field, it will bring to the surface important issues during a family business succession process that have been hidden behind the mostly profit-making issues that have been the main highlight of the family business field.

Keywords: family business, succession, nepotism, family studies

Procedia PDF Downloads 508
7 A Hybrid P2P Storage Scheme Based on Erasure Coding and Replication

Authors: Usman Mahmood, Khawaja M. U. Suleman

Abstract:

A peer-to-peer storage system faces challenges such as peer availability, data protection, and churn rate. To address these challenges, different redundancy, replacement and repair schemes are used. This paper presents a hybrid redundancy scheme using replication and erasure coding. We calculate and compare the storage, access, and maintenance costs of our proposed scheme with those of existing redundancy schemes. To model realistic peer behaviour, a trace of a live peer-to-peer system is used. The effects of different replication and repair schemes are also shown. The proposed hybrid scheme performs better than the existing double coding hybrid scheme in all metrics and has a better maintenance cost than hierarchical codes.
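
The storage trade-off between the two redundancy techniques being combined can be made concrete with a small sketch. It assumes an ideal (n, k) erasure code in which any k of the n fragments suffice to rebuild the data (the maximum-distance-separable case); the parameters are illustrative and are not taken from the paper.

```python
def replication_overhead(copies):
    """Storage used per byte of data when keeping `copies` full replicas."""
    return float(copies)

def erasure_overhead(n, k):
    """Storage used per byte of data for an (n, k) code with n fragments of size 1/k."""
    return n / k

def tolerated_losses_replication(copies):
    """Replicas that may be lost while one full copy still survives."""
    return copies - 1

def tolerated_losses_erasure(n, k):
    """Fragments that may be lost while k still remain (ideal code assumed)."""
    return n - k

# Illustrative parameters: 3 replicas versus a (6, 4) erasure code.
print(replication_overhead(3), tolerated_losses_replication(3))
print(erasure_overhead(6, 4), tolerated_losses_erasure(6, 4))
```

With these numbers both options survive two peer losses, but the erasure code stores half as much data; the price is that a repair must read k fragments rather than one replica, which is the kind of maintenance cost a hybrid scheme tries to balance.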

Keywords: erasure coding, P2P, redundancy, replication

Procedia PDF Downloads 359
6 Video-On-Demand QoE Evaluation across Different Age-Groups and Its Significance for Network Capacity

Authors: Mujtaba Roshan, John A. Schormans

Abstract:

Quality of Experience (QoE) drives churn in the broadband networks industry, and good QoE plays a large part in the retention of customers. QoE is known to be affected by the Quality of Service (QoS) factors packet loss probability (PLP), delay and delay jitter caused by the network. Earlier results have shown that the relationship between these QoS factors and QoE is non-linear, and may vary from application to application. We use the network emulator Netem as the basis for experimentation, and evaluate how QoE varies as we change the emulated QoS metrics. Focusing on Video-on-Demand, we discovered that the reported QoE may differ widely for users of different age groups, and that the most demanding age group (the youngest) can require an order of magnitude lower PLP to achieve the same QoE than is required by the most widely studied age group of users. We then used a bottleneck TCP model to evaluate the capacity cost of achieving an order of magnitude decrease in PLP, and found it to be (almost always) a 3-fold increase in link capacity that was required.

Keywords: network capacity, packet loss probability, quality of experience, quality of service

Procedia PDF Downloads 241
5 Influence of Telkom Membership Card Customer Perceived Value on Retaining PT. Telkom Indonesia's Customer in 2013-2014

Authors: Eka Yuliana, Siska Shabrina Julyan

Abstract:

The competitive environment and high customer churn rate in the telecommunication industry lead Indonesian telecommunication companies to strive to offer products with more value. Offering products with more value can encourage customers to keep using a company’s products. One way to retain customers is to give them a membership card, as practiced by PT. Telkom, which gives the Telkom Membership Card to its loyal customers. This study aims to determine the influence of Telkom Membership Card customer perceived value on retaining PT. Telkom Indonesia’s customers in 2013-2014, using a quantitative method with a causal study. The analytical technique used in this study is Structural Equation Modelling (SEM), which tests the causal relationship with 216 owners of the Telkom Membership Card in Indonesia. This study concludes that: (i) customer perceived value of the Telkom Membership Card is located in the fair value zone, (ii) PT. Telkom’s efforts to retain customers are classified as good, (iii) customer perceived value influences the effort to retain customers, with a probability value less than 0.05 and a level of influence of 69%. Based on the results of this study, PT. Telkom should (i) improve promotion of the Telkom Membership Card, because not all PT. Telkom customers have the membership card, (iia) add Telkom Membership Card benefits such as discounts at various merchants, (iib) set up a call center for Telkom Membership Card members, (iii) ensure the availability of its services, and (iv) give priority to customers who have a Telkom Membership Card and offer them better service. Future research should use different variables.

Keywords: customer perceived value, customer retention, marketing, relationship marketing

Procedia PDF Downloads 281
4 Cooperative Learning: A Case Study on Teamwork through Community Service Project

Authors: Priyadharshini Ahrumugam

Abstract:

Cooperative groups have been recognized through much research to churn out remarkable achievements compared with solitary or individualistic efforts. Based on Johnson and Johnson’s model of cooperative learning, the five key components of cooperation are positive interdependence, face-to-face promotive interaction, individual accountability, social skills and group processing. In 2011, the Malaysian Ministry of Higher Education (MOHE) introduced the Holistic Student Development policy with the aim of developing morally sound individuals equipped with lifelong learning skills. The Community Service project was included in the improvement initiative. The purpose of this study is to assess the role of team-based learning in facilitating, in particular, students’ positive interdependence and face-to-face promotive interaction. The research methods involve in-depth interviews with the team leaders and selected team members, and a content analysis of the undergraduate students’ reflective journals. A significant positive relationship was found between students’ progressive outlook towards teamwork and the two highlighted components. The key findings show that students have gained in their individual learning and work results through teamwork and interaction with other students. The inclusion of Community Service as a MOHE subject resonates with cooperative learning methods that enhance supportive relationships and develop students’ social skills together with their professional skills.

Keywords: community service, cooperative learning, positive interdependence, teamwork

Procedia PDF Downloads 271
3 Operations Training Using Immersive Technologies: A Development Experience

Authors: A. Aman, S. M. Tang, F. H. Alharrassy

Abstract:

Omanisation was established to increase job opportunities for nationals in the Sultanate of Oman. With half of the population below 25 years of age, the sultanate is striving to diversify the economy fast enough to meet the burgeoning number of jobseekers annually. On the other hand, training personnel to be competent oil and gas operators and technicians is a difficult task given the complex reservoir structures in Oman, which require highly advanced and sophisticated extraction processes. Coupled with Omanisation, which encourages nationals into the oil and gas sector so as to create sustainable employment for the local population, the challenge of churning out competent manpower became daunting. Immersive technologies provided the impetus to create a new digital media sector, which provided job opportunities as well as learning content to enhance competency-based training for the oil and gas sector in the Sultanate. This led to a win-win-win collaboration amongst the government, represented by the Information Technology Authority (ITA), a specialised private sector company (represented by ASM Technologies), jobseekers, and oil and gas organisations. This is also one of the first private-public partnership models in the Information Communication Technology (ICT) sector in Oman. A pilot phase was conducted for 8 months to develop four virtual applications for training in equipment and process engineering: oil rig familiarisation, a Health Safety Environment (HSE) application, a turbine application, and the mechanical vapour compressor (MVC) water recycling plant, in order to enhance the competency level of the trainees. The immersive applications were installed in operational settings, which enabled new employees to practice and understand various processes and procedures regarding enhanced oil recovery. Existing employees used the applications to review the working principles in order to carry out troubleshooting scenarios.
Concurrently, these applications were also developed by local Omani resources within the country. This created job opportunities for jobseekers as well as the establishment of a digital media sector. The purpose of this paper is to discuss how immersive technologies can enhance operational competencies, create jobs and establish a digital media sector in the Sultanate of Oman.

Keywords: immersive, virtual reality, operations training, Omanisation

Procedia PDF Downloads 190
2 Transition towards a Market Society: Commodification of Public Health in India and Pakistan

Authors: Mayank Mishra

Abstract:

A Market Economy can be broadly defined as an economic system where supply and demand regulate the economy and in which decisions pertaining to production, consumption, allocation of resources, price and competition are made by the collective actions of individuals or organisations with limited government intervention. A Market Society, on the other hand, is one where instead of the economy being embedded in social relations, social relations are embedded in the economy. A market economy becomes a market society when all of land, labour and capital are commodified. This transition also has an effect on people’s attitudes and values. Such a transition begins to impact the non-material aspects of life such as public education, public health and the like. The inception of neoliberal policies in non-market norms altered the nature of social goods like public health, which raises the following questions. What impact would the transition to a market society have on people in terms of accessibility to public health? Is healthcare a commodity that can be subjected to a competitive marketplace? What kind of private investments are being made in public health, and how do private investments alter the nature of a public good like healthcare? This research will employ an empirical-analytical approach that includes deductive reasoning, using the existing concepts of market economy and market society as a foundation for the analytical framework and the hypotheses to be examined. The research also intends to incorporate the naturalistic elements of qualitative methodology, which refers to the study of real-world situations as they unfold. The research will analyse the existing literature available on the subject. Concomitantly, the research intends to access the primary literature, which includes reports from the World Bank, the World Health Organisation (WHO) and the different departments of the respective ministries of the countries, for the analysis.
This paper endeavours to highlight how the commodification of public health would lead to a perpetual increase in its inaccessibility, leading to stratification of healthcare services where one can avail of better services depending on the extent of one’s ability to pay. Since the fundamental maxim of private investment is to churn out profits, these kinds of trends would have a detrimental effect on society at large, perpetuating the gap between the haves and the have-nots. The increasing private investments, both domestic and foreign, in the public health sector are leading to increasing inaccessibility of public health services. Despite the increase in various public health schemes, the quality and impact of government public health services are in continuous decline.

Keywords: commodity, India and Pakistan, market society, public health

Procedia PDF Downloads 277
1 An Adaptive Oversampling Technique for Imbalanced Datasets

Authors: Shaukat Ali Shahee, Usha Ananthakumar

Abstract:

A data set exhibits the class imbalance problem when one class has very few examples compared to the other class; this is also referred to as between-class imbalance. Traditional classifiers fail to classify the minority class examples correctly due to their bias towards the majority class. Apart from between-class imbalance, imbalance within classes, where classes are composed of different numbers of sub-clusters and these sub-clusters contain different numbers of examples, also deteriorates the performance of the classifier. Many methods have previously been proposed for handling the imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithm-based methods, cost-based methods and ensembles of classifiers. Data preprocessing techniques have shown great potential as they attempt to improve the data distribution rather than the classifier. Data preprocessing techniques handle class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples leads to loss of information, and when the minority class has absolute rarity, removing majority class examples is generally not recommended. Existing methods for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between-class imbalance and within-class imbalance simultaneously for the binary classification problem. Removing between-class imbalance and within-class imbalance simultaneously eliminates the bias of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in the total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of the sub-clusters.
The method also takes into consideration the scatter of the data in the feature space and adaptively copes with unseen test data using the Lowner-John ellipsoid to increase the accuracy of the classifier. In this study, a neural network is used, as this is a classifier in which the total error is minimized, and removing the between-class imbalance and within-class imbalance simultaneously helps the classifier give equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the Euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to other methods in terms of various accuracy measures. Thus the proposed method can serve as a good alternative for handling various problem domains like credit scoring, customer churn prediction, financial distress, etc., that typically involve imbalanced data sets.
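
As a point of comparison for the data preprocessing category discussed above, the simplest form of oversampling duplicates minority examples at random until the classes are balanced. The sketch below shows only that baseline; it is not the adaptive, cluster-aware method proposed in the paper, and the function name and toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Size of the largest class; smaller classes are padded up to this size.
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical toy data: class 1 is the minority (5 of 50 examples).
rows = [[float(i)] for i in range(50)]
labels = [1 if i < 5 else 0 for i in range(50)]
over_rows, over_labels = random_oversample(rows, labels)
print(Counter(over_labels))
```

Because this baseline ignores sub-cluster structure, small sub-clusters inside a class remain under-represented after balancing, which is the within-class imbalance the abstract sets out to address.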

Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling

Procedia PDF Downloads 385