Search results for: evolutionary neural network
2736 A Mathematical Agent-Based Model to Examine Two Patterns of Language Change
Authors: Gareth Baxter
Abstract:
We use a mathematical model of language change to examine two recently observed patterns of language change: one in which most speakers change gradually, following the mean of the community change, and one in which most individuals use predominantly one variant or another, and change rapidly if they change at all. The model is based on Croft’s Utterance Selection account of language change, which views language change as an evolutionary process, in which different variants (different ‘ways of saying the same thing’) compete for usage in a population of speakers. Language change occurs when a new variant replaces an older one as the convention within a given population. The present model extends a previous simpler model to include effects related to speaker aging and interspeaker variation in behaviour. The two patterns of individual change (one more centralized and the other more polarized) were recently observed in historical language changes, and it was further observed that slower changes were more associated with the centralized pattern, while quicker changes were more polarized. Our model suggests that the two patterns of change can be explained by different balances between the preference of speakers to use one variant over another and the degree of accommodation to (propensity to adapt towards) other speakers. The correlation with the rate of change appears naturally in our model, and results from the fact that both differential weighting of variants and the degree of accommodation affect the time for change to occur, while also determining the patterns of change. This work represents part of an ongoing effort to examine phenomena in language change through the use of mathematical models. 
This offers another way to evaluate qualitative explanations that cannot be practically tested (or cannot be tested at all) in a real-world, large-scale speech community.
Keywords: agent based modeling, cultural evolution, language change, social behavior modeling, social influence
Procedia PDF Downloads 236
2735 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison
Authors: Xiangtuo Chen, Paul-Henry Cournéde
Abstract:
Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find an efficient way to predict corn yield from meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to their modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with the environment as dynamical systems. However, calibrating the dynamical system is difficult, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of dataset (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process, but it has some strict requirements on the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods: methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-fold cross-validation process and two accuracy metrics, root mean square error of prediction (RMSEP) and mean absolute error of prediction (MAEP), were used to evaluate the crop prediction capacity.
The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from easily accessible datasets offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.
Keywords: crop yield prediction, crop model, sensitivity analysis, parameter estimation, particle swarm optimization, random forest
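The evaluation protocol described in this abstract (5-fold cross-validation scored with RMSEP and MAEP) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the metric formulas, so MAEP is taken here as the mean absolute error relative to each observed value, in percent, and a one-variable least-squares line stands in for the regressors the paper actually compares. All function names are hypothetical.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for any regressor)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def rmsep(y_true, y_pred):
    """Root mean square error of prediction, in the units of y."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def maep(y_true, y_pred):
    """Mean absolute error of prediction, in percent of each observed value
    (assumed definition; the abstract only reports MAEP as a percentage)."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(xs, ys, k=5, seed=0):
    """Return (RMSEP, MAEP) computed over all held-out predictions."""
    y_true, y_pred = [], []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            y_true.append(ys[i])
            y_pred.append(a + b * xs[i])
    return rmsep(y_true, y_pred), maep(y_true, y_pred)
```

Every record is predicted exactly once by a model that never saw it during fitting, which is what makes the two error figures comparable across methods.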
Procedia PDF Downloads 232
2734 Genome Characterization and Phylogeny Analysis of Viruses Infected Invertebrates, Parvoviridae Family
Authors: Niloofar Fariborzi, Hamzeh Alipour, Kourosh Azizi, Neda Eskandarzade, Abozar Ghorbani
Abstract:
The family Parvoviridae consists of a large diversity of single-stranded DNA viruses, which cause mild to severe diseases in both vertebrates and invertebrates. The Parvoviridae are classified into three subfamilies: Parvovirinae infect vertebrates, Densovirinae infect invertebrates, and Hamaparvovirinae infect both vertebrates and invertebrates. Except for the NS1 region, which is the prime criterion for phylogeny analysis, other parts of the parvovirus genome, such as UTRs, are diverse even among closely related viruses or within the same genus. It is believed that host switching in parvoviruses may be related to genetic changes in regions other than NS1; therefore, whole-genome screening is valuable for studying parvoviruses' host-virus interactions. The aim of this study was to analyze the genome organization and phylogeny of the complete genome sequences of 132 Parvoviridae family members, focusing on viruses that infect invertebrates. The maximum and minimum divergence within each subfamily belonged to Densovirinae and Parvovirinae, respectively. The greatest evolutionary divergence was between Hamaparvovirinae and Parvovirinae. Unclassified viruses were mostly from Parvovirinae and had the highest divergence to densoviruses and the lowest divergence to Parvovirinae viruses. In a phylogenetic tree, all hamaparvoviruses were found in the center of densoviruses, with the exception of Syngnathid ichthamaparvovirus 1 (NC_055527), which was positioned between two Parvovirinae members (NC_022089 and NC_038544). The proximity of hamaparvoviruses to some densoviruses strengthens the possibility that densoviruses may be the ancestors of hamaparvoviruses or vice versa. Therefore, examination and phylogeny analysis of the whole genome is necessary to understand Parvoviridae family host selection.
Keywords: densoviruses, parvoviridae, bioinformatics, phylogeny
Procedia PDF Downloads 96
2733 Elucidation of the Sequential Transcriptional Activity in Escherichia coli Using Time-Series RNA-Seq Data
Authors: Pui Shan Wong, Kosuke Tashiro, Satoru Kuhara, Sachiyo Aburatani
Abstract:
Functional genomics and gene regulation inference have readily expanded our knowledge and understanding of gene interactions with regard to expression regulation. With the advancement of transcriptome sequencing in time-series comes the ability to study the sequential changes of the transcriptome. The method presented here augments existing regulation networks accumulated in the literature with transcriptome data gathered from time-series experiments to construct a sequential representation of transcription factor activity. The method is applied to a time-series RNA-Seq data set from Escherichia coli as it transitions from growth to stationary phase over five hours. Investigations are conducted on the various metabolic activities in gene regulation processes by taking advantage of the correlation between regulatory gene pairs to examine their activity on a dynamic network. In particular, the changes in metabolic activity during phase transition are analyzed with a focus on the pagP gene as well as other associated transcription factors. The visualization of the sequential transcriptional activity is used to describe the change in metabolic pathway activity originating from the pagP transcription factor, phoP. The results show a shift from amino acid and nucleic acid metabolism to energy metabolism during the transition to stationary phase in E. coli.
Keywords: Escherichia coli, gene regulation, network, time-series
Procedia PDF Downloads 373
2732 Using Building Information Modelling to Mitigate Risks Associated with Health and Safety in the Construction and Maintenance of Infrastructure Assets
Authors: Mohammed Muzafar, Darshan Ruikar
Abstract:
BIM, an acronym for Building Information Modelling, relates to the practice of creating a computer-generated model which is capable of displaying the planning, design, construction and operation of a structure. The resulting simulation is a data-rich, object-oriented, intelligent and parametric digital representation of the facility, from which views and data appropriate to various users' needs can be extracted and analysed to generate information that can be used to make decisions and to improve the process of delivering the facility. BIM also refers to a shift in culture that will influence the way the built environment and infrastructure operate and how they are delivered. One of the main issues of concern in the construction industry at present in the UK is its record on Health & Safety (H&S). It is, therefore, important that new technologies such as BIM are developed to help improve the quality of health and safety. Historically, the H&S record of the construction industry in the UK is relatively poor compared to the manufacturing industries. BIM and the digital environment it operates within now allow us to use design and construction data in a more intelligent way. It allows data generated by the design process to be re-purposed and to contribute to improving efficiencies in other areas of a project. This evolutionary step in design is not only creating exciting opportunities for the designers themselves but is also creating opportunity for every stakeholder in any given project. From designers, engineers and contractors through to H&S managers, BIM is accelerating a cultural change. The paper introduces the concept behind a research project that mitigates the H&S risks associated with the construction, operation and maintenance of assets through the adoption of BIM.
Keywords: building information modeling, BIM levels, health, safety, integration
Procedia PDF Downloads 255
2731 Wireless Sensor Network for Forest Fire Detection and Localization
Authors: Tarek Dandashi
Abstract:
WSNs may provide a fast and reliable solution for the early detection of environmental events like forest fires. This is crucial for alerting and calling for fire brigade intervention. Sensor nodes communicate sensor data to a host station, which enables a global analysis and the generation of a reliable decision on a potential fire and its location. A WSN with TinyOS and nesC is presented for capturing and transmitting a variety of sensor information with controlled source, data rates, and duration, and for recording and displaying activity traces. We propose a similarity distance (SD) between the distribution of currently sensed data and that of a reference. At any given time, a fire causes diverging opinions in the reported data, which alters the usual data distribution. Basically, SD consists of a metric on the Cumulative Distribution Function (CDF). SD is designed to be invariant to day-to-day changes of temperature, changes due to the surrounding environment, and normal changes in weather, which preserve the data locality. Evaluation shows that SD sensitivity is quadratic with respect to an increase in sensor node temperature for groups of sensors of different sizes and neighborhoods. Simulation of fire spreading when ignition is placed at random locations with some wind speed shows that SD takes a few minutes to reliably detect fires and locate them. We also discuss the cases of false negatives and false positives and their impact on the decision reliability.
Keywords: forest fire, WSN, wireless sensor network, algorithm
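To make the idea of a distance defined on the CDF concrete, here is a minimal sketch. The abstract does not specify the metric, so two assumptions are made here: the distance is a Kolmogorov-Smirnov-style maximum gap between empirical CDFs, and readings are mean-centered first so that a uniform temperature shift leaves the distance unchanged. The function names are hypothetical and this is not the authors' SD.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F(x) = fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def centered(sample):
    """Subtract the mean so a uniform shift (e.g. a warmer day) cancels out."""
    m = sum(sample) / len(sample)
    return [v - m for v in sample]


def similarity_distance(current, reference):
    """Largest vertical gap between the two centered empirical CDFs.

    Both CDFs are step functions that only change at sample points,
    so checking the gap at every sample point finds the maximum.
    """
    cur, ref = centered(current), centered(reference)
    f_cur, f_ref = ecdf(cur), ecdf(ref)
    return max(abs(f_cur(x) - f_ref(x)) for x in cur + ref)
```

With this choice, a whole group of sensors warming by the same amount yields a distance of zero, while a single node reporting a much higher temperature than its neighbors distorts the distribution and pushes the distance toward one.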
Procedia PDF Downloads 263
2730 Transmission of Values among Polish Young Adults and Their Parents: Pseudo Dyad Analysis and Gender Differences
Authors: Karolina Pietras, Joanna Fryt, Aleksandra Gronostaj, Tomasz Smolen
Abstract:
Young women and men differ from their parents in preferred values. Those differences enable their adaptability to a new socio-cultural context and help with fulfilling developmental tasks specific to young adulthood. At the same time core values, with special importance to family members, are transmitted within families. Intergenerational similarities in values may thus be both an effect of value transmission within a family and a consequence of sharing the same socio-cultural context. These processes are difficult to separate. In our study we assessed similarities and differences in values within four intergenerational family dyads (mothers-daughters, fathers-daughters, mothers-sons, fathers-sons). Sixty Polish young adults (30 women and 30 men aged 19-25) along with their parents (a total of 180 participants) completed the Schwartz’ Portrait Value Questionnaire (PVQ-21). To determine which values may be transmitted within families, we used a correlation analysis and pseudo dyad analysis that allows for the estimation of a baseline likeness between all tested subjects and consequently makes it possible to determine if similarities between actual family members are greater than chance. We also assessed whether different strategies of measuring similarity between family members render different results, and checked whether resemblances in family dyads are influenced by child’s and parent’s gender. Reported similarities were interpreted in light of the evolutionary and the value salience perspective.
Keywords: intergenerational differences in values, gender differences, pseudo dyad analysis, transmission of values
Procedia PDF Downloads 502
2729 Filtering Intrusion Detection Alarms Using Ant Clustering Approach
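The logic of a pseudo dyad analysis, as described in this abstract, can be illustrated with a small permutation sketch: the similarity of real parent-child pairs is compared with the similarity of randomly re-paired, unrelated individuals, which estimates the baseline likeness due to a shared socio-cultural context. The similarity measure (Pearson correlation between value profiles), the number of shuffles, and all names below are assumptions for illustration, not the authors' exact procedure.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length value profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def mean_similarity(children, parents):
    """Average profile correlation over paired (child, parent) rows."""
    return sum(pearson(c, p) for c, p in zip(children, parents)) / len(children)


def pseudo_dyad_test(children, parents, n_shuffles=1000, seed=0):
    """Compare real dyads with randomly re-paired (pseudo) dyads.

    Returns (real mean similarity, mean pseudo-dyad similarity, share of
    shuffles whose similarity is at least as high as the real one).
    """
    rng = random.Random(seed)
    real = mean_similarity(children, parents)
    pseudo = []
    for _ in range(n_shuffles):
        shuffled = parents[:]
        rng.shuffle(shuffled)  # break the family links
        pseudo.append(mean_similarity(children, shuffled))
    share = sum(s >= real for s in pseudo) / n_shuffles
    return real, sum(pseudo) / n_shuffles, share
```

If the real similarity is clearly above the pseudo-dyad baseline and the share is small, resemblance within families exceeds what the common context alone would produce.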
Authors: Ghodhbani Salah, Jemili Farah
Abstract:
With the growth of cyber attacks, information safety has become an important issue all over the world. Many firms rely on security technologies such as intrusion detection systems (IDSs) to manage information technology security risks. IDSs are considered to be the last line of defense to secure a network and play a very important role in detecting a large number of attacks. However, the main problem with today’s most popular commercial IDSs is that they generate a high volume of alerts and a huge number of false positives. This drawback has become the main motivation for many research papers in the IDS area. Hence, in this paper we present a data mining technique to assist network administrators in analyzing and reducing the false positive alarms produced by an IDS and in increasing detection accuracy. Our data mining technique is an unsupervised clustering method based on a hybrid ANT algorithm. This algorithm discovers clusters of intruders’ behavior without prior knowledge of the possible number of classes; we then apply the K-means algorithm to improve the convergence of the ANT clustering. Experimental results on a real dataset show that our proposed approach is efficient, with a high detection rate and a low false alarm rate.
Keywords: intrusion detection system, alarm filtering, ANT class, ant clustering, intruders’ behaviors, false alarms
Procedia PDF Downloads 405
2728 Optimal Placement of the Unified Power Controller to Improve the Power System Restoration
Authors: Mohammad Reza Esmaili
Abstract:
One of the most important parts of the restoration process of a power network is the synchronization of its subsystems. In this situation, the biggest concern of the system operators is the reduction of the standing phase angle (SPA) between the endpoints of the two islands. In this regard, the system operators perform various actions and maneuvers so that the synchronization of the subsystems is successfully carried out and the system finally reaches acceptable stability. The most common of these actions include load control, generation control and, in some cases, changing the network topology. Although these maneuvers are simple and common, the weak network and extreme load changes make the restoration slow. One of the best ways to control the SPA is to use FACTS devices. By applying a soft control signal, these devices can reduce the SPA between two subsystems with more speed and accuracy, so the synchronization process can be done in less time. Meanwhile, the unified power flow controller (UPFC), a series-parallel compensator that changes the transmission line power and properly adjusts the phase angle, is the option proposed in this research. With the optimal placement of a UPFC in a power system, in addition to improving the normal conditions of the system, it is expected to be effective in reducing the SPA during power system restoration. The presented paper therefore provides an optimal structure to coordinate the three problems of improving the division of subsystems, reducing the SPA, and optimal power flow, with the aim of determining the optimal location of the UPFC and the optimal subsystems. The proposed objective functions in this paper include maximizing the quality of the subsystems, reducing the SPA at the endpoints of the subsystems, and reducing the losses of the power system.
Since the proposed objective functions may conflict when optimized simultaneously, the optimization problem is formulated as a non-linear multi-objective problem, and the Pareto optimization method is used to solve it. The optimization itself is carried out with the water cycle algorithm (WCA). To evaluate the proposed method, the IEEE 39-bus power system is used.
Keywords: UPFC, SPA, water cycle algorithm, multi-objective problem, pareto
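The Pareto idea used in this abstract can be illustrated with a short dominance filter. This sketch is not the water cycle algorithm; it only shows how conflicting objectives are compared once candidate solutions exist. The three objective values per candidate (negated subsystem quality, SPA, losses) and the function names are hypothetical, and every objective is expressed as a quantity to minimize.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one.

    All objectives are treated as quantities to minimize.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


# Hypothetical candidates: (-subsystem quality, SPA in degrees, losses in MW).
# Quality is negated because it is maximized while the others are minimized.
candidates = [
    (-0.9, 10.0, 5.0),
    (-0.8, 8.0, 6.0),
    (-0.7, 12.0, 7.0),  # worse than the first candidate on every objective
    (-0.9, 10.0, 5.5),  # same as the first candidate but with higher losses
]
front = pareto_front(candidates)
```

The surviving candidates are the trade-offs among which a final placement would be chosen; no single one is best on all three objectives.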
Procedia PDF Downloads 67
2727 Bitcoin, Blockchain and Smart Contract: Attacks and Mitigations
Authors: Mohamed Rasslan, Doaa Abdelrahman, Mahmoud M. Nasreldin, Ghada Farouk, Heba K. Aslan
Abstract:
Blockchain is a distributed database that endorses transparency, while bitcoin is a decentralized cryptocurrency (electronic cash) that endorses anonymity and is powered by blockchain technology. Smart contracts are programs that are stored on a blockchain. Smart contracts are executed when predetermined conditions are fulfilled. Smart contracts automate the execution of an agreement so that all participants can be immediately certain of the outcome, without any intermediary's involvement or time loss. Currently, the Bitcoin market is worth billions of dollars. Bitcoin can be transferred from one purchaser to another without the need for an intermediary bank. Network nodes verify bitcoin transactions through cryptography, and the transactions are registered in a public ledger called the “blockchain”. Bitcoin can be exchanged for other coins, merchandise, and services. The rapid growth of the bitcoin market value encourages its counterparts to make use of its weaknesses and exploit vulnerabilities for profit. Moreover, it motivates scientists to define known vulnerabilities, offer countermeasures, and predict future threats. In this paper, we study blockchain technology and bitcoin from the attacker’s point of view. Furthermore, mitigations for the attacks are suggested, and contemporary security solutions are discussed. Finally, research methods that achieve strict security and privacy protocols are elaborated.
Keywords: Cryptocurrencies, Blockchain, Bitcoin, Smart Contracts, Peer-to-Peer Network, Security Issues, Privacy Techniques
Procedia PDF Downloads 84
2726 Exploring the Synergistic Effects of Aerobic Exercise and Cinnamon Extract on Metabolic Markers in Insulin-Resistant Rats through Advanced Machine Learning and Deep Learning Techniques
Authors: Masoomeh Alsadat Mirshafaei
Abstract:
The present study aims to explore the effect of an 8-week aerobic training regimen combined with cinnamon extract on serum irisin and leptin levels in insulin-resistant rats. Additionally, this research leverages various machine learning (ML) and deep learning (DL) algorithms to model the complex interdependencies between exercise, nutrition, and metabolic markers, offering a groundbreaking approach to obesity and diabetes research. Forty-eight Wistar rats were selected and randomly divided into four groups: control, training, cinnamon, and training-cinnamon. The training protocol was conducted over 8 weeks, with sessions 5 days a week at 75-80% VO2 max. The cinnamon and training-cinnamon groups were injected with 200 ml/kg/day of cinnamon extract. Data analysis included serum data, dietary intake, exercise intensity, and metabolic response variables, with blood samples collected 72 hours after the final training session. The dataset was analyzed using one-way ANOVA (P<0.05) and fed into various ML and DL models, including Support Vector Machines (SVM), Random Forest (RF), and Convolutional Neural Networks (CNN). Traditional statistical methods indicated that aerobic training, with and without cinnamon extract, significantly increased serum irisin and decreased leptin levels. Among the algorithms, the CNN model provided superior performance in identifying specific interactions between cinnamon extract concentration and exercise intensity, optimizing the increase in irisin and the decrease in leptin. The CNN model achieved an accuracy of 92%, outperforming the SVM (85%) and RF (88%) models in predicting the optimal conditions for metabolic marker improvements. The study demonstrated that advanced ML and DL techniques could uncover nuanced relationships and potential cellular responses to exercise and dietary supplements, which are not evident through traditional methods.
These findings advocate for the integration of advanced analytical techniques in nutritional science and exercise physiology, paving the way for personalized health interventions in managing obesity and diabetes.
Keywords: aerobic training, cinnamon extract, insulin resistance, irisin, leptin, convolutional neural networks, exercise physiology, support vector machines, random forest
Procedia PDF Downloads 41
2725 Electric Arc Furnaces as a Source of Voltage Fluctuations in the Power System
Authors: Zbigniew Olczykowski
Abstract:
The paper presents the impact of electric arc furnace operation on the power grid. Arc furnace operation will be modeled for different power supply conditions of steelworks. The paper will describe how to determine the increase in voltage fluctuations caused by arc furnaces working in parallel. Indicators characterizing the quality of electricity will be analyzed; they were recorded during several measurement cycles made simultaneously at three points of the grid with different short-circuit power and different rated voltages. The measurements analyzed in this paper were conducted in the supply network of one of the Polish steelworks. Measurements of power quality indices included one-week measurement cycles in accordance with EN 50160. Data analysis will include the results obtained during the simultaneous measurement at the three grid points. This will determine the actual propagation of the interference generated by the device. Based on the model studies and the measurements of power quality indices, we will establish the effect of a specific arc furnace on the mains. The minimum short-circuit power of the network necessary to limit the voltage fluctuations generated by arc furnaces will also be estimated.
Keywords: arc furnaces, long-term flicker, measurement and modeling of power quality, voltage fluctuations
Procedia PDF Downloads 290
2724 Variations in Wood Traits across Major Gymnosperm and Angiosperm Tree Species and the Driving Factors in China
Authors: Meixia Zhang, Chengjun Ji, Wenxuan Han
Abstract:
Many wood traits are important functional attributes for tree species, connected with resource competition among species, community dynamics, and ecosystem functions. Large variations in these traits exist among taxonomic categories, but variation in these traits between gymnosperms and angiosperms is still poorly documented. This paper explores the systematic differences in 12 traits between the two tree categories and the potential effects of environmental factors and life form. Based on a database of wood traits for major gymnosperm and angiosperm tree species across China, the values of 12 wood traits and their driving factors in gymnosperms vs. angiosperms were compared. The results are summarized below: i) Means of wood traits were all significantly lower in gymnosperms than in angiosperms. ii) Air-dried density (ADD) and tangential shrinkage coefficient (TSC) reflect the basic information of wood traits for gymnosperms, while ADD and radial shrinkage coefficient (RSC) represent those for angiosperms, providing higher explanatory power when used as the evaluation index of wood traits. iii) For both gymnosperm and angiosperm species, life form explains the largest share of the large-scale spatial patterns of ADD and TSC (RSC), climatic factors the next largest, and edaphic factors the least, suggesting that life form is the dominant factor controlling spatial patterns of wood traits. Variations in the magnitude and key traits between gymnosperms and angiosperms, together with the shared dominant factors, might indicate evolutionary divergence and convergence in key functional traits among woody plants.
Keywords: allometry, functional traits, phylogeny, shrinkage coefficient, wood density
Procedia PDF Downloads 277
2723 Lessons Learned in Developing a Clinical Information System and Electronic Health Record (EHR) System That Meet the End User Needs and State of Qatar's Emerging Regulations
Authors: Darshani Premaratne, Afshin Kandampath Puthiyadath
Abstract:
The Government of Qatar is taking active steps to improve the quality of the health care industry in the State of Qatar. Within this initiative, the development and market introduction of a Clinical Information System and an Electronic Health Record (EHR) system have proved to be a highly challenging process. Together with an organization specialized in EHR system development, and with the blessing of the Health Ministry of Qatar, the introduction of an EHR system in the Qatar healthcare industry was undertaken. Initially, a market survey was carried out to understand the requirements. Secondly, the available government regulations, needs, and possible upcoming regulations were carefully studied before resources were deployed for software development. Sufficient flexibility was allowed to cater for changes in both the market and the regulations. As the first initiative, a system was developed that enables the integration of a referral network, with referral clinic and laboratory systems, for all single-doctor (and small-scale) clinics. Bringing isolated single-doctor clinics all over the state into an integrated referral network along with a referral hospital needs a coherent steering force and a solid top-down framework. This paper discusses the lessons learned in developing the single-doctor referral network along with an EHR system, in obtaining the approval of the health ministry, and in introducing it to the industry. It was concluded that development of this nature requires a continuous balance between market requirements and upcoming regulations. Furthermore, accelerating the development based on emerging needs; implementation based on end-user needs while complying with the regulations; diffusion and uptake of demand-driven and evidence-based products, tools, and strategies; and proper utilization of findings were found equally paramount in the successful development of the end product. Development of a full-scale Clinical Information System and EHR system is underway based on the lessons learned.
Keywords: clinical information system, electronic health record, state regulations, integrated referral network of clinics
Procedia PDF Downloads 363
2722 Performance Evaluation of Wideband Code Division Multiplication Network
Authors: Osama Abdallah Mohammed Enan, Amin Babiker A/Nabi Mustafa
Abstract:
The aim of this study is to evaluate and analyze different parameters of WCDMA (wideband code division multiple access). Moreover, this study incorporates a brief yet thorough analysis of WCDMA’s components as well as its internal architecture. This study also examines different power controls, including open loop power control, closed (inner) loop power control, and outer loop power control. Different handover techniques of WCDMA are also illustrated, including hard handover, inter-system handover, and soft and softer handover. Different duplexing techniques are also described in the paper. This study also presents an idea of the different WCDMA parameters that lead the system towards QoS issues, which may help the operator in designing and developing an adequate network configuration. In addition, the study investigates various parameters including Bit Energy per Noise Spectral Density (Eb/No), noise rise, and Bit Error Rate (BER). After simulating these parameters in the MATLAB environment, it was found that, for a given Eb/No value, the system capacity increases as the reuse factor increases. It was also found that noise rise decreases for lower data rates and for lower interference levels. Finally, it was found that the BER depends on the modulation technique, being higher with some modulation techniques than with others.
Keywords: duplexing, handover, loop power control, WCDMA
Procedia PDF Downloads 217
2721 Understanding the Basics of Information Security: An Act of Defense
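The reported relationship between noise rise, data rate, and interference can be illustrated with the textbook uplink load-factor relation for WCDMA, in which each user contributes a load of 1 / (1 + W / (Eb/No · R · v)) and the noise rise is -10·log10(1 - load). This sketch is not the authors' MATLAB simulation; the user counts, the other-cell interference ratio of 0.55, and the function names are illustrative assumptions. The chip rate of 3.84 Mcps is the standard WCDMA value.

```python
import math


def user_load(eb_no_db, bit_rate, chip_rate=3.84e6, activity=1.0):
    """Load contributed by one uplink user (textbook load-factor formula)."""
    eb_no = 10 ** (eb_no_db / 10)  # convert dB to a linear ratio
    return 1.0 / (1.0 + chip_rate / (eb_no * bit_rate * activity))


def noise_rise_db(n_users, eb_no_db, bit_rate, other_cell_ratio=0.55):
    """Noise rise in dB for n identical users.

    other_cell_ratio is the other-cell to own-cell interference ratio
    (an assumed illustrative value, not taken from the abstract).
    """
    load = (1.0 + other_cell_ratio) * n_users * user_load(eb_no_db, bit_rate)
    if load >= 1.0:
        raise ValueError("load factor >= 1: beyond pole capacity")
    return -10.0 * math.log10(1.0 - load)


# Ten users at an Eb/No target of 5 dB, for two service bit rates.
voice_like = noise_rise_db(10, 5.0, 12_200)
data_like = noise_rise_db(10, 5.0, 64_000)
```

For the same number of users, the lower bit rate gives the smaller noise rise, and reducing `other_cell_ratio` lowers it further, which is consistent with the trend stated in the abstract.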
Authors: Sharon Q. Yang, Robert J. Congleton
Abstract:
Information security is a broad concept that covers any issues and concerns about the proper access and use of information on the Internet, including measures and procedures to protect intellectual property and private data from illegal access and online theft; the act of hacking; and any defensive technologies that contest such cybercrimes. As more research and commercial activities are conducted online, cybercrimes have increased significantly, putting sensitive information at risk. Information security has become critically important for organizations and private citizens alike. Hackers scan for network vulnerabilities on the Internet and steal data whenever they can. Cybercrimes disrupt our daily life, cause financial losses, and instigate fear in the public. Since the start of the pandemic, most data-related cybercrime targets have been either financial or health information from companies and organizations. Libraries also should have a high interest in understanding and adopting information security methods to protect their patron data and copyrighted materials. But according to information security professionals, higher education and cultural organizations, including their libraries, are the least prepared entities for cyberattacks. One recent example is that of Stevens Institute of Technology in New Jersey in the US, which had its network hacked in 2020, with the hackers demanding a ransom. As a result, the network of the college was down for two months, causing serious financial loss. There are other cases where libraries, colleges, and universities have been targeted for data breaches. In order to build an effective defense, we need to understand the most common types of cybercrimes, including phishing, whaling, social engineering, distributed denial of service (DDoS) attacks, malware and ransomware, and hacker profiles.
Our research will focus on each hacking technique and related defense measures, and on the social background and reasons/purposes of hackers and hacking. Our research shows that hacking techniques will continue to evolve as new applications, housing information, and data on the Internet continue to be developed. Some cybercrimes can be stopped with effective measures, while others present challenges. It is vital that people understand what they face and the consequences when not prepared.
Keywords: cybercrimes, hacking technologies, higher education, information security, libraries
Procedia PDF Downloads 135
2720 Identification of Hedgerows in the Agricultural Landscapes of Mugada within Bartın Province, Turkey
Authors: Yeliz Sarı Nayim, B. Niyami Nayim
Abstract:
Biotopes such as forest areas rich in biodiversity, wetlands, hedgerows and woodlands play important ecological roles in agricultural landscapes. Of these semi-natural areas and features, hedgerows are the most common landscape elements. Their most significant features are that they serve as a barrier between the agricultural lands, serve as shelter, add aesthetic value to the landscape and contribute significantly to the wildlife and biodiversity. Hedgerows surrounding agricultural landscapes also provide an important habitat for pollinators, which are important for agricultural production. This study looks into the identification of hedgerows in agricultural lands in the Mugada rural area within Bartın province, Turkey. From field data and satellite images, it is clear that in this area, especially around rural settlements, large forest areas have been cleared for settlement and agriculture. A network of hedgerows is also apparent, which might potentially play an important role in the otherwise open agricultural landscape. We found that these hedgerows serve as an ecological and biological corridor, linking forest ecosystems and forest patches of different sizes and creating a habitat network across the landscape. Some examples of this will be presented. The overall conclusion from the study is that ecologically, biologically and aesthetically important hedge biotopes should be maintained in the long term in agricultural landscapes such as this. Some suggestions are given for how they could be managed sustainably into the future.
Keywords: agricultural biotopes, Hedgerows, landscape ecology, Turkey
Procedia PDF Downloads 307
2719 Cognitive Model of Analogy Based on Operation of the Brain Cells: Glial, Axons and Neurons
Authors: Ozgu Hafizoglu
Abstract:
Analogy is an essential tool of human cognition that enables connecting diffuse and diverse systems with attributional, deep structural, and causal relations that are essential to learning, to innovation in artificial worlds, and to discovery in science. Cognitive Model of Analogy (CMA) leads and creates information pattern transfer within and between domains and disciplines in science. This paper demonstrates the Cognitive Model of Analogy (CMA) as an evolutionary approach to scientific research. The model puts forward the challenges of deep uncertainty about the future, emphasizing the need for flexibility of the system in order to enable reasoning methodology to adapt to changing conditions. In this paper, the model of analogical reasoning is created based on brain cells, their fractal, and operational forms within the system itself. Visualization techniques are used to show correspondences. The problem-solving process is divided into four distinct phases: encoding, mapping, inference, and response. The system is shown to be relevant to brain activation in each of these phases, with an emphasis on achieving a better visualization of the brain cells (glial cells, axons, axon terminals, and neurons) relative to matching conditions of analogical reasoning and relational information. It is found that the encoding, mapping, inference, and response processes in four-term analogical reasoning correspond to the fractal and operational forms of brain cells: glial, axons, and neurons.
Keywords: analogy, analogical reasoning, cognitive model, brain and glials
Procedia PDF Downloads 186
2718 Human Identification and Detection of Suspicious Incidents Based on Outfit Colors: Image Processing Approach in CCTV Videos
Authors: Thilini M. Yatanwala
Abstract:
CCTV (Closed-Circuit Television) surveillance systems have been used in public places for decades, and a large variety of data is being produced every moment. However, most of the CCTV data is stored in isolation, without integration. As a result, identifying the behavior of suspicious people along with their location has become strenuous. This research was conducted to acquire more accurate, reliable, and timely information from CCTV video records. The implemented system can identify human objects in public places based on outfit colors. Inter-process communication technologies were used to implement the CCTV camera network to track people in the premises. The research was conducted in three stages. In the first stage, human objects were filtered from other movable objects present in public places. In the second stage, people were uniquely identified based on their outfit colors, and in the third stage, an individual was continuously tracked in the CCTV network. A face detection algorithm was implemented using a cascade classifier based on the training model to detect human objects. A HAAR feature based two-dimensional convolution operator was introduced to identify features of the human face, such as the region of the eyes, the region of the nose and the bridge of the nose, based on the darkness and lightness of the facial area. In the second stage, outfit colors of human objects were analyzed by dividing the area into the upper left, upper right, lower left, and lower right of the body. The mean color, mode color and standard deviation of each area were extracted as crucial factors to uniquely identify a human object using a histogram based approach. Color based measurements were written into XML files, and separate directories were maintained to store the XML files related to each camera according to time stamp.
As the third stage of the approach, inter-process communication techniques were used to implement an acknowledgement based CCTV camera network to continuously track individuals in a network of cameras. Real time analysis of the XML files generated by each camera can determine the path of an individual in order to monitor the full activity sequence. Higher efficiency was achieved by sending and receiving acknowledgments only among adjacent cameras. Suspicious incidents, such as a person staying in a sensitive area for a longer period or a person disappearing from the camera coverage, can be detected with this approach. The system was tested on 150 people with an accuracy level of 82%. However, this approach was unable to produce the expected results in the presence of groups of people wearing similar outfits. This approach can be applied to any existing camera network without changing the physical arrangement of the CCTV cameras. The study of human identification and suspicious incident detection using outfit color analysis can achieve a higher level of accuracy, and the project will be continued by integrating motion and gait feature analysis techniques to derive more information from CCTV videos.
Keywords: CCTV surveillance, human detection and identification, image processing, inter-process communication, security, suspicious detection
Procedia PDF Downloads 184
2717 The Correlation between Air Pollution and Tourette Syndrome
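The second-stage color features described in this abstract (mean color, mode color and standard deviation for each of four body quadrants) can be illustrated with a minimal sketch. It is not the author's implementation: the function name `quadrant_color_features`, the use of NumPy, the assumption that the person has already been cropped to an H x W x 3 `uint8` array, and the per-channel 256-bin histogram used for the mode are all assumptions made for illustration.

```python
import numpy as np


def quadrant_color_features(region):
    """Return mean, mode and std per color channel for four quadrants.

    `region` is assumed to be an H x W x 3 uint8 array holding the
    already-cropped body area of one person (hypothetical input format).
    """
    h, w, _ = region.shape
    quadrants = {
        "upper_left": region[: h // 2, : w // 2],
        "upper_right": region[: h // 2, w // 2 :],
        "lower_left": region[h // 2 :, : w // 2],
        "lower_right": region[h // 2 :, w // 2 :],
    }
    features = {}
    for name, patch in quadrants.items():
        pixels = patch.reshape(-1, 3)
        # Mode per channel: the most populated bin of a 256-bin histogram.
        mode = np.array(
            [np.bincount(pixels[:, c], minlength=256).argmax() for c in range(3)]
        )
        features[name] = {
            "mean": pixels.mean(axis=0),
            "mode": mode,
            "std": pixels.std(axis=0),
        }
    return features
```

Comparing such feature dictionaries between cameras is one plausible way to re-identify a person, and it also shows why people in similar outfits are hard to separate: their quadrant statistics would be nearly identical.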
Authors: Mengnan Sun
Abstract:
The association between air pollution and Tourette Syndrome (TS) is unclear, although people have suspected that air pollution might trigger TS. TS is a type of neural system disease usually found among children. The number of TS patients has significantly increased in recent decades, which makes it important and urgent to examine the possible triggers or conditions that are associated with TS. In this study, the correlation between air pollution and three allergic diseases, namely asthma, allergic conjunctivitis (AC), and allergic rhinitis (AR), is examined. Then, a correlation between these allergic diseases and TS is proved. In this way, this study establishes a positive correlation between air pollution and TS. Measures the public can take to help TS patients are also analyzed at the end of this article. The article hopes to raise people’s awareness of the need to reduce air pollution for the good of TS patients or people with other disorders that are associated with air pollution.
Keywords: air pollution, allergic diseases, climate change, Tourette Syndrome
Procedia PDF Downloads 64
2716 Criticality Assessment Model for Water Pipelines Using Fuzzy Analytical Network Process
Abstract:
Water networks (WNs) are responsible for providing adequate amounts of safe, high-quality water to the public. Like other critical infrastructure systems, WNs are subject to deterioration, which increases the number of breaks and leaks and lowers water quality. In Canada, 35% of water assets require critical attention, and there is a significant gap between the needed and the implemented investments. Thus, the need for efficient rehabilitation programs is becoming more urgent given the paradigm of aging infrastructure and tight budgets. The first step towards developing such programs is to formulate a Performance Index that reflects the current condition of water assets along with their criticality. While numerous studies in the literature have focused on various aspects of condition assessment and reliability, limited efforts have investigated the criticality of such components. Critical water mains are those whose failure causes significant economic, environmental or social impacts on a community. Inclusion of criticality in computing the performance index will serve as a prioritizing tool for the optimal allocation of the available resources and budget. In this study, several social, economic, and environmental factors that dictate the criticality of water pipelines have been elicited from analyzing the literature. Expert opinions were sought to provide pairwise comparisons of the importance of such factors. Subsequently, Fuzzy Logic along with the Analytical Network Process (ANP) was utilized to calculate the weights of several criteria factors. Multi-Attribute Utility Theory (MAUT) was then employed to integrate the aforementioned weights with the attribute values of several pipelines in the Montreal WN. The result is a criticality index, from 0 to 1, that quantifies the severity of the consequence of failure of each pipeline.
A novel contribution of this approach is that it accounts for both the interdependency between criteria factors and the inherent uncertainties in calculating the criticality. The practical value of the current study is represented by the automated tool, Excel-MATLAB, which can be used by utility managers and decision makers in planning future maintenance and rehabilitation activities where high-level efficiency in the use of materials and time resources is required.
Keywords: water networks, criticality assessment, asset management, fuzzy analytical network process
Procedia PDF Downloads 148
2715 A Cloud-Based Federated Identity Management in Europe
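The final aggregation step described in this abstract, in which criteria weights are combined with pipeline attribute values to give a 0 to 1 criticality index, might look like the sketch below. It assumes a simple additive utility form, which the abstract does not confirm; the function name `criticality_index`, the factor names and all numbers are hypothetical, and the weights stand in for the output of the fuzzy ANP step rather than reproducing it.

```python
def criticality_index(weights, utilities):
    """Additive multi-attribute utility: sum of weight * utility.

    `weights` maps each factor to a non-negative weight (assumed to come
    from a prior weighting step such as fuzzy ANP); `utilities` maps the
    same factors to attribute values already scaled to [0, 1].
    """
    if set(weights) != set(utilities):
        raise ValueError("weights and utilities must cover the same factors")
    if any(not 0.0 <= u <= 1.0 for u in utilities.values()):
        raise ValueError("utilities must be scaled to [0, 1]")
    total = sum(weights.values())
    # Normalizing the weights keeps the index inside [0, 1].
    return sum(weights[k] / total * utilities[k] for k in weights)


# Hypothetical weights and one hypothetical pipeline.
example_weights = {"economic": 0.5, "environmental": 0.3, "social": 0.2}
example_pipe = {"economic": 1.0, "environmental": 0.0, "social": 0.5}
example_index = criticality_index(example_weights, example_pipe)
```

Pipelines can then be ranked by this index to prioritize rehabilitation, with values near 1 indicating the most severe consequence of failure.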
Authors: Jesus Carretero, Mario Vasile, Guillermo Izquierdo, Javier Garcia-Blas
Abstract:
Currently, there is a so-called ‘identity crisis’ in cybersecurity caused by the substantial security, privacy and usability shortcomings encountered in existing systems for identity management. Federated Identity Management (FIM) could be a solution for this crisis, as it is a method that facilitates management of identity processes and policies among collaborating entities without enforcing global consistency, which is difficult to achieve when there are ID legacy systems. To cope with this problem, the Connecting Europe Facility (CEF) initiative proposed in 2014 a federated solution in anticipation of the adoption of the Regulation (EU) N°910/2014, the so-called eIDAS Regulation. At present, a network of eIDAS Nodes is being deployed at European level so that every citizen recognized by a member state is recognized within the trust network at European level, enabling the consumption of services in other member states that until now were not allowed or whose concession was tedious. This is a very ambitious approach, since it aims to enable cross-border authentication of Member States’ citizens without the need to unify the authentication method (eID Scheme) of the member state in question. However, this federation is currently managed by member states and it is initially applied only to citizens and public organizations. The goal of this paper is to present the results of a European Project, named eID@Cloud, that focuses on the integration of eID in 5 cloud platforms belonging to authentication service providers of different EU Member States to act as Service Providers (SP) for private entities. We propose an initiative based on a private eID Scheme both for natural and legal persons. The methodology followed in the eID@Cloud project is that each Identity Provider (IdP) is subscribed to an eIDAS Node Connector, requesting authentication, that is subscribed to an eIDAS Node Proxy Service, issuing authentication assertions.
To cope with high loads, load balancing is supported in the eIDAS Node. The eID@Cloud project is still going on, but we already have some important outcomes. First, we have deployed the federation identity nodes and tested them from the security and performance point of view. The pilot prototype has shown the feasibility of deploying this kind of system, ensuring good performance due to the replication of the eIDAS nodes and the load balance mechanism. Second, our solution avoids the propagation of identity data out of the native domain of the user or entity being identified, which avoids problems well known in cybersecurity due to network interception, man in the middle attack, etc. Last, but not least, this system allows any country or collectivity to be connected easily, providing incremental development of the network and avoiding difficult political negotiations to agree on a single authentication format (which would be a major stopper).
Keywords: cybersecurity, identity federation, trust, user authentication
Procedia PDF Downloads 167
2714 Hybrid Weighted Multiple Attribute Decision Making Handover Method for Heterogeneous Networks
Authors: Mohanad Alhabo, Li Zhang, Naveed Nawaz
Abstract:
Small cell deployment in 5G networks is a promising technology to enhance capacity and coverage. However, unplanned deployment may cause high interference levels and a high number of unnecessary handovers, which in turn will result in an increase in the signalling overhead. To guarantee service continuity, minimize unnecessary handovers, and reduce signalling overhead in heterogeneous networks, it is essential to properly model the handover decision problem. In this paper, we model the handover decision according to the Multiple Attribute Decision Making (MADM) method, specifically the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS). We propose a hybrid TOPSIS method to control handover in heterogeneous networks. The proposed method adopts a hybrid weighting, which is a combination of entropy and standard deviation. A hybrid weighting control parameter is introduced to balance the impact of the standard deviation and entropy weighting on the network selection process and the overall performance. Our proposed method shows better performance, in terms of the number of frequent handovers and the mean user throughput, compared to the existing methods.
Keywords: handover, HetNets, interference, MADM, small cells, TOPSIS, weight
Procedia PDF Downloads 150
2713 Pattern of Cybercrime Among Adolescents: An Exploratory Study
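To make the idea of TOPSIS with hybrid entropy and standard-deviation weighting more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' method: the convex combination controlled by `alpha` is one plausible reading of the "hybrid weighting control parameter", the attribute matrix is invented, and every function name is hypothetical. The sketch also assumes strictly positive attribute values and no constant columns.

```python
import numpy as np


def entropy_weights(X):
    """Entropy weights for a (candidates x attributes) matrix of positive values."""
    P = X / X.sum(axis=0)
    E = -(P * np.log(P)).sum(axis=0) / np.log(X.shape[0])
    d = 1.0 - E  # attributes with more dispersion get more weight
    return d / d.sum()


def std_weights(X):
    """Standard-deviation weights computed on min-max scaled columns."""
    Z = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    s = Z.std(axis=0)
    return s / s.sum()


def hybrid_weights(X, alpha):
    """Assumed hybrid form: alpha * entropy + (1 - alpha) * standard deviation."""
    return alpha * entropy_weights(X) + (1.0 - alpha) * std_weights(X)


def topsis(X, weights, benefit):
    """Return the TOPSIS closeness score of each candidate (higher is better)."""
    V = X / np.sqrt((X ** 2).sum(axis=0)) * weights
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.sqrt(((V - ideal) ** 2).sum(axis=1))
    d_neg = np.sqrt(((V - anti) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg)


# Hypothetical candidate cells: signal quality (benefit) and cell load (cost).
X = np.array([[20.0, 0.2], [10.0, 0.5], [5.0, 0.9]])
benefit = np.array([True, False])
w = hybrid_weights(X, alpha=0.5)
scores = topsis(X, w, benefit)
best = int(np.argmax(scores))
```

In a handover setting, the candidate with the highest closeness score would be the preferred target cell; sweeping `alpha` between 0 and 1 shows how strongly the ranking depends on the weighting scheme.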
Authors: Mohamamd Shahjahan
Abstract:
Background: Cybercrime is currently a common phenomenon in both developed and developing countries. The young generation, especially adolescents, now use the internet frequently, and they frequently commit cybercrime in Bangladesh. Objective: In this regard, the present study on the pattern of cybercrime among young people in Bangladesh has been conducted. Methods and tools: This was a cross-sectional study, descriptive in nature. A non-probability accidental sampling technique was applied to select the sample because of the nonfinite population, and the sample size was 167. A printed semi-structured questionnaire was used to collect data. Results: The study shows that the cybercrimes adolescents mainly commit are hacking (94.6%), pornography (88.6%), software piracy (85%), cyber theft (82.6%), credit card fraud (81.4%), cyber defamation (75.6%), sweetheart swindling (social network) (65.9%), etc. According to the findings, the major causes of cybercrime among the respondents in Bangladesh were weak laws (88.0%), defective socialization (81.4%), peer group influence (80.2%), easy accessibility to the internet (74.3%), corruption (62.9%), unemployment (58.7%), and poverty (24.6%), etc. It is evident from the study that 91.0% of respondents used password crackers as a technique of cyber criminality. About 76.6%, 72.5%, 71.9%, 68.3% and 60.5% of respondents used key loggers, network sniffers, exploiting, vulnerability scanners and port scanners, respectively. Conclusion: The study concluded that the pattern of cybercrimes is frequently changing and that cybercrimes are increasing dramatically. Finally, it is recommended that public-private partnership and the enforcement of existing laws be used to control this crime.
Keywords: cybercrime, adolescents, pattern, internet
Procedia PDF Downloads 81
2712 An Integrated Approach to the Carbonate Reservoir Modeling: Case Study of the Eastern Siberia Field
Authors: Yana Snegireva
Abstract:
Carbonate reservoirs are known for their heterogeneity, resulting from various geological processes such as diagenesis and fracturing. These complexities may cause great challenges in understanding fluid flow behavior and predicting the production performance of naturally fractured reservoirs. The investigation of carbonate reservoirs is crucial, as many petroleum reservoirs are naturally fractured, which can be difficult due to the complexity of their fracture networks. This can lead to geological uncertainties, which are important for global petroleum reserves. The problem outlines the key challenges in carbonate reservoir modeling, including the accurate representation of fractures and their connectivity, as well as capturing the impact of fractures on fluid flow and production. Traditional reservoir modeling techniques often oversimplify fracture networks, leading to inaccurate predictions. Therefore, there is a need for a modern approach that can capture the complexities of carbonate reservoirs and provide reliable predictions for effective reservoir management and production optimization. The modern approach to carbonate reservoir modeling involves the utilization of the hybrid fracture modeling approach, including the discrete fracture network (DFN) method and implicit fracture network, which offer enhanced accuracy and reliability in characterizing complex fracture systems within these reservoirs. This study focuses on the application of the hybrid method in the Nepsko-Botuobinskaya anticline of the Eastern Siberia field, aiming to prove the appropriateness of this method in these geological conditions. The DFN method is adopted to model the fracture network within the carbonate reservoir. This method considers fractures as discrete entities, capturing their geometry, orientation, and connectivity. But the method has significant disadvantages since the number of fractures in the field can be very high. 
Due to limitations in the amount of main memory, it is very difficult to represent these fractures explicitly. By integrating data from image logs (formation micro imager), core data, and fracture density logs, a discrete fracture network (DFN) model can be constructed to represent fracture characteristics for hydraulically relevant fractures. The results obtained from the DFN modeling approaches provide valuable insights into the East Siberia field's carbonate reservoir behavior. The DFN model accurately captures the fracture system, allowing for a better understanding of fluid flow pathways, connectivity, and potential production zones. The analysis of simulation results enables the identification of zones of increased fracturing and optimization opportunities for reservoir development with the potential application of enhanced oil recovery techniques, which were considered in further simulations on the dual porosity and dual permeability models. This approach considers fractures as separate, interconnected flow paths within the reservoir matrix, allowing for the characterization of dual-porosity media. The case study of the East Siberia field demonstrates the effectiveness of the hybrid model method in accurately representing fracture systems and predicting reservoir behavior. The findings from this study contribute to improved reservoir management and production optimization in carbonate reservoirs with the use of enhanced and improved oil recovery methods.
Keywords: carbonate reservoir, discrete fracture network, fracture modeling, dual porosity, enhanced oil recovery, implicit fracture model, hybrid fracture model
Procedia PDF Downloads 76
2711 Methodology: A Review in Modelling and Predictability of Embankment in Soft Ground
Authors: Bhim Kumar Dahal
Abstract:
Transportation network development in developing countries is proceeding at a rapid pace. The majority of the network consists of railways and expressways, which pass through diverse topography, landform and geological conditions despite the avoidance principle applied during route selection. Construction of such networks demands many low to high embankments, which require improvement of the foundation soil. This paper mainly focuses on the various advanced ground improvement techniques used to improve soft soil, on the modelling approach, and on its predictability for embankment construction. The ground improvement techniques can be broadly classified into three groups, i.e., the densification group, the drainage and consolidation group, and the reinforcement group, which are discussed with some case studies. Various methods have been used in modelling embankments, from simple 1-dimensional to complex 3-dimensional models, using a variety of constitutive models. However, the reliability of the predictions is not found to improve systematically with the level of sophistication. Sometimes the predictions deviate by more than 60% from the monitored value despite using the same level of sophistication. This deviation is found to be mainly due to the selection of the constitutive model, assumptions made during different stages, deviation in the selection of model parameters, and simplification during physical modelling of the ground condition. This deviation can be reduced by using an optimization process, optimization tools and sensitivity analysis of the model parameters, which will guide the selection of appropriate model parameters.
Keywords: cement, improvement, physical properties, strength
Procedia PDF Downloads 176
2710 Carbon Capture and Storage by Continuous Production of CO₂ Hydrates Using a Network Mixing Technology
Authors: João Costa, Francisco Albuquerque, Ricardo J. Santos, Madalena M. Dias, José Carlos B. Lopes, Marcelo Costa
Abstract:
Nowadays, it is well recognized that carbon dioxide emissions, together with other greenhouse gases, are responsible for the dramatic climate changes that have been occurring over the past decades. Gas hydrates are currently seen as a promising and disruptive set of materials that can be used as a basis for developing new technologies for CO₂ capture and storage. Their potential as a clean and safe pathway for CCS is tremendous since they require only water and gas to be mixed under favorable temperatures and mild high pressures. However, the hydrate formation process is highly exothermic; it releases about 2 MJ per kilogram of CO₂, and it only occurs in a narrow window of operational temperatures (0 - 10 °C) and pressures (15 to 40 bar). Efficient continuous hydrate production at a specific temperature range necessitates high heat transfer rates in mixing processes. Past technologies often struggled to meet this requirement, resulting in low productivity or extended mixing/contact times due to inadequate heat transfer rates, which consistently posed a limitation. Consequently, there is a need for more effective continuous hydrate production technologies in industrial applications. In this work, a network mixing continuous production technology has been shown to be viable for producing CO₂ hydrates. The structured mixer used throughout this work consists of a network of unit cells comprising mixing chambers interconnected by transport channels. These mixing features result in enhanced heat and mass transfer rates and high interfacial surface area. The mixer capacity emerges from the fact that, under proper hydrodynamic conditions, the flow inside the mixing chambers becomes fully chaotic and self-sustained oscillatory flow, inducing intense local laminar mixing. The device presents specific heat transfer rates ranging from 107 to 108 W⋅m⁻³⋅K⁻¹.
A laboratory scale pilot installation was built using a device capable of continuously capturing 1 kg⋅h⁻¹ of CO₂, in an aqueous slurry of up to 20% in mass. The strong mixing intensity has proven to be sufficient to enhance dissolution and initiate hydrate crystallization without the need for external seeding mechanisms and to achieve, at the device outlet, conversions of 99% in CO₂. CO₂ dissolution experiments revealed that the overall liquid mass transfer coefficient is orders of magnitude larger than in similar devices with the same purpose, ranging from 1 000 to 12 000 h⁻¹. The present technology has shown itself to be capable of continuously producing CO₂ hydrates. Furthermore, the modular characteristics of the technology, where scalability is straightforward, underline the potential development of a modular hydrate-based CO₂ capture process for large-scale applications.
Keywords: network, mixing, hydrates, continuous process, carbon dioxide
Procedia PDF Downloads 52
2709 Niftiness of the COLME to Promote Shared Decision-Making in Organizations
Authors: Prakash Singh
Abstract:
The question that arises is whether a theory such as the Collegial Leadership Model of Emancipation (COLME) has the potency to introduce leadership change by empowering and emancipating employees. It is a fallacy to simply assume that experience alone, in the absence of theory, will contribute to this knowledge base to develop collegial leaders. The focus of this study is therefore to ascertain whether the COLME can serve as a conceptual framework to transform traditional bureaucratic management practices (TBMPs) in order to promote shared decision-making in organizations such as schools. All the respondents in this exploratory qualitative study embraced collegiality to transform TBMPs in their organizations. For the positive effects to be sustained, the collegial practices need to be evolutionary and emancipatory in order to evoke the values of collegial leadership as elucidated by the findings of this study. Interviewees affirmed that the COLME provides an astute framework to develop commendable collegial leadership practices as it clearly outlines procedures to develop and use the leadership potential of all the employees in order to foster joint accountability. They acknowledged that when the principles of collegiality are flexibly applied, they contribute to the creation of a holistic milieu in which all employees are able to express themselves freely, without fear of failure, and thus feel that they are part of the democratic decision-making process. Evidently, a conceptual framework such as the COLME can serve as a benchmark for leadership effectiveness because organizational outcomes need to be measured against standards of excellence in meeting both employee and customer expectations.
Keywords: collegial leadership model, employee empowerment, shared decision-making, traditional bureaucratic management practices
Procedia PDF Downloads 495
2708 Cross Attention Fusion for Dual-Stream Speech Emotion Recognition
Authors: Shaode Yu, Jiajian Meng, Bing Zhu, Hang Yu, Qiurui Sun
Abstract:
Speech emotion recognition (SER) aims to recognize subjective human emotions through in-depth analysis of audio data. How to comprehensively extract emotional information from speech audio and how to effectively fuse the extracted features remain challenging. This paper presents a dual-stream SER framework that embraces both full training and transfer learning of different networks for thorough feature encoding. In addition, a plug-and-play cross-attention fusion (CAF) module is implemented for the valid integration of the dual-stream encoder output. The effectiveness of the proposed CAF module is compared to three other fusion modules (feature summation, feature concatenation, and feature-wise linear modulation) on two databases (RAVDESS and IEMOCAP) using different dual-stream encoders (full training network, DPCNN or TextRCNN; transfer learning network, HuBERT or Wav2Vec2). Experimental results suggest that the CAF module can effectively reconcile conflicts between features from different encoders and outperform the other three feature fusion modules on the SER task. In the future, the plug-and-play CAF module can be extended for multi-branch feature fusion, and the dual-stream SER framework can be widened for multi-stream data representation to improve the recognition performance and generalization capacity.
Keywords: speech emotion recognition, cross-attention fusion, dual-stream, pre-trained
Procedia PDF Downloads 80
2707 An Adaptive Oversampling Technique for Imbalanced Datasets
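As a rough illustration of what cross-attention fusion of two encoder streams can look like, the sketch below applies generic scaled dot-product cross-attention in both directions and concatenates the pooled results. It is not the CAF module from the paper, whose internals are not given in the abstract: the absence of learned projection matrices, the residual addition, the mean pooling, the requirement that both streams share the same feature width, and the names `cross_attention` and `fuse` are all simplifying assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Each query frame attends over all context frames.

    queries: (Tq, d) features from one stream.
    context: (Tc, d) features from the other stream.
    Returns the attended features (Tq, d) and the attention weights (Tq, Tc).
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context, weights


def fuse(stream_a, stream_b):
    """Bidirectional cross-attention followed by pooling and concatenation."""
    a_att, _ = cross_attention(stream_a, stream_b)
    b_att, _ = cross_attention(stream_b, stream_a)
    # Residual connection keeps each stream's own information.
    pooled_a = (stream_a + a_att).mean(axis=0)
    pooled_b = (stream_b + b_att).mean(axis=0)
    return np.concatenate([pooled_a, pooled_b])
```

The fused vector would then feed a classifier head; compared with plain summation or concatenation, the attention weights let each stream emphasize the frames of the other stream that are most related to its own features.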
Authors: Shaukat Ali Shahee, Usha Ananthakumar
Abstract:
A data set exhibits the class imbalance problem when one class has very few examples compared to the other class; this is also referred to as between-class imbalance. Traditional classifiers fail to classify minority class examples correctly because of their bias towards the majority class. Apart from between-class imbalance, imbalance within classes, where classes are composed of different numbers of sub-clusters that in turn contain different numbers of examples, also deteriorates the performance of the classifier. Many methods have previously been proposed for handling the imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithm-based methods, cost-based methods, and ensembles of classifiers. Data preprocessing techniques have shown great potential as they attempt to improve the data distribution rather than the classifier. A data preprocessing technique handles class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples leads to loss of information, and when the minority class is absolutely rare, removing majority class examples is generally not recommended. Existing methods for handling class imbalance do not address between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between-class imbalance and within-class imbalance simultaneously for binary classification problems. Removing both kinds of imbalance simultaneously eliminates the bias of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in the total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of the sub-clusters. The method also takes into consideration the scatter of the data in the feature space and adaptively copes with unseen test data using the Lowner-John ellipsoid to increase the accuracy of the classifier. In this study, a neural network is used because it is a classifier in which the total error is minimized, and removing between-class imbalance and within-class imbalance simultaneously helps the classifier give equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the Euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to the other methods in terms of various accuracy measures. Thus, the proposed method can serve as a good alternative for handling various problem domains, such as credit scoring, customer churn prediction, and financial distress, that typically involve imbalanced data sets.
Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling
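The step in which the number of oversampled examples per sub-cluster depends on sub-cluster complexity can be illustrated with a small Python sketch. The abstract does not define the complexity measure or the allocation rule, so this sketch assumes that the complexity scores are given as inputs and that the minority-class deficit is split in proportion to them. The function name `allocate_oversampling` and the example numbers are hypothetical.

```python
def allocate_oversampling(n_majority, minority_sizes, complexity):
    """Split the minority-class deficit across sub-clusters.

    n_majority: number of majority-class examples.
    minority_sizes: current size of each minority sub-cluster.
    complexity: one non-negative score per sub-cluster (assumed given;
        the abstract does not say how complexity is measured).
    Returns the number of synthetic examples to create per sub-cluster.
    """
    deficit = max(n_majority - sum(minority_sizes), 0)
    total = sum(complexity)
    if deficit == 0 or total <= 0:
        return [0] * len(minority_sizes)

    # Proportional share for each sub-cluster, rounded down.
    exact = [deficit * c / total for c in complexity]
    counts = [int(share) for share in exact]

    # Largest-remainder rule: hand out the examples lost to rounding.
    leftover = deficit - sum(counts)
    by_remainder = sorted(
        range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical case: 100 majority examples, two minority sub-clusters.
counts = allocate_oversampling(100, [20, 10], [1.0, 4.0])
```

In this hypothetical case, the sub-cluster with the higher complexity score receives the larger share of the 70 synthetic examples needed to balance the two classes. How the synthetic examples themselves are generated is outside the scope of this sketch.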
Procedia PDF Downloads 418