Search results for: imputation method of missing data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 38565

Search results for: imputation method of missing data

38145 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 549
38144 Effectiveness of Earthing System in Vertical Configurations

Authors: S. Yunus, A. Suratman, N. Mohamad Nor, M. Othman

Abstract:

This paper presents the measurement and simulation results by Finite Element Method (FEM) for earth resistance (RDC) for interconnected vertical ground rod configurations. The soil resistivity was measured using the Wenner four-pin Method, and RDC was measured using the Fall of Potential (FOP) method, as outlined in the standard. Genetic Algorithm (GA) is employed to interpret the soil resistivity to that of a 2-layer soil model. The same soil resistivity data that were obtained by Wenner four-pin method were used in FEM for simulation. This paper compares the results of RDC obtained by FEM simulation with the real measurement at field site. A good agreement was seen for RDC obtained by measurements and FEM. This shows that FEM is a reliable software to be used for design of earthing systems. It is also found that the parallel rod system has a better performance compared to a similar setup using a grid layout.

Keywords: earthing system, earth electrodes, finite element method, genetic algorithm, earth resistances

Procedia PDF Downloads 110
38143 Structural Health Monitoring of Buildings–Recorded Data and Wave Method

Authors: Tzong-Ying Hao, Mohammad T. Rahmani

Abstract:

This article presents the structural health monitoring (SHM) method based on changes in wave traveling times (wave method) within a layered 1-D shear beam model of structure. The wave method measures the velocity of shear wave propagating in a building from the impulse response functions (IRF) obtained from recorded data at different locations inside the building. If structural damage occurs in a structure, the velocity of wave propagation through it changes. The wave method analysis is performed on the responses of Torre Central building, a 9-story shear wall structure located in Santiago, Chile. Because events of different intensity (ambient vibrations, weak and strong earthquake motions) have been recorded at this building, therefore it can serve as a full-scale benchmark to validate the structural health monitoring method utilized. The analysis of inter-story drifts and the Fourier spectra for the EW and NS motions during 2010 Chile earthquake are presented. The results for the NS motions suggest the coupling of translation and torsion responses. The system frequencies (estimated from the relative displacement response of the 8th-floor with respect to the basement from recorded data) were detected initially decreasing approximately 24% in the EW motion. Near the end of shaking, an increase of about 17% was detected. These analysis and results serve as baseline indicators of the occurrence of structural damage. The detected changes in wave velocities of the shear beam model are consistent with the observed damage. However, the 1-D shear beam model is not sufficient to simulate the coupling of translation and torsion responses in the NS motion. The wave method is proven for actual implementation in structural health monitoring systems based on carefully assessing the resolution and accuracy of the model for its effectiveness on post-earthquake damage detection in buildings.

Keywords: Chile earthquake, damage detection, earthquake response, impulse response function, shear beam model, shear wave velocity, structural health monitoring, torre central building, wave method

Procedia PDF Downloads 369
38142 Artificial Intelligence and Development: The Missing Link

Authors: Driss Kettani

Abstract:

ICT4D actors are naturally attempted to include AI in the range of enabling technologies and tools that could support and boost the Development process, and to refer to these as AI4D. But, doing so, assumes that AI complies with the very specific features of ICT4D context, including, among others, affordability, relevance, openness, and ownership. Clearly, none of these is fulfilled, and the enthusiastic posture that AI4D is a natural part of ICT4D is not grounded and, to certain extent, does not serve the purpose of Technology for Development at all. In the context of Development, it is important to emphasize and prioritize ICT4D, in the national digital transformation strategies, instead of borrowing "trendy" waves of the IT Industry that are motivated by business considerations, with no specific care/consideration to Development.

Keywords: AI, ICT4D, technology for development, position paper

Procedia PDF Downloads 93
38141 Ontology for a Voice Transcription of OpenStreetMap Data: The Case of Space Apprehension by Visually Impaired Persons

Authors: Said Boularouk, Didier Josselin, Eitan Altman

Abstract:

In this paper, we present a vocal ontology of OpenStreetMap data for the apprehension of space by visually impaired people. Indeed, the platform based on produsage gives a freedom to data producers to choose the descriptors of geocoded locations. Unfortunately, this freedom, called also folksonomy leads to complicate subsequent searches of data. We try to solve this issue in a simple but usable method to extract data from OSM databases in order to send them to visually impaired people using Text To Speech technology. We focus on how to help people suffering from visual disability to plan their itinerary, to comprehend a map by querying computer and getting information about surrounding environment in a mono-modal human-computer dialogue.

Keywords: TTS, ontology, open street map, visually impaired

Procedia PDF Downloads 297
38140 Sentiment Classification Using Enhanced Contextual Valence Shifters

Authors: Vo Ngoc Phu, Phan Thi Tuoi

Abstract:

We have explored different methods of improving the accuracy of sentiment classification. The sentiment orientation of a document can be positive (+), negative (-), or neutral (0). We combine five dictionaries from [2, 3, 4, 5, 6] into the new one with 21137 entries. The new dictionary has many verbs, adverbs, phrases and idioms, that are not in five ones before. The paper shows that our proposed method based on the combination of Term-Counting method and Enhanced Contextual Valence Shifters method has improved the accuracy of sentiment classification. The combined method has accuracy 68.984% on the testing dataset, and 69.224% on the training dataset. All of these methods are implemented to classify the reviews based on our new dictionary and the Internet Movie data set.

Keywords: sentiment classification, sentiment orientation, valence shifters, contextual, valence shifters, term counting

Procedia PDF Downloads 505
38139 A Comparative Study of the Athlete Health Records' Minimum Data Set in Selected Countries and Presenting a Model for Iran

Authors: Robab Abdolkhani, Farzin Halabchi, Reza Safdari, Goli Arji

Abstract:

Background and purpose: The quality of health record depends on the quality of its content and proper documentation. Minimum data set makes a standard method for collecting key data elements that make them easy to understand and enable comparison. The aim of this study was to determine the minimum data set for Iranian athletes’ health records. Methods: This study is an applied research of a descriptive comparative type which was carried out in 2013. By using internal and external forms of documentation, a checklist was created that included data elements of athletes health record and was subjected to debate in Delphi method by experts in the field of sports medicine and health information management. Results: From 97 elements which were subjected to discussion, 85 elements by more than 75 percent of the participants (as the main elements) and 12 elements by 50 to 75 percent of the participants (as the proposed elements) were agreed upon. In about 97 elements of the case, there was no significant difference between responses of alumni groups of sport pathology and sports medicine specialists with medical record, medical informatics and information management professionals. Conclusion: Minimum data set of Iranian athletes’ health record with four information categories including demographic information, health history, assessment and treatment plan was presented. The proposed model is available for manual and electronic medical records.

Keywords: Documentation, Health record, Minimum data set, Sports medicine

Procedia PDF Downloads 483
38138 A Proposed Framework for Software Redocumentation Using Distributed Data Processing Techniques and Ontology

Authors: Laila Khaled Almawaldi, Hiew Khai Hang, Sugumaran A. l. Nallusamy

Abstract:

Legacy systems are crucial for organizations, but their intricacy and lack of documentation pose challenges for maintenance and enhancement. Redocumentation of legacy systems is vital for automatically or semi-automatically creating documentation for software lacking sufficient records. It aims to enhance system understandability, maintainability, and knowledge transfer. However, existing redocumentation methods need improvement in data processing performance and document generation efficiency. This stems from the necessity to efficiently handle the extensive and complex code of legacy systems. This paper proposes a method for semi-automatic legacy system re-documentation using semantic parallel processing and ontology. Leveraging parallel processing and ontology addresses current challenges by distributing the workload and creating documentation with logically interconnected data. The paper outlines challenges in legacy system redocumentation and suggests a method of redocumentation using parallel processing and ontology for improved efficiency and effectiveness.

Keywords: legacy systems, redocumentation, big data analysis, parallel processing

Procedia PDF Downloads 48
38137 Numerical Calculation of Dynamic Response of Catamaran Vessels Based on 3D Green Function Method

Authors: Md. Moinul Islam, N. M. Golam Zakaria

Abstract:

Seakeeping analysis of catamaran vessels in the earlier stages of design has become an important issue as it dictates the seakeeping characteristics, and it ensures safe navigation during the voyage. In the present paper, a 3D numerical method for the seakeeping prediction of catamaran vessel is presented using the 3D Green Function method. Both steady and unsteady potential flow problem is dealt with here. Using 3D linearized potential theory, the dynamic wave loads and the subsequent response of the vessel is computed. For validation of the numerical procedure catamaran vessel composed of twin, Wigley form demi-hull is used. The results of the present calculation are compared with the available experimental data and also with other calculations. The numerical procedure is also carried out for NPL-based round bilge catamaran, and hydrodynamic coefficients along with heave and pitch motion responses are presented for various Froude number. The results obtained by the present numerical method are found to be in fairly good agreement with the available data. This can be used as a design tool for predicting the seakeeping behavior of catamaran ships in waves.

Keywords: catamaran, hydrodynamic coefficients , motion response, 3D green function

Procedia PDF Downloads 222
38136 Bayesian Analysis of Topp-Leone Generalized Exponential Distribution

Authors: Najrullah Khan, Athar Ali Khan

Abstract:

The Topp-Leone distribution was introduced by Topp- Leone in 1955. In this paper, an attempt has been made to fit Topp-Leone Generalized exponential (TPGE) distribution. A real survival data set is used for illustrations. Implementation is done using R and JAGS and appropriate illustrations are made. R and JAGS codes have been provided to implement censoring mechanism using both optimization and simulation tools. The main aim of this paper is to describe and illustrate the Bayesian modelling approach to the analysis of survival data. Emphasis is placed on the modeling of data and the interpretation of the results. Crucial to this is an understanding of the nature of the incomplete or 'censored' data encountered. Analytic approximation and simulation tools are covered here, but most of the emphasis is on Markov chain based Monte Carlo method including independent Metropolis algorithm, which is currently the most popular technique. For analytic approximation, among various optimization algorithms and trust region method is found to be the best. In this paper, TPGE model is also used to analyze the lifetime data in Bayesian paradigm. Results are evaluated from the above mentioned real survival data set. The analytic approximation and simulation methods are implemented using some software packages. It is clear from our findings that simulation tools provide better results as compared to those obtained by asymptotic approximation.

Keywords: Bayesian Inference, JAGS, Laplace Approximation, LaplacesDemon, posterior, R Software, simulation

Procedia PDF Downloads 535
38135 Shield Tunnel Excavation Simulation of a Case Study Using a So-Called 'Stress Relaxation' Method

Authors: Shengwei Zhu, Alireza Afshani, Hirokazu Akagi

Abstract:

Ground surface settlement induced by shield tunneling is addressing increasing attention as shield tunneling becomes a popular construction technique for tunnels in urban areas. This paper discusses a 2D longitudinal FEM simulation of a tunneling case study in Japan (Tokyo Metro Yurakucho Line). Tunneling-induced field data was already collected and is used here for comparison and evaluating purposes. In this model, earth pressure, face pressure, backfilling grouting, elastic tunnel lining, and Mohr-Coulomb failure criterion for soil elements are considered. A method called ‘stress relaxation’ is also exploited to simulate the gradual tunneling excavation. Ground surface settlements obtained from numerical results using the introduced method are then compared with the measurement data.

Keywords: 2D longitudinal FEM model, tunneling case study, stress relaxation, shield tunneling excavation

Procedia PDF Downloads 332
38134 An Ensemble Deep Learning Architecture for Imbalanced Classification of Thoracic Surgery Patients

Authors: Saba Ebrahimi, Saeed Ahmadian, Hedie Ashrafi

Abstract:

Selecting appropriate patients for surgery is one of the main issues in thoracic surgery (TS). Both short-term and long-term risks and benefits of surgery must be considered in the patient selection criteria. There are some limitations in the existing datasets of TS patients because of missing values of attributes and imbalanced distribution of survival classes. In this study, a novel ensemble architecture of deep learning networks is proposed based on stacking different linear and non-linear layers to deal with imbalance datasets. The categorical and numerical features are split using different layers with ability to shrink the unnecessary features. Then, after extracting the insight from the raw features, a novel biased-kernel layer is applied to reinforce the gradient of the minority class and cause the network to be trained better comparing the current methods. Finally, the performance and advantages of our proposed model over the existing models are examined for predicting patient survival after thoracic surgery using a real-life clinical data for lung cancer patients.

Keywords: deep learning, ensemble models, imbalanced classification, lung cancer, TS patient selection

Procedia PDF Downloads 146
38133 Improvement of the Geometric of Dental Bridge Framework through Automatic Program

Authors: Rong-Yang Lai, Jia-Yu Wu, Chih-Han Chang, Yung-Chung Chen

Abstract:

The dental bridge is one of the clinical methods of the treatment for missing teeth. The dental bridge is generally designed for two layers, containing the inner layer of the framework(zirconia) and the outer layer of the porcelain-fused to framework restorations. The design of a conventional bridge is generally based on the antagonist tooth profile so that the framework evenly indented by an equal thickness from outer contour. All-ceramic dental bridge made of zirconia have well demonstrated remarkable potential to withstand a higher physiological occlusal load in posterior region, but it was found that there is still the risk of all-ceramic bridge failure in five years. Thus, how to reduce the incidence of failure is still a problem to be solved. Therefore, the objective of this study is to develop mechanical designs for all-ceramic dental bridges framework by reducing the stress and enhancing fracture resistance under given loading conditions by finite element method. In this study, dental design software is used to design dental bridge based on tooth CT images. After building model, Bi-directional Evolutionary Structural Optimization (BESO) Method algorithm implemented in finite element software was employed to analyze results of finite element software and determine the distribution of the materials in dental bridge; BESO searches the optimum distribution of two different materials, namely porcelain and zirconia. According to the previous calculation of the stress value of each element, when the element stress value is higher than the threshold value, the element would be replaced by the framework material; besides, the difference of maximum stress peak value is less than 0.1%, calculation is complete. After completing the design of dental bridge, the stress distribution of the whole structure is changed. BESO reduces the peak values of principle stress of 10% in outer-layer porcelain and avoids producing tensile stress failure.

Keywords: dental bridge, finite element analysis, framework, automatic program

Procedia PDF Downloads 283
38132 Generalized Correlation Coefficient in Genome-Wide Association Analysis of Cognitive Ability in Twins

Authors: Afsaneh Mohammadnejad, Marianne Nygaard, Jan Baumbach, Shuxia Li, Weilong Li, Jesper Lund, Jacob v. B. Hjelmborg, Lene Christensen, Qihua Tan

Abstract:

Cognitive impairment in the elderly is a key issue affecting the quality of life. Despite a strong genetic background in cognition, only a limited number of single nucleotide polymorphisms (SNPs) have been found. These explain a small proportion of the genetic component of cognitive function, thus leaving a large proportion unaccounted for. We hypothesize that one reason for this missing heritability is the misspecified modeling in data analysis concerning phenotype distribution as well as the relationship between SNP dosage and the phenotype of interest. In an attempt to overcome these issues, we introduced a model-free method based on the generalized correlation coefficient (GCC) in a genome-wide association study (GWAS) of cognitive function in twin samples and compared its performance with two popular linear regression models. The GCC-based GWAS identified two genome-wide significant (P-value < 5e-8) SNPs; rs2904650 near ZDHHC2 on chromosome 8 and rs111256489 near CD6 on chromosome 11. The kinship model also detected two genome-wide significant SNPs, rs112169253 on chromosome 4 and rs17417920 on chromosome 7, whereas no genome-wide significant SNPs were found by the linear mixed model (LME). Compared to the linear models, more meaningful biological pathways like GABA receptor activation, ion channel transport, neuroactive ligand-receptor interaction, and the renin-angiotensin system were found to be enriched by SNPs from GCC. The GCC model outperformed the linear regression models by identifying more genome-wide significant genetic variants and more meaningful biological pathways related to cognitive function. Moreover, GCC-based GWAS was robust in handling genetically related twin samples, which is an important feature in handling genetic confounding in association studies.

Keywords: cognition, generalized correlation coefficient, GWAS, twins

Procedia PDF Downloads 127
38131 Development of New Technology Evaluation Model by Using Patent Information and Customers' Review Data

Authors: Kisik Song, Kyuwoong Kim, Sungjoo Lee

Abstract:

Many global firms and corporations derive new technology and opportunity by identifying vacant technology from patent analysis. However, previous studies failed to focus on technologies that promised continuous growth in industrial fields. Most studies that derive new technology opportunities do not test practical effectiveness. Since previous studies depended on expert judgment, it became costly and time-consuming to evaluate new technologies based on patent analysis. Therefore, research suggests a quantitative and systematic approach to technology evaluation indicators by using patent data to and from customer communities. The first step involves collecting two types of data. The data is used to construct evaluation indicators and apply these indicators to the evaluation of new technologies. This type of data mining allows a new method of technology evaluation and better predictor of how new technologies are adopted.

Keywords: data mining, evaluating new technology, technology opportunity, patent analysis

Procedia PDF Downloads 378
38130 Analysis and Rule Extraction of Coronary Artery Disease Data Using Data Mining

Authors: Rezaei Hachesu Peyman, Oliyaee Azadeh, Salahzadeh Zahra, Alizadeh Somayyeh, Safaei Naser

Abstract:

Coronary Artery Disease (CAD) is one major cause of disability in adults and one main cause of death in developed. In this study, data mining techniques including Decision Trees, Artificial neural networks (ANNs), and Support Vector Machine (SVM) analyze CAD data. Data of 4948 patients who had suffered from heart diseases were included in the analysis. CAD is the target variable, and 24 inputs or predictor variables are used for the classification. The performance of these techniques is compared in terms of sensitivity, specificity, and accuracy. The most significant factor influencing CAD is chest pain. Elderly males (age > 53) have a high probability to be diagnosed with CAD. SVM algorithm is the most useful way for evaluation and prediction of CAD patients as compared to non-CAD ones. Application of data mining techniques in analyzing coronary artery diseases is a good method for investigating the existing relationships between variables.

Keywords: classification, coronary artery disease, data-mining, knowledge discovery, extract

Procedia PDF Downloads 659
38129 Operating Speed Models on Tangent Sections of Two-Lane Rural Roads

Authors: Dražen Cvitanić, Biljana Maljković

Abstract:

This paper presents models for predicting operating speeds on tangent sections of two-lane rural roads developed on continuous speed data. The data corresponds to 20 drivers of different ages and driving experiences, driving their own cars along an 18 km long section of a state road. The data were first used for determination of maximum operating speeds on tangents and their comparison with speeds in the middle of tangents i.e. speed data used in most of operating speed studies. Analysis of continuous speed data indicated that the spot speed data are not reliable indicators of relevant speeds. After that, operating speed models for tangent sections were developed. There was no significant difference between models developed using speed data in the middle of tangent sections and models developed using maximum operating speeds on tangent sections. All developed models have higher coefficient of determination then models developed on spot speed data. Thus, it can be concluded that the method of measuring has more significant impact on the quality of operating speed model than the location of measurement.

Keywords: operating speed, continuous speed data, tangent sections, spot speed, consistency

Procedia PDF Downloads 452
38128 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSQL), and gives 6 data cleaning methods based on these algorithms.

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 564
38127 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

his study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer help in improving the efficacy and reducing the toxicity of the treatments by identifying clues to find target therapeutics. Process of gene expression data analysis described under three steps as preprocessing, clustering, and cluster validation. Feature selection is important since the genomic data are high dimensional with a large number of features compared to samples. Hierarchical clustering and K Means are often used in the analysis of gene expression data. There are several cluster validation techniques used in validating the clusters. Heatmaps are an effective external validation method that allows comparing the identified classes with clinical variables and visual analysis of the classes.

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

Procedia PDF Downloads 149
38126 Elvis Improved Method for Solving Simultaneous Equations in Two Variables with Some Applications

Authors: Elvis Adam Alhassan, Kaiyu Tian, Akos Konadu, Ernest Zamanah, Michael Jackson Adjabui, Ibrahim Justice Musah, Esther Agyeiwaa Owusu, Emmanuel K. A. Agyeman

Abstract:

In this paper, how to solve simultaneous equations using the Elvis improved method is shown. The Elvis improved method says; to make one variable in the first equation the subject; make the same variable in the second equation the subject; equate the results and simplify to obtain the value of the unknown variable; put the value of the variable found into one equation from the first or second steps and simplify for the remaining unknown variable. The difference between our Elvis improved method and the substitution method is that: with Elvis improved method, the same variable is made the subject in both equations, and the two resulting equations equated, unlike the substitution method where one variable is made the subject of only one equation and substituted into the other equation. After describing the Elvis improved method, findings from 100 secondary students and the views of 5 secondary tutors to demonstrate the effectiveness of the method are presented. The study's purpose is proved by hypothetical examples.

Keywords: simultaneous equations, substitution method, elimination method, graphical method, Elvis improved method

Procedia PDF Downloads 141
38125 Lineup Optimization Model of Basketball Players Based on the Prediction of Recursive Neural Networks

Authors: Wang Yichen, Haruka Yamashita

Abstract:

In recent years, in the field of sports, decision making such as member in the game and strategy of the game based on then analysis of the accumulated sports data are widely attempted. In fact, in the NBA basketball league where the world's highest level players gather, to win the games, teams analyze the data using various statistical techniques. However, it is difficult to analyze the game data for each play such as the ball tracking or motion of the players in the game, because the situation of the game changes rapidly, and the structure of the data should be complicated. Therefore, it is considered that the analysis method for real time game play data is proposed. In this research, we propose an analytical model for "determining the optimal lineup composition" using the real time play data, which is considered to be difficult for all coaches. In this study, because replacing the entire lineup is too complicated, and the actual question for the replacement of players is "whether or not the lineup should be changed", and “whether or not Small Ball lineup is adopted”. Therefore, we propose an analytical model for the optimal player selection problem based on Small Ball lineups. In basketball, we can accumulate scoring data for each play, which indicates a player's contribution to the game, and the scoring data can be considered as a time series data. In order to compare the importance of players in different situations and lineups, we combine RNN (Recurrent Neural Network) model, which can analyze time series data, and NN (Neural Network) model, which can analyze the situation on the field, to build the prediction model of score. This model is capable to identify the current optimal lineup for different situations. In this research, we collected all the data of accumulated data of NBA from 2019-2020. Then we apply the method to the actual basketball play data to verify the reliability of the proposed model.

Keywords: recurrent neural network, players lineup, basketball data, decision making model

Procedia PDF Downloads 134
38124 An Adaptive Oversampling Technique for Imbalanced Datasets

Authors: Shaukat Ali Shahee, Usha Ananthakumar

Abstract:

A data set exhibits class imbalance problem when one class has very few examples compared to the other class, and this is also referred to as between class imbalance. The traditional classifiers fail to classify the minority class examples correctly due to its bias towards the majority class. Apart from between-class imbalance, imbalance within classes where classes are composed of a different number of sub-clusters with these sub-clusters containing different number of examples also deteriorates the performance of the classifier. Previously, many methods have been proposed for handling imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithmic based, cost-based methods and ensemble of classifier. Data preprocessing techniques have shown great potential as they attempt to improve data distribution rather than the classifier. Data preprocessing technique handles class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples lead to loss of information and also when minority class has an absolute rarity, removing the majority class examples is generally not recommended. Existing methods available for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between class imbalance and within class imbalance simultaneously for binary classification problem. Removing between class imbalance and within class imbalance simultaneously eliminates the biases of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of sub-clusters. The method also takes into consideration the scatter of the data in the feature space and also adaptively copes up with unseen test data using Lowner-John ellipsoid for increasing the accuracy of the classifier. In this study, neural network is being used as this is one such classifier where the total error is minimized and removing the between-class imbalance and within class imbalance simultaneously help the classifier in giving equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to other methods in terms of various accuracy measures. Thus the proposed method can serve as a good alternative to handle various problem domains like credit scoring, customer churn prediction, financial distress, etc., that typically involve imbalanced data sets.

Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling

Procedia PDF Downloads 418
38123 Investigating Physician-Induced Demand among Mental Patients in East Azerbaijan, Iran: A Multilevel Approach of Hierarchical Linear Modeling

Authors: Hossein Panahi, Firouz Fallahi, Sima Nasibparast

Abstract:

Background & Aim: Unnecessary growth in health expenditures of developing countries in recent decades, and also the importance of physicians’ behavior in health market, have made the theory of physician-induced demand (PID) as one of the most important issues in health economics. Therefore, the main objective of this study is to investigate the hypothesis of induced demand among mental patients who receive services from either psychologists or psychiatrists in East Azerbaijan province. Methods: Using data from questionnaires in 2020 and employing the theoretical model of Jaegher and Jegers (2000) and hierarchical linear modeling (HLM), this study examines the PID hypothesis of selected psychologists and psychiatrists. The sample size of the study, after removing the questionnaires with missing data, is 45 psychologists and 203 people of their patients, as well as 30 psychiatrists and 160 people of their patients. Results: The results show that, although psychiatrists are ‘profit-oriented physicians’, there is no evidence of inducing unnecessary demand by them (PID), and the difference between the behavior of employers and employee doctors is due to differences in practice style. However, with regard to psychologists, the results indicate that they are ‘profit-oriented’, and there is a PID effect in this sector. Conclusion: According to the results, it is suggested that in order to reduce competition and eliminate the PID effect, the admission of students in the field of psychology should be reduced, patient information on mental illness should be increased, and government monitoring and control over the national health system must be increased.

Keywords: physician-induced demand, national health system, hierarchical linear modeling methods, multilevel modela

Procedia PDF Downloads 138
38122 A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features

Authors: Hee-Jeong Ahn, Kee-Won Kim, Seung-Hoon Kim

Abstract:

The knowledge of the relationship between characters can help readers to understand the overall story or plot of the literary fiction. In this paper, we present a method for extracting the specific relationship between characters from a Korean literary fiction. Generally, methods for extracting relationships between characters in text are statistical or computational methods based on the sentence distance between characters without considering Korean linguistic features. Furthermore, it is difficult to extract the relationship with direction from text, such as one-sided love, because they consider only the weight of relationship, without considering the direction of the relationship. Therefore, in order to identify specific relationships between characters, we propose a statistical method considering linguistic features, such as syntactic patterns and speech verbs in Korean. The result of our method is represented by a weighted directed graph of the relationship between the characters. Furthermore, we expect that proposed method could be applied to the relationship analysis between characters of other content like movie or TV drama.

Keywords: data mining, Korean linguistic feature, literary fiction, relationship extraction

Procedia PDF Downloads 383
38121 The Influence of the Normative Gender Binary in Diversity Management: A Multi-Method Study on Gender Diversity of Diversity Management

Authors: Robin C. Ladwig

Abstract:

Diversity Management, as a substantial element of Human Resource Management, aims to secure the economic benefit that assumingly comes with a diverse workforce. Consequently, diversity managers focus on the protection of employees and securing equality measurements to assure organisational gender diversity. Gender diversity as one aspect of Diversity Management seems to adhere to gender binarism and cis-normativity. Workplaces are gendered spaces which are echoing the binary gender-normativity presented in Diversity Management, sold under the label of gender diversity. While the expectation of Diversity Management implies the inclusion of a multiplicity of marginalised groups, such as trans and gender diverse people, in current literature and practice, the reality is curated by gender binarism and cis-normativity. The qualitative multi-method research showed a lack of knowledge about trans and gender diverse matters within the profession of Diversity Management and Human Resources. The semi-structured interviews with trans and gender diverse individuals from various backgrounds and occupations in Australia exposed missing considerations of trans and gender diverse experiences in the inclusivity and gender equity of various workplaces. Even if practitioners consider trans and gender diverse matters under gender diversity, the practical execution is limited to gender binary structures and cis-normative actions as the photo-elicit questionnaire with diversity managers, human resource officers, and personnel management demonstrates. Diversity Management should approach a broader source of informed practice by extending their business focus to the knowledge of humanity studies. Humanity studies could include diversity, queer, or gender studies to increase the inclusivity of marginalised groups such as trans and gender diverse employees and people. Furthermore, the definition of gender diversity should be extended beyond the gender binary and cis-normative experience. People may lose trust in Diversity Management as a supportive ally of marginalised employees if the understanding of inclusivity is limited to a gender binary and cis-normativity value system that misrepresents the richness of gender diversity.

Keywords: cis-normativity, diversity management, gender binarism, trans and gender diversity

Procedia PDF Downloads 204
38120 Preprocessing and Fusion of Multiple Representation of Finger Vein patterns using Conventional and Machine Learning techniques

Authors: Tomas Trainys, Algimantas Venckauskas

Abstract:

Application of biometric features to the cryptography for human identification and authentication is widely studied and promising area of the development of high-reliability cryptosystems. Biometric cryptosystems typically are designed for patterns recognition, which allows biometric data acquisition from an individual, extracts feature sets, compares the feature set against the set stored in the vault and gives a result of the comparison. Preprocessing and fusion of biometric data are the most important phases in generating a feature vector for key generation or authentication. Fusion of biometric features is critical for achieving a higher level of security and prevents from possible spoofing attacks. The paper focuses on the tasks of initial processing and fusion of multiple representations of finger vein modality patterns. These tasks are solved by applying conventional image preprocessing methods and machine learning techniques, Convolutional Neural Network (SVM) method for image segmentation and feature extraction. An article presents a method for generating sets of biometric features from a finger vein network using several instances of the same modality. Extracted features sets were fused at the feature level. The proposed method was tested and compared with the performance and accuracy results of other authors.

Keywords: bio-cryptography, biometrics, cryptographic key generation, data fusion, information security, SVM, pattern recognition, finger vein method.

Procedia PDF Downloads 152
38119 Atomic Decomposition Audio Data Compression and Denoising Using Sparse Dictionary Feature Learning

Authors: T. Bryan , V. Kepuska, I. Kostnaic

Abstract:

A method of data compression and denoising is introduced that is based on atomic decomposition of audio data using “basis vectors” that are learned from the audio data itself. The basis vectors are shown to have higher data compression and better signal-to-noise enhancement than the Gabor and gammatone “seed atoms” that were used to generate them. The basis vectors are the input weights of a Sparse AutoEncoder (SAE) that is trained using “envelope samples” of windowed segments of the audio data. The envelope samples are extracted from the audio data by performing atomic decomposition with Gabor or gammatone seed atoms. This process identifies segments of audio data that are locally coherent with the seed atoms. Envelope samples are extracted by identifying locally coherent audio data segments with Gabor or gammatone seed atoms, found by matching pursuit. The envelope samples are formed by taking the kronecker products of the atomic envelopes with the locally coherent data segments. Oracle signal-to-noise ratio (SNR) verses data compression curves are generated for the seed atoms as well as the basis vectors learned from Gabor and gammatone seed atoms. SNR data compression curves are generated for speech signals as well as early American music recordings. The basis vectors are shown to have higher denoising capability for data compression rates ranging from 90% to 99.84% for speech as well as music. Envelope samples are displayed as images by folding the time series into column vectors. This display method is used to compare of the output of the SAE with the envelope samples that produced them. The basis vectors are also displayed as images. Sparsity is shown to play an important role in producing the highest denoising basis vectors.

Keywords: sparse dictionary learning, autoencoder, sparse autoencoder, basis vectors, atomic decomposition, envelope sampling, envelope samples, Gabor, gammatone, matching pursuit

Procedia PDF Downloads 254
38118 Comparative Analysis of Classification Methods in Determining Non-Active Student Characteristics in Indonesia Open University

Authors: Dewi Juliah Ratnaningsih, Imas Sukaesih Sitanggang

Abstract:

Classification is one of data mining techniques that aims to discover a model from training data that distinguishes records into the appropriate category or class. Data mining classification methods can be applied in education, for example, to determine the classification of non-active students in Indonesia Open University. This paper presents a comparison of three methods of classification: Naïve Bayes, Bagging, and C.45. The criteria used to evaluate the performance of three methods of classification are stratified cross-validation, confusion matrix, the value of the area under the ROC Curve (AUC), Recall, Precision, and F-measure. The data used for this paper are from the non-active Indonesia Open University students in registration period of 2004.1 to 2012.2. Target analysis requires that non-active students were divided into 3 groups: C1, C2, and C3. Data analyzed are as many as 4173 students. Results of the study show: (1) Bagging method gave a high degree of classification accuracy than Naïve Bayes and C.45, (2) the Bagging classification accuracy rate is 82.99 %, while the Naïve Bayes and C.45 are 80.04 % and 82.74 % respectively, (3) the result of Bagging classification tree method has a large number of nodes, so it is quite difficult in decision making, (4) classification of non-active Indonesia Open University student characteristics uses algorithms C.45, (5) based on the algorithm C.45, there are 5 interesting rules which can describe the characteristics of non-active Indonesia Open University students.

Keywords: comparative analysis, data mining, clasiffication, Bagging, Naïve Bayes, C.45, non-active students, Indonesia Open University

Procedia PDF Downloads 316
38117 Fuzzy Total Factor Productivity by Credibility Theory

Authors: Shivi Agarwal, Trilok Mathur

Abstract:

This paper proposes the method to measure the total factor productivity (TFP) change by credibility theory for fuzzy input and output variables. Total factor productivity change has been widely studied with crisp input and output variables, however, in some cases, input and output data of decision-making units (DMUs) can be measured with uncertainty. These data can be represented as linguistic variable characterized by fuzzy numbers. Malmquist productivity index (MPI) is widely used to estimate the TFP change by calculating the total factor productivity of a DMU for different time periods using data envelopment analysis (DEA). The fuzzy DEA (FDEA) model is solved using the credibility theory. The results of FDEA is used to measure the TFP change for fuzzy input and output variables. Finally, numerical examples are presented to illustrate the proposed method to measure the TFP change input and output variables. The suggested methodology can be utilized for performance evaluation of DMUs and help to assess the level of integration. The methodology can also apply to rank the DMUs and can find out the DMUs that are lagging behind and make recommendations as to how they can improve their performance to bring them at par with other DMUs.

Keywords: chance-constrained programming, credibility theory, data envelopment analysis, fuzzy data, Malmquist productivity index

Procedia PDF Downloads 367
38116 Assessment of ASEI-PDSI Method on Students’ Attitude and Achievement in Junior Secondary Schools Mathematics in FCT-Abuja

Authors: Amenaghawon Clement Osemwinyen

Abstract:

The Activity, Student-centred, Experiment, Improvisation - Plan, Do, See, Improve (ASEI-PDSI) method championed by the Strengthening Mathematics And Science Education (SMASE) - Nigeria Project is an attempt to improve the quality of mathematics, which has consistently declined over the years in both public primary and secondary schools across the country. The study thus assessed the ASEI-PDSI method on students’ attitudes and achievement in junior secondary schools (JSS) mathematics in FCT-Abuja. A survey research design was adopted, and 100 mathematics teachers using a stratified random sampling method were used for the study. The data were collected using structured questionnaires and analyzed using descriptive statistics. The findings showed that the ASEI-PDSI method had significantly improved the attitudes of students toward mathematics. The study also revealed that the ASEI-PDSI method significantly influenced junior secondary school (JSS) students’ mathematics achievement. Amongst the recommendations were that teachers should be encouraged to adopt the ASEI-PDSI method in teaching and learning mathematics in order to create a mathematically stimulating classroom environment which could advertently influence junior secondary school (JSS) students’ attitude and academic performance in mathematics. Also, regular in-service training programs should be organized by stakeholders (government and other interest groups) so as to improve the teaching strategies of teachers, mostly as they affect the ASEI-PDSI method.

Keywords: achievement, ASEI-PDSI method, attitude, mathematics, SMASE

Procedia PDF Downloads 119