Search results for: large margin nearest neighbor regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3220

Search results for: large margin nearest neighbor regression

3100 Assessment of Water Resources and Inculcation of Controlled Water Consumption System

Authors: Vakhtang Geladze, Nana Bolashvili, Tamazi Karalashvili, Nino Machavariani, Vajha Neidze, Nana Kvirkvelia, Tamar Chichinadze

Abstract:

Deficiency of fresh water is a vital global problem today. It must be taken into consideration that in the nearest future fresh water crisis will become even more acute owing to the global climate warming and fast desertification processes in the world. Georgia has signed the association agreement with Euro Union last year where the priority spheres of cooperation are the management of water resources, development of trans-boundary approach to the problem and active participation in the “Euro Union water initiative” component of “the East Europe, Caucasus and the Central Asia”. Fresh water resources are the main natural wealth of Georgia. According to the average water layer height, Georgia is behind such European countries only as Norway, Switzerland and Austria. The annual average water provision of Georgia is 4-8 times higher than in its neighbor countries Armenia and Azerbaijan. Despite abundant water resources in Georgia, there is considerable discrepancy between their volume and use in some regions because of the uneven territorial distribution. In the East Georgia, water supply of the territory and population is four times less than in the West Georgia.

Keywords: GIS, sociological survey, water consumption, water resources.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 907
3099 Calculus of Turbojet Performances for Ideal Case

Authors: S. Bennoud, S. Hocine, H. Slme

Abstract:

Developments in turbine cooling technology play an important role in increasing the thermal efficiency and the power output of recent gas turbines, in particular the turbojets.

Advanced turbojets operate at high temperatures to improve thermal efficiency and power output. These temperatures are far above the permissible metal temperatures. Therefore, there is a critical need to cool the blades in order to give theirs a maximum life period for safe operation.

The focused objective of this work is to calculate the turbojet performances, as well as the calculation of turbine blades cooling.

The developed application able the calculation of turbojet performances to different altitudes in order to find a point of optimal use making possible to maintain the turbine blades at an acceptable maximum temperature and to limit the local variations in temperatures in order to guarantee their integrity during all the lifespan of the engine.

Keywords: Brayton cycle, Turbine Blades Cooling, Turbojet Cycle, turbojet performances.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2215
3098 A Systems Approach to Gene Ranking from DNA Microarray Data of Cervical Cancer

Authors: Frank Emmert Streib, Matthias Dehmer, Jing Liu, Max Mühlhauser

Abstract:

In this paper we present a method for gene ranking from DNA microarray data. More precisely, we calculate the correlation networks, which are unweighted and undirected graphs, from microarray data of cervical cancer whereas each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. The interpretation of a tree is that it represents the n-nearest neighbor genes on the n-th level of a tree, measured by the Dijkstra distance, and, hence, gives the local embedding of a gene within the correlation network. For the obtained trees we measure the pairwise similarity between trees rooted by the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to progression of the tumor. Finally, we rank the obtained similarity values from all tissue comparisons and select the top ranked genes. For these genes the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result we find that the top ranked genes are candidates suspected to be involved in tumor growth and, hence, indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.

Keywords: Graph similarity, DNA microarray data, cancer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1755
3097 Clustering Mixed Data Using Non-normal Regression Tree for Process Monitoring

Authors: Youngji Yoo, Cheong-Sool Park, Jun Seok Kim, Young-Hak Lee, Sung-Shick Kim, Jun-Geol Baek

Abstract:

In the semiconductor manufacturing process, large amounts of data are collected from various sensors of multiple facilities. The collected data from sensors have several different characteristics due to variables such as types of products, former processes and recipes. In general, Statistical Quality Control (SQC) methods assume the normality of the data to detect out-of-control states of processes. Although the collected data have different characteristics, using the data as inputs of SQC will increase variations of data, require wide control limits, and decrease performance to detect outof- control. Therefore, it is necessary to separate similar data groups from mixed data for more accurate process control. In the paper, we propose a regression tree using split algorithm based on Pearson distribution to handle non-normal distribution in parametric method. The regression tree finds similar properties of data from different variables. The experiments using real semiconductor manufacturing process data show improved performance in fault detecting ability.

Keywords: Semiconductor, non-normal mixed process data, clustering, Statistical Quality Control (SQC), regression tree, Pearson distribution system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1779
3096 A Novel Method for the Characterization of Synchronization and Coupling in Multichannel EEG and ECoG

Authors: Manfred Hartmann, Andreas Graef, Hannes Perko, Christoph Baumgartner, Tilmann Kluge

Abstract:

In this paper we introduce a novel method for the characterization of synchronziation and coupling effects in multivariate time series that can be used for the analysis of EEG or ECoG signals recorded during epileptic seizures. The method allows to visualize the spatio-temporal evolution of synchronization and coupling effects that are characteristic for epileptic seizures. Similar to other methods proposed for this purpose our method is based on a regression analysis. However, a more general definition of the regression together with an effective channel selection procedure allows to use the method even for time series that are highly correlated, which is commonly the case in EEG/ECoG recordings with large numbers of electrodes. The method was experimentally tested on ECoG recordings of epileptic seizures from patients with temporal lobe epilepsies. A comparision with the results from a independent visual inspection by clinical experts showed an excellent agreement with the patterns obtained with the proposed method.

Keywords: EEG, epilepsy, regression analysis, seizurepropagation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1431
3095 Segmentation of Piecewise Polynomial Regression Model by Using Reversible Jump MCMC Algorithm

Authors: Suparman

Abstract:

Piecewise polynomial regression model is very flexible model for modeling the data. If the piecewise polynomial regression model is matched against the data, its parameters are not generally known. This paper studies the parameter estimation problem of piecewise polynomial regression model. The method which is used to estimate the parameters of the piecewise polynomial regression model is Bayesian method. Unfortunately, the Bayes estimator cannot be found analytically. Reversible jump MCMC algorithm is proposed to solve this problem. Reversible jump MCMC algorithm generates the Markov chain that converges to the limit distribution of the posterior distribution of piecewise polynomial regression model parameter. The resulting Markov chain is used to calculate the Bayes estimator for the parameters of piecewise polynomial regression model.

Keywords: Piecewise, Bayesian, reversible jump MCMC, segmentation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1668
3094 Security in Resource Constraints Network Light Weight Encryption for Z-MAC

Authors: Mona Almansoori, Ahmed Mustafa, Ahmad Elshamy

Abstract:

Wireless sensor network was formed by a combination of nodes, systematically it transmitting the data to their base stations, this transmission data can be easily compromised if the limited processing power and the data consistency from these nodes are kept in mind; there is always a discussion to address the secure data transfer or transmission in actual time. This will present a mechanism to securely transmit the data over a chain of sensor nodes without compromising the throughput of the network by utilizing available battery resources available in the sensor node. Our methodology takes many different advantages of Z-MAC protocol for its efficiency, and it provides a unique key by sharing the mechanism using neighbor node MAC address. We present a light weighted data integrity layer which is embedded in the Z-MAC protocol to prove that our protocol performs well than Z-MAC when we introduce the different attack scenarios.

Keywords: Hybrid MAC protocol, data integrity, lightweight encryption, Neighbor based key sharing, Sensor node data processing, Z-MAC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 564
3093 Fuzzy Logic Approach to Robust Regression Models of Uncertain Medical Categories

Authors: Arkady Bolotin

Abstract:

Dichotomization of the outcome by a single cut-off point is an important part of various medical studies. Usually the relationship between the resulted dichotomized dependent variable and explanatory variables is analyzed with linear regression, probit regression or logistic regression. However, in many real-life situations, a certain cut-off point dividing the outcome into two groups is unknown and can be specified only approximately, i.e. surrounded by some (small) uncertainty. It means that in order to have any practical meaning the regression model must be robust to this uncertainty. In this paper, we show that neither the beta in the linear regression model, nor its significance level is robust to the small variations in the dichotomization cut-off point. As an alternative robust approach to the problem of uncertain medical categories, we propose to use the linear regression model with the fuzzy membership function as a dependent variable. This fuzzy membership function denotes to what degree the value of the underlying (continuous) outcome falls below or above the dichotomization cut-off point. In the paper, we demonstrate that the linear regression model of the fuzzy dependent variable can be insensitive against the uncertainty in the cut-off point location. In the paper we present the modeling results from the real study of low hemoglobin levels in infants. We systematically test the robustness of the binomial regression model and the linear regression model with the fuzzy dependent variable by changing the boundary for the category Anemia and show that the behavior of the latter model persists over a quite wide interval.

Keywords: Categorization, Uncertain medical categories, Binomial regression model, Fuzzy dependent variable, Robustness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1558
3092 The Selection of the Nearest Anchor Using Received Signal Strength Indication (RSSI)

Authors: Hichem Sassi, Tawfik Najeh, Noureddine Liouane

Abstract:

The localization information is crucial for the operation of WSN. There are principally two types of localization algorithms. The Range-based localization algorithm has strict requirements on hardware, thus is expensive to be implemented in practice. The Range-free localization algorithm reduces the hardware cost. However, it can only achieve high accuracy in ideal scenarios. In this paper, we locate unknown nodes by incorporating the advantages of these two types of methods. The proposed algorithm makes the unknown nodes select the nearest anchor using the Received Signal Strength Indicator (RSSI) and choose two other anchors which are the most accurate to achieve the estimated location. Our algorithm improves the localization accuracy compared with previous algorithms, which has been demonstrated by the simulating results.

Keywords: WSN, localization, DV-hop, RSSI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1815
3091 Voltage Stability Margin-Based Approach for Placement of Distributed Generators in Power Systems

Authors: Oludamilare Bode Adewuyi, Yanxia Sun, Isaiah Gbadegesin Adebayo

Abstract:

Voltage stability analysis is crucial to the reliable and economic operation of power systems. The power system of developing nations is more susceptible to failures due to the continuously increasing load demand which is not matched with generation increase and efficient transmission infrastructures. Thus, most power systems are heavily stressed and the planning of extra generation from distributed generation sources needs to be efficiently done so as to ensure the security of the power system. In this paper, the performance of a relatively different approach using line voltage stability margin indicator, which has proven to have better accuracy, has been presented and compared with a conventional line voltage stability index for distributed generators (DGs) siting using the Nigerian 28 bus system. Critical Boundary Index (CBI) for voltage stability margin estimation was deployed to identify suitable locations for DG placement and the performance was compared with DG placement using Novel Line Stability Index (NLSI) approach. From the simulation results, both CBI and NLSI agreed greatly on suitable locations for DG on the test system; while CBI identified bus 18 as the most suitable at system overload, NLSI identified bus 8 to be the most suitable. Considering the effect of the DG placement at the selected buses on the voltage magnitude profile, the result shows that the DG placed on bus 18 identified by CBI improved the performance of the power system better.

Keywords: Voltage stability analysis, voltage collapse, voltage stability index, distributed generation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 455
3090 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: Machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 948
3089 The Relative Efficiency of Parameter Estimation in Linear Weighted Regression

Authors: Baoguang Tian, Nan Chen

Abstract:

A new relative efficiency in linear model in reference is instructed into the linear weighted regression, and its upper and lower bound are proposed. In the linear weighted regression model, for the best linear unbiased estimation of mean matrix respect to the least-squares estimation, two new relative efficiencies are given, and their upper and lower bounds are also studied.

Keywords: Linear weighted regression, Relative efficiency, Mean matrix, Trace.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2472
3088 Feature Subset Selection approach based on Maximizing Margin of Support Vector Classifier

Authors: Khin May Win, Nan Sai Moon Kham

Abstract:

Identification of cancer genes that might anticipate the clinical behaviors from different types of cancer disease is challenging due to the huge number of genes and small number of patients samples. The new method is being proposed based on supervised learning of classification like support vector machines (SVMs).A new solution is described by the introduction of the Maximized Margin (MM) in the subset criterion, which permits to get near the least generalization error rate. In class prediction problem, gene selection is essential to improve the accuracy and to identify genes for cancer disease. The performance of the new method was evaluated with real-world data experiment. It can give the better accuracy for classification.

Keywords: Microarray data, feature selection, recursive featureelimination, support vector machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1540
3087 Factors Influencing Bank Profitability of Czech Banks and Their International Parent Companies

Authors: Libena Cernohorska

Abstract:

The goal of this paper is to specify factors influencing the profitability of selected banks. Next, a model will be created to help establish variables that have a demonstrable influence on the development of the selected banks' profitability ratios. Czech banks and their international parent companies were selected for analyzing profitability. Banks categorized as large banks (according to the Czech National Bank's system, which ranks banks according to balance sheet total) were selected to represent the Czech banks. Two ratios, the return on assets ratio (ROA) and the return on equity ratio (ROE) are used to assess bank profitability. Six endogenous and four external indicators were selected from among other factors that influence bank profitability. The data analyzed were for 2001–2013. First, correlation analysis, which was supposed to eliminate correlated values, was conducted. A large number of correlated values were established on the basis of this analysis. The strongly correlated values were omitted. Despite this, the subsequent regression analysis of profitability for the individual banks that were selected did not confirm that the selected variables influenced their profitability. The studied factors' influence on bank profitability was demonstrated only for Ceskoslovenska Obchodni Banka and Société Générale using regression analysis. For Československa Obchodni Banka, it was demonstrated that inflation level and the amount of the central bank's interest rate influenced the return on assets ratio and that capital adequacy and market concentration influenced the return on equity ratio for Société Générale.

Keywords: Banks, profitability, regression analysis, ROA, ROE.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1887
3086 Isolation and Classification of Red Blood Cells in Anemic Microscopic Images

Authors: Jameela Ali Alkrimi, Loay E. George, Azizah Suliman, Abdul Rahim Ahmad, Karim Al-Jashamy

Abstract:

Red blood cells (RBCs) are among the most commonly and intensively studied type of blood cells in cell biology. Anemia is a lack of RBCs is characterized by its level compared to the normal hemoglobin level. In this study, a system based image processing methodology was developed to localize and extract RBCs from microscopic images. Also, the machine learning approach is adopted to classify the localized anemic RBCs images. Several textural and geometrical features are calculated for each extracted RBCs. The training set of features was analyzed using principal component analysis (PCA). With the proposed method, RBCs were isolated in 4.3secondsfrom an image containing 18 to 27 cells. The reasons behind using PCA are its low computation complexity and suitability to find the most discriminating features which can lead to accurate classification decisions. Our classifier algorithm yielded accuracy rates of 100%, 99.99%, and 96.50% for K-nearest neighbor (K-NN) algorithm, support vector machine (SVM), and neural network RBFNN, respectively. Classification was evaluated in highly sensitivity, specificity, and kappa statistical parameters. In conclusion, the classification results were obtained within short time period, and the results became better when PCA was used.

Keywords: Red blood cells, pre-processing image algorithms, classification algorithms, principal component analysis PCA, confusion matrix, kappa statistical parameters, ROC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3199
3085 Internet Purchases in European Union Countries: Multiple Linear Regression Approach

Authors: Ksenija Dumičić, Anita Čeh Časni, Irena Palić

Abstract:

This paper examines economic and Information and Communication Technology (ICT) development influence on recently increasing Internet purchases by individuals for European Union member states. After a growing trend for Internet purchases in EU27 was noticed, all possible regression analysis was applied using nine independent variables in 2011. Finally, two linear regression models were studied in detail. Conducted simple linear regression analysis confirmed the research hypothesis that the Internet purchases in analyzed EU countries is positively correlated with statistically significant variable Gross Domestic Product per capita (GDPpc). Also, analyzed multiple linear regression model with four regressors, showing ICT development level, indicates that ICT development is crucial for explaining the Internet purchases by individuals, confirming the research hypothesis.

Keywords: European Union, Internet purchases, multiple linear regression model, outlier

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2955
3084 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.

Keywords: Fuzzy C-means clustering, Fuzzy C-means clustering based attribute weighting, Pima Indians diabetes dataset, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1762
3083 FEA for Teeth Preparations Marginal Geometry

Authors: L. Sandu, F. Topalâ, S. Porojan

Abstract:

Knowledge of factors, which influence stress and its distribution, is of key importance to the successful production of durable restorations. One of this is the marginal geometry. The objective of this study was to evaluate, by finite element analysis (FEA), the influence of different marginal designs on the stress distribution in teeth prepared for cast metal crowns. Five margin designs were taken into consideration: shoulderless, chamfer, shoulder, sloped shoulder and shoulder with bevel. For each kind of preparation three dimensional finite element analyses were initiated. Maximal equivalent stresses were calculated and stress patterns were represented in order to compare the marginal designs. Within the limitation of this study, the shoulder and beveled shoulder margin preparations of the teeth are preferred for cast metal crowns from biomechanical point of view.

Keywords: finite element, marginal geometry, metal crown

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2060
3082 Inverter Based Gain-Boosting Fully Differential CMOS Amplifier

Authors: Alpana Agarwal, Akhil Sharma

Abstract:

This work presents a fully differential CMOS amplifier consisting of two self-biased gain boosted inverter stages, that provides an alternative to the power hungry operational amplifier. The self-biasing avoids the use of external biasing circuitry, thus reduces the die area, design efforts, and power consumption. In the present work, regulated cascode technique has been employed for gain boosting. The Miller compensation is also applied to enhance the phase margin. The circuit has been designed and simulated in 1.8 V 0.18 µm CMOS technology. The simulation results show a high DC gain of 100.7 dB, Unity-Gain Bandwidth of 107.8 MHz, and Phase Margin of 66.7o with a power dissipation of 286 μW and makes it suitable candidate for the high resolution pipelined ADCs.

Keywords: CMOS amplifier, gain boosting, inverter-based amplifier, self-biased inverter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2616
3081 Extended Least Squares LS–SVM

Authors: József Valyon, Gábor Horváth

Abstract:

Among neural models the Support Vector Machine (SVM) solutions are attracting increasing attention, mostly because they eliminate certain crucial questions involved by neural network construction. The main drawback of standard SVM is its high computational complexity, therefore recently a new technique, the Least Squares SVM (LS–SVM) has been introduced. In this paper we present an extended view of the Least Squares Support Vector Regression (LS–SVR), which enables us to develop new formulations and algorithms to this regression technique. Based on manipulating the linear equation set -which embodies all information about the regression in the learning process- some new methods are introduced to simplify the formulations, speed up the calculations and/or provide better results.

Keywords: Function estimation, Least–Squares Support VectorMachines, Regression, System Modeling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2008
3080 Optimization of Slider Crank Mechanism Using Design of Experiments and Multi-Linear Regression

Authors: Galal Elkobrosy, Amr M. Abdelrazek, Bassuny M. Elsouhily, Mohamed E. Khidr

Abstract:

Crank shaft length, connecting rod length, crank angle, engine rpm, cylinder bore, mass of piston and compression ratio are the inputs that can control the performance of the slider crank mechanism and then its efficiency. Several combinations of these seven inputs are used and compared. The throughput engine torque predicted by the simulation is analyzed through two different regression models, with and without interaction terms, developed according to multi-linear regression using LU decomposition to solve system of algebraic equations. These models are validated. A regression model in seven inputs including their interaction terms lowered the polynomial degree from 3rd degree to 1st degree and suggested valid predictions and stable explanations.

Keywords: Design of experiments, regression analysis, SI Engine, statistical modeling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1252
3079 Churn Prediction: Does Technology Matter?

Authors: John Hadden, Ashutosh Tiwari, Rajkumar Roy, Dymitr Ruta

Abstract:

The aim of this paper is to identify the most suitable model for churn prediction based on three different techniques. The paper identifies the variables that affect churn in reverence of customer complaints data and provides a comparative analysis of neural networks, regression trees and regression in their capabilities of predicting customer churn.

Keywords: Churn, Decision Trees, Neural Networks, Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3300
3078 How to Use E-Learning to Increase Job Satisfaction in Large Commercial Bank in Bangkok

Authors: Teerada Apibunyopas, Nithinant Thammakoranonta

Abstract:

Many organizations bring e-Learning to use as a tool in their training and human development department. It is getting more popular because it is easy to access to get knowledge all the time and also it provides a rich content, which can develop the employees’ skill efficiently. This study is focused on the factors that affect using e-Learning efficiently, so it will make job satisfaction increasing. The questionnaires were sent to employees in large commercial banks, which use e-Learning located in Bangkok, the results from multiple linear regression analysis showed that employee’s characteristics, characteristics of e-Learning, learning and growth have influence on job satisfaction.

Keywords: e-Learning, Job Satisfaction, Learning and growth.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2386
3077 Categorical Data Modeling: Logistic Regression Software

Authors: Abdellatif Tchantchane

Abstract:

A Matlab based software for logistic regression is developed to enhance the process of teaching quantitative topics and assist researchers with analyzing wide area of applications where categorical data is involved. The software offers an option of performing stepwise logistic regression to select the most significant predictors. The software includes a feature to detect influential observations in data, and investigates the effect of dropping or misclassifying an observation on a predictor variable. The input data may consist either as a set of individual responses (yes/no) with the predictor variables or as grouped records summarizing various categories for each unique set of predictor variables' values. Graphical displays are used to output various statistical results and to assess the goodness of fit of the logistic regression model. The software recognizes possible convergence constraints when present in data, and the user is notified accordingly.

Keywords: Logistic regression, Matlab, Categorical data, Influential observation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1881
3076 Research on the Problems of Housing Prices in Qingdao from a Macro Perspective

Authors: Liu Zhiyuan, Sun Zongdi, Liu Zhiyuan, Sun Zongdi

Abstract:

Qingdao is a seaside city. Taking into account the characteristics of Qingdao, this article established a multiple linear regression model to analyze the impact of macroeconomic factors on housing prices. We used stepwise regression method to make multiple linear regression analysis, and made statistical analysis of F test values and T test values. According to the analysis results, the model is continuously optimized. Finally, this article obtained the multiple linear regression equation and the influencing factors, and the reliability of the model was verified by F test and T test.

Keywords: Housing prices, multiple linear regression model, macroeconomic factors, Qingdao City.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1179
3075 Spatial Data Mining by Decision Trees

Authors: S. Oujdi, H. Belbachir

Abstract:

Existing methods of data mining cannot be applied on spatial data because they require spatial specificity consideration, as spatial relationships. This paper focuses on the classification with decision trees, which are one of the data mining techniques. We propose an extension of the C4.5 algorithm for spatial data, based on two different approaches Join materialization and Querying on the fly the different tables. Similar works have been done on these two main approaches, the first - Join materialization - favors the processing time in spite of memory space, whereas the second - Querying on the fly different tables- promotes memory space despite of the processing time. The modified C4.5 algorithm requires three entries tables: a target table, a neighbor table, and a spatial index join that contains the possible spatial relationship among the objects in the target table and those in the neighbor table. Thus, the proposed algorithms are applied to a spatial data pattern in the accidentology domain. A comparative study of our approach with other works of classification by spatial decision trees will be detailed.

Keywords: C4.5 Algorithm, Decision trees, S-CART, Spatial data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2986
3074 Ranking Genes from DNA Microarray Data of Cervical Cancer by a local Tree Comparison

Authors: Frank Emmert-Streib, Matthias Dehmer, Jing Liu, Max Muhlhauser

Abstract:

The major objective of this paper is to introduce a new method to select genes from DNA microarray data. As criterion to select genes we suggest to measure the local changes in the correlation graph of each gene and to select those genes whose local changes are largest. More precisely, we calculate the correlation networks from DNA microarray data of cervical cancer whereas each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. The interpretation of a tree is that it represents the n-nearest neighbor genes on the n-th level of a tree, measured by the Dijkstra distance, and, hence, gives the local embedding of a gene within the correlation network. For the obtained trees we measure the pairwise similarity between trees rooted by the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to tumor progression. Finally, we rank the obtained similarity values from all tissue comparisons and select the top ranked genes. For these genes the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result we find that the top ranked genes are candidates suspected to be involved in tumor growth. This indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.

Keywords: Graph similarity, generalized trees, graph alignment, DNA microarray data, cervical cancer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1752
3073 Architecture of Large-Scale Systems

Authors: Arne Koschel, Irina Astrova, Elena Deutschkämer, Jacob Ester, Johannes Feldmann

Abstract:

In this paper various techniques in relation to large-scale systems are presented. At first, explanation of large-scale systems and differences from traditional systems are given. Next, possible specifications and requirements on hardware and software are listed. Finally, examples of large-scale systems are presented.

Keywords: Distributed file systems, cashing, large scale systems, MapReduce algorithm, NoSQL databases.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3057
3072 Estimating Regression Parameters in Linear Regression Model with a Censored Response Variable

Authors: Jesus Orbe, Vicente Nunez-Anton

Abstract:

In this work we study the effect of several covariates X on a censored response variable T with unknown probability distribution. In this context, most of the studies in the literature can be located in two possible general classes of regression models: models that study the effect the covariates have on the hazard function; and models that study the effect the covariates have on the censored response variable. Proposals in this paper are in the second class of models and, more specifically, on least squares based model approach. Thus, using the bootstrap estimate of the bias, we try to improve the estimation of the regression parameters by reducing their bias, for small sample sizes. Simulation results presented in the paper show that, for reasonable sample sizes and censoring levels, the bias is always smaller for the new proposals.

Keywords: Censored response variable, regression, bias.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1474
3071 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches

Authors: Aya Salama

Abstract:

Digital Twin has emerged as a compelling research area, capturing the attention of scholars over the past decade. It finds applications across diverse fields, including smart manufacturing and healthcare, offering significant time and cost savings. Notably, it often intersects with other cutting-edge technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, the concept of a Human Digital Twin (HDT) is still in its infancy and requires further demonstration of its practicality. HDT takes the notion of Digital Twin a step further by extending it to living entities, notably humans, who are vastly different from inanimate physical objects. The primary objective of this research was to create an HDT capable of automating real-time human responses by simulating human behavior. To achieve this, the study delved into various areas, including clustering, supervised classification, topic extraction, and sentiment analysis. The paper successfully demonstrated the feasibility of HDT for generating personalized responses in social messaging applications. Notably, the proposed approach achieved an overall accuracy of 63%, a highly promising result that could pave the way for further exploration of the HDT concept. The methodology employed Random Forest for clustering the question database and matching new questions, while K-nearest neighbor was utilized for sentiment analysis.

Keywords: Human Digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification and clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 188