Search results for: multivariate regression
505 A Linear Regression Model for Estimating Anxiety Index Using Wide Area Frontal Lobe Brain Blood Volume
Authors: Takashi Kaburagi, Masashi Takenaka, Yosuke Kurihara, Takashi Matsumoto
Abstract:
Major depressive disorder (MDD) is one of the most common mental illnesses today. It is believed to be caused by a combination of several factors, including stress. Stress can be quantitatively evaluated using the State-Trait Anxiety Inventory (STAI), one of the best indices to evaluate anxiety. Although STAI scores are widely used in applications ranging from clinical diagnosis to basic research, the scores are calculated based on a self-reported questionnaire. An objective evaluation is required because the subject may intentionally change his/her answers if multiple tests are carried out. In this article, we present a modified index called the “multi-channel Laterality Index at Rest (mc-LIR)” by recording the brain activity from a wider area of the frontal lobe using multi-channel functional near-infrared spectroscopy (fNIRS). The presented index aims to measure multiple positions near the Fpz defined by the international 10-20 system positioning. Using 24 subjects, the dependencies on the number of measuring points used to calculate the mc-LIR and its correlation coefficients with the STAI scores are reported. Furthermore, a simple linear regression was performed to estimate the STAI scores from mc-LIR. The cross-validation error is also reported. The experimental results show that using multiple positions near the Fpz improves the correlation coefficients and the estimation compared with using only two positions.
Keywords: Stress, functional near-infrared spectroscopy, frontal lobe, state-trait anxiety inventory score.
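The regression step described in this abstract (a simple linear regression evaluated by cross-validation) could look roughly like the sketch below. The abstract does not state which cross-validation scheme was used, so leave-one-out is an assumption here, and the function names and sample values are hypothetical rather than taken from the paper.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def loocv_rmse(xs, ys):
    """Leave-one-out CV: refit without each point, then predict that point."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return mean(squared_errors) ** 0.5

# Hypothetical index values and questionnaire scores, NOT data from the paper.
index_values = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.3, 0.5]
scores = [33.0, 35.5, 38.5, 39.0, 43.0, 45.0, 51.0]

intercept, slope = fit_line(index_values, scores)
print("intercept:", intercept, "slope:", slope)
print("leave-one-out RMSE:", loocv_rmse(index_values, scores))
```

With only 24 subjects, as in the study, leave-one-out is a common choice because it uses every observation for both fitting and testing, but a k-fold split would follow the same pattern.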
Downloads 1166
504 Integrating Decision Tree and Spatial Cluster Analysis for Landslide Susceptibility Zonation
Authors: Chien-Min Chu, Bor-Wen Tsai, Kang-Tsung Chang
Abstract:
A landslide susceptibility map delineates the potential zones for landslide occurrence. Previous works have applied multivariate methods and neural networks for mapping landslide susceptibility. This study proposed a new approach that integrates a decision tree model and a spatial cluster statistic for assessing landslide susceptibility spatially. A total of 2057 landslide cells were digitized for developing the landslide decision tree model. The relationships between landslides and instability factors were explicitly represented by using tree graphs in the model. The local Getis-Ord statistics were used to cluster cells with high landslide probability. The analytic result from the local Getis-Ord statistics was classified to create a map of landslide susceptibility zones. The map was validated using new landslide data with 482 cells. The validation results show an accuracy rate of 86.1% in predicting new landslide occurrence. This indicates that the proposed approach is useful for improving landslide susceptibility mapping.
Keywords: Landslide susceptibility zonation, decision tree model, spatial cluster, local Getis-Ord statistics.
Downloads 1940
503 Multinomial Dirichlet Gaussian Process Model for Classification of Multidimensional Data
Authors: Wanhyun Cho, Soonja Kang, Sangkyoon Kim, Soonyoung Park
Abstract:
We present a probabilistic multinomial Dirichlet classification model for multidimensional data with Gaussian process priors. Here, we have considered an efficient computational method that can be used to obtain the approximate posteriors for the latent variables and parameters needed to define the multiclass Gaussian process classification model. We first investigated the process of inducing a posterior distribution for the various parameters and the latent function by using variational Bayesian approximations and the importance sampling method, and next we derived a predictive distribution of the latent function needed to classify new samples. The proposed model is applied to classify a synthetic multivariate dataset in order to verify the performance of our model. The experimental results show that our model is more accurate than the other approximation methods.
Keywords: Multinomial Dirichlet classification model, Gaussian process priors, variational Bayesian approximation, importance sampling, approximate posterior distribution, marginal likelihood evidence.
Downloads 1614
502 A Prototype of Augmented Reality for Visualising Large Sensors’ Datasets
Authors: Folorunso Olufemi Ayinde, Mohd Shahrizal Sunar, Sarudin Kari, Dzulkifli Mohamad
Abstract:
In this paper we discuss the development of an Augmented Reality (AR)-based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakage sensor datasets. Sensors generate a significant amount of multivariate data during normal and leak situations. Therefore we have developed a data model to effectively manage such data and enhance the computational support needed for effective data exploration. A challenge of this approach is to reduce the data inefficiency caused by the disparate, repeated, inconsistent and missing attributes of most available sensor datasets. To handle this challenge, this paper aims to develop an AR-based scientific visualization interface which automatically identifies, localises and visualises all necessary data relevant to a particular selected region of interest (ROI) along the virtual pipeline network. The necessary system architectural support as well as the interface requirements for such visualizations are also discussed in this paper.
Keywords: Sensor Leakages Datasets, Augmented Reality, Sensor Data-Model, Scientific Visualization.
Downloads 1680
501 Foreign Direct Investment on Economic Growth by Industries in Central and Eastern European Countries
Authors: Shorena Pharjiani
Abstract:
This empirical paper investigates the relationship between FDI and economic growth by 10 selected industries in 10 Central and Eastern European countries over the period 1995 to 2012. Different estimation approaches were used to explore the connection between FDI and economic growth, for example OLS, RE, and FE with and without time dummies. The empirical results lead to several main conclusions. First, the Central and East European countries (CEEC) attracted foreign direct investment, which raised the productivity of the industries it entered. It should be concluded that the linkage between FDI and output growth by industries is positive and significant enough to suggest that foreign firms’ participation enhanced the productivity of the industries they occupied. There was an endogeneity problem in the regression, and a fixed effects estimation approach was used, which partially corrected the regression analysis in order to make the results less biased. Second, it should be stressed that the results show that time has an important role in making FDI operational for enhancing output growth by industries via total factor productivity. Third, R&D positively affected economic growth, although it takes some time for research and development to influence economic growth. Fourth, the general trends masked crucial differences at the country level: over the last 20 years, the analysis of the tables and figures at the country level shows that the main recipients of FDI of the 11 Central and Eastern European countries were Hungary, Poland and the Czech Republic. The main reason was that these countries had more open-door policies for attracting FDI. Fifth, according to the graphical analysis, while Hungary had the highest FDI inflow in this region, it was not reflected in GDP growth as much as in other Central and Eastern European countries.
Keywords: Central and East European countries (CEEC), economic growth, FDI, panel data.
Downloads 1665
500 Effects of Environmental Factors on Polychaete Assemblage in Penang National Park, Malaysia
Authors: Mohammad Gholizadeh, Khairun Yahya, Anita Talib, Omar Ahmad
Abstract:
Macrobenthos distribution along the coastal waters of Penang National Park was studied to estimate the effect of different environmental parameters at three stations, during six sampling months, from June 2010 to April 2011. The aim of this survey was to investigate the effect of different environmental stresses on the soft-bottom polychaete community along Teluk Ketapang and Pantai Acheh (Penang National Park) over a one-year period. Variations in the polychaete community were evaluated using univariate and multivariate methods. A total of 604 individuals were examined, which were grouped into 23 families. Family Nereidae was the most abundant (22.68%), followed by Spionidae (22.02%), Hesionidae (12.58%), Nephtyidae (9.27%) and Orbiniidae (8.61%). It is noticeable that good results can only be obtained on the basis of good taxonomic resolution. The maximum Shannon-Wiener diversity (H'=2.16) was recorded at distances of 200 m and 1200 m (August 2010) in Teluk Ketapang, and the lowest diversity value was found at a distance of 1200 m (December 2010) in Teluk Ketapang.
Keywords: Polychaete assemblage, environment factor, Pantai Acheh, Teluk Ketapang.
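For readers unfamiliar with the Shannon-Wiener index H' quoted in this abstract, a minimal sketch of the standard formula H' = -Σ p_i ln p_i follows. The natural logarithm is an assumption (the abstract does not state the base), and the family counts are hypothetical, not the survey data.

```python
import math

def shannon_wiener(counts):
    """H' = -sum(p_i * ln(p_i)) over taxa with non-zero abundance."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical abundances per family at one station, NOT the survey data.
family_counts = [40, 35, 20, 15, 10, 5, 3, 2]
print("H' =", shannon_wiener(family_counts))
```

The index is largest when individuals are spread evenly across taxa; for S equally abundant taxa it equals ln S.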
Downloads 2198
499 Defining Human Resources “Bundles” and Its’ Correlation with Companies’ Financial Performances
Authors: Ivana Tadić, Snježana Pivac
Abstract:
Although human resources are recognized as a crucial company resource and their positive influence on companies’ performance has been confirmed by various studies, scientists are still debating it. In order to contribute to this debate, this paper firstly discusses the most important human resource management elements and practices and their influence on companies’ success. Afterwards it defines human resource “bundles” – interrelated and internally consistent human resource practices, complementary to each other, or the most important human resource practices and elements regarding Croatian companies and their human resource management activities. Finally, the paper provides empirical results; more precisely, it reveals the relation between the level of development of the human resource management function (“bundles”) and companies’ financial performance (using profitability ratios, liquidity ratios, solvency ratios and a group of additional ratios related to employees’ indicators).
Keywords: Companies’ performances, human resource bundles, multivariate statistical analysis.
Downloads 8810
498 Customer Churn Prediction Using Four Machine Learning Algorithms Integrating Feature Selection and Normalization in the Telecom Sector
Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh
Abstract:
A crucial part of maintaining a customer-oriented business in the telecommunications industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years, which has made it more important to understand customers’ needs in this strong market. For customers who are looking to switch service providers, understanding their needs is especially important. Churn prediction is now a mandatory requirement for retaining customers in the telecommunications industry, and machine learning can be used to accomplish it. Churn prediction has become a very important topic in machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important for building an effective churn prediction model. This paper aims to predict churn and identify the factors of customers’ churn based on their past service usage history. To this end, the study makes use of feature selection, normalization, and feature engineering. The study then compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Performance was evaluated using the F1 score and ROC-AUC. Compared with existing models, the approach in this study produced better results. The results showed that Gradient Boosting with the feature selection technique performed best in this study, achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.
Keywords: Machine Learning, Gradient Boosting, Logistic Regression, Churn, Random Forest, Decision Tree, ROC, AUC, F1-score.
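The two evaluation metrics named in this abstract, F1 score and ROC-AUC, can be computed from first principles as in the sketch below. This is only an illustration of the metric definitions, not the study's pipeline; the labels, predictions and scores are hypothetical.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical churn labels (1 = churned), hard predictions and model scores.
labels = [1, 1, 0, 0]
predictions = [1, 0, 1, 0]
churn_scores = [0.9, 0.4, 0.6, 0.1]

print("F1:", f1_score(labels, predictions))
print("ROC-AUC:", roc_auc(labels, churn_scores))
```

F1 depends on the decision threshold used to produce hard predictions, whereas ROC-AUC is computed from the ranking of scores alone, which is why the two are often reported together.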
Downloads 408
497 Evaluation of Short-Term Load Forecasting Techniques Applied for Smart Micro Grids
Authors: Xiaolei Hu, Enrico Ferrera, Riccardo Tomasi, Claudio Pastrone
Abstract:
Load Forecasting plays a key role in making today's and future Smart Energy Grids sustainable and reliable. Accurate power consumption prediction allows utilities to organize their resources in advance or to execute Demand Response strategies more effectively, which enables several features such as higher sustainability, better quality of service, and affordable electricity tariffs. It is easy yet effective to apply Load Forecasting at larger geographic scale, i.e. Smart Micro Grids, wherein the lower available grid flexibility makes accurate prediction more critical in Demand Response applications. This paper analyses the application of short-term load forecasting in a concrete scenario, proposed within the EU-funded GreenCom project, which collects load data from single loads and households belonging to a Smart Micro Grid. Three short-term load forecasting techniques, i.e. linear regression, artificial neural networks, and radial basis function network, are considered, compared, and evaluated through absolute forecast errors and training time. The influence of weather conditions on Load Forecasting is also evaluated. A new definition of Gain is introduced in this paper, which innovatively serves as an indicator of short-term prediction capabilities of time span consistency. Two models, 24- and 1-hour-ahead forecasting, are built to comprehensively compare these three techniques.
Keywords: Short-term load forecasting, smart micro grid, linear regression, artificial neural networks, radial basis function network, Gain.
Downloads 2602
496 The Impact of Socio-Economic and Type of Religion on the Behavior of Obedience among Arab-Israeli Teenagers
Authors: Sadhana Ghnayem
Abstract:
This article examines the relationship between several socio-economic and background variables of Arab-Israeli families and their effect on the conflict management style of forcing, where teenage children are expected to obey their parents without questioning. The article explores the inter-generational gap and the desire of Arab-Israeli parents to force their teenage children to obey without questioning. The independent variables include: the sex of the parent, religion (Christian or Muslim), income of the parent, years of education of the parent, and the sex of the teenage child. We use the dependent variable of “Obedience Without Questioning”, which is reported twice: by each of the parents as well as by the children. We circulated a questionnaire and collected data from a sample of 180 parents and their adolescent children living in the Galilee area during 2018. In this questionnaire we asked each parent and his/her teenage child whether the latter is expected to follow the instructions of the former without questioning. The outcome of this article indicates, first, that Christian-Arab families are less authoritarian than Muslim families in demanding sheer obedience from their children. Second, female parents indicate more than male parents that their teenage child indeed obeys without questioning. Third, there is a negative correlation between the variable “Income” and “Obedience without Questioning.” Yet, the regression coefficient of this variable is close to zero. Fourth, there is a positive correlation between years of education and obedience reported by the children. In other words, more educated parents are more likely to demand obedience from their children. Finally, after running the regression, the study also found that the impact of the variables of religion as well as the sex of the child on the dependent variable of obedience is significant at above the 95% and 90% levels, respectively.
Keywords: Arab-Israeli parents, Obedience, Forcing, Inter-generational gap.
Downloads 793
495 Hippocampus Segmentation using a Local Prior Model on its Boundary
Authors: Dimitrios Zarpalas, Anastasios Zafeiropoulos, Petros Daras, Nicos Maglaveras
Abstract:
Segmentation techniques based on Active Contour Models have benefited strongly from the use of prior information during their evolution. Shape prior information is captured from a training set and is introduced in the optimization procedure to restrict the evolution to allowable shapes. In this way, the evolution converges onto regions even with weak boundaries. Although significant effort has been devoted to different ways of capturing and analyzing prior information, very little thought has been devoted to the way of combining image information with prior information. This paper focuses on a more natural way of incorporating the prior information in the level set framework. For proof of concept the method is applied to hippocampus segmentation in T1-MR images. Hippocampus segmentation is a very challenging task, due to the multivariate surrounding region and the missing boundary with the neighboring amygdala, whose intensities are identical. The proposed method mimics the way humans perform segmentation and thus shows enhancements in segmentation accuracy.
Keywords: Medical imaging & processing, brain MRI segmentation, hippocampus segmentation, hippocampus-amygdala missing boundary, weak boundary segmentation, region based segmentation, prior information, local weighting scheme in level sets, spatial distribution of labels, gradient distribution on boundary.
Downloads 1752
494 Meta Model for Optimum Design Objective Function of Steel Frames Subjected to Seismic Loads
Authors: Salah R. Al Zaidee, Ali S. Mahdi
Abstract:
Except for simple problems of statically determinate structures, optimum design problems in structural engineering have implicit objective functions where structural analysis and design are essential within each searching loop. With these implicit functions, the structural engineer is usually forced to write his/her own computer code for analysis, design, and searching for the optimum design among many feasible candidates and cannot take advantage of available software for structural analysis, design, and searching for the optimum solution. The meta-model is a regression model used to transform an implicit objective function into an explicit one, which in turn decouples the structural analysis and design processes from the optimum searching process. With the meta-model, well-known software for structural analysis and design can be used in sequence with optimum searching software. In this paper, the meta-model has been used to develop an explicit objective function for plane steel frames subjected to dead, live, and seismic forces. Frame topology is assumed to be predefined based on architectural and functional requirements. Column and beam sections and different connection details are the main design variables in this study. Columns and beams are grouped to reduce the number of design variables and to make the problem similar to that adopted in engineering practice. Data for the implicit objective function have been generated based on analysis and assessment of many design proposals with CSI SAP software. These data have been used later in SPSS software to develop a pure quadratic nonlinear regression model for the explicit objective function. Good correlations, with a coefficient R² in the range from 0.88 to 0.99, have been noted between the original implicit functions and the corresponding explicit functions generated with the meta-model.
Keywords: Meta-model, objective function, steel frames, seismic analysis, design.
Downloads 1333
493 On the Performance of Information Criteria in Latent Segment Models
Authors: Jaime R. S. Fonseca
Abstract:
Despite the widespread application of finite mixture models in segmentation, finite mixture model selection is still an important issue. In fact, the selection of an adequate number of segments is a key issue in deriving latent segment structures, and it is desirable that the selection criteria used to this end are effective. In order to select among several information criteria which may support the selection of the correct number of segments, we conduct a simulation study. In particular, this study is intended to determine which information criteria are more appropriate for mixture model selection when considering data sets with only categorical segmentation base variables. The generation of mixtures of multinomial data supports the proposed analysis. As a result, we establish a relationship between the level of measurement of the segmentation variables and the performance of some (eleven) information criteria. The criterion AIC3 shows better performance (it indicates the correct number of segments in the simulated structure more often) when referring to mixtures of multinomial segmentation base variables.
Keywords: Quantitative Methods, Multivariate Data Analysis, Clustering, Finite Mixture Models, Information Theoretical Criteria, Simulation experiments.
Downloads 1519
492 Forecast of the Small Wind Turbines Sales with Replacement Purchases and with or without Account of Price Changes
Authors: V. Churkin, M. Lopatin
Abstract:
The purpose of the paper is to estimate the US small wind turbine market potential and forecast small wind turbine sales in the US. The forecasting method is based on the application of the Bass model and the generalized Bass model of innovation diffusion under replacement purchases. In this work an exponential distribution is used for modeling replacement purchases. The single parameter of this distribution is determined by the average lifetime of small wind turbines. The identification of the model parameters is based on nonlinear regression analysis of the annual sales statistics published by the American Wind Energy Association (AWEA) from 2001 to 2012. The estimate of the US average market potential of small wind turbines (for adoption purchases) without account of price changes is 57080 (confidence interval from 49294 to 64866 at P = 0.95) under an average wind turbine lifetime of 15 years, and 62402 (confidence interval from 54154 to 70648 at P = 0.95) under an average wind turbine lifetime of 20 years. In the first case the explained variance is 90.7%, while in the second it is 91.8%. The effect of wind turbine price changes on sales was estimated using the generalized Bass model. This required a price forecast, for which a polynomial regression function based on the Berkeley Lab statistics was used. The estimate of the US average market potential of small wind turbines (for adoption purchases) in that case is 42542 (confidence interval from 32863 to 52221 at P = 0.95) under an average wind turbine lifetime of 15 years, and 47426 (confidence interval from 36092 to 58760 at P = 0.95) under an average wind turbine lifetime of 20 years. In the first case the explained variance is 95.3%, while in the second it is 95.3%.
Keywords: Bass model, generalized Bass model, replacement purchases, sales forecasting of innovations, statistics of sales of small wind turbines in the United States.
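As background to the Bass diffusion model mentioned in this abstract, the sketch below evaluates the standard closed-form cumulative adoption fraction F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}) and the resulting first-time purchases per period. It covers adoption purchases only; the paper's replacement-purchase and price extensions are not reproduced. The market potential 57080 is taken from the abstract purely for scale, while the coefficients p and q are hypothetical.

```python
import math

def bass_cumulative_fraction(t, p, q):
    """Closed-form Bass F(t): share of the market potential adopted by time t."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

def bass_adoptions(m, p, q, periods):
    """First-time purchases in each period 1..periods for market potential m."""
    return [
        m * (bass_cumulative_fraction(t, p, q)
             - bass_cumulative_fraction(t - 1, p, q))
        for t in range(1, periods + 1)
    ]

# m comes from the abstract; p (innovation) and q (imitation) are hypothetical.
market_potential = 57080
p_innovation, q_imitation = 0.03, 0.38

yearly = bass_adoptions(market_potential, p_innovation, q_imitation, 12)
for year, units in enumerate(yearly, start=1):
    print(year, round(units))
```

In a fitting exercise like the one described, m, p and q would be estimated together by nonlinear least squares against observed annual sales rather than fixed in advance.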
Downloads 1883
491 Full-genomic Network Inference for Non-model organisms: A Case Study for the Fungal Pathogen Candida albicans
Authors: Jörg Linde, Ekaterina Buyko, Robert Altwasser, Udo Hahn, Reinhard Guthke
Abstract:
Reverse engineering of full-genomic interaction networks based on compendia of expression data has been successfully applied for a number of model organisms. This study adapts these approaches for an important non-model organism: the major human fungal pathogen Candida albicans. During the infection process, the pathogen can adapt to a wide range of environmental niches and reversibly changes its growth form. Given the importance of these processes, it is important to know how they are regulated. This study presents a reverse engineering strategy able to infer full-genomic interaction networks for C. albicans based on a linear regression, utilizing the sparseness criterion (LASSO). To overcome the limited amount of expression data and the small number of known interactions, we utilize different prior-knowledge sources guiding the network inference to a knowledge-driven solution. Since no database of known interactions for C. albicans exists, we use a text-mining system which utilizes full-text research papers to identify known regulatory interactions. By comparing with these known regulatory interactions, we find an optimal value for the global modelling parameters weighting the influence of the sparseness criterion and the prior knowledge. Furthermore, we show that soft integration of prior knowledge additionally improves the performance. Finally, we compare the performance of our approach to state-of-the-art network inference approaches.
Keywords: Pathogen, network inference, text-mining, Candida albicans, LASSO, mutual information, reverse engineering, linear regression, modelling.
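The sparseness criterion (LASSO) referred to in this abstract can be illustrated with a plain coordinate-descent solver, sketched below. This is generic LASSO regression only: the prior-knowledge weighting described in the abstract is not included, and the objective scaling, function names and toy data are assumptions made for the example.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic updates."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return w

# Toy data: two candidate regulators, only the first has a real effect.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]

print("lam = 0.0:", lasso_coordinate_descent(X, y, lam=0.0))
print("lam = 0.1:", lasso_coordinate_descent(X, y, lam=0.1))
```

In a network-inference setting each target gene would get its own regression of this kind, and non-zero weights would be read as candidate regulatory edges; increasing `lam` removes weaker edges first.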
Downloads 1673
490 Microscopic Emission and Fuel Consumption Modeling for Light-duty Vehicles Using Portable Emission Measurement System Data
Authors: Wei Lei, Hui Chen, Lin Lu
Abstract:
Microscopic emission and fuel consumption models have been widely recognized as an effective method to quantify real traffic emission and energy consumption when they are applied with microscopic traffic simulation models. This paper presents a framework for developing the Microscopic Emission (HC, CO, NOx, and CO2) and Fuel consumption (MEF) models for light-duty vehicles. The variable of composite acceleration is introduced into the MEF model with the purpose of capturing the effects of historical accelerations interacting with current speed on emission and fuel consumption. The MEF model is calibrated by the multivariate least-squares method for two types of light-duty vehicle using on-board data collected in Beijing, China by a Portable Emission Measurement System (PEMS). The instantaneous validation results show that the MEF model performs better, with lower Mean Absolute Percentage Error (MAPE), than the other two models. Moreover, the aggregate validation results indicate that the MEF model produces reasonable estimations compared to actual measurements, with prediction errors within 12%, 10%, 19%, and 9% for HC, CO, NOx emissions and fuel consumption, respectively.
Keywords: Emission, Fuel consumption, Light-duty vehicle, Microscopic, Modeling.
Downloads 2004
489 Empirical Process Monitoring Via Chemometric Analysis of Partially Unbalanced Data
Authors: Hyun-Woo Cho
Abstract:
Real-time or in-line process monitoring frameworks are designed to give early warnings for a fault along with meaningful identification of its assignable causes. In artificial intelligence and machine learning fields of pattern recognition various promising approaches have been proposed such as kernel-based nonlinear machine learning techniques. This work presents a kernel-based empirical monitoring scheme for batch type production processes with small sample size problem of partially unbalanced data. Measurement data of normal operations are easy to collect whilst special events or faults data are difficult to collect. In such situations, noise filtering techniques can be helpful in enhancing process monitoring performance. Furthermore, preprocessing of raw process data is used to get rid of unwanted variation of data. The performance of the monitoring scheme was demonstrated using three-dimensional batch data. The results showed that the monitoring performance was improved significantly in terms of detection success rate of process fault.
Keywords: Process Monitoring, kernel methods, multivariate filtering, data-driven techniques, quality improvement.
Downloads 1746
488 An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia
Authors: Carol Anne Hargreaves
Abstract:
A key issue in stock investment is how to select representative features for stock selection. The first objective of this paper is to determine whether an automated stock investment system, using machine learning techniques, may be used to identify a portfolio of growth stocks that are highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up and to identify the most important factors and their contribution to predicting the likelihood of the stock price going up. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to go up in price – portfolio 1. Next, the principal component analysis technique was used to select stocks that were rated high on component one and component two – portfolio 2. Thirdly, a supervised machine learning technique, the logistic regression method, was used to select stocks with a high probability of their price going up – portfolio 3. The predictive models were validated with metrics such as sensitivity (recall), specificity and overall accuracy for all models. All accuracy measures were above 70%. All portfolios outperformed the market by more than eight times. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month the return for each stock portfolio was computed and compared with the stock market index returns. The returns were 23.87% for the principal component analysis stock portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio, while the stock market performance was 0.38%. This study confirms that an automated stock investment system using machine learning techniques can identify top performing stock portfolios that outperform the stock market.
Keywords: Machine learning, stock market trading, logistic principal component analysis, automated stock investment system.
Downloads 1098
487 Machine Learning Techniques in Bank Credit Analysis
Authors: Fernanda M. Assef, Maria Teresinha A. Steiner
Abstract:
The aim of this paper is to compare and discuss better classifier algorithm options for credit risk assessment by applying different Machine Learning techniques. Using records from a Brazilian financial institution, this study uses a database of 5,432 companies that are clients of the bank, where 2,600 clients are classified as non-defaulters, 1,551 are classified as defaulters and 1,281 are temporary defaulters, meaning that the clients are overdue on their payments for up to 180 days. For each case, a total of 15 attributes was considered for a one-against-all assessment using four different techniques: Artificial Neural Networks Multilayer Perceptron (ANN-MLP), Artificial Neural Networks Radial Basis Functions (ANN-RBF), Logistic Regression (LR) and finally Support Vector Machines (SVM). For each method, different parameters were analyzed in order to obtain different results when the best of each technique was compared. Initially the data were coded in thermometer code (numerical attributes) or dummy coding (for nominal attributes). The methods were then evaluated for each parameter and the best result of each technique was compared in terms of accuracy, false positives, false negatives, true positives and true negatives. This comparison showed that the best method, in terms of accuracy, was ANN-RBF (79.20% for non-defaulter classification, 97.74% for defaulters and 75.37% for the temporary defaulter classification). However, the best accuracy does not always represent the best technique. For instance, on the classification of temporary defaulters, this technique, in terms of false positives, was surpassed by SVM, which had the lowest rate (0.07%) of false positive classifications. All these intrinsic details are discussed considering the results found, and an overview of what was presented is shown in the conclusion of this study.
Keywords: Artificial Neural Networks, ANNs, classifier algorithms, credit risk assessment, logistic regression, machine learning, support vector machines.
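The one-against-all comparison described in this abstract can be illustrated with a minimal sketch. The labels below are made up for illustration and are not the bank's data; the function names are hypothetical, and the false positive rate is assumed here to mean FP / (FP + TN), which the abstract does not specify.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class
    and every other class as negative (one-against-all)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def false_positive_rate(tp, fp, fn, tn):
    """Assumed definition: FP / (FP + TN)."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical labels: "non" = non-defaulter, "def" = defaulter,
# "temp" = temporary defaulter.
y_true = ["non", "def", "temp", "non", "def", "temp", "non", "def"]
y_pred = ["non", "def", "non", "non", "temp", "temp", "non", "def"]

for cls in ("non", "def", "temp"):
    counts = one_vs_rest_counts(y_true, y_pred, cls)
    print(cls, counts, accuracy(*counts), false_positive_rate(*counts))
```

Reporting the four counts per class, rather than accuracy alone, is what lets two classifiers with similar accuracy be separated by their false positive behaviour.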
Downloads 1281
486 Dispersion Rate of Spilled Oil in Water Column under Non-Breaking Water Waves
Authors: Hanifeh Imanian, Morteza Kolahdoozan
Abstract:
The purpose of this study is to present a mathematical expression for calculating the dispersion rate of spilled oil in the water column under non-breaking waves. To this end, a multiphase numerical model is applied in which the waves and the oil phase are computed concurrently and whose hydraulic calculations have been shown to be accurate. More than 200 oil spill scenarios in wavy waters were simulated using the multiphase numerical model, and the outcomes were collected in a database. The recorded results were investigated to identify the major parameters affecting vertical oil dispersion, and 6 parameters were identified as the main independent factors. Furthermore, statistical tests were conducted to identify any relationship between the dependent variable (dispersed oil mass in the water column) and the independent variables (water wave specifications, comprising wave height, length and period, and spilled oil characteristics, comprising density, viscosity and spilled oil mass). Finally, a mathematical-statistical relationship is proposed to predict dispersed oil in marine waters. To verify the proposed relationship, a laboratory case available in the literature was selected. The rate of oil mass penetrating the water body computed by the proposed regression showed good agreement with the experimental data. The validated mathematical-statistical expression is a useful tool for oil dispersion prediction in oil spill events in marine areas.
Keywords: Dispersion, marine environment, mathematical-statistical relationship, oil spill.
Downloads 1146
485 The Loess Regression Relationship Between Age and BMI for both Sydney World Masters Games Athletes and the Australian National Population
Authors: Joe Walsh, Mike Climstein, Ian Timothy Heazlewood, Stephen Burke, Jyrki Kettunen, Kent Adams, Mark DeBeliso
Abstract:
Thousands of masters athletes participate quadrennially in the World Masters Games (WMG), yet this cohort of athletes remains proportionately under-investigated. Given the growing global obesity pandemic and the benefits of physical activity across the lifespan, the BMI trends for this unique population were of particular interest. The nexus between health, physical activity and aging is complex and has raised much interest in recent times due to the realization that a multifaceted approach is necessary in order to counteract the obesity pandemic. By investigating age-based trends within a population adhering to competitive sport at older ages, further insight might be gleaned to assist in understanding one of many factors influencing this relationship. BMI was derived using data gathered on a total of 6,071 masters athletes (51.9% male, 48.1% female) aged 25 to 91 years (x̄ = 51.5, s = ±9.7), competing at the Sydney World Masters Games (2009). Using linear and loess regression, it was demonstrated that the usual tendency for the prevalence of higher BMI to increase with age was reversed in the sample. This reversal was repeated for both the male-only and female-only subsets of the sample participants, indicating the possibility of improved prevalence of BMI with increasing age for both the sample as a whole and these individual sub-groups. This evidence of improved classification in one index of health (reduced BMI) for masters athletes (when compared to the general population) implies that either this index of health improves with aging due to adherence to sport, or the reduced BMI is advantageous and contributes to this cohort adhering (or being attracted) to masters sport at older ages.
Keywords: Aging, masters athlete, Quetelet Index, sport
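For readers unfamiliar with loess regression, the idea can be sketched as a locally weighted straight-line fit evaluated at each point of interest. The version below is a simplified illustration, assuming tricube weights and a nearest-neighbour span `frac`; it omits the robustness iterations of full loess, and the function name and sample values are hypothetical, not the study's data or software.

```python
def loess_point(xs, ys, x0, frac=0.5):
    """Locally weighted linear fit at x0 using tricube weights.

    `frac` is the share of points that defines the local neighbourhood.
    """
    n = len(xs)
    k = min(n, max(2, int(round(frac * n))))
    # Bandwidth: distance from x0 to its k-th nearest observation.
    h = sorted(abs(x - x0) for x in xs)[k - 1]

    weights = []
    for x in xs:
        d = abs(x - x0)
        u = d / h if h > 0 else (0.0 if d == 0 else 1.0)
        weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)

    sw = sum(weights)
    if sw == 0.0:
        # All neighbours sit exactly on the bandwidth edge: use their mean.
        near = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
        return sum(near) / len(near)

    # Weighted least squares for y = intercept + slope * x.
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # degenerate neighbourhood: weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0


# Made-up ages and BMI values, for illustration only.
ages = [28, 34, 41, 47, 52, 58, 63, 70, 76, 84]
bmi = [26.1, 26.4, 26.0, 25.8, 25.9, 25.3, 25.1, 24.8, 24.9, 24.2]
smoothed = [loess_point(ages, bmi, a, frac=0.6) for a in ages]
print([round(v, 2) for v in smoothed])
```

Unlike a single global regression line, the fitted curve can bend, so a change in the direction of the age and BMI trend would show up without being imposed by the model form.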
Downloads 1712
484 Automated Process Quality Monitoring with Prediction of Fault Condition Using Measurement Data
Authors: Hyun-Woo Cho
Abstract:
Detection of incipient abnormal events is important to improve the safety and reliability of machine operations and to reduce losses caused by failures. Improper set-up or alignment of parts often leads to severe problems in many machines. The construction of models for predicting faulty conditions is essential in deciding when to perform machine maintenance. This paper presents a multivariate calibration monitoring approach based on the statistical analysis of machine measurement data. The calibration model is used to predict two faulty conditions from historical reference data. This approach utilizes genetic algorithm (GA) based variable selection, and we evaluate the predictive performance of several prediction methods using real data. The results show that the calibration model based on supervised probabilistic principal component analysis (SPPCA) yielded the best performance in this work. By adopting a proper variable selection scheme in calibration models, the prediction performance can be improved by excluding non-informative variables from the model building steps.
Keywords: Prediction, operation monitoring, on-line data, nonlinear statistical methods, empirical model.
Downloads 1658
483 A Quantitative Model for Determining the Area of the “Core and Structural System Elements” of Tall Office Buildings
Authors: Görkem Arslan Kılınç
Abstract:
Due to the high construction, operation, and maintenance costs of tall buildings, quantification of the area in the plan layout which provides a financial return is an important design criterion. The area of the “core and the structural system elements” does not provide a financial return but must exist in the plan layout. Some characteristic items of tall office buildings affect the size of these areas. From this point of view, 15 tall office buildings were systematically investigated. The typical office floor plans of these buildings were reproduced digitally. The area of the “core and the structural system elements” in each building and the characteristic items of each building were calculated. These characteristic items are the size of the long and short plan edges, plan length/width ratio, size of the core long and short edges, core length/width ratio, core area, slenderness, building height, number of floors, and floor height. These items were analyzed by correlation and regression analyses. The results indicate that the characteristic items which affect the area of the “core and structural system elements” are the plan long and short edge sizes, core short edge size, building height, and the number of floors. A one-unit increase in plan short side size increases the area of the “core and structural system elements” in the plan by 12,378 m2. An increase in core short edge size increases the area of the “core and structural system elements” in the plan by 25,650 m2. Subsequent studies can be conducted by expanding the sample of the study and considering the geographical location of the building.
Keywords: Core area, correlation analysis, floor area, regression analysis, space efficiency, tall office buildings.
Downloads 506
482 Variability of Metal Composition and Concentrations in Road Dust in the Urban Environment
Authors: Sandya Mummullage, Prasanna Egodawatta, Ashantha Goonetilleke, Godwin A. Ayoko
Abstract:
Urban road dust comprises a range of potentially toxic metal elements and plays a critical role in degrading urban receiving water quality. Hence, assessing the metal composition and concentration in urban road dust is a high priority. This study investigated the variability of metal composition and concentrations in road dust in 4 different urban land uses in Gold Coast, Australia. Samples from 16 road sites were collected and tested for 12 selected metal species. The data set was analyzed using both univariate and multivariate techniques. Outcomes of the data analysis revealed that the metal concentrations in road dust differ considerably within and between different land uses. Iron, aluminum, magnesium and zinc are the most abundant in urban land uses. It was also noted that metal species such as titanium, nickel, copper and zinc have the highest concentrations in industrial land use. The study outcomes identified soil- and traffic-related sources as the key sources of metals deposited on road surfaces.
Keywords: Metals build-up, Pollutant accumulation, Stormwater quality, Urban road dust.
Downloads 2350
481 Meta Model Based EA for Complex Optimization
Authors: Maumita Bhattacharya
Abstract:
Evolutionary Algorithms are population-based, stochastic search techniques, widely used as efficient global optimizers. However, many real-life optimization problems require finding optimal solutions to complex, high-dimensional, multimodal problems involving computationally very expensive fitness function evaluations. Use of evolutionary algorithms in such problem domains is thus practically prohibitive. An attractive alternative is to build meta models, that is, approximations of the actual fitness functions to be evaluated. These meta models are orders of magnitude cheaper to evaluate than the actual function. Many regression and interpolation tools are available to build such meta models. This paper briefly discusses the architectures and use of such meta-modeling tools in an evolutionary optimization context. We further present two evolutionary algorithm frameworks which use meta models for fitness function evaluation. The first framework, the Dynamic Approximate Fitness based Hybrid EA (DAFHEA) model [14], reduces computation time by controlled use of meta models (in this case an approximate model generated by Support Vector Machine regression) to partially replace actual function evaluations with approximate ones. However, the underlying assumption in DAFHEA is that the training samples for the meta model are generated from a single uniform model. This does not take into account uncertain scenarios involving noisy fitness functions. The second model, DAFHEA-II, an enhanced version of the original DAFHEA framework, incorporates a multiple-model based learning approach for the support vector machine approximator to handle noisy functions [15]. Empirical results obtained by evaluating the frameworks on several benchmark functions demonstrate their efficiency.
Keywords: Meta model, Evolutionary algorithm, Stochastic technique, Fitness function, Optimization, Support vector machine.
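The general idea of letting a cheap meta model stand in for part of the true fitness evaluations can be sketched as follows. This is not the DAFHEA or DAFHEA-II framework: a nearest-neighbour lookup is swapped in for the Support Vector Machine regression named in the abstract, the sphere function is only a stand-in for an expensive fitness function, and all names and parameter values are hypothetical.

```python
import random


def sphere(x):
    """Stand-in for an expensive fitness function (lower is better)."""
    return sum(v * v for v in x)


def surrogate(x, archive):
    """Cheap meta model: fitness of the nearest truly evaluated point.
    (A nearest-neighbour stand-in, not SVM regression.)"""
    nearest = min(
        archive,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )
    return nearest[1]


def surrogate_assisted_ea(fitness, dim=3, pop_size=10, offspring=40,
                          generations=15, seed=0):
    rng = random.Random(seed)
    true_evals = 0
    archive = []  # every (point, true fitness) pair seen so far
    pop = []
    for _ in range(pop_size):
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        f = fitness(x)
        true_evals += 1
        archive.append((x, f))
        pop.append((x, f))

    for _ in range(generations):
        # Generate many candidates by Gaussian mutation of current parents.
        children = []
        for _ in range(offspring):
            parent = rng.choice(pop)[0]
            children.append([v + rng.gauss(0, 0.5) for v in parent])
        # Rank candidates with the cheap meta model only.
        children.sort(key=lambda c: surrogate(c, archive))
        # Spend true evaluations on the most promising candidates.
        for c in children[:pop_size]:
            f = fitness(c)
            true_evals += 1
            archive.append((c, f))
            pop.append((c, f))
        # Elitist survivor selection on true fitness values.
        pop.sort(key=lambda item: item[1])
        pop = pop[:pop_size]

    best_x, best_f = pop[0]
    return best_x, best_f, true_evals


best_x, best_f, true_evals = surrogate_assisted_ea(sphere)
print(best_f, true_evals)
```

With these settings the true function is called 160 times (10 initial points plus 10 per generation), whereas evaluating every offspring would take 610 calls; the trade-off is that a poor meta model may discard good candidates.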
Downloads 2067
480 Design of IMC-PID Controller Cascaded Filter for Simplified Decoupling Control System
Authors: Le Linh, Truong Nguyen Luan Vu, Le Hieu Giang
Abstract:
In this work, an IMC-PID controller cascaded with a filter, based on the Internal Model Control (IMC) scheme, is systematically proposed for the simplified decoupling control system. Simplified decoupling is first introduced for multivariable processes by using coefficient matching to obtain a stable, proper, and causal simplified decoupler. Accordingly, the transfer functions of the decoupled apparent processes can be expressed as a set of n equivalent independent processes and then derived as the ratio of the original open-loop transfer function to the diagonal element of the dynamic relative gain array. The IMC-PID controller in series with a filter is then directly employed to enhance the overall performance of the decoupling control system while avoiding difficulties arising from properties inherent to simplified decoupling. Simulation studies are presented to demonstrate the simplicity and effectiveness of the proposed method. Simulations were conducted by tuning various controllers of multivariable processes with multiple time delays. The results indicate that the proposed method consistently performs well, with fast and well-balanced closed-loop time responses.
Keywords: Coefficient matching method, internal model control scheme, PID controller cascaded filter, simplified decoupler.
Downloads 1483
479 Protein Profiling in Alanine Aminotransferase Induced Patient cohort using Acetaminophen
Authors: Gry M, Bergström J, Lengquist J, Lindberg J, Drobin K, Schwenk J, Nilsson P, Schuppe-Koistinen I.
Abstract:
Sensitive and predictive DILI (Drug Induced Liver Injury) biomarkers are needed in drug R&D to improve early detection of hepatotoxicity. The discovery of DILI biomarkers with the predictive power to identify individuals at risk of DILI would represent a major advance in the development of personalized healthcare approaches. In this healthy volunteer acetaminophen study (4 g/day for 7 days, with 3 monitored non-treatment days before and 4 after), 450 serum samples from 32 subjects were analyzed using protein profiling by antibody suspension bead arrays. Multiparallel protein profiles were generated using a DILI target protein array with 300 antibodies, where the antibodies were selected based on previous literature findings of putative DILI biomarkers and a screening process using pre-dose samples from the same cohort. Of the 32 subjects, 16 were found to develop an elevated ALT value (2× baseline, responders). Using the plasma profiling approach together with multivariate statistical analysis, some novel findings linked to lipid metabolism were made and, more importantly, endogenous protein profiles in baseline samples (prior to treatment) with predictive power for ALT elevations were identified.
Keywords: DILI, Plasma profiling, PLSDA, Random forest.
Downloads 1316
478 Work Engagement of Malaysian Nurses: Exploring the Impact of Hope and Resilience
Authors: Noraini Othman, Aizzat Mohd Nasurdin
Abstract:
The purpose of this study was to investigate the relationship of hope and resilience with work engagement. A total of 422 staff nurses working in three public hospitals in Peninsular Malaysia participated in this study. Regression analysis revealed that hope and resilience were positively related to work engagement. Possible reasons for these findings, as well as their implications and future research directions, are discussed.
Keywords: hope, nurses, resilience, work engagement
Downloads 3752
477 An Economic Analysis of Phu Kradueng National Park
Authors: Chutarat Boontho
Abstract:
The purposes of this study were to evaluate the economic value of Phu Kradueng National Park by the travel cost method (TCM) and the contingent valuation method (CVM), and to estimate the demand for traveling and the willingness to pay. The data were collected through two large-scale surveys of users and non-users. A total of 1,016 users and 1,034 non-users were interviewed. The data were analyzed using multiple linear regression analysis and a logistic regression model, and the consumer surplus (CS) was computed as the integral of the demand function for trips. The findings were as follows: 1) Using the travel cost method, which provides an estimate of direct benefits to park users, we found that visitors' total willingness to pay per visit was 2,284.57 baht, of which 958.29 baht was travel cost, 1,129.82 baht was expenditure for accommodation, food, and services, and 166.66 baht was consumer surplus, that is, the visitors' net gain or satisfaction from the visit (the integral of the demand function for trips). 2) Thai visitors to Phu Kradueng National Park were further willing to pay an average of 646.84 baht per head per year to ensure the continued existence of Phu Kradueng National Park and to preserve their option to use it in the future. 3) Thai non-visitors, on the other hand, are willing to pay an average of 212.61 baht per head per year for the option and existence value provided by the Park. 4) The total economic value of Phu Kradueng National Park to Thai visitors and non-visitors taken together stands today at 9,249.55 million baht per year. 5) The users' average willingness to pay for access to Phu Kradueng National Park rises from 40 baht to 84.66 baht per head per trip for improved services such as road improvement, increased cleanliness, and upgraded information.
This study was also needed to investigate the potential market demand for bioprospecting in Phu Kradueng National Park and to investigate how a larger share of the economic benefits of tourism could be distributed as income to local residents.
Keywords: Contingent Valuation Method, Travel Cost Method, Consumer surplus.
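The phrase "consumer surplus as the integral of the demand function for trips" can be made concrete with a small numerical sketch. The linear demand curve and all numbers below are hypothetical and are not estimates from the study; the surplus is taken as the area under the demand curve between the current travel cost and the cost at which demand falls to zero.

```python
def consumer_surplus(demand, price, choke_price, steps=10_000):
    """Trapezoidal integral of demand(p) from `price` to `choke_price`."""
    h = (choke_price - price) / steps
    total = 0.5 * (demand(price) + demand(choke_price))
    for i in range(1, steps):
        total += demand(price + i * h)
    return total * h


# Hypothetical linear demand: trips per year fall as travel cost rises.
INTERCEPT = 5.0   # trips per year at zero cost (assumed)
SLOPE = 0.002     # trips lost per extra baht of travel cost (assumed)


def trips(cost):
    return max(0.0, INTERCEPT - SLOPE * cost)


current_cost = 1000.0             # baht per trip (assumed)
choke_cost = INTERCEPT / SLOPE    # cost at which demand reaches zero
cs = consumer_surplus(trips, current_cost, choke_cost)
print(round(cs, 2))
```

For a linear demand curve the same area has the closed form q² / (2 × slope), where q is the number of trips at the current cost, which gives a quick check on the numerical integral.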
Downloads 1788
476 A Quantitative Tool for Analyze Process Design
Authors: Andrés Carrión García, Aura López de Murillo, José Jabaloyes Vivas, Angela Grisales del Río
Abstract:
Some quality control tools use non-metric, subjective information from experts, who rate the intensity of the relations existing inside processes without quantifying them. In this paper we develop a quality control analytic tool that measures the impact or strength of the relationship between process operations and product characteristics. The tool includes two models: a qualitative model, which allows relationships to be described and analyzed, and a formal quantitative model, by means of which the relationships are quantified. In the first model, concepts from graph theory were applied to identify those process elements which can be sources of variation, that is, those quality characteristics or operations that have some sort of precedence over the others and that should become control items. The most dependent elements can also be identified, that is, those elements receiving the effects of the elements identified as variation sources. If controls are focused on those dependent elements, the efficiency of control is compromised by the fact that we are controlling effects, not causes. The second model adapts the multivariate statistical technique of Covariance Structure Analysis. This approach allowed us to quantify the relationships. The computer package LISREL was used to obtain statistics and to validate the model.
Keywords: Characteristics matrix, covariance structure analysis, LISREL.
Downloads 1597