Search results for: Regression ClassTree
493 Performance Analysis of Evolutionary ANN for Output Prediction of a Grid-Connected Photovoltaic System
Authors: S.I Sulaiman, T.K Abdul Rahman, I. Musirin, S. Shaari
Abstract:
This paper presents a performance analysis of the Evolutionary Programming-Artificial Neural Network (EPANN) technique for optimizing the architecture and training parameters of a one-hidden-layer feedforward ANN model that predicts the energy output of a grid-connected photovoltaic system. The ANN uses solar radiation and ambient temperature as its inputs, while the output is the total watt-hour energy produced by the grid-connected PV system. EP is used to optimize the regression performance of the ANN model by determining the optimum number of nodes in the hidden layer as well as the optimal momentum rate and learning rate for training. The EPANN model is tested using two types of transfer function for the hidden layer, namely the tangent sigmoid and the logarithmic sigmoid. The best transfer function, neural topology and learning parameters were selected based on the highest regression performance obtained during the ANN training and testing process. It is observed that the best transfer function configuration for the prediction model is [logarithmic sigmoid, purely linear].
Keywords: Artificial neural network (ANN), Correlation coefficient (R), Evolutionary programming-ANN (EPANN), Photovoltaic (PV), logarithmic sigmoid and tangent sigmoid.
Downloads 1900
492 Monte Carlo Estimation of Heteroscedasticity and Periodicity Effects in a Panel Data Regression Model
Authors: Nureni O. Adeboye, Dawud A. Agunbiade
Abstract:
This research investigates the effects of heteroscedasticity and periodicity in a Panel Data Regression Model (PDRM) by extending previous work on balanced panel data estimation in the context of fitting a PDRM for banks' audit fees. The model was estimated through the derivation of a joint Lagrange Multiplier (LM) test for homoscedasticity and zero serial correlation, a conditional LM test for zero serial correlation given heteroscedasticity of varying degrees, and a conditional LM test for homoscedasticity given first-order positive serial correlation, via a two-way error component model. Monte Carlo simulations were carried out for 81 different variations, whose design assumed a uniform distribution under a linear heteroscedasticity function. Each variation was iterated 1000 times, and the three estimators considered were assessed on the variance, absolute bias (ABIAS), mean square error (MSE) and root mean square error (RMSE) of the parameter estimates. Eighteen different models were fitted under different specified conditions, and the best-fitted model is that of the within estimator when heteroscedasticity is severe at either zero or positive serial correlation. The LM test results showed that the tests have good size and power, as all three tests are significant at 5% for the specified linear form of the heteroscedasticity function, which establishes that banks' operations are severely heteroscedastic in nature with little or no periodicity effects.
Keywords: Audit fee, heteroscedasticity, Lagrange multiplier test, periodicity.
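For readers unfamiliar with the four assessment criteria named in the abstract, the block below writes out one common convention for them. The symbols are assumptions introduced here, not taken from the paper: \beta is the true parameter value used in the simulation, \hat{\beta}_r is its estimate in replication r, and R is the number of replications (1000 in the abstract). The paper may define ABIAS differently, for example as the mean of the absolute deviations.

```latex
\begin{align*}
\bar{\beta} &= \frac{1}{R}\sum_{r=1}^{R}\hat{\beta}_r, &
\mathrm{Var} &= \frac{1}{R}\sum_{r=1}^{R}\bigl(\hat{\beta}_r-\bar{\beta}\bigr)^2,\\
\mathrm{ABIAS} &= \bigl|\bar{\beta}-\beta\bigr|, &
\mathrm{MSE} &= \frac{1}{R}\sum_{r=1}^{R}\bigl(\hat{\beta}_r-\beta\bigr)^2
              = \mathrm{Var}+\mathrm{ABIAS}^2,\\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}}.
\end{align*}
```

Under this convention the MSE splits exactly into variance plus squared bias, which is why the criteria tend to rank estimators similarly when bias is small.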
Downloads 739
491 A New Composition Method of Admissible Support Vector Kernel Based on Reproducing Kernel
Authors: Wei Zhang, Xin Zhao, Yi-Fan Zhu, Xin-Jian Zhang
Abstract:
Kernel functions, which allow the formulation of nonlinear variants of any algorithm that can be cast in terms of dot products, have enabled Support Vector Machines (SVM) to be applied successfully in many fields, e.g. classification and regression. The importance of the kernel has motivated many studies on its composition. It is well known that the reproducing kernel (R.K) is a useful kernel function with many properties, e.g. positive definiteness, the reproducing property, and the ability to compose complex R.Ks by simple operations. There are two popular ways to compute an R.K in explicit form. One is to construct and solve a specific differential equation with boundary values, whose handicap is that it cannot yield a unified form of the R.K. The other uses a piecewise integral of the Green function associated with a differential operator L. The latter facilitates both the computation of an R.K with a unified explicit form and theoretical analysis, but it has been studied more recently and has seen fewer practical computations. In this paper, a new algorithm for computing an R.K is presented. It can obtain the unified explicit form of the R.K in a general reproducing kernel Hilbert space (RKHS). It avoids constructing and solving complex differential equations manually and supports automatic, flexible and rigorous computation for more general RKHS. To validate that the R.K computed by the algorithm works well in SVM, some illustrative examples and a comparison between the R.K and the Gaussian kernel (RBF) in support vector regression are presented. The results show that the performance of the R.K is close or slightly superior to that of the RBF.
Keywords: admissible support vector kernel, reproducing kernel, reproducing kernel Hilbert space, Green function, support vector regression.
Downloads 1544
490 Interest Rate Fluctuation Effect on Commercial Bank’s Fixed Fund Deposit in Nigeria
Authors: Okolo Chimaobi Valentine
Abstract:
Commercial banks in Nigeria adopted many strategies to attract fresh deposits, including the use of high deposit rates. However, the pricing of banking services moved in favor of the banks at the expense of customers, who consequently sought other investment alternatives rather than saving their money in the bank. Both deposit and lending rates were greatly influenced by the Central Bank of Nigeria (CBN) decisions on the interest rate. Therefore, commercial banks’ efforts to attract deposits by manipulating their rates were greatly limited; otherwise the banks would be giving out more than they earned. The study aimed at examining the relationship between the interest rate and the fixed fund deposit of commercial banks, and how the policy-controlled interest rate affected commercial banks’ fixed fund deposit. The researcher employed the ordinary least squares technique, using multiple linear regression, unrestricted vector auto-regression, a correlation matrix test, Granger causality and impulse response graphs in the analysis. Commercial banks’ interest rates affected their fixed fund deposit significantly, while the policy-controlled interest rate did not significantly transmit through the commercial banks’ interest rates to affect fixed fund deposit. While commercial banks seek creative ways to expand their fixed fund deposit, policy authorities in Nigeria should better coordinate interest rate fluctuation and induce competition in the entire financial sector.
Keywords: Commercial bank, fixed fund deposit, fluctuation effects, interest rate.
Downloads 3601
489 Comparing Machine Learning Estimation of Fuel Consumption of Heavy-Duty Vehicles
Authors: Victor Bodell, Lukas Ekstrom, Somayeh Aghanavesi
Abstract:
Fuel consumption (FC) is one of the key factors in determining the expenses of operating a heavy-duty vehicle. A customer may therefore request an estimate of the FC of a desired vehicle. The modular design of heavy-duty vehicles allows their construction by specifying the building blocks, such as gear box, engine and chassis type. If the combination of building blocks is unprecedented, it is infeasible to measure the FC, since this would first require the construction of the vehicle. This paper proposes a machine learning approach to predict FC. This study uses information on around 40,000 vehicles, covering vehicle-specific and operational environmental conditions such as road slopes and driver profiles. All vehicles have diesel engines and a mileage of more than 20,000 km. The data is used to investigate the accuracy of the machine learning algorithms linear regression (LR), K-nearest neighbor (KNN) and artificial neural networks (ANN) in predicting fuel consumption for heavy-duty vehicles. Performance of the algorithms is evaluated by reporting the prediction error on both simulated data and operational measurements. The performance of the algorithms is compared using nested cross-validation and statistical hypothesis testing. The statistical evaluation procedure finds that ANNs have the lowest prediction error compared to LR and KNN in estimating fuel consumption on both simulated and operational data. The models have a mean relative prediction error of 0.3% on simulated data, and 4.2% on operational data.
Keywords: Artificial neural networks, fuel consumption, machine learning, regression, statistical tests.
Downloads 830
488 Optimizing and Evaluating Performance Quality Control of the Production Process of Disposable Essentials Using Approach Vague Goal Programming
Authors: Hadi Gholizadeh, Ali Tajdin
Abstract:
Effective production planning requires controlling the quality of processes. This paper aims at improving the performance of the disposable essentials process using statistical quality control and goal programming in a vague environment. The vagueness expresses uncertainty, because there is always measurement error in the real world. Therefore, in this study, the conditions are examined in a vague environment, which is a distance-based environment. The disposable essentials process in Kach Company was studied. Statistical control tools were used to characterize the existing process for four factor responses: the averages of disposable glasses’ weights, heights, crater diameters, and volumes. Goal programming was then used to find the optimal combination of factor settings in a vague environment, which accounts for the uncertainty of the initial information when some of the model parameters are vague; in addition, a fuzzy regression model is used to predict the responses of the four described factors. Optimization results show that the process capability index values for the disposable glasses’ average weights, heights, crater diameters and volumes were improved. This increases product quality and reduces waste, which lowers the cost of the finished product and ultimately brings customer satisfaction, which in turn means increased sales.
Keywords: Goal programming, quality control, vague environment, disposable glasses’ optimization, fuzzy regression.
Downloads 1040
487 Modeling Default Probabilities of the Chosen Czech Banks in the Time of the Financial Crisis
Authors: Petr Gurný
Abstract:
One of the most important tasks in risk management is the correct determination of the probability of default (PD) of particular financial subjects. This paper discusses the possibility of determining a financial institution’s PD with credit-scoring models. The paper is divided into two parts. The first part is devoted to the estimation of three different models (based on linear discriminant analysis, logit regression and probit regression) from a sample of almost three hundred US commercial banks. These models are then compared and verified on a control sample in order to choose the best one. The second part of the paper applies the chosen model to a portfolio of three key Czech banks to estimate their present financial stability. However, it is no less important to be able to estimate the evolution of PD in the future. For this reason, the second task in this paper is to estimate the probability distribution of the future PD for the Czech banks. The values of particular indicators are therefore sampled randomly and the distribution of the PDs is estimated, under the assumption that the indicators are distributed according to the multidimensional subordinated Lévy model (in particular, the Variance Gamma model and the Normal Inverse Gaussian model). Although the obtained results show that all banks are relatively healthy, there is still a high chance that “a financial crisis” will occur, at least in terms of probability. This is indicated by the estimation of various quantiles of the estimated distributions. Finally, it should be noted that the applicability of the estimated model (with respect to the data used) is limited to the recessionary phase of the financial market.
Keywords: Credit-scoring Models, Multidimensional Subordinated Lévy Model, Probability of Default.
Downloads 1919
486 Statistics of Exon Lengths in Animals, Plants, Fungi, and Protists
Authors: Alexander Kaplunovsky, Vladimir Khailenko, Alexander Bolshoy, Shara Atambayeva, Anatoliy Ivashchenko
Abstract:
Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from the RNA transcripts before translation into a protein. The exon-intron structures of different eukaryotic species are quite different from each other, and the evolution of such structures raises many questions. We try to address some of these questions using statistical analysis of whole genomes. We go through all the protein-coding genes in a genome and study correlations between the net length of all the exons in a gene, the number of the exons, and the average length of an exon. We also take average values of these features for each chromosome and study correlations between those averages on the chromosomal level. Our data show universal features of exon-intron structures common to animals, plants, and protists (specifically, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Cryptococcus neoformans, Homo sapiens, Mus musculus, Oryza sativa, and Plasmodium falciparum). We have verified linear correlation between the number of exons in a gene and the length of a protein coded by the gene, while the protein length increases in proportion to the number of exons. On the other hand, the average length of an exon always decreases with the number of exons. Finally, chromosome clustering based on average chromosome properties and parameters of linear regression between the number of exons in a gene and the net length of those exons demonstrates that these average chromosome properties are genome-specific features.
Keywords: Comparative genomics, exon-intron structure, eukaryotic clustering, linear regression.
Downloads 2573
485 Development of Rock Engineering System-Based Models for Tunneling Progress Analysis and Evaluation: Case Study of Tailrace Tunnel of Azad Power Plant Project
Authors: S. Golmohammadi, M. Noorian Bidgoli
Abstract:
Tunneling progress is a key parameter in the blasting method of tunneling. Taking measures to enhance tunneling advance can limit the progress distance without a supporting system, subsequently reducing or eliminating the risk of damage. This paper focuses on modeling tunneling progress using three main groups of parameters (tunneling geometry, blasting pattern, and rock mass specifications) based on the Rock Engineering Systems (RES) methodology. In the proposed models, four main effective parameters on tunneling progress are considered as inputs (RMR, Q-system, Specific charge of blasting, Area), with progress as the output. Data from 86 blasts conducted at the tailrace tunnel in the Azad Dam, western Iran, were used to evaluate the progress value for each blast. The results indicated that, for the 86 blasts, the progress of the estimated model aligns mostly with the measured progress. This paper presents a method for building the interaction matrix (statistical base) of the RES model. Additionally, a comparison was made between the results of the new RES-based model and a Multi-Linear Regression (MLR) analysis model. In the RES-based model, the effective parameters are RMR (35.62%), Q (28.6%), q (specific charge of blasting) (20.35%), and A (15.42%), respectively, whereas for MLR analysis, the main parameters are RMR, Q (system), q, and A. These findings confirm the superior performance of the RES-based model over the other proposed models.
Keywords: Rock Engineering Systems, tunneling progress, Multi Linear Regression, Specific charge of blasting.
Downloads 141
484 Indoor Air Pollution of the Flexographic Printing Environment
Authors: Jelena S. Kiurski, Vesna S. Kecić, Snežana M. Aksentijević
Abstract:
The identification and evaluation of organic and inorganic pollutants were performed in a flexographic facility in Novi Sad, Serbia. Air samples were collected and analyzed in situ during a 4-hour working period at five sampling points, using a mobile gas chromatograph and an ozonometer, during the printing of collagen casing. Experimental results showed that the concentrations of isopropyl alcohol, acetone, total volatile organic compounds and ozone varied during the sampling times. The highest average concentrations, 94.80 ppm for isopropyl alcohol and 102.57 ppm for total volatile organic compounds, were reached 200 minutes after the start of production. The mutual dependences between the target hazardous and microclimate parameters were confirmed using a multiple linear regression model in the software package STATISTICA 10. The multiple coefficients of determination obtained for ozone and acetone (0.507 and 0.589) with microclimate parameters indicated a moderate correlation between the observed variables. However, a strong positive correlation was obtained for isopropyl alcohol and total volatile organic compounds (0.760 and 0.852) with microclimate parameters. Values of the parameter F higher than Fcritical for all examined dependences indicated a statistically significant difference between the concentration levels of the target pollutants and the microclimate parameters. Given that, the microclimate parameters significantly affect the emission of the investigated gases, and the application of eco-friendly materials in the production process is a necessity.
Keywords: Flexographic printing, indoor air, multiple regression analysis, pollution emission.
Downloads 1308
483 Regression Approach for Optimal Purchase of Hosts Cluster in Fixed Fund for Hadoop Big Data Platform
Authors: Haitao Yang, Jianming Lv, Fei Xu, Xintong Wang, Yilin Huang, Lanting Xia, Xuewu Zhu
Abstract:
Given a fixed fund, choosing between fewer hosts of higher capability and more hosts of lower capability is an unavoidable trade-off in practice when building a Hadoop big data platform. An exploratory study is presented for a Housing Big Data Platform project (HBDP), where typical big data computing consists of SQL queries with aggregate, join, and space-time condition selections executed on massive data from more than 10 million housing units. In HBDP, an empirical formula was introduced to predict the performance of candidate host clusters for the intended typical big data computing, and it was shaped via a regression approach. With this empirical formula, it is easy to suggest an optimal cluster configuration. The investigation was based on a typical Hadoop computing ecosystem, HDFS+Hive+Spark. A metric was proposed to measure the performance of Hadoop clusters in HBDP, and the measured performance was compared with its predicted counterpart on three kinds of typical SQL query tasks. Tests were conducted with respect to the factors of CPU benchmark, memory size, virtual host division, and the number of constituent physical hosts in the cluster. The research has been applied to practical cluster procurement for housing big data computing.
Keywords: Hadoop platform planning, optimal cluster scheme at fixed-fund, performance empirical formula, typical SQL query tasks.
Downloads 837
482 Child Homicide Victimization and Community Context: A Research Note
Authors: Bohsiu Wu
Abstract:
Among serious crimes, child homicide is a rather rare event. However, the killing of children stirs up a special type of emotion in society that pales other criminal acts. This study examines the relevancy of three possible community-level explanations for child homicide: social deprivation, female empowerment, and social isolation. The social deprivation hypothesis posits that child homicide results from lack of resources in communities. The female empowerment hypothesis argues that a higher female status translates into a higher level of capability to prevent child homicide. Finally, the social isolation hypothesis regards child homicide as a result of lack of social connectivity. Child homicide data, aggregated by US postal ZIP codes in California from 1990 to 1999, were analyzed with a negative binomial regression. The results of the negative binomial analysis demonstrate that social deprivation is the most salient and consistent predictor among all other factors in explaining child homicide victimization at the ZIP-code level. Both social isolation and female labor force participation are weak predictors of child homicide victimization across communities. Further, results from the negative binomial regression show that it is the communities with a higher, not lower, degree of female labor force participation that are associated with a higher count of child homicide. It is possible that poor communities with a higher level of female employment have a lesser capacity to provide the necessary care and protection for the children. Policies aiming at reducing social deprivation and strengthening female empowerment possess the potential to reduce child homicide in the community.
Keywords: Child homicide, deprivation, empowerment, isolation.
Downloads 689
481 Factors Affecting Slot Machine Performance in an Electronic Gaming Machine Facility
Authors: Etienne Provencal, David L. St-Pierre
Abstract:
A facility operating only electronic gambling machines (EGMs) opened in 2007 in Quebec City, Canada under the name of Salons de Jeux du Québec (SdjQ). This facility is one of the first worldwide to rely on that business model. This paper models the performance of such EGMs. The interest from a managerial point of view is to identify the variables that can be controlled or influenced so that a comprehensive model can help improve the overall performance of the business. The EGM individual performance model contains eight different variables under study (Game Title, Progressive jackpot, Bonus Round, Minimum Coin-in, Maximum Coin-in, Denomination, Slant Top and Position). Using data from Quebec City’s SdjQ, a linear regression analysis explains 90.80% of the EGM performance. Moreover, results show a behavior slightly different from that of a casino. The addition of GameTitle as a factor to predict the EGM performance is one of the main contributions of this paper. The choice of the game (GameTitle) is very important. Games having a better position do not have significantly better performance than games located elsewhere on the gaming floor. Progressive jackpots have a positive and significant effect on the individual performance of EGMs. The impact of BonusRound on the dependent variable is significant but negative. The effect of Denomination is significant but weakly negative. As expected, the Language of an EGM does not impact its individual performance. This paper highlights some possible improvements by indicating which features are performing well. Recommendations are given to increase the performance of the EGMs.
Keywords: EGM, linear regression, model prediction, slot operations.
Downloads 1563
480 Machine Learning Framework: Competitive Intelligence and Key Drivers Identification of Market Share Trends among Healthcare Facilities
Authors: A. Appe, B. Poluparthi, L. Kasivajjula, U. Mv, S. Bagadi, P. Modi, A. Singh, H. Gunupudi, S. Troiano, J. Paul, J. Stovall, J. Yamamoto
Abstract:
The necessity of data-driven decisions in healthcare strategy formulation is rapidly increasing. A reliable framework which helps identify factors impacting the market share of a healthcare provider facility or a hospital (from here on termed a facility) is of key importance. This pilot study aims at developing a data-driven machine learning-regression framework which aids strategists in formulating key decisions to improve the facility’s market share, which in turn helps improve the quality of healthcare services. The US (United States) healthcare business is chosen for the study, and data spanning 60 key facilities in Washington State and about 3 years of history are considered. In the current analysis, market share is defined as the ratio of the facility’s encounters to the total encounters among the group of potential competitor facilities. The study proposes a two-pronged approach: competitor identification, and a regression approach to evaluate and predict market share. A model-agnostic technique, SHAP (SHapley Additive exPlanations), is leveraged to quantify the relative importance of features impacting the market share. Typical techniques in the literature for quantifying the degree of competitiveness among facilities use an empirical method to calculate a competitive factor to interpret the severity of competition. The proposed method identifies a pool of competitors, develops Directed Acyclic Graphs (DAGs) and feature-level word vectors, and evaluates the key connected components at the facility level. This technique is robust since it is data-driven, which minimizes the bias from empirical techniques. The DAGs factor in partial correlations at various segregations and key demographics of facilities, along with a placeholder to factor in various business rules (e.g., quantifying patient exchanges, provider references, and sister facilities). Multiple groups of competitors among facilities are identified.
Leveraging the identified competitors, a Random Forest Regression model was developed and fine-tuned to predict the market share. To identify key drivers of market share at an overall level, the permutation feature importance of the attributes was calculated. For relative quantification of features at a facility level, SHAP, a model-agnostic explainer, was incorporated. This helped to identify and rank the attributes at each facility which impact the market share. This approach proposes an amalgamation of two popular and efficient modeling practices, viz., machine learning with graphs and tree-based regression techniques, to reduce the bias. With these, we helped to drive strategic business decisions.
Keywords: Competition, DAGs, hospital, healthcare, machine learning, market share, random forest, SHAP.
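The market-share definition in the abstract (a facility's encounters divided by the total encounters in its competitor group) can be sketched as follows. This is only an illustration of that ratio: the function name, the dictionary layout, and the encounter counts are hypothetical, and the sketch assumes the facility's own encounters are included in the group total, which the abstract does not state explicitly.

```python
def market_share(encounters, facility, competitors):
    """Ratio of one facility's encounters to the total for its competitor group.

    `encounters` maps facility name -> encounter count (hypothetical layout).
    Assumption: the group total includes the facility itself.
    """
    group = {facility, *competitors}
    total = sum(encounters[name] for name in group)
    if total == 0:
        raise ValueError("no encounters recorded for this competitor group")
    return encounters[facility] / total


# Made-up counts, for illustration only.
encounters = {"A": 300, "B": 500, "C": 200}
share_a = market_share(encounters, "A", ["B", "C"])
```

Because the denominator depends on which competitors are in the group, the same facility can have different shares under different competitor pools, which is why the competitor-identification step comes first.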
Downloads 284
479 Electricity Load Modeling: An Application to Italian Market
Authors: Giovanni Masala, Stefania Marica
Abstract:
Forecasting electricity load plays a crucial role in decision making and planning for economic purposes. Besides, in the light of the recent privatization and deregulation of the power industry, the forecasting of future electricity load has turned out to be a very challenging problem. Empirical data about electricity load highlights a clear seasonal behavior (higher load during the winter season), which is partly due to climatic effects. We also emphasize the presence of load periodicity on a weekly basis (electricity load is usually lower on weekends or holidays) and on a daily basis (electricity load is clearly influenced by the hour). Finally, a long-term trend may depend on the general economic situation (for example, industrial production affects electricity load). All these features must be captured by the model. The purpose of this paper is then to build an hourly electricity load model. The deterministic component of the model requires non-linear regression and Fourier series, while we will investigate the stochastic component through econometric tools. The calibration of the model parameters will be performed using data from the Italian market over a 6-year period (2007-2012). Then, we will perform a Monte Carlo simulation in order to compare the simulated data with the real data (both in-sample and out-of-sample inspection). The reliability of the model will be assessed through standard tests, which highlight a good fit of the simulated values.
Keywords: ARMA-GARCH process, electricity load, fitting tests, Fourier series, Monte Carlo simulation, non-linear regression.
Downloads 1486
478 Regional Analysis of Streamflow Drought: A Case Study for Southwestern Iran
Authors: M. Byzedi, B. Saghafian
Abstract:
Droughts are complex natural hazards that, to a varying degree, affect some parts of the world every year. The range of drought impacts is related to drought occurring in different stages of the hydrological cycle, and different types of droughts, such as meteorological, agricultural, hydrological, and socioeconomic, are usually distinguished. Streamflow drought was analyzed by the truncation level method (at the 70% level) on daily discharges measured at 54 hydrometric stations in southwestern Iran. Frequency analysis was carried out for the annual maximum series (AMS) of drought deficit volume and duration. Physiographic, climatic, geologic, and vegetation cover factors were studied as influential factors in the regional analysis. According to the results of factor analysis, the six most effective factors were identified as area, rainfall from December to February, the percent of area with Normalized Difference Vegetation Index (NDVI) <0.1, the percent of convex area, drainage density and the minimum watershed elevation, which together explained 90.9% of the variance. The homogeneous regions were determined by cluster analysis and discriminant function analysis. Suitable multivariate regression models were evaluated for streamflow drought deficit volume with a 2-year return period. The significance level of the regression models was 0.01. The results showed that the watershed area is the most effective factor, with a high correlation with deficit volume. Also, drought duration was not a suitable drought index for regional analysis.
Keywords: Iran, Streamflow drought, truncation level method, regional analysis.
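As a rough illustration of the truncation (threshold) level idea mentioned in the abstract, the sketch below scans a daily discharge series and reports, for each run of consecutive days below a threshold, its duration and its cumulative deficit. The function name and the sample flows are hypothetical. The threshold is passed in directly, because the abstract does not say how the 70% level is derived from the flow record; deficits here are in flow-days, without unit conversion.

```python
def drought_events(flows, threshold):
    """Return (duration_days, deficit) for each run of flows below `threshold`.

    `flows` is a sequence of daily discharges; the deficit of an event is the
    sum of (threshold - flow) over its days, in the same units as the flows.
    """
    events = []
    duration, deficit = 0, 0.0
    for q in flows:
        if q < threshold:
            # Still inside (or just entering) a below-threshold spell.
            duration += 1
            deficit += threshold - q
        elif duration:
            # Flow recovered: close the current event.
            events.append((duration, deficit))
            duration, deficit = 0, 0.0
    if duration:
        # Series ended while still below the threshold.
        events.append((duration, deficit))
    return events


# Made-up daily discharges, for illustration only.
flows = [5.0, 3.0, 2.0, 6.0, 1.0, 7.0]
events = drought_events(flows, threshold=4.0)
```

Taking the largest deficit or duration per year from such an event list would give an annual maximum series of the kind the abstract analyzes.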
Downloads 1744
477 Evaluation of the Impact of Dataset Characteristics for Classification Problems in Biological Applications
Authors: Kanthida Kusonmano, Michael Netzer, Bernhard Pfeifer, Christian Baumgartner, Klaus R. Liedl, Armin Graber
Abstract:
The availability of high-dimensional biological datasets, such as those from gene expression, proteomic, and metabolic experiments, can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and separate predefined classes, such as patients with a specific disease versus healthy controls. However, most of the existing research focuses only on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, namely Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest, on mock datasets. We mimic common biological scenarios by simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The results show that SVM performs quite stably and reaches a higher AUC than the other methods. This may be explained by the ability of SVM to minimize the probability of error. Moreover, Decision Tree, with its good applicability for diagnosis and prognosis, shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better with a higher number of discriminators.
Keywords: Classification, High dimensional data, Machine learning
Downloads 2384
476 Predictor Factors for Treatment Failure among Patients on Second Line Antiretroviral Therapy
Authors: Mohd. A. M. Rahim, Yahaya Hassan, Mathumalar L. Fahrni
Abstract:
A second line antiretroviral therapy (ART) regimen is used when patients fail their first line regimen. Many factors, such as non-adherence, drug resistance, and virological and immunological failure, lead to treatment failure on a second line highly active antiretroviral therapy (HAART) regimen. This study aimed to determine predictor factors for treatment failure with second line HAART and to analyze median survival time. An observational, retrospective study was conducted in Sungai Buloh Hospital (HSB) to assess the current status of HIV patients treated with a second line HAART regimen. Convenience sampling was used, and 104 patients were included based on the study’s inclusion and exclusion criteria. Data were collected for six months, from July until December 2013, and were then analysed using SPSS version 18. Kaplan-Meier and Cox regression analyses were used to measure median survival times and predictor factors for treatment failure. The study population consisted mainly of male subjects, aged 30-45 years, who were heterosexual and had had HIV infection for less than 6 years. The most common second line HAART regimen given was a lopinavir/ritonavir (LPV/r)-based combination. Kaplan-Meier analysis showed that patients on LPV/r demonstrated longer median survival times than patients on an indinavir/ritonavir (IDV/r)-based combination (p<0.001). The commonest reason for treatment failure with second line HAART was non-adherence. Based on Cox regression analysis, other predictor factors for treatment failure with a second line HAART regimen were age and mode of HIV transmission.
Keywords: Adherence, antiretroviral therapy, second line, treatment failure.
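As a rough illustration of the Kaplan-Meier analysis named in this abstract, the following sketch computes the product-limit survival curve and the median survival time from follow-up durations and event flags. It is a generic textbook version and not the SPSS procedure used in the study; the function names and the toy data layout are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up duration for each patient.
    events: 1 if treatment failure was observed, 0 if the patient was censored.
    Returns a list of (time, survival_probability) at each distinct failure time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

Computing one curve per regimen group (for example LPV/r versus IDV/r) and comparing the medians mirrors the kind of comparison the abstract reports; a significance test such as the log-rank test would be a separate step.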
Downloads 2717
475 A Three Elements Vector Valued Structure’s Ultimate Strength-Strong Motion-Intensity Measure
Authors: A. Nicknam, N. Eftekhari, A. Mazarei, M. Ganjvar
Abstract:
This article presents an alternative collapse capacity intensity measure in three-element form, which is influenced by the spectral ordinates at periods longer than the first mode period at near and far source sites. A parameter, denoted by β, is defined to account for the effects of the spectral ordinates, up to the effective period (2T1), on the intensity measure. The methodology makes it possible to meet the hazard-levelled target extreme event in both probabilistic and deterministic forms. A MATLAB code involving OpenSees is developed to calculate the collapse capacities of 8 archetype RC structures having 2 to 20 stories for the regression process. The incremental dynamic analysis (IDA) method is used to calculate the structures’ collapse values, accounting for element stiffness and strength deterioration. The general near field set presented by FEMA is used in a series of nonlinear analyses. Eight linear relationships are developed for the 8 structures, with correlation coefficients up to 0.93. A collapse capacity near field prediction equation is developed from the results of the regression processes obtained for the 8 structures. The proposed prediction equation is validated against a set of actual near field records, showing good agreement. Applying the proposed equation to four archetype RC structures gave collapse capacities at the near field site that differ from those of FEMA. The differences are believed to be due to accounting for the spectral shape effects.
Keywords: Collapse capacity, fragility analysis, spectral shape effects, IDA method.
Downloads 1794
474 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach
Authors: Rajvir Kaur, Jeewani Anupama Ginige
Abstract:
With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is in transition from clinician oriented to technology oriented. Many people around the world die of cancer because the disease was not diagnosed at an early stage. Nowadays, computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper carries out a comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is based on the standard metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results show that ANN achieved the highest accuracy (99.4%) when tested on the breast cancer dataset. On the other hand, when these ML classifiers were tested on the cervical cancer dataset, the Ensemble (Bagged Tree) technique gave better accuracy (93.1%) than the other classifiers.
Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.
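The four evaluation metrics named in this abstract have standard definitions for a binary problem, sketched below. The function name, the choice of `1` as the positive label, and the convention of returning 0.0 when a denominator is zero are assumptions made for this illustration; the paper does not give its implementation.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-score and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators (assumed convention: report 0.0).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

Accuracy alone can be misleading on imbalanced medical datasets, which is one reason precision, recall and F1 are usually reported alongside it.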
Downloads 1553
473 Evaluation of the Beach Erosion Process in Varadero, Matanzas, Cuba: Effects of Different Hurricane Trajectories
Authors: Ana Gabriela Diaz, Luis Fermín Córdova, Jr., Roberto Lamazares
Abstract:
The island of Cuba, the largest of the Greater Antilles, is located in the tropical North Atlantic. It is affected every year by numerous weather events, which have caused severe damage to our coastal areas. Like many other coastlines around the world, the beautiful beaches of the Hicacos Peninsula also suffer from erosion. This leads to a structural regression of the coastline. If measures are not taken, the hotels will be exposed to the advance of the sea, which will be a serious problem for the economy. To study the intensity of this type of activity, specialists of the coastal and marine engineering group at CIH, within the framework of the project MEGACOSTAS 2, conduct research to simulate extreme events and assess their impact on coastal areas, mainly regarding the definition of flood volumes and morphodynamic changes in sandy beaches. The main objective of this work is to evaluate the erosion process of Varadero beach on the Hicacos Peninsula (a coastal sector with an important impact on the country's economy) for different hurricane paths. The mathematical model XBeach, which was integrated into the coastal engineering system introduced by the project MEGACOSTA 2, was applied to determine the area and the most critical profiles for the hurricane paths under study. The results of this project have shown that the Center area is the most dynamic area in the simulation of the three hurricane paths under study, showing high erosion volumes and the greatest average length of regression of the coastline, from 15 to 22 m.
Keywords: Beach, erosion, mathematical model, coastal areas.
Downloads 1219
472 Potentials of Raphia hookeri Wine in Livelihood Sustenance among Rural and Urban Populations in Nigeria
Authors: A. A. Aiyeloja, A.T. Oladele, O. Tumulo
Abstract:
Raphia wine is an important forest product with cultural significance besides its use as medicine and food in southern Nigeria. This work aims to evaluate the profitability of Raphia wine production and marketing in Sapele Local Government Area, Nigeria. Four communities (Sapele, Ogiede, Okuoke and Elume) were randomly selected for data collection via questionnaires among producers and marketers. A total of 50 producers and 34 marketers were randomly selected for interview. Data were analyzed using descriptive statistics, profit margin, multiple regression and rate of returns on investment (RORI). Annual average profit was highest in Okuoke (producers: N90,000.00; marketers: N70,000.00) and lowest in Sapele (producers: N50,000.00; marketers: N45,000.00). Calculated RORI for marketers were Elume (40.0%), Okuoke (25.0%), Ogiede (33.3%) and Sapele (50.0%). Regression results showed that location has a significant effect (0.000, p ≤ 0.05) on profit margins. Both males (58.8%) and females (41.2%) invest in Raphia wine marketing, while males (100.0%) dominate production. Results showed that Raphia wine has the potential to generate household income, enhance food security and improve quality of life in rural, semi-urban and urban communities. Improved marketing channels, storage facilities and credit facilities via cooperative groups are recommended for producers and marketers by concerned agencies.
Keywords: Raphia wine, Profit margin, RORI, Livelihood, Nigeria.
Downloads 2426
471 A Linear Regression Model for Estimating Anxiety Index Using Wide Area Frontal Lobe Brain Blood Volume
Authors: Takashi Kaburagi, Masashi Takenaka, Yosuke Kurihara, Takashi Matsumoto
Abstract:
Major depressive disorder (MDD) is one of the most common mental illnesses today. It is believed to be caused by a combination of several factors, including stress. Stress can be quantitatively evaluated using the State-Trait Anxiety Inventory (STAI), one of the best indices for evaluating anxiety. Although STAI scores are widely used in applications ranging from clinical diagnosis to basic research, the scores are calculated from a self-reported questionnaire. An objective evaluation is required because the subject may intentionally change his/her answers if multiple tests are carried out. In this article, we present a modified index called the “multi-channel Laterality Index at Rest (mc-LIR)”, obtained by recording brain activity from a wider area of the frontal lobe using multi-channel functional near-infrared spectroscopy (fNIRS). The presented index aims to measure multiple positions near the Fpz position defined by the international 10-20 system. Using 24 subjects, we report how the correlation coefficients between the mc-LIR and the STAI scores depend on the number of measuring points used to calculate the mc-LIR. Furthermore, a simple linear regression was performed to estimate the STAI scores from the mc-LIR. The cross-validation error is also reported. The experimental results show that using multiple positions near the Fpz improves the correlation coefficients and the estimation compared with using only two positions.
Keywords: Stress, functional near-infrared spectroscopy, frontal lobe, state-trait anxiety inventory score.
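The combination of simple linear regression and a cross-validation error described in this abstract can be sketched as follows. The abstract does not state which cross-validation scheme was used, so leave-one-out is an assumption here, as are the function names and the use of root mean square error as the reported quantity.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_rmse(xs, ys):
    """Leave-one-out cross-validation RMSE for the fitted line.

    Each sample is held out in turn, the line is refit on the rest,
    and the held-out sample is predicted from that refit.
    """
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((ys[i] - prediction) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5
```

In the setting of the abstract, `xs` would hold one index value per subject and `ys` the corresponding questionnaire scores, both as Python lists; with only 24 subjects, leave-one-out is a common choice because it keeps nearly all data for training in every fold.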
Downloads 1166
470 Foreign Direct Investment on Economic Growth by Industries in Central and Eastern European Countries
Authors: Shorena Pharjiani
Abstract:
This empirical paper investigates the relationship between FDI and economic growth in 10 selected industries in 10 Central and Eastern European countries over the period 1995 to 2012. Different estimation approaches were used to explore the connection between FDI and economic growth, for example OLS, RE, and FE with and without time dummies. The empirical results lead to several main conclusions. First, the Central and East European countries (CEEC) attracted foreign direct investment, which raised the productivity of the industries it entered. The linkage between FDI and output growth by industry is positive and significant enough to suggest that foreign firms’ participation enhanced the productivity of the industries they occupied. There was an endogeneity problem in the regression, and a fixed effects estimation approach was used, which partially corrected the regression analysis and made the results less biased. Second, the results show that time plays an important role in making FDI operational for enhancing output growth by industry via total factor productivity. Third, R&D positively affected economic growth, although it takes some time for research and development to influence economic growth. Fourth, the general trends masked crucial differences at the country level: over the last 20 years, the analysis of the tables and figures at the country level shows that the main recipients of FDI among the 11 Central and Eastern European countries were Hungary, Poland and the Czech Republic. The main reason was that these countries had more open door policies for attracting FDI. Fifth, according to the graphical analysis, while Hungary had the highest FDI inflow in this region, this was not reflected in GDP growth as much as in other Central and Eastern European countries.
Keywords: Central and East European countries (CEEC), economic growth, FDI, panel data.
Downloads 1665
469 Customer Churn Prediction Using Four Machine Learning Algorithms Integrating Feature Selection and Normalization in the Telecom Sector
Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh
Abstract:
A crucial part of maintaining a customer-oriented business in the telecommunications industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years, which has made it more important to understand customers’ needs in this competitive market. Understanding the needs of customers who are looking to switch service providers is especially important. Churn prediction is now a mandatory requirement for retaining customers in the telecommunications industry, and machine learning can be used to accomplish it. Churn prediction has become a very important topic in machine learning classification in the telecommunications industry. Understanding the factors behind customer churn and how they behave is very important for building an effective churn prediction model. This paper aims to predict churn and identify the factors of customers’ churn based on their past service usage history. To this end, the study makes use of feature selection, normalization, and feature engineering. The study then compares the performance of four machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Performance was evaluated using the F1 score and ROC-AUC. Compared with existing models, the approach in this study produced better results. Gradient Boosting with feature selection performed best in this study, achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.
Keywords: Machine Learning, Gradient Boosting, Logistic Regression, Churn, Random Forest, Decision Tree, ROC, AUC, F1-score.
Downloads 408
468 Replicating Brain’s Resting State Functional Connectivity Network Using a Multi-Factor Hub-Based Model
Authors: B. L. Ho, L. Shi, D. F. Wang, V. C. T. Mok
Abstract:
The brain’s functional connectivity, while temporally non-stationary, does express consistency at a macro spatial level. The study of stable resting state connectivity patterns hence provides opportunities for identifying diseases if such stability is severely perturbed. A mathematical model replicating the brain’s spatial connections will be useful for understanding the brain’s representative geometry and complements the empirical model where it falls short. Empirical computations tend to involve large matrices and become infeasible with fine parcellation. The proposed analytical model, however, has no such computational problems. To improve replicability, data from 92 subjects were obtained from two open sources. The proposed methodology, inspired by financial theory, uses multivariate regression to find the relationships of every cortical region of interest (ROI) with some pre-identified hubs. These hubs act as representatives for the entire cortical surface. A variance-covariance framework of all ROIs is then built from these relationships to link up all the ROIs. The result is a high level of match between model and empirical correlations, in the range of 0.59 to 0.66 after adjusting for sample size; an increase of almost forty percent. More significantly, the model framework provides an intuitive way to delineate between systemic drivers and idiosyncratic noise while reducing dimensions by more than 30-fold, hence providing a way to conduct attribution analysis. Due to its analytical nature and simple structure, the model is useful as a standalone toolkit for network dependency analysis or as a module for other mathematical models.
Keywords: Functional magnetic resonance imaging, multivariate regression, network hubs, resting state functional connectivity.
Downloads 807
467 Evaluation of Short-Term Load Forecasting Techniques Applied for Smart Micro Grids
Authors: Xiaolei Hu, Enrico Ferrera, Riccardo Tomasi, Claudio Pastrone
Abstract:
Load Forecasting plays a key role in making today's and future Smart Energy Grids sustainable and reliable. Accurate power consumption prediction allows utilities to organize their resources in advance or to execute Demand Response strategies more effectively, which enables several benefits such as higher sustainability, better quality of service, and affordable electricity tariffs. It is easy yet effective to apply Load Forecasting at larger geographic scale, i.e. Smart Micro Grids, wherein the lower available grid flexibility makes accurate prediction more critical in Demand Response applications. This paper analyses the application of short-term load forecasting in a concrete scenario, proposed within the EU-funded GreenCom project, which collects load data from single loads and households belonging to a Smart Micro Grid. Three short-term load forecasting techniques, i.e. linear regression, artificial neural networks, and radial basis function network, are considered, compared, and evaluated through absolute forecast errors and training time. The influence of weather conditions on Load Forecasting is also evaluated. A new definition of Gain is introduced in this paper, which serves as an indicator of short-term prediction capabilities of time span consistency. Two models, 24- and 1-hour-ahead forecasting, are built to comprehensively compare these three techniques.
Keywords: Short-term load forecasting, smart micro grid, linear regression, artificial neural networks, radial basis function network, Gain.
Downloads 2602
466 The Impact of Socio-Economic and Type of Religion on the Behavior of Obedience among Arab-Israeli Teenagers
Authors: Sadhana Ghnayem
Abstract:
This article examines the relationship between several socio-economic and background variables of Arab-Israeli families and their effect on the conflict management style of forcing, where teenage children are expected to obey their parents without questioning. The article explores the inter-generational gap and the desire of Arab-Israeli parents to force their teenage children to obey without questioning. The independent variables include the sex of the parent, religion (Christian or Muslim), income of the parent, years of education of the parent, and the sex of the teenage child. The dependent variable, “Obedience Without Questioning”, is reported twice: by each of the parents as well as by the children. We circulated a questionnaire and collected data from a sample of 180 parents and their adolescent children living in the Galilee area during 2018. In this questionnaire we asked each parent and his/her teenage child whether the latter is expected to follow the instructions of the former without questioning. The results indicate, first, that Christian-Arab families are less authoritarian than Muslim families in demanding sheer obedience from their children. Second, female parents indicate more than male parents that their teenage child indeed obeys without questioning. Third, there is a negative correlation between the variable “Income” and “Obedience without Questioning.” Yet, the regression coefficient of this variable is close to zero. Fourth, there is a positive correlation between years of education and obedience reported by the children. In other words, more educated parents are more likely to demand obedience from their children. Finally, after running the regression, the study also found that the impact of religion and of the sex of the child on the dependent variable of obedience is significant at above 95% and 90%, respectively.
Keywords: Arab-Israeli parents, Obedience, Forcing, Inter-generational gap.
Downloads 793
465 Meta Model for Optimum Design Objective Function of Steel Frames Subjected to Seismic Loads
Authors: Salah R. Al Zaidee, Ali S. Mahdi
Abstract:
Except for simple problems of statically determinate structures, optimum design problems in structural engineering have implicit objective functions, where structural analysis and design are essential within each searching loop. With these implicit functions, the structural engineer is usually forced to write his/her own computer code for analysis, design, and searching for the optimum design among many feasible candidates, and cannot take advantage of available software for structural analysis, design, and searching for the optimum solution. The meta-model is a regression model used to transform an implicit objective function into an explicit one, which in turn decouples the structural analysis and design processes from the optimum searching process. With the meta-model, well-known software for structural analysis and design can be used in sequence with optimum searching software. In this paper, the meta-model has been used to develop an explicit objective function for plane steel frames subjected to dead, live, and seismic forces. Frame topology is assumed to be predefined based on architectural and functional requirements. Column and beam sections and different connection details are the main design variables in this study. Columns and beams are grouped to reduce the number of design variables and to make the problem similar to that adopted in engineering practice. Data for the implicit objective function have been generated from the analysis and assessment of many design proposals with CSI SAP software. These data have then been used in SPSS software to develop a pure quadratic nonlinear regression model for the explicit objective function. Good correlations, with a coefficient R2 in the range from 0.88 to 0.99, have been noted between the original implicit functions and the corresponding explicit functions generated with the meta-model.
Keywords: Meta-model, objective function, steel frames, seismic analysis, design.
Downloads 1333
464 Forecast of the Small Wind Turbines Sales with Replacement Purchases and with or without Account of Price Changes
Authors: V. Churkin, M. Lopatin
Abstract:
The purpose of the paper is to estimate the US small wind turbine market potential and to forecast small wind turbine sales in the US. The forecasting method is based on the Bass model and the generalized Bass model of innovation diffusion under replacement purchases. An exponential distribution is used to model replacement purchases. The single parameter of this distribution is determined by the average lifetime of small wind turbines. The model parameters are identified by nonlinear regression analysis of the annual sales statistics published by the American Wind Energy Association (AWEA) from 2001 to 2012. The estimate of the US average market potential of small wind turbines (for adoption purchases) without accounting for price changes is 57080 (confidence interval from 49294 to 64866 at P = 0.95) for an average wind turbine lifetime of 15 years, and 62402 (confidence interval from 54154 to 70648 at P = 0.95) for an average lifetime of 20 years. In the first case the explained variance is 90.7%, while in the second it is 91.8%. The effect of wind turbine price changes on sales was estimated using the generalized Bass model. This required a price forecast, for which a polynomial regression function based on the Berkeley Lab statistics was used. The estimate of the US average market potential of small wind turbines (for adoption purchases) in that case is 42542 (confidence interval from 32863 to 52221 at P = 0.95) for an average lifetime of 15 years, and 47426 (confidence interval from 36092 to 58760 at P = 0.95) for an average lifetime of 20 years. In the first case the explained variance is 95.3%, while in the second it is 95.3%.
Keywords: Bass model, generalized Bass model, replacement purchases, sales forecasting of innovations, statistics of sales of small wind turbines in the United States.
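For readers unfamiliar with the Bass model named in this abstract, the sketch below evaluates its standard closed-form adoption curve and the resulting first-purchase sales per year. It covers adoption purchases only and omits the paper's replacement-purchase and price extensions. The market potential `m = 57080` is taken from the abstract, while the innovation and imitation coefficients `p` and `q` are purely illustrative assumptions, since the abstract does not report them.

```python
import math


def bass_cdf(t, p, q):
    """Cumulative fraction of the market that has adopted by time t.

    Standard Bass model: F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}),
    with innovation coefficient p and imitation coefficient q.
    """
    decay = math.exp(-(p + q) * t)
    return (1 - decay) / (1 + (q / p) * decay)


def adoption_sales(m, p, q, years):
    """First-purchase sales in each of the first `years` periods."""
    return [m * (bass_cdf(t, p, q) - bass_cdf(t - 1, p, q))
            for t in range(1, years + 1)]


# Illustrative call: m from the abstract, p and q are hypothetical values.
sales = adoption_sales(m=57080, p=0.03, q=0.4, years=12)
```

Fitting `m`, `p` and `q` to observed annual sales by nonlinear least squares is the kind of parameter identification the abstract describes; the fitting step itself is not shown here.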
Downloads 1883