Search results for: prediction models
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7967

7517 Enhance the Power of Sentiment Analysis

Authors: Yu Zhang, Pedro Desouza

Abstract:

Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers on three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduce a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.

Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining

Procedia PDF Downloads 328
7516 Reliability Estimation of Bridge Structures with Updated Finite Element Models

Authors: Ekin Ozer

Abstract:

Assessment of structural reliability is essential for efficient use of civil infrastructure that is subjected to hazardous events. Dynamic analysis of finite element models is a commonly used tool to simulate structural behavior and estimate its performance accordingly. However, theoretical models based purely on preliminary assumptions and design drawings may deviate from the actual behavior of the structure. This study proposes an up-to-date reliability estimation procedure that uses actual bridge vibration data to update finite element models and then performs reliability estimation on the updated models. The proposed method utilizes vibration response measurements of bridge structures to identify modal parameters, then uses these parameters to calibrate finite element models that are originally based on design drawings. The results not only show that reliability estimates based on updated models differ from those of the original models, but also suggest that non-updated models may overestimate structural capacity.

Keywords: earthquake engineering, engineering vibrations, reliability estimation, structural health monitoring

Procedia PDF Downloads 193
7515 Detection of Chaos in General Parametric Model of Infectious Disease

Authors: Javad Khaligh, Aghileh Heydari, Ali Akbar Heydari

Abstract:

Mathematical epidemiological models for the spread of disease through a population are used to predict the prevalence of a disease or to study the impacts of treatment or prevention measures. Initial conditions for these models are measured from statistical data collected from a population. Since these initial conditions can never be exact, the presence of chaos in mathematical models has serious implications for the accuracy of the models as well as for how epidemiologists interpret their findings. This paper confirms the chaotic behavior of a model for dengue fever and SI by investigating sensitive dependence, bifurcation, and the 0-1 test under a variety of initial conditions.

Keywords: epidemiological models, SEIR disease model, bifurcation, chaotic behavior, 0-1 test

Procedia PDF Downloads 299
7514 Applied Complement of Probability and Information Entropy for Prediction in Student Learning

Authors: Kennedy Efosa Ehimwenma, Sujatha Krishnamoorthy, Safiya Al‑Sharji

Abstract:

The probability of an event lies in the interval [0, 1], with values determined by the number of outcomes of events in a sample space S. The probability Pr(A) that an event A will never occur is 0. The probability Pr(B) that event B will certainly occur is 1. This makes both events A and B a certainty. Furthermore, the sum of probabilities Pr(E₁) + Pr(E₂) + … + Pr(Eₙ) of a finite set of events in a given sample space S equals 1. Conversely, the difference of the sum of two probabilities that will certainly occur is 0. This paper first discusses Bayes, the complement of probability, and the difference of probability for occurrences of learning-events before applying them in the prediction of learning objects in student learning. Given that the probabilities sum to 1, and in order to make recommendations for student learning, this paper proposes that the difference of argMaxPr(S) and the probability of student-performance quantifies the weight of learning objects for students. Using a dataset of skill-set, the computational procedure demonstrates i) the probability of skill-set events that have occurred that would lead to higher-level learning; ii) the probability of the events that have not occurred that require subject-matter relearning; iii) the accuracy of the decision tree in the prediction of student performance into class labels; and iv) information entropy about skill-set data and its implication on student cognitive performance and recommendation of learning.
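
The two quantities named in this abstract, the complement of a probability and information entropy, can be illustrated with a minimal Python sketch. The function names are hypothetical, and the code shows only the textbook definitions; it is not the authors' computational procedure.

```python
import math


def complement(p):
    """Probability that an event with probability p does NOT occur."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 1.0 - p


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

Under these definitions, a certain outcome carries zero entropy, while two equally likely outcomes carry exactly one bit.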

Keywords: complement of probability, Bayes’ rule, prediction, pre-assessments, computational education, information theory

Procedia PDF Downloads 136
7513 Innovative Methods of Improving Train Formation in Freight Transport

Authors: Jaroslav Masek, Juraj Camaj, Eva Nedeliakova

Abstract:

The paper focuses on an operational model for transporting single-wagon consignments on a railway network using two different models of train formation. It gives an overview of the possibilities for improving the quality of transport services. The paper deals with two models used in the train formation problem: a continuous-time model and a discrete-time model. By applying these models in practice, a transport company can guarantee a higher quality of service and expect an increase in transport performance. The models are also applicable to other transport networks. They supplement the theoretical problem of train formation with new ways of looking at the organization of wagon flows.

Keywords: train formation, wagon flows, marshalling yard, railway technology

Procedia PDF Downloads 417
7512 The Role and Importance of Genome Sequencing in Prediction of Cancer Risk

Authors: M. Sadeghi, H. Pezeshk, R. Tusserkani, A. Sharifi Zarchi, A. Malekpour, M. Foroughmand, S. Goliaei, M. Totonchi, N. Ansari–Pour

Abstract:

The role and relative importance of intrinsic and extrinsic factors in the development of complex diseases such as cancer remains a controversial issue. Determining the amount of variation explained by these factors needs experimental data and statistical models. These models are nevertheless based on the occurrence and accumulation of random mutational events during stem cell division, thus rendering cancer development a stochastic outcome. We demonstrate not only that individual genome sequencing is uninformative in determining cancer risk, but also that assigning a unique genome sequence to any given individual (healthy or affected) is not meaningful. Current whole-genome sequencing approaches are therefore unlikely to realize the promise of personalized medicine. In conclusion, since genome sequence differs from cell to cell and changes over time, it seems that determining the risk factor of complex diseases based on genome sequence is somewhat unrealistic, and therefore, the resulting data are likely to be inherently uninformative.

Keywords: cancer risk, extrinsic factors, genome sequencing, intrinsic factors

Procedia PDF Downloads 247
7511 A Deep-Learning Based Prediction of Pancreatic Adenocarcinoma with Electronic Health Records from the State of Maine

Authors: Xiaodong Li, Peng Gao, Chao-Jung Huang, Shiying Hao, Xuefeng B. Ling, Yongxia Han, Yaqi Zhang, Le Zheng, Chengyin Ye, Modi Liu, Minjie Xia, Changlin Fu, Bo Jin, Karl G. Sylvester, Eric Widen

Abstract:

Predicting the risk of Pancreatic Adenocarcinoma (PA) in advance can benefit the quality of care and potentially reduce population mortality and morbidity. The aim of this study was to develop and prospectively validate a risk prediction model to identify patients at risk of new incident PA as early as 3 months before the onset of PA in a statewide, general population in Maine. The PA prediction model was developed using Deep Neural Networks, a deep learning algorithm, with a 2-year electronic-health-record (EHR) cohort. Prospective results showed that our model identified 54.35% of all inpatient episodes of PA, and 91.20% of all PA that required subsequent chemoradiotherapy, with a lead-time of up to 3 months and a true alert of 67.62%. The risk assessment tool has attained an improved discriminative ability. It can be immediately deployed to the health system to provide automatic early warnings to adults at risk of PA. It has potential to identify personalized risk factors to facilitate customized PA interventions.

Keywords: cancer prediction, deep learning, electronic health records, pancreatic adenocarcinoma

Procedia PDF Downloads 129
7510 Groundwater Potential Mapping using Frequency Ratio and Shannon’s Entropy Models in Lesser Himalaya Zone, Nepal

Authors: Yagya Murti Aryal, Bipin Adhikari, Pradeep Gyawali

Abstract:

The Lesser Himalaya zone of Nepal consists of thrusting and folding belts, which play an important role in the sustainable management of groundwater in the Himalayan regions. The study area is located in the Dolakha and Ramechhap Districts of Bagmati Province, Nepal. Geologically, these districts are situated in the Lesser Himalayas and partly encompass the Higher Himalayan rock sequence, which includes low-grade to high-grade metamorphic rocks. Following the Gorkha Earthquake in 2015, numerous springs dried up, and many others are currently experiencing depletion due to the distortion of the natural groundwater flow. The primary objective of this study is to identify potential groundwater areas and determine suitable sites for artificial groundwater recharge. Two distinct statistical approaches were used to develop models: The Frequency Ratio (FR) and Shannon Entropy (SE) methods. The study utilized both primary and secondary datasets and incorporated significant role and controlling factors derived from field works and literature reviews. Field data collection involved spring inventory, soil analysis, lithology assessment, and hydro-geomorphology study. Additionally, slope, aspect, drainage density, and lineament density were extracted from a Digital Elevation Model (DEM) using GIS and transformed into thematic layers. For training and validation, 114 springs were divided into a 70/30 ratio, with an equal number of non-spring pixels. After assigning weights to each class based on the two proposed models, a groundwater potential map was generated using GIS, classifying the area into five levels: very low, low, moderate, high, and very high. The model's outcome reveals that over 41% of the area falls into the low and very low potential categories, while only 30% of the area demonstrates a high probability of groundwater potential. To evaluate model performance, accuracy was assessed using the Area under the Curve (AUC). 
The success rate AUC values for the FR and SE methods were determined to be 78.73% and 77.09%, respectively. Additionally, the prediction rate AUC values for the FR and SE methods were calculated as 76.31% and 74.08%. The results indicate that the FR model exhibits greater prediction capability compared to the SE model in this case study.
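
The abstract does not state how the FR weights are computed. As a rough sketch, assuming the commonly used definition (the share of springs falling in a factor class divided by the share of the study area occupied by that class), the per-class weight could be computed as follows. The function name, class labels, and counts are hypothetical and are not taken from the study.

```python
def frequency_ratio(spring_counts, pixel_counts):
    """Frequency ratio per class: (share of springs) / (share of area).

    spring_counts: dict mapping class label -> number of springs in the class
    pixel_counts:  dict mapping class label -> number of pixels in the class
    """
    total_springs = sum(spring_counts.values())
    total_pixels = sum(pixel_counts.values())
    ratios = {}
    for cls, pixels in pixel_counts.items():
        spring_share = spring_counts.get(cls, 0) / total_springs
        area_share = pixels / total_pixels
        ratios[cls] = spring_share / area_share
    return ratios


# Hypothetical slope classes: 80 training springs over 1000 pixels.
example = frequency_ratio(
    {"gentle": 60, "steep": 20},
    {"gentle": 500, "steep": 500},
)
```

A ratio above 1 marks a class in which springs are over-represented relative to its area, and a ratio below 1 marks the opposite.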

Keywords: groundwater potential mapping, frequency ratio, Shannon’s Entropy, Lesser Himalaya Zone, sustainable groundwater management

Procedia PDF Downloads 51
7509 Aerodynamic Coefficients Prediction from Minimum Computation Combinations Using OpenVSP Software

Authors: Marine Segui, Ruxandra Mihaela Botez

Abstract:

OpenVSP is an aerodynamic solver developed by the National Aeronautics and Space Administration (NASA) that allows building a reliable model of an aircraft. This software performs an aerodynamic simulation according to the angle of attack that the aircraft makes with the incoming airstream and according to its speed. A reliable aerodynamic model of the Cessna Citation X was designed, but it required a lot of computation time. As a consequence, a prediction method was established that allowed predicting lift and drag coefficients for all Mach numbers and for all angles of attack, exclusively for stall conditions, from a computation of three angles of attack and only one Mach number. Aerodynamic coefficients given by the prediction method for a Cessna Citation X model were finally compared with aerodynamic coefficients obtained using a complete OpenVSP study.

Keywords: aerodynamic, coefficient, cruise, improving, longitudinal, openVSP, solver, time

Procedia PDF Downloads 213
7508 Sensitivity Analysis of the Thermal Properties in Early Age Modeling of Mass Concrete

Authors: Farzad Danaei, Yilmaz Akkaya

Abstract:

In many civil engineering applications, especially in the construction of large concrete structures, the early age behavior of concrete has been shown to be a crucial problem. The uneven rise in temperature within the concrete in these constructions is the fundamental issue for quality control. Therefore, developing accurate and fast temperature prediction models is essential. The thermal properties of concrete fluctuate over time as it hardens, but taking all of these fluctuations into account makes numerical models more complex. Experimental measurement of the thermal properties under laboratory conditions also cannot accurately predict the variance of these properties under site conditions. Therefore, specific heat capacity and the heat conductivity coefficient are two variables that are treated as constants in many of the models previously recommended. The proposed equations demonstrate that these two quantities decrease linearly as cement hydrates and that their values are related to the degree of hydration. The effects of changing the thermal conductivity and specific heat capacity values on the maximum temperature and on the time it takes for concrete to reach that temperature are examined in this study using numerical sensitivity analysis, and the results are compared to models that take a fixed value for these two thermal properties. The study is conducted on 7 different concrete mix designs with varying amounts of supplementary cementitious materials (fly ash and ground granulated blast furnace slag, GGBFS). It is concluded that assuming a constant conductivity coefficient does not change the maximum temperature, but that variable specific heat capacity must be taken into account. Variable specific heat capacity can also have a considerable effect on the time at which the concrete's central node reaches its maximum temperature. Also, the use of GGBFS has more influence than fly ash.

Keywords: early-age concrete, mass concrete, specific heat capacity, thermal conductivity coefficient

Procedia PDF Downloads 53
7507 Mean Monthly Rainfall Prediction at Benina Station Using Artificial Neural Networks

Authors: Hasan G. Elmazoghi, Aisha I. Alzayani, Lubna S. Bentaher

Abstract:

Rainfall is a highly non-linear phenomenon, which requires the application of powerful supervised data mining techniques for its accurate prediction. In this study, the Artificial Neural Network (ANN) technique is used to predict the mean monthly historical rainfall data collected from BENINA station in Benghazi for 31 years, the period of “1977-2006”, and the results are compared against the observed values. The specific objective to achieve this goal was to determine the best combination of weather variables to be used as inputs for the ANN model. Several statistical parameters were calculated, and an uncertainty analysis of the results is also presented. The best ANN model is then applied to the data of one year (2007) as a case study in order to evaluate the performance of the model. Simulation results reveal that the application of the ANN technique is promising and can provide reliable estimates of rainfall.

Keywords: neural networks, rainfall, prediction, climatic variables

Procedia PDF Downloads 462
7506 SOM Map vs Hopfield Neural Network: A Comparative Study in Microscopic Evacuation Application

Authors: Zouhour Neji Ben Salem

Abstract:

Microscopic evacuation focuses on evacuee behavior and on the way evacuees search for a safe place in an egress situation. In recent years, several models have addressed the microscopic evacuation problem. Among them, we have proposed Artificial Neural Networks (ANN) as an alternative to mathematical models for dealing with this problem. In this paper, we present two ANN models, the SOM map and the Hopfield network, used to predict evacuee behavior in a disaster situation. These models are tested on a real case: the evacuation of the second floor of the Tunisian children's hospital in case of fire. The two models are studied and compared in order to evaluate their performance.

Keywords: artificial neural networks, self-organization map, hopfield network, microscopic evacuation, fire building evacuation

Procedia PDF Downloads 376
7505 Possibility of Making Ceramic Models from Condemned Plaster of Paris (Pop) Moulds for Ceramics Production in Edo State Nigeria

Authors: Osariyekemwen, Daniel Nosakhare

Abstract:

Some ceramic wastes, such as discarded (condemned) Plaster of Paris (POP) moulds at Auchi Polytechnic, Edo State, constitute environmental hazards. This study, therefore, bridges the foregoing gap by using these discarded POP moulds to produce ceramic models for making casting moulds for mass production. This offers a way to properly manage the discarded (condemned) POP that litters our immediate environment. At present, these moulds are a major waste disposal problem in the department. Hence, the study set out to fabricate sanitary miniature models and contract fuse models, respectively. Findings arising from this study show that discarded (condemned) POP can be carved and that, once set, it neither shrinks nor expands; hence warping is quite unusual. Above all, it also gives a good finish with little deterioration over time when compared to clay models.

Keywords: plaster of Paris, condemn, moulds, models, production

Procedia PDF Downloads 164
7504 Short Review on Models to Estimate the Risk in the Financial Area

Authors: Tiberiu Socaciu, Tudor Colomeischi, Eugenia Iancu

Abstract:

Business failure affects, in various proportions, shareholders, managers, lenders (banks), suppliers, customers, the financial community, government and society as a whole. In an era of telecommunications networks and interdependent markets, the effect of a company's failure is relatively instant. Effectively managing risk exposure thus requires sophisticated support systems, backed by analytical tools to measure, monitor, manage and control the operational risks that may arise. As we know, bankruptcy is a phenomenon that managers do not want, regardless of the life-cycle stage of the company they lead. In our analysis, given the nature of the economic models reviewed (Altman, Conan-Holder etc.), estimating the risk of bankruptcy of a company corresponds to some extent with tracing the company's own business cycle. Various models for predicting bankruptcy take into account direct and indirect aspects such as market position, company growth trend, competition structure, customer characteristics and retention, organization and distribution, location etc. From the perspective of our research, we review the economic models known in theory and practice for estimating the risk of bankruptcy; such models are based on indicators drawn from major accounting firms.

Keywords: Anglo-Saxon models, continental models, national models, statistical models

Procedia PDF Downloads 384
7503 Online Prediction of Nonlinear Signal Processing Problems Based Kernel Adaptive Filtering

Authors: Hamza Nejib, Okba Taouali

Abstract:

This paper presents two of the best-known kernel adaptive filtering (KAF) approaches, the kernel least mean squares and the kernel recursive least squares, in order to predict a new output in nonlinear signal processing. Both of these methods implement a nonlinear transfer function using kernel methods in a particular space named the reproducing kernel Hilbert space (RKHS), where the model is a linear combination of kernel functions applied to transform the observed data from the input space to a high-dimensional feature space of vectors; this idea is known as the kernel trick. KAF thus consists of developing filters in RKHS. We use two nonlinear signal processing problems, Mackey-Glass chaotic time series prediction and nonlinear channel equalization, to assess the performance of the approaches presented and finally to determine which of them is better suited.
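
As an illustration of the kernel least mean squares idea mentioned in this abstract, here is a minimal pure-Python sketch. It assumes a Gaussian kernel and the basic update in which every new sample becomes a kernel center weighted by the step size times the prediction error. The class name, parameter names, and default values are hypothetical and are not taken from the paper.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length input vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


class KLMS:
    """Basic kernel least mean squares filter (no sparsification)."""

    def __init__(self, step_size=0.5, sigma=1.0):
        self.step_size = step_size
        self.sigma = sigma
        self.centers = []  # stored input vectors
        self.coeffs = []   # one expansion coefficient per center

    def predict(self, x):
        # The model is a linear combination of kernels on stored centers.
        return sum(
            a * gaussian_kernel(c, x, self.sigma)
            for c, a in zip(self.centers, self.coeffs)
        )

    def update(self, x, target):
        # Online step: add the new sample as a center, scaled by the error.
        error = target - self.predict(x)
        self.centers.append(tuple(x))
        self.coeffs.append(self.step_size * error)
        return error
```

For time series prediction, `x` would typically be a window of past samples and `target` the next sample. The stored expansion grows by one center per update, which is why practical variants add a sparsification rule.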

Keywords: online prediction, KAF, signal processing, RKHS, Kernel methods, KRLS, KLMS

Procedia PDF Downloads 375
7502 Comparison of Different Reanalysis Products for Predicting Extreme Precipitation in the Southern Coast of the Caspian Sea

Authors: Parvin Ghafarian, Mohammadreza Mohammadpur Panchah, Mehri Fallahi

Abstract:

Synoptic patterns from the surface up to the tropopause are very important for forecasting the weather and atmospheric conditions. There are many tools to prepare and analyze these maps. Reanalysis data and the outputs of numerical weather prediction models, satellite images, meteorological radar, and weather station data are used in forecasting centers around the world to predict the weather. Forecasting extreme precipitation on the southern coast of the Caspian Sea (CS) is a major challenge due to the complex topography. Also, there are different types of climate in these areas. In this research, we used two reanalysis datasets, ECMWF Reanalysis 5th Generation (ERA5) and National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR), for verification of the numerical model. ERA5 is the latest version of the ECMWF reanalysis. The temporal resolution of ERA5 is hourly, and that of NCEP/NCAR is six-hourly. Some atmospheric parameters such as mean sea level pressure, geopotential height, relative humidity, wind speed and direction, sea surface temperature, etc. were selected and analyzed. Events with different types of precipitation (rain and snow) were selected. The results showed that NCEP/NCAR is better able to represent the intensity of the atmospheric system. ERA5 is suitable for extracting parameter values at a specific point. Also, ERA5 is appropriate for analyzing snowfall events over the CS (snow cover and snow depth). Sea surface temperature plays the main role in generating instability over the CS, especially when cold air passes over the CS. The sea surface temperature of the NCEP/NCAR product has low resolution near the coast. However, both datasets were able to detect the meteorological synoptic patterns that led to heavy rainfall over the CS. Due to the time lag, however, they are not suitable for forecast centers. These two datasets are intended for research and for verification of meteorological models.
Finally, ERA5 has a better resolution than the NCEP/NCAR reanalysis data, but NCEP/NCAR data are available from 1948 and are appropriate for long-term research.

Keywords: synoptic patterns, heavy precipitation, reanalysis data, snow

Procedia PDF Downloads 96
7501 Modelling and Maping Malnutrition Toddlers in Bojonegoro Regency with Mixed Geographically Weighted Regression Approach

Authors: Elvira Mustikawati P.H., Iis Dewi Ratih, Dita Amelia

Abstract:

Bojonegoro has proclaimed a policy of zero malnutrition. Therefore, as an effort to address cases of child malnutrition in Bojonegoro, this study used the Mixed Geographically Weighted Regression (MGWR) approach to determine the factors that influence the percentage of malnourished children under five. These factors can be divided into local factors, which are influential in individual districts, and global factors, which are influential throughout the regency. Based on the goodness-of-fit test of the models, the R2 and AIC values of the GWR models are better than those of the MGWR models. The R2 and AIC values of the MGWR models are 84.37% and 14.28, while those of the GWR models are 91.04% and -62.04, respectively. Based on the analysis with GWR models, Sekar, Bubulan, Gondang, and Dander are districts in which three predictor variables (percentage of vitamin A, percentage of births assisted by health personnel, and percentage of clean water) significantly influence the percentage of malnourished children under five.

Keywords: GWR, MGWR, R2, AIC

Procedia PDF Downloads 274
7500 Estimation of Transition and Emission Probabilities

Authors: Aakansha Gupta, Neha Vadnere, Tapasvi Soni, M. Anbarsi

Abstract:

Protein secondary structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry; it is highly important in medicine and biotechnology. Some aspects of protein functions and genome analysis can be predicted by secondary structure prediction. This is used to help annotate sequences, classify proteins, identify domains, and recognize functional motifs. In this paper, we represent protein secondary structure as a mathematical model. To extract and predict the protein secondary structure from the primary structure, we require a set of parameters. Any constants appearing in the model are specified by these parameters, which also provide a mechanism for efficient and accurate use of data. There are many algorithms for estimating these model parameters, the most popular of which is the Expectation Maximization (EM) algorithm. These model parameters are estimated from protein datasets such as RS126 using the Bayesian probabilistic method (the data set being categorical). This paper can then be extended to compare the efficiency of the EM algorithm with that of other algorithms for estimating the model parameters, which will in turn lead to an efficient component for protein secondary structure prediction. Further, this paper provides scope for using these parameters to predict the secondary structure of proteins with machine learning techniques such as neural networks and fuzzy logic. The ultimate objective is to obtain greater accuracy than previously achieved.
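
To make the notions of transition and emission probabilities concrete, the sketch below estimates them by simple relative-frequency counting from sequences in which each residue is already labeled with its structural state. This is a deliberately simpler technique than the EM algorithm named in the abstract, which is needed when the states are not observed, and it applies no smoothing. The function name, state labels (H, E, C), and data layout are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def estimate_hmm_parameters(labeled_sequences):
    """Estimate transition and emission probabilities by counting.

    labeled_sequences: list of sequences, each a list of (state, symbol)
    pairs, e.g. ("H", "A") for residue A labeled as helix.
    """
    transition_counts = defaultdict(Counter)
    emission_counts = defaultdict(Counter)
    for seq in labeled_sequences:
        for i, (state, symbol) in enumerate(seq):
            emission_counts[state][symbol] += 1
            if i + 1 < len(seq):
                next_state = seq[i + 1][0]
                transition_counts[state][next_state] += 1

    def normalize(table):
        # Turn each row of counts into a probability distribution.
        result = {}
        for state, counts in table.items():
            total = sum(counts.values())
            result[state] = {key: n / total for key, n in counts.items()}
        return result

    return normalize(transition_counts), normalize(emission_counts)
```

States that never appear before another state, such as the last state of a single sequence, simply have no transition row in this sketch.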

Keywords: model parameters, expectation maximization algorithm, protein secondary structure prediction, bioinformatics

Procedia PDF Downloads 452
7499 Machine Learning Models for the Prediction of Heating and Cooling Loads of a Residential Building

Authors: Aaditya U. Jhamb

Abstract:

Due to the current energy crisis that many countries are battling, energy-efficient buildings are the subject of extensive research in the modern technological era because of growing worries about energy consumption and its effects on the environment. The paper explores 8 factors that help determine the energy efficiency of a building (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution), using a dataset provided by Tsanas and Xifara. The data set employed 768 different residential building models to anticipate heating and cooling loads with a low mean squared error. By optimizing these characteristics, machine learning algorithms may assess and properly forecast a building's heating and cooling loads, lowering energy usage while increasing the quality of people's lives. As a result, the paper studied the magnitude of the correlation between these input factors and the two output variables using various statistical methods of analysis after determining which input variable was most closely associated with the output loads. The most conclusive model was the Decision Tree Regressor, which had a mean squared error of 0.258, whilst the least definitive model was the Isotonic Regressor, which had a mean squared error of 21.68. This paper also investigated the KNN Regressor and the Linear Regression, which had mean squared errors of 3.349 and 18.141, respectively. In conclusion, the model, given the 8 input variables, was able to predict the heating and cooling loads of a residential building accurately and precisely.

Keywords: energy efficient buildings, heating load, cooling load, machine learning models

Procedia PDF Downloads 73
7498 Evaluation of the CRISP-DM Business Understanding Step: An Approach for Assessing the Predictive Power of Regression versus Classification for the Quality Prediction of Hydraulic Test Results

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Digitalisation in production technology is a driver for the application of machine learning methods. Through the application of predictive quality, the great potential for saving necessary quality control can be exploited through the data-based prediction of product quality and states. However, the serial use of machine learning applications is often prevented by various problems. Fluctuations occur in real production data sets, which are reflected in trends and systematic shifts over time. To counteract these problems, data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets to extract stable features. Successful process control of the target variables aims to centre the measured values around a mean and minimise variance. Competitive leaders claim to have mastered their processes. As a result, much of the real data has a relatively low variance. For the training of prediction models, the highest possible generalisability is required, which is at least made more difficult by this data availability. The implementation of a machine learning application can be interpreted as a production process. The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model with six phases that describes the life cycle of data science. As in any process, the costs to eliminate errors increase significantly with each advancing process phase. For the quality prediction of hydraulic test steps of directional control valves, the question arises in the initial phase whether a regression or a classification is more suitable. In the context of this work, the initial phase of the CRISP-DM, the business understanding, is critically compared for the use case at Bosch Rexroth with regard to regression and classification. 
The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. Suitable methods for leakage volume flow regression and classification for inspection decision are applied. Impressively, classification is clearly superior to regression and achieves promising accuracies.

Keywords: classification, CRISP-DM, machine learning, predictive quality, regression

Procedia PDF Downloads 118
7497 Nonparametric Quantile Regression for Multivariate Spatial Data

Authors: S. H. Arnaud Kanga, O. Hili, S. Dabo-Niang

Abstract:

Spatial prediction is an issue of interest to several fields such as agriculture, environmental sciences, ecology, econometrics, and many others. Although multiple non-parametric prediction methods exist for spatial data, those are based on the conditional expectation. This paper takes a different approach by examining a non-parametric spatial predictor of the conditional quantile. The study specifically considers a stationary multidimensional spatial process observed over a rectangular domain. The proposed quantile is obtained by inverting the conditional distribution function. Furthermore, the proposed estimator of the conditional distribution function depends on three kernels, where one of them controls the distance between spatial locations, while the other two control the distance between observations. In addition, the almost complete convergence and the convergence in mean order q of the kernel predictor are obtained when the sample considered is alpha-mixing. This prediction method has the advantage of accuracy, as it overcomes sensitivity to extreme values and outliers.

Keywords: conditional quantile, kernel, nonparametric, stationary

Procedia PDF Downloads 130
7496 Effect of Genuine Missing Data Imputation on Prediction of Urinary Incontinence

Authors: Suzan Arslanturk, Mohammad-Reza Siadat, Theophilus Ogunyemi, Ananias Diokno

Abstract:

Missing data is a common challenge in statistical analyses of most clinical survey datasets. A variety of methods have been developed to enable analysis of survey data with missing values. Imputation is the most commonly used of these methods. However, in order to minimize the bias introduced by imputation, one must choose the right imputation technique and apply it to the correct type of missing data. In this paper, we have identified different types of missing values: missing data due to skip pattern (SPMD), undetermined missing data (UMD), and genuine missing data (GMD), and applied rough set imputation to only the GMD portion of the missing data. We have evaluated the effect of this rough set imputation on prediction by generating several simulation datasets based on an existing epidemiological dataset (MESA). To measure how well each dataset lends itself to the prediction model (logistic regression), we have used p-values from the Wald test. To evaluate the accuracy of the prediction, we have considered the width of the 95% confidence interval for the probability of incontinence. Both imputed and non-imputed simulation datasets were fit to the prediction model, and both turned out to be significant (p-value < 0.05). However, the Wald score shows a better fit for the imputed than for the non-imputed datasets (28.7 vs. 23.4). The average confidence interval width decreased by 10.4% when the imputed dataset was used, meaning higher precision. The results show that using the rough set method for missing data imputation on GMD improves the predictive capability of the logistic regression. Further studies are required to generalize this conclusion to other clinical survey datasets.

Keywords: rough set, imputation, clinical survey data simulation, genuine missing data, predictive index

Procedia PDF Downloads 145
7495 A Comparative Analysis of E-Government Quality Models

Authors: Abdoullah Fath-Allah, Laila Cheikhi, Rafa E. Al-Qutaish, Ali Idri

Abstract:

Many quality models have been used to measure the quality of e-government portals. However, the absence of an international consensus on quality models for e-government portals results in many differences in terms of quality attributes and measures. The aim of this paper is to compare and analyze the existing e-government quality models proposed in the literature (those that are based on ISO standards and those that are not) in order to propose guidelines for building a good and useful e-government portal quality model. Our findings show that there is no e-government portal quality model based on the new international standard ISO 25010. Moreover, the quality models are not based on a best practice model that would allow agencies both to measure e-government portal quality and to identify missing best practices for those portals.

Keywords: e-government, portal, best practices, quality model, ISO, standard, ISO 25010, ISO 9126

Procedia PDF Downloads 534
7494 A Deep Learning Based Integrated Model For Spatial Flood Prediction

Authors: Vinayaka Gude, Divya Sampath

Abstract:

The research introduces an integrated prediction model to assess the susceptibility of roads in a future flooding event. The model consists of a deep learning algorithm for forecasting gauge height data and the Flood Inundation Mapper (FIM) for spatial flooding. An optimal architecture for a long short-term memory (LSTM) network was identified for the gauge located on the Tangipahoa River at Robert, LA. Dropout was applied to the model to evaluate the uncertainty associated with the predictions. The estimates are then used along with FIM to identify the spatial flooding. Further geoprocessing in ArcGIS provides the susceptibility values for different roads. The model was validated against the devastating flood of August 2016. The paper discusses the challenges of generalizing the methodology to other locations and to various types of flooding. The developed model can be used by the transportation department and other emergency response organizations for effective disaster management.

Keywords: deep learning, disaster management, flood prediction, urban flooding

Procedia PDF Downloads 122
7493 Customer Acquisition through Time-Aware Marketing Campaign Analysis in Banking Industry

Authors: Harneet Walia, Morteza Zihayat

Abstract:

Customer acquisition has become one of the critical issues for any business in the 21st century; a healthy customer base is the essential asset of the banking business. Term deposits act as a major source of cheap funds for banks to invest and benefit from interest rate arbitrage. To attract customers, the marketing campaigns at most financial institutions consist of multiple outbound telephone calls, with more than one contact per customer, which is a very time-consuming process. Therefore, customized direct marketing has become more critical than ever for attracting new clients. As customer acquisition is becoming more difficult to achieve, an intelligent and refined list is necessary to sell a product smartly. The aim of this research is to increase the effectiveness of campaigns by predicting which customers are most likely to subscribe to the fixed deposit and by suggesting the most suitable month to reach out to them. We design a Time Aware Upsell Prediction Framework (TAUPF) using two different approaches, with the aim of finding the best approach and technique for building the prediction model. TAUPF is implemented using the Upsell Prediction Approach (UPA) and the Clustered Upsell Prediction Approach (CUPA). We also address the data imbalance problem by examining and comparing different sampling methods (up-sampling and down-sampling). Our results show that building such a model is quite feasible and profitable for financial institutions. TAUPF can be easily used in any industry, such as telecom, automobile, or tourism, where CUPA or UPA holds valid. In our case, CUPA looks more reliable. As shown in our research, one of the most important challenges is to define measures that have enough predictive power, as the subscription to a fixed deposit depends on highly ambiguous situations and cannot be easily isolated.
While we have shown the practicality of a time-aware upsell prediction model, in which financial institutions can benefit from contacting customers in the specified month, further research needs to be done to understand the specific time of day. In addition, a further empirical or pilot study on real, live customers needs to be conducted to prove the effectiveness of the model in the real world.
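The up-sampling and down-sampling mentioned for the data imbalance problem can be sketched as follows. This is a generic random resampling routine, not the procedure used in the paper; the function name resample_balanced, its parameters, and the choice of purely random draws are assumptions for illustration.

```python
import random


def resample_balanced(rows, label_of, mode="up", seed=0):
    """Balance classes by random up-sampling or down-sampling.

    rows     : list of records
    label_of : function returning the class label of a record
    mode     : "up" grows every class to the largest class size,
               "down" shrinks every class to the smallest class size
    """
    if mode not in ("up", "down"):
        raise ValueError("mode must be 'up' or 'down'")
    rng = random.Random(seed)

    # Group records by class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)

    balanced = []
    for group in by_class.values():
        if len(group) >= target:
            # Draw without replacement (keeps the whole group if already at target).
            balanced.extend(rng.sample(group, target))
        else:
            # Keep every original record and add draws with replacement.
            balanced.extend(group + rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced
```

Up-sampling keeps every minority record but repeats some of them, which can encourage overfitting to those records; down-sampling avoids repetition at the cost of discarding majority records. Resampling would normally be applied to the training split only, so that evaluation reflects the original class balance.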

Keywords: customer acquisition, predictive analysis, targeted marketing, time-aware analysis

Procedia PDF Downloads 101
7492 Machine Learning-Driven Prediction of Cardiovascular Diseases: A Supervised Approach

Authors: Thota Sai Prakash, B. Yaswanth, Jhade Bhuvaneswar, Marreddy Divakar Reddy, Shyam Ji Gupta

Abstract:

Among the many chronic diseases found across the globe, heart disease stands out as one of the most perilous. Sadly, many lives are lost to this condition, even though early intervention could prevent such tragedies. However, identifying heart disease in its initial stages is not easy. To address this challenge, we propose an automated system aimed at predicting the presence of heart disease using advanced techniques. By doing so, we hope to empower individuals with the knowledge needed to take proactive measures against this potentially fatal illness. Our approach to this problem involves meticulous data preprocessing and the development of predictive models utilizing classification algorithms such as Support Vector Machines (SVM), Decision Tree, and Random Forest. We assess the performance of every model based on metrics like accuracy, ensuring that we select the most reliable option. Additionally, we conduct thorough data analysis to reveal the importance of different attributes. Among the models considered, Random Forest emerges as the standout performer with an accuracy rate of 96.04% in our study.

Keywords: support vector machines, decision tree, random forest

Procedia PDF Downloads 19
7491 A Predictive MOC Solver for Water Hammer Waves Distribution in Network

Authors: A. Bayle, F. Plouraboué

Abstract:

Water distribution networks (WDNs) still suffer from a lack of knowledge about the prediction of fast pressure transient events, although such events may considerably impact their durability. Accidental or planned operating activities give rise to complex pressure interactions and may drastically modify the local pressure value, generating leaks and, in rare cases, pipe breaks. In this context, a numerical predictive analysis is conducted to prevent such events and optimize network management. In-house software coupling Python and FORTRAN 90 has been developed, using the Method of Characteristics (MOC) to solve the water-hammer equations. The solver is validated by direct comparison with theoretical and experimental measurements in simple configurations and is afterward extended to network analysis. The algorithm's most costly steps are designed for parallel computation. Various sets of boundary conditions and energetic loss models are considered for the network simulations. The results are analyzed in both the real and frequency domains and provide crucial information on the pressure distribution behavior within the network.
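To show what a Method of Characteristics update looks like in the simplest setting, the following Python sketch simulates a single frictionless pipe between a constant-head reservoir and a valve that closes instantaneously. It is a textbook-style illustration, not the authors' solver: the function name, its parameters, the absence of energetic losses, and the single-pipe layout are all assumptions.

```python
def simulate_valve_closure(head_reservoir, flow_initial, wave_speed,
                           pipe_area, n_reaches, n_steps, g=9.81):
    """Frictionless MOC for a reservoir-pipe-valve system.

    The pipe is split into n_reaches equal reaches and the time step is
    implicitly dt = dx / wave_speed, so characteristics pass exactly
    through grid nodes. Returns the head at the valve at every step.
    """
    b = wave_speed / (g * pipe_area)   # characteristic impedance B = a / (g A)
    n_nodes = n_reaches + 1
    head = [head_reservoir] * n_nodes  # steady state without friction
    flow = [flow_initial] * n_nodes
    valve_heads = [head[-1]]

    for _ in range(n_steps):
        new_head = head[:]
        new_flow = flow[:]
        # Interior nodes: intersect the C+ line from i-1 and the C- line from i+1.
        for i in range(1, n_nodes - 1):
            new_head[i] = (0.5 * (head[i - 1] + head[i + 1])
                           + 0.5 * b * (flow[i - 1] - flow[i + 1]))
            new_flow[i] = (0.5 * (flow[i - 1] + flow[i + 1])
                           + (head[i - 1] - head[i + 1]) / (2.0 * b))
        # Upstream reservoir: fixed head, flow from the C- line.
        new_head[0] = head_reservoir
        new_flow[0] = flow[1] + (head_reservoir - head[1]) / b
        # Downstream valve, closed at t = 0: zero flow, head from the C+ line.
        new_flow[-1] = 0.0
        new_head[-1] = head[-2] + b * flow[-2]
        head, flow = new_head, new_flow
        valve_heads.append(head[-1])
    return valve_heads
```

In this idealised case the head at the valve jumps by the Joukowsky value a·V0/g (wave speed times initial velocity over gravity) immediately after closure, which gives a simple analytical check of the kind of validation against theory described above. A network solver would add friction terms and junction boundary conditions on top of the same update.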

Keywords: energetic losses models, method of characteristic, numerical predictive analysis, water distribution network, water hammer

Procedia PDF Downloads 199
7490 Archaeology Study of Soul Houses in Ancient Egypt on Five Models in the Grand Egyptian Museum

Authors: Ayman Aboelkassem, Mahmoud Ali

Abstract:

Introduction: Models of soul houses appeared in the prehistoric, Old Kingdom, and Middle Kingdom periods. These soul houses represented the deceased's imagination of his house in the afterlife; some of them had two floors. The study will examine five models of soul houses that were discovered near the Saqqara site by an Egyptian mission. These models have been transferred to the Grand Egyptian Museum (GEM) to be ready for display at the new museum. We focus on the models of soul houses with GEM numbers 1276, 1280, 1281, 1282, and 8711, which date to the Old Kingdom period. The five models were all made of pottery, have an oval shape, and were decorated with relief. Methodology: The study will focus on the development of soul houses during the different periods in ancient Egypt, the function of soul houses, the kinds of offerings that were put in them, and the symbolism of the offerings' colors in ancient Egyptian belief. Conclusion: This study is useful for the study of heritage and ancient civilizations, especially in the context of opening new museums like the Grand Egyptian Museum, which will display a new collection of soul houses. The study of soul houses and of the kinds of offerings put in them reflects the economic situation in Egyptian society and the kinds of oils that were famous in ancient Egypt.

Keywords: archaeology study, Grand Egyptian Museum, relief, soul houses

Procedia PDF Downloads 230
7489 Uplift Segmentation Approach for Targeting Customers in a Churn Prediction Model

Authors: Shivahari Revathi Venkateswaran

Abstract:

Segmenting customers plays a significant role in churn prediction. It helps the marketing team with proactive and reactive customer retention. For reactive retention, the retention team reaches out with special offers to customers who have already shown intent to disconnect. For proactive retention, the marketing team uses a churn prediction model, which ranks each customer from 1 to 100, where rank 1 indicates the highest risk of churning or disconnecting (top-ranked customers have a high propensity to churn). The churn prediction model is built using XGBoost. However, with the churn rank alone, the marketing team can only reach out to customers based on their individual ranks. Profiling different groups of customers and framing different marketing strategies for targeted groups are not possible with the churn ranks. For this, the customers must be grouped into different segments based on their profiles, such as demographics and other non-controllable attributes. This helps the marketing team frame different offer groups for the targeted audience and prevent them from disconnecting (proactive retention). For segmentation, machine learning approaches like k-means clustering do not form unique customer segments in which all customers share the same attributes. This paper presents an alternative approach that finds all combinations of unique segments that can be formed from the user attributes and then finds the segments that have uplift (a churn rate higher than the baseline churn rate). For this, search algorithms like fast search and recursive search are used. Further, within each segment, customers can be targeted using their individual churn ranks from the churn prediction model. Finally, a user interface (UI) is developed for the marketing team to interactively search the meaningful segments that are formed and to target the right audience for future marketing campaigns, preventing them from disconnecting.
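The notion of an uplift segment, a combination of attribute values whose churn rate exceeds the baseline churn rate, can be illustrated with a short Python sketch. The abstract does not specify the fast search and recursive search algorithms, so the exhaustive enumeration below is only a stand-in; the function name, the "churned" field, and the min_size parameter are hypothetical.

```python
from itertools import combinations


def find_uplift_segments(customers, attributes, min_size=1):
    """Enumerate attribute-value segments and keep those with uplift.

    customers  : list of dicts with the given attributes and a 0/1 "churned" flag
    attributes : attribute names used to define segments
    Returns (baseline_rate, segments), where each segment is a tuple
    (definition, size, churn_rate), sorted by churn rate in descending order.
    """
    baseline = sum(c["churned"] for c in customers) / len(customers)
    segments = []
    # Try every non-empty subset of attributes (exhaustive, so exponential in
    # the number of attributes; acceptable only for a handful of them).
    for r in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, r):
            groups = {}
            for c in customers:
                key = tuple(c[a] for a in attrs)
                groups.setdefault(key, []).append(c["churned"])
            for key, flags in groups.items():
                rate = sum(flags) / len(flags)
                if len(flags) >= min_size and rate > baseline:
                    segments.append((dict(zip(attrs, key)), len(flags), rate))
    segments.sort(key=lambda s: s[2], reverse=True)
    return baseline, segments
```

Unlike clusters, every customer in such a segment shares exactly the same values for the defining attributes, which is what makes the segment easy to describe to a marketing team. Within a segment, customers could then be prioritised by their individual churn ranks.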

Keywords: churn prediction modeling, XGBoost model, uplift segments, proactive marketing, search algorithms, retention, k-mean clustering

Procedia PDF Downloads 48
7488 Power Grid Line Ampacity Forecasting Based on a Long-Short-Term Memory Neural Network

Authors: Xiang-Yao Zheng, Jen-Cheng Wang, Joe-Air Jiang

Abstract:

Improving the line ampacity while using existing power grids is an important issue that electricity dispatchers are now facing. Using the information provided by the dynamic thermal rating (DTR) of transmission lines, an overhead power grid can operate safely. However, dispatchers usually lack real-time DTR information. Thus, this study proposes a method based on a long-short-term memory (LSTM) network, which is a type of neural network model. The LSTM-based method predicts the DTR of lines using the weather data provided by the Central Weather Bureau (CWB) of Taiwan. The possible thermal bottlenecks at different locations along the line and the margin of line ampacity can be determined in real time by the proposed LSTM-based prediction method. A case study targeting the 345 kV power grid of TaiPower in Taiwan is used to examine the performance of the proposed method. The simulation results show that the proposed method can usefully provide information for future smart grid applications.

Keywords: electricity dispatch, line ampacity prediction, dynamic thermal rating, long-short-term memory neural network, smart grid

Procedia PDF Downloads 260