Search results for: Prediction Based Data Reduction
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 16681

Search results for: Prediction Based Data Reduction

16651 Representing Data without Lost Compression Properties in Time Series: A Review

Authors: Nabilah Filzah Mohd Radzuan, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan

Abstract:

Uncertain data is believed to be an important issue in building up a prediction model. The main objective in the time series uncertainty analysis is to formulate uncertain data in order to gain knowledge and fit low dimensional model prior to a prediction task. This paper discusses the performance of a number of techniques in dealing with uncertain data specifically those which solve uncertain data condition by minimizing the loss of compression properties.

Keywords: Compression properties, uncertainty, uncertain time series, mining technique, weather prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1575
16650 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Associative classification (AC) is a data mining approach that combines association rule and classification to build classification models (classifiers). AC has attracted a significant attention from several researchers mainly because it derives accurate classifiers that contain simple yet effective rules. In the last decade, a number of associative classification algorithms have been proposed such as Classification based Association (CBA), Classification based on Multiple Association Rules (CMAR), Class based Associative Classification (CACA), and Classification based on Predicted Association Rule (CPAR). This paper surveys major AC algorithms and compares the steps and methods performed in each algorithm including: rule learning, rule sorting, rule pruning, classifier building, and class prediction.

Keywords: Associative Classification, Classification, Data Mining, Learning, Rule Ranking, Rule Pruning, Prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6573
16649 Comparison of Different k-NN Models for Speed Prediction in an Urban Traffic Network

Authors: Seyoung Kim, Jeongmin Kim, Kwang Ryel Ryu

Abstract:

A database that records average traffic speeds measured at five-minute intervals for all the links in the traffic network of a metropolitan city. While learning from this data the models that can predict future traffic speed would be beneficial for the applications such as the car navigation system, building predictive models for every link becomes a nontrivial job if the number of links in a given network is huge. An advantage of adopting k-nearest neighbor (k-NN) as predictive models is that it does not require any explicit model building. Instead, k-NN takes a long time to make a prediction because it needs to search for the k-nearest neighbors in the database at prediction time. In this paper, we investigate how much we can speed up k-NN in making traffic speed predictions by reducing the amount of data to be searched for without a significant sacrifice of prediction accuracy. The rationale behind this is that we had a better look at only the recent data because the traffic patterns not only repeat daily or weekly but also change over time. In our experiments, we build several different k-NN models employing different sets of features which are the current and past traffic speeds of the target link and the neighbor links in its up/down-stream. The performances of these models are compared by measuring the average prediction accuracy and the average time taken to make a prediction using various amounts of data.

Keywords: Big data, k-NN, machine learning, traffic speed prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1331
16648 Hybrid Approach for Software Defect Prediction Using Machine Learning with Optimization Technique

Authors: C. Manjula, Lilly Florence

Abstract:

Software technology is developing rapidly which leads to the growth of various industries. Now-a-days, software-based applications have been adopted widely for business purposes. For any software industry, development of reliable software is becoming a challenging task because a faulty software module may be harmful for the growth of industry and business. Hence there is a need to develop techniques which can be used for early prediction of software defects. Due to complexities in manual prediction, automated software defect prediction techniques have been introduced. These techniques are based on the pattern learning from the previous software versions and finding the defects in the current version. These techniques have attracted researchers due to their significant impact on industrial growth by identifying the bugs in software. Based on this, several researches have been carried out but achieving desirable defect prediction performance is still a challenging task. To address this issue, here we present a machine learning based hybrid technique for software defect prediction. First of all, Genetic Algorithm (GA) is presented where an improved fitness function is used for better optimization of features in data sets. Later, these features are processed through Decision Tree (DT) classification model. Finally, an experimental study is presented where results from the proposed GA-DT based hybrid approach is compared with those from the DT classification technique. The results show that the proposed hybrid approach achieves better classification accuracy.

Keywords: Decision tree, genetic algorithm, machine learning, software defect prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1410
16647 A Prediction-Based Reversible Watermarking for MRI Images

Authors: Nuha Omran Abokhdair, Azizah Bt Abdul Manaf

Abstract:

Reversible watermarking is a special branch of image watermarking, that is able to recover the original image after extracting the watermark from the image. In this paper, an adaptive prediction-based reversible watermarking scheme is presented, in order to increase the payload capacity of MRI medical images. The scheme divides the image into two parts, Region of Interest (ROI) and Region of Non-Interest (RONI). Two bits are embedded in each embeddable pixel of RONI and one bit is embedded in each embeddable pixel of ROI. The experimental results demonstrate that the proposed scheme is able to achieve high embedding capacity. This is mainly caused by two reasons. First, the pixels that were excluded from data embedding due to overflow/underflow are used for data embedding. Second, large location map that need to be added to watermark data as overhead is eliminated and thus lower data embedding capacity is prevented. Moreover, the scheme provides good visual quality to the watermarked image.

Keywords: Medical image watermarking, reversible watermarking, Difference Expansion, Prediction-Error Expansion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1880
16646 Two States Mapping Based Neural Network Model for Decreasing of Prediction Residual Error

Authors: Insung Jung, lockjo Koo, Gi-Nam Wang

Abstract:

The objective of this paper is to design a model of human vital sign prediction for decreasing prediction error by using two states mapping based time series neural network BP (back-propagation) model. Normally, lot of industries has been applying the neural network model by training them in a supervised manner with the error back-propagation algorithm for time series prediction systems. However, it still has a residual error between real value and prediction output. Therefore, we designed two states of neural network model for compensation of residual error which is possible to use in the prevention of sudden death and metabolic syndrome disease such as hypertension disease and obesity. We found that most of simulations cases were satisfied by the two states mapping based time series prediction model compared to normal BP. In particular, small sample size of times series were more accurate than the standard MLP model. We expect that this algorithm can be available to sudden death prevention and monitoring AGENT system in a ubiquitous homecare environment.

Keywords: Neural network, U-healthcare, prediction, timeseries, computer aided prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1941
16645 Certain Data Dimension Reduction Techniques for application with ANN based MCS for Study of High Energy Shower

Authors: Gitanjali Devi, Kandarpa Kumar Sarma, Pranayee Datta, Anjana Kakoti Mahanta

Abstract:

Cosmic showers, from their places of origin in space, after entering earth generate secondary particles called Extensive Air Shower (EAS). Detection and analysis of EAS and similar High Energy Particle Showers involve a plethora of experimental setups with certain constraints for which soft-computational tools like Artificial Neural Network (ANN)s can be adopted. The optimality of ANN classifiers can be enhanced further by the use of Multiple Classifier System (MCS) and certain data - dimension reduction techniques. This work describes the performance of certain data dimension reduction techniques like Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Self Organizing Map (SOM) approximators for application with an MCS formed using Multi Layer Perceptron (MLP), Recurrent Neural Network (RNN) and Probabilistic Neural Network (PNN). The data inputs are obtained from an array of detectors placed in a circular arrangement resembling a practical detector grid which have a higher dimension and greater correlation among themselves. The PCA, ICA and SOM blocks reduce the correlation and generate a form suitable for real time practical applications for prediction of primary energy and location of EAS from density values captured using detectors in a circular grid.

Keywords: EAS, Shower, Core, ANN, Location.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1567
16644 Estimation of Relative Subsidence of Collapsible Soils Using Electromagnetic Measurements

Authors: Henok Hailemariam, Frank Wuttke

Abstract:

Collapsible soils are weak soils that appear to be stable in their natural state, normally dry condition, but rapidly deform under saturation (wetting), thus generating large and unexpected settlements which often yield disastrous consequences for structures unwittingly built on such deposits. In this study, a prediction model for the relative subsidence of stressed collapsible soils based on dielectric permittivity measurement is presented. Unlike most existing methods for soil subsidence prediction, this model does not require moisture content as an input parameter, thus providing the opportunity to obtain accurate estimation of the relative subsidence of collapsible soils using dielectric measurement only. The prediction model is developed based on an existing relative subsidence prediction model (which is dependent on soil moisture condition) and an advanced theoretical frequency and temperature-dependent electromagnetic mixing equation (which effectively removes the moisture content dependence of the original relative subsidence prediction model). For large scale sub-surface soil exploration purposes, the spatial sub-surface soil dielectric data over wide areas and high depths of weak (collapsible) soil deposits can be obtained using non-destructive high frequency electromagnetic (HF-EM) measurement techniques such as ground penetrating radar (GPR). For laboratory or small scale in-situ measurements, techniques such as an open-ended coaxial line with widely applicable time domain reflectometry (TDR) or vector network analysers (VNAs) are usually employed to obtain the soil dielectric data. By using soil dielectric data obtained from small or large scale non-destructive HF-EM investigations, the new model can effectively predict the relative subsidence of weak soils without the need to extract samples for moisture content measurement. Some of the resulting benefits are the preservation of the undisturbed nature of the soil as well as a reduction in the investigation costs and analysis time in the identification of weak (problematic) soils. The accuracy of prediction of the presented model is assessed by conducting relative subsidence tests on a collapsible soil at various initial soil conditions and a good match between the model prediction and experimental results is obtained.

Keywords: Collapsible soil, relative subsidence, dielectric permittivity, moisture content.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1066
16643 Integration of Big Data to Predict Transportation for Smart Cities

Authors: Sun-Young Jang, Sung-Ah Kim, Dongyoun Shin

Abstract:

The Intelligent transportation system is essential to build smarter cities. Machine learning based transportation prediction could be highly promising approach by delivering invisible aspect visible. In this context, this research aims to make a prototype model that predicts transportation network by using big data and machine learning technology. In detail, among urban transportation systems this research chooses bus system.  The research problem that existing headway model cannot response dynamic transportation conditions. Thus, bus delay problem is often occurred. To overcome this problem, a prediction model is presented to fine patterns of bus delay by using a machine learning implementing the following data sets; traffics, weathers, and bus statues. This research presents a flexible headway model to predict bus delay and analyze the result. The prototyping model is composed by real-time data of buses. The data are gathered through public data portals and real time Application Program Interface (API) by the government. These data are fundamental resources to organize interval pattern models of bus operations as traffic environment factors (road speeds, station conditions, weathers, and bus information of operating in real-time). The prototyping model is designed by the machine learning tool (RapidMiner Studio) and conducted tests for bus delays prediction. This research presents experiments to increase prediction accuracy for bus headway by analyzing the urban big data. The big data analysis is important to predict the future and to find correlations by processing huge amount of data. Therefore, based on the analysis method, this research represents an effective use of the machine learning and urban big data to understand urban dynamics.

Keywords: Big data, bus headway prediction, machine learning, public transportation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1519
16642 Artificial Neural Networks Technique for Seismic Hazard Prediction Using Seismic Bumps

Authors: Belkacem Selma, Boumediene Selma, Samira Chouraqui, Hanifi Missoum, Tourkia Guerzou

Abstract:

Natural disasters have occurred and will continue to cause human and material damage. Therefore, the idea of "preventing" natural disasters will never be possible. However, their prediction is possible with the advancement of technology. Even if natural disasters are effectively inevitable, their consequences may be partly controlled. The rapid growth and progress of artificial intelligence (AI) had a major impact on the prediction of natural disasters and risk assessment which are necessary for effective disaster reduction. Earthquake prediction to prevent the loss of human lives and even property damage is an important factor; that, is why it is crucial to develop techniques for predicting this natural disaster. This study aims to analyze the ability of artificial neural networks (ANNs) to predict earthquakes that occur in a given area. The used data describe the problem of high energy (higher than 104 J) seismic bumps forecasting in a coal mine using two long walls as an example. For this purpose, seismic bumps data obtained from mines have been analyzed. The results obtained show that the ANN is able to predict earthquake parameters with  high accuracy; the classification accuracy through neural networks is more than 94%, and the models developed are efficient and robust and depend only weakly on the initial database.

Keywords: Earthquake prediction, artificial intelligence, AI, Artificial Neural Network, ANN, seismic bumps.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1092
16641 A New History Based Method to Handle the Recurring Concept Shifts in Data Streams

Authors: Hossein Morshedlou, Ahmad Abdollahzade Barforoush

Abstract:

Recent developments in storage technology and networking architectures have made it possible for broad areas of applications to rely on data streams for quick response and accurate decision making. Data streams are generated from events of real world so existence of associations, which are among the occurrence of these events in real world, among concepts of data streams is logical. Extraction of these hidden associations can be useful for prediction of subsequent concepts in concept shifting data streams. In this paper we present a new method for learning association among concepts of data stream and prediction of what the next concept will be. Knowing the next concept, an informed update of data model will be possible. The results of conducted experiments show that the proposed method is proper for classification of concept shifting data streams.

Keywords: Data Stream, Classification, Concept Shift, History.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1242
16640 Model Order Reduction for Frequency Response and Effect of Order of Method for Matching Condition

Authors: Aref Ghafouri, Mohammad Javad Mollakazemi, Farhad Asadi

Abstract:

In this paper, model order reduction method is used for approximation in linear and nonlinearity aspects in some experimental data. This method can be used for obtaining offline reduced model for approximation of experimental data and can produce and follow the data and order of system and also it can match to experimental data in some frequency ratios. In this study, the method is compared in different experimental data and influence of choosing of order of the model reduction for obtaining the best and sufficient matching condition for following the data is investigated in format of imaginary and reality part of the frequency response curve and finally the effect and important parameter of number of order reduction in nonlinear experimental data is explained further.

Keywords: Frequency response, Order of model reduction, frequency matching condition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2014
16639 A Spatial Information Network Traffic Prediction Method Based on Hybrid Model

Authors: Jingling Li, Yi Zhang, Wei Liang, Tao Cui, Jun Li

Abstract:

Compared with terrestrial network, the traffic of spatial information network has both self-similarity and short correlation characteristics. By studying its traffic prediction method, the resource utilization of spatial information network can be improved, and the method can provide an important basis for traffic planning of a spatial information network. In this paper, considering the accuracy and complexity of the algorithm, the spatial information network traffic is decomposed into approximate component with long correlation and detail component with short correlation, and a time series hybrid prediction model based on wavelet decomposition is proposed to predict the spatial network traffic. Firstly, the original traffic data are decomposed to approximate components and detail components by using wavelet decomposition algorithm. According to the autocorrelation and partial correlation smearing and truncation characteristics of each component, the corresponding model (AR/MA/ARMA) of each detail component can be directly established, while the type of approximate component modeling can be established by ARIMA model after smoothing. Finally, the prediction results of the multiple models are fitted to obtain the prediction results of the original data. The method not only considers the self-similarity of a spatial information network, but also takes into account the short correlation caused by network burst information, which is verified by using the measured data of a certain back bone network released by the MAWI working group in 2018. Compared with the typical time series model, the predicted data of hybrid model is closer to the real traffic data and has a smaller relative root means square error, which is more suitable for a spatial information network.

Keywords: Spatial Information Network, Traffic prediction, Wavelet decomposition, Time series model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 571
16638 A Novel Prediction Method for Tag SNP Selection using Genetic Algorithm based on KNN

Authors: Li-Yeh Chuang, Yu-Jen Hou, Jr., Cheng-Hong Yang

Abstract:

Single nucleotide polymorphisms (SNPs) hold much promise as a basis for disease-gene association. However, research is limited by the cost of genotyping the tremendous number of SNPs. Therefore, it is important to identify a small subset of informative SNPs, the so-called tag SNPs. This subset consists of selected SNPs of the genotypes, and accurately represents the rest of the SNPs. Furthermore, an effective evaluation method is needed to evaluate prediction accuracy of a set of tag SNPs. In this paper, a genetic algorithm (GA) is applied to tag SNP problems, and the K-nearest neighbor (K-NN) serves as a prediction method of tag SNP selection. The experimental data used was taken from the HapMap project; it consists of genotype data rather than haplotype data. The proposed method consistently identified tag SNPs with considerably better prediction accuracy than methods from the literature. At the same time, the number of tag SNPs identified was smaller than the number of tag SNPs in the other methods. The run time of the proposed method was much shorter than the run time of the SVM/STSA method when the same accuracy was reached.

Keywords: Genetic Algorithm (GA), Genotype, Single nucleotide polymorphism (SNP), tag SNPs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1724
16637 Churn Prediction for Telecommunication Industry Using Artificial Neural Networks

Authors: Ulas Vural, M. Ergun Okay, E. Mesut Yildiz

Abstract:

Telecommunication service providers demand accurate and precise prediction of customer churn probabilities to increase the effectiveness of their customer relation services. The large amount of customer data owned by the service providers is suitable for analysis by machine learning methods. In this study, expenditure data of customers are analyzed by using an artificial neural network (ANN). The ANN model is applied to the data of customers with different billing duration. The proposed model successfully predicts the churn probabilities at 83% accuracy for only three months expenditure data and the prediction accuracy increases up to 89% when the nine month data is used. The experiments also show that the accuracy of ANN model increases on an extended feature set with information of the changes on the bill amounts.

Keywords: Customer relationship management, churn prediction, telecom industry, deep learning, Artificial Neural Networks, ANN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 696
16636 An Improved Prediction Model of Ozone Concentration Time Series Based On Chaotic Approach

Authors: N. Z. A. Hamid, M. S. M. Noorani

Abstract:

This study is focused on the development of prediction models of the Ozone concentration time series. Prediction model is built based on chaotic approach. Firstly, the chaotic nature of the time series is detected by means of phase space plot and the Cao method. Then, the prediction model is built and the local linear approximation method is used for the forecasting purposes. Traditional prediction of autoregressive linear model is also built. Moreover, an improvement in local linear approximation method is also performed. Prediction models are applied to the hourly Ozone time series observed at the benchmark station in Malaysia. Comparison of all models through the calculation of mean absolute error, root mean squared error and correlation coefficient shows that the one with improved prediction method is the best. Thus, chaotic approach is a good approach to be used to develop a prediction model for the Ozone concentration time series.

Keywords: Chaotic approach, phase space, Cao method, local linear approximation method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1734
16635 Dust Storm Prediction Using ANNs Technique (A Case Study: Zabol City)

Authors: Jamalizadeh, M.R., Moghaddamnia, A., Piri, J., Arbabi, V., Homayounifar, M., Shahryari, A.

Abstract:

Dust storms are one of the most costly and destructive events in many desert regions. They can cause massive damages both in natural environments and human lives. This paper is aimed at presenting a preliminary study on dust storms, as a major natural hazard in arid and semi-arid regions. As a case study, dust storm events occurred in Zabol city located in Sistan Region of Iran was analyzed to diagnose and predict dust storms. The identification and prediction of dust storm events could have significant impacts on damages reduction. Present models for this purpose are complicated and not appropriate for many areas with poor-data environments. The present study explores Gamma test for identifying inputs of ANNs model, for dust storm prediction. Results indicate that more attempts must be carried out concerning dust storms identification and segregate between various dust storm types.

Keywords: Dust Storm, Gamma Test, Prediction, ANNs, Zabol.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2103
16634 Evaluation of Low-Reducible Sinter in Blast Furnace Technology by Mathematical Model Developed at Centre ENET, VSB – Technical University of Ostrava

Authors: S. Jursová, P. Pustějovská, S. Brožová, J. Bilík

Abstract:

The paper deals with possibilities of interpretation of iron ore reducibility tests. It presents a mathematical model developed at Centre ENET, VŠB – Technical University of Ostrava, Czech Republic for an evaluation of metallurgical material of blast furnace feedstock such as iron ore, sinter or pellets. According to the data from the test, the model predicts its usage in blast furnace technology and its effects on production parameters of shaft aggregate. At the beginning, the paper sums up the general concept and experience in mathematical modelling of iron ore reduction. It presents basic equation for the calculation and the main parts of the developed model. In the experimental part, there is an example of usage of the mathematical model. The paper describes the usage of data for some predictive calculation. There are presented material, method of carried test of iron ore reducibility. Then there are graphically interpreted effects of used material on carbon consumption, rate of direct reduction and the whole reduction process.

Keywords: Blast furnace technology, iron ore reduction, mathematical model, prediction of iron ore reduction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1947
16633 Equity Risk Premiums and Risk Free Rates in Modelling and Prediction of Financial Markets

Authors: Mohammad Ghavami, Reza S. Dilmaghani

Abstract:

This paper presents an adaptive framework for modelling financial markets using equity risk premiums, risk free rates and volatilities. The recorded economic factors are initially used to train four adaptive filters for a certain limited period of time in the past. Once the systems are trained, the adjusted coefficients are used for modelling and prediction of an important financial market index. Two different approaches based on least mean squares (LMS) and recursive least squares (RLS) algorithms are investigated. Performance analysis of each method in terms of the mean squared error (MSE) is presented and the results are discussed. Computer simulations carried out using recorded data show MSEs of 4% and 3.4% for the next month prediction using LMS and RLS adaptive algorithms, respectively. In terms of twelve months prediction, RLS method shows a better tendency estimation compared to the LMS algorithm.

Keywords: Prediction of financial markets, Adaptive methods, MSE, LSE.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 976
16632 Pattern Recognition Using Feature Based Die-Map Clusteringin the Semiconductor Manufacturing Process

Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek

Abstract:

Depending on the big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of the failure are closely related. The purpose of this study is to analyze pattern affects the final test results using a die map based clustering. Many researches have been conducted using die data from the semiconductor test process. However, analysis has limitation as the test data is less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. This study consists of three phases. In the first phase, die map is created through fail bit data in each sub-area of die. In the second phase, clustering using map data is performed. And the third stage is to find patterns that affect final test result. Finally, the proposed three steps are applied to actual industrial data and experimental results showed the potential field application.

Keywords: Die-Map Clustering, Feature Extraction, Pattern Recognition, Semiconductor Manufacturing Process.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3098
16631 Integration of Educational Data Mining Models to a Web-Based Support System for Predicting High School Student Performance

Authors: Sokkhey Phauk, Takeo Okazaki

Abstract:

The challenging task in educational institutions is to maximize the high performance of students and minimize the failure rate of poor-performing students. An effective method to leverage this task is to know student learning patterns with highly influencing factors and get an early prediction of student learning outcomes at the timely stage for setting up policies for improvement. Educational data mining (EDM) is an emerging disciplinary field of data mining, statistics, and machine learning concerned with extracting useful knowledge and information for the sake of improvement and development in the education environment. The study is of this work is to propose techniques in EDM and integrate it into a web-based system for predicting poor-performing students. A comparative study of prediction models is conducted. Subsequently, high performing models are developed to get higher performance. The hybrid random forest (Hybrid RF) produces the most successful classification. For the context of intervention and improving the learning outcomes, a feature selection method MICHI, which is the combination of mutual information (MI) and chi-square (CHI) algorithms based on the ranked feature scores, is introduced to select a dominant feature set that improves the performance of prediction and uses the obtained dominant set as information for intervention. By using the proposed techniques of EDM, an academic performance prediction system (APPS) is subsequently developed for educational stockholders to get an early prediction of student learning outcomes for timely intervention. Experimental outcomes and evaluation surveys report the effectiveness and usefulness of the developed system. The system is used to help educational stakeholders and related individuals for intervening and improving student performance.

Keywords: Academic performance prediction system, prediction model, educational data mining, dominant factors, feature selection methods, student performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 910
16630 Novel Hybrid Method for Gene Selection and Cancer Prediction

Authors: Liping Jing, Michael K. Ng, Tieyong Zeng

Abstract:

Microarray data profiles gene expression on a whole genome scale, therefore, it provides a good way to study associations between gene expression and occurrence or progression of cancer. More and more researchers realized that microarray data is helpful to predict cancer sample. However, the high dimension of gene expressions is much larger than the sample size, which makes this task very difficult. Therefore, how to identify the significant genes causing cancer becomes emergency and also a hot and hard research topic. Many feature selection algorithms have been proposed in the past focusing on improving cancer predictive accuracy at the expense of ignoring the correlations between the features. In this work, a novel framework (named by SGS) is presented for stable gene selection and efficient cancer prediction . The proposed framework first performs clustering algorithm to find the gene groups where genes in each group have higher correlation coefficient, and then selects the significant genes in each group with Bayesian Lasso and important gene groups with group Lasso, and finally builds prediction model based on the shrinkage gene space with efficient classification algorithm (such as, SVM, 1NN, Regression and etc.). Experiment results on real world data show that the proposed framework often outperforms the existing feature selection and prediction methods, say SAM, IG and Lasso-type prediction model.

Keywords: Gene Selection, Cancer Prediction, Lasso, Clustering, Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1998
16629 Modified Naïve Bayes Based Prediction Modeling for Crop Yield Prediction

Authors: Kefaya Qaddoum

Abstract:

Most of greenhouse growers desire a determined amount of yields in order to accurately meet market requirements. The purpose of this paper is to model a simple but often satisfactory supervised classification method. The original naive Bayes have a serious weakness, which is producing redundant predictors. In this paper, utilized regularization technique was used to obtain a computationally efficient classifier based on naive Bayes. The suggested construction, utilized L1-penalty, is capable of clearing redundant predictors, where a modification of the LARS algorithm is devised to solve this problem, making this method applicable to a wide range of data. In the experimental section, a study conducted to examine the effect of redundant and irrelevant predictors, and test the method on WSG data set for tomato yields, where there are many more predictors than data, and the urge need to predict weekly yield is the goal of this approach. Finally, the modified approach is compared with several naive Bayes variants and other classification algorithms (SVM and kNN), and is shown to be fairly good.

Keywords: Tomato yields prediction, naive Bayes, redundancy

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5055
16628 Intelligent Earthquake Prediction System Based On Neural Network

Authors: Emad Amar, Tawfik Khattab, Fatma Zada

Abstract:

Predicting earthquakes is an important issue in the study of geography. Accurate prediction of earthquakes can help people to take effective measures to minimize the loss of personal and economic damage, such as large casualties, destruction of buildings and broken of traffic, occurred within a few seconds. United States Geological Survey (USGS) science organization provides reliable scientific information about Earthquake Existed throughout history & the Preliminary database from the National Center Earthquake Information (NEIC) show some useful factors to predict an earthquake in a seismic area like Aleutian Arc in the U.S. state of Alaska. The main advantage of this prediction method that it does not require any assumption, it makes prediction according to the future evolution of the object's time series. The article compares between simulation data result from trained BP and RBF neural network versus actual output result from the system calculations. Therefore, this article focuses on analysis of data relating to real earthquakes. Evaluation results show better accuracy and higher speed by using radial basis functions (RBF) neural network.

Keywords: BP neural network, Prediction, RBF neural network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3172
16627 High Capacity Data Hiding based on Predictor and Histogram Modification

Authors: Hui-Yu Huang, Shih-Hsu Chang

Abstract:

In this paper, we propose a high capacity image hiding technology based on pixel prediction and the difference of modified histogram. This approach is used the pixel prediction and the difference of modified histogram to calculate the best embedding point. This approach can improve the predictive accuracy and increase the pixel difference to advance the hiding capacity. We also use the histogram modification to prevent the overflow and underflow. Experimental results demonstrate that our proposed method within the same average hiding capacity can still keep high quality of image and low distortion

Keywords: data hiding, predictor

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1855
16626 Breast Cancer Prediction Using Score-Level Fusion of Machine Learning and Deep Learning Models

Authors: [email protected]

Abstract:

Breast cancer is one of the most common types in women. Early prediction of breast cancer helps physicians detect cancer in its early stages. Big cancer data need a very powerful tool to analyze and extract predictions. Machine learning and deep learning are two of the most efficient tools for predicting cancer based on textual data. In this study, we developed a fusion model of two machine learning and deep learning models. To obtain the final prediction, Long-Short Term Memory (LSTM), ensemble learning with hyper parameters optimization, and score-level fusion is used. Experiments are done on the Breast Cancer Surveillance Consortium (BCSC) dataset after balancing and grouping the class categories. Five different training scenarios are used, and the tests show that the designed fusion model improved the performance by 3.3% compared to the individual models.

Keywords: Machine learning, Deep learning, cancer prediction, breast cancer, LSTM, Score-Level Fusion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 308
16625 Wavelet and K-L Seperability Based Feature Extraction Method for Functional Data Classification

Authors: Jun Wan, Zehua Chen, Yingwu Chen, Zhidong Bai

Abstract:

This paper proposes a novel feature extraction method, based on Discrete Wavelet Transform (DWT) and K-L Seperability (KLS), for the classification of Functional Data (FD). This method combines the decorrelation and reduction property of DWT and the additive independence property of KLS, which is helpful to extraction classification features of FD. It is an advanced approach of the popular wavelet based shrinkage method for functional data reduction and classification. A theory analysis is given in the paper to prove the consistent convergence property, and a simulation study is also done to compare the proposed method with the former shrinkage ones. The experiment results show that this method has advantages in improving classification efficiency, precision and robustness.

Keywords: classification, functional data, feature extraction, K-Lseperability, wavelet.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1406
16624 A Comparison of Grey Model and Fuzzy Predictive Model for Time Series

Authors: A. I. Dounis, P. Tiropanis, D. Tseles, G. Nikolaou, G. P. Syrcos

Abstract:

The prediction of meteorological parameters at a meteorological station is an interesting and open problem. A firstorder linear dynamic model GM(1,1) is the main component of the grey system theory. The grey model requires only a few previous data points in order to make a real-time forecast. In this paper, we consider the daily average ambient temperature as a time series and the grey model GM(1,1) applied to local prediction (short-term prediction) of the temperature. In the same case study we use a fuzzy predictive model for global prediction. We conclude the paper with a comparison between local and global prediction schemes.

Keywords: Fuzzy predictive model, grey model, local andglobal prediction, meteorological forecasting, time series.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2099
16623 Dimension Reduction of Microarray Data Based on Local Principal Component

Authors: Ali Anaissi, Paul J. Kennedy, Madhu Goyal

Abstract:

Analysis and visualization of microarraydata is veryassistantfor biologists and clinicians in the field of diagnosis and treatment of patients. It allows Clinicians to better understand the structure of microarray and facilitates understanding gene expression in cells. However, microarray dataset is a complex data set and has thousands of features and a very small number of observations. This very high dimensional data set often contains some noise, non-useful information and a small number of relevant features for disease or genotype. This paper proposes a non-linear dimensionality reduction algorithm Local Principal Component (LPC) which aims to maps high dimensional data to a lower dimensional space. The reduced data represents the most important variables underlying the original data. Experimental results and comparisons are presented to show the quality of the proposed algorithm. Moreover, experiments also show how this algorithm reduces high dimensional data whilst preserving the neighbourhoods of the points in the low dimensional space as in the high dimensional space.

Keywords: Linear Dimension Reduction, Non-Linear Dimension Reduction, Principal Component Analysis, Biologists.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1540
16622 Analysis of Physicochemical Properties on Prediction of R5, X4 and R5X4 HIV-1 Coreceptor Usage

Authors: Kai-Ti Hsu, Hui-Ling Huang, Chun-Wei Tung, Yi-Hsiung Chen, Shinn-Ying Ho

Abstract:

Bioinformatics methods for predicting the T cell coreceptor usage from the array of membrane protein of HIV-1 are investigated. In this study, we aim to propose an effective prediction method for dealing with the three-class classification problem of CXCR4 (X4), CCR5 (R5) and CCR5/CXCR4 (R5X4). We made efforts in investigating the coreceptor prediction problem as follows: 1) proposing a feature set of informative physicochemical properties which is cooperated with SVM to achieve high prediction test accuracy of 81.48%, compared with the existing method with accuracy of 70.00%; 2) establishing a large up-to-date data set by increasing the size from 159 to 1225 sequences to verify the proposed prediction method where the mean test accuracy is 88.59%, and 3) analyzing the set of 14 informative physicochemical properties to further understand the characteristics of HIV-1coreceptors.

Keywords: Coreceptor, genetic algorithm, HIV-1, SVM, physicochemical properties, prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2338