Search results for: Data mining classification algorithms
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9096

8796 A Hybrid Approach for Quantification of Novelty in Rule Discovery

Authors: Vasudha Bhatnagar, Ahmed Sultan Al-Hegami, Naveen Kumar

Abstract:

Rule discovery is an important technique for mining knowledge from large databases. Using objective measures to discover interesting rules leads to another data mining problem, although one of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the KDD process. In this paper we study the novelty of discovered rules as a subjective measure of interestingness. We propose a hybrid approach that uses objective and subjective measures to quantify the novelty of discovered rules in terms of their deviations from known rules. We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to a user-specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are quite promising.

Keywords: Knowledge Discovery in Databases (KDD), Data Mining, Rule Discovery, Interestingness, Subjective Measures, Novelty Measure.

Downloads: 1354
8795 Exploring Performance-Based Music Attributes for Stylometric Analysis

Authors: Abdellghani Bellaachia, Edward Jimenez

Abstract:

Music Information Retrieval (MIR) and modern data mining techniques are applied to identify style markers in MIDI music for stylometric analysis and author attribution. Over 100 attributes are extracted from a library of 2830 songs and then mined using supervised learning techniques. Two attributes are identified that provide high information gain. These attributes are then used as style markers to predict authorship. Using these style markers, the authors are able to correctly distinguish songs written by the Beatles from those that were not, with a precision and accuracy of over 98 per cent. The identification of these style markers, as well as the architecture for this research, provides a foundation for future research in musical stylometry.
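
The attribute selection step described above relies on information gain. The following is a minimal sketch of that measure for a discrete attribute, not the authors' pipeline; the toy attribute values and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: one binarized attribute per song and an authorship label.
labels = ["beatles", "beatles", "other", "other"]
perfect = ["hi", "hi", "lo", "lo"]   # separates the two classes completely
useless = ["hi", "lo", "hi", "lo"]   # carries no information about the class

print(information_gain(perfect, labels))
print(information_gain(useless, labels))
```

An attribute that splits the classes cleanly has a gain equal to the label entropy, while an uninformative attribute has a gain of zero.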

Keywords: Music Information Retrieval, Music Data Mining, Stylometry.

Downloads: 1680
8794 The Relevance of Data Warehousing and Data Mining in the Field of Evidence-based Medicine to Support Healthcare Decision Making

Authors: Nevena Stolba, A Min Tjoa

Abstract:

Evidence-based medicine is a new direction in modern healthcare. Its task is to prevent, diagnose and medicate diseases using medical evidence. Medical data about a large patient population is analyzed to perform healthcare management and medical research. In order to obtain the best evidence for a given disease, external clinical expertise as well as internal clinical experience must be available to healthcare practitioners at the right time and in the right manner. External evidence-based knowledge cannot be applied directly to the patient without adjusting it to the patient's health condition. We propose a data warehouse based approach as a suitable solution for integrating external evidence-based data sources into the existing clinical information system, together with data mining techniques for finding an appropriate therapy for a given patient and a given disease. Through the integration of data warehousing, OLAP and data mining techniques in the healthcare area, an easy-to-use decision support platform is built that supports the decision-making process of caregivers and clinical managers. We present three case studies, which show that a clinical data warehouse that facilitates evidence-based medicine is a reliable, powerful and user-friendly platform for strategic decision making, with great relevance for the practice and acceptance of evidence-based medicine.

Keywords: data mining, data warehousing, decision-support systems, evidence-based medicine.

Downloads: 3811
8793 Exponentially Weighted Simultaneous Estimation of Several Quantiles

Authors: Valeriy Naumov, Olli Martikainen

Abstract:

In this paper we propose a new method for simultaneously estimating multiple quantiles corresponding to given probability levels from data streams and massive data sets. This method provides a basis for the development of single-pass, low-storage quantile estimation algorithms, which differ in complexity, storage requirements and accuracy. We demonstrate that such algorithms may perform well even for heavy-tailed data.
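
To illustrate what a single-pass, low-storage quantile estimator can look like, here is a generic stochastic-approximation sketch with a fixed weight, so that older observations are discounted roughly exponentially. It is not the estimator proposed in the paper; the function name, the `weight` parameter and the test stream are assumptions made for illustration.

```python
import random

def track_quantiles(stream, levels, weight=0.01, start=0.0):
    """Track one running estimate per probability level in a single pass.

    Each estimate moves up by weight * p when an observation lies above it
    and down by weight * (1 - p) otherwise, so it settles where a fraction p
    of the observations fall at or below it.
    """
    estimates = [start] * len(levels)
    for x in stream:
        for i, p in enumerate(levels):
            indicator = 1.0 if x <= estimates[i] else 0.0
            estimates[i] += weight * (p - indicator)
    return estimates

# Hypothetical stream: 20000 uniform draws on [0, 1), whose true median is 0.5
# and whose true 0.9-quantile is 0.9.
rng = random.Random(0)
data = [rng.random() for _ in range(20000)]
estimates = track_quantiles(data, levels=[0.5, 0.9])
print(estimates)
```

Storage is one number per tracked level, independent of the stream length. A fixed step size like this one is scale-dependent, so a practical estimator would adapt it to the data.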

Keywords: Quantile estimation, data stream, heavy-tailed distribution, tail index.

Downloads: 1533
8792 Ensemble Learning with Decision Tree for Remote Sensing Classification

Authors: Mahesh Pal

Abstract:

In recent years, a number of works proposing the combination of multiple classifiers to produce a single classification have been reported in the remote sensing literature. The resulting classifier, referred to as an ensemble classifier, is generally found to be more accurate than any of the individual classifiers making up the ensemble. As accuracy is the primary concern, much of the research in the field of land cover classification is focused on improving classification accuracy. This study compares the performance of four ensemble approaches (boosting, bagging, DECORATE and random subspace) with a univariate decision tree as the base classifier. Two training datasets, one without any noise and the other with 20 percent noise, were used to judge the performance of the different ensemble approaches. Results with the noise-free data set suggest an improvement of about 4% in classification accuracy with all ensemble approaches in comparison to the results provided by the univariate decision tree classifier. The highest classification accuracy, 87.43%, was achieved by the boosted decision tree. A comparison of results with the noisy data set suggests that the bagging, DECORATE and random subspace approaches work well with this data, whereas the performance of the boosted decision tree degrades to a classification accuracy of 79.7%, which is even lower than the 80.02% achieved by the unboosted decision tree classifier.
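
Of the four ensemble approaches compared above, bagging is the simplest to sketch: train each base classifier on a bootstrap resample and combine predictions by majority vote. The example below is a simplified illustration that uses one-feature decision stumps as a stand-in for the study's univariate decision trees; the data and class names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Fit a single-threshold rule (a decision stump) to (x, label) rows."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label

def bagging(rows, n_models=25, seed=0):
    """Train stumps on bootstrap resamples and predict by majority vote."""
    rng = random.Random(seed)
    models = [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_models)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Hypothetical one-band pixel values with two land cover classes.
rows = [(x, "water" if x < 10 else "forest") for x in range(20)]
predict = bagging(rows)
print(predict(3), predict(16))
```

Because every model sees a different resample, a mislabeled training example influences only some of the votes.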

Keywords: Ensemble learning, decision tree, remote sensing classification.

Downloads: 2584
8791 Selection of Best Band Combination for Soil Salinity Studies using ETM+ Satellite Images (A Case study: Nyshaboor Region, Iran)

Authors: S. H. Sanaeinejad, A. Astaraei, P. Mirhoseini.Mousavi, M. Ghaemi

Abstract:

Soil salinity is one of the main environmental problems affecting extensive areas of the world. Traditional data collection methods are neither sufficient for addressing this important environmental problem nor accurate enough for soil studies. Remote sensing data could overcome most of these problems. Although satellite images are commonly used for these studies, there is still a need to find the best calibration between the data and real conditions in each specific area. The Neyshaboor area in north-east Iran was selected as the field study for this research. Landsat satellite images of this area were used to prepare suitable learning samples for processing and classifying the images. 300 locations were selected randomly in the area to collect soil samples, and finally 273 locations were reselected for further laboratory work and image processing analysis. The electrical conductivity of all samples was measured. Six reflective bands of ETM+ satellite images taken of the study area in 2002 were used for soil salinity classification. The classification was carried out using common algorithms based on the best band composition. The results showed that reflective bands 7, 3, 4 and 1 are the best band composition for preparing the color composite images. We also found that hybrid classification is a suitable method for identifying and delineating different salinity classes in the area.

Keywords: Soil salinity, Remote sensing, Image processing, ETM+, Nyshaboor

Downloads: 2021
8790 A Bayesian Classification System for Facilitating an Institutional Risk Profile Definition

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for easy creation and classification of institutional risk profiles supporting endangerment analysis of file formats. The main contribution of this work is the use of data mining techniques to support the setup of the most important risk factors. Subsequently, risk profiles employ a risk factor classifier and associated configurations to support digital preservation experts with a semi-automatic estimation of the endangerment group for file format risk profiles. Our goal is to make use of an expert knowledge base, acquired through a digital preservation survey, in order to detect preservation risks for a particular institution. Another contribution is support for the visualisation of risk factors for a required dimension of analysis. Using the naive Bayes method, the decision support system recommends to an expert the matching risk profile group for the previously selected institutional risk profile. The proposed methods improve the visibility of risk factor values and the quality of a digital preservation process. The presented approach is designed to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and values of file format risk profiles. To facilitate decision making, the aggregated information about the risk factors is presented as a multidimensional vector. The goal is to visualise particular dimensions of this vector for analysis by an expert and to define its profile group. A sample risk profile calculation and the visualisation of some risk factor dimensions are presented in the evaluation section.
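
The recommendation step above is based on the naive Bayes method. As a rough sketch of how such a classifier assigns a group to a vector of categorical risk factors, the following uses Laplace smoothing; the factor names, values and groups are invented for illustration and do not come from the paper's knowledge base.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows):
    """Count class and per-class feature-value frequencies.

    rows: list of (feature_dict, label) pairs with categorical values.
    """
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)   # (label, feature) -> value counts
    feature_values = defaultdict(set)     # feature -> distinct values seen
    for features, label in rows:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return class_counts, value_counts, feature_values

def classify(model, features):
    """Return the label with the highest log-posterior (Laplace-smoothed)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for name, value in features.items():
            seen = value_counts[(label, name)][value]
            score += math.log((seen + 1) / (count + len(feature_values[name])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical risk factors and profile groups.
rows = [
    ({"format_age": "old", "tool_support": "poor"}, "high_risk"),
    ({"format_age": "old", "tool_support": "poor"}, "high_risk"),
    ({"format_age": "new", "tool_support": "good"}, "low_risk"),
    ({"format_age": "new", "tool_support": "good"}, "low_risk"),
]
model = train_naive_bayes(rows)
print(classify(model, {"format_age": "old", "tool_support": "poor"}))
```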

Keywords: linked open data, information integration, digital libraries, data mining.

Downloads: 730
8789 Discovering User Behaviour Patterns from Web Log Analysis to Enhance the Accessibility and Usability of Website

Authors: Harpreet Singh

Abstract:

Finding relevant information on the World Wide Web is becoming more challenging day by day. Web usage mining is used for the extraction of relevant and useful knowledge, such as user behaviour patterns, from web access log records. The web access log records all the requests for individual files that users have made to the website. Web usage mining is important for Customer Relationship Management (CRM), as it can ensure customer satisfaction as far as the interaction between the customer and the organization is concerned. Web usage mining is helpful in improving website structure or design as per the user’s requirements by analyzing the access log file of a website with a log analyzer tool. The focus of this paper is to enhance the accessibility and usability of a guitar-selling website by analyzing its access log with the Deep Log Analyzer tool. The results show that the largest number of users is from the United States and that they use the Opera 9.8 web browser and the Windows XP operating system.

Keywords: Web usage mining, log file, web mining, data mining, deep log analyser.

Downloads: 1062
8788 Analysis of Feature Space for a 2d/3d Vision based Emotion Recognition Method

Authors: Robert Niese, Ayoub Al-Hamadi, Bernd Michaelis

Abstract:

In modern human-computer interaction (HCI) systems, emotion recognition is becoming an imperative characteristic. The quest for effective and reliable emotion recognition in HCI has resulted in a need for better face detection, feature extraction and classification. In this paper we present results of feature space analysis after briefly explaining our fully automatic vision-based emotion recognition method. We demonstrate the compactness of the feature space and show how the 2d/3d based method achieves superior features for the purpose of emotion classification. We also show that feature normalization creates a largely person-independent feature space. As a consequence, the classifier architecture has only a minor influence on the classification result. This is elucidated in particular with the help of confusion matrices. For this purpose, advanced classification algorithms such as Support Vector Machines and Artificial Neural Networks are employed, as well as the simple k-Nearest Neighbor classifier.

Keywords: Facial expression analysis, Feature extraction, Image processing, Pattern Recognition, Application.

Downloads: 1923
8787 Image Classification and Accuracy Assessment Using the Confusion Matrix, Contingency Matrix, and Kappa Coefficient

Authors: F. F. Howard, C. B. Boye, I. Yakubu, J. S. Y. Kuma

Abstract:

Remote sensing can be used to produce land use and land cover maps through a procedure known as image classification. Numerous elements ought to be taken into consideration, including the availability of highly satisfactory Landsat imagery, secondary data and a precise classification process. The goal of this study was to classify and map the land use and land cover of the study area using remote sensing and Geospatial Information System (GIS) analysis. The classification was done using Landsat 8 satellite images acquired in December 2020 covering the study area. The Landsat image was downloaded from the USGS. The Landsat image with 30 m resolution was geo-referenced to the WGS_84 datum and Universal Transverse Mercator (UTM) Zone 30N coordinate projection system. A radiometric correction was applied to the image to reduce noise. This study consists of two sections: the Land Use/Land Cover (LULC) classification, and accuracy assessment using the confusion and contingency matrices and the Kappa coefficient. The LULC classes were vegetation (agriculture) (67.87%), water bodies (0.01%), mining areas (5.24%), forest (26.02%), and settlement (0.88%). An overall accuracy of 97.87% and a kappa coefficient (K) of 97.3% were obtained for the confusion matrix. An overall accuracy of 95.7% and a Kappa coefficient of 0.947 were obtained for the contingency matrix. The kappa coefficients were rated as substantial; hence, the classified image is fit for further research.
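
The two accuracy measures reported above can be computed directly from a confusion matrix. The sketch below uses the standard definitions of overall accuracy and Cohen's kappa; the 2x2 matrix is a hypothetical example, not the study's data.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

def cohen_kappa(matrix):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement from the products of matching row and column totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical matrix: rows are reference classes, columns are mapped classes.
matrix = [
    [45, 5],
    [10, 40],
]
print(overall_accuracy(matrix), cohen_kappa(matrix))
```

For this matrix the overall accuracy is 0.85 and chance agreement is 0.5, which gives a kappa of about 0.70. Kappa is lower than overall accuracy whenever some of the agreement can be attributed to chance.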

Keywords: Confusion Matrix, contingency matrix, kappa coefficient, land use/land cover, accuracy assessment.

Downloads: 254
8786 A Family of Affine Projection Adaptive Filtering Algorithms With Selective Regressors

Authors: Mohammad Shams Esfand Abadi, Nader Hadizadeh Kashani, Vahid Mehrdad

Abstract:

In this paper we present a general formalism for the establishment of the family of selective regressor affine projection algorithms (SR-APA). The SR-APA, the SR regularized APA (SR-RAPA), the SR partial rank algorithm (SR-PRA), the SR binormalized data reusing least mean squares (SR-BNDR-LMS), and the SR normalized LMS with orthogonal correction factors (SR-NLMS-OCF) algorithms are established by this general formalism. We demonstrate the performance of the presented algorithms through simulations in an acoustic echo cancellation scenario.
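
For orientation, the sketch below shows the plain normalized LMS (NLMS) update, the simplest relative of the affine projection family named above. It does not implement selective regressors or any of the SR variants; the two-tap echo path and the step size are assumptions chosen for illustration.

```python
import random

def nlms(inputs, desired, taps, mu=0.5, eps=1e-8):
    """Adapt FIR weights so that the filter output tracks `desired`."""
    weights = [0.0] * taps
    window = [0.0] * taps                     # most recent input sample first
    for x, d in zip(inputs, desired):
        window = [x] + window[:-1]
        estimate = sum(w * u for w, u in zip(weights, window))
        error = d - estimate
        norm = eps + sum(u * u for u in window)   # regularized input energy
        weights = [w + mu * error * u / norm for w, u in zip(weights, window)]
    return weights

# Hypothetical noise-free echo path with two taps.
rng = random.Random(0)
signal = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.5, -0.3]
observed = [
    echo_path[0] * signal[n] + (echo_path[1] * signal[n - 1] if n > 0 else 0.0)
    for n in range(len(signal))
]
estimated = nlms(signal, observed, taps=2)
print(estimated)
```

In an echo cancellation setting the adapted weights approximate the echo path, so subtracting the filter output from the microphone signal removes the echo.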

Keywords: Adaptive filter, affine projection, selective regressor.

Downloads: 1574
8785 Hybrid Structure Learning Approach for Assessing the Phosphate Laundries Impact

Authors: Emna Benmohamed, Hela Ltifi, Mounir Ben Ayed

Abstract:

The Bayesian Network (BN) is one of the most efficient classification methods. It is widely used in several fields (e.g., medical diagnostics, risk analysis, bioinformatics research). The BN is defined as a probabilistic graphical model that represents a formalism for reasoning under uncertainty. This classification method has a high performance rate in the extraction of new knowledge from data. The construction of this model consists of two phases: structure learning and parameter learning. For structure learning, the K2 algorithm is one of the representative data-driven algorithms and is based on a score-and-search approach. In addition, integrating the expert's knowledge into the structure learning process allows the highest accuracy to be obtained. In this paper, we propose a hybrid approach that combines an improvement of the K2 algorithm, called K2 algorithm for Parents and Children search (K2PC), with the expert-driven method for learning the structure of a BN. The evaluation of the experimental results, using well-known benchmarks, shows that our K2PC algorithm performs better in terms of correct structure detection. The real application of our model shows its efficiency in the analysis of the phosphate laundry effluents' impact on the watershed in the Gafsa area (southwestern Tunisia).

Keywords: Classification, Bayesian network, structure learning, K2 algorithm, expert knowledge, surface water analysis.

Downloads: 513
8784 The Performance of Predictive Classification Using Empirical Bayes

Authors: N. Deetae, S. Sukparungsee, Y. Areepong, K. Jampachaisri

Abstract:

This research aims to compare the percentages of correct classification of the Empirical Bayes (EB) method and the Classical method when data are constructed as near normal, short-tailed and long-tailed symmetric, and short-tailed and long-tailed asymmetric. The study uses a conjugate prior for a normal distribution with known mean and unknown variance. The hyper-parameters estimated by the EB method are substituted into the posterior predictive probability and used to predict new observations. Data are generated as a training set and a test set, with sample sizes of 100, 200 and 500, for binary classification. The results showed that the EB method outperformed the Classical method in all situations under study.

Keywords: Classification, Empirical Bayes, Posterior predictive probability.

Downloads: 1597
8783 An SVM based Classification Method for Cancer Data using Minimum Microarray Gene Expressions

Authors: R. Mallika, V. Saravanan

Abstract:

This paper gives a novel method for improving cancer classification performance using very few microarray gene expressions. The method employs classification with individual gene ranking and gene subset ranking. For selection and classification, the proposed method uses the same classifier. The method is applied to three publicly available cancer gene expression datasets: Lymphoma, Liver and Leukaemia. Three different classifiers, namely Support vector machines-one against all (SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant analysis (LDA), were tested. The results indicate improved performance of the SVM-OAA classifier, with satisfactory results on all three datasets, compared with the other two classifiers.

Keywords: Support vector machines-one against all, cancer classification, Linear Discriminant analysis, K nearest neighbour, microarray gene expression, gene pair ranking.

Downloads: 2562
8782 ECG-Based Heartbeat Classification Using Convolutional Neural Networks

Authors: Jacqueline R. T. Alipo-on, Francesca I. F. Escobar, Myles J. T. Tan, Hezerul Abdul Karim, Nouar AlDahoul

Abstract:

Electrocardiogram (ECG) signal analysis and processing are crucial in the diagnosis of cardiovascular diseases, which are among the leading causes of mortality worldwide. However, the traditional rule-based analysis of large volumes of ECG data is time-consuming, labor-intensive, and prone to human error. With the advancement of the programming paradigm, algorithms such as machine learning have been increasingly used to analyze ECG signals. In this paper, various deep learning algorithms were adapted to classify five classes of heartbeat types. The dataset used in this work is the synthetic MIT-Beth Israel Hospital (MIT-BIH) Arrhythmia dataset produced from generative adversarial networks (GANs). Various deep learning models, such as the ResNet-50 convolutional neural network (CNN), 1-D CNN, and long short-term memory (LSTM), were evaluated and compared. ResNet-50 was found to outperform the other models in terms of recall and F1 score, with five-fold average scores of 98.88% and 98.87%, respectively. 1-D CNN, on the other hand, was found to have the highest average precision of 98.93%.

Keywords: Heartbeat classification, convolutional neural network, electrocardiogram signals, ECG signals, generative adversarial networks, long short-term memory, LSTM, ResNet-50.

Downloads: 189
8781 Pattern Classification of Back-Propagation Algorithm Using Exclusive Connecting Network

Authors: Insung Jung, Gi-Nam Wang

Abstract:

The objective of this paper is to design a pattern classification model based on the back-propagation (BP) algorithm for decision support systems. The standard BP model fully connects every node in each layer, from the input layer to the output layer. It therefore needs considerable computing time and many iterations to reach good performance and an acceptable error rate when generating patterns or training the network. The proposed model instead uses exclusive connections between hidden-layer nodes and output nodes. Its advantage is fewer iterations and better performance compared with the standard back-propagation model. We simulated several classification cases with different network settings (e.g. number of hidden layers and nodes, number of classes and iterations). In most simulated cases, the BP model with the exclusive connection network gave satisfactory results compared with standard BP. We expect that this algorithm can be applied to user face identification, data analysis, and mapping between environment data and information.

Keywords: Neural network, Back-propagation, classification.

Downloads: 1656
8780 Application of Data Mining Tools to Predicate Completion Time of a Project

Authors: Seyed Hossein Iranmanesh, Zahra Mokhtari

Abstract:

Estimating the time and cost of work completion in a project, and following them up during execution, contribute to the success or failure of a project and are very important for the project management team. Delivering on time and within the budgeted cost requires managing and controlling projects well. To deal with the complex task of controlling and modifying the baseline project schedule during execution, earned value management systems have been set up and widely used to measure and communicate the real physical progress of a project. However, they often fail to predict the total duration of the project. In this paper, data mining techniques are used to predict the total project duration in terms of the Time Estimate At Completion, EAC(t). For this purpose, we used a project with 90 activities that was updated day by day. We then used the regular indexes from the literature and applied the Earned Duration Method to calculate the time estimate at completion, and set these as input data for prediction and for specifying the major parameters among them using Clem software. By using data mining, the parameters that affect EAC and the relationships between them can be extracted, which is very useful for managing a project with minimum delay risk. This could be a simple, safe and applicable method for predicting the completion time of a project during execution.

Keywords: Data Mining Techniques, Earned Duration Method, Earned Value, Estimate At Completion.

Downloads: 1803
8779 Improvement of a Label Extraction Method for a Risk Search System

Authors: Shigeaki Sakurai, Ryohei Orihara

Abstract:

This paper proposes an improvement method of classification efficiency in a classification model. The model is used in a risk search system and extracts specific labels from articles posted at bulletin board sites. The system can analyze the important discussions composed of the articles. The improvement method introduces ensemble learning methods that use multiple classification models. Also, it introduces expressions related to the specific labels into generation of word vectors. The paper applies the improvement method to articles collected from three bulletin board sites selected by users and verifies the effectiveness of the improvement method.

Keywords: Text mining, Risk search system, Corporate reputation, Bulletin board site, Ensemble learning

Downloads: 1325
8778 About Methods of Additional Mining Pressure Figuring while Reconstruction of Tunnels

Authors: M. Moistsrapishvili, I. Ugrekhelidze, T. Baramashvili, D. Malaghuradze

Abstract:

At the end of the 20th century, the development of transport corridors and the improvement of their technical parameters became a pressing issue. To this end, many countries, Georgia among them, are constructing new highways and railways and reconstructing and modernizing the existing transport infrastructure. It is necessary to inspect the artificial structures (bridges and tunnels) on the existing routes, as they are very old. This conference report covers the peculiarities of tunnel reconstruction, because we think this topic is important for the modernization of the existing road infrastructure. The methods for determining mining pressure in tunnel reconstruction were developed for the construction of new tunnels, but it is necessary to account for the additional mining pressure that forms during reconstruction. This report gives methods for calculating the additional mining pressure during tunnel reconstruction. A computer program was developed, and it was determined that during tunnel reconstruction the additional mining pressure is one third of the main mining pressure.

Keywords: Mining pressure, Reconstruction of tunnels.

Downloads: 1677
8777 Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining by Improving Apriori Algorithm with Fuzzy Logic

Authors: Pejman Hosseinioun, Hasan Shakeri, Ghasem Ghorbanirostam

Abstract:

In recent years, research on knowledge sources, decision support systems, data mining and the process of knowledge discovery in databases has grown in importance, and each of these aspects is considered to affect the others. In this article, we merge the information source and the knowledge source to suggest a knowledge-based system, within the limits of management, based on storing and restoring knowledge to manage information and improve decision making and resources. We use data mining and the Apriori algorithm in the knowledge discovery process. One of the problems of the Apriori algorithm is that the user must specify the minimum support threshold. Imagine that a user wants to apply the Apriori algorithm to a database with millions of transactions. The user certainly does not have the necessary knowledge of all the transactions in that database and therefore cannot specify a suitable threshold. Our purpose in this article is to improve the Apriori algorithm. To achieve this goal, we use fuzzy logic to put the data into different clusters before applying the Apriori algorithm to the data in the database, and we also try to suggest the most suitable threshold to the user automatically.
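
To make the threshold problem concrete, here is a compact sketch of the classic Apriori algorithm, in which `min_support` has to be supplied by the user. It does not include the fuzzy clustering or automatic threshold suggestion proposed in the article, and the transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Raising `min_support` shrinks the result and lowering it can make the number of itemsets grow rapidly, which is why choosing the value without knowing the data is difficult.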

Keywords: Decision support system, data mining, knowledge discovery, data discovery, fuzzy logic.

Downloads: 2132
8776 Decision Support System Based on Data Warehouse

Authors: Yang Bao, LuJing Zhang

Abstract:

A typical Intelligent Decision Support System is built on four components: Data Warehouse, Online Analytical Processing, Data Mining and model-based Decision Support. Such a system is called a Decision Support System Based on Data Warehouse (DSSBDW). This approach takes ETL, OLAP and DM as its means of implementation and integrates traditional model-driven DSS and data-driven DSS into a whole. This paper analyzes the DSSBDW architecture and DW model and discusses the following key issues: ETL design and realization; metadata management using XML; SQL implementation, performance optimization and data mapping in OLAP. Lastly, it illustrates the design principles and methods of DW in DSSBDW.

Keywords: Decision Support System, Data Warehouse, Data Mining.

Downloads: 3862
8775 Gene Expression Data Classification Using Discriminatively Regularized Sparse Subspace Learning

Authors: Chunming Xu

Abstract:

Sparse representation, which can represent high-dimensional data effectively, has been successfully used in computer vision and pattern recognition problems. However, it does not consider the label information of data samples. To overcome this limitation, we develop a novel dimensionality reduction algorithm, namely discriminatively regularized sparse subspace learning (DR-SSL), in this paper. The proposed DR-SSL algorithm can not only make use of the sparse representation to model the data, but can also effectively employ the label information to guide the procedure of dimensionality reduction. In addition, the presented algorithm can effectively deal with the out-of-sample problem. The experiments on gene-expression data sets show that the proposed algorithm is an effective tool for dimensionality reduction and gene-expression data classification.

Keywords: sparse representation, dimensionality reduction, label information, sparse subspace learning, gene-expression data classification.

Downloads: 1447
8774 Analysis of Student Motivation Behavior on e-Learning Based on Association Rule Mining

Authors: Kunyanuth Kularbphettong, Phanu Waraporn, Cholticha Tongsiri

Abstract:

This research aims to create a model for the analysis of student motivation behavior on e-Learning based on association rule mining techniques, in the case of the Information Technology for Communication and Learning Course at Suan Sunandha Rajabhat University. The model was created using association rules, one of the data mining techniques, with a minimum confidence threshold. The results showed that the student motivation behavior model based on the association rule technique can indicate the important variables that influence student motivation behavior on e-Learning.

Keywords: Motivation behavior, e-learning, moodle log, association rule mining.

Downloads: 1886
8773 Support Vector Machine Approach for Classification of Cancerous Prostate Regions

Authors: Metehan Makinacı

Abstract:

The objective of this paper is to apply the support vector machine (SVM) approach to the classification of cancerous and normal regions of prostate images. Three kinds of textural features are extracted and used for the analysis: parameters of the Gauss-Markov random field (GMRF), correlation function and relative entropy. Prostate images are acquired by a system consisting of a microscope, a video camera and a digitizing board. Cross-validated classification over a database of 46 images is implemented to evaluate the performance. In SVM classification, sensitivity and specificity of 96.2% and 97.0%, respectively, are achieved for data with a block size of 32x32 pixels, with an overall accuracy of 96.6%. Classification performance is compared with artificial neural network and k-nearest neighbor classifiers. Experimental results demonstrate that the SVM approach gives the best performance.
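As a reminder of how the three reported figures relate to each other, here is a minimal sketch that derives sensitivity, specificity and accuracy from confusion-matrix counts. The function name and the counts in the usage note are hypothetical; the abstract does not give the underlying block counts.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: cancerous blocks labelled cancerous   fn: cancerous blocks labelled normal
    tn: normal blocks labelled normal         fp: normal blocks labelled cancerous
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all blocks labelled correctly
    return sensitivity, specificity, accuracy
```

For example, with the illustrative counts `classification_metrics(tp=45, fn=5, tn=40, fp=10)`, sensitivity is 45/50, specificity is 40/50 and accuracy is 85/100.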

Keywords: Computer-aided diagnosis, support vector machines, Gauss-Markov random fields, texture classification.

Downloads 1792
8772 An Exact Solution to Support Vector Mixture

Authors: Monjed Ezzeddinne, Nicolas Lefebvre, Régis Lengellé

Abstract:

This paper presents a new version of the SVM mixture algorithm initially proposed by Kwok for classification and regression problems. For both cases, a slight modification of the mixture model leads to a standard SVM training problem, guarantees the existence of an exact solution, and allows the direct use of well-known decomposition and working set selection algorithms. Only the regression case is considered in this paper, but classification has been addressed in a very similar way. This method has been successfully applied to engine pollutant emission modeling.

Keywords: Identification, Learning systems, Mixture of Experts, Support Vector Machines.

Downloads 1365
8771 Congestion Control for Internet Media Traffic

Authors: Mohammad A. Talaat, Magdi A. Koutb, Hoda S. Sorour

Abstract:

In this paper we investigated a number of the Internet congestion control algorithms that have been developed in the last few years. We found that many of these algorithms were designed to deal with Internet traffic merely as a train of consecutive packets. A few other algorithms were specifically tailored to handle the Internet congestion caused by media traffic that represents audiovisual content. This latter set of algorithms is considered to be aware of the nature of this media content. In this context we briefly explained a number of congestion control algorithms and categorized them into the following two categories: i) Media congestion control algorithms. ii) Common congestion control algorithms. We recommend the use of the media congestion control algorithms because they are media content-aware, unlike the common type of algorithms, which manipulate such traffic blindly. We showed that the spread of such media content-aware algorithms over the Internet will lead to better congestion control in the coming years. This is due to the observed emergence of the era of digital convergence, in which media traffic will form the majority of Internet traffic.

Keywords: Congestion Control, Media Traffic.

Downloads 1567
8770 Deep Web Content Mining

Authors: Shohreh Ajoudanian, Mohammad Davarpanah Jazi

Abstract:

The rapid expansion of the web is causing constant growth of information, leading to several problems such as increased difficulty in extracting potentially useful knowledge. Web content mining confronts this problem by gathering explicit information from different web sites for access and knowledge discovery. Query interfaces of web databases share common building blocks. After extracting information with a parsing approach, we use a new data mining algorithm to match a large number of schemas in databases at a time. Using this algorithm increases the speed of information matching. In addition, instead of simple 1:1 matching, it performs complex (m:n) matching between query interfaces. In this paper we present a novel correlation mining algorithm that matches correlated attributes at lower cost. This algorithm uses the Jaccard measure to distinguish positively and negatively correlated attributes. After that, the system matches the user query against the query interfaces in a specific domain and finally chooses the query interface nearest to the user query to answer it.
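The Jaccard measure mentioned above can be sketched as follows, assuming each query interface is represented as a set of attribute names. The schemas and attribute names are hypothetical, and the abstract does not state how the paper turns the score into positive or negative correlation labels, so this sketch only computes the score itself.

```python
def jaccard(attr_a, attr_b, schemas):
    """Jaccard measure for two attributes over a collection of schemas:
    (schemas containing both) / (schemas containing at least one)."""
    both = sum(1 for s in schemas if attr_a in s and attr_b in s)
    either = sum(1 for s in schemas if attr_a in s or attr_b in s)
    return both / either if either else 0.0

# Hypothetical query-interface schemas for a book-search domain.
schemas = [
    {"author", "title", "isbn"},
    {"author", "title"},
    {"first_name", "last_name", "title"},
    {"first_name", "last_name", "isbn"},
]
```

Here `first_name` and `last_name` always appear together and score 1.0, which suggests they belong to one group, while `author` and `first_name` never co-occur and score 0.0, which suggests they may be alternatives for the same concept.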

Keywords: Content mining, complex matching, correlation mining, information extraction.

Downloads 2278
8769 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection

Authors: Yaojun Wang, Yaoqing Wang

Abstract:

Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection systems. A profitable stock-selection system should consider both the stock’s investment value and the market timing. In this paper, we present a hybrid system that incorporates both for stock selection. This system uses a case-based reasoning (CBR) model to perform the stock classification and a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques with regard to classification accuracy, average return and the Sharpe ratio.
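For readers unfamiliar with the Sharpe ratio used as an evaluation metric here, a minimal per-period sketch follows. The use of the sample standard deviation, the absence of annualization, and the default risk-free rate of zero are assumptions of this sketch; the abstract does not say which variant the authors computed.

```python
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free_rate=0.0):
    """Per-period Sharpe ratio: mean excess return divided by its sample
    standard deviation. Requires at least two return observations."""
    excess = [r - risk_free_rate for r in returns]
    spread = stdev(excess)
    if spread == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return mean(excess) / spread
```

A higher value means more return per unit of volatility, which is why the ratio complements the raw average return when comparing selection systems.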

Keywords: Case-based reasoning, decision tree, stock selection, machine learning.

Downloads 1705
8768 Improving the Performance of Proxy Server by Using Data Mining Technique

Authors: P. Jomsri

Abstract:

Currently, web usage generates a huge amount of data from a large number of users. In general, a proxy server is a system that supports users' web usage, and it can be managed by using hit rates. This research tries to improve hit rates in a proxy system by applying a data mining technique. The data sets are collected from proxy servers in the university, and relationships are investigated based on several features. The model is used to predict future website accesses. The association rule technique is applied to find the relations among Date, Time, Main Group web, Sub Group web, and Domain name in order to create the model. The results showed that this technique can predict web content for the next day; moreover, the future accesses of websites increased from 38.15% to 85.57%. This model can predict web page access, which tends to increase the efficiency of proxy servers as a result. In addition, the performance of internet access will be improved, which helps to reduce traffic in networks.

Keywords: Association rule, proxy server, data mining.

Downloads 3062
8767 Questions Categorization in E-Learning Environment Using Data Mining Technique

Authors: Vilas P. Mahatme, K. K. Bhoyar

Abstract:

Nowadays, education cannot be imagined without digital technologies, which broaden the horizons of teaching and learning processes. Several universities offer online courses. For evaluation purposes, e-examination systems are being widely adopted in academic environments, and multiple-choice tests are extremely popular. In moving away from traditional examinations to e-examination, Moodle is being used as a Learning Management System (LMS). Moodle logs every click that students make for attempting and navigational purposes in an e-examination. Data mining has been applied in various domains, including retail sales and bioinformatics. In recent years, there has been increasing interest in the use of data mining in e-learning environments, where it has been applied to discover, extract, and evaluate parameters related to students’ learning performance. The combination of data mining and e-learning is still in its infancy. Log data generated by students during an online examination can be used to discover knowledge with the help of data mining techniques. In web-based applications, the number of right and wrong answers in a test result is not sufficient to assess and evaluate a student’s performance, so assessment techniques must be intelligent enough. If a student cannot answer the question asked by the instructor, an easier question can be asked; otherwise, a more difficult question on a similar topic can be posed. To do so, it is necessary to identify the difficulty level of the questions. The proposed work concentrates on this issue. Data mining techniques, specifically clustering, are used in this work. The method decides the difficulty levels of the questions and categorizes them as tough, easy, or moderate; the questions are later served to the desired students based on their performance. The proposed experiment categorizes the question set and also groups the students based on their performance in the examination. This will help the instructor to guide the students more specifically. In short, the mined knowledge helps to support, guide, facilitate, and enhance learning as a whole.
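The idea of clustering questions into difficulty levels can be sketched with a plain one-dimensional k-means. This is an illustration under stated assumptions, not the authors' method: the abstract does not name the clustering algorithm or the features used, so the choice of k-means, the single feature (fraction of students answering correctly), the function names, and the sample rates are all hypothetical.

```python
def kmeans_1d(values, k=3, iterations=20):
    """Plain 1-D k-means (k >= 2) with deterministic, evenly spaced initial centroids."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Keep the old centroid when a cluster receives no points.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def label_questions(correct_rates):
    """Map each question id to 'tough', 'moderate' or 'easy' by its nearest centroid."""
    centroids = kmeans_1d(list(correct_rates.values()), k=3)
    names = ["tough", "moderate", "easy"]  # lowest correct rate = toughest
    order = sorted(range(3), key=lambda i: centroids[i])
    rank = {idx: names[pos] for pos, idx in enumerate(order)}
    return {
        q: rank[min(range(3), key=lambda i: abs(r - centroids[i]))]
        for q, r in correct_rates.items()
    }

# Hypothetical fraction of students who answered each question correctly.
rates = {"q1": 0.90, "q2": 0.85, "q3": 0.50, "q4": 0.55, "q5": 0.15, "q6": 0.20}
labels = label_questions(rates)
```

The same routine could be applied to per-student scores to group students by performance, which is the second grouping the abstract describes.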

Keywords: Data mining, e-examination, e-learning, Moodle.

Downloads 2075