Search results for: Features selection
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2422

Search results for: Features selection

2272 Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset

Authors: Essam Al Daoud

Abstract:

Gradient boosting methods have been proven to be a very important strategy. Many successful machine learning solutions were developed using the XGBoost and its derivatives. The aim of this study is to investigate and compare the efficiency of three gradient methods. Home credit dataset is used in this work which contains 219 features and 356251 records. However, new features are generated and several techniques are used to rank and select the best features. The implementation indicates that the LightGBM is faster and more accurate than CatBoost and XGBoost using variant number of features and records.

Keywords: Gradient boosting, XGBoost, LightGBM, CatBoost, home credit.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8979
2271 Site Selection of Traffic Camera based on Dempster-Shafer and Bagging Theory

Authors: S. Rokhsari, M. Delavar, A. Sadeghi-Niaraki, A. Abed-Elmdoust, B. Moshiri

Abstract:

Traffic incident has bad effect on all parts of society so controlling road networks with enough traffic devices could help to decrease number of accidents, so using the best method for optimum site selection of these devices could help to implement good monitoring system. This paper has considered here important criteria for optimum site selection of traffic camera based on aggregation methods such as Bagging and Dempster-Shafer concepts. In the first step, important criteria such as annual traffic flow, distance from critical places such as parks that need more traffic controlling were identified for selection of important road links for traffic camera installation, Then classification methods such as Artificial neural network and Decision tree algorithms were employed for classification of road links based on their importance for camera installation. Then for improving the result of classifiers aggregation methods such as Bagging and Dempster-Shafer theories were used.

Keywords: Aggregation, Bagging theory, Dempster-Shafer theory, Site selection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1655
2270 Experimental Evaluation of Mobility Anchor Point Selection Scheme in Hierarchical Mobile IPv6

Authors: Zulkeflee Kusin, Mohamad Shanudin Zakaria

Abstract:

Hierarchical Mobile IPv6 (HMIPv6) was designed to support IP micro-mobility management in the Next Generation Networks (NGN) framework. The main design behind this protocol is the usage of Mobility Anchor Point (MAP) located at any level router of network to support hierarchical mobility management. However, the distance MAP selection in HMIPv6 causes MAP overloaded and increase frequent binding update as the network grows. Therefore, to address the issue in designing MAP selection scheme, we propose a dynamic load control mechanism integrates with a speed detection mechanism (DMS-DLC). From the experimental results we obtain that the proposed scheme gives better distribution in MAP load and increase handover speed.

Keywords: Dynamic load control, HMIPv6, Mobility AnchorPoint, MAP selection scheme

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1753
2269 Automatic Extraction of Features and Opinion-Oriented Sentences from Customer Reviews

Authors: Khairullah Khan, Baharum B. Baharudin, Aurangzeb Khan, Fazal_e_Malik

Abstract:

Opinion extraction about products from customer reviews is becoming an interesting area of research. Customer reviews about products are nowadays available from blogs and review sites. Also tools are being developed for extraction of opinion from these reviews to help the user as well merchants to track the most suitable choice of product. Therefore efficient method and techniques are needed to extract opinions from review and blogs. As reviews of products mostly contains discussion about the features, functions and services, therefore, efficient techniques are required to extract user comments about the desired features, functions and services. In this paper we have proposed a novel idea to find features of product from user review in an efficient way. Our focus in this paper is to get the features and opinion-oriented words about products from text through auxiliary verbs (AV) {is, was, are, were, has, have, had}. From the results of our experiments we found that 82% of features and 85% of opinion-oriented sentences include AVs. Thus these AVs are good indicators of features and opinion orientation in customer reviews.

Keywords: Classification, Customer Reviews, Helping Verbs, Opinion Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2042
2268 Ensembling Adaptively Constructed Polynomial Regression Models

Authors: Gints Jekabsons

Abstract:

The approach of subset selection in polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset that is sufficient to describe the target relation sufficiently well. However, in most cases the necessary set of basis functions is not known and needs to be guessed – a potentially non-trivial (and long) trial and error process. In our research we consider a potentially more efficient approach – Adaptive Basis Function Construction (ABFC). It lets the model building method itself construct the basis functions necessary for creating a model of arbitrary complexity with adequate predictive performance. However, there are two issues that to some extent plague the methods of both the subset selection and the ABFC, especially when working with relatively small data samples: the selection bias and the selection instability. We try to correct these issues by model post-evaluation using Cross-Validation and model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used method of subset selection, as well as to some other well-known regression modeling methods, using publicly available data sets.

Keywords: Basis function construction, heuristic search, modelensembles, polynomial regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1631
2267 An Automatic Bayesian Classification System for File Format Selection

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for the classification of an unstructured format description for identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection with just the unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format indentification method employs file format classifier and associated configurations to support digital preservation experts with an estimation of required file format. Our goal is to make use of a format specification knowledge base aggregated from a different Web sources in order to select file format for a particular institution. Using the naive Bayes method, the decision support system recommends to an expert, the file format for his institution. The proposed methods facilitate the selection of file format and the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and specifications of file formats. To facilitate decision-making, the aggregated information about the file formats is presented as a file format vocabulary that comprises most common terms that are characteristic for all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. The sample file format calculation and the calculation results including probabilities are presented in the evaluation section.

Keywords: Data mining, digital libraries, digital preservation, file format.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1615
2266 Ranking of Inventory Policies Using Distance Based Approach Method

Authors: Gupta Amit, Kumar Ramesh, Tewari P. C.

Abstract:

Globalization is putting enormous pressure on the business organizations specially manufacturing one to rethink the supply chain in innovative manners. Inventory consumes major portion of total sale revenue. Effective and efficient inventory management plays a vital role for the successful functioning of any organization. Selection of inventory policy is one of the important purchasing activities. This paper focuses on selection and ranking of alternative inventory policies. A deterministic quantitative model based on Distance Based Approach (DBA) method has been developed for evaluation and ranking of inventory policies. We have employed this concept first time for this type of the selection problem. Four inventory policies economic order quantity (EOQ), just in time (JIT), vendor managed inventory (VMI) and monthly policy are considered. Improper selection could affect a company’s competitiveness in terms of the productivity of its facilities and quality of its products. The ranking of inventory policies is a multi-criteria problem. There is a need to first identify the selection criteria and then processes the information with reference to relative importance of attributes for comparison. Criteria values for each inventory policy can be obtained either analytically or by using a simulation technique or they are linguistic subjective judgments defined by fuzzy sets, like, for example, the values of criteria. A methodology is developed and applied to rank the inventory policies.

Keywords: Inventory Policy, Ranking, DBA, Selection criteria.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1779
2265 Multi-Criteria Decision-Making Selection Model with Application to Chemical Engineering Management Decisions

Authors: Mohsen Pirdashti, Arezou Ghadi, Mehrdad Mohammadi, Gholamreza Shojatalab

Abstract:

Chemical industry project management involves complex decision making situations that require discerning abilities and methods to make sound decisions. Project managers are faced with decision environments and problems in projects that are complex. In this work, case study is Research and Development (R&D) project selection. R&D is an ongoing process for forward thinking technology-based chemical industries. R&D project selection is an important task for organizations with R&D project management. It is a multi-criteria problem which includes both tangible and intangible factors. The ability to make sound decisions is very important to success of R&D projects. Multiple-criteria decision making (MCDM) approaches are major parts of decision theory and analysis. This paper presents all of MCDM approaches for use in R&D project selection. It is hoped that this work will provide a ready reference on MCDM and this will encourage the application of the MCDM by chemical engineering management.

Keywords: Chemical Engineering, R&D Project, MCDM, Selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4024
2264 Automatic Classification of the Stand-to-Sit Phase in the TUG Test Using Machine Learning

Authors: Y. A. Adla, R. Soubra, M. Kasab, M. O. Diab, A. Chkeir

Abstract:

Over the past several years, researchers have shown a great interest in assessing the mobility of elderly people to measure their functional status. Usually, such an assessment is done by conducting tests that require the subject to walk a certain distance, turn around, and finally sit back down. Consequently, this study aims to provide an at home monitoring system to assess the patient’s status continuously. Thus, we proposed a technique to automatically detect when a subject sits down while walking at home. In this study, we utilized a Doppler radar system to capture the motion of the subjects. More than 20 features were extracted from the radar signals out of which 11 were chosen based on their Intraclass Correlation Coefficient (ICC > 0.75). Accordingly, the sequential floating forward selection wrapper was applied to further narrow down the final feature vector. Finally, five features were introduced to the Linear Discriminant Analysis classifier and an accuracy of 93.75% was achieved as well as a precision and recall of 95% and 90% respectively.

Keywords: Doppler radar system, stand-to-sit phase, TUG test, machine learning, classification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 379
2263 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: 'Reddit'

Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell

Abstract:

Native Language Identification is one of the growing subfields in Natural Language Processing (NLP). The task of Native Language Identification (NLI) is mainly concerned with predicting the native language of an author’s writing in a second language. In this paper, we investigate the performance of two types of features; content-based features vs. content independent features when they are evaluated on a different corpus (using social media data “Reddit”). In this NLI task, the predefined models are trained on one corpus (TOEFL) and then the trained models are evaluated on a different data using an external corpus (Reddit). Three classifiers are used in this task; the baseline, linear SVM, and Logistic Regression. Results show that content-based features are more accurate and robust than content independent ones when tested within corpus and across corpus.

Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 327
2262 Network Intrusion Detection Design Using Feature Selection of Soft Computing Paradigms

Authors: T. S. Chou, K. K. Yen, J. Luo

Abstract:

The network traffic data provided for the design of intrusion detection always are large with ineffective information and enclose limited and ambiguous information about users- activities. We study the problems and propose a two phases approach in our intrusion detection design. In the first phase, we develop a correlation-based feature selection algorithm to remove the worthless information from the original high dimensional database. Next, we design an intrusion detection method to solve the problems of uncertainty caused by limited and ambiguous information. In the experiments, we choose six UCI databases and DARPA KDD99 intrusion detection data set as our evaluation tools. Empirical studies indicate that our feature selection algorithm is capable of reducing the size of data set. Our intrusion detection method achieves a better performance than those of participating intrusion detectors.

Keywords: Intrusion detection, feature selection, k-nearest neighbors, fuzzy clustering, Dempster-Shafer theory

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1881
2261 Analytic Network Process in Location Selection and Its Application to a Real Life Problem

Authors: Eylem Koç, Hasan Arda Burhan

Abstract:

Location selection presents a crucial decision problem in today’s business world where strategic decision making processes have critical importance. Thus, location selection has strategic importance for companies in boosting their strength regarding competition, increasing corporate performances and efficiency in addition to lowering production and transportation costs. A right choice in location selection has a direct impact on companies’ commercial success. In this study, a store location selection problem of Carglass Turkey which operates in vehicle glass branch is handled. As this problem includes both tangible and intangible criteria, Analytic Network Process (ANP) was accepted as the main methodology. The model consists of control hierarchy and BOCR subnetworks which include clusters of actors, alternatives and criteria. In accordance with the management’s choices, five different locations were selected. In addition to the literature review, a strict cooperation with the actor group was ensured and maintained while determining the criteria and during whole process. Obtained results were presented to the management as a report and its feasibility was confirmed accordingly.

Keywords: Analytic Network Process, BOCR, location selection, multi-actor decision making, multi-criteria decision making, real life problem.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2039
2260 Anomaly Detection using Neuro Fuzzy system

Authors: Fatemeh Amiri, Caro Lucas, Nasser Yazdani

Abstract:

As the network based technologies become omnipresent, demands to secure networks/systems against threat increase. One of the effective ways to achieve higher security is through the use of intrusion detection systems (IDS), which are a software tool to detect anomalous in the computer or network. In this paper, an IDS has been developed using an improved machine learning based algorithm, Locally Linear Neuro Fuzzy Model (LLNF) for classification whereas this model is originally used for system identification. A key technical challenge in IDS and LLNF learning is the curse of high dimensionality. Therefore a feature selection phase is proposed which is applicable to any IDS. While investigating the use of three feature selection algorithms, in this model, it is shown that adding feature selection phase reduces computational complexity of our model. Feature selection algorithms require the use of a feature goodness measure. The use of both a linear and a non-linear measure - linear correlation coefficient and mutual information- is investigated respectively

Keywords: anomaly Detection, feature selection, Locally Linear Neuro Fuzzy (LLNF), Mutual Information (MI), liner correlation coefficient.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2137
2259 On the Performance of Information Criteria in Latent Segment Models

Authors: Jaime R. S. Fonseca

Abstract:

Nevertheless the widespread application of finite mixture models in segmentation, finite mixture model selection is still an important issue. In fact, the selection of an adequate number of segments is a key issue in deriving latent segments structures and it is desirable that the selection criteria used for this end are effective. In order to select among several information criteria, which may support the selection of the correct number of segments we conduct a simulation study. In particular, this study is intended to determine which information criteria are more appropriate for mixture model selection when considering data sets with only categorical segmentation base variables. The generation of mixtures of multinomial data supports the proposed analysis. As a result, we establish a relationship between the level of measurement of segmentation variables and some (eleven) information criteria-s performance. The criterion AIC3 shows better performance (it indicates the correct number of the simulated segments- structure more often) when referring to mixtures of multinomial segmentation base variables.

Keywords: Quantitative Methods, Multivariate Data Analysis, Clustering, Finite Mixture Models, Information Theoretical Criteria, Simulation experiments.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1478
2258 Image Retrieval: Techniques, Challenge, and Trend

Authors: Hui Hui Wang, Dzulkifli Mohamad, N.A Ismail

Abstract:

This paper attempts to discuss the evolution of the retrieval techniques focusing on development, challenges and trends of the image retrieval. It highlights both the already addressed and outstanding issues. The explosive growth of image data leads to the need of research and development of Image Retrieval. However, Image retrieval researches are moving from keyword, to low level features and to semantic features. Drive towards semantic features is due to the problem of the keywords which can be very subjective and time consuming while low level features cannot always describe high level concepts in the users- mind.

Keywords: content based image retrieval, keyword based imageretrieval, semantic gap, semantic image retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2482
2257 Performance Analysis of Software Reliability Models using Matrix Method

Authors: RajPal Garg, Kapil Sharma, Rajive Kumar, R. K. Garg

Abstract:

This paper presents a computational methodology based on matrix operations for a computer based solution to the problem of performance analysis of software reliability models (SRMs). A set of seven comparison criteria have been formulated to rank various non-homogenous Poisson process software reliability models proposed during the past 30 years to estimate software reliability measures such as the number of remaining faults, software failure rate, and software reliability. Selection of optimal SRM for use in a particular case has been an area of interest for researchers in the field of software reliability. Tools and techniques for software reliability model selection found in the literature cannot be used with high level of confidence as they use a limited number of model selection criteria. A real data set of middle size software project from published papers has been used for demonstration of matrix method. The result of this study will be a ranking of SRMs based on the Permanent value of the criteria matrix formed for each model based on the comparison criteria. The software reliability model with highest value of the Permanent is ranked at number – 1 and so on.

Keywords: Matrix method, Model ranking, Model selection, Model selection criteria, Software reliability models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2270
2256 Efficient Tuning Parameter Selection by Cross-Validated Score in High Dimensional Models

Authors: Yoonsuh Jung

Abstract:

As DNA microarray data contain relatively small sample size compared to the number of genes, high dimensional models are often employed. In high dimensional models, the selection of tuning parameter (or, penalty parameter) is often one of the crucial parts of the modeling. Cross-validation is one of the most common methods for the tuning parameter selection, which selects a parameter value with the smallest cross-validated score. However, selecting a single value as an ‘optimal’ value for the parameter can be very unstable due to the sampling variation since the sample sizes of microarray data are often small. Our approach is to choose multiple candidates of tuning parameter first, then average the candidates with different weights depending on their performance. The additional step of estimating the weights and averaging the candidates rarely increase the computational cost, while it can considerably improve the traditional cross-validation. We show that the selected value from the suggested methods often lead to stable parameter selection as well as improved detection of significant genetic variables compared to the tradition cross-validation via real data and simulated data sets.

Keywords: Cross Validation, Parameter Averaging, Parameter Selection, Regularization Parameter Search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1521
2255 Improving Classification in Bayesian Networks using Structural Learning

Authors: Hong Choon Ong

Abstract:

Naïve Bayes classifiers are simple probabilistic classifiers. Classification extracts patterns by using data file with a set of labeled training examples and is currently one of the most significant areas in data mining. However, Naïve Bayes assumes the independence among the features. Structural learning among the features thus helps in the classification problem. In this study, the use of structural learning in Bayesian Network is proposed to be applied where there are relationships between the features when using the Naïve Bayes. The improvement in the classification using structural learning is shown if there exist relationship between the features or when they are not independent.

Keywords: Bayesian Network, Classification, Naïve Bayes, Structural Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2555
2254 A Hybridized Competency-Based Teacher Candidate Selection System

Authors: R. Ramli, M. I. Ghazali, H. Ibrahim, M. M. Kasim, F. M. Kamal, S.Vikneswari

Abstract:

Teachers form the backbone of any educational system, hence selecting qualified candidates is very crucial. In Malaysia, the decision making in the selection process involves a few stages: Initial filtering through academic achievement, taking entry examination and going through an interview session. The last stage is the most challenging since it highly depends on human judgment. Therefore, this study sought to identify the selection criteria for teacher candidates that form the basis for an efficient multi-criteria teacher-candidate selection model for that last stage. The relevant criteria were determined from the literature and also based on expert input that is those who were involved in interviewing teacher candidates from a public university offering the formal training program. There are three main competency criteria that were identified which are content of knowledge, communication skills and personality. Further, each main criterion was divided into a few subcriteria. The Analytical Hierarchy Process (AHP) technique was employed to allocate weights for the criteria and later, integrated a Simple Weighted Average (SWA) scoring approach to develop the selection model. Subsequently, a web-based Decision Support System was developed to assist in the process of selecting the qualified teacher candidates. The Teacher-Candidate Selection (TeCaS) system is able to assist the panel of interviewers during the selection process which involves a large amount of complex qualitative judgments.

Keywords: Analytic Hierarchy Process, Simple Weighted Average, Decision Support System, Multi-criteria decision making problem.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2139
2253 Ontology-based Concept Weighting for Text Documents

Authors: Hmway Hmway Tar, Thi Thi Soe Nyaunt

Abstract:

Documents clustering become an essential technology with the popularity of the Internet. That also means that fast and high-quality document clustering technique play core topics. Text clustering or shortly clustering is about discovering semantically related groups in an unstructured collection of documents. Clustering has been very popular for a long time because it provides unique ways of digesting and generalizing large amounts of information. One of the issues of clustering is to extract proper feature (concept) of a problem domain. The existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features including concept weight are important. Feature Selection is important for clustering process because some of the irrelevant or redundant feature may misguide the clustering results. To counteract this issue, the proposed system presents the concept weight for text clustering system developed based on a k-means algorithm in accordance with the principles of ontology so that the important of words of a cluster can be identified by the weight values. To a certain extent, it has resolved the semantic problem in specific areas.

Keywords: Clustering, Concept Weight, Document clustering, Feature Selection, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2362
2252 Gene Selection Guided by Feature Interdependence

Authors: Hung-Ming Lai, Andreas Albrecht, Kathleen Steinhöfel

Abstract:

Cancers could normally be marked by a number of differentially expressed genes which show enormous potential as biomarkers for a certain disease. Recent years, cancer classification based on the investigation of gene expression profiles derived by high-throughput microarrays has widely been used. The selection of discriminative genes is, therefore, an essential preprocess step in carcinogenesis studies. In this paper, we have proposed a novel gene selector using information-theoretic measures for biological discovery. This multivariate filter is a four-stage framework through the analyses of feature relevance, feature interdependence, feature redundancy-dependence and subset rankings, and having been examined on the colon cancer data set. Our experimental result show that the proposed method outperformed other information theorem based filters in all aspect of classification errors and classification performance.

Keywords: Colon cancer, feature interdependence, feature subset selection, gene selection, microarray data analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2097
2251 Correlation-based Feature Selection using Ant Colony Optimization

Authors: M. Sadeghzadeh, M. Teshnehlab

Abstract:

Feature selection has recently been the subject of intensive research in data mining, specially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes the Ant Colony Optimization (ACO) is presented. The ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.

Keywords: Ant colony optimization, Classification, Datamining, Feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2375
2250 Dynamic Features Selection for Heart Disease Classification

Authors: Walid MOUDANI

Abstract:

The healthcare environment is generally perceived as being information rich yet knowledge poor. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. In fact, valuable knowledge can be discovered from application of data mining techniques in healthcare system. In this study, a proficient methodology for the extraction of significant patterns from the Coronary Heart Disease warehouses for heart attack prediction, which unfortunately continues to be a leading cause of mortality in the whole world, has been presented. For this purpose, we propose to enumerate dynamically the optimal subsets of the reduced features of high interest by using rough sets technique associated to dynamic programming. Therefore, we propose to validate the classification using Random Forest (RF) decision tree to identify the risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions based on the medical profile of patient. Moreover, the experts- knowledge in this field has been taken into consideration in order to define the disease, its risk factors, and to establish significant knowledge relationships among the medical factors. A computer-aided system is developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated based on set of benchmark techniques applied in this classification problem.

Keywords: Multi-Classifier Decisions Tree, Features Reduction, Dynamic Programming, Rough Sets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2483
2249 Logistics Information and Customer Service

Authors: Š. Čemerková, M. Wilczková

Abstract:

The paper deals with the importance of information flow for providing of defined level of customer service in the firms. Setting of the criteria for the selection and implementation of logistics information system is a prerequisite for ensuring of the flow of information in firms. The decision on the selection and implementation of logistics information system is linked to the investment costs and operating costs, which are included in the total logistics costs. The article also deals with the conclusions of the research focused on the logistics information system selection in companies in the Czech Republic.

Keywords: Customer service, information system, logistics, research.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1627
2248 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: Machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 884
2247 Analysis of Initial Entry-Level Technology Course Impacts on STEM Major Selection

Authors: Ethan Shafer, Timothy Graziano, Jay Fisher

Abstract:

This research seeks to answer whether first-year courses at institutions of higher learning can impact STEM major selection. Unlike many universities, an entry-level technology course (often referred to as CS0) is required for all United States Military Academy (USMA) students–regardless of major–in their first year of attendance. Students at the Academy choose their major at the end of their first year of studies. Through student responses to a multi-semester survey, this paper identifies a number of factors that potentially influence STEM major selection. Student demographic data, pre-existing exposure and access to technology, perceptions of STEM subjects, and initial desire for a STEM major are captured before and after taking a CS0 course. An analysis of factors that contribute to student perception of STEM and major selection are presented. This work provides recommendations and suggestions for institutions currently providing or looking to provide CS0-like courses to their students.

Keywords: STEM major, STEM, pedagogy, digital literacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 130
2246 Using the Keystrokes Dynamic for Systems of Personal Security

Authors: Gláucya C. Boechat, Jeneffer C. Ferreira, Edson C. B. Carvalho

Abstract:

This paper presents a boarding on biometric authentication through the Keystrokes Dynamics that it intends to identify a person from its habitual rhythm to type in conventional keyboard. Seven done experiments: verifying amount of prototypes, threshold, features and the variation of the choice of the times of the features vector. The results show that the use of the Keystroke Dynamics is simple and efficient for personal authentication, getting optimum resulted using 90% of the features with 4.44% FRR and 0% FAR.

Keywords: Biometrics techniques, Keystroke Dynamics, patternrecognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1694
2245 A Feasible Path Selection QoS Routing Algorithm with two Constraints in Packet Switched Networks

Authors: P.S.Prakash, S.Selvan

Abstract:

Over the past several years, there has been a considerable amount of research within the field of Quality of Service (QoS) support for distributed multimedia systems. One of the key issues in providing end-to-end QoS guarantees in packet networks is determining a feasible path that satisfies a number of QoS constraints. The problem of finding a feasible path is NPComplete if number of constraints is more than two and cannot be exactly solved in polynomial time. We proposed Feasible Path Selection Algorithm (FPSA) that addresses issues with pertain to finding a feasible path subject to delay and cost constraints and it offers higher success rate in finding feasible paths.

Keywords: feasible path, multiple constraints, path selection, QoS routing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1703
2244 Genetic Algorithms and Kernel Matrix-based Criteria Combined Approach to Perform Feature and Model Selection for Support Vector Machines

Authors: A. Perolini

Abstract:

Feature and model selection are in the center of attention of many researches because of their impact on classifiers- performance. Both selections are usually performed separately but recent developments suggest using a combined GA-SVM approach to perform them simultaneously. This approach improves the performance of the classifier identifying the best subset of variables and the optimal parameters- values. Although GA-SVM is an effective method it is computationally expensive, thus a rough method can be considered. The paper investigates a joined approach of Genetic Algorithm and kernel matrix criteria to perform simultaneously feature and model selection for SVM classification problem. The purpose of this research is to improve the classification performance of SVM through an efficient approach, the Kernel Matrix Genetic Algorithm method (KMGA).

Keywords: Feature and model selection, Genetic Algorithms, Support Vector Machines, kernel matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1549
2243 Investigation on Feature Extraction and Classification of Medical Images

Authors: P. Gnanasekar, A. Nagappan, S. Sharavanan, O. Saravanan, D. Vinodkumar, T. Elayabharathi, G. Karthik

Abstract:

In this paper we present the deep study about the Bio- Medical Images and tag it with some basic extracting features (e.g. color, pixel value etc). The classification is done by using a nearest neighbor classifier with various distance measures as well as the automatic combination of classifier results. This process selects a subset of relevant features from a group of features of the image. It also helps to acquire better understanding about the image by describing which the important features are. The accuracy can be improved by increasing the number of features selected. Various types of classifications were evolved for the medical images like Support Vector Machine (SVM) which is used for classifying the Bacterial types. Ant Colony Optimization method is used for optimal results. It has high approximation capability and much faster convergence, Texture feature extraction method based on Gabor wavelets etc..

Keywords: ACO Ant Colony Optimization, Correlogram, CCM Co-Occurrence Matrix, RTS Rough-Set theory

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2965