Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 21

Search results for: information gain

21 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: Machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 336
20 Underrepresentation of Women in Management Information Systems: Gender Differences in Key Environmental Barriers

Authors: Asli Yagmur Akbulut

Abstract:

Despite a robust and growing job market and lucrative salaries, there is a global shortage of Information Technology (IT) professionals. To make matters worse, women continue to be underrepresented in the IT workforce and among IT degree holders. In today’s knowledge based economy and society, it is extremely important to increase the presence of women in the IT field. In order to do so, it is necessary to reduce entry barriers and attract more women to pursue degrees in various IT fields including the field of Management Information Systems (MIS). Even though MIS is considered to have a more feminine nature, women still tend to avoid majoring in this field. Unfortunately, there is a lack of research that investigates the specific factors that may deter women from pursuing a degree in MIS. To address this research gap, this study examined a set of key environmental barriers that might prevent women from pursuing an MIS degree and explored whether there were any gender differences between female and male students in terms of these key barriers. Based on a survey of 280 students enrolled in an introductory level MIS course, the study empirically confirmed that there were significant differences between male and female students in terms of the key contextual barriers perceived. Female students demonstrated major concerns about gender discrimination related barriers, whereas male students were more concerned about negative social influences. Both male and female students were equally concerned about not being able to fit in well with other MIS majors. The findings have important implications for MIS programs, as the information gained can be used to design and implement specific intervention strategies to overcome the barriers and attract larger pools of women to the MIS discipline. The paper concludes with a discussion of the findings, implications, and future research directions.

Keywords: Gender differences, MIS major, underrepresentation, women in IT.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 778
19 A Distributed Mobile Agent Based on Intrusion Detection System for MANET

Authors: Maad Kamal Al-Anni

Abstract:

This study is about an algorithmic dependence of Artificial Neural Network on Multilayer Perceptron (MPL) pertaining to the classification and clustering presentations for Mobile Adhoc Network vulnerabilities. Moreover, mobile ad hoc network (MANET) is ubiquitous intelligent internetworking devices in which it has the ability to detect their environment using an autonomous system of mobile nodes that are connected via wireless links. Security affairs are the most important subject in MANET due to the easy penetrative scenarios occurred in such an auto configuration network. One of the powerful techniques used for inspecting the network packets is Intrusion Detection System (IDS); in this article, we are going to show the effectiveness of artificial neural networks used as a machine learning along with stochastic approach (information gain) to classify the malicious behaviors in simulated network with respect to different IDS techniques. The monitoring agent is responsible for detection inference engine, the audit data is collected from collecting agent by simulating the node attack and contrasted outputs with normal behaviors of the framework, whenever. In the event that there is any deviation from the ordinary behaviors then the monitoring agent is considered this event as an attack , in this article we are going to demonstrate the  signature-based IDS approach in a MANET by implementing the back propagation algorithm over ensemble-based Traffic Table (TT), thus the signature of malicious behaviors or undesirable activities are often significantly prognosticated and efficiently figured out, by increasing the parametric set-up of Back propagation algorithm during the experimental results which empirically shown its effectiveness  for the ratio of detection index up to 98.6 percentage. Consequently it is proved in empirical results in this article, the performance matrices are also being included in this article with Xgraph screen show by different through puts like Packet Delivery Ratio (PDR), Through Put(TP), and Average Delay(AD).

Keywords: Mobile ad hoc network, MANET, intrusion detection system, back propagation algorithm, neural networks, traffic table, multilayer perceptron, feed-forward back-propagation, network simulator 2.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 498
18 Application of Data Mining Techniques for Tourism Knowledge Discovery

Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Abstract:

Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.

Keywords: Classification algorithms; data mining; tourism; knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1696
17 A Static Android Malware Detection Based on Actual Used Permissions Combination and API Calls

Authors: Xiaoqing Wang, Junfeng Wang, Xiaolan Zhu

Abstract:

Android operating system has been recognized by most application developers because of its good open-source and compatibility, which enriches the categories of applications greatly. However, it has become the target of malware attackers due to the lack of strict security supervision mechanisms, which leads to the rapid growth of malware, thus bringing serious safety hazards to users. Therefore, it is critical to detect Android malware effectively. Generally, the permissions declared in the AndroidManifest.xml can reflect the function and behavior of the application to a large extent. Since current Android system has not any restrictions to the number of permissions that an application can request, developers tend to apply more than actually needed permissions in order to ensure the successful running of the application, which results in the abuse of permissions. However, some traditional detection methods only consider the requested permissions and ignore whether it is actually used, which leads to incorrect identification of some malwares. Therefore, a machine learning detection method based on the actually used permissions combination and API calls was put forward in this paper. Meanwhile, several experiments are conducted to evaluate our methodology. The result shows that it can detect unknown malware effectively with higher true positive rate and accuracy while maintaining a low false positive rate. Consequently, the AdaboostM1 (J48) classification algorithm based on information gain feature selection algorithm has the best detection result, which can achieve an accuracy of 99.8%, a true positive rate of 99.6% and a lowest false positive rate of 0.

Keywords: Android, permissions combination, API calls, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1355
16 Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems

Authors: Bruno Trstenjak, Dzenana Donko

Abstract:

Data mining and classification of objects is the process of data analysis, using various machine learning techniques, which is used today in various fields of research. This paper presents a concept of hybrid classification model improved with the expert knowledge. The hybrid model in its algorithm has integrated several machine learning techniques (Information Gain, K-means, and Case-Based Reasoning) and the expert’s knowledge into one. The knowledge of experts is used to determine the importance of features. The paper presents the model algorithm and the results of the case study in which the emphasis was put on achieving the maximum classification accuracy without reducing the number of features.

Keywords: Case based reasoning, classification, expert's knowledge, hybrid model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1006
15 Support of Knowledge Sharing in Manufacturing Companies: A Case Study

Authors: Zuzana Crhova, Karel Kolman, Drahomíra Pavelkova

Abstract:

Knowledge is considered as an important asset which can help organizations to create competitive advantage. The necessity of taking care of these assets is more important in these days – in days of turbulent changes in business environment. Knowledge could facilitate adaption to constant changes. The aim of this paper is to describe how the knowledge sharing can be supported in the manufacturing companies. The methods of case studies and grounded theory were used to present information gained by carrying out semistructured interviews. Results show that knowledge sharing is supported in very similar ways in respondent companies.

Keywords: Case Study, Human Resource Management, Knowledge, Knowledge Sharing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1869
14 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein

Abstract:

The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.

Keywords: Information Gain (IG), Intrusion Detection System (IDS), K-means Clustering, Weka.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2326
13 Airfoils Aerodynamic Efficiency Study in Heavy Rain via Two Phase Flow Approach

Authors: M. Ismail, Cao Yihua, Zhao Ming

Abstract:

Heavy rainfall greatly affects the aerodynamic performance of the aircraft. There are many accidents of aircraft caused by aerodynamic efficiency degradation by heavy rain. In this Paper we have studied the heavy rain effects on the aerodynamic efficiency of NACA 64-210 & NACA 0012 airfoils. For our analysis, CFD method and preprocessing grid generator are used as our main analytical tools, and the simulation of rain is accomplished via two phase flow approach-s Discrete Phase Model (DPM). Raindrops are assumed to be non-interacting, non-deforming, non-evaporating and non-spinning spheres. Both airfoil sections exhibited significant reduction in lift and increase in drag for a given lift condition in simulated rain. The most significant difference between these two airfoils was the sensitivity of the NACA 64-210 to liquid water content (LWC), while NACA 0012 performance losses in the rain environment is not a function of LWC . It is expected that the quantitative information gained in this paper will be useful to the operational airline industry and greater effort such as small scale and full scale flight tests should put in this direction to further improve aviation safety.

Keywords: airfoil, discrete phase modeling, heavy rain, Reynolds number

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3155
12 Emotion Classification for Students with Autism in Mathematics E-learning using Physiological and Facial Expression Measures

Authors: Hui-Chuan Chu, Min-Ju Liao, Wei-Kai Cheng, William Wei-Jen Tsai, Yuh-Min Chen

Abstract:

Avoiding learning failures in mathematics e-learning environments caused by emotional problems in students with autism has become an important topic for combining of special education with information and communications technology. This study presents an adaptive emotional adjustment model in mathematics e-learning for students with autism, emphasizing the lack of emotional perception in mathematics e-learning systems. In addition, an emotion classification for students with autism was developed by inducing emotions in mathematical learning environments to record changes in the physiological signals and facial expressions of students. Using these methods, 58 emotional features were obtained. These features were then processed using one-way ANOVA and information gain (IG). After reducing the feature dimension, methods of support vector machines (SVM), k-nearest neighbors (KNN), and classification and regression trees (CART) were used to classify four emotional categories: baseline, happy, angry, and anxious. After testing and comparisons, in a situation without feature selection, the accuracy rate of the SVM classification can reach as high as 79.3-%. After using IG to reduce the feature dimension, with only 28 features remaining, SVM still has a classification accuracy of 78.2-%. The results of this research could enhance the effectiveness of eLearning in special education.

Keywords: Emotion classification, Physiological and facial Expression measures, Students with autism, Mathematics e-learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1374
11 Parkinsons Disease Classification using Neural Network and Feature Selection

Authors: Anchana Khemphila, Veera Boonjing

Abstract:

In this study, the Multi-Layer Perceptron (MLP)with Back-Propagation learning algorithm are used to classify to effective diagnosis Parkinsons disease(PD).It-s a challenging problem for medical community.Typically characterized by tremor, PD occurs due to the loss of dopamine in the brains thalamic region that results in involuntary or oscillatory movement in the body. A feature selection algorithm along with biomedical test values to diagnose Parkinson disease.Clinical diagnosis is done mostly by doctor-s expertise and experience.But still cases are reported of wrong diagnosis and treatment. Patients are asked to take number of tests for diagnosis.In many cases,not all the tests contribute towards effective diagnosis of a disease.Our work is to classify the presence of Parkinson disease with reduced number of attributes.Original,22 attributes are involved in classify.We use Information Gain to determine the attributes which reduced the number of attributes which is need to be taken from patients.The Artificial neural networks is used to classify the diagnosis of patients.Twenty-Two attributes are reduced to sixteen attributes.The accuracy is in training data set is 82.051% and in the validation data set is 83.333%.

Keywords: Data mining, classification, Parkinson disease, artificial neural networks, feature selection, information gain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2881
10 A Proposed Hybrid Approach for Feature Selection in Text Document Categorization

Authors: M. F. Zaiyadi, B. Baharudin

Abstract:

Text document categorization involves large amount of data or features. The high dimensionality of features is a troublesome and can affect the performance of the classification. Therefore, feature selection is strongly considered as one of the crucial part in text document categorization. Selecting the best features to represent documents can reduce the dimensionality of feature space hence increase the performance. There were many approaches has been implemented by various researchers to overcome this problem. This paper proposed a novel hybrid approach for feature selection in text document categorization based on Ant Colony Optimization (ACO) and Information Gain (IG). We also presented state-of-the-art algorithms by several other researchers.

Keywords: Ant colony optimization, feature selection, information gain, text categorization, text representation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1686
9 Information Gain Ratio Based Clustering for Investigation of Environmental Parameters Effects on Human Mental Performance

Authors: H. Mehdi, Kh. S. Karimov, A. A. Kavokin

Abstract:

Methods of clustering which were developed in the data mining theory can be successfully applied to the investigation of different kinds of dependencies between the conditions of environment and human activities. It is known, that environmental parameters such as temperature, relative humidity, atmospheric pressure and illumination have significant effects on the human mental performance. To investigate these parameters effect, data mining technique of clustering using entropy and Information Gain Ratio (IGR) K(Y/X) = (H(X)–H(Y/X))/H(Y) is used, where H(Y)=-ΣPi ln(Pi). This technique allows adjusting the boundaries of clusters. It is shown that the information gain ratio (IGR) grows monotonically and simultaneously with degree of connectivity between two variables. This approach has some preferences if compared, for example, with correlation analysis due to relatively smaller sensitivity to shape of functional dependencies. Variant of an algorithm to implement the proposed method with some analysis of above problem of environmental effects is also presented. It was shown that proposed method converges with finite number of steps.

Keywords: Clustering, Correlation analysis, EnvironmentalParameters, Information Gain Ratio, Mental Performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1500
8 Adaptive Network Intrusion Detection Learning: Attribute Selection and Classification

Authors: Dewan Md. Farid, Jerome Darmont, Nouria Harbi, Nguyen Huu Hoa, Mohammad Zahidur Rahman

Abstract:

In this paper, a new learning approach for network intrusion detection using naïve Bayesian classifier and ID3 algorithm is presented, which identifies effective attributes from the training dataset, calculates the conditional probabilities for the best attribute values, and then correctly classifies all the examples of training and testing dataset. Most of the current intrusion detection datasets are dynamic, complex and contain large number of attributes. Some of the attributes may be redundant or contribute little for detection making. It has been successfully tested that significant attribute selection is important to design a real world intrusion detection systems (IDS). The purpose of this study is to identify effective attributes from the training dataset to build a classifier for network intrusion detection using data mining algorithms. The experimental results on KDD99 benchmark intrusion detection dataset demonstrate that this new approach achieves high classification rates and reduce false positives using limited computational resources.

Keywords: Attributes selection, Conditional probabilities, information gain, network intrusion detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2333
7 Dimensionality Reduction of PSSM Matrix and its Influence on Secondary Structure and Relative Solvent Accessibility Predictions

Authors: Rafał Adamczak

Abstract:

State-of-the-art methods for secondary structure (Porter, Psi-PRED, SAM-T99sec, Sable) and solvent accessibility (Sable, ACCpro) predictions use evolutionary profiles represented by the position specific scoring matrix (PSSM). It has been demonstrated that evolutionary profiles are the most important features in the feature space for these predictions. Unfortunately applying PSSM matrix leads to high dimensional feature spaces that may create problems with parameter optimization and generalization. Several recently published suggested that applying feature extraction for the PSSM matrix may result in improvements in secondary structure predictions. However, none of the top performing methods considered here utilizes dimensionality reduction to improve generalization. In the present study, we used simple and fast methods for features selection (t-statistics, information gain) that allow us to decrease the dimensionality of PSSM matrix by 75% and improve generalization in the case of secondary structure prediction compared to the Sable server.

Keywords: Secondary structure prediction, feature selection, position specific scoring matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1609
6 Oncogene Identification using Filter based Approaches between Various Cancer Types in Lung

Authors: Michael Netzer, Michael Seger, Mahesh Visvanathan, Bernhard Pfeifer, Gerald H. Lushington, Christian Baumgartner

Abstract:

Lung cancer accounts for the most cancer related deaths for men as well as for women. The identification of cancer associated genes and the related pathways are essential to provide an important possibility in the prevention of many types of cancer. In this work two filter approaches, namely the information gain and the biomarker identifier (BMI) are used for the identification of different types of small-cell and non-small-cell lung cancer. A new method to determine the BMI thresholds is proposed to prioritize genes (i.e., primary, secondary and tertiary) using a k-means clustering approach. Sets of key genes were identified that can be found in several pathways. It turned out that the modified BMI is well suited for microarray data and therefore BMI is proposed as a powerful tool for the search for new and so far undiscovered genes related to cancer.

Keywords: lung cancer, micro arrays, data mining, feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1364
5 Evolutionary Feature Selection for Text Documents using the SVM

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, we present three feature selection methods: Information Gain, Support Vector Machine feature selection called (SVM_FS) and Genetic Algorithm with SVM (called GA_SVM). We show that the best results were obtained with GA_SVM method for a relatively small dimension of the feature vector.

Keywords: Feature Selection, Learning with Kernels, Support Vector Machine, Genetic Algorithm, and Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1297
4 Evaluating some Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of features selection methods to reduce the dimensionality of the document-representation vector. Four feature selection methods are evaluated: Random Selection, Information Gain (IG), Support Vector Machine (called SVM_FS) and Genetic Algorithm with SVM (GA_FS). We showed that the best results were obtained with SVM_FS and GA_FS methods for a relatively small dimension of the features vector comparative with the IG method that involves longer vectors, for quite similar classification accuracies. Also we present a novel method to better correlate SVM kernel-s parameters (Polynomial or Gaussian kernel).

Keywords: Features selection, learning with kernels, support vector machine, genetic algorithms and classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1200
3 Bureau Management Technologies and Information Systems in Developing Countries

Authors: Mehmet Altınöz

Abstract:

This study focuses on bureau management technologies and information systems in developing countries. Developing countries use such systems which facilitate executive and organizational functions through the utilization of bureau management technologies and provide the executive staff with necessary information. The concepts of data and information differ from each other in developing countries, and thus the concepts of data processing and information processing are different. Symbols represent ideas, objects, figures, letters and numbers. Data processing system is an integrated system which deals with the processing of the data related to the internal and external environment of the organization in order to make decisions, create plans and develop strategies; it goes without saying that this system is composed of both human beings and machines. Information is obtained through the acquisition and the processing of data. On the other hand, data are raw communicative messages. Within this framework, data processing equals to producing plausible information out of raw data. Organizations in developing countries need to obtain information relevant to them because rapid changes in the organizational arena require rapid access to accurate information. The most significant role of the directors and managers who work in the organizational arena is to make decisions. Making a correct decision is possible only when the directors and managers are equipped with sound ideas and appropriate information. Therefore, acquisition, organization and distribution of information gain significance. Today-s organizations make use of computer-assisted “Management Information Systems" in order to obtain and distribute information. Decision Support System which is closely related to practice is an information system that facilitates the director-s task of making decisions. Decision Support System integrates human intelligence, information technology and software in order to solve the complex problems. With the support of the computer technology and software systems, Decision Support System produces information relevant to the decision to be made by the director and provides the executive staff with supportive ideas about the decision. Artificial Intelligence programs which transfer the studies and experiences of the people to the computer are called expert systems. An expert system stores expert information in a limited area and can solve problems by deriving rational consequences. Bureau management technologies and information systems in developing countries create a kind of information society and information economy which make those countries have their places in the global socio-economic structure and which enable them to play a reasonable and fruitful role; therefore it is of crucial importance to make use of information and management technologies in order to work together with innovative and enterprising individuals and it is also significant to create “scientific policies" based on information and technology in the fields of economy, politics, law and culture.

Keywords: Bureau Management, Information Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1134
2 Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, three feature selection methods are evaluated: Random Selection, Information Gain (IG) and Support Vector Machine feature selection (called SVM_FS). We show that the best results were obtained with SVM_FS method for a relatively small dimension of the feature vector. Also we present a novel method to better correlate SVM kernel-s parameters (Polynomial or Gaussian kernel).

Keywords: Feature Selection, Learning with Kernels, SupportVector Machine, and Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1331
1 Towards an Enhanced Stochastic Simulation Model for Risk Analysis in Highway Construction

Authors: Anshu Manik, William G. Buttlar, Kasthurirangan Gopalakrishnan

Abstract:

Over the years, there is a growing trend towards quality-based specifications in highway construction. In many Quality Control/Quality Assurance (QC/QA) specifications, the contractor is primarily responsible for quality control of the process, whereas the highway agency is responsible for testing the acceptance of the product. A cooperative investigation was conducted in Illinois over several years to develop a prototype End-Result Specification (ERS) for asphalt pavement construction. The final characteristics of the product are stipulated in the ERS and the contractor is given considerable freedom in achieving those characteristics. The risk for the contractor or agency depends on how the acceptance limits and processes are specified. Stochastic simulation models are very useful in estimating and analyzing payment risk in ERS systems and these form an integral part of the Illinois-s prototype ERS system. This paper describes the development of an innovative methodology to estimate the variability components in in-situ density, air voids and asphalt content data from ERS projects. The information gained from this would be crucial in simulating these ERS projects for estimation and analysis of payment risks associated with asphalt pavement construction. However, these methods require at least two parties to conduct tests on all the split samples obtained according to the sampling scheme prescribed in present ERS implemented in Illinois.

Keywords: Asphalt Pavement, Risk Analysis, StochasticSimulation, QC/QA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1147