Search results for: efficient features selection

3972 An Automatic Bayesian Classification System for File Format Selection

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for the classification of an unstructured format description for identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection with just the unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format indentification method employs file format classifier and associated configurations to support digital preservation experts with an estimation of required file format. Our goal is to make use of a format specification knowledge base aggregated from a different Web sources in order to select file format for a particular institution. Using the naive Bayes method, the decision support system recommends to an expert, the file format for his institution. The proposed methods facilitate the selection of file format and the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and specifications of file formats. To facilitate decision-making, the aggregated information about the file formats is presented as a file format vocabulary that comprises most common terms that are characteristic for all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. The sample file format calculation and the calculation results including probabilities are presented in the evaluation section.

Keywords: Data mining, digital libraries, digital preservation, file format.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1615

3971 Multi-Criteria Decision-Making Selection Model with Application to Chemical Engineering Management Decisions

Authors: Mohsen Pirdashti, Arezou Ghadi, Mehrdad Mohammadi, Gholamreza Shojatalab

Abstract:

Chemical industry project management involves complex decision making situations that require discerning abilities and methods to make sound decisions. Project managers are faced with decision environments and problems in projects that are complex. In this work, case study is Research and Development (R&D) project selection. R&D is an ongoing process for forward thinking technology-based chemical industries. R&D project selection is an important task for organizations with R&D project management. It is a multi-criteria problem which includes both tangible and intangible factors. The ability to make sound decisions is very important to success of R&D projects. Multiple-criteria decision making (MCDM) approaches are major parts of decision theory and analysis. This paper presents all of MCDM approaches for use in R&D project selection. It is hoped that this work will provide a ready reference on MCDM and this will encourage the application of the MCDM by chemical engineering management.

Keywords: Chemical Engineering, R&D Project, MCDM, Selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4024

3970 Automatic Classification of the Stand-to-Sit Phase in the TUG Test Using Machine Learning

Authors: Y. A. Adla, R. Soubra, M. Kasab, M. O. Diab, A. Chkeir

Abstract:

Over the past several years, researchers have shown a great interest in assessing the mobility of elderly people to measure their functional status. Usually, such an assessment is done by conducting tests that require the subject to walk a certain distance, turn around, and finally sit back down. Consequently, this study aims to provide an at home monitoring system to assess the patient’s status continuously. Thus, we proposed a technique to automatically detect when a subject sits down while walking at home. In this study, we utilized a Doppler radar system to capture the motion of the subjects. More than 20 features were extracted from the radar signals out of which 11 were chosen based on their Intraclass Correlation Coefficient (ICC > 0.75). Accordingly, the sequential floating forward selection wrapper was applied to further narrow down the final feature vector. Finally, five features were introduced to the Linear Discriminant Analysis classifier and an accuracy of 93.75% was achieved as well as a precision and recall of 95% and 90% respectively.

Keywords: Doppler radar system, stand-to-sit phase, TUG test, machine learning, classification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 379

3969 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: 'Reddit'

Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell

Abstract:

Native Language Identification is one of the growing subfields in Natural Language Processing (NLP). The task of Native Language Identification (NLI) is mainly concerned with predicting the native language of an author’s writing in a second language. In this paper, we investigate the performance of two types of features; content-based features vs. content independent features when they are evaluated on a different corpus (using social media data “Reddit”). In this NLI task, the predefined models are trained on one corpus (TOEFL) and then the trained models are evaluated on a different data using an external corpus (Reddit). Three classifiers are used in this task; the baseline, linear SVM, and Logistic Regression. Results show that content-based features are more accurate and robust than content independent ones when tested within corpus and across corpus.

Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 325

3968 Network Intrusion Detection Design Using Feature Selection of Soft Computing Paradigms

Authors: T. S. Chou, K. K. Yen, J. Luo

Abstract:

The network traffic data provided for the design of intrusion detection always are large with ineffective information and enclose limited and ambiguous information about users- activities. We study the problems and propose a two phases approach in our intrusion detection design. In the first phase, we develop a correlation-based feature selection algorithm to remove the worthless information from the original high dimensional database. Next, we design an intrusion detection method to solve the problems of uncertainty caused by limited and ambiguous information. In the experiments, we choose six UCI databases and DARPA KDD99 intrusion detection data set as our evaluation tools. Empirical studies indicate that our feature selection algorithm is capable of reducing the size of data set. Our intrusion detection method achieves a better performance than those of participating intrusion detectors.

Keywords: Intrusion detection, feature selection, k-nearest neighbors, fuzzy clustering, Dempster-Shafer theory

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1881

3967 Analytic Network Process in Location Selection and Its Application to a Real Life Problem

Authors: Eylem Koç, Hasan Arda Burhan

Abstract:

Location selection presents a crucial decision problem in today’s business world where strategic decision making processes have critical importance. Thus, location selection has strategic importance for companies in boosting their strength regarding competition, increasing corporate performances and efficiency in addition to lowering production and transportation costs. A right choice in location selection has a direct impact on companies’ commercial success. In this study, a store location selection problem of Carglass Turkey which operates in vehicle glass branch is handled. As this problem includes both tangible and intangible criteria, Analytic Network Process (ANP) was accepted as the main methodology. The model consists of control hierarchy and BOCR subnetworks which include clusters of actors, alternatives and criteria. In accordance with the management’s choices, five different locations were selected. In addition to the literature review, a strict cooperation with the actor group was ensured and maintained while determining the criteria and during whole process. Obtained results were presented to the management as a report and its feasibility was confirmed accordingly.

Keywords: Analytic Network Process, BOCR, location selection, multi-actor decision making, multi-criteria decision making, real life problem.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2039

3966 Anomaly Detection using Neuro Fuzzy system

Authors: Fatemeh Amiri, Caro Lucas, Nasser Yazdani

Abstract:

As the network based technologies become omnipresent, demands to secure networks/systems against threat increase. One of the effective ways to achieve higher security is through the use of intrusion detection systems (IDS), which are a software tool to detect anomalous in the computer or network. In this paper, an IDS has been developed using an improved machine learning based algorithm, Locally Linear Neuro Fuzzy Model (LLNF) for classification whereas this model is originally used for system identification. A key technical challenge in IDS and LLNF learning is the curse of high dimensionality. Therefore a feature selection phase is proposed which is applicable to any IDS. While investigating the use of three feature selection algorithms, in this model, it is shown that adding feature selection phase reduces computational complexity of our model. Feature selection algorithms require the use of a feature goodness measure. The use of both a linear and a non-linear measure - linear correlation coefficient and mutual information- is investigated respectively

Keywords: anomaly Detection, feature selection, Locally Linear Neuro Fuzzy (LLNF), Mutual Information (MI), liner correlation coefficient.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2137

3965 Object Identification with Color, Texture, and Object-Correlation in CBIR System

Authors: Awais Adnan, Muhammad Nawaz, Sajid Anwar, Tamleek Ali, Muhammad Ali

Abstract:

Needs of an efficient information retrieval in recent years in increased more then ever because of the frequent use of digital information in our life. We see a lot of work in the area of textual information but in multimedia information, we cannot find much progress. In text based information, new technology of data mining and data marts are now in working that were started from the basic concept of database some where in 1960. In image search and especially in image identification, computerized system at very initial stages. Even in the area of image search we cannot see much progress as in the case of text based search techniques. One main reason for this is the wide spread roots of image search where many area like artificial intelligence, statistics, image processing, pattern recognition play their role. Even human psychology and perception and cultural diversity also have their share for the design of a good and efficient image recognition and retrieval system. A new object based search technique is presented in this paper where object in the image are identified on the basis of their geometrical shapes and other features like color and texture where object-co-relation augments this search process. To be more focused on objects identification, simple images are selected for the work to reduce the role of segmentation in overall process however same technique can also be applied for other images.

Keywords: Object correlation, Geometrical shape, Color, texture, features, contents.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1981

3964 On the Performance of Information Criteria in Latent Segment Models

Authors: Jaime R. S. Fonseca

Abstract:

Nevertheless the widespread application of finite mixture models in segmentation, finite mixture model selection is still an important issue. In fact, the selection of an adequate number of segments is a key issue in deriving latent segments structures and it is desirable that the selection criteria used for this end are effective. In order to select among several information criteria, which may support the selection of the correct number of segments we conduct a simulation study. In particular, this study is intended to determine which information criteria are more appropriate for mixture model selection when considering data sets with only categorical segmentation base variables. The generation of mixtures of multinomial data supports the proposed analysis. As a result, we establish a relationship between the level of measurement of segmentation variables and some (eleven) information criteria-s performance. The criterion AIC3 shows better performance (it indicates the correct number of the simulated segments- structure more often) when referring to mixtures of multinomial segmentation base variables.

Keywords: Quantitative Methods, Multivariate Data Analysis, Clustering, Finite Mixture Models, Information Theoretical Criteria, Simulation experiments.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1477

3963 Network Coding-based ARQ scheme with Overlapping Selection for Resource Limited Multicast/Broadcast Services

Authors: Jung-Hyun Kim, Jihyung Kim, Kwangjae Lim, Dong Seung Kwon

Abstract:

Network coding has recently attracted attention as an efficient technique in multicast/broadcast services. The problem of finding the optimal network coding mechanism maximizing the bandwidth efficiency is hard to solve and hard to approximate. Lots of network coding-based schemes have been suggested in the literature to improve the bandwidth efficiency, especially network coding-based automatic repeat request (NCARQ) schemes. However, existing schemes have several limitations which cause the performance degradation in resource limited systems. To improve the performance in resource limited systems, we propose NCARQ with overlapping selection (OS-NCARQ) scheme. The advantages of OS-NCARQ scheme over the traditional ARQ scheme and existing NCARQ schemes are shown through the analysis and simulations.

Keywords: ARQ, Network coding, Multicast/Broadcast services, Packet-based systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1471

3962 Image Retrieval: Techniques, Challenge, and Trend

Authors: Hui Hui Wang, Dzulkifli Mohamad, N.A Ismail

Abstract:

This paper attempts to discuss the evolution of the retrieval techniques focusing on development, challenges and trends of the image retrieval. It highlights both the already addressed and outstanding issues. The explosive growth of image data leads to the need of research and development of Image Retrieval. However, Image retrieval researches are moving from keyword, to low level features and to semantic features. Drive towards semantic features is due to the problem of the keywords which can be very subjective and time consuming while low level features cannot always describe high level concepts in the users- mind.

Keywords: content based image retrieval, keyword based imageretrieval, semantic gap, semantic image retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2482

3961 Image Segmentation Based on Graph Theoretical Approach to Improve the Quality of Image Segmentation

Authors: Deepthi Narayan, Srikanta Murthy K., G. Hemantha Kumar

Abstract:

Graph based image segmentation techniques are considered to be one of the most efficient segmentation techniques which are mainly used as time & space efficient methods for real time applications. How ever, there is need to focus on improving the quality of segmented images obtained from the earlier graph based methods. This paper proposes an improvement to the graph based image segmentation methods already described in the literature. We contribute to the existing method by proposing the use of a weighted Euclidean distance to calculate the edge weight which is the key element in building the graph. We also propose a slight modification of the segmentation method already described in the literature, which results in selection of more prominent edges in the graph. The experimental results show the improvement in the segmentation quality as compared to the methods that already exist, with a slight compromise in efficiency.

Keywords: Graph based image segmentation, threshold, Weighted Euclidean distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1513

3960 A New Framework for Evaluation and Prioritization of Suppliers using a Hierarchical Fuzzy TOPSIS

Authors: Mohammad Taghi Taghavifard, Danial Mirheydari

Abstract:

This paper suggests an algorithm for the evaluation and selection of suppliers. At the beginning, all the needed materials and services used by the organization were identified and categorized with regard to their nature by ABC method. Afterwards, in order to reduce risk factors and maximize the organization's profit, purchase strategies were determined. Then, appropriate criteria were identified for primary evaluation of suppliers applying to the organization. The output of this stage was a list of suppliers qualified by the organization to participate in its tenders. Subsequently, considering a material in particular, appropriate criteria on the ordering of the mentioned material were determined, taking into account the particular materials' specifications as well as the organization's needs. Finally, for the purpose of validation and verification of the proposed model, it was applied to Mobarakeh Steel Company (MSC), the qualified suppliers of this Company are ranked by the means of a Hierarchical Fuzzy TOPSIS method. The obtained results show that the proposed algorithm is quite effective, efficient and easy to apply.

Keywords: ABC analysis, Hierarchical Fuzzy TOPSIS, Primary supplier evaluation, Purchasing strategy, supplier selection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1365

3959 Performance Analysis of Software Reliability Models using Matrix Method

Authors: RajPal Garg, Kapil Sharma, Rajive Kumar, R. K. Garg

Abstract:

This paper presents a computational methodology based on matrix operations for a computer based solution to the problem of performance analysis of software reliability models (SRMs). A set of seven comparison criteria have been formulated to rank various non-homogenous Poisson process software reliability models proposed during the past 30 years to estimate software reliability measures such as the number of remaining faults, software failure rate, and software reliability. Selection of optimal SRM for use in a particular case has been an area of interest for researchers in the field of software reliability. Tools and techniques for software reliability model selection found in the literature cannot be used with high level of confidence as they use a limited number of model selection criteria. A real data set of middle size software project from published papers has been used for demonstration of matrix method. The result of this study will be a ranking of SRMs based on the Permanent value of the criteria matrix formed for each model based on the comparison criteria. The software reliability model with highest value of the Permanent is ranked at number – 1 and so on.

Keywords: Matrix method, Model ranking, Model selection, Model selection criteria, Software reliability models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2270

3958 Multiple Criteria Decision Making Analysis for Selecting and Evaluating Fighter Aircraft

Authors: C. Ardil, A. M. Pashaev, R.A. Sadiqov, P. Abdullayev

Abstract:

In this paper, multiple criteria decision making analysis technique, is presented for ranking and selection of a set of determined alternatives - fighter aircraft - which are associated with a set of decision factors. In fighter aircraft design, conflicting decision criteria, disciplines, and technologies are always involved in the design process. Multiple criteria decision making analysis techniques can be helpful to effectively deal with such situations and make wise design decisions. Multiple criteria decision making analysis theory is a systematic mathematical approach for dealing with problems which contain uncertainties in decision making. The feasibility and contributions of applying the multiple criteria decision making analysis technique in fighter aircraft selection analysis is explored. In this study, an integrated framework incorporating multiple criteria decision making analysis technique in fighter aircraft analysis is established using entropy objective weighting method. An improved integrated multiple criteria decision making analysis method is utilized to aggregate the multiple decision criteria into one composite figure of merit, which serves as an objective function in the decision process. Therefore, it is demonstrated that the suitable multiple criteria decision making analysis method with decision solution provides an effective objective function for the decision making analysis. Considering that the inherent uncertainties and the weighting factors have crucial decision impacts on the fighter aircraft evaluation, seven fighter aircraft models for the multiple design criteria in terms of the weighting factors are constructed. The proposed multiple criteria decision making analysis model is based on integrated entropy index procedure, and additive multiple criteria decision making analysis theory. Hence, the applicability of proposed technique for fighter aircraft selection problem is considered. The constructed multiple criteria decision making analysis model can provide efficient decision analysis approach for uncertainty assessment of the decision problem. Consequently, the fighter aircraft alternatives are ranked based their final evaluation scores, and sensitivity analysis is conducted.

Keywords: Fighter Aircraft, Fighter Aircraft Selection, Multiple Criteria Decision Making, Multiple Criteria Decision Making Analysis, MCDMA

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 566

3957 A Proposed Optimized and Efficient Intrusion Detection System for Wireless Sensor Network

Authors: Abdulaziz Alsadhan, Naveed Khan

Abstract:

In recent years intrusions on computer network are the major security threat. Hence, it is important to impede such intrusions. The hindrance of such intrusions entirely relies on its detection, which is primary concern of any security tool like Intrusion detection system (IDS). Therefore, it is imperative to accurately detect network attack. Numerous intrusion detection techniques are available but the main issue is their performance. The performance of IDS can be improved by increasing the accurate detection rate and reducing false positive. The existing intrusion detection techniques have the limitation of usage of raw dataset for classification. The classifier may get jumble due to redundancy, which results incorrect classification. To minimize this problem, Principle component analysis (PCA), Linear Discriminant Analysis (LDA) and Local Binary Pattern (LBP) can be applied to transform raw features into principle features space and select the features based on their sensitivity. Eigen values can be used to determine the sensitivity. To further classify, the selected features greedy search, back elimination, and Particle Swarm Optimization (PSO) can be used to obtain a subset of features with optimal sensitivity and highest discriminatory power. This optimal feature subset is used to perform classification. For classification purpose, Support Vector Machine (SVM) and Multilayer Perceptron (MLP) are used due to its proven ability in classification. The Knowledge Discovery and Data mining (KDD’99) cup dataset was considered as a benchmark for evaluating security detection mechanisms. The proposed approach can provide an optimal intrusion detection mechanism that outperforms the existing approaches and has the capability to minimize the number of features and maximize the detection rates.

Keywords: Particle Swarm Optimization (PSO), Principle component analysis (PCA), Linear Discriminant Analysis (LDA), Local Binary Pattern (LBP), Support Vector Machine (SVM), Multilayer Perceptron (MLP).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2706

3956 Energy Management Techniques in Mobile Robots

Authors: G. Gurguze, I. Turkoglu

Abstract:

Today, the developing features of technological tools with limited energy resources have made it necessary to use energy efficiently. Energy management techniques have emerged for this purpose. As with every field, energy management is vital for robots that are being used in many areas from industry to daily life and that are thought to take up more spaces in the future. Particularly, effective power management in autonomous and multi robots, which are getting more complicated and increasing day by day, will improve the performance and success. In this study, robot management algorithms, usage of renewable and hybrid energy sources, robot motion patterns, robot designs, sharing strategies of workloads in multiple robots, road and mission planning algorithms are discussed for efficient use of energy resources by mobile robots. These techniques have been evaluated in terms of efficient use of existing energy resources and energy management in robots.

Keywords: Energy management, mobile robot, robot administration, robot management, robot planning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1489

3955 Improving Classification in Bayesian Networks using Structural Learning

Authors: Hong Choon Ong

Abstract:

Naïve Bayes classifiers are simple probabilistic classifiers. Classification extracts patterns by using data file with a set of labeled training examples and is currently one of the most significant areas in data mining. However, Naïve Bayes assumes the independence among the features. Structural learning among the features thus helps in the classification problem. In this study, the use of structural learning in Bayesian Network is proposed to be applied where there are relationships between the features when using the Naïve Bayes. The improvement in the classification using structural learning is shown if there exist relationship between the features or when they are not independent.

Keywords: Bayesian Network, Classification, Naïve Bayes, Structural Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2554

3954 Design and Fabrication of a Scaffold with Appropriate Features for Cartilage Tissue Engineering

Authors: S. S. Salehi, A. Shamloo

Abstract:

Poor ability of cartilage tissue when experiencing a damage leads scientists to use tissue engineering as a reliable and effective method for regenerating or replacing damaged tissues. An artificial tissue should have some features such as biocompatibility, biodegradation and, enough mechanical properties like the original tissue. In this work, a composite hydrogel is prepared by using natural and synthetic materials that has high porosity. Mechanical properties of different combinations of polymers such as modulus of elasticity were tested, and a hydrogel with good mechanical properties was selected. Bone marrow derived mesenchymal stem cells were also seeded into the pores of the sponge, and the results showed the adhesion and proliferation of cells within the hydrogel after one month. In comparison with previous works, this study offers a new and efficient procedure for the fabrication of cartilage like tissue and further cartilage repair.

Keywords: Cartilage tissue engineering, hydrogel, mechanical strength, mesenchymal stem cell.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1250

3953 Centre Of Mass Selection Operator Based Meta-Heuristic For Unbounded Knapsack Problem

Authors: D.Venkatesan, K.Kannan, S. Raja Balachandar

Abstract:

In this paper a new Genetic Algorithm based on a heuristic operator and Centre of Mass selection operator (CMGA) is designed for the unbounded knapsack problem(UKP), which is NP-Hard combinatorial optimization problem. The proposed genetic algorithm is based on a heuristic operator, which utilizes problem specific knowledge. This center of mass operator when combined with other Genetic Operators forms a competitive algorithm to the existing ones. Computational results show that the proposed algorithm is capable of obtaining high quality solutions for problems of standard randomly generated knapsack instances. Comparative study of CMGA with simple GA in terms of results for unbounded knapsack instances of size up to 200 show the superiority of CMGA. Thus CMGA is an efficient tool of solving UKP and this algorithm is competitive with other Genetic Algorithms also.

Keywords: Genetic Algorithm, Unbounded Knapsack Problem, Combinatorial Optimization, Meta-Heuristic, Center of Mass

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1647

3952 Ontology-based Concept Weighting for Text Documents

Authors: Hmway Hmway Tar, Thi Thi Soe Nyaunt

Abstract:

Documents clustering become an essential technology with the popularity of the Internet. That also means that fast and high-quality document clustering technique play core topics. Text clustering or shortly clustering is about discovering semantically related groups in an unstructured collection of documents. Clustering has been very popular for a long time because it provides unique ways of digesting and generalizing large amounts of information. One of the issues of clustering is to extract proper feature (concept) of a problem domain. The existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features including concept weight are important. Feature Selection is important for clustering process because some of the irrelevant or redundant feature may misguide the clustering results. To counteract this issue, the proposed system presents the concept weight for text clustering system developed based on a k-means algorithm in accordance with the principles of ontology so that the important of words of a cluster can be identified by the weight values. To a certain extent, it has resolved the semantic problem in specific areas.

Keywords: Clustering, Concept Weight, Document clustering, Feature Selection, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2362

3951 Gene Selection Guided by Feature Interdependence

Authors: Hung-Ming Lai, Andreas Albrecht, Kathleen Steinhöfel

Abstract:

Cancers could normally be marked by a number of differentially expressed genes which show enormous potential as biomarkers for a certain disease. Recent years, cancer classification based on the investigation of gene expression profiles derived by high-throughput microarrays has widely been used. The selection of discriminative genes is, therefore, an essential preprocess step in carcinogenesis studies. In this paper, we have proposed a novel gene selector using information-theoretic measures for biological discovery. This multivariate filter is a four-stage framework through the analyses of feature relevance, feature interdependence, feature redundancy-dependence and subset rankings, and having been examined on the colon cancer data set. Our experimental result show that the proposed method outperformed other information theorem based filters in all aspect of classification errors and classification performance.

Keywords: Colon cancer, feature interdependence, feature subset selection, gene selection, microarray data analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2096

3950 Correlation-based Feature Selection using Ant Colony Optimization

Authors: M. Sadeghzadeh, M. Teshnehlab

Abstract:

Feature selection has recently been the subject of intensive research in data mining, specially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes the Ant Colony Optimization (ACO) is presented. The ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.

Keywords: Ant colony optimization, Classification, Datamining, Feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2375

3949 Dynamic Features Selection for Heart Disease Classification

Authors: Walid MOUDANI

Abstract:

The healthcare environment is generally perceived as being information rich yet knowledge poor. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. In fact, valuable knowledge can be discovered from application of data mining techniques in healthcare system. In this study, a proficient methodology for the extraction of significant patterns from the Coronary Heart Disease warehouses for heart attack prediction, which unfortunately continues to be a leading cause of mortality in the whole world, has been presented. For this purpose, we propose to enumerate dynamically the optimal subsets of the reduced features of high interest by using rough sets technique associated to dynamic programming. Therefore, we propose to validate the classification using Random Forest (RF) decision tree to identify the risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions based on the medical profile of patient. Moreover, the experts- knowledge in this field has been taken into consideration in order to define the disease, its risk factors, and to establish significant knowledge relationships among the medical factors. A computer-aided system is developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated based on set of benchmark techniques applied in this classification problem.

Keywords: Multi-Classifier Decisions Tree, Features Reduction, Dynamic Programming, Rough Sets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2482

3948 Bangla Vowel Characterization Based on Analysis by Synthesis

Authors: Syed Akhter Hossain, M. Lutfar Rahman, Farruk Ahmed

Abstract:

Bangla Vowel characterization determines the spectral properties of Bangla vowels for efficient synthesis as well as recognition of Bangla vowels. In this paper, Bangla vowels in isolated word have been analyzed based on speech production model within the framework of Analysis-by-Synthesis. This has led to the extraction of spectral parameters for the production model in order to produce different Bangla vowel sounds. The real and synthetic spectra are compared and a weighted square error has been computed along with the error in the formant bandwidths for efficient representation of Bangla vowels. The extracted features produced good representation of targeted Bangla vowel. Such a representation also plays essential role in low bit rate speech coding and vocoders.

Keywords: Speech, vowel, formant, synthesis, spectrum, LPC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2329

3947 Logistics Information and Customer Service

Authors: Š. Čemerková, M. Wilczková

Abstract:

The paper deals with the importance of information flow for providing of defined level of customer service in the firms. Setting of the criteria for the selection and implementation of logistics information system is a prerequisite for ensuring of the flow of information in firms. The decision on the selection and implementation of logistics information system is linked to the investment costs and operating costs, which are included in the total logistics costs. The article also deals with the conclusions of the research focused on the logistics information system selection in companies in the Czech Republic.

Keywords: Customer service, information system, logistics, research.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1627

3946 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms

Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang

Abstract:

Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving the bioassay predictions. The first step is instance selection in which dataset is categorized into training, testing, and validation sets. The second step is discretization that partitions the data in consideration of accuracy vs. precision. The third step is normalization where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combination of preprocessing steps and machine learning algorithms in more consistent and accurate prediction.

Keywords: Bioassay, machine learning, preprocessing, virtual screen.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 940

3945 Slovenian Text-to-Speech Synthesis for Speech User Interfaces

Authors: Jerneja Žganec Gros, Aleš Mihelič, Nikola Pavešić, Mario Žganec, Stanislav Gruden

Abstract:

The paper presents the design concept of a unitselection text-to-speech synthesis system for the Slovenian language. Due to its modular and upgradable architecture, the system can be used in a variety of speech user interface applications, ranging from server carrier-grade voice portal applications, desktop user interfaces to specialized embedded devices. Since memory and processing power requirements are important factors for a possible implementation in embedded devices, lexica and speech corpora need to be reduced. We describe a simple and efficient implementation of a greedy subset selection algorithm that extracts a compact subset of high coverage text sentences. The experiment on a reference text corpus showed that the subset selection algorithm produced a compact sentence subset with a small redundancy. The adequacy of the spoken output was evaluated by several subjective tests as they are recommended by the International Telecommunication Union ITU.

Keywords: text-to-speech synthesis, prosody modeling, speech user interface.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1401

3944 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: Machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 882

3943 Analysis of Initial Entry-Level Technology Course Impacts on STEM Major Selection

Authors: Ethan Shafer, Timothy Graziano, Jay Fisher

Abstract:

This research seeks to answer whether first-year courses at institutions of higher learning can impact STEM major selection. Unlike many universities, an entry-level technology course (often referred to as CS0) is required for all United States Military Academy (USMA) students–regardless of major–in their first year of attendance. Students at the Academy choose their major at the end of their first year of studies. Through student responses to a multi-semester survey, this paper identifies a number of factors that potentially influence STEM major selection. Student demographic data, pre-existing exposure and access to technology, perceptions of STEM subjects, and initial desire for a STEM major are captured before and after taking a CS0 course. An analysis of factors that contribute to student perception of STEM and major selection are presented. This work provides recommendations and suggestions for institutions currently providing or looking to provide CS0-like courses to their students.

Keywords: STEM major, STEM, pedagogy, digital literacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 129