Search results for: evolutionary tree

389 A New DIDS Design Based on a Combination Feature Selection Approach

Authors: Adel Sabry Eesa, Adnan Mohsin Abdulazeez Brifcani, Zeynep Orman

Abstract:

Feature selection has been used in many fields such as classification, data mining and object recognition and proven to be effective for removing irrelevant and redundant features from the original dataset. In this paper, a new design of distributed intrusion detection system using a combination feature selection model based on bees and decision tree. Bees algorithm is used as the search strategy to find the optimal subset of features, whereas decision tree is used as a judgment for the selected features. Both the produced features and the generated rules are used by Decision Making Mobile Agent to decide whether there is an attack or not in the networks. Decision Making Mobile Agent will migrate through the networks, moving from node to another, if it found that there is an attack on one of the nodes, it then alerts the user through User Interface Agent or takes some action through Action Mobile Agent. The KDD Cup 99 dataset is used to test the effectiveness of the proposed system. The results show that even if only four features are used, the proposed system gives a better performance when it is compared with the obtained results using all 41 features.

Keywords: Distributed intrusion detection system, mobile agent, feature selection, Bees Algorithm, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1884

388 Granulation using Clustering and Rough Set Theory and its Tree Representation

Authors: Girish Kumar Singh, Sonajharia Minz

Abstract:

Granular computing deals with representation of information in the form of some aggregates and related methods for transformation and analysis for problem solving. A granulation scheme based on clustering and Rough Set Theory is presented with focus on structured conceptualization of information has been presented in this paper. Experiments for the proposed method on four labeled data exhibit good result with reference to classification problem. The proposed granulation technique is semi-supervised imbibing global as well as local information granulation. To represent the results of the attribute oriented granulation a tree structure is proposed in this paper.

Keywords: Granular computing, clustering, Rough sets, datamining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1676

387 Evolutionary Origin of the αC Helix in Integrins

Authors: B. Chouhan, A. Denesyuk, J. Heino, M. S. Johnson, K. Denessiouk

Abstract:

Integrins are a large family of multidomain α/β cell signaling receptors. Some integrins contain an additional inserted I domain, whose earliest expression appears to be with the chordates, since they are observed in the urochordates Ciona intestinalis (vase tunicate) and Halocynthia roretzi (sea pineapple), but not in integrins of earlier diverging species. The domain-s presence is viewed as a hallmark of integrins of higher metazoans, however in vertebrates, there are clearly three structurally-different classes: integrins without I domains, and two groups of integrins with I domains but separable by the presence or absence of an additional αC helix. For example, the αI domains in collagen-binding integrins from Osteichthyes (bony fish) and all higher vertebrates contain the specific αC helix, whereas the αI domains in non-collagen binding integrins from vertebrates and the αI domains from earlier diverging urochordate integrins, i.e. tunicates, do not. Unfortunately, within the early chordates, there is an evolutionary gap due to extinctions between the tunicates and cartilaginous fish. This, coupled with a knowledge gap due to the lack of complete genomic data from surviving species, means that the origin of collagen-binding αC-containing αI domains remains unknown. Here, we analyzed two available genomes from Callorhinchus milii (ghost shark/elephant shark; Chondrichthyes – cartilaginous fish) and Petromyzon marinus (sea lamprey; Agnathostomata), and several available Expression Sequence Tags from two Chondrichthyes species: Raja erinacea (little skate) and Squalus acanthias (dogfish shark); and Eptatretus burgeri (inshore hagfish; Agnathostomata), which evolutionary reside between the urochordates and osteichthyes. In P. marinus, we observed several fragments coding for the αC-containing αI domain, allowing us to shed more light on the evolution of the collagen-binding integrins.

Keywords: Integrin αI domain, integrin evolution, collagen binding, structure, αC helix

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3628

386 A Memetic Algorithm for an Energy-Costs-Aware Flexible Job-Shop Scheduling Problem

Authors: Christian Böning, Henrik Prinzhorn, Eric C. Hund, Malte Stonis

Abstract:

In this article, the flexible job-shop scheduling problem is extended by consideration of energy costs which arise owing to the power peak, and further decision variables such as work in process and throughput time are incorporated into the objective function. This enables a production plan to be simultaneously optimized in respect of the real arising energy and logistics costs. The energy-costs-aware flexible job-shop scheduling problem (EFJSP) which arises is described mathematically, and a memetic algorithm (MA) is presented as a solution. In the MA, the evolutionary process is supplemented with a local search. Furthermore, repair procedures are used in order to rectify any infeasible solutions that have arisen in the evolutionary process. The potential for lowering the real arising costs of a production plan through consideration of energy consumption levels is highlighted.

Keywords: Energy costs, flexible job-shop scheduling, memetic algorithm, power peak.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1072

385 Attacks and Counter Measures in BST Overlay Structure of Peer-To-Peer System

Authors: Guruprasad Khataniar, Hitesh Tahbildar, Prakriti Prava Das

Abstract:

There are various overlay structures that provide efficient and scalable solutions for point and range query in a peer-topeer network. Overlay structure based on m-Binary Search Tree (BST) is one such popular technique. It deals with the division of the tree into different key intervals and then assigning the key intervals to a BST. The popularity of the BST makes this overlay structure vulnerable to different kinds of attacks. Here we present four such possible attacks namely index poisoning attack, eclipse attack, pollution attack and syn flooding attack. The functionality of BST is affected by these attacks. We also provide different security techniques that can be applied against these attacks.

Keywords: BST, eclipse attack, index poisoning attack, pollution attack, syn flooding attack.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1587

384 Quad Tree Decomposition Based Analysis of Compressed Image Data Communication for Lossy and Lossless Using WSN

Authors: N. Muthukumaran, R. Ravi

Abstract:

The Quad Tree Decomposition based performance analysis of compressed image data communication for lossy and lossless through wireless sensor network is presented. Images have considerably higher storage requirement than text. While transmitting a multimedia content there is chance of the packets being dropped due to noise and interference. At the receiver end the packets that carry valuable information might be damaged or lost due to noise, interference and congestion. In order to avoid the valuable information from being dropped various retransmission schemes have been proposed. In this proposed scheme QTD is used. QTD is an image segmentation method that divides the image into homogeneous areas. In this proposed scheme involves analysis of parameters such as compression ratio, peak signal to noise ratio, mean square error, bits per pixel in compressed image and analysis of difficulties during data packet communication in Wireless Sensor Networks. By considering the above, this paper is to use the QTD to improve the compression ratio as well as visual quality and the algorithm in MATLAB 7.1 and NS2 Simulator software tool.

Keywords: Image compression, Compression Ratio, Quad tree decomposition, Wireless sensor networks, NS2 simulator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2350

383 An Intelligent Human-Computer Interaction System for Decision Support

Authors: Chee Siong Teh, Chee Peng Lim

Abstract:

This paper proposes a novel architecture for developing decision support systems. Unlike conventional decision support systems, the proposed architecture endeavors to reveal the decision-making process such that humans' subjectivity can be incorporated into a computerized system and, at the same time, to preserve the capability of the computerized system in processing information objectively. A number of techniques used in developing the decision support system are elaborated to make the decisionmarking process transparent. These include procedures for high dimensional data visualization, pattern classification, prediction, and evolutionary computational search. An artificial data set is first employed to compare the proposed approach with other methods. A simulated handwritten data set and a real data set on liver disease diagnosis are then employed to evaluate the efficacy of the proposed approach. The results are analyzed and discussed. The potentials of the proposed architecture as a useful decision support system are demonstrated.

Keywords: Interactive evolutionary computation, multivariate data projection, pattern classification, topographic map.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1416

382 Evolutionary Computation Technique for Solving Riccati Differential Equation of Arbitrary Order

Authors: Raja Muhammad Asif Zahoor, Junaid Ali Khan, I. M. Qureshi

Abstract:

In this article an evolutionary technique has been used for the solution of nonlinear Riccati differential equations of fractional order. In this method, genetic algorithm is used as a tool for the competent global search method hybridized with active-set algorithm for efficient local search. The proposed method has been successfully applied to solve the different forms of Riccati differential equations. The strength of proposed method has in its equal applicability for the integer order case, as well as, fractional order case. Comparison of the method has been made with standard numerical techniques as well as the analytic solutions. It is found that the designed method can provide the solution to the equation with better accuracy than its counterpart deterministic approaches. Another advantage of the given approach is to provide results on entire finite continuous domain unlike other numerical methods which provide solutions only on discrete grid of points.

Keywords: Riccati Equation, Non linear ODE, Fractional differential equation, Genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1898

381 Developing Structured Sizing Systems for Manufacturing Ready-Made Garments of Indian Females Using Decision Tree-Based Data Mining

Authors: Hina Kausher, Sangita Srivastava

Abstract:

In India, there is a lack of standard, systematic sizing approach for producing readymade garments. Garments manufacturing companies use their own created size tables by modifying international sizing charts of ready-made garments. The purpose of this study is to tabulate the anthropometric data which cover the variety of figure proportions in both height and girth. 3,000 data have been collected by an anthropometric survey undertaken over females between the ages of 16 to 80 years from the some states of India to produce the sizing system suitable for clothing manufacture and retailing. The data are used for the statistical analysis of body measurements, the formulation of sizing systems and body measurements tables. Factor analysis technique is used to filter the control body dimensions from the large number of variables. Decision tree-based data mining is used to cluster the data. The standard and structured sizing system can facilitate pattern grading and garment production. Moreover, it can exceed buying ratios and upgrade size allocations to retail segments.

Keywords: Anthropometric data, data mining, decision tree, garments manufacturing, ready-made garments, sizing systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 884

380 Artificial Neural Network Development by means of Genetic Programming with Graph Codification

Authors: Daniel Rivero, Julián Dorado, Juan R. Rabuñal, Alejandro Pazos, Javier Pereira

Abstract:

The development of Artificial Neural Networks (ANNs) is usually a slow process in which the human expert has to test several architectures until he finds the one that achieves best results to solve a certain problem. This work presents a new technique that uses Genetic Programming (GP) for automatically generating ANNs. To do this, the GP algorithm had to be changed in order to work with graph structures, so ANNs can be developed. This technique also allows the obtaining of simplified networks that solve the problem with a small group of neurons. In order to measure the performance of the system and to compare the results with other ANN development methods by means of Evolutionary Computation (EC) techniques, several tests were performed with problems based on some of the most used test databases. The results of those comparisons show that the system achieves good results comparable with the already existing techniques and, in most of the cases, they worked better than those techniques.

Keywords: Artificial Neural Networks, Evolutionary Computation, Genetic Programming.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1420

379 Predictive Modelling Techniques in Sediment Yield and Hydrological Modelling

Authors: Adesoji T. Jaiyeola, Josiah Adeyemo

Abstract:

This paper presents an extensive review of literature relevant to the modelling techniques adopted in sediment yield and hydrological modelling. Several studies relating to sediment yield are discussed. Many research areas of sedimentation in rivers, runoff and reservoirs are presented. Different types of hydrological models, different methods employed in selecting appropriate models for different case studies are analysed. Applications of evolutionary algorithms and artificial intelligence techniques are discussed and compared especially in water resources management and modelling. This review concentrates on Genetic Programming (GP) and fully discusses its theories and applications. The successful applications of GP as a soft computing technique were reviewed in sediment modelling. Some fundamental issues such as benchmark, generalization ability, bloat, over-fitting and other open issues relating to the working principles of GP are highlighted. This paper concludes with the identification of some research gaps in hydrological modelling and sediment yield.

Keywords: Artificial intelligence, evolutionary algorithm, genetic programming, sediment yield.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1818

378 Detecting Remote Protein Evolutionary Relationships via String Scoring Method

Authors: Nazar Zaki, Safaai Deris

Abstract:

The amount of the information being churned out by the field of biology has jumped manifold and now requires the extensive use of computer techniques for the management of this information. The predominance of biological information such as protein sequence similarity in the biological information sea is key information for detecting protein evolutionary relationship. Protein sequence similarity typically implies homology, which in turn may imply structural and functional similarities. In this work, we propose, a learning method for detecting remote protein homology. The proposed method uses a transformation that converts protein sequence into fixed-dimensional representative feature vectors. Each feature vector records the sensitivity of a protein sequence to a set of amino acids substrings generated from the protein sequences of interest. These features are then used in conjunction with support vector machines for the detection of the protein remote homology. The proposed method is tested and evaluated on two different benchmark protein datasets and it-s able to deliver improvements over most of the existing homology detection methods.

Keywords: Protein homology detection; support vectormachine; string kernel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1345

377 Low Computational Image Compression Scheme based on Absolute Moment Block Truncation Coding

Authors: K.Somasundaram, I.Kaspar Raj

Abstract:

In this paper we have proposed three and two stage still gray scale image compressor based on BTC. In our schemes, we have employed a combination of four techniques to reduce the bit rate. They are quad tree segmentation, bit plane omission, bit plane coding using 32 visual patterns and interpolative bit plane coding. The experimental results show that the proposed schemes achieve an average bit rate of 0.46 bits per pixel (bpp) for standard gray scale images with an average PSNR value of 30.25, which is better than the results from the exiting similar methods based on BTC.

Keywords: Bit plane, Block Truncation Coding, Image compression, lossy compression, quad tree segmentation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1709

376 ORPP with MAIEP Based Technique for Loadability Enhancement

Authors: Norziana Aminudin, Titik Khawa Abdul Rahman, Ismail Musirin

Abstract:

One of the factors to maintain system survivability is the adequate reactive power support to the system. Lack of reactive power support may cause undesirable voltage decay leading to total system instability. Thus, appropriate reactive power support scheme should be arranged in order to maintain system stability. The strength of a system capacity is normally denoted as system loadability. This paper presents the enhancement of system loadability through optimal reactive power planning technique using a newly developed optimization technique, termed as Multiagent Immune Evolutionary Programming (MAIEP). The concept of MAIEP is developed based on the combination of Multiagent System (MAS), Artificial Immune System (AIS) and Evolutionary Programming (EP). In realizing the effectiveness of the proposed technique, validation is conducted on the IEEE-26-Bus Reliability Test System. The results obtained from pre-optimization and post-optimization process were compared which eventually revealed the merit of MAIEP.

Keywords: Load margin, MAIEP, Maximum loading point, ORPP.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1458

375 Ensemble Approach for Predicting Student's Academic Performance

Authors: L. A. Muhammad, M. S. Argungu

Abstract:

Educational data mining (EDM) has recorded substantial considerations. Techniques of data mining in one way or the other have been proposed to dig out out-of-sight knowledge in educational data. The result of the study got assists academic institutions in further enhancing their process of learning and methods of passing knowledge to students. Consequently, the performance of students boasts and the educational products are by no doubt enhanced. This study adopted a student performance prediction model premised on techniques of data mining with Students' Essential Features (SEF). SEF are linked to the learner's interactivity with the e-learning management system. The performance of the student's predictive model is assessed by a set of classifiers, viz. Bayes Network, Logistic Regression, and Reduce Error Pruning Tree (REP). Consequently, ensemble methods of Bagging, Boosting, and Random Forest (RF) are applied to improve the performance of these single classifiers. The study reveals that the result shows a robust affinity between learners' behaviors and their academic attainment. Result from the study shows that the REP Tree and its ensemble record the highest accuracy of 83.33% using SEF. Hence, in terms of the Receiver Operating Curve (ROC), boosting method of REP Tree records 0.903, which is the best. This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, bagging, Random Forest, boosting, data mining, classifiers, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 670

374 Discovering Complex Regularities: from Tree to Semi-Lattice Classifications

Authors: A. Faro, D. Giordano, F. Maiorana

Abstract:

Data mining uses a variety of techniques each of which is useful for some particular task. It is important to have a deep understanding of each technique and be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and support the entire data mining process up to results visualization. A graphical representation helps the user to find out a strategy to optimize classification by adding, moving or delete a neuron in order to change the number of classes. The tool is able to automatically suggest a strategy to optimize the number of classes optimization, but also support both tree classifications and semi-lattice organizations of the classes to give to the users the possibility of passing from one class to the ones with which it has some aspects in common. Examples of using tree and semi-lattice classifications are given to illustrate advantages and problems. The tool is applied to classify macroeconomic data that report the most developed countries- import and export. It is possible to classify the countries based on their economic behaviour and use the tool to characterize the commercial behaviour of a country in a selected class from the analysis of positive and negative features that contribute to classes formation. Possible interrelationships between the classes and their meaning are also discussed.

Keywords: Unsupervised classification, Kohonen networks, macroeconomics, Visual data mining, Cluster interpretation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1506

373 Machine Learning Techniques for Short-Term Rain Forecasting System in the Northeastern Part of Thailand

Authors: Lily Ingsrisawang, Supawadee Ingsriswang, Saisuda Somchit, Prasert Aungsuratana, Warawut Khantiyanan

Abstract:

This paper presents the methodology from machine learning approaches for short-term rain forecasting system. Decision Tree, Artificial Neural Network (ANN), and Support Vector Machine (SVM) were applied to develop classification and prediction models for rainfall forecasts. The goals of this presentation are to demonstrate (1) how feature selection can be used to identify the relationships between rainfall occurrences and other weather conditions and (2) what models can be developed and deployed for predicting the accurate rainfall estimates to support the decisions to launch the cloud seeding operations in the northeastern part of Thailand. Datasets collected during 2004-2006 from the Chalermprakiat Royal Rain Making Research Center at Hua Hin, Prachuap Khiri khan, the Chalermprakiat Royal Rain Making Research Center at Pimai, Nakhon Ratchasima and Thai Meteorological Department (TMD). A total of 179 records with 57 features was merged and matched by unique date. There are three main parts in this work. Firstly, a decision tree induction algorithm (C4.5) was used to classify the rain status into either rain or no-rain. The overall accuracy of classification tree achieves 94.41% with the five-fold cross validation. The C4.5 algorithm was also used to classify the rain amount into three classes as no-rain (0-0.1 mm.), few-rain (0.1- 10 mm.), and moderate-rain (>10 mm.) and the overall accuracy of classification tree achieves 62.57%. Secondly, an ANN was applied to predict the rainfall amount and the root mean square error (RMSE) were used to measure the training and testing errors of the ANN. It is found that the ANN yields a lower RMSE at 0.171 for daily rainfall estimates, when compared to next-day and next-2-day estimation. Thirdly, the ANN and SVM techniques were also used to classify the rain amount into three classes as no-rain, few-rain, and moderate-rain as above. The results achieved in 68.15% and 69.10% of overall accuracy of same-day prediction for the ANN and SVM models, respectively. The obtained results illustrated the comparison of the predictive power of different methods for rainfall estimation.

Keywords: Machine learning, decision tree, artificial neural network, support vector machine, root mean square error.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3178

372 Enhanced-Delivery Overlay Multicasting Scheme by Optimizing Bandwidth and Latency Discrepancy Ratios

Authors: Omar F. Hamad, T. Marwala

Abstract:

With optimized bandwidth and latency discrepancy ratios, Node Gain Scores (NGSs) are determined and used as a basis for shaping the max-heap overlay. The NGSs - determined as the respective bandwidth-latency-products - govern the construction of max-heap-form overlays. Each NGS is earned as a synergy of discrepancy ratio of the bandwidth requested with respect to the estimated available bandwidth, and latency discrepancy ratio between the nodes and the source node. The tree leads to enhanceddelivery overlay multicasting – increasing packet delivery which could, otherwise, be hindered by induced packet loss occurring in other schemes not considering the synergy of these parameters on placing the nodes on the overlays. The NGS is a function of four main parameters – estimated available bandwidth, Ba; individual node's requested bandwidth, Br; proposed node latency to its prospective parent (Lp); and suggested best latency as advised by source node (Lb). Bandwidth discrepancy ratio (BDR) and latency discrepancy ratio (LDR) carry weights of α and (1,000 - α ) , respectively, with arbitrary chosen α ranging between 0 and 1,000 to ensure that the NGS values, used as node IDs, maintain a good possibility of uniqueness and balance between the most critical factor between the BDR and the LDR. A max-heap-form tree is constructed with assumption that all nodes possess NGS less than the source node. To maintain a sense of load balance, children of each level's siblings are evenly distributed such that a node can not accept a second child, and so on, until all its siblings able to do so, have already acquired the same number of children. That is so logically done from left to right in a conceptual overlay tree. The records of the pair-wise approximate available bandwidths as measured by a pathChirp scheme at individual nodes are maintained. Evaluation measures as compared to other schemes – Bandwidth Aware multicaSt architecturE (BASE), Tree Building Control Protocol (TBCP), and Host Multicast Tree Protocol (HMTP) - have been conducted. This new scheme generally performs better in terms of trade-off between packet delivery ratio; link stress; control overhead; and end-to-end delays.

Keywords: Overlay multicast, Available bandwidth, Max-heapform overlay, Induced packet loss, Bandwidth-latency product, Node Gain Score (NGS).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1532

371 Comparison of CPW Fed Microstrip Patch Antennas with Varied Ground Structures for Fixed Satellite Applications

Authors: Deepanshu Kaushal, T. Shanmuganantham

Abstract:

This paper draws a comparison between two microstrip patch antennas having different ground structures. The designs utilize 45 mm x 40 mm x 1.6 mm FR4 epoxy substrate (relative permittivity of 4.4 and dielectric loss tangent of 0.02) and CPW feeding technique. The design 1 uses conducting partial ground plates along the two sides of the radiating X’mas tree shaped patch. The design 2 utilizes an X’mas tree shaped slotted ground structure that features a circular radiating patch. A comparative analysis of results of both designs has been carried. The two designs are intended to serve the fixed satellite applications in X and Ku band respectively.

Keywords: CPW feed, partial ground structures, slotted ground structures, fixed satellite applications.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 719

370 Comparative Study - Three Artificial Intelligence Techniques for Rain Domain in Precipitation Forecast

Authors: Nabilah Filzah Mohd Radzuan, Andi Putra, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan

Abstract:

Precipitation forecast is important in avoid incident of natural disaster which can cause loss in involved area. This review paper involves three techniques from artificial intelligence namely logistic regression, decisions tree, and random forest which used in making precipitation forecast. These combination techniques through VAR model in finding advantages and strength for every technique in forecast process. Data contains variables from rain domain. Adaptation of artificial intelligence techniques involved on rain domain enables the process to be easier and systematic for precipitation forecast.

Keywords: Logistic regression, decisions tree, random forest, VAR model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1994

369 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In decision tree approach is most useful in classification problem. With this technique, tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for original C5.0 in order to reduce the optimization of error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of medial data set like hypothyroid. It is shown that, the performance of a classifier on the training cases from which it was constructed gives a poor estimate by sampling or using a separate test file, either way, the classifier is evaluated on cases that were not used to build and evaluate the classifier are both are large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772 case training set and a 1000 case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of see5 is its ability to classifiers called rulesets. The ruleset has an error rate 0.5 % on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive is by f-fold –cross- validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2436

368 Genetic Programming: Principles, Applications and Opportunities for Hydrological Modelling

Authors: Oluwaseun K. Oyebode, Josiah A. Adeyemo

Abstract:

Hydrological modelling plays a crucial role in the planning and management of water resources, most especially in water stressed regions where the need to effectively manage the available water resources is of critical importance. However, due to the complex, nonlinear and dynamic behaviour of hydro-climatic interactions, achieving reliable modelling of water resource systems and accurate projection of hydrological parameters are extremely challenging. Although a significant number of modelling techniques (process-based and data-driven) have been developed and adopted in that regard, the field of hydrological modelling is still considered as one that has sluggishly progressed over the past decades. This is majorly as a result of the identification of some degree of uncertainty in the methodologies and results of techniques adopted. In recent times, evolutionary computation (EC) techniques have been developed and introduced in response to the search for efficient and reliable means of providing accurate solutions to hydrological related problems. This paper presents a comprehensive review of the underlying principles, methodological needs and applications of a promising evolutionary computation modelling technique – genetic programming (GP). It examines the specific characteristics of the technique which makes it suitable to solving hydrological modelling problems. It discusses the opportunities inherent in the application of GP in water related-studies such as rainfall estimation, rainfall-runoff modelling, streamflow forecasting, sediment transport modelling, water quality modelling and groundwater modelling among others. Furthermore, the means by which such opportunities could be harnessed in the near future are discussed. In all, a case for total embracement of GP and its variants in hydrological modelling studies is made so as to put in place strategies that would translate into achieving meaningful progress as it relates to modelling of water resource systems, and also positively influence decision-making by relevant stakeholders.

Keywords: Computational modelling, evolutionary algorithms, genetic programming, hydrological modelling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3287

367 Dimensionality Reduction of PSSM Matrix and its Influence on Secondary Structure and Relative Solvent Accessibility Predictions

Authors: Rafał Adamczak

Abstract:

State-of-the-art methods for secondary structure (Porter, Psi-PRED, SAM-T99sec, Sable) and solvent accessibility (Sable, ACCpro) predictions use evolutionary profiles represented by the position specific scoring matrix (PSSM). It has been demonstrated that evolutionary profiles are the most important features in the feature space for these predictions. Unfortunately applying PSSM matrix leads to high dimensional feature spaces that may create problems with parameter optimization and generalization. Several recently published suggested that applying feature extraction for the PSSM matrix may result in improvements in secondary structure predictions. However, none of the top performing methods considered here utilizes dimensionality reduction to improve generalization. In the present study, we used simple and fast methods for features selection (t-statistics, information gain) that allow us to decrease the dimensionality of PSSM matrix by 75% and improve generalization in the case of secondary structure prediction compared to the Sable server.

Keywords: Secondary structure prediction, feature selection, position specific scoring matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1889

366 Primer Design with Specific PCR Product using Particle Swarm Optimization

Authors: Cheng-Hong Yang, Yu-Huei Cheng, Hsueh-Wei Chang, Li-Yeh Chuang

Abstract:

Before performing polymerase chain reactions (PCR), a feasible primer set is required. Many primer design methods have been proposed for design a feasible primer set. However, the majority of these methods require a relatively long time to obtain an optimal solution since large quantities of template DNA need to be analyzed. Furthermore, the designed primer sets usually do not provide a specific PCR product. In recent years, evolutionary computation has been applied to PCR primer design and yielded promising results. In this paper, a particle swarm optimization (PSO) algorithm is proposed to solve primer design problems associated with providing a specific product for PCR experiments. A test set of the gene CYP1A1, associated with a heightened lung cancer risk was analyzed and the comparison of accuracy and running time with the genetic algorithm (GA) and memetic algorithm (MA) was performed. A comparison of results indicated that the proposed PSO method for primer design finds optimal or near-optimal primer sets and effective PCR products in a relatively short time.

Keywords: polymerase chain reaction (PCR), primer design, evolutionary computation, particle swarm optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1817

365 CBCTL: A Reasoning System of TemporalEpistemic Logic with Communication Channel

Authors: Suguru Yoshioka, Satoshi Tojo

Abstract:

This paper introduces a temporal epistemic logic CBCTL that updates agent-s belief states through communications in them, based on computational tree logic (CTL). In practical environments, communication channels between agents may not be secure, and in bad cases agents might suffer blackouts. In this study, we provide inform* protocol based on ACL of FIPA, and declare the presence of secure channels between two agents, dependent on time. Thus, the belief state of each agent is updated along with the progress of time. We show a prover, that is a reasoning system for a given formula in a given a situation of an agent ; if it is directly provable or if it could be validated through the chains of communications, the system returns the proof.

Keywords: communication channel, computational tree logic, reasoning system, temporal epistemic logic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1199

364 Proposing a Pareto-based Multi-Objective Evolutionary Algorithm to Flexible Job Shop Scheduling Problem

Authors: Seyed Habib A. Rahmati

Abstract:

During last decades, developing multi-objective evolutionary algorithms for optimization problems has found considerable attention. Flexible job shop scheduling problem, as an important scheduling optimization problem, has found this attention too. However, most of the multi-objective algorithms that are developed for this problem use nonprofessional approaches. In another words, most of them combine their objectives and then solve multi-objective problem through single objective approaches. Of course, except some scarce researches that uses Pareto-based algorithms. Therefore, in this paper, a new Pareto-based algorithm called controlled elitism non-dominated sorting genetic algorithm (CENSGA) is proposed for the multi-objective FJSP (MOFJSP). Our considered objectives are makespan, critical machine work load, and total work load of machines. The proposed algorithm is also compared with one the best Pareto-based algorithms of the literature on some multi-objective criteria, statistically.

Keywords: Scheduling, Flexible job shop scheduling problem, controlled elitism non-dominated sorting genetic algorithm

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1896

363 Unnoticeable Mumps Infection in India: Does MMR Vaccine Protect against Circulating Mumps Virus Genotype C?

Authors: Jeevan Malayan, Aparna Warrier, Padmasani Venkat Ramanan, Sanjeeva Reddy N, Elanchezhiyan Manickan

Abstract:

MMR vaccine failure had been reported globally and here we report that it occurs now in India. Samples were collected from clinically suspected mumps cases were subjected for anti mumps antibodies, virus isolation, RT-PCR, sequencing and phylogenetic tree analysis. 56 samples collected from men and women belonging to various age groups. 30 had been vaccinated and the status of 26 patients was unknown. 28 out of 30 samples were found to be symptomatic and positive for Mumps IgM, indicating active mumps infection in 93.4% of the vaccinated population. A phylogenetic tree comparison of the clinical isolate is shown to be genotype C which is distinct from vaccine strain. Our study clearly sending warning signs that MMR vaccine is a failure and it needs to be revamped for the human use by increasing its efficacy and efficiency.

Keywords: Genotype C, Mumps virus, MMR vaccine, Sero types.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2403

362 GeoSEMA: A Modelling Platform, Emerging “GeoSpatial-based Evolutionary and Mobile Agents“

Authors: Mohamed Dbouk, Ihab Sbeity

Abstract:

Spatial and mobile computing evolves. This paper describes a smart modeling platform called “GeoSEMA". This approach tends to model multidimensional GeoSpatial Evolutionary and Mobile Agents. Instead of 3D and location-based issues, there are some other dimensions that may characterize spatial agents, e.g. discrete-continuous time, agent behaviors. GeoSEMA is seen as a devoted design pattern motivating temporal geographic-based applications; it is a firm foundation for multipurpose and multidimensional special-based applications. It deals with multipurpose smart objects (buildings, shapes, missiles, etc.) by stimulating geospatial agents. Formally, GeoSEMA refers to geospatial, spatio-evolutive and mobile space constituents where a conceptual geospatial space model is given in this paper. In addition to modeling and categorizing geospatial agents, the model incorporates the concept of inter-agents event-based protocols. Finally, a rapid software-architecture prototyping GeoSEMA platform is also given. It will be implemented/ validated in the next phase of our work.

Keywords: Location-Trajectory management, GIS, Mobile- Moving Objects/Agents, Multipurpose/Spatiotemporal data, Multi- Agent Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1610

361 Empirical and Indian Automotive Equity Portfolio Decision Support

Authors: P. Sankar, P. James Daniel Paul, Siddhant Sahu

Abstract:

A brief review of the empirical studies on the methodology of the stock market decision support would indicate that they are at a threshold of validating the accuracy of the traditional and the fuzzy, artificial neural network and the decision trees. Many researchers have been attempting to compare these models using various data sets worldwide. However, the research community is on the way to the conclusive confidence in the emerged models. This paper attempts to use the automotive sector stock prices from National Stock Exchange (NSE), India and analyze them for the intra-sectorial support for stock market decisions. The study identifies the significant variables and their lags which affect the price of the stocks using OLS analysis and decision tree classifiers.

Keywords: Indian Automotive Sector, Stock Market Decisions, Equity Portfolio Analysis, Decision Tree Classifiers, Statistical Data Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1990

360 Geospatial Network Analysis Using Particle Swarm Optimization

Authors: Varun Singh, Mainak Bandyopadhyay, Maharana Pratap Singh

Abstract:

The shortest path (SP) problem concerns with finding the shortest path from a specific origin to a specified destination in a given network while minimizing the total cost associated with the path. This problem has widespread applications. Important applications of the SP problem include vehicle routing in transportation systems particularly in the field of in-vehicle Route Guidance System (RGS) and traffic assignment problem (in transportation planning). Well known applications of evolutionary methods like Genetic Algorithms (GA), Ant Colony Optimization, Particle Swarm Optimization (PSO) have come up to solve complex optimization problems to overcome the shortcomings of existing shortest path analysis methods. It has been reported by various researchers that PSO performs better than other evolutionary optimization algorithms in terms of success rate and solution quality. Further Geographic Information Systems (GIS) have emerged as key information systems for geospatial data analysis and visualization. This research paper is focused towards the application of PSO for solving the shortest path problem between multiple points of interest (POI) based on spatial data of Allahabad City and traffic speed data collected using GPS. Geovisualization of results of analysis is carried out in GIS.

Keywords: GIS, Outliers, PSO, Traffic Data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2834