Search results for: tree based model.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 15676

Search results for: tree based model.

15526 Ranking Genes from DNA Microarray Data of Cervical Cancer by a local Tree Comparison

Authors: Frank Emmert-Streib, Matthias Dehmer, Jing Liu, Max Muhlhauser

Abstract:

The major objective of this paper is to introduce a new method to select genes from DNA microarray data. As criterion to select genes we suggest to measure the local changes in the correlation graph of each gene and to select those genes whose local changes are largest. More precisely, we calculate the correlation networks from DNA microarray data of cervical cancer whereas each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. The interpretation of a tree is that it represents the n-nearest neighbor genes on the n-th level of a tree, measured by the Dijkstra distance, and, hence, gives the local embedding of a gene within the correlation network. For the obtained trees we measure the pairwise similarity between trees rooted by the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to tumor progression. Finally, we rank the obtained similarity values from all tissue comparisons and select the top ranked genes. For these genes the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result we find that the top ranked genes are candidates suspected to be involved in tumor growth. This indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.

Keywords: Graph similarity, generalized trees, graph alignment, DNA microarray data, cervical cancer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1718
15525 Human Body Configuration using Bayesian Model

Authors: Rui. Zhang, Yiming. Pi

Abstract:

In this paper we present a novel approach for human Body configuration based on the Silhouette. We propose to address this problem under the Bayesian framework. We use an effective Model based MCMC (Markov Chain Monte Carlo) method to solve the configuration problem, in which the best configuration could be defined as MAP (maximize a posteriori probability) in Bayesian model. This model based MCMC utilizes the human body model to drive the MCMC sampling from the solution space. It converses the original high dimension space into a restricted sub-space constructed by the human model and uses a hybrid sampling algorithm. We choose an explicit human model and carefully select the likelihood functions to represent the best configuration solution. The experiments show that this method could get an accurate configuration and timesaving for different human from multi-views.

Keywords: Bayesian framework, MCMC, model based, human body configuration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1272
15524 Processing and Economic Analysis of Rain Tree (Samanea saman) Pods for Village Level Hydrous Bioethanol Production

Authors: Dharell B. Siano, Wendy C. Mateo, Victorino T. Taylan, Francisco D. Cuaresma

Abstract:

Biofuel is one of the renewable energy sources adapted by the Philippine government in order to lessen the dependency on foreign fuel and to reduce carbon dioxide emissions. Rain tree pods were seen to be a promising source of bioethanol since it contains significant amount of fermentable sugars. The study was conducted to establish the complete procedure in processing rain tree pods for village level hydrous bioethanol production. Production processes were done for village level hydrous bioethanol production from collection, drying, storage, shredding, dilution, extraction, fermentation, and distillation. The feedstock was sundried, and moisture content was determined at a range of 20% to 26% prior to storage. Dilution ratio was 1:1.25 (1 kg of pods = 1.25 L of water) and after extraction process yielded a sugar concentration of 22 0Bx to 24 0Bx. The dilution period was three hours. After three hours of diluting the samples, the juice was extracted using extractor with a capacity of 64.10 L/hour. 150 L of rain tree pods juice was extracted and subjected to fermentation process using a village level anaerobic bioreactor. Fermentation with yeast (Saccharomyces cerevisiae) can fasten up the process, thus producing more ethanol at a shorter period of time; however, without yeast fermentation, it also produces ethanol at lower volume with slower fermentation process. Distillation of 150 L of fermented broth was done for six hours at 85 °C to 95 °C temperature (feedstock) and 74 °C to 95 °C temperature of the column head (vapor state of ethanol). The highest volume of ethanol recovered was established at with yeast fermentation at five-day duration with a value of 14.89 L and lowest actual ethanol content was found at without yeast fermentation at three-day duration having a value of 11.63 L. In general, the results suggested that rain tree pods had a very good potential as feedstock for bioethanol production. Fermentation of rain tree pods juice can be done with yeast and without yeast.

Keywords: Fermentation, hydrous bioethanol, rain tree pods, village level.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1519
15523 Case Study Analysis of 2017 European Railway Traffic Management Incident: The Application of System for Investigation of Railway Interfaces Methodology

Authors: Sanjeev Kumar Appicharla

Abstract:

This paper presents the results of the modelling and analysis of the European Railway Traffic Management (ERTMS) safety critical incident to raise awareness of biases in systems engineering process on the Cambrian Railway in the UK using the RAIB 17/2019 as a primary input. The RAIB, the UK independent accident investigator, published the Report- RAIB 17/2019 giving the details of their investigation of the focal event in the form of immediate cause, causal factors and underlying factors and recommendations to prevent a repeat of the safety-critical incident on the Cambrian Line. The Systems for Investigation of Railway Interfaces (SIRI) is the Methodology used to model and analyse the safety-critical incident. The SIRI Methodology uses the Swiss Cheese Model to model the incident and identify latent failure conditions (potentially less than adequate conditions) by means of the Management Oversight and Risk Tree technique. The benefits of the SIRI Methodology are threefold: first is that it incorporates “Heuristics and Biases” approach, in the Management Oversight and Risk Tree technique to identify systematic errors. Civil engineering and programme management railway professionals are aware of role “optimism bias” plays in programme cost overruns and are aware of bow tie (fault and event tree) model-based safety risk modelling technique. However, the role of systematic errors due to “Heuristics and Biases” is not appreciated as yet. This overcomes the problems of omission of human and organisational factors from accident analysis. Second, the scope of the investigation includes all levels of the socio-technical system, including government, regulatory, railway safety bodies, duty holders, signalling firms and transport planners, and front-line staff such that lessons learned at the decision making and implementation level as well. Third, the author’s past accident case studies are supplemented with research pieces of evidence drawn from the practitioner’s and academic researchers’ publications as well. This is to discuss the role of system thinking to improve the decision making and risk management processes and practices in the IEC 15288 Systems Engineering standard, and in the industrial context such as the GB railways and Artificial Intelligence (AI) contexts as well.

Keywords: Accident analysis, AI algorithm internal audit, bounded rationality, Byzantine failures, heuristics and biases approach.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 316
15522 Compact Binary Tree Representation of Logic Function with Enhanced Throughput

Authors: Padmanabhan Balasubramanian, C. Ardil

Abstract:

An effective approach for realizing the binary tree structure, representing a combinational logic functionality with enhanced throughput, is discussed in this paper. The optimization in maximum operating frequency was achieved through delay minimization, which in turn was possible by means of reducing the depth of the binary network. The proposed synthesis methodology has been validated by experimentation with FPGA as the target technology. Though our proposal is technology independent, yet the heuristic enables better optimization in throughput even after technology mapping for such Boolean functionality; whose reduced CNF form is associated with a lesser literal cost than its reduced DNF form at the Boolean equation level. For cases otherwise, our method converges to similar results as that of [12]. The practical results obtained for a variety of case studies demonstrate an improvement in the maximum throughput rate for Spartan IIE (XC2S50E-7FT256) and Spartan 3 (XC3S50-4PQ144) FPGA logic families by 10.49% and 13.68% respectively. With respect to the LUTs and IOBUFs required for physical implementation of the requisite non-regenerative logic functionality, the proposed method enabled savings to the tune of 44.35% and 44.67% respectively, over the existing efficient method available in literature [12].

Keywords: Binary logic tree, FPGA based design, Boolean function, Throughput rate, CNF, DNF.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1867
15521 Performance Analysis of Artificial Neural Network with Decision Tree in Prediction of Diabetes Mellitus

Authors: J. K. Alhassan, B. Attah, S. Misra

Abstract:

Human beings have the ability to make logical decisions. Although human decision - making is often optimal, it is insufficient when huge amount of data is to be classified. Medical dataset is a vital ingredient used in predicting patient’s health condition. In other to have the best prediction, there calls for most suitable machine learning algorithms. This work compared the performance of Artificial Neural Network (ANN) and Decision Tree Algorithms (DTA) as regards to some performance metrics using diabetes data. WEKA software was used for the implementation of the algorithms. Multilayer Perceptron (MLP) and Radial Basis Function (RBF) were the two algorithms used for ANN, while RegTree and LADTree algorithms were the DTA models used. From the results obtained, DTA performed better than ANN. The Root Mean Squared Error (RMSE) of MLP is 0.3913 that of RBF is 0.3625, that of RepTree is 0.3174 and that of LADTree is 0.3206 respectively.

Keywords: Artificial neural network, classification, decision tree, diabetes mellitus.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2385
15520 Enhancing the Performance of Wireless Sensor Networks Using Low Power Design

Authors: N. Mahendran, R. Madhuranthi

Abstract:

Wireless sensor networks (WSNs), are constantly in demand to process information more rapidly with less energy and area cost. Presently, processor based solutions have difficult to achieve high processing speed with low-power consumption. This paper presents a simple and accurate data processing scheme for low power wireless sensor node, based on reduced number of processing element (PE). The presented model provides a simple recursive structure (SRS) to process the sampled data in the wireless sensor environment and to reduce the power consumption in wireless sensor node. Based on this model, to process the incoming samples and produce a smaller amount of data sufficient to reconstruct the original signal. The ModelSim simulator used to simulate SRS structure. Functional simulation is carried out for the validation of the presented architecture. Xilinx Power Estimator (XPE) tool is used to measure the power consumption. The experimental results show the average power consumption of 91 mW; this is 42% improvement compared to the folded tree architecture.

Keywords: Power consumption, energy efficiency, low power WSN node, recursive structure, sleep/wake scheduling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 974
15519 Context-aware Recommender Systems using Data Mining Techniques

Authors: Kyoung-jae Kim, Hyunchul Ahn, Sangwon Jeong

Abstract:

This study proposes a novel recommender system to provide the advertisements of context-aware services. Our proposed model is designed to apply a modified collaborative filtering (CF) algorithm with regard to the several dimensions for the personalization of mobile devices – location, time and the user-s needs type. In particular, we employ a classification rule to understand user-s needs type using a decision tree algorithm. In addition, we collect primary data from the mobile phone users and apply them to the proposed model to validate its effectiveness. Experimental results show that the proposed system makes more accurate and satisfactory advertisements than comparative systems.

Keywords: Location-based advertisement, Recommender system, Collaborative filtering, User needs type, Mobile user.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2141
15518 Speaker Independent Quranic Recognizer Basedon Maximum Likelihood Linear Regression

Authors: Ehab Mourtaga, Ahmad Sharieh, Mousa Abdallah

Abstract:

An automatic speech recognition system for the formal Arabic language is needed. The Quran is the most formal spoken book in Arabic, it is spoken all over the world. In this research, an automatic speech recognizer for Quranic based speakerindependent was developed and tested. The system was developed based on the tri-phone Hidden Markov Model and Maximum Likelihood Linear Regression (MLLR). The MLLR computes a set of transformations which reduces the mismatch between an initial model set and the adaptation data. It uses the regression class tree, as well as, estimates a set of linear transformations for the mean and variance parameters of a Gaussian mixture HMM system. The 30th Chapter of the Quran, with five of the most famous readers of the Quran, was used for the training and testing of the data. The chapter includes about 2000 distinct words. The advantages of using the Quranic verses as the database in this developed recognizer are the uniqueness of the words and the high level of orderliness between verses. The level of accuracy from the tested data ranged 68 to 85%.

Keywords: Hidden Markov Model (HMM), MaximumLikelihood Linear Regression (MLLR), Quran, Regression ClassTree, Speech Recognition, Speaker-independent.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1882
15517 Structural Parsing of Natural Language Text in Tamil Using Phrase Structure Hybrid Language Model

Authors: Selvam M, Natarajan. A M, Thangarajan R

Abstract:

Parsing is important in Linguistics and Natural Language Processing to understand the syntax and semantics of a natural language grammar. Parsing natural language text is challenging because of the problems like ambiguity and inefficiency. Also the interpretation of natural language text depends on context based techniques. A probabilistic component is essential to resolve ambiguity in both syntax and semantics thereby increasing accuracy and efficiency of the parser. Tamil language has some inherent features which are more challenging. In order to obtain the solutions, lexicalized and statistical approach is to be applied in the parsing with the aid of a language model. Statistical models mainly focus on semantics of the language which are suitable for large vocabulary tasks where as structural methods focus on syntax which models small vocabulary tasks. A statistical language model based on Trigram for Tamil language with medium vocabulary of 5000 words has been built. Though statistical parsing gives better performance through tri-gram probabilities and large vocabulary size, it has some disadvantages like focus on semantics rather than syntax, lack of support in free ordering of words and long term relationship. To overcome the disadvantages a structural component is to be incorporated in statistical language models which leads to the implementation of hybrid language models. This paper has attempted to build phrase structured hybrid language model which resolves above mentioned disadvantages. In the development of hybrid language model, new part of speech tag set for Tamil language has been developed with more than 500 tags which have the wider coverage. A phrase structured Treebank has been developed with 326 Tamil sentences which covers more than 5000 words. A hybrid language model has been trained with the phrase structured Treebank using immediate head parsing technique. Lexicalized and statistical parser which employs this hybrid language model and immediate head parsing technique gives better results than pure grammar and trigram based model.

Keywords: Hybrid Language Model, Immediate Head Parsing, Lexicalized and Statistical Parsing, Natural Language Processing, Parts of Speech, Probabilistic Context Free Grammar, Tamil Language, Tree Bank.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3597
15516 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.

Keywords: Classification, data mining, spam filtering, naive Bayes, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1453
15515 Development of an Ensemble Classification Model Based on Hybrid Filter-Wrapper Feature Selection for Email Phishing Detection

Authors: R. B. Ibrahim, M. S. Argungu, I. M. Mungadi

Abstract:

It is obvious in this present time, internet has become an indispensable part of human life since its inception. The Internet has provided diverse opportunities to make life so easy for human beings, through the adoption of various channels. Among these channels are email, internet banking, video conferencing, and the like. Email is one of the easiest means of communication hugely accepted among individuals and organizations globally. But over decades the security integrity of this platform has been challenged with malicious activities like Phishing. Email phishing is designed by phishers to fool the recipient into handing over sensitive personal information such as passwords, credit card numbers, account credentials, social security numbers, etc. This activity has caused a lot of financial damage to email users globally which has resulted in bankruptcy, sudden death of victims, and other health-related sicknesses. Although many methods have been proposed to detect email phishing, in this research, the results of multiple machine-learning methods for predicting email phishing have been compared with the use of filter-wrapper feature selection. It is worth noting that all three models performed substantially but one outperformed the other. The dataset used for these models is obtained from Kaggle online data repository, while three classifiers: decision tree, Naïve Bayes, and Logistic regression are ensemble (Bagging) respectively. Results from the study show that the Decision Tree (CART) bagging ensemble recorded the highest accuracy of 98.13% using PEF (Phishing Essential Features). This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, hybrid, filter-wrapper, phishing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 123
15514 A Green Method for Selective Spectrophotometric Determination of Hafnium(IV) with Aqueous Extract of Ficus carica Tree Leaves

Authors: A. Boveiri Monji, H. Yousefnia, M. Haji Hosseini, S. Zolghadri

Abstract:

A clean spectrophotometric method for the determination of hafnium by using a green reagent, acidic extract of Ficus carica tree leaves is developed. In 6-M hydrochloric acid, hafnium reacts with this reagent to form a yellow product. The formed product shows maximum absorbance at 421 nm with a molar absorptivity value of 0.28 × 104 l mol⁻¹ cm⁻¹, and the method was linear in the 2-11 µg ml⁻¹ concentration range. The detection limit value was found to be 0.312 µg ml⁻¹. Except zirconium and iron, the selectivity was good, and most of the ions did not show any significant spectral interference at concentrations up to several hundred times. The proposed method was green, simple, low cost, and selective.

Keywords: Spectrophotometric determination, Ficus carica tree leaves, synthetic reagents, hafnium.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 701
15513 Adsorption of Ferrous and Ferric Ions in Aqueous and Industrial Effluent onto Pongamia pinnata Tree Bark

Authors: M. Mamatha, H. B. Aravinda, E. T. Puttaiah, S. Manjappa

Abstract:

One of the causes of water pollution is the presence of heavy metals in water. In the present study, an adsorbent prepared from the raw bark of the Pongamia pinnata tree is used for the removal of ferrous or ferric ions from aqueous and waste water containing heavy metals. Adsorption studies were conducted at different pH, concentration of metal ion, amount of adsorbent, contact time, agitation and temperature. The Langmuir and Freundlich adsorption isotherm models were applied for the results. The Langmuir isotherms were best fitted by the equilibrium data. The maximum adsorption was found to 146mg/g in waste water at a temperature of 30°C which is in agreement as comparable to the adsorption capacity of different adsorbents reported in literature. Pseudo second order model best fitted the adsorption of both ferrous and ferric ions.

Keywords: Adsorption, Adsorption isotherms, Heavy metals, Industrial effluents.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3217
15512 A 25-year Monitoring of the Air Pollution Depicted by Plane Tree Species in Tehran

Authors: S. A. A. Korori, H. Valipour K., S. Shabestani, A. shirvany, M. Matinizadeh

Abstract:

Tehran, one of the heavily-populated capitals, is severely suffering from increasing air pollution. To show a documented trend of such pollutants during last years, plane tree species (Platanus orientalis) were suited to be studied as indicators, for the species have been planted throughout the city many years ago. Two areas (Saadatabad and Narmak districts) allotting different contents of crowed and highly-traffic routs but the same ecological characteristics were selected. Twelve sample individuals were cored twice perpendicularly in each area. Tree-rings of each core were measured by a binocular microscope and separated annually for the last 25 years. Two heavy metals including Cd and Pb accompanied by a mineral element (Ca) were analyzed using Hatch method. Treerings analysis of the two areas showed different groups in term of physiologically ability as the growths were plunged during the last 10 years in Saadatabad district and showed a slight decrease in the same period for another studying area. In direct contrast to decreasing growth trend in Saadatabad, all three mentioned elements increased sharply during last 25 years in the same area. When it came to Narmak district, the trend was completely different with Saadatabad. There were some fluctuations in absorbing trace elements like tree-rings widths were, yet calcium showed an upward trend all the last 25 years. The results of the study proved the possibility of using tree species of each region to monitor its air pollution trends of the past, hence to depict a pollution assessment of a populated city for last years and then to make appropriate decisions for the future as it is well-known what the trend is. On the other hand, risen values of calcium (as the stress-indicator element) accompanied by increased trace elements suggests non-sustainable state of the trees.

Keywords: Air pollution, Platanus orientalis, Tehran, Traceelements, Tree rings.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1645
15511 Using Single Decision Tree to Assess the Impact of Cutting Conditions on Vibration

Authors: S. Ghorbani, N. I. Polushin

Abstract:

Vibration during machining process is crucial since it affects cutting tool, machine, and workpiece leading to a tool wear, tool breakage, and an unacceptable surface roughness. This paper applies a nonparametric statistical method, single decision tree (SDT), to identify factors affecting on vibration in machining process. Workpiece material (AISI 1045 Steel, AA2024 Aluminum alloy, A48-class30 Gray Cast Iron), cutting tool (conventional, cutting tool with holes in toolholder, cutting tool filled up with epoxy-granite), tool overhang (41-65 mm), spindle speed (630-1000 rpm), feed rate (0.05-0.075 mm/rev) and depth of cut (0.05-0.15 mm) were used as input variables, while vibration was the output parameter. It is concluded that workpiece material is the most important parameters for natural frequency followed by cutting tool and overhang.

Keywords: Cutting condition, vibration, natural frequency, decision tree, CART algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1387
15510 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.

Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1502
15509 Application of Data Mining Techniques for Tourism Knowledge Discovery

Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Abstract:

Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.

Keywords: Classification algorithms; data mining; tourism; knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2492
15508 Dynamic Routing to Multiple Destinations in IP Networks using Hybrid Genetic Algorithm (DRHGA)

Authors: K. Vijayalakshmi, S. Radhakrishnan

Abstract:

In this paper we have proposed a novel dynamic least cost multicast routing protocol using hybrid genetic algorithm for IP networks. Our protocol finds the multicast tree with minimum cost subject to delay, degree, and bandwidth constraints. The proposed protocol has the following features: i. Heuristic local search function has been devised and embedded with normal genetic operation to increase the speed and to get the optimized tree, ii. It is efficient to handle the dynamic situation arises due to either change in the multicast group membership or node / link failure, iii. Two different crossover and mutation probabilities have been used for maintaining the diversity of solution and quick convergence. The simulation results have shown that our proposed protocol generates dynamic multicast tree with lower cost. Results have also shown that the proposed algorithm has better convergence rate, better dynamic request success rate and less execution time than other existing algorithms. Effects of degree and delay constraints have also been analyzed for the multicast tree interns of search success rate.

Keywords: Dynamic Group membership change, Hybrid Genetic Algorithm, Link / node failure, QoS Parameters.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1412
15507 Towards a Measurement-Based E-Government Portals Maturity Model

Authors: Abdoullah Fath-Allah, Laila Cheikhi, Rafa E. Al-Qutaish, Ali Idri

Abstract:

The e-government emerging concept transforms the way in which the citizens are dealing with their governments. Thus, the citizens can execute the intended services online anytime and anywhere. This results in great benefits for both the governments (reduces the number of officers) and the citizens (more flexibility and time saving). Therefore, building a maturity model to assess the egovernment portals becomes desired to help in the improvement process of such portals. This paper aims at proposing an egovernment maturity model based on the measurement of the best practices’ presence. The main benefit of such maturity model is to provide a way to rank an e-government portal based on the used best practices, and also giving a set of recommendations to go to the higher stage in the maturity model.

Keywords: Best practices, e-government portal, maturity model, quality model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2077
15506 A Relative Analysis of Carbon and Dust Uptake by Important Tree Species in Tehran, Iran

Authors: Sahar Elkaee Behjati

Abstract:

Air pollution, particularly with dust, is one of the biggest issues Tehran is dealing with, and the city's green space which consists of trees has a critical role in absorption of it. The question this study aimed to investigate was which tree species the highest uptake capacity of the dust and carbon have suspended in the air. On this basis, 30 samples of trees from two different districts in Tehran were collected, and after washing and centrifuging, the samples were oven dried. The results of the study revealed that Ulmus minor had the highest amount of deposited dust in both districts. In addition, it was found that in Chamran district Ailanthus altissima and in Gandi district Ulmus minor has had the highest absorption of deposited carbon. Therefore, it could be argued that decision making on the selection of species for urban green spaces should take the above-mentioned parameters into account.

Keywords: Dust, leaves, uptake total carbon, tehran, tree species.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 680
15505 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In decision tree approach is most useful in classification problem. With this technique, tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for original C5.0 in order to reduce the optimization of error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of medial data set like hypothyroid. It is shown that, the performance of a classifier on the training cases from which it was constructed gives a poor estimate by sampling or using a separate test file, either way, the classifier is evaluated on cases that were not used to build and evaluate the classifier are both are large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772 case training set and a 1000 case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of see5 is its ability to classifiers called rulesets. The ruleset has an error rate 0.5 % on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive is by f-fold –cross- validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2442
15504 A Systems Approach to Gene Ranking from DNA Microarray Data of Cervical Cancer

Authors: Frank Emmert Streib, Matthias Dehmer, Jing Liu, Max Mühlhauser

Abstract:

In this paper we present a method for gene ranking from DNA microarray data. More precisely, we calculate the correlation networks, which are unweighted and undirected graphs, from microarray data of cervical cancer whereas each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. The interpretation of a tree is that it represents the n-nearest neighbor genes on the n-th level of a tree, measured by the Dijkstra distance, and, hence, gives the local embedding of a gene within the correlation network. For the obtained trees we measure the pairwise similarity between trees rooted by the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to progression of the tumor. Finally, we rank the obtained similarity values from all tissue comparisons and select the top ranked genes. For these genes the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result we find that the top ranked genes are candidates suspected to be involved in tumor growth and, hence, indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.

Keywords: Graph similarity, DNA microarray data, cancer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1712
15503 A Patricia-Tree Approach for Frequent Closed Itemsets

Authors: Moez Ben Hadj Hamida, Yahya SlimaniI

Abstract:

In this paper, we propose an adaptation of the Patricia-Tree for sparse datasets to generate non redundant rule associations. Using this adaptation, we can generate frequent closed itemsets that are more compact than frequent itemsets used in Apriori approach. This adaptation has been experimented on a set of datasets benchmarks.

Keywords: Datamining, Frequent itemsets, Frequent closeditemsets, Sparse datasets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1843
15502 Adaptive MPC Using a Recursive Learning Technique

Authors: Ahmed Abbas Helmy, M. R. M. Rizk, Mohamed El-Sayed

Abstract:

A model predictive controller based on recursive learning is proposed. In this SISO adaptive controller, a model is automatically updated using simple recursive equations. The identified models are then stored in the memory to be re-used in the future. The decision for model update is taken based on a new control performance index. The new controller allows the use of simple linear model predictive controllers in the control of nonlinear time varying processes.

Keywords: Adaptive control, model predictive control, dynamic matrix control, online model identification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1739
15501 Identification of Healthy and BSR-Infected Oil Palm Trees Using Color Indices

Authors: Siti Khairunniza-Bejo, Yusnida Yusoff, Nik Salwani Nik Yusoff, Idris Abu Seman, Mohamad Izzuddin Anuar

Abstract:

Most of the oil palm plantations have been threatened by Basal Stem Rot (BSR) disease which causes serious economic impact. This study was conducted to identify the healthy and BSRinfected oil palm tree using thirteen color indices. Multispectral and thermal camera was used to capture 216 images of the leaves taken from frond number 1, 9 and 17. Indices of normalized difference vegetation index (NDVI), red (R), green (G), blue (B), near infrared (NIR), green – blue (GB), green/blue (G/B), green – red (GR), green/red (G/R), hue (H), saturation (S), intensity (I) and thermal index (T) were used. From this study, it can be concluded that G index taken from frond number 9 is the best index to differentiate between the healthy and BSR-infected oil palm trees. It not only gave high value of correlation coefficient (R=-0.962), but also high value of separation between healthy and BSR-infected oil palm tree. Furthermore, power and S model developed using G index gave the highest R2 value which is 0.985.

Keywords: Oil palm, image processing, disease, leaves.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2921
15500 Discovering Complex Regularities: from Tree to Semi-Lattice Classifications

Authors: A. Faro, D. Giordano, F. Maiorana

Abstract:

Data mining uses a variety of techniques each of which is useful for some particular task. It is important to have a deep understanding of each technique and be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and support the entire data mining process up to results visualization. A graphical representation helps the user to find out a strategy to optimize classification by adding, moving or delete a neuron in order to change the number of classes. The tool is able to automatically suggest a strategy to optimize the number of classes optimization, but also support both tree classifications and semi-lattice organizations of the classes to give to the users the possibility of passing from one class to the ones with which it has some aspects in common. Examples of using tree and semi-lattice classifications are given to illustrate advantages and problems. The tool is applied to classify macroeconomic data that report the most developed countries- import and export. It is possible to classify the countries based on their economic behaviour and use the tool to characterize the commercial behaviour of a country in a selected class from the analysis of positive and negative features that contribute to classes formation. Possible interrelationships between the classes and their meaning are also discussed.

Keywords: Unsupervised classification, Kohonen networks, macroeconomics, Visual data mining, Cluster interpretation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1510
15499 A Similarity Function for Global Quality Assessment of Retinal Vessel Segmentations

Authors: Arturo Aquino, Manuel Emilio Gegundez, Jose Manuel Bravo, Diego Marin

Abstract:

Retinal vascularity assessment plays an important role in diagnosis of ophthalmic pathologies. The employment of digital images for this purpose makes possible a computerized approach and has motivated development of many methods for automated vascular tree segmentation. Metrics based on contingency tables for binary classification have been widely used for evaluating performance of these algorithms and, concretely, the accuracy has been mostly used as measure of global performance in this topic. However, this metric shows very poor matching with human perception as well as other notable deficiencies. Here, a new similarity function for measuring quality of retinal vessel segmentations is proposed. This similarity function is based on characterizing the vascular tree as a connected structure with a measurable area and length. Tests made indicate that this new approach shows better behaviour than the current one does. Generalizing, this concept of measuring descriptive properties may be used for designing functions for measuring more successfully segmentation quality of other complex structures.

Keywords: Retinal vessel segmentation, quality assessment, performanceevaluation, similarity function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1463
15498 Development of the Academic Model to Predict Student Success at VUT-FSASEC Using Decision Trees

Authors: Langa Hendrick Musawenkosi, Twala Bhekisipho

Abstract:

The success or failure of students is a concern for every academic institution, college, university, governments and students themselves. Several approaches have been researched to address this concern. In this paper, a view is held that when a student enters a university or college or an academic institution, he or she enters an academic environment. The academic environment is unique concept used to develop the solution for making predictions effectively. This paper presents a model to determine the propensity of a student to succeed or fail in the French South African Schneider Electric Education Center (FSASEC) at the Vaal University of Technology (VUT). The Decision Tree algorithm is used to implement the model at FSASEC.

Keywords: Academic environment model, decision trees, FSASEC, K-nearest neighbor, machine learning, popularity index, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1091
15497 A High-Speed Multiplication Algorithm Using Modified Partial Product Reduction Tree

Authors: P. Asadee

Abstract:

Multiplication algorithms have considerable effect on processors performance. A new high-speed, low-power multiplication algorithm has been presented using modified Dadda tree structure. Three important modifications have been implemented in inner product generation step, inner product reduction step and final addition step. Optimized algorithms have to be used into basic computation components, such as multiplication algorithms. In this paper, we proposed a new algorithm to reduce power, delay, and transistor count of a multiplication algorithm implemented using low power modified counter. This work presents a novel design for Dadda multiplication algorithms. The proposed multiplication algorithm includes structured parts, which have important effect on inner product reduction tree. In this paper, a 1.3V, 64-bit carry hybrid adder is presented for fast, low voltage applications. The new 64-bit adder uses a new circuit to implement the proposed carry hybrid adder. The new adder using 80 nm CMOS technology has been implemented on 700 MHz clock frequency. The proposed multiplication algorithm has achieved 14 percent improvement in transistor count, 13 percent reduction in delay and 12 percent modification in power consumption in compared with conventional designs.

Keywords: adder, CMOS, counter, Dadda tree, encoder.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2273