Search results for: Decision Tree

171 Evaluation of the Impact of Dataset Characteristics for Classification Problems in Biological Applications

Authors: Kanthida Kusonmano, Michael Netzer, Bernhard Pfeifer, Christian Baumgartner, Klaus R. Liedl, Armin Graber

Abstract:

Availability of high dimensional biological datasets such as from gene expression, proteomic, and metabolic experiments can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and separate predefined classes such as patients with a specific disease versus healthy controls. However, most of the existing research focuses only on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, namely Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest, based on mock datasets. We mimic common biological scenarios by simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The results show that SVM performs quite stably and reaches a higher AUC than the other methods. This may be explained by the ability of SVM to minimize the probability of error. Moreover, Decision Tree, with its good applicability for diagnosis and prognosis, shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better with a higher number of discriminators.

Keywords: Classification, High dimensional data, Machine learning

Downloads: 2384

170 Development of Better Quality Low-Cost Activated Carbon from South African Pine Tree (Pinus patula) Sawdust: Characterization and Comparative Phenol Adsorption

Authors: L. Mukosha, M. S. Onyango, A. Ochieng, H. Kasaini

Abstract:

The remediation of water resources pollution in developing countries requires the application of alternative, sustainable, cheaper and efficient end-of-pipe wastewater treatment technologies. The feasibility of using cheap and abundant South African pine tree (Pinus patula) sawdust to develop low-cost AC of comparable quality to expensive commercial ACs for the abatement of water pollution was investigated. AC was developed at optimized two-stage N2-superheated steam activation conditions in a fixed bed reactor, and characterized for proximate and ultimate properties, N2-BET surface area, pore size distribution, SEM, pHPZC and FTIR. The sawdust pyrolysis activation energy was evaluated by TGA. Results indicated that the chars prepared at 800 °C and 2 h were suitable for development of better quality AC at 800 °C and 47% burn-off, having BET surface area (1086 m²/g), micropore volume (0.26 cm³/g), and mesopore volume (0.43 cm³/g) comparable to expensive commercial ACs, and suitable for removal of water contaminants. The developed AC showed basic surface functionality with a pHPZC of 10.3, and a phenol adsorption capacity that was higher than that of commercial Norit (RO 0.8) AC. Thus, it is feasible to develop better quality low-cost AC from Pinus patula sawdust using two-stage N2-steam activation in a fixed-bed reactor.

Keywords: Activated carbon, phenol adsorption, sawdust integrated utilization, economical wastewater treatment.

Downloads: 3470

169 Level of Acceptability of Moringa oleifera Diversified Products among Rural and Urban Dwellers in Nigeria

Authors: Mojisola F. Oyewole, Franscisca T. Adetoro, Nkiru T. Meludu

Abstract:

Moringa oleifera is a nutritious vegetable tree with a variety of potential uses, as almost every part of the Moringa oleifera tree can be used for food. This study was conducted in Oyo State, Nigeria, to find out the level of acceptability of Moringa oleifera diversified products among rural and urban dwellers. Purposive sampling was used to select two local government areas. A stratified sampling technique was also used to select one community each from rural and urban areas, while a snowball sampling technique was used to select ten respondents each from the two communities, making a total of forty respondents. Data were analyzed using frequencies, percentages, Chi-square, Pearson Product Moment Correlation and regression analysis. Results from the study revealed that the majority of the respondents (80%) fell within the age range of 20-49 years; 55% of them were male, 55% were married, 70% were Christians, and 80% had tertiary education. The results also showed that 85% were aware of the Moringa plant and 65% had consumed Moringa oleifera. The perception statements on the benefits of Moringa oleifera indicated that 52.5% of the respondents rated Moringa oleifera as favorable, and most of them had high acceptability for Moringa egusi soup, Moringa tea, Moringa pap and yam pottage with Moringa. Hypothesis testing showed a significant relationship between the sex of the respondents and the acceptability of the diversified Moringa oleifera products (χ² = 6.465, p = 0.011). There is also a significant relationship between the family size of the respondents and the level of acceptability of the Moringa oleifera products (r = 0.327, p = 0.040). Based on the level of acceptability of Moringa oleifera diversified products, the plant is of great economic importance to the populace. Therefore, there should be more public awareness through the media to enlighten people on the beneficial effects of Moringa oleifera.

Keywords: Acceptability, Moringa oleifera, Diversified, Product, Dwellers.

Downloads: 2612

168 Soil Evaluation for Cashew, Cocoa and Oil Palm in Akure, South-West Nigeria

Authors: Francis Bukola Dada, Samuel Ojo Ajayi, Babatunde Sunday Ewulo, Kehinde Oseni Saani

Abstract:

A key element in the sustainability of the soil-plant relationship in crop yield and performance is the soil's capacity to support tree crops prior to establishment. With the intention of determining the suitability and limitations of the soils of the locations, the northern and southern portions of Akure, a rainforest in Nigeria, were chosen for the suitability evaluation of land for tree crops. In the study area, 16 pedons were established; with the help of the Global Positioning System (GPS), the locations were georeferenced, and samples were taken from the pedons. The samples were subjected to standard physical and chemical testing. The findings revealed that soils in the research locations were deep to extremely deep, with pH ranging from highly acidic to slightly acidic (4.94 to 6.71), and that sand predominated. The soils had low levels of organic carbon, effective cation exchange capacity (ECEC), total nitrogen, and available phosphorus, whereas exchangeable cations were evaluated as low to moderate. The suitability result indicated that only Pedon 2 and Pedon 14 are currently highly suitable (S1) for the production of oil palms, while others ranged from moderately suitable to marginally suitable. Pedons 4, 12, and 16 were not suitable (N1), but other pedons were moderately suitable (S2) or marginally suitable (S3) for the cultivation of cocoa. None of the study areas are currently highly suitable for the production of oil palms. The poor soil texture and low fertility status were the two main drawbacks found. Finally, sound management practices and soil conservation are essential for fertility sustainability.

Keywords: Cashew, cocoa, land evaluation, oil palm, soil fertility suitability.

Downloads: 451

167 Customer Churn Prediction Using Four Machine Learning Algorithms Integrating Feature Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial part of maintaining a customer-oriented business in the telecommunications industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years, which has made it more important to understand customers’ needs in this strong market. Understanding the needs of customers who are looking to switch service providers is especially important. Churn prediction is now a mandatory requirement for retaining customers in the telecommunications industry, and machine learning can be used to accomplish it. Churn prediction has become a very important topic in machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how customers behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. To this end, the study makes use of feature selection, normalization, and feature engineering. The study then compares the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Performance was evaluated using the F1 score and ROC-AUC. Compared with existing models, the approach in this study produced better results. The results showed that Gradient Boosting with the feature selection technique performed best in this study, achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.
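The two evaluation metrics named in this abstract can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation code: the function names `f1_score` and `roc_auc` and the toy labels are assumptions, and churners are assumed to be encoded as class 1.

```python
def f1_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC-AUC as the chance that a random positive is scored above a
    random negative; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = churned, 0 = stayed.
labels = [1, 1, 0, 0]
print(f1_score(labels, [1, 0, 1, 0]))
print(roc_auc(labels, [0.9, 0.4, 0.5, 0.1]))
```

The pairwise AUC above is quadratic in the number of examples, which is fine for illustration; a rank-based computation would be the usual choice on a full dataset.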

Keywords: Machine Learning, Gradient Boosting, Logistic Regression, Churn, Random Forest, Decision Tree, ROC, AUC, F1-score.

Downloads: 408

166 Development of an Ensemble Classification Model Based on Hybrid Filter-Wrapper Feature Selection for Email Phishing Detection

Authors: R. B. Ibrahim, M. S. Argungu, I. M. Mungadi

Abstract:

It is obvious that the internet has become an indispensable part of human life since its inception. The Internet has provided diverse opportunities to make life easier for human beings through the adoption of various channels. Among these channels are email, internet banking, video conferencing, and the like. Email is one of the easiest means of communication and is widely accepted among individuals and organizations globally. But over the decades the security and integrity of this platform have been challenged by malicious activities like phishing. Email phishing is designed by phishers to fool the recipient into handing over sensitive personal information such as passwords, credit card numbers, account credentials, social security numbers, etc. This activity has caused a lot of financial damage to email users globally, which has resulted in bankruptcy, sudden death of victims, and other health-related sicknesses. Although many methods have been proposed to detect email phishing, in this research, the results of multiple machine-learning methods for predicting email phishing have been compared with the use of filter-wrapper feature selection. It is worth noting that all three models performed well, but one outperformed the others. The dataset used for these models was obtained from the Kaggle online data repository, and three classifiers (decision tree, Naïve Bayes, and Logistic regression) were each combined into a bagging ensemble. Results from the study show that the Decision Tree (CART) bagging ensemble recorded the highest accuracy of 98.13% using PEF (Phishing Essential Features). This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, hybrid, filter-wrapper, phishing.

Downloads: 178

165 Case Study Analysis of 2017 European Railway Traffic Management Incident: The Application of System for Investigation of Railway Interfaces Methodology

Authors: Sanjeev Kumar Appicharla

Abstract:

This paper presents the results of the modelling and analysis of the European Railway Traffic Management (ERTMS) safety-critical incident on the Cambrian Railway in the UK, using the RAIB 17/2019 report as a primary input, to raise awareness of biases in the systems engineering process. The RAIB, the UK independent accident investigator, published the report RAIB 17/2019 giving the details of their investigation of the focal event in the form of immediate cause, causal factors and underlying factors, and recommendations to prevent a repeat of the safety-critical incident on the Cambrian Line. The Systems for Investigation of Railway Interfaces (SIRI) is the Methodology used to model and analyse the safety-critical incident. The SIRI Methodology uses the Swiss Cheese Model to model the incident and identify latent failure conditions (potentially less than adequate conditions) by means of the Management Oversight and Risk Tree technique. The benefits of the SIRI Methodology are threefold. First, it incorporates the “Heuristics and Biases” approach in the Management Oversight and Risk Tree technique to identify systematic errors. Civil engineering and programme management railway professionals are aware of the role “optimism bias” plays in programme cost overruns and are aware of the bow tie (fault and event tree) model-based safety risk modelling technique. However, the role of systematic errors due to “Heuristics and Biases” is not yet appreciated. This overcomes the problem of omitting human and organisational factors from accident analysis. Second, the scope of the investigation includes all levels of the socio-technical system, including government, regulatory, railway safety bodies, duty holders, signalling firms and transport planners, and front-line staff, so that lessons are learned at the decision-making and implementation levels as well.
Third, the author’s past accident case studies are supplemented with research evidence drawn from practitioners’ and academic researchers’ publications. This is to discuss the role of system thinking in improving the decision-making and risk management processes and practices in the IEC 15288 Systems Engineering standard, and in industrial contexts such as the GB railways and Artificial Intelligence (AI).

Keywords: Accident analysis, AI algorithm internal audit, bounded rationality, Byzantine failures, heuristics and biases approach.

Downloads: 380

164 Some New Bounds for a Real Power of the Normalized Laplacian Eigenvalues

Authors: Ayşe Dilek Maden

Abstract:

For a given simple connected graph, we present some new bounds via a new approach for a special topological index given by the sum of a real power of the non-zero normalized Laplacian eigenvalues. This approach not only allows old and new bounds on this topic to be derived but also gives an idea of how some previous results in similar areas can be developed.

Keywords: Degree Kirchhoff index, normalized Laplacian eigenvalue, spanning tree.

Downloads: 2201

163 Integrating Context Priors into a Decision Tree Classification Scheme

Authors: Kasim Terzic, Bernd Neumann

Abstract:

Scene interpretation systems need to match (often ambiguous) low-level input data to concepts from a high-level ontology. In many domains, these decisions are uncertain and benefit greatly from proper context. This paper demonstrates the use of decision trees for estimating class probabilities for regions described by feature vectors, and shows how context can be introduced in order to improve the matching performance.

Keywords: Classification, Decision Trees, Interpretation, Vision

Downloads: 1300

162 A New Distribution Network Reconfiguration Approach using a Tree Model

Authors: E. Dolatdar, S. Soleymani, B. Mozafari

Abstract:

Power loss reduction is one of the main targets in the power industry, and so in this paper the problem of finding the optimal configuration of a radial distribution system for loss reduction is considered. Optimal reconfiguration involves the selection of the best set of branches to be opened, one from each loop, for reducing resistive line losses and relieving overloads on feeders by shifting the load to adjacent feeders. However, since there are many candidate switching combinations in the system, feeder reconfiguration is a complicated problem. In this paper a new approach is proposed based on a simple optimum loss calculation by determining optimal trees of the given network. From graph theory, a distribution network can be represented by a graph that consists of a set of nodes and branches. In fact, this problem can be viewed as a problem of determining an optimal tree of the graph which simultaneously ensures the radial structure of each candidate topology. In this method a refined genetic algorithm is also set up, and some improvements of the algorithm are made to the chromosome coding. In this paper an implementation of the algorithm presented in [7] is applied by modifying the load flow program, and this method is compared with the proposed method. In [7] an algorithm is proposed in which the choice of the switches to be opened is based on simple heuristic rules. This algorithm reduces the number of load flow runs, reduces the number of switching combinations, and gives the optimum solution. To demonstrate the validity of these methods, computer simulations with PSAT and MATLAB programs are carried out on a 33-bus test system. The results show that the performance of the proposed method is better than that of the method in [7] and of other methods.
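The radiality requirement described here (every candidate topology must be a tree of the network graph) can be checked with a union-find pass. The sketch below is only an illustration of that graph-theoretic condition, not the paper's genetic algorithm; the function name `is_radial`, the bus numbering from 0, and the 4-bus ring are assumptions.

```python
def is_radial(num_buses, closed_branches):
    """Return True if the closed branches form a spanning tree over
    buses 0..num_buses-1, i.e. the topology is connected and loop-free."""
    # A spanning tree on n nodes has exactly n - 1 edges.
    if len(closed_branches) != num_buses - 1:
        return False

    parent = list(range(num_buses))

    def find(x):
        # Follow parent links to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in closed_branches:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # closing this branch would create a loop
        parent[root_u] = root_v
    # n - 1 loop-free edges on n nodes are necessarily connected.
    return True


# Hypothetical 4-bus ring: opening one tie switch makes it radial.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_radial(4, ring))       # all switches closed: one loop
print(is_radial(4, ring[:-1]))  # branch (3, 0) opened
```

A search procedure such as a genetic algorithm could use a check like this to discard or repair candidate switch sets before running a load flow on them.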

Keywords: Distribution System, Reconfiguration, Loss Reduction, Graph Theory, Optimization, Genetic Algorithm

Downloads: 3782

161 Use XML Format like a Model of Data Backup

Authors: Souleymane Oumtanaga, Kadjo Tanon Lambert, Koné Tiémoman, Tety Pierre, Dowa N’sreke Florent

Abstract:

Nowadays, new data backup formats keep appearing, raising concerns about their accessibility and longevity. XML is one of the most promising formats for guaranteeing the integrity of data. This article shows one thing that can be done with XML: it can help to create a data backup model. The main task consists of defining a Java application able to convert the information in a database into XML format and restore it later.
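The convert-then-restore idea can be sketched as a round trip through XML. The paper's application is written in Java; the Python below is only a minimal illustration using the standard `xml.etree.ElementTree` module, and the element names (`table`, `row`, `field`), the function names, and the sample row are all assumptions. Values are restored as strings, since this sketch stores no type information.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(table_name, rows):
    """Serialize a list of row dicts (column -> value) into an XML string."""
    root = ET.Element("table", name=table_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for column, value in row.items():
            field = ET.SubElement(row_el, "field", name=column)
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text):
    """Restore (table_name, rows) from XML produced by rows_to_xml."""
    root = ET.fromstring(xml_text)
    rows = []
    for row_el in root.findall("row"):
        # An empty field comes back with text None; map it to "".
        rows.append({f.get("name"): (f.text or "")
                     for f in row_el.findall("field")})
    return root.get("name"), rows


# Hypothetical table content standing in for a database query result.
backup = rows_to_xml("customers", [{"id": "1", "name": "Ada & Co"}])
print(backup)
print(xml_to_rows(backup))
```

Special characters such as `&` are escaped on output and unescaped on parsing by the library, which is one reason to build the document through an XML API instead of string concatenation.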

Keywords: Backup, Proprietary format, parser, syntactic tree.

Downloads: 1730

160 Prime Cordial Labeling on Graphs

Authors: S. Babitha, J. Baskar Babujee

Abstract:

A prime cordial labeling of a graph G with vertex set V is a bijection f from V to {1, 2, ..., |V|} such that, when each edge uv is assigned the label 1 if gcd(f(u), f(v)) = 1 and 0 if gcd(f(u), f(v)) > 1, the number of edges labeled with 0 and the number of edges labeled with 1 differ by at most 1. In this paper we exhibit some characterization results and new constructions on prime cordial graphs.
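The definition translates directly into a checker. The sketch below is illustrative only: the function name `is_prime_cordial` and the 4-vertex path are assumptions, and every vertex that appears in an edge is assumed to appear in the labeling.

```python
from math import gcd


def is_prime_cordial(edges, labeling):
    """Check whether `labeling` (vertex -> integer) is a prime cordial
    labeling of the graph given by `edges` (pairs of vertices)."""
    n = len(labeling)
    # f must be a bijection from V onto {1, 2, ..., |V|}.
    if sorted(labeling.values()) != list(range(1, n + 1)):
        return False
    # Edge label is 1 when the endpoint labels are coprime, else 0.
    ones = sum(1 for u, v in edges if gcd(labeling[u], labeling[v]) == 1)
    zeros = len(edges) - ones
    return abs(ones - zeros) <= 1


# Path a-b-c-d (hypothetical example).
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(is_prime_cordial(path, {"a": 2, "b": 4, "c": 1, "d": 3}))
print(is_prime_cordial(path, {"a": 1, "b": 2, "c": 3, "d": 4}))
```

In the first labeling the edge labels are 0, 1, 1, so the counts differ by one; in the second all three consecutive pairs are coprime, so the counts differ by three and the check fails.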

Keywords: Prime cordial, tree, Euler, bijective, function.

Downloads: 3576

159 Load Forecasting in Microgrid Systems with R and Cortana Intelligence Suite

Authors: F. Lazzeri, I. Reiter

Abstract:

Energy production optimization has traditionally been very important for utilities in order to improve resource consumption. However, load forecasting is a challenging task, as there are a large number of relevant variables that must be considered, and several strategies have been used to deal with this complex problem. This is especially true in microgrids, where many elements have to adjust their performance depending on future generation and consumption conditions. The goal of this paper is to present a solution for short-term load forecasting in microgrids, based on three machine learning experiments developed in R and web services built and deployed with different components of Cortana Intelligence Suite: Azure Machine Learning, a fully managed cloud service that makes it easy to build, deploy, and share predictive analytics solutions; SQL database, a Microsoft database service for app developers; and PowerBI, a suite of business analytics tools to analyze data and share insights. Our results show that Boosted Decision Tree and Fast Forest Quantile regression methods can be very useful for predicting hourly short-term consumption in microgrids; moreover, we found that for these types of forecasting models, weather data (temperature, wind, humidity and dew point) can play a crucial role in improving the accuracy of the forecasting solution. Data cleaning and feature engineering methods performed in R and different types of machine learning algorithms (Boosted Decision Tree, Fast Forest Quantile and ARIMA) will be presented, and results and performance metrics discussed.

Keywords: Time-series, features engineering methods for forecasting, energy demand forecasting, Azure machine learning.

Downloads: 1290

158 Calculation of a Sustainable Quota Harvesting of Long-Tailed Macaque (Macaca fascicularis Raffles) in Their Natural Habitats

Authors: Y. Santosa, D. A. Rahman, C. Wulan, A. H. Mustari

Abstract:

The global demand for long-tailed macaques for medical experimentation has continued to increase. Fulfillment of Indonesian export demands has been mostly from natural habitats, based on a harvesting quota. This quota has been determined according to the total catch for a given year, and not based on consideration of any demographic parameters or physical environmental factors with regard to the animal; hence threatening the sustainability of the various populations. It is therefore necessary to formulate a method for calculating a sustainable harvesting quota, based on population parameters in natural habitats. Considering the possibility of variations in habitat characteristics and population parameters, a time series observation of demographic and physical/biotic parameters, in various habitats, was performed on 13 groups of long-tailed macaques, distributed throughout the West Java, Lampung and Yogyakarta areas of Indonesia. These provinces were selected for comparison of the influence of human/tourism activities. Data on population parameters that was collected included data on life expectancy according to age class, numbers of individuals by sex and age class, and ‘ratio of infants to reproductive females’. The estimation of population growth was based on a population dynamic growth model: the Leslie matrix. The harvesting quota was calculated as being the difference between the actual population size and the MVP (minimum viable population) for each sex and age class. Observation indicated that there were variations within group size (24–106 individuals), gender (sex) ratio (1:1 to 1:1.3), life expectancy value (0.30 to 0.93), and ‘ratio of infants to reproductive females’ (0.23 to 1.56). Results of subsequent calculations showed that sustainable harvesting quotas for each studied group of long-tailed macaques, ranged from 29 to 110 individuals. 
An estimation model of the MVP for each age class was formulated as Log Y = 0.315 + 0.884 Log Ni (Ni: number of individuals in the ith age class). This study also found that life expectancy for the juvenile age class was affected by the humidity under tree stands and by dietary plant density at the sapling, pole and tree stages (equation: Y = 2.296 – 1.535 RH + 0.002 Kpcg – 0.002 Ktg – 0.001 Kphn, R² = 89.6% with a significance value of 0.001). By contrast, for the sub-adult-adult age class, life expectancy was significantly affected by slope (equation: Y=0.377 = 0.012 Kml, R² = 50.4%, with a significance level of 0.007). The infant-to-reproductive-female ratio was affected by humidity under tree stands and by dietary plant density at the sapling and pole stages (equation: Y = – 1.432 + 2.172 RH – 0.004 Kpcg + 0.003 Ktg, R² = 82.0% with a significance level of 0.001). This research confirmed the importance of population parameters in determining the minimum viable population, and that MVP varied according to habitat characteristics (especially food availability). It would therefore be difficult to formulate a general mathematical equation model for determining a harvesting quota for the species as a whole.
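The two calculations the abstract describes, a Leslie matrix projection and a quota taken as the population in excess of the MVP per class, can be sketched as follows. This is a simplified illustration and not the authors' model: the function names, the three age classes, and all numbers are hypothetical, sex classes are ignored, and clamping negative quotas to zero is an assumption.

```python
def leslie_step(population, fecundity, survival):
    """Advance an age-structured population by one time step.

    population[i]: individuals in age class i
    fecundity[i]:  offspring per individual of class i (enter class 0)
    survival[i]:   fraction of class i surviving into class i + 1
                   (one entry fewer than the number of classes)
    """
    newborns = sum(f * n for f, n in zip(fecundity, population))
    survivors = [s * n for s, n in zip(survival, population[:-1])]
    return [newborns] + survivors


def harvest_quota(population, mvp):
    """Quota per class: population above the minimum viable population.
    Classes at or below their MVP get a quota of zero (an assumption)."""
    return [max(0.0, n - m) for n, m in zip(population, mvp)]


# Hypothetical values for three age classes.
projected = leslie_step([10, 20, 30], fecundity=[0, 1, 0.5],
                        survival=[0.5, 0.8])
print(projected)
print(harvest_quota(projected, mvp=[20, 8, 10]))
```

This is equivalent to multiplying the population vector by a Leslie matrix whose first row holds the fecundities and whose sub-diagonal holds the survival rates.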

Keywords: Harvesting, long-tailed macaque, population, quota.

Downloads: 2014

157 Rolling Element Bearing Diagnosis by Improved Envelope Spectrum: Optimal Frequency Band Selection

Authors: Juan David Arango, Alejandro Restrepo-Martinez

Abstract:

Rolling Element Bearing (REB) vibration diagnosis is of special interest owing to the variety of REBs and the widespread need for these elements in industrial applications. The presence of a localized fault in a REB gives rise to a vibrational response, characterized by the modulation of a carrier signal. The frequency content of the carrier signal (Spectral Frequency –f) is mainly related to the resonance frequencies of the REB. This carrier signal is modulated by another signal, governed by the periodicity of the fault impact (Cyclic Frequency –α). In this sense, the REB fault vibration response gives rise to a second-order cyclostationary signal. Second-order cyclostationary signals can be represented in a bi-spectral map, where the Spectral Coherence –SCoh is plotted against f and α. The Improved Envelope Spectrum –IES is a useful approach to REB fault diagnosis. The IES is obtained by integrating SCoh over a predefined bandwidth on the f axis. Approaches to selecting the f-bandwidth have recently been presented, based on the definition of a metric which evaluates the magnitude of the IES at the fault characteristic frequencies. This metric is represented in a 1/3-binary tree as a function of the frequency bandwidth and centre. Based on this binary tree, the optimal frequency band is selected. However, some advantages have been seen if the metric is changed, which tends to dictate a different optimal f-bandwidth and so improve the IES representation. This paper evaluates the behaviour of the IES under the optimization of a different metric. This metric is based on the sample correlation coefficient, detecting high peaks at the selected frequencies while penalizing high peaks in the neighbourhood of the selected frequencies. Preliminary results indicate an improvement in the signal-to-noise ratio (SNR) in around 86% of the samples analysed, which belong to the IMS database.

Keywords: Sample Correlation IESFOgram, cyclostationary analysis, improved envelope spectrum, IES, rolling element bearing diagnosis, spectral coherence.

Downloads: 742

156 Distributed Data-Mining by Probability-Based Patterns

Authors: M. Kargar, F. Gharbalchi

Abstract:

In this paper a new method is suggested for distributed data-mining using probability patterns. These patterns use decision trees and decision graphs. Care is taken that the patterns are valid, novel, useful, and understandable. Considering a set of functions, the system reaches a good pattern or better objectives. Using the suggested method, we will be able to extract useful information from massive and multi-relational databases.

Keywords: Data-mining, Decision tree, Decision graph, Pattern, Relationship.

Downloads: 1555

155 Project Selection by Using Fuzzy AHP and TOPSIS Technique

Authors: S. Mahmoodzadeh, J. Shahrabi, M. Pariazar, M. S. Zaeri

Abstract:

In this article, we propose a new method for the project selection problem using fuzzy AHP and the TOPSIS technique. After reviewing four common methods of comparing investment alternatives (net present value, rate of return, benefit-cost analysis and payback period), we use them as criteria in the AHP tree. In this methodology, we first calculate the weight of each criterion using the Analytical Hierarchy Process improved by fuzzy set theory. Then the projects are assessed by implementing the TOPSIS algorithm. The results obtained have been tested in a numerical example.
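The TOPSIS ranking step can be sketched as below. This is the standard crisp TOPSIS procedure with vector normalization, offered as an illustration only: the criterion weights are passed in as plain numbers (in the paper they come from fuzzy AHP, which is not reproduced here), and the function name, the two projects, and their scores are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives with TOPSIS (closeness to the ideal solution).

    matrix[i][j]: value of alternative i on criterion j
    weights[j]:   weight of criterion j
    benefit[j]:   True if larger is better, False if smaller is better
    Returns one closeness score in [0, 1] per alternative.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix))
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Two hypothetical projects scored on net present value (benefit)
# and payback period in years (cost).
print(topsis([[10, 1], [5, 2]], weights=[0.5, 0.5], benefit=[True, False]))
```

The sketch does not guard against degenerate inputs: a criterion column of all zeros, or alternatives that are identical on every criterion, would cause a division by zero.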

Keywords: Fuzzy AHP, Project Selection, TOPSIS Technique.

Downloads: 6600

154 The Diameter of an Interval Graph is Twice of its Radius

Authors: Tarasankar Pramanik, Sukumar Mondal, Madhumangal Pal

Abstract:

In an interval graph G = (V, E) the distance between two vertices u, v is defined as the smallest number of edges in a path joining u and v. The eccentricity of a vertex v is the maximum among distances from all other vertices of V. The diameter (δ) and radius (ρ) of the graph G are respectively the maximum and minimum among all the eccentricities of G. The center of the graph G is the set C(G) of vertices with eccentricity ρ. In this context our aim is to establish the relation ρ = δ/2 for an interval graph and to determine the center of it.
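The definitions of eccentricity, radius, diameter and center can be made concrete with a breadth-first search from every vertex. This is a brute-force illustration of the definitions, not the algorithm of the paper; the function names and the five sample intervals are assumptions, and intervals are treated as closed.

```python
from collections import deque


def interval_graph(intervals):
    """Adjacency lists of the interval graph: vertices i and j are
    adjacent when the closed intervals i and j overlap."""
    n = len(intervals)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            (a1, b1), (a2, b2) = intervals[i], intervals[j]
            if a1 <= b2 and a2 <= b1:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def eccentricities(adj):
    """Eccentricity of every vertex, by breadth-first search."""
    ecc = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph must be connected")
        ecc[source] = max(dist.values())
    return ecc


def radius_diameter_center(adj):
    """Return (radius, diameter, list of center vertices)."""
    ecc = eccentricities(adj)
    radius, diameter = min(ecc.values()), max(ecc.values())
    center = [v for v, e in ecc.items() if e == radius]
    return radius, diameter, center


# Five hypothetical intervals whose overlaps form a path on 5 vertices.
graph = interval_graph([(0, 2), (1, 4), (3, 6), (5, 8), (7, 9)])
print(radius_diameter_center(graph))  # (2, 4, [2])
```

For this example the eccentricities are 4, 3, 2, 3, 4, so the radius is 2, the diameter is 4, and the center is the single middle vertex.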

Keywords: Interval graph, interval tree, radius, center.

Downloads: 1643

153 Mining Sequential Patterns Using I-PrefixSpan

Authors: Dhany Saputra, Dayang R. A. Rambli, Oi Mean Foong

Abstract:

In this paper, we propose an improvement of the pattern growth-based PrefixSpan algorithm, called I-PrefixSpan. The general idea of I-PrefixSpan is to use a sufficient data structure for the Seq-Tree framework and a separator database to reduce execution time and memory usage. Thus, with I-PrefixSpan no in-memory database is stored after the index set is constructed. The experimental results show that, using Java 2, this method improves the speed of PrefixSpan by up to almost two orders of magnitude and its memory usage by more than one order of magnitude.

Keywords: ArrayList, ArrayIntList, minimum support, sequence database, sequential patterns.

Downloads: 1564

152 Using Interval Trees for Approximate Indexing of Instances

Authors: Khalil el Hindi

Abstract:

This paper presents a simple and effective method for approximate indexing of instances for instance based learning. The method uses an interval tree to determine a good starting search point for the nearest neighbor. The search stops when an early stopping criterion is met. The method proved to be very effective especially when only the first nearest neighbor is required.

Keywords: Instance based learning, interval trees, the knn algorithm, machine learning.

Downloads: 1510

151 A Knowledge Engineering Workshop: Application for Choise Car

Authors: Touahria Mohamed, Khababa Abdallah, Frécon Louis

Abstract:

This paper proposes a declarative language for knowledge representation (Ibn Rochd) and its exploitation environment (DeGSE). The DeGSE system was designed and developed to facilitate the writing of Ibn Rochd applications. The system was tested on several knowledge bases of ascending complexity, culminating in a system for the recognition of a plant or a tree, and advisors for purchasing a car, for pedagogical and academic guidance, or for bank savings and credit. Finally, the limits of the language and research perspectives are stated.

Keywords: Knowledge representation, declarative language, IbnRochd, DeGSE, facets, cognitive approach.

Downloads 1328

150 Clustering Categorical Data Using Hierarchies (CLUCDUH)

Authors: Gökhan Silahtaroğlu

Abstract:

Clustering large populations is an important problem when the data contain noise and different shapes. A good clustering algorithm or approach should be efficient enough to detect clusters sensitively. Besides space complexity, time complexity also gains importance as the size grows. Using hierarchies, we developed a new algorithm that splits attributes according to their values and chooses the dimension for splitting so as to divide the database into parts that are as equal as possible. At each node we calculate certain descriptive statistical features of the data residing there, and by pruning we generate the natural clusters with a complexity of O(n).

Keywords: Clustering, tree, split, pruning, entropy, gini.

Downloads 1556
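
The splitting step described in the CLUCDUH abstract can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: it assumes records are dictionaries of categorical values and interprets "roughly equal parts" as choosing the attribute whose value distribution has the highest normalized entropy. The helper names `balance`, `most_balanced_attribute`, and `split` are hypothetical, and the pruning and per-node statistics are omitted.

```python
import math
from collections import Counter


def balance(rows, attr):
    """Normalized entropy of attr's values: 1.0 means perfectly equal parts."""
    counts = Counter(row[attr] for row in rows)
    if len(counts) < 2:
        return 0.0  # a single value cannot split the data
    n = len(rows)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(len(counts))


def most_balanced_attribute(rows, attrs):
    """Pick the attribute that divides rows most evenly (assumed criterion)."""
    return max(attrs, key=lambda a: balance(rows, a))


def split(rows, attr):
    """Partition rows by the value they take on attr."""
    parts = {}
    for row in rows:
        parts.setdefault(row[attr], []).append(row)
    return parts


if __name__ == "__main__":
    data = [
        {"color": "red", "size": "S"},
        {"color": "red", "size": "L"},
        {"color": "red", "size": "S"},
        {"color": "blue", "size": "L"},
    ]
    chosen = most_balanced_attribute(data, ["color", "size"])
    print(chosen, {k: len(v) for k, v in split(data, chosen).items()})
```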

149 Tool for Fast Detection of Java Code Snippets

Authors: Tomáš Bublík, Miroslav Virius

Abstract:

This paper presents general results on the Java source code snippet detection problem. We propose a tool that uses graph and subgraph isomorphism detection. A number of solutions for all of these tasks have been proposed in the literature. However, although all these solutions are really fast, they compare only constant static trees. Our solution allows an input sample to be entered dynamically with the Scripthon language while preserving an acceptable speed. We used several optimizations to achieve a very low number of comparisons during the matching algorithm.

Keywords: AST, Java, tree matching, Scripthon, source code recognition

Downloads 1958

148 An Energy Efficient Algorithm for Distributed Mutual Exclusion in Mobile Ad-hoc Networks

Authors: Sayani Sil, Sukanta Das

Abstract:

This paper reports a distributed mutual exclusion algorithm for mobile ad hoc networks. The network is clustered hierarchically. The proposed algorithm treats the clustered network as a logical tree and develops a token-passing scheme to achieve mutual exclusion. The performance analysis and simulation results show that its message requirement is optimal, and thus the algorithm is energy efficient.

Keywords: Critical section, Distributed mutual exclusion, Mobile Ad-hoc network, Token-based algorithms.

Downloads 1751

147 N-Sun Decomposition of Complete, Complete Bipartite and Some Harary Graphs

Authors: R. Anitha, R. S. Lekshmi

Abstract:

Graph decompositions are vital in the study of combinatorial design theory. A decomposition of a graph G is a partition of its edge set. An n-sun graph is a cycle C_n with an edge terminating in a vertex of degree one attached to each vertex. In this paper, we define n-sun decomposition of some even order graphs with a perfect matching. We have proved that the complete graph K_{2n}, the complete bipartite graph K_{2n,2n} and the Harary graph H_{4,2n} have n-sun decompositions. A labeling scheme is used to construct the n-suns.

Keywords: Decomposition, Hamilton cycle, n-sun graph, perfect matching, spanning tree.

Downloads 2396
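
The definition of an n-sun graph in the abstract above can be made concrete with a short construction. The sketch below assumes vertices are labeled 0 to n-1 on the cycle and n to 2n-1 for the pendant vertices; this labeling and the names `n_sun_edges` and `degrees` are illustrative and are not the labeling scheme of the paper.

```python
from collections import Counter


def n_sun_edges(n):
    """Edge list of an n-sun: a cycle on n vertices plus one pendant edge each."""
    if n < 3:
        raise ValueError("a cycle needs at least 3 vertices")
    cycle = [(i, (i + 1) % n) for i in range(n)]   # cycle vertices 0..n-1
    pendant = [(i, n + i) for i in range(n)]       # pendant vertices n..2n-1
    return cycle + pendant


def degrees(edges):
    """Map each vertex to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    edges = n_sun_edges(4)
    print(len(edges), dict(degrees(edges)))
```

Each cycle vertex ends up with degree three and each pendant vertex with degree one, so an n-sun has 2n vertices and 2n edges.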

146 Approximately Similarity Measurement of Web Sites Using Genetic Algorithms and Binary Trees

Authors: Doru Anastasiu Popescu, Dan Rădulescu

Abstract:

In this paper, we determine the similarity of two HTML web applications. We use a genetic algorithm to determine the most significant web pages of each application (we do not use every web page of a site). Using these significant web pages, we find the similarity value between the two applications. The algorithm is efficient because it uses a reduced number of web pages for comparisons, but it returns an approximate value of the similarity. Binary trees are used to keep the tags from the significant pages. The algorithm was implemented in the Java language.

Keywords: Tag, HTML, web page, genetic algorithm, similarity value, binary tree.

Downloads 1309

145 A Methodology for Definition of Road Networks in Rural Areas of Nepal

Authors: J. K. Shrestha, A. Benta, R. B. Lopes, N. Lopes

Abstract:

This work provides a practical method for the development of road networks in rural areas of developing countries. The proposed methodology makes it possible to determine obligatory points in the rural road network, maximizing the number of settlements that have access to basic services within a given maximum distance. The proposed methodology is simple and practical, and hence highly applicable to real-world scenarios, as demonstrated in the definition of the road network for the rural areas of Nepal.

Keywords: Minimum spanning tree, nodal points, rural road network.

Downloads 2880

144 A Frame Work for the Development of a Suitable Method to Find Shoot Length at Maturity of Mustard Plant Using Soft Computing Model

Authors: Satyendra Nath Mandal, J. Pal Choudhury, Dilip De, S. R. Bhadra Chaudhuri

Abstract:

The production of a plant can be measured in terms of seeds. The generation of seeds plays a critical role in our social and daily life. Fruit production, which generates seeds, depends on various parameters of the plant, such as shoot length, leaf number, root length, and root number. While the plant is growing, some leaves may be lost and some new leaves may appear, so it is very difficult to use the number of leaves to calculate the growth of the plant. It is also cumbersome to measure the number of roots and the growth in root length continuously at several time instances after a certain initial period, because roots grow deeper and deeper underground over time. In contrast, the shoot length grows over time and can be measured at different time instances. The growth of the plant can therefore be measured using shoot length data collected at different time instances after plantation. Environmental parameters such as temperature, rainfall, humidity, and pollution also play some role in yield production. Soil, crop, and distance management are taken care of to produce the maximum yield of the plant. Data on the growth of shoot length of some mustard plants at the initial stage (7, 14, 21, and 28 days after plantation) are available from a statistical survey by a group of scientists under the supervision of Prof. Dilip De. In this paper, the initial shoot length of Ken (one type of mustard plant) has been used as the initial data. Statistical models, fuzzy logic methods, and neural networks have been tested on this mustard plant, and based on error analysis (calculation of average error), the model with the minimum error has been selected and can be used for the assessment of shoot length at maturity.
Finally, all these methods have been tested on other types of mustard plants, and the soft computing model with the minimum error across all types has been selected for calculating the predicted growth of shoot length. The shoot length at maturity of all types of mustard plants has been calculated by applying the statistical method to the predicted shoot length data.

Keywords: Fuzzy time series, neural network, forecasting error, average error.

Downloads 1591

143 Path Planning of a Robot Manipulator using Retrieval RRT Strategy

Authors: K. Oh, J. P. Hwang, E. Kim, H. Lee

Abstract:

This paper presents an algorithm that extends the rapidly-exploring random tree (RRT) framework to deal with changes in the task environment. This algorithm, called the Retrieval RRT Strategy (RRS), combines a support vector machine (SVM) and RRT and plans the robot motion in the presence of changes in the surrounding environment. The algorithm consists of two levels. At the first level, the SVM is built and selects a proper path from the bank of RRTs for a given environment. At the second level, a real path is planned by the RRT planners for the given environment. The suggested method is applied to the control of KUKA™, a commercial 6 DOF robot manipulator, and its feasibility and efficiency are demonstrated via the cosimulation of MatLab™ and RecurDyn™.

Keywords: Path planning, RRT, 6 DOF manipulator, SVM.

Downloads 2531
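
As background for the RRT framework that the Retrieval RRT Strategy extends, here is a minimal sketch of a basic goal-biased RRT for a point robot in a 2D plane. It covers neither the SVM-based retrieval level nor a 6 DOF manipulator, and it only checks that each new node is collision free, not the segment leading to it. The function `rrt`, the `is_free` callback, and all parameter values are assumptions made for illustration.

```python
import math
import random


def rrt(start, goal, is_free, bounds, step=0.5, goal_tol=0.5,
        goal_bias=0.1, max_iters=3000, seed=0):
    """Grow a tree from start; return a path to within goal_tol of goal, or None."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    nodes = [start]
    parent = [None]  # parent[i] is the index of node i's parent

    for _ in range(max_iters):
        # Sample the goal occasionally, otherwise a uniform random point.
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))

        # Find the nearest tree node and step from it toward the sample.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        t = min(step, d) / d
        new = (near[0] + t * (sample[0] - near[0]),
               near[1] + t * (sample[1] - near[1]))
        if not is_free(new):
            continue  # node-only collision check (simplification)

        nodes.append(new)
        parent.append(i_near)
        if math.dist(new, goal) <= goal_tol:
            # Walk parent links back to the root to recover the path.
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None
```

A caller supplies `is_free` for its own environment, for example a function that rejects points inside a circular obstacle.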

142 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in population studies are treated by statistical methods. Although these methods are robust, they are not sufficient in themselves to harness the potential wealth of the data. To that end, we used two learning algorithms: Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to diagnose thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: Classifier, decision tree algorithms, knowledge extraction, Support Vector Machine.

Downloads 1870