Search results for: tree labeling
178 Evaluation of the Impact of Dataset Characteristics for Classification Problems in Biological Applications
Authors: Kanthida Kusonmano, Michael Netzer, Bernhard Pfeifer, Christian Baumgartner, Klaus R. Liedl, Armin Graber
Abstract:
Availability of high dimensional biological datasets such as from gene expression, proteomic, and metabolic experiments can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and separate between predefined classes such as patients with a special disease versus healthy controls. However, most of the existing research only focuses on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, which are Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest based on mock datasets. We mimic common biological scenarios simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The result shows that SVM performs quite stable and reaches a higher AUC compared to other methods. This may be explained due to the ability of SVM to minimize the probability of error. Moreover, Decision Tree with its good applicability for diagnosis and prognosis shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better when having a higher number of discriminators.
Keywords: Classification, High dimensional data, Machine learning
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2383177 Development of Better Quality Low-Cost Activated Carbon from South African Pine Tree (Pinus patula) Sawdust: Characterization and Comparative Phenol Adsorption
Authors: L. Mukosha, M. S. Onyango, A. Ochieng, H. Kasaini
Abstract:
The remediation of water resources pollution in developing countries requires the application of alternative sustainable cheaper and efficient end-of-pipe wastewater treatment technologies. The feasibility of use of South African cheap and abundant pine tree (Pinus patula) sawdust for development of lowcost AC of comparable quality to expensive commercial ACs in the abatement of water pollution was investigated. AC was developed at optimized two-stage N2-superheated steam activation conditions in a fixed bed reactor, and characterized for proximate and ultimate properties, N2-BET surface area, pore size distribution, SEM, pHPZC and FTIR. The sawdust pyrolysis activation energy was evaluated by TGA. Results indicated that the chars prepared at 800oC and 2hrs were suitable for development of better quality AC at 800oC and 47% burn-off having BET surface area (1086m2/g), micropore volume (0.26cm3/g), and mesopore volume (0.43cm3/g) comparable to expensive commercial ACs, and suitable for water contaminants removal. The developed AC showed basic surface functionality at pHPZC at 10.3, and a phenol adsorption capacity that was higher than that of commercial Norit (RO 0.8) AC. Thus, it is feasible to develop better quality low-cost AC from (Pinus patula) sawdust using twostage N2-steam activation in fixed-bed reactor.
Keywords: Activated carbon, phenol adsorption, sawdust integrated utilization, economical wastewater treatment.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3469176 Level of Acceptability of Moringa oleifera Diversified Products among Rural and Urban Dwellers in Nigeria
Authors: Mojisola F. Oyewole, Franscisca T. Adetoro, Nkiru T. Meludu
Abstract:
Moringa oleifera is a nutritious vegetable tree with varieties of potential uses, as almost every part of the Moringa oleifera tree can be used for food. This study was conducted in Oyo State, Nigeria, to find out the level of acceptability of Moringa oleifera diversified products among rural and urban dwellers. Purposive sampling was used to select two local governments’ areas. Stratified sampling technique was also used to select one community each from rural and urban areas while snowball sampling technique was used to select ten respondents each from the two communities, making a total number of forty respondents. Data were analyzed using frequencies, percentages, Chi-square, Pearson Product Moment Correlation and regression analysis. Result from the study revealed that majority of the respondents (80%) fell within the age range of 20-49 years and 55% of them were male, 55% were married, 70% of them were Christians, 80% of them had tertiary education. The result also showed that 85% were aware of the Moringa plant and (65%) of them have consumed Moringa oleifera and the perception statements on the benefits of Moringa oleifera indicated that (52.5%) of the respondents rated Moringa oleifera to be favorable, most of them had high acceptability for Moringa egusi soup, Moringa tea, Moringa pap and yam pottage with Moringa. The result of the hypotheses testing showed that there is a significant relationship between sex of the respondents and acceptability of the diversified Moringa oleifera products (x2=6.465, p = 0.011). There is also a significant relationship between family size of the respondents level of acceptability of the Moringa oleifera products (r = 0.327, p = 0.040). Based on the level of acceptability of Moringa oleifera diversified products; the plant is of great economic importance to the populace. Therefore, there should be more public awareness through the media to enlighten people on the beneficial effects of Moringa oleifera.Keywords: Acceptability, Moringa oleifera, Diversified, Product, Dwellers.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2612175 An Experimental Study of a Self-Supervised Classifier Ensemble
Authors: Neamat El Gayar
Abstract:
Learning using labeled and unlabelled data has received considerable amount of attention in the machine learning community due its potential in reducing the need for expensive labeled data. In this work we present a new method for combining labeled and unlabeled data based on classifier ensembles. The model we propose assumes each classifier in the ensemble observes the input using different set of features. Classifiers are initially trained using some labeled samples. The trained classifiers learn further through labeling the unknown patterns using a teaching signals that is generated using the decision of the classifier ensemble, i.e. the classifiers self-supervise each other. Experiments on a set of object images are presented. Our experiments investigate different classifier models, different fusing techniques, different training sizes and different input features. Experimental results reveal that the proposed self-supervised ensemble learning approach reduces classification error over the single classifier and the traditional ensemble classifier approachs.Keywords: Multiple Classifier Systems, classifier ensembles, learning using labeled and unlabelled data, K-nearest neighbor classifier, Bayes classifier.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1643174 Soil Evaluation for Cashew, Cocoa and Oil Palm in Akure, South-West Nigeria
Authors: Francis Bukola Dada, Samuel Ojo Ajayi, Babatunde Sunday Ewulo, Kehinde Oseni Saani
Abstract:
A key element in the sustainability of the soil-plant relationship in crop yield and performance is the soil's capacity to support tree crops prior to establishment. With the intention of determining the suitability and limitations of the soils of the locations, the northern and southern portions of Akure, a rainforest in Nigeria, were chosen for the suitability evaluation of land for tree crops. In the study area, 16 pedons were established with the help of the Global Positioning System (GPS), the locations were georeferenced and samples were taken from the pedons. The samples were subjected to standard physical and chemical testing. The findings revealed that soils in the research locations were deep to extremely deep, with pH ranging from highly acidic to slightly acidic (4.94 to 6.71). and that sand predominated. The soils had low levels of organic carbon, effective cation exchange capacity (ECEC), total nitrogen, and available phosphorus, whereas exchangeable cations were evaluated as low to moderate. The suitability result indicated that only Pedon 2 and Pedon 14 are currently highly suitable (S1) for the production of oil palms, while others ranged from moderately suitable to marginally suitable. Pedons 4, 12, and 16 were not suitable (N1), respectively, but other Pedons were moderately suitable (S2) and marginally suitable (S3) for the cultivation of cocoa. None of the study areas are currently highly suitable for the production of oil palms. The poor soil texture and low fertility status were the two main drawbacks found. Finally, sound management practices and soil conservation are essential for fertility sustainability.
Keywords: Cashew, cocoa, land evaluation, oil palm, soil fertility suitability.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 451173 The Role of Immunogenic Adhesin Vibrio alginolyticus 49 k Da to Molecule Expression of Major Histocompatibility Complex on Receptors of Humpback Grouper Cromileptes altivelis
Authors: Uun Yanuhar
Abstract:
The purpose of research was to know the role of immunogenic protein of 49 kDa from V.alginolyticus which capable to initiate molecule expression of MHC Class II in receptor of Cromileptes altivelis. The method used was in vivo experimental research through testing of immunogenic protein 49 kDa from V.alginolyticus at Cromileptes altivelis (size of 250 - 300 grams) using 3 times booster by injecting an immunogenic protein in a intramuscular manner. Response of expressed MHC molecule was shown using immunocytochemistry method and SEM. Results indicated that adhesin V.alginolyticus 49 kDa which have immunogenic character could trigger expression of MHC class II on receptor of grouper and has been proven by staining using immunocytochemistry and SEM with labeling using antibody anti MHC (anti mouse). This visible expression based on binding between epitopes antigen and antibody anti MHC in the receptor. Using immunocytochemistry, intracellular response of MHC to in vivo induction of immunogenic adhesin from V.alginolyticus was shown.Keywords: C.altivelis, immunogenic, MHC, V.alginolyticus.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1237172 Customer Churn Prediction Using Four Machine Learning Algorithms Integrating Feature Selection and Normalization in the Telecom Sector
Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh
Abstract:
A crucial part of maintaining a customer-oriented business in the telecommunications industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years, which has made it more important to understand customers’ needs in this strong market. For those who are looking to turn over their service providers, understanding their needs is especially important. Predictive churn is now a mandatory requirement for retaining customers in the telecommunications industry. Machine learning can be used to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.
Keywords: Machine Learning, Gradient Boosting, Logistic Regression, Churn, Random Forest, Decision Tree, ROC, AUC, F1-score.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 408171 Development of an Ensemble Classification Model Based on Hybrid Filter-Wrapper Feature Selection for Email Phishing Detection
Authors: R. B. Ibrahim, M. S. Argungu, I. M. Mungadi
Abstract:
It is obvious in this present time, internet has become an indispensable part of human life since its inception. The Internet has provided diverse opportunities to make life so easy for human beings, through the adoption of various channels. Among these channels are email, internet banking, video conferencing, and the like. Email is one of the easiest means of communication hugely accepted among individuals and organizations globally. But over decades the security integrity of this platform has been challenged with malicious activities like Phishing. Email phishing is designed by phishers to fool the recipient into handing over sensitive personal information such as passwords, credit card numbers, account credentials, social security numbers, etc. This activity has caused a lot of financial damage to email users globally which has resulted in bankruptcy, sudden death of victims, and other health-related sicknesses. Although many methods have been proposed to detect email phishing, in this research, the results of multiple machine-learning methods for predicting email phishing have been compared with the use of filter-wrapper feature selection. It is worth noting that all three models performed substantially but one outperformed the other. The dataset used for these models is obtained from Kaggle online data repository, while three classifiers: decision tree, Naïve Bayes, and Logistic regression are ensemble (Bagging) respectively. Results from the study show that the Decision Tree (CART) bagging ensemble recorded the highest accuracy of 98.13% using PEF (Phishing Essential Features). This result further demonstrates the dependability of the proposed model.
Keywords: Ensemble, hybrid, filter-wrapper, phishing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 178170 Case Study Analysis of 2017 European Railway Traffic Management Incident: The Application of System for Investigation of Railway Interfaces Methodology
Authors: Sanjeev Kumar Appicharla
Abstract:
This paper presents the results of the modelling and analysis of the European Railway Traffic Management (ERTMS) safety critical incident to raise awareness of biases in systems engineering process on the Cambrian Railway in the UK using the RAIB 17/2019 as a primary input. The RAIB, the UK independent accident investigator, published the Report- RAIB 17/2019 giving the details of their investigation of the focal event in the form of immediate cause, causal factors and underlying factors and recommendations to prevent a repeat of the safety-critical incident on the Cambrian Line. The Systems for Investigation of Railway Interfaces (SIRI) is the Methodology used to model and analyse the safety-critical incident. The SIRI Methodology uses the Swiss Cheese Model to model the incident and identify latent failure conditions (potentially less than adequate conditions) by means of the Management Oversight and Risk Tree technique. The benefits of the SIRI Methodology are threefold: first is that it incorporates “Heuristics and Biases” approach, in the Management Oversight and Risk Tree technique to identify systematic errors. Civil engineering and programme management railway professionals are aware of role “optimism bias” plays in programme cost overruns and are aware of bow tie (fault and event tree) model-based safety risk modelling technique. However, the role of systematic errors due to “Heuristics and Biases” is not appreciated as yet. This overcomes the problems of omission of human and organisational factors from accident analysis. Second, the scope of the investigation includes all levels of the socio-technical system, including government, regulatory, railway safety bodies, duty holders, signalling firms and transport planners, and front-line staff such that lessons learned at the decision making and implementation level as well. Third, the author’s past accident case studies are supplemented with research pieces of evidence drawn from the practitioner’s and academic researchers’ publications as well. This is to discuss the role of system thinking to improve the decision making and risk management processes and practices in the IEC 15288 Systems Engineering standard, and in the industrial context such as the GB railways and Artificial Intelligence (AI) contexts as well.
Keywords: Accident analysis, AI algorithm internal audit, bounded rationality, Byzantine failures, heuristics and biases approach.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 379169 Some New Bounds for a Real Power of the Normalized Laplacian Eigenvalues
Authors: Ayşe Dilek Maden
Abstract:
For a given a simple connected graph, we present some new bounds via a new approach for a special topological index given by the sum of the real number power of the non-zero normalized Laplacian eigenvalues. To use this approach presents an advantage not only to derive old and new bounds on this topic but also gives an idea how some previous results in similar area can be developed.
Keywords: Degree Kirchhoff index, normalized Laplacian eigenvalue, spanning tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2201168 Integrating Context Priors into a Decision Tree Classification Scheme
Authors: Kasim Terzic, Bernd Neumann
Abstract:
Scene interpretation systems need to match (often ambiguous) low-level input data to concepts from a high-level ontology. In many domains, these decisions are uncertain and benefit greatly from proper context. This paper demonstrates the use of decision trees for estimating class probabilities for regions described by feature vectors, and shows how context can be introduced in order to improve the matching performance.Keywords: Classification, Decision Trees, Interpretation, Vision
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1300167 A New Distribution Network Reconfiguration Approach using a Tree Model
Authors: E. Dolatdar, S. Soleymani, B. Mozafari
Abstract:
Power loss reduction is one of the main targets in power industry and so in this paper, the problem of finding the optimal configuration of a radial distribution system for loss reduction is considered. Optimal reconfiguration involves the selection of the best set of branches to be opened ,one each from each loop, for reducing resistive line losses , and reliving overloads on feeders by shifting the load to adjacent feeders. However ,since there are many candidate switching combinations in the system ,the feeder reconfiguration is a complicated problem. In this paper a new approach is proposed based on a simple optimum loss calculation by determining optimal trees of the given network. From graph theory a distribution network can be represented with a graph that consists a set of nodes and branches. In fact this problem can be viewed as a problem of determining an optimal tree of the graph which simultaneously ensure radial structure of each candidate topology .In this method the refined genetic algorithm is also set up and some improvements of algorithm are made on chromosome coding. In this paper an implementation of the algorithm presented by [7] is applied by modifying in load flow program and a comparison of this method with the proposed method is employed. In [7] an algorithm is proposed that the choice of the switches to be opened is based on simple heuristic rules. This algorithm reduce the number of load flow runs and also reduce the switching combinations to a fewer number and gives the optimum solution. To demonstrate the validity of these methods computer simulations with PSAT and MATLAB programs are carried out on 33-bus test system. The results show that the performance of the proposed method is better than [7] method and also other methods.
Keywords: Distribution System, Reconfiguration, Loss Reduction , Graph Theory , Optimization , Genetic Algorithm
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3781166 Mining Association Rules from Unstructured Documents
Authors: Hany Mahgoub
Abstract:
This paper presents a system for discovering association rules from collections of unstructured documents called EART (Extract Association Rules from Text). The EART system treats texts only not images or figures. EART discovers association rules amongst keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transform unstructured documents into structured documents) with Information Retrieval scheme (TF-IDF) and Data Mining technique for association rules extraction. EART depends on word feature to extract association rules. It consists of four phases: structure phase, index phase, text mining phase and visualization phase. Our work depends on the analysis of the keywords in the extracted association rules through the co-occurrence of the keywords in one sentence in the original text and the existing of the keywords in one sentence without co-occurrence. Experiments applied on a collection of scientific documents selected from MEDLINE that are related to the outbreak of H5N1 avian influenza virus.Keywords: Association rules, information retrieval, knowledgediscovery in text, text mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2441165 Use XML Format like a Model of Data Backup
Authors: Souleymane Oumtanaga, Kadjo Tanon Lambert, Koné Tiémoman, Tety Pierre, Dowa N’sreke Florent
Abstract:
Nowadays data backup format doesn-t cease to appear raising so the anxiety on their accessibility and their perpetuity. XML is one of the most promising formats to guarantee the integrity of data. This article suggests while showing one thing man can do with XML. Indeed XML will help to create a data backup model. The main task will consist in defining an application in JAVA able to convert information of a database in XML format and restore them later.
Keywords: Backup, Proprietary format, parser, syntactic tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1729164 Load Forecasting in Microgrid Systems with R and Cortana Intelligence Suite
Authors: F. Lazzeri, I. Reiter
Abstract:
Energy production optimization has been traditionally very important for utilities in order to improve resource consumption. However, load forecasting is a challenging task, as there are a large number of relevant variables that must be considered, and several strategies have been used to deal with this complex problem. This is especially true also in microgrids where many elements have to adjust their performance depending on the future generation and consumption conditions. The goal of this paper is to present a solution for short-term load forecasting in microgrids, based on three machine learning experiments developed in R and web services built and deployed with different components of Cortana Intelligence Suite: Azure Machine Learning, a fully managed cloud service that enables to easily build, deploy, and share predictive analytics solutions; SQL database, a Microsoft database service for app developers; and PowerBI, a suite of business analytics tools to analyze data and share insights. Our results show that Boosted Decision Tree and Fast Forest Quantile regression methods can be very useful to predict hourly short-term consumption in microgrids; moreover, we found that for these types of forecasting models, weather data (temperature, wind, humidity and dew point) can play a crucial role in improving the accuracy of the forecasting solution. Data cleaning and feature engineering methods performed in R and different types of machine learning algorithms (Boosted Decision Tree, Fast Forest Quantile and ARIMA) will be presented, and results and performance metrics discussed.
Keywords: Time-series, features engineering methods for forecasting, energy demand forecasting, Azure machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1290163 Calculation of a Sustainable Quota Harvesting of Long-Tailed Macaque (Macaca fascicularis Raffles) in Their Natural Habitats
Authors: Y. Santosa, D. A. Rahman, C. Wulan, A. H. Mustari
Abstract:
The global demand for long-tailed macaques for medical experimentation has continued to increase. Fulfillment of Indonesian export demands has been mostly from natural habitats, based on a harvesting quota. This quota has been determined according to the total catch for a given year, and not based on consideration of any demographic parameters or physical environmental factors with regard to the animal; hence threatening the sustainability of the various populations. It is therefore necessary to formulate a method for calculating a sustainable harvesting quota, based on population parameters in natural habitats. Considering the possibility of variations in habitat characteristics and population parameters, a time series observation of demographic and physical/biotic parameters, in various habitats, was performed on 13 groups of long-tailed macaques, distributed throughout the West Java, Lampung and Yogyakarta areas of Indonesia. These provinces were selected for comparison of the influence of human/tourism activities. Data on population parameters that was collected included data on life expectancy according to age class, numbers of individuals by sex and age class, and ‘ratio of infants to reproductive females’. The estimation of population growth was based on a population dynamic growth model: the Leslie matrix. The harvesting quota was calculated as being the difference between the actual population size and the MVP (minimum viable population) for each sex and age class. Observation indicated that there were variations within group size (24–106 individuals), gender (sex) ratio (1:1 to 1:1.3), life expectancy value (0.30 to 0.93), and ‘ratio of infants to reproductive females’ (0.23 to 1.56). Results of subsequent calculations showed that sustainable harvesting quotas for each studied group of long-tailed macaques, ranged from 29 to 110 individuals. An estimation model of the MVP for each age class was formulated as Log Y = 0.315 + 0.884 Log Ni (number of individual on ith age class). This study also found that life expectancy for the juvenile age class was affected by the humidity under tree stands, and dietary plants’ density at sapling, pole and tree stages (equation: Y=2.296 – 1.535 RH + 0.002 Kpcg – 0.002 Ktg – 0.001 Kphn, R2 = 89.6% with a significance value of 0.001). By contrast, for the sub-adult-adult age class, life expectancy was significantly affected by slope (equation: Y=0.377 = 0.012 Kml, R2 = 50.4%, with significance level of 0.007). The infant-toreproductive- female ratio was affected by humidity under tree stands, and dietary plant density at sapling and pole stages (equation: Y = - 1.432 + 2.172 RH – 0.004 Kpcg + 0.003 Ktg, R2 = 82.0% with significance level of 0.001). This research confirmed the importance of population parameters in determining the minimum viable population, and that MVP varied according to habitat characteristics (especially food availability). It would be difficult therefore, to formulate a general mathematical equation model for determining a harvesting quota for the species as a whole.Keywords: Harvesting, long-tailed macaque, population, quota.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2014162 Rolling Element Bearing Diagnosis by Improved Envelope Spectrum: Optimal Frequency Band Selection
Authors: Juan David Arango, Alejandro Restrepo-Martinez
Abstract:
The Rolling Element Bearing (REB) vibration diagnosis is worth of special interest by the variety of REB and the wide necessity of those elements in industrial applications. The presence of a localized fault in a REB gives rise to a vibrational response, characterized by the modulation of a carrier signal. Frequency content of carrier signal (Spectral Frequency –f) is mainly related to resonance frequencies of the REB. This carrier signal is modulated by another signal, governed by the periodicity of the fault impact (Cyclic Frequency –α). In this sense, REB fault vibration response gives rise to a second-order cyclostationary signal. Second order cyclostationary signals could be represented in a bi-spectral map, where Spectral Coherence –SCoh are plotted against f and α. The Improved Envelope Spectrum –IES, is a useful approach to execute REB fault diagnosis. IES could be applied by the integration of SCoh over a predefined bandwidth on the f axis. Approaches to select f-bandwidth have been recently exposed by the definition of a metric which intends to evaluate the magnitude of the IES at the fault characteristics frequencies. This metric is represented in a 1/3-binary tree as a function of the frequency bandwidth and centre. Based on this binary tree the optimal frequency band is selected. However, some advantages have been seen if the metric is changed, which in fact tends to dictate different optimal f-bandwidth and so improve the IES representation. This paper evaluates the behaviour of the IES from a different metric optimization. This metric is based on the sample correlation coefficient, detecting high peaks in the selected frequencies while penalizing high peaks in the neighbours of the selected frequencies. Prior results indicate an improvement on the signal-noise ratio (SNR) on around 86% of samples analysed, which belong to IMS database.
Keywords: Sample Correlation IESFOgram, cyclostationary analysis, improved envelope spectrum, IES, rolling element bearing diagnosis, spectral coherence.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 742161 Distributed Data-Mining by Probability-Based Patterns
Authors: M. Kargar, F. Gharbalchi
Abstract:
In this paper a new method is suggested for distributed data-mining by the probability patterns. These patterns use decision trees and decision graphs. The patterns are cared to be valid, novel, useful, and understandable. Considering a set of functions, the system reaches to a good pattern or better objectives. By using the suggested method we will be able to extract the useful information from massive and multi-relational data bases.Keywords: Data-mining, Decision tree, Decision graph, Pattern, Relationship.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1554160 Project Selection by Using Fuzzy AHP and TOPSIS Technique
Authors: S. Mahmoodzadeh, J. Shahrabi, M. Pariazar, M. S. Zaeri
Abstract:
In this article, by using fuzzy AHP and TOPSIS technique we propose a new method for project selection problem. After reviewing four common methods of comparing alternatives investment (net present value, rate of return, benefit cost analysis and payback period) we use them as criteria in AHP tree. In this methodology by utilizing improved Analytical Hierarchy Process by Fuzzy set theory, first we try to calculate weight of each criterion. Then by implementing TOPSIS algorithm, assessment of projects has been done. Obtained results have been tested in a numerical example.Keywords: Fuzzy AHP, Project Selection, TOPSIS Technique.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6599159 A Multipurpose Audio Watermarking Algorithm Based on Vector Quantization in DCT Domain
Authors: Jixin Liu, Zheming Lu
Abstract:
In this paper, a novel multipurpose audio watermarking algorithm is proposed based on Vector Quantization (VQ) in Discrete Cosine Transform (DCT) domain using the codeword labeling and index-bit constrained method. By using this algorithm, it can fulfill the requirements of both the copyright protection and content integrity authentication at the same time for the multimedia artworks. The robust watermark is embedded in the middle frequency coefficients of the DCT transform during the labeled codeword vector quantization procedure. The fragile watermark is embedded into the indices of the high frequency coefficients of the DCT transform by using the constrained index vector quantization method for the purpose of integrity authentication of the original audio signals. Both the robust and the fragile watermarks can be extracted without the original audio signals, and the simulation results show that our algorithm is effective with regard to the transparency, robustness and the authentication requirementsKeywords: Copyright Protection, Discrete Cosine Transform, Integrity Authentication, Multipurpose Audio Watermarking, Vector Quantization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1992158 The Diameter of an Interval Graph is Twice of its Radius
Authors: Tarasankar Pramanik, Sukumar Mondal, Madhumangal Pal
Abstract:
In an interval graph G = (V,E) the distance between two vertices u, v is de£ned as the smallest number of edges in a path joining u and v. The eccentricity of a vertex v is the maximum among distances from all other vertices of V . The diameter (δ) and radius (ρ) of the graph G is respectively the maximum and minimum among all the eccentricities of G. The center of the graph G is the set C(G) of vertices with eccentricity ρ. In this context our aim is to establish the relation ρ = δ 2 for an interval graph and to determine the center of it.
Keywords: Interval graph, interval tree, radius, center.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1643157 Mining Sequential Patterns Using I-PrefixSpan
Authors: Dhany Saputra, Dayang R. A. Rambli, Oi Mean Foong
Abstract:
In this paper, we propose an improvement of pattern growth-based PrefixSpan algorithm, called I-PrefixSpan. The general idea of I-PrefixSpan is to use sufficient data structure for Seq-Tree framework and separator database to reduce the execution time and memory usage. Thus, with I-PrefixSpan there is no in-memory database stored after index set is constructed. The experimental result shows that using Java 2, this method improves the speed of PrefixSpan up to almost two orders of magnitude as well as the memory usage to more than one order of magnitude.Keywords: ArrayList, ArrayIntList, minimum support, sequence database, sequential patterns.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1563156 Using Interval Trees for Approximate Indexing of Instances
Authors: Khalil el Hindi
Abstract:
This paper presents a simple and effective method for approximate indexing of instances for instance based learning. The method uses an interval tree to determine a good starting search point for the nearest neighbor. The search stops when an early stopping criterion is met. The method proved to be very effective especially when only the first nearest neighbor is required.
Keywords: Instance based learning, interval trees, the knn algorithm, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1509155 A Knowledge Engineering Workshop: Application for Choise Car
Authors: Touahria Mohamed, Khababa Abdallah, Frécon Louis
Abstract:
This paper proposes a declarative language for knowledge representation (Ibn Rochd), and its environment of exploitation (DeGSE). This DeGSE system was designed and developed to facilitate Ibn Rochd writing applications. The system was tested on several knowledge bases by ascending complexity, culminating in a system for recognition of a plant or a tree, and advisors to purchase a car, for pedagogical and academic guidance, or for bank savings and credit. Finally, the limits of the language and research perspectives are stated.Keywords: Knowledge representation, declarative language, IbnRochd, DeGSE, facets, cognitive approach.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1327154 Clustering Categorical Data Using Hierarchies (CLUCDUH)
Authors: Gökhan Silahtaroğlu
Abstract:
Clustering large populations is an important problem when the data contain noise and different shapes. A good clustering algorithm or approach should be efficient enough to detect clusters sensitively. Besides space complexity, time complexity also gains importance as the size grows. Using hierarchies we developed a new algorithm to split attributes according to the values they have and choosing the dimension for splitting so as to divide the database roughly into equal parts as much as possible. At each node we calculate some certain descriptive statistical features of the data which reside and by pruning we generate the natural clusters with a complexity of O(n).Keywords: Clustering, tree, split, pruning, entropy, gini.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1555153 Tool for Fast Detection of Java Code Snippets
Authors: Tomáš Bublík, Miroslav Virius
Abstract:
This paper presents general results on the Java source code snippet detection problem. We propose the tool which uses graph and subgraph isomorphism detection. A number of solutions for all of these tasks have been proposed in the literature. However, although that all these solutions are really fast, they compare just the constant static trees. Our solution offers to enter an input sample dynamically with the Scripthon language while preserving an acceptable speed. We used several optimizations to achieve very low number of comparisons during the matching algorithm.
Keywords: AST, Java, tree matching, Scripthon, source code recognition
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1958152 An Energy Efficient Algorithm for Distributed Mutual Exclusion in Mobile Ad-hoc Networks
Authors: Sayani Sil, Sukanta Das
Abstract:
This paper reports a distributed mutual exclusion algorithm for mobile Ad-hoc networks. The network is clustered hierarchically. The proposed algorithm considers the clustered network as a logical tree and develops a token passing scheme to get the mutual exclusion. The performance analysis and simulation results show that its message requirement is optimal, and thus the algorithm is energy efficient.Keywords: Critical section, Distributed mutual exclusion, MobileAd-hoc network, Token-based algorithms.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1750151 Approximately Similarity Measurement of Web Sites Using Genetic Algorithms and Binary Trees
Authors: Doru Anastasiu Popescu, Dan Rădulescu
Abstract:
In this paper, we determine the similarity of two HTML web applications. We are going to use a genetic algorithm in order to determine the most significant web pages of each application (we are not going to use every web page of a site). Using these significant web pages, we will find the similarity value between the two applications. The algorithm is going to be efficient because we are going to use a reduced number of web pages for comparisons but it will return an approximate value of the similarity. The binary trees are used to keep the tags from the significant pages. The algorithm was implemented in Java language.
Keywords: Tag, HTML, web page, genetic algorithm, similarity value, binary tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1309150 Evaluation of the Internal Quality for Pineapple Based on the Spectroscopy Approach and Neural Network
Authors: Nonlapun Meenil, Pisitpong Intarapong, Thitima Wongsheree, Pranchalee Samanpiboon
Abstract:
In Thailand, once pineapples are harvested, they must be classified into two classes based on their sweetness: sweet and unsweet. This paper has studied and developed the assessment of internal quality of pineapples using a low-cost compact spectroscopy sensor according to the spectroscopy approach and Neural Network (NN). During the experiments, Batavia pineapples were utilized, generating 100 samples. The extracted pineapple juice of each sample was used to determine the Soluble Solid Content (SSC) labeling into sweet and unsweet classes. In terms of experimental equipment, the sensor cover was specifically designed to install the sensor and light source to read the reflectance at a five mm depth from pineapple flesh. By using a spectroscopy sensor, data on visible and near-infrared reflectance (Vis-NIR) were collected. The NN was used to classify the pineapple classes. Before the classification step, the preprocessing methods, which are class balancing, data shuffling, and standardization, were applied. The 510 nm and 900 nm reflectance values of the middle parts of pineapples were used as features of the NN. With the sequential model and ReLU activation function, 100% accuracy of the training set and 76.67% accuracy of the test set were achieved. According to the abovementioned information, using a low-cost compact spectroscopy sensor has achieved favorable results in classifying the sweetness of the two classes of pineapples.
Keywords: Spectroscopy, soluble solid content, pineapple, neural network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 120149 Pectoral Muscles Suppression in Digital Mammograms Using Hybridization of Soft Computing Methods
Authors: I. Laurence Aroquiaraj, K. Thangavel
Abstract:
Breast region segmentation is an essential prerequisite in computerized analysis of mammograms. It aims at separating the breast tissue from the background of the mammogram and it includes two independent segmentations. The first segments the background region which usually contains annotations, labels and frames from the whole breast region, while the second removes the pectoral muscle portion (present in Medio Lateral Oblique (MLO) views) from the rest of the breast tissue. In this paper we propose hybridization of Connected Component Labeling (CCL), Fuzzy, and Straight line methods. Our proposed methods worked good for separating pectoral region. After removal pectoral muscle from the mammogram, further processing is confined to the breast region alone. To demonstrate the validity of our segmentation algorithm, it is extensively tested using over 322 mammographic images from the Mammographic Image Analysis Society (MIAS) database. The segmentation results were evaluated using a Mean Absolute Error (MAE), Hausdroff Distance (HD), Probabilistic Rand Index (PRI), Local Consistency Error (LCE) and Tanimoto Coefficient (TC). The hybridization of fuzzy with straight line method is given more than 96% of the curve segmentations to be adequate or better. In addition a comparison with similar approaches from the state of the art has been given, obtaining slightly improved results. Experimental results demonstrate the effectiveness of the proposed approach.
Keywords: X-ray Mammography, CCL, Fuzzy, Straight line.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1755