Search results for: M5 decision tree model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8615

Search results for: M5 decision tree model

8585 A Hybrid Classification Method using Artificial Neural Network Based Decision Tree for Automatic Sleep Scoring

Authors: Haoyu Ma, Bin Hu, Mike Jackson, Jingzhi Yan, Wen Zhao

Abstract:

In this paper we propose a new classification method for automatic sleep scoring using an artificial neural network based decision tree. It attempts to treat sleep scoring progress as a series of two-class problems and solves them with a decision tree made up of a group of neural network classifiers, each of which uses a special feature set and is aimed at only one specific sleep stage in order to maximize the classification effect. A single electroencephalogram (EEG) signal is used for our analysis rather than depending on multiple biological signals, which makes greatly simplifies the data acquisition process. Experimental results demonstrate that the average epoch by epoch agreement between the visual and the proposed method in separating 30s wakefulness+S1, REM, S2 and SWS epochs was 88.83%. This study shows that the proposed method performed well in all the four stages, and can effectively limit error propagation at the same time. It could, therefore, be an efficient method for automatic sleep scoring. Additionally, since it requires only a small volume of data it could be suited to pervasive applications.

Keywords: Sleep, Sleep stage, Automatic sleep scoring, Electroencephalography, Decision tree, Artificial neural network

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2071
8584 Hybrid Machine Learning Approach for Text Categorization

Authors: Nerijus Remeikis, Ignas Skucas, Vida Melninkaite

Abstract:

Text categorization - the assignment of natural language documents to one or more predefined categories based on their semantic content - is an important component in many information organization and management tasks. Performance of neural networks learning is known to be sensitive to the initial weights and architecture. This paper discusses the use multilayer neural network initialization with decision tree classifier for improving text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree from root node until a final leave is used for initialization of multilayer neural network. The experimental evaluation demonstrates this approach provides better classification accuracy with Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with multilayer neural network initialized with traditional random method and decision tree classifiers.

Keywords: Text categorization, decision trees, neural networks, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1806
8583 Hybrid Anomaly Detection Using Decision Tree and Support Vector Machine

Authors: Elham Serkani, Hossein Gharaee Garakani, Naser Mohammadzadeh, Elaheh Vaezpour

Abstract:

Intrusion detection systems (IDS) are the main components of network security. These systems analyze the network events for intrusion detection. The design of an IDS is through the training of normal traffic data or attack. The methods of machine learning are the best ways to design IDSs. In the method presented in this article, the pruning algorithm of C5.0 decision tree is being used to reduce the features of traffic data used and training IDS by the least square vector algorithm (LS-SVM). Then, the remaining features are arranged according to the predictor importance criterion. The least important features are eliminated in the order. The remaining features of this stage, which have created the highest level of accuracy in LS-SVM, are selected as the final features. The features obtained, compared to other similar articles which have examined the selected features in the least squared support vector machine model, are better in the accuracy, true positive rate, and false positive. The results are tested by the UNSW-NB15 dataset.

Keywords: Intrusion detection system, decision tree, support vector machine, feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1239
8582 Using Single Decision Tree to Assess the Impact of Cutting Conditions on Vibration

Authors: S. Ghorbani, N. I. Polushin

Abstract:

Vibration during machining process is crucial since it affects cutting tool, machine, and workpiece leading to a tool wear, tool breakage, and an unacceptable surface roughness. This paper applies a nonparametric statistical method, single decision tree (SDT), to identify factors affecting on vibration in machining process. Workpiece material (AISI 1045 Steel, AA2024 Aluminum alloy, A48-class30 Gray Cast Iron), cutting tool (conventional, cutting tool with holes in toolholder, cutting tool filled up with epoxy-granite), tool overhang (41-65 mm), spindle speed (630-1000 rpm), feed rate (0.05-0.075 mm/rev) and depth of cut (0.05-0.15 mm) were used as input variables, while vibration was the output parameter. It is concluded that workpiece material is the most important parameters for natural frequency followed by cutting tool and overhang.

Keywords: Cutting condition, vibration, natural frequency, decision tree, CART algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1434
8581 Empirical and Indian Automotive Equity Portfolio Decision Support

Authors: P. Sankar, P. James Daniel Paul, Siddhant Sahu

Abstract:

A brief review of the empirical studies on the methodology of the stock market decision support would indicate that they are at a threshold of validating the accuracy of the traditional and the fuzzy, artificial neural network and the decision trees. Many researchers have been attempting to compare these models using various data sets worldwide. However, the research community is on the way to the conclusive confidence in the emerged models. This paper attempts to use the automotive sector stock prices from National Stock Exchange (NSE), India and analyze them for the intra-sectorial support for stock market decisions. The study identifies the significant variables and their lags which affect the price of the stocks using OLS analysis and decision tree classifiers.

Keywords: Indian Automotive Sector, Stock Market Decisions, Equity Portfolio Analysis, Decision Tree Classifiers, Statistical Data Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2036
8580 The Optimization of an Intelligent Traffic Congestion Level Classification from Motorists- Judgments on Vehicle's Moving Patterns

Authors: Thammasak Thianniwet, Satidchoke Phosaard, Wasan Pattara-Atikom

Abstract:

We proposed a technique to identify road traffic congestion levels from velocity of mobile sensors with high accuracy and consistent with motorists- judgments. The data collection utilized a GPS device, a webcam, and an opinion survey. Human perceptions were used to rate the traffic congestion levels into three levels: light, heavy, and jam. Then the ratings and velocity were fed into a decision tree learning model (J48). We successfully extracted vehicle movement patterns to feed into the learning model using a sliding windows technique. The parameters capturing the vehicle moving patterns and the windows size were heuristically optimized. The model achieved accuracy as high as 99.68%. By implementing the model on the existing traffic report systems, the reports will cover comprehensive areas. The proposed method can be applied to any parts of the world.

Keywords: intelligent transportation system (ITS), traffic congestion level, human judgment, decision tree (J48), geographic positioning system (GPS).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1821
8579 Fusion of ETM+ Multispectral and Panchromatic Texture for Remote Sensing Classification

Authors: Mahesh Pal

Abstract:

This paper proposes to use ETM+ multispectral data and panchromatic band as well as texture features derived from the panchromatic band for land cover classification. Four texture features including one 'internal texture' and three GLCM based textures namely correlation, entropy, and inverse different moment were used in combination with ETM+ multispectral data. Two data sets involving combination of multispectral, panchromatic band and its texture were used and results were compared with those obtained by using multispectral data alone. A decision tree classifier with and without boosting were used to classify different datasets. Results from this study suggest that the dataset consisting of panchromatic band, four of its texture features and multispectral data was able to increase the classification accuracy by about 2%. In comparison, a boosted decision tree was able to increase the classification accuracy by about 3% with the same dataset.

Keywords: Internal texture; GLCM; decision tree; boosting; classification accuracy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1736
8578 Historical Landscape Affects Present Tree Density in Paddy Field

Authors: Ha T. Pham, Shuichi Miyagawa

Abstract:

Ongoing landscape transformation is one of the major causes behind disappearance of traditional landscapes, and lead to species and resource loss. Tree in paddy fields in the northeast of Thailand is one of those traditional landscapes. Using three different historical time layers, we acknowledged the severe deforestation and rapid urbanization happened in the region. Despite the general thinking of decline in tree density as consequences, the heterogeneous trend of changes in total tree density in three studied landscapes denied the hypothesis that number of trees in paddy field depend on the length of land use practice. On the other hand, due to selection of planting new trees on levees, existence of trees in paddy field now relies on their values for human use. Besides, changes in land use and landscape structure had a significant impact on decision of which tree density level is considered as suitable for the landscape.

Keywords: Aerial photographs, land use change, traditional landscape, tree in paddy fields.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1864
8577 Development of the Academic Model to Predict Student Success at VUT-FSASEC Using Decision Trees

Authors: Langa Hendrick Musawenkosi, Twala Bhekisipho

Abstract:

The success or failure of students is a concern for every academic institution, college, university, governments and students themselves. Several approaches have been researched to address this concern. In this paper, a view is held that when a student enters a university or college or an academic institution, he or she enters an academic environment. The academic environment is unique concept used to develop the solution for making predictions effectively. This paper presents a model to determine the propensity of a student to succeed or fail in the French South African Schneider Electric Education Center (FSASEC) at the Vaal University of Technology (VUT). The Decision Tree algorithm is used to implement the model at FSASEC.

Keywords: Academic environment model, decision trees, FSASEC, K-nearest neighbor, machine learning, popularity index, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1137
8576 A Decision Tree Approach to Estimate Permanent Residents Using Remote Sensing Data in Lebanese Municipalities

Authors: K. Allaw, J. Adjizian Gerard, M. Chehayeb, A. Raad, W. Fahs, A. Badran, A. Fakherdin, H. Madi, N. Badaro Saliba

Abstract:

Population estimation using Geographic Information System (GIS) and remote sensing faces many obstacles such as the determination of permanent residents. A permanent resident is an individual who stays and works during all four seasons in his village. So, all those who move towards other cities or villages are excluded from this category. The aim of this study is to identify the factors affecting the percentage of permanent residents in a village and to determine the attributed weight to each factor. To do so, six factors have been chosen (slope, precipitation, temperature, number of services, time to Central Business District (CBD) and the proximity to conflict zones) and each one of those factors has been evaluated using one of the following data: the contour lines map of 50 m, the precipitation map, four temperature maps and data collected through surveys. The weighting procedure has been done using decision tree method. As a result of this procedure, temperature (50.8%) and percentage of precipitation (46.5%) are the most influencing factors.

Keywords: Remote sensing and GIS, permanent residence, decision tree, Lebanon.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1010
8575 Comparison between Associative Classification and Decision Tree for HCV Treatment Response Prediction

Authors: Enas M. F. El Houby, Marwa S. Hassan

Abstract:

Combined therapy using Interferon and Ribavirin is the standard treatment in patients with chronic hepatitis C. However, the number of responders to this treatment is low, whereas its cost and side effects are high. Therefore, there is a clear need to predict patient’s response to the treatment based on clinical information to protect the patients from the bad drawbacks, Intolerable side effects and waste of money. Different machine learning techniques have been developed to fulfill this purpose. From these techniques are Associative Classification (AC) and Decision Tree (DT). The aim of this research is to compare the performance of these two techniques in the prediction of virological response to the standard treatment of HCV from clinical information. 200 patients treated with Interferon and Ribavirin; were analyzed using AC and DT. 150 cases had been used to train the classifiers and 50 cases had been used to test the classifiers. The experiment results showed that the two techniques had given acceptable results however the best accuracy for the AC reached 92% whereas for DT reached 80%.

Keywords: Associative Classification, Data mining, Decision tree, HCV, interferon.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1899
8574 Decision Trees for Predicting Risk of Mortality using Routinely Collected Data

Authors: Tessy Badriyah, Jim S. Briggs, Dave R. Prytherch

Abstract:

It is well known that Logistic Regression is the gold standard method for predicting clinical outcome, especially predicting risk of mortality. In this paper, the Decision Tree method has been proposed to solve specific problems that commonly use Logistic Regression as a solution. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital from 1 January to 31 December 2001 was divided into four subsets. One subset of training data was used to generate a model, and the model obtained was then applied to three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under ROC curve or c-index). The experiment presented that both methods have reasonable results in the case of the c-index. However, in some cases the calibration value (χ2) obtained quite a high result. After conducting experiments and investigating the advantages and disadvantages of each method, we can conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.

Keywords: Decision Trees, Logistic Regression, clinical outcome, risk of mortality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2523
8573 A Novel Methodology for Synthesis of Fault Trees from MATLAB-Simulink Model

Authors: F. Tajarrod, G. Latif-Shabgahi

Abstract:

Fault tree analysis is a well-known method for reliability and safety assessment of engineering systems. In the last 3 decades, a number of methods have been introduced, in the literature, for automatic construction of fault trees. The main difference between these methods is the starting model from which the tree is constructed. This paper presents a new methodology for the construction of static and dynamic fault trees from a system Simulink model. The method is introduced and explained in detail, and its correctness and completeness is experimentally validated by using an example, taken from literature. Advantages of the method are also mentioned.

Keywords: Fault tree, Simulink, Standby Sparing and Redundancy

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3001
8572 Visualising Energy Efficiency Landscape

Authors: Hairulliza M. Judi, Soon Y. Chee

Abstract:

This paper discusses the landscape design that could increase energy efficiency in a house. By planting trees in a house compound, the tree shades prevent direct sunlight from heating up the building, and it enables cooling off the surrounding air. The requirement for air-conditioning could be minimized and the air quality could be improved. During the life time of a tree, the saving cost from the mentioned benefits could be up to US $ 200 for each tree. The project intends to visually describe the landscape design in a house compound that could enhance energy efficiency and consequently lead to energy saving. The house compound model was developed in three dimensions by using AutoCAD 2005, the animation was programmed by using LightWave 3D softwares i.e. Modeler and Layout to display the tree shadings in the wall. The visualization was executed on a VRML Pad platform and implemented on a web environment.

Keywords: Tree planting, tree shading, energy efficiency, visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1700
8571 Attacks Classification in Adaptive Intrusion Detection using Decision Tree

Authors: Dewan Md. Farid, Nouria Harbi, Emna Bahri, Mohammad Zahidur Rahman, Chowdhury Mofizur Rahman

Abstract:

Recently, information security has become a key issue in information technology as the number of computer security breaches are exposed to an increasing number of security threats. A variety of intrusion detection systems (IDS) have been employed for protecting computers and networks from malicious network-based or host-based attacks by using traditional statistical methods to new data mining approaches in last decades. However, today's commercially available intrusion detection systems are signature-based that are not capable of detecting unknown attacks. In this paper, we present a new learning algorithm for anomaly based network intrusion detection system using decision tree algorithm that distinguishes attacks from normal behaviors and identifies different types of intrusions. Experimental results on the KDD99 benchmark network intrusion detection dataset demonstrate that the proposed learning algorithm achieved 98% detection rate (DR) in comparison with other existing methods.

Keywords: Detection rate, decision tree, intrusion detectionsystem, network security.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3629
8570 Adding Edges between One Node and Every Other Node with the Same Depth in a Complete K-ary Tree

Authors: Kiyoshi Sawada, Takashi Mitsuishi

Abstract:

This paper proposes a model of adding relations between members of the same level in a pyramid organization structure which is a complete K-ary tree such that the communication of information between every member in the organization becomes the most efficient. When edges between one node and every other node with the same depth N in a complete K-ary tree of height H are added, an optimal depth N* = H is obtained by minimizing the total path length which is the sum of lengths of shortest paths between every pair of all nodes.

Keywords: complete K-ary tree, organization structure, shortest path

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1356
8569 Distributed Data-Mining by Probability-Based Patterns

Authors: M. Kargar, F. Gharbalchi

Abstract:

In this paper a new method is suggested for distributed data-mining by the probability patterns. These patterns use decision trees and decision graphs. The patterns are cared to be valid, novel, useful, and understandable. Considering a set of functions, the system reaches to a good pattern or better objectives. By using the suggested method we will be able to extract the useful information from massive and multi-relational data bases.

Keywords: Data-mining, Decision tree, Decision graph, Pattern, Relationship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1555
8568 Auto Regressive Tree Modeling for Parametric Optimization in Fuzzy Logic Control System

Authors: Arshia Azam, J. Amarnath, Ch. D. V. Paradesi Rao

Abstract:

The advantage of solving the complex nonlinear problems by utilizing fuzzy logic methodologies is that the experience or expert-s knowledge described as a fuzzy rule base can be directly embedded into the systems for dealing with the problems. The current limitation of appropriate and automated designing of fuzzy controllers are focused in this paper. The structure discovery and parameter adjustment of the Branched T-S fuzzy model is addressed by a hybrid technique of type constrained sparse tree algorithms. The simulation result for different system model is evaluated and the identification error is observed to be minimum.

Keywords: Fuzzy logic, branch T-S fuzzy model, tree modeling, complex nonlinear system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1389
8567 Construction Of Decentralized Lifetime Maximizing Tree for Data Aggregation in Wireless Sensor Networks

Authors: Deepali Virmani , Satbir Jain

Abstract:

To meet the demands of wireless sensor networks (WSNs) where data are usually aggregated at a single source prior to transmitting to any distant user, there is a need to establish a tree structure inside any given event region. In this paper , a novel technique to create one such tree is proposed .This tree preserves the energy and maximizes the lifetime of event sources while they are constantly transmitting for data aggregation. The term Decentralized Lifetime Maximizing Tree (DLMT) is used to denote this tree. DLMT features in nodes with higher energy tend to be chosen as data aggregating parents so that the time to detect the first broken tree link can be extended and less energy is involved in tree maintenance. By constructing the tree in such a way, the protocol is able to reduce the frequency of tree reconstruction, minimize the amount of data loss ,minimize the delay during data collection and preserves the energy.

Keywords: branch energy, decentralized, energy level , lifetime, tree energy, wireless sensor networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1488
8566 Using Time-Series NDVI to Model Land Cover Change: A Case Study in the Berg River Catchment Area, Western Cape, South Africa

Authors: A. S. Adesuyi, Z. Munch

Abstract:

This study investigates the use of a time-series of MODIS NDVI data to identify agricultural land cover change on an annual time step (2007 - 2012) and characterize the trend. Following an ISODATA classification of the MODIS imagery to selectively mask areas not agriculture or semi-natural, NDVI signatures were created to identify areas cereals and vineyards with the aid of ancillary, pictometry and field sample data for 2010. The NDVI signature curve and training samples were used to create a decision tree model in WEKA 3.6.9 using decision tree classifier (J48) algorithm; Model 1 including ISODATA classification and Model 2 not. These two models were then used to classify all data for the study area for 2010, producing land cover maps with classification accuracies of 77% and 80% for Model 1 and 2 respectively. Model 2 was subsequently used to create land cover classification and change detection maps for all other years. Subtle changes and areas of consistency (unchanged) were observed in the agricultural classes and crop practices. Over the years as predicted by the land cover classification. Forty one percent of the catchment comprised of cereals with 35% possibly following a crop rotation system. Vineyards largely remained constant with only one percent conversion to vineyard from other land cover classes.

Keywords: Change detection, Land cover, NDVI, time-series.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2290
8565 Plant Varieties Selection System

Authors: Kitti Koonsanit, Chuleerat Jaruskulchai, Poonsak Miphokasap, Apisit Eiumnoh

Abstract:

In the end of the day, meteorological data and environmental data becomes widely used such as plant varieties selection system. Variety plant selection for planted area is of almost importance for all crops, including varieties of sugarcane. Since sugarcane have many varieties. Variety plant non selection for planting may not be adapted to the climate or soil conditions for planted area. Poor growth, bloom drop, poor fruit, and low price are to be from varieties which were not recommended for those planted area. This paper presents plant varieties selection system for planted areas in Thailand from meteorological data and environmental data by the use of decision tree techniques. With this software developed as an environmental data analysis tool, it can analyze resulting easier and faster. Our software is a front end of WEKA that provides fundamental data mining functions such as classify, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with exporting result. After that, our software can export and display data result to Google maps API in order to display result and plot plant icons effectively.

Keywords: Plant varieties selection system, decision tree, expert recommendation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1793
8564 Integrating Context Priors into a Decision Tree Classification Scheme

Authors: Kasim Terzic, Bernd Neumann

Abstract:

Scene interpretation systems need to match (often ambiguous) low-level input data to concepts from a high-level ontology. In many domains, these decisions are uncertain and benefit greatly from proper context. This paper demonstrates the use of decision trees for estimating class probabilities for regions described by feature vectors, and shows how context can be introduced in order to improve the matching performance.

Keywords: Classification, Decision Trees, Interpretation, Vision

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1300
8563 Balancing of Quad Tree using Point Pattern Analysis

Authors: Amitava Chakraborty, Sudip Kumar De, Ranjan Dasgupta

Abstract:

Point quad tree is considered as one of the most common data organizations to deal with spatial data & can be used to increase the efficiency for searching the point features. As the efficiency of the searching technique depends on the height of the tree, arbitrary insertion of the point features may make the tree unbalanced and lead to higher time of searching. This paper attempts to design an algorithm to make a nearly balanced quad tree. Point pattern analysis technique has been applied for this purpose which shows a significant enhancement of the performance and the results are also included in the paper for the sake of completeness.

Keywords: Algorithm, Height balanced tree, Point patternanalysis, Point quad tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2699
8562 Developing Structured Sizing Systems for Manufacturing Ready-Made Garments of Indian Females Using Decision Tree-Based Data Mining

Authors: Hina Kausher, Sangita Srivastava

Abstract:

In India, there is a lack of standard, systematic sizing approach for producing readymade garments. Garments manufacturing companies use their own created size tables by modifying international sizing charts of ready-made garments. The purpose of this study is to tabulate the anthropometric data which cover the variety of figure proportions in both height and girth. 3,000 data have been collected by an anthropometric survey undertaken over females between the ages of 16 to 80 years from the some states of India to produce the sizing system suitable for clothing manufacture and retailing. The data are used for the statistical analysis of body measurements, the formulation of sizing systems and body measurements tables. Factor analysis technique is used to filter the control body dimensions from the large number of variables. Decision tree-based data mining is used to cluster the data. The standard and structured sizing system can facilitate pattern grading and garment production. Moreover, it can exceed buying ratios and upgrade size allocations to retail segments.

Keywords: Anthropometric data, data mining, decision tree, garments manufacturing, ready-made garments, sizing systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 960
8561 Using Suffix Tree Document Representation in Hierarchical Agglomerative Clustering

Authors: Daniel I. Morariu, Radu G. Cretulescu, Lucian N. Vintan

Abstract:

In text categorization problem the most used method for documents representation is based on words frequency vectors called VSM (Vector Space Model). This representation is based only on words from documents and in this case loses any “word context" information found in the document. In this article we make a comparison between the classical method of document representation and a method called Suffix Tree Document Model (STDM) that is based on representing documents in the Suffix Tree format. For the STDM model we proposed a new approach for documents representation and a new formula for computing the similarity between two documents. Thus we propose to build the suffix tree only for any two documents at a time. This approach is faster, it has lower memory consumption and use entire document representation without using methods for disposing nodes. Also for this method is proposed a formula for computing the similarity between documents, which improves substantially the clustering quality. This representation method was validated using HAC - Hierarchical Agglomerative Clustering. In this context we experiment also the stemming influence in the document preprocessing step and highlight the difference between similarity or dissimilarity measures to find “closer" documents.

Keywords: Text Clustering, Suffix tree documentrepresentation, Hierarchical Agglomerative Clustering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1910
8560 Multilevel Fuzzy Decision Support Model for China-s Urban Rail Transit Planning Schemes

Authors: Jin-Bao Zhao, Wei Deng

Abstract:

This paper aims at developing a multilevel fuzzy decision support model for urban rail transit planning schemes in China under the background that China is presently experiencing an unprecedented construction of urban rail transit. In this study, an appropriate model using multilevel fuzzy comprehensive evaluation method is developed. In the decision process, the followings are considered as the influential objectives: traveler attraction, environment protection, project feasibility and operation. In addition, consistent matrix analysis method is used to determine the weights between objectives and the weights between the objectives- sub-indictors, which reduces the work caused by repeated establishment of the decision matrix on the basis of ensuring the consistency of decision matrix. The application results show that multilevel fuzzy decision model can perfectly deal with the multivariable and multilevel decision process, which is particularly useful in the resolution of multilevel decision-making problem of urban rail transit planning schemes.

Keywords: Urban rail transit, planning schemes, multilevel fuzzy decision support model, consistent matrix analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1319
8559 Variable Rough Set Model and Its Knowledge Reduction for Incomplete and Fuzzy Decision Information Systems

Authors: Da-kuan Wei, Xian-zhong Zhou, Dong-jun Xin, Zhi-wei Chen

Abstract:

The information systems with incomplete attribute values and fuzzy decisions commonly exist in practical problems. On the base of the notion of variable precision rough set model for incomplete information system and the rough set model for incomplete and fuzzy decision information system, the variable rough set model for incomplete and fuzzy decision information system is constructed, which is the generalization of the variable precision rough set model for incomplete information system and that of rough set model for incomplete and fuzzy decision information system. The knowledge reduction and heuristic algorithm, built on the method and theory of precision reduction, are proposed.

Keywords: Rough set, Incomplete and fuzzy decision information system, Limited valued tolerance relation, Knowledge reduction, Variable rough set model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1585
8558 An Effective Framework for Chinese Syntactic Parsing

Authors: Xing Li, Chengqing Zong

Abstract:

This paper presents an effective framework for Chinesesyntactic parsing, which includes two parts. The first one is a parsing framework, which is based on an improved bottom-up chart parsingalgorithm, and integrates the idea of the beam search strategy of N bestalgorithm and heuristic function of A* algorithm for pruning, then get multiple parsing trees. The second is a novel evaluation model, which integrates contextual and partial lexical information into traditional PCFG model and defines a new score function. Using this model, the tree with the highest score is found out as the best parsing tree. Finally,the contrasting experiment results are given. Keywords?syntactic parsing, PCFG, pruning, evaluation model.

Keywords: syntactic parsing, PCFG, pruning, evaluation model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1221
8557 Improving Fault Resilience and Reconstruction of Overlay Multicast Tree Using Leaving Time of Participants

Authors: Bhed Bahadur Bista

Abstract:

Network layer multicast, i.e. IP multicast, even after many years of research, development and standardization, is not deployed in large scale due to both technical (e.g. upgrading of routers) and political (e.g. policy making and negotiation) issues. Researchers looked for alternatives and proposed application/overlay multicast where multicast functions are handled by end hosts, not network layer routers. Member hosts wishing to receive multicast data form a multicast delivery tree. The intermediate hosts in the tree act as routers also, i.e. they forward data to the lower hosts in the tree. Unlike IP multicast, where a router cannot leave the tree until all members below it leave, in overlay multicast any member can leave the tree at any time thus disjoining the tree and disrupting the data dissemination. All the disrupted hosts have to rejoin the tree. This characteristic of the overlay multicast causes multicast tree unstable, data loss and rejoin overhead. In this paper, we propose that each node sets its leaving time from the tree and sends join request to a number of nodes in the tree. The nodes in the tree will reject the request if their leaving time is earlier than the requesting node otherwise they will accept the request. The node can join at one of the accepting nodes. This makes the tree more stable as the nodes will join the tree according to their leaving time, earliest leaving time node being at the leaf of the tree. Some intermediate nodes may not follow their leaving time and leave earlier than their leaving time thus disrupting the tree. For this, we propose a proactive recovery mechanism so that disrupted nodes can rejoin the tree at predetermined nodes immediately. We have shown by simulation that there is less overhead when joining the multicast tree and the recovery time of the disrupted nodes is much less than the previous works. Keywords

Keywords: Network layer multicast, Fault Resilience, IP multicast

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1387
8556 Biodiversity and Climate Change: Consequences for Norway Spruce Mountain Forests in Slovakia

Authors: Jozef Mindas, Jaroslav Skvarenina, Jana Skvareninova

Abstract:

Study of the effects of climate change on Norway Spruce (Picea abies) forests has mainly focused on the diversity of tree species diversity of tree species as a result of the ability of species to tolerate temperature and moisture changes as well as some effects of disturbance regime changes. The tree species’ diversity changes in spruce forests due to climate change have been analyzed via gap model. Forest gap model is a dynamic model for calculation basic characteristics of individual forest trees. Input ecological data for model calculations have been taken from the permanent research plots located in primeval forests in mountainous regions in Slovakia. The results of regional scenarios of the climatic change for the territory of Slovakia have been used, from which the values are according to the CGCM3.1 (global) model, KNMI and MPI (regional) models. Model results for conditions of the climate change scenarios suggest a shift of the upper forest limit to the region of the present subalpine zone, in supramontane zone. N. spruce representation will decrease at the expense of beech and precious broadleaved species (Acer sp., Sorbus sp., Fraxinus sp.). The most significant tree species diversity changes have been identified for the upper tree line and current belt of dwarf pine (Pinus mugo) occurrence. The results have been also discussed in relation to most important disturbances (wind storms, snow and ice storms) and phenological changes which consequences are little known. Special discussion is focused on biomass production changes in relation to carbon storage diversity in different carbon pools.

Keywords: Biodiversity, climate change, Norway spruce forests, gap model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1644