Search results for: Semantic Web Usage Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1394

Search results for: Semantic Web Usage Mining

1094 A Survey on Usage and Diffusion of Project Risk Management Techniques and Software Tools in the Construction Industry

Authors: Muhammad Jamaluddin Thaheem, Alberto De Marco

Abstract:

The area of Project Risk Management (PRM) has been extensively researched, and the utilization of various tools and techniques for managing risk in several industries has been sufficiently reported. Formal and systematic PRM practices have been made available for the construction industry. Based on such body of knowledge, this paper tries to find out the global picture of PRM practices and approaches with the help of a survey to look into the usage of PRM techniques and diffusion of software tools, their level of maturity, and their usefulness in the construction sector. Results show that, despite existing techniques and tools, their usage is limited: software tools are used only by a minority of respondents and their cost is one of the largest hurdles in adoption. Finally, the paper provides some important guidelines for future research regarding quantitative risk analysis techniques and suggestions for PRM software tools development and improvement.

Keywords: Construction industry, Project risk management, Software tools, Survey study.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2928
1093 Estimation Model of Dry Docking Duration Using Data Mining

Authors: Isti Surjandari, Riara Novita

Abstract:

Maintenance is one of the most important activities in the shipyard industry. However, sometimes it is not supported by adequate services from the shipyard, where inaccuracy in estimating the duration of the ship maintenance is still common. This makes estimation of ship maintenance duration is crucial. This study uses Data Mining approach, i.e., CART (Classification and Regression Tree) to estimate the duration of ship maintenance that is limited to dock works or which is known as dry docking. By using the volume of dock works as an input to estimate the maintenance duration, 4 classes of dry docking duration were obtained with different linear model and job criteria for each class. These linear models can then be used to estimate the duration of dry docking based on job criteria.

Keywords: Classification and regression tree (CART), data mining, dry docking, maintenance duration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2395
1092 A New Algorithm for Cluster Initialization

Authors: Moth'd Belal. Al-Daoud

Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximum variance. The algorithm has been applied to different data sets and good results are obtained.

Keywords: clustering, k-means, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2065
1091 Arsenic Mobility from Mining Tailings of Monte San Nicolas to Presa de Mata in Guanajuato, Mexico

Authors: I. Cano-Aguilera, B. E. Rubio-Campos, G. De la Rosa, A. F. Aguilera-Alvarado

Abstract:

Mining tailings represent a generating source of rich heavy metal material with a potential danger the public health and the environment, since these metals, under certain conditions, can leach and contaminate aqueous systems that serve like supplying potable water sources. The strategy for this work is based on the observation, experimentation and the simulation that can be obtained by binding real answers of the hydrodynamic behavior of metals leached from mining tailings, and the applied mathematics that provides the logical structure to decipher the individual effects of the general physicochemical phenomenon. The case of study presented herein focuses on mining tailings deposits located in Monte San Nicolas, Guanajuato, Mexico, an abandoned mine. This was considered the contamination source that under certain physicochemical conditions can favor the metal leaching, and its transport towards aqueous systems. In addition, the cartography, meteorology, geology and the hydrodynamics and hydrological characteristics of the place, will be helpful in determining the way and the time in which these systems can interact. Preliminary results demonstrated that arsenic presents a great mobility, since this one was identified in several superficial aqueous systems of the micro watershed, as well as in sediments in concentrations that exceed the established maximum limits in the official norms. Also variations in pH and potential oxide-reduction were registered, conditions that favor the presence of different species from this element its solubility and therefore its mobility.

Keywords: Arsenic, mining tailings, transport.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1649
1090 UB-Tree Indexing for Semantic Query Optimization of Range Queries

Authors: S. Housseno, A. Simonet, M. Simonet

Abstract:

Semantic query optimization consists in restricting the search space in order to reduce the set of objects of interest for a query. This paper presents an indexing method based on UB-trees and a static analysis of the constraints associated to the views of the database and to any constraint expressed on attributes. The result of the static analysis is a partitioning of the object space into disjoint blocks. Through Space Filling Curve (SFC) techniques, each fragment (block) of the partition is assigned a unique identifier, enabling the efficient indexing of fragments by UB-trees. The search space corresponding to a range query is restricted to a subset of the blocks of the partition. This approach has been developed in the context of a KB-DBMS but it can be applied to any relational system.

Keywords: Index, Range query, UB-tree, Space Filling Curve, Query optimization, Views, Database, Integrity Constraint, Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1460
1089 Characteristics of Football Spectators Using Second Screen

Authors: Florian Pfeffel, Christoph A. Kexel, Peter Kexel, Maria Ratz

Abstract:

The parallel usage of different media channels has increased recently owing to technological advances. Second Screen describes the use of a second device by television viewers to consume further content which is related to the program they are watching. This study analysed the characteristics of football spectators regarding their media consumption in relation to Second Screen usage while watching a football match on TV. The existing literature on Second Screen usage is still very limited, especially in the context of particular broadcasting settings such as sport or even more specific such as football matches. Therefore, the primary research objective was to reveal first insights into the user behaviour of football spectators regarding Second Screen services. The survey, which was conducted among German football supporters in 2015, revealed some characteristics such as the identification and involvement into the sports which are related to an increased use of Second Screen services. One important finding for football supporters was that at the time of a match they have a lower parallel media usage compared to other TV broadcastings. Nevertheless, if supporters used a second device while watching a match on TV, then they were using specific Second Screen services. This means they searched for more content related information. The findings on the habits and characteristics of people who are using Second Screen services are relevant for future developments in that area as well as for marketing decisions.

Keywords: Media consumption, second screen, sport marketing, user behaviour.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1746
1088 Regression Analysis of Travel Indicators and Public Transport Usage in Urban Areas

Authors: M. Moeinaddini, Z. Asadi-Shekari, M. Zaly Shah, A. Hamzah

Abstract:

Currently, planners try to have more green travel options to decrease economic, social and environmental problems. Therefore, this study tries to find significant urban travel factors to be used to increase the usage of alternative urban travel modes. This paper attempts to identify the relationship between prominent urban mobility indicators and daily trips by public transport in 30 cities from various parts of the world. Different travel modes, infrastructures and cost indicators were evaluated in this research as mobility indicators. The results of multi-linear regression analysis indicate that there is a significant relationship between mobility indicators and the daily usage of public transport.

Keywords: Green travel modes, urban travel indicators, daily trips by public transport, multi-linear regression analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2512
1087 Association of Smoking with Chest Radiographic and Lung Function Findings in Retired Bauxite Mining Workers

Authors: L. R. Ferreira, R. C. G. Bianchi, L. C.R. Ferreira, C. M. Galhardi, E. P. Baciuk, L. H. Oliveira

Abstract:

Inhalation hazards are associated with potentially injurious exposure and increased risk for lung diseases, within the bauxite mining industry, especially for the smelter workers. Smoking is related to decreased lung function and leads to chronic lung diseases. This study had the objective to evaluate whether smoking is related to functional and radiographic respiratory changes in retired bauxite mining workers. Methods: This was a retrospective and cross-sectional study involving the analysis of database information of 140 retired bauxite mining workers from Poços de Caldas-MG evaluated at Worker’s Health Reference Center and at the Social Security Brazilian National Institute, from July 1st, 2015 until June 30th, 2016. The workers were divided into three groups: non-smokers (n = 47), ex-smokers (n = 46), and smokers (n = 47). The data included: age, gender, spirometry results, and the presence or not of pulmonary pleural and/or parenchymal changes in chest radiographs. Chi-Squared test was used (p < 0,05). Results: In the smokers’ group, 83% of spirometry tests and 64% of chest x-rays were altered. In the non-smokers’ group, 19% of spirometry tests and 13% of chest x-rays were altered. In the ex-smokers’ group, 35% of spirometry tests and 30% of chest x-rays were altered. Most of the results were statistically significant. Results demonstrated a significant difference between smokers’ and non-smokers’ groups in regard to spirometric and radiographic pulmonary alterations. Ex-smokers’ and non-smokers’ group demonstrated better results when compared to the smokers’ group in relation to altered spirometry and radiograph findings. These data may contribute to planning strategies to enhance smoking cessation programs within the bauxite mining industry.

Keywords: Bauxite mining, spirometry, chest radiography, smoking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 639
1086 Investigation of Cascade Loop Heat Pipes

Authors: Nandy Putra, Atrialdipa Duanovsah, Kristofer Haliansyah

Abstract:

The aim of this research is to design a LHP with low thermal resistance and low condenser temperature. A Self-designed cascade LHP was tested by using biomaterial, sintered copper powder, and aluminum screen mesh as the wick. Using pure water as the working fluid for the first level of the LHP and 96% alcohol as the working fluid for the second level of LHP, the experiments were run with 10W, 20W, and 30W heat input. Experimental result shows that the usage of biomaterial as wick could reduce more temperature at evaporator than by using sintered copper powder and screen mesh up to 22.63% and 37.41% respectively. The lowest thermal resistance occurred during the usage of biomaterial as wick of heat pipe, which is 2.06 oC/W. The usage of cascade system could be applied to LHP to reduce the temperature at condenser and reduced thermal resistance up to 17.6%.

Keywords: Biomaterial, cascade loop heat pipe, screen mesh, sintered Cu.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 877
1085 Applying Fuzzy FP-Growth to Mine Fuzzy Association Rules

Authors: Chien-Hua Wang, Wei-Hsuan Lee, Chin-Tzong Pang

Abstract:

In data mining, the association rules are used to find for the associations between the different items of the transactions database. As the data collected and stored, rules of value can be found through association rules, which can be applied to help managers execute marketing strategies and establish sound market frameworks. This paper aims to use Fuzzy Frequent Pattern growth (FFP-growth) to derive from fuzzy association rules. At first, we apply fuzzy partition methods and decide a membership function of quantitative value for each transaction item. Next, we implement FFP-growth to deal with the process of data mining. In addition, in order to understand the impact of Apriori algorithm and FFP-growth algorithm on the execution time and the number of generated association rules, the experiment will be performed by using different sizes of databases and thresholds. Lastly, the experiment results show FFPgrowth algorithm is more efficient than other existing methods.

Keywords: Data mining, association rule, fuzzy frequent patterngrowth.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1763
1084 Applications of Genetic Programming in Data Mining

Authors: Saleh Mesbah Elkaffas, Ahmed A. Toony

Abstract:

This paper details the application of a genetic programming framework for induction of useful classification rules from a database of income statements, balance sheets, and cash flow statements for North American public companies. Potentially interesting classification rules are discovered. Anomalies in the discovery process merit further investigation of the application of genetic programming to the dataset for the problem domain.

Keywords: Genetic programming, data mining classification rule.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1516
1083 In Cognitive Radio the Analysis of Bit-Error- Rate (BER) by using PSO Algorithm

Authors: Shrikrishan Yadav, Akhilesh Saini, Krishna Chandra Roy

Abstract:

The electromagnetic spectrum is a natural resource and hence well-organized usage of the limited natural resources is the necessities for better communication. The present static frequency allocation schemes cannot accommodate demands of the rapidly increasing number of higher data rate services. Therefore, dynamic usage of the spectrum must be distinguished from the static usage to increase the availability of frequency spectrum. Cognitive radio is not a single piece of apparatus but it is a technology that can incorporate components spread across a network. It offers great promise for improving system efficiency, spectrum utilization, more effective applications, reduction in interference and reduced complexity of usage for users. Cognitive radio is aware of its environmental, internal state, and location, and autonomously adjusts its operations to achieve designed objectives. It first senses its spectral environment over a wide frequency band, and then adapts the parameters to maximize spectrum efficiency with high performance. This paper only focuses on the analysis of Bit-Error-Rate in cognitive radio by using Particle Swarm Optimization Algorithm. It is theoretically as well as practically analyzed and interpreted in the sense of advantages and drawbacks and how BER affects the efficiency and performance of the communication system.

Keywords: BER, Cognitive Radio, Environmental Parameters, PSO, Radio spectrum, Transmission Parameters

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2114
1082 Novelty as a Measure of Interestingness in Knowledge Discovery

Authors: Vasudha Bhatnagar, Ahmed Sultan Al-Hegami, Naveen Kumar

Abstract:

Rule Discovery is an important technique for mining knowledge from large databases. Use of objective measures for discovering interesting rules leads to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules to ultimately improve the overall efficiency of KDD process. In this paper we study novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach based on both objective and subjective measures to quantify novelty of the discovered rules in terms of their deviations from the known rules (knowledge). We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to the user specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are promising.

Keywords: Knowledge Discovery in Databases (KDD), Interestingness, Subjective Measures, Novelty Index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1766
1081 3D-Vehicle Associated Research Fields for Smart City via Semantic Search Approach

Authors: Haluk Eren, Mucahit Karaduman

Abstract:

This paper presents 15-year trends for scientific studies in a scientific database considering 3D and vehicle words. Two words are selected to find their associated publications in IEEE scholar database. Both of keywords are entered individually for the years 2002, 2012, and 2016 on the database to identify the preferred subjects of researchers in same years. We have classified closer research fields after searching and listing. Three years (2002, 2012, and 2016) have been investigated to figure out progress in specified time intervals. The first one is assumed as the initial progress in between 2002-2012, and the second one is in 2012-2016 that is fast development duration. We have found very interesting and beneficial results to understand the scholars’ research field preferences for a decade. This information will be highly desirable in smart city-based research purposes consisting of 3D and vehicle-related issues.

Keywords: Vehicle, 3D, smart city, scholarly search, semantic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 841
1080 Using a Semantic Self-Organising Web Page-Ranking Mechanism for Public Administration and Education

Authors: Marios Poulos, Sozon Papavlasopoulos, V. S. Belesiotis

Abstract:

In the proposed method for Web page-ranking, a novel theoretic model is introduced and tested by examples of order relationships among IP addresses. Ranking is induced using a convexity feature, which is learned according to these examples using a self-organizing procedure. We consider the problem of selforganizing learning from IP data to be represented by a semi-random convex polygon procedure, in which the vertices correspond to IP addresses. Based on recent developments in our regularization theory for convex polygons and corresponding Euclidean distance based methods for classification, we develop an algorithmic framework for learning ranking functions based on a Computational Geometric Theory. We show that our algorithm is generic, and present experimental results explaining the potential of our approach. In addition, we explain the generality of our approach by showing its possible use as a visualization tool for data obtained from diverse domains, such as Public Administration and Education.

Keywords: Computational Geometry, Education, e-Governance, Semantic Web.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1721
1079 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.

Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1487
1078 Real-Time Episodic Memory Construction for Optimal Action Selection in Cognitive Robotics

Authors: Deon de Jager, Yahya Zweiri, Dimitrios Makris

Abstract:

The three most important components in the cognitive architecture for cognitive robotics is memory representation, memory recall, and action-selection performed by the executive. In this paper, action selection, performed by the executive, is defined as a memory quantification and optimization process. The methodology describes the real-time construction of episodic memory through semantic memory optimization. The optimization is performed by set-based particle swarm optimization, using an adaptive entropy memory quantification approach for fitness evaluation. The performance of the approach is experimentally evaluated by simulation, where a UAV is tasked with the collection and delivery of a medical package. The experiments show that the UAV dynamically uses the episodic memory to autonomously control its velocity, while successfully completing its mission.

Keywords: Cognitive robotics, semantic memory, episodic memory, maximum entropy principle, particle swarm optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1577
1077 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2419
1076 Post ERP Feral System and use of ‘Feral System as Coping Mechanism

Authors: Tajul Urus, S., Molla, A., Teoh, S.Y.

Abstract:

A number of studies highlighted problems related to ERP systems, yet, most of these studies focus on the problems during the project and implementation stages but not during the postimplementation use process. Problems encountered in the process of using ERP would hinder the effective exploitation and the extended and continued use of ERP systems and their value to organisations. This paper investigates the different types of problems users (operational, supervisory and managerial) faced in using ERP and how 'feral system' is used as the coping mechanism. The paper adopts a qualitative method and uses data collected from two cases and 26 interviews, to inductively develop a casual network model of ERP usage problem and its coping mechanism. This model classified post ERP usage problems as data quality, system quality, interface and infrastructure. The model is also categorised the different coping mechanism through use of 'feral system' inclusive of feral information system, feral data and feral use of technology.

Keywords: Case Studies, Coping Mechanism, Post Implementation ERP system, Usage Problem

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1461
1075 Concept Abduction in Description Logics with Cardinality Restrictions

Authors: Viet-Hoang Vu, Nhan Le-Thanh

Abstract:

Recently the usefulness of Concept Abduction, a novel non-monotonic inference service for Description Logics (DLs), has been argued in the context of ontology-based applications such as semantic matchmaking and resource retrieval. Based on tableau calculus, a method has been proposed to realize this reasoning task in ALN, a description logic that supports simple cardinality restrictions as well as other basic constructors. However, in many ontology-based systems, the representation of ontology would require expressive formalisms for capturing domain-specific constraints, this language is not sufficient. In order to increase the applicability of the abductive reasoning method in such contexts, we would like to present in the scope of this paper an extension of the tableaux-based algorithm for dealing with concepts represented inALCQ, the description logic that extends ALN with full concept negation and quantified number restrictions.

Keywords: Abductive reasoning, description logics, semantic matchmaking, non-monotonic inference, tableaux-based method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1527
1074 Extensions to Some AOSE Methodologies

Authors: Louay M. Jeroudaih, Mohamed S. Hajji

Abstract:

This paper looks into areas not covered by prominent Agent-Oriented Software Engineering (AOSE) methodologies. Extensive paper review led to the identification of two issues, first most of these methodologies almost neglect semantic web and ontology. Second, as expected, each one has its strength and weakness and may focus on some phases of the development lifecycle but not all of the phases. The work presented here builds extensions to a highly regarded AOSE methodology (MaSE) in order to cover the areas that this methodology does not concentrate on. The extensions include introducing an ontology stage for semantic representation and integrating early requirement specification from a methodology which mainly focuses on that. The integration involved developing transformation rules (with the necessary handling of nonmatching notions) between the two sets of representations and building the software which automates the transformation. The application of this integration on a case study is also presented in the paper. The main flow of MaSE stages was changed to smoothly accommodate the new additions.

Keywords: Agents, Intelligent Agents, Software Engineering(SE), UML, AUML, and Design Patterns.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1858
1073 Chances and Challenges of Intelligent Technologies in the Production and Retail Sector

Authors: Carsten Röcker

Abstract:

This paper provides an introduction into the evolution of information and communication technology and illustrates its usage in the work domain. The paper is sub-divided into two parts. The first part gives an overview over the different phases of information processing in the work domain. It starts by charting the past and present usage of computers in work environments and shows current technological trends, which are likely to influence future business applications. The second part starts by briefly describing, how the usage of computers changed business processes in the past, and presents first Ambient Intelligence applications based on identification and localization information, which are already used in the production and retail sector. Based on current systems and prototype applications, the paper gives an outlook of how Ambient Intelligence technologies could change business processes in the future.

Keywords: Ambient Intelligence, Ubiquitous Computing, Business Applications, Radio Frequency Identification (RFID).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1881
1072 Ambient Intelligence in the Production and Retail Sector: Emerging Opportunities and Potential Pitfalls

Authors: Carsten Röcker

Abstract:

This paper provides an introduction into the evolution of information and communication technology and illustrates its usage in the work domain. The paper is sub-divided into two parts. The first part gives an overview over the different phases of information processing in the work domain. It starts by charting the past and present usage of computers in work environments and shows current technological trends, which are likely to influence future business applications. The second part starts by briefly describing, how the usage of computers changed business processes in the past, and presents first Ambient Intelligence applications based on identification and localization information, which are already used in the production and retail sector. Based on current systems and prototype applications, the paper gives an outlook of how Ambient Intelligence technologies could change business processes in the future.

Keywords: Ambient Intelligence, Ubiquitous Computing, Business Applications, Radio Frequency Identification (RFID)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1831
1071 A Cumulative Learning Approach to Data Mining Employing Censored Production Rules (CPRs)

Authors: Rekha Kandwal, Kamal K.Bharadwaj

Abstract:

Knowledge is indispensable but voluminous knowledge becomes a bottleneck for efficient processing. A great challenge for data mining activity is the generation of large number of potential rules as a result of mining process. In fact sometimes result size is comparable to the original data. Traditional data mining pruning activities such as support do not sufficiently reduce the huge rule space. Moreover, many practical applications are characterized by continual change of data and knowledge, thereby making knowledge voluminous with each change. The most predominant representation of the discovered knowledge is the standard Production Rules (PRs) in the form If P Then D. Michalski & Winston proposed Censored Production Rules (CPRs), as an extension of production rules, that exhibit variable precision and supports an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: If P Then D Unless C, where C (Censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions, when the resources needed to establish its presence, are tight or there is simply no information available as to whether it holds or not. Thus the 'If P Then D' part of the CPR expresses important information while the Unless C part acts only as a switch changes the polarity of D to ~D. In this paper a scheme based on Dempster-Shafer Theory (DST) interpretation of a CPR is suggested for discovering CPRs from the discovered flat PRs. The discovery of CPRs from flat rules would result in considerable reduction of the already discovered rules. The proposed scheme incrementally incorporates new knowledge and also reduces the size of knowledge base considerably with each episode. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested cumulative learning scheme would be useful in mining data streams.

Keywords: Censored production rules, cumulative learning, data mining, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1450
1070 Graph-Based Text Similarity Measurement by Exploiting Wikipedia as Background Knowledge

Authors: Lu Zhang, Chunping Li, Jun Liu, Hui Wang

Abstract:

Text similarity measurement is a fundamental issue in many textual applications such as document clustering, classification, summarization and question answering. However, prevailing approaches based on Vector Space Model (VSM) more or less suffer from the limitation of Bag of Words (BOW), which ignores the semantic relationship among words. Enriching document representation with background knowledge from Wikipedia is proven to be an effective way to solve this problem, but most existing methods still cannot avoid similar flaws of BOW in a new vector space. In this paper, we propose a novel text similarity measurement which goes beyond VSM and can find semantic affinity between documents. Specifically, it is a unified graph model that exploits Wikipedia as background knowledge and synthesizes both document representation and similarity computation. The experimental results on two different datasets show that our approach significantly improves VSM-based methods in both text clustering and classification.

Keywords: Text classification, Text clustering, Text similarity, Wikipedia

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2066
1069 Classifier Based Text Mining for Neural Network

Authors: M. Govindarajan, R. M. Chandrasekaran

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In Neural Network that address classification problems, training set, testing set, learning rate are considered as key tasks. That is collection of input/output patterns that are used to train the network and used to assess the network performance, set the rate of adjustments. This paper describes a proposed back propagation neural net classifier that performs cross validation for original Neural Network. In order to reduce the optimization of classification accuracy, training time. The feasibility the benefits of the proposed approach are demonstrated by means of five data sets like contact-lenses, cpu, weather symbolic, Weather, labor-nega-data. It is shown that , compared to exiting neural network, the training time is reduced by more than 10 times faster when the dataset is larger than CPU or the network has many hidden units while accuracy ('percent correct') was the same for all datasets but contact-lences, which is the only one with missing attributes. For contact-lences the accuracy with Proposed Neural Network was in average around 0.3 % less than with the original Neural Network. This algorithm is independent of specify data sets so that many ideas and solutions can be transferred to other classifier paradigms.

Keywords: Back propagation, classification accuracy, textmining, time complexity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4174
1068 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In decision tree approach is most useful in classification problem. With this technique, tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for original C5.0 in order to reduce the optimization of error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of medial data set like hypothyroid. It is shown that, the performance of a classifier on the training cases from which it was constructed gives a poor estimate by sampling or using a separate test file, either way, the classifier is evaluated on cases that were not used to build and evaluate the classifier are both are large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772 case training set and a 1000 case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of see5 is its ability to classifiers called rulesets. The ruleset has an error rate 0.5 % on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive is by f-fold –cross- validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2442
1067 Analysis of a Population of Diabetic Patients Databases with Classifiers

Authors: Murat Koklu, Yavuz Unal

Abstract:

Data mining can be called as a technique to extract information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence technique. One of its application areas is medical area to form decision support systems for diagnosis just by inventing meaningful information from given medical data. In this study a decision support system for diagnosis of illness that make use of data mining and three different artificial intelligence classifier algorithms namely Multilayer Perceptron, Naive Bayes Classifier and J.48. Pima Indian dataset of UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different feature vectors. Obtained classifying results were compared with the previous studies. The suggestions for future studies were presented.

Keywords: Artificial Intelligence, Classifiers, Data Mining, Diabetic Patients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5390
1066 Multi-Dimensional Concerns Mining for Web Applications via Concept-Analysis

Authors: Carlo Bellettini, Alessandro Marchetto, Andrea Trentini

Abstract:

Web applications have become very complex and crucial, especially when combined with areas such as CRM (Customer Relationship Management) and BPR (Business Process Reengineering), the scientific community has focused attention to Web applications design, development, analysis, and testing, by studying and proposing methodologies and tools. This paper proposes an approach to automatic multi-dimensional concern mining for Web Applications, based on concepts analysis, impact analysis, and token-based concern identification. This approach lets the user to analyse and traverse Web software relevant to a particular concern (concept, goal, purpose, etc.) via multi-dimensional separation of concerns, to document, understand and test Web applications. This technique was developed in the context of WAAT (Web Applications Analysis and Testing) project. A semi-automatic tool to support this technique is currently under development.

Keywords: Concepts Analysis, Concerns Mining, Multi-Dimensional Separation of Concerns, Impact Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1433
1065 Ensemble Approach for Predicting Student's Academic Performance

Authors: L. A. Muhammad, M. S. Argungu

Abstract:

Educational data mining (EDM) has recorded substantial considerations. Techniques of data mining in one way or the other have been proposed to dig out out-of-sight knowledge in educational data. The result of the study got assists academic institutions in further enhancing their process of learning and methods of passing knowledge to students. Consequently, the performance of students boasts and the educational products are by no doubt enhanced. This study adopted a student performance prediction model premised on techniques of data mining with Students' Essential Features (SEF). SEF are linked to the learner's interactivity with the e-learning management system. The performance of the student's predictive model is assessed by a set of classifiers, viz. Bayes Network, Logistic Regression, and Reduce Error Pruning Tree (REP). Consequently, ensemble methods of Bagging, Boosting, and Random Forest (RF) are applied to improve the performance of these single classifiers. The study reveals that the result shows a robust affinity between learners' behaviors and their academic attainment. Result from the study shows that the REP Tree and its ensemble record the highest accuracy of 83.33% using SEF. Hence, in terms of the Receiver Operating Curve (ROC), boosting method of REP Tree records 0.903, which is the best. This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, bagging, Random Forest, boosting, data mining, classifiers, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 683