Search results for: graph mining
269 Induction of Expressive Rules using the Binary Coding Method
Authors: Seyed R Mousavi
Abstract:
In most rule-induction algorithms, the only operator used against nominal attributes is the equality operator =. In this paper, we first propose the use of the inequality operator, ≠, in addition to the equality operator, to increase the expressiveness of induced rules. Then, we present a new method, Binary Coding, which can be used along with an arbitrary rule-induction algorithm to make use of the inequality operator without any need to change the algorithm. Experimental results suggest that the Binary Coding method is promising enough for further investigation, especially in cases where the minimum number of rules is desirable.
Keywords: Data mining, Inequality operator, Number of rules, Rule-induction.
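The abstract does not spell out how Binary Coding works. One plausible reading, offered here only as an assumption, is that each nominal attribute is replaced by one 0/1 attribute per value, so that an equality-only learner can express "color ≠ red" as "color=red equals 0". The function name `binary_code` and the sample data are hypothetical.

```python
def binary_code(rows, attribute, values):
    """Replace one nominal attribute with one 0/1 attribute per value.

    Assumption: this is an illustrative encoding, not necessarily the
    paper's exact Binary Coding method.
    """
    coded = []
    for row in rows:
        # Keep every other attribute unchanged.
        new_row = {k: v for k, v in row.items() if k != attribute}
        for value in values:
            new_row[f"{attribute}={value}"] = 1 if row[attribute] == value else 0
        coded.append(new_row)
    return coded


# Hypothetical data set with a single nominal attribute.
rows = [{"color": "red"}, {"color": "green"}, {"color": "blue"}]
coded = binary_code(rows, "color", ["red", "green", "blue"])

# An equality-only condition on the coded attribute selects color != red.
not_red = [r for r in coded if r["color=red"] == 0]
```

Under this encoding the rule learner itself is untouched; only the input table changes.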
Downloads 1256
268 Improving Classification in Bayesian Networks using Structural Learning
Authors: Hong Choon Ong
Abstract:
Naïve Bayes classifiers are simple probabilistic classifiers. Classification extracts patterns from a data file containing a set of labeled training examples and is currently one of the most significant areas in data mining. However, Naïve Bayes assumes independence among the features. Structural learning among the features thus helps in the classification problem. In this study, structural learning in a Bayesian Network is proposed for cases where there are relationships between the features when using Naïve Bayes. An improvement in classification using structural learning is shown when relationships exist between the features, that is, when they are not independent.
Keywords: Bayesian Network, Classification, Naïve Bayes, Structural Learning.
Downloads 2599
267 Data Preprocessing for Supervised Leaning
Authors: S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas
Abstract:
Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data are first and foremost. If there is much irrelevant and redundant information present, or noisy and unreliable data, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take a considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set, but this is not the case. Thus, we present the best-known algorithms for each step of data pre-processing so that one can achieve the best performance for a given data set.
Keywords: Data mining, feature selection, data cleaning.
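As a small illustration of one pre-processing step named above, normalization, the sketch below rescales a numeric feature to the range [0, 1]. Min-max scaling is only one common choice and is not claimed to be the technique the paper recommends; the function name is hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_normalize([2, 4, 6])
```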
Downloads 6091
266 Temperature Dependence of Relative Permittivity: A Measurement Technique Using Split Ring Resonators
Authors: Sreedevi P. Chakyar, Jolly Andrews, V. P. Joseph
Abstract:
A compact method for measuring the relative permittivity of a dielectric material at different temperatures using a single circular Split Ring Resonator (SRR) metamaterial unit working as a test probe is presented in this paper. The dielectric constant of a material depends on its temperature, and the LC resonance of the SRR depends on its dielectric environment. Hence, the temperature of the dielectric material in contact with the resonator influences its resonant frequency. A single SRR placed between transmitting and receiving probes connected to a Vector Network Analyser (VNA) is used as a test probe. The dependence of the resonant frequency of the SRR on temperature between 30 °C and 60 °C is analysed. Relative permittivities ‘ε’ of test samples at different temperatures are extracted from a calibration graph drawn between the relative permittivity of samples of known dielectric constant and their corresponding resonant frequencies. This method is found to be an easy and efficient technique for analysing the temperature-dependent permittivity of different materials.
Keywords: Metamaterials, negative permeability, permittivity measurement techniques, split ring resonators, temperature dependent dielectric constant.
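Reading a permittivity off a calibration graph can be sketched as interpolation between calibration points. The snippet below assumes piecewise-linear interpolation and uses made-up calibration values purely for illustration; neither the numbers nor the interpolation scheme come from the paper.

```python
def permittivity_from_frequency(freq_ghz, calibration):
    """Linearly interpolate relative permittivity from a resonant frequency.

    calibration: list of (resonant_frequency_GHz, relative_permittivity)
    pairs with distinct frequencies, measured on samples of known
    dielectric constant.
    """
    points = sorted(calibration)
    for (f0, e0), (f1, e1) in zip(points, points[1:]):
        if f0 <= freq_ghz <= f1:
            t = (freq_ghz - f0) / (f1 - f0)
            return e0 + t * (e1 - e0)
    raise ValueError("frequency outside the calibrated range")


# Hypothetical calibration points, not measured data.
calibration = [(2.0, 6.0), (2.5, 4.0), (3.0, 2.0)]
estimate = permittivity_from_frequency(2.25, calibration)
```

Repeating the lookup for the resonant frequency measured at each temperature would give a permittivity-versus-temperature curve.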
Downloads 2585
265 Utilizing 5G Mobile Connection as a Node in Layer 1 Proof of Authority Blockchain Used for Microtransaction
Authors: Frode van der Laak
Abstract:
The paper examines the feasibility of using a 5G mobile connection as a node for a Proof of Authority (PoA) blockchain that is also used for microtransactions. It uses the phone-number identity of users, which is linked to their crypto wallet address. It also proposes a consensus protocol based on a PoA blockchain; PoA is a permissioned blockchain where consensus is achieved through a set of designated authorities rather than through mining, as is the case with a Proof of Work (PoW) blockchain. This report first explains the concept of a PoA blockchain and how it works. It then discusses the potential benefits and challenges of using a 5G mobile connection as a node in such a blockchain, and finally presents the main open problem statement and proposed solutions with their requirements.
Keywords: 5G, mobile, connection, node, PoA, blockchain, microtransaction.
Downloads 189
264 Simulation Model for Predicting Dengue Fever Outbreak
Authors: Azmi Ibrahim, Nor Azan Mat Zin, Noraidah Sahari Ashaari
Abstract:
Dengue fever is prevalent in Malaysia, with numerous cases including mortality recorded over the years. Public education on the prevention of the disease through various means has been carried out, besides the enforcement of legal means to eradicate the breeding grounds of Aedes mosquitoes, the dengue vector. Hence, other means need to be explored, such as predicting the seasonal peak period of the dengue outbreak and identifying related climate factors contributing to the increase in the number of mosquitoes. A simulation model can be employed for this purpose. In this study, we created a system dynamics simulation to predict the spread of dengue outbreak in Hulu Langat, Selangor, Malaysia. The prototype was developed using STELLA 9.1.2 software. The main data inputs are rainfall, temperature and dengue cases. Data analysis from the graph showed that dengue cases can be predicted accurately using two main variables, rainfall and temperature. However, the model will be further tested over a longer time period to ensure its accuracy, reliability and efficiency as a prediction tool for dengue outbreak.
Keywords: dengue fever, prediction, system dynamic, simulation
Downloads 2336
263 Representing Data without Lost Compression Properties in Time Series: A Review
Authors: Nabilah Filzah Mohd Radzuan, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan
Abstract:
Uncertain data is believed to be an important issue in building up a prediction model. The main objective in time series uncertainty analysis is to formulate uncertain data in order to gain knowledge and fit a low-dimensional model prior to a prediction task. This paper discusses the performance of a number of techniques for dealing with uncertain data, specifically those which handle uncertain data conditions by minimizing the loss of compression properties.
Keywords: Compression properties, uncertainty, uncertain time series, mining technique, weather prediction.
Downloads 1620
262 Experience Modularization for New Value of Evanescent Cultural Communities: Developing Creative Tourism Services in Bangkok
Authors: Wuttigrai Ngamsirijit
Abstract:
Creative tourism is an ongoing development in many countries as an attempt to move away from the serial reproduction of culture and to revive the culture. However, even in destinations with diverse and promising cultural resources, how to create new tourism services can be unclear. This paper presents how tourism experiences are modularized and consolidated in order to form new creative tourism service offerings in evanescent cultural communities of Bangkok, Thailand. The benefits of data mining in accommodating value co-creation are discussed, and the implications of experience modularization for national creative tourism policy are addressed.
Keywords: Co-creation, Creative tourism, New Service Design
Downloads 2403
261 MCOKE: Multi-Cluster Overlapping K-Means Extension Algorithm
Authors: Said Baadel, Fadi Thabtah, Joan Lu
Abstract:
Clustering involves the partitioning of n objects into k clusters. Many clustering algorithms use hard-partitioning techniques where each object is assigned to one cluster. In this paper we propose an overlapping algorithm MCOKE which allows objects to belong to one or more clusters. The algorithm is different from fuzzy clustering techniques because objects that overlap are assigned a membership value of 1 (one) as opposed to a fuzzy membership degree. The algorithm is also different from other overlapping algorithms that require a similarity threshold be defined a priori which can be difficult to determine by novice users.
Keywords: Data mining, k-means, MCOKE, overlapping.
Downloads 2754
260 Inheritance of Primary Yield Component Traits of Common Beans (Phaseolus vulgaris L.): Number of Seeds per Pod and 1000 Seed Weight in an 8X8 Diallel Cross Population
Authors: Atnaf Tiruneh Mulugeta, Mohammed Ali Hussein, Zelleke Habtamu
Abstract:
Thirty-six genotypes (8 parents and 28 F1 diallel crosses) were grown in a randomized complete block design during 2006 at Mandura, northwestern Ethiopia. The experiment was executed to study the inheritance of two primary yield component traits: number of seeds per pod and 1000 seed weight. Statistically significant differences were observed between genotypes, parents, and crosses for these traits. The mean square due to GCA was significant for the two traits. However, the SCA mean square was significant only for number of seeds per pod. Thus both additive and non-additive types of gene action were important in the inheritance of number of seeds per pod. A significant b1 component was obtained for this trait. The b2 and b3 components, however, were not significant, suggesting the absence of gene asymmetry. From the Wr/Vr graph, inheritance of seeds per pod was governed by partial dominance with additive gene action.
Keywords: Diallel crosses, General combining ability, Phaseolus vulgaris L., Specific combining ability
Downloads 2493
259 Reasons for Non-Applicability of Software Entropy Metrics for Bug Prediction in Android
Authors: Arvinder Kaur, Deepti Chopra
Abstract:
Software Entropy Metrics for bug prediction have been validated on various software systems by different researchers. In our previous research, we validated that Software Entropy Metrics calculated for Mozilla subsystems predict future bugs reasonably well. In this study, the Software Entropy metrics are calculated for a subsystem of Android and it is noticed that these metrics are not suitable for bug prediction. The results are compared with a subsystem of Mozilla, and a comparison is made between the two software systems to determine the reasons why Software Entropy metrics are not applicable for Android.
Keywords: Android, bug prediction, mining software repositories, Software Entropy.
Downloads 1092
258 Indoor Localization by Pattern Matching Method Based On Extended Database
Authors: Gyumin Hwang, Jihong Lee
Abstract:
This paper studies a CSS-based indoor localization system, which is easy to implement, inexpensive to build, and covers a larger area than other systems. However, the system is affected by reflected distance data, a problem caused by the multi-path effect. Errors caused by multi-path are difficult to correct because the indoor environment cannot be described. To address the multi-path problem, we supplement the localization system with a pattern matching method based on an extended database, which improves the precision of the estimates. The method is verified by experiments in a gymnasium. The database was constructed at 1 m intervals, and 16 sample data were collected from random positions inside the region of DB points. The results, presented in graphs and tables, show higher accuracy than the existing method.
Keywords: Chirp Spread Spectrum (CSS), Indoor Localization, Pattern-Matching, Time of Arrival (ToA), Multi-Path, Mahalanobis Distance, Reception Rate, Simultaneous Localization and Mapping (SLAM), Laser Range Finder (LRF).
Downloads 1891
257 Minimal Spanning Tree based Fuzzy Clustering
Authors: Ágnes Vathy-Fogarassy, Balázs Feil, János Abonyi
Abstract:
Most fuzzy clustering algorithms have some shortcomings, e.g. they are not able to detect clusters with convex shapes, the number of clusters must be known a priori, and they suffer from numerical problems such as sensitivity to the initialization. This paper studies the synergistic combination of the hierarchical and graph theoretic minimal spanning tree based clustering algorithm with the partitional Gath-Geva fuzzy clustering algorithm. The aim of this hybridization is to increase the robustness and consistency of the clustering results and to decrease the number of heuristically defined parameters of these algorithms, and thereby the influence of the user on the clustering results. For the analysis of the resulting fuzzy clusters, a new tool based on a fuzzy similarity measure is presented. The calculated similarities of the clusters can be used for the hierarchical clustering of the resulting fuzzy clusters, information that is useful for cluster merging and for the visualization of the clustering results. As the examples used to illustrate the operation of the new algorithm show, the proposed algorithm can detect clusters of arbitrary shape in data and does not suffer from the numerical problems of the classical Gath-Geva fuzzy clustering algorithm.
Keywords: Clustering, fuzzy clustering, minimal spanning tree, cluster validity, fuzzy similarity.
Downloads 2406
256 Topology Preservation in SOM
Authors: E. Arsuaga Uriarte, F. Díaz Martín
Abstract:
The SOM has several beneficial features which make it a useful method for data mining. One of the most important features is the ability to preserve the topology in the projection. There are several measures that can be used to quantify the goodness of the map in order to obtain the optimal projection, including the average quantization error and many topological errors. Many researchers have studied how topology preservation should be measured. One option consists of using the topographic error, which considers the ratio of data vectors for which the first and second BMUs are not adjacent. In this work we present a study of the behaviour of the topographic error in different kinds of maps. We have found that this error undervalues rectangular maps and we have studied the reasons why this happens. Finally, we suggest a new topological error to remedy the deficiency of the topographic error.
Keywords: Map lattice, Self-Organizing Map, topographic error, topology preservation.
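The topographic error described above can be sketched directly from its definition: the share of data vectors whose best and second-best matching units are not neighbours on the map. The sketch assumes a rectangular lattice with 4-neighbour adjacency and Euclidean distance; the function names and the tiny example map are hypothetical, and the new error measure proposed by the authors is not reproduced here.

```python
import math


def rect_adjacent(a, b):
    """4-neighbourhood adjacency on a rectangular lattice (an assumption)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def topographic_error(data, weights, adjacent=rect_adjacent):
    """Fraction of data vectors whose first and second BMUs are not adjacent.

    weights: dict mapping a grid coordinate (row, col) to its weight vector.
    """
    errors = 0
    for x in data:
        # Rank map units by distance to the data vector.
        ranked = sorted(weights, key=lambda c: math.dist(x, weights[c]))
        best, second = ranked[0], ranked[1]
        if not adjacent(best, second):
            errors += 1
    return errors / len(data)


# Hypothetical 1x3 map whose middle unit lies far from its neighbours.
weights = {(0, 0): [0.0], (0, 1): [10.0], (0, 2): [1.0]}
data = [[0.4], [9.0]]
te = topographic_error(data, weights)
```

In this example the first data vector maps to units (0, 0) and (0, 2), which are not adjacent, while the second maps to the adjacent units (0, 1) and (0, 2), so half of the vectors count as errors.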
Downloads 3012
255 Artificial Neural Network based Modeling of Evaporation Losses in Reservoirs
Authors: Surinder Deswal, Mahesh Pal
Abstract:
An Artificial Neural Network based modeling technique has been used to study the influence of different combinations of meteorological parameters on evaporation from a reservoir. The data set used is taken from an earlier reported study. Several input combinations were tried so as to find out the importance of different input parameters in predicting the evaporation. The prediction accuracy of the Artificial Neural Network has also been compared with the accuracy of linear regression for predicting evaporation. The comparison demonstrated superior performance of the Artificial Neural Network over the linear regression approach. The findings of the study also revealed the requirement of all input parameters considered together, instead of individual parameters taken one at a time as reported in earlier studies, in predicting the evaporation. The highest correlation coefficient (0.960) along with the lowest root mean square error (0.865) was obtained with the input combination of air temperature, wind speed, sunshine hours and mean relative humidity. A graph between the actual and predicted values of evaporation suggests that most of the values lie within a scatter of ±15% with all input parameters. The findings of this study suggest the usefulness of the ANN technique in predicting evaporation losses from reservoirs.
Keywords: Artificial neural network, evaporation losses, multiple linear regression, modeling.
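The two accuracy figures quoted above, correlation coefficient and root mean square error, can be computed as in the sketch below. It uses the standard Pearson correlation and RMSE formulas; whether the paper used exactly these variants is an assumption, and the function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

A high correlation does not by itself imply a low error: predictions that are exactly double the actual values correlate perfectly yet have a non-zero RMSE, which is why the two figures are reported together.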
Downloads 1978
254 Ontology-based Domain Modelling for Consistent Content Change Management
Authors: Muhammad Javed, Yalemisew M. Abgaz, Claus Pahl
Abstract:
Ontology-based modelling of multi-formatted software application content is a challenging area in content management. When the number of software content units is huge and in a continuous process of change, content change management is important. The management of content in this context requires targeted access and manipulation methods. We present a novel approach to deal with model-driven content-centric information systems and access to their content. At the core of our approach is an ontology-based semantic annotation technique for diversely formatted content that can improve the accuracy of access and systems evolution. Domain ontologies represent domain-specific concepts and conform to metamodels. Different ontologies - from application domain ontologies to software ontologies - capture and model the different properties and perspectives on a software content unit. Interdependencies between domain ontologies, the artifacts and the content are captured through a trace model. The annotation traces are formalised and a graph-based system is selected for the representation of the annotation traces.
Keywords: Consistent Content Management, Impact Categorisation, Trace Model, Ontology Evolution
Downloads 1684
253 A New Evolutionary Algorithm for Cluster Analysis
Authors: B.Bahmani Firouzi, T. Niknam, M. Nayeripour
Abstract:
Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique depend on the initialization of the cluster centers, and the final solution can converge to a local minimum. In order to overcome the shortcomings of the k-means algorithm, this paper proposes a hybrid evolutionary algorithm based on the combination of the PSO, SA and k-means algorithms, called PSO-SA-K, which can find better cluster partitions. The performance is evaluated on several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and k-means, for partitional clustering problems.
Keywords: Data clustering, Hybrid evolutionary optimization algorithm, K-means algorithm, Simulated Annealing (SA), Particle Swarm Optimization (PSO).
Downloads 2277
252 Reliability Analysis of P-I Diagram Formula for RC Column Subjected to Blast Load
Authors: Masoud Abedini, Azrul A. Mutalib, Shahrizan Baharom, Hong Hao
Abstract:
This study was conducted to investigate the reliability of the pressure-impulse (P-I) equations for reinforced concrete columns published in previous studies. The equations involve three levels of damage criteria, D = 0.2, D = 0.5 and D = 0.8. Damage is classed as minor for D of 0-0.2, moderate for 0.2-0.5 and high for 0.5-0.8, while 0.8-1 is considered failure of the structure. In this study, two types of reliability analyses were conducted. The first uses the pressure-impulse equation with different parameters: the concrete strength, the depth, width and height of the column, the longitudinal reinforcement ratio and the transverse reinforcement ratio. In this first reliability analysis, a new equation is derived to improve on the previous equations. The second reliability analysis involves three types of columns, used to derive P-I curve diagrams with the derived equation for comparison with the equations derived by other researchers and with the Federal Emergency Management Agency (FEMA) graph of minimum standoff versus weapon yield. The results showed that the derived equation agrees more closely with FEMA standards than those of previous researchers.
Keywords: Blast load, RC column, P-I curve, Analytical formulae, Standard FEMA.
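The damage bands quoted in the abstract can be written as a small lookup. The sketch below assumes that each band includes its lower bound and excludes its upper bound, which the abstract does not specify; the function name is hypothetical.

```python
def damage_level(d):
    """Map a damage index D in [0, 1] to the damage band named in the abstract.

    Assumption: boundary values (0.2, 0.5, 0.8) fall into the higher band.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("damage index must lie between 0 and 1")
    if d < 0.2:
        return "minor"
    if d < 0.5:
        return "moderate"
    if d < 0.8:
        return "high"
    return "failure"
```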
Downloads 2913
251 Annual Power Load Forecasting Using Support Vector Regression Machines: A Study on Guangdong Province of China 1985-2008
Authors: Zhiyong Li, Zhigang Chen, Chao Fu, Shipeng Zhang
Abstract:
Load forecasting has always been an essential part of efficient power system operation and planning. A novel approach based on support vector machines is proposed in this paper for annual power load forecasting. Different kernel functions are selected to construct a combinatorial algorithm. The performance of the new model is evaluated with a real-world dataset, and compared with two neural networks and some traditional forecasting techniques. The results show that the proposed method exhibits superior performance.
Keywords: combinatorial algorithm, data mining, load forecasting, support vector machines
Downloads 1646
250 Economy-Based Computing with WebCom
Authors: Adarsh Patil, David A. Power, John P. Morrison
Abstract:
Grid environments consist of the volatile integration of discrete heterogeneous resources. The notion of the Grid is to unite different users and organisations and pool their resources into one large computing platform where they can harness, inter-operate, collaborate and interact. If the Grid Community is to achieve this objective, then participants (Users and Organisations) need to be willing to donate or share their resources and permit other participants to use their resources. Resources do not have to be shared at all times, since this may result in users not having access to their own resources. The idea of reward-based computing was developed to address the sharing problem in a pragmatic manner. Participants are offered a reward to donate their resources to the Grid. A reward may include monetary recompense or a pro rata share of available resources when constrained. This latter point may imply a quality of service, which in turn may require some globally agreed reservation mechanism. This paper presents a platform for economy-based computing using the WebCom Grid middleware. Using this middleware, participants can configure their resources at times and priority levels to suit their local usage policy. The WebCom system accounts for processing done on individual participants' resources and rewards them accordingly.
Keywords: WebCom, Economy-based computing, WebComGrid Bank Reward, Condensed Graph, Distributor, Accounting, GridPoint.
Downloads 1207
249 Mining News Sites to Create Special Domain News Collections
Authors: David B. Bracewell, Fuji Ren, Shingo Kuroiwa
Abstract:
We present a method to create special domain collections from news sites. The method only requires a single sample article as a seed. No prior corpus statistics are needed and the method is applicable to multiple languages. We examine various similarity measures and the creation of document collections for English and Japanese. The main contributions are as follows. First, the algorithm can build special domain collections from as little as one sample document. Second, unlike other algorithms it does not require a second "general" corpus to compute statistics. Third, in our testing the algorithm outperformed others in creating collections made up of highly relevant articles.
Keywords: Information Retrieval, News, Special Domain Collections
Downloads 1487
248 Prediction of a Human Facial Image by ANN using Image Data and its Content on Web Pages
Authors: Chutimon Thitipornvanid, Siripun Sanguansintukul
Abstract:
Choosing the right metadata is critical, as good information (metadata) attached to an image will facilitate its visibility from a pile of other images. The image's value is enhanced not only by the quality of the attached metadata but also by the technique of the search. This study proposes a simple but efficient technique to predict a single human image from a website using the basic image data and the embedded metadata of the image's content appearing on web pages. The result is very encouraging, with a prediction accuracy of 95%. This technique may become a great aid to librarians, researchers and many others for automatically and efficiently identifying a set of human images out of a greater set of images.
Keywords: Metadata, Prediction, Multi-layer perceptron, Human facial image, Image mining.
Downloads 1214
247 Iterative Clustering Algorithm for Analyzing Temporal Patterns of Gene Expression
Authors: Seo Young Kim, Jae Won Lee, Jong Sung Bae
Abstract:
Microarray experiments are information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. For biologists, a key aim when analyzing microarray data is to group genes based on the temporal patterns of their expression levels. In this paper, we used an iterative clustering method to find temporal patterns of gene expression. We evaluated the performance of this method by applying it to real sporulation data and simulated data. The patterns obtained using the iterative clustering were found to be superior to those obtained using existing clustering algorithms.
Keywords: Clustering, microarray experiment, temporal pattern of gene expression data.
Downloads 1355
246 WebGD: A CORBA-based Document Classification and Retrieval System on the Web
Authors: Fuyang Peng, Bo Deng, Chao Qi, Mou Zhan
Abstract:
This paper presents the design and implementation of WebGD, a CORBA-based document classification and retrieval system on the Internet. WebGD makes use of such techniques as Web, CORBA, Java, NLP, fuzzy techniques, knowledge-based processing and database technology. A unified classification and retrieval model, classifying and retrieving with one reasoning engine, and flexible working mode configuration are some of its main features. The architecture of WebGD, the unified classification and retrieval model, the components of the WebGD server and the fuzzy inference engine are discussed in detail in this paper.
Keywords: Text Mining, document classification, knowledge processing, fuzzy logic, Web, CORBA
Downloads 1848
245 An Overview of Construction and Demolition Waste as Coarse Aggregate in Concrete
Authors: S. R. Shamili, J. Karthikeyan
Abstract:
Rapid population growth and widespread urbanization have remarkably accelerated the development of the construction industry. As a result of these activities, old structures are being demolished to make way for new buildings. Due to these large-scale demolitions, a huge amount of debris is generated all over the world, which ends up in landfill. The use of construction and demolition waste as landfill causes groundwater contamination, which is hazardous. Using construction and demolition waste as aggregate can reduce the use of natural aggregates and the problems of mining. The objective of this study is to provide a detailed overview of how construction and demolition waste material has been used as aggregate in structural concrete. In this study, the preparation, classification, and composition of construction and demolition wastes are also discussed.
Keywords: Aggregate, construction and demolition waste, landfill, large scale demolition.
Downloads 643
244 Eclectic Rule-Extraction from Support Vector Machines
Authors: Nahla Barakat, Joachim Diederich
Abstract:
Support vector machines (SVMs) have shown superior performance compared to other machine learning techniques, especially in classification problems. Yet one limitation of SVMs is the lack of an explanation capability, which is crucial in some applications, e.g. in the medical and security domains. In this paper, a novel approach for eclectic rule-extraction from support vector machines is presented. This approach utilizes the knowledge acquired by the SVM and represented in its support vectors as well as the parameters associated with them. The approach includes three stages: training, propositional rule-extraction and rule quality evaluation. Results from four different experiments have demonstrated the value of the approach for extracting comprehensible rules of high accuracy and fidelity.
Keywords: Data mining, hybrid rule-extraction algorithms, medical diagnosis, SVMs
Downloads 1708
243 Discovery of Sequential Patterns Based On Constraint Patterns
Authors: Shigeaki Sakurai, Youichi Kitahata, Ryohei Orihara
Abstract:
This paper proposes a method that discovers sequential patterns corresponding to a user's interests from sequential data. This method expresses the interests as constraint patterns. The constraint patterns can define relationships among attributes of the items composing the data. The method recursively decomposes the constraint patterns into constraint subpatterns. The method evaluates the constraint subpatterns in order to efficiently discover sequential patterns satisfying the constraint patterns. Also, this paper applies the method to sequential data composed of stock price indexes and verifies its effectiveness by comparing it with a method that does not use the constraint patterns.
Keywords: Sequential pattern mining, Constraint pattern, Attribute constraint, Stock price indexes
Downloads 1423
242 Investigation on Toxicity of Manufactured Nanoparticles to Bioluminescence Bacteria Vibrio fischeri
Authors: E. Binaeian, SH. Soroushnia
Abstract:
The acute toxicity of nano SiO2, ZnO, MCM-41 (mesoporous silica), Cu, Multi Wall Carbon Nano Tube (MWCNT), Single Wall Carbon Nano Tube (SWCNT) and coated Fe to the bacterium Vibrio fischeri was evaluated using a homemade luminometer. The nominal effective concentrations (EC) causing 20% and 50% inhibition of bioluminescence were calculated using two mathematical models at two exposure times, 5 and 30 minutes. The luminometer was designed with a photomultiplier (PMT) detector. A luminol chemiluminescence reaction was carried out for the calibration graph. In the linear calibration range, the correlation coefficient and coefficient of variation (CV) were 0.988 and 3.21% respectively, which demonstrates that the accuracy and reproducibility of the instrument are suitable. An important part of this research is optimizing the conditions for maximum bioluminescence. Under optimal conditions, liquid cultures of Vibrio fischeri were stirred at 120 rpm at a temperature of 15 °C to 18 °C and incubated for 24 to 72 hours, while solid medium was held at 18 °C for 48 hours. After 30 min of contact with Vibrio fischeri, the ZnO nanoparticle suspension showed the highest toxicity, while SiO2 nanoparticles showed the lowest. After 5 min of exposure, ZnO was the strongest toxicant and MCM-41 the weakest.
Keywords: Bioluminescence, effective concentration, nanomaterials, toxicity, Vibrio fischeri.
Downloads 2960
241 Issue Reorganization Using the Measure of Relevance
Authors: William Wong Xiu Shun, Yoonjin Hyun, Mingyu Kim, Seongi Choi, Namgyu Kim
Abstract:
The need to extract R&D keywords from issues and use them to retrieve R&D information is increasing rapidly. However, it is difficult to identify related issues or distinguish them. Although the similarity between issues cannot be identified, with an R&D lexicon, issues that always share the same R&D keywords can be determined. In detail, the R&D keywords that are associated with a particular issue imply the key technology elements that are needed to solve a particular issue. Furthermore, the relationship among issues that share the same R&D keywords can be shown in a more systematic way by clustering them according to keywords. Thus, sharing R&D results and reusing R&D technology can be facilitated. Indirectly, redundant investment in R&D can be reduced as the relevant R&D information can be shared among corresponding issues and the reusability of related R&D can be improved. Therefore, a methodology to cluster issues from the perspective of common R&D keywords is proposed to satisfy these demands.
Keywords: Clustering, Social Network Analysis, Text Mining, Topic Analysis.
Downloads 2038
240 Modeling Language for Constructing Solvers in Machine Learning: Reductionist Perspectives
Authors: Tsuyoshi Okita
Abstract:
For a given specific problem, finding an efficient algorithm has been the usual matter of study. However, an alternative approach orthogonal to this one has emerged, which is called a reduction. In general, for a given specific problem, the reduction approach studies how to convert the original problem into subproblems. This paper proposes a formal modeling language to support this reduction approach in order to build a solver quickly. We show three examples from the wide area of learning problems. The benefit is fast prototyping of algorithms for a given new problem. Note that our formal modeling language is not intended to provide an efficient notation for data mining applications, but to assist a designer who develops solvers in machine learning.
Keywords: Formal language, statistical inference problem, reduction.
Downloads 1328