Search results for: Constraint Based Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 11641

Search results for: Constraint Based Mining

11131 Optimizing Spatial Trend Detection By Artificial Immune Systems

Authors: M. Derakhshanfar, B. Minaei-Bidgoli

Abstract:

Spatial trends are one of the valuable patterns in geo databases. They play an important role in data analysis and knowledge discovery from spatial data. A spatial trend is a regular change of one or more non spatial attributes when spatially moving away from a start object. Spatial trend detection is a graph search problem therefore heuristic methods can be good solution. Artificial immune system (AIS) is a special method for searching and optimizing. AIS is a novel evolutionary paradigm inspired by the biological immune system. The models based on immune system principles, such as the clonal selection theory, the immune network model or the negative selection algorithm, have been finding increasing applications in fields of science and engineering. In this paper, we develop a novel immunological algorithm based on clonal selection algorithm (CSA) for spatial trend detection. We are created neighborhood graph and neighborhood path, then select spatial trends that their affinity is high for antibody. In an evolutionary process with artificial immune algorithm, affinity of low trends is increased with mutation until stop condition is satisfied.

Keywords: Spatial Data Mining, Spatial Trend Detection, Heuristic Methods, Artificial Immune System, Clonal Selection Algorithm (CSA)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2022
11130 Classifying Biomedical Text Abstracts based on Hierarchical 'Concept' Structure

Authors: Rozilawati Binti Dollah, Masaki Aono

Abstract:

Classifying biomedical literature is a difficult and challenging task, especially when a large number of biomedical articles should be organized into a hierarchical structure. In this paper, we present an approach for classifying a collection of biomedical text abstracts downloaded from Medline database with the help of ontology alignment. To accomplish our goal, we construct two types of hierarchies, the OHSUMED disease hierarchy and the Medline abstract disease hierarchies from the OHSUMED dataset and the Medline abstracts, respectively. Then, we enrich the OHSUMED disease hierarchy before adapting it to ontology alignment process for finding probable concepts or categories. Subsequently, we compute the cosine similarity between the vector in probable concepts (in the “enriched" OHSUMED disease hierarchy) and the vector in Medline abstract disease hierarchies. Finally, we assign category to the new Medline abstracts based on the similarity score. The results obtained from the experiments show the performance of our proposed approach for hierarchical classification is slightly better than the performance of the multi-class flat classification.

Keywords: Biomedical literature, hierarchical text classification, ontology alignment, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1993
11129 K-Means for Spherical Clusters with Large Variance in Sizes

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster, and recomputing the membership of small cluster points, the experimental results reveal that the proposed algorithm produces satisfactory results.

Keywords: K-Means, Data Clustering, Cluster Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3259
11128 The Investigation of Enzymatic Activity in the Soils under the Impact of Metallurgical Industrial Activity in Lori Marz, Armenia

Authors: T. H. Derdzyan, K. A. Ghazaryan, G. A. Gevorgyan

Abstract:

Beta-glucosidase, chitinase, leucine-aminopeptidase, acid phosphomonoesterase and acetate-esterase enzyme activities in the soils under the impact of metallurgical industrial activity in Lori marz (district) were investigated. The results of the study showed that the activities of the investigated enzymes in the soils decreased with increasing distance from the Shamlugh copper mine, the Chochkan tailings storage facility and the ore transportation road. Statistical analysis revealed that the activities of the enzymes were positively correlated (significant) to each other according to the observation sites which indicated that enzyme activities were affected by the same anthropogenic factor. The investigations showed that the soils were polluted with heavy metals (Cu, Pb, As, Co, Ni, Zn) due to copper mining activity in this territory. The results of Pearson correlation analysis revealed a significant negative correlation between heavy metal pollution degree (Nemerow integrated pollution index) and soil enzyme activity. All of this indicated that copper mining activity in this territory causing the heavy metal pollution of the soils resulted in the inhabitation of the activities of the enzymes which are considered as biological catalysts to decompose organic materials and facilitate the cycling of nutrients.

Keywords: Armenia, metallurgical industrial activity, heavy metal pollutionl, soil enzyme activity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2546
11127 Learning an Overcomplete Dictionary using a Cauchy Mixture Model for Sparse Decay

Authors: E. S. Gower, M. O. J. Hawksford

Abstract:

An algorithm for learning an overcomplete dictionary using a Cauchy mixture model for sparse decomposition of an underdetermined mixing system is introduced. The mixture density function is derived from a ratio sample of the observed mixture signals where 1) there are at least two but not necessarily more mixture signals observed, 2) the source signals are statistically independent and 3) the sources are sparse. The basis vectors of the dictionary are learned via the optimization of the location parameters of the Cauchy mixture components, which is shown to be more accurate and robust than the conventional data mining methods usually employed for this task. Using a well known sparse decomposition algorithm, we extract three speech signals from two mixtures based on the estimated dictionary. Further tests with additive Gaussian noise are used to demonstrate the proposed algorithm-s robustness to outliers.

Keywords: expectation-maximization, Pitman estimator, sparsedecomposition

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1925
11126 Development of Subjective Measures of Interestingness: From Unexpectedness to Shocking

Authors: Eiad Yafi, M. A. Alam, Ranjit Biswas

Abstract:

Knowledge Discovery of Databases (KDD) is the process of extracting previously unknown but useful and significant information from large massive volume of databases. Data Mining is a stage in the entire process of KDD which applies an algorithm to extract interesting patterns. Usually, such algorithms generate huge volume of patterns. These patterns have to be evaluated by using interestingness measures to reflect the user requirements. Interestingness is defined in different ways, (i) Objective measures (ii) Subjective measures. Objective measures such as support and confidence extract meaningful patterns based on the structure of the patterns, while subjective measures such as unexpectedness and novelty reflect the user perspective. In this report, we try to brief the more widely spread and successful subjective measures and propose a new subjective measure of interestingness, i.e. shocking.

Keywords: Shocking rules (SHR).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1514
11125 Motor Skill Adaptation Depends On the Level of Learning

Authors: Herbert Ugrinowitsch, Suziane Peixoto dos Santos-Naves, Michele Viviene Carbinatto, Rodolfo NovellinoBenda, Go Tani

Abstract:

An experiment was conducted to examine the effect of the level of performance stabilization on the human adaptability to perceptual-motor perturbation in a complex coincident timing task. Three levels of performance stabilization were established operationally: pre-stabilization, stabilization, and super-stabilization groups. Each group practiced the task until reached its level of stabilization in a constant sequence of movements and under a constant time constraint before exposure to perturbation. The results clearly showed that performance stabilization is a pre-condition for adaptation. Moreover, variability before reaching stabilization is harmful to adaptation and persistent variability after stabilization is beneficial. Moreover, the behavior of variability is specific to each measure.

Keywords: Adaptation, motor skill, perturbation, stabilization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1764
11124 Application Procedure for Optimized Placement of Buckling Restrained Braces in Reinforced Concrete Building Structures

Authors: S. A. Faizi, S. Yoshitomi

Abstract:

The optimal design procedure of buckling restrained braces (BRBs) in reinforced concrete (RC) building structures can provide the distribution of horizontal stiffness of BRBs at each story, which minimizes story drift response of the structure under the constraint of specified total stiffness of BRBs. In this paper, a simple rule is proposed to convert continuous horizontal stiffness of BRBs into sectional sizes of BRB which are available from standardized section list assuming realistic structural design stage.

Keywords: Buckling restrained brace, building engineering, optimal damper placement, structural engineering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1363
11123 A New DIDS Design Based on a Combination Feature Selection Approach

Authors: Adel Sabry Eesa, Adnan Mohsin Abdulazeez Brifcani, Zeynep Orman

Abstract:

Feature selection has been used in many fields such as classification, data mining and object recognition and proven to be effective for removing irrelevant and redundant features from the original dataset. In this paper, a new design of distributed intrusion detection system using a combination feature selection model based on bees and decision tree. Bees algorithm is used as the search strategy to find the optimal subset of features, whereas decision tree is used as a judgment for the selected features. Both the produced features and the generated rules are used by Decision Making Mobile Agent to decide whether there is an attack or not in the networks. Decision Making Mobile Agent will migrate through the networks, moving from node to another, if it found that there is an attack on one of the nodes, it then alerts the user through User Interface Agent or takes some action through Action Mobile Agent. The KDD Cup 99 dataset is used to test the effectiveness of the proposed system. The results show that even if only four features are used, the proposed system gives a better performance when it is compared with the obtained results using all 41 features.

Keywords: Distributed intrusion detection system, mobile agent, feature selection, Bees Algorithm, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1911
11122 An Appraisal of Coal Fly Ash Soil Amendment Technology (FASAT) of Central Institute of Mining and Fuel Research (CIMFR)

Authors: L.C. Ram, R.E. Masto, Smriti Singh, R.C. Tripathi, S.K. Jha, N.K. Srivastava, A.K. Sinha, V.A. Selvi, A. Sinha

Abstract:

Coal will continue to be the predominant source of global energy for coming several decades. The huge generation of fly ash (FA) from combustion of coal in thermal power plants (TPPs) is apprehended to pose the concerns of its disposal and utilization. FA application based on its typical characteristics as soil ameliorant for agriculture and forestry is the potential area, and hence the global attempt. The inferences drawn suffer from the variations of ash characteristics, soil types, and agro-climatic conditions; thereby correlating the effects of ash between various plant species and soil types is difficult. Indian FAs have low bulk density, high water holding capacity and porosity, rich silt-sized particles, alkaline nature, negligible solubility, and reasonable plant nutrients. Findings of the demonstrations trials for more than two decades from lab/pot to field scale long-term experiments are developed as FA soil amendment technology (FASAT) by Central Institute of Mining and Fuel Research (CIMFR), Dhanbad. Performance of different crops and plant species in cultivable and problematic soils, are encouraging, eco-friendly, and being adopted by the farmers. FA application includes ash alone and in combination with inorganic/organic amendments; combination treatments including bio-solids perform better than FA alone. Optimum dose being up to 100 t/ha for cultivable land and up to/ or above 200 t/ha of FA for waste/degraded land/mine refuse, depending on the characteristics of ash and soil. The elemental toxicity in Indian FA is usually not of much concern owing to alkaline ashes, oxide forms of elements, and elemental concentration within the threshold limits for soil application. Combating toxicity, if any, is possible through combination treatments with organic materials and phytoremediation. Government initiatives through extension programme involving farmers and ash generating organizations need to be accelerated

Keywords: Fly ash, soil quality, CIMFR, FASAT, agriculture, forestry, toxicity, remediation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3040
11121 Latent Semantic Inference for Agriculture FAQ Retrieval

Authors: Dawei Wang, Rujing Wang, Ying Li, Baozi Wei

Abstract:

FAQ system can make user find answer to the problem that puzzles them. But now the research on Chinese FAQ system is still on the theoretical stage. This paper presents an approach to semantic inference for FAQ mining. To enhance the efficiency, a small pool of the candidate question-answering pairs retrieved from the system for the follow-up work according to the concept of the agriculture domain extracted from user input .Input queries or questions are converted into four parts, the question word segment (QWS), the verb segment (VS), the concept of agricultural areas segment (CS), the auxiliary segment (AS). A semantic matching method is presented to estimate the similarity between the semantic segments of the query and the questions in the pool of the candidate. A thesaurus constructed from the HowNet, a Chinese knowledge base, is adopted for word similarity measure in the matcher. The questions are classified into eleven intension categories using predefined question stemming keywords. For FAQ mining, given a query, the question part and answer part in an FAQ question-answer pair is matched with the input query, respectively. Finally, the probabilities estimated from these two parts are integrated and used to choose the most likely answer for the input query. These approaches are experimented on an agriculture FAQ system. Experimental results indicate that the proposed approach outperformed the FAQ-Finder system in agriculture FAQ retrieval.

Keywords: FAQ, Semantic Inference, Ontology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1361
11120 A Growing Natural Gas Approach for Evaluating Quality of Software Modules

Authors: Parvinder S. Sandhu, Sandeep Khimta, Kiranpreet Kaur

Abstract:

The prediction of Software quality during development life cycle of software project helps the development organization to make efficient use of available resource to produce the product of highest quality. “Whether a module is faulty or not" approach can be used to predict quality of a software module. There are numbers of software quality prediction models described in the literature based upon genetic algorithms, artificial neural network and other data mining algorithms. One of the promising aspects for quality prediction is based on clustering techniques. Most quality prediction models that are based on clustering techniques make use of K-means, Mixture-of-Guassians, Self-Organizing Map, Neural Gas and fuzzy K-means algorithm for prediction. In all these techniques a predefined structure is required that is number of neurons or clusters should be known before we start clustering process. But in case of Growing Neural Gas there is no need of predetermining the quantity of neurons and the topology of the structure to be used and it starts with a minimal neurons structure that is incremented during training until it reaches a maximum number user defined limits for clusters. Hence, in this work we have used Growing Neural Gas as underlying cluster algorithm that produces the initial set of labeled cluster from training data set and thereafter this set of clusters is used to predict the quality of test data set of software modules. The best testing results shows 80% accuracy in evaluating the quality of software modules. Hence, the proposed technique can be used by programmers in evaluating the quality of modules during software development.

Keywords: Growing Neural Gas, data clustering, fault prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1840
11119 Network Anomaly Detection using Soft Computing

Authors: Surat Srinoy, Werasak Kurutach, Witcha Chimphlee, Siriporn Chimphlee

Abstract:

One main drawback of intrusion detection system is the inability of detecting new attacks which do not have known signatures. In this paper we discuss an intrusion detection method that proposes independent component analysis (ICA) based feature selection heuristics and using rough fuzzy for clustering data. ICA is to separate these independent components (ICs) from the monitored variables. Rough set has to decrease the amount of data and get rid of redundancy and Fuzzy methods allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us to recognize not only known attacks but also to detect activity that may be the result of a new, unknown attack. The experimental results on Knowledge Discovery and Data Mining- (KDDCup 1999) dataset.

Keywords: Network security, intrusion detection, rough set, ICA, anomaly detection, independent component analysis, rough fuzzy .

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1926
11118 Motion Detection Techniques Using Optical Flow

Authors: A. A. Shafie, Fadhlan Hafiz, M. H. Ali

Abstract:

Motion detection is very important in image processing. One way of detecting motion is using optical flow. Optical flow cannot be computed locally, since only one independent measurement is available from the image sequence at a point, while the flow velocity has two components. A second constraint is needed. The method used for finding the optical flow in this project is assuming that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image. This technique is later used in developing software for motion detection which has the capability to carry out four types of motion detection. The motion detection software presented in this project also can highlight motion region, count motion level as well as counting object numbers. Many objects such as vehicles and human from video streams can be recognized by applying optical flow technique.

Keywords: Background modeling, Motion detection, Optical flow, Velocity smoothness constant, motion trajectories.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5363
11117 An Iterative Algorithm for Inverse Kinematics of 5-DOF Manipulator with Offset Wrist

Authors: Juyi Park, Jung-Min Kim, Hee-Hwan Park, Jin-Wook Kim, Gye-Hyung Kang, Soo-Ho Kim

Abstract:

This paper presents an iterative algorithm to find a inverse kinematic solution of 5-DOF robot. The algorithm is to minimize the iteration number. Since the 5-DOF robot cannot give full orientation of tool. Only z-direction of tool is satisfied while rotation of tool is determined by kinematic constraint. This work therefore described how to specify the tool direction and let the tool rotation free. The simulation results show that this algorithm effectively worked. Using the proposed iteration algorithm, error due to inverse kinematics converged to zero rapidly in 5 iterations. This algorithm was applied in real welding robot and verified through various practical works.

Keywords: 5-DOF manipulator, Inverse kinematics, Iterative algorithm, Wrist offset.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4106
11116 Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores

Authors: Ankit Sinha, Soham Banerjee, Pratik Chattopadhyay

Abstract:

Automated product recognition in retail stores is an important real-world application in the domain of Computer Vision and Pattern Recognition. In this paper, we consider the problem of automatically identifying the classes of the products placed on racks in retail stores from an image of the rack and information about the query/product images. We improve upon the existing approaches in terms of effectiveness and memory requirement by developing a two-stage object detection and recognition pipeline comprising of a Faster-RCNN-based object localizer that detects the object regions in the rack image and a ResNet-18-based image encoder that classifies  the detected regions into the appropriate classes. Each of the models is fine-tuned using appropriate data sets for better prediction and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model. This encoder is trained using a triplet loss function following the strategy of online-hard-negative-mining for improved prediction. The proposed models are lightweight and can be connected in an end-to-end manner during deployment to automatically identify each product object placed in a rack image. Extensive experiments using Grozi-32k and GP-180 data sets verify the effectiveness of the proposed model.

Keywords: Retail stores, Faster-RCNN, object localization, ResNet-18, triplet loss, data augmentation, product recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 518
11115 Context-aware Recommender Systems using Data Mining Techniques

Authors: Kyoung-jae Kim, Hyunchul Ahn, Sangwon Jeong

Abstract:

This study proposes a novel recommender system to provide the advertisements of context-aware services. Our proposed model is designed to apply a modified collaborative filtering (CF) algorithm with regard to the several dimensions for the personalization of mobile devices – location, time and the user-s needs type. In particular, we employ a classification rule to understand user-s needs type using a decision tree algorithm. In addition, we collect primary data from the mobile phone users and apply them to the proposed model to validate its effectiveness. Experimental results show that the proposed system makes more accurate and satisfactory advertisements than comparative systems.

Keywords: Location-based advertisement, Recommender system, Collaborative filtering, User needs type, Mobile user.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2155
11114 Genetic Programming Approach to Hierarchical Production Rule Discovery

Authors: Basheer M. Al-Maqaleh, Kamal K. Bharadwaj

Abstract:

Automated discovery of hierarchical structures in large data sets has been an active research area in the recent past. This paper focuses on the issue of mining generalized rules with crisp hierarchical structure using Genetic Programming (GP) approach to knowledge discovery. The post-processing scheme presented in this work uses flat rules as initial individuals of GP and discovers hierarchical structure. Suitable genetic operators are proposed for the suggested encoding. Based on the Subsumption Matrix(SM), an appropriate fitness function is suggested. Finally, Hierarchical Production Rules (HPRs) are generated from the discovered hierarchy. Experimental results are presented to demonstrate the performance of the proposed algorithm.

Keywords: Genetic Programming, Hierarchy, Knowledge Discovery in Database, Subsumption Matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1418
11113 Does Practice Reflect Theory? An Exploratory Study of a Successful Knowledge Management System

Authors: Janet L. Kourik, Peter E. Maher

Abstract:

To investigate the correspondence of theory and practice, a successfully implemented Knowledge Management System (KMS) is explored through the lens of Alavi and Leidner-s proposed KMS framework for the analysis of an information system in knowledge management (Framework-AISKM). The applied KMS system was designed to manage curricular knowledge in a distributed university environment. The motivation for the KMS is discussed along with the types of knowledge necessary in an academic setting. Elements of the KMS involved in all phases of capturing and disseminating knowledge are described. As the KMS matures the resulting data stores form the precursor to and the potential for knowledge mining. The findings from this exploratory study indicate substantial correspondence between the successful KMS and the theory-based framework providing provisional confirmation for the framework while suggesting factors that contributed to the system-s success. Avenues for future work are described.

Keywords: Applied KMS, education, knowledge management (KM), KM framework, knowledge management system (KMS).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1018
11112 Survey on Energy Efficient Routing Protocols in Mobile Ad Hoc Networks

Authors: Swapnil Singh, Sanjoy Das

Abstract:

Mobile Ad-Hoc Network (MANET) is a network without infrastructure dynamically formed by autonomous system of mobile nodes that are connected via wireless links. Mobile nodes communicate with each other on the fly. In this network each node also acts as a router. The battery power and the bandwidth are very scarce resources in this network. The network lifetime and connectivity of nodes depend on battery power. Therefore, energy is a valuable constraint which should be efficiently used. In this paper we survey various energy efficient routing protocols. The energy efficient routing protocols are classified on the basis of approaches they use to minimize the energy consumption. The purpose of this paper is to facilitate the research work and combine the existing solution and to develop a more energy efficient routing mechanism.

Keywords: Delaunay Triangulation, deployment, energy efficiency, MANET.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3018
11111 The Inverse Problem of Nonsymmetric Matrices with a Submatrix Constraint and its Approximation

Authors: Yongxin Yuan, Hao Liu

Abstract:

In this paper, we first give the representation of the general solution of the following least-squares problem (LSP): Given matrices X ∈ Rn×p, B ∈ Rp×p and A0 ∈ Rr×r, find a matrix A ∈ Rn×n such that XT AX − B = min, s. t. A([1, r]) = A0, where A([1, r]) is the r×r leading principal submatrix of the matrix A. We then consider a best approximation problem: given an n × n matrix A˜ with A˜([1, r]) = A0, find Aˆ ∈ SE such that A˜ − Aˆ = minA∈SE A˜ − A, where SE is the solution set of LSP. We show that the best approximation solution Aˆ is unique and derive an explicit formula for it. Keyw

Keywords: Inverse problem, Least-squares solution, model updating, Singular value decomposition (SVD), Optimal approximation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1622
11110 Knowledge Acquisition for the Construction of an Evolving Ontology: Application to Augmented Surgery

Authors: Nora Taleb, Sellami Mokhtar, Michel Simonet

Abstract:

This work concerns the evolution and the maintenance of an ontological resource in relation with the evolution of the corpus of texts from which it had been built. The knowledge forming a text corpus, especially in dynamic domains, is in continuous evolution. When a change in the corpus occurs, the domain ontology must evolve accordingly. Most methods manage ontology evolution independently from the corpus from which it is built; in addition, they treat evolution just as a process of knowledge addition, not considering other knowledge changes. We propose a methodology for managing an evolving ontology from a text corpus that evolves over time, while preserving the consistency and the persistence of this ontology. Our methodology is based on the changes made on the corpus to reflect the evolution of the considered domain - augmented surgery in our case. In this context, the results of text mining techniques, as well as the ARCHONTE method slightly modified, are used to support the evolution process.

Keywords: Corpus, Evolution, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1416
11109 Using Pattern Search Methods for Minimizing Clustering Problems

Authors: Parvaneh Shabanzadeh, Malik Hj Abu Hassan, Leong Wah June, Maryam Mohagheghtabar

Abstract:

Clustering is one of an interesting data mining topics that can be applied in many fields. Recently, the problem of cluster analysis is formulated as a problem of nonsmooth, nonconvex optimization, and an algorithm for solving the cluster analysis problem based on nonsmooth optimization techniques is developed. This optimization problem has a number of characteristics that make it challenging: it has many local minimum, the optimization variables can be either continuous or categorical, and there are no exact analytical derivatives. In this study we show how to apply a particular class of optimization methods known as pattern search methods to address these challenges. These methods do not explicitly use derivatives, an important feature that has not been addressed in previous studies. Results of numerical experiments are presented which demonstrate the effectiveness of the proposed method.

Keywords: Clustering functions, Non-smooth Optimization, Nonconvex Optimization, Pattern Search Method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1622
11108 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient

Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart

Abstract:

Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The big demand for image annotation and archiving in the web attracts the researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: The description of the elementary characteristics from the image and the correlation between labels are not taken into account. In this paper, we present an algorithm (MIML-HOGLPP), which simultaneously handles these limitations. The algorithm uses the histogram of gradients as feature descriptor. It applies the Label Priority Power-set as multi-label transformation to solve the problem of label correlation. The experiment shows that the results of MIML-HOGLPP are better in terms of some of the evaluation metrics comparing with the two existing techniques.

Keywords: Data mining, information retrieval system, multi-label, problem transformation, histogram of gradients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1289
11107 Advanced Information Extraction with n-gram based LSI

Authors: Ahmet Güven, Ö. Özgür Bozkurt, Oya Kalıpsız

Abstract:

Number of documents being created increases at an increasing pace while most of them being in already known topics and little of them introducing new concepts. This fact has started a new era in information retrieval discipline where the requirements have their own specialties. That is digging into topics and concepts and finding out subtopics or relations between topics. Up to now IR researches were interested in retrieving documents about a general topic or clustering documents under generic subjects. However these conventional approaches can-t go deep into content of documents which makes it difficult for people to reach to right documents they were searching. So we need new ways of mining document sets where the critic point is to know much about the contents of the documents. As a solution we are proposing to enhance LSI, one of the proven IR techniques by supporting its vector space with n-gram forms of words. Positive results we have obtained are shown in two different application area of IR domain; querying a document database, clustering documents in the document database.

Keywords: Document clustering, Information Extraction, Information Retrieval, LSI, n-gram.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1777
11106 Virulent-GO: Prediction of Virulent Proteins in Bacterial Pathogens Utilizing Gene Ontology Terms

Authors: Chia-Ta Tsai, Wen-Lin Huang, Shinn-Jang Ho, Li-Sun Shu, Shinn-Ying Ho

Abstract:

Prediction of bacterial virulent protein sequences can give assistance to identification and characterization of novel virulence-associated factors and discover drug/vaccine targets against proteins indispensable to pathogenicity. Gene Ontology (GO) annotation which describes functions of genes and gene products as a controlled vocabulary of terms has been shown effectively for a variety of tasks such as gene expression study, GO annotation prediction, protein subcellular localization, etc. In this study, we propose a sequence-based method Virulent-GO by mining informative GO terms as features for predicting bacterial virulent proteins. Each protein in the datasets used by the existing method VirulentPred is annotated by using BLAST to obtain its homologies with known accession numbers for retrieving GO terms. After investigating various popular classifiers using the same five-fold cross-validation scheme, Virulent-GO using the single kind of GO term features with an accuracy of 82.5% is slightly better than VirulentPred with 81.8% using five kinds of sequence-based features. For the evaluation of independent test, Virulent-GO also yields better results (82.0%) than VirulentPred (80.7%). When evaluating single kind of feature with SVM, the GO term feature performs much well, compared with each of the five kinds of features.

Keywords: Bacterial virulence factors, GO terms, prediction, protein sequence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2168
11105 An Intelligent Approach of Rough Set in Knowledge Discovery Databases

Authors: Hrudaya Ku. Tripathy, B. K. Tripathy, Pradip K. Das

Abstract:

Knowledge Discovery in Databases (KDD) has evolved into an important and active area of research because of theoretical challenges and practical applications associated with the problem of discovering (or extracting) interesting and previously unknown knowledge from very large real-world databases. Rough Set Theory (RST) is a mathematical formalism for representing uncertainty that can be considered an extension of the classical set theory. It has been used in many different research areas, including those related to inductive machine learning and reduction of knowledge in knowledge-based systems. One important concept related to RST is that of a rough relation. In this paper we presented the current status of research on applying rough set theory to KDD, which will be helpful for handle the characteristics of real-world databases. The main aim is to show how rough set and rough set analysis can be effectively used to extract knowledge from large databases.

Keywords: Data mining, Data tables, Knowledge discovery in database (KDD), Rough sets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2307
11104 Growing Self Organising Map Based Exploratory Analysis of Text Data

Authors: Sumith Matharage, Damminda Alahakoon

Abstract:

Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.

Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1951
11103 Estimating Frequency, Amplitude and Phase of Two Sinusoids with Very Close Frequencies

Authors: Jayme G. A. Barbedo, Amauri Lopes

Abstract:

This paper presents an algorithm to estimate the parameters of two closely spaced sinusoids, providing a frequency resolution that is more than 800 times greater than that obtained by using the Discrete Fourier Transform (DFT). The strategy uses a highly optimized grid search approach to accurately estimate frequency, amplitude and phase of both sinusoids, keeping at the same time the computational effort at reasonable levels. The proposed method has three main characteristics: 1) a high frequency resolution; 2) frequency, amplitude and phase are all estimated at once using one single package; 3) it does not rely on any statistical assumption or constraint. Potential applications to this strategy include the difficult task of resolving coincident partials of instruments in musical signals.

Keywords: Closely spaced sinusoids, high-resolution parameter estimation, optimized grid search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2841
11102 Oncogene Identification using Filter based Approaches between Various Cancer Types in Lung

Authors: Michael Netzer, Michael Seger, Mahesh Visvanathan, Bernhard Pfeifer, Gerald H. Lushington, Christian Baumgartner

Abstract:

Lung cancer accounts for the most cancer related deaths for men as well as for women. The identification of cancer associated genes and the related pathways are essential to provide an important possibility in the prevention of many types of cancer. In this work two filter approaches, namely the information gain and the biomarker identifier (BMI) are used for the identification of different types of small-cell and non-small-cell lung cancer. A new method to determine the BMI thresholds is proposed to prioritize genes (i.e., primary, secondary and tertiary) using a k-means clustering approach. Sets of key genes were identified that can be found in several pathways. It turned out that the modified BMI is well suited for microarray data and therefore BMI is proposed as a powerful tool for the search for new and so far undiscovered genes related to cancer.

Keywords: lung cancer, micro arrays, data mining, feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1731