Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24

Search results for: Gene expression data

24 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.

Keywords: Microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 520
23 A Hybrid Gene Selection Technique Using Improved Mutual Information and Fisher Score for Cancer Classification Using Microarrays

Authors: M. Anidha, K. Premalatha

Abstract:

Feature Selection is significant in order to perform constructive classification in the area of cancer diagnosis. However, a large number of features compared to the number of samples makes the task of classification computationally very hard and prone to errors in microarray gene expression datasets. In this paper, we present an innovative method for selecting highly informative gene subsets of gene expression data that effectively classifies the cancer data into tumorous and non-tumorous. The hybrid gene selection technique comprises of combined Mutual Information and Fisher score to select informative genes. The gene selection is validated by classification using Support Vector Machine (SVM) which is a supervised learning algorithm capable of solving complex classification problems. The results obtained from improved Mutual Information and F-Score with SVM as a classifier has produced efficient results.

Keywords: Gene selection, mutual information, Fisher score, classification, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 750
22 A Cuckoo Search with Differential Evolution for Clustering Microarray Gene Expression Data

Authors: M. Pandi, K. Premalatha

Abstract:

A DNA microarray technology is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. It is handled by clustering which reveals the natural structures and identifying the interesting patterns in the underlying data. In this paper, gene based clustering in gene expression data is proposed using Cuckoo Search with Differential Evolution (CS-DE). The experiment results are analyzed with gene expression benchmark datasets. The results show that CS-DE outperforms CS in benchmark datasets. To find the validation of the clustering results, this work is tested with one internal and one external cluster validation indexes.

Keywords: DNA, Microarray, genomics, Cuckoo Search, Differential Evolution, Gene expression data, Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 949
21 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani

Abstract:

Development of a method to estimate gene functions is an important task in bioinformatics. One of the approaches for the annotation is the identification of the metabolic pathway that genes are involved in. Since gene expression data reflect various intracellular phenomena, those data are considered to be related with genes’ functions. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.

Keywords: Metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1899
20 Statistical Measures and Optimization Algorithms for Gene Selection in Lung and Ovarian Tumor

Authors: C. Gunavathi, K. Premalatha

Abstract:

Microarray technology is universally used in the study of disease diagnosis using gene expression levels. The main shortcoming of gene expression data is that it includes thousands of genes and a small number of samples. Abundant methods and techniques have been proposed for tumor classification using microarray gene expression data. Feature or gene selection methods can be used to mine the genes that directly involve in the classification and to eliminate irrelevant genes. In this paper statistical measures like T-Statistics, Signal-to-Noise Ratio (SNR) and F-Statistics are used to rank the genes. The ranked genes are used for further classification. Particle Swarm Optimization (PSO) algorithm and Shuffled Frog Leaping (SFL) algorithm are used to find the significant genes from the top-m ranked genes. The Naïve Bayes Classifier (NBC) is used to classify the samples based on the significant genes. The proposed work is applied on Lung and Ovarian datasets. The experimental results show that the proposed method achieves 100% accuracy in all the three datasets and the results are compared with previous works.

Keywords: Microarray, T-Statistics, Signal-to-Noise Ratio, FStatistics, Particle Swarm Optimization, Shuffled Frog Leaping, Naïve Bayes Classifier.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1593
19 Performance Analysis of Genetic Algorithm with kNN and SVM for Feature Selection in Tumor Classification

Authors: C. Gunavathi, K. Premalatha

Abstract:

Tumor classification is a key area of research in the field of bioinformatics. Microarray technology is commonly used in the study of disease diagnosis using gene expression levels. The main drawback of gene expression data is that it contains thousands of genes and a very few samples. Feature selection methods are used to select the informative genes from the microarray. These methods considerably improve the classification accuracy. In the proposed method, Genetic Algorithm (GA) is used for effective feature selection. Informative genes are identified based on the T-Statistics, Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate solutions of GA are obtained from top-m informative genes. The classification accuracy of k-Nearest Neighbor (kNN) method is used as the fitness function for GA. In this work, kNN and Support Vector Machine (SVM) are used as the classifiers. The experimental results show that the proposed work is suitable for effective feature selection. With the help of the selected genes, GA-kNN method achieves 100% accuracy in 4 datasets and GA-SVM method achieves in 5 out of 10 datasets. The GA with kNN and SVM methods are demonstrated to be an accurate method for microarray based tumor classification.

Keywords: F-Test, Gene Expression, Genetic Algorithm, k- Nearest-Neighbor, Microarray, Signal-to-Noise Ratio, Support Vector Machine, T-statistics, Tumor Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3876
18 An Advanced Nelder Mead Simplex Method for Clustering of Gene Expression Data

Authors: M. Pandi, K. Premalatha

Abstract:

The DNA microarray technology concurrently monitors the expression levels of thousands of genes during significant biological processes and across the related samples. The better understanding of functional genomics is obtained by extracting the patterns hidden in gene expression data. It is handled by clustering which reveals natural structures and identify interesting patterns in the underlying data. In the proposed work clustering gene expression data is done through an Advanced Nelder Mead (ANM) algorithm. Nelder Mead (NM) method is a method designed for optimization process. In Nelder Mead method, the vertices of a triangle are considered as the solutions. Many operations are performed on this triangle to obtain a better result. In the proposed work, the operations like reflection and expansion is eliminated and a new operation called spread-out is introduced. The spread-out operation will increase the global search area and thus provides a better result on optimization. The spread-out operation will give three points and the best among these three points will be used to replace the worst point. The experiment results are analyzed with optimization benchmark test functions and gene expression benchmark datasets. The results show that ANM outperforms NM in both benchmarks.

Keywords: Spread out, simplex, multi-minima, fitness function, optimization, search area, monocyte, solution, genomes.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2082
17 Comparative Study on Swarm Intelligence Techniques for Biclustering of Microarray Gene Expression Data

Authors: R. Balamurugan, A. M. Natarajan, K. Premalatha

Abstract:

Microarray gene expression data play a vital in biological processes, gene regulation and disease mechanism. Biclustering in gene expression data is a subset of the genes indicating consistent patterns under the subset of the conditions. Finding a biclustering is an optimization problem. In recent years, swarm intelligence techniques are popular due to the fact that many real-world problems are increasingly large, complex and dynamic. By reasons of the size and complexity of the problems, it is necessary to find an optimization technique whose efficiency is measured by finding the near optimal solution within a reasonable amount of time. In this paper, the algorithmic concepts of the Particle Swarm Optimization (PSO), Shuffled Frog Leaping (SFL) and Cuckoo Search (CS) algorithms have been analyzed for the four benchmark gene expression dataset. The experiment results show that CS outperforms PSO and SFL for 3 datasets and SFL give better performance in one dataset. Also this work determines the biological relevance of the biclusters with Gene Ontology in terms of function, process and component.

Keywords: Particle swarm optimization, Shuffled frog leaping, Cuckoo search, biclustering, gene expression data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2195
16 Simultaneous Clustering and Feature Selection Method for Gene Expression Data

Authors: T. Chandrasekhar, K. Thangavel, E. N. Sathishkumar

Abstract:

Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this work K-Means algorithms has been applied for clustering of Gene Expression Data. Further, rough set based Quick reduct algorithm has been applied for each cluster in order to select the most similar genes having high correlation. Then the ACV measure is used to evaluate the refined clusters and classification is used to evaluate the proposed method. They could identify compact clusters with feature selection method used to genes are selected.

Keywords: Clustering, Feature selection, Gene expression data, Quick reduct.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1542
15 A New Hybrid K-Mean-Quick Reduct Algorithm for Gene Selection

Authors: E. N. Sathishkumar, K. Thangavel, T. Chandrasekhar

Abstract:

Feature selection is a process to select features which are more informative. It is one of the important steps in knowledge discovery. The problem is that all genes are not important in gene expression data. Some of the genes may be redundant, and others may be irrelevant and noisy. Here a novel approach is proposed Hybrid K-Mean-Quick Reduct (KMQR) algorithm for gene selection from gene expression data. In this study, the entire dataset is divided into clusters by applying K-Means algorithm. Each cluster contains similar genes. The high class discriminated genes has been selected based on their degree of dependence by applying Quick Reduct algorithm to all the clusters. Average Correlation Value (ACV) is calculated for the high class discriminated genes. The clusters which have the ACV value as 1 is determined as significant clusters, whose classification accuracy will be equal or high when comparing to the accuracy of the entire dataset. The proposed algorithm is evaluated using WEKA classifiers and compared. The proposed work shows that the high classification accuracy.

Keywords: Clustering, Gene Selection, K-Mean-Quick Reduct, Rough Sets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1911
14 Categorization and Estimation of Relative Connectivity of Genes from Meta-OFTEN Network

Authors: U. Kairov, T. Karpenyuk, E. Ramanculov, A. Zinovyev

Abstract:

The most common result of analysis of highthroughput data in molecular biology represents a global list of genes, ranked accordingly to a certain score. The score can be a measure of differential expression. Recent work proposed a new method for selecting a number of genes in a ranked gene list from microarray gene expression data such that this set forms the Optimally Functionally Enriched Network (OFTEN), formed by known physical interactions between genes or their products. Here we present calculation results of relative connectivity of genes from META-OFTEN network and tentative biological interpretation of the most reproducible signal. The relative connectivity and inbetweenness values of genes from META-OFTEN network were estimated.

Keywords: Microarray, META-OFTEN, gene network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1271
13 Clustering Approach to Unveiling Relationships between Gene Regulatory Networks

Authors: Hiba Hasan, Khalid Raza

Abstract:

Reverse engineering of genetic regulatory network involves the modeling of the given gene expression data into a form of the network. Computationally it is possible to have the relationships between genes, so called gene regulatory networks (GRNs), that can help to find the genomics and proteomics based diagnostic approach for any disease. In this paper, clustering based method has been used to reconstruct genetic regulatory network from time series gene expression data. Supercoiled data set from Escherichia coli has been taken to demonstrate the proposed method.

Keywords: Gene expression, gene regulatory networks (GRNs), clustering, data preprocessing, network visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1708
12 Gene Expression Data Classification Using Discriminatively Regularized Sparse Subspace Learning

Authors: Chunming Xu

Abstract:

Sparse representation which can represent high dimensional data effectively has been successfully used in computer vision and pattern recognition problems. However, it doesn-t consider the label information of data samples. To overcome this limitation, we develop a novel dimensionality reduction algorithm namely dscriminatively regularized sparse subspace learning(DR-SSL) in this paper. The proposed DR-SSL algorithm can not only make use of the sparse representation to model the data, but also can effective employ the label information to guide the procedure of dimensionality reduction. In addition,the presented algorithm can effectively deal with the out-of-sample problem.The experiments on gene-expression data sets show that the proposed algorithm is an effective tool for dimensionality reduction and gene-expression data classification.

Keywords: sparse representation, dimensionality reduction, labelinformation, sparse subspace learning, gene-expression data classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1105
11 A Comparison of SVM-based Criteria in Evolutionary Method for Gene Selection and Classification of Microarray Data

Authors: Rameswar Debnath, Haruhisa Takahashi

Abstract:

An evolutionary method whose selection and recombination operations are based on generalization error-bounds of support vector machine (SVM) can select a subset of potentially informative genes for SVM classifier very efficiently [7]. In this paper, we will use the derivative of error-bound (first-order criteria) to select and recombine gene features in the evolutionary process, and compare the performance of the derivative of error-bound with the error-bound itself (zero-order) in the evolutionary process. We also investigate several error-bounds and their derivatives to compare the performance, and find the best criteria for gene selection and classification. We use 7 cancer-related human gene expression datasets to evaluate the performance of the zero-order and first-order criteria of error-bounds. Though both criteria have the same strategy in theoretically, experimental results demonstrate the best criterion for microarray gene expression data.

Keywords: support vector machine, generalization error-bound, feature selection, evolutionary algorithm, microarray data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1206
10 An SVM based Classification Method for Cancer Data using Minimum Microarray Gene Expressions

Authors: R. Mallika, V. Saravanan

Abstract:

This paper gives a novel method for improving classification performance for cancer classification with very few microarray Gene expression data. The method employs classification with individual gene ranking and gene subset ranking. For selection and classification, the proposed method uses the same classifier. The method is applied to three publicly available cancer gene expression datasets from Lymphoma, Liver and Leukaemia datasets. Three different classifiers namely Support vector machines-one against all (SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant analysis (LDA) were tested and the results indicate the improvement in performance of SVM-OAA classifier with satisfactory results on all the three datasets when compared with the other two classifiers.

Keywords: Support vector machines-one against all, cancerclassification, Linear Discriminant analysis, K nearest neighbour, microarray gene expression, gene pair ranking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2176
9 An Integrative Bayesian Approach to Supporting the Prediction of Protein-Protein Interactions: A Case Study in Human Heart Failure

Authors: Fiona Browne, Huiru Zheng, Haiying Wang, Francisco Azuaje

Abstract:

Recent years have seen a growing trend towards the integration of multiple information sources to support large-scale prediction of protein-protein interaction (PPI) networks in model organisms. Despite advances in computational approaches, the combination of multiple “omic" datasets representing the same type of data, e.g. different gene expression datasets, has not been rigorously studied. Furthermore, there is a need to further investigate the inference capability of powerful approaches, such as fullyconnected Bayesian networks, in the context of the prediction of PPI networks. This paper addresses these limitations by proposing a Bayesian approach to integrate multiple datasets, some of which encode the same type of “omic" data to support the identification of PPI networks. The case study reported involved the combination of three gene expression datasets relevant to human heart failure (HF). In comparison with two traditional methods, Naive Bayesian and maximum likelihood ratio approaches, the proposed technique can accurately identify known PPI and can be applied to infer potentially novel interactions.

Keywords: Bayesian network, Classification, Data integration, Protein interaction networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1276
8 Modeling Stress-Induced Regulatory Cascades with Artificial Neural Networks

Authors: Maria E. Manioudaki, Panayiota Poirazi

Abstract:

Yeast cells live in a constantly changing environment that requires the continuous adaptation of their genomic program in order to sustain their homeostasis, survive and proliferate. Due to the advancement of high throughput technologies, there is currently a large amount of data such as gene expression, gene deletion and protein-protein interactions for S. Cerevisiae under various environmental conditions. Mining these datasets requires efficient computational methods capable of integrating different types of data, identifying inter-relations between different components and inferring functional groups or 'modules' that shape intracellular processes. This study uses computational methods to delineate some of the mechanisms used by yeast cells to respond to environmental changes. The GRAM algorithm is first used to integrate gene expression data and ChIP-chip data in order to find modules of coexpressed and co-regulated genes as well as the transcription factors (TFs) that regulate these modules. Since transcription factors are themselves transcriptionally regulated, a three-layer regulatory cascade consisting of the TF-regulators, the TFs and the regulated modules is subsequently considered. This three-layer cascade is then modeled quantitatively using artificial neural networks (ANNs) where the input layer corresponds to the expression of the up-stream transcription factors (TF-regulators) and the output layer corresponds to the expression of genes within each module. This work shows that (a) the expression of at least 33 genes over time and for different stress conditions is well predicted by the expression of the top layer transcription factors, including cases in which the effect of up-stream regulators is shifted in time and (b) identifies at least 6 novel regulatory interactions that were not previously associated with stress-induced changes in gene expression. These findings suggest that the combination of gene expression and protein-DNA interaction data with artificial neural networks can successfully model biological pathways and capture quantitative dependencies between distant regulators and downstream genes.

Keywords: gene modules, artificial neural networks, yeast, stress

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1132
7 BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis

Authors: Mohamed A. Mahfouz, M. A. Ismail

Abstract:

Biclustering is a very useful data mining technique for identifying patterns where different genes are co-related based on a subset of conditions in gene expression analysis. Association rules mining is an efficient approach to achieve biclustering as in BIMODULE algorithm but it is sensitive to the value given to its input parameters and the discretization procedure used in the preprocessing step, also when noise is present, classical association rules miners discover multiple small fragments of the true bicluster, but miss the true bicluster itself. This paper formally presents a generalized noise tolerant bicluster model, termed as μBicluster. An iterative algorithm termed as BIDENS based on the proposed model is introduced that can discover a set of k possibly overlapping biclusters simultaneously. Our model uses a more flexible method to partition the dimensions to preserve meaningful and significant biclusters. The proposed algorithm allows discovering biclusters that hard to be discovered by BIMODULE. Experimental study on yeast, human gene expression data and several artificial datasets shows that our algorithm offers substantial improvements over several previously proposed biclustering algorithms.

Keywords: Machine learning, biclustering, bi-dimensional clustering, gene expression analysis, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1537
6 Dynamic Bayesian Networks Modeling for Inferring Genetic Regulatory Networks by Search Strategy: Comparison between Greedy Hill Climbing and MCMC Methods

Authors: Huihai Wu, Xiaohui Liu

Abstract:

Using Dynamic Bayesian Networks (DBN) to model genetic regulatory networks from gene expression data is one of the major paradigms for inferring the interactions among genes. Averaging a collection of models for predicting network is desired, rather than relying on a single high scoring model. In this paper, two kinds of model searching approaches are compared, which are Greedy hill-climbing Search with Restarts (GSR) and Markov Chain Monte Carlo (MCMC) methods. The GSR is preferred in many papers, but there is no such comparison study about which one is better for DBN models. Different types of experiments have been carried out to try to give a benchmark test to these approaches. Our experimental results demonstrated that on average the MCMC methods outperform the GSR in accuracy of predicted network, and having the comparable performance in time efficiency. By proposing the different variations of MCMC and employing simulated annealing strategy, the MCMC methods become more efficient and stable. Apart from comparisons between these approaches, another objective of this study is to investigate the feasibility of using DBN modeling approaches for inferring gene networks from few snapshots of high dimensional gene profiles. Through synthetic data experiments as well as systematic data experiments, the experimental results revealed how the performances of these approaches can be influenced as the target gene network varies in the network size, data size, as well as system complexity.

Keywords: Genetic regulatory network, Dynamic Bayesian network, GSR, MCMC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1562
5 Analysis of DNA Microarray Data using Association Rules: A Selective Study

Authors: M. Anandhavalli Gauthaman

Abstract:

DNA microarrays allow the measurement of expression levels for a large number of genes, perhaps all genes of an organism, within a number of different experimental samples. It is very much important to extract biologically meaningful information from this huge amount of expression data to know the current state of the cell because most cellular processes are regulated by changes in gene expression. Association rule mining techniques are helpful to find association relationship between genes. Numerous association rule mining algorithms have been developed to analyze and associate this huge amount of gene expression data. This paper focuses on some of the popular association rule mining algorithms developed to analyze gene expression data.

Keywords: DNA microarray, gene expression, association rule mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1723
4 MIBiClus: Mutual Information based Biclustering Algorithm

Authors: Neelima Gupta, Seema Aggarwal

Abstract:

Most of the biclustering/projected clustering algorithms are based either on the Euclidean distance or correlation coefficient which capture only linear relationships. However, in many applications, like gene expression data and word-document data, non linear relationships may exist between the objects. Mutual Information between two variables provides a more general criterion to investigate dependencies amongst variables. In this paper, we improve upon our previous algorithm that uses mutual information for biclustering in terms of computation time and also the type of clusters identified. The algorithm is able to find biclusters with mixed relationships and is faster than the previous one. To the best of our knowledge, none of the other existing algorithms for biclustering have used mutual information as a similarity measure. We present the experimental results on synthetic data as well as on the yeast expression data. Biclusters on the yeast data were found to be biologically and statistically significant using GO Tool Box and FuncAssociate.

Keywords: Biclustering, mutual information.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1304
3 A Dynamic Time-Lagged Correlation based Method to Learn Multi-Time Delay Gene Networks

Authors: Ankit Agrawal, Ankush Mittal

Abstract:

A gene network gives the knowledge of the regulatory relationships among the genes. Each gene has its activators and inhibitors that regulate its expression positively and negatively respectively. Genes themselves are believed to act as activators and inhibitors of other genes. They can even activate one set of genes and inhibit another set. Identifying gene networks is one of the most crucial and challenging problems in Bioinformatics. Most work done so far either assumes that there is no time delay in gene regulation or there is a constant time delay. We here propose a Dynamic Time- Lagged Correlation Based Method (DTCBM) to learn the gene networks, which uses time-lagged correlation to find the potential gene interactions, and then uses a post-processing stage to remove false gene interactions to common parents, and finally uses dynamic correlation thresholds for each gene to construct the gene network. DTCBM finds correlation between gene expression signals shifted in time, and therefore takes into consideration the multi time delay relationships among the genes. The implementation of our method is done in MATLAB and experimental results on Saccharomyces cerevisiae gene expression data and comparison with other methods indicate that it has a better performance.

Keywords: Activators, correlation, dynamic time-lagged correlation based method, inhibitors, multi-time delay gene network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1246
2 Evaluation of Clustering Based on Preprocessing in Gene Expression Data

Authors: Seo Young Kim, Toshimitsu Hamasaki

Abstract:

Microarrays have become the effective, broadly used tools in biological and medical research to address a wide range of problems, including classification of disease subtypes and tumors. Many statistical methods are available for analyzing and systematizing these complex data into meaningful information, and one of the main goals in analyzing gene expression data is the detection of samples or genes with similar expression patterns. In this paper, we express and compare the performance of several clustering methods based on data preprocessing including strategies of normalization or noise clearness. We also evaluate each of these clustering methods with validation measures for both simulated data and real gene expression data. Consequently, clustering methods which are common used in microarray data analysis are affected by normalization and degree of noise and clearness for datasets.

Keywords: Gene expression, clustering, data preprocessing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1346
1 Iterative Clustering Algorithm for Analyzing Temporal Patterns of Gene Expression

Authors: Seo Young Kim, Jae Won Lee, Jong Sung Bae

Abstract:

Microarray experiments are information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. For biologists, a key aim when analyzing microarray data is to group genes based on the temporal patterns of their expression levels. In this paper, we used an iterative clustering method to find temporal patterns of gene expression. We evaluated the performance of this method by applying it to real sporulation data and simulated data. The patterns obtained using the iterative clustering were found to be superior to those obtained using existing clustering algorithms.

Keywords: Clustering, microarray experiment, temporal pattern of gene expression data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1016