Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 51

Search results for: Microarray

51 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.

Keywords: K-means algorithm, microarray data mining, biological pattern recognition, partitional clustering, centroid initialization

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 559
50 Biomolecules Based Microarray for Screening Human Endothelial Cells Behavior

Authors: Adel Dalilottojari, Bahman Delalat, Frances J. Harding, Michaelia P. Cockshell, Claudine S. Bonder, Nicolas H. Voelcker

Abstract:

Endothelial Progenitor Cell (EPC) based therapies continue to be of interest to treat ischemic events based on their proven role to promote blood vessel formation and thus tissue re-vascularisation. Current strategies for the production of clinical-grade EPCs requires the in vitro isolation of EPCs from peripheral blood followed by cell expansion to provide sufficient quantities EPCs for cell therapy. This study aims to examine the use of different biomolecules to significantly improve the current strategy of EPC capture and expansion on collagen type I (Col I). In this study, four different biomolecules were immobilised on a surface and then investigated for their capacity to support EPC capture and proliferation. First, a cell microarray platform was fabricated by coating a glass surface with epoxy functional allyl glycidyl ether plasma polymer (AGEpp) to mediate biomolecule binding. The four candidate biomolecules tested were Col I, collagen type II (Col II), collagen type IV (Col IV) and vascular endothelial growth factor A (VEGF-A), which were arrayed on the epoxy-functionalised surface using a non-contact printer. The surrounding area between the printed biomolecules was passivated with polyethylene glycol-bisamine (A-PEG) to prevent non-specific cell attachment. EPCs were seeded onto the microarray platform and cell numbers quantified after 1 h (to determine capture) and 72 h (to determine proliferation). All of the extracellular matrix (ECM) biomolecules printed demonstrated an ability to capture EPCs within 1 h of cell seeding with Col II exhibiting the highest level of attachment when compared to the other biomolecules. Interestingly, Col IV exhibited the highest increase in EPC expansion after 72 h when compared to Col I, Col II and VEGF-A. These results provide information for significant improvement in the capture and expansion of human EPC for further application.

Keywords: Cardiovascular Disease, Cell Therapy, Endothelial Progenitor Cells, high throughput screening, cell microarray platform

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 990
49 Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy

Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie

Abstract:

In recent years, there has been an explosion in the rate of using technology that help discovering the diseases. For example, DNA microarrays allow us for the first time to obtain a "global" view of the cell. It has great potential to provide accurate medical diagnosis, to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied on such micro-array datasets to devise methods that can predict the occurrence of Leukemia disease. In this study, we compared the classification accuracy and response time among eleven decision tree methods and six rule classifier methods using five performance criteria. The experiment results show that the performance of Random Tree is producing better result. Also it takes lowest time to build model in tree classifier. The classification rules algorithms such as nearest- neighbor-like algorithm (NNge) is the best algorithm due to the high accuracy and it takes lowest time to build model in classification.

Keywords: Data Mining, microarray data, Decision Tree, classification techniques, classification rule, leukemia diseases

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1921
48 A Hybrid Gene Selection Technique Using Improved Mutual Information and Fisher Score for Cancer Classification Using Microarrays

Authors: M. Anidha, K. Premalatha

Abstract:

Feature Selection is significant in order to perform constructive classification in the area of cancer diagnosis. However, a large number of features compared to the number of samples makes the task of classification computationally very hard and prone to errors in microarray gene expression datasets. In this paper, we present an innovative method for selecting highly informative gene subsets of gene expression data that effectively classifies the cancer data into tumorous and non-tumorous. The hybrid gene selection technique comprises of combined Mutual Information and Fisher score to select informative genes. The gene selection is validated by classification using Support Vector Machine (SVM) which is a supervised learning algorithm capable of solving complex classification problems. The results obtained from improved Mutual Information and F-Score with SVM as a classifier has produced efficient results.

Keywords: classification, Gene Selection, SVM, mutual information, Fisher score

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 755
47 A Cuckoo Search with Differential Evolution for Clustering Microarray Gene Expression Data

Authors: M. Pandi, K. Premalatha

Abstract:

A DNA microarray technology is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. It is handled by clustering which reveals the natural structures and identifying the interesting patterns in the underlying data. In this paper, gene based clustering in gene expression data is proposed using Cuckoo Search with Differential Evolution (CS-DE). The experiment results are analyzed with gene expression benchmark datasets. The results show that CS-DE outperforms CS in benchmark datasets. To find the validation of the clustering results, this work is tested with one internal and one external cluster validation indexes.

Keywords: Genomics, Clustering, Dna, Differential Evolution, Microarray, Gene Expression Data, cuckoo search

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 957
46 Efficient Tuning Parameter Selection by Cross-Validated Score in High Dimensional Models

Authors: Yoonsuh Jung

Abstract:

As DNA microarray data contain relatively small sample size compared to the number of genes, high dimensional models are often employed. In high dimensional models, the selection of tuning parameter (or, penalty parameter) is often one of the crucial parts of the modeling. Cross-validation is one of the most common methods for the tuning parameter selection, which selects a parameter value with the smallest cross-validated score. However, selecting a single value as an ‘optimal’ value for the parameter can be very unstable due to the sampling variation since the sample sizes of microarray data are often small. Our approach is to choose multiple candidates of tuning parameter first, then average the candidates with different weights depending on their performance. The additional step of estimating the weights and averaging the candidates rarely increase the computational cost, while it can considerably improve the traditional cross-validation. We show that the selected value from the suggested methods often lead to stable parameter selection as well as improved detection of significant genetic variables compared to the tradition cross-validation via real data and simulated data sets.

Keywords: cross validation, parameter averaging, regularization parameter search, Parameter Selection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1182
45 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani

Abstract:

Development of a method to estimate gene functions is an important task in bioinformatics. One of the approaches for the annotation is the identification of the metabolic pathway that genes are involved in. Since gene expression data reflect various intracellular phenomena, those data are considered to be related with genes’ functions. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.

Keywords: Machine Learning, Metabolic Pathways, Microarray, Gene Expression Data, SVM, Kullback–Leibler divergence, KL divergence, support vector machines

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1899
44 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: Bioinformatics, Feature selection, DNA microarray, missing data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2057
43 Apoptosis Pathway Targeted by Thymoquinone in MCF7 Breast Cancer Cell Line

Authors: M. Marjaneh, M. Y. Narazah, H. Shahrul

Abstract:

Array-based gene expression analysis is a powerful tool to profile expression of genes and to generate information on therapeutic effects of new anti-cancer compounds. Anti-apoptotic effect of thymoquinone was studied in MCF7 breast cancer cell line using gene expression profiling with cDNA microarray. The purity and yield of RNA samples were determined using RNeasyPlus Mini kit. The Agilent RNA 6000 NanoLabChip kit evaluated the quantity of the RNA samples. AffinityScript RT oligo-dT promoter primer was used to generate cDNA strands. T7 RNA polymerase was used to convert cDNA to cRNA. The cRNA samples and human universal reference RNA were labelled with Cy-3-CTP and Cy-5-CTP, respectively. Feature Extraction and GeneSpring softwares analysed the data. The single experiment analysis revealed involvement of 64 pathways with up-regulated genes and 78 pathways with downregulated genes. The MAPK and p38-MAPK pathways were inhibited due to the up-regulation of PTPRR gene. The inhibition of p38-MAPK suggested up-regulation of TGF-ß pathway. Inhibition of p38-MAPK caused up-regulation of TP53 and down-regulation of Bcl2 genes indicating involvement of intrinsic apoptotic pathway. Down-regulation of CARD16 gene as an adaptor molecule regulated CASP1 and suggested necrosis-like programmed cell death and involvement of caspase in apoptosis. Furthermore, down-regulation of GPCR, EGF-EGFR signalling pathways suggested reduction of ER. Involvement of AhR pathway which control cytochrome P450 and glucuronidation pathways showed metabolism of Thymoquinone. The findings showed differential expression of several genes in apoptosis pathways with thymoquinone treatment in estrogen receptor-positive breast cancer cells.

Keywords: thymoquinone, cDNA microarray, CARD16, PTPRR, CASP10

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1880
42 Integration of Microarray Data into a Genome-Scale Metabolic Model to Study Flux Distribution after Gene Knockout

Authors: Mona Heydari, Ehsan Motamedian, Seyed Abbas Shojaosadati

Abstract:

Prediction of perturbations after genetic manipulation (especially gene knockout) is one of the important challenges in systems biology. In this paper, a new algorithm is introduced that integrates microarray data into the metabolic model. The algorithm was used to study the change in the cell phenotype after knockout of Gss gene in Escherichia coli BW25113. Algorithm implementation indicated that gene deletion resulted in more activation of the metabolic network. Growth yield was more and less regulating gene were identified for mutant in comparison with the wild-type strain.

Keywords: Integration, microarray data, metabolic network, gene knockout, flux balance analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 611
41 Statistical Measures and Optimization Algorithms for Gene Selection in Lung and Ovarian Tumor

Authors: C. Gunavathi, K. Premalatha

Abstract:

Microarray technology is universally used in the study of disease diagnosis using gene expression levels. The main shortcoming of gene expression data is that it includes thousands of genes and a small number of samples. Abundant methods and techniques have been proposed for tumor classification using microarray gene expression data. Feature or gene selection methods can be used to mine the genes that directly involve in the classification and to eliminate irrelevant genes. In this paper statistical measures like T-Statistics, Signal-to-Noise Ratio (SNR) and F-Statistics are used to rank the genes. The ranked genes are used for further classification. Particle Swarm Optimization (PSO) algorithm and Shuffled Frog Leaping (SFL) algorithm are used to find the significant genes from the top-m ranked genes. The Naïve Bayes Classifier (NBC) is used to classify the samples based on the significant genes. The proposed work is applied on Lung and Ovarian datasets. The experimental results show that the proposed method achieves 100% accuracy in all the three datasets and the results are compared with previous works.

Keywords: Particle Swarm Optimization, Microarray, naive Bayes classifier, signal-to-noise ratio, T-statistics, Shuffled frog leaping, FStatistics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1593
40 Performance Analysis of Genetic Algorithm with kNN and SVM for Feature Selection in Tumor Classification

Authors: C. Gunavathi, K. Premalatha

Abstract:

Tumor classification is a key area of research in the field of bioinformatics. Microarray technology is commonly used in the study of disease diagnosis using gene expression levels. The main drawback of gene expression data is that it contains thousands of genes and a very few samples. Feature selection methods are used to select the informative genes from the microarray. These methods considerably improve the classification accuracy. In the proposed method, Genetic Algorithm (GA) is used for effective feature selection. Informative genes are identified based on the T-Statistics, Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate solutions of GA are obtained from top-m informative genes. The classification accuracy of k-Nearest Neighbor (kNN) method is used as the fitness function for GA. In this work, kNN and Support Vector Machine (SVM) are used as the classifiers. The experimental results show that the proposed work is suitable for effective feature selection. With the help of the selected genes, GA-kNN method achieves 100% accuracy in 4 datasets and GA-SVM method achieves in 5 out of 10 datasets. The GA with kNN and SVM methods are demonstrated to be an accurate method for microarray based tumor classification.

Keywords: Gene expression, Genetic Algorithm, Microarray, signal-to-noise ratio, Support Vector Machine, F-test, k- Nearest-Neighbor, T-statistics, Tumor Classification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3884
39 An Advanced Nelder Mead Simplex Method for Clustering of Gene Expression Data

Authors: M. Pandi, K. Premalatha

Abstract:

The DNA microarray technology concurrently monitors the expression levels of thousands of genes during significant biological processes and across the related samples. The better understanding of functional genomics is obtained by extracting the patterns hidden in gene expression data. It is handled by clustering which reveals natural structures and identify interesting patterns in the underlying data. In the proposed work clustering gene expression data is done through an Advanced Nelder Mead (ANM) algorithm. Nelder Mead (NM) method is a method designed for optimization process. In Nelder Mead method, the vertices of a triangle are considered as the solutions. Many operations are performed on this triangle to obtain a better result. In the proposed work, the operations like reflection and expansion is eliminated and a new operation called spread-out is introduced. The spread-out operation will increase the global search area and thus provides a better result on optimization. The spread-out operation will give three points and the best among these three points will be used to replace the worst point. The experiment results are analyzed with optimization benchmark test functions and gene expression benchmark datasets. The results show that ANM outperforms NM in both benchmarks.

Keywords: Optimization, genomes, Solution, simplex, fitness function, monocyte, Spread out, multi-minima, search area

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2089
38 Comparative Study on Swarm Intelligence Techniques for Biclustering of Microarray Gene Expression Data

Authors: R. Balamurugan, A. M. Natarajan, K. Premalatha

Abstract:

Microarray gene expression data play a vital in biological processes, gene regulation and disease mechanism. Biclustering in gene expression data is a subset of the genes indicating consistent patterns under the subset of the conditions. Finding a biclustering is an optimization problem. In recent years, swarm intelligence techniques are popular due to the fact that many real-world problems are increasingly large, complex and dynamic. By reasons of the size and complexity of the problems, it is necessary to find an optimization technique whose efficiency is measured by finding the near optimal solution within a reasonable amount of time. In this paper, the algorithmic concepts of the Particle Swarm Optimization (PSO), Shuffled Frog Leaping (SFL) and Cuckoo Search (CS) algorithms have been analyzed for the four benchmark gene expression dataset. The experiment results show that CS outperforms PSO and SFL for 3 datasets and SFL give better performance in one dataset. Also this work determines the biological relevance of the biclusters with Gene Ontology in terms of function, process and component.

Keywords: Particle Swarm Optimization, Gene Expression Data, cuckoo search, biclustering, Shuffled frog leaping

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2200
37 Principal Component Analysis using Singular Value Decomposition of Microarray Data

Authors: Dong Hoon Lim

Abstract:

A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. Principal component analysis(PCA) has been widely used in multivariate data analysis to reduce the dimensionality of the data in order to simplify subsequent analysis and allow for summarization of the data in a parsimonious manner. PCA, which can be implemented via a singular value decomposition(SVD), is useful for analysis of microarray data. For application of PCA using SVD we use the DNA microarray data for the small round blue cell tumors(SRBCT) of childhood by Khan et al.(2001). To decide the number of components which account for sufficient amount of information we draw scree plot. Biplot, a graphic display associated with PCA, reveals important features that exhibit relationship between variables and also the relationship of variables with observations.

Keywords: Principal Component Analysis, microarray data, singular value decomposition, SRBCT

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2876
36 Cutaneous Application of Royal Jelly Inhibits Skin Lesions in NC/Nga Mice, a Human-Like Mouse Model of Atopic Dermatitis

Authors: Junki Miyamoto, Mariko Kiyomi, Yuuki Nagashio, Takuya Suzuki, Soichi Tanabe

Abstract:

Anti-allergic effects of royal jelly were evaluated in a human-like mouse model of atopic dermatitis. NC/Nga mice were cutaneously applied with royal jelly for 6 weeks. Royal jelly-treated mice exhibited lower levels of serum total immunoglobulin E in comparison with controls. We found that the treatment decreased (11% to the control) expression of mRNA for aquaporin-3, which is involved in the modulation of epidermal hydration. Microarray analysis revealed more than 10-fold changes in the expression of several genes, such as transglutaminase 2, repetin, and keratins. In normal human epidermal keratinocytes, royal jelly extract suppressed interleukin-8 elevation induced by TNF-α and interferon-γ, suggesting direct anti-inflammatory activity in keratinocytes. Collectively, topical application of royal jelly may be useful for amelioration of lesions and inflammation in atopic dermatitis.

Keywords: Aquaporin 3, immunoglobulin E, NC/Nga, royal jelly

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1466
35 Gene Selection Guided by Feature Interdependence

Authors: Hung-Ming Lai, Andreas Albrecht, Kathleen Steinhöfel

Abstract:

Cancers could normally be marked by a number of differentially expressed genes which show enormous potential as biomarkers for a certain disease. Recent years, cancer classification based on the investigation of gene expression profiles derived by high-throughput microarrays has widely been used. The selection of discriminative genes is, therefore, an essential preprocess step in carcinogenesis studies. In this paper, we have proposed a novel gene selector using information-theoretic measures for biological discovery. This multivariate filter is a four-stage framework through the analyses of feature relevance, feature interdependence, feature redundancy-dependence and subset rankings, and having been examined on the colon cancer data set. Our experimental result show that the proposed method outperformed other information theorem based filters in all aspect of classification errors and classification performance.

Keywords: Colon Cancer, Microarray Data Analysis, Gene Selection, feature interdependence, feature subset selection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1706
34 Simultaneous Clustering and Feature Selection Method for Gene Expression Data

Authors: T. Chandrasekhar, K. Thangavel, E. N. Sathishkumar

Abstract:

Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this work K-Means algorithms has been applied for clustering of Gene Expression Data. Further, rough set based Quick reduct algorithm has been applied for each cluster in order to select the most similar genes having high correlation. Then the ACV measure is used to evaluate the refined clusters and classification is used to evaluate the proposed method. They could identify compact clusters with feature selection method used to genes are selected.

Keywords: Clustering, Feature selection, Gene Expression Data, Quick reduct

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1545
33 Categorization and Estimation of Relative Connectivity of Genes from Meta-OFTEN Network

Authors: U. Kairov, T. Karpenyuk, E. Ramanculov, A. Zinovyev

Abstract:

The most common result of analysis of highthroughput data in molecular biology represents a global list of genes, ranked accordingly to a certain score. The score can be a measure of differential expression. Recent work proposed a new method for selecting a number of genes in a ranked gene list from microarray gene expression data such that this set forms the Optimally Functionally Enriched Network (OFTEN), formed by known physical interactions between genes or their products. Here we present calculation results of relative connectivity of genes from META-OFTEN network and tentative biological interpretation of the most reproducible signal. The relative connectivity and inbetweenness values of genes from META-OFTEN network were estimated.

Keywords: Microarray, gene network, META-OFTEN

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1271
32 Gene Expression Signature for Classification of Metastasis Positive and Negative Oral Cancer in Homosapiens

Authors: A. Shukla, A. Tarsauliya, R. Tiwari, S. Sharma

Abstract:

Cancer classification to their corresponding cohorts has been key area of research in bioinformatics aiming better prognosis of the disease. High dimensionality of gene data has been makes it a complex task and requires significance data identification technique in order to reducing the dimensionality and identification of significant information. In this paper, we have proposed a novel approach for classification of oral cancer into metastasis positive and negative patients. We have used significance analysis of microarrays (SAM) for identifying significant genes which constitutes gene signature. 3 different gene signatures were identified using SAM from 3 different combination of training datasets and their classification accuracy was calculated on corresponding testing datasets using k-Nearest Neighbour (kNN), Fuzzy C-Means Clustering (FCM), Support Vector Machine (SVM) and Backpropagation Neural Network (BPNN). A final gene signature of only 9 genes was obtained from above 3 individual gene signatures. 9 gene signature-s classification capability was compared using same classifiers on same testing datasets. Results obtained from experimentation shows that 9 gene signature classified all samples in testing dataset accurately while individual genes could not classify all accurately.

Keywords: Cancer, classification, SAM, Gene Signature

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1701
31 Clustering Multivariate Empiric Characteristic Functions for Multi-Class SVM Classification

Authors: María-Dolores Cubiles-de-la-Vega, Rafael Pino-Mejías, Esther-Lydia Silva-Ramírez

Abstract:

A dissimilarity measure between the empiric characteristic functions of the subsamples associated to the different classes in a multivariate data set is proposed. This measure can be efficiently computed, and it depends on all the cases of each class. It may be used to find groups of similar classes, which could be joined for further analysis, or it could be employed to perform an agglomerative hierarchical cluster analysis of the set of classes. The final tree can serve to build a family of binary classification models, offering an alternative approach to the multi-class SVM problem. We have tested this dendrogram based SVM approach with the oneagainst- one SVM approach over four publicly available data sets, three of them being microarray data. Both performances have been found equivalent, but the first solution requires a smaller number of binary SVM models.

Keywords: Cluster Analysis, multi-class SVM, Empiric Characteristic Function

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1503
30 Carbon Sources Utilization Profiles of Thermophilic Phytase Producing Bacteria Isolated from Hot-spring in Malaysia

Authors: Noor Muzamil Mohamad, Abdul Manaf Ali, Hamzah Mohd Salleh

Abstract:

Phytases (myo-inositol hexakisphosphate phosphohydrolases; EC 3.1.3.8) catalyze the hydrolysis of phytic acid (myoinositol hexakisphosphate) to the mono-, di-, tri-, tetra-, and pentaphosphates of myo-inositol and inorganic phosphate. Therrmophilic bacteria isolated from water sampled from hot spring. About 120 isolates of bacteria were successfully isolated form hot spring water sample and tested for extracellular phytase producing. After 5 passages of the screening on the PSM media, 4 isolates were found stable in producing phytase enzyme. The 16s RDNA sequencing for identification of bacteria using molecular technique revealed that all isolates those positive in phytase producing are belong to Geobacillus spp. And Anoxybacillus spp. Anoxybacillus rupiensis UniSZA-7 were identified for their carbon source utilization using Phenotype Microarray Plate of Biolog and found they utilize several kind of carbon source provided.

Keywords: phytase, phytic acid, thermophilic bacteria, PSM Media and Phytase Assay

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1791
29 Probe Selection for Pathway-Specific Microarray Probe Design Minimizing Melting Temperature Variance

Authors: Fabian Horn, Reinhard Guthke

Abstract:

In molecular biology, microarray technology is widely and successfully utilized to efficiently measure gene activity. If working with less studied organisms, methods to design custom-made microarray probes are available. One design criterion is to select probes with minimal melting temperature variances thus ensuring similar hybridization properties. If the microarray application focuses on the investigation of metabolic pathways, it is not necessary to cover the whole genome. It is more efficient to cover each metabolic pathway with a limited number of genes. Firstly, an approach is presented which minimizes the overall melting temperature variance of selected probes for all genes of interest. Secondly, the approach is extended to include the additional constraints of covering all pathways with a limited number of genes while minimizing the overall variance. The new optimization problem is solved by a bottom-up programming approach which reduces the complexity to make it computationally feasible. The new method is exemplary applied for the selection of microarray probes in order to cover all fungal secondary metabolite gene clusters for Aspergillus terreus.

Keywords: Bottom-Up Approach, Metabolic Pathway, melting temperature, gene clusters, microarray probe design, probe selection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1238
28 Dimension Reduction of Microarray Data Based on Local Principal Component

Authors: Ali Anaissi, Paul J. Kennedy, Madhu Goyal

Abstract:

Analysis and visualization of microarraydata is veryassistantfor biologists and clinicians in the field of diagnosis and treatment of patients. It allows Clinicians to better understand the structure of microarray and facilitates understanding gene expression in cells. However, microarray dataset is a complex data set and has thousands of features and a very small number of observations. This very high dimensional data set often contains some noise, non-useful information and a small number of relevant features for disease or genotype. This paper proposes a non-linear dimensionality reduction algorithm Local Principal Component (LPC) which aims to maps high dimensional data to a lower dimensional space. The reduced data represents the most important variables underlying the original data. Experimental results and comparisons are presented to show the quality of the proposed algorithm. Moreover, experiments also show how this algorithm reduces high dimensional data whilst preserving the neighbourhoods of the points in the low dimensional space as in the high dimensional space.

Keywords: Principal Component Analysis, Linear Dimension Reduction, Non-Linear Dimension Reduction, Biologists

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1249
27 A Pairwise-Gaussian-Merging Approach: Towards Genome Segmentation for Copy Number Analysis

Authors: Chih-Hao Chen, Hsing-Chung Lee, Qingdong Ling, Hsiao-Jung Chen, Sun-Chong Wang, Li-Ching Wu, H.C. Lee

Abstract:

Segmentation, filtering out of measurement errors and identification of breakpoints are integral parts of any analysis of microarray data for the detection of copy number variation (CNV). Existing algorithms designed for these tasks have had some successes in the past, but they tend to be O(N2) in either computation time or memory requirement, or both, and the rapid advance of microarray resolution has practically rendered such algorithms useless. Here we propose an algorithm, SAD, that is much faster and much less thirsty for memory – O(N) in both computation time and memory requirement -- and offers higher accuracy. The two key ingredients of SAD are the fundamental assumption in statistics that measurement errors are normally distributed and the mathematical relation that the product of two Gaussians is another Gaussian (function). We have produced a computer program for analyzing CNV based on SAD. In addition to being fast and small it offers two important features: quantitative statistics for predictions and, with only two user-decided parameters, ease of use. Its speed shows little dependence on genomic profile. Running on an average modern computer, it completes CNV analyses for a 262 thousand-probe array in ~1 second and a 1.8 million-probe array in 9 seconds

Keywords: Cancer, Pathogenesis, segmentation analysis, chromosomal aberration, copy number variation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1121
26 A Comparison of SVM-based Criteria in Evolutionary Method for Gene Selection and Classification of Microarray Data

Authors: Rameswar Debnath, Haruhisa Takahashi

Abstract:

An evolutionary method whose selection and recombination operations are based on generalization error-bounds of support vector machine (SVM) can select a subset of potentially informative genes for SVM classifier very efficiently [7]. In this paper, we will use the derivative of error-bound (first-order criteria) to select and recombine gene features in the evolutionary process, and compare the performance of the derivative of error-bound with the error-bound itself (zero-order) in the evolutionary process. We also investigate several error-bounds and their derivatives to compare the performance, and find the best criteria for gene selection and classification. We use 7 cancer-related human gene expression datasets to evaluate the performance of the zero-order and first-order criteria of error-bounds. Though both criteria have the same strategy in theoretically, experimental results demonstrate the best criterion for microarray gene expression data.

Keywords: Feature selection, microarray data, evolutionary algorithm, support vector machine, generalization error-bound

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1208
25 A Novel Microarray Biclustering Algorithm

Authors: Chieh-Yuan Tsai, Chuang-Cheng Chiu

Abstract:

Biclustering aims at identifying several biclusters that reveal potential local patterns from a microarray matrix. A bicluster is a sub-matrix of the microarray consisting of only a subset of genes co-regulates in a subset of conditions. In this study, we extend the motif of subspace clustering to present a K-biclusters clustering (KBC) algorithm for the microarray biclustering issue. Besides minimizing the dissimilarities between genes and bicluster centers within all biclusters, the objective function of the KBC algorithm additionally takes into account how to minimize the residues within all biclusters based on the mean square residue model. In addition, the objective function also maximizes the entropy of conditions to stimulate more conditions to contribute the identification of biclusters. The KBC algorithm adopts the K-means type clustering process to efficiently make the partition of K biclusters be optimized. A set of experiments on a practical microarray dataset are demonstrated to show the performance of the proposed KBC algorithm.

Keywords: Microarray, biclustering, Subspace Clustering, Meansquare residue model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1271
24 Novel Hybrid Method for Gene Selection and Cancer Prediction

Authors: Liping Jing, Michael K. Ng, Tieyong Zeng

Abstract:

Microarray data profiles gene expression on a whole genome scale, therefore, it provides a good way to study associations between gene expression and occurrence or progression of cancer. More and more researchers realized that microarray data is helpful to predict cancer sample. However, the high dimension of gene expressions is much larger than the sample size, which makes this task very difficult. Therefore, how to identify the significant genes causing cancer becomes emergency and also a hot and hard research topic. Many feature selection algorithms have been proposed in the past focusing on improving cancer predictive accuracy at the expense of ignoring the correlations between the features. In this work, a novel framework (named by SGS) is presented for stable gene selection and efficient cancer prediction . The proposed framework first performs clustering algorithm to find the gene groups where genes in each group have higher correlation coefficient, and then selects the significant genes in each group with Bayesian Lasso and important gene groups with group Lasso, and finally builds prediction model based on the shrinkage gene space with efficient classification algorithm (such as, SVM, 1NN, Regression and etc.). Experiment results on real world data show that the proposed framework often outperforms the existing feature selection and prediction methods, say SAM, IG and Lasso-type prediction model.

Keywords: Clustering, classification, Gene Selection, LASSO, Cancer Prediction

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1660
23 An SVM based Classification Method for Cancer Data using Minimum Microarray Gene Expressions

Authors: R. Mallika, V. Saravanan

Abstract:

This paper gives a novel method for improving classification performance for cancer classification with very few microarray Gene expression data. The method employs classification with individual gene ranking and gene subset ranking. For selection and classification, the proposed method uses the same classifier. The method is applied to three publicly available cancer gene expression datasets from Lymphoma, Liver and Leukaemia datasets. Three different classifiers namely Support vector machines-one against all (SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant analysis (LDA) were tested and the results indicate the improvement in performance of SVM-OAA classifier with satisfactory results on all the three datasets when compared with the other two classifiers.

Keywords: linear discriminant analysis, Support vector machines-one against all, cancerclassification, K nearest neighbour, microarray gene expression, gene pair ranking

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2179
22 A Simple Affymetrix Ratio-transformation Method Yields Comparable Expression Level Quantifications with cDNA Data

Authors: Chintanu K. Sarmah, Sandhya Samarasinghe, Don Kulasiri, Daniel Catchpoole

Abstract:

Gene expression profiling is rapidly evolving into a powerful technique for investigating tumor malignancies. The researchers are overwhelmed with the microarray-based platforms and methods that confer them the freedom to conduct large-scale gene expression profiling measurements. Simultaneously, investigations into cross-platform integration methods have started gaining momentum due to their underlying potential to help comprehend a myriad of broad biological issues in tumor diagnosis, prognosis, and therapy. However, comparing results from different platforms remains to be a challenging task as various inherent technical differences exist between the microarray platforms. In this paper, we explain a simple ratio-transformation method, which can provide some common ground for cDNA and Affymetrix platform towards cross-platform integration. The method is based on the characteristic data attributes of Affymetrix- and cDNA- platform. In the work, we considered seven childhood leukemia patients and their gene expression levels in either platform. With a dataset of 822 differentially expressed genes from both these platforms, we carried out a specific ratio-treatment to Affymetrix data, which subsequently showed an improvement in the relationship with the cDNA data.

Keywords: Microarray, gene expression profiling, cDNA, Affymetrix, childhood leukaemia

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1161