Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 20

Bioinformatics Related Publications

20 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas

Abstract:

Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms in other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for more challenging task of hierarchical classification, and evaluates it in a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes meta-learning approach for recommending the best hierarchical classification algorithm to a hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.

Keywords: Bioinformatics, Meta-Learning, hierarchical classification, algorithm recommendation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 459
19 Identification of Disease Causing DNA Motifs in Human DNA Using Clustering Approach

Authors: G. Tamilpavai, C. Vishnuppriya

Abstract:

Studying DNA (deoxyribonucleic acid) sequence is useful in biological processes and it is applied in the fields such as diagnostic and forensic research. DNA is the hereditary information in human and almost all other organisms. It is passed to their generations. Earlier stage detection of defective DNA sequence may lead to many developments in the field of Bioinformatics. Nowadays various tedious techniques are used to identify defective DNA. The proposed work is to analyze and identify the cancer-causing DNA motif in a given sequence. Initially the human DNA sequence is separated as k-mers using k-mer separation rule. The separated k-mers are clustered using Self Organizing Map (SOM). Using Levenshtein distance measure, cancer associated DNA motif is identified from the k-mer clusters. Experimental results of this work indicate the presence or absence of cancer causing DNA motif. If the cancer associated DNA motif is found in DNA, it is declared as the cancer disease causing DNA sequence. Otherwise the input human DNA is declared as normal sequence. Finally, elapsed time is calculated for finding the presence of cancer causing DNA motif using clustering formation. It is compared with normal process of finding cancer causing DNA motif. Locating cancer associated motif is easier in cluster formation process than the other one. The proposed work will be an initiative aid for finding genetic disease related research.

Keywords: Bioinformatics, Dna, SOM, cancer motif, k-mers, Levenshtein distance

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 494
18 ELISA Based hTSH Assessment Using Two Sensitive and Specific Anti-hTSH Polyclonal Antibodies

Authors: Mehdi Hedayati, Maysam Mard-Soltani, Mohamad Javad Rasaee, Saeed Khalili, Abdol Karim Sheikhi

Abstract:

Production of specific antibody responses against hTSH is a cumbersome process due to the high identity between the hTSH and the other members of the glycoprotein hormone family (FSH, LH and HCG) and the high identity between the human hTSH and host animals for antibody production. Therefore, two polyclonal antibodies were purified against two recombinant proteins. Four possible ELISA tests were designed based on these antibodies. These ELISA tests were checked against hTSH and other glycoprotein hormones, and their sensitivity and specificity were assessed. Bioinformatics tools were used to analyze the immunological properties. After the immunogen region selection from hTSH protein, c terminal of B hTSH was selected and applied. Two recombinant genes, with these cut pieces (first: two repeats of C terminal of B hTSH, second: tetanous toxin+B hTSH C terminal), were designed and sub-cloned into the pET32a expression vector. Standard methods were used for protein expression, purification, and verification. Thereafter, immunizations of the white New Zealand rabbits were performed and the serums of them were used for antibody titration, purification and characterization. Then, four ELISA tests based on two antibodies were employed to assess the hTSH and other glycoprotein hormones. The results of these assessments were compared with standard amounts. The obtained results indicated that the desired antigens were successfully designed, sub-cloned, expressed, confirmed and used for in vivo immunization. The raised antibodies were capable of specific and sensitive hTSH detection, while the cross reactivity with the other members of the glycoprotein hormone family was minimum. Among the four designed tests, the test in which the antibody against first protein was used as capture antibody, and the antibody against second protein was used as detector antibody did not show any hook effect up to 50 miu/l. Both proteins have the ability to induce highly sensitive and specific antibody responses against the hTSH. One of the antibody combinations of these antibodies has the highest sensitivity and specificity in hTSH detection.

Keywords: Bioinformatics, Protein expression, Cross reactivity, hTSH

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 540
17 Intellectual Property Protection of CRISPR Related Technologies

Authors: Zheng Miao, Dennis Fernandez

Abstract:

CRISPR research has the potential to completely transform life science, agriculture, live-stock and the health care industry. The Intellectual Property derived from its research has raised significant attention in the academic as well as the biopharmaceutical industry culminating an urgent need for strategic IP protection. We review the rudimentary concepts and key competitors of CRISPR technologies as well as the paramount strategies for intellectual property protection. Further, we elaborate on prosecution issues related to CRISPR patents as well as possible solutions to various patent laws, interferences and litigation. Finally, we address how the bioinformatics of the CRISPR technology begs an inquiry into issues of privacy and a host of ethical concerns.

Keywords: Bioinformatics, Biotechnology, Intellectual Property, CRISPR

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1118
16 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Mai Mabrouk, Younies Mahmoud, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: Bioinformatics, Feature selection, DNA microarray, missing data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2059
15 Polymorphic Marker Designed from Bioinformatics Sequences Related to Cell Wall Strength for Discrimination of Mangosteen (Garcinia mangostana L.) Clones Resistant to Gamboge Disorder

Authors: E. Mansyah, Sobir, E. Santosa, A. Sisharmini, Sulassih

Abstract:

Gamboge disorder (GD) or fruit damage by the yellow sap is a major problem in mangosteen. Mangosteen plants varied in the level of GD, from very low or non GD to low, moderate and high GD. However it was difficult to differentiate between GD and non GD plants because evaluation of the disorder is strongly influenced by environment. In this study we investigated the usefulness of primer designed from bioinformatics related to cell wall strength, termed as MCWS, to predict GD. Plant materials used were 28 mangosteen plants selected based on percentage of GD categorized as high, moderate, low and very low or non GD. The result showed that the specific DNA fragments were absent in the high GD accessions. The MCWS marker suggests as a novel polymorphic marker for GD in mangosteen as well as a marker for detect variability in mangosteen as apomictic plant.

Keywords: Bioinformatics, mangosteen, cell wall strength, gamboge disorder, polymorphic marker

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1919
14 Biological Data Integration using SOA

Authors: Noura Meshaan Al-Otaibi, Amin Yousef Noaman

Abstract:

Nowadays scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data accessing, analyzing, and visualization tools. This research suggests the use of Service Oriented Architecture (SOA) to integrate biological data from different data sources. This work shows SOA will solve the problems that facing integration process and if the biologist scientists can access the biological data in easier way. There are several methods to implement SOA but web service is the most popular method. The Microsoft .Net Framework used to implement proposed architecture.

Keywords: Bioinformatics, Data Integration, SOA and Web services, biological data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1905
13 A Novel Approach for Protein Classification Using Fourier Transform

Authors: A. F. Ali, D. M. Shawky

Abstract:

Discovering new biological knowledge from the highthroughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed a new approach for protein classification. Proteins that are evolutionarily- and thereby functionally- related are said to belong to the same classification. Identifying protein classification is of fundamental importance to document the diversity of the known protein universe. It also provides a means to determine the functional roles of newly discovered protein sequences. Our goal is to predict the functional classification of novel protein sequences based on a set of features extracted from each protein sequence. The proposed technique used datasets extracted from the Structural Classification of Proteins (SCOP) database. A set of spectral domain features based on Fast Fourier Transform (FFT) is used. The proposed classifier uses multilayer back propagation (MLBP) neural network for protein classification. The maximum classification accuracy is about 91% when applying the classifier to the full four levels of the SCOP database. However, it reaches a maximum of 96% when limiting the classification to the family level. The classification results reveal that spectral domain contains information that can be used for classification with high accuracy. In addition, the results emphasize that sequence similarity measures are of great importance especially at the family level.

Keywords: Artificial Neural Networks, Bioinformatics, Feature Extraction, Protein Sequence Analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1874
12 A Hybridization of Constructive Beam Search with Local Search for Far From Most Strings Problem

Authors: Sayyed R Mousavi

Abstract:

The Far From Most Strings Problem (FFMSP) is to obtain a string which is far from as many as possible of a given set of strings. All the input and the output strings are of the same length, and two strings are said to be far if their hamming distance is greater than or equal to a given positive integer. FFMSP belongs to the class of sequences consensus problems which have applications in molecular biology. The problem is NP-hard; it does not admit a constant-ratio approximation either, unless P = NP. Therefore, in addition to exact and approximate algorithms, (meta)heuristic algorithms have been proposed for the problem in recent years. On the other hand, in the recent years, hybrid algorithms have been proposed and successfully used for many hard problems in a variety of domains. In this paper, a new metaheuristic algorithm, called Constructive Beam and Local Search (CBLS), is investigated for the problem, which is a hybridization of constructive beam search and local search algorithms. More specifically, the proposed algorithm consists of two phases, the first phase is to obtain several candidate solutions via the constructive beam search and the second phase is to apply local search to the candidate solutions obtained by the first phase. The best solution found is returned as the final solution to the problem. The proposed algorithm is also similar to memetic algorithms in the sense that both use local search to further improve individual solutions. The CBLS algorithm is compared with the most recent published algorithm for the problem, GRASP, with significantly positive results; the improvement is by order of magnitudes in most cases.

Keywords: Bioinformatics, Far From Most Strings Problem, Hybrid metaheuristics, Matheuristics, Sequences consensus problems

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1288
11 Endometrial Cancer Recognition via EEG Dependent upon 14-3-3 Protein Leading to an Ontological Diagnosis

Authors: Marios Poulos, George Bokos, Eirini Maliagani, Minas Paschopoulos

Abstract:

The purpose of my research proposal is to demonstrate that there is a relationship between EEG and endometrial cancer. The above relationship is based on an Aristotelian Syllogism; since it is known that the 14-3-3 protein is related to the electrical activity of the brain via control of the flow of Na+ and K+ ions and since it is also known that many types of cancer are associated with 14-3-3 protein, it is possible that there is a relationship between EEG and cancer. This research will be carried out by well-defined diagnostic indicators, obtained via the EEG, using signal processing procedures and pattern recognition tools such as neural networks in order to recognize the endometrial cancer type. The current research shall compare the findings from EEG and hysteroscopy performed on women of a wide age range. Moreover, this practice could be expanded to other types of cancer. The implementation of this methodology will be completed with the creation of an ontology. This ontology shall define the concepts existing in this research-s domain and the relationships between them. It will represent the types of relationships between hysteroscopy and EEG findings.

Keywords: Bioinformatics, Ontology, eeg, Protein 14-3-3, Endometrial cancer

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1251
10 Bioinformatics Profiling of Missense Mutations

Authors: I. Nassiri, B. Goliaei, M. Tavassoli

Abstract:

The ability to distinguish missense nucleotide substitutions that contribute to harmful effect from those that do not is a difficult problem usually accomplished through functional in vivo analyses. In this study, instead current biochemical methods, the effects of missense mutations upon protein structure and function were assayed by means of computational methods and information from the databases. For this order, the effects of new missense mutations in exon 5 of PTEN gene upon protein structure and function were examined. The gene coding for PTEN was identified and localized on chromosome region 10q23.3 as the tumor suppressor gene. The utilization of these methods were shown that c.319G>A and c.341T>G missense mutations that were recognized in patients with breast cancer and Cowden disease, could be pathogenic. This method could be use for analysis of missense mutation in others genes.

Keywords: Bioinformatics, missense mutations, PTEN tumorsuppressor gene

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1951
9 Exploring Dimensionality, Systematic Mutations and Number of Contacts in Simple HP ab-initio Protein Folding Using a Blackboard-based Agent Platform

Authors: Hiram I. Beltrán, Arturo Rojo-Domínguez, Máximo Eduardo Sánchez Gutiérrez, Pedro Pablo González Pérez

Abstract:

A computational platform is presented in this contribution. It has been designed as a virtual laboratory to be used for exploring optimization algorithms in biological problems. This platform is built on a blackboard-based agent architecture. As a test case, the version of the platform presented here is devoted to the study of protein folding, initially with a bead-like description of the chain and with the widely used model of hydrophobic and polar residues (HP model). Some details of the platform design are presented along with its capabilities and also are revised some explorations of the protein folding problems with different types of discrete space. It is also shown the capability of the platform to incorporate specific tools for the structural analysis of the runs in order to understand and improve the optimization process. Accordingly, the results obtained demonstrate that the ensemble of computational tools into a single platform is worthwhile by itself, since experiments developed on it can be designed to fulfill different levels of information in a self-consistent fashion. By now, it is being explored how an experiment design can be useful to create a computational agent to be included within the platform. These inclusions of designed agents –or software pieces– are useful for the better accomplishment of the tasks to be developed by the platform. Clearly, while the number of agents increases the new version of the virtual laboratory thus enhances in robustness and functionality.

Keywords: Bioinformatics, Optimization, Genetic Algorithms, Structural biology, Multi-Agent Systems, Protein Folding

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1512
8 A Bayesian Kernel for the Prediction of Protein- Protein Interactions

Authors: Razib M. Othman, Safaai Deris, Hany Alashwal

Abstract:

Understanding proteins functions is a major goal in the post-genomic era. Proteins usually work in context of other proteins and rarely function alone. Therefore, it is highly relevant to study the interaction partners of a protein in order to understand its function. Machine learning techniques have been widely applied to predict protein-protein interactions. Kernel functions play an important role for a successful machine learning technique. Choosing the appropriate kernel function can lead to a better accuracy in a binary classifier such as the support vector machines. In this paper, we describe a Bayesian kernel for the support vector machine to predict protein-protein interactions. The use of Bayesian kernel can improve the classifier performance by incorporating the probability characteristic of the available experimental protein-protein interactions data that were compiled from different sources. In addition, the probabilistic output from the Bayesian kernel can assist biologists to conduct more research on the highly predicted interactions. The results show that the accuracy of the classifier has been improved using the Bayesian kernel compared to the standard SVM kernels. These results imply that protein-protein interaction can be predicted using Bayesian kernel with better accuracy compared to the standard SVM kernels.

Keywords: Bioinformatics, Support Vector Machines, Protein-Protein Interactions, Bayesian Kernel

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1697
7 Grid Computing in Physics and Life Sciences

Authors: Heinz Stockinger

Abstract:

Certain sciences such as physics, chemistry or biology, have a strong computational aspect and use computing infrastructures to advance their scientific goals. Often, high performance and/or high throughput computing infrastructures such as clusters and computational Grids are applied to satisfy computational needs. In addition, these sciences are sometimes characterised by scientific collaborations requiring resource sharing which is typically provided by Grid approaches. In this article, I discuss Grid computing approaches in High Energy Physics as well as in bioinformatics and highlight some of my experience in both scientific domains.

Keywords: Web services, Bioinformatics, Physics, Grid Computing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1181
6 Mining Genes Relations in Microarray Data Combined with Ontology in Colon Cancer Automated Diagnosis System

Authors: A. Gruzdz, A. Ihnatowicz, J. Siddiqi, B. Akhgar

Abstract:

MATCH project [1] entitle the development of an automatic diagnosis system that aims to support treatment of colon cancer diseases by discovering mutations that occurs to tumour suppressor genes (TSGs) and contributes to the development of cancerous tumours. The constitution of the system is based on a) colon cancer clinical data and b) biological information that will be derived by data mining techniques from genomic and proteomic sources The core mining module will consist of the popular, well tested hybrid feature extraction methods, and new combined algorithms, designed especially for the project. Elements of rough sets, evolutionary computing, cluster analysis, self-organization maps and association rules will be used to discover the annotations between genes, and their influence on tumours [2]-[11]. The methods used to process the data have to address their high complexity, potential inconsistency and problems of dealing with the missing values. They must integrate all the useful information necessary to solve the expert's question. For this purpose, the system has to learn from data, or be able to interactively specify by a domain specialist, the part of the knowledge structure it needs to answer a given query. The program should also take into account the importance/rank of the particular parts of data it analyses, and adjusts the used algorithms accordingly.

Keywords: Bioinformatics, Ontology, Gene expression, selforganizingmaps

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1625
5 Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines

Authors: Mona Soliman Habib

Abstract:

This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features. The performance results of a set of experiments conducted using binary and multi-class SVM with increasing training data sizes are examined. The NER domain chosen for these experiments is the biomedical publications domain, especially selected due to its importance and inherent challenges. A simple machine learning approach is used that eliminates prior language knowledge such as part-of-speech or noun phrase tagging thereby allowing for its applicability across languages. No domain-specific knowledge is included. The accuracy measures achieved are comparable to those obtained using more complex approaches, which constitutes a motivation to investigate ways to improve the scalability of multiclass SVM in order to make the solution more practical and useable. Improving training time of multi-class SVM would make support vector machines a more viable and practical machine learning solution for real-world problems with large datasets. An initial prototype results in great improvement of the training time at the expense of memory requirements.

Keywords: Bioinformatics, Support Vector Machines, Named entity recognition, language independence

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1372
4 Comparison of Domain and Hydrophobicity Features for the Prediction of Protein-Protein Interactions using Support Vector Machines

Authors: Razib M. Othman, Safaai Deris, Hany Alashwal

Abstract:

The protein domain structure has been widely used as the most informative sequence feature to computationally predict protein-protein interactions. However, in a recent study, a research group has reported a very high accuracy of 94% using hydrophobicity feature. Therefore, in this study we compare and verify the usefulness of protein domain structure and hydrophobicity properties as the sequence features. Using the Support Vector Machines (SVM) as the learning system, our results indicate that both features achieved accuracy of nearly 80%. Furthermore, domains structure had receiver operating characteristic (ROC) score of 0.8480 with running time of 34 seconds, while hydrophobicity had ROC score of 0.8159 with running time of 20,571 seconds (5.7 hours). These results indicate that protein-protein interaction can be predicted from domain structure with reliable accuracy and acceptable running time.

Keywords: Bioinformatics, Support Vector Machines, Protein-Protein Interactions, protein features

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1534
3 A New Predictor of Coding Regions in Genomic Sequences using a Combination of Different Approaches

Authors: Aníbal Rodríguez Fuentes, Juan V. Lorenzo Ginori, Ricardo Grau Ábalo

Abstract:

Identifying protein coding regions in DNA sequences is a basic step in the location of genes. Several approaches based on signal processing tools have been applied to solve this problem, trying to achieve more accurate predictions. This paper presents a new predictor that improves the efficacy of three techniques that use the Fourier Transform to predict coding regions, and that could be computed using an algorithm that reduces the computation load. Some ideas about the combination of the predictor with other methods are discussed. ROC curves are used to demonstrate the efficacy of the proposed predictor, based on the computation of 25 DNA sequences from three different organisms.

Keywords: Bioinformatics, Digital Signal Processing, Fourier transform, computational load reduction, Coding region prediction

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1291
2 One-Class Support Vector Machines for Protein-Protein Interactions Prediction

Authors: Razib M. Othman, Safaai Deris, Hany Alashwal

Abstract:

Predicting protein-protein interactions represent a key step in understanding proteins functions. This is due to the fact that proteins usually work in context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions. However, most of these techniques address this problem as a binary classification problem. Although it is easy to get a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to be considered as negative examples. Therefore, in this paper we solve this problem as a one-class classification problem using one-class support vector machines (SVM). Using only positive examples (interacting protein pairs) in training phase, the one-class SVM achieves accuracy of about 80%. These results imply that protein-protein interaction can be predicted using one-class classifier with comparable accuracy to the binary classifiers that use artificially constructed negative examples.

Keywords: Bioinformatics, Protein-Protein Interactions, One-Class Support Vector Machines

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1420
1 EEG Waves Classifier using Wavelet Transform and Fourier Transform

Authors: Maan M. Shaker

Abstract:

The electroencephalograph (EEG) signal is one of the most widely signal used in the bioinformatics field due to its rich information about human tasks. In this work EEG waves classification is achieved using the Discrete Wavelet Transform DWT with Fast Fourier Transform (FFT) by adopting the normalized EEG data. The DWT is used as a classifier of the EEG wave's frequencies, while FFT is implemented to visualize the EEG waves in multi-resolution of DWT. Several real EEG data sets (real EEG data for both normal and abnormal persons) have been tested and the results improve the validity of the proposed technique.

Keywords: Bioinformatics, DWT, FFT, EEG waves

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5043