Ghada Badr

Publications

2 CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm

Authors: Ghada Badr, Arwa Alturki

Abstract:

The biological function of an RNA molecule depends on its structure. The objective of the alignment is finding the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding and a discovery of other relationships between them. Besides, identifying non-coding RNAs -that is not translated into a protein- is a popular application in which RNA structural alignment is the first step A few methods for RNA structure-to-structure alignment have been developed. Most of these methods are partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention is given in the literature to the use of efficient RNA structure representation and the structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N2) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given as a component-based representation and where N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments are conducted illustrating the efficiency of the CompPSA algorithm when compared to other approaches and on different real and simulated datasets. The CompPSA algorithm shows an accurate similarity measure between components. The algorithm gives the flexibility for the user to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm proves scalability and efficiency in time and memory performance.

Keywords: Data Mining, alignment, RNA secondary structure, pairwise, component-based

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 556
1 BeamGA Median: A Hybrid Heuristic Search Approach

Authors: Ghada Badr, Manar Hosny, Nuha Bintayyash, Eman Albilali, Souad Larabi Marie-Sainte

Abstract:

The median problem is significantly applied to derive the most reasonable rearrangement phylogenetic tree for many species. More specifically, the problem is concerned with finding a permutation that minimizes the sum of distances between itself and a set of three signed permutations. Genomes with equal number of genes but different order can be represented as permutations. In this paper, an algorithm, namely BeamGA median, is proposed that combines a heuristic search approach (local beam) as an initialization step to generate a number of solutions, and then a Genetic Algorithm (GA) is applied in order to refine the solutions, aiming to achieve a better median with the smallest possible reversal distance from the three original permutations. In this approach, any genome rearrangement distance can be applied. In this paper, we use the reversal distance. To the best of our knowledge, the proposed approach was not applied before for solving the median problem. Our approach considers true biological evolution scenario by applying the concept of common intervals during the GA optimization process. This allows us to imitate a true biological behavior and enhance genetic approach time convergence. We were able to handle permutations with a large number of genes, within an acceptable time performance and with same or better accuracy as compared to existing algorithms.

Keywords: Genetic Algorithm, Phylogenetic Tree, beam search, median problem, permutation, genome rearrangement distance

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 565

Abstracts

3 BeamGA Median: A Hybrid Heuristic Search Approach

Authors: Ghada Badr, Manar Hosny, Nuha Bintayyash, Eman Albilali, Souad Larabi Marie-Sainte

Abstract:

The median problem is significantly applied to derive the most reasonable rearrangement phylogenetic tree for many species. More specifically, the problem is concerned with finding a permutation that minimizes the sum of distances between itself and a set of three signed permutations. Genomes with equal number of genes but different order can be represented as permutations. In this paper, an algorithm, namely BeamGA median, is proposed that combines a heuristic search approach (local beam) as an initialization step to generate a number of solutions, and then a Genetic Algorithm (GA) is applied in order to refine the solutions, aiming to achieve a better median with the smallest possible reversal distance from the three original permutations. In this approach, any genome rearrangement distance can be applied. In this paper, we use the reversal distance. To the best of our knowledge, the proposed approach was not applied before for solving the median problem. Our approach considers true biological evolution scenario by applying the concept of common intervals during the GA optimization process. This allows us to imitate a true biological behavior and enhance genetic approach time convergence. We were able to handle permutations with a large number of genes, within an acceptable time performance and with same or better accuracy as compared to existing algorithms.

Keywords: Genetic Algorithm, Phylogenetic Tree, beam search, median problem, permutation, genome rearrangement distance

Procedia PDF Downloads 99
2 CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm

Authors: Ghada Badr, Arwa Alturki

Abstract:

The biological function of an RNA molecule depends on its structure. The objective of the alignment is finding the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding and a discovery of other relationships between them. Besides, identifying non-coding RNAs -that is not translated into a protein- is a popular application in which RNA structural alignment is the first step A few methods for RNA structure-to-structure alignment have been developed. Most of these methods are partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention is given in the literature to the use of efficient RNA structure representation and the structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N2) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given as a component-based representation and where N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments are conducted illustrating the efficiency of the CompPSA algorithm when compared to other approaches and on different real and simulated datasets. The CompPSA algorithm shows an accurate similarity measure between components. The algorithm gives the flexibility for the user to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm proves scalability and efficiency in time and memory performance.

Keywords: Data Mining, alignment, RNA secondary structure, pairwise, component-based

Procedia PDF Downloads 253
1 A Review of Effective Gene Selection Methods for Cancer Classification Using Microarray Gene Expression Profile

Authors: Ghada Badr, Hala Alshamlan, Yousef Alohali

Abstract:

Cancer is one of the dreadful diseases, which causes considerable death rate in humans. DNA microarray-based gene expression profiling has been emerged as an efficient technique for cancer classification, as well as for diagnosis, prognosis, and treatment purposes. In recent years, a DNA microarray technique has gained more attraction in both scientific and in industrial fields. It is important to determine the informative genes that cause cancer to improve early cancer diagnosis and to give effective chemotherapy treatment. In order to gain deep insight into the cancer classification problem, it is necessary to take a closer look at the proposed gene selection methods. We believe that they should be an integral preprocessing step for cancer classification. Furthermore, finding an accurate gene selection method is a very significant issue in a cancer classification area because it reduces the dimensionality of microarray dataset and selects informative genes. In this paper, we classify and review the state-of-art gene selection methods. We proceed by evaluating the performance of each gene selection approach based on their classification accuracy and number of informative genes. In our evaluation, we will use four benchmark microarray datasets for the cancer diagnosis (leukemia, colon, lung, and prostate). In addition, we compare the performance of gene selection method to investigate the effective gene selection method that has the ability to identify a small set of marker genes, and ensure high cancer classification accuracy. To the best of our knowledge, this is the first attempt to compare gene selection approaches for cancer classification using microarray gene expression profile.

Keywords: Feature selection, Gene Selection, Microarray, cancer classification, gene expression profile

Procedia PDF Downloads 298