Search results for: protein sequences

2777 Bioinformatics Approach to Identify Physicochemical and Structural Properties Associated with Successful Cell-free Protein Synthesis

Abstract:

Cell-free protein synthesis is widely used to synthesize recombinant proteins. It allows genome-scale expression of various polypeptides under strictly controlled, uniform conditions. However, only a minor fraction of all proteins can be successfully expressed in the protein synthesis systems currently in use, and the factors determining expression success are poorly understood. A vast volume of data has now accumulated in cell-free expression databases, which makes possible a comprehensive bioinformatics analysis and the identification of multiple features associated with successful cell-free expression. Here, we describe an approach aimed at identifying multiple physicochemical and structural properties of amino acid sequences associated with protein solubility and aggregation, and we highlight the major correlations obtained using this approach. The developed method includes categorical assessment of the protein expression data, calculation and prediction of multiple properties of the expressed amino acid sequences, correlation of the individual properties with the expression scores, and evaluation of the statistical significance of the observed correlations. Using this approach, we revealed a number of statistically significant correlations between calculated and predicted features of protein sequences and their amenability to cell-free expression. Some features, such as protein pI, hydrophobicity, and the presence of signal sequences, are mostly related to protein solubility, whereas others, such as protein length, number of disulfide bonds, and secondary structure content, mainly affect expression propensity. We also demonstrated that the amenability of polypeptide sequences to cell-free expression correlates with the presence of multiple sites of post-translational modification. The correlations revealed in this study provide important insights into protein folding and the rationalization of protein production. The developed bioinformatics approach can be of practical use for predicting expression success and optimizing cell-free protein synthesis.

Keywords: bioinformatics analysis, cell-free protein synthesis, expression success, optimization, recombinant proteins

Procedia PDF Downloads 380

2776 A Similarity/Dissimilarity Measure to Biological Sequence Alignment

Authors: Muhammad A. Khan, Waseem Shahzad

Abstract:

Protein sequences are analyzed to discover their structural and ancestral relationships. Sequence similarity is used to infer similar protein structure, similar function, and homology. Biological sequences composed of amino acid residues or nucleotides provide significant information through sequence alignment. In this paper, we present a new similarity/dissimilarity measure for sequence alignment based on the primary structure of a protein. The approach finds the distance between two given sequences using a novel sequence alignment algorithm and a mathematical model. The algorithm runs in O(n²) time. A distance matrix is generated to construct a phylogenetic tree of different species. The new similarity/dissimilarity measure is reported to outperform existing methods.
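
The abstract does not spell out the new measure, so the following is only an illustrative sketch of the general pipeline it describes: a quadratic-time pairwise distance, followed by a distance matrix that a tree-building method could consume. Plain Levenshtein edit distance stands in for the authors' measure, and the function names and toy sequences are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """Symmetric matrix of pairwise edit distances (stand-in measure)."""
    n = len(seqs)
    return [[edit_distance(seqs[i], seqs[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy peptide fragments, invented for illustration only.
    for row in distance_matrix(["MKVLA", "MKILA", "MRILG"]):
        print(row)
```

The resulting matrix could be passed to a distance-based tree method such as neighbor joining; the abstract does not say which tree construction the authors used.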

Keywords: alignment, distance, homology, mathematical model, phylogenetic tree

Procedia PDF Downloads 146

2775 A Protein-Wave Alignment Tool for Frequency Related Homologies Identification in Polypeptide Sequences

Authors: Victor Prevost, Solene Landerneau, Michel Duhamel, Joel Sternheimer, Olivier Gallet, Pedro Ferrandiz, Marwa Mokni

Abstract:

The search for homologous proteins is one of the ongoing challenges in biology and bioinformatics. Traditionally, a pair of proteins is thought to be homologous when they originate from the same ancestral protein. In such a case, their sequences share similarities, and considerable research effort has been spent investigating this question. On this basis, we propose the Protein-Wave Alignment Tool ("P-WAT"), developed within the framework of the France Relance 2030 plan. Our work takes into consideration the mass-related wave aspect of protein biosynthesis by associating a specific frequency with each amino acid according to its mass. Amino acids are then regrouped by mass category. In this way, our algorithm produces specific alignments in addition to those obtained with a common amino acid coding system. For this purpose, we developed the original "P-WAT" algorithm, which can address large protein databases with different attributes, such as species and protein names, that allow us to align user requests with a set of specific protein sequences. The primary intent of this algorithm is to achieve efficient alignments within this conceptual frame by minimizing execution costs and information loss. The algorithm identifies sequence similarities by searching for matches of sub-sequences of different sizes, referred to as primers. It relies on Boolean operations on a dot plot matrix to identify primer amino acids common to both proteins that are likely to be part of a significant peptide alignment. From those primers, dynamic programming-like traceback operations generate alignments and alignment scores based on an adjusted PAM250 matrix.
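
The dot plot and primer search can be pictured with a small sketch. This is not the P-WAT implementation: the function names, the `group` parameter, and the toy sequences are hypothetical, and the traceback and PAM250 scoring steps are omitted. The only grouping used is leucine with isoleucine, two residues that have identical mass; the tool's actual mass categories are not given in the abstract.

```python
def dot_plot(a, b, group=None):
    """Boolean matrix: cell [i][j] is True when a[i] and b[j] fall in the
    same category. `group` maps a residue to its category label; residues
    missing from the map form their own category."""
    group = group or {}
    ea = [group.get(c, c) for c in a]
    eb = [group.get(c, c) for c in b]
    return [[x == y for y in eb] for x in ea]


def find_primers(matrix, min_len=3):
    """Return (i, j, length) for each maximal diagonal run of True cells
    that is at least min_len long."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    primers = []
    for i in range(rows):
        for j in range(cols):
            if not matrix[i][j]:
                continue
            if i > 0 and j > 0 and matrix[i - 1][j - 1]:
                continue  # not the start of a run
            length = 0
            while (i + length < rows and j + length < cols
                   and matrix[i + length][j + length]):
                length += 1
            if length >= min_len:
                primers.append((i, j, length))
    return primers


if __name__ == "__main__":
    same_mass = {"I": "L"}  # Leu and Ile are isomers with equal mass
    print(find_primers(dot_plot("MKLVD", "AKIVDE", same_mass)))
```

With the grouping, the run K-L-V-D aligns with K-I-V-D as a single primer; without it, the L/I mismatch splits the diagonal into runs shorter than `min_len`.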

Keywords: protein, alignment, homologous, Genodic

Procedia PDF Downloads 75

2774 Constructing Orthogonal De Bruijn and Kautz Sequences and Applications

Authors: Yaw-Ling Lin

Abstract:

A de Bruijn graph of order k is a graph whose vertices represent all length-k sequences, with edges joining pairs of vertices whose sequences have the maximum possible overlap (length k−1). Every Hamiltonian cycle of this graph defines a distinct, minimum-length de Bruijn sequence containing every k-mer exactly once. A Kautz sequence is the minimal-length sequence that produces all possible length-k sequences under the restriction that every two consecutive letters in the sequence must be different. A collection of de Bruijn/Kautz sequences is orthogonal if any two sequences differ maximally in sequence composition; that is, the maximum length of their common substrings is k. In this paper, we discuss how such a collection of (maximal) orthogonal de Bruijn/Kautz sequences can be constructed, and we use the algorithm to build a web application service for synthesized DNA and other related biomolecular sequences.
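
To make the definition concrete, here is a minimal sketch that builds one de Bruijn sequence and checks that, read cyclically, it contains every k-mer exactly once. It uses the well-known Lyndon-word (FKM) construction rather than the Hamiltonian-cycle view or the authors' algorithm, and it covers neither the Kautz restriction nor orthogonality. The function names are assumptions for this example.

```python
from itertools import product


def de_bruijn(alphabet: str, k: int) -> str:
    """One cyclic de Bruijn sequence of order k over `alphabet`,
    built by concatenating Lyndon words (FKM construction)."""
    n = len(alphabet)
    a = [0] * (k + 1)
    out = []

    def db(t, p):
        if t > k:
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def covers_all_kmers(sequence: str, alphabet: str, k: int) -> bool:
    """True if the cyclic sequence contains every length-k word exactly once."""
    wrapped = sequence + sequence[:k - 1]  # unroll the cycle
    windows = [wrapped[i:i + k] for i in range(len(sequence))]
    expected = {"".join(p) for p in product(alphabet, repeat=k)}
    return len(windows) == len(expected) and set(windows) == expected


if __name__ == "__main__":
    seq = de_bruijn("ACGT", 3)
    print(len(seq), covers_all_kmers(seq, "ACGT", 3))
```

For a DNA alphabet and k = 3 the sequence has 4³ = 64 letters, the minimum possible for covering all 64 3-mers cyclically.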

Keywords: biomolecular sequence synthesis, de Bruijn sequences, Eulerian cycle, Hamiltonian cycle, Kautz sequences, orthogonal sequences

Procedia PDF Downloads 118

2773 Parkinson's Disease Gene Identification Using Physicochemical Properties of Amino Acids

Authors: Priya Arora, Ashutosh Mishra

Abstract:

Identifying the mutated genes that lead to Parkinson’s disease is a challenge on the way to a proactive cure for the disorder. Computational analysis is an effective technique for exploring genes in the form of protein sequences, since theoretical and manual analysis is infeasible. The limitations and effectiveness of a particular computational method depend entirely on the previously available data for disease identification. This article presents a sequence-based classification method for identifying genes responsible for Parkinson’s disease. In the first phase, protein sequences are transformed into feature vectors using the physicochemical properties of amino acids. The second phase employs Jaccard distances to select negative genes from the candidate population. The third phase uses artificial neural networks to make the final predictions. The proposed approach is compared with state-of-the-art methods on the basis of F-measure. The results confirm and estimate the efficiency of the method.
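
The second phase can be illustrated with a small sketch of Jaccard-distance-based negative selection. The abstract does not say which sets the distance is computed on or what the selection rule is, so both are assumptions here: each sequence is represented by its set of 2-mers, and the candidates farthest from every known positive are kept as negatives. All names and sequences are hypothetical.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def kmers(seq: str, k: int = 2) -> set:
    """Set of overlapping k-mers (an assumed feature representation)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_negatives(positives, candidates, n):
    """Keep the n candidates whose nearest positive is farthest away."""
    def nearest_positive(candidate):
        return min(jaccard_distance(kmers(candidate), kmers(p))
                   for p in positives)
    return sorted(candidates, key=nearest_positive, reverse=True)[:n]


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(select_negatives(["MKVL"], ["MKVL", "GGGG", "MKVA"], 1))
```

Choosing negatives that are far from all positives is one common heuristic for reducing label noise when only positive examples are known; the article may use a different rule.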

Keywords: disease gene identification, Parkinson’s disease, physicochemical properties of amino acid, protein sequences

Procedia PDF Downloads 101

2772 Gene Prediction in DNA Sequences Using an Ensemble Algorithm Based on Goertzel Algorithm and Anti-Notch Filter

Authors: Hamidreza Saberkari, Mousa Shamsi, Hossein Ahmadi, Saeed Vaali, MohammadHossein Sedaaghi

Abstract:

In recent years, using signal processing tools for accurate identification of protein coding regions has become a challenge in bioinformatics. Most genomic signal processing methods are based on the period-3 characteristic of the nucleotides in DNA strands; consequently, spectral analysis is applied to numerical sequences of DNA to locate the periodic components. In this paper, a novel ensemble algorithm for gene selection in DNA sequences is presented, based on a combination of the Goertzel algorithm and an anti-notch filter (ANF). The proposed algorithm has advantages over conventional methods. First, it identifies protein coding regions more accurately because the Goertzel algorithm is tuned to the desired frequency. Second, it achieves faster detection. The proposed algorithm is applied to several genes, including genes available in the BG570 and HMR195 databases, and the results are compared with other methods using nucleotide-level evaluation criteria. Implementation results show the excellent performance of the proposed algorithm in identifying protein coding regions, particularly small-scale gene areas.
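
A minimal sketch of the Goertzel step, tuned to the period-3 frequency 2π/3, may help. It assumes the DNA is mapped to four binary indicator sequences (one per base), a common numerical representation that the abstract does not confirm. The anti-notch filter stage and the ensemble logic are omitted, and the function names are hypothetical.

```python
import math


def goertzel_power(samples, omega):
    """Squared magnitude of the Fourier transform of `samples` at angular
    frequency `omega`, computed with the Goertzel recursion."""
    coeff = 2.0 * math.cos(omega)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def period3_power(dna: str) -> float:
    """Sum of period-3 spectral power over the four base indicator sequences."""
    omega = 2.0 * math.pi / 3.0  # one cycle every three nucleotides
    total = 0.0
    for base in "ACGT":
        indicator = [1.0 if c == base else 0.0 for c in dna.upper()]
        total += goertzel_power(indicator, omega)
    return total


if __name__ == "__main__":
    print(period3_power("ATG" * 30))  # strongly period-3
    print(period3_power("A" * 90))    # no period-3 component
```

In practice the power would be evaluated in a sliding window along the sequence, with peaks suggesting coding regions. Goertzel is attractive here because only a single frequency is needed, so a full FFT is unnecessary.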

Keywords: protein coding regions, period-3, anti-notch filter, Goertzel algorithm

Procedia PDF Downloads 357

2771 Genome-Wide Analysis of BES1/BZR1 Gene Family in Five Plant Species

Authors: Jafar Ahmadi, Zhohreh Asiaban, Sedigheh Fabriki Ourang

Abstract:

Brassinosteroids (BRs) regulate cell elongation, vascular differentiation, senescence, and stress responses. BRs signal through the BES1/BZR1 family of transcription factors, which regulate hundreds of target genes involved in this pathway. In this research, a comprehensive genome-wide analysis of the BES1/BZR1 gene family was carried out in Arabidopsis thaliana, Cucumis sativus, Vitis vinifera, Glycine max, and Brachypodium distachyon. Specifications of the sequences, dot plots, and hydropathy plots were analyzed for the protein and genome sequences of the five plant species. The longest protein sequence was Brdic3g with 374 aa, and the shortest was Gm7g with 163 aa. The maximum instability index was 79.99 for protein sequence AT1G19350, and the minimum was 33.22 for protein sequence Gm5g. The aliphatic index of these protein sequences ranged from 47.82 to 78.79 in Arabidopsis thaliana, 49.91 to 57.50 in Vitis vinifera, 55.09 to 82.43 in Glycine max, 54.09 to 54.28 in Brachypodium distachyon, and 55.36 to 56.83 in Cucumis sativus. Overall, the data obtained from our investigation contribute to a better understanding of the complexity of the BES1/BZR1 gene family and provide a first step towards directing future experimental designs for systematic analysis of the functions of this gene family.
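
The abstract reports aliphatic index values without saying how they were computed. A minimal sketch, assuming the standard definition by Ikai, AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue (the function name and toy inputs are hypothetical):

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index of a protein sequence (Ikai definition, assumed).

    X(aa) is the mole percent of residue aa; 2.9 and 3.9 weight the
    side-chain volumes of Val and Ile/Leu relative to Ala."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Toy peptide, invented for illustration only.
    print(round(aliphatic_index("MKVLAAGIL"), 2))
```

Higher values indicate a larger relative volume of aliphatic side chains, which is usually read as a sign of greater thermostability in globular proteins.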

Keywords: BES1/BZR1, brassinosteroids, phylogenetic analysis, transcription factor

Procedia PDF Downloads 302

2770 Text Mining Techniques for Prioritizing Pathogenic Mutations in Protein Families Known to Misfold or Aggregate

Authors: Khaleel Saleh Al-Rababah

Abstract:

Amyloid fibril-forming regions, known as protein aggregates, in the sequences of some protein families are associated with a number of diseases known as amyloidoses. Mutations play a role in fibril formation by accelerating the process. In this paper, we aim to extract the diseases caused by those mutations as a result of their impact on the structural and functional properties of the aggregated protein. We propose a text mining system that automatically extracts mutations, diseases, and the relations between them. We present a finite-state algorithm to cluster mutations found in the same sentence, since a single sentence can contain different mutations that cause different diseases. We also present a coreference algorithm that enables cross-linking of sentences.

Keywords: amyloid, amyloidosis, co reference, protein, text mining

Procedia PDF Downloads 491

2769 Analysis on Thermococcus achaeans with Frequent Pattern Mining

Authors: Jeongyeob Hong, Myeonghoon Park, Taeson Yoon

Abstract:

Since the discovery of archaeans, which use distinct metabolic pathways and have conspicuously different cellular structures, they have been recognized as possible resources for improving human quality of life. Among the diverse archaeans, this paper compares the 16S rRNA sequences of four species of Thermococcus, an archaeal genus specialized in sulfur metabolism. The four species, T. barophilus, T. kodakarensis, T. hydrothermalis, and T. onnurineus, live near hydrothermal vents that emit extreme amounts of sulfur and heat. By comparing the ribosomal sequences of these four species, we found similarities in their sequences and expressed proteins, which suggests that certain ribosomal sequences or proteins are vital for their survival. The Apriori algorithm and decision trees were used for the comparison.

Keywords: Achaeans, Thermococcus, apriori algorithm, decision tree

Procedia PDF Downloads 261

2768 Comparison of Physicochemical Properties of Catfish Myofibrillar and Sarcoplasmic Protein Hydrolysates and Characterization of Their Bioactive Peptides

Authors: Leila Najafian

Abstract:

Sarcoplasmic protein hydrolysates (SPHs) and myofibrillar protein hydrolysates (MPHs) from patin (Pangasius sutchi) were produced using two proteases, papain and Alcalase. 1,1-diphenyl-2-picrylhydrazyl (DPPH) and 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulphonic acid) diammonium salt (ABTS) radical scavenging assays and a metal chelating assay were carried out on the SPHs and MPHs to measure antioxidant activity. The hydrolysates were isolated and purified by ultrafiltration, gel filtration, and reverse-phase high-performance liquid chromatography (RP-HPLC), and liquid chromatography with tandem mass spectrometry detection (LC-MS/MS) was used to identify peptide sequences. The results showed that protein solubility increased as the degree of hydrolysis (DH) of the MPHs increased, while the protein solubility of the SPHs was highest after 60 min of incubation. The effect of DH on the antioxidant activities of SPHs and MPHs was investigated. Among the hydrolysates, papain-MPH and Alcalase-SPH, which had the highest antioxidant activities, were purified. The potent fractions obtained from RP-HPLC of the sarcoplasmic (SI 3 fraction) and myofibrillar (MI 4 fraction) hydrolysates showed the highest DPPH radical scavenging activity. The FVNQPYLLYSVHMK peptide from MPH and the LVVDIPAALQHA peptide from SPH exhibited the highest antioxidant activity. The presence of hydrophobic and hydrophilic amino acids, namely leucine (L), valine (V), phenylalanine (F), histidine (H), and proline (P), in the peptide sequences of SPH and MPH is believed to contribute to the high antioxidant activity. Hence, SPH and MPH from patin have potential as natural functional ingredients in the food and pharmaceutical industries.

Keywords: patin (Pangasius sutchi), protein hydrolysates, antioxidative peptides, mass spectrometry

Procedia PDF Downloads 232

2767 A Comprehensive Analysis of the Phylogenetic Signal in Ramp Sequences in 211 Vertebrates

Authors: Lauren M. McKinnon, Justin B. Miller, Michael F. Whiting, John S. K. Kauwe, Perry G. Ridge

Abstract:

Background: Ramp sequences increase translational speed and accuracy when rare, slowly translated codons are found at the beginnings of genes. Here, the results of the first analysis of ramp sequences in a phylogenetic construct are presented. Methods: Ramp sequences were compared across 211 vertebrates (110 mammalian and 101 non-mammalian). The presence and absence of ramp sequences were analyzed as a binary character in parsimony and maximum likelihood frameworks. Additionally, ramp sequences were mapped to the Open Tree of Life taxonomy to determine the number of parallelisms and reversals that occurred, and these results were compared to what would be expected due to random chance. Lastly, aligned nucleotides in ramp sequences were compared to the rest of the sequence in order to examine possible differences in phylogenetic signal between these regions of the gene. Results: Parsimony and maximum likelihood analyses of the presence/absence of ramp sequences recovered phylogenies that are highly congruent with established phylogenies. Additionally, the retention index of ramp sequences is significantly higher than would be expected due to random chance (p-value = 0). A chi-square analysis of completely orthologous ramp sequences resulted in a p-value of approximately zero as compared to random chance. Discussion: Ramp sequences recover phylogenies comparable to those of other phylogenomic methods. Although not all ramp sequences appear to have a phylogenetic signal, more ramp sequences track speciation than expected by random chance. Therefore, ramp sequences may be used in conjunction with other phylogenomic approaches.

Keywords: codon usage bias, phylogenetics, phylogenomics, ramp sequence

Procedia PDF Downloads 122

2766 Predicting Potential Protein Therapeutic Candidates from the Gut Microbiome

Authors: Prasanna Ramachandran, Kareem Graham, Helena Kiefel, Sunit Jain, Todd DeSantis

Abstract:

Microbes that reside inside the mammalian GI tract, commonly referred to as the gut microbiome, have been shown to have therapeutic effects in animal models of disease. We hypothesize that specific proteins produced by these microbes are responsible for this activity and may be used directly as therapeutics. To speed up the discovery of these key proteins from the big-data metagenomics, we have applied machine learning techniques. Using amino acid sequences of known epitopes and their corresponding binding partners, protein interaction descriptors (PID) were calculated, making a positive interaction set. A negative interaction dataset was calculated using sequences of proteins known not to interact with these same binding partners. Using Random Forest and positive and negative PID, a machine learning model was trained and used to predict interacting versus non-interacting proteins. Furthermore, the continuous variable, cosine similarity in the interaction descriptors was used to rank bacterial therapeutic candidates. Laboratory binding assays were conducted to test the candidates for their potential as therapeutics. Results from binding assays reveal the accuracy of the machine learning prediction and are subsequently used to further improve the model.
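
The ranking step can be sketched as follows. The abstract does not state which vectors the cosine similarity is taken between, so this example assumes each candidate's descriptor vector is compared against a single reference descriptor from a known positive interaction. The Random Forest classifier is omitted, and all names and numbers are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors;
    returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(reference, candidates):
    """Candidate names sorted from most to least similar to `reference`.
    `candidates` maps a name to its descriptor vector."""
    return sorted(candidates,
                  key=lambda name: cosine_similarity(reference, candidates[name]),
                  reverse=True)


if __name__ == "__main__":
    # Toy descriptor vectors, invented for illustration only.
    reference = [1.0, 1.0]
    candidates = {"a": [1.0, 0.0], "b": [2.0, 2.0], "c": [-1.0, -1.0]}
    print(rank_candidates(reference, candidates))
```

Cosine similarity ignores vector magnitude, so candidate "b" ranks first even though its descriptor is twice as long as the reference.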

Keywords: protein-interactions, machine-learning, metagenomics, microbiome

Procedia PDF Downloads 337

2765 Estimation of Transition and Emission Probabilities

Authors: Aakansha Gupta, Neha Vadnere, Tapasvi Soni, M. Anbarsi

Abstract:

Protein secondary structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry; it is highly important in medicine and biotechnology. Some aspects of protein function and genome analysis can be predicted from secondary structure. Such predictions help annotate sequences, classify proteins, identify domains, and recognize functional motifs. In this paper, we represent protein secondary structure as a mathematical model. To extract and predict the secondary structure from the primary structure, we require a set of parameters. These parameters specify any constants appearing in the model and provide a mechanism for efficient and accurate use of data. Many algorithms exist for estimating these model parameters, the most popular of which is the Expectation Maximization (EM) algorithm. The model parameters are estimated from protein datasets such as RS126 using a Bayesian probabilistic method (the dataset being categorical). This work can be extended to compare the efficiency of the EM algorithm with that of other algorithms for estimating the model parameters, which would in turn lead to an efficient component for protein secondary structure prediction. Further, this paper provides scope for using these parameters to predict the secondary structure of proteins with machine learning techniques such as neural networks and fuzzy logic. The ultimate objective is to obtain greater accuracy than previously achieved.
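
To show what the transition and emission probabilities of the title are, the sketch below estimates them by simple counting from sequences whose secondary-structure labels are already known. This supervised counting is a swapped-in simplification, not the EM or Bayesian procedure the abstract refers to. The three-state labels (H, E, C), the pseudocount, and the function name are assumptions.

```python
def estimate_parameters(examples, states="HEC",
                        alphabet="ACDEFGHIKLMNPQRSTVWY", pseudocount=1.0):
    """Count-based estimates for a hidden Markov model.

    `examples` is a list of (residues, labels) string pairs of equal length.
    Returns (transition, emission), each a dict of dicts of probabilities:
    transition[s][t] = P(next state t | state s),
    emission[s][a]   = P(residue a | state s)."""
    trans = {s: {t: pseudocount for t in states} for s in states}
    emit = {s: {a: pseudocount for a in alphabet} for s in states}
    for residues, labels in examples:
        if len(residues) != len(labels):
            raise ValueError("residues and labels must have equal length")
        for i, (aa, state) in enumerate(zip(residues, labels)):
            emit[state][aa] += 1
            if i + 1 < len(labels):
                trans[state][labels[i + 1]] += 1

    def normalize(table):
        result = {}
        for s, row in table.items():
            total = sum(row.values())
            result[s] = {k: (v / total if total else 0.0)
                         for k, v in row.items()}
        return result

    return normalize(trans), normalize(emit)


if __name__ == "__main__":
    # Toy labeled example, invented for illustration only.
    transition, emission = estimate_parameters([("AAGG", "HHCC")],
                                               states="HC", alphabet="AG",
                                               pseudocount=0.0)
    print(transition)
    print(emission)
```

When labels are unavailable, EM (the Baum-Welch algorithm for hidden Markov models) replaces these hard counts with expected counts and iterates until the estimates converge.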

Keywords: model parameters, expectation maximization algorithm, protein secondary structure prediction, bioinformatics

Procedia PDF Downloads 437

2764 Lentil Protein Fortification in Cranberry Squash

Authors: Sandhya Devi A

Abstract:

The protein content of cranberry squash (0 g protein) may be increased by adding protein extracted from lentils (9 g protein), a food particularly linked to a lower risk of developing heart disease. Protein can be extracted from lentil flour by alkaline extraction. The alkaline extraction was optimized using a response surface approach to maximize both protein content and yield. A protein-fortified cranberry squash can then be obtained by preparing a protein fortification syrup and processing it into the squash.

Keywords: alkaline extraction, cranberry squash, protein fortification, response surface methodology

Procedia PDF Downloads 74

2763 Advances on the Understanding of Sequence Convergence Seen from the Perspective of Mathematical Working Spaces

Authors: Paula Verdugo-Hernandez, Patricio Cumsille

Abstract:

We analyze a first class on the convergence of real number sequences, hereafter called sequences, designed to foster exploration and discovery of concepts through graphical representations before engaging students in proofs. The main goal was to differentiate between sequences and continuous functions of a real variable and to better understand the concepts at an initial stage. We applied the analytic framework of mathematical working spaces, which we expect to help extend to sequences since, as far as we know, it has been developed only for other objects. The framework is relevant for analyzing how mathematical work is built systematically by connecting the epistemological and cognitive perspectives and involving the semiotic, instrumental, and discursive dimensions.

Keywords: convergence, graphical representations, mathematical working spaces, paradigms of real analysis, real number sequences

Procedia PDF Downloads 109

2762 Hydration of Protein-RNA Recognition Sites

Authors: Amita Barik, Ranjit Prasad Bahadur

Abstract:

We investigate the role of water molecules in 89 protein-RNA complexes taken from the Protein Data Bank. Those with tRNA and single-stranded RNA are less hydrated than those with duplex RNA or ribosomal proteins. Protein-RNA interfaces are hydrated less than protein-DNA interfaces but more than protein-protein interfaces. The majority of the waters at protein-RNA interfaces make multiple H-bonds; however, a fraction make none. Those making H-bonds prefer the polar groups of the RNA over those of its partner protein. The spatial distribution of waters makes interfaces with ribosomal proteins and single-stranded RNA relatively ‘dry’ compared with interfaces with tRNA and duplex RNA. In contrast to protein-DNA interfaces, and mainly due to the presence of the 2’OH, the ribose in protein-RNA interfaces is hydrated more than the phosphate or the bases. The minor groove in protein-RNA interfaces is hydrated more than the major groove, while in protein-DNA interfaces the reverse is true. The strands make the highest number of water-mediated H-bonds per unit interface area, followed by the helices and the non-regular structures. The preserved waters at protein-RNA interfaces make a higher number of H-bonds than the other waters. Preserved waters contribute to the affinity of protein-RNA recognition and should be treated carefully when engineering protein-RNA interfaces.

Keywords: h-bonds, minor-major grooves, preserved water, protein-RNA interfaces

Procedia PDF Downloads 254

2761 Protein Crystallization Induced by Surface Plasmon Resonance

Authors: Tetsuo Okutsu

Abstract:

We have developed a crystallization plate that promotes protein crystallization. A gold thin film is deposited on the crystallization plate. A protein solution is dropped onto the film, and crystallization is promoted when the protein is irradiated with light of a wavelength that the protein does not absorb. Protein is densely adsorbed on the surface of the gold thin film. The light excites the surface plasmon resonance of the gold thin film, the protein is excited by the resulting enhanced electric field, and amino acid residues are radicalized to produce protein dimers. The dimers function as templates for protein crystals, so crystallization is promoted.

Keywords: lysozyme, plasmon, protein, crystallization, RNaseA

Procedia PDF Downloads 183

2760 Prediction of All-Beta Protein Secondary Structure Using Garnier-Osguthorpe-Robson Method

Authors: K. Tejasri, K. Suvarna Vani, S. Prathyusha, S. Ramya

Abstract:

Proteins are chains of amino acids joined by peptide bonds. Many different conformations of the chain are possible due to the multiple combinations of amino acids and rotation at numerous positions along the chain. Protein structure prediction is one of the crucial goals pursued in bioinformatics and theoretical chemistry. Among the four structural levels of proteins, we focus mainly on secondary structure, which generally comprises alpha-helices and beta-sheets. Multi-class classification of imbalanced data is a real challenge and has to be addressed for the beta-strands. In an imbalanced data distribution, some classes have very few training samples compared with other classes. The secondary structure data is extracted from the protein primary sequence, and the beta-strands are predicted using suitable machine learning algorithms.

Keywords: proteins, secondary structure elements, beta-sheets, beta-strands, alpha-helices, machine learning algorithms

Procedia PDF Downloads 59

2759 Unifying RSV Evolutionary Dynamics and Epidemiology Through Phylodynamic Analyses

Authors: Lydia Tan, Philippe Lemey, Lieselot Houspie, Marco Viveen, Darren Martin, Frank Coenjaerts

Abstract:

Introduction: Human respiratory syncytial virus (hRSV) is the leading cause of severe respiratory tract infections in infants under the age of two. Genomic substitutions and the related evolutionary dynamics of hRSV greatly influence virus transmission behavior. The evolutionary patterns formed are due to a precarious interplay between the host immune response and RSV, which selects the most viable and least immunogenic strains. Studying genomic profiles can teach us which genes and consequent proteins play an important role in RSV survival and transmission dynamics. Study design: In this study, genetic diversity and evolutionary rate analyses were conducted on 36 RSV subgroup B whole genome sequences and 37 subgroup A genome sequences. Clinical RSV isolates were obtained from nasopharyngeal aspirates and swabs of children between 2 weeks and 5 years of age. These strains were collected during epidemic seasons from 2001 to 2011 in the Netherlands and Belgium and sequenced by either conventional or 454 sequencing. Sequences were analyzed for genetic diversity, recombination events, synonymous/non-synonymous substitution ratios, and epistasis, and the translational consequences of mutations were mapped to known 3D protein structures. We used Bayesian statistical inference to estimate the rate of RSV genome evolution and the rate of variability across the genome. Results: The A and B profiles were described in detail and compared to each other. Overall, the majority of the whole RSV genome is highly conserved among all strains. The attachment protein G was the most variable protein, and its gene had, similar to the non-coding regions in RSV, more elevated (two-fold) substitution rates than other genes. In addition, the G gene has been identified as the major target for diversifying selection. Overall, less gene and protein variability was found within RSV-B compared to RSV-A, and most protein variation between the subgroups was found in the F, G, SH and M2-2 proteins. For the F protein, mutations and correlated amino acid changes are largely located in the F2 ligand-binding domain. The small hydrophobic phosphoprotein and nucleoprotein are the most conserved proteins. The evolutionary rates were similar in both subgroups (A: 6.47E-04, B: 7.76E-04 substitution/site/yr), but estimates of the time to the most recent common ancestor were much lower for RSV-B (B: 19, A: 46.8 yrs), indicating that there is more turnover in this subgroup. Conclusion: This study provides a detailed description of whole RSV genome mutations, their effect on translation products, and the first estimate of the tempo of RSV genome evolution. The immunogenic G protein seems to require high substitution rates in order to select less immunogenic strains, and other conserved proteins are most likely essential to preserve RSV viability. The resulting G gene variability makes its protein a less interesting target for RSV intervention methods. The more conserved RSV F protein, with less antigenic epitope shedding, is therefore more suitable for developing therapeutic strategies or vaccines.

Keywords: drug target selection, epidemiology, respiratory syncytial virus, RSV

Procedia PDF Downloads 378

2758 Alternative Splicing of an Arabidopsis Gene, At2g24600, Encoding Ankyrin-Repeat Protein

Authors: H. Sakamoto, S. Kurosawa, M. Suzuki, S. Oguri

Abstract:

In Arabidopsis, several genes encoding proteins with ankyrin repeats and trans-membrane domains (AtANKTM) have been identified as mediators of biotic and abiotic stress responses. It is known that the expression of an AtANKTM gene, At2g24600, is induced in response to abiotic stress and that four splicing variants are derived from this locus. In this study, a previously unknown splicing variant of the At2g24600 transcript was identified by RT-PCR and sequencing analysis. Based on differences in the predicted amino acid sequences, the five splicing variants are divided into three groups. The three predicted proteins are highly homologous, yet have different numbers of ankyrin repeats and trans-membrane domains. It is generally considered that ankyrin repeats mediate protein-protein interaction and that the number of trans-membrane domains affects the membrane topology of proteins. The protein variants derived from the At2g24600 locus may therefore have different molecular functions from each other.

Keywords: alternative splicing, ankyrin repeats, trans-membrane domains, arabidopsis

Procedia PDF Downloads 341

2757 In-Vivo Association of Multivalent 11 Zinc Fingers Transcriptional Factors CTCF and Boris to YB-1 in Multiforme Glioma-RGBM Cell Line

Authors: Daruliza Kernain, Shaharum Shamsuddin, See Too Wei Cun

Abstract:

CTCF is a unique, highly conserved, and ubiquitously expressed 11-zinc-finger (ZF) transcription factor with multiple target sites. It can bind to various target sequences to perform different regulatory roles, including promoter activation or repression, creation of hormone-responsive gene-silencing elements, and functional blocking of enhancer-promoter interactions. CTCF binds to its target sites through combinations of different ZF domains. BORIS (brother of the regulator of imprinted sites), which is expressed only in the testis and in certain cancer cell lines, is homologous to CTCF in its 11 ZF domains. Since both transcription factors share the same ZF domains, both may bind to the same target sequences. In this study, the interaction of these two proteins with the multifunctional Y-box DNA/RNA-binding factor YB-1 was determined. The protein-protein interactions CTCF/YB-1 and BORIS/YB-1 were detected by reciprocal co-immunoprecipitation (Co-IP) experiments on RGBM total cell lysate. The results showed that both CTCF and BORIS were able to interact with YB-1 in the glioma RGBM cell line. To the best of our knowledge, this is the first finding demonstrating the ability of BORIS and YB-1 to form a complex in vivo.

Keywords: immunoprecipitation, CTCF/BORIS/YB-1, transcription factor, molecular medicine

Procedia PDF Downloads 233

2756 Design and in Silico Study of the Truncated Spike-M-N SARS-CoV-2 as a Novel Effective Vaccine Candidate

Authors: Aghasadeghi MR., Bahramali G., Sadat SM., Sadeghi SA., Yousefi M., Khodaei K., Ghorbani M., Sadat Larijani M.

Abstract:

Background: The emerging COVID-19 pandemic is a serious concern for public health worldwide. Despite the many mutations in the virus genome, it is important to find an effective vaccine against viral mutations. Therefore, in the current study, we aimed at an immunoinformatic evaluation of the immunogenicity of the virus proteins to design a preventive vaccine candidate that could elicit both humoral and cellular immune responses. Methods: Three antigenic regions were included: Spike, Membrane, and Nucleocapsid amino acid sequences were obtained, and possible fusion proteins were assessed and compared by immunogenicity, structural features, and population coverage. The best fusion protein was also evaluated for MHC-I and MHC-II T-cell epitopes and for linear and conformational B-cell epitopes. Results: Among the four predicted models, the truncated Spike protein in fusion with the M and N proteins contains 24 highly immunogenic human MHC class I and 29 MHC class II epitopes, along with 14 linear and 61 discontinuous B-cell epitopes. The selected protein also has high antigenicity and acceptable population coverage of 82.95% in Iran and 92.51% in Europe. Conclusion: The data indicate that the truncated Spike-M-N SARS-CoV-2 form could be a potential target of neutralizing antibodies. The protein also has the ability to stimulate humoral and cellular immunity. The in silico study identified the fusion protein as a potential preventive vaccine candidate for further in vivo evaluation.

Keywords: SARS-CoV-2, immunoinformatic, protein, vaccine

Procedia PDF Downloads 179

2755 Protein Remote Homology Detection and Fold Recognition by Combining Profiles with Kernel Methods

Authors: Bin Liu

Abstract:

Protein remote homology detection and fold recognition are two of the most important tasks in protein sequence analysis, which is critical for protein structure and function studies. In this study, we combined profile-based features with various string kernels and constructed several computational predictors for protein remote homology detection and fold recognition. Experimental results on two widely used benchmark datasets showed that these methods outperformed the competing methods, indicating that these predictors are useful computational tools for protein sequence analysis. By analyzing the discriminative features of the trained models, some interesting patterns were discovered that reflect the characteristics of protein superfamilies and folds, which are important for researchers interested in finding the patterns of protein folds.
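As a rough illustration of what a string kernel computes, the sketch below implements a plain k-mer spectrum kernel on raw sequences. It is not the profile-based kernel described in the abstract, whose details are not given here; the function names and the toy sequences are assumptions made only for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for k-mers it has never seen.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


# Hypothetical toy sequences, not taken from any benchmark dataset.
similarity = spectrum_kernel("MKVLAAGIV", "MKVLSAGIV", k=3)
```

A kernel value of this kind could be supplied to a Support Vector Machine as a precomputed similarity matrix; a profile-based variant would derive the k-mer features from a sequence profile rather than from the raw sequence.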

Keywords: protein remote homology detection, protein fold recognition, profile-based features, Support Vector Machines (SVMs)

Procedia PDF Downloads 124

2754 Carbon Based Classification of Aquaporin Proteins: A New Proposal

Authors: Parul Johri, Mala Trivedi

Abstract:

Major intrinsic proteins (MIPs) are actively involved in the passive transport of small polar molecules across the membranes of almost all living organisms. MIPs that specifically transport water molecules are named aquaporins (AQPs). The permeability of membranes is actively controlled by regulating the amount of the different MIPs present and, in some cases, by phosphorylation and dephosphorylation of the channel. Based on sequence similarity, MIPs have been classified into many categories. All proteins are made up of the same 20 amino acids; the only difference lies in their orientations. In turn, all 20 amino acids are made up of five basic elements, namely carbon, hydrogen, oxygen, sulphur and nitrogen. These elements give the amino acids their hydrophilic or hydrophobic properties, which play an important role in protein interactions. Hydrophobic amino acids characteristically have a greater number of carbon atoms, as carbon is the main element contributing to hydrophobic interactions in proteins. It is observed that the carbon level of proteins differs between species. In the present work, we took a sample set of 150 aquaporin proteins from the UniProt database and wrote a dynamic programming code to calculate the carbon percentage of each sequence. This carbon percentage was then used to barcode the aquaporins of animals and plants. The proteins taken from Oryza sativa, Zea mays and Arabidopsis thaliana had a carbon percentage of 31.8 to 35, whereas the sequences taken from Mus musculus, Saccharomyces cerevisiae, Homo sapiens, Bos taurus, and Rattus norvegicus had a carbon percentage of 31 to 33.7. This demarcates the carbon range of aquaporin proteins of plant and animal origin. Hence, atom-level analysis of protein sequences can provide better results than residue-level comparison.
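A carbon percentage of this kind could be computed along the following lines. This is a minimal sketch, assuming that "carbon percentage" means the share of carbon atoms among all atoms of the polypeptide; the abstract does not state the definition, and the straightforward summation used here is not the authors' dynamic programming code. The name `carbon_percentage` is hypothetical; the atom counts are the standard molecular formulas of the free amino acids.

```python
# Atom counts (C, H, N, O, S) of the 20 free amino acids.
ATOMS = {
    "A": (3, 7, 1, 2, 0),  "R": (6, 14, 4, 2, 0), "N": (4, 8, 2, 3, 0),
    "D": (4, 7, 1, 4, 0),  "C": (3, 7, 1, 2, 1),  "E": (5, 9, 1, 4, 0),
    "Q": (5, 10, 2, 3, 0), "G": (2, 5, 1, 2, 0),  "H": (6, 9, 3, 2, 0),
    "I": (6, 13, 1, 2, 0), "L": (6, 13, 1, 2, 0), "K": (6, 14, 2, 2, 0),
    "M": (5, 11, 1, 2, 1), "F": (9, 11, 1, 2, 0), "P": (5, 9, 1, 2, 0),
    "S": (3, 7, 1, 3, 0),  "T": (4, 9, 1, 3, 0),  "W": (11, 12, 2, 2, 0),
    "Y": (9, 11, 1, 3, 0), "V": (5, 11, 1, 2, 0),
}


def carbon_percentage(sequence):
    """Percentage of carbon atoms among all atoms of a polypeptide.

    Assumed definition: carbon atoms divided by total atoms, after
    removing one water molecule (2 H, 1 O) per peptide bond.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    totals = [0, 0, 0, 0, 0]
    for residue in sequence:
        for i, count in enumerate(ATOMS[residue]):  # KeyError on unknown letters
            totals[i] += count
    bonds = len(sequence) - 1
    totals[1] -= 2 * bonds  # hydrogen lost as water
    totals[3] -= bonds      # oxygen lost as water
    return 100.0 * totals[0] / sum(totals)


# Hypothetical toy peptide, not one of the 150 aquaporins in the study.
example = carbon_percentage("GAVLIF")
```

Grouping sequences by species and comparing the resulting ranges would then correspond to the barcoding step described above.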

Keywords: aquaporins, carbon, dynamic programming, MIPs

Procedia PDF Downloads 334

2753 In silico and in vitro Investigation of the Role of Acinetobacter baumannii in the Pathogenesis of Multiple Sclerosis

Authors: Kieren Luellman, Makenzi Rockwell, Eduardo Callegari, Nichole Haag, Chun Wu

Abstract:

Multiple sclerosis (MS) is an autoimmune disorder that damages the myelin sheath of neurons in the central nervous system. The presence of Acinetobacter bacteria and anti-Acinetobacter antibodies in MS patients has led to the hypothesis that the bacteria may contribute to MS pathogenesis. In this study, the protein sequences of Acinetobacter baumannii were compared to five peptides from three mammalian myelin proteins known to induce experimental autoimmune encephalomyelitis (EAE), a condition similar to MS: PLP 139-151 and PLP 178-191 from Proteolipid Protein (PLP), MBP 84-104 from Myelin Basic Protein (MBP), and MOG 35-55 and MOG 92-106 from Myelin Oligodendrocyte Glycoprotein (MOG). We found 11 hits (i.e., with similarity over five or more amino acids) in Acinetobacter baumannii that are identical or similar to PLP 139-151, 32 hits to PLP 178-191, 35 hits to MBP 84-104, 41 hits to MOG 35-55 and 26 hits to MOG 92-106. In addition, Western blotting was used to assess possible interaction between the bacterial proteins and human anti-MBP, anti-MOG, and anti-PLP antibodies produced in rabbits, corresponding to MBP 84-104, MOG 35-55, and PLP 139-151, respectively. We found that both the human polyclonal anti-MOG antibody and the anti-PLP antibody recognized one or more proteins of the same molecular mass, around 25 kDa, in Acinetobacter baumannii. The results suggest that this protein or these proteins might serve as antigens that induce anti-MOG and anti-PLP antibody production in mammalian B cells. The proteomic study identified 433 hits, among which the sequence of Acinetobacter baumannii protein 491 subunit A matches a previously published enzyme, Acinetobacter 3-Oxoadipate CoA-Transferase, a peptide fragment of which was observed to recognize MS patient serum by ELISA. Our findings might pave the way to understanding one of the pathogenesis mechanisms of MS.
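The peptide-versus-proteome comparison can be pictured with a simple window search. The sketch below reports every stretch of five identical residues shared by a peptide and a protein; it assumes exact matching, which is stricter than the "identical or similar" criterion of the abstract, and it does not reproduce whichever search tool the authors used. The function name and both sequences are hypothetical.

```python
def shared_kmers(peptide, protein, k=5):
    """Return (peptide_pos, protein_pos, kmer) for every exact shared k-mer.

    Positions are 0-based offsets into each sequence.
    """
    # Index every k-mer of the protein by its start positions.
    index = {}
    for j in range(len(protein) - k + 1):
        index.setdefault(protein[j:j + k], []).append(j)

    hits = []
    for i in range(len(peptide) - k + 1):
        kmer = peptide[i:i + k]
        for j in index.get(kmer, []):
            hits.append((i, j, kmer))
    return hits


# Hypothetical toy sequences, not real myelin or Acinetobacter sequences.
hits = shared_kmers("ACDEFGHIK", "MMCDEFGQQ", k=5)
```

Running the same search over every protein in a proteome file and counting the proteins with at least one hit would give a per-peptide hit count comparable in spirit to the numbers reported above.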

Keywords: multiple sclerosis, pathogenesis, Acinetobacter baumannii, antibody recognition

Procedia PDF Downloads 69

2752 Membrane Spanning DNA Origami Nanopores for Protein Translocation

Authors: Genevieve Pugh, Johnathan Burns, Stefan Howorka

Abstract:

Single-molecule sensing via protein nanopores has achieved a step-change in portable and label-free DNA sequencing. However, protein pores of either natural or engineered origin cannot provide the tunable diameters needed for effective protein sensing. Here, we describe a generic strategy to build synthetic DNA nanopores that are wide enough to accommodate folded protein. The pores are composed of interlinked DNA duplexes and carry lipid anchors to achieve the required membrane insertion. Our demonstrator pore has a contiguous cross-sectional channel area of 50 nm², which is six times larger than that of the largest protein pore. Consequently, transport of folded protein across bilayers is possible. The modular design is amenable to different pore dimensions and can be adapted for protein sensing or to create molecular gates in synthetic biology.

Keywords: biosensing, DNA nanotechnology, DNA origami, nanopore sensing

Procedia PDF Downloads 287

2751 CMPD: Cancer Mutant Proteome Database

Authors: Po-Jung Huang, Chi-Ching Lee, Bertrand Chin-Ming Tan, Yuan-Ming Yeh, Julie Lichieh Chu, Tin-Wen Chen, Cheng-Yang Lee, Ruei-Chi Gan, Hsuan Liu, Petrus Tang

Abstract:

Whole-exome sequencing, which focuses on the protein-coding regions of disease- and cancer-associated genes based on a priori knowledge, is the most cost-effective method to study the association between genetic alterations and disease. Recent advances in high-throughput sequencing technologies and proteomic techniques have provided an opportunity to integrate genomics and proteomics, making mutated peptides corresponding to mutated genes readily detectable. Since sequence database search is the most widely used method for protein identification in mass spectrometry (MS)-based proteomics, a mutant proteome database is required to better approximate the real protein pool and improve the identification of disease-associated mutated proteins. Large-scale whole exome/genome sequencing studies launched by the National Cancer Institute (NCI), the Broad Institute, and The Cancer Genome Atlas (TCGA) provide not only a comprehensive report on coding variants in diverse samples and cell lines but also an invaluable resource for the wider research community. No existing database collects the mutant protein sequences related to the variants identified in these studies. CMPD is designed to address this issue, serving as a bridge between genomic data and proteomic studies and focusing on protein sequence-altering variations originating from both germline and cancer-associated somatic variations.
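The step from a coding variant to a mutant protein sequence can be sketched for the simplest case, a single amino acid substitution. The code below is an illustration only and says nothing about how CMPD itself is built; the function name, the one-letter variant notation (for example `T3S`), and the toy sequence are assumptions.

```python
import re


def apply_missense(protein, variant):
    """Return the protein with one substitution applied.

    `variant` uses one-letter notation: reference residue, 1-based
    position, alternate residue (for example "T3S").
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognized variant: {variant!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if pos < 1 or pos > len(protein):
        raise ValueError(f"position {pos} is outside the sequence")
    if protein[pos - 1] != ref:
        raise ValueError(f"reference residue at {pos} is not {ref}")
    return protein[:pos - 1] + alt + protein[pos:]


# Hypothetical toy sequence and variant, not an entry from any database.
mutant = apply_missense("MKTAYIAK", "T3S")
```

Appending such mutant sequences to the reference proteome would let a database search engine match spectra from mutated peptides, which is the use case the abstract describes.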

Keywords: TCGA, cancer, mutant, proteome

Procedia PDF Downloads 554

2750 A Comprehensive Analysis of LACK (Leishmania Homologue of Receptors for Activated C Kinase) in the Context of Visceral Leishmaniasis

Authors: Sukrat Sinha, Abhay Kumar, Shanthy Sundaram

Abstract:

The Leishmania homologue of receptors for activated C kinase (LACK) is a known T cell epitope from soluble Leishmania antigens (SLA) that confers protection against Leishmania challenge. This antigen has been found to be highly conserved among Leishmania strains, and LACK has been shown to be protective against L. donovani challenge. A comprehensive analysis of several LACK sequences was completed. The analysis shows a high level of conservation, lower variability and higher antigenicity in specific portions of the LACK protein. This information supports the consideration of LACK as a putative vaccine target for visceral leishmaniasis.

Keywords: bioinformatics, genome assembly, leishmania activated protein kinase c (lack), next-generation sequencing

Procedia PDF Downloads 302

2749 Bioinformatics Approach to Support Genetic Research in Autism in Mali

Authors: M. Kouyate, M. Sangare, S. Samake, S. Keita, H. G. Kim, D. H. Geschwind

Abstract:

Background & Objectives: Human genetic studies can be expensive, even unaffordable, in developing countries, partly due to sequencing costs. Our aim is to pilot the use of bioinformatics tools to guide scientifically valid, locally relevant, and economically sound autism genetic research in Mali. Methods: The NCBI, HGMD, and LSDB databases were used to identify hot-spot point mutations. Phenotype, transmission pattern, theoretical protein expression in the brain, and the impact of the mutation on the 3D structure of the protein were used to prioritize selected autism genes. We used the protein database, Modeller, and Clustal W. Results: We found Mef2c (Gly27Ala/Leu38Gln), Pten (Thr131Ile), Prodh (Leu289Met), Nme1 (Ser120Gly), and Dhcr7 (Pro227Thr/Glu224Lys). These mutations were associated with the endonucleases BseRI, NspI, PfrJS2IV, BspGI, BsaBI, and SpoDI, respectively. The Gly27Ala/Leu38Gln mutations impacted the 3D structure of the Mef2c protein. Mef2c protein sequences across species showed a high percentage of similarity, with a highly conserved MADS domain. Discussion: The frequencies of Mef2c, Pten, Prodh, Nme1, and Dhcr7 gene mutations in the Malian population will be very informative. PCR coupled with restriction enzyme digestion can be used to screen for the targeted gene mutations, with Sanger sequencing used for confirmation only. This will considerably cut the sequencing cost of gene-by-gene mutation screening. Knowledge of the 3D structure and of the potential impact of the mutations on the Mef2c protein informed the protein family and altered function (e.g., Leu38Gln). Conclusion & Future Work: Bioinformatics will positively impact autism research in Mali. Our approach can be applied to other neuropsychiatric disorders.
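Screening by PCR and restriction digestion works when a point mutation creates or destroys a recognition site. The sketch below checks this for a single site on the forward strand. It uses the well-known EcoRI site GAATTC purely as an example, since the recognition sequences of the enzymes named in the abstract are not given there; the function names and the two amplicon sequences are hypothetical.

```python
def find_sites(dna, site):
    """Return 0-based start positions of a recognition site (forward strand)."""
    dna, site = dna.upper(), site.upper()
    width = len(site)
    return [i for i in range(len(dna) - width + 1) if dna[i:i + width] == site]


def site_change(wild_type, mutant, site):
    """Report whether the mutation gains, loses, or keeps the site count."""
    before = len(find_sites(wild_type, site))
    after = len(find_sites(mutant, site))
    if after < before:
        return "lost"
    if after > before:
        return "gained"
    return "unchanged"


ECORI = "GAATTC"  # palindromic, so one strand is enough for this example

# Hypothetical amplicons differing by one base (A -> G at offset 5).
wild_type = "ATGGAATTCGTA"
mutant = "ATGGAGTTCGTA"
result = site_change(wild_type, mutant, ECORI)
```

A "lost" or "gained" result means the digested wild-type and mutant amplicons would show different band patterns on a gel, so only ambiguous samples would need Sanger confirmation. Non-palindromic or degenerate sites would also require a reverse-complement search and ambiguity codes, which this sketch omits.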

Keywords: bioinformatics, endonucleases, autism, Sanger sequencing, point mutations

Procedia PDF Downloads 43

2748 Effect of Electromagnetic Fields on Protein Extraction from Shrimp By-Products for Electrospinning Process

Authors: Guido Trautmann-Sáez, Mario Pérez-Won, Vilbett Briones, María José Bugueño, Gipsy Tabilo-Munizaga, Luis Gonzáles-Cavieres

Abstract:

Shrimp by-products are a valuable source of protein. However, traditional protein extraction methods have limitations in terms of their efficiency. In this work, protein extraction from shrimp (Pleuroncodes monodon) industrial by-products was assisted with ohmic heating (OH), microwave (MW) and pulsed electric field (PEF). Extraction was performed by a chemical method (using NaOH and HCl 2M) assisted with OH, MW and PEF in a continuous flow system (5 ml/s). The extracts were characterized by protein determination, differential scanning calorimetry (DSC) and Fourier-transform infrared (FTIR) spectroscopy. Results indicate improvements in protein extraction efficiency of 19.25% (PEF), 3.65% (OH) and 28.19% (MW). The most efficient method was selected for the electrospinning process to obtain fibers.

Keywords: electrospinning process, emerging technology, protein extraction, shrimp by-products

Procedia PDF Downloads 35