Search results for: Protein sequence algorithm

4123 PIELG: A Protein Interaction Extraction Systemusing a Link Grammar Parser from Biomedical Abstracts

Authors: Rania A. Abul Seoud, Nahed H. Solouma, Abou-Baker M. Youssef, Yasser M. Kadah

Abstract:

Due to the ever growing amount of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses linkage given by the Link Grammar Parser to start a case based analysis of contents of various syntactic roles as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verbs patterns to overcome preposition combinations problems. The recall and precision are 74.4% and 62.65%, respectively. Experimental evaluations with two other state-of-the-art extraction systems indicate that PIELG system achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The result shows that the performance is remarkably promising.

Keywords: Link Grammar Parser, Interaction extraction, protein-protein interaction, Natural language processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2222

4122 Analytical Modeling of Globular Protein-Ferritin in α-Helical Conformation: A White Noise Functional Approach

Authors: Vernie C. Convicto, Henry P. Aringa, Wilson I. Barredo

Abstract:

This study presents a conformational model of the helical structures of globular protein particularly ferritin in the framework of white noise path integral formulation by using Associated Legendre functions, Bessel and convolution of Bessel and trigonometric functions as modulating functions. The model incorporates chirality features of proteins and their helix-turn-helix sequence structural motif.

Keywords: Globular protein, modulating function, white noise, winding probability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1926

4121 Expression of Tissue Plasminogen Activator in Transgenic Tobacco Plants by Signal Peptides Targeting for Delivery to Apoplast, Endoplasmic Reticulum and Cytosol Spaces

Authors: Sadegh Lotfieblisofla, Arash Khodabakhshi

Abstract:

Tissue plasminogen activator (tPA) as a serine protease plays an important role in the fibrinolytic system and the dissolution of fibrin clots in human body. The production of this drug in plants such as tobacco could reduce its production costs. In this study, expression of tPA gene and protein targeting to different plant cell compartments, using various signal peptides has been investigated. For high level of expression, Kozak sequence was used after CaMV35S in the beginning of the gene. In order to design the final construction, Extensin, KDEL (amino acid sequence including Lys-Asp-Glu-Leu) and SP (γ-zein signal peptide coding sequence) were used as leader signals to conduct this protein into apoplast, endoplasmic reticulum and cytosol spaces, respectively. Cloned human tPA gene under the CaMV (Cauliflower mosaic virus) 35S promoter and NOS (Nopaline Synthase) terminator into pBI121 plasmid was transferred into tobacco explants by Agrobacterium tumefaciens strain LBA₄₄₀₄. The presence and copy number of genes in transgenic tobacco was proved by Southern blotting. Enzymatic activity of the rt-PA protein in transgenic plants compared to non-transgenic plants was confirmed by Zymography assay. The presence and amount of rt-PA recombinant protein in plants was estimated by ELISA analysis on crude protein extract of transgenic tobacco using a specific antibody. The yield of recombinant tPA in transgenic tobacco for SP, KDEL, Extensin signals were counted 0.50, 0.68, 0.69 microgram per milligram of total soluble proteins.

Keywords: Recombinant tissue plasminogen activator, plant cell comportment, leader signals, transgenic tobacco.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 684

4120 Performance of Chaotic Lu System in CDMA Satellites Communications Systems

Authors: K. Kemih, M. Benslama

Abstract:

This paper investigates the problem of spreading sequence and receiver code synchronization techniques for satellite based CDMA communications systems. The performance of CDMA system depends on the autocorrelation and cross-correlation properties of the used spreading sequences. In this paper we propose the uses of chaotic Lu system to generate binary sequences for spreading codes in a direct sequence spread CDMA system. To minimize multiple access interference (MAI) we propose the use of genetic algorithm for optimum selection of chaotic spreading sequences. To solve the problem of transmitter-receiver synchronization, we use the passivity controls. The concept of semipassivity is defined to find simple conditions which ensure boundedness of the solutions of coupled Lu systems. Numerical results are presented to show the effectiveness of the proposed approach.

Keywords: About Chaotic Lu system, synchronization, Spreading sequence, Genetic Algorithm. Passive System

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1721

4119 Predicting Protein Interaction Sites Based on a New Integrated Radial Basis Functional Neural Network

Authors: Xiaoli Shen, Yuehui Chen

Abstract:

Interactions among proteins are the basis of various life events. So, it is important to recognize and research protein interaction sites. A control set that contains 149 protein molecules were used here. Then 10 features were extracted and 4 sample sets that contained 9 sliding windows were made according to features. These 4 sample sets were calculated by Radial Basis Functional neutral networks which were optimized by Particle Swarm Optimization respectively. Then 4 groups of results were obtained. Finally, these 4 groups of results were integrated by decision fusion (DF) and Genetic Algorithm based Selected Ensemble (GASEN). A better accuracy was got by DF and GASEN. So, the integrated methods were proved to be effective.

Keywords: protein interaction sites, features, sliding windows, radial basis functional neutral networks, genetic algorithm basedselected ensemble.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1402

4118 Two Points Crossover Genetic Algorithm for Loop Layout Design Problem

Authors: Xu LiYun, Briand Florent, Fan GuoLiang

Abstract:

The loop-layout design problem (LLDP) aims at optimizing the sequence of positioning of the machines around the cyclic production line. Traffic congestion is the usual criteria to minimize in this type of problem, i.e. the number of additional cycles spent by each part in the network until the completion of its required routing sequence of machines. This paper aims at applying several improvements mechanisms such as a positioned-based crossover operator for the Genetic Algorithm (GA) called a Two Points Crossover (TPC) and an offspring selection process. The performance of the improved GA is measured using well-known examples from literature and compared to other evolutionary algorithms. Good results show that GA can still be competitive for this type of problem against more recent evolutionary algorithms.

Keywords: Crossover, genetic algorithm, layout design problem, loop-layout, manufacturing optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 818

4117 Using Spectral Vectors and M-Tree for Graph Clustering and Searching in Graph Databases of Protein Structures

Authors: Do Phuc, Nguyen Thi Kim Phung

Abstract:

In this paper, we represent protein structure by using graph. A protein structure database will become a graph database. Each graph is represented by a spectral vector. We use Jacobi rotation algorithm to calculate the eigenvalues of the normalized Laplacian representation of adjacency matrix of graph. To measure the similarity between two graphs, we calculate the Euclidean distance between two graph spectral vectors. To cluster the graphs, we use M-tree with the Euclidean distance to cluster spectral vectors. Besides, M-tree can be used for graph searching in graph database. Our proposal method was tested with graph database of 100 graphs representing 100 protein structures downloaded from Protein Data Bank (PDB) and we compare the result with the SCOP hierarchical structure.

Keywords: Eigenvalues, m-tree, graph database, protein structure, spectra graph theory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1626

4116 A Hybrid System of Hidden Markov Models and Recurrent Neural Networks for Learning Deterministic Finite State Automata

Authors: Pavan K. Rallabandi, Kailash C. Patidar

Abstract:

In this paper, we present an optimization technique or a learning algorithm using the hybrid architecture by combining the most popular sequence recognition models such as Recurrent Neural Networks (RNNs) and Hidden Markov models (HMMs). In order to improve the sequence/pattern recognition/classification performance by applying a hybrid/neural symbolic approach, a gradient descent learning algorithm is developed using the Real Time Recurrent Learning of Recurrent Neural Network for processing the knowledge represented in trained Hidden Markov Models. The developed hybrid algorithm is implemented on automata theory as a sample test beds and the performance of the designed algorithm is demonstrated and evaluated on learning the deterministic finite state automata.

Keywords: Hybrid systems, Hidden Markov Models, Recurrent neural networks, Deterministic finite state automata.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2856

4115 A Particle Swarm Optimization Approach for the Earliness-Tardiness No-Wait Flowshop Scheduling Problem

Authors: Sedighe Arabameri, Nasser Salmasi

Abstract:

In this researcha particle swarm optimization (PSO) algorithm is proposedfor no-wait flowshopsequence dependent setuptime scheduling problem with weighted earliness-tardiness penalties as the criterion (|, |Σ " ).The smallestposition value (SPV) rule is applied to convert the continuous value of position vector of particles in PSO to job permutations.A timing algorithm is generated to find the optimal schedule and calculate the objective function value of a given sequence in PSO algorithm. Twodifferent neighborhood structures are applied to improve the solution quality of PSO algorithm.The first one is based on variable neighborhood search (VNS) and the second one is a simple one with invariable structure. In order to compare the performance of two neighborhood structures, random test problems are generated and solved by both neighborhood approaches.Computational results show that the VNS algorithmhas better performance than the other one especially for the large sized problems.

Keywords: minimization of summation of weighed earliness and tardiness, no-wait flowshop scheduling, particle swarm optimization, sequence dependent setup times

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1604

4114 Semi-Blind Two-Dimensional Code Acquisition in CDMA Communications

Authors: Rui Wu, Tapani Ristaniemi

Abstract:

In this paper, we propose a new algorithm for joint time-delay and direction-of-arrival (DOA) estimation, here called two-dimensional code acquisition, in an asynchronous directsequence code-division multiple-access (DS-CDMA) array system. This algorithm depends on eigenvector-eigenvalue decomposition of sample correlation matrix, and requires to know desired user-s training sequence. The performance of the algorithm is analyzed both analytically and numerically in uncorrelated and coherent multipath environment. Numerical examples show that the algorithm is robust with unknown number of coherent signals.

Keywords: Two-Dimensional Code Acquisition, EV-t, DSCDMA

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1500

4113 Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern

Authors: Mahdi Esmaeili, Mansour Tarafdar

Abstract:

The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.

Keywords: Sequential Patterns, Data Mining, ParallelAlgorithm, Multidimensional Sequence Data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1455

4112 Investigations of Protein Aggregation Using Sequence and Structure Based Features

Authors: M. Michael Gromiha, A. Mary Thangakani, Sandeep Kumar, D. Velmurugan

Abstract:

The main cause of several neurodegenerative diseases such as Alzhemier, Parkinson and spongiform encephalopathies is formation of amyloid fibrils and plaques in proteins. We have analyzed different sets of proteins and peptides to understand the influence of sequence based features on protein aggregation process. The comparison of 373 pairs of homologous mesophilic and thermophilic proteins showed that aggregation prone regions (APRs) are present in both. But, the thermophilic protein monomers show greater ability to ‘stow away’ the APRs in their hydrophobic cores and protect them from solvent exposure. The comparison of amyloid forming and amorphous b-aggregating hexapeptides suggested distinct preferences for specific residues at the six positions as well as all possible combinations of nine residue pairs. The compositions of residues at different positions and residue pairs have been converted into energy potentials and utilized for distinguishing between amyloid forming and amorphous b-aggregating peptides. Our method could correctly identify the amyloid forming peptides at an accuracy of 95-100% in different datasets of peptides.

Keywords: Aggregation prone regions, amyloids, thermophilic proteins, amino acid residues, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1472

4111 Adaptive and Personalizing Learning Sequence Using Modified Roulette Wheel Selection Algorithm

Authors: Melvin A. Ballera

Abstract:

Prior literature in the field of adaptive and personalized learning sequence in e-learning have proposed and implemented various mechanisms to improve the learning process such as individualization and personalization, but complex to implement due to expensive algorithmic programming and need of extensive and prior data. The main objective of personalizing learning sequence is to maximize learning by dynamically selecting the closest teaching operation in order to achieve the learning competency of learner. In this paper, a revolutionary technique has been proposed and tested to perform individualization and personalization using modified reversed roulette wheel selection algorithm that runs at O(n). The technique is simpler to implement and is algorithmically less expensive compared to other revolutionary algorithms since it collects the dynamic real time performance matrix such as examinations, reviews, and study to form the RWSA single numerical fitness value. Results show that the implemented system is capable of recommending new learning sequences that lessens time of study based on student's prior knowledge and real performance matrix.

Keywords: E-learning, fitness value, personalized learning sequence, reversed roulette wheel selection algorithms.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1995

4110 FPGA Implementation of the “PYRAMIDS“ Block Cipher

Authors: A. AlKalbany, H. Al hassan, M. Saeb

Abstract:

The “PYRAMIDS" Block Cipher is a symmetric encryption algorithm of a 64, 128, 256-bit length, that accepts a variable key length of 128, 192, 256 bits. The algorithm is an iterated cipher consisting of repeated applications of a simple round transformation with different operations and different sequence in each round. The algorithm was previously software implemented in Cµ code. In this paper, a hardware implementation of the algorithm, using Field Programmable Gate Arrays (FPGA), is presented. In this work, we discuss the algorithm, the implemented micro-architecture, and the simulation and implementation results. Moreover, we present a detailed comparison with other implemented standard algorithms. In addition, we include the floor plan as well as the circuit diagrams of the various micro-architecture modules.

Keywords: FPGA, VHDL, micro-architecture, encryption, cryptography, algorithm, data communication security.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1676

4109 Multiple Sequence Alignment Using Three- Dimensional Fragments

Authors: Layal Al Ait, Eduardo Corel, Kifah Tout, Burkhard Morgenstern

Abstract:

Background: Dialign is a DNA/Protein alignment tool for performing pairwise and multiple pairwise alignments through the comparison of gap-free segments (fragments) between sequence pairs. An alignment of two sequences is a chain of fragments, i.e local gap-free pairwise alignments, with the highest total score. METHOD: A new approach is defined in this article which relies on the concept of using three-dimensional fragments – i.e. local threeway alignments -- in the alignment process instead of twodimensional ones. These three-dimensional fragments are gap-free alignments constituting of equal-length segments belonging to three distinct sequences. RESULTS: The obtained results showed good improvments over the performance of DIALIGN.

Keywords: DIALIGN, Multiple sequence alignment, Threedimensional fragments.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1526

4108 Video Matting based on Background Estimation

Authors: J.-H. Moon, D.-O Kim, R.-H. Park

Abstract:

This paper presents a video matting method, which extracts the foreground and alpha matte from a video sequence. The objective of video matting is finding the foreground and compositing it with the background that is different from the one in the original image. By finding the motion vectors (MVs) using a sliced block matching algorithm (SBMA), we can extract moving regions from the video sequence under the assumption that the foreground is moving and the background is stationary. In practice, foreground areas are not moving through all frames in an image sequence, thus we accumulate moving regions through the image sequence. The boundaries of moving regions are found by Canny edge detector and the foreground region is separated in each frame of the sequence. Remaining regions are defined as background regions. Extracted backgrounds in each frame are combined and reframed as an integrated single background. Based on the estimated background, we compute the frame difference (FD) of each frame. Regions with the FD larger than the threshold are defined as foreground regions, boundaries of foreground regions are defined as unknown regions and the rest of regions are defined as backgrounds. Segmentation information that classifies an image into foreground, background, and unknown regions is called a trimap. Matting process can extract an alpha matte in the unknown region using pixel information in foreground and background regions, and estimate the values of foreground and background pixels in unknown regions. The proposed video matting approach is adaptive and convenient to extract a foreground automatically and to composite a foreground with a background that is different from the original background.

Keywords: Background estimation, Object segmentation, Blockmatching algorithm, Video matting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1791

4107 Neural Network Learning Based on Chaos

Authors: Truong Quang Dang Khoa, Masahiro Nakagawa

Abstract:

Chaos and fractals are novel fields of physics and mathematics showing up a new way of universe viewpoint and creating many ideas to solve several present problems. In this paper, a novel algorithm based on the chaotic sequence generator with the highest ability to adapt and reach the global optima is proposed. The adaptive ability of proposal algorithm is flexible in 2 steps. The first one is a breadth-first search and the second one is a depth-first search. The proposal algorithm is examined by 2 functions, the Camel function and the Schaffer function. Furthermore, the proposal algorithm is applied to optimize training Multilayer Neural Networks.

Keywords: learning and evolutionary computing, Chaos Optimization Algorithm, Artificial Neural Networks, nonlinear optimization, intelligent computational technologies.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1760

4106 Constraint Based Frequent Pattern Mining Technique for Solving GCS Problem

Authors: First G.M. Karthik, Second Ramachandra.V.Pujeri, Dr.

Abstract:

Generalized Center String (GCS) problem are generalized from Common Approximate Substring problem and Common substring problems. GCS are known to be NP-hard allowing the problems lies in the explosion of potential candidates. Finding longest center string without concerning the sequence that may not contain any motifs is not known in advance in any particular biological gene process. GCS solved by frequent pattern-mining techniques and known to be fixed parameter tractable based on the fixed input sequence length and symbol set size. Efficient method known as Bpriori algorithms can solve GCS with reasonable time/space complexities. Bpriori 2 and Bpriori 3-2 algorithm are been proposed of any length and any positions of all their instances in input sequences. In this paper, we reduced the time/space complexity of Bpriori algorithm by Constrained Based Frequent Pattern mining (CBFP) technique which integrates the idea of Constraint Based Mining and FP-tree mining. CBFP mining technique solves the GCS problem works for all center string of any length, but also for the positions of all their mutated copies of input sequence. CBFP mining technique construct TRIE like with FP tree to represent the mutated copies of center string of any length, along with constraints to restraint growth of the consensus tree. The complexity analysis for Constrained Based FP mining technique and Bpriori algorithm is done based on the worst case and average case approach. Algorithm's correctness compared with the Bpriori algorithm using artificial data is shown.

Keywords: Constraint Based Mining, FP tree, Data mining, GCS problem, CBFP mining technique.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1667

4105 Non-Population Search Algorithms for Capacitated Material Requirement Planning in Multi-Stage Assembly Flow Shop with Alternative Machines

Authors: Watcharapan Sukkerd, Teeradej Wuttipornpun

Abstract:

This paper aims to present non-population search algorithms called tabu search (TS), simulated annealing (SA) and variable neighborhood search (VNS) to minimize the total cost of capacitated MRP problem in multi-stage assembly flow shop with two alternative machines. There are three main steps for the algorithm. Firstly, an initial sequence of orders is constructed by a simple due date-based dispatching rule. Secondly, the sequence of orders is repeatedly improved to reduce the total cost by applying TS, SA and VNS separately. Finally, the total cost is further reduced by optimizing the start time of each operation using the linear programming (LP) model. Parameters of the algorithm are tuned by using real data from automotive companies. The result shows that VNS significantly outperforms TS, SA and the existing algorithm.

Keywords: Capacitated MRP, non-population search algorithms, linear programming, assembly flow shop.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 929

4104 Cloning of a β-Glucosidase Gene (BGL1) from Traditional Starter Yeast Saccharomycopsis fibuligera BMQ 908 and Expression in Pichia pastoris

Authors: Le Thuy Mai, Vu Nguyen Thanh

Abstract:

β-Glucosidase is an important enzyme for production of ethanol from lignocellulose. With hydrolytic activity on cellooligosaccharides, especially cellobiose, β-glucosidase removes product inhibitory effect on cellulases and forms fermentable sugars. In this study, β-glucosidase encoding gene (BGL1) from traditional starter yeast Saccharomycosis fibuligera BMQ908 was cloned and expressed in Pichia pastoris. BGL1 of S. fibuligera BMQ 908 shared 98% nucleotide homology with the closest GenBank sequence (M22475) but identity in amino-acid sequences of catalytic domains. Recombinant plasmid pPICZαA/BGL1 containing the sequence encoding BGL1 mature protein and α-factor secretion signal was constructed and transformed into methylotrophic yeast P. pastoris by electroporation. The recombinant strain produced single extracellular protein with molecular weight of 120 kDa and cellobiase activity of 60 IU/ml. The optimum pH of the recombinant β-glucosidase was 5.0 and the optimum temperature was 50°C.

Keywords: β-Glucosidase, Pichia pastoris, Saccharomycopsisfibuligera, recombinant enzyme.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4951

4103 Analysis of DNA-Recognizing Enzyme Interaction using Deaminated Lesions

Authors: Seung Pil Pack

Abstract:

Deaminated lesions were produced via nitrosative oxidation of natural nucleobases; uracul (Ura, U) from cytosine (Cyt, C), hypoxanthine (Hyp, H) from adenine (Ade, A), and xanthine (Xan, X) and oxanine (Oxa, O) from guanine (Gua, G). Such damaged nucleobases may induce mutagenic problems, so that much attentions and efforts have been poured on the revealing of their mechanisms in vivo or in vitro. In this study, we employed these deaminated lesions as useful probes for analysis of DNA-binding/recognizing proteins or enzymes. Since the pyrimidine lesions such as Hyp, Oxa and Xan are employed as analogues of guanine, their comparative uses are informative for analyzing the role of Gua in DNA sequence in DNA-protein interaction. Several DNA oligomers containing such Hyp, Oxa or Xan substituted for Gua were designed to reveal the molecular interaction between DNA and protein. From this approach, we have got useful information to understand the molecular mechanisms of the DNA-recognizing enzymes, which have not ever been observed using conventional DNA oligomer composed of just natural nucleobases.

Keywords: Deaminated lesion, DNA-protein interaction, DNA-recognizing enzymes

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1260

4102 A New Edit Distance Method for Finding Similarity in Dna Sequence

Authors: Patsaraporn Somboonsak, Mud-Armeen Munlin

Abstract:

The P-Bigram method is a string comparison methods base on an internal two characters-based similarity measure. The edit distance between two strings is the minimal number of elementary editing operations required to transform one string into the other. The elementary editing operations include deletion, insertion, substitution two characters. In this paper, we address the P-Bigram method to sole the similarity problem in DNA sequence. This method provided an efficient algorithm that locates all minimum operation in a string. We have been implemented algorithm and found that our program calculated that smaller distance than one string. We develop PBigram edit distance and show that edit distance or the similarity and implementation using dynamic programming. The performance of the proposed approach is evaluated using number edit and percentage similarity measures.

Keywords: Edit distance, String Matching, String Similarity

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3289

4101 An Algebra for Protein Structure Data

Authors: Yanchao Wang, Rajshekhar Sunderraman

Abstract:

This paper presents an algebraic approach to optimize queries in domain-specific database management system for protein structure data. The approach involves the introduction of several protein structure specific algebraic operators to query the complex data stored in an object-oriented database system. The Protein Algebra provides an extensible set of high-level Genomic Data Types and Protein Data Types along with a comprehensive collection of appropriate genomic and protein functions. The paper also presents a query translator that converts high-level query specifications in algebra into low-level query specifications in Protein-QL, a query language designed to query protein structure data. The query transformation process uses a Protein Ontology that serves the purpose of a dictionary.

Keywords: Domain-Specific Data Management, Protein Algebra, Protein Ontology, Protein Structure Data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1517

4100 Comparison of Phylogenetic Trees of Multiple Protein Sequence Alignment Methods

Authors: Khaddouja Boujenfa, Nadia Essoussi, Mohamed Limam

Abstract:

Multiple sequence alignment is a fundamental part in many bioinformatics applications such as phylogenetic analysis. Many alignment methods have been proposed. Each method gives a different result for the same data set, and consequently generates a different phylogenetic tree. Hence, the chosen alignment method affects the resulting tree. However in the literature, there is no evaluation of multiple alignment methods based on the comparison of their phylogenetic trees. This work evaluates the following eight aligners: ClustalX, T-Coffee, SAGA, MUSCLE, MAFFT, DIALIGN, ProbCons and Align-m, based on their phylogenetic trees (test trees) produced on a given data set. The Neighbor-Joining method is used to estimate trees. Three criteria, namely, the dNNI, the dRF and the Id_Tree are established to test the ability of different alignment methods to produce closer test tree compared to the reference one (true tree). Results show that the method which produces the most accurate alignment gives the nearest test tree to the reference tree. MUSCLE outperforms all aligners with respect to the three criteria and for all datasets, performing particularly better when sequence identities are within 10-20%. It is followed by T-Coffee at lower sequence identity (<10%), Align-m at 20-30% identity, and ClustalX and ProbCons at 30-50% identity. Also, it is noticed that when sequence identities are higher (>30%), trees scores of all methods become similar.

Keywords: Multiple alignment methods, phylogenetic trees, Neighbor-Joining method, Robinson-Foulds distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1806

4099 Genetic Algorithm Application in a Dynamic PCB Assembly with Carryover Sequence- Dependent Setups

Authors: M. T. Yazdani Sabouni, Rasaratnam Logendran

Abstract:

We consider a typical problem in the assembly of printed circuit boards (PCBs) in a two-machine flow shop system to simultaneously minimize the weighted sum of weighted tardiness and weighted flow time. The investigated problem is a group scheduling problem in which PCBs are assembled in groups and the interest is to find the best sequence of groups as well as the boards within each group to minimize the objective function value. The type of setup operation between any two board groups is characterized as carryover sequence-dependent setup time, which exactly matches with the real application of this problem. As a technical constraint, all of the boards must be kitted before the assembly operation starts (kitting operation) and by kitting staff. The main idea developed in this paper is to completely eliminate the role of kitting staff by assigning the task of kitting to the machine operator during the time he is idle which is referred to as integration of internal (machine) and external (kitting) setup times. Performing the kitting operation, which is a preparation process of the next set of boards while the other boards are currently being assembled, results in the boards to continuously enter the system or have dynamic arrival times. Consequently, a dynamic PCB assembly system is introduced for the first time in the assembly of PCBs, which also has characteristics similar to that of just-in-time manufacturing. The problem investigated is computationally very complex, meaning that finding the optimal solutions especially when the problem size gets larger is impossible. Thus, a heuristic based on Genetic Algorithm (GA) is employed. An example problem on the application of the GA developed is demonstrated and also numerical results of applying the GA on solving several instances are provided.

Keywords: Genetic algorithm, Dynamic PCB assembly, Carryover sequence-dependent setup times, Multi-objective.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1551

4098 Software Evolution Based Sequence Diagrams Merging

Authors: Zine-Eddine Bouras, Abdelouaheb Talai

Abstract:

The need to merge software artifacts seems inherent to modern software development. Distribution of development over several teams and breaking tasks into smaller, more manageable pieces are an effective means to deal with the kind of complexity. In each case, the separately developed artifacts need to be assembled as efficiently as possible into a consistent whole in which the parts still function as described. In addition, earlier changes are introduced into the life cycle and easier is their management by designers. Interaction-based specifications such as UML sequence diagrams have been found effective in this regard. As a result, sequence diagrams can be used not only for capturing system behaviors but also for merging changes in order to create a new version. The objective of this paper is to suggest a new approach to deal with the problem of software merging at the level of sequence diagrams by using the concept of dependence analysis that captures, formally, all mapping, and differences between elements of sequence diagrams and serves as a key concept to create a new version of sequence diagram.

Keywords: System behaviors, sequence diagram merging, dependence analysis, sequence diagram slicing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1741

4097 Selecting Negative Examples for Protein-Protein Interaction

Authors: Mohammad Shoyaib, M. Abdullah-Al-Wadud, Oksam Chae

Abstract:

Proteomics is one of the largest areas of research for bioinformatics and medical science. An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. Predicting Protein-Protein Interaction (PPI) is one of the crucial and decisive problems in current research. Genomic data offer a great opportunity and at the same time a lot of challenges for the identification of these interactions. Many methods have already been proposed in this regard. In case of in-silico identification, most of the methods require both positive and negative examples of protein interaction and the perfection of these examples are very much crucial for the final prediction accuracy. Positive examples are relatively easy to obtain from well known databases. But the generation of negative examples is not a trivial task. Current PPI identification methods generate negative examples based on some assumptions, which are likely to affect their prediction accuracy. Hence, if more reliable negative examples are used, the PPI prediction methods may achieve even more accuracy. Focusing on this issue, a graph based negative example generation method is proposed, which is simple and more accurate than the existing approaches. An interaction graph of the protein sequences is created. The basic assumption is that the longer the shortest path between two protein-sequences in the interaction graph, the less is the possibility of their interaction. A well established PPI detection algorithm is employed with our negative examples and in most cases it increases the accuracy more than 10% in comparison with the negative pair selection method in that paper.

Keywords: Interaction graph, Negative training data, Protein-Protein interaction, Support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1678

4096 Optimal Design of Composite Patch for a Cracked Pipe by Utilizing Genetic Algorithm and Finite Element Method

Authors: Mahdi Fakoor, Seyed Mohammad Navid Ghoreishi

Abstract:

Composite patching is a common way for reinforcing the cracked pipes and cylinders. The effects of composite patch reinforcement on fracture parameters of a cracked pipe depend on a variety of parameters such as number of layers, angle, thickness, and material of each layer. Therefore, stacking sequence optimization of composite patch becomes crucial for the applications of cracked pipes. In this study, in order to obtain the optimal stacking sequence for a composite patch that has minimum weight and maximum resistance in propagation of cracks, a coupled Multi-Objective Genetic Algorithm (MOGA) and Finite Element Method (FEM) process is proposed. This optimization process has done for longitudinal and transverse semi-elliptical cracks and optimal stacking sequences and Pareto’s front for each kind of cracks are presented. The proposed algorithm is validated against collected results from the existing literature.

Keywords: Multi objective optimization, Pareto front, composite patch, cracked pipe.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 880

4095 Using Genetic Algorithms in Closed Loop Identification of the Systems with Variable Structure Controller

Authors: O.M. Mohamed vall, M. Radhi

Abstract:

This work presents a recursive identification algorithm. This algorithm relates to the identification of closed loop system with Variable Structure Controller. The approach suggested includes two stages. In the first stage a genetic algorithm is used to obtain the parameters of switching function which gives a control signal rich in commutations (i.e. a control signal whose spectral characteristics are closest possible to those of a white noise signal). The second stage consists in the identification of the system parameters by the instrumental variable method and using the optimal switching function parameters obtained with the genetic algorithm. In order to test the validity of this algorithm a simulation example is presented.

Keywords: Closed loop identification, variable structure controller, pseud-random binary sequence, genetic algorithms.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1410

4094 One-Class Support Vector Machines for Protein-Protein Interactions Prediction

Authors: Hany Alashwal, Safaai Deris, Razib M. Othman

Abstract:

Predicting protein-protein interactions represent a key step in understanding proteins functions. This is due to the fact that proteins usually work in context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions. However, most of these techniques address this problem as a binary classification problem. Although it is easy to get a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to be considered as negative examples. Therefore, in this paper we solve this problem as a one-class classification problem using one-class support vector machines (SVM). Using only positive examples (interacting protein pairs) in training phase, the one-class SVM achieves accuracy of about 80%. These results imply that protein-protein interaction can be predicted using one-class classifier with comparable accuracy to the binary classifiers that use artificially constructed negative examples.

Keywords: Bioinformatics, Protein-protein interactions, One-Class Support Vector Machines

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1968