Search results for: Sequence alignment.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 599

509 Edit Distance Algorithm to Increase Storage Efficiency of Javanese Corpora

Authors: Aji P. Wibawa, Andrew Nafalski, Neil Murray, Wayan F. Mahmudy

Abstract:

Since the one-to-one word translator cannot translate pragmatic aspects of Javanese, the parallel text alignment model described here uses a phrase pair combination. The algorithm aligns the parallel text automatically from the beginning to the end of each sentence. Even though the phrase pair combination outperforms the previous algorithm, it is still inefficient: recording all possible combinations consumes more database space and is time-consuming. The original algorithm is modified by applying the edit distance coefficient to improve data-storage efficiency. As a result, data-storage consumption is reduced by 90%, and the learning period is reduced as well (42 s).
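The abstract does not define the edit distance coefficient, so the following is only a minimal sketch of the underlying idea: the classic Levenshtein distance, plus a normalized similarity score that could be used to decide whether a phrase pair is close enough to one already stored. The function names and the normalization are assumptions for illustration, not the authors' definitions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (or token lists)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Hypothetical normalized coefficient in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

A storage filter built on this could skip any candidate phrase whose similarity to a stored phrase exceeds a chosen threshold; the threshold itself would have to be tuned on the corpus.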

Keywords: edit distance coefficient, Javanese, parallel text alignment, phrase pair combination

Downloads: 1672
508 Video Matting based on Background Estimation

Authors: J.-H. Moon, D.-O Kim, R.-H. Park

Abstract:

This paper presents a video matting method that extracts the foreground and alpha matte from a video sequence. The objective of video matting is to find the foreground and composite it with a background that differs from the one in the original image. By finding the motion vectors (MVs) using a sliced block matching algorithm (SBMA), we can extract moving regions from the video sequence under the assumption that the foreground is moving and the background is stationary. In practice, foreground areas do not move in every frame of an image sequence, so we accumulate moving regions through the image sequence. The boundaries of moving regions are found by a Canny edge detector, and the foreground region is separated in each frame of the sequence. The remaining regions are defined as background regions. The backgrounds extracted from each frame are combined and reframed as a single integrated background. Based on the estimated background, we compute the frame difference (FD) of each frame. Regions with an FD larger than the threshold are defined as foreground regions, boundaries of foreground regions are defined as unknown regions, and the remaining regions are defined as background. Segmentation information that classifies an image into foreground, background, and unknown regions is called a trimap. The matting process can extract an alpha matte in the unknown region using pixel information in the foreground and background regions, and estimate the values of foreground and background pixels in unknown regions. The proposed video matting approach is adaptive and makes it convenient to extract a foreground automatically and composite it with a background that differs from the original.
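The trimap step described above can be sketched roughly as follows. This is an illustrative toy on grayscale frames stored as nested lists, not the authors' implementation: the one-pixel unknown band on each side of the foreground boundary and the 4-neighbour test are assumptions.

```python
def trimap(frame, background, threshold):
    """Label each pixel 'F' (foreground), 'B' (background) or 'U' (unknown).

    A pixel is foreground when its frame difference from the estimated
    background exceeds the threshold. Pixels touching the foreground
    boundary (on either side) are relabelled as unknown.
    """
    h, w = len(frame), len(frame[0])
    fg = [[abs(frame[y][x] - background[y][x]) > threshold for x in range(w)]
          for y in range(h)]
    result = [["F" if fg[y][x] else "B" for x in range(w)] for y in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                # A 4-neighbour with a different label means we sit on the boundary.
                if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] != fg[y][x]:
                    result[y][x] = "U"
    return result
```

A matting step would then estimate alpha only for the 'U' pixels, using the standard compositing model I = alpha * F + (1 - alpha) * B.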

Keywords: Background estimation, Object segmentation, Block matching algorithm, Video matting.

Downloads: 1769
507 Object Alignment for Military Optical Surveillance

Authors: Oscar J.G. Somsen, Fok Bolderheij

Abstract:

Electro-optical devices are increasingly used for military sea-, land- and air applications to detect, recognize and track objects. Typically, these devices produce video information that is presented to an operator. However, with increasing availability of electro-optical devices the data volume is becoming very large, creating a rising need for automated analysis. In a military setting, this typically involves detecting and recognizing objects at a large distance, i.e. when they are difficult to distinguish from background and noise. One may consider combining multiple images from a video stream into a single enhanced image that provides more information for the operator. In this paper we investigate a simple algorithm to enhance simulated images from a military context and investigate how the enhancement is affected by various types of disturbance.

Keywords: Electro-Optics, Automated Image alignment

Downloads: 1560
506 The Orlicz Space of the Entire Sequence Fuzzy Numbers Defined by Infinite Matrices

Authors: N. Subramanian, C. Murugesan

Abstract:

This paper is devoted to the study of the general properties of Orlicz space of entire sequence of fuzzy numbers by using infinite matrices.

Keywords: Fuzzy numbers, infinite matrix, Orlicz space, entire sequence.

Downloads: 1160
505 Fractal Analysis of 16S rRNA Gene Sequences in Archaea Thermophiles

Authors: T. Holden, G. Tremberger, Jr, E. Cheung, R. Subramaniam, R. Sullivan, N. Gadura, P. Schneider, P. Marchese, A. Flamholz, T. Cheung, D. Lieberman

Abstract:

A nucleotide sequence can be expressed as a numerical sequence when each nucleotide is assigned its proton number. The resulting gene numerical sequence can be investigated for its fractal dimension in terms of evolution and chemical properties for comparative studies. We have investigated such nucleotide fluctuation in the 16S rRNA gene of archaea thermophiles. The studied archaea thermophiles were Archaeoglobus fulgidus, Methanothermobacter thermautotrophicus, Methanocaldococcus jannaschii, Pyrococcus horikoshii, and Thermoplasma acidophilum. These five archaea-euryarchaeota thermophiles have fractal dimension values ranging from 1.93 to 1.97. Computer simulation shows that random sequences would have an average of about 2 with a standard deviation of about 0.015. The fractal dimension was found to correlate negatively with the thermophile's optimal growth temperature, with an R² value of 0.90 (N = 5). The inclusion of two archaea-crenarchaeota thermophiles reduces the R² value to 0.66 (N = 7). Further inclusion of two bacterial thermophiles reduces the R² value to 0.50 (N = 9). The fractal dimension is positively correlated with the sequence GC content, with an R² value of 0.89 for the five archaea-euryarchaeota thermophiles (and 0.74 for the entire set of N = 9), although computer simulation shows little correlation. The highest (positive) correlation was found between the fractal dimension and di-nucleotide Shannon entropy. However, Shannon entropy and sequence GC content were observed to correlate with optimal growth temperature, with R² values of 0.8 (negative) and 0.88 (positive), respectively, for the entire set of 9 thermophiles; thus the correlation lacks species specificity.
Together with another correlation study of bacterial radiation dosage with RecA repair gene sequence fractal dimension, it is postulated that fractal dimension analysis is a sensitive tool for studying the relationship between genotype and phenotype among closely related sequences.
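Two of the sequence statistics named above, GC content and di-nucleotide Shannon entropy, are simple to compute. The sketch below assumes overlapping di-nucleotides and entropy in bits; the abstract does not say which convention the authors used, and the fractal dimension calculation itself is not reproduced here.

```python
import math
from collections import Counter


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return sum(1 for base in seq if base in "GC") / len(seq)


def dinucleotide_entropy(seq):
    """Shannon entropy (bits) of overlapping di-nucleotide frequencies."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    counts = Counter(pairs)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

With 16 possible di-nucleotides, the entropy is bounded above by log2(16) = 4 bits, which a long uniformly random sequence approaches.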

Keywords: Fractal dimension, archaea thermophiles, Shannon entropy, GC content

Downloads: 1729
504 Extended Low Power Bus Binding Combined with Data Sequence Reordering

Authors: Jihyung Kim, Taejin Kim, Sungho Park, Jun-Dong Cho

Abstract:

In this paper, we address the problem of reducing the switching activity (SA) in on-chip buses through the use of a bus binding technique in high-level synthesis. While many binding techniques to reduce the SA exist, we present a technique that reduces it further. Our proposed method combines bus binding and data sequence reordering to explore a wider solution space. The problem is formulated as a multiple traveling salesman problem and solved using simulated annealing. The experimental results show that a binding solution obtained with the proposed method reduces the switching activity by 5.6-27.2% (18.0% on average) and 2.6-12.7% (6.8% on average) compared with conventional binding-only and hybrid binding-encoding methods, respectively.
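To make the cost being minimized concrete, switching activity on a bus can be modelled as the number of bit toggles between consecutive words. The sketch below uses that common model (an assumption; the abstract does not give the authors' exact cost function) and shows why reordering a data sequence can matter.

```python
def switching_activity(words):
    """Total number of bit toggles between consecutive words on one bus."""
    # XOR marks the bits that differ; counting the 1s gives the Hamming distance.
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


original = [0b0000, 0b1111, 0b0001, 0b1110]
reordered = [0b0000, 0b0001, 0b1111, 0b1110]
print(switching_activity(original))   # 11
print(switching_activity(reordered))  # 5
```

Finding the best order is a shortest-path-through-all-words problem with Hamming distance as the edge weight, which is why a traveling salesman formulation is a natural fit. Reordering is only legal when the transfers have no data dependencies between them.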

Keywords: low power, bus binding, switching activity, multiple traveling salesman problem, data sequence reordering

Downloads: 1293
503 Finite Element Analysis of Composite Frames in Wheelchair under Upward Loading

Authors: Thomas Jin-Chee Liu, Jin-Wei Liang, Wei-Long Chen, Teng-Hui Chen

Abstract:

Finite element analysis is adopted in this primary study. Using the Tsai-Wu criterion and a delamination criterion, the stacking sequence [45/0_4/-45_4/90_4]_s is found to be the final optimal design for the wheelchair frame. In contrast, the uni-directional laminates, i.e. [90_13]_s, [45_13]_s and [-45_13]_s, are poor designs because of their higher failure indices.

Keywords: Wheelchair frame, stacking sequence, failure index, finite element.

Downloads: 3712
502 Adaptive and Personalizing Learning Sequence Using Modified Roulette Wheel Selection Algorithm

Authors: Melvin A. Ballera

Abstract:

Prior literature in the field of adaptive and personalized learning sequences in e-learning has proposed and implemented various mechanisms to improve the learning process, such as individualization and personalization, but these are complex to implement because of expensive algorithmic programming and the need for extensive prior data. The main objective of personalizing a learning sequence is to maximize learning by dynamically selecting the closest teaching operation in order to achieve the learning competency of the learner. In this paper, a revolutionary technique is proposed and tested to perform individualization and personalization using a modified reversed roulette wheel selection algorithm (RWSA) that runs in O(n). The technique is simpler to implement and algorithmically less expensive than other revolutionary algorithms, since it collects the dynamic real-time performance matrix, such as examinations, reviews, and study, to form the RWSA single numerical fitness value. Results show that the implemented system is capable of recommending new learning sequences that lessen study time based on the student's prior knowledge and real performance matrix.
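For orientation, standard roulette wheel (fitness-proportionate) selection runs in O(n) per draw, as sketched below. This is the textbook version, not the author's modified reversed variant, whose details the abstract does not give; the function name and the `rng` parameter are illustrative.

```python
import random


def roulette_wheel_select(fitness, rng=random):
    """Return an index chosen with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    pick = rng.uniform(0, total)
    running = 0.0
    for index, value in enumerate(fitness):
        running += value
        # Skip zero-fitness entries so they can never be selected.
        if value > 0 and pick <= running:
            return index
    # Guard against floating-point round-off at the top of the wheel.
    return max(i for i, value in enumerate(fitness) if value > 0)
```

A "reversed" wheel would presumably favour low scores instead, for example by selecting on `max(fitness) - f`, but that reading is an assumption.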

Keywords: E-learning, fitness value, personalized learning sequence, reversed roulette wheel selection algorithms.

Downloads: 1971
501 Performance of Chaotic Lu System in CDMA Satellites Communications Systems

Authors: K. Kemih, M. Benslama

Abstract:

This paper investigates spreading sequence and receiver code synchronization techniques for satellite-based CDMA communications systems. The performance of a CDMA system depends on the autocorrelation and cross-correlation properties of the spreading sequences used. In this paper we propose the use of the chaotic Lu system to generate binary sequences for spreading codes in a direct sequence spread CDMA system. To minimize multiple access interference (MAI), we propose the use of a genetic algorithm for optimum selection of chaotic spreading sequences. To solve the problem of transmitter-receiver synchronization, we use passivity controls. The concept of semipassivity is defined to find simple conditions that ensure boundedness of the solutions of coupled Lu systems. Numerical results are presented to show the effectiveness of the proposed approach.

Keywords: Chaotic Lu system, synchronization, spreading sequence, genetic algorithm, passive system

Downloads: 1700
500 Exons and Introns Classification in Human and Other Organisms

Authors: Benjamin Y. M. Kwan, Jennifer Y. Y. Kwan, Hon Keung Kwan

Abstract:

In this paper, the relative performance of spectral classification of short exon and intron sequences of the human and eleven model organisms is studied. In the simulations, all combinations of sixteen one-sequence numerical representations, four threshold values, and four window lengths are considered. Sequences of 150-base length are chosen and, for each organism, a total of 16,000 sequences are used for training and testing. Results indicate that an appropriate combination of one-sequence numerical representation, threshold value, and window length is essential for arriving at top spectral classification results. For fixed-length sequences, the precisions of exon and intron classification obtained for different organisms are not the same because of their genomic differences. In general, precision increases as sequence length increases.

Keywords: Exons and introns classification, Human genome, Model organism genome, Spectral analysis

Downloads: 2010
499 A Hybrid Genetic Algorithm for the Sequence Dependent Flow-Shop Scheduling Problem

Authors: Mohammad Mirabi

Abstract:

The flow-shop scheduling problem (FSP) deals with the scheduling of a set of jobs that visit a set of machines in the same order. The FSP is NP-hard, which means that no efficient algorithm is known for solving it to optimality. To meet time requirements and to minimize the makespan of large permutation flow-shop scheduling problems with sequence-dependent setup times on each machine, this paper develops a hybrid genetic algorithm (HGA). The proposed HGA applies a modified approach to generate the population of initial chromosomes and uses an improved heuristic, called the iterated swap procedure, to improve initial solutions. The author also uses three genetic operators to produce good new offspring. The results are compared with some recently developed heuristics, and computational experiments show that the proposed HGA performs very competitively with respect to solution accuracy and efficiency.
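The objective a genetic algorithm would evaluate for each chromosome (job permutation) is the makespan. A minimal sketch follows, assuming anticipatory setups (a machine may be set up before the job arrives from the previous machine) and no setup before the first job; the abstract does not state which conventions the paper uses, and the data layout is hypothetical.

```python
def makespan(order, proc, setup):
    """Makespan of a permutation flow shop with sequence-dependent setups.

    proc[j][m]     : processing time of job j on machine m
    setup[m][i][j] : setup time on machine m when job j directly follows job i
    """
    n_machines = len(proc[0])
    done = [0] * n_machines  # completion time of the previous job on each machine
    prev = None
    for job in order:
        ready = 0  # completion time of this job on the previous machine
        for m in range(n_machines):
            machine_free = done[m]
            if prev is not None:
                machine_free += setup[m][prev][job]
            ready = max(ready, machine_free) + proc[job][m]
            done[m] = ready
        prev = job
    return done[-1]


proc = [[3, 2], [1, 4]]
setup = [[[0, 1], [2, 0]], [[0, 1], [1, 0]]]
print(makespan([0, 1], proc, setup))  # 10
print(makespan([1, 0], proc, setup))  # 8
```

Each evaluation costs O(jobs × machines), so the fitness function stays cheap even when the search over permutations is not.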

Keywords: Hybrid genetic algorithm, Scheduling, Permutation flow-shop, Sequence dependent

Downloads: 1840
498 Face Reconstruction and Camera Pose Using Multi-dimensional Descent

Authors: Varin Chouvatut, Suthep Madarasmi, Mihran Tuceryan

Abstract:

This paper proposes a novel, robust, and simple method for obtaining a human 3D face model and camera pose (position and orientation) from a video sequence. Given a video sequence of a face recorded with an off-the-shelf digital camera, feature points used to define facial parts are tracked using the Active Appearance Model (AAM). Then, the face's 3D structure and the camera pose of each video frame can be calculated simultaneously from the obtained point correspondences. The proposed method is primarily based on a combination of Gradient Descent and Powell's Multidimensional Minimization. With this method, temporarily occluded points, including cases of self-occlusion, do not pose a problem. As long as the point correspondences displayed in the video sequence have enough parallax, these missing points can still be reconstructed.

Keywords: Camera Pose, Face Reconstruction, Gradient Descent, Powell's Multidimensional Minimization.

Downloads: 1535
497 Merging and Comparing Ontologies Generically

Authors: Xiuzhan Guo, Arthur Berrill, Ajinkya Kulkarni, Kostya Belezko, Min Luo

Abstract:

Ontology operations, e.g., aligning and merging, have been studied and implemented extensively in different settings, such as categorical operations, relation algebras, and typed graph grammars, with different concerns. However, aligning and merging operations in these settings share some generic properties, e.g., idempotence, commutativity, associativity, and representativity, which are defined on an ontology merging system, given by a nonempty set of the ontologies concerned, a binary relation on the set of the ontologies modeling ontology aligning, and a partial binary operation on the set of the ontologies modeling ontology merging. Given an ontology repository, a finite subset of the set of the ontologies, its merging closure is the smallest subset of the set of the ontologies that contains the repository and is closed with respect to merging. If the idempotence, commutativity, associativity, and representativity properties are satisfied, then both the set of the ontologies and the merging closure of the ontology repository are partially ordered naturally by merging, and the merging closure of the ontology repository is finite and can be computed, compared, and sorted efficiently, including sorting, selecting, and querying some specific elements, e.g., maximal ontologies and minimal ontologies. An ontology V-alignment pair is a pair of ontology homomorphisms with a common domain. We also show that the ontology merging system, given by ontology V-alignment pairs and pushouts, satisfies the idempotence, commutativity, associativity, and representativity properties, so that the merging system is partially ordered and the merging closure of a given repository with respect to pushouts can be computed efficiently.

Keywords: Ontology aligning, ontology merging, merging system, poset, merging closure, ontology V-alignment pair, ontology homomorphism, ontology V-alignment pair homomorphism, pushout.

Downloads: 182
496 Fabrication of Cylindrical Silicon Nanowire-Embedded Field Effect Transistor Using Al2O3 Transfer Layer

Authors: Sang Hoon Lee, Tae Il Lee, Su Jeong Lee, Jae Min Myoung

Abstract:

To manufacture a short-gap single Si nanowire (NW) field effect transistor (FET) by an imprinting and transferring method, we introduce a method that uses an Al2O3 sacrificial layer. The diameters of the cylindrical Si NWs addressed between Au electrodes by the dielectrophoretic (DEP) alignment method are controlled to 106, 128, and 148 nm. After the imprinting and transfer process, the cylindrical Si NW is embedded in a PVP adhesive and dielectric layer. By curing the transferred cylindrical Si NW and Au electrodes on a PVP-coated p++ Si substrate with 200 nm-thick SiO2, fabrication of a 3 μm gap Si NW FET was completed. As the diameter of the embedded Si NW increases, the mobility of the FET increases from 80.51 to 121.24 cm²/V·s and the threshold voltage moves from -7.17 to -2.44 V because the surface-to-volume ratio is reduced.

Keywords: Al2O3 Sacrificial transfer layer, cylindrical silicon nanowires, Dielectrophoretic alignment, Field effect transistor.

Downloads: 2073
495 The Influence of Directionality on the Giovanelli Illusion

Authors: Michele Sinico

Abstract:

In the Giovanelli illusion, some collinear dots appear misaligned, when each dot lies within a circle and the circles are not collinear. In this illusion, the role of the frame of reference, determined by the circles, is considered a crucial factor. Three experiments were carried out to study the influence of directionality of the circles on the misalignment. The adjustment method was used. Participants changed the orthogonal position of each dot, from the left to the right of the sequence, until a collinear sequence of dots was achieved. The first experiment verified the illusory effect of the misalignment. In the second experiment, the influence of two different directionalities of the circles (-0.58° and +0.58°) on the misalignment was tested. The results show an over-normalization on the sequences of the dots. The third experiment tested the misalignment of the dots without any inclination of the sequence of circles (0°). Only a local illusory effect was found. These results demonstrate that the directionality of the circles, as a global factor, can increase the misalignment. The findings also indicate that directionality and the frame of reference are independent factors in explaining the Giovanelli illusion.

Keywords: Giovanelli illusion, visual illusion, directionality, misalignment, frame of reference.

Downloads: 712
494 Hybrid Coding for Animated Polygonal Meshes

Authors: Jinghua Zhang, Charles B. Owen, Jinsheng Xu

Abstract:

A new hybrid coding method for compressing animated polygonal meshes is presented. This paper assumes a simple representation of the geometric data: a temporal sequence of polygonal meshes, one for each discrete frame of the animated sequence. The method combines delta coding and an octree-based method. In this hybrid method, both the octree approach and the delta coding approach are applied in parallel to each frame in the animation sequence, and the approach that generates the smaller encoded file size is chosen to encode the current frame. Given the same quality requirement, the hybrid coding method can achieve a much higher compression ratio than the octree-only or delta-only method. The hybrid approach can represent 3D animated sequences with higher compression factors while maintaining reasonable quality. It is easy to implement and has a low-cost encoding process and a fast decoding process, which makes it a better choice for real-time applications.

Keywords: animated polygonal meshes, compression, delta coding, octree.

Downloads: 1426
493 Analysis of Risk-Based Disaster Planning in Local Communities

Authors: R. A. Temah, L. A. Nkengla-Asi

Abstract:

Planning for future disasters sets the stage for a variety of activities that may trigger multiple recurring operations and expose the community to opportunities to minimize risks. Local communities are increasingly embracing the necessity of planning based on local risks, but are also significantly challenged to plan for and respond to disasters effectively. This research examines a basic risk-based disaster planning model and compares it with advanced risk-based planning, which introduces the identification and alignment of a variety of local capabilities, within and outside the local community, that can be pivotal in managing local risks and cascading effects prior to a disaster. A critical review shows that the identification and alignment of capabilities can potentially enhance risk-based disaster planning. A tailored, holistic approach to risk-based disaster planning is pivotal to enhancing collective action and reducing the collective cost of disasters.

Keywords: Capabilities, disaster planning, hazards, local community, risk-based.

Downloads: 1016
492 Effect of Implementation of Nonlinear Sequence Transformations on Power Series Expansion for a Class of Non-Linear Abel Equations

Authors: Javad Abdalkhani

Abstract:

Convergence of power series solutions for a class of non-linear Abel type equations, including an equation that arises in nonlinear cooling of semi-infinite rods, is very slow inside their small radius of convergence. Beyond that radius, the corresponding power series are wildly divergent. Implementation of nonlinear sequence transformations allows effortless evaluation of these power series on very large intervals.

Keywords: Nonlinear transformation, Abel Volterra Equations, Mathematica

Downloads: 1260
491 Construction of cDNA Library and EST Analysis of Tenebrio molitor Larvae

Authors: JiEun Jeong, Se-Won Kang, Hee-Ju Hwang, Sung-Hwa Chae, Sang-Haeng Choi, Hong-Seog Park, YeonSoo Han, Bok-Reul Lee, Dae-Hyun Seog, Yong Seok Lee

Abstract:

To further advance research on immune-related genes from T. molitor, we constructed a cDNA library and analyzed expressed sequence tag (EST) sequences from 1,056 clones. After removing vector sequence and quality checking through the Phred program (trim_alt 0.05 (P-score>20)), 1039 sequences were generated. The average length of insert was 792 bp. In addition, we identified 162 clusters, 167 contigs and 391 contigs after clustering and assembling process using a TGICL package. EST sequences were searched against NCBI nr database by local BLAST (blastx, E

Keywords: EST, Innate immunity, Tenebrio molitor

Downloads: 1494
490 A Class of Recurrent Sequences Exhibiting Some Exciting Properties of Balancing Numbers

Authors: G. K. Panda, S. S. Rout

Abstract:

The balancing numbers are natural numbers $n$ satisfying the Diophantine equation $1 + 2 + 3 + \cdots + (n - 1) = (n + 1) + (n + 2) + \cdots + (n + r)$; $r$ is the balancer corresponding to the balancing number $n$. The $n$th balancing number is denoted by $B_n$, and the sequence $\{B_n\}_{n=1}^{\infty}$ satisfies the recurrence relation $B_{n+1} = 6B_n - B_{n-1}$. The balancing numbers possess some curious properties, some like those of Fibonacci numbers and some others more interesting. This paper is a study of recurrent sequences $\{x_n\}_{n=1}^{\infty}$ satisfying the recurrence relation $x_{n+1} = Ax_n - Bx_{n-1}$ and possessing some curious properties like those of the balancing numbers.
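The recurrence can be checked numerically against the defining equation. The sketch below assumes the usual initial values B_1 = 1 and B_2 = 6, with B_1 = 1 accepted by convention with balancer r = 0; these starting values are not stated in the abstract.

```python
def balancing_numbers(count):
    """First `count` balancing numbers via B_{n+1} = 6*B_n - B_{n-1}."""
    seq = [1, 6]  # assumed initial values B_1 = 1, B_2 = 6
    while len(seq) < count:
        seq.append(6 * seq[-1] - seq[-2])
    return seq[:count]


def balancer(n):
    """Return r with 1 + ... + (n-1) == (n+1) + ... + (n+r), or None if no r exists."""
    left = n * (n - 1) // 2
    right, r = 0, 0
    while right < left:
        r += 1
        right += n + r
    return r if right == left else None


for b in balancing_numbers(5):
    print(b, balancer(b))
# 1 0
# 6 2
# 35 14
# 204 84
# 1189 492
```

For example, with n = 6 the left side is 1 + 2 + 3 + 4 + 5 = 15 and the right side is 7 + 8 = 15, so the balancer is r = 2.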

Keywords: Recurrent sequences, Balancing numbers, Lucas balancing numbers, Binet form.

Downloads: 1453
489 A Subjective Scheduler Based on Backpropagation Neural Network for Formulating a Real-life Scheduling Situation

Authors: K. G. Anilkumar, T. Tanprasert

Abstract:

This paper presents a subjective job scheduler based on a 3-layer Backpropagation Neural Network (BPNN) and a greedy alignment procedure in order to formulate a real-life situation. The BPNN estimates critical values of jobs based on the given subjective criteria. The scheduler is formulated in such a way that, at each time period, the most critical job is selected from the job queue and transferred to a single machine before the next periodic job arrives. If the selected job is one of the oldest jobs in the queue and its deadline is earlier than the arrival time of the current job, then an updated deadline is assigned to the job in order to prevent the critical job from being eliminated. The proposed satisfiability criteria indicate the satisfaction of the scheduler with respect to the performance of the BPNN, the validity of the jobs, and the feasibility of the scheduler.

Keywords: Backpropagation algorithm, Critical value, Greedy alignment procedure, Neural network, Subjective criteria, Satisfiability.

Downloads: 1436
488 Assembly Process Algorithms of Flexible Cell

Authors: M. Kusá, M. Matúšová, A. Javorová, K. Velí

Abstract:

This paper deals with the four-item assembly process of a linear drive. The assembly will be realized in a flexible assembly cell at the Institute of Manufacturing Systems and Applied Mechanics. The manufacturing cell is defined, along with the individual actuators that make up the flexible cell. The next section covers the control type and describes in detail the sequence control that will be used in the flexible assembly cell. The whole cell control is divided into individual step instructions, which are illustrated in Table III.

Keywords: assembly, flexible cell, sequence control

Downloads: 1271
487 Hidden Markov Model for the Simulation Study of Neural States and Intentionality

Authors: R. B. Mishra

Abstract:

Hidden Markov Models (HMMs) have been used in the prediction and determination of states that generate different neural activations as well as mental working conditions. This paper addresses two applications of HMMs. The first determines the optimal sequence of states for two neural states, Active (AC) and Inactive (IA), given three emissions (observations): the No Working (NW), Waiting (WT), and Working (W) conditions of human beings. The second determines the optimal sequence of intentionality, with Believe (B), Desire (D), and Intention (I) as the states and three observational sequences: NW, WT, and W. The computational results are encouraging and useful.
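The optimal state sequence of an HMM is usually found with the Viterbi algorithm; the abstract does not name the decoder, so that choice is an assumption. The sketch below uses the AC/IA states and NW/WT/W observations from the abstract, but every probability in it is a made-up placeholder, not a value from the paper.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] holds the best (probability, path) among paths ending in state s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Hypothetical parameters, for illustration only.
states = ("AC", "IA")
start_p = {"AC": 0.5, "IA": 0.5}
trans_p = {"AC": {"AC": 0.7, "IA": 0.3}, "IA": {"AC": 0.4, "IA": 0.6}}
emit_p = {"AC": {"NW": 0.1, "WT": 0.3, "W": 0.6},
          "IA": {"NW": 0.6, "WT": 0.3, "W": 0.1}}

prob, path = viterbi(["NW", "WT", "W"], states, start_p, trans_p, emit_p)
print(path)  # ['IA', 'AC', 'AC']
```

For long observation sequences the products underflow, so a practical version would sum log-probabilities instead of multiplying probabilities.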

Keywords: BDI, HMM, neural activation, optimal states, working conditions.

Downloads: 830
486 The Role and Importance of Genome Sequencing in Prediction of Cancer Risk

Authors: M. Sadeghi, H. Pezeshk, R. Tusserkani, A. Sharifi Zarchi, A. Malekpour, M. Foroughmand, S. Goliaei, M. Totonchi, N. Ansari–Pour

Abstract:

The role and relative importance of intrinsic and extrinsic factors in the development of complex diseases such as cancer remains a controversial issue. Determining the amount of variation explained by these factors requires experimental data and statistical models. These models are nevertheless based on the occurrence and accumulation of random mutational events during stem cell division, thus rendering cancer development a stochastic outcome. We demonstrate that not only is individual genome sequencing uninformative in determining cancer risk, but assigning a unique genome sequence to any given individual (healthy or affected) is also not meaningful. Current whole-genome sequencing approaches are therefore unlikely to realize the promise of personalized medicine. In conclusion, since the genome sequence differs from cell to cell and changes over time, determining the risk factor of complex diseases based on genome sequence seems somewhat unrealistic, and the resulting data are therefore likely to be inherently uninformative.

Keywords: Cancer risk, extrinsic factors, genome sequencing, intrinsic factors.

Downloads: 1060
485 Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings

Authors: Leong Lee, Cyriac Kandoth, Jennifer L. Leopold, Ronald L. Frank

Abstract:

Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms has rarely exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. Although this demonstrated that RT-RICO might be a promising approach for predicting secondary structure, the algorithm's computational complexity and program running time limited its use. Herein a parallelized implementation of a slightly modified RT-RICO approach is presented. This new version of the algorithm facilitated the testing of a much larger dataset of 396 protein domains [2]. Parallelized RT-RICO achieved a Q3 score of 74.6%, which is higher than the consensus prediction accuracy of 72.9% that was achieved for the same test dataset by a combination of four secondary structure prediction methods [2].
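Q3 is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal sketch of the metric follows; the function name and the example strings are illustrative and not taken from the paper.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues with a correctly predicted H/E/C label."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3_accuracy("HHHEECCC", "HHHEECCH"))  # 87.5
```

Because Q3 is a per-residue average, it says nothing about whether whole segments are placed correctly, which is why segment-based scores are sometimes reported alongside it.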

Keywords: data mining, protein secondary structure prediction, parallelization.

Downloads: 1551
484 Vector Space of the Extended Base-triplets over the Galois Field of five DNA Bases Alphabet

Authors: Robersy Sánchez, Ricardo Grau

Abstract:

A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D, G, A, U, C}, where the letter D represents one or more hypothetical bases with unspecific pairing. We hypothesize that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primitive DNA repair system could have made possible the transition from the ancient to the modern genetic code. Our results suggest that the Watson-Crick base pairing and the non-specific base pairing of the hypothetical ancestral base D, used to define the sum and product operations, are sufficient to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. In addition, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of the present coding DNA sequences could also have existed in the ancient coding DNA sequences.

Keywords: Genetic code vector space, primeval genetic code, power spectrum.

Downloads 2330
483 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist of multiple steps, including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often introduce variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come; and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to each genome's influence on the prediction.
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data through mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
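The first half of step (i), turning reads into a k-mer vocabulary, can be sketched as follows. This is a minimal illustration with hypothetical function names and toy reads; it is not the metagenome2vec code, and learning the embeddings themselves is not shown.

```python
from collections import Counter


def kmers(read, k):
    """Return all overlapping substrings of length k from a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_vocabulary(reads, k):
    """Map each distinct k-mer to an integer id, most frequent k-mers first."""
    counts = Counter(kmer for read in reads for kmer in kmers(read, k))
    return {kmer: idx for idx, (kmer, _) in enumerate(counts.most_common())}


# Toy reads (assumed data, for illustration only).
reads = ["ACGTAC", "CGTACG"]
vocabulary = build_vocabulary(reads, k=3)
encoded = [[vocabulary[kmer] for kmer in kmers(read, 3)] for read in reads]
print(vocabulary)
print(encoded)
```

The integer sequences in `encoded` are the kind of input a word-embedding model would consume, with k-mers playing the role of words and reads the role of sentences.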

Keywords: Metagenomics, phenotype prediction, deep learning, embeddings, multiple instance learning.

Downloads 825
482 Fast Database Indexing for Large Protein Sequence Collections Using Parallel N-Gram Transformation Algorithm

Authors: Jehad A. H. Hammad, Nur'Aini binti Abdul Rashid

Abstract:

With the rapid development in the field of life sciences and the flooding of genomic information, the need for faster and scalable searching methods has become urgent. One of the approaches that were investigated is indexing. The indexing methods have been categorized into three categories: length-based index algorithms, transformation-based algorithms and mixed techniques-based algorithms. In this research, we focused on the transformation-based methods. We embedded the N-gram method into the transformation-based method to build an inverted index table. We then applied parallel methods to speed up the index building time and to reduce the overall retrieval time when querying the genomic database. Our experiments show that the N-gram transformation algorithm is an economical solution; it saves both time and space. The results show that the size of the index is smaller than the size of the dataset when the size of the N-gram is 5 or 6. The parallel N-gram transformation algorithm's results indicate that the use of parallel programming with large datasets is promising and can be improved further.
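The general idea of an N-gram inverted index can be sketched as below. This is a sequential toy version with hypothetical names and made-up sequences; the transformation step and the parallel construction described in the abstract are not reproduced here.

```python
from collections import Counter, defaultdict


def build_ngram_index(sequences, n):
    """Map every n-gram to the set of sequence ids that contain it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - n + 1):
            index[seq[i:i + n]].add(seq_id)
    return index


def candidates(index, query, n):
    """Rank sequence ids by how many of the query's n-gram positions they match."""
    hits = Counter()
    for i in range(len(query) - n + 1):
        for seq_id in index.get(query[i:i + n], ()):
            hits[seq_id] += 1
    return hits.most_common()


# Made-up protein fragments, for illustration only.
sequences = {"p1": "MKTAYIAKQR", "p2": "MKTLLVAKQR", "p3": "GGGGGGGG"}
index = build_ngram_index(sequences, n=3)
print(candidates(index, "TAYIAK", n=3))
```

A query only touches the index entries for its own n-grams, so sequences sharing no n-gram with the query are never examined.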

Keywords: Biological sequence, Database index, N-gram indexing, Parallel computing, Sequence retrieval.

Downloads 2080
481 Codon-optimized Carbonic Anhydrase from Dunaliella species: Expression and Characterization

Authors: Seung Pil Pack

Abstract:

Carbonic anhydrases (CAs) have attracted attention as biological catalysts for the CO2 sequestration process because they can catalyze the conversion of CO2 to bicarbonate. Here, a codon-optimized sequence of an α-type CA cloned from Dunaliella species (DsCAopt) was constructed, expressed, and characterized. The expression level in E. coli BL21(DE3) was better for the codon-optimized DsCAopt than for the intact sequence. The DsCAopt enzyme shows high stability at pH 7.6/10.0. Finally, we demonstrated that in a Ca2+ solution, the DsCAopt enzyme can efficiently catalyze the conversion of CO2 to CaCO3 in the calcite form.

Keywords: Carbonic anhydrase, Codon-optimization, Dunaliella species, CO2 sequestration

Downloads 1602
480 Application of a Similarity Measure for Graphs to Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Alexander Mehler, Jürgen Kilian, Max Mühlhauser

Abstract:

Due to the tremendous amount of information provided by the World Wide Web (WWW), developing methods for mining the structure of web-based documents is of considerable interest. In this paper we present a similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignment to a novel and challenging problem: measuring the structural similarity of generalized trees. In other words, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem in order to develop an efficient graph similarity measure. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based document structures.
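To make the alignment step concrete, the sketch below scores a global alignment of two integer strings with the standard Needleman-Wunsch recurrence. The scoring values and the example property strings are assumptions chosen for illustration; the paper's actual property strings and scoring function are not given in the abstract.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of two integer sequences (Needleman-Wunsch)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical property strings, e.g. per-level node counts of two generalized trees.
tree_a = [1, 2, 3]
tree_b = [1, 3]
print(alignment_score(tree_a, tree_b))
```

Only two rows of the dynamic programming table are kept, so memory grows with the length of the shorter-lived row rather than with the full table; recovering the alignment itself would require storing the whole table or traceback pointers.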

Keywords: Graph similarity, hierarchical and directed graphs, hypertext, generalized trees, web structure mining.

Downloads 1842