Search results for: Entire sequences

638 On Some Subspaces of Entire Sequence Space of Fuzzy Numbers

Authors: T. Balasubramanian, A. Pandiarani

Abstract:

In this paper we introduce some subspaces of the fuzzy entire sequence space. Some general properties of these sequence spaces are discussed. Some inclusion relations involving these spaces are also obtained. Mathematics Subject Classification: 40A05, 40D25.

Keywords: Fuzzy Numbers, Entire sequences, completeness, Fuzzy entire sequences

Downloads: 1196

637 On λ− Summable of Orlicz Space of Entire Sequences of Fuzzy Numbers

Authors: N. Subramanian, U. K. Misra, M. S. Panda

Abstract:

In this paper the concepts of strongly (λM)p-Cesàro summability of a sequence of fuzzy numbers and of strongly λM-statistically convergent sequences of fuzzy numbers are introduced.

Keywords: Fuzzy numbers, statistical convergence, Orlicz space, entire sequence.

Downloads: 1864

636 On The Elliptic Divisibility Sequences over Finite Fields

Authors: Osman Bizim

Abstract:

In this work we study elliptic divisibility sequences over finite fields. Morgan Ward in [11, 12] gave the arithmetic theory of elliptic divisibility sequences. We study elliptic divisibility sequences, equivalence of these sequences, and singular elliptic divisibility sequences over finite fields Fp, where p > 3 is a prime.

Keywords: Elliptic divisibility sequences, equivalent sequences, singular sequences.

Downloads: 1427
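
As a rough illustration of the objects studied in entry 636, the sketch below generates the terms of an elliptic divisibility sequence modulo a prime p using Ward's standard doubling recurrences. The function name `eds_mod_p` and the choice of initial values are assumptions made for this example; the abstract does not specify an implementation, and the sketch assumes h_0 = 0, h_1 = 1 and that h_2 is invertible modulo p.

```python
def eds_mod_p(h2, h3, h4, count, p):
    """Return h_0 .. h_{count-1} of an elliptic divisibility sequence mod p.

    Assumes h_0 = 0, h_1 = 1 and that h2 is invertible modulo the prime p.
    """
    if h2 % p == 0:
        raise ValueError("h2 must be invertible modulo p for this recurrence")
    inv_h2 = pow(h2, -1, p)  # modular inverse (Python 3.8+)
    h = [0, 1, h2 % p, h3 % p, h4 % p]
    for m in range(5, count):
        n = m // 2
        if m % 2 == 1:
            # h_{2n+1} = h_{n+2} h_n^3 - h_{n-1} h_{n+1}^3
            term = h[n + 2] * h[n] ** 3 - h[n - 1] * h[n + 1] ** 3
        else:
            # h_{2n} h_2 = h_n (h_{n+2} h_{n-1}^2 - h_{n-2} h_{n+1}^2)
            term = h[n] * (h[n + 2] * h[n - 1] ** 2 - h[n - 2] * h[n + 1] ** 2)
            term *= inv_h2
        h.append(term % p)
    return h[:count]


# Integer sequence 0, 1, 1, -1, 1, 2, -1, -3, -5, 7, ... reduced modulo 7.
print(eds_mod_p(1, -1, 1, 10, 7))
```

Reducing after every step keeps the intermediate values small, which is the main practical difference from computing the integer sequence first and reducing afterwards.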

635 On the Properties of Pseudo Noise Sequences with a Simple Proposal of Randomness Test

Authors: Abhijit Mitra

Abstract:

Maximal length sequences (m-sequences) are also known as pseudo random sequences or pseudo noise sequences because they closely follow Golomb's popular randomness properties: (P1) balance, (P2) run, and (P3) ideal autocorrelation. Apart from these, there also exist certain other less known properties of such sequences, all of which are discussed in this tutorial paper. Comprehensive proofs of each of these properties are provided towards a better understanding of such sequences. A simple test is also proposed at the end of the paper in order to distinguish pseudo noise sequences from truly random sequences such as Bernoulli sequences.

Keywords: Maximal length sequence, pseudo noise sequence, punctured de Bruijn sequence, auto-correlation, Bernoulli sequence, randomness tests.

Downloads: 6639
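
The balance (P1) and ideal autocorrelation (P3) properties named in entry 635 can be checked numerically on a small example. The sketch below is an illustration only, not the paper's proposed test: it assumes a 4-stage linear feedback shift register with recurrence s[n+4] = s[n+1] XOR s[n] (characteristic polynomial x^4 + x + 1), and the helper names are hypothetical.

```python
def m_sequence(taps, state, length):
    """Generate bits with s[n+k] = XOR of s[n+t] for t in taps, k = len(state)."""
    bits = list(state)
    k = len(state)
    while len(bits) < length:
        n = len(bits) - k
        new_bit = 0
        for t in taps:
            new_bit ^= bits[n + t]
        bits.append(new_bit)
    return bits[:length]


def periodic_autocorrelation(bits, shift):
    """Periodic autocorrelation after mapping bit 0 -> +1 and bit 1 -> -1."""
    n = len(bits)
    signed = [1 - 2 * b for b in bits]
    return sum(signed[i] * signed[(i + shift) % n] for i in range(n))


# One full period (2^4 - 1 = 15 bits) from a nonzero starting state.
seq = m_sequence(taps=(0, 1), state=(1, 0, 0, 0), length=15)

ones = sum(seq)
zeros = len(seq) - ones
off_peak = {periodic_autocorrelation(seq, s) for s in range(1, 15)}
print(seq)
print("ones:", ones, "zeros:", zeros)         # balance: counts differ by one
print("off-peak autocorrelation:", off_peak)  # a single value for all shifts
```

For an m-sequence of period 15 the balance property gives eight ones and seven zeros, and every nonzero cyclic shift yields the same two-valued autocorrelation.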

634 Elliptic Divisibility Sequences over Finite Fields

Authors: Betül Gezer, Ahmet Tekcan, Osman Bizim

Abstract:

In this work, we study elliptic divisibility sequences over finite fields. Morgan Ward in [14], [15] gave the arithmetic theory of elliptic divisibility sequences and formulas for elliptic divisibility sequences with rank two over a finite field Fp. We study elliptic divisibility sequences with rank three, four and five over a finite field Fp, where p > 3 is a prime, give general terms of these sequences, and then determine the elliptic and singular curves associated with these sequences.

Keywords: Elliptic divisibility sequences, singular elliptic divisibility sequences, elliptic curves, singular curves.

Downloads: 1654

633 Clustering Protein Sequences with Tailored General Regression Model Technique

Authors: G. Lavanya Devi, Allam Appa Rao, A. Damodaram, GR Sridhar, G. Jaya Suma

Abstract:

Cluster analysis divides data into groups that are meaningful, useful, or both. Analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. A linear relation is valuable in rule discovery for given data, such as "if value X goes up 1, value Y will go down 3", etc. Classical linear regression models the linear relation of two sequences perfectly. However, if we need to cluster a large repository of protein sequences into groups where sequences have a strong linear relationship with each other, it is prohibitively expensive to compare sequences one by one. In this paper, we propose a new technique named General Regression Model Technique Clustering Algorithm (GRMTCA) to benignly handle the problem of linear sequence clustering. GRMT gives a measure, GR*, to tell the degree of linearity of multiple sequences without having to compare each pair of them.

Keywords: Clustering, General Regression Model, Protein Sequences, Similarity Measure.

Downloads: 1513

632 On the Effectivity of Different Pseudo-Noise and Orthogonal Sequences for Speech Encryption from Correlation Properties

Authors: V. Anil Kumar, Abhijit Mitra, S. R. Mahadeva Prasanna

Abstract:

We analyze the effectiveness of different pseudo noise (PN) and orthogonal sequences for encrypting speech signals in terms of perceptual intelligibility. A speech signal can be viewed as a sequence of correlated samples and each sample as a sequence of bits. The residual intelligibility of the speech signal can be reduced by removing the correlation among the speech samples. PN sequences have random-like properties that help in reducing the correlation among speech samples. The mean square aperiodic auto-correlation (MSAAC) and the mean square aperiodic cross-correlation (MSACC) measures are used to test the randomness of the PN sequences. Results of the investigation show the effectiveness of large Kasami sequences for this purpose among many PN sequences.

Keywords: Speech encryption, pseudo-noise codes, maximal length, Gold, Barker, Kasami, Walsh-Hadamard, autocorrelation, cross-correlation, figure of merit.

Downloads: 1994

631 A Simplified and Effective Algorithm Used to Mine Similar Processes: An Illustrated Example

Authors: Min-Hsun Kuo, Yun-Shiow Chen

Abstract:

The running logs of a process hold valuable information about its executed activity behavior and generated activity logic structure. These informative logs can be extracted, analyzed and utilized to improve the efficiency of the process's execution and conduct. One of the techniques used to accomplish process improvement is called process mining, and mining similar processes is one such improvement task. Rather than directly mining similar processes using a single comparison coefficient or a complicated fitness function, this paper presents a simplified heuristic process mining algorithm with two similarity comparisons that are able to relatively conform the activity logic sequences (traces) of the mined processes with those of a normalized (regularized) one. The relative process conformance finds which of the mined processes match the required activity sequences and relationships, for further necessary and sufficient applications of the mined processes to process improvements. One similarity is defined by the relationships in terms of the number of similar activity sequences existing in different processes; the other expresses the degree of the similar (identical) activity sequences among the conforming processes. Since these two similarities are defined with respect to certain typical behavior (activity sequences) occurring in an entire process, common problems that often appear in other process conformance techniques, such as the inappropriateness of an absolute comparison and the inability to elicit intrinsic information, can be solved by the relative process comparison presented in this paper. To demonstrate the potential of the proposed algorithm, a numerical example is illustrated.

Keywords: process mining, process similarity, artificial intelligence, process conformance.

Downloads: 1401

630 Advances on the Understanding of Sequence Convergence Seen from the Perspective of Mathematical Working Spaces

Authors: Paula Verdugo-Hernández, Patricio Cumsille

Abstract:

We analyze a first class on the convergence of real number sequences (hereafter, sequences), designed to foster exploration and discovery of concepts through graphical representations before engaging students in proving. The main goal was to differentiate between sequences and continuous functions of a real variable and to better understand concepts at an initial stage. We applied the analytic frame of Mathematical Working Spaces, which we expect to help extend to sequences since, as far as we know, it has only been developed for other objects. This frame is relevant for analyzing how mathematical work is built systematically by connecting the epistemological and cognitive perspectives and involving the semiotic, instrumental, and discursive dimensions.

Keywords: Convergence, graphical representations, Mathematical Working Spaces, paradigms of real analysis, real number sequences.

Downloads: 448

629 On the Construction of m-Sequences via Primitive Polynomials with a Fast Identification Method

Authors: Abhijit Mitra

Abstract:

The paper provides an in-depth tutorial of mathematical construction of maximal length sequences (m-sequences) via primitive polynomials and how to map the same when implemented in shift registers. It is equally important to check whether a polynomial is primitive or not so as to get proper m-sequences. A fast method to identify primitive polynomials over binary fields is proposed where the complexity is considerably less in comparison with the standard procedures for the same purpose.

Keywords: Finite field, irreducible polynomial, primitive polynomial, maximal length sequence, additive shift register, multiplicative shift register.

Downloads: 3883
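
To make the primitivity check mentioned in entry 629 concrete, the sketch below tests a binary polynomial by brute force: a degree-n polynomial with nonzero constant term is primitive exactly when x has multiplicative order 2^n - 1 modulo it. This is the slow textbook check, not the fast identification method the paper proposes; the bitmask encoding (bit i holds the coefficient of x^i) and the function names are assumptions of this example.

```python
def order_of_x(poly):
    """Multiplicative order of x modulo a GF(2) polynomial given as a bitmask.

    Returns None when x is not invertible (zero constant term) or degree < 1.
    """
    degree = poly.bit_length() - 1
    if degree < 1 or (poly & 1) == 0:
        return None
    value = 1  # represents x^0
    for k in range(1, 2 ** degree):
        value <<= 1                   # multiply by x
        if (value >> degree) & 1:     # reduce when the degree reaches n
            value ^= poly
        if value == 1:
            return k
    return None


def is_primitive(poly):
    """True when x generates all 2^n - 1 nonzero residues modulo poly."""
    degree = poly.bit_length() - 1
    return order_of_x(poly) == 2 ** degree - 1


print(is_primitive(0b10011))  # x^4 + x + 1
print(is_primitive(0b11111))  # x^4 + x^3 + x^2 + x + 1
```

The loop runs up to 2^n - 1 iterations, so it is only practical for small degrees; that cost is the kind of overhead a faster identification method would aim to avoid.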

628 The Convergence Theorems for Mixing Random Variable Sequences

Authors: Yan-zhao Yang

Abstract:

In this paper, some limit properties for sequences of mixing random variables were studied and some results on the weak law of large numbers for sequences of mixing random variables were presented. Some complete convergence theorems were also obtained. The results extended and improved the corresponding theorems for sequences of i.i.d. random variables.

Keywords: Complete convergence, mixing random variables, weak law of large numbers.

Downloads: 1576

627 A Class of Formal Operators for Combinatorial Identities and its Application

Authors: Ruigang Zhang, Wuyungaowa, Xingchen Ma

Abstract:

In this paper, we present some formulas of symbolic operator summation, which involve generalizations of well-known number sequences or polynomial sequences, and meanwhile we obtain some identities about the sequences by employing M-R's substitution rule.

Keywords: Generating functions, operators sequence group, Riordan arrays, R. G operator group, combinatorial identities.

Downloads: 1760

626 Fractal Analysis of 16S rRNA Gene Sequences in Archaea Thermophiles

Authors: T. Holden, G. Tremberger, Jr, E. Cheung, R. Subramaniam, R. Sullivan, N. Gadura, P. Schneider, P. Marchese, A. Flamholz, T. Cheung, D. Lieberman

Abstract:

A nucleotide sequence can be expressed as a numerical sequence when each nucleotide is assigned its proton number. A resulting gene numerical sequence can be investigated for its fractal dimension in terms of evolution and chemical properties for comparative studies. We have investigated such nucleotide fluctuation in the 16S rRNA gene of archaea thermophiles. The studied archaea thermophiles were Archaeoglobus fulgidus, Methanothermobacter thermautotrophicus, Methanocaldococcus jannaschii, Pyrococcus horikoshii, and Thermoplasma acidophilum. The five studied archaea-euryarchaeota thermophiles have fractal dimension values ranging from 1.93 to 1.97. Computer simulation shows that random sequences would have an average of about 2 with a standard deviation of about 0.015. The fractal dimension was found to correlate negatively with the thermophile's optimal growth temperature, with an R² value of 0.90 (N = 5). The inclusion of two archaea-crenarchaeota thermophiles reduces the R² value to 0.66 (N = 7). Further inclusion of two bacterial thermophiles reduces the R² value to 0.50 (N = 9). The fractal dimension is positively correlated with the sequence GC content, with an R² value of 0.89 for the five archaea-euryarchaeota thermophiles (and 0.74 for the entire set of N = 9), although computer simulation shows little correlation. The highest correlation (positive) was found to be between the fractal dimension and di-nucleotide Shannon entropy. However, Shannon entropy and sequence GC content were observed to correlate with optimal growth temperature, having an R² of 0.8 (negative) and 0.88 (positive), respectively, for the entire set of 9 thermophiles; thus the correlation lacks species specificity. Together with another correlation study of bacterial radiation dosage with RecA repair gene sequence fractal dimension, it is postulated that fractal dimension analysis is a sensitive tool for studying the relationship between genotype and phenotype among closely related sequences.

Keywords: Fractal dimension, archaea thermophiles, Shannon entropy, GC content

Downloads: 1725
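
The quantities named in entry 626 (a proton-number encoding, GC content, and di-nucleotide Shannon entropy) can be sketched as follows. The abstract does not list the proton numbers it uses, so the values here are an assumption computed from the molecular formulas of the free bases (for example adenine, C5H5N5, gives 5·6 + 5·1 + 5·7 = 70). Counting overlapping di-nucleotides is also an assumption, and the fractal dimension computation itself is not attempted.

```python
from collections import Counter
from math import log2

# Assumed values: total proton count of each free base from its formula.
PROTON_NUMBER = {"A": 70, "C": 58, "G": 78, "T": 66}


def to_proton_sequence(dna):
    """Map each nucleotide to its (assumed) proton number."""
    return [PROTON_NUMBER[base] for base in dna.upper()]


def gc_content(dna):
    """Fraction of bases that are G or C."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def dinucleotide_entropy(dna):
    """Shannon entropy (bits) of overlapping di-nucleotide frequencies."""
    dna = dna.upper()
    pairs = [dna[i:i + 2] for i in range(len(dna) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return -sum(c / total * log2(c / total) for c in counts.values())


sample = "ACGT"  # toy input, not taken from any of the studied genes
print(to_proton_sequence(sample))
print(gc_content(sample))
print(dinucleotide_entropy(sample))
```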

625 MIM: A Species Independent Approach for Classifying Coding and Non-Coding DNA Sequences in Bacterial and Archaeal Genomes

Authors: Achraf El Allali, John R. Rose

Abstract:

A number of competing methodologies have been developed to identify genes and classify DNA sequences into coding and non-coding sequences. This classification process is fundamental in gene finding and gene annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. An information theory measure based on mutual information has shown good accuracy in classifying DNA sequences into coding and noncoding. In this paper we describe a species independent iterative approach that distinguishes coding from non-coding sequences using the mutual information measure (MIM). A set of sixty prokaryotes is used to extract universal training data. To facilitate comparisons with the published results of other researchers, a test set of 51 bacterial and archaeal genomes was used to evaluate MIM. These results demonstrate that MIM produces superior results while remaining species independent.

Keywords: Coding Non-coding Classification, Entropy, Gene Recognition, Mutual Information.

Downloads: 1672

624 Strong Limit Theorems for Dependent Random Variables

Authors: Libin Wu, Bainian Li

Abstract:

In this article we establish a moment inequality for dependent random variables and, furthermore, some theorems on the strong law of large numbers and complete convergence for sequences of dependent random variables. In particular, the Marcinkiewicz law of large numbers for independent and identically distributed random variables is generalized to the case of m0-dependent sequences.

Keywords: Lacunary System, Generalized Gaussian, NA sequences, strong law of large numbers.

Downloads: 1434

623 Eukaryotic Gene Prediction by an Investigation of Nonlinear Dynamical Modeling Techniques on EIIP Coded Sequences

Authors: Mai S. Mabrouk, Nahed H. Solouma, Abou-Bakr M. Youssef, Yasser M. Kadah

Abstract:

Many digital signal processing techniques have been used to automatically distinguish protein coding regions (exons) from non-coding regions (introns) in DNA sequences. In this work, we have characterized these sequences according to their nonlinear dynamical features such as moment invariants, correlation dimension, and largest Lyapunov exponent estimates. We have applied our model to a number of real sequences encoded into a time series using EIIP sequence indicators. In order to discriminate between coding and non-coding DNA regions, the phase space trajectory was first reconstructed for coding and non-coding regions. Nonlinear dynamical features are extracted from those regions and used to investigate the difference between them. Our results indicate that the nonlinear dynamical characteristics have yielded significant differences between coding regions (CR) and non-coding regions (NCR) in DNA sequences. Finally, the classifier is tested on real genes where coding and non-coding regions are well known.

Keywords: Gene prediction, nonlinear dynamics, correlation dimension, Lyapunov exponent.

Downloads: 1757

622 New Algorithms for Finding Short Reset Sequences in Synchronizing Automata

Authors: Adam Roman

Abstract:

Finding synchronizing sequences for finite automata is a very important problem in many practical applications (part orienters in industry, the reset problem in biocomputing theory, network issues, etc.). The problem of finding the shortest synchronizing sequence is NP-hard, so polynomial algorithms probably can work only as heuristic ones. In this paper we propose two versions of polynomial algorithms which work better than Eppstein's well-known Greedy and Cycle algorithms.

Keywords: Synchronizing words, reset sequences, Černý Conjecture

Downloads: 1547

621 SAF: A Substitution and Alignment Free Similarity Measure for Protein Sequences

Authors: Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski

Abstract:

The literature reports a large number of approaches for measuring the similarity between protein sequences. Most of these approaches estimate this similarity using alignment-based techniques that do not necessarily yield biologically plausible results, for two reasons. First, for the case of non-alignable (i.e., not yet definitively aligned and biologically approved) sequences such as multi-domain, circular permutation and tandem repeat protein sequences, alignment-based approaches do not succeed in producing biologically plausible results. This is due to the nature of the alignment, which is based on the matching of subsequences in equivalent positions, while non-alignable proteins often have similar and conserved domains in non-equivalent positions. Second, the alignment-based approaches lead to similarity measures that depend heavily on the parameters set by the user for the alignment (e.g., gap penalties and substitution matrices). For easily alignable protein sequences, it's possible to supply a suitable combination of input parameters that allows such an approach to yield biologically plausible results. However, for difficult-to-align protein sequences, supplying different combinations of input parameters yields different results. Such variable results create ambiguities and complicate the similarity measurement task. To overcome these drawbacks, this paper describes a novel and effective approach for measuring the similarity between protein sequences, called SAF for Substitution and Alignment Free. Without resorting either to the alignment of protein sequences or to substitution relations between amino acids, SAF is able to efficiently detect the significant subsequences that best represent the intrinsic properties of protein sequences, those underlying the chronological dependencies of structural features and biochemical activities of protein sequences. 
Moreover, by using a new efficient subsequence matching scheme, SAF more efficiently handles protein sequences that contain similar structural features with significant meaning in chronologically non-equivalent positions. To show the effectiveness of SAF, extensive experiments were performed on protein datasets from different databases, and the results were compared with those obtained by several mainstream algorithms.

Keywords: Protein, Similarity, Substitution, Alignment.

Downloads: 1357

620 Mutational Analysis of CTLA4 Gene in Pakistani SLE Patients

Authors: N. Hussain, G. Jaffery, A.N. Sabri, S. Hasnain

Abstract:

The main aim is to perform mutational analysis of CTLA4 gene exon 1 in SLE patients. A total of 61 SLE patients fulfilling the "American College of Rheumatology (ACR) criteria" and 61 controls were enrolled in this study. The region of CTLA4 gene exon 1 was amplified by using the step-down PCR technique. Extracted DNA of the 354 bp band was sequenced to analyze mutations in exon 1 of the CTLA4 gene. Further, protein sequences were identified from nucleotide sequences of CTLA4 exon 1 by using Expasy software, and through Blast P software it was found that CTLA4 protein sequences of Pakistani SLE patients were similar to those of the Chinese SLE population. No variations were found after the patients' sequences were compared with the control sequence. Thus the CTLA4 gene may not be responsible for the autoimmune disease SLE.

Keywords: American College of Rheumatology criteria, autoimmune disease, Cytotoxic T Lymphocyte Antigen-4, Polymerase Chain Reaction, Systemic Lupus Erythematosus

Downloads: 1479

619 Analysis of the Genetic Sequences of PCV2 Virus in Mexico

Authors: Robles F, Chevez J, Angulo R, Díaz E, González C.

Abstract:

All pig-producing countries from around the world report the presence of postweaning multisystemic wasting syndrome (PMWS). In America, PCV2 has been recognized in Canada, the United States and Brazil. Knowledge concerning the genetic sequences of PMWS has been very important. In Mexico, there is no report describing the genetic sequences and variations of the PCV2 virus present around the country. For this reason, the main objective was to describe the homology and genetic sequences of the PCV2 virus obtained from different regions of Mexico. The results show that both subgenotypes "a" and "b" of this virus are present in Mexico and that the homologies range from 89 to 99%. Regarding the amino acid sequence, three major heterogenic regions were present at positions 59–91, 123–136 and 185–210. This study presents the results of the first genetic characterization of PCV2 in production herds from Mexico.

Keywords: PCV-2, sequencing analysis, Mexico

Downloads: 1518

618 Indoor Mobile Robot Positioning Based on Wireless Fingerprint Matching

Authors: Xu Huang, Jing Fan, Maonian Wu, Yonggen Gu

Abstract:

This paper discusses the design of an indoor mobile robot positioning system. The problem of indoor positioning is solved through Wi-Fi fingerprint positioning to implement a low cost deployment. A wireless fingerprint matching algorithm based on the similarity of unequal length sequences is presented. Candidate sequences selection is defined as a set of mappings, and detection errors caused by wireless hotspot stability and the change of interior pattern can be corrected by transforming the unequal length sequences into equal length sequences. The presented scheme was verified experimentally to achieve the accuracy requirements for an indoor positioning system with low deployment cost.

Keywords: Fingerprint match, indoor positioning, mobile robot positioning system, Wi-Fi, wireless fingerprint.

Downloads: 1567

617 Ranking and Unranking Algorithms for k-ary Trees in Gray Code Order

Authors: Fateme Ashari-Ghomi, Najme Khorasani, Abbas Nowzari-Dalini

Abstract:

In this paper, we present two new ranking and unranking algorithms for k-ary trees represented by x-sequences in Gray code order. These algorithms are based on a Gray code generation algorithm developed by Ahrabian et al. In that paper, a recursive backtracking generation algorithm for x-sequences corresponding to k-ary trees in Gray code was presented. This generation algorithm is based on Vajnovszki's algorithm for generating binary trees in Gray code ordering. To our knowledge, no ranking and unranking algorithms have been given for x-sequences in this ordering. We present ranking and unranking algorithms with O(kn^2) time complexity for x-sequences in this Gray code ordering.

Keywords: k-ary Tree Generation, Ranking, Unranking, Gray Code.

Downloads: 2057

616 Digital Image Encryption Scheme using Chaotic Sequences with a Nonlinear Function

Authors: H. Ogras, M. Turk

Abstract:

In this study, a system of encryption based on chaotic sequences is described. The system is used for encrypting digital image data for the purpose of secure image transmission. An image secure communication scheme based on Logistic map chaotic sequences with a nonlinear function is proposed in this paper. Encryption and decryption keys are obtained by a one-dimensional Logistic map that generates the secret key for the input of the nonlinear function. The receiver can recover the information using the received signal and identical key sequences through the inverse system technique. The results of computer simulations indicate that the transmitted source image can be correctly and reliably recovered by using the proposed scheme even under a noisy channel. The performance of the system is discussed by evaluating the quality of the recovered image with and without channel noise.

Keywords: Digital image, Image encryption, Secure communication

Downloads: 2182
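
The general idea in entry 616, deriving a key sequence from the one-dimensional Logistic map x → r·x·(1 − x), can be sketched with a toy byte cipher. This is not the paper's scheme: a plain XOR stands in for the unspecified nonlinear function, the parameter r = 3.99, the burn-in length and the byte quantization are all assumptions, and the result is for illustration only, not a secure cipher.

```python
def logistic_keystream(x0, r, count, burn_in=100):
    """Produce `count` key bytes from the Logistic map, skipping a transient."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    stream = []
    for _ in range(count):
        x = r * x * (1 - x)
        stream.append(int(x * 256) % 256)  # quantize the state to one byte
    return stream


def xor_cipher(data, x0, r=3.99):
    """XOR data with the keystream; applying it twice restores the input."""
    keystream = logistic_keystream(x0, r, len(data))
    return bytes(b ^ k for b, k in zip(data, keystream))


pixels = bytes([12, 200, 37, 37, 37, 255, 0, 90])  # stand-in for image data
encrypted = xor_cipher(pixels, x0=0.3141)
decrypted = xor_cipher(encrypted, x0=0.3141)
print(encrypted.hex(), decrypted == pixels)
```

Because sender and receiver iterate the same map from the same initial value, the shared secret here is the pair (x0, r), which mirrors the "identical key sequences" requirement in the abstract.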

615 A Geometrical Perspective on the Insulin Evolution

Authors: Yuhei Kunihiro, Sorin V. Sabau, Kazuhiro Shibuya

Abstract:

We study the molecular evolution of insulin from a metric geometry point of view. In mathematics, and in particular in geometry, distances and metrics between objects are of fundamental importance. Using a weaker notion than the classical distance, namely weighted quasi-metrics, one can study the geometry of the space of biological sequences (DNA, mRNA, or proteins). We analyze from a geometrical point of view a family of 60 insulin homologous sequences ranging over a large variety of living organisms, from human to the nematode C. elegans. We show that the distances between sequences provide important information about the evolution and function of insulin.

Keywords: Metric geometry, evolution, insulin.

Downloads: 1483

614 Exons and Introns Classification in Human and Other Organisms

Authors: Benjamin Y. M. Kwan, Jennifer Y. Y. Kwan, Hon Keung Kwan

Abstract:

In this paper, the relative performances of spectral classification of short exon and intron sequences of the human and eleven model organisms are studied. In the simulations, all combinations of sixteen one-sequence numerical representations, four threshold values, and four window lengths are considered. Sequences of 150-base length are chosen and, for each organism, a total of 16,000 sequences are used for training and testing. Results indicate that an appropriate combination of one-sequence numerical representation, threshold value, and window length is essential for arriving at top spectral classification results. For fixed-length sequences, the precisions of exon and intron classification obtained for different organisms are not the same because of their genomic differences. In general, precision increases as sequence length increases.

Keywords: Exons and introns classification, Human genome, Model organism genome, Spectral analysis

Downloads: 2004

613 Computing Entropy for Ortholog Detection

Authors: Hsing-Kuo Pao, John Case

Abstract:

Biological sequences from different species are called orthologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of Kolmogorov complexity or entropy of biological sequences are already well known to be useful in extracting similarity information between such sequences, in the interest, for example, of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In practice one can approximate it by computable compression methods. However, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new approach to overcome the problem that compression approximations may not work well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empirical work described shows the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and negative (non-ortholog) data, better than with good, previously known alternatives (which do not employ some means to handle short sequences well). Also empirically compared are the new entropy-based attribute set and a number of other, more standard similarity attribute sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross validation, through boosted decision tree induction C5.0, and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclusion: the new, entropy-based attribute set by itself is not the one giving the best prediction; however, it is the best attribute set for use in improving the other, standard attribute sets when conjoined with them.

Keywords: compression, decision tree, entropy, ortholog, ROC.

Downloads: 1786
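
Entry 613 notes that Kolmogorov complexity is approximated in practice by computable compression methods. One common form of that idea is the normalized compression distance, sketched below with Python's standard `zlib` module. This illustrates the baseline compression approach the abstract refers to, not the paper's new conditional-entropy attributes; the random test sequences and the choice of zlib are assumptions of this example.

```python
import random
import zlib


def compressed_size(data):
    """Length in bytes of the zlib-compressed input (maximum compression)."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Two unrelated pseudo-random DNA-like sequences of 2000 bases each.
rng = random.Random(1)
a = "".join(rng.choices("ACGT", k=2000)).encode()
b = "".join(rng.choices("ACGT", k=2000)).encode()

print("distance to itself:    ", ncd(a, a))
print("distance to unrelated: ", ncd(a, b))
```

For very short inputs the fixed overhead of the compressor dominates the compressed size, which is the weakness on short sequences that the abstract sets out to address.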

612 Skolem Sequences and Erdosian Labellings of m Paths with 2 and 3 Vertices

Authors: H. V. Chen

Abstract:

Assume that we have m identical graphs, where each graph is a path with k vertices and k is a positive integer. In this paper, we discuss a certain labelling of the m graphs, called c-Erdösian, for some positive integers c. We consider labellings of the vertices of the graphs by positive integers, which induce the edge labels for the paths as the sum of the two incident vertex labels. They have the property that each vertex label and edge label appears only once in the set of positive integers {c, ..., c+6m-1}. Here, we show how to construct certain c-Erdösian labellings of m paths with 2 and 3 vertices by using Skolem sequences.

Keywords: c-Erdösian, Skolem sequences, magic labelling

Downloads: 1112
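
For readers unfamiliar with the Skolem sequences used in entry 612: a Skolem sequence of order n is a sequence of length 2n in which each integer k from 1 to n appears exactly twice, with the two occurrences exactly k positions apart. The sketch below checks that definition and finds a sequence by simple backtracking. It does not construct the c-Erdösian labellings themselves, and both function names are hypothetical.

```python
def is_skolem_sequence(seq):
    """Check that each k in 1..n occurs exactly twice, k positions apart."""
    if len(seq) % 2:
        return False
    n = len(seq) // 2
    positions = {}
    for index, value in enumerate(seq):
        positions.setdefault(value, []).append(index)
    if set(positions) != set(range(1, n + 1)):
        return False
    return all(len(p) == 2 and p[1] - p[0] == k for k, p in positions.items())


def find_skolem_sequence(n):
    """Return one Skolem sequence of order n by backtracking, or None."""
    seq = [0] * (2 * n)

    def place(k):
        if k == 0:
            return True
        for i in range(2 * n - k):
            if seq[i] == 0 and seq[i + k] == 0:
                seq[i] = seq[i + k] = k
                if place(k - 1):
                    return True
                seq[i] = seq[i + k] = 0  # undo and try the next slot
        return False

    return seq if place(n) else None


print(is_skolem_sequence([1, 1, 3, 4, 2, 3, 2, 4]))
print(find_skolem_sequence(4))
print(find_skolem_sequence(2))
```

The search places the largest value first, since it has the fewest valid positions, which tends to prune the backtracking early.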

611 On λ− Summable of Orlicz Space of Gai Sequences of Fuzzy Numbers

Authors: N. Subramanian, S. Krishnamoorthy, S. Balasubramanian

Abstract:

In this paper the concepts of strongly (λM)p-Cesàro summability of a sequence of fuzzy numbers and of strongly λM-statistically convergent sequences of fuzzy numbers are introduced.

Keywords: Fuzzy numbers, statistical convergence, Orlicz space, gai sequence.

Downloads: 1894

610 Construction of cDNA Library and EST Analysis of Tenebrio molitor Larvae

Authors: JiEun Jeong, Se-Won Kang, Hee-Ju Hwang, Sung-Hwa Chae, Sang-Haeng Choi, Hong-Seog Park, YeonSoo Han, Bok-Reul Lee, Dae-Hyun Seog, Yong Seok Lee

Abstract:

To further advance research on immune-related genes from T. molitor, we constructed a cDNA library and analyzed expressed sequence tag (EST) sequences from 1,056 clones. After removing vector sequence and quality checking through the Phred program (trim_alt 0.05, P-score > 20), 1,039 sequences were generated. The average length of insert was 792 bp. In addition, we identified 162 clusters, 167 contigs and 391 contigs after the clustering and assembling process using a TGICL package. EST sequences were searched against the NCBI nr database by local BLAST (blastx, E…

Keywords: EST, Innate immunity, Tenebrio molitor

Downloads: 1490

609 A New Predictor of Coding Regions in Genomic Sequences using a Combination of Different Approaches

Authors: Aníbal Rodríguez Fuentes, Juan V. Lorenzo Ginori, Ricardo Grau Ábalo

Abstract:

Identifying protein coding regions in DNA sequences is a basic step in the location of genes. Several approaches based on signal processing tools have been applied to solve this problem, trying to achieve more accurate predictions. This paper presents a new predictor that improves the efficacy of three techniques that use the Fourier Transform to predict coding regions, and that could be computed using an algorithm that reduces the computation load. Some ideas about the combination of the predictor with other methods are discussed. ROC curves are used to demonstrate the efficacy of the proposed predictor, based on the computation of 25 DNA sequences from three different organisms.

Keywords: Bioinformatics, Coding region prediction, Computational load reduction, Digital Signal Processing, Fourier Transform.

Downloads: 1615
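
Entry 609 refers to techniques that use the Fourier Transform to predict coding regions. A widely used measure of this kind is the spectral power at period 3, computed from binary indicator sequences for the four nucleotides. The sketch below shows only that generic measure under this assumption; it is not the combined predictor the paper proposes, and the function name and toy inputs are hypothetical.

```python
import cmath


def period3_power(dna):
    """Sum over A, C, G, T of the squared DFT magnitude at frequency N/3.

    Each base gets a 0/1 indicator sequence; the DFT term at k = N/3 reduces
    to summing exp(-2*pi*i*position/3) over the positions of that base.
    """
    dna = dna.upper()
    total = 0.0
    for base in "ACGT":
        coefficient = sum(
            cmath.exp(-2j * cmath.pi * position / 3)
            for position, symbol in enumerate(dna)
            if symbol == base
        )
        total += abs(coefficient) ** 2
    return total


periodic = "ATG" * 40    # toy sequence with exact period 3
aperiodic = "ACGT" * 30  # toy sequence with period 4, same length
print(period3_power(periodic))
print(period3_power(aperiodic))
```

In practice such a measure is evaluated over a sliding window along the sequence, so that peaks indicate candidate coding regions; the window length would be a further assumption.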