Search results for: genome mining

1420 Genome Analyses of Pseudomonas Fluorescens b29b from Coastal Kerala

Abstract:

Pseudomonas fluorescens B29B, which has asparaginase enzymatic activity, was isolated from the surface coastal seawater of Trivandrum, India. We report the complete genome of P. fluorescens B29B, sequenced, identified, and annotated from a marine source. The genome consists of a single circular chromosome of 7,331,508 bp with a GC content of 62.19% and 6883 protein-coding genes. Genome analysis identified 340 subsystems, including two predicted asparaginases that warrant further investigation. These genome data will support industrial biotechnology applications of proteins in general and of asparaginase in particular.

Keywords: pseudomonas, marine, asparaginases, Kerala, whole-genome

Procedia PDF Downloads 209

1419 Computing the Similarity and the Diversity in the Species Based on Cronobacter Genome

Authors: E. Al Daoud

Abstract:

The purpose of computing the similarity and the diversity among species is to trace the process of evolution, to find the relationships between species, and to discover the unique, the special, the common and the universal proteins. The proteins of the whole genomes of 40 species are compared with the Cronobacter genome, which is used as the reference genome. More than 3 billion pairwise alignments are performed using blastp. Several findings are reported in this study; for example, we found 172 proteins in the Cronobacter genome that have insignificant hits in other species, 116 proteins with significant hits and very high scores in all tested species, and 129 proteins that are common in plants but have insignificant hits in mammals, birds, fishes, and insects.

Keywords: genome, species, blastp, conserved genes, Cronobacter

Procedia PDF Downloads 491

1418 Genomics of Aquatic Adaptation

Authors: Agostinho Antunes

Abstract:

The completion of human genome sequencing in 2003 opened a new perspective on the importance of whole genome sequencing projects, and multiple species are currently having their genomes completely sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. The voluminous sequencing data generated across multiple organisms also provide a framework for better understanding the genetic makeup of these and related species, allowing exploration of the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group, retrieved from comparative evolutionary genomic analyses of selected marine animal species, will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.

Keywords: comparative genomics, adaptive evolution, bioinformatics, phylogenetics, genome mining

Procedia PDF Downloads 529

1417 Genome-Wide Mining of Potential Guide RNAs for Streptococcus pyogenes and Neisseria meningitides CRISPR-Cas Systems for Genome Engineering

Authors: Farahnaz Sadat Golestan Hashemi, Mohd Razi Ismail, Mohd Y. Rafii

Abstract:

The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system can facilitate targeted genome editing in organisms. Dual or single guide RNA (gRNA) can program the Cas9 nuclease to cut target DNA at particular sites, thereby introducing precise mutations either via error-prone non-homologous end-joining repair or via incorporation of foreign DNA by homologous recombination between donor DNA and the target area. Despite the high demand for this promising technology, developing a well-organized procedure for reliable mining of potential gRNA target sites in large genomic data is still challenging. Hence, we aimed to perform high-throughput detection of target sites by specific PAMs not only for the common Streptococcus pyogenes (SpCas9) but also for the Neisseria meningitidis (NmCas9) CRISPR-Cas system. Previous research confirmed the successful application of such RNA-guided Cas9 orthologs for effective gene targeting and, subsequently, genome manipulation. However, each Cas9 ortholog needs its particular PAM sequence for DNA cleavage activity. Activity levels depend on the sequence of the protospacer and specific combinations of favorable PAM bases. Therefore, based on the specific length and sequence of the PAM, together with a constant length of the target site, for the two Cas9 orthologs, we created a reliable procedure to explore possible gRNA sequences. To mine CRISPR target sites, four different searching modes of sgRNA binding to the target DNA strand were applied: i) coding strand searching, ii) anti-coding strand searching, iii) both strand searching, and iv) paired-gRNA searching. Finally, a complete list of all potential gRNAs, along with their locations, strands, and PAM sequence orientations, can be provided for both SpCas9 and another potential Cas9 ortholog (NmCas9).
The artificial design of potential gRNAs in a genome of interest can accelerate functional genomic studies. Consequently, the application of this novel genome editing tool (CRISPR/Cas technology) will be enhanced through increased versatility and efficiency.
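
The strand-wise PAM search described above can be sketched roughly as follows. This is only an illustration, not the authors' procedure: the PAM motifs and protospacer lengths (5'-NGG-3' with 20 nt for SpCas9, 5'-NNNNGATT-3' with 24 nt for NmCas9) are commonly cited values assumed here because the abstract does not state them, and the names CAS9_SPECS, scan_strand and find_targets are hypothetical. The paired-gRNA mode is not covered.

```python
import re

# Assumed values (not given in the abstract): protospacer length and PAM motif.
CAS9_SPECS = {
    "SpCas9": {"pam": "NGG", "length": 20},
    "NmCas9": {"pam": "NNNNGATT", "length": 24},
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_to_regex(pam):
    """Turn a PAM such as 'NGG' into a regex fragment; N matches any base."""
    return "".join("[ACGT]" if base == "N" else base for base in pam)


def scan_strand(seq, pam, length):
    """Yield (start, protospacer, pam_seq) for each site on the given strand."""
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (length, pam_to_regex(pam)))
    for match in pattern.finditer(seq):
        yield match.start(), match.group(1), match.group(2)


def find_targets(seq, cas9="SpCas9", mode="both"):
    """List candidate sites; mode is 'coding', 'anticoding' or 'both'."""
    spec = CAS9_SPECS[cas9]
    seq = seq.upper()
    site_len = spec["length"] + len(spec["pam"])
    hits = []
    if mode in ("coding", "both"):
        for start, proto, pam in scan_strand(seq, spec["pam"], spec["length"]):
            hits.append({"strand": "+", "start": start,
                         "protospacer": proto, "pam": pam})
    if mode in ("anticoding", "both"):
        rc = reverse_complement(seq)
        for start, proto, pam in scan_strand(rc, spec["pam"], spec["length"]):
            # Report the leftmost coordinate of the site on the forward strand.
            hits.append({"strand": "-", "start": len(seq) - start - site_len,
                         "protospacer": proto, "pam": pam})
    return hits
```

Calling find_targets(sequence, "NmCas9", "coding") would restrict the search to the coding strand for the longer NmCas9 motif; ranking sites by predicted activity or off-target risk is outside this sketch.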

Keywords: CRISPR/Cas9 genome editing, gRNA mining, SpCas9, NmCas9

Procedia PDF Downloads 252

1416 An Improved Ant Colony Algorithm for Genome Rearrangements

Authors: Essam Al Daoud

Abstract:

Genome rearrangement is an important area in computational biology and bioinformatics. The basic problem in genome rearrangements is to compute the edit distance, i.e., the minimum number of operations needed to transform one genome into another. Unfortunately, the unsigned genome rearrangement problem is NP-hard. In this study, an improved ant colony optimization algorithm to approximate the edit distance is proposed. The main idea is to convert the unsigned permutation to a signed permutation and evaluate the ants by using the Kaplan algorithm. Two new operations are added to the standard ant colony algorithm: replacing the worst ants by re-sampling the ants from a new probability distribution, and applying crossover operations on the best ants. The proposed algorithm is tested and compared with the improved breakpoint reversal sort algorithm using three datasets. The results indicate that the proposed algorithm achieves a better accuracy ratio than the previous methods.
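
To make the notions of reversal and breakpoint concrete, here is a minimal sketch. It does not implement the proposed ant colony algorithm or the Kaplan algorithm; it only counts breakpoints in an unsigned permutation and derives the standard lower bound on reversal distance (one reversal removes at most two breakpoints). The function names are illustrative.

```python
def count_breakpoints(perm):
    """Count breakpoints of an unsigned permutation of 1..n.

    The permutation is framed with 0 and n + 1; a breakpoint is an adjacent
    pair whose values do not differ by exactly 1.
    """
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def reverse_segment(perm, i, j):
    """Apply an unsigned reversal to perm[i..j] (inclusive, 0-based)."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def reversal_lower_bound(perm):
    """A reversal removes at most two breakpoints, so d >= ceil(b / 2)."""
    return (count_breakpoints(perm) + 1) // 2
```

For the permutation [3, 2, 1, 4, 5] there are two breakpoints, so at least one reversal is needed, and reversing the first three elements does sort it. A heuristic such as the one in the abstract would search over sequences of such reversals.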

Keywords: ant colony algorithm, edit distance, genome breakpoint, genome rearrangement, reversal sort

Procedia PDF Downloads 339

1415 The Role and Importance of Genome Sequencing in Prediction of Cancer Risk

Authors: M. Sadeghi, H. Pezeshk, R. Tusserkani, A. Sharifi Zarchi, A. Malekpour, M. Foroughmand, S. Goliaei, M. Totonchi, N. Ansari–Pour

Abstract:

The role and relative importance of intrinsic and extrinsic factors in the development of complex diseases such as cancer remains a controversial issue. Determining the amount of variation explained by these factors requires experimental data and statistical models. These models are nevertheless based on the occurrence and accumulation of random mutational events during stem cell division, thus rendering cancer development a stochastic outcome. We demonstrate not only that individual genome sequencing is uninformative in determining cancer risk, but also that assigning a unique genome sequence to any given individual (healthy or affected) is not meaningful. Current whole-genome sequencing approaches are therefore unlikely to realize the promise of personalized medicine. In conclusion, since the genome sequence differs from cell to cell and changes over time, determining the risk of complex diseases based on genome sequence seems somewhat unrealistic, and the resulting data are therefore likely to be inherently uninformative.

Keywords: cancer risk, extrinsic factors, genome sequencing, intrinsic factors

Procedia PDF Downloads 264

1414 A Review Paper on Data Mining and Genetic Algorithm

Authors: Sikander Singh Cheema, Jasmeen Kaur

Abstract:

In this paper, the concept of data mining is summarized, along with one of its important processes, knowledge discovery in databases (KDD). Data mining based on genetic algorithms is examined, and ways to implement it are surveyed. This paper also conducts a formal review of data mining tasks and genetic algorithms in various fields.

Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining

Procedia PDF Downloads 585

1413 High-Throughput Artificial Guide RNA Sequence Design for Type I, II and III CRISPR/Cas-Mediated Genome Editing

Authors: Farahnaz Sadat Golestan Hashemi, Mohd Razi Ismail, Mohd Y. Rafii

Abstract:

A huge revolution in genome engineering has emerged from the discovery of CRISPR (clustered regularly interspaced short palindromic repeats) and CRISPR-associated system genes (Cas) in bacteria. The function of the type II Streptococcus pyogenes (Sp) CRISPR/Cas9 system has been confirmed in various species. Other CRISPR-Cas systems from S. thermophilus (St), CRISPR1-Cas and CRISPR3-Cas, have also been reported to prevent phage infection. The CRISPR1-Cas system interferes by cleaving foreign dsDNA entering the cell in a length-specific and orientation-dependent manner. The S. thermophilus CRISPR3-Cas system also acts by cleaving phage dsDNA genomes at the same specific position inside the targeted protospacer as observed in the CRISPR1-Cas system. Notably, for effective DNA cleavage activity, RNA-guided Cas9 orthologs require their own specific PAM (protospacer adjacent motif) sequences. Activity levels depend on the sequence of the protospacer and specific combinations of favorable PAM bases. Therefore, based on the specific length and sequence of the PAM, together with a constant length of the target site, for the three Cas9 orthologs, a well-organized procedure is required for high-throughput and accurate mining of possible target sites in a large genomic dataset. Consequently, we created a reliable procedure to explore potential gRNA sequences for type I (Streptococcus thermophilus), II (Streptococcus pyogenes), and III (Streptococcus thermophilus) CRISPR/Cas systems. To mine CRISPR target sites, four different searching modes of sgRNA binding to the target DNA strand were applied: i) coding strand searching, ii) anti-coding strand searching, iii) both strand searching, and iv) paired-gRNA searching. The output of this procedure highlights the power of comparative genome mining for different CRISPR/Cas systems.
This could yield a repertoire of Cas9 variants with expanded capabilities for gRNA design, and will pave the way for further advances in genome and epigenome engineering.

Keywords: CRISPR/Cas systems, gRNA mining, Streptococcus pyogenes, Streptococcus thermophilus

Procedia PDF Downloads 250

1412 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data is a major challenge today. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunication industry, mining big data is like mining for gold: it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from it.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 400

1411 Project Risk Assessment of the Mining Industry of Ghana

Authors: Charles Amoatey

Abstract:

The issue of risk in the mining industry is a global phenomenon, and the Ghanaian mining industry is no exception. The main purpose of this study is to identify the critical risk factors affecting the mining industry. The study takes an integrated view of the mining industry by examining the contribution of various risk factors to mining project failure in Ghana. A questionnaire survey was conducted to solicit the critical risk factors from key mining practitioners. About 80 respondents from 11 mining firms participated in the survey. The study identified 22 risk factors contributing to mining project failure in Ghana. The five most critical risk factors, based on both probability of occurrence and impact, were: (1) unstable commodity prices, (2) inflation/exchange rate, (3) land degradation, (4) high cost of living and (5) government bureaucracy in obtaining licenses. Furthermore, the study found that risk assessment in the mining sector has a direct link with mining project sustainability. Mitigation measures for addressing the identified risk factors were discussed. The key findings emphasize the need for a comprehensive risk management culture in the entire mining industry.

Keywords: risk, assessment, mining, Ghana

Procedia PDF Downloads 446

1410 A Comprehensive Survey and Improvement to Existing Privacy Preserving Data Mining Techniques

Authors: Tosin Ige

Abstract:

Ethics must be a condition of the world, like logic (Ludwig Wittgenstein, 1889-1951). As important as data mining is, it poses a significant threat to ethics, privacy, and legality, since data mining makes it difficult for an individual or consumer (in the case of a company) to control the accessibility and usage of their data. This research focuses on current issues and the latest research and development on privacy-preserving data mining methods as of 2022. It also discusses some advances in those techniques, while highlighting and providing a new technique as an improvement on an existing privacy-preserving data mining method. This paper also bridges the wide gap between data mining and the web application programming interface (web API), where research is urgently needed for an added layer of security in data mining, while introducing a seamless and more efficient way of data mining.

Keywords: data, privacy, data mining, association rule, privacy preserving, mining technique

Procedia PDF Downloads 163

1409 Genome Editing in Sorghum: Advancements and Future Possibilities: A Review

Authors: Micheale Yifter Weldemichael, Hailay Mehari Gebremedhn, Teklehaimanot Hailesslasie

Abstract:

The advancement of target-specific genome editing tools, including clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9), mega-nucleases, base editing (BE), prime editing (PE), transcription activator-like endonucleases (TALENs), and zinc-finger nucleases (ZFNs), has paved the way for a modern era of gene editing. CRISPR/Cas9, as a versatile, simple, cost-effective and robust system for genome editing, has dominated the genome manipulation field over the last few years. The application of CRISPR/Cas9 in sorghum improvement is particularly vital in the context of ecological, environmental and agricultural challenges, as well as global climate change. In this context, gene editing using CRISPR/Cas9 can improve nutritional value, yield, resistance to pests and disease, and tolerance to different abiotic stresses. Moreover, CRISPR/Cas9 can potentially perform complex editing to reshape already available elite varieties and create new genetic variations. Existing research aims to further improve the effectiveness of CRISPR/Cas9 genome editing techniques for successfully editing endogenous sorghum genes. These findings suggest that genome editing is a feasible and successful venture in sorghum. Newer improvements and developments of CRISPR/Cas9 techniques have further enabled researchers to modify additional genes in sorghum with improved efficiency. The successful application and development of CRISPR techniques for genome editing in sorghum will help not only in gene discovery, the creation of new and improved traits, the regulation of gene expression, and sorghum functional genomics, but also in making site-specific integration events.

Keywords: CRISPR/Cas9, genome editing, quality, sorghum, stress, yield

Procedia PDF Downloads 52

1408 Mining the Proteome of Fusobacterium nucleatum for Potential Therapeutics Discovery

Authors: Abdul Musaweer Habib, Habibul Hasan Mazumder, Saiful Islam, Sohel Sikder, Omar Faruk Sikder

Abstract:

The plethora of bacterial genome sequence information available in recent times has ushered in many novel strategies for antibacterial drug discovery and has helped medical science take up the challenge of the increasing resistance of pathogenic bacteria to current antibiotics. In this study, we adopted a subtractive genomics approach to analyze the whole genome sequence of Fusobacterium nucleatum, a human oral pathogen associated with colorectal cancer. Our study revealed 1499 proteins of Fusobacterium nucleatum that have no homolog in the human genome. These proteins were further screened using the Database of Essential Genes (DEG), which resulted in the identification of 32 proteins that are vitally important for the bacterium. Subsequent analysis of the identified proteins using the KEGG Automated Annotation Server (KAAS) yielded 3 key enzymes of F. nucleatum that may be good candidates as potential drug targets, since they are unique to the bacterium and absent in humans. In addition, we have modeled the 3-D structures of these three proteins. Finally, the ligand binding sites of the key proteins were determined, and functional inhibitors that best fitted the ligand sites were screened, to discover effective novel therapeutic compounds against Fusobacterium nucleatum.

Keywords: colorectal cancer, drug target, Fusobacterium nucleatum, homology modeling, ligands

Procedia PDF Downloads 381

1407 Block Mining: Block Chain Enabled Process Mining Database

Authors: James Newman

Abstract:

Process mining is an emerging technology that looks to serialize enterprise data into time series data. It has been used by many companies and has been the subject of a variety of research papers. However, the majority of current efforts have looked at how best to perform process mining on standard relational databases. This paper is a first pass at outlining a database custom-built for the minimum viable product of process mining. We present Block Miner, a blockchain protocol to store process mining data across a distributed network. We demonstrate the feasibility of storing process mining data on the blockchain. We present a proof of concept and show how the intersection of these two technologies helps to solve a variety of issues, including but not limited to ransomware attacks, tax documentation, and conflict resolution.
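
The idea of keeping a process mining event log on a blockchain can be illustrated with a simple hash-chained log. This is a hypothetical single-machine sketch, not the Block Miner protocol: the event fields (case_id, activity, timestamp) follow the usual process mining event-log convention and are assumptions, and no distribution or consensus is modeled.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash preceding the first block


def make_block(event, prev_hash):
    """Build a block whose hash covers the event and the previous hash."""
    payload = json.dumps({"event": event, "prev_hash": prev_hash},
                         sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"event": event, "prev_hash": prev_hash, "hash": digest}


def append_event(chain, case_id, activity, timestamp):
    """Append one process event to the chain and return the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    event = {"case_id": case_id, "activity": activity, "timestamp": timestamp}
    chain.append(make_block(event, prev_hash))
    return chain


def verify(chain):
    """Return True if every block links to its predecessor and is unmodified."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if make_block(block["event"], block["prev_hash"])["hash"] != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash depends on the previous block, editing any stored event makes verify return False for the chain, which is the tamper-evidence property that use cases such as conflict resolution would rely on.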

Keywords: blockchain, process mining, memory optimization, protocol

Procedia PDF Downloads 92

1406 Association Rules Mining Task Using Metaheuristics: Review

Authors: Abir Derouiche, Abdesslem Layeb

Abstract:

Association Rule Mining (ARM) is one of the most popular data mining tasks and is widely used in various areas. The search for association rules is an NP-complete problem, which is why metaheuristics have been widely used to solve it. This paper presents ARM as an optimization problem and surveys the metaheuristic-based approaches proposed in the literature.
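
When ARM is framed as an optimization problem, a metaheuristic needs a score for each candidate rule. The sketch below computes the two standard rule measures, support and confidence, and combines them in a weighted fitness. The weighted sum and its parameter w are illustrative assumptions, not taken from the surveyed approaches.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def fitness(transactions, antecedent, consequent, w=0.5):
    """Hypothetical objective a metaheuristic could maximize for one rule."""
    both = set(antecedent) | set(consequent)
    return (w * support(transactions, both)
            + (1 - w) * confidence(transactions, antecedent, consequent))
```

A genetic algorithm or swarm method would then encode a rule as an antecedent and consequent pair and search for rules with high fitness instead of enumerating all item sets.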

Keywords: Optimization, Metaheuristics, Data Mining, Association rules Mining

Procedia PDF Downloads 153

1405 Study for Establishing a Concept of Underground Mining in a Folded Deposit with Weathering

Authors: Chandan Pramanik, Bikramjit Chanda

Abstract:

Large metal mines operated with open-cast mining methods must transition to underground mining at the conclusion of the open-cast operation; however, this involves a difficult period in which production is constrained by interference between the two mining methods. A transition model with collaborative mining operations is presented and established in this work, based on the case of the South Kaliapani Underground Project, to address the technical issues of inadequate production security and other mining challenges during the transition phase and beyond. By integrating the small-scale Drift and Fill method with highly productive Sub Level Open Stoping in the deep section, this hybrid mining concept aims to eliminate major bottlenecks and offers an optimized production profile with safe and sustainable operation. Considering every geo-mining aspect, this study offers a genuine and precise technical deliberation on the transition from open pit to underground mining.

Keywords: drift and fill, geo-mining aspect, sublevel open stoping, underground mining method

Procedia PDF Downloads 97

1404 The Environmental and Socio Economic Impacts of Mining on Local Livelihood in Cameroon: A Case Study in Bertoua

Authors: Fongang Robert Tichuck

Abstract:

This paper reports the findings of a study undertaken to assess the socio-economic and environmental impacts of mining in Bertoua, in the Eastern Region of Cameroon. In addition to sampling community perceptions of mining activities, the study prescribes interventions that can assist in mitigating the negative impacts of mining. Marked environmental and interrelated socio-economic improvements can be achieved within regional artisanal gold mines if the government provides technical support to local operators, regulations are improved, and illegal mining activity is reduced.

Keywords: gold mining, socio-economic, mining activities, local people

Procedia PDF Downloads 389

1403 Genomic Adaptation to Local Climate Conditions in Native Cattle Using Whole Genome Sequencing Data

Authors: Rugang Tian

Abstract:

In this study, we generated whole-genome sequence (WGS) data from 110 native cattle. Together with whole-genome sequences from worldwide cattle populations, we estimated the genetic diversity and population genetic structure of different cattle populations. Our findings revealed clustering of cattle groups in line with their geographic locations. We identified noticeable differences in genetic diversity between indigenous cattle breeds and commercial populations. Among all studied cattle groups, lower genetic diversity measures were found in commercial populations, whereas high genetic diversity was detected in some local cattle, particularly in the Rashoki and Mongolian breeds. Our search for potential genomic regions under selection in native cattle revealed several candidate genes related to immune response and cold shock proteins on multiple chromosomes, such as TRPM8, NMUR1, PRKAA2, SMTNL2 and OXR1, which are involved in energy metabolism and metabolic homeostasis.

Keywords: cattle, whole-genome, population structure, adaptation

Procedia PDF Downloads 62

1402 Genome Sequencing, Assembly and Annotation of Gelidium Pristoides from Kenton-on-Sea, South Africa

Authors: Sandisiwe Mangali, Graeme Bradley

Abstract:

A genome is the complete set of an organism's hereditary information, encoded as deoxyribonucleic acid or, in most viruses, ribonucleic acid. The three types of genome are the nuclear, mitochondrial and plastid genomes; their sequences, uncovered by genome sequencing, serve as an archive of all genetic information, enable researchers to understand the composition of a genome and the regulation of gene expression, and provide information on how the whole genome works. These sequences enable researchers to explore the population structure, genetic variations, and recent demographic events in threatened species. Genome sequencing refers to the process of determining the exact order of the nucleotide bases of a genome, and the process through which all the aforementioned genomes are sequenced is referred to as whole or complete genome sequencing. Gelidium pristoides is a South African endemic Rhodophyta species that has been harvested in the Eastern Cape since the 1950s for its high economic value, which is one motivation for its sequencing. Its endemism further motivates its sequencing for conservation biology, as endemic species are more vulnerable to anthropogenic activities. Sequencing, mapping and annotating the Gelidium pristoides genome is therefore the aim of this study. To accomplish this aim, genomic DNA was extracted and quantified using the NucleoSpin Plant Kit, Qubit 2.0 and Nanodrop. Thereafter, the Ion Plus Fragment Library was used to prepare a 600 bp library, which was then sequenced on the Ion S5 sequencing platform in two runs. The reads produced were quality-controlled and assembled with the SPAdes assembler using default parameters, and the quality of the genome assembly was assessed with the QUAST software.
From this assembly, the plastid and mitochondrial genomes were then extracted using Gelidiales organellar genomes as search queries and ordered according to them using the Geneious software. The Qubit and Nanodrop instruments revealed A260/A280 and A230/A260 values of 1.81 and 1.52, respectively. A total of 30,792,074 reads were obtained, producing 94,140 contigs with a total sequence length of 217.06 Mbp, an N50 value of 3072 bp and a GC content of 41.72%. Total lengths of 179,281 bp and 25,734 bp were obtained for the plastid and mitochondrial genomes, respectively. Genomic data allow a clear understanding of the genomic constitution of an organism and are valuable as foundation information for studies of individual genes and for resolving the evolutionary relationships between organisms, including Rhodophytes and other seaweeds.
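
For readers unfamiliar with the assembly statistics quoted above, the following sketch shows how N50 and GC content are conventionally computed from a set of contigs. It is a plain illustration of the definitions, not the QUAST implementation, and the function names are chosen here for clarity.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def gc_content(sequences):
    """Percentage of G and C bases across all sequences."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    gc = sum(seq.upper().count("G") + seq.upper().count("C")
             for seq in sequences)
    return 100.0 * gc / total
```

An N50 of 3072 bp with 94,140 contigs, as reported, indicates a fragmented draft assembly: half of the assembled bases lie in contigs of roughly 3 kb or longer.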

Keywords: Gelidium pristoides, genome, genome sequencing and assembly, Ion S5 sequencing platform

Procedia PDF Downloads 143

1401 Genome-Wide Association Study Identify COL2A1 as a Susceptibility Gene for the Hand Development Failure of Kashin-Beck Disease

Authors: Feng Zhang

Abstract:

Kashin-Beck disease (KBD) is a chronic osteochondropathy. The mechanism of hand growth and development failure in KBD remains elusive. In this study, we conducted a two-stage genome-wide association study (GWAS) of the palmar length-width ratio (LWR) in KBD, involving a total of 493 Chinese Han KBD patients. The Affymetrix Genome-Wide Human SNP Array 6.0 was applied for SNP genotyping. Association analysis was conducted with the PLINK software. Imputation analysis was performed with IMPUTE against the reference panel of the 1000 Genomes Project. In the GWAS, the most significant association was observed between palmar LWR and rs2071358 of the COL2A1 gene (P value = 4.68×10⁻⁸). Imputation analysis identified 3 SNPs surrounding rs2071358 with significant or suggestive association signals. The replication study observed additional significant association signals at both rs2071358 (P value = 0.017) and rs4760608 (P value = 0.002) of the COL2A1 gene after Bonferroni correction. Our results suggest that COL2A1 is a novel susceptibility gene involved in the growth and development failure of the hand in KBD.

Keywords: Kashin-Beck disease, genome-wide association study, COL2A1, hand

Procedia PDF Downloads 213

1400 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent item sets play an essential role in many data mining tasks that try to find interesting patterns in a database. Typically, the term refers to a set of items that frequently appear together in a transaction dataset. Several algorithms are used for frequent item set mining, yet most do not scale to the type of data we are presented with today, so-called “Big Data”. Big Data is a collection of large data sets. Our approach is to perform frequent item set mining over a large dataset in a scalable and speedy way. MapReduce, together with HDFS, is used to find frequent item sets in Big Data on a large cluster. This paper focuses on using a pre-processing and mining algorithm as a hybrid approach for big data on the Hadoop platform.
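
The map and reduce steps for frequent item set counting can be sketched in plain Python as follows. This is an in-memory illustration of the pattern only: it does not use Hadoop or HDFS, it is not the hybrid algorithm of the paper, and the max_size limit on item set size is an assumption made to keep the enumeration small.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_phase(transaction, max_size=2):
    """Emit (itemset, 1) for every item set of up to max_size items."""
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            yield combo, 1


def reduce_phase(pairs):
    """Sum the counts emitted for each item set."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return item sets that occur in at least min_support transactions."""
    # Each transaction is mapped independently; on a cluster, partial counts
    # from different partitions would be merged in the same way.
    partials = (reduce_phase(map_phase(t, max_size)) for t in transactions)
    totals = reduce(lambda a, b: a + b, partials, Counter())
    return {itemset: n for itemset, n in totals.items() if n >= min_support}
```

Since each transaction is processed without reference to any other, the map step parallelizes naturally; only the merging of counts requires communication between workers.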

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 424

1399 Genome Characterization and Phylogeny Analysis of Viruses Infected Invertebrates, Parvoviridae Family

Authors: Niloofar Fariborzi, Hamzeh Alipour, Kourosh Azizi, Neda Eskandarzade, Abozar Ghorbani

Abstract:

The family Parvoviridae consists of a large diversity of single-stranded DNA viruses, which cause mild to severe diseases in both vertebrates and invertebrates. The Parvoviridae are classified into three subfamilies: Parvovirinae, which infects vertebrates; Densovirinae, which infects invertebrates; and Hamaparvovirinae, which infects both vertebrates and invertebrates. Except for the NS1 region, which is the prime criterion for phylogeny analysis, other parts of the parvovirus genome, such as UTRs, are diverse even among closely related viruses or within the same genus. It is believed that host switching in parvoviruses may be related to genetic changes in regions other than NS1; therefore, whole-genome screening is valuable for studying host-virus interactions in parvoviruses. The aim of this study was to analyze the genome organization and phylogeny of the complete genome sequences of 132 Parvoviridae family members, focusing on viruses that infect invertebrates. The maximum and minimum divergence within a subfamily belonged to Densovirinae and Parvovirinae, respectively. The greatest evolutionary divergence was between Hamaparvovirinae and Parvovirinae. Unclassified viruses were mostly from Parvovirinae and had the highest divergence from densoviruses and the lowest divergence from Parvovirinae viruses. In the phylogenetic tree, all hamaparvoviruses clustered among the densoviruses, with the exception of Syngnathid Ichthamaparvovirus 1 (NC_055527), which was positioned between two Parvovirinae members (NC_022089 and NC_038544). The proximity of hamaparvoviruses to some densoviruses strengthens the possibility that densoviruses may be the ancestors of hamaparvoviruses or vice versa. Therefore, examination and phylogeny analysis of the whole genome is necessary to understand host selection in the Parvoviridae family.

Keywords: densoviruses, parvoviridae, bioinformatics, phylogeny

Procedia PDF Downloads 85

1398 Genome-Wide Analysis of Long Terminal Repeat (LTR) Retrotransposons in Rabbit (Oryctolagus cuniculus)

Authors: Zeeshan Khan, Faisal Nouroz, Shumaila Noureen

Abstract:

The European or common rabbit (Oryctolagus cuniculus) belongs to the class Mammalia, order Lagomorpha, family Leporidae. Rabbits are distributed worldwide and are native to Europe (France, Spain and Portugal) and Africa (Morocco and Algeria). LTR retrotransposons are major Class I mobile genetic elements of eukaryotic genomes and play a crucial role in genome expansion, evolution and diversification. They have mostly been annotated in various genomes by conventional homology searches, which restricted the annotation of novel elements. The present work involved de novo identification of LTR retrotransposons with LTR_FINDER in the haploid genome of the rabbit (2247.74 Mb), distributed over 22 chromosomes, in which 7,933 putative full-length or partial copies were identified, comprising 69.38 Mb of elements and accounting for 3.08% of the genome. The highest copy number (731) was found on chromosome 7, followed by chromosome 12 (705), while the lowest copy number (27) was detected on chromosome 19; no elements were identified on chromosome 21, owing to the partially sequenced chromosome, unidentified nucleotides (N) and simple sequence repeats (SSRs). The identified elements ranged in size from 1.2 to 25.8 Kb, with average sizes between 2 and 10 Kb. The highest percentage of elements (4.77%) was found on chromosome 15, and the lowest (0.55%) on chromosome 19. The most frequent tRNA type was arginine, present in the majority of the elements. Based on these results, it was estimated that the rabbit carries 15,866 copies comprising 137.73 Mb of elements, accounting for 6.16% of the diploid genome (44 chromosomes). Further molecular analyses will be helpful in determining the chromosomal localization and distribution of these elements.

Keywords: rabbit, LTR retrotransposons, genome, chromosome

Procedia PDF Downloads 141

1397 Genetic Diversity and Discovery of Unique SNPs in Five Country Cultivars of Sesamum indicum by Next-Generation Sequencing

Authors: Nam-Kuk Kim, Jin Kim, Soomin Park, Changhee Lee, Mijin Chu, Seong-Hun Lee

Abstract:

In this study, we conducted whole-genome re-sequencing of 10 cultivars originating from five countries (Korea, China, India, Pakistan and Ethiopia), with the Sesamum indicum (Zhongzhi No. 13) genome as a reference. Almost 80% of the reference genome sequence was covered by the sequenced reads. Numerous SNPs and InDels were detected by bioinformatic analysis. Among these variants, 266,051 SNPs were identified as unique to individual countries. Pakistan and Ethiopia had high densities of SNPs compared with the other countries. Three main clusters (cluster 1: Korea; cluster 2: Pakistan and India; cluster 3: Ethiopia and China) were recovered by neighbor-joining analysis using all variants. Interestingly, some variants were detected in the DGAT1 (diacylglycerol O-acyltransferase 1) and FADS (fatty acid desaturase) genes, which are known to be related to fatty acid synthesis and metabolism. These results provide useful information for understanding regional characteristics and for developing DNA markers for origin discrimination of sesame.
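The abstract does not define how a SNP is judged "unique" to a country. One plausible reading, sketched below as an assumption, is that a SNP counts as country-unique when it is seen in at least one cultivar from that country and in no cultivar from any other country. The cultivar names and SNP keys are made-up toy data.

```python
from collections import defaultdict


def country_unique_snps(calls):
    """Find SNPs seen in one country only.

    calls maps a cultivar name to a (country, set_of_snp_keys) pair.
    Returns a dict mapping each country to its country-unique SNP set.
    """
    # Pool SNPs over all cultivars from the same country.
    by_country = defaultdict(set)
    for country, snps in calls.values():
        by_country[country] |= snps

    unique = {}
    for country, snps in by_country.items():
        # Union of SNPs observed in every other country.
        elsewhere = set()
        for other, other_snps in by_country.items():
            if other != country:
                elsewhere |= other_snps
        unique[country] = snps - elsewhere
    return unique


# Hypothetical toy input: SNP keys are "chromosome:position:alt" strings.
toy_calls = {
    "cultivar_A": ("Korea", {"chr1:100:T", "chr2:50:G"}),
    "cultivar_B": ("Korea", {"chr1:100:T", "chr3:7:A"}),
    "cultivar_C": ("India", {"chr2:50:G", "chr4:9:C"}),
}
for country, snps in sorted(country_unique_snps(toy_calls).items()):
    print(country, sorted(snps))
```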

Keywords: Sesamum indicum, NGS, SNP, DNA marker

Procedia PDF Downloads 323

1396 Allele Mining for Rice Sheath Blight Resistance by Whole-Genome Association Mapping in a Tail-End Population

Authors: Naoki Yamamoto, Hidenobu Ozaki, Taiichiro Ookawa, Youming Liu, Kazunori Okada, Aiping Zheng

Abstract:

Rice sheath blight is one of the most destructive fungal diseases of rice. Resistance to rice sheath blight is considered a polygenic trait. Host-pathogen interactions and secondary metabolites such as lignin and phytoalexins are likely to be involved in defense against Rhizoctonia solani. However, to our knowledge, it is still unknown how sheath blight resistance can be enhanced in rice breeding. To seek alternative genetic factors that contribute to sheath blight resistance, we mined relevant allelic variations from rice core collections created in Japan. Based on disease lesion length on detached leaf sheaths, we selected 30 varieties each from the top and bottom tail ends of the core collections to perform genome-wide association mapping. Re-sequencing reads for these varieties were used to call single nucleotide polymorphisms among the 60 varieties and create a SNP panel, which contained 1,137,131 homozygous variant sites after filtering. Association mapping highlighted a locus on the long arm of chromosome 11, which co-localizes with three sheath blight QTLs: qShB11-2-TX, qShB11, and qSBR-11-2. Based on the localization of the trait-associated alleles, we identified an ankyrin repeat-containing protein gene (ANK-M) as an uncharacterized candidate factor for rice sheath blight resistance. Allelic distributions of ANK-M in the whole rice population supported the reliability of the trait-allele associations. Gene expression characteristics were checked to evaluate the functionality of ANK-M. Since an ANK-M homolog in rice (OsPIANK1) appears to be a basal defense regulator against rice blast and bacterial leaf blight, ANK-M may also play a role in the rice immune system.
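The tail-end sampling step described above can be sketched as ranking varieties by lesion length and keeping the two extremes. This is only an illustration under assumed inputs (a mapping from variety name to mean lesion length); the function name and toy values are hypothetical, and the abstract does not state which tail corresponds to resistance.

```python
def select_tail_ends(lesion_length, n=30):
    """Pick the n varieties with the shortest and the n with the longest lesions.

    lesion_length maps a variety name to its measured lesion length.
    Returns (shortest_n, longest_n) as two lists of variety names.
    """
    if len(lesion_length) < 2 * n:
        raise ValueError("need at least 2 * n varieties so the tails do not overlap")
    # Sort variety names from shortest to longest lesion.
    ranked = sorted(lesion_length, key=lesion_length.get)
    return ranked[:n], ranked[-n:]


# Hypothetical toy data with n=2 instead of the 30 used in the study.
toy_lengths = {"v1": 4.2, "v2": 1.1, "v3": 7.9, "v4": 2.5, "v5": 6.0}
short_tail, long_tail = select_tail_ends(toy_lengths, n=2)
print("shortest lesions:", short_tail)
print("longest lesions:", long_tail)
```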

Keywords: allele mining, GWAS, QTL, rice sheath blight

Procedia PDF Downloads 72

1395 Insights into the Annotated Genome Sequence of Defluviitoga tunisiensis L3 Isolated from a Thermophilic Rural Biogas Producing Plant

Authors: Irena Maus, Katharina Gabriella Cibis, Andreas Bremges, Yvonne Stolze, Geizecler Tomazetto, Daniel Wibberg, Helmut König, Alfred Pühler, Andreas Schlüter

Abstract:

Within the agricultural sector, the production of biogas from organic substrates represents an economically attractive technology to generate bioenergy. Complex consortia of microorganisms are responsible for biomass decomposition and biogas production. Recently, species belonging to the phylum Thermotogae were detected in thermophilic biogas-production plants utilizing renewable primary products for biomethanation. To analyze adaptive genome features of representative Thermotogae strains, Defluviitoga tunisiensis L3 was isolated from a rural thermophilic biogas plant (54°C) and completely sequenced on an Illumina MiSeq system. Sequencing and assembly of the D. tunisiensis L3 genome yielded a circular chromosome with a size of 2,053,097 bp and a mean GC content of 31.38%. Functional annotation of the complete genome sequence revealed that the thermophilic strain L3 encodes several genes predicted to facilitate growth of this microorganism on arabinose, galactose, maltose, mannose, fructose, raffinose, ribose, cellobiose, lactose, xylose, xylan, lactate and mannitol. Acetate, hydrogen (H2) and carbon dioxide (CO2) are predicted to be the end products of the fermentation process. These end products are substrates for methanogenic archaea, the key players in the final step of the anaerobic digestion process. To determine the degree of relatedness of dominant biogas community members within selected digester systems to D. tunisiensis L3, metagenome sequences from the corresponding communities were mapped onto the L3 genome. These fragment recruitments revealed that metagenome reads originating from a thermophilic biogas plant covered 95% of the D. tunisiensis L3 genome sequence. In conclusion, the availability of the D. tunisiensis L3 genome sequence and insights into its metabolic capabilities provide the basis for biotechnological exploitation of genome features involved in thermophilic fermentation processes utilizing renewable primary products.
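A coverage figure such as the 95% reported for the fragment recruitment can be derived by merging the mapped read intervals and dividing the covered length by the genome length. The sketch below assumes reads are given as 0-based, half-open (start, end) intervals on a linear coordinate system and ignores reads that wrap around the origin of a circular chromosome; it is not the authors' pipeline.

```python
def covered_fraction(intervals, genome_length):
    """Return the fraction of the genome covered by at least one interval.

    intervals is an iterable of (start, end) pairs, 0-based and half-open.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical toy alignment positions on a 100 bp genome.
toy_reads = [(0, 10), (5, 20), (30, 40)]
print(f"covered fraction: {covered_fraction(toy_reads, 100):.2f}")
```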

Keywords: genome sequence, thermophilic biogas plant, Thermotogae, Defluviitoga tunisiensis

Procedia PDF Downloads 492

1394 Genomics of Adaptation in the Sea

Authors: Agostinho Antunes

Abstract:

The completion of the human genome sequencing in 2003 opened a new perspective on the importance of whole-genome sequencing projects, and multiple species are currently having their genomes completely sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. The voluminous sequencing data generated across multiple organisms also provide a framework to better understand the genetic makeup of these species and related ones, making it possible to explore the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group, retrieved from comparative evolutionary genomic analyses of selected marine animal species, will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.

Keywords: marine genomics, evolutionary bioinformatics, human genome sequencing, genomic analyses

Procedia PDF Downloads 605

1393 Review of Different Machine Learning Algorithms

Authors: Syed Romat Ali Shah, Bilal Shoaib, Saleem Akhtar, Munib Ahmad, Shahan Sadiqui

Abstract:

Classification is a data mining technique based on Machine Learning (ML) algorithms. It is used to assign individual items in a known set of information to a set of predefined modules or groups. Web mining is also a part of this kind of data mining method. The main purpose of this paper is to analyze and compare the performance of the Naïve Bayes algorithm, Decision Tree, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN) and Support Vector Machine (SVM). The paper covers these ML algorithms together with their advantages and disadvantages, and also defines research issues.
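As an illustration of one of the classifiers named above, here is a minimal K-Nearest Neighbor sketch using only the standard library. It assumes numeric feature tuples, Euclidean distance and a simple majority vote; the training points are made-up toy data and the sketch is not taken from the paper.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (features, label) pairs, where features is a tuple
    of numbers with the same length as query.
    """
    if k < 1 or k > len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by Euclidean distance to the query.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy training set with two well-separated classes.
toy_train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
print(knn_predict(toy_train, (0.5, 0.5)))
print(knn_predict(toy_train, (5.5, 5.5)))
```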

Keywords: Data Mining, Web Mining, classification, ML Algorithms

Procedia PDF Downloads 290

1392 Object-Centric Process Mining Using Process Cubes

Authors: Anahita Farhang Ghahfarokhi, Alessandro Berti, Wil M.P. van der Aalst

Abstract:

Process mining provides ways to analyze business processes. Common process mining techniques consider the process as a whole. However, in real-life business processes, different behaviors exist that make the overall process too complex to interpret. Process comparison is a branch of process mining that isolates different behaviors of the process from each other by using process cubes. Process cubes organize event data using different dimensions. Each cell contains a set of events that can be used as input for process mining techniques. Existing work on process cubes assumes a single case notion. However, in real processes, several case notions (e.g., order, item, package) are intertwined. Object-centric process mining is a new branch of process mining that addresses multiple case notions in a process. To bridge object-centric process mining and process comparison, we propose a process cube framework that supports process cube operations such as slice and dice on object-centric event logs. To facilitate the comparison, the framework is integrated with several object-centric process discovery approaches.
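The cube, slice and dice operations mentioned above can be sketched on a toy object-centric event log. In this sketch, which is an assumption and does not reproduce the authors' framework, each event is a plain dict that carries ordinary attributes plus lists of related order and item identifiers; all field names and values are hypothetical.

```python
from collections import defaultdict


def build_cube(events, dimensions):
    """Group events into cells keyed by their values on the given dimensions."""
    cube = defaultdict(list)
    for event in events:
        cube[tuple(event[d] for d in dimensions)].append(event)
    return dict(cube)


def slice_events(events, dimension, value):
    """Slice: keep only events with one fixed value on a single dimension."""
    return [e for e in events if e[dimension] == value]


def dice_events(events, allowed):
    """Dice: keep events whose value on every listed dimension is allowed.

    allowed maps a dimension name to a set of accepted values.
    """
    return [e for e in events if all(e[d] in vals for d, vals in allowed.items())]


# Hypothetical object-centric log: one event may refer to several objects.
toy_log = [
    {"activity": "place order", "region": "EU", "year": 2021, "orders": ["o1"], "items": ["i1", "i2"]},
    {"activity": "pick item", "region": "EU", "year": 2021, "orders": ["o1"], "items": ["i1"]},
    {"activity": "place order", "region": "US", "year": 2022, "orders": ["o2"], "items": ["i3"]},
]
cube = build_cube(toy_log, ["region", "year"])
eu_events = slice_events(toy_log, "region", "EU")
diced = dice_events(toy_log, {"region": {"EU", "US"}, "year": {2022}})
print(len(cube), len(eu_events), len(diced))
```

Each cell of `cube`, or the result of a slice or dice, is again a list of events, so it could be handed to a process discovery step for comparison.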

Keywords: multidimensional process mining, multi-perspective business processes, OLAP, process cubes, process discovery, process mining

Procedia PDF Downloads 250

1391 Algorithms used in Spatial Data Mining GIS

Authors: Vahid Bairami Rad

Abstract:

Extracting knowledge from spatial data such as GIS data is important for reducing the data and extracting information. Therefore, the development of new techniques and tools that support humans in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area ‘knowledge discovery in databases’. We introduce a set of database primitives, or basic operations, for spatial data mining that are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. Similar to the relational standard language SQL, the use of standard primitives will speed up the development of new data mining algorithms and will also make them more portable. We introduce a database-oriented framework for spatial data mining that is based on the concepts of neighborhood graphs and paths. A small set of basic operations on these graphs and paths is defined as database primitives for spatial data mining. Furthermore, techniques to efficiently support the database primitives in a commercial DBMS are presented.
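The notions of neighborhood graphs and paths can be illustrated in memory with a few lines of code. The sketch below assumes point objects and a distance threshold as the neighborhood relation; the abstract does not specify the relation or the exact primitives, so the function names and toy coordinates are hypothetical and no DBMS support is modeled.

```python
import math
from collections import deque


def neighborhood_graph(points, max_distance):
    """Build an undirected graph linking points no farther apart than max_distance.

    points maps an object name to its (x, y) coordinates.
    """
    graph = {name: set() for name in points}
    names = list(points)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(points[a], points[b]) <= max_distance:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def neighborhood_paths(graph, start, max_length):
    """Return all simple paths from start with between 1 and max_length edges."""
    result = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            result.append(path)
        if len(path) - 1 < max_length:
            # Extend the path by each unvisited neighbor of its last node.
            for nxt in sorted(graph[path[-1]]):
                if nxt not in path:
                    queue.append(path + [nxt])
    return result


# Hypothetical toy objects on a plane.
toy_points = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (10, 10)}
toy_graph = neighborhood_graph(toy_points, max_distance=1.5)
print(sorted(toy_graph["b"]))
print(neighborhood_paths(toy_graph, "a", max_length=2))
```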

Keywords: spatial data base, knowledge discovery database, data mining, spatial relationship, predictive data mining

Procedia PDF Downloads 457