Search results for: whole exome sequencing data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25104

Search results for: whole exome sequencing data

24864 Assessment of DNA Sequence Encoding Techniques for Machine Learning Algorithms Using a Universal Bacterial Marker

Authors: Diego Santibañez Oyarce, Fernanda Bravo Cornejo, Camilo Cerda Sarabia, Belén Díaz Díaz, Esteban Gómez Terán, Hugo Osses Prado, Raúl Caulier-Cisterna, Jorge Vergara-Quezada, Ana Moya-Beltrán

Abstract:

The advent of high-throughput sequencing technologies has revolutionized genomics, generating vast amounts of genetic data that challenge traditional bioinformatics methods. Machine learning addresses these challenges by leveraging computational power to identify patterns and extract information from large datasets. However, biological sequence data, being symbolic and non-numeric, must be converted into numerical formats for machine learning algorithms to process effectively. So far, some encoding methods, such as one-hot encoding or k-mers, have been explored. This work proposes additional approaches for encoding DNA sequences in order to compare them with existing techniques and determine if they can provide improvements or if current methods offer superior results. Data from the 16S rRNA gene, a universal marker, was used to analyze eight bacterial groups that are significant in the pulmonary environment and have clinical implications. The bacterial genes included in this analysis are Prevotella, Abiotrophia, Acidovorax, Streptococcus, Neisseria, Veillonella, Mycobacterium, and Megasphaera. These data were downloaded from the NCBI database in Genbank file format, followed by a syntactic analysis to selectively extract relevant information from each file. For data encoding, a sequence normalization process was carried out as the first step. From approximately 22,000 initial data points, a subset was generated for testing purposes. Specifically, 55 sequences from each bacterial group met the length criteria, resulting in an initial sample of approximately 440 sequences. The sequences were encoded using different methods, including one-hot encoding, k-mers, Fourier transform, and Wavelet transform. Various machine learning algorithms, such as support vector machines, random forests, and neural networks, were trained to evaluate these encoding methods. The performance of these models was assessed using multiple metrics, including the confusion matrix, ROC curve, and F1 Score, providing a comprehensive evaluation of their classification capabilities. The results show that accuracies between encoding methods vary by up to approximately 15%, with the Fourier transform obtaining the best results for the evaluated machine learning algorithms. These findings, supported by the detailed analysis using the confusion matrix, ROC curve, and F1 Score, provide valuable insights into the effectiveness of different encoding methods and machine learning algorithms for genomic data analysis, potentially improving the accuracy and efficiency of bacterial classification and related genomic studies.

Keywords: DNA encoding, machine learning, Fourier transform, Fourier transformation

Procedia PDF Downloads 11
24863 Complete Genome Sequence Analysis of Pasteurella multocida Subspecies multocida Serotype A Strain PMTB2.1

Authors: Shagufta Jabeen, Faez J. Firdaus Abdullah, Zunita Zakaria, Nurulfiza M. Isa, Yung C. Tan, Wai Y. Yee, Abdul R. Omar

Abstract:

Pasteurella multocida (PM) is an important veterinary opportunistic pathogen particularly associated with septicemic pasteurellosis, pneumonic pasteurellosis and hemorrhagic septicemia in cattle and buffaloes. P. multocida serotype A has been reported to cause fatal pneumonia and septicemia. Pasteurella multocida subspecies multocida of serotype A Malaysian isolate PMTB2.1 was first isolated from buffaloes died of septicemia. In this study, the genome of P. multocida strain PMTB2.1 was sequenced using third-generation sequencing technology, PacBio RS2 system and analyzed bioinformatically via de novo analysis followed by in-depth analysis based on comparative genomics. Bioinformatics analysis based on de novo assembly of PacBio raw reads generated 3 contigs followed by gap filling of aligned contigs with PCR sequencing, generated a single contiguous circular chromosome with a genomic size of 2,315,138 bp and a GC content of approximately 40.32% (Accession number CP007205). The PMTB2.1 genome comprised of 2,176 protein-coding sequences, 6 rRNA operons and 56 tRNA and 4 ncRNAs sequences. The comparative genome sequence analysis of PMTB2.1 with nine complete genomes which include Actinobacillus pleuropneumoniae, Haemophilus parasuis, Escherichia coli and five P. multocida complete genome sequences including, PM70, PM36950, PMHN06, PM3480, PMHB01 and PMTB2.1 was carried out based on OrthoMCL analysis and Venn diagram. The analysis showed that 282 CDs (13%) are unique to PMTB2.1and 1,125 CDs with orthologs in all. This reflects overall close relationship of these bacteria and supports the classification in the Gamma subdivision of the Proteobacteria. In addition, genomic distance analysis among all nine genomes indicated that PMTB2.1 is closely related with other five Pasteurella species with genomic distance less than 0.13. Synteny analysis shows subtle differences in genetic structures among different P.multocida indicating the dynamics of frequent gene transfer events among different P. multocida strains. However, PM3480 and PM70 exhibited exceptionally large structural variation since they were swine and chicken isolates. Furthermore, genomic structure of PMTB2.1 is more resembling that of PM36950 with a genomic size difference of approximately 34,380 kb (smaller than PM36950) and strain-specific Integrative and Conjugative Elements (ICE) which was found only in PM36950 is absent in PMTB2.1. Meanwhile, two intact prophages sequences of approximately 62 kb were found to be present only in PMTB2.1. One of phage is similar to transposable phage SfMu. The phylogenomic tree was constructed and rooted with E. coli, A. pleuropneumoniae and H. parasuis based on OrthoMCL analysis. The genomes of P. multocida strain PMTB2.1 were clustered with bovine isolates of P. multocida strain PM36950 and PMHB01 and were separated from avian isolate PM70 and swine isolates PM3480 and PMHN06 and are distant from Actinobacillus and Haemophilus. Previous studies based on Single Nucleotide Polymorphism (SNPs) and Multilocus Sequence Typing (MLST) unable to show a clear phylogenetic relatedness between Pasteurella multocida and the different host. In conclusion, this study has provided insight on the genomic structure of PMTB2.1 in terms of potential genes that can function as virulence factors for future study in elucidating the mechanisms behind the ability of the bacteria in causing diseases in susceptible animals.

Keywords: comparative genomics, DNA sequencing, phage, phylogenomics

Procedia PDF Downloads 182
24862 Control the Flow of Big Data

Authors: Shizra Waris, Saleem Akhtar

Abstract:

Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have within a short period of time. Consequently this fast increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and its current trends. This paper presents a complete discussion on state-of-the-art big data technologies based on group and stream data processing. Moreover, strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendor, several open research challenges and the chances brought about by big data. The similarities and differences of these techniques and technologies based on important limitations are also investigated. Emerging technologies are suggested as a solution for big data problems.

Keywords: computer, it community, industry, big data

Procedia PDF Downloads 187
24861 Identification of Blood Biomarkers Unveiling Early Alzheimer's Disease Diagnosis Through Single-Cell RNA Sequencing Data and Autoencoders

Authors: Hediyeh Talebi, Shokoofeh Ghiam, Changiz Eslahchi

Abstract:

Traditionally, Alzheimer’s disease research has focused on genes with significant fold changes, potentially neglecting subtle but biologically important alterations. Our study introduces an integrative approach that highlights genes crucial to underlying biological processes, regardless of their fold change magnitude. Alzheimer's Single-cell RNA-seq data related to the peripheral blood mononuclear cells (PBMC) was extracted from the Gene Expression Omnibus (GEO). After quality control, normalization, scaling, batch effect correction, and clustering, differentially expressed genes (DEGs) were identified with adjusted p-values less than 0.05. These DEGs were categorized based on cell-type, resulting in four datasets, each corresponding to a distinct cell type. To distinguish between cells from healthy individuals and those with Alzheimer's, an adversarial autoencoder with a classifier was employed. This allowed for the separation of healthy and diseased samples. To identify the most influential genes in this classification, the weight matrices in the network, which includes the encoder and classifier components, were multiplied, and focused on the top 20 genes. The analysis revealed that while some of these genes exhibit a high fold change, others do not. These genes, which may be overlooked by previous methods due to their low fold change, were shown to be significant in our study. The findings highlight the critical role of genes with subtle alterations in diagnosing Alzheimer's disease, a facet frequently overlooked by conventional methods. These genes demonstrate remarkable discriminatory power, underscoring the need to integrate biological relevance with statistical measures in gene prioritization. This integrative approach enhances our understanding of the molecular mechanisms in Alzheimer’s disease and provides a promising direction for identifying potential therapeutic targets.

Keywords: alzheimer's disease, single-cell RNA-seq, neural networks, blood biomarkers

Procedia PDF Downloads 58
24860 Classification of Multiple Cancer Types with Deep Convolutional Neural Network

Authors: Nan Deng, Zhenqiu Liu

Abstract:

Thousands of patients with metastatic tumors were diagnosed with cancers of unknown primary sites each year. The inability to identify the primary cancer site may lead to inappropriate treatment and unexpected prognosis. Nowadays, a large amount of genomics and transcriptomics cancer data has been generated by next-generation sequencing (NGS) technologies, and The Cancer Genome Atlas (TCGA) database has accrued thousands of human cancer tumors and healthy controls, which provides an abundance of resource to differentiate cancer types. Meanwhile, deep convolutional neural networks (CNNs) have shown high accuracy on classification among a large number of image object categories. Here, we utilize 25 cancer primary tumors and 3 normal tissues from TCGA and convert their RNA-Seq gene expression profiling to color images; train, validate and test a CNN classifier directly from these images. The performance result shows that our CNN classifier can archive >80% test accuracy on most of the tumors and normal tissues. Since the gene expression pattern of distant metastases is similar to their primary tumors, the CNN classifier may provide a potential computational strategy on identifying the unknown primary origin of metastatic cancer in order to plan appropriate treatment for patients.

Keywords: bioinformatics, cancer, convolutional neural network, deep leaning, gene expression pattern

Procedia PDF Downloads 296
24859 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of the multiplied data growth, many computer science tools have been developed to process and analyze these Big Data. High-performance computing architectures have been designed to meet the treatment needs of Big Data (view transaction processing standpoint, strategic, and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend of high-performance computing architectures especially what has a relation with Analytics and Data Mining.

Keywords: high performance computing, HPC, big data, data analysis

Procedia PDF Downloads 512
24858 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore re3dat.org registry to identify research data repositories registration workflow process. Further objective is to depict a graph for present development of research data repositories in India. Preliminarily with an approach to understand re3data.org registry framework and schema design then further proceed to explore the status of research data repositories of India in re3data.org registry. Research data repositories are getting wider relevance due to e-research concepts. Now available registry re3data.org is a good tool for users and researchers to identify appropriate research data repositories as per their research requirements. In Indian environment, a compatible National Research Data Policy is the need of the time to boost the management of research data. Registry for Research Data Repositories is a crucial tool to discover specific information in specific domain. Also, Research Data Repositories in India have not been studied. Re3data.org registry and status of Indian research data repositories both discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 315
24857 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processed big data of transportation ridership (eg., smartcard data) and traffic operation (e.g., traffic detectors data) which requires a lot of computational power is incontrovertible in Intelligent Transportation Systems. Nowadays cloud computing is one of the important subjects and popular information technology solution for data processing. It enables users to process enormous measure of data without having their own particular computing power. Thus, it can also be a good selection for transportation big data processing as well. This paper intends to examine how the cloud computing can enhance transportation big data process with contrasting its advantages and disadvantages, and discussing cloud computing features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 455
24856 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 237
24855 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 264
24854 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 354
24853 Isolation and Screening of Fungal Strains for β-Galactosidase Production

Authors: Parmjit S. Panesar, Rupinder Kaur, Ram S. Singh

Abstract:

Enzymes are the biocatalysts which catalyze the biochemical processes and thus have a wide variety of applications in the industrial sector. β-Galactosidase (E.C. 3.2.1.23) also known as lactase, is one of the prime enzymes, which has significant potential in the dairy and food processing industries. It has the capability to catalyze both the hydrolytic reaction for the production of lactose hydrolyzed milk and transgalactosylation reaction for the synthesis of prebiotics such as lactulose and galactooligosaccharides. These prebiotics have various nutritional and technological benefits. Although, the enzyme is naturally present in almonds, peaches, apricots and other variety of fruits and animals, the extraction of enzyme from these sources increases the cost of enzyme. Therefore, focus has been shifted towards the production of low cost enzyme from the microorganisms such as bacteria, yeast and fungi. As compared to yeast and bacteria, fungal β-galactosidase is generally preferred as being extracellular and thermostable in nature. Keeping the above in view, the present study was carried out for the isolation of the β-galactosidase producing fungal strain from the food as well as the agricultural wastes. A total of more than 100 fungal cultures were examined for their potential in enzyme production. All the fungal strains were screened using X-gal and IPTG as inducers in the modified Czapek Dox Agar medium. Among the various isolated fungal strains, the strain exhibiting the highest enzyme activity was chosen for further phenotypic and genotypic characterization. The strain was identified as Rhizomucor pusillus on the basis of 5.8s RNA gene sequencing data.

Keywords: beta-galactosidase, enzyme, fungal, isolation

Procedia PDF Downloads 246
24852 Contribution of PALB2 and BLM Mutations to Familial Breast Cancer Risk in BRCA1/2 Negative South African Breast Cancer Patients Detected Using High-Resolution Melting Analysis

Authors: N. C. van der Merwe, J. Oosthuizen, M. F. Makhetha, J. Adams, B. K. Dajee, S-R. Schneider

Abstract:

Women representing high-risk breast cancer families, who tested negative for pathogenic mutations in BRCA1 and BRCA2, are four times more likely to develop breast cancer compared to women in the general population. Sequencing of genes involved in genomic stability and DNA repair led to the identification of novel contributors to familial breast cancer risk. These include BLM and PALB2. Bloom's syndrome is a rare homozygous autosomal recessive chromosomal instability disorder with a high incidence of various types of neoplasia and is associated with breast cancer when in a heterozygous state. PALB2, on the other hand, binds to BRCA2 and together, they partake actively in DNA damage repair. Archived DNA samples of 66 BRCA1/2 negative high-risk breast cancer patients were retrospectively selected based on the presence of an extensive family history of the disease ( > 3 affecteds per family). All coding regions and splice-site boundaries of both genes were screened using High-Resolution Melting Analysis. Samples exhibiting variation were bi-directionally automated Sanger sequenced. The clinical significance of each variant was assessed using various in silico and splice site prediction algorithms. Comprehensive screening identified a total of 11 BLM and 26 PALB2 variants. The variants detected ranged from global to rare and included three novel mutations. Three BLM and two PALB2 likely pathogenic mutations were identified that could account for the disease in these extensive breast cancer families in the absence of BRCA mutations (BLM c.11T > A, p.V4D; BLM c.2603C > T, p.P868L; BLM c.3961G > A, p.V1321I; PALB2 c.421C > T, p.Gln141Ter; PALB2 c.508A > T, p.Arg170Ter). Conclusion: The study confirmed the contribution of pathogenic mutations in BLM and PALB2 to the familial breast cancer burden in South Africa. It explained the presence of the disease in 7.5% of the BRCA1/2 negative families with an extensive family history of breast cancer. Segregation analysis will be performed to confirm the clinical impact of these mutations for each of these families. These results justify the inclusion of both these genes in a comprehensive breast and ovarian next generation sequencing cancer panel and should be screened simultaneously with BRCA1 and BRCA2 as it might explain a significant percentage of familial breast and ovarian cancer in South Africa.

Keywords: Bloom Syndrome, familial breast cancer, PALB2, South Africa

Procedia PDF Downloads 226
24851 Secondary Metabolites Identified from a Pseudoalteromonas rubra Bacterial Strain Isolated from a Fijian Marine Alga

Authors: James Sinclair, Katy Soapi, Brad Carte

Abstract:

The marine environment has continuously demonstrated to be a rich source of secondary metabolites and bioactive compounds that can address the many pharmaceutical problems facing mankind. The emergence of multidrug resistant pathogens has caused scientists to explore contemporary ways of combating these super bugs. A red-pigmented bacterial strain isolated from a marine alga collected in Fiji was identified to be Pseudoalteromonas rubra from 16s rRNA sequencing. This bacterial strain was cultured using a yeast-peptone media and incubated for five days. The ethyl acetate extract of this bacterium was subjected to chromatographic separation techniques such as vacuum liquid chromatography, flash chromatography, size exclusion chromatography and high-pressure liquid chromatography to yield the pure compound and a number of semi-pure fractions. The crude extract and subsequent purified fractions were analyzed by ultraviolet/visible spectroscopy and mass spectroscopy and was found to contain the compounds ivermectin, stenothricin, cyclo-L-pro-L-val, prodigiosin, mycophenolic acid, phenazine-1-carboxylic acid, eplerenone, staurosporine and pseudoalteromone A. The structure of the pure compound, pseudoalteromone A, was elucidated using NMR 1H, 13C, 1H-1H COSY, HSQC and HMBC spectroscopic data.

Keywords: Pseudoalteromonas rubra, Pseudoalteromone A, secondary metabolites, structure elucidation

Procedia PDF Downloads 209
24850 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, a large amount of data has been producing at increasing rate from various resources such as social media networks, sensor devices, and other information serving devices. This large collection of massive, complex and exponential growth of dataset is called big data. The traditional database systems cannot store and process such data due to large and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of resources for servers and storage. However, moving large amount of data to and from is a challenging issue since it can encounter a high latency due to large data size. With respect to big data movement problem, this paper reviews the literature of previous works, discusses about research issues, finds out approaches for dealing with big data movement problem.

Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques

Procedia PDF Downloads 75
24849 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

Procedia PDF Downloads 327
24848 Linking the Genetic Signature of Free-Living Soil Diazotrophs with Process Rates under Land Use Conversion in the Amazon Rainforest

Authors: Rachel Danielson, Brendan Bohannan, S.M. Tsai, Kyle Meyer, Jorge L.M. Rodrigues

Abstract:

The Amazon Rainforest is a global diversity hotspot and crucial carbon sink, but approximately 20% of its total extent has been deforested- primarily for the establishment of cattle pasture. Understanding the impact of this large-scale disturbance on soil microbial community composition and activity is crucial in understanding potentially consequential shifts in nutrient or greenhouse gas cycling, as well as adding to the body of knowledge concerning how these complex communities respond to human disturbance. In this study, surface soils (0-10cm) were collected from three forests and three 45-year-old pastures in Rondonia, Brazil (the Amazon state with the greatest rate of forest destruction) in order to determine the impact of forest conversion on microbial communities involved in nitrogen fixation. Soil chemical and physical parameters were paired with measurements of microbial activity and genetic profiles to determine how community composition and process rates relate to environmental conditions. Measuring both the natural abundance of 15N in total soil N, as well as incorporation of enriched 15N2 under incubation has revealed that conversion of primary forest to cattle pasture results in a significant increase in the rate of nitrogen fixation by free-living diazotrophs. Quantification of nifH gene copy numbers (an essential subunit encoding the nitrogenase enzyme) correspondingly reveals a significant increase of genes in pasture compared to forest soils. Additionally, genetic sequencing of both nifH genes and transcripts shows a significant increase in the diversity of the present and metabolically active diazotrophs within the soil community. Levels of both organic and inorganic nitrogen tend to be lower in pastures compared to forests, with ammonium rather than nitrate as the dominant inorganic form. However, no significant or consistent differences in total, extractable, permanganate-oxidizable, or loss-on-ignition carbon are present between the two land-use types. Forest conversion is associated with a 0.5- 1.0 unit pH increase, but concentrations of many biologically relevant nutrients such as phosphorus do not increase consistently. Increases in free-living diazotrophic community abundance and activity appear to be related to shifts in carbon to nitrogen pool ratios. Furthermore, there may be an important impact of transient, low molecular weight plant-root-derived organic carbon on free-living diazotroph communities not captured in this study. Preliminary analysis of nitrogenase gene variant composition using NovoSeq metagenomic sequencing indicates that conversion of forest to pasture may significantly enrich vanadium-based nitrogenases. This indication is complemented by a significant decrease in available soil molybdenum. Very little is known about the ecology of diazotrophs utilizing vanadium-based nitrogenases, so further analysis may reveal important environmental conditions favoring their abundance and diversity in soil systems. Taken together, the results of this study indicate a significant change in nitrogen cycling and diazotroph community composition with the conversion of the Amazon Rainforest. This may have important implications for the sustainability of cattle pastures once established since nitrogen is a crucial nutrient for forage grass productivity.

Keywords: free-living diazotrophs, land use change, metagenomic sequencing, nitrogen fixation

Procedia PDF Downloads 190
24847 Poultry as a Carrier of Chlamydia gallinacea

Authors: Monika Szymańska-Czerwińsk, Kinga Zaręba-Marchewka, Krzysztof Niemczuk

Abstract:

Chlamydiaceae are Gram-negative bacteria distributed worldwide in animals and humans. One of them is Chlamydia gallinacea recently discovered. Available data show that C. gallinacea is dominant chlamydial agent found in poultry in European and Asian countries. The aim of the studies was screening of poultry flocks in order to evaluate frequency of C. gallinacea shedding and genetic diversity. Sampling was conducted in different regions of Poland in 2019-2020. Overall, 1466 cloacal/oral swabs were collected in duplicate from 146 apparently healthy poultry flocks including chickens, turkeys, ducks, geese and quails. Dry swabs were used for DNA extraction. DNA extracts were screened using a Chlamydiaceae 23S rRNA real-time PCR assay. To identify Chlamydia species, specific real-time PCR assays were performed. Furthermore, selected samples were used for sequencing based on ompA gene fragments and variable domains (VD1-2, VD3-4). In total, 10.3% of the tested flocks were Chlamydiaceae-positive (15/146 farms). The presence of Chlamydiaceae was confirmed mainly in chickens (13/92 farms) but also in turkey (1/19 farms) and goose (1/26 farms) flocks. Eleven flocks were identified as C. gallinacea-positive while four flocks remained unclassified. Phylogenetic analysis revealed at least 16 genetic variants of C. gallinacea. Research showed that Chlamydiaceae occur in a poultry flock in Poland. The strains of C. gallinacea as dominant species show genetic variability.

Keywords: C. gallinacea, emerging agent, poultry, real-time PCR

Procedia PDF Downloads 102
24846 RNA-Seq Analysis of Coronaviridae Family and SARS-Cov-2 Prediction Using Proposed ANN

Authors: Busra Mutlu Ipek, Merve Mutlu, Ahmet Mutlu

Abstract:

Novel coronavirus COVID-19, which has recently influenced the world, poses a great threat to humanity. In order to overcome this challenging situation, scientists are working on developing effective vaccine against coronavirus. Many experts and researchers have also produced articles and done studies on this highly important subject. In this direction, this special topic was chosen for article to make a contribution to this area. The purpose of this article is to perform RNA sequence analysis of selected virus forms in the Coronaviridae family and predict/classify SARS-CoV-2 (COVID-19) from other selected complete genomes in coronaviridae family using proposed Artificial Neural Network(ANN) algorithm.

Keywords: Coronaviridae family, COVID-19, RNA sequencing, ANN, neural network

Procedia PDF Downloads 139
24845 Impact of Ocean Acidification on Gene Expression Dynamics during Development of the Sea Urchin Species Heliocidaris erythrogramma

Authors: Hannah R. Devens, Phillip L. Davidson, Dione Deaker, Kathryn E. Smith, Gregory A. Wray, Maria Byrne

Abstract:

Marine invertebrate species with calcifying larvae are especially vulnerable to ocean acidification (OA) caused by rising atmospheric CO₂ levels. Acidic conditions can delay development, suppress metabolism, and decrease the availability of carbonate ions in the ocean environment for skeletogenesis. These stresses often result in increased larval mortality, which may lead to significant ecological consequences including alterations to the larval settlement, population distribution, and genetic connectivity. Importantly, many of these physiological and developmental effects are caused by genetic and molecular level changes. Although many studies have examined the effect of near-future oceanic pH levels on gene expression in marine invertebrates, little is known about the impact of OA on gene expression in a developmental context. Here, we performed mRNA-sequencing to investigate the impact of environmental acidity on gene expression across three developmental stages in the sea urchin Heliocidaris erythrogramma. We collected RNA from gastrula, early larva, and 1-day post-metamorphic juvenile sea urchins cultured at present-day and predicted future oceanic pH levels (pH 8.1 and 7.7, respectively). We assembled an annotated reference transcriptome encompassing development from egg to ten days post-metamorphosis by combining these data with datasets from two previous developmental transcriptomic studies of H. erythrogramma. Differential gene expression and time course analyses between pH conditions revealed significant alterations to developmental transcription that are potentially associated with pH stress. Consistent with previous investigations, genes involved in biomineralization and ion transport were significantly upregulated under acidic conditions. Differences in gene expression between the two pH conditions became more pronounced post-metamorphosis, suggesting a development-dependent effect of OA on gene expression. Furthermore, many differences in gene expression later in development appeared to be a result of broad downregulation at pH 7.7: of 539 genes differentially expressed at the juvenile stage, 519 of these were lower in the acidic condition. Time course comparisons between pH 8.1 and 7.7 samples also demonstrated over 500 genes were more lowly expressed in pH 7.7 samples throughout development. Of the genes exhibiting stage-dependent expression level changes, over 15% of these diverged from the expected temporal pattern of expression in the acidic condition. Through these analyses, we identify novel candidate genes involved in development, metabolism, and transcriptional regulation that are possibly affected by pH stress. Our results demonstrate that pH stress significantly alters gene expression dynamics throughout development. A large number of genes differentially expressed between pH conditions in juveniles relative to earlier stages may be attributed to the effects of acidity on transcriptional regulation, as a greater proportion of mRNA at this later stage has been nascent transcribed rather than maternally loaded. Also, the overall downregulation of many genes in the acidic condition suggests that OA-induced developmental delay manifests as suppressed mRNA expression, possibly from lower transcription rates or increased mRNA degradation in the acidic environment. Further studies will be necessary to determine in greater detail the extent of OA effects on early developing marine invertebrates.

Keywords: development, gene expression, ocean acidification, RNA-sequencing, sea urchins

Procedia PDF Downloads 162
24844 An Analysis of the Regression Hypothesis from a Shona Broca’s Aphasci Perspective

Authors: Esther Mafunda, Simbarashe Muparangi

Abstract:

The present paper tests the applicability of the Regression Hypothesis on the pathological language dissolution of a Shona male adult with Broca’s aphasia. It particularly assesses the prediction of the Regression Hypothesis, which states that the process according to which language is forgotten will be the reversal of the process according to which it will be acquired. The main aim of the paper is to find out whether mirror symmetries between L1 acquisition and L1 dissolution of tense in Shona and, if so, what might cause these regression patterns. The paper also sought to highlight the practical contributions that Linguistic theory can make to solving language-related problems. Data was collected from a 46-year-old male adult with Broca’s aphasia who was receiving speech therapy at St Giles Rehabilitation Centre in Harare, Zimbabwe. The primary data elicitation method was experimental, using the probe technique. The TART (Test for Assessing Reference Time) Shona version in the form of sequencing pictures was used to access tense by Broca’s aphasic and 3.5-year-old child. Using the SPSS (Statistical Package for Social Studies) and Excel analysis, it was established that the use of the future tense was impaired in Shona Broca’s aphasic whilst the present and past tense was intact. However, though the past tense was intact in the male adult with Broca’s aphasic, a reference to the remote past was made. The use of the future tense was also found to be difficult for the 3,5-year-old speaking child. No difficulties were encountered in using the present and past tenses. This means that mirror symmetries were found between L1 acquisition and L1 dissolution of tense in Shona. On the basis of the results of this research, it can be concluded that the use of tense in a Shona adult with Broca’s aphasia supports the Regression Hypothesis. The findings of this study are important in terms of speech therapy in the context of Zimbabwe. The study also contributes to Bantu linguistics in general and to Shona linguistics in particular. Further studies could also be done focusing on the rest of the Bantu language varieties in terms of aphasia.

Keywords: Broca’s Aphasia, regression hypothesis, Shona, language dissolution

Procedia PDF Downloads 87
24843 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before taking a decision about money. The aim of this work is to analyze the factors that influence the final price of the houses through data mining algorithms. To our best knowledge, previous work was researched just to compare results. Furthermore, before using the data of the data set, the Z-Transformation were used to standardize the data in the same range. Hence, the data was classified into two groups to visualize them in a readability format. A decision tree was built, and graphical data is displayed where clearly is easy to see the results and the factors' influence in these graphics. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented related to the released results that our research showed making it easier to apply these algorithms using a customized data set.

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 368
24842 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still a lot of technology barriers as well as cognition chaos in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology at the beginning and then make an analysis of the application dilemma of geological data. Based on the current analysis, we bring forward some feasible patterns and scenarios for the blockchain application in geological big data and put forward serval suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 79
24841 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent Item sets play an essential role in many data Mining tasks that try to find interesting patterns from the database. Typically it refers to a set of items that frequently appear together in transaction dataset. There are several mining algorithm being used for frequent item set mining, yet most do not scale to the type of data we presented with today, so called “BIG DATA”. Big Data is a collection of large data sets. Our approach is to work on the frequent item set mining over the large dataset with scalable and speedy way. Big Data basically works with Map Reduce along with HDFS is used to find out frequent item sets from Big Data on large cluster. This paper focuses on using pre-processing & mining algorithm as hybrid approach for big data over Hadoop platform.

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 422
24840 The Role Of Data Gathering In NGOs

Authors: Hussaini Garba Mohammed

Abstract:

Background/Significance: The lack of data gathering is affecting NGOs world-wide in general to have good data information about educational and health related issues among communities in any country and around the world. For example, HIV/AIDS smoking (Tuberculosis diseases) and COVID-19 virus carriers is becoming a serious public health problem, especially among old men and women. But there is no full details data survey assessment from communities, villages, and rural area in some countries to show the percentage of victims and patients, especial with this world COVID-19 virus among the people. These data are essential to inform programming targets, strategies, and priorities in getting good information about data gathering in any society.

Keywords: reliable information, data assessment, data mining, data communication

Procedia PDF Downloads 175
24839 Distribution and Taxonomy of Marine Fungi in Nha Trang Bay and Van Phong Bay, Vietnam

Authors: Thu Thuy Pham, Thi Chau Loan Tran, Van Duy Nguyen

Abstract:

Marine fungi play an important role in the marine ecosystems. Marine fungi also supply biomass and metabolic products of industrial value. Currently, the biodiversity of marine fungi along the coastal areas of Vietnam has not yet been studied fully. The objective of this study is to assess the spatial and temporal diversity of planktonic fungi from the coastal waters of Nha Trang Bay and Van Phong Bay in Central Vietnam using culture-dependent and independent approach. Using culture-dependent approach, filamentous fungi and yeasts were isolated on selective media and then classified by phenotype and genotype based on the sequencing of ITS (internal transcribed spacers) regions of rDNA with two primer pairs (ITS1F_KYO2 and ITS4; NS1 and NS8). Using culture-independent approach, environmental DNA samples were isolated and amplified using fungal-specific ITS primer pairs. A total of over 160 strains were isolated from 10 seawater sampling stations at 50 cm depth. They were classified into diverse genera and species of both yeast and mold. At least 5 strains could be potentially novel species. Our results also revealed that planktonic fungi were molecularly diverse with hundreds of phylotypes recovered across these two bays. The results of the study provide data about the distribution and taxonomy of mycoplankton in this area, thereby allowing assessment of their positive role in the biogeochemical cycle of coastal ecosystems and the development of new bioactive compounds for industrial applications.

Keywords: biodiversity, ITS, marine fungi, Nha Trang Bay, Van Phong Bay

Procedia PDF Downloads 182
24838 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 846
24837 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects. Otherwise, even exerting a lot of effort, the necessary development might not always be possible. In this post, an effort to examine the workflow of data-driven software development projects and its implementation process in order to describe how to manage a project successfully. Which will assist in minimizing the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 72
24836 A Comprehensive Analysis of LACK (Leishmania Homologue of Receptors for Activated C Kinase) in the Context of Visceral Leishmaniasis

Authors: Sukrat Sinha, Abhay Kumar, Shanthy Sundaram

Abstract:

The Leishmania homologue of activated C kinase (LACK) is known T cell epitope from soluble Leishmania antigens (SLA) that confers protection against Leishmania challenge. This antigen has been found to be highly conserved among Leishmania strains. LACK has been shown to be protective against L. donovani challenge. A comprehensive analysis of several LACK sequences was completed. The analysis shows a high level of conservation, lower variability and higher antigenicity in specific portions of the LACK protein. This information provides insights for the potential consideration of LACK as a putative candidate in the context of visceral Leishmaniasis vaccine target.

Keywords: bioinformatics, genome assembly, leishmania activated protein kinase c (lack), next-generation sequencing

Procedia PDF Downloads 331
24835 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good quality data. Within important fields, such as healthcare, the training of AI systems predominately relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals’ privacy rights. This research seeks to establish the challenges AI and data sciences pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 99