Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 24768

Search results for: data throughput

24588 Adaptive Transmission Scheme Based on Channel State in Dual-Hop System

Authors: Seung-Jun Yu, Yong-Jun Kim, Jung-In Baik, Hyoung-Kyu Song

Abstract:

In this paper, a dual-hop relay based on channel state is studied. In the conventional relay scheme, a relay uses the same modulation method without reference to channel state. But, a relay uses an adaptive modulation method with reference to channel state. If the channel state is poor, a relay eliminates latter 2 bits and uses Quadrature Phase Shift Keying (QPSK) modulation. If channel state is good, a relay modulates the received symbols with 16-QAM symbols by using 4 bits. The performance of the proposed scheme for Symbol Error Rate (SER) and throughput is analyzed.

Keywords: adaptive transmission, channel state, dual-hop, hierarchical modulation, relay

Procedia PDF Downloads 362

24587 Control the Flow of Big Data

Authors: Shizra Waris, Saleem Akhtar

Abstract:

Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have within a short period of time. Consequently this fast increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and its current trends. This paper presents a complete discussion on state-of-the-art big data technologies based on group and stream data processing. Moreover, strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendor, several open research challenges and the chances brought about by big data. The similarities and differences of these techniques and technologies based on important limitations are also investigated. Emerging technologies are suggested as a solution for big data problems.

Keywords: computer, it community, industry, big data

Procedia PDF Downloads 181

24586 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of the multiplied data growth, many computer science tools have been developed to process and analyze these Big Data. High-performance computing architectures have been designed to meet the treatment needs of Big Data (view transaction processing standpoint, strategic, and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend of high-performance computing architectures especially what has a relation with Analytics and Data Mining.

Keywords: high performance computing, HPC, big data, data analysis

Procedia PDF Downloads 506

24585 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore re3dat.org registry to identify research data repositories registration workflow process. Further objective is to depict a graph for present development of research data repositories in India. Preliminarily with an approach to understand re3data.org registry framework and schema design then further proceed to explore the status of research data repositories of India in re3data.org registry. Research data repositories are getting wider relevance due to e-research concepts. Now available registry re3data.org is a good tool for users and researchers to identify appropriate research data repositories as per their research requirements. In Indian environment, a compatible National Research Data Policy is the need of the time to boost the management of research data. Registry for Research Data Repositories is a crucial tool to discover specific information in specific domain. Also, Research Data Repositories in India have not been studied. Re3data.org registry and status of Indian research data repositories both discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 311

24584 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processed big data of transportation ridership (eg., smartcard data) and traffic operation (e.g., traffic detectors data) which requires a lot of computational power is incontrovertible in Intelligent Transportation Systems. Nowadays cloud computing is one of the important subjects and popular information technology solution for data processing. It enables users to process enormous measure of data without having their own particular computing power. Thus, it can also be a good selection for transportation big data processing as well. This paper intends to examine how the cloud computing can enhance transportation big data process with contrasting its advantages and disadvantages, and discussing cloud computing features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 449

24583 Dynamic Risk Identification Using Fuzzy Failure Mode Effect Analysis in Fabric Process Industries: A Research Article as Management Perspective

Authors: A. Sivakumar, S. S. Darun Prakash, P. Navaneethakrishnan

Abstract:

In and around Erode District, it is estimated that more than 1250 chemical and allied textile processing fabric industries are affected, partially closed and shut off for various reasons such as poor management, poor supplier performance, lack of planning for productivity, fluctuation of output, poor investment, waste analysis, labor problems, capital/labor ratio, accumulation of stocks, poor maintenance of resources, deficiencies in the quality of fabric, low capacity utilization, age of plant and equipment, high investment and input but low throughput, poor research and development, lack of energy, workers’ fear of loss of jobs, work force mix and work ethic. The main objective of this work is to analyze the existing conditions in textile fabric sector, validate the break even of Total Productivity (TP), analyze, design and implement fuzzy sets and mathematical programming for improvement of productivity and quality dimensions in the fabric processing industry. It needs to be compatible with the reality of textile and fabric processing industries. The highly risk events from productivity and quality dimension were found by fuzzy systems and results are wrapped up among the textile fabric processing industry.

Keywords: break even point, fuzzy crisp data, fuzzy sets, productivity, productivity cycle, total productive maintenance

Procedia PDF Downloads 322

24582 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 233

24581 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 258

24580 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 350

24579 Performance Analysis of ERA Using Fuzzy Logic in Wireless Sensor Network

Authors: Kamalpreet Kaur, Harjit Pal Singh, Vikas Khullar

Abstract:

In Wireless Sensor Network (WSN), the main limitation is generally inimitable energy consumption during processing of the sensor nodes. Cluster head (CH) election is one of the main issues that can reduce the energy consumption. Therefore, discovering energy saving routing protocol is the focused area for research. In this paper, fuzzy-based energy aware routing protocol is presented, which enhances the stability and network lifetime of the network. Fuzzy logic ensures the well-organized selection of CH by taking four linguistic variables that are concentration, energy, centrality, and distance to base station (BS). The results show that the proposed protocol shows better results in requisites of stability and throughput of the network.

Keywords: ERA, fuzzy logic, network model, WSN

Procedia PDF Downloads 262

24578 Microfluidic Continuous Approaches to Produce Magnetic Nanoparticles with Homogeneous Size Distribution

Authors: Ane Larrea, Victor Sebastian, Manuel Arruebo, Jesus Santamaria

Abstract:

We present a gas-liquid microfluidic system as a reactor to obtain magnetite nanoparticles with an excellent degree of control regarding their crystalline phase, shape and size. Several types of microflow approaches were selected to prevent nanomaterial aggregation and to promote homogenous size distribution. The selected reactor consists of a mixer stage aided by ultrasound waves and a reaction stage using a N2-liquid segmented flow to prevent magnetite oxidation to non-magnetic phases. A milli-fluidic reactor was developed to increase the production rate where a magnetite throughput close to 450 mg/h in a continuous fashion was obtained.

Keywords: continuous production, magnetic nanoparticles, microfluidics, nanomaterials

Procedia PDF Downloads 573

24577 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, a large amount of data has been producing at increasing rate from various resources such as social media networks, sensor devices, and other information serving devices. This large collection of massive, complex and exponential growth of dataset is called big data. The traditional database systems cannot store and process such data due to large and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of resources for servers and storage. However, moving large amount of data to and from is a challenging issue since it can encounter a high latency due to large data size. With respect to big data movement problem, this paper reviews the literature of previous works, discusses about research issues, finds out approaches for dealing with big data movement problem.

Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques

Procedia PDF Downloads 70

24576 A Performance Analysis of Different Scheduling Schemes in WiMAX

Authors: A. Youseef

Abstract:

One of the most aims of IEEE 802.16 (WiMAX) is to present high-speed wireless access to cover wide range coverage. The base station (BS) and the subscriber station (SS) are the main parts of WiMAX. WiMAX uses either Point-to-Multipoint (PMP) or mesh topologies. In the PMP mode, the SSs connect to the BS to gain access to the network. However, in the mesh mode, the SSs connect to each other to gain access to the BS. The main components of QoS management in the 802.16 standard are the admission control, buffer management, and packet scheduling. There are several researches proposed to create an efficient packet scheduling schemes. Therefore, we use QualNet 5.0.2 to study the performance of different scheduling schemes, such as WFQ, SCFQ, RR, and SP when the numbers of SSs increase. We find that when the number of SSs increases, the average jitter and average end-to-end delay is increased and the throughput is reduced.

Keywords: WiMAX, scheduling scheme, QoS, QualNet

Procedia PDF Downloads 443

24575 Automated Resin Transfer Moulding of Carbon Phenolic Composites

Authors: Zhenyu Du, Ed Collings, James Meredith

Abstract:

The high cost of composite materials versus conventional materials remains a major barrier to uptake in the transport sector. This is exacerbated by a shortage of skilled labour which makes the labour content of a hand laid composite component (~40 % of total cost) an obvious target for reduction. Automation is a method to remove labour cost and improve quality. This work focuses on the challenges and benefits to automating the manufacturing process from raw fibre to trimmed component. It will detail the experimental work required to complete an automation cell, the control strategy used to integrate all machines and the final benefits in terms of throughput and cost.

Keywords: automation, low cost technologies, processing and manufacturing technologies, resin transfer moulding

Procedia PDF Downloads 282

24574 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

Procedia PDF Downloads 319

24573 First Attempts Using High-Throughput Sequencing in Senecio from the Andes

Authors: L. Salomon, P. Sklenar

Abstract:

The Andes hold the highest plant species diversity in the world. How this occurred is one of the most intriguing questions in studies addressing the origin and patterning of plant diversity worldwide. Recently, the explosive adaptive radiations found in high Andean groups have been pointed as triggers to this spectacular diversity. The Andes is the species-richest area for the biggest genus from the Asteraceae family: Senecio. There, the genus presents an incredible diversity of species, striking growth form variation, and large niche span. Even when some studies tried to disentangle the evolutionary story for some Andean species in Senecio, they obtained partially resolved and low supported phylogenies, as expected for recently radiated groups. The high-throughput sequencing (HTS) approaches have proved to be a powerful tool answering phylogenetic questions in those groups whose evolutionary stories are recent and traditional techniques like Sanger sequencing are not informative enough. Although these tools have been used to understand the evolution of an increasing number of Andean groups, nowadays, their scope has not been applied for Senecio. This project aims to contribute to a better knowledge of the mechanisms shaping the hyper diversity of Senecio in the Andean region, using HTS focusing on Senecio ser. Culcitium (Asteraceae), recently recircumscribed. Firstly, reconstructing a highly resolved and supported phylogeny, and after assessing the role of allopatric differentiation, hybridization, and genome duplication in the diversification of the group. Using the Hyb-Seq approach, combining target enrichment using Asteraceae COS loci baits and genome skimming, more than 100 new accessions were generated. HybPhyloMaker and HybPiper pipelines were used for the phylogenetic analyses, and another pipeline in development (Paralogue Wizard) was used to deal with paralogues. RAxML was used to generate gene trees and Astral for species tree reconstruction. Phyparts were used to explore as first step of gene tree discordance along the clades. Fully resolved with moderated supported trees were obtained, showing Senecio ser. Culcitium as monophyletic. Within the group, some species formed well-supported clades with morphologically related species, while some species would not have exclusive ancestry, in concordance with previous studies using amplified fragment length polymorphism (AFLP) showing geographical differentiation. Discordance between gene trees was detected. Paralogues were detected for many loci, indicating possible genome duplications; ploidy level estimation using flow cytometry will be carried out during the next months in order to identify the role of this process in the diversification of the group. Likewise, TreeSetViz package for Mesquite, hierarchical likelihood ratio congruence test using Concaterpillar, and Procrustean Approach to Cophylogeny (PACo), will be used to evaluate the congruence among different inheritance patterns. In order to evaluate the influence of hybridization and Incomplete Lineage Sorting (ILS) in each resultant clade from the phylogeny, Joly et al.'s 2009 method in a coalescent scenario and Paterson’s D-statistic will be performed. Even when the main discordance sources between gene trees were not explored in detail yet, the data show that at least to some degree, processes such as genome duplication, hybridization, and/or ILS could be involved in the evolution of the group.

Keywords: adaptive radiations, Andes, genome duplication, hybridization, Senecio

Procedia PDF Downloads 125

24572 Population Structure Analysis of Pakistani Indigenous Cattle Population by Using High Density SNP Array

Authors: Hamid Mustafa, Huson J. Heather, Kim Eiusoo, McClure Matt, Khalid Javed, Talat Nasser Pasha, Afzal Ali1, Adeela Ajmal, Tad Sonstegard

Abstract:

Genetic differences associated with speciation, breed formation or local adaptation can help to preserve and effective utilization of animals in selection programs. Analyses of population structure and breed diversity have provided insight into the origin and evolution of cattle. In this study, we used a high-density panel of SNP markers to examine population structure and diversity among ten Pakistani indigenous cattle breeds. In total, 25 individuals from three cattle populations, including Achi (n=08), Bhagnari (n=04) and Cholistani (n=13) were genotyped for 777, 962 single nucleotide polymorphism (SNP) markers. Population structure was examined using the linkage model in the program STRUCTURE. After characterizing SNP polymorphism in the different populations, we performed a detailed analysis of genetic structure at both the individual and population levels. The whole-genome SNP panel identified several levels of population substructure in the set of examined cattle breeds. We further searched for spatial patterns of genetic diversity among these breeds under the recently developed spatial principal component analysis framework. Overall, such high throughput genotyping data confirmed a clear partitioning of the cattle genetic diversity into distinct breeds. The resulting complex historical origins associated with both natural and artificial selection have led to the differentiation of numerous different cattle breeds displaying a broad phenotypic variety over a short period of time.

Keywords: Pakistan, cattle, genetic diversity, population structure

Procedia PDF Downloads 602

24571 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before taking a decision about money. The aim of this work is to analyze the factors that influence the final price of the houses through data mining algorithms. To our best knowledge, previous work was researched just to compare results. Furthermore, before using the data of the data set, the Z-Transformation were used to standardize the data in the same range. Hence, the data was classified into two groups to visualize them in a readability format. A decision tree was built, and graphical data is displayed where clearly is easy to see the results and the factors' influence in these graphics. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented related to the released results that our research showed making it easier to apply these algorithms using a customized data set.

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 362

24570 Analysis and Performance of Handover in Universal Mobile Telecommunications System (UMTS) Network Using OPNET Modeller

Authors: Latif Adnane, Benaatou Wafa, Pla Vicent

Abstract:

Handover is of great significance to achieve seamless connectivity in wireless networks. This paper gives an impression of the main factors which are being affected by the soft and the hard handovers techniques. To know and understand the handover process in The Universal Mobile Telecommunications System (UMTS) network, different statistics are calculated. This paper focuses on the quality of service (QoS) of soft and hard handover in UMTS network, which includes the analysis of received power, signal to noise radio, throughput, delay traffic, traffic received, delay, total transmit load, end to end delay and upload response time using OPNET simulator.

Keywords: handover, UMTS, mobility, simulation, OPNET modeler

Procedia PDF Downloads 306

24569 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still a lot of technology barriers as well as cognition chaos in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology at the beginning and then make an analysis of the application dilemma of geological data. Based on the current analysis, we bring forward some feasible patterns and scenarios for the blockchain application in geological big data and put forward serval suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 71

24568 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent Item sets play an essential role in many data Mining tasks that try to find interesting patterns from the database. Typically it refers to a set of items that frequently appear together in transaction dataset. There are several mining algorithm being used for frequent item set mining, yet most do not scale to the type of data we presented with today, so called “BIG DATA”. Big Data is a collection of large data sets. Our approach is to work on the frequent item set mining over the large dataset with scalable and speedy way. Big Data basically works with Map Reduce along with HDFS is used to find out frequent item sets from Big Data on large cluster. This paper focuses on using pre-processing & mining algorithm as hybrid approach for big data over Hadoop platform.

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 410

24567 The Role Of Data Gathering In NGOs

Authors: Hussaini Garba Mohammed

Abstract:

Background/Significance: The lack of data gathering is affecting NGOs world-wide in general to have good data information about educational and health related issues among communities in any country and around the world. For example, HIV/AIDS smoking (Tuberculosis diseases) and COVID-19 virus carriers is becoming a serious public health problem, especially among old men and women. But there is no full details data survey assessment from communities, villages, and rural area in some countries to show the percentage of victims and patients, especial with this world COVID-19 virus among the people. These data are essential to inform programming targets, strategies, and priorities in getting good information about data gathering in any society.

Keywords: reliable information, data assessment, data mining, data communication

Procedia PDF Downloads 165

24566 High Throughput Virtual Screening against ns3 Helicase of Japanese Encephalitis Virus (JEV)

Authors: Soma Banerjee, Aamen Talukdar, Argha Mandal, Dipankar Chaudhuri

Abstract:

Japanese Encephalitis is a major infectious disease with nearly half the world’s population living in areas where it is prevalent. Currently, treatment for it involves only supportive care and symptom management through vaccination. Due to the lack of antiviral drugs against Japanese Encephalitis Virus (JEV), the quest for such agents remains a priority. For these reasons, simulation studies of drug targets against JEV are important. Towards this purpose, docking experiments of the kinase inhibitors were done against the chosen target NS3 helicase as it is a nucleoside binding protein. Previous efforts regarding computational drug design against JEV revealed some lead molecules by virtual screening using public domain software. To be more specific and accurate regarding finding leads, in this study a proprietary software Schrödinger-GLIDE has been used. Druggability of the pockets in the NS3 helicase crystal structure was first calculated by SITEMAP. Then the sites were screened according to compatibility with ATP. The site which is most compatible with ATP was selected as target. Virtual screening was performed by acquiring ligands from databases: KinaseSARfari, KinaseKnowledgebase and Published inhibitor Set using GLIDE. The 25 ligands with best docking scores from each database were re-docked in XP mode. Protein structure alignment of NS3 was performed using VAST against MMDB, and similar human proteins were docked to all the best scoring ligands. The low scoring ligands were chosen for further studies and the high scoring ligands were screened. Seventy-three ligands were listed as the best scoring ones after performing HTVS. Protein structure alignment of NS3 revealed 3 human proteins with RMSD values lesser than 2Å. Docking results with these three proteins revealed the inhibitors that can interfere and inhibit human proteins. Those inhibitors were screened. Among the ones left, those with docking scores worse than a threshold value were also removed to get the final hits. Analysis of the docked complexes through 2D interaction diagrams revealed the amino acid residues that are essential for ligand binding within the active site. Interaction analysis will help to find a strongly interacting scaffold among the hits. This experiment yielded 21 hits with the best docking scores which could be investigated further for their drug like properties. Aside from getting suitable leads, specific NS3 helicase-inhibitor interactions were identified. Selection of Target modification strategies complementing docking methodologies which can result in choosing better lead compounds are in progress. Those enhanced leads can lead to better in vitro testing.

Keywords: antivirals, docking, glide, high-throughput virtual screening, Japanese encephalitis, ns3 helicase

Procedia PDF Downloads 215

24565 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 836

24564 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects. Otherwise, even exerting a lot of effort, the necessary development might not always be possible. In this post, an effort to examine the workflow of data-driven software development projects and its implementation process in order to describe how to manage a project successfully. Which will assist in minimizing the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 67

24563 Predictive Pathogen Biology: Genome-Based Prediction of Pathogenic Potential and Countermeasures Targets

Authors: Debjit Ray

Abstract:

Horizontal gene transfer (HGT) and recombination leads to the emergence of bacterial antibiotic resistance and pathogenic traits. HGT events can be identified by comparing a large number of fully sequenced genomes across a species or genus, define the phylogenetic range of HGT, and find potential sources of new resistance genes. In-depth comparative phylogenomics can also identify subtle genome or plasmid structural changes or mutations associated with phenotypic changes. Comparative phylogenomics requires that accurately sequenced, complete and properly annotated genomes of the organism. Assembling closed genomes requires additional mate-pair reads or “long read” sequencing data to accompany short-read paired-end data. To bring down the cost and time required of producing assembled genomes and annotating genome features that inform drug resistance and pathogenicity, we are analyzing the performance for genome assembly of data from the Illumina NextSeq, which has faster throughput than the Illumina HiSeq (~1-2 days versus ~1 week), and shorter reads (150bp paired-end versus 300bp paired end) but higher capacity (150-400M reads per run versus ~5-15M) compared to the Illumina MiSeq. Bioinformatics improvements are also needed to make rapid, routine production of complete genomes a reality. Modern assemblers such as SPAdes 3.6.0 running on a standard Linux blade are capable in a few hours of converting mixes of reads from different library preps into high-quality assemblies with only a few gaps. Remaining breaks in scaffolds are generally due to repeats (e.g., rRNA genes) are addressed by our software for gap closure techniques, that avoid custom PCR or targeted sequencing. Our goal is to improve the understanding of emergence of pathogenesis using sequencing, comparative genomics, and machine learning analysis of ~1000 pathogen genomes. Machine learning algorithms will be used to digest the diverse features (change in virulence genes, recombination, horizontal gene transfer, patient diagnostics). Temporal data and evolutionary models can thus determine whether the origin of a particular isolate is likely to have been from the environment (could it have evolved from previous isolates). It can be useful for comparing differences in virulence along or across the tree. More intriguing, it can test whether there is a direction to virulence strength. This would open new avenues in the prediction of uncharacterized clinical bugs and multidrug resistance evolution and pathogen emergence.

Keywords: genomics, pathogens, genome assembly, superbugs

Procedia PDF Downloads 188

24562 Establishing a Drug Discovery Platform to Progress Compounds into the Clinic

Authors: Sheraz Gul

Abstract:

The requirements for progressing a compound to clinical trials is well established and relies on the results from in-vitro and in-vivo animal tests to indicate that it is likely to be safe and efficacious when testing in humans. The typical data package required will include demonstrating compound safety, toxicity, bioavailability, pharmacodynamics (potential effects of the compound on body systems) and pharmacokinetics (how the compound is potentially absorbed, distributed, metabolised and eliminated after dosing in humans). If the desired criteria are met and the compound meets the clinical Candidate criteria and is deemed worthy of further development, a submission to regulatory bodies such as the US Food & Drug Administration for an exploratory Investigational New Drug Study can be made. The purpose of this study is to collect data to establish that the compound will not expose humans to unreasonable risks when used in limited, early-stage clinical studies in patients or normal volunteer subjects (Phase I). These studies are also designed to determine the metabolism and pharmacologic actions of the drug in humans, the side effects associated with increasing doses, and, if possible, to gain early evidence on their effectiveness. In order to reach the above goals, we have developed a pre-clinical high throughput Absorption, Distribution, Metabolism and Excretion–Toxicity (ADME–Toxicity) panel of assays to identify compounds that are likely to meet the Lead and Candidate compound acceptance criteria. This panel includes solubility studies in a range of biological fluids, cell viability studies in cancer and primary cell-lines, mitochondrial toxicity, off-target effects (across the kinase, protease, histone deacetylase, phosphodiesterase and GPCR protein families), CYP450 inhibition (5 different CYP450 enzymes), CYP450 induction, cardio-toxicity (hERG) and gene-toxicity. This panel of assays has been applied to multiple compound series developed in a number of projects delivering Lead and clinical Candidates and examples from these will be presented.

Keywords: absorption, distribution, metabolism and excretion–toxicity , drug discovery, food and drug administration , pharmacodynamics

Procedia PDF Downloads 161

24561 Computational Approaches to Study Lineage Plasticity in Human Pancreatic Ductal Adenocarcinoma

Authors: Almudena Espin Perez, Tyler Risom, Carl Pelz, Isabel English, Robert M. Angelo, Rosalie Sears, Andrew J. Gentles

Abstract:

Pancreatic ductal adenocarcinoma (PDAC) is one of the most deadly malignancies. The role of the tumor microenvironment (TME) is gaining significant attention in cancer research. Despite ongoing efforts, the nature of the interactions between tumors, immune cells, and stromal cells remains poorly understood. The cell-intrinsic properties that govern cell lineage plasticity in PDAC and extrinsic influences of immune populations require technically challenging approaches due to the inherently heterogeneous nature of PDAC. Understanding the cell lineage plasticity of PDAC will improve the development of novel strategies that could be translated to the clinic. Members of the team have demonstrated that the acquisition of ductal to neuroendocrine lineage plasticity in PDAC confers therapeutic resistance and is a biomarker of poor outcomes in patients. Our approach combines computational methods for deconvolving bulk transcriptomic cancer data using CIBERSORTx and high-throughput single-cell imaging using Multiplexed Ion Beam Imaging (MIBI) to study lineage plasticity in PDAC and its relationship to the infiltrating immune system. The CIBERSORTx algorithm uses signature matrices from immune cells and stroma from sorted and single-cell data in order to 1) infer the fractions of different immune cell types and stromal cells in bulked gene expression data and 2) impute a representative transcriptome profile for each cell type. We studied a unique set of 300 genomically well-characterized primary PDAC samples with rich clinical annotation. We deconvolved the PDAC transcriptome profiles using CIBERSORTx, leveraging publicly available single-cell RNA-seq data from normal pancreatic tissue and PDAC to estimate cell type proportions in PDAC, and digitally reconstruct cell-specific transcriptional profiles from our study dataset. We built signature matrices and optimized by simulations and comparison to ground truth data. We identified cell-type-specific transcriptional programs that contribute to cancer cell lineage plasticity, especially in the ductal compartment. We also studied cell differentiation hierarchies using CytoTRACE and predict cell lineage trajectories for acinar and ductal cells that we believe are pinpointing relevant information on PDAC progression. Collaborators (Angelo lab, Stanford University) has led the development of the Multiplexed Ion Beam Imaging (MIBI) platform for spatial proteomics. We will use in the very near future MIBI from tissue microarray of 40 PDAC samples to understand the spatial relationship between cancer cell lineage plasticity and stromal cells focused on infiltrating immune cells, using the relevant markers of PDAC plasticity identified from the RNA-seq analysis.

Keywords: deconvolution, imaging, microenvironment, PDAC

Procedia PDF Downloads 111

24560 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good quality data. Within important fields, such as healthcare, the training of AI systems predominately relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals’ privacy rights. This research seeks to establish the challenges AI and data sciences pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 93

24559 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.

Keywords: simulation data, data summarization, spatial histograms, exploration, visualization

Procedia PDF Downloads 165