Search results for: data throughput
24588 Adaptive Transmission Scheme Based on Channel State in Dual-Hop System
Authors: Seung-Jun Yu, Yong-Jun Kim, Jung-In Baik, Hyoung-Kyu Song
Abstract:
In this paper, a dual-hop relay based on channel state is studied. In the conventional relay scheme, a relay uses the same modulation method without reference to channel state. But, a relay uses an adaptive modulation method with reference to channel state. If the channel state is poor, a relay eliminates latter 2 bits and uses Quadrature Phase Shift Keying (QPSK) modulation. If channel state is good, a relay modulates the received symbols with 16-QAM symbols by using 4 bits. The performance of the proposed scheme for Symbol Error Rate (SER) and throughput is analyzed.Keywords: adaptive transmission, channel state, dual-hop, hierarchical modulation, relay
Procedia PDF Downloads 36224587 Control the Flow of Big Data
Authors: Shizra Waris, Saleem Akhtar
Abstract:
Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have within a short period of time. Consequently this fast increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and its current trends. This paper presents a complete discussion on state-of-the-art big data technologies based on group and stream data processing. Moreover, strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendor, several open research challenges and the chances brought about by big data. The similarities and differences of these techniques and technologies based on important limitations are also investigated. Emerging technologies are suggested as a solution for big data problems.Keywords: computer, it community, industry, big data
Procedia PDF Downloads 18124586 High Performance Computing and Big Data Analytics
Authors: Branci Sarra, Branci Saadia
Abstract:
Because of the multiplied data growth, many computer science tools have been developed to process and analyze these Big Data. High-performance computing architectures have been designed to meet the treatment needs of Big Data (view transaction processing standpoint, strategic, and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend of high-performance computing architectures especially what has a relation with Analytics and Data Mining.Keywords: high performance computing, HPC, big data, data analysis
Procedia PDF Downloads 50624585 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories
Authors: Prashant Shrivastava
Abstract:
The purpose of this study is to explore re3dat.org registry to identify research data repositories registration workflow process. Further objective is to depict a graph for present development of research data repositories in India. Preliminarily with an approach to understand re3data.org registry framework and schema design then further proceed to explore the status of research data repositories of India in re3data.org registry. Research data repositories are getting wider relevance due to e-research concepts. Now available registry re3data.org is a good tool for users and researchers to identify appropriate research data repositories as per their research requirements. In Indian environment, a compatible National Research Data Policy is the need of the time to boost the management of research data. Registry for Research Data Repositories is a crucial tool to discover specific information in specific domain. Also, Research Data Repositories in India have not been studied. Re3data.org registry and status of Indian research data repositories both discussed in this study.Keywords: research data, research data repositories, research data registry, re3data.org
Procedia PDF Downloads 31124584 A Study of Cloud Computing Solution for Transportation Big Data Processing
Authors: Ilgin Gökaşar, Saman Ghaffarian
Abstract:
The need for fast processed big data of transportation ridership (eg., smartcard data) and traffic operation (e.g., traffic detectors data) which requires a lot of computational power is incontrovertible in Intelligent Transportation Systems. Nowadays cloud computing is one of the important subjects and popular information technology solution for data processing. It enables users to process enormous measure of data without having their own particular computing power. Thus, it can also be a good selection for transportation big data processing as well. This paper intends to examine how the cloud computing can enhance transportation big data process with contrasting its advantages and disadvantages, and discussing cloud computing features.Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing
Procedia PDF Downloads 44924583 Dynamic Risk Identification Using Fuzzy Failure Mode Effect Analysis in Fabric Process Industries: A Research Article as Management Perspective
Authors: A. Sivakumar, S. S. Darun Prakash, P. Navaneethakrishnan
Abstract:
In and around Erode District, it is estimated that more than 1250 chemical and allied textile processing fabric industries are affected, partially closed and shut off for various reasons such as poor management, poor supplier performance, lack of planning for productivity, fluctuation of output, poor investment, waste analysis, labor problems, capital/labor ratio, accumulation of stocks, poor maintenance of resources, deficiencies in the quality of fabric, low capacity utilization, age of plant and equipment, high investment and input but low throughput, poor research and development, lack of energy, workers’ fear of loss of jobs, work force mix and work ethic. The main objective of this work is to analyze the existing conditions in textile fabric sector, validate the break even of Total Productivity (TP), analyze, design and implement fuzzy sets and mathematical programming for improvement of productivity and quality dimensions in the fabric processing industry. It needs to be compatible with the reality of textile and fabric processing industries. The highly risk events from productivity and quality dimension were found by fuzzy systems and results are wrapped up among the textile fabric processing industry.Keywords: break even point, fuzzy crisp data, fuzzy sets, productivity, productivity cycle, total productive maintenance
Procedia PDF Downloads 32224582 Harmonic Data Preparation for Clustering and Classification
Authors: Ali Asheibi
Abstract:
The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.Keywords: data mining, harmonic data, clustering, classification
Procedia PDF Downloads 23324581 Linguistic Summarization of Structured Patent Data
Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay
Abstract:
Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.Keywords: data mining, fuzzy sets, linguistic summarization, patent data
Procedia PDF Downloads 25824580 Proposal of Data Collection from Probes
Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik
Abstract:
In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.Keywords: communication, computer network, data collection, probe
Procedia PDF Downloads 35024579 Performance Analysis of ERA Using Fuzzy Logic in Wireless Sensor Network
Authors: Kamalpreet Kaur, Harjit Pal Singh, Vikas Khullar
Abstract:
In Wireless Sensor Network (WSN), the main limitation is generally inimitable energy consumption during processing of the sensor nodes. Cluster head (CH) election is one of the main issues that can reduce the energy consumption. Therefore, discovering energy saving routing protocol is the focused area for research. In this paper, fuzzy-based energy aware routing protocol is presented, which enhances the stability and network lifetime of the network. Fuzzy logic ensures the well-organized selection of CH by taking four linguistic variables that are concentration, energy, centrality, and distance to base station (BS). The results show that the proposed protocol shows better results in requisites of stability and throughput of the network.Keywords: ERA, fuzzy logic, network model, WSN
Procedia PDF Downloads 26224578 Microfluidic Continuous Approaches to Produce Magnetic Nanoparticles with Homogeneous Size Distribution
Authors: Ane Larrea, Victor Sebastian, Manuel Arruebo, Jesus Santamaria
Abstract:
We present a gas-liquid microfluidic system as a reactor to obtain magnetite nanoparticles with an excellent degree of control regarding their crystalline phase, shape and size. Several types of microflow approaches were selected to prevent nanomaterial aggregation and to promote homogenous size distribution. The selected reactor consists of a mixer stage aided by ultrasound waves and a reaction stage using a N2-liquid segmented flow to prevent magnetite oxidation to non-magnetic phases. A milli-fluidic reactor was developed to increase the production rate where a magnetite throughput close to 450 mg/h in a continuous fashion was obtained.Keywords: continuous production, magnetic nanoparticles, microfluidics, nanomaterials
Procedia PDF Downloads 57324577 A Review on Big Data Movement with Different Approaches
Authors: Nay Myo Sandar
Abstract:
With the growth of technologies and applications, a large amount of data has been producing at increasing rate from various resources such as social media networks, sensor devices, and other information serving devices. This large collection of massive, complex and exponential growth of dataset is called big data. The traditional database systems cannot store and process such data due to large and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of resources for servers and storage. However, moving large amount of data to and from is a challenging issue since it can encounter a high latency due to large data size. With respect to big data movement problem, this paper reviews the literature of previous works, discusses about research issues, finds out approaches for dealing with big data movement problem.Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques
Procedia PDF Downloads 7024576 A Performance Analysis of Different Scheduling Schemes in WiMAX
Authors: A. Youseef
Abstract:
One of the most aims of IEEE 802.16 (WiMAX) is to present high-speed wireless access to cover wide range coverage. The base station (BS) and the subscriber station (SS) are the main parts of WiMAX. WiMAX uses either Point-to-Multipoint (PMP) or mesh topologies. In the PMP mode, the SSs connect to the BS to gain access to the network. However, in the mesh mode, the SSs connect to each other to gain access to the BS. The main components of QoS management in the 802.16 standard are the admission control, buffer management, and packet scheduling. There are several researches proposed to create an efficient packet scheduling schemes. Therefore, we use QualNet 5.0.2 to study the performance of different scheduling schemes, such as WFQ, SCFQ, RR, and SP when the numbers of SSs increase. We find that when the number of SSs increases, the average jitter and average end-to-end delay is increased and the throughput is reduced.Keywords: WiMAX, scheduling scheme, QoS, QualNet
Procedia PDF Downloads 44324575 Automated Resin Transfer Moulding of Carbon Phenolic Composites
Authors: Zhenyu Du, Ed Collings, James Meredith
Abstract:
The high cost of composite materials versus conventional materials remains a major barrier to uptake in the transport sector. This is exacerbated by a shortage of skilled labour which makes the labour content of a hand laid composite component (~40 % of total cost) an obvious target for reduction. Automation is a method to remove labour cost and improve quality. This work focuses on the challenges and benefits to automating the manufacturing process from raw fibre to trimmed component. It will detail the experimental work required to complete an automation cell, the control strategy used to integrate all machines and the final benefits in terms of throughput and cost.Keywords: automation, low cost technologies, processing and manufacturing technologies, resin transfer moulding
Procedia PDF Downloads 28224574 Optimized Approach for Secure Data Sharing in Distributed Database
Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal
Abstract:
In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.Keywords: ER-schema, electronic record, P2P framework, API, query formulation
Procedia PDF Downloads 31924573 First Attempts Using High-Throughput Sequencing in Senecio from the Andes
Authors: L. Salomon, P. Sklenar
Abstract:
The Andes hold the highest plant species diversity in the world. How this occurred is one of the most intriguing questions in studies addressing the origin and patterning of plant diversity worldwide. Recently, the explosive adaptive radiations found in high Andean groups have been pointed as triggers to this spectacular diversity. The Andes is the species-richest area for the biggest genus from the Asteraceae family: Senecio. There, the genus presents an incredible diversity of species, striking growth form variation, and large niche span. Even when some studies tried to disentangle the evolutionary story for some Andean species in Senecio, they obtained partially resolved and low supported phylogenies, as expected for recently radiated groups. The high-throughput sequencing (HTS) approaches have proved to be a powerful tool answering phylogenetic questions in those groups whose evolutionary stories are recent and traditional techniques like Sanger sequencing are not informative enough. Although these tools have been used to understand the evolution of an increasing number of Andean groups, nowadays, their scope has not been applied for Senecio. This project aims to contribute to a better knowledge of the mechanisms shaping the hyper diversity of Senecio in the Andean region, using HTS focusing on Senecio ser. Culcitium (Asteraceae), recently recircumscribed. Firstly, reconstructing a highly resolved and supported phylogeny, and after assessing the role of allopatric differentiation, hybridization, and genome duplication in the diversification of the group. Using the Hyb-Seq approach, combining target enrichment using Asteraceae COS loci baits and genome skimming, more than 100 new accessions were generated. HybPhyloMaker and HybPiper pipelines were used for the phylogenetic analyses, and another pipeline in development (Paralogue Wizard) was used to deal with paralogues. RAxML was used to generate gene trees and Astral for species tree reconstruction. Phyparts were used to explore as first step of gene tree discordance along the clades. Fully resolved with moderated supported trees were obtained, showing Senecio ser. Culcitium as monophyletic. Within the group, some species formed well-supported clades with morphologically related species, while some species would not have exclusive ancestry, in concordance with previous studies using amplified fragment length polymorphism (AFLP) showing geographical differentiation. Discordance between gene trees was detected. Paralogues were detected for many loci, indicating possible genome duplications; ploidy level estimation using flow cytometry will be carried out during the next months in order to identify the role of this process in the diversification of the group. Likewise, TreeSetViz package for Mesquite, hierarchical likelihood ratio congruence test using Concaterpillar, and Procrustean Approach to Cophylogeny (PACo), will be used to evaluate the congruence among different inheritance patterns. In order to evaluate the influence of hybridization and Incomplete Lineage Sorting (ILS) in each resultant clade from the phylogeny, Joly et al.'s 2009 method in a coalescent scenario and Paterson’s D-statistic will be performed. Even when the main discordance sources between gene trees were not explored in detail yet, the data show that at least to some degree, processes such as genome duplication, hybridization, and/or ILS could be involved in the evolution of the group.Keywords: adaptive radiations, Andes, genome duplication, hybridization, Senecio
Procedia PDF Downloads 12524572 Population Structure Analysis of Pakistani Indigenous Cattle Population by Using High Density SNP Array
Authors: Hamid Mustafa, Huson J. Heather, Kim Eiusoo, McClure Matt, Khalid Javed, Talat Nasser Pasha, Afzal Ali1, Adeela Ajmal, Tad Sonstegard
Abstract:
Genetic differences associated with speciation, breed formation or local adaptation can help to preserve and effective utilization of animals in selection programs. Analyses of population structure and breed diversity have provided insight into the origin and evolution of cattle. In this study, we used a high-density panel of SNP markers to examine population structure and diversity among ten Pakistani indigenous cattle breeds. In total, 25 individuals from three cattle populations, including Achi (n=08), Bhagnari (n=04) and Cholistani (n=13) were genotyped for 777, 962 single nucleotide polymorphism (SNP) markers. Population structure was examined using the linkage model in the program STRUCTURE. After characterizing SNP polymorphism in the different populations, we performed a detailed analysis of genetic structure at both the individual and population levels. The whole-genome SNP panel identified several levels of population substructure in the set of examined cattle breeds. We further searched for spatial patterns of genetic diversity among these breeds under the recently developed spatial principal component analysis framework. Overall, such high throughput genotyping data confirmed a clear partitioning of the cattle genetic diversity into distinct breeds. The resulting complex historical origins associated with both natural and artificial selection have led to the differentiation of numerous different cattle breeds displaying a broad phenotypic variety over a short period of time.Keywords: Pakistan, cattle, genetic diversity, population structure
Procedia PDF Downloads 60224571 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands
Authors: Julio Albuja, David Zaldumbide
Abstract:
Data analysis is an important step before taking a decision about money. The aim of this work is to analyze the factors that influence the final price of the houses through data mining algorithms. To our best knowledge, previous work was researched just to compare results. Furthermore, before using the data of the data set, the Z-Transformation were used to standardize the data in the same range. Hence, the data was classified into two groups to visualize them in a readability format. A decision tree was built, and graphical data is displayed where clearly is easy to see the results and the factors' influence in these graphics. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented related to the released results that our research showed making it easier to apply these algorithms using a customized data set.Keywords: algorithms, data, decision tree, transformation
Procedia PDF Downloads 36224570 Analysis and Performance of Handover in Universal Mobile Telecommunications System (UMTS) Network Using OPNET Modeller
Authors: Latif Adnane, Benaatou Wafa, Pla Vicent
Abstract:
Handover is of great significance to achieve seamless connectivity in wireless networks. This paper gives an impression of the main factors which are being affected by the soft and the hard handovers techniques. To know and understand the handover process in The Universal Mobile Telecommunications System (UMTS) network, different statistics are calculated. This paper focuses on the quality of service (QoS) of soft and hard handover in UMTS network, which includes the analysis of received power, signal to noise radio, throughput, delay traffic, traffic received, delay, total transmit load, end to end delay and upload response time using OPNET simulator.Keywords: handover, UMTS, mobility, simulation, OPNET modeler
Procedia PDF Downloads 30624569 Application of Blockchain Technology in Geological Field
Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu
Abstract:
Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still a lot of technology barriers as well as cognition chaos in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology at the beginning and then make an analysis of the application dilemma of geological data. Based on the current analysis, we bring forward some feasible patterns and scenarios for the blockchain application in geological big data and put forward serval suggestions for future work in geological big data management.Keywords: blockchain, intellectual property protection, geological data, big data management
Procedia PDF Downloads 7124568 Frequent Item Set Mining for Big Data Using MapReduce Framework
Authors: Tamanna Jethava, Rahul Joshi
Abstract:
Frequent Item sets play an essential role in many data Mining tasks that try to find interesting patterns from the database. Typically it refers to a set of items that frequently appear together in transaction dataset. There are several mining algorithm being used for frequent item set mining, yet most do not scale to the type of data we presented with today, so called “BIG DATA”. Big Data is a collection of large data sets. Our approach is to work on the frequent item set mining over the large dataset with scalable and speedy way. Big Data basically works with Map Reduce along with HDFS is used to find out frequent item sets from Big Data on large cluster. This paper focuses on using pre-processing & mining algorithm as hybrid approach for big data over Hadoop platform.Keywords: frequent item set mining, big data, Hadoop, MapReduce
Procedia PDF Downloads 41024567 The Role Of Data Gathering In NGOs
Authors: Hussaini Garba Mohammed
Abstract:
Background/Significance: The lack of data gathering is affecting NGOs world-wide in general to have good data information about educational and health related issues among communities in any country and around the world. For example, HIV/AIDS smoking (Tuberculosis diseases) and COVID-19 virus carriers is becoming a serious public health problem, especially among old men and women. But there is no full details data survey assessment from communities, villages, and rural area in some countries to show the percentage of victims and patients, especial with this world COVID-19 virus among the people. These data are essential to inform programming targets, strategies, and priorities in getting good information about data gathering in any society.Keywords: reliable information, data assessment, data mining, data communication
Procedia PDF Downloads 16524566 High Throughput Virtual Screening against ns3 Helicase of Japanese Encephalitis Virus (JEV)
Authors: Soma Banerjee, Aamen Talukdar, Argha Mandal, Dipankar Chaudhuri
Abstract:
Japanese Encephalitis is a major infectious disease with nearly half the world’s population living in areas where it is prevalent. Currently, treatment for it involves only supportive care and symptom management through vaccination. Due to the lack of antiviral drugs against Japanese Encephalitis Virus (JEV), the quest for such agents remains a priority. For these reasons, simulation studies of drug targets against JEV are important. Towards this purpose, docking experiments of the kinase inhibitors were done against the chosen target NS3 helicase as it is a nucleoside binding protein. Previous efforts regarding computational drug design against JEV revealed some lead molecules by virtual screening using public domain software. To be more specific and accurate regarding finding leads, in this study a proprietary software Schrödinger-GLIDE has been used. Druggability of the pockets in the NS3 helicase crystal structure was first calculated by SITEMAP. Then the sites were screened according to compatibility with ATP. The site which is most compatible with ATP was selected as target. Virtual screening was performed by acquiring ligands from databases: KinaseSARfari, KinaseKnowledgebase and Published inhibitor Set using GLIDE. The 25 ligands with best docking scores from each database were re-docked in XP mode. Protein structure alignment of NS3 was performed using VAST against MMDB, and similar human proteins were docked to all the best scoring ligands. The low scoring ligands were chosen for further studies and the high scoring ligands were screened. Seventy-three ligands were listed as the best scoring ones after performing HTVS. Protein structure alignment of NS3 revealed 3 human proteins with RMSD values lesser than 2Å. Docking results with these three proteins revealed the inhibitors that can interfere and inhibit human proteins. Those inhibitors were screened. Among the ones left, those with docking scores worse than a threshold value were also removed to get the final hits. Analysis of the docked complexes through 2D interaction diagrams revealed the amino acid residues that are essential for ligand binding within the active site. Interaction analysis will help to find a strongly interacting scaffold among the hits. This experiment yielded 21 hits with the best docking scores which could be investigated further for their drug like properties. Aside from getting suitable leads, specific NS3 helicase-inhibitor interactions were identified. Selection of Target modification strategies complementing docking methodologies which can result in choosing better lead compounds are in progress. Those enhanced leads can lead to better in vitro testing.Keywords: antivirals, docking, glide, high-throughput virtual screening, Japanese encephalitis, ns3 helicase
Procedia PDF Downloads 21524565 The Application of Data Mining Technology in Building Energy Consumption Data Analysis
Authors: Liang Zhao, Jili Zhang, Chongquan Zhong
Abstract:
Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.Keywords: data mining, data analysis, prediction, optimization, building operational performance
Procedia PDF Downloads 83624564 To Handle Data-Driven Software Development Projects Effectively
Authors: Shahnewaz Khan
Abstract:
Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects. Otherwise, even exerting a lot of effort, the necessary development might not always be possible. In this post, an effort to examine the workflow of data-driven software development projects and its implementation process in order to describe how to manage a project successfully. Which will assist in minimizing the added workload.Keywords: data, data-driven projects, data science, NLP, software project
Procedia PDF Downloads 6724563 Predictive Pathogen Biology: Genome-Based Prediction of Pathogenic Potential and Countermeasures Targets
Authors: Debjit Ray
Abstract:
Horizontal gene transfer (HGT) and recombination leads to the emergence of bacterial antibiotic resistance and pathogenic traits. HGT events can be identified by comparing a large number of fully sequenced genomes across a species or genus, define the phylogenetic range of HGT, and find potential sources of new resistance genes. In-depth comparative phylogenomics can also identify subtle genome or plasmid structural changes or mutations associated with phenotypic changes. Comparative phylogenomics requires that accurately sequenced, complete and properly annotated genomes of the organism. Assembling closed genomes requires additional mate-pair reads or “long read” sequencing data to accompany short-read paired-end data. To bring down the cost and time required of producing assembled genomes and annotating genome features that inform drug resistance and pathogenicity, we are analyzing the performance for genome assembly of data from the Illumina NextSeq, which has faster throughput than the Illumina HiSeq (~1-2 days versus ~1 week), and shorter reads (150bp paired-end versus 300bp paired end) but higher capacity (150-400M reads per run versus ~5-15M) compared to the Illumina MiSeq. Bioinformatics improvements are also needed to make rapid, routine production of complete genomes a reality. Modern assemblers such as SPAdes 3.6.0 running on a standard Linux blade are capable in a few hours of converting mixes of reads from different library preps into high-quality assemblies with only a few gaps. Remaining breaks in scaffolds are generally due to repeats (e.g., rRNA genes) are addressed by our software for gap closure techniques, that avoid custom PCR or targeted sequencing. Our goal is to improve the understanding of emergence of pathogenesis using sequencing, comparative genomics, and machine learning analysis of ~1000 pathogen genomes. Machine learning algorithms will be used to digest the diverse features (change in virulence genes, recombination, horizontal gene transfer, patient diagnostics). Temporal data and evolutionary models can thus determine whether the origin of a particular isolate is likely to have been from the environment (could it have evolved from previous isolates). It can be useful for comparing differences in virulence along or across the tree. More intriguing, it can test whether there is a direction to virulence strength. This would open new avenues in the prediction of uncharacterized clinical bugs and multidrug resistance evolution and pathogen emergence.Keywords: genomics, pathogens, genome assembly, superbugs
Procedia PDF Downloads 18824562 Establishing a Drug Discovery Platform to Progress Compounds into the Clinic
Authors: Sheraz Gul
Abstract:
The requirements for progressing a compound to clinical trials is well established and relies on the results from in-vitro and in-vivo animal tests to indicate that it is likely to be safe and efficacious when testing in humans. The typical data package required will include demonstrating compound safety, toxicity, bioavailability, pharmacodynamics (potential effects of the compound on body systems) and pharmacokinetics (how the compound is potentially absorbed, distributed, metabolised and eliminated after dosing in humans). If the desired criteria are met and the compound meets the clinical Candidate criteria and is deemed worthy of further development, a submission to regulatory bodies such as the US Food & Drug Administration for an exploratory Investigational New Drug Study can be made. The purpose of this study is to collect data to establish that the compound will not expose humans to unreasonable risks when used in limited, early-stage clinical studies in patients or normal volunteer subjects (Phase I). These studies are also designed to determine the metabolism and pharmacologic actions of the drug in humans, the side effects associated with increasing doses, and, if possible, to gain early evidence on their effectiveness. In order to reach the above goals, we have developed a pre-clinical high throughput Absorption, Distribution, Metabolism and Excretion–Toxicity (ADME–Toxicity) panel of assays to identify compounds that are likely to meet the Lead and Candidate compound acceptance criteria. This panel includes solubility studies in a range of biological fluids, cell viability studies in cancer and primary cell-lines, mitochondrial toxicity, off-target effects (across the kinase, protease, histone deacetylase, phosphodiesterase and GPCR protein families), CYP450 inhibition (5 different CYP450 enzymes), CYP450 induction, cardio-toxicity (hERG) and gene-toxicity. This panel of assays has been applied to multiple compound series developed in a number of projects delivering Lead and clinical Candidates and examples from these will be presented.Keywords: absorption, distribution, metabolism and excretion–toxicity , drug discovery, food and drug administration , pharmacodynamics
Procedia PDF Downloads 16124561 Computational Approaches to Study Lineage Plasticity in Human Pancreatic Ductal Adenocarcinoma
Authors: Almudena Espin Perez, Tyler Risom, Carl Pelz, Isabel English, Robert M. Angelo, Rosalie Sears, Andrew J. Gentles
Abstract:
Pancreatic ductal adenocarcinoma (PDAC) is one of the most deadly malignancies. The role of the tumor microenvironment (TME) is gaining significant attention in cancer research. Despite ongoing efforts, the nature of the interactions between tumors, immune cells, and stromal cells remains poorly understood. The cell-intrinsic properties that govern cell lineage plasticity in PDAC and extrinsic influences of immune populations require technically challenging approaches due to the inherently heterogeneous nature of PDAC. Understanding the cell lineage plasticity of PDAC will improve the development of novel strategies that could be translated to the clinic. Members of the team have demonstrated that the acquisition of ductal to neuroendocrine lineage plasticity in PDAC confers therapeutic resistance and is a biomarker of poor outcomes in patients. Our approach combines computational methods for deconvolving bulk transcriptomic cancer data using CIBERSORTx and high-throughput single-cell imaging using Multiplexed Ion Beam Imaging (MIBI) to study lineage plasticity in PDAC and its relationship to the infiltrating immune system. The CIBERSORTx algorithm uses signature matrices from immune cells and stroma from sorted and single-cell data in order to 1) infer the fractions of different immune cell types and stromal cells in bulked gene expression data and 2) impute a representative transcriptome profile for each cell type. We studied a unique set of 300 genomically well-characterized primary PDAC samples with rich clinical annotation. We deconvolved the PDAC transcriptome profiles using CIBERSORTx, leveraging publicly available single-cell RNA-seq data from normal pancreatic tissue and PDAC to estimate cell type proportions in PDAC, and digitally reconstruct cell-specific transcriptional profiles from our study dataset. We built signature matrices and optimized by simulations and comparison to ground truth data. We identified cell-type-specific transcriptional programs that contribute to cancer cell lineage plasticity, especially in the ductal compartment. We also studied cell differentiation hierarchies using CytoTRACE and predict cell lineage trajectories for acinar and ductal cells that we believe are pinpointing relevant information on PDAC progression. Collaborators (Angelo lab, Stanford University) has led the development of the Multiplexed Ion Beam Imaging (MIBI) platform for spatial proteomics. We will use in the very near future MIBI from tissue microarray of 40 PDAC samples to understand the spatial relationship between cancer cell lineage plasticity and stromal cells focused on infiltrating immune cells, using the relevant markers of PDAC plasticity identified from the RNA-seq analysis.Keywords: deconvolution, imaging, microenvironment, PDAC
Procedia PDF Downloads 11124560 The Relationship Between Artificial Intelligence, Data Science, and Privacy
Authors: M. Naidoo
Abstract:
Artificial intelligence often requires large amounts of good quality data. Within important fields, such as healthcare, the training of AI systems predominately relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals’ privacy rights. This research seeks to establish the challenges AI and data sciences pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.Keywords: artificial intelligence, data science, law, policy
Procedia PDF Downloads 9324559 Simulation Data Summarization Based on Spatial Histograms
Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura
Abstract:
In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.Keywords: simulation data, data summarization, spatial histograms, exploration, visualization
Procedia PDF Downloads 165