Search results for: whole exome sequencing data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24549

24339 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, and veracity and come from a variety of sources. Public administrations are using (big) data, implementing base registries, and enforcing data sharing across the entire government to deliver (big) data-related integrated services, provide insights to users, and support good governance. Government (big) data ecosystem actors are distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services, such as data storage and hosting, to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of its actors and their roles. We present a graphical view of actors, roles, and their relationships in the government (big) data ecosystem. We also discuss our research findings. We did not find much published research on the government (big) data ecosystem, including its definition and the classification of actors and their roles. Therefore, we drew ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 128
24338 PMEL Marker Identification of Dark and Light Feather Colours in Local Canary

Authors: Mudawamah Mudawamah, Muhammad Z. Fadli, Gatot Ciptadi, Aulanni’am

Abstract:

Canary breeding has spread throughout Indonesia among low- and middle-income communities and has become a source of income for them. A notable feature of the canary market is that feather colour is one of the determining factors of price. This research contributes to the molecular database as a basis for selection and mating by Indonesian canary breeders. The research was experimental, using genomic DNA isolated from canary blood. The DNA was amplified by PCR with the PMEL marker and then sequenced. Twenty-four canaries with light and dark feather colours were used. The data were analysed with BioEdit and Network 4.6.0.0 software. The results showed that the PMEL gene was amplified in all samples, with a fragment length of 500 bp. At base position 40, cytosine (C) was found in the light-coloured canaries, while thymine (T) was found at the same position in the dark-coloured canaries. The sequencing results comprised fragments of 286-415 bp and 10 haplotypes. In conclusion, the PMEL gene (a white pigment gene) is likely to be usable for detecting molecular genetic variation between dark and light feather colours.

Keywords: canary, haplotype, PMEL, sequence

Procedia PDF Downloads 206
24337 Investigating the Efficiency of Granular Sludge for Recovery of Phosphate from Wastewater

Authors: Sara Salehi, Ka Yu Cheng, Anna Heitz, Maneesha Ginige

Abstract:

This study investigated the efficiency of granular sludge for phosphorous (P) recovery from wastewater. A laboratory scale sequencing batch reactor (SBR) was operated under alternating aerobic/anaerobic conditions to enrich a P accumulating granular biomass. This study showed that an overall 45-fold increase in P concentration could be achieved by reducing the volume of the P capturing liquor by 5-fold in the anaerobic P release phase. Moreover, different fractions of the granular biomass have different individual contributions towards generating a concentrated stream of P.

Keywords: granular sludge, PAOs, P recovery, SBR

Procedia PDF Downloads 453
24336 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that is high in volume, velocity, and veracity and comes from a variety of sources is generated in all sectors, including the government sector. Globally, public administrations are pursuing (big) data as a new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can lead to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to the successful implementation of a government (big) data ecosystem. The essential aspects of the government (big) data ecosystem include its definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystems literature. As this is a new topic, we did not find specific articles on the government (big) data ecosystem and therefore focused our research on relevant areas such as humanitarian data, open government data, scientific research data, and industry data.

Keywords: applications of big data, big data, big data types, big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 180
24335 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker

Abstract:

Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited, since only about 1% of generated data is ever analyzed for value creation. There is a data gap: data is not explored because of a lack of data analytics infrastructure and of the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which are a subset of data analytics. The developed framework is designed to assist data analysts with little experience in choosing the appropriate machine learning algorithm for the purpose of their application.

Keywords: Data analytics, Industrial engineering, Machine learning, Value creation

Procedia PDF Downloads 137
24334 Identification of Rare Mutations in Genes Involved in Monogenic Forms of Obesity and Diabetes in Obese Guadeloupean Children through Next-Generation Sequencing

Authors: Lydia Foucan, Laurent Larifla, Emmanuelle Durand, Christine Rambhojan, Veronique Dhennin, Jean-Marc Lacorte, Philippe Froguel, Amelie Bonnefond

Abstract:

In the population of Guadeloupe Island (472,124 inhabitants and 80% of subjects of African descent), overweight and obesity were estimated at 23% and 9% respectively among children. High prevalence of diabetes has been reported (~10%) in the adult population. Nevertheless, no study has investigated the contribution of gene mutations to childhood obesity in this population. We aimed to investigate rare genetic mutations in genes involved in monogenic obesity or diabetes in obese Afro-Caribbean children from Guadeloupe Island using next-generation sequencing. The present investigation included unrelated obese children from a previous study on overweight conducted in Guadeloupe Island in 2013. We sequenced coding regions of 59 genes involved in monogenic obesity or diabetes. A total of 25 obese schoolchildren (with Z-score of body mass index [BMI]: 2.0 to 2.8) were screened for rare mutations (non-synonymous, splice-site, or insertion/deletion) in 59 genes. Mean age of the study population was 12.4 ± 1.1 years. Seventeen children (68%) had insulin resistance (HOMA-IR > 3.16). A family history of obesity (mother or father) was observed in eight children, and three of the accompanying parents presented with type 2 diabetes. None of the children had gonadotrophic abnormality or mental retardation. We detected five rare heterozygous mutations, in four genes involved in monogenic obesity, in five different obese children: MC4R p.Ile301Thr and SIM1 p.Val326Thrfs*43 mutations, which were pathogenic; SIM1 p.Ser343Pro and SH2B1 p.Pro90His mutations, which were likely pathogenic; and NTRK2 p.Leu140Phe, which was of uncertain significance. In parallel, we identified seven carriers of mutations in ABCC8 or KCNJ11 (involved in monogenic diabetes), which were of uncertain significance (KCNJ11 p.Val13Met, KCNJ11 p.Val151Met, ABCC8 p.Lys1521Asn and ABCC8 p.Ala625Val).
Rare pathogenic or likely pathogenic mutations, linked to severe obesity were detected in more than 15% of this Afro-Caribbean population at high risk of obesity and type 2 diabetes.

Keywords: childhood obesity, MC4R, monogenic obesity, SIM1

Procedia PDF Downloads 158
24333 Modelling a Hospital as a Queueing Network: Analysis for Improving Performance

Authors: Emad Alenany, M. Adel El-Baz

Abstract:

In this paper, the flow of different classes of patients into a hospital is modelled and analyzed using the queueing network analyzer (QNA) algorithm and discrete-event simulation. Input data for QNA are the rate and variability parameters of the arrival and service times, in addition to the number of servers in each facility. Patient flows mostly match the real flow in a hospital in Egypt. Based on the analysis of the waiting times, two approaches are suggested for improving performance: separating patients into service groups, and adopting different service policies for sequencing patients through hospital units. Serving a specific group of patients with a higher performance target separately from the remaining patients, who have a lower performance target, requires the same capacity while improving performance for the selected group. In addition, it is shown that, among the tested policies, adopting the shortest processing time and shortest remaining processing time service policies would result in, respectively, an 11.47% and a 13.75% reduction in average waiting time relative to the first-come-first-served policy.
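The effect of a sequencing policy on waiting time can be illustrated with a toy example. The sketch below is not the QNA algorithm or the authors' simulation model; it assumes a single server, a hypothetical batch of patients who are all present at time zero, and known service times, and compares first-come-first-served with shortest processing time.

```python
def average_wait(service_times):
    """Mean waiting time when jobs are served in the given order.

    Assumes a single server and that every job is already waiting at time 0.
    """
    total_wait = 0.0
    clock = 0.0
    for service in service_times:
        total_wait += clock   # this job waited until the server became free
        clock += service      # server is busy for the job's service time
    return total_wait / len(service_times)


# Hypothetical service times (minutes), listed in arrival order.
jobs = [8.0, 2.0, 5.0, 1.0]

fcfs_wait = average_wait(jobs)          # first come, first served: 8.25
spt_wait = average_wait(sorted(jobs))   # shortest processing time first: 3.0

reduction = (fcfs_wait - spt_wait) / fcfs_wait
print(f"FCFS {fcfs_wait:.2f} min, SPT {spt_wait:.2f} min, reduction {reduction:.1%}")
```

The reduction in this toy batch is much larger than the figures reported in the abstract because there are no random arrivals, multiple units, or patient classes; it only shows the direction of the effect.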

Keywords: queueing network, discrete-event simulation, health applications, SPT

Procedia PDF Downloads 156
24332 Systematic Identification of Noncoding Cancer Driver Somatic Mutations

Authors: Zohar Manber, Ran Elkon

Abstract:

Accumulation of somatic mutations (SMs) in the genome is a major driving force of cancer development. Most SMs in the tumor's genome are functionally neutral; however, some cause damage to critical processes and provide the tumor with a selective growth advantage (termed cancer driver mutations). Current research on functional significance of SMs is mainly focused on finding alterations in protein coding sequences. However, the exome comprises only 3% of the human genome, and thus, SMs in the noncoding genome significantly outnumber those that map to protein-coding regions. Although our understanding of noncoding driver SMs is very rudimentary, it is likely that disruption of regulatory elements in the genome is an important, yet largely underexplored mechanism by which somatic mutations contribute to cancer development. The expression of most human genes is controlled by multiple enhancers, and therefore, it is conceivable that regulatory SMs are distributed across different enhancers of the same target gene. Yet, to date, most statistical searches for regulatory SMs have considered each regulatory element individually, which may reduce statistical power. The first challenge in considering the cumulative activity of all the enhancers of a gene as a single unit is to map enhancers to their target promoters. Such mapping defines for each gene its set of regulating enhancers (termed "set of regulatory elements" (SRE)). Considering multiple enhancers of each gene as one unit holds great promise for enhancing the identification of driver regulatory SMs. However, the success of this approach is greatly dependent on the availability of comprehensive and accurate enhancer-promoter (E-P) maps. To date, the discovery of driver regulatory SMs has been hindered by insufficient sample sizes and statistical analyses that often considered each regulatory element separately. 
In this study, we analyzed more than 2,500 whole-genome sequence (WGS) samples provided by The Cancer Genome Atlas (TCGA) and The International Cancer Genome Consortium (ICGC) in order to identify such driver regulatory SMs. Our analyses took into account the combinatorial aspect of gene regulation by considering all the enhancers that control the same target gene as one unit, based on E-P maps from three genomics resources. The identification of candidate driver noncoding SMs is based on their recurrence. We searched for SREs of genes that are "hotspots" for SMs (that is, they accumulate SMs at a significantly elevated rate). To test the statistical significance of recurrence of SMs within a gene's SRE, we used both global and local background mutation rates. Using this approach, we detected - in seven different cancer types - numerous "hotspots" for SMs. To support the functional significance of these recurrent noncoding SMs, we further examined their association with the expression level of their target gene (using gene expression data provided by the ICGC and TCGA for samples that were also analyzed by WGS).
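A recurrence test of this kind can be sketched as follows. The abstract does not state which statistical test was used, so the Poisson model, the function names, and all numbers below are assumptions for illustration only: the expected number of SMs in a gene's SRE is taken as background rate × total SRE length × number of samples, and the p-value is the upper Poisson tail.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Fine for a sketch; very small p-values lose precision with this form.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def hotspot_pvalue(observed, rate_per_bp, sre_length_bp, n_samples):
    """p-value for seeing at least `observed` SMs in an SRE (hypothetical helper)."""
    expected = rate_per_bp * sre_length_bp * n_samples
    return poisson_upper_tail(observed, expected)


# Hypothetical SRE: 5,000 bp of enhancers, 100 samples, background 1e-6 per bp.
p_global = hotspot_pvalue(10, 1e-6, 5000, 100)
# The same SRE tested against a hypothetical higher local background rate.
p_local = hotspot_pvalue(10, 5e-6, 5000, 100)
print(p_global, p_local)
```

Testing against both a global and a local rate, as the abstract describes, guards against calling a hotspot in a region that is simply highly mutable; here the local p-value is larger because the local expectation is higher.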

Keywords: cancer genomics, enhancers, noncoding genome, regulatory elements

Procedia PDF Downloads 81
24331 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm

Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima

Abstract:

We currently generate a large amount of data and need dedicated devices to store it. Generally, we store data on pen drives, hard drives, etc. Sometimes we may lose the data when a device becomes corrupted. To overcome these issues, we implemented a cloud space for storing data that also provides more security for the data. The data can be accessed over the internet from anywhere in the world. We implemented the system in Java using the NetBeans IDE. Once a user uploads data, they do not have any rights to change it. Uploaded files are stored in the cloud with the system time as the file name, and the directory is created with some random words. The cloud accepts a file only if its size is less than 2 MB.
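The upload rules described above can be sketched as follows. This is an illustration in Python rather than the authors' Java implementation; the word list, the millisecond time stamp, the three-word directory name, and the reading of "2 MB" as 2 × 1024 × 1024 bytes are all assumptions, and the AES encryption step named in the title is not shown.

```python
import secrets
import time

MAX_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" interpreted as 2 MiB
WORDS = ["amber", "birch", "cedar", "delta", "ember", "flint"]  # hypothetical word list


def plan_upload(size_bytes, now=None):
    """Return (directory, filename) for an accepted upload, or None if rejected.

    The filename is the system time in milliseconds; the directory name is
    built from randomly chosen words. `now` can be passed in for testing.
    """
    if size_bytes >= MAX_BYTES:
        return None  # only files smaller than the limit are accepted
    seconds = time.time() if now is None else now
    filename = str(int(seconds * 1000))
    directory = "-".join(secrets.choice(WORDS) for _ in range(3))
    return directory, filename


print(plan_upload(150_000))
print(plan_upload(3 * 1024 * 1024))  # rejected: None
```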

Keywords: cloud space, AES, FTP, NetBeans IDE

Procedia PDF Downloads 174
24330 C-eXpress: A Web-Based Analysis Platform for Comparative Functional Genomics and Proteomics in Human Cancer Cell Line, NCI-60 as an Example

Authors: Chi-Ching Lee, Po-Jung Huang, Kuo-Yang Huang, Petrus Tang

Abstract:

Background: Recent advances in high-throughput research technologies such as new-generation sequencing and multi-dimensional liquid chromatography make it possible, for the first time, to dissect the complete transcriptome and proteome in a single run. However, it is almost impossible for many laboratories to handle and analyze these “BIG” data without support from a bioinformatics team. We aimed to provide a web-based analysis platform that lets users with only limited knowledge of bio-computing study functional genomics and proteomics. Method: We use NCI-60 as an example dataset to demonstrate the power of the web-based analysis platform and data delivery system. C-eXpress takes as input a simple text file that contains standard NCBI gene or protein IDs and expression levels (rpkm or fold) and generates a distribution map of gene/protein expression levels as a heatmap organized by color gradients. The diagram is hyperlinked to a dynamic HTML table that allows users to filter the datasets based on various gene features. A dynamic summary chart is generated automatically after each filtering step. Results: We implemented an integrated database that contains pre-defined annotations such as gene/protein properties (ID, name, length, MW, pI); pathways based on KEGG and GO biological process; subcellular localization based on GO cellular component; and functional classification based on GO molecular function, kinase, peptidase, and transporter. Multiple ways of sorting columns and rows are also provided for comparative analysis and visualization of multiple samples.

Keywords: cancer, visualization, database, functional annotation

Procedia PDF Downloads 586
24329 Business Intelligence for Profiling of Telecommunication Customer

Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro

Abstract:

Business intelligence is a methodology that systematically exploits data to produce information and knowledge and can thereby support the decision-making process. Methods in business intelligence include data warehousing and data mining. A data warehouse stores historical data derived from transactional data. For data modelling in the data warehouse, we apply Kimball's dimensional modelling. Data mining is used to extract patterns and gain insight from the data. Data mining has many techniques, one of which is segmentation. To profile telecommunication customers, we segment them according to their usage of services, invoices, and payments. Customers can be grouped according to their characteristics, and the profitable customers can be identified. We apply the K-Means clustering algorithm for segmentation, using the RFM (Recency, Frequency, and Monetary) model as its input variables. All data mining processes are carried out with IBM SPSS Modeler.
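The RFM-plus-K-Means idea can be sketched in a few lines. The authors used IBM SPSS Modeler, so the code below is only a minimal stand-in: the transaction records, the day numbering, and the choice of initial centroids are hypothetical, and the features are left unscaled for brevity (in practice they would normally be standardized before clustering).

```python
from collections import defaultdict


def rfm_table(transactions, today):
    """Map each customer to (recency, frequency, monetary).

    `transactions` is an iterable of (customer_id, day_number, amount).
    """
    last_day = {}
    count = defaultdict(int)
    total = defaultdict(float)
    for customer, day, amount in transactions:
        last_day[customer] = max(day, last_day.get(customer, day))
        count[customer] += 1
        total[customer] += amount
    return {c: (today - last_day[c], count[c], total[c]) for c in last_day}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm; returns the cluster index of each point."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda j: sq_dist(pt, centroids[j]))
                  for pt in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical payments: (customer, day, amount).
transactions = [("a", 1, 10.0), ("a", 9, 15.0), ("a", 10, 20.0), ("b", 2, 5.0),
                ("c", 9, 30.0), ("c", 10, 25.0), ("d", 1, 4.0)]
rfm = rfm_table(transactions, today=10)
points = list(rfm.values())
labels = kmeans(points, [points[0], points[1]])
print(dict(zip(rfm, labels)))
```

In this toy data, customers "a" and "c" (recent, frequent, higher spend) fall into one segment and "b" and "d" into the other.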

Keywords: business intelligence, customer segmentation, data warehouse, data mining

Procedia PDF Downloads 443
24328 Tuberculosis in Humans and Animals in the Eastern Part of the Sudan

Authors: Yassir Adam Shuaib, Stefan Niemann, Eltahir Awad Khalil, Ulrich Schaible, Lothar Heinz Wieler, Mohammed Ahmed Bakhiet, Abbashar Osman Mohammed, Mohamed Abdelsalam Abdalla, Elvira Richter

Abstract:

Tuberculosis (TB) is a chronic bacterial disease of humans and animals, characterized by the progressive development of specific granulomatous tubercle lesions in affected tissues. In a six-month study, from June to November 2014, a total of 2,304 carcasses of cattle, camels, sheep, and goats slaughtered at the East and West Gaash slaughterhouses, Kassala, were examined post mortem. In parallel, 101 sputum samples from patients with suspected TB at Kassala and El-Gadarif Teaching Hospitals were collected in order to investigate tuberculosis in animals and humans. Only 0.1% of carcasses showed suspected TB lesions, found in the liver, lung, and peritoneal cavity of two sheep; no tuberculous lesions were found in the carcasses of cattle, goats, or camels. All samples, tissue lesions and sputum, were decontaminated by the NALC-NaOH method and cultured for mycobacterial growth at the NRZ for Mycobacteria, Research Center Borstel, Germany. Genotyping and molecular characterization of the grown strains were done by line probe assay (GenoType CM and MTBC), sequencing of 16S rDNA, the rpoB gene, and ITS, spoligotyping, MIRU-VNTR typing, and next-generation sequencing (NGS). Culture of the specimens revealed growth of organisms from 81.6% of all samples. Mycobacterium tuberculosis (76.2%), M. intracellulare (14.2%), mixed infection with M. tuberculosis and M. intracellulare (6.0%), and mixed infection with M. tuberculosis and M. fortuitum and with M. intracellulare and unknown species (1.2%) were detected in the sputum samples, and unknown species (1.2%) were detected in the tissue samples of one of the animals. Of the 69 M. tuberculosis strains, 25 (36.2%) were mono-drug-resistant, multi-drug-resistant, or poly-drug-resistant, but none was extensively drug-resistant. In conclusion, the prevalence of TB in animals was very low, while in humans the M. tuberculosis Delhi/CAS lineage was responsible for most cases and there was evidence of MDR transmission and acquisition.

Keywords: animal, human, slaughterhouse, Sudan, tuberculosis

Procedia PDF Downloads 340
24327 Liquid Tin(II) Alkoxide Initiators for Use in the Ring-Opening Polymerisation of Cyclic Ester Monomers

Authors: Sujitra Ruengdechawiwat, Robert Molloy, Jintana Siripitayananon, Runglawan Somsunan, Paul D. Topham, Brian J. Tighe

Abstract:

The main aim of this research has been to design and synthesize some completely soluble liquid tin(II) alkoxide initiators for use in the ring-opening polymerisation (ROP) of cyclic ester monomers. This is in contrast to conventional tin(II) alkoxides in solid form, which tend to be molecular aggregates and difficult to dissolve. The liquid initiators prepared were bis(tin(II) monooctoate) diethylene glycol ([Sn(Oct)]2DEG) and bis(tin(II) monooctoate) ethylene glycol ([Sn(Oct)]2EG). Their efficiencies as initiators in the bulk ROP of ε-caprolactone (CL) at 130°C were studied kinetically by dilatometry. Kinetic data over the 20-70% conversion range were used to construct both first-order and zero-order rate plots. It was found that the rate data fitted more closely to first-order kinetics with respect to the monomer concentration and gave higher first-order rate constants than the corresponding tin(II) octoate/diol initiating systems normally used to generate the tin(II) alkoxide in situ. Since the ultimate objective of this work is to produce copolymers suitable for biomedical use as absorbable monofilament surgical sutures, poly(L-lactide-co-ε-caprolactone) 75:25 mol %, P(LL-co-CL), copolymers were synthesized using both solid and liquid tin(II) alkoxide initiators at 130°C for 48 hrs. The statistical copolymers were obtained in near-quantitative yields with compositions (from 1H-NMR) close to the initial comonomer feed ratios. The monomer sequencing (from 13C-NMR) was partly random and partly blocky (gradient-type) due to the much differing monomer reactivity ratios (rLL >> rCL). From GPC, the copolymers obtained using the soluble liquid tin(II) alkoxides were found to have higher molecular weights (Mn = 40,000-100,000) than those from the only partially soluble solid initiators (Mn = 30,000-52,000).
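For reference, the first-order and zero-order rate plots mentioned above presumably rest on the standard integrated rate laws. The notation is an assumption rather than the authors' own: $[M]_0$ and $[M]_t$ are the monomer concentrations at times $0$ and $t$, $x = ([M]_0 - [M]_t)/[M]_0$ is the fractional conversion, and $k_1$, $k_0$ are the apparent rate constants.

```latex
\begin{align*}
\text{first order:}\quad & -\frac{d[M]}{dt} = k_1 [M]
  \;\Longrightarrow\; \ln\frac{[M]_0}{[M]_t} = -\ln(1 - x) = k_1 t \\
\text{zero order:}\quad & -\frac{d[M]}{dt} = k_0
  \;\Longrightarrow\; [M]_0 - [M]_t = x\,[M]_0 = k_0 t
\end{align*}
```

Under these forms, a straight line for $-\ln(1 - x)$ against $t$ indicates first-order behaviour with slope $k_1$, whereas a straight line for $x$ against $t$ indicates zero-order behaviour.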

Keywords: biodegradable polyesters, poly(L-lactide-co-ε-caprolactone), ring-opening polymerisation, tin(II) alkoxide

Procedia PDF Downloads 170
24326 Isolation of the Leptospira spp. from the Rice Farming Lands in the North of Iran by EMJH Media

Authors: S. Rostampour Yasouri, M. Ghane

Abstract:

Leptospirosis is one of the most important diseases common to humans and livestock and is caused by different species of Leptospira. The disease is considered endemic in the northern provinces of Iran, where the risk of infection with pathogenic Leptospira is high. A total of 115 samples of water (67), soil (36), and rodent feces (12) were collected in 2012 from rice fields in the suburbs of Tonekabon Township in northern Iran. After passage through membrane filters, the samples were cultured in liquid and solid EMJH medium and incubated at 30°C for 1 month. Leptospira spp. were isolated by culture; the plates were examined for colony formation and by microscopy, and the isolates were then identified by phenotypic tests. Finally, identification of the genus Leptospira was verified by PCR and 16S rRNA gene sequencing. Of the 115 samples, 55 (47.82%) were culture-positive, comprising 47 water samples (70.14%) and 8 soil samples (22.22%); no isolates were obtained from the rodent feces. Overall, these data indicate that Leptospira spp. occur at high frequency in northern Iran. Hence, environments in the north of Iran are vehicles of Leptospira spp.

Keywords: EMJH Medium, Leptospira, Northern of Iran, rice fields

Procedia PDF Downloads 147
24325 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge for bioinformaticians because of the complexity of applying statistical and machine learning techniques. The challenge is doubled when the microarray data sets contain missing data, which happens regularly, because these techniques cannot deal with missing data. One of the most important data analysis processes for microarray data sets is feature selection, which finds the most important genes that affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
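The abstract does not describe the imputation technique itself, so the sketch below substitutes a common baseline rather than the authors' method: missing values in each gene row are replaced by that row's mean, and genes are then ranked by variance as a simple filter-style feature selection. The expression matrix and the use of `None` for missing values are hypothetical.

```python
def impute_row_means(matrix):
    """Replace None entries in each gene row with the mean of the observed values."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def rank_genes_by_variance(matrix):
    """Return gene (row) indices ordered from most to least variable."""
    return sorted(range(len(matrix)), key=lambda i: variance(matrix[i]), reverse=True)


# Hypothetical expression matrix: rows are genes, columns are samples.
expression = [
    [1.0, None, 3.0, 8.0],
    [5.0, 5.0, None, 5.0],
]
filled = impute_row_means(expression)
ranked = rank_genes_by_variance(filled)
print(filled, ranked)
```

Note that this baseline imputes first and selects afterwards, whereas the paper describes doing the two together.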

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 533
24324 PDDA: Priority-Based, Dynamic Data Aggregation Approach for Sensor-Based Big Data Framework

Authors: Lutful Karim, Mohammed S. Al-kahtani

Abstract:

Sensors are used in various applications such as agriculture, health monitoring, air and water pollution monitoring, and traffic monitoring and control, and hence play a vital role in the growth of big data. However, sensors collect redundant data. Thus, aggregating and filtering sensor data is important for designing an efficient big data framework. Current research does not focus on aggregating and filtering data at multiple layers of a sensor-based big data framework. This paper therefore introduces (i) a three-layer data aggregation framework for big data and (ii) a priority-based, dynamic data aggregation scheme (PDDA) for the lowest layer, at the sensors. Simulation results show that PDDA outperforms existing tree- and cluster-based data aggregation schemes in terms of overall network energy consumption and end-to-end data transmission delay.

Keywords: big data, clustering, tree topology, data aggregation, sensor networks

Procedia PDF Downloads 302
24323 Integrative Omics-Portrayal Disentangles Molecular Heterogeneity and Progression Mechanisms of Cancer

Authors: Binder Hans

Abstract:

Cancer is no longer seen as solely a genetic disease where genetic defects such as mutations and copy number variations affect gene regulation and eventually lead to aberrant cell functioning which can be monitored by transcriptome analysis. It has become obvious that epigenetic alterations represent a further important layer of (de-)regulation of gene activity. For example, aberrant DNA methylation is a hallmark of many cancer types, and methylation patterns were successfully used to subtype cancer heterogeneity. Hence, unraveling the interplay between different omics levels such as genome, transcriptome and epigenome is indispensable for a mechanistic understanding of molecular deregulation causing complex diseases such as cancer. This objective requires powerful downstream integrative bioinformatics methods as an essential prerequisite to discover the whole genome mutational, transcriptome and epigenome landscapes of cancer specimens and to discover cancer genesis, progression and heterogeneity. Basic challenges and tasks arise ‘beyond sequencing’ because of the big size of the data, their complexity, the need to search for hidden structures in the data, for knowledge mining to discover biological function and also systems biology conceptual models to deduce developmental interrelations between different cancer states. These tasks are tightly related to cancer biology as an (epi-)genetic disease giving rise to aberrant genomic regulation under micro-environmental control and clonal evolution which leads to heterogeneous cellular states. Machine learning algorithms such as self-organizing maps (SOM) represent one interesting option to tackle these bioinformatics tasks. The SOM method enables recognizing complex patterns in large-scale data generated by high-throughput omics technologies. It portrays molecular phenotypes by generating individualized, easy-to-interpret images of the data landscape in combination with comprehensive analysis options.
Our image-based, reductionist machine learning methods provide one interesting perspective how to deal with massive data in the discovery of complex diseases, gliomas, melanomas and colon cancer on molecular level. As an important new challenge, we address the combined portrayal of different omics data such as genome-wide genomic, transcriptomic and methylomic ones. The integrative-omics portrayal approach is based on the joint training of the data and it provides separate personalized data portraits for each patient and data type which can be analyzed by visual inspection as one option. The new method enables an integrative genome-wide view on the omics data types and the underlying regulatory modes. It is applied to high and low-grade gliomas and to melanomas where it disentangles transversal and longitudinal molecular heterogeneity in terms of distinct molecular subtypes and progression paths with prognostic impact.

Keywords: integrative bioinformatics, machine learning, molecular mechanisms of cancer, gliomas and melanomas

Procedia PDF Downloads 119
24322 Modern Proteomics and the Application of Machine Learning Analyses in Proteomic Studies of Chronic Kidney Disease of Unknown Etiology

Authors: Dulanjali Ranasinghe, Isuru Supasan, Kaushalya Premachandra, Ranjan Dissanayake, Ajith Rajapaksha, Eustace Fernando

Abstract:

Proteomics studies of organisms are considered to be significantly information-rich compared to their genomic counterparts because proteomes of organisms represent the expressed state of all proteins of an organism at a given time. In modern top-down and bottom-up proteomics workflows, the primary analysis methods employed are gel–based methods such as two-dimensional (2D) electrophoresis and mass spectrometry based methods. Machine learning (ML) and artificial intelligence (AI) have been used increasingly in modern biological data analyses. In particular, the fields of genomics, DNA sequencing, and bioinformatics have seen an incremental trend in the usage of ML and AI techniques in recent years. The use of aforesaid techniques in the field of proteomics studies is only beginning to be materialised now. Although there is a wealth of information available in the scientific literature pertaining to proteomics workflows, no comprehensive review addresses various aspects of the combined use of proteomics and machine learning. The objective of this review is to provide a comprehensive outlook on the application of machine learning into the known proteomics workflows in order to extract more meaningful information that could be useful in a plethora of applications such as medicine, agriculture, and biotechnology.

Keywords: proteomics, machine learning, gel-based proteomics, mass spectrometry

Procedia PDF Downloads 126
24321 Time Travel Testing: A Mechanism for Improving Renewal Experience

Authors: Aritra Majumdar

Abstract:

While organizations strive to expand their new customer base, retaining existing relationships is a key aspect of improving overall profitability and also showcasing how successful an organization is in holding on to its customers. It is an experimentally proven fact that the lion’s share of profit always comes from existing customers. Hence seamless management of renewal journeys across different channels goes a long way in improving trust in the brand. From a quality assurance standpoint, time travel testing provides an approach to both business and technology teams to enhance the customer experience when they look to extend their partnership with the organization for a defined phase of time. This whitepaper will focus on key pillars of time travel testing: time travel planning, time travel data preparation, and enterprise automation. Along with that, it will call out some of the best practices and common accelerator implementation ideas which are generic across verticals like healthcare, insurance, etc. In this abstract document, a high-level snapshot of these pillars will be provided. Time Travel Planning: The first step of setting up a time travel testing roadmap is appropriate planning. Planning will include identifying the impacted systems that need to be time traveled backward or forward depending on the business requirement, aligning time travel with other releases, frequency of time travel testing, preparedness for handling renewal issues in production after time travel testing is done and most importantly planning for test automation testing during time travel testing. Time Travel Data Preparation: One of the most complex areas in time travel testing is test data coverage. Aligning test data to cover required customer segments and narrowing it down to multiple offer sequencing based on defined parameters are keys for successful time travel testing. 
Another aspect is the availability of sufficient data for similar combinations to support activities like defect retesting, regression testing, post-production testing (if required), etc. This section will talk about the necessary steps for suitable data coverage and sufficient data availability from a time travel testing perspective. Enterprise Automation: Time travel testing is never restricted to a single application. The workflow needs to be validated in the downstream applications to ensure consistency across the board. Along with that, the correctness of offers across different digital channels needs to be checked in order to ensure a smooth customer experience. This section will talk about the focus areas of enterprise automation and how automation testing can be leveraged to improve the overall quality without compromising on the project schedule. Along with the above-mentioned items, the white paper will elaborate on the best practices that need to be followed during time travel testing and some ideas pertaining to accelerator implementation. To sum up, this paper is based on the author's real-world experience with time travel testing. While actual customer names and program-related details will not be disclosed, the paper will highlight the key learnings which will help other teams to implement time travel testing successfully.
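The idea of moving a system's date forward or backward to exercise a renewal journey can be illustrated with a minimal Python sketch. It is not taken from the whitepaper: the `Policy` record, the `renewal_offer_due` rule, the 30-day notice window, and the `time_travel` helper are all hypothetical names and values chosen for illustration. The only design point it shows is that the current date is passed in as a parameter, so a test can shift it without touching the real system clock.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Policy:
    """Hypothetical customer record with an expiry date."""
    customer_id: str
    expires_on: date


def renewal_offer_due(policy: Policy, today: date, notice_days: int = 30) -> bool:
    """Return True when the policy is inside the (assumed) renewal notice window."""
    days_left = (policy.expires_on - today).days
    return 0 <= days_left <= notice_days


def time_travel(start: date, days: int) -> date:
    """Shift the simulated 'today' forward (positive) or backward (negative)."""
    return start + timedelta(days=days)


policy = Policy("C-001", expires_on=date(2024, 12, 31))
real_today = date(2024, 6, 1)

# In the present, the renewal window has not opened yet.
before = renewal_offer_due(policy, real_today)

# Travel 200 days forward so the simulated date falls inside the window.
future_today = time_travel(real_today, 200)
after = renewal_offer_due(policy, future_today)

print(before, future_today, after)
```

In a real programme the same shift would have to be applied consistently to every impacted downstream system, which is the planning concern the abstract describes.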

Keywords: time travel planning, time travel data preparation, enterprise automation, best practices, accelerator implementation ideas

Procedia PDF Downloads 125
24320 Previously Undescribed Cardiac Abnormalities in Two Unrelated Autistic Males with Causative Variants in CHD8

Authors: Mariia A. Parfenenko, Ilya S. Dantsev, Sergei V. Bochenkov, Natalia V. Vinogradova, Olga S. Groznova, Victoria Yu. Voinova

Abstract:

Introduction: Autism is the most common neurodevelopmental disorder. Autism is characterized by difficulties in social interaction and adherence to stereotypic behavioral patterns and frequently co-occurs with epilepsy, intellectual disabilities, connective tissue disorders, and other conditions. CHD8 codes for chromodomain-helicase-DNA-binding protein 8, a chromatin remodeler that regulates cellular proliferation and neurodevelopment in embryogenesis. CHD8 is one of the genes most frequently involved in autism. Patients and methods: Two unrelated male patients, P3 and P12, aged 3 and 12 years old, underwent whole genome sequencing, which determined that they had different likely pathogenic variants, both previously undescribed in the literature. Sanger sequencing later determined that P12 inherited the variant from his affected mother. Results: P3 and P12 presented with autism, a developmental delay, ataxia, sleep disorders, overgrowth, and macrocephaly, as well as other clinical features typically present in patients with causative variants in CHD8. The mother of P12 also has autistic traits, as well as ataxia, hypotonia, sleep disorders, and other symptoms. However, P3 and P12 also have different cardiac abnormalities. P3 had signs of a repolarization disorder: a flattened T wave in leads III and aVF and a negative T wave in leads V1-V2. He also had structural valve anomalies with associated regurgitation, local contractility impairment of the left ventricle, and diastolic dysfunction of the right ventricle. Meanwhile, P12 had Wolff-Parkinson-White syndrome and underwent radiofrequency ablation at the age of 2 years. At the time of observation, P12 had mild sinus arrhythmia and an incomplete right bundle branch block, as well as arterial hypertension. Discussion: Cardiac abnormalities were not previously reported in patients with causative variants in CHD8.
The underlying mechanism for the formation of those abnormalities is currently unknown. However, there are two hypotheses: either a disordered interaction with CHD7, another chromodomain remodeler known to be directly involved in the cardiophenotype of CHARGE syndrome (a rare condition characterized by coloboma, heart defects, and growth abnormalities), or the disrupted functioning of CHD8 as an A-kinase anchoring protein, a class of proteins known to modulate cardiac function. Conclusion: We observed two unrelated autistic males with likely pathogenic variants in CHD8 who presented with typical symptoms of CHD8-related neurodevelopmental disorder, as well as cardiac abnormalities. Cardiac abnormalities have, until now, been considered uncharacteristic for patients with causative variants in CHD8. Further accumulation of data, including experimental evidence of the involvement of CHD8 in heart formation, will elucidate the mechanism underlying the cardiophenotype of those patients. Acknowledgements: Molecular genetic testing of the patients was made possible by the Charity Fund for medical and social genetic aid projects «Life Genome».

Keywords: autism spectrum disorders, chromodomain-helicase-DNA-binding protein 8, neurodevelopmental disorder, cardio phenotype

Procedia PDF Downloads 55
24319 Control the Flow of Big Data

Authors: Shizra Waris, Saleem Akhtar

Abstract:

Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have increased within a short period of time. Consequently, this fast-increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and their current trends. This paper presents a complete discussion of state-of-the-art big data technologies based on group and stream data processing. Moreover, the strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendors, several open research challenges, and the opportunities brought about by big data. The similarities and differences of these techniques and technologies based on important limitations are also investigated. Emerging technologies are suggested as a solution for big data problems.

Keywords: computer, IT community, industry, big data

Procedia PDF Downloads 158
24318 Incorporating Spatial Transcriptome Data into Ligand-Receptor Analyses to Discover Regional Activation in Cells

Authors: Eric Bang

Abstract:

Interactions between receptors and ligands are crucial for many essential biological processes, including neurotransmission and metabolism. Ligand-receptor analyses that examine cell behavior and interactions often utilize cell type-specific RNA expression from single-cell RNA sequencing (scRNA-seq) data. Using CellPhoneDB, a public repository consisting of ligands, receptors, and ligand-receptor interactions, cell-cell interactions were explored in a specific scRNA-seq dataset from kidney tissue, and the results were portrayed with dot plots and heat maps. Depending on the type of cell, each ligand-receptor pair was aligned with the interacting cell type, and the posterior probabilities of these associations were calculated, with corresponding p-values reflecting average expression values between the triads and their significance. Using single-cell data (sample kidney cell references), genes in the dataset were cross-referenced with ones in the existing CellPhoneDB dataset. For example, a gene such as Pleiotrophin (PTN) present in the single-cell data also needed to be present in the CellPhoneDB dataset. Using the single-cell transcriptomics data via slide-seq and reference data, the CellPhoneDB program defines cell types and plots them in different formats, with the two main ones being dot plots and heat map plots. The dot plot displays derived measures of the cell-to-cell interaction scores and p-values. For the dot plot, each row shows a ligand-receptor pair, and each column shows the two interacting cell types. CellPhoneDB defines interactions and interaction levels from the gene expression level, and since the p-value is on a -log10 scale, the larger dots represent more significant interactions. By performing an interaction analysis, a significant interaction was discovered for myeloid and T-cell ligand-receptor pairs, including those between Secreted Phosphoprotein 1 (SPP1) and Fibronectin 1 (FN1), which is consistent with previous findings.
It was proposed that an effective protocol would involve a filtration step where cell types would be filtered out, depending on which ligand-receptor pair is activated in that part of the tissue, as well as the incorporation of the CellPhoneDB data in a streamlined workflow pipeline. The filtration step would be in the form of a Python script that expedites the manual process necessary for dataset filtration. Being in Python allows it to be integrated with the CellPhoneDB dataset for future workflow analysis. The manual process involves filtering cell types based on which ligand-receptor pair is activated in kidney cells. One limitation of this would be the fact that some pairings are activated in multiple cells at a time, so the manual manipulation of the data is reflected prior to analysis. Using the filtration script, accurate sorting is incorporated into the CellPhoneDB database rather than waiting until the output is produced and then subsequently applying spatial data. It was envisioned that this would reveal where in the cell various ligands and receptors are interacting with different cell types, allowing for easier identification of which cells are being impacted and why, for the purpose of disease treatment. The hope is that this new computational method utilizing spatially explicit ligand-receptor association data can be used to uncover previously unknown specific interactions within kidney tissue.
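The cross-referencing and dot-sizing steps described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the author's filtration script and not the CellPhoneDB API: the gene names `LIG_A`, `REC_A`, and so on are placeholders, the p-values are made up, and the `floor` parameter is an assumption added so that a p-value of zero does not produce an infinite dot size.

```python
import math


def filter_pairs(pairs, expressed_genes):
    """Keep ligand-receptor pairs whose two genes both appear in the dataset."""
    return [
        p for p in pairs
        if p["ligand"] in expressed_genes and p["receptor"] in expressed_genes
    ]


def dot_size(p_value, floor=1e-3):
    """Map a p-value to -log10(p); smaller p-values give larger dots.

    The floor is an assumption that caps the size when p is reported as 0.
    """
    return -math.log10(max(p_value, floor))


# Placeholder interaction table (hypothetical names and p-values).
pairs = [
    {"ligand": "LIG_A", "receptor": "REC_A", "p_value": 0.001},
    {"ligand": "LIG_B", "receptor": "REC_B", "p_value": 0.05},
    {"ligand": "LIG_C", "receptor": "REC_C", "p_value": 0.5},
]

# Genes detected in the (hypothetical) single-cell dataset; REC_B is absent.
expressed = {"LIG_A", "REC_A", "LIG_B", "LIG_C", "REC_C"}

kept = filter_pairs(pairs, expressed)
sizes = {(p["ligand"], p["receptor"]): dot_size(p["p_value"]) for p in kept}
for pair, size in sizes.items():
    print(pair, round(size, 3))
```

A regional filter of the kind the abstract proposes would add one more condition to `filter_pairs`, for example a check on which cell types are present in a given part of the tissue.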

Keywords: bioinformatics, Ligands, kidney tissue, receptors, spatial transcriptome

Procedia PDF Downloads 117
24317 Perceptions of College Students on Whether an Intelligent Tutoring System Is a Tutor

Authors: Michael Smalenberger

Abstract:

Intelligent tutoring systems (ITS) are computer-based platforms which can incorporate artificial intelligence to provide step-by-step guidance as students practice problem-solving skills. ITS can replicate the benefits of one-on-one tutoring, foster transactivity in collaborative environments, and lead to substantial learning gains when used to supplement the instruction of a teacher or when used as the sole method of instruction. Developments improving the ease of ITS creation have recently increased their proliferation, leading many K-12 schools and institutions of higher education in the United States to regularly use ITS within classrooms. We investigated how students perceive their experience using an ITS. In this study, 111 undergraduate students used an ITS in a college-level introductory statistics course and were subsequently asked for feedback on their experience. Results show that their perceptions were generally favorable of the ITS, and most would seek to use an ITS both for STEM and non-STEM courses in the future. Along with detailed transaction-level data, this feedback also provides insights on the design of user-friendly interfaces, guidance on accessibility for students with impairments, the sequencing of exercises, students’ expectation of achievement, and comparisons to other tutoring experiences. We discuss how these findings are important for the creation, implementation, and evaluation of ITS as a mode and method of teaching and learning.

Keywords: college statistics course, intelligent tutoring systems, in vivo study, student perceptions of tutoring

Procedia PDF Downloads 79
24316 Isolation, Identification and Screening of Pectinase Producing Fungi Isolated from Apple (Malus Domestica)

Authors: Shameel Pervez, Saad Aziz Durrani, Ibatsam Khokhar

Abstract:

Pectinase is an enzyme that breaks down pectin, a compound responsible for the structural integrity of the plant. Pectin is difficult to break down mechanically and the cost is very high, which is why many industries, including food industries, use pectinase produced by microbes for pectin breakdown. Apple (Malus domestica) is an important fruit in terms of market value. Every year, millions of apples are wasted due to post-harvest rot caused by fungi. Fungi are natural decomposers of our ecosystem and are infamous for post-harvest rot of apple fruit, but at the same time they are prized for their high production of valuable extracellular enzymes such as pectinase. In this study, fungi belonging to different genera were isolated from rotten apples. Rotten samples of apple were picked from different markets of Lahore. After surface sterilization, the rotten parts were cut into small pieces and placed onto MEA media plates for three days. Afterwards, distinct colonies were picked and purified by sub-culturing. The isolates were identified to genus level through the study of basic colony morphology and microscopic features. The isolates were then screened for pectinase activity on MS media to compare pectinase production and were subsequently tested through the wound suspension method to evaluate their pathogenic activity in comparison with their pectinolytic activity. A total of twelve fungal strains were isolated from rotten apples. They belonged to the genera Penicillium, Alternaria, Paecilomyces and Rhizopus. Upon screening for pectinolytic activity, isolates Pen 1, Pen 4, and Rz showed high pectinolytic activity and were further subjected to DNA isolation and partial sequencing for species identification. The results of partial sequencing were combined with an in-depth study of morphological features, revealing Pen 1 as Penicillium janthinellum, Pen 4 as Penicillium griseofulvum, and Rz as Rhizopus microsporus.
Pathogenic activity of all twelve isolates was evaluated. Penicillium spp. were highly pathogenic and destructive, and the same was the case with Paecilomyces sp. and Rhizopus sp. However, Alternaria spp. were found to be more consistent in their pathogenic activity on all types of apples.

Keywords: apple, pectinase, fungal pathogens, penicillium, rhizopus

Procedia PDF Downloads 28
24315 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of rapid data growth, many computer science tools have been developed to process and analyze Big Data. High-performance computing architectures have been designed to meet the processing needs of Big Data (from the standpoints of transaction processing and of strategic and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend in high-performance computing architectures, especially those related to analytics and data mining.

Keywords: high performance computing, HPC, big data, data analysis

Procedia PDF Downloads 484
24314 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore the re3data.org registry to identify the registration workflow for research data repositories. A further objective is to depict, in graph form, the present development of research data repositories in India. The study first seeks to understand the re3data.org registry framework and schema design, and then explores the status of Indian research data repositories in the re3data.org registry. Research data repositories are gaining wider relevance due to e-research concepts. The re3data.org registry now available is a good tool for users and researchers to identify research data repositories appropriate to their research requirements. In the Indian environment, a compatible National Research Data Policy is needed to boost the management of research data. A registry of research data repositories is a crucial tool for discovering specific information in a specific domain. Also, research data repositories in India have not previously been studied. The re3data.org registry and the status of Indian research data repositories are both discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 295
24313 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processing of big data on transportation ridership (e.g., smartcard data) and traffic operation (e.g., traffic detector data), which requires a lot of computational power, is incontrovertible in Intelligent Transportation Systems. Nowadays, cloud computing is an important subject and a popular information technology solution for data processing. It enables users to process enormous amounts of data without having their own computing power. Thus, it can also be a good choice for transportation big data processing. This paper examines how cloud computing can enhance transportation big data processing by contrasting its advantages and disadvantages and discussing cloud computing features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 422
24312 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data for data mining exploration takes up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected over three years from an actual harmonic monitoring system in a distribution system in Australia. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes that support a better understanding of the data are presented. Underlying classes in this data have then been identified using a clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 218
24311 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantages, and business strategies, and even in competition between countries. Analyzing patent data is crucial since patents cover a large part of all technological information in the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.
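As a rough illustration of how a linguistic summary such as "most patents are recent" can be scored, the sketch below computes a truth degree by averaging a fuzzy membership over the records and passing the result through a fuzzy quantifier. This is one common formulation of linguistic summarization; the abstract does not say which variant the authors used. The membership functions `mu_recent` and `mu_most`, their breakpoints, and the filing years are all assumptions made up for the example.

```python
def mu_recent(year, start=2000, end=2015):
    """Assumed membership of a filing year in the fuzzy set 'recent'."""
    if year <= start:
        return 0.0
    if year >= end:
        return 1.0
    return (year - start) / (end - start)


def mu_most(ratio, low=0.3, high=0.8):
    """Assumed piecewise-linear membership for the quantifier 'most'."""
    if ratio <= low:
        return 0.0
    if ratio >= high:
        return 1.0
    return (ratio - low) / (high - low)


def truth_degree(values, summarizer, quantifier):
    """Truth of 'Q records are S': quantifier applied to the mean membership."""
    ratio = sum(summarizer(v) for v in values) / len(values)
    return quantifier(ratio)


# Hypothetical filing years for five patents.
years = [1995, 2005, 2010, 2015, 2020]
truth = truth_degree(years, mu_recent, mu_most)
print(round(truth, 3))
```

A hypothesis from the literature can then be treated as supported when its truth degree is high, with the threshold being a further modelling choice.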

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 245
24310 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper, we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles (e.g., forests, hills, urban areas), the data collection is realized in several ways. The collection of data uses connections via wireless communication, LAN, and GSM networks, and in certain areas data are collected by using vehicles. In order to ensure the connection to the server, most of the probes have the ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select a suitable communication channel.
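One simple form such a channel-selection algorithm could take is a preference-ordered fallback, sketched below in Python. This is not the algorithm proposed in the paper: the channel names, their ordering, and the `is_available` callback are assumptions for illustration. When no channel is reachable, the function returns `None`, which a probe could treat as a signal to hold its data for vehicle collection.

```python
from typing import Callable, Optional, Sequence

# Assumed preference order; the paper does not specify one.
DEFAULT_PREFERENCE = ("lan", "wireless", "gsm")


def select_channel(
    is_available: Callable[[str], bool],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> Optional[str]:
    """Return the first available channel in preference order, else None."""
    for channel in preference:
        if is_available(channel):
            return channel
    return None  # no link: keep data locally until a vehicle collects it


# Hypothetical probe in a forested area where only GSM is reachable.
reachable = {"gsm"}
chosen = select_channel(lambda ch: ch in reachable)
print(chosen)
```

A fuller version might rank channels by measured signal quality or cost rather than a fixed order, but that would need details the abstract does not give.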

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 331