Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 24592

Search results for: genomic data

24502 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, and outlines the key steps of a typical cleaning process. It proposes a scalable and versatile data cleaning framework. For data whose attributes are related by dependency rules, it formally defines several violation-discovery algorithms that find inconsistent data in all target columns that depend on condition attributes, whether the data is structured (SQL) or unstructured (NoSQL). Based on these algorithms, it presents six data cleaning methods.
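The abstract does not give the formal definitions of its algorithms, so the following is only a minimal sketch of the general idea of violation discovery under a dependency rule. It assumes a simple functional dependency (condition attributes determine a target column) and represents records as dictionaries, which fits both SQL rows and NoSQL documents. The function name `find_violations` and the sample records are hypothetical.

```python
from collections import defaultdict


def find_violations(rows, condition_cols, target_col):
    """Group rows by their condition attributes and report groups whose
    target column holds more than one distinct value (assumed rule:
    condition_cols -> target_col)."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row.get(col) for col in condition_cols)
        groups[key].append(index)

    violations = {}
    for key, indices in groups.items():
        distinct_targets = {rows[i].get(target_col) for i in indices}
        if len(distinct_targets) > 1:
            violations[key] = indices
    return violations


# Hypothetical records: the rule "zip -> city" is violated for zip 10001.
records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]
print(find_violations(records, ["zip"], "city"))
```

Each reported group lists the row indices that disagree, which is the input a repair step would need. How to choose the correct value is a separate decision that this sketch does not attempt.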

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 544

24501 Comparative Study of Dose Calculation Accuracy in Bone Marrow Using Monte Carlo Method

Authors: Marzieh Jafarzadeh, Fatemeh Rezaee

Abstract:

Introduction: Ionizing radiation can affect genomic integrity and cell viability. It also increases the risk of cancer and malignancy. Therefore, X-ray behavior and absorption dose calculation are considered. One of the applicable tools for calculating and evaluating the absorption dose in human tissues is Monte Carlo simulation. Monte Carlo offers a straightforward way to simulate and integrate, which makes it easy to use. The Monte Carlo BEAMnrc code, one of the most common diagnostic X-ray simulation codes, was used in this study. Method: In one of the hospitals under study, a certain number of CT scan images of patients who had previously been imaged were extracted from the hospital database. BEAMnrc software was used for simulation. The simulation of the head of the device with the energy of 0.09 MeV with 500 million particles was performed, and the output data obtained from the simulation was applied for phantom construction using CT CREATE software. The percentage of depth dose (PDD) was calculated using STATE DOSE and was then compared with international standard values. Results and Discussion: The ratio of surface dose to depth dose (D/Ds) in the measured energy was estimated to be about 4% to 8% for bone and 3% to 7% for bone marrow. Conclusion: MC simulation is an efficient and accurate method for simulating bone marrow and calculating the absorbed dose.

Keywords: Monte Carlo, absorption dose, BEAMnrc, bone marrow

Procedia PDF Downloads 196

24500 Multi-omics Integrative Analysis with Genome-Scale Metabolic Model Simulation Reveals Reaction Essentiality data in Human Astrocytes Under the Lipotoxic Effect of Palmitic Acid

Authors: Janneth Gonzalez, Andres Pinzon Velasco, Maria Angarita, Nicolas Mendoza

Abstract:

Astrocytes play an important role in various processes in the brain, including pathological conditions such as neurodegenerative diseases. Recent studies have shown that the increase in saturated fatty acids such as palmitic acid (PA) triggers pro-inflammatory pathways in the brain. The use of synthetic neurosteroids such as tibolone has demonstrated neuro-protective mechanisms. However, there are few studies on the neuro-protective mechanisms of tibolone, especially at the systemic (omic) level. In this study, we integrated multi-omic data (transcriptome and proteome) into a human astrocyte genome-scale metabolic model to study the astrocytic response during palmitate treatment. We evaluated metabolic fluxes in three scenarios (healthy, induced inflammation by PA, and tibolone treatment under PA inflammation). We also used control theory to identify those reactions that control the astrocytic system. Our results suggest that PA generates a modulation of central and secondary metabolism, showing a change in energy source use through inhibition of folate cycle and fatty acid β-oxidation and upregulation of ketone bodies formation. We found 25 metabolic switches under PA-mediated cellular regulation, 9 of which were critical only in the inflammatory scenario but not in the protective tibolone one. Within these reactions, inhibitory, total, and directional coupling profiles were key findings, playing a fundamental role in the (de)regulation in metabolic pathways that increase neurotoxicity and represent potential treatment targets. Finally, this study framework facilitates the understanding of metabolic regulation strategies, and it can be used to explore in silico the mechanisms of astrocytic cell regulation, directing more complex future experimental work in neurodegenerative diseases.

Keywords: astrocytes, data integration, palmitic acid, computational model, multi-omics, control theory

Procedia PDF Downloads 106

24499 Genetic Determinants of Ovarian Response to Gonadotropin Stimulation in Women Undergoing Assisted Reproductive Treatment

Authors: D. Tohlob, E. Abo Hashem, N. Ghareeb, M. Ghanem, R. Elfarahaty, S. A. Roberts, P. Pemberton, L. Mohiyiddeen, W. G. Newman

Abstract:

Gonadotropin stimulation is used in females undergoing assisted reproductive treatment for ovulation induction, but ovarian response is variable and unpredictable in these women. More effective protocols and individualization of treatment are needed to increase the success rate of IVF/ICSI cycles. We genotyped seven variants reported in previous studies to be associated with ovarian response (number of ova retrieved and total gonadotropin dose) in women undergoing IVF treatment, including FSHR variants Asn 680 Ser (c.2039 A > G), Thr 307 Ala (c. 919 > A), -29 G > A, HRG c.610 C > T gene, BMP15 -9 C > G, AMH Ile 49 Ser (c.146 G > T), and AMHR -489 A > G, in 118 Egyptian females attending Mansoura Integrated Fertility Center in Egypt. These females were undergoing their first cycle of controlled ovarian hyperstimulation for IVF/ICSI treatment. The variants were analyzed by TaqMan allelic discrimination assay in Manchester Center of Genomic Medicine. We found no evidence of any significant difference (p value < 0.05) in the number of eggs retrieved or the gonadotropin dose used between individuals in all genotypes except for the HRG c.610 C > T gene polymorphism, where regression analysis gives a p value of 0.04 with fewer eggs retrieved in females with the TT genotype. These results indicate that these variants do not provide sufficient clinically relevant data to individualize the treatment protocols.

Keywords: controlled ovarian hyperstimulation, gene variants, ovarian response, assisted reproduction

Procedia PDF Downloads 305

24498 RAPD Analysis of Genetic Diversity of Castor Bean

Authors: M. Vivodík, Ž. Balážová, Z. Gálová

Abstract:

The aim of this work was to detect genetic variability among a set of 40 castor genotypes using 8 RAPD markers. Amplification of genomic DNA of the 40 genotypes, using RAPD analysis, yielded 66 fragments, with an average of 8.25 polymorphic fragments per primer. The number of amplified fragments ranged from 3 to 13, with the size of amplicons ranging from 100 to 1200 bp. The polymorphic information content (PIC) ranged from 0.556 to 0.895 with an average of 0.784, and the diversity index (DI) ranged from 0.621 to 0.896 with an average of 0.798. A dendrogram based on hierarchical cluster analysis using the UPGMA algorithm was prepared; the analyzed genotypes were grouped into two main clusters, and only two genotypes could not be distinguished. Knowledge of the genetic diversity of castor can be used in future breeding programs for increased oil production for industrial uses.
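The abstract reports PIC and DI values but does not state how they were computed. As a hedged illustration, the sketch below assumes two commonly used definitions: DI = 1 − Σ pᵢ² and the Botstein form PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ², where pᵢ is the frequency of the i-th banding pattern for a primer. The frequencies in the example are made up. The last line only checks the reported average of 66 fragments over 8 primers.

```python
from itertools import combinations


def diversity_index(freqs):
    """DI = 1 - sum(p_i^2), assuming freqs sum to 1."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (Botstein form)."""
    squares = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - squares - cross


# Hypothetical pattern frequencies for one primer (not from the paper).
example_freqs = [0.5, 0.3, 0.2]
print(round(diversity_index(example_freqs), 4))
print(round(polymorphic_information_content(example_freqs), 4))

# Average fragments per primer as reported: 66 fragments / 8 primers.
print(66 / 8)
```

Under these definitions PIC is never larger than DI for the same frequencies, which is consistent with the reported averages (0.784 versus 0.798).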

Keywords: dendrogram, polymorphism, RAPD technique, Ricinus communis L.

Procedia PDF Downloads 451

24497 Insights into Archaeological Human Sample Microbiome Using 16S rRNA Gene Sequencing

Authors: Alisa Kazarina, Guntis Gerhards, Elina Petersone-Gordina, Ilva Pole, Viktorija Igumnova, Janis Kimsis, Valentina Capligina, Renate Ranka

Abstract:

The human body is inhabited by a vast number of microorganisms, collectively known as the human microbiome, and there is tremendous interest in evolutionary changes in human microbial ecology, diversity and function. The field of paleomicrobiology, the study of the ancient human microbiome, is powered by modern Next Generation Sequencing (NGS) techniques, which allow microbial genomic data to be extracted directly from the archaeological sample of interest. One of the major techniques is 16S rRNA gene sequencing, in which certain 16S rRNA gene hypervariable regions are amplified and sequenced. However, this method has some limitations, including the taxonomic precision and efficacy of the different regions used. The aim of this study was to evaluate the phylogenetic sensitivity of different 16S rRNA gene hypervariable regions for microbiome studies in archaeological samples. Towards this aim, archaeological bone samples and corresponding soil samples from each burial environment were collected in Medieval cemeteries in Latvia. The Ion 16S™ Metagenomics Kit targeting different 16S rRNA gene hypervariable regions was used for library construction (Ion Torrent technologies). Sequenced data were analysed using appropriate bioinformatic techniques; alignment and taxonomic representation were done using the Mothur program. Sequences of the most abundant genus were further aligned to the E. coli 16S rRNA gene reference sequence using MEGA7 in order to identify the hypervariable region of the segment of interest. Our results showed that different hypervariable regions had different discriminatory power depending on the groups of microbes, as well as the nature of the samples. On the basis of our results, we suggest that using a wider range of primers can provide a more accurate recapitulation of microbial communities in archaeological samples. Acknowledgements. This work was supported by the ERAF grant Nr. 1.1.1.1/16/A/101.

Keywords: 16S rRNA gene, ancient human microbiome, archaeology, bioinformatics, genomics, microbiome, molecular biology, next-generation sequencing

Procedia PDF Downloads 174

24496 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunication industry, mining big data is like mining for gold; it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from it.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 381

24495 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications: A Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating the comparison and integration of JSON data with XML data and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.
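The abstract does not describe a specific comparison technique, so the following is a minimal sketch of one possible check, using only Python's standard `json` and `xml.etree.ElementTree` modules. It assumes a simple mapping in which each XML element corresponds to a JSON key, leaf values are compared as strings, XML attributes are ignored, and repeated sibling tags (lists) are not supported. The helper names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_dict(element):
    """Convert an element to nested dicts; leaves become stripped text.
    Assumes sibling tags are unique and attributes can be ignored."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return {child.tag: xml_to_dict(child) for child in children}


def stringify_leaves(value):
    """XML text is untyped, so JSON leaves are compared as strings."""
    if isinstance(value, dict):
        return {key: stringify_leaves(item) for key, item in value.items()}
    return str(value)


def json_matches_xml(json_text, xml_text):
    """True when both documents carry the same keys and leaf values."""
    json_data = stringify_leaves(json.loads(json_text))
    root = ET.fromstring(xml_text)
    return json_data == {root.tag: xml_to_dict(root)}


json_doc = '{"user": {"id": 7, "name": "Ada"}}'
xml_doc = "<user><id>7</id><name>Ada</name></user>"
print(json_matches_xml(json_doc, xml_doc))
```

Comparing leaves as strings is a deliberate simplification: it treats the JSON number 7 and the XML text "7" as equal, but it would not equate JSON `true` with XML `true`, so a real test suite would need an explicit type-mapping rule.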

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 119

24494 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflect different aspects of people, events and activities. Data generated by various systems differ in form, and their sources differ because they may come from different sectors. To reflect one or several facets of an event or rule, data from multiple sources need to be fused. Data collected from different sources in different ways raise several issues that need to be resolved. Data fusion problems include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, and scenario prediction to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of multi-source data fusion is a uniform central database, which includes people data, location data, object data, institution data, business data and space data. Metadata must be available for reference whenever an application needs to access, manipulate and display the data. Uniform metadata management ensures the effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 366

24493 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Given the growing volume of data that people generate, methods such as data mining for extracting knowledge have become unavoidable. One issue in data mining is that data is inherently distributed: the databases that create or receive such data usually belong to corporate or non-corporate persons who do not share their information freely. Yet there is no guarantee that someone can mine particular data without intruding on the owner’s privacy. How data is sent and then gathered, under vertical or horizontal partitioning, depends on the privacy-preserving method applied to improve data privacy. This study attempts a comprehensive comparison of privacy-preserving methods; general methods such as data randomization and encoding are examined, along with the strengths and weaknesses of each.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 502

24492 A Local Tensor Clustering Algorithm to Annotate Uncharacterized Genes with Many Biological Networks

Authors: Paul Shize Li, Frank Alber

Abstract:

A fundamental task of clinical genomics is to unravel the functions of genes and their associations with disorders. Although experimental biology has made efforts to discover and elucidate the molecular mechanisms of individual genes in the past decades, still about 40% of human genes have unknown functions, not to mention the diseases they may be related to. For those biologists who are interested in a particular gene with unknown functions, a powerful computational method tailored for inferring the functions and disease relevance of uncharacterized genes is strongly needed. Studies have shown that genes strongly linked to each other in multiple biological networks are more likely to have similar functions. This indicates that the densely connected subgraphs in multiple biological networks are useful in the functional and phenotypic annotation of uncharacterized genes. Therefore, in this work, we have developed an integrative network approach to identify the frequent local clusters, which are defined as those densely connected subgraphs that frequently occur in multiple biological networks and consist of the query gene that has few or no disease or function annotations. This is a local clustering algorithm that models multiple biological networks sharing the same gene set as a three-dimensional matrix, the so-called tensor, and employs the tensor-based optimization method to efficiently find the frequent local clusters. Specifically, massive public gene expression data sets that comprehensively cover dynamic, physiological, and environmental conditions are used to generate hundreds of gene co-expression networks. By integrating these gene co-expression networks, for a given uncharacterized gene that is of biologist’s interest, the proposed method can be applied to identify the frequent local clusters that consist of this uncharacterized gene. Finally, those frequent local clusters are used for function and disease annotation of this uncharacterized gene. 
This local tensor clustering algorithm outperformed the competing tensor-based algorithm in both module discovery and running time. We also demonstrated the use of the proposed method on real data of hundreds of gene co-expression data and showed that it can comprehensively characterize the query gene. Therefore, this study provides a new tool for annotating the uncharacterized genes and has great potential to assist clinical genomic diagnostics.
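To make the notion of a "frequent local cluster" concrete, the sketch below stacks several networks over the same gene set into a three-dimensional structure and counts in how many networks a candidate gene set is densely connected. This is only a brute-force scoring illustration under assumed definitions (edge density and a fixed density threshold). It is not the tensor-based optimization method of the paper, and all names and toy networks are hypothetical.

```python
from itertools import combinations


def make_network(num_genes, edges):
    """Build a symmetric 0/1 adjacency matrix from an edge list."""
    adjacency = [[0] * num_genes for _ in range(num_genes)]
    for i, j in edges:
        adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


def subgraph_density(adjacency, genes):
    """Fraction of possible gene pairs in `genes` that are connected."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    return sum(adjacency[i][j] for i, j in pairs) / len(pairs)


def cluster_frequency(tensor, genes, min_density=0.8):
    """Count networks (tensor slices) in which `genes` is dense."""
    return sum(
        1 for adjacency in tensor
        if subgraph_density(adjacency, genes) >= min_density
    )


# Toy tensor: three networks over four genes; gene 0 is the query gene.
tensor = [
    make_network(4, [(0, 1), (0, 2), (1, 2)]),
    make_network(4, [(0, 1), (0, 2), (1, 2), (2, 3)]),
    make_network(4, [(0, 1)]),
]
print(cluster_frequency(tensor, [0, 1, 2]))
```

In this toy example the candidate cluster is fully connected in the first two networks and sparse in the third. A real method must also search over candidate gene sets, which is the part the abstract's optimization approach addresses.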

Keywords: local tensor clustering, query gene, gene co-expression network, gene annotation

Procedia PDF Downloads 124

24491 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018, creating a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating the transfer of personal data between IT environments (e.g. applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 454

24490 GeneNet: Temporal Graph Data Visualization for Gene Nomenclature and Relationships

Authors: Jake Gonzalez, Tommy Dang

Abstract:

This paper proposes a temporal graph approach to visualize and analyze the evolution of gene relationships and nomenclature over time. An interactive web-based tool implements this temporal graph, enabling researchers to traverse a timeline and observe coupled dynamics in network topology and naming conventions. Analysis of a real human genomic dataset reveals the emergence of densely interconnected functional modules over time, representing groups of genes involved in key biological processes. For example, the antimicrobial peptide DEFA1A3 shows increased connections to related alpha-defensins involved in infection response. Tracking shifts in degree and betweenness centrality over timeline iterations also quantitatively highlights how the topological importance of certain genes is reprioritized as knowledge advances. Examination of the CNR1 gene encoding the cannabinoid receptor CB1 demonstrates changing synonymous relationships and consolidating naming patterns over time, reflecting the discovery of its unique functional role. The integrated framework interconnecting these topological and nomenclature dynamics provides richer contextual insights compared to isolated analysis methods. Overall, this temporal graph approach enables a more holistic study of knowledge evolution to elucidate complex biology.
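As an illustration of tracking a gene's degree centrality across timeline snapshots, the sketch below computes normalized degree centrality (degree divided by n − 1) for each snapshot of a toy temporal graph. It assumes a fixed node set and undirected edges with no duplicates. The gene labels, years and edges are placeholders and do not come from the paper's dataset.

```python
def degree_centrality(nodes, edges):
    """Normalized degree centrality over a fixed node set.
    Assumes undirected edges with no duplicates."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    scale = max(len(nodes) - 1, 1)
    return {node: count / scale for node, count in degree.items()}


# Placeholder snapshots: edges known at each (hypothetical) time point.
nodes = ["G1", "G2", "G3", "G4"]
snapshots = {
    2000: [("G1", "G2")],
    2010: [("G1", "G2"), ("G1", "G3")],
    2020: [("G1", "G2"), ("G1", "G3"), ("G1", "G4")],
}

# Trace how the centrality of one gene changes along the timeline.
trace = {
    year: degree_centrality(nodes, edges)["G1"]
    for year, edges in snapshots.items()
}
print(trace)
```

Keeping the node set fixed across snapshots makes the values comparable over time. Normalizing by the nodes present in each snapshot would instead mix growth of the network with changes in a gene's connectivity.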

Keywords: temporal graph, gene relationships, nomenclature evolution, interactive visualization, biological insights

Procedia PDF Downloads 44

24489 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 382

24488 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does data. Utilizing massive data sets enables companies to gain competitive advantages over their rivals. Among the many areas of Big Data usage, logistics has a significant role in both the commercial sector and the military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 625

24487 The Genetic Architecture Underlying Dilated Cardiomyopathy in Singaporeans

Authors: Feng Ji Mervin Goh, Edmund Chee Jian Pua, Stuart Alexander Cook

Abstract:

Dilated cardiomyopathy (DCM) is a common cause of heart failure. Genetic mutations account for 50% of DCM cases with TTN mutations being the most common, accounting for up to 25% of DCM cases. However, the genetic architecture underlying Asian DCM patients is unknown. We evaluated 68 patients (female = 17) with DCM who underwent follow-up at the National Heart Centre, Singapore from 2013 through 2014. Clinical data were obtained and analyzed retrospectively. Genomic DNA was subjected to next-generation targeted sequencing. Nextera Rapid Capture Enrichment was used to capture the exons of a panel of 169 cardiac genes. DNA libraries were sequenced as paired-end 150-bp reads on Illumina MiSeq. Raw sequence reads were processed and analysed using standard bioinformatics techniques. The average age of onset of DCM was 46.1±10.21 years old. The average left ventricular ejection fraction (LVEF), left ventricular diastolic internal diameter (LVIDd), and left ventricular systolic internal diameter (LVIDs) were 26.1±11.2%, 6.20±0.83 cm, and 5.23±0.92 cm respectively. The frequencies of mutations in major DCM-associated genes were as follows: TTN (5.88% vs published frequency of 20%), LMNA (4.41% vs 6%), MYH7 (5.88% vs 4%), MYH6 (5.88% vs 4%), and SCN5A (4.41% vs 3%). The average callability at 10 times coverage of each major gene was: TTN (99.7%), LMNA (87.1%), MYH7 (94.8%), MYH6 (95.5%), and SCN5A (94.3%). In conclusion, TTN mutations are not common in Singaporean DCM patients. The frequencies of other major DCM-associated genes are comparable to frequencies published in the current literature.

Keywords: heart failure, dilated cardiomyopathy, genetics, next-generation sequencing

Procedia PDF Downloads 230

24486 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a wider variety of information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it possible to easily test various sensors, and sensor data can be collected directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, collecting data with such boards is difficult, particularly for users who are not computer programmers or who are using the boards for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 355

24485 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provide insights to users, and support good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services such as data storage and hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 142

24484 Chemopreventive Efficacy of CdCl2(C14H21N3O2) in Rat Colon Carcinogenesis Model Using Aberrant Crypt Foci (ACF) as Endpoint Marker

Authors: Maryam Hajrezaie, Mahmood Ameen Abdulla, Nazia AbdulMajid, Maryam Zahedifard

Abstract:

Colon cancer is one of the most prevalent cancers in the world. Cancer chemoprevention is defined as the use of natural or synthetic compounds capable of inducing biological mechanisms necessary to preserve genomic fidelity. New Schiff base compounds are reported to exhibit a wide spectrum of biological activities of therapeutic importance. To evaluate the inhibitory properties of the CdCl2(C14H21N3O2) complex on colonic aberrant crypt foci, five groups of 7-week-old male rats were used. The control group was fed with 10% Tween 20 once a day, the cancer control group was intra-peritoneally injected with 15 mg/kg azoxymethane, the drug control group was injected with 15 mg/kg azoxymethane and 5-fluorouracil, and the experimental groups were fed with 2.5 and 5 mg/kg CdCl2(C14H21N3O2) compound each once a day. Administration of the compound was found to be effectively chemoprotective. Andrographolide suppressed total colonic ACF formation up to 72% to 74%, respectively, when compared with control group. The results also showed a significant increase in glutathione peroxidase, superoxide dismutase, and catalase activities and a decrease in malondialdehyde level. Immunohistochemical staining demonstrated down-regulation of PCNA protein. According to the Western blot comparison analysis, COX-2 and Bcl2 are up-regulated whilst Bax is down-regulated. According to these data, this compound shows promising chemoprotective activity in a model of AOM-induced ACF.

Keywords: chemopreventive, Schiff based compound, aberrant crypt foci (ACF), immunohistochemical staining

Procedia PDF Downloads 385

24483 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that is high in volume, velocity, veracity and comes from a variety of sources is usually generated in all sectors including the government sector. Globally, public administrations are pursuing (big) data as a new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can lead to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to the successful implementation of the government (big) data ecosystem. The essential aspects of the government (big) data ecosystem include definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystems literature. As this is a new topic, we did not find specific articles on the government (big) data ecosystem and therefore focused our research on various relevant areas like humanitarian data, open government data, scientific research data, industry data, etc.

Keywords: applications of big data, big data, big data types, big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 207

24482 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker

Abstract:

Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited since, currently, only about 1% of generated data are ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which are a subset of data analytics. The developed framework is designed to assist data analysts with little experience in choosing the appropriate machine learning algorithm given the purpose of their application.

Keywords: Data analytics, Industrial engineering, Machine learning, Value creation

Procedia PDF Downloads 151

24481 RNA-Seq Based Transcriptomic Analysis of Wheat Cultivars for Unveiling of Genomic Variations and Isolation of Drought Tolerant Genes for Genome Editing

Authors: Ghulam Muhammad Ali

Abstract:

The identification of genes involved in drought tolerance and root architecture through transcriptomic analyses has remained fragmented, which limits further improvement of wheat through genome editing. The purpose of this research endeavor was to unveil the variations in different genes implicated in drought tolerance and root architecture in wheat through RNA-seq data analysis. In this study, 8-day-old seedlings of 6 wheat cultivars, namely Batis, Blue Silver, Local White, UZ888, Chakwal 50 and Synthetic wheat S22, were subjected to transcriptomic analysis for root and shoot genes. A total of 12 RNA samples were sequenced by Illumina. Using updated wheat transcripts from Ensembl and IWGC references with 54,175 gene models, we found that 49,621 out of 54,175 (91.5%) genes are expressed at an RPKM of 0.1 or more (in at least 1 sample). The number of genes expressed was higher in Local White than Batis. The number of differentially expressed genes (DEG) was higher in Chakwal 50. Expression-based clustering indicated conserved function of DRO1 and RPK1 between Arabidopsis and wheat. The dendrogram showed that Local White is sister to Chakwal 50 while Batis is closely related to Blue Silver. This study highlights transcriptomic sequence variations in different cultivars that showed mutations in genes associated with drought that may directly contribute to drought tolerance. DRO1 and RPK1 genes were isolated for genome editing. These genes are being edited in wheat through CRISPR-Cas9 for yield enhancement.
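The expression criterion in the abstract (RPKM of 0.1 or more in at least one sample) can be sketched as follows. The sketch assumes the standard definition RPKM = reads × 10⁹ / (gene length in bp × total mapped reads). The function names and the read counts in the example are hypothetical and are not values from the study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_expressed(rpkm_per_sample, threshold=0.1):
    """A gene counts as expressed if any sample reaches the threshold."""
    return any(value >= threshold for value in rpkm_per_sample)


# Hypothetical gene: 2,000 bp long, measured in two samples that each
# have 10 million mapped reads.
values = [
    rpkm(50, 2000, 10_000_000),
    rpkm(0, 2000, 10_000_000),
]
print(values)
print(is_expressed(values))
```

Because the rule is "at least one sample", a gene that is silent in most samples still counts as expressed. A stricter analysis might require the threshold to be met in a minimum number of samples.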

Keywords: transcriptomic, wheat, genome editing, drought, CRISPR-Cas9, yield enhancement

Procedia PDF Downloads 126

24480 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm

Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima

Abstract:

In the present world, we generate a lot of data and need a dedicated device to store it. Generally, we store data on pen drives, hard drives, etc. Sometimes we may lose the data because the devices become corrupted. To overcome these issues, we implemented a cloud space for storing the data, which provides more security for the data. The data can be accessed over the internet from anywhere in the world. We implemented all of this in Java using the NetBeans IDE. Once a user uploads the data, he does not have any rights to change it. Users' uploaded files are stored in the cloud with the system time as the file name, and a directory is created with some random words. The cloud accepts the data only if the size of the file is less than 2 MB.
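The storage rules described in the abstract (reject files of 2 MB or more, name the file after the system time, place it in a directory built from random words) can be sketched as follows. This is an illustration in Python rather than the authors' Java implementation; the word list, the function name `plan_upload`, the millisecond timestamp, and the reading of "2MB" as 2 MiB are all assumptions, and the AES encryption step is not shown.

```python
import random
import time

MAX_BYTES = 2 * 1024 * 1024  # assumption: "2MB" interpreted as 2 MiB

# Hypothetical word list used to build the random directory name.
WORDS = ["amber", "birch", "cedar", "delta", "ember", "flint"]


def plan_upload(payload, now_ms=None, rng=None):
    """Return a storage path for `payload`, or None if the file is rejected."""
    # Only files strictly smaller than the limit are accepted.
    if len(payload) >= MAX_BYTES:
        return None
    rng = rng or random.SystemRandom()
    # The file name is the system time (milliseconds here, by assumption).
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    # The directory name is made of a few random words.
    directory = "-".join(rng.choice(WORDS) for _ in range(3))
    return f"{directory}/{now_ms}"


small_path = plan_upload(b"hello", now_ms=1_700_000_000_000, rng=random.Random(0))
too_big = plan_upload(b"x" * MAX_BYTES)
```

Passing `now_ms` and `rng` explicitly keeps the sketch reproducible in tests; a deployed service would rely on the defaults.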

Keywords: cloud space, AES, FTP, NetBeans IDE

Procedia PDF Downloads 190

24479 Cyclocoelids (Trematoda: Echinostomata) from Gadwall Mareca strepera in the South of the Russian Far East

Authors: Konstantin S. Vainutis, Mark E. Andreev, Anastasia N. Voronova, Mikhail Yu. Shchelkanov

Abstract:

Introduction: The trematodes of the family Cyclocoelidae (cyclocoelids) belong to the superfamily Echinostomatoidea and infect the air sacs and trachea of wild birds. At present, the family Cyclocoelidae comprises nine valid genera in three subfamilies: Cyclocoelinae (type taxon), Haematotrephinae, and Typhlocoelinae. To the best of our knowledge, this study is the first to use molecular genetic methods to study cyclocoelids from the Russian Far East. Here we provide data on the morphology and phylogeny of cyclocoelids from gadwall from the Russian Far East. The morphological and genetic data obtained for cyclocoelids indicated the necessity to revise the previously proposed classification within the family Cyclocoelidae. Objectives: The first objective was to perform a morphological study of cyclocoelids found in M. strepera from the Russian Far East. The second objective was to reconstruct the phylogenetic relationships of the studied trematodes with other cyclocoelids using the 28S gene. Material and methods: During field studies in the Khasansky district of the Primorsky region, 21 cyclocoelids were recovered from the air sacs of a single gadwall Mareca strepera. Seven samples of cyclocoelids were overstained in alum carmine, dehydrated in a graded ethanol series, cleared in clove oil, and mounted in Canada balsam. Genomic DNA was extracted from four cyclocoelids using the alkaline lysis method HotShot. The 28S rDNA fragment was amplified using the forward primer Digl2 and the reverse primer 1500R. Results: According to morphological features (ovary intratesticular, forming a triangle with the testes), the studied worms belong to the subfamily Cyclocoelinae Stossich, 1902. In particular, the highest morphological similarity was observed with trematodes of the genus Cyclocoelum Brandes, 1892 (genital pores pharyngeal). However, the genetic analysis showed significant discrepancies between the studied trematodes and the genus Cyclocoelum. On the phylogenetic tree, these trematodes took the sister position in relation to the genus Morishitium (previously considered in the subfamily Szidatitrematinae). Conclusion: Based on the results of the morphological and genetic studies, we suggest that the cyclocoelids isolated from Mareca strepera be described in a previously unknown genus and differentiated from the type genus Cyclocoelum of the type subfamily Cyclocoelinae. Considering the available molecular data, including the cyclocoelids described here, the family Cyclocoelidae comprises ten valid genera in the three subfamilies mentioned above.

Keywords: new species, trematoda, phylogeny, cyclocoelidae

Procedia PDF Downloads 827

24478 A Galectin from Rock Bream Oplegnathus fasciatus: Molecular Characterization and Immunological Properties

Authors: W. S. Thulasitha, N. Umasuthan, G. I. Godahewa, Jehee Lee

Abstract:

In fish, innate immune defense is the first immune response against microbial pathogens and consists of several antimicrobial components. Galectins are carbohydrate-binding lectins that can identify pathogens by recognizing pathogen-associated molecular patterns. Galectins play a vital role in the regulation of innate and adaptive immune responses. Rock bream Oplegnathus fasciatus is one of the most important cultured species in Korea and Japan. Considering the losses due to microbial pathogens, the present study was carried out to understand the molecular and functional characteristics of a galectin in normal and pathogenic conditions, which could help to establish an understanding of the immunological components of rock bream. The complete cDNA of rock bream galectin like protein B (rbGal like B) was identified from the cDNA library, and in silico analysis was carried out using bioinformatic tools. The genomic structure was derived from the BAC library by sequencing a specific clone and using Spidey. The full-length rbGal like B cDNA (contig14775), containing 517 nucleotides, was identified from the cDNA library; it comprised a 435 bp open reading frame encoding a deduced protein of 145 amino acids. The molecular mass of the putative protein was predicted as 16.14 kDa, with an isoelectric point of 8.55. A characteristic conserved galactose binding domain was located from amino acids 12 to 145. The genomic structure of rbGal like B consisted of 4 exons and 3 introns. Moreover, pairwise alignment showed that rbGal like B shares the highest similarity (95.9%) and identity (91%) with Takifugu rubripes galectin related protein B like and the lowest similarity (55.5%) and identity (32.4%) with Homo sapiens. Multiple sequence alignment demonstrated that the galectin related protein B was conserved among vertebrates. A phylogenetic analysis revealed that the rbGal like B protein clustered together with other fish homologs in the fish clade and showed a closer evolutionary link with Takifugu rubripes. Tissue distribution and expression patterns of rbGal like B upon immune challenges were analyzed using qRT-PCR assays. Among all tested tissues, the level of rbGal like B expression was significantly high in gill tissue, followed by kidney, intestine, heart and spleen. Upon immune challenges, it showed an up-regulated pattern of expression with Edwardsiella tarda, rock bream iridovirus and poly I:C up to 6 h post injection, and up to 24 h with LPS. However, in the presence of Streptococcus iniae, rbGal like B showed an up-and-down pattern of expression with the peak at 6 - 12 h. Results from the present study revealed the phylogenetic position and role of rbGal like B in response to microbial infection in rock bream.

Keywords: galectin like protein B, immune response, Oplegnathus fasciatus, molecular characterization

Procedia PDF Downloads 336

24477 Business Intelligence for Profiling of Telecommunication Customer

Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro

Abstract:

Business intelligence is a methodology that systematically exploits data to produce information and knowledge, and it can support the decision-making process. Two methods in business intelligence are the data warehouse and data mining. A data warehouse can store historical data derived from transactional data. For data modelling in the data warehouse, we apply dimensional modelling by Kimball. Data mining is used to extract patterns from the data and gain insight from it. Data mining has many techniques, one of which is segmentation. For profiling of telecommunication customers, we use customer segmentation according to the customer's usage of services, invoices and payments. Customers can be grouped according to their characteristics, and the profitable customers can be identified. We apply the K-Means clustering algorithm for segmentation. As input variables for that algorithm, we use the RFM (Recency, Frequency and Monetary) model. For all data mining processes, we use the IBM SPSS Modeler tool.
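The segmentation step can be sketched in a few lines: K-Means applied to min-max-scaled RFM vectors. This is a plain-Python illustration of the general technique, not the authors' IBM SPSS workflow; the customer values, the choice of k = 2, and the seeding of centroids with the first k points are assumptions made to keep the example small and deterministic.

```python
def min_max_scale(rows):
    """Rescale every column to [0, 1] so no RFM dimension dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical customers: (recency in days, frequency, monetary value).
customers = [
    (5, 40, 900.0),
    (200, 2, 30.0),
    (7, 35, 850.0),
    (180, 3, 45.0),
    (10, 38, 950.0),
    (220, 1, 20.0),
]

labels, centroids = kmeans(min_max_scale(customers), k=2)
```

In this toy data, one cluster gathers recent, frequent, high-spending customers and the other gathers the rest, which is the kind of grouping used to identify profitable customers.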

Keywords: business intelligence, customer segmentation, data warehouse, data mining

Procedia PDF Downloads 457

24476 Atomic Force Microscopy Studies of DNA Binding Properties of the Archaeal Mini Chromosome Maintenance Complex

Authors: Amna Abdalla Mohammed Khalid, Pietro Parisse, Silvia Onesti, Loredana Casalis

Abstract:

Basic cellular processes such as DNA replication are crucial to cell life. Understanding at the molecular level the mechanisms that govern DNA replication in proliferating cells is fundamental to understanding diseases connected to genomic instabilities, such as genetic diseases and cancer. A key step for DNA replication to take place is the unwinding of the DNA double helix, which is carried out by proteins called helicases. The archaeal MCM (minichromosome maintenance) complex from Methanothermobacter thermautotrophicus has been studied using Atomic Force Microscopy (AFM), imaging in air and in liquid (physiological environment). The accurate analysis of AFM topographic images allowed us to understand the static conformations as well as the interaction dynamics of MCM and the DNA double helix in the presence of ATP.

Keywords: DNA, protein-DNA interaction, MCM (mini chromosome maintenance) complex, atomic force microscopy (AFM)

Procedia PDF Downloads 292

24475 Production of Recombinant VP2 Protein of Canine Parvovirus 2a Using Baculovirus Expression System

Authors: Soo Dong Cho, In-Ohk Ouh, Byeong Sul Kang, Seyeon Park, In-Soo Cho, Jae Young Song

Abstract:

A VP2 gene from the currently prevalent CPV (Canine Parvovirus) strain (new CPV-2a) in the Republic of Korea was expressed in a baculovirus expression system. Genomic DNA was extracted from the isolated strain CPV-2a. The recombinant baculovirus, containing the coding sequence of VP2 with a histidine tag at the N-terminus, was generated by using the Bac-to-Bac system. For production of the recombinant VP2 protein, Sf9 cells were transfected in 6 wells. Propagation of the recombinant baculoviruses and expression of the VP2 protein were performed in the maintained Sf9 cell line. The proteins were detected by Western blot analysis. CPV-2a VP2 was detected by Western blotting with monoclonal antibodies recognizing the 6x His tag, and the band had a molecular weight of 65 kDa. We demonstrated the expression of recombinant CPV-2a VP2 in the baculovirus system. The recombinant CPV-2a VP2 may enable the development of specific diagnostic tests and vaccination against CPV2. This study provides a foundation for the development of a new CPV2 subunit vaccine.

Keywords: baculovirus, canine parvovirus 2a, Dog, Korea

Procedia PDF Downloads 225

24474 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge for bioinformaticians because of the complexity of applying statistical and machine learning techniques. The challenge is doubled if the microarray data sets contain missing data, which happens regularly, because these techniques cannot deal with missing data. One of the most important data analysis processes on a microarray data set is feature selection. This process finds the most important genes that affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
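The abstract does not describe the proposed technique, so the following is only a generic stand-in showing why imputation and feature selection interact: missing values are filled with the per-gene mean, and genes are then ranked by variance. Both choices (mean imputation, variance ranking), the gene names, and the expression values are assumptions for illustration and are not the authors' method.

```python
def impute_row_means(matrix):
    """Replace None entries with the mean of the observed values of the same gene."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def top_genes_by_variance(matrix, names, n):
    """Rank genes (rows) by population variance and return the top n names."""
    def variance(row):
        m = sum(row) / len(row)
        return sum((v - m) ** 2 for v in row) / len(row)

    ranked = sorted(
        zip(names, matrix), key=lambda item: variance(item[1]), reverse=True
    )
    return [name for name, _ in ranked[:n]]


# Hypothetical expression matrix: rows are genes, columns are samples.
names = ["g1", "g2", "g3"]
expression = [
    [1.0, None, 1.2, 0.9],
    [5.0, 0.5, None, 4.8],
    [2.0, 2.1, 2.0, None],
]

filled = impute_row_means(expression)
selected = top_genes_by_variance(filled, names, n=1)
```

Because mean imputation pulls filled values toward the centre of each row, it lowers that gene's variance slightly, which is one reason the imputation choice can change which features are selected.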

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 551

24473 PDDA: Priority-Based, Dynamic Data Aggregation Approach for Sensor-Based Big Data Framework

Authors: Lutful Karim, Mohammed S. Al-kahtani

Abstract:

Sensors are being used in various applications such as agriculture, health monitoring, air and water pollution monitoring, and traffic monitoring and control, and hence play a vital role in the growth of big data. However, sensors collect redundant data. Thus, aggregating and filtering sensor data are important for designing an efficient big data framework. Current research does not focus on aggregating and filtering data at multiple layers of a sensor-based big data framework. Thus, this paper introduces (i) a three-layer data aggregation framework for big data and (ii) a priority-based, dynamic data aggregation scheme (PDDA) for the lowest layer at the sensors. Simulation results show that PDDA outperforms existing tree-based and cluster-based data aggregation schemes in terms of overall network energy consumption and end-to-end data transmission delay.
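The abstract does not specify how PDDA assigns priorities, so the sketch below is not the PDDA scheme itself. It only illustrates the general idea of priority-aware aggregation at the sensor layer under an assumed rule: readings at or above a hypothetical alert threshold are forwarded individually, while routine readings are collapsed into a single average. Sensor IDs, values, and the threshold are invented for the example.

```python
def aggregate(readings, alert_threshold):
    """Split readings into urgent ones and one averaged routine summary.

    readings: list of (sensor_id, value) pairs collected in one round.
    Returns (urgent, summary) where summary is None if nothing was routine.
    """
    # Assumed priority rule: values at or above the threshold are urgent.
    urgent = [(sid, v) for sid, v in readings if v >= alert_threshold]
    # Routine (possibly redundant) readings are reduced to their mean.
    routine = [v for _, v in readings if v < alert_threshold]
    summary = sum(routine) / len(routine) if routine else None
    return urgent, summary


# Hypothetical temperature readings from four sensors.
readings = [("s1", 21.0), ("s2", 21.0), ("s3", 22.0), ("s4", 58.5)]

urgent, summary = aggregate(readings, alert_threshold=50.0)
packets_sent = len(urgent) + (1 if summary is not None else 0)
```

Here four raw readings become two transmitted packets, which is the kind of reduction that lowers energy consumption and transmission delay in a sensor network.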

Keywords: big data, clustering, tree topology, data aggregation, sensor networks

Procedia PDF Downloads 321