Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 25280

Search results for: whole exome sequencing data

24980 A Review on Existing Challenges of Data Mining and Future Research Perspectives

Authors: Hema Bhardwaj, D. Srinivasa Rao

Abstract:

Big data refers to enormous and complicated datasets, and big data mining to the technology for analysing, processing, and extracting meaningful information from them. Big data mining and analysis are extremely helpful for business activities such as making decisions, building organisational plans, researching the market efficiently, and improving sales, because typical management tools cannot handle such complicated datasets. Big data brings special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations. These unique problems call for new computational and statistical paradigms. This paper offers an overview of the literature on big data mining and its process, along with its problems and difficulties, with a focus on the unique characteristics of big data. Organizations face several difficulties when undertaking data mining, which has an impact on their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is actually analyzed. The study presents data mining, data analysis, and knowledge discovery techniques that have recently been developed alongside practical application systems. The paper discusses the main big data and data mining challenges for management and concludes with a list of issues and difficulties for further research in the area.

Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges

Procedia PDF Downloads 105

24979 A Systematic Review on Challenges in Big Data Environment

Authors: Rimmy Yadav, Anmol Preet Kaur

Abstract:

Big Data has demonstrated vast potential for streamlining operations, supporting decisions, and spotting business trends in fields such as manufacturing, finance, and information technology. This paper gives a multi-disciplinary overview of the research issues in big data and of its techniques, tools, and frameworks related to privacy, data storage management, network and energy utilization, fault tolerance, and data representation. In addition, the resulting challenges and opportunities of the Big Data platform are discussed.

Keywords: big data, privacy, data management, network and energy consumption

Procedia PDF Downloads 306

24978 Survey on Big Data Stream Classification by Decision Tree

Authors: Mansoureh Ghiasabadi Farahani, Samira Kalantary, Sara Taghi-Pour, Mahboubeh Shamsi

Abstract:

Nowadays, the development of computer technology and its recent applications provides access to new types of data that have not been considered by traditional data analysts. Two particularly interesting characteristics of such data sets are their huge size and streaming nature. Incremental learning techniques have been used extensively to address the data stream classification problem. This paper presents a concise survey of the obstacles and requirements in classifying data streams using decision trees. The most important issue is to maintain a balance between accuracy and efficiency: the algorithm should provide good classification performance with a reasonable response time.

Keywords: big data, data streams, classification, decision tree

Procedia PDF Downloads 517

24977 The Taxonomic and Functional Diversity in Edaphic Microbial Communities from Antarctic Dry Valleys

Authors: Sean T. S. Wei, Joy D. Van Nostrand, Annapoorna Maitrayee Ganeshram, Stephen B. Pointing

Abstract:

The McMurdo Dry Valleys are a largely ice-free polar desert protected by international treaty as an Antarctic Specially Managed Area. The terrestrial landscape is dominated by oligotrophic mineral soil with extensive rocky outcrops. Several environmental stresses (low temperature, lack of liquid water, UV exposure and oligotrophic substrates) restrict the major biotic component to microorganisms. The bacterial diversity and the putative physiological capacity of microbial communities of quartz rocks (hypoliths) and soil of a maritime-influenced Dry Valley were interrogated by two metagenomic approaches: 454 pyrosequencing and GeoChip DNA microarray. The most abundant phylum in hypoliths was Cyanobacteria (46%), whereas in soils Actinobacteria (31%) were most abundant. The Proteobacteria and Bacteroidetes were the only other phyla to comprise >10% of both communities. Carbon fixation was indicated by photoautotrophic and chemoautotrophic pathways for both hypolith and soil communities. The fungi accounted for polymer carbon transformations, particularly for aromatic compounds. Complete nitrogen cycling was observed in both communities. The fungi in particular displayed pathways related to ammonification. Environmental stress response pathways were common among bacteria, whereas the nutrient stress response pathways were more widely present in bacteria, archaea and fungi. The diversity of bacteriophage was also surveyed by GeoChip. Data suggested that different substrates supported different viral families: Leviviridae, Myoviridae, Podoviridae and Siphoviridae were ubiquitous. However, Corticoviridae and Microviridae only occurred in wetter soils.

Keywords: Antarctica, hypolith, soil, dry valleys, geochip, functional diversity, stress response

Procedia PDF Downloads 446

24976 Robust and Dedicated Hybrid Cloud Approach for Secure Authorized Deduplication

Authors: Aishwarya Shekhar, Himanshu Sharma

Abstract:

Data deduplication is an important data compression technique for eliminating duplicate copies of repeating data, and it has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. In this process, duplicate data is expunged, leaving only a single instance of the data to be stored, although an index of every data item is still maintained. Data deduplication thus minimizes the storage space an organization requires to retain its data. In most companies, the storage systems carry identical copies of numerous pieces of data. Deduplication eliminates these additional copies by saving just one copy of the data and replacing the other copies with pointers that lead back to the primary copy. To avoid this duplication of data and to preserve confidentiality in the cloud, we apply the concept of the hybrid cloud. A hybrid cloud is a combination of at least one public and one private cloud. As a proof of concept, we implement Java code that provides security and removes all types of duplicated data from the cloud.
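
The single-copy-plus-pointers idea described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the Java used by the authors, it is not their implementation, and it leaves out the confidentiality and hybrid-cloud parts entirely. The class name `DedupStore`, the use of SHA-256 as the content fingerprint, and the file names are assumptions made only for illustration.

```python
import hashlib


class DedupStore:
    """Toy single-instance store: one copy per unique content, pointers for the rest."""

    def __init__(self):
        self.blobs = {}  # content digest -> the single stored copy
        self.index = {}  # file name -> digest (the "pointer" back to the stored copy)

    def put(self, name: str, data: bytes) -> bool:
        """Store data under name; return True only if a new copy had to be written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        # The index entry is kept for every name, even when the content is a duplicate.
        self.index[name] = digest
        return is_new

    def get(self, name: str) -> bytes:
        """Follow the pointer from the name to the single stored copy."""
        return self.blobs[self.index[name]]


# Hypothetical usage: two names share one stored copy.
store = DedupStore()
first_written = store.put("report_v1.txt", b"quarterly figures")
second_written = store.put("report_copy.txt", b"quarterly figures")
store.put("notes.txt", b"meeting notes")
```

Here three names are indexed but only two blobs are stored, which is the storage saving the abstract refers to.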

Keywords: confidentiality, deduplication, data compression, hybridity of cloud

Procedia PDF Downloads 377

24975 A Review of Machine Learning for Big Data

Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.

Abstract:

Big data are now rapidly expanding in all engineering and science domains and in many others. The potential of large or massive data is undoubtedly significant, but making sense of it requires new ways of thinking and new learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. This paper reviews the latest advances in research on machine learning for big data processing. First, it covers the machine learning techniques in recent studies, such as deep learning, representation learning, transfer learning, active learning, and distributed and parallel learning. It then focuses on the challenges and possible solutions of machine learning for big data.

Keywords: active learning, big data, deep learning, machine learning

Procedia PDF Downloads 437

24974 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights

Authors: Tomy Prihananto, Damar Apri Sudarmadi

Abstract:

Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management in Indonesia. This paper is descriptive research with a qualitative approach, collecting data from a literature study. The results of this paper are a comprehensive arrangement of data that has been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and protection of personal data are mutually reinforcing arrangements in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.

Keywords: Indonesia, protection, personal data, privacy, human rights, encryption

Procedia PDF Downloads 180

24973 Assessment of Efficiency of Underwater Undulatory Swimming Strategies Using a Two-Dimensional CFD Method

Authors: Dorian Audot, Isobel Margaret Thompson, Dominic Hudson, Joseph Banks, Martin Warner

Abstract:

In competitive swimming, after dives and turns, athletes perform underwater undulatory swimming (UUS), copying marine mammals’ method of locomotion. The body, performing this wave-like motion, accelerates the fluid downstream in its vicinity, generating propulsion with minimal resistance. Through this technique, swimmers can maintain greater speeds than surface swimming and take advantage of the overspeed granted by the dive (or push-off). Almost all previous work has considered UUS when performed at maximum effort. Critical parameters to maximize UUS speed are frequently discussed; however, this does not apply to most races. In only 3 out of the 16 individual competitive swimming events are athletes likely to attempt to perform UUS with the greatest speed, without thinking of the cost of locomotion. In the other cases, athletes will want to control the speed of their underwater swimming, attempting to maximise speed whilst considering energy expenditure appropriate to the duration of the event. Hence, there is a need to understand how swimmers adapt their underwater strategies to optimize the speed within the allocated energetic cost. This paper develops a consistent methodology that enables different sets of UUS kinematics to be investigated. These may have different propulsive efficiencies and force generation mechanisms (e.g.: force distribution along the body and force magnitude). The developed methodology, therefore, needs to: (i) provide an understanding of the UUS propulsive mechanisms at different speeds; (ii) investigate the key performance parameters when UUS is not performed solely for maximizing speed; (iii) consistently determine the propulsive efficiency of a UUS technique. The methodology is separated into two distinct parts: kinematic data acquisition and computational fluid dynamics (CFD) analysis. 
For the kinematic acquisition, the position of several joints along the body and their sequencing were either obtained by video digitization or by underwater motion capture (Qualisys system). During data acquisition, the swimmers were asked to perform UUS at a constant depth in a prone position (facing the bottom of the pool) at different speeds: maximum effort, 100m pace, 200m pace and 400m pace. The kinematic data were input to a CFD algorithm employing a two-dimensional Large Eddy Simulation (LES). The algorithm adopted was specifically developed in order to perform quick unsteady simulations of deforming bodies and is therefore suitable for swimmers performing UUS. Despite its approximations, the algorithm is applied such that simulations are performed with the inflow velocity updated at every time step. It also enables calculations of the resistive forces (total and applied to each segment) and the power input of the modeled swimmer. Validation of the methodology is achieved by comparing the data obtained from the computations with the original data (e.g.: sustained swimming speed). This method is applied to the different kinematic datasets and provides data on swimmers’ natural responses to pacing instructions. The results show how kinematics affect force generation mechanisms and hence how the propulsive efficiency of UUS varies for different race strategies.

Keywords: CFD, efficiency, human swimming, hydrodynamics, underwater undulatory swimming

Procedia PDF Downloads 215

24972 Genetic Polymorphism in the Vitamin D Receptor Gene and 25-Hydroxyvitamin D Serum Levels in East Indian Women with Polycystic Ovary Syndrome

Authors: Dipanshu Sur, Ratnabali Chakravorty

Abstract:

Background: Polycystic ovary syndrome (PCOS) is the most common metabolic abnormality in young women of reproductive age and is accompanied by changes such as altered lipid profile, diabetes, hypertension and metabolic syndrome. Low vitamin D levels were found to be associated with the development of obesity and insulin resistance in women with PCOS. Variants of the vitamin D receptor (VDR) gene have also been related to metabolic comorbidities in the general population. Aim: The aim of this case-control study was to investigate whether the VDR gene polymorphisms are associated with susceptibility to PCOS. Methods: Women with PCOS and a control group, all aged 16-40 years, were enrolled. Genotypes of the VDR Fok-I (rs2228570), VDR Apa-I (rs7975232), GC (rs2282679) and DHCR7 (rs12785878) SNPs were determined in both groups by direct sequencing. Serum 25-hydroxyvitamin D [25(OH)D] levels were measured by ELISA. Results: Mean serum 25(OH)D levels in the PCOS and control samples were 19.08±7 and 23.27±6.03, respectively, and were significantly lower in PCOS patients than in controls (p=0.048). The CC genotype of the VDR Apa-I SNP was equally frequent in PCOS (25.6%) and controls (25.6%) (OR: 0.9995; 95% CI: 0.528 to 1.8921; p=0.9987). The CC genotype was also associated with lower E2 (p=0.031) and androstenedione (p=0.062) levels. We observed a significant association of the GC polymorphism with 25(OH)D levels. PCOS women carrying the GG genotype (in the GC gene) had a significantly higher risk of vitamin D deficiency than women carrying the TT genotype. Conclusions: Data from this study indicate that vitamin D levels are lower, and vitamin D deficiency more frequent, in PCOS than in controls. The present findings suggest that the Apa-I and Fok-I polymorphisms of the VDR gene are associated with PCOS and seem to modulate ovarian steroid secretion. Further studies are needed to better clarify the biological mechanisms by which these polymorphisms influence PCOS risk.
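
For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained in a case-control design, the following is a minimal sketch using the standard Wald interval on the log odds ratio. The abstract does not report the underlying genotype counts or the exact method used, so the counts below are hypothetical and the function name `odds_ratio_ci` is an assumption; the output is not meant to reproduce the reported values.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: cases with the genotype      b: cases without it
    c: controls with the genotype   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT from the study): genotype equally frequent in both groups.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=60, c=20, d=60)
print(f"OR = {or_value:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
```

When a genotype is equally frequent in cases and controls, the odds ratio is close to 1 and the interval straddles 1, which is the pattern the abstract reports for the Apa-I CC genotype.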

Keywords: vitamin D receptor, polymorphism, vitamin D, polycystic ovary syndrome

Procedia PDF Downloads 300

24971 The Various Legal Dimensions of Genomic Data

Authors: Amy Gooden

Abstract:

When human genomic data is considered, this is often done through only one dimension of the law, or the interplay between the various dimensions is not considered, thus providing an incomplete picture of the legal framework. This research considers and analyzes the various dimensions in South African law applicable to genomic sequence data – including property rights, personality rights, and intellectual property rights. The effective use of personal genomic sequence data requires the acknowledgement and harmonization of the rights applicable to such data.

Keywords: artificial intelligence, data, law, genomics, rights

Procedia PDF Downloads 135

24970 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and explain why a custom federated approach was the recommendation chosen for the implementation. Although the data warehouses are logically federated, the implementation uses a single database system, which presented many advantages, such as cost reduction and improved data access for global users, allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: data integration, data warehousing, federated architecture, Online Analytical Processing (OLAP)

Procedia PDF Downloads 232

24969 An Emergence of Pinus taeda Needle Defoliation and Tree Mortality in Alabama, USA

Authors: Debit Datta, Jeffrey J. Coleman, Scott A. Enebak, Lori G. Eckhardt

Abstract:

Pinus taeda, commonly known as loblolly pine, is a crucial timber species native to the southeastern USA. An emerging problem encountered over the past few years, known as loblolly pine needle defoliation (LPND), is threatening the ecological health of southeastern forests and the economic vitality of the region’s timber industry. Currently, more than 1000 hectares of loblolly plantations in Alabama are affected with similar symptoms, which has created concern among southeastern landowners and forest managers. However, it is still uncertain whether LPND results from one or a combination of several fungal pathogens. Therefore, the objectives of the study were to identify and characterize the fungi associated with LPND in the southeastern USA and document the damage being done to loblolly pine as a result of repeated defoliation. Identification of fungi was confirmed using classical morphological methods (microscopic examination of the infected needles), conventional and species-specific priming (SSPP) PCR, and ITS sequencing. To date, 17 species of fungi, either cultured from pine needles or forming fruiting bodies on pine needles, were identified based on morphology and genetic sequence data. Among them, the brown-spot pathogen Lecanosticta acicola has been frequently recovered from pine needles in both spring and summer. Moreover, ophiostomatoid fungi such as Leptographium procerum and L. terebrantis, which are associated with pine decline, have also been recovered from root samples of the infected stands. Trees became increasingly and repeatedly chlorotic and defoliated from 2019 to 2020. Based on morphological observations and molecular data, emerging loblolly pine needle defoliation is due in large part to the brown-spot pathogen Lecanosticta acicola, followed by the pine decline pathogens L. procerum and L. terebrantis. Root pathogens were suspected to emerge later, and their cumulative effects contribute to the widespread mortality of the trees. 
It is likely that longer wet springs and warmer temperatures are favorable to disease development and may be important in the disease ecology of LPND. Therefore, the outbreak of the disease is expected to expand over a large geographical area under changing climatic conditions.

Keywords: brown-spot fungi, emerging disease, defoliation, loblolly pine

Procedia PDF Downloads 134

24968 A Review Paper on Data Mining and Genetic Algorithm

Authors: Sikander Singh Cheema, Jasmeen Kaur

Abstract:

In this paper, the concept of data mining is summarized, along with one of its important processes, knowledge discovery in databases (KDD). Data mining based on genetic algorithms is examined, and ways to implement genetic-algorithm-based data mining are surveyed. This paper also conducts a formal review of data mining tasks and genetic algorithms in various fields.

Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining

Procedia PDF Downloads 586

24967 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring

Authors: Seung-Lock Seo

Abstract:

This work presents a data-mining empirical monitoring scheme for industrial processes with partially unbalanced data. Measurement data of good operations are relatively easy to gather, but during unusual special events or faults it is generally difficult to collect process information, and it can be almost impossible to analyze some noisy data of industrial processes. In such cases, noise filtering techniques can be used to enhance process monitoring performance on a real-time basis. In addition, pre-processing of raw process data helps to eliminate unwanted variation in industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated for discrete batch process data. The results showed that the monitoring performance was improved significantly in terms of the monitoring success rate for given process faults.

Keywords: data mining, process data, monitoring, safety, industrial processes

Procedia PDF Downloads 394

24966 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances in computer science and data analysis continuously provide huge volumes of biological data, which are available on the web. Such advances require powerful data integration techniques to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, focusing on those developed in the ontology community.

Keywords: biological ontology, linked data, semantic data integration, semantic web

Procedia PDF Downloads 446

24965 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture

Authors: Thrivikraman Aswathi, S. Advaith

Abstract:

When the use of real data is limited, for example when it is hard to get access to a large volume of real data, synthetic data generation is needed. It can produce high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data, since MTS data are now being gathered more often in various real-world systems. Furthermore, the GAN-generated MTS data is fed into a transformer-based deep learning architecture that classifies the data into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.

Keywords: GAN, transformer, classification, multivariate time series

Procedia PDF Downloads 126

24964 An Improved Genetic Algorithm for Traveling Salesman Problem with Precedence Constraint

Authors: M. F. F. Ab Rashid, A. N. Mohd Rose, N. M. Z. Nik Mohamed, W. S. Wan Harun, S. A. Che Ghani

Abstract:

The traveling salesman problem with precedence constraints (TSPPC) is one of the most complex problems in combinatorial optimization. Existing algorithms for TSPPC require large computational time to find the optimal solution. The purpose of this paper is to present an efficient genetic algorithm that guarantees an optimal solution with fewer generations and less iteration time. Unlike the existing algorithm, which generates priority factors as the chromosome, the proposed algorithm directly generates the solution sequence as the chromosome. As a result, the proposed algorithm is capable of generating the optimal solution with fewer generations and less iteration time compared to the existing algorithm.
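
To make the idea of a chromosome that is itself a visiting sequence more concrete, the sketch below builds random sequences that already respect the precedence constraints, so no decoding from priority factors is needed. This is only one plausible way to initialise such a population (a randomised topological ordering); the abstract does not describe the authors' actual operators, and the function names, node labels, and constraint pairs are all hypothetical.

```python
import random


def random_feasible_sequence(nodes, precedence, rng):
    """Return a random visiting order in which every predecessor comes before its successor.

    precedence is a list of (before, after) pairs.
    """
    remaining_preds = {n: set() for n in nodes}
    for before, after in precedence:
        remaining_preds[after].add(before)

    sequence = []
    # Nodes whose predecessors have all been visited can be chosen next.
    available = [n for n in nodes if not remaining_preds[n]]
    while available:
        pick = available.pop(rng.randrange(len(available)))
        sequence.append(pick)
        for n in nodes:
            if pick in remaining_preds[n]:
                remaining_preds[n].discard(pick)
                if not remaining_preds[n]:
                    available.append(n)
    if len(sequence) != len(nodes):
        raise ValueError("precedence constraints contain a cycle")
    return sequence


def is_feasible(sequence, precedence):
    """Check that each (before, after) pair is respected by the sequence."""
    position = {node: i for i, node in enumerate(sequence)}
    return all(position[before] < position[after] for before, after in precedence)


# Hypothetical instance: six nodes and four precedence pairs.
nodes = [0, 1, 2, 3, 4, 5]
precedence = [(0, 1), (1, 3), (2, 3), (0, 4)]
rng = random.Random(42)
population = [random_feasible_sequence(nodes, precedence, rng) for _ in range(20)]
```

Because every chromosome is feasible by construction, fitness evaluation can be a plain tour-length computation over the sequence, without a repair or decoding step.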

Keywords: traveling salesman problem, sequencing, genetic algorithm, precedence constraint

Procedia PDF Downloads 554

24963 Generative AI: A Comparison of Conditional Tabular Generative Adversarial Networks and Conditional Tabular Generative Adversarial Networks with Gaussian Copula in Generating Synthetic Data with Synthetic Data Vault

Authors: Lakshmi Prayaga, Chandra Prayaga, Aaron Wade, Gopi Shankar Mallu, Harsha Satya Pola

Abstract:

Synthetic data generated by Generative Adversarial Networks and Autoencoders is becoming more common to combat the problem of insufficient data for research purposes. However, generating synthetic data is a tedious task requiring extensive mathematical and programming background. Open-source platforms such as the Synthetic Data Vault (SDV) and Mostly AI offer user-friendly tools that are accessible to non-technical professionals and generate synthetic data to augment existing data for further analysis. The SDV also provides additions to the generic GAN, such as the Gaussian copula. We present the results from two synthetic data sets (CTGAN data and CTGAN with Gaussian copula) generated by the SDV and report the findings. The results indicate that the ROC curves and AUC values for the data generated by adding the Gaussian copula layer are much higher than those for the data generated by the CTGAN alone.
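
Since the comparison in this abstract rests on AUC, a small sketch of how AUC can be computed may help. It uses the rank interpretation (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one) and only the standard library. The abstract does not say which classifier or data produced the scores, so the labels and scores below are invented, and `auc_score` is a hypothetical helper rather than an SDV function.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented labels and classifier scores, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4]
auc = auc_score(labels, scores)
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a sort-based version would be preferable for large data sets.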

Keywords: synthetic data generation, generative adversarial networks, conditional tabular GAN, Gaussian copula

Procedia PDF Downloads 74

24962 Searching SNPs Variants in Myod-1 and Myod-2 Genes Linked to Body Weight in Gilthead Seabream, Sparus aurata L.

Authors: G. Blanco-Lizana, C. García-Fernández, J. A. Sánchez

Abstract:

Growth is a productive trait regulated by a large and complex gene network whose members have very different effects. Some of them (candidate genes) have a larger effect and are excellent resources in which to search for polymorphisms correlated with differences in growth rates. This study focused on the identification of single nucleotide polymorphisms (SNPs) in the MyoD-1 and MyoD-2 genes, members of the family of myogenic regulatory factors (MRFs) with a key role in the differentiation and development of muscular tissue, and on their evaluation as potential markers in genetic selection programs for growth in gilthead seabream (Sparus aurata). By sequencing 1,968 bp of the MyoD-1 gene [AF478568.1] and 1,963 bp of the MyoD-2 gene [AF478569.1] in 30 seabream (classified as unrelated by microsatellite markers), three SNPs were identified in each gene (SaMyoD-1_D2100A, where D indicates a deletion, SaMyoD-1_A2143G and SaMyoD-1_A2404G; and SaMyoD-2_A785C, SaMyoD-2_C1982T and SaMyoD-2_A2031T). The relationships between SNPs and body weight were evaluated by SNP genotyping of 53 breeders from two broodstocks (A: 18♀-9♂; B: 16♀-10♂) and 389 offspring divided into two groups (slow- and fast-growth) with significant differences in growth at 18 months of development (A18Slow: N=107, A18Fast: N=103, B18Slow: N=92 and B18Fast: N=87) (Borrell et al., 2011). Haplotypes and diplotypes were reconstructed from genotype data with Phase 2.1 software. Differences among the means of different diplotypes were tested by one-way ANOVA followed by a post-hoc Tukey test. Association analysis indicated that single SNPs did not show a significant effect on body weight. However, when the analysis was carried out on haplotype data, the DGG haplotype of the MyoD-1 gene and the CCA haplotype of the MyoD-2 gene were associated with lower body weight. This haplotype combination showed the lowest mean body weight (P<0.05) in three (A18Slow, A18Fast and B18Slow) of the four groups tested. 
Individuals with the DGG haplotype of the MyoD-1 gene showed 25.5% lower mean body weight, and those with the CCA haplotype of the MyoD-2 gene 14-18% lower. Although further studies are needed to validate the role of these 3 SNPs as markers for body weight, the polymorphism-trait association established in this work creates promising expectations for the use of these variants as a genetic tool in future gilthead seabream breeding programs.
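
The one-way ANOVA step mentioned in this abstract compares mean body weight across diplotype groups. As a rough illustration of what that test computes, here is a minimal sketch of the F statistic in plain Python. The weights are invented, the group sizes are far smaller than in the study, and the post-hoc Tukey test is not reproduced; the function name `one_way_anova_f` is an assumption.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented body weights (g) for three hypothetical diplotype groups.
diplotype_weights = [
    [301.0, 302.0, 303.0],
    [302.0, 303.0, 304.0],
    [305.0, 306.0, 307.0],
]
f_stat, df_b, df_w = one_way_anova_f(diplotype_weights)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that at least one group mean differs, which is what the post-hoc test then localises.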

Keywords: growth, MyoD-1 and MyoD-2 genes, selective breeding, SNP-haplotype

Procedia PDF Downloads 326

24961 Novel Coprocessor for DNA Sequence Alignment in Resequencing Applications

Authors: Atef Ibrahim, Hamed Elsimary, Abdullah Aljumah, Fayez Gebali

Abstract:

This paper presents a novel semi-systolic array architecture for an optimized parallel sequence alignment algorithm. This architecture has the advantage that it can be modified to be reused for multiple-pass processing in order to increase the number of processing elements that can be packed into a single FPGA and to increase the number of sequences that can be aligned in parallel in a single FPGA. This resolves the potential problem of many FPGA resources being left unused, when using the previously published conventional hardware design, for designs that have large values of short read length. FPGA implementation results show that, for large values of short read lengths (M>128), the proposed design has a slightly higher speedup and FPGA utilization than the conventional one.
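
The abstract does not name the alignment algorithm that the array implements. As an assumption for illustration, the sketch below uses Smith-Waterman local alignment with a linear gap penalty and fills the score matrix one anti-diagonal at a time. Cells on the same anti-diagonal depend only on earlier anti-diagonals, which is the kind of independence that systolic-style hardware can exploit by assigning them to parallel processing elements. The scoring values and the function name are hypothetical.

```python
def smith_waterman_wavefront(read, ref, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, computed anti-diagonal by anti-diagonal."""
    m, n = len(read), len(ref)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    # d = i + j indexes an anti-diagonal; every cell on it could be computed in parallel.
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + s,  # align read[i-1] with ref[j-1]
                H[i - 1][j] + gap,    # gap in the reference
                H[i][j - 1] + gap,    # gap in the read
            )
            best = max(best, H[i][j])
    return best


score_identical = smith_waterman_wavefront("ACGT", "ACGT")
score_embedded = smith_waterman_wavefront("GATTACA", "TTAC")
```

In software the inner loop still runs sequentially; the point of the ordering is only to show which cells are mutually independent at each step.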

Keywords: bioinformatics, genome sequence alignment, re-sequencing applications, systolic array

Procedia PDF Downloads 526

24960 A Privacy Protection Scheme Supporting Fuzzy Search for NDN Routing Cache Data Name

Authors: Feng Tao, Ma Jing, Guo Xian, Wang Jing

Abstract:

Named Data Networking (NDN) replaces the IP address of the traditional network with a data name and adopts a dynamic cache mechanism. In the existing mechanism, however, only one-to-one search can be achieved, because every data item has a unique name corresponding to it. There is a certain mapping relationship between data content and data name, so if the data name is intercepted by an adversary, the privacy of the data content and the user’s interest can hardly be guaranteed. To solve this problem, this paper proposes a one-to-many fuzzy search scheme based on order-preserving encryption, which reduces the query overhead by optimizing the caching strategy. In this scheme, we use hash values to keep the user’s query safe from each node during the search, and likewise to protect the privacy of the requested data content.

Keywords: NDN, order-preserving encryption, fuzzy search, privacy

Procedia PDF Downloads 480

24959 Difference in Virulence Factor Genes Between Transient and Persistent Streptococcus uberis Intramammary Infection in Dairy Cattle

Authors: Anyaphat Srithanasuwan, Noppason Pangprasit, Montira Intanon, Phongsakorn Chuammitri, Witaya Suriyasathaporn, Ynte H. Schukken

Abstract:

Streptococcus uberis is one of the most common mastitis-causing pathogens, with a wide range of intramammary infection (IMI) durations and pathogenicity. This study aimed to compare shared or unique virulence factor gene clusters distinguishing persistent and transient strains of S. uberis. A total of 139 S. uberis strains were isolated from three small-holder dairy herds with a high prevalence of S. uberis mastitis. The duration of IMI was used to categorize bacteria into two groups: transient and persistent strains, with an IMI duration of less than 1 month and longer than 2 months, respectively. Six representative S. uberis strains, three from each group (transient and persistent), were selected for analysis. All transient strains exhibited multi-locus sequence types (MLST), indicating a highly diverse population of transient S. uberis. In contrast, MLST of persistent strains was available in an online database (pubMLST). Identification of virulence genes was performed using whole-genome sequencing (WGS) data. Differences in genome size and number of virulence genes were found. For example, the BCA gene (alpha-C protein) and the genes associated with capsule formation (hasAB), found in persistent strains, are important for attachment and invasion and for evasion of antimicrobial mechanisms and persistent survival, respectively. These findings suggest a genetic-level difference between the two strain types. Consequently, a comprehensive study of 139 S. uberis isolates will be conducted to perform an in-depth genetic assessment through WGS analysis on an Illumina platform.

Keywords: Streptococcus uberis, mastitis, whole genome sequence, intramammary infection, persistent S. uberis, transient S. uberis

Procedia PDF Downloads 58

24958 Healthcare Big Data Analytics Using Hadoop

Authors: Chellammal Surianarayanan

Abstract:

The healthcare industry is generating large amounts of data driven by various needs such as record keeping, physicians’ prescriptions, medical imaging, sensor data, Electronic Patient Records (EPR), laboratory, pharmacy, etc. Healthcare data is so big and complex that it cannot be managed by conventional hardware and software. The complexity of healthcare big data arises from the large volume of data, the velocity with which the data is accumulated, and its variety, which spans structured, semi-structured and unstructured data. Despite this complexity, if the trends and patterns that exist within the big data are uncovered and analyzed, higher quality healthcare can be provided at lower cost. Hadoop is an open source software framework for distributed processing of large data sets across clusters of commodity hardware using a simple programming model. The core components of Hadoop include the Hadoop Distributed File System, which offers a way to store large amounts of data across multiple machines, and MapReduce, which offers a way to process large data sets with a parallel, distributed algorithm on a cluster. The Hadoop ecosystem also includes various other tools such as Hive (a SQL-like query language), Pig (a higher level query language for MapReduce), HBase (a columnar data store), etc. This paper analyzes how healthcare big data can be processed and analyzed using the Hadoop ecosystem.
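The MapReduce programming model mentioned above can be sketched in a few lines of plain Python. This is an in-memory illustration of the map, shuffle, and reduce phases only, not Hadoop's API; the records, the diagnosis codes, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: (patient_id, diagnosis_code) pairs.
RECORDS: List[Tuple[str, str]] = [
    ("p1", "J45"), ("p2", "E11"), ("p3", "J45"),
    ("p4", "I10"), ("p5", "E11"), ("p6", "J45"),
]


def map_phase(record: Tuple[str, str]) -> Iterator[Tuple[str, int]]:
    """Emit one (diagnosis_code, 1) pair per record."""
    _patient_id, code = record
    yield code, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one diagnosis code."""
    return key, sum(values)


def run_job(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = run_job(RECORDS)
print(counts)  # {'J45': 3, 'E11': 2, 'I10': 1}
```

On a real cluster the map and reduce functions would run in parallel on different machines over data stored in a distributed file system; here everything runs in one process to keep the data flow visible.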

Keywords: big data analytics, Hadoop, healthcare data, towards quality healthcare

Procedia PDF Downloads 409

24957 L. rhamnosus GG Lysate Can Inhibit Cytotoxic Effects of S. aureus on Keratinocytes in vitro

Authors: W. Mohammed Saeed, A. J. Mcbain, S. M. Cruickshank, C. A. O’Neill

Abstract:

In the gut, probiotics have been shown to protect epithelial cells from pathogenic bacteria through a number of mechanisms: (1) increasing epithelial barrier function; (2) modulating the immune response, especially the innate immune response; and (3) inhibiting pathogen adherence and down-regulating virulence factors. Since probiotics have positive impacts on the gut, their potential effects on other body tissues, such as skin, have begun to be investigated. The purpose of this project is to characterize the potential of a probiotic bacterial lysate as a therapeutic agent for preventing or reducing S. aureus infection. Normal human primary keratinocytes (KCs) were exposed to S. aureus (10⁶/ml) in the presence or absence of L. rhamnosus GG lysate (extracted from 10⁸ CFU/ml). The viability of the KCs was measured after 24 hours using a trypan blue exclusion assay. When KCs were treated with S. aureus alone, only 25% of the KCs remained viable at 24 hours post-infection. However, in the presence of L. rhamnosus GG lysate, the viability of pathogen-infected KCs increased to 58% (p=0.008, n=3). Furthermore, when KCs were co-exposed, pre-exposed, or post-exposed to L. rhamnosus GG lysate, their viability increased to ≈60%; the lysate afforded equal protection under the different conditions. These data suggest that two possible separate mechanisms are involved in the protective effects of L. rhamnosus GG: reducing S. aureus growth or inhibiting pathogen adhesion. Interestingly, the L. rhamnosus GG lysate significantly reduced both S. aureus growth and the adhesion of viable S. aureus following 24 hours of incubation. Therefore, a series of liquid chromatography (RP-LC) methods was adopted to partially purify the lysate, in combination with functional assays, to elucidate which fractions contained the efficacious molecules.
In addition, mass spectrometry-based protein sequencing was used to identify putative proteins in the fractions. The data from the purification process demonstrated that L. rhamnosus GG lysate has the potential to protect keratinocytes from the toxic effects of the skin pathogen S. aureus. Three potential mechanisms were identified: inhibition of pathogen growth, competitive exclusion, and displacement of the pathogen from keratinocyte binding sites. ‘Moonlight’ proteins were identified in this study’s MS/MS data for L. rhamnosus GG lysate, which could explain the ability of the lysate to competitively exclude and displace S. aureus from keratinocyte binding sites. Taken together, it can be speculated that L. rhamnosus GG lysate utilizes different mechanisms to protect keratinocytes from S. aureus toxicity. The present study indicates that proteinaceous substances are involved in the anti-adhesion activity, which is achieved by displacing the pathogen and thereby reducing the severity of infection; the moonlight proteins might be involved in inhibiting pathogen adhesion.

Keywords: lysate, fractions, adhesion, L. rhamnosus GG, S. aureus toxicity

Procedia PDF Downloads 289

24956 Data Disorders in Healthcare Organizations: Symptoms, Diagnoses, and Treatments

Authors: Zakieh Piri, Shahla Damanabi, Peyman Rezaii Hachesoo

Abstract:

Introduction: Healthcare organizations, like other organizations, suffer from a number of disorders such as Business Sponsor Disorder, Business Acceptance Disorder, Cultural/Political Disorder, Data Disorder, etc. As quality in healthcare mostly depends on the quality of data, we aimed to identify data disorders and their symptoms in two teaching hospitals. Methods: Using a self-constructed questionnaire, we asked 20 questions related to the quality and usability of patient data stored in patient records. The research population consisted of 150 managers, physicians, nurses, and medical record staff who were working at the time of the study. We also asked for their views about the symptoms of and treatments for any data disorders they mentioned in the questionnaire. We analyzed the answers using qualitative methods. Results: After classifying the answers, we found six main data disorders: incomplete data, missed data, late data, blurred data, manipulated data, and illegible data. The majority of participants believed they had important roles in the treatment of data disorders, while others attributed the disorders to health system problems. Discussion: As clinicians have important roles in producing data, they can easily identify symptoms and disorders of patient data. Health information managers can also play important roles in the early detection of data disorders through proactive monitoring and periodic check-ups of data.

Keywords: data disorders, quality, healthcare, treatment

Procedia PDF Downloads 428

24955 Mapping Protein Selectivity Landscapes

Authors: Niv Papo

Abstract:

Characterizing the binding selectivity landscape of interacting proteins is crucial both for elucidating the underlying mechanisms of their interaction and for developing selective inhibitors. However, current mapping methods are laborious and cannot provide a sufficiently comprehensive description of the landscape. Here, we introduce a distinct and efficient strategy for comprehensively mapping the binding landscape of proteins using a combination of experimental multi-target selective library screening and in silico next-generation sequencing analysis. We map the binding landscape of a non-selective trypsin inhibitor, the amyloid protein precursor inhibitor (APPI), to each of four human serine proteases (kallikrein-6, mesotrypsin, and anionic and cationic trypsins). We then use this map to dissect and improve the affinity and selectivity of APPI variants toward each of the four proteases. Our strategy can be used as a platform for the development of a new generation of target-selective probes and therapeutic agents based on selective protein–protein interactions.

Keywords: drug design, directed evolution, protein engineering, protease inhibition

Procedia PDF Downloads 13

24954 Big Data and Analytics in Higher Education: An Assessment of Its Status, Relevance and Future in the Republic of the Philippines

Authors: Byron Joseph A. Hallar, Annjeannette Alain D. Galang, Maria Visitacion N. Gumabay

Abstract:

One of the unique challenges posed by the twenty-first century to Philippine higher education is the utilization of Big Data. The higher education system in the Philippines is generating burgeoning amounts of data that can be used to produce the information and knowledge needed for accurate data-driven decision making. This study examines the status, relevance and future of Big Data and Analytics in Philippine higher education. The insights gained from the study may be relevant to other developing nations situated similarly to the Philippines.

Keywords: big data, data analytics, higher education, republic of the philippines, assessment

Procedia PDF Downloads 342

24953 Data Management and Analytics for Intelligent Grid

Authors: G. Julius P. Roy, Prateek Saxena, Sanjeev Singh

Abstract:

Two decades ago, power distribution utilities collected data from their customers no more often than once a month. The advent of SmartGrid and AMI has since increased the sampling frequency, leading to a 1,000- to 10,000-fold increase in data quantity. This notable increase led to the coining of the term Big Data in utilities. The power distribution industry is one of the largest handlers of huge and complex data, both for keeping history and for turning that data into meaningful information. The majority of utilities around the globe are adopting SmartGrid technologies as a mass implementation and are primarily focusing on the strategic interdependence and synergies of the big data coming from new information sources such as AMI and intelligent SCADA. There is therefore a rising need for new models of data management and a renewed focus on analytics to dissect data into descriptive, predictive and dictatorial subsets. The goal of this paper is to bring load disaggregation into the smart energy toolkit for commercial usage.

Keywords: data management, analytics, energy data analytics, smart grid, smart utilities

Procedia PDF Downloads 775

24952 Privacy Preserving Data Publishing Based on Sensitivity in Context of Big Data Using Hive

Authors: P. Srinivasa Rao, K. Venkatesh Sharma, G. Sadhya Devi, V. Nagesh

Abstract:

Privacy Preserving Data Publication is a major concern today because the amount of data being published through the internet is increasing day by day. This huge amount of data has been named Big Data for its size. This project deals with privacy preservation in the context of Big Data using a data warehousing solution called Hive. We implemented Nearest Similarity Based Clustering (NSB) with bottom-up generalization to achieve (v,l)-anonymity. (v,l)-Anonymity deals with sensitivity vulnerabilities and ensures individual privacy. We also calculate the sensitivity levels with a simple comparison method using the index values, classifying the different levels of sensitivity. The experiments were carried out in the Hive environment to verify the efficiency of the algorithms with Big Data. This framework also supports the execution of existing algorithms without any changes. The model in the paper outperforms existing models.

Keywords: sensitivity, sensitive level, clustering, Privacy Preserving Data Publication (PPDP), bottom-up generalization, Big Data

Procedia PDF Downloads 291

24951 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects

Authors: Behnam Tavakkol

Abstract:

Uncertain data mining algorithms use different ways to consider uncertainty in data such as by representing a data object as a sample of points or a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, besides some uncertain fuzzy k-medoids algorithms, not many other fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.
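A minimal sketch of how a fuzzy kernel k-medoids loop might look is given below. It is not the author's algorithm: it assumes each uncertain object is a small sample of points, assumes the kernel between two objects is the average Gaussian (RBF) kernel over all pairs of their points, and uses the standard fuzzy c-means membership formula. All names, the toy data, and the parameters `m` and `gamma` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
UncertainObject = Sequence[Point]  # assumed form: a sample of points


def rbf(x: Point, y: Point, gamma: float) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def object_kernel(a: UncertainObject, b: UncertainObject, gamma: float) -> float:
    """Assumed kernel between objects: mean RBF over all point pairs."""
    return sum(rbf(x, y, gamma) for x in a for y in b) / (len(a) * len(b))


def kernel_dist_sq(a: UncertainObject, b: UncertainObject, gamma: float) -> float:
    """Squared distance in feature space: K(a,a) - 2K(a,b) + K(b,b)."""
    d2 = (object_kernel(a, a, gamma) - 2.0 * object_kernel(a, b, gamma)
          + object_kernel(b, b, gamma))
    return max(d2, 0.0)  # guard against tiny negative rounding errors


def memberships(d2_to_medoids: List[float], m: float) -> List[float]:
    """Fuzzy memberships of one object from its squared distances."""
    zeros = [i for i, d in enumerate(d2_to_medoids) if d <= 1e-12]
    if zeros:  # object coincides with one or more medoids
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(d2_to_medoids))]
    inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2_to_medoids]
    total = sum(inv)
    return [v / total for v in inv]


def fuzzy_kernel_k_medoids(objects: List[UncertainObject], medoids: List[int],
                           m: float = 2.0, gamma: float = 1.0,
                           max_iter: int = 20) -> Tuple[List[int], List[List[float]]]:
    n = len(objects)
    d2 = [[kernel_dist_sq(objects[i], objects[j], gamma) for j in range(n)]
          for i in range(n)]
    for _ in range(max_iter):
        u = [memberships([d2[j][c] for c in medoids], m) for j in range(n)]
        # New medoid of cluster k: object with the lowest weighted total distance.
        new_medoids = [
            min(range(n),
                key=lambda cand: sum((u[j][k] ** m) * d2[j][cand] for j in range(n)))
            for k in range(len(medoids))
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    u = [memberships([d2[j][c] for c in medoids], m) for j in range(n)]
    return medoids, u


# Toy data: two objects near the origin, two near (5, 5).
objects: List[UncertainObject] = [
    [(0.0, 0.0), (0.2, 0.1)], [(0.1, -0.1), (0.0, 0.2)],
    [(5.0, 5.0), (5.1, 4.9)], [(4.9, 5.2), (5.2, 5.0)],
]
final_medoids, u = fuzzy_kernel_k_medoids(objects, medoids=[0, 3])
for row in u:
    print([round(v, 3) for v in row])
```

Each printed row holds one object's non-crisp cluster labels, which sum to 1. The initial medoids are chosen by hand here; like other k-medoids variants, the loop can settle in a poor local optimum if both starting medoids fall in the same group.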

Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data

Procedia PDF Downloads 211