Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5

Bootstrapping Related Abstracts

5 Air Pollution and Respiratory-Related Restricted Activity Days in Tunisia

Authors: Mokhtar Kouki Inès Rekik

Abstract:

This paper focuses on the assessment of the air pollution and morbidity relationship in Tunisia. Air pollution is measured by ozone air concentration and the morbidity is measured by the number of respiratory-related restricted activity days during the 2-week period prior to the interview. Socioeconomic data are also collected in order to adjust for any confounding covariates. Our sample is composed by 407 Tunisian respondents; 44.7% are women, the average age is 35.2, near 69% are living in a house built after the 1980, and 27.8% have reported at least one day of respiratory-related restricted activity. The model consists on the regression of the number of respiratory-related restricted activity days on the air quality measure and the socioeconomic covariates. In order to correct for zero-inflation and heterogeneity, we estimate several models (Poisson, Negative binomial, Zero inflated Poisson, Poisson hurdle, Negative binomial hurdle and finite mixture Poisson models). Bootstrapping and post-stratification techniques are used in order to correct for any sample bias. According to the Akaike information criteria, the hurdle negative binomial model has the greatest goodness of fit. The main result indicates that, after adjusting for socioeconomic data, the ozone concentration increases the probability of positive number of restricted activity days.

Keywords: Bootstrapping, hurdle negbin model, overdispersion, ozone concentration, respiratory-related restricted activity days

Procedia PDF Downloads 135
4 Big Data Analytics and Data Security in the Cloud via Fully Homomorphic Encryption

Authors: John K. Alhassan, Idris Ismaila, Waziri Victor Onomza, Noel Dogonyaro Moses

Abstract:

This paper describes the problem of building secure computational services for encrypted information in the Cloud Computing without decrypting the encrypted data; therefore, it meets the yearning of computational encryption algorithmic aspiration model that could enhance the security of big data for privacy, confidentiality, availability of the users. The cryptographic model applied for the computational process of the encrypted data is the Fully Homomorphic Encryption Scheme. We contribute theoretical presentations in high-level computational processes that are based on number theory and algebra that can easily be integrated and leveraged in the Cloud computing with detail theoretic mathematical concepts to the fully homomorphic encryption models. This contribution enhances the full implementation of big data analytics based cryptographic security algorithm.

Keywords: Security, Privacy, Big data analytics, Bootstrapping, homomorphic, homomorphic encryption scheme

Procedia PDF Downloads 227
3 Big Data Analytics and Data Security in the Cloud via Fully Homomorphic Encyption Scheme

Authors: Victor Onomza Waziri, John K. Alhassan, Idris Ismaila, Noel Dogonyara

Abstract:

This paper describes the problem of building secure computational services for encrypted information in the Cloud. Computing without decrypting the encrypted data; therefore, it meets the yearning of computational encryption algorithmic aspiration model that could enhance the security of big data for privacy or confidentiality, availability and integrity of the data and user’s security. The cryptographic model applied for the computational process of the encrypted data is the Fully Homomorphic Encryption Scheme. We contribute a theoretical presentations in a high-level computational processes that are based on number theory that is derivable from abstract algebra which can easily be integrated and leveraged in the Cloud computing interface with detail theoretic mathematical concepts to the fully homomorphic encryption models. This contribution enhances the full implementation of big data analytics based on cryptographic security algorithm.

Keywords: Security, Privacy, Big data analytics, Bootstrapping, Fully Homomorphic Encryption Scheme

Procedia PDF Downloads 308
2 Modelling Conceptual Quantities Using Support Vector Machines

Authors: Ka C. Lam, Oluwafunmibi S. Idowu

Abstract:

Uncertainty in cost is a major factor affecting performance of construction projects. To our knowledge, several conceptual cost models have been developed with varying degrees of accuracy. Incorporating conceptual quantities into conceptual cost models could improve the accuracy of early predesign cost estimates. Hence, the development of quantity models for estimating conceptual quantities of framed reinforced concrete structures using supervised machine learning is the aim of the current research. Using measured quantities of structural elements and design variables such as live loads and soil bearing pressures, response and predictor variables were defined and used for constructing conceptual quantities models. Twenty-four models were developed for comparison using a combination of non-parametric support vector regression, linear regression, and bootstrap resampling techniques. R programming language was used for data analysis and model implementation. Gross soil bearing pressure and gross floor loading were discovered to have a major influence on the quantities of concrete and reinforcement used for foundations. Building footprint and gross floor loading had a similar influence on beams and slabs. Future research could explore the modelling of other conceptual quantities for walls, finishes, and services using machine learning techniques. Estimation of conceptual quantities would assist construction planners in early resource planning and enable detailed performance evaluation of early cost predictions.

Keywords: Modelling, Reinforced Concrete, Bootstrapping, support vector regression, conceptual quantities

Procedia PDF Downloads 62
1 Phenotype Prediction of Dna Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on $k$-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isoloates. Cluster analysis showed that k-mers maybe used to discriminate phenotypes and the discrimination becomes more concise as the size of k-mers increase. The best performing classification model had a k-mer size of 10 (longest k-mer) an accuracy, recall, precision, specificity, and Matthews Correlation coeffient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4 respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biological explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.

Keywords: Next Generation Sequencing, Bootstrapping, k-mers, AWD-LSTM

Procedia PDF Downloads 1