Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 689

Search results for: GWAS multivariate

689 Multivariate Genome-Wide Association Studies for Identifying Additional Loci for Myopia

Authors: Qiao Fan, Xiaobo Guo, Junxian Zhu, Xiaohu Ding, Ching-Yu Cheng, Tien-Yin Wong, Mingguang He, Heping Zhang, Xueqin Wang

Abstract:

A systematic, simultaneous analysis of multiple phenotypes in genome-wide association studies (GWASs) draws a great attention to integrate the signals from single phenotypes with increased power. However, lacking an interpretable and efficient multivariate GWAS analysis impede the application of such approach. In this study, we propose to decompose the multivariate model into a series of simple univariate models. This transformation illuminates what exactly the individual trait contributes to the significant signals from the multivariate analyses. By employing our approach in the analysis of three myopia-related endophenotypes from the Singapore Malay Eye Study (SIMES), we identify novel candidate loci which were successfully validated in an independent Guangzhou Twin Eye Study (GTES).

Keywords: GWAS multivariate, multiple traits, myopia, association

Procedia PDF Downloads 225

688 Regression for Doubly Inflated Multivariate Poisson Distributions

Authors: Ishapathik Das, Sumen Sen, N. Rao Chaganty, Pooja Sengupta

Abstract:

Dependent multivariate count data occur in several research studies. These data can be modeled by a multivariate Poisson or Negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells are much larger than other cells, then the copula based multivariate Poisson (or Negative binomial) distribution may not fit well and it is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher compared to the other cells, and develop a doubly inflated multivariate Poisson distribution function using multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. For illustrating the proposed methodologies, we present a real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation to estimate unknown parameters of the models.

Keywords: copula, Gaussian copula, multivariate distributions, inflated distributios

Procedia PDF Downloads 156

687 Genome-Wide Significant SNPs Proximal to Nicotinic Receptor Genes Impact Cognition in Schizophrenia

Authors: Mohammad Ahangari

Abstract:

Schizophrenia is a psychiatric disorder with symptoms that include cognitive deficits and nicotine has been suggested to have an effect on cognition. In recent years, the advents of Genome-Wide Association Studies(GWAS) has evolved our understanding about the genetic causes of complex disorders such as schizophrenia and studying the role of genome-wide significant genes could potentially lead to the development of new therapeutic agents for treatment of cognitive deficits in schizophrenia. The current study identified six Single Nucleotide Polymorphisms (SNP) from schizophrenia and smoking GWAS that are located on or in close proximity to the nicotinic receptor gene cluster (CHRN) and studied their association with cognition in an Irish sample of 1297 cases and controls using linear regression analysis. Further on, the interaction between CHRN gene cluster and Dopamine receptor D2 gene (DRD2) during working memory was investigated. The effect of these polymorphisms on nicotinic and dopaminergic neurotransmission, which is disrupted in schizophrenia, have been characterized in terms of their effects on memory, attention, social cognition and IQ as measured by a neuropsychological test battery and significant effects in two polymorphisms were found across global IQ domain of the test battery.

Keywords: cognition, dopamine, GWAS, nicotine, schizophrenia, SNPs

Procedia PDF Downloads 346

686 The Modality of Multivariate Skew Normal Mixture

Authors: Bader Alruwaili, Surajit Ray

Abstract:

Finite mixtures are a flexible and powerful tool that can be used for univariate and multivariate distributions, and a wide range of research analysis has been conducted based on the multivariate normal mixture and multivariate of a t-mixture. Determining the number of modes is an important activity that, in turn, allows one to determine the number of homogeneous groups in a population. Our work currently being carried out relates to the study of the modality of the skew normal distribution in the univariate and multivariate cases. For the skew normal distribution, the aims are associated with studying the modality of the skew normal distribution and providing the ridgeline, the ridgeline elevation function, the $\Pi$ function, and the curvature function, and this will be conducive to an exploration of the number and location of mode when mixing the two components of skew normal distribution. The subsequent objective is to apply these results to the application of real world data sets, such as flow cytometry data.

Keywords: mode, modality, multivariate skew normal, finite mixture, number of mode

Procedia PDF Downloads 490

685 Generalized Correlation Coefficient in Genome-Wide Association Analysis of Cognitive Ability in Twins

Authors: Afsaneh Mohammadnejad, Marianne Nygaard, Jan Baumbach, Shuxia Li, Weilong Li, Jesper Lund, Jacob v. B. Hjelmborg, Lene Christensen, Qihua Tan

Abstract:

Cognitive impairment in the elderly is a key issue affecting the quality of life. Despite a strong genetic background in cognition, only a limited number of single nucleotide polymorphisms (SNPs) have been found. These explain a small proportion of the genetic component of cognitive function, thus leaving a large proportion unaccounted for. We hypothesize that one reason for this missing heritability is the misspecified modeling in data analysis concerning phenotype distribution as well as the relationship between SNP dosage and the phenotype of interest. In an attempt to overcome these issues, we introduced a model-free method based on the generalized correlation coefficient (GCC) in a genome-wide association study (GWAS) of cognitive function in twin samples and compared its performance with two popular linear regression models. The GCC-based GWAS identified two genome-wide significant (P-value < 5e-8) SNPs; rs2904650 near ZDHHC2 on chromosome 8 and rs111256489 near CD6 on chromosome 11. The kinship model also detected two genome-wide significant SNPs, rs112169253 on chromosome 4 and rs17417920 on chromosome 7, whereas no genome-wide significant SNPs were found by the linear mixed model (LME). Compared to the linear models, more meaningful biological pathways like GABA receptor activation, ion channel transport, neuroactive ligand-receptor interaction, and the renin-angiotensin system were found to be enriched by SNPs from GCC. The GCC model outperformed the linear regression models by identifying more genome-wide significant genetic variants and more meaningful biological pathways related to cognitive function. Moreover, GCC-based GWAS was robust in handling genetically related twin samples, which is an important feature in handling genetic confounding in association studies.

Keywords: cognition, generalized correlation coefficient, GWAS, twins

Procedia PDF Downloads 127

684 Elucidating the Genetic Determinism of Seed Protein Plasticity in Response to the Environment Using Medicago truncatula

Authors: K. Cartelier, D. Aime, V. Vernoud, J. Buitink, J. M. Prosperi, K. Gallardo, C. Le Signor

Abstract:

Legumes can produce protein-rich seeds without nitrogen fertilizer through root symbiosis with nitrogen-fixing rhizobia. Rich in lysine, these proteins are used for human nutrition and animal feed. However, the instability of seed protein yield and quality due to environmental fluctuations limits the wider use of legumes such as pea. Breeding efforts are needed to optimize and stabilize seed nutritional value, which requires to identify the genetic determinism of seed protein plasticity in response to the environment. Towards this goal, we have studied the plasticity of protein content and composition of seeds from a collection of 200 Medicago truncatula ecotypes grown under four controlled conditions (optimal, drought, and winter/spring sowing). A quantitative analysis of one-dimensional protein profiles of these mature seeds was performed and plasticity indices were calculated from each abundant protein band. Genome-Wide Association Studies (GWAS) from these data identified major GWAS hotspots, from which a list of candidate genes was obtained. A Gene Ontology Enrichment Analysis revealed an over-representation of genes involved in several amino acid metabolic pathways. This led us to propose that environmental variations are likely to modulate amino acid balance, thus impacting seed protein composition. The selection of candidate genes for controlling the plasticity of seed protein composition was refined using transcriptomics data from developing Medicago truncatula seeds. The pea orthologs of key genes were identified for functional studies by mean of TILLING (Targeting Induced Local Lesions in Genomes) lines in this crop. We will present how this study highlighted mechanisms that could govern seed protein plasticity, providing new cues towards the stabilization of legume seed quality.

Keywords: GWAS, Medicago truncatula, plasticity, seed, storage proteins

Procedia PDF Downloads 142

683 An AK-Chart for the Non-Normal Data

Authors: Chia-Hau Liu, Tai-Yue Wang

Abstract:

Traditional multivariate control charts assume that measurement from manufacturing processes follows a multivariate normal distribution. However, this assumption may not hold or may be difficult to verify because not all the measurement from manufacturing processes are normal distributed in practice. This study develops a new multivariate control chart for monitoring the processes with non-normal data. We propose a mechanism based on integrating the one-class classification method and the adaptive technique. The adaptive technique is used to improve the sensitivity to small shift on one-class classification in statistical process control. In addition, this design provides an easy way to allocate the value of type I error so it is easier to be implemented. Finally, the simulation study and the real data from industry are used to demonstrate the effectiveness of the propose control charts.

Keywords: multivariate control chart, statistical process control, one-class classification method, non-normal data

Procedia PDF Downloads 423

682 Multivariate Analysis of Spectroscopic Data for Agriculture Applications

Authors: Asmaa M. Hussein, Amr Wassal, Ahmed Farouk Al-Sadek, A. F. Abd El-Rahman

Abstract:

In this study, a multivariate analysis of potato spectroscopic data was presented to detect the presence of brown rot disease or not. Near-Infrared (NIR) spectroscopy (1,350-2,500 nm) combined with multivariate analysis was used as a rapid, non-destructive technique for the detection of brown rot disease in potatoes. Spectral measurements were performed in 565 samples, which were chosen randomly at the infection place in the potato slice. In this study, 254 infected and 311 uninfected (brown rot-free) samples were analyzed using different advanced statistical analysis techniques. The discrimination performance of different multivariate analysis techniques, including classification, pre-processing, and dimension reduction, were compared. Applying a random forest algorithm classifier with different pre-processing techniques to raw spectra had the best performance as the total classification accuracy of 98.7% was achieved in discriminating infected potatoes from control.

Keywords: Brown rot disease, NIR spectroscopy, potato, random forest

Procedia PDF Downloads 190

681 Genome-Wide Association Study Identify COL2A1 as a Susceptibility Gene for the Hand Development Failure of Kashin-Beck Disease

Authors: Feng Zhang

Abstract:

Kashin-Beck disease (KBD) is a chronic osteochondropathy. The mechanism of hand growth and development failure of KBD remains elusive now. In this study, we conducted a two-stage genome-wide association study (GWAS) of palmar length-width ratio (LWR) of KBD, totally involving 493 Chinese Han KBD patients. Affymetrix Genome Wide Human SNP Array 6.0 was applied for SNP genotyping. Association analysis was conducted by PLINK software. Imputation analysis was performed by IMPUTE against the reference panel of the 1000 genome project. In the GWAS, the most significant association was observed between palmar LWR and rs2071358 of COL2A1 gene (P value = 4.68×10-8). Imputation analysis identified 3 SNPs surrounding rs2071358 with significant or suggestive association signals. Replication study observed additional significant association signals at both rs2071358 (P value = 0.017) and rs4760608 (P value = 0.002) of COL2A1 gene after Bonferroni correction. Our results suggest that COL2A1 gene was a novel susceptibility gene involved in the growth and development failure of hand of KBD.

Keywords: Kashin-Beck disease, genome-wide association study, COL2A1, hand

Procedia PDF Downloads 220

680 The Moment of the Optimal Average Length of the Multivariate Exponentially Weighted Moving Average Control Chart for Equally Correlated Variables

Authors: Edokpa Idemudia Waziri, Salisu S. Umar

Abstract:

The Hotellng’s T^2 is a well-known statistic for detecting a shift in the mean vector of a multivariate normal distribution. Control charts based on T have been widely used in statistical process control for monitoring a multivariate process. Although it is a powerful tool, the T statistic is deficient when the shift to be detected in the mean vector of a multivariate process is small and consistent. The Multivariate Exponentially Weighted Moving Average (MEWMA) control chart is one of the control statistics used to overcome the drawback of the Hotellng’s T statistic. In this paper, the probability distribution of the Average Run Length (ARL) of the MEWMA control chart when the quality characteristics exhibit substantial cross correlation and when the process is in-control and out-of-control was derived using the Markov Chain algorithm. The derivation of the probability functions and the moments of the run length distribution were also obtained and they were consistent with some existing results for the in-control and out-of-control situation. By simulation process, the procedure identified a class of ARL for the MEWMA control when the process is in-control and out-of-control. From our study, it was observed that the MEWMA scheme is quite adequate for detecting a small shift and a good way to improve the quality of goods and services in a multivariate situation. It was also observed that as the in-control average run length ARL0¬ or the number of variables (p) increases, the optimum value of the ARL0pt increases asymptotically and as the magnitude of the shift σ increases, the optimal ARLopt decreases. Finally, we use the example from the literature to illustrate our method and demonstrate its efficiency.

Keywords: average run length, markov chain, multivariate exponentially weighted moving average, optimal smoothing parameter

Procedia PDF Downloads 422

679 A Cohort and Empirical Based Multivariate Mortality Model

Authors: Jeffrey Tzu-Hao Tsai, Yi-Shan Wong

Abstract:

This article proposes a cohort-age-period (CAP) model to characterize multi-population mortality processes using cohort, age, and period variables. Distinct from the factor-based Lee-Carter-type decomposition mortality model, this approach is empirically based and includes the age, period, and cohort variables into the equation system. The model not only provides a fruitful intuition for explaining multivariate mortality change rates but also has a better performance in forecasting future patterns. Using the US and the UK mortality data and performing ten-year out-of-sample tests, our approach shows smaller mean square errors in both countries compared to the models in the literature.

Keywords: longevity risk, stochastic mortality model, multivariate mortality rate, risk management

Procedia PDF Downloads 56

678 Introduction of Robust Multivariate Process Capability Indices

Authors: Behrooz Khalilloo, Hamid Shahriari, Emad Roghanian

Abstract:

Process capability indices (PCIs) are important concepts of statistical quality control and measure the capability of processes and how much processes are meeting certain specifications. An important issue in statistical quality control is parameter estimation. Under the assumption of multivariate normality, the distribution parameters, mean vector and variance-covariance matrix must be estimated, when they are unknown. Classic estimation methods like method of moment estimation (MME) or maximum likelihood estimation (MLE) makes good estimation of the population parameters when data are not contaminated. But when outliers exist in the data, MME and MLE make weak estimators of the population parameters. So we need some estimators which have good estimation in the presence of outliers. In this work robust M-estimators for estimating these parameters are used and based on robust parameter estimators, robust process capability indices are introduced. The performances of these robust estimators in the presence of outliers and their effects on process capability indices are evaluated by real and simulated multivariate data. The results indicate that the proposed robust capability indices perform much better than the existing process capability indices.

Keywords: multivariate process capability indices, robust M-estimator, outlier, multivariate quality control, statistical quality control

Procedia PDF Downloads 284

677 A Non-parametric Clustering Approach for Multivariate Geostatistical Data

Authors: Francky Fouedjio

Abstract:

Multivariate geostatistical data have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are more similar while clusters are different from each other, in some sense. Spatially contiguous clusters can significantly improve the interpretation that turns the resulting clusters into meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the spatial dependence structure of data. It integrates existing methods to find the optimal cluster number and to evaluate the contribution of variables to the clustering. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is assessed using bivariate synthetic dataset and multivariate geochemical dataset. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.

Keywords: clustering, geostatistics, multivariate data, non-parametric

Procedia PDF Downloads 477

676 On the Bootstrap P-Value Method in Identifying out of Control Signals in Multivariate Control Chart

Authors: O. Ikpotokin

Abstract:

In any production process, every product is aimed to attain a certain standard, but the presence of assignable cause of variability affects our process, thereby leading to low quality of product. The ability to identify and remove this type of variability reduces its overall effect, thereby improving the quality of the product. In case of a univariate control chart signal, it is easy to detect the problem and give a solution since it is related to a single quality characteristic. However, the problems involved in the use of multivariate control chart are the violation of multivariate normal assumption and the difficulty in identifying the quality characteristic(s) that resulted in the out of control signals. The purpose of this paper is to examine the use of non-parametric control chart (the bootstrap approach) for obtaining control limit to overcome the problem of multivariate distributional assumption and the p-value method for detecting out of control signals. Results from a performance study show that the proposed bootstrap method enables the setting of control limit that can enhance the detection of out of control signals when compared, while the p-value method also enhanced in identifying out of control variables.

Keywords: bootstrap control limit, p-value method, out-of-control signals, p-value, quality characteristics

Procedia PDF Downloads 348

675 The Use of Boosted Multivariate Trees in Medical Decision-Making for Repeated Measurements

Authors: Ebru Turgal, Beyza Doganay Erdogan

Abstract:

Machine learning aims to model the relationship between the response and features. Medical decision-making researchers would like to make decisions about patients’ course and treatment, by examining the repeated measurements over time. Boosting approach is now being used in machine learning area for these aims as an influential tool. The aim of this study is to show the usage of multivariate tree boosting in this field. The main reason for utilizing this approach in the field of decision-making is the ease solutions of complex relationships. To show how multivariate tree boosting method can be used to identify important features and feature-time interaction, we used the data, which was collected retrospectively from Ankara University Chest Diseases Department records. Dataset includes repeated PF ratio measurements. The follow-up time is planned for 120 hours. A set of different models is tested. In conclusion, main idea of classification with weighed combination of classifiers is a reliable method which was shown with simulations several times. Furthermore, time varying variables will be taken into consideration within this concept and it could be possible to make accurate decisions about regression and survival problems.

Keywords: boosted multivariate trees, longitudinal data, multivariate regression tree, panel data

Procedia PDF Downloads 203

674 From Genome to Field: Applying Genome Wide Association Study for Sustainable Ascochyta Blight Management in Faba Beans

Authors: Rabia Faridi, Rizwana Maqbool, Umara Sahar Rana, Zaheer Ahmad

Abstract:

Climate change impacts agriculture, notably in Germany, where spring faba beans predominate. However, improved winter hardiness aligns with milder winters, enabling autumn-sown varieties. Genetic resistance to Ascochyta blight is vital for crop integration. Traditional breeding faces challenges due to complex inheritance. This study assessed 224 homozygous faba bean lines for Ascochyta resistance traits. To achieve h²>70%, 12 replicates were required (realized h²=87%). Genetic variation and strong trait correlations were observed. Five lines outperformed 29H, while three were highly susceptible. A genome-wide association study (GWAS) with 188 inbred lines and 2058 markers, including 17 guide SNP markers, identified 12 markers associated with resistance traits, potentially indicating new resistance genes. One guide marker (Vf-Mt1g014230-001) on chromosome III validated a known QTL. The guided marker approach complemented GWAS, facilitating marker-assisted selection for Ascochyta resistance. The Göttingen Winter Bean Population offers promise for resistance breeding.

Keywords: genome wide association studies, marker assisted breeding, faba bean, ascochyta blight

Procedia PDF Downloads 60

673 Analyzing the Influence of Hydrometeorlogical Extremes, Geological Setting, and Social Demographic on Public Health

Authors: Irfan Ahmad Afip

Abstract:

This main research objective is to accurately identify the possibility for a Leptospirosis outbreak severity of a certain area based on its input features into a multivariate regression model. The research question is the possibility of an outbreak in a specific area being influenced by this feature, such as social demographics and hydrometeorological extremes. If the occurrence of an outbreak is being subjected to these features, then the epidemic severity for an area will be different depending on its environmental setting because the features will influence the possibility and severity of an outbreak. Specifically, this research objective was three-fold, namely: (a) to identify the relevant multivariate features and visualize the patterns data, (b) to develop a multivariate regression model based from the selected features and determine the possibility for Leptospirosis outbreak in an area, and (c) to compare the predictive ability of multivariate regression model and machine learning algorithms. Several secondary data features were collected locations in the state of Negeri Sembilan, Malaysia, based on the possibility it would be relevant to determine the outbreak severity in the area. The relevant features then will become an input in a multivariate regression model; a linear regression model is a simple and quick solution for creating prognostic capabilities. A multivariate regression model has proven more precise prognostic capabilities than univariate models. The expected outcome from this research is to establish a correlation between the features of social demographic and hydrometeorological with Leptospirosis bacteria; it will also become a contributor for understanding the underlying relationship between the pathogen and the ecosystem. The relationship established can be beneficial for the health department or urban planner to inspect and prepare for future outcomes in event detection and system health monitoring.

Keywords: geographical information system, hydrometeorological, leptospirosis, multivariate regression

Procedia PDF Downloads 117

672 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture

Authors: Thrivikraman Aswathi, S. Advaith

Abstract:

As there can be cases where the use of real data is somehow limited, such as when it is hard to get access to a large volume of real data, we need to go for synthetic data generation. This produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data since the MTS is now being gathered more often in various real-world systems. Furthermore, the GAN-generated MTS data is fed into a transformer-based deep learning architecture that carries out the data categorization into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.

Keywords: GAN, transformer, classification, multivariate time series

Procedia PDF Downloads 131

671 Multi-scale Spatial and Unified Temporal Feature-fusion Network for Multivariate Time Series Anomaly Detection

Authors: Hang Yang, Jichao Li, Kewei Yang, Tianyang Lei

Abstract:

Multivariate time series anomaly detection is a significant research topic in the field of data mining, encompassing a wide range of applications across various industrial sectors such as traffic roads, financial logistics, and corporate production. The inherent spatial dependencies and temporal characteristics present in multivariate time series introduce challenges to the anomaly detection task. Previous studies have typically been based on the assumption that all variables belong to the same spatial hierarchy, neglecting the multi-level spatial relationships. To address this challenge, this paper proposes a multi-scale spatial and unified temporal feature fusion network, denoted as MSUT-Net, for multivariate time series anomaly detection. The proposed model employs a multi-level modeling approach, incorporating both temporal and spatial modules. The spatial module is designed to capture the spatial characteristics of multivariate time series data, utilizing an adaptive graph structure learning model to identify the multi-level spatial relationships between data variables and their attributes. The temporal module consists of a unified temporal processing module, which is tasked with capturing the temporal features of multivariate time series. This module is capable of simultaneously identifying temporal dependencies among different variables. Extensive testing on multiple publicly available datasets confirms that MSUT-Net achieves superior performance on the majority of datasets. Our method is able to model and accurately detect systems data with multi-level spatial relationships from a spatial-temporal perspective, providing a novel perspective for anomaly detection analysis.

Keywords: data mining, industrial system, multivariate time series, anomaly detection

Procedia PDF Downloads 17

670 Large-scale GWAS Investigating Genetic Contributions to Queerness Will Decrease Stigma Against LGBTQ+ Communities

Authors: Paul J. McKay

Abstract:

Large-scale genome-wide association studies (GWAS) investigating genetic contributions to sexual orientation and gender identity are largely lacking and may reduce stigma experienced in the LGBTQ+ community by providing an underlying biological explanation for queerness. While there is a growing consensus within the scientific community that genetic makeup contributes – at least in part – to sexual orientation and gender identity, there is a marked lack of genomics research exploring polygenic contributions to queerness. Based on recent (2019) findings from a large-scale GWAS investigating the genetic architecture of same-sex sexual behavior, and various additional peer-reviewed publications detailing novel insights into the molecular mechanisms of sexual orientation and gender identity, we hypothesize that sexual orientation and gender identity are complex, multifactorial, and polygenic; meaning that many genetic factors contribute to these phenomena, and environmental factors play a possible role through epigenetic modulation. In recent years, large-scale GWAS studies have been paramount to our modern understanding of many other complex human traits, such as in the case of autism spectrum disorder (ASD). Despite possible benefits of such research, including reduced stigma towards queer people, improved outcomes for LGBTQ+ in familial, socio-cultural, and political contexts, and improved access to healthcare (particularly for trans populations); important risks and considerations remain surrounding this type of research. To mitigate possibilities such as invalidation of the queer identities of existing LGBTQ+ individuals, genetic discrimination, or the possibility of euthanasia of embryos with a genetic predisposition to queerness (through reproductive technologies like IVF and/or gene-editing in utero), we propose a community-engaged research (CER) framework which emphasizes the privacy and confidentiality of research participants. Importantly, the historical legacy of scientific research attempting to pathologize queerness (in particular, falsely equating gender variance to mental illness) must be acknowledged to ensure any future research conducted in this realm does not propagate notions of homophobia, transphobia or stigma against queer people. Ultimately, in a world where same-sex sexual activity is criminalized in 69 UN member states, with 67 of these states imposing imprisonment, 8 imposing public flogging, 6 (Brunei, Iran, Mauritania, Nigeria, Saudi Arabia, Yemen) invoking the death penalty, and another 5 (Afghanistan, Pakistan, Qatar, Somalia, United Arab Emirates) possibly invoking the death penalty, the importance of this research cannot be understated, as finding a biological basis for queerness would directly oppose the harmful rhetoric that “being LGBTQ+ is a choice.” Anti-trans legislation is similarly widespread: In the United States in 2022 alone (as of Oct. 13), 155 anti-trans bills have been introduced preventing trans girls and women from playing on female sports teams, barring trans youth from using bathrooms and locker rooms that align with their gender identity, banning access to gender affirming medical care (e.g., hormone-replacement therapy, gender-affirming surgeries), and imposing legal restrictions on name changes. Understanding that a general lack of knowledge about the biological basis of queerness may be a contributing factor to the societal stigma faced by gender and sexual orientation minorities, we propose the initiation of large-scale GWAS studies investigating the genetic basis of gender identity and sexual orientation.

Keywords: genome-wide association studies (GWAS), sexual and gender minorities (SGM), polygenicity, community-engaged research (CER)

Procedia PDF Downloads 70

669 Multivariate Dependent Frequency-Severity Modeling of Insurance Claims: A Vine Copula Approach

Authors: Islem Kedidi, Rihab Bedoui Bensalem, Faysal Manssouri

Abstract:

In traditional models of insurance data, the number and size of claims are assumed to be independent. Relaxing the independence assumption, this article explores the Vine copula to model dependence structure between multivariate frequency and average severity of insurance claim. To illustrate this approach, we use the Wisconsin local government property insurance fund which offers several insurance protections for motor vehicles, property and contractor’s equipment claims. Results show that the C-vine copula can better characterize the multivariate dependence structure between frequency and severity. Furthermore, we find significant dependencies especially between frequency and average severity among different coverage types.

Keywords: dependency modeling, government insurance, insurance claims, vine copula

Procedia PDF Downloads 210

668 On the Impact of Oil Price Fluctuations on Stock Markets: A Multivariate Long-Memory GARCH Framework

Authors: Manel Youssef, Lotfi Belkacem

Abstract:

This paper employs multivariate long memory GARCH models to simultaneously estimate mean and conditional variance spillover effects between oil prices and different financial markets. Since different financial assets are traded based on these market sector returns, it’s important for financial market participants to understand the volatility transmission mechanism over time and across these series in order to make optimal portfolio allocation decisions. We examine weekly returns from January 1, 2003 to November 30, 2012 and find evidence of significant transmission of shocks and volatilities between oil prices and some of the examined financial markets. The findings support the idea of cross-market hedging and sharing of common information by investors.

Keywords: oil prices, stock indices returns, oil volatility, contagion, DCC-multivariate (FI) GARCH

Procedia PDF Downloads 534

667 Model of Optimal Centroids Approach for Multivariate Data Classification

Authors: Pham Van Nha, Le Cam Binh

Abstract:

Particle swarm optimization (PSO) is a population-based stochastic optimization algorithm. PSO was inspired by the natural behavior of birds and fish in migration and foraging for food. PSO is considered as a multidisciplinary optimization model that can be applied in various optimization problems. PSO’s ideas are simple and easy to understand but PSO is only applied in simple model problems. We think that in order to expand the applicability of PSO in complex problems, PSO should be described more explicitly in the form of a mathematical model. In this paper, we represent PSO in a mathematical model and apply in the multivariate data classification. First, PSOs general mathematical model (MPSO) is analyzed as a universal optimization model. Then, Model of Optimal Centroids (MOC) is proposed for the multivariate data classification. Experiments were conducted on some benchmark data sets to prove the effectiveness of MOC compared with several proposed schemes.

Keywords: analysis of optimization, artificial intelligence based optimization, optimization for learning and data analysis, global optimization

Procedia PDF Downloads 209

666 Genome-Wide Analysis Identifies Locus Associated with Parathyroid Hormone Levels

Authors: Antonela Matana, Dubravka Brdar, Vesela Torlak, Marijana Popovic, Ivana Gunjaca, Ozren Polasek, Vesna Boraska Perica, Maja Barbalic, Ante Punda, Caroline Hayward, Tatijana Zemunik

Abstract:

Parathyroid hormone (PTH) plays a critical role in the regulation of bone mineral metabolism and calcium homeostasis. Higher PTH levels are associated with heart failure, hypertension, coronary artery disease, cardiovascular mortality and poorer bone health. A twin study estimated that 60% of the variation in PTH concentrations is genetically determined. Only one GWAS of PTH concentration has been reported to date. Identified loci explained 4.5% of the variance in circulating PTH, suggesting that additional genetic variants remain undiscovered. Therefore, the aim of this study was to identify novel genetic variants associated with PTH levels in a general population. We have performed a GWAS meta-analysis on 2596 individuals originating from three Croatian cohorts: City of Split and the Islands of Korčula and Vis, within a large-scale project of “10,001 Dalmatians”. A total of 7 411 206 variants, imputed using the 1000 Genomes reference panel, with minor allele frequency ≥ 1% and Rsq ≥ 0.5 were analyzed for the association. GWAS within each data set was performed under an additive model, controlling for age, gender and relatedness. Meta-analysis was conducted using the inverse-variance fixed-effects method. Furthermore, to identify sex-specific effects, we have conducted GWAS meta-analyses analyzing males and females separately. In addition, we have performed biological pathway analysis. Four SNPs, representing one locus, reached genome-wide significance. The most significant SNP was rs11099476 on chromosome 4 (P=1.15x10-8), which explained 1.14 % of the variance in PTH. The SNP is located near the protein-coding gene RASGEF1B. Additionally, we detected suggestive association with SNPs, rs77178854 located on chromosome 2 in the DPP10 gene (P=2.46x10-7) and rs481121 located on chromosome 1 (P=3.58x10-7) near the GRIK1 gene. One of the top hits detected in the main meta-analysis, intron variant rs77178854 located within DPP10 gene, reached genome-wide significance in females (P=2.21x10-9). No single locus was identified in the meta-analysis in males. Fifteen biological pathways were functionally enriched at a P<0.01, including muscle contraction, ion homeostasis and cardiac conduction as the most significant pathways. RASGEF1B is the guanine nucleotide exchange factor, known to be associated with height, bone density, and hip. DPP10 encodes a membrane protein that is a member of the serine proteases family, which binds specific voltage-gated potassium channels and alters their expression and biophysical properties. In conclusion, we identified 2 novel loci associated with PTH levels in a general population, providing us with further insights into the genetics of this complex trait.

Keywords: general population, genome-wide association analysis, parathyroid hormone, single nucleotide polymorphisms.

Procedia PDF Downloads 226

665 Neutral Heavy Scalar Searches via Standard Model Gauge Boson Decays at the Large Hadron Electron Collider with Multivariate Techniques

Authors: Luigi Delle Rose, Oliver Fischer, Ahmed Hammad

Abstract:

In this article, we study the prospects of the proposed Large Hadron electron Collider (LHeC) in the search for heavy neutral scalar particles. We consider a minimal model with one additional complex scalar singlet that interacts with the Standard Model (SM) via mixing with the Higgs doublet, giving rise to an SM-like Higgs boson and a heavy scalar particle. Both scalar particles are produced via vector boson fusion and can be tested via their decays into pairs of SM particles, analogously to the SM Higgs boson. Using multivariate techniques, we show that the LHeC is sensitive to heavy scalars with masses between 200 and 800 GeV down to scalar mixing of order 0.01.

Keywords: beyond the standard model, large hadron electron collider, multivariate analysis, scalar singlet

Procedia PDF Downloads 137

664 Multivariate Statistical Process Monitoring of Base Metal Flotation Plant Using Dissimilarity Scale-Based Singular Spectrum Analysis

Authors: Syamala Krishnannair

Abstract:

A multivariate statistical process monitoring methodology using dissimilarity scale-based singular spectrum analysis (SSA) is proposed for the detection and diagnosis of process faults in the base metal flotation plant. Process faults are detected based on the multi-level decomposition of process signals by SSA using the dissimilarity structure of the process data and the subsequent monitoring of the multiscale signals using the unified monitoring index which combines T² with SPE. Contribution plots are used to identify the root causes of the process faults. The overall results indicated that the proposed technique outperformed the conventional multivariate techniques in the detection and diagnosis of the process faults in the flotation plant.

Keywords: fault detection, fault diagnosis, process monitoring, dissimilarity scale

Procedia PDF Downloads 209

663 Discrimination Between Bacillus and Alicyclobacillus Isolates in Apple Juice by Fourier Transform Infrared Spectroscopy and Multivariate Analysis

Authors: Murada Alholy, Mengshi Lin, Omar Alhaj, Mahmoud Abugoush

Abstract:

Alicyclobacillus is a causative agent of spoilage in pasteurized and heat-treated apple juice products. Differentiating between this genus and the closely related Bacillus is crucially important. In this study, Fourier transform infrared spectroscopy (FT-IR) was used to identify and discriminate between four Alicyclobacillus strains and four Bacillus isolates inoculated individually into apple juice. Loading plots over the range of 1350 and 1700 cm-1 reflected the most distinctive biochemical features of Bacillus and Alicyclobacillus. Multivariate statistical methods (e.g. principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA)) were used to analyze the spectral data. Distinctive separation of spectral samples was observed. This study demonstrates that FT-IR spectroscopy in combination with multivariate analysis could serve as a rapid and effective tool for fruit juice industry to differentiate between Bacillus and Alicyclobacillus and to distinguish between species belonging to these two genera.

Keywords: alicyclobacillus, bacillus, FT-IR, spectroscopy, PCA

Procedia PDF Downloads 489

662 Association between Polygenic Risk of Alzheimer's Dementia, Brain MRI and Cognition in UK Biobank

Authors: Rachana Tank, Donald. M. Lyall, Kristin Flegal, Joey Ward, Jonathan Cavanagh

Abstract:

Alzheimer’s research UK estimates by 2050, 2 million individuals will be living with Late Onset Alzheimer’s disease (LOAD). However, individuals experience considerable cognitive deficits and brain pathology over decades before reaching clinically diagnosable LOAD and studies have utilised gene candidate studies such as genome wide association studies (GWAS) and polygenic risk (PGR) scores to identify high risk individuals and potential pathways. This investigation aims to determine whether high genetic risk of LOAD is associated with worse brain MRI and cognitive performance in healthy older adults within the UK Biobank cohort. Previous studies investigating associations of PGR for LOAD and measures of MRI or cognitive functioning have focused on specific aspects of hippocampal structure, in relatively small sample sizes and with poor ‘controlling’ for confounders such as smoking. Both the sample size of this study and the discovery GWAS sample are bigger than previous studies to our knowledge. Genetic interaction between loci showing largest effects in GWAS have not been extensively studied and it is known that APOE e4 poses the largest genetic risk of LOAD with potential gene-gene and gene-environment interactions of e4, for this reason we also analyse genetic interactions of PGR with the APOE e4 genotype. High genetic loading based on a polygenic risk score of 21 SNPs for LOAD is associated with worse brain MRI and cognitive outcomes in healthy individuals within the UK Biobank cohort. Summary statistics from Kunkle et al., GWAS meta-analyses (case: n=30,344, control: n=52,427) will be used to create polygenic risk scores based on 21 SNPs and analyses will be carried out in N=37,000 participants in the UK Biobank. This will be the largest study to date investigating PGR of LOAD in relation to MRI. MRI outcome measures include WM tracts, structural volumes. Cognitive function measures include reaction time, pairs matching, trail making, digit symbol substitution and prospective memory. Interaction of the APOE e4 alleles and PGR will be analysed by including APOE status as an interaction term coded as either 0, 1 or 2 e4 alleles. Models will be adjusted partially for adjusted for age, BMI, sex, genotyping chip, smoking, depression and social deprivation. Preliminary results suggest PGR score for LOAD is associated with decreased hippocampal volumes including hippocampal body (standardised beta = -0.04, P = 0.022) and tail (standardised beta = -0.037, P = 0.030), but not with hippocampal head. There were also associations of genetic risk with decreased cognitive performance including fluid intelligence (standardised beta = -0.08, P<0.01) and reaction time (standardised beta = 2.04, P<0.01). No genetic interactions were found between APOE e4 dose and PGR score for MRI or cognitive measures. The generalisability of these results is limited by selection bias within the UK Biobank as participants are less likely to be obese, smoke, be socioeconomically deprived and have fewer self-reported health conditions when compared to the general population. Lack of a unified approach or standardised method for calculating genetic risk scores may also be a limitation of these analyses. Further discussion and results are pending.

Keywords: Alzheimer's dementia, cognition, polygenic risk, MRI

Procedia PDF Downloads 114

661 Fast Bayesian Inference of Multivariate Block-Nearest Neighbor Gaussian Process (NNGP) Models for Large Data

Authors: Carlos Gonzales, Zaida Quiroz, Marcos Prates

Abstract:

Several spatial variables collected at the same location that share a common spatial distribution can be modeled simultaneously through a multivariate geostatistical model that takes into account the correlation between these variables and the spatial autocorrelation. The main goal of this model is to perform spatial prediction of these variables in the region of study. Here we focus on a geostatistical multivariate formulation that relies on sharing common spatial random effect terms. In particular, the first response variable can be modeled by a mean that incorporates a shared random spatial effect, while the other response variables depend on this shared spatial term, in addition to specific random spatial effects. Each spatial random effect is defined through a Gaussian process with a valid covariance function, but in order to improve the computational efficiency when the data are large, each Gaussian process is approximated to a Gaussian random Markov field (GRMF), specifically to the block nearest neighbor Gaussian process (Block-NNGP). This approach involves dividing the spatial domain into several dependent blocks under certain constraints, where the cross blocks allow capturing the spatial dependence on a large scale, while each individual block captures the spatial dependence on a smaller scale. The multivariate geostatistical model belongs to the class of Latent Gaussian Models; thus, to achieve fast Bayesian inference, it is used the integrated nested Laplace approximation (INLA) method. The good performance of the proposed model is shown through simulations and applications for massive data.

Keywords: Block-NNGP, geostatistics, gaussian process, GRMF, INLA, multivariate models.

Procedia PDF Downloads 98

660 Effects of Video Games and Online Chat on Mathematics Performance in High School: An Approach of Multivariate Data Analysis

Authors: Lina Wu, Wenyi Lu, Ye Li

Abstract:

Regarding heavy video game players for boys and super online chat lovers for girls as a symbolic phrase in the current adolescent culture, this project of data analysis verifies the displacement effect on deteriorating mathematics performance. To evaluate correlation or regression coefficients between a factor of playing video games or chatting online and mathematics performance compared with other factors, we use multivariate analysis technique and take gender difference into account. We find the most important reason for the negative sign of the displacement effect on mathematics performance due to students’ poor academic background. Statistical analysis methods in this project could be applied to study internet users’ academic performance from the high school education to the college education.

Keywords: correlation coefficients, displacement effect, multivariate analysis technique, regression coefficients

Procedia PDF Downloads 367