Search results for: genotype imputation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 316

316 Linkage Disequilibrium and Haplotype Blocks Study from Two High-Density Panels and a Combined Panel in Nelore Beef Cattle

Authors: Priscila A. Bernardes, Marcos E. Buzanskas, Luciana C. A. Regitano, Ricardo V. Ventura, Danisio P. Munari

Abstract:

Genotype imputation has been used to reduce genomic selection costs. To increase haplotype detection accuracy in methods that consider linkage disequilibrium, another approach could be used, such as combining genotype data from different panels. Therefore, this study aimed to evaluate the linkage disequilibrium and haplotype blocks in two high-density panels before and after imputation to a combined panel in Nelore beef cattle. A total of 814 animals were genotyped with the Illumina BovineHD BeadChip (IHD), of which 93 animals (23 bulls and 70 progeny) were also genotyped with the Affymetrix Axiom Genome-Wide BOS 1 Array Plate (AHD). After quality control, 809 IHD animals (509,107 SNPs) and 93 AHD animals (427,875 SNPs) remained for analysis. The combined genotype panel (CP) was constructed by merging both panels after quality control, resulting in 880,336 SNPs. Imputation was conducted using the software FImpute v.2.2b. The reference (CP) and target (IHD) populations consisted of 23 bulls and 786 animals, respectively. The linkage disequilibrium and haplotype block analyses were carried out for IHD, AHD, and imputed CP. Two linkage disequilibrium measures were considered: the squared correlation coefficient between alleles at two loci (r²) and |D’|. Both measures were calculated using the software PLINK. The haplotype blocks were estimated using the software Haploview. The decay of r² differed from that of |D’|, for which AHD and IHD showed almost the same decay. For r², even with possible overestimation due to the sample size for AHD (93 animals), the IHD presented higher values than AHD at shorter distances, but as distance increased, both panels presented similar values. The r² measurement is influenced by the minor allele frequency of the pair of SNPs, which can cause the observed difference between the r² decay and the |D’| decay. 
As a sum of the combinations between the Illumina and Affymetrix panels, the CP presented a decay equivalent to a mean of these combinations. The numbers of haplotype blocks detected for IHD, AHD, and CP were 84,529, 63,967, and 140,336, respectively. The IHD haplotype blocks had a mean length of 137.70 ± 219.05 kb, the AHD blocks 102.10 ± 155.47 kb, and the CP blocks 107.10 ± 169.14 kb. The majority of the haplotype blocks in these three panels were composed of fewer than 10 SNPs, with only 3,882 (IHD), 193 (AHD), and 8,462 (CP) haplotype blocks composed of 10 SNPs or more. There was an increase in the number of chromosomes covered with long haplotypes when CP was used, as well as an increase in haplotype coverage for short chromosomes (23-29), which can contribute to studies that explore haplotype blocks. In general, using CP could be an alternative to increase density and the number of haplotype blocks, increasing the probability of obtaining a marker close to a quantitative trait locus of interest.
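
The two linkage disequilibrium measures named in this abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of D, |D’|, and r² for two biallelic loci; the function name and the example frequencies are hypothetical, and this is not the PLINK implementation used by the authors.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    All inputs are hypothetical illustration values, not data from the study.
    """
    d = p_ab - p_a * p_b
    # D is scaled by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Allele B (frequency 0.1) is found only on haplotypes that carry allele A (frequency 0.5).
d, d_prime, r2 = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
print(round(d_prime, 3), round(r2, 3))
```

In this example |D’| reaches its maximum of 1 while r² stays near 0.11, because the two allele frequencies differ. This is one way to see why r², which depends on minor allele frequency, can decay differently from |D’|.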

Keywords: Bos taurus indicus, decay, genotype imputation, single nucleotide polymorphism

Procedia PDF Downloads 248
315 Two-Phase Sampling for Estimating a Finite Population Total in Presence of Missing Values

Authors: Daniel Fundi Murithi

Abstract:

Missing data is a real bane in many surveys. To overcome the problems caused by missing data, partial deletion and single imputation methods, among others, have been proposed. However, these methods are associated with problems such as discarding usable data and inaccuracy in reproducing known population parameters and standard errors. Regression and stochastic imputation assume that there is a variable with complete cases that can be used as a predictor for estimating missing values in the other variable, and that the relationship between the two variables is linear, which might not be realistic in practice. In this project, we estimate the population total in the presence of missing values in two-phase sampling. Instead of regression or stochastic models, a nonparametric model-based regression is used to impute missing values. An empirical study showed that nonparametric model-based regression imputation is better than mean, median, regression, and stochastic imputation at reproducing the variance of the population total estimate obtained when there were no missing values. Although regression and stochastic imputation were better than nonparametric model-based imputation at reproducing the population total estimates obtained when there were no missing values in one of the sample sizes considered, nonparametric model-based imputation may be used when the relationship between outcome and predictor variables is not linear.
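
As an illustration of nonparametric regression imputation, the sketch below fills missing outcomes with a Nadaraya-Watson kernel estimate. The abstract does not say which nonparametric estimator was used, so the Gaussian kernel, the bandwidth, and the function name are assumptions made only for this example.

```python
import math


def kernel_regression_impute(x_obs, y_obs, x_missing, bandwidth=1.0):
    """Impute y at each x in x_missing with a Nadaraya-Watson estimate.

    Assumption: Gaussian kernel with a fixed, user-chosen bandwidth. If a target
    point lies very far from all observed points, the weights can underflow to
    zero; this sketch does not handle that case.
    """
    imputed = []
    for x0 in x_missing:
        weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in x_obs]
        total = sum(weights)
        imputed.append(sum(w * y for w, y in zip(weights, y_obs)) / total)
    return imputed


# Hypothetical auxiliary variable x and outcome y; y is missing at x = 1.
filled = kernel_regression_impute([0.0, 2.0], [0.0, 4.0], [1.0])
print(filled)  # the two observed points are equidistant, so the estimate is their mean
```

No linear relationship between x and y is assumed here; the estimate is a locally weighted average, which is the property the abstract relies on when the relationship is not linear.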

Keywords: finite population total, missing data, model-based imputation, two-phase sampling

Procedia PDF Downloads 100
314 Effect of Genuine Missing Data Imputation on Prediction of Urinary Incontinence

Authors: Suzan Arslanturk, Mohammad-Reza Siadat, Theophilus Ogunyemi, Ananias Diokno

Abstract:

Missing data is a common challenge in statistical analyses of most clinical survey datasets. A variety of methods have been developed to enable analysis of survey data with missing values. Imputation is the most commonly used of these methods. However, to minimize the bias introduced by imputation, one must choose the right imputation technique and apply it to the correct type of missing data. In this paper, we have identified different types of missing values: missing data due to skip pattern (SPMD), undetermined missing data (UMD), and genuine missing data (GMD), and applied rough set imputation to only the GMD portion of the missing data. We have used rough set imputation to evaluate the effect of such imputation on prediction by generating several simulation datasets based on an existing epidemiological dataset (MESA). To measure how well each dataset lends itself to the prediction model (logistic regression), we have used p-values from the Wald test. To evaluate the accuracy of the prediction, we have considered the width of the 95% confidence interval for the probability of incontinence. Both imputed and non-imputed simulation datasets were fit to the prediction model, and both turned out to be significant (p-value < 0.05). However, the Wald score shows a better fit for the imputed than for the non-imputed datasets (28.7 vs. 23.4). The average confidence interval width decreased by 10.4% when the imputed dataset was used, meaning higher precision. The results show that using the rough set method for missing data imputation on GMD data improves the predictive capability of the logistic regression. Further studies are required to generalize this conclusion to other clinical survey datasets.

Keywords: rough set, imputation, clinical survey data simulation, genuine missing data, predictive index

Procedia PDF Downloads 137
313 A Neural Network Based Clustering Approach for Imputing Multivariate Values in Big Data

Authors: S. Nickolas, Shobha K.

Abstract:

The treatment of incomplete data is an important step in data pre-processing. Missing values create a noisy environment in all applications and are an unavoidable problem in big data management and analysis. Numerous techniques, such as discarding rows with missing values, mean imputation, expectation maximization, neural networks with evolutionary algorithms or optimized techniques, and hot deck imputation, have been introduced by researchers for handling missing data. Among these, imputation techniques play a positive role in filling missing values when it is necessary to use all records in the data rather than discard records with missing values. In this paper, we propose a novel artificial neural network based clustering algorithm, Adaptive Resonance Theory-2 (ART2), for imputation of missing values in mixed-attribute data sets. The ART2 process can recognize learned models quickly and adapt to new objects rapidly. It carries out model-based clustering by using competitive learning and a self-steady mechanism in a dynamic environment without supervision. The proposed approach not only imputes the missing values but also provides information about handling outliers.

Keywords: ART2, data imputation, clustering, missing data, neural network, pre-processing

Procedia PDF Downloads 245
312 Imputing Missing Data in Electronic Health Records: A Comparison of Linear and Non-Linear Imputation Models

Authors: Alireza Vafaei Sadr, Vida Abedi, Jiang Li, Ramin Zand

Abstract:

Missing data is a common challenge in medical research and can lead to biased or incomplete results. When the data bias leaks into models, it further exacerbates health disparities; biased algorithms can lead to misclassification and reduced resource allocation and monitoring as part of prevention strategies for certain minorities and vulnerable segments of patient populations, which in turn further reduces the data footprint of the same population, creating a vicious cycle. This study compares the performance of six imputation techniques grouped into Linear and Non-Linear models on two different real-world electronic health record (EHR) datasets, representing 17,864 patient records. The mean absolute percentage error (MAPE) and root mean squared error (RMSE) are used as performance metrics, and the results show that the Linear models outperformed the Non-Linear models on both metrics. These results suggest that Linear models might sometimes be an optimal choice for imputation of laboratory variables in terms of imputation efficiency and uncertainty of predicted values.
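
The two performance metrics named in this abstract have standard definitions, sketched below for a set of held-out true values and their imputed counterparts. The function names and the example laboratory values are hypothetical; MAPE is undefined when a true value is zero, which this sketch assumes does not occur.

```python
import math


def rmse(true_values, imputed_values):
    """Root mean squared error between true and imputed values."""
    n = len(true_values)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / n)


def mape(true_values, imputed_values):
    """Mean absolute percentage error, in percent (assumes no true value is zero)."""
    n = len(true_values)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(true_values, imputed_values)) / n


# Hypothetical laboratory values that were masked and then imputed.
true_values = [100.0, 200.0]
imputed_values = [110.0, 180.0]
print(rmse(true_values, imputed_values), mape(true_values, imputed_values))
```

RMSE is expressed in the units of the variable and penalizes large errors more heavily, while MAPE is scale-free, so the two metrics can rank imputation models differently.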

Keywords: EHR, machine learning, imputation, laboratory variables, algorithmic bias

Procedia PDF Downloads 46
311 A Large Dataset Imputation Approach Applied to Country Conflict Prediction Data

Authors: Benjamin Leiby, Darryl Ahner

Abstract:

This study demonstrates an alternative stochastic imputation approach for large datasets when preferred commercial packages struggle to iterate due to numerical problems. A large country conflict dataset motivates the search to impute missing values well over a common threshold of 20% missingness. The methodology capitalizes on correlation while using model residuals to provide the uncertainty in estimating unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Static tolerances common in most packages are replaced with tailorable tolerances that exploit residuals to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the country conflict dataset illustrates promise with modeling first-order interactions while presenting a need for further refinement that mimics predictive mean matching.
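
The general idea of stochastic regression imputation, in which model residuals supply the uncertainty of each imputed value, can be sketched as follows. This is a generic single-predictor illustration and not the authors' methodology; the function name, the residual-resampling choice, and the data are assumptions.

```python
import random


def stochastic_regression_impute(x, y, seed=0):
    """Fill None entries of y with a fitted linear prediction plus a resampled residual.

    Assumptions: one fully observed predictor x, ordinary least squares fitted
    on the complete cases, and noise drawn from the observed residuals.
    """
    rng = random.Random(seed)
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in pairs]
    # Observed values are kept; each missing value gets prediction + random residual.
    return [
        b if b is not None else intercept + slope * a + rng.choice(residuals)
        for a, b in zip(x, y)
    ]


completed = stochastic_regression_impute([1, 2, 3, 4, 5], [2.1, 3.9, None, 8.2, 9.8])
print(completed)
```

Adding a residual rather than using the bare prediction keeps the imputed values from being artificially smooth, which is what distinguishes stochastic from deterministic regression imputation.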

Keywords: correlation, country conflict, imputation, stochastic regression

Procedia PDF Downloads 89
310 Energy Complementary in Colombia: Imputation of Dataset

Authors: Felipe Villegas-Velasquez, Harold Pantoja-Villota, Sergio Holguin-Cardona, Alejandro Osorio-Botero, Brayan Candamil-Arango

Abstract:

Colombian electricity comes mainly from hydric resources, which are affected by environmental variations such as the El Niño phenomenon. That is why incorporating other types of resources is necessary to provide electricity constantly. This research seeks to fill the wind speed and global solar irradiance dataset for the two years with the highest amount of information. A further result is the characterization of the data by region, which made it possible to infer which errors occurred and produced the incomplete dataset.

Keywords: energy, wind speed, global solar irradiance, Colombia, imputation

Procedia PDF Downloads 109
309 Genetic Variability in Advanced Derivatives of Interspecific Hybrids in Brassica

Authors: Yasir Ali, Farhatullah

Abstract:

The present study was conducted to estimate the genetic variability, heritability, and genetic advance in six parental lines and their 56 genotypes derived from five introgressed brassica populations on the basis of morphological and biochemical traits. The experiment was laid out in a randomized complete block design with two replications at The University of Agriculture Peshawar, Pakistan, during the growing season of 2015-2016. The ANOVA of the F5:6 populations showed highly significant differences (P ≤ 0.01) for all morphological and biochemical traits. Among the F5:6 populations, genotype 2(526) was earlier in flowering (108.65 days), and genotype 14(485) was earlier in maturity (170 days). The tallest plants (182.5 cm), largest main raceme (91.5 cm), and maximum number of pods (80.5) on the main raceme were recorded for genotype 17(34). Maximum primary branches plant⁻¹ (6.2) and longest pods (10.26 cm) were recorded for genotype 15, while genotype 16(171) had more seeds pod⁻¹ (22) and gave maximum yield plant⁻¹ (30.22 g). The maximum 100-seed weight (0.60 g) was observed for genotype 10(506), while high protein content (22.61%) was recorded for genotype 4(99). Maximum oil content (54.08%) and low linoleic acid (7.07%) were produced by genotype 12(138), and low glucosinolate (59.01 µMg⁻¹) was recorded for genotype 21(113). Genotype 27(303) had high oleic acid content (51.73%), and genotype 1(209) gave low erucic acid (35.97%). Among the F5:6 populations, moderate to high heritability was observed for all morphological and biochemical traits, coupled with high genetic advance. Cluster analysis grouped the 56 F5:6 populations along with their parental lines into seven different groups. Each group differed from the others on the basis of morphological and biochemical traits. Moreover, all the F5:6 populations showed sufficient variability. 
Genotypes 10(506) and 16(171) were superior for seed yield plant⁻¹, 100-seed weight, and seeds pod⁻¹ and are recommended for future breeding programs.

Keywords: Brassicaceae, biochemical characterization, introgression, morphological characterization

Procedia PDF Downloads 155
308 Association of Transmission Risk Factors Among HCV-infected Bangladeshi Patients With Different Genotypes

Authors: Nahida Sultana

Abstract:

Globally, an estimated 58 million people have chronic hepatitis C virus infection, with about 1.5 million new infections occurring per year. The hepatitis C virus is a blood-borne virus, and most infections occur through exposure to blood from unsafe injection practices, unsafe health care, unscreened blood transfusion, injection drug use, and sexual practices that lead to exposure to blood. Hepatitis C virus (HCV) causes chronic infections that mainly affect the liver, leading to liver diseases. This study aimed to determine whether there is any significant association between HCV transmission risk factors and genotypes in HCV-infected Bangladeshi patients. After quantification of HCV viral load, 36 samples were randomly selected for HCV genotyping and risk factor measurement. A greater proportion of genotype 1 patients (40%) underwent blood transfusion compared with genotype 3 patients (22.6%), although the difference was not significant (p > 0.05). More genotype 1 patients (20%) than genotype 3 patients (16.1%) underwent surgery and invasive procedures. A history of IDUs (25.8%) and sexual exposure (3.2%) was found only in genotype 3 patients and was absent in patients with genotype 1 (p > 0.05). No statistically significant difference was found in HCV transmission risk factors (blood transfusion, IDUs, surgery and interventions, sexual transmission) between patients infected with genotypes 1 and 3. In HCV infection, genotype may have no relation to transmission risk factors among Bangladeshi patients.

Keywords: HCV genotype, alanine aminotransferase (ALT), HCV viral load, IDUs

Procedia PDF Downloads 54
307 Overview of Adaptive Spline interpolation

Authors: Rongli Gai, Zhiyuan Chang

Abstract:

At this stage, in view of various situations in the interpolation process, most researchers use self-adaptation to adjust the interpolation process, which is also one of the current and future research hotspots in the field of CNC machining. In the interpolation process, according to the overview of the spline curve interpolation algorithm, the adaptive analysis is carried out from the factors affecting the interpolation process. The adaptive operation is reflected in various aspects, such as speed, parameters, errors, nodes, feed rates, random Period, sensitive point, step size, curvature, adaptive segmentation, adaptive optimization, etc. This paper will analyze and summarize the research of adaptive imputation in the direction of the above factors affecting imputation.

Keywords: adaptive algorithm, CNC machining, interpolation constraints, spline curve interpolation

Procedia PDF Downloads 157
306 Outlier Detection in Stock Market Data using Tukey Method and Wavelet Transform

Authors: Sadam Alwadi

Abstract:

Outlier values are a problem that frequently occurs in the data observation or recording process. Thus, the need for data imputation has become an essential matter. In this work, we make use of the methods described in prior work to detect outlier values in a collection of stock market data. To implement the detection and find solutions that may be helpful for investors, real closing price data were obtained from the Amman Stock Exchange (ASE). The Tukey and Maximum Overlapping Discrete Wavelet Transform (MODWT) methods will be used to detect and impute the outlier values.
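
The Tukey method mentioned in this abstract is commonly implemented with fences placed 1.5 interquartile ranges beyond the quartiles. The sketch below shows only that detection step on hypothetical closing prices; the fence multiplier, the quartile convention, and the function name are assumptions, and the MODWT step is not shown.

```python
import statistics


def tukey_outlier_indices(values, k=1.5):
    """Return the indices of values lying outside the Tukey fences.

    Fences are Q1 - k*IQR and Q3 + k*IQR. Quartiles use the 'inclusive' method
    of statistics.quantiles; other quartile conventions give slightly different fences.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical closing prices with one recording error at the end.
prices = [10, 11, 12, 11, 10, 12, 11, 50]
print(tukey_outlier_indices(prices))  # prints [7]
```

Once the flagged positions are known, they can be treated as missing and passed to whichever imputation method is preferred.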

Keywords: outlier values, imputation, stock market data, detecting, estimation

Procedia PDF Downloads 54
305 Influence of κ-Casein Genotype on Milk Productivity of Latvia Local Dairy Breeds

Authors: S. Petrovska, D. Jonkus, D. Smiltiņa

Abstract:

κ-casein is one of the milk proteins that are very important for milk processing. Genotypes of κ-casein affect milk yield, fat, and protein content. This research analyzes the main factors that affect milk yield and composition in local Latvian dairy breeds. Data were collected from 88 Latvian brown and 82 Latvian blue cows in 2015. The frequency of the AA genotype was 0.557 in the Latvian brown and 0.232 in the Latvian blue breed. The frequency of the BB genotype was 0.034 in the Latvian brown and 0.207 in the Latvian blue breed. The highest milk yield was observed in Latvian brown (5131.2 ± 172.01 kg), and fat content and fat yield were also significantly higher in Latvian brown (p < 0.05). Significant differences between κ-casein genotypes were not found in Latvian brown, but the highest milk yield (5057 ± 130.23 kg), protein content (3.42 ± 0.03%), and protein yield (171.9 ± 4.34 kg) were observed with the AB genotype. Significantly higher fat content was observed in the Latvian blue breed with the BB genotype (4.29 ± 0.17%) compared with the AA genotype (3.42 ± 0.19%). A similar tendency was found in protein content: 3.27 ± 0.16% with the BB genotype and 2.59 ± 0.16% with the AA genotype (p < 0.05). Milk yield increased with increasing parity. We did not find a clear trend in milk fat and protein content according to parity.

Keywords: dairy cows, κ-casein, milk productivity, polymorphism

Procedia PDF Downloads 230
304 Single Imputation for Audiograms

Authors: Sarah Beaver, Renee Bryce

Abstract:

Audiograms detect hearing impairment, but missing values pose problems. This work explores imputations in an attempt to improve accuracy. This work implements Linear Regression, Lasso, Linear Support Vector Regression, Bayesian Ridge, K Nearest Neighbors (KNN), and Random Forest machine learning techniques to impute audiogram frequencies ranging from 125 Hz to 8000 Hz. The data contains patients who had or were candidates for cochlear implants. Accuracy is compared across two different Nested Cross-Validation k values. Over 4000 audiograms were used from 800 unique patients. Additionally, training on data combines and compares left and right ear audiograms versus single ear side audiograms. The Root Mean Square Error (RMSE) values for the best Random Forest models range from 4.74 to 6.37. The R² values for the best Random Forest models range from .91 to .96. The RMSE values for the best KNN models range from 5.00 to 7.72. The R² values for the best KNN models range from .89 to .95. The best imputation models achieved R² between .89 and .96 and RMSE values less than 8 dB. We also show that classification predictive models were two percent more accurate with our best imputation models than with constant imputations.
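
A minimal sketch of K Nearest Neighbors imputation, one of the techniques named in this abstract, is given below in plain Python. It is not the authors' pipeline: the distance (root mean squared difference over the frequencies both audiograms share), the unweighted neighbor average, and the threshold values are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    rows: list of equal-length lists (for example, hearing thresholds per frequency).
    Distance between two rows uses only the columns observed in both.
    """
    completed = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a neighbor must have the target column observed
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                new_row[j] = sum(v for _, v in nearest) / len(nearest)
        completed.append(new_row)
    return completed


# Hypothetical thresholds (dB) at three frequencies; the first audiogram lacks the third.
audiograms = [[20, 25, None], [20, 25, 30], [22, 27, 34], [80, 85, 90]]
print(knn_impute(audiograms, k=2)[0])  # the two closest audiograms give (30 + 34) / 2
```

The distant fourth audiogram does not influence the imputed value, which is the practical difference between neighbor-based imputation and a constant such as the column mean.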

Keywords: machine learning, audiograms, data imputations, single imputations

Procedia PDF Downloads 48
303 Bias-Corrected Estimation Methods for Receiver Operating Characteristic Surface

Authors: Khanh To Duc, Monica Chiogna, Gianfranco Adimari

Abstract:

With three diagnostic categories, assessment of the performance of diagnostic tests is achieved by the analysis of the receiver operating characteristic (ROC) surface, which generalizes the ROC curve for binary diagnostic outcomes. The volume under the ROC surface (VUS) is a summary index usually employed for measuring the overall diagnostic accuracy. When the true disease status can be exactly assessed by means of a gold standard (GS) test, unbiased nonparametric estimators of the ROC surface and VUS are easily obtained. In practice, unfortunately, disease status verification via the GS test could be unavailable for all study subjects, due to the expensiveness or invasiveness of the GS test. Thus, often only a subset of patients undergoes disease verification. Statistical evaluations of diagnostic accuracy based only on data from subjects with verified disease status are typically biased. This bias is known as verification bias. Here, we consider the problem of correcting for verification bias when continuous diagnostic tests for three-class disease status are considered. We assume that selection for disease verification does not depend on disease status, given test results and other observed covariates, i.e., we assume that the true disease status, when missing, is missing at random. Under this assumption, we discuss several solutions for ROC surface analysis based on imputation and re-weighting methods. In particular, verification bias-corrected estimators of the ROC surface and of VUS are proposed, namely, full imputation, mean score imputation, inverse probability weighting and semiparametric efficient estimators. Consistency and asymptotic normality of the proposed estimators are established, and their finite sample behavior is investigated by means of Monte Carlo simulation studies. Two illustrations using real datasets are also given.
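
The re-weighting idea behind inverse probability weighting can be sketched on a simpler target than the ROC surface: estimating the proportion of subjects in each of three disease classes when only some subjects are verified. The verification probabilities are taken as known here, which is an assumption (in practice they are estimated from test results and covariates), and the function name and data are hypothetical.

```python
def ipw_class_proportions(records):
    """Estimate disease-class proportions from partially verified subjects.

    records: list of (disease_class or None, verification_probability).
    Each verified subject is weighted by 1 / verification_probability so that
    it also represents similar subjects who were not verified.
    """
    totals = {}
    weight_sum = 0.0
    for disease_class, prob in records:
        if disease_class is None:
            continue  # unverified subjects contribute only through the weights of others
        weight = 1.0 / prob
        totals[disease_class] = totals.get(disease_class, 0.0) + weight
        weight_sum += weight
    return {c: t / weight_sum for c, t in totals.items()}


# Hypothetical sample: three verified subjects and two unverified ones.
sample = [(1, 0.5), (2, 0.25), (None, 0.25), (3, 1.0), (None, 0.5)]
print(ipw_class_proportions(sample))
```

A naive estimate from the three verified subjects would give each class one third; the weighted estimate shifts mass toward class 2, whose verified member came from a group that was rarely verified.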

Keywords: imputation, missing at random, inverse probability weighting, ROC surface analysis

Procedia PDF Downloads 382
302 A Preliminary Report of HBV Full Genome Sequencing Derived from Iranian Intravenous Drug Users

Authors: Maryam Vaezjalali, Koroush Rahimian, Maryam Asli, Tahmineh Kandelouei, Foad Davoodbeglou, Amir H. Kashi

Abstract:

Objectives: The present study was conducted to assess HBV molecular profiles, including genotypes, subgenotypes, subtypes, and mutations in hepatitis B genes. Materials/Patients and Methods: This study was conducted on 229 intravenous drug users who were referred to three drop-in centers and a hospital in Tehran. HBV DNA was extracted from HBsAg-positive serum samples and amplified by nested PCR. HBV genotype, subgenotype, subtype, and gene mutations were determined by direct sequencing. A phylogenetic tree was constructed using the neighbor-joining (NJ) method. Statistical analyses were carried out with SPSS 20. Results: HBV DNA was found in 3 HBsAg-positive cases. The phylogenetic tree of the derived HBV DNA sequences showed the existence of genotype D (subgenotype D1, subtype ayw2). Immune escape mutations were also identified in the S gene. Conclusion: There were few variations in genotypes and subtypes among infected intravenous drug users. This study showed the predominance of genotype D among intravenous drug users. Our study concurs with other reports from Iran, all of which show that genotype D is currently the only detectable genotype in Iran.

Keywords: drug users, genotype, HBV, phylogenetic tree

Procedia PDF Downloads 285
301 Detection and Distribution Pattern of Prevalent Genotypes of Hepatitis C in a Tertiary Care Hospital of Western India

Authors: Upasana Bhumbla

Abstract:

Background: Hepatitis C virus is a major cause of chronic hepatitis, which can further lead to cirrhosis of the liver and hepatocellular carcinoma. Worldwide, the burden of hepatitis C infection has become a serious threat to the human race. Hepatitis C virus (HCV) has population-specific genotypes, which provide valuable epidemiological and therapeutic information. Genotyping and assessment of viral load in HCV patients are important for planning therapeutic strategies. The aim of this study is to examine the changing trends in the prevalence and genotypic distribution of hepatitis C virus in a tertiary care hospital in Western India. Methods: This is a retrospective study; blood samples were collected and tested for anti-HCV antibodies by ELISA in the Dept. of Microbiology. In seropositive hepatitis C patients, quantification of HCV-RNA was done by real-time PCR, and in HCV-RNA positive samples, genotyping was conducted. Results: A total of 114 patients who were seropositive for anti-HCV were recruited in the study, of whom 79 (69.29%) were HCV-RNA positive. Of these positive samples, 54 were further subjected to genotype determination using real-time PCR. Genotype was not detected in 24 samples due to low viral load; 30 samples were positive for genotype. Conclusion: Knowledge of genotype is crucial for the management of HCV infection and prediction of prognosis. Patients infected with HCV genotype 1 and 4 will have to receive Interferon and Ribavirin for 48 weeks. Patients with these genotypes show a poor sustained viral response when tested 24 weeks after completion of therapy. In contrast, patients infected with HCV genotype 2 and 3 are reported to have a better response to therapy.

Keywords: hepatocellular, genotype, ribavirin, seropositive

Procedia PDF Downloads 99
300 dynr.mi: An R Program for Multiple Imputation in Dynamic Modeling

Authors: Yanling Li, Linying Ji, Zita Oravecz, Timothy R. Brick, Michael D. Hunter, Sy-Miin Chow

Abstract:

Assessing several individuals intensively over time yields intensive longitudinal data (ILD). Even though ILD provide rich information, they also bring other data analytic challenges. One of these is the increased occurrence of missingness with increased study length, possibly under non-ignorable missingness scenarios. Multiple imputation (MI) handles missing data by creating several imputed data sets, and pooling the estimation results across imputed data sets to yield final estimates for inferential purposes. In this article, we introduce dynr.mi(), a function in the R package, Dynamic Modeling in R (dynr). The package dynr provides a suite of fast and accessible functions for estimating and visualizing the results from fitting linear and nonlinear dynamic systems models in discrete as well as continuous time. By integrating the estimation functions in dynr and the MI procedures available from the R package, Multivariate Imputation by Chained Equations (MICE), the dynr.mi() routine is designed to handle possibly non-ignorable missingness in the dependent variables and/or covariates in a user-specified dynamic systems model via MI, with convergence diagnostic check. We utilized dynr.mi() to examine, in the context of a vector autoregressive model, the relationships among individuals’ ambulatory physiological measures, and self-report affect valence and arousal. The results from MI were compared to those from listwise deletion of entries with missingness in the covariates. When we determined the number of iterations based on the convergence diagnostics available from dynr.mi(), differences in the statistical significance of the covariate parameters were observed between the listwise deletion and MI approaches. These results underscore the importance of considering diagnostic information in the implementation of MI procedures.
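
The pooling step of multiple imputation described in this abstract is conventionally done with Rubin's rules, sketched below for a single parameter. This is a generic illustration in Python rather than the dynr.mi() or MICE code; the function name and the example estimates are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets with Rubin's rules.

    estimates: the parameter estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average sampling variance
    between = statistics.variance(estimates)  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results for one covariate effect from three imputed data sets.
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total_variance)
```

The between-imputation term is what carries the extra uncertainty due to missingness; dropping it would make the pooled standard error too small and could change which covariate effects appear significant.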

Keywords: dynamic modeling, missing data, mobility, multiple imputation

Procedia PDF Downloads 138
299 Self-Organizing Maps for Exploration of Partially Observed Data and Imputation of Missing Values in the Context of the Manufacture of Aircraft Engines

Authors: Sara Rejeb, Catherine Duveau, Tabea Rebafka

Abstract:

To monitor the production process of turbofan aircraft engines, multiple measurements of various geometrical parameters are systematically recorded on manufactured parts. Engine parts are subject to extremely high standards as they can impact the performance of the engine. Therefore, it is essential to analyze these databases to better understand the influence of the different parameters on the engine's performance. Self-organizing maps are unsupervised neural networks which achieve two tasks simultaneously: they visualize high-dimensional data by projection onto a 2-dimensional map and provide clustering of the data. This technique has become very popular for data exploration since it provides easily interpretable results and a meaningful global view of the data. As such, self-organizing maps are usually applied to aircraft engine condition monitoring. As databases in this field are huge and complex, they naturally contain multiple missing entries for various reasons. The classical Kohonen algorithm to compute self-organizing maps is conceived for complete data only. A naive approach to deal with partially observed data consists in deleting items or variables with missing entries. However, this requires a sufficient number of complete individuals to be fairly representative of the population; otherwise, deletion leads to a considerable loss of information. Moreover, deletion can also induce bias in the analysis results. Alternatively, one can first apply a common imputation method to create a complete dataset and then apply the Kohonen algorithm. However, the choice of the imputation method may have a strong impact on the resulting self-organizing map. Our approach is to address simultaneously the two problems of computing a self-organizing map and imputing missing values, as these tasks are not independent. In this work, we propose an extension of self-organizing maps for partially observed data, referred to as missSOM. 
First, we introduce a criterion to be optimized, that aims at defining simultaneously the best self-organizing map and the best imputations for the missing entries. As such, missSOM is also an imputation method for missing values. To minimize the criterion, we propose an iterative algorithm that alternates the learning of a self-organizing map and the imputation of missing values. Moreover, we develop an accelerated version of the algorithm by entwining the iterations of the Kohonen algorithm with the updates of the imputed values. This method is efficiently implemented in R and will soon be released on CRAN. Compared to the standard Kohonen algorithm, it does not come with any additional cost in terms of computing time. Numerical experiments illustrate that missSOM performs well in terms of both clustering and imputation compared to the state of the art. In particular, it turns out that missSOM is robust to the missingness mechanism, which is in contrast to many imputation methods that are appropriate for only a single mechanism. This is an important property of missSOM as, in practice, the missingness mechanism is often unknown. An application to measurements on one type of part is also provided and shows the practical interest of missSOM.
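
One ingredient of such an alternating scheme, the imputation step for a fixed map, might look like the sketch below. It assumes that the best-matching unit is found using only the observed coordinates and that missing entries are then copied from that unit's prototype; this is an illustrative reading of the abstract, not the released missSOM code, and all names and values are hypothetical.

```python
def impute_from_prototypes(x, prototypes):
    """Return (best_unit_index, completed_vector) for one partially observed item.

    x: list with None for missing entries
    prototypes: list of complete prototype vectors, one per map unit
    """
    observed = [j for j, v in enumerate(x) if v is not None]

    def squared_distance(proto):
        # Only observed coordinates enter the distance to a prototype.
        return sum((x[j] - proto[j]) ** 2 for j in observed)

    best = min(range(len(prototypes)), key=lambda k: squared_distance(prototypes[k]))
    completed = [prototypes[best][j] if v is None else v for j, v in enumerate(x)]
    return best, completed


# Hypothetical part measurements with the second value missing, and three map units.
unit, filled = impute_from_prototypes(
    [1.0, None, 3.1],
    [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [5.0, 5.0, 5.0]],
)
print(unit, filled)
```

In an alternating algorithm this step would be followed by re-learning the prototypes on the completed data, and the two steps would repeat until the imputed values stabilize.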

Keywords: imputation method of missing data, partially observed data, robustness to missingness mechanism, self-organizing maps

Procedia PDF Downloads 124
298 Relationship Salt Sensitivity and C825T Polymorphism of GNB3 Gene in Patients with Essential Hypertension

Authors: Aleksandr Nagay, Gulnoz Khamidullayeva

Abstract:

It is known that unbalanced salt (NaCl) intake, lifestyle, and genetic predisposition are key components of the risk and development of essential hypertension (EH). Purpose: To study the relationship between salt sensitivity and systolic (SBP) and diastolic (DBP) blood pressure (BP), depending on the C825T polymorphism of the GNB3 gene, in individuals of Uzbek nationality with EH. Method: We studied 148 healthy individuals and 148 patients with grade I-II EH (WHO/ISH, 2003) with a disease duration of 6.5±1.3 years. The GNB3 gene was genotyped by the PCR-RFLP method. Salt sensitivity was determined by the method of R. Henkin. Results: For the comparative analysis of BP, the groups of CT and TT genotype carriers were combined. The analysis showed that carriers of the CC genotype with low salt sensitivity had higher SBP than carriers of the CT and TT genotypes with low salt sensitivity: 166.2±4.3 vs. 158.2±9.1 mm Hg (p=0.000). A similar analysis of DBP also showed significantly higher values in carriers of the CC genotype: 105.8±10.6 vs. 100.5±7.2 mm Hg (p=0.001). Mean SBP and DBP in carriers of the CC genotype with medium or high salt sensitivity did not differ statistically from those in carriers of the CT or TT genotype: SBP 165.0±0.1 vs. 160.0±8.6 mm Hg (p=0.275) and DBP 100.1±0.1 vs. 101.6±7.6 mm Hg (p=0.687), respectively. Conclusion: In patients with EH, the CC genotype of the GNB3 gene, taking salt sensitivity into account, has a negative effect on the blood pressure profile: patients with EH who carry the CC genotype and have low salt taste sensitivity have higher levels of both SBP and DBP.

Keywords: salt sensitivity, essential hypertension EH, blood pressure BP, genetic predisposition

Procedia PDF Downloads 247
297 Prospective Validation of the FibroTest Score in Assessing Liver Fibrosis in Hepatitis C Infection with Genotype 4

Authors: G. Shiha, S. Seif, W. Samir, K. Zalata

Abstract:

FibroTest (FT) is a non-invasive score of liver fibrosis that combines the quantitative results of 5 serum biochemical markers (alpha-2-macroglobulin, haptoglobin, apolipoprotein A1, gamma-glutamyl transpeptidase (GGT), and bilirubin), adjusted for the patient's age and sex in a patented algorithm, to generate a measure of fibrosis. FT has been validated in patients with chronic hepatitis C (CHC) (Halfon et al., Gastroenterol. Clin. Biol. (2008), 32, 6 suppl. 1, 22-39). The validation of FT in genotype 4 is not well studied. Our aim was to evaluate the performance of FibroTest in an independent prospective cohort of hepatitis C patients with genotype 4. The subjects were 122 patients with CHC. All liver biopsies were scored using the METAVIR system. The FT score was measured, and the performance of the cut-off score was assessed using the ROC curve. Among patients with advanced fibrosis, FT matched the liver biopsy exactly in 18.6% of cases, overestimated the stage of fibrosis in 44.2%, and underestimated it in 37.7%. In patients with no or mild fibrosis, exact matching was found in 39.2% of cases, with overestimation in 48.1% and underestimation in 12.7%. Overall, the test showed exact matching, overestimation, and underestimation in 32%, 46.7%, and 21.3% of cases, respectively. Using the ROC curve, FT at a cut-off point of 0.555 could discriminate early from advanced stages of fibrosis with an area under the ROC curve (AUC) of 0.72, sensitivity of 65%, specificity of 69%, PPV of 68%, NPV of 66%, and accuracy of 67%. As the FibroTest score overestimates the stage of advanced fibrosis, it should not be considered a reliable surrogate for liver biopsy in hepatitis C infection with genotype 4.

Keywords: fibrotest, chronic Hepatitis C, genotype 4, liver biopsy

Procedia PDF Downloads 383
296 Pegylated Interferon in HCV Genotype 3 Relapser to Conventional Interferon in Pakistani Population

Authors: Saad Khalid Niaz, Arif Mahmood Siddiqui, Afzal Haqi

Abstract:

Background: The estimated prevalence of hepatitis C in Pakistan is 5%, of which 78% is genotype 3, for which the response to conventional interferon is reported to be 70%. Objective: To determine the efficacy of pegylated interferon 20 kDa (Unipeg) plus ribavirin (Ribazole) in HCV genotype 3 patients who relapsed after conventional interferon. Methods: This is an ongoing study of 20 enrolled patients. Pegylated interferon alfa-2a 20 kDa 180 mcg weekly with ribavirin was administered for a period of 24 weeks. Virological responses were measured by qualitative HCV RNA at weeks 4, 12, 24, and 48 to determine Rapid Virological Response (RVR), Early Virological Response (EVR), End of Treatment Response (ETR), and Sustained Virological Response (SVR), respectively. EVR was assessed in those who did not achieve RVR. Results: There were 12 (60%) males, and the mean age was 38.5±7.62 years. All 20 recruited patients completed 4 weeks of therapy; RVR was achieved in 8 (40%) patients. One patient was lost to follow-up, and one had yet to attend the 12-week visit. Of 10 patients, 8 (80%) achieved EVR. Of the intent-to-treat patients, 15 completed 24 weeks of therapy, and ETR was achieved in 14 (93%); 9 patients completed post-therapy follow-up, of whom 8 (89%) achieved SVR. Conclusion: Our interim data demonstrate that pegylated interferon alfa-2a 20 kDa 180 mcg (Unipeg) in combination with ribavirin (Ribazole) has shown promising results in treating HCV genotype 3 patients who relapsed after conventional interferon. We recommend the use of pegylated interferon in relapsers with genotype 3 when financial constraints limit the use of oral antivirals.

Keywords: pegylated interferon (unipeg), hepatitis c, relapsers, Pakistan

Procedia PDF Downloads 273
295 Responses of Grain Yield, Anthocyanin and Antioxidant Capacity to Water Condition in Wetland and Upland Purple Rice Genotypes

Authors: Supaporn Yamuangmorn, Chanakan Prom-U-Thai

Abstract:

Wetland and upland purple rice are the two major types in Northern Thailand, classified by their original ecotypes. Wetland rice is grown under flooded conditions from transplanting until maturity, while upland rice is naturally grown in well-drained soil, known as aerobic cultivation. Both ecotypes can be grown in and adapted to the reverse system, but little is known about how grain yield and quality respond in the two ecotypes. This study evaluated the responses of grain yield, anthocyanin concentration, and antioxidant capacity in wetland and upland purple rice genotypes grown under submerged and aerobic conditions. A factorial arrangement in a randomized complete block design (RCBD) with two factors, rice genotype and water condition, was carried out in three replications. Two wetland genotypes (Kum Doi Saket: KDK and Kum Phayao: KPY) and two upland genotypes (Kum Hom CMU: KHCMU and Pieisu1: PES1) were grown under submerged and aerobic conditions. Grain yield was affected by the interaction between water condition and rice genotype. The wetland genotypes KDK and KPY grown in the submerged condition produced about 2.7 and 0.8 times higher yield than in the aerobic condition, respectively. The upland genotype PES1 had a 0.4 times higher grain yield in the submerged condition than in the aerobic condition, but there was no significant difference in KHCMU. In the submerged condition, all genotypes had higher yield components than in the aerobic condition: tiller number, panicle number, and percent filled grain were higher by 24%, 32%, and 11%, respectively. Thousand grain weight and spikelet number were affected by water condition differently among genotypes. The wetland genotypes KDK and KPY and the upland genotype PES1 grown in the submerged condition produced about 19-22% higher grain weight than in the aerobic condition.
A similar effect was found for spikelet number: the wetland genotypes KDK and KPY and the upland genotype KHCMU had about 28-30% more spikelets in the submerged condition than in the aerobic condition. In contrast, anthocyanin concentration and antioxidant capacity were affected by both water condition and genotype. In the wetland rice KDK and the upland rice KHCMU, grain grown in the aerobic condition had about 0.9 and 2.6 times higher anthocyanin concentration, respectively, than grain grown in the submerged condition. Similarly, the antioxidant capacity of KDK and KHCMU was 0.5 and 0.6 times higher, respectively, in the aerobic condition than in the submerged condition. There was a negative correlation between grain yield and anthocyanin concentration in the wetland genotype KDK and the upland genotype KHCMU, but not in the other genotypes. This study indicates that some rice genotypes can adapt to the reverse ecosystem in terms of both grain yield and quality, especially the wetland genotype KPY and the upland genotype PES1. To maximize grain yield and quality of purple rice, proper water management is required, with key consideration given to the different responses among genotypes. A larger number of rice genotypes from both ecotypes is needed to confirm their responses to water management.

Keywords: purple rice, water condition, anthocyanin, grain yield

Procedia PDF Downloads 136
294 A Review of Methods for Handling Missing Data in the Form of Dropouts in Longitudinal Clinical Trials

Authors: A. Satty, H. Mwambi

Abstract:

Much research based on clinical trial data is characterized by the unavoidable problem of dropout as a result of missing or erroneous values. This paper reviews some of the techniques used to address dropout in longitudinal clinical trials. The fundamental concepts of the patterns and mechanisms of dropout are discussed. This study presents five general techniques for handling dropout: (1) Deletion methods; (2) Imputation-based methods; (3) Data augmentation methods; (4) Likelihood-based methods; and (5) MNAR-based methods. Under each technique, several methods that are commonly used to deal with dropout are presented, including a review of the existing literature in which we examine the effectiveness of these methods in the analysis of incomplete data. Two application examples are presented to study the potential strengths and weaknesses of some of the methods under certain dropout mechanisms, as well as to assess the sensitivity of the modelling assumptions.

Keywords: incomplete longitudinal clinical trials, missing at random (MAR), imputation, weighting methods, sensitivity analysis

Procedia PDF Downloads 378
293 Associations between Polymorphism of Growth Hormone Gene on Milk Production, Fat and Protein Content in Friesian Holstein Cattle

Authors: Tety Hartatik, Dian Kurniawati, Adiarto

Abstract:

The aim of the research was to determine the associations between polymorphism of the bovine growth hormone (GH) gene (Leu/Val, L/V) and milk production of Friesian Holstein cattle. A total of 62 cows from two Friesian Holstein groups were studied (19 head from New Zealand and 43 head from Australia). We used the PCR-RFLP method to genotype a 211 bp target fragment spanning part of intron 4 and exon 5 of the GH gene. Genotype LL was more frequent than genotype LV. The frequency of genotype LL in the New Zealand and Australia groups was 84% and 79%, respectively. The frequency of genotype LV in the New Zealand and Australia groups was 16% and 21%, respectively. The Leu/Val polymorphism showed no significant association with milk production, fat content, or protein content in either group. However, the group itself (cows from New Zealand compared with those from Australia) had a significant effect on fat and protein content.
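The genotype frequencies reported above can be converted into allele frequencies with simple counting: each LL animal carries two L alleles and each LV animal carries one. The sketch below is an illustration only and is not part of the original study. The function name is hypothetical, and it assumes that the two reported genotypes (LL and LV) account for all animals, with VV at zero unless supplied.

```python
def allele_frequencies(freq_ll, freq_lv, freq_vv=0.0):
    """Return (L, V) allele frequencies from genotype frequencies.

    Genotype frequencies are proportions that must sum to 1.
    """
    total = freq_ll + freq_lv + freq_vv
    if abs(total - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    # Homozygotes contribute fully, heterozygotes contribute half.
    freq_l = freq_ll + freq_lv / 2.0
    return freq_l, 1.0 - freq_l

# Genotype frequencies as reported in the abstract.
new_zealand = allele_frequencies(0.84, 0.16)
australia = allele_frequencies(0.79, 0.21)
```

With the reported percentages, the L allele frequency works out to about 0.92 in the New Zealand group and about 0.895 in the Australia group.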

Keywords: Friesian Holstein, fat content, growth hormone gene, milk production, PCR-RFLP, protein content

Procedia PDF Downloads 618
292 Effects of Obesity and Family History of Diabetes on the Association of Cholesterol Ester Transfer Protein Gene with High-Density Lipoprotein Cholesterol Levels in Korean Population

Authors: Jae Woong Sull

Abstract:

Lipid levels are related to the risk of cardiovascular diseases. The cholesterol ester transfer protein (CETP) gene is one of the candidate genes for cardiovascular diseases. A total of 2,304 persons were chosen from a hospital (N=4,294) in South Korea. Female subjects with the CG/GG genotype had a 2.03-fold (p=0.0001) higher risk of having abnormal HDL cholesterol levels (<40 mg/dL) than subjects with the CC genotype. Male subjects with the CG/GG genotype had a 1.34-fold (p=0.0019) higher risk than subjects with the CC genotype. When analyzed by body mass index, the association with CETP was much stronger in male subjects with BMI>=25.69 (OR=1.55, 95% CI: 1.15-2.07, P=0.0037) than in lean male subjects. When analyzed by family history of diabetes, the association with CETP was much stronger in male subjects with positive family history of low physical activity (OR=4.82, 95% CI: 1.86-12.5, P=0.0012) than in male subjects with negative family history of diabetes. This study clearly demonstrates that genetic variants in CETP influence HDL cholesterol levels in Korean adults.

Keywords: CETP, diabetes, obesity, polymorphisms

Procedia PDF Downloads 112
291 Associations of Gene Polymorphism of IL-17A (C737T) with Its Level in Patients with Erysipelas in the Kazakh Population

Authors: Nazira B. Bekenova, Lydia A. Mukovozova, Andrej M. Grjibovski, Alma Z. Tokayeva, Yerbol M. Smail, Nurlan E. Aukenov

Abstract:

Erysipelas is an infectious disease of socio-economic significance that is prone to a prolonged recurrent course (30%). The contribution of genetic factors, in particular cytokine gene polymorphisms, can be essential in disease etiology and pathogenesis. Interleukin-17A (IL-17A) is produced by T helper 17 (Th17) cells and plays a key role in the development of local inflammation. The local inflammatory process dominates the clinical picture of erysipelas. It is established that the skin and mucosae are the primary sites of Th17 cell migration (homing) and that Th17 cytokines stimulate the barrier function of the epithelium. We studied associations between the IL-17A (C737T) gene polymorphism rs8193036 and IL-17A level in patients with erysipelas in the Kazakh population. Altogether, the sample comprised 90 cases with erysipelas and 90 healthy controls from an ethnic Kazakh population. Cases were identified at the Clinical Infectious Diseases Hospital of Semey (Kazakhstan). The IL-17A (rs8193036) polymorphism was analyzed by real-time polymerase chain reaction. Plasma levels of IL-17A were assessed by enzyme immunoassay using the ‘Vector-Best’ test system (Russia). Differences in IL-17A levels between the CC, TT, and CT groups were studied using the Kruskal-Wallis test. Pairwise comparisons were performed using Mann-Whitney tests with Bonferroni correction (the new significance level was set to 0.025). We found that in patients with erysipelas, the level of IL-17A was higher in carriers of the CC genotype than in carriers of the CT genotype (p = 0.010). Comparisons of IL-17A level between patients with the TT and CC genotypes, and between patients with the CT and TT genotypes, revealed no statistically significant differences (p = 0.374 and p = 0.043, respectively).
Comparisons of IL-17A plasma levels between the CC and CT genotypes, between the CC and TT genotypes, and between the TT and CT genotypes in healthy controls did not reveal significant differences (p = 0.291). Therefore, we identified an association of the IL-17A (C737T) gene polymorphism with IL-17A level in patients with erysipelas who carry the CC genotype.
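The reasoning behind the pairwise results can be made explicit: after the Bonferroni correction, a p-value counts as significant only if it falls below the adjusted level of 0.025 stated in the abstract, so a value such as 0.043 is not significant even though it is below 0.05. The short sketch below simply applies that reported threshold to the reported p-values. The variable names are hypothetical, and no statistical test is rerun here.

```python
# Adjusted significance level as stated in the abstract.
ALPHA_ADJUSTED = 0.025

# Pairwise p-values reported for patients with erysipelas.
reported_p = {
    "CC vs CT": 0.010,
    "TT vs CC": 0.374,
    "CT vs TT": 0.043,
}

# A comparison is significant only if p is below the adjusted level.
significant = {pair: p < ALPHA_ADJUSTED for pair, p in reported_p.items()}

for pair, flag in significant.items():
    print(pair, "significant" if flag else "not significant")
```

Only the CC vs CT comparison passes the adjusted threshold, which matches the conclusion drawn in the abstract.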

Keywords: erysipelas, interleukin-17A, Kazakh, polymorphism

Procedia PDF Downloads 393
290 The Effects of Different Sowing Times on Seed Yield and Quality of Fenugreek (Trigonella foenum graecum L.) in East Mediterranean Region of Turkey

Authors: Lale Efe, Zeynep Gokce

Abstract:

This study, carried out in the 2013-14 growing season in the East Mediterranean Region of Turkey, aimed to investigate the effects of different sowing times on the seed yield and quality of fenugreek (Trigonella foenum graecum L.). Three fenugreek genotypes (Gürarslan, Candidate Line-1 and Genotype-1) were sown on 13 November 2013 and 7 March 2014 in a factorial randomized block design with 3 replications. Plant height (cm), branch number per plant, first pod height (cm), pod length (mm), seed number per pod, seed yield per plant (g), seed yield per decare (kg), thousand seed weight (g), mucilage rate (%), seed protein ratio (%), seed oil ratio (%), oleic acid (%), linoleic acid (%), palmitic acid (%) and stearic acid (%) were investigated. Among genotypes, the highest seed yield per plant was obtained from Genotype-1 (5 g/plant) and the lowest from cv. Gürarslan (3.4 g/plant). For the genotype x sowing date interaction, the highest seed yield per plant was obtained from autumn-sown Genotype-1 (6.6 g/plant) and the lowest from spring-sown cv. Gürarslan (2.9 g/plant). Genotype-1 had the highest linoleic acid ratio (41.6%). Cv. Gürarslan and Candidate Line-1 had the highest oleic acid ratios (17.8% and 17.6%, respectively).

Keywords: fenugreek, seed yield and quality, sowing times, Trigonella foenum graecum L.

Procedia PDF Downloads 173
289 Molecular Epidemiologic Distribution of HDV Genotypes among Different Ethnic Groups in Iran: A Systematic Review

Authors: Khabat Barkhordari

Abstract:

Hepatitis delta virus (HDV) is an RNA virus that needs the function of hepatitis B virus (HBV) for its propagation and assembly. Infection by HDV can occur simultaneously with HBV infection and cause acute hepatitis, or develop as a secondary infection in patients already infected with HBV. Based on genome sequence analysis, HDV has several genotypes, which show broad geographic distribution and diverse clinical features. The aim of the current study is to determine the molecular epidemiology of hepatitis delta virus genotypes in HBsAg-positive patients among different ethnic groups of Iran. This systematic review covers the results of different studies that examined 2000 Iranian patients with HBV infection from 2010 to 2015. Among the 2000 patients, 16.75% had anti-HDV antibodies, and HDV RNA was found in only 1.75% of cases. All positive cases had genotype I.

Keywords: HDV, genotype, epidemiology, distribution

Procedia PDF Downloads 246
288 Bulking Rate of Cassava Genotypes and Their Root Yield Relationship at Guinea Savannah and Forest Transition Agroecological Zone of Nigeria

Authors: Olusegun D. Badewa, E. K. Tsado, A. S. Gana, K. D. Tolorunse, R. U. Okechukwu, P. Iluebbey, S. Ibrahim

Abstract:

Farmers face production challenges ranging from unstable weather due to climate change to low yield, malnutrition, cattle invasion, and bush fires, all of which have affected their livelihood. Research effort must therefore be centered on improving farmers’ livelihood, nutrition, and health by providing early-bulking biofortified cassava varieties that can be harvested earlier with reasonable root yield, thereby avoiding a long stay of the crop on farmland. This study evaluated the bulking rate of cassava genotypes harvested at 3, 6, 9, and 12 months after planting (MAP) in the different agroecologies of Mokwa and Ubiaja. Data were collected on fresh storage root yield (FSRY), harvest index (HI), and dry matter content (DM). FSRY, HI, and DM differed significantly among genotypes and months after planting, while location had no effect on the yield traits. Early-bulking genotypes were not high yielding and showed discontinuity at some point across the months. The retrogression in yield performance across months had no effect on the highest-yielding genotypes. Also, for all genotypes and across the evaluated months, FSRY decreased at 9 MAP due to a reduction in dry matter content during the same month. The best-performing genotype was IBA90581, followed by IBA120036, IBA130896, and IBA980581, while the least performing was IBA130818.

Keywords: early bulking, dry matter, harvest index, high yielding, root yield

Procedia PDF Downloads 180
287 Macronutrient Accumulation and Partitioning for Six Wheat Genotypes Grown at Contrasting Nitrogen Supply

Authors: E. Chakwizira, D. J. Moot, M. Andrews, E. Teixeira

Abstract:

Partitioning of macronutrients among wheat (Triticum aestivum L.) plant organs has not been extensively studied, particularly for modern genotypes grown under contrasting N supply. Nutrient accumulation and partitioning of phosphorus, potassium, calcium, magnesium and sulphur (P, K, Ca, Mg and S) were determined for six wheat genotypes [12S2-2021, 12S3-3019, 13S3-2026, Discovery, Duchess and Reliance] grown with (200 kg/ha) or without (0 kg/ha) nitrogen (N) in a fully irrigated field experiment in the 2017-18 season at Lincoln, New Zealand. Data were collected at three growth stages (GS): tillering (GS21), anthesis (GS60) and grain maturity (GS92). Grain yield varied with both N and genotype, from 6-7.5 t/ha for the 0 kg N/ha crops to 8.1-9.3 t/ha for the 200 kg N/ha treatments. Plant nutrient uptake at maturity responded to both N supply and genotype for all nutrients except S, which did not differ among the genotypes. For example, total P uptake averaged 13.5 (12.4-14.3) kg/ha for the 0 kg N/ha treatments and 17.8 (15.1-19.7) kg/ha when 200 kg N/ha was applied. Similarly, K uptake increased from an average of 23 (21.6-25.3) kg/ha for the 0 kg N/ha treatments to 34.3 (32.4-40.8) kg/ha when 200 kg N/ha was applied. Similar trends were observed for Ca and Mg. The S content responded only to N supply and not to genotype, increasing from 7.9 kg/ha for the 0 kg N treatments to 12.8 kg/ha when 200 kg N was applied. Nutrient content at anthesis relative to that at maturity was 30% for P, 100% for both K and Ca, and 34% for Mg. Sulphur content at anthesis decreased 29% with N supply and was highest for genotype 12S2-2021 compared with the other five genotypes. At grain maturity, the ratio of nutrients in grain to total plant nutrient, defined as the nutrient harvest index (NHI), varied with both N supply and genotype. Averaged across treatments, the NHI was 0.96 for P, 0.53 for K, 0.58 for Ca, 0.90 for Mg and 0.85 for S.
These results suggest that Ca and K should be provided earlier in the season as there is limited or no uptake after anthesis. These results also show that Ca and K are important for structural functions, while P, Mg and S are remobilised to the grains and become important for quality.
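The nutrient harvest index used in this abstract is defined as the amount of a nutrient in the grain divided by the total amount in the plant at maturity. A minimal sketch of that ratio follows. The function name and the input values are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from the study.

```python
def nutrient_harvest_index(grain_nutrient, total_plant_nutrient):
    """Ratio of nutrient in grain to total plant nutrient (same units, e.g. kg/ha)."""
    if total_plant_nutrient <= 0:
        raise ValueError("total plant nutrient must be positive")
    if not 0 <= grain_nutrient <= total_plant_nutrient:
        raise ValueError("grain nutrient must lie between 0 and the plant total")
    return grain_nutrient / total_plant_nutrient

# Hypothetical example: 9.0 kg/ha of a nutrient in grain out of 10.0 kg/ha in the whole plant.
example_nhi = nutrient_harvest_index(9.0, 10.0)
```

An index close to 1 means that nearly all of the nutrient taken up by the plant ends up in the grain, while a lower index means more of it stays in the straw and other organs.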

Keywords: anthesis, genotype, nutrient harvest index, NHI, Triticum aestivum L.

Procedia PDF Downloads 127