A Pairwise-Gaussian-Merging Approach: Towards Genome Segmentation for Copy Number Analysis

Chih-Hao Chen; Hsing-Chung Lee; Qingdong Ling; Hsiao-Jung Chen; Sun-Chong Wang; Li-Ching Wu; H.C. Lee

Abstract:

Segmentation, filtering out of measurement errors, and identification of breakpoints are integral parts of any analysis of microarray data for the detection of copy number variation (CNV). Existing algorithms designed for these tasks have had some success, but they tend to be O(N^2) in computation time, memory requirement, or both, and the rapid advance of microarray resolution has rendered such algorithms practically useless. Here we propose an algorithm, SAD, that is much faster and needs much less memory, being O(N) in both computation time and memory requirement, and that offers higher accuracy. The two key ingredients of SAD are the fundamental statistical assumption that measurement errors are normally distributed and the mathematical relation that the product of two Gaussian functions is another Gaussian function. We have produced a computer program for analyzing CNV based on SAD. In addition to being fast and small, it offers two important features: quantitative statistics for predictions and, with only two user-decided parameters, ease of use. Its speed shows little dependence on genomic profile. Running on an average modern computer, it completes CNV analyses for a 262-thousand-probe array in ~1 second and a 1.8-million-probe array in 9 seconds.
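
The abstract does not spell out how SAD applies the product-of-Gaussians relation, so the following is only a minimal sketch of the identity itself, not the authors' algorithm. The function names `gaussian` and `merge_gaussians` are hypothetical and chosen for illustration. The sketch assumes two normal densities with means mu1, mu2 and standard deviations sigma1, sigma2, and returns the mean, standard deviation, and constant scale factor of their pointwise product.

```python
import math


def gaussian(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def merge_gaussians(mu1, sigma1, mu2, sigma2):
    """Return (mu, sigma, scale) such that, for every x,
    gaussian(x, mu1, sigma1) * gaussian(x, mu2, sigma2)
        == scale * gaussian(x, mu, sigma).
    """
    # Precisions (inverse variances) add when Gaussians are multiplied.
    w1 = 1.0 / sigma1 ** 2
    w2 = 1.0 / sigma2 ** 2
    var = 1.0 / (w1 + w2)
    # The merged mean is the precision-weighted average of the two means.
    mu = var * (w1 * mu1 + w2 * mu2)
    # The scale factor is a Gaussian in the difference of the two means,
    # with the two variances summed.
    scale = gaussian(mu1, mu2, math.sqrt(sigma1 ** 2 + sigma2 ** 2))
    return mu, math.sqrt(var), scale


# Illustrative values only (not data from the paper).
mu, sigma, scale = merge_gaussians(0.0, 1.0, 2.0, 1.0)
```

Because the merged mean and variance are computed in constant time from the two inputs, repeated pairwise merging of this kind costs a fixed amount of work per pair, which is consistent with (but does not by itself establish) the O(N) behaviour claimed in the abstract.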

Keywords: Cancer, pathogenesis, chromosomal aberration, copy number variation, segmentation analysis.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1082655
