{"title":"A Pairwise-Gaussian-Merging Approach: Towards Genome Segmentation for Copy Number Analysis","authors":"Chih-Hao Chen, Hsing-Chung Lee, Qingdong Ling, Hsiao-Jung Chen, Sun-Chong Wang, Li-Ching Wu, H.C. Lee","country":null,"institution":"","volume":51,"journal":"International Journal of Biotechnology and Bioengineering","pagesStart":155,"pagesEnd":164,"ISSN":"1307-6892","URL":"https:\/\/publications.waset.org\/pdf\/14262","abstract":"Segmentation, filtering out of measurement errors, and identification of breakpoints are integral parts of any analysis of microarray data for the detection of copy number variation (CNV). Existing algorithms designed for these tasks have had some success, but they tend to be O(N^2) in computation time, memory requirement, or both, and the rapid advance of microarray resolution has rendered such algorithms practically useless. Here we propose an algorithm, SAD, that is much faster and needs much less memory, being O(N) in both computation time and memory requirement, and that offers higher accuracy. The two key ingredients of SAD are the fundamental statistical assumption that measurement errors are normally distributed and the mathematical relation that the product of two Gaussian functions is another Gaussian function. We have produced a computer program for analyzing CNV based on SAD. In addition to being fast and small, it offers two important features: quantitative statistics for predictions and, with only two user-decided parameters, ease of use. Its speed shows little dependence on genomic profile. Running on an average modern computer, it completes CNV analyses for a 262 thousand-probe array in ~1 second and a 1.8 million-probe array in 9 seconds.","references":null,"publisher":"World Academy of Science, Engineering and Technology","index":"Open Science Index 51, 2011"}