Dimension Reduction of Microarray Data Based on Local Principal Component
Authors: Ali Anaissi, Paul J. Kennedy, Madhu Goyal
Abstract:
Analysis and visualization of microarray data is of great help to biologists and clinicians in the diagnosis and treatment of patients. It allows clinicians to better understand the structure of microarray data and facilitates the understanding of gene expression in cells. However, a microarray data set is complex, with thousands of features and a very small number of observations. Such very high-dimensional data often contain noise and uninformative features, and only a small number of features are relevant to the disease or genotype. This paper proposes a non-linear dimensionality reduction algorithm, Local Principal Component (LPC), which aims to map high-dimensional data to a lower-dimensional space. The reduced data represent the most important variables underlying the original data. Experimental results and comparisons are presented to show the quality of the proposed algorithm. The experiments also show how the algorithm reduces high-dimensional data whilst preserving, in the low-dimensional space, the neighbourhoods that the points have in the high-dimensional space.
Keywords: Linear Dimension Reduction, Non-Linear Dimension Reduction, Principal Component Analysis, Biologists.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1081900
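The neighbourhood preservation mentioned in the abstract can be made concrete with a small sketch. The score below is a plain k-nearest-neighbour overlap between the original and the reduced space; it is an illustrative assumption, not necessarily the quality measure used by the authors, and it does not implement LPC itself. The function names `knn_indices` and `neighbourhood_preservation` and the toy data are hypothetical.

```python
import math


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to p (ties broken by index).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbours.append(set(others[:k]))
    return neighbours


def neighbourhood_preservation(high, low, k):
    """Mean fraction of each point's k nearest neighbours shared by both spaces.

    Hypothetical score for illustration: 1.0 means every point keeps exactly
    the same k neighbours after reduction; lower values mean more are lost.
    """
    if len(high) != len(low):
        raise ValueError("both representations must contain the same points")
    if not 0 < k < len(high):
        raise ValueError("k must be between 1 and the number of points minus 1")
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    overlaps = [len(a & b) / k for a, b in zip(nn_high, nn_low)]
    return sum(overlaps) / len(overlaps)


# Toy data (made up): five points lying on a straight line in 3-D.
positions = (0.0, 1.0, 2.0, 3.5, 5.0)
high = [(t, 2.0 * t, -t) for t in positions]

# A 1-D embedding that keeps the order along the line, and one that scrambles it.
good_low = [(t,) for t in positions]
bad_low = [(0.0,), (5.0,), (1.0,), (3.5,), (2.0,)]

score_good = neighbourhood_preservation(high, good_low, k=2)
score_bad = neighbourhood_preservation(high, bad_low, k=2)
print(score_good, score_bad)
```

An embedding that keeps each point's neighbours scores higher than one that scrambles them, which is the property the abstract claims for the reduced data. On real microarray data the same comparison would be run between the original expression matrix and the output of the reduction method.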