TY - JFULL
AU - G. Lavanya Devi and Allam Appa Rao and A. Damodaram and GR Sridhar and G. Jaya Suma
PY - 2008/4/
TI - Clustering Protein Sequences with Tailored General Regression Model Technique
T2 - International Journal of Biomedical and Biological Engineering
SP - 66
EP - 71
VL - 2
SN - 1307-6892
UR - https://publications.waset.org/pdf/10604
PU - World Academy of Science, Engineering and Technology
NX - Open Science Index 15, 2008
N2 - Cluster analysis divides data into groups that are meaningful, useful, or both. Analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. Linear relations are valuable in rule discovery for given data, such as "if value X goes up 1, value Y will go down 3", etc. Classical linear regression models the linear relation of two sequences perfectly. However, if we need to cluster a large repository of protein sequences into groups where sequences have a strong linear relationship with each other, it is prohibitively expensive to compare sequences one by one. In this paper, we propose a new technique named General Regression Model Technique Clustering Algorithm (GRMTCA) to benignly handle the problem of linear sequence clustering. GRMT gives a measure, GR*, to tell the degree of linearity of multiple sequences without having to compare each pair of them.
ER -