Identification of Disease Causing DNA Motifs in Human DNA Using Clustering Approach
Studying deoxyribonucleic acid (DNA) sequences is useful for understanding biological processes and has applications in fields such as diagnostics and forensic research. DNA carries the hereditary information of humans and almost all other organisms and is passed on to their offspring. Early detection of defective DNA sequences may lead to many developments in the field of bioinformatics. The techniques currently used to identify defective DNA are tedious. The proposed work analyzes a given sequence and identifies cancer-causing DNA motifs in it. First, the human DNA sequence is split into k-mers using a k-mer separation rule. The k-mers are then clustered using a Self-Organizing Map (SOM). The cancer-associated DNA motif is then identified within the k-mer clusters using the Levenshtein distance measure. The experimental results indicate the presence or absence of a cancer-causing DNA motif: if the motif is found, the input is declared a cancer-causing DNA sequence; otherwise, the input human DNA is declared a normal sequence. Finally, the elapsed time for finding the cancer-causing DNA motif with cluster formation is measured and compared with that of the normal process of finding the motif. Locating the cancer-associated motif is easier with cluster formation than with the normal process. The proposed work is intended as an initial aid for research on genetic diseases.
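The k-mer separation and Levenshtein matching steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The sliding window with step 1, the function names, the `max_distance` threshold, and the example sequences are all assumptions made for illustration, and the SOM clustering step is omitted, so the search runs over every k-mer rather than over clusters.

```python
def kmers(sequence, k):
    """Return all overlapping substrings of length k (sliding window, step 1).

    Assumption: the abstract's "k-mer separation rule" is taken to be a
    plain sliding window; the paper may define it differently.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def contains_motif(sequence, motif, max_distance=0):
    """True if any k-mer of sequence is within max_distance edits of motif.

    max_distance is a hypothetical tolerance, not a value from the paper.
    """
    return any(levenshtein(km, motif) <= max_distance
               for km in kmers(sequence, len(motif)))


if __name__ == "__main__":
    # Made-up sequences for illustration only, not real disease motifs.
    print(kmers("ACGTAC", 3))
    print(contains_motif("TTACGTGG", "ACGT"))
    print(contains_motif("TTACCTGG", "ACGT", max_distance=1))
```

In the clustered variant the abstract describes, the motif would presumably be compared against k-mers within SOM clusters rather than against every k-mer, which is where the reported time difference would arise.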
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1316412