SOM Related Abstracts

5 A Comparative Study of Multi-SOM Algorithms for Determining the Optimal Number of Clusters

Authors: Mohamed Limam, Imèn Khanchouch, Malika Charrad


Interpreting cluster quality and determining the optimal number of clusters remain crucial problems in clustering. This paper focuses on the multi-SOM clustering method, which overcomes the problem of extracting the number of clusters from the SOM map by using a clustering validity index. We tested multi-SOM on real and artificial data sets with evaluation criteria not previously used for it, such as the Davies-Bouldin index, the Dunn index, and the silhouette index. The developed multi-SOM algorithm is compared with the k-means and BIRCH methods. Results show that it is more efficient than these classical clustering methods.

Keywords: Clustering, SOM, multi-SOM, DB index, Dunn index, silhouette index
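
To illustrate how a validity index can select the number of clusters, as the abstract above describes, here is a minimal sketch that computes the Davies-Bouldin index for two candidate partitions and keeps the one with the lowest score. It is not the authors' multi-SOM implementation: the toy points, the candidate partitions, and the function names are assumptions made for this example.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    Lower is better: compact clusters that lie far apart give small values.
    """
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i
        )
    return total / k

# Hypothetical toy data: two well-separated unit squares.
square_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
square_b = [(10, 10), (10, 11), (11, 10), (11, 11)]

# Candidate partitions keyed by their number of clusters (assumed inputs).
candidates = {
    2: [square_a, square_b],
    3: [square_a[:2], square_a[2:], square_b],
}
scores = {k: davies_bouldin(parts) for k, parts in candidates.items()}
best_k = min(scores, key=scores.get)
```

In this toy case, splitting one tight square into two halves raises the index, so the two-cluster partition is preferred.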

4 Identification of Disease Causing DNA Motifs in Human DNA Using Clustering Approach

Authors: G. Tamilpavai, C. Vishnuppriya


Studying DNA (deoxyribonucleic acid) sequences is useful for understanding biological processes and is applied in fields such as diagnostic and forensic research. DNA carries the hereditary information of humans and almost all other organisms and is passed on to subsequent generations. Early detection of defective DNA sequences may lead to many developments in the field of bioinformatics. The techniques currently used to identify defective DNA are tedious. The proposed work analyzes a given sequence to identify cancer-causing DNA motifs. First, the human DNA sequence is separated into k-mers using a k-mer separation rule. The separated k-mers are clustered using a Self-Organizing Map (SOM). Using the Levenshtein distance measure, the cancer-associated DNA motif is identified from the k-mer clusters. The experimental results indicate the presence or absence of a cancer-causing DNA motif. If the cancer-associated motif is found, the input is declared a cancer-causing DNA sequence; otherwise, the input human DNA is declared a normal sequence. Finally, the elapsed time for finding the cancer-causing motif using cluster formation is measured and compared with that of the normal process of finding the motif. Locating the cancer-associated motif is easier with cluster formation than with the normal process. The proposed work is intended as an initial aid for research on genetic diseases.

Keywords: Bioinformatics, DNA, SOM, cancer motif, k-mers, Levenshtein distance
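
The k-mer separation and Levenshtein matching steps mentioned in the abstract above can be sketched as follows. The abstract does not define its k-mer separation rule, so this example assumes an overlapping sliding window with step 1. It also skips the SOM clustering stage and compares every k-mer with the motif directly. The sequence, the motif, and the function names are hypothetical.

```python
def kmers(sequence, k):
    """All overlapping substrings of length k (sliding window, step 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def levenshtein(a, b):
    """Edit distance where insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

def motif_hits(sequence, motif, max_distance=0):
    """(position, k-mer, distance) for k-mers within max_distance of motif."""
    hits = []
    for position, kmer in enumerate(kmers(sequence, len(motif))):
        distance = levenshtein(kmer, motif)
        if distance <= max_distance:
            hits.append((position, kmer, distance))
    return hits

# Hypothetical toy sequence and motif, for illustration only.
hits = motif_hits("ACGTTGCAAC", "TTGC")
```

An empty `hits` list corresponds to the "normal sequence" outcome in the abstract; raising `max_distance` tolerates near matches.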

3 Hyperspectral Data Classification Algorithm Based on the Deep Belief and Self-Organizing Neural Network

Authors: Huang Yong, Li Qingjian, Li Ke, He Chun


This paper proposes a method that combines Pohl Seidman's deep belief network with a self-organizing neural network to classify targets. The method addresses the high nonlinearity of hyperspectral images, the high sample dimension, and the difficulty of designing a classifier. The main features of the original data are extracted by the deep belief network. During feature extraction, samples with known labels are added to fine-tune the network and enrich the main features. The extracted feature vectors are then classified by the self-organizing neural network. The method effectively reduces the spectral dimension of the data while preserving a large amount of the raw data information. It addresses the long training time of traditional clustering and the difficulty of training deep learning algorithms when labeled samples are scarce, and it improves classification accuracy and robustness. Simulation results show that the proposed network structure achieves higher classification precision when only a small number of labeled samples are available.

Keywords: Data Compression, Hyperspectral, pattern classification, DBN, SOM
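
The self-organizing stage that receives the extracted feature vectors can be sketched as a small one-dimensional SOM. This is only an illustration under stated assumptions: the deep belief network is not shown, the 2-D "feature vectors" are made-up toy points, and the initial weights, learning-rate schedule, and neighborhood radius are arbitrary choices rather than the paper's settings.

```python
import math

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda u: math.dist(weights[u], x))

def train_som(data, initial_weights, epochs=20, lr0=0.5, radius0=1.0):
    """Train a 1-D SOM (units arranged on a line) and return its weights."""
    weights = [list(w) for w in initial_weights]  # copy; inputs stay untouched
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays linearly toward 0
        lr = lr0 * frac                      # learning rate for this epoch
        radius = max(radius0 * frac, 1e-9)   # neighborhood width, kept positive
        for x in data:
            bmu = best_matching_unit(weights, x)
            for u, w in enumerate(weights):
                # Gaussian neighborhood: units near the BMU move more.
                h = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Hypothetical feature vectors forming two well-separated groups.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11)]

trained = train_som(group_a + group_b, [(2.0, 2.0), (8.0, 8.0)])
labels_a = {best_matching_unit(trained, p) for p in group_a}
labels_b = {best_matching_unit(trained, p) for p in group_b}
```

Each input is assigned to its best-matching unit, so the unit index acts as the cluster label.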

2 Finding the Longest Common Subsequence in Normal DNA and Disease Affected Human DNA Using Self Organizing Map

Authors: G. Tamilpavai, C. Vishnuppriya


Bioinformatics is an active research area that combines biology and computer science. Finding the longest common subsequence (LCSS) is one of the major challenges in various bioinformatics applications. Computing the LCSS plays a vital role in biomedicine and is an essential task in DNA sequence analysis in genetics, where it supports a wide range of disease diagnosis steps. The objective of the proposed system is to find the longest common subsequence present in normal and various disease-affected human DNA sequences using a Self-Organizing Map (SOM) and LCSS. The human DNA sequences are collected from the National Center for Biotechnology Information (NCBI) database. First, each human DNA sequence is separated into k-mers using a k-mer separation rule. Mean and median values are calculated for each separated k-mer. These values are fed as input to the Self-Organizing Map for clustering. The resulting clusters are then given to the LCSS algorithm to find the common subsequences present in every cluster. It returns n(n-1)/2 subsequences for each cluster, where n is the number of k-mers in that cluster. The experiments produce the possible longest common subsequences of normal and disease-affected DNA data. The proposed system is thus intended as an initial aid for finding disease-causing sequences. Finally, a performance analysis is carried out for different DNA sequences. The obtained values show that the LCSS is retrieved in a shorter time than with the existing system.

Keywords: Clustering, SOM, k-mers, longest common subsequence
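
The n(n-1)/2 count in the abstract above is the number of unordered pairs of k-mers in a cluster. The sketch below computes one longest common subsequence per pair with the standard dynamic-programming table. The cluster contents and function names are hypothetical, and the SOM step that would produce the cluster is not shown.

```python
from itertools import combinations

def lcs(a, b):
    """One longest common subsequence of a and b via dynamic programming."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

def pairwise_lcs(cluster):
    """LCS for every unordered pair of members: n*(n-1)/2 results."""
    return [(x, y, lcs(x, y)) for x, y in combinations(cluster, 2)]

# Hypothetical cluster of four k-mers, giving 4*3/2 = 6 pairs.
cluster = ["ACGT", "AGGT", "ACCT", "TCGT"]
pairs = pairwise_lcs(cluster)
```

Because each pair costs O(k^2) for k-mers of length k, the work per cluster grows quadratically with n, which is one reason to cluster before comparing.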

1 Neural Network Approach For Clustering Host Community: Based on Perceptions Toward Tourism, Their Satisfaction Level and Demographic Attributes in Iran (Lahijan)

Authors: Ali Rajabzadeh, Nasibeh Mohammadpour, Adel Azar, Hamid Zargham Borujeni


In general, the development of an industry depends on the support of its stakeholders and beneficiaries. In the tourism industry, which has become one of the most lucrative and employment-generating activities at the international level, one of the most important stakeholders is the host community in a tourist destination, which both affects and is affected by the industry's development. Recognizing the host community and its segments can be important for gaining its support for future decisions and policy making. To identify these segments, this study clusters residents using tools that are designed to deal with human complexity and can model and generalize complex systems without needing the initial cluster seeds that classic methods require. Neural networks can help meet these expectations. The research was planned to design a neural-network-based mathematical model that clusters the host community effectively according to multiple criteria and identifies the differences among segments. To achieve this goal, residents were segmented by demographic characteristics, their attitude toward tourism development, their level of satisfaction, and the type of their support in this field. The applied method is the self-organizing neural network, and its results are compared with those of K-means. The results show that the Self-Organizing Map (SOM) method performs considerably better in terms of the Cophenetic correlation coefficient and the between-cluster variance. Based on these criteria, the host community is divided into five segments with unique and distinctive features; this division gives the best result among the alternatives considered, with a Cophenetic correlation coefficient of 0.8769 and a between-cluster variance of 0.1412.

Keywords: Tourism, Clustering, SOM, resident, Artificial Neural Network

