%0 Journal Article
	%A Matthias Dehmer and  Frank Emmert Streib and  J├╝rgen Kilian and  Andreas Zulauf
	%D 2007
	%J International Journal of Computer and Information Engineering
	%B World Academy of Science, Engineering and Technology
	%I Open Science Index 10, 2007
	%T Towards Clustering of Web-based Document Structures
	%U https://publications.waset.org/pdf/10653
	%V 10
	%X Methods for organizing web data into groups in order
to analyze web-based hypertext data and facilitate data availability
are very important in terms of the number of documents available
online. Thereby, the task of clustering web-based document structures
has many applications, e.g., improving information retrieval on the
web, better understanding of user navigation behavior, improving web
users requests servicing, and increasing web information accessibility.
In this paper we investigate a new approach for clustering web-based
hypertexts on the basis of their graph structures. The hypertexts will
be represented as so called generalized trees which are more general
than usual directed rooted trees, e.g., DOM-Trees. As a important
preprocessing step we measure the structural similarity between the
generalized trees on the basis of a similarity measure d. Then,
we apply agglomerative clustering to the obtained similarity matrix
in order to create clusters of hypertext graph patterns representing
navigation structures. In the present paper we will run our approach
on a data set of hypertext structures and obtain good results in
Web Structure Mining. Furthermore we outline the application of
our approach in Web Usage Mining as future work.
	%P 3315 - 3320