Information Filtering using Index Word Selection based on the Topics

Authors: Takeru YOKOI, Hidekazu YANAGIMOTO, Sigeru OMATU

Abstract:

We propose an information filtering system that selects index words from a document set based on the topics contained in that set. The method narrows the vocabulary down to the words that are particularly characteristic of the document set, with the topics obtained by Sparse Non-negative Matrix Factorization. In information filtering, a document is often represented as a vector whose elements are the weights of the index words, and the dimension of this vector grows as the number of documents increases. As a result, the index words may include words that are useless for filtering. To address this problem, the dimension needs to be reduced. Our proposal reduces the dimension by selecting index words based on the topics included in the document set, which we obtain by applying Sparse Non-negative Matrix Factorization to the document set. Filtering is carried out based on the centroid of the learning document set. The centroid is regarded as the user's interest and is represented as a document vector whose elements are the weights of the selected index words. We evaluate the proposal on the English test collection MEDLINE. The results confirm that, when an appropriate number of index words is selected, the proposed selection improves recommendation accuracy over previous methods. In addition, we examine the index words selected by our proposal and find that they cover some minor topics included in the document set.
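
The pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names (`l1_nmf`, `select_index_words`, `centroid`, `score`), the toy six-word vocabulary, and the rule of keeping the highest-weighted words of each topic are all hypothetical. The factorization uses plain multiplicative updates with an L1 penalty on the word-topic matrix as a simple stand-in for Sparse NMF, which may differ from the sparseness constraint used in the paper.

```python
import math
import random


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def l1_nmf(V, k, penalty=0.1, iters=200, seed=0, eps=1e-9):
    """Approximate V (words x docs) by W (words x k) times H (k x docs).

    Multiplicative updates keep W and H non-negative. The L1 penalty in the
    denominator of the W update shrinks small word weights toward zero
    (assumed stand-in for the paper's Sparse NMF).
    """
    rng = random.Random(seed)
    n_words, n_docs = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_words)]
    H = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_docs)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + penalty + eps)
              for j in range(k)] for i in range(n_words)]
    return W, H


def select_index_words(W, per_topic):
    """Keep the highest-weighted words of every topic (column of W)."""
    selected = set()
    for t in range(len(W[0])):
        ranked = sorted(range(len(W)), key=lambda w: W[w][t], reverse=True)
        selected.update(ranked[:per_topic])
    return sorted(selected)


def centroid(docs, selected):
    """Average the learning documents over the selected index words only."""
    return [sum(d[w] for d in docs) / len(docs) for w in selected]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def score(doc, profile, selected):
    """Similarity between a document and the user profile (the centroid)."""
    return cosine([doc[w] for w in selected], profile)


# Toy learning set: each row is one document, each column one word weight.
learning_docs = [
    [2, 1, 0, 0, 1, 0],
    [1, 2, 0, 0, 1, 0],
    [0, 0, 2, 1, 1, 0],
    [0, 0, 1, 2, 1, 0],
]
V = transpose(learning_docs)             # words x docs
W, H = l1_nmf(V, k=2)                    # two topics
selected = select_index_words(W, per_topic=2)
profile = centroid(learning_docs, selected)

on_topic = [3, 2, 1, 1, 2, 0]            # shares words with the learning set
off_topic = [0, 0, 0, 0, 0, 3]           # uses only a word the learners never use
print("selected index words:", selected)
print("on-topic score:", score(on_topic, profile, selected))
print("off-topic score:", score(off_topic, profile, selected))
```

In this sketch the profile vector has as many elements as there are selected index words rather than one per vocabulary word, which is the dimension reduction the abstract refers to. A document would then be recommended when its score against the profile exceeds some threshold chosen by the system designer.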

Keywords: Information Filtering, Sparse NMF, Index Word Selection, User Profile, Chi-squared Measure

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1083549

