A Review on Important Aspects of Information Retrieval
Authors: Yogesh Gupta, Ashish Saini, A.K. Saxena
Abstract:
Information retrieval has become an important field of study and research in computer science due to the explosive growth of information available as full text, hypertext, administrative text, directories, and numeric or bibliographic text. Research is ongoing on various aspects of information retrieval systems to improve their efficiency and reliability. This paper presents a comprehensive study that covers the emergence and evolution of information retrieval, different information retrieval models, and important aspects such as document representation, similarity measures, and query expansion.
Keywords: Information Retrieval, query expansion, similarity measure, vector space model.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1336508