Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7192

Search results for: Information retrieval systems

7192 A Review on Important Aspects of Information Retrieval

Authors: Yogesh Gupta, Ashish Saini, A.K. Saxena

Abstract:

Information retrieval has become an important field of study and research under computer science due to explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. The research work is going on various aspects of information retrieval systems so as to improve its efficiency and reliability. This paper presents a comprehensive study, which discusses not only emergence and evolution of information retrieval but also includes different information retrieval models and some important aspects such as document representation, similarity measure and query expansion.

Keywords: Information Retrieval, query expansion, similarity measure, query expansion, vector space model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2763
7191 Performance Evaluation of Content Based Image Retrieval Using Indexed Views

Authors: Tahir Iqbal, Mumtaz Ali, Syed Wajahat Kareem, Muhammad Harris

Abstract:

Digital information is expanding in exponential order in our life. Information that is residing online and offline are stored in huge repositories relating to every aspect of our lives. Getting the required information is a task of retrieval systems. Content based image retrieval (CBIR) is a retrieval system that retrieves the required information from repositories on the basis of the contents of the image. Time is a critical factor in retrieval system and using indexed views with CBIR system improves the time efficiency of retrieved results.

Keywords: Content based image retrieval (CBIR), Indexed view, Color, Image retrieval, Cross correlation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1612
7190 Word Stemming Algorithms and Retrieval Effectiveness in Malay and Arabic Documents Retrieval Systems

Authors: Tengku Mohd T. Sembok

Abstract:

Documents retrieval in Information Retrieval Systems (IRS) is generally about understanding of information in the documents concern. The more the system able to understand the contents of documents the more effective will be the retrieval outcomes. But understanding of the contents is a very complex task. Conventional IRS apply algorithms that can only approximate the meaning of document contents through keywords approach using vector space model. Keywords may be unstemmed or stemmed. When keywords are stemmed and conflated in retrieving process, we are a step forwards in applying semantic technology in IRS. Word stemming is a process in morphological analysis under natural language processing, before syntactic and semantic analysis. We have developed algorithms for Malay and Arabic and incorporated stemming in our experimental systems in order to measure retrieval effectiveness. The results have shown that the retrieval effectiveness has increased when stemming is used in the systems.

Keywords: Information Retrieval, Natural Language Processing, Artificial Intelligence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2037
7189 Context-Aware Querying in Multimedia Databases – A Futuristic Approach

Authors: Nadeem Iftikhar, Zouhaib Zafar, Shaukat Ali

Abstract:

Efficient retrieval of multimedia objects has gained enormous focus in recent years. A number of techniques have been suggested for retrieval of textual information; however, relatively little has been suggested for efficient retrieval of multimedia objects. In this paper we have proposed a generic architecture for contextaware retrieval of multimedia objects. The proposed framework combines the well-known approaches of text-based retrieval and context-aware retrieval to formulate architecture for accurate retrieval of multimedia data.

Keywords: Context-aware retrieval, information retrieval, multimedia databases, multimedia data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1266
7188 Using Genetic Algorithm to Improve Information Retrieval Systems

Authors: Ahmed A. A. Radwan, Bahgat A. Abdel Latef, Abdel Mgeid A. Ali, Osman A. Sadek

Abstract:

This study investigates the use of genetic algorithms in information retrieval. The method is shown to be applicable to three well-known documents collections, where more relevant documents are presented to users in the genetic modification. In this paper we present a new fitness function for approximate information retrieval which is very fast and very flexible, than cosine similarity fitness function.

Keywords: Cosine similarity, Fitness function, Genetic Algorithm, Information Retrieval, Query learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2478
7187 Language and Retrieval Accuracy

Authors: Ahmed Abdelali, Jim Cowie, Hamdy S. Soliman

Abstract:

One of the major challenges in the Information Retrieval field is handling the massive amount of information available to Internet users. Existing ranking techniques and strategies that govern the retrieval process fall short of expected accuracy. Often relevant documents are buried deep in the list of documents returned by the search engine. In order to improve retrieval accuracy we examine the issue of language effect on the retrieval process. Then, we propose a solution for a more biased, user-centric relevance for retrieved data. The results demonstrate that using indices based on variations of the same language enhances the accuracy of search engines for individual users.

Keywords: Information Search and Retrieval, LanguageVariants, Search Engine, Retrieval Accuracy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1205
7186 Multi-agent Data Fusion Architecture for Intelligent Web Information Retrieval

Authors: Amin Milani Fard, Mohsen Kahani, Reza Ghaemi, Hamid Tabatabaee

Abstract:

In this paper we propose a multi-agent architecture for web information retrieval using fuzzy logic based result fusion mechanism. The model is designed in JADE framework and takes advantage of JXTA agent communication method to allow agent communication through firewalls and network address translators. This approach enables developers to build and deploy P2P applications through a unified medium to manage agent-based document retrieval from multiple sources.

Keywords: Information retrieval systems, list fusion methods, document score, multi-agent systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1333
7185 A New Model of English-Vietnamese Bilingual Information Retrieval System

Authors: Chinh Trong Nguyen, Dang Tuan Nguyen

Abstract:

In this paper, we propose a new model of English- Vietnamese bilingual Information Retrieval system. Although there are so many CLIR systems had been researched and built, the accuracy of searching results in different languages that the CLIR system supports still need to improve, especially in finding bilingual documents. The problems identified in this paper are the limitation of machine translation-s result and the extra large collections of document to be found. So we try to establish a different model to overcome these problems.

Keywords: Bilingual Information Retrieval, Cross-lingual Information Retrieval, Bilingual Web sites.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1385
7184 Using Dempster-Shafer Theory in XML Information Retrieval

Authors: F. Raja, M. Rahgozar, F. Oroumchian

Abstract:

XML is a markup language which is becoming the standard format for information representation and data exchange. A major purpose of XML is the explicit representation of the logical structure of a document. Much research has been performed to exploit logical structure of documents in information retrieval in order to precisely extract user information need from large collections of XML documents. In this paper, we describe an XML information retrieval weighting scheme that tries to find the most relevant elements in XML documents in response to a user query. We present this weighting model for information retrieval systems that utilize plausible inferences to infer the relevance of elements in XML documents. We also add to this model the Dempster-Shafer theory of evidence to express the uncertainty in plausible inferences and Dempster-Shafer rule of combination to combine evidences derived from different inferences.

Keywords: Dempster-Shafer theory, plausible inferences, XMLinformation retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1289
7183 A Comparative Performance Evaluation Model of Mobile Agent Versus Remote Method Invocation for Information Retrieval

Authors: Yousry El-Gamal, Khalid El-Gazzar, Magdy Saeb

Abstract:

The development of distributed systems has been affected by the need to accommodate an increasing degree of flexibility, adaptability, and autonomy. The Mobile Agent technology is emerging as an alternative to build a smart generation of highly distributed systems. In this work, we investigate the performance aspect of agent-based technologies for information retrieval. We present a comparative performance evaluation model of Mobile Agents versus Remote Method Invocation by means of an analytical approach. We demonstrate the effectiveness of mobile agents for dynamic code deployment and remote data processing by reducing total latency and at the same time producing minimum network traffic. We argue that exploiting agent-based technologies significantly enhances the performance of distributed systems in the domain of information retrieval.

Keywords: Mobile Agent, performance evaluation, RMI, information retrieval, distributed systems, database.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2018
7182 A Frame Work for Query Results Refinement in Multimedia Databases

Authors: Humaira Liaquat, Nadeem Iftikhar, Shaukat Ali, Zohaib Zafar Iqbal

Abstract:

In the current age, retrieval of relevant information from massive amount of data is a challenging job. Over the years, precise and relevant retrieval of information has attained high significance. There is a growing need in the market to build systems, which can retrieve multimedia information that precisely meets the user's current needs. In this paper, we have introduced a framework for refining query results before showing it to the user, using ambient intelligence, user profile, group profile, user location, time, day, user device type and extracted features. A prototypic tool was also developed to demonstrate the efficiency of the proposed approach.

Keywords: Context aware retrieval, Information retrieval, Ambient Intelligence, Multimedia databases, User and group profile.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1156
7181 Enhancing Retrieval Effectiveness of Malay Documents by Exploiting Implicit Semantic Relationship between Words

Authors: Mohd Pouzi Hamzah, Tengku Mohd Tengku Sembok

Abstract:

Phrases has a long history in information retrieval, particularly in commercial systems. Implicit semantic relationship between words in a form of BaseNP have shown significant improvement in term of precision in many IR studies. Our research focuses on linguistic phrases which is language dependent. Our results show that using BaseNP can improve performance although above 62% of words formation in Malay Language based on derivational affixes and suffixes.

Keywords: Information Retrieval, Malay Language, Semantic Relationship, Retrieval Effectiveness, Conceptual Indexing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1176
7180 ARCS for Critical Information Retrieval Development

Authors: Suttipong Boonphadung

Abstract:

The research on ARCS for critical information retrieval development aimed to (1) investigate conditions of critical information retrieval skill of the Mathematics pre-service teachers before applying ARCS model in learning activities, (2) study and analyze the development of critical information retrieval skill of the Mathematics pre-service teachers after utilizing ARCS model in learning activities, and (3) evaluate the Mathematics pre-service teachers’ satisfaction on using ARCS model in learning activities as a tool to development critical information retrieval skill. Forty-one of 4th year Mathematics pre-service teachers who have enrolled in the subject of Research for Learning Development of semester 2 in 2012 were purposively selected as the research cohort. The research tools were self-report and interview questionnaire that was approved as content validity and reliability (IOC=.66-1.00, α =.834). The research found that critical information retrieval skill of the research samples before using ARCS model in learning activities was in the normal high level. According to the in-depth interview and focus group, the result however showed that the pre-service teachers still lack inadequate and effective knowledge in information retrieval. Additionally, critical information retrieval skill of the research cohort after applying ARCS model in learning activities appeared to be high level. The result revealed that the pre-service teachers are able to explain the method of searching, extraction, and selecting information as well as evaluating quality of information, and effectively making decision in accepting information. Moreover, the research discovered that the pre-service teachers showed normal high to highest level of satisfaction on using ARCS model in learning activities as a tool to development their critical information retrieval skill.

Keywords: Critical information retrieval skill, ARCS model, Satisfaction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1285
7179 Text Retrieval Relevance Feedback Techniques for Bag of Words Model in CBIR

Authors: Nhu Van NGUYEN, Jean-Marc OGIER, Salvatore TABBONE, Alain BOUCHER

Abstract:

The state-of-the-art Bag of Words model in Content- Based Image Retrieval has been used for years but the relevance feedback strategies for this model are not fully investigated. Inspired from text retrieval, the Bag of Words model has the ability to use the wealth of knowledge and practices available in text retrieval. We study and experiment the relevance feedback model in text retrieval for adapting it to image retrieval. The experiments show that the techniques from text retrieval give good results for image retrieval and that further improvements is possible.

Keywords: Relevance feedback, bag of words model, probabilistic model, vector space model, image retrieval

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1821
7178 Image Retrieval: Techniques, Challenge, and Trend

Authors: Hui Hui Wang, Dzulkifli Mohamad, N.A Ismail

Abstract:

This paper attempts to discuss the evolution of the retrieval techniques focusing on development, challenges and trends of the image retrieval. It highlights both the already addressed and outstanding issues. The explosive growth of image data leads to the need of research and development of Image Retrieval. However, Image retrieval researches are moving from keyword, to low level features and to semantic features. Drive towards semantic features is due to the problem of the keywords which can be very subjective and time consuming while low level features cannot always describe high level concepts in the users- mind.

Keywords: content based image retrieval, keyword based imageretrieval, semantic gap, semantic image retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2231
7177 Algorithm for Information Retrieval Optimization

Authors: Kehinde K. Agbele, Kehinde Daniel Aruleba, Eniafe F. Ayetiran

Abstract:

When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates optimization of IRS to individual information needs in order of relevance. The study addressed development of algorithms that optimize the ranking of documents retrieved from IRS. This study discusses and describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated databases environment. Conversely, as the volume of information available online and in designated databases is growing continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors. This is based on calculating the weight (

Keywords: Internet ranking,

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1215
7176 Leveraging Quality Metrics in Voting Model Based Thread Retrieval

Authors: Atefeh Heydari, Mohammadali Tavakoli, Zuriati Ismail, Naomie Salim

Abstract:

Seeking and sharing knowledge on online forums have made them popular in recent years. Although online forums are valuable sources of information, due to variety of sources of messages, retrieving reliable threads with high quality content is an issue. Majority of the existing information retrieval systems ignore the quality of retrieved documents, particularly, in the field of thread retrieval. In this research, we present an approach that employs various quality features in order to investigate the quality of retrieved threads. Different aspects of content quality, including completeness, comprehensiveness, and politeness, are assessed using these features, which lead to finding not only textual, but also conceptual relevant threads for a user query within a forum. To analyse the influence of the features, we used an adopted version of voting model thread search as a retrieval system. We equipped it with each feature solely and also various combinations of features in turn during multiple runs. The results show that incorporating the quality features enhances the effectiveness of the utilised retrieval system significantly.

Keywords: Content quality, Forum search, Thread retrieval, Voting techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1410
7175 Data Extraction of XML Files using Searching and Indexing Techniques

Authors: Sushma Satpute, Vaishali Katkar, Nilesh Sahare

Abstract:

XML files contain data which is in well formatted manner. By studying the format or semantics of the grammar it will be helpful for fast retrieval of the data. There are many algorithms which describes about searching the data from XML files. There are no. of approaches which uses data structure or are related to the contents of the document. In these cases user must know about the structure of the document and information retrieval techniques using NLPs is related to content of the document. Hence the result may be irrelevant or not so successful and may take more time to search.. This paper presents fast XML retrieval techniques by using new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns the value to each tag from file. To query the system, a user is not constrained about fixed format of query.

Keywords: XML Retrieval, Indexed Search, Information Retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1552
7174 Neural-Symbolic Machine-Learning for Knowledge Discovery and Adaptive Information Retrieval

Authors: Hager Kammoun, Jean Charles Lamirel, Mohamed Ben Ahmed

Abstract:

In this paper, a model for an information retrieval system is proposed which takes into account that knowledge about documents and information need of users are dynamic. Two methods are combined, one qualitative or symbolic and the other quantitative or numeric, which are deemed suitable for many clustering contexts, data analysis, concept exploring and knowledge discovery. These two methods may be classified as inductive learning techniques. In this model, they are introduced to build “long term" knowledge about past queries and concepts in a collection of documents. The “long term" knowledge can guide and assist the user to formulate an initial query and can be exploited in the process of retrieving relevant information. The different kinds of knowledge are organized in different points of view. This may be considered an enrichment of the exploration level which is coherent with the concept of document/query structure.

Keywords: Information Retrieval Systems, machine learning, classification, Galois lattices, Self Organizing Map.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 995
7173 Information Retrieval: A Comparative Study of Textual Indexing Using an Oriented Object Database (db4o) and the Inverted File

Authors: Mohammed Erritali

Abstract:

The growth in the volume of text data such as books and articles in libraries for centuries has imposed to establish effective mechanisms to locate them. Early techniques such as abstraction, indexing and the use of classification categories have marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (corpus) to allow a user to find the relevant ones for him, that is to say, the contents which matches with the information needs of the user. Most of the models of information retrieval use a specific data structure to index a corpus which is called "inverted file" or "reverse index". This inverted file collects information on all terms over the corpus documents specifying the identifiers of documents that contain the term in question, the frequency of each term in the documents of the corpus, the positions of the occurrences of the word... In this paper we use an oriented object database (db4o) instead of the inverted file, that is to say, instead to search a term in the inverted file, we will search it in the db4o database. The purpose of this work is to make a comparative study to see if the oriented object databases may be competing for the inverse index in terms of access speed and resource consumption using a large volume of data.

Keywords: Information Retrieval, indexation, oriented object database (db4o), inverted file.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1449
7172 An Approach of Control System for Automated Storage and Retrieval System (AS/RS)

Authors: M. Soyaslan, A. Fenercioglu, C. Kozkurt

Abstract:

Automated storage and retrieval systems (AS/RS) become frequently used systems in warehouses. There has been a transition from human based forklift applications to fast and safe AS/RS applications in firm-s warehouse systems. In this study, basic components and automation systems of the AS/RS are examined. Proposed system's automation components and their tasks in the system control algorithm were stated. According to this control algorithm the control system structure was obtained.

Keywords: AS/RS, Automatic Storage and Retrieval System, Warehouse Control System

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3282
7171 Composite Relevance Feedback for Image Retrieval

Authors: Pushpa B. Patil, Manesh B. Kokare

Abstract:

This paper presents content-based image retrieval (CBIR) frameworks with relevance feedback (RF) based on combined learning of support vector machines (SVM) and AdaBoosts. The framework incorporates only most relevant images obtained from both the learning algorithm. To speed up the system, it removes irrelevant images from the database, which are returned from SVM learner. It is the key to achieve the effective retrieval performance in terms of time and accuracy. The experimental results show that this framework had significant improvement in retrieval effectiveness, which can finally improve the retrieval performance.

Keywords: Image retrieval, relevance feedback, wavelet transform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1767
7170 Selection of Relevant Servers in Distributed Information Retrieval System

Authors: Benhamouda Sara, Guezouli Larbi

Abstract:

Nowadays, the dissemination of information touches the distributed world, where selecting the relevant servers to a user request is an important problem in distributed information retrieval. During the last decade, several research studies on this issue have been launched to find optimal solutions and many approaches of collection selection have been proposed. In this paper, we propose a new collection selection approach that takes into consideration the number of documents in a collection that contains terms of the query and the weights of those terms in these documents. We tested our method and our studies show that this technique can compete with other state-of-the-art algorithms that we choose to test the performance of our approach.

Keywords: Distributed information retrieval, relevance, server selection, collection selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1090
7169 Query Reformulation Guided by External Resource for Information Retrieval

Authors: Mohammed El Amine Abderrahim

Abstract:

Reformulating the user query is a technique that aims to improve the performance of an Information Retrieval System (IRS) in terms of precision and recall. This paper tries to evaluate the technique of query reformulation guided by an external resource for Arabic texts. To do this, various precision and recall measures were conducted and two corpora with different external resources like Arabic WordNet (AWN) and the Arabic Dictionary (thesaurus) of Meaning (ADM) were used. Examination of the obtained results will allow us to measure the real contribution of this reformulation technique in improving the IRS performance.

Keywords: Arabic NLP, Arabic Information Retrieval, Arabic WordNet, Query Expansion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1169
7168 Role of Natural Language Processing in Information Retrieval; Challenges and Opportunities

Authors: Khaled M. Alhawiti

Abstract:

This paper aims to analyze the role of natural language processing (NLP). The paper will discuss the role in the context of automated data retrieval, automated question answer, and text structuring. NLP techniques are gaining wider acceptance in real life applications and industrial concerns. There are various complexities involved in processing the text of natural language that could satisfy the need of decision makers. This paper begins with the description of the qualities of NLP practices. The paper then focuses on the challenges in natural language processing. The paper also discusses major techniques of NLP. The last section describes opportunities and challenges for future research.

Keywords: Data Retrieval, Information retrieval, Natural Language Processing, Text Structuring.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2380
7167 Comparative Analysis of Different Page Ranking Algorithms

Authors: S. Prabha, K. Duraiswamy, J. Indhumathi

Abstract:

Search engine plays an important role in internet, to retrieve the relevant documents among the huge number of web pages. However, it retrieves more number of documents, which are all relevant to your search topics. To retrieve the most meaningful documents related to search topics, ranking algorithm is used in information retrieval technique. One of the issues in data miming is ranking the retrieved document. In information retrieval the ranking is one of the practical problems. This paper includes various Page Ranking algorithms, page segmentation algorithms and compares those algorithms used for Information Retrieval. Diverse Page Rank based algorithms like Page Rank (PR), Weighted Page Rank (WPR), Weight Page Content Rank (WPCR), Hyperlink Induced Topic Selection (HITS), Distance Rank, Eigen Rumor, Distance Rank Time Rank, Tag Rank, Relational Based Page Rank and Query Dependent Ranking algorithms are discussed and compared.

Keywords: Information Retrieval, Web Page Ranking, search engine, web mining, page segmentations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3791
7166 Content-based Retrieval of Medical Images

Authors: Lilac A. E. Al-Safadi

Abstract:

With the advance of multimedia and diagnostic images technologies, the number of radiographic images is increasing constantly. The medical field demands sophisticated systems for search and retrieval of the produced multimedia document. This paper presents an ongoing research that focuses on the semantic content of radiographic image documents to facilitate semantic-based radiographic image indexing and a retrieval system. The proposed model would divide a radiographic image document, based on its semantic content, and would be converted into a logical structure or a semantic structure. The logical structure represents the overall organization of information. The semantic structure, which is bound to logical structure, is composed of semantic objects with interrelationships in the various spaces in the radiographic image.

Keywords: Semantic Indexing, Content-Based Retrieval, Radiographic Images, Data Model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1269
7165 Applying Clustering of Hierarchical K-means-like Algorithm on Arabic Language

Authors: Sameh H. Ghwanmeh

Abstract:

In this study a clustering technique has been implemented which is K-Means like with hierarchical initial set (HKM). The goal of this study is to prove that clustering document sets do enhancement precision on information retrieval systems, since it was proved by Bellot & El-Beze on French language. A comparison is made between the traditional information retrieval system and the clustered one. Also the effect of increasing number of clusters on precision is studied. The indexing technique is Term Frequency * Inverse Document Frequency (TF * IDF). It has been found that the effect of Hierarchical K-Means Like clustering (HKM) with 3 clusters over 242 Arabic abstract documents from the Saudi Arabian National Computer Conference has significant results compared with traditional information retrieval system without clustering. Additionally it has been found that it is not necessary to increase the number of clusters to improve precision more.

Keywords: Hierarchical K-mean like clustering (HKM), Kmeans, cluster centroids, initial partition, and document distances

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2322
7164 Applications of Rough Set Decompositions in Information Retrieval

Authors: Chen Wu, Xiaohua Hu

Abstract:

This paper proposes rough set models with three different level knowledge granules in incomplete information system under tolerance relation by similarity between objects according to their attribute values. Through introducing dominance relation on the discourse to decompose similarity classes into three subclasses: little better subclass, little worse subclass and vague subclass, it dismantles lower and upper approximations into three components. By using these components, retrieving information to find naturally hierarchical expansions to queries and constructing answers to elaborative queries can be effective. It illustrates the approach in applying rough set models in the design of information retrieval system to access different granular expanded documents. The proposed method enhances rough set model application in the flexibility of expansions and elaborative queries in information retrieval.

Keywords: Incomplete information system, Rough set model, tolerance relation, dominance relation, approximation, decomposition, elaborative query.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1388
7163 A Multilanguage Source Code Retrieval System Using Structural-Semantic Fingerprints

Authors: Mohamed Amine Ouddan, Hassane Essafi

Abstract:

Source code retrieval is of immense importance in the software engineering field. The complex tasks of retrieving and extracting information from source code documents is vital in the development cycle of the large software systems. The two main subtasks which result from these activities are code duplication prevention and plagiarism detection. In this paper, we propose a Mohamed Amine Ouddan, and Hassane Essafi source code retrieval system based on two-level fingerprint representation, respectively the structural and the semantic information within a source code. A sequence alignment technique is applied on these fingerprints in order to quantify the similarity between source code portions. The specific purpose of the system is to detect plagiarism and duplicated code between programs written in different programming languages belonging to the same class, such as C, Cµ, Java and CSharp. These four languages are supported by the actual version of the system which is designed such that it may be easily adapted for any programming language.

Keywords: Source code retrieval, plagiarism detection, clonedetection, sequence alignment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1550