Search results for: Document ranking

444 Algorithm for Information Retrieval Optimization

Authors: Kehinde K. Agbele, Kehinde Daniel Aruleba, Eniafe F. Ayetiran

Abstract:

When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates optimizing an IRS so that results are ordered by relevance to individual information needs. The study addresses the development of algorithms that optimize the ranking of documents retrieved from an IRS. It describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated-database environment. As the volume of information available online and in designated databases grows continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors. This is based on calculating the weight (

Keywords: Internet ranking,

Downloads 1428

443 Comparative Analysis of Different Page Ranking Algorithms

Authors: S. Prabha, K. Duraiswamy, J. Indhumathi

Abstract:

Search engines play an important role on the internet in retrieving the relevant documents from among the huge number of web pages. However, a search retrieves a large number of documents, all of which are relevant to the search topic. To retrieve the most meaningful documents related to the search topic, a ranking algorithm is used in information retrieval. Ranking the retrieved documents is one of the issues in data mining and one of the practical problems in information retrieval. This paper covers various page ranking algorithms and page segmentation algorithms, and compares those algorithms used for information retrieval. Diverse Page Rank based algorithms like Page Rank (PR), Weighted Page Rank (WPR), Weight Page Content Rank (WPCR), Hyperlink-Induced Topic Search (HITS), Distance Rank, Eigen Rumor, Distance Rank Time Rank, Tag Rank, Relational Based Page Rank and Query Dependent Ranking algorithms are discussed and compared.

Keywords: Information Retrieval, Web Page Ranking, search engine, web mining, page segmentations.

Downloads 4235
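
As an illustration of the basic Page Rank (PR) idea named in the abstract above, here is a minimal sketch of the usual power-iteration formulation. It is not taken from the paper: the function name `pagerank`, the damping factor of 0.85, the handling of pages without outgoing links and the toy link graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Return a dict page -> rank for a graph given as page -> list of linked pages.

    Illustrative sketch only; damping and iteration count are assumed defaults.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no out-links spreads its rank over all pages.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "a" is linked from both "c" and "d".
toy_web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming rank ends up first, and the ranks sum to 1 because every page passes on all of its damped rank.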

442 A Hybrid Ontology Based Approach for Ranking Documents

Authors: Sarah Motiee, Azadeh Nematzadeh, Mehrnoush Shamsfard

Abstract:

The increasing growth of information volume on the internet causes an increasing need to develop new (semi)automatic methods for retrieving documents and ranking them according to their relevance to the user query. In this paper, after a brief review of ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination preserves the precision of ranking without losing speed. Our approach exploits natural language processing techniques to extract phrases from documents and the query and to stem words. Then an ontology based conceptual method is used to annotate documents and expand the query. To expand a query, the spread activation algorithm is improved so that the expansion can be done flexibly and in various aspects. The annotated documents and the expanded query are then processed to compute the relevance degree using statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing it to documents, (3) extracting and using both words and phrases to compute the relevance degree, (4) improving the spread activation algorithm to do the expansion based on a weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results will be included at the end of the paper.

Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.

Downloads 1571

441 ORank: An Ontology Based System for Ranking Documents

Authors: Mehrnoush Shamsfard, Azadeh Nematzadeh, Sarah Motiee

Abstract:

The increasing growth of information volume on the internet causes an increasing need to develop new (semi)automatic methods for retrieving documents and ranking them according to their relevance to the user query. In this paper, after a brief review of ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination preserves the precision of ranking without losing speed. Our approach exploits natural language processing techniques for extracting phrases and stemming words. Then an ontology based conceptual method is used to annotate documents and expand the query. To expand a query, the spread activation algorithm is improved so that the expansion can be done in various aspects. The annotated documents and the expanded query are then processed to compute the relevance degree using statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing it to documents, (3) extracting and using both words and phrases to compute the relevance degree, (4) improving the spread activation algorithm to do the expansion based on a weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results will be included at the end of the paper.

Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.

Downloads 1834

440 University Ranking Systems – From League Table to Homogeneous Groups of Universities

Authors: M. Jarocka

Abstract:

The paper contains a review of the literature in terms of the critical analysis of methodologies of university ranking systems. Furthermore, the initiatives supported by the European Commission (U-Map, U-Multirank) and the CHE Ranking are described. Special attention is paid to the tendencies in the development of ranking systems. According to the author, the ranking organizations should abandon the classic form of ranking, namely a hierarchical ordering of universities from “the best” to “the worst”. In the empirical part of this paper, using a cluster analysis method called k-means clustering, the author presents classifications of the top universities from the Shanghai Jiao Tong University’s (SJTU) Academic Ranking of World Universities (ARWU).

Keywords: Classification, cluster analysis, ranking, university.

Downloads 2697
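
To make the idea of replacing a league table with homogeneous groups concrete, here is a minimal sketch of one-dimensional k-means (Lloyd's algorithm) applied to a handful of scores. It is an assumption-laden illustration, not the author's procedure: the function name `kmeans_1d`, the evenly spaced initial centers and the eight scores are invented for the example and are not ARWU data.

```python
def kmeans_1d(scores, k, iterations=100):
    """Group 1-D scores into k clusters; returns (centers, clusters).

    Illustrative sketch; the deterministic initialization is an assumption.
    """
    ordered = sorted(scores)
    n = len(ordered)
    # Start from k evenly spaced values so the result is reproducible.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each score joins its nearest center.
        clusters = [[] for _ in range(k)]
        for s in ordered:
            nearest = min(range(k), key=lambda c: abs(s - centers[c]))
            clusters[nearest].append(s)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters


# Hypothetical overall scores for eight universities.
example_scores = [95, 92, 90, 70, 68, 66, 40, 38]
centers, groups = kmeans_1d(example_scores, k=3)
```

Instead of eight ordinal positions, the result is three groups of similar institutions, which is the kind of classification the abstract argues for.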

439 A Combination of Similarity Ranking and Time for Social Research Paper Searching

Authors: P. Jomsri

Abstract:

Nowadays social media are important tools for web resource discovery. The performance and capabilities of web searches are vital, especially for search results from social research paper bookmarking. This paper proposes a new ranking algorithm, CSTRank, that combines similarity ranking with paper posted time. The paper posted time serves as a static ranking for improving search results. For this particular study, the paper posted time is combined with similarity ranking to produce a better ranking than other methods such as similarity ranking or SimRank. The retrieval performance of the combination rankings is evaluated using mean values of NDCG. The evaluation in the experiments implies that the chosen CSTRank ranking, using weight scores at a ratio of 90:10, can improve the efficiency of research paper searching on social bookmarking websites.

Keywords: combination ranking, information retrieval, time, similarity ranking, static ranking, weight score

Downloads 1617
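
The 90:10 weighting mentioned in the abstract above can be pictured as a weighted sum of a similarity score and a time score. The sketch below is only a guess at that general shape, not the CSTRank definition: the names `combined_scores` and `rank_documents`, the min-max normalization of the posting year and the three sample papers are all assumptions.

```python
def combined_scores(similarity, posted_year, w_sim=0.9, w_time=0.1):
    """Blend a similarity score with a recency score (assumed 90:10 split)."""
    years = list(posted_year.values())
    oldest, newest = min(years), max(years)
    span = (newest - oldest) or 1  # avoid division by zero when all years match
    scores = {}
    for doc, sim in similarity.items():
        # Assumption: newer papers get a recency score closer to 1.
        recency = (posted_year[doc] - oldest) / span
        scores[doc] = w_sim * sim + w_time * recency
    return scores


def rank_documents(scores):
    """Order document ids from highest to lowest combined score."""
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)


# Hypothetical query results: similarity to the query and year posted.
similarity = {"p1": 0.80, "p2": 0.78, "p3": 0.40}
posted_year = {"p1": 2005, "p2": 2010, "p3": 2010}
ordering = rank_documents(combined_scores(similarity, posted_year))
```

With these made-up numbers the slightly less similar but newer paper `p2` moves ahead of `p1`, while the clearly less similar `p3` stays last, which shows how a small time weight only breaks near-ties.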

438 Ranking Fuzzy Numbers Based on Lexicographical Ordering

Authors: B. Farhadinia

Abstract:

Although many methods for ranking fuzzy numbers have been discussed broadly so far, most of them have shortcomings such as complicated calculations, inconsistency with human intuition and indiscrimination. The motivation of this study is to develop a model for ranking fuzzy numbers based on lexicographical ordering, which provides decision-makers with a simple and efficient algorithm to generate an ordering founded on a precedence. The main emphasis here is on ease of use and reliability. The effectiveness of the proposed method is finally demonstrated through a comprehensive comparison of different ranking methods with the present one.

Keywords: Ranking fuzzy numbers, Lexicographical ordering.

Downloads 1766

437 On Internet Access Technology Specification Model

Authors: Samson Okwakol Ariko, Venansius Baryamureeba

Abstract:

Internet Access Technologies (IAT) provide a means through which the Internet can be accessed. The choice of a suitable Internet technology is increasingly becoming an important issue for ISP clients. Currently, the choice of IAT is based on the discretion and intuition of the concerned managers and on reliance on ISPs. In this paper we propose a model and design algorithms that are used in Internet access technology specification. In the proposed model, three ranking approaches are introduced: concurrent ranking, stepwise ranking and weighted ranking. The model ranks the IAT based on distance measures computed in ascending order, while the global ranking system assigns weights to each IAT according to the position held in each ranking technique, determines the total weight of a particular IAT and ranks them in descending order. The final output is an objective ranking of IAT in descending order.

Keywords: Internet Access Technology (IAT).

Downloads 1390
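
The global ranking step described in the abstract above (weights by position in each ranking, summed, then sorted in descending order) resembles a positional or Borda-style aggregation. The following sketch assumes exactly that reading; the weight rule "number of items minus position", the function name `global_ranking` and the technology names are hypothetical and not taken from the paper.

```python
def global_ranking(rankings):
    """Combine several best-first rankings into one by summing positional weights.

    Assumption: in a list of n items, the item at 0-based position i gets weight n - i.
    """
    totals = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            totals[item] = totals.get(item, 0) + (n - position)
    # Highest total weight first; ties broken alphabetically for determinism.
    ordered = sorted(totals, key=lambda item: (-totals[item], item))
    return ordered, totals


# Hypothetical outputs of the concurrent, stepwise and weighted rankings.
concurrent = ["DSL", "Fiber", "VSAT"]
stepwise = ["Fiber", "DSL", "VSAT"]
weighted = ["Fiber", "VSAT", "DSL"]
final_order, total_weights = global_ranking([concurrent, stepwise, weighted])
```

Here each technology's total is the sum of its three positional weights, so an option that is consistently near the top wins even if it is not first in every individual ranking.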

436 Entropy Based Data Hiding for Document Images

Authors: Swetha Kurup, Sridhar G., Sridhar V.

Abstract:

In this paper we present a novel technique for data hiding in binary document images. We use the concept of entropy to identify document-specific least distortive areas throughout the binary document image. The document image is treated as any other image, and the proposed method utilizes the standard document characteristics for the embedding process. The proposed method minimizes perceptual distortion due to embedding and allows watermark extraction without requiring any side information at the decoder end.

Keywords: Entropy, Steganography, Watermarking.

Downloads 1482

435 Comparative Study of Universities’ Web Structure Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

This paper analyzes the ranking of the website of University of Malaysia Terengganu (UMT) on the World Wide Web. Little research has been done on comparing the rankings of universities’ websites, so this research helps determine whether the existing UMT website is serving its purpose, which is to introduce UMT to the world. The ranking is based on hub and authority values, which are in accordance with the structure of the website. These values are computed using two web searching algorithms, HITS and SALSA. Three other universities’ websites are used as benchmarks: UM, Harvard and Stanford. The results clearly show that more work has to be done on the existing UMT website, where pages that are important according to the benchmarks do not exist. The ranking of UMT’s website will act as a guideline for the web developer to develop a more efficient website.

Keywords: Algorithm, ranking, website, web structure mining.

Downloads 1629
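
For readers unfamiliar with the hub and authority values mentioned in the abstract above, here is a minimal sketch of the standard HITS iteration on a tiny link graph. It only illustrates the general technique (SALSA is not shown); the function names, the iteration count and the four-page site are assumptions and do not reflect the websites studied in the paper.

```python
import math


def _normalize(scores):
    """Scale scores so that their squared values sum to 1."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) dicts for a graph given as page -> linked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = _normalize(
            {p: sum(hub[s] for s in pages if p in links.get(s, [])) for p in pages}
        )
        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = _normalize(
            {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        )
    return hub, auth


# Hypothetical site structure: the home page links out, "research" is linked most.
site = {"home": ["about", "research"], "news": ["research"], "about": [], "research": []}
hub_scores, authority_scores = hits(site)
```

In this toy site, "research" gets the highest authority because two pages point to it, and "home" is the strongest hub because it points to both authorities.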

434 A New Approach for Flexible Document Categorization

Authors: Jebari Chaker, Ounelli Habib

Abstract:

In this paper we propose a new approach for flexible document categorization according to the document type or genre instead of topic. Our approach implements two homogeneous classifiers: a contextual classifier and a logical classifier. The contextual classifier is based on the document URL, whereas the logical classifier uses the logical structure of the document to perform the categorization. The final categorization is obtained by combining the contextual and logical categorizations. In our approach, each document is assigned to all predefined categories with different membership degrees. Our experiments demonstrate that our approach is better than other genre categorization approaches.

Keywords: Categorization, combination, flexible, logical structure, genre, category, URL.

Downloads 1434

433 The Usefulness of Logical Structure in Flexible Document Categorization

Authors: Jebari Chaker, Ounalli Habib

Abstract:

This paper presents a new approach for automatic document categorization. Exploiting the logical structure of the document, our approach assigns an HTML document to one or more categories (thesis, paper, call for papers, email, ...). Using a set of training documents, our approach generates a set of rules used to categorize new documents. The flexibility of the approach comes from associating each rule with a weight representing its importance in discriminating between possible categories. This weight is dynamically modified at each new document categorization. Experiments with the proposed approach give satisfactory results.

Keywords: categorization rule, document categorization, flexible categorization, logical structure.

Downloads 1205

432 Incremental Learning of Independent Topic Analysis

Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

Abstract:

In this paper, we present a method of applying Independent Topic Analysis (ITA) to a growing collection of document data. The volume of document data has been increasing since the spread of the Internet. ITA was presented as one method to analyze document data: it extracts independent topics from the document data by using Independent Component Analysis (ICA), a technique from signal processing. However, it is difficult to apply ITA to a growing collection, because ITA must use all of the document data, so the temporal and spatial cost is very high. Therefore, we present Incremental ITA, which extracts the independent topics from a growing collection of document data. Incremental ITA updates the independent topics when document data is added, starting from the topics extracted from the previous data. Finally, we show the results of applying Incremental ITA to benchmark datasets.

Keywords: Text mining, topic extraction, independent, incremental, independent component analysis.

Downloads 999

431 Solutions of Fuzzy Transportation Problem Using Best Candidates Method and Different Ranking Techniques

Authors: M. S. Annie Christi

Abstract:

The Transportation Problem (TP) is based on the supply and demand of commodities transported from one source to different destinations. Usual methods for solving TPs are the North-West Corner Rule, the Least Cost Method, Vogel’s Approximation Method, etc. Transportation costs tend to vary over time, and fuzzy numbers can be used to give a solution that reflects this situation. In this study the Best Candidate Method (BCM) is applied. For ranking, the Centroid Ranking Technique (CRT) and the Robust Ranking Technique have been adopted to transform the fuzzy TP, and the above methods are applied to EDWARDS Vacuum Company, Crawley, in West Sussex in the United Kingdom. A comparative study is also given. We see that the transportation cost can be minimized by the application of CRT under BCM.

Keywords: Best candidates method, centroid ranking technique, robust ranking technique, transportation problem, fuzzy transportation problem.

Downloads 1511

430 A New Approach For Ranking Of Generalized Trapezoidal Fuzzy Numbers

Authors: Amit Kumar, Pushpinder Singh, Parampreet Kaur, Amarpreet Kaur

Abstract:

Ranking of fuzzy numbers plays an important role in decision making, optimization, forecasting, etc. Fuzzy numbers must be ranked before an action is taken by a decision maker. In this paper, with the help of several counterexamples, it is proved that the ranking method proposed by Chen and Chen (Expert Systems with Applications 36 (2009) 6833-6842) is incorrect. The main aim of this paper is to propose a new approach for the ranking of generalized trapezoidal fuzzy numbers. The main advantage of the proposed approach is that it provides the correct ordering of generalized and normal trapezoidal fuzzy numbers, and it is also very simple and easy to apply to real-life problems. It is shown that the proposed ranking function satisfies all the reasonable properties of fuzzy quantities proposed by Wang and Kerre (Fuzzy Sets and Systems 118 (2001) 375-385).

Keywords: Ranking function, Generalized trapezoidal fuzzy numbers

Downloads 2676

429 Ranking Fuzzy Numbers Based On Epsilon-Deviation Degree

Authors: Vincent F. Yu, Ha Thi Xuan Chi

Abstract:

Nejad and Mashinchi (2011) proposed a revision for ranking fuzzy numbers based on the areas of the left and the right sides of a fuzzy number. However, this method still has some shortcomings, such as a lack of discriminative power to rank similar fuzzy numbers and no guarantee of consistency between the ranking of fuzzy numbers and the ranking of their images. To overcome these drawbacks, we propose an epsilon-deviation degree method based on the left area and the right area of a fuzzy number, and the concept of the centroid point. The main advantage of the new approach is the development of an innovative index value which can be used to consistently evaluate and rank fuzzy numbers. Numerical examples are presented to illustrate the efficiency and superiority of the proposed method.

Keywords: Ranking fuzzy numbers, Centroid, Deviation degree.

Downloads 1528

428 An Exploratory Study of Reliability of Ranking vs. Rating in Peer Assessment

Authors: Yang Song, Yifan Guo, Edward F. Gehringer

Abstract:

Fifty years of research has found great potential for peer assessment as a pedagogical approach. With peer assessment, not only do students receive more copious assessments; they also learn to become assessors. In recent decades, more educational peer assessments have been facilitated by online systems. Those online systems are designed differently to suit different class settings and student groups, but they basically fall into two categories: rating-based and ranking-based. The rating-based systems ask assessors to rate the artifacts one by one following some review rubrics. The ranking-based systems allow assessors to review a set of artifacts and give a rank for each of them. Though there are different systems and a large number of users of each category, there is no comprehensive comparison on which design leads to higher reliability. In this paper, we designed algorithms to evaluate assessors' reliabilities based on their rating/ranking against the global ranks of the artifacts they have reviewed. These algorithms are suitable for data from both rating-based and ranking-based peer assessment systems. The experiments were done based on more than 15,000 peer assessments from multiple peer assessment systems. We found that the assessors in ranking-based peer assessments are at least 10% more reliable than the assessors in rating-based peer assessments. Further analysis also demonstrated that the assessors in ranking-based assessments tend to assess the more differentiable artifacts correctly, but there is no such pattern for rating-based assessors.

Keywords: Peer assessment, peer rating, peer ranking, reliability.

Downloads 1059

427 An Interval Type-2 Dual Fuzzy Polynomial Equations and Ranking Method of Fuzzy Numbers

Authors: Nurhakimah Ab. Rahman, Lazim Abdullah

Abstract:

According to fuzzy arithmetic, dual fuzzy polynomials cannot be replaced by fuzzy polynomials. Hence, the concept of a ranking method is used to find real roots of dual fuzzy polynomial equations. In this study we propose an interval type-2 dual fuzzy polynomial equation (IT2 DFPE). The concept of a ranking method is then also used to find real roots of the IT2 DFPE, if they exist. We transform the IT2 DFPE into a system of crisp IT2 DFPE. This transformation is performed with a ranking method of fuzzy numbers based on three parameters, namely value, ambiguity and fuzziness. Finally, we illustrate our approach with two numerical examples.

Keywords: Dual fuzzy polynomial equations, Interval type-2, Ranking method, Value.

Downloads 1724

426 Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition

Authors: L. Hamsaveni, Navya Prakash, Suresha

Abstract:

Document Image Analysis recognizes text and graphics in documents acquired as images. An approach without Optical Character Recognition (OCR) for degraded document image analysis has been adopted in this paper. The technique involves document imaging methods such as Image Fusing and Speeded Up Robust Features (SURF) Detection to identify and extract the degraded regions from a set of document images to obtain an original document with complete information. If the captured degraded document image is skewed, it has to be straightened (deskewed) before further processing. A special image storage format known as YCbCr is used as a tool to convert the Grayscale image to RGB image format. The presented algorithm is tested on various types of degraded documents such as printed documents, handwritten documents, old script documents and handwritten image sketches in documents. The purpose of this research is to obtain an original document for a given set of degraded documents of the same source.

Keywords: Grayscale image format, image fusing, SURF detection, YCbCr image format.

Downloads 1108

425 Combining Color and Layout Features for the Identification of Low-resolution Documents

Authors: Ardhendu Behera, Denis Lalanne, Rolf Ingold

Abstract:

This paper proposes a method, combining color and layout features, for identifying documents captured from low-resolution handheld devices. On one hand, the document image color density surface is estimated and represented with an equivalent ellipse and on the other hand, the document shallow layout structure is computed and hierarchically represented. The combined color and layout features are arranged in a symbolic file, which is unique for each document and is called the document's visual signature. Our identification method first uses the color information in the signatures in order to focus the search space on documents having a similar color distribution, and finally selects the document having the most similar layout structure in the remaining search space. Finally, our experiment considers slide documents, which are often captured using handheld devices.

Keywords: Document color modeling, document visual signature, kernel density estimation, document identification.

Downloads 1329

424 Ranking and Unranking Algorithms for k-ary Trees in Gray Code Order

Authors: Fateme Ashari-Ghomi, Najme Khorasani, Abbas Nowzari-Dalini

Abstract:

In this paper, we present two new ranking and unranking algorithms for k-ary trees represented by x-sequences in Gray code order. These algorithms are based on a Gray code generation algorithm developed by Ahrabian et al. In that paper, a recursive backtracking generation algorithm for x-sequences corresponding to k-ary trees in Gray code was presented. This generation algorithm is based on Vajnovszki's algorithm for generating binary trees in Gray code ordering. To our knowledge, no ranking and unranking algorithms have been given for x-sequences in this ordering. We present ranking and unranking algorithms with O(kn^2) time complexity for x-sequences in this Gray code ordering.

Keywords: k-ary Tree Generation, Ranking, Unranking, Gray Code.

Downloads 2064

423 Color and Layout-based Identification of Documents Captured from Handheld Devices

Authors: Ardhendu Behera, Denis Lalanne, Rolf Ingold

Abstract:

This paper proposes a method, combining color and layout features, for identifying documents captured from low-resolution handheld devices. On one hand, the document image color density surface is estimated and represented with an equivalent ellipse and on the other hand, the document shallow layout structure is computed and hierarchically represented. Our identification method first uses the color information in the documents in order to focus the search space on documents having a similar color distribution, and finally selects the document having the most similar layout structure in the remainder of the search space.

Keywords: Document color modeling, document visual signature, kernel density estimation, document identification.

Downloads 1517

422 Highlighting Document's Structure

Authors: Sylvie Ratté, Wilfried Njomgue, Pierre-André Ménard

Abstract:

In this paper, we present symbolic recognition models to extract knowledge characterized by document structures. Focussing on the extraction and the meticulous exploitation of the semantic structure of documents, we obtain a meaningful contextual tagging corresponding to different unit types (title, chapter, section, enumeration, etc.).

Keywords: Information retrieval, document structures, symbolic grammars.

Downloads 1188

421 Collaborative Document Evaluation: An Alternative Approach to Classic Peer Review

Authors: J. Beel, B. Gipp

Abstract:

Research papers are usually evaluated via peer review. However, peer review has limitations in evaluating research papers. In this paper, Scienstein and the new idea of 'collaborative document evaluation' are presented. Scienstein is a project to evaluate scientific papers collaboratively based on ratings, links, annotations and classifications by the scientific community using the internet. In this paper, critical success factors of collaborative document evaluation are analyzed. These are the scientists' motivation to participate as reviewers, the reviewers' competence and the reviewers' trustworthiness. It is shown that if these factors are ensured, collaborative document evaluation may prove to be a more objective, faster and less resource intensive approach to scientific document evaluation in comparison to the classical peer review process. It is shown that additional advantages exist as collaborative document evaluation supports interdisciplinary work, allows continuous post-publishing quality assessments and enables the implementation of academic recommendation engines. In the long term, it seems possible that collaborative document evaluation will successively substitute peer review and decrease the need for journals.

Keywords: Peer Review, Alternative, Collaboration, Document Evaluation, Rating, Annotations.

Downloads 1441

420 Data Extraction of XML Files using Searching and Indexing Techniques

Authors: Sushma Satpute, Vaishali Katkar, Nilesh Sahare

Abstract:

XML files contain data in a well-formatted manner. Studying the format or semantics of the grammar helps with fast retrieval of the data. There are many algorithms that describe how to search data in XML files. A number of approaches use the data structure or are related to the contents of the document. In these cases the user must know the structure of the document, and information retrieval techniques using NLP relate only to the content of the document. Hence the results may be irrelevant or not very successful, and searching may take more time. This paper presents fast XML retrieval techniques using a new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns a value to each tag in the file. To query the system, a user is not constrained to a fixed query format.

Keywords: XML Retrieval, Indexed Search, Information Retrieval.

Downloads 1736

419 Skew Detection Technique for Binary Document Images based on Hough Transform

Authors: Manjunath Aradhya V N, Hemantha Kumar G, Shivakumara P

Abstract:

Document image processing has become an increasingly important technology in the automation of office documentation tasks. During document scanning, skew is inevitably introduced into the incoming document image. Since the algorithms for layout analysis and character recognition are generally very sensitive to page skew, skew detection and correction in document images are critical steps before layout analysis. In this paper, a novel skew detection method is presented for binary document images. The method considers some selected characters of the text, which may be subjected to thinning and the Hough transform to estimate the skew angle accurately. Several experiments have been conducted on various types of documents, such as English documents, journals, textbooks, documents in different languages, documents with different fonts and documents with different resolutions, to reveal the robustness of the proposed method. The experimental results revealed that the proposed method is accurate compared to the results of well-known existing methods.

Keywords: Optical Character Recognition, Skew angle, Thinning, Hough transform, Document processing

Downloads 2053

418 Persian/Arabic Document Segmentation Based On Pyramidal Image Structure

Authors: Seyyed Yasser Hashemi, Khalil Monfaredi

Abstract:

Automatic transformation of paper documents into electronic documents requires document segmentation at the first stage. However, parameter restrictions such as variations in character font sizes, different text line spacing and non-uniform document layout structures have together made it difficult for many years to design a general-purpose document layout analysis algorithm. Thus most previously reported methods inevitably include these parameters. This problem becomes especially acute in Persian/Arabic documents. Since the Persian/Arabic scripts differ considerably from the English scripts, most of the methods proposed for the English scripts do not give good results for the Persian scripts. In this paper, we present a novel parameter-free method for segmenting Persian/Arabic document images which also works well for English scripts. This method segments the document image into maximal homogeneous regions and identifies them as texts and non-texts based on a pyramidal image structure. In other words, the proposed method is capable of document segmentation without considering the character font sizes, text line spacing and document layout structures. The algorithm is examined on 150 Arabic/Persian and English documents, and the document segmentation process succeeds for 96 percent of the documents.

Keywords: Persian/Arabic document, document segmentation, Pyramidal Image Structure, skew detection and correction.

Downloads: 1726

417 Investments Attractiveness via Combinatorial Optimization Ranking

Authors: Ivan C. Mustakerov, Daniela I. Borissova

Abstract:

The paper proposes an approach to ranking a set of candidate countries for investment, taking into account the investor's view of the importance of different economic indicators. To this end, a ranking algorithm that contributes to rational decision making is proposed. The algorithm is based on combinatorial optimization modeling and the repeated solution of multi-criteria tasks. The final result is a list of countries ranked according to the investor's preferences about the importance of economic indicators for investment attractiveness. Different scenarios are simulated, corresponding to different investor preferences. A numerical example with a real dataset of indicators is solved. The numerical testing shows the applicability of the described algorithm. The proposed approach can be used with any set of indicators as ranking criteria reflecting different investor points of view.

Keywords: Combinatorial optimization modeling, economics investment attractiveness, economics ranking algorithm, multi-criteria problems.

Downloads: 2039

416 Quantitative Ranking Evaluation of Wine Quality

Authors: A. Brunel, A. Kernevez, F. Leclere, J. Trenteseaux

Abstract:

Today, wine quality is evaluated only by wine experts with their own personal tastes, even if they may agree on some common features. Producers therefore have no unbiased way to assess the quality of their products independently. A tool is proposed here to evaluate wine quality through an objective ranking based on the variables entering wine elaboration, analysed with the principal component analysis (PCA) method. Actual climatic data are compared by measuring the relative distance between the wines considered, from which the general ranking is derived.

Keywords: Wine, grape, vine, weather conditions, rating, climate, principal component analysis, metric analysis.

Downloads: 2088

415 Fuzzy Approach for Ranking of Motor Vehicles Involved in Road Accidents

Authors: Lazim Abdullah, Norhanadiah Zam

Abstract:

The increasing number of vehicles and a lack of awareness among road users may lead to road accidents. However, no specific literature was found that ranks vehicles involved in accidents based on fuzzy variables of road users. This paper proposes a ranking of four selected motor vehicles involved in road accidents. Human and non-human factors that are normally linked with road accidents are considered for the ranking. The imprecision or vagueness inherent in the subjective assessment of the experts has led to the application of fuzzy set theory to deal with ranking problems. Data in the form of linguistic variables were collected from three authorised personnel of three Malaysian Government agencies. The multi-criteria decision-making method fuzzy TOPSIS was applied in the computational procedure. The analysis shows that motorcycles yielded the highest closeness coefficient, at 0.6225. A ranking can be drawn from the magnitude of the closeness coefficients, and motorcycles ranked first.

Keywords: Road accidents, decision making, closeness coefficient, fuzzy number

Downloads: 1490
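The closeness coefficient used for ranking in the abstract above can be sketched as follows. This assumes one common formulation of fuzzy TOPSIS (triangular fuzzy numbers, vertex distance, and fixed ideal solutions (1, 1, 1) and (0, 0, 0)); the abstract does not state which variant the authors used. The input is assumed to be an already weighted and normalised fuzzy decision matrix, and all names and values are hypothetical.

```python
import math


def vertex_distance(a, b):
    """Vertex distance between two triangular fuzzy numbers (l, m, u)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)


def closeness_coefficients(matrix):
    """Map each alternative to its closeness coefficient CC = d- / (d+ + d-).

    matrix -- dict: alternative name -> list of triangular fuzzy ratings,
              one per criterion, already weighted and normalised to [0, 1]
              (an assumption of this sketch).
    """
    fpis = (1.0, 1.0, 1.0)  # fuzzy positive ideal solution (assumed fixed)
    fnis = (0.0, 0.0, 0.0)  # fuzzy negative ideal solution (assumed fixed)
    result = {}
    for name, ratings in matrix.items():
        d_plus = sum(vertex_distance(r, fpis) for r in ratings)
        d_minus = sum(vertex_distance(r, fnis) for r in ratings)
        result[name] = d_minus / (d_plus + d_minus)
    return result


def rank(matrix):
    """Return alternative names ordered from highest to lowest closeness."""
    cc = closeness_coefficients(matrix)
    return sorted(cc, key=cc.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical ratings for three alternatives on two criteria.
    example = {
        "alternative_a": [(0.7, 0.8, 0.9), (0.5, 0.6, 0.7)],
        "alternative_b": [(0.2, 0.3, 0.4), (0.3, 0.4, 0.5)],
        "alternative_c": [(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)],
    }
    print(closeness_coefficients(example))
    print(rank(example))
```

An alternative whose ratings sit closer to the positive ideal than to the negative ideal gets a coefficient nearer 1, so sorting by the coefficient in descending order gives the ranking.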