Search results for: photo documents
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 370

Search results for: photo documents

370 Audiovisual Sources in Space and Time

Authors: G. Seksenbaeva, K. Atabayev, N. Alpusbaeva, T. Tulebayev, G. Sabdenova

Abstract:

In article are analyzed value of audiovisual sources which possesses high integrative potential and allows studying movement of information in the history - information movement from generation to the generation, in essence providing continuity of historical development and inheritance of traditions. Information thus fixed in them is considered as a source not only about last condition of society, but also significant for programming of its subsequent activity.

Keywords: Historical source, audiovisual documents, audiovisual source, film documents, photo documents, phonodocuments, cultural heritage, National Archives, material culture, spiritual culture.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1620
369 Optimization and Determination of Process Parameters in Thin Film SOI Photo-BJMOSFET

Authors: Hai-Qing Xie, Yun Zeng, Yong-Hong Yan, Guo-Liang Zhang, Tai-Hong Wang

Abstract:

We propose photo-BJMOSFET (Bipolar Junction Metal-Oxide-Semiconductor Field Effect Transistor) fabricated on SOI film. ITO film is adopted in the device as gate electrode to reduce light absorption. I-V characteristics of photo-BJMOSFET obtained in dark (dark current) and under 570nm illumination (photo current) are studied furthermore to achieve high photo-to-dark-current contrast ratio. Two variables in the calculation were the channel length and the thickness of the film which were set equal to six different values, i.e., L=2, 4, 6, 8, 10, and 12μm and three different values, i.e., dsi =100, 200 and 300nm, respectively. The results indicate that the greatest photo-to-dark-current contrast ratio is achieved with L=10μm and dsi=200 nm at VGK=0.6V.

Keywords: Photo-to-dark-current contrast ratio, Photo-current, Dark-current, Process parameter

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1393
368 Characterization of Responsivity, Sensitivity and Spectral Response in Thin Film SOI photo-BJMOS -FET Compatible with CMOS Technology

Authors: Hai-Qing Xie, Yun Zeng, Yong-Hong Yan, Jian-Ping Zeng, Tai-Hong Wang

Abstract:

Photo-BJMOSFET (Bipolar Junction Metal-Oxide- Semiconductor Field Effect Transistor) fabricated on SOI film was proposed. ITO film is adopted in the device as gate electrode to reduce light absorption. Depletion region but not inversion region is formed in film by applying gate voltage (but low reverse voltage) to achieve high photo-to-dark-current ratio. Comparisons of photoelectriccharacteristics executed among VGK=0V, 0.3V, 0.6V, 0.9V and 1.0V (reverse voltage VAK is equal to 1.0V for total area of 10×10μm2). The results indicate that the greatest improvement in photo-to-dark-current ratio is achieved up to 2.38 at VGK=0.6V. In addition, photo-BJMOSFET is compatible with CMOS integration due to big input resistance

Keywords: Photo-BJMOSFET, Responsivity, Sensitivity, Spectral response.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1487
367 Cross-Search Technique and its Visualization of Peer-to-Peer Distributed Clinical Documents

Authors: Yong Jun Choi, Juman Byun, Simon Berkovich

Abstract:

One of the ubiquitous routines in medical practice is searching through voluminous piles of clinical documents. In this paper we introduce a distributed system to search and exchange clinical documents. Clinical documents are distributed peer-to-peer. Relevant information is found in multiple iterations of cross-searches between the clinical text and its domain encyclopedia.

Keywords: Clinical documents, cross-search, document exchange, information retrieval, peer-to-peer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1251
366 Advanced Information Extraction with n-gram based LSI

Authors: Ahmet Güven, Ö. Özgür Bozkurt, Oya Kalıpsız

Abstract:

Number of documents being created increases at an increasing pace while most of them being in already known topics and little of them introducing new concepts. This fact has started a new era in information retrieval discipline where the requirements have their own specialties. That is digging into topics and concepts and finding out subtopics or relations between topics. Up to now IR researches were interested in retrieving documents about a general topic or clustering documents under generic subjects. However these conventional approaches can-t go deep into content of documents which makes it difficult for people to reach to right documents they were searching. So we need new ways of mining document sets where the critic point is to know much about the contents of the documents. As a solution we are proposing to enhance LSI, one of the proven IR techniques by supporting its vector space with n-gram forms of words. Positive results we have obtained are shown in two different application area of IR domain; querying a document database, clustering documents in the document database.

Keywords: Document clustering, Information Extraction, Information Retrieval, LSI, n-gram.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1751
365 Evaluation of Total Cross Section of Photo-Ionization of Helium in Weak Field on Base of Trajectory Method

Authors: Alexander B. Bichkov, Valery V. Smirnov

Abstract:

Total cross section of helium atom photo-ionization by weak short pulse is calculated using the variant of trajectory method, developed in our earlier work. The method enables simple estimation of total ionization probability (or cross section) without integration of differential one.

Keywords: Evaluation of Photo-Ionization, Helium, Trajectory Method

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1846
364 Ultra High Speed Approach for Document Skew Detection and Correction Based On Centre of Gravity

Authors: Seyyed Yasser Hashemi

Abstract:

Skew detection and correction (SDC) has a direct effect in efficiency and exactitude of documents’ segmentation and analysis and thus is considered as a very important step in documents’ analysis field. Skew is a major problem in documents’ analysis for every language. For Arabic/Persian document scripts this problem is more severe because of special features of these languages. In this paper an efficient and fast algorithm for Document Skew Detection (DSD) based on the concept of segmentation and Center of Gravity (COG) is proposed. This algorithm is examined for 150 Arabic/Persian and English documents and SDC process are done successfully for 93 percent of documents with error rate of less than 1°. This algorithm shows better results for English documents compared to Arabic/Persian documents. The proposed method is also represents favorable results for handwritten, printed and also complicated documents such as newspapers and journals even with very low quality and resolution.

Keywords: Arabic/Persian document, Baseline, Centre of gravity, Document segmentation, Skew detection and correction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1857
363 Inspection of Geometrical Integrity of Work Piece and Measurement of Tool Wear by the Use of Photo Digitizing Method

Authors: R. Alipour, F. Nadjarian, A. Alinaghizade

Abstract:

Considering complexity of products, new geometrical design and investment tolerances that are necessary, measuring and dimensional controlling involve modern and more precise methods. Photo digitizing method using two cameras to record pictures and utilization of conventional method named “cloud points" and data analysis by the use of ATOUS software, is known as modern and efficient in mentioned context. In this paper, benefits of photo digitizing method in evaluating sampling of machining processes have been put forward. For example, assessment of geometrical integrity surface in 5-axis milling process and measurement of carbide tool wear in turning process, can be can be brought forward. Advantages of this method comparing to conventional methods have been expressed.

Keywords: photo digitizing, tool wear, geometrical integrity, cloud points

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1316
362 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1939
361 Evolved Strokes in Non Photo–Realistic Rendering

Authors: Ashkan Izadi, Vic Ciesielski

Abstract:

We describe a work with an evolutionary computing algorithm for non photo–realistic rendering of a target image. The renderings are produced by genetic programming. We have used two different types of strokes: “empty triangle" and “filled triangle" in color level. We compare both empty and filled triangular strokes to find which one generates more aesthetic pleasing images. We found the filled triangular strokes have better fitness and generate more aesthetic images than empty triangular strokes.

Keywords: Artificial intelligence, Evolutionary programming, Geneticprogramming, Non photo–realistic rendering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1875
360 Data Gathering and Analysis for Arabic Historical Documents

Authors: Ali Dulla

Abstract:

This paper introduces a new dataset (and the methodology used to generate it) based on a wide range of historical Arabic documents containing clean data simple and homogeneous-page layouts. The experiments are implemented on printed and handwritten documents obtained respectively from some important libraries such as Qatar Digital Library, the British Library and the Library of Congress. We have gathered and commented on 150 archival document images from different locations and time periods. It is based on different documents from the 17th-19th century. The dataset comprises differing page layouts and degradations that challenge text line segmentation methods. Ground truth is produced using the Aletheia tool by PRImA and stored in an XML representation, in the PAGE (Page Analysis and Ground truth Elements) format. The dataset presented will be easily available to researchers world-wide for research into the obstacles facing various historical Arabic documents such as geometric correction of historical Arabic documents.

Keywords: Dataset production, ground truth production, historical documents, arbitrary warping, geometric correction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 818
359 Compression of Semistructured Documents

Authors: Leo Galambos, Jan Lansky, Katsiaryna Chernik

Abstract:

EGOTHOR is a search engine that indexes the Web and allows us to search the Web documents. Its hit list contains URL and title of the hits, and also some snippet which tries to shortly show a match. The snippet can be almost always assembled by an algorithm that has a full knowledge of the original document (mostly HTML page). It implies that the search engine is required to store the full text of the documents as a part of the index. Such a requirement leads us to pick up an appropriate compression algorithm which would reduce the space demand. One of the solutions could be to use common compression methods, for instance gzip or bzip2, but it might be preferable if we develop a new method which would take advantage of the document structure, or rather, the textual character of the documents. There already exist a special compression text algorithms and methods for a compression of XML documents. The aim of this paper is an integration of the two approaches to achieve an optimal level of the compression ratio

Keywords: Compression, search engine, HTML, XML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1530
358 Generation of Photo-Mosaic Images through Block Matching and Color Adjustment

Authors: Hae-Yeoun Lee

Abstract:

Mosaic refers to a technique that makes image by gathering lots of small materials in various colors. This paper presents an automatic algorithm that makes the photo-mosaic image using photos. The algorithm is composed of 4 steps: partition and feature extraction, block matching, redundancy removal and color adjustment. The input image is partitioned in the small block to extract feature. Each block is matched to find similar photo in database by comparing similarity with Euclidean difference between blocks. The intensity of the block is adjusted to enhance the similarity of image by replacing the value of light and darkness with that of relevant block. Further, the quality of image is improved by minimizing the redundancy of tiles in the adjacent blocks. Experimental results support that the proposed algorithm is excellent in quantitative analysis and qualitative analysis.

Keywords: Photo-mosaic, Euclidean distance, Block matching, Intensity adjustment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3525
357 The Photo-Absorption and Surface Feature of Nano-Structured TIO2 Coatings

Authors: Maryamossadat Bozorgtabar, Mohammadreza Rahimipour, Mehdi Salehi, Mohammadreza Jafarpour

Abstract:

Titanium dioxide coatings were deposited by utilizing atmospheric plasma spraying (APS) system. The agglomerated nanopowder and different spraying parameters were used to determine their influences on the microstructure surface feature and photoabsorption of the coatings. The microstructure of as-sprayed TiO2 coatings were characterized by scanning electron microscope (SEM). Surface characteristics were investigated by Fourier Transform Infrared (FT-IR). The photo absorption was determined by UV-VIS spectrophotometer. It is found that the spray parameters have an influence on the microstructure, surface feature and photo-absorption of the TiO2 coatings.

Keywords: APS, TiO2, Nanostructured Coating, Photoabsorption

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1679
356 Photo-Fenton Treatment of 1,3-dichloro-2- Propanol Aqueous Solutions Using UV Radiation and H2O2 – A Kinetic Study

Authors: Maria D. Nikolaki, Katerina N. Zerva, Constantine. J. Philippopoulos

Abstract:

The photochemical and photo-Fenton oxidation of 1,3-dichloro-2-propanol was performed in a batch reactor, at room temperature, using UV radiation, H2O2 as oxidant, and Fenton-s reagent. The effect of the oxidative agent-s initial concentration was investigated as well as the effect of the initial concentration of Fe(II) by following the target compound degradation, the total organic carbon removal and the chloride ion production. Also, from the kinetic analysis conducted and proposed reaction scheme it was deduced that the addition of Fe(II) significantly increases the production and the further oxidation of the chlorinated intermediates.

Keywords: 1, 3-dichloro-2-propanol, hydrogen peroxide, photo- Fenton, UV .

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1661
355 Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition

Authors: L. Hamsaveni, Navya Prakash, Suresha

Abstract:

Document Image Analysis recognizes text and graphics in documents acquired as images. An approach without Optical Character Recognition (OCR) for degraded document image analysis has been adopted in this paper. The technique involves document imaging methods such as Image Fusing and Speeded Up Robust Features (SURF) Detection to identify and extract the degraded regions from a set of document images to obtain an original document with complete information. In case, degraded document image captured is skewed, it has to be straightened (deskew) to perform further process. A special format of image storing known as YCbCr is used as a tool to convert the Grayscale image to RGB image format. The presented algorithm is tested on various types of degraded documents such as printed documents, handwritten documents, old script documents and handwritten image sketches in documents. The purpose of this research is to obtain an original document for a given set of degraded documents of the same source.

Keywords: Grayscale image format, image fusing, SURF detection, YCbCr image format.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1108
354 Using Suffix Tree Document Representation in Hierarchical Agglomerative Clustering

Authors: Daniel I. Morariu, Radu G. Cretulescu, Lucian N. Vintan

Abstract:

In text categorization problem the most used method for documents representation is based on words frequency vectors called VSM (Vector Space Model). This representation is based only on words from documents and in this case loses any “word context" information found in the document. In this article we make a comparison between the classical method of document representation and a method called Suffix Tree Document Model (STDM) that is based on representing documents in the Suffix Tree format. For the STDM model we proposed a new approach for documents representation and a new formula for computing the similarity between two documents. Thus we propose to build the suffix tree only for any two documents at a time. This approach is faster, it has lower memory consumption and use entire document representation without using methods for disposing nodes. Also for this method is proposed a formula for computing the similarity between documents, which improves substantially the clustering quality. This representation method was validated using HAC - Hierarchical Agglomerative Clustering. In this context we experiment also the stemming influence in the document preprocessing step and highlight the difference between similarity or dissimilarity measures to find “closer" documents.

Keywords: Text Clustering, Suffix tree documentrepresentation, Hierarchical Agglomerative Clustering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1865
353 Implementation of A Photo-Curable 3D Additive Manufacturing Technology with Coloring Gray Capability by Using Piezo Ink-Jet

Authors: Ming-Jong Tsai, Y. L. Cheng, Y. L. Kuo, S. Y. Hsiao, J .W. Chen, P. H. Liu, D. H. Chen

Abstract:

The 3D printing is a combination of digital technology, material science, intelligent manufacturing and control of opto-mechatronics systems. It is called the third industrial revolution from the view of the Economist Journal. A color 3D printing machine may provide the necessary support for high value-added industrial and commercial design, architectural design, personal boutique, and 3D artist’s creation. The main goal of this paper is to develop photo-curable color 3D manufacturing technology and system implementation. The key technologies include (1) Photo-curable color 3D additive manufacturing processes development and materials research (2) Piezo type ink-jet head control and Opto-mechatronics integration technique of the photo-curable color 3D laminated manufacturing system. The proposed system is integrated with single Piezo type ink-jet head with two individual channels for two primary UV light curable color resins which can provide for future colorful 3D printing solutions. The main research results are 16 grey levels and grey resolution of 75 dpi. 

Keywords: 3d printing, additive manufacturing, color, photo-curable, Piezo type ink-jet, UV Resin.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2093
352 Photo Mosaic Smartphone Application in Client-Server Based Large-Scale Image Databases

Authors: Sang-Hun Lee, Bum-Soo Kim, Yang-Sae Moon, Jinho Kim

Abstract:

In this paper we present a photo mosaic smartphone application in client-server based large-scale image databases. Photo mosaic is not a new concept, but there are very few smartphone applications especially for a huge number of images in the client-server environment. To support large-scale image databases, we first propose an overall framework working as a client-server model. We then present a concept of image-PAA features to efficiently handle a huge number of images and discuss its lower bounding property. We also present a best-match algorithm that exploits the lower bounding property of image-PAA. We finally implement an efficient Android-based application and demonstrate its feasibility.

Keywords: smartphone applications; photo mosaic; similarity search; data mining; large-scale image databases.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1629
351 Automatic Enhanced Update Summary Generation System for News Documents

Authors: S. V. Kogilavani, C. S. Kanimozhiselvi, S. Malliga

Abstract:

Fast changing knowledge systems on the Internet can be accessed more efficiently with the help of automatic document summarization and updating techniques. The aim of multi-document update summary generation is to construct a summary unfolding the mainstream of data from a collection of documents based on the hypothesis that the user has already read a set of previous documents. In order to provide a lot of semantic information from the documents, deeper linguistic or semantic analysis of the source documents were used instead of relying only on document word frequencies to select important concepts. In order to produce a responsive summary, meaning oriented structural analysis is needed. To address this issue, the proposed system presents a document summarization approach based on sentence annotation with aspects, prepositions and named entities. Semantic element extraction strategy is used to select important concepts from documents which are used to generate enhanced semantic summary.

Keywords: Aspects, named entities, prepositions, update summary.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2100
350 Mining Association Rules from Unstructured Documents

Authors: Hany Mahgoub

Abstract:

This paper presents a system for discovering association rules from collections of unstructured documents called EART (Extract Association Rules from Text). The EART system treats texts only not images or figures. EART discovers association rules amongst keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transform unstructured documents into structured documents) with Information Retrieval scheme (TF-IDF) and Data Mining technique for association rules extraction. EART depends on word feature to extract association rules. It consists of four phases: structure phase, index phase, text mining phase and visualization phase. Our work depends on the analysis of the keywords in the extracted association rules through the co-occurrence of the keywords in one sentence in the original text and the existing of the keywords in one sentence without co-occurrence. Experiments applied on a collection of scientific documents selected from MEDLINE that are related to the outbreak of H5N1 avian influenza virus.

Keywords: Association rules, information retrieval, knowledgediscovery in text, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2393
349 Extending the Conceptual Neighborhood Graph of the Relations for the Semantic Adaptation of Multimedia Documents

Authors: Azze-Eddine Maredj, Nourredine Tonkin

Abstract:

The recent developments in computing and communication technology permit to users to access multimedia documents with variety of devices (PCs, PDAs, mobile phones...) having heterogeneous capabilities. This diversification of supports has trained the need to adapt multimedia documents according to their execution contexts. A semantic framework for multimedia document adaptation based on the conceptual neighborhood graphs was proposed. In this framework, adapting consists on finding another specification that satisfies the target constraints and which is as close as possible from the initial document. In this paper, we propose a new way of building the conceptual neighborhood graphs to best preserve the proximity between the adapted and the original documents and to deal with more elaborated relations models by integrating the relations relaxation graphs that permit to handle the delays and the distances defined within the relations.

Keywords: Conceptual Neighborhood Graph, Relaxation Graphs, Relations with Delays, Semantic Adaptation of Multimedia Documents.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1499
348 Evolutionary Feature Selection for Text Documents using the SVM

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, we present three feature selection methods: Information Gain, Support Vector Machine feature selection called (SVM_FS) and Genetic Algorithm with SVM (called GA_SVM). We show that the best results were obtained with GA_SVM method for a relatively small dimension of the feature vector.

Keywords: Feature Selection, Learning with Kernels, Support Vector Machine, Genetic Algorithm, and Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1658
347 Color and Layout-based Identification of Documents Captured from Handheld Devices

Authors: Ardhendu Behera, Denis Lalanne, Rolf Ingold

Abstract:

This paper proposes a method, combining color and layout features, for identifying documents captured from low-resolution handheld devices. On one hand, the document image color density surface is estimated and represented with an equivalent ellipse and on the other hand, the document shallow layout structure is computed and hierarchically represented. Our identification method first uses the color information in the documents in order to focus the search space on documents having a similar color distribution, and finally selects the document having the most similar layout structure in the remaining of the search space.

Keywords: Document color modeling, document visualsignature, kernel density estimation, document identification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1517
346 Absolute Cross Sections of Multi-Photon Ionization of Xenon by the Comparison with Process of its Electron-Impact Ionization

Authors: A. A. Mityureva, A. A. Pastor, P. Yu. Serdobintsev, N. A. Timofeev

Abstract:

Comparison of electron- and photon-impact processes as a method for determination of photo-ionization cross sections is described, discussed and shown to have many attractive features.

Keywords: Transition probability, cross section, photo-ionization, electron-ionization, multi-photon process.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2188
345 A Hybrid Ontology Based Approach for Ranking Documents

Authors: Sarah Motiee, Azadeh Nematzadeh, Mehrnoush Shamsfard

Abstract:

Increasing growth of information volume in the internet causes an increasing need to develop new (semi)automatic methods for retrieval of documents and ranking them according to their relevance to the user query. In this paper, after a brief review on ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination reserves the precision of ranking without loosing the speed. Our approach exploits natural language processing techniques to extract phrases from documents and the query and doing stemming on words. Then an ontology based conceptual method will be used to annotate documents and expand the query. To expand a query the spread activation algorithm is improved so that the expansion can be done flexible and in various aspects. The annotated documents and the expanded query will be processed to compute the relevance degree exploiting statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing to documents, (3) extracting and using both words and phrases to compute relevance degree, (4) improving the spread activation algorithm to do the expansion based on weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results will be included at the end of the paper.

Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1571
344 A New Approach to Annotate the Text's of the Websites and Documents with a Quite Comprehensive Knowledge Base

Authors: Mohammad Yasrebi, Mehran Mohsenzadeh, Mashalla Abbasi-Dezfuli

Abstract:

Machine-understandable data when strongly interlinked constitutes the basis for the SemanticWeb. Annotating web documents is one of the major techniques for creating metadata on the Web. Annotating websites defines the containing data in a form which is suitable for interpretation by machines. In this paper, we present a new approach to annotate websites and documents by promoting the abstraction level of the annotation process to a conceptual level. By this means, we hope to solve some of the problems of the current annotation solutions.

Keywords: Knowledge base, ontology, semantic annotation, semantic web.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1310
343 Proposition for a New Approach of Version Control System Based On ECA Active Rules

Authors: S. Benhamed, S. Hocine, D. Benhamamouch

Abstract:

We try to give a solution of version control for documents in web service, that-s why we propose a new approach used specially for the XML documents. The new approach is applied in a centralized repository, this repository coexist with other repositories in a decentralized system. To achieve the activities of this approach in a standard model we use the ECA active rules. We also show how the Event-Condition-Action rules (ECA rules) have been incorporated as a mechanism for the version control of documents. The need to integrate ECA rules is that it provides a clear declarative semantics and induces an immediate operational realization in the system without the need for human intervention.

Keywords: ECA Rule, Web service, version control system, propagation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1330
342 On the Interactive Search with Web Documents

Authors: Mario Kubek, Herwig Unger

Abstract:

Due to the large amount of information in the World Wide Web (WWW, web) and the lengthy and usually linearly ordered result lists of web search engines that do not indicate semantic relationships between their entries, the search for topically similar and related documents can become a tedious task. Especially, the process of formulating queries with proper terms representing specific information needs requires much effort from the user. This problem gets even bigger when the user's knowledge on a subject and its technical terms is not sufficient enough to do so. This article presents the new and interactive search application DocAnalyser that addresses this problem by enabling users to find similar and related web documents based on automatic query formulation and state-ofthe- art search word extraction. Additionally, this tool can be used to track topics across semantically connected web documents.

Keywords: DocAnalyser, interactive web search, search word extraction, query formulation, source topic detection, topic tracking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1609
341 Using Dempster-Shafer Theory in XML Information Retrieval

Authors: F. Raja, M. Rahgozar, F. Oroumchian

Abstract:

XML is a markup language which is becoming the standard format for information representation and data exchange. A major purpose of XML is the explicit representation of the logical structure of a document. Much research has been performed to exploit logical structure of documents in information retrieval in order to precisely extract user information need from large collections of XML documents. In this paper, we describe an XML information retrieval weighting scheme that tries to find the most relevant elements in XML documents in response to a user query. We present this weighting model for information retrieval systems that utilize plausible inferences to infer the relevance of elements in XML documents. We also add to this model the Dempster-Shafer theory of evidence to express the uncertainty in plausible inferences and Dempster-Shafer rule of combination to combine evidences derived from different inferences.

Keywords: Dempster-Shafer theory, plausible inferences, XMLinformation retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1482