Search results for: Hierarchical document structure
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2987

2927 Advanced Information Extraction with n-gram based LSI

Authors: Ahmet Güven, Ö. Özgür Bozkurt, Oya Kalıpsız

Abstract:

The number of documents being created is growing at an increasing pace, yet most of them cover already known topics and few introduce new concepts. This has started a new era in the information retrieval (IR) discipline, with requirements of its own: digging into topics and concepts and finding subtopics or relations between topics. Until now, IR research has focused on retrieving documents about a general topic or clustering documents under generic subjects. However, these conventional approaches cannot go deep into the content of documents, which makes it difficult for people to reach the documents they are searching for. We therefore need new ways of mining document sets in which the critical point is to know much about the contents of the documents. As a solution, we propose to enhance LSI, one of the proven IR techniques, by supporting its vector space with n-gram forms of words. The positive results we have obtained are shown in two application areas of the IR domain: querying a document database and clustering the documents in that database.

Keywords: Document clustering, Information Extraction, Information Retrieval, LSI, n-gram.

Downloads: 1751
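
The abstract above describes supporting the LSI vector space with n-gram forms of words. A minimal sketch of the first half of that idea follows: building n-gram count vectors and comparing documents by cosine similarity. It assumes character trigrams padded with `_` (the abstract does not say which kind of n-gram is used), it omits the SVD step of LSI entirely, and the function names and sample documents are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Return overlapping character n-grams for each word, padded with '_'."""
    grams = []
    for word in text.lower().split():
        padded = f"_{word}_"
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def ngram_vector(text, n=3):
    """Bag-of-n-grams count vector for one document."""
    return Counter(char_ngrams(text, n))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy collection, not taken from the paper.
docs = {
    "d1": "clustering documents",
    "d2": "document cluster",
    "d3": "neutrino mass",
}
vectors = {name: ngram_vector(text) for name, text in docs.items()}

# Rank documents against a query; shared word stems produce shared trigrams.
query = ngram_vector("clustered document")
ranking = sorted(docs, key=lambda name: cosine(query, vectors[name]), reverse=True)
```

Because inflected forms such as "cluster" and "clustering" share most of their trigrams, they land close together in this space without a stemmer; a full LSI pipeline would then factor the resulting term-document matrix.
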
2926 Organization Model of Semantic Document Repository and Search Techniques for Studying Information Technology

Authors: Nhon Do, Thuong Huynh, An Pham

Abstract:

Nowadays, organizing a repository of documents and resources for learning in a specialized field such as Information Technology (IT), together with search techniques based on domain knowledge or a document's content, is an urgent need in the practice of teaching, learning, and research. There have been several works related to methods of organization and search by content. However, the results are still limited and insufficient to meet users' demand for semantic document retrieval. This paper presents a solution for the organization of a repository that supports semantic representation and processing in search. The proposed solution is a model which integrates components such as an ontology describing domain knowledge, a database of the document repository, a semantic representation for documents, and a file system, together with the associated problems, semantic processing techniques, and advanced search techniques based on measuring semantic similarity. The solution is applied to build an IT learning materials management system for a university, with a semantic search function serving students, teachers, and managers as well. The application has been implemented and tested at the University of Information Technology, Ho Chi Minh City, Vietnam, and has achieved good results.

Keywords: document retrieval system, knowledge representation, document representation, semantic search, ontology.

Downloads: 1662
2925 A Probabilistic View of the Spatial Pooler in Hierarchical Temporal Memory

Authors: Mackenzie Leake, Liyu Xia, Kamil Rocki, Wayne Imaino

Abstract:

In the Hierarchical Temporal Memory (HTM) paradigm the effect of overlap between inputs on the activation of columns in the spatial pooler is studied. Numerical results suggest that similar inputs are represented by similar sets of columns and dissimilar inputs are represented by dissimilar sets of columns. It is shown that the spatial pooler produces these results under certain conditions for the connectivity and proximal thresholds. Following the discussion of the initialization of parameters for the thresholds, corresponding qualitative arguments about the learning dynamics of the spatial pooler are discussed.

Keywords: Hierarchical Temporal Memory, HTM, Learning Algorithms, Machine Learning, Spatial Pooler.

Downloads: 2145
2924 Sense of Territoriality and Revitalization of Neighborhood Centers in Boshrooyeh City

Authors: H. Farkisch, A.I. Che-Ani, V. Ahmadi, M. Surat

Abstract:

In new urbanism, the role of the neighborhood center as a semi-public space (the balancing space) that bonds private and public has disappeared. A hierarchical principle in the traditional neighborhood center appears to create or develop the conditions for residents' relationships and sense of belonging. This paper evaluates the significance of hierarchical principles of the neighborhood center for residents' territoriality and its factors. The Miandeh neighborhood center in Boshrooyeh city was selected as the case study area. Results indicated that a hierarchical principle is the best instrument for improving territoriality as a subcomponent of residents' place belonging. The findings help urban designers to revitalize neighborhoods and to proceed with the organization of physical space.

Keywords: Belonging, Neighborhood center, Revitalization, Territoriality

Downloads: 1736
2923 Discovering the Dimension of Abstractness: Structure-Based Model that Learns New Categories and Categorizes on Different Levels of Abstraction

Authors: Georgi I. Petkov, Ivan I. Vankov, Yolina A. Petrova

Abstract:

A structure-based model of category learning and categorization at different levels of abstraction is presented. The model compares different structures and expresses their similarity implicitly in the form of mappings. Based on this similarity, the model can either categorize targets as members of categories that it already has or create new categories. The model is novel in using two threshold parameters to evaluate the structural correspondence. If the similarity between two structures exceeds the higher threshold, a new subordinate category is created. Conversely, if the similarity does not exceed the higher threshold but does exceed the lower one, the model creates a new category at a higher level of abstraction.

Keywords: Analogy-making, categorization, learning of categories, abstraction, hierarchical structure.

Downloads: 737
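
The two-threshold rule stated in the abstract above can be written down as a small decision function. This is only a sketch of the rule as worded there: the function name, the return labels, and the behaviour when the similarity stays below the lower threshold (which the abstract does not specify) are assumptions.

```python
def categorize(similarity, lower, higher):
    """Map a structural-similarity score to the action described in the abstract."""
    if lower > higher:
        raise ValueError("lower threshold must not exceed higher threshold")
    if similarity > higher:
        # Very similar structures: create a more specific (subordinate) category.
        return "new subordinate category"
    if similarity > lower:
        # Moderately similar structures: create a more abstract category.
        return "new category at a higher level of abstraction"
    # Assumption: the abstract is silent on this case.
    return "no new category (assumed)"


actions = [categorize(s, lower=0.4, higher=0.8) for s in (0.9, 0.6, 0.1)]
```
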
2922 Enhancement Throughput of Unplanned Wireless Mesh Networks Deployment Using Partitioning Hierarchical Cluster (PHC)

Authors: Ahmed K. Hasan, A. A. Zaidan, Anas Majeed, B. B. Zaidan, Rosli Salleh, Omar Zakaria, Ali Zuheir

Abstract:

Wireless mesh networks (WMNs) based on IEEE 802.11 technology are a scalable and efficient solution for next-generation wireless networking, providing wide-area wideband internet access to a significant number of users. These networks may be deployed by different authorities without any planning, so they may overlap partially or completely in the same service area. The aim of this work is to design a new model that enhances the throughput of unplanned WMN deployments using a partitioning hierarchical cluster (PHC); unplanned deployment of WMNs is a determinant of their performance. We use a throughput optimization approach to model the unplanned WMN deployment problem on a PHC-based architecture, and we use bridge nodes that allow interworking traffic between these WMNs as a solution to the performance degradation.

Keywords: Wireless Mesh Networks, 802.11s Internetworking, Partitioning Hierarchical Cluster.

Downloads: 1493
2921 A Keyword-Based Filtering Technique of Document-Centric XML using NFA Representation

Authors: Changwoo Byun, Kyounghan Lee, Seog Park

Abstract:

XML is becoming a de facto standard for online data exchange. Existing XML filtering techniques based on a publish/subscribe model focus on highly structured data marked up with XML tags. These techniques are efficient in filtering the documents of data-centric XML but are not effective in filtering the element contents of document-centric XML. In this paper, we propose an extended XPath specification which includes a special matching character '%', as used in the LIKE operation of SQL, in order to solve the difficulty of writing queries that adequately filter element contents using the previous XPath specification. We also present a novel technique for filtering a collection of document-centric XML documents, called Pfilter, which is able to exploit the extended XPath specification. We report several performance studies of efficiency and scalability using the multi-query processing time (MQPT).

Keywords: XML Data Stream, Document-centric XML, Filtering Technique, Value-based Predicates.

Downloads: 1717
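
To illustrate the kind of content predicate the abstract above describes, the sketch below applies a LIKE-style pattern, where `%` matches any run of characters, to the text of elements selected by a path. It is not the paper's extended XPath syntax or its NFA-based Pfilter: it evaluates one document at a time with Python's standard `xml.etree.ElementTree`, and the helper names and sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def like_to_regex(pattern):
    """Translate a LIKE-style pattern ('%' = any run of characters) to a regex."""
    parts = (re.escape(piece) for piece in pattern.split("%"))
    return re.compile(".*".join(parts), re.DOTALL)


def filter_elements(xml_text, path, pattern):
    """Return the text of elements at `path` whose whole content matches `pattern`."""
    root = ET.fromstring(xml_text)
    regex = like_to_regex(pattern)
    return [el.text for el in root.findall(path) if el.text and regex.fullmatch(el.text)]


# Hypothetical document-centric sample.
doc = """
<news>
  <article><title>XML filtering with NFA</title></article>
  <article><title>Soccer video annotation</title></article>
  <article><title>Document-centric XML streams</title></article>
</news>
"""
matches = filter_elements(doc, "./article/title", "%XML%")
```

Here `"%XML%"` selects titles containing "XML" anywhere, while a pattern such as `"Soccer%"` would select only titles that start with "Soccer".
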
2920 Cooperative CDD Scheme Based on Hierarchical Modulation in OFDM System

Authors: Seung-Jun Yu, Yeong-Seop Ahn, Young-Min Ko, Hyoung-Kyu Song

Abstract:

In order to achieve a high data rate and increase spectral efficiency, multiple-input multiple-output (MIMO) systems have been proposed. However, multiple antennas are limited by size and cost. Therefore, the recently developed cooperative diversity scheme, which gains transmit diversity using only the existing hardware by constituting a virtual antenna array, can be a solution. However, most of the cooperative techniques introduced so far share the drawback of a decreased transmission rate, because the destination must receive decodable compositions of symbols from the source and the relay. In this paper, we propose a cooperative cyclic delay diversity (CDD) scheme that uses hierarchical modulation. This scheme is free from the rate loss and allows seamless cooperative communication.

Keywords: MIMO, Cooperative communication, CDD, Hierarchical modulation.

Downloads: 2163
2919 Data Migration between Document-Oriented and Relational Databases

Authors: Bogdan Walek, Cyril Klimes

Abstract:

Current tools for data migration between document-oriented and relational databases have several disadvantages. We propose a new approach for data migration between document-oriented and relational databases. During data migration, the relational schema of the target (relational database) is automatically created from a collection of XML documents. The proposed approach is verified on data migration between the document-oriented database IBM Lotus Notes/Domino and a relational database implemented in the relational database management system (RDBMS) MySQL.

Keywords: data migration, database, document-oriented database, XML, relational schema

Downloads: 3469
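
The abstract above says the target relational schema is created automatically from a collection of XML documents. One very simple way to do that is sketched here, under strong assumptions: every document is flat (one level of child elements), every column is typed `TEXT`, and a surrogate `id` key is added. The function names and sample documents are hypothetical, and the paper's actual mapping rules are not given in the abstract.

```python
import xml.etree.ElementTree as ET


def infer_columns(xml_documents):
    """Collect child element names seen across all documents, in first-seen order."""
    columns = []
    for text in xml_documents:
        for child in ET.fromstring(text):
            if child.tag not in columns:
                columns.append(child.tag)
    return columns


def create_table_sql(table, columns):
    """Build a MySQL-style CREATE TABLE statement with one TEXT column per element."""
    cols = ",\n".join(f"  `{name}` TEXT" for name in columns)
    return (
        f"CREATE TABLE `{table}` (\n"
        "  `id` INT AUTO_INCREMENT PRIMARY KEY,\n"  # assumed surrogate key
        f"{cols}\n);"
    )


# Hypothetical exported documents with partly different fields.
docs = [
    "<memo><subject>Hi</subject><body>Text</body></memo>",
    "<memo><subject>Re</subject><author>Ann</author></memo>",
]
columns = infer_columns(docs)
sql = create_table_sql("memo", columns)
```

Taking the union of fields over the whole collection matters because documents in a document-oriented store need not share the same set of fields.
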
2918 Soccer Video Edition Using a Multimodal Annotation

Authors: Fendri Emna, Ben-Abdallah Hanêne, Ben-Hamadou Abdelmajid

Abstract:

In this paper, we present an approach for soccer video edition using a multimodal annotation. We propose to associate with each video sequence of a soccer match a textual document to be used for further exploitation such as search, browsing, and abstract edition. The textual document contains video metadata, match metadata, and match data. This document, generated automatically while the video is analyzed, segmented, and classified, can be enriched semi-automatically according to the user type and/or a specialized recommendation system.

Keywords: XML, Multimodal Annotation, recommendation system.

Downloads: 1372
2917 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques

Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar

Abstract:

Renewable energy (RE) is referred to as "clean energy", and its use enjoys popular support because it provides electricity with zero carbon dioxide emissions. This study provides useful insight into RE in the European Union (EU), especially electricity generation from renewables and the associated targets. The objective of this study is to identify groups of European countries using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The hierarchical cluster analysis is based on Ward's clustering method and squared Euclidean distances, and it identified eight distinct clusters of European countries. Then, the non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results, with data normalized by Z-score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.

Keywords: Share of electricity generation, CO2 emission, targets, multivariate methods, hierarchical clustering, K-means clustering, discriminant analysis, correlation, EU member countries.

Downloads: 1203
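
Two of the steps named in the abstract above, Z-score normalization and Ward's hierarchical clustering on squared Euclidean distances, can be sketched in plain Python as follows. The merge cost used is the standard Ward criterion (the increase in within-cluster sum of squares). The indicator values are invented for illustration and are not the study's data; the follow-up k-means and discriminant analysis steps are not shown.

```python
from statistics import mean, pstdev


def z_scores(rows):
    """Standardise each column to zero mean and unit (population) standard deviation."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s if s else 0.0 for x, (m, s) in zip(row, stats)] for row in rows]


def centroid(points):
    """Component-wise mean of a list of points."""
    return [mean(dim) for dim in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean distance
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(points, k):
    """Repeatedly merge the cheapest pair of clusters until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: ward_cost(
            [points[m] for m in clusters[p[0]]],
            [points[m] for m in clusters[p[1]]]))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return [sorted(c) for c in clusters]


# Hypothetical rows: [renewable share of electricity (%), CO2 per capita (t)].
data = [[60.0, 4.0], [55.0, 5.0], [12.0, 9.0], [10.0, 10.0], [30.0, 7.0]]
clusters = ward_clusters(z_scores(data), k=3)
```

Standardizing first keeps the indicator with the larger numeric range from dominating the squared distances.
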
2916 An MADM Framework toward Hierarchical Production Planning in Hybrid MTS/MTO Environments

Authors: H. Rafiei, M. Rabbani

Abstract:

This paper proposes a new decision making structure to determine the appropriate product delivery strategy for different products in a manufacturing system among make-to-stock, make-to-order, and hybrid strategies. Given the product delivery strategies for all products in the manufacturing system, the position of the Order Penetration Point (OPP) can be located with respect to the delivery strategies; locating the OPP in the hybrid strategy is a cumbersome task. In this regard, we employ the analytic network process, because a variety of interrelated driving factors are involved in choosing the right location. Moreover, the proposed structure is augmented with fuzzy set theory in order to cope with the uncertainty of judgments. Finally, the applicability of the proposed structure is demonstrated in practice through a real industrial case company. The numerical results demonstrate the efficiency of the proposed decision making structure in order partitioning and OPP location.

Keywords: Hybrid make-to-stock/make-to-order, Multi-attribute decision making, Order partitioning, Order penetration point.

Downloads: 2170
2915 Identification of Spam Keywords Using Hierarchical Category in C2C E-commerce

Authors: Shao Bo Cheng, Yong-Jin Han, Se Young Park, Seong-Bae Park

Abstract:

Consumer-to-Consumer (C2C) E-commerce has been growing at a very high speed in recent years. Since identical or nearly identical kinds of products compete with one another by relying on keyword search in C2C E-commerce, some sellers describe their products with spam keywords that are popular but are not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead consumers and waste their time. This problem has been reported in many commercial services such as eBay and Taobao, but there has been little research to solve it. As a solution, this paper proposes a method to classify whether the keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is commonly observed in the specifications of products which are the same as, or of the same kind as, the given product. This is because the hierarchical category of a product is in general determined precisely by the seller of the product, and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, the reliability degree is determined differently according to the layer. Hence, reliability degrees from different layers of a hierarchical category become features for keywords, and they are used together with features from specifications alone for classification of the keywords. Support Vector Machines are adopted as the basic classifier using these features, since they are powerful and widely used in many classification tasks. In the experiments, the proposed method is evaluated with a gold-standard dataset from Yi-han-wang, a Chinese C2C E-commerce site, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which proves that spam keywords are effectively identified by a hierarchical category in C2C E-commerce.

Keywords: Spam Keyword, E-commerce, keyword features, spam filtering.

Downloads: 2459
2914 Towards Clustering of Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Jürgen Kilian, Andreas Zulauf

Abstract:

Methods for organizing web data into groups in order to analyze web-based hypertext data and facilitate data availability are very important in terms of the number of documents available online. Thereby, the task of clustering web-based document structures has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving the servicing of web users' requests, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts are represented as so-called generalized trees, which are more general than usual directed rooted trees, e.g., DOM trees. As an important preprocessing step, we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then, we apply agglomerative clustering to the obtained similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore, we outline the application of our approach in Web Usage Mining as future work.

Keywords: Clustering methods, graph-based patterns, graph similarity, hypertext structures, web structure mining

Downloads: 1462
2913 WebGD: A CORBA-based Document Classification and Retrieval System on the Web

Authors: Fuyang Peng, Bo Deng, Chao Qi, Mou Zhan

Abstract:

This paper presents the design and implementation of WebGD, a CORBA-based document classification and retrieval system on the Internet. WebGD makes use of techniques such as the Web, CORBA, Java, NLP, fuzzy techniques, knowledge-based processing, and database technology. A unified classification and retrieval model, classification and retrieval with one reasoning engine, and flexible working mode configuration are some of its main features. The architecture of WebGD, the unified classification and retrieval model, the components of the WebGD server, and the fuzzy inference engine are discussed in detail in this paper.

Keywords: Text Mining, document classification, knowledge processing, fuzzy logic, Web, CORBA

Downloads: 1771
2912 Survival of Neutrino Mass Models in Nonthermal Leptogenesis

Authors: Amal Kr Sarma, H Zeen Devi, N Nimai Singh

Abstract:

The constraints imposed by non-thermal leptogenesis on the survival of the neutrino mass models describing the presently available neutrino mass patterns are studied numerically. We consider the Majorana CP-violating phases coming from right-handed Majorana mass matrices to estimate the baryon asymmetry of the universe for different neutrino mass models, namely quasi-degenerate, inverted hierarchical, and normal hierarchical models, with tribimaximal mixings. Considering two possible diagonal forms of the Dirac neutrino mass matrix, as either the charged lepton or the up-quark mass matrix, the heavy right-handed mass matrices are constructed from the light neutrino mass matrix. Only the normal hierarchical model leads to the best predictions of the baryon asymmetry of the universe, consistent with observations in the non-thermal leptogenesis scenario.

Keywords: Thermal leptogenesis, Non-thermal leptogenesis.

Downloads: 1233
2911 Automatic Enhanced Update Summary Generation System for News Documents

Authors: S. V. Kogilavani, C. S. Kanimozhiselvi, S. Malliga

Abstract:

Fast-changing knowledge systems on the Internet can be accessed more efficiently with the help of automatic document summarization and updating techniques. The aim of multi-document update summary generation is to construct a summary unfolding the mainstream of data from a collection of documents, based on the hypothesis that the user has already read a set of previous documents. In order to provide more semantic information from the documents, deeper linguistic or semantic analysis of the source documents was used instead of relying only on document word frequencies to select important concepts. In order to produce a responsive summary, meaning-oriented structural analysis is needed. To address this issue, the proposed system presents a document summarization approach based on sentence annotation with aspects, prepositions, and named entities. A semantic element extraction strategy is used to select important concepts from documents, which are used to generate an enhanced semantic summary.

Keywords: Aspects, named entities, prepositions, update summary.

Downloads: 2100
2910 Experimental Evaluation of Mobility Anchor Point Selection Scheme in Hierarchical Mobile IPv6

Authors: Zulkeflee Kusin, Mohamad Shanudin Zakaria

Abstract:

Hierarchical Mobile IPv6 (HMIPv6) was designed to support IP micro-mobility management in the Next Generation Networks (NGN) framework. The main design idea behind this protocol is the use of a Mobility Anchor Point (MAP), located at a router at any level of the network, to support hierarchical mobility management. However, distance-based MAP selection in HMIPv6 causes MAP overload and more frequent binding updates as the network grows. Therefore, to address this issue in designing a MAP selection scheme, we propose a dynamic load control mechanism integrated with a speed detection mechanism (DMS-DLC). The experimental results show that the proposed scheme gives a better distribution of MAP load and increases handover speed.

Keywords: Dynamic load control, HMIPv6, Mobility Anchor Point, MAP selection scheme

Downloads: 1749
2909 Image Segmentation Using 2-D Histogram in RGB Color Space in Digital Libraries

Authors: El Asnaoui Khalid, Aksasse Brahim, Ouanan Mohammed

Abstract:

This paper presents an unsupervised color image segmentation method. It is based on a hierarchical analysis of a 2-D histogram in RGB color space. This histogram minimizes the storage space of images and thus facilitates operations between them. The improved segmentation approach shows a better identification of objects in a color image and, at the same time, the system is fast.

Keywords: Image segmentation, hierarchical analysis, 2-D histogram, Classification.

Downloads: 1571
2908 Fuzzy Group Decision Making for the Assessment of Health-Care Waste Disposal Alternatives in Istanbul

Authors: Mehtap Dursun, E. Ertugrul Karsak, Melis Almula Karadayi

Abstract:

Disposal of health-care waste (HCW) is considered as an important environmental problem especially in large cities. Multiple criteria decision making (MCDM) techniques are apt to deal with quantitative and qualitative considerations of the health-care waste management (HCWM) problems. This research proposes a fuzzy multi-criteria group decision making approach with a multilevel hierarchical structure including qualitative as well as quantitative performance attributes for evaluating HCW disposal alternatives for Istanbul. Using the entropy weighting method, objective weights as well as subjective weights are taken into account to determine the importance weighting of quantitative performance attributes. The results obtained using the proposed methodology are thoroughly analyzed.

Keywords: Entropy weighting method, group decision making, health-care waste management, hierarchical fuzzy multi-criteria decision making

Downloads: 1641
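
The abstract above mentions the entropy weighting method for deriving objective weights of quantitative attributes. A minimal sketch of the commonly used formulation follows: each column of the decision matrix is normalized to shares, its Shannon entropy is scaled by 1/ln(m) for m alternatives, and weights are proportional to one minus that entropy. The matrix values are invented, all entries are assumed positive, and the paper's fuzzy and subjective-weight components are not covered.

```python
from math import log


def entropy_weights(matrix):
    """Objective criterion weights from a decision matrix (rows = alternatives)."""
    m = len(matrix)
    k = 1.0 / log(m)  # scales entropy into [0, 1]; requires m > 1
    divergences = []
    for column in zip(*matrix):
        total = sum(column)
        shares = [x / total for x in column]
        entropy = -k * sum(p * log(p) for p in shares if p > 0)
        divergences.append(1.0 - entropy)  # more spread in a column -> larger value
    total_div = sum(divergences)
    return [d / total_div for d in divergences]


# Hypothetical scores of three disposal alternatives on two quantitative attributes.
matrix = [[10.0, 5.0], [11.0, 1.0], [9.0, 9.0]]
weights = entropy_weights(matrix)
```

The second attribute separates the alternatives far more than the first, so it receives the larger weight.
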
2907 The Effect of Social Structural Change on the Traditional Turkish Houses Becoming Unusable

Authors: Gamze Fahriye Pehlivan, Tulay Canitez

Abstract:

Traditional Turkish houses are becoming unusable as a result of the deterioration of the balanced interaction between users and house (human and house) that had continued throughout history. Especially as a consequence of the change in social structure, neglected houses that no longer meet the desires of their users, and that hold no meaning beyond shelter, are becoming unusable and are being destroyed. A conservation policy should be developed and renovations should be made in order to keep alive the traditional houses, which have the quality of a cultural and historical document presenting the social structure, lifestyle, and traditions of their own age, and to pass them on to the next generations.

Keywords: House, social structural change, social structural, traditional Turkish houses.

Downloads: 1664
2906 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Soo-Hyeon Jeon, Byeoung Kug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including large volumes of unstructured data and text have been created because of the rapid increase in the use of social media and the Internet. Usually, these documents are categorized for the convenience of users. Because the accuracy of manual categorization is not guaranteed, and such categorization requires a large amount of time and incurs huge costs, many studies on automatic categorization have been conducted to help mitigate the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorize complex documents with multiple topics because they work on the assumption that individual documents can be categorized into single categories only. Therefore, to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, the learning process employed in these studies involves training using a multi-categorized document set. These methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets using traditional multi-categorization algorithms are provided. To overcome this limitation, in this study, we review our novel methodology for extending the category of a single-categorized document to multiple categories, and then introduce a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: Big Data Analysis, Document Classification, Text Mining, Topic Analysis.

Downloads: 1706
2905 Comprehensive Evaluation on China's Industrial Structure Optimization from the Perspective of Coordination

Authors: Ying Wang

Abstract:

From the perspective of industrial structure coordination, and based on an explicit definition of the connotation of industrial structure coordination, synergetic coefficients are used to measure the coordination degree between the three industries' input structure and output structure, and then the efficacy function method is employed to comprehensively evaluate the level of China's industrial structure optimization. It is shown that China's industrial structure presented a "v-shaped" variation tendency between 1996 and 2008, and that its industrial structure adjustment achieved obvious results after 2003, with the industrial structure optimization level increasing continuously. However, in 2009 the level of China's industrial structure optimization declined sharply due to the decreasing contribution of the coordination between value-added structure and energy structure and the lower coordination degree between value-added structure and capital structure.

Keywords: China's industrial structure, Coordination degree, Efficacy function, Synergetic coefficients

Downloads: 1338
2904 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in the data mining community and has a number of applications, such as document archive filtering, document organization, topic detection, and subject tracing. In the real world, some of the already clustered documents may lose importance while new documents of more significance may appear. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper addresses this issue by using the Fading Function. The unstructured text documents are clustered, and for each cluster a statistics structure called the Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function, which keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm for unstructured text documents, the Clustering n-ary Merge Algorithm (CnMA), that uses the Cluster Profile and the Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured Text Documents, Fading Function.

Downloads: 1939
2903 Interactive Concept-based Search using MOEA: The Hierarchical Preferences Case
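
The abstract above describes a Cluster Profile that uses a Fading Function to track the time-dependent importance of a cluster, but it does not give the function itself. The sketch below assumes an exponential form, 2^(-decay * elapsed), which is a common choice in stream clustering; the class layout, field names, and decay value are hypothetical and are not taken from the paper or its CnMA algorithm.

```python
from dataclasses import dataclass


def fading(elapsed, decay=0.1):
    """Assumed exponential fading weight 2^(-decay * elapsed); equals 1 at elapsed = 0."""
    return 2.0 ** (-decay * elapsed)


@dataclass
class ClusterProfile:
    weight: float = 0.0       # faded count of documents in the cluster
    last_update: float = 0.0  # time of the most recent update

    def add_document(self, now, decay=0.1):
        """Fade the existing weight to time `now`, then count the new document."""
        self.weight = self.weight * fading(now - self.last_update, decay) + 1.0
        self.last_update = now

    def importance(self, now, decay=0.1):
        """Current time-dependent importance of the cluster."""
        return self.weight * fading(now - self.last_update, decay)


profile = ClusterProfile()
profile.add_document(now=0.0)
profile.add_document(now=10.0)   # the first document has faded to half weight
later = profile.importance(now=20.0)
```

With this form, a cluster that stops receiving documents loses half of its importance every 1/decay time units, so stale clusters can be dropped or merged once they fall below a chosen cutoff.
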

Authors: Gideon Avigad, Amiram Moshaiov, Neima Brauner

Abstract:

An IEC technique is described for a multi-objective search of conceptual solutions. The survivability of solutions is influenced by both model-based fitness and subjective human preferences. The concepts' preferences are articulated via a hierarchy of sub-concepts. The suggested method produces an objective-subjective front. An academic example is employed to demonstrate the proposed approach.

Keywords: Conceptual solution, engineering design, hierarchical planning, multi-objective search, problem reduction.

Downloads: 969
2902 Exploring the Activity Fabric of an Intelligent Environment with Hierarchical Hidden Markov Theory

Authors: Chiung-Hui Chen

Abstract:

The Internet of Things (IoT) was designed for widespread convenience. With the smart tag and the sensing network, a large quantity of dynamic information is immediately presented in the IoT. Through the internal communication and interaction, meaningful objects provide real-time services for users. Therefore, the service with appropriate decision-making has become an essential issue. Based on the science of human behavior, this study employed the environment model to record the time sequences and locations of different behaviors and adopted the probability module of the hierarchical Hidden Markov Model for the inference. The statistical analysis was conducted to achieve the following objectives: First, define user behaviors and predict the user behavior routes with the environment model to analyze user purposes. Second, construct the hierarchical Hidden Markov Model according to the logic framework, and establish the sequential intensity among behaviors to get acquainted with the use and activity fabric of the intelligent environment. Third, establish the intensity of the relation between the probability of objects’ being used and the objects. The indicator can describe the possible limitations of the mechanism. As the process is recorded in the information of the system created in this study, these data can be reused to adjust the procedure of intelligent design services.

Keywords: Behavior, big data, hierarchical Hidden Markov Model, intelligent object.

Downloads 709
2901 A New Model for Discovering XML Association Rules from XML Documents

Authors: R. AliMohammadzadeh, M. Rahgozar, A. Zarnani

Abstract:

The inherent flexibility of XML in both structure and semantics makes mining from XML data a complex task with more challenges than traditional association rule mining in relational databases. In this paper, we propose a new model for the effective extraction of generalized association rules from an XML document collection. We directly use frequent subtree mining techniques in the discovery process and do not ignore the tree structure of data in the final rules. The frequent subtrees based on the user-provided support are split into complement subtrees to form the rules. We explain our model in multiple steps, from data preparation to rule generation.
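The idea of splitting a frequent pattern into two complementary parts to form a rule can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes that a "subtree" can be approximated by a set of root-to-node tag paths, and the helper names `label_paths`, `support`, and `rule_confidence` as well as the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def label_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def support(pattern, docs):
    """Fraction of documents whose path set contains every path in the pattern."""
    return sum(pattern <= d for d in docs) / len(docs)


def rule_confidence(antecedent, consequent, docs):
    """confidence(A -> C) = support(A union C) / support(A)."""
    return support(antecedent | consequent, docs) / support(antecedent, docs)


docs = [label_paths(x) for x in (
    "<order><book/><cd/></order>",
    "<order><book/><cd/></order>",
    "<order><book/></order>",
    "<order><dvd/></order>",
)]

# Split one frequent pattern into an antecedent and its complement.
whole = frozenset({"order/book", "order/cd"})
antecedent = frozenset({"order/book"})
consequent = whole - antecedent

whole_support = support(whole, docs)
confidence = rule_confidence(antecedent, consequent, docs)
```

Real frequent subtree mining also has to handle sibling order, repeated tags, and embedded versus induced subtrees, all of which this path-set approximation ignores.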

Keywords: XML, Data Mining, Association Rule Mining.

Downloads 1586
2900 Restoration of Noisy Document Images with an Efficient Bi-Level Adaptive Thresholding

Authors: Abhijit Mitra

Abstract:

An effective approach for extracting document images from a noisy background is introduced. The entire scheme is divided into three sub-techniques: the initial preprocessing operations for noise cluster tightening; the introduction of a new thresholding method by maximizing the ratio of standard deviations of the combined effect on the image to the sum of weighted classes; and finally the image restoration phase by image binarization utilizing the proposed optimum threshold level. The proposed method is found to be efficient compared to the existing schemes in terms of computational complexity as well as speed, with better noise rejection.

Keywords: Document image extraction, Preprocessing, Ratio of standard deviations, Bi-level adaptive thresholding.

Downloads 1414
2899 Fast Document Segmentation Using Contour and X-Y Cut Technique

Authors: Boontee Kruatrachue, Narongchai Moongfangklang, Kritawan Siriboon

Abstract:

This paper describes a fast and efficient method for page segmentation of documents containing nonrectangular blocks. The segmentation is based on an edge-following algorithm using a small window of 16 by 32 pixels. This segmentation is very fast since only the border pixels of paragraphs are used, without scanning the whole page. Still, the segmentation may contain errors if the space between columns is smaller than the window used in edge following. Consequently, this paper reduces this error by first identifying the missed segmentation points using direction information from edge following, then applying an X-Y cut at each missed segmentation point to separate the connected columns. The advantage of the proposed method is the fast identification of missed segmentation points. This methodology is faster, with less overhead, than other algorithms that need to access many more pixels of a document.

Keywords: Contour Direction Technique, Missed Segmentation Points, Page Segmentation, Recursive X-Y Cut Technique
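For readers unfamiliar with the X-Y cut step, the following is a minimal sketch of a plain recursive X-Y cut on a binary page image. It does not reproduce the edge-following stage or the missed-point detection described in the abstract; the function name `xy_cut`, the `min_gap` parameter, and the tiny test page are assumptions made for illustration.

```python
def xy_cut(img, min_gap=2):
    """Split a binary image (rows of 0/1) into blocks (top, bottom, left, right).

    bottom and right are exclusive. A region is cut at its widest empty run of
    rows, then of columns, as long as that run is at least min_gap wide.
    """
    blocks = []

    def trim(t, b, l, r):
        # Shrink the region to the bounding box of its ink; None if it is blank.
        rows = [y for y in range(t, b) if any(img[y][l:r])]
        if not rows:
            return None
        cols = [x for x in range(l, r) if any(img[y][x] for y in range(t, b))]
        return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

    def widest_gap(profile):
        # profile[i] is True where row/column i contains ink.
        best, start = None, None
        for i, filled in enumerate(profile):
            if not filled:
                if start is None:
                    start = i
            elif start is not None:
                if i - start >= min_gap and (best is None or i - start > best[1] - best[0]):
                    best = (start, i)
                start = None
        return best

    def split(t, b, l, r):
        box = trim(t, b, l, r)
        if box is None:
            return
        t, b, l, r = box
        gap = widest_gap([any(img[y][l:r]) for y in range(t, b)])
        if gap:  # horizontal cut between rows
            split(t, t + gap[0], l, r)
            split(t + gap[1], b, l, r)
            return
        gap = widest_gap([any(img[y][x] for y in range(t, b)) for x in range(l, r)])
        if gap:  # vertical cut between columns
            split(t, b, l, l + gap[0])
            split(t, b, l + gap[1], r)
            return
        blocks.append((t, b, l, r))

    split(0, len(img), 0, len(img[0]))
    return blocks


# Hypothetical page: two columns on top, one full-width block below.
page_rows = ["##..##", "##..##", "......", "......", "######"]
page = [[1 if ch == "#" else 0 for ch in row] for row in page_rows]
blocks = xy_cut(page, min_gap=2)
```

Because this version scans every pixel of each region, it is the slower baseline; the abstract's point is that the cut only needs to run locally, at the segmentation points that edge following missed.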

Downloads 2725
2898 Measuring the Structural Similarity of Web-based Documents: A Novel Approach

Authors: Matthias Dehmer, Frank Emmert Streib, Alexander Mehler, Jürgen Kilian

Abstract:

Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM-Trees, which represent only directed rooted trees. We design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignments to solve a novel and challenging problem: measuring the structural similarity of generalized trees. More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.
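The reduction from graph similarity to string similarity can be illustrated with a small sketch. It assumes, purely for the example, that the property string of a rooted tree is the sequence of node out-degrees in breadth-first order, and it scores two such strings with a standard Needleman-Wunsch global alignment. The property choice, the scoring values, and the names `degree_string` and `alignment_score` are hypothetical and are not taken from the paper.

```python
from collections import deque


def degree_string(children, root):
    """Out-degrees of the nodes in breadth-first order (a hypothetical property string)."""
    out = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        out.append(len(kids))
        queue.extend(kids)
    return out


def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of two integer sequences."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


# Two small hypothetical document trees given as parent -> children maps.
tree_a = {"r": ["a", "b"], "b": ["c", "d"]}
tree_b = {"r": ["a", "b"], "b": ["c"]}

string_a = degree_string(tree_a, "r")
string_b = degree_string(tree_b, "r")
score_same = alignment_score(string_a, string_a)
score_diff = alignment_score(string_a, string_b)
```

A single out-degree sequence only applies to rooted trees; generalized trees with additional cross edges would need further property strings, which is where the approach in the abstract goes beyond this sketch.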

Keywords: Graph similarity, hierarchical and directed graphs, hypertext, generalized trees, web structure mining.

Downloads 2517