Search results for: document warehouse

223 Algorithm for Information Retrieval Optimization

Authors: Kehinde K. Agbele, Kehinde Daniel Aruleba, Eniafe F. Ayetiran

Abstract:

When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates optimization of IRS to individual information needs in order of relevance. The study addressed development of algorithms that optimize the ranking of documents retrieved from IRS. This study discusses and describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated databases environment. Conversely, as the volume of information available online and in designated databases is growing continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors. This is based on calculating the weight (

Keywords: Internet ranking,

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1431

222 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1942

221 Text Summarization for Oil and Gas News Article

Authors: L. H. Chong, Y. Y. Chen

Abstract:

Information is increasing in volumes; companies are overloaded with information that they may lose track in getting the intended information. It is a time consuming task to scan through each of the lengthy document. A shorter version of the document which contains only the gist information is more favourable for most information seekers. Therefore, in this paper, we implement a text summarization system to produce a summary that contains gist information of oil and gas news articles. The summarization is intended to provide important information for oil and gas companies to monitor their competitor-s behaviour in enhancing them in formulating business strategies. The system integrated statistical approach with three underlying concepts: keyword occurrences, title of the news article and location of the sentence. The generated summaries were compared with human generated summaries from an oil and gas company. Precision and recall ratio are used to evaluate the accuracy of the generated summary. Based on the experimental results, the system is able to produce an effective summary with the average recall value of 83% at the compression rate of 25%.

Keywords: Information retrieval, text summarization, statistical approach.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1559

220 Suitability of Requirements Abstraction Model (RAM) Requirements for High-Level System Testing

Authors: Naeem Muhammad, Yves Vandewoude, Yolande Berbers, Robert Feldt

Abstract:

The Requirements Abstraction Model (RAM) helps in managing abstraction in requirements by organizing them at four levels (product, feature, function and component). The RAM is adaptable and can be tailored to meet the needs of the various organizations. Because software requirements are an important source of information for developing high-level tests, organizations willing to adopt the RAM model need to know the suitability of the RAM requirements for developing high-level tests. To investigate this suitability, test cases from twenty randomly selected requirements were developed, analyzed and graded. Requirements were selected from the requirements document of a Course Management System, a web based software system that supports teachers and students in performing course related tasks. This paper describes the results of the requirements document analysis. The results show that requirements at lower levels in the RAM are suitable for developing executable tests whereas it is hard to develop from requirements at higher levels.

Keywords: Market-driven requirements engineering, requirements abstraction model, requirements abstraction, system testing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1926

219 Semantic Indexing Approach of a Corpora Based On Ontology

Authors: Mohammed Erritali

Abstract:

The growth in the volume of text data such as books and articles in libraries for centuries has imposed to establish effective mechanisms to locate them. Early techniques such as abstraction, indexing and the use of classification categories have marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (corpus) to allow a user to find the relevant ones for him, that is to say, the contents which matches with the information needs of the user. This paper presents a new semantic indexing approach of a documentary corpus. The indexing process starts first by a term weighting phase to determine the importance of these terms in the documents. Then the use of a thesaurus like Wordnet allows moving to the conceptual level. Each candidate concept is evaluated by determining its level of representation of the document, that is to say, the importance of the concept in relation to other concepts of the document. Finally, the semantic index is constructed by attaching to each concept of the ontology, the documents of the corpus in which these concepts are found.

Keywords: Semantic, indexing, corpora, WordNet, ontology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1331

218 DocPro: A Framework for Processing Semantic and Layout Information in Business Documents

Authors: Ming-Jen Huang, Chun-Fang Huang, Chiching Wei

Abstract:

With the recent advance of the deep neural network, we observe new applications of NLP (natural language processing) and CV (computer vision) powered by deep neural networks for processing business documents. However, creating a real-world document processing system needs to integrate several NLP and CV tasks, rather than treating them separately. There is a need to have a unified approach for processing documents containing textual and graphical elements with rich formats, diverse layout arrangement, and distinct semantics. In this paper, a framework that fulfills this unified approach is presented. The framework includes a representation model definition for holding the information generated by various tasks and specifications defining the coordination between these tasks. The framework is a blueprint for building a system that can process documents with rich formats, styles, and multiple types of elements. The flexible and lightweight design of the framework can help build a system for diverse business scenarios, such as contract monitoring and reviewing.

Keywords: Document processing, framework, formal definition, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 578

217 Lexical Based Method for Opinion Detection on Tripadvisor Collection

Authors: Faiza Belbachir, Thibault Schienhinski

Abstract:

The massive development of online social networks allows users to post and share their opinions on various topics. With this huge volume of opinion, it is interesting to extract and interpret these information for different domains, e.g., product and service benchmarking, politic, system of recommendation. This is why opinion detection is one of the most important research tasks. It consists on differentiating between opinion data and factual data. The difficulty of this task is to determine an approach which returns opinionated document. Generally, there are two approaches used for opinion detection i.e. Lexical based approaches and Machine Learning based approaches. In Lexical based approaches, a dictionary of sentimental words is used, words are associated with weights. The opinion score of document is derived by the occurrence of words from this dictionary. In Machine learning approaches, usually a classifier is trained using a set of annotated document containing sentiment, and features such as n-grams of words, part-of-speech tags, and logical forms. Majority of these works are based on documents text to determine opinion score but dont take into account if these texts are really correct. Thus, it is interesting to exploit other information to improve opinion detection. In our work, we will develop a new way to consider the opinion score. We introduce the notion of trust score. We determine opinionated documents but also if these opinions are really trustable information in relation with topics. For that we use lexical SentiWordNet to calculate opinion and trust scores, we compute different features about users like (numbers of their comments, numbers of their useful comments, Average useful review). After that, we combine opinion score and trust score to obtain a final score. We applied our method to detect trust opinions in TRIPADVISOR collection. Our experimental results report that the combination between opinion score and trust score improves opinion detection.

Keywords: Tripadvisor, Opinion detection, SentiWordNet, trust score.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 691

216 Reverse Logistics Information Management Using Ontological Approach

Authors: F. Lhafiane, A. Elbyed, M. Bouchoum

Abstract:

Reverse Logistics (RL) Network is considered as complex and dynamic network that involves many stakeholders such as: suppliers, manufactures, warehouse, retails and costumers, this complexity is inherent in such process due to lack of perfect knowledge or conflicting information. Ontologies on the other hand can be considered as an approach to overcome the problem of sharing knowledge and communication among the various reverse logistics partners. In this paper we propose a semantic representation based on hybrid architecture for building the Ontologies in ascendant way, this method facilitates the semantic reconciliation between the heterogeneous information systems that support reverse logistics processes and product data.

Keywords: Reverse Logistics, information management, heterogeneity, Ontologies, semantic web.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2912

215 Secure Text Steganography for Microsoft Word Document

Authors: Khan Farhan Rafat, M. Junaid Hussain

Abstract:

Seamless modification of an entity for the purpose of hiding a message of significance inside its substance in a manner that the embedding remains oblivious to an observer is known as steganography. Together with today's pervasive registering frameworks, steganography has developed into a science that offers an assortment of strategies for stealth correspondence over the globe that must, however, need a critical appraisal from security breach standpoint. Microsoft Word is amongst the preferably used word processing software, which comes as a part of the Microsoft Office suite. With a user-friendly graphical interface, the richness of text editing, and formatting topographies, the documents produced through this software are also most suitable for stealth communication. This research aimed not only to epitomize the fundamental concepts of steganography but also to expound on the utilization of Microsoft Word document as a carrier for furtive message exchange. The exertion is to examine contemporary message hiding schemes from security aspect so as to present the explorative discoveries and suggest enhancements which may serve a wellspring of information to encourage such futuristic research endeavors.

Keywords: Hiding information in plain sight, stealth communication, oblivious information exchange, conceal, steganography.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1558

214 Optimal Document Archiving and Fast Information Retrieval

Authors: Hazem M. El-Bakry, Ahmed A. Mohammed

Abstract:

In this paper, an intelligent algorithm for optimal document archiving is presented. It is kown that electronic archives are very important for information system management. Minimizing the size of the stored data in electronic archive is a main issue to reduce the physical storage area. Here, the effect of different types of Arabic fonts on electronic archives size is discussed. Simulation results show that PDF is the best file format for storage of the Arabic documents in electronic archive. Furthermore, fast information detection in a given PDF file is introduced. Such approach uses fast neural networks (FNNs) implemented in the frequency domain. The operation of these networks relies on performing cross correlation in the frequency domain rather than spatial one. It is proved mathematically and practically that the number of computation steps required for the presented FNNs is less than that needed by conventional neural networks (CNNs). Simulation results using MATLAB confirm the theoretical computations.

Keywords: Information Storage and Retrieval, Electronic Archiving, Fast Information Detection, Cross Correlation, Frequency Domain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1532

213 RFID Logistic Management with Cold Chain Monitoring – Cold Store Case Study

Authors: Mira Trebar

Abstract:

Logistics processes of perishable food in the supply chain include the distribution activities and the real time temperature monitoring to fulfil the cold chain requirements. The paper presents the use of RFID (Radio Frequency Identification) technology as an identification tool of receiving and shipping activities in the cold store. At the same time, the use of RFID data loggers with temperature sensors is presented to observe and store the temperatures for the purpose of analyzing the processes and having the history data available for traceability purposes and efficient recall management.

Keywords: Logistics, warehouse, RFID device, cold chain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3653

212 A Business Intelligence System Design Based on ASP Platform

Authors: Fengchi Shen, Rongtao Ding

Abstract:

The Informational Infrastructures of small and medium-sized manufacturing enterprises are relatively poor, there are serious shortages of capitals which can be invested in informatization construction, computer hardware and software resources, and human resources. To address the informatization issue in small and medium-sized manufacturing enterprises, and enable them to the application of advanced management thinking and enhance their competitiveness, the paper establish a manufacturing-oriented small and medium-sized enterprises informatization platform based on the ASP business intelligence technology, which effectively improves the scientificity of enterprises decision and management informatization.

Keywords: ASP, business intelligence, data warehouse.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1757

211 Double Clustering as an Unsupervised Approach for Order Picking of Distributed Warehouses

Authors: Hsin-Yi Huang, Ming-Sheng Liu, Jiun-Yan Shiau

Abstract:

Planning the order picking lists for warehouses to achieve some operational performances is a significant challenge when the costs associated with logistics are relatively high, and it is especially important in e-commerce era. Nowadays, many order planning techniques employ supervised machine learning algorithms. However, to define features for supervised machine learning algorithms is not a simple task. Against this background, we consider whether unsupervised algorithms can enhance the planning of order-picking lists. A double zone picking approach, which is based on using clustering algorithms twice, is developed. A simplified example is given to demonstrate the merit of our approach.

Keywords: order picking, warehouse, clustering, unsupervised learning

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 449

210 Using Perspective Schemata to Model the ETL Process

Authors: Valeria M. Pequeno, Joao Carlos G. M. Pires

Abstract:

Data Warehouses (DWs) are repositories which contain the unified history of an enterprise for decision support. The data must be Extracted from information sources, Transformed and integrated to be Loaded (ETL) into the DW, using ETL tools. These tools focus on data movement, where the models are only used as a means to this aim. Under a conceptual viewpoint, the authors want to innovate the ETL process in two ways: 1) to make clear compatibility between models in a declarative fashion, using correspondence assertions and 2) to identify the instances of different sources that represent the same entity in the real-world. This paper presents the overview of the proposed framework to model the ETL process, which is based on the use of a reference model and perspective schemata. This approach provides the designer with a better understanding of the semantic associated with the ETL process.

Keywords: conceptual data model, correspondence assertions, data warehouse, data integration, ETL process, object relational database.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1463

209 Application of GIS-Based Construction Engineering: An Electronic Document Management System

Authors: Mansour N. Jadid

Abstract:

This paper describes the implementation of a GIS to provide decision support for successfully monitoring the movements and storage of materials, hence ensuring that finished products travel from the point of origin to the destination construction site through the supply-chain management (SCM) system. This system ensures the efficient operation of suppliers, manufacturers, and distributors by determining the shortest path from the point of origin to the final destination to reduce construction costs, minimize time, and enhance productivity. These systems are essential to the construction industry because they reduce costs and save time, thereby improve productivity and effectiveness. This study describes a typical supply-chain model and a geographical information system (GIS)-based SCM that focuses on implementing an electronic document management system, which maps the application framework to integrate geodetic support with the supply-chain system. This process provides guidance for locating the nearest suppliers to fill the information needs of project members in different locations. Moreover, this study illustrates the use of a GIS-based SCM as a collaborative tool in innovative methods for implementing Web mapping services, as well as aspects of their integration by generating an interactive GIS for the construction industry platform.

Keywords: Construction, coordinate, engineering, GIS, management, map.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1399

208 Identification of Most Frequently Occurring Lexis in Winnings-announcing Unsolicited Bulke-mails

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

e-mail has become an important means of electronic communication but the viability of its usage is marred by Unsolicited Bulk e-mail (UBE) messages. UBE consists of many types like pornographic, virus infected and 'cry-for-help' messages as well as fake and fraudulent offers for jobs, winnings and medicines. UBE poses technical and socio-economic challenges to usage of e-mails. To meet this challenge and combat this menace, we need to understand UBE. Towards this end, the current paper presents a content-based textual analysis of nearly 3000 winnings-announcing UBE. Technically, this is an application of Text Parsing and Tokenization for an un-structured textual document and we approach it using Bag Of Words (BOW) and Vector Space Document Model techniques. We have attempted to identify the most frequently occurring lexis in the winnings-announcing UBE documents. The analysis of such top 100 lexis is also presented. We exhibit the relationship between occurrence of a word from the identified lexisset in the given UBE and the probability that the given UBE will be the one announcing fake winnings. To the best of our knowledge and survey of related literature, this is the first formal attempt for identification of most frequently occurring lexis in winningsannouncing UBE by its textual analysis. Finally, this is a sincere attempt to bring about alertness against and mitigate the threat of such luring but fake UBE.

Keywords: Lexis, Unsolicited Bulk e-mail (UBE), Vector SpaceDocument Model, Winnings, Lottery

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1490

207 Using Radio Frequency Identification Technology in Supply Chain Management

Authors: Eleonora Tudora, Adriana Alexandru

Abstract:

The radio frequency identification (RFID) is a technology for automatic identification of items, particularly in supply chain, but it is becoming increasingly important for industrial applications. Unlike barcode technology that detects the optical signals reflected from barcode labels, RFID uses radio waves to transmit the information from an RFID tag affixed to the physical object. In contrast to today most often use of this technology in warehouse inventory and supply chain, the focus of this paper is an overview of the structure of RFID systems used by RFID technology and it also presents a solution based on the application of RFID for brand authentication, traceability and tracking, by implementing a production management system and extending its use to traders.

Keywords: RFID, RFID Tag, Electronic Product Code (EPC), EPC network, Object Naming Service (ONS), Authentication, Traceability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1646

206 Towards Clustering of Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Jürgen Kilian, Andreas Zulauf

Abstract:

Methods for organizing web data into groups in order to analyze web-based hypertext data and facilitate data availability are very important in terms of the number of documents available online. Thereby, the task of clustering web-based document structures has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving web users requests servicing, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts will be represented as so called generalized trees which are more general than usual directed rooted trees, e.g., DOM-Trees. As a important preprocessing step we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then, we apply agglomerative clustering to the obtained similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we will run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore we outline the application of our approach in Web Usage Mining as future work.

Keywords: Clustering methods, graph-based patterns, graph similarity, hypertext structures, web structure mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1464

205 Application of a Similarity Measure for Graphs to Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Alexander Mehler, Jürgen Kilian, Max Mühlhauser

Abstract:

Due to the tremendous amount of information provided by the World Wide Web (WWW) developing methods for mining the structure of web-based documents is of considerable interest. In this paper we present a similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments for solving a novel and challenging problem: Measuring the structural similarity of generalized trees. In other words: We first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem for developing a efficient graph similarity measure. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based document structures.

Keywords: Graph similarity, hierarchical and directed graphs, hypertext, generalized trees, web structure mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1843

204 Information Technologies in Automotive Assembly Industry in Thailand

Authors: Jirarat Teeravaraprug, Usawadee Inklay

Abstract:

This paper gave an attempt in prioritizing information technologies that organizations should give concentration. The case study was organizations in the automotive assembly industry in Thailand. Data were first collected to gather all information technologies known and used in the automotive assembly industry in Thailand. Five experts from the industries were surveyed based on the concept of fuzzy DEMATEL. The information technologies were categorized into six groups, which were communication, transaction, planning, organization management, warehouse management, and transportation. The cause groups of information technologies for each group were analyzed and presented. Moreover, the relationship between the used and the significant information technologies was given. Discussions based on the used information technologies and the research results are given.

Keywords: Information technology, automotive assembly industry, fuzzy DEMATEL.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1678

203 The Data Mining usage in Production System Management

Authors: Pavel Vazan, Pavol Tanuska, Michal Kebisek

Abstract:

The paper gives the pilot results of the project that is oriented on the use of data mining techniques and knowledge discoveries from production systems through them. They have been used in the management of these systems. The simulation models of manufacturing systems have been developed to obtain the necessary data about production. The authors have developed the way of storing data obtained from the simulation models in the data warehouse. Data mining model has been created by using specific methods and selected techniques for defined problems of production system management. The new knowledge has been applied to production management system. Gained knowledge has been tested on simulation models of the production system. An important benefit of the project has been proposal of the new methodology. This methodology is focused on data mining from the databases that store operational data about the production process.

Keywords: data mining, data warehousing, management of production system, simulation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3441

202 Evaluation of Risk Attributes Driven by Periodically Changing System Functionality

Authors: Dariusz Dymek, Leszek Kotulski

Abstract:

Modeling of the distributed systems allows us to represent the whole its functionality. The working system instance rarely fulfils the whole functionality represented by model; usually some parts of this functionality should be accessible periodically. The reporting system based on the Data Warehouse concept seams to be an intuitive example of the system that some of its functionality is required only from time to time. Analyzing an enterprise risk associated with the periodical change of the system functionality, we should consider not only the inaccessibility of the components (object) but also their functions (methods), and the impact of such a situation on the system functionality from the business point of view. In the paper we suggest that the risk attributes should be estimated from risk attributes specified at the requirements level (Use Case in the UML model) on the base of the information about the structure of the model (presented at other levels of the UML model). We argue that it is desirable to consider the influence of periodical changes in requirements on the enterprise risk estimation. Finally, the proposition of such a solution basing on the UML system model is presented.

Keywords: Risk assessing, software maintenance, UML, graph grammars.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1349

201 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: Data integration, data warehousing, federated architecture, online analytical processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 663

200 The Comparison of Anchor and Star Schema from a Query Performance Perspective

Authors: Radek Němec

Abstract:

Today's business environment requires that companies have access to highly relevant information in a matter of seconds. Modern Business Intelligence tools rely on data structured mostly in traditional dimensional database schemas, typically represented by star schemas. Dimensional modeling is already recognized as a leading industry standard in the field of data warehousing although several drawbacks and pitfalls were reported. This paper focuses on the analysis of another data warehouse modeling technique - the anchor modeling, and its characteristics in context with the standardized dimensional modeling technique from a query performance perspective. The results of the analysis show information about performance of queries executed on database schemas structured according to principles of each database modeling technique.

Keywords: Data warehousing, anchor modeling, star schema, anchor schema, query performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3272

199 A Hybrid Ontology Based Approach for Ranking Documents

Authors: Sarah Motiee, Azadeh Nematzadeh, Mehrnoush Shamsfard

Abstract:

Increasing growth of information volume in the internet causes an increasing need to develop new (semi)automatic methods for retrieval of documents and ranking them according to their relevance to the user query. In this paper, after a brief review on ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination reserves the precision of ranking without loosing the speed. Our approach exploits natural language processing techniques to extract phrases from documents and the query and doing stemming on words. Then an ontology based conceptual method will be used to annotate documents and expand the query. To expand a query the spread activation algorithm is improved so that the expansion can be done flexible and in various aspects. The annotated documents and the expanded query will be processed to compute the relevance degree exploiting statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing to documents, (3) extracting and using both words and phrases to compute relevance degree, (4) improving the spread activation algorithm to do the expansion based on weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results will be included at the end of the paper.

Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1576

198 Optimal Production Planning in Aromatic Coconuts Supply Chain Based On Mixed-Integer Linear Programming

Authors: Chaimongkol Limpianchob

Abstract:

This work addresses the problem of production planning that arises in the production of aromatic coconuts from Samudsakhorn province in Thailand. The planning involves the forwarding of aromatic coconuts from the harvest areas to the factory, which is classified into two groups; self-owned areas and contracted areas, the decisions of aromatic coconuts flow in the plant, and addressing a question of which warehouse will be in use. The problem is formulated as a mixed-integer linear programming model within supply chain management framework. The objective function seeks to minimize the total cost including the harvesting, labor and inventory costs. Constraints on the system include the production activities in the company and demand requirements. Numerical results are presented to demonstrate the feasibility of coconuts supply chain model compared with base case.

Keywords: Aromatic coconut, supply chain management, production planning, mixed-integer linear programming.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2696

197 ORank: An Ontology Based System for Ranking Documents

Authors: Mehrnoush Shamsfard, Azadeh Nematzadeh, Sarah Motiee

Abstract:

Increasing growth of information volume in the internet causes an increasing need to develop new (semi)automatic methods for retrieval of documents and ranking them according to their relevance to the user query. In this paper, after a brief review on ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination reserves the precision of ranking without loosing the speed. Our approach exploits natural language processing techniques for extracting phrases and stemming words. Then an ontology based conceptual method will be used to annotate documents and expand the query. To expand a query the spread activation algorithm is improved so that the expansion can be done in various aspects. The annotated documents and the expanded query will be processed to compute the relevance degree exploiting statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing to documents, (3) extracting and using both words and phrases to compute relevance degree, (4) improving the spread activation algorithm to do the expansion based on weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results will be included at the end of the paper.

Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1839

196 A Materialized Approach to the Integration of XML Documents: the OSIX System

Authors: H. Ahmad, S. Kermanshahani, A. Simonet, M. Simonet

Abstract:

The data exchanged on the Web are of different nature from those treated by the classical database management systems; these data are called semi-structured data since they do not have a regular and static structure like data found in a relational database; their schema is dynamic and may contain missing data or types. Therefore, the needs for developing further techniques and algorithms to exploit and integrate such data, and extract relevant information for the user have been raised. In this paper we present the system OSIX (Osiris based System for Integration of XML Sources). This system has a Data Warehouse model designed for the integration of semi-structured data and more precisely for the integration of XML documents. The architecture of OSIX relies on the Osiris system, a DL-based model designed for the representation and management of databases and knowledge bases. Osiris is a viewbased data model whose indexing system supports semantic query optimization. We show that the problem of query processing on a XML source is optimized by the indexing approach proposed by Osiris.

Keywords: Data integration, semi-structured data, views, XML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1543

195 On the Standardizing the Metal Die of Punchand Matrix by Mechanical Desktop Software

Authors: A. M. R. Mosalman Yazdi, B. A. R. Mosalman Yazdi

Abstract:

In industry, on of the most important subjects is die and it's characteristics in which for cutting and forming different mechanical pieces, various punch and matrix metal die are used. whereas the common parts which form the main frame die are not often proportion with pieces and dies therefore using a part as socalled common part for frames in specified dimension ranges can decrease the time of designing, occupied space of warehouse and manufacturing costs. Parts in dies with getting uniform in their shape and dimension make common parts of dies. Common parts of punch and matrix metal die are as bolster, guide bush, guide pillar and shank. In this paper the common parts and effective parameters in selecting each of them as the primary information are studied, afterward for selection and design of mechanical parts an introduction and investigation based on the Mech. Desk. software is done hence with developing this software can standardize the metal common parts of punch and matrix. These studies will be so useful for designer in their designing and also using it has with very much advantage for manufactures of products in decreasing occupied spaces by dies.

Keywords: Die, Matrix, Punch, Standardize.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1341

194 Encrypter Information Software Using Chaotic Generators

Authors: Cardoza-Avendaño L., López-Gutiérrez R.M., Inzunza-González E., Cruz-Hernández C., García-Guerrero E., Spirin V., Serrano H.

Abstract:

This document shows a software that shows different chaotic generator, as continuous as discrete time. The software gives the option for obtain the different signals, using different parameters and initial condition value. The program shows then critical parameter for each model. All theses models are capable of encrypter information, this software show it too.

Keywords: cryptography, chaotic attractors, software.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1449