Search results for: data contents search
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8226

Search results for: data contents search

8166 An Augmented Beam-search Based Algorithm for the Strip Packing Problem

Authors: Hakim Akeb, Mhand Hifi

Abstract:

In this paper, the use of beam search and look-ahead strategies for solving the strip packing problem (SPP) is investigated. Given a strip of fixed width W, unlimited length L, and a set of n circular pieces of known radii, the objective is to determine the minimum length of the initial strip that packs all the pieces. An augmented algorithm which combines beam search and a look-ahead strategies is proposed. The look-ahead is used in order to evaluate the nodes at each level of the tree search. The best nodes are then retained for branching. The computational investigation showed that the proposed augmented algorithm is able to improve the best known solutions of the literature on most instances used.

Keywords: Combinatorial optimization, cutting and packing, beam search, heuristic, look-ahead strategy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1318
8165 An Ontology Based Question Answering System on Software Test Document Domain

Authors: Meltem Serhatli, Ferda N. Alpaslan

Abstract:

Processing the data by computers and performing reasoning tasks is an important aim in Computer Science. Semantic Web is one step towards it. The use of ontologies to enhance the information by semantically is the current trend. Huge amount of domain specific, unstructured on-line data needs to be expressed in machine understandable and semantically searchable format. Currently users are often forced to search manually in the results returned by the keyword-based search services. They also want to use their native languages to express what they search. In this paper, an ontology-based automated question answering system on software test documents domain is presented. The system allows users to enter a question about the domain by means of natural language and returns exact answer of the questions. Conversion of the natural language question into the ontology based query is the challenging part of the system. To be able to achieve this, a new algorithm regarding free text to ontology based search engine query conversion is proposed. The algorithm is based on investigation of suitable question type and parsing the words of the question sentence.

Keywords: Description Logics, ontology, question answering, reasoning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2109
8164 Learning FCM by Tabu Search

Authors: Somayeh Alizadeh, Mehdi Ghazanfari, Mostafa Jafari, Salman Hooshmand

Abstract:

Fuzzy Cognitive Maps (FCMs) is a causal graph, which shows the relations between essential components in complex systems. Experts who are familiar with the system components and their relations can generate a related FCM. There is a big gap when human experts cannot produce FCM or even there is no expert to produce the related FCM. Therefore, a new mechanism must be used to bridge this gap. In this paper, a novel learning method is proposed to construct causal graph based on historical data and by using metaheuristic such Tabu Search (TS). The efficiency of the proposed method is shown via comparison of its results of some numerical examples with those of some other methods.

Keywords: Fuzzy Cognitive Map (FCM), Learning, Meta heuristic, Genetic Algorithm, Tabu search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1819
8163 Auto Classification for Search Intelligence

Authors: Lilac A. E. Al-Safadi

Abstract:

This paper proposes an auto-classification algorithm of Web pages using Data mining techniques. We consider the problem of discovering association rules between terms in a set of Web pages belonging to a category in a search engine database, and present an auto-classification algorithm for solving this problem that are fundamentally based on Apriori algorithm. The proposed technique has two phases. The first phase is a training phase where human experts determines the categories of different Web pages, and the supervised Data mining algorithm will combine these categories with appropriate weighted index terms according to the highest supported rules among the most frequent words. The second phase is the categorization phase where a web crawler will crawl through the World Wide Web to build a database categorized according to the result of the data mining approach. This database contains URLs and their categories.

Keywords: Information Processing on the Web, Data Mining, Document Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1584
8162 Fast Search Method for Large Video Database Using Histogram Features and Temporal Division

Authors: Feifei Lee, Qiu Chen, Koji Kotani, Tadahiro Ohmi

Abstract:

In this paper, we propose an improved fast search algorithm using combined histogram features and temporal division method for short MPEG video clips from large video database. There are two types of histogram features used to generate more robust features. The first one is based on the adjacent pixel intensity difference quantization (APIDQ) algorithm, which had been reliably applied to human face recognition previously. An APIDQ histogram is utilized as the feature vector of the frame image. Another one is ordinal feature which is robust to color distortion. Combined with active search [4], a temporal pruning algorithm, fast and robust video search can be realized. The proposed search algorithm has been evaluated by 6 hours of video to search for given 200 MPEG video clips which each length is 30 seconds. Experimental results show the proposed algorithm can detect the similar video clip in merely 120ms, and Equal Error Rate (ERR) of 1% is achieved, which is more accurately and robust than conventional fast video search algorithm.

Keywords: Fast search, Adjacent pixel intensity differencequantization (APIDQ), DC image, Histogram feature.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1582
8161 Block Based Imperial Competitive Algorithm with Greedy Search for Traveling Salesman Problem

Authors: Meng-Hui Chen, Chiao-Wei Yu, Pei-Chann Chang

Abstract:

Imperial competitive algorithm (ICA) simulates a multi-agent algorithm. Each agent is like a kingdom has its country, and the strongest country in each agent is called imperialist, others are colony. Countries are competitive with imperialist which in the same kingdom by evolving. So this country will move in the search space to find better solutions with higher fitness to be a new imperialist. The main idea in this paper is using the peculiarity of ICA to explore the search space to solve the kinds of combinational problems. Otherwise, we also study to use the greed search to increase the local search ability. To verify the proposed algorithm in this paper, the experimental results of traveling salesman problem (TSP) is according to the traveling salesman problem library (TSPLIB). The results show that the proposed algorithm has higher performance than the other known methods.

Keywords: Traveling Salesman Problem, Artificial Chromosomes, Greedy Search, Imperial Competitive Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1848
8160 Heuristic Search Algorithms for Tuning PUMA 560 Fuzzy PID Controller

Authors: Sufian Ashraf Mazhari, Surendra Kumar

Abstract:

This paper compares the heuristic Global Search Techniques; Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Generalized Pattern Search, genetic algorithm hybridized with Nelder–Mead and Generalized pattern search technique for tuning of fuzzy PID controller for Puma 560. Since the actual control is in joint space ,inverse kinematics is used to generate various joint angles correspoding to desired cartesian space trajectory. Efficient dynamics and kinematics are modeled on Matlab which takes very less simulation time. Performances of all the tuning methods with and without disturbance are compared in terms of ITSE in joint space and ISE in cartesian space for spiral trajectory tracking. Genetic Algorithm hybridized with Generalized Pattern Search is showing best performance.

Keywords: Controller tuning, Fuzzy Control, Genetic Algorithm, Heuristic search, Robot control.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2167
8159 DIFFER: A Propositionalization approach for Learning from Structured Data

Authors: Thashmee Karunaratne, Henrik Böstrom

Abstract:

Logic based methods for learning from structured data is limited w.r.t. handling large search spaces, preventing large-sized substructures from being considered by the resulting classifiers. A novel approach to learning from structured data is introduced that employs a structure transformation method, called finger printing, for addressing these limitations. The method, which generates features corresponding to arbitrarily complex substructures, is implemented in a system, called DIFFER. The method is demonstrated to perform comparably to an existing state-of-art method on some benchmark data sets without requiring restrictions on the search space. Furthermore, learning from the union of features generated by finger printing and the previous method outperforms learning from each individual set of features on all benchmark data sets, demonstrating the benefit of developing complementary, rather than competing, methods for structure classification.

Keywords: Machine learning, Structure classification, Propositionalization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1184
8158 Comparative Analysis of Different Page Ranking Algorithms

Authors: S. Prabha, K. Duraiswamy, J. Indhumathi

Abstract:

Search engine plays an important role in internet, to retrieve the relevant documents among the huge number of web pages. However, it retrieves more number of documents, which are all relevant to your search topics. To retrieve the most meaningful documents related to search topics, ranking algorithm is used in information retrieval technique. One of the issues in data miming is ranking the retrieved document. In information retrieval the ranking is one of the practical problems. This paper includes various Page Ranking algorithms, page segmentation algorithms and compares those algorithms used for Information Retrieval. Diverse Page Rank based algorithms like Page Rank (PR), Weighted Page Rank (WPR), Weight Page Content Rank (WPCR), Hyperlink Induced Topic Selection (HITS), Distance Rank, Eigen Rumor, Distance Rank Time Rank, Tag Rank, Relational Based Page Rank and Query Dependent Ranking algorithms are discussed and compared.

Keywords: Information Retrieval, Web Page Ranking, search engine, web mining, page segmentations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4240
8157 Predicting Bankruptcy using Tabu Search in the Mauritian Context

Authors: J. Cheeneebash, K. B. Lallmamode, A. Gopaul

Abstract:

Throughout this paper, a relatively new technique, the Tabu search variable selection model, is elaborated showing how it can be efficiently applied within the financial world whenever researchers come across the selection of a subset of variables from a whole set of descriptive variables under analysis. In the field of financial prediction, researchers often have to select a subset of variables from a larger set to solve different type of problems such as corporate bankruptcy prediction, personal bankruptcy prediction, mortgage, credit scoring and the Arbitrage Pricing Model (APM). Consequently, to demonstrate how the method operates and to illustrate its usefulness as well as its superiority compared to other commonly used methods, the Tabu search algorithm for variable selection is compared to two main alternative search procedures namely, the stepwise regression and the maximum R 2 improvement method. The Tabu search is then implemented in finance; where it attempts to predict corporate bankruptcy by selecting the most appropriate financial ratios and thus creating its own prediction score equation. In comparison to other methods, mostly the Altman Z-Score model, the Tabu search model produces a higher success rate in predicting correctly the failure of firms or the continuous running of existing entities.

Keywords: Predicting Bankruptcy, Tabu Search

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1891
8156 Parallel 2-Opt Local Search on GPU

Authors: Wen-Bao Qiao, Jean-Charles Créput

Abstract:

To accelerate the solution for large scale traveling salesman problems (TSP), a parallel 2-opt local search algorithm with simple implementation based on Graphics Processing Unit (GPU) is presented and tested in this paper. The parallel scheme is based on technique of data decomposition by dynamically assigning multiple K processors on the integral tour to treat K edges’ 2-opt local optimization simultaneously on independent sub-tours, where K can be user-defined or have a function relationship with input size N. We implement this algorithm with doubly linked list on GPU. The implementation only requires O(N) memory. We compare this parallel 2-opt local optimization against sequential exhaustive 2-opt search along integral tour on TSP instances from TSPLIB with more than 10000 cities.

Keywords: Doubly linked list, parallel 2-opt, tour division, GPU.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1180
8155 Tabu Search Approach to Solve Routing Issues in Communication Networks

Authors: Anant Oonsivilai, Wichai Srisuruk, Boonruang Marungsri, Thanatchai Kulworawanichpong

Abstract:

Optimal routing in communication networks is a major issue to be solved. In this paper, the application of Tabu Search (TS) in the optimum routing problem where the aim is to minimize the computational time and improvement of quality of the solution in the communication have been addressed. The goal is to minimize the average delays in the communication. The effectiveness of Tabu Search method is shown by the results of simulation to solve the shortest path problem. Through this approach computational cost can be reduced.

Keywords: Communication networks, optimum routing network, tabu search algorithm, shortest path.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2051
8154 Information System for Data Selection and New Information Acquisition for Reconfigurable Multifunctional Machine Tools

Authors: Sasho Guergov

Abstract:

The purpose of the paper is to develop an informationcontrol environment for overall management and self-reconfiguration of the reconfigurable multifunctional machine tool for machining both rotation and prismatic parts and high concentration of different technological operations - turning, milling, drilling, grinding, etc. For the realization of this purpose on the basis of defined sub-processes for the implementation of the technological process, architecture of the information-search system for machine control is suggested. By using the object-oriented method, a structure and organization of the search system based on agents and manager with central control are developed. Thus conditions for identification of available information in DBs, self-reconfiguration of technological system and entire control of the reconfigurable multifunctional machine tool are created.

Keywords: Information system, multifunctional machine tool, reconfigurable machine tool, search system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1292
8153 Content Analysis and Attitude of Thai Students towards Thai Series “Hormones: Season 2”

Authors: Siriporn Meenanan

Abstract:

The objective of this study is to investigate the attitude of Thai students towards the Thai series "Hormones the Series Season 2". This study was conducted in the quantitative research, and the questionnaires were used to collect data from 400 people of the sample group. Descriptive statistics were used in data analysis. The findings reveal that most participants have positive comments regarding the series. They strongly agreed that the series reflects on the way of life and problems of teenagers in Thailand. Hence, the participants believe that if adults have a chance to watch the series, they will have the better understanding of the teenagers. In addition, the participants also agreed that the contents of the play are appropriate and satisfiable as the contents of “Hormones the Series Season 2” will raise awareness among the teens and use it as a guide to prevent problems that might happen during their teenage life.

Keywords: Content analysis, attitude, Thai series, Hormones the series.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 909
8152 Robust Design of Power System Stabilizers Using Adaptive Genetic Algorithms

Authors: H. Alkhatib, J. Duveau

Abstract:

Genetic algorithms (GAs) have been widely used for global optimization problems. The GA performance depends highly on the choice of the search space for each parameter to be optimized. Often, this choice is a problem-based experience. The search space being a set of potential solutions may contain the global optimum and/or other local optimums. A bad choice of this search space results in poor solutions. In this paper, our approach consists in extending the search space boundaries during the GA optimization, only when it is required. This leads to more diversification of GA population by new solutions that were not available with fixed search space boundaries. So, these dynamic search spaces can improve the GA optimization performances. The proposed approach is applied to power system stabilizer optimization for multimachine power system (16-generator and 68-bus). The obtained results are evaluated and compared with those obtained by ordinary GAs. Eigenvalue analysis and nonlinear system simulation results show the effectiveness of the proposed approach to damp out the electromechanical oscillation and enhance the global system stability.

Keywords: Genetic Algorithms, Multiobjective Optimization, Power System Stabilizer, Small Signal Stability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1684
8151 A Knowledge-Based E-mail System Using Semantic Categorization and Rating Mechanisms

Authors: Azleena Mohd Kassim, Muhamad Rashidi A. Rahman, Yu-N. Cheah

Abstract:

Knowledge-based e-mail systems focus on incorporating knowledge management approach in order to enhance the traditional e-mail systems. In this paper, we present a knowledgebased e-mail system called KS-Mail where people do not only send and receive e-mail conventionally but are also able to create a sense of knowledge flow. We introduce semantic processing on the e-mail contents by automatically assigning categories and providing links to semantically related e-mails. This is done to enrich the knowledge value of each e-mail as well as to ease the organization of the e-mails and their contents. At the application level, we have also built components like the service manager, evaluation engine and search engine to handle the e-mail processes efficiently by providing the means to share and reuse knowledge. For this purpose, we present the KS-Mail architecture, and elaborate on the details of the e-mail server and the application server. We present the ontology mapping technique used to achieve the e-mail content-s categorization as well as the protocols that we have developed to handle the transactions in the e-mail system. Finally, we discuss further on the implementation of the modules presented in the KS-Mail architecture.

Keywords: E-mail rating, knowledge-based system, ontology mapping, text categorization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1407
8150 Cognitive Landscape of Values – Understanding the Information Contents of Mental Representations

Authors: J. Maksimainen

Abstract:

The values of managers and employees in organizations are phenomena that have captured the interest of researchers at large. Despite this attention, there continues to be a lack of agreement on what values are and how they influence individuals, or how they are constituted in individuals- mind. In this article content-based approach is presented as alternative reference frame for exploring values. In content-based approach human thinking in different contexts is set at the focal point. Differences in valuations can be explained through the information contents of mental representations. In addition to the information contents, attention is devoted to those cognitive processes through which mental representations of values are constructed. Such informational contents are in decisive role for understanding human behavior. By applying content-based analysis to an examination of values as mental representations, it is possible to reach a deeper to the motivational foundation of behaviors, such as decision making in organizational procedures, through understanding the structure and meanings of specific values at play.

Keywords: Content-based Approach, Mental Content, Mental Representations, Organizational values, Values

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1392
8149 A New Effective Local Search Heuristic for the Maximum Clique Problem

Authors: S. Balaji

Abstract:

An edge based local search algorithm, called ELS, is proposed for the maximum clique problem (MCP), a well-known combinatorial optimization problem. ELS is a two phased local search method effectively £nds the near optimal solutions for the MCP. A parameter ’support’ of vertices de£ned in the ELS greatly reduces the more number of random selections among vertices and also the number of iterations and running times. Computational results on BHOSLIB and DIMACS benchmark graphs indicate that ELS is capable of achieving state-of-the-art-performance for the maximum clique with reasonable average running times.

Keywords: Maximum clique, local search, heuristic, NP-complete.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2208
8148 Using Pattern Search Methods for Minimizing Clustering Problems

Authors: Parvaneh Shabanzadeh, Malik Hj Abu Hassan, Leong Wah June, Maryam Mohagheghtabar

Abstract:

Clustering is one of an interesting data mining topics that can be applied in many fields. Recently, the problem of cluster analysis is formulated as a problem of nonsmooth, nonconvex optimization, and an algorithm for solving the cluster analysis problem based on nonsmooth optimization techniques is developed. This optimization problem has a number of characteristics that make it challenging: it has many local minimum, the optimization variables can be either continuous or categorical, and there are no exact analytical derivatives. In this study we show how to apply a particular class of optimization methods known as pattern search methods to address these challenges. These methods do not explicitly use derivatives, an important feature that has not been addressed in previous studies. Results of numerical experiments are presented which demonstrate the effectiveness of the proposed method.

Keywords: Clustering functions, Non-smooth Optimization, Nonconvex Optimization, Pattern Search Method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1586
8147 Interactive Concept-based Search using MOEA:The Hierarchical Preferences Case

Authors: Gideon Avigad, Amiram Moshaiov, Neima Brauner

Abstract:

An IEC technique is described for a multi-objective search of conceptual solutions. The survivability of solutions is influenced by both model-based fitness and subjective human preferences. The concepts- preferences are articulated via a hierarchy of sub-concepts. The suggested method produces an objectivesubjective front. Academic example is employed to demonstrate the proposed approach.

Keywords: Conceptual solution, engineering design, hierarchical planning, multi-objective search, problem reduction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 972
8146 An Analysis of Dynamic Economic Dispatch Using Search Space Reduction Based Gravitational Search Algorithm

Authors: K. C. Meher, R. K. Swain, C. K. Chanda

Abstract:

This paper presents the performance analysis of dynamic search space reduction (DSR) based gravitational search algorithm (GSA) to solve dynamic economic dispatch of thermal generating units with valve point effects. Dynamic economic dispatch basically dictates the best setting of generator units with anticipated load demand over a definite period of time. In this paper, the presented technique is considered that deals an inequality constraints treatment mechanism known as DSR strategy to accelerate the optimization process. The presented method is demonstrated through five-unit test systems to verify its effectiveness and robustness. The simulation results are compared with other existing evolutionary methods reported in the literature. It is intuited from the comparison that the fuel cost and other performances of the presented approach yield fruitful results with marginal value of simulation time.

Keywords: Dynamic economic dispatch, dynamic search space reduction strategy, gravitational search algorithm, ramp rate limits, valve-point effects.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1452
8145 Organization Model of Semantic Document Repository and Search Techniques for Studying Information Technology

Authors: Nhon Do, Thuong Huynh, An Pham

Abstract:

Nowadays, organizing a repository of documents and resources for learning on a special field as Information Technology (IT), together with search techniques based on domain knowledge or document-s content is an urgent need in practice of teaching, learning and researching. There have been several works related to methods of organization and search by content. However, the results are still limited and insufficient to meet user-s demand for semantic document retrieval. This paper presents a solution for the organization of a repository that supports semantic representation and processing in search. The proposed solution is a model which integrates components such as an ontology describing domain knowledge, a database of document repository, semantic representation for documents and a file system; with problems, semantic processing techniques and advanced search techniques based on measuring semantic similarity. The solution is applied to build a IT learning materials management system of a university with semantic search function serving students, teachers, and manager as well. The application has been implemented, tested at the University of Information Technology, Ho Chi Minh City, Vietnam and has achieved good results.

Keywords: document retrieval system, knowledgerepresentation, document representation, semantic search, ontology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1665
8144 Using the Semantic Web in Ubiquitous and Mobile Computing: the Morfeo Experience

Authors: José M. Cantera, Miguel Jiménez, Genoveva López, Javier Soriano

Abstract:

With the advent of emerging personal computing paradigms such as ubiquitous and mobile computing, Web contents are becoming accessible from a wide range of mobile devices. Since these devices do not have the same rendering capabilities, Web contents need to be adapted for transparent access from a variety of client agents. Such content adaptation results in better rendering and faster delivery to the client device. Nevertheless, Web content adaptation sets new challenges for semantic markup. This paper presents an advanced components platform, called MorfeoSMC, enabling the development of mobility applications and services according to a channel model based on Services Oriented Architecture (SOA) principles. It then goes on to describe the potential for integration with the Semantic Web through a novel framework of external semantic annotation of mobile Web contents. The role of semantic annotation in this framework is to describe the contents of individual documents themselves, assuring the preservation of the semantics during the process of adapting content rendering, as well as to exploit these semantic annotations in a novel user profile-aware content adaptation process. Semantic Web content adaptation is a way of adding value to and facilitates repurposing of Web contents (enhanced browsing, Web Services location and access, etc).

Keywords: Semantic web, ubiquitous and mobile computing, web content transcoding, semantic markup, mobile computing middleware and services.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1577
8143 Digital Image Forensics: Discovering the History of Digital Images

Authors: Gurinder Singh, Kulbir Singh

Abstract:

Digital multimedia contents such as image, video, and audio can be tampered easily due to the availability of powerful editing softwares. Multimedia forensics is devoted to analyze these contents by using various digital forensic techniques in order to validate their authenticity. Digital image forensics is dedicated to investigate the reliability of digital images by analyzing the integrity of data and by reconstructing the historical information of an image related to its acquisition phase. In this paper, a survey is carried out on the forgery detection by considering the most recent and promising digital image forensic techniques.

Keywords: Computer forensics, multimedia forensics, image ballistics, camera source identification, forgery detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1734
8142 Improving Similarity Search Using Clustered Data

Authors: Deokho Kim, Wonwoo Lee, Jaewoong Lee, Teresa Ng, Gun-Ill Lee, Jiwon Jeong

Abstract:

This paper presents a method for improving object search accuracy using a deep learning model. A major limitation to provide accurate similarity with deep learning is the requirement of huge amount of data for training pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset, which comes from a different domain, causing limited accuracy on measuring similarity. For this reason, this paper proposes a deep learning model that can be trained with a significantly small amount of data, a clustered data which of each cluster contains a set of visually similar images. In order to measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores which is defined zero for images in identical cluster. The proposed method outperforms the state-of-the-art object similarity scoring techniques on evaluation for finding exact items. The proposed method achieves 86.5% of accuracy compared to the accuracy of the state-of-the-art technique, which is 59.9%. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the rest can possibly be similar products more than the accuracy. Therefore, the proposed method can greatly reduce the amount of training data with an order of magnitude as well as providing a reliable similarity metric.

Keywords: Visual search, deep learning, convolutional neural network, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 774
8141 Compression of Semistructured Documents

Authors: Leo Galambos, Jan Lansky, Katsiaryna Chernik

Abstract:

EGOTHOR is a search engine that indexes the Web and allows us to search the Web documents. Its hit list contains URL and title of the hits, and also some snippet which tries to shortly show a match. The snippet can be almost always assembled by an algorithm that has a full knowledge of the original document (mostly HTML page). It implies that the search engine is required to store the full text of the documents as a part of the index. Such a requirement leads us to pick up an appropriate compression algorithm which would reduce the space demand. One of the solutions could be to use common compression methods, for instance gzip or bzip2, but it might be preferable if we develop a new method which would take advantage of the document structure, or rather, the textual character of the documents. There already exist a special compression text algorithms and methods for a compression of XML documents. The aim of this paper is an integration of the two approaches to achieve an optimal level of the compression ratio

Keywords: Compression, search engine, HTML, XML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1532
8140 Concurrency in Web Access Patterns Mining

Authors: Jing Lu, Malcolm Keech, Weiru Chen

Abstract:

Web usage mining is an interesting application of data mining which provides insight into customer behaviour on the Internet. An important technique to discover user access and navigation trails is based on sequential patterns mining. One of the key challenges for web access patterns mining is tackling the problem of mining richly structured patterns. This paper proposes a novel model called Web Access Patterns Graph (WAP-Graph) to represent all of the access patterns from web mining graphically. WAP-Graph also motivates the search for new structural relation patterns, i.e. Concurrent Access Patterns (CAP), to identify and predict more complex web page requests. Corresponding CAP mining and modelling methods are proposed and shown to be effective in the search for and representation of concurrency between access patterns on the web. From experiments conducted on large-scale synthetic sequence data as well as real web access data, it is demonstrated that CAP mining provides a powerful method for structural knowledge discovery, which can be visualised through the CAP-Graph model.

Keywords: concurrent access patterns (CAP), CAP mining and modelling, CAP-Graph, web access patterns (WAP), WAP-Graph, Web usage mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1677
8139 Impact of Similarity Ratings on Human Judgement

Authors: Ian A. McCulloh, Madelaine Zinser, Jesse Patsolic, Michael Ramos

Abstract:

Recommender systems are a common artificial intelligence (AI) application. For any given input, a search system will return a rank-ordered list of similar items. As users review returned items, they must decide when to halt the search and either revise search terms or conclude their requirement is novel with no similar items in the database. We present a statistically designed experiment that investigates the impact of similarity ratings on human judgement to conclude a search item is novel and halt the search. In the study, 450 participants were recruited from Amazon Mechanical Turk to render judgement across 12 decision tasks. We find the inclusion of ratings increases the human perception that items are novel. Percent similarity increases novelty discernment when compared with star-rated similarity or the absence of a rating. Ratings reduce the time to decide and improve decision confidence. This suggests that the inclusion of similarity ratings can aid human decision-makers in knowledge search tasks.

Keywords: Ratings, rankings, crowdsourcing, empirical studies, user studies, similarity measures, human-centered computing, novelty in information retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 182
8138 Specification of Attributes of a Multimedia Presentation for Presentation Manager

Authors: Veli Hakkoymaz, Alpaslan Altunköprü

Abstract:

A multimedia presentation system refers to the integration of a multimedia database with a presentation manager which has the functionality of content selection, organization and playout of multimedia presentations. It requires high performance of involved system components. Starting from multimedia information capture until the presentation delivery, high performance tools are required for accessing, manipulating, storing and retrieving these segments, for transferring and delivering them in a presentation terminal according to a playout order. The organization of presentations is a complex task in that the display order of presentation contents (in time and space) must be specified. A multimedia presentation contains audio, video, images and text media types. The critical decisions for presentation construction include what the contents are, how the contents are organized, and once the decision is made on the organization of the contents of the presentation, it must be conveyed to the end user in the correct organizational order and in a timely fashion. This paper introduces a framework for specification of multimedia presentations and describes the design of sample presentations using this framework from a multimedia database.

Keywords: Multimedia presentation, temporal specification, SMIL, spatial specification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1773
8137 Optimizing of Fuzzy C-Means Clustering Algorithm Using GA

Authors: Mohanad Alata, Mohammad Molhim, Abdullah Ramini

Abstract:

Fuzzy C-means Clustering algorithm (FCM) is a method that is frequently used in pattern recognition. It has the advantage of giving good modeling results in many cases, although, it is not capable of specifying the number of clusters by itself. In FCM algorithm most researchers fix weighting exponent (m) to a conventional value of 2 which might not be the appropriate for all applications. Consequently, the main objective of this paper is to use the subtractive clustering algorithm to provide the optimal number of clusters needed by FCM algorithm by optimizing the parameters of the subtractive clustering algorithm by an iterative search approach and then to find an optimal weighting exponent (m) for the FCM algorithm. In order to get an optimal number of clusters, the iterative search approach is used to find the optimal single-output Sugenotype Fuzzy Inference System (FIS) model by optimizing the parameters of the subtractive clustering algorithm that give minimum least square error between the actual data and the Sugeno fuzzy model. Once the number of clusters is optimized, then two approaches are proposed to optimize the weighting exponent (m) in the FCM algorithm, namely, the iterative search approach and the genetic algorithms. The above mentioned approach is tested on the generated data from the original function and optimal fuzzy models are obtained with minimum error between the real data and the obtained fuzzy models.

Keywords: Fuzzy clustering, Fuzzy C-Means, Genetic Algorithm, Sugeno fuzzy systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3201