Search results for: Multidimensional indexing
156 Indexing and Searching of Image Data in Multimedia Databases Using Axial Projection
Authors: Khalid A. Kaabneh
Abstract:
This paper introduces and studies new indexing techniques for content-based queries in images databases. Indexing is the key to providing sophisticated, accurate and fast searches for queries in image data. This research describes a new indexing approach, which depends on linear modeling of signals, using bases for modeling. A basis is a set of chosen images, and modeling an image is a least-squares approximation of the image as a linear combination of the basis images. The coefficients of the basis images are taken together to serve as index for that image. The paper describes the implementation of the indexing scheme, and presents the findings of our extensive evaluation that was conducted to optimize (1) the choice of the basis matrix (B), and (2) the size of the index A (N). Furthermore, we compare the performance of our indexing scheme with other schemes. Our results show that our scheme has significantly higher performance.
Keywords: Axial Projection, images, indexing, multimedia database, searching.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1386155 Lecture Video Indexing and Retrieval Using Topic Keywords
Authors: B. J. Sandesh, Saurabha Jirgi, S. Vidya, Prakash Eljer, Gowri Srinivasa
Abstract:
In this paper, we propose a framework to help users to search and retrieve the portions in the lecture video of their interest. This is achieved by temporally segmenting and indexing the lecture video using the topic keywords. We use transcribed text from the video and documents relevant to the video topic extracted from the web for this purpose. The keywords for indexing are found by applying the non-negative matrix factorization (NMF) topic modeling techniques on the web documents. Our proposed technique first creates indices on the transcribed documents using the topic keywords, and these are mapped to the video to find the start and end time of the portions of the video for a particular topic. This time information is stored in the index table along with the topic keyword which is used to retrieve the specific portions of the video for the query provided by the users.
Keywords: Video indexing and retrieval, lecture videos, content based video search, multimodal indexing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1553154 Object-Based Image Indexing and Retrieval in DCT Domain using Clustering Techniques
Authors: Hossein Nezamabadi-pour, Saeid Saryazdi
Abstract:
In this paper, we present a new and effective image indexing technique that extracts features directly from DCT domain. Our proposed approach is an object-based image indexing. For each block of size 8*8 in DCT domain a feature vector is extracted. Then, feature vectors of all blocks of image using a k-means algorithm is clustered into groups. Each cluster represents a special object of the image. Then we select some clusters that have largest members after clustering. The centroids of the selected clusters are taken as image feature vectors and indexed into the database. Also, we propose an approach for using of proposed image indexing method in automatic image classification. Experimental results on a database of 800 images from 8 semantic groups in automatic image classification are reported.
Keywords: Object-based image retrieval, DCT domain, Image indexing, Image classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2024153 Concept Indexing using Ontology and Supervised Machine Learning
Authors: Rossitza M. Setchi, Qiao Tang
Abstract:
Nowadays, ontologies are the only widely accepted paradigm for the management of sharable and reusable knowledge in a way that allows its automatic interpretation. They are collaboratively created across the Web and used to index, search and annotate documents. The vast majority of the ontology based approaches, however, focus on indexing texts at document level. Recently, with the advances in ontological engineering, it became clear that information indexing can largely benefit from the use of general purpose ontologies which aid the indexing of documents at word level. This paper presents a concept indexing algorithm, which adds ontology information to words and phrases and allows full text to be searched, browsed and analyzed at different levels of abstraction. This algorithm uses a general purpose ontology, OntoRo, and an ontologically tagged corpus, OntoCorp, both developed for the purpose of this research. OntoRo and OntoCorp are used in a two-stage supervised machine learning process aimed at generating ontology tagging rules. The first experimental tests show a tagging accuracy of 78.91% which is encouraging in terms of the further improvement of the algorithm.Keywords: Concepts, indexing, machine learning, ontology, tagging.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1677152 Semantic Indexing Approach of a Corpora Based On Ontology
Authors: Mohammed Erritali
Abstract:
The growth in the volume of text data such as books and articles in libraries for centuries has imposed to establish effective mechanisms to locate them. Early techniques such as abstraction, indexing and the use of classification categories have marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (corpus) to allow a user to find the relevant ones for him, that is to say, the contents which matches with the information needs of the user. This paper presents a new semantic indexing approach of a documentary corpus. The indexing process starts first by a term weighting phase to determine the importance of these terms in the documents. Then the use of a thesaurus like Wordnet allows moving to the conceptual level. Each candidate concept is evaluated by determining its level of representation of the document, that is to say, the importance of the concept in relation to other concepts of the document. Finally, the semantic index is constructed by attaching to each concept of the ontology, the documents of the corpus in which these concepts are found.Keywords: Semantic, indexing, corpora, WordNet, ontology.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1367151 Image Indexing Using a Color Similarity Metric based on the Human Visual System
Authors: Angelo Nodari, Ignazio Gallo
Abstract:
The novelty proposed in this study is twofold and consists in the developing of a new color similarity metric based on the human visual system and a new color indexing based on a textual approach. The new color similarity metric proposed is based on the color perception of the human visual system. Consequently the results returned by the indexing system can fulfill as much as possibile the user expectations. We developed a web application to collect the users judgments about the similarities between colors, whose results are used to estimate the metric proposed in this study. In order to index the image's colors, we used a text indexing engine to facilitate the integration of visual features in a database of text documents. The textual signature is build by weighting the image's colors in according to their occurrence in the image. The use of a textual indexing engine, provide us a simple, fast and robust solution to index images. A typical usage of the system proposed in this study, is the development of applications whose data type is both visual and textual. In order to evaluate the proposed method we chose a price comparison engine as a case of study, collecting a series of commercial offers containing the textual description and the image representing a specific commercial offer.
Keywords: Color Extraction, Content-Based Image Retrieval, Indexing
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3025150 SC-LSH: An Efficient Indexing Method for Approximate Similarity Search in High Dimensional Space
Authors: Sanaa Chafik, ImaneDaoudi, Mounim A. El Yacoubi, Hamid El Ouardi
Abstract:
Locality Sensitive Hashing (LSH) is one of the most promising techniques for solving nearest neighbour search problem in high dimensional space. Euclidean LSH is the most popular variation of LSH that has been successfully applied in many multimedia applications. However, the Euclidean LSH presents limitations that affect structure and query performances. The main limitation of the Euclidean LSH is the large memory consumption. In order to achieve a good accuracy, a large number of hash tables is required. In this paper, we propose a new hashing algorithm to overcome the storage space problem and improve query time, while keeping a good accuracy as similar to that achieved by the original Euclidean LSH. The Experimental results on a real large-scale dataset show that the proposed approach achieves good performances and consumes less memory than the Euclidean LSH.
Keywords: Approximate Nearest Neighbor Search, Content based image retrieval (CBIR), Curse of dimensionality, Locality sensitive hashing, Multidimensional indexing, Scalability.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2576149 Grouping and Indexing Color Features for Efficient Image Retrieval
Authors: M. V. Sudhamani, C. R. Venugopal
Abstract:
Content-based Image Retrieval (CBIR) aims at searching image databases for specific images that are similar to a given query image based on matching of features derived from the image content. This paper focuses on a low-dimensional color based indexing technique for achieving efficient and effective retrieval performance. In our approach, the color features are extracted using the mean shift algorithm, a robust clustering technique. Then the cluster (region) mode is used as representative of the image in 3-D color space. The feature descriptor consists of the representative color of a region and is indexed using a spatial indexing method that uses *R -tree thus avoiding the high-dimensional indexing problems associated with the traditional color histogram. Alternatively, the images in the database are clustered based on region feature similarity using Euclidian distance. Only representative (centroids) features of these clusters are indexed using *R -tree thus improving the efficiency. For similarity retrieval, each representative color in the query image or region is used independently to find regions containing that color. The results of these methods are compared. A JAVA based query engine supporting query-by- example is built to retrieve images by color.
Keywords: Content-based, indexing, cluster, region.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1810148 CSOLAP (Continuous Spatial On-Line Analytical Processing)
Authors: Taher Omran Ahmed, Abdullatif Mihdi Buras
Abstract:
Decision support systems are usually based on multidimensional structures which use the concept of hypercube. Dimensions are the axes on which facts are analyzed and form a space where a fact is located by a set of coordinates at the intersections of members of dimensions. Conventional multidimensional structures deal with discrete facts linked to discrete dimensions. However, when dealing with natural continuous phenomena the discrete representation is not adequate. There is a need to integrate spatiotemporal continuity within multidimensional structures to enable analysis and exploration of continuous field data. Research issues that lead to the integration of spatiotemporal continuity in multidimensional structures are numerous. In this paper, we discuss research issues related to the integration of continuity in multidimensional structures, present briefly a multidimensional model for continuous field data. We also define new aggregation operations. The model and the associated operations and measures are validated by a prototype.Keywords: Continuous Data, Data warehousing, DecisionSupport, SOLAP
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1594147 3D Objects Indexing with a Direct and Analytical Method for Calculating the Spherical Harmonics Coefficients
Authors: S. Hellam, Y. Oulahrir, F. El Mounchid, A. Sadiq, S. Mbarki
Abstract:
In this paper, we propose a new method for threedimensional object indexing based on D.A.M.C-S.H.C descriptor (Direct and Analytical Method for Calculating the Spherical Harmonics Coefficients). For this end, we propose a direct calculation of the coefficients of spherical harmonics with perfect precision. The aims of the method are to minimize, the processing time on the 3D objects database and the searching time of similar objects to a request object. Firstly we start by defining the new descriptor using a new division of 3-D object in a sphere. Then we define a new distance which will be tested and prove his efficiency in the search for similar objects in the database in which we have objects with very various and important size.Keywords: 3D Object indexing, 3D shape descriptor, spherical harmonic, 3D Object similarity.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1476146 Conceptual Multidimensional Model
Authors: Manpreet Singh, Parvinder Singh, Suman
Abstract:
The data is available in abundance in any business organization. It includes the records for finance, maintenance, inventory, progress reports etc. As the time progresses, the data keep on accumulating and the challenge is to extract the information from this data bank. Knowledge discovery from these large and complex databases is the key problem of this era. Data mining and machine learning techniques are needed which can scale to the size of the problems and can be customized to the application of business. For the development of accurate and required information for particular problem, business analyst needs to develop multidimensional models which give the reliable information so that they can take right decision for particular problem. If the multidimensional model does not possess the advance features, the accuracy cannot be expected. The present work involves the development of a Multidimensional data model incorporating advance features. The criterion of computation is based on the data precision and to include slowly change time dimension. The final results are displayed in graphical form.Keywords: Multidimensional, data precision.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1456145 Fast Database Indexing for Large Protein Sequence Collections Using Parallel N-Gram Transformation Algorithm
Authors: Jehad A. H. Hammad, Nur'Aini binti Abdul Rashid
Abstract:
With the rapid development in the field of life sciences and the flooding of genomic information, the need for faster and scalable searching methods has become urgent. One of the approaches that were investigated is indexing. The indexing methods have been categorized into three categories which are the lengthbased index algorithms, transformation-based algorithms and mixed techniques-based algorithms. In this research, we focused on the transformation based methods. We embedded the N-gram method into the transformation-based method to build an inverted index table. We then applied the parallel methods to speed up the index building time and to reduce the overall retrieval time when querying the genomic database. Our experiments show that the use of N-Gram transformation algorithm is an economical solution; it saves time and space too. The result shows that the size of the index is smaller than the size of the dataset when the size of N-Gram is 5 and 6. The parallel N-Gram transformation algorithm-s results indicate that the uses of parallel programming with large dataset are promising which can be improved further.Keywords: Biological sequence, Database index, N-gram indexing, Parallel computing, Sequence retrieval.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2135144 Multidimensional Compromise Programming Evaluation of Digital Commerce Websites
Authors: C. Ardil
Abstract:
Multidimensional compromise programming evaluation of digital commerce websites is essential not only to have recommendations for improvement, but also to make comparisons with global business competitors. This research provides a multidimensional decision making model that prioritizes the objective criteria weights of various commerce websites using multidimensional compromise solution. Evaluation of digital commerce website quality can be considered as a complex information system structure including qualitative and quantitative factors for a multicriteria decision making problem. The proposed multicriteria decision making approach mainly consists of three sequential steps for the selection problem. In the first step, three major different evaluation criteria are characterized for website ranking problem. In the second step, identified critical criteria are weighted using the standard deviation procedure. In the third step, the multidimensional compromise programming is applied to rank the digital commerce websites.
Keywords: Standard deviation, commerce website, website evaluation, multicriteria decision making, multicriteria compromise programming, website quality, multidimensional decision analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 815143 Compressed Suffix Arrays to Self-Indexes Based on Partitioned Elias-Fano
Abstract:
A practical and simple self-indexing data structure, Partitioned Elias-Fano (PEF) - Compressed Suffix Arrays (CSA), is built in linear time for the CSA based on PEF indexes. Moreover, the PEF-CSA is compared with two classical compressed indexing methods, Ferragina and Manzini implementation (FMI) and Sad-CSA on different type and size files in Pizza & Chili. The PEF-CSA performs better on the existing data in terms of the compression ratio, count, and locates time except for the evenly distributed data such as proteins data. The observations of the experiments are that the distribution of the φ is more important than the alphabet size on the compression ratio. Unevenly distributed data φ makes better compression effect, and the larger the size of the hit counts, the longer the count and locate time.
Keywords: Compressed suffix array, self-indexing, partitioned Elias-Fano, PEF-CSA.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1080142 An Semantic Algorithm for Text Categoritation
Authors: Xu Zhao
Abstract:
Text categorization techniques are widely used to many Information Retrieval (IR) applications. In this paper, we proposed a simple but efficient method that can automatically find the relationship between any pair of terms and documents, also an indexing matrix is established for text categorization. We call this method Indexing Matrix Categorization Machine (IMCM). Several experiments are conducted to show the efficiency and robust of our algorithm.
Keywords: Text categorization, Sub-space learning, Latent Semantic Space
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1466141 Visualization and Indexing of Spectral Databases
Authors: Tibor Kulcsar, Gabor Sarossy, Gabor Bereznai, Robert Auer, Janos Abonyi
Abstract:
On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra by nearest neighborhood algorithms and distance based searching methods. Search for nearest neighbors in the spectral space is an NP-hard problem, the computational complexity increases by the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time some kind of indexing could be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirement of estimation algorithms by providing a two dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of spectral database does not only support application of distance and similarity based techniques but enables the utilization of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means the prediction has not to use the high dimension space but can be based on the mapped space too. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance based learning algorithms.
Keywords: indexing high dimensional databases, dimensional reduction, clustering, similarity, k-nn algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1768140 Data Extraction of XML Files using Searching and Indexing Techniques
Authors: Sushma Satpute, Vaishali Katkar, Nilesh Sahare
Abstract:
XML files contain data which is in well formatted manner. By studying the format or semantics of the grammar it will be helpful for fast retrieval of the data. There are many algorithms which describes about searching the data from XML files. There are no. of approaches which uses data structure or are related to the contents of the document. In these cases user must know about the structure of the document and information retrieval techniques using NLPs is related to content of the document. Hence the result may be irrelevant or not so successful and may take more time to search.. This paper presents fast XML retrieval techniques by using new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns the value to each tag from file. To query the system, a user is not constrained about fixed format of query.
Keywords: XML Retrieval, Indexed Search, Information Retrieval.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1782139 NOHIS-Tree: High-Dimensional Index Structure for Similarity Search
Authors: Mounira Taileb, Sami Touati
Abstract:
In Content-Based Image Retrieval systems it is important to use an efficient indexing technique in order to perform and accelerate the search in huge databases. The used indexing technique should also support the high dimensions of image features. In this paper we present the hierarchical index NOHIS-tree (Non Overlapping Hierarchical Index Structure) when we scale up to very large databases. We also present a study of the influence of clustering on search time. The performance test results show that NOHIS-tree performs better than SR-tree. Tests also show that NOHIS-tree keeps its performances in high dimensional spaces. We include the performance test that try to determine the number of clusters in NOHIS-tree to have the best search time.Keywords: High-dimensional indexing, k-nearest neighborssearch.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1443138 UB-Tree Indexing for Semantic Query Optimization of Range Queries
Authors: S. Housseno, A. Simonet, M. Simonet
Abstract:
Semantic query optimization consists in restricting the search space in order to reduce the set of objects of interest for a query. This paper presents an indexing method based on UB-trees and a static analysis of the constraints associated to the views of the database and to any constraint expressed on attributes. The result of the static analysis is a partitioning of the object space into disjoint blocks. Through Space Filling Curve (SFC) techniques, each fragment (block) of the partition is assigned a unique identifier, enabling the efficient indexing of fragments by UB-trees. The search space corresponding to a range query is restricted to a subset of the blocks of the partition. This approach has been developed in the context of a KB-DBMS but it can be applied to any relational system.Keywords: Index, Range query, UB-tree, Space Filling Curve, Query optimization, Views, Database, Integrity Constraint, Classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1499137 Choosing R-tree or Quadtree Spatial DataIndexing in One Oracle Spatial Database System to Make Faster Showing Geographical Map in Mobile Geographical Information System Technology
Authors: Maruto Masserie Sardadi, Mohd Shafry bin Mohd Rahim, Zahabidin Jupri, Daut bin Daman
Abstract:
The latest Geographic Information System (GIS) technology makes it possible to administer the spatial components of daily “business object," in the corporate database, and apply suitable geographic analysis efficiently in a desktop-focused application. We can use wireless internet technology for transfer process in spatial data from server to client or vice versa. However, the problem in wireless Internet is system bottlenecks that can make the process of transferring data not efficient. The reason is large amount of spatial data. Optimization in the process of transferring and retrieving data, however, is an essential issue that must be considered. Appropriate decision to choose between R-tree and Quadtree spatial data indexing method can optimize the process. With the rapid proliferation of these databases in the past decade, extensive research has been conducted on the design of efficient data structures to enable fast spatial searching. Commercial database vendors like Oracle have also started implementing these spatial indexing to cater to the large and diverse GIS. This paper focuses on the decisions to choose R-tree and quadtree spatial indexing using Oracle spatial database in mobile GIS application. From our research condition, the result of using Quadtree and R-tree spatial data indexing method in one single spatial database can save the time until 42.5%.Keywords: Indexing, Mobile GIS, MapViewer, Oracle SpatialDatabase.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4034136 Multidimensional Compromise Optimization for Development Ranking of the Gulf Cooperation Council Countries and Turkey
Authors: C. Ardil
Abstract:
In this research, a multidimensional compromise optimization method is proposed for multidimensional decision making analysis in the development ranking of the Gulf Cooperation Council Countries and Turkey. The proposed approach presents ranking solutions resulting from different multicriteria decision analyses, which yield different ranking orders for the same ranking problem, consisting of a set of alternatives in terms of numerous competing criteria when they are applied with the same numerical data. The multiobjective optimization decision making problem is considered in three sequential steps. In the first step, five different criteria related to the development ranking are gathered from the research field. In the second step, identified evaluation criteria are, objectively, weighted using standard deviation procedure. In the third step, a country selection problem is illustrated with a numerical example as an application of the proposed multidimensional compromise optimization model. Finally, multidimensional compromise optimization approach is applied to rank the Gulf Cooperation Council Countries and Turkey.
Keywords: Standard deviation, performance evaluation, multicriteria decision making, multidimensional compromise optimization, vector normalization, multicriteria decision making, multicriteria analysis, multidimensional decision analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 810135 Graph Codes-2D Projections of Multimedia Feature Graphs for Fast and Effective Retrieval
Authors: Stefan Wagenpfeil, Felix Engel, Paul McKevitt, Matthias Hemmje
Abstract:
Multimedia Indexing and Retrieval is generally de-signed and implemented by employing feature graphs. These graphs typically contain a significant number of nodes and edges to reflect the level of detail in feature detection. A higher level of detail increases the effectiveness of the results but also leads to more complex graph structures. However, graph-traversal-based algorithms for similarity are quite inefficient and computation intensive, espe-cially for large data structures. To deliver fast and effective retrieval, an efficient similarity algorithm, particularly for large graphs, is mandatory. Hence, in this paper, we define a graph-projection into a 2D space (Graph Code) as well as the corresponding algorithms for indexing and retrieval. We show that calculations in this space can be performed more efficiently than graph-traversals due to a simpler processing model and a high level of parallelisation. In consequence, we prove that the effectiveness of retrieval also increases substantially, as Graph Codes facilitate more levels of detail in feature fusion. Thus, Graph Codes provide a significant increase in efficiency and effectiveness (especially for Multimedia indexing and retrieval) and can be applied to images, videos, audio, and text information.
Keywords: indexing, retrieval, multimedia, graph code, graph algorithm
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 440134 Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern
Authors: Mahdi Esmaeili, Mansour Tarafdar
Abstract:
The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.Keywords: Sequential Patterns, Data Mining, ParallelAlgorithm, Multidimensional Sequence Data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1475133 Using Interval Trees for Approximate Indexing of Instances
Authors: Khalil el Hindi
Abstract:
This paper presents a simple and effective method for approximate indexing of instances for instance based learning. The method uses an interval tree to determine a good starting search point for the nearest neighbor. The search stops when an early stopping criterion is met. The method proved to be very effective especially when only the first nearest neighbor is required.
Keywords: Instance based learning, interval trees, the knn algorithm, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1509132 An Efficient Graph Query Algorithm Based on Important Vertices and Decision Features
Authors: Xiantong Li, Jianzhong Li
Abstract:
Graph has become increasingly important in modeling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. Different from the existing methods, our approach, called VFM (Vertex to Frequent Feature Mapping), makes use of vertices and decision features as the basic indexing feature. VFM constructs two mappings between vertices and frequent features to answer graph queries. The VFM approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. The results show that the proposed method not only avoids the enumeration method of getting subgraphs of query graph, but also effectively reduces the subgraph isomorphism tests between the query graph and graphs in candidate answer set in verification stage.Keywords: Decision Feature, Frequent Feature, Graph Dataset, Graph Query
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1870131 Optimal Algorithm for Constructing the Delaunay Triangulation in Ed
Authors: V. Tereshchenko, D. Taran
Abstract:
In this paper we propose a new approach to constructing the Delaunay Triangulation and the optimum algorithm for the case of multidimensional spaces (d ≥ 2). Analysing the modern state, it is possible to draw a conclusion, that the ideas for the existing effective algorithms developed for the case of d ≥ 2 are not simple to generalize on a multidimensional case, without the loss of efficiency. We offer for the solving this problem an effective algorithm that satisfies all the given requirements. But theoretical complexity of the problem it is impossible to improve as the Worst - Case Optimality for algorithms of solving such a problem is proved.
Keywords: Delaunay triangulation, multidimensional space, Voronoi Diagram, optimal algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1980130 Region-Based Segmentation of Generic Video Scenes Indexing
Authors: Aree A. Mohammed
Abstract:
In this work we develop an object extraction method and propose efficient algorithms for object motion characterization. The set of proposed tools serves as a basis for development of objectbased functionalities for manipulation of video content. The estimators by different algorithms are compared in terms of quality and performance and tested on real video sequences. The proposed method will be useful for the latest standards of encoding and description of multimedia content – MPEG4 and MPEG7.Keywords: Object extraction, Video indexing, Segmentation, Optical flow, Motion estimators.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1351129 3D Objects Indexing Using Spherical Harmonic for Optimum Measurement Similarity
Authors: S. Hellam, Y. Oulahrir, F. El Mounchid, A. Sadiq, S. Mbarki
Abstract:
In this paper, we propose a method for three-dimensional (3-D)-model indexing based on defining a new descriptor, which we call new descriptor using spherical harmonics. The purpose of the method is to minimize, the processing time on the database of objects models and the searching time of similar objects to request object. Firstly we start by defining the new descriptor using a new division of 3-D object in a sphere. Then we define a new distance which will be used in the search for similar objects in the database.
Keywords: 3D indexation, spherical harmonic, similarity of 3D objects.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2230128 Post Pandemic Mobility Analysis through Indexing and Sharding in MongoDB: Performance Optimization and Insights
Authors: Karan Vishavjit, Aakash Lakra, Shafaq Khan
Abstract:
The COVID-19 pandemic has pushed healthcare professionals to use big data analytics as a vital tool for tracking and evaluating the effects of contagious viruses. To effectively analyse huge datasets, efficient NoSQL databases are needed. The analysis of post-COVID-19 health and well-being outcomes and the evaluation of the effectiveness of government efforts during the pandemic is made possible by this research’s integration of several datasets, which cuts down on query processing time and creates predictive visual artifacts. We recommend applying sharding and indexing technologies to improve query effectiveness and scalability as the dataset expands. Effective data retrieval and analysis are made possible by spreading the datasets into a sharded database and doing indexing on individual shards. Analysis of connections between governmental activities, poverty levels, and post-pandemic wellbeing is the key goal. We want to evaluate the effectiveness of governmental initiatives to improve health and lower poverty levels. We will do this by utilising advanced data analysis and visualisations. The findings provide relevant data that support the advancement of UN sustainable objectives, future pandemic preparation, and evidence-based decision-making. This study shows how Big Data and NoSQL databases may be used to address problems with global health.
Keywords: COVID-19, big data, data analysis, indexing, NoSQL, sharding, scalability, poverty.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 66127 Topic Modeling Using Latent Dirichlet Allocation and Latent Semantic Indexing on South African Telco Twitter Data
Authors: Phumelele P. Kubheka, Pius A. Owolawi, Gbolahan Aiyetoro
Abstract:
Twitter is one of the most popular social media platforms where users share their opinions on different subjects. Twitter can be considered a great source for mining text due to the high volumes of data generated through the platform daily. Many industries such as telecommunication companies can leverage the availability of Twitter data to better understand their markets and make an appropriate business decision. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA). The obtained results are benchmarked with another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics on a Twitter dataset containing user tweets on South African Telcos. Results from this study show that LSI is much faster than LDA. However, LDA yields better results with higher topic coherence by 8% for the best-performing model in this experiment. A higher topic coherence score indicates better performance of the model.
Keywords: Big data, latent Dirichlet allocation, latent semantic indexing, Telco, topic modeling, Twitter.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 458