Search results for: data retrieval.
7457 A Content Vector Model for Text Classification
Authors: Eric Jiang
Abstract:
As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies text by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification and its experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.Keywords: Feature Selection, Latent Semantic Indexing, Text Classification, Vector Space Model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18857456 Developing a Coronavirus Academic Paper Sorting Application
Authors: Christina A. van Hal, Xiaoqian Jiang, Luyao Chen, Yan Chu, Robert D. Jolly, Yaobin Lin, Jitian Zhao, Kang Lin Hsieh
Abstract:
The COVID-19 Literature Summary App, now live on the university website, was created for the primary purpose of enabling academicians and clinicians to quickly sort through the vast array of recent coronavirus publications by topics of interest. Multiple methods of summarizing and sorting the manuscripts were created. A summary page introduces the application function and capabilities, while an interactive map provides daily updates on infection, death, and recovery rates. A page with a pivot table allows publication sorting by topic, with an interactive data table that allows sorting topics by columns, as wells as the capability to view abstracts. Additionally, publications may be sorted by the medical topics they cover. We used the CORD-19 database to compile lists of publications. The data table can sort binary variables, allowing the user to pick desired publication topics, such as papers that describe COVID-19 symptoms. The application is primarily designed for use by researchers but can be used by anybody who wants a faster and more efficient means of locating papers of interest.
Keywords: COVID-19, literature summary, information retrieval, snorkel
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4697455 Support Vector Machine based Intelligent Watermark Decoding for Anticipated Attack
Authors: Syed Fahad Tahir, Asifullah Khan, Abdul Majid, Anwar M. Mirza
Abstract:
In this paper, we present an innovative scheme of blindly extracting message bits from an image distorted by an attack. Support Vector Machine (SVM) is used to nonlinearly classify the bits of the embedded message. Traditionally, a hard decoder is used with the assumption that the underlying modeling of the Discrete Cosine Transform (DCT) coefficients does not appreciably change. In case of an attack, the distribution of the image coefficients is heavily altered. The distribution of the sufficient statistics at the receiving end corresponding to the antipodal signals overlap and a simple hard decoder fails to classify them properly. We are considering message retrieval of antipodal signal as a binary classification problem. Machine learning techniques like SVM is used to retrieve the message, when certain specific class of attacks is most probable. In order to validate SVM based decoding scheme, we have taken Gaussian noise as a test case. We generate a data set using 125 images and 25 different keys. Polynomial kernel of SVM has achieved 100 percent accuracy on test data.Keywords: Bit Correct Ratio (BCR), Grid Search, Intelligent Decoding, Jackknife Technique, Support Vector Machine (SVM), Watermarking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16707454 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient
Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart
Abstract:
Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The big demand for image annotation and archiving in the web attracts the researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: The description of the elementary characteristics from the image and the correlation between labels are not taken into account. In this paper, we present an algorithm (MIML-HOGLPP), which simultaneously handles these limitations. The algorithm uses the histogram of gradients as feature descriptor. It applies the Label Priority Power-set as multi-label transformation to solve the problem of label correlation. The experiment shows that the results of MIML-HOGLPP are better in terms of some of the evaluation metrics comparing with the two existing techniques.Keywords: Data mining, information retrieval system, multi-label, problem transformation, histogram of gradients.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13157453 Integrating Computational Intelligence Techniques and Assessment Agents in ELearning Environments
Authors: Konstantinos C. Giotopoulos, Christos E. Alexakos, Grigorios N. Beligiannis, Spiridon D.Likothanassis
Abstract:
In this contribution an innovative platform is being presented that integrates intelligent agents and evolutionary computation techniques in legacy e-learning environments. It introduces the design and development of a scalable and interoperable integration platform supporting: I) various assessment agents for e-learning environments, II) a specific resource retrieval agent for the provision of additional information from Internet sources matching the needs and profile of the specific user and III) a genetic algorithm designed to extract efficient information (classifying rules) based on the students- answering input data. The agents are implemented in order to provide intelligent assessment services based on computational intelligence techniques such as Bayesian Networks and Genetic Algorithms. The proposed Genetic Algorithm (GA) is used in order to extract efficient information (classifying rules) based on the students- answering input data. The idea of using a GA in order to fulfil this difficult task came from the fact that GAs have been widely used in applications including classification of unknown data. The utilization of new and emerging technologies like web services allows integrating the provided services to any web based legacy e-learning environment.Keywords: Bayesian Networks, Computational Intelligencetechniques, E-learning legacy systems, Service Oriented Integration, Intelligent Agents, Genetic Algorithms.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17447452 CookIT: A Web Portal for the Preservation and Dissemination of Traditional Italian Recipes
Authors: M. T. Artese, G. Ciocca, I. Gagliardi
Abstract:
Food is a social and cultural aspect of every individual. Food products, processing, and traditions have been identified as cultural objects carrying history and identity of social groups. Traditional recipes are passed down from one generation to the other, often to strengthen the link with the territory. The paper presents CookIT, a web portal developed to collect Italian traditional recipes related to regional cuisine, with the purpose to disseminate the knowledge of typical Italian recipes and the Mediterranean diet which is a significant part of Italian cuisine. The system designed is completed with multimodal means of browsing and data retrieval. Stored recipes can be retrieved integrating and combining a number of different methods and keys, while the results are displayed using classical styles, such as list and mosaic, and also using maps and graphs, with which users can play using available keys for interaction.
Keywords: Collaborative portal, Italian cuisine, intangible cultural heritage, traditional recipes, searching and browsing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8877451 An Semantic Algorithm for Text Categoritation
Authors: Xu Zhao
Abstract:
Text categorization techniques are widely used to many Information Retrieval (IR) applications. In this paper, we proposed a simple but efficient method that can automatically find the relationship between any pair of terms and documents, also an indexing matrix is established for text categorization. We call this method Indexing Matrix Categorization Machine (IMCM). Several experiments are conducted to show the efficiency and robust of our algorithm.
Keywords: Text categorization, Sub-space learning, Latent Semantic Space
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14677450 Feasibility Study of MongoDB and Radio Frequency Identification Technology in Asset Tracking System
Authors: Mohd Noah A. Rahman, Afzaal H. Seyal, Sharul T. Tajuddin, Hartiny Md Azmi
Abstract:
Taking into consideration the real time situation specifically the higher academic institutions, small, medium to large companies, public to private sectors and the remaining sectors, do experience the inventory or asset shrinkages due to theft, loss or even inventory tracking errors. This happening is due to a zero or poor security systems and measures being taken and implemented in their organizations. Henceforth, implementing the Radio Frequency Identification (RFID) technology into any manual or existing web-based system or web application can simply deter and will eventually solve certain major issues to serve better data retrieval and data access. Having said, this manual or existing system can be enhanced into a mobile-based system or application. In addition to that, the availability of internet connections can aid better services of the system. Such involvement of various technologies resulting various privileges to individuals or organizations in terms of accessibility, availability, mobility, efficiency, effectiveness, real-time information and also security. This paper will look deeper into the integration of mobile devices with RFID technologies with the purpose of asset tracking and control. Next, it is to be followed by the development and utilization of MongoDB as the main database to store data and its association with RFID technology. Finally, the development of a web based system which can be viewed in a mobile based formation with the aid of Hypertext Preprocessor (PHP), MongoDB, Hyper-Text Markup Language 5 (HTML5), Android, JavaScript and AJAX programming language.
Keywords: RFID, asset tracking system, MongoDB, NoSQL.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16497449 Storing OWL Ontologies in SQL Relational Databases
Authors: Irina Astrova, Nahum Korda, Ahto Kalja
Abstract:
Relational databases are often used as a basis for persistent storage of ontologies to facilitate rapid operations such as search and retrieval, and to utilize the benefits of relational databases management systems such as transaction management, security and integrity control. On the other hand, there appear more and more OWL files that contain ontologies. Therefore, this paper proposes to extract ontologies from OWL files and then store them in relational databases. A prerequisite for this storing is transformation of ontologies to relational databases, which is the purpose of this paper.Keywords: Ontologies, relational databases, SQL, and OWL.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 52147448 Augmented Reality for Maintenance Operator for Problem Inspections
Authors: Chong-Yang Qiao, Teeravarunyou Sakol
Abstract:
Current production-oriented factories need maintenance operators to work in shifts monitoring and inspecting complex systems and different equipment in the situation of mechanical breakdown. Augmented reality (AR) is an emerging technology that embeds data into the environment for situation awareness to help maintenance operators make decisions and solve problems. An application was designed to identify the problem of steam generators and inspection centrifugal pumps. The objective of this research was to find the best medium of AR and type of problem solving strategies among analogy, focal object method and mean-ends analysis. Two scenarios of inspecting leakage were temperature and vibration. Two experiments were used in usability evaluation and future innovation, which included decision-making process and problem-solving strategy. This study found that maintenance operators prefer build-in magnifier to zoom the components (55.6%), 3D exploded view to track the problem parts (50%), and line chart to find the alter data or information (61.1%). There is a significant difference in the use of analogy (44.4%), focal objects (38.9%) and mean-ends strategy (16.7%). The marked differences between maintainers and operators are of the application of a problem solving strategy. However, future work should explore multimedia information retrieval which supports maintenance operators for decision-making.Keywords: Augmented reality, situation awareness, decision-making, problem-solving.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13457447 Automatic Clustering of Gene Ontology by Genetic Algorithm
Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias, Zalmiyah Zakaria, Saberi M. Mohamad
Abstract:
Nowadays, Gene Ontology has been used widely by many researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the increase in size of the Gene Ontology has caused problems in maintaining and processing them. One way to obtain their accessibility is by clustering them into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not easily perceived and is a hard algorithmic problem. Therefore, an approach for solving the automatic clustering of the Gene Ontology is proposed by incorporating cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.
Keywords: Automatic clustering, cohesion-and-coupling metric, gene ontology; genetic algorithm, split-and-merge algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19557446 Learning Block Memories with Metric Networks
Authors: Mario Gonzalez, David Dominguez, Francisco B. Rodriguez
Abstract:
An attractor neural network on the small-world topology is studied. A learning pattern is presented to the network, then a stimulus carrying local information is applied to the neurons and the retrieval of block-like structure is investigated. A synaptic noise decreases the memory capability. The change of stability from local to global attractors is shown to depend on the long-range character of the network connectivity.Keywords: Hebbian learning, image recognition, small world, spatial information.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18657445 A Text Mining Technique Using Association Rules Extraction
Authors: Hany Mahgoub, Dietmar Rösner, Nabil Ismail, Fawzy Torkey
Abstract:
This paper describes text mining technique for automatically extracting association rules from collections of textual documents. The technique called, Extracting Association Rules from Text (EART). It depends on keyword features for discover association rules amongst keywords labeling the documents. In this work, the EART system ignores the order in which the words occur, but instead focusing on the words and their statistical distributions in documents. The main contributions of the technique are that it integrates XML technology with Information Retrieval scheme (TFIDF) (for keyword/feature selection that automatically selects the most discriminative keywords for use in association rules generation) and use Data Mining technique for association rules discovery. It consists of three phases: Text Preprocessing phase (transformation, filtration, stemming and indexing of the documents), Association Rule Mining (ARM) phase (applying our designed algorithm for Generating Association Rules based on Weighting scheme GARW) and Visualization phase (visualization of results). Experiments applied on WebPages news documents related to the outbreak of the bird flu disease. The extracted association rules contain important features and describe the informative news included in the documents collection. The performance of the EART system compared with another system that uses the Apriori algorithm throughout the execution time and evaluating extracted association rules.
Keywords: Text mining, data mining, association rule mining
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 44377444 Genetic Algorithms for Feature Generation in the Context of Audio Classification
Authors: José A. Menezes, Giordano Cabral, Bruno T. Gomes
Abstract:
Choosing good features is an essential part of machine learning. Recent techniques aim to automate this process. For instance, feature learning intends to learn the transformation of raw data into a useful representation to machine learning tasks. In automatic audio classification tasks, this is interesting since the audio, usually complex information, needs to be transformed into a computationally convenient input to process. Another technique tries to generate features by searching a feature space. Genetic algorithms, for instance, have being used to generate audio features by combining or modifying them. We find this approach particularly interesting and, despite the undeniable advances of feature learning approaches, we wanted to take a step forward in the use of genetic algorithms to find audio features, combining them with more conventional methods, like PCA, and inserting search control mechanisms, such as constraints over a confusion matrix. This work presents the results obtained on particular audio classification problems.
Keywords: Feature generation, feature learning, genetic algorithm, music information retrieval.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10787443 Big Data: Big Challenges to Privacy and Data Protection
Authors: Abu Bakar Munir, Siti Hajar Mohd Yasin, Firdaus Muhammad-Sukki
Abstract:
This paper seeks to analyse the benefits of big data and more importantly the challenges it pose to the subject of privacy and data protection. First, the nature of big data will be briefly deliberated before presenting the potential of big data in the present days. Afterwards, the issue of privacy and data protection is highlighted before discussing the challenges of implementing this issue in big data. In conclusion, the paper will put forward the debate on the adequacy of the existing legal framework in protecting personal data in the era of big data.
Keywords: Big data, data protection, information, privacy.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 39247442 Tele-Diagnosis System for Rural Thailand
Authors: C. Snae Namahoot, M. Brueckner
Abstract:
Thailand-s health system is challenged by the rising number of patients and decreasing ratio of medical practitioners/patients, especially in rural areas. This may tempt inexperienced GPs to rush through the process of anamnesis with the risk of incorrect diagnosis. Patients have to travel far to the hospital and wait for a long time presenting their case. Many patients try to cure themselves with traditional Thai medicine. Many countries are making use of the Internet for medical information gathering, distribution and storage. Telemedicine applications are a relatively new field of study in Thailand; the infrastructure of ICT had hampered widespread use of the Internet for using medical information. With recent improvements made health and technology professionals can work out novel applications and systems to help advance telemedicine for the benefit of the people. Here we explore the use of telemedicine for people with health problems in rural areas in Thailand and present a Telemedicine Diagnosis System for Rural Thailand (TEDIST) for diagnosing certain conditions that people with Internet access can use to establish contact with Community Health Centers, e.g. by mobile phone. The system uses a Web-based input method for individual patients- symptoms, which are taken by an expert system for the analysis of conditions and appropriate diseases. The analysis harnesses a knowledge base and a backward chaining component to find out, which health professionals should be presented with the case. Doctors have the opportunity to exchange emails or chat with the patients they are responsible for or other specialists. Patients- data are then stored in a Personal Health Record.Keywords: Biomedical engineering, data acquisition, expert system, information management system, and information retrieval.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28287441 Cross-Search Technique and its Visualization of Peer-to-Peer Distributed Clinical Documents
Authors: Yong Jun Choi, Juman Byun, Simon Berkovich
Abstract:
One of the ubiquitous routines in medical practice is searching through voluminous piles of clinical documents. In this paper we introduce a distributed system to search and exchange clinical documents. Clinical documents are distributed peer-to-peer. Relevant information is found in multiple iterations of cross-searches between the clinical text and its domain encyclopedia.
Keywords: Clinical documents, cross-search, document exchange, information retrieval, peer-to-peer.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13027440 Key Frame Based Video Summarization via Dependency Optimization
Authors: Janya Sainui
Abstract:
As a rapid growth of digital videos and data communications, video summarization that provides a shorter version of the video for fast video browsing and retrieval is necessary. Key frame extraction is one of the mechanisms to generate video summary. In general, the extracted key frames should both represent the entire video content and contain minimum redundancy. However, most of the existing approaches heuristically select key frames; hence, the selected key frames may not be the most different frames and/or not cover the entire content of a video. In this paper, we propose a method of video summarization which provides the reasonable objective functions for selecting key frames. In particular, we apply a statistical dependency measure called quadratic mutual informaion as our objective functions for maximizing the coverage of the entire video content as well as minimizing the redundancy among selected key frames. The proposed key frame extraction algorithm finds key frames as an optimization problem. Through experiments, we demonstrate the success of the proposed video summarization approach that produces video summary with better coverage of the entire video content while less redundancy among key frames comparing to the state-of-the-art approaches.Keywords: Video summarization, key frame extraction, dependency measure, quadratic mutual information, optimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9637439 Knowledge Based Chords Manipulation in MES
Authors: V. Hepsiba Mabel, K. Alagarsamy, Justus S.
Abstract:
Chord formation in western music notations is an intelligent art form which is learnt over the years by a musician to acquire it. Still it is a question of creativity that brings the perfect chord sequence that matches music score. This work focuses on the process of forming chords using a custom-designed knowledgebase (KB) of Music Expert System. An optimal Chord-Set for a given music score is arrived by using the chord-pool in the KB and the finding the chord match using Jusic Distance (JD). Conceptual Graph based knowledge representation model is followed for knowledge storage and retrieval in the knowledgebase.
Keywords: Knowledge, Music, Representation, Knowledgebase, Chord-Set.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18327438 Towards Improved Public Information on Industrial Emissions in Italy: Concepts and Specific Issues Associated to the Italian Experience in IPPC Permit Licensing
Authors: Mazziotti Gomez de Teran C., Fiore D., Cola B., Fardelli A.
Abstract:
The present paper summarizes the analysis of the request for consultation of information and data on industrial emissions made publicly available on the web site of the Ministry of Environment, Land and Sea on integrated pollution prevention and control from large industrial installations, the so called “AIA Portal”. As a matter of fact, a huge amount of information on national industrial plants is already available on internet, although it is usually proposed as textual documentation or images. Thus, it is not possible to access all the relevant information through interoperability systems and also to retrieval relevant information for decision making purposes as well as rising of awareness on environmental issue. Moreover, since in Italy the number of institutional and private subjects involved in the management of the public information on industrial emissions is substantial, the access to the information is provided on internet web sites according to different criteria; thus, at present it is not structurally homogeneous and comparable. To overcome the mentioned difficulties in the case of the Coordinating Committee for the implementation of the Agreement for the industrial area in Taranto and Statte, operating before the IPPC permit granting procedures of the relevant installation located in the area, a big effort was devoted to elaborate and to validate data and information on characterization of soil, ground water aquifer and coastal sea at disposal of different subjects to derive a global perspective for decision making purposes. Thus, the present paper also focuses on main outcomes matured during such experience.
Keywords: Public information, emissions into atmosphere, IPPC permits, territorial information systems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20587437 Digital Preservation in Nigeria Universities Libraries: A Comparison between University of Nigeria Nsukka and Ahmadu Bello University Zaria
Authors: Suleiman Musa, Shuaibu Sidi Safiyanu
Abstract:
This study examined the digital preservation in Nigeria university libraries. A comparison between the university of Nigeria Nsukka (UNN) and Ahmadu Bello University Zaria (ABU, Zaria). The study utilized primary source of data obtained from two selected institution librarians. Finding revealed varying results in terms of skills acquired by librarians before and after digitization of the two institutions. The study reports that journals publication, text book, CD-ROMS, conference papers and proceedings, theses, dissertations and seminar papers are among the information resources available for digitization. The study further documents that copyright issue, power failure, and unavailability of needed materials are among the challenges facing the digitization of library of the institution. On the basis of the finding, the study concluded that digitization of library enhances efficiency in organization and retrieval of information services. The study therefore recommended that software should be upgraded with backup, training of the librarians on digital process, installation of antivirus and enhancement of technical collaboration between the library and MIS.Keywords: Digitalization, preservation, libraries, comparison.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17257436 Video Summarization: Techniques and Applications
Authors: Zaynab Elkhattabi, Youness Tabii, Abdelhamid Benkaddour
Abstract:
Nowadays, huge amount of multimedia repositories make the browsing, retrieval and delivery of video contents very slow and even difficult tasks. Video summarization has been proposed to improve faster browsing of large video collections and more efficient content indexing and access. In this paper, we focus on approaches to video summarization. The video summaries can be generated in many different forms. However, two fundamentals ways to generate summaries are static and dynamic. We present different techniques for each mode in the literature and describe some features used for generating video summaries. We conclude with perspective for further research.
Keywords: Semantic features, static summarization, video skimming, Video summarization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 70697435 Genetic Mining: Using Genetic Algorithm for Topic based on Concept Distribution
Authors: S. M. Khalessizadeh, R. Zaefarian, S.H. Nasseri, E. Ardil
Abstract:
Today, Genetic Algorithm has been used to solve wide range of optimization problems. Some researches conduct on applying Genetic Algorithm to text classification, summarization and information retrieval system in text mining process. This researches show a better performance due to the nature of Genetic Algorithm. In this paper a new algorithm for using Genetic Algorithm in concept weighting and topic identification, based on concept standard deviation will be explored.Keywords: Genetic Algorithm, Text Mining, Term Weighting, Concept Extraction, Concept Distribution.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 37147434 Using Automatic Ontology Learning Methods in Human Plausible Reasoning Based Systems
Authors: A. R. Vazifedoost, M. Rahgozar, F. Oroumchian
Abstract:
Knowledge discovery from text and ontology learning are relatively new fields. However their usage is extended in many fields like Information Retrieval (IR) and its related domains. Human Plausible Reasoning based (HPR) IR systems for example need a knowledge base as their underlying system which is currently made by hand. In this paper we propose an architecture based on ontology learning methods to automatically generate the needed HPR knowledge base.Keywords: Ontology Learning, Human Plausible Reasoning, knowledge extraction, knowledge representation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16017433 A Robust Audio Fingerprinting Algorithm in MP3 Compressed Domain
Authors: Ruili Zhou, Yuesheng Zhu
Abstract:
In this paper, a new robust audio fingerprinting algorithm in MP3 compressed domain is proposed with high robustness to time scale modification (TSM). Instead of simply employing short-term information of the MP3 stream, the new algorithm extracts the long-term features in MP3 compressed domain by using the modulation frequency analysis. Our experiment has demonstrated that the proposed method can achieve a hit rate of above 95% in audio retrieval and resist the attack of 20% TSM. It has lower bit error rate (BER) performance compared to the other algorithms. The proposed algorithm can also be used in other compressed domains, such as AAC.Keywords: Audio Fingerprinting, MP3, Modulation Frequency, TSM
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21967432 Data Preprocessing for Supervised Leaning
Authors: S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas
Abstract:
Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost. If there is much irrelevant and redundant information present or noisy and unreliable data, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set but this is not happened. Thus, we present the most well know algorithms for each step of data pre-processing so that one achieves the best performance for their data set.Keywords: Data mining, feature selection, data cleaning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 60917431 Applications of Big Data in Education
Authors: Faisal Kalota
Abstract:
Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 48747430 Knowledge Management and Tourism: An Exploratory Study Applied to Travel Agents in Egypt
Authors: Mohammad Soliman, Mohamed A. Abou-Shouk
Abstract:
Knowledge management focuses on the development, storage, retrieval, and dissemination of information and expertise. It has become an important tool to improve performance in tourism enterprises. This includes improving decision-making, developing customer services, and increasing sales and profits. Knowledge management adoption depends on human, organizational and technological factors. This study aims to explore the concept of knowledge management in travel agents in Egypt. It explores the requirements of adoption and its impact on performance in these agencies. The study targets Category A travel agents in Egypt. The population of the study encompasses Category A travel agents having online presence. An online questionnaire is used to collect data from managers of travel agents. This study is useful for travel agents who are in urgent need to restructure their intermediary role and support their survival in the global travel market. The study sheds light on the requirements of adoption and the expected impact on performance. This could help travel agents identify their situation and the determine the extent to which they are ready to adopt knowledge management. This study is contributing to knowledge by providing insights from the tourism sector in a developing country where the concept of knowledge management is still in its infancy stages.Keywords: Benefits, determinants, Egypt, knowledge management, travel agents.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19817429 Research of Data Cleaning Methods Based on Dependency Rules
Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin
Abstract:
This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.Keywords: Data cleaning, dependency rules, violation data discovery, data repair.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 26127428 Travel Time Model for Cylinder Type Parking System
Authors: Jing Zhang, Jie Chen
Abstract:
In this paper, we mainly analyze an automated parking system where the storage and retrieval requests are performed by a tower crane. In this parking system, the S/R crane which is located at the middle of the bottom of the cylinder parking area can rotate in both clockwise and counterclockwise and three kinds of movements can be done simultaneously. We develop some mathematical travel time models for the single command cycle under the random storage assignment using the characteristics of this system. Finally, we compare these travel models with discrete case and it is shown that these travel models display a good satisfactory performance.Keywords: Parking system, travel time model, tower crane.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 793