Search results for: data filtering

7328 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances of computer science and data analysis are helping to provide continuously huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information-integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, we focus on those developed in the ontology community.

Keywords: Semantic data integration, biological ontology, linked data, semantic web, OWL, RDF.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1799

7327 Improving Performance of World Wide Web by Adaptive Web Traffic Reduction

Authors: Achuthsankar S. Nair, J. S. Jayasudha

Abstract:

The ever increasing use of World Wide Web in the existing network, results in poor performance. Several techniques have been developed for reducing web traffic by compressing the size of the file, saving the web pages at the client side, changing the burst nature of traffic into constant rate etc. No single method was adequate enough to access the document instantly through the Internet. In this paper, adaptive hybrid algorithms are developed for reducing web traffic. Intelligent agents are used for monitoring the web traffic. Depending upon the bandwidth usage, user-s preferences, server and browser capabilities, intelligent agents use the best techniques to achieve maximum traffic reduction. Web caching, compression, filtering, optimization of HTML tags, and traffic dispersion are incorporated into this adaptive selection. Using this new hybrid technique, latency is reduced to 20 – 60 % and cache hit ratio is increased 40 – 82 %.

Keywords: Bandwidth, Congestion, Intelligent Agents, Prefetching, Web Caching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1727

7326 Experimental Study on Dehumidification Performance of Supersonic Nozzle

Authors: Esam Jassim

Abstract:

Supersonic nozzles are commonly used to purify natural gas in gas processing technology. As an innovated technology, it is employed to overcome the deficit of the traditional method, related to gas dynamics, thermodynamics and fluid dynamics theory. An indoor test rig is built to study the dehumidification process of moisture fluid. Humid air was chosen for the study. The working fluid was circulating in an open loop, which had provision for filtering, metering, and humidifying. A stainless steel supersonic separator is constructed together with the C-D nozzle system. The result shows that dehumidification enhances as NPR increases. This is due to the high intensity in the turbulence caused by the shock formation in the divergent section. Such disturbance strengthens the centrifugal force, pushing more particles toward the near-wall region. In return return, the pressure recovery factor, defined as the ratio of the outlet static pressure of the fluid to its inlet value, decreases with NPR.

Keywords: Supersonic nozzle, dehumidification, particle separation, geometry.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1181

7325 Computation of the Filtering Properties of Photonic Crystal Waveguide Discontinuities Using the Mode Matching Method

Authors: Athanasios Theoharidis, Thomas Kamalakis, Ioannis Neokosmidis, Thomas Sphicopoulos

Abstract:

In this paper, the application of the Mode Matching (MM) method in the case of photonic crystal waveguide discontinuities is presented. The structure under consideration is divided into a number of cells, which supports a number of guided and evanescent modes. These modes can be calculated numerically by an alternative formulation of the plane wave expansion method for each frequency. A matrix equation is then formed relating the modal amplitudes at the beginning and at the end of the structure. The theory is highly efficient and accurate and can be applied to study the transmission sensitivity of photonic crystal devices due to fabrication tolerances. The accuracy of the MM method is compared to the Finite Difference Frequency Domain (FDFD) and the Adjoint Variable Method (AVM) and good agreement is observed.

Keywords: Optical Communications, Integrated Optics, Photonic Crystals, Optical Waveguide Discontinuities.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1549

7324 Pulsed Multi-Layered Image Filtering: A VLSI Implementation

Authors: Christian Mayr, Holger Eisenreich, Stephan Henker, René Schüffny

Abstract:

Image convolution similar to the receptive fields found in mammalian visual pathways has long been used in conventional image processing in the form of Gabor masks. However, no VLSI implementation of parallel, multi-layered pulsed processing has been brought forward which would emulate this property. We present a technical realization of such a pulsed image processing scheme. The discussed IC also serves as a general testbed for VLSI-based pulsed information processing, which is of interest especially with regard to the robustness of representing an analog signal in the phase or duration of a pulsed, quasi-digital signal, as well as the possibility of direct digital manipulation of such an analog signal. The network connectivity and processing properties are reconfigurable so as to allow adaptation to various processing tasks.

Keywords: Neural image processing, pulse computation application, pulsed Gabor convolution, VLSI pulse routing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1366

7323 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: Clustering algorithms, coastal engineering, data mining, data summarization, statistical methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1213

7322 Dimensional Modeling of HIV Data Using Open Source

Authors: Charles D. Otine, Samuel B. Kucel, Lena Trojer

Abstract:

Selecting the data modeling technique for an information system is determined by the objective of the resultant data model. Dimensional modeling is the preferred modeling technique for data destined for data warehouses and data mining, presenting data models that ease analysis and queries which are in contrast with entity relationship modeling. The establishment of data warehouses as components of information system landscapes in many organizations has subsequently led to the development of dimensional modeling. This has been significantly more developed and reported for the commercial database management systems as compared to the open sources thereby making it less affordable for those in resource constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It aims to take advantage of the fact that the most affected regions by the HIV virus are also heavily resource constrained (sub-Saharan Africa) whereas having large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts these were then modeled using two open source dimensional modeling tools. Use of open source would reduce the software costs for dimensional modeling and in turn make data warehousing and data mining more feasible even for those in resource constrained settings but with data available.

Keywords: About Database, Data Mining, Data warehouse, Dimensional Modeling, Open Source.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1934

7321 Inter-frame Collusion Attack in SS-N Video Watermarking System

Authors: Yaser Mohammad Taheri, Alireza Zolghadr–asli, Mehran Yazdi

Abstract:

Video watermarking is usually considered as watermarking of a set of still images. In frame-by-frame watermarking approach, each video frame is seen as a single watermarked image, so collusion attack is more critical in video watermarking. If the same or redundant watermark is used for embedding in every frame of video, the watermark can be estimated and then removed by watermark estimate remodolulation (WER) attack. Also if uncorrelated watermarks are used for every frame, these watermarks can be washed out with frame temporal filtering (FTF). Switching watermark system or so-called SS-N system has better performance against WER and FTF attacks. In this system, for each frame, the watermark is randomly picked up from a finite pool of watermark patterns. At first SS-N system will be surveyed and then a new collusion attack for SS-N system will be proposed using a new algorithm for separating video frame based on watermark pattern. So N sets will be built in which every set contains frames carrying the same watermark. After that, using WER attack in every set, N different watermark patterns will be estimated and removed later.

Keywords: Watermark estimation remodulation (WER), Frame Temporal Averaging (FTF), switching watermark system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1470

7320 Words Reordering based on Statistical Language Model

Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou

Abstract:

There are multiple reasons to expect that detecting the word order errors in a text will be a difficult problem, and detection rates reported in the literature are in fact low. Although grammatical rules constructed by computer linguists improve the performance of grammar checker in word order diagnosis, the repairing task is still very difficult. This paper presents an approach for repairing word order errors in English text by reordering words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. The comparative advantage of this method is that works with a large set of words, and avoids the laborious and costly process of collecting word order errors for creating error patterns.

Keywords: Permutations filtering, Statistical languagemodel N-grams, Word order errors

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1566

7319 Efficient Lossless Compression of Weather Radar Data

Authors: Wei-hua Ai, Wei Yan, Xiang Li

Abstract:

Data compression is used operationally to reduce bandwidth and storage requirements. An efficient method for achieving lossless weather radar data compression is presented. The characteristics of the data are taken into account and the optical linear prediction is used for the PPI images in the weather radar data in the proposed method. The next PPI image is identical to the current one and a dramatic reduction in source entropy is achieved by using the prediction algorithm. Some lossless compression methods are used to compress the predicted data. Experimental results show that for the weather radar data, the method proposed in this paper outperforms the other methods.

Keywords: Lossless compression, weather radar data, optical linear prediction, PPI image

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2230

7318 Conceptualizing the Knowledge to Manage and Utilize Data Assets in the Context of Digitization: Case Studies of Multinational Industrial Enterprises

Authors: Martin Böhmer, Agatha Dabrowski, Boris Otto

Abstract:

The trend of digitization significantly changes the role of data for enterprises. Data turn from an enabler to an intangible organizational asset that requires management and qualifies as a tradeable good. The idea of a networked economy has gained momentum in the data domain as collaborative approaches for data management emerge. Traditional organizational knowledge consequently needs to be extended by comprehensive knowledge about data. The knowledge about data is vital for organizations to ensure that data quality requirements are met and data can be effectively utilized and sovereignly governed. As this specific knowledge has been paid little attention to so far by academics, the aim of the research presented in this paper is to conceptualize it by proposing a “data knowledge model”. Relevant model entities have been identified based on a design science research (DSR) approach that iteratively integrates insights of various industry case studies and literature research.

Keywords: Data management, digitization, Industry 4.0, knowledge engineering, metamodel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1422

7317 A Methodology for Data Migration between Different Database Management Systems

Authors: Bogdan Walek, Cyril Klimes

Abstract:

In present days the area of data migration is very topical. Current tools for data migration in the area of relational database have several disadvantages that are presented in this paper. We propose a methodology for data migration of the database tables and their data between various types of relational database systems (RDBMS). The proposed methodology contains an expert system. The expert system contains a knowledge base that is composed of IFTHEN rules and based on the input data suggests appropriate data types of columns of database tables. The proposed tool, which contains an expert system, also includes the possibility of optimizing the data types in the target RDBMS database tables based on processed data of the source RDBMS database tables. The proposed expert system is shown on data migration of selected database of the source RDBMS to the target RDBMS.

Keywords: Expert system, fuzzy, data migration, database, relational database, data type, relational database management system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3453

7316 Error Effects on SAR Image Resolution using Range Doppler Imaging Algorithm

Authors: Su Su Yi Mon, Fang Jiancheng

Abstract:

Synthetic Aperture Radar (SAR) is an imaging radar form by taking full advantage of the relative movement of the antenna with respect to the target. Through the simultaneous processing of the radar reflections over the movement of the antenna via the Range Doppler Algorithm (RDA), the superior resolution of a theoretical wider antenna, termed synthetic aperture, is obtained. Therefore, SAR can achieve high resolution two dimensional imagery of the ground surface. In addition, two filtering steps in range and azimuth direction provide accurate enough result. This paper develops a simulation in which realistic SAR images can be generated. Also, the effect of velocity errors in the resulting image has also been investigated. Taking some velocity errors into account, the simulation results on the image resolution would be presented. Most of the times, algorithms need to be adjusted for particular datasets, or particular applications.

Keywords: Synthetic Aperture Radar (SAR), Range Doppler Algorithm (RDA), Image Resolution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3318

7315 Optimization of Distributed Processors for Power System: Kalman Filters using Petri Net

Authors: Anant Oonsivilai, Kenedy A. Greyson

Abstract:

The growth and interconnection of power networks in many regions has invited complicated techniques for energy management services (EMS). State estimation techniques become a powerful tool in power system control centers, and that more information is required to achieve the objective of EMS. For the online state estimator, assuming the continuous time is equidistantly sampled with period Δt, processing events must be finished within this period. Advantage of Kalman Filtering (KF) algorithm in using system information to improve the estimation precision is utilized. Computational power is a major issue responsible for the achievement of the objective, i.e. estimators- solution at a small sampled period. This paper presents the optimum utilization of processors in a state estimator based on KF. The model used is presented using Petri net (PN) theory.

Keywords: Kalman filters, model, Petri Net, power system, sequential State estimator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1335

7314 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate survey/polls often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real-time. The question is not why did these people do this for the last 10 years, but why are these people doing this now, and if the this is undesirable, and how can we have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released in to the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: Big data, open data, productivity, transparency.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1597

7313 Forthcoming Big Data on Smart Buildings and Cities: An Experimental Study on Correlations among Urban Data

Authors: Yu-Mi Song, Sung-Ah Kim, Dongyoun Shin

Abstract:

Cities are complex systems of diverse and inter-tangled activities. These activities and their complex interrelationships create diverse urban phenomena. And such urban phenomena have considerable influences on the lives of citizens. This research aimed to develop a method to reveal the causes and effects among diverse urban elements in order to enable better understanding of urban activities and, therefrom, to make better urban planning strategies. Specifically, this study was conducted to solve a data-recommendation problem found on a Korean public data homepage. First, a correlation analysis was conducted to find the correlations among random urban data. Then, based on the results of that correlation analysis, the weighted data network of each urban data was provided to people. It is expected that the weights of urban data thereby obtained will provide us with insights into cities and show us how diverse urban activities influence each other and induce feedback.

Keywords: Big data, correlation analysis, data recommendation system, urban data network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1086

7312 On the Combination of Patient-Generated Data with Data from a Secure Clinical Network Environment – A Practical Example

Authors: Jeroen S. de Bruin, Karin Schindler, Christian Schuh

Abstract:

With increasingly more mobile health applications appearing due to the popularity of smartphones, the possibility arises that these data can be used to improve the medical diagnostic process, as well as the overall quality of healthcare, while at the same time lowering costs. However, as of yet there have been no reports of a successful combination of patient-generated data from smartphones with data from clinical routine. In this paper we describe how these two types of data can be combined in a secure way without modification to hospital information systems, and how they can together be used in a medical expert system for automatic nutritional classification and triage.

Keywords: Data integration, disease-related malnutrition, expert systems, mobile health.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2179

7311 Comparison of Imputation Techniques for Efficient Prediction of Software Fault Proneness in Classes

Authors: Geeta Sikka, Arvinder Kaur Takkar, Moin Uddin

Abstract:

Missing data is a persistent problem in almost all areas of empirical research. The missing data must be treated very carefully, as data plays a fundamental role in every analysis. Improper treatment can distort the analysis or generate biased results. In this paper, we compare and contrast various imputation techniques on missing data sets and make an empirical evaluation of these methods so as to construct quality software models. Our empirical study is based on NASA-s two public dataset. KC4 and KC1. The actual data sets of 125 cases and 2107 cases respectively, without any missing values were considered. The data set is used to create Missing at Random (MAR) data Listwise Deletion(LD), Mean Substitution(MS), Interpolation, Regression with an error term and Expectation-Maximization (EM) approaches were used to compare the effects of the various techniques.

Keywords: Missing data, Imputation, Missing Data Techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1644

7310 Cluster Analysis for the Statistical Modeling of Aesthetic Judgment Data Related to Comics Artists

Authors: George E. Tsekouras, Evi Sampanikou

Abstract:

We compare three categorical data clustering algorithms with respect to the problem of classifying cultural data related to the aesthetic judgment of comics artists. Such a classification is very important in Comics Art theory since the determination of any classes of similarities in such kind of data will provide to art-historians very fruitful information of Comics Art-s evolution. To establish this, we use a categorical data set and we study it by employing three categorical data clustering algorithms. The performances of these algorithms are compared each other, while interpretations of the clustering results are also given.

Keywords: Aesthetic judgment, comics artists, cluster analysis, categorical data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1617

7309 IoT Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework

Authors: Femi Elegbeleye, Seani Rananga

Abstract:

This paper focused on cost effective storage architecture using fog and cloud data storage gateway, and presented the design of the framework for the data privacy model and data analytics framework on a real-time analysis when using machine learning method. The paper began with the system analysis, system architecture and its component design, as well as the overall system operations. Several results obtained from this study on data privacy models show that when two or more data privacy models are integrated via a fog storage gateway, we often have more secure data. Our main focus in the study is to design a framework for the data privacy model, data storage, and real-time analytics. This paper also shows the major system components and their framework specification. And lastly, the overall research system architecture was shown, including its structure, and its interrelationships.

Keywords: IoT, fog storage, cloud storage, data analysis, data privacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 188

7308 Noise Removal from Surface Respiratory EMG Signal

Authors: Slim Yacoub, Kosai Raoof

Abstract:

The aim of this study was to remove the two principal noises which disturb the surface electromyography signal (Diaphragm). These signals are the electrocardiogram ECG artefact and the power line interference artefact. The algorithm proposed focuses on a new Lean Mean Square (LMS) Widrow adaptive structure. These structures require a reference signal that is correlated with the noise contaminating the signal. The noise references are then extracted : first with a noise reference mathematically constructed using two different cosine functions; 50Hz (the fundamental) function and 150Hz (the first harmonic) function for the power line interference and second with a matching pursuit technique combined to an LMS structure for the ECG artefact estimation. The two removal procedures are attained without the use of supplementary electrodes. These techniques of filtering are validated on real records of surface diaphragm electromyography signal. The performance of the proposed methods was compared with already conducted research results.

Keywords: Surface EMG, Adaptive, Matching Pursuit, Powerline interference.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4304

7307 Comparative Analysis of Two Approaches to Joint Signal Detection, ToA and AoA Estimation in Multi-Element Antenna Arrays

Authors: Olesya Bolkhovskaya, Alexey Davydov, Alexander Maltsev

Abstract:

In this paper two approaches to joint signal detection, time of arrival (ToA) and angle of arrival (AoA) estimation in multi-element antenna array are investigated. Two scenarios were considered: first one, when the waveform of the useful signal is known a priori and, second one, when the waveform of the desired signal is unknown. For first scenario, the antenna array signal processing based on multi-element matched filtering (MF) with the following non-coherent detection scheme and maximum likelihood (ML) parameter estimation blocks is exploited. For second scenario, the signal processing based on the antenna array elements covariance matrix estimation with the following eigenvector analysis and ML parameter estimation blocks is applied. The performance characteristics of both signal processing schemes are thoroughly investigated and compared for different useful signals and noise parameters.

Keywords: Antenna array, signal detection, ToA, AoA estimation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2026

7306 The Impact of System and Data Quality on Organizational Success in the Kingdom of Bahrain

Authors: Amal M. Alrayes

Abstract:

Data and system quality play a central role in organizational success, and the quality of any existing information system has a major influence on the effectiveness of overall system performance. Given the importance of system and data quality to an organization, it is relevant to highlight their importance on organizational performance in the Kingdom of Bahrain. This research aims to discover whether system quality and data quality are related, and to study the impact of system and data quality on organizational success. A theoretical model based on previous research is used to show the relationship between data and system quality, and organizational impact. We hypothesize, first, that system quality is positively associated with organizational impact, secondly that system quality is positively associated with data quality, and finally that data quality is positively associated with organizational impact. A questionnaire was conducted among public and private organizations in the Kingdom of Bahrain. The results show that there is a strong association between data and system quality, that affects organizational success.

Keywords: Data quality, performance, system quality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2095

7305 Integration of Multi-Source Data to Monitor Coral Biodiversity

Authors: K. Jitkue, W. Srisang, C. Yaiprasert, K. Jaroensutasinee, M. Jaroensutasinee

Abstract:

This study aims at using multi-source data to monitor coral biodiversity and coral bleaching. We used coral reef at Racha Islands, Phuket as a study area. There were three sources of data: coral diversity, sensor based data and satellite data.

Keywords: Coral reefs, Remote sensing, Sea surfacetemperatue, Satellite imagery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1527

7304 Decision Support System Based on Data Warehouse

Authors: Yang Bao, LuJing Zhang

Abstract:

Typical Intelligent Decision Support System is 4-based, its design composes of Data Warehouse, Online Analytical Processing, Data Mining and Decision Supporting based on models, which is called Decision Support System Based on Data Warehouse (DSSBDW). This way takes ETL,OLAP and DM as its implementing means, and integrates traditional model-driving DSS and data-driving DSS into a whole. For this kind of problem, this paper analyzes the DSSBDW architecture and DW model, and discusses the following key issues: ETL designing and Realization; metadata managing technology using XML; SQL implementing, optimizing performance, data mapping in OLAP; lastly, it illustrates the designing principle and method of DW in DSSBDW.

Keywords: Decision Support System, Data Warehouse, Data Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3838

7303 A New History Based Method to Handle the Recurring Concept Shifts in Data Streams

Authors: Hossein Morshedlou, Ahmad Abdollahzade Barforoush

Abstract:

Recent developments in storage technology and networking architectures have made it possible for broad areas of applications to rely on data streams for quick response and accurate decision making. Data streams are generated from events of real world so existence of associations, which are among the occurrence of these events in real world, among concepts of data streams is logical. Extraction of these hidden associations can be useful for prediction of subsequent concepts in concept shifting data streams. In this paper we present a new method for learning association among concepts of data stream and prediction of what the next concept will be. Knowing the next concept, an informed update of data model will be possible. The results of conducted experiments show that the proposed method is proper for classification of concept shifting data streams.

Keywords: Data Stream, Classification, Concept Shift, History.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1264

7302 Incremental Learning of Independent Topic Analysis

Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

Abstract:

In this paper, we present a method of applying Independent Topic Analysis (ITA) to increasing the number of document data. The number of document data has been increasing since the spread of the Internet. ITA was presented as one method to analyze the document data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis (ICA). ICA is a technique in the signal processing; however, it is difficult to apply the ITA to increasing number of document data. Because ITA must use the all document data so temporal and spatial cost is very high. Therefore, we present Incremental ITA which extracts the independent topics from increasing number of document data. Incremental ITA is a method of updating the independent topics when the document data is added after extracted the independent topics from a just previous the data. In addition, Incremental ITA updates the independent topics when the document data is added. And we show the result applied Incremental ITA to benchmark datasets.

Keywords: Text mining, topic extraction, independent, incremental, independent component analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1028

7301 A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data

Authors: H. Baazaoui Zghal, S. Faiz, H. Ben Ghezala

Abstract:

Data mining is an extraordinarily demanding field referring to extraction of implicit knowledge and relationships, which are not explicitly stored in databases. A wide variety of methods of data mining have been introduced (classification, characterization, generalization...). Each one of these methods includes more than algorithm. A system of data mining implies different user categories,, which mean that the user-s behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory end, which one for a decisional end, and how can they collaborate and communicate. Agent paradigm presents a new way of conception and realizing of data mining system. The purpose is to combine different algorithms of data mining to prepare elements for decision-makers, benefiting from the possibilities offered by the multi-agent systems. In this paper the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is made on spatial data. Principal results will be presented.

Keywords: Databases, data mining, multi-agent, spatial datamart.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2025

7300 Volterra Filtering Techniques for Removal of Gaussian and Mixed Gaussian-Impulse Noise

Authors: M. B. Meenavathi, K. Rajesh

Abstract:

In this paper, we propose a new class of Volterra series based filters for image enhancement and restoration. Generally the linear filters reduce the noise and cause blurring at the edges. Some nonlinear filters based on median operator or rank operator deal with only impulse noise and fail to cancel the most common Gaussian distributed noise. A class of second order Volterra filters is proposed to optimize the trade-off between noise removal and edge preservation. In this paper, we consider both the Gaussian and mixed Gaussian-impulse noise to test the robustness of the filter. Image enhancement and restoration results using the proposed Volterra filter are found to be superior to those obtained with standard linear and nonlinear filters.

Keywords: Gaussian noise, Image enhancement, Imagerestoration, Linear filters, Nonlinear filters, Volterra series.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2711

7299 Latent Topic Based Medical Data Classification

Authors: Jian-hua Yeh, Shi-yi Kuo

Abstract:

This paper discusses the classification process for medical data. In this paper, we use the data from ACM KDDCup 2008 to demonstrate our classification process based on latent topic discovery. In this data set, the target set and outliers are quite different in their nature: target set is only 0.6% size in total, while the outliers consist of 99.4% of the data set. We use this data set as an example to show how we dealt with this extremely biased data set with latent topic discovery and noise reduction techniques. Our experiment faces two major challenge: (1) extremely distributed outliers, and (2) positive samples are far smaller than negative ones. We try to propose a suitable process flow to deal with these issues and get a best AUC result of 0.98.

Keywords: classification, latent topics, outlier adjustment, feature scaling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1625