Search results for: data filtering
7182 A Generalised Relational Data Model
Authors: Georgia Garani
Abstract:
A generalised relational data model is formalised for the representation of data with nested structure of arbitrary depth. A recursive algebra for the proposed model is presented. All the operations are formally defined. The proposed model is proved to be a superset of the conventional relational model (CRM). The functionality and validity of the model is shown by a prototype implementation that has been undertaken in the functional programming language Miranda.Keywords: nested relations, recursive algebra, recursive nested operations, relational data model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15597181 WiFi Data Offloading: Bundling Method in a Canvas Business Model
Authors: Majid Mokhtarnia, Alireza Amini
Abstract:
Mobile operators deal with increasing in the data traffic as a critical issue. As a result, a vital responsibility of the operators is to deal with such a trend in order to create added values. This paper addresses a bundling method in a Canvas business model in a WiFi Data Offloading (WDO) strategy by which some elements of the model may be affected. In the proposed method, it is supposed to sell a number of data packages for subscribers in which there are some packages with a free given volume of data-offloaded WiFi complimentary. The paper on hands analyses this method in the views of attractiveness and profitability. The results demonstrate that the quality of implementation of the WDO strongly affects the final result and helps the decision maker to make the best one.
Keywords: Bundling, canvas business model, telecommunication, WiFi Data Offloading.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8907180 Big Data Analytics and Data Security in the Cloud via Fully Homomorphic Encryption
Authors: Victor Onomza Waziri, John K. Alhassan, Idris Ismaila, Moses Noel Dogonyaro
Abstract:
This paper describes the problem of building secure computational services for encrypted information in the Cloud Computing without decrypting the encrypted data; therefore, it meets the yearning of computational encryption algorithmic aspiration model that could enhance the security of big data for privacy, confidentiality, availability of the users. The cryptographic model applied for the computational process of the encrypted data is the Fully Homomorphic Encryption Scheme. We contribute a theoretical presentations in a high-level computational processes that are based on number theory and algebra that can easily be integrated and leveraged in the Cloud computing with detail theoretic mathematical concepts to the fully homomorphic encryption models. This contribution enhances the full implementation of big data analytics based cryptographic security algorithm.
Keywords: Data Analytics, Security, Privacy, Bootstrapping, and Fully Homomorphic Encryption Scheme.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 34587179 Hierarchical Clustering Algorithms in Data Mining
Authors: Z. Abdullah, A. R. Hamdan
Abstract:
Clustering is a process of grouping objects and data into groups of clusters to ensure that data objects from the same cluster are identical to each other. Clustering algorithms in one of the area in data mining and it can be classified into partition, hierarchical, density based and grid based. Therefore, in this paper we do survey and review four major hierarchical clustering algorithms called CURE, ROCK, CHAMELEON and BIRCH. The obtained state of the art of these algorithms will help in eliminating the current problems as well as deriving more robust and scalable algorithms for clustering.Keywords: Clustering, method, algorithm, hierarchical, survey.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 33767178 Iterative Clustering Algorithm for Analyzing Temporal Patterns of Gene Expression
Authors: Seo Young Kim, Jae Won Lee, Jong Sung Bae
Abstract:
Microarray experiments are information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. For biologists, a key aim when analyzing microarray data is to group genes based on the temporal patterns of their expression levels. In this paper, we used an iterative clustering method to find temporal patterns of gene expression. We evaluated the performance of this method by applying it to real sporulation data and simulated data. The patterns obtained using the iterative clustering were found to be superior to those obtained using existing clustering algorithms.Keywords: Clustering, microarray experiment, temporal pattern of gene expression data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13557177 Effective Software-Based Solution for Processing Mass Downstream Data in Interactive Push VOD System
Authors: Ni Hong, Wu Guobin, Wu Gang, Pan Liang
Abstract:
Interactive push VOD system is a new kind of system that incorporates push technology and interactive technique. It can push movies to users at high speeds at off-peak hours for optimal network usage so as to save bandwidth. This paper presents effective software-based solution for processing mass downstream data at terminals of interactive push VOD system, where the service can download movie according to a viewer-s selection. The downstream data is divided into two catalogs: (1) the carousel data delivered according to DSM-CC protocol; (2) IP data delivered according to Euro-DOCSIS protocol. In order to accelerate download speed and reduce data loss rate at terminals, this software strategy introduces caching, multi-thread and resuming mechanisms. The experiments demonstrate advantages of the software-based solution.Keywords: DSM-CC, data carousel, Euro-DOCSIS, push VOD.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14897176 Approaches and Schemes for Storing DTD-Independent XML Data in Relational Databases
Authors: Mehdi Emadi, Masoud Rahgozar, Adel Ardalan, Alireza Kazerani, Mohammad Mahdi Ariyan
Abstract:
The volume of XML data exchange is explosively increasing, and the need for efficient mechanisms of XML data management is vital. Many XML storage models have been proposed for storing XML DTD-independent documents in relational database systems. Benchmarking is the best way to highlight pros and cons of different approaches. In this study, we use a common benchmarking scheme, known as XMark to compare the most cited and newly proposed DTD-independent methods in terms of logical reads, physical I/O, CPU time and duration. We show the effect of Label Path, extracting values and storing in another table and type of join needed for each method's query answering.Keywords: XML Data Management, XPath, DTD-IndependentXML Data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18817175 Approaches and Schemes for Storing DTDIndependent XML Data in Relational Databases
Authors: Mehdi Emadi, Masoud Rahgozar, Adel Ardalan, Alireza Kazerani, Mohammad Mahdi Ariyan
Abstract:
The volume of XML data exchange is explosively increasing, and the need for efficient mechanisms of XML data management is vital. Many XML storage models have been proposed for storing XML DTD-independent documents in relational database systems. Benchmarking is the best way to highlight pros and cons of different approaches. In this study, we use a common benchmarking scheme, known as XMark to compare the most cited and newly proposed DTD-independent methods in terms of logical reads, physical I/O, CPU time and duration. We show the effect of Label Path, extracting values and storing in another table and type of join needed for each method-s query answering.Keywords: XML Data Management, XPath, DTD-Independent XML Data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14537174 Analysis of Urban Population Using Twitter Distribution Data: Case Study of Makassar City, Indonesia
Authors: Yuyun Wabula, B. J. Dewancker
Abstract:
In the past decade, the social networking app has been growing very rapidly. Geolocation data is one of the important features of social media that can attach the user's location coordinate in the real world. This paper proposes the use of geolocation data from the Twitter social media application to gain knowledge about urban dynamics, especially on human mobility behavior. This paper aims to explore the relation between geolocation Twitter with the existence of people in the urban area. Firstly, the study will analyze the spread of people in the particular area, within the city using Twitter social media data. Secondly, we then match and categorize the existing place based on the same individuals visiting. Then, we combine the Twitter data from the tracking result and the questionnaire data to catch the Twitter user profile. To do that, we used the distribution frequency analysis to learn the visitors’ percentage. To validate the hypothesis, we compare it with the local population statistic data and land use mapping released by the city planning department of Makassar local government. The results show that there is the correlation between Twitter geolocation and questionnaire data. Thus, integration the Twitter data and survey data can reveal the profile of the social media users.
Keywords: Geolocation, Twitter, distribution analysis, human mobility.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11927173 Database Compression for Intelligent On-board Vehicle Controllers
Authors: Ágoston Winkler, Sándor Juhász, Zoltán Benedek
Abstract:
The vehicle fleet of public transportation companies is often equipped with intelligent on-board passenger information systems. A frequently used but time and labor-intensive way for keeping the on-board controllers up-to-date is the manual update using different memory cards (e.g. flash cards) or portable computers. This paper describes a compression algorithm that enables data transmission using low bandwidth wireless radio networks (e.g. GPRS) by minimizing the amount of data traffic. In typical cases it reaches a compression rate of an order of magnitude better than that of the general purpose compressors. Compressed data can be easily expanded by the low-performance controllers, too.
Keywords: Data analysis, data compression, differentialencoding, run-length encoding, vehicle control.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15677172 EUDIS-An Encryption Scheme for User-Data Security in Public Networks
Authors: S. Balaji, M. Rajaram
Abstract:
The method of introducing the proxy interpretation for sending and receiving requests increase the capability of the server and our approach UDIV (User-Data Identity Security) to solve the data and user authentication without extending size of the data makes better than hybrid IDS (Intrusion Detection System). And at the same time all the security stages we have framed have to pass through less through that minimize the response time of the request. Even though an anomaly detected, before rejecting it the proxy extracts its identity to prevent it to enter into system. In case of false anomalies, the request will be reshaped and transformed into legitimate request for further response. Finally we are holding the normal and abnormal requests in two different queues with own priorities.
Keywords: IDS, Data & User authentication, UDIS.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18547171 An Analysis of Genetic Algorithm Based Test Data Compression Using Modified PRL Coding
Authors: K. S. Neelukumari, K. B. Jayanthi
Abstract:
In this paper genetic based test data compression is targeted for improving the compression ratio and for reducing the computation time. The genetic algorithm is based on extended pattern run-length coding. The test set contains a large number of X value that can be effectively exploited to improve the test data compression. In this coding method, a reference pattern is set and its compatibility is checked. For this process, a genetic algorithm is proposed to reduce the computation time of encoding algorithm. This coding technique encodes the 2n compatible pattern or the inversely compatible pattern into a single test data segment or multiple test data segment. The experimental result shows that the compression ratio and computation time is reduced.Keywords: Backtracking, test data compression (TDC), x-filling, x-propagating and genetic algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18697170 Parallel Vector Processing Using Multi Level Orbital DATA
Authors: Nagi Mekhiel
Abstract:
Many applications use vector operations by applying single instruction to multiple data that map to different locations in conventional memory. Transferring data from memory is limited by access latency and bandwidth affecting the performance gain of vector processing. We present a memory system that makes all of its content available to processors in time so that processors need not to access the memory, we force each location to be available to all processors at a specific time. The data move in different orbits to become available to other processors in higher orbits at different time. We use this memory to apply parallel vector operations to data streams at first orbit level. Data processed in the first level move to upper orbit one data element at a time, allowing a processor in that orbit to apply another vector operation to deal with serial code limitations inherited in all parallel applications and interleaved it with lower level vector operations.Keywords: Memory organization, parallel processors, serial code, vector processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10627169 Improving the Shunt Active Power Filter Performance Using Synchronous Reference Frame PI Based Controller with Anti-Windup Scheme
Authors: Consalva J. Msigwa, Beda J. Kundy, Bakari M. M. Mwinyiwiwa
Abstract:
In this paper the reference current for Voltage Source Converter (VSC) of the Shunt Active Power Filter (SAPF) is generated using Synchronous Reference Frame method, incorporating the PI controller with anti-windup scheme. The proposed method improves the harmonic filtering by compensating the winding up phenomenon caused by the integral term of the PI controller. Using Reference Frame Transformation, the current is transformed from om a - b - c stationery frame to rotating 0 - d - q frame. Using the PI controller, the current in the 0 - d - q frame is controlled to get the desired reference signal. A controller with integral action combined with an actuator that becomes saturated can give some undesirable effects. If the control error is so large that the integrator saturates the actuator, the feedback path becomes ineffective because the actuator will remain saturated even if the process output changes. The integrator being an unstable system may then integrate to a very large value, the phenomenon known as integrator windup. Implementing the integrator anti-windup circuit turns off the integrator action when the actuator saturates, hence improving the performance of the SAPF and dynamically compensating harmonics in the power network. In this paper the system performance is examined with Shunt Active Power Filter simulation model.Keywords: Phase Locked Loop (PLL), Voltage SourceConverter (VSC), Shunt Active Power Filter (SAPF), PI, Pulse WidthModulation (PWM).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15677168 An Investigation into the Application of Artificial Neural Networks to the Prediction of Injuries in Sport
Authors: J. McCullagh, T. Whitfort
Abstract:
Artificial Neural Networks (ANNs) have been used successfully in many scientific, industrial and business domains as a method for extracting knowledge from vast amounts of data. However the use of ANN techniques in the sporting domain has been limited. In professional sport, data is stored on many aspects of teams, games, training and players. Sporting organisations have begun to realise that there is a wealth of untapped knowledge contained in the data and there is great interest in techniques to utilise this data. This study will use player data from the elite Australian Football League (AFL) competition to train and test ANNs with the aim to predict the onset of injuries. The results demonstrate that an accuracy of 82.9% was achieved by the ANNs’ predictions across all examples with 94.5% of all injuries correctly predicted. These initial findings suggest that ANNs may have the potential to assist sporting clubs in the prediction of injuries.Keywords: Artificial Neural Networks, data, injuries, sport
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28887167 Analysis of Medical Data using Data Mining and Formal Concept Analysis
Authors: Anamika Gupta, Naveen Kumar, Vasudha Bhatnagar
Abstract:
This paper focuses on analyzing medical diagnostic data using classification rules in data mining and context reduction in formal concept analysis. It helps in finding redundancies among the various medical examination tests used in diagnosis of a disease. Classification rules have been derived from positive and negative association rules using the Concept lattice structure of the Formal Concept Analysis. Context reduction technique given in Formal Concept Analysis along with classification rules has been used to find redundancies among the various medical examination tests. Also it finds out whether expensive medical tests can be replaced by some cheaper tests.
Keywords: Data Mining, Formal Concept Analysis, Medical Data, Negative Classification Rules.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17377166 Data Transmission Reliability in Short Message Integrated Distributed Monitoring Systems
Authors: Sui Xin, Li Chunsheng, Tian Di
Abstract:
Short message integrated distributed monitoring systems (SM-DMS) are growing rapidly in wireless communication applications in various areas, such as electromagnetic field (EMF) management, wastewater monitoring, and air pollution supervision, etc. However, delay in short messages often makes the data embedded in SM-DMS transmit unreliably. Moreover, there are few regulations dealing with this problem in SMS transmission protocols. In this study, based on the analysis of the command and data requirements in the SM-DMS, we developed a processing model for the control center to solve the delay problem in data transmission. Three components of the model: the data transmission protocol, the receiving buffer pool method, and the timer mechanism were described in detail. Discussions on adjusting the threshold parameter in the timer mechanism were presented for the adaptive performance during the runtime of the SM-DMS. This model optimized the data transmission reliability in SM-DMS, and provided a supplement to the data transmission reliability protocols at the application level.
Keywords: Delay, SMS, reliability, distributed monitoringsystem (DMS), wireless communication.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17047165 Data-organization Before Learning Multi-Entity Bayesian Networks Structure
Authors: H. Bouhamed, A. Rebai, T. Lecroq, M. Jaoua
Abstract:
The objective of our work is to develop a new approach for discovering knowledge from a large mass of data, the result of applying this approach will be an expert system that will serve as diagnostic tools of a phenomenon related to a huge information system. We first recall the general problem of learning Bayesian network structure from data and suggest a solution for optimizing the complexity by using organizational and optimization methods of data. Afterward we proposed a new heuristic of learning a Multi-Entities Bayesian Networks structures. We have applied our approach to biological facts concerning hereditary complex illnesses where the literatures in biology identify the responsible variables for those diseases. Finally we conclude on the limits arched by this work.
Keywords: Data-organization, data-optimization, automatic knowledge discovery, Multi-Entities Bayesian networks, score merging.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16117164 Data Gathering Protocols for Wireless Sensor Networks
Authors: Dhinu Johnson, Gurdip Singh
Abstract:
Sensor network applications are often data centric and involve collecting data from a set of sensor nodes to be delivered to various consumers. Typically, nodes in a sensor network are resource-constrained, and hence the algorithms operating in these networks must be efficient. There may be several algorithms available implementing the same service, and efficient considerations may require a sensor application to choose the best suited algorithm. In this paper, we present a systematic evaluation of a set of algorithms implementing the data gathering service. We propose a modular infrastructure for implementing such algorithms in TOSSIM with separate configurable modules for various tasks such as interest propagation, data propagation, aggregation, and path maintenance. By appropriately configuring these modules, we propose a number of data gathering algorithms, each of which incorporates a different set of heuristics for optimizing performance. We have performed comprehensive experiments to evaluate the effectiveness of these heuristics, and we present results from our experimentation efforts.Keywords: Data Centric Protocols, Shortest Paths, Sensor networks, Message passing systems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14447163 Measured versus Default Interstate Traffic Data in New Mexico, USA
Authors: M. A. Hasan, M. R. Islam, R. A. Tarefder
Abstract:
This study investigates how the site specific traffic data differs from the Mechanistic Empirical Pavement Design Software default values. Two Weigh-in-Motion (WIM) stations were installed in Interstate-40 (I-40) and Interstate-25 (I-25) to developed site specific data. A computer program named WIM Data Analysis Software (WIMDAS) was developed using Microsoft C-Sharp (.Net) for quality checking and processing of raw WIM data. A complete year data from November 2013 to October 2014 was analyzed using the developed WIM Data Analysis Program. After that, the vehicle class distribution, directional distribution, lane distribution, monthly adjustment factor, hourly distribution, axle load spectra, average number of axle per vehicle, axle spacing, lateral wander distribution, and wheelbase distribution were calculated. Then a comparative study was done between measured data and AASHTOWare default values. It was found that the measured general traffic inputs for I-40 and I-25 significantly differ from the default values.Keywords: AASHTOWare, Traffic, Weigh-in-Motion, Axle load Distribution.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16997162 Energy Efficient In-Network Data Processing in Sensor Networks
Authors: Prakash G L, Thejaswini M, S H Manjula, K R Venugopal, L M Patnaik
Abstract:
The Sensor Network consists of densely deployed sensor nodes. Energy optimization is one of the most important aspects of sensor application design. Data acquisition and aggregation techniques for processing data in-network should be energy efficient. Due to the cross-layer design, resource-limited and noisy nature of Wireless Sensor Networks(WSNs), it is challenging to study the performance of these systems in a realistic setting. In this paper, we propose optimizing queries by aggregation of data and data redundancy to reduce energy consumption without requiring all sensed data and directed diffusion communication paradigm to achieve power savings, robust communication and processing data in-network. To estimate the per-node power consumption POWERTossim mica2 energy model is used, which provides scalable and accurate results. The performance analysis shows that the proposed methods overcomes the existing methods in the aspects of energy consumption in wireless sensor networks.Keywords: Data Aggregation, Directed Diffusion, Partial Aggregation, Packet Merging, Query Plan.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18337161 Preliminary Analysis of Energy Efficiency in Data Center: Case Study
Authors: Xiaoshu Lu, Tao Lu, Matias Remes, Martti Viljanen
Abstract:
As the data-driven economy is growing faster than ever and the demand for energy is being spurred, we are facing unprecedented challenges of improving energy efficiency in data centers. Effectively maximizing energy efficiency or minimising the cooling energy demand is becoming pervasive for data centers. This paper investigates overall energy consumption and the energy efficiency of cooling system for a data center in Finland as a case study. The power, cooling and energy consumption characteristics and operation condition of facilities are examined and analysed. Potential energy and cooling saving opportunities are identified and further suggestions for improving the performance of cooling system are put forward. Results are presented as a comprehensive evaluation of both the energy performance and good practices of energy efficient cooling operations for the data center. Utilization of an energy recovery concept for cooling system is proposed. The conclusion we can draw is that even though the analysed data center demonstrated relatively high energy efficiency, based on its power usage effectiveness value, there is still a significant potential for energy saving from its cooling systems.Keywords: Data center, case study, cooling system, energyefficiency.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15437160 Multidimensional Visualization Tools for Analysis of Expression Data
Authors: Urska Cvek, Marjan Trutschl, Randolph Stone II, Zanobia Syed, John L. Clifford, Anita L. Sabichi
Abstract:
Expression data analysis is based mostly on the statistical approaches that are indispensable for the study of biological systems. Large amounts of multidimensional data resulting from the high-throughput technologies are not completely served by biostatistical techniques and are usually complemented with visual, knowledge discovery and other computational tools. In many cases, in biological systems we only speculate on the processes that are causing the changes, and it is the visual explorative analysis of data during which a hypothesis is formed. We would like to show the usability of multidimensional visualization tools and promote their use in life sciences. We survey and show some of the multidimensional visualization tools in the process of data exploration, such as parallel coordinates and radviz and we extend them by combining them with the self-organizing map algorithm. We use a time course data set of transitional cell carcinoma of the bladder in our examples. Analysis of data with these tools has the potential to uncover additional relationships and non-trivial structures.Keywords: microarrays, visualization, parallel coordinates, radviz, self-organizing maps.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25087159 A Multi-Agent Framework for Data Mining
Authors: Kamal Ali Albashiri, Khaled Ahmed Kadouh
Abstract:
A generic and extendible Multi-Agent Data Mining (MADM) framework, MADMF (the Multi-Agent Data Mining Framework) is described. The central feature of the framework is that it avoids the use of agreed meta-language formats by supporting a framework of wrappers. The advantage offered is that the framework is easily extendible, so that further data agents and mining agents can simply be added to the framework. A demonstration MADMF framework is currently available. The paper includes details of the MADMF architecture and the wrapper principle incorporated into it. A full description and evaluation of the framework-s operation is provided by considering two MADM scenarios.Keywords: Multi-Agent Data Mining (MADM), Frequent Itemsets, Meta ARM, Association Rule Mining, Classifier generator.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20747158 The Relevance of Data Warehousing and Data Mining in the Field of Evidence-based Medicine to Support Healthcare Decision Making
Authors: Nevena Stolba, A Min Tjoa
Abstract:
Evidence-based medicine is a new direction in modern healthcare. Its task is to prevent, diagnose and medicate diseases using medical evidence. Medical data about a large patient population is analyzed to perform healthcare management and medical research. In order to obtain the best evidence for a given disease, external clinical expertise as well as internal clinical experience must be available to the healthcare practitioners at right time and in the right manner. External evidence-based knowledge can not be applied directly to the patient without adjusting it to the patient-s health condition. We propose a data warehouse based approach as a suitable solution for the integration of external evidence-based data sources into the existing clinical information system and data mining techniques for finding appropriate therapy for a given patient and a given disease. Through integration of data warehousing, OLAP and data mining techniques in the healthcare area, an easy to use decision support platform, which supports decision making process of care givers and clinical managers, is built. We present three case studies, which show, that a clinical data warehouse that facilitates evidence-based medicine is a reliable, powerful and user-friendly platform for strategic decision making, which has a great relevance for the practice and acceptance of evidence-based medicine.
Keywords: data mining, data warehousing, decision-support systems, evidence-based medicine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 38117157 Investigation on Performance of Change Point Algorithm in Time Series Dynamical Regimes and Effect of Data Characteristics
Authors: Farhad Asadi, Mohammad Javad Mollakazemi
Abstract:
In this paper, Bayesian online inference in models of data series are constructed by change-points algorithm, which separated the observed time series into independent series and study the change and variation of the regime of the data with related statistical characteristics. variation of statistical characteristics of time series data often represent separated phenomena in the some dynamical system, like a change in state of brain dynamical reflected in EEG signal data measurement or a change in important regime of data in many dynamical system. In this paper, prediction algorithm for studying change point location in some time series data is simulated. It is verified that pattern of proposed distribution of data has important factor on simpler and smother fluctuation of hazard rate parameter and also for better identification of change point locations. Finally, the conditions of how the time series distribution effect on factors in this approach are explained and validated with different time series databases for some dynamical system.
Keywords: Time series, fluctuation in statistical characteristics, optimal learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18127156 AudioMine: Medical Data Mining in Heterogeneous Audiology Records
Authors: Shaun Cox, Michael Oakes, Stefan Wermter, Maurice Hawthorne
Abstract:
We report on the results of a pilot study in which a data-mining tool was developed for mining audiology records. The records were heterogeneous in that they contained numeric, category and textual data. The tools developed are designed to observe associations between any field in the records and any other field. The techniques employed were the statistical chi-squared test, and the use of self-organizing maps, an unsupervised neural learning approach.
Keywords: Audiology, data mining, chi-squared, self-organizing maps
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16717155 Fuzzy Types Clustering for Microarray Data
Authors: Seo Young Kim, Tai Myong Choi
Abstract:
The main goal of microarray experiments is to quantify the expression of every object on a slide as precisely as possible, with a further goal of clustering the objects. Recently, many studies have discussed clustering issues involving similar patterns of gene expression. This paper presents an application of fuzzy-type methods for clustering DNA microarray data that can be applied to typical comparisons. Clustering and analyses were performed on microarray and simulated data. The results show that fuzzy-possibility c-means clustering substantially improves the findings obtained by others.Keywords: Clustering, microarray data, Fuzzy-type clustering, Validation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15217154 Robust Regression and its Application in Financial Data Analysis
Authors: Mansoor Momeni, Mahmoud Dehghan Nayeri, Ali Faal Ghayoumi, Hoda Ghorbani
Abstract:
This research is aimed to describe the application of robust regression and its advantages over the least square regression method in analyzing financial data. To do this, relationship between earning per share, book value of equity per share and share price as price model and earning per share, annual change of earning per share and return of stock as return model is discussed using both robust and least square regressions, and finally the outcomes are compared. Comparing the results from the robust regression and the least square regression shows that the former can provide the possibility of a better and more realistic analysis owing to eliminating or reducing the contribution of outliers and influential data. Therefore, robust regression is recommended for getting more precise results in financial data analysis.
Keywords: Financial data analysis, Influential data, Outliers, Robust regression.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19327153 Sources of Water Supply and Water Quality for Local Consumption: The Case Study of Eco-Tourism Village, Suan Luang Sub- District Municipality, Ampawa District, Samut Songkram Province, Thailand
Authors: Paiboon Jeamponk, Tasanee Ponglaa, Patchapon Srisanguan
Abstract:
The aim of this research paper was based on an examination of sources of water supply and water quality for local consumption, conducted at eco- tourism villages of Suan Luang Sub- District Municipality of Amphawa District, Samut Songkram Province. The study incorporated both questionnaire and field work of water testing as the research tool and method. The sample size of 288 households was based on the population of the district, whereas the selected sample water sources were from 60 households: 30 samples were ground water and another 30 were surface water. Degree of heavy metal contamination in the water including copper, iron, manganese, zinc, cadmium and lead was investigated utilizing the Atomic Absorption- Direct Aspiration method. The findings unveiled that 96.0 percent of household water consumption was based on water supply, while the rest on canal, river and rain water. The household behavior of consumption revealed that 47.2 percent of people routinely consumed water without boiling or filtering prior to consumption. The investigation of water supply quality found that the degree of heavy metal contamination including metal, lead, iron, copper, manganese and cadmium met the standards of the Department of Health.
Keywords: Sources of water supply, water quality, water supply.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1855