Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 6925

Search results for: data warehousing

6925 A Soft Systems Methodology Perspective on Data Warehousing Education Improvement

Authors: R. Goede, E. Taylor

Abstract:

This paper demonstrates how the soft systems methodology can be used to improve the delivery of a module in data warehousing for fourth year information technology students. Graduates in information technology needs to have academic skills but also needs to have good practical skills to meet the skills requirements of the information technology industry. In developing and improving current data warehousing education modules one has to find a balance in meeting the expectations of various role players such as the students themselves, industry and academia. The soft systems methodology, developed by Peter Checkland, provides a methodology for facilitating problem understanding from different world views. In this paper it is demonstrated how the soft systems methodology can be used to plan the improvement of data warehousing education for fourth year information technology students.

Keywords: Data warehousing, education, soft systems methodology, stakeholders, systems thinking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1386
6924 Improved Data Warehousing: Lessons Learnt from the Systems Approach

Authors: Roelien Goede

Abstract:

Data warehousing success is not high enough. User dissatisfaction and failure to adhere to time frames and budgets are too common. Most traditional information systems practices are rooted in hard systems thinking. Today, the great systems thinkers are forgotten by information systems developers. A data warehouse is still a system and it is worth investigating whether systems thinkers such as Churchman can enhance our practices today. This paper investigates data warehouse development practices from a systems thinking perspective. An empirical investigation is done in order to understand the everyday practices of data warehousing professionals from a systems perspective. The paper presents a model for the application of Churchman-s systems approach in data warehouse development.

Keywords: Data warehouse development, Information systemsdevelopment, Interpretive case study, Systems thinking

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1314
6923 Issues and Architecture for Supporting Data Warehouse Queries in Web Portals

Authors: Minsoo Lee, Yoon-kyung Lee, Hyejung Yoon, Soo-kyung Song, Sujeong Cheong

Abstract:

Data Warehousing tools have become very popular and currently many of them have moved to Web-based user interfaces to make it easier to access and use the tools. The next step is to enable these tools to be used within a portal framework. The portal framework consists of pages having several small windows that contain individual data warehouse query results. There are several issues that need to be considered when designing the architecture for a portal enabled data warehouse query tool. Some issues need special techniques that can overcome the limitations that are imposed by the nature of data warehouse queries. Issues such as single sign-on, query result caching and sharing, customization, scheduling and authorization need to be considered. This paper discusses such issues and suggests an architecture to support data warehouse queries within Web portal frameworks.

Keywords: Data Warehousing tools, data warehousing queries, web portal frameworks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1847
6922 Generic Data Warehousing for Consumer Electronics Retail Industry

Authors: S. Habte, K. Ouazzane, P. Patel, S. Patel

Abstract:

The dynamic and highly competitive nature of the consumer electronics retail industry means that businesses in this industry are experiencing different decision making challenges in relation to pricing, inventory control, consumer satisfaction and product offerings. To overcome the challenges facing retailers and create opportunities, we propose a generic data warehousing solution which can be applied to a wide range of consumer electronics retailers with a minimum configuration. The solution includes a dimensional data model, a template SQL script, a high level architectural descriptions, ETL tool developed using C#, a set of APIs, and data access tools. It has been successfully applied by ASK Outlets Ltd UK resulting in improved productivity and enhanced sales growth.

Keywords: Consumer electronics retail, dimensional data model, data analysis, generic data warehousing, reporting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 927
6921 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: Data integration, data warehousing, federated architecture, online analytical processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 421
6920 The Relevance of Data Warehousing and Data Mining in the Field of Evidence-based Medicine to Support Healthcare Decision Making

Authors: Nevena Stolba, A Min Tjoa

Abstract:

Evidence-based medicine is a new direction in modern healthcare. Its task is to prevent, diagnose and medicate diseases using medical evidence. Medical data about a large patient population is analyzed to perform healthcare management and medical research. In order to obtain the best evidence for a given disease, external clinical expertise as well as internal clinical experience must be available to the healthcare practitioners at right time and in the right manner. External evidence-based knowledge can not be applied directly to the patient without adjusting it to the patient-s health condition. We propose a data warehouse based approach as a suitable solution for the integration of external evidence-based data sources into the existing clinical information system and data mining techniques for finding appropriate therapy for a given patient and a given disease. Through integration of data warehousing, OLAP and data mining techniques in the healthcare area, an easy to use decision support platform, which supports decision making process of care givers and clinical managers, is built. We present three case studies, which show, that a clinical data warehouse that facilitates evidence-based medicine is a reliable, powerful and user-friendly platform for strategic decision making, which has a great relevance for the practice and acceptance of evidence-based medicine.

Keywords: data mining, data warehousing, decision-support systems, evidence-based medicine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3330
6919 The Comparison of Anchor and Star Schema from a Query Performance Perspective

Authors: Radek Němec

Abstract:

Today's business environment requires that companies have access to highly relevant information in a matter of seconds. Modern Business Intelligence tools rely on data structured mostly in traditional dimensional database schemas, typically represented by star schemas. Dimensional modeling is already recognized as a leading industry standard in the field of data warehousing although several drawbacks and pitfalls were reported. This paper focuses on the analysis of another data warehouse modeling technique - the anchor modeling, and its characteristics in context with the standardized dimensional modeling technique from a query performance perspective. The results of the analysis show information about performance of queries executed on database schemas structured according to principles of each database modeling technique.

Keywords: Data warehousing, anchor modeling, star schema, anchor schema, query performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3003
6918 The Data Mining usage in Production System Management

Authors: Pavel Vazan, Pavol Tanuska, Michal Kebisek

Abstract:

The paper gives the pilot results of the project that is oriented on the use of data mining techniques and knowledge discoveries from production systems through them. They have been used in the management of these systems. The simulation models of manufacturing systems have been developed to obtain the necessary data about production. The authors have developed the way of storing data obtained from the simulation models in the data warehouse. Data mining model has been created by using specific methods and selected techniques for defined problems of production system management. The new knowledge has been applied to production management system. Gained knowledge has been tested on simulation models of the production system. An important benefit of the project has been proposal of the new methodology. This methodology is focused on data mining from the databases that store operational data about the production process.

Keywords: data mining, data warehousing, management of production system, simulation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3195
6917 Towards Development of Solution for Business Process-Oriented Data Analysis

Authors: M. Klimavicius

Abstract:

This paper proposes a modeling methodology for the development of data analysis solution. The Author introduce the approach to address data warehousing issues at the at enterprise level. The methodology covers the process of the requirements eliciting and analysis stage as well as initial design of data warehouse. The paper reviews extended business process model, which satisfy the needs of data warehouse development. The Author considers that the use of business process models is necessary, as it reflects both enterprise information systems and business functions, which are important for data analysis. The Described approach divides development into three steps with different detailed elaboration of models. The Described approach gives possibility to gather requirements and display them to business users in easy manner.

Keywords: Data warehouse, data analysis, business processmanagement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1152
6916 Dimensional Modeling of HIV Data Using Open Source

Authors: Charles D. Otine, Samuel B. Kucel, Lena Trojer

Abstract:

Selecting the data modeling technique for an information system is determined by the objective of the resultant data model. Dimensional modeling is the preferred modeling technique for data destined for data warehouses and data mining, presenting data models that ease analysis and queries which are in contrast with entity relationship modeling. The establishment of data warehouses as components of information system landscapes in many organizations has subsequently led to the development of dimensional modeling. This has been significantly more developed and reported for the commercial database management systems as compared to the open sources thereby making it less affordable for those in resource constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It aims to take advantage of the fact that the most affected regions by the HIV virus are also heavily resource constrained (sub-Saharan Africa) whereas having large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts these were then modeled using two open source dimensional modeling tools. Use of open source would reduce the software costs for dimensional modeling and in turn make data warehousing and data mining more feasible even for those in resource constrained settings but with data available.

Keywords: About Database, Data Mining, Data warehouse, Dimensional Modeling, Open Source.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1674
6915 XML Schema Automatic Matching Solution

Authors: Huynh Quyet Thang, Vo Sy Nam

Abstract:

Schema matching plays a key role in many different applications, such as schema integration, data integration, data warehousing, data transformation, E-commerce, peer-to-peer data management, ontology matching and integration, semantic Web, semantic query processing, etc. Manual matching is expensive and error-prone, so it is therefore important to develop techniques to automate the schema matching process. In this paper, we present a solution for XML schema automated matching problem which produces semantic mappings between corresponding schema elements of given source and target schemas. This solution contributed in solving more comprehensively and efficiently XML schema automated matching problem. Our solution based on combining linguistic similarity, data type compatibility and structural similarity of XML schema elements. After describing our solution, we present experimental results that demonstrate the effectiveness of this approach.

Keywords: XML Schema, Schema Matching, SemanticMatching, Automatic XML Schema Matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1591
6914 Multidimensional Performance Management

Authors: David Wiese

Abstract:

In order to maximize efficiency of an information management platform and to assist in decision making, the collection, storage and analysis of performance-relevant data has become of fundamental importance. This paper addresses the merits and drawbacks provided by the OLAP paradigm for efficiently navigating large volumes of performance measurement data hierarchically. The system managers or database administrators navigate through adequately (re)structured measurement data aiming to detect performance bottlenecks, identify causes for performance problems or assessing the impact of configuration changes on the system and its representative metrics. Of particular importance is finding the root cause of an imminent problem, threatening availability and performance of an information system. Leveraging OLAP techniques, in contrast to traditional static reporting, this is supposed to be accomplished within moderate amount of time and little processing complexity. It is shown how OLAP techniques can help improve understandability and manageability of measurement data and, hence, improve the whole Performance Analysis process.

Keywords: Data Warehousing, OLAP, Multidimensional Navigation, Performance Diagnosis, Performance Management, Performance Tuning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1779
6913 CSOLAP (Continuous Spatial On-Line Analytical Processing)

Authors: Taher Omran Ahmed, Abdullatif Mihdi Buras

Abstract:

Decision support systems are usually based on multidimensional structures which use the concept of hypercube. Dimensions are the axes on which facts are analyzed and form a space where a fact is located by a set of coordinates at the intersections of members of dimensions. Conventional multidimensional structures deal with discrete facts linked to discrete dimensions. However, when dealing with natural continuous phenomena the discrete representation is not adequate. There is a need to integrate spatiotemporal continuity within multidimensional structures to enable analysis and exploration of continuous field data. Research issues that lead to the integration of spatiotemporal continuity in multidimensional structures are numerous. In this paper, we discuss research issues related to the integration of continuity in multidimensional structures, present briefly a multidimensional model for continuous field data. We also define new aggregation operations. The model and the associated operations and measures are validated by a prototype.

Keywords: Continuous Data, Data warehousing, DecisionSupport, SOLAP

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1351
6912 The Study of Cost Accounting in S Company Based On TDABC

Authors: Heng Ma

Abstract:

Third-party warehousing logistics has an important role in the development of external logistics. At present, the third-party logistics in our country is still a new industry, the accounting system has not yet been established, the current financial accounting system of third-party warehousing logistics is mainly in the traditional way of thinking, and only able to provide the total cost information of the entire enterprise during the accounting period, unable to reflect operating indirect cost information. In order to solve the problem of third-party logistics industry cost information distortion, improve the level of logistics cost management, the paper combines theoretical research and case analysis method to reflect cost allocation by building third-party logistics costing model using Time-Driven Activity-Based Costing(TDABC), and takes S company as an example to account and control the warehousing logistics cost.Based on the idea of “Products consume activities and activities consume resources”, TDABC put time into the main cost driver and use time-consuming equation resources assigned to cost objects. In S company, the objects focuses on three warehouse, engaged with warehousing and transportation (the second warehouse, transport point) service. These three warehouse respectively including five departments, Business Unit, Production Unit, Settlement Center, Security Department and Equipment Division, the activities in these departments are classified by in-out of storage forecast, in-out of storage or transit and safekeeping work. By computing capacity cost rate, building the time-consuming equation, the paper calculates the final operation cost so as to reveal the real cost.The numerical analysis results show that the TDABC can accurately reflect the cost allocation of service customers and reveal the spare capacity cost of resource center, verifies the feasibility and validity of TDABC in third-party logistics industry cost accounting. It inspires enterprises focus on customer relationship management and reduces idle cost to strengthen the cost management of third-party logistics enterprises.

Keywords: Third-party logistics enterprises, TDABC, cost management, S company.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2142
6911 Futuristic Black Box Design Considerations and Global Networking for Real Time Monitoring of Flight Performance Parameters

Authors: K. Parandhama Gowd

Abstract:

The aim of this research paper is to conceptualize, discuss, analyze and propose alternate design methodologies for futuristic Black Box for flight safety. The proposal also includes global networking concepts for real time surveillance and monitoring of flight performance parameters including GPS parameters. It is expected that this proposal will serve as a failsafe real time diagnostic tool for accident investigation and location of debris in real time. In this paper, an attempt is made to improve the existing methods of flight data recording techniques and improve upon design considerations for futuristic FDR to overcome the trauma of not able to locate the block box. Since modern day communications and information technologies with large bandwidth are available coupled with faster computer processing techniques, the attempt made in this paper to develop a failsafe recording technique is feasible. Further data fusion/data warehousing technologies are available for exploitation.

Keywords: Flight data recorder (FDR), black box, diagnostic tool, global networking, cockpit voice and data recorder (CVDR), air traffic control (ATC), air traffic, telemetry, tracking and control centers ATTTCC).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1151
6910 Big Data: Big Challenges to Privacy and Data Protection

Authors: Abu Bakar Munir, Siti Hajar Mohd Yasin, Firdaus Muhammad-Sukki

Abstract:

This paper seeks to analyse the benefits of big data and more importantly the challenges it pose to the subject of privacy and data protection. First, the nature of big data will be briefly deliberated before presenting the potential of big data in the present days. Afterwards, the issue of privacy and data protection is highlighted before discussing the challenges of implementing this issue in big data. In conclusion, the paper will put forward the debate on the adequacy of the existing legal framework in protecting personal data in the era of big data.

Keywords: Big data, data protection, information, privacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3318
6909 Testing the Performance of Rival Warehousing Policies through Discrete Event Simulation

Authors: João Vilas-Boas, Abdul Suleman, Luis Moreira

Abstract:

This research tested the performance of alternative warehouse designs concerning the picking process. The chosen performance measures were Travel Distance and Total Fulfilment Time. An explanatory case study was built up around a model implemented with SIMUL8. Hypotheses were set by selecting outcomes from the literature survey matching popular empirical findings. 17.4% reductions were found for Total Fulfilment Time and Resource Utilisation. The latter was then used as a proxy for operational efficiency. Literal replication of theoretical data-patterns was considered as an internal validity sign. Assessing the estimated changes benefits ahead of implementation was found to be a contribution to practice.

Keywords: Warehouse discrete-event simulation, Storage policy selection and assessment, Performance evaluation of order picking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1708
6908 Data Preprocessing for Supervised Leaning

Authors: S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas

Abstract:

Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost. If there is much irrelevant and redundant information present or noisy and unreliable data, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set but this is not happened. Thus, we present the most well know algorithms for each step of data pre-processing so that one achieves the best performance for their data set.

Keywords: Data mining, feature selection, data cleaning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4944
6907 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.

Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3759
6906 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2058
6905 Coalescing Data Marts

Authors: N. Parimala, P. Pahwa

Abstract:

OLAP uses multidimensional structures, to provide access to data for analysis. Traditionally, OLAP operations are more focused on retrieving data from a single data mart. An exception is the drill across operator. This, however, is restricted to retrieving facts on common dimensions of the multiple data marts. Our concern is to define further operations while retrieving data from multiple data marts. Towards this, we have defined six operations which coalesce data marts. While doing so we consider the common as well as the non-common dimensions of the data marts.

Keywords: Data warehouse, Dimension, OLAP, Star Schema.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1326
6904 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1763
6903 Comparative Analysis of Diverse Collection of Big Data Analytics Tools

Authors: S. Vidhya, S. Sarumathi, N. Shanthi

Abstract:

Over the past era, there have been a lot of efforts and studies are carried out in growing proficient tools for performing various tasks in big data. Recently big data have gotten a lot of publicity for their good reasons. Due to the large and complex collection of datasets it is difficult to process on traditional data processing applications. This concern turns to be further mandatory for producing various tools in big data. Moreover, the main aim of big data analytics is to utilize the advanced analytic techniques besides very huge, different datasets which contain diverse sizes from terabytes to zettabytes and diverse types such as structured or unstructured and batch or streaming. Big data is useful for data sets where their size or type is away from the capability of traditional relational databases for capturing, managing and processing the data with low-latency. Thus the out coming challenges tend to the occurrence of powerful big data tools. In this survey, a various collection of big data tools are illustrated and also compared with the salient features.

Keywords: Big data, Big data analytics, Business analytics, Data analysis, Data visualization, Data discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3431
6902 Multi-labeled Data Expressed by a Set of Labels

Authors: Tetsuya Furukawa, Masahiro Kuzunishi

Abstract:

Collected data must be organized to be utilized efficiently, and hierarchical classification of data is efficient approach to organize data. When data is classified to multiple categories or annotated with a set of labels, users request multi-labeled data by giving a set of labels. There are several interpretations of the data expressed by a set of labels. This paper discusses which data is expressed by a set of labels by introducing orders for sets of labels and shows that there are four types of orders, which are characterized by whether the labels of expressed data includes every label of the given set of labels within the range of the set. Desirable properties of the orders, data is also expressed by the higher set of labels and different sets of labels express different data, are discussed for the orders.

Keywords: Classification Hierarchies, Multi-labeled Data, Multiple Classificaiton, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 881
6901 The Comparison of Data Replication in Distributed Systems

Authors: Iman Zangeneh, Mostafa Moradi, Ali Mokhtarbaf

Abstract:

The necessity of ever-increasing use of distributed data in computer networks is obvious for all. One technique that is performed on the distributed data for increasing of efficiency and reliablity is data rplication. In this paper, after introducing this technique and its advantages, we will examine some dynamic data replication. We will examine their characteristies for some overus scenario and the we will propose some suggestion for their improvement.

Keywords: data replication, data hiding, consistency, dynamicdata replication strategy

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1312
6900 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: Clustering, data mining, DBSCAN, k-means, k-medoids, sensor data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1409
6899 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.

Keywords: Big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 979
6898 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2237
6897 Automatic Real-Patient Medical Data De-Identification for Research Purposes

Authors: Petr Vcelak, Jana Kleckova

Abstract:

Our Medicine-oriented research is based on a medical data set of real patients. It is a security problem to share patient private data with peoples other than clinician or hospital staff. We have to remove person identification information from medical data. The medical data without private data are available after a de-identification process for any research purposes. In this paper, we introduce an universal automatic rule-based de-identification application to do all this stuff on an heterogeneous medical data. A patient private identification is replaced by an unique identification number, even in burnedin annotation in pixel data. The identical identification is used for all patient medical data, so it keeps relationships in a data. Hospital can take an advantage of a research feedback based on results.

Keywords: DASTA, De-identification, DICOM, Health Level Seven, Medical data, OCR, Personal data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1332
6896 Analyzing Multi-Labeled Data Based on the Roll of a Concept against a Semantic Range

Authors: Masahiro Kuzunishi, Tetsuya Furukawa, Ke Lu

Abstract:

Classifying data hierarchically is an efficient approach to analyze data. Data is usually classified into multiple categories, or annotated with a set of labels. To analyze multi-labeled data, such data must be specified by giving a set of labels as a semantic range. There are some certain purposes to analyze data. This paper shows which multi-labeled data should be the target to be analyzed for those purposes, and discusses the role of a label against a set of labels by investigating the change when a label is added to the set of labels. These discussions give the methods for the advanced analysis of multi-labeled data, which are based on the role of a label against a semantic range.

Keywords: Classification Hierarchies, Data Analysis, Multilabeled Data, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 933