Search results for: data partitions
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24169

24169 Empirical Study of Partitions Similarity Measures

Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd

Abstract:

This paper investigates and compares the performance of four existing distance and similarity measures between partitions. The partition measures considered are the Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and the independence from the dataset scale. The results show that the Adjusted Rand Index performed well overall with regard to these three intuitions.
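As an illustration of two of the measures named in this abstract, the following minimal sketch computes the Rand Index and the Variation of Information for two labelings of the same items. It uses the standard textbook definitions, not code from the paper; the function names and the example labelings are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log


def rand_index(a, b):
    """Fraction of item pairs on which the two partitions agree.

    A pair "agrees" when both partitions put the two items together,
    or both put them apart.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def variation_of_information(a, b):
    """VI(A, B) = H(A|B) + H(B|A), using natural logarithms."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    vi = 0.0
    for (x, y), n_xy in count_ab.items():
        p_xy = n_xy / n
        # log(p_xy / p_x) contributes to H(B|A); log(p_xy / p_y) to H(A|B).
        vi -= p_xy * (log(p_xy / (count_a[x] / n)) + log(p_xy / (count_b[y] / n)))
    return vi


# Hypothetical labelings of four items.
same_up_to_renaming = ([0, 0, 1, 1], [1, 1, 0, 0])
crossed = ([0, 0, 1, 1], [0, 1, 0, 1])
print(rand_index(*same_up_to_renaming), variation_of_information(*same_up_to_renaming))
print(rand_index(*crossed), variation_of_information(*crossed))
```

Identical partitions (up to relabeling) give a Rand Index of 1 and a VI of 0, while the "crossed" labelings agree on only two of the six pairs.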

Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison.

Procedia PDF Downloads 148
24168 Theoretical Approach and Proof of Concept Implementation of Adaptive Partition Scheduling Module for Linux

Authors: Desislav Andreev, Veselin Stanev

Abstract:

The Linux operating system continues to gain popularity with every passing year. This is due to its open-source license and the great number of distributions covering users’ needs. At first glance, it seems that Linux can be integrated into every type of system: it is already present in personal computers, smartphones, and even some embedded systems such as the Raspberry Pi. However, Linux still does not meet the performance and security requirements to run effectively on a real-time system. Real-time systems are strictly time-constrained: their processes have to execute and finish within strict time intervals. The Completely Fair Scheduler present in Linux does not have such scheduling capabilities and cannot ensure that time-critical processes will execute on time. One way to solve this problem is to implement an Adaptive Partition Scheduler similar to the one present in the QNX Neutrino operating system. This type of scheduling divides the CPU into multiple adaptive partitions, where each partition holds a percentage of CPU usage called a budget, which allows optimal usage of CPU resources and also provides protection against cyber attacks such as Denial of Service. This approach will also benefit systems where functional safety is in high demand, such as instrument clusters in the automotive industry. The purpose of this paper is to present a concept of an Adaptive Partition Scheduler designed for Linux-based operating systems.

Keywords: adaptive partitions, Linux kernel modules, real-time systems, scheduling

Procedia PDF Downloads 70
24167 Clustering Performance Analysis using New Correlation-Based Cluster Validity Indices

Authors: Nathakhun Wiroonsri

Abstract:

There are various cluster validity measures used for evaluating clustering results. One of the main objectives of using these measures is to seek the optimal unknown number of clusters. Some measures work well for clusters with different densities, sizes, and shapes. Yet one weakness that those validity measures share is that they sometimes provide only one clear optimal number of clusters. That number is actually unknown, and there might be more than one potential sub-optimal option that a user may wish to choose depending on the application. We develop two new cluster validity indices based on the correlation between the actual distance between a pair of data points and the distance between the centroids of the clusters in which the two points are located. Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness stated above. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted to compare the proposed validity indices with several well-known ones.

Keywords: clustering algorithm, cluster validity measure, correlation, data partitions, iris data set, marketing, pattern recognition

Procedia PDF Downloads 81
24166 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms

Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang

Abstract:

A bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving bioassay predictions. The first step is instance selection, in which the dataset is divided into training, testing, and validation sets. The second step is discretization, which partitions the data in consideration of accuracy vs. precision. The third step is normalization, where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection, where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness with various machine learning tools, including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combinations of preprocessing steps and machine learning algorithms in producing more consistent and accurate predictions.
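Two of the preprocessing steps described above, discretization and normalization to the range 0 to 1, can be sketched as follows. The abstract does not say which discretization scheme the authors use, so equal-width binning is an assumption here, as are the function names and sample values.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical potency readings.
readings = [2, 4, 6]
print(min_max_normalize(readings))
print(equal_width_bins([0, 1, 2, 3, 4], n_bins=2))
```

More bins preserve more detail but leave fewer samples per bin, which is one way to read the accuracy vs. precision trade-off the abstract mentions.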

Keywords: bioassay, machine learning, preprocessing, virtual screen

Procedia PDF Downloads 250
24165 Earthquake Classification in Molluca Collision Zone Using Conventional Statistical Methods

Authors: H. J. Wattimanela, U. S. Passaribu, A. N. T. Puspito, S. W. Indratno

Abstract:

The Molluca Collision Zone is located at the junction of the Eurasian, Australian, Pacific, and Philippine plates. The Sangihe arc, to the west of the collision zone, and the Halmahera arc, to the east, are in active collision and are convex toward the Molluca Sea. This research analyzes the behavior of earthquake occurrence in the Molluca Collision Zone in terms of the distribution of earthquakes in each partition region, the type of distribution of earthquake occurrence in each partition region, the mean occurrence of earthquakes in each partition region, and the correlation between the partition regions. We count the number of earthquakes using a partition method and analyze their behavior using conventional statistical methods. The data used are shallow earthquakes with magnitudes ≥ 4 on the Richter scale for the period 1964-2013 in the Molluca Collision Zone. From the results, we can classify the partition regions, based on their correlation, into two classes: strong and very strong. This classification can be used for early warning systems in disaster management.

Keywords: molluca collision zone, partition regions, conventional statistical methods, earthquakes, classifications, disaster management

Procedia PDF Downloads 463
24164 Using Self Organizing Feature Maps for Classification in RGB Images

Authors: Hassan Masoumi, Ahad Salimi, Nazanin Barhemmat, Babak Gholami

Abstract:

Artificial neural networks have gained a lot of interest as empirical models because of their powerful representational capacity and their multi-input, multi-output mapping characteristics. In fact, most feed-forward networks with nonlinear nodal functions have been proved to be universal approximators. In this paper, we propose a new supervised method for color image classification based on self-organizing feature maps (SOFM). This algorithm is based on competitive learning. The method partitions the input space using self-organizing feature maps to introduce the concept of local neighborhoods. Our image classification system takes RGB images as input. Experiments with simulated data showed that the separability of classes increased with increasing training time. In addition, the results show that the proposed algorithm is effective for color image classification.

Keywords: classification, SOFM algorithm, neural network, neighborhood, RGB image

Procedia PDF Downloads 445
24163 Sensor Network Routing Optimization by Simulating Eurygaster Life in Wheat Farms

Authors: Fariborz Ahmadi, Hamid Salehi, Khosrow Karimi

Abstract:

A sensor network is a set of sensor nodes that cooperate to perform a predefined task. An important problem in such networks is power consumption. In this paper, an algorithm based on the life of Eurygaster is introduced to minimize the power consumption of the nodes in these networks. In this method, the search space of the problem is divided into several partitions, and each partition is investigated separately. The evaluation results show that our approach is more efficient than other evolutionary algorithms such as the genetic algorithm.

Keywords: evolutionary computation, genetic algorithm, particle swarm optimization, sensor network optimization

Procedia PDF Downloads 394
24162 Diabetes Mellitus and Blood Glucose Variability Increases the 30-day Readmission Rate after Kidney Transplantation

Authors: Harini Chakkera

Abstract:

Background: Inpatient hyperglycemia is an established independent risk factor for hospital readmission among several patient cohorts. This has not been studied after kidney transplantation. Nearly one-third of patients who have undergone a kidney transplant reportedly experience 30-day readmission. Methods: Data on first-time solitary kidney transplantations performed between September 2015 and December 2018 were retrieved. Information was linked to the electronic health record to determine a diagnosis of diabetes mellitus and extract glucometric and insulin therapy data. Univariate logistic regression analysis and the XGBoost algorithm were used to predict 30-day readmission. We report the average performance of the models on the testing set on five bootstrapped partitions of the data to ensure statistical significance. Results: The cohort included 1036 patients who received kidney transplantation, and 224 (22%) experienced 30-day readmission. The machine learning algorithm was able to predict 30-day readmission with an average AUC of 77.3% (95% CI 75.30-79.3%). We observed statistically significant differences in the presence of pretransplant diabetes, inpatient hyperglycemia, inpatient hypoglycemia, and minimum and maximum glucose values among those with higher 30-day readmission rates. The XGBoost model identified the index admission length of stay, the presence of hyper- and hypoglycemia, and recipient and donor BMI values as the most predictive risk factors of 30-day readmission. Additionally, significant variations in the therapeutic management of blood glucose by providers were observed. Conclusions: Suboptimal glucose metrics during hospitalization after kidney transplantation are associated with an increased risk of 30-day hospital readmission. Optimizing hospital blood glucose management, a modifiable factor, after kidney transplantation may reduce the risk of 30-day readmission.
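The abstract does not spell out how its five bootstrapped partitions were built. One common reading, sketched below purely as an assumption, is to draw each training set by sampling indices with replacement and to use the rows that were never drawn (the out-of-bag rows) as the testing set, then average the score across rounds. All names are hypothetical, and `score_fn` stands in for fitting and evaluating a model.

```python
import random


def bootstrap_partitions(n_samples, n_rounds=5, seed=0):
    """Return n_rounds (train, test) index pairs.

    Train indices are drawn with replacement; test indices are the
    out-of-bag rows that were never drawn in that round.
    """
    rng = random.Random(seed)
    partitions = []
    for _ in range(n_rounds):
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(train)
        test = [i for i in range(n_samples) if i not in drawn]
        partitions.append((train, test))
    return partitions


def average_over_partitions(partitions, score_fn):
    """Average a user-supplied score(train_idx, test_idx) over all partitions."""
    scores = [score_fn(train, test) for train, test in partitions]
    return sum(scores) / len(scores)


# Hypothetical usage: the lambda only reports the out-of-bag fraction.
parts = bootstrap_partitions(n_samples=1036)
print(average_over_partitions(parts, lambda train, test: len(test) / 1036))
```

With sampling with replacement, roughly a third of the rows are expected to fall out of bag in each round, so every round has a testing set that does not overlap its training set.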

Keywords: kidney, transplant, diabetes, insulin

Procedia PDF Downloads 45
24161 Approximation to the Hardy Operator on Topological Measure Spaces

Authors: Kairat T. Mynbaev, Elena N. Lomakina

Abstract:

We consider a Hardy-type operator generated by a family of open subsets of a Hausdorff topological space. The family is indexed with non-negative real numbers and is totally ordered. For this operator, we obtain two-sided bounds of its norm, a compactness criterion, and bounds for its approximation numbers. Previously, bounds for its approximation numbers have been established only in the one-dimensional case, while we do not impose any restrictions on the dimension of the Hausdorff space. The bounds for the norm and conditions for compactness earlier have been found using different methods by G. Sinnamon and K. Mynbaev. Our approach is different in that we use domain partitions for all problems under consideration.

Keywords: approximation numbers, boundedness and compactness, multidimensional Hardy operator, Hausdorff topological space

Procedia PDF Downloads 69
24160 Exploration of Various Metrics for Partitioning of Cellular Automata Units for Efficient Reconfiguration of Field Programmable Gate Arrays (FPGAs)

Authors: Peter Tabatt, Christian Siemers

Abstract:

Using FPGA devices to improve the behavior of time-critical parts of embedded systems has been a proven concept for years. With reconfigurable FPGA devices, the logical blocks can be partitioned and grouped into static and dynamic parts. The dynamic parts can be reloaded 'on demand' at runtime. This work uses cellular automata, which are constructed through compilation from (partially restricted) ANSI-C sources, to determine the suitability of various metrics for optimal partitioning. Significant metrics in this case are, for example, the area on the FPGA device for the partition, the pass count for loop constructs, and communication characteristics to other partitions. With successful partitioning, it is possible to use smaller FPGA devices for the same requirements as with non-reconfigurable FPGA devices or, vice versa, to use the same FPGAs for larger programs.

Keywords: reconfigurable FPGA, cellular automata, partitioning, metrics, parallel computing

Procedia PDF Downloads 243
24159 Seismic Performance Assessment of Pre-70 RC Frame Buildings with FEMA P-58

Authors: D. Cardone

Abstract:

Past earthquakes have shown that seismic events may cause large economic losses in buildings. FEMA P-58 provides engineers with a practical tool for the seismic performance assessment of buildings. In this study, FEMA P-58 is applied to two typical Italian pre-1970 reinforced concrete frame buildings, characterized by plain rebars as steel reinforcement and by masonry infills and partitions. Given that suitable tools for these buildings are missing in FEMA P-58, specific fragility curves and loss functions are first developed. Next, building performance is evaluated following a time-based assessment approach. Finally, expected annual losses for the selected buildings are derived and compared with past applications to old RC frame buildings representative of the US building stock.

Keywords: FEMA P-58, RC frame buildings, plain rebars, Masonry infills, fragility functions, loss functions, expected annual loss

Procedia PDF Downloads 299
24158 Data Clustering Algorithm Based on Multi-Objective Periodic Bacterial Foraging Optimization with Two Learning Archives

Authors: Chen Guo, Heng Tang, Ben Niu

Abstract:

Clustering splits objects into different groups based on similarity, so that objects in the same group have higher similarity and objects in different groups have lower similarity. Thus, clustering can be treated as an optimization problem that maximizes the intra-cluster similarity or the inter-cluster dissimilarity. In real-world applications, datasets often have complex characteristics: sparsity, overlap, high dimensionality, etc. For such datasets, simultaneously optimizing two or more objectives can yield better clustering results than optimizing a single objective. However, apart from objective-weighting methods, traditional clustering approaches have difficulty solving multi-objective data clustering problems. For this reason, researchers have investigated evolutionary multi-objective optimization algorithms for optimizing multiple clustering objectives. In this paper, the Data Clustering algorithm based on Multi-objective Periodic Bacterial Foraging Optimization with two Learning Archives (DC-MPBFOLA) is proposed. Specifically, first, to reduce the high computational complexity of the original BFO, periodic BFO is employed as the basic algorithmic framework, and the periodic BFO is then transformed into a multi-objective variant. Second, two learning strategies are proposed based on the two learning archives to guide the bacterial swarm to move in a better direction. On the one hand, the global best is selected from the global learning archive according to the convergence index and diversity index. On the other hand, the personal best is selected from the personal learning archive according to the sum of weighted objectives. A chemotaxis operation is designed according to these learning strategies. Third, an elite learning strategy is designed to provide fresh power to the objects in the two learning archives. When the objects in these two archives do not change for two consecutive times, randomly re-initializing one dimension of the objects can prevent the proposed algorithm from falling into local optima. Fourth, to validate the performance of the proposed algorithm, DC-MPBFOLA is compared with four state-of-the-art evolutionary multi-objective optimization algorithms and one classical clustering algorithm on evaluation indexes of datasets. To further verify the effectiveness and feasibility of the designed strategies in DC-MPBFOLA, variants of DC-MPBFOLA are also proposed. Experimental results demonstrate that DC-MPBFOLA outperforms its competitors on all evaluation indexes and clustering partitions. These results also indicate that the designed strategies positively influence the performance improvement of the original BFO.

Keywords: data clustering, multi-objective optimization, bacterial foraging optimization, learning archives

Procedia PDF Downloads 111
24157 Studying Relationship between Local Geometry of Decision Boundary with Network Complexity for Robustness Analysis with Adversarial Perturbations

Authors: Tushar K. Routh

Abstract:

If inputs are engineered in certain ways, they can degrade the performance of deep neural networks (DNNs) by inducing misclassifications, a phenomenon well known as adversarial attacks, which calls the networks’ robustness into question. Recent studies have revealed a relationship between the vulnerability of such networks and their complexity. In this paper, the distinctive influence of additional convolutional layers on the decision boundaries of several DNN architectures was investigated. To engineer inputs from widely known image datasets such as MNIST, Fashion MNIST, and CIFAR-10, we used the One Step Spectral Attack (OSSA) and Fast Gradient Method (FGM) techniques. The effects of adding layers on the robustness of the architectures have been analyzed. To explain them, the separation width from linear class partitions and the local geometry (curvature) near the decision boundary have been examined. The results reveal that model complexity plays a significant role in adjusting relative distances from margins, as well as the local features of decision boundaries, which impact robustness.

Keywords: DNN robustness, decision boundary, local curvature, network complexity

Procedia PDF Downloads 44
24156 Establishment of Reference Interval for Serum Protein Electrophoresis of Apparently Healthy Adults in Addis Ababa, Ethiopia

Authors: Demiraw Bikila, Tadesse Lejisa, Yosef Tolcha, Chala Bashea, Mehari Meles, Tigist Getahun, Genet Ashebir, Wossene Habtu, Feyissa Challa, Ousman Mohammed, Melkitu Kassaw, Adisu Kebede, Letebrhan G. Egzeabher, Endalkachew Befekadu, Mistire Wolde, Aster Tsegaye

Abstract:

Background: Even though several factors affect reference intervals (RIs), company-derived values are currently in use in many laboratories worldwide. However, little or no data is available regarding serum protein RIs, especially in resource-limited countries like Ethiopia. Objective: To establish a reference interval for serum protein electrophoresis of apparently healthy adults in Addis Ababa, Ethiopia. Method: A cross-sectional study was conducted on a total of 297 apparently healthy adults from April to October 2019 in four selected sub-cities (Akaki, Kirkos, Arada, Yeka) of Addis Ababa, Ethiopia. Laboratory analysis of the collected samples was performed using a Capillarys 2 Flex Piercing analyzer, while statistical analysis was done using SPSS version 23 and MedCalc software. The Mann-Whitney test was used to check the need for partitioning. The non-parametric method of reference range establishment was applied as per CLSI guideline EP28-A3c. Result: The established RIs for males and females, respectively, were: Albumin 53.83-64.59%, 52.24-63.55%; Alpha-1 globulin 3.04-5.40%, 3.44-5.60%; Alpha-2 globulin 8.0-12.67%, 8.44-12.87%; and Beta-1 globulin 5.01-7.38%, 5.14-7.86%. Moreover, the albumin-to-globulin ratio was 1.16-1.8 and 1.09-1.74 for males and females, respectively. The combined RIs for Beta-2 globulin and Gamma globulin were 2.54-4.90% and 12.40-21.66%, respectively. Conclusion: The established reference intervals for serum protein fractions revealed gender-specific differences except for Beta-2 globulin and Gamma globulin.
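A non-parametric reference interval is conventionally taken as the central 95% of the reference values, bounded by the 2.5th and 97.5th percentiles. The sketch below uses the rank-based rule r = p(n + 1) with linear interpolation between neighbouring ranks; treating this as the exact procedure used in the study is an assumption, and the function names and sample values are hypothetical.

```python
def value_at_rank(sorted_values, rank):
    """Value at a 1-based, possibly fractional rank, by linear interpolation."""
    n = len(sorted_values)
    rank = min(max(rank, 1.0), float(n))  # keep the rank inside [1, n]
    lower = int(rank)                     # 1-based rank just below or at `rank`
    fraction = rank - lower
    if lower >= n:
        return sorted_values[-1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)


def nonparametric_reference_interval(values, low=0.025, high=0.975):
    """Return the (lower, upper) limits at ranks low*(n+1) and high*(n+1)."""
    ordered = sorted(values)
    n = len(ordered)
    return (value_at_rank(ordered, low * (n + 1)),
            value_at_rank(ordered, high * (n + 1)))


# Hypothetical reference values: the integers 1..199 (n = 199).
limits = nonparametric_reference_interval(range(1, 200))
print(limits)
```

For a sex-partitioned interval, the same function would simply be applied to the male and female subsets separately once a test such as Mann-Whitney indicates that partitioning is warranted.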

Keywords: serum protein electrophoresis, reference interval, Addis Ababa, Ethiopia

Procedia PDF Downloads 196
24155 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is an emerging technology that collects data from various sensors, and those data are used in many fields. Data retrieval is one of the major issues, as there is a need to extract exactly the data required. In this paper, a large data set is processed using feature selection. Feature selection helps to choose the data that are actually needed to process and execute the task. The key value is what helps to point out the exact data available in the storage space. Here the available data are streamed, and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 310
24154 URM Infill in-Plane and out-of-Plane Interaction in Damage Evaluation of RC Frames

Authors: F. Longo, G. Granello, G. Tecchio, F. Da Porto

Abstract:

Unreinforced masonry (URM) infill walls are widely used throughout the world, including in seismic-prone regions, as partitions in reinforced concrete building frames. Even though they are not structural elements, they can dramatically affect both the strength and stiffness of RC structures by acting as a diagonal strut, modifying the distribution of shear and displacements along the building height, with uncertain consequences for structural safety. In recent decades, many refined models have been developed to describe the effect of infill walls on the structural behaviour of frames, but they are generally restricted to in-plane actions. Only very recently have some new approaches been implemented to consider the in-plane/out-of-plane interaction of URM infill walls in progressive collapse simulations. In the present work, a particularly promising macro-model was adopted for the progressive collapse analysis of infilled RC frames. The model makes it possible to consider the bi-directional interaction in terms of displacement and strength capacity for URM infills, and to remove the infill contribution when the URM wall is deemed to have failed during the analysis. The model was calibrated on experimental data for two different URM panel thicknesses, with particular care taken in modelling the post-critical softening branch. A set of frame specimens representing the most common Italian structures was built considering two main design approaches: a traditional design philosophy, corresponding to structures erected between the 1950s and the 1980s and designed essentially to support vertical loads, and a seismic design philosophy, corresponding to current criteria that take horizontal actions into account. Non-linear static analyses were carried out on the specimen set, and some preliminary evaluations were drawn in terms of the different performance exhibited by the RC frame when the concurrent effect of out-of-plane damage is considered for the URM infill.

Keywords: infill Panels macromodels, in plane-out of plane interaction, RC frames, URM infills

Procedia PDF Downloads 493
24153 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA), which may allow academic institutions to better understand learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitations before jumping on the Big Data bandwagon.

Keywords: big data, learning analytics, analytics, big data in education, Hadoop

Procedia PDF Downloads 382
24152 Segmentation of Arabic Handwritten Numeral Strings Based on Watershed Approach

Authors: Nidal F. Shilbayeh, Remah W. Al-Khatib, Sameer A. Nooh

Abstract:

Arabic offline handwriting recognition is considered one of the most challenging topics. Arabic handwritten numeral strings are used to automate systems that deal with numbers, such as postal codes, bank account numbers, and numbers on car plates. Segmentation of connected numerals is the main bottleneck in a handwritten numeral recognition system; resolving it can in turn increase the speed and efficiency of the recognition system. In this paper, we propose algorithms for automatic segmentation and feature extraction of Arabic handwritten numeral strings based on the Watershed approach. The algorithms have been designed and implemented to achieve the main goal of segmenting and extracting strings of handwritten numeral digits, especially in the courtesy amount of bank checks. The segmentation algorithm partitions the string into multiple regions that can be associated with the properties of one or more criteria. The numeral extraction algorithm extracts the digits of the numeral string into separate individual digits. Both algorithms, for segmentation and for feature extraction, have been tested successfully and efficiently on all types of numerals.

Keywords: handwritten numerals, segmentation, courtesy amount, feature extraction, numeral recognition

Procedia PDF Downloads 353
24151 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

Given user demand and the growth of large volumes of free data, it is becoming increasingly challenging for storage solutions to protect, store, and retrieve data. The day is not far off when storage companies and organizations will start saying 'no' to storing our valuable data, or will start charging a huge amount for its storage and protection. On the other hand, from an environmental standpoint, it is becoming challenging to establish and maintain new data warehouses and data centers while guarding against global warming threats. The challenge of small data is over; the big challenge now is how to manage the exponential growth of data. In this paper, we analyze the growth trend of big data and its future implications. We also focus on the impact of unstructured data on various concerns and suggest some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 512
24150 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, proposes several key steps of a typical cleaning process, and puts forward a data cleaning framework with good scalability and versatility. For data with attribute dependency relations, it designs several violation data discovery algorithms expressed as formal formulas, which can find the data that are inconsistent with the conditional attribute dependencies across all target columns, whether the data are structured (SQL) or unstructured (NoSQL), and it gives six data cleaning methods based on these algorithms.
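The abstract gives no detail on its violation discovery algorithms. As a rough illustration of the general idea only, the sketch below checks a simple functional dependency (the values of some columns should determine the value of a target column) and reports the groups of rows that break it. The function name, the column names, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def find_dependency_violations(rows, determinant_columns, target_column):
    """Return {determinant values: set of conflicting target values}.

    A dependency determinant -> target is violated when the same determinant
    values appear with more than one distinct target value.
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[column] for column in determinant_columns)
        seen[key].add(row[target_column])
    return {key: targets for key, targets in seen.items() if len(targets) > 1}


# Hypothetical records: a postal code is expected to determine the city.
records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "94105", "city": "San Francisco"},
    {"zip": "94105", "city": "Oakland"},
]
print(find_dependency_violations(records, ["zip"], "city"))
```

Because the rows are plain dictionaries, the same check works on records loaded from a SQL table or from a document store.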

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 534
24149 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance, and privacy. In the telecommunication industry, mining big data is like mining for gold: it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity, and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from it.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 370
24148 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its own syntax and structure. The objective is to explore various techniques and methodologies for validating the comparison and integration of JSON and XML data in both directions. By understanding the process of checking JSON data against XML data, testers, developers, and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.
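One simple way to check JSON data against XML data, sketched here as an assumption rather than as the paper's method, is to flatten both documents into path-to-value maps and compare the maps. The sketch uses only the Python standard library (`json` and `xml.etree.ElementTree`); it treats every leaf as a string and does not handle JSON arrays, XML attributes, or repeated sibling tags. The sample documents are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def flatten_json(node, prefix=""):
    """Map each leaf of nested JSON objects to a '/a/b' style path."""
    if isinstance(node, dict):
        flat = {}
        for key, value in node.items():
            flat.update(flatten_json(value, f"{prefix}/{key}"))
        return flat
    return {prefix: str(node)}


def flatten_xml(element, prefix=""):
    """Map each leaf element's text to a '/a/b' style path."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        return {path: (element.text or "").strip()}
    flat = {}
    for child in children:
        flat.update(flatten_xml(child, path))
    return flat


def compare_json_xml(json_text, xml_text):
    """Return {path: (json value, xml value)} for every path that differs."""
    from_json = flatten_json(json.loads(json_text))
    from_xml = flatten_xml(ET.fromstring(xml_text))
    paths = set(from_json) | set(from_xml)
    return {p: (from_json.get(p), from_xml.get(p))
            for p in paths if from_json.get(p) != from_xml.get(p)}


json_doc = '{"order": {"id": 7, "customer": "Ada"}}'
xml_doc = "<order><id>7</id><customer>Bob</customer></order>"
print(compare_json_xml(json_doc, xml_doc))
```

Comparing values as strings is a deliberate simplification: a JSON boolean `true` becomes the Python string "True", so a real test suite would need type-aware normalization before comparing.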

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 86
24147 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data, and all kinds of business data. These data reflect different aspects of people, events, and activities. Data generated by various systems differ in form, and their sources differ because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need to be fused together. Data from different sources, collected in different ways, raise several issues that need to be resolved. Problems of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, and scenario prediction in order to achieve pattern discovery, law verification, root cause analysis, and public opinion monitoring. The result of multi-source data fusion is a uniform central database, which includes people data, location data, object data, institution data, business data, and space data. Metadata are needed for reference whenever an application needs to access, manipulate, and display the data. Uniform metadata management ensures the effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search, and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 349
24146 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays, given the growing human involvement in data generation, methods such as data mining for extracting knowledge are unavoidable. One issue in data mining is the inherent distribution of the data: the databases that create or receive such data usually belong to corporate or non-corporate persons who do not give their information freely to others. Yet there is no guarantee that someone can mine specific data without intruding on the owner's privacy. Sending data and then gathering them with vertical or horizontal software depends on the type of privacy preservation used, and is also carried out to improve data privacy. This study attempts a comprehensive comparison of privacy-preserving data methods; general methods such as random data and coding are also examined, along with the strong and weak points of each.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 489
24145 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR), which will come into force on 25 May 2018, will create a new legal framework for the protection of personal data in the European Union. Article 20 of the GDPR introduces a right to data portability. This right allows data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating the transfer of personal data between IT environments (e.g. applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 443
24144 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 367
24143 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does data. Utilizing massive data sets enables companies to gain competitive advantages over their adversaries. Among the many areas of Big Data usage, logistics has a significant role in both the commercial sector and the military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 609
24142 Sound Source Localisation and Augmented Reality for On-Site Inspection of Prefabricated Building Components

Authors: Jacques Cuenca, Claudio Colangeli, Agnieszka Mroz, Karl Janssens, Gunther Riexinger, Antonio D'Antuono, Giuseppe Pandarese, Milena Martarelli, Gian Marco Revel, Carlos Barcena Martin

Abstract:

This study presents an on-site acoustic inspection methodology for quality and performance evaluation of building components. The work focuses on global and detailed sound source localisation, by successively performing acoustic beamforming and sound intensity measurements. A portable experimental setup is developed, consisting of an omnidirectional broadband acoustic source, a microphone array and a sound intensity probe. Three main acoustic indicators are of interest, namely the sound pressure distribution on the surface of components such as walls, windows and junctions, the three-dimensional sound intensity field in the vicinity of junctions, and the sound transmission loss of partitions. The measurement data is post-processed and converted into a three-dimensional numerical model of the acoustic indicators with the help of the simultaneously acquired geolocation information. The three-dimensional acoustic indicators are then integrated into an augmented reality platform superimposing them onto a real-time visualisation of the spatial environment. The methodology thus enables a measurement-supported inspection process of buildings and the correction of errors during construction and refurbishment. Two experimental validation cases are shown. The first consists of a laboratory measurement on a full-scale mockup of a room, featuring a prefabricated panel. The latter is installed with controlled defects such as lack of insulation and joint sealing material. It is demonstrated that the combined acoustic and augmented reality tool is capable of identifying acoustic leakages from the building defects and assisting in correcting them. The second validation case is performed on a prefabricated room at a near-completion stage in the factory. With the help of the measurements and visualisation tools, the homogeneity of the partition installation is evaluated and leakages from junctions and doors are identified. Furthermore, the integration of acoustic indicators together with thermal and geometrical indicators via the augmented reality platform is shown.

Keywords: acoustic inspection, prefabricated building components, augmented reality, sound source localization

Procedia PDF Downloads 352
24141 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, a wide range of data is being generated in many fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a wider variety of information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it possible to easily test various sensors, and sensor data can be collected directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, using the board to collect data presents many difficulties, particularly when the user is not a computer programmer or is using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 345
24140 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data-related integrated services, provide insights to users, and support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services such as data storage and hosting to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationships in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and the classification of actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 129