Search results for: data partitions
25117 Empirical Study of Partitions Similarity Measures
Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd
Abstract:
This paper investigates and compares the performance of four existing distances and similarity measures between partitions. The partition measures considered are Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and finally the independence from the dataset scale. It has been shown that the Adjusted Rand Index performed well overall, with regards to these three intuitions.Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison.
Procedia PDF Downloads 20025116 Clustering Performance Analysis using New Correlation-Based Cluster Validity Indices
Authors: Nathakhun Wiroonsri
Abstract:
There are various cluster validity measures used for evaluating clustering results. One of the main objectives of using these measures is to seek the optimal unknown number of clusters. Some measures work well for clusters with different densities, sizes and shapes. Yet, one of the weaknesses that those validity measures share is that they sometimes provide only one clear optimal number of clusters. That number is actually unknown and there might be more than one potential sub-optimal option that a user may wish to choose based on different applications. We develop two new cluster validity indices based on a correlation between an actual distance between a pair of data points and a centroid distance of clusters that the two points are located in. Our proposed indices constantly yield several peaks at different numbers of clusters which overcome the weakness previously stated. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted to compare the proposed validity indices with several well-known ones.Keywords: clustering algorithm, cluster validity measure, correlation, data partitions, iris data set, marketing, pattern recognition
Procedia PDF Downloads 10225115 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms
Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang
Abstract:
Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving the bioassay predictions. The first step is instance selection in which dataset is categorized into training, testing, and validation sets. The second step is discretization that partitions the data in consideration of accuracy vs. precision. The third step is normalization where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combination of preprocessing steps and machine learning algorithms in more consistent and accurate prediction.Keywords: bioassay, machine learning, preprocessing, virtual screen
Procedia PDF Downloads 27325114 Earthquake Classification in Molluca Collision Zone Using Conventional Statistical Methods
Authors: H. J. Wattimanela, U. S. Passaribu, A. N. T. Puspito, S. W. Indratno
Abstract:
Molluca Collision Zone is located at the junction of the Eurasian plate, Australian, Pacific, and the Philippines. Between the Sangihe arc, west of the collision zone, and to the east of Halmahera arc is active collision and convex toward the Molluca Sea. This research will analyze the behavior of earthquake occurrence in Molluca Collision Zone related to the distributions of an earthquake in each partition regions, determining the type of distribution of a occurrence earthquake of partition regions, and the mean occurrence of earthquakes each partition regions, and the correlation between the partitions region. We calculate number of earthquakes using partition method and its behavioral using conventional statistical methods. The data used is the data type of shallow earthquakes with magnitudes ≥ 4 SR for the period 1964-2013 in the Molluca Collision Zone. From the results, we can classify partitioned regions based on the correlation into two classes: strong and very strong. This classification can be used for early warning system in disaster management.Keywords: molluca collision zone, partition regions, conventional statistical methods, earthquakes, classifications, disaster management
Procedia PDF Downloads 49525113 Using Self Organizing Feature Maps for Classification in RGB Images
Authors: Hassan Masoumi, Ahad Salimi, Nazanin Barhemmat, Babak Gholami
Abstract:
Artificial neural networks have gained a lot of interest as empirical models for their powerful representational capacity, multi input and output mapping characteristics. In fact, most feed-forward networks with nonlinear nodal functions have been proved to be universal approximates. In this paper, we propose a new supervised method for color image classification based on self organizing feature maps (SOFM). This algorithm is based on competitive learning. The method partitions the input space using self-organizing feature maps to introduce the concept of local neighborhoods. Our image classification system entered into RGB image. Experiments with simulated data showed that separability of classes increased when increasing training time. In additional, the result shows proposed algorithms are effective for color image classification.Keywords: classification, SOFM algorithm, neural network, neighborhood, RGB image
Procedia PDF Downloads 47725112 Sensor Network Routing Optimization by Simulating Eurygaster Life in Wheat Farms
Authors: Fariborz Ahmadi, Hamid Salehi, Khosrow Karimi
Abstract:
A sensor network is set of sensor nodes that cooperate together to perform a predefined tasks. The important problem in this network is power consumption. So, in this paper one algorithm based on the eurygaster life is introduced to minimize power consumption by the nodes of these networks. In this method the search space of problem is divided into several partitions and each partition is investigated separately. The evaluation results show that our approach is more efficient in comparison to other evolutionary algorithm like genetic algorithm.Keywords: evolutionary computation, genetic algorithm, particle swarm optimization, sensor network optimization
Procedia PDF Downloads 42625111 Anti-Leishmanial Compounds from the Seaweed Padina pavonica
Authors: Nahal Najafi, Afsaneh Yegdaneh, Sedigheh Saberi
Abstract:
Introduction: Leishmaniasis poses a substantial global risk, affecting millions and resulting in thousands of cases each year in endemic regions. Challenges in current leishmaniasis treatments include drug resistance, high toxicity, and pancreatitis. Marine compounds, particularly brown algae, serve as a valuable source of inspiration for discovering treatments against Leishmania. Material and method: Padina pavonica was collected from the Persian Gulf. The seaweeds were dried and extracted with methanol: ethylacetate (1:1). The extract was partitioned to hexane (Hex), dicholoromethane (DCM), butanol, and water by Kupchan partitioning method. Hex partition was fractionated by silica gel column chromatography to 10 fractions (Fr. 1-10). Fr. 6 was further separated by the normal phase HPLC method to yield compounds 1-3. The structures of isolated compounds were elucidated by NMR, Mass, and other spectroscopic methods. Hex and DCM partitions, Fr. 6 and compounds 1-3, were tested for anti-leishmanicidal activity. RAW cell lines were cultured in enriched RPMI (10% FBS, 1% pen-strep) in a 37°C CO2 5% incubator, while promastigote cells were initially cultured in NNN culture and subsequently transferred to the aforementioned medium. Cytotoxicity was assessed using MTT tests, anti-promastigote activity was evaluated through Hemocytometer chamber promastigote counting, and the impact of amastigote damage was determined by counting amastigotes within 100 macrophages. Results: NMR and Mass identified isolated compounds as fucosterol and two sulfoquinovosyldiacylglycerols (SQDG). Among the samples tested, Fr.6 exhibited the highest cytotoxicity (CC50=60.24), while compound 2 showed the lowest cytotoxicity (CC50=21984). Compound 1 and dichloromethane fraction demonstrated the highest and lowest anti-promastigote activity (IC50=115.7, IC50=16.42, respectively), and compound 1 and hexane fraction exhibited the highest and lowest anti-amastigote activity (IC50=7.874, IC50=40.18, respectively). Conclusion: All six samples, including Hex and DCM partitions, Fr.6, and compounds 1-3, demonstrate a noteworthy correlation between rising concentration and time, with a statistically significant P-value of ≤0.05. Considering the higher selectivity index of compound 2 compared to others, it can be inferred that the presence of sulfur groups and unsaturated chains potentially contributes to these effects by impeding the DNA polymerase, which, of course, needs more research.Keywords: Padina, leishmania, sulfoquinovosyldiacylglycerol, cytotoxicity
Procedia PDF Downloads 1925110 Diabetes Mellitus and Blood Glucose Variability Increases the 30-day Readmission Rate after Kidney Transplantation
Authors: Harini Chakkera
Abstract:
Background: Inpatient hyperglycemia is an established independent risk factor among several patient cohorts with hospital readmission. This has not been studied after kidney transplantation. Nearly one-third of patients who have undergone a kidney transplant reportedly experience 30-day readmission. Methods: Data on first-time solitary kidney transplantations were retrieved between September 2015 to December 2018. Information was linked to the electronic health record to determine a diagnosis of diabetes mellitus and extract glucometeric and insulin therapy data. Univariate logistic regression analysis and the XGBoost algorithm were used to predict 30-day readmission. We report the average performance of the models on the testing set on five bootstrapped partitions of the data to ensure statistical significance. Results: The cohort included 1036 patients who received kidney transplantation, and 224 (22%) experienced 30-day readmission. The machine learning algorithm was able to predict 30-day readmission with an average AUC of 77.3% (95% CI 75.30-79.3%). We observed statistically significant differences in the presence of pretransplant diabetes, inpatient-hyperglycemia, inpatient-hypoglycemia, and minimum and maximum glucose values among those with higher 30-day readmission rates. The XGBoost model identified the index admission length of stay, presence of hyper- and hypoglycemia and recipient and donor BMI values as the most predictive risk factors of 30-day readmission. Additionally, significant variations in the therapeutic management of blood glucose by providers were observed. Conclusions: Suboptimal glucose metrics during hospitalization after kidney transplantation is associated with an increased risk for 30-day hospital readmission. Optimizing the hospital blood glucose management, a modifiable factor, after kidney transplantation may reduce the risk of 30-day readmission.Keywords: kidney, transplant, diabetes, insulin
Procedia PDF Downloads 8825109 Approximation to the Hardy Operator on Topological Measure Spaces
Authors: Kairat T. Mynbaev, Elena N. Lomakina
Abstract:
We consider a Hardy-type operator generated by a family of open subsets of a Hausdorff topological space. The family is indexed with non-negative real numbers and is totally ordered. For this operator, we obtain two-sided bounds of its norm, a compactness criterion, and bounds for its approximation numbers. Previously, bounds for its approximation numbers have been established only in the one-dimensional case, while we do not impose any restrictions on the dimension of the Hausdorff space. The bounds for the norm and conditions for compactness earlier have been found using different methods by G. Sinnamon and K. Mynbaev. Our approach is different in that we use domain partitions for all problems under consideration.Keywords: approximation numbers, boundedness and compactness, multidimensional Hardy operator, Hausdorff topological space
Procedia PDF Downloads 10125108 Exploration of Various Metrics for Partitioning of Cellular Automata Units for Efficient Reconfiguration of Field Programmable Gate Arrays (FPGAs)
Authors: Peter Tabatt, Christian Siemers
Abstract:
Using FPGA devices to improve the behavior of time-critical parts of embedded systems is a proven concept for years. With reconfigurable FPGA devices, the logical blocks can be partitioned and grouped into static and dynamic parts. The dynamic parts can be reloaded 'on demand' at runtime. This work uses cellular automata, which are constructed through compilation from (partially restricted) ANSI-C sources, to determine the suitability of various metrics for optimal partitioning. Significant metrics, in this case, are for example the area on the FPGA device for the partition, the pass count for loop constructs and communication characteristics to other partitions. With successful partitioning, it is possible to use smaller FPGA devices for the same requirements as with not reconfigurable FPGA devices or – vice versa – to use the same FPGAs for larger programs.Keywords: reconfigurable FPGA, cellular automata, partitioning, metrics, parallel computing
Procedia PDF Downloads 26725107 Seismic Performance Assessment of Pre-70 RC Frame Buildings with FEMA P-58
Authors: D. Cardone
Abstract:
Past earthquakes have shown that seismic events may incur large economic losses in buildings. FEMA P-58 provides engineers a practical tool for the performance seismic assessment of buildings. In this study, FEMA P-58 is applied to two typical Italian pre-1970 reinforced concrete frame buildings, characterized by plain rebars as steel reinforcement and masonry infills and partitions. Given that suitable tools for these buildings are missing in FEMA P- 58, specific fragility curves and loss functions are first developed. Next, building performance is evaluated following a time-based assessment approach. Finally, expected annual losses for the selected buildings are derived and compared with past applications to old RC frame buildings representative of the US building stock.Keywords: FEMA P-58, RC frame buildings, plain rebars, Masonry infills, fragility functions, loss functions, expected annual loss
Procedia PDF Downloads 32225106 Data Clustering Algorithm Based on Multi-Objective Periodic Bacterial Foraging Optimization with Two Learning Archives
Authors: Chen Guo, Heng Tang, Ben Niu
Abstract:
Clustering splits objects into different groups based on similarity, making the objects have higher similarity in the same group and lower similarity in different groups. Thus, clustering can be treated as an optimization problem to maximize the intra-cluster similarity or inter-cluster dissimilarity. In real-world applications, the datasets often have some complex characteristics: sparse, overlap, high dimensionality, etc. When facing these datasets, simultaneously optimizing two or more objectives can obtain better clustering results than optimizing one objective. However, except for the objectives weighting methods, traditional clustering approaches have difficulty in solving multi-objective data clustering problems. Due to this, evolutionary multi-objective optimization algorithms are investigated by researchers to optimize multiple clustering objectives. In this paper, the Data Clustering algorithm based on Multi-objective Periodic Bacterial Foraging Optimization with two Learning Archives (DC-MPBFOLA) is proposed. Specifically, first, to reduce the high computing complexity of the original BFO, periodic BFO is employed as the basic algorithmic framework. Then transfer the periodic BFO into a multi-objective type. Second, two learning strategies are proposed based on the two learning archives to guide the bacterial swarm to move in a better direction. On the one hand, the global best is selected from the global learning archive according to the convergence index and diversity index. On the other hand, the personal best is selected from the personal learning archive according to the sum of weighted objectives. According to the aforementioned learning strategies, a chemotaxis operation is designed. Third, an elite learning strategy is designed to provide fresh power to the objects in two learning archives. When the objects in these two archives do not change for two consecutive times, randomly initializing one dimension of objects can prevent the proposed algorithm from falling into local optima. Fourth, to validate the performance of the proposed algorithm, DC-MPBFOLA is compared with four state-of-art evolutionary multi-objective optimization algorithms and one classical clustering algorithm on evaluation indexes of datasets. To further verify the effectiveness and feasibility of designed strategies in DC-MPBFOLA, variants of DC-MPBFOLA are also proposed. Experimental results demonstrate that DC-MPBFOLA outperforms its competitors regarding all evaluation indexes and clustering partitions. These results also indicate that the designed strategies positively influence the performance improvement of the original BFO.Keywords: data clustering, multi-objective optimization, bacterial foraging optimization, learning archives
Procedia PDF Downloads 13725105 Studying Relationship between Local Geometry of Decision Boundary with Network Complexity for Robustness Analysis with Adversarial Perturbations
Authors: Tushar K. Routh
Abstract:
If inputs are engineered in certain manners, they can influence deep neural networks’ (DNN) performances by facilitating misclassifications, a phenomenon well-known as adversarial attacks that question networks’ vulnerability. Recent studies have unfolded the relationship between vulnerability of such networks with their complexity. In this paper, the distinctive influence of additional convolutional layers at the decision boundaries of several DNN architectures was investigated. Here, to engineer inputs from widely known image datasets like MNIST, Fashion MNIST, and Cifar 10, we have exercised One Step Spectral Attack (OSSA) and Fast Gradient Method (FGM) techniques. The aftermaths of adding layers to the robustness of the architectures have been analyzed. For reasoning, separation width from linear class partitions and local geometry (curvature) near the decision boundary have been examined. The result reveals that model complexity has significant roles in adjusting relative distances from margins, as well as the local features of decision boundaries, which impact robustness.Keywords: DNN robustness, decision boundary, local curvature, network complexity
Procedia PDF Downloads 7425104 Data Transformations in Data Envelopment Analysis
Authors: Mansour Mohammadpour
Abstract:
Data transformation refers to the modification of any point in a data set by a mathematical function. When applying transformations, the measurement scale of the data is modified. Data transformations are commonly employed to turn data into the appropriate form, which can serve various functions in the quantitative analysis of the data. This study addresses the investigation of the use of data transformations in Data Envelopment Analysis (DEA). Although data transformations are important options for analysis, they do fundamentally alter the nature of the variable, making the interpretation of the results somewhat more complex.Keywords: data transformation, data envelopment analysis, undesirable data, negative data
Procedia PDF Downloads 1925103 Establishment of Reference Interval for Serum Protein Electrophoresis of Apparently Healthy Adults in Addis Ababa, Ethiopia
Authors: Demiraw Bikila, Tadesse Lejisa, Yosef Tolcha, Chala Bashea, Mehari Meles Tigist Getahun Genet Ashebir, Wossene Habtu, Feyissa Challa, Ousman Mohammed, Melkitu Kassaw, Adisu Kebede, Letebrhan G. Egzeabher, Endalkachew Befekadu, Mistire Wolde, Aster Tsegaye
Abstract:
Background: Even though several factors affect reference intervals (RIs), the company-derived values are currently in use in many laboratories worldwide. However, little or no data is available regarding serum protein RIs, mainly in resource-limited setting countries like Ethiopia. Objective: To establish a reference interval for serum protein electrophoresis of apparently healthy adults in Addis Ababa, Ethiopia. Method: A cross-sectional study was conducted on a total of 297 apparently healthy adults from April-October 2019 in four selected sub-cities (Akaki, Kirkos, Arada, Yeka) of Addis Ababa, Ethiopia. Laboratory analysis of collected samples was performed using Capillarys 2 Flex Piercing analyzer, while statistical analysis was done using SPSS version 23 and med-cal software. Mann-Whitney test was used to check Partitions. Non-parametric method of reference range establishment was performed as per CLSI guideline EP28A3C. Result: The established RIs were: Albumin 53.83-64.59%, 52.24-63.55%; Alpha-1 globulin 3.04-5.40%, 3.44-5.60%; Alpha-2 globulin 8.0-12.67%, 8.44-12.87%; and Beta-1 globulin 5.01-7.38%, 5.14-7.86%. Moreover, Albumin to globulin ratio was 1.16-1.8, 1.09-1.74 for males and females, respectively. The combined RIs for Beta-2 globulin and Gamma globulin were 2.54-4.90% and 12.40-21.66%, respectively. Conclusion: The established reference interval for serum protein fractions revealed gender-specific differences except for Beta-2 globulin and Gamma globulin.Keywords: serum protein electrophoresis, reference interval, Addis Ababa, Ethiopia
Procedia PDF Downloads 23425102 URM Infill in-Plane and out-of-Plane Interaction in Damage Evaluation of RC Frames
Authors: F. Longo, G. Granello, G. Tecchio, F. Da Porto
Abstract:
Unreinforced masonry (URM) infill walls are widely used throughout the world, also in seismic prone regions, as partitions in reinforced concrete building frames. Even if they do not represent structural elements, they can dramatically affect both strength and stiffness of RC structures by acting as a diagonal strut, modifying shear and displacements distribution along the building height, with uncertain consequences on structural safety. In the last decades, many refined models have been developed to describe infill walls effect on frame structural behaviour, but generally restricted to in-plane actions. Only very recently some new approaches were implemented to consider in-plane/out-of-plane interaction of URM infill walls in progressive collapse simulations. In the present work, a particularly promising macro-model was adopted for the progressive collapse analysis of infilled RC frames. The model allows to consider the bi-directional interaction in terms of displacement and strength capacity for URM infills, and to remove the infill contribution when the URM wall is supposed to fail during the analysis process. The model was calibrated on experimental data regarding two different URM panels thickness, modelling with particular care the post-critic softening branch. A frame specimen set representing the most common Italian structures was built considering two main normative approaches: a traditional design philosophy, corresponding to structures erected between 50’s-80’s basically designed to support vertical loads, and a seismic design philosophy, corresponding to current criteria that take into account horizontal actions. Non-Linear Static analyses were carried out on the specimen set and some preliminary evaluations were drawn in terms of different performance exhibited by the RC frame when the contemporary effect of the out-of-plane damage is considered for the URM infill.Keywords: infill Panels macromodels, in plane-out of plane interaction, RC frames, URM infills
Procedia PDF Downloads 51325101 Segmentation of Arabic Handwritten Numeral Strings Based on Watershed Approach
Authors: Nidal F. Shilbayeh, Remah W. Al-Khatib, Sameer A. Nooh
Abstract:
Arabic offline handwriting recognition systems are considered as one of the most challenging topics. Arabic Handwritten Numeral Strings are used to automate systems that deal with numbers such as postal code, banking account numbers and numbers on car plates. Segmentation of connected numerals is the main bottleneck in the handwritten numeral recognition system. This is in turn can increase the speed and efficiency of the recognition system. In this paper, we proposed algorithms for automatic segmentation and feature extraction of Arabic handwritten numeral strings based on Watershed approach. The algorithms have been designed and implemented to achieve the main goal of segmenting and extracting the string of numeral digits written by hand especially in a courtesy amount of bank checks. The segmentation algorithm partitions the string into multiple regions that can be associated with the properties of one or more criteria. The numeral extraction algorithm extracts the numeral string digits into separated individual digit. Both algorithms for segmentation and feature extraction have been tested successfully and efficiently for all types of numerals.Keywords: handwritten numerals, segmentation, courtesy amount, feature extraction, numeral recognition
Procedia PDF Downloads 37925100 Processing Big Data: An Approach Using Feature Selection
Authors: Nikat Parveen, M. Ananthi
Abstract:
Big data is one of the emerging technology, which collects the data from various sensors and those data will be used in many fields. Data retrieval is one of the major issue where there is a need to extract the exact data as per the need. In this paper, large amount of data set is processed by using the feature selection. Feature selection helps to choose the data which are actually needed to process and execute the task. The key value is the one which helps to point out exact data available in the storage space. Here the available data is streamed and R-Center is proposed to achieve this task.Keywords: big data, key value, feature selection, retrieval, performance
Procedia PDF Downloads 33925099 Efficient Model Order Reduction of Descriptor Systems Using Iterative Rational Krylov Algorithm
Authors: Muhammad Anwar, Ameen Ullah, Intakhab Alam Qadri
Abstract:
This study presents a technique utilizing the Iterative Rational Krylov Algorithm (IRKA) to reduce the order of large-scale descriptor systems. Descriptor systems, which incorporate differential and algebraic components, pose unique challenges in Model Order Reduction (MOR). The proposed method partitions the descriptor system into polynomial and strictly proper parts to minimize approximation errors, applying IRKA exclusively to the strictly adequate component. This approach circumvents the unbounded errors that arise when IRKA is directly applied to the entire system. A comparative analysis demonstrates the high accuracy of the reduced model and a significant reduction in computational burden. The reduced model enables more efficient simulations and streamlined controller designs. The study highlights IRKA-based MOR’s effectiveness in optimizing complex systems’ performance across various engineering applications. The proposed methodology offers a promising solution for reducing the complexity of large-scale descriptor systems while maintaining their essential characteristics and facilitating their analysis, simulation, and control design.Keywords: model order reduction, descriptor systems, iterative rational Krylov algorithm, interpolatory model reduction, computational efficiency, projection methods, H₂-optimal model reduction
Procedia PDF Downloads 2925098 Applications of Big Data in Education
Authors: Faisal Kalota
Abstract:
Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.Keywords: big data, learning analytics, analytics, big data in education, Hadoop
Procedia PDF Downloads 42325097 Analysis of Big Data
Authors: Sandeep Sharma, Sarabjit Singh
Abstract:
As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.Keywords: big data, unstructured data, volume, variety, velocity
Procedia PDF Downloads 54525096 Sound Source Localisation and Augmented Reality for On-Site Inspection of Prefabricated Building Components
Authors: Jacques Cuenca, Claudio Colangeli, Agnieszka Mroz, Karl Janssens, Gunther Riexinger, Antonio D'Antuono, Giuseppe Pandarese, Milena Martarelli, Gian Marco Revel, Carlos Barcena Martin
Abstract:
This study presents an on-site acoustic inspection methodology for quality and performance evaluation of building components. The work focuses on global and detailed sound source localisation, by successively performing acoustic beamforming and sound intensity measurements. A portable experimental setup is developed, consisting of an omnidirectional broadband acoustic source and a microphone array and sound intensity probe. Three main acoustic indicators are of interest, namely the sound pressure distribution on the surface of components such as walls, windows and junctions, the three-dimensional sound intensity field in the vicinity of junctions, and the sound transmission loss of partitions. The measurement data is post-processed and converted into a three-dimensional numerical model of the acoustic indicators with the help of the simultaneously acquired geolocation information. The three-dimensional acoustic indicators are then integrated into an augmented reality platform superimposing them onto a real-time visualisation of the spatial environment. The methodology thus enables a measurement-supported inspection process of buildings and the correction of errors during construction and refurbishment. Two experimental validation cases are shown. The first consists of a laboratory measurement on a full-scale mockup of a room, featuring a prefabricated panel. The latter is installed with controlled defects such as lack of insulation and joint sealing material. It is demonstrated that the combined acoustic and augmented reality tool is capable of identifying acoustic leakages from the building defects and assist in correcting them. The second validation case is performed on a prefabricated room at a near-completion stage in the factory. With the help of the measurements and visualisation tools, the homogeneity of the partition installation is evaluated and leakages from junctions and doors are identified. Furthermore, the integration of acoustic indicators together with thermal and geometrical indicators via the augmented reality platform is shown.Keywords: acoustic inspection, prefabricated building components, augmented reality, sound source localization
Procedia PDF Downloads 38125095 Research of Data Cleaning Methods Based on Dependency Rules
Authors: Yang Bao, Shi Wei Deng, WangQun Lin
Abstract:
This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSQL), and gives 6 data cleaning methods based on these algorithms.Keywords: data cleaning, dependency rules, violation data discovery, data repair
Procedia PDF Downloads 56225094 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity
Authors: Hoda A. Abdel Hafez
Abstract:
Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.Keywords: mining big data, big data, machine learning, telecommunication
Procedia PDF Downloads 40725093 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach
Authors: Theertha Chandroth
Abstract:
This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating comparison and integration between JSON data to XML and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.Keywords: XML, JSON, data comparison, integration testing, Python, SQL
Procedia PDF Downloads 13825092 Using Machine Learning Techniques to Extract Useful Information from Dark Data
Authors: Nigar Hussain
Abstract:
It is a subset of big data. Dark data means those data in which we fail to use for future decisions. There are many issues in existing work, but some need powerful tools for utilizing dark data. It needs sufficient techniques to deal with dark data. That enables users to exploit their excellence, adaptability, speed, less time utilization, execution, and accessibility. Another issue is the way to utilize dark data to extract helpful information to settle on better choices. In this paper, we proposed upgrade strategies to remove the dark side from dark data. Using a supervised model and machine learning techniques, we utilized dark data and achieved an F1 score of 89.48%.Keywords: big data, dark data, machine learning, heatmap, random forest
Procedia PDF Downloads 2725091 Multi-Source Data Fusion for Urban Comprehensive Management
Authors: Bolin Hua
Abstract:
In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data
Procedia PDF Downloads 39225090 Reviewing Privacy Preserving Distributed Data Mining
Authors: Sajjad Baghernezhad, Saeideh Baghernezhad
Abstract:
Nowadays considering human involved in increasing data development some methods such as data mining to extract science are unavoidable. One of the discussions of data mining is inherent distribution of the data usually the bases creating or receiving such data belong to corporate or non-corporate persons and do not give their information freely to others. Yet there is no guarantee to enable someone to mine special data without entering in the owner’s privacy. Sending data and then gathering them by each vertical or horizontal software depends on the type of their preserving type and also executed to improve data privacy. In this study it was attempted to compare comprehensively preserving data methods; also general methods such as random data, coding and strong and weak points of each one are examined.Keywords: data mining, distributed data mining, privacy protection, privacy preserving
Procedia PDF Downloads 52325089 The Right to Data Portability and Its Influence on the Development of Digital Services
Authors: Roman Bieda
Abstract:
The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.Keywords: data portability, digital market, GDPR, personal data
Procedia PDF Downloads 47125088 Recent Advances in Data Warehouse
Authors: Fahad Hanash Alzahrani
Abstract:
This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing
Procedia PDF Downloads 402