Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 26192

Search results for: hybrid data handler

24962 Survival Data with Incomplete Missing Categorical Covariates

Authors: Madaki Umar Yusuf, Mohd Rizam B. Abubakar

Abstract:

The survival censored data with incomplete covariate data is a common occurrence in many studies in which the outcome is survival time. With model when the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM by the method of weights. The survival outcome for the class of generalized linear model is applied and this method requires the estimation of the parameters of the distribution of the covariates. In this paper, we propose some clinical trials with ve covariates, four of which have some missing values which clearly show that they were fully censored data.

Keywords: EM algorithm, incomplete categorical covariates, ignorable missing data, missing at random (MAR), Weibull Distribution

Procedia PDF Downloads 400

24961 A Study of Blockchain Oracles

Authors: Abdeljalil Beniiche

Abstract:

The limitation with smart contracts is that they cannot access external data that might be required to control the execution of business logic. Oracles can be used to provide external data to smart contracts. An oracle is an interface that delivers data from external data outside the blockchain to a smart contract to consume. Oracle can deliver different types of data depending on the industry and requirements. In this paper, we study and describe the widely used blockchain oracles. Then, we elaborate on his potential role, technical architecture, and design patterns. Finally, we discuss the human oracle and its key role in solving the truth problem by reaching a consensus about a certain inquiry and tasks.

Keywords: blockchain, oracles, oracles design, human oracles

Procedia PDF Downloads 131

24960 Social Innovation, Change and the Future of Resilient Communities in Tokyo

Authors: Heide Imai

Abstract:

The paper will introduce and discuss specific examples of urban practices which take place within the dynamic urban landscape of contemporary Tokyo. The rising interest and importance of derelict places as resilient and creative clusters will be analysed, before relating this to the rediscovery of small urban niches and the emergence of different forms of social entrepreneurs. Secondly, two different case study areas will be introduced before discussing different forms of hybrid lifestyles, social micro scale enterprises and social innovations, understanding the concept of ‘small places of resilience’ as zones of human interaction, desire and care in which spontaneous practices take place.

Keywords: entrepreneurship, social innovation, Tokyo, urban regeneration

Procedia PDF Downloads 475

24959 Multi Data Management Systems in a Cluster Randomized Trial in Poor Resource Setting: The Pneumococcal Vaccine Schedules Trial

Authors: Abdoullah Nyassi, Golam Sarwar, Sarra Baldeh, Mamadou S. K. Jallow, Bai Lamin Dondeh, Isaac Osei, Grant A. Mackenzie

Abstract:

A randomized controlled trial is the "gold standard" for evaluating the efficacy of an intervention. Large-scale, cluster-randomized trials are expensive and difficult to conduct, though. To guarantee the validity and generalizability of findings, high-quality, dependable, and accurate data management systems are necessary. Robust data management systems are crucial for optimizing and validating the quality, accuracy, and dependability of trial data. Regarding the difficulties of data gathering in clinical trials in low-resource areas, there is a scarcity of literature on this subject, which may raise concerns. Effective data management systems and implementation goals should be part of trial procedures. Publicizing the creative clinical data management techniques used in clinical trials should boost public confidence in the study's conclusions and encourage further replication. In the ongoing pneumococcal vaccine schedule study in rural Gambia, this report details the development and deployment of multi-data management systems and methodologies. We implemented six different data management, synchronization, and reporting systems using Microsoft Access, RedCap, SQL, Visual Basic, Ruby, and ASP.NET. Additionally, data synchronization tools were developed to integrate data from these systems into the central server for reporting systems. Clinician, lab, and field data validation systems and methodologies are the main topics of this report. Our process development efforts across all domains were driven by the complexity of research project data collected in real-time data, online reporting, data synchronization, and ways for cleaning and verifying data. Consequently, we effectively used multi-data management systems, demonstrating the value of creative approaches in enhancing the consistency, accuracy, and reporting of trial data in a poor resource setting.

Keywords: data management, data collection, data cleaning, cluster-randomized trial

Procedia PDF Downloads 15

24958 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousand gene expression data simultaneously and a very important task for drug development and test, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithm has been used as an alternative approach to identifying structures from gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify normalized matrix of gene expression data followed by Mixed-Clustering algorithm and the Lift algorithm, inspired in the node-deletion and node-addition phases proposed by Cheng and Church based on Agglomerative Hierarchical Clustering (AHC). Experimental study on standard datasets demonstrated the effectiveness of the algorithm in gene expression data.

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 273

24957 A Fuzzy Multi-Criteria Model for Sustainable Development of Community-Based Tourism through the Homestay Program in Malaysia

Authors: Azizah Ismail, Zainab Khalifah, Abbas Mardani

Abstract:

Sustainable community-based tourism through homestay programme is a growing niche market that has impacted destinations in many countries including Malaysia. With demand predicted to continue increasing, the importance of the homestay product will grow in the tourism industry. This research examines the sustainability criteria for homestay programme in Malaysia covering economic, socio-cultural and environmental dimensions. This research applied a two-stage methodology for data analysis. Specifically, the researcher implements a hybrid method which combines two multi-criteria decision making approaches. In the first stage of the methodology, the Decision Making Trial and Evaluation Laboratory (DEMATEL) technique is applied. Then, Analytical Network Process (ANP) is employed for the achievement of the objective of the current research. After factors identification and problem formulation, DEMATEL is used to detect complex relationships and to build a Network Relation Map (NRM). Then ANP is used to prioritize and find the weights of the criteria and sub-criteria of the decision model. The research verifies the framework of multi-criteria for sustainable community-based tourism from the perspective of stakeholders. The result also provides a different perspective on the importance of sustainable criteria from the view of multi-stakeholders. Practically, this research gives the framework model and helps stakeholders to improve and innovate the homestay programme and also promote community-based tourism.

Keywords: community-based tourism, homestay programme, sustainable tourism criteria, sustainable tourism development

Procedia PDF Downloads 128

24956 An Efficient Traceability Mechanism in the Audited Cloud Data Storage

Authors: Ramya P, Lino Abraham Varghese, S. Bose

Abstract:

By cloud storage services, the data can be stored in the cloud, and can be shared across multiple users. Due to the unexpected hardware/software failures and human errors, which make the data stored in the cloud be lost or corrupted easily it affected the integrity of data in cloud. Some mechanisms have been designed to allow both data owners and public verifiers to efficiently audit cloud data integrity without retrieving the entire data from the cloud server. But public auditing on the integrity of shared data with the existing mechanisms will unavoidably reveal confidential information such as identity of the person, to public verifiers. Here a privacy-preserving mechanism is proposed to support public auditing on shared data stored in the cloud. It uses group signatures to compute verification metadata needed to audit the correctness of shared data. The identity of the signer on each block in shared data is kept confidential from public verifiers, who are easily verifying shared data integrity without retrieving the entire file. But on demand, the signer of the each block is reveal to the owner alone. Group private key is generated once by the owner in the static group, where as in the dynamic group, the group private key is change when the users revoke from the group. When the users leave from the group the already signed blocks are resigned by cloud service provider instead of owner is efficiently handled by efficient proxy re-signature scheme.

Keywords: data integrity, dynamic group, group signature, public auditing

Procedia PDF Downloads 387

24955 Rodriguez Diego, Del Valle Martin, Hargreaves Matias, Riveros Jose Luis

Authors: Nathainail Bashir, Neil Anderson

Abstract:

The objective of this study site was to investigate the current state of the practice with regards to karst detection methods and recommend the best method and pattern of arrays to acquire the desire results. Proper site investigation in karst prone regions is extremely valuable in determining the location of possible voids. Two geophysical techniques were employed: multichannel analysis of surface waves (MASW) and electric resistivity tomography (ERT).The MASW data was acquired at each test location using different array lengths and different array orientations (to increase the probability of getting interpretable data in karst terrain). The ERT data were acquired using a dipole-dipole array consisting of 168 electrodes. The MASW data was interpreted (re: estimated depth to physical top of rock) and used to constrain and verify the interpretation of the ERT data. The ERT data indicates poorer quality MASW data were acquired in areas where there was significant local variation in the depth to top of rock.

Keywords: dipole-dipole, ERT, Karst terrains, MASW

Procedia PDF Downloads 314

24954 Data Science in Military Decision-Making: A Semi-Systematic Literature Review

Authors: H. W. Meerveld, R. H. A. Lindelauf

Abstract:

In contemporary warfare, data science is crucial for the military in achieving information superiority. Yet, to the authors’ knowledge, no extensive literature survey on data science in military decision-making has been conducted so far. In this study, 156 peer-reviewed articles were analysed through an integrative, semi-systematic literature review to gain an overview of the topic. The study examined to what extent literature is focussed on the opportunities or risks of data science in military decision-making, differentiated per level of war (i.e. strategic, operational, and tactical level). A relatively large focus on the risks of data science was observed in social science literature, implying that political and military policymakers are disproportionally influenced by a pessimistic view on the application of data science in the military domain. The perceived risks of data science are, however, hardly addressed in formal science literature. This means that the concerns on the military application of data science are not addressed to the audience that can actually develop and enhance data science models and algorithms. Cross-disciplinary research on both the opportunities and risks of military data science can address the observed research gaps. Considering the levels of war, relatively low attention for the operational level compared to the other two levels was observed, suggesting a research gap with reference to military operational data science. Opportunities for military data science mostly arise at the tactical level. On the contrary, studies examining strategic issues mostly emphasise the risks of military data science. Consequently, domain-specific requirements for military strategic data science applications are hardly expressed. Lacking such applications may ultimately lead to a suboptimal strategic decision in today’s warfare.

Keywords: data science, decision-making, information superiority, literature review, military

Procedia PDF Downloads 156

24953 Legal Regulation of Personal Information Data Transmission Risk Assessment: A Case Study of the EU’s DPIA

Authors: Cai Qianyi

Abstract:

In the midst of global digital revolution, the flow of data poses security threats that call China's existing legislative framework for protecting personal information into question. As a preliminary procedure for risk analysis and prevention, the risk assessment of personal data transmission lacks detailed guidelines for support. Existing provisions reveal unclear responsibilities for network operators and weakened rights for data subjects. Furthermore, the regulatory system's weak operability and a lack of industry self-regulation heighten data transmission hazards. This paper aims to compare the regulatory pathways for data information transmission risks between China and Europe from a legal framework and content perspective. It draws on the “Data Protection Impact Assessment Guidelines” to empower multiple stakeholders, including data processors, controllers, and subjects, while also defining obligations. In conclusion, this paper intends to solve China's digital security shortcomings by developing a more mature regulatory framework and industry self-regulation mechanisms, resulting in a win-win situation for personal data protection and the development of the digital economy.

Keywords: personal information data transmission, risk assessment, DPIA, internet service provider, personal information data transimission, risk assessment

Procedia PDF Downloads 55

24952 Wavelets Contribution on Textual Data Analysis

Authors: Habiba Ben Abdessalem

Abstract:

The emergence of giant set of textual data was the push that has encouraged researchers to invest in this field. The purpose of textual data analysis methods is to facilitate access to such type of data by providing various graphic visualizations. Applying these methods requires a corpus pretreatment step, whose standards are set according to the objective of the problem studied. This step determines the forms list contained in contingency table by keeping only those information carriers. This step may, however, lead to noisy contingency tables, so the use of wavelet denoising function. The validity of the proposed approach is tested on a text database that offers economic and political events in Tunisia for a well definite period.

Keywords: textual data, wavelet, denoising, contingency table

Procedia PDF Downloads 275

24951 Employing Remotely Sensed Soil and Vegetation Indices and Predicting ‎by Long ‎Short-Term Memory to Irrigation Scheduling Analysis

Authors: Elham Koohikerade, Silvio Jose Gumiere

Abstract:

In this research, irrigation is highlighted as crucial for improving both the yield and quality of ‎potatoes due to their high sensitivity to soil moisture changes. The study presents a hybrid Long ‎Short-Term Memory (LSTM) model aimed at optimizing irrigation scheduling in potato fields in ‎Quebec City, Canada. This model integrates model-based and satellite-derived datasets to simulate ‎soil moisture content, addressing the limitations of field data. Developed under the guidance of the ‎Food and Agriculture Organization (FAO), the simulation approach compensates for the lack of direct ‎soil sensor data, enhancing the LSTM model's predictions. The model was calibrated using indices ‎like Surface Soil Moisture (SSM), Normalized Vegetation Difference Index (NDVI), Enhanced ‎Vegetation Index (EVI), and Normalized Multi-band Drought Index (NMDI) to effectively forecast ‎soil moisture reductions. Understanding soil moisture and plant development is crucial for assessing ‎drought conditions and determining irrigation needs. This study validated the spectral characteristics ‎of vegetation and soil using ECMWF Reanalysis v5 (ERA5) and Moderate Resolution Imaging ‎Spectrometer (MODIS) data from 2019 to 2023, collected from agricultural areas in Dolbeau and ‎Peribonka, Quebec. Parameters such as surface volumetric soil moisture (0-7 cm), NDVI, EVI, and ‎NMDI were extracted from these images. A regional four-year dataset of soil and vegetation moisture ‎was developed using a machine learning approach combining model-based and satellite-based ‎datasets. The LSTM model predicts soil moisture dynamics hourly across different locations and ‎times, with its accuracy verified through cross-validation and comparison with existing soil moisture ‎datasets. The model effectively captures temporal dynamics, making it valuable for applications ‎requiring soil moisture monitoring over time, such as anomaly detection and memory analysis. By ‎identifying typical peak soil moisture values and observing distribution shapes, irrigation can be ‎scheduled to maintain soil moisture within Volumetric Soil Moisture (VSM) values of 0.25 to 0.30 ‎m²/m², avoiding under and over-watering. The strong correlations between parcels suggest that a ‎uniform irrigation strategy might be effective across multiple parcels, with adjustments based on ‎specific parcel characteristics and historical data trends. The application of the LSTM model to ‎predict soil moisture and vegetation indices yielded mixed results. While the model effectively ‎captures the central tendency and temporal dynamics of soil moisture, it struggles with accurately ‎predicting EVI, NDVI, and NMDI.‎

Keywords: irrigation scheduling, LSTM neural network, remotely sensed indices, soil and vegetation ‎monitoring

Procedia PDF Downloads 38

24950 Solar Cell Packed and Insulator Fused Panels for Efficient Cooling in Cubesat and Satellites

Authors: Anand K. Vinu, Vaishnav Vimal, Sasi Gopalan

Abstract:

All spacecraft components have a range of allowable temperatures that must be maintained to meet survival and operational requirements during all mission phases. Due to heat absorption, transfer, and emission on one side, the satellite surface presents an asymmetric temperature distribution and causes a change in momentum, which can manifest in spinning and non-spinning satellites in different manners. This problem can cause orbital decays in satellites which, if not corrected, will interfere with its primary objective. The thermal analysis of any satellite requires data from the power budget for each of the components used. This is because each of the components has different power requirements, and they are used at specific times in an orbit. There are three different cases that are run, one is the worst operational hot case, the other one is the worst non-operational cold case, and finally, the operational cold case. Sunlight is a major source of heating that takes place on the satellite. The way in which it affects the spacecraft depends on the distance from the Sun. Any part of a spacecraft or satellite facing the Sun will absorb heat (a net gain), and any facing away will radiate heat (a net loss). We can use the state-of-the-art foldable hybrid insulator/radiator panel. When the panels are opened, that particular side acts as a radiator for dissipating the heat. Here the insulator, in our case, the aerogel, is sandwiched with solar cells and radiator fins (solar cells outside and radiator fins inside). Each insulated side panel can be opened and closed using actuators depending on the telemetry data of the CubeSat. The opening and closing of the panels are dependent on the special code designed for this particular application, where the computer calculates where the Sun is relative to the satellites. According to the data obtained from the sensors, the computer decides which panel to open and by how many degrees. For example, if the panels open 180 degrees, the solar panels will directly face the Sun, in turn increasing the current generator of that particular panel. One example is when one of the corners of the CubeSat is facing or if more than one side is having a considerable amount of sun rays incident on it. Then the code will analyze the optimum opening angle for each panel and adjust accordingly. Another means of cooling is the passive way of cooling. It is the most suitable system for a CubeSat because of its limited power budget constraints, low mass requirements, and less complex design. Other than this fact, it also has other advantages in terms of reliability and cost. One of the passive means is to make the whole chase act as a heat sink. For this, we can make the entire chase out of heat pipes and connect the heat source to this chase with a thermal strap that transfers the heat to the chassis.

Keywords: passive cooling, CubeSat, efficiency, satellite, stationary satellite

Procedia PDF Downloads 94

24949 Customer Churn Analysis in Telecommunication Industry Using Data Mining Approach

Authors: Burcu Oralhan, Zeki Oralhan, Nilsun Sariyer, Kumru Uyar

Abstract:

Data mining has been becoming more and more important and a wide range of applications in recent years. Data mining is the process of find hidden and unknown patterns in big data. One of the applied fields of data mining is Customer Relationship Management. Understanding the relationships between products and customers is crucial for every business. Customer Relationship Management is an approach to focus on customer relationship development, retention and increase on customer satisfaction. In this study, we made an application of a data mining methods in telecommunication customer relationship management side. This study aims to determine the customers profile who likely to leave the system, develop marketing strategies, and customized campaigns for customers. Data are clustered by applying classification techniques for used to determine the churners. As a result of this study, we will obtain knowledge from international telecommunication industry. We will contribute to the understanding and development of this subject in Customer Relationship Management.

Keywords: customer churn analysis, customer relationship management, data mining, telecommunication industry

Procedia PDF Downloads 311

24948 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis

Authors: N. R. N. Idris, S. Baharom

Abstract:

A meta-analysis may be performed using aggregate data (AD) or an individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. Statistical advantages of combining the studies from different level have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analysis were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenario. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary level data as they would significantly increased the accuracy of the estimates. On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has moderating effects on the biasness of the estimates of the treatment effects as the IPD tends to overestimate the treatment effects, while the AD has the tendency to produce underestimated effect estimates. These results may provide some guide in deciding if significant benefit is gained by pooling the two levels of data when conducting meta-analysis.

Keywords: aggregate data, combined-level data, individual patient data, meta-analysis

Procedia PDF Downloads 371

24947 Analyzing On-Line Process Data for Industrial Production Quality Control

Authors: Hyun-Woo Cho

Abstract:

The monitoring of industrial production quality has to be implemented to alarm early warning for unusual operating conditions. Furthermore, identification of their assignable causes is necessary for a quality control purpose. For such tasks many multivariate statistical techniques have been applied and shown to be quite effective tools. This work presents a process data-based monitoring scheme for production processes. For more reliable results some additional steps of noise filtering and preprocessing are considered. It may lead to enhanced performance by eliminating unwanted variation of the data. The performance evaluation is executed using data sets from test processes. The proposed method is shown to provide reliable quality control results, and thus is more effective in quality monitoring in the example. For practical implementation of the method, an on-line data system must be available to gather historical and on-line data. Recently large amounts of data are collected on-line in most processes and implementation of the current scheme is feasible and does not give additional burdens to users.

Keywords: detection, filtering, monitoring, process data

Procedia PDF Downloads 552

24946 A Review of Travel Data Collection Methods

Authors: Muhammad Awais Shafique, Eiji Hato

Abstract:

Household trip data is of crucial importance for managing present transportation infrastructure as well as to plan and design future facilities. It also provides basis for new policies implemented under Transportation Demand Management. The methods used for household trip data collection have changed with passage of time, starting with the conventional face-to-face interviews or paper-and-pencil interviews and reaching to the recent approach of employing smartphones. This study summarizes the step-wise evolution in the travel data collection methods. It provides a comprehensive review of the topic, for readers interested to know the changing trends in the data collection field.

Keywords: computer, smartphone, telephone, travel survey

Procedia PDF Downloads 307

24945 A Business-to-Business Collaboration System That Promotes Data Utilization While Encrypting Information on the Blockchain

Authors: Hiroaki Nasu, Ryota Miyamoto, Yuta Kodera, Yasuyuki Nogami

Abstract:

To promote Industry 4.0 and Society 5.0 and so on, it is important to connect and share data so that every member can trust it. Blockchain (BC) technology is currently attracting attention as the most advanced tool and has been used in the financial field and so on. However, the data collaboration using BC has not progressed sufficiently among companies on the supply chain of manufacturing industry that handle sensitive data such as product quality, manufacturing conditions, etc. There are two main reasons why data utilization is not sufficiently advanced in the industrial supply chain. The first reason is that manufacturing information is top secret and a source for companies to generate profits. It is difficult to disclose data even between companies with transactions in the supply chain. In the blockchain mechanism such as Bitcoin using PKI (Public Key Infrastructure), in order to confirm the identity of the company that has sent the data, the plaintext must be shared between the companies. Another reason is that the merits (scenarios) of collaboration data between companies are not specifically specified in the industrial supply chain. For these problems this paper proposes a Business to Business (B2B) collaboration system using homomorphic encryption and BC technique. Using the proposed system, each company on the supply chain can exchange confidential information on encrypted data and utilize the data for their own business. In addition, this paper considers a scenario focusing on quality data, which was difficult to collaborate because it is a top secret. In this scenario, we show a implementation scheme and a benefit of concrete data collaboration by proposing a comparison protocol that can grasp the change in quality while hiding the numerical value of quality data.

Keywords: business to business data collaboration, industrial supply chain, blockchain, homomorphic encryption

Procedia PDF Downloads 132

24944 Multivariate Assessment of Mathematics Test Scores of Students in Qatar

Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski

Abstract:

Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.

Keywords: cluster analysis, education, mathematics, profiles

Procedia PDF Downloads 122

24943 Parallel Hybrid Honeypot and IDS Architecture to Detect Network Attacks

Authors: Hafiz Gulfam Ahmad, Chuangdong Li, Zeeshan Ahmad

Abstract:

In this paper, we proposed a parallel IDS and honeypot based approach to detect and analyze the unknown and known attack taxonomy for improving the IDS performance and protecting the network from intruders. The main theme of our approach is to record and analyze the intruder activities by using both the low and high interaction honeypots. Our architecture aims to achieve the required goals by combing signature based IDS, honeypots and generate the new signatures. The paper describes the basic component, design and implementation of this approach and also demonstrates the effectiveness of this approach reducing the probability of network attacks.

Keywords: network security, intrusion detection, honeypot, snort, nmap

Procedia PDF Downloads 560

24942 Dataset Quality Index:Development of Composite Indicator Based on Standard Data Quality Indicators

Authors: Sakda Loetpiparwanich, Preecha Vichitthamaros

Abstract:

Nowadays, poor data quality is considered one of the majority costs for a data project. The data project with data quality awareness almost as much time to data quality processes while data project without data quality awareness negatively impacts financial resources, efficiency, productivity, and credibility. One of the processes that take a long time is defining the expectations and measurements of data quality because the expectation is different up to the purpose of each data project. Especially, big data project that maybe involves with many datasets and stakeholders, that take a long time to discuss and define quality expectations and measurements. Therefore, this study aimed at developing meaningful indicators to describe overall data quality for each dataset to quick comparison and priority. The objectives of this study were to: (1) Develop a practical data quality indicators and measurements, (2) Develop data quality dimensions based on statistical characteristics and (3) Develop Composite Indicator that can describe overall data quality for each dataset. The sample consisted of more than 500 datasets from public sources obtained by random sampling. After datasets were collected, there are five steps to develop the Dataset Quality Index (SDQI). First, we define standard data quality expectations. Second, we find any indicators that can measure directly to data within datasets. Thirdly, each indicator aggregates to dimension using factor analysis. Next, the indicators and dimensions were weighted by an effort for data preparing process and usability. Finally, the dimensions aggregate to Composite Indicator. The results of these analyses showed that: (1) The developed useful indicators and measurements contained ten indicators. (2) the developed data quality dimension based on statistical characteristics, we found that ten indicators can be reduced to 4 dimensions. (3) The developed Composite Indicator, we found that the SDQI can describe overall datasets quality of each dataset and can separate into 3 Level as Good Quality, Acceptable Quality, and Poor Quality. The conclusion, the SDQI provide an overall description of data quality within datasets and meaningful composition. We can use SQDI to assess for all data in the data project, effort estimation, and priority. The SDQI also work well with Agile Method by using SDQI to assessment in the first sprint. After passing the initial evaluation, we can add more specific data quality indicators into the next sprint.

Keywords: data quality, dataset quality, data quality management, composite indicator, factor analysis, principal component analysis

Procedia PDF Downloads 133

24941 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.

Keywords: predictive analysis, big data, predictive analysis algorithms, CART algorithm

Procedia PDF Downloads 136

24940 Canopy Temperature Acquired from Daytime and Nighttime Aerial Data as an Indicator of Trees’ Health Status

Authors: Agata Zakrzewska, Dominik Kopeć, Adrian Ochtyra

Abstract:

The growing number of new cameras, sensors, and research methods allow for a broader application of thermal data in remote sensing vegetation studies. The aim of this research was to check whether it is possible to use thermal infrared data with a spectral range (3.6-4.9 μm) obtained during the day and the night to assess the health condition of selected species of deciduous trees in an urban environment. For this purpose, research was carried out in the city center of Warsaw (Poland) in 2020. During the airborne data acquisition, thermal data, laser scanning, and orthophoto map images were collected. Synchronously with airborne data, ground reference data were obtained for 617 studied species (Acer platanoides, Acer pseudoplatanus, Aesculus hippocastanum, Tilia cordata, and Tilia × euchlora) in different health condition states. The results were as follows: (i) healthy trees are cooler than trees in poor condition and dying both in the daytime and nighttime data; (ii) the difference in the canopy temperatures between healthy and dying trees was 1.06oC of mean value on the nighttime data and 3.28oC of mean value on the daytime data; (iii) condition classes significantly differentiate on both daytime and nighttime thermal data, but only on daytime data all condition classes differed statistically significantly from each other. In conclusion, the aerial thermal data can be considered as an alternative to hyperspectral data, a method of assessing the health condition of trees in an urban environment. Especially data obtained during the day, which can differentiate condition classes better than data obtained at night. The method based on thermal infrared and laser scanning data fusion could be a quick and efficient solution for identifying trees in poor health that should be visually checked in the field.

Keywords: middle wave infrared, thermal imagery, tree discoloration, urban trees

Procedia PDF Downloads 110

24939 Integration of a Microbial Electrolysis Cell and an Oxy-Combustion Boiler

Authors: Ruth Diego, Luis M. Romeo, Antonio Morán

Abstract:

In the present work, a study of the coupling of a Bioelectrochemical System together with an oxy-combustion boiler is carried out; specifically, it proposes to connect the combustion gas outlet of a boiler with a microbial electrolysis cell (MEC) where the CO2 from the gases are transformed into methane in the cathode chamber, and the oxygen produced in the anode chamber is recirculated to the oxy-combustion boiler. The MEC mainly consists of two electrodes (anode and cathode) immersed in an aqueous electrolyte; these electrodes are separated by a proton exchange membrane (PEM). In this case, the anode is abiotic (where oxygen is produced), and it is at the cathode that an electroactive biofilm is formed with microorganisms that catalyze the CO2 reduction reactions. Real data from an oxy-combustion process in a boiler of around 20 thermal MW have been used for this study and are combined with data obtained on a smaller scale (laboratory-pilot scale) to determine the yields that could be obtained considering the system as environmentally sustainable energy storage. In this way, an attempt is made to integrate a relatively conventional energy production system (oxy-combustion) with a biological system (microbial electrolysis cell), which is a challenge to be addressed in this type of new hybrid scheme. In this way, a novel concept is presented with the basic dimensioning of the necessary equipment and the efficiency of the global process. In this work, it has been calculated that the efficiency of this power-to-gas system based on MEC cells when coupled to industrial processes is of the same order of magnitude as the most promising equivalent routes. The proposed process has two main limitations, the overpotentials in the electrodes that penalize the overall efficiency and the need for storage tanks for the process gases. The results of the calculations carried out in this work show that certain real potentials achieve an acceptable performance. Regarding the tanks, with adequate dimensioning, it is possible to achieve complete autonomy. The proposed system called OxyMES provides energy storage without energetically penalizing the process when compared to an oxy-combustion plant with conventional CO2 capture. According to the results obtained, this system can be applied as a measure to decarbonize an industry, changing the original fuel of the oxy-combustion boiler to the biogas generated in the MEC cell. It could also be used to neutralize CO2 emissions from industry by converting it to methane and then injecting it into the natural gas grid.

Keywords: microbial electrolysis cells, oxy-combustion, co2, power-to-gas

Procedia PDF Downloads 104

24938 On the Approximate Solution of Continuous Coefficients for Solving Third Order Ordinary Differential Equations

Authors: A. M. Sagir

Abstract:

This paper derived four newly schemes which are combined in order to form an accurate and efficient block method for parallel or sequential solution of third order ordinary differential equations of the form y^'''= f(x,y,y^',y^'' ), y(α)=y_0,〖y〗^' (α)=β,y^('' ) (α)=μ with associated initial or boundary conditions. The implementation strategies of the derived method have shown that the block method is found to be consistent, zero stable and hence convergent. The derived schemes were tested on stiff and non-stiff ordinary differential equations, and the numerical results obtained compared favorably with the exact solution.

Keywords: block method, hybrid, linear multistep, self-starting, third order ordinary differential equations

Procedia PDF Downloads 267

24937 Hierarchical Clustering Algorithms in Data Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

Clustering is a process of grouping objects and data into groups of clusters to ensure that data objects from the same cluster are identical to each other. Clustering algorithms in one of the areas in data mining and it can be classified into partition, hierarchical, density based, and grid-based. Therefore, in this paper, we do a survey and review for four major hierarchical clustering algorithms called CURE, ROCK, CHAMELEON, and BIRCH. The obtained state of the art of these algorithms will help in eliminating the current problems, as well as deriving more robust and scalable algorithms for clustering.

Keywords: clustering, unsupervised learning, algorithms, hierarchical

Procedia PDF Downloads 881

24936 End to End Monitoring in Oracle Fusion Middleware for Data Verification

Authors: Syed Kashif Ali, Usman Javaid, Abdullah Chohan

Abstract:

In large enterprises multiple departments use different sort of information systems and databases according to their needs. These systems are independent and heterogeneous in nature and sharing information/data between these systems is not an easy task. The usage of middleware technologies have made data sharing between systems very easy. However, monitoring the exchange of data/information for verification purposes between target and source systems is often complex or impossible for maintenance department due to security/access privileges on target and source systems. In this paper, we are intended to present our experience of an end to end data monitoring approach at middle ware level implemented in Oracle BPEL for data verification without any help of monitoring tool.

Keywords: service level agreement, SOA, BPEL, oracle fusion middleware, web service monitoring

Procedia PDF Downloads 476

24935 Dissimilarity Measure for General Histogram Data and Its Application to Hierarchical Clustering

Authors: K. Umbleja, M. Ichino

Abstract:

Symbolic data mining has been developed to analyze data in very large datasets. It is also useful in cases when entry specific details should remain hidden. Symbolic data mining is quickly gaining popularity as datasets in need of analyzing are becoming ever larger. One type of such symbolic data is a histogram, which enables to save huge amounts of information into a single variable with high-level of granularity. Other types of symbolic data can also be described in histograms, therefore making histogram a very important and general symbolic data type - a method developed for histograms - can also be applied to other types of symbolic data. Due to its complex structure, analyzing histograms is complicated. This paper proposes a method, which allows to compare two histogram-valued variables and therefore find a dissimilarity between two histograms. Proposed method uses the Ichino-Yaguchi dissimilarity measure for mixed feature-type data analysis as a base and develops a dissimilarity measure specifically for histogram data, which allows to compare histograms with different number of bins and bin widths (so called general histogram). Proposed dissimilarity measure is then used as a measure for clustering. Furthermore, linkage method based on weighted averages is proposed with the concept of cluster compactness to measure the quality of clustering. The method is then validated with application on real datasets. As a result, the proposed dissimilarity measure is found producing adequate and comparable results with general histograms without the loss of detail or need to transform the data.

Keywords: dissimilarity measure, hierarchical clustering, histograms, symbolic data analysis

Procedia PDF Downloads 157

24934 Multilingualism and the Creation of New Languages: The Case of Camfranglais Spoken in Italy and Germany

Authors: Jocelyne Kenne Kenne

Abstract:

Previous works in the field of sociolinguistics have explored the various outcomes of linguistic pluralism. One of these outcomes is the creation of new languages. The presentation will focus on one of such languages, Camfranglais, a hybrid language spoken by Cameroonians. It appeared in the 1970s in the francophone area in Cameroon and developed as a result of interactions between French, English, Cameroonian Pidgin English and local Cameroonian languages, all languages spoken in Cameroon. With the migration of Cameroonians to Europe, researches have been conducted to analyze the sociolinguistic profile of Cameroonians in their new environment. The emphasis on this presentation will be on two recent studies that have been conducted to analyze the peculiarity of Camfranglais in two European countries: Germany and Italy. The research involved 59 Cameroonians living in Italy and 49 Cameroonians residing in Germany. The respondents were composed of participants from different linguistic background, students and workers, married and single. A combination of quantitative and qualitative research methods was employed. The field study was divided into three parts. The first part was focused on observing the Cameroonians interact in different places such as in canteens, in the university halls of residence, lecture theatres, at homes, at various Cameroonian meetings. Those observations were accompanied by audio-recordings of the various interactions. The aim was to study communication between Cameroonians to see whether they use Camfranglais or not; if yes, in which domains and what were the speakers’ linguistic profiles. Additionally, questionnaires of different lengths were used to collect biographical information concerning the participants and their sociolinguistic profile and finally, in-depth interviews with Cameroonians were conducted to inquire about the use, the functions and the importance of this language in the migratory context. The results of the research demonstrate how a widespread use of Camfranglais by Cameroonians in Germany and Italy reveal a longing for home on the one hand and a sign of belonging on the other. It also shows the differences that exist between the profiles of Camfranglais speakers in Europe and the speakers in Cameroon notably in terms of age and social class. Finally, it points out some differences in the use, the structure and the functions of this hybrid language in the migratory setting. This study is a contribution to existing research in the field of contact languages and can serve as a comparison for other situations of multilingualism and the creation of mixed languages. Furthermore, with globalization, the study of migrant languages and the contact of these languages with new languages are topics that might be productive for further research in the field of sociolinguistics.

Keywords: interaction, migrants language, multilingualism, mixed languages

Procedia PDF Downloads 210

24933 Hybridization of Mathematical Transforms for Robust Video Watermarking Technique

Authors: Harpal Singh, Sakshi Batra

Abstract:

The widespread and easy accesses to multimedia contents and possibility to make numerous copies without loss of significant fidelity have roused the requirement of digital rights management. Thus this problem can be effectively solved by Digital watermarking technology. This is a concept of embedding some sort of data or special pattern (watermark) in the multimedia content; this information will later prove ownership in case of a dispute, trace the marked document’s dissemination, identify a misappropriating person or simply inform user about the rights-holder. The primary motive of digital watermarking is to embed the data imperceptibly and robustly in the host information. Extensive counts of watermarking techniques have been developed to embed copyright marks or data in digital images, video, audio and other multimedia objects. With the development of digital video-based innovations, copyright dilemma for the multimedia industry increases. Video watermarking had been proposed in recent years to serve the issue of illicit copying and allocation of videos. It is the process of embedding copyright information in video bit streams. Practically video watermarking schemes have to address some serious challenges as compared to image watermarking schemes like real-time requirements in the video broadcasting, large volume of inherently redundant data between frames, the unbalance between the motion and motionless regions etc. and they are particularly vulnerable to attacks, for example, frame swapping, statistical analysis, rotation, noise, median and crop attacks. In this paper, an effective, robust and imperceptible video watermarking algorithm is proposed based on hybridization of powerful mathematical transforms; Fractional Fourier Transform (FrFT), Discrete Wavelet transforms (DWT) and Singular Value Decomposition (SVD) using redundant wavelet. This scheme utilizes various transforms for embedding watermarks on different layers by using Hybrid systems. For this purpose, the video frames are portioned into layers (RGB) and the watermark is being embedded in two forms in the video frames using SVD portioning of the watermark, and DWT sub-band decomposition of host video, to facilitate copyright safeguard as well as reliability. The FrFT orders are used as the encryption key that allows the watermarking method to be more robust against various attacks. The fidelity of the scheme is enhanced by introducing key generation and wavelet based key embedding watermarking scheme. Thus, for watermark embedding and extraction, same key is required. Therefore the key must be shared between the owner and the verifier via some safe network. This paper demonstrates the performance by considering different qualitative metrics namely Peak Signal to Noise ratio, Structure similarity index and correlation values and also apply some attacks to prove the robustness. The Experimental results are presented to demonstrate that the proposed scheme can withstand a variety of video processing attacks as well as imperceptibility.

Keywords: discrete wavelet transform, robustness, video watermarking, watermark

Procedia PDF Downloads 220