Search results for: heterogeneous massive data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25412

24542 An Efficient Traceability Mechanism in the Audited Cloud Data Storage

Authors: Ramya P, Lino Abraham Varghese, S. Bose

Abstract:

Cloud storage services allow data to be stored in the cloud and shared across multiple users. Unexpected hardware or software failures and human errors can easily cause data stored in the cloud to be lost or corrupted, which compromises the integrity of that data. Some mechanisms have been designed to allow both data owners and public verifiers to efficiently audit cloud data integrity without retrieving the entire data from the cloud server. However, public auditing of the integrity of shared data with the existing mechanisms unavoidably reveals confidential information, such as the identity of the person, to public verifiers. Here, a privacy-preserving mechanism is proposed to support public auditing of shared data stored in the cloud. It uses group signatures to compute the verification metadata needed to audit the correctness of shared data. The identity of the signer of each block in the shared data is kept confidential from public verifiers, who can still verify shared data integrity without retrieving the entire file. On demand, however, the signer of each block is revealed to the owner alone. In a static group, the group private key is generated once by the owner, whereas in a dynamic group, the group private key is changed when users are revoked from the group. When users leave the group, the blocks they have already signed are re-signed by the cloud service provider instead of the owner, which is handled efficiently by a proxy re-signature scheme.

Keywords: data integrity, dynamic group, group signature, public auditing

Procedia PDF Downloads 382
24541 Smart Automated Furrow Irrigation: A Preliminary Evaluation

Authors: Jasim Uddin, Rod Smith, Malcolm Gillies

Abstract:

Surface irrigation is the most popular irrigation method all over the world. However, two issues, low efficiency and heavy labour involvement, have concerned irrigators in recent years because of scarcity. To address these issues, a smart automated furrow irrigation system that can be operated using digital devices such as a smartphone, iPad or computer is conceptualised, and a preliminary evaluation was conducted in this study. The smart automated system integrates commercially available software and hardware. It includes real-time surface irrigation optimisation software (SISCO) and Rubicon Water’s surface irrigation automation hardware and software. The automated system consists of an automatic water delivery system with 300 mm flexible pipes attached to both sides of a remotely controlled valve to operate the irrigation. It also includes a water level sensor to obtain the real-time inflow rate from the measured head in the channel, advance sensors to measure the advance time to particular points of an irrigated field, and a solar-powered telemetry system, including a base station, to connect all the field sensors with the main server. On the basis of field data, the software (SISCO) optimises the ongoing irrigation, determines the optimum cut-off time for that irrigation, and sends this information to the control valve to stop the irrigation at that (cut-off) time. The preliminary evaluation shows that the automated surface irrigation worked reasonably well without manual intervention. The evaluation of farmer-managed irrigation events shows the potential to save a significant amount of water and labour. Substantial economic and social benefits are expected in rural industries from adopting this system. The future outcome of this work would be a fully tested commercial adaptive real-time furrow irrigation system able to compete with the pressurised alternatives of centre pivot or lateral move machines on capital cost and on water and labour savings, but without the massive energy costs.

Keywords: furrow irrigation, smart automation, infiltration, SISCO, real-time irrigation, adaptive control

Procedia PDF Downloads 441
24540 Reasonable Adjustment for Students with Disabilities - Opportunities and Limits in Social Work Education

Authors: Bartelsen-Raemy Annabelle, Gerber Andrea

Abstract:

Objectives: The adoption of the UN Convention on the Rights of Persons with Disabilities has the effect that higher education institutions in Switzerland are called upon to promote inclusive university education. In this context, our School of Social Work aims to provide fair participation and the removal of barriers in our study programmes at bachelor’s and master’s levels. In 2015 we developed a concept of reasonable adjustments for students with disabilities and chronic illness as an instrument to provide equal opportunities for those students. We reviewed the implementation of this concept as part of our quality management process. Using a qualitative research design, we explored how affected students and lecturers experience the processes and measures taken and which barriers they still perceive. Methods: We captured subjective perspectives and experience of measures by conducting 15 problem-centred interviews with affected students and three experimental focus groups with lecturers. The data was processed using structured qualitative content analysis and summarised as key categories. Results: All respondents evaluated the concept of reasonable adjustment very positively and emphasised its importance for equal opportunities. Our analysis revealed differences in the usage and perception of both groups and showed that the students interviewed were a heterogeneous group with different needs. Overall, the students described the adjustments, in particular in relation to examinations and other assignments, as a great relief. The lecturers expressed high standards for their own teaching and supervision of students and, at the same time, wished for more support from the university. However, despite the positive evaluation by the lecturers, the limits of reasonable adjustment became evident. It is necessary to consider the limits of reasonable adjustments in terms of professional skills. 
Conclusion: Reasonable adjustments should, therefore, be seen as an element of an inclusive university culture that must be complemented by further measures. Taking this into account, we have planned further research as a basis for the development of a diversity and inclusion policy.

Keywords: opportunities and limits, reasonable adjustment, social work education, students with disabilities

Procedia PDF Downloads 119
24539 A Density Functional Theory Study of Metal-Porphyrin Graphene for CO2 Hydration

Authors: Manju Verma, Parag A. Deshpande

Abstract:

Electronic structure calculations of hydrogen-terminated metal-porphyrin graphene were carried out to explore its catalytic activity for the CO2 hydration reaction. A ruthenium atom was substituted in place of a carbon atom of graphene, and the carbon atoms chelating the ruthenium were replaced by four nitrogen atoms in the metal-porphyrin graphene system. The ruthenium atom created the active site for the CO2 hydration reaction. Ruthenium-porphyrin graphene followed the mechanism of the carbonic anhydrase enzyme for CO2 conversion to the HCO3- ion. The CO2 hydration reaction over ruthenium-porphyrin graphene proceeded via the following elementary steps: OH- formation from H2O dissociation, CO2 bending under nucleophilic attack by the OH- ion, HCO3- ion formation through proton migration, and HCO3- ion desorption by H2O addition. The free energy landscape showed proton transfer to yield the HCO3- ion to be the rate-limiting step.

Keywords: ruthenium-porphyrin graphene, CO2 hydration, carbonic anhydrase, heterogeneous catalyst, density functional theory

Procedia PDF Downloads 245
24538 Rodriguez Diego, Del Valle Martin, Hargreaves Matias, Riveros Jose Luis

Authors: Nathainail Bashir, Neil Anderson

Abstract:

The objective of this study was to investigate the current state of practice with regard to karst detection methods and to recommend the best method and pattern of arrays to acquire the desired results. Proper site investigation in karst-prone regions is extremely valuable in determining the location of possible voids. Two geophysical techniques were employed: multichannel analysis of surface waves (MASW) and electric resistivity tomography (ERT). The MASW data were acquired at each test location using different array lengths and different array orientations (to increase the probability of getting interpretable data in karst terrain). The ERT data were acquired using a dipole-dipole array consisting of 168 electrodes. The MASW data were interpreted (re: estimated depth to physical top of rock) and used to constrain and verify the interpretation of the ERT data. The ERT data indicate that poorer-quality MASW data were acquired in areas where there was significant local variation in the depth to top of rock.

Keywords: dipole-dipole, ERT, Karst terrains, MASW

Procedia PDF Downloads 305
24537 Cellulose Supported Heterogeneous Pd(II) Catalyst for Synthesis of Biaryls

Authors: Talat Baran

Abstract:

The Suzuki C(sp2)-C(sp2) coupling reaction is considered to be one of the best ways to synthesize biaryl compounds. Many studies report the catalytic performance of palladium catalysts in Suzuki coupling reactions. Catalysts on supports such as zeolite, carbon, silica, and chitosan have lately attracted interest because of their low cost, nontoxicity, and eco-friendliness. One of the most important natural biopolymers is cellulose, which is widely considered an eco-friendly biopolymer due to its biodegradable, non-toxic and renewable nature. In this study, (1) a cellulose supported Pd(II) catalyst was synthesized, (2) its chemical structure was characterized by FT-IR, SEM/EDAX, XRD, TG-DTG, and ICP-OES techniques, (3) the performance of the catalyst in Suzuki coupling reactions was investigated using the microwave irradiation technique, and (4) the reusability of the catalyst was tested under optimum conditions. This cellulose supported Pd(II) catalyst exhibited high selectivity and efficiency in Suzuki coupling reactions under mild conditions (50°C). High TON and TOF values were recorded for the catalyst. The reusability tests also showed that the catalyst could be used several times.
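The TON and TOF values mentioned above follow from the usual definitions: turnover number is moles of product per mole of catalyst, and turnover frequency is TON per unit time. A minimal sketch, assuming hypothetical amounts and reaction time (the abstract does not report them):

```python
def turnover_number(mol_product: float, mol_catalyst: float) -> float:
    """TON: moles of product formed per mole of catalyst."""
    return mol_product / mol_catalyst


def turnover_frequency(ton: float, hours: float) -> float:
    """TOF: turnover number per hour of reaction time."""
    return ton / hours


# Hypothetical illustration values, not taken from the study.
example_ton = turnover_number(mol_product=0.95e-3, mol_catalyst=1.0e-6)
example_tof = turnover_frequency(example_ton, hours=0.1)
```

Short reaction times, such as those typical of microwave heating, raise TOF for the same TON, which is why the two figures are normally reported together.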

Keywords: palladium, cellulose, Schiff base, reusability

Procedia PDF Downloads 242
24536 Data Science in Military Decision-Making: A Semi-Systematic Literature Review

Authors: H. W. Meerveld, R. H. A. Lindelauf

Abstract:

In contemporary warfare, data science is crucial for the military in achieving information superiority. Yet, to the authors’ knowledge, no extensive literature survey on data science in military decision-making has been conducted so far. In this study, 156 peer-reviewed articles were analysed through an integrative, semi-systematic literature review to gain an overview of the topic. The study examined to what extent literature is focussed on the opportunities or risks of data science in military decision-making, differentiated per level of war (i.e. strategic, operational, and tactical level). A relatively large focus on the risks of data science was observed in social science literature, implying that political and military policymakers are disproportionally influenced by a pessimistic view on the application of data science in the military domain. The perceived risks of data science are, however, hardly addressed in formal science literature. This means that the concerns on the military application of data science are not addressed to the audience that can actually develop and enhance data science models and algorithms. Cross-disciplinary research on both the opportunities and risks of military data science can address the observed research gaps. Considering the levels of war, relatively low attention for the operational level compared to the other two levels was observed, suggesting a research gap with reference to military operational data science. Opportunities for military data science mostly arise at the tactical level. On the contrary, studies examining strategic issues mostly emphasise the risks of military data science. Consequently, domain-specific requirements for military strategic data science applications are hardly expressed. Lacking such applications may ultimately lead to a suboptimal strategic decision in today’s warfare.

Keywords: data science, decision-making, information superiority, literature review, military

Procedia PDF Downloads 150
24535 Legal Regulation of Personal Information Data Transmission Risk Assessment: A Case Study of the EU’s DPIA

Authors: Cai Qianyi

Abstract:

In the midst of global digital revolution, the flow of data poses security threats that call China's existing legislative framework for protecting personal information into question. As a preliminary procedure for risk analysis and prevention, the risk assessment of personal data transmission lacks detailed guidelines for support. Existing provisions reveal unclear responsibilities for network operators and weakened rights for data subjects. Furthermore, the regulatory system's weak operability and a lack of industry self-regulation heighten data transmission hazards. This paper aims to compare the regulatory pathways for data information transmission risks between China and Europe from a legal framework and content perspective. It draws on the “Data Protection Impact Assessment Guidelines” to empower multiple stakeholders, including data processors, controllers, and subjects, while also defining obligations. In conclusion, this paper intends to solve China's digital security shortcomings by developing a more mature regulatory framework and industry self-regulation mechanisms, resulting in a win-win situation for personal data protection and the development of the digital economy.

Keywords: personal information data transmission, risk assessment, DPIA, internet service provider

Procedia PDF Downloads 47
24534 Wavelets Contribution on Textual Data Analysis

Authors: Habiba Ben Abdessalem

Abstract:

The emergence of giant sets of textual data has encouraged researchers to invest in this field. The purpose of textual data analysis methods is to facilitate access to this type of data by providing various graphic visualizations. Applying these methods requires a corpus pretreatment step, whose standards are set according to the objective of the problem studied. This step determines the list of forms contained in the contingency table by keeping only those that carry information. This step may, however, lead to noisy contingency tables, hence the use of a wavelet denoising function. The validity of the proposed approach is tested on a text database covering economic and political events in Tunisia over a well-defined period.
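The abstract does not say which wavelet or threshold rule is used. As an illustration only, the sketch below applies a one-level Haar transform with soft thresholding to a single row of a contingency table; the function names, the Haar choice, and the threshold value are all assumptions.

```python
import math


def haar_forward(values):
    """One-level Haar transform: returns (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert the one-level Haar transform."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise_row(row, threshold):
    """Denoise one contingency-table row (even length) by thresholding details."""
    if len(row) % 2 != 0:
        raise ValueError("row length must be even for a one-level Haar transform")
    approx, detail = haar_forward(row)
    return haar_inverse(approx, soft_threshold(detail, threshold))


# Hypothetical row of term counts; a large threshold averages each adjacent pair.
smoothed = denoise_row([4.0, 6.0, 10.0, 10.0], threshold=100.0)
```

A threshold of zero returns the row unchanged, so the threshold controls how much small-scale variation is treated as noise.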

Keywords: textual data, wavelet, denoising, contingency table

Procedia PDF Downloads 273
24533 Tale of Massive Distressed Migration from Rural to Urban Areas: A Study of Mumbai City

Authors: Vidya Yadav

Abstract:

Migration is the demographic process that links rural to urban areas, generating or spurring the growth of cities. Evidence shows the role of the city in production processes. The city is seen as a centre of power and a centre of change. It has been observed that not only professionals want to settle in urban areas; rural labourers are also coming to cities for employment. These are people who are compelled to migrate to metropolises because of the lack of employment opportunities in their place of residence. However, the cities also fail to provide adequate employment because of limited job creation and capital-intensive industrialization. So these masses of incoming migrants are forced to take up whatever employment is available to them, particularly in urban informal activities. Ultimately, with these informal jobs, they are compelled to stay in slum areas, which are another form of deprived housing colony. The paper seeks to examine the evidence of poverty-induced migration from rural to urban areas (particularly to urban agglomerations). The present paper utilizes the rich source of census migration data (D-Series) for 1991-2001. The results show that Mumbai remains the most attractive place to migrate to. The migrants are mainly from major states such as Uttar Pradesh, Bihar, West Bengal, Jharkhand, Odisha, and Rajasthan. Male migration is mostly related to employment and female migration to marriage. The paper presents a picture of the occupational absorption of migrants who moved for employment, cross-classified by educational status. The results show that illiterate males are primarily engaged in low-grade production processing work. Illiterate females are engaged in the service sector, but these are actually very low-grade services in India's urban informal sector, such as work as maid servants, domestic help, hawkers, vendors or vegetable sellers. Among those with higher educational levels, a small percentage of males and females were absorbed in professional or clerical work, but this percentage increased over the period 1991-2001.

Keywords: informal, job, migration, urban

Procedia PDF Downloads 272
24532 Developing a Cloud Intelligence-Based Energy Management Architecture Facilitated with Embedded Edge Analytics for Energy Conservation in Demand-Side Management

Authors: Yu-Hsiu Lin, Wen-Chun Lin, Yen-Chang Cheng, Chia-Ju Yeh, Yu-Chuan Chen, Tai-You Li

Abstract:

Demand-Side Management (DSM) has the potential to reduce the electricity costs and carbon emissions associated with electricity use in modern society. A home Energy Management System (EMS), commonly used by residential consumers in the downstream sector of a smart grid to monitor, control, and optimize the energy efficiency of domestic appliances, is a system of computer-aided functionalities that serves as an energy audit for residential DSM. Implementing fault detection and classification for the domestic appliances that are monitored, controlled, and optimized is one of the most important steps toward preventive maintenance, such as preventive maintenance of residential air conditioning and heating, in residential/industrial DSM. In this study, a cloud intelligence-based green EMS built on an Internet of Things (IoT) technology stack for residential DSM is developed. In the EMS, smart sockets based on Arduino MEGA with Ethernet communication, which incorporate a Real Time Clock chip to keep track of the current time as timestamps via Network Time Protocol, are designed and implemented to read load phenomena reflected in the sensed voltage and current signals. Also, a Network-Attached Storage providing data access to a heterogeneous group of IoT clients via Hypertext Transfer Protocol (HTTP) methods is configured as the data store for parsed sensor readings. Lastly, a desktop computer with a WAMP software bundle (the Microsoft® Windows operating system, Apache HTTP Server, MySQL relational database management system, and PHP programming language) serves as a data science analytics engine for the dynamic Web APP/REpresentational State Transfer-ful web service of the residential DSM having globally-Advanced Internet of Artificial Intelligence (AI)/Computational Intelligence. An abstract computing machine, the Java Virtual Machine, enables the desktop computer to run Java programs, and a mash-up of Java, R language, and Python is configured for AI in this study.
With the ability to send real-time push notifications to IoT clients, the desktop computer implements Google-maintained Firebase Cloud Messaging to engage IoT clients across Android/iOS devices and provide a mobile notification service for residential/industrial DSM. In this study, in order to realize edge intelligence, in which edge devices avoid the network latency and the dependence on Internet connectivity of the Internet of Services, support secure access to data stores, and provide immediate analytical and real-time actionable insights at the edge of the network, we upgrade the designed and implemented smart sockets into embedded AI Arduino ones (called embedded AIduino). To realize edge analytics with the proposed embedded AIduino, an Arduino Ethernet shield (WizNet W5100) with a micro SD card connector is used. The SD library is included for reading parsed data from and writing parsed data to an SD card, and an Artificial Neural Network library, ArduinoANN, for Arduino MEGA is imported and used for the locally embedded AI implementation. The embedded AIduino in this study can be developed for further applications in manufacturing industry energy management and sustainable energy management; in the latter, for example, rotating machinery diagnostics identifies energy loss from gross misalignment and unbalance of rotating machines in power plants.

Keywords: demand-side management, edge intelligence, energy management system, fault detection and classification

Procedia PDF Downloads 245
24531 Customer Churn Analysis in Telecommunication Industry Using Data Mining Approach

Authors: Burcu Oralhan, Zeki Oralhan, Nilsun Sariyer, Kumru Uyar

Abstract:

Data mining has become more and more important in recent years and has found a wide range of applications. Data mining is the process of finding hidden and unknown patterns in big data. One of the applied fields of data mining is Customer Relationship Management. Understanding the relationships between products and customers is crucial for every business. Customer Relationship Management is an approach that focuses on developing customer relationships, retaining customers, and increasing customer satisfaction. In this study, we applied data mining methods to customer relationship management in telecommunications. This study aims to determine the profile of customers who are likely to leave the system and to develop marketing strategies and customized campaigns for customers. The data are grouped by applying classification techniques in order to identify the churners. As a result of this study, we will obtain knowledge from the international telecommunication industry and contribute to the understanding and development of this subject in Customer Relationship Management.

Keywords: customer churn analysis, customer relationship management, data mining, telecommunication industry

Procedia PDF Downloads 303
24530 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis

Authors: N. R. N. Idris, S. Baharom

Abstract:

A meta-analysis may be performed using aggregate data (AD) or individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. The statistical advantages of combining studies from different levels have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analyses were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenarios. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary-level data, as it significantly increases the accuracy of the estimates. On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has a moderating effect on the bias of the estimates of the treatment effects, as the IPD tends to overestimate the treatment effects, while the AD tends to produce underestimated effect estimates. These results may provide some guidance in deciding whether significant benefit is gained by pooling the two levels of data when conducting a meta-analysis.
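The three performance measures named above can be computed from simulation output as follows. This is a minimal sketch under common textbook definitions; the function names, the normal-approximation interval (z = 1.96), and the sample numbers are assumptions, not details reported by the authors.

```python
import math


def percentage_relative_bias(estimates, true_value):
    """PRB: 100 * (mean estimate - true value) / true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - true_value) / true_value


def rmse(estimates, true_value):
    """Root mean-square error of the estimates around the true value."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(estimates))


def coverage_probability(estimates, std_errors, true_value, z=1.96):
    """Share of simulated confidence intervals that contain the true value."""
    hits = sum(
        1
        for e, se in zip(estimates, std_errors)
        if e - z * se <= true_value <= e + z * se
    )
    return hits / len(estimates)


# Hypothetical pooled estimates from four simulated meta-analyses.
simulated = [1.1, 0.9, 1.2, 0.8]
prb = percentage_relative_bias(simulated, true_value=1.0)
error = rmse(simulated, true_value=1.0)
coverage = coverage_probability(simulated, [0.1] * 4, true_value=1.0)
```

In a comparison like the one described, these functions would be applied separately to the IPD-only, AD-only and mixed-data estimates of each scenario.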

Keywords: aggregate data, combined-level data, individual patient data, meta-analysis

Procedia PDF Downloads 363
24529 Analyzing On-Line Process Data for Industrial Production Quality Control

Authors: Hyun-Woo Cho

Abstract:

Monitoring of industrial production quality has to be implemented to give early warning of unusual operating conditions. Furthermore, identification of their assignable causes is necessary for quality control purposes. For such tasks, many multivariate statistical techniques have been applied and shown to be quite effective tools. This work presents a process data-based monitoring scheme for production processes. For more reliable results, some additional steps of noise filtering and preprocessing are considered. These may lead to enhanced performance by eliminating unwanted variation in the data. The performance evaluation is executed using data sets from test processes. The proposed method is shown to provide reliable quality control results, and thus is more effective in quality monitoring in the example. For practical implementation of the method, an on-line data system must be available to gather historical and on-line data. Large amounts of data are now collected on-line in most processes, so implementation of the current scheme is feasible and does not place an additional burden on users.

Keywords: detection, filtering, monitoring, process data

Procedia PDF Downloads 548
24528 A Review of Travel Data Collection Methods

Authors: Muhammad Awais Shafique, Eiji Hato

Abstract:

Household trip data is of crucial importance for managing present transportation infrastructure as well as for planning and designing future facilities. It also provides the basis for new policies implemented under Transportation Demand Management. The methods used for household trip data collection have changed over time, starting with conventional face-to-face or paper-and-pencil interviews and reaching the recent approach of employing smartphones. This study summarizes the step-wise evolution of travel data collection methods. It provides a comprehensive review of the topic for readers interested in the changing trends in the data collection field.

Keywords: computer, smartphone, telephone, travel survey

Procedia PDF Downloads 304
24527 Nonlinear Propagation of Acoustic Soliton Waves in Dense Quantum Electron-Positron Magnetoplasma

Authors: A. Abdikian

Abstract:

The propagation of nonlinear acoustic waves in dense electron-positron (e-p) plasmas in the presence of an external magnetic field and stationary ions (to neutralize the plasma background) is studied. By means of the quantum hydrodynamics model and by applying the reductive perturbation method, the Zakharov-Kuznetsov equation is derived. Using the bifurcation theory of planar dynamical systems, the compressive structures of electrostatic solitary waves and periodic travelling waves are found. The numerical results show how the ion density ratio, the ion cyclotron frequency, and the direction cosines of the wave vector affect the nonlinear electrostatic travelling waves. The obtained results may be useful for better understanding obliquely propagating, small-amplitude, localized nonlinear electrostatic travelling waves in dense magnetized quantum e-p plasmas and may be applicable to the study of particle and energy transport mechanisms in compact stars, such as the interior of massive white dwarfs.
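For orientation, the Zakharov-Kuznetsov equation is usually written in the generic form below. The abstract does not give the coefficients, so $A$, $B$ and $C$ are placeholders for the nonlinearity and dispersion coefficients that depend on the plasma parameters, $\phi$ is the first-order electrostatic potential, and $(\xi, \eta, \zeta, \tau)$ are the stretched coordinates of the reductive perturbation method (the axis naming is an assumption).

```latex
\frac{\partial \phi}{\partial \tau}
  + A\,\phi\,\frac{\partial \phi}{\partial \xi}
  + B\,\frac{\partial^{3} \phi}{\partial \xi^{3}}
  + C\,\frac{\partial}{\partial \xi}
    \left(
      \frac{\partial^{2} \phi}{\partial \eta^{2}}
      + \frac{\partial^{2} \phi}{\partial \zeta^{2}}
    \right) = 0
```

Setting $C = 0$ recovers the one-dimensional Korteweg-de Vries equation, which is why the ZK equation is the natural extension for waves propagating obliquely to the magnetic field.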

Keywords: bifurcation theory, phase portrait, magnetized electron-positron plasma, the Zakharov-Kuznetsov equation

Procedia PDF Downloads 236
24526 A Business-to-Business Collaboration System That Promotes Data Utilization While Encrypting Information on the Blockchain

Authors: Hiroaki Nasu, Ryota Miyamoto, Yuta Kodera, Yasuyuki Nogami

Abstract:

To promote Industry 4.0, Society 5.0, and similar initiatives, it is important to connect and share data in a way that every member can trust. Blockchain (BC) technology is currently attracting attention as the most advanced tool for this purpose and has been used in the financial field, among others. However, data collaboration using BC has not progressed sufficiently among companies in the manufacturing supply chain that handle sensitive data such as product quality and manufacturing conditions. There are two main reasons why data utilization is not sufficiently advanced in the industrial supply chain. The first reason is that manufacturing information is top secret and a source of profit for companies, so it is difficult to disclose data even between companies that trade with each other in the supply chain. In a blockchain mechanism such as Bitcoin that uses PKI (Public Key Infrastructure), the plaintext must be shared between the companies in order to confirm the identity of the company that sent the data. The other reason is that the merits (scenarios) of data collaboration between companies have not been concretely specified for the industrial supply chain. To address these problems, this paper proposes a Business to Business (B2B) collaboration system using homomorphic encryption and BC techniques. Using the proposed system, each company in the supply chain can exchange confidential information as encrypted data and utilize the data for its own business. In addition, this paper considers a scenario focusing on quality data, on which collaboration has been difficult because it is top secret. In this scenario, we show an implementation scheme and a concrete benefit of data collaboration by proposing a comparison protocol that can capture changes in quality while hiding the numerical values of the quality data.
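The abstract does not name the homomorphic scheme or describe the comparison protocol. To illustrate only the underlying idea, the sketch below uses a toy Paillier cryptosystem (an assumption, with insecurely small primes) to compute an encrypted difference between two quality values without decrypting either one. It is a building block, not the authors' protocol: here the key holder learns the difference itself, whereas a full comparison protocol would hide more.

```python
import math
import secrets


def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because g = n + 1 (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, message):
    """Encrypt an integer message under public key n."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message % n, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, ciphertext):
    """Decrypt and map the result to a signed integer."""
    lam, mu = private_key
    l_value = (pow(ciphertext, lam, n * n) - 1) // n
    message = (l_value * mu) % n
    return message - n if message > n // 2 else message


def encrypted_difference(n, c_new, c_old):
    """Ciphertext of (new - old), computed without decrypting either input."""
    n_sq = n * n
    return (c_new * pow(c_old, -1, n_sq)) % n_sq


# Hypothetical quality scores; 7919 and 104729 are small demo primes only.
n, private_key = keygen(7919, 104729)
c_old = encrypt(n, 9870)
c_new = encrypt(n, 9815)
change = decrypt(n, private_key, encrypted_difference(n, c_new, c_old))
```

The party holding only the ciphertexts can compute `encrypted_difference`, so the raw quality values never need to leave their owners in plaintext.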

Keywords: business to business data collaboration, industrial supply chain, blockchain, homomorphic encryption

Procedia PDF Downloads 122
24525 Multivariate Assessment of Mathematics Test Scores of Students in Qatar

Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski

Abstract:

Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.

Keywords: cluster analysis, education, mathematics, profiles

Procedia PDF Downloads 115
24524 Dataset Quality Index: Development of Composite Indicator Based on Standard Data Quality Indicators

Authors: Sakda Loetpiparwanich, Preecha Vichitthamaros

Abstract:

Poor data quality is now considered one of the major costs of a data project. A data project with data quality awareness devotes considerable time to data quality processes, while a data project without such awareness suffers negative impacts on financial resources, efficiency, productivity, and credibility. One of the most time-consuming processes is defining the expectations and measurements of data quality, because expectations differ according to the purpose of each data project. This is especially true of big data projects, which may involve many datasets and stakeholders and therefore take a long time to discuss and define quality expectations and measurements. This study therefore aimed to develop meaningful indicators that describe the overall data quality of each dataset, allowing quick comparison and prioritization. The objectives of this study were to: (1) develop practical data quality indicators and measurements, (2) develop data quality dimensions based on statistical characteristics, and (3) develop a Composite Indicator that can describe the overall data quality of each dataset. The sample consisted of more than 500 datasets from public sources obtained by random sampling. After the datasets were collected, the Dataset Quality Index (SDQI) was developed in five steps. First, we defined standard data quality expectations. Second, we identified indicators that can be measured directly on the data within datasets. Third, the indicators were aggregated into dimensions using factor analysis. Next, the indicators and dimensions were weighted by the effort required for data preparation and by usability. Finally, the dimensions were aggregated into the Composite Indicator. The results of these analyses showed that: (1) the developed indicators and measurements comprised ten indicators; (2) for the data quality dimensions based on statistical characteristics, the ten indicators could be reduced to 4 dimensions; (3) the SDQI can describe the overall quality of each dataset and can separate datasets into 3 levels: Good Quality, Acceptable Quality, and Poor Quality. In conclusion, the SDQI provides an overall description of data quality within datasets together with a meaningful composition. The SDQI can be used to assess all data in a data project and to support effort estimation and prioritization. The SDQI also works well with Agile methods, where it can be used for assessment in the first sprint. After the initial evaluation is passed, more specific data quality indicators can be added in the next sprint.
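
The aggregation steps described above (indicators to dimensions, dimensions to a composite index, index to one of three levels) can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes indicator scores already normalized to the range 0 to 1, and the weights, the cut-off values 0.8 and 0.5, and all function names are hypothetical, since the abstract does not give them.

```python
from typing import Dict


def weighted_mean(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate scores (each in [0, 1]) using the matching weights."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def quality_level(index: float, good: float = 0.8, acceptable: float = 0.5) -> str:
    """Map a composite index to one of three levels (thresholds are assumed)."""
    if index >= good:
        return "Good Quality"
    if index >= acceptable:
        return "Acceptable Quality"
    return "Poor Quality"


# Hypothetical dimension scores and weights for one dataset.
dimension_scores = {"dim_1": 0.9, "dim_2": 0.7, "dim_3": 0.6, "dim_4": 0.8}
dimension_weights = {"dim_1": 2.0, "dim_2": 1.0, "dim_3": 1.0, "dim_4": 1.0}

index = weighted_mean(dimension_scores, dimension_weights)
print(round(index, 3), quality_level(index))
```

The same `weighted_mean` helper can be applied once per dimension (over its indicators) and once more over the dimensions, which mirrors the two-stage aggregation in the abstract.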

Keywords: data quality, dataset quality, data quality management, composite indicator, factor analysis, principal component analysis

Procedia PDF Downloads 127
24523 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics to process and analyze big data. Nevertheless, they have been curbed by the limits of classical predictive analysis methods when the amount of data is large. In fact, because of the volume, nature (semi-structured or unstructured), and variety of big data, it is impossible to analyze it efficiently with classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of computation. In this paper, we propose to extend the predictive analysis algorithm Classification And Regression Trees (CART) in order to adapt it to big data analysis. The major changes to this algorithm are presented, and a version of the extended algorithm is then defined to make it applicable to very large quantities of data.
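
For context, the core step of classical CART classification is a search for the split threshold that minimizes weighted Gini impurity. The sketch below shows that step for a single numeric feature; it is the standard sequential form, not the authors' extended algorithm, and the function names are illustrative. Because the search for each feature is independent of the others, this step is a natural candidate for distribution, although the abstract does not say how the authors parallelize it.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence) -> float:
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: List[float], labels: List) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted Gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


print(best_split([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))
```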

Keywords: predictive analysis, big data, predictive analysis algorithms, CART algorithm

Procedia PDF Downloads 134
24522 Canopy Temperature Acquired from Daytime and Nighttime Aerial Data as an Indicator of Trees’ Health Status

Authors: Agata Zakrzewska, Dominik Kopeć, Adrian Ochtyra

Abstract:

The growing number of new cameras, sensors, and research methods allows for a broader application of thermal data in remote sensing studies of vegetation. The aim of this research was to check whether thermal infrared data in the 3.6-4.9 μm spectral range, obtained during the day and at night, can be used to assess the health condition of selected species of deciduous trees in an urban environment. For this purpose, research was carried out in the city center of Warsaw (Poland) in 2020. During the airborne data acquisition, thermal data, laser scanning data, and orthophoto map images were collected. Synchronously with the airborne data, ground reference data were obtained for 617 studied trees of the species Acer platanoides, Acer pseudoplatanus, Aesculus hippocastanum, Tilia cordata, and Tilia × euchlora, in different health condition states. The results were as follows: (i) healthy trees are cooler than trees in poor condition and dying trees in both the daytime and nighttime data; (ii) the mean difference in canopy temperature between healthy and dying trees was 1.06 °C in the nighttime data and 3.28 °C in the daytime data; (iii) condition classes differ significantly in both daytime and nighttime thermal data, but only in the daytime data did all condition classes differ statistically significantly from each other. In conclusion, aerial thermal data can be considered an alternative to hyperspectral data as a method of assessing the health condition of trees in an urban environment, especially data obtained during the day, which differentiate condition classes better than data obtained at night. The method based on the fusion of thermal infrared and laser scanning data could be a quick and efficient solution for identifying trees in poor health that should be visually checked in the field.

Keywords: middle wave infrared, thermal imagery, tree discoloration, urban trees

Procedia PDF Downloads 106
24521 Directional Search for Dark Matter Using Nuclear Emulsion

Authors: Ali Murat Guler

Abstract:

A variety of experiments have been developed over the past decades, aiming at the detection of Weakly Interacting Massive Particles (WIMPs) via their scattering in an instrumented medium. The sensitivity of these experiments has improved at tremendous speed, thanks to the constant development of detectors and analysis methods. Detectors capable of reconstructing the direction of the nuclear recoil induced by WIMP scattering are opening a new frontier that may extend dark matter searches beyond the neutrino background. Measuring the direction of WIMPs will allow us to detect the galactic origin of dark matter and therefore to obtain a clear separation of signal from background. The NEWSdm experiment, based on nuclear emulsions, is intended to measure the direction of WIMP-induced nuclear recoils with a solid-state detector, and thus with high sensitivity. We discuss the discovery potential of a directional experiment based on a solid target made of newly developed nuclear emulsions and novel read-out systems achieving nanometric resolution. We also report the results of a technical test conducted at Gran Sasso.

Keywords: dark matter, direct detection, nuclear emulsion, WIMPS

Procedia PDF Downloads 267
24520 Hierarchical Clustering Algorithms in Data Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

Clustering is the process of grouping objects and data into clusters so that data objects in the same cluster are similar to each other. Clustering is one of the areas of data mining, and clustering algorithms can be classified as partition-based, hierarchical, density-based, and grid-based. In this paper, we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON, and BIRCH. The resulting state-of-the-art overview of these algorithms will help in eliminating current problems, as well as in deriving more robust and scalable clustering algorithms.
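
To make the hierarchical idea concrete, the sketch below implements plain single-linkage agglomerative clustering: every point starts as its own cluster and the two closest clusters are merged until `k` clusters remain. This is the textbook baseline, not CURE, ROCK, CHAMELEON, or BIRCH, each of which adds its own refinements; the function name and the stopping rule by cluster count are choices made here for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def single_linkage(points: Sequence[Point], k: int) -> List[List[Point]]:
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters: List[List[Point]] = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


print(single_linkage([(0, 0), (0, 1), (10, 10), (10, 11)], k=2))
```

This naive version recomputes all pairwise distances on every merge, which is one reason scalable variants such as those surveyed in the paper exist.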

Keywords: clustering, unsupervised learning, algorithms, hierarchical

Procedia PDF Downloads 877
24519 Adsorptive Removal of Cd(II) Ions from Aqueous Systems by Wood Ash-Alginate Composite Beads

Authors: Tichaona Nharingo, Hope Tauya, Mambo Moyo

Abstract:

Wood ash has been demonstrated to have favourable adsorption capacity for heavy metal ions, but in practice it is difficult to separate from batch adsorption systems. Fabricating wood ash beads with alginate, a non-toxic carbohydrate bearing multiple functional groups, may improve the applicability of wood ash in the remediation of environmental pollutants. In this work, alginate-wood ash beads (AWAB) were fabricated and applied to the removal of cadmium ions from aqueous systems. The beads were characterized by FTIR, TGA/DSC, SEM-EDX, and their pHZPC before and after the adsorption of Cd(II) ions. Important adsorption parameters, i.e., pH, AWAB dosage, contact time, and ionic strength, were optimized, and the effect of the initial concentration of Cd(II) ions on the adsorption process was established. Adsorption kinetics, adsorption isotherms, the adsorption mechanism, and the application of AWAB to real water samples spiked with Cd(II) ions were investigated. The composite adsorbent was characterized by a heterogeneous macroporous surface comprising metal oxides, multiple hydroxyl groups, and carbonyl groups that were involved in electrostatic and Lewis acid-base interactions with the Cd(II) ions. The pseudo-second-order model and the Freundlich isotherm model best fitted the adsorption kinetics and isotherm data, respectively, suggesting a chemical sorption process and surface heterogeneity. The presence of Pb(II) ions inhibited the adsorption of Cd(II) ions (a 40% reduction), which was attributed to competition for the adsorption sites. The Cd(II)-loaded beads could be regenerated using 0.1 M HCl and could be used for four sorption-desorption cycles without significant loss of their initial adsorption capacity. The high maximum adsorption capacity, stability, selectivity, and reusability of AWAB make the adsorbent ideal for the removal of Cd(II) ions from real water samples. Column-type adsorption experiments need to be explored to establish the potential of the adsorbent for removing Cd(II) ions in continuous flow systems.
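
For reference, the two models named in the abstract are commonly written in the standard forms below. The symbols are the conventional ones and are assumptions here, since the abstract gives no notation: q_t and q_e are the amounts adsorbed per unit mass at time t and at equilibrium, k_2 is the pseudo-second-order rate constant, C_e is the equilibrium concentration, and K_F and n are the Freundlich constants.

```latex
% Pseudo-second-order kinetics, linearized form
\frac{t}{q_t} = \frac{1}{k_2\, q_e^{2}} + \frac{t}{q_e}

% Freundlich isotherm and its linearized form
q_e = K_F\, C_e^{1/n}
\qquad\Longleftrightarrow\qquad
\ln q_e = \ln K_F + \frac{1}{n} \ln C_e
```

A straight-line plot of t/q_t against t, or of ln q_e against ln C_e, is the usual way such fits are checked.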

Keywords: adsorption, Cd(II) ions, regeneration, wastewater, wood ash-alginate beads

Procedia PDF Downloads 236
24518 Dissimilarity Measure for General Histogram Data and Its Application to Hierarchical Clustering

Authors: K. Umbleja, M. Ichino

Abstract:

Symbolic data mining has been developed to analyze data in very large datasets. It is also useful in cases where entry-specific details should remain hidden. Symbolic data mining is quickly gaining popularity as the datasets that need to be analyzed become ever larger. One type of symbolic data is the histogram, which makes it possible to store huge amounts of information in a single variable with a high level of granularity. Other types of symbolic data can also be described as histograms, which makes the histogram a very important and general symbolic data type: a method developed for histograms can also be applied to other types of symbolic data. Because of their complex structure, histograms are complicated to analyze. This paper proposes a method that makes it possible to compare two histogram-valued variables and therefore to find a dissimilarity between two histograms. The proposed method uses the Ichino-Yaguchi dissimilarity measure for mixed feature-type data analysis as a base and develops a dissimilarity measure specifically for histogram data, which makes it possible to compare histograms with different numbers of bins and different bin widths (so-called general histograms). The proposed dissimilarity measure is then used as a measure for clustering. Furthermore, a linkage method based on weighted averages is proposed, together with the concept of cluster compactness to measure the quality of clustering. The method is then validated by application to real datasets. As a result, the proposed dissimilarity measure is found to produce adequate and comparable results with general histograms without loss of detail or the need to transform the data.

Keywords: dissimilarity measure, hierarchical clustering, histograms, symbolic data analysis

Procedia PDF Downloads 154
24517 WiFi Data Offloading: Bundling Method in a Canvas Business Model

Authors: Majid Mokhtarnia, Alireza Amini

Abstract:

Mobile operators face increasing data traffic as a critical issue. As a result, a vital responsibility of operators is to manage this trend in a way that creates added value. This paper addresses a bundling method within a Canvas business model for a WiFi Data Offloading (WDO) strategy, by which some elements of the model may be affected. In the proposed method, a number of data packages are sold to subscribers, some of which include a given volume of complimentary WiFi-offloaded data. The paper analyses this method in terms of attractiveness and profitability. The results demonstrate that the quality of the WDO implementation strongly affects the final result and helps the decision maker to make the best decision.

Keywords: bundling, canvas business model, telecommunication, WiFi data offloading

Procedia PDF Downloads 188
24516 Daily Site Risks Associated with Construction Projects and On-spot Corrective Measurements: Case Study of Revamping Projects in Kuwait Oil Company Fields Area

Authors: Yousef S. Al-Othman

Abstract:

The growth and expansion of industrial facilities is proportional to increasing market demand for products and services. Furthermore, raw material producers such as oil companies usually undergo massive revamping projects to maintain a synchronized supply. These revamping projects are usually delivered through challenging construction projects that carry daily site risks related to the construction process. A case study of these risks and the corresponding on-spot corrective measures was therefore carried out on a number of construction project contractors at Kuwait Oil Company (KOC). Its purpose was to determine the benefits and overall effectiveness of on-spot corrective measures during the construction phase of a project, and how they help to avoid major incidents and ensure smooth, cost-effective, and on-time delivery of the project. The findings of this case study add value to the overall risk management process by minimizing the daily site risks that may affect the project lead time, resulting in an undisturbed on-site construction process.

Keywords: oil and gas, risk management, construction projects, project lead time

Procedia PDF Downloads 99
24515 Distributed Perceptually Important Point Identification for Time Series Data Mining

Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung

Abstract:

In the field of time series data mining, the concept of the Perceptually Important Point (PIP) identification process was first introduced in 2001. The process was originally designed for financial time series pattern matching and was later found suitable for time series dimensionality reduction and representation. Its strength lies in preserving the overall shape of the time series by identifying the salient points in it. With the rise of Big Data, time series data account for a major proportion of the total, especially data generated by sensors in the Internet of Things (IoT) environment. Given the nature of PIP identification and its successful applications, it is worth further exploring the opportunity to apply PIP to time series ‘Big Data’. However, the performance of PIP identification has always been considered a limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches remove the bottleneck that arises when the PIP identification process runs on a standalone computer. The distributed versions achieve an improvement in speed.
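
As background, the sequential PIP identification process can be sketched as follows: the first and last points are taken as the initial PIPs, and the point farthest from the line joining its two neighbouring PIPs is added repeatedly. This sketch uses vertical distance, one of the distance measures commonly used with PIP, and assumes evenly spaced samples indexed 0, 1, 2, and so on. It is the standalone baseline, not the distributed SB-Tree versions the paper proposes, and the function names are illustrative.

```python
from typing import List, Sequence


def vertical_distance(series: Sequence[float], i: int, left: int, right: int) -> float:
    """Vertical distance from point i to the line joining points left and right."""
    slope = (series[right] - series[left]) / (right - left)
    y_on_line = series[left] + slope * (i - left)
    return abs(series[i] - y_on_line)


def find_pips(series: Sequence[float], n_pips: int) -> List[int]:
    """Return the sorted indices of the first n_pips perceptually important points."""
    n = len(series)
    if n < 2 or n_pips < 2:
        raise ValueError("need at least two points and two PIPs")
    pips = [0, n - 1]  # the endpoints are always the first two PIPs
    while len(pips) < min(n_pips, n):
        best_index, best_distance = -1, -1.0
        # Scan every non-PIP point between each pair of adjacent PIPs.
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                d = vertical_distance(series, i, left, right)
                if d > best_distance:
                    best_index, best_distance = i, d
        pips.append(best_index)
        pips.sort()
    return pips


print(find_pips([0, 1, 5, 1, 0, -4, 0], n_pips=4))
```

Each new PIP requires a scan over all remaining points, which is the cost that makes the process slow on very long series.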

Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining

Procedia PDF Downloads 421
24514 Physiological Effects on Scientist Astronaut Candidates: Hypobaric Training Assessment

Authors: Pedro Llanos, Diego García

Abstract:

This paper aims to expand our understanding of the effects of hypoxia training on the human body, in order to better model its dynamics and to draw on some of its implications for human health. Hypoxia training is a recommended practice that allows military and civilian pilots to recognize their early hypoxia signs and symptoms; here it was undertaken by Scientist Astronaut Candidates (SACs), who underwent hypobaric hypoxia (HH) exposure as part of a training activity for prospective suborbital flight applications. This observational-analytical study describes the physiologic responses and symptoms experienced by a SAC group before, during, and after HH exposure and proposes a model for assessing predicted versus observed physiological responses. A group of individuals with diverse Science Technology Engineering Mathematics (STEM) backgrounds conducted a hypobaric training session up to an altitude of 22,000 ft (FL220), or 6,705 meters, in which heart rate (HR), breathing rate (BR), and core temperature (Tc) were monitored with a chest strap sensor before and after HH exposure. A pulse oximeter registered oxygen saturation (SpO2) levels and the number and duration of desaturations during the HH chamber flight. Hypoxia symptoms as described by the SACs during the HH training session were also registered. These data allowed us to generate a preliminary predictive model of the oxygen desaturation and O2 pressure curve for each subject, which consists of a sixth-order polynomial fit during exposure and a fifth- or fourth-order polynomial fit during recovery. Data analysis showed no significant differences in HR and BR between pre- and post-HH exposure in most of the SACs, while Tc measurements showed slight but consistent decreases. All subjects registered SpO2 greater than 94% for the majority of their individual HH exposures, but all of them presented at least one clinically significant desaturation (SpO2 < 85% for more than 5 seconds), and half of the individuals showed SpO2 below 87% for at least 30% of their HH exposure time. Finally, real-time collection of HH symptoms identified temperature somatosensory perceptions (SP), for 65% of individuals, and task-focus issues, for 52.5% of individuals, as the most common HH indications. 95% of the subjects experienced HH onset symptoms below FL180; all participants achieved full recovery from HH symptoms within 1 minute of donning their O2 mask. The current HH study performed on this group of individuals suggests a rapid and fully reversible physiologic response after HH exposure, as expected and as obtained in previous studies. Our data showed consistent results between predicted and observed SpO2 curves during HH, suggesting a mathematical function that may be used to model HH performance deficiencies. During the HH study, real-time HH symptoms were registered, providing evidence that SP and task-focus issues were the earliest and most common indicators. Finally, an assessment of HH signs and symptoms in a group of heterogeneous, non-pilot individuals showed results similar to those of previous studies in homogeneous populations of pilots.

Keywords: slow onset hypoxia, hypobaric chamber training, altitude sickness, symptoms and altitude, pressure cabin

Procedia PDF Downloads 108
24513 Analysing Techniques for Fusing Multimodal Data in Predictive Scenarios Using Convolutional Neural Networks

Authors: Philipp Ruf, Massiwa Chabbi, Christoph Reich, Djaffar Ould-Abdeslam

Abstract:

In recent years, convolutional neural networks (CNNs) have demonstrated high performance in image analysis, but often only structured data is available for a specific problem. By interpreting structured data as images, CNNs can effectively learn and extract valuable insights from tabular data, leading to improved predictive accuracy and uncovering hidden patterns that may not be apparent in traditional structured data analysis. Applying a single neural network to multimodal data, e.g., both structured and unstructured information, can bring significant advantages in terms of time complexity and energy efficiency. Converting structured data into images and merging them with existing visual material offers a promising solution for applying CNNs to multimodal datasets, which often occur in a medical context. By employing suitable preprocessing techniques, structured data is transformed into image representations in which the respective features are expressed as different formations of colors and shapes. In an additional step, these representations are fused with existing images to incorporate both types of information. The fused image is then analyzed using a CNN.
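
One simple way to picture the transformation step is shown below: each feature of a tabular row is min-max scaled to a grayscale intensity and placed in a small square grid, which is then placed next to an existing image. This is a deliberately simplified assumption, since the abstract describes richer encodings using colors and shapes and does not specify the fusion method. The functions `row_to_image` and `fuse_side_by_side` are hypothetical names.

```python
from typing import List, Sequence

Image = List[List[int]]  # rows of grayscale pixel values in 0..255


def row_to_image(row: Sequence[float], mins: Sequence[float],
                 maxs: Sequence[float], side: int) -> Image:
    """Scale each feature to 0..255 and lay the values out in a side x side grid."""
    if len(row) > side * side:
        raise ValueError("grid is too small for the number of features")
    pixels = []
    for value, lo, hi in zip(row, mins, maxs):
        scaled = 0.0 if hi == lo else (value - lo) / (hi - lo)
        scaled = min(1.0, max(0.0, scaled))  # clamp values outside the range
        pixels.append(int(scaled * 255 + 0.5))
    pixels += [0] * (side * side - len(pixels))  # pad unused cells with black
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def fuse_side_by_side(image: Image, patch: Image) -> Image:
    """Concatenate two images of equal height along the horizontal axis."""
    if len(image) != len(patch):
        raise ValueError("images must have the same height")
    return [img_row + patch_row for img_row, patch_row in zip(image, patch)]


patch = row_to_image([0, 5, 10], mins=[0, 0, 0], maxs=[10, 10, 10], side=2)
existing = [[10, 20], [30, 40]]  # stand-in for an existing grayscale image
print(fuse_side_by_side(existing, patch))
```

The fused grid could then be passed to any CNN that accepts single-channel input; the per-feature minimum and maximum would normally be computed on the training set only.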

Keywords: CNN, image processing, tabular data, mixed dataset, data transformation, multimodal fusion

Procedia PDF Downloads 108