Search results for: weather data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24809

24299 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease

Authors: Usama Ahmed

Abstract:

Data mining is the process of analyzing data to extract useful, predictive information. It is a field of research that addresses a wide variety of problems. Within data mining, classification is an important technique for assigning data to different categories. Diabetes is one of the most common diseases. This paper applies several classification techniques to a diabetes dataset using the Waikato Environment for Knowledge Analysis (WEKA) and identifies which algorithm performs best. The best classification algorithm on the diabetic data is Naïve Bayes, which achieves an accuracy of 76.31% and takes 0.06 seconds to build the model.
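
As an editorial illustration of the kind of classifier the abstract compares, the following is a minimal Gaussian Naïve Bayes sketch on synthetic, two-feature "diabetes-style" data. The features, dataset, and resulting accuracy are invented for illustration; this is not the paper's WEKA experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: two features per patient
# (e.g., glucose level and BMI -- values invented for illustration).
n = 200
X_neg = rng.normal(loc=[100.0, 25.0], scale=[10.0, 3.0], size=(n, 2))
X_pos = rng.normal(loc=[140.0, 32.0], scale=[15.0, 4.0], size=(n, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * n + [1] * n)

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0))
    return params

def predict_gaussian_nb(params, X):
    """Pick the class maximizing the log-posterior under feature independence."""
    scores = []
    for c, (prior, mu, var) in sorted(params.items()):
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
        scores.append(np.log(prior) + log_lik.sum(axis=1))
    return np.argmax(np.stack(scores, axis=1), axis=1)

params = fit_gaussian_nb(X, y)
accuracy = (predict_gaussian_nb(params, X) == y).mean()
```

The same model family, fitted by WEKA, underlies the 76.31% figure the abstract reports.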

Keywords: data mining, classification, diabetes, WEKA

Procedia PDF Downloads 132
24298 FSO Performance under High Solar Irradiation: Case Study Qatar

Authors: Syed Jawad Hussain, Abir Touati, Farid Touati

Abstract:

Free-Space Optics (FSO) is a wireless technology that enables the optical transmission of data through the air. FSO is emerging as a promising alternative or complementary technology to fiber-optic and wireless radio-frequency (RF) links due to its high bandwidth, robustness to EMI, and operation in unregulated spectrum. These systems are envisioned to be an essential part of future-generation heterogeneous communication networks. Despite the vibrant advantages of FSO technology and the variety of its applications, its widespread adoption has been hampered by rather disappointing link reliability for long-range links due to atmospheric turbulence-induced fading and sensitivity to detrimental climate conditions. Qatar, with modest cloud coverage, high concentrations of airborne dust, and high relative humidity, lies in a virtually rainless sunny belt with a typical daily average solar radiation exceeding 6 kWh/m² and 80-90% clear skies throughout the year. The specific objective of this work is to study, for the first time in Qatar, the effect of solar irradiation on the deliverability of an FSO link. In order to analyze the transport medium, we ported an embedded Linux kernel onto a Field Programmable Gate Array (FPGA) and designed a network sniffer application that runs on the FPGA. We installed new FSO terminals and configured and aligned them successively. In the reporting period, we carried out measurements and related them to weather conditions.

Keywords: free space optics, solar irradiation, field programmable gate array, FSO outage

Procedia PDF Downloads 346
24297 Comprehensive Study of Data Science

Authors: Asifa Amara, Prachi Singh, Kanishka, Debargho Pathak, Akshat Kumar, Jayakumar Eravelly

Abstract:

Today's generation is heavily dependent on technology that uses data as its fuel. The present study covers innovations and developments in data science and gives an idea of how to use the available data efficiently. This study will help the reader understand the core concepts of data science. The concept of artificial intelligence was introduced by Alan Turing, with the main principle of creating an artificial system that can run independently of human-given programs and can function by analyzing data to understand the requirements of its users. Data science comprises business understanding, data analysis, ethical concerns, programming languages, various fields and sources of data, skills, and more. The usage of data science has evolved over the years. In this review article, we cover one part of data science, namely machine learning, which applies data science in its work. Machines learn through experience, which helps them do any work more efficiently. This article includes a comparative illustration of human understanding versus machine understanding, along with the advantages, applications, and real-time examples of machine learning. Data science is an important game changer in the lives of human beings. Since the advent of data science, we have seen its benefits: how it leads to a better understanding of people and how it serves individual needs. It has improved business strategies and the services businesses provide, forecasting, the ability to attain sustainable development, and more. This study also focuses on a better understanding of data science, which will help us create a better world.

Keywords: data science, machine learning, data analytics, artificial intelligence

Procedia PDF Downloads 59
24296 An Energy and Economic Comparison of Solar Thermal Collectors for Domestic Hot Water Applications

Authors: F. Ghani, T. S. O’Donovan

Abstract:

Today, the global solar thermal market is dominated by two collector types: the flat plate and the evacuated tube collector. With regard to the number of installations worldwide, the evacuated tube collector is the dominant variant, primarily due to the Chinese market, but the flat plate collector dominates both the Australian and European markets. The market share of the evacuated tube collector is, however, growing in Australia due to a common belief that this collector type is ‘more efficient’ and, therefore, the better choice for hot water applications. In this study, we investigate this issue further to assess the validity of this statement. This was achieved by methodically comparing the performance and economics of several solar thermal systems comprising a low-performance flat plate collector, a high-performance flat plate collector, and an evacuated tube collector, each coupled with a storage tank and pump. All systems were simulated using the commercial software package Polysun for four climate zones in Australia, to take different weather profiles into account, and subjected to a thermal load equivalent to a household of four people. Our study revealed that the energy savings and payback periods varied significantly for systems operating under specific environmental conditions. Solar fractions ranged between 58 and 100 per cent, while payback periods ranged between 3.8 and 10.1 years. Although the evacuated tube collector was found to operate with a marginally higher thermal efficiency than the selective-surface flat plate collector due to reduced ambient heat loss, the high-performance flat plate collector outperformed the evacuated tube collector on thermal yield. This result was obtained because the flat plate collector possesses a significantly higher absorber-to-gross-collector-area ratio than the evacuated tube collector.
Furthermore, it was found that for Australian regions with high average solar radiation intensity and ambient temperature, the lower-performance collector is the preferred choice due to favorable economics and a reduced stagnation temperature. Our study has provided additional insight into the thermal performance and economics of the two prevalent solar thermal collectors currently available. A computational investigation has been carried out specifically for the Australian climate due to its geographic size and significant variation in weather. For domestic hot water applications, where fluid temperatures between 50 and 60 degrees Celsius are sought, the flat plate collector is both technically and economically favorable over the evacuated tube collector. This research will be useful to system design engineers, solar thermal manufacturers, and those involved in policy to encourage the implementation of solar thermal systems into the hot water market.
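
As an editorial note, the payback-period metric used above can be sketched with a simple payback calculation. Every number below (load, solar fraction, energy price, system cost) is invented for illustration and is not taken from the abstract's Polysun simulations.

```python
# A minimal, hypothetical simple-payback calculation for a solar
# hot-water system; all inputs are illustrative assumptions.
annual_load_kwh = 4 * 1100.0   # rough hot-water demand, 4-person household
solar_fraction = 0.65          # share of the load met by the collector
energy_price = 0.30            # $/kWh of displaced energy
system_cost = 4000.0           # installed cost of collector + tank + pump

# Annual savings: the solar-supplied fraction of the load, valued at
# the price of the energy it displaces.
annual_savings = solar_fraction * annual_load_kwh * energy_price
payback_years = system_cost / annual_savings
```

A higher solar fraction or energy price shortens the payback period, which is how the 3.8- to 10.1-year spread across climate zones arises in the study.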

Keywords: solar thermal, energy analysis, flat plate, evacuated tube, collector performance

Procedia PDF Downloads 201
24295 A Comprehensive Study of Spread Models of Wildland Fires

Authors: Manavjit Singh Dhindsa, Ursula Das, Kshirasagar Naik, Marzia Zaman, Richard Purcell, Srinivas Sampalli, Abdul Mutakabbir, Chung-Horng Lung, Thambirajah Ravichandran

Abstract:

These days, wildland fires, also known as forest fires, are more prevalent than ever. Wildfires have major repercussions that affect ecosystems, communities, and the environment in several ways. Wildfires lead to habitat destruction and biodiversity loss, affecting ecosystems and causing soil erosion. They also contribute to poor air quality by releasing smoke and pollutants that pose health risks, especially for individuals with respiratory conditions. Wildfires can damage infrastructure, disrupt communities, and cause economic losses. The economic impact of firefighting efforts, combined with their direct effects on forestry and agriculture, causes significant financial difficulties for the areas impacted. This research explores different forest fire spread models and presents a comprehensive review of various techniques and methodologies used in the field. A forest fire spread model is a computational or mathematical representation that is used to simulate and predict the behavior of a forest fire. By applying scientific concepts and data from empirical studies, these models attempt to capture the intricate dynamics of how a fire spreads, taking into consideration a variety of factors like weather patterns, topography, fuel types, and environmental conditions. These models assist authorities in understanding and forecasting the potential trajectory and intensity of a wildfire. Emphasizing the need for a comprehensive understanding of wildfire dynamics, this research explores the approaches, assumptions, and findings derived from various models. By using a comparison approach, a critical analysis is provided by identifying patterns, strengths, and weaknesses among these models. The purpose of the survey is to further wildfire research and management techniques. Decision-makers, researchers, and practitioners can benefit from the useful insights that are provided by synthesizing established information. 
Fire spread models provide insights into potential fire behavior, enabling authorities to make informed decisions about evacuation activities, allocating resources for fire-fighting efforts, and planning preventive actions. Wildfire spread models are also useful in post-wildfire mitigation strategies, as they help in assessing a fire's severity, determining high-risk regions for post-fire dangers, and forecasting soil erosion trends. The analysis highlights the importance of customized modeling approaches for various circumstances and advances our understanding of the way forest fires spread. Some of the known models in this field are Rothermel’s wildland fuel model, FARSITE, WRF-SFIRE, FIRETEC, FlamMap, FSPro, and cellular automata models, among others. The key characteristics that these models consider include weather (e.g., wind speed and direction), topography (e.g., landscape elevation), and fuel availability (e.g., vegetation type), among other factors. The models discussed are physics-based, data-driven, or hybrid, some utilizing ML techniques such as attention-based neural networks to enhance model performance. In order to lessen the destructive effects of forest fires, this initiative aims to promote the development of more precise prediction tools and effective management techniques. The survey expands its scope to address the practical needs of numerous stakeholders. Access to enhanced early warning systems enables decision-makers to take prompt action. Emergency responders benefit from improved resource allocation strategies, strengthening the efficacy of firefighting efforts.
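
One of the model families named above, the cellular automaton, can be sketched in a few lines. The grid size, ignition point, and spread probability below are invented for illustration; real models condition the spread probability on wind, slope, and fuel type.

```python
import random

random.seed(42)

# Minimal probabilistic cellular automaton for fire spread on a fuel grid.
# States: 0 = unburned fuel, 1 = burning, 2 = burned out.
N = 20
p_spread = 0.5  # chance a burning cell ignites an unburned 4-neighbour per step
grid = [[0] * N for _ in range(N)]
grid[N // 2][N // 2] = 1  # ignition point in the centre

def step(grid):
    new = [row[:] for row in grid]
    for i in range(N):
        for j in range(N):
            if grid[i][j] == 1:
                new[i][j] = 2  # burning cells burn out after one step
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < N and 0 <= nj < N and grid[ni][nj] == 0:
                        if random.random() < p_spread:
                            new[ni][nj] = 1
    return new

for _ in range(15):
    grid = step(grid)
burned = sum(cell == 2 for row in grid for cell in row)
```

Physics-based models such as FIRETEC replace the fixed `p_spread` with coupled fluid-dynamics and combustion calculations, but the update-the-neighbourhood structure is the same.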

Keywords: artificial intelligence, deep learning, forest fire management, fire risk assessment, fire simulation, machine learning, remote sensing, wildfire modeling

Procedia PDF Downloads 66
24294 A Convolution Neural Network PM-10 Prediction System Based on a Dense Measurement Sensor Network in Poland

Authors: Piotr A. Kowalski, Kasper Sapala, Wiktor Warchalowski

Abstract:

PM10 is suspended dust that primarily has a negative effect on the respiratory system. PM10 is responsible for attacks of coughing and wheezing, asthma, and acute, violent bronchitis. Indirectly, PM10 also negatively affects the rest of the body, including increasing the risk of heart attack and stroke. Unfortunately, Poland is a country that cannot boast of good air quality, in particular due to large PM concentration levels. Therefore, based on the dense network of Airly sensors, it was decided to address the problem of predicting suspended particulate matter concentration. Due to the very complicated nature of this issue, a machine learning approach was used. For this purpose, convolutional neural networks (CNNs) were adopted, these currently being the leading information processing methods in the field of computational intelligence. The aim of this research is to show the influence of particular CNN network parameters on the quality of the obtained forecast. The forecast itself is made on the basis of parameters measured by Airly sensors and is carried out for the subsequent day, hour by hour. The evaluation of the learning process for the investigated models was mostly based upon the mean square error criterion; however, during model validation, a number of other quantitative evaluation methods were taken into account. The presented pollution prediction model has been verified against real weather and air pollution data taken from the Airly sensor network. The dense and distributed network of Airly measurement devices enables access to current and archival data on air pollution, temperature, suspended particulate matter PM1.0, PM2.5, and PM10, CAQI levels, as well as atmospheric pressure and air humidity. In this investigation, PM2.5 and PM10, temperature and wind information, as well as external forecasts of temperature and wind for the next 24 hours, served as input data.
Due to the specificity of a CNN-type network, this data is transformed into tensors and then processed. The network consists of an input layer, an output layer, and many hidden layers. In the hidden layers, convolutional and pooling operations are performed. The output of this system is a vector of 24 elements containing the predicted PM10 concentration for the upcoming 24-hour period. Over 1000 models based on the CNN methodology were tested during the study. Several that gave the best results were selected, and a comparison was then made with models based on linear regression. The numerical tests carried out, using real ‘big’ data, fully confirmed the positive properties of the presented method. Models based on the CNN technique allow prediction of PM10 dust concentration with a much smaller mean square error than currently used methods based on linear regression. What is more, the use of neural networks increased the R² coefficient by about 5 percent compared to the linear model. During the simulation, the R² coefficient was 0.92, 0.76, 0.75, 0.73, and 0.73 for the 1st, 6th, 12th, 18th, and 24th hour of prediction, respectively.
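
The tensorization step described above (turning hourly multi-channel sensor series into fixed-shape inputs paired with a 24-element target vector) can be sketched as follows. The series, window length, and channel count are invented for illustration and are not the Airly data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hourly series with four channels, standing in for
# PM2.5, PM10, temperature, and wind speed.
hours, channels = 240, 4
series = rng.normal(size=(hours, channels))

def make_windows(series, history=48, horizon=24, target_channel=1):
    """Slice (history x channels) input tensors and 24-hour targets
    for the channel to be forecast (here the stand-in for PM10)."""
    X, y = [], []
    for t in range(history, len(series) - horizon + 1):
        X.append(series[t - history:t])                  # past 48 h, all channels
        y.append(series[t:t + horizon, target_channel])  # next 24 h of PM10
    return np.stack(X), np.stack(y)

X, y = make_windows(series)
# X feeds the convolutional layers; each row of y is the 24-element
# prediction target described in the abstract.
```

The convolutional and pooling layers then operate along the time axis of each `(48, 4)` input tensor.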

Keywords: air pollution prediction (forecasting), machine learning, regression task, convolution neural networks

Procedia PDF Downloads 128
24293 Analysis of Rainfall and Malaria Trends in Limpopo Province, South Africa

Authors: Abiodun M. Adeola, Hannes Rautenbach, Gbenga J. Abiodun, Thabo E. Makgoale, Joel O. Botai, Omolola M. Adisa, Christina M. Botai

Abstract:

There was a surge in malaria morbidity as well as mortality in the 2016/2017 malaria season in the malaria-endemic regions of South Africa. Rainfall is a major climatic driver of malaria transmission and has potential use for predicting malaria. Annual and seasonal trend and cross-correlation analyses were performed on time series of monthly total rainfall (derived from interpolated weather station data) and monthly malaria cases in five districts of Limpopo Province for the period 1998 to 2017. The time series analysis indicated that an average of 629.5 mm of rainfall was received over the study period, with an annual variation of about 0.46%. Rainfall amounts vary among the five districts, with the north-eastern part receiving more rainfall. Spearman’s correlation analysis indicated that total monthly rainfall, with a one- to two-month lagged effect, is significant in malaria transmission in all five districts. The strongest correlations were observed in Mopani (r = 0.54; p-value < 0.001) and Vhembe (r = 0.53; p-value < 0.001), followed by Waterberg (r = 0.40; p-value < 0.001) and Capricorn (r = 0.37; p-value < 0.001), with the lowest in Sekhukhune (r = 0.36; p-value < 0.001). More particularly, malaria morbidity showed a strong relationship with episodes of rainfall above the 5-year running mean of 400 mm. Both annual and seasonal analyses showed that the effect of rainfall on malaria varied across the districts and is seasonally dependent. An adequate understanding of the annual and seasonal dynamics of climatic variables is imperative in seeking answers to malaria morbidity, among other factors, particularly in the wake of the sudden spike of the disease in the province.
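
The lagged Spearman analysis described above can be sketched as follows, using Spearman's rho computed as the Pearson correlation of rank-transformed series. The monthly rainfall and case series are synthetic, built with a two-month lag for illustration; they are not the Limpopo data.

```python
import numpy as np

rng = np.random.default_rng(7)

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the rank-transformed series."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

# Synthetic monthly rainfall, and malaria cases driven by rainfall two
# months earlier (values invented for illustration).
months = 120
rain = rng.gamma(shape=2.0, scale=50.0, size=months)
cases = 20 + 0.5 * np.roll(rain, 2) + rng.normal(scale=5.0, size=months)
cases[:2] = 20.0  # the first two months have no lagged driver

# Correlate rainfall with cases lagged by 0..3 months; the peak should
# appear at the two-month lag used to generate the data.
rhos = [spearman(rain[: months - lag], cases[lag:]) for lag in range(4)]
best_lag = int(np.argmax(rhos))
```

Scanning lags in this way is how a one- to two-month lagged effect, as reported for the five districts, would be detected.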

Keywords: correlation, malaria, rainfall, seasonal, trends

Procedia PDF Downloads 206
24292 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma leads to physical and financial complications. This study aimed to use data mining techniques to create an intelligent neural network system for the diagnosis of asthma. Methods: The study population comprises patients who had visited one of the lung clinics in Tehran. Data were analyzed using the SPSS statistical tool, and Pearson's chi-square coefficient was the basis of decision making for data ranking. The considered neural network was trained using the back-propagation learning technique. Results: According to the analysis performed by means of SPSS to select the top factors, 13 effective factors were selected. The data were combined in various forms, and different models were built for training the data and testing the networks; in all configurations, the network was able to predict 100% of cases correctly. Conclusion: Using data mining methods before designing the system structure, in order to reduce the data dimension and choose optimal inputs, leads to a more accurate system. Therefore, considering data mining approaches is necessary given the nature of medical data.
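
As an editorial sketch of the back-propagation training named above, the following trains a one-hidden-layer network on synthetic binary "diagnosis" data with 13 features, mirroring the 13 selected factors. The data, architecture, and accuracy are invented for illustration and are not the Tehran clinic data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic, linearly separable binary data: 13 features per patient.
n, d = 300, 13
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float).reshape(-1, 1)

# One-hidden-layer network trained with plain back-propagation.
h = 16
W1 = rng.normal(scale=0.5, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.5, size=(h, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(2000):
    # Forward pass
    a1 = np.tanh(X @ W1 + b1)
    out = sigmoid(a1 @ W2 + b2)
    # Backward pass: gradients of mean cross-entropy w.r.t. each layer
    d_out = (out - y) / n
    dW2 = a1.T @ d_out; db2 = d_out.sum(axis=0)
    d_a1 = d_out @ W2.T * (1 - a1 ** 2)   # tanh derivative
    dW1 = X.T @ d_a1; db1 = d_a1.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

pred = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
accuracy = ((pred > 0.5) == (y > 0.5)).mean()
```

The feature-selection step the abstract describes (chi-square ranking in SPSS) would determine which 13 columns feed `X` in the real system.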

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

Procedia PDF Downloads 256
24291 Interpreting Privacy Harms from a Non-Economic Perspective

Authors: Christopher Muhawe, Masooda Bashir

Abstract:

With increased Internet communication technology (ICT), the virtual world has become the new normal. At the same time, there is an unprecedented collection of massive amounts of data by both private and public entities. Unfortunately, this increase in data collection has gone hand in hand with an increase in data misuse and data breaches. Regrettably, the majority of data breach and data misuse claims have been unsuccessful in United States courts for failure to prove direct injury to physical or economic interests. The requirement to express data privacy harms from an economic or physical stance negates the fact that not all data harms are physical or economic in nature. The challenge is compounded by the fact that data breach harms and risks do not attach immediately. This research uses a descriptive and normative approach to show that not all data harms can be expressed in economic or physical terms. Expressing privacy harms purely from an economic or physical perspective negates the fact that data insecurity may result in harms which run counter to the functions of privacy in our lives: the promotion of liberty, selfhood, autonomy, and human social relations, and the furtherance of a free society. There is no economic value that can be placed on these functions of privacy. The proposed approach addresses data harms from a psychological and social perspective.

Keywords: data breach and misuse, economic harms, privacy harms, psychological harms

Procedia PDF Downloads 174
24290 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. A dataset of 140 rows describing the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with a Gaussian copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data were used to train machine learning models using the PyCaret package. For the CTGAN data, the AdaBoost classifier (ADA) was found to be the best-fitting ML model, whereas the CTGAN with Gaussian copula yielded logistic regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all ten classes of the target variable (grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, while the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each grade value separately. The LR model with Gaussian data showed consistently better AUC scores than the ADA model with CTGAN data, except for two grade values, C- and A-.
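
The per-class ROC-AUC evaluation described above can be sketched with the rank-based (Mann-Whitney) formulation of AUC: the probability that a randomly chosen positive outscores a randomly chosen negative. The scores and labels below are synthetic, not the course data.

```python
import numpy as np

rng = np.random.default_rng(5)

def auc(scores, labels):
    """AUC via the Mann-Whitney statistic: P(score_pos > score_neg),
    with half-credit for ties."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Synthetic one-vs-rest setup for a single grade class (e.g., "A"):
labels = rng.integers(0, 2, size=200)
scores = labels * 0.6 + rng.normal(scale=0.5, size=200)  # informative scores

observed = auc(scores, labels)
# An uninformative scorer should sit near 0.5:
baseline = auc(rng.normal(size=200), labels)
```

Repeating this one-vs-rest computation for each of the ten grade values and averaging gives a mean AUC of the kind reported (0.4377 and 0.6149); a mean below 0.5, as for the ADA/CTGAN model, indicates scores that rank worse than chance on the real data.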

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 28
24289 Assessing Local Authorities’ Interest in Addressing Urban Challenges through Nature Based Solutions in Romania

Authors: Athanasios A. Gavrilidis, Mihai R. Nita, Larissa N. Stoia, Diana A. Onose

Abstract:

Contemporary global environmental challenges must be primarily addressed at the local level. Cities are under continuous pressure, as they must ensure a high quality of life for their citizens while adapting to and addressing specific environmental issues. Innovative solutions using natural features or mimicking natural systems are endorsed by the scientific community as efficient approaches both for mitigating the effects of climate change and declining environmental quality and for maintaining high standards of living for urban dwellers. The aim of this study was to assess whether Romanian city authorities are considering nature-based innovations as solutions for their planning, management, and environmental issues. Data were gathered by applying 140 questionnaires to urban authorities throughout the country. The questionnaire was designed to assess local policy makers’ perspective on the efficiency of nature-based innovations as tools to address specific challenges. It also focused on extracting data about financing sources and the challenges that must be overcome to adopt nature-based approaches. The results gathered from the municipalities participating in our study were statistically processed, revealing that Romanian city managers acknowledge the benefits of nature-based innovations, but investments in this sector are not among their top priorities. More than 90% of the selected cities agreed that in the last 10 years, their major concern was to expand the grey infrastructure (roads and public amenities) using traditional approaches. When asked how they would react if faced with different socio-economic and environmental challenges, local urban managers indicated investments in nature-based solutions as a priority only in the cases of biodiversity loss and extreme weather, while for the other 14 proposed scenarios they would embrace the business-as-usual approach.
Our study indicates that while new concepts of sustainable urban planning emerge within the scientific community, local authorities need more time to understand and implement them. Without the proper knowledge, personnel, policies, or dedicated budgets, local administrators will not embrace nature-based innovations as solutions to their challenges.

Keywords: nature based innovations, perception analysis, policy making, urban planning

Procedia PDF Downloads 150
24288 Data Access, AI Intensity, and Scale Advantages

Authors: Chuping Lo

Abstract:

This paper presents a simple model demonstrating that, ceteris paribus, countries with lower barriers to accessing global data tend to earn higher incomes than other countries. Therefore, large countries that inherently have greater data resources tend to have higher incomes than smaller countries, such that the former may be more hesitant than the latter to liberalize cross-border data flows in order to maintain this advantage. Furthermore, countries with higher artificial intelligence (AI) intensity in production technologies tend to benefit more from economies of scale in data aggregation, leading to higher income and more trade, as they are better able to utilize global data.

Keywords: digital intensity, digital divide, international trade, economies of scale

Procedia PDF Downloads 48
24287 Secured Transmission and Reserving Space in Images Before Encryption to Embed Data

Authors: G. R. Navaneesh, E. Nagarajan, C. H. Rajam Raju

Abstract:

Nowadays, multimedia data are used to store secure information. All previous methods allocate space in an image for data embedding after encryption. In this paper, we propose a novel method that reserves space in an image, enclosed by a boundary, before encryption with a traditional RDH algorithm, which makes it easy for the data hider to reversibly embed data in the encrypted images. The proposed method can achieve real-time performance; that is, data extraction and image recovery are free of any error. A secure transmission process is also discussed in this paper, which improves efficiency roughly tenfold compared to the other processes discussed.
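
The least-significant-bit embedding named in the keywords can be sketched as follows. This is a generic LSB illustration on an invented flat "image", not the paper's boundary-reservation RDH algorithm.

```python
import numpy as np

rng = np.random.default_rng(9)

def embed_lsb(pixels, bits):
    """Write one payload bit into the LSB of each of the first len(bits) pixels."""
    out = pixels.copy()
    out[: len(bits)] = (out[: len(bits)] & 0xFE) | bits
    return out

def extract_lsb(pixels, n_bits):
    """Read the payload back from the LSBs."""
    return pixels[:n_bits] & 1

cover = rng.integers(0, 256, size=64, dtype=np.uint8)  # a flat 8x8 "image"
payload = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)

stego = embed_lsb(cover, payload)
recovered = extract_lsb(stego, len(payload))
# Extraction is exact, and each pixel changes by at most one grey level.
```

Reversible data hiding goes further than this sketch: it also restores the original cover pixels exactly, which is what the reserved space before encryption makes possible.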

Keywords: secure communication, reserving room before encryption, least significant bits, image encryption, reversible data hiding

Procedia PDF Downloads 395
24286 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using DNA as a biometric identity verification modality, with the goal of improving the speed of identification. We aim to use gene data that was initially collected for autism detection to find whether, and how accurately, these data can be used for identification applications. Mainly, our goal is to find whether our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that the optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and a standard deviation of 1, and that the classification rate remains close to optimal as the noise standard deviation increases to 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
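
The noise-robustness experiment described above can be sketched with a nearest-neighbour matcher. The "genetic" feature vectors are synthetic, and the two noise levels (a mild one nominally matching the abstract's optimum, and a deliberately heavy one) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)

# One synthetic "genetic" template per subject (values invented).
n_subjects, n_features = 50, 30
templates = rng.normal(scale=5.0, size=(n_subjects, n_features))

def identify(probe, templates):
    """1-NN identification: return the subject with the closest template."""
    dists = np.linalg.norm(templates - probe, axis=1)
    return int(np.argmin(dists))

def accuracy_at_noise(sigma):
    """Corrupt each subject's template with N(0, sigma^2) noise, then
    attempt to re-identify every subject."""
    probes = templates + rng.normal(scale=sigma, size=templates.shape)
    hits = sum(identify(p, templates) == i for i, p in enumerate(probes))
    return hits / n_subjects

acc_low = accuracy_at_noise(1.0)    # mild noise: identification survives
acc_high = accuracy_at_noise(20.0)  # heavy noise degrades identification
```

The abstract's finding is the same shape: accuracy stays near optimal while the noise standard deviation is small relative to inter-subject differences, then falls off as noise grows.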

Keywords: biometrics, genetic data, identity verification, k nearest neighbor

Procedia PDF Downloads 234
24285 A Review on Intelligent Systems for Geoscience

Authors: R. Palson Kennedy, P. Kiran Sai

Abstract:

This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as the opportunities for improvement in both ML and the geosciences, and presents a review from the data life cycle perspective to meet that need. Numerous facets of the geosciences present unique difficulties for the study of intelligent systems: geoscience data are notoriously difficult to analyze, since they are frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses data science’s essential concepts and theoretical underpinnings, while the second section covers key themes and shares experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.

Keywords: data science, intelligent systems, machine learning, big data, data life cycle, recent developments, geoscience

Procedia PDF Downloads 122
24284 The Kafrah Dam (The Oldest Dam in History)

Authors: Mohamed Bekhit Gad Khalil

Abstract:

This dam is the oldest dam in history. It was built by the ancient Egyptians around 2650 B.C. to control flooding and is believed to date to between the Third and Fourth Dynasties. Many studies have been conducted on the dam. This report was prepared under my supervision, in cooperation with the Ministry of Tourism and Antiquities; the dam was re-documented and photographed. On the northern side, the dam consists of irregularly shaped stones of varying sizes placed randomly, with sand and soil filling the gaps between the stones and forming the layers that make up the body of the dam. The eastern side consists of a series of regularly shaped stones that were cut and built into a stepped, pyramid-like structure with a width of 15.7 meters and a height of 10 meters. The surface shows significant erosion and wear on the stones due to weather conditions, which has resulted in deep cavities in most of the stone blocks forming the surface.

Keywords: ministry of tourism and antiquities, excavations, registration, documentation

Procedia PDF Downloads 13
24283 Forecasting Model for Rainfall in Thailand: Case Study Nakhon Ratchasima Province

Authors: N. Sopipan

Abstract:

In this paper, we study the rainfall time series of weather stations in Nakhon Ratchasima province, Thailand, using various statistical methods to analyse the behaviour of rainfall in the study areas. Time-series analysis is an important tool in modelling and forecasting rainfall. ARIMA and Holt-Winters models based on exponential smoothing were built, and all of the models proved to be adequate. They could therefore provide information to help decision makers establish strategies for the proper planning of agriculture, drainage systems, and other water resource applications in Nakhon Ratchasima province. We found that the best-performing forecasting model is ARIMA(1,0,1)(1,0,1)12.
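
One of the two model families named above, Holt-Winters exponential smoothing, can be sketched in its additive form. The synthetic monthly series and the smoothing constants below are illustrative assumptions, not values fitted to the Nakhon Ratchasima rainfall data.

```python
import math

# A minimal additive Holt-Winters smoother (level + trend + 12-month season).
period = 12
alpha, beta, gamma = 0.3, 0.05, 0.2  # illustrative smoothing constants

# Synthetic monthly "rainfall" with a seasonal bump and a mild trend.
series = [100 + 0.5 * t + 80 * math.sin(2 * math.pi * t / period)
          for t in range(6 * period)]

# Initialise level, trend, and seasonal components from the first two cycles.
level = sum(series[:period]) / period
trend = (sum(series[period:2 * period]) - sum(series[:period])) / period ** 2
season = [series[t] - level for t in range(period)]

fitted = []
for t, y in enumerate(series):
    s = season[t % period]
    fitted.append(level + trend + s)  # one-step-ahead prediction
    new_level = alpha * (y - s) + (1 - alpha) * (level + trend)
    trend = beta * (new_level - level) + (1 - beta) * trend
    season[t % period] = gamma * (y - new_level) + (1 - gamma) * s
    level = new_level

# Forecast for the next month, and in-sample mean absolute error:
forecast = level + trend + season[len(series) % period]
mae = sum(abs(f - y) for f, y in zip(fitted, series)) / len(series)
```

A seasonal ARIMA(1,0,1)(1,0,1)12 model, as selected in the paper, captures the same 12-month periodicity through seasonal autoregressive and moving-average terms instead of an explicit seasonal component.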

Keywords: ARIMA models, exponential smoothing, Holt-Winters model

Procedia PDF Downloads 283
24282 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh

Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila

Abstract:

Data quality is a key component of any data-driven organization. Without data quality, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. It is therefore important for an organization to ensure that the data it uses is of high quality. This is where the concept of data mesh comes in. Data mesh is a decentralized organizational and architectural approach to data management that can help organizations improve the quality of their data. The concept was first introduced in 2020. Its purpose is to decentralize data ownership, making it easier for domain experts to manage the data. This can improve data quality by reducing reliance on centralized data teams and allowing domain experts to take charge of their data. This paper discusses how a set of elements, including data mesh, can serve as tools for increasing data quality. One of the key benefits of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata and thereby improving data quality. Another benefit of data mesh is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity about responsibilities. With data mesh, domain experts are responsible for managing their own data, which provides clarity in roles and responsibilities and improves data quality. Additionally, data mesh can contribute to a new form of organization that is more agile and adaptable. By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn can improve overall performance by enabling better insights through better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring and analytics are decentralized, allowing domain experts to monitor and analyze their own data. This helps identify and address data quality problems quickly, leading to improved data quality. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which can help create a data-driven culture within the organization, leading to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can enhance data quality by automating many data-related tasks, such as data cleaning and data validation. By integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts above are illustrated by experience feedback from AEKIDEN, an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing its experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.

Keywords: data culture, data-driven organization, data mesh, data quality for business success

Procedia PDF Downloads 115
24281 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze big data, which grows exponentially, with traditional technologies; Hadoop is a new technology that makes this possible. The R programming language is by far the most popular statistical tool for big data analysis, and it can be combined with distributed processing based on Hadoop. With RHadoop, which integrates the R and Hadoop environments, we implemented parallel multiple regression analysis on actual data of different sizes. Experimental results showed that our RHadoop system became much faster as the number of data nodes increased. We also compared the performance of RHadoop with the lm function and the biglm package based on bigmemory. The results showed that RHadoop was faster than the other approaches owing to parallel processing, as the number of map tasks increases with the size of the data.
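
The map-reduce scheme behind such a parallel regression can be sketched in a few lines (here in Python rather than R, and independent of Hadoop): each map task emits the partial sums X'X and X'y for its chunk of the data, and the reduce step combines them before solving the normal equations. The data and the chunking into four "nodes" below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def map_task(X_chunk, y_chunk):
    # Each "map" emits the sufficient statistics for its slice of the data.
    return X_chunk.T @ X_chunk, X_chunk.T @ y_chunk

def reduce_tasks(partials):
    # The "reduce" step sums the partial statistics from every map task.
    XtX = sum(p[0] for p in partials)
    Xty = sum(p[1] for p in partials)
    return np.linalg.solve(XtX, Xty)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 3))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=1000)

# Split into 4 chunks, as if the rows were distributed over 4 data nodes.
partials = [map_task(Xc, yc)
            for Xc, yc in zip(np.array_split(X, 4), np.array_split(y, 4))]
beta_hat = reduce_tasks(partials)

# The distributed estimate matches an ordinary least-squares fit.
beta_direct, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_direct))
```

Because only the small matrices X'X and X'y travel between nodes, adding map tasks shrinks the per-node work without changing the result, which is the effect the experiments above measure.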

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 416
24280 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

In order to solve memorization overfitting in the meta-learning MAML algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution of the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth of computation, this paper also proposes a key data extraction method that extracts only part of the data to generate mutex tasks. The experiments show that the method of generating mutually exclusive tasks effectively solves memorization overfitting in the meta-learning MAML algorithm.
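
The generation scheme can be sketched as follows; the toy dataset, the per-task label permutation, and the key-data sampling rule are illustrative assumptions, not the paper's exact procedure.

```python
import random

def generate_mutex_tasks(dataset, n_labels, n_tasks, key_fraction=0.5, seed=0):
    rng = random.Random(seed)
    # Key data extraction: use only a subset to avoid the computational blow-up
    # of generating mutex tasks for every example.
    key_data = rng.sample(dataset, max(1, int(key_fraction * len(dataset))))
    tasks = []
    for _ in range(n_tasks):
        # A fresh label permutation per task, so the same feature vector maps
        # to multiple labels across tasks (mutually exclusive tasks).
        permutation = rng.sample(range(n_labels), n_labels)
        tasks.append([(x, permutation[y]) for x, y in key_data])
    return tasks

data = [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.4, 0.5], 2), ([0.7, 0.1], 0)]
tasks = generate_mutex_tasks(data, n_labels=3, n_tasks=4)
for task in tasks:
    print(task[0])  # same features, label reassigned per task
```

Because no single label assignment is shared across tasks, a meta-learner cannot succeed by memorizing feature-label pairs and must adapt within each task.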

Keywords: data augmentation, mutex task generation, meta-learning, text classification

Procedia PDF Downloads 77
24279 Efficient Positioning of Data Aggregation Point for Wireless Sensor Network

Authors: Sifat Rahman Ahona, Rifat Tasnim, Naima Hassan

Abstract:

Data aggregation is a helpful technique for reducing the data communication overhead in wireless sensor networks. One of the important tasks in data aggregation is positioning the aggregator points. A lot of work has been done on data aggregation, but the efficient positioning of the aggregator points has received little attention. In this paper, the authors focus on the positioning, or placement, of aggregation points in a wireless sensor network and propose an algorithm to select aggregator positions for a scenario in which aggregator nodes are more powerful than sensor nodes.
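
The paper's selection algorithm is not reproduced here; as an illustrative baseline for the same goal, a simple k-means (Lloyd's) iteration places k aggregation points so as to reduce the total sensor-to-aggregator distance, a common proxy for communication cost in a WSN. All coordinates below are synthetic.

```python
import numpy as np

def place_aggregators(sensors, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize aggregator positions at k random sensor locations.
    centers = sensors[rng.choice(len(sensors), k, replace=False)]
    for _ in range(n_iter):
        # Assign each sensor to its nearest aggregator.
        d = np.linalg.norm(sensors[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each aggregator to the centroid of its assigned sensors.
        for j in range(k):
            if (labels == j).any():
                centers[j] = sensors[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
# Two spatial clusters of sensor nodes in a 100 m x 100 m field.
sensors = np.vstack([rng.normal(20, 5, (30, 2)), rng.normal(80, 5, (30, 2))])
centers, labels = place_aggregators(sensors, k=2)
print(np.sort(centers[:, 0]))  # aggregator x-coordinates, one near each cluster
```

A placement algorithm that accounts for more powerful aggregator nodes, as in the paper, would weight this objective by node capability rather than distance alone.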

Keywords: aggregation point, data communication, data aggregation, wireless sensor network

Procedia PDF Downloads 142
24278 Association between Noise Levels, Particulate Matter Concentrations and Traffic Intensities in a Near-Highway Urban Area

Authors: Mohammad Javad Afroughi, Vahid Hosseini, Jason S. Olfert

Abstract:

Both traffic-generated particles and noise have been associated with the development of cardiovascular diseases, especially in near-highway environments. Although noise and particulate matter (PM) have different dispersion mechanisms, sharing the same emission source in urban areas (road traffic) can result in a similar degree of variability in their levels. This study investigated the temporal variation of, and correlation between, noise levels, PM concentrations, and traffic intensities near a major highway in Tehran, Iran. Tehran's particulate concentration is highly influenced by road traffic. Additionally, Tehran's ultrafine particles (UFP, PM < 0.1 µm) are mostly emitted by the combustion processes of motor vehicles. This suggests a strong association between traffic-related noise and UFP in near-highway environments of this megacity. The hourly averages of the equivalent continuous sound pressure level (Leq), the total number concentration of UFP, the mass concentrations of PM2.5 and PM10, and the traffic count and speed were measured simultaneously over a period of three days in winter. Additionally, meteorological data, including temperature, relative humidity, and wind speed and direction, were collected at a weather station located 3 km from the monitoring site. Noise levels showed relatively low temporal variability in the near-highway environment compared to PM concentrations. The hourly average Leq ranged from 63.8 to 69.9 dB(A) (mean ~68 dB(A)), while hourly particle concentrations varied from 30,800 to 108,800 cm-3 for UFP (mean ~64,500 cm-3), 41 to 75 µg m-3 for PM2.5 (mean ~53 µg m-3), and 62 to 112 µg m-3 for PM10 (mean ~88 µg m-3). The Pearson correlation coefficient revealed a strong overall relationship between noise and UFP (r ~0.61). Under downwind conditions, the UFP number concentration showed the strongest association with noise level (r ~0.63). The coefficient decreased under upwind conditions (r ~0.24) due to the significant role of wind and humidity in UFP dynamics. Furthermore, PM2.5 and PM10 correlated moderately with noise (r ~0.52 and 0.44, respectively). In general, traffic counts were more strongly associated with noise and PM than traffic speeds were. It was concluded that noise levels combined with meteorological data can be used as a proxy to estimate PM concentrations (specifically the UFP number concentration) in near-highway environments of Tehran. However, it is important to measure the joint variability of noise and particles when studying their health effects in epidemiological studies.
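
Coefficients of this kind can be reproduced from hourly series in a few lines; the arrays below are synthetic stand-ins for the measured noise and UFP series, both driven by a common traffic signal as in the study area, not the study's data.

```python
import numpy as np

def pearson_r(x, y):
    # Pearson correlation: covariance normalized by both standard deviations.
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

rng = np.random.default_rng(0)
traffic = rng.uniform(500, 3000, 72)               # 72 hourly vehicle counts
noise = 55 + 3.5 * np.log10(traffic) + rng.normal(0, 0.5, 72)   # dB(A)
ufp = 20 * traffic + rng.normal(0, 5000, 72)       # particles per cm^3

# Both series share the traffic source, so they correlate strongly.
r = pearson_r(noise, ufp)
print(round(r, 2))
```

In the study's setting, computing r separately on downwind and upwind hours would reproduce the contrast reported above.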

Keywords: noise, particulate matter, PM10, PM2.5, ultrafine particle

Procedia PDF Downloads 173
24277 Spatial Econometric Approaches for Count Data: An Overview and New Directions

Authors: Paula Simões, Isabel Natário

Abstract:

This paper reviews a number of theoretical aspects of implementing an explicitly spatial perspective in econometrics for modelling non-continuous data in general, and count data in particular. It provides an overview of the several spatial econometric approaches available to model data collected with reference to location in space, from the classical spatial econometrics approaches to recent developments in spatial econometrics for modelling count data in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework necessary for structurally consistent spatial econometric count models incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures under different assumptions, and to the constraints and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature with those from hierarchical modelling and analysis of spatial data, in order to identify possible new directions for processing count data in a spatial hierarchical Bayesian econometric context.

Keywords: spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data

Procedia PDF Downloads 572
24276 Development of a Framework for Assessing Public Health Risk Due to Pluvial Flooding: A Case Study of Sukhumvit, Bangkok

Authors: Pratima Pokharel

Abstract:

When sewers overflow due to rainfall in urban areas, public health risks arise when individuals are exposed to the contaminated floodwater. Nevertheless, it is still unclear to what extent such infections pose a risk to public health. This study analyzed reported diarrheal cases by month and age in Bangkok, Thailand. The results showed that more cases are reported in the wet season than in the dry season. It was also found that in Bangkok, the probability of infection with diarrheal diseases in the wet season is higher for the age group between 15 and 44. The probability of infection is highest for children under 5 years, but they are not influenced by wet weather. Further, this study introduced vulnerability factors that lead to health risks from urban flooding. For the vulnerability analysis, the study chose two variables that contribute to health risk: economic status and age. Assuming that people's economic status depends on the type of house they live in, the study shows the spatial distribution of economic status in the vulnerability maps. The vulnerability maps show that people living in Sukhumvit have low vulnerability to health risks with respect to the types of houses they live in. In addition, the probability of diarrheal infection was analyzed by age. Moreover, a field survey was carried out to validate the vulnerability of the population; it showed that health vulnerability depends on economic status, income level, and education. The results depict that people with low incomes and poor living conditions are more vulnerable to health risks. Further, the study carried out 1D hydrodynamic advection-dispersion modelling with a 2-year rainfall event to simulate the dispersion of fecal coliform concentration in the drainage network, as well as 1D/2D hydrodynamic modelling to simulate the overland flow. The 1D results show higher concentrations for dry-weather flows and a large dilution at the commencement of a rainfall event, with the concentration dropping due to the runoff generated after rainfall. The model produced flood depth, flood duration, and fecal coliform concentration maps, which were transferred to ArcGIS to produce hazard and risk maps. In addition, the study ran 5-year and 10-year rainfall simulations to show the variation in health hazards and risks. It was found that even though hazard coverage is highest with the 10-year rainfall event among the three events, the risk was observed to be the same for the 5-year and 10-year rainfall events.

Keywords: urban flooding, risk, hazard, vulnerability, health risk, framework

Procedia PDF Downloads 51
24275 A NoSQL Based Approach for Real-Time Managing of Robotics's Data

Authors: Gueidi Afef, Gharsellaoui Hamza, Ben Ahmed Samir

Abstract:

This paper deals with the continual growth of data, for which new data management solutions have emerged: NoSQL databases. They have spread across several areas, such as personalization, profile management, real-time big data, content management, catalogs, customer views, mobile applications, the Internet of Things, digital communication, and fraud detection. These database management systems are increasing in number. They store data very well, and with the trend of big data, new storage demands call for new structures and methods for managing enterprise data. The new intelligent machines in the e-learning sector thrive on more data, so smart machines can learn more and faster. Robotics is the use case on which our tests focus. An implementation of NoSQL for robotics wrestles all the data the robots acquire into usable form, because with ordinary approaches to robotics we face severe limits in managing and finding the exact information in real time. Our proposed approach is demonstrated by experimental studies and a running example used as a use case.
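
The platform details are not given in the abstract; as a minimal illustration of the document-oriented (NoSQL) model it argues for, robot telemetry below is stored schemalessly and queried by field, with no fixed table schema. The store and all field names are hypothetical.

```python
import time

class DocumentStore:
    """A tiny in-memory document store mimicking the NoSQL data model."""

    def __init__(self):
        self.docs = []

    def insert(self, doc):
        # Documents need not share a schema; a timestamp is added on insert.
        self.docs.append({**doc, "ts": time.time()})

    def find(self, **criteria):
        # Return every document whose fields match all given values.
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in criteria.items())]

store = DocumentStore()
store.insert({"robot": "arm-1", "sensor": "temp", "value": 36.5})
store.insert({"robot": "arm-1", "sensor": "torque", "value": 2.1, "joint": 3})
store.insert({"robot": "agv-2", "sensor": "temp", "value": 31.0})

# Heterogeneous documents coexist, and field lookups stay simple.
print(len(store.find(robot="arm-1")))                        # 2
print(store.find(sensor="temp", robot="agv-2")[0]["value"])  # 31.0
```

A production system would back this interface with a real document database, but the schemaless insert/find pattern is the property the paper relies on for heterogeneous robot data.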

Keywords: NoSQL databases, database management systems, robotics, big data

Procedia PDF Downloads 330
24274 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as an important issue in the data mining and machine learning community. Different data sources provide information about different aspects of the data; therefore, linking multi-source data is essential to improve clustering performance. However, in practice, multi-source data is often heterogeneous, uncertain, and large, which is considered a major challenge of multi-source data. Ensembles are a versatile machine learning model in which learning techniques can work in parallel on big data. Clustering ensembles have been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most traditional clustering ensemble approaches are based on a single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis: the fuzzy optimized multi-objective clustering ensemble (FOMOCE) method. Firstly, a clustering ensemble mathematical model based on the structure of the multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on standard sample data sets. The experimental results demonstrate the superior performance of the FOMOCE method compared to existing clustering ensemble methods and multi-source clustering methods.

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 165
24273 A Refrigerated Condition for the Storage of Glucose Test Strips at Health Promoting Hospitals: An Implication for Hospitals with Limited Air Conditioners

Authors: Wanutchaya Duanginta, Napaporn Apiratmateekul, Tippawan Sangkaew, Sunaree Wekinhirun, Kunchit Kongros, Wanvisa Treebuphachatsakul

Abstract:

Thailand has a tropical climate with an average outdoor ambient air temperature of over 30°C, which can exceed manufacturer recommendations for the storage of glucose test strips. This study monitored temperature and humidity at the actual glucose test strip storage sites of five sub-district health promoting hospitals (HPHs) in Phitsanulok Province, as well as under refrigerated conditions. Five calibrated data loggers were placed at the actual glucose test strip storage sites of the five HPHs for 8 weeks between April and June. For the stress test, two lot numbers of glucose test strips, each with two glucose meters, were kept in a plastic box with desiccants and placed in a refrigerator calibrated to 4°C and at room temperature (RT). Temperature and humidity in the refrigerator and at RT were measured every hour for 30 days. The mean temperature for storing test strips at the five HPHs ranged from 29°C to 33°C, and three of the five HPHs (60%) had a mean temperature above 30°C. The refrigerator temperature was 3.8 ± 2.0°C (2.0°C to 6.5°C), and the relative humidity was 51 ± 2% (42 to 54%). The maximum blood glucose readings obtained when the test strips were stored in a refrigerator were not significantly different (p > 0.05) from those of unstressed test strips for both glucose meters, which used the amperometry-GDH-PQQ and amperometry-GDH-FAD principles. Opening the test strip vial daily resulted in higher variation than refrigeration after a single use; however, the variation was still within an acceptable range. This study concludes that glucose test strips can be stored in plastic boxes in a refrigerator if temperature and humidity are well controlled. Storage of glucose test strips in a refrigerator during hot and humid weather may be useful for HPHs with limited air conditioning.

Keywords: environmental stressed test, thermal stressed test, quality control, point-of-care testing

Procedia PDF Downloads 178
24272 Modeling Activity Pattern Using XGBoost for Mining Smart Card Data

Authors: Eui-Jin Kim, Hasik Lee, Su-Jin Park, Dong-Kyu Kim

Abstract:

Smart-card data are expected to provide information on activity patterns as an alternative to conventional person trip surveys. The focus of this study is to propose a method for training on person trip surveys to supplement smart-card data, which do not contain the purpose of each trip. We selected only the features available from the smart card data, such as spatiotemporal information on the trip and geographic information system (GIS) data near the stations, to train on the survey data. XGBoost, a state-of-the-art tree-based ensemble classifier, was used to train on data from multiple sources. This classifier uses a more regularized model formalization to control over-fitting and shows very fast execution times with good performance. The validation results showed that the proposed method efficiently estimated the trip purpose. The GIS data of the station and the duration of stay at the destination were significant features in modeling trip purpose.
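
XGBoost itself is a full library; as a library-free illustration of the underlying boosted-tree idea, the sketch below fits depth-1 regression stumps by gradient boosting on residuals to predict a binary trip purpose from two assumed smart-card features (duration of stay at the destination and arrival hour). The data and feature choices are illustrative only.

```python
import numpy as np

def fit_stump(X, r):
    # Find the single (feature, threshold) split minimizing squared error of r.
    best = None
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            left, right = r[X[:, f] <= thr], r[X[:, f] > thr]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, f, thr, left.mean(), right.mean())
    return best[1:]

def boost(X, y, n_rounds=20, lr=0.3):
    # Gradient boosting with squared loss: each stump fits the residuals.
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_rounds):
        f, thr, lv, rv = fit_stump(X, y - pred)
        pred += lr * np.where(X[:, f] <= thr, lv, rv)
        stumps.append((f, thr, lv, rv))
    return stumps, y.mean()

def predict(stumps, base, X, lr=0.3):
    pred = np.full(len(X), base)
    for f, thr, lv, rv in stumps:
        pred += lr * np.where(X[:, f] <= thr, lv, rv)
    return (pred > 0.5).astype(int)

# Toy trips: commuters (label 1) stay long and arrive around 8-9 h.
X = np.array([[540, 8], [510, 9], [560, 8], [90, 14], [60, 20], [120, 11]], float)
y = np.array([1, 1, 1, 0, 0, 0])
stumps, base = boost(X, y)
print(predict(stumps, base, X))  # recovers the training labels
```

XGBoost adds per-leaf regularization terms and a second-order loss approximation on top of this residual-fitting loop, which is what the abstract's "more regularized model formalization" refers to.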

Keywords: activity pattern, data fusion, smart-card, XGBoost

Procedia PDF Downloads 225
24271 A Two Stage Stochastic Mathematical Model for the Tramp Ship Routing with Time Windows Problem

Authors: Amin Jamili

Abstract:

Nowadays, the majority of international trade in goods is carried by sea, especially by ships deployed in the industrial and tramp segments. This paper addresses routing tramp ships and determining their schedules, including arrival times at the ports, berthing times at the ports, and departure times, at an operational planning level. At the operational planning level, the weather can be forecasted almost exactly; however, on some routes some uncertainties may remain. In this paper, the voyage times between some of the ports are considered uncertain. To that end, a two-stage stochastic mathematical model is proposed, and a case study is tested with the presented model. The computational results show that this mathematical model is promising and can produce acceptable solutions.
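
A toy instance can illustrate the two-stage structure: the first-stage decision fixes a departure time, the voyage time to the next port is uncertain (three scenarios), and the second-stage recourse cost penalizes waiting before, or arriving after, the berthing time window. All numbers are illustrative, not from the case study.

```python
def expected_cost(departure, scenarios, window, wait_rate=1.0, late_rate=5.0):
    # Second stage: for each voyage-time scenario, pay a waiting cost for
    # arriving before the window opens or a penalty for arriving after it closes.
    total = 0.0
    early, late = window
    for voyage_time, prob in scenarios:
        arrival = departure + voyage_time
        if arrival < early:            # recourse: wait outside the berth
            total += prob * wait_rate * (early - arrival)
        elif arrival > late:           # recourse: late-arrival penalty
            total += prob * late_rate * (arrival - late)
    return total

scenarios = [(30.0, 0.5), (36.0, 0.3), (45.0, 0.2)]   # (hours, probability)
window = (40.0, 50.0)                                  # berthing time window

# First stage: choose the departure time minimizing the expected recourse cost.
candidates = [t / 2 for t in range(0, 41)]             # 0 .. 20 h in 0.5 h steps
best = min(candidates, key=lambda t: expected_cost(t, scenarios, window))
print(best, round(expected_cost(best, scenarios, window), 2))  # → 5.0 2.5
```

A full model replaces this enumeration with a mixed-integer program over routes and port sequences, but the separation into a here-and-now decision and scenario-dependent recourse is the same.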

Keywords: routing, scheduling, tramp ships, two-stage stochastic model, uncertainty

Procedia PDF Downloads 422
24270 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

In order to solve memorization overfitting in the model-agnostic meta-learning (MAML) algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution of the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth of computation, this paper also proposes a key data extraction method that extracts only part of the data to generate mutex tasks. The experiments show that the method of generating mutually exclusive tasks effectively solves memorization overfitting in the meta-learning MAML algorithm.

Keywords: mutex task generation, data augmentation, meta-learning, text classification

Procedia PDF Downloads 119