Search results for: data fitting
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24284

24104 Complete Ensemble Empirical Mode Decomposition with Adaptive Noise Temporal Convolutional Network for Remaining Useful Life Prediction of Lithium Ion Batteries

Authors: Jing Zhao, Dayong Liu, Shihao Wang, Xinghua Zhu, Delong Li

Abstract:

Unmanned underwater vehicles generally operate in the deep sea, where working conditions are unique. Lithium-ion power batteries used as an underwater vehicle's power source must offer the necessary stability and endurance, so accurately forecasting how long they will last is essential to maintaining the system's reliability and safety. To model and forecast the Remaining Useful Life (RUL) of lithium batteries, this research proposes a model based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise and a Temporal Convolutional Network (CEEMDAN-TCN). Two datasets, NASA and CALCE, whose capacity data fluctuate to noticeably different degrees, are used to verify the model and to examine its generalizability. The experiments show that the network structure generalizes well and achieves good fits on the test set across different types of battery data. The evaluation metrics show that CEEMDAN-TCN's prediction performance is 25% to 35% better than that of a single neural network, indicating that feature expansion and modal decomposition can both improve the model's generalizability and be highly useful in industrial settings.

Keywords: lithium-ion battery, remaining useful life, complete EEMD with adaptive noise, temporal convolutional net
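
The TCN half of CEEMDAN-TCN is built from dilated causal convolutions. As a generic illustration of that building block only (not the authors' network, which also uses residual blocks, normalization, and the CEEMDAN decomposition step), the plain-Python sketch below shows that each output depends only on current and past samples, and how stacking doubling dilations widens the receptive field. The function names and the toy capacity values are hypothetical.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """Dilated causal 1-D convolution with implicit zero padding on the left.

    y[t] = sum_i kernel[i] * x[t - i * dilation], so y[t] reads only
    x[t], x[t - d], x[t - 2d], ... and never a future sample.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation
            if j >= 0:  # indices before the start behave as zero padding
                acc += w * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Samples (current one included) visible to the last layer of a stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical normalised capacity history, used only to exercise the code.
capacity = [1.00, 0.99, 0.97, 0.96, 0.95, 0.93, 0.92, 0.90]
hidden = capacity
for d in (1, 2, 4):  # doubling dilations, a common TCN choice
    hidden = causal_dilated_conv1d(hidden, [0.5, 0.5], dilation=d)
print(receptive_field(2, [1, 2, 4]))  # 8
```

With kernel size 2 and dilations 1, 2, 4, the last output already depends on eight past capacity values, which is how a TCN can cover long degradation histories with few layers.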

24103 A Model for Predicting Organic Compounds Concentration Change in Water Associated with Horizontal Hydraulic Fracturing

Authors: Ma Lanting, S. Eguilior, A. Hurtado, Juan F. Llamas Borrajo

Abstract:

Horizontal hydraulic fracturing is a technology for increasing natural gas flow and improving productivity in low-permeability formations. During drilling, tons of flowback and produced water containing many organic compounds return to the surface, posing a potential risk to the surrounding environment and human health. A mathematical model is urgently needed to represent how organic compounds are transported in water and how their concentration changes over time throughout the life cycle of a hydraulic fracturing operation. A comprehensive model, combining the Organic Matter Transport Dynamic Model with the Two-Compartment First-order Model Constant (TFRC) Model, has been established to quantify organic compound concentrations. The model consists of two transport parts distinguished by time scale. For the fast part, curve fitting is applied to flowback water data from fracturing at a Marcellus shale gas site, and the coefficients of determination (R²) for all analyzed compounds demonstrate the high experimental feasibility of this numerical model. Furthermore, the slow part of the model is used to estimate concentration ratio curves over a decade of drilling. The results show that the larger a chemical's Koc value, the later its maximum concentration in water is reached, and that, over a sufficiently long period, the maximum concentrations of all compounds reach up to 90% of their initial concentrations in the shale formation.

Keywords: model, shale gas, concentration, organic compounds
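
The abstract does not state the model equations. As a hedged illustration of fitting the fast part and scoring it with R², the sketch below assumes the generic two-compartment first-order form C(t)/C0 = F·exp(−k_fast·t) + (1 − F)·exp(−k_slow·t) and fits it by brute-force grid search on synthetic data. The grid search, the grid ranges, and the parameter values are assumptions standing in for whatever fitting routine the authors used.

```python
import math


def two_compartment(t, f_fast, k_fast, k_slow):
    """Assumed generic form: fraction of the initial amount remaining at time t."""
    return f_fast * math.exp(-k_fast * t) + (1.0 - f_fast) * math.exp(-k_slow * t)


def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def grid_fit(times, observed):
    """Brute-force least squares over a coarse, hypothetical parameter grid."""
    best, best_sse = None, float("inf")
    for f in (i / 10 for i in range(1, 10)):            # F: 0.1 .. 0.9
        for kf in (i / 10 for i in range(1, 11)):       # k_fast: 0.1 .. 1.0
            for ks in (i / 100 for i in range(1, 11)):  # k_slow: 0.01 .. 0.10
                pred = [two_compartment(t, f, kf, ks) for t in times]
                sse = sum((o - p) ** 2 for o, p in zip(observed, pred))
                if sse < best_sse:
                    best, best_sse = (f, kf, ks), sse
    return best


times = list(range(21))
synthetic = [two_compartment(t, 0.6, 0.5, 0.05) for t in times]  # made-up "data"
params = grid_fit(times, synthetic)
fitted = [two_compartment(t, *params) for t in times]
print(params, r_squared(synthetic, fitted))
```

On real flowback measurements a continuous optimizer would replace the grid, but the R² check at the end is the same quantity the abstract reports per compound.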

24102 Closest Possible Neighbor of a Different Class: Explaining a Model Using a Neighbor Migrating Generator

Authors: Hassan Eshkiki, Benjamin Mora

Abstract:

The Neighbor Migrating Generator is a simple and efficient approach to finding the closest potential neighbor(s) with a different label for a given instance, without the need to calibrate any kernel settings. This makes it possible to identify and explain the features that most influence an AI model. It can be used either to migrate a specific sample to the original model's class decision boundary within a close neighborhood of that sample, or to identify global features that help localize neighboring classes. The proposed technique minimizes a loss function divided into two components, weighted independently by three parameters α, β, and ω, of which α is self-adjusting. Results show that this approach outperforms previous techniques at detecting the smallest changes in the feature space and may also reveal model issues such as over-fitting.

Keywords: explainable AI, EX AI, feature importance, counterfactual explanations
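
The paper's loss and its α, β, ω weighting are not given in the abstract. As a generic illustration of the underlying idea, finding the nearest point with a different label by minimizing a two-term loss, the sketch below searches for a counterfactual of a toy linear classifier. One term keeps the new point close to the original, and the other penalizes staying on the original side of the boundary. The fixed weight `alpha`, the margin `eps`, and the step settings are assumptions; this is not the paper's self-adjusting α.

```python
def score(w, b, x):
    """Linear decision score; the predicted class is 1 when score >= 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def nearest_counterfactual(w, b, x, alpha=10.0, eps=0.1, lr=0.01, steps=2000):
    """Gradient descent on ||x' - x||^2 + alpha * max(0, s * score(x') + eps)^2.

    s is the sign of the original score, so the second term vanishes only once
    x' sits at least eps beyond the decision boundary on the other side.
    """
    s = 1.0 if score(w, b, x) >= 0 else -1.0
    xc = list(x)
    for _ in range(steps):
        violation = max(0.0, s * score(w, b, xc) + eps)
        grad = [2.0 * (ci - xi) + 2.0 * alpha * violation * s * wi
                for ci, xi, wi in zip(xc, x, w)]
        xc = [ci - lr * g for ci, g in zip(xc, grad)]
    return xc


w, b = [1.0, 1.0], -1.0     # toy model: class 1 when x1 + x2 >= 1
x = [1.0, 1.0]              # original instance, class 1
cf = nearest_counterfactual(w, b, x)
print(cf, score(w, b, cf))  # lands just past the boundary, near (0.48, 0.48)
```

Because the penalty is finite, the result trades a little distance for a small margin past the boundary; larger `alpha` pushes it closer to the exact projection onto x1 + x2 = 1.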

24101 A Monte Carlo Fuzzy Logistic Regression Framework against Imbalance and Separation

Authors: Georgios Charizanos, Haydar Demirhan, Duygu Icen

Abstract:

Two of the most impactful issues in classical logistic regression are class imbalance and complete separation. These can cause model predictions to lean heavily towards the majority class of the binary response variable, or cause over-fitting. Fuzzy methodology offers key solutions to these problems. However, most studies propose transforming the binary responses into a continuous variable bounded within [0, 1]. This is called the possibilistic approach to fuzzy logistic regression. It is closer to ordinary regression, since no logit-link function is used and no fuzzy probabilities are generated. In contrast, we propose a method of fuzzifying binary response variables that allows the logit-link function to be used, yielding a probabilistic fuzzy logistic regression model based on the Monte Carlo method. The fuzzy probabilities are then classified by selecting a fuzzy threshold. Different combinations of fuzzy and crisp inputs, outputs, and coefficients are explored to determine which perform better under different conditions of imbalance and separation. We conduct numerical experiments on both synthetic and real datasets to compare the performance of the fuzzy logistic regression framework with that of seven crisp machine learning methods. The proposed framework performs better irrespective of the degree of imbalance and the presence of separation in the data, whereas the performance of the machine learning methods considered is significantly affected.

Keywords: fuzzy logistic regression, fuzzy, logistic, machine learning
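
The abstract does not describe how the fuzzy quantities are represented or sampled. As a loosely hedged sketch of the general idea only, the code below assumes triangular fuzzy coefficients, propagates them through the logit link by Monte Carlo sampling with Python's `random.triangular`, summarizes the sampled probabilities as a crude triangle (min, median, max), and classifies by comparing centroids with a triangular fuzzy threshold. All of these choices, and the numbers used, are assumptions rather than the authors' framework.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mc_fuzzy_probability(coefs, x, n=5000, seed=0):
    """Propagate triangular fuzzy coefficients through the logit link.

    coefs: list of (low, mode, high) triples; the first one is the intercept.
    Returns (min, median, max) of the sampled probabilities, used here as a
    crude triangular approximation of the fuzzy probability.
    """
    rng = random.Random(seed)
    inputs = [1.0] + list(x)
    samples = []
    for _ in range(n):
        z = sum(rng.triangular(lo, hi, mode) * xi
                for (lo, mode, hi), xi in zip(coefs, inputs))
        samples.append(sigmoid(z))
    samples.sort()
    return samples[0], samples[n // 2], samples[-1]


def centroid(tri):
    """Centroid defuzzification of a triangular fuzzy number (a, b, c)."""
    return sum(tri) / 3.0


def classify(fuzzy_prob, fuzzy_threshold):
    """Predict class 1 when the probability's centroid reaches the threshold's."""
    return 1 if centroid(fuzzy_prob) >= centroid(fuzzy_threshold) else 0


coefs = [(-0.5, 0.0, 0.5), (0.8, 1.0, 1.2)]  # hypothetical fuzzy intercept and slope
p = mc_fuzzy_probability(coefs, [2.0])
print(p, classify(p, (0.4, 0.5, 0.6)))      # hypothetical fuzzy threshold around 0.5
```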

24100 Obstacles to Accessible Tourism for People with Mental, Physical and Mobility Disability: A Case Study of North Cyprus

Authors: Marjan Kamyabi

Abstract:

In the current century, attention to tourism is one of the key factors in the success of the tourism industry and, consequently, in the prosperity of national economies. In this regard, given the attractions, facilities, and development potential of tourism in Northern Cyprus, and given that tourists' satisfaction with the tourism product and destination plays an undeniable role in attracting visitors, accessible tourism can play a major role in the development of tourism. The purpose of this study is to investigate the environmental barriers to, and the accessibility of, the tourism industry in Northern Cyprus. A further goal is to introduce this consumer group to the tourism community. To achieve these objectives, a questionnaire was designed and given to three tourism professionals to assess its reliability, and was then distributed among 200 people with physical and mental disabilities who had travelled to Cyprus; the data were analysed using confirmatory factor analysis. Environmental barriers for tourists with disabilities are classified into three sections: transport, attractions, and accommodation, each identified separately. Overall, inadequate observance of the principles and standards of proper fitting in the main sectors of Northern Cyprus's tourism industry, particularly in facilities and transportation, was identified as the primary obstacle to developing tourism for people with physical and mental disabilities. Finally, suggestions and solutions for developing tourism for people with physical and mental disabilities are presented.

Keywords: accessible tourism, environmental barriers, tourism, people with disability, accessibility

24099 Investigation of the Effects of Sampling Frequency on the THD of 3-Phase Inverters Using Space Vector Modulation

Authors: Khattab Al Qaisi, Nicholas Bowring

Abstract:

This paper presents simulation results on the effect of sampling frequency on the total harmonic distortion (THD) of three-phase inverters using the space vector pulse width modulation (SVPWM) and space vector control (SVC) algorithms. The relationship between the variables was studied using curve fitting techniques. For 50 Hz inverters, there is an exponential relation between sampling frequency and THD up to around 8500 Hz, beyond which the model's performance becomes irregular, and there is a negative exponential relation between sampling frequency and the marginal improvement in THD. SVPWM was also found to perform better than SVC at the same sampling frequency over most of the frequency range, including the range where the former's performance is irregular.

Keywords: DSI, SVPWM, THD, DC-AC converter, sampling frequency, performance
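
For reference, THD is conventionally defined as THD = √(Σ_{h≥2} V_h²) / V₁, where V_h is the amplitude of the h-th harmonic. The sketch below computes it from one sampled record with a plain DFT. It assumes the record spans an integer number of fundamental cycles (so no windowing is needed) and that the sample count is well above twice the highest harmonic counted; it is not the simulation code used in the paper, and the test waveform is made up.

```python
import math


def harmonic_amplitude(samples, k):
    """Peak amplitude in DFT bin k (bin k = k cycles per record)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def thd(samples, cycles=1, max_harmonic=25):
    """THD = sqrt(sum of V_h^2 for h = 2..max_harmonic) / V_1.

    Assumes the record holds exactly `cycles` fundamental periods and that
    len(samples) > 2 * max_harmonic * cycles, so the counted harmonics do not alias.
    """
    v1 = harmonic_amplitude(samples, cycles)
    harmonics = [harmonic_amplitude(samples, h * cycles) for h in range(2, max_harmonic + 1)]
    return math.sqrt(sum(v * v for v in harmonics)) / v1


n = 200  # samples per fundamental cycle (hypothetical)
angles = [2 * math.pi * i / n for i in range(n)]
wave = [math.sin(a) + 0.1 * math.sin(3 * a) + 0.05 * math.sin(5 * a) for a in angles]
print(thd(wave))  # close to sqrt(0.1**2 + 0.05**2)
```

In a sampling-frequency study, `wave` would come from the simulated inverter output at each sampling rate, giving the THD-versus-frequency points that are then curve-fitted.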

24098 Linear MIMO Model Identification Using an Extended Kalman Filter

Authors: Matthew C. Best

Abstract:

Linear Multi-Input Multi-Output (MIMO) dynamic models can be identified, with no a priori knowledge of model structure or order, using a new Generalised Identifying Filter (GIF). Based on an Extended Kalman Filter, the new filter identifies the model iteratively, in a continuous modal canonical form, using only input and output time histories. The filter's self-propagating state error covariance matrix allows easy determination of convergence and conditioning, and by progressively increasing model order, the best-fitting reduced-order model can be identified. The method is shown to be resistant to noise and can easily be extended to the identification of smoothly nonlinear systems.

Keywords: system identification, Kalman filter, linear model, MIMO, model order reduction
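
The GIF itself is not specified in the abstract. As a much simpler, hedged illustration of Kalman-filter-based identification, the sketch below treats the two unknown parameters of an assumed first-order discrete model y[k+1] = a·y[k] + b·u[k] as a random-walk state. Because the measurement is linear in those parameters, the filter reduces to an ordinary linear, time-varying Kalman filter; it does not search over model order or use the modal canonical form described in the paper. The noise settings `q` and `r` and the test system are assumptions. The shrinking covariance `P` illustrates the convergence check the abstract mentions.

```python
import math


def kalman_identify(y, u, q=1e-8, r=1e-4):
    """Estimate theta = [a, b] in y[k+1] = a*y[k] + b*u[k] from input/output histories.

    theta is modelled as a random walk (process noise q); each new output is a
    measurement z = H theta + v with H = [y[k], u[k]] and noise variance r.
    Returns the estimate and its 2x2 error covariance P.
    """
    theta = [0.0, 0.0]
    P = [[1e3, 0.0], [0.0, 1e3]]  # large initial uncertainty
    for k in range(len(y) - 1):
        # Predict: parameters stay constant apart from the random walk.
        P = [[P[0][0] + q, P[0][1]], [P[1][0], P[1][1] + q]]
        H = [y[k], u[k]]
        PH = [P[0][0] * H[0] + P[0][1] * H[1], P[1][0] * H[0] + P[1][1] * H[1]]
        S = H[0] * PH[0] + H[1] * PH[1] + r  # innovation variance
        K = [PH[0] / S, PH[1] / S]           # Kalman gain
        innovation = y[k + 1] - (H[0] * theta[0] + H[1] * theta[1])
        theta = [theta[0] + K[0] * innovation, theta[1] + K[1] * innovation]
        # Update: P <- (I - K H) P, using the symmetry of P (H P = (P H)^T).
        P = [[P[i][j] - K[i] * PH[j] for j in range(2)] for i in range(2)]
    return theta, P


u = [math.sin(0.7 * k) + math.cos(1.3 * k) for k in range(200)]  # exciting input
y = [0.0]
for k in range(199):
    y.append(0.8 * y[-1] + 0.5 * u[k])  # hypothetical "true" system: a = 0.8, b = 0.5
theta, P = kalman_identify(y, u)
print(theta)  # converges towards [0.8, 0.5]
```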

24097 Effect of Boric Acid Content on the Structural and Optical Properties of In2O3 Films Prepared by Spray Pyrolysis Technique

Authors: Mustafa Öztas, Metin Bedir, Yahya Özdemir

Abstract:

Boron-doped In2O3 films were prepared by spray pyrolysis, a low-cost, large-area technique well suited to the manufacture of solar cells, at a substrate temperature of 350 °C, using boric acid (H3BO3) as the dopant source, and their properties were investigated as a function of doping concentration. X-ray analysis showed that the films were polycrystalline, fitting well with a hexagonal structure, with a preferred orientation in the (220) direction. The changes in the energy band gap and structural properties of the films with boric acid concentration are discussed in detail.

Keywords: spray pyrolysis, In2O3, boron, optical properties, boric acid

24096 Control the Flow of Big Data

Authors: Shizra Waris, Saleem Akhtar

Abstract:

Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have increased sharply within a short period of time. Consequently, this rapid growth of data has created many challenges. In this paper, we use the functionalism and structuralism paradigms to analyze the genesis of big data applications and their current trends. The paper presents a comprehensive discussion of state-of-the-art big data technologies based on batch and stream data processing, and analyzes the strengths and weaknesses of these technologies. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendors, several open research challenges, and the opportunities brought about by big data. The similarities and differences of these techniques and technologies with respect to important limitations are also investigated. Emerging technologies are suggested as a solution to big data problems.

Keywords: computer, it community, industry, big data

24095 Modeling Stream Flow with Prediction Uncertainty by Using SWAT Hydrologic and RBNN Neural Network Models for Agricultural Watershed in India

Authors: Ajai Singh

Abstract:

Simulating hydrological processes at the watershed outlet through a modelling approach is essential for properly planning and implementing soil conservation measures in the Damodar Barakar catchment, Hazaribagh, India, where soil erosion is a dominant problem. This study quantifies the parametric uncertainty involved in simulating stream flow using the Soil and Water Assessment Tool (SWAT), a watershed-scale model, and a Radial Basis Neural Network (RBNN), an artificial neural network model. Both models were calibrated and validated against measured stream flow, and the uncertainty in SWAT model output was quantified using the Sequential Uncertainty Fitting algorithm (SUFI-2). Although both models predicted satisfactorily, the RBNN model performed better than SWAT, with R² and NSE values of 0.92 and 0.92 during training and 0.71 and 0.70 during validation, respectively. Comparing the results of the two models also indicates a wider prediction interval for the SWAT model. The P-factor values show that the percentage of observed stream flow values bracketed by the 95 percent prediction uncertainty band (95PPU) is higher for the RBNN model (91%) than for SWAT (87%). In other words, the RBNN model estimates stream flow more accurately and with less uncertainty. An RBNN model based on simple inputs could therefore be used to estimate monthly stream flow and missing data, and to test the accuracy and performance of other models.

Keywords: SWAT, RBNN, SUFI 2, bootstrap technique, stream flow, simulation
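
Two of the scores quoted above have short definitions: the Nash-Sutcliffe efficiency NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))², and the P-factor, the fraction of observations that fall inside the 95PPU band. The sketch below computes both. It assumes the lower and upper band limits are already available (for example, the 2.5th and 97.5th percentiles of SUFI-2 or bootstrap runs), and the flow values are made up for illustration.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def p_factor(observed, lower, upper):
    """Fraction of observations bracketed by the 95PPU band [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


observed = [12.0, 30.0, 55.0, 20.0]   # hypothetical monthly flows
simulated = [10.0, 33.0, 50.0, 22.0]
lower = [8.0, 25.0, 45.0, 21.0]       # hypothetical 95PPU band limits
upper = [15.0, 36.0, 60.0, 28.0]
print(nse(observed, simulated), p_factor(observed, lower, upper))  # P-factor is 0.75 here
```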

24094 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of the rapid growth of data, many computer science tools have been developed to process and analyze Big Data. High-performance computing architectures have been designed to meet the processing needs of Big Data (from the standpoints of transaction processing and of strategic and tactical analytics). The purpose of this article is to provide a historical and global perspective on recent trends in high-performance computing architectures, especially those related to analytics and data mining.

Keywords: high performance computing, HPC, big data, data analysis

24093 The Influence of the Concentration and Temperature on the Rheological Behavior of Carbonyl-Methylcellulose

Authors: Mohamed Rabhi, Kouider Halim Benrahou

Abstract:

The rheological properties of carbonyl-methylcellulose (CMC) solutions at different concentrations (25000, 50000, 60000, 80000, and 100000 ppm) and different temperatures were studied. We found that all CMC solutions exhibit pseudo-plastic behavior that follows the Ostwald-de Waele model. The objective of this work is to model the flow of CMC using the Cross model, which describes how viscosity varies with shear rate. This model allowed us to fit the rheological characteristics of CMC solutions more closely. The Cross model was compared with the Ostwald model. The fitting parameters of the Cross model were determined numerically to bring the model curve as close as possible to the experimental curve, and the experimental curve was compared with those given by the two models. Our study shows that the Cross model describes the flow of CMC well at low concentrations.

Keywords: CMC, rheological modeling, Ostwald model, cross model, viscosity
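
For reference, the two models compared above are commonly written as the Ostwald-de Waele power law, η = K·γ̇^(n−1), and the Cross model, η = η∞ + (η0 − η∞) / (1 + (λ·γ̇)^m). The sketch below evaluates both and fits the power law by linear least squares on log-log data. The parameter names, the synthetic data, and the log-log fitting route are assumptions for illustration, not the authors' numerical procedure.

```python
import math


def power_law_viscosity(shear_rate, K, n):
    """Ostwald-de Waele: eta = K * shear_rate**(n - 1); n < 1 means shear thinning."""
    return K * shear_rate ** (n - 1)


def cross_viscosity(shear_rate, eta0, eta_inf, lam, m):
    """Cross model: eta = eta_inf + (eta0 - eta_inf) / (1 + (lam * shear_rate)**m)."""
    return eta_inf + (eta0 - eta_inf) / (1.0 + (lam * shear_rate) ** m)


def fit_power_law(shear_rates, viscosities):
    """Least-squares line through (ln rate, ln eta): slope = n - 1, intercept = ln K."""
    xs = [math.log(g) for g in shear_rates]
    ys = [math.log(e) for e in viscosities]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return math.exp(intercept), slope + 1.0


rates = [1.0, 10.0, 100.0, 1000.0]                        # shear rates, 1/s (hypothetical)
visc = [power_law_viscosity(g, 2.0, 0.6) for g in rates]  # synthetic shear-thinning data
print(fit_power_law(rates, visc))                         # approximately (2.0, 0.6)
```

Unlike the power law, the Cross model levels off at η0 for low shear rates and at η∞ for high ones, which is why it can describe a wider range of a measured flow curve.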

24092 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore the re3data.org registry to identify the registration workflow for research data repositories. A further objective is to depict the present development of research data repositories in India. The study first examines the re3data.org registry's framework and schema design and then explores the status of Indian research data repositories in the registry. Research data repositories are gaining wider relevance owing to e-research. The re3data.org registry is a good tool for users and researchers to identify research data repositories appropriate to their research requirements. In the Indian context, a compatible National Research Data Policy is urgently needed to boost the management of research data. The Registry of Research Data Repositories is a crucial tool for discovering information in a specific domain. Moreover, research data repositories in India have not previously been studied. This study discusses both the re3data.org registry and the status of Indian research data repositories.

Keywords: research data, research data repositories, research data registry, re3data.org

24091 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

In Intelligent Transportation Systems, the need to process big data on transportation ridership (e.g., smartcard data) and traffic operations (e.g., traffic detector data) quickly, which requires a great deal of computational power, is incontrovertible. Cloud computing is currently an important topic and a popular information technology solution for data processing. It enables users to process enormous amounts of data without owning the computing power themselves, so it can also be a good choice for processing transportation big data. This paper examines how cloud computing can enhance transportation big data processing by contrasting its advantages and disadvantages and discussing its features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

24090 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data for data mining exploration takes up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying, meaningful groups of data are discovered. Large amounts of harmonic data have been collected over three years from an actual harmonic monitoring system in a distribution system in Australia. This amount of data makes it difficult to identify operational events that significantly affect the harmonics generated on the system. This paper presents harmonic data preparation processes that lead to a better understanding of the data. Underlying classes in the data are then identified using a clustering technique based on the Minimum Message Length (MML) method. The operational information contained within the clusters can be rapidly visualised by engineers. The C5.0 algorithm was used to classify and interpret the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

24089 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data play an increasingly important role in economic growth, innovation, technical advantage, and business strategy, and even in competition between countries. Analyzing patent data is crucial, since patents cover a large part of the world's technological information. In this paper, we use the linguistic summarization technique to verify the validity of hypotheses about patent data stated in the literature.

Keywords: data mining, fuzzy sets, linguistic summarization, patent data
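
A common way to score a linguistic summary such as "most patents are highly cited" is Zadeh's calculus of quantified propositions, as used in Yager-style summaries: the truth degree is μ_Q of the mean membership of the records in the summarizer S. The sketch below implements that classic formulation; the membership functions for "most" and "high" and the citation counts are assumptions, and the paper may use a different formulation.

```python
def most(r):
    """Assumed piecewise-linear membership for the fuzzy quantifier "most"."""
    if r <= 0.3:
        return 0.0
    if r >= 0.8:
        return 1.0
    return (r - 0.3) / 0.5


def ramp_up(lo, hi):
    """Membership for a summarizer such as "high": 0 below lo, 1 above hi, linear between."""
    def mu(v):
        if v <= lo:
            return 0.0
        if v >= hi:
            return 1.0
        return (v - lo) / (hi - lo)
    return mu


def truth_degree(values, summarizer, quantifier):
    """Truth of "Q of the records are S": quantifier(mean membership in S)."""
    r = sum(summarizer(v) for v in values) / len(values)
    return quantifier(r)


forward_citations = [0, 2, 5, 8, 12]  # hypothetical patent records
high = ramp_up(2, 10)                 # assumed meaning of "highly cited"
print(truth_degree(forward_citations, high, most))  # about 0.25
```

A hypothesis from the literature can then be phrased as a summary and accepted to the degree given by its truth value.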

24088 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In this paper, we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Because of numerous obstacles, e.g., forests, hills, and urban areas, data are collected in several ways: via wireless communication, LAN, and GSM networks, and, in certain areas, by vehicles. To ensure a connection to the server, most of the probes can communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure that the required data are collected, algorithms must be proposed that allow the probes to select a suitable communication channel.

Keywords: communication, computer network, data collection, probe

24087 Comparative Analysis of Real and Virtual Garment Fit

Authors: Kristina Ancutiene

Abstract:

The goal of this research is to compare the virtual fit of a woman's dress with its fit on a real person. The virtual fitting was done with Modaris_3D_Fit software (CAD Lectra), using the mechanical and structural parameters of the 100% linen fabric. The dress was also sewn, after which the differences in fit between the real and virtual dresses were investigated. Four respondents with similar figures evaluated the ease and strain deformations of the real and virtual dress. The scores given by the respondents wearing the real dress were compared with the ease and strain results given by the software. The main result was that the respondents' perception of stretch was similar to the virtual stretch deformations, but their perception of ease did not always match the virtual ease. The results may be influenced by psychological factors and by different understandings of the garment's purpose.

Keywords: virtual garment, 3D CAD, garment fit, mechanical properties

24086 A Spatial Approach to Model Mortality Rates

Authors: Yin-Yee Leong, Jack C. Yue, Hsin-Chung Wang

Abstract:

Human longevity has been experiencing its largest increase since the end of World War II, and modeling mortality rates is therefore the focus of many studies. Among mortality models, the Lee–Carter model is the most popular approach, since it is fairly easy to use and predicts mortality rates accurately (e.g., for Japan and the USA). However, empirical studies from several countries have shown that the age parameters of the Lee–Carter model are not constant over time. Many modifications of the Lee–Carter model have been proposed to deal with this problem, including adding an extra cohort effect and adding another period effect. In this study, we propose a spatial modification and use clusters to explain why the age parameters of the Lee–Carter model are not constant. In spatial analysis, clusters are areas whose mortality rates are unusually high or low compared with their neighbors; here the “location” of a mortality rate is measured by age and time, that is, by a two-dimensional coordinate. We use a popular cluster detection method, the spatial scan statistic, a local test based on the likelihood ratio, to find locations whose mortality rates cannot be described well by the Lee–Carter model. We first use computer simulation to show that the cluster effect is a possible cause of the non-constant age parameters. Next, we show that adding the cluster effect can solve the non-constant problem. We also apply the proposed approach to mortality data from Japan, France, the USA, and Taiwan. The empirical results show that our approach fits better and has smaller mean absolute percentage errors than the Lee–Carter model.

Keywords: mortality improvement, Lee–Carter model, spatial statistics, cluster detection
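
The baseline being extended here is the Lee–Carter model, ln m(x,t) = a(x) + b(x)·k(t) + ε, usually normalized so that Σb(x) = 1 and Σk(t) = 0. The sketch below fits only that baseline (not the proposed cluster extension or the scan statistic) and reports the mean absolute percentage error used for comparison. It replaces the usual SVD step with a rank-1 alternating least-squares iteration, which typically converges to the same leading singular vectors; the synthetic rates are hypothetical.

```python
import math


def fit_lee_carter(rates, iters=100):
    """Fit ln m[x][t] = a[x] + b[x] * k[t], normalised so sum(b) = 1 and sum(k) = 0.

    a[x] is the row mean of the log rates. b and k come from a rank-1
    alternating least-squares fit to the centred matrix, standing in for the
    SVD used in the classic method.
    """
    logm = [[math.log(v) for v in row] for row in rates]
    n_x, n_t = len(logm), len(logm[0])
    a = [sum(row) / n_t for row in logm]
    z = [[v - a[x] for v in logm[x]] for x in range(n_x)]
    # Starting value: k[t] = sum over ages of the centred log rates.
    k = [sum(z[x][t] for x in range(n_x)) for t in range(n_t)]
    for _ in range(max(1, iters)):
        kk = sum(v * v for v in k)
        b = [sum(z[x][t] * k[t] for t in range(n_t)) / kk for x in range(n_x)]
        bb = sum(v * v for v in b)
        k = [sum(b[x] * z[x][t] for x in range(n_x)) / bb for t in range(n_t)]
    scale = sum(b)  # rows of z are centred, so sum(k) = 0 already holds
    return a, [v / scale for v in b], [v * scale for v in k]


def fitted_rates(a, b, k):
    return [[math.exp(a[x] + b[x] * kt) for kt in k] for x in range(len(a))]


def mape(observed, fitted):
    """Mean absolute percentage error over all age-year cells."""
    errs = [abs(f - o) / o for ro, rf in zip(observed, fitted) for o, f in zip(ro, rf)]
    return 100.0 * sum(errs) / len(errs)


# Synthetic rates generated from known parameters (hypothetical, for checking only).
rates = fitted_rates([-6.0, -5.0, -4.0], [0.2, 0.3, 0.5], [3.0, 1.0, -1.0, -3.0])
a, b, k = fit_lee_carter(rates)
print(b, k, mape(rates, fitted_rates(a, b, k)))
```

On real data the residuals of this fit are what a scan statistic would search for clusters of unusually high or low mortality.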

24085 Strategic Planning in South African Higher Education

Authors: Noxolo Mafu

Abstract:

This study presents an overview of strategic planning in South African higher education institutions, tracing its trends and mystique in order to identify its impact. Over the decades of democracy, strategic planning has become integral to institutional survival. Several institutions have used it as a potent tool to catch up with and surpass their counterparts. While planning has always been part of higher education, strategic planning should be considered distinct. Strategic planning is primarily about developing and maintaining a strategic fit between an institution and its dynamic opportunities. This presupposes a set of stages that institutions pursue, which can be used to assess the impact of strategic planning in an institution. Network theory guides the study in demystifying the organisational networks apparent in strategic planning processes.

Keywords: network theory, strategy, planning, strategic planning, assessment, impact

24084 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, large amounts of data are being produced at an increasing rate from various sources, such as social media networks, sensor devices, and other information-serving devices. Such massive, complex, and exponentially growing collections of data are called big data. Traditional database systems cannot store and process such data because of its size and complexity. Cloud computing is therefore a potential solution for data storage and processing, since it can provide a pool of server and storage resources. However, moving large amounts of data to and from the cloud is challenging, since the large data size can cause high latency. With respect to this big data movement problem, this paper reviews previous work, discusses research issues, and identifies approaches for dealing with the problem.

Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques
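
The latency point above can be made concrete with back-of-the-envelope arithmetic: time ≈ latency + 8 × size / (bandwidth × efficiency). For example, 1 TB (10¹² bytes) over a fully used 1 Gbit/s link takes 8 × 10¹² / 10⁹ = 8000 s, about 2.2 hours. The helper below is a simplification that ignores protocol details; the `efficiency` factor for overhead and contention is an assumption, and the link speeds are illustrative.

```python
def transfer_time_hours(size_bytes, bandwidth_bps, efficiency=1.0, latency_s=0.0):
    """Back-of-the-envelope time to move a dataset over a network link.

    efficiency (0 < efficiency <= 1) is an assumed factor for protocol
    overhead and contention; latency_s is a one-off start-up delay.
    """
    seconds = latency_s + (size_bytes * 8) / (bandwidth_bps * efficiency)
    return seconds / 3600.0


print(transfer_time_hours(10**12, 10**9))                              # 1 TB at 1 Gbit/s: about 2.2 h
print(transfer_time_hours(100 * 10**12, 10 * 10**9, efficiency=0.8))  # 100 TB at 10 Gbit/s, 80%: about 27.8 h
```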

24083 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is a company's most precious asset. Today, companies hold large amounts of data. As data grow, retrieving particular information is becoming slower day by day. Processing data quickly enough to turn it into information is the biggest issue. The major problems in distributed databases are the efficiency and response time of data distribution, and the security of data distribution is also a big issue. To address these problems, we propose a strategy that maximizes the efficiency of data distribution and also improves its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources, helping companies share data securely, efficiently, and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

24082 Improving Neonatal Abstinence Syndrome Assessments

Authors: Nancy Wilson

Abstract:

In utero drug exposure is prevalent among newborns across birthing facilities. Assessment tools for neonatal abstinence syndrome (NAS) are often cumbersome and ill-fitting, and they are highly subjective. This often leads clinical assessors to be hypervigilant when assessing newborns for subtle symptoms of NAS, which are often mistaken for normal newborn behaviors. As a quality improvement initiative, this project led to a more adaptable NAS tool termed eat, sleep, console (ESC). This function-based NAS assessment scores the infant on its ability to accomplish three basic newborn necessities: to sleep, to eat, and to be consoled. The literature indicates that the ESC methodology improves patient and family outcomes while providing more cost-effective care.

Keywords: neonatal abstinence syndrome, neonatal opioid withdrawal, maternal substance abuse, pregnancy, and addiction, Finnegan neonatal abstinence syndrome tool, eat, sleep, console

24081 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before making a financial decision. The aim of this work is to analyze the factors that influence the final price of houses using data mining algorithms. To the best of our knowledge, previous work has only compared results. Before the dataset was used, a Z-transformation was applied to standardize the data to the same range. The data were then classified into two groups so that they could be visualized in a readable format. A decision tree was built, and the data are displayed graphically so that the results and the influence of each factor are easy to see. The methods are defined and the results described. Finally, conclusions and recommendations based on the results are presented, making it easier to apply these algorithms to a customized dataset.

Keywords: algorithms, data, decision tree, transformation
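
The Z-transformation mentioned above rescales each feature to zero mean and unit standard deviation, z = (x − μ) / σ, so that features measured in different units (area, distance, price) become comparable before a decision tree is built. The sketch below applies it column by column with Python's `statistics` module; using the population standard deviation is an assumption (the abstract does not say which), and the land records are hypothetical.

```python
import statistics


def z_transform(values):
    """Standardise to zero mean and unit population standard deviation.

    Raises ZeroDivisionError if all values are equal (sigma = 0).
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def standardise_columns(rows):
    """Apply the z-transformation to each column of a list of equal-length records."""
    columns = [z_transform(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]


# Hypothetical land records: [area in m^2, distance to centre in m]
plots = [[120, 500], [80, 1500], [100, 1000]]
print(standardise_columns(plots))
```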

24080 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. As the national big data strategy is implemented, geological big data management becomes increasingly critical. At present, there are still many technological barriers, as well as conceptual confusion, in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Making better use of new technologies to explore geological big data more deeply and apply it more widely is therefore a key task. In this paper, we first briefly introduce the basic principle of blockchain technology and then analyze the dilemmas in applying geological data. Based on this analysis, we propose some feasible patterns and scenarios for applying blockchain to geological big data and put forward several suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management
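
At its core, the basic principle of blockchain is a hash-linked ledger: each block stores the hash of its predecessor, so altering any earlier record breaks the chain, which is what makes it attractive for provenance and intellectual property protection of geological data. The sketch below shows only that tamper-evidence property, using Python's `hashlib` and `json`. It omits consensus, signatures, and distribution, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def block_hash(record, prev_hash):
    """SHA-256 over a canonical JSON encoding of the record and the previous hash."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    chain, prev = [], GENESIS
    for record in records:
        h = block_hash(record, prev)
        chain.append({"record": record, "prev": prev, "hash": h})
        prev = h
    return chain


def verify_chain(chain):
    """True only if every block's hash and back-link are intact."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block_hash(block["record"], prev) != block["hash"]:
            return False
        prev = block["hash"]
    return True


# Hypothetical geological dataset registrations.
ledger = build_chain([
    {"dataset": "borehole-log-001", "owner": "survey-team-A"},
    {"dataset": "geochem-2019", "owner": "lab-B"},
])
print(verify_chain(ledger))                       # True
ledger[0]["record"]["owner"] = "someone-else"     # tamper with an earlier record
print(verify_chain(ledger))                       # False
```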

24079 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent itemsets play an essential role in many data mining tasks that seek interesting patterns in databases. A frequent itemset is a set of items that frequently appear together in a transaction dataset. Several algorithms are used for frequent itemset mining, yet most do not scale to the kind of data we are presented with today, so-called “Big Data”. Big Data is a collection of large datasets. Our approach performs frequent itemset mining over large datasets in a scalable and fast way. MapReduce, together with HDFS, is used to find frequent itemsets in Big Data on a large cluster. This paper focuses on combining pre-processing and a mining algorithm as a hybrid approach for big data on the Hadoop platform.

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 389
24078 The Role of Data Gathering in NGOs

Authors: Hussaini Garba Mohammed

Abstract:

Background/Significance: The lack of data gathering is preventing NGOs worldwide from obtaining good information about educational and health-related issues in communities, both within individual countries and around the world. For example, HIV/AIDS, smoking and tuberculosis, and COVID-19 infection are becoming serious public health problems, especially among older men and women. However, some countries have no comprehensive survey data from communities, villages, and rural areas showing the percentage of victims and patients, especially during the global COVID-19 pandemic. Such data are essential for informing programme targets, strategies, and priorities in any society.

Keywords: reliable information, data assessment, data mining, data communication

Procedia PDF Downloads 155
24077 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are affected by many factors: the building structure, climate and environmental parameters, construction, system operating conditions, and user behavior patterns. Traditional data analysis methods are insufficient. This paper examines data mining technology and its application to the analysis of building energy consumption data, including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature is reviewed and summarized, the problems faced by data mining technology in energy consumption data analysis are enumerated, and research directions for future studies are given.
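As a minimal illustration of the energy consumption prediction task, the sketch below fits an ordinary least-squares model of daily energy use against two assumed drivers: the deviation of outdoor temperature from an assumed 18 °C balance point, and an occupancy fraction. The data are synthetic, generated from a known linear relationship so that the recovered coefficients can be checked; the feature choice and the function names `fit_energy_model` and `predict` are hypothetical and not drawn from the reviewed literature.

```python
import numpy as np


def design_matrix(features: np.ndarray) -> np.ndarray:
    # Prepend a column of ones so the model has an intercept term.
    return np.column_stack([np.ones(len(features)), features])


def fit_energy_model(features: np.ndarray, energy: np.ndarray) -> np.ndarray:
    """Ordinary least squares: energy ≈ b0 + b1*x1 + b2*x2 + ..."""
    coef, *_ = np.linalg.lstsq(design_matrix(features), energy, rcond=None)
    return coef


def predict(coef: np.ndarray, features: np.ndarray) -> np.ndarray:
    return design_matrix(features) @ coef


# Synthetic daily data (not real measurements) generated from a known relationship.
rng = np.random.default_rng(0)
n = 200
outdoor_temp = rng.uniform(-5.0, 35.0, n)        # °C
occupancy = rng.uniform(0.0, 1.0, n)             # fraction of the building occupied
temp_deviation = np.abs(outdoor_temp - 18.0)     # assumed 18 °C balance point
energy = 120.0 + 3.0 * temp_deviation + 80.0 * occupancy + rng.normal(0.0, 5.0, n)  # kWh, synthetic

features = np.column_stack([temp_deviation, occupancy])
coef = fit_energy_model(features, energy)
print(np.round(coef, 1))  # should lie close to the generating values [120, 3, 80]
mae = np.mean(np.abs(predict(coef, features) - energy))
```

Real building data rarely follow such a clean linear form, which is one reason the reviewed work turns to richer data mining models; the sketch only shows the basic fit-then-predict workflow.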

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 815
24076 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects that create data-driven applications. Such projects typically demand additional research and analysis, and the proper technique and strategy must be chosen to ensure their success. Otherwise, even with considerable effort, the required development may not be achievable. This paper examines the workflow of data-driven software development projects and their implementation process in order to describe how to manage such a project successfully, which helps minimize the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 54
24075 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good-quality data. In important fields such as healthcare, the training of AI systems relies predominantly on health and personal data; however, the use of these data is complicated by various layers of law and ethics that seek to protect individuals' privacy rights. This research seeks to identify the challenges that AI and data science pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To address some of these issues, various methods are suggested, such as embedding values in technological development and properly balancing rights and interests, among others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 81