Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 25386

Search results for: data interpolation

25236 Offshore Wind Assessment and Analysis for South Western Mediterranean Sea

Authors: Abdallah Touaibia, Nachida Kasbadji Merzouk, Mustapha Merzouk, Ryma Belarbi

Abstract:

accuracy assessment and a better understand of the wind resource distribution are the most important tasks for decision making before installing wind energy operating systems in a given region, there where our interest come to the Algerian coastline and its Mediterranean sea area. Despite its large coastline overlooking the border of Mediterranean Sea, there is still no strategy encouraging the development of offshore wind farms in Algerian waters. The present work aims to estimate the offshore wind fields for the Algerian Mediterranean Sea based on wind data measurements ranging from 1995 to 2018 provided of 24 years of measurement by seven observation stations focusing on three coastline cities in Algeria under a different measurement time step recorded from 30 min, 60 min, and 180 min variate from one to each other, two stations in Spain, two other ones in Italy and three in the coast of Algeria from the east Annaba, at the center Algiers, and to Oran taken place at the west of it. The idea behind consists to have multiple measurement points that helping to characterize this area in terms of wind potential by the use of interpolation method of their average wind speed values between these available data to achieve the approximate values of others locations where aren’t any available measurement because of the difficulties against the implementation of masts within the deep depth water. This study is organized as follow: first, a brief description of the studied area and its climatic characteristics were done. After that, the statistical properties of the recorded data were checked by evaluating wind histograms, direction roses, and average speeds using MatLab programs. Finally, ArcGIS and MapInfo soft-wares were used to establish offshore wind maps for better understanding the wind resource distribution, as well as to identify windy sites for wind farm installation and power management. The study pointed out that Cap Carbonara is the windiest site with an average wind speed of 7.26 m/s at 10 m, inducing a power density of 902 W/m², then the site of Cap Caccia with 4.88 m/s inducing a power density of 282 W/m². The average wind speed of 4.83 m/s is occurred for the site of Oran, inducing a power density of 230 W/m². The results indicated also that the dominant wind direction where the frequencies are highest for the site of Cap Carbonara is the West with 34%, an average wind speed of 9.49 m/s, and a power density of 1722 W/m². Then comes the site of Cap Caccia, where the prevailing wind direction is the North-west, about 20% and 5.82 m/s occurring a power density of 452 W/m². The site of Oran comes in third place with the North dominant direction with 32% inducing an average wind speed of 4.59 m/s and power density of 189 W/m². It also shown that the proposed method is either crucial in understanding wind resource distribution for revealing windy sites over a large area and more effective for wind turbines micro-siting.

Keywords: wind ressources, mediterranean sea, offshore, arcGIS, mapInfo, wind maps, wind farms

Procedia PDF Downloads 150

25235 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 274

25234 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 364

25233 Two-Dimensional Symmetric Half-Plane Recursive Doubly Complementary Digital Lattice Filters

Authors: Ju-Hong Lee, Chong-Jia Ciou, Yuan-Hau Yang

Abstract:

This paper deals with the problem of two-dimensional (2-D) recursive doubly complementary (DC) digital filter design. We present a structure of 2-D recursive DC filters by using 2-D symmetric half-plane (SHP) recursive digital all-pass lattice filters (DALFs). The novelty of using 2-D SHP recursive DALFs to construct a 2-D recursive DC digital lattice filter is that the resulting 2-D SHP recursive DC digital lattice filter provides better performance than the existing 2-D SHP recursive DC digital filter. Moreover, the proposed structure possesses a favorable 2-D DC half-band (DC-HB) property that allows about half of the 2-D SHP recursive DALF’s coefficients to be zero. This leads to considerable savings in computational burden for implementation. To ensure the stability of a designed 2-D SHP recursive DC digital lattice filter, some necessary constraints on the phase of the 2-D SHP recursive DALF during the design process are presented. Design of a 2-D diamond-shape decimation/interpolation filter is presented for illustration and comparison.

Keywords: all-pass digital filter, doubly complementary, lattice structure, symmetric half-plane digital filter, sampling rate conversion

Procedia PDF Downloads 438

25232 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, a large amount of data has been producing at increasing rate from various resources such as social media networks, sensor devices, and other information serving devices. This large collection of massive, complex and exponential growth of dataset is called big data. The traditional database systems cannot store and process such data due to large and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of resources for servers and storage. However, moving large amount of data to and from is a challenging issue since it can encounter a high latency due to large data size. With respect to big data movement problem, this paper reviews the literature of previous works, discusses about research issues, finds out approaches for dealing with big data movement problem.

Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques

Procedia PDF Downloads 90

25231 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

Procedia PDF Downloads 334

25230 Adapting an Accurate Reverse-time Migration Method to USCT Imaging

Authors: Brayden Mi

Abstract:

Reverse time migration has been widely used in the Petroleum exploration industry to reveal subsurface images and to detect rock and fluid properties since the early 1980s. The seismic technology involves the construction of a velocity model through interpretive model construction, seismic tomography, or full waveform inversion, and the application of the reverse-time propagation of acquired seismic data and the original wavelet used in the acquisition. The methodology has matured from 2D, simple media to present-day to handle full 3D imaging challenges in extremely complex geological conditions. Conventional Ultrasound computed tomography (USCT) utilize travel-time-inversion to reconstruct the velocity structure of an organ. With the velocity structure, USCT data can be migrated with the “bend-ray” method, also known as migration. Its seismic application counterpart is called Kirchhoff depth migration, in which the source of reflective energy is traced by ray-tracing and summed to produce a subsurface image. It is well known that ray-tracing-based migration has severe limitations in strongly heterogeneous media and irregular acquisition geometries. Reverse time migration (RTM), on the other hand, fully accounts for the wave phenomena, including multiple arrives and turning rays due to complex velocity structure. It has the capability to fully reconstruct the image detectable in its acquisition aperture. The RTM algorithms typically require a rather accurate velocity model and demand high computing powers, and may not be applicable to real-time imaging as normally required in day-to-day medical operations. However, with the improvement of computing technology, such a computational bottleneck may not present a challenge in the near future. The present-day (RTM) algorithms are typically implemented from a flat datum for the seismic industry. It can be modified to accommodate any acquisition geometry and aperture, as long as sufficient illumination is provided. Such flexibility of RTM can be conveniently implemented for the application in USCT imaging if the spatial coordinates of the transmitters and receivers are known and enough data is collected to provide full illumination. This paper proposes an implementation of a full 3D RTM algorithm for USCT imaging to produce an accurate 3D acoustic image based on the Phase-shift-plus-interpolation (PSPI) method for wavefield extrapolation. In this method, each acquired data set (shot) is propagated back in time, and a known ultrasound wavelet is propagated forward in time, with PSPI wavefield extrapolation and a piece-wise constant velocity model of the organ (breast). The imaging condition is then applied to produce a partial image. Although each image is subject to the limitation of its own illumination aperture, the stack of multiple partial images will produce a full image of the organ, with a much-reduced noise level if compared with individual partial images.

Keywords: illumination, reverse time migration (RTM), ultrasound computed tomography (USCT), wavefield extrapolation

Procedia PDF Downloads 77

25229 Simulation of the Reactive Rotational Molding Using Smoothed Particle Hydrodynamics

Authors: A. Hamidi, S. Khelladi, L. Illoul, A. Tcharkhtchi

Abstract:

Reactive rotational molding (RRM) is a process to manufacture hollow plastic parts with reactive material has several advantages compared to conventional roto molding of thermoplastic powders: process cycle time is shorter; raw material is less expensive because polymerization occurs during processing and high-performance polymers may be used such as thermosets, thermoplastics or blends. However, several phenomena occur during this process which makes the optimization of the process quite complex. In this study, we have used a mixture of isocyanate and polyol as a reactive system. The chemical transformation of this system to polyurethane has been studied by thermal analysis and rheology tests. Thanks to these results of the curing process and rheological measurements, the kinetic and rheokinetik of polyurethane was identified. Smoothed Particle Hydrodynamics, a Lagrangian meshless method, was chosen to simulate reactive fluid flow in 2 and 3D configurations of the polyurethane during the process taking into account the chemical, and chemiorehological results obtained experimentally in this study.

Keywords: reactive rotational molding, simulation, smoothed particle hydrodynamics, surface tension, rheology, free surface flows, viscoelastic, interpolation

Procedia PDF Downloads 292

25228 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before taking a decision about money. The aim of this work is to analyze the factors that influence the final price of the houses through data mining algorithms. To our best knowledge, previous work was researched just to compare results. Furthermore, before using the data of the data set, the Z-Transformation were used to standardize the data in the same range. Hence, the data was classified into two groups to visualize them in a readability format. A decision tree was built, and graphical data is displayed where clearly is easy to see the results and the factors' influence in these graphics. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented related to the released results that our research showed making it easier to apply these algorithms using a customized data set.

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 377

25227 Unknown Groundwater Pollution Source Characterization in Contaminated Mine Sites Using Optimal Monitoring Network Design

Authors: H. K. Esfahani, B. Datta

Abstract:

Groundwater is one of the most important natural resources in many parts of the world; however it is widely polluted due to human activities. Currently, effective and reliable groundwater management and remediation strategies are obtained using characterization of groundwater pollution sources, where the measured data in monitoring locations are utilized to estimate the unknown pollutant source location and magnitude. However, accurately identifying characteristics of contaminant sources is a challenging task due to uncertainties in terms of predicting source flux injection, hydro-geological and geo-chemical parameters, and the concentration field measurement. Reactive transport of chemical species in contaminated groundwater systems, especially with multiple species, is a complex and highly non-linear geochemical process. Although sufficient concentration measurement data is essential to accurately identify sources characteristics, available data are often sparse and limited in quantity. Therefore, this inverse problem-solving method for characterizing unknown groundwater pollution sources is often considered ill-posed, complex and non- unique. Different methods have been utilized to identify pollution sources; however, the linked simulation-optimization approach is one effective method to obtain acceptable results under uncertainties in complex real life scenarios. With this approach, the numerical flow and contaminant transport simulation models are externally linked to an optimization algorithm, with the objective of minimizing the difference between measured concentration and estimated pollutant concentration at observation locations. Concentration measurement data are very important to accurately estimate pollution source properties; therefore, optimal design of the monitoring network is essential to gather adequate measured data at desired times and locations. Due to budget and physical restrictions, an efficient and effective approach for groundwater pollutant source characterization is to design an optimal monitoring network, especially when only inadequate and arbitrary concentration measurement data are initially available. In this approach, preliminary concentration observation data are utilized for preliminary source location, magnitude and duration of source activity identification, and these results are utilized for monitoring network design. Further, feedback information from the monitoring network is used as inputs for sequential monitoring network design, to improve the identification of unknown source characteristics. To design an effective monitoring network of observation wells, optimization and interpolation techniques are used. A simulation model should be utilized to accurately describe the aquifer properties in terms of hydro-geochemical parameters and boundary conditions. However, the simulation of the transport processes becomes complex when the pollutants are chemically reactive. Three dimensional transient flow and reactive contaminant transport process is considered. The proposed methodology uses HYDROGEOCHEM 5.0 (HGCH) as the simulation model for flow and transport processes with chemically multiple reactive species. Adaptive Simulated Annealing (ASA) is used as optimization algorithm in linked simulation-optimization methodology to identify the unknown source characteristics. Therefore, the aim of the present study is to develop a methodology to optimally design an effective monitoring network for pollution source characterization with reactive species in polluted aquifers. The performance of the developed methodology will be evaluated for an illustrative polluted aquifer sites, for example an abandoned mine site in Queensland, Australia.

Keywords: monitoring network design, source characterization, chemical reactive transport process, contaminated mine site

Procedia PDF Downloads 234

25226 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still a lot of technology barriers as well as cognition chaos in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology at the beginning and then make an analysis of the application dilemma of geological data. Based on the current analysis, we bring forward some feasible patterns and scenarios for the blockchain application in geological big data and put forward serval suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 95

25225 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent Item sets play an essential role in many data Mining tasks that try to find interesting patterns from the database. Typically it refers to a set of items that frequently appear together in transaction dataset. There are several mining algorithm being used for frequent item set mining, yet most do not scale to the type of data we presented with today, so called “BIG DATA”. Big Data is a collection of large data sets. Our approach is to work on the frequent item set mining over the large dataset with scalable and speedy way. Big Data basically works with Map Reduce along with HDFS is used to find out frequent item sets from Big Data on large cluster. This paper focuses on using pre-processing & mining algorithm as hybrid approach for big data over Hadoop platform.

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 444

25224 The Role Of Data Gathering In NGOs

Authors: Hussaini Garba Mohammed

Abstract:

Background/Significance: The lack of data gathering is affecting NGOs world-wide in general to have good data information about educational and health related issues among communities in any country and around the world. For example, HIV/AIDS smoking (Tuberculosis diseases) and COVID-19 virus carriers is becoming a serious public health problem, especially among old men and women. But there is no full details data survey assessment from communities, villages, and rural area in some countries to show the percentage of victims and patients, especial with this world COVID-19 virus among the people. These data are essential to inform programming targets, strategies, and priorities in getting good information about data gathering in any society.

Keywords: reliable information, data assessment, data mining, data communication

Procedia PDF Downloads 183

25223 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 855

25222 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects. Otherwise, even exerting a lot of effort, the necessary development might not always be possible. In this post, an effort to examine the workflow of data-driven software development projects and its implementation process in order to describe how to manage a project successfully. Which will assist in minimizing the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 86

25221 Cost Effective Real-Time Image Processing Based Optical Mark Reader

Authors: Amit Kumar, Himanshu Singal, Arnav Bhavsar

Abstract:

In this modern era of automation, most of the academic exams and competitive exams are Multiple Choice Questions (MCQ). The responses of these MCQ based exams are recorded in the Optical Mark Reader (OMR) sheet. Evaluation of the OMR sheet requires separate specialized machines for scanning and marking. The sheets used by these machines are special and costs more than a normal sheet. Available process is non-economical and dependent on paper thickness, scanning quality, paper orientation, special hardware and customized software. This study tries to tackle the problem of evaluating the OMR sheet without any special hardware and making the whole process economical. We propose an image processing based algorithm which can be used to read and evaluate the scanned OMR sheets with no special hardware required. It will eliminate the use of special OMR sheet. Responses recorded in normal sheet is enough for evaluation. The proposed system takes care of color, brightness, rotation, little imperfections in the OMR sheet images.

Keywords: OMR, image processing, hough circle trans-form, interpolation, detection, binary thresholding

Procedia PDF Downloads 177

25220 Considerations for Effectively Using Probability of Failure as a Means of Slope Design Appraisal for Homogeneous and Heterogeneous Rock Masses

Authors: Neil Bar, Andrew Heweston

Abstract:

Probability of failure (PF) often appears alongside factor of safety (FS) in design acceptance criteria for rock slope, underground excavation and open pit mine designs. However, the design acceptance criteria generally provide no guidance relating to how PF should be calculated for homogeneous and heterogeneous rock masses, or what qualifies a ‘reasonable’ PF assessment for a given slope design. Observational and kinematic methods were widely used in the 1990s until advances in computing permitted the routine use of numerical modelling. In the 2000s and early 2010s, PF in numerical models was generally calculated using the point estimate method. More recently, some limit equilibrium analysis software offer statistical parameter inputs along with Monte-Carlo or Latin-Hypercube sampling methods to automatically calculate PF. Factors including rock type and density, weathering and alteration, intact rock strength, rock mass quality and shear strength, the location and orientation of geologic structure, shear strength of geologic structure and groundwater pore pressure influence the stability of rock slopes. Significant engineering and geological judgment, interpretation and data interpolation is usually applied in determining these factors and amalgamating them into a geotechnical model which can then be analysed. Most factors are estimated ‘approximately’ or with allowances for some variability rather than ‘exactly’. When it comes to numerical modelling, some of these factors are then treated deterministically (i.e. as exact values), while others have probabilistic inputs based on the user’s discretion and understanding of the problem being analysed. This paper discusses the importance of understanding the key aspects of slope design for homogeneous and heterogeneous rock masses and how they can be translated into reasonable PF assessments where the data permits. A case study from a large open pit gold mine in a complex geological setting in Western Australia is presented to illustrate how PF can be calculated using different methods and obtain markedly different results. Ultimately sound engineering judgement and logic is often required to decipher the true meaning and significance (if any) of some PF results.

Keywords: probability of failure, point estimate method, Monte-Carlo simulations, sensitivity analysis, slope stability

Procedia PDF Downloads 210

25219 Design and Development of Real-Time Optimal Energy Management System for Hybrid Electric Vehicles

Authors: Masood Roohi, Amir Taghavipour

Abstract:

This paper describes a strategy to develop an energy management system (EMS) for a charge-sustaining power-split hybrid electric vehicle. This kind of hybrid electric vehicles (HEVs) benefit from the advantages of both parallel and series architecture. However, it gets relatively more complicated to manage power flow between the battery and the engine optimally. The applied strategy in this paper is based on nonlinear model predictive control approach. First of all, an appropriate control-oriented model which was accurate enough and simple was derived. Towards utilization of this controller in real-time, the problem was solved off-line for a vast area of reference signals and initial conditions and stored the computed manipulated variables inside look-up tables. Look-up tables take a little amount of memory. Also, the computational load dramatically decreased, because to find required manipulated variables the controller just needed a simple interpolation between tables.

Keywords: hybrid electric vehicles, energy management system, nonlinear model predictive control, real-time

Procedia PDF Downloads 355

25218 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good quality data. Within important fields, such as healthcare, the training of AI systems predominately relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals’ privacy rights. This research seeks to establish the challenges AI and data sciences pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 110

25217 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.

Keywords: simulation data, data summarization, spatial histograms, exploration, visualization

Procedia PDF Downloads 182

25216 Enhancing the Bionic Eye: A Real-time Image Optimization Framework to Encode Color and Spatial Information Into Retinal Prostheses

Authors: William Huang

Abstract:

Retinal prostheses are currently limited to low resolution grayscale images that lack color and spatial information. This study develops a novel real-time image optimization framework and tools to encode maximum information to the prostheses which are constrained by the number of electrodes. One key idea is to localize main objects in images while reducing unnecessary background noise through region-contrast saliency maps. A novel color depth mapping technique was developed through MiniBatchKmeans clustering and color space selection. The resulting image was downsampled using bicubic interpolation to reduce image size while preserving color quality. In comparison to current schemes, the proposed framework demonstrated better visual quality in tested images. The use of the region-contrast saliency map showed improvements in efficacy up to 30%. Finally, the computational speed of this algorithm is less than 380 ms on tested cases, making real-time retinal prostheses feasible.

Keywords: retinal implants, virtual processing unit, computer vision, saliency maps, color quantization

Procedia PDF Downloads 157

25215 Algorithms used in Spatial Data Mining GIS

Authors: Vahid Bairami Rad

Abstract:

Extracting knowledge from spatial data like GIS data is important to reduce the data and extract information. Therefore, the development of new techniques and tools that support the human in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area ‘knowledge discovery in databases’. Thus, we introduce a set of database primitives or basic operations for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. Similar to the relational standard language SQL, the use of standard primitives will speed-up the development of new data mining algorithms and will also make them more portable. We introduced a database-oriented framework for spatial data mining which is based on the concepts of neighborhood graphs and paths. A small set of basic operations on these graphs and paths were defined as database primitives for spatial data mining. Furthermore, techniques to efficiently support the database primitives by a commercial DBMS were presented.

Keywords: spatial data base, knowledge discovery database, data mining, spatial relationship, predictive data mining

Procedia PDF Downloads 466

25214 Data Stream Association Rule Mining with Cloud Computing

Authors: B. Suraj Aravind, M. H. M. Krishna Prasad

Abstract:

There exist emerging applications of data streams that require association rule mining, such as network traffic monitoring, web click streams analysis, sensor data, data from satellites etc. Data streams typically arrive continuously in high speed with huge amount and changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes to introduce an improved data stream association rule mining algorithm by eliminating the limitation of resources. For this, the concept of cloud computing is used. Inclusion of this may lead to additional unknown problems which needs further research.

Keywords: data stream, association rule mining, cloud computing, frequent itemsets

Procedia PDF Downloads 505

25213 A Comprehensive Survey and Improvement to Existing Privacy Preserving Data Mining Techniques

Authors: Tosin Ige

Abstract:

Ethics must be a condition of the world, like logic. (Ludwig Wittgenstein, 1889-1951). As important as data mining is, it possess a significant threat to ethics, privacy, and legality, since data mining makes it difficult for an individual or consumer (in the case of a company) to control the accessibility and usage of his data. This research focuses on Current issues and the latest research and development on Privacy preserving data mining methods as at year 2022. It also discusses some advances in those techniques while at the same time highlighting and providing a new technique as a solution to an existing technique of privacy preserving data mining methods. This paper also bridges the wide gap between Data mining and the Web Application Programing Interface (web API), where research is urgently needed for an added layer of security in data mining while at the same time introducing a seamless and more efficient way of data mining.

Keywords: data, privacy, data mining, association rule, privacy preserving, mining technique

Procedia PDF Downloads 177

25212 Big Data: Concepts, Technologies and Applications in the Public Sector

Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora

Abstract:

Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.

Keywords: big data, big data analytics, Hadoop, cloud

Procedia PDF Downloads 313

25211 Semantic Data Schema Recognition

Authors: Aïcha Ben Salem, Faouzi Boufares, Sebastiao Correia

Abstract:

The subject covered in this paper aims at assisting the user in its quality approach. The goal is to better extract, mix, interpret and reuse data. It deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, inculding the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. These links detected between columns offer a better understanding of the source and the alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies.

Keywords: schema recognition, semantic data profiling, meta-categorisation, semantic dependencies inter columns

Procedia PDF Downloads 419

25210 Deep Learning-Based Classification of 3D CT Scans with Real Clinical Data; Impact of Image format

Authors: Maryam Fallahpoor, Biswajeet Pradhan

Abstract:

Background: Artificial intelligence (AI) serves as a valuable tool in mitigating the scarcity of human resources required for the evaluation and categorization of vast quantities of medical imaging data. When AI operates with optimal precision, it minimizes the demand for human interpretations and, thereby, reduces the burden on radiologists. Among various AI approaches, deep learning (DL) stands out as it obviates the need for feature extraction, a process that can impede classification, especially with intricate datasets. The advent of DL models has ushered in a new era in medical imaging, particularly in the context of COVID-19 detection. Traditional 2D imaging techniques exhibit limitations when applied to volumetric data, such as Computed Tomography (CT) scans. Medical images predominantly exist in one of two formats: neuroimaging informatics technology initiative (NIfTI) and digital imaging and communications in medicine (DICOM). Purpose: This study aims to employ DL for the classification of COVID-19-infected pulmonary patients and normal cases based on 3D CT scans while investigating the impact of image format. Material and Methods: The dataset used for model training and testing consisted of 1245 patients from IranMehr Hospital. All scans shared a matrix size of 512 × 512, although they exhibited varying slice numbers. Consequently, after loading the DICOM CT scans, image resampling and interpolation were performed to standardize the slice count. All images underwent cropping and resampling, resulting in uniform dimensions of 128 × 128 × 60. Resolution uniformity was achieved through resampling to 1 mm × 1 mm × 1 mm, and image intensities were confined to the range of (−1000, 400) Hounsfield units (HU). For classification purposes, positive pulmonary COVID-19 involvement was designated as 1, while normal images were assigned a value of 0. Subsequently, a U-net-based lung segmentation module was applied to obtain 3D segmented lung regions. The pre-processing stage included normalization, zero-centering, and shuffling. Four distinct 3D CNN models (ResNet152, ResNet50, DensNet169, and DensNet201) were employed in this study. Results: The findings revealed that the segmentation technique yielded superior results for DICOM images, which could be attributed to the potential loss of information during the conversion of original DICOM images to NIFTI format. Notably, ResNet152 and ResNet50 exhibited the highest accuracy at 90.0%, and the same models achieved the best F1 score at 87%. ResNet152 also secured the highest Area under the Curve (AUC) at 0.932. Regarding sensitivity and specificity, DensNet201 achieved the highest values at 93% and 96%, respectively. Conclusion: This study underscores the capacity of deep learning to classify COVID-19 pulmonary involvement using real 3D hospital data. The results underscore the significance of employing DICOM format 3D CT images alongside appropriate pre-processing techniques when training DL models for COVID-19 detection. This approach enhances the accuracy and reliability of diagnostic systems for COVID-19 detection.

Keywords: deep learning, COVID-19 detection, NIFTI format, DICOM format

Procedia PDF Downloads 93

25209 Establishment of a Classifier Model for Early Prediction of Acute Delirium in Adult Intensive Care Unit Using Machine Learning

Authors: Pei Yi Lin

Abstract:

Objective: The objective of this study is to use machine learning methods to build an early prediction classifier model for acute delirium to improve the quality of medical care for intensive care patients. Background: Delirium is a common acute and sudden disturbance of consciousness in critically ill patients. After the occurrence, it is easy to prolong the length of hospital stay and increase medical costs and mortality. In 2021, the incidence of delirium in the intensive care unit of internal medicine was as high as 59.78%, which indirectly prolonged the average length of hospital stay by 8.28 days, and the mortality rate is about 2.22% in the past three years. Therefore, it is expected to build a delirium prediction classifier through big data analysis and machine learning methods to detect delirium early. Method: This study is a retrospective study, using the artificial intelligence big data database to extract the characteristic factors related to delirium in intensive care unit patients and let the machine learn. The study included patients aged over 20 years old who were admitted to the intensive care unit between May 1, 2022, and December 31, 2022, excluding GCS assessment <4 points, admission to ICU for less than 24 hours, and CAM-ICU evaluation. The CAMICU delirium assessment results every 8 hours within 30 days of hospitalization are regarded as an event, and the cumulative data from ICU admission to the prediction time point are extracted to predict the possibility of delirium occurring in the next 8 hours, and collect a total of 63,754 research case data, extract 12 feature selections to train the model, including age, sex, average ICU stay hours, visual and auditory abnormalities, RASS assessment score, APACHE-II Score score, number of invasive catheters indwelling, restraint and sedative and hypnotic drugs. Through feature data cleaning, processing and KNN interpolation method supplementation, a total of 54595 research case events were extracted to provide machine learning model analysis, using the research events from May 01 to November 30, 2022, as the model training data, 80% of which is the training set for model training, and 20% for the internal verification of the verification set, and then from December 01 to December 2022 The CU research event on the 31st is an external verification set data, and finally the model inference and performance evaluation are performed, and then the model has trained again by adjusting the model parameters. Results: In this study, XG Boost, Random Forest, Logistic Regression, and Decision Tree were used to analyze and compare four machine learning models. The average accuracy rate of internal verification was highest in Random Forest (AUC=0.86), and the average accuracy rate of external verification was in Random Forest and XG Boost was the highest, AUC was 0.86, and the average accuracy of cross-validation was the highest in Random Forest (ACC=0.77). Conclusion: Clinically, medical staff usually conduct CAM-ICU assessments at the bedside of critically ill patients in clinical practice, but there is a lack of machine learning classification methods to assist ICU patients in real-time assessment, resulting in the inability to provide more objective and continuous monitoring data to assist Clinical staff can more accurately identify and predict the occurrence of delirium in patients. It is hoped that the development and construction of predictive models through machine learning can predict delirium early and immediately, make clinical decisions at the best time, and cooperate with PADIS delirium care measures to provide individualized non-drug interventional care measures to maintain patient safety, and then Improve the quality of care.

Keywords: critically ill patients, machine learning methods, delirium prediction, classifier model

Procedia PDF Downloads 80

25208 Investigation of Produced and Ground Water Contamination of Al Wahat Area South-Eastern Part of Sirt Basin, Libya

Authors: Khalifa Abdunaser, Salem Eljawashi

Abstract:

Study area is threatened by numerous petroleum activities. The most important risk is associated with dramatic dangers of misuse and oil and gas pollutions, such as significant volumes of produced water, which refers to waste water generated during the production of oil and natural gas and disposed on the surface surrounded oil and gas fields. This work concerns the impact of oil exploration and production activities on the physical and environment fate of the area, focusing on the investigation and observation of crude oil migration as toxic fluid. Its penetration in groundwater resulted from the produced water impacted by oilfield operations disposed to the earth surface in Al Wahat area. Describing the areal distribution of the dominant groundwater quality constituents has been conducted to identify the major hydro-geochemical processes that affect the quality of water and to evaluate the relations between rock types and groundwater flow to the quality and geochemistry of water in Post-Eocene aquifer. The chemical and physical characteristics of produced water, where it is produced, and its potential impacts on the environment and on oil and gas operations have been discussed. Field work survey was conducted to identify and locate a large number of monitoring wells previously drilled throughout the study area. Groundwater samples were systematically collected in order to detect the fate of spills resulting from the various activities at the oil fields in the study area. Spatial distribution maps of the water quality parameters were built using Kriging methods of interpolation in ArcMap software. Thematic maps were generated using GIS and remote sensing techniques, which were applied to include all these data layers as an active database for the area for the purpose of identifying hot spots and prioritizing locations based on their environmental conditions as well as for monitoring plans.

Keywords: Sirt Basin, produced water, Al Wahat area, Ground water

Procedia PDF Downloads 144

25207 Access Control System for Big Data Application

Authors: Winfred Okoe Addy, Jean Jacques Dominique Beraud

Abstract:

Access control systems (ACs) are some of the most important components in safety areas. Inaccuracies of regulatory frameworks make personal policies and remedies more appropriate than standard models or protocols. This problem is exacerbated by the increasing complexity of software, such as integrated Big Data (BD) software for controlling large volumes of encrypted data and resources embedded in a dedicated BD production system. This paper proposes a general access control strategy system for the diffusion of Big Data domains since it is crucial to secure the data provided to data consumers (DC). We presented a general access control circulation strategy for the Big Data domain by describing the benefit of using designated access control for BD units and performance and taking into consideration the need for BD and AC system. We then presented a generic of Big Data access control system to improve the dissemination of Big Data.

Keywords: access control, security, Big Data, domain

Procedia PDF Downloads 138