Search results for: data clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24972

Search results for: data clustering

24222 A Policy Strategy for Building Energy Data Management in India

Authors: Shravani Itkelwar, Deepak Tewari, Bhaskar Natarajan

Abstract:

The energy consumption data plays a vital role in energy efficiency policy design, implementation, and impact assessment. Any demand-side energy management intervention's success relies on the availability of accurate, comprehensive, granular, and up-to-date data on energy consumption. The Building sector, including residential and commercial, is one of the largest consumers of energy in India after the Industrial sector. With economic growth and increasing urbanization, the building sector is projected to grow at an unprecedented rate, resulting in a 5.6 times escalation in energy consumption till 2047 compared to 2017. Therefore, energy efficiency interventions will play a vital role in decoupling the floor area growth and associated energy demand, thereby increasing the need for robust data. In India, multiple institutions are involved in the collection and dissemination of data. This paper focuses on energy consumption data management in the building sector in India for both residential and commercial segments. It evaluates the robustness of data available through administrative and survey routes to estimate the key performance indicators and identify critical data gaps for making informed decisions. The paper explores several issues in the data, such as lack of comprehensiveness, non-availability of disaggregated data, the discrepancy in different data sources, inconsistent building categorization, and others. The identified data gaps are justified with appropriate examples. Moreover, the paper prioritizes required data in order of relevance to policymaking and groups it into "available," "easy to get," and "hard to get" categories. The paper concludes with recommendations to address the data gaps by leveraging digital initiatives, strengthening institutional capacity, institutionalizing exclusive building energy surveys, and standardization of building categorization, among others, to strengthen the management of building sector energy consumption data.

Keywords: energy data, energy policy, energy efficiency, buildings

Procedia PDF Downloads 180
24221 A Survey on Data-Centric and Data-Aware Techniques for Large Scale Infrastructures

Authors: Silvina Caíno-Lores, Jesús Carretero

Abstract:

Large scale computing infrastructures have been widely developed with the core objective of providing a suitable platform for high-performance and high-throughput computing. These systems are designed to support resource-intensive and complex applications, which can be found in many scientific and industrial areas. Currently, large scale data-intensive applications are hindered by the high latencies that result from the access to vastly distributed data. Recent works have suggested that improving data locality is key to move towards exascale infrastructures efficiently, as solutions to this problem aim to reduce the bandwidth consumed in data transfers, and the overheads that arise from them. There are several techniques that attempt to move computations closer to the data. In this survey we analyse the different mechanisms that have been proposed to provide data locality for large scale high-performance and high-throughput systems. This survey intends to assist scientific computing community in understanding the various technical aspects and strategies that have been reported in recent literature regarding data locality. As a result, we present an overview of locality-oriented techniques, which are grouped in four main categories: application development, task scheduling, in-memory computing and storage platforms. Finally, the authors include a discussion on future research lines and synergies among the former techniques.

Keywords: data locality, data-centric computing, large scale infrastructures, cloud computing

Procedia PDF Downloads 254
24220 Wind Speed Data Analysis in Colombia in 2013 and 2015

Authors: Harold P. Villota, Alejandro Osorio B.

Abstract:

The energy meteorology is an area for study energy complementarity and the use of renewable sources in interconnected systems. Due to diversify the energy matrix in Colombia with wind sources, is necessary to know the data bases about this one. However, the time series given by 260 automatic weather stations have empty, and no apply data, so the purpose is to fill the time series selecting two years to characterize, impute and use like base to complete the data between 2005 and 2020.

Keywords: complementarity, wind speed, renewable, colombia, characteri, characterization, imputation

Procedia PDF Downloads 161
24219 Industrial Process Mining Based on Data Pattern Modeling and Nonlinear Analysis

Authors: Hyun-Woo Cho

Abstract:

Unexpected events may occur with serious impacts on industrial process. This work utilizes a data representation technique to model and to analyze process data pattern for the purpose of diagnosis. In this work, the use of triangular representation of process data is evaluated using simulation process. Furthermore, the effect of using different pre-treatment techniques based on such as linear or nonlinear reduced spaces was compared. This work extracted the fault pattern in the reduced space, not in the original data space. The results have shown that the non-linear technique based diagnosis method produced more reliable results and outperforms linear method.

Keywords: process monitoring, data analysis, pattern modeling, fault, nonlinear techniques

Procedia PDF Downloads 382
24218 A Risk Assessment Tool for the Contamination of Aflatoxins on Dried Figs Based on Machine Learning Algorithms

Authors: Kottaridi Klimentia, Demopoulos Vasilis, Sidiropoulos Anastasios, Ihara Diego, Nikolaidis Vasileios, Antonopoulos Dimitrios

Abstract:

Aflatoxins are highly poisonous and carcinogenic compounds produced by species of the genus Aspergillus spp. that can infect a variety of agricultural foods, including dried figs. Biological and environmental factors, such as population, pathogenicity, and aflatoxinogenic capacity of the strains, topography, soil, and climate parameters of the fig orchards, are believed to have a strong effect on aflatoxin levels. Existing methods for aflatoxin detection and measurement, such as high performance liquid chromatography (HPLC), and enzyme-linked immunosorbent assay (ELISA), can provide accurate results, but the procedures are usually time-consuming, sample-destructive, and expensive. Predicting aflatoxin levels prior to crop harvest is useful for minimizing the health and financial impact of a contaminated crop. Consequently, there is interest in developing a tool that predicts aflatoxin levels based on topography and soil analysis data of fig orchards. This paper describes the development of a risk assessment tool for the contamination of aflatoxin on dried figs, based on the location and altitude of the fig orchards, the population of the fungus Aspergillus spp. in the soil, and soil parameters such as pH, saturation percentage (SP), electrical conductivity (EC), organic matter, particle size analysis (sand, silt, clay), the concentration of the exchangeable cations (Ca, Mg, K, Na), extractable P, and trace of elements (B, Fe, Mn, Zn and Cu), by employing machine learning methods. In particular, our proposed method integrates three machine learning techniques, i.e., dimensionality reduction on the original dataset (principal component analysis), metric learning (Mahalanobis metric for clustering), and k-nearest neighbors learning algorithm (KNN), into an enhanced model, with mean performance equal to 85% by terms of the Pearson correlation coefficient (PCC) between observed and predicted values.

Keywords: aflatoxins, Aspergillus spp., dried figs, k-nearest neighbors, machine learning, prediction

Procedia PDF Downloads 178
24217 Recommender System Based on Mining Graph Databases for Data-Intensive Applications

Authors: Mostafa Gamal, Hoda K. Mohamed, Islam El-Maddah, Ali Hamdi

Abstract:

In recent years, many digital documents on the web have been created due to the rapid growth of ’social applications’ communities or ’Data-intensive applications’. The evolution of online-based multimedia data poses new challenges in storing and querying large amounts of data for online recommender systems. Graph data models have been shown to be more efficient than relational data models for processing complex data. This paper will explain the key differences between graph and relational databases, their strengths and weaknesses, and why using graph databases is the best technology for building a realtime recommendation system. Also, The paper will discuss several similarity metrics algorithms that can be used to compute a similarity score of pairs of nodes based on their neighbourhoods or their properties. Finally, the paper will discover how NLP strategies offer the premise to improve the accuracy and coverage of realtime recommendations by extracting the information from the stored unstructured knowledge, which makes up the bulk of the world’s data to enrich the graph database with this information. As the size and number of data items are increasing rapidly, the proposed system should meet current and future needs.

Keywords: graph databases, NLP, recommendation systems, similarity metrics

Procedia PDF Downloads 101
24216 Digital Revolution a Veritable Infrastructure for Technological Development

Authors: Osakwe Jude Odiakaosa

Abstract:

Today’s digital society is characterized by e-education or e-learning, e-commerce, and so on. All these have been propelled by digital revolution. Digital technology such as computer technology, Global Positioning System (GPS) and Geographic Information System (GIS) has been having a tremendous impact on the field of technology. This development has positively affected the scope, methods, speed of data acquisition, data management and the rate of delivery of the results (map and other map products) of data processing. This paper tries to address the impact of revolution brought by digital technology.

Keywords: digital revolution, internet, technology, data management

Procedia PDF Downloads 444
24215 Enhancing the Bionic Eye: A Real-time Image Optimization Framework to Encode Color and Spatial Information Into Retinal Prostheses

Authors: William Huang

Abstract:

Retinal prostheses are currently limited to low resolution grayscale images that lack color and spatial information. This study develops a novel real-time image optimization framework and tools to encode maximum information to the prostheses which are constrained by the number of electrodes. One key idea is to localize main objects in images while reducing unnecessary background noise through region-contrast saliency maps. A novel color depth mapping technique was developed through MiniBatchKmeans clustering and color space selection. The resulting image was downsampled using bicubic interpolation to reduce image size while preserving color quality. In comparison to current schemes, the proposed framework demonstrated better visual quality in tested images. The use of the region-contrast saliency map showed improvements in efficacy up to 30%. Finally, the computational speed of this algorithm is less than 380 ms on tested cases, making real-time retinal prostheses feasible.

Keywords: retinal implants, virtual processing unit, computer vision, saliency maps, color quantization

Procedia PDF Downloads 143
24214 BigCrypt: A Probable Approach of Big Data Encryption to Protect Personal and Business Privacy

Authors: Abdullah Al Mamun, Talal Alkharobi

Abstract:

As data size is growing up, people are became more familiar to store big amount of secret information into cloud storage. Companies are always required to need transfer massive business files from one end to another. We are going to lose privacy if we transmit it as it is and continuing same scenario repeatedly without securing the communication mechanism means proper encryption. Although asymmetric key encryption solves the main problem of symmetric key encryption but it can only encrypt limited size of data which is inapplicable for large data encryption. In this paper we propose a probable approach of pretty good privacy for encrypt big data using both symmetric and asymmetric keys. Our goal is to achieve encrypt huge collection information and transmit it through a secure communication channel for committing the business and personal privacy. To justify our method an experimental dataset from three different platform is provided. We would like to show that our approach is working for massive size of various data efficiently and reliably.

Keywords: big data, cloud computing, cryptography, hadoop, public key

Procedia PDF Downloads 314
24213 Implementation of Big Data Concepts Led by the Business Pressures

Authors: Snezana Savoska, Blagoj Ristevski, Violeta Manevska, Zlatko Savoski, Ilija Jolevski

Abstract:

Big data is widely accepted by the pharmaceutical companies as a result of business demands create through legal pressure. Pharmaceutical companies have many legal demands as well as standards’ demands and have to adapt their procedures to the legislation. To manage with these demands, they have to standardize the usage of the current information technology and use the latest software tools. This paper highlights some important aspects of experience with big data projects implementation in a pharmaceutical Macedonian company. These projects made improvements of their business processes by the help of new software tools selected to comply with legal and business demands. They use IT as a strategic tool to obtain competitive advantage on the market and to reengineer the processes towards new Internet economy and quality demands. The company is required to manage vast amounts of structured as well as unstructured data. For these reasons, they implement projects for emerging and appropriate software tools which have to deal with big data concepts accepted in the company.

Keywords: big data, unstructured data, SAP ERP, documentum

Procedia PDF Downloads 265
24212 Saving Energy at a Wastewater Treatment Plant through Electrical and Production Data Analysis

Authors: Adriano Araujo Carvalho, Arturo Alatrista Corrales

Abstract:

This paper intends to show how electrical energy consumption and production data analysis were used to find opportunities to save energy at Taboada wastewater treatment plant in Callao, Peru. In order to access the data, it was used independent data networks for both electrical and process instruments, which were taken to analyze under an ISO 50001 energy audit, which considered, thus, Energy Performance Indexes for each process and a step-by-step guide presented in this text. Due to the use of aforementioned methodology and data mining techniques applied on information gathered through electronic multimeters (conveniently placed on substation switchboards connected to a cloud network), it was possible to identify thoroughly the performance of each process and thus, evidence saving opportunities which were previously hidden before. The data analysis brought both costs and energy reduction, allowing the plant to save significant resources and to be certified under ISO 50001.

Keywords: energy and production data analysis, energy management, ISO 50001, wastewater treatment plant energy analysis

Procedia PDF Downloads 190
24211 A Fuzzy Approach to Liver Tumor Segmentation with Zernike Moments

Authors: Abder-Rahman Ali, Antoine Vacavant, Manuel Grand-Brochier, Adélaïde Albouy-Kissi, Jean-Yves Boire

Abstract:

In this paper, we present a new segmentation approach for liver lesions in regions of interest within MRI (Magnetic Resonance Imaging). This approach, based on a two-cluster Fuzzy C-Means methodology, considers the parameter variable compactness to handle uncertainty. Fine boundaries are detected by a local recursive merging of ambiguous pixels with a sequential forward floating selection with Zernike moments. The method has been tested on both synthetic and real images. When applied on synthetic images, the proposed approach provides good performance, segmentations obtained are accurate, their shape is consistent with the ground truth, and the extracted information is reliable. The results obtained on MR images confirm such observations. Our approach allows, even for difficult cases of MR images, to extract a segmentation with good performance in terms of accuracy and shape, which implies that the geometry of the tumor is preserved for further clinical activities (such as automatic extraction of pharmaco-kinetics properties, lesion characterization, etc).

Keywords: defuzzification, floating search, fuzzy clustering, Zernike moments

Procedia PDF Downloads 449
24210 Security in Resource Constraints: Network Energy Efficient Encryption

Authors: Mona Almansoori, Ahmed Mustafa, Ahmad Elshamy

Abstract:

Wireless nodes in a sensor network gather and process critical information designed to process and communicate, information flooding through such network is critical for decision making and data processing, the integrity of such data is one of the most critical factors in wireless security without compromising the processing and transmission capability of the network. This paper presents mechanism to securely transmit data over a chain of sensor nodes without compromising the throughput of the network utilizing available battery resources available at the sensor node.

Keywords: hybrid protocol, data integrity, lightweight encryption, neighbor based key sharing, sensor node data processing, Z-MAC

Procedia PDF Downloads 139
24209 Spatial Analysis and Determinants of Number of Antenatal Health Care Visit Among Pregnant Women in Ethiopia: Application of Spatial Multilevel Count Regression Models

Authors: Muluwerk Ayele Derebe

Abstract:

Background: Antenatal care (ANC) is an essential element in the continuum of reproductive health care for preventing preventable pregnancy-related morbidity and mortality. Objective: The aim of this study is to assess the spatial pattern and predictors of ANC visits in Ethiopia. Method: This study was done using Ethiopian Demographic and Health Survey data of 2016 among 7,174 pregnant women aged 15-49 years which was a nationwide community-based cross-sectional survey. Spatial analysis was done using Getis-Ord Gi* statistics to identify hot and cold spot areas of ANC visits. Multilevel glmmTMB packages adjusted for spatial effects were used in R software. Spatial multilevel count regression was conducted to identify predictors of antenatal care visits for pregnant women, and proportional change in variance was done to uncover the effect of individual and community-level factors of ANC visits. Results: The distribution of ANC visits was spatially clustered Moran’s I = 0.271, p<.0.001, ICC = 0.497, p<0.001). The highest spatial outlier areas of ANC visit was found in Amhara (South Wollo, Weast Gojjam, North Shewa), Oromo (west Arsi and East Harariga), Tigray (Central Tigray) and Benishangul-Gumuz (Asosa and Metekel) regions. The data was found with excess zeros (34.6%) and over-dispersed. The expected ANC visit of pregnant women with pregnancy complications was higher at 0.7868 [ARR= 2.1964, 95% CI: 1.8605, 2.5928, p-value <0.0001] compared to pregnant women who had no pregnancy complications. The expected ANC visit of a pregnant woman who lived in a rural area was 1.2254 times higher [ARR=3.4057, 95% CI: 2.1462, 5.4041, p-value <0.0001] as compared to a pregnant woman who lived in an urban. The study found dissimilar clusters with a low number of zero counts for a mean number of ANC visits surrounded by clusters with a higher number of counts of an average number of ANC visits when other variables held constant. Conclusion: This study found that the number of ANC visits in Ethiopia had a spatial pattern associated with socioeconomic, demographic, and geographic risk factors. Spatial clustering of ANC visits exists in all regions of Ethiopia. The predictor age of the mother, religion, mother’s education, husband’s education, mother's occupation, husband's occupation, signs of pregnancy complication, wealth index and marital status had a strong association with the number of ANC visits by each individual. At the community level, place of residence, region, age of the mother, sex of the household head, signs of pregnancy complications and distance to health facility factors had a strong association with the number of ANC visits.

Keywords: Ethiopia, ANC, spatial, multilevel, zero inflated Poisson

Procedia PDF Downloads 68
24208 KCBA, A Method for Feature Extraction of Colonoscopy Images

Authors: Vahid Bayrami Rad

Abstract:

In recent years, the use of artificial intelligence techniques, tools, and methods in processing medical images and health-related applications has been highlighted and a lot of research has been done in this regard. For example, colonoscopy and diagnosis of colon lesions are some cases in which the process of diagnosis of lesions can be improved by using image processing and artificial intelligence algorithms, which help doctors a lot. Due to the lack of accurate measurements and the variety of injuries in colonoscopy images, the process of diagnosing the type of lesions is a little difficult even for expert doctors. Therefore, by using different software and image processing, doctors can be helped to increase the accuracy of their observations and ultimately improve their diagnosis. Also, by using automatic methods, the process of diagnosing the type of disease can be improved. Therefore, in this paper, a deep learning framework called KCBA is proposed to classify colonoscopy lesions which are composed of several methods such as K-means clustering, a bag of features and deep auto-encoder. Finally, according to the experimental results, the proposed method's performance in classifying colonoscopy images is depicted considering the accuracy criterion.

Keywords: colorectal cancer, colonoscopy, region of interest, narrow band imaging, texture analysis, bag of feature

Procedia PDF Downloads 48
24207 Development of New Technology Evaluation Model by Using Patent Information and Customers' Review Data

Authors: Kisik Song, Kyuwoong Kim, Sungjoo Lee

Abstract:

Many global firms and corporations derive new technology and opportunity by identifying vacant technology from patent analysis. However, previous studies failed to focus on technologies that promised continuous growth in industrial fields. Most studies that derive new technology opportunities do not test practical effectiveness. Since previous studies depended on expert judgment, it became costly and time-consuming to evaluate new technologies based on patent analysis. Therefore, research suggests a quantitative and systematic approach to technology evaluation indicators by using patent data to and from customer communities. The first step involves collecting two types of data. The data is used to construct evaluation indicators and apply these indicators to the evaluation of new technologies. This type of data mining allows a new method of technology evaluation and better predictor of how new technologies are adopted.

Keywords: data mining, evaluating new technology, technology opportunity, patent analysis

Procedia PDF Downloads 369
24206 Anomaly Detection Based on System Log Data

Authors: M. Kamel, A. Hoayek, M. Batton-Hubert

Abstract:

With the increase of network virtualization and the disparity of vendors, the continuous monitoring and detection of anomalies cannot rely on static rules. An advanced analytical methodology is needed to discriminate between ordinary events and unusual anomalies. In this paper, we focus on log data (textual data), which is a crucial source of information for network performance. Then, we introduce an algorithm used as a pipeline to help with the pretreatment of such data, group it into patterns, and dynamically label each pattern as an anomaly or not. Such tools will provide users and experts with continuous real-time logs monitoring capability to detect anomalies and failures in the underlying system that can affect performance. An application of real-world data illustrates the algorithm.

Keywords: logs, anomaly detection, ML, scoring, NLP

Procedia PDF Downloads 88
24205 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of variation in the register in three major text domains, namely, social, cultural, and political texts collected from the corpus of Bangla printed mass media texts. This present study uses a corpus of a moderate amount of Bangla mass media text that contains nearly one million words collected from different media sources like newspapers, magazines, advertisements, periodicals, etc. The analysis of corpus data reveals that each text has certain lexical properties that not only control their identity but also mark their uniqueness across the domains. At first, the subject domains of the texts are classified into two parameters namely, ‘Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties like both function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency count to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains are quite dicey in nature as their lexicological identity does not have any bearing in the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, lexical collocation. In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation with regard to the domains they should actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing text identification strategy for language learners. The availability of huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational to specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 170
24204 Analysis Of Non-uniform Characteristics Of Small Underwater Targets Based On Clustering

Authors: Tianyang Xu

Abstract:

Small underwater targets generally have a non-centrosymmetric geometry, and the acoustic scattering field of the target has spatial inhomogeneity under active sonar detection conditions. In view of the above problems, this paper takes the hemispherical cylindrical shell as the research object, and considers the angle continuity implied in the echo characteristics, and proposes a cluster-driven research method for the non-uniform characteristics of target echo angle. First, the target echo features are extracted, and feature vectors are constructed. Secondly, the t-SNE algorithm is used to improve the internal connection of the feature vector in the low-dimensional feature space and to construct the visual feature space. Finally, the implicit angular relationship between echo features is extracted under unsupervised condition by cluster analysis. The reconstruction results of the local geometric structure of the target corresponding to different categories show that the method can effectively divide the angle interval of the local structure of the target according to the natural acoustic scattering characteristics of the target.

Keywords: underwater target;, non-uniform characteristics;, cluster-driven method;, acoustic scattering characteristics

Procedia PDF Downloads 119
24203 The Impact of Financial Reporting on Sustainability

Authors: Lynn Ruggieri

Abstract:

The worldwide pandemic has only increased sustainability awareness. The public is demanding that businesses be held accountable for their impact on the environment. While financial data enjoys uniformity in reporting requirements, there are no uniform reporting requirements for non-financial data. Europe is leading the way with some standards being implemented for reporting non-financial sustainability data; however, there is no uniformity globally. And without uniformity, there is not a clear understanding of what information to include and how to disclose it. Sustainability reporting will provide important information to stakeholders and will enable businesses to understand their impact on the environment. Therefore, there is a crucial need for this data. This paper looks at the history of sustainability reporting in the countries of the European Union and throughout the world and makes a case for worldwide reporting requirements for sustainability.

Keywords: financial reporting, non-financial data, sustainability, global financial reporting

Procedia PDF Downloads 173
24202 Methods and Algorithms of Ensuring Data Privacy in AI-Based Healthcare Systems and Technologies

Authors: Omar Farshad Jeelani, Makaire Njie, Viktoriia M. Korzhuk

Abstract:

Recently, the application of AI-powered algorithms in healthcare continues to flourish. Particularly, access to healthcare information, including patient health history, diagnostic data, and PII (Personally Identifiable Information) is paramount in the delivery of efficient patient outcomes. However, as the exchange of healthcare information between patients and healthcare providers through AI-powered solutions increases, protecting a person’s information and their privacy has become even more important. Arguably, the increased adoption of healthcare AI has resulted in a significant concentration on the security risks and protection measures to the security and privacy of healthcare data, leading to escalated analyses and enforcement. Since these challenges are brought by the use of AI-based healthcare solutions to manage healthcare data, AI-based data protection measures are used to resolve the underlying problems. Consequently, this project proposes AI-powered safeguards and policies/laws to protect the privacy of healthcare data. The project presents the best-in-school techniques used to preserve the data privacy of AI-powered healthcare applications. Popular privacy-protecting methods like Federated learning, cryptographic techniques, differential privacy methods, and hybrid methods are discussed together with potential cyber threats, data security concerns, and prospects. Also, the project discusses some of the relevant data security acts/laws that govern the collection, storage, and processing of healthcare data to guarantee owners’ privacy is preserved. This inquiry discusses various gaps and uncertainties associated with healthcare AI data collection procedures and identifies potential correction/mitigation measures.

Keywords: data privacy, artificial intelligence (AI), healthcare AI, data sharing, healthcare organizations (HCOs)

Procedia PDF Downloads 84
24201 Mapping Tunnelling Parameters for Global Optimization in Big Data via Dye Laser Simulation

Authors: Sahil Imtiyaz

Abstract:

One of the biggest challenges has emerged from the ever-expanding, dynamic, and instantaneously changing space-Big Data; and to find a data point and inherit wisdom to this space is a hard task. In this paper, we reduce the space of big data in Hamiltonian formalism that is in concordance with Ising Model. For this formulation, we simulate the system using dye laser in FORTRAN and analyse the dynamics of the data point in energy well of rhodium atom. After mapping the photon intensity and pulse width with energy and potential we concluded that as we increase the energy there is also increase in probability of tunnelling up to some point and then it starts decreasing and then shows a randomizing behaviour. It is due to decoherence with the environment and hence there is a loss of ‘quantumness’. This interprets the efficiency parameter and the extent of quantum evolution. The results are strongly encouraging in favour of the use of ‘Topological Property’ as a source of information instead of the qubit.

Keywords: big data, optimization, quantum evolution, hamiltonian, dye laser, fermionic computations

Procedia PDF Downloads 191
24200 Applying Different Stenography Techniques in Cloud Computing Technology to Improve Cloud Data Privacy and Security Issues

Authors: Muhammad Muhammad Suleiman

Abstract:

Cloud Computing is a versatile concept that refers to a service that allows users to outsource their data without having to worry about local storage issues. However, the most pressing issues to be addressed are maintaining a secure and reliable data repository rather than relying on untrustworthy service providers. In this study, we look at how stenography approaches and collaboration with Digital Watermarking can greatly improve the system's effectiveness and data security when used for Cloud Computing. The main requirement of such frameworks, where data is transferred or exchanged between servers and users, is safe data management in cloud environments. Steganography is the cloud is among the most effective methods for safe communication. Steganography is a method of writing coded messages in such a way that only the sender and recipient can safely interpret and display the information hidden in the communication channel. This study presents a new text steganography method for hiding a loaded hidden English text file in a cover English text file to ensure data protection in cloud computing. Data protection, data hiding capability, and time were all improved using the proposed technique.

Keywords: cloud computing, steganography, information hiding, cloud storage, security

Procedia PDF Downloads 182
24199 Investigation on Performance of Change Point Algorithm in Time Series Dynamical Regimes and Effect of Data Characteristics

Authors: Farhad Asadi, Mohammad Javad Mollakazemi

Abstract:

In this paper, Bayesian online inference in models of data series are constructed by change-points algorithm, which separated the observed time series into independent series and study the change and variation of the regime of the data with related statistical characteristics. variation of statistical characteristics of time series data often represent separated phenomena in the some dynamical system, like a change in state of brain dynamical reflected in EEG signal data measurement or a change in important regime of data in many dynamical system. In this paper, prediction algorithm for studying change point location in some time series data is simulated. It is verified that pattern of proposed distribution of data has important factor on simpler and smother fluctuation of hazard rate parameter and also for better identification of change point locations. Finally, the conditions of how the time series distribution effect on factors in this approach are explained and validated with different time series databases for some dynamical system.

Keywords: time series, fluctuation in statistical characteristics, optimal learning, change-point algorithm

Procedia PDF Downloads 419
24198 Determination of the Risks of Heart Attack at the First Stage as Well as Their Control and Resource Planning with the Method of Data Mining

Authors: İbrahi̇m Kara, Seher Arslankaya

Abstract:

Frequently preferred in the field of engineering in particular, data mining has now begun to be used in the field of health as well since the data in the health sector have reached great dimensions. With data mining, it is aimed to reveal models from the great amounts of raw data in agreement with the purpose and to search for the rules and relationships which will enable one to make predictions about the future from the large amount of data set. It helps the decision-maker to find the relationships among the data which form at the stage of decision-making. In this study, it is aimed to determine the risk of heart attack at the first stage, to control it, and to make its resource planning with the method of data mining. Through the early and correct diagnosis of heart attacks, it is aimed to reveal the factors which affect the diseases, to protect health and choose the right treatment methods, to reduce the costs in health expenditures, and to shorten the durations of patients’ stay at hospitals. In this way, the diagnosis and treatment costs of a heart attack will be scrutinized, which will be useful to determine the risk of the disease at the first stage, to control it, and to make its resource planning.

Keywords: data mining, decision support systems, heart attack, health sector

Procedia PDF Downloads 354
24197 Bayesian Borrowing Methods for Count Data: Analysis of Incontinence Episodes in Patients with Overactive Bladder

Authors: Akalu Banbeta, Emmanuel Lesaffre, Reynaldo Martina, Joost Van Rosmalen

Abstract:

Including data from previous studies (historical data) in the analysis of the current study may reduce the sample size requirement and/or increase the power of analysis. The most common example is incorporating historical control data in the analysis of a current clinical trial. However, this only applies when the historical control dataare similar enough to the current control data. Recently, several Bayesian approaches for incorporating historical data have been proposed, such as the meta-analytic-predictive (MAP) prior and the modified power prior (MPP) both for single control as well as for multiple historical control arms. Here, we examine the performance of the MAP and the MPP approaches for the analysis of (over-dispersed) count data. To this end, we propose a computational method for the MPP approach for the Poisson and the negative binomial models. We conducted an extensive simulation study to assess the performance of Bayesian approaches. Additionally, we illustrate our approaches on an overactive bladder data set. For similar data across the control arms, the MPP approach outperformed the MAP approach with respect to thestatistical power. When the means across the control arms are different, the MPP yielded a slightly inflated type I error (TIE) rate, whereas the MAP did not. In contrast, when the dispersion parameters are different, the MAP gave an inflated TIE rate, whereas the MPP did not.We conclude that the MPP approach is more promising than the MAP approach for incorporating historical count data.

Keywords: count data, meta-analytic prior, negative binomial, poisson

Procedia PDF Downloads 113
24196 Strategic Citizen Participation in Applied Planning Investigations: How Planners Use Etic and Emic Community Input Perspectives to Fill-in the Gaps in Their Analysis

Authors: John Gaber

Abstract:

Planners regularly use citizen input as empirical data to help them better understand community issues they know very little about. This type of community data is based on the lived experiences of local residents and is known as "emic" data. What is becoming more common practice for planners is their use of data from local experts and stakeholders (known as "etic" data or the outsider perspective) to help them fill in the gaps in their analysis of applied planning research projects. Utilizing international Health Impact Assessment (HIA) data, I look at who planners invite to their citizen input investigations. Research presented in this paper shows that planners access a wide range of emic and etic community perspectives in their search for the “community’s view.” The paper concludes with how planners can chart out a new empirical path in their execution of emic/etic citizen participation strategies in their applied planning research projects.

Keywords: citizen participation, emic data, etic data, Health Impact Assessment (HIA)

Procedia PDF Downloads 482
24195 Data Augmentation for Automatic Graphical User Interface Generation Based on Generative Adversarial Network

Authors: Xulu Yao, Moi Hoon Yap, Yanlong Zhang

Abstract:

As a branch of artificial neural network, deep learning is widely used in the field of image recognition, but the lack of its dataset leads to imperfect model learning. By analysing the data scale requirements of deep learning and aiming at the application in GUI generation, it is found that the collection of GUI dataset is a time-consuming and labor-consuming project, which is difficult to meet the needs of current deep learning network. To solve this problem, this paper proposes a semi-supervised deep learning model that relies on the original small-scale datasets to produce a large number of reliable data sets. By combining the cyclic neural network with the generated countermeasure network, the cyclic neural network can learn the sequence relationship and characteristics of data, make the generated countermeasure network generate reasonable data, and then expand the Rico dataset. Relying on the network structure, the characteristics of collected data can be well analysed, and a large number of reasonable data can be generated according to these characteristics. After data processing, a reliable dataset for model training can be formed, which alleviates the problem of dataset shortage in deep learning.

Keywords: GUI, deep learning, GAN, data augmentation

Procedia PDF Downloads 178
24194 Modelling Rainfall-Induced Shallow Landslides in the Northern New South Wales

Authors: S. Ravindran, Y.Liu, I. Gratchev, D.Jeng

Abstract:

Rainfall-induced shallow landslides are more common in the northern New South Wales (NSW), Australia. From 2009 to 2017, around 105 rainfall-induced landslides occurred along the road corridors and caused temporary road closures in the northern NSW. Rainfall causing shallow landslides has different distributions of rainfall varying from uniform, normal, decreasing to increasing rainfall intensity. The duration of rainfall varied from one day to 18 days according to historical data. The objective of this research is to analyse slope instability of some of the sites in the northern NSW by varying cumulative rainfall using SLOPE/W and SEEP/W and compare with field data of rainfall causing shallow landslides. The rainfall data and topographical data from public authorities and soil data obtained from laboratory tests will be used for this modelling. There is a likelihood of shallow landslides if the cumulative rainfall is between 100 mm to 400 mm in accordance with field data.

Keywords: landslides, modelling, rainfall, suction

Procedia PDF Downloads 170
24193 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing do-main presents interesting challenges as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable as it can be used to mark progress while training, and it can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The dataset available was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and it was temporally heterogeneous. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross validation strategies. The investigated solutions to the classification problem included light weight machine classifiers KNN and SVM as well as the deep learning with CNN. The best performing model had an 80% accuracy. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.

Keywords: classification, climbing, data imbalance, data scarcity, machine learning, time sequence

Procedia PDF Downloads 139