Search results for: data streams
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24541

Search results for: data streams

24331 Robust and Dedicated Hybrid Cloud Approach for Secure Authorized Deduplication

Authors: Aishwarya Shekhar, Himanshu Sharma

Abstract:

Data deduplication is one of important data compression techniques for eliminating duplicate copies of repeating data, and has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. In this process, duplicate data is expunged, leaving only one copy means single instance of the data to be accumulated. Though, indexing of each and every data is still maintained. Data deduplication is an approach for minimizing the part of storage space an organization required to retain its data. In most of the company, the storage systems carry identical copies of numerous pieces of data. Deduplication terminates these additional copies by saving just one copy of the data and exchanging the other copies with pointers that assist back to the primary copy. To ignore this duplication of the data and to preserve the confidentiality in the cloud here we are applying the concept of hybrid nature of cloud. A hybrid cloud is a fusion of minimally one public and private cloud. As a proof of concept, we implement a java code which provides security as well as removes all types of duplicated data from the cloud.

Keywords: confidentiality, deduplication, data compression, hybridity of cloud

Procedia PDF Downloads 362
24330 A Review of Machine Learning for Big Data

Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.

Abstract:

Big data are now rapidly expanding in all engineering and science and many other domains. The potential of large or massive data is undoubtedly significant, make sense to require new ways of thinking and learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. In this paper, the latest advances and advancements in the researches on machine learning for big data processing. First, the machine learning techniques methods in recent studies, such as deep learning, representation learning, transfer learning, active learning and distributed and parallel learning. Then focus on the challenges and possible solutions of machine learning for big data.

Keywords: active learning, big data, deep learning, machine learning

Procedia PDF Downloads 411
24329 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights

Authors: Tomy Prihananto, Damar Apri Sudarmadi

Abstract:

Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management and data management in Indonesia. This paper is a descriptive research with qualitative approach and collecting data from literature study. Results of this paper are comprehensive arrangement of data that have been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and protection of personal data are mutually reinforcing arrangements in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.

Keywords: Indonesia, protection, personal data, privacy, human rights, encryption

Procedia PDF Downloads 159
24328 The Various Legal Dimensions of Genomic Data

Authors: Amy Gooden

Abstract:

When human genomic data is considered, this is often done through only one dimension of the law, or the interplay between the various dimensions is not considered, thus providing an incomplete picture of the legal framework. This research considers and analyzes the various dimensions in South African law applicable to genomic sequence data – including property rights, personality rights, and intellectual property rights. The effective use of personal genomic sequence data requires the acknowledgement and harmonization of the rights applicable to such data.

Keywords: artificial intelligence, data, law, genomics, rights

Procedia PDF Downloads 125
24327 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: data integration, data warehousing, federated architecture, Online Analytical Processing (OLAP)

Procedia PDF Downloads 218
24326 Optimization of Bioremediation Process to Remove Hexavalent Chromium from Tannery Effluent

Authors: Satish Babu Rajulapati

Abstract:

The removal of toxic and heavy metal contaminants from wastewater streams and industrial effluents is one of the most important environmental issues being faced world over. In the present study three bacterial cultures tolerating high concentrations of chromium were isolated from the soil and wastewater sample collected from the tanneries located in Warangal, Telangana state. The bacterial species were identified as Bacillus sp., Staphylococcus sp. and pseudomonas sp. Preliminary studies were carried out with the three bacterial species at various operating parameters such as pH and temperature. The results indicate that pseudomonas sp. is the efficient one in the uptake of Cr(VI). Further, detailed investigation of Pseudomonas sp. have been carried out to determine the efficiency of removal of Cr(VI). The various parameters influencing the biosorption of Cr(VI) such as pH, temperature, initial chromium concentration, innoculum size and incubation time have been studied. Response Surface Methodology (RSM) was applied to optimize the removal of Cr(VI). Maximum Cr(VI) removal was found to be 85.72% Cr(VI) atpH 7, temperature 35 °C, initial concentration 67mg/l, inoculums size 9 %(v/v) and time 60 hrs.

Keywords: Staphylococcus sp, chromium, RSM, optimization, Cr(IV)

Procedia PDF Downloads 298
24325 Techno-Economic Optimization and Evaluation of an Integrated Industrial Scale NMC811 Cathode Active Material Manufacturing Process

Authors: Usama Mohamed, Sam Booth, Aliysn J. Nedoma

Abstract:

As part of the transition to electric vehicles, there has been a recent increase in demand for battery manufacturing. Cathodes typically account for approximately 50% of the total lithium-ion battery cell cost and are a pivotal factor in determining the viability of new industrial infrastructure. Cathodes which offer lower costs whilst maintaining or increasing performance, such as nickel-rich layered cathodes, have a significant competitive advantage when scaling up the manufacturing process. This project evaluates the techno-economic value proposition of an integrated industrial scale cathode active material (CAM) production process, closing the mass and energy balances, and optimizing the operation conditions using a sensitivity analysis. This is done by developing a process model of a co-precipitation synthesis route using Aspen Plus software and validated based on experimental data. The mechanism chemistry and equilibrium conditions were established based on previous literature and HSC-Chemistry software. This is then followed by integrating the energy streams, adding waste recovery and treatment processes, as well as testing the effect of key parameters (temperature, pH, reaction time, etc.) on CAM production yield and emissions. Finally, an economic analysis estimating the fixed and variable costs (including capital expenditure, labor costs, raw materials, etc.) to calculate the cost of CAM ($/kg and $/kWh), total plant cost ($) and net present value (NPV). This work sets the foundational blueprint for future research into sustainable industrial scale processes for CAM manufacturing.

Keywords: cathodes, industrial production, nickel-rich layered cathodes, process modelling, techno-economic analysis

Procedia PDF Downloads 85
24324 A Review Paper on Data Mining and Genetic Algorithm

Authors: Sikander Singh Cheema, Jasmeen Kaur

Abstract:

In this paper, the concept of data mining is summarized and its one of the important process i.e KDD is summarized. The data mining based on Genetic Algorithm is researched in and ways to achieve the data mining Genetic Algorithm are surveyed. This paper also conducts a formal review on the area of data mining tasks and genetic algorithm in various fields.

Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining

Procedia PDF Downloads 569
24323 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring

Authors: Seung-Lock Seo

Abstract:

This work presents a data-mining empirical monitoring scheme for industrial processes with partially unbalanced data. Measurement data of good operations are relatively easy to gather, but in unusual special events or faults it is generally difficult to collect process information or almost impossible to analyze some noisy data of industrial processes. At this time some noise filtering techniques can be used to enhance process monitoring performance in a real-time basis. In addition, pre-processing of raw process data is helpful to eliminate unwanted variation of industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated for discrete batch process data. It showed that the monitoring performance was improved significantly in terms of monitoring success rate of given process faults.

Keywords: data mining, process data, monitoring, safety, industrial processes

Procedia PDF Downloads 376
24322 Estimation of Delay Due to Loading–Unloading of Passengers by Buses and Reduction of Number of Lanes at Selected Intersections in Dhaka City

Authors: Sumit Roy, A. Uddin

Abstract:

One of the significant reasons that increase the delay time in the intersections at heterogeneous traffic condition is a sudden reduction of the capacity of the roads. In this study, the delay for this sudden capacity reduction is estimated. Two intersections at Dhaka city were brought in to thestudy, i.e., Kakrail intersection, and SAARC Foara intersection. At Kakrail intersection, the sudden reduction of capacity in the roads is seen at three downstream legs of the intersection, which are because of slowing down or stopping of buses for loading and unloading of passengers. At SAARC Foara intersection, sudden reduction of capacity was seen at two downstream legs. At one leg, it was due to loading and unloading of buses, and at another leg, it was for both loading and unloading of buses and reduction of the number of lanes. With these considerations, the delay due to intentional stoppage or slowing down of buses and reduction of the number of lanes for these two intersections are estimated. Here the delay was calculated by two approaches. The first approach came from the concept of shock waves in traffic streams. Here the delay was calculated by determining the flow, density, and speed before and after the sudden capacity reduction. The second approach came from the deterministic analysis of queues. Here the delay is calculated by determining the volume, capacity and reduced capacity of the road. After determining the delay from these two approaches, the results were compared. For this study, the video of each of the two intersections was recorded for one hour at the evening peak. Necessary geometric data were also taken to determine speed, flow, and density, etc. parameters. The delay was calculated for one hour with one-hour data at both intersections. In case of Kakrail intersection, the per hour delay for Kakrail circle leg was 5.79, and 7.15 minutes, for Shantinagar cross intersection leg they were 13.02 and 15.65 minutes, and for Paltan T intersection leg, they were 3 and 1.3 minutes for 1st and 2nd approaches respectively. In the case of SAARC Foara intersection, the delay at Shahbag leg was only due to intentional stopping or slowing down of busses, which were 3.2 and 3 minutes respectively for both approaches. For the Karwan Bazar leg, the delays for buses by both approaches were 5 and 7.5 minutes respectively, and for reduction of the number of lanes, the delays for both approaches were 2 and 1.78 minutes respectively. Measuring the delay per hour for the Kakrail leg at Kakrail circle, it is seen that, with consideration of the first approach of delay estimation, the intentional stoppage and lowering of speed by buses contribute to 26.24% of total delay at Kakrail circle. If the loading and unloading of buses at intersection is made forbidden near intersection, and any other measures for loading and unloading of passengers are established far enough from the intersections, then the delay at intersections can be reduced at significant scale, and the performance of the intersections can be enhanced.

Keywords: delay, deterministic queue analysis, shock wave, passenger loading-unloading

Procedia PDF Downloads 160
24321 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances of computer science and data analysis are helping to provide continuously huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information-integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, we focus on those developed in the ontology community.

Keywords: biological ontology, linked data, semantic data integration, semantic web

Procedia PDF Downloads 423
24320 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture

Authors: Thrivikraman Aswathi, S. Advaith

Abstract:

As there can be cases where the use of real data is somehow limited, such as when it is hard to get access to a large volume of real data, we need to go for synthetic data generation. This produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data since the MTS is now being gathered more often in various real-world systems. Furthermore, the GAN-generated MTS data is fed into a transformer-based deep learning architecture that carries out the data categorization into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.

Keywords: GAN, transformer, classification, multivariate time series

Procedia PDF Downloads 103
24319 Generative AI: A Comparison of Conditional Tabular Generative Adversarial Networks and Conditional Tabular Generative Adversarial Networks with Gaussian Copula in Generating Synthetic Data with Synthetic Data Vault

Authors: Lakshmi Prayaga, Chandra Prayaga. Aaron Wade, Gopi Shankar Mallu, Harsha Satya Pola

Abstract:

Synthetic data generated by Generative Adversarial Networks and Autoencoders is becoming more common to combat the problem of insufficient data for research purposes. However, generating synthetic data is a tedious task requiring extensive mathematical and programming background. Open-source platforms such as the Synthetic Data Vault (SDV) and Mostly AI have offered a platform that is user-friendly and accessible to non-technical professionals to generate synthetic data to augment existing data for further analysis. The SDV also provides for additions to the generic GAN, such as the Gaussian copula. We present the results from two synthetic data sets (CTGAN data and CTGAN with Gaussian Copula) generated by the SDV and report the findings. The results indicate that the ROC and AUC curves for the data generated by adding the layer of Gaussian copula are much higher than the data generated by the CTGAN.

Keywords: synthetic data generation, generative adversarial networks, conditional tabular GAN, Gaussian copula

Procedia PDF Downloads 48
24318 A Privacy Protection Scheme Supporting Fuzzy Search for NDN Routing Cache Data Name

Authors: Feng Tao, Ma Jing, Guo Xian, Wang Jing

Abstract:

Named Data Networking (NDN) replaces IP address of traditional network with data name, and adopts dynamic cache mechanism. In the existing mechanism, however, only one-to-one search can be achieved because every data has a unique name corresponding to it. There is a certain mapping relationship between data content and data name, so if the data name is intercepted by an adversary, the privacy of the data content and user’s interest can hardly be guaranteed. In order to solve this problem, this paper proposes a one-to-many fuzzy search scheme based on order-preserving encryption to reduce the query overhead by optimizing the caching strategy. In this scheme, we use hash value to ensure the user’s query safe from each node in the process of search, so does the privacy of the requiring data content.

Keywords: NDN, order-preserving encryption, fuzzy search, privacy

Procedia PDF Downloads 462
24317 Channel Characteristics and Morphometry of a Part of Umtrew River, Meghalaya

Authors: Pratyashi Phukan, Ranjan Saikia

Abstract:

Morphometry incorporates quantitative study of the area ,altitude,volume, slope profiles of a land and drainage basin characteristics of the area concerned.Fluvial geomorphology includes the consideration of linear,areal and relief aspects of a fluvially originated drainage basin. The linear aspect deals with the hierarchical orders of streams, numbers, and lenghts of stream segments and various relationship among them.The areal aspect includes the analysis of basin perimeters,basin shape, basin area, and related morphometric laws. The relief aspect incorporates besides hypsometric, climographic and altimetric analysis,the study of absolute and relative reliefs, relief ratios, average slope, etc. In this paper we have analysed the relationship among stream velocity, channel shape,sediment load,channel width,channel depth, etc.

Keywords: morphometry, hydraulic geometry, Umtrew river, Meghalaya

Procedia PDF Downloads 430
24316 In the Study of Co₂ Capacity Performance of Different Frothing Agents through Process Simulation

Authors: Muhammad Idrees, Masroor Abro, Sikandar Almani

Abstract:

Presently, the increasing CO₂ concentration in the atmosphere has been taken as one of the major challenges faced by the modern world. The average CO₂ in the atmosphere reached the highest value of 414.72 ppm in 2021, as reported in a conference of the parties (COP26). This study focuses on (i) the comparative study of MEA, NaOH, Acetic acid, and Na₂CO₃ in terms of their CO₂ capture performance, (ii) the significance of adding various frothing agents achieving improved absorption capacity of Na₂CO₃ and (iii) the overall economic evaluation of process with the help of Aspen Plus. The results obtained suggest that the addition of frothing agents significantly increased the absorption rate of dilute sodium carbonate such that from 45% to 99.9%. The effect of temperature, pressure and flow rate of liquid and flue gas streams on CO₂ absorption capacity was also investigated. It was found that the absorption capacity of Na₂CO₃ decreased with increasing temperature of the liquid stream and decreasing flow rate of the liquid stream and pressure of the gas stream.

Keywords: CO₂, absorbents, frothing agents, process simulation

Procedia PDF Downloads 51
24315 Healthcare Big Data Analytics Using Hadoop

Authors: Chellammal Surianarayanan

Abstract:

Healthcare industry is generating large amounts of data driven by various needs such as record keeping, physician’s prescription, medical imaging, sensor data, Electronic Patient Record(EPR), laboratory, pharmacy, etc. Healthcare data is so big and complex that they cannot be managed by conventional hardware and software. The complexity of healthcare big data arises from large volume of data, the velocity with which the data is accumulated and different varieties such as structured, semi-structured and unstructured nature of data. Despite the complexity of big data, if the trends and patterns that exist within the big data are uncovered and analyzed, higher quality healthcare at lower cost can be provided. Hadoop is an open source software framework for distributed processing of large data sets across clusters of commodity hardware using a simple programming model. The core components of Hadoop include Hadoop Distributed File System which offers way to store large amount of data across multiple machines and MapReduce which offers way to process large data sets with a parallel, distributed algorithm on a cluster. Hadoop ecosystem also includes various other tools such as Hive (a SQL-like query language), Pig (a higher level query language for MapReduce), Hbase(a columnar data store), etc. In this paper an analysis has been done as how healthcare big data can be processed and analyzed using Hadoop ecosystem.

Keywords: big data analytics, Hadoop, healthcare data, towards quality healthcare

Procedia PDF Downloads 385
24314 Diffusion of Social Innovation in Thai Community Enterprises

Authors: Thanisa Sirithaporn

Abstract:

The study aims to examine the diffusion of social innovation among Thai Community Enterprises in conjunction with a singular case study of a medium-sized corporation that has successfully transitioned from a charitable foundation to a sustainable, profitable entity creating value for both shareholders and the communities in which it operates. It seeks to bridge the gap between different streams of aligned research in the fields of diffusion, social innovation, and community enterprises into a more cohesive conceptual framework and thus to better understand the historical and current impediments that have resulted in so many enterprises failing to be sustainable. The methodology is mixed and dual phased. The initial quantitative phase uses a questionnaire as the main research instrument distributed among community enterprises throughout Thailand which will provide the themes for the qualitative phase through semi-structured interviews with key stakeholders at a commercial enterprise actively engaged in social innovation. The findings seek to present a more comprehensive conceptual framework and actionable guidelines to aid community enterprises to develop social innovation in a sustainable manner that creates value to its beneficiaries.

Keywords: diffusion, community enterprises, social innovation, Thailand

Procedia PDF Downloads 115
24313 Data Disorders in Healthcare Organizations: Symptoms, Diagnoses, and Treatments

Authors: Zakieh Piri, Shahla Damanabi, Peyman Rezaii Hachesoo

Abstract:

Introduction: Healthcare organizations like other organizations suffer from a number of disorders such as Business Sponsor Disorder, Business Acceptance Disorder, Cultural/Political Disorder, Data Disorder, etc. As quality in healthcare care mostly depends on the quality of data, we aimed to identify data disorders and its symptoms in two teaching hospitals. Methods: Using a self-constructed questionnaire, we asked 20 questions in related to quality and usability of patient data stored in patient records. Research population consisted of 150 managers, physicians, nurses, medical record staff who were working at the time of study. We also asked their views about the symptoms and treatments for any data disorders they mentioned in the questionnaire. Using qualitative methods we analyzed the answers. Results: After classifying the answers, we found six main data disorders: incomplete data, missed data, late data, blurred data, manipulated data, illegible data. The majority of participants believed in their important roles in treatment of data disorders while others believed in health system problems. Discussion: As clinicians have important roles in producing of data, they can easily identify symptoms and disorders of patient data. Health information managers can also play important roles in early detection of data disorders by proactively monitoring and periodic check-ups of data.

Keywords: data disorders, quality, healthcare, treatment

Procedia PDF Downloads 414
24312 Geomorphologic Evolution of the Southern Habble-Rud River Basin, North of Iran

Authors: Maryam Jaberi, Siavosh Shayan, Mojtaba Yamani

Abstract:

Habble-Rud River basin (HR), up to 100 km length, one of the largest watersheds which drain into deserts to the north of Central Iran (Dasht-e Kavir). This stream is oblique with the NE-SW trending, flow in the southern range of central Alborz Mountains and the northern border of Central Iran. The end of the ~17 km suddenly change direction and with the southern trending to have a morphology which meanders passes through the Alborz Mountain ridge and flows into the Garmsar plain where it forms one of the largest alluvial fans in Iran, i.e. the vast Garmsar alluvial fan with an area of 476 km2. This study was carried out through morphometric analyses, longitudinal river profiles, and study of geomorpholic evidence such as fluvial terraces, gypsum-salt domes, seismic data, and satellite images. This study aimed to investigate the changes in the pattern of rivers in the southern part of the HR river basin. The southern part of HR river basin located at the southern foothills of the Central Alborz is characterized the thrust faults (Sorkheh-Kalut and Garmsar faults), folds,diapirs and arid climate. The activity of more than 10 salt domes that belong to the Oligocene-Miocene period has considerably influenced the pattern of streams in this region. Dissolution of these domes has not only reduced the quality of water and soil resources, but also has led to the formation of badlands and gullies.Our results indicated that the pattern of rivers in the southern part of HR river basin was influenced by discharge of the HR river in Quaternary, geological structure, subsidence of Central Iran and vertical uplift of Alborz mountain. These agents caused the formation meanders in the southern part of the HR River and evaluation of the seasonal rivers like Shoor-Darre and Garmabsar.

Keywords: geomorphologic evaluation, rivers pattern, Habble-Rud River basin, seasonal rivers

Procedia PDF Downloads 486
24311 Big Data and Analytics in Higher Education: An Assessment of Its Status, Relevance and Future in the Republic of the Philippines

Authors: Byron Joseph A. Hallar, Annjeannette Alain D. Galang, Maria Visitacion N. Gumabay

Abstract:

One of the unique challenges provided by the twenty-first century to Philippine higher education is the utilization of Big Data. The higher education system in the Philippines is generating burgeoning amounts of data that contains relevant data that can be used to generate the information and knowledge needed for accurate data-driven decision making. This study examines the status, relevance and future of Big Data and Analytics in Philippine higher education. The insights gained from the study may be relevant to other developing nations similarly situated as the Philippines.

Keywords: big data, data analytics, higher education, republic of the philippines, assessment

Procedia PDF Downloads 316
24310 Data Management and Analytics for Intelligent Grid

Authors: G. Julius P. Roy, Prateek Saxena, Sanjeev Singh

Abstract:

Power distribution utilities two decades ago would collect data from its customers not later than a period of at least one month. The origin of SmartGrid and AMI has subsequently increased the sampling frequency leading to 1000 to 10000 fold increase in data quantity. This increase is notable and this steered to coin the tern Big Data in utilities. Power distribution industry is one of the largest to handle huge and complex data for keeping history and also to turn the data in to significance. Majority of the utilities around the globe are adopting SmartGrid technologies as a mass implementation and are primarily focusing on strategic interdependence and synergies of the big data coming from new information sources like AMI and intelligent SCADA, there is a rising need for new models of data management and resurrected focus on analytics to dissect data into descriptive, predictive and dictatorial subsets. The goal of this paper is to is to bring load disaggregation into smart energy toolkit for commercial usage.

Keywords: data management, analytics, energy data analytics, smart grid, smart utilities

Procedia PDF Downloads 760
24309 Privacy Preserving Data Publishing Based on Sensitivity in Context of Big Data Using Hive

Authors: P. Srinivasa Rao, K. Venkatesh Sharma, G. Sadhya Devi, V. Nagesh

Abstract:

Privacy Preserving Data Publication is the main concern in present days because the data being published through the internet has been increasing day by day. This huge amount of data was named as Big Data by its size. This project deals the privacy preservation in the context of Big Data using a data warehousing solution called hive. We implemented Nearest Similarity Based Clustering (NSB) with Bottom-up generalization to achieve (v,l)-anonymity. (v,l)-Anonymity deals with the sensitivity vulnerabilities and ensures the individual privacy. We also calculate the sensitivity levels by simple comparison method using the index values, by classifying the different levels of sensitivity. The experiments were carried out on the hive environment to verify the efficiency of algorithms with Big Data. This framework also supports the execution of existing algorithms without any changes. The model in the paper outperforms than existing models.

Keywords: sensitivity, sensitive level, clustering, Privacy Preserving Data Publication (PPDP), bottom-up generalization, Big Data

Procedia PDF Downloads 270
24308 Soils Properties of Alfisols in the Nicoya Peninsula, Guanacaste, Costa Rica

Authors: Elena Listo, Miguel Marchamalo

Abstract:

This research studies the soil properties located in the watershed of Jabillo River in the Guanacaste province, Costa Rica. The soils are classified as Alfisols (T. Haplustalfs), in the flatter parts with grazing as Fluventic Haplustalfs or as a consequence of bad drainage as F. Epiaqualfs. The objective of this project is to define the status of the soil, to use remote sensing as a tool for analyzing the evolution of land use and determining the water balance of the watershed in order to improve the efficiency of the water collecting systems. Soil samples were analyzed from trial pits taken from secondary forests, degraded pastures, mature teak plantation, and regrowth -Tectona grandis L. F.- species developed favorably in the area. Furthermore, to complete the study, infiltration measurements were taken with an artificial rainfall simulator, as well as studies of soil compaction with a penetrometer, in points strategically selected from the different land uses. Regarding remote sensing, nearly 40 data samples were collected per plot of land. The source of radiation is reflected sunlight from the beam and the underside of leaves, bare soil, streams, roads and logs, and soil samples. Infiltration reached high levels. The majority of data came from the secondary forest and mature planting due to a high proportion of organic matter, relatively low bulk density, and high hydraulic conductivity. Teak regrowth had a low rate of infiltration because the studies made regarding the soil compaction showed a partial compaction over 50 cm. The secondary forest presented a compaction layer from 15 cm to 30 cm deep, and the degraded pasture, as a result of grazing, in the first 15 cm. In this area, the alfisols soils have high content of iron oxides, a fact that causes a higher reflectivity close to the infrared region of the electromagnetic spectrum (around 700mm), as a result of clay texture. Specifically in the teak plantation where the reflectivity reaches values of 90 %, this is due to the high content of clay in relation to others. In conclusion, the protective function of secondary forests is reaffirmed with regards to erosion and high rate of infiltration. In humid climates and permeable soils, the decrease of runoff is less, however, the percolation increases. The remote sensing indicates that being clay soils, they retain moisture in a better way and it means a low reflectivity despite being fine texture.

Keywords: alfisols, Costa Rica, infiltration, remote sensing

Procedia PDF Downloads 671
24307 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects

Authors: Behnam Tavakkol

Abstract:

Uncertain data mining algorithms use different ways to consider uncertainty in data such as by representing a data object as a sample of points or a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, besides some uncertain fuzzy k-medoids algorithms, not many other fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.

Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data

Procedia PDF Downloads 191
24306 Democracy Bytes: Interrogating the Exploitation of Data Democracy by Radical Terrorist Organizations

Authors: Nirmala Gopal, Sheetal Bhoola, Audecious Mugwagwa

Abstract:

This paper discusses the continued infringement and exploitation of data by non-state actors for destructive purposes, emphasizing radical terrorist organizations. It will discuss how terrorist organizations access and use data to foster their nefarious agendas. It further examines how cybersecurity, designed as a tool to curb data exploitation, is ineffective in raising global citizens' concerns about how their data can be kept safe and used for its acquired purpose. The study interrogates several policies and data protection instruments, such as the Data Protection Act, Cyber Security Policies, Protection of Personal Information(PPI) and General Data Protection Regulations (GDPR), to understand data use and storage in democratic states. The study outcomes point to the fact that international cybersecurity and cybercrime legislation, policies, and conventions have not curbed violations of data access and use by radical terrorist groups. The study recommends ways to enhance cybersecurity and reduce cyber risks using democratic principles.

Keywords: cybersecurity, data exploitation, terrorist organizations, data democracy

Procedia PDF Downloads 178
24305 Healthcare Data Mining Innovations

Authors: Eugenia Jilinguirian

Abstract:

In the healthcare industry, data mining is essential since it transforms the field by collecting useful data from large datasets. Data mining is the process of applying advanced analytical methods to large patient records and medical histories in order to identify patterns, correlations, and trends. Healthcare professionals can improve diagnosis accuracy, uncover hidden linkages, and predict disease outcomes by carefully examining these statistics. Additionally, data mining supports personalized medicine by personalizing treatment according to the unique attributes of each patient. This proactive strategy helps allocate resources more efficiently, enhances patient care, and streamlines operations. However, to effectively apply data mining, however, and ensure the use of private healthcare information, issues like data privacy and security must be carefully considered. Data mining continues to be vital for searching for more effective, efficient, and individualized healthcare solutions as technology evolves.

Keywords: data mining, healthcare, big data, individualised healthcare, healthcare solutions, database

Procedia PDF Downloads 46
24304 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: clustering algorithms, coastal engineering, data mining, data summarization, statistical methods

Procedia PDF Downloads 341
24303 Development of Integrated Solid Waste Management Plan for Industrial Estates of Pakistan

Authors: Mehak Masood

Abstract:

This paper aims to design an integrated solid waste management plan for industrial estates taking Sundar Industrial Estate as case model. The issue of solid waste management is on the rise in Pakistan especially in the industrial sector. In this regard, the concept of development and establishment of industrial estates is gaining popularity nowadays. Without proper solid waste management plan it is very difficult to manage day to day affairs of industrial estates. An industrial estate contains clusters of different types of industrial units. It is necessary to identify different types of solid waste streams from each industrial cluster within the estate. In this study, Sundar Industrial Estate was taken as a case model. Primary and secondary data collection, waste assessment, waste segregation and weighing and field surveys were essential elements of the study. Wastes from each industrial process were identified and quantified. Currently 130 industries are in production but after full colonization of industries this number would reach 385. Elaborated process flow diagrams were made to characterize the recyclable and non-recyclables waste. From the study it was calculated that about 12354.1 kg/captia/day of solid waste is being generated in Sundar Industrial Estate. After the full colonization of the industrial estate, the estimated quantity will be 4756328.5 kg/captia/day. Furthermore, solid waste generated from each industrial sector was estimated. Suggestions for collection and transportation are given. Environment friendly solid waste management practices are suggested. If an effective integrated waste management system is developed and implemented it will conserve resources, create jobs, reduce poverty, conserve natural resources, protect the environment, save collection, transportation and disposal costs and extend the life of disposal sites. A major outcome of this study is an integrated solid waste management plan for the Sundar Industrial Estate which requires immediate implementation.

Keywords: integrated solid waste management plan, industrial estates, Sundar Industrial Estate, Pakistan

Procedia PDF Downloads 471
24302 Case Study on Innovative Aquatic-Based Bioeconomy for Chlorella sorokiniana

Authors: Iryna Atamaniuk, Hannah Boysen, Nils Wieczorek, Natalia Politaeva, Iuliia Bazarnova, Kerstin Kuchta

Abstract:

Over the last decade due to climate change and a strategy of natural resources preservation, the interest for the aquatic biomass has dramatically increased. Along with mitigation of the environmental pressure and connection of waste streams (including CO2 and heat emissions), microalgae bioeconomy can supply food, feed, as well as the pharmaceutical and power industry with number of value-added products. Furthermore, in comparison to conventional biomass, microalgae can be cultivated in wide range of conditions without compromising food and feed production, thus addressing issues associated with negative social and the environmental impacts. This paper presents the state-of-the art technology for microalgae bioeconomy from cultivation process to production of valuable components and by-streams. Microalgae Chlorella sorokiniana were cultivated in the pilot-scale innovation concept in Hamburg (Germany) using different systems such as race way pond (5000 L) and flat panel reactors (8 x 180 L). In order to achieve the optimum growth conditions along with suitable cellular composition for the further extraction of the value-added components, process parameters such as light intensity, temperature and pH are continuously being monitored. On the other hand, metabolic needs in nutrients were provided by addition of micro- and macro-nutrients into a medium to ensure autotrophic growth conditions of microalgae. The cultivation was further followed by downstream process and extraction of lipids, proteins and saccharides. Lipids extraction is conducted in repeated-batch semi-automatic mode using hot extraction method according to Randall. As solvents hexane and ethanol are used at different ratio of 9:1 and 1:9, respectively. Depending on cell disruption method along with solvents ratio, the total lipids content showed significant variations between 8.1% and 13.9 %. The highest percentage of extracted biomass was reached with a sample pretreated with microwave digestion using 90% of hexane and 10% of ethanol as solvents. Proteins content in microalgae was determined by two different methods, namely: Total Kejadahl Nitrogen (TKN), which further was converted to protein content, as well as Bradford method using Brilliant Blue G-250 dye. Obtained results, showed a good correlation between both methods with protein content being in the range of 39.8–47.1%. Characterization of neutral and acid saccharides from microalgae was conducted by phenol-sulfuric acid method at two wavelengths of 480 nm and 490 nm. The average concentration of neutral and acid saccharides under the optimal cultivation conditions was 19.5% and 26.1%, respectively. Subsequently, biomass residues are used as substrate for anaerobic digestion on the laboratory-scale. The methane concentration, which was measured on the daily bases, showed some variations for different samples after extraction steps but was in the range between 48% and 55%. CO2 which is formed during the fermentation process and after the combustion in the Combined Heat and Power unit can potentially be used within the cultivation process as a carbon source for the photoautotrophic synthesis of biomass.

Keywords: bioeconomy, lipids, microalgae, proteins, saccharides

Procedia PDF Downloads 227