Search results for: rainfall intensity-duration-frequency data
24967 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions
Authors: K. Hardy, A. Maurushat
Abstract:
Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate surveys and polls, often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real time. The question is not why people did something over the last 10 years, but why they are doing it now and, if the behaviour is undesirable, how we can have an impact to promote change immediately. Big data analytics relies heavily on government data that has been released into the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome. Keywords: big data, open data, productivity, data governance
Procedia PDF Downloads 371
24966 A Review on Existing Challenges of Data Mining and Future Research Perspectives
Authors: Hema Bhardwaj, D. Srinivasa Rao
Abstract:
Technology for analysing, processing, and extracting meaningful data from enormous and complicated datasets can be termed "big data." The techniques of big data mining and big data analysis are extremely helpful for business activities such as making decisions, building organisational plans, researching the market efficiently, and improving sales, because typical management tools cannot handle such complicated datasets. Big data brings special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations. These unique problems call for new computational and statistical paradigms. This paper offers an overview of the literature on big data mining and its process, along with its problems and difficulties, with a focus on the unique characteristics of big data. Organizations face several difficulties when undertaking data mining, which affects their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is actually analyzed. The study presents the ideas behind the mining and analysis of data and the knowledge discovery techniques that have recently been created with practical application systems. The conclusion also includes a list of issues and difficulties for further research in the area. The paper discusses the main big data and data mining challenges facing management. Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges
Procedia PDF Downloads 110
24965 Integration of Gravity and Seismic Methods in the Geometric Characterization of a Dune Reservoir: Case of the Zouaraa Basin, NW Tunisia
Authors: Marwa Djebbi, Hakim Gabtni
Abstract:
Gravity is a continuously advancing method that has become a mature technology for geological studies. Increasingly, it has been used to complement and constrain traditional seismic data and even used as the only tool to get information of the sub-surface. In fact, in some regions the seismic data, if available, are of poor quality and hard to be interpreted. Such is the case for the current study area. The Nefza zone is part of the Tellian fold and thrust belt domain in the north west of Tunisia. It is essentially made of a pile of allochthonous units resulting from a major Neogene tectonic event. Its tectonic and stratigraphic developments have always been subject of controversies. Considering the geological and hydrogeological importance of this area, a detailed interdisciplinary study has been conducted integrating geology, seismic and gravity techniques. The interpretation of Gravity data allowed the delimitation of the dune reservoir and the identification of the regional lineaments contouring the area. It revealed the presence of three gravity lows that correspond to the dune of Zouara and Ouchtata separated along with a positive gravity axis espousing the Ain Allega_Aroub Er Roumane axe. The Bouguer gravity map illustrated the compartmentalization of the Zouara dune into two depressions separated by a NW-SE anomaly trend. This constitution was confirmed by the vertical derivative map which showed the individualization of two depressions with slightly different anomaly values. The horizontal gravity gradient magnitude was performed in order to determine the different geological features present in the studied area. The latest indicated the presence of NE-SW parallel folds according to the major Atlasic direction. Also, NW-SE and EW trends were identified. The maxima tracing confirmed this direction by the presence of NE-SW faults, mainly the Ghardimaou_Cap Serrat accident. 
The poor quality of the available seismic sections and the absence of borehole data in the region, apart from a few hydraulic wells that have been drilled and that show the heterogeneity of the substratum of the dune, required gravity modeling of this challenging area in order to characterize the geometry of the dune reservoir and to determine the different stratigraphic series underneath these deposits. For more detailed and accurate results, the scale of study will be reduced in coming research. A more precise method will be developed: the 4D microgravity survey. This approach is considered an extension of the gravity method, and its fourth dimension is time. It will allow continuous and repeated monitoring of fluid movement in the subsurface at the microgal (μGal) scale. The gravity effect is a result of the monthly variation of the dynamic groundwater level, which correlates with rainfall during different periods. Keywords: 3D gravity modeling, dune reservoir, heterogeneous substratum, seismic interpretation
Procedia PDF Downloads 298
24964 A Systematic Review on Challenges in Big Data Environment
Authors: Rimmy Yadav, Anmol Preet Kaur
Abstract:
Big Data has demonstrated vast potential for streamlining operations, supporting decisions, and spotting business trends in fields such as manufacturing, finance, and information technology. This paper gives a multi-disciplinary overview of the research issues in big data and of its techniques, tools, and systems related to privacy, data storage management, network and energy utilization, fault tolerance, and data representation. In addition, the resulting challenges and opportunities of the Big Data platform are discussed. Keywords: big data, privacy, data management, network and energy consumption
Procedia PDF Downloads 312
24963 Survey on Big Data Stream Classification by Decision Tree
Authors: Mansoureh Ghiasabadi Farahani, Samira Kalantary, Sara Taghi-Pour, Mahboubeh Shamsi
Abstract:
Nowadays, the development of computer technology and its recent applications provide access to new types of data that have not been considered by traditional data analysts. Two particularly interesting characteristics of such data sets are their huge size and their streaming nature. Incremental learning techniques have been used extensively to address the data stream classification problem. This paper presents a concise survey of the obstacles and requirements involved in classifying data streams using decision trees. The most important issue is to maintain a balance between accuracy and efficiency: the algorithm should provide good classification performance with a reasonable response time. Keywords: big data, data streams, classification, decision tree
Procedia PDF Downloads 521
24962 Robust and Dedicated Hybrid Cloud Approach for Secure Authorized Deduplication
Authors: Aishwarya Shekhar, Himanshu Sharma
Abstract:
Data deduplication is one of the important data compression techniques for eliminating duplicate copies of repeating data, and it has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. In this process, duplicate data is expunged, leaving only one copy, that is, a single instance of the data, to be stored. An index of each and every data item is nevertheless maintained. Data deduplication is an approach for minimizing the storage space an organization requires to retain its data. In most companies, the storage systems carry identical copies of numerous pieces of data. Deduplication eliminates these additional copies by saving just one copy of the data and replacing the other copies with pointers that lead back to the primary copy. To avoid this duplication of data and to preserve confidentiality in the cloud, we apply the concept of the hybrid cloud. A hybrid cloud is a fusion of at least one public and one private cloud. As a proof of concept, we implement Java code that provides security and removes all types of duplicated data from the cloud. Keywords: confidentiality, deduplication, data compression, hybridity of cloud
Procedia PDF Downloads 383
24961 Multi-Indicator Evaluation of Agricultural Drought Trends in Ethiopia: Implications for Dry Land Agriculture and Food Security
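The pointer-based idea described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the authors' Java, it is not their implementation, and it leaves out the security and hybrid-cloud parts entirely. The class name `DedupStore`, its methods, and the choice of SHA-256 as the content fingerprint are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy single-instance store: one copy per distinct content (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # content digest -> the single stored copy
        self.index = {}   # item name -> digest (the "pointer" to the copy)

    def put(self, name: str, data: bytes) -> bool:
        """Store data under name; return True only if new content was written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = data
        # The index entry is always kept, even for duplicate content.
        self.index[name] = digest
        return is_new

    def get(self, name: str) -> bytes:
        """Follow the pointer from the name back to the stored copy."""
        return self.blocks[self.index[name]]


store = DedupStore()
store.put("report-v1.txt", b"quarterly figures")
store.put("report-copy.txt", b"quarterly figures")  # duplicate content
print(len(store.index), "names,", len(store.blocks), "stored copy")
```

Both names remain retrievable, but only one copy of the bytes is held, which mirrors the statement that the index of every item is maintained while duplicates are replaced by pointers.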
Authors: Dawd Ahmed, Venkatesh Uddameri
Abstract:
Agriculture in Ethiopia is the main economic sector influenced by agricultural drought. A simultaneous assessment of drought trends using multiple drought indicators is useful for drought planning and management. Intra-season and seasonal drought trends in Ethiopia were studied using a suite of drought indicators. Standardized Precipitation Index (SPI), Standardized Precipitation Evapotranspiration Index (SPEI), Palmer Drought Severity Index (PDSI), and Z-index for long-rainy, dry, and short-rainy seasons are used to identify drought-causing mechanisms. The Statistical software package R version 3.5.2 was used for data extraction and data analyses. Trend analysis indicated shifts in late-season long-rainy season precipitation into dry in the southwest and south-central portions of Ethiopia. Droughts during the dry season (October–January) were largely temperature controlled. Short-term temperature-controlled hydrologic processes exacerbated rainfall deficits during the short rainy season (February–May) and highlight the importance of temperature- and hydrology-induced soil dryness on the production of short-season crops such as tef. Droughts during the long-rainy season (June–September) were largely driven by precipitation declines arising from the narrowing of the intertropical convergence zone (ITCZ). Increased dryness during long-rainy season had severe consequences on the production of corn and sorghum. PDSI was an aggressive indicator of seasonal droughts suggesting the low natural resilience to combat the effects of slow-acting, moisture-depleting hydrologic processes. The lack of irrigation systems in the nation limits the ability to combat droughts and improve agricultural resilience. 
There is an urgent need to monitor soil moisture (a key agro-hydrologic variable) to better quantify the impacts of meteorological droughts on agricultural systems in Ethiopia. Keywords: autocorrelation, climate change, droughts, Ethiopia, food security, Palmer Z-index, PDSI, SPEI, SPI, trend analysis
Procedia PDF Downloads 141
24960 A Review of Machine Learning for Big Data
Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.
Abstract:
Big data are now rapidly expanding in all science and engineering domains and in many others. The potential of massive data is undoubtedly significant, but making sense of it requires new ways of thinking and new learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. This paper reviews the latest advances in research on machine learning for big data processing. First, it surveys the machine learning techniques in recent studies, such as deep learning, representation learning, transfer learning, active learning, and distributed and parallel learning. It then focuses on the challenges and possible solutions of machine learning for big data. Keywords: active learning, big data, deep learning, machine learning
Procedia PDF Downloads 446
24959 Water Monitoring Sentinel Cloud Platform: Water Monitoring Platform Based on Satellite Imagery and Modeling Data
Authors: Alberto Azevedo, Ricardo Martins, André B. Fortunato, Anabela Oliveira
Abstract:
Water is under severe threat today because of the rising population, increased agricultural and industrial needs, and the intensifying effects of climate change. Due to sea-level rise, erosion, and demographic pressure, the coastal regions are of significant concern to the scientific community. The Water Monitoring Sentinel Cloud platform (WORSICA) service is focused on providing new tools for monitoring water in coastal and inland areas, taking advantage of remote sensing, in situ and tidal modeling data. WORSICA is a service that can be used to determine the coastline, coastal inundation areas, and the limits of inland water bodies using remote sensing (satellite and Unmanned Aerial Vehicles - UAVs) and in situ data (from field surveys). It applies to various purposes, from determining flooded areas (from rainfall, storms, hurricanes, or tsunamis) to detecting large water leaks in major water distribution networks. This service was built on components developed in national and European projects, integrated to provide a one-stop-shop service for remote sensing information, integrating data from the Copernicus satellite and drone/unmanned aerial vehicles, validated by existing online in-situ data. Since WORSICA is operational using the European Open Science Cloud (EOSC) computational infrastructures, the service can be accessed via a web browser and is freely available to all European public research groups without additional costs. In addition, the private sector will be able to use the service, but some usage costs may be applied, depending on the type of computational resources needed by each application/user. 
Although the service has three main sub-services, i) coastline detection, ii) inland water detection, and iii) water leak detection in irrigation networks, the present study shows an application of the service to the Óbidos lagoon in Portugal, where the user can monitor the evolution of the lagoon inlet and estimate the topography of the intertidal areas without any additional costs. The service implements several distinct methodologies based on the computation of water indexes (e.g., NDWI, MNDWI, AWEI, and AWEIsh) retrieved from satellite image processing. In conjunction with the tidal data obtained from the FES model, the system can estimate a coastline with the corresponding level, or even the topography of the intertidal areas, based on the Flood2Topo methodology. The outcomes of the WORSICA service can be helpful for several intervention areas, such as i) emergency response, by providing fast access to inundated areas to support emergency rescue operations; ii) support of management decisions on the operation of hydraulic infrastructures to minimize damage downstream; iii) climate change mitigation, by minimizing water losses and reducing water mains operation costs; iv) early detection of water leakages in difficult-to-access water irrigation networks, promoting their fast repair. Keywords: remote sensing, coastline detection, water detection, satellite data, sentinel, Copernicus, EOSC
Procedia PDF Downloads 126
24958 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights
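As a rough illustration of the water indexes named in this abstract, the sketch below computes NDWI and MNDWI from per-pixel reflectances using their usual normalized-difference definitions (green vs. near-infrared, and green vs. shortwave-infrared). It is not WORSICA code. The function names, the nested-list image layout, and the threshold of 0 for flagging water are assumptions for illustration; operational thresholds are normally tuned per scene.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0.0 when both bands are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index from green and NIR reflectance."""
    return normalized_difference(green, nir)


def mndwi(green: float, swir: float) -> float:
    """Modified NDWI, which uses a shortwave-infrared band instead of NIR."""
    return normalized_difference(green, swir)


def water_mask(green_band, nir_band, threshold=0.0):
    """Flag pixels whose NDWI exceeds the (assumed) threshold as water."""
    return [
        [ndwi(g, n) > threshold for g, n in zip(green_row, nir_row)]
        for green_row, nir_row in zip(green_band, nir_band)
    ]


# Two-pixel toy scene: the first pixel is water-like, the second vegetation-like.
green = [[0.30, 0.10]]
nir = [[0.10, 0.40]]
print(water_mask(green, nir))
```

A coastline estimate would then follow the boundary of such a mask; that step, and the tidal correction from the FES model, are outside this sketch.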
Authors: Tomy Prihananto, Damar Apri Sudarmadi
Abstract:
Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management in Indonesia. It is descriptive research with a qualitative approach, collecting data from a literature study. The result is a comprehensive arrangement of data that has been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and on the protection of personal data are mutually reinforcing in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights. Keywords: Indonesia, protection, personal data, privacy, human rights, encryption
Procedia PDF Downloads 183
24957 Flow Duration Curve Method to Evaluate Environmental Flow: Case Study of Gharasou River, Ardabil, Iran
Authors: Mehdi Fuladipanah, Mehdi Jorabloo
Abstract:
Water flow management is one of the most important parts of river engineering. Non-uniform distribution of rainfall and varying flow demands, combined with unreasonable flow management, will destroy the river ecosystem. It is therefore essential to determine the ecosystem flow requirement. In this paper, the hydrologically based flow duration curve indices method was used to evaluate environmental flow in the Gharasou River, Ardabil, Iran. Using the flow duration curve, Q90 and Q95 for different return periods were calculated. Their magnitudes were determined for 1-day, 3-day, 7-day, and 30-day durations. According to the second method, hydraulic alteration indices were mostly in the low and medium range. In order to maintain the river at an acceptable ecological condition, the minimum daily discharge from index Q95 is 0.7 m³/s. Keywords: Ardabil, environmental flow, flow duration curve, Gharasou river
Procedia PDF Downloads 683
24956 The Effect of the Rain Intensity on the Hydrodynamic Behavior of the Low-Floor ChéLiffe
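The flow duration curve indices mentioned here can be sketched as follows. Q90 and Q95 are the discharges equalled or exceeded 90% and 95% of the time. The abstract does not say how the authors built their curve, so the Weibull plotting position (rank / (n + 1)), the linear interpolation, and the function names below are assumptions for illustration only.

```python
def moving_average(flows, window):
    """n-day mean flows (e.g. window=7 for 7-day averages)."""
    return [
        sum(flows[i:i + window]) / window
        for i in range(len(flows) - window + 1)
    ]


def exceedance_flow(flows, percent):
    """Flow equalled or exceeded `percent` % of the time (95 gives Q95)."""
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    # Assumed Weibull plotting position: rank i (1-based) -> 100 * i / (n + 1).
    probs = [100.0 * (i + 1) / (n + 1) for i in range(n)]
    if percent <= probs[0]:
        return ordered[0]
    if percent >= probs[-1]:
        return ordered[-1]
    for k in range(1, n):
        if probs[k] >= percent:
            # Linear interpolation between the two neighbouring ranked flows.
            frac = (percent - probs[k - 1]) / (probs[k] - probs[k - 1])
            return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])


# Synthetic daily discharges in m3/s (illustrative values, not study data).
daily = [float(q) for q in range(1, 100)]
weekly = moving_average(daily, 7)
print(exceedance_flow(daily, 90), exceedance_flow(daily, 95))
print(exceedance_flow(weekly, 95))
```

Applying `exceedance_flow` to the 1-, 3-, 7-, and 30-day averaged series gives the family of low-flow indices the abstract refers to; Q95 is never larger than Q90 for the same series.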
Authors: Ahmed Abbas
Abstract:
Land degradation in the Lower Cheliff region leads to loss of soil fertility and of physical and chemical properties through secondary salinization and the formation of a surface film or surface crust. The main factor related to runoff and soil erosion is the soils' susceptibility to crusting caused by the impact of raindrops, which reduces the filterability of the soil. The present study aims to investigate the hydrodynamic behavior of five types of soil taken from the Lower Cheliff plain under simulated rainfall using two intensities, one moderate and the other corresponding to heavy rain at low kinetic energies. Experimental results demonstrate the influence of the chemical, mechanical, and physical properties of the soils on their hydrodynamic behavior, and the influence of heavy rain on the manner in which filterability is reduced and on the amount of transported sediment. Keywords: erosion, hydrodynamic behavior, rain simulation, soil
Procedia PDF Downloads 287
24955 The Various Legal Dimensions of Genomic Data
Authors: Amy Gooden
Abstract:
When human genomic data is considered, this is often done through only one dimension of the law, or the interplay between the various dimensions is not considered, thus providing an incomplete picture of the legal framework. This research considers and analyzes the various dimensions in South African law applicable to genomic sequence data, including property rights, personality rights, and intellectual property rights. The effective use of personal genomic sequence data requires the acknowledgement and harmonization of the rights applicable to such data. Keywords: artificial intelligence, data, law, genomics, rights
Procedia PDF Downloads 138
24954 Bayesian Locally Approach for Spatial Modeling of Visceral Leishmaniasis Infection in Northern and Central Tunisia
Authors: Kais Ben-Ahmed, Mhamed Ali-El-Aroui
Abstract:
This paper develops a Local Generalized Linear Spatial Model (LGLSM) to describe the spatial variation of Visceral Leishmaniasis (VL) infection risk in northern and central Tunisia. The response from each region is the number of affected children less than five years of age, recorded from 1996 through 2006 by Tunisian pediatric departments and treated as Poisson county-level data. The model includes climatic factors, namely averages of annual rainfall and extreme values of low temperatures in winter and high temperatures in summer, to characterize the climate of each region according to each continentality index; the pluviometric quotient of Emberger (Q2), to characterize bioclimatic regions; and a component for residual extra-Poisson variation. The statistical results show a progressive increase in the number of affected children in regions with a high continentality index and low mean yearly rainfall. On the other hand, an increase in the pluviometric quotient of Emberger contributed to a significant increase in the VL incidence rate. Compared with the original GLSM, Bayesian local modeling is an improvement and gives a better approximation of the Tunisian VL risk estimation. For Bayesian inference, we use vague priors for all model parameters and the Markov Chain Monte Carlo method. Keywords: generalized linear spatial model, local model, extra-Poisson variation, continentality index, visceral leishmaniasis, Tunisia
Procedia PDF Downloads 397
24953 Big Brain: A Single Database System for a Federated Data Warehouse Architecture
Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf
Abstract:
Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and explain why a custom federated approach was the recommendation taken forward to implementation. Although the data warehouses are logically federated, the implementation uses a single database system, which presented many advantages, such as cost reduction and improved data access for global users, giving consumers of the data a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place. Keywords: data integration, data warehousing, federated architecture, Online Analytical Processing (OLAP)
Procedia PDF Downloads 236
24952 Vertical Structure and Frequencies of Deep Convection during Active Periods of the West African Monsoon Season
Authors: Balogun R. Ayodeji, Adefisan E. Adesanya, Adeyewa Z. Debo, E. C. Okogbue
Abstract:
Deep convective systems during active periods of the West African monsoon season have not been properly investigated at high temporal and spatial resolution in West Africa. Deep convective systems are investigated over seven climatic zones of the West African sub-region: west-coast rainforest, dry rainforest, Nigeria-Cameroon rainforest, Nigeria savannah, Central African and South Sudan (CASS) savannah, Sudano-Sahel, and Sahel, using data from the Tropical Rainfall Measuring Mission (TRMM) Precipitation Feature (PF) database. The vertical structure of the convective systems, indicated by the presence of at least one 40 dBZ echo reaching at least 1 km in the atmosphere, showed a strong core (highest frequency (%)) of reflectivity values around 2 km, which is below the freezing level (4-5 km), for all the zones. Echoes are detected above 15 km altitude much more frequently in the rainforest and savannah zones than in the Sudano and Sahel zones during active periods in March-May (MAM), whereas during active periods in June-September (JJAS) convection in the savannah, Sudano, and Sahel zones tends to reach higher altitudes more frequently than in the rainforest zones. The percentage frequencies of deep convection indicated that the occurrences of the systems are within the range of 2.3-2.8% during both March-May (MAM) and June-September (JJAS) active periods in the rainforest and savannah zones. In contrast, the percentage frequencies were found to be less than 2% in the Sudano and Sahel zones, except during the active JJAS period in the Sudano zone. Keywords: active periods, convective system, frequency, reflectivity
Procedia PDF Downloads 152
24951 Analyzing the Impact of Spatio-Temporal Climate Variations on the Rice Crop Calendar in Pakistan
Authors: Muhammad Imran, Iqra Basit, Mobushir Riaz Khan, Sajid Rasheed Ahmad
Abstract:
The present study investigates the space-time impact of climate change on the rice crop calendar in tropical Gujranwala, Pakistan. The climate change impact was quantified through the climatic variables, whereas the existing calendar of the rice crop was compared with the phenological stages of the crop, depicted through the time series of the Normalized Difference Vegetation Index (NDVI) derived from Landsat data for the decade 2005-2015. Local maxima were applied to the NDVI time series to compute the rice phenological stages. Panel models with fixed and cross-section fixed effects were used to establish the relation between the climatic parameters and the NDVI time series across villages and across rice growing periods. Results show that the climatic parameters have a significant impact on the rice crop calendar. Moreover, the fixed effect model is a significant improvement over cross-sectional fixed effect models (R-squared equal to 0.673 vs. 0.0338). We conclude that high inter-annual variability of climatic variables causes high variability of NDVI and, thus, a shift in the rice crop calendar. Moreover, inter-annual (temporal) variability of the rice crop calendar is high compared to the inter-village (spatial) variability. We suggest that local rice farmers adapt to this change in the rice crop calendar. Keywords: Landsat NDVI, panel models, temperature, rainfall
Procedia PDF Downloads 205
24950 A Review Paper on Data Mining and Genetic Algorithm
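The two computational steps named in this abstract, NDVI and local maxima on its time series, can be sketched as below. NDVI uses its standard definition from near-infrared and red reflectance. The abstract does not describe the authors' peak-detection rule, so the strict three-point comparison and the function names here are assumptions for illustration.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index; 0.0 when both bands are zero."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def local_maxima(series):
    """Indices whose value is strictly greater than both neighbours."""
    return [
        i
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] > series[i + 1]
    ]


# Illustrative seasonal NDVI values for one village (not study data).
season = [0.2, 0.5, 0.3, 0.6, 0.8, 0.4]
peaks = local_maxima(season)
print(peaks)
```

Comparing the dates of such peaks from year to year is one simple way to see a shift in the crop calendar; real Landsat series would usually need smoothing and cloud filtering first.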
Authors: Sikander Singh Cheema, Jasmeen Kaur
Abstract:
In this paper, the concept of data mining is summarized, together with one of its important processes, knowledge discovery in databases (KDD). Data mining based on genetic algorithms is examined, and ways to implement genetic-algorithm-based data mining are surveyed. This paper also conducts a formal review of data mining tasks and genetic algorithms in various fields. Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining
Procedia PDF Downloads 591
24949 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring
Authors: Seung-Lock Seo
Abstract:
This work presents a data-mining empirical monitoring scheme for industrial processes with partially unbalanced data. Measurement data of good operations are relatively easy to gather, but during unusual special events or faults it is generally difficult to collect process information, or almost impossible to analyze some noisy data of industrial processes. In such cases, noise filtering techniques can be used to enhance process monitoring performance on a real-time basis. In addition, pre-processing of raw process data helps to eliminate unwanted variation in industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated for discrete batch process data. The results showed that monitoring performance improved significantly in terms of the monitoring success rate for the given process faults. Keywords: data mining, process data, monitoring, safety, industrial processes
Procedia PDF Downloads 401
24948 The Development of a Precision Irrigation System for Durian
Authors: Chatrabhuti Pipop, Visessri Supattra, Charinpanitkul Tawatchai
Abstract:
Durian is one of the top agricultural products exported by Thailand. There is the massive market potential for the durian industry. While the global demand for Thai durians, especially the demand from China, is very high, Thailand's durian supply is far from satisfying strong demand. Poor agricultural practices result in low yields and poor quality of fruit. Most irrigation systems currently used by the farmers are fixed schedule or fixed rates that ignore actual weather conditions and crop water requirements. In addition, the technologies emerging are too difficult and complex and prices are too high for the farmers to adopt and afford. Many farmers leave the durian trees to grow naturally. With improper irrigation and nutrient management system, durians are vulnerable to a variety of issues, including stunted growth, not flowering, diseases, and death. Technical development or research for durian is much needed to support the wellbeing of the farmers and the economic development of the country. However, there are a limited number of studies or development projects for durian because durian is a perennial crop requiring a long time to obtain the results to report. This study, therefore, aims to address the problem of durian production by developing an autonomous and precision irrigation system. The system is designed and equipped with an industrial programmable controller, a weather station, and a digital flow meter. Daily water requirements are computed based on weather data such as rainfall and evapotranspiration for daily irrigation with variable flow rates. A prediction model is also designed as a part of the system to enhance the irrigation schedule. Before the system was installed in the field, a simulation model was built and tested in a laboratory setting to ensure its accuracy. Water consumption was measured daily before and after the experiment for further analysis. 
With this system, the crop water requirement is precisely estimated and optimized based on the data from the weather station. Durian will be irrigated at the right amount and at the right time, offering the opportunity for higher yield and higher income to the farmers. Keywords: Durian, precision irrigation, precision agriculture, smart farm
Procedia PDF Downloads 118
24947 Predicting the Adsorptive Capacities of Biosolid as a Barrier in Soil to Remove Industrial Contaminants
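The abstract states that daily water requirements are computed from rainfall and evapotranspiration but gives no formula. One common way to do this is a simple water balance in which crop evapotranspiration (reference evapotranspiration times a crop coefficient) is offset by effective rainfall. The sketch below follows that general idea only; the function names, the default crop coefficient, and the rainfall efficiency factor are hypothetical and are not taken from the study.

```python
def crop_evapotranspiration(et0_mm: float, kc: float) -> float:
    """Crop water use in mm/day: reference ET scaled by a crop coefficient."""
    return kc * et0_mm


def net_irrigation_mm(et0_mm, rain_mm, kc=1.0, rain_efficiency=1.0):
    """Irrigation depth still needed after rainfall; never negative.

    kc and rain_efficiency are placeholder values, not durian-specific ones.
    """
    effective_rain = rain_efficiency * rain_mm
    return max(0.0, crop_evapotranspiration(et0_mm, kc) - effective_rain)


def irrigation_litres(depth_mm: float, wetted_area_m2: float) -> float:
    """Convert a depth to a volume: 1 mm over 1 m2 equals 1 litre."""
    return depth_mm * wetted_area_m2


# One dry day and one rainy day for a tree with a 50 m2 wetted area.
for et0, rain in [(5.0, 2.0), (5.0, 10.0)]:
    depth = net_irrigation_mm(et0, rain)
    print(depth, "mm ->", irrigation_litres(depth, 50.0), "L")
```

A controller could then divide the daily volume by the measured flow rate to obtain a valve-open time, which is where the flow meter in the described system would come in.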
Authors: H. Aguedal, H. Hentit, A. Aziz, D. R. Merouani, A. Iddou
Abstract:
The major environmental risk of soil pollution is the contamination of groundwater by infiltration of organic and inorganic pollutants, which can cause serious pollution. To protect the groundwater, in this study we tested the reliability of a biosolid as a barrier to prevent the migration of a very dangerous pollutant, cadmium, through the different soil layers. Following the influence of several parameters, such as turbidity, pluviometry, initial concentration of cadmium, and the nature of the soil, allowed us to find the most effective manner to integrate this barrier in the soil. From the results obtained, we noted the effective intervention of the barrier. Indeed, the recorded passing quantities are lowest for the highest rainfall; the barrier has a better affinity towards higher concentrations; and the largest retained amounts of cadmium were in the top layer of the two types of soil, while the lowest amounts of cadmium were recorded in the inner layers of the soils. Keywords: adsorption of cadmium, barrier, groundwater pollution, protection
Procedia PDF Downloads 364
24946 A Survey of Semantic Integration Approaches in Bioinformatics
Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir
Abstract:
Technological advances in computer science and data analysis are helping to provide continuously growing volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous, and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, focusing on those developed in the ontology community. Keywords: biological ontology, linked data, semantic data integration, semantic web
Procedia PDF Downloads 449
24945 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture
Authors: Thrivikraman Aswathi, S. Advaith
Abstract:
When the use of real data is limited, for example when a large volume of real data is hard to access, synthetic data generation is needed. It produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data, since MTS data is now gathered more often in various real-world systems. The GAN-generated MTS data is then fed into a transformer-based deep learning architecture that classifies the data into predefined classes. The model is further evaluated across various distinct domains by generating corresponding MTS data.
Keywords: GAN, transformer, classification, multivariate time series
Procedia PDF Downloads 130
24944 Generative AI: A Comparison of Conditional Tabular Generative Adversarial Networks and Conditional Tabular Generative Adversarial Networks with Gaussian Copula in Generating Synthetic Data with Synthetic Data Vault
Authors: Lakshmi Prayaga, Chandra Prayaga, Aaron Wade, Gopi Shankar Mallu, Harsha Satya Pola
Abstract:
Synthetic data generated by Generative Adversarial Networks and Autoencoders is becoming more common as a way to combat the problem of insufficient data for research purposes. However, generating synthetic data is a tedious task requiring an extensive mathematical and programming background. Open-source platforms such as the Synthetic Data Vault (SDV) and Mostly AI offer a user-friendly way for non-technical professionals to generate synthetic data that augments existing data for further analysis. The SDV also provides additions to the generic GAN, such as the Gaussian copula. We present the results from two synthetic data sets (CTGAN data and CTGAN with Gaussian Copula) generated by the SDV and report the findings. The results indicate that the ROC curves and AUC values for the data generated with the added Gaussian copula layer are much higher than those for the data generated by the CTGAN.
Keywords: synthetic data generation, generative adversarial networks, conditional tabular GAN, Gaussian copula
Procedia PDF Downloads 82
24943 A Privacy Protection Scheme Supporting Fuzzy Search for NDN Routing Cache Data Name
Authors: Feng Tao, Ma Jing, Guo Xian, Wang Jing
Abstract:
Named Data Networking (NDN) replaces the IP address of a traditional network with a data name and adopts a dynamic cache mechanism. In the existing mechanism, however, only one-to-one search can be achieved, because every piece of data has a unique name corresponding to it. There is a certain mapping relationship between data content and data name, so if the data name is intercepted by an adversary, the privacy of the data content and of the user’s interest can hardly be guaranteed. To solve this problem, this paper proposes a one-to-many fuzzy search scheme based on order-preserving encryption that reduces the query overhead by optimizing the caching strategy. In this scheme, we use hash values to keep the user’s query safe from each node during the search, and likewise to protect the privacy of the requested data content.
Keywords: NDN, order-preserving encryption, fuzzy search, privacy
Procedia PDF Downloads 485
24942 Healthcare Big Data Analytics Using Hadoop
Authors: Chellammal Surianarayanan
Abstract:
The healthcare industry generates large amounts of data driven by various needs such as record keeping, physicians’ prescriptions, medical imaging, sensor data, Electronic Patient Records (EPR), laboratory, pharmacy, etc. Healthcare data is so big and complex that it cannot be managed by conventional hardware and software. The complexity of healthcare big data arises from the large volume of data, the velocity with which the data accumulates, and its variety: structured, semi-structured, and unstructured. Despite this complexity, if the trends and patterns that exist within the big data are uncovered and analyzed, higher-quality healthcare can be provided at lower cost. Hadoop is an open source software framework for distributed processing of large data sets across clusters of commodity hardware using a simple programming model. The core components of Hadoop include the Hadoop Distributed File System, which offers a way to store large amounts of data across multiple machines, and MapReduce, which offers a way to process large data sets with a parallel, distributed algorithm on a cluster. The Hadoop ecosystem also includes various other tools such as Hive (a SQL-like query language), Pig (a higher-level query language for MapReduce), HBase (a columnar data store), etc. This paper analyzes how healthcare big data can be processed and analyzed using the Hadoop ecosystem.
Keywords: big data analytics, Hadoop, healthcare data, towards quality healthcare
Procedia PDF Downloads 413
24941 Data Disorders in Healthcare Organizations: Symptoms, Diagnoses, and Treatments
Authors: Zakieh Piri, Shahla Damanabi, Peyman Rezaii Hachesoo
Abstract:
Introduction: Healthcare organizations, like other organizations, suffer from a number of disorders such as Business Sponsor Disorder, Business Acceptance Disorder, Cultural/Political Disorder, Data Disorder, etc. As quality in healthcare mostly depends on the quality of data, we aimed to identify data disorders and their symptoms in two teaching hospitals. Methods: Using a self-constructed questionnaire, we asked 20 questions related to the quality and usability of patient data stored in patient records. The research population consisted of 150 managers, physicians, nurses, and medical record staff who were working at the time of the study. We also asked for their views about the symptoms and treatments for any data disorders they mentioned in the questionnaire. We analyzed the answers using qualitative methods. Results: After classifying the answers, we found six main data disorders: incomplete data, missed data, late data, blurred data, manipulated data, and illegible data. The majority of participants believed they had important roles in the treatment of data disorders, while others attributed the disorders to health system problems. Discussion: As clinicians have important roles in producing data, they can easily identify symptoms and disorders of patient data. Health information managers can also play important roles in the early detection of data disorders through proactive monitoring and periodic check-ups of data.
Keywords: data disorders, quality, healthcare, treatment
Procedia PDF Downloads 433
24940 Big Data and Analytics in Higher Education: An Assessment of Its Status, Relevance and Future in the Republic of the Philippines
Authors: Byron Joseph A. Hallar, Annjeannette Alain D. Galang, Maria Visitacion N. Gumabay
Abstract:
One of the unique challenges that the twenty-first century poses to Philippine higher education is the utilization of Big Data. The higher education system in the Philippines is generating burgeoning amounts of data that can be used to produce the information and knowledge needed for accurate data-driven decision making. This study examines the status, relevance, and future of Big Data and Analytics in Philippine higher education. The insights gained from the study may be relevant to other developing nations similarly situated to the Philippines.
Keywords: big data, data analytics, higher education, republic of the philippines, assessment
Procedia PDF Downloads 348
24939 Erosion Modeling of Surface Water Systems for Long Term Simulations
Authors: Devika Nair, Sean Bellairs, Ken Evans
Abstract:
Flow and erosion modeling provides an avenue for simulating the fine suspended sediment in surface water systems like streams and creeks. Fine suspended sediment is highly mobile, and many contaminants that may have been released by any sort of catchment disturbance attach themselves to these sediments. Therefore, a knowledge of fine suspended sediment transport is important in assessing contaminant transport. The CAESAR-Lisflood Landform Evolution Model, which includes a hydrologic model (TOPMODEL) and a hydraulic model (Lisflood), is being used to assess the sediment movement in tropical streams on account of a disturbance in the catchment of the creek and to determine the dynamics of sediment quantity in the creek through the years by simulating the model for future years. The accuracy of future simulations depends on the calibration and validation of the model to the past and present events. Calibration and validation of the model involve finding a combination of parameters of the model, which, when applied and simulated, gives model outputs similar to those observed for the real site scenario for corresponding input data. Calibrating the sediment output of the CAESAR-Lisflood model at the catchment level and using it for studying the equilibrium conditions of the landform is an area yet to be explored. Therefore, the aim of the study was to calibrate the CAESAR-Lisflood model and then validate it so that it could be run for future simulations to study how the landform evolves over time. To achieve this, the model was run for a rainfall event with a set of parameters, plus discharge and sediment data for the input point of the catchment, to analyze how similar the model output would behave when compared with the discharge and sediment data for the output point of the catchment. The model parameters were then adjusted until the model closely approximated the real site values of the catchment. 
It was then validated by running the model for a different set of events and checking that the model gave similar results to the real site values. The outcomes demonstrated that while the model can be calibrated to a greater extent for hydrology (discharge output) throughout the year, the sediment output calibration may be slightly improved by having the ability to change parameters to take into account the seasonal vegetation growth during the start and end of the wet season. This study is important to assess hydrology and sediment movement in seasonal biomes. The understanding of sediment-associated metal dispersion processes in rivers can be used in a practical way to help river basin managers more effectively control and remediate catchments affected by present and historical metal mining.
Keywords: erosion modelling, fine suspended sediments, hydrology, surface water systems
Procedia PDF Downloads 84
24938 Data Management and Analytics for Intelligent Grid
Authors: G. Julius P. Roy, Prateek Saxena, Sanjeev Singh
Abstract:
Two decades ago, power distribution utilities collected data from their customers no more often than once a month. The advent of SmartGrid and AMI has since increased the sampling frequency, leading to a 1,000- to 10,000-fold increase in data quantity. This increase is notable and led to the term Big Data being coined in utilities. The power distribution industry is one of the largest handlers of huge and complex data, both for keeping history and for turning the data into significance. The majority of utilities around the globe are adopting SmartGrid technologies as a mass implementation and are primarily focusing on strategic interdependence and synergies of the big data coming from new information sources like AMI and intelligent SCADA. There is a rising need for new models of data management and a renewed focus on analytics to dissect data into descriptive, predictive, and prescriptive subsets. The goal of this paper is to bring load disaggregation into the smart energy toolkit for commercial usage.
Keywords: data management, analytics, energy data analytics, smart grid, smart utilities
Procedia PDF Downloads 780