Search results for: Data Mining
24802 Robust and Dedicated Hybrid Cloud Approach for Secure Authorized Deduplication
Authors: Aishwarya Shekhar, Himanshu Sharma
Abstract:
Data deduplication is an important data compression technique for eliminating duplicate copies of repeating data, and it has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. In this process, duplicate data is expunged, leaving only a single instance of the data to be stored, although an index of every item is still maintained. Deduplication thus minimizes the storage space an organization requires to retain its data. In most companies, the storage systems hold identical copies of numerous pieces of data. Deduplication eliminates these additional copies by saving just one copy of the data and replacing the other copies with pointers that lead back to the primary copy. To avoid this duplication of data and to preserve confidentiality in the cloud, we apply the concept of the hybrid cloud. A hybrid cloud is a fusion of at least one public and one private cloud. As a proof of concept, we implement Java code that provides security and removes all types of duplicated data from the cloud.
Keywords: confidentiality, deduplication, data compression, hybridity of cloud
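The single-copy-plus-pointers idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Java implementation and does not cover the authorization or hybrid-cloud aspects; the class name `DedupStore` and its methods are hypothetical, and identical content is detected here by SHA-256 digest, which is an assumption of this sketch.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one physical copy per distinct content."""

    def __init__(self):
        self.blocks = {}  # digest -> bytes (the single stored instance)
        self.index = {}   # file name -> digest (pointer back to the stored copy)

    def put(self, name: str, data: bytes) -> bool:
        """Store data under name; return True only if a new copy was written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = data
        # The index entry is kept for every file, even for duplicates.
        self.index[name] = digest
        return is_new

    def get(self, name: str) -> bytes:
        """Follow the pointer from the file name to the stored copy."""
        return self.blocks[self.index[name]]
```

Storing the same bytes under two names writes one block and two index entries, which mirrors the statement that indexing is maintained for every item while only one instance is kept.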
Procedia PDF Downloads 383
24801 Manganese Contamination Exacerbates Reproductive Stress in a Suicidally-Breeding Marsupial
Authors: Ami Fadhillah Amir Abdul Nasir, Amanda C. Niehaus, Skye F. Cameron, Frank A. Von Hippel, John Postlethwait, Robbie S. Wilson
Abstract:
For suicidal breeders, the physiological stresses and energetic costs of breeding are fatal. Environmental stressors such as pollution should compound these costs, yet suicidal breeding is so rare among mammals that this is unknown. Here, we explored the consequences of metal contamination to the health, aging and performance of endangered, suicidally-breeding northern quolls (Dasyurus hallucatus) living near an active manganese mine on Groote Eylandt, Northern Territory, Australia. We found respirable manganese dust at levels exceeding international recommendations even 20 km from mining sites and substantial accumulation of manganese within quolls’ hair, testes, and in two brain regions (the neocortex and cerebellum, responsible for sensory perception and motor function, respectively). Though quolls did not differ in sprint speeds, motor skill, or manoeuvrability, those with higher accumulation of manganese crashed at lower speeds during manoeuvrability tests, indicating a potential effect on sight or cognition. Immune function and telomere length declined over the breeding season, as expected with ageing, but manganese contamination exacerbated immune declines and suppressed cortisol. Unexpectedly, male quolls with higher levels of manganese had longer telomeres, supporting evidence of unusual telomere dynamics among Dasyurids, though whether this affects their lifespan is unknown. We posit that sublethal contamination via pollution, mining, or urbanisation imposes physiological costs on wildlife that may diminish reproductive success or survival.
Keywords: ecotoxicology, heavy metal, manganese, telomere length, cortisol, locomotor
Procedia PDF Downloads 317
24800 A Review of Machine Learning for Big Data
Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.
Abstract:
Big data is expanding rapidly across engineering, science, and many other domains. The potential of massive data is undoubtedly significant, but realizing it requires new ways of thinking and new learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. This paper reviews the latest advances in research on machine learning for big data processing. First, it surveys the machine learning techniques used in recent studies, such as deep learning, representation learning, transfer learning, active learning, and distributed and parallel learning. It then focuses on the challenges and possible solutions of machine learning for big data.
Keywords: active learning, big data, deep learning, machine learning
Procedia PDF Downloads 446
24799 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights
Authors: Tomy Prihananto, Damar Apri Sudarmadi
Abstract:
Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management in Indonesia. It is a descriptive study with a qualitative approach that collects data from a literature study. The results show a comprehensive set of data arrangements that have been established as technical requirements for data protection by encryption methods. Arrangements on encryption and on the protection of personal data are mutually reinforcing. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.
Keywords: Indonesia, protection, personal data, privacy, human rights, encryption
Procedia PDF Downloads 183
24798 Benthic Foraminiferal Responses to Coastal Pollution for Some Selected Sites along Red Sea, Egypt
Authors: Ramadan M. El-Kahawy, M. A. El-Shafeiy, Mohamed Abd El-Wahab, S. A. Helal, Nabil Aboul-Ela
Abstract:
Due to the economic importance of Safaga Bay, Quseir harbor, and Ras Gharib harbor, a multidisciplinary approach was adopted to investigate 27 surficial sediment samples from the three sites (9 samples each) in order to use benthic foraminifera as bio-indicators for characterizing environmental variations. Grain size analyses indicate that the bottom facies in the inner part of Quseir is muddy, while the inner parts of Ras Gharib and Safaga are silty sand. Close to the entrances of Safaga Bay and Ras Gharib the facies is sandy, while at Quseir it remains muddy. Geochemical data show high concentrations of heavy metals mainly in Ras Gharib, due to oil leakage from the hydrocarbon oil field, and in Safaga Bay, due to phosphate mining, while Quseir shows medium concentrations due to anthropogenic effects. Micropaleontological analyses indicate the boundaries between the areas of highest and low heavy-metal concentration. The dominant benthic foraminifera at the three sites are Ammonia beccarii, Amphistegina, and Sorites. The study highlights the worsening of environmental conditions and identifies the areas in need of priority recovery.
Keywords: benthic foraminifera, Ras Gharib, Safaga, Quseir, Red Sea, Egypt
Procedia PDF Downloads 351
24797 The Various Legal Dimensions of Genomic Data
Authors: Amy Gooden
Abstract:
When human genomic data is considered, this is often done through only one dimension of the law, or the interplay between the various dimensions is not considered, thus providing an incomplete picture of the legal framework. This research considers and analyzes the various dimensions in South African law applicable to genomic sequence data, including property rights, personality rights, and intellectual property rights. The effective use of personal genomic sequence data requires the acknowledgement and harmonization of the rights applicable to such data.
Keywords: artificial intelligence, data, law, genomics, rights
Procedia PDF Downloads 138
24796 Metal Contaminants in River Water and Human Urine after an Episode of Major Pollution by Mining Wastes in the Kasai Province of DR Congo
Authors: Remy Mpulumba Badiambile, Paul Musa Obadia, Malick Useni Mutayo, Jeef Numbi Mukanya, Patient Nkulu Banza, Tony Kayembe Kitenge, Erik Smolders, Jean-François Picron, Vincent Haufroid, Célestin Banza Lubaba Nkulu, Benoit Nemery
Abstract:
Background: In July 2021, the Tshikapa river became heavily polluted by mining wastes from a diamond mine in neighboring Angola, leading to a massive fish kill, as well as disease and even deaths among residents living along the Tshikapa and Kasai rivers, the latter a major tributary of the Congo river. The exact nature of the pollutants was unknown. Methods: In a cross-sectional study conducted in the city of Tshikapa in August 2021, we enrolled by opportunistic sampling 65 residents (11 children < 16y) living alongside the polluted rivers and 65 control residents (5 children) living alongside a non-affected portion of the Kasai river (upstream from the Tshikapa-Kasai confluence). We administered a questionnaire and obtained spot urine samples for measurements of thiocyanate (a metabolite of cyanide) and 26 trace metals (by ICP-MS). Metals (and pH) were also measured in samples of river water. Results: Participants from both groups consumed river water. In the area affected by the pollution, most participants had eaten dead fish. Prevalences of reported health symptoms were higher in the exposed group than among controls: skin rashes (52% vs 0%), diarrhea (40% vs 8%), abdominal pain (8% vs 3%), nausea (3% vs 0%). In polluted water, concentrations [median (range)] were higher only for nickel [2.2 (1.4–3.5) µg/L] and uranium [78 (71–91) ng/L] compared with non-polluted water [0.8 (0.6–1.9) µg/L; 9 (7–19) ng/L]. In urine, concentrations [µg/g creatinine, median (IQR)] were significantly higher in the exposed group than in controls for lithium [19.5 (12.4–27.3) vs 6.9 (5.9–12.1)], thallium [0.41 (0.31–0.57) vs 0.19 (0.16–0.39)], and uranium [0.026 (0.013–0.037) vs 0.012 (0.006–0.024)]. Other elements did not differ between the groups, but levels were higher than reference values for several metals (including manganese, cobalt, nickel, and lead). Urinary thiocyanate concentrations did not differ.
Conclusion: This study, after an ecological disaster in the DRC, has documented contamination of river water by nickel and uranium and high urinary levels of some trace metals among affected riverine populations. However, the exact cause of the massive fish kill and disease among residents remains elusive. The capacity to rapidly investigate toxic pollution events must be increased in the area.
Keywords: metal contaminants, river water and human urine, pollution by mining wastes, DR Congo
Procedia PDF Downloads 157
24795 Big Brain: A Single Database System for a Federated Data Warehouse Architecture
Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf
Abstract:
Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and explain why a custom federated approach was the recommended option for proceeding with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system, which presented many advantages, such as cost reduction and improved data access for global users, allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.
Keywords: data integration, data warehousing, federated architecture, Online Analytical Processing (OLAP)
Procedia PDF Downloads 236
24794 Heavy Sulphide Material Characterization of Grasberg Block Cave Mine, Mimika, Papua: Implication for Tunnel Development and Mill Issue
Authors: Cahya Wimar Wicaksono, Reynara Davin Chen, Alvian Kristianto Santoso
Abstract:
The Grasberg Cu-Au ore deposit, one of the biggest porphyry deposits, is located in Papua Province, Indonesia. It was produced by several intrusions that are bounded at the periphery by the Heavy Sulphide Zone (HSZ). The HSZ is the rock that forms the contact between the Grasberg Igneous Complex (GIC) and the surrounding sedimentary and igneous rocks, and it is rich in sulphide minerals such as pyrite ± pyrrhotite. This research aims to characterize the HSZ in terms of its geotechnical, geochemical, and mineralogical aspects and to identify the implications for daily mining operations. The methods used are geological and alteration mapping, core logging, FAA (Fire Assay Analysis), AAS (Atomic Absorption Spectroscopy), RQD (Rock Quality Designation), and rock water content measurement. The data generated by these methods include RQD data, mineral composition and grade, and the lithological and structural geology distribution in the research area. The mapping data show that the HSZ material falls into three types based on rock association: near igneous rocks, near sedimentary rocks, and within the HSZ area. It is also divided by location into the north and south parts of the research area. The HSZ material consists of rock rich in pyrite ± pyrrhotite, with RQD values ranging from about 25% to 100%. Exposed pyrite ± pyrrhotite reacts with H₂O and O₂ to produce acid, which has a corrosive effect on steel wire and rockbolts. The pyrite precipitation process in the HSZ, meanwhile, forms combustible H₂S gas, which is harmful during blasting activities. Furthermore, H₂S gas in blasting activities forms the poisonous gas SO₂. Although the HSZ has high Cu-Au grades, this high-grade material is rich in sulphide components, which affects the flotation milling process. Pyrite ± pyrrhotite in the HSZ reacts chemically with Cu-Au, which then settles in the milling process instead of floating.
Keywords: combustible, corrosive, heavy sulphide zone, pyrite ± pyrrhotite
Procedia PDF Downloads 326
24793 A Comparative Study on the Positive and Negative of Electronic Word-of-Mouth on the SERVQUAL Scale-Take A Certain Armed Forces General Hospital in Taiwan As An Example
Authors: Po-Chun Lee, Li-Lin Liang, Ching-Yuan Huang
Abstract:
Purpose: Research on electronic word-of-mouth (eWOM) and online reviews has been widely used in service industry management research in recent years. The SERVQUAL scale is the most commonly used method to measure service quality. The purpose of this research is therefore to combine electronic word-of-mouth and online reviews with the SERVQUAL scale in a comparative study of positive and negative electronic word-of-mouth reviews of a certain armed forces general hospital in Taiwan. Data sources: This research obtained online word-of-mouth comment data for a military hospital in Taiwan from Google Maps, covering the past ten years, through Internet data mining technology. Research methods: This study uses semantic content analysis to classify word-of-mouth reviews according to the revised PZB SERVQUAL scale and then carries out statistical analysis. Results of data synthesis: The results show that negative reviews of this military hospital have been increasing year by year, while positive word-of-mouth has trended downward under the COVID-19 epidemic. Among the five PZB SERVQUAL dimensions, positive word-of-mouth reviews performed best in “Assurance,” with a positive review rate of 58.89%, followed by “Responsiveness” at 43.33%. In negative word-of-mouth reviews, “Assurance” performed the worst, with a rate of 70.99%, followed by “Responsiveness” at 29.01%. Conclusions: The total number of electronic word-of-mouth reviews of the military hospital has grown in recent years; positive word-of-mouth has declined since the COVID-19 epidemic, while negative word-of-mouth has grown substantially. In both positive and negative comments, what patients care most about is “Assurance” of the professional attitude and skills of the medical staff, which most urgently needs to be strengthened.
In addition, good “Reliability” helps build positive word-of-mouth, whereas poor “Responsiveness” can easily lead to the spread of negative word-of-mouth. This study suggests that the hospital should focus its service quality management and audits on these few dimensions.
Keywords: quality of medical service, electronic word-of-mouth, armed forces general hospital
Procedia PDF Downloads 177
24792 A Tool for Facilitating an Institutional Risk Profile Definition
Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan
Abstract:
This paper presents an approach for the easy creation of an institutional risk profile for endangerment analysis of file formats. The main contribution of this work is the employment of data mining techniques to support setting up risk factors with just the values that are most important for a particular organisation. Subsequently, the risk profile employs fuzzy models and associated configurations for the file format metadata aggregator to support digital preservation experts with a semi-automatic estimation of endangerment level for file formats. Our goal is to make use of a domain expert knowledge base aggregated from a digital preservation survey in order to detect preservation risks for a particular institution. Another contribution is support for visualisation and analysis of risk factors for a required dimension. The proposed methods improve the visibility of risk factor information and the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and automatically aggregated file format metadata from linked open data sources. To facilitate decision-making, the aggregated information about the risk factors is presented as a multidimensional vector. The goal is to visualise particular dimensions of this vector for analysis by an expert. The sample risk profile calculation and the visualisation of some risk factor dimensions are presented in the evaluation section.
Keywords: digital information management, file format, endangerment analysis, fuzzy models
Procedia PDF Downloads 404
24791 A 0-1 Goal Programming Approach to Optimize the Layout of Hospital Units: A Case Study in an Emergency Department in Seoul
Authors: Farhood Rismanchian, Seong Hyeon Park, Young Hoon Lee
Abstract:
This paper proposes a method to optimize the layout of an emergency department (ED) based on real executions of care processes by considering several planning objectives simultaneously. Recently, demand for healthcare services has increased dramatically. As this demand increases, so does the need for new healthcare buildings and for redesigning and renovating existing ones. The importance of implementing a standard set of engineering facilities planning and design techniques has already been proved in both the manufacturing and service industries, with many significant functional efficiencies. However, the high complexity of care processes remains a major challenge to applying these methods in healthcare environments. Process mining techniques are applied in this study to tackle the problem of complexity and to enhance care process analysis. Process-related information such as clinical pathways was extracted from the information system of an ED. A 0-1 goal programming approach is then proposed to find a single layout that simultaneously satisfies several goals. The proposed model was solved with the optimization software CPLEX 12. The solution reached using the proposed method shows a 42.2% improvement in the walking distance of normal patients and a 47.6% improvement in the walking distance of critical patients at minimum cost of relocation. It has been observed that many patients must unnecessarily walk long distances during their visit to the emergency department because of an inefficient design. A carefully designed layout can significantly decrease patient walking distance and related complications.
Keywords: healthcare operation management, goal programming, facility layout problem, process mining, clinical processes
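As a rough illustration of goal programming over a 0-1 layout decision, the following Python sketch assigns three units to three locations and minimizes the weighted amount by which two goals (walking distance and relocation cost) are exceeded. It is not the authors' model: all data, goal levels, and weights are made up, and the sketch enumerates every layout by brute force instead of calling CPLEX. A permutation `layout` stands in for the 0-1 variables (unit `u` is placed at location `layout[u]`).

```python
from itertools import permutations

# Hypothetical toy data: 3 units, 3 locations.
flow = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]  # patient trips between units
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # distance between locations
current = (0, 2, 1)                        # current location of each unit
move_cost = [4, 1, 1]                      # cost of relocating each unit

goal_walk, goal_cost = 12, 2               # target levels for the two goals
w_walk, w_cost = 1.0, 0.5                  # weights on exceeding each goal


def walking(layout):
    """Total flow-weighted walking distance for a layout."""
    n = len(layout)
    return sum(flow[i][j] * dist[layout[i]][layout[j]]
               for i in range(n) for j in range(i + 1, n))


def relocation(layout):
    """Total cost of moving units away from their current locations."""
    return sum(c for u, c in enumerate(move_cost) if layout[u] != current[u])


def penalty(layout):
    """Weighted sum of the amounts by which each goal is exceeded."""
    d_walk = max(0, walking(layout) - goal_walk)
    d_cost = max(0, relocation(layout) - goal_cost)
    return w_walk * d_walk + w_cost * d_cost


best = min(permutations(range(3)), key=penalty)
```

With these made-up numbers, the current layout misses the walking goal, while swapping the two cheaper units meets both goals. Enumeration is only workable for tiny instances, which is why an integer programming solver is used in practice.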
Procedia PDF Downloads 295
24790 Assessing the Impacts of Riparian Land Use on Gully Development and Sediment Load: A Case Study of Nzhelele River Valley, Limpopo Province, South Africa
Authors: B. Mavhuru, N. S. Nethengwe
Abstract:
Land degradation driven by human activities has triggered several environmental problems, especially in underdeveloped rural areas. The main aim of this study is to analyze the contribution of different land uses to gully development and sediment load in the Nzhelele River Valley in the Limpopo Province. Data was collected using different methods such as observation, field data techniques and experiments. Satellite digital images, topographic maps, aerial photographs and the sediment load static model also assisted in determining how land use affects gully development and sediment load. For data analysis, the researcher used the following methods: Analysis of Variance (ANOVA), descriptive statistics, Pearson correlation coefficient and statistical correlation methods. The results of the research illustrate that high land use activities create negative changes, especially in areas that are highly fragile and vulnerable. A distinct impact of land use change was observed within the settlement area (9.6%) over a period of 5 years. A high correlation between soil organic matter and soil moisture (R=0.96) was observed. Furthermore, a significant variation (p ≤ 0.6) between the soil organic matter and soil moisture was also observed. A very significant variation (p ≤ 0.003) was observed in bulk density, and extremely significant variations (p ≤ 0.0001) were observed in organic matter and soil particle size. Sand mining and agricultural activities have contributed significantly to the amount of sediment load in the Nzhelele River. A significantly high amount of total suspended sediment (55.3%) and bed load (53.8%) was observed within the agricultural area. The link between gully development and the various land use activities determines the amount of sediment load.
These results are consistent with previous research and suggest that land use activities are likely to exacerbate the development of gullies and sediment load in the Nzhelele River Valley.
Keywords: drainage basin, geomorphological processes, gully development, land degradation, riparian land use and sediment load
Procedia PDF Downloads 307
24789 A Survey of Semantic Integration Approaches in Bioinformatics
Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir
Abstract:
Technological advances in computer science and data analysis continuously provide huge volumes of biological data, which are available on the web. Such advances require powerful data integration techniques to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, focusing on those developed in the ontology community.
Keywords: biological ontology, linked data, semantic data integration, semantic web
Procedia PDF Downloads 449
24788 Performance Analysis of Search Medical Imaging Service on Cloud Storage Using Decision Trees
Authors: González A. Julio, Ramírez L. Leonardo, Puerta A. Gabriel
Abstract:
Telemedicine services use a large amount of data, most of which are diagnostic images in Digital Imaging and Communications in Medicine (DICOM) and Health Level Seven (HL7) formats. Metadata is generated from each related image to support its identification. This study presents the use of decision trees to optimize information search processes for diagnostic images hosted on a cloud server. To analyze server performance, the following quality of service (QoS) metrics are evaluated: delay, bandwidth, jitter, latency and throughput, in five test scenarios for a total of 26 experiments during the loading and downloading of DICOM images hosted by the telemedicine group server of the Universidad Militar Nueva Granada, Bogotá, Colombia. By applying decision trees as a data mining technique and comparing them with sequential search, it was possible to evaluate the search times of diagnostic images on the server. The results show that by using the metadata in decision trees, the search times are substantially improved, the computational resources are optimized and the request management of the telemedicine image service is improved. Based on the experiments carried out, search efficiency increased by 45% in relation to the sequential search, given that, when downloading a diagnostic image, false positives are avoided in the management and acquisition processes of that information. It is concluded that, for diagnostic image services in telemedicine, the decision tree technique guarantees accessibility and robustness in the acquisition and manipulation of medical images, improving diagnoses and medical procedures for patients.
Keywords: cloud storage, decision trees, diagnostic image, search, telemedicine
Procedia PDF Downloads 204
24787 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture
Authors: Thrivikraman Aswathi, S. Advaith
Abstract:
Because the use of real data can be limited in some cases, for example when it is hard to get access to a large volume of real data, synthetic data generation is needed. It produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data, since MTS data is now being gathered more often in various real-world systems. The GAN-generated MTS data is then fed into a transformer-based deep learning architecture that categorizes the data into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.
Keywords: GAN, transformer, classification, multivariate time series
Procedia PDF Downloads 130
24786 Generative AI: A Comparison of Conditional Tabular Generative Adversarial Networks and Conditional Tabular Generative Adversarial Networks with Gaussian Copula in Generating Synthetic Data with Synthetic Data Vault
Authors: Lakshmi Prayaga, Chandra Prayaga, Aaron Wade, Gopi Shankar Mallu, Harsha Satya Pola
Abstract:
Synthetic data generated by Generative Adversarial Networks and Autoencoders is becoming more common to combat the problem of insufficient data for research purposes. However, generating synthetic data is a tedious task requiring extensive mathematical and programming background. Open-source platforms such as the Synthetic Data Vault (SDV) and Mostly AI have offered a platform that is user-friendly and accessible to non-technical professionals to generate synthetic data to augment existing data for further analysis. The SDV also provides for additions to the generic GAN, such as the Gaussian copula. We present the results from two synthetic data sets (CTGAN data and CTGAN with Gaussian Copula) generated by the SDV and report the findings. The results indicate that the ROC and AUC curves for the data generated by adding the layer of Gaussian copula are much higher than the data generated by the CTGAN.
Keywords: synthetic data generation, generative adversarial networks, conditional tabular GAN, Gaussian copula
Procedia PDF Downloads 82
24785 Testing and Validation Stochastic Models in Epidemiology
Authors: Snigdha Sahai, Devaki Chikkavenkatappa Yellappa
Abstract:
This study outlines approaches for testing and validating stochastic models used in epidemiology, focusing on the integration and functional testing of simulation code. It details methods for combining simple functions into comprehensive simulations, distinguishing between deterministic and stochastic components, and applying tests to ensure robustness. Techniques include isolating stochastic elements, utilizing large sample sizes for validation, and handling special cases. Practical examples are provided using R code to demonstrate integration testing, handling of incorrect inputs, and special cases. The study emphasizes the importance of both functional and defensive programming to enhance code reliability and user-friendliness.
Keywords: computational epidemiology, epidemiology, public health, infectious disease modeling, statistical analysis, health data analysis, disease transmission dynamics, predictive modeling in health, population health modeling, quantitative public health, random sampling simulations, randomized numerical analysis, simulation-based analysis, variance-based simulations, algorithmic disease simulation, computational public health strategies, epidemiological surveillance, disease pattern analysis, epidemic risk assessment, population-based health strategies, preventive healthcare models, infection dynamics in populations, contagion spread prediction models, survival analysis techniques, epidemiological data mining, host-pathogen interaction models, risk assessment algorithms for disease spread, decision-support systems in epidemiology, macro-level health impact simulations, socioeconomic determinants in disease spread, data-driven decision making in public health, quantitative impact assessment of health policies, biostatistical methods in population health, probability-driven health outcome predictions
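The testing ideas listed in this abstract (isolating the stochastic element, validating with large samples, handling special cases and incorrect inputs) can be sketched as follows. The study's own examples are in R; this Python version is only an illustration, and the function names `infection_prob` and `new_infections`, as well as the simple transmission formula, are assumptions of the sketch.

```python
import random


def infection_prob(beta: float, infected: int, population: int) -> float:
    """Deterministic part: per-susceptible infection probability for one step."""
    # Defensive checks reject inputs that make no epidemiological sense.
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= infected <= population:
        raise ValueError("infected must be between 0 and population")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    return beta * infected / population


def new_infections(susceptible: int, p: float, rng: random.Random) -> int:
    """Stochastic part: one Bernoulli trial per susceptible individual."""
    return sum(rng.random() < p for _ in range(susceptible))


# Deterministic component: can be tested with an exact expected value.
assert infection_prob(0.5, 10, 100) == 0.05

# Special cases of the stochastic component have certain outcomes.
rng = random.Random(42)  # seeded generator keeps runs reproducible
assert new_infections(1000, 0.0, rng) == 0
assert new_infections(1000, 1.0, rng) == 1000

# Large-sample check: the draw should be close to the mean n * p = 3000.
draw = new_infections(10_000, 0.3, rng)
assert abs(draw - 3000) < 300  # tolerance of more than 6 standard deviations
```

Passing the random generator in as an argument is the design choice that makes the stochastic part testable: the deterministic function never touches randomness, and the stochastic one can be driven with a fixed seed.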
Procedia PDF Downloads 7
24784 Generation of Knowlege with Self-Learning Methods for Ophthalmic Data
Authors: Klaus Peter Scherer, Daniel Knöll, Constantin Rieder
Abstract:
Problem and Purpose: Intelligent systems can support human decision-making, especially when complex surgical eye interventions must be performed. Such a decision support system normally contains a knowledge-based module, which provides the actual assistance through explanation and logical reasoning. Acquiring this complex knowledge through interviews is difficult, because the complex parameters are correlated in many different ways. This project therefore researches and develops (semi-)automated self-learning methods to improve the quality of such a decision support system. Methods: Advanced data mining procedures are applied to ophthalmic data sets of real hospital patients. In particular, subgroup analysis methods are developed, extended and used to find correlations and conditional dependencies in the structured patient data. Once causal dependencies are found, they must be ranked in order to generate rule-based representations. For this, anonymous patient data are transformed into a special machine language format. The imported data serve as input for conditional probability algorithms that calculate the parameter distributions with respect to a given goal parameter. Results: Advanced knowledge discovery methods and applications were used to produce operation- and patient-related correlations. New knowledge was generated by finding causal relations between the operational equipment, the medical instances and the patient-specific history through a dependency ranking process. After transformation into association rules, logic-based representations were available for the clinical experts to evaluate the new knowledge. The structured data sets contain about 80 parameters as characteristic features per patient. For patient groups of increasing size (100, 300, 500), both a single target value and multiple target values were set for the subgroup analysis, so the newly generated hypotheses could be interpreted with regard to their dependence on, or independence of, the number of patients. Conclusions: The aim and advantage of such a semi-automated self-learning process is the extension of the knowledge base with newly found parameter correlations. The discovered knowledge is transformed into association rules and serves as a rule-based representation in the knowledge base. More than one goal parameter of interest can be considered by the semi-automated learning process. With ranking procedures, the strongest premises and conjunctively associated conditions can be found that lead to the goal parameter of interest. In this way, knowledge hidden in structured tables or lists can be extracted as a rule-based representation, which is a real aid in communicating with the clinical experts.
Keywords: expert system, knowledge-based support, ophthalmic decision support, self-learning methods
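The ranking step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `rank_conditions`, the parameter names, and the toy records are all hypothetical, and only single-attribute conditions are considered. It estimates the conditional probability of a goal parameter given each attribute value and ranks the resulting candidate rules.

```python
from collections import Counter

def rank_conditions(records, target, target_value):
    """Rank single-attribute conditions by P(target == target_value | attribute == value)."""
    # Baseline probability of the goal parameter over all records.
    n = len(records)
    base = sum(1 for r in records if r[target] == target_value) / n

    support = Counter()  # how often each (attribute, value) pair occurs
    hits = Counter()     # how often it co-occurs with the goal value
    for r in records:
        for attr, value in r.items():
            if attr == target:
                continue
            support[(attr, value)] += 1
            if r[target] == target_value:
                hits[(attr, value)] += 1

    rules = []
    for cond, count in support.items():
        confidence = hits[cond] / count               # conditional probability
        lift = confidence / base if base > 0 else 0.0  # gain over the baseline
        rules.append((cond, count, confidence, lift))
    # Highest conditional probability first; ties broken by larger support.
    rules.sort(key=lambda rule: (-rule[2], -rule[1]))
    return base, rules

# Hypothetical toy records; the parameter names are invented for illustration.
patients = [
    {"lens_type": "A", "diabetes": "yes", "complication": "yes"},
    {"lens_type": "A", "diabetes": "no", "complication": "no"},
    {"lens_type": "B", "diabetes": "yes", "complication": "yes"},
    {"lens_type": "B", "diabetes": "no", "complication": "no"},
]
base, rules = rank_conditions(patients, "complication", "yes")
for (attr, value), count, confidence, lift in rules:
    print(f"IF {attr} = {value} THEN complication = yes "
          f"(support={count}, confidence={confidence:.2f}, lift={lift:.2f})")
```

Each ranked tuple can be read as a candidate association rule; a fuller subgroup analysis would also search conjunctions of conditions and several goal parameters.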
Procedia PDF Downloads 253
24783 A Privacy Protection Scheme Supporting Fuzzy Search for NDN Routing Cache Data Name
Authors: Feng Tao, Ma Jing, Guo Xian, Wang Jing
Abstract:
Named Data Networking (NDN) replaces the IP addresses of traditional networks with data names and adopts a dynamic caching mechanism. In the existing mechanism, however, only one-to-one search can be achieved, because every piece of data has a unique name corresponding to it. There is a mapping relationship between data content and data name, so if the data name is intercepted by an adversary, the privacy of the data content and of the user’s interest can hardly be guaranteed. To solve this problem, this paper proposes a one-to-many fuzzy search scheme based on order-preserving encryption, which reduces the query overhead by optimizing the caching strategy. In this scheme, hash values are used to keep the user’s query safe from each node during the search, and likewise to protect the privacy of the requested data content.
Keywords: NDN, order-preserving encryption, fuzzy search, privacy
Procedia PDF Downloads 485
24782 Healthcare Big Data Analytics Using Hadoop
Authors: Chellammal Surianarayanan
Abstract:
The healthcare industry generates large amounts of data driven by various needs such as record keeping, physicians’ prescriptions, medical imaging, sensor data, Electronic Patient Records (EPR), laboratory, pharmacy, etc. Healthcare data is so big and complex that it cannot be managed by conventional hardware and software. The complexity of healthcare big data arises from the large volume of data, the velocity with which the data is accumulated, and its variety: structured, semi-structured and unstructured. Despite this complexity, if the trends and patterns that exist within the big data are uncovered and analyzed, higher quality healthcare can be provided at lower cost. Hadoop is an open source software framework for distributed processing of large data sets across clusters of commodity hardware using a simple programming model. The core components of Hadoop include the Hadoop Distributed File System, which offers a way to store large amounts of data across multiple machines, and MapReduce, which offers a way to process large data sets with a parallel, distributed algorithm on a cluster. The Hadoop ecosystem also includes various other tools such as Hive (a SQL-like query language), Pig (a higher level query language for MapReduce), HBase (a columnar data store), etc. This paper analyzes how healthcare big data can be processed and analyzed using the Hadoop ecosystem.
Keywords: big data analytics, Hadoop, healthcare data, towards quality healthcare
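The MapReduce programming model mentioned in this abstract can be sketched in a few lines. This is a single-process illustration in plain Python, not Hadoop code and not taken from the paper: the function names, the record layout, and the diagnosis codes are made-up sample data. It only shows the map, shuffle and reduce steps that a cluster would run in parallel.

```python
from collections import defaultdict

def map_phase(record):
    """Emit one (key, 1) pair per diagnosis code found in a record."""
    for code in record["diagnoses"]:
        yield code, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle and reduce sequentially over all records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical patient records, invented for illustration only.
records = [
    {"patient": 1, "diagnoses": ["E11", "I10"]},
    {"patient": 2, "diagnoses": ["I10"]},
    {"patient": 3, "diagnoses": ["E11", "I10", "J45"]},
]
counts = run_job(records)
print(counts)
```

On a real cluster the map and reduce functions stay this small; the framework handles the distribution of records and the shuffle between machines.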
Procedia PDF Downloads 413
24781 Data Disorders in Healthcare Organizations: Symptoms, Diagnoses, and Treatments
Authors: Zakieh Piri, Shahla Damanabi, Peyman Rezaii Hachesoo
Abstract:
Introduction: Healthcare organizations, like other organizations, suffer from a number of disorders such as Business Sponsor Disorder, Business Acceptance Disorder, Cultural/Political Disorder, Data Disorder, etc. As quality in healthcare mostly depends on the quality of data, we aimed to identify data disorders and their symptoms in two teaching hospitals. Methods: Using a self-constructed questionnaire, we asked 20 questions related to the quality and usability of patient data stored in patient records. The research population consisted of 150 managers, physicians, nurses, and medical record staff who were working at the time of the study. We also asked for their views about the symptoms of and treatments for any data disorders they mentioned in the questionnaire. We analyzed the answers using qualitative methods. Results: After classifying the answers, we found six main data disorders: incomplete data, missed data, late data, blurred data, manipulated data, and illegible data. The majority of participants believed they had important roles in the treatment of data disorders, while others pointed to health system problems. Discussion: As clinicians have important roles in producing data, they can easily identify symptoms and disorders of patient data. Health information managers can also play important roles in the early detection of data disorders through proactive monitoring and periodic check-ups of data.
Keywords: data disorders, quality, healthcare, treatment
Procedia PDF Downloads 433
24780 Big Data and Analytics in Higher Education: An Assessment of Its Status, Relevance and Future in the Republic of the Philippines
Authors: Byron Joseph A. Hallar, Annjeannette Alain D. Galang, Maria Visitacion N. Gumabay
Abstract:
One of the unique challenges posed by the twenty-first century to Philippine higher education is the utilization of Big Data. The higher education system in the Philippines is generating burgeoning amounts of data that can be used to produce the information and knowledge needed for accurate data-driven decision making. This study examines the status, relevance and future of Big Data and Analytics in Philippine higher education. The insights gained from the study may be relevant to other developing nations situated similarly to the Philippines.
Keywords: big data, data analytics, higher education, Republic of the Philippines, assessment
Procedia PDF Downloads 348
24779 Effects of Sulphide Mining on AISI 304 Stainless Steel
Authors: Aguasanta Miguel Sarmiento, José Miguel Dávila, María Luisa de la Torre
Abstract:
Acid mine drainage (AMD) is an acidic leachate with high levels of metals and sulphates in solution, which seriously affects the durability and strength of metallic materials used in the construction of structural and mechanical components. This paper presents the results of the evolution over time of the reduction in tensile strength and defects in AISI 304 stainless steel in contact with acid mine drainage. For this purpose, a total of 30 bars with a diameter of 8 mm and a length of 14 cm were placed transversely in the course of a stream contaminated by AMD from the sulphide mines of the Iberian Pyritic Belt (SW Spain). This stream has average pH values of 2.6, a potential of 660 mV, and average concentrations of 12 g/L of sulphates, 1.2 g/L of Fe, 191 mg/L of Zn, etc. Every two months of exposure, 6 stainless steel bars were extracted from the acid stream. They were subjected to surface roughness analysis carried out with the help of Mitutoyo Surftest SJ-210 surface roughness tester. The analysis was carried out at three different points on 5 specimens from each series. The average reading of each parameter is calculated in order to ensure the accuracy of the measurements and the surface coverage. Arithmetic mean roughness value (Ra), mean roughness depth (Rz), and root mean square roughness (Rq) were measured. Five specimens from each series were statically tensile tested using universal equipment (Servosis ME 403 of 200kN). The specimens were clamped at their ends with two grips for cylindrical sections, and the tensile force was applied at a constant speed of 0.5 kN/s, according to the requirements of standard UNE-EN ISO 6892-1: 2020. To determine the modulus of elasticity, limits close to 15% and 55% of the maximum load were used, depending on the course of each test. Field Emission Scanning Electron Microscopy (FESEM) was used to observe corrosion products and defects generated by exposure to AMD. 
Energy dispersive X-ray spectrometry (EDS) was used to analyse the chemical composition of the corrosion products formed. For this purpose, small pieces were cut from the resulting specimens, cleaned, and embedded in epoxy resin. The results show that after only 5 months of exposure of AISI 304 stainless steel to the mining environment, the surface roughness increases significantly, with average depths almost 6 times greater than the initial values. Cracks are observed on the surface of the material, and they increase in size with the time of exposure. A large number of grains with a composition of more than 57% Pb and 16% Sn can be observed inside these cracks. Tensile tests show a reduction in the resistance of this material after only two months of exposure. The results show the serious problems that would result from using this material for mechanical components in a sulphide mining environment, not only because of the significant reduction in the lifetime of such components, but also because of the implications for human safety.
Keywords: acid mine drainage, corrosion, mechanical properties, stainless steel
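The three roughness parameters named in this abstract (Ra, Rq, Rz) can be illustrated with a simplified calculation. This sketch is an assumption-laden illustration, not the procedure used by the roughness tester: it takes an already filtered list of profile heights, uses the mean of the profile as the reference line, and computes Rz as the average peak-to-valley height over five equal segments. The function name and the sample profile are hypothetical.

```python
import math

def roughness_parameters(profile, segments=5):
    """Return (Ra, Rq, Rz) for a list of profile heights (simplified, no filtering)."""
    if len(profile) < segments:
        raise ValueError("profile needs at least one point per segment")
    mean_line = sum(profile) / len(profile)
    deviations = [z - mean_line for z in profile]

    # Ra: arithmetic mean of the absolute deviations from the mean line.
    ra = sum(abs(d) for d in deviations) / len(deviations)
    # Rq: root mean square of the deviations.
    rq = math.sqrt(sum(d * d for d in deviations) / len(deviations))

    # Rz: mean peak-to-valley height over equal-length segments
    # (any leftover points at the end of the profile are ignored).
    size = len(deviations) // segments
    depths = []
    for i in range(segments):
        chunk = deviations[i * size:(i + 1) * size]
        depths.append(max(chunk) - min(chunk))
    rz = sum(depths) / segments
    return ra, rq, rz

# Hypothetical profile heights in micrometres, invented for illustration.
profile = [0.0, 1.0, 0.0, -1.0] * 5
ra, rq, rz = roughness_parameters(profile)
print(f"Ra={ra:.3f} um, Rq={rq:.3f} um, Rz={rz:.3f} um")
```

Averaging readings taken at several points on each specimen, as the abstract describes, would amount to calling this function once per measurement and taking the mean of each parameter.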
Procedia PDF Downloads 16
24778 Career Guidance System Using Machine Learning
Authors: Mane Darbinyan, Lusine Hayrapetyan, Elen Matevosyan
Abstract:
Artificial Intelligence in Education (AIED) has been created to help students get ready for the workforce, and over the past 25 years, it has grown significantly, offering a variety of technologies to support academic, institutional, and administrative services. However, this is still challenging, especially considering the labor market's rapid change. While choosing a career, people face various obstacles because they do not take into consideration their own preferences, which might lead to many other problems like shifting jobs, work stress, occupational infirmity, reduced productivity, and manual error. Besides preferences, people should properly evaluate their technical and non-technical skills, as well as their personalities. Professional counseling has become a difficult undertaking for counselors due to the wide range of career choices brought on by changing technological trends. It is necessary to close this gap by utilizing technology that makes sophisticated predictions about a person's career goals based on their personality. Hence, there is a need to create an automated model that would help in decision-making based on user inputs. Improving career guidance can be achieved by embedding machine learning into the career consulting ecosystem. There are various systems of career guidance that work based on the same logic, such as the classification of applicants, matching applications with appropriate departments or jobs, making predictions, and providing suitable recommendations. Methodologies like KNN, Neural Networks, K-means clustering, D-Tree, and many other advanced algorithms are applied in the fields of data and compute some data, which is helpful to predict the right careers. Besides helping users with their career choice, these systems provide numerous opportunities which are very useful while making this hard decision. 
They help the candidate to recognize where he/she specifically lacks sufficient skills so that the candidate can improve those skills. They are also capable of offering an e-learning platform, taking into account the user's lack of knowledge. Furthermore, users can be provided with details on a particular job, such as the abilities required to excel in that industry.
Keywords: career guidance system, machine learning, career prediction, predictive decision, data mining, technical and non-technical skills
Procedia PDF Downloads 80
24777 Data Management and Analytics for Intelligent Grid
Authors: G. Julius P. Roy, Prateek Saxena, Sanjeev Singh
Abstract:
Two decades ago, power distribution utilities collected data from their customers no more often than once a month. The advent of SmartGrid and AMI has since increased the sampling frequency, leading to a 1,000- to 10,000-fold increase in data quantity. This notable increase led utilities to adopt the term Big Data. The power distribution industry is one of the largest handlers of huge and complex data, both for keeping history and for turning the data into meaningful information. The majority of utilities around the globe are adopting SmartGrid technologies at scale and are focusing primarily on the strategic interdependence and synergies of the big data coming from new information sources such as AMI and intelligent SCADA. There is therefore a rising need for new models of data management and a renewed focus on analytics to dissect data into descriptive, predictive and prescriptive subsets. The goal of this paper is to bring load disaggregation into the smart energy toolkit for commercial usage.
Keywords: data management, analytics, energy data analytics, smart grid, smart utilities
Procedia PDF Downloads 780
24776 Career Guidance System Using Machine Learning
Authors: Mane Darbinyan, Lusine Hayrapetyan, Elen Matevosyan
Abstract:
Artificial Intelligence in Education (AIED) has been created to help students get ready for the workforce, and over the past 25 years, it has grown significantly, offering a variety of technologies to support academic, institutional, and administrative services. However, this is still challenging, especially considering the labor market's rapid change. While choosing a career, people face various obstacles because they do not take into consideration their own preferences, which might lead to many other problems like shifting jobs, work stress, occupational infirmity, reduced productivity, and manual error. Besides preferences, people should evaluate properly their technical and non-technical skills, as well as their personalities. Professional counseling has become a difficult undertaking for counselors due to the wide range of career choices brought on by changing technological trends. It is necessary to close this gap by utilizing technology that makes sophisticated predictions about a person's career goals based on their personality. Hence, there is a need to create an automated model that would help in decision-making based on user inputs. Improving career guidance can be achieved by embedding machine learning into the career consulting ecosystem. There are various systems of career guidance that work based on the same logic, such as the classification of applicants, matching applications with appropriate departments or jobs, making predictions, and providing suitable recommendations. Methodologies like KNN, neural networks, K-means clustering, D-Tree, and many other advanced algorithms are applied in the fields of data and compute some data, which is helpful to predict the right careers. Besides helping users with their career choice, these systems provide numerous opportunities which are very useful while making this hard decision. 
They help the candidate to recognize where he/she specifically lacks sufficient skills so that the candidate can improve those skills. They are also capable of offering an e-learning platform, taking into account the user's lack of knowledge. Furthermore, users can be provided with details on a particular job, such as the abilities required to excel in that industry.
Keywords: career guidance system, machine learning, career prediction, predictive decision, data mining, technical and non-technical skills
Procedia PDF Downloads 70
24775 Privacy Preserving Data Publishing Based on Sensitivity in Context of Big Data Using Hive
Authors: P. Srinivasa Rao, K. Venkatesh Sharma, G. Sadhya Devi, V. Nagesh
Abstract:
Privacy Preserving Data Publication is a major concern today because the amount of data published through the internet is increasing day by day. This huge amount of data is called Big Data because of its size. This project deals with privacy preservation in the context of Big Data using a data warehousing solution called Hive. We implemented Nearest Similarity Based Clustering (NSB) with bottom-up generalization to achieve (v,l)-anonymity. (v,l)-Anonymity deals with sensitivity vulnerabilities and ensures individual privacy. We also calculate the sensitivity levels with a simple comparison method using the index values, classifying the different levels of sensitivity. The experiments were carried out in the Hive environment to verify the efficiency of the algorithms with Big Data. This framework also supports the execution of existing algorithms without any changes. The model in the paper outperforms existing models.
Keywords: sensitivity, sensitive level, clustering, Privacy Preserving Data Publication (PPDP), bottom-up generalization, Big Data
Procedia PDF Downloads 295
24774 Early Gastric Cancer Prediction from Diet and Epidemiological Data Using Machine Learning in Mizoram Population
Authors: Brindha Senthil Kumar, Payel Chakraborty, Senthil Kumar Nachimuthu, Arindam Maitra, Prem Nath
Abstract:
Gastric cancer is predominantly caused by demographic and diet factors as compared to other cancer types. The aim of the study is to predict Early Gastric Cancer (EGC) from diet and lifestyle factors using supervised machine learning algorithms. For this study, 160 healthy individuals and 80 cases, followed for 3 years (2016-2019) at Civil Hospital, Aizawl, Mizoram, were selected. A dataset containing 11 features that are core risk factors for gastric cancer was extracted. Supervised machine learning algorithms (Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Multilayer perceptron, and Random Forest) were used to analyze the dataset using Python Jupyter Notebook Version 3. The classification results were evaluated using the following metrics: minimum_false_positives, brier_score, accuracy, precision, recall, F1_score, and the Receiver Operating Characteristics (ROC) curve. The results were Naive Bayes - 88, 0.11; Random Forest - 83, 0.16; SVM - 77, 0.22; Logistic Regression - 75, 0.25; and Multilayer perceptron - 72, 0.27, where the first value is the accuracy in percent and the second is the brier_score. The Naive Bayes algorithm outperforms the others, with a very low false positive rate, a low brier_score and good accuracy. Its classification results in predicting EGC were very satisfactory using only diet and lifestyle factors, which will help physicians to educate patients and the public, so that gastric cancer mortality can be reduced or avoided with this knowledge mining work.
Keywords: Early Gastric cancer, Machine Learning, Diet, Lifestyle Characteristics
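Two of the metrics reported in this abstract, accuracy and the Brier score, can be computed as in the following sketch. It is a generic illustration of the standard definitions, not the study's code: the labels and predicted probabilities are made-up values, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the true label."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for t, p in zip(y_true, predictions) if t == p)
    return correct / len(y_true)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and true label (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

# Hypothetical labels (1 = case, 0 = healthy) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]

print(f"accuracy = {accuracy(y_true, y_prob):.2f}")
print(f"brier_score = {brier_score(y_true, y_prob):.3f}")
```

Accuracy only looks at the thresholded class, while the Brier score also penalizes over- or under-confident probabilities, which is why the two are reported side by side.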
Procedia PDF Downloads 161
24773 Democracy Bytes: Interrogating the Exploitation of Data Democracy by Radical Terrorist Organizations
Authors: Nirmala Gopal, Sheetal Bhoola, Audecious Mugwagwa
Abstract:
This paper discusses the continued infringement and exploitation of data by non-state actors for destructive purposes, with an emphasis on radical terrorist organizations. It discusses how terrorist organizations access and use data to further their nefarious agendas. It further examines how cybersecurity, designed as a tool to curb data exploitation, is ineffective in addressing global citizens' concerns about how their data can be kept safe and used for the purpose for which it was acquired. The study interrogates several policies and data protection instruments, such as the Data Protection Act, Cyber Security Policies, Protection of Personal Information (PPI) and General Data Protection Regulations (GDPR), to understand data use and storage in democratic states. The study outcomes indicate that international cybersecurity and cybercrime legislation, policies, and conventions have not curbed violations of data access and use by radical terrorist groups. The study recommends ways to enhance cybersecurity and reduce cyber risks using democratic principles.
Keywords: cybersecurity, data exploitation, terrorist organizations, data democracy
Procedia PDF Downloads 204