Search results for: incomplete data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24315

24165 A Review on Existing Challenges of Data Mining and Future Research Perspectives

Authors: Hema Bhardwaj, D. Srinivasa Rao

Abstract:

Technology for analysing, processing, and extracting meaningful information from enormous and complicated datasets can be termed "big data." Big data mining and analysis are extremely helpful for business activities such as making decisions, building organisational plans, researching the market efficiently, and improving sales, because typical management tools cannot handle such complicated datasets. Big data brings special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations. These unique problems call for new computational and statistical paradigms. This paper offers an overview of the literature on big data mining and its process, along with its problems and difficulties, with a focus on the unique characteristics of big data. Organizations face several difficulties when undertaking data mining, which affects their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is actually analyzed. The study presents recently developed data mining, data analysis, and knowledge discovery techniques together with practical application systems. The conclusion lists issues and difficulties for further research in the area and discusses the main big data and data mining challenges facing management.

Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges

24164 Long-Term Cohort of Patients with Beta Thalassemia; Prevailing Role of Serum Ferritin Levels in Hypocalcemia and Growth Retardation

Authors: Shervin Rashidinia, Sara Shahmoradi, Seyyed Shahin Eftekhari, Mohsen Talebizadeh, Mohammad Saleh Sadeghi

Abstract:

Background: Beta-thalassemia major (BTM) is a hereditary hemolytic anemia that depends on regular monthly blood transfusion. However, iron deposition in the organs leads to multi-organ damage. The present study is the first to evaluate the five-year average serum ferritin level and compare it with the prevalence of short stature and hypocalcemia. Materials/Methods: In this cross-sectional retrospective study, a total of 140 patients with beta-thalassemia who were referred to Qom Thalassemia Clinic between February 2011 and July 2016 were enrolled for review. The exclusion criteria were incomplete medical records, diagnosis less than 2 years ago, and blood transfusion less frequently than every 4 weeks. Data including age, gender, weight, height, age at initial blood transfusion, age at initial chelation therapy, ferritin, and calcium were collected and analyzed with SPSS version 24. Results: A total of 140 patients were enrolled. Of them, 75 (53.4%) were female. The mean age of the patients was 13.4±4.6 years. The mean age at initial diagnosis was 20.2±7.4 months. Hypocalcemia and short stature occurred in 41 (29.3%) and 37 (26.4%) patients, respectively. The mean five-year serum ferritin level was significantly higher in the patients with short stature and hypocalcemia (P<0.0001). Moreover, a rise in serum ferritin level significantly increased the risk of short stature and hypocalcemia (1.0004- and 1.0029-fold, respectively). Conclusion: We demonstrated that the prevalence of short stature and hypocalcemia was significantly higher in BTM. Moreover, ferritin significantly increases the risk of short stature and hypocalcemia.

Keywords: beta-thalassemia, ferritin, growth retardation, hypocalcemia

24163 A Systematic Review on Challenges in Big Data Environment

Authors: Rimmy Yadav, Anmol Preet Kaur

Abstract:

Big Data has demonstrated vast potential for streamlining operations, making decisions, and spotting business trends in different fields, for example, manufacturing, finance, and information technology. This paper gives a multi-disciplinary overview of the research issues in big data and of the procedures, tools, and systems related to privacy, data storage management, network and energy utilization, fault tolerance, and data representation. In addition, the resulting challenges and opportunities in the Big Data platform are discussed.

Keywords: big data, privacy, data management, network and energy consumption

24162 Survey on Big Data Stream Classification by Decision Tree

Authors: Mansoureh Ghiasabadi Farahani, Samira Kalantary, Sara Taghi-Pour, Mahboubeh Shamsi

Abstract:

Nowadays, the development of computer technology and its recent applications provides access to new types of data that have not been considered by traditional data analysts. Two particularly interesting characteristics of such data sets are their huge size and streaming nature. Incremental learning techniques have been used extensively to address the data stream classification problem. This paper presents a concise survey of the obstacles and requirements in classifying data streams using decision trees. The most important issue is to maintain a balance between accuracy and efficiency: the algorithm should provide good classification performance with a reasonable response time.

Keywords: big data, data streams, classification, decision tree

24161 Robust and Dedicated Hybrid Cloud Approach for Secure Authorized Deduplication

Authors: Aishwarya Shekhar, Himanshu Sharma

Abstract:

Data deduplication is an important data compression technique for eliminating duplicate copies of repeating data, and it has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. In this process, duplicate data is expunged, leaving only one copy, that is, a single instance of the data, to be stored. An index of every piece of data is still maintained, however. Data deduplication is an approach for minimizing the storage space an organization requires to retain its data. In most companies, the storage systems carry identical copies of numerous pieces of data. Deduplication eliminates these additional copies by saving just one copy of the data and replacing the other copies with pointers that lead back to the primary copy. To avoid this duplication of data and to preserve confidentiality in the cloud, we apply the concept of the hybrid cloud. A hybrid cloud is a fusion of at least one public and one private cloud. As a proof of concept, we implement Java code that provides security and removes all types of duplicated data from the cloud.
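
The single-copy-plus-pointers idea described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the Java mentioned by the authors, it leaves out the hybrid-cloud security and authorization parts entirely, and the class name `DedupStore` and the file names are hypothetical choices made only for illustration.

```python
import hashlib


class DedupStore:
    """Toy single-instance store: one physical copy per distinct content."""

    def __init__(self):
        self.blocks = {}  # content digest -> bytes (the single stored copy)
        self.index = {}   # file name -> content digest (the "pointer")

    def put(self, name: str, data: bytes) -> bool:
        """Store data under name; return True only if a new copy was written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = data
        # The index entry is kept for every name, even for duplicates.
        self.index[name] = digest
        return is_new

    def get(self, name: str) -> bytes:
        """Follow the pointer from the name back to the stored copy."""
        return self.blocks[self.index[name]]


store = DedupStore()
store.put("a.txt", b"quarterly report")
store.put("b.txt", b"quarterly report")  # duplicate content, no new copy
store.put("c.txt", b"meeting notes")
print(len(store.index), "names,", len(store.blocks), "stored copies")
```

Keying the store by a content hash is one common way to detect duplicates; whether the authors' implementation works at file level or block level is not stated in the abstract.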

Keywords: confidentiality, deduplication, data compression, hybridity of cloud

24160 A Review of Machine Learning for Big Data

Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.

Abstract:

Big data are now rapidly expanding in engineering, science, and many other domains. The potential of large or massive data is undoubtedly significant, and making sense of it requires new ways of thinking and new learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. This paper reviews the latest advances in research on machine learning for big data processing. First, it covers the machine learning techniques in recent studies, such as deep learning, representation learning, transfer learning, active learning, and distributed and parallel learning. It then focuses on the challenges and possible solutions of machine learning for big data.

Keywords: active learning, big data, deep learning, machine learning

24159 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights

Authors: Tomy Prihananto, Damar Apri Sudarmadi

Abstract:

Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management in Indonesia. It is descriptive research with a qualitative approach, collecting data from a literature study. The results are a comprehensive arrangement of data that has been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and on protection of personal data are mutually reinforcing in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.

Keywords: Indonesia, protection, personal data, privacy, human rights, encryption

24158 Valorization of Surveillance Data and Assessment of the Sensitivity of a Surveillance System for an Infectious Disease Using a Capture-Recapture Model

Authors: Jean-Philippe Amat, Timothée Vergne, Aymeric Hans, Bénédicte Ferry, Pascal Hendrikx, Jackie Tapprest, Barbara Dufour, Agnès Leblond

Abstract:

The surveillance of infectious diseases is necessary to describe their occurrence and help the planning, implementation and evaluation of risk mitigation activities. However, the exact number of detected cases may remain unknown when surveillance is based on serological tests, because identifying seroconversion may be difficult. Moreover, incomplete detection of cases or outbreaks is a recurrent issue in the field of disease surveillance. This study addresses these two issues. Using a viral animal disease as an example (equine viral arteritis), the goals were to establish suitable rules for identifying seroconversion in order to estimate the number of cases and outbreaks detected by a surveillance system in France between 2006 and 2013, and to assess the sensitivity of this system by estimating the total number of outbreaks that occurred during this period (including unreported outbreaks) using a capture-recapture model. Data from horses which exhibited at least one positive serology result using a viral neutralization test between 2006 and 2013 were used for analysis (n=1,645). Data consisted of the annual antibody titers and the location of the subjects (towns). A consensus among multidisciplinary experts (specialists in the disease and its laboratory diagnosis, epidemiologists) was reached to consider seroconversion as a change in antibody titer from negative to at least 32 or as a three-fold or greater increase. The number of seroconversions was counted for each town and modeled using a unilist zero-truncated binomial (ZTB) capture-recapture model with R software. The binomial denominator was the number of horses tested in each infected town. Using the defined rules, 239 cases located in 177 towns (outbreaks) were identified from 2006 to 2013. Subsequently, the sensitivity of the surveillance system was estimated as the ratio of the number of detected outbreaks to the total number of outbreaks that occurred (including unreported outbreaks), as estimated using the ZTB model. The total number of outbreaks was estimated at 215 (95% credible interval, CrI95%: 195-249) and the surveillance sensitivity at 82% (CrI95%: 71-91). The rules proposed for identifying seroconversion may serve future research. Such rules, adjusted to the local environment, could conceivably be applied in other countries with surveillance programs dedicated to this disease. More generally, defining ad hoc algorithms for interpreting the antibody titer could be useful for other human and animal diseases and zoonoses when there is a lack of accurate information in the literature about the serological response in naturally infected subjects. This study shows how capture-recapture methods may help to estimate the sensitivity of an imperfect surveillance system and to valorize surveillance data. The sensitivity of the equine viral arteritis surveillance system is relatively high, which supports its relevance for preventing the spread of the disease.
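
The sensitivity figure reported in this abstract follows directly from the ratio it defines (detected outbreaks divided by estimated total outbreaks). The short sketch below reproduces that arithmetic from the numbers given. Dividing the detected count by the two ends of the credible interval for the total is only a rough consistency check assumed here for illustration; the authors presumably derived the sensitivity interval from the fitted ZTB model itself, and the function name `sensitivity` is hypothetical.

```python
def sensitivity(detected: int, estimated_total: float) -> float:
    """Surveillance sensitivity = detected outbreaks / estimated total outbreaks."""
    return detected / estimated_total


detected = 177                       # outbreaks (towns) identified, 2006-2013
point = sensitivity(detected, 215)   # estimated total number of outbreaks
low = sensitivity(detected, 249)     # upper bound on the total gives the lower sensitivity
high = sensitivity(detected, 195)    # lower bound on the total gives the upper sensitivity

print(f"{point:.0%} ({low:.0%}-{high:.0%})")  # prints "82% (71%-91%)"
```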

Keywords: Bayesian inference, capture-recapture, epidemiology, equine viral arteritis, infectious disease, seroconversion, surveillance

24157 Evaluating the Implementation of a Quality Management System in the COVID-19 Diagnostic Laboratory of a Tertiary Care Hospital in Delhi

Authors: Sukriti Sabharwal, Sonali Bhattar, Shikhar Saxena

Abstract:

Introduction: The COVID-19 molecular diagnostic laboratory is the cornerstone of COVID-19 diagnosis, as the patient’s treatment and management protocol depend on the molecular results. It is therefore extremely important that the laboratory performing these tests adheres to quality management processes to increase the accuracy and validity of the reports generated. We started our own molecular diagnostic setup at the onset of the pandemic and conducted this study to generate our quality management data and help us improve our weak points. Materials and Methods: A total of 14561 samples were evaluated by the retrospective observational method. The quality variables analysed were classified into pre-analytical, analytical, and post-analytical variables, and the results were presented as percentages. Results: Among the pre-analytical variables, sample leaking was the most common cause of sample rejection (134/14561, 0.92%), followed by non-generation of SRF ID (76/14561, 0.52%) and non-compliance with triple packaging (44/14561, 0.3%). The other pre-analytical aspects assessed were incomplete patient identification (17/14561, 0.11%), insufficient quantity of samples (12/14561, 0.08%), missing forms/samples (7/14561, 0.04%), samples in the wrong vials/empty VTM tubes (5/14561, 0.03%) and LIMS entry not done (2/14561, 0.01%). We were unable to obtain internal quality control in 0.37% of samples (55/14561). We also experienced two incidents of cross-contamination among the samples, resulting in false-positive results. Among the post-analytical factors, a total of 0.07% of samples (11/14561) could not be dispatched within the stipulated time frame. Conclusion: Adherence to quality control processes is foremost for the smooth running of any diagnostic laboratory, especially those involved in critical reporting. Not only do the indicators help keep the laboratory parameters in check, but they also allow comparison with other laboratories.

Keywords: laboratory quality management, COVID-19, molecular diagnostics, healthcare

24156 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and explain why a custom federated approach was the recommendation taken forward to implementation. Although the data warehouses are logically federated, the implementation uses a single database system, which presented many advantages, such as cost reduction and improved data access for global users, giving consumers of the data a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: data integration, data warehousing, federated architecture, Online Analytical Processing (OLAP)

24155 Effects of Using a Recurrent Adverse Drug Reaction Prevention Program on Safe Use of Medicine among Patients Receiving Services at the Accident and Emergency Department of Songkhla Hospital Thailand

Authors: Thippharat Wongsilarat, Parichat tuntilanon, Chonlakan Prataksitorn

Abstract:

Recurrent adverse drug reactions are harmful to patients with mild to fatal illnesses, and affect not only patients but also their relatives and organizations. This study aimed to compare safe use of medicine among patients before and after using the recurrent adverse drug reaction prevention program. It was quasi-experimental research with a target population of 598 patients with a drug allergy history. Data were collected through an observation form tested for its validity by three experts (IOC = 0.87), and analyzed with a descriptive statistic (percentage). The research was conducted jointly with a multidisciplinary team to analyze and determine the weak points and strong points in the recurrent adverse drug reaction prevention system during the past three years, and 546, 329, and 498 incidences, respectively, were found. Of these, 379, 279, and 302 incidences, or 69.4, 84.80, and 60.64 percent of the patients with drug allergy history, respectively, were found to have been caused by an incomplete warning system. In addition, differences were found in the practice of caring for patients with drug allergy history, which did not cover all the steps of the patient care process, especially a lack of repeated checking and a lack of communication between the multidisciplinary team members. Therefore, the recurrent adverse drug reaction prevention program was developed with complete warning points in the information technology system, a repeated checking step, and communication among related multidisciplinary team members, starting from the hospital identity card room and continuing through patient history recording officers, nurses, physicians who prescribe the drugs, and pharmacists. Included in the system were surveillance, nursing, recording, and linking the data to referring units. There was also training on adverse drug reactions by pharmacists, monthly meetings to explain the process to practice personnel, creation of a safety culture, random checking of practice, motivational encouragement, and supervising, controlling, following up, and evaluating the practice. The rate of prescribing drugs to which patients were allergic per 1,000 prescriptions was 0.08, and the incidence rate of recurrent drug reaction per 1,000 prescriptions was 0. Surveillance of recurrent adverse drug reactions covering all service providing points can ensure safe use of medicine for patients.

Keywords: recurrent drug, adverse reaction, safety, use of medicine

24154 Effect of Internet Addiction on Dietary Behavior and Lifestyle Characteristics among University Students

Authors: Hafsa Kamran, Asma Afreen, Zaheer Ahmed

Abstract:

Internet addiction, a mental health disorder that has emerged over the last two decades, is manifested by the inability to control internet use, leading to academic, social, physiological and/or psychological difficulties. The present study aimed to assess the levels of internet addiction among university students in Lahore and to explore the effects of internet addiction on their dietary behavior and lifestyle. It was an analytical cross-sectional study. Data were collected from October to December 2016 from students of four universities selected through a two-stage sampling method. There were 500 participants, and 13 questionnaires were rejected due to incomplete information. Levels of Internet Addiction (IA) were calculated using the Young Internet Addiction Test (YIAT). Data were also collected on students’ demographics, lifestyle factors and dietary behavior using a self-reported questionnaire. Data were analyzed using SPSS (version 21). The chi-square test was applied to evaluate the relationship between variables. Results of the study revealed that 10% of the population had severe internet addiction, while moderate internet addiction was present in 42%. Higher prevalence was found among males (11% vs. 8%), private sector university students (p = 0.008) and engineering students (p < 0.001). The lifestyle habits of internet addicts were of significantly poorer quality than those of normal users (p = 0.05). Internet addicts were found to be less physically active (p = 0.025), to have a shorter duration of physical activity (p = 0.016), a more disorganized sleep pattern (p = 0.023) and a shorter duration of sleep (p = 0.019), to report being more tired and sleepy in class (p = 0.033), and to spend more time on the internet than normal users. Severe and moderate internet addicts were also found to be more overweight and obese than normal users (p < 0.001). The dietary behavior of internet addicts was significantly poorer than that of normal users. Internet addicts were found to skip breakfast more than normal users (p = 0.039). Common reasons for meal skipping were lack of time and snacking between meals (p < 0.001). They also had increased meal size (p = 0.05) and a habit of snacking while using the internet (p = 0.027). Fast food (p = 0.016) and fried items (p = 0.05) were the most consumed snacks, while carbonated beverages (p = 0.019) were the most consumed beverages among internet addicts. Internet addicts were found to consume less than the recommended daily servings of dairy (p = 0.008) and fruits (p < 0.001) and more servings of the meat group (p = 0.025) than their non-addicted counterparts. In conclusion, this study demonstrated that internet addicts have unhealthy dietary behavior and inappropriate lifestyle habits. University students should be educated regarding the importance of a balanced diet and healthy lifestyle, which are critical for effective primary prevention of numerous chronic degenerative diseases. Furthermore, it is necessary to raise awareness of the adverse effects of internet addiction among youth and their parents.

Keywords: dietary behavior, internet addiction, lifestyle, university students

24153 A Review Paper on Data Mining and Genetic Algorithm

Authors: Sikander Singh Cheema, Jasmeen Kaur

Abstract:

In this paper, the concept of data mining is summarized, along with one of its important processes, knowledge discovery in databases (KDD). Data mining based on genetic algorithms is examined, and ways to implement it are surveyed. This paper also conducts a formal review of data mining tasks and genetic algorithms in various fields.

Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining

24152 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring

Authors: Seung-Lock Seo

Abstract:

This work presents a data-mining empirical monitoring scheme for industrial processes with partially unbalanced data. Measurement data from good operations are relatively easy to gather, but during unusual special events or faults it is generally difficult to collect process information, and some noisy industrial process data are almost impossible to analyze. In such cases, noise filtering techniques can be used to enhance process monitoring performance on a real-time basis. In addition, pre-processing of raw process data helps eliminate unwanted variation in industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated for discrete batch process data. The results showed that monitoring performance improved significantly in terms of the monitoring success rate for given process faults.

Keywords: data mining, process data, monitoring, safety, industrial processes

24151 Effect of Globalization on Flow Performance in Godean Jathilan Pranesa Yogyakarta

Authors: Maria Armalita Tumimbang

Abstract:

Jathilan or Kuda Lumping is a dance-drama with warfare as the main theme, in which the dancers mimic mighty horsemen armed with swords in the middle of the battlefield. However, to most people this dance-drama is more closely identified with magically nuanced dance and trance, besides the attractive and even dangerous acts of the dancers, such as eating shards or broken glass in a state of trance. Several music players play the accompaniment on an incomplete gamelan set that includes saron, kendang, gong, and kempul. In general, it remains unchanged with regard to the seemingly monotonous beat and occasional “bumps” that may lead the dancers into a trance state. The dances performed also tend to follow repetitive patterns. The development of Jathilan and other traditional art performances in this era of globalization and industrialization can be divided into two: firstly, they are subjected to the power of industrialization, which means their performances are recorded for commercial purposes, and secondly, they are presented in live performances. To some people, live performances are preferable, and for some reasons, they represent a form of cultural resistance to globalization and industrialization. The present study is qualitative in nature. It aims to describe the music and performance of Jathilan in the era of globalization in Indonesia. The subject of this study is a traditional art group, Jathilan Kuda Pranesa of Godean, Yogyakarta. Data collection was conducted by interviews with the leader of the group, the dancers and music players, as well as the audience. The wave of globalization has brought strong capitalistic industrialization that renders traditional arts simply into industrial commodities tailored to the needs of the era. This very fact has made the repositioning of the traditional art performance of Jathilan a necessity. By repositioning we mean that Jathilan should be put back into its traditional forms and functions as they used to be.

Keywords: Jathilan, globalization, industrialization, music, performance

24150 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances in computer science and data analysis are helping to continuously provide huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, focusing on those developed in the ontology community.

Keywords: biological ontology, linked data, semantic data integration, semantic web

24149 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture

Authors: Thrivikraman Aswathi, S. Advaith

Abstract:

When the use of real data is limited, for example when it is hard to get access to a large volume of real data, synthetic data generation is needed. It produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data, since MTS data are now being gathered more often in various real-world systems. The GAN-generated MTS data is then fed into a transformer-based deep learning architecture that categorizes the data into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.

Keywords: GAN, transformer, classification, multivariate time series

24148 Generative AI: A Comparison of Conditional Tabular Generative Adversarial Networks and Conditional Tabular Generative Adversarial Networks with Gaussian Copula in Generating Synthetic Data with Synthetic Data Vault

Authors: Lakshmi Prayaga, Chandra Prayaga, Aaron Wade, Gopi Shankar Mallu, Harsha Satya Pola

Abstract:

Synthetic data generated by Generative Adversarial Networks and Autoencoders is becoming more common to combat the problem of insufficient data for research purposes. However, generating synthetic data is a tedious task requiring extensive mathematical and programming background. Open-source platforms such as the Synthetic Data Vault (SDV) and Mostly AI have offered a platform that is user-friendly and accessible to non-technical professionals to generate synthetic data to augment existing data for further analysis. The SDV also provides for additions to the generic GAN, such as the Gaussian copula. We present the results from two synthetic data sets (CTGAN data and CTGAN with Gaussian Copula) generated by the SDV and report the findings. The results indicate that the ROC and AUC curves for the data generated by adding the layer of Gaussian copula are much higher than the data generated by the CTGAN.

Keywords: synthetic data generation, generative adversarial networks, conditional tabular GAN, Gaussian copula

24147 Topic-to-Essay Generation with Event Element Constraints

Authors: Yufen Qin

Abstract:

Topic-to-essay generation is a challenging task in natural language processing, which aims to generate novel, diverse, and topic-related text based on user input. Previous research has overlooked the generation of articles under the constraints of event elements, resulting in issues such as incomplete event elements and logical inconsistencies in the generated results. To fill this gap, this paper proposes an event-constrained approach to topic-to-essay generation that enforces the completeness of event elements during the generation process. Additionally, a language model is employed to verify the logical consistency of the generated results. Experimental results demonstrate that the proposed model achieves a better BLEU-2 score and performs better than the baseline in terms of subjective evaluation on a real dataset, indicating its capability to generate higher-quality topic-related text.

Keywords: event element, language model, natural language processing, topic-to-essay generation

24146 A Privacy Protection Scheme Supporting Fuzzy Search for NDN Routing Cache Data Name

Authors: Feng Tao, Ma Jing, Guo Xian, Wang Jing

Abstract:

Named Data Networking (NDN) replaces the IP address of traditional networks with a data name and adopts a dynamic cache mechanism. In the existing mechanism, however, only one-to-one search can be achieved because every piece of data has a unique name corresponding to it. There is a certain mapping relationship between data content and data name, so if the data name is intercepted by an adversary, the privacy of the data content and the user’s interest can hardly be guaranteed. To solve this problem, this paper proposes a one-to-many fuzzy search scheme based on order-preserving encryption that reduces the query overhead by optimizing the caching strategy. In this scheme, we use hash values to keep the user’s query safe from each node during the search, and likewise the privacy of the requested data content.

Keywords: NDN, order-preserving encryption, fuzzy search, privacy

24145 From the Recursive Definition of Refutability to the Invalidity of Gödel’s 1931 Incompleteness

Authors: Paola Cattabriga

Abstract:

According to Gödel’s first incompleteness argument, it is possible to construct a formally undecidable proposition in Principia Mathematica: a statement that, although true, turns out to be neither provable nor refutable within the system, thereby making any formal system suitable for the arithmetic of integers incomplete. Its features and limiting effects are today widely accepted basics throughout scientific thought. This article brings Gödel’s achievement into question by defining the refutability predicate as a number-theoretical statement. We develop a proof of the invalidity of Theorem VI in Gödel’s 1931, the so-called Gödel’s first incompleteness theorem, in two steps: defining refutability within the same recursive status as provability, and showing that, as a consequence, propositions (15) and (16), derived from definition 8.1 in Gödel’s 1931, are false and unacceptable for the system. Establishing their falsity blocks the derivation of Theorem VI, which therefore turns out to be invalid, together with all the theorems that depend on it. This article thus opens up new perspectives for mathematical research and for scientific reasoning overall.

Keywords: Gödel numbering, incompleteness, provability predicate, refutability predicate

Procedia PDF Downloads 156
24144 Healthcare Big Data Analytics Using Hadoop

Authors: Chellammal Surianarayanan

Abstract:

The healthcare industry is generating large amounts of data driven by various needs such as record keeping, physicians’ prescriptions, medical imaging, sensor data, Electronic Patient Records (EPR), laboratory, pharmacy, etc. Healthcare data is so big and complex that it cannot be managed by conventional hardware and software. The complexity of healthcare big data arises from the large volume of data, the velocity with which the data is accumulated, and its variety, which spans structured, semi-structured and unstructured data. Despite this complexity, if the trends and patterns that exist within the big data are uncovered and analyzed, higher-quality healthcare can be provided at lower cost. Hadoop is an open source software framework for distributed processing of large data sets across clusters of commodity hardware using a simple programming model. The core components of Hadoop include the Hadoop Distributed File System, which offers a way to store large amounts of data across multiple machines, and MapReduce, which offers a way to process large data sets with a parallel, distributed algorithm on a cluster. The Hadoop ecosystem also includes various other tools such as Hive (a SQL-like query language), Pig (a higher-level query language for MapReduce), HBase (a columnar data store), etc. This paper analyzes how healthcare big data can be processed and analyzed using the Hadoop ecosystem.
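The MapReduce programming model mentioned in this abstract can be illustrated with a small single-process sketch. It does not use Hadoop itself and is not taken from the paper: the record layout, the field names `patient` and `diagnoses`, and the helper names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are all hypothetical, chosen only to show the map, shuffle and reduce steps on toy healthcare records.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit one (diagnosis, 1) pair per diagnosis in a record."""
    for code in record["diagnoses"]:
        yield code, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence (in one process, for illustration)."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical toy records; a real cluster would read these from HDFS.
records = [
    {"patient": "p1", "diagnoses": ["diabetes", "hypertension"]},
    {"patient": "p2", "diagnoses": ["hypertension"]},
    {"patient": "p3", "diagnoses": ["asthma", "diabetes"]},
]
counts = run_job(records)
```

On a real cluster the map and reduce functions would run in parallel on different machines, with the framework handling the shuffle; the sketch only shows the data flow between the three steps.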

Keywords: big data analytics, Hadoop, healthcare data, towards quality healthcare

Procedia PDF Downloads 380
24143 Generating Music with More Refined Emotions

Authors: Shao-Di Feng, Von-Wun Soo

Abstract:

Generating symbolic music with specific emotions is a challenging task because symbolic music datasets with emotion labels are scarce and incomplete. This research aims to generate more refined emotions based on training datasets that are labeled only with the four quadrants of Russell’s 2D emotion model. We build on the theory of Music Fadernet, map arousal and valence to low-level attributes, and build a symbolic music generation model by combining a transformer with GM-VAE. We adopt an in-attention mechanism for the model and improve it by allowing modulation by conditional information. We show that the music generation model can control the generation of music according to the emotions specified by users, both in terms of high-level linguistic expressions and by manipulating the corresponding low-level musical attributes. Finally, we evaluate the model’s performance using a pre-trained emotion classifier against a pop piano MIDI dataset called EMOPIA, and through subjective listening evaluation we demonstrate that the model can correctly generate music with more refined emotions.

Keywords: music generation, music emotion controlling, deep learning, semi-supervised learning

Procedia PDF Downloads 56
24142 Big Data and Analytics in Higher Education: An Assessment of Its Status, Relevance and Future in the Republic of the Philippines

Authors: Byron Joseph A. Hallar, Annjeannette Alain D. Galang, Maria Visitacion N. Gumabay

Abstract:

One of the unique challenges posed by the twenty-first century to Philippine higher education is the utilization of Big Data. The higher education system in the Philippines is generating burgeoning amounts of data, some of which can be used to generate the information and knowledge needed for accurate data-driven decision making. This study examines the status, relevance and future of Big Data and Analytics in Philippine higher education. The insights gained from the study may be relevant to other developing nations situated similarly to the Philippines.

Keywords: big data, data analytics, higher education, Republic of the Philippines, assessment

Procedia PDF Downloads 309
24141 Data Management and Analytics for Intelligent Grid

Authors: G. Julius P. Roy, Prateek Saxena, Sanjeev Singh

Abstract:

Two decades ago, power distribution utilities collected data from their customers no more often than once a month. The advent of SmartGrid and AMI has since increased the sampling frequency, leading to a 1,000- to 10,000-fold increase in data quantity. This increase is notable and led to the adoption of the term Big Data in utilities. The power distribution industry is one of the largest handlers of huge and complex data, both for keeping history and for turning the data into something significant. The majority of utilities around the globe are adopting SmartGrid technologies at scale and are primarily focusing on the strategic interdependence and synergies of the big data coming from new information sources such as AMI and intelligent SCADA. There is a rising need for new models of data management and a renewed focus on analytics to dissect data into descriptive, predictive and dictatorial subsets. The goal of this paper is to bring load disaggregation into the smart energy toolkit for commercial usage.

Keywords: data management, analytics, energy data analytics, smart grid, smart utilities

Procedia PDF Downloads 753
24140 Privacy Preserving Data Publishing Based on Sensitivity in Context of Big Data Using Hive

Authors: P. Srinivasa Rao, K. Venkatesh Sharma, G. Sadhya Devi, V. Nagesh

Abstract:

Privacy Preserving Data Publication is a main concern today because the amount of data published through the internet is increasing day by day. This huge amount of data has been named Big Data because of its size. This project deals with privacy preservation in the context of Big Data using a data warehousing solution called Hive. We implemented Nearest Similarity Based Clustering (NSB) with bottom-up generalization to achieve (v,l)-anonymity. (v,l)-Anonymity deals with sensitivity vulnerabilities and ensures individual privacy. We also calculate the sensitivity levels by a simple comparison method using the index values, classifying the different levels of sensitivity. The experiments were carried out in the Hive environment to verify the efficiency of the algorithms with Big Data. This framework also supports the execution of existing algorithms without any changes. The model in the paper outperforms existing models.

Keywords: sensitivity, sensitive level, clustering, Privacy Preserving Data Publication (PPDP), bottom-up generalization, Big Data

Procedia PDF Downloads 263
24139 A Review of Current Practices in Tattooing of Colonic Lesion at Endoscopy

Authors: Dhanashree Moghe, Roberta Bullingham, Rizwan Ahmed, Tarun Singhal

Abstract:

Aim: The NHS Bowel Screening Programme recommends the use of endoscopic tattooing for suspected malignant lesions that later require surgical or endoscopic localisation, using local protocols as guidance. This is in accordance with guidance from the BSG (the British Society of Gastroenterology). We used a well-recognised local protocol as a standard to audit current tattooing practice in a large district general hospital with no current local guidelines. Method: We performed a retrospective quantitative analysis of 50 patients who underwent segmental colonic resection for cancer over a 6-month period in 2021. We reviewed historic electronic endoscopy reports, recording relevant data on tattoo indication and placement. Secondly, we carried out an anonymous survey of 16 independent lower GI endoscopists on self-reported details of their practice. Results: In our study, 28 patients (56%) had a tattoo placed at the time of their colonoscopy. Of these, only 53% (n=15) had the tattoo distal to the lesion, with the measured distance of the tattoo from the lesion being documented in only 8 reports. Only seven patients (25%) had a circumferential (4 quadrant) placement of the tattoo. 13 patients had lesions in either the caecum or the rectum, locations where tattooing is deemed unnecessary as per BSG guidelines. Among the survey responses collected, four different protocols were being used to guide practice. Only 50% of respondents placed tattoos at the correct distance from the lesion, and 83% placed the correct number of tattoos. Conclusion: Our study demonstrates a lack of standardisation of practice in colonic tattooing, with incomplete compliance with our standard. The inadequate documentation of tattoo location can contribute to confusion and inaccuracy in the intraoperative localisation of lesions. This has the potential to increase operation length and morbidity. There is a need to standardise both technique and documentation in colonoscopic tattooing practice.

Keywords: colorectal cancer, endoscopic tattooing, colonoscopy, NHS BSCP

Procedia PDF Downloads 87
24138 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects

Authors: Behnam Tavakkol

Abstract:

Uncertain data mining algorithms account for uncertainty in data in different ways, for example by representing a data object as a sample of points or as a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, apart from some uncertain fuzzy k-medoids algorithms, few fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.
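The abstract does not give the algorithm's details, so the following is only a generic sketch of one ingredient such a method might use: computing non-crisp (fuzzy) memberships from kernel-induced distances between uncertain objects. Everything here is an assumption made for illustration and not the author's method: each uncertain object is represented as a sample of points, the object-level kernel is the average pairwise RBF kernel, the membership formula is the standard fuzzy c-means style update, and the names `rbf`, `object_kernel`, `kernel_dist2` and `memberships` are hypothetical.

```python
import math


def rbf(a, b, gamma=1.0):
    """RBF kernel between two points given as equal-length tuples."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def object_kernel(A, B, gamma=1.0):
    """Assumed kernel between two uncertain objects: mean pairwise RBF kernel."""
    return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))


def kernel_dist2(A, B, gamma=1.0):
    """Squared distance induced by the kernel: K(A,A) - 2K(A,B) + K(B,B)."""
    return (object_kernel(A, A, gamma)
            - 2.0 * object_kernel(A, B, gamma)
            + object_kernel(B, B, gamma))


def memberships(obj, medoids, m=2.0, gamma=1.0, eps=1e-12):
    """Fuzzy membership of one object in each cluster, given the medoid objects.

    Uses the fuzzy c-means style rule u_i proportional to (1/d_i^2)^(1/(m-1)),
    where m > 1 is the fuzzifier; eps guards against a zero distance.
    """
    d2 = [max(kernel_dist2(obj, med, gamma), eps) for med in medoids]
    weights = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical uncertain objects, each a small sample of 2-D points.
medoid_a = [(0.0, 0.0), (0.1, 0.0)]
medoid_b = [(3.0, 3.0), (3.1, 2.9)]
query = [(0.2, 0.1), (0.0, 0.2)]
u = memberships(query, [medoid_a, medoid_b])
```

The memberships sum to one, and the query object, whose samples lie near `medoid_a`, receives most of its membership in the first cluster. A full algorithm would alternate this step with a medoid update.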

Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data

Procedia PDF Downloads 177
24137 Cadaveric Assessment of Kidney Dimensions Among Nigerians - A Preliminary Report

Authors: Rotimi Sunday Ajani, Omowumi Femi-Akinlosotu

Abstract:

Background: The usually paired human kidneys are retroperitoneal urinary organs with some endocrine functions. Standard anatomy textbooks ascribe a single value to each of the dimensions of length, width and thickness. Research questions: These values do not take into account racial and genetic variability in human morphology. They may thus be misleading to students and clinicians working with Nigerians. Objectives: The study aimed to establish reference values of kidney length, width and thickness for Nigerians using the cadaveric model. Methodology: The length, width, thickness and weight of sixty kidneys harvested from the cadavers of thirty adult Nigerians (male:female, 27:3) were measured. The volume of each kidney was calculated using the ellipsoid formula. Results: The mean length of the kidney was 9.84±0.89 cm (9.63±0.88 {right}; 10.06±0.86 {left}), width 5.18±0.70 cm (5.21±0.72 {right}; 5.14±0.70 {left}), thickness 3.45±0.56 cm (3.36±0.58 {right}; 3.53±0.55 {left}), weight 125.06±22.34 g (122.36±21.70 {right}; 127.76±24.02 {left}) and volume 95.45±24.40 cm³ (91.73±26.84 {right}; 99.17±25.75 {left}). Discussion: Though the values of the parameters measured were higher for the left kidney (except for the width), the differences were not statistically significant. The various parameters obtained by this study differ from those of similar studies from other continents. Conclusion: Stating a single value for each of the parameters of length, width and thickness of the kidney, as is currently done in textbooks of anatomy, may give incomplete and hence misleading information. Thus, there is a need to emphasize racial differences when stating the normal values of kidney dimensions in textbooks of anatomy. Implication for Research and Innovation: The results of the study showed that the dimensions of the kidney (length, width and thickness) vary between races, as they were different from those of similar studies and from the values stated in standard textbooks of human anatomy.
Future direction: This is a preliminary report and the study will continue so that more data will be obtained.
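The volume calculation mentioned under Methodology can be sketched as follows. The abstract does not spell out the formula, so this assumes the commonly used ellipsoid approximation, volume = π/6 × length × width × thickness; the function name `ellipsoid_volume` is hypothetical.

```python
import math


def ellipsoid_volume(length_cm, width_cm, thickness_cm):
    """Ellipsoid approximation (assumed form): V = pi/6 * L * W * T, in cm^3."""
    return math.pi / 6.0 * length_cm * width_cm * thickness_cm


# Plugging in the mean dimensions reported in the abstract.
volume_from_means = ellipsoid_volume(9.84, 5.18, 3.45)
```

Under this assumption the mean dimensions give roughly 92 cm³. This need not equal the reported mean volume of 95.45 cm³, because the average of per-kidney volumes is generally not the same as the volume computed from averaged dimensions.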

Keywords: kidney dimensions, cadaveric estimation, adult Nigerians, racial differences

Procedia PDF Downloads 64
24136 Democracy Bytes: Interrogating the Exploitation of Data Democracy by Radical Terrorist Organizations

Authors: Nirmala Gopal, Sheetal Bhoola, Audecious Mugwagwa

Abstract:

This paper discusses the continued infringement and exploitation of data by non-state actors for destructive purposes, with an emphasis on radical terrorist organizations. It discusses how terrorist organizations access and use data to foster their nefarious agendas. It further examines how cybersecurity, designed as a tool to curb data exploitation, is ineffective in addressing global citizens' concerns about how their data can be kept safe and used for the purpose for which it was acquired. The study interrogates several policies and data protection instruments, such as the Data Protection Act, Cyber Security Policies, Protection of Personal Information (PPI) and the General Data Protection Regulation (GDPR), to understand data use and storage in democratic states. The study outcomes indicate that international cybersecurity and cybercrime legislation, policies, and conventions have not curbed violations of data access and use by radical terrorist groups. The study recommends ways to enhance cybersecurity and reduce cyber risks using democratic principles.

Keywords: cybersecurity, data exploitation, terrorist organizations, data democracy

Procedia PDF Downloads 169