Search results for: semantic data
25066 Robust and Dedicated Hybrid Cloud Approach for Secure Authorized Deduplication
Authors: Aishwarya Shekhar, Himanshu Sharma
Abstract:
Data deduplication is one of important data compression techniques for eliminating duplicate copies of repeating data, and has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. In this process, duplicate data is expunged, leaving only one copy means single instance of the data to be accumulated. Though, indexing of each and every data is still maintained. Data deduplication is an approach for minimizing the part of storage space an organization required to retain its data. In most of the company, the storage systems carry identical copies of numerous pieces of data. Deduplication terminates these additional copies by saving just one copy of the data and exchanging the other copies with pointers that assist back to the primary copy. To ignore this duplication of the data and to preserve the confidentiality in the cloud here we are applying the concept of hybrid nature of cloud. A hybrid cloud is a fusion of minimally one public and private cloud. As a proof of concept, we implement a java code which provides security as well as removes all types of duplicated data from the cloud.Keywords: confidentiality, deduplication, data compression, hybridity of cloud
Procedia PDF Downloads 38325065 A Review of Machine Learning for Big Data
Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.
Abstract:
Big data are now rapidly expanding in all engineering and science and many other domains. The potential of large or massive data is undoubtedly significant, make sense to require new ways of thinking and learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. In this paper, the latest advances and advancements in the researches on machine learning for big data processing. First, the machine learning techniques methods in recent studies, such as deep learning, representation learning, transfer learning, active learning and distributed and parallel learning. Then focus on the challenges and possible solutions of machine learning for big data.Keywords: active learning, big data, deep learning, machine learning
Procedia PDF Downloads 44625064 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights
Authors: Tomy Prihananto, Damar Apri Sudarmadi
Abstract:
Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management and data management in Indonesia. This paper is a descriptive research with qualitative approach and collecting data from literature study. Results of this paper are comprehensive arrangement of data that have been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and protection of personal data are mutually reinforcing arrangements in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.Keywords: Indonesia, protection, personal data, privacy, human rights, encryption
Procedia PDF Downloads 18325063 True and False Cognates of Japanese, Chinese and Philippine Languages: A Contrastive Analysis
Authors: Jose Marie E. Ocdenaria, Riceli C. Mendoza
Abstract:
Culturally, languages meet, merge, share, exchange, appropriate, donate, and divide in and to and from each other. Further, this type of recurrence manifests in East Asian cultures, where language influence diffuses across geographical proximities. Historically, China has notable impacts on Japan’s culture. For instance, Japanese borrowed words from China and their way of reading and writing. This qualitative and descriptive employing contrastive analysis study addressed the true and false cognates of Japanese-Philippine languages and Chinese-Philippine languages. It involved a rich collection of data from various sources like textual pieces of evidence or corpora to gain a deeper understanding of true and false cognates between L1 and L2. Cognates of Japanese-Philippine languages and Chinese-Philippine languages were analyzed contrastively according to orthography, phonology, and semantics. The words presented were the roots; however, derivatives, reduplications, and variants of stress were included when they shed emphases on the comparison. The basis of grouping the cognates was its phonetic-semantic resemblance. Based on the analysis, it revealed that there are words which may have several types of lexical relationship. Further, the study revealed that the Japanese language has more false cognates in the Philippine languages, particularly in Tagalog and Cebuano. On the other hand, there are more true cognates of Chinese in Tagalog. It is the hope of this study to provide a significant contribution to a diverse audience. These include the teachers and learners of foreign languages such as Japanese and Chinese, future researchers and investigators, applied linguists, curricular theorists, community, and publishers.Keywords: Contrastive Analysis, Japanese, Chinese and Philippine languages, Qualitative and descriptive study, True and False Cognates
Procedia PDF Downloads 13725062 The Various Legal Dimensions of Genomic Data
Authors: Amy Gooden
Abstract:
When human genomic data is considered, this is often done through only one dimension of the law, or the interplay between the various dimensions is not considered, thus providing an incomplete picture of the legal framework. This research considers and analyzes the various dimensions in South African law applicable to genomic sequence data – including property rights, personality rights, and intellectual property rights. The effective use of personal genomic sequence data requires the acknowledgement and harmonization of the rights applicable to such data.Keywords: artificial intelligence, data, law, genomics, rights
Procedia PDF Downloads 13825061 Big Brain: A Single Database System for a Federated Data Warehouse Architecture
Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf
Abstract:
Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.Keywords: data integration, data warehousing, federated architecture, Online Analytical Processing (OLAP)
Procedia PDF Downloads 23625060 Results and Insights from a Developmental Psychology Study on the Presentation of Juvenility in Punk Fanzines
Authors: Marc Dietrich
Abstract:
Youth cultures like Punk as much as media relevant to the specific scenes associated with them offer ample opportunity for young people or juvenile adults to construct their personal identities. However, developmental psychology has largely neglected such identity construction processes during the last decades. Such was not always the case: Early developmental psychologists intensely studied youth cultures and their meaningful objects and media in the early 20th century but lost interest when cultural studies and the social sciences occupied the field after World War II. Our project Constructions of Juvenility and Generation(ality), funded by the German Federal Ministry for Education and Research, reintegrates the study of youth cultures and their meaningful objects and media in a developmental psychology perspective. We present an empirical study of the ways in which youth, juvenility, and generation (ality) are constructed and negotiated in underground media like punk fanzines (a portmanteau of fan and magazine), including both semantic and aesthetic aspects of these construction processes within punk culture. The fanzine sample was accessed by the theoretical sampling strategy typical for GTM studies. Acknowledging fanzines as artful self-produced media by scene members for scene members, we conceptualize them as authentic documents of scene norms and values. Drawing on an analysis of both text and (cover) images in Punk fanzines published in Germany (and within a sample dating from 1981 until 2015) using a novel Visual Grounded Theory approach, we found that: a) Juvenility is a highly contested concept in punk culture. Its semantic quality and valuation varies with the perspectives present within the culture (e.g. embryo punks versus older punks); b) Juvenility is constructed as having energy and being socio-critical that does not depend on biological age; c) Juvenility is regarded not an ideal per se in German Punk culture; Punk culture constructs old age in a largely positive way (e.g., as marker of being real and a historical innovator); d) Juvenility is constructed as a habit that should be kept for life as it is constantly adapted to individual biographical trajectories like specific job situations or having a family. Consequently, identity negotiation as documented in the zines attempts to balance subculturally driven perspectives on life and society with the pragmatic requirements of a bourgeois life. The proposed paper will present the main results of this large-scale study of punk fanzines and show how developmental psychology perspectives as represented in the novel methodology applied in it can advance the study of youth cultures.Keywords: construction of juvenility, developmental psychology, visual GTM, youth culture, fanzines
Procedia PDF Downloads 29225059 A Case of Generalized Anxiety Disorder (GAD)
Authors: Muhammad Zeeshan
Abstract:
This case study is about a 54 years man named Mr. U, referred to Capital Hospital, Islamabad, with the presenting complaints of Generalized Anxiety Disorder (GAD). Contrary to his complaints, the client reported psychological symptoms such as restlessness, low mood and fear of darkness and fear from closed places from the last 30 days. He also had a fear of death and his existence in the grave. His sleep was also disturbed due to excessive urination due to diabetes. He was also suffering from semantic symptoms such as headache, numbness of feet and pain in the chest and blockage of the nose. A complete history was taken and informal assessment (clinical interview and MSE) and formal testing (BAI) was applied that showed the clear diagnosis of Generalized Anxiety Disorder. CBT, relaxation techniques, prayer chart and behavioural techniques were applied for the treatment purposes.Keywords: generalized anxiety disorder, presenting complaints, formal and informal assessment, diagnosis
Procedia PDF Downloads 28525058 A Review Paper on Data Mining and Genetic Algorithm
Authors: Sikander Singh Cheema, Jasmeen Kaur
Abstract:
In this paper, the concept of data mining is summarized and its one of the important process i.e KDD is summarized. The data mining based on Genetic Algorithm is researched in and ways to achieve the data mining Genetic Algorithm are surveyed. This paper also conducts a formal review on the area of data mining tasks and genetic algorithm in various fields.Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining
Procedia PDF Downloads 59125057 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring
Authors: Seung-Lock Seo
Abstract:
This work presents a data-mining empirical monitoring scheme for industrial processes with partially unbalanced data. Measurement data of good operations are relatively easy to gather, but in unusual special events or faults it is generally difficult to collect process information or almost impossible to analyze some noisy data of industrial processes. At this time some noise filtering techniques can be used to enhance process monitoring performance in a real-time basis. In addition, pre-processing of raw process data is helpful to eliminate unwanted variation of industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated for discrete batch process data. It showed that the monitoring performance was improved significantly in terms of monitoring success rate of given process faults.Keywords: data mining, process data, monitoring, safety, industrial processes
Procedia PDF Downloads 40125056 Multidisciplinary Approach to Diagnosis of Primary Progressive Aphasia in a Younger Middle Aged Patient
Authors: Robert Krause
Abstract:
Primary progressive aphasia (PPA) is a neurodegenerative disease similar to frontotemporal and semantic dementia, while having a different clinical image and anatomic pathology topography. Nonetheless, they are often included under an umbrella term: frontotemporal lobar degeneration (FTLD). In the study, examples of diagnosing PPA are presented through the multidisciplinary lens of specialists from different fields (neurologists, psychiatrists, clinical speech therapists, clinical neuropsychologists and others) using a variety of diagnostic tools such as MR, PET/CT, genetic screening and neuropsychological and logopedic methods. Thanks to that, specialists can get a better and clearer understanding of PPA diagnosis. The study summarizes the concrete procedures and results of different specialists while diagnosing PPA in a patient of younger middle age and illustrates the importance of multidisciplinary approach to differential diagnosis of PPA.Keywords: primary progressive aphasia, etiology, diagnosis, younger middle age
Procedia PDF Downloads 19525055 Methodological Resolutions for Definition Problems in Turkish Navigation Terminology
Authors: Ayşe Yurdakul, Eckehard Schnieder
Abstract:
Nowadays, there are multilingual and multidisciplinary communication problems because of the increasing technical progress. Each technical field has its own specific terminology and in each particular language, there are differences in relation to definitions of terms. Besides, there could be several translations in the certain target language for one term of the source language. First of all, these problems of semantic relations between terms include the synonymy, antonymy, hypernymy/hyponymy, ambiguity, risk of confusion and translation problems. Therefore, the iglos terminology management system of the Institute for Traffic Safety and Automation Engineering of the Technische Universität Braunschweig has the goal to avoid these problems by a methodological standardisation of term definitions on the basis of the iglos sign model and iglos relation types. The focus of this paper should be on standardisation of navigation terminology as an example.Keywords: iglos, localisation, methodological approaches, navigation, positioning, definition problems, terminology
Procedia PDF Downloads 36825054 Classification of Generative Adversarial Network Generated Multivariate Time Series Data Featuring Transformer-Based Deep Learning Architecture
Authors: Thrivikraman Aswathi, S. Advaith
Abstract:
As there can be cases where the use of real data is somehow limited, such as when it is hard to get access to a large volume of real data, we need to go for synthetic data generation. This produces high-quality synthetic data while maintaining the statistical properties of a specific dataset. In the present work, a generative adversarial network (GAN) is trained to produce multivariate time series (MTS) data since the MTS is now being gathered more often in various real-world systems. Furthermore, the GAN-generated MTS data is fed into a transformer-based deep learning architecture that carries out the data categorization into predefined classes. Further, the model is evaluated across various distinct domains by generating corresponding MTS data.Keywords: GAN, transformer, classification, multivariate time series
Procedia PDF Downloads 13025053 Mask-Prompt-Rerank: An Unsupervised Method for Text Sentiment Transfer
Authors: Yufen Qin
Abstract:
Text sentiment transfer is an important branch of text style transfer. The goal is to generate text with another sentiment attribute based on a text with a specific sentiment attribute while maintaining the content and semantic information unrelated to sentiment unchanged in the process. There are currently two main challenges in this field: no parallel corpus and text attribute entanglement. In response to the above problems, this paper proposed a novel solution: Mask-Prompt-Rerank. Use the method of masking the sentiment words and then using prompt regeneration to transfer the sentence sentiment. Experiments on two sentiment benchmark datasets and one formality transfer benchmark dataset show that this approach makes the performance of small pre-trained language models comparable to that of the most advanced large models, while consuming two orders of magnitude less computing and memory.Keywords: language model, natural language processing, prompt, text sentiment transfer
Procedia PDF Downloads 8125052 Generative AI: A Comparison of Conditional Tabular Generative Adversarial Networks and Conditional Tabular Generative Adversarial Networks with Gaussian Copula in Generating Synthetic Data with Synthetic Data Vault
Authors: Lakshmi Prayaga, Chandra Prayaga. Aaron Wade, Gopi Shankar Mallu, Harsha Satya Pola
Abstract:
Synthetic data generated by Generative Adversarial Networks and Autoencoders is becoming more common to combat the problem of insufficient data for research purposes. However, generating synthetic data is a tedious task requiring extensive mathematical and programming background. Open-source platforms such as the Synthetic Data Vault (SDV) and Mostly AI have offered a platform that is user-friendly and accessible to non-technical professionals to generate synthetic data to augment existing data for further analysis. The SDV also provides for additions to the generic GAN, such as the Gaussian copula. We present the results from two synthetic data sets (CTGAN data and CTGAN with Gaussian Copula) generated by the SDV and report the findings. The results indicate that the ROC and AUC curves for the data generated by adding the layer of Gaussian copula are much higher than the data generated by the CTGAN.Keywords: synthetic data generation, generative adversarial networks, conditional tabular GAN, Gaussian copula
Procedia PDF Downloads 8225051 Inferring Cognitive Skill in Concept Space
Authors: Rania A. Aboalela, Javed I. Khan
Abstract:
This research presents a learning assessment theory of Cognitive Skill in Concept Space (CS2) to measure the assessed knowledge in terms of cognitive skill levels of the concepts. The cognitive skill levels refer to levels such as if a student has acquired the state at the level of understanding, or applying, or analyzing, etc. The theory is comprised of three constructions: Graph paradigm of a semantic/ ontological scheme, the concept states of the theory and the assessment analytics which is the process to estimate the sets of concept state at a certain skill level. Concept state means if a student has already learned, or is ready to learn, or is not ready to learn a certain skill level. The experiment is conducted to prove the validation of the theory CS2.Keywords: cognitive skill levels, concept states, concept space, knowledge assessment theory
Procedia PDF Downloads 32325050 An Event-Related Potentials Study on the Processing of English Subjunctive Mood by Chinese ESL Learners
Authors: Yan Huang
Abstract:
Event-related potentials (ERPs) technique helps researchers to make continuous measures on the whole process of language comprehension, with an excellent temporal resolution at the level of milliseconds. The research on sentence processing has developed from the behavioral level to the neuropsychological level, which brings about a variety of sentence processing theories and models. However, the applicability of these models to L2 learners is still under debate. Therefore, the present study aims to investigate the neural mechanisms underlying English subjunctive mood processing by Chinese ESL learners. To this end, English subject clauses with subjunctive moods are used as the stimuli, all of which follow the same syntactic structure, “It is + adjective + that … + (should) do + …” Besides, in order to examine the role that language proficiency plays on L2 processing, this research deals with two groups of Chinese ESL learners (18 males and 22 females, mean age=21.68), namely, high proficiency group (Group H) and low proficiency group (Group L). Finally, the behavioral and neurophysiological data analysis reveals the following findings: 1) Syntax and semantics interact with each other on the SECOND phase (300-500ms) of sentence processing, which is partially in line with the Three-phase Sentence Model; 2) Language proficiency does affect L2 processing. Specifically, for Group H, it is the syntactic processing that plays the dominant role in sentence processing while for Group L, semantic processing also affects the syntactic parsing during the THIRD phase of sentence processing (500-700ms). Besides, Group H, compared to Group L, demonstrates a richer native-like ERPs pattern, which further demonstrates the role of language proficiency in L2 processing. Based on the research findings, this paper also provides some enlightenment for the L2 pedagogy as well as the L2 proficiency assessment.Keywords: Chinese ESL learners, English subjunctive mood, ERPs, L2 processing
Procedia PDF Downloads 13125049 A Privacy Protection Scheme Supporting Fuzzy Search for NDN Routing Cache Data Name
Authors: Feng Tao, Ma Jing, Guo Xian, Wang Jing
Abstract:
Named Data Networking (NDN) replaces IP address of traditional network with data name, and adopts dynamic cache mechanism. In the existing mechanism, however, only one-to-one search can be achieved because every data has a unique name corresponding to it. There is a certain mapping relationship between data content and data name, so if the data name is intercepted by an adversary, the privacy of the data content and user’s interest can hardly be guaranteed. In order to solve this problem, this paper proposes a one-to-many fuzzy search scheme based on order-preserving encryption to reduce the query overhead by optimizing the caching strategy. In this scheme, we use hash value to ensure the user’s query safe from each node in the process of search, so does the privacy of the requiring data content.Keywords: NDN, order-preserving encryption, fuzzy search, privacy
Procedia PDF Downloads 48525048 Healthcare Big Data Analytics Using Hadoop
Authors: Chellammal Surianarayanan
Abstract:
Healthcare industry is generating large amounts of data driven by various needs such as record keeping, physician’s prescription, medical imaging, sensor data, Electronic Patient Record(EPR), laboratory, pharmacy, etc. Healthcare data is so big and complex that they cannot be managed by conventional hardware and software. The complexity of healthcare big data arises from large volume of data, the velocity with which the data is accumulated and different varieties such as structured, semi-structured and unstructured nature of data. Despite the complexity of big data, if the trends and patterns that exist within the big data are uncovered and analyzed, higher quality healthcare at lower cost can be provided. Hadoop is an open source software framework for distributed processing of large data sets across clusters of commodity hardware using a simple programming model. The core components of Hadoop include Hadoop Distributed File System which offers way to store large amount of data across multiple machines and MapReduce which offers way to process large data sets with a parallel, distributed algorithm on a cluster. Hadoop ecosystem also includes various other tools such as Hive (a SQL-like query language), Pig (a higher level query language for MapReduce), Hbase(a columnar data store), etc. In this paper an analysis has been done as how healthcare big data can be processed and analyzed using Hadoop ecosystem.Keywords: big data analytics, Hadoop, healthcare data, towards quality healthcare
Procedia PDF Downloads 41325047 Data Disorders in Healthcare Organizations: Symptoms, Diagnoses, and Treatments
Authors: Zakieh Piri, Shahla Damanabi, Peyman Rezaii Hachesoo
Abstract:
Introduction: Healthcare organizations like other organizations suffer from a number of disorders such as Business Sponsor Disorder, Business Acceptance Disorder, Cultural/Political Disorder, Data Disorder, etc. As quality in healthcare care mostly depends on the quality of data, we aimed to identify data disorders and its symptoms in two teaching hospitals. Methods: Using a self-constructed questionnaire, we asked 20 questions in related to quality and usability of patient data stored in patient records. Research population consisted of 150 managers, physicians, nurses, medical record staff who were working at the time of study. We also asked their views about the symptoms and treatments for any data disorders they mentioned in the questionnaire. Using qualitative methods we analyzed the answers. Results: After classifying the answers, we found six main data disorders: incomplete data, missed data, late data, blurred data, manipulated data, illegible data. The majority of participants believed in their important roles in treatment of data disorders while others believed in health system problems. Discussion: As clinicians have important roles in producing of data, they can easily identify symptoms and disorders of patient data. Health information managers can also play important roles in early detection of data disorders by proactively monitoring and periodic check-ups of data.Keywords: data disorders, quality, healthcare, treatment
Procedia PDF Downloads 43325046 Big Data and Analytics in Higher Education: An Assessment of Its Status, Relevance and Future in the Republic of the Philippines
Authors: Byron Joseph A. Hallar, Annjeannette Alain D. Galang, Maria Visitacion N. Gumabay
Abstract:
One of the unique challenges provided by the twenty-first century to Philippine higher education is the utilization of Big Data. The higher education system in the Philippines is generating burgeoning amounts of data that contains relevant data that can be used to generate the information and knowledge needed for accurate data-driven decision making. This study examines the status, relevance and future of Big Data and Analytics in Philippine higher education. The insights gained from the study may be relevant to other developing nations similarly situated as the Philippines.Keywords: big data, data analytics, higher education, republic of the philippines, assessment
Procedia PDF Downloads 34825045 5iD Viewer: Observation of Fish School Behaviour in Labyrinths and Use of Semantic and Syntactic Entropy for School Structure Definition
Authors: Dalibor Štys, Kryštof M. Stys, Maryia Chkalova, Petr Kouba, Aliaxandr Pautsina, Dalibor Štys Jr., Jana Pečenková, Denis Durniev, Tomáš Náhlík, Petr Císař
Abstract:
In this article, a construction and some properties of the 5iD viewer, the system recording simultaneously five views of a given experimental object is reported. Properties of the system are demonstrated on the analysis of fish schooling behavior. It is demonstrated the method of instrument calibration which allows inclusion of image distortion and it is proposed and partly tested also the method of distance assessment in the case that only two opposite cameras are available. Finally, we demonstrate how the state trajectory of the behavior of the fish school may be constructed from the entropy of the system.Keywords: 3D positioning, school behavior, distance calibration, space vision, space distortion
Procedia PDF Downloads 39025044 An Ontology for Smart Learning Environments in Music Education
Authors: Konstantinos Sofianos, Michail Stefanidakis
Abstract:
Nowadays, despite the great advances in technology, most educational frameworks lack a strong educational design basis. E-learning has become prevalent, but it faces various challenges such as student isolation and lack of quality in the learning process. An intelligent learning system provides a student with educational material according to their learning background and learning preferences. It records full information about the student, such as demographic information, learning styles, and academic performance. This information allows the system to be fully adapted to the student’s needs. In this paper, we propose a framework and an ontology for music education, consisting of the learner model and all elements of the learning process (learning objects, teaching methods, learning activities, assessment). This framework can be integrated into an intelligent learning system and used for music education in schools for the development of professional skills and beyond.Keywords: intelligent learning systems, e-learning, music education, ontology, semantic web
Procedia PDF Downloads 14025043 Data Management and Analytics for Intelligent Grid
Authors: G. Julius P. Roy, Prateek Saxena, Sanjeev Singh
Abstract:
Power distribution utilities two decades ago would collect data from its customers not later than a period of at least one month. The origin of SmartGrid and AMI has subsequently increased the sampling frequency leading to 1000 to 10000 fold increase in data quantity. This increase is notable and this steered to coin the tern Big Data in utilities. Power distribution industry is one of the largest to handle huge and complex data for keeping history and also to turn the data in to significance. Majority of the utilities around the globe are adopting SmartGrid technologies as a mass implementation and are primarily focusing on strategic interdependence and synergies of the big data coming from new information sources like AMI and intelligent SCADA, there is a rising need for new models of data management and resurrected focus on analytics to dissect data into descriptive, predictive and dictatorial subsets. The goal of this paper is to is to bring load disaggregation into smart energy toolkit for commercial usage.Keywords: data management, analytics, energy data analytics, smart grid, smart utilities
Procedia PDF Downloads 78025042 Privacy Preserving Data Publishing Based on Sensitivity in Context of Big Data Using Hive
Authors: P. Srinivasa Rao, K. Venkatesh Sharma, G. Sadhya Devi, V. Nagesh
Abstract:
Privacy Preserving Data Publication is the main concern in present days because the data being published through the internet has been increasing day by day. This huge amount of data was named as Big Data by its size. This project deals the privacy preservation in the context of Big Data using a data warehousing solution called hive. We implemented Nearest Similarity Based Clustering (NSB) with Bottom-up generalization to achieve (v,l)-anonymity. (v,l)-Anonymity deals with the sensitivity vulnerabilities and ensures the individual privacy. We also calculate the sensitivity levels by simple comparison method using the index values, by classifying the different levels of sensitivity. The experiments were carried out on the hive environment to verify the efficiency of algorithms with Big Data. This framework also supports the execution of existing algorithms without any changes. The model in the paper outperforms than existing models.Keywords: sensitivity, sensitive level, clustering, Privacy Preserving Data Publication (PPDP), bottom-up generalization, Big Data
Procedia PDF Downloads 29525041 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects
Authors: Behnam Tavakkol
Abstract:
Uncertain data mining algorithms use different ways to consider uncertainty in data such as by representing a data object as a sample of points or a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, besides some uncertain fuzzy k-medoids algorithms, not many other fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data
Procedia PDF Downloads 21525040 Democracy Bytes: Interrogating the Exploitation of Data Democracy by Radical Terrorist Organizations
Authors: Nirmala Gopal, Sheetal Bhoola, Audecious Mugwagwa
Abstract:
This paper discusses the continued infringement and exploitation of data by non-state actors for destructive purposes, emphasizing radical terrorist organizations. It will discuss how terrorist organizations access and use data to foster their nefarious agendas. It further examines how cybersecurity, designed as a tool to curb data exploitation, is ineffective in raising global citizens' concerns about how their data can be kept safe and used for its acquired purpose. The study interrogates several policies and data protection instruments, such as the Data Protection Act, Cyber Security Policies, Protection of Personal Information(PPI) and General Data Protection Regulations (GDPR), to understand data use and storage in democratic states. The study outcomes point to the fact that international cybersecurity and cybercrime legislation, policies, and conventions have not curbed violations of data access and use by radical terrorist groups. The study recommends ways to enhance cybersecurity and reduce cyber risks using democratic principles.Keywords: cybersecurity, data exploitation, terrorist organizations, data democracy
Procedia PDF Downloads 20425039 Healthcare Data Mining Innovations
Authors: Eugenia Jilinguirian
Abstract:
In the healthcare industry, data mining is essential since it transforms the field by collecting useful data from large datasets. Data mining is the process of applying advanced analytical methods to large patient records and medical histories in order to identify patterns, correlations, and trends. Healthcare professionals can improve diagnosis accuracy, uncover hidden linkages, and predict disease outcomes by carefully examining these statistics. Additionally, data mining supports personalized medicine by personalizing treatment according to the unique attributes of each patient. This proactive strategy helps allocate resources more efficiently, enhances patient care, and streamlines operations. However, to effectively apply data mining, however, and ensure the use of private healthcare information, issues like data privacy and security must be carefully considered. Data mining continues to be vital for searching for more effective, efficient, and individualized healthcare solutions as technology evolves.Keywords: data mining, healthcare, big data, individualised healthcare, healthcare solutions, database
Procedia PDF Downloads 6625038 Network Conditioning and Transfer Learning for Peripheral Nerve Segmentation in Ultrasound Images
Authors: Harold Mauricio Díaz-Vargas, Cristian Alfonso Jimenez-Castaño, David Augusto Cárdenas-Peña, Guillermo Alberto Ortiz-Gómez, Alvaro Angel Orozco-Gutierrez
Abstract:
Precise identification of the nerves is a crucial task performed by anesthesiologists for an effective Peripheral Nerve Blocking (PNB). Now, anesthesiologists use ultrasound imaging equipment to guide the PNB and detect nervous structures. However, visual identification of the nerves from ultrasound images is difficult, even for trained specialists, due to artifacts and low contrast. The recent advances in deep learning make neural networks a potential tool for accurate nerve segmentation systems, so addressing the above issues from raw data. The most widely spread U-Net network yields pixel-by-pixel segmentation by encoding the input image and decoding the attained feature vector into a semantic image. This work proposes a conditioning approach and encoder pre-training to enhance the nerve segmentation of traditional U-Nets. Conditioning is achieved by the one-hot encoding of the kind of target nerve a the network input, while the pre-training considers five well-known deep networks for image classification. The proposed approach is tested in a collection of 619 US images, where the best C-UNet architecture yields an 81% Dice coefficient, outperforming the 74% of the best traditional U-Net. Results prove that pre-trained models with the conditional approach outperform their equivalent baseline by supporting learning new features and enriching the discriminant capability of the tested networks.Keywords: nerve segmentation, U-Net, deep learning, ultrasound imaging, peripheral nerve blocking
Procedia PDF Downloads 10625037 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering
Authors: Yunus Doğan, Ahmet Durap
Abstract:
Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.Keywords: clustering algorithms, coastal engineering, data mining, data summarization, statistical methods
Procedia PDF Downloads 361