Search results for: data discovery
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24828

Search results for: data discovery

24588 A Comprehensive Survey and Improvement to Existing Privacy Preserving Data Mining Techniques

Authors: Tosin Ige

Abstract:

Ethics must be a condition of the world, like logic. (Ludwig Wittgenstein, 1889-1951). As important as data mining is, it possess a significant threat to ethics, privacy, and legality, since data mining makes it difficult for an individual or consumer (in the case of a company) to control the accessibility and usage of his data. This research focuses on Current issues and the latest research and development on Privacy preserving data mining methods as at year 2022. It also discusses some advances in those techniques while at the same time highlighting and providing a new technique as a solution to an existing technique of privacy preserving data mining methods. This paper also bridges the wide gap between Data mining and the Web Application Programing Interface (web API), where research is urgently needed for an added layer of security in data mining while at the same time introducing a seamless and more efficient way of data mining.

Keywords: data, privacy, data mining, association rule, privacy preserving, mining technique

Procedia PDF Downloads 147
24587 Big Data: Concepts, Technologies and Applications in the Public Sector

Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora

Abstract:

Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.

Keywords: big data, big data analytics, Hadoop, cloud

Procedia PDF Downloads 293
24586 Semantic Data Schema Recognition

Authors: Aïcha Ben Salem, Faouzi Boufares, Sebastiao Correia

Abstract:

The subject covered in this paper aims at assisting the user in its quality approach. The goal is to better extract, mix, interpret and reuse data. It deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, inculding the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. These links detected between columns offer a better understanding of the source and the alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies.

Keywords: schema recognition, semantic data profiling, meta-categorisation, semantic dependencies inter columns

Procedia PDF Downloads 405
24585 A Local Tensor Clustering Algorithm to Annotate Uncharacterized Genes with Many Biological Networks

Authors: Paul Shize Li, Frank Alber

Abstract:

A fundamental task of clinical genomics is to unravel the functions of genes and their associations with disorders. Although experimental biology has made efforts to discover and elucidate the molecular mechanisms of individual genes in the past decades, still about 40% of human genes have unknown functions, not to mention the diseases they may be related to. For those biologists who are interested in a particular gene with unknown functions, a powerful computational method tailored for inferring the functions and disease relevance of uncharacterized genes is strongly needed. Studies have shown that genes strongly linked to each other in multiple biological networks are more likely to have similar functions. This indicates that the densely connected subgraphs in multiple biological networks are useful in the functional and phenotypic annotation of uncharacterized genes. Therefore, in this work, we have developed an integrative network approach to identify the frequent local clusters, which are defined as those densely connected subgraphs that frequently occur in multiple biological networks and consist of the query gene that has few or no disease or function annotations. This is a local clustering algorithm that models multiple biological networks sharing the same gene set as a three-dimensional matrix, the so-called tensor, and employs the tensor-based optimization method to efficiently find the frequent local clusters. Specifically, massive public gene expression data sets that comprehensively cover dynamic, physiological, and environmental conditions are used to generate hundreds of gene co-expression networks. By integrating these gene co-expression networks, for a given uncharacterized gene that is of biologist’s interest, the proposed method can be applied to identify the frequent local clusters that consist of this uncharacterized gene. Finally, those frequent local clusters are used for function and disease annotation of this uncharacterized gene. This local tensor clustering algorithm outperformed the competing tensor-based algorithm in both module discovery and running time. We also demonstrated the use of the proposed method on real data of hundreds of gene co-expression data and showed that it can comprehensively characterize the query gene. Therefore, this study provides a new tool for annotating the uncharacterized genes and has great potential to assist clinical genomic diagnostics.

Keywords: local tensor clustering, query gene, gene co-expression network, gene annotation

Procedia PDF Downloads 137
24584 Access Control System for Big Data Application

Authors: Winfred Okoe Addy, Jean Jacques Dominique Beraud

Abstract:

Access control systems (ACs) are some of the most important components in safety areas. Inaccuracies of regulatory frameworks make personal policies and remedies more appropriate than standard models or protocols. This problem is exacerbated by the increasing complexity of software, such as integrated Big Data (BD) software for controlling large volumes of encrypted data and resources embedded in a dedicated BD production system. This paper proposes a general access control strategy system for the diffusion of Big Data domains since it is crucial to secure the data provided to data consumers (DC). We presented a general access control circulation strategy for the Big Data domain by describing the benefit of using designated access control for BD units and performance and taking into consideration the need for BD and AC system. We then presented a generic of Big Data access control system to improve the dissemination of Big Data.

Keywords: access control, security, Big Data, domain

Procedia PDF Downloads 118
24583 Didactical and Semiotic Affordance of GeoGebra in a Productive Mathematical Discourse

Authors: Isaac Benning

Abstract:

Using technology to expand the learning space is critical for a productive mathematical discourse. This is a case study of two teachers who developed and enacted GeoGebra-based mathematics lessons following their engagement in a two-year professional development. The didactical and semiotic affordance of GeoGebra in widening the learning space for a productive mathematical discourse was explored. The approach of thematic analysis was used for lesson artefact, lesson observation, and interview data. The results indicated that constructing tools in GeoGebra provided a didactical milieu where students used them to explore mathematical concepts with little or no support from their teacher. The prompt feedback from the GeoGebra motivated students to practice mathematical concepts repeatedly in which they privately rethink their solutions before comparing their answers with that of their colleagues. The constructing tools enhanced self-discovery, team spirit, and dialogue among students. With regards to the semiotic construct, the tools widened the physical and psychological atmosphere of the classroom by providing animations that served as virtual concrete to enhance the recording, manipulation, testing of a mathematical idea, construction, and interpretation of geometric objects. These findings advance the discussion of widening the classroom for a productive mathematical discourse within the context of the mathematics curriculum of Ghana and similar Sub-Saharan African countries.

Keywords: GeoGebra, theory of didactical situation, semiotic mediation, mathematics laboratory, mathematical discussion

Procedia PDF Downloads 108
24582 A Data Envelopment Analysis Model in a Multi-Objective Optimization with Fuzzy Environment

Authors: Michael Gidey Gebru

Abstract:

Most of Data Envelopment Analysis models operate in a static environment with input and output parameters that are chosen by deterministic data. However, due to ambiguity brought on shifting market conditions, input and output data are not always precisely gathered in real-world scenarios. Fuzzy numbers can be used to address this kind of ambiguity in input and output data. Therefore, this work aims to expand crisp Data Envelopment Analysis into Data Envelopment Analysis with fuzzy environment. In this study, the input and output data are regarded as fuzzy triangular numbers. Then, the Data Envelopment Analysis model with fuzzy environment is solved using a multi-objective method to gauge the Decision Making Units' efficiency. Finally, the developed Data Envelopment Analysis model is illustrated with an application on real data 50 educational institutions.

Keywords: efficiency, Data Envelopment Analysis, fuzzy, higher education, input, output

Procedia PDF Downloads 31
24581 An Empirical Analysis of the Effects of Corporate Derivatives Use on the Underlying Stock Price Exposure: South African Evidence

Authors: Edson Vengesai

Abstract:

Derivative products have become essential instruments in portfolio diversification, price discovery, and, most importantly, risk hedging. Derivatives are complex instruments; their valuation, volatility implications, and real impact on the underlying assets' behaviour are not well understood. Little is documented empirically, with conflicting conclusions on how these instruments affect firm risk exposures. Given the growing interest in using derivatives in risk management and portfolio engineering, this study examines the practical impact of derivative usage on the underlying stock price exposure and systematic risk. The paper uses data from South African listed firms. The study employs GARCH models to understand the effect of derivative uses on conditional stock volatility. The GMM models are used to estimate the effect of derivatives use on stocks' systematic risk as measured by Beta and on the total risk of stocks as measured by the standard deviation of returns. The results provide evidence on whether derivatives use is instrumental in reducing stock returns' systematic and total risk. The results are subjected to numerous controls for robustness, including financial leverage, firm size, growth opportunities, and macroeconomic effects.

Keywords: derivatives use, hedging, volatility, stock price exposure

Procedia PDF Downloads 92
24580 Homogeneous Anti-Corrosion Coating of Spontaneously Dissolved Defect-Free Graphene

Authors: M. K. Bin Subhan, P. Cullen, C. Howard

Abstract:

A recent study by the World Corrosion Organization estimated that corrosion related damage causes $2.5tr worth of damage every year. As such, a low cost easily scalable solution is required to the corrosion problem which is economically viable. Graphene is an ideal anti-corrosion barrier layer material due to its excellent barrier properties and chemical stability, which makes it impermeable to all molecules. However, attempts to employ graphene as a barrier layer has been hampered by the fact that defect sites in graphene accelerate corrosion due to the inert nature of graphene which promotes galvanic corrosion at the expense of the metal. The recent discovery of spontaneous dissolution of charged graphite intercalation compounds in aprotic solvents enables defect free graphene platelets to be employed for anti-corrosion applications. These ‘inks’ of defect-free charged graphene platelets in solution can be coated onto a metallic surfaces via electroplating to form a homogeneous barrier layer. In this paper, initial data showing homogeneous coatings of graphene barrier layers on steel coupons via electroplating will be presented. This easily scalable technique also provides a controllable method for applying different barrier thicknesses from ultra thin layers to thick opaque coatings making it useful for a wide range of applications.

Keywords: anti-corrosion, defect-free, electroplating, graphene

Procedia PDF Downloads 120
24579 From Context to Text and Back Again: Teaching Toni Morrison Overseas

Authors: Helena Maragou

Abstract:

Introducing Toni Morrison’s fiction to a classroom overseas entails a significant pedagogical investment, from monitoring students’ uncertain journey through Morrison’s shifty semantics to filling in the gaps of cultural knowledge and understanding for the students to be able to relate text to context. A rewarding process, as Morrison’s works present a tremendous opportunity for transnational dialogue, an opportunity that hinges upon Toni Morrison’s bringing to the fore the untold and unspeakable lives of racial ‘Others’, but also, crucially, upon her broader critique of Western ideological hegemony. This critique is a fundamental aspect of Toni Morrison’s politics and one that appeals to young readers of Toni Morrison in Greece at a time when the questioning of institutions and ideological traditions is precipitated by regional and global change. It is more or less self-evident that to help a class of international students get aboard a Morrison novel, an instructor should begin by providing them with cultural context. These days, students’ exposure to Hollywood representations of the African American past and present, as well as the use of documentaries, photography, music videos, etc., as supplementary class material, provide a starting point, a workable historical and cultural framework for textual comprehension. The true challenge, however, lies ahead: it is one thing for students to intellectually grasp the historical hardships and traumas of Morrison’s characters and to even engage in aesthetic appreciation of Morrison’s writing; quite another to relate to her works as articulations of experiences akin to their own. The great challenge, then, is in facilitating students’ discovery of the universal Morrison, the author who speaks across cultures while voicing the untold tales of her own people; this process of discovery entails, on a pedagogical level, that students be guided through the works’ historical context, to plunge into the intricacies of Morrison’s discourse, itself an elaborate linguistic booby trap, so as to be finally brought to reconsider their own historical experiences using the lens of Morrison’s fiction. The paper will be based on experience of teaching a Toni Morrison seminar to a class of Greek students at the American College of Greece and will draw from students’ exposure and responses to Toni Morrison’s “Nobel Prize Lecture,” as well as her novels Song of Solomon and Home.

Keywords: toni morrison, international classroom, pedagogy, African American literature

Procedia PDF Downloads 68
24578 Collaborative Data Refinement for Enhanced Ionic Conductivity Prediction in Garnet-Type Materials

Authors: Zakaria Kharbouch, Mustapha Bouchaara, F. Elkouihen, A. Habbal, A. Ratnani, A. Faik

Abstract:

Solid-state lithium-ion batteries have garnered increasing interest in modern energy research due to their potential for safer, more efficient, and sustainable energy storage systems. Among the critical components of these batteries, the electrolyte plays a pivotal role, with LLZO garnet-based electrolytes showing significant promise. Garnet materials offer intrinsic advantages such as high Li-ion conductivity, wide electrochemical stability, and excellent compatibility with lithium metal anodes. However, optimizing ionic conductivity in garnet structures poses a complex challenge, primarily due to the multitude of potential dopants that can be incorporated into the LLZO crystal lattice. The complexity of material design, influenced by numerous dopant options, requires a systematic method to find the most effective combinations. This study highlights the utility of machine learning (ML) techniques in the materials discovery process to navigate the complex range of factors in garnet-based electrolytes. Collaborators from the materials science and ML fields worked with a comprehensive dataset previously employed in a similar study and collected from various literature sources. This dataset served as the foundation for an extensive data refinement phase, where meticulous error identification, correction, outlier removal, and garnet-specific feature engineering were conducted. This rigorous process substantially improved the dataset's quality, ensuring it accurately captured the underlying physical and chemical principles governing garnet ionic conductivity. The data refinement effort resulted in a significant improvement in the predictive performance of the machine learning model. Originally starting at an accuracy of 0.32, the model underwent substantial refinement, ultimately achieving an accuracy of 0.88. This enhancement highlights the effectiveness of the interdisciplinary approach and underscores the substantial potential of machine learning techniques in materials science research.

Keywords: lithium batteries, all-solid-state batteries, machine learning, solid state electrolytes

Procedia PDF Downloads 42
24577 The Economic Limitations of Defining Data Ownership Rights

Authors: Kacper Tomasz Kröber-Mulawa

Abstract:

This paper will address the topic of data ownership from an economic perspective, and examples of economic limitations of data property rights will be provided, which have been identified using methods and approaches of economic analysis of law. To properly build a background for the economic focus, in the beginning a short perspective of data and data ownership in the EU’s legal system will be provided. It will include a short introduction to its political and social importance and highlight relevant viewpoints. This will stress the importance of a Single Market for data but also far-reaching regulations of data governance and privacy (including the distinction of personal and non-personal data, data held by public bodies and private businesses). The main discussion of this paper will build upon the briefly referred to legal basis as well as methods and approaches of economic analysis of law.

Keywords: antitrust, data, data ownership, digital economy, property rights

Procedia PDF Downloads 63
24576 Protecting the Cloud Computing Data Through the Data Backups

Authors: Abdullah Alsaeed

Abstract:

Virtualized computing and cloud computing infrastructures are no longer fuzz or marketing term. They are a core reality in today’s corporate Information Technology (IT) organizations. Hence, developing an effective and efficient methodologies for data backup and data recovery is required more than any time. The purpose of data backup and recovery techniques are to assist the organizations to strategize the business continuity and disaster recovery approaches. In order to accomplish this strategic objective, a variety of mechanism were proposed in the recent years. This research paper will explore and examine the latest techniques and solutions to provide data backup and restoration for the cloud computing platforms.

Keywords: data backup, data recovery, cloud computing, business continuity, disaster recovery, cost-effective, data encryption.

Procedia PDF Downloads 68
24575 Missing Link Data Estimation with Recurrent Neural Network: An Application Using Speed Data of Daegu Metropolitan Area

Authors: JaeHwan Yang, Da-Woon Jeong, Seung-Young Kho, Dong-Kyu Kim

Abstract:

In terms of ITS, information on link characteristic is an essential factor for plan or operation. But in practical cases, not every link has installed sensors on it. The link that does not have data on it is called “Missing Link”. The purpose of this study is to impute data of these missing links. To get these data, this study applies the machine learning method. With the machine learning process, especially for the deep learning process, missing link data can be estimated from present link data. For deep learning process, this study uses “Recurrent Neural Network” to take time-series data of road. As input data, Dedicated Short-range Communications (DSRC) data of Dalgubul-daero of Daegu Metropolitan Area had been fed into the learning process. Neural Network structure has 17 links with present data as input, 2 hidden layers, for 1 missing link data. As a result, forecasted data of target link show about 94% of accuracy compared with actual data.

Keywords: data estimation, link data, machine learning, road network

Procedia PDF Downloads 499
24574 Customer Data Analysis Model Using Business Intelligence Tools in Telecommunication Companies

Authors: Monica Lia

Abstract:

This article presents a customer data analysis model using business intelligence tools for data modelling, transforming, data visualization and dynamic reports building. Economic organizational customer’s analysis is made based on the information from the transactional systems of the organization. The paper presents how to develop the data model starting for the data that companies have inside their own operational systems. The owned data can be transformed into useful information about customers using business intelligence tool. For a mature market, knowing the information inside the data and making forecast for strategic decision become more important. Business Intelligence tools are used in business organization as support for decision-making.

Keywords: customer analysis, business intelligence, data warehouse, data mining, decisions, self-service reports, interactive visual analysis, and dynamic dashboards, use cases diagram, process modelling, logical data model, data mart, ETL, star schema, OLAP, data universes

Procedia PDF Downloads 415
24573 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate survey/polls often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real-time. The question is not why did these people do this for the last 10 years, but why are these people doing this now, and if the this is undesirable, and how can we have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released in to the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: big data, open data, productivity, data governance

Procedia PDF Downloads 353
24572 Identifying a Drug Addict Person Using Artificial Neural Networks

Authors: Mustafa Al Sukar, Azzam Sleit, Abdullatif Abu-Dalhoum, Bassam Al-Kasasbeh

Abstract:

Use and abuse of drugs by teens is very common and can have dangerous consequences. The drugs contribute to physical and sexual aggression such as assault or rape. Some teenagers regularly use drugs to compensate for depression, anxiety or a lack of positive social skills. Teen resort to smoking should not be minimized because it can be "gateway drugs" for other drugs (marijuana, cocaine, hallucinogens, inhalants, and heroin). The combination of teenagers' curiosity, risk taking behavior, and social pressure make it very difficult to say no. This leads most teenagers to the questions: "Will it hurt to try once?" Nowadays, technological advances are changing our lives very rapidly and adding a lot of technologies that help us to track the risk of drug abuse such as smart phones, Wireless Sensor Networks (WSNs), Internet of Things (IoT), etc. This technique may help us to early discovery of drug abuse in order to prevent an aggravation of the influence of drugs on the abuser. In this paper, we have developed a Decision Support System (DSS) for detecting the drug abuse using Artificial Neural Network (ANN); we used a Multilayer Perceptron (MLP) feed-forward neural network in developing the system. The input layer includes 50 variables while the output layer contains one neuron which indicates whether the person is a drug addict. An iterative process is used to determine the number of hidden layers and the number of neurons in each one. We used multiple experiment models that have been completed with Log-Sigmoid transfer function. Particularly, 10-fold cross validation schemes are used to access the generalization of the proposed system. The experiment results have obtained 98.42% classification accuracy for correct diagnosis in our system. The data had been taken from 184 cases in Jordan according to a set of questions compiled from Specialists, and data have been obtained through the families of drug abusers.

Keywords: drug addiction, artificial neural networks, multilayer perceptron (MLP), decision support system

Procedia PDF Downloads 280
24571 Membrane Permeability of Middle Molecules: A Computational Chemistry Approach

Authors: Sundaram Arulmozhiraja, Kanade Shimizu, Yuta Yamamoto, Satoshi Ichikawa, Maenaka Katsumi, Hiroaki Tokiwa

Abstract:

Drug discovery is shifting from small molecule based drugs targeting local active site to middle molecules (MM) targeting large, flat, and groove-shaped binding sites, for example, protein-protein interface because at least half of all targets assumed to be involved in human disease have been classified as “difficult to drug” with traditional small molecules. Hence, MMs such as peptides, natural products, glycans, nucleic acids with various high potent bioactivities become important targets for drug discovery programs in the recent years as they could be used for ‘undruggable” intracellular targets. Cell membrane permeability is one of the key properties of pharmacodynamically active MM drug compounds and so evaluating this property for the potential MMs is crucial. Computational prediction for cell membrane permeability of molecules is very challenging; however, recent advancement in the molecular dynamics simulations help to solve this issue partially. It is expected that MMs with high membrane permeability will enable drug discovery research to expand its borders towards intracellular targets. Further to understand the chemistry behind the permeability of MMs, it is necessary to investigate their conformational changes during the permeation through membrane and for that their interactions with the membrane field should be studied reliably because these interactions involve various non-bonding interactions such as hydrogen bonding, -stacking, charge-transfer, polarization dispersion, and non-classical weak hydrogen bonding. Therefore, parameters-based classical mechanics calculations are hardly sufficient to investigate these interactions rather, quantum mechanical (QM) calculations are essential. Fragment molecular orbital (FMO) method could be used for such purpose as it performs ab initio QM calculations by dividing the system into fragments. The present work is aimed to study the cell permeability of middle molecules using molecular dynamics simulations and FMO-QM calculations. For this purpose, a natural compound syringolin and its analogues were considered in this study. Molecular simulations were performed using NAMD and Gromacs programs with CHARMM force field. FMO calculations were performed using the PAICS program at the correlated Resolution-of-Identity second-order Moller Plesset (RI-MP2) level with the cc-pVDZ basis set. The simulations clearly show that while syringolin could not permeate the membrane, its selected analogues go through the medium in nano second scale. These correlates well with the existing experimental evidences that these syringolin analogues are membrane-permeable compounds. Further analyses indicate that intramolecular -stacking interactions in the syringolin analogues influenced their permeability positively. These intramolecular interactions reduce the polarity of these analogues so that they could permeate the lipophilic cell membrane. Conclusively, the cell membrane permeability of various middle molecules with potent bioactivities is efficiently studied using molecular dynamics simulations. Insight of this behavior is thoroughly investigated using FMO-QM calculations. Results obtained in the present study indicate that non-bonding intramolecular interactions such as hydrogen-bonding and -stacking along with the conformational flexibility of MMs are essential for amicable membrane permeation. These results are interesting and are nice example for this theoretical calculation approach that could be used to study the permeability of other middle molecules. This work was supported by Japan Agency for Medical Research and Development (AMED) under Grant Number 18ae0101047.

Keywords: fragment molecular orbital theory, membrane permeability, middle molecules, molecular dynamics simulation

Procedia PDF Downloads 171
24570 A Systematic Review on Challenges in Big Data Environment

Authors: Rimmy Yadav, Anmol Preet Kaur

Abstract:

Big Data has demonstrated the vast potential in streamlining, deciding, spotting business drifts in different fields, for example, producing, fund, Information Technology. This paper gives a multi-disciplinary diagram of the research issues in enormous information and its procedures, instruments, and system identified with the privacy, data storage management, network and energy utilization, adaptation to non-critical failure and information representations. Other than this, result difficulties and openings accessible in this Big Data platform have made.

Keywords: big data, privacy, data management, network and energy consumption

Procedia PDF Downloads 288
24569 One Species into Five: Nucleo-Mito Barcoding Reveals Cryptic Species in 'Frankliniella Schultzei Complex': Vector for Tospoviruses

Authors: Vikas Kumar, Kailash Chandra, Kaomud Tyagi

Abstract:

The insect order Thysanoptera includes small insects commonly called thrips. As insect vectors, only thrips are capable of Tospoviruses transmission (genus Tospovirus, family Bunyaviridae) affecting various crops. Currently, fifteen species of subfamily Thripinae (Thripidae) have been reported as vectors for tospoviruses. Frankliniella schultzei, which is reported as act as a vector for at least five tospovirses, have been suspected to be a species complex with more than one species. It is one of the historical unresolved issues where, two species namely, F. schultzei Trybom and F. sulphurea Schmutz were erected from South Africa and Srilanaka respectively. These two species were considered to be valid until 1968 when sulphurea was treated as colour morph (pale form) and synonymised under schultzei (dark form) However, these two have been considered as valid species by some of the thrips workers. Parallel studies have indicated that brown form of schultzei is a vector for tospoviruses while yellow form is a non-vector. However, recent studies have shown that yellow populations have also been documented as vectors. In view of all these facts, it is highly important to have a clear understanding whether these colour forms represent true species or merely different populations with different vector carrying capacities and whether there is some hidden diversity in 'Frankliniella schultzei species complex'. In this study, we aim to study the 'Frankliniella schultzei species complex' with molecular spectacles with DNA data from India and Australia and Africa. A total of fifty-five specimens was collected from diverse locations in India and Australia. We generated molecular data using partial fragments of mitochondrial cytochrome c oxidase I gene (mtCOI) and 28S rRNA gene. For COI dataset, there were seventy-four sequences, out of which data on fifty-five was generated in the current study and others were retrieved from NCBI. All the four different tree construction methods: neighbor-joining, maximum parsimony, maximum likelihood and Bayesian analysis, yielded the same tree topology and produced five cryptic species with high genetic divergence. For, rDNA, there were forty-five sequences, out of which data on thirty-nine was generated in the current study and others were retrieved from NCBI. The four tree building methods yielded four cryptic species with high bootstrap support value/posterior probability. Here we could not retrieve one cryptic species from South Africa as we could not generate data on rDNA from South Africa and sequence for rDNA from African region were not available in the database. The results of multiple species delimitation methods (barcode index numbers, automatic barcode gap discovery, general mixed Yule-coalescent, and Poisson-tree-processes) also supported the phylogenetic data and produced 5 and 4 Molecular Operational Taxonomic Units (MOTUs) for mtCOI and 28S dataset respectively. These results of our study indicate the likelihood that F. sulphurea may be a valid species, however, more morphological and molecular data is required on specimens from type localities of these two species and comparison with type specimens.

Keywords: DNA barcoding, species complex, thrips, species delimitation

Procedia PDF Downloads 114
24568 Survey on Big Data Stream Classification by Decision Tree

Authors: Mansoureh Ghiasabadi Farahani, Samira Kalantary, Sara Taghi-Pour, Mahboubeh Shamsi

Abstract:

Nowadays, the development of computers technology and its recent applications provide access to new types of data, which have not been considered by the traditional data analysts. Two particularly interesting characteristics of such data sets include their huge size and streaming nature .Incremental learning techniques have been used extensively to address the data stream classification problem. This paper presents a concise survey on the obstacles and the requirements issues classifying data streams with using decision tree. The most important issue is to maintain a balance between accuracy and efficiency, the algorithm should provide good classification performance with a reasonable time response.

Keywords: big data, data streams, classification, decision tree

Procedia PDF Downloads 501
24567 Robust and Dedicated Hybrid Cloud Approach for Secure Authorized Deduplication

Authors: Aishwarya Shekhar, Himanshu Sharma

Abstract:

Data deduplication is one of important data compression techniques for eliminating duplicate copies of repeating data, and has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. In this process, duplicate data is expunged, leaving only one copy means single instance of the data to be accumulated. Though, indexing of each and every data is still maintained. Data deduplication is an approach for minimizing the part of storage space an organization required to retain its data. In most of the company, the storage systems carry identical copies of numerous pieces of data. Deduplication terminates these additional copies by saving just one copy of the data and exchanging the other copies with pointers that assist back to the primary copy. To ignore this duplication of the data and to preserve the confidentiality in the cloud here we are applying the concept of hybrid nature of cloud. A hybrid cloud is a fusion of minimally one public and private cloud. As a proof of concept, we implement a java code which provides security as well as removes all types of duplicated data from the cloud.

Keywords: confidentiality, deduplication, data compression, hybridity of cloud

Procedia PDF Downloads 367
24566 A Review of Machine Learning for Big Data

Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.

Abstract:

Big data are now rapidly expanding in all engineering and science and many other domains. The potential of large or massive data is undoubtedly significant, make sense to require new ways of thinking and learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. In this paper, the latest advances and advancements in the researches on machine learning for big data processing. First, the machine learning techniques methods in recent studies, such as deep learning, representation learning, transfer learning, active learning and distributed and parallel learning. Then focus on the challenges and possible solutions of machine learning for big data.

Keywords: active learning, big data, deep learning, machine learning

Procedia PDF Downloads 421
24565 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights

Authors: Tomy Prihananto, Damar Apri Sudarmadi

Abstract:

Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management and data management in Indonesia. This paper is a descriptive research with qualitative approach and collecting data from literature study. Results of this paper are comprehensive arrangement of data that have been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and protection of personal data are mutually reinforcing arrangements in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.

Keywords: Indonesia, protection, personal data, privacy, human rights, encryption

Procedia PDF Downloads 166
24564 Neuroblastoma in Children and the Potential Involvement of Viruses in Its Pathogenesis

Authors: Ugo Rovigatti

Abstract:

Neuroblastoma (NBL) has epitomized for at least 40 years our understanding of cancer cellular and molecular biology and its potential applications to novel therapeutic strategies. This includes the discovery of the very first oncogene aberrations and tumorigenesis suppression by differentiation in the 80s; the potential role of suppressor genes in the 90s; the relevance of immunotherapy in the millennium first, and the discovery of additional mutations by NGS technology in the millennium second decade. Similar discoveries were achieved in the majority of human cancers, and similar therapeutic interventions were obtained subsequently to NBL discoveries. Unfortunately, targeted therapies suggested by specific mutations (such as MYCN amplification –MNA- present in ¼ or 1/5 of cases) have not elicited therapeutic successes in aggressive NBL, where the prognosis is still dismal. The reasons appear to be linked to Tumor Heterogeneity, which is particularly evident in NBL but also a clear hallmark of aggressive human cancers generally. The new avenue of cancer immunotherapy (CIT) provided new hopes for cancer patients, but we still ignore the cellular or molecular targets. CIT is emblematic of high-risk disease (HR-NBL) since the mentioned GD2 passive immunotherapy is still providing better survival. We recently critically reviewed and evaluated the literature depicting the genomic landscapes of HR-NBL, coming to the qualified conclusion that among hundreds of affected genes, potential targets, or chromosomal sites, none correlated with anti-GD2 sensitivity. A better explanation is provided by the Micro-Foci inducing Virus (MFV) model, which predicts that neuroblasts infection with the MFV, an RNA virus isolated from a cancer-cluster (space-time association) of HR-NBL cases, elicits the appearance of MNA and additional genomic aberrations with mechanisms resembling chromothripsis. Neuroblasts infected with low titers of MFV amplified MYCN up to 100 folds and became highly transformed and malignant, thus causing neuroblastoma in young rat pups of strains SD and Fisher-344 and larger tumor masses in nu/nu mice. An association was discovered with GD2 since this glycosphingolipid is also the receptor for the family of MFV virus (dsRNA viruses). It is concluded that a dsRNA virus, MFV, appears to provide better explicatory mechanisms for the genesis of i) specific genomic aberrations such as MNA; ii) extensive tumor heterogeneity and chromothripsis; iii) the effects of passive immunotherapy with anti-GD2 monoclonals and that this and similar models should be further investigated in both pediatric and adult cancers.

Keywords: neuroblastoma, MYCN, amplification, viruses, GD2

Procedia PDF Downloads 91
24563 A Computational Investigation of Potential Drugs for Cholesterol Regulation to Treat Alzheimer’s Disease

Authors: Marina Passero, Tianhua Zhai, Zuyi (Jacky) Huang

Abstract:

Alzheimer’s disease has become a major public health issue, as indicated by the increasing populations of Americans living with Alzheimer’s disease. After decades of extensive research in Alzheimer’s disease, only seven drugs have been approved by Food and Drug Administration (FDA) to treat Alzheimer’s disease. Five of these drugs were designed to treat the dementia symptoms, and only two drugs (i.e., Aducanumab and Lecanemab) target the progression of Alzheimer’s disease, especially the accumulation of amyloid-b plaques. However, controversial comments were raised for the accelerated approvals of either Aducanumab or Lecanemab, especially with concerns on safety and side effects of these two drugs. There is still an urgent need for further drug discovery to target the biological processes involved in the progression of Alzheimer’s disease. Excessive cholesterol has been found to accumulate in the brain of those with Alzheimer’s disease. Cholesterol can be synthesized in both the blood and the brain, but the majority of biosynthesis in the adult brain takes place in astrocytes and is then transported to the neurons via ApoE. The blood brain barrier separates cholesterol metabolism in the brain from the rest of the body. Various proteins contribute to the metabolism of cholesterol in the brain, which offer potential targets for Alzheimer’s treatment. In the astrocytes, SREBP cleavage-activating protein (SCAP) binds to Sterol Regulatory Element-binding Protein 2 (SREBP2) in order to transport the complex from the endoplasmic reticulum to the Golgi apparatus. Cholesterol is secreted out of the astrocytes by ATP-Binding Cassette A1 (ABCA1) transporter. Lipoprotein receptors such as triggering receptor expressed on myeloid cells 2 (TREM2) internalize cholesterol into the microglia, while lipoprotein receptors such as Low-density lipoprotein receptor-related protein 1 (LRP1) internalize cholesterol into the neuron. Cytochrome P450 Family 46 Subfamily A Member 1 (CYP46A1) converts excess cholesterol to 24S-hydroxycholesterol (24S-OHC). Cholesterol has been approved for its direct effect on the production of amyloid-beta and tau proteins. The addition of cholesterol to the brain promotes the activity of beta-site amyloid precursor protein cleaving enzyme 1 (BACE1), secretase, and amyloid precursor protein (APP), which all aid in amyloid-beta production. The reduction of cholesterol esters in the brain have been found to reduce phosphorylated tau levels in mice. In this work, a computational pipeline was developed to identify the protein targets involved in cholesterol regulation in brain and further to identify chemical compounds as the inhibitors of a selected protein target. Since extensive evidence shows the strong correlation between brain cholesterol regulation and Alzheimer’s disease, a detailed literature review on genes or pathways related to the brain cholesterol synthesis and regulation was first conducted in this work. An interaction network was then built for those genes so that the top gene targets were identified. The involvement of these genes in Alzheimer’s disease progression was discussed, which was followed by the investigation of existing clinical trials for those targets. A ligand-protein docking program was finally developed to screen 1.5 million chemical compounds for the selected protein target. A machine learning program was developed to evaluate and predict the binding interaction between chemical compounds and the protein target. The results from this work pave the way for further drug discovery to regulate brain cholesterol to combat Alzheimer’s disease.

Keywords: Alzheimer’s disease, drug discovery, ligand-protein docking, gene-network analysis, cholesterol regulation

Procedia PDF Downloads 55
24562 The Various Legal Dimensions of Genomic Data

Authors: Amy Gooden

Abstract:

When human genomic data is considered, this is often done through only one dimension of the law, or the interplay between the various dimensions is not considered, thus providing an incomplete picture of the legal framework. This research considers and analyzes the various dimensions in South African law applicable to genomic sequence data – including property rights, personality rights, and intellectual property rights. The effective use of personal genomic sequence data requires the acknowledgement and harmonization of the rights applicable to such data.

Keywords: artificial intelligence, data, law, genomics, rights

Procedia PDF Downloads 130
24561 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: data integration, data warehousing, federated architecture, Online Analytical Processing (OLAP)

Procedia PDF Downloads 223
24560 A Review Paper on Data Mining and Genetic Algorithm

Authors: Sikander Singh Cheema, Jasmeen Kaur

Abstract:

In this paper, the concept of data mining is summarized and its one of the important process i.e KDD is summarized. The data mining based on Genetic Algorithm is researched in and ways to achieve the data mining Genetic Algorithm are surveyed. This paper also conducts a formal review on the area of data mining tasks and genetic algorithm in various fields.

Keywords: data mining, KDD, genetic algorithm, descriptive mining, predictive mining

Procedia PDF Downloads 576
24559 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring

Authors: Seung-Lock Seo

Abstract:

This work presents a data-mining empirical monitoring scheme for industrial processes with partially unbalanced data. Measurement data of good operations are relatively easy to gather, but in unusual special events or faults it is generally difficult to collect process information or almost impossible to analyze some noisy data of industrial processes. At this time some noise filtering techniques can be used to enhance process monitoring performance in a real-time basis. In addition, pre-processing of raw process data is helpful to eliminate unwanted variation of industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated for discrete batch process data. It showed that the monitoring performance was improved significantly in terms of monitoring success rate of given process faults.

Keywords: data mining, process data, monitoring, safety, industrial processes

Procedia PDF Downloads 381