Search results for: Data Mining
24929 Quantifying User-Related, System-Related, and Context-Related Patterns of Smartphone Use
Authors: Andrew T. Hendrickson, Liven De Marez, Marijn Martens, Gytha Muller, Tudor Paisa, Koen Ponnet, Catherine Schweizer, Megan Van Meer, Mariek Vanden Abeele
Abstract:
Quantifying and understanding the myriad ways people use their phones and how that impacts their relationships, cognitive abilities, mental health, and well-being is increasingly important in our phone-centric society. However, most studies on the patterns of phone use have focused on theory-driven tests of specific usage hypotheses using self-report questionnaires or analyses of smaller datasets. In this work we present a series of analyses from a large corpus of over 3000 users that combine data-driven and theory-driven analyses to identify reliable smartphone usage patterns and clusters of similar users. Furthermore, we compare the stability of user clusters across user- and system-initiated sessions, as well as during the hypothesized ritualized behavior times directly before and after sleeping. Our results indicate support for some hypothesized usage patterns but present a more complete and nuanced view of how people use smartphones.Keywords: data mining, experience sampling, smartphone usage, health and well being
Procedia PDF Downloads 16324928 Use of Locally Effective Microorganisms in Conjunction with Biochar to Remediate Mine-Impacted Soils
Authors: Thomas F. Ducey, Kristin M. Trippe, James A. Ippolito, Jeffrey M. Novak, Mark G. Johnson, Gilbert C. Sigua
Abstract:
The Oronogo-Duenweg mining belt –approximately 20 square miles around the Joplin, Missouri area– is a designated United States Environmental Protection Agency Superfund site due to lead-contaminated soil and groundwater by former mining and smelting operations. Over almost a century of mining (from 1848 to the late 1960’s), an estimated ten million tons of cadmium, lead, and zinc containing material have been deposited on approximately 9,000 acres. Sites that have undergone remediation, in which the O, A, and B horizons have been removed along with the lead contamination, the exposed C horizon remains incalcitrant to revegetation efforts. These sites also suffer from poor soil microbial activity, as measured by soil extracellular enzymatic assays, though 16S ribosomal ribonucleic acid (rRNA) indicates that microbial diversity is equal to sites that have avoided mine-related contamination. Soil analysis reveals low soil organic carbon, along with high levels of bio-available zinc, that reflect the poor soil fertility conditions and low microbial activity. Our study looked at the use of several materials to restore and remediate these sites, with the goal of improving soil health. The following materials, and their purposes for incorporation into the study, were as follows: manure-based biochar for the binding of zinc and other heavy metals responsible for phytotoxicity, locally sourced biosolids and compost to incorporate organic carbon into the depleted soils, effective microorganisms harvested from nearby pristine sites to provide a stable community for nutrient cycling in the newly composited 'soil material'. Our results indicate that all four materials used in conjunction result in the greatest benefit to these mine-impacted soils, based on above ground biomass, microbial biomass, and soil enzymatic activities.Keywords: locally effective microorganisms, biochar, remediation, reclamation
Procedia PDF Downloads 21724927 Evaluation of the CRISP-DM Business Understanding Step: An Approach for Assessing the Predictive Power of Regression versus Classification for the Quality Prediction of Hydraulic Test Results
Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter
Abstract:
Digitalisation in production technology is a driver for the application of machine learning methods. Through the application of predictive quality, the great potential for saving necessary quality control can be exploited through the data-based prediction of product quality and states. However, the serial use of machine learning applications is often prevented by various problems. Fluctuations occur in real production data sets, which are reflected in trends and systematic shifts over time. To counteract these problems, data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets to extract stable features. Successful process control of the target variables aims to centre the measured values around a mean and minimise variance. Competitive leaders claim to have mastered their processes. As a result, much of the real data has a relatively low variance. For the training of prediction models, the highest possible generalisability is required, which is at least made more difficult by this data availability. The implementation of a machine learning application can be interpreted as a production process. The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model with six phases that describes the life cycle of data science. As in any process, the costs to eliminate errors increase significantly with each advancing process phase. For the quality prediction of hydraulic test steps of directional control valves, the question arises in the initial phase whether a regression or a classification is more suitable. In the context of this work, the initial phase of the CRISP-DM, the business understanding, is critically compared for the use case at Bosch Rexroth with regard to regression and classification. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. Suitable methods for leakage volume flow regression and classification for inspection decision are applied. Impressively, classification is clearly superior to regression and achieves promising accuracies.Keywords: classification, CRISP-DM, machine learning, predictive quality, regression
Procedia PDF Downloads 14424926 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text
Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni
Abstract:
The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance
Procedia PDF Downloads 15124925 Ibrutinib and the Potential Risk of Cardiac Failure: A Review of Pharmacovigilance Data
Authors: Abdulaziz Alakeel, Roaa Alamri, Abdulrahman Alomair, Mohammed Fouda
Abstract:
Introduction: Ibrutinib is a selective, potent, and irreversible small-molecule inhibitor of Bruton's tyrosine kinase (BTK). It forms a covalent bond with a cysteine residue (CYS-481) at the active site of Btk, leading to inhibition of Btk enzymatic activity. The drug is indicated to treat certain type of cancers such as mantle cell lymphoma (MCL), chronic lymphocytic leukaemia and Waldenström's macroglobulinaemia (WM). Cardiac failure is a condition referred to inability of heart muscle to pump adequate blood to human body organs. There are multiple types of cardiac failure including left and right-sided heart failure, systolic and diastolic heart failures. The aim of this review is to evaluate the risk of cardiac failure associated with the use of ibrutinib and to suggest regulatory recommendations if required. Methodology: Signal Detection team at the National Pharmacovigilance Center (NPC) of Saudi Food and Drug Authority (SFDA) performed a comprehensive signal review using its national database as well as the World Health Organization (WHO) database (VigiBase), to retrieve related information for assessing the causality between cardiac failure and ibrutinib. We used the WHO- Uppsala Monitoring Centre (UMC) criteria as standard for assessing the causality of the reported cases. Results: Case Review: The number of resulted cases for the combined drug/adverse drug reaction are 212 global ICSRs as of July 2020. The reviewers have selected and assessed the causality for the well-documented ICSRs with completeness scores of 0.9 and above (35 ICSRs); the value 1.0 presents the highest score for best-written ICSRs. Among the reviewed cases, more than half of them provides supportive association (four probable and 15 possible cases). Data Mining: The disproportionality of the observed and the expected reporting rate for drug/adverse drug reaction pair is estimated using information component (IC), a tool developed by WHO-UMC to measure the reporting ratio. Positive IC reflects higher statistical association while negative values indicates less statistical association, considering the null value equal to zero. The results of (IC=1.5) revealed a positive statistical association for the drug/ADR combination, which means “Ibrutinib” with “Cardiac Failure” have been observed more than expected when compared to other medications available in WHO database. Conclusion: Health regulators and health care professionals must be aware for the potential risk of cardiac failure associated with ibrutinib and the monitoring of any signs or symptoms in treated patients is essential. The weighted cumulative evidences identified from causality assessment of the reported cases and data mining are sufficient to support a causal association between ibrutinib and cardiac failure.Keywords: cardiac failure, drug safety, ibrutinib, pharmacovigilance, signal detection
Procedia PDF Downloads 12924924 A Dynamic Solution Approach for Heart Disease Prediction
Authors: Walid Moudani
Abstract:
The healthcare environment is generally perceived as being information rich yet knowledge poor. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. In fact, valuable knowledge can be discovered from application of data mining techniques in healthcare system. In this study, a proficient methodology for the extraction of significant patterns from the coronary heart disease warehouses for heart attack prediction, which unfortunately continues to be a leading cause of mortality in the whole world, has been presented. For this purpose, we propose to enumerate dynamically the optimal subsets of the reduced features of high interest by using rough sets technique associated to dynamic programming. Therefore, we propose to validate the classification using Random Forest (RF) decision tree to identify the risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions based on the medical profile of patient. Moreover, the experts’ knowledge in this field has been taken into consideration in order to define the disease, its risk factors, and to establish significant knowledge relationships among the medical factors. A computer-aided system is developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated based on set of benchmark techniques applied in this classification problem.Keywords: multi-classifier decisions tree, features reduction, dynamic programming, rough sets
Procedia PDF Downloads 41024923 Solar Power Generation in a Mining Town: A Case Study for Australia
Authors: Ryan Chalk, G. M. Shafiullah
Abstract:
Climate change is a pertinent issue facing governments and societies around the world. The industrial revolution has resulted in a steady increase in the average global temperature. The mining and energy production industries have been significant contributors to this change prompting government to intervene by promoting low emission technology within these sectors. This paper initially reviews the energy problem in Australia and the mining sector with a focus on the energy requirements and production methods utilised in Western Australia (WA). Renewable energy in the form of utility-scale solar photovoltaics (PV) provides a solution to these problems by providing emission-free energy which can be used to supplement the existing natural gas turbines in operation at the proposed site. This research presents a custom renewable solution for the mining site considering the specific township network, local weather conditions, and seasonal load profiles. A summary of the required PV output is presented to supply slightly over 50% of the towns power requirements during the peak (summer) period, resulting in close to full coverage in the trench (winter) period. Dig Silent Power Factory Software has been used to simulate the characteristics of the existing infrastructure and produces results of integrating PV. Large scale PV penetration in the network introduce technical challenges, that includes; voltage deviation, increased harmonic distortion, increased available fault current and power factor. Results also show that cloud cover has a dramatic and unpredictable effect on the output of a PV system. The preliminary analyses conclude that mitigation strategies are needed to overcome voltage deviations, unacceptable levels of harmonics, excessive fault current and low power factor. Mitigation strategies are proposed to control these issues predominantly through the use of high quality, made for purpose inverters. Results show that use of inverters with harmonic filtering reduces the level of harmonic injections to an acceptable level according to Australian standards. Furthermore, the configuration of inverters to supply active and reactive power assist in mitigating low power factor problems. Use of FACTS devices; SVC and STATCOM also reduces the harmonics and improve the power factor of the network, and finally, energy storage helps to smooth the power supply.Keywords: climate change, mitigation strategies, photovoltaic (PV), power quality
Procedia PDF Downloads 16624922 Breast Cancer Survivability Prediction via Classifier Ensemble
Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia
Abstract:
This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.Keywords: classifier ensemble, breast cancer survivability, data mining, SEER
Procedia PDF Downloads 32824921 Phytoremediation of artisanal gold mine tailings - Potential of Chrysopogon zizanioides and Andropogon gayanus in the Sahelian climate
Authors: Yamma Rose, Kone Martine, Yonli Arsène, Wanko Ngnien Adrien
Abstract:
Soil pollution and, consequently, water resources by micropollutants from gold mine tailings constitute a major threat in developing countries due to the lack of waste treatment. Phytoremediation is an alternative for extracting or trapping micropollutants from contaminated soils by mining residues. The potentialities of Chrysopogon zizanioides (acclimated plant) and Andropogon gayanus (native plant) to accumulate arsenic (As), mercury (Hg), iron (Fe) and zinc (Zn) were studied in artisanal gold mine in Ouagadougou, Burkina Faso. The phytoremediation effectiveness of two plant species was studied in 75 pots of 30 liters each, containing mining residues from the artisanal gold processing site in the rural commune of Nimbrogo. The experiments cover three modalities: Tn - planted unpolluted soils; To – unplanted mine tailings and Tp – planted mine tailings arranged in a randomized manner. The pots were amended quarterly with compost to provide nutrients to the plants. The phytoremediation assessment consists of comparing the growth, biomass and capacity of these two herbaceous plants to extract or to trap Hg, Fe, Zn and As in mining residues in a controlled environment. The analysis of plant species parameters cultivated in mine tailings shows indices of relative growth of A. gayanus very significantly high (34.38%) compared to 20.37% for C.zizanioides. While biomass analysis reveals that C. zizanioides has greater foliage and root system growth than A. gayanus. The results after a culture time of 6 months showed that C. zizanioides and A. gayanus have the potential to accumulate Hg, Fe, Zn and As. Root biomass has a more significant accumulation than aboveground biomass for both herbaceous species. Although the BCF bioaccumulation factor values for both plants together are low (<1), the removal efficiency of Hg, Fe, Zn and As is 45.13%, 42.26%, 21.5% and 2.87% respectively in 24 weeks of culture with C. zizanioides. However, pots grown with A. gayanus gives an effectiveness rate of 43.55%; 41.52%; 2.87% and 1.35% respectively for Fe, Zn, Hg and As. The results indicate that the plant species studied have a strong phytoremediation potential, although that of A. gayanus is relatively less than C. zizanioides.Keywords: artisanal gold mine tailings, andropogon gayanus, chrysopogon zizanioides, phytoremediation
Procedia PDF Downloads 6524920 Knowledge Discovery from Production Databases for Hierarchical Process Control
Authors: Pavol Tanuska, Pavel Vazan, Michal Kebisek, Dominika Jurovata
Abstract:
The paper gives the results of the project that was oriented on the usage of knowledge discoveries from production systems for needs of the hierarchical process control. One of the main project goals was the proposal of knowledge discovery model for process control. Specifics data mining methods and techniques was used for defined problems of the process control. The gained knowledge was used on the real production system, thus, the proposed solution has been verified. The paper documents how it is possible to apply new discovery knowledge to be used in the real hierarchical process control. There are specified the opportunities for application of the proposed knowledge discovery model for hierarchical process control.Keywords: hierarchical process control, knowledge discovery from databases, neural network, process control
Procedia PDF Downloads 48124919 Copyright Clearance for Artificial Intelligence Training Data: Challenges and Solutions
Authors: Erva Akin
Abstract:
– The use of copyrighted material for machine learning purposes is a challenging issue in the field of artificial intelligence (AI). While machine learning algorithms require large amounts of data to train and improve their accuracy and creativity, the use of copyrighted material without permission from the authors may infringe on their intellectual property rights. In order to overcome copyright legal hurdle against the data sharing, access and re-use of data, the use of copyrighted material for machine learning purposes may be considered permissible under certain circumstances. For example, if the copyright holder has given permission to use the data through a licensing agreement, then the use for machine learning purposes may be lawful. It is also argued that copying for non-expressive purposes that do not involve conveying expressive elements to the public, such as automated data extraction, should not be seen as infringing. The focus of such ‘copy-reliant technologies’ is on understanding language rules, styles, and syntax and no creative ideas are being used. However, the non-expressive use defense is within the framework of the fair use doctrine, which allows the use of copyrighted material for research or educational purposes. The questions arise because the fair use doctrine is not available in EU law, instead, the InfoSoc Directive provides for a rigid system of exclusive rights with a list of exceptions and limitations. One could only argue that non-expressive uses of copyrighted material for machine learning purposes do not constitute a ‘reproduction’ in the first place. Nevertheless, the use of machine learning with copyrighted material is difficult because EU copyright law applies to the mere use of the works. Two solutions can be proposed to address the problem of copyright clearance for AI training data. The first is to introduce a broad exception for text and data mining, either mandatorily or for commercial and scientific purposes, or to permit the reproduction of works for non-expressive purposes. The second is that copyright laws should permit the reproduction of works for non-expressive purposes, which opens the door to discussions regarding the transposition of the fair use principle from the US into EU law. Both solutions aim to provide more space for AI developers to operate and encourage greater freedom, which could lead to more rapid innovation in the field. The Data Governance Act presents a significant opportunity to advance these debates. Finally, issues concerning the balance of general public interests and legitimate private interests in machine learning training data must be addressed. In my opinion, it is crucial that robot-creation output should fall into the public domain. Machines depend on human creativity, innovation, and expression. To encourage technological advancement and innovation, freedom of expression and business operation must be prioritised.Keywords: artificial intelligence, copyright, data governance, machine learning
Procedia PDF Downloads 8324918 Recent Findings of Late Bronze Age Mining and Archaeometallurgy Activities in the Mountain Region of Colchis (Southern Lechkhumi, Georgia)
Authors: Rusudan Chagelishvili, Nino Sulava, Tamar Beridze, Nana Rezesidze, Nikoloz Tatuashvili
Abstract:
The South Caucasus is one of the most important centers of prehistoric metallurgy, known for its Colchian bronze culture. Modern Lechkhumi – historical Mountainous Colchis where the existence of prehistoric metallurgy is confirmed by the discovery of many artifacts is a part of this area. Studies focused on prehistoric smelting sites, related artefacts, and ore deposits have been conducted during last ten years in Lechkhumi. More than 20 prehistoric smelting sites and artefacts associated with metallurgical activities (ore roasting furnaces, slags, crucible, and tuyères fragments) have been identified so far. Within the framework of integrated studies was established that these sites were operating in 13-9 centuries B.C. and used for copper smelting. Palynological studies of slags revealed that chestnut (Castanea sativa) and hornbeam (Carpinus sp.) wood were used as smelting fuel. Geological exploration-analytical studies revealed that copper ore mining, processing, and smelting sites were distributed close to each other. Despite recent complex data, the signs of prehistoric mines (trenches) haven’t been found in this part of the study area so far. Since 2018 the archaeological-geological exploration has been focused on the southern part of Lechkhumi and covered the areas of villages Okureshi and Opitara. Several copper smelting sites (Okureshi 1 and 2, Opitara 1), as well as a Colchian Bronze culture settlement, have been identified here. Three mine workings have been found in the narrow gorge of the river Rtkhmelebisgele in the vicinities of the village Opitara. In order to establish a link between the Opitara-Okureshi archaeometallurgical sites, Late Bronze Age settlements, and mines, various scientific analytical methods -mineralized rock and slags petrography and atomic absorption spectrophotometry (AAS) analysis have been applied. The careful examination of Opitara mine workings revealed that there is a striking difference between the mine #1 on the right bank of the river and mines #2 and #3 on the left bank. The first one has all characteristic features of the Soviet period mine working (e. g. high portal with angular ribs and roof showing signs of blasting). In contrast, mines #2 and #3, which are located very close to each other, have round-shaped portals/entrances, low roofs, and fairly smooth ribs and are filled with thick layers of river sediments and collapsed weathered rock mass. A thorough review of the publications related to prehistoric mine workings revealed some striking similarities between mines #2 and #3 with their worldwide analogues. Apparently, the ore extraction from these mines was conducted by fire-setting applying primitive tools. It was also established that mines are cut in Jurassic mineralized volcanic rocks. Ore minerals (chalcopyrite, pyrite, galena) are related to calcite and quartz veins. The results obtained through the petrochemical and petrography studies of mineralized rock samples from Opitara mines and prehistoric slags are in complete correlation with each other, establishing the direct link between copper mining and smelting within the study area. Acknowledgment: This work was supported by the Shota Rustaveli National Science Foundation of Georgia (grant # FR-19-13022).Keywords: archaeometallurgy, Mountainous Colchis, mining, ore minerals
Procedia PDF Downloads 18024917 Blue Economy and Marine Mining
Authors: Fani Sakellariadou
Abstract:
The Blue Economy includes all marine-based and marine-related activities. They correspond to established, emerging as well as unborn ocean-based industries. Seabed mining is an emerging marine-based activity; its operations depend particularly on cutting-edge science and technology. The 21st century will face a crisis in resources as a consequence of the world’s population growth and the rising standard of living. The natural capital stored in the global ocean is decisive for it to provide a wide range of sustainable ecosystem services. Seabed mineral deposits were identified as having a high potential for critical elements and base metals. They have a crucial role in the fast evolution of green technologies. The major categories of marine mineral deposits are deep-sea deposits, including cobalt-rich ferromanganese crusts, polymetallic nodules, phosphorites, and deep-sea muds, as well as shallow-water deposits including marine placers. Seabed mining operations may take place within continental shelf areas of nation-states. In international waters, the International Seabed Authority (ISA) has entered into 15-year contracts for deep-seabed exploration with 21 contractors. These contracts are for polymetallic nodules (18 contracts), polymetallic sulfides (7 contracts), and cobalt-rich ferromanganese crusts (5 contracts). Exploration areas are located in the Clarion-Clipperton Zone, the Indian Ocean, the Mid Atlantic Ridge, the South Atlantic Ocean, and the Pacific Ocean. Potential environmental impacts of deep-sea mining include habitat alteration, sediment disturbance, plume discharge, toxic compounds release, light and noise generation, and air emissions. They could cause burial and smothering of benthic species, health problems for marine species, biodiversity loss, reduced photosynthetic mechanism, behavior change and masking acoustic communication for mammals and fish, heavy metals bioaccumulation up the food web, decrease of the content of dissolved oxygen, and climate change. An important concern related to deep-sea mining is our knowledge gap regarding deep-sea bio-communities. The ecological consequences that will be caused in the remote, unique, fragile, and little-understood deep-sea ecosystems and inhabitants are still largely unknown. The blue economy conceptualizes oceans as developing spaces supplying socio-economic benefits for current and future generations but also protecting, supporting, and restoring biodiversity and ecological productivity. In that sense, people should apply holistic management and make an assessment of marine mining impacts on ecosystem services, including the categories of provisioning, regulating, supporting, and cultural services. The variety in environmental parameters, the range in sea depth, the diversity in the characteristics of marine species, and the possible proximity to other existing maritime industries cause a span of marine mining impact the ability of ecosystems to support people and nature. In conclusion, the use of the untapped potential of the global ocean demands a liable and sustainable attitude. Moreover, there is a need to change our lifestyle and move beyond the philosophy of single-use. Living in a throw-away society based on a linear approach to resource consumption, humans are putting too much pressure on the natural environment. Applying modern, sustainable and eco-friendly approaches according to the principle of circular economy, a substantial amount of natural resource savings will be achieved. Acknowledgement: This work is part of the MAREE project, financially supported by the Division VI of IUPAC. This work has been partly supported by the University of Piraeus Research Center.Keywords: blue economy, deep-sea mining, ecosystem services, environmental impacts
Procedia PDF Downloads 8324916 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach
Authors: Theertha Chandroth
Abstract:
This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating comparison and integration between JSON data to XML and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.Keywords: XML, JSON, data comparison, integration testing, Python, SQL
Procedia PDF Downloads 14024915 Using Machine Learning Techniques to Extract Useful Information from Dark Data
Authors: Nigar Hussain
Abstract:
It is a subset of big data. Dark data means those data in which we fail to use for future decisions. There are many issues in existing work, but some need powerful tools for utilizing dark data. It needs sufficient techniques to deal with dark data. That enables users to exploit their excellence, adaptability, speed, less time utilization, execution, and accessibility. Another issue is the way to utilize dark data to extract helpful information to settle on better choices. In this paper, we proposed upgrade strategies to remove the dark side from dark data. Using a supervised model and machine learning techniques, we utilized dark data and achieved an F1 score of 89.48%.Keywords: big data, dark data, machine learning, heatmap, random forest
Procedia PDF Downloads 2824914 Evaluation of Modern Natural Language Processing Techniques via Measuring a Company's Public Perception
Authors: Burak Oksuzoglu, Savas Yildirim, Ferhat Kutlu
Abstract:
Opinion mining (OM) is one of the natural language processing (NLP) problems to determine the polarity of opinions, mostly represented on a positive-neutral-negative axis. The data for OM is usually collected from various social media platforms. In an era where social media has considerable control over companies’ futures, it’s worth understanding social media and taking actions accordingly. OM comes to the fore here as the scale of the discussion about companies increases, and it becomes unfeasible to gauge opinion on individual levels. Thus, the companies opt to automize this process by applying machine learning (ML) approaches to their data. For the last two decades, OM or sentiment analysis (SA) has been mainly performed by applying ML classification algorithms such as support vector machines (SVM) and Naïve Bayes to a bag of n-gram representations of textual data. With the advent of deep learning and its apparent success in NLP, traditional methods have become obsolete. Transfer learning paradigm that has been commonly used in computer vision (CV) problems started to shape NLP approaches and language models (LM) lately. This gave a sudden rise to the usage of the pretrained language model (PTM), which contains language representations that are obtained by training it on the large datasets using self-supervised learning objectives. The PTMs are further fine-tuned by a specialized downstream task dataset to produce efficient models for various NLP tasks such as OM, NER (Named-Entity Recognition), Question Answering (QA), and so forth. In this study, the traditional and modern NLP approaches have been evaluated for OM by using a sizable corpus belonging to a large private company containing about 76,000 comments in Turkish: SVM with a bag of n-grams, and two chosen pre-trained models, multilingual universal sentence encoder (MUSE) and bidirectional encoder representations from transformers (BERT). The MUSE model is a multilingual model that supports 16 languages, including Turkish, and it is based on convolutional neural networks. The BERT is a monolingual model in our case and transformers-based neural networks. It uses a masked language model and next sentence prediction tasks that allow the bidirectional training of the transformers. During the training phase of the architecture, pre-processing operations such as morphological parsing, stemming, and spelling correction was not used since the experiments showed that their contribution to the model performance was found insignificant even though Turkish is a highly agglutinative and inflective language. The results show that usage of deep learning methods with pre-trained models and fine-tuning achieve about 11% improvement over SVM for OM. The BERT model achieved around 94% prediction accuracy while the MUSE model achieved around 88% and SVM did around 83%. The MUSE multilingual model shows better results than SVM, but it still performs worse than the monolingual BERT model.Keywords: BERT, MUSE, opinion mining, pretrained language model, SVM, Turkish
Procedia PDF Downloads 14624913 Genomics of Aquatic Adaptation
Authors: Agostinho Antunes
Abstract:
The completion of the human genome sequencing in 2003 opened a new perspective into the importance of whole genome sequencing projects, and currently multiple species are having their genomes completed sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. This voluminous sequencing data generated across multiple organisms provides also the framework to better understand the genetic makeup of such species and related ones, allowing to explore the genetic changes underlining the evolution of diverse phenotypic traits. Here, recent results from our group retrieved from comparative evolutionary genomic analyses of selected marine animal species will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been determinant in the success of adaptive radiations into diverse habitats and lifestyles.Keywords: comparative genomics, adaptive evolution, bioinformatics, phylogenetics, genome mining
Procedia PDF Downloads 53324912 Decision Support System for Diagnosis of Breast Cancer
Authors: Oluwaponmile D. Alao
Abstract:
In this paper, two models have been developed to ascertain the best network needed for diagnosis of breast cancer. Breast cancer has been a disease that required the attention of the medical practitioner. Experience has shown that misdiagnose of the disease has been a major challenge in the medical field. Therefore, designing a system with adequate performance for will help in making diagnosis of the disease faster and accurate. In this paper, two models: backpropagation neural network and support vector machine has been developed. The performance obtained is also compared with other previously obtained algorithms to ascertain the best algorithms.Keywords: breast cancer, data mining, neural network, support vector machine
Procedia PDF Downloads 34724911 The Right to Data Portability and Its Influence on the Development of Digital Services
Authors: Roman Bieda
Abstract:
The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.Keywords: data portability, digital market, GDPR, personal data
Procedia PDF Downloads 47324910 Application of Building Information Modeling in Energy Management of Individual Departments Occupying University Facilities
Authors: Kung-Jen Tu, Danny Vernatha
Abstract:
To assist individual departments within universities in their energy management tasks, this study explores the application of Building Information Modeling in establishing the ‘BIM based Energy Management Support System’ (BIM-EMSS). The BIM-EMSS consists of six components: (1) sensors installed for each occupant and each equipment, (2) electricity sub-meters (constantly logging lighting, HVAC, and socket electricity consumptions of each room), (3) BIM models of all rooms within individual departments’ facilities, (4) data warehouse (for storing occupancy status and logged electricity consumption data), (5) building energy management system that provides energy managers with various energy management functions, and (6) energy simulation tool (such as eQuest) that generates real time 'standard energy consumptions' data against which 'actual energy consumptions' data are compared and energy efficiency evaluated. Through the building energy management system, the energy manager is able to (a) have 3D visualization (BIM model) of each room, in which the occupancy and equipment status detected by the sensors and the electricity consumptions data logged are displayed constantly; (b) perform real time energy consumption analysis to compare the actual and standard energy consumption profiles of a space; (c) obtain energy consumption anomaly detection warnings on certain rooms so that energy management corrective actions can be further taken (data mining technique is employed to analyze the relation between space occupancy pattern with current space equipment setting to indicate an anomaly, such as when appliances turn on without occupancy); and (d) perform historical energy consumption analysis to review monthly and annually energy consumption profiles and compare them against historical energy profiles. The BIM-EMSS was further implemented in a research lab in the Department of Architecture of NTUST in Taiwan and implementation results presented to illustrate how it can be used to assist individual departments within universities in their energy management tasks.Keywords: database, electricity sub-meters, energy anomaly detection, sensor
Procedia PDF Downloads 30724909 A Comparative Study on Automatic Feature Classification Methods of Remote Sensing Images
Authors: Lee Jeong Min, Lee Mi Hee, Eo Yang Dam
Abstract:
Geospatial feature extraction is a very important issue in the remote sensing research. In the meantime, the image classification based on statistical techniques, but, in recent years, data mining and machine learning techniques for automated image processing technology is being applied to remote sensing it has focused on improved results generated possibility. In this study, artificial neural network and decision tree technique is applied to classify the high-resolution satellite images, as compared to the MLC processing result is a statistical technique and an analysis of the pros and cons between each of the techniques.Keywords: remote sensing, artificial neural network, decision tree, maximum likelihood classification
Procedia PDF Downloads 34724908 The Implementation of Corporate Social Responsibility to Contribute the Isolated District and the Drop behind District to Overcome the Poverty, Study Cases: PT. Kaltim Prima Coal (KPC) Sanggata, East Borneo, Indonesia
Authors: Sri Suryaningsum
Abstract:
The achievement ‘Best Practice Model’ holds by the government on behalf of the success implementation corporate social responsibility program that held on PT. Kaltim Prima Coal which had operation located in the isolated district in Sanggata, it could be the reference for the other companies to improve the social welfare in surrounding area, especially for the companies that have operated in the isolated area in Indonesia. The rule of Kaltim Prima Coal as the catalyst in the development progress to push up the independence of district especially for the district which has located in surrounding mining operation from village level to the regency level, those programs had written in the 7 field program in Corporate Social Responsibility, it was doing by stakeholders. The stakeholders are village government, sub-district government, Regency and citizen. One of the best programs that implement at PT. Kaltim Prima Coal is Regarding Resettlement that was completed based on Asian Development Bank Resettlement Best Practice and International Financial Corporation Resettlement Action Plan. This program contributed on the resettlement residences to develop the isolated and the neglected district.Keywords: CSR, isolated, neglected, poverty, mining industry
Procedia PDF Downloads 24724907 Explainable Graph Attention Networks
Authors: David Pham, Yongfeng Zhang
Abstract:
Graphs are an important structure for data storage and computation. Recent years have seen the success of deep learning on graphs such as Graph Neural Networks (GNN) on various data mining and machine learning tasks. However, most of the deep learning models on graphs cannot easily explain their predictions and are thus often labelled as “black boxes.” For example, Graph Attention Network (GAT) is a frequently used GNN architecture, which adopts an attention mechanism to carefully select the neighborhood nodes for message passing and aggregation. However, it is difficult to explain why certain neighbors are selected while others are not and how the selected neighbors contribute to the final classification result. In this paper, we present a graph learning model called Explainable Graph Attention Network (XGAT), which integrates graph attention modeling and explainability. We use a single model to target both the accuracy and explainability of problem spaces and show that in the context of graph attention modeling, we can design a unified neighborhood selection strategy that selects appropriate neighbor nodes for both better accuracy and enhanced explainability. To justify this, we conduct extensive experiments to better understand the behavior of our model under different conditions and show an increase in both accuracy and explainability.Keywords: explainable AI, graph attention network, graph neural network, node classification
Procedia PDF Downloads 19824906 An Approach to Building a Recommendation Engine for Travel Applications Using Genetic Algorithms and Neural Networks
Authors: Adrian Ionita, Ana-Maria Ghimes
Abstract:
The lack of features, design and the lack of promoting an integrated booking application are some of the reasons why most online travel platforms only offer automation of old booking processes, being limited to the integration of a smaller number of services without addressing the user experience. This paper represents a practical study on how to improve travel applications creating user-profiles through data-mining based on neural networks and genetic algorithms. Choices made by users and their ‘friends’ in the ‘social’ network context can be considered input data for a recommendation engine. The purpose of using these algorithms and this design is to improve user experience and to deliver more features to the users. The paper aims to highlight a broader range of improvements that could be applied to travel applications in terms of design and service integration, while the main scientific approach remains the technical implementation of the neural network solution. The motivation of the technologies used is also related to the initiative of some online booking providers that have made the fact that they use some ‘neural network’ related designs public. These companies use similar Big-Data technologies to provide recommendations for hotels, restaurants, and cinemas with a neural network based recommendation engine for building a user ‘DNA profile’. This implementation of the ‘profile’ a collection of neural networks trained from previous user choices, can improve the usability and design of any type of application.Keywords: artificial intelligence, big data, cloud computing, DNA profile, genetic algorithms, machine learning, neural networks, optimization, recommendation system, user profiling
Procedia PDF Downloads 16324905 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures
Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat
Abstract:
In world-threatening terrorist attacks, where early detection, distinction, and prediction are effective diagnosis techniques and for functionally accurate and precise analysis of terrorism data, there are so many data mining & statistical approaches to assure accuracy. The computational extraction of derived patterns is a non-trivial task which comprises specific domain discovery by means of sophisticated algorithm design and analysis. This paper proposes an approach for similarity extraction by obtaining the useful attributes from the available datasets of terrorist attacks and then applying feature selection technique based on the statistical impurity measures followed by clustering techniques on the basis of similarity measures. On the basis of degree of participation of attributes in the rules, the associative dependencies between the attacks are analyzed. Consequently, to compute the similarity among the discovered rules, we applied a weighted similarity measure. Finally, the rules are grouped by applying using hierarchical clustering. We have applied it to an open source dataset to determine the usability and efficiency of our technique, and a literature search is also accomplished to support the efficiency and accuracy of our results.Keywords: association rules, clustering, similarity measure, statistical approaches
Procedia PDF Downloads 32024904 The Investigation of Enzymatic Activity in the Soils Under the Impact of Metallurgical Industrial Activity in Lori Marz, Armenia
Authors: T. H. Derdzyan, K. A. Ghazaryan, G. A. Gevorgyan
Abstract:
Beta-glucosidase, chitinase, leucine-aminopeptidase, acid phosphomonoestearse and acetate-esterase enzyme activities in the soils under the impact of metallurgical industrial activity in Lori marz (district) were investigated. The results of the study showed that the activities of the investigated enzymes in the soils decreased with increasing distance from the Shamlugh copper mine, the Chochkan tailings storage facility and the ore transportation road. Statistical analysis revealed that the activities of the enzymes were positively correlated (significant) to each other according to the observation sites which indicated that enzyme activities were affected by the same anthropogenic factor. The investigations showed that the soils were polluted with heavy metals (Cu, Pb, As, Co, Ni, Zn) due to copper mining activity in this territory. The results of Pearson correlation analysis revealed a significant negative correlation between heavy metal pollution degree (Nemerow integrated pollution index) and soil enzyme activity. All of this indicated that copper mining activity in this territory causing the heavy metal pollution of the soils resulted in the inhabitation of the activities of the enzymes which are considered as biological catalysts to decompose organic materials and facilitate the cycling of nutrients.Keywords: Armenia, metallurgical industrial activity, heavy metal pollutionl, soil enzyme activity
Procedia PDF Downloads 29624903 Aviation versus Aerospace: A Differential Analysis of Workforce Jobs via Text Mining
Authors: Sarah Werner, Michael J. Pritchard
Abstract:
From pilots to engineers, the skills development within the aerospace industry is exceptionally broad. Employers often struggle with finding the right mixture of qualified skills to fill their organizational demands. This effort to find qualified talent is further complicated by the industrial delineation between two key areas: aviation and aerospace. In a broad sense, the aerospace industry overlaps with the aviation industry. In turn, the aviation industry is a smaller sector segment within the context of the broader definition of the aerospace industry. Furthermore, it could be conceptually argued that -in practice- there is little distinction between these two sectors (i.e., aviation and aerospace). However, through our unstructured text analysis of over 6,000 job listings captured, our team found a clear delineation between aviation-related jobs and aerospace-related jobs. Using techniques in natural language processing, our research identifies an integrated workforce skill pattern that clearly breaks between these two sectors. While the aviation sector has largely maintained its need for pilots, mechanics, and associated support personnel, the staffing needs of the aerospace industry are being progressively driven by integrative engineering needs. Increasingly, this is leading many aerospace-based organizations towards the acquisition of 'system level' staffing requirements. This research helps to better align higher educational institutions with the current industrial staffing complexities within the broader aerospace sector.Keywords: aerospace industry, job demand, text mining, workforce development
Procedia PDF Downloads 27224902 A Schema of Building an Efficient Quality Gate throughout the Software Development with Tools
Authors: Le Chen
Abstract:
This paper presents an efficient tool platform scheme to ensure quality protection throughout the software development process. The main principle is to manage the information of requirements, design, development, testing, operation and maintenance process with proper tools, and to set up the quality standards of each process. Through the tools’ display and summary of quality standards, the quality standards can be visualizad and ready for policy decision, which is called Quality Gate in this paper. In addition, the tools are also integrated to achieve the exchange and relation of information which highly improving operational efficiency. In this paper, the feasibility of the scheme is verified by practical application of development projects, and the overall information display and data mining are proposed to be further improved.Keywords: efficiency, quality gate, software process, tools
Procedia PDF Downloads 35824901 How to Use Big Data in Logistics Issues
Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy
Abstract:
Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does Data. Utilizing massive data sets enable companies to get competitive advantages over their adversaries. Out of many area of Big Data usage, logistics has significance role in both commercial sector and military. This paper lays out what big data is and how it is used in both military and commercial logistics.Keywords: big data, logistics, operational efficiency, risk management
Procedia PDF Downloads 64124900 A Case Study of Ontology-Based Sentiment Analysis for Fan Pages
Authors: C. -L. Huang, J. -H. Ho
Abstract:
Social media has become more and more important in our life. Many enterprises promote their services and products to fans via the social media. The positive or negative sentiment of feedbacks from fans is very important for enterprises to improve their products, services, and promotion activities. The purpose of this paper is to understand the sentiment of the fan’s responses by analyzing the responses posted by fans on Facebook. The entity and aspect of fan’s responses were analyzed based on a predefined ontology. The ontology for cell phone sentiment analysis consists of aspect categories on the top level as follows: overall, shape, hardware, brand, price, and service. Each category consists of several sub-categories. All aspects for a fan’s response were found based on the ontology, and their corresponding sentimental terms were found using lexicon-based approach. The sentimental scores for aspects of fan responses were obtained by summarizing the sentimental terms in responses. The frequency of 'like' was also weighted in the sentimental score calculation. Three famous cell phone fan pages on Facebook were selected as demonstration cases to evaluate performances of the proposed methodology. Human judgment by several domain experts was also built for performance comparison. The performances of proposed approach were as good as those of human judgment on precision, recall and F1-measure.Keywords: opinion mining, ontology, sentiment analysis, text mining
Procedia PDF Downloads 232