Search results for: biological data mining
26555 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches
Authors: Aya Salama
Abstract:
Digital Twin is an emerging research topic that has attracted researchers over the last decade. It is used in many fields, such as smart manufacturing and smart healthcare, because it saves time and money. It is usually related to other technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, the Human Digital Twin (HDT) specifically is a novel idea that still needs to prove its feasibility. HDT extends the idea of the Digital Twin to human beings, who are living beings and differ from inanimate physical entities. The goal of this research was to create a Human Digital Twin responsible for automating real-time human replies by simulating human behavior. For this reason, clustering, supervised classification, topic extraction, and sentiment analysis were studied in this paper. This work proved the feasibility of the HDT for generating personal replies on social messaging applications. The overall accuracy of the proposed approach was 63%, a very promising result that can open the way for researchers to expand the idea of HDT. This was achieved by using Random Forest for clustering the question database and matching new questions. K-nearest neighbor was also applied for sentiment analysis.
Keywords: human digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification, clustering
Procedia PDF Downloads 87
26554 Development of Prediction Models of Day-Ahead Hourly Building Electricity Consumption and Peak Power Demand Using the Machine Learning Method
Authors: Dalin Si, Azizan Aziz, Bertrand Lasternas
Abstract:
To encourage building owners to purchase electricity at the wholesale market and reduce building peak demand, this study aims to develop models that predict day-ahead hourly electricity consumption and demand using artificial neural networks (ANN) and support vector machines (SVM). All prediction models are built in Python with the Scikit-learn and PyBrain libraries. The input data for both consumption and demand prediction are time stamp, outdoor dry bulb temperature, relative humidity, air handling unit (AHU), supply air temperature and solar radiation. Solar radiation, which is unavailable a day ahead, is predicted first, and this estimate is then used as an input to predict consumption and demand. Models to predict consumption and demand are trained with both SVM and ANN, and depend on cooling or heating and on weekdays or weekends. The results show that ANN is the better option for both consumption and demand prediction. It achieves a coefficient of variation of the root mean square error (CVRMSE) of 15.50% to 20.03% for consumption prediction and 22.89% to 32.42% for demand prediction. To conclude, the presented models have the potential to help building owners purchase electricity at the wholesale market, but they are not robust when used in demand response control.
Keywords: building energy prediction, data mining, demand response, electricity market
Procedia PDF Downloads 316
26553 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles
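For readers unfamiliar with the CVRMSE metric reported above, the following is a minimal sketch of how it is commonly computed: the root mean square error divided by the mean of the measured values, expressed as a percentage. The sample values are hypothetical and are not taken from the study; this sketch divides by the number of samples, whereas some guidelines apply a degrees-of-freedom correction.

```python
import math


def cvrmse(actual, predicted):
    """Coefficient of variation of the RMSE, in percent.

    Assumes plain RMSE (dividing by n) normalized by the mean of the
    measured values; other conventions exist.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    mean_actual = sum(actual) / len(actual)
    return 100.0 * math.sqrt(mse) / mean_actual


# Hypothetical hourly consumption values (kWh), for illustration only.
actual = [100.0, 120.0, 80.0, 100.0]
predicted = [110.0, 110.0, 90.0, 90.0]
score = cvrmse(actual, predicted)
```

A lower CVRMSE indicates that prediction errors are small relative to the typical load.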
Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis
Abstract:
Organizations, including governments, generate (big) data that are high in volume, velocity, and veracity and come from a variety of sources. Public administrations are using (big) data, implementing base registries, and enforcing data sharing across the entire government to deliver (big) data-related integrated services, provide insights to users, and support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services, such as data storage and hosting, to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationships in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and the classification of actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.
Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review
Procedia PDF Downloads 162
26552 GIS-Based Spatial Distribution and Evaluation of Selected Heavy Metals Contamination in Topsoil around Ecton Mining Area, Derbyshire, UK
Authors: Zahid O. Alibrahim, Craig D. Williams, Clive L. Roberts
Abstract:
The study area (Ecton mining area) is located in the southern part of the Peak District in Derbyshire, England. It is bounded by the River Manifold to the west. This area has been mined for a long period. As a result, huge amounts of potentially toxic metals were released into the surrounding area and are most likely a significant source of heavy metal contamination of the local soil, water and vegetation. In order to appraise the potential heavy metal pollution in this area, 37 topsoil samples (5-20 cm depth) were collected and analysed for their total content of Cu, Pb, Zn, Mn, Cr, Ni and V using ICP (Inductively Coupled Plasma) optical emission spectroscopy. Multivariate geospatial analyses using the GIS technique were utilised to draw geochemical maps of the metals of interest over the study area. A few hotspots, areas of elevated metal concentrations, were identified and are presumed to be the result of anthropogenic activities. In addition, the soil’s environmental quality was evaluated by calculating Müller’s geoaccumulation index (Igeo), which suggests that the degree of contamination of the investigated heavy metals has the following trend: Pb > Zn > Cu > Mn > Ni = Cr = V. Furthermore, the potential ecological risk was assessed using the enrichment factor (EF). On the basis of the calculated EF values, the levels of pollution for the studied metals in the study area have the following order: Pb > Zn > Cu > Cr > V > Ni > Mn.
Keywords: enrichment factor, geoaccumulation index, GIS, heavy metals, multivariate analysis
Procedia PDF Downloads 358
26551 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient
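The two indices named in the abstract above can be illustrated with a short sketch. It assumes the commonly used definitions: the geoaccumulation index as log2 of the measured concentration over 1.5 times the geochemical background, and the enrichment factor as the metal-to-reference-element ratio in the sample divided by the same ratio in the background. The abstract does not state which reference element or background values were used, so every number below is hypothetical.

```python
import math


def geoaccumulation_index(concentration, background):
    """Igeo = log2(C / (1.5 * B)); the factor 1.5 allows for natural
    variation in the background value."""
    return math.log2(concentration / (1.5 * background))


def enrichment_factor(metal, reference, metal_bg, reference_bg):
    """EF = (metal / reference) in the sample divided by the same ratio
    in the background material."""
    return (metal / reference) / (metal_bg / reference_bg)


# Hypothetical concentrations in mg/kg, for illustration only.
igeo_pb = geoaccumulation_index(concentration=120.0, background=20.0)
ef_pb = enrichment_factor(metal=120.0, reference=400.0,
                          metal_bg=20.0, reference_bg=800.0)
```

Both indices compare a measured concentration with a background level, which is why the two rankings reported in the abstract can differ: the enrichment factor additionally normalizes by a reference element.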
Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart
Abstract:
Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The high demand for image annotation and archiving on the web has led researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: neither the description of the elementary characteristics of the image nor the correlation between labels is taken into account. In this paper, we present an algorithm (MIML-HOGLPP) that handles both limitations simultaneously. The algorithm uses the histogram of gradients as the feature descriptor. It applies the Label Priority Power-set as the multi-label transformation to solve the problem of label correlation. The experiment shows that MIML-HOGLPP performs better on some of the evaluation metrics than the two existing techniques.
Keywords: data mining, information retrieval system, multi-label, problem transformation, histogram of gradients
Procedia PDF Downloads 374
26550 Colloids and Heavy Metals in Groundwaters: Tangential Flow Filtration Method for Study of Metal Distribution on Different Sizes of Colloids
Authors: Jiancheng Zheng
Abstract:
When metals are released into water from mining activities, they undergo chemical, physical and biological changes and may then become more mobile and transportable along the waterway from their original sites. Natural colloids, including both organic and inorganic entities, occur in any aquatic environment, with sizes in the nanometer range. Natural colloids in a water system play an important role, quite often a key role, in binding and transporting compounds. When assessing and evaluating metals in natural waters, their sources, mobility, fate, and distribution patterns in the system are the major concerns from the point of view of assessing environmental contamination and pollution during resource development. There are a few ways to quantify colloids and accordingly study how metals distribute on different sizes of colloids. Current research results show that the presence of colloids can enhance the transport of some heavy metals in water, while heavy metals may also influence the transport of colloids when cations in the water system change the colloids and/or the ionic strength of the water system changes. Therefore, studies of the relationship between different sizes of colloids and different metals in a water system are needed, as natural colloids in water systems are complex mixtures of organic, inorganic and biological materials. Their stability could be sensitive to changes in their shapes, phases, hardness and functionalities due to coagulation, deposition and other processes, and to chemical, physical, and biological reactions. Because the adsorption of metal contaminants on colloid surfaces is closely related to colloid properties, it is desirable to fractionate water samples as soon as possible after they are taken in the natural environment in order to avoid changes to the samples during transportation and storage.
For this reason, this study carried out groundwater sample processing in the field, using Prep/Scale tangential flow filtration systems with 3-level cartridges (1 kDa, 10 kDa and 100 kDa). Groundwater samples from seven sites at Fort McMurray, Alberta, Canada, were fractionated during the 2015 field sampling season. All samples were processed within 3 hours after they were taken. Preliminary results show that although the distribution pattern of metals on colloids may vary between samples taken from different sites, some elements often tend to associate with larger colloids (such as Fe and Re), some with finer colloids (such as Sb and Zn), while some are mainly in the dissolved form (such as Mo and Be). This information is useful for evaluating and projecting the fate and mobility of different metals in the groundwaters and possibly in environmental water systems.
Keywords: metal, colloid, groundwater, mobility, fractionation, sorption
Procedia PDF Downloads 362
26549 Development of Electroencephalograph Collection System in Language-Learning Self-Study System That Can Detect Learning State of the Learner
Authors: Katsuyuki Umezawa, Makoto Nakazawa, Manabu Kobayashi, Yutaka Ishii, Michiko Nakano, Shigeichi Hirasawa
Abstract:
This research aims to develop a self-study system equipped with an artificial teacher that gives advice to students by detecting the learners, and to evaluate language learning in a unified framework. 'Detecting the learners' means that the system understands the learners' learning conditions, such as each learner’s degree of understanding, the difference in each learner’s thinking process, the degree of concentration or boredom in learning, and problem solving for each learner, which can be interpreted from learning behavior. In this paper, we propose a system that efficiently collects brain waves from learners, focusing only on brain waves among the biological information available for 'detecting the learners'. The conventional electroencephalograph (EEG) measurement method during learning using a simple EEG has the following disadvantages. (1) The start and end of EEG measurement must be done manually by the experiment participant or staff. (2) When the EEG signal is weak, this may go unnoticed, and the data may not be obtained. (3) Since the acquired EEG data is stored on each PC, the time of data acquisition may differ between PCs. We therefore developed a system that collects brain wave data on the server side. This system overcame the above disadvantages.
Keywords: artificial teacher, e-learning, self-study system, simple EEG
Procedia PDF Downloads 143
26548 Synthesis and in-Vitro Biological Activity of Novel Gallic Acid Derivatives
Authors: Hossein Mostafavi
Abstract:
A diversity of biological activities and pharmaceutical uses, such as antibacterial, anticancer, and anti-inflammatory activity, have been attributed to gallic acid derivatives. A series of gallic acid derivatives were synthesized, and their structures were confirmed by FT-IR, HNMR, CNMR, and elemental analysis. The in vitro biological activity of the compounds was determined against Proteus vulgaris ATCC 7829 and Escherichia coli ATCC 25922 as Gram-negative bacteria, and Bacillus cereus ATCC 11778 and Staphylococcus aureus ATCC 6538 as Gram-positive bacteria. Antibacterial susceptibility tests were performed using the paper disc diffusion method on Mueller Hinton agar (Merck). Chloramphenicol, Penicillin, Streptomycin and Tetracycline were the standard reference antibiotics. The zone of inhibition against bacteria was measured after 24 hours at 37 °C. Compounds 3, 4, and 5 were the main antibacterial compounds against Gram-negative bacteria but not Gram-positive bacteria.
Keywords: gallic acid derivatives, antibacterial, antibiotics, inhibition
Procedia PDF Downloads 136
26547 Development of a Technology Assessment Model by Patents and Customers' Review Data
Authors: Kisik Song, Sungjoo Lee
Abstract:
Recent years have seen an increasing number of patent disputes due to excessive competition in the global market and a reduced technology life-cycle; this has increased the risk of investment in technology development. While many global companies have started developing methodologies to identify promising technologies and assess them for decision-making, the existing methodology still has some limitations. Post hoc assessments of new technologies are not being performed, especially to determine whether the suggested technologies turned out to be promising. For example, in existing quantitative patent analysis, a patent’s citation information has served as an important metric for quality assessment, but this analysis cannot be applied to recently registered patents because such information accumulates over time. Therefore, we propose a new technology assessment model that can replace citation information and positively affect technological development, based on post hoc analysis of the patents for promising technologies. Additionally, we collect customer reviews on a target technology to extract keywords that show the customers’ needs, and we determine how many keywords are covered by the new technology. Finally, we construct a portfolio (based on a technology assessment from patent information) and a customer-based marketability assessment (based on review data), and we use them to visualize the characteristics of the new technologies.
Keywords: technology assessment, patents, citation information, opinion mining
Procedia PDF Downloads 466
26546 Application of Acid Base Accounting to Predict Post-Mining Drainage Quality in Coalfields of the Main Karoo Basin and Selected Sub-Basins, South Africa
Authors: Lindani Ncube, Baojin Zhao, Ken Liu, Helen Johanna Van Niekerk
Abstract:
Acid Base Accounting (ABA) is a tool used to assess the total amount of acidity or alkalinity contained in a specific rock sample, and is based on the total S concentration and the carbonate content of a sample. A preliminary ABA test was conducted on 14 sandstone and 5 coal samples taken from coalfields representing the Main Karoo Basin (Highveld, Vryheid and Molteno/Indwe Coalfields) and the Sub-basins (Witbank and Waterberg Coalfields). The results indicate that sandstone and coal from the Main Karoo Basin have the potential to generate Acid Mine Drainage (AMD), as they contain sufficient pyrite to generate acid, with the final pH of samples relatively low upon complete oxidation of pyrite. Sandstone from collieries representing the Main Karoo Basin is characterised by elevated contents of reactive S%. All the studied samples except two were characterised by an Acid Potential (AP) that is less than the Neutralizing Potential (NP). The results further indicate that the sandstone from the Main Karoo Basin is more prone to acid generation than the sandstone from the Sub-basins. However, the coal has a relatively low potential to generate any acid. The application of ABA in this study contributes to an understanding of the complexities governing water-rock interactions. In general, the coalfields of the Main Karoo Basin have a much higher potential to produce AMD during mining processes than the coalfields in the Sub-basins.
Keywords: Main Karoo Basin, sub-basin, coal, sandstone, acid base accounting (ABA)
Procedia PDF Downloads 433
26545 The Crisis of Turkey's Downing the Russian Warplane within the Concept of Country Branding: The Examples of BBC World, and Al Jazeera English
Authors: Derya Gül Ünlü, Oguz Kuş
Abstract:
Branding a country means giving it a position distinct from other countries in its region so that it is perceived more specifically. This is made possible by the country's branding efforts: presenting the uniqueness of its national structures in a specific way, creating the desired image, and attracting tourists and foreign investors. Establishing a national brand involves, in a sense, managing the perceptions that citizens of other countries hold about the target country by structuring the country's image permanently and holistically. By this means, countries are less easily affected by crises in their international relations. Accordingly, this research aims to show how the warplane downing crisis between Turkey and Russia was perceived on social media. Turkey downed the Russian warplane on November 24, 2015, on the grounds that it had violated Turkish airspace on the Syrian border. Relations between the two countries then became tense, and Russia called on its citizens not to travel to Turkey and on its citizens in Turkey to return home. Relations weakened further: for example, tours organized from Russia to Turkey and visa-free travel were canceled, and all military dialogue was cut off. After the event, various news sites published many reports on the topic on social media, and readers made various comments about the event and Turkey. In this context, the perception of Turkey's national brand before and after the warplane downing crisis was investigated through comments collected from reports on the Facebook accounts of BBC World and Al Jazeera English, which have a wide presence on social media.
To carry out the study, user comments were collected from jet-downing-related news published on the Facebook fan pages of BBC World Service and Al Jazeera English. All news items published between 24.10.2015 and 24.12.2015 that contained the keyword Turk or Turkey in the title made up the data set of our study. The comments written on these news items were then analyzed using text mining. Furthermore, sentiment analysis was used to reveal readers’ emotions before and after the crisis.
Keywords: Al Jazeera English, BBC World, country branding, social media, text mining
Procedia PDF Downloads 223
26544 Government Big Data Ecosystem: A Systematic Literature Review
Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis
Abstract:
Data that is high in volume, velocity, and veracity and comes from a variety of sources is generated in all sectors, including the government sector. Globally, public administrations are pursuing (big) data as a new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can lead to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to the successful implementation of a government (big) data ecosystem. The essential aspects of the government (big) data ecosystem include its definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystems literature. As this is a new topic, we did not find specific articles on the government (big) data ecosystem and therefore focused our research on various relevant areas such as humanitarian data, open government data, scientific research data, and industry data.
Keywords: applications of big data, big data, big data types, big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review
Procedia PDF Downloads 228
26543 A Machine Learning Decision Support Framework for Industrial Engineering Purposes
Authors: Anli Du Preez, James Bekker
Abstract:
Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited, since only about 1% of generated data is ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which are a subset of data analytics. The developed framework is designed to assist data analysts with little experience in choosing the appropriate machine learning algorithm for the purpose of their application.
Keywords: Data analytics, Industrial engineering, Machine learning, Value creation
Procedia PDF Downloads 168
26542 Trichoderma spp Consortium and Its Efficacy as Biological Control Agent of Ganoderma Disease of Oil Palm (Elaies guineensis Jacquin)
Authors: Habu Musa, Nusaibah Binti Syd Ali
Abstract:
Oil palm industries, particularly in Malaysia and Indonesia, are being devastated by Ganoderma disease caused by Ganoderma spp. To date, this disease has been causing serious oil palm yield losses and the collapse of oil palm trees, thus affecting the crop's contribution to the producers’ economies. Research on sustainable and eco-friendly remedies to counter Ganoderma disease is on the rise, in order to avoid the current control measures based on synthetic fungicides. Trichoderma species have been the most studied and valued microbes as biological control agents in efforts to combat a wide range of plant diseases sustainably. Therefore, in the current study, the potential of a consortium of Trichoderma spp. (Trichoderma asperellum, Trichoderma harzianum, and Trichoderma virens) was evaluated as a biological control agent against Ganoderma disease in oil palm. The Trichoderma spp. consortium was found to be the most effective treatment in suppressing Ganoderma disease, with 83.03% and 89.16% suppression based on the foliar and bole symptoms, respectively. In addition, it markedly enhanced the vegetative growth parameters of the oil palm seedlings. Significantly higher peroxidase and polyphenol oxidase activity and total phenolic content were also recorded in the consortium treatment than in the control treatment. Disease development was slower in the seedlings treated with the Trichoderma spp. consortium than in the positive control, which exhibited the highest percentage of disease severity.
Keywords: biological control, ganoderma disease, trichoderma, disease severity
Procedia PDF Downloads 276
26541 Creation of Computerized Benchmarks to Facilitate Preparedness for Biological Events
Abstract:
Introduction: Communicable diseases and pandemics pose a growing threat to the well-being of the global population. A vital component of protecting public health is the creation and maintenance of continuous preparedness for such hazards. A joint Israeli-German task force was deployed to develop an advanced tool for self-evaluation of emergency preparedness for various types of biological threats. Methods: Based on a comprehensive literature review and interviews with leading content experts, an evaluation tool was developed based on quantitative and qualitative parameters and indicators. A modified Delphi process was used to achieve consensus among over 225 experts from both Germany and Israel concerning the items to be included in the evaluation tool. The validity and applicability of the tool for medical institutions were examined in a series of simulation and field exercises. Results: Over 115 German and Israeli experts reviewed and examined the proposed parameters as part of the modified Delphi cycles. A consensus of over 75% of experts was attained for 183 out of 188 items. The relative importance of each parameter was rated as part of the Delphi process in order to define its impact on overall emergency preparedness. The parameters were integrated into computerized web-based software that calculates emergency preparedness scores for biological events. Conclusions: The parameters developed in the joint German-Israeli project serve as benchmarks that delineate actions to be implemented in order to create and maintain ongoing preparedness for biological events. The computerized evaluation tool makes it possible to continuously monitor the level of readiness, so that strengths and gaps can be identified and gaps corrected appropriately. Adoption of such a tool is recommended as an integral component of quality assurance of public health and safety.
Keywords: biological events, emergency preparedness, bioterrorism, natural biological events
Procedia PDF Downloads 423
26540 Research on the Aeration Systems’ Efficiency of a Lab-Scale Wastewater Treatment Plant
Authors: Oliver Marunțălu, Elena Elisabeta Manea, Lăcrămioara Diana Robescu, Mihai Necșoiu, Gheorghe Lăzăroiu, Dana Andreya Bondrea
Abstract:
In order to obtain efficient pollutant removal in small-scale wastewater treatment plants, uniform water flow has to be achieved. The experimental setup, designed for treating high-load wastewater (leachate), consists of two aerobic biological reactors and a lamellar settler. Both biological tanks were aerated using three different types of aeration systems: perforated pipes, membrane air diffusers and tube ceramic diffusers. The ability of each air diffusion system to homogenize the water mass was evaluated comparatively. The oxygen concentration was determined by optical sensors with data logging. The experimental data were analyzed comparatively for all three air dispersion systems, aiming to identify the variation in oxygen concentration under different operational conditions. The Oxygenation Capacity was calculated for each of the three systems and used as a performance and selection parameter. The global mass transfer coefficients were also evaluated as important tools in designing the aeration system. Even though using the tubular porous diffusers leads to a higher oxygen concentration than the perforated pipe system (which provides medium-sized bubbles in the aqueous solution), it does not reach the threshold limit of 80% oxygen saturation in less than 30 minutes. The study has shown that the optimal solution for the studied configuration was the radial air diffusers, which ensure an oxygen saturation of 80% in 20 minutes. The values increased when the air flow was increased.
Keywords: flow, aeration, bioreactor, oxygen concentration
Procedia PDF Downloads 389
26539 Artificial Intelligence in Bioscience: The Next Frontier
Authors: Parthiban Srinivasan
Abstract:
With recent advances in computational power and access to enough data in the biosciences, artificial intelligence methods are increasingly being used in drug discovery research. These methods are essentially a series of advanced statistics-based exercises that review the past to indicate the likely future. Our goal is to develop a model that accurately predicts biological activity and toxicity parameters for novel compounds. We have compiled a robust library of over 150,000 chemical compounds with different pharmacological properties from the literature and public domain databases. The compounds are stored in the simplified molecular-input line-entry system (SMILES), a commonly used text encoding for organic molecules. We use an automated process to generate an array of numerical descriptors (features) for each molecule. Redundant and irrelevant descriptors are eliminated iteratively. Our prediction engine is based on a portfolio of machine learning algorithms. We found the Random Forest algorithm to be a better choice for this analysis. We captured the non-linear relationships in the data and formed a prediction model with reasonable accuracy by averaging across a large number of randomized decision trees. Our next step is to apply a deep neural network (DNN) algorithm to predict the biological activity and toxicity properties. We expect the DNN algorithm to give better results and improve the accuracy of the prediction. This presentation will review these prominent machine learning and deep learning methods and our implementation protocols, and discuss the usefulness of these techniques in biomedical and health informatics.
Keywords: deep learning, drug discovery, health informatics, machine learning, toxicity prediction
Procedia PDF Downloads 356
26538 The Effect of Feature Selection on Pattern Classification
Authors: Chih-Fong Tsai, Ya-Han Hu
Abstract:
The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) so that the classifier performs better than one without feature selection. Although there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies examine the effect of different feature selection algorithms on the classification performance of different classifiers over different types of datasets. In this paper, two widely used algorithms, the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. Three well-known classifiers are then constructed: the CART decision tree (DT), the multi-layer perceptron (MLP) neural network, and the support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.
Keywords: data mining, feature selection, pattern classification, dimensionality reduction
Procedia PDF Downloads 669
26537 Analysis and Identification of Different Factors Affecting Students’ Performance Using a Correlation-Based Network Approach
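As a small illustration of the information gain (IG) criterion mentioned in the abstract above, the sketch below scores a discrete feature by how much it reduces the entropy of the class labels. It is not the authors' implementation; the toy features and labels are hypothetical, and continuous features would first need to be discretized.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy of the labels
    within each group of samples sharing the same feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(group) / total * entropy(group)
                      for group in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: feature_a separates the classes, feature_b does not.
labels = [1, 1, 0, 0]
feature_a = ["x", "x", "y", "y"]
feature_b = ["x", "y", "x", "y"]
gain_a = information_gain(feature_a, labels)
gain_b = information_gain(feature_b, labels)
```

Ranking features by this score and keeping the highest-scoring ones is the usual filter-style use of IG; here `feature_a` would be kept and `feature_b` dropped.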
Authors: Jeff Chak-Fu Wong, Tony Chun Yin Yip
Abstract:
The transition from secondary school to university seems exciting for many first-year students but can be more challenging than expected. Enabling instructors to know students’ learning habits and styles enhances their understanding of the students’ learning backgrounds, allows teachers to provide better support for their students, and therefore has high potential to improve teaching quality and learning, especially in mathematics-related courses. The aim of this research is to collect students’ data using online surveys, to analyze students’ factors using learning analytics and educational data mining, and to discover the characteristics of the students at risk of falling behind in their studies based on their previous academic backgrounds and the collected data. In this paper, we use correlation-based distance methods and mutual information to measure relationships between student factors. We then develop a factor network using the Minimum Spanning Tree method and consider further study of the topological properties of these networks using social network analysis tools. Under the framework of mutual information, two graph-based feature filtering methods, i.e., unsupervised and supervised infinite feature selection algorithms, are used to rank the features in the students’ data and select appropriate subsets, yielding effective results in identifying the factors affecting students at risk of failing. This discovered knowledge may help students as well as instructors enhance educational quality by identifying possible under-performers at the beginning of the first semester and giving them special attention in order to help their learning process and improve their learning outcomes.
Keywords: students' academic performance, correlation-based distance method, social network analysis, feature selection, graph-based feature filtering method
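The step from correlation-based distances to a Minimum Spanning Tree factor network can be sketched as follows. The abstract does not give the distance formula; this sketch assumes the widely used d = sqrt(2(1 - r)), where r is the Pearson correlation, and builds the tree with Kruskal's algorithm. The factor names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_distance(x, y):
    """Assumed distance d = sqrt(2 * (1 - r)); strongly correlated factors are close."""
    return sqrt(max(0.0, 2.0 * (1.0 - pearson(x, y))))


def minimum_spanning_tree(factors):
    """Kruskal's algorithm over the complete graph of factors."""
    names = list(factors)
    edges = sorted(
        (correlation_distance(factors[a], factors[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
    parent = {name: name for name in names}

    def find(node):
        # Follow parent links to the component root, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for distance, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, distance))
    return tree


# Hypothetical survey factors for five students; illustration only.
factors = {
    "study_hours": [1, 2, 3, 4, 5],
    "attendance": [2, 4, 6, 8, 11],
    "gaming_hours": [5, 3, 4, 1, 2],
}
tree = minimum_spanning_tree(factors)
```

With three factors the tree has two edges: the near-duplicate pair `study_hours` and `attendance` is linked first, and `gaming_hours` attaches through its shorter remaining edge. The topological analysis mentioned in the abstract would start from an edge list like `tree`.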
Procedia PDF Downloads 129
26536 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection
Authors: Yaojun Wang, Yaoqing Wang
Abstract:
Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection systems. A profitable stock-selection system should consider both the stock’s investment value and the market timing. In this paper, we present a hybrid system for stock selection that takes both into account. The system uses a case-based reasoning (CBR) model to perform the stock classification and a decision-tree model to help with market timing and stock selection. The experiments show that this hybrid system performs better than other techniques with regard to classification accuracy, average return, and the Sharpe ratio.
Keywords: case-based reasoning, decision tree, stock selection, machine learning
Procedia PDF Downloads 420
26535 Embodying the Ecological Validity in Creating the Sustainable Public Policy: A Study in Strengthening the Green Economy in Indonesia
Authors: Gatot Dwi Hendro, Hayyan ul Haq
Abstract:
This work explores strategies for embodying ecological validity in the creation of sustainable public policy, particularly in strengthening the green economy in Indonesia. The green economy plays an important role in supporting national development in Indonesia, as it is part of the national policy that is given primary priority in Indonesian governance. The green economy refers to national development covering strategic natural resources, such as mining, gold, oil, coal, forest, water, and marine resources, and the supporting infrastructure for products and distribution, such as fabrics, roads, bridges, and so forth. Thus, all activities in national development should consider sustainability. This sustainability requires a strong commitment from the national, regional, and local governments to make ecology the main requirement for issuing any policy, such as licences for mining production and for developing and building new production and supporting infrastructure for optimising national resources. For that reason, this work focuses on the strategy for embodying ecological values and norms in public policy. In detail, this work offers a method, i.e., legal techniques, for visualising and embodying norms and public policy that are ecologically valid. This ecological validity is required in order to maintain and sustain our collective life.
Keywords: ecological validity, sustainable development, coherence, Indonesian Pancasila values, environment, marine
Procedia PDF Downloads 485
26534 Advanced Oxidation Processes as a Pre-oxidation Step for Biological Treatment of Leachate from Technical Landfills
Authors: Ala Abdessemed, Mohamed Seddik Oussama Belahmadi, Nabil Charchar, Abdefettah Gherib, Bradai Fares, Boussadia Chouaib Nour El-Islem
Abstract:
Algerian cities are confronted with large quantities of waste generated by the disposal of household and similar residues in technical landfills (CET), such as the one in Batna. The interaction between waste components and incoming water generates leachates rich in organic matter and trace elements, which require treatment before discharge. The aim of this study was to propose an effective process for treating the leachates, which were subjected to an initial chemical treatment using the H₂O₂/UV system. Optimal treatment conditions were determined at [H₂O₂] of 0.3 M and pH of 8.6. Next, two hybrid biological treatment systems were applied: hybrid system I (H₂O₂/UV/bacteria) and hybrid system II (H₂O₂/UV/bacteria/microalgae). The three processes resulted in the following degradation rates, expressed in terms of total organic carbon (TOC): 27.4% for the H₂O₂/UV system, 58.1% for hybrid system I (H₂O₂/UV/bacteria), and 67.86% for hybrid system II (H₂O₂/UV/bacteria/microalgae). This study demonstrates that a hybrid approach combining advanced oxidation processes and biological treatments is a highly effective alternative for achieving satisfactory treatment.
Keywords: leachate, landfill, advanced oxidation processes, biological treatment, bacteria, microalgae, total organic carbon
Procedia PDF Downloads 70
26533 Using Rainfall Simulators to Design and Assess the Post-Mining Erosional Stability
Authors: Ashraf M. Khalifa, Hwat Bing So, Greg Maddocks
Abstract:
Changes to the mining environmental approvals process in Queensland have been rolled out under the MERFP Act (2018). This includes requirements for a Progressive Rehabilitation and Closure Plan (PRC Plan). Key considerations of the landform design report within the PRC Plan must include: (i) identification of materials available for landform rehabilitation, including their ability to achieve the required landform design outcomes, (ii) erosion assessments to determine landform heights, gradients, profiles, and material placement, (iii) slope profile design considering the interactions between soil erodibility, rainfall erosivity, landform height, gradient, and vegetation cover to identify acceptable erosion rates over a long-term average, and (iv) an analysis of future stability based on the factors described above, e.g., erosion and/or landform evolution modelling. ACARP funded an extensive and thorough erosion assessment program using rainfall simulators from 1998 to 2010. The ACARP program included laboratory assessment of 35 soil and spoil samples from 16 coal mines and samples from a gold mine in Queensland using a 3 x 0.8 m laboratory rainfall simulator. The reliability of the laboratory rainfall simulator was verified through field measurements using larger flumes (20 x 5 m) and catchment-scale measurements at three sites (3 different catchments, average area of 2.5 ha each). Soil cover systems are a primary component of a constructed mine landform. The primary functions of a soil cover system are to sustain vegetation and limit the infiltration of water and oxygen into underlying reactive mine waste. If the external surface of the landform erodes, the functions of the cover system cannot be maintained, and the cover system will most likely fail. Assessing a constructed landform’s potential ‘long-term’ erosion stability requires defensible erosion rate thresholds below which rehabilitation landform designs are considered acceptably erosion-resistant or ‘stable’.
The process used to quantify erosion rates, in which rainfall simulators (flumes) measure rill and inter-rill erosion on bulk samples under laboratory conditions or on in-situ material under field conditions, will be explained.
Keywords: open-cut, mining, erosion, rainfall simulator
Procedia PDF Downloads 101
26532 Predicting Success and Failure in Drug Development Using Text Analysis
Authors: Zhi Hao Chow, Cian Mulligan, Jack Walsh, Antonio Garzon Vico, Dimitar Krastev
Abstract:
Drug development is resource-intensive, time-consuming, and increasingly expensive with each developmental stage. The success rates of drug development are also relatively low, and the resources committed are wasted with each failed candidate. As such, a reliable method of predicting the success of drug development is in demand. The hypothesis was that some failed drug candidates are pushed through developmental pipelines based on false confidence and may possess common linguistic features identifiable through sentiment analysis. Here, the concept of using text analysis to discover such features in research publications and investor reports as predictors of success was explored. RStudio was used to perform text mining and lexicon-based sentiment analysis to identify affective phrases and determine their frequency in each document, and SPSS was then used to determine the relationship between our defined variables and the accuracy of predicting outcomes. A total of 161 publications were collected and categorised into 4 groups: (i) Cancer treatment, (ii) Neurodegenerative disease treatment, (iii) Vaccines, and (iv) Others (containing all other drugs that do not fit into the first 3 categories). Text analysis was then performed on each document within each drug category using 2 separate datasets (BING and AFINN) in R to determine the frequency of positive or negative phrases in each document. Relative positivity and negativity values were then calculated by dividing the frequency of phrases by the word count of each document. Regression analysis was then performed with SPSS statistical software on each dataset (values obtained using the BING or AFINN dataset during text analysis) using a random selection of 61 documents to construct a model. The remaining documents were then used to determine the predictive power of the models. The model constructed from BING predicts the outcome of drug performance in clinical trials with an overall accuracy of 65.3%.
The AFINN model had a lower accuracy than the BING model, at 62.5%, and was not effective at predicting the failure of drugs in clinical trials. Overall, the study did not show significant efficacy of the model at predicting outcomes of drugs in development. Many improvements may need to be made in later iterations of the model to sufficiently increase the accuracy.
Keywords: data analysis, drug development, sentiment analysis, text-mining
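The relative positivity and negativity values described above (phrase frequency divided by document word count) can be sketched in a few lines. The study used R with the BING and AFINN lexicons; the Python version below is only an illustration, and its two tiny word sets are hypothetical stand-ins, not entries taken from either lexicon.

```python
import re

# Hypothetical stand-in lexicons; the study used BING and AFINN in R.
POSITIVE_WORDS = {"promising", "effective", "robust"}
NEGATIVE_WORDS = {"failed", "toxic", "adverse"}


def relative_sentiment(text):
    """Return (positivity, negativity): lexicon hits divided by word count."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    return positive_hits / len(words), negative_hits / len(words)


# Invented sentence for illustration; not text from the study corpus.
document = "The candidate showed promising and robust efficacy but one adverse event"
positivity, negativity = relative_sentiment(document)
```

The sentence has 11 words, 2 positive hits and 1 negative hit, giving a positivity of 2/11 and a negativity of 1/11. Values like these, one pair per document, would then serve as predictors in the regression step.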
Procedia PDF Downloads 157
26531 Classification of Contexts for Mentioning Love in Interviews with Victims of the Holocaust
Authors: Marina Yurievna Aleksandrova
Abstract:
Research on the Holocaust retains value not only for history but also for sociology and psychology. One of the most important fields of study is how people coped during and after this traumatic event. The aim of this paper is to identify the main contexts of the topic of love and to determine which contexts are more characteristic of different groups of Holocaust victims (by gender, nationality, and age). In this research, transcripts of interviews with Holocaust victims that were collected during 1946 for the "Voices of the Holocaust" project were used as data. The main contexts were analyzed with methods of network analysis and latent semantic analysis and classified by gender, age, and nationality with random forest. The results show that love is articulated and described significantly differently by male and female informants, while classification by nationality and by age yields lower values of the quality metrics.
Keywords: Holocaust, latent semantic analysis, network analysis, text-mining, random forest
Procedia PDF Downloads 180
26530 A Web Service Based Sensor Data Management System
Authors: Rose A. Yemson, Ping Jiang, Oyedeji L. Inumoh
Abstract:
The deployment of wireless sensor networks has increased rapidly. However, the increased capacity and diversity of sensors, with applications ranging from biological and environmental to military, generate a tremendous volume of data, and more attention is placed on distributed sensing than on how to manage, analyze, retrieve, and understand the data generated. This makes it difficult to process live sensor data and to run concurrent control and updates, because sensor data are heavyweight, complex, or slow. This work focuses on developing a web service platform for automatic detection of sensors, acquisition of sensor data, storage of sensor data in a database, and processing of sensor data using reconfigurable software components. It also creates a web service based sensor data management system to monitor the physical movement of an individual wearing wireless network sensor technology (SunSPOT). The sensor detects the individual’s movement by sensing acceleration along the X, Y, and Z axes and then sends the readings to a database that is interfaced with an internet platform. The collected data determine the posture of the person, such as standing, sitting, or lying down. The system is designed using the Unified Modeling Language (UML) and implemented using Java, JavaScript, HTML, and MySQL. The system allows an individual to be monitored closely in real time and their physical activity details to be obtained without the observer being physically present for in-situ measurement, which enables remote work instead of time-consuming checks on the individual. These details can help in evaluating an individual’s physical activity and generating feedback on medication. They can also help in keeping track of any mandatory physical activities the individual is required to do.
These evaluations and feedback can help in maintaining a better health status of the individual and providing improved health care.
Keywords: HTML, java, javascript, MySQL, sunspot, UML, web-based, wireless network sensor
Procedia PDF Downloads 212
26529 Use of AI for the Evaluation of the Effects of Steel Corrosion in Mining Environments
Authors: Maria Luisa de la Torre, Javier Aroba, Jose Miguel Davila, Aguasanta M. Sarmiento
Abstract:
Steel is one of the most widely used materials in polymetallic sulfide mining installations. One of the main problems suffered by these facilities is the economic loss due to the corrosion of this material, which is accelerated and aggravated by contact with the acid waters generated in these mines when sulfides come into contact with oxygen and water. This generation of acidic water, in turn, is accelerated by the presence of acidophilic bacteria. In order to gain a more detailed understanding of this corrosion process and the interaction between steel and acidic water, a laboratory experiment was carried out in which carbon steel plates were introduced into four different solutions for 27 days: distilled water (BK), intended to simulate the effect produced by rain on this material; an acid solution from a mine with a high Fe2+/Fe3+ content (PO); another acid solution from another mine with a high Fe3+/Fe2+ content (PH); and, finally, one that reproduced the acid mine water with a high Fe2+/Fe3+ content but in which there were no bacteria (ST). Every 24 hours, physicochemical parameters were measured and water samples were taken to analyze the dissolved elements. The results of these measurements were processed using an explainable AI model based on fuzzy logic. In all cases, there was an increase in pH, as well as in the concentrations of Fe and, in particular, Fe(II), as a consequence of the oxidation of the steel plates. Proportionally, the increase in Fe concentration was higher in PO and ST than in PH because Fe precipitates were produced in the latter. The rise in Fe(II) was proportionally much higher in PH, especially in the first hours of exposure, because it started from a lower initial concentration of this ion. Although to a lesser extent than in PH, the greater increase in Fe(II) also occurred faster in PO than in ST, a consequence of the action of the catalytic bacteria.
On the other hand, Cu concentrations decreased throughout the experiment (with the exception of distilled water, which initially had no Cu) as a result of an electrochemical process that generates a precipitation of Cu together with Fe hydroxides. This decrease is lower in PH because the high total acidity keeps it in solution for a longer time. With the application of an artificial intelligence tool, it has been possible to evaluate the effects of steel corrosion in mining environments, corroborating and extending what was obtained by means of classical statistics. Acknowledgments: This work has been supported by MCIU/AEI/10.13039/501100011033/FEDER, UE, through the project PID2021-123130OB-I00.
Keywords: carbon steel, corrosion, acid mine drainage, artificial intelligence, fuzzy logic
Procedia PDF Downloads 20
26528 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm
Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima
Abstract:
In our present world, we generate a lot of data and need devices to store it. Generally, we store data on pen drives, hard drives, etc. Sometimes we may lose the data due to corruption of these devices. To overcome these issues, we implemented a cloud space for storing the data, which also provides more security for the data. The data can be accessed over the internet from anywhere in the world. We implemented this in Java using the NetBeans IDE. Once a user uploads data, they do not have any rights to change it. Users’ uploaded files are stored in the cloud with the system time as the file name, in a directory created with some random words. The cloud accepts a file only if its size is less than 2 MB.
Keywords: cloud space, AES, FTP, NetBeans IDE
Procedia PDF Downloads 206
26527 Characteristic Study of Polymer Sand as a Potential Substitute for Natural River Sand in Construction Industry
Authors: Abhishek Khupsare, Ajay Parmar, Ajay Agarwal, Swapnil Wanjari
Abstract:
The extreme demand for aggregate leads to the exploitation of river beds for fine aggregates, affecting the environment adversely. Therefore, a suitable alternative to natural river sand is essential. This study focuses on reducing this environmental impact by developing polymer sand to replace natural river sand (NRS). Polymer sand was developed by mixing high-volume fly ash, bottom ash, cement, natural river sand, and a locally purchased high solid content polycarboxylate ether-based superplasticizer (HS-PCE). All the physical and chemical properties of polymer sand (P-Sand) were observed and satisfied the requirements of the Indian Standard code. P-Sand yields a good specific gravity of 2.31 and is classified as zone-I sand with a satisfactory friction angle (37˚) compared to natural river sand (NRS) and geopolymer fly ash sand (GFS). Though the water absorption (6.83%) and pH (12.18) are slightly higher than those of GFS and NRS, the alkali silica reaction and soundness are well within the permissible limits as per Indian Standards. The chemical analysis by X-ray fluorescence showed the presence of high amounts of SiO2 and Al2O3, at 58.879% and 26.77%, respectively. Finally, the compressive strength of M-25 grade concrete using P-Sand and geopolymer sand (GFS) was observed to be 87.51% and 83.82%, respectively, of that obtained with natural river sand (NRS) after 28 days. The results of this study indicate that P-Sand can be a good alternative to NRS for construction work, as it not only reduces the environmental effect of sand mining but also makes use of fly ash and bottom ash.
Keywords: polymer sand, fly ash, bottom ash, HSPCE plasticizer, river sand mining
Procedia PDF Downloads 77
26526 Recommender Systems Using Ensemble Techniques
Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim
Abstract:
This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by reflecting the user’s precise preferences. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have a high likelihood of purchasing products in each product group. Then, this study combines the results of each predictor using multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses market basket analysis to extract association rules for co-purchased products. Finally, through these two steps, the system selects customers who have a high likelihood of purchasing products in each product group and recommends suitable products from the same or different product groups to them. We test the usability of the proposed system using a prototype and real-world transaction and profile data. In addition, we survey user satisfaction with the product list recommended by the proposed system and with randomly selected product lists. The results show that the proposed system may be useful in real-world online shopping stores.
Keywords: product recommender system, ensemble technique, association rules, decision tree, artificial neural networks
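The second step, extracting association rules for co-purchased products, can be illustrated with a minimal support and confidence computation over pairs of items. This is a sketch of the general market basket technique, not the authors' system. The thresholds, function name, and toy baskets are assumptions, and the search is a brute-force pass over item pairs rather than a full Apriori implementation.

```python
from itertools import permutations


def association_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence) for item pairs.

    support(A -> B)    = share of baskets containing both A and B
    confidence(A -> B) = share of baskets containing A that also contain B
    """
    total = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for antecedent, consequent in permutations(items, 2):
        with_antecedent = sum(1 for basket in baskets if antecedent in basket)
        with_both = sum(
            1 for basket in baskets if antecedent in basket and consequent in basket
        )
        support = with_both / total
        confidence = with_both / with_antecedent if with_antecedent else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return rules


# Toy transactions for illustration only; not data from the study.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rules = association_rules(baskets)
```

With these thresholds the toy data yields four rules; for example, every basket containing butter also contains bread, so the rule butter -> bread has confidence 1.0. A recommender could then suggest the consequent of each rule to customers who have bought the antecedent.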
Procedia PDF Downloads 294