Search results for: biological data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27086

Search results for: biological data mining

26756 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education

Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue

Abstract:

In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.

Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education

Procedia PDF Downloads 101
26755 Implementation of Knowledge and Attitude Management Based on Holistic Approach in Andragogy Learning, as an Effort to Solve the Environmental Problems of Post-Coal Mining Activity

Authors: Aloysius Hardoko, Susilo

Abstract:

The root cause of the problem after the environmental damage due to coal mining activities defined as the province of East Kalimantan corridor masterplan economic activity accelerated the expansion of Indonesia's economic development (MP3EI) is the behavior of adults. Adult behavior can be changed through knowledge management and attitude. Based on the root of the problem, the objective of the research is to apply knowledge management and attitude based on holistic approach in learning andragogy as an effort to solve environmental problems after coal mining activities. Research methods to achieve the objective of using quantitative research with pretest postes group design. Knowledge management and attitudes based on a holistic approach in adult learning are applied through initial learning activities, core and case-based cover of environmental damage. The research instrument is a description of the case of environmental damage. The data analysis uses t-test to see the effect of knowledge management attitude based on holistic approach before and after adult learning. Location and sample of representative research of adults as many as 20 people in Kutai Kertanegara District, one of the districts in East Kalimantan province, which suffered the worst environmental damage. The conclusion of the research result is the application of knowledge management and attitude in adult learning influence to adult knowledge and attitude to overcome environmental problem post-coal mining activity.

Keywords: knowledge management and attitude, holistic approach, andragogy learning, environmental Issue

Procedia PDF Downloads 204
26754 Customer Preference in the Textile Market: Fabric-Based Analysis

Authors: Francisca Margarita Ocran

Abstract:

Underwear, and more particularly bras and panties, are defined as intimate clothing. Strictly speaking, they enhance the place of women in the public or private satchel. Therefore, women's lingerie is a complex garment with a high involvement profile, motivating consumers to buy it not only by its functional utility but also by the multisensory experience it provides them. Customer behavior models are generally based on customer data mining, and each model is designed to answer questions at a specific time. Predicting the customer experience is uncertain and difficult. Thus, knowledge of consumers' tastes in lingerie deserves to be treated as an experiential product, where the dimensions of the experience motivating consumers to buy a lingerie product and to remain faithful to it must be analyzed in detail by the manufacturers and retailers to engage and retain consumers, which is why this research aims to identify the variables that push consumers to choose their lingerie product, based on an in-depth analysis of the types of fabrics used to make lingerie. The data used in this study comes from online purchases. Machine learning approach with the use of Python programming language and Pycaret gives us a precision of 86.34%, 85.98%, and 84.55% for the three algorithms to use concerning the preference of a buyer in front of a range of lingerie. Gradient Boosting, random forest, and K Neighbors were used in this study; they are very promising and rich in the classification of preference in the textile industry.

Keywords: consumer behavior, data mining, lingerie, machine learning, preference

Procedia PDF Downloads 84
26753 Signals Monitored During Anaesthesia

Authors: Launcelot McGrath

Abstract:

A comprehensive understanding of physiological data is a vital aid to the anaesthesiologist in monitoring and maintaining the well-being of a patient undergoing surgery. Bio signal analysis is one of the most important topics that researchers have tried to develop over the last century to understand numerous human diseases. Understanding which biological signals are most important during anaesthesia is critically important. It is important that the anaesthesiologist understand both the signals themselves and the limitations introduced by the processes of acquisition. In this article, we provide an overview of different types of biological signals as well as the mechanisms applied to acquire them.

Keywords: biological signals, signal acquisition, anaesthesiology, patient monitoring

Procedia PDF Downloads 135
26752 Financial Assessment of the Hard Coal Mining in the Chosen Region in the Czech Republic: Real Options Methodology Application

Authors: Miroslav Čulík, Petr Gurný

Abstract:

This paper is aimed at the financial assessment of the hard coal mining in a given region by real option methodology application. Hard coal mining in this mine makes net loss for the owner during the last years due to the long-term unfavourable mining conditions and significant drop in the coal prices during the last years. Management is going to shut down the operation and abandon the project to reduce the loss of the company. The goal is to assess whether the shutting down the operation is the only and correct solution of the problem. Due to the uncertainty in the future hard coal price evolution, the production might be again restarted if the price raises enough to cover the cost of the production. For the assessment, real option methodology is applied, which captures two important aspect of the financial decision-making: risk and flexibility. The paper is structured as follows: first, current state is described and problem is analysed. Next, methodology of real options is described. At last, project is evaluated by applying real option methodology. The results are commented and recommendations are provided.

Keywords: real option, investment, option to abandon, option to shut down and restart, risk, flexibility

Procedia PDF Downloads 544
26751 Methods for Distinction of Cattle Using Supervised Learning

Authors: Radoslav Židek, Veronika Šidlová, Radovan Kasarda, Birgit Fuerst-Waltl

Abstract:

Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify into groups. The result of the analysis is the pattern which can be used for identification of data set without the need to obtain input data used for creation of this pattern. An important requirement in this process is careful data preparation validation of model used and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of the genetic diversity. In case of missing pedigree information, other methods can be used for traceability of animal´s origin. Genetic diversity written in genetic data is holding relatively useful information to identify animals originated from individual countries. We can conclude that the application of data mining for molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.

Keywords: genetic data, Pinzgau cattle, supervised learning, machine learning

Procedia PDF Downloads 544
26750 Virtual Dimension Analysis of Hyperspectral Imaging to Characterize a Mining Sample

Authors: L. Chevez, A. Apaza, J. Rodriguez, R. Puga, H. Loro, Juan Z. Davalos

Abstract:

Virtual Dimension (VD) procedure is used to analyze Hyperspectral Image (HIS) treatment-data in order to estimate the abundance of mineral components of a mining sample. Hyperspectral images coming from reflectance spectra (NIR region) are pre-treated using Standard Normal Variance (SNV) and Minimum Noise Fraction (MNF) methodologies. The endmember components are identified by the Simplex Growing Algorithm (SVG) and after adjusted to the reflectance spectra of reference-databases using Simulated Annealing (SA) methodology. The obtained abundance of minerals of the sample studied is very near to the ones obtained using XRD with a total relative error of 2%.

Keywords: hyperspectral imaging, minimum noise fraction, MNF, simplex growing algorithm, SGA, standard normal variance, SNV, virtual dimension, XRD

Procedia PDF Downloads 154
26749 A Framework of Product Information Service System Using Mobile Image Retrieval and Text Mining Techniques

Authors: Mei-Yi Wu, Shang-Ming Huang

Abstract:

The online shoppers nowadays often search the product information on the Internet using some keywords of products. To use this kind of information searching model, shoppers should have a preliminary understanding about their interesting products and choose the correct keywords. However, if the products are first contact (for example, the worn clothes or backpack of passengers which you do not have any idea about the brands), these products cannot be retrieved due to insufficient information. In this paper, we discuss and study the applications in E-commerce using image retrieval and text mining techniques. We design a reasonable E-commerce application system containing three layers in the architecture to provide users product information. The system can automatically search and retrieval similar images and corresponding web pages on Internet according to the target pictures which taken by users. Then text mining techniques are applied to extract important keywords from these retrieval web pages and search the prices on different online shopping stores with these keywords using a web crawler. Finally, the users can obtain the product information including photos and prices of their favorite products. The experiments shows the efficiency of proposed system.

Keywords: mobile image retrieval, text mining, product information service system, online marketing

Procedia PDF Downloads 354
26748 Coordination Behavior, Theoretical Studies, and Biological Activity of Some Transition Metal Complexes with Oxime Ligands

Authors: Noura Kichou, Manel Tafergguenit, Nabila Ghechtouli, Zakia Hank

Abstract:

The aim of this work is to synthesize, characterize and evaluate the biological activity of two Ligands : glyoxime and dimethylglyoxime, and their metal Ni(II) chelates. The newly chelates were characterized by elemental analysis, IR, EPR, nuclear magnetic resonances (1H and 13C), and biological activity. The antibacterial and antifungal activities of the ligands and its metal complexes were screened against bacterial species (Staphylococcus aureus, Bacillus subtilis, and Escherichia coli) and fungi (Candida albicans). Ampicillin and amphotericin were used as references for antibacterial and antifungal studies. The activity data show that the metal complexes have a promising biological activity comparable with parent free ligand against bacterial and fungal species. A structural, energetic, and electronic theoretical study was carried out using the DFT method, with the functional B3LYP and the gaussian program 09. A complete optimization of geometries was made, followed by a calculation of the frequencies of the normal modes of vibration. The UV spectrum was also interpreted. The theoretical results were compared with the experimental data.

Keywords: glyoxime, dimetylglyoxime, nickel, antibacterial activity

Procedia PDF Downloads 96
26747 Coordination Behavior, Theoretical studies and Biological Activity of Some Transition Metal Complexes with Oxime Ligands

Authors: Noura Kichou, Manel Tafergguenit, Nabila Ghechtouli, Zakia Hank

Abstract:

The aim of this work is to synthesize, characterize and evaluate the biological activity of two Ligands: glyoxime and dimethylglyoxime, and their metal Ni(II) chelates. The newly chelates were characterized by elemental analysis, IR, EPR, nuclear magnetic resonances (1H and 13C), and biological activity. The antibacterial and antifungal activities of the ligands and its metal complexes were screened against bacterial species (Staphylococcus aureus, Bacillus subtilis, and Escherichia coli) and fungi (Candida albicans). Ampicillin and amphotericin were used as references for antibacterial and antifungal studies. The activity data show that the metal complexes have a promising biological activity comparable with parent free ligand against bacterial and fungal species. A structural, energetic, and electronic theoretical study was carried out using the DFT method, with the functional B3LYP and the gaussian program 09. A complete optimization of geometries was made, followed by a calculation of the frequencies of the normal modes of vibration. The UV spectrum was also interpreted. The theoretical results were compared with the experimental data.

Keywords: glyoxime, dimetylglyoxime, nickel, antibacterial activity

Procedia PDF Downloads 106
26746 A Data-Mining Model for Protection of FACTS-Based Transmission Line

Authors: Ashok Kalagura

Abstract:

This paper presents a data-mining model for fault-zone identification of flexible AC transmission systems (FACTS)-based transmission line including a thyristor-controlled series compensator (TCSC) and unified power-flow controller (UPFC), using ensemble decision trees. Given the randomness in the ensemble of decision trees stacked inside the random forests model, it provides an effective decision on the fault-zone identification. Half-cycle post-fault current and voltage samples from the fault inception are used as an input vector against target output ‘1’ for the fault after TCSC/UPFC and ‘1’ for the fault before TCSC/UPFC for fault-zone identification. The algorithm is tested on simulated fault data with wide variations in operating parameters of the power system network, including noisy environment providing a reliability measure of 99% with faster response time (3/4th cycle from fault inception). The results of the presented approach using the RF model indicate the reliable identification of the fault zone in FACTS-based transmission lines.

Keywords: distance relaying, fault-zone identification, random forests, RFs, support vector machine, SVM, thyristor-controlled series compensator, TCSC, unified power-flow controller, UPFC

Procedia PDF Downloads 421
26745 Industrial Process Mining Based on Data Pattern Modeling and Nonlinear Analysis

Authors: Hyun-Woo Cho

Abstract:

Unexpected events may occur with serious impacts on industrial process. This work utilizes a data representation technique to model and to analyze process data pattern for the purpose of diagnosis. In this work, the use of triangular representation of process data is evaluated using simulation process. Furthermore, the effect of using different pre-treatment techniques based on such as linear or nonlinear reduced spaces was compared. This work extracted the fault pattern in the reduced space, not in the original data space. The results have shown that the non-linear technique based diagnosis method produced more reliable results and outperforms linear method.

Keywords: process monitoring, data analysis, pattern modeling, fault, nonlinear techniques

Procedia PDF Downloads 384
26744 Mining Coupled to Agriculture: Systems Thinking in Scalable Food Production

Authors: Jason West

Abstract:

Low profitability in agriculture production along with increasing scrutiny over environmental effects is limiting food production at scale. In contrast, the mining sector offers access to resources including energy, water, transport and chemicals for food production at low marginal cost. Scalable agricultural production can benefit from the nexus of resources (water, energy, transport) offered by mining activity in remote locations. A decision support bioeconomic model for controlled environment vertical farms was used. Four submodels were used: crop structure, nutrient requirements, resource-crop integration, and economic. They escalate to a macro mathematical model. A demonstrable dynamic systems framework is needed to prove productive outcomes are feasible. We demonstrate a generalized bioeconomic macro model for controlled environment production systems in minesites using systems dynamics modeling methodology. Despite the complexity of bioeconomic modelling of resource-agricultural dynamic processes and interactions, the economic potential greater than general economic models would assume. Scalability of production as an input becomes a key success feature.

Keywords: crop production systems, mathematical model, mining, agriculture, dynamic systems

Procedia PDF Downloads 74
26743 Case Study about Women Driving in Saudi Arabia Announced in 2018: Netnographic and Data Mining Study

Authors: Majdah Alnefaie

Abstract:

The ‘netnographic study’ and data mining have been used to monitor the public interaction on Social Media Sites (SMSs) to understand what the motivational factors influence the Saudi intentions regarding allowing women driving in Saudi Arabia in 2018. The netnographic study monitored the publics’ textual and visual communications in Twitter, Snapchat, and YouTube. SMSs users’ communications method is also known as electronic word of mouth (eWOM). Netnography methodology is still in its initial stages as it depends on manual extraction, reading and classification of SMSs users text. On the other hand, data mining is come from the computer and physical sciences background, therefore it is much harder to extract meaning from unstructured qualitative data. In addition, the new development in data mining software does not support the Arabic text, especially local slang in Saudi Arabia. Therefore, collaborations between social and computer scientists such as ‘netnographic study’ and data mining will enhance the efficiency of this study methodology leading to comprehensive research outcome. The eWOM communications between individuals on SMSs can promote a sense that sharing their preferences and experiences regarding politics and social government regulations is a part of their daily life, highlighting the importance of using SMSs as assistance in promoting participation in political and social. Therefore, public interactions on SMSs are important tools to comprehend people’s intentions regarding the new government regulations in the country. This study aims to answer this question, "What factors influence the Saudi Arabians' intentions of Saudi female's car-driving in 2018". The study utilized qualitative method known as netnographic study. The study used R studio to collect and analyses 27000 Saudi users’ comments from 25th May until 25th June 2018. The study has developed data collection model that support importing and analysing the Arabic text in the local slang. The data collection model in this study has been clustered based on different type of social networks, gender and the study main factors. The social network analysis was employed to collect comments from SMSs owned by governments’ originations, celebrities, vloggers, social activist and news SMSs accounts. The comments were collected from both males and females SMSs users. The sentiment analysis shows that the total number of positive comments Saudi females car driving was higher than negative comments. The data have provided the most important factors influenced the Saudi Arabians’ intention of Saudi females car driving including, culture and environment, freedom of choice, equal opportunities, security and safety. The most interesting finding indicted that women driving would play a role in increasing the individual freedom of choice. Saudi female will be able to drive cars to fulfill her daily life and family needs without being stressed due to the lack of transportation. The study outcome will help Saudi government to improve woman quality of life by increasing the ability to find more jobs and studies, increasing income through decreasing the spending on transport means such as taxi and having more freedom of choice in woman daily life needs. The study enhances the importance of using use marketing research to measure the public opinions on the new government regulations in the country. The study has explained the limitations and suggestions for future research.

Keywords: netnographic study, data mining, social media, Saudi Arabia, female driving

Procedia PDF Downloads 150
26742 Determining of the Performance of Data Mining Algorithm Determining the Influential Factors and Prediction of Ischemic Stroke: A Comparative Study in the Southeast of Iran

Authors: Y. Mehdipour, S. Ebrahimi, A. Jahanpour, F. Seyedzaei, B. Sabayan, A. Karimi, H. Amirifard

Abstract:

Ischemic stroke is one of the common reasons for disability and mortality. The fourth leading cause of death in the world and the third in some other sources. Only 1/3 of the patients with ischemic stroke fully recover, 1/3 of them end in permanent disability and 1/3 face death. Thus, the use of predictive models to predict stroke has a vital role in reducing the complications and costs related to this disease. Thus, the aim of this study was to specify the effective factors and predict ischemic stroke with the help of DM methods. The present study was a descriptive-analytic study. The population was 213 cases from among patients referring to Ali ibn Abi Talib (AS) Hospital in Zahedan. Data collection tool was a checklist with the validity and reliability confirmed. This study used DM algorithms of decision tree for modeling. Data analysis was performed using SPSS-19 and SPSS Modeler 14.2. The results of the comparison of algorithms showed that CHAID algorithm with 95.7% accuracy has the best performance. Moreover, based on the model created, factors such as anemia, diabetes mellitus, hyperlipidemia, transient ischemic attacks, coronary artery disease, and atherosclerosis are the most effective factors in stroke. Decision tree algorithms, especially CHAID algorithm, have acceptable precision and predictive ability to determine the factors affecting ischemic stroke. Thus, by creating predictive models through this algorithm, will play a significant role in decreasing the mortality and disability caused by ischemic stroke.

Keywords: data mining, ischemic stroke, decision tree, Bayesian network

Procedia PDF Downloads 170
26741 Safety-critical Alarming Strategy Based on Statistically Defined Slope Deformation Behaviour Model Case Study: Upright-dipping Highwall in a Coal Mining Area

Authors: Lintang Putra Sadewa, Ilham Prasetya Budhi

Abstract:

Slope monitoring program has now become a mandatory campaign for any open pit mines around the world to operate safely. Utilizing various slope monitoring instruments and strategies, miners are now able to deliver precise decisions in mitigating the risk of slope failures which can be catastrophic. Currently, the most sophisticated slope monitoring technology available is the Slope Stability Radar (SSR), whichcan measure wall deformation in submillimeter accuracy. One of its eminent features is that SSRcan provide a timely warning by automatically raise an alarm when a predetermined rate-of-movement threshold is reached. However, establishing proper alarm thresholds is arguably one of the onerous challenges faced in any slope monitoring program. The difficulty mainly lies in the number of considerations that must be taken when generating a threshold becausean alarm must be effectivethat it should limit the occurrences of false alarms while alsobeing able to capture any real wall deformations. In this sense, experience shows that a site-specific alarm thresholdtendsto produce more reliable results because it considers site distinctive variables. This study will attempt to determinealarming thresholds for safety-critical monitoring based on an empirical model of slope deformation behaviour that is defined statistically fromdeformation data captured by the Slope Stability Radar (SSR). The study area comprises of upright-dipping highwall setting in a coal mining area with intense mining activities, andthe deformation data used for the study were recorded by the SSR throughout the year 2022. The model is site-specific in nature thus, valuable information extracted from the model (e.g., time-to-failure, onset-of-acceleration, and velocity) will be applicable in setting up site-specific alarm thresholds and will give a clear understanding of how deformation trends evolve over the area.

Keywords: safety-critical monitoring, alarming strategy, slope deformation behaviour model, coal mining

Procedia PDF Downloads 86
26740 Data Analysis Tool for Predicting Water Scarcity in Industry

Authors: Tassadit Issaadi Hamitouche, Nicolas Gillard, Jean Petit, Valerie Lavaste, Celine Mayousse

Abstract:

Water is a fundamental resource for the industry. It is taken from the environment either from municipal distribution networks or from various natural water sources such as the sea, ocean, rivers, aquifers, etc. Once used, water is discharged into the environment, reprocessed at the plant or treatment plants. These withdrawals and discharges have a direct impact on natural water resources. These impacts can apply to the quantity of water available, the quality of the water used, or to impacts that are more complex to measure and less direct, such as the health of the population downstream from the watercourse, for example. Based on the analysis of data (meteorological, river characteristics, physicochemical substances), we wish to predict water stress episodes and anticipate prefectoral decrees, which can impact the performance of plants and propose improvement solutions, help industrialists in their choice of location for a new plant, visualize possible interactions between companies to optimize exchanges and encourage the pooling of water treatment solutions, and set up circular economies around the issue of water. The development of a system for the collection, processing, and use of data related to water resources requires the functional constraints specific to the latter to be made explicit. Thus the system will have to be able to store a large amount of data from sensors (which is the main type of data in plants and their environment). In addition, manufacturers need to have 'near-real-time' processing of information in order to be able to make the best decisions (to be rapidly notified of an event that would have a significant impact on water resources). Finally, the visualization of data must be adapted to its temporal and geographical dimensions. In this study, we set up an infrastructure centered on the TICK application stack (for Telegraf, InfluxDB, Chronograf, and Kapacitor), which is a set of loosely coupled but tightly integrated open source projects designed to manage huge amounts of time-stamped information. The software architecture is coupled with the cross-industry standard process for data mining (CRISP-DM) data mining methodology. The robust architecture and the methodology used have demonstrated their effectiveness on the study case of learning the level of a river with a 7-day horizon. The management of water and the activities within the plants -which depend on this resource- should be considerably improved thanks, on the one hand, to the learning that allows the anticipation of periods of water stress, and on the other hand, to the information system that is able to warn decision-makers with alerts created from the formalization of prefectoral decrees.

Keywords: data mining, industry, machine Learning, shortage, water resources

Procedia PDF Downloads 120
26739 Modelling of Recovery and Application of Low-Grade Thermal Resources in the Mining and Mineral Processing Industry

Authors: S. McLean, J. A. Scott

Abstract:

The research topic is focusing on improving sustainable operation through recovery and reuse of waste heat in process water streams, an area in the mining industry that is often overlooked. There are significant advantages to the application of this topic, including economic and environmental benefits. The smelting process in the mining industry presents an opportunity to recover waste heat and apply it to alternative uses, thereby enhancing the overall process. This applied research has been conducted at the Sudbury Integrated Nickel Operations smelter site, in particular on the water cooling towers. The aim was to determine and optimize methods for appropriate recovery and subsequent upgrading of thermally low-grade heat lost from the water cooling towers in a manner that makes it useful for repurposing in applications, such as within an acid plant. This would be valuable to mining companies as it would be an opportunity to reduce the cost of the process, as well as decrease environmental impact and primary fuel usage. The waste heat from the cooling towers needs to be upgraded before it can be beneficially applied, as lower temperatures result in a decrease of the number of potential applications. Temperature and flow rate data were collected from the water cooling towers at an acid plant over two years. The research includes process control strategies and the development of a model capable of determining if the proposed heat recovery technique is economically viable, as well as assessing any environmental impact with the reduction in net energy consumption by the process. Therefore, comprehensive cost and impact analyses are carried out to determine the best area of application for the recovered waste heat. This method will allow engineers to easily identify the value of thermal resources available to them and determine if a full feasibility study should be carried out. The rapid scoping model developed will be applicable to any site that generates large amounts of waste heat. Results show that heat pumps are an economically viable solution for this application, allowing for reduced cost and CO₂ emissions.

Keywords: environment, heat recovery, mining engineering, sustainability

Procedia PDF Downloads 108
26738 Investigation of Topic Modeling-Based Semi-Supervised Interpretable Document Classifier

Authors: Dasom Kim, William Xiu Shun Wong, Yoonjin Hyun, Donghoon Lee, Minji Paek, Sungho Byun, Namgyu Kim

Abstract:

There have been many researches on document classification for classifying voluminous documents automatically. Through document classification, we can assign a specific category to each unlabeled document on the basis of various machine learning algorithms. However, providing labeled documents manually requires considerable time and effort. To overcome the limitations, the semi-supervised learning which uses unlabeled document as well as labeled documents has been invented. However, traditional document classifiers, regardless of supervised or semi-supervised ones, cannot sufficiently explain the reason or the process of the classification. Thus, in this paper, we proposed a methodology to visualize major topics and class components of each document. We believe that our methodology for visualizing topics and classes of each document can enhance the reliability and explanatory power of document classifiers.

Keywords: data mining, document classifier, text mining, topic modeling

Procedia PDF Downloads 398
26737 Searching Linguistic Synonyms through Parts of Speech Tagging

Authors: Faiza Hussain, Usman Qamar

Abstract:

Synonym-based searching is recognized to be a complicated problem as text mining from unstructured data of web is challenging. Finding useful information which matches user need from bulk of web pages is a cumbersome task. In this paper, a novel and practical synonym retrieval technique is proposed for addressing this problem. For replacement of semantics, user intent is taken into consideration to realize the technique. Parts-of-Speech tagging is applied for pattern generation of the query and a thesaurus for this experiment was formed and used. Comparison with Non-Context Based Searching, Context Based searching proved to be a more efficient approach while dealing with linguistic semantics. This approach is very beneficial in doing intent based searching. Finally, results and future dimensions are presented.

Keywords: natural language processing, text mining, information retrieval, parts-of-speech tagging, grammar, semantics

Procedia PDF Downloads 305
26736 Syndromic Surveillance Framework Using Tweets Data Analytics

Authors: David Ming Liu, Benjamin Hirsch, Bashir Aden

Abstract:

Syndromic surveillance is to detect or predict disease outbreaks through the analysis of medical sources of data. Using social media data like tweets to do syndromic surveillance becomes more and more popular with the aid of open platform to collect data and the advantage of microblogging text and mobile geographic location features. In this paper, a Syndromic Surveillance Framework is presented with machine learning kernel using tweets data analytics. Influenza and the three cities Abu Dhabi, Al Ain and Dubai of United Arabic Emirates are used as the test disease and trial areas. Hospital cases data provided by the Health Authority of Abu Dhabi (HAAD) are used for the correlation purpose. In our model, Latent Dirichlet allocation (LDA) engine is adapted to do supervised learning classification and N-Fold cross validation confusion matrix are given as the simulation results with overall system recall 85.595% performance achieved.

Keywords: Syndromic surveillance, Tweets, Machine Learning, data mining, Latent Dirichlet allocation (LDA), Influenza

Procedia PDF Downloads 109
26735 Formal Innovations vs. Informal Innovations: The Case of the Mining Sector in Nigeria

Authors: Jegede Oluseye Oladayo

Abstract:

The study mapped innovation activities in the formal and informal mining sector in Nigeria. Data were collected through primary and secondary sources. Primary data were collected through guided questionnaire administration, guided interviews and personal observation. A purposive sampling method was adopted to select firms that are micro, small and medium enterprises. The study covered 100 (50 in the formal sector and 50 in the informal sector) purposively selected companies in south-western Nigeria. Secondary data were collected from different published sources. Data were analysed using descriptive and inferential statistics. Of the four types of technological innovations sampled, organisational innovation was found to be highest both in the formal (100%) and informal (100%) sectors, followed by process innovation: 60% in the formal sector and 28% in the informal sector, marketing innovation and diffusion based innovation were implemented by 64% and 4% respectively in the formal sector. There were no R&D activities (intramural or extramural) in both sectors, however, innovation activities occur at moderate levels in the formal sector. This is characterised by acquisition of machinery, equipment, hardware (100%), software (56), training (82%) and acquisition of external knowledge (60%) in the formal sector. In the informal sector, innovation activities were characterised by acquisition of external knowledge (100%), training/learning by experience (100%) and acquisition of tools (68%). The impact of innovation on firm’s performance in the formal sector was expressed mainly as increased capacity of production (100%), reduced production cost per unit of labour (88%), compliance with governmental regulatory requirements (72%) and entry on new markets (60%). In the informal sector, the impact of innovation was mainly expressed in improved flexibility of production (70%) and machinery/energy efficiency (70%). The important technological driver of process innovation in the mining sector was acquisition of machinery which accounts for the prevalence of 100% both in the formal and informal sectors. Next to this is training and re-training of technical staff, 74% in both the formal and the informal sector. Other factors influencing organisational innovation are skill of workforce with a prevalence of 80% in both the formal and informal sector. The important technological drivers include educational background of the manager/head of technical department (54%) for organisational innovation and (50%) for process innovation in the formal sector. The study concluded that innovation competence of the firms was mostly organisational changes.

Keywords: innovation prevalence, innovation activities, innovation performance, innovation drivers

Procedia PDF Downloads 374
26734 Planning for Enviromental and Social Sustainability in Coastal Areas: A Case of Alappad

Authors: K. Vrinda

Abstract:

Coastal ecosystems across the world are facing a lot of challenges due to natural phenomena as well as from uncontrolled human interventions. Here, Alappad, a coastal island situated in Kerala, India is undergoing significant damage and is gradually losing its environmental and social sustainability. The area is blessed with very rare and precious black mineral sand deposits. Sand mining for these minerals started in 1911 and is still continuing. But, unfortunately all the problems that Alappad faces now, have its root on mining of this mineral sand. The land area is continuously diminishing due to sea erosion. The mining has also caused displacement of people and environmental degradation. Marine life also is getting affected by mining on beach and pollution. The inhabitants are fishermen who are largely dependent on the eco-system for a living. So loss of environmental sustainability subsequently affects social sustainability too. Now the damage has reached a point beyond which our actions may not be able to make any impact. This was one of the most affected areas of the 2004 tsunami and the environmental degradation has further increased the vulnerability. So this study focuses on understanding the concerns related to the resource utilization, environment and the indigenous community staying there, and on formulating suitable strategies to restore the sustainability of the area. An extensive study was conducted on site, to find out the physical, social, and economical characteristics of the area. A focus group discussion with the inhabitants shed light on different issues they face in their day-to-day life. The analysis of all these data, led to the formation of a new development vision for the area which focuses on environmental restoration and socio-economic development while allowing controlled exploitation of resources. A participatory approach is formulated which enables these three aspects through community based programs.

Keywords: Community development, Disaster resilience, Ecological restoration, Environmental sustainability, Social-environmental planning, Social Sustainability

Procedia PDF Downloads 109
26733 The Role of Strategic Alliances, Innovation Capability, Cost Reduction in Enhancing Customer Loyalty and Firm’s Competitive Advantage

Authors: Soebowo Musa

Abstract:

Mining industries are known to be very volatile due to their sensitive nature toward changes in the environment, particularly coal mining. Heavy equipment distributors and coal mining contractors are among heavily affected by such volatility. They are facing more uncertainty on the sustainability of the coal mining industry. Strategic alliances and organizational capabilities such as innovation capability have long been seen as ways to stay competitive with a focus more on the strategic alliances partner-to-partner in serving their customers. In today’s rapid change in the environment, a shift in consumer behaviors, and the human-centric business approach, this study looks at the strategic alliance partner-to-customer relationship in both the industrial organization and resource-based theories. This study was conducted based on 250 respondents from the strategic alliances partner-to-customer between heavy equipment distributors and coal mining contractors in Indonesia. This study finds strategic alliances have the highest association toward cost reduction, a proxy of operational efficiency followed by its association toward innovation capability. Further, strategic alliances and innovation capability have a positive relationship with customer loyalty, while innovation capability and customer loyalty have no significant relationships toward the firm’s competitive advantage. This study also indicates that cost reduction is not a condition to develop customer loyalty in the strategic alliance partner-to-customer relationship. It confirms strategic alliances are a strategy that creates a firm’s operational efficiency, innovation capability that develops customer loyalty, and competitive advantage.

Keywords: strategic alliance, innovation capability, cost reduction, customer loyalty, competitive advantage

Procedia PDF Downloads 114
26732 Identifying Concerned Citizen Communication Style During the State Parliamentary Elections in Bavaria

Authors: Volker Mittendorf, Andre Schmale

Abstract:

In this case study, we want to explore the Twitter-use of candidates during the state parliamentary elections-year 2018 in Bavaria, Germany. This paper focusses on the seven parties that probably entered the parliament. Against this background, the paper classifies the use of language as populism which itself is considered as a political communication style. First, we determine the election campaigns which started in the years 2017 on Twitter, after that we categorize the posting times of the different direct candidates in order to derive ideal types from our empirical data. Second, we have done the exploration based on the dictionary of concerned citizens which contains German political language of the right and the far right. According to that, we are analyzing the corpus with methods of text mining and social network analysis, and afterwards we display the results in a network of words of concerned citizen communication style (CCCS).

Keywords: populism, communication style, election, text mining, social media

Procedia PDF Downloads 147
26731 Multicriteria for Optimal Land Use after Mining

Authors: Carla Idely Palencia-Aguilar

Abstract:

Mining in Colombia represents around 2% of the GDP (USD 8 billion in 2018), with main productions represented by coal, nickel, gold, silver, emeralds, iron, limestone, gypsum, among others. Sand and Gravel had been decreasing its participation of the GDP with a reduction of 33.2 million m3 in 2015, to 27.4 in 2016, 22.7 in 2017 and 15.8 in 2018, with a consumption of approximately 3 tons/inhabitant. However, with the new government policies it is expected to increase in the following years. Mining causes temporary environmental impacts, once restoration and rehabilitation takes place, social, environmental and economic benefits are higher than the initial state. A way to demonstrate how the mining interventions had contributed to improve the characteristics of the region after sand and gravel mining, the NDVI (Normalized Difference Vegetation Index) from MODIS and ASTER were employed. The histograms show not only increments of vegetation in the area (8 times higher), but also topographies similar to the ones before the intervention, according to the application for sustainable development selected: either agriculture, forestry, cattle raising, artificial wetlands or do nothing. The decision was based upon a Multicriteria analysis for optimal land use, with three main variables: geostatistics, evapotranspiration and groundwater characteristics. The use of remote sensing, meteorological stations, piezometers, sunphotometers, geoelectric analysis among others; provide the information required for the multicriteria decision. For cattle raising and agricultural applications (where various crops were implemented), conservation of products were tested by means of nanotechnology. The results showed a duration of 2 years with no chemicals added for preservation and concentration of vitamins of the tested products.

Keywords: ASTER, Geostatistics, MODIS, Multicriteria

Procedia PDF Downloads 121
26730 Comparative Analysis of the Computer Methods' Usage for Calculation of Hydrocarbon Reserves in the Baltic Sea

Authors: Pavel Shcherban, Vlad Golovanov

Abstract:

Nowadays, the depletion of hydrocarbon deposits on the land of the Kaliningrad region leads to active geological exploration and development of oil and natural gas reserves in the southeastern part of the Baltic Sea. LLC 'Lukoil-Kaliningradmorneft' implements a comprehensive program for the development of the region's shelf in 2014-2023. Due to heterogeneity of reservoir rocks in various open fields, as well as with ambiguous conclusions on the contours of deposits, additional geological prospecting and refinement of the recoverable oil reserves are carried out. The key element is use of an effective technique of computer stock modeling at the first stage of processing of the received data. The following step uses information for the cluster analysis, which makes it possible to optimize the field development approaches. The article analyzes the effectiveness of various methods for reserves' calculation and computer modelling methods of the offshore hydrocarbon fields. Cluster analysis allows to measure influence of the obtained data on the development of a technical and economic model for mining deposits. The relationship between the accuracy of the calculation of recoverable reserves and the need of modernization of existing mining infrastructure, as well as the optimization of the scheme of opening and development of oil deposits, is observed.

Keywords: cluster analysis, computer modelling of deposits, correction of the feasibility study, offshore hydrocarbon fields

Procedia PDF Downloads 163
26729 Saving Energy at a Wastewater Treatment Plant through Electrical and Production Data Analysis

Authors: Adriano Araujo Carvalho, Arturo Alatrista Corrales

Abstract:

This paper intends to show how electrical energy consumption and production data analysis were used to find opportunities to save energy at Taboada wastewater treatment plant in Callao, Peru. In order to access the data, it was used independent data networks for both electrical and process instruments, which were taken to analyze under an ISO 50001 energy audit, which considered, thus, Energy Performance Indexes for each process and a step-by-step guide presented in this text. Due to the use of aforementioned methodology and data mining techniques applied on information gathered through electronic multimeters (conveniently placed on substation switchboards connected to a cloud network), it was possible to identify thoroughly the performance of each process and thus, evidence saving opportunities which were previously hidden before. The data analysis brought both costs and energy reduction, allowing the plant to save significant resources and to be certified under ISO 50001.

Keywords: energy and production data analysis, energy management, ISO 50001, wastewater treatment plant energy analysis

Procedia PDF Downloads 191
26728 A Hybrid Approach for Thread Recommendation in MOOC Forums

Authors: Ahmad. A. Kardan, Amir Narimani, Foozhan Ataiefard

Abstract:

Recommender Systems have been developed to provide contents and services compatible to users based on their behaviors and interests. Due to information overload in online discussion forums and users diverse interests, recommending relative topics and threads is considered to be helpful for improving the ease of forum usage. In order to lead learners to find relevant information in educational forums, recommendations are even more needed. We present a hybrid thread recommender system for MOOC forums by applying social network analysis and association rule mining techniques. Initial results indicate that the proposed recommender system performs comparatively well with regard to limited available data from users' previous posts in the forum.

Keywords: association rule mining, hybrid recommender system, massive open online courses, MOOCs, social network analysis

Procedia PDF Downloads 290
26727 Feature Selection for Production Schedule Optimization in Transition Mines

Authors: Angelina Anani, Ignacio Ortiz Flores, Haitao Li

Abstract:

The use of underground mining methods have increased significantly over the past decades. This increase has also been spared on by several mines transitioning from surface to underground mining. However, determining the transition depth can be a challenging task, especially when coupled with production schedule optimization. Several researchers have simplified the problem by excluding operational features relevant to production schedule optimization. Our research objective is to investigate the extent to which operational features of transition mines accounted for affect the optimal production schedule. We also provide a framework for factors to consider in production schedule optimization for transition mines. An integrated mixed-integer linear programming (MILP) model is developed that maximizes the NPV as a function of production schedule and transition depth. A case study is performed to validate the model, with a comparative sensitivity analysis to obtain operational insights.

Keywords: underground mining, transition mines, mixed-integer linear programming, production schedule

Procedia PDF Downloads 165