Search results for: Data Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25529

25349 Mood Recognition Using Indian Music

Authors: Vishwa Joshi

Abstract:

The study of mood recognition in music has gained considerable momentum in recent years, with machine learning and data mining techniques and many audio features contributing to analyzing and identifying the relation between mood and music. In this paper we carry this idea forward and build a system for automatic recognition of the mood underlying audio song clips by mining their audio features. We evaluate several data classification algorithms to learn, train, and test a model describing the moods of these songs, and we have developed an open source framework. Before classification, preprocessing and feature extraction phases are necessary for removing noise and gathering features, respectively.

Keywords: music, mood, features, classification

Procedia PDF Downloads 497
25348 Multi-Class Text Classification Using Ensembles of Classifiers

Authors: Syed Basit Ali Shah Bukhari, Yan Qiang, Saad Abdul Rauf, Syed Saqlaina Bukhari

Abstract:

Text classification is the task of assigning a given text to the appropriate category from a given set of categories. It is vital to use a proper set of pre-processing, feature selection, and classification techniques to achieve this purpose. In this paper we used different ensemble techniques, along with variation in feature selection parameters, to observe the change in the overall accuracy of the result and in some per-class measures, including the precision of each individual category of text. After subjecting our data to pre-processing and feature selection techniques, different individual classifiers were tested first, and the classifiers were then combined to form ensembles to increase their accuracy. Later we also studied the impact of decreasing the number of classification categories on the overall accuracy. Text classification is widely used in sentiment analysis on social media sites such as Twitter to gauge people’s opinions about a cause, and it is also used to analyze customers’ reviews of products or services. Opinion mining is a vital task in data mining, and text categorization is the backbone of opinion mining.
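
To make the idea of combining classifiers concrete, here is a minimal sketch in plain Python. It uses simple majority voting, which is not necessarily the bagging or AdaBoost setup the authors evaluated, and the labels and classifier outputs are hypothetical values chosen for illustration. It also computes the per-category precision mentioned in the abstract.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by majority vote.

    Ties are resolved in favor of the label seen first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def precision_per_class(y_true, y_pred):
    """Precision for every label that appears in the predictions."""
    result = {}
    for label in set(y_pred):
        truths = [t for t, p in zip(y_true, y_pred) if p == label]
        result[label] = sum(t == label for t in truths) / len(truths)
    return result

# Hypothetical gold labels and outputs of three individual classifiers.
y_true = ["sports", "politics", "tech", "sports"]
clf_a = ["sports", "politics", "sports", "sports"]
clf_b = ["sports", "tech", "tech", "politics"]
clf_c = ["tech", "politics", "tech", "sports"]

ensemble = majority_vote([clf_a, clf_b, clf_c])
print(ensemble)
print(precision_per_class(y_true, ensemble))
```

In this toy case each classifier makes at least one mistake, yet the vote recovers every label, which is the effect an ensemble aims for.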

Keywords: Natural Language Processing, Ensemble Classifier, Bagging Classifier, AdaBoost

Procedia PDF Downloads 231
25347 Social Media Mining with R. Twitter Analyses

Authors: Diana Codat

Abstract:

Tweet analysis is part of text mining. Each document is a written text, so the usual text search techniques can be applied, in particular by switching to the bag-of-words representation. But tweets have peculiarities. Some may enrich the analysis: their length is calibrated (at least as far as public messages are concerned), special characters make it possible to identify authors (@) and themes (#), and the tweet and retweet mechanisms make it possible to follow the diffusion of information. Conversely, other characteristics may disrupt the analysis. Because space is limited, authors often use abbreviations and emoticons to express feelings, and they do not pay much attention to spelling. All this creates noise that can complicate the task. Tweets carry a lot of potentially interesting information, and their exploitation is one of the main axes of social network analysis. We show how to access Twitter messages. We begin with a study of the properties of the tweets and then exploit the content of the messages. We work in R with the package 'twitteR'. The study of tweets is a strong focus of social network analysis because Twitter has become an important vector of communication. This example shows that it is easy to start an analysis from data extracted directly online. The data preparation phase is of great importance.
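
The preparation step described above can be sketched as follows. The paper itself works in R with 'twitteR'; this illustration uses Python's standard library instead, and the function name `tweet_features` and the sample tweet are assumptions made for the example. It pulls out authors (@) and themes (#) and builds a bag of words from the remaining text.

```python
import re
from collections import Counter

def tweet_features(text):
    """Return (mentions, hashtags, bag_of_words) for one tweet."""
    mentions = re.findall(r"@(\w+)", text)
    hashtags = re.findall(r"#(\w+)", text)
    # Drop URLs, mentions, and hashtags before counting ordinary words.
    stripped = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    words = re.findall(r"[a-z']+", stripped.lower())
    return mentions, hashtags, Counter(words)

# Hypothetical tweet used only to exercise the function.
sample = "RT @alice: loving #rstats and #datamining, so so good http://t.co/x"
mentions, hashtags, bag = tweet_features(sample)
print(mentions, hashtags, bag.most_common(3))
```

A real pipeline would also need to handle abbreviations, emoticons, and spelling noise, which this sketch leaves untouched.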

Keywords: data mining, language R, social networks, Twitter

Procedia PDF Downloads 184
25346 Model for Introducing Products to New Customers through Decision Tree Using Algorithm C4.5 (J-48)

Authors: Komol Phaisarn, Anuphan Suttimarn, Vitchanan Keawtong, Kittisak Thongyoun, Chaiyos Jamsawang

Abstract:

This article analyzes insurance data that record customers' decisions when purchasing a life insurance pay package. The data were analyzed in order to present new customers with the Life Insurance Perfect Pay package that meets their needs as closely as possible. The basic data on insurance pay packages were collected and prepared for data mining, thus reducing the scattering of information. The data were then classified to obtain a decision model, or decision tree, using the C4.5 algorithm (J-48). For the classification, the WEKA tools were used to build the model, and testing datasets were used to check the accuracy of the decision tree. Validation of the model showed that 68.43% of predictions were accurate while 31.25% were errors. The same data set was then tested with other models, i.e. Naive Bayes and ZeroR. The results showed that the J-48 method predicted more accurately. The researchers therefore applied the decision tree in a program that introduces the product to new customers, supporting their decision to purchase the insurance package that best meets their needs.
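
C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below shows that criterion in plain Python. It is not the WEKA J-48 implementation, and the customer records and attribute names (`age_band`, `gender`, `buys`) are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, attr, target):
    """Gain ratio of splitting `rows` (a list of dicts) on `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attr] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical customer records, for illustration only.
rows = [
    {"age_band": "young", "gender": "m", "buys": "no"},
    {"age_band": "young", "gender": "f", "buys": "no"},
    {"age_band": "old", "gender": "m", "buys": "yes"},
    {"age_band": "old", "gender": "f", "buys": "yes"},
]
for attr in ("age_band", "gender"):
    print(attr, gain_ratio(rows, attr, "buys"))
```

Here `age_band` separates buyers from non-buyers perfectly and would be chosen as the root split, while `gender` carries no information about the outcome.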

Keywords: decision tree, data mining, customers, life insurance pay package

Procedia PDF Downloads 428
25345 Automatic Lead Qualification with Opinion Mining in Customer Relationship Management Projects

Authors: Victor Radich, Tania Basso, Regina Moraes

Abstract:

Lead qualification is one of the main procedures in Customer Relationship Management (CRM) projects. Its main goal is to identify potential consumers who have the ideal characteristics to establish a profitable and long-term relationship with a certain organization. Social networks can be an important source of data for identifying and qualifying leads since interest in specific products or services can be identified from the users’ expressed feelings of (dis)satisfaction. In this context, this work proposes the use of machine learning techniques and sentiment analysis as an extra step in the lead qualification process in order to improve it. In addition to machine learning models, sentiment analysis or opinion mining can be used to understand the evaluation that the user makes of a particular service, product, or brand. The results obtained so far have shown that it is possible to extract data from social networks and combine the techniques for a more complete classification.

Keywords: lead qualification, sentiment analysis, opinion mining, machine learning, CRM, lead scoring

Procedia PDF Downloads 85
25344 Managing Data from One Hundred Thousand Internet of Things Devices Globally for Mining Insights

Authors: Julian Wise

Abstract:

Newcrest Mining is one of the world’s top five gold and rare earth mining organizations by production, reserves, and market capitalization. This paper elaborates on the data acquisition processes employed by Newcrest in collaboration with the Fortune 500 listed organization Insight Enterprises to standardize machine learning solutions that process data from over a hundred thousand distributed Internet of Things (IoT) devices located at mine sites globally. Through the use of cloud software architecture and edge computing, these technological developments enable standardized machine learning applications to influence the strategic optimization of mineral processing. Target objectives of the machine learning optimizations include time savings on mineral processing, production efficiencies, risk identification, and increased production throughput. The data acquired and utilized for predictive modelling is processed through edge computing by resources collectively stored within a data lake. Involvement in the digital transformation has necessitated a standardized software architecture to manage the machine learning models submitted by vendors and to ensure effective automation and continuous improvement of the mineral process models. Operating at scale, the system processes hundreds of gigabytes of data per day from distributed mine sites across the globe, for the purposes of improved worker safety and production efficiency through big data applications.

Keywords: mineral technology, big data, machine learning operations, data lake

Procedia PDF Downloads 112
25343 Quantification of GHGs Emissions from Electricity and Diesel Fuel Consumption in Basalt Mining Industry in Thailand

Authors: S. Kittipongvises, A. Dubsok

Abstract:

The mineral and mining industry is necessary for countries to have an adequate and reliable supply of materials to meet their socio-economic development needs. Despite its importance, the environmental impacts of mineral exploration are highly significant. This study aimed to investigate and quantify the GHG emissions from both electricity and diesel vehicle fuel consumption in basalt mining in Thailand. Plant A, located in the northeastern region of Thailand, was selected as a case study. Results indicated that total GHG emissions from basalt mining and operation (Plant A) were approximately 2,501,086 kgCO2e and 1,997,412 kgCO2e in 2014 and 2015, respectively. The estimated carbon intensity ranged from 1.824 kgCO2e to 2.284 kgCO2e per ton of rock product. Scope 1 (direct emissions) was the dominant driver of total GHG emissions compared to scope 2 (indirect emissions). Specifically, transport-related combustion of diesel fuel generated the highest GHG emissions (65%) compared to emissions from purchased electricity (35%). Some potential implications for mining entities are also presented.

Keywords: basalt mining, diesel fuel, electricity, GHGs emissions, Thailand

Procedia PDF Downloads 266
25342 Focus-Latent Dirichlet Allocation for Aspect-Level Opinion Mining

Authors: Mohsen Farhadloo, Majid Farhadloo

Abstract:

Aspect-level opinion mining, which aims at discovering aspects (aspect identification) and their corresponding ratings (sentiment identification) from customer reviews, has increasingly attracted the attention of researchers and practitioners, as it provides valuable insights about products and services from the customers' point of view. Instead of addressing aspect identification and sentiment identification in two separate steps, it is possible to identify both aspects and sentiments simultaneously. In recent years many graphical models based on Latent Dirichlet Allocation (LDA) have been proposed to solve both aspect and sentiment identification in a single step. Although LDA models have been effective tools for the statistical analysis of document collections, they also have shortcomings in addressing some unique characteristics of opinion mining. Our goal in this paper is to address one of the limitations of topic models to date; that is, they fail to directly model the associations among topics. Indeed, in many text corpora it is natural to expect that subsets of the latent topics have higher probabilities. We propose a probabilistic graphical model called focus-LDA to better capture the associations among topics when applied to aspect-level opinion mining. Our experiments on real-life data sets demonstrate the improved effectiveness of the focus-LDA model in terms of the accuracy of the predictive distributions over held-out documents. Furthermore, we demonstrate qualitatively that the focus-LDA topic model provides a natural way of visualizing and exploring unstructured collections of textual data.

Keywords: aspect-level opinion mining, document modeling, Latent Dirichlet Allocation, LDA, sentiment analysis

Procedia PDF Downloads 94
25341 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of the rapid growth of data, many computer science tools have been developed to process and analyze Big Data. High-performance computing architectures have been designed to meet the processing needs of Big Data (from the standpoints of transaction processing and of strategic and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend in high-performance computing architectures, especially as it relates to analytics and data mining.

Keywords: high performance computing, HPC, big data, data analysis

Procedia PDF Downloads 520
25340 Heart Failure Identification and Progression by Classifying Cardiac Patients

Authors: Muhammad Saqlain, Nazar Abbas Saqib, Muazzam A. Khan

Abstract:

Heart Failure (HF) has become a major health problem in our society. The prevalence of HF increases with patient age, and it is a major cause of the high mortality rate in adults. Successful identification of HF and its progression can help reduce the individual and social burden of this syndrome. In this study, we use a real data set of cardiac patients to propose a classification model for the identification and progression of HF. The data set was divided into three age groups, namely young, adult, and old, and each age group was further classified into four classes according to the patient’s current physical condition. Contemporary data mining classification algorithms were applied to each individual class of every age group to identify HF. The Decision Tree (DT) gives the highest accuracy of 90% and outperforms all other algorithms. Our model accurately diagnoses different stages of HF for each age group and can be very useful for the early prediction of HF.

Keywords: decision tree, heart failure, data mining, classification model

Procedia PDF Downloads 402
25339 An Experimental Study for Assessing Email Classification Attributes Using Feature Selection Methods

Authors: Issa Qabaja, Fadi Thabtah

Abstract:

Email phishing classification is one of the vital problems in the online security research domain and has attracted several scholars due to its impact on the payments users perform online daily. One way to achieve good performance from detection algorithms in the email phishing problem is to identify the minimal set of features that significantly raise the phishing detection rate. This paper investigates three known feature selection methods, Information Gain (IG), Chi-square, and Correlation Features Set (CFS), on the email phishing problem to separate highly influential features from less influential ones in phishing detection. We measure the degree of influence by applying four data mining algorithms to a large set of features. We compare the accuracy of these algorithms on the complete feature set before feature selection and after feature selection has been applied. The experimental results show that 12 common significant features were chosen among the considered features by the feature selection methods. Further, the average detection accuracy of the data mining algorithms on the reduced 12-feature set was only very slightly affected when compared with that obtained from the 47-feature set.
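
As an illustration of one of the three methods, the following sketch ranks binary email features by the chi-square statistic of their 2x2 contingency table against the phishing label. The feature names and values are hypothetical and are not taken from the paper's 47-feature set.

```python
def chi_square_2x2(feature, label):
    """Chi-square statistic for two binary (0/1) sequences of equal length."""
    a = sum(1 for f, y in zip(feature, label) if f == 1 and y == 1)
    b = sum(1 for f, y in zip(feature, label) if f == 1 and y == 0)
    c = sum(1 for f, y in zip(feature, label) if f == 0 and y == 1)
    d = sum(1 for f, y in zip(feature, label) if f == 0 and y == 0)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero denominator means the feature or the label is constant.
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# Hypothetical data: 1 = phishing, 0 = legitimate.
label = [1, 1, 1, 0, 0, 0]
features = {
    "ip_in_url": [1, 1, 1, 0, 0, 0],  # matches the label exactly
    "has_html": [1, 0, 1, 0, 1, 0],   # only weakly related to the label
}
scores = {name: chi_square_2x2(col, label) for name, col in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Keeping only the top-ranked features and re-running the classifiers is the kind of before-and-after comparison the abstract describes.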

Keywords: data mining, email classification, phishing, online security

Procedia PDF Downloads 432
25338 Assessing Carbon Stock and Sequestration of Reforestation Species on Old Mining Sites in Morocco Using the DNDC Model

Authors: Nabil Elkhatri, Mohamed Louay Metougui, Ngonidzashe Chirinda

Abstract:

Mining activities have left a legacy of degraded landscapes, prompting urgent efforts for ecological restoration. Reforestation holds promise as a potent tool to rehabilitate these old mining sites, with the potential to sequester carbon and contribute to climate change mitigation. This study focuses on evaluating the carbon stock and sequestration potential of reforestation species in the context of Morocco's mining areas, employing the DeNitrification-DeComposition (DNDC) model. The research is grounded in recognizing the need to connect theoretical models with practical implementation, ensuring that reforestation efforts are informed by accurate and context-specific data. Field data collection encompasses growth patterns, biomass accumulation, and carbon sequestration rates, establishing an empirical foundation for the study's analyses. By integrating the collected data with the DNDC model, the study aims to provide a comprehensive understanding of carbon dynamics within reforested ecosystems on old mining sites. The major findings reveal varying sequestration rates among different reforestation species, indicating the potential for species-specific optimization of reforestation strategies to enhance carbon capture. This research's significance lies in its potential to contribute to sustainable land management practices and climate change mitigation strategies. By quantifying the carbon stock and sequestration potential of reforestation species, the study serves as a valuable resource for policymakers, land managers, and practitioners involved in ecological restoration and carbon management. Ultimately, the study aligns with global objectives to rejuvenate degraded landscapes while addressing pressing climate challenges.

Keywords: carbon stock, carbon sequestration, DNDC model, ecological restoration, mining sites, Morocco, reforestation, sustainable land management.

Procedia PDF Downloads 76
25337 A General Strategy for Noise Assessment in Open Mining Industries

Authors: Diego Mauricio Murillo Gomez, Enney Leon Gonzalez Ramirez, Hugo Piedrahita, Jairo Yate

Abstract:

This paper proposes a methodology for the management of noise in open mining industries based on an integral concept that considers occupational and environmental noise as a whole. The approach relies on the characterization of sources, the combination of several measurement techniques, and the use of acoustic prediction software. The difference between frequently used acoustic indicators such as Leq and LAV is discussed, aiming to establish common ground for homologation. The results show that the correct integration of these data allows not only for a more robust technical analysis but also for a more strategic route of intervention, as several departments of the company work together. Noise control measures can be designed to provide a healthy acoustic environment that benefits not only the exposed workers but also the outdoor community.
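
The gap between the two indicators can be illustrated numerically. The sketch below assumes that LAV denotes an average level computed with a 5 dB exchange rate, as in OSHA-style dosimetry, whereas Leq is the energy average; the abstract does not spell this out, so treat the mapping as an assumption. The sample levels are hypothetical, equal-duration readings.

```python
from math import log10

def average_level(levels_db, q):
    """Average of equal-duration sound levels with exchange parameter q.

    q = 10 gives the energy-equivalent level (Leq).
    q = 5 / log10(2), about 16.61, corresponds to a 5 dB exchange rate.
    """
    mean = sum(10 ** (level / q) for level in levels_db) / len(levels_db)
    return q * log10(mean)

Q_LEQ = 10.0
Q_5DB = 5 / log10(2)

samples = [80.0, 90.0]  # hypothetical dB(A) readings of equal duration
leq = average_level(samples, Q_LEQ)
lav = average_level(samples, Q_5DB)
print(round(leq, 2), round(lav, 2))
```

For a steady noise the two values coincide, while for fluctuating levels the 5 dB exchange rate yields the lower figure, which is one reason the indicators cannot be compared directly without a common basis.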

Keywords: environmental noise, noise control, occupational noise, open mining

Procedia PDF Downloads 269
25336 Intelligent Process Data Mining for Monitoring for Fault-Free Operation of Industrial Processes

Authors: Hyun-Woo Cho

Abstract:

Real-time fault monitoring and diagnosis of large-scale production processes is necessary in order to operate industrial processes safely and efficiently while producing good final product quality. Unusual and abnormal events may have a serious impact on the process, such as malfunctions or breakdowns. This work tries to utilize process measurement data obtained on-line for the safe and fault-free operation of industrial processes. To this end, this work evaluated the proposed intelligent process data monitoring framework on a simulated process. The monitoring scheme extracts the fault pattern in a reduced space for reliable data representation. Moreover, this work shows the results of using linear and nonlinear techniques for monitoring. It was shown that the nonlinear technique produced more reliable monitoring results and outperformed the linear methods. The adoption of the qualitative monitoring model helps to reduce the sensitivity of the fault pattern to noise.

Keywords: process data, data mining, process operation, real-time monitoring

Procedia PDF Downloads 640
25335 Identification of Environmental Damage Due to Mining Area Bangka Islands in Indonesia

Authors: Aroma Elmina Martha

Abstract:

The environment affects the continuity of life and the well-being of humans and other living beings. Environmental quality is very closely related to quality of life. Sustainability must be protected from damage due to the use of natural resources, such as tin mining on Bangka island. This research is a descriptive study that identifies the environmental damage caused by land and sea mining in Bangka district. The approach used is juridical, social, and economic. The study uses primary, secondary, and tertiary legal materials, complemented by field research. The analysis technique used is qualitative analysis. The impacts of mining on land include, among others, physical and chemical damage, erosion, widening and deepening of rivers, pooling, micro-climate, quality and feasibility, vegetation, wildlife and biodiversity, land values, and social and economic conditions. This mining damages the soil structure and leaves puddles in former excavations that were not backfilled. The impacts of mining on the ocean include changes in currents, erosion and abrasion of the coastal seabed, shoreline change, changes in marine water quality, and changes in marine communities. The findings show that tin mining in the sea can also have a significant impact on reef life and populations of marine organisms. Mining on land likewise needs to consider its impacts so that the damage can be minimized. The recovery process needs to be pursued by exploiting the remaining piles of tin. Mining activities should therefore take into account the distance from the beach, sediment size, wave height, wave length, wave period, and the acceleration of gravity. The tin washing process should be done in a fairly safe area, thus avoiding damage to the coral reefs that would eventually reduce the population of marine life.

Keywords: abrasion, environmental damage, mining, shoreline

Procedia PDF Downloads 322
25334 Application of Knowledge Discovery in Database Techniques in Cost Overruns of Construction Projects

Authors: Mai Ghazal, Ahmed Hammad

Abstract:

Cost overruns in construction projects are considered a worldwide challenge, since cost performance is one of the main measures of success along with schedule performance. To overcome this problem, studies have investigated the factors behind cost overruns, and projects' historical data have been analyzed to extract new and useful knowledge. This research studies and analyzes the effect of some factors causing cost overruns using historical data from completed construction projects, and then uses these factors to estimate the probability of cost overrun occurrence and predict its percentage for future projects. First, an intensive literature review was done to study all the factors that cause cost overruns in construction projects; then another review was done of previous research papers on the use of data mining in dealing with cost overruns. Second, a proposed data warehouse was structured, which organizations can use to store their future data in a well-organized way so that it can be easily analyzed later. Third, twelve quantitative factors whose data are frequently available in construction projects were selected as the analyzed factors and suggested predictors for the proposed model.

Keywords: construction management, construction projects, cost overrun, cost performance, data mining, data warehousing, knowledge discovery, knowledge management

Procedia PDF Downloads 370
25333 Cirrhosis Mortality Prediction as Classification using Frequent Subgraph Mining

Authors: Abdolghani Ebrahimi, Diego Klabjan, Chenxi Ge, Daniela Ladner, Parker Stride

Abstract:

In this work, we use machine learning and novel data analysis techniques to predict the one-year mortality of cirrhotic patients. Data from 2,322 patients with liver cirrhosis were collected at a single medical center. Different machine learning models are applied to predict one-year mortality. A comprehensive feature space including demographic information, comorbidity, clinical procedures, and laboratory tests is analyzed. A temporal pattern mining technique called Frequent Subgraph Mining (FSM) is used. The Model for End-stage Liver Disease (MELD) prediction of mortality is used as a comparator. All of our models statistically significantly outperform the MELD-score model and show an average 10% improvement in the area under the curve (AUC). The FSM technique by itself does not improve the model significantly, but FSM together with a machine learning technique called an ensemble further improves model performance. With the abundance of data available in healthcare through electronic health records (EHR), existing predictive models can be refined to identify and treat patients at risk for higher mortality. However, due to the sparsity of the temporal information needed by FSM, the FSM model does not yield significant improvements. To the best of our knowledge, this is the first work to apply modern machine learning algorithms and data analysis methods to predicting one-year mortality of cirrhotic patients and to build a model that predicts one-year mortality significantly more accurately than the MELD score. We have also tested the potential of FSM and provided a new perspective on the importance of clinical features.
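
Since the comparison with the MELD score is reported as AUC, a short sketch of how that metric is computed may help. It uses the pairwise definition: the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor, with ties counting one half. The labels and scores are hypothetical and do not come from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve from binary labels (1 = event) and risk scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes and risk scores from two models.
died = [1, 1, 0, 0, 0]
model_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
baseline_scores = [0.6, 0.3, 0.5, 0.4, 0.1]

print(auc(died, model_scores), auc(died, baseline_scores))
```

This quadratic-time version is fine for small samples; rank-based formulas give the same value more efficiently on large cohorts.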

Keywords: machine learning, liver cirrhosis, subgraph mining, supervised learning

Procedia PDF Downloads 134
25332 Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques

Authors: Gabriela V. Angeles Perez, Jose Castillejos Lopez, Araceli L. Reyes Cabello, Emilio Bravo Grajales, Adriana Perez Espinosa, Jose L. Quiroz Fabian

Abstract:

Road traffic accidents are among the principal causes of traffic congestion, causing human losses, damage to health and the environment, economic losses, and material damage. Traditional studies of road traffic accidents in urban zones require a very high investment of time and money; additionally, the results are not current. Nowadays, however, in many countries crowdsourced GPS-based traffic and navigation apps have emerged as an important low-cost source of information for studies of road traffic accidents and the urban congestion they cause. In this article we identify the zones, roads, and specific times in CDMX (Mexico City) in which the largest number of road traffic accidents were concentrated during 2016. We built a database compiling information obtained from the social network known as Waze. The methodology employed was Knowledge Discovery in Databases (KDD) for the discovery of patterns in the accident reports, using data mining techniques with the help of Weka. The selected algorithms were Expectation-Maximization (EM), to obtain the ideal number of clusters for the data, and k-means as a grouping method. Finally, the results were visualized with the Geographic Information System QGIS.
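
The grouping step can be sketched with a small k-means (Lloyd's algorithm) in plain Python. This is not the Weka implementation used by the authors. The number of clusters is fixed by the initial centers rather than estimated with EM, plain Euclidean distance is applied to hypothetical 2-D coordinates, and real latitude and longitude data would need a suitable projection first.

```python
def kmeans(points, centers, iterations=10):
    """Cluster 2-D points around the given initial centers."""
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for x, y in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        # Update step: move each center to the mean of its cluster.
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical accident locations forming two obvious groups.
reports = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(reports, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Each resulting center marks a candidate accident hotspot, which could then be plotted on a map in a GIS tool.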

Keywords: data mining, k-means, road traffic accidents, Waze, Weka

Procedia PDF Downloads 417
25331 Leveraging Power BI for Advanced Geotechnical Data Analysis and Visualization in Mining Projects

Authors: Elaheh Talebi, Fariba Yavari, Lucy Philip, Lesley Town

Abstract:

The mining industry generates vast amounts of data, necessitating robust data management systems and advanced analytics tools to achieve better decision-making processes in the development of mining production and maintaining safety. This paper highlights the advantages of Power BI, a powerful intelligence tool, over traditional Excel-based approaches for effectively managing and harnessing mining data. Power BI enables professionals to connect and integrate multiple data sources, ensuring real-time access to up-to-date information. Its interactive visualizations and dashboards offer an intuitive interface for exploring and analyzing geotechnical data. Advanced analytics is a collection of data analysis techniques to improve decision-making. Leveraging some of the most complex techniques in data science, advanced analytics is used to do everything from detecting data errors and ensuring data accuracy to directing the development of future project phases. However, while Power BI is a robust tool, specific visualizations required by geotechnical engineers may have limitations. This paper studies the capability to use Python or R programming within the Power BI dashboard to enable advanced analytics, additional functionalities, and customized visualizations. This dashboard provides comprehensive tools for analyzing and visualizing key geotechnical data metrics, including spatial representation on maps, field and lab test results, and subsurface rock and soil characteristics. Advanced visualizations like borehole logs and Stereonet were implemented using Python programming within the Power BI dashboard, enhancing the understanding and communication of geotechnical information. Moreover, the dashboard's flexibility allows for the incorporation of additional data and visualizations based on the project scope and available data, such as pit design, rock fall analyses, rock mass characterization, and drone data. 
This further enhances the dashboard's usefulness in future projects, including operation, development, closure, and rehabilitation phases. Additionally, this helps in minimizing the necessity of utilizing multiple software programs in projects. This geotechnical dashboard in Power BI serves as a user-friendly solution for analyzing, visualizing, and communicating both new and historical geotechnical data, aiding in informed decision-making and efficient project management throughout various project stages. Its ability to generate dynamic reports and share them with clients in a collaborative manner further enhances decision-making processes and facilitates effective communication within geotechnical projects in the mining industry.

Keywords: geotechnical data analysis, power BI, visualization, decision-making, mining industry

Procedia PDF Downloads 92
25330 Analysing the Perception of Climate Hazards on Biodiversity Conservation in Mining Landscapes within Southwestern Ghana

Authors: Salamatu Shaibu, Jan Hernning Sommer

Abstract:

Integrating biodiversity conservation practices in mining landscapes ensures the continual provision of various ecosystem services to the dependent communities whilst serving as ecological insurance for corporate mining when purchasing reclamation security bonds. Climate hazards such as long dry seasons, erratic rainfall patterns, and extreme weather events contribute to biodiversity loss in addition to the impact due to mining. Both corporate mining and mine-fringe communities perceive the effect of climate on biodiversity from the context of the benefits they accrue, which motivate their conservation practices. In this study, pragmatic approaches including semi-structured interviews, field visual observation, and review were used to collect data on corporate mining employees and households of fringing communities in the southwestern mining hub. The perceived changes in the local climatic conditions and the consequences on environmental management practices that promote biodiversity conservation were examined. Using a thematic content analysis tool, the result shows that best practices such as concurrent land rehabilitation, reclamation ponds, artificial wetlands, land clearance, and topsoil management are directly affected by prolonged dry seasons and erratic rainfall patterns. Excessive dust and noise generation directly affect both floral and faunal diversity, coupled with excessive fire outbreaks in rehabilitated lands and nearby forest reserves. Proposed adaptive measures include engaging national conservation authorities to promote reforestation projects around forest reserves, having the national government desist from issuing permits for mining concessions in forest reserves, engaging local communities through educational campaigns to control forest encroachment and burning, promoting community-based resource management to foster community ownership, and providing stricter environmental legislation to compel corporate, artisanal, and small-scale mining companies to promote biodiversity conservation.

Keywords: biodiversity conservation, climate hazards, corporate mining, mining landscapes

Procedia PDF Downloads 219
25329 AniMoveMineR: Animal Behavior Exploratory Analysis Using Association Rules Mining

Authors: Suelane Garcia Fontes, Silvio Luiz Stanzani, Pedro L. Pizzigatti Corrêa, Ronaldo G. Morato

Abstract:

Environmental changes and major natural disasters are increasingly prevalent in the world due to the damage that humanity has caused to nature, and this damage directly affects the lives of animals. Thus, the study of animal behavior and of animals' interactions with the environment can provide knowledge that guides researchers and public agencies in preservation and conservation actions. Exploratory analysis of animal movement can determine patterns of animal behavior, and technological advances have expanded the ability to track animals and, consequently, the scope of behavioral studies. There is a lot of research on animal movement and behavior, but a proposal that combines resources, allows for exploratory analysis of animal movement, and provides statistical measures on individual animal behavior and its interaction with the environment is missing. The contribution of this paper is to present the framework AniMoveMineR, a unified solution that aggregates trajectory analysis and data mining techniques to explore animal movement data and provide a first step in answering questions about individual animal behavior and animals' interactions with other animals over time and space. We evaluated the framework using data from monitored jaguars in the city of Miranda, Pantanal, Brazil, in order to verify whether AniMoveMineR makes it possible to identify the interaction level between these jaguars. The results were positive and provided indications about the individual behavior of jaguars and about which jaguars have the highest or lowest correlation.

Keywords: data mining, data science, trajectory, animal behavior

Procedia PDF Downloads 144
25328 Investigating Data Normalization Techniques in Swarm Intelligence Forecasting for Energy Commodity Spot Price

Authors: Yuhanis Yusof, Zuriani Mustaffa, Siti Sakira Kamaruddin

Abstract:

Data mining is a fundamental technique for identifying patterns in large data sets. The extracted facts and patterns contribute to various domains such as marketing, forecasting, and medicine. Prior to that, data are consolidated so that the resulting mining process may be more efficient. This study investigates the effect of different data normalization techniques, namely Min-Max, Z-score, and decimal scaling, on swarm-based forecasting models. The recent swarm intelligence algorithms employed include the Grey Wolf Optimizer (GWO) and Artificial Bee Colony (ABC). Forecasting models are then developed to predict the daily spot price of crude oil and gasoline. Results showed that GWO works better with the Z-score normalization technique, while ABC produces better accuracy with Min-Max. Nevertheless, GWO is superior to ABC, as its model generates the highest accuracy for both crude oil and gasoline prices. Such a result indicates that GWO is a promising competitor in the family of swarm intelligence algorithms.
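
The three normalization techniques named in the abstract can be sketched as follows. This is a minimal illustration using their textbook definitions, not the authors' code; the function names and the toy price list are assumptions made for this example.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: every value maps to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by the smallest power of ten that makes every |value| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


# Hypothetical daily spot prices, for illustration only.
prices = [50.0, 75.0, 100.0]
print(min_max(prices))
print(z_score(prices))
print(decimal_scaling(prices))
```

Min-Max and decimal scaling keep the shape of the series within a bounded range, whereas Z-score is unbounded and depends on the spread of the data, which may be one reason different optimizers respond differently to each.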

Keywords: artificial bee colony, data normalization, forecasting, Grey Wolf optimizer

Procedia PDF Downloads 475
25327 A GIS Based Composite Land Degradation Assessment and Mapping of Tarkwa Mining Area

Authors: Bernard Kumi-Boateng, Kofi Bonsu

Abstract:

The clearing of vegetation in the Tarkwa Mining Area (TMA) for the purposes of mining, lumbering and development of settlements for the increasing population has caused large-scale denudation of the forest cover and erosion of the topsoil, thereby degrading the agricultural land. It is, therefore, essential to know the current status of land degradation in TMA so as to facilitate land conservation policy-making. The types of degradation, the extents of the degradation and their various degrees were combined to develop a composite land degradation index to assess the current status of land degradation in TMA using GIS-based techniques. The assessment revealed that the most significant types of degradation in TMA were open pit and quarry mining; urbanisation and other construction projects; and surface scraping during land clearing. It was found that 21.62% of the total area of TMA (353.07 km²) had a high degradation index rating. It is recommended that decision makers use this assessment as a reference point for future initiatives that will be taken in order to develop land conservation policy.

Keywords: degradation, GIS, land, mining

Procedia PDF Downloads 354
25326 Application of Granular Computing Paradigm in Knowledge Induction

Authors: Iftikhar U. Sikder

Abstract:

This paper illustrates an application of a granular computing approach, namely rough set theory, in data mining. The paper outlines the formalism of granular computing and elucidates the mathematical underpinnings of rough set theory, which has been widely used by the data mining and machine learning communities. A real-world application is illustrated, and the classification performance is compared with other contending machine learning algorithms. The predictive performance of the rough set rule induction model shows comparative success with respect to other contending algorithms.
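
The concept approximation at the core of rough set theory can be sketched as below. This is a minimal illustration of the standard lower and upper approximations, assuming a small hypothetical decision table; the function names, attribute names, and data are invented for this example and are not taken from the paper.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group object ids that are indiscernible on the chosen attributes."""
    classes = defaultdict(set)
    for obj_id, row in objects.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return the (lower, upper) approximation of the target concept."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    return lower, upper


# Hypothetical information table, for illustration only.
objects = {
    "x1": {"headache": "yes", "temp": "high"},
    "x2": {"headache": "yes", "temp": "high"},
    "x3": {"headache": "no", "temp": "high"},
    "x4": {"headache": "no", "temp": "normal"},
}
concept = {"x1", "x3"}  # objects that belong to the concept

lower, upper = approximations(objects, ["headache", "temp"], concept)
print(sorted(lower), sorted(upper), sorted(upper - lower))
```

Objects in the lower approximation certainly belong to the concept, objects in the upper approximation possibly belong to it, and the difference between the two is the boundary region from which uncertain rules arise.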

Keywords: concept approximation, granular computing, reducts, rough set theory, rule induction

Procedia PDF Downloads 531
25325 Annual Effective Dose Associated with Radon in Groundwater Samples from Mining Communities Within the Ife-Ilesha Schist Belt, Southwestern Nigeria

Authors: Paulinah Oyindamola Fasanmi, Matthew Omoniyi Isinkaye

Abstract:

In this study, the activity concentration of ²²²Rn in groundwater samples collected from gold and kaolin mining communities within the Ife-Ilesha schist belt, southwestern Nigeria, and the corresponding annual effective doses have been determined using the Durridge RAD-7 radon-in-water detector. The mean concentration of ²²²Rn in all the groundwater samples was 13.83 Bq l⁻¹. In borehole water, ²²²Rn had a mean value of 20.68 Bq l⁻¹, while it had a mean value of 11.67 Bq l⁻¹ in well water samples. The mean activity concentration of radon obtained from the gold mining communities ranged from 1.6 Bq l⁻¹ in Igun town to 4.8 Bq l⁻¹ in Ilesha town. A higher mean value of 41.8 Bq l⁻¹ was, however, obtained from Ijero, which is the kaolin mining community. The mean annual effective doses due to ingestion and inhalation of radon from groundwater samples were 35.35 μSv yr⁻¹ and 34.86 nSv yr⁻¹, respectively. The mean annual ingestion dose estimated for well water samples was 29.90 μSv yr⁻¹, while 52.85 μSv yr⁻¹ was obtained for borehole water samples. On the other hand, the mean annual inhalation dose for well water was 29.49 nSv yr⁻¹, while for borehole water, 52.13 nSv yr⁻¹ was obtained. The mean annual effective dose due to ingestion of radon in groundwater from the gold mining communities ranged from 4.10 μSv yr⁻¹ in Igun to 13.1 μSv yr⁻¹ in Ilesha, while a mean value of 106.7 μSv yr⁻¹ was obtained from the Ijero kaolin mining community. For inhalation, the mean value varied from 4.0 nSv yr⁻¹ in Igun to 12.9 nSv yr⁻¹ in Ilesha, while 105.2 nSv yr⁻¹ was obtained from the kaolin mining community. The mean annual effective dose due to ingestion and inhalation is lower than the reference level of 100 μSv yr⁻¹ recommended by the World Health Organization, except for values obtained from the Ijero kaolin mining community, which exceeded the reference level. It has been concluded that, as far as radon-related health risks are concerned, groundwater from gold mining communities is generally safe, while groundwater from kaolin mining communities needs mitigation and monitoring. It has been found that kaolin mining impacts groundwater with ²²²Rn more than gold mining does. Also, the radon level in borehole water exceeds its level in well water.
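
The abstract does not state the dose model it uses. As a hedged sketch, the ingestion dose is commonly estimated as concentration times annual water intake times a dose coefficient. The intake of 730 L per year and the coefficient of 3.5 nSv per Bq used below are assumptions frequently adopted in the radon literature, not values given in the abstract; with them, the reported mean concentration yields a dose close to the reported mean ingestion dose.

```python
# Assumed parameters (not stated in the abstract).
WATER_INTAKE_L_PER_YEAR = 730.0  # assumed adult intake, about 2 L per day
DOSE_COEFF_NSV_PER_BQ = 3.5      # assumed ingestion dose coefficient for radon


def ingestion_dose_usv_per_year(conc_bq_per_l,
                                intake_l_per_year=WATER_INTAKE_L_PER_YEAR,
                                coeff_nsv_per_bq=DOSE_COEFF_NSV_PER_BQ):
    """Annual effective ingestion dose in microsieverts per year."""
    dose_nsv = conc_bq_per_l * intake_l_per_year * coeff_nsv_per_bq
    return dose_nsv / 1000.0  # convert nSv to uSv


# Mean concentrations reported in the abstract (Bq per litre).
for label, conc in [("all samples", 13.83), ("borehole", 20.68), ("well", 11.67)]:
    print(label, round(ingestion_dose_usv_per_year(conc), 2))
```

Under these assumptions, a concentration of about 39 Bq l⁻¹ is enough to reach the 100 μSv yr⁻¹ reference level through ingestion alone, which is consistent with the Ijero mean of 41.8 Bq l⁻¹ exceeding it.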

Keywords: ²²²Rn, groundwater, radioactivity, annual effective dose, mining

Procedia PDF Downloads 69
25324 Trace Logo: A Notation for Representing Control-Flow of Operational Process

Authors: M. V. Manoj Kumar, Likewin Thomas, Annappa

Abstract:

The process mining research discipline bridges the gap between data mining and business process modeling and analysis. It offers process-centric, end-to-end methods and techniques for analyzing information about real-world processes recorded in operational event logs. In this paper, we propose a notation called trace logo for graphically representing the control-flow perspective (the order of execution of activities) of a process. A trace logo consists of a stack of activity names at each position; the size of each activity name indicates its frequency in the traces, and the total height of the stack depicts the information content of the position. A trace logo is created from a set of aligned traces generated using the Multiple Trace Alignment technique.
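
The stack heights described above can be sketched as follows. The abstract does not give the formula, so this example assumes the Shannon-entropy information content used in biological sequence logos (maximum bits minus the entropy of the position); the function name, the gap symbol, and the toy traces are hypothetical.

```python
import math
from collections import Counter


def trace_logo_columns(aligned_traces, gap="-"):
    """For each aligned position, return {activity: letter height in bits}.

    Assumes all traces have the same length (they are already aligned).
    """
    alphabet = {a for trace in aligned_traces for a in trace if a != gap}
    max_bits = math.log2(len(alphabet)) if len(alphabet) > 1 else 0.0
    columns = []
    for pos in range(len(aligned_traces[0])):
        counts = Counter(t[pos] for t in aligned_traces if t[pos] != gap)
        total = sum(counts.values())
        if total == 0:
            columns.append({})  # position holds only gaps
            continue
        freqs = {a: c / total for a, c in counts.items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        info = max_bits - entropy  # total stack height at this position
        columns.append({a: p * info for a, p in freqs.items()})
    return columns


# Hypothetical aligned traces; "-" marks an alignment gap.
traces = [
    ["a", "b", "c"],
    ["a", "c", "c"],
    ["a", "b", "-"],
    ["a", "b", "c"],
]
for pos, column in enumerate(trace_logo_columns(traces)):
    print(pos, {a: round(h, 3) for a, h in column.items()})
```

A position where every trace executes the same activity gets the tallest stack, while a position with mixed activities gets a shorter stack split in proportion to each activity's frequency.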

Keywords: consensus trace, process mining, multiple trace alignment, trace logo

Procedia PDF Downloads 348
25323 On Exploring Search Heuristics for Improving the Efficiency in Web Information Extraction

Authors: Patricia Jiménez, Rafael Corchuelo

Abstract:

Nowadays, the World Wide Web is the most popular source of information, relying on billions of on-line documents. Web mining is used to crawl through these documents, collect the information of interest, and process it by applying data mining tools in order to use the gathered information in the best interest of a business, which enables companies to promote their businesses. Unfortunately, it is not easy to automatically extract the information a web site provides when it lacks an API that transforms the user-friendly data provided in web documents into a structured, machine-readable format. Rule-based information extractors are tools intended to extract the information of interest automatically and offer it in a structured format that allows mining tools to process it. However, the performance of an information extractor strongly depends on the search heuristic employed, since bad choices regarding how to learn a rule may easily result in a loss of effectiveness and/or efficiency. Improving the efficiency of search heuristics is of utmost importance in the field of Web Information Extraction since typical datasets are very large. In this paper, we employ an information extractor based on a classical top-down algorithm that uses the so-called Information Gain heuristic introduced by Quinlan and Cameron-Jones. Unfortunately, Information Gain suffers from some well-known problems, so we analyse an intuitive alternative, Termini, that is clearly more efficient; we also analyse other proposals in the literature and conclude that none of them outperforms the previous alternative.
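
The Information Gain heuristic mentioned above can be sketched in its usual FOIL-style form, which scores a candidate refinement of a rule by how much it increases the proportion of positive examples covered. This is a minimal illustration of the textbook formula, not the extractor described in the paper; the function name, parameter names, and example counts are assumptions made for this sketch.

```python
import math


def foil_gain(p0, n0, p1, n1, t=None):
    """FOIL-style information gain of refining a rule.

    p0, n0: positives and negatives covered before the refinement.
    p1, n1: positives and negatives covered after the refinement.
    t: positives covered both before and after; defaults to p1, which
       holds when the refinement only removes examples (an assumption).
    """
    if p0 == 0 or p1 == 0:
        # No positives covered: treated as zero gain in this sketch.
        return 0.0
    if t is None:
        t = p1
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return t * (after - before)


# Hypothetical counts: the rule covers 10 positives and 10 negatives,
# and the refined rule covers 8 positives and 2 negatives.
print(foil_gain(10, 10, 8, 2))
```

A top-down learner evaluates this score for every candidate condition at each step, so the cost of computing coverage counts dominates on large datasets, which is why the efficiency of the heuristic matters.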

Keywords: information extraction, search heuristics, semi-structured documents, web mining

Procedia PDF Downloads 335
25322 Environmental Impact Assessment in Mining Regions with Remote Sensing

Authors: Carla Palencia-Aguilar

Abstract:

Calculations of Net Carbon Balance can be obtained by means of Net Biome Productivity (NBP), Net Ecosystem Productivity (NEP), and Net Primary Production (NPP). The latter is an important component of the biosphere carbon cycle and is easily obtained from MODIS MOD17A3HGF data; however, the results are only available yearly. To overcome this limited data availability, bands 33 to 36 from MODIS MYD021KM (obtained on a daily basis) were analyzed and compared with NPP data from the years 2000 to 2021 in 7 sites where surface mining takes place in the Colombian territory. Coal, gold, iron, and limestone were the minerals of interest. Scales and units, as well as thermal anomalies, were considered for net carbon balance per location. The NPP time series from the satellite images were filtered by using two Matlab filters: First order and Discrete Transfer. After filtering the NPP time series, comparing the graph results from the satellite image values, and running a linear regression, the results showed R² from 0.72 to 0.85. To establish comparable units among NPP and bands 33 to 36, the Greenhouse Gas Equivalencies Calculator by EPA was used. The comparison was established in two ways: one by the sum of all the data per point per year and the other by the average of 46 weeks and finding the percentage that the value represented with respect to NPP. The former underestimated the total CO2 emissions. The results also showed that coal and gold mining in the last 22 years had lower CO2 emissions than limestone, with an average per year of 143 kton CO2 eq for gold, 152 kton CO2 eq for coal, and 287 kton CO2 eq for iron. Limestone emissions varied from 206 to 441 kton CO2 eq. The maximum emission values from unfiltered data correspond to 165 kton CO2 eq for gold, 188 kton CO2 eq for coal, and 310 kton CO2 eq for iron, with limestone varying from 231 to 490 kton CO2 eq. If the most polluting limestone site improves its production technology, limestone could have a maximum of 318 kton CO2 eq emissions per year, a value very similar to that of iron. The importance of gathering data is to establish benchmarks in order to attain the 2050 zero-emissions goal.
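
The filter-then-regress step can be sketched as below. The abstract names Matlab filters but gives no parameters, so this is only a generic Python illustration: a first-order recursive low-pass filter with an assumed smoothing factor, followed by the coefficient of determination of a simple linear regression. The function names and the toy series are hypothetical and are not the study's data.

```python
def first_order_filter(series, alpha=0.5):
    """First-order low-pass filter: y[n] = alpha*x[n] + (1 - alpha)*y[n-1]."""
    filtered = [series[0]]
    for x in series[1:]:
        filtered.append(alpha * x + (1 - alpha) * filtered[-1])
    return filtered


def r_squared(xs, ys):
    """R^2 of the least-squares line of ys on xs (inputs must not be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy ** 2) / (sxx * syy)


# Made-up yearly values standing in for an NPP series and a band-derived series.
npp = [10.0, 12.0, 9.0, 14.0, 13.0, 15.0]
band = [21.0, 23.5, 19.0, 27.0, 26.5, 31.0]

smoothed = first_order_filter(npp, alpha=0.5)
print(round(r_squared(smoothed, band), 3))
```

A smaller alpha smooths more aggressively, which removes short-lived spikes but also delays the response to real changes, so the chosen value affects the resulting R².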

Keywords: carbon dioxide, NPP, MODIS, mining

Procedia PDF Downloads 104
25321 Abandoned Mine Methane Mitigation in the United States

Authors: Jerome Blackman, Pamela Franklin, Volha Roshchanka

Abstract:

The US coal mining sector accounts for 6% of total US methane emissions (2021). Of US coal mining methane emissions, 60% come from active underground mine ventilation systems. Abandoned mines contribute about 13% of methane emissions from coal mining. While there are thousands of abandoned underground coal mines in the US, the Environmental Protection Agency (EPA) estimates that fewer than 100 have sufficient methane resources for viable methane recovery and use projects. Many abandoned mines are in remote areas far from potential energy customers and may be flooded, further complicating methane recovery. Because these mines are no longer active, recovery projects can be simpler to implement.

Keywords: abandoned mines, coal mine methane, coal mining, methane emissions, methane mitigation, recovery and use

Procedia PDF Downloads 78
25320 Presenting a Model for Predicting the State of Being Accident-Prone of Passages According to Neural Network and Spatial Data Analysis

Authors: Hamd Rezaeifar, Hamid Reza Sahriari

Abstract:

Accidents are considered to be one of the challenges of modern life. Because the number of accident victims and the volume of domestic transport in Iran are increasing day by day, studying the factors that affect accidents and identifying suitable models and parameters for this issue are absolutely essential. The main purpose of this research was to study the factors and spatial data affecting accidents in Mashhad during 2007-2008. In this paper, spatial layers were overlaid on one another and matched with the accident locations. In the first step, existing accident databases were completed by adding accident landmarks and special fields recording the presence or absence of phenomena that affect accidents. In the next step, data mining tools and neural network analysis were used to evaluate the relationships among these data and to design a logical model for predicting accident-prone spots with minimum error. The model in this article predicts low-accident spots very accurately, yet it makes more errors in accident-prone regions due to a lack of primary data.

Keywords: accident, data mining, neural network, GIS

Procedia PDF Downloads 47