Search results for: biological data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26722

Search results for: biological data mining

26362 A Hybrid Approach for Thread Recommendation in MOOC Forums

Authors: Ahmad. A. Kardan, Amir Narimani, Foozhan Ataiefard

Abstract:

Recommender Systems have been developed to provide contents and services compatible to users based on their behaviors and interests. Due to information overload in online discussion forums and users diverse interests, recommending relative topics and threads is considered to be helpful for improving the ease of forum usage. In order to lead learners to find relevant information in educational forums, recommendations are even more needed. We present a hybrid thread recommender system for MOOC forums by applying social network analysis and association rule mining techniques. Initial results indicate that the proposed recommender system performs comparatively well with regard to limited available data from users' previous posts in the forum.

Keywords: association rule mining, hybrid recommender system, massive open online courses, MOOCs, social network analysis

Procedia PDF Downloads 286
26361 DeepOmics: Deep Learning for Understanding Genome Functioning and the Underlying Genetic Causes of Disease

Authors: Vishnu Pratap Singh Kirar, Madhuri Saxena

Abstract:

Advancement in sequence data generation technologies is churning out voluminous omics data and posing a massive challenge to annotate the biological functional features. With so much data available, the use of machine learning methods and tools to make novel inferences has become obvious. Machine learning methods have been successfully applied to a lot of disciplines, including computational biology and bioinformatics. Researchers in computational biology are interested to develop novel machine learning frameworks to classify the huge amounts of biological data. In this proposal, it plan to employ novel machine learning approaches to aid the understanding of how apparently innocuous mutations (in intergenic DNA and at synonymous sites) cause diseases. We are also interested in discovering novel functional sites in the genome and mutations in which can affect a phenotype of interest.

Keywords: genome wide association studies (GWAS), next generation sequencing (NGS), deep learning, omics

Procedia PDF Downloads 87
26360 FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule

Authors: Lu Si, Jie Yu, Shasha Li, Jun Ma, Lei Luo, Qingbo Wu, Yongqi Ma, Zhengji Liu

Abstract:

Instance selection (IS) technique is used to reduce the data size to improve the performance of data mining methods. Recently, to process very large data set, several proposed methods divide the training set into some disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitation of these methods and give our viewpoint about how to divide and conquer in IS procedure. Then, based on fast condensed nearest neighbor (FCNN) rule, we propose a large data sets instance selection method with MapReduce framework. Besides ensuring the prediction accuracy and reduction rate, it has two desirable properties: First, it reduces the work load in the aggregation node; Second and most important, it produces the same result with the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.

Keywords: instance selection, data reduction, MapReduce, kNN

Procedia PDF Downloads 245
26359 Lead Removal From Ex- Mining Pond Water by Electrocoagulation: Kinetics, Isotherm, and Dynamic Studies

Authors: Kalu Uka Orji, Nasiman Sapari, Khamaruzaman W. Yusof

Abstract:

Exposure of galena (PbS), tealite (PbSnS2), and other associated minerals during mining activities release lead (Pb) and other heavy metals into the mining water through oxidation and dissolution. Heavy metal pollution has become an environmental challenge. Lead, for instance, can cause toxic effects to human health, including brain damage. Ex-mining pond water was reported to contain lead as high as 69.46 mg/L. Conventional treatment does not easily remove lead from water. A promising and emerging treatment technology for lead removal is the application of the electrocoagulation (EC) process. However, some of the problems associated with EC are systematic reactor design, selection of maximum EC operating parameters, scale-up, among others. This study investigated an EC process for the removal of lead from synthetic ex-mining pond water using a batch reactor and Fe electrodes. The effects of various operating parameters on lead removal efficiency were examined. The results obtained indicated that the maximum removal efficiency of 98.6% was achieved at an initial PH of 9, the current density of 15mA/cm2, electrode spacing of 0.3cm, treatment time of 60 minutes, Liquid Motion of Magnetic Stirring (LM-MS), and electrode arrangement = BP-S. The above experimental data were further modeled and optimized using a 2-Level 4-Factor Full Factorial design, a Response Surface Methodology (RSM). The four factors optimized were the current density, electrode spacing, electrode arrangements, and Liquid Motion Driving Mode (LM). Based on the regression model and the analysis of variance (ANOVA) at 0.01%, the results showed that an increase in current density and LM-MS increased the removal efficiency while the reverse was the case for electrode spacing. The model predicted the optimal lead removal efficiency of 99.962% with an electrode spacing of 0.38 cm alongside others. Applying the predicted parameters, the lead removal efficiency of 100% was actualized. The electrode and energy consumptions were 0.192kg/m3 and 2.56 kWh/m3 respectively. Meanwhile, the adsorption kinetic studies indicated that the overall lead adsorption system belongs to the pseudo-second-order kinetic model. The adsorption dynamics were also random, spontaneous, and endothermic. The higher temperature of the process enhances adsorption capacity. Furthermore, the adsorption isotherm fitted the Freundlish model more than the Langmuir model; describing the adsorption on a heterogeneous surface and showed good adsorption efficiency by the Fe electrodes. Adsorption of Pb2+ onto the Fe electrodes was a complex reaction, involving more than one mechanism. The overall results proved that EC is an efficient technique for lead removal from synthetic mining pond water. The findings of this study would have application in the scale-up of EC reactor and in the design of water treatment plants for feed-water sources that contain lead using the electrocoagulation method.

Keywords: ex-mining water, electrocoagulation, lead, adsorption kinetics

Procedia PDF Downloads 140
26358 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 177
26357 Investigation of Yard Seam Workings for the Proposed Newcastle Light Rail Project

Authors: David L. Knott, Robert Kingsland, Alistair Hitchon

Abstract:

The proposed Newcastle Light Rail is a key part of the revitalisation of Newcastle, NSW and will provide a frequent and reliable travel option throughout the city centre, running from Newcastle Interchange at Wickham to Pacific Park in Newcastle East, a total of 2.7 kilometers in length. Approximately one-third of the route, along Hunter and Scott Streets, is subject to potential shallow underground mine workings. The extent of mining and seams mined is unclear. Convicts mined the Yard Seam and overlying Dudley (Dirty) Seam in Newcastle sometime between 1800 and 1830. The Australian Agricultural Company mined the Yard Seam from about 1831 to the 1860s in the alignment area. The Yard Seam was about 3 feet (0.9m) thick, and therefore, known as the Yard Seam. Mine maps do not exist for the workings in the area of interest and it was unclear if both or just one seam was mined. Information from 1830s geological mapping and other data showing shaft locations were used along Scott Street and information from the 1908 Royal Commission was used along Hunter Street to develop an investigation program. In addition, mining was encountered for several sites to the south of the alignment at depths of about 7 m to 25 m. Based on the anticipated depths of mining, it was considered prudent to assess the potential for sinkhole development on the proposed alignment and realigned underground utilities and to obtain approval for the work from Subsidence Advisory NSW (SA NSW). The assessment consisted of a desktop study, followed by a subsurface investigation. Four boreholes were drilled along Scott Street and three boreholes were drilled along Hunter Street using HQ coring techniques in the rock. The placement of boreholes was complicated by the presence of utilities in the roadway and traffic constraints. All the boreholes encountered the Yard Seam, with conditions varying from unmined coal to an open void, indicating the presence of mining. The geotechnical information obtained from the boreholes was expanded by using various downhole techniques including; borehole camera, borehole sonar, and downhole geophysical logging. The camera provided views of the rock and helped to explain zones of no recovery. In addition, timber props within the void were observed. Borehole sonar was performed in the void and provided an indication of room size as well as the presence of timber props within the room. Downhole geophysical logging was performed in the boreholes to measure density, natural gamma, and borehole deviation. The data helped confirm that all the mining was in the Yard Seam and that the overlying Dudley Seam had been eroded in the past over much of the alignment. In summary, the assessment allowed the potential for sinkhole subsidence to be assessed and a mitigation approach developed to allow conditional approval by SA NSW. It also confirmed the presence of mining in the Yard Seam, the depth to the seam and mining conditions, and indicated that subsidence did not appear to have occurred in the past.

Keywords: downhole investigation techniques, drilling, mine subsidence, yard seam

Procedia PDF Downloads 305
26356 Feature Selection for Production Schedule Optimization in Transition Mines

Authors: Angelina Anani, Ignacio Ortiz Flores, Haitao Li

Abstract:

The use of underground mining methods have increased significantly over the past decades. This increase has also been spared on by several mines transitioning from surface to underground mining. However, determining the transition depth can be a challenging task, especially when coupled with production schedule optimization. Several researchers have simplified the problem by excluding operational features relevant to production schedule optimization. Our research objective is to investigate the extent to which operational features of transition mines accounted for affect the optimal production schedule. We also provide a framework for factors to consider in production schedule optimization for transition mines. An integrated mixed-integer linear programming (MILP) model is developed that maximizes the NPV as a function of production schedule and transition depth. A case study is performed to validate the model, with a comparative sensitivity analysis to obtain operational insights.

Keywords: underground mining, transition mines, mixed-integer linear programming, production schedule

Procedia PDF Downloads 158
26355 Effect of Bacillus Pumilus Strains on Heavy Metal Accumulation in Lettuce Grown on Contaminated Soil

Authors: Sabeen Alam, Mehboob Alam

Abstract:

The research work entitled “Effect of Bacillus pumilus strains on heavy metal accumulation in lettuce grown on contaminated soil” focused on functional role of Bacillus pumilus strains inoculated with lettuce seed in mitigating heavy metal in chromite mining soil. In this experiment, factor A was three Bacillus pumilus strains (sequence C-2PMW-8, C-1 SSK-8 and C-1 PWK-7) while soil used for this experiment was collected from Prang Ghar mining site and lettuce seeds were grown in three levels of chromite mining soil (2.27, 4.65 and 7.14 %). For mining soil minimum days to germinate noted in lettuce grown on garden soil inoculated with sequence. Maximum germination percentage noted was for C-1 SSK-8 grown on garden soil, maximum lettuce height for sequence C-2 PWM-8, fresh leaf weight for C-1 PWK-7 inoculated lettuce, dry weight of lettuce leaf for lettuce inoculated with C-1 SSK-8 and C-1 PWK-7 strains, number of leaves per plant for lettuce inoculated with C-1 SSK-8, leaf area for C-2 PMW-8 inoculated lettuce, survival percentage for C-1 SSK-8 treated lettuce and chlorophyll content for C-2 PMW-8. Results related to heavy metals accumulation showed that minimum chromium was in lettuce and in soil for all three sequences, cadmium (Cd) in lettuce and in soil for all three sequences, manganese (Mn) in lettuce and in soil for three sequences, lead (Pb) in lettuce and in soil for three sequences. It can be concluded that chromite mining soil significantly reduced the growth and survival of lettuce, but when lettuce was inoculated with Bacillus.pumilus strains, it enhances growth and survival. Similarly, minimum heavy metal accumulation in plant and soil, regardless of type of Bacillus pumilus used, all three sequences has same mitigating effect on heavy metal in both soil and lettuce. All the three Bacillus pumilus strains ensured reduction in heavy metals content (Mn, Cd, Cr) in lettuce, below the maximum permissible limits of WHO 2011.

Keywords: bacillus pumilus, heavy metals, permissible limits, lettuce, chromite mining soil, mitigating effect

Procedia PDF Downloads 45
26354 Mining User-Generated Contents to Detect Service Failures with Topic Model

Authors: Kyung Bae Park, Sung Ho Ha

Abstract:

Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and a pressing need to handle the overwhelmingly plethora amount of various UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a certain management suffers from. In this paper, we employ text mining of Latent Dirichlet Allocation (LDA) on a popular online review site dedicated to complaint from users. We find that the employed LDA efficiently detects customer complaints, and a further inspection with the visualization technique is effective to categorize the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights by applying machine learning techniques in marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R program from a beginning (data collection in R) to an end (LDA analysis in R) since the instruction is still largely undocumented. In this regard, it will help lower the boundary for interdisciplinary researcher to conduct related research.

Keywords: latent dirichlet allocation, R program, text mining, topic model, user generated contents, visualization

Procedia PDF Downloads 182
26353 The Human Right to a Safe, Clean and Healthy Environment in Corporate Social Responsibility's Strategies: An Approach to Understanding Mexico's Mining Sector

Authors: Thalia Viveros-Uehara

Abstract:

The virtues of Corporate Social Responsibility (CSR) are explored widely in the academic literature. However, few studies address its link to human rights, per se; specifically, the right to a safe, clean and healthy environment. Fewer still are the research works in this area that relate to developing countries, where a number of areas are biodiversity hotspots. In Mexico, despite the rise and evolution of CSR schemes, grave episodes of pollution persist, especially those caused by the mining industry. These cases set up the question of the correspondence between the current CSR practices of mining companies in the country and their responsibility to respect the right to a safe, clean and healthy environment. The present study approaches precisely such a bridge, which until now has not been fully tackled in light of Mexico's 2011 constitutional human rights amendment and the United Nation's Guiding Principles on Business and Human Rights (UN Guiding Principles), adopted by the Human Rights Council in 2011. To that aim, it initially presents a contextual framework; it then explores qualitatively the adoption of human rights’ language in the CSR strategies of the three main mining companies in Mexico, and finally, it examines their standing with respect to the UN Guiding Principles. The results reveal that human rights are included in the RSE strategies of the analysed businesses, at least at the rhetoric level; however, they do not embrace the right to a safe, clean and healthy environment as such. Moreover, we conclude that despite the finding that corporations publicly express their commitment to respect human rights, some operational weaknesses that hamper the exercise of such responsibility persist; for example, the systematic lack of human rights impact assessments per mining unit, the denial of actual and publicly-known negative episodes on the environment linked directly to their operations, and the absence of effective mechanisms to remediate adverse impacts.

Keywords: corporate social responsibility, environmental impacts, human rights, right to a safe, clean and healthy environment, mining industry

Procedia PDF Downloads 323
26352 Investigating Dynamic Transition Process of Issues Using Unstructured Text Analysis

Authors: Myungsu Lim, William Xiu Shun Wong, Yoonjin Hyun, Chen Liu, Seongi Choi, Dasom Kim, Namgyu Kim

Abstract:

The amount of real-time data generated through various mass media has been increasing rapidly. In this study, we had performed topic analysis by using the unstructured text data that is distributed through news article. As one of the most prevalent applications of topic analysis, the issue tracking technique investigates the changes of the social issues that identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of documents that cover an entire period at the same time and analyzing the occurrence of each topic by the period of occurrence. However, this traditional issue tracking approach has limitation that it cannot discover dynamic mutation process of complex social issues. The purpose of this study is to overcome the limitations of the existing issue tracking method. We first derived core issues of each period, and then discover the dynamic mutation process of various issues. In this study, we further analyze the mutation process from the perspective of the issues categories, in order to figure out the pattern of issue flow, including the frequency and reliability of the pattern. In other words, this study allows us to understand the components of the complex issues by tracking the dynamic history of issues. This methodology can facilitate a clearer understanding of complex social phenomena by providing mutation history and related category information of the phenomena.

Keywords: Data Mining, Issue Tracking, Text Mining, topic Analysis, topic Detection, Trend Detection

Procedia PDF Downloads 393
26351 A Hybrid Recommendation System Based on Association Rules

Authors: Ahmed Mohammed Alsalama

Abstract:

Recommendation systems are widely used in e-commerce applications. The engine of a current recommendation system recommends items to a particular user based on user preferences and previous high ratings. Various recommendation schemes such as collaborative filtering and content-based approaches are used to build a recommendation system. Most of the current recommendation systems were developed to fit a certain domain such as books, articles, and movies. We propose a hybrid framework recommendation system to be applied on two-dimensional spaces (User x Item) with a large number of Users and a small number of Items. Moreover, our proposed framework makes use of both favorite and non-favorite items of a particular user. The proposed framework is built upon the integration of association rules mining and the content-based approach. The results of experiments show that our proposed framework can provide accurate recommendations to users.

Keywords: data mining, association rules, recommendation systems, hybrid systems

Procedia PDF Downloads 448
26350 Applying Big Data Analysis to Efficiently Exploit the Vast Unconventional Tight Oil Reserves

Authors: Shengnan Chen, Shuhua Wang

Abstract:

Successful production of hydrocarbon from unconventional tight oil reserves has changed the energy landscape in North America. The oil contained within these reservoirs typically will not flow to the wellbore at economic rates without assistance from advanced horizontal well and multi-stage hydraulic fracturing. Efficient and economic development of these reserves is a priority of society, government, and industry, especially under the current low oil prices. Meanwhile, society needs technological and process innovations to enhance oil recovery while concurrently reducing environmental impacts. Recently, big data analysis and artificial intelligence become very popular, developing data-driven insights for better designs and decisions in various engineering disciplines. However, the application of data mining in petroleum engineering is still in its infancy. The objective of this research aims to apply intelligent data analysis and data-driven models to exploit unconventional oil reserves both efficiently and economically. More specifically, a comprehensive database including the reservoir geological data, reservoir geophysical data, well completion data and production data for thousands of wells is firstly established to discover the valuable insights and knowledge related to tight oil reserves development. Several data analysis methods are introduced to analysis such a huge dataset. For example, K-means clustering is used to partition all observations into clusters; principle component analysis is applied to emphasize the variation and bring out strong patterns in the dataset, making the big data easy to explore and visualize; exploratory factor analysis (EFA) is used to identify the complex interrelationships between well completion data and well production data. Different data mining techniques, such as artificial neural network, fuzzy logic, and machine learning technique are then summarized, and appropriate ones are selected to analyze the database based on the prediction accuracy, model robustness, and reproducibility. Advanced knowledge and patterned are finally recognized and integrated into a modified self-adaptive differential evolution optimization workflow to enhance the oil recovery and maximize the net present value (NPV) of the unconventional oil resources. This research will advance the knowledge in the development of unconventional oil reserves and bridge the gap between the big data and performance optimizations in these formations. The newly developed data-driven optimization workflow is a powerful approach to guide field operation, which leads to better designs, higher oil recovery and economic return of future wells in the unconventional oil reserves.

Keywords: big data, artificial intelligence, enhance oil recovery, unconventional oil reserves

Procedia PDF Downloads 275
26349 Discovering the Effects of Meteorological Variables on the Air Quality of Bogota, Colombia, by Data Mining Techniques

Authors: Fabiana Franceschi, Martha Cobo, Manuel Figueredo

Abstract:

Bogotá, the capital of Colombia, is its largest city and one of the most polluted in Latin America due to the fast economic growth over the last ten years. Bogotá has been affected by high pollution events which led to the high concentration of PM10 and NO2, exceeding the local 24-hour legal limits (100 and 150 g/m3 each). The most important pollutants in the city are PM10 and PM2.5 (which are associated with respiratory and cardiovascular problems) and it is known that their concentrations in the atmosphere depend on the local meteorological factors. Therefore, it is necessary to establish a relationship between the meteorological variables and the concentrations of the atmospheric pollutants such as PM10, PM2.5, CO, SO2, NO2 and O3. This study aims to determine the interrelations between meteorological variables and air pollutants in Bogotá, using data mining techniques. Data from 13 monitoring stations were collected from the Bogotá Air Quality Monitoring Network within the period 2010-2015. The Principal Component Analysis (PCA) algorithm was applied to obtain primary relations between all the parameters, and afterwards, the K-means clustering technique was implemented to corroborate those relations found previously and to find patterns in the data. PCA was also used on a per shift basis (morning, afternoon, night and early morning) to validate possible variation of the previous trends and a per year basis to verify that the identified trends have remained throughout the study time. Results demonstrated that wind speed, wind direction, temperature, and NO2 are the most influencing factors on PM10 concentrations. Furthermore, it was confirmed that high humidity episodes increased PM2,5 levels. It was also found that there are direct proportional relationships between O3 levels and wind speed and radiation, while there is an inverse relationship between O3 levels and humidity. Concentrations of SO2 increases with the presence of PM10 and decreases with the wind speed and wind direction. They proved as well that there is a decreasing trend of pollutant concentrations over the last five years. Also, in rainy periods (March-June and September-December) some trends regarding precipitations were stronger. Results obtained with K-means demonstrated that it was possible to find patterns on the data, and they also showed similar conditions and data distribution among Carvajal, Tunal and Puente Aranda stations, and also between Parque Simon Bolivar and las Ferias. It was verified that the aforementioned trends prevailed during the study period by applying the same technique per year. It was concluded that PCA algorithm is useful to establish preliminary relationships among variables, and K-means clustering to find patterns in the data and understanding its distribution. The discovery of patterns in the data allows using these clusters as an input to an Artificial Neural Network prediction model.

Keywords: air pollution, air quality modelling, data mining, particulate matter

Procedia PDF Downloads 250
26348 A New Approach for Improving Accuracy of Multi Label Stream Data

Authors: Kunal Shah, Swati Patel

Abstract:

Many real world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. Classification is used to predict class of unseen instance as accurate as possible. Multi label classification is a variant of single label classification where set of labels associated with single instance. Multi label classification is used by modern applications, such as text classification, functional genomics, image classification, music categorization etc. This paper introduces the task of multi-label classification, methods for multi-label classification and evolution measure for multi-label classification. Also, comparative analysis of multi label classification methods on the basis of theoretical study, and then on the basis of simulation was done on various data sets.

Keywords: binary relevance, concept drift, data stream mining, MLSC, multiple window with buffer

Procedia PDF Downloads 576
26347 Fuzzy Logic Classification Approach for Exponential Data Set in Health Care System for Predication of Future Data

Authors: Manish Pandey, Gurinderjit Kaur, Meenu Talwar, Sachin Chauhan, Jagbir Gill

Abstract:

Health-care management systems are a unit of nice connection as a result of the supply a straightforward and fast management of all aspects relating to a patient, not essentially medical. What is more, there are unit additional and additional cases of pathologies during which diagnosing and treatment may be solely allotted by victimization medical imaging techniques. With associate ever-increasing prevalence, medical pictures area unit directly acquired in or regenerate into digital type, for his or her storage additionally as sequent retrieval and process. Data Mining is the process of extracting information from large data sets through using algorithms and Techniques drawn from the field of Statistics, Machine Learning and Data Base Management Systems. Forecasting may be a prediction of what's going to occur within the future, associated it's an unsure method. Owing to the uncertainty, the accuracy of a forecast is as vital because the outcome foretold by foretelling the freelance variables. A forecast management should be wont to establish if the accuracy of the forecast is within satisfactory limits. Fuzzy regression strategies have normally been wont to develop shopper preferences models that correlate the engineering characteristics with shopper preferences relating to a replacement product; the patron preference models offer a platform, wherever by product developers will decide the engineering characteristics so as to satisfy shopper preferences before developing the merchandise. Recent analysis shows that these fuzzy regression strategies area units normally will not to model client preferences. We tend to propose a Testing the strength of Exponential Regression Model over regression toward the mean Model.

Keywords: health-care management systems, fuzzy regression, data mining, forecasting, fuzzy membership function

Procedia PDF Downloads 269
26346 Mining in Peru and Local Governance: Assessing the Contribution of CRS Projects

Authors: Sandra Carrillo Hoyos

Abstract:

Mining activities in South America have significantly grown during the last decades, given the abundance of natural resources, the implemented governmental policies to incentivize foreign investment as well as the boom in international prices for metals and oil between 2002 and 2008. While this context allowed the region to occupy a leading position between the top producers of minerals around the world, it has also meant an increase in socio-environmental conflicts which have generated costs and negative impacts not only for the companies but especially for the governments and local communities.During the latest decade, the mining sector in Peru has faced with the social resistance of a large number of communities, which began organizing actions against the implementation of high investing projects. The dissatisfaction has derived in the prevalence of socio-environmental conflicts associated with mining activities, some of them never solved into an agreement. In order to prevent those socio-environmental conflicts and obtain the social license from local communities, most of the mining companies have developed diverse initiatives within the framework of policies and practices of corporate social responsibility (CSR). This paper has assessed the mining sector’s contribution toward the local development management along the last decade, as part of CSR strategies as well as the policies promoted by the Peruvian State. This assessment found that, in the beginning, these initiatives have been based on a philanthropic approach and were reacting to pressures from local stakeholders to maintain the consent to operate from the surrounding communities as well as to create, as a result, a harmonious atmosphere for operations. Due to the weak State presence, such practices have increased the expectations of communities related to the participation of mining companies in solving structural development problems, especially those related to primary needs, infrastructure, education, health, among others. In other words, this paper was focused on analyze in what extent these initiatives have promoted local empowerment for development planning and integrated management of natural resources from a territorial approach. From this perspective, the analysis demonstrates that, while the design and planning of social investment initiatives have improved due to the sector´s sustainability approach, many companies have developed actions beyond their competence during this process. In some cases, the referenced actions have generated dependency with communities, even though this relationship has not exempted the companies of conflict situations with unfortunate consequences. Furthermore, the social programs developed have not necessarily generated a significant impact in improving the quality of life of affected populations. In fact, it is possible to identify that those regions with high mining resources and investment are facing with a situation of poverty and high dependency on mining production. In spite of the revenues derived from mining industry, local governments have not been able to translate the royalties into sustainable development opportunities. For this reason, the proposed paper suggests some challenges for the mining sector contribution to local development based on the best practices and lessons learnt from a benchmarking for the leading mining companies.

Keywords: corporate social responsibility, local development, mining, socio-environmental conflict

Procedia PDF Downloads 392
26345 Lead and Cadmium Spatial Pattern and Risk Assessment around Coal Mine in Hyrcanian Forest, North Iran

Authors: Mahsa Tavakoli, Seyed Mohammad Hojjati, Yahya Kooch

Abstract:

In this study, the effect of coal mining activities on lead and cadmium concentrations and distribution in soil was investigated in Hyrcanian forest, North Iran. 16 plots (20×20 m2) were established by systematic-randomly (60×60 m2) in an area of 4 ha (200×200 m2-mine entrance placed at center). An area adjacent to the mine was not affected by the mining activity; considered as the controlled area. In order to investigate soil lead and cadmium concentration, one sample was taken from the 0-10 cm in each plot. To study the spatial pattern of soil properties and lead and cadmium concentrations in the mining area, an area of 80×80m2 (the mine as the center) was considered and 80 soil samples were systematic-randomly taken (10 m intervals). Geostatistical analysis was performed via Kriging method and GS+ software (version 5.1). In order to estimate the impact of coal mining activities on soil quality, pollution index was measured. Lead and cadmium concentrations were significantly higher in mine area (Pb: 10.97±0.30, Cd: 184.47±6.26 mg.kg-1) in comparison to control area (Pb: 9.42±0.17, Cd: 131.71±15.77 mg.kg-1). The mean values of the PI index indicate that Pb (1.16) and Cd (1.77) presented slightly polluted. Results of the NIPI index showed that Pb (1.44) and Cd (2.52) presented slight pollution and moderate pollution respectively. Results of variography and kriging method showed that it is possible to prepare interpolation maps of lead and cadmium around the mining areas in Hyrcanian forest. According to results of pollution and risk assessments, forest soil was contaminated by heavy metals (lead and cadmium); therefore, using reclamation and remediation techniques in these areas is necessary.

Keywords: traditional coal mining, heavy metals, pollution indicators, geostatistics, Caspian forest

Procedia PDF Downloads 170
26344 RA-Apriori: An Efficient and Faster MapReduce-Based Algorithm for Frequent Itemset Mining on Apache Flink

Authors: Sanjay Rathee, Arti Kashyap

Abstract:

Extraction of useful information from large datasets is one of the most important research problems. Association rule mining is one of the best methods for this purpose. Finding possible associations between items in large transaction based datasets (finding frequent patterns) is most important part of the association rule mining. There exist many algorithms to find frequent patterns but Apriori algorithm always remains a preferred choice due to its ease of implementation and natural tendency to be parallelized. Many single-machine based Apriori variants exist but massive amount of data available these days is above capacity of a single machine. Therefore, to meet the demands of this ever-growing huge data, there is a need of multiple machines based Apriori algorithm. For these types of distributed applications, MapReduce is a popular fault-tolerant framework. Hadoop is one of the best open-source software frameworks with MapReduce approach for distributed storage and distributed processing of huge datasets using clusters built from commodity hardware. However, heavy disk I/O operation at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms are being developed for parallel computing in recent years. Among them, two platforms, namely, Spark and Flink have attracted a lot of attention because of their inbuilt support to distributed computations. Earlier we proposed a reduced- Apriori algorithm on Spark platform which outperforms parallel Apriori, one because of use of Spark and secondly because of the improvement we proposed in standard Apriori. Therefore, this work is a natural sequel of our work and targets on implementing, testing and benchmarking Apriori and Reduced-Apriori and our new algorithm ReducedAll-Apriori on Apache Flink and compares it with Spark implementation. Flink, a streaming dataflow engine, overcomes disk I/O bottlenecks in MapReduce, providing an ideal platform for distributed Apriori. Flink's pipelining based structure allows starting a next iteration as soon as partial results of earlier iteration are available. Therefore, there is no need to wait for all reducers result to start a next iteration. We conduct in-depth experiments to gain insight into the effectiveness, efficiency and scalability of the Apriori and RA-Apriori algorithm on Flink.

Keywords: apriori, apache flink, Mapreduce, spark, Hadoop, R-Apriori, frequent itemset mining

Procedia PDF Downloads 282
26343 Strategies to Enhance Compliance of Health and Safety Standards at the Selected Mining Industries in Limpopo Province, South Africa: Occupational Health Nurse’s Perspective

Authors: Livhuwani Muthelo

Abstract:

The health and safety of the miners in the South African mining industry are guided by the regulations and standards which are anticipated to promote a healthy work environment and fatalities. It is of utmost importance for the miners to comply with these regulations/standards to protect themselves from potential occupational health and safety risks, accidents, and fatalities. The purpose of this study was to develop and validate strategies to enhance compliance with the Health and safety standards within the mining industries of Limpopo province in South Africa. A mixed-method exploratory sequential research design was adopted. The population consisted of 5350 miners. Purposive sampling was used to select the participants in the qualitative strand and stratified random sampling in the quantitative strand. Semi-structured interviews were conducted among the occupational health nurse practitioners and the health and safety team. Thematic analysis was used to generate an understanding of the interviews. In the quantitative strand, a survey was conducted using a self-administered questionnaire. Data were analysed using SPSS version 26.0. A descriptive statistical test was used in the analysis of data including frequencies, means, and standard deviation. Cronbach's alpha test was used to measure internal consistency. The integrated results revealed that there are diverse experiences related to health and safety standards compliance among the mineworkers. The main findings were challenges related to leadership compliance and also related to the cost of maintaining safety, Miner's behavior-related challenges; the impact of non-compliance on the overall health of the miners was also described, the conflict between production and safety. Health and safety compliance is not just mere compliance with regulations and standards but a culture that warrants the miners and organization to take responsibility for their behavior and actions towards health and safety. Thus taking responsibility for your well-being and other miners.

Keywords: perceptions, compliance, health and safety, legislation, standards, miners

Procedia PDF Downloads 92
26342 Collision Theory Based Sentiment Detection Using Discourse Analysis in Hadoop

Authors: Anuta Mukherjee, Saswati Mukherjee

Abstract:

Data is growing everyday. Social networking sites such as Twitter are becoming an integral part of our daily lives, contributing a large increase in the growth of data. It is a rich source especially for sentiment detection or mining since people often express honest opinion through tweets. However, although sentiment analysis is a well-researched topic in text, this analysis using Twitter data poses additional challenges since these are unstructured data with abbreviations and without a strict grammatical correctness. We have employed collision theory to achieve sentiment analysis in Twitter data. We have also incorporated discourse analysis in the collision theory based model to detect accurate sentiment from tweets. We have also used the retweet field to assign weights to certain tweets and obtained the overall weightage of a topic provided in the form of a query. Hadoop has been exploited for speed. Our experiments show effective results.

Keywords: sentiment analysis, twitter, collision theory, discourse analysis

Procedia PDF Downloads 520
26341 Directional Dust Deposition Measurements: The Influence of Seasonal Changes and the Meteorological Conditions Influencing in Witbank Area and Carletonville Area

Authors: Maphuti Georgina Kwata

Abstract:

Coal mining in Mpumalanga Province is known of contributing to the atmospheric pollution from various activities. Gold mining in North-West Province is known of also contributing to the atmospheric pollution especially with the production of radon gas. In this research directional dust deposition gauge was used to measure source of direction and meteorological data was used to determine the wind rose blowing and the influence of the seasonal changes. Fourteen months of dust collection was undertaken in Witbank Area and Carletonville Area. The results shows that the sources of direction for Ericson Dam its East in February 2010 and Tip Area shows that the source of direction its West in October 2010. In the East direction there were mining operations, power stations which contributed to the East to be the sources of direction. In the West direction there were smelters, power stations and agricultural activities which contributed for the source of direction to be the West direction for Driefontein Mine: East Recreational Village Club. The East of Leslie Williams hospital is the source of direction which also indicated that there dust generating activities such as mining operation, agricultural activities. The meteorological results for Emalahleni Area in summer and winter the wind rose blow with wind speed of 5-10 ms-1 from the East sector. Annual average for the wind rose blow its East South eastern sector with 20 ms-1 and day time the wind rose from northwestern sector with excess of 20 ms-1. The night time wind direction East-eastern direction with a maximum wind speed of 20 ms-1. The meteorogical results for Driefontein Mine show that North-western sector and north-eastern sector wind rose is blowing with 5-10 ms-1 win speed. Day time wind blows from the West sector and night time wind blows from the north sector. In summer the wind blows North-east sector with 5-10 ms-1 and winter wind blows from North-west and it’s also predominant. In spring wind blows from north-east. The conclusion is that not only mining operation where the directional dust deposit gauge were installed contributed to the source of direction also the power stations, smelters, and other activities nearby the mining operation contributed. The recommendations are the dust suppressant for unpaved roads should be used on a regular basis and there should be monitoring of the weather conditions (the wind speed and direction prior to blasting to ensure minimal emissions).

Keywords: directional dust deposition gauge, BS part 5 1747 dust deposit gauge, wind rose, wind blowing

Procedia PDF Downloads 498
26340 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects.  Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse

Procedia PDF Downloads 396
26339 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines

Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma

Abstract:

Useful information has been extracted from the road accident data in United Kingdom (UK), using data analytics method, for avoiding possible accidents in rural and urban areas. This analysis make use of several methodologies such as data integration, support vector machines (SVM), correlation machines and multinomial goodness. The entire datasets have been imported from the traffic department of UK with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn avoid unnecessary memory lapses. Since data is expected to grow continuously over a period of time, this work primarily proposes a new framework model which can be trained and adapt itself to new data and make accurate predictions. This work also throws some light on use of SVM’s methodology for text classifiers from the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of SVMs methodology appropriate for this kind of research work.

Keywords: support vector mechanism (SVM), machine learning (ML), support vector machines (SVM), department of transportation (DFT)

Procedia PDF Downloads 261
26338 Recommender System Based on Mining Graph Databases for Data-Intensive Applications

Authors: Mostafa Gamal, Hoda K. Mohamed, Islam El-Maddah, Ali Hamdi

Abstract:

In recent years, many digital documents on the web have been created due to the rapid growth of ’social applications’ communities or ’Data-intensive applications’. The evolution of online-based multimedia data poses new challenges in storing and querying large amounts of data for online recommender systems. Graph data models have been shown to be more efficient than relational data models for processing complex data. This paper will explain the key differences between graph and relational databases, their strengths and weaknesses, and why using graph databases is the best technology for building a realtime recommendation system. Also, The paper will discuss several similarity metrics algorithms that can be used to compute a similarity score of pairs of nodes based on their neighbourhoods or their properties. Finally, the paper will discover how NLP strategies offer the premise to improve the accuracy and coverage of realtime recommendations by extracting the information from the stored unstructured knowledge, which makes up the bulk of the world’s data to enrich the graph database with this information. As the size and number of data items are increasing rapidly, the proposed system should meet current and future needs.

Keywords: graph databases, NLP, recommendation systems, similarity metrics

Procedia PDF Downloads 96
26337 Shark Detection and Classification with Deep Learning

Authors: Jeremy Jenrette, Z. Y. C. Liu, Pranav Chimote, Edward Fox, Trevor Hastie, Francesco Ferretti

Abstract:

Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution coming from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of shark images by sourcing 24,546 images covering 219 species of sharks from the web application spark pulse and the social network Instagram. We used object detection to extract shark features and inflate this database to 53,345 images. We packaged object-detection and image classification models into a Shark Detector bundle. We developed the Shark Detector to recognize and classify sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches of sharks: boosting training datasets, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a result of training data quantity. The Shark Detector located sharks in baited remote footage and YouTube videos with an average accuracy of 89\%, and classified located subjects to the species level with 69\% accuracy (n =\ eight species). The Shark Detector sorted heterogeneous datasets of images sourced from Instagram with 91\% accuracy and classified species with 70\% accuracy (n =\ 17 species). Data-mining Instagram can inflate training datasets and increase the Shark Detector’s accuracy as well as facilitate archiving of historical and novel shark observations. Base accuracy of genus prediction was 68\% across 25 genera. The average base accuracy of species prediction within each genus class was 85\%. The Shark Detector can classify 45 species. All data-generation methods were processed without manual interaction. As media-based remote monitoring strives to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.

Keywords: classification, data mining, Instagram, remote monitoring, sharks

Procedia PDF Downloads 105
26336 Research of the Three-Dimensional Visualization Geological Modeling of Mine Based on Surpac

Authors: Honggang Qu, Yong Xu, Rongmei Liu, Zhenji Gao, Bin Wang

Abstract:

Today's mining industry is advancing gradually toward digital and visual direction. The three-dimensional visualization geological modeling of mine is the digital characterization of mineral deposits and is one of the key technology of digital mining. Three-dimensional geological modeling is a technology that combines geological spatial information management, geological interpretation, geological spatial analysis and prediction, geostatistical analysis, entity content analysis and graphic visualization in a three-dimensional environment with computer technology and is used in geological analysis. In this paper, the three-dimensional geological modeling of an iron mine through the use of Surpac is constructed, and the weight difference of the estimation methods between the distance power inverse ratio method and ordinary kriging is studied, and the ore body volume and reserves are simulated and calculated by using these two methods. Compared with the actual mine reserves, its result is relatively accurate, so it provides scientific bases for mine resource assessment, reserve calculation, mining design and so on.

Keywords: three-dimensional geological modeling, geological database, geostatistics, block model

Procedia PDF Downloads 70
26335 The Investigation of Enzymatic Activity in the Soils Under the Impact of Metallurgical Industrial Activity in Lori Marz, Armenia

Authors: T. H. Derdzyan, K. A. Ghazaryan, G. A. Gevorgyan

Abstract:

Beta-glucosidase, chitinase, leucine-aminopeptidase, acid phosphomonoestearse and acetate-esterase enzyme activities in the soils under the impact of metallurgical industrial activity in Lori marz (district) were investigated. The results of the study showed that the activities of the investigated enzymes in the soils decreased with increasing distance from the Shamlugh copper mine, the Chochkan tailings storage facility and the ore transportation road. Statistical analysis revealed that the activities of the enzymes were positively correlated (significant) to each other according to the observation sites which indicated that enzyme activities were affected by the same anthropogenic factor. The investigations showed that the soils were polluted with heavy metals (Cu, Pb, As, Co, Ni, Zn) due to copper mining activity in this territory. The results of Pearson correlation analysis revealed a significant negative correlation between heavy metal pollution degree (Nemerow integrated pollution index) and soil enzyme activity. All of this indicated that copper mining activity in this territory causing the heavy metal pollution of the soils resulted in the inhabitation of the activities of the enzymes which are considered as biological catalysts to decompose organic materials and facilitate the cycling of nutrients.

Keywords: Armenia, metallurgical industrial activity, heavy metal pollutionl, soil enzyme activity

Procedia PDF Downloads 286
26334 Text Mining Past Medical History in Electrophysiological Studies

Authors: Roni Ramon-Gonen, Amir Dori, Shahar Shelly

Abstract:

Background and objectives: Healthcare professionals produce abundant textual information in their daily clinical practice. The extraction of insights from all the gathered information, mainly unstructured and lacking in normalization, is one of the major challenges in computational medicine. In this respect, text mining assembles different techniques to derive valuable insights from unstructured textual data, so it has led to being especially relevant in Medicine. Neurological patient’s history allows the clinician to define the patient’s symptoms and along with the result of the nerve conduction study (NCS) and electromyography (EMG) test, assists in formulating a differential diagnosis. Past medical history (PMH) helps to direct the latter. In this study, we aimed to identify relevant PMH, understand which PMHs are common among patients in the referral cohort and documented by the medical staff, and examine the differences by sex and age in a large cohort based on textual format notes. Methods: We retrospectively identified all patients with abnormal NCS between May 2016 to February 2022. Age, gender, and all NCS attributes reports were recorded, including the summary text. All patients’ histories were extracted from the text report by a query. Basic text cleansing and data preparation were performed, as well as lemmatization. Very popular words (like ‘left’ and ‘right’) were deleted. Several words were replaced with their abbreviations. A bag of words approach was used to perform the analyses. Different visualizations which are common in text analysis, were created to easily grasp the results. Results: We identified 5282 unique patients. Three thousand and five (57%) patients had documented PMH. Of which 60.4% (n=1817) were males. The total median age was 62 years (range 0.12 – 97.2 years), and the majority of patients (83%) presented after the age of forty years. The top two documented medical histories were diabetes mellitus (DM) and surgery. DM was observed in 16.3% of the patients, and surgery at 15.4%. Other frequent patient histories (among the top 20) were fracture, cancer (ca), motor vehicle accident (MVA), leg, lumbar, discopathy, back and carpal tunnel release (CTR). When separating the data by sex, we can see that DM and MVA are more frequent among males, while cancer and CTR are less frequent. On the other hand, the top medical history in females was surgery and, after that, DM. Other frequent histories among females are breast cancer, fractures, and CTR. In the younger population (ages 18 to 26), the frequent PMH were surgery, fractures, trauma, and MVA. Discussion: By applying text mining approaches to unstructured data, we were able to better understand which medical histories are more relevant in these circumstances and, in addition, gain additional insights regarding sex and age differences. These insights might help to collect epidemiological demographical data as well as raise new hypotheses. One limitation of this work is that each clinician might use different words or abbreviations to describe the same condition, and therefore using a coding system can be beneficial.

Keywords: abnormal studies, healthcare analytics, medical history, nerve conduction studies, text mining, textual analysis

Procedia PDF Downloads 88
26333 From Two-Way to Multi-Way: A Comparative Study for Map-Reduce Join Algorithms

Authors: Marwa Hussien Mohamed, Mohamed Helmy Khafagy

Abstract:

Map-Reduce is a programming model which is widely used to extract valuable information from enormous volumes of data. Map-reduce designed to support heterogeneous datasets. Apache Hadoop map-reduce used extensively to uncover hidden pattern like data mining, SQL, etc. The most important operation for data analysis is joining operation. But, map-reduce framework does not directly support join algorithm. This paper explains and compares two-way and multi-way map-reduce join algorithms for map reduce also we implement MR join Algorithms and show the performance of each phase in MR join algorithms. Our experimental results show that map side join and map merge join in two-way join algorithms has the longest time according to preprocessing step sorting data and reduce side cascade join has the longest time at Multi-Way join algorithms.

Keywords: Hadoop, MapReduce, multi-way join, two-way join, Ubuntu

Procedia PDF Downloads 474