Search results for: Data Mining
25169 RA-Apriori: An Efficient and Faster MapReduce-Based Algorithm for Frequent Itemset Mining on Apache Flink
Authors: Sanjay Rathee, Arti Kashyap
Abstract:
Extraction of useful information from large datasets is one of the most important research problems. Association rule mining is one of the best methods for this purpose. Finding possible associations between items in large transaction based datasets (finding frequent patterns) is most important part of the association rule mining. There exist many algorithms to find frequent patterns but Apriori algorithm always remains a preferred choice due to its ease of implementation and natural tendency to be parallelized. Many single-machine based Apriori variants exist but massive amount of data available these days is above capacity of a single machine. Therefore, to meet the demands of this ever-growing huge data, there is a need of multiple machines based Apriori algorithm. For these types of distributed applications, MapReduce is a popular fault-tolerant framework. Hadoop is one of the best open-source software frameworks with MapReduce approach for distributed storage and distributed processing of huge datasets using clusters built from commodity hardware. However, heavy disk I/O operation at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms are being developed for parallel computing in recent years. Among them, two platforms, namely, Spark and Flink have attracted a lot of attention because of their inbuilt support to distributed computations. Earlier we proposed a reduced- Apriori algorithm on Spark platform which outperforms parallel Apriori, one because of use of Spark and secondly because of the improvement we proposed in standard Apriori. Therefore, this work is a natural sequel of our work and targets on implementing, testing and benchmarking Apriori and Reduced-Apriori and our new algorithm ReducedAll-Apriori on Apache Flink and compares it with Spark implementation. Flink, a streaming dataflow engine, overcomes disk I/O bottlenecks in MapReduce, providing an ideal platform for distributed Apriori. Flink's pipelining based structure allows starting a next iteration as soon as partial results of earlier iteration are available. Therefore, there is no need to wait for all reducers result to start a next iteration. We conduct in-depth experiments to gain insight into the effectiveness, efficiency and scalability of the Apriori and RA-Apriori algorithm on Flink.Keywords: apriori, apache flink, Mapreduce, spark, Hadoop, R-Apriori, frequent itemset mining
Procedia PDF Downloads 29425168 Heritage Value and Industrial Tourism Potential of the Urals, Russia
Authors: Anatoly V. Stepanov, Maria Y. Ilyushkina, Alexander S. Burnasov
Abstract:
Expansion of tourism, especially after WWII, has led to significant improvements in the regional infrastructure. The present study has revealed a lot of progress in the advancement of industrial heritage narrative in the Central Urals. The evidence comes from the general public’s increased fascination with some of Europe’s oldest mining and industrial sites, and the agreement of many stakeholders that the Urals industrial heritage should be preserved. The development of tourist sites in Nizhny Tagil and Nevyansk, gold-digging in Beryosovsky, gemstone search in Murzinka, and the progress with the Urals Gemstone Ring project are the examples showing the immense opportunities of industrial heritage tourism development in the region that are still to be realized. Regardless of the economic future of the Central Urals, whether it will remain an industrial region or experience a deeper deindustrialization, the sprouts of the industrial heritage tourism should be advanced and amplified for the benefit of local communities and the tourist community at large as it is hard to imagine a more suitable site for the discovery of industrial and mining heritage than the Central Urals Region of Russia.Keywords: industrial heritage, mining heritage, Central Urals, Russia
Procedia PDF Downloads 13625167 Strategies to Enhance Compliance of Health and Safety Standards at the Selected Mining Industries in Limpopo Province, South Africa: Occupational Health Nurse’s Perspective
Authors: Livhuwani Muthelo
Abstract:
The health and safety of the miners in the South African mining industry are guided by the regulations and standards which are anticipated to promote a healthy work environment and fatalities. It is of utmost importance for the miners to comply with these regulations/standards to protect themselves from potential occupational health and safety risks, accidents, and fatalities. The purpose of this study was to develop and validate strategies to enhance compliance with the Health and safety standards within the mining industries of Limpopo province in South Africa. A mixed-method exploratory sequential research design was adopted. The population consisted of 5350 miners. Purposive sampling was used to select the participants in the qualitative strand and stratified random sampling in the quantitative strand. Semi-structured interviews were conducted among the occupational health nurse practitioners and the health and safety team. Thematic analysis was used to generate an understanding of the interviews. In the quantitative strand, a survey was conducted using a self-administered questionnaire. Data were analysed using SPSS version 26.0. A descriptive statistical test was used in the analysis of data including frequencies, means, and standard deviation. Cronbach's alpha test was used to measure internal consistency. The integrated results revealed that there are diverse experiences related to health and safety standards compliance among the mineworkers. The main findings were challenges related to leadership compliance and also related to the cost of maintaining safety, Miner's behavior-related challenges; the impact of non-compliance on the overall health of the miners was also described, the conflict between production and safety. Health and safety compliance is not just mere compliance with regulations and standards but a culture that warrants the miners and organization to take responsibility for their behavior and actions towards health and safety. Thus taking responsibility for your well-being and other miners.Keywords: perceptions, compliance, health and safety, legislation, standards, miners
Procedia PDF Downloads 10425166 Discovering the Effects of Meteorological Variables on the Air Quality of Bogota, Colombia, by Data Mining Techniques
Authors: Fabiana Franceschi, Martha Cobo, Manuel Figueredo
Abstract:
Bogotá, the capital of Colombia, is its largest city and one of the most polluted in Latin America due to the fast economic growth over the last ten years. Bogotá has been affected by high pollution events which led to the high concentration of PM10 and NO2, exceeding the local 24-hour legal limits (100 and 150 g/m3 each). The most important pollutants in the city are PM10 and PM2.5 (which are associated with respiratory and cardiovascular problems) and it is known that their concentrations in the atmosphere depend on the local meteorological factors. Therefore, it is necessary to establish a relationship between the meteorological variables and the concentrations of the atmospheric pollutants such as PM10, PM2.5, CO, SO2, NO2 and O3. This study aims to determine the interrelations between meteorological variables and air pollutants in Bogotá, using data mining techniques. Data from 13 monitoring stations were collected from the Bogotá Air Quality Monitoring Network within the period 2010-2015. The Principal Component Analysis (PCA) algorithm was applied to obtain primary relations between all the parameters, and afterwards, the K-means clustering technique was implemented to corroborate those relations found previously and to find patterns in the data. PCA was also used on a per shift basis (morning, afternoon, night and early morning) to validate possible variation of the previous trends and a per year basis to verify that the identified trends have remained throughout the study time. Results demonstrated that wind speed, wind direction, temperature, and NO2 are the most influencing factors on PM10 concentrations. Furthermore, it was confirmed that high humidity episodes increased PM2,5 levels. It was also found that there are direct proportional relationships between O3 levels and wind speed and radiation, while there is an inverse relationship between O3 levels and humidity. Concentrations of SO2 increases with the presence of PM10 and decreases with the wind speed and wind direction. They proved as well that there is a decreasing trend of pollutant concentrations over the last five years. Also, in rainy periods (March-June and September-December) some trends regarding precipitations were stronger. Results obtained with K-means demonstrated that it was possible to find patterns on the data, and they also showed similar conditions and data distribution among Carvajal, Tunal and Puente Aranda stations, and also between Parque Simon Bolivar and las Ferias. It was verified that the aforementioned trends prevailed during the study period by applying the same technique per year. It was concluded that PCA algorithm is useful to establish preliminary relationships among variables, and K-means clustering to find patterns in the data and understanding its distribution. The discovery of patterns in the data allows using these clusters as an input to an Artificial Neural Network prediction model.Keywords: air pollution, air quality modelling, data mining, particulate matter
Procedia PDF Downloads 25825165 Relay Mining: Verifiable Multi-Tenant Distributed Rate Limiting
Authors: Daniel Olshansky, Ramiro Rodrıguez Colmeiro
Abstract:
Relay Mining presents a scalable solution employing probabilistic mechanisms and crypto-economic incentives to estimate RPC volume usage, facilitating decentralized multitenant rate limiting. Network traffic from individual applications can be concurrently serviced by multiple RPC service providers, with costs, rewards, and rate limiting governed by a native cryptocurrency on a distributed ledger. Building upon established research in token bucket algorithms and distributed rate-limiting penalty models, our approach harnesses a feedback loop control mechanism to adjust the difficulty of mining relay rewards, dynamically scaling with network usage growth. By leveraging crypto-economic incentives, we reduce coordination overhead costs and introduce a mechanism for providing RPC services that are both geopolitically and geographically distributed.Keywords: remote procedure call, crypto-economic, commit-reveal, decentralization, scalability, blockchain, rate limiting, token bucket
Procedia PDF Downloads 5425164 Multi-scale Spatial and Unified Temporal Feature-fusion Network for Multivariate Time Series Anomaly Detection
Authors: Hang Yang, Jichao Li, Kewei Yang, Tianyang Lei
Abstract:
Multivariate time series anomaly detection is a significant research topic in the field of data mining, encompassing a wide range of applications across various industrial sectors such as traffic roads, financial logistics, and corporate production. The inherent spatial dependencies and temporal characteristics present in multivariate time series introduce challenges to the anomaly detection task. Previous studies have typically been based on the assumption that all variables belong to the same spatial hierarchy, neglecting the multi-level spatial relationships. To address this challenge, this paper proposes a multi-scale spatial and unified temporal feature fusion network, denoted as MSUT-Net, for multivariate time series anomaly detection. The proposed model employs a multi-level modeling approach, incorporating both temporal and spatial modules. The spatial module is designed to capture the spatial characteristics of multivariate time series data, utilizing an adaptive graph structure learning model to identify the multi-level spatial relationships between data variables and their attributes. The temporal module consists of a unified temporal processing module, which is tasked with capturing the temporal features of multivariate time series. This module is capable of simultaneously identifying temporal dependencies among different variables. Extensive testing on multiple publicly available datasets confirms that MSUT-Net achieves superior performance on the majority of datasets. Our method is able to model and accurately detect systems data with multi-level spatial relationships from a spatial-temporal perspective, providing a novel perspective for anomaly detection analysis.Keywords: data mining, industrial system, multivariate time series, anomaly detection
Procedia PDF Downloads 1525163 Applying Big Data Analysis to Efficiently Exploit the Vast Unconventional Tight Oil Reserves
Authors: Shengnan Chen, Shuhua Wang
Abstract:
Successful production of hydrocarbon from unconventional tight oil reserves has changed the energy landscape in North America. The oil contained within these reservoirs typically will not flow to the wellbore at economic rates without assistance from advanced horizontal well and multi-stage hydraulic fracturing. Efficient and economic development of these reserves is a priority of society, government, and industry, especially under the current low oil prices. Meanwhile, society needs technological and process innovations to enhance oil recovery while concurrently reducing environmental impacts. Recently, big data analysis and artificial intelligence become very popular, developing data-driven insights for better designs and decisions in various engineering disciplines. However, the application of data mining in petroleum engineering is still in its infancy. The objective of this research aims to apply intelligent data analysis and data-driven models to exploit unconventional oil reserves both efficiently and economically. More specifically, a comprehensive database including the reservoir geological data, reservoir geophysical data, well completion data and production data for thousands of wells is firstly established to discover the valuable insights and knowledge related to tight oil reserves development. Several data analysis methods are introduced to analysis such a huge dataset. For example, K-means clustering is used to partition all observations into clusters; principle component analysis is applied to emphasize the variation and bring out strong patterns in the dataset, making the big data easy to explore and visualize; exploratory factor analysis (EFA) is used to identify the complex interrelationships between well completion data and well production data. Different data mining techniques, such as artificial neural network, fuzzy logic, and machine learning technique are then summarized, and appropriate ones are selected to analyze the database based on the prediction accuracy, model robustness, and reproducibility. Advanced knowledge and patterned are finally recognized and integrated into a modified self-adaptive differential evolution optimization workflow to enhance the oil recovery and maximize the net present value (NPV) of the unconventional oil resources. This research will advance the knowledge in the development of unconventional oil reserves and bridge the gap between the big data and performance optimizations in these formations. The newly developed data-driven optimization workflow is a powerful approach to guide field operation, which leads to better designs, higher oil recovery and economic return of future wells in the unconventional oil reserves.Keywords: big data, artificial intelligence, enhance oil recovery, unconventional oil reserves
Procedia PDF Downloads 28325162 Fuzzy Logic Classification Approach for Exponential Data Set in Health Care System for Predication of Future Data
Authors: Manish Pandey, Gurinderjit Kaur, Meenu Talwar, Sachin Chauhan, Jagbir Gill
Abstract:
Health-care management systems are a unit of nice connection as a result of the supply a straightforward and fast management of all aspects relating to a patient, not essentially medical. What is more, there are unit additional and additional cases of pathologies during which diagnosing and treatment may be solely allotted by victimization medical imaging techniques. With associate ever-increasing prevalence, medical pictures area unit directly acquired in or regenerate into digital type, for his or her storage additionally as sequent retrieval and process. Data Mining is the process of extracting information from large data sets through using algorithms and Techniques drawn from the field of Statistics, Machine Learning and Data Base Management Systems. Forecasting may be a prediction of what's going to occur within the future, associated it's an unsure method. Owing to the uncertainty, the accuracy of a forecast is as vital because the outcome foretold by foretelling the freelance variables. A forecast management should be wont to establish if the accuracy of the forecast is within satisfactory limits. Fuzzy regression strategies have normally been wont to develop shopper preferences models that correlate the engineering characteristics with shopper preferences relating to a replacement product; the patron preference models offer a platform, wherever by product developers will decide the engineering characteristics so as to satisfy shopper preferences before developing the merchandise. Recent analysis shows that these fuzzy regression strategies area units normally will not to model client preferences. We tend to propose a Testing the strength of Exponential Regression Model over regression toward the mean Model.Keywords: health-care management systems, fuzzy regression, data mining, forecasting, fuzzy membership function
Procedia PDF Downloads 27925161 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis
Authors: C. B. Le, V. N. Pham
Abstract:
In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering
Procedia PDF Downloads 18925160 The Affective Motivation of Women Miners in Ghana
Authors: Adesuwa Omorede, Rufai Haruna Kilu
Abstract:
Affective motivation (motivation that is emotionally laden usually related to affect, passion, emotions, moods) in the workplace stimulates individuals to reinforce, persist and commit to their task, which leads to the individual and organizational performance. This leads individuals to reach goals especially in situations where task are highly challenging and hostile. In such situations, individuals are more disposed to be more creative, innovative and see new opportunities from the loopholes in their workplace. However, when individuals feel displaced and less important, an adverse reaction may suffice which may be detrimental to the organization and its performance. One sector where affective motivation is eminently present and relevant, is the mining industry. Due to its intense work environment; mostly dominated by men and masculinity cultures; and deliberate exclusion of women in this environment which, makes the women working in these environments to feel marginalized. In Ghana, the mining industry is mostly seen as a very physical environment especially underground and mostly considerd as 'no place for a woman'. Despite the fact that these women feel less 'needed' or 'appreciated' in such environments, they still have to juggle between intense work shifts; face violence and other health risks with their families, which put a strain on their affective motivational reaction. Beyond these challenges, however, several mining companies in Ghana today are working towards providing a fair and equal working situation for both men and women miners, by recognizing them as key stakeholders, as well as including them in the stages of mining projects from the planning and designing phase to the evaluation and implementation stage. Drawing from the psychology and gender literature, this study takes a narrative approach to identify and understand the shifting gender dynamics within the mine works in Ghana, occasioning a change in background disposition of miners, which leads to more women taking up mine jobs in the country. In doing so, a qualitative study was conducted using semi-structured interviews from Ghana. Several women working within the mining industries in Ghana shared their experiences and how they felt and still feel in their workplace. In addition, archival documents were gathered to support the findings. The results suggest a change in enrolment regimes in a mining and technology university in Ghana, making room for a more gender equal enrolments in the university. A renowned university that train and feed mine work professional into the industry. The results further acknowledge gender equal and diversity recruitment policies and initiatives among the mining companies of Ghana. This study contributes to the psychology and gender literature by highlighting the hindrances women face in the mining industry as well as highlighting several of their affective reactions towards gender inequality. The study also provides several suggestions for decision makers in the mining industry of what can be done in the future to reduce the gender inequality gap within the industry.Keywords: affective motivation, gender shape shifting, mining industry, women miners
Procedia PDF Downloads 30125159 Text Mining Past Medical History in Electrophysiological Studies
Authors: Roni Ramon-Gonen, Amir Dori, Shahar Shelly
Abstract:
Background and objectives: Healthcare professionals produce abundant textual information in their daily clinical practice. The extraction of insights from all the gathered information, mainly unstructured and lacking in normalization, is one of the major challenges in computational medicine. In this respect, text mining assembles different techniques to derive valuable insights from unstructured textual data, so it has led to being especially relevant in Medicine. Neurological patient’s history allows the clinician to define the patient’s symptoms and along with the result of the nerve conduction study (NCS) and electromyography (EMG) test, assists in formulating a differential diagnosis. Past medical history (PMH) helps to direct the latter. In this study, we aimed to identify relevant PMH, understand which PMHs are common among patients in the referral cohort and documented by the medical staff, and examine the differences by sex and age in a large cohort based on textual format notes. Methods: We retrospectively identified all patients with abnormal NCS between May 2016 to February 2022. Age, gender, and all NCS attributes reports were recorded, including the summary text. All patients’ histories were extracted from the text report by a query. Basic text cleansing and data preparation were performed, as well as lemmatization. Very popular words (like ‘left’ and ‘right’) were deleted. Several words were replaced with their abbreviations. A bag of words approach was used to perform the analyses. Different visualizations which are common in text analysis, were created to easily grasp the results. Results: We identified 5282 unique patients. Three thousand and five (57%) patients had documented PMH. Of which 60.4% (n=1817) were males. The total median age was 62 years (range 0.12 – 97.2 years), and the majority of patients (83%) presented after the age of forty years. The top two documented medical histories were diabetes mellitus (DM) and surgery. DM was observed in 16.3% of the patients, and surgery at 15.4%. Other frequent patient histories (among the top 20) were fracture, cancer (ca), motor vehicle accident (MVA), leg, lumbar, discopathy, back and carpal tunnel release (CTR). When separating the data by sex, we can see that DM and MVA are more frequent among males, while cancer and CTR are less frequent. On the other hand, the top medical history in females was surgery and, after that, DM. Other frequent histories among females are breast cancer, fractures, and CTR. In the younger population (ages 18 to 26), the frequent PMH were surgery, fractures, trauma, and MVA. Discussion: By applying text mining approaches to unstructured data, we were able to better understand which medical histories are more relevant in these circumstances and, in addition, gain additional insights regarding sex and age differences. These insights might help to collect epidemiological demographical data as well as raise new hypotheses. One limitation of this work is that each clinician might use different words or abbreviations to describe the same condition, and therefore using a coding system can be beneficial.Keywords: abnormal studies, healthcare analytics, medical history, nerve conduction studies, text mining, textual analysis
Procedia PDF Downloads 9625158 A New Approach for Improving Accuracy of Multi Label Stream Data
Authors: Kunal Shah, Swati Patel
Abstract:
Many real world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. Classification is used to predict class of unseen instance as accurate as possible. Multi label classification is a variant of single label classification where set of labels associated with single instance. Multi label classification is used by modern applications, such as text classification, functional genomics, image classification, music categorization etc. This paper introduces the task of multi-label classification, methods for multi-label classification and evolution measure for multi-label classification. Also, comparative analysis of multi label classification methods on the basis of theoretical study, and then on the basis of simulation was done on various data sets.Keywords: binary relevance, concept drift, data stream mining, MLSC, multiple window with buffer
Procedia PDF Downloads 58425157 Shark Detection and Classification with Deep Learning
Authors: Jeremy Jenrette, Z. Y. C. Liu, Pranav Chimote, Edward Fox, Trevor Hastie, Francesco Ferretti
Abstract:
Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution coming from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of shark images by sourcing 24,546 images covering 219 species of sharks from the web application spark pulse and the social network Instagram. We used object detection to extract shark features and inflate this database to 53,345 images. We packaged object-detection and image classification models into a Shark Detector bundle. We developed the Shark Detector to recognize and classify sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches of sharks: boosting training datasets, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a result of training data quantity. The Shark Detector located sharks in baited remote footage and YouTube videos with an average accuracy of 89\%, and classified located subjects to the species level with 69\% accuracy (n =\ eight species). The Shark Detector sorted heterogeneous datasets of images sourced from Instagram with 91\% accuracy and classified species with 70\% accuracy (n =\ 17 species). Data-mining Instagram can inflate training datasets and increase the Shark Detector’s accuracy as well as facilitate archiving of historical and novel shark observations. Base accuracy of genus prediction was 68\% across 25 genera. The average base accuracy of species prediction within each genus class was 85\%. The Shark Detector can classify 45 species. All data-generation methods were processed without manual interaction. As media-based remote monitoring strives to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.Keywords: classification, data mining, Instagram, remote monitoring, sharks
Procedia PDF Downloads 12125156 Filtering Intrusion Detection Alarms Using Ant Clustering Approach
Authors: Ghodhbani Salah, Jemili Farah
Abstract:
With the growth of cyber attacks, information safety has become an important issue all over the world. Many firms rely on security technologies such as intrusion detection systems (IDSs) to manage information technology security risks. IDSs are considered to be the last line of defense to secure a network and play a very important role in detecting large number of attacks. However the main problem with today’s most popular commercial IDSs is generating high volume of alerts and huge number of false positives. This drawback has become the main motivation for many research papers in IDS area. Hence, in this paper we present a data mining technique to assist network administrators to analyze and reduce false positive alarms that are produced by an IDS and increase detection accuracy. Our data mining technique is unsupervised clustering method based on hybrid ANT algorithm. This algorithm discovers clusters of intruders’ behavior without prior knowledge of a possible number of classes, then we apply K-means algorithm to improve the convergence of the ANT clustering. Experimental results on real dataset show that our proposed approach is efficient with high detection rate and low false alarm rate.Keywords: intrusion detection system, alarm filtering, ANT class, ant clustering, intruders’ behaviors, false alarms
Procedia PDF Downloads 40325155 Application of Remote Sensing Technique on the Monitoring of Mine Eco-Environment
Authors: Haidong Li, Weishou Shen, Guoping Lv, Tao Wang
Abstract:
Aiming to overcome the limitation of the application of traditional remote sensing (RS) technique in the mine eco-environmental monitoring, in this paper, we first classified the eco-environmental damages caused by mining activities and then introduced the principle, classification and characteristics of the Light Detection and Ranging (LiDAR) technique. The potentiality of LiDAR technique in the mine eco-environmental monitoring was analyzed, particularly in extracting vertical structure parameters of vegetation, through comparing the feasibility and applicability of traditional RS method and LiDAR technique in monitoring different types of indicators. The application situation of LiDAR technique in extracting typical mine indicators, such as land destruction in mining areas, damage of ecological integrity and natural soil erosion. The result showed that the LiDAR technique has the ability to monitor most of the mine eco-environmental indicators, and exhibited higher accuracy comparing with traditional RS technique, specifically speaking, the applicability of LiDAR technique on each indicator depends on the accuracy requirement of mine eco-environmental monitoring. In the item of large mine, LiDAR three-dimensional point cloud data not only could be used as the complementary data source of optical RS, Airborne/Satellite LiDAR could also fulfill the demand of extracting vertical structure parameters of vegetation in large areas.Keywords: LiDAR, mine, ecological damage, monitoring, traditional remote sensing technique
Procedia PDF Downloads 39725154 Collision Theory Based Sentiment Detection Using Discourse Analysis in Hadoop
Authors: Anuta Mukherjee, Saswati Mukherjee
Abstract:
Data is growing everyday. Social networking sites such as Twitter are becoming an integral part of our daily lives, contributing a large increase in the growth of data. It is a rich source especially for sentiment detection or mining since people often express honest opinion through tweets. However, although sentiment analysis is a well-researched topic in text, this analysis using Twitter data poses additional challenges since these are unstructured data with abbreviations and without a strict grammatical correctness. We have employed collision theory to achieve sentiment analysis in Twitter data. We have also incorporated discourse analysis in the collision theory based model to detect accurate sentiment from tweets. We have also used the retweet field to assign weights to certain tweets and obtained the overall weightage of a topic provided in the form of a query. Hadoop has been exploited for speed. Our experiments show effective results.Keywords: sentiment analysis, twitter, collision theory, discourse analysis
Procedia PDF Downloads 53525153 Lessons from Farmers Performing Agroforestry for Reclamation of Gold Mine Spoils in Colombia
Authors: Bibiana Betancur-Corredor, Juan Carlos Loaiza, Manfred Denich, Christian Borgemeister
Abstract:
Alluvial gold mining generates a vast amount of deposits that cover the natural soil and negatively impacts riverbeds and valleys, causing loss of livelihood opportunities for farmers of these regions. In Colombia, more than 79,000 ha are affected by alluvial gold mining, therefore developing strategies to return this land to productivity is of crucial importance for the country. A novel restoration strategy has been created by a mining company, where the land is restored through the establishment of agroforestry systems, in which agricultural crops and livestock are combined to complement reforestation in the area. The purpose of this study is to capture the knowledge of farmers who perform agroforestry in areas with deposits created by alluvial gold mining activities. Semi structured interviews were conducted with farmers with regard to the following: indicators of soil fertility, management practices, soil heterogeneity, pest outbreaks and weeds. In order to compare the perceptions of soil fertility of farmers with physicochemical properties of soils, the farmers were asked to identify spots within their farms that have exhibited good and poor yields. Soil samples were collected in order to correlate farmer’s perceptions with soil physicochemical properties. The findings suggest that the main challenge that farmers face is the identification of fertile soil for crop establishment. They identify the fertile soil through visually analyzing soil color and compaction as well as the use of spontaneous growth of specific plants as indicator of soil fertility. For less fertile areas, nitrogen fixing plants are used as green manure to restore soil fertility for crop establishment. The findings of this study imply that if gold mining is followed by reclamation practices that involve the successful establishment of productive farmlands, agricultural productivity of these lands might improve, increasing food security of the affected communities.Keywords: agroforestry, knowledge, mining, restoration
Procedia PDF Downloads 23325152 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines
Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma
Abstract:
Useful information has been extracted from the road accident data in United Kingdom (UK), using data analytics method, for avoiding possible accidents in rural and urban areas. This analysis make use of several methodologies such as data integration, support vector machines (SVM), correlation machines and multinomial goodness. The entire datasets have been imported from the traffic department of UK with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn avoid unnecessary memory lapses. Since data is expected to grow continuously over a period of time, this work primarily proposes a new framework model which can be trained and adapt itself to new data and make accurate predictions. This work also throws some light on use of SVM’s methodology for text classifiers from the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of SVMs methodology appropriate for this kind of research work.Keywords: support vector mechanism (SVM), machine learning (ML), support vector machines (SVM), department of transportation (DFT)
Procedia PDF Downloads 27425151 From Two-Way to Multi-Way: A Comparative Study for Map-Reduce Join Algorithms
Authors: Marwa Hussien Mohamed, Mohamed Helmy Khafagy
Abstract:
Map-Reduce is a programming model which is widely used to extract valuable information from enormous volumes of data. Map-reduce designed to support heterogeneous datasets. Apache Hadoop map-reduce used extensively to uncover hidden pattern like data mining, SQL, etc. The most important operation for data analysis is joining operation. But, map-reduce framework does not directly support join algorithm. This paper explains and compares two-way and multi-way map-reduce join algorithms for map reduce also we implement MR join Algorithms and show the performance of each phase in MR join algorithms. Our experimental results show that map side join and map merge join in two-way join algorithms has the longest time according to preprocessing step sorting data and reduce side cascade join has the longest time at Multi-Way join algorithms.Keywords: Hadoop, MapReduce, multi-way join, two-way join, Ubuntu
Procedia PDF Downloads 48725150 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-
Authors: Nieto Bernal Wilson, Carmona Suarez Edgar
Abstract:
The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects. Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.Keywords: data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse
Procedia PDF Downloads 40925149 Recommender System Based on Mining Graph Databases for Data-Intensive Applications
Authors: Mostafa Gamal, Hoda K. Mohamed, Islam El-Maddah, Ali Hamdi
Abstract:
In recent years, many digital documents on the web have been created due to the rapid growth of ’social applications’ communities or ’Data-intensive applications’. The evolution of online-based multimedia data poses new challenges in storing and querying large amounts of data for online recommender systems. Graph data models have been shown to be more efficient than relational data models for processing complex data. This paper will explain the key differences between graph and relational databases, their strengths and weaknesses, and why using graph databases is the best technology for building a realtime recommendation system. Also, The paper will discuss several similarity metrics algorithms that can be used to compute a similarity score of pairs of nodes based on their neighbourhoods or their properties. Finally, the paper will discover how NLP strategies offer the premise to improve the accuracy and coverage of realtime recommendations by extracting the information from the stored unstructured knowledge, which makes up the bulk of the world’s data to enrich the graph database with this information. As the size and number of data items are increasing rapidly, the proposed system should meet current and future needs.Keywords: graph databases, NLP, recommendation systems, similarity metrics
Procedia PDF Downloads 10425148 A General Framework for Measuring the Internal Fraud Risk of an Enterprise Resource Planning System
Authors: Imran Dayan, Ashiqul Khan
Abstract:
Internal corporate fraud, which is fraud carried out by internal stakeholders of a company, affects the well-being of the organisation just like its external counterpart. Even if such an act is carried out for the short-term benefit of a corporation, the act is ultimately harmful to the entity in the long run. Internal fraud is often carried out by relying upon aberrations from usual business processes. Business processes are the lifeblood of a company in modern managerial context. Such processes are developed and fine-tuned over time as a corporation grows through its life stages. Modern corporations have embraced technological innovations into their business processes, and Enterprise Resource Planning (ERP) systems being at the heart of such business processes is a testimony to that. Since ERP systems record a huge amount of data in their event logs, the logs are a treasure trove for anyone trying to detect any sort of fraudulent activities hidden within the day-to-day business operations and processes. This research utilises the ERP systems in place within corporations to assess the likelihood of prospective internal fraud through developing a framework for measuring the risks of fraud through Process Mining techniques and hence finds risky designs and loose ends within these business processes. This framework helps not only in identifying existing cases of fraud in the records of the event log, but also signals the overall riskiness of certain business processes, and hence draws attention for carrying out a redesign of such processes to reduce the chance of future internal fraud while improving internal control within the organisation. The research adds value by applying the concepts of Process Mining into the analysis of data from modern day applications of business process records, which is the ERP event logs, and develops a framework that should be useful to internal stakeholders for strengthening internal control as well as provide external auditors with a tool of use in case of suspicion. The research proves its usefulness through a few case studies conducted with respect to big corporations with complex business processes and an ERP in place.Keywords: enterprise resource planning, fraud risk framework, internal corporate fraud, process mining
Procedia PDF Downloads 33425147 Unsupervised Text Mining Approach to Early Warning System
Authors: Ichihan Tai, Bill Olson, Paul Blessner
Abstract:
Traditional early warning systems that alarm against crisis are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to the traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against market crash, and spikes in the event of crisis. In this study, news data is consumed for prediction of whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal against VIX index data from CBOE.Keywords: early warning system, knowledge management, market prediction, topic modeling.
Procedia PDF Downloads 33825146 Designing Supplier Partnership Success Factors in the Coal Mining Industry
Authors: Ahmad Afif, Teuku Yuri M. Zagloel
Abstract:
Sustainable supply chain management is a new pattern that has emerged recently in industry and companies. The procurement process is one of the key factors for efficiency in supply chain management practices. Partnership is one of the procurement strategies for strategic items. The success factors of the partnership must be determined to avoid things that endanger the financial and operational status of the company. The current supplier partnership research focuses on the selection of general criteria and sustainable supplier selection. Currently, there is still limited research on the success factors of supplier partnerships that focus on strategic items in the coal mining industry. Meanwhile, the procurement of coal mining has its own characteristics, and there are regulations related to the procurement of goods. Therefore, this research was conducted to determine the categories of goods that are included in the strategic items and to design the success factors of supplier partnerships. The main factors studied are general, financial, production, reputation, synergies, and sustainable. The research was conducted using the Kraljic method to determine the categories of goods that are included in the strategic items. To design a supplier partnership success factor using the Hybrid Multi Criteria Decision Making method. Integrated Fuzzy AHP-Fuzzy TOPSIS is used to determine the weight of the success factors of supplier partnerships and to rank suppliers on the factors used.Keywords: supplier, partnership, strategic item, success factors, and coal mining industry
Procedia PDF Downloads 13025145 Agriculture Water Quality Evaluation in Minig Basin
Authors: Ben Salah Nahla
Abstract:
The problem of water in Tunisia affects the quality and quantity. Tunisia is in a situation of water shortage. It was estimated that 4.6 Mm3/an. Moreover, the quality of water in Tunisia is also mediocre. In fact, 50% of the water has a high salinity (> 1.5g/l). There are several parameters which affect water quality such as sodium, fluoride. An excess of this parameter may induce some human health. Furthermore, the mining basin area has a problem of industrial waste. This problem may affect the water quality of the groundwater. Therefore, the purpose of this work is to assess the water quality in Basin Mining and the impact of fluorine. For this research, some water samples were done in the field and specific water analysis was implemented in the laboratory. Sampling is carried out on eight drilling in the area of the mining region. In the following, we will look at water view composition, physical and chemical quality. A physical-chemical analysis of water from a survey of the Mining area of Tunisia was performed and showed an excess for the following items: fluorine, sodium, sulfate. So many chemicals may be present in water. However, only a small number of them immediately concern in terms of health in all circumstances. Fluorine (F) is one particular chemical that is considered both necessary for the human body, but an excess of the rate of this chemical causes serious diseases. Sodium fluoride and sodium silicofluoride are more soluble and may spread in animals and plants where their toxicity largest organizations. The more complex particles such as cryolite and fluorite, almost insoluble, are more stable and less toxic. Thereafter, we will study the problem of excess fluorine in the water. The latter intended for human consumption must always comply with the limits for microbiological quality parameters and physical-chemical parameters defined by European standards (1.5 mg/l) and Tunisian (2 mg/l).Keywords: water, minier basin, fluorine, silicofluoride
Procedia PDF Downloads 58225144 Opinion Mining to Extract Community Emotions on Covid-19 Immunization Possible Side Effects
Authors: Yahya Almurtadha, Mukhtar Ghaleb, Ahmed M. Shamsan Saleh
Abstract:
The world witnessed a fierce attack from the Covid-19 virus, which affected public life socially, economically, healthily and psychologically. The world's governments tried to confront the pandemic by imposing a number of precautionary measures such as general closure, curfews and social distancing. Scientists have also made strenuous efforts to develop an effective vaccine to train the immune system to develop antibodies to combat the virus, thus reducing its symptoms and limiting its spread. Artificial intelligence, along with researchers and medical authorities, has accelerated the vaccine development process through big data processing and simulation. On the other hand, one of the most important negatives of the impact of Covid 19 was the state of anxiety and fear due to the blowout of rumors through social media, which prompted governments to try to reassure the public with the available means. This study aims to proposed using Sentiment Analysis (AKA Opinion Mining) and deep learning as efficient artificial intelligence techniques to work on retrieving the tweets of the public from Twitter and then analyze it automatically to extract their opinions, expression and feelings, negatively or positively, about the symptoms they may feel after vaccination. Sentiment analysis is characterized by its ability to access what the public post in social media within a record time and at a lower cost than traditional means such as questionnaires and interviews, not to mention the accuracy of the information as it comes from what the public expresses voluntarily.Keywords: deep learning, opinion mining, natural language processing, sentiment analysis
Procedia PDF Downloads 17125143 Integrating Data Mining with Case-Based Reasoning for Diagnosing Sorghum Anthracnose
Authors: Mariamawit T. Belete
Abstract:
Cereal production and marketing are the means of livelihood for millions of households in Ethiopia. However, cereal production is constrained by technical and socio-economic factors. Among the technical factors, cereal crop diseases are the major contributing factors to the low yield. The aim of this research is to develop an integration of data mining and knowledge based system for sorghum anthracnose disease diagnosis that assists agriculture experts and development agents to make timely decisions. Anthracnose diagnosing systems gather information from Melkassa agricultural research center and attempt to score anthracnose severity scale. Empirical research is designed for data exploration, modeling, and confirmatory procedures for testing hypothesis and prediction to draw a sound conclusion. WEKA (Waikato Environment for Knowledge Analysis) was employed for the modeling. Knowledge based system has come across a variety of approaches based on the knowledge representation method; case-based reasoning (CBR) is one of the popular approaches used in knowledge-based system. CBR is a problem solving strategy that uses previous cases to solve new problems. The system utilizes hidden knowledge extracted by employing clustering algorithms, specifically K-means clustering from sampled anthracnose dataset. Clustered cases with centroid value are mapped to jCOLIBRI, and then the integrator application is created using NetBeans with JDK 8.0.2. The important part of a case based reasoning model includes case retrieval; the similarity measuring stage, reuse; which allows domain expert to transfer retrieval case solution to suit for the current case, revise; to test the solution, and retain to store the confirmed solution to the case base for future use. Evaluation of the system was done for both system performance and user acceptance. For testing the prototype, seven test cases were used. Experimental result shows that the system achieves an average precision and recall values of 70% and 83%, respectively. User acceptance testing also performed by involving five domain experts, and an average of 83% acceptance is achieved. Although the result of this study is promising, however, further study should be done an investigation on hybrid approach such as rule based reasoning, and pictorial retrieval process are recommended.Keywords: sorghum anthracnose, data mining, case based reasoning, integration
Procedia PDF Downloads 8125142 Helping the Development of Public Policies with Knowledge of Criminal Data
Authors: Diego De Castro Rodrigues, Marcelo B. Nery, Sergio Adorno
Abstract:
The project aims to develop a framework for social data analysis, particularly by mobilizing criminal records and applying descriptive computational techniques, such as associative algorithms and extraction of tree decision rules, among others. The methods and instruments discussed in this work will enable the discovery of patterns, providing a guided means to identify similarities between recurring situations in the social sphere using descriptive techniques and data visualization. The study area has been defined as the city of São Paulo, with the structuring of social data as the central idea, with a particular focus on the quality of the information. Given this, a set of tools will be validated, including the use of a database and tools for visualizing the results. Among the main deliverables related to products and the development of articles are the discoveries made during the research phase. The effectiveness and utility of the results will depend on studies involving real data, validated both by domain experts and by identifying and comparing the patterns found in this study with other phenomena described in the literature. The intention is to contribute to evidence-based understanding and decision-making in the social field.Keywords: social data analysis, criminal records, computational techniques, data mining, big data
Procedia PDF Downloads 8425141 Development of Column-Filters of Sulfur Limonene Polysulfide to Mercury Removal from Contaminated Effluents
Authors: Galo D. Soria, Jenny S. Casame, Eddy F. Pazmino
Abstract:
In Ecuador, mining operations have significantly impacted water sources. Artisanal mining extensively relies in mercury amalgamation. Mercury is a neurotoxic substance even at low concentrations. The objective of this investigation is to exploit Hg-removal capacity of sulfur-limonene polysulfide (SLP), which is a low-cost polymer, in order to prepare granular media (sand) coated with SLP to be used in laboratory scale column-filtration systems. Preliminary results achieved 85% removal of Hg⁺⁺ from synthetic effluents using 20-cm length and 5-cm diameter columns at 119m/day average pore water velocity. During elution of the column, the SLP-coated sand indicated that Hg⁺⁺ is permanently fixed to the collector surface, in contrast, uncoated sand showed reversible retention in Hg⁺⁺ in the solid phase. Injection of 50 pore volumes decreased Hg⁺⁺ removal to 46%. Ongoing work has been focused in optimizing the synthesis of SLP and the polymer content in the porous media coating process to improve Hg⁺⁺ removal and extend the lifetime of the column-filter.Keywords: column-filter, mercury, mining, polysulfide, water treatment
Procedia PDF Downloads 14925140 Ontology-Driven Knowledge Discovery and Validation from Admission Databases: A Structural Causal Model Approach for Polytechnic Education in Nigeria
Authors: Bernard Igoche Igoche, Olumuyiwa Matthew, Peter Bednar, Alexander Gegov
Abstract:
This study presents an ontology-driven approach for knowledge discovery and validation from admission databases in Nigerian polytechnic institutions. The research aims to address the challenges of extracting meaningful insights from vast amounts of admission data and utilizing them for decision-making and process improvement. The proposed methodology combines the knowledge discovery in databases (KDD) process with a structural causal model (SCM) ontological framework. The admission database of Benue State Polytechnic Ugbokolo (Benpoly) is used as a case study. The KDD process is employed to mine and distill knowledge from the database, while the SCM ontology is designed to identify and validate the important features of the admission process. The SCM validation is performed using the conditional independence test (CIT) criteria, and an algorithm is developed to implement the validation process. The identified features are then used for machine learning (ML) modeling and prediction of admission status. The results demonstrate the adequacy of the SCM ontological framework in representing the admission process and the high predictive accuracies achieved by the ML models, with k-nearest neighbors (KNN) and support vector machine (SVM) achieving 92% accuracy. The study concludes that the proposed ontology-driven approach contributes to the advancement of educational data mining and provides a foundation for future research in this domain.Keywords: admission databases, educational data mining, machine learning, ontology-driven knowledge discovery, polytechnic education, structural causal model
Procedia PDF Downloads 62