Search results for: distributed data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25808

25328 Discovering the Effects of Meteorological Variables on the Air Quality of Bogota, Colombia, by Data Mining Techniques

Authors: Fabiana Franceschi, Martha Cobo, Manuel Figueredo

Abstract:

Bogotá, the capital of Colombia, is its largest city and one of the most polluted in Latin America owing to rapid economic growth over the last ten years. Bogotá has been affected by high pollution events that led to high concentrations of PM10 and NO2, exceeding the local 24-hour legal limits (100 and 150 µg/m3, respectively). The most important pollutants in the city are PM10 and PM2.5 (which are associated with respiratory and cardiovascular problems), and it is known that their concentrations in the atmosphere depend on local meteorological factors. Therefore, it is necessary to establish a relationship between the meteorological variables and the concentrations of atmospheric pollutants such as PM10, PM2.5, CO, SO2, NO2 and O3. This study aims to determine the interrelations between meteorological variables and air pollutants in Bogotá using data mining techniques. Data from 13 monitoring stations were collected from the Bogotá Air Quality Monitoring Network for the period 2010-2015. The Principal Component Analysis (PCA) algorithm was applied to obtain primary relations between all the parameters, and afterwards the K-means clustering technique was implemented to corroborate those relations and to find patterns in the data. PCA was also used on a per-shift basis (morning, afternoon, night and early morning) to check for possible variation of the previous trends, and on a per-year basis to verify that the identified trends persisted throughout the study period. Results demonstrated that wind speed, wind direction, temperature, and NO2 are the most influential factors on PM10 concentrations. Furthermore, it was confirmed that high humidity episodes increased PM2.5 levels. It was also found that O3 levels are directly proportional to wind speed and radiation, while there is an inverse relationship between O3 levels and humidity.
Concentrations of SO2 increase with the presence of PM10 and decrease with wind speed and wind direction. The results also showed a decreasing trend in pollutant concentrations over the last five years. Also, in rainy periods (March-June and September-December) some trends regarding precipitation were stronger. Results obtained with K-means demonstrated that it was possible to find patterns in the data, and they also showed similar conditions and data distribution among the Carvajal, Tunal and Puente Aranda stations, and also between Parque Simon Bolivar and Las Ferias. Applying the same technique per year verified that the aforementioned trends prevailed during the study period. It was concluded that the PCA algorithm is useful for establishing preliminary relationships among variables, and K-means clustering for finding patterns in the data and understanding their distribution. The discovery of patterns in the data allows these clusters to be used as an input to an Artificial Neural Network prediction model.
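The clustering step described in this abstract can be illustrated with a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the authors' implementation: the function name `kmeans`, the fixed seed, and the six (wind speed, PM10) readings are all made-up assumptions for illustration, and a real analysis would standardize the variables first so that no single unit dominates the distance.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        assignment = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical toy readings: (wind speed in m/s, PM10 in µg/m3).
readings = [(0.8, 95.0), (1.0, 88.0), (1.2, 91.0),
            (3.5, 42.0), (3.8, 38.0), (4.1, 45.0)]
centroids, labels = kmeans(readings, k=2)
print(labels)
```

On this toy input the low-wind, high-PM10 readings fall into one cluster and the high-wind, low-PM10 readings into the other, which is the kind of pattern the abstract reports looking for.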

Keywords: air pollution, air quality modelling, data mining, particulate matter

Procedia PDF Downloads 235
25327 Distributed Acoustic Sensing Signal Model under Static Fiber Conditions

Authors: G. Punithavathy

Abstract:

The research proposes a statistical model for distributed acoustic sensor interrogation units that launch a laser pulse into the optical fiber, where interactions within the fiber determine the localized acoustic energy that causes light reflections known as backscatter. The backscattered signal's amplitude and phase can be calculated using explicit equations. The model makes amplitude signal spectrum and autocorrelation predictions that are confirmed by experimental findings. Phase signal characteristics that are useful for researching optical time domain reflectometry (OTDR) system sensing applications are provided and examined, showing good agreement with the experiment. The experiment was carried out using Python. This approach allows the component parts of the distributed acoustic sensing (DAS) system to be analyzed separately. The model assumes that the fiber is in a static condition, meaning that no external force or vibration is applied to the cable and no external acoustic disturbances are present. The backscattered signal consists of a random noise component, caused by the intrinsic imperfections of the fiber, and a coherent component, due to the laser pulse interacting with the fiber.

Keywords: distributed acoustic sensing, optical fiber devices, optical time domain reflectometry, Rayleigh scattering

Procedia PDF Downloads 49
25326 Investigation of Yard Seam Workings for the Proposed Newcastle Light Rail Project

Authors: David L. Knott, Robert Kingsland, Alistair Hitchon

Abstract:

The proposed Newcastle Light Rail is a key part of the revitalisation of Newcastle, NSW and will provide a frequent and reliable travel option throughout the city centre, running from Newcastle Interchange at Wickham to Pacific Park in Newcastle East, a total of 2.7 kilometers in length. Approximately one-third of the route, along Hunter and Scott Streets, is subject to potential shallow underground mine workings. The extent of mining and seams mined is unclear. Convicts mined the Yard Seam and overlying Dudley (Dirty) Seam in Newcastle sometime between 1800 and 1830. The Australian Agricultural Company mined the Yard Seam from about 1831 to the 1860s in the alignment area. The Yard Seam was about 3 feet (0.9m) thick, and therefore, known as the Yard Seam. Mine maps do not exist for the workings in the area of interest and it was unclear if both or just one seam was mined. Information from 1830s geological mapping and other data showing shaft locations were used along Scott Street and information from the 1908 Royal Commission was used along Hunter Street to develop an investigation program. In addition, mining was encountered for several sites to the south of the alignment at depths of about 7 m to 25 m. Based on the anticipated depths of mining, it was considered prudent to assess the potential for sinkhole development on the proposed alignment and realigned underground utilities and to obtain approval for the work from Subsidence Advisory NSW (SA NSW). The assessment consisted of a desktop study, followed by a subsurface investigation. Four boreholes were drilled along Scott Street and three boreholes were drilled along Hunter Street using HQ coring techniques in the rock. The placement of boreholes was complicated by the presence of utilities in the roadway and traffic constraints. All the boreholes encountered the Yard Seam, with conditions varying from unmined coal to an open void, indicating the presence of mining. 
The geotechnical information obtained from the boreholes was expanded by using various downhole techniques, including borehole camera, borehole sonar, and downhole geophysical logging. The camera provided views of the rock and helped to explain zones of no recovery. In addition, timber props within the void were observed. Borehole sonar was performed in the void and provided an indication of room size as well as the presence of timber props within the room. Downhole geophysical logging was performed in the boreholes to measure density, natural gamma, and borehole deviation. The data helped confirm that all the mining was in the Yard Seam and that the overlying Dudley Seam had been eroded in the past over much of the alignment. In summary, the assessment allowed the potential for sinkhole subsidence to be evaluated and a mitigation approach to be developed to allow conditional approval by SA NSW. It also confirmed the presence of mining in the Yard Seam, the depth to the seam and mining conditions, and indicated that subsidence did not appear to have occurred in the past.

Keywords: downhole investigation techniques, drilling, mine subsidence, yard seam

Procedia PDF Downloads 288
25325 Multicriteria for Optimal Land Use after Mining

Authors: Carla Idely Palencia-Aguilar

Abstract:

Mining in Colombia represents around 2% of the GDP (USD 8 billion in 2018), with the main products being coal, nickel, gold, silver, emeralds, iron, limestone and gypsum, among others. The share of sand and gravel in the GDP has been decreasing, with volumes falling from 33.2 million m3 in 2015 to 27.4 in 2016, 22.7 in 2017 and 15.8 in 2018, with a consumption of approximately 3 tons/inhabitant. However, with the new government policies it is expected to increase in the following years. Mining causes temporary environmental impacts; once restoration and rehabilitation take place, social, environmental and economic benefits are higher than in the initial state. To demonstrate how mining interventions had contributed to improving the characteristics of the region after sand and gravel mining, the NDVI (Normalized Difference Vegetation Index) from MODIS and ASTER was employed. The histograms show not only increments of vegetation in the area (8 times higher), but also topographies similar to the ones before the intervention, according to the application for sustainable development selected: agriculture, forestry, cattle raising, artificial wetlands or doing nothing. The decision was based upon a multicriteria analysis for optimal land use, with three main variables: geostatistics, evapotranspiration and groundwater characteristics. The use of remote sensing, meteorological stations, piezometers, sunphotometers and geoelectric analysis, among others, provides the information required for the multicriteria decision. For cattle raising and agricultural applications (where various crops were implemented), conservation of products was tested by means of nanotechnology. The results showed a duration of 2 years with no chemicals added for preservation and concentration of vitamins of the tested products.
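For reference, the NDVI mentioned in this abstract is conventionally computed per pixel from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The sketch below is a minimal illustration of that standard formula, not the study's processing chain; the function name, the zero-denominator convention, and the reflectance values are assumptions chosen for the example.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    nir and red are surface reflectances in the near-infrared and red bands.
    """
    total = nir + red
    if total == 0:
        return 0.0  # convention assumed here to avoid division by zero
    return (nir - red) / total

# Hypothetical reflectances: dense vegetation reflects strongly in the
# near-infrared, while bare ground does not.
vegetated = ndvi(nir=0.5, red=0.1)
bare_soil = ndvi(nir=0.2, red=0.15)
print(round(vegetated, 3), round(bare_soil, 3))
```

Values close to 1 indicate dense green vegetation and values near 0 indicate bare ground, which is why an increase in NDVI after rehabilitation can be read as vegetation recovery.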

Keywords: ASTER, Geostatistics, MODIS, Multicriteria

Procedia PDF Downloads 106
25324 Lead Removal From Ex- Mining Pond Water by Electrocoagulation: Kinetics, Isotherm, and Dynamic Studies

Authors: Kalu Uka Orji, Nasiman Sapari, Khamaruzaman W. Yusof

Abstract:

Exposure of galena (PbS), teallite (PbSnS2), and other associated minerals during mining activities releases lead (Pb) and other heavy metals into the mining water through oxidation and dissolution. Heavy metal pollution has become an environmental challenge. Lead, for instance, can cause toxic effects to human health, including brain damage. Ex-mining pond water was reported to contain lead as high as 69.46 mg/L. Conventional treatment does not easily remove lead from water. A promising and emerging treatment technology for lead removal is the application of the electrocoagulation (EC) process. However, some of the problems associated with EC are systematic reactor design, selection of maximum EC operating parameters, and scale-up, among others. This study investigated an EC process for the removal of lead from synthetic ex-mining pond water using a batch reactor and Fe electrodes. The effects of various operating parameters on lead removal efficiency were examined. The results indicated that the maximum removal efficiency of 98.6% was achieved at an initial pH of 9, a current density of 15 mA/cm2, an electrode spacing of 0.3 cm, a treatment time of 60 minutes, Liquid Motion of Magnetic Stirring (LM-MS), and electrode arrangement = BP-S. The above experimental data were further modeled and optimized using a 2-Level 4-Factor Full Factorial design, a Response Surface Methodology (RSM). The four factors optimized were the current density, electrode spacing, electrode arrangements, and Liquid Motion Driving Mode (LM). Based on the regression model and the analysis of variance (ANOVA) at 0.01%, the results showed that an increase in current density and LM-MS increased the removal efficiency, while the reverse was the case for electrode spacing. The model predicted an optimal lead removal efficiency of 99.962% with an electrode spacing of 0.38 cm alongside others. Applying the predicted parameters, a lead removal efficiency of 100% was achieved.
The electrode and energy consumptions were 0.192 kg/m3 and 2.56 kWh/m3 respectively. Meanwhile, the adsorption kinetic studies indicated that the overall lead adsorption system follows the pseudo-second-order kinetic model. The adsorption dynamics were also random, spontaneous, and endothermic. A higher process temperature enhances adsorption capacity. Furthermore, the adsorption isotherm fitted the Freundlich model better than the Langmuir model, describing adsorption on a heterogeneous surface and showing good adsorption efficiency by the Fe electrodes. Adsorption of Pb2+ onto the Fe electrodes was a complex reaction, involving more than one mechanism. The overall results proved that EC is an efficient technique for lead removal from synthetic mining pond water. The findings of this study would have application in the scale-up of EC reactors and in the design of water treatment plants for feed-water sources that contain lead using the electrocoagulation method.
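For readers unfamiliar with the models named in this abstract, their standard textbook forms are sketched below. The notation is an assumption introduced here, not taken from the study: $C_0$, $C_t$ and $C_e$ are the lead concentrations initially, at time $t$ and at equilibrium; $q_t$ and $q_e$ are the adsorbed amounts per unit mass of adsorbent at time $t$ and at equilibrium; $k_2$ is the pseudo-second-order rate constant; $K_F$ and $n$ are the Freundlich constants; $q_{\max}$ and $K_L$ are the Langmuir constants.

```latex
\begin{align}
\text{Removal efficiency:}\quad
  & R(\%) = \frac{C_0 - C_t}{C_0} \times 100 \\
\text{Pseudo-second-order kinetics:}\quad
  & \frac{dq_t}{dt} = k_2 \,(q_e - q_t)^2
    \;\Longrightarrow\;
    \frac{t}{q_t} = \frac{1}{k_2\, q_e^{2}} + \frac{t}{q_e} \\
\text{Freundlich isotherm:}\quad
  & q_e = K_F \, C_e^{1/n} \\
\text{Langmuir isotherm:}\quad
  & q_e = \frac{q_{\max}\, K_L\, C_e}{1 + K_L\, C_e}
\end{align}
```

A straight line in a plot of $t/q_t$ against $t$ is the usual evidence for pseudo-second-order behaviour, and comparing the goodness of fit of the two isotherm equations is how one model is typically judged to describe the data better than the other.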

Keywords: ex-mining water, electrocoagulation, lead, adsorption kinetics

Procedia PDF Downloads 130
25323 Combined Safety and Cybersecurity Risk Assessment for Intelligent Distributed Grids

Authors: Anders Thorsén, Behrooz Sangchoolie, Peter Folkesson, Ted Strandberg

Abstract:

As more parts of the power grid become connected to the internet, the risk of cyberattacks increases. To identify the cybersecurity threats and subsequently reduce vulnerabilities, the common practice is to carry out a cybersecurity risk assessment. For safety classified systems and products, there is also a need for safety risk assessments in addition to the cybersecurity risk assessment in order to identify and reduce safety risks. These two risk assessments are usually done separately, but since cybersecurity and functional safety are often related, a more comprehensive method covering both aspects is needed. Some work addressing this has been done for specific domains like the automotive domain, but more general methods suitable for, e.g., intelligent distributed grids, are still missing. One such method from the automotive domain is the Security-Aware Hazard Analysis and Risk Assessment (SAHARA) method that combines safety and cybersecurity risk assessments. This paper presents an approach where the SAHARA method has been modified in order to be more suitable for larger distributed systems. The adapted SAHARA method has a more general risk assessment approach than the original SAHARA. The proposed method has been successfully applied on two use cases of an intelligent distributed grid.

Keywords: intelligent distribution grids, threat analysis, risk assessment, safety, cybersecurity

Procedia PDF Downloads 126
25322 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

Organizations hold structured and unstructured information in different formats, sources, and systems. Part of it comes from ERP systems under OLTP processing that support the information system; however, at the OLAP processing level these organizations show some deficiencies. Part of the problem lies in a lack of interest in extracting knowledge from their data sources, as well as the absence of operational capabilities to tackle this kind of project. Data warehouses and their applications are considered non-proprietary tools of great interest to business intelligence, since they are the repository basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. This paper presents a simple, structured methodology inspired by agile development models such as Scrum, XP and AUP, as well as by object-relational models, spatial data models, and the baseline of data modeling under UML and big data. In this way, it seeks to deliver an agile methodology for developing data warehouses that is simple and easy to apply. The methodology naturally takes into account the application of processes for information analysis, visualization and data mining, particularly for generating patterns and deriving models from the structured fact objects.

Keywords: data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse

Procedia PDF Downloads 384
25321 Fuzzy Logic Classification Approach for Exponential Data Set in Health Care System for Predication of Future Data

Authors: Manish Pandey, Gurinderjit Kaur, Meenu Talwar, Sachin Chauhan, Jagbir Gill

Abstract:

Health-care management systems are of great relevance because they provide easy and fast management of all aspects relating to a patient, not necessarily medical. Moreover, there are more and more cases of pathologies in which diagnosis and treatment can only be carried out by using medical imaging techniques. With an ever-increasing prevalence, medical images are directly acquired in or converted into digital form, for their storage as well as subsequent retrieval and processing. Data mining is the process of extracting information from large data sets using algorithms and techniques drawn from the fields of statistics, machine learning and database management systems. Forecasting is a prediction of what will occur in the future, and it is an uncertain process. Owing to this uncertainty, the accuracy of a forecast is as important as the outcome predicted by forecasting the independent variables. A forecast control must be used to establish whether the accuracy of the forecast is within satisfactory limits. Fuzzy regression methods have commonly been used to develop consumer preference models that correlate engineering characteristics with consumer preferences regarding a new product; the consumer preference models provide a platform whereby product developers can decide the engineering characteristics needed to satisfy consumer preferences before developing the product. Recent research shows that these fuzzy regression methods are commonly used to model customer preferences. We propose testing the strength of the Exponential Regression Model over the Regression Toward the Mean Model.

Keywords: health-care management systems, fuzzy regression, data mining, forecasting, fuzzy membership function

Procedia PDF Downloads 254
25320 Collision Theory Based Sentiment Detection Using Discourse Analysis in Hadoop

Authors: Anuta Mukherjee, Saswati Mukherjee

Abstract:

Data is growing every day. Social networking sites such as Twitter are becoming an integral part of our daily lives, contributing to a large increase in the growth of data. Twitter is a rich source, especially for sentiment detection or mining, since people often express honest opinions through tweets. However, although sentiment analysis is a well-researched topic in text, this analysis using Twitter data poses additional challenges since the data are unstructured, contain abbreviations, and lack strict grammatical correctness. We have employed collision theory to achieve sentiment analysis in Twitter data. We have also incorporated discourse analysis in the collision theory based model to detect accurate sentiment from tweets. We have also used the retweet field to assign weights to certain tweets and obtained the overall weightage of a topic provided in the form of a query. Hadoop has been exploited for speed. Our experiments show effective results.

Keywords: sentiment analysis, twitter, collision theory, discourse analysis

Procedia PDF Downloads 505
25319 Mining User-Generated Contents to Detect Service Failures with Topic Model

Authors: Kyung Bae Park, Sung Ho Ha

Abstract:

Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and the pressing need to handle the overwhelming amount of varied UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a given management suffers from. In this paper, we apply text mining with Latent Dirichlet Allocation (LDA) to a popular online review site dedicated to complaints from users. We find that the employed LDA efficiently detects customer complaints, and that a further inspection with the visualization technique is effective for categorizing the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve reputation management. Our interdisciplinary approach also highlights several insights gained by applying machine learning techniques in the marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R from beginning (data collection in R) to end (LDA analysis in R), since the instruction is still largely undocumented. In this regard, it will help lower the barrier for interdisciplinary researchers to conduct related research.

Keywords: latent dirichlet allocation, R program, text mining, topic model, user generated contents, visualization

Procedia PDF Downloads 164
25318 Recommender System Based on Mining Graph Databases for Data-Intensive Applications

Authors: Mostafa Gamal, Hoda K. Mohamed, Islam El-Maddah, Ali Hamdi

Abstract:

In recent years, many digital documents on the web have been created due to the rapid growth of 'social applications' communities or 'data-intensive applications'. The evolution of online-based multimedia data poses new challenges in storing and querying large amounts of data for online recommender systems. Graph data models have been shown to be more efficient than relational data models for processing complex data. This paper will explain the key differences between graph and relational databases, their strengths and weaknesses, and why graph databases are the best technology for building a realtime recommendation system. The paper will also discuss several similarity metric algorithms that can be used to compute a similarity score for pairs of nodes based on their neighbourhoods or their properties. Finally, the paper will explore how NLP strategies offer the basis for improving the accuracy and coverage of realtime recommendations by extracting information from stored unstructured knowledge, which makes up the bulk of the world's data, to enrich the graph database. As the size and number of data items are increasing rapidly, the proposed system should meet current and future needs.
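As one concrete instance of a neighbourhood-based similarity score between two nodes, the sketch below computes Jaccard similarity over neighbour sets. The abstract does not say which metrics the paper covers, so this choice is an assumption, as are the function name and the tiny user-to-item graph.

```python
def jaccard(neighbours_a, neighbours_b):
    """Jaccard similarity: shared neighbours divided by all distinct neighbours."""
    a, b = set(neighbours_a), set(neighbours_b)
    union = a | b
    if not union:
        return 0.0  # two isolated nodes: treated as dissimilar by convention
    return len(a & b) / len(union)

# Hypothetical bipartite graph stored as adjacency sets: user -> items liked.
graph = {
    "alice": {"item1", "item2", "item3"},
    "bob": {"item2", "item3", "item4"},
    "carol": {"item5"},
}

alice_bob = jaccard(graph["alice"], graph["bob"])
alice_carol = jaccard(graph["alice"], graph["carol"])
print(alice_bob, alice_carol)  # 0.5 0.0
```

In a recommender built this way, items liked by a user's most similar neighbours (here, `item4` for alice via bob) become candidate recommendations.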

Keywords: graph databases, NLP, recommendation systems, similarity metrics

Procedia PDF Downloads 77
25317 A Hybrid Recommendation System Based on Association Rules

Authors: Ahmed Mohammed Alsalama

Abstract:

Recommendation systems are widely used in e-commerce applications. The engine of a current recommendation system recommends items to a particular user based on user preferences and previous high ratings. Various recommendation schemes such as collaborative filtering and content-based approaches are used to build a recommendation system. Most of the current recommendation systems were developed to fit a certain domain such as books, articles, and movies. We propose a hybrid framework recommendation system to be applied on two-dimensional spaces (User x Item) with a large number of Users and a small number of Items. Moreover, our proposed framework makes use of both favorite and non-favorite items of a particular user. The proposed framework is built upon the integration of association rules mining and the content-based approach. The results of experiments show that our proposed framework can provide accurate recommendations to users.
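The association-rule component mentioned in this abstract rests on two standard measures, support and confidence. The sketch below shows how they are computed on a handful of made-up transactions; the function names and the data are illustrative assumptions, and the paper's actual mining procedure is not described in the abstract.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so no confidence can be estimated
    return support(set(antecedent) | set(consequent), transactions) / base

# Hypothetical transactions: each set holds the items one user rated highly.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
print(s, round(c, 3))
```

A rule such as bread -> milk is usually kept only if both values clear chosen thresholds, and the surviving rules can then be used to suggest the consequent item to users who match the antecedent.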

Keywords: data mining, association rules, recommendation systems, hybrid systems

Procedia PDF Downloads 435
25316 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines

Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma

Abstract:

Useful information has been extracted from road accident data in the United Kingdom (UK), using data analytics methods, for avoiding possible accidents in rural and urban areas. This analysis makes use of several methodologies such as data integration, support vector machines (SVM), correlation machines and multinomial goodness. The entire datasets have been imported from the traffic department of the UK with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn avoid unnecessary memory lapses. Since data is expected to grow continuously over time, this work primarily proposes a new framework model which can be trained, adapt itself to new data, and make accurate predictions. This work also sheds some light on the use of SVM methodology for text classifiers built from the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of SVM methodology for this kind of research work.

Keywords: support vector mechanism (SVM), machine learning (ML), support vector machines (SVM), department of transportation (DFT)

Procedia PDF Downloads 248
25315 Power Quality Improvement Using UPQC Integrated with Distributed Generation Network

Authors: B. Gopal, Pannala Krishna Murthy, G. N. Sreenivas

Abstract:

The increasing demand for electric power places emphasis on the need for maximum utilization of renewable energy sources. On the other hand, maintaining power quality to the satisfaction of the utility is an essential requirement. In this paper, the design aspects of a Unified Power Quality Conditioner (UPQC) integrated with a photovoltaic system in a distributed generation network are presented. The proposed system consists of a series inverter and a shunt inverter connected back to back on the dc side, sharing a common dc-link capacitor with the distributed generation through a boost converter. The primary task of the UPQC is to minimize grid voltage and load current disturbances along with reactive and harmonic power compensation. In addition to the primary tasks of the UPQC, other functionalities such as compensation of voltage interruption and active power transfer to the load and grid in both islanding and interconnected modes have been addressed. The simulation model is designed in the MATLAB/Simulation environment, and the results are in good agreement with the published work.

Keywords: distributed generation (DG), interconnected mode, islanding mode, maximum power point tracking (mppt), power quality (PQ), unified power quality conditioner (UPQC), photovoltaic array (PV)

Procedia PDF Downloads 483
25314 Feature Selection for Production Schedule Optimization in Transition Mines

Authors: Angelina Anani, Ignacio Ortiz Flores, Haitao Li

Abstract:

The use of underground mining methods has increased significantly over the past decades. This increase has also been spurred on by several mines transitioning from surface to underground mining. However, determining the transition depth can be a challenging task, especially when coupled with production schedule optimization. Several researchers have simplified the problem by excluding operational features relevant to production schedule optimization. Our research objective is to investigate the extent to which the operational features of transition mines that are accounted for affect the optimal production schedule. We also provide a framework of factors to consider in production schedule optimization for transition mines. An integrated mixed-integer linear programming (MILP) model is developed that maximizes the NPV as a function of production schedule and transition depth. A case study is performed to validate the model, with a comparative sensitivity analysis to obtain operational insights.
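The objective named in this abstract, net present value (NPV), can be illustrated with a short sketch that evaluates it for one fixed schedule. This is only the standard discounting formula, not the authors' MILP; the function name, the end-of-period discounting convention, and the cash flows are assumptions made for the example.

```python
def npv(cash_flows, discount_rate):
    """Net present value of per-period cash flows.

    cash_flows[0] is assumed to arrive at the end of period 1,
    cash_flows[1] at the end of period 2, and so on.
    """
    return sum(
        cf / (1 + discount_rate) ** t
        for t, cf in enumerate(cash_flows, start=1)
    )

# Hypothetical cash flows (in arbitrary money units) for two candidate schedules
# that extract the same total value but in a different order.
early_heavy = npv([121.0, 110.0], discount_rate=0.10)
late_heavy = npv([110.0, 121.0], discount_rate=0.10)
print(round(early_heavy, 2), round(late_heavy, 2))
```

Because later cash flows are discounted more heavily, the schedule that brings value forward scores higher, which is why the timing of the surface-to-underground transition matters to an NPV-maximizing model.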

Keywords: underground mining, transition mines, mixed-integer linear programming, production schedule

Procedia PDF Downloads 137
25313 Effect of Bacillus Pumilus Strains on Heavy Metal Accumulation in Lettuce Grown on Contaminated Soil

Authors: Sabeen Alam, Mehboob Alam

Abstract:

The research work entitled “Effect of Bacillus pumilus strains on heavy metal accumulation in lettuce grown on contaminated soil” focused on the functional role of Bacillus pumilus strains, inoculated onto lettuce seed, in mitigating heavy metals in chromite mining soil. In this experiment, factor A was three Bacillus pumilus strains (sequences C-2 PMW-8, C-1 SSK-8 and C-1 PWK-7). The soil used for this experiment was collected from the Prang Ghar mining site, and lettuce seeds were grown in three levels of chromite mining soil (2.27, 4.65 and 7.14%). For mining soil, the minimum days to germinate were noted in lettuce grown on garden soil inoculated with sequence. The maximum germination percentage was noted for C-1 SSK-8 grown on garden soil, maximum lettuce height for sequence C-2 PWM-8, fresh leaf weight for C-1 PWK-7 inoculated lettuce, dry weight of lettuce leaf for lettuce inoculated with the C-1 SSK-8 and C-1 PWK-7 strains, number of leaves per plant for lettuce inoculated with C-1 SSK-8, leaf area for C-2 PMW-8 inoculated lettuce, survival percentage for C-1 SSK-8 treated lettuce and chlorophyll content for C-2 PMW-8. Results related to heavy metal accumulation showed minimum chromium (Cr), cadmium (Cd), manganese (Mn) and lead (Pb) in lettuce and in soil for all three sequences. It can be concluded that chromite mining soil significantly reduced the growth and survival of lettuce, but inoculation with Bacillus pumilus strains enhanced growth and survival. Similarly, heavy metal accumulation in plant and soil was minimal regardless of the Bacillus pumilus strain used; all three sequences had the same mitigating effect on heavy metals in both soil and lettuce. All three Bacillus pumilus strains ensured a reduction in heavy metal content (Mn, Cd, Cr) in lettuce to below the maximum permissible limits of WHO 2011.

Keywords: bacillus pumilus, heavy metals, permissible limits, lettuce, chromite mining soil, mitigating effect

Procedia PDF Downloads 24
25312 From Two-Way to Multi-Way: A Comparative Study for Map-Reduce Join Algorithms

Authors: Marwa Hussien Mohamed, Mohamed Helmy Khafagy

Abstract:

Map-Reduce is a programming model which is widely used to extract valuable information from enormous volumes of data. Map-Reduce is designed to support heterogeneous datasets. Apache Hadoop Map-Reduce is used extensively to uncover hidden patterns in areas such as data mining and SQL. The most important operation for data analysis is the join operation, but the Map-Reduce framework does not directly support join algorithms. This paper explains and compares two-way and multi-way Map-Reduce join algorithms. We also implement the MR join algorithms and show the performance of each phase. Our experimental results show that, among two-way join algorithms, map-side join and map-merge join take the longest time because of the preprocessing step that sorts the data, and that, among multi-way join algorithms, reduce-side cascade join takes the longest time.
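To make the idea of a join expressed in Map-Reduce terms concrete, the sketch below simulates a two-way reduce-side join in a single Python process. It is an illustrative assumption rather than the authors' Hadoop code: the function names, the "L"/"R" tags, and the toy tables are all invented for the example, and a dictionary stands in for the framework's shuffle phase.

```python
from collections import defaultdict

def map_phase(table_tag, rows, key_index):
    """Mapper: emit (join key, (source tag, row)) for every input row."""
    for row in rows:
        yield row[key_index], (table_tag, row)

def reduce_side_join(left_rows, right_rows, left_key=0, right_key=0):
    # Shuffle: group the tagged records from both tables by join key.
    groups = defaultdict(list)
    for key, tagged in map_phase("L", left_rows, left_key):
        groups[key].append(tagged)
    for key, tagged in map_phase("R", right_rows, right_key):
        groups[key].append(tagged)

    # Reduce: for each key, pair every left row with every right row.
    joined = []
    for key in sorted(groups):
        lefts = [row for tag, row in groups[key] if tag == "L"]
        rights = [row for tag, row in groups[key] if tag == "R"]
        for left in lefts:
            for right in rights:
                joined.append(left + right)
    return joined

# Hypothetical tables keyed on the first column.
users = [(1, "ana"), (2, "ben"), (3, "cy")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
result = reduce_side_join(users, orders)
print(result)
```

All matching happens in the reducer, so every record from both tables crosses the shuffle. Map-side variants avoid that cost but need the inputs sorted and partitioned beforehand, which is the preprocessing overhead the abstract refers to.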

Keywords: Hadoop, MapReduce, multi-way join, two-way join, Ubuntu

Procedia PDF Downloads 459
25311 The Human Right to a Safe, Clean and Healthy Environment in Corporate Social Responsibility's Strategies: An Approach to Understanding Mexico's Mining Sector

Authors: Thalia Viveros-Uehara

Abstract:

The virtues of Corporate Social Responsibility (CSR) are explored widely in the academic literature. However, few studies address its link to human rights, per se; specifically, the right to a safe, clean and healthy environment. Fewer still are the research works in this area that relate to developing countries, where a number of areas are biodiversity hotspots. In Mexico, despite the rise and evolution of CSR schemes, grave episodes of pollution persist, especially those caused by the mining industry. These cases raise the question of the correspondence between the current CSR practices of mining companies in the country and their responsibility to respect the right to a safe, clean and healthy environment. The present study approaches precisely such a bridge, which until now has not been fully tackled in light of Mexico's 2011 constitutional human rights amendment and the United Nations Guiding Principles on Business and Human Rights (UN Guiding Principles), adopted by the Human Rights Council in 2011. To that aim, it initially presents a contextual framework; it then explores qualitatively the adoption of human rights language in the CSR strategies of the three main mining companies in Mexico, and finally, it examines their standing with respect to the UN Guiding Principles. The results reveal that human rights are included in the CSR strategies of the analysed businesses, at least at the rhetorical level; however, they do not embrace the right to a safe, clean and healthy environment as such. 
Moreover, we conclude that despite the finding that corporations publicly express their commitment to respect human rights, some operational weaknesses that hamper the exercise of such responsibility persist; for example, the systematic lack of human rights impact assessments per mining unit, the denial of actual and publicly-known negative episodes on the environment linked directly to their operations, and the absence of effective mechanisms to remediate adverse impacts.

Keywords: corporate social responsibility, environmental impacts, human rights, right to a safe, clean and healthy environment, mining industry

Procedia PDF Downloads 309
25310 Unsupervised Text Mining Approach to Early Warning System

Authors: Ichihan Tai, Bill Olson, Paul Blessner

Abstract:

Traditional early warning systems that warn of crises are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to the traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against a market crash, and spikes in the event of a crisis. In this study, news data is used to predict whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal against VIX index data from CBOE.

Keywords: early warning system, knowledge management, market prediction, topic modeling

Procedia PDF Downloads 308
25309 Distributed Coverage Control by Robot Networks in Unknown Environments Using a Modified EM Algorithm

Authors: Mohammadhosein Hasanbeig, Lacra Pavel

Abstract:

In this paper, we study a distributed control algorithm for the problem of unknown area coverage by a network of robots. The coverage objective is to locate a set of targets in the area and to minimize the robots’ energy consumption. The robots have no prior knowledge of the location or the number of the targets in the area. One efficient approach that can be used to compensate for the robots’ lack of knowledge is to incorporate an auxiliary learning algorithm into the control scheme. A learning algorithm allows the robots to explore and study the unknown environment and to eventually overcome their lack of knowledge. The control algorithm itself is modeled based on game theory, where the network of robots uses its collective information to play a non-cooperative potential game. The algorithm is tested via simulations to verify its performance and adaptability.

Keywords: distributed control, game theory, multi-agent learning, reinforcement learning

Procedia PDF Downloads 430
25308 Distributed Actor System for Traffic Simulation

Authors: Han Wang, Zhuoxian Dai, Zhe Zhu, Hui Zhang, Zhenyu Zeng

Abstract:

In traditional microscopic traffic simulation, various approaches have been suggested to implement single-agent behaviors such as lane changing and the intelligent driver model. However, when it comes to very large metropolitan areas, microscopic traffic simulation requires more resources and becomes time-consuming, so macroscopic traffic simulation is used instead, which aggregates trends of interest rather than individual vehicle traces. In this paper, we describe the architecture and implementation of an actor system for microscopic traffic simulation, which exploits the distributed architecture of modern-day cloud computing. The results demonstrate that our architecture achieves high performance and outperforms all the other traditional microscopic software in all tasks. To the best of our knowledge, this is the first system that enables single-agent behavior in macroscopic traffic simulation. We thus believe it contributes to a new type of system for traffic simulation, which could provide individual vehicle behaviors in microscopic traffic simulation.

Keywords: actor system, cloud computing, distributed system, traffic simulation

Procedia PDF Downloads 166
25307 Probabilistic Approach of Dealing with Uncertainties in Distributed Constraint Optimization Problems and Situation Awareness for Multi-agent Systems

Authors: Sagir M. Yusuf, Chris Baber

Abstract:

In this paper, we describe how Bayesian inferential reasoning contributes to obtaining a well-satisfied prediction for Distributed Constraint Optimization Problems (DCOPs) with uncertainties. We also demonstrate how DCOPs could be merged with multi-agent knowledge understanding and prediction (i.e. Situation Awareness). The DCOP functions were merged with a Bayesian Belief Network (BBN) in the form of situation, awareness, and utility nodes. We describe how the uncertainties can be represented in the BBN and how an effective prediction can be made using the expectation-maximization algorithm or the conjugate gradient descent algorithm. The idea of variable prediction using Bayesian inference may reduce the number of variables in the agents’ sampling domain and also allow the estimation of missing variables. Experimental results showed that the BBN makes compelling predictions with samples containing uncertainties compared with perfect samples. That is, Bayesian inference can help in handling the uncertainties and dynamism of DCOPs, which is a current issue in the DCOP community. We show how Bayesian inference could be formalized with Distributed Situation Awareness (DSA) using uncertain and missing agents’ data. The whole framework was tested on a multi-UAV mission for forest fire searching. Future work focuses on augmenting the existing architecture to deal with dynamic DCOP algorithms and multi-agent information merging.

Keywords: DCOP, multi-agent reasoning, Bayesian reasoning, swarm intelligence

Procedia PDF Downloads 95
25306 Strategies to Enhance Compliance of Health and Safety Standards at the Selected Mining Industries in Limpopo Province, South Africa: Occupational Health Nurse’s Perspective

Authors: Livhuwani Muthelo

Abstract:

The health and safety of miners in the South African mining industry are guided by regulations and standards that are anticipated to promote a healthy work environment and prevent fatalities. It is of utmost importance for the miners to comply with these regulations/standards to protect themselves from potential occupational health and safety risks, accidents, and fatalities. The purpose of this study was to develop and validate strategies to enhance compliance with the health and safety standards within the mining industries of Limpopo province in South Africa. A mixed-method exploratory sequential research design was adopted. The population consisted of 5350 miners. Purposive sampling was used to select the participants in the qualitative strand and stratified random sampling in the quantitative strand. Semi-structured interviews were conducted among the occupational health nurse practitioners and the health and safety team. Thematic analysis was used to generate an understanding of the interviews. In the quantitative strand, a survey was conducted using a self-administered questionnaire. Data were analysed using SPSS version 26.0. Descriptive statistics, including frequencies, means, and standard deviations, were used in the analysis of the data. Cronbach's alpha test was used to measure internal consistency. The integrated results revealed that there are diverse experiences related to health and safety standards compliance among the mineworkers. The main findings were challenges related to leadership compliance and to the cost of maintaining safety; challenges related to miners' behavior; the impact of non-compliance on the overall health of the miners; and the conflict between production and safety. Health and safety compliance is not mere compliance with regulations and standards but a culture that requires the miners and the organization to take responsibility for their behavior and actions towards health and safety, and thus for their own well-being and that of other miners.
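For readers unfamiliar with the internal-consistency measure named above, here is a minimal sketch of Cronbach's alpha using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The study computed this in SPSS; the function name and the sample scores below are hypothetical and only illustrate the calculation.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list per questionnaire item, each holding one score per respondent."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score of each respondent across all items.
    totals = [sum(respondent) for respondent in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical scores: 3 items answered by 4 respondents.
example_items = [[3, 4, 5, 2], [3, 5, 4, 2], [4, 4, 5, 1]]
alpha = cronbach_alpha(example_items)
```

Values closer to 1 indicate that the items vary together, which is usually read as higher internal consistency of the questionnaire.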

Keywords: perceptions, compliance, health and safety, legislation, standards, miners

Procedia PDF Downloads 72
25305 Shark Detection and Classification with Deep Learning

Authors: Jeremy Jenrette, Z. Y. C. Liu, Pranav Chimote, Edward Fox, Trevor Hastie, Francesco Ferretti

Abstract:

Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution coming from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of shark images by sourcing 24,546 images covering 219 species of sharks from the web application sharkPulse and the social network Instagram. We used object detection to extract shark features and inflate this database to 53,345 images. We packaged object-detection and image classification models into a Shark Detector bundle. We developed the Shark Detector to recognize and classify sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches of sharks: boosting training datasets, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a result of training data quantity. The Shark Detector located sharks in baited remote footage and YouTube videos with an average accuracy of 89%, and classified located subjects to the species level with 69% accuracy (n = 8 species). The Shark Detector sorted heterogeneous datasets of images sourced from Instagram with 91% accuracy and classified species with 70% accuracy (n = 17 species). Data-mining Instagram can inflate training datasets and increase the Shark Detector’s accuracy as well as facilitate archiving of historical and novel shark observations. Base accuracy of genus prediction was 68% across 25 genera. The average base accuracy of species prediction within each genus class was 85%. The Shark Detector can classify 45 species. 
All data-generation methods were processed without manual interaction. As media-based remote monitoring strives to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.

Keywords: classification, data mining, Instagram, remote monitoring, sharks

Procedia PDF Downloads 91
25304 Simulation of a Cost Model Response Requests for Replication in Data Grid Environment

Authors: Kaddi Mohammed, A. Benatiallah, D. Benatiallah

Abstract:

Data grid is an emerging technology that raises new challenges, such as the heterogeneity and availability of various geographically distributed resources, fast data access, latency minimization and fault tolerance. Researchers interested in this technology address the problems of the various systems related to the industry, such as task scheduling, load balancing and replication. The latter is an effective solution for achieving good performance in terms of data access and grid resources, as well as better data availability and cost. In a system with replication, a coherence protocol is used to impose some degree of synchronization between the various copies and some order on updates. In this project, we present an approach for placing replicas to minimize the response cost of read and write requests, and we implement our model in a simulation environment. The placement techniques are based on a cost model which depends on several factors, such as bandwidth, data size and storage nodes.
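The abstract does not give the cost model itself, so the following is only a sketch of how bandwidth, data size and storage capacity could be combined when choosing a replica site. Every name, the single-origin assumption, and the rule that each write is propagated from the origin to the replica are assumptions made for this example, not the authors' model.

```python
def transfer_time(size_mb, bandwidth_mbps):
    """Seconds needed to move size_mb megabytes over a link of bandwidth_mbps megabits/s."""
    return size_mb * 8 / bandwidth_mbps

def link_cost(a, b, size_mb, bandwidth):
    """Transfer cost between two nodes; local access is assumed to be free."""
    if a == b:
        return 0.0
    return transfer_time(size_mb, bandwidth[frozenset((a, b))])

def placement_cost(candidate, origin, reads, writes, size_mb, bandwidth):
    """Total cost if one replica is placed on `candidate` (assumed model).

    Each read is served by the cheaper of the origin copy and the replica;
    each write is propagated once from the origin to the replica.
    """
    read_cost = sum(
        count * min(link_cost(node, origin, size_mb, bandwidth),
                    link_cost(node, candidate, size_mb, bandwidth))
        for node, count in reads.items()
    )
    write_cost = writes * link_cost(origin, candidate, size_mb, bandwidth)
    return read_cost + write_cost

def best_replica_site(origin, reads, writes, size_mb, bandwidth, free_storage_mb):
    """Pick the cheapest node that has enough free storage, or None if none qualifies."""
    feasible = [node for node, free in free_storage_mb.items()
                if node != origin and free >= size_mb]
    if not feasible:
        return None
    return min(feasible, key=lambda node: placement_cost(
        node, origin, reads, writes, size_mb, bandwidth))

# Hypothetical three-node grid: A holds the original 100 MB file.
bandwidth = {frozenset(("A", "B")): 100, frozenset(("A", "C")): 10,
             frozenset(("B", "C")): 100}
reads = {"B": 5, "C": 10}
free_storage_mb = {"B": 500, "C": 50}
site = best_replica_site("A", reads, 2, 100, bandwidth, free_storage_mb)
```

In this toy setup node C issues the most reads but lacks storage for the file, so the storage constraint and the bandwidth term together decide the placement.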

Keywords: response time, query, consistency, bandwidth, storage capacity, CERN

Procedia PDF Downloads 249
25303 Modifying Byzantine Fault Detection Using Disjoint Paths

Authors: Mehmet Hakan Karaata, Ali Hamdan, Omer Yusuf Adam Mohamed

Abstract:

Consider a distributed system that delivers messages from one process to another. Such a system is often required to deliver each message to its destination regardless of whether or not the system components experience arbitrary forms of faults. In addition, each message received by the destination must be a message sent by a system process. In this paper, we first identify the necessary and sufficient conditions to detect a restricted form of Byzantine faults referred to as modifying Byzantine faults. An observable form of a Byzantine fault whose effect is limited to the modification of a message's metadata or content, timing and omission faults, and message replay is referred to as a modifying Byzantine fault. We then present a distributed protocol to detect modifying Byzantine faults using an optimal number of messages over node-disjoint paths.
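The protocol itself is not described in the abstract. The sketch below only illustrates the general idea of using node-disjoint paths for detection: the source sends one copy per path and the destination flags a fault when copies differ or go missing. The function names and the way faulty nodes are modelled are assumptions for the example; timing faults and replay are not modelled.

```python
def send_over_path(message, path, faulty):
    """Relay a message hop by hop; a faulty node may alter it or drop it (None)."""
    for node in path:
        if node in faulty:
            message = faulty[node](message)
            if message is None:  # omission fault: the copy never arrives
                return None
    return message

def detect_modification(message, disjoint_paths, faulty):
    """Send one copy per node-disjoint path and compare what arrives.

    Returns (fault_detected, received_copies).
    """
    received = [send_over_path(message, path, faulty) for path in disjoint_paths]
    fault_detected = None in received or len(set(received)) > 1
    return fault_detected, received

# Hypothetical topology: three node-disjoint paths of intermediate nodes.
paths = [["a", "b"], ["c"], ["d", "e"]]
tampering = {"c": lambda m: m + "!"}  # node c alters the content
detected, copies = detect_modification("hello", paths, tampering)
```

Because the paths share no intermediate node, each faulty node can affect at most one copy. In this simplified setting, with fewer faulty nodes than paths at least one copy arrives unmodified, so a modified or missing copy on another path shows up as a mismatch.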

Keywords: Byzantine faults, distributed systems, fault detection, network protocols, node-disjoint paths

Procedia PDF Downloads 536
25302 Helping the Development of Public Policies with Knowledge of Criminal Data

Authors: Diego De Castro Rodrigues, Marcelo B. Nery, Sergio Adorno

Abstract:

The project aims to develop a framework for social data analysis, particularly by mobilizing criminal records and applying descriptive computational techniques, such as associative algorithms and extraction of decision tree rules, among others. The methods and instruments discussed in this work will enable the discovery of patterns, providing a guided means to identify similarities between recurring situations in the social sphere using descriptive techniques and data visualization. The study area has been defined as the city of São Paulo, with the structuring of social data as the central idea, with a particular focus on the quality of the information. Given this, a set of tools will be validated, including the use of a database and tools for visualizing the results. Among the main deliverables related to products and the development of articles are the discoveries made during the research phase. The effectiveness and utility of the results will depend on studies involving real data, validated both by domain experts and by identifying and comparing the patterns found in this study with other phenomena described in the literature. The intention is to contribute to evidence-based understanding and decision-making in the social field.

Keywords: social data analysis, criminal records, computational techniques, data mining, big data

Procedia PDF Downloads 53
25301 A Hybrid Distributed Algorithm for Multi-Objective Dynamic Flexible Job Shop Scheduling Problem

Authors: Aydin Teymourifar, Gurkan Ozturk

Abstract:

In this paper, a hybrid distributed algorithm is suggested for the multi-objective dynamic flexible job shop scheduling problem. The proposed algorithm is high level, in that several algorithms search the space on different machines simultaneously. It is also a hybrid algorithm that takes advantage of artificial intelligence, evolutionary and optimization methods. Distribution is done at different levels, and new approaches are used in the design of the algorithm. The Apache Spark and Hadoop frameworks have been used for the distribution of the algorithm. The Pareto optimality approach is used for solving the multi-objective benchmarks. The suggested algorithm, which is able to solve large-size problems in short times, has been compared with successful algorithms from the literature. The results show the high speed and efficiency of the algorithm.
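To make the Pareto optimality approach concrete, the sketch below filters a set of candidate schedules down to the non-dominated ones, assuming every objective is minimized. It is a generic illustration, not the authors' algorithm; the objective pair (makespan, total tardiness) and the sample values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical schedules scored as (makespan, total tardiness).
schedules = [(10, 5), (8, 7), (12, 4), (9, 9), (10, 6)]
front = pareto_front(schedules)
```

This pairwise filter is quadratic in the number of candidates, which is fine for a sketch; a distributed solver would typically merge the local fronts found on each machine and filter again.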

Keywords: distributed algorithms, apache-spark, Hadoop, flexible dynamic job shop scheduling, multi-objective optimization

Procedia PDF Downloads 323
25300 Reimagine and Redesign: Augmented Reality Digital Technologies and 21st Century Education

Authors: Jasmin Cowin

Abstract:

Augmented reality digital technologies, big data, and the need for a teacher workforce able to meet the demands of a knowledge-based society are poised to lead to major changes in the field of education. This paper explores applications and educational use cases of augmented reality digital technologies for educational organizations during the Fourth Industrial Revolution. The Fourth Industrial Revolution requires vision, flexibility, and innovative educational conduits by governments and educational institutions to remain competitive in a global economy. Educational organizations will need to focus on teaching in and for a digital age to continue offering academic knowledge relevant to 21st-century markets and changing labor force needs. Implementation of contemporary disciplines will need to be embodied through learners’ active knowledge-making experiences while embracing ubiquitous accessibility. The power of distributed ledger technology promises major streamlining for educational record-keeping, degree conferrals, and authenticity guarantees. Augmented reality digital technologies hold the potential to restructure educational philosophies and their underpinning pedagogies thereby transforming modes of delivery. Structural changes in education and governmental planning are already increasing through intelligent systems and big data. Reimagining and redesigning education on a broad scale is required to plan and implement governmental and institutional changes to harness innovative technologies while moving away from the big schooling machine.

Keywords: fourth industrial revolution, artificial intelligence, big data, education, augmented reality digital technologies, distributed ledger technology

Procedia PDF Downloads 250
25299 Energy System Analysis Using Data-Driven Modelling and Bayesian Methods

Authors: Paul Rowley, Adam Thirkill, Nick Doylend, Philip Leicester, Becky Gough

Abstract:

The dynamic performance of all energy generation technologies is impacted to varying degrees by the stochastic properties of the wider system within which the generation technology is located. This stochasticity can include the varying nature of ambient renewable energy resources such as wind or solar radiation, or unpredicted changes in energy demand which impact upon the operational behaviour of thermal generation technologies. An understanding of these stochastic impacts is especially important in contexts such as highly distributed (or embedded) generation, where issues affecting the individual or aggregated performance of high numbers of relatively small generators must be understood, such as in ESCO projects. Probabilistic evaluation of monitored or simulated performance data is one technique which can provide an insight into the dynamic performance characteristics of generating systems, both in a prognostic sense (such as the prediction of future performance at the project’s design stage) as well as in a diagnostic sense (such as in the real-time analysis of underperforming systems). In this work, we describe the development, application and outcomes of a new approach to the acquisition of datasets suitable for use in the subsequent performance and impact analysis (including the use of Bayesian approaches) for a number of distributed generation technologies. The application of the approach is illustrated using a number of case studies involving domestic and small commercial scale photovoltaic, solar thermal and natural gas boiler installations, and the results as presented show that the methodology offers significant advantages in terms of plant efficiency prediction or diagnosis, along with allied environmental and social impacts such as greenhouse gas emission reduction or fuel affordability.

Keywords: renewable energy, dynamic performance simulation, Bayesian analysis, distributed generation

Procedia PDF Downloads 471