Search results for: parallel data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25828

25348 Investigating Dynamic Transition Process of Issues Using Unstructured Text Analysis

Authors: Myungsu Lim, William Xiu Shun Wong, Yoonjin Hyun, Chen Liu, Seongi Choi, Dasom Kim, Namgyu Kim

Abstract:

The amount of real-time data generated through various mass media has been increasing rapidly. In this study, we performed topic analysis using unstructured text data distributed through news articles. As one of the most prevalent applications of topic analysis, the issue tracking technique investigates changes in the social issues identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of documents that cover an entire period at the same time and analyzing the occurrence of each topic by period. However, this traditional approach has the limitation that it cannot discover the dynamic mutation process of complex social issues. The purpose of this study is to overcome the limitations of the existing issue tracking method. We first derive the core issues of each period and then discover the dynamic mutation process of various issues. We further analyze the mutation process from the perspective of issue categories in order to figure out the pattern of issue flow, including the frequency and reliability of the pattern. In other words, this study allows us to understand the components of complex issues by tracking the dynamic history of issues. This methodology can facilitate a clearer understanding of complex social phenomena by providing mutation history and related category information of the phenomena.

Keywords: Data Mining, Issue Tracking, Text Mining, Topic Analysis, Topic Detection, Trend Detection

Procedia PDF Downloads 393
25347 A Hybrid Recommendation System Based on Association Rules

Authors: Ahmed Mohammed Alsalama

Abstract:

Recommendation systems are widely used in e-commerce applications. The engine of a current recommendation system recommends items to a particular user based on user preferences and previous high ratings. Various recommendation schemes such as collaborative filtering and content-based approaches are used to build a recommendation system. Most of the current recommendation systems were developed to fit a certain domain such as books, articles, and movies. We propose a hybrid framework recommendation system to be applied on two-dimensional spaces (User x Item) with a large number of Users and a small number of Items. Moreover, our proposed framework makes use of both favorite and non-favorite items of a particular user. The proposed framework is built upon the integration of association rules mining and the content-based approach. The results of experiments show that our proposed framework can provide accurate recommendations to users.
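The abstract does not give the details of its rule-mining step, so the following is only a minimal sketch of how support and confidence, the two standard measures behind association rules, could be computed over per-user item sets. The transactions, item names, and function names are hypothetical and are not taken from the paper.

```python
# Hypothetical per-user "favorite item" sets; not data from the paper.
TRANSACTIONS = [
    {"book", "movie"},
    {"book", "movie", "article"},
    {"book", "article"},
    {"movie", "article"},
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


if __name__ == "__main__":
    print(support(TRANSACTIONS, {"book", "movie"}))
    print(confidence(TRANSACTIONS, {"book"}, {"movie"}))
```

A rule such as book -> movie would typically be kept only if both measures exceed user-chosen thresholds; how the paper combines such rules with content-based scores is not stated in the abstract.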

Keywords: data mining, association rules, recommendation systems, hybrid systems

Procedia PDF Downloads 448
25346 SPARK: An Open-Source Knowledge Discovery Platform That Leverages Non-Relational Databases and Massively Parallel Computational Power for Heterogeneous Genomic Datasets

Authors: Thilina Ranaweera, Enes Makalic, John L. Hopper, Adrian Bickerstaffe

Abstract:

Data are the primary asset of biomedical researchers, and the engine for both discovery and research translation. As the volume and complexity of research datasets increase, especially with new technologies such as large single nucleotide polymorphism (SNP) chips, so too does the requirement for software to manage, process and analyze the data. Researchers often need to execute complicated queries and conduct complex analyses of large-scale datasets. Existing tools to analyze such data, and other types of high-dimensional data, unfortunately suffer from one or more major problems. They typically require a high level of computing expertise, are too simplistic (i.e., do not fit realistic models that allow for complex interactions), are limited by computing power, do not exploit the computing power of large-scale parallel architectures (e.g. supercomputers, GPU clusters etc.), or are limited in the types of analysis available, compounded by the fact that integrating new analysis methods is not straightforward. Solutions to these problems, such as those developed and implemented on parallel architectures, are currently available to only a relatively small portion of medical researchers with access and know-how. The past decade has seen a rapid expansion of data management systems for the medical domain. Much attention has been given to systems that manage phenotype datasets generated by medical studies. The introduction of heterogeneous genomic data for research subjects that reside in these systems has highlighted the need for substantial improvements in software architecture. To address this problem, we have developed SPARK, an enabling and translational system for medical research, leveraging existing high performance computing resources, and analysis techniques currently available or being developed. It builds these into The Ark, an open-source web-based system designed to manage medical data.
SPARK provides a next-generation biomedical data management solution that is based upon a novel Micro-Service architecture and Big Data technologies. The system serves to demonstrate the applicability of Micro-Service architectures for the development of high performance computing applications. When applied to high-dimensional medical datasets such as genomic data, relational data management approaches with normalized data structures suffer from unfeasibly high execution times for basic operations such as insert (i.e. importing a GWAS dataset) and the queries that are typical of the genomics research domain. SPARK resolves these problems by incorporating non-relational NoSQL databases that have been driven by the emergence of Big Data. SPARK provides researchers across the world with user-friendly access to state-of-the-art data management and analysis tools while eliminating the need for high-level informatics and programming skills. The system will benefit health and medical research by eliminating the burden of large-scale data management, querying, cleaning, and analysis. SPARK represents a major advancement in genome research technologies, vastly reducing the burden of working with genomic datasets, and enabling cutting edge analysis approaches that have previously been out of reach for many medical researchers.

Keywords: biomedical research, genomics, information systems, software

Procedia PDF Downloads 261
25345 Applying Big Data Analysis to Efficiently Exploit the Vast Unconventional Tight Oil Reserves

Authors: Shengnan Chen, Shuhua Wang

Abstract:

Successful production of hydrocarbons from unconventional tight oil reserves has changed the energy landscape in North America. The oil contained within these reservoirs typically will not flow to the wellbore at economic rates without assistance from advanced horizontal wells and multi-stage hydraulic fracturing. Efficient and economic development of these reserves is a priority of society, government, and industry, especially under the current low oil prices. Meanwhile, society needs technological and process innovations to enhance oil recovery while concurrently reducing environmental impacts. Recently, big data analysis and artificial intelligence have become very popular, developing data-driven insights for better designs and decisions in various engineering disciplines. However, the application of data mining in petroleum engineering is still in its infancy. The objective of this research is to apply intelligent data analysis and data-driven models to exploit unconventional oil reserves both efficiently and economically. More specifically, a comprehensive database including the reservoir geological data, reservoir geophysical data, well completion data and production data for thousands of wells is first established to discover valuable insights and knowledge related to tight oil reserves development. Several data analysis methods are introduced to analyze such a huge dataset. For example, K-means clustering is used to partition all observations into clusters; principal component analysis is applied to emphasize the variation and bring out strong patterns in the dataset, making the big data easy to explore and visualize; exploratory factor analysis (EFA) is used to identify the complex interrelationships between well completion data and well production data.
Different data mining techniques, such as artificial neural networks, fuzzy logic, and machine learning techniques, are then summarized, and appropriate ones are selected to analyze the database based on prediction accuracy, model robustness, and reproducibility. Advanced knowledge and patterns are finally recognized and integrated into a modified self-adaptive differential evolution optimization workflow to enhance the oil recovery and maximize the net present value (NPV) of the unconventional oil resources. This research will advance the knowledge in the development of unconventional oil reserves and bridge the gap between big data and performance optimization in these formations. The newly developed data-driven optimization workflow is a powerful approach to guide field operation, which leads to better designs, higher oil recovery and economic return of future wells in the unconventional oil reserves.

Keywords: big data, artificial intelligence, enhanced oil recovery, unconventional oil reserves

Procedia PDF Downloads 275
25344 Discovering the Effects of Meteorological Variables on the Air Quality of Bogota, Colombia, by Data Mining Techniques

Authors: Fabiana Franceschi, Martha Cobo, Manuel Figueredo

Abstract:

Bogotá, the capital of Colombia, is its largest city and one of the most polluted in Latin America due to the fast economic growth over the last ten years. Bogotá has been affected by high pollution events which led to high concentrations of PM10 and NO2, exceeding the local 24-hour legal limits (100 and 150 µg/m3, respectively). The most important pollutants in the city are PM10 and PM2.5 (which are associated with respiratory and cardiovascular problems), and it is known that their concentrations in the atmosphere depend on the local meteorological factors. Therefore, it is necessary to establish a relationship between the meteorological variables and the concentrations of atmospheric pollutants such as PM10, PM2.5, CO, SO2, NO2 and O3. This study aims to determine the interrelations between meteorological variables and air pollutants in Bogotá, using data mining techniques. Data from 13 monitoring stations were collected from the Bogotá Air Quality Monitoring Network within the period 2010-2015. The Principal Component Analysis (PCA) algorithm was applied to obtain primary relations between all the parameters, and afterwards, the K-means clustering technique was implemented to corroborate the relations found previously and to find patterns in the data. PCA was also used on a per-shift basis (morning, afternoon, night and early morning) to validate possible variation of the previous trends, and on a per-year basis to verify that the identified trends persisted throughout the study period. Results demonstrated that wind speed, wind direction, temperature, and NO2 are the most influential factors on PM10 concentrations. Furthermore, it was confirmed that high humidity episodes increased PM2.5 levels. It was also found that there are directly proportional relationships between O3 levels and wind speed and radiation, while there is an inverse relationship between O3 levels and humidity.
Concentrations of SO2 increase with the presence of PM10 and decrease with the wind speed and wind direction. The results also showed a decreasing trend of pollutant concentrations over the last five years. Also, in rainy periods (March-June and September-December) some trends regarding precipitation were stronger. Results obtained with K-means demonstrated that it was possible to find patterns in the data, and they also showed similar conditions and data distribution among the Carvajal, Tunal and Puente Aranda stations, and also between Parque Simon Bolivar and Las Ferias. It was verified that the aforementioned trends prevailed during the study period by applying the same technique per year. It was concluded that the PCA algorithm is useful for establishing preliminary relationships among variables, and K-means clustering for finding patterns in the data and understanding their distribution. The discovery of patterns in the data allows these clusters to be used as an input to an Artificial Neural Network prediction model.
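To make the clustering step described above more concrete, here is a minimal K-means sketch in plain Python. It is not the authors' implementation: the readings (wind speed in m/s, PM10 in µg/m3) are invented for illustration, the function name `kmeans` is an assumption, and real data would normally be standardized (and possibly PCA-reduced) first.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


# Hypothetical (wind speed, PM10) readings; not data from the study.
READINGS = [(1.0, 80.0), (1.2, 85.0), (0.8, 90.0), (4.0, 30.0), (4.5, 25.0), (5.0, 35.0)]

if __name__ == "__main__":
    centroids, labels = kmeans(READINGS, k=2)
    print(centroids, labels)
```

With these toy readings the low-wind, high-PM10 observations end up in one cluster and the high-wind, low-PM10 observations in the other, which is the kind of grouping the abstract reports across monitoring stations.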

Keywords: air pollution, air quality modelling, data mining, particulate matter

Procedia PDF Downloads 250
25343 A New Approach for Improving Accuracy of Multi Label Stream Data

Authors: Kunal Shah, Swati Patel

Abstract:

Many real-world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. Classification is used to predict the class of an unseen instance as accurately as possible. Multi-label classification is a variant of single-label classification in which a set of labels is associated with a single instance. Multi-label classification is used by modern applications such as text classification, functional genomics, image classification, and music categorization. This paper introduces the task of multi-label classification, methods for multi-label classification, and evaluation measures for multi-label classification. A comparative analysis of multi-label classification methods was also carried out, first on the basis of a theoretical study and then on the basis of simulation on various data sets.

Keywords: binary relevance, concept drift, data stream mining, MLSC, multiple window with buffer

Procedia PDF Downloads 576
25342 Fuzzy Logic Classification Approach for Exponential Data Set in Health Care System for Predication of Future Data

Authors: Manish Pandey, Gurinderjit Kaur, Meenu Talwar, Sachin Chauhan, Jagbir Gill

Abstract:

Health-care management systems are of great relevance because they provide easy and fast management of all aspects relating to a patient, not necessarily medical. Moreover, there are more and more cases of pathologies in which diagnosis and treatment can only be carried out by using medical imaging techniques. With an ever-increasing prevalence, medical images are directly acquired in or converted into digital form for their storage as well as subsequent retrieval and processing. Data mining is the process of extracting information from large data sets using algorithms and techniques drawn from the fields of statistics, machine learning and database management systems. Forecasting is a prediction of what will occur in the future, and it is an uncertain process. Because of the uncertainty, the accuracy of a forecast is as important as the outcome predicted by forecasting the independent variables. A forecast control must be used to establish whether the accuracy of the forecast is within satisfactory limits. Fuzzy regression methods have commonly been used to develop consumer preference models that correlate the engineering characteristics with consumer preferences regarding a new product; the consumer preference models provide a platform whereby product developers can decide the engineering characteristics in order to satisfy consumer preferences before developing the product. Recent research shows that these fuzzy regression methods are commonly used to model customer preferences. We propose testing the strength of an exponential regression model against a regression-toward-the-mean model.

Keywords: health-care management systems, fuzzy regression, data mining, forecasting, fuzzy membership function

Procedia PDF Downloads 269
25341 Mining in Peru and Local Governance: Assessing the Contribution of CRS Projects

Authors: Sandra Carrillo Hoyos

Abstract:

Mining activities in South America have grown significantly during the last decades, given the abundance of natural resources, the governmental policies implemented to incentivize foreign investment, and the boom in international prices for metals and oil between 2002 and 2008. While this context allowed the region to occupy a leading position among the top producers of minerals around the world, it has also meant an increase in socio-environmental conflicts, which have generated costs and negative impacts not only for the companies but especially for the governments and local communities. During the latest decade, the mining sector in Peru has faced social resistance from a large number of communities, which began organizing actions against the implementation of high-investment projects. The dissatisfaction has resulted in the prevalence of socio-environmental conflicts associated with mining activities, some of them never resolved by agreement. In order to prevent those socio-environmental conflicts and obtain the social license from local communities, most of the mining companies have developed diverse initiatives within the framework of policies and practices of corporate social responsibility (CSR). This paper assesses the mining sector's contribution toward local development management over the last decade, as part of CSR strategies as well as the policies promoted by the Peruvian State. This assessment found that, in the beginning, these initiatives were based on a philanthropic approach and reacted to pressures from local stakeholders in order to maintain the consent to operate from the surrounding communities and, as a result, to create a harmonious atmosphere for operations.
Due to the weak State presence, such practices have increased the expectations of communities regarding the participation of mining companies in solving structural development problems, especially those related to primary needs, infrastructure, education, and health, among others. In other words, this paper focuses on analyzing to what extent these initiatives have promoted local empowerment for development planning and integrated management of natural resources from a territorial approach. From this perspective, the analysis demonstrates that, while the design and planning of social investment initiatives have improved due to the sector's sustainability approach, many companies have developed actions beyond their competence during this process. In some cases, these actions have generated dependency among communities, even though this relationship has not spared the companies conflict situations with unfortunate consequences. Furthermore, the social programs developed have not necessarily generated a significant impact in improving the quality of life of affected populations. In fact, it is possible to identify that regions with high mining resources and investment are facing a situation of poverty and high dependency on mining production. In spite of the revenues derived from the mining industry, local governments have not been able to translate the royalties into sustainable development opportunities. For this reason, the paper suggests some challenges for the mining sector's contribution to local development, based on the best practices and lessons learnt from a benchmarking of the leading mining companies.

Keywords: corporate social responsibility, local development, mining, socio-environmental conflict

Procedia PDF Downloads 392
25340 Health and Safety Risk Assesment with Electromagnetic Field Exposure for Call Center Workers

Authors: Dilsad Akal

Abstract:

Aim: Companies communicate with each other and with their customers via call centers. Call centers are described as stressful because of their uncertain working hours, inadequate relief time, performance-based system and heavy workload. In the literature, this sector is described as being as risky as the mining sector in terms of health and safety. The aim of this research is to shed light on this relatively unexplored area. Subject and Methods: The data for this study were collected during April-May 2015 at two selected call centers in different parts of Turkey. The questionnaire mostly investigated the health conditions of call center workers. Electromagnetic field measurements were completed at the same time as the questionnaire was administered. The proportion of employees reached was 73% for the first call center and 87% for the second. Results: The electromagnetic field measurements ranged between 32 V/m and 371 V/m for the first location and between 61 V/m and 370 V/m for the second. The general complaints of the employees at both workplaces were inadequate relief time, inadequate air conditioning, disturbance, poor thermal conditions, and inadequate or extreme lighting. Furthermore, musculoskeletal discomfort, stress, and ear and eye discomfort are the main health problems of employees. Conclusion: The measured values and the responses to the questionnaire were found to be in line with other similar research results in the literature. At the end of this survey, a risk map of the workplace was prepared in terms of safety and health at work in general, and some suggestions for resolution were provided.

Keywords: call center, health and safety, electromagnetic field, risk map

Procedia PDF Downloads 172
25339 Lead and Cadmium Spatial Pattern and Risk Assessment around Coal Mine in Hyrcanian Forest, North Iran

Authors: Mahsa Tavakoli, Seyed Mohammad Hojjati, Yahya Kooch

Abstract:

In this study, the effect of coal mining activities on lead and cadmium concentrations and distribution in soil was investigated in the Hyrcanian forest, North Iran. Sixteen plots (20×20 m2) were established by systematic random sampling (60×60 m2) in an area of 4 ha (200×200 m2, with the mine entrance placed at the center). An adjacent area not affected by the mining activity was considered the control area. In order to investigate soil lead and cadmium concentrations, one sample was taken from the 0-10 cm layer in each plot. To study the spatial pattern of soil properties and lead and cadmium concentrations in the mining area, an area of 80×80 m2 (with the mine at the center) was considered and 80 soil samples were taken systematic-randomly (10 m intervals). Geostatistical analysis was performed via the Kriging method and GS+ software (version 5.1). In order to estimate the impact of coal mining activities on soil quality, pollution indices were calculated. Lead and cadmium concentrations were significantly higher in the mine area (Pb: 10.97±0.30, Cd: 184.47±6.26 mg.kg-1) than in the control area (Pb: 9.42±0.17, Cd: 131.71±15.77 mg.kg-1). The mean values of the PI index indicate slight pollution for Pb (1.16) and Cd (1.77). Results of the NIPI index showed slight pollution for Pb (1.44) and moderate pollution for Cd (2.52). Results of variography and the kriging method showed that it is possible to prepare interpolation maps of lead and cadmium around the mining areas in the Hyrcanian forest. According to the results of the pollution and risk assessments, the forest soil was contaminated by heavy metals (lead and cadmium); therefore, using reclamation and remediation techniques in these areas is necessary.
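The abstract does not state how PI and NIPI were computed. The sketch below assumes the definitions commonly used in the soil-pollution literature: a single-factor index PI = C / B (measured concentration over a background or reference value) and a Nemerow integrated index NIPI = sqrt((mean(PI)^2 + max(PI)^2) / 2). The function names are hypothetical, and the authors' reference values and averaging may differ.

```python
import math


def pollution_index(concentration, background):
    """Single-factor index PI = C / B (assumed definition, not stated in the abstract)."""
    return concentration / background


def nemerow_index(pi_values):
    """Nemerow integrated pollution index over a set of PI values (assumed definition)."""
    mean_pi = sum(pi_values) / len(pi_values)
    max_pi = max(pi_values)
    return math.sqrt((mean_pi ** 2 + max_pi ** 2) / 2)


if __name__ == "__main__":
    # Ratio of the reported mine and control means for Pb (mg/kg).
    print(round(pollution_index(10.97, 9.42), 2))
    # Illustrative PI values only; not sample data from the study.
    print(nemerow_index([1.0, 2.0]))
```

The ratio of the reported Pb means gives about 1.16, matching the PI quoted for Pb; the other reported values cannot be reproduced from the abstract alone, since the per-sample data and background values are not given.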

Keywords: traditional coal mining, heavy metals, pollution indicators, geostatistics, Caspian forest

Procedia PDF Downloads 170
25338 A Two Arm Double Parallel Randomized Controlled Trail of the Effects of Health Education Intervention on Insecticide Treated Nets Use and Its Practices among Pregnant Women Attending Antenatal Clinic: Study Protocol

Authors: Opara Monica, Suriani Ismail, Ahmad Iqmer Nashriq Mohd Nazan

Abstract:

The true magnitude of the mortality and morbidity attributable to malaria worldwide is, at best, a scientific guess, although it is not disputable that the greatest burden is in sub-Saharan Africa. Those at highest risk are children younger than 5 years and pregnant women, particularly primigravidae. Nationally, malaria remains the third leading cause of death and is still considered a major public health problem. Therefore, this study aims to assess the effectiveness of a health education intervention on insecticide-treated net use and its practices among pregnant women attending antenatal clinics. Materials and Methods: This study will be an intervention study with a two-arm, double parallel randomized controlled trial (blinded) to be conducted in 3 stages. The first stage will develop a health belief model (HBM) program, while in the second stage, pregnant women will be recruited, assessed (baseline data), randomized into the two arms of the study, and followed up for six months. The third stage will evaluate the impact of the intervention on HBM and disseminate the findings. Data will be collected with a structured questionnaire containing validated tools. The main outcome measurement will be the treatment effect using HBM, while data will be analysed using SPSS, version 22. Discussion: The study will contribute to the existing knowledge on hospital-based care programs for pregnant women in developing countries where the literature is scanty. It will generally give insight into the importance of HBM measurement in interventional studies on malaria and other related infectious diseases in this setting.

Keywords: malaria, health education, insecticide-treated nets, sub-Saharan Africa

Procedia PDF Downloads 112
25337 Strategies to Enhance Compliance of Health and Safety Standards at the Selected Mining Industries in Limpopo Province, South Africa: Occupational Health Nurse’s Perspective

Authors: Livhuwani Muthelo

Abstract:

The health and safety of miners in the South African mining industry are guided by regulations and standards which are intended to promote a healthy work environment and prevent fatalities. It is of utmost importance for miners to comply with these regulations and standards to protect themselves from potential occupational health and safety risks, accidents, and fatalities. The purpose of this study was to develop and validate strategies to enhance compliance with health and safety standards within the mining industries of Limpopo Province in South Africa. A mixed-method exploratory sequential research design was adopted. The population consisted of 5350 miners. Purposive sampling was used to select the participants in the qualitative strand and stratified random sampling in the quantitative strand. Semi-structured interviews were conducted among the occupational health nurse practitioners and the health and safety team. Thematic analysis was used to generate an understanding of the interviews. In the quantitative strand, a survey was conducted using a self-administered questionnaire. Data were analysed using SPSS version 26.0. Descriptive statistics, including frequencies, means, and standard deviations, were used in the analysis of the data. Cronbach's alpha was used to measure internal consistency. The integrated results revealed diverse experiences related to health and safety standards compliance among the mineworkers. The main findings were challenges related to leadership compliance and to the cost of maintaining safety, challenges related to miners' behavior, the impact of non-compliance on the overall health of the miners, and the conflict between production and safety. Health and safety compliance is not mere compliance with regulations and standards but a culture that requires the miners and the organization to take responsibility for their behavior and actions towards health and safety, and thus for their own well-being and that of other miners.

Keywords: perceptions, compliance, health and safety, legislation, standards, miners

Procedia PDF Downloads 92
25336 The Risk of Ground Movements After Digging Two Parallel Vertical Tunnel in Urban

Authors: Djelloul Chafia, Demagh Rafik, Kareche Toufik

Abstract:

Human activities carried out without precautions accelerate the degradation of the soil structure and reduce its resistance. Operations such as tunnel construction may exert a more or less permanent influence on the surrounding ground; because these structures alter the soil, it is necessary to predict their impacts by suitable measures. This research is a numerical analysis that deals with the risks and effects due to the weakening of the soil after digging two parallel vertical circular tunnels in urban areas, and suggests forecasting techniques based essentially on the organization of underground space. The simulations are performed using the finite-difference code FLAC in a two-dimensional case and with an elasto-plastic behavior of the soil.

Keywords: soil, weakening, degradation, prevention, tunnel

Procedia PDF Downloads 551
25335 Collision Theory Based Sentiment Detection Using Discourse Analysis in Hadoop

Authors: Anuta Mukherjee, Saswati Mukherjee

Abstract:

Data is growing every day. Social networking sites such as Twitter are becoming an integral part of our daily lives, contributing to a large increase in the growth of data. Twitter is a rich source, especially for sentiment detection or mining, since people often express honest opinions through tweets. However, although sentiment analysis is a well-researched topic in text, this analysis using Twitter data poses additional challenges, since tweets are unstructured data with abbreviations and without strict grammatical correctness. We have employed collision theory to achieve sentiment analysis in Twitter data. We have also incorporated discourse analysis in the collision theory based model to detect accurate sentiment from tweets. We have also used the retweet field to assign weights to certain tweets and obtained the overall weightage of a topic provided in the form of a query. Hadoop has been exploited for speed. Our experiments show effective results.

Keywords: sentiment analysis, twitter, collision theory, discourse analysis

Procedia PDF Downloads 520
25334 Fractional Residue Number System

Authors: Parisa Khoshvaght, Mehdi Hosseinzadeh

Abstract:

During the past few years, the Residue Number System (RNS) has been receiving considerable interest due to its parallel and fault-tolerant properties. This system is a useful tool for Digital Signal Processing (DSP) since it can support parallel, carry-free, high-speed and low-power arithmetic. One of the drawbacks of the Residue Number System is the handling of fractional numbers; that is, the corresponding circuit is very hard to realize in conventional CMOS technology. In this paper, we propose a method in which the number of transistors is significantly reduced. The related delay is also greatly diminished. As a first step, we use this method to solve the problem for numbers with one decimal fractional digit; the proposal can be extended to generalize the idea. Another advantage of this method is its independence from the kind of moduli.
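As background for the carry-free property mentioned above, the following sketch shows ordinary integer RNS arithmetic: each number is stored as its residues modulo pairwise coprime moduli, addition and multiplication act on each residue channel independently, and the Chinese Remainder Theorem recovers the result. The moduli (3, 5, 7) and all function names are illustrative assumptions; the fractional extension proposed in the paper is not reproduced here.

```python
from math import prod

MODULI = (3, 5, 7)  # pairwise coprime; dynamic range is 3 * 5 * 7 = 105


def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as a tuple of residues."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition: no carries propagate between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication, again independent per modulus."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Decode via the Chinese Remainder Theorem (valid for results below prod(moduli))."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse, Python 3.8+
    return total % big_m


if __name__ == "__main__":
    print(from_rns(rns_add(to_rns(17), to_rns(23))))
    print(from_rns(rns_mul(to_rns(9), to_rns(11))))
```

Because each channel only ever works with small residues, the channels can be computed in parallel in hardware; results are only meaningful while they stay below the dynamic range (105 for these moduli).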

Keywords: computer arithmetic, residue number system, number system, one-Hot, VLSI

Procedia PDF Downloads 490
25333 Directional Dust Deposition Measurements: The Influence of Seasonal Changes and the Meteorological Conditions Influencing in Witbank Area and Carletonville Area

Authors: Maphuti Georgina Kwata

Abstract:

Coal mining in Mpumalanga Province is known to contribute to atmospheric pollution through various activities. Gold mining in North-West Province is also known to contribute to atmospheric pollution, especially through the production of radon gas. In this research, a directional dust deposition gauge was used to measure the source direction of dust, and meteorological data were used to determine the wind roses and the influence of seasonal changes. Fourteen months of dust collection was undertaken in the Witbank area and the Carletonville area. The results show that the source direction for Ericson Dam was the east in February 2010, and that the source direction for the Tip Area was the west in October 2010. To the east there were mining operations and power stations, which contributed to the east being the source direction. To the west there were smelters, power stations and agricultural activities, which contributed to the west being the source direction for Driefontein Mine: East Recreational Village Club. The east is the source direction for Leslie Williams Hospital, which also indicates that there are dust-generating activities there, such as mining operations and agricultural activities. The meteorological results for the Emalahleni area show that in summer and winter the wind blows from the east sector with a wind speed of 5-10 m s-1. The annual average wind rose shows wind from the east-southeastern sector at 20 m s-1, and in the daytime the wind blows from the northwestern sector at speeds in excess of 20 m s-1. At night, the wind blows from the east with a maximum wind speed of 20 m s-1. The meteorological results for Driefontein Mine show wind blowing from the north-western and north-eastern sectors with a wind speed of 5-10 m s-1. In the daytime the wind blows from the west sector and at night from the north sector. In summer the wind blows from the north-east sector at 5-10 m s-1, and in winter the wind blows from the north-west, which is also the predominant direction.
In spring the wind blows from the north-east. The conclusion is that not only the mining operations where the directional dust deposition gauges were installed contributed to the source direction, but also the power stations, smelters, and other activities near the mining operations. The recommendations are that dust suppressant for unpaved roads should be used on a regular basis and that weather conditions should be monitored (wind speed and direction prior to blasting, to ensure minimal emissions).

Keywords: directional dust deposition gauge, BS part 5 1747 dust deposit gauge, wind rose, wind blowing

Procedia PDF Downloads 498
25332 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

Organizations hold structured and unstructured information in different formats, sources, and systems. Part of it comes from ERP systems under OLTP processing that support the information system. At the OLAP processing level, however, these organizations show some deficiencies: part of the problem is a lack of interest in extracting knowledge from their data sources, together with the absence of operational capabilities to tackle this kind of project. Data warehouses and their applications are considered non-proprietary tools of great interest to business intelligence, since they are the base repositories for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and they facilitate corporate decision making and research. This paper presents a simple, structured methodology inspired by agile development models such as Scrum, XP and AUP, and also by object-relational models, spatial data models, and the baseline of data modeling under UML and Big Data. In this way it seeks to deliver an agile methodology for developing data warehouses that is simple and easy to apply. The methodology takes into account the application of processes for the respective information analysis, visualization and data mining, particularly for generating patterns and deriving models from the structured fact objects.

Keywords: data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse

Procedia PDF Downloads 396
25331 Effect of Damper Combinations in Series or Parallel on Structural Response

Authors: Ajay Kumar Sinha, Sharad Singh, Anukriti Sinha

Abstract:

Passive energy dissipation methods for earthquake protection of structures are undergoing development for improved performance. Combined use of different types of damping mechanisms has shown positive results in the recent past. Different supplemental damping methods such as viscous damping, frictional damping and metallic damping are being combined for optimum performance. The conventional method of connecting passive dampers to structures is a parallel connection between the damper unit and the structural member. Researchers are investigating the coupling effect of different types of dampers. The most popular choice among the research community is the coupling of viscous dampers and frictional dampers. The series and parallel coupling of these damping units is being studied for the relative performance of the coupled system in response control of structures against earthquakes. In this paper an attempt has been made to couple Fluid Viscous Dampers and Frictional Dampers in series and in parallel to form a single damping unit. The relative performance of the coupled units has been studied on a three-dimensional reinforced concrete framed structure. The current theories of structural dynamics in practice for viscous damping and frictional damping have been incorporated in this study. Time history analysis has been performed for the structural system with coupled damper units, with uncoupled damper units, and without any supplemental damping. The investigations reported in this study show significantly improved performance of the coupled system. A higher natural frequency of the system, outside the forcing frequency, has been obtained for structural systems with coupled damper units as against the other cases. The structural response in terms of storey displacement and storey drift shows significant improvement for the case with coupled damper units as against the cases with uncoupled units or without any supplemental damping.
The results are promising in terms of improved response of the structure with coupled damper units. Further investigations in this regard for a comparative performance of the series and parallel coupled systems will be carried out to study the optimum behavior of these coupled systems for enhanced response control of structural systems.

Keywords: frictional damping, parallel coupling, response control, series coupling, supplemental damping, viscous damping

Procedia PDF Downloads 443
25330 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines

Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma

Abstract:

Useful information has been extracted from road accident data in the United Kingdom (UK), using data analytics methods, with the aim of avoiding possible accidents in rural and urban areas. This analysis makes use of several methodologies such as data integration, support vector machines (SVM), correlation machines and multinomial goodness. The entire datasets have been imported from the traffic department of the UK with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn avoid unnecessary memory lapses. Since data is expected to grow continuously over time, this work primarily proposes a new framework model that can be trained, adapt itself to new data, and make accurate predictions. This work also throws some light on the use of the SVM methodology for text classifiers built from the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of the SVM methodology for this kind of research work.

Keywords: support vector mechanism (SVM), machine learning (ML), support vector machines (SVM), department of transportation (DFT)

Procedia PDF Downloads 261
25329 Recommender System Based on Mining Graph Databases for Data-Intensive Applications

Authors: Mostafa Gamal, Hoda K. Mohamed, Islam El-Maddah, Ali Hamdi

Abstract:

In recent years, many digital documents on the web have been created due to the rapid growth of ’social applications’ communities or ’Data-intensive applications’. The evolution of online-based multimedia data poses new challenges in storing and querying large amounts of data for online recommender systems. Graph data models have been shown to be more efficient than relational data models for processing complex data. This paper will explain the key differences between graph and relational databases, their strengths and weaknesses, and why graph databases are the best technology for building a realtime recommendation system. The paper will also discuss several similarity metric algorithms that can be used to compute a similarity score for pairs of nodes based on their neighbourhoods or their properties. Finally, the paper will explore how NLP strategies offer a basis for improving the accuracy and coverage of realtime recommendations by extracting information from stored unstructured knowledge, which makes up the bulk of the world’s data, and using it to enrich the graph database. As the size and number of data items are increasing rapidly, the proposed system should meet current and future needs.
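As a rough illustration of a neighbourhood-based similarity score of the kind this abstract mentions, the sketch below computes the Jaccard similarity of two nodes from their neighbour sets. The abstract does not name the metrics it covers, so the choice of Jaccard, the adjacency-dictionary representation, and the toy data are all assumptions made for illustration.

```python
def jaccard_similarity(graph, node_a, node_b):
    """Jaccard similarity of two nodes: |N(a) & N(b)| / |N(a) | N(b)|.

    `graph` is assumed to be a plain dict mapping each node to the set of
    its neighbours (a hypothetical stand-in for a graph database query).
    """
    neighbours_a = graph.get(node_a, set())
    neighbours_b = graph.get(node_b, set())
    union = neighbours_a | neighbours_b
    if not union:
        # Two nodes without any neighbours share nothing measurable.
        return 0.0
    return len(neighbours_a & neighbours_b) / len(union)


# Toy data: users linked to the items they interacted with.
toy_graph = {
    "user1": {"itemA", "itemB", "itemC"},
    "user2": {"itemB", "itemC", "itemD"},
}
score = jaccard_similarity(toy_graph, "user1", "user2")
```

Here the two users share two of four distinct items, so the score is 0.5; a recommender could then suggest items seen by a user's most similar neighbours.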

Keywords: graph databases, NLP, recommendation systems, similarity metrics

Procedia PDF Downloads 96
25328 Shark Detection and Classification with Deep Learning

Authors: Jeremy Jenrette, Z. Y. C. Liu, Pranav Chimote, Edward Fox, Trevor Hastie, Francesco Ferretti

Abstract:

Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution coming from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of shark images by sourcing 24,546 images covering 219 species of sharks from the web application spark pulse and the social network Instagram. We used object detection to extract shark features and inflate this database to 53,345 images. We packaged object-detection and image classification models into a Shark Detector bundle. We developed the Shark Detector to recognize and classify sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches of sharks: boosting training datasets, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a result of training data quantity. The Shark Detector located sharks in baited remote footage and YouTube videos with an average accuracy of 89%, and classified located subjects to the species level with 69% accuracy (n = 8 species). The Shark Detector sorted heterogeneous datasets of images sourced from Instagram with 91% accuracy and classified species with 70% accuracy (n = 17 species). Data-mining Instagram can inflate training datasets and increase the Shark Detector’s accuracy as well as facilitate archiving of historical and novel shark observations. Base accuracy of genus prediction was 68% across 25 genera. The average base accuracy of species prediction within each genus class was 85%. The Shark Detector can classify 45 species.
All data-generation methods were processed without manual interaction. As media-based remote monitoring strives to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.

Keywords: classification, data mining, Instagram, remote monitoring, sharks

Procedia PDF Downloads 105
25327 Research of the Three-Dimensional Visualization Geological Modeling of Mine Based on Surpac

Authors: Honggang Qu, Yong Xu, Rongmei Liu, Zhenji Gao, Bin Wang

Abstract:

Today's mining industry is advancing gradually in a digital and visual direction. Three-dimensional visualization geological modeling of a mine is the digital characterization of its mineral deposits and is one of the key technologies of digital mining. Three-dimensional geological modeling combines geological spatial information management, geological interpretation, geological spatial analysis and prediction, geostatistical analysis, entity content analysis and graphic visualization in a three-dimensional environment with computer technology, and is used in geological analysis. In this paper, a three-dimensional geological model of an iron mine is constructed using Surpac, the difference in weights between the distance power inverse ratio method and ordinary kriging is studied, and the ore body volume and reserves are simulated and calculated using these two methods. Compared with the actual mine reserves, the result is relatively accurate, so it provides a scientific basis for mine resource assessment, reserve calculation, mining design and so on.
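The distance power inverse ratio method named in this abstract is more commonly known as inverse distance weighting (IDW). The sketch below shows the basic estimate under stated assumptions: the power of 2, the sample layout, and the function name are illustrative choices, and this is not how Surpac implements the method or how the authors configured it.

```python
import math


def idw_estimate(samples, target, power=2.0):
    """Estimate a grade at `target` as a weighted mean of sample grades.

    `samples` is a list of ((x, y, z), grade) pairs. Each sample gets the
    weight 1 / distance**power, so nearer samples dominate the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for location, grade in samples:
        distance = math.dist(location, target)
        if distance == 0.0:
            # The target coincides with a sample: return its grade directly.
            return grade
        weight = 1.0 / distance ** power
        weighted_sum += weight * grade
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical drill-hole samples: (coordinates, iron grade in percent).
toy_samples = [((0.0, 0.0, 0.0), 30.0), ((2.0, 0.0, 0.0), 50.0)]
midpoint_grade = idw_estimate(toy_samples, (1.0, 0.0, 0.0))
```

Because the target block is equidistant from both samples, the two weights are equal and the estimate is the plain average of 40.0. Ordinary kriging differs in that its weights come from a fitted variogram rather than from distance alone.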

Keywords: three-dimensional geological modeling, geological database, geostatistics, block model

Procedia PDF Downloads 70
25326 Text Mining Past Medical History in Electrophysiological Studies

Authors: Roni Ramon-Gonen, Amir Dori, Shahar Shelly

Abstract:

Background and objectives: Healthcare professionals produce abundant textual information in their daily clinical practice. The extraction of insights from all the gathered information, which is mainly unstructured and lacking in normalization, is one of the major challenges in computational medicine. In this respect, text mining assembles different techniques to derive valuable insights from unstructured textual data, which has made it especially relevant in medicine. A neurological patient’s history allows the clinician to define the patient’s symptoms and, along with the results of the nerve conduction study (NCS) and electromyography (EMG) test, assists in formulating a differential diagnosis. Past medical history (PMH) helps to direct the latter. In this study, we aimed to identify relevant PMH, understand which PMHs are common among patients in the referral cohort and documented by the medical staff, and examine the differences by sex and age in a large cohort based on textual format notes. Methods: We retrospectively identified all patients with abnormal NCS between May 2016 and February 2022. Age, gender, and all NCS report attributes were recorded, including the summary text. All patients’ histories were extracted from the text report by a query. Basic text cleansing and data preparation were performed, as well as lemmatization. Very popular words (like ‘left’ and ‘right’) were deleted. Several words were replaced with their abbreviations. A bag-of-words approach was used to perform the analyses. Different visualizations that are common in text analysis were created to make the results easy to grasp. Results: We identified 5282 unique patients. Three thousand and five (57%) patients had documented PMH, of whom 60.4% (n=1817) were male. The total median age was 62 years (range 0.12 – 97.2 years), and the majority of patients (83%) presented after the age of forty years. The top two documented medical histories were diabetes mellitus (DM) and surgery.
DM was observed in 16.3% of the patients, and surgery in 15.4%. Other frequent patient histories (among the top 20) were fracture, cancer (ca), motor vehicle accident (MVA), leg, lumbar, discopathy, back and carpal tunnel release (CTR). When separating the data by sex, we can see that DM and MVA are more frequent among males, while cancer and CTR are less frequent. On the other hand, the top medical history in females was surgery and, after that, DM. Other frequent histories among females are breast cancer, fractures, and CTR. In the younger population (ages 18 to 26), the frequent PMHs were surgery, fractures, trauma, and MVA. Discussion: By applying text mining approaches to unstructured data, we were able to better understand which medical histories are more relevant in these circumstances and, in addition, gain insights regarding sex and age differences. These insights might help to collect epidemiological and demographic data as well as raise new hypotheses. One limitation of this work is that each clinician might use different words or abbreviations to describe the same condition, and therefore using a coding system can be beneficial.
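The preparation steps described in the Methods (cleansing, removing very popular words, replacing terms with abbreviations, then counting with a bag of words) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abbreviation map reuses only abbreviations that appear in the abstract, the sample notes are invented, and lemmatization is left out to keep the example dependency-free.

```python
import re
from collections import Counter

# Abbreviations taken from the abstract; the mapping itself is an assumption.
ABBREVIATIONS = {
    "diabetes mellitus": "dm",
    "motor vehicle accident": "mva",
    "carpal tunnel release": "ctr",
}
# Very popular words that the abstract says were deleted.
VERY_COMMON_WORDS = {"left", "right"}


def tokenize(note):
    """Lower-case a note, apply abbreviations, and drop very common words."""
    text = note.lower()
    for phrase, abbreviation in ABBREVIATIONS.items():
        text = text.replace(phrase, abbreviation)
    words = re.findall(r"[a-z]+", text)
    return [word for word in words if word not in VERY_COMMON_WORDS]


def document_frequency(notes):
    """Count, for each term, the number of notes that mention it."""
    counts = Counter()
    for note in notes:
        # A set makes each term count once per note (per patient).
        counts.update(set(tokenize(note)))
    return counts


# Invented example notes, one per patient.
toy_notes = [
    "Diabetes mellitus, left leg fracture",
    "Right carpal tunnel release; diabetes mellitus",
    "Motor vehicle accident",
]
term_counts = document_frequency(toy_notes)
```

Counting each term once per note gives the share of patients with a given history, which is the kind of figure the Results report (for example, DM in 16.3% of patients).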

Keywords: abnormal studies, healthcare analytics, medical history, nerve conduction studies, text mining, textual analysis

Procedia PDF Downloads 88
25325 From Two-Way to Multi-Way: A Comparative Study for Map-Reduce Join Algorithms

Authors: Marwa Hussien Mohamed, Mohamed Helmy Khafagy

Abstract:

Map-Reduce is a programming model that is widely used to extract valuable information from enormous volumes of data. Map-Reduce is designed to support heterogeneous datasets. Apache Hadoop Map-Reduce is used extensively to uncover hidden patterns, for example in data mining and SQL workloads. The most important operation for data analysis is the join operation, but the Map-Reduce framework does not directly support join algorithms. This paper explains and compares two-way and multi-way Map-Reduce join algorithms; we also implement the MR join algorithms and show the performance of each phase. Our experimental results show that, among the two-way join algorithms, map-side join and map-merge join take the longest time because of the preprocessing step that sorts the data, and that, among the multi-way join algorithms, reduce-side cascade join takes the longest time.
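Because the framework has no built-in join, a two-way join has to be expressed through the map, shuffle, and reduce phases. The sketch below simulates a reduce-side join in plain Python to show the idea; it does not use Hadoop, and the function names, table tags, and sample rows are illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_tag, rows, key_index):
    """Emit (join_key, (table_tag, row)) so rows can be told apart later."""
    for row in rows:
        yield row[key_index], (table_tag, row)


def shuffle_phase(pairs):
    """Group mapped values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, tagged_rows):
    """Pair every left row with every right row that shares the key."""
    left_rows = [row for tag, row in tagged_rows if tag == "L"]
    right_rows = [row for tag, row in tagged_rows if tag == "R"]
    for left_row, right_row in product(left_rows, right_rows):
        yield key, left_row, right_row


def reduce_side_join(left_rows, right_rows, left_key=0, right_key=0):
    """Inner join of two in-memory tables via simulated map/shuffle/reduce."""
    pairs = list(map_phase("L", left_rows, left_key))
    pairs += list(map_phase("R", right_rows, right_key))
    joined = []
    for key, tagged_rows in shuffle_phase(pairs).items():
        joined.extend(reduce_phase(key, tagged_rows))
    return joined


# Toy tables keyed on the first column.
users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "cup")]
joined_rows = reduce_side_join(users, orders)
```

Only key 1 appears in both tables, so the result holds two joined rows. A map-side join avoids the shuffle but needs both inputs pre-sorted and partitioned on the key, which is the preprocessing cost the abstract refers to.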

Keywords: Hadoop, MapReduce, multi-way join, two-way join, Ubuntu

Procedia PDF Downloads 474
25324 Unsupervised Text Mining Approach to Early Warning System

Authors: Ichihan Tai, Bill Olson, Paul Blessner

Abstract:

Traditional early warning systems that alarm against crisis are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to the traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against market crash, and spikes in the event of crisis. In this study, news data is consumed for prediction of whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal against VIX index data from CBOE.

Keywords: early warning system, knowledge management, market prediction, topic modeling

Procedia PDF Downloads 328
25323 Switched Ultracapacitors for Maximizing Energy Supply

Authors: Nassouh K. Jaber

Abstract:

Supercapacitors (S.C.) are presently attracting attention for driving general purpose (12VDC to 220VAC) inverters in renewable energy systems. Unfortunately, when the voltage of the S.C. supplying the inverter reaches the minimal threshold of 7-8VDC, the inverter shuts down, leaving the remaining 40% of the valuable energy stored inside the ultracapacitor unusable. In this work, a power electronic circuit is proposed which switches two banks of supercapacitors from a parallel connection, when both are fully charged at 14VDC, to a serial connection when their voltages drop to 7 volts, thus keeping the inverter working within its operating limits for a longer time and advantageously tapping almost 92% of the energy stored in the supercapacitors.
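The energy figures in this abstract can be checked with ideal-capacitor arithmetic, where stored energy scales as E = C V^2 / 2. The sketch below is only that: it assumes two identical, lossless banks, a full-charge voltage of 14 V, and an inverter cutoff of 8 V (the upper end of the 7-8 VDC range quoted above). The function names are illustrative and do not come from the paper.

```python
def remaining_energy_fraction(bank_voltage, full_voltage):
    """Fraction of full-charge energy left in a bank (E is proportional to V^2)."""
    return (bank_voltage / full_voltage) ** 2


def usable_fraction_parallel_only(full_voltage, cutoff_voltage):
    """Banks stay in parallel, so the inverter stops when each bank hits cutoff."""
    return 1.0 - remaining_energy_fraction(cutoff_voltage, full_voltage)


def usable_fraction_with_series_switch(full_voltage, cutoff_voltage):
    """Banks are switched to series once their voltage has dropped.

    In series the inverter sees the sum of both bank voltages, so it keeps
    running until each bank has fallen to half the cutoff voltage.
    Switching and conversion losses are ignored (an idealisation).
    """
    per_bank_voltage_at_shutdown = cutoff_voltage / 2.0
    return 1.0 - remaining_energy_fraction(per_bank_voltage_at_shutdown, full_voltage)


FULL_VOLTAGE = 14.0    # volts, from the abstract
CUTOFF_VOLTAGE = 8.0   # volts, assumed from the quoted 7-8 VDC range

parallel_only = usable_fraction_parallel_only(FULL_VOLTAGE, CUTOFF_VOLTAGE)
with_switch = usable_fraction_with_series_switch(FULL_VOLTAGE, CUTOFF_VOLTAGE)
```

Under these assumptions the parallel-only case uses about 67% of the stored energy and the switched case about 92%, which is close to the figure quoted in the abstract. A real circuit would lose part of this to switching and conversion losses.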

Keywords: ultra capacitor, switched ultracapacitors, inverter, supercapacitor, parallel connection, serial connection, battery limitation

Procedia PDF Downloads 407
25322 Petra: Simplified, Scalable Verification Using an Object-Oriented, Compositional Process Calculus

Authors: Aran Hakki, Corina Cirstea, Julian Rathke

Abstract:

Formal methods are yet to be utilized in mainstream software development due to issues in scaling and implementation costs. This work is about developing a scalable, simplified, pragmatic, formal software development method with strong correctness properties and guarantees that are easy to prove. The method aims to be easy to learn, use and apply without extensive training and experience in formal methods. Petra is proposed as an object-oriented process calculus with composable data types and sequential/parallel processes. Petra has a simple denotational semantics, which includes a definition of Correct by Construction. The aim is for Petra to be a standard which can be implemented to execute on various mainstream programming platforms such as Java. Work towards an implementation of Petra as a Java EDSL (Embedded Domain Specific Language) is also discussed.

Keywords: compositionality, formal method, software verification, Java, denotational semantics, rewriting systems, rewriting semantics, parallel processing, object-oriented programming, OOP, programming language, correct by construction

Procedia PDF Downloads 132
25321 Filtering Intrusion Detection Alarms Using Ant Clustering Approach

Authors: Ghodhbani Salah, Jemili Farah

Abstract:

With the growth of cyber attacks, information safety has become an important issue all over the world. Many firms rely on security technologies such as intrusion detection systems (IDSs) to manage information technology security risks. IDSs are considered to be the last line of defense to secure a network and play a very important role in detecting a large number of attacks. However, the main problem with today’s most popular commercial IDSs is that they generate a high volume of alerts and a huge number of false positives. This drawback has become the main motivation for many research papers in the IDS area. Hence, in this paper we present a data mining technique to assist network administrators in analyzing and reducing the false positive alarms produced by an IDS and in increasing detection accuracy. Our data mining technique is an unsupervised clustering method based on a hybrid ANT algorithm. This algorithm discovers clusters of intruders’ behavior without prior knowledge of the possible number of classes; we then apply the K-means algorithm to improve the convergence of the ANT clustering. Experimental results on a real dataset show that our proposed approach is efficient, with a high detection rate and a low false alarm rate.

Keywords: intrusion detection system, alarm filtering, ANT class, ant clustering, intruders’ behaviors, false alarms

Procedia PDF Downloads 395
25320 Heritage Value and Industrial Tourism Potential of the Urals, Russia

Authors: Anatoly V. Stepanov, Maria Y. Ilyushkina, Alexander S. Burnasov

Abstract:

Expansion of tourism, especially after WWII, has led to significant improvements in the regional infrastructure. The present study has revealed a lot of progress in the advancement of industrial heritage narrative in the Central Urals. The evidence comes from the general public’s increased fascination with some of Europe’s oldest mining and industrial sites, and the agreement of many stakeholders that the Urals industrial heritage should be preserved. The development of tourist sites in Nizhny Tagil and Nevyansk, gold-digging in Beryosovsky, gemstone search in Murzinka, and the progress with the Urals Gemstone Ring project are the examples showing the immense opportunities of industrial heritage tourism development in the region that are still to be realized. Regardless of the economic future of the Central Urals, whether it will remain an industrial region or experience a deeper deindustrialization, the sprouts of the industrial heritage tourism should be advanced and amplified for the benefit of local communities and the tourist community at large as it is hard to imagine a more suitable site for the discovery of industrial and mining heritage than the Central Urals Region of Russia.

Keywords: industrial heritage, mining heritage, Central Urals, Russia

Procedia PDF Downloads 122
25319 Application of Remote Sensing Technique on the Monitoring of Mine Eco-Environment

Authors: Haidong Li, Weishou Shen, Guoping Lv, Tao Wang

Abstract:

To overcome the limitations of traditional remote sensing (RS) techniques in mine eco-environmental monitoring, in this paper we first classified the eco-environmental damage caused by mining activities and then introduced the principle, classification and characteristics of the Light Detection and Ranging (LiDAR) technique. The potential of the LiDAR technique in mine eco-environmental monitoring was analyzed, particularly in extracting vertical structure parameters of vegetation, by comparing the feasibility and applicability of the traditional RS method and the LiDAR technique in monitoring different types of indicators. The application status of the LiDAR technique in extracting typical mine indicators, such as land destruction in mining areas, damage to ecological integrity and natural soil erosion, was also considered. The results showed that the LiDAR technique is able to monitor most of the mine eco-environmental indicators and exhibits higher accuracy than the traditional RS technique; specifically, the applicability of the LiDAR technique to each indicator depends on the accuracy requirements of mine eco-environmental monitoring. For large mines, LiDAR three-dimensional point cloud data can not only be used as a complementary data source to optical RS; airborne/satellite LiDAR can also fulfill the demand for extracting vertical structure parameters of vegetation over large areas.

Keywords: LiDAR, mine, ecological damage, monitoring, traditional remote sensing technique

Procedia PDF Downloads 388