Search results for: Data Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25529

24839 Infrastructure Project Management and Implementation: A Case Study Of the Mokolo-Crocodile Water Augmentation Project in South Africa

Authors: Elkington Sibusiso Mnguni

Abstract:

The Mokolo-Crocodile Water Augmentation Project (MCWAP) is located in the Limpopo Province in the north-western part of South Africa. Its purpose is to increase water supply by 30 million cubic meters per year to meet current and future demand for users, including power stations, mining houses, and the local municipality in the Lephalale area. This paper documents the planning and implementation aspects of the MCWAP infrastructure project. The study will add to the body of knowledge with respect to bulk water infrastructure development in water-scarce regions. Relevant data and information were gathered and collated through a desktop study. The key finding was that the project was successfully completed in 2015 using conventional project management and construction methods. The project is currently being operated and maintained by the National Department of Water and Sanitation.

Keywords: construction, contract management, infrastructure project, project management

Procedia PDF Downloads 302
24838 Effects of Lime and N100 on the Growth and Phytoextraction Capability of a Willow Variety (S. Viminalis × S. Schwerinii × S. Dasyclados) Grown in Contaminated Soils

Authors: Mir Md. Abdus Salam, Muhammad Mohsin, Pertti Pulkkinen, Paavo Pelkonen, Ari Pappinen

Abstract:

Soil and water pollution caused by extensive mining practices can adversely affect environmental components, such as humans, animals, and plants. Despite a generally positive contribution to society, mining practices have become a serious threat to biological systems. As metals do not degrade completely, they require immobilization, toxicity reduction, or removal. A greenhouse experiment was conducted to evaluate the effects of lime and N100 (11-amino-1-hydroxyundecylidene) chelate amendment on the growth and phytoextraction potential of the willow variety Klara (S. viminalis × S. schwerinii × S. dasyclados) grown in soils heavily contaminated with copper (Cu). The plants were irrigated with tap or processed water (mine wastewater). The sequential extraction technique and inductively coupled plasma-mass spectrometry (ICP-MS) were used to determine the extractable metals and evaluate the fraction of metals in the soil that could be potentially available for plant uptake. The results suggest that the combined effects of the contaminated soil and processed water inhibited growth parameter values. In contrast, the accumulation of Cu in the plant tissues was increased compared to the control. When the soil was supplemented with lime and N100, growth parameters and resistance capacity were significantly higher than in the unamended soil treatments, especially in the contaminated soil treatments. The combined lime- and N100-amended soil treatment produced a higher biomass growth rate, resistance capacity, and phytoextraction efficiency than either the lime-amended or the N100-amended soil treatment. This study provides practical evidence of the efficient chelate-assisted phytoextraction capability of Klara and highlights its potential as a viable and inexpensive novel approach for in-situ remediation of Cu-contaminated soils and mine wastewaters.
Abandoned agricultural, industrial and mining sites can also be utilized by a Salix afforestation program without conflict with the production of food crops. This kind of program may create opportunities for bioenergy production and economic development, but contamination levels should be examined before bioenergy products are used.

Keywords: copper, Klara, lime, N100, phytoextraction

Procedia PDF Downloads 146
24837 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

Intelligent Transportation Systems clearly need fast processing of big data on transportation ridership (e.g., smartcard data) and traffic operation (e.g., traffic detector data), which requires a lot of computational power. Cloud computing is now an important and popular information technology solution for data processing. It enables users to process enormous amounts of data without owning the computing power themselves. Thus, it can also be a good choice for transportation big data processing. This paper examines how cloud computing can enhance transportation big data processing by contrasting its advantages and disadvantages and discussing cloud computing features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 467
24836 A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features

Authors: Hee-Jeong Ahn, Kee-Won Kim, Seung-Hoon Kim

Abstract:

Knowledge of the relationships between characters can help readers understand the overall story or plot of a work of literary fiction. In this paper, we present a method for extracting the specific relationships between characters from Korean literary fiction. Generally, methods for extracting relationships between characters in text are statistical or computational methods based on the sentence distance between characters, without considering Korean linguistic features. Furthermore, it is difficult for them to extract relationships with direction from text, such as one-sided love, because they consider only the weight of a relationship and not its direction. Therefore, in order to identify specific relationships between characters, we propose a statistical method that considers linguistic features, such as syntactic patterns and speech verbs in Korean. The result of our method is represented by a weighted directed graph of the relationships between the characters. Furthermore, we expect that the proposed method could be applied to the analysis of relationships between characters in other content, such as movies or TV dramas.
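The output representation described in this abstract, a weighted directed graph, can be illustrated with a minimal Python sketch. It covers only the data structure, not the authors' extraction method: the (source, target, weight) triples, the character names, and the function name build_relationship_graph are hypothetical, and how the weights would be derived from syntactic patterns and speech verbs is not shown.

```python
from collections import defaultdict


def build_relationship_graph(observations):
    """Accumulate (source, target, weight) triples into a weighted directed graph.

    The triples are hypothetical stand-ins for whatever evidence a real
    extractor would produce; the direction source -> target is preserved.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for source, target, weight in observations:
        graph[source][target] += weight
    # Convert to plain dicts so missing keys are not silently created later.
    return {source: dict(targets) for source, targets in graph.items()}


# Illustrative triples only (not data from the paper).
observations = [
    ("A", "B", 1.0),
    ("A", "B", 0.5),
    ("B", "A", 0.2),
    ("A", "C", 0.3),
]
graph = build_relationship_graph(observations)
```

Because the edge A -> B is stored separately from B -> A, an asymmetric relationship such as the one-sided case mentioned in the abstract shows up as two different weights.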

Keywords: data mining, Korean linguistic feature, literary fiction, relationship extraction

Procedia PDF Downloads 380
24835 Study of the Stability of the Slope Open-Pit Mines: Case of the Mine of Phosphates – Tebessa, Algeria

Authors: Mohamed Fredj, Abdallah Hafsaoui, Radouane Nakache

Abstract:

The stability of mining works in fractured rock masses is a major concern of the operating engineer. For geotechnical works in mines and quarries, there is currently no general methodology for analyzing and quantifying the risks associated with the hazards inherent in such works (falling boulders, landslides, etc.). The reasons are the uncertainty that weighs on the available data and the lack of knowledge of the parameter values required for this type of analysis. Stability calculations must be based on reliable knowledge of the distribution of the discontinuities that dissect the rock mass and of the shear strength of the intact rock and the discontinuities. This study examines the slope stability of the Kef Sennoun mine (Tebessa, Algeria). The problem is analyzed using a numerical model based on finite elements (Plaxis 3D software).

Keywords: stability, discontinuities, finite elements, rock mass, open-pit mine

Procedia PDF Downloads 321
24834 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper, we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Because of the numerous obstacles, e.g. forests, hills, and urban areas, the data collection is realized in several ways. The collection of data uses wireless communication, a LAN network, and a GSM network, and in certain areas data are collected by vehicles. In order to ensure the connection to the server, most of the probes have the ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select a suitable communication channel.

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 360
24833 Discriminant Analysis as a Function of Predictive Learning to Select Evolutionary Algorithms in Intelligent Transportation System

Authors: Jorge A. Ruiz-Vanoye, Ocotlán Díaz-Parra, Alejandro Fuentes-Penna, Daniel Vélez-Díaz, Edith Olaco García

Abstract:

In this paper, we present the use of discriminant analysis to select the evolutionary algorithms that better solve instances of the vehicle routing problem with time windows. We use indicators as independent variables to obtain the classification criteria, and the best algorithm among the generic genetic algorithm (GA), random search (RS), steady-state genetic algorithm (SSGA), and sexual genetic algorithm (SXGA) as the dependent variable for the classification. The discriminant classification was trained with classic instances of the vehicle routing problem with time windows obtained from the Solomon benchmark. The discriminant analysis achieved a classification rate of 66.7%.

Keywords: Intelligent Transportation Systems, data-mining techniques, evolutionary algorithms, discriminant analysis, machine learning

Procedia PDF Downloads 472
24832 Re-Examining Contracts in Managing and Exploiting Strategic National Resources: A Case in Divestation Process in the Share Distribution of Mining Corporation in West Nusa Tenggara, Indonesia

Authors: Hayyan ul Haq, Zainal Asikin

Abstract:

This work aims to explore an appropriate solution to legal problems stemming from the management and exploitation of strategic natural resources in Indonesia. The discussion focuses on the exploitation of gold mining, i.e. the divestation process in the New Mont Corporation, West Nusa Tenggara. These legal problems relate to the deviation from the national budget regulation, UU. No. 19/2012, and to the implementation of the divestation process, which infringes PP. No. 50/2007 concerning the Implementation Procedure of Regional Cooperation, which is an implementing regulation of UU No. 1/2004 on State’s Treasury. The cooperation model developed by the Provincial Government failed to create a permanent legal solution through a normative approach. It has merely used a practical approach that tends towards an instant solution, by using some loopholes in the divestation process. The above blunders have been compounded by other secondary legal blunders concerning good governance principles, particularly the principles of justice, transparency, efficiency, effectiveness and competitiveness. To solve the above problems, this work offers the constitutionalisation of contract, aimed at reviewing and making coherent all deviated contracts, rules and policies that have deprived the national and societies’ interest, in order to optimize the strategic natural resources towards the greatest benefit for the greatest number of people.

Keywords: constitutionalisation of contract, strategic national resources, divestation, the greatest benefit for the greatest number of people, Indonesian Pancasila values

Procedia PDF Downloads 459
24831 Artificial Intelligence Methods in Estimating the Minimum Miscibility Pressure Required for Gas Flooding

Authors: Emad A. Mohammed

Abstract:

Using Data Mining and Artificial Intelligence, in the form of Fuzzy Logic and Artificial Neural Network models, to predict the minimum miscibility pressure (MMP) required for multi-contact miscible (MCM) displacement of reservoir petroleum by hydrocarbon gas flooding can help produce accurate results. The factors affecting the MMP, as established from the literature and from the dataset, are as follows: XC2-6: intermediate composition in the oil, containing C2-6, CO2 and H2S (mole %); XC1: amount of methane in the oil (%); T: temperature (°C); MwC7+: molecular weight of C7+ (g/mol); YC2+: mole percent of C2+ composition in the injected gas (%); MwC2+: molecular weight of C2+ in the injected gas. Fuzzy Logic and Neural Networks have been used widely in prediction and classification, with relatively high accuracy, in different fields of study. It is well known that a Fuzzy Inference system can handle uncertainty within the inputs, as in our case. The results of this work showed that our proposed models perform better, with higher performance indices, than other empirical correlations.

Keywords: MMP, gas flooding, artificial intelligence, correlation

Procedia PDF Downloads 144
24830 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, large amounts of data are being produced at an increasing rate from various sources such as social media networks, sensor devices, and other information-serving devices. Such a massive, complex, and exponentially growing collection of datasets is called big data. Traditional database systems cannot store and process such data because of its size and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of server and storage resources. However, moving large amounts of data to and from the cloud is challenging since it can incur high latency due to the large data size. With respect to the big data movement problem, this paper reviews the literature, discusses research issues, and identifies approaches for dealing with the problem.

Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques

Procedia PDF Downloads 86
24829 The “Bright Side” of COVID-19: Effects of Livestream Affordances on Consumer Purchase Willingness: Explicit IT Affordances Perspective

Authors: Isaac Owusu Asante, Yushi Jiang, Hailin Tao

Abstract:

Live streaming marketing, the new electronic commerce element, became an optional marketing channel following the COVID-19 pandemic. Many sellers have leveraged the features presented by live streaming to increase sales. Studies on live streaming have focused on gaming and consumers’ loyalty to brands through live streaming, using interview questionnaires. This study, however, was conducted to measure real-time observable interactions between consumers and sellers. Based on the affordance theory, this study conceptualized constructs representing the interactive features and examined how they drive consumers’ purchase willingness during live streaming sessions using 1238 datasets from Amazon Live, following the manual observation of transaction records. Using structural equation modeling, the ordinary least square regression suggests that live viewers, new followers, live chats, and likes positively affect purchase willingness. The Sobel and Monte Carlo tests show that new followers, live chats, and likes significantly mediate the relationship between live viewers and purchase willingness. The study introduces a new way of measuring interactions in live streaming commerce and proposes a way to manually gather data on consumer behaviors in live streaming platforms when the application programming interface (API) of such platforms does not support data mining algorithms.
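The Sobel test named in this abstract checks whether an indirect (mediated) effect differs from zero. The sketch below uses the standard first-order Sobel statistic, z = a·b / sqrt(b²·se_a² + a²·se_b²), where a is the path from predictor to mediator and b the path from mediator to outcome. The coefficient values are illustrative assumptions, not results from the study, and the function name sobel_test is hypothetical.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Return the Sobel z statistic and two-tailed p-value for the indirect effect a*b.

    a, se_a: coefficient and standard error of the predictor -> mediator path.
    b, se_b: coefficient and standard error of the mediator -> outcome path.
    """
    # First-order standard error of the product a*b.
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = (a * b) / se_ab
    # Two-tailed p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Illustrative coefficients only (not taken from the paper).
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
```

The Sobel statistic assumes the product a·b is approximately normal, which is one reason resampling-based checks such as the Monte Carlo test mentioned in the abstract are often reported alongside it.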

Keywords: livestreaming marketing, live chats, live viewers, likes, new followers, purchase willingness

Procedia PDF Downloads 81
24828 Gene Names Identity Recognition Using Siamese Network for Biomedical Publications

Authors: Micheal Olaolu Arowolo, Muhammad Azam, Fei He, Mihail Popescu, Dong Xu

Abstract:

As the quantity of biological articles rises, so does the number of biological pathway figures. Each pathway figure shows gene names and relationships. Annotating pathway diagrams manually is time-consuming. Advanced image understanding models could speed up curation, but they must be more precise. There is rich information in biological pathway figures. The first step in performing image understanding of these figures is to recognize gene names automatically. Classical optical character recognition methods have been employed for gene name recognition, but they are not optimized for literature mining data. This study devised a method that recognizes the image bounding box of a gene name as a photo using deep Siamese neural network models, with the aim of outperforming existing methods based on ResNet, DenseNet and Inception architectures; the method obtained about 84% accuracy.

Keywords: biological pathway, gene identification, object detection, Siamese network

Procedia PDF Downloads 292
24827 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data grow larger, access to the data for some particular information becomes slower day by day. Processing data faster to shape it into information is the biggest issue. The major problems in distributed databases are the efficiency and the response time of data distribution. The security of data distribution is also a big issue. For these problems, we propose a strategy that can maximize the efficiency of data distribution and also improve its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique helps companies share data securely, efficiently, and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

Procedia PDF Downloads 333
24826 The Effect of Additive Acid on the Phytoremediation Efficiency

Authors: G. Hosseini, A. Sadighzadeh, M. Rahimnejad, N. Hosseini, Z. Jamalzadeh

Abstract:

Metal pollutants, especially heavy metals from anthropogenic sources such as the waste of metallurgical industries (mining, smelting, casting) or the production of nuclear fuel (mining, concentrate production and uranium processing), end up contaminating the environment (water and soil) and pose a risk to human health around facilities of this type. Different methods can be used to remove these contaminants from water and soil, but they are very expensive and time-consuming. In such cases, people have been forced to leave the area and decontamination is not carried out. For example, after the Chernobyl accident, an area of 30 km around the plant was emptied of human life. A very efficient and cost-effective method for decontaminating soil and water is phytoremediation. In this method, plants are used, preferably native plants, which are better adapted to the regional climate. In this study, three types of plants, alfalfa, sunflower and wheat, were used for barium decontamination. Alfalfa and sunflower did not grow well enough in the Saghand mine soil sample, which may be due to the non-native origin of these plants. However, the growth of wheat in the Saghand uranium mine soil sample was satisfactory. We investigated the effect of four acids (nitric, oxalic, acetic and citric acid) on the efficiency of barium removal by wheat. Our results indicate increased barium absorption in the presence of citric acid in the soil. In this paper, we present our research and laboratory results.

Keywords: phytoremediation, heavy metal, wheat, soil

Procedia PDF Downloads 338
24825 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.
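The four evaluation metrics listed in this abstract can all be computed from the counts of a binary confusion matrix. The following Python sketch shows the standard definitions; the counts are illustrative assumptions and do not come from the breast cancer or cervical cancer datasets, and the function name classification_metrics is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1-score and accuracy from binary confusion counts.

    tp/fp/fn/tn are the true positive, false positive, false negative and
    true negative counts. Ratios with a zero denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Illustrative counts only (not results from the paper).
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

Accuracy alone can be misleading when one class is rare, which is a common reason for reporting precision, recall and F1-score next to it in diagnostic settings.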

Keywords: artificial neural networks, breast cancer, classifiers, cervical cancer, f-score, machine learning, precision, recall

Procedia PDF Downloads 277
24824 The Regulation of Reputational Information in the Sharing Economy

Authors: Emre Bayamlıoğlu

Abstract:

This paper aims to provide an account of the legal and regulatory aspects of algorithmic reputation systems, with a special emphasis on the sharing economy (e.g., Uber, Airbnb, Lyft) business model. The first section starts with an analysis of the legal and commercial nature of the tripartite relationship among the parties, namely, the host platform, individual sharers/service providers and the consumers/users. The section further examines to what extent an algorithmic system of reputational information could serve as an alternative to legal regulation. Shortcomings are explained and analyzed with specific examples from the Airbnb platform, which is a pioneering success in the sharing economy. The following section focuses on the issue of governance and control of the reputational information. The section first analyzes the legal consequences of algorithmic filtering systems that detect undesired comments, and how a delicate balance could be struck between competing interests such as freedom of speech, privacy and the integrity of commercial reputation. The third section deals with the problem of manipulation by users. Many sharing economy businesses employ certain techniques of data mining and natural language processing to verify the consistency of the feedback. Software agents referred to as "bots" are employed by users to "produce" fake reputation values. Such automated techniques are deceptive and have significant negative effects, undermining the trust upon which the reputational system is built. The fourth section explores concerns with regard to data mobility, data ownership, and privacy. Reputational information provided by consumers in the form of textual comments may be regarded as writing that is eligible for copyright protection. Algorithmic reputational systems also contain personal data pertaining to both the individual entrepreneurs and the consumers.
The final section starts with an overview of the notion of reputation as a communitarian and collective form of referential trust and further provides an evaluation of the above legal arguments from the perspective of public interest in the integrity of reputational information. The paper concludes with certain guidelines and design principles for algorithmic reputation systems, to address the above raised legal implications.

Keywords: sharing economy, design principles of algorithmic regulation, reputational systems, personal data protection, privacy

Procedia PDF Downloads 465
24823 Pattern Discovery from Student Feedback: Identifying Factors to Improve Student Emotions in Learning

Authors: Angelina A. Tzacheva, Jaishree Ranganathan

Abstract:

Interest in Science, Technology, Engineering, and Mathematics (STEM) education, especially Computer Science education, has seen a drastic increase across the country. This fuels efforts towards recruiting and admitting a diverse population of students. The changing conditions in terms of the student population, diversity, and the expected teaching and learning outcomes thus provide a platform for the use of innovative teaching models and technologies. The methods adopted should also concentrate on raising the quality of such innovations and have a positive impact on student learning. Light-Weight Team is an Active Learning pedagogy that is considered a low-stakes activity and has very little or no direct impact on student grades. Emotion plays a major role in students’ motivation to learn. In this work, we use student feedback data, with emotion classification, collected through surveys at a public research institution in the United States. We use the Actionable Pattern Discovery method for this purpose. Actionable patterns are patterns that provide suggestions in the form of rules to help the user achieve better outcomes. The proposed method provides meaningful insight into changes that can be incorporated in the Light-Weight Team activities and the resources utilized in the course. The results suggest how to move student emotions to a more positive state, focusing in particular on the emotions ‘Trust’ and ‘Joy’.

Keywords: actionable pattern discovery, education, emotion, data mining

Procedia PDF Downloads 98
24822 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm

Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian

Abstract:

The present paper addresses research in the area of regression testing, with emphasis on automated tools as well as the prioritization of test cases. The uniqueness of regression testing and its cyclic nature are pointed out. The difference in approach between industry, with the business model as its basis, and academia, with its focus on data mining, is highlighted. Test metrics are discussed as a prelude to our formula for prioritization; a case study is further discussed to illustrate this methodology. An industrial case study is also described in the paper, where the number of test cases is so large that they have to be grouped as Test Suites. In such situations, a genetic algorithm proposed by us can be used to reconfigure these Test Suites in each cycle of regression testing. A comparison is made between a proprietary tool and an open source tool using the above-mentioned metrics. Our approach is clarified through several tables.

Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool

Procedia PDF Downloads 436
24821 Monitoring the Pollution Status of the Goan Coast Using Genotoxicity Biomarkers in the Bivalve, Meretrix ovum

Authors: Avelyno D'Costa, S. K. Shyama, M. K. Praveen Kumar

Abstract:

The coast of Goa, India receives constant anthropogenic stress through its major rivers which carry mining rejects of iron and manganese ores from upstream mining sites and petroleum hydrocarbons from shipping and harbor-related activities which put the aquatic fauna such as bivalves at risk. The present study reports the pollution status of the Goan coast by the above xenobiotics employing genotoxicity studies. This is further supplemented by the quantification of total petroleum hydrocarbons (TPHs) and various trace metals (iron, manganese, copper, cadmium, and lead) in gills of the estuarine clam, Meretrix ovum as well as from the surrounding water and sediment, over a two-year sampling period, from January 2013 to December 2014. Bivalves were collected from a probable unpolluted site at Palolem and a probable polluted site at Vasco, based upon the anthropogenic activities at these sites. Genotoxicity was assessed in the gill cells using the comet assay and micronucleus test. The quantity of TPHs and trace metals present in gill tissue, water and sediments were analyzed using spectrofluorometry and atomic absorption spectrophotometry (AAS), respectively. The statistical significance of data was analyzed employing Student’s t-test. The relationship between DNA damage and pollutant concentrations was evaluated using multiple regression analysis. Significant DNA damage was observed in the bivalves collected from Vasco which is a region of high industrial activity. Concentrations of TPHs and trace metals (iron, manganese, and cadmium) were also found to be significantly high in gills of the bivalves collected from Vasco compared to those collected from Palolem. Further, the concentrations of these pollutants were also found to be significantly high in the water and sediments at Vasco compared to that of Palolem. This may be due to the lack of industrial activity at Palolem. 
A high positive correlation was observed between the pollutant levels and DNA damage in the bivalves collected from Vasco suggesting the genotoxic nature of these pollutants. Further, M. ovum can be used as a bioindicator species for monitoring the level of pollution of the estuarine/coastal regions by TPHs and trace metals.

Keywords: comet assay, metals, micronucleus test, total petroleum Hydrocarbons

Procedia PDF Downloads 237
24820 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually provided with a specific category for the convenience of the users. In the past, the categorization was performed manually. However, manual categorization not only cannot guarantee accuracy but also requires a large amount of time and huge costs. Many studies have been conducted towards the automatic creation of categories to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because the methods work by assuming that one document can be categorized into one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training using a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the requirement of a multi-categorized training set imposed by traditional multi-categorization algorithms, we previously proposed a new methodology that can extend the category of a single-categorized document to multiple categories by analyzing relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: big data analysis, document classification, multi-category, text mining, topic analysis

Procedia PDF Downloads 272
24819 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still many technology barriers as well as much confusion in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it is a key task to make better use of new technologies for deeper mining and wider application of geological big data. In this paper, we first briefly introduce the basic principle of blockchain technology and then analyze the application dilemma of geological data. Based on this analysis, we bring forward some feasible patterns and scenarios for the application of blockchain in geological big data and put forward several suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 89
24818 Learning about the Strengths and Weaknesses of Urban Climate Action Plans

Authors: Prince Dacosta Aboagye, Ayyoob Sharifi

Abstract:

Cities respond to climate concerns mainly through their climate action plans (CAPs). A comprehensive content analysis of the dynamics in existing urban CAPs is not well represented in the literature. This literature void presents a difficulty in appreciating the strengths and weaknesses of urban CAPs. Here, we perform a qualitative content analysis (QCA) on CAPs from 278 cities worldwide and use text-mining tools to map and visualize the relevant data. Our analysis showed a decline in the number of CAPs developed and published following the global COVID-19 lockdown period. Evidently, megacities are leading the deep decarbonisation agenda. We also observed a transition from developing mainly mitigation-focused CAPs pre-COP21 to both mitigation and adaptation CAPs. A lack of inclusiveness in local climate planning was common among European and North American cities. The evidence is a catalyst for understanding the trends in existing urban CAPs to shape future urban climate planning.

Keywords: urban, climate action plans, strengths, weaknesses

Procedia PDF Downloads 96
24817 Hydrogeophysical Investigations And Mapping of Ingress Channels Along The Blesbokspruit Stream In The East Rand Basin Of The Witwatersrand, South Africa

Authors: Melvin Sethobya, Sithule Xanga, Sechaba Lenong, Lunga Nolakana, Gbenga Adesola

Abstract:

Mining has been the cornerstone of the South African economy for the last century. Most of the gold mining in South Africa was conducted within the Witwatersrand basin, which contributed to the rapid growth of the city of Johannesburg and catapulted the city into becoming the business and wealth capital of the country. However, with the gradual depletion of resources, a stoppage in the extraction of underground water from mines, and other factors relating to the survival of mining operations over a lengthy period, most of the mines were abandoned and left to pollute the local waterways and groundwater with toxins and heavy metal residue, and increased acid mine drainage ensued. The Department of Mineral Resources and Energy commissioned a project whose aim is to monitor, maintain, and mitigate the adverse environmental impacts of polluted mine water flowing into local streams and affecting local ecosystems and livelihoods downstream. As part of the mitigation efforts, the diagnosis and monitoring of sites with polluted groundwater or surface water has become important. Geophysical surveys, in particular resistivity and magnetics surveys, were selected as some of the most suitable techniques for investigating local ingress points along one of the major streams cutting through the Witwatersrand basin, namely the Blesbokspruit, which is found in the eastern part of the basin. The aim of the surveys was to provide information that could be used to assist in determining possible water loss/ingress from the Blesbokspruit stream. Modelling of the geophysical survey results offered in-depth insight into the interaction and pathways of polluted water through mapping of possible ingress channels near the Blesbokspruit. The resistivity-depth profile of the surveyed site exhibits a three-layered model with a low-resistivity (10 to 200 Ω.m) overburden, which is underlain by a moderate-resistivity weathered layer (>300 Ω.m), which sits on a more resistive crystalline bedrock (>500 Ω.m).
Two locations of potential ingress channels were mapped across the two traverses at the site. The magnetic survey conducted at the site mapped a major NE-SW trending regional lineament with a strong magnetic signature, which was modelled to a depth beyond 100 m and has the potential to act as a conduit for dispersion of stream water away from the stream, as it shares a similar orientation with the potential ingress channels mapped using the resistivity method.

Keywords: electrical resistivity, magnetics survey, Blesbokspruit, ingress

Procedia PDF Downloads 63
24816 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification

Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike

Abstract:

Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased toward the majority class on imbalanced datasets because of the skewed class distribution of such datasets. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and avoids reducing the number of majority observations in imbalanced dataset classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets: a cancer dataset, the German credit dataset, and a banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that, for some of the weights of our proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results than those of the ID3 decision tree and the decision tree induced with minority entropy for all three datasets.
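As an illustration of the idea only, the following minimal Python sketch shows one way class weights can enter the entropy used by a decision tree, together with the three evaluation metrics named above. The inverse-frequency weighting and the function names `inverse_ratio_weights`, `weighted_entropy`, and `binary_metrics` are assumptions made for this example; the abstract does not specify the authors' weighting scheme.

```python
import math
from collections import Counter


def inverse_ratio_weights(labels):
    """Hypothetical weighting: each class is weighted by n / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * count) for cls, count in counts.items()}


def weighted_entropy(labels, class_weights):
    """Shannon entropy in which every sample counts with its class weight."""
    counts = Counter(labels)
    mass = {cls: count * class_weights.get(cls, 1.0) for cls, count in counts.items()}
    total = sum(mass.values())
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)


def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and accuracy from binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# A 9:1 imbalanced node: weighting raises the entropy the tree "sees",
# so splits that isolate the minority class are no longer undervalued.
node_labels = [0] * 9 + [1]
plain = weighted_entropy(node_labels, {})
balanced = weighted_entropy(node_labels, inverse_ratio_weights(node_labels))
```

With weights of this kind, the minority class contributes as much probability mass to a node as the majority class, which is one common way to counter majority-class bias without discarding majority observations.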

Keywords: data mining, decision tree, classification, imbalance dataset

Procedia PDF Downloads 136
24815 Research on the Risks of Railroad Receiving and Dispatching Trains Operators: Natural Language Processing Risk Text Mining

Authors: Yangze Lan, Ruihua Xv, Feng Zhou, Yijia Shan, Longhao Zhang, Qinghui Xv

Abstract:

Receiving and dispatching trains is an important part of railroad organization, yet the risk evaluation of operating personnel is still reflected only by scores, without further mining of wrong answers and operating accidents. Using natural language processing (NLP) technology, this study extracts the keywords and key phrases of 40 relevant risk events about receiving and dispatching trains and reclassifies the risk events into 8 categories, such as train approach and signal risks and dispatching command risks. Based on the historical risk data of personnel, the K-means clustering method is used to classify the risk level of personnel. The results indicate that high-risk operating personnel need strengthened training in train receiving and dispatching operations for essential trains and abnormal situations.
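The clustering step can be pictured with a small, self-contained K-means sketch in Python. Everything specific here is hypothetical: the two features per person (counts of wrong answers and of operating incidents), the sample values, the choice of three risk levels, and the deterministic farthest-point initialisation, which is used only to keep the example reproducible. The abstract does not describe the authors' features or settings.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean_point(members) if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    return assign(points, centroids), centroids


def risk_levels(points, names=("low", "medium", "high")):
    """Cluster personnel and name clusters by increasing centroid magnitude."""
    labels, centroids = kmeans(points, len(names))
    order = sorted(range(len(names)), key=lambda j: sum(centroids[j]))
    level_of = {cluster: names[rank] for rank, cluster in enumerate(order)}
    return [level_of[lab] for lab in labels]


# Hypothetical (wrong answers, operating incidents) per person.
personnel = [(1, 0), (2, 0), (1, 1), (5, 2), (6, 2), (5, 3), (10, 6), (11, 7), (12, 6)]
levels = risk_levels(personnel)
```

Naming the clusters by centroid magnitude is a design choice for this sketch; K-means itself only produces unlabeled groups, so some such ordering rule is needed to turn clusters into risk levels.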

Keywords: receiving and dispatching trains, natural language processing, risk evaluation, K-means clustering

Procedia PDF Downloads 91
24814 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study

Authors: Salima Smiti, Ines Gasmi, Makram Soui

Abstract:

Credit risk is the most important issue for financial institutions. Its assessment is an important task used to predict defaulting customers and classify customers as good or bad payers. To this end, numerous techniques have been applied to credit risk assessment. However, to our knowledge, several evaluation techniques, such as neural networks and SVMs, are black-box models that generate applicants’ classes without any explanation. In this paper, we propose to assess credit risk using a rule-based classification method. Our output is a set of rules that describe and explain the decision. We compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART, and Genetic Programming (GP)), where the goal is to find the best rules satisfying several criteria: accuracy, sensitivity, and specificity. The results confirm the efficiency of the GP algorithm on the German and Australian datasets compared to the other rule-based techniques for predicting credit risk.
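To make the notion of an explainable rule set concrete, here is a minimal Python sketch of OneR, the simplest of the algorithms listed above: for each attribute it maps every attribute value to its majority class and keeps the attribute whose rules make the fewest training errors. The applicant records and attribute names (`employed`, `housing`, `risk`) are invented for illustration and are not taken from the German or Australian datasets.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Return (attribute, {value: predicted class}, training errors) for the best single attribute."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        # Count target classes separately for each value of this attribute.
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # One rule per attribute value: predict its majority class.
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(row[target] != rule[row[attr]] for row in rows)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical applicants; "risk" is the class to predict.
applicants = [
    {"employed": "yes", "housing": "own", "risk": "good"},
    {"employed": "yes", "housing": "rent", "risk": "good"},
    {"employed": "no", "housing": "own", "risk": "bad"},
    {"employed": "no", "housing": "rent", "risk": "bad"},
    {"employed": "yes", "housing": "own", "risk": "good"},
]
attribute, rules, errors = one_r(applicants, "risk")
```

The returned dictionary reads directly as "if employed is yes then good, if employed is no then bad", which is the kind of human-readable explanation that black-box models do not provide.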

Keywords: credit risk assessment, classification algorithms, data mining, rule extraction

Procedia PDF Downloads 181
24813 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects. Otherwise, even with a lot of effort, the necessary development might not always be possible. This work examines the workflow of data-driven software development projects and its implementation process in order to describe how to manage such a project successfully, which will assist in minimizing the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 83
24812 An Automated Approach to the Nozzle Configuration of Polycrystalline Diamond Compact Drill Bits for Effective Cuttings Removal

Authors: R. Suresh, Pavan Kumar Nimmagadda, Ming Zo Tan, Shane Hart, Sharp Ugwuocha

Abstract:

Polycrystalline diamond compact (PDC) drill bits are extensively used in the oil and gas industry as well as the mining industry. Industry engineers continually improve upon PDC drill bit designs and hydraulic conditions. Optimized injection nozzles play a key role in improving the drilling performance and efficiency of these ever-changing PDC drill bits. In the first part of this study, computational fluid dynamics (CFD) modelling is performed to investigate the hydrodynamic characteristics of drilling fluid flow around the PDC drill bit. The open-source CFD software OpenFOAM simulates the flow around the drill bit based on the field input data. A specifically developed console application integrates the entire CFD process, including domain extraction, meshing, solving the governing equations, and post-processing. The results from the OpenFOAM solver are then compared with those of the ANSYS Fluent software. The data from both software programs agree. The second part of the paper describes the parametric study of the PDC drill bit nozzle to determine the effect of parameters such as the number of nozzles, nozzle velocity, and nozzle radial position and orientation on the flow field characteristics and bit washing patterns. After analyzing a series of nozzle configurations, the best configuration is identified and recommendations are made for modifying the PDC bit design.

Keywords: ANSYS Fluent, computational fluid dynamics, nozzle configuration, OpenFOAM, PDC drill bit

Procedia PDF Downloads 420
24811 Geological Structure Identification in Semilir Formation: An Correlated Geological and Geophysical (Very Low Frequency) Data for Zonation Disaster with Current Density Parameters and Geological Surface Information

Authors: E. M. Rifqi Wilda Pradana, Bagus Bayu Prabowo, Meida Riski Pujiyati, Efraim Maykhel Hagana Ginting, Virgiawan Arya Hangga Reksa

Abstract:

The VLF (Very Low Frequency) method is an electromagnetic method that uses low frequencies of 10-30 kHz, which results in fairly deep penetration. In this study, the VLF method was used for the zonation of disaster-prone areas by identifying geological structures in the form of faults. Data acquisition was carried out in the Trimulyo Region, Jetis District, Bantul Regency, Special Region of Yogyakarta, Indonesia, along 8 measurement paths. This study uses wave transmitters from Japan and Australia to obtain Tilt and Elipt values, which are used to create RAE (Rapat Arus Ekuivalen, or equivalent current density) sections that identify areas easily crossed by electric current. These sections indicate the existence of a geological structure in the form of faults in the study area, characterized by a high RAE value. Processing of the VLF data yields a Tilt vs. Elipt graph and a Moving Average (MA) Tilt vs. Moving Average (MA) Elipt graph for each path; these show a fluctuating pattern and no intersection at all. Data processing was carried out in Matlab and identified areas with low RAE values of 0%-6%, which indicate a medium with low conductivity and high resistivity and can be interpreted as the sandstone, claystone, and tuff lithology of the Semilir Formation. In contrast, high RAE values of 10%-16%, which indicate a medium with high conductivity and low resistivity, can be interpreted as a fault zone filled with fluid. The existence of the fault zone is supported by the discovery of a normal fault on the surface with strike N55°W and dip 63°E at coordinates X = 433256 and Y = 9127722, so that activities of residents in the zone, such as housing and mining, can be avoided to reduce the risk of natural disasters.
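Two of the processing steps mentioned above, moving-average smoothing of the Tilt and Elipt readings and the interpretation of RAE ranges, can be sketched in Python as follows. This is a simplified illustration, not the authors' Matlab workflow: the window length, the sample readings, and the function names are assumptions, and the computation of RAE itself from the smoothed data is not shown. RAE values between 6% and 10% are not characterized in the abstract, so the sketch leaves them unclassified.

```python
def moving_average(values, window=3):
    """Simple moving average; returns len(values) - window + 1 smoothed points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def classify_rae(rae_percent):
    """Interpret an RAE value (in percent) using the ranges reported in the abstract."""
    if 0 <= rae_percent <= 6:
        return "low conductivity / high resistivity (sandstone, claystone, tuff)"
    if 10 <= rae_percent <= 16:
        return "high conductivity / low resistivity (possible fluid-filled fault zone)"
    return "unclassified (range not described in the abstract)"


# Hypothetical Tilt and Elipt readings (percent) along one measurement path.
tilt = [4.0, 6.5, 3.0, 9.0, 12.5, 8.0, 2.5]
elipt = [-1.0, -2.5, -0.5, -4.0, -6.0, -3.5, -1.5]
ma_tilt = moving_average(tilt, window=3)
ma_elipt = moving_average(elipt, window=3)
```

Smoothing both series with the same window keeps the MA Tilt and MA Elipt curves aligned station by station, so they can be plotted against each other along the path.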

Keywords: current density, faults, very low frequency, zonation

Procedia PDF Downloads 175
24810 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good quality data. Within important fields, such as healthcare, the training of AI systems predominately relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals’ privacy rights. This research seeks to establish the challenges AI and data sciences pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 106