Search results for: data stream mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7782

Search results for: data stream mining

7512 Autonomous Robots- Visual Perception in Underground Terrains Using Statistical Region Merging

Authors: Omowunmi E. Isafiade, Isaac O. Osunmakinde, Antoine B. Bagula

Abstract:

Robots- visual perception is a field that is gaining increasing attention from researchers. This is partly due to emerging trends in the commercial availability of 3D scanning systems or devices that produce a high information accuracy level for a variety of applications. In the history of mining, the mortality rate of mine workers has been alarming and robots exhibit a great deal of potentials to tackle safety issues in mines. However, an effective vision system is crucial to safe autonomous navigation in underground terrains. This work investigates robots- perception in underground terrains (mines and tunnels) using statistical region merging (SRM) model. SRM reconstructs the main structural components of an imagery by a simple but effective statistical analysis. An investigation is conducted on different regions of the mine, such as the shaft, stope and gallery, using publicly available mine frames, with a stream of locally captured mine images. An investigation is also conducted on a stream of underground tunnel image frames, using the XBOX Kinect 3D sensors. The Kinect sensors produce streams of red, green and blue (RGB) and depth images of 640 x 480 resolution at 30 frames per second. Integrating the depth information to drivability gives a strong cue to the analysis, which detects 3D results augmenting drivable and non-drivable regions in 2D. The results of the 2D and 3D experiment with different terrains, mines and tunnels, together with the qualitative and quantitative evaluation, reveal that a good drivable region can be detected in dynamic underground terrains.

Keywords: Drivable Region Detection, Kinect Sensor, Robots' Perception, SRM, Underground Terrains.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1837
7511 Feature Selection Approaches with Missing Values Handling for Data Mining - A Case Study of Heart Failure Dataset

Authors: N.Poolsawad, C.Kambhampati, J. G. F. Cleland

Abstract:

In this paper, we investigated the characteristic of a clinical dataseton the feature selection and classification measurements which deal with missing values problem.And also posed the appropriated techniques to achieve the aim of the activity; in this research aims to find features that have high effect to mortality and mortality time frame. We quantify the complexity of a clinical dataset. According to the complexity of the dataset, we proposed the data mining processto cope their complexity; missing values, high dimensionality, and the prediction problem by using the methods of missing value replacement, feature selection, and classification.The experimental results will extend to develop the prediction model for cardiology.

Keywords: feature selection, missing values, classification, clinical dataset, heart failure.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3211
7510 Social and Economic Effects of Mining Industry Restructuring in Romania -Case Studies

Authors: Andra Costache, Gica Pehoiu

Abstract:

As in other countries from Central and Eastern Europe, the economic restructuring occurred in the last decade of the twentieth century affected the mining industry in Romania, an oversize and heavily subsidized sector before 1989. After more than a decade since the beginning of mining restructuring, an evaluation of current social implications of the process it is required, together with an efficiency analysis of the adaptation mechanisms developed at governmental level. This article aims to provide an insight into these issues through case studies conducted in the most important coal basin of Romania, Petroşani Depression.

Keywords: case studies, government programs, miningrestructuring, social effects.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2660
7509 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.

Keywords: Microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1284
7508 Integration of Educational Data Mining Models to a Web-Based Support System for Predicting High School Student Performance

Authors: Sokkhey Phauk, Takeo Okazaki

Abstract:

The challenging task in educational institutions is to maximize the high performance of students and minimize the failure rate of poor-performing students. An effective method to leverage this task is to know student learning patterns with highly influencing factors and get an early prediction of student learning outcomes at the timely stage for setting up policies for improvement. Educational data mining (EDM) is an emerging disciplinary field of data mining, statistics, and machine learning concerned with extracting useful knowledge and information for the sake of improvement and development in the education environment. The study is of this work is to propose techniques in EDM and integrate it into a web-based system for predicting poor-performing students. A comparative study of prediction models is conducted. Subsequently, high performing models are developed to get higher performance. The hybrid random forest (Hybrid RF) produces the most successful classification. For the context of intervention and improving the learning outcomes, a feature selection method MICHI, which is the combination of mutual information (MI) and chi-square (CHI) algorithms based on the ranked feature scores, is introduced to select a dominant feature set that improves the performance of prediction and uses the obtained dominant set as information for intervention. By using the proposed techniques of EDM, an academic performance prediction system (APPS) is subsequently developed for educational stockholders to get an early prediction of student learning outcomes for timely intervention. Experimental outcomes and evaluation surveys report the effectiveness and usefulness of the developed system. The system is used to help educational stakeholders and related individuals for intervening and improving student performance.

Keywords: Academic performance prediction system, prediction model, educational data mining, dominant factors, feature selection methods, student performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 976
7507 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords: Clustering, k-means, categorical datasets, pattern recognition, unsupervised learning, knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3545
7506 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein

Abstract:

The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.

Keywords: Information Gain (IG), Intrusion Detection System (IDS), K-means Clustering, Weka.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2776
7505 A Semantic Recommendation Procedure for Electronic Product Catalog

Authors: Hadi Khosravi Farsani, Mohammadali Nematbakhsh

Abstract:

To overcome the product overload of Internet shoppers, we introduce a semantic recommendation procedure which is more efficient when applied to Internet shopping malls. The suggested procedure recommends the semantic products to the customers and is originally based on Web usage mining, product classification, association rule mining, and frequently purchasing. We applied the procedure to the data set of MovieLens Company for performance evaluation, and some experimental results are provided. The experimental results have shown superior performance in terms of coverage and precision.

Keywords: Personalization, Recommendation, OWL Ontology, Electronic Catalogs, Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1923
7504 Forest Risk and Vulnerability Assessment: A Case Study from East Bokaro Coal Mining Area in India

Authors: Sujata Upgupta, Prasoon Kumar Singh

Abstract:

The expansion of large scale coal mining into forest areas is a potential hazard for the local biodiversity and wildlife. The objective of this study is to provide a picture of the threat that coal mining poses to the forests of the East Bokaro landscape. The vulnerable forest areas at risk have been assessed and the priority areas for conservation have been presented. The forested areas at risk in the current scenario have been assessed and compared with the past conditions using classification and buffer based overlay approach. Forest vulnerability has been assessed using an analytical framework based on systematic indicators and composite vulnerability index values. The results indicate that more than 4 km2 of forests have been lost from 1973 to 2016. Large patches of forests have been diverted for coal mining projects. Forests in the northern part of the coal field within 1-3 km radius around the coal mines are at immediate risk. The original contiguous forests have been converted into fragmented and degraded forest patches. Most of the collieries are located within or very close to the forests thus threatening the biodiversity and hydrology of the surrounding regions. Based on the vulnerability values estimated, it was concluded that more than 90% of the forested grids in East Bokaro are highly vulnerable to mining. The forests in the sub-districts of Bermo and Chandrapura have been identified as the most vulnerable to coal mining activities. This case study would add to the capacity of the forest managers and mine managers to address the risk and vulnerability of forests at a small landscape level in order to achieve sustainable development.

Keywords: Coal mining, forest, indicators, vulnerability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1160
7503 Text-Mining Approach for Evaluation of Affective Management Practices

Authors: Masaaki Saito, Qin Tang, Hiroyuki Umemuro

Abstract:

The purpose of this paper is to propose a text mining approach to evaluate companies- practices on affective management. Affective management argues that it is critical to take stakeholders- affects into consideration during decision-making process, along with the traditional numerical and rational indices. CSR reports published by companies were collected as source information. Indices were proposed based on the frequency and collocation of words relevant to affective management concept using text mining approach to analyze the text information of CSR reports. In addition, the relationships between the results obtained using proposed indices and traditional indicators of business performance were investigated using correlation analysis. Those correlations were also compared between manufacturing and non-manufacturing companies. The results of this study revealed the possibility to evaluate affective management practices of companies based on publicly available text documents.

Keywords: Affective management, Affect, Stakeholder, Text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1845
7502 Hybrid Intelligent Intrusion Detection System

Authors: Norbik Bashah, Idris Bharanidharan Shanmugam, Abdul Manan Ahmed

Abstract:

Intrusion Detection Systems are increasingly a key part of systems defense. Various approaches to Intrusion Detection are currently being used, but they are relatively ineffective. Artificial Intelligence plays a driving role in security services. This paper proposes a dynamic model Intelligent Intrusion Detection System, based on specific AI approach for intrusion detection. The techniques that are being investigated includes neural networks and fuzzy logic with network profiling, that uses simple data mining techniques to process the network data. The proposed system is a hybrid system that combines anomaly, misuse and host based detection. Simple Fuzzy rules allow us to construct if-then rules that reflect common ways of describing security attacks. For host based intrusion detection we use neural-networks along with self organizing maps. Suspicious intrusions can be traced back to its original source path and any traffic from that particular source will be redirected back to them in future. Both network traffic and system audit data are used as inputs for both.

Keywords: Intrusion Detection, Network Security, Data mining, Fuzzy Logic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2131
7501 Automata Theory Approach for Solving Frequent Pattern Discovery Problems

Authors: Renáta Iváncsy, István Vajk

Abstract:

The various types of frequent pattern discovery problem, namely, the frequent itemset, sequence and graph mining problems are solved in different ways which are, however, in certain aspects similar. The main approach of discovering such patterns can be classified into two main classes, namely, in the class of the levelwise methods and in that of the database projection-based methods. The level-wise algorithms use in general clever indexing structures for discovering the patterns. In this paper a new approach is proposed for discovering frequent sequences and tree-like patterns efficiently that is based on the level-wise issue. Because the level-wise algorithms spend a lot of time for the subpattern testing problem, the new approach introduces the idea of using automaton theory to solve this problem.

Keywords: Frequent pattern discovery, graph mining, pushdownautomaton, sequence mining, state machine, tree mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1628
7500 Combining Fuzzy Logic and Data Miningto Predict the Result of an EIA Review

Authors: Kevin Fong-Rey Liu, Jia-Shen Chen, Han-Hsi Liang, Cheng-Wu Chen, Yung-Shuen Shen

Abstract:

The purpose of determining impact significance is to place value on impacts. Environmental impact assessment review is a process that judges whether impact significance is acceptable or not in accordance with the scientific facts regarding environmental, ecological and socio-economical impacts described in environmental impact statements (EIS) or environmental impact assessment reports (EIAR). The first aim of this paper is to summarize the criteria of significance evaluation from the past review results and accordingly utilize fuzzy logic to incorporate these criteria into scientific facts. The second aim is to employ data mining technique to construct an EIS or EIAR prediction model for reviewing results which can assist developers to prepare and revise better environmental management plans in advance. The validity of the previous prediction model proposed by authors in 2009 is 92.7%. The enhanced validity in this study can attain 100.0%.

Keywords: Environmental impact assessment review, impactsignificance, fuzzy logic, data mining, classification tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1944
7499 An Approach to Concerns and Aspects Mining for Web Applications

Authors: Carlo Bellettini, Alessandro Marchetto, Andrea Trentini

Abstract:

Web applications have become very complex and crucial, especially when combined with areas such as CRM (Customer Relationship Management) and BPR (Business Process Reengineering), the scientific community has focused attention to Web applications design, development, analysis, and testing, by studying and proposing methodologies and tools. This paper proposes an approach to automatic multi-dimensional concern mining for Web Applications, based on concepts analysis, impact analysis, and token-based concern identification. This approach lets the user to analyse and traverse Web software relevant to a particular concern (concept, goal, purpose, etc.) via multi-dimensional separation of concerns, to document, understand and test Web applications. This technique was developed in the context of WAAT (Web Applications Analysis and Testing) project. A semi-automatic tool to support this technique is currently under development.

Keywords: Aspect Mining, Concepts Analysis, Concerns Mining, Multi-Dimensional Separation of Concerns, Impact Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1513
7498 Validation and Selection between Machine Learning Technique and Traditional Methods to Reduce Bullwhip Effects: a Data Mining Approach

Authors: Hamid R. S. Mojaveri, Seyed S. Mousavi, Mojtaba Heydar, Ahmad Aminian

Abstract:

The aim of this paper is to present a methodology in three steps to forecast supply chain demand. In first step, various data mining techniques are applied in order to prepare data for entering into forecasting models. In second step, the modeling step, an artificial neural network and support vector machine is presented after defining Mean Absolute Percentage Error index for measuring error. The structure of artificial neural network is selected based on previous researchers' results and in this article the accuracy of network is increased by using sensitivity analysis. The best forecast for classical forecasting methods (Moving Average, Exponential Smoothing, and Exponential Smoothing with Trend) is resulted based on prepared data and this forecast is compared with result of support vector machine and proposed artificial neural network. The results show that artificial neural network can forecast more precisely in comparison with other methods. Finally, forecasting methods' stability is analyzed by using raw data and even the effectiveness of clustering analysis is measured.

Keywords: Artificial Neural Networks (ANN), bullwhip effect, demand forecasting, Support Vector Machine (SVM).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2010
7497 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques

Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas

Abstract:

The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is needed to test how to properly preprocess the data for this research by preparing pipelines which fits each algorithm the best. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.

Keywords: Artificial neural network, competitive dynamics, logistic regression, text classification, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 535
7496 Survey on Arabic Sentiment Analysis in Twitter

Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb

Abstract:

Large-scale data stream analysis has become one of the important business and research priorities lately. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends out of these data would aid in a better understanding and decision-making. Multiple analysis techniques are deployed for English content. Moreover, one of the languages that produce a large amount of data over social networks and is least analyzed is the Arabic language. The proposed paper is a survey on the research efforts to analyze the Arabic content in Twitter focusing on the tools and methods used to extract the sentiments for the Arabic content on Twitter.

Keywords: Big Data, Social Networks, Sentiment Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4348
7495 Main Cause of Children's Deaths in Indigenous Wayuu Community from Department of La Guajira: A Research Developed through Data Mining Use

Authors: Isaura Esther Solano Núñez, David Suarez

Abstract:

The main purpose of this research is to discover what causes death in children of the Wayuu community, and deeply analyze those results in order to take corrective measures to properly control infant mortality. We consider important to determine the reasons that are producing early death in this specific type of population, since they are the most vulnerable to high risk environmental conditions. In this way, the government, through competent authorities, may develop prevention policies and the right measures to avoid an increase of this tragic fact. The methodology used to develop this investigation is data mining, which consists in gaining and examining large amounts of data to produce new and valuable information. Through this technique it has been possible to determine that the child population is dying mostly from malnutrition. In short, this technique has been very useful to develop this study; it has allowed us to transform large amounts of information into a conclusive and important statement, which has made it easier to take appropriate steps to resolve a particular situation.

Keywords: Malnutrition, datamining, analytical, descriptive, population, wayuu, indigenous.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 696
7494 The Effects of Human Activity in Yasuj Area on the Health of Stream City

Authors: Jamalodin Alvani, Fardin Boustani, Omid Tabiee, Masoud Hashemi

Abstract:

The Yasuj city stream named the Beshar supply water for different usages such as aquaculture farms , drinking, agricultural and industrial usages. Fish processing plants ,Agricultural farms, waste water of industrial zones and hospitals waste water which they are generate by human activity produce a considerable volume of effluent and when they are released in to the stream they can effect on the water quality and down stream aquatic systems. This study was conducted to evaluate the effects of outflow effluent from different human activity and point and non point pollution sources on the water quality and health of the Beshar river next to Yasuj. Yasuj is the biggest and most important city in the Kohkiloye and Boyerahmad province . The Beshar River is one of the most important aquatic ecosystems in the upstream of the Karun watershed in south of Iran which is affected by point and non point pollutant sources . This study was done in order to evaluate the effects of human activities on the water quality and health of the Beshar river. This river is approximately 190 km in length and situated at the geographical positions of 51° 20' to 51° 48' E and 30° 18' to 30° 52' N it is one of the most important aquatic ecosystems of Kohkiloye and Boyerahmad province in south-west Iran. In this research project, five study stations were selected to examine water pollution in the Beshar River systems. Human activity is now one of the most important factors affecting on hydrology and water quality of the Beshar river. Humans use large amounts of resources to sustain various standards of living, although measures of sustainability are highly variable depending on how sustainability is defined. The Beshar river ecosystems are particularly sensitive and vulnerable to human activities. The water samples were analyzed, then some important water quality parameters such as pH, dissolve oxygen (DO), Biochemical Oxygen Demand (BOD5), Chemical Oxygen Demand (COD), Total Suspended Solids (TDS),Turbidity, Temperature, Nitrates (NO3) and Phosphates (PO4) were estimated at the two stations. The results show a downward trend in the water quality at the down stream of the city. The amounts of BOD5,COD,TSS,T,Turbidity, NO3 and PO4 in the down stream stations were considerably more than the station 1. By contrast the amounts of DO in the down stream stations were less than to the station 1. However when effluent discharge consequence of human activities are released into the Beshar river near the city, the quality of river are decreases and the environmental problems of the river during the next years are predicted to rise.

Keywords: Health, Human activities, Water pollution, Yasuj , Iran

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2184
7493 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection

Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada

Abstract:

With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.

Keywords: Machine learning, Imbalanced data, Data mining, Big data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1137
7492 Dose due the Incorporation of Radionuclides Using Teeth as Bioindicators nearby Caetité Uranium Mines

Authors: Viviane S. Guimarães, Ícaro M. M. Brasil, Simara S. Campos, Roseli F. Gennari, Márcia R. P. Attie, Susana O. Souza.

Abstract:

Uranium mining and processing in Brazil occur in a northeastern area near to Caetité-BA. Several Non-Governmental Organizations claim that uranium mining in this region is a pollutant causing health risks to the local population,but those in charge of the complex extraction and production of“yellow cake" for generating fuel to the nuclear power plants reject these allegations. This study aimed at identifying potential problems caused by mining to the population of Caetité. In this, work,the concentrations of 238U, 232Th and 40K radioisotopes in the teeth of the Caetité population were determined by ICP-MS. Teeth are used as bioindicators of incorporated radionuclides. Cumulative radiation doses in the skeleton were also determined. The concentration values were below 0.008 ppm, and annual effective dose due to radioisotopes are below to the reference values. Therefore, it is not possible to state that the mining process in Caetité increases pollution or radiation exposure in a meaningful way.

Keywords: bioindicators, radiation dose, radioisotopesincorporation, uranium.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4111
7491 Educational Data Mining: The Case of Department of Mathematics and Computing in the Period 2009-2018

Authors: M. Sitoe, O. Zacarias

Abstract:

University education is influenced by several factors that range from the adoption of strategies to strengthen the whole process to the academic performance improvement of the students themselves. This work uses data mining techniques to develop a predictive model to identify students with a tendency to evasion and retention. To this end, a database of real students’ data from the Department of University Admission (DAU) and the Department of Mathematics and Informatics (DMI) was used. The data comprised 388 undergraduate students admitted in the years 2009 to 2014. The Weka tool was used for model building, using three different techniques, namely: K-nearest neighbor, random forest, and logistic regression. To allow for training on multiple train-test splits, a cross-validation approach was employed with a varying number of folds. To reduce bias variance and improve the performance of the models, ensemble methods of Bagging and Stacking were used. After comparing the results obtained by the three classifiers, Logistic Regression using Bagging with seven folds obtained the best performance, showing results above 90% in all evaluated metrics: accuracy, rate of true positives, and precision. Retention is the most common tendency.

Keywords: Evasion and retention, cross validation, bagging, stacking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 120
7490 Speed Characteristics of Mixed Traffic Flow on Urban Arterials

Authors: Ashish Dhamaniya, Satish Chandra

Abstract:

Speed and traffic volume data are collected on different sections of four lane and six lane roads in three metropolitan cities in India. Speed data are analyzed to fit the statistical distribution to individual vehicle speed data and all vehicles speed data. It is noted that speed data of individual vehicle generally follows a normal distribution but speed data of all vehicle combined at a section of urban road may or may not follow the normal distribution depending upon the composition of traffic stream. A new term Speed Spread Ratio (SSR) is introduced in this paper which is the ratio of difference in 85th and 50th percentile speed to the difference in 50th and 15th percentile speed. If SSR is unity then speed data are truly normally distributed. It is noted that on six lane urban roads, speed data follow a normal distribution only when SSR is in the range of 0.86 – 1.11. The range of SSR is validated on four lane roads also.

Keywords: Normal distribution, percentile speed, speed spread ratio, traffic volume.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4245
7489 Redesigning Business Processes: A Method Based on Simulation and Process Mining Techniques

Authors: Zahra Mohammadnazari, Fateme Rostambeygi, Fatemeh Dehrouyeh, Hwang Ki-Soon, Amir Aghsami

Abstract:

Corporations have always prioritized efforts to examine and improve processes. Various metrics, such as the cost and time required to implement the process and can be specified in this regard. Process improvement can be defined as an improvement of these indicators. This is accomplished by looking at prospective adjustments to the current executive process model or the resources allotted to it. Research has been conducted in this paper to the improve the procurement process and aims to explore assessment prospects in the project using a combination of process mining and simulation (benefiting from Play-In and Play-Out methodologies). To run the simulation, we will need to complete the control flow diagram, institution settings, resource settings, and activity settings. The process of mining event logs yields the process control flow. However, both the entry of institutions and the distribution of resources must be modeled. The rate of admission of institutions and the distribution of time for the implementation of activities will be determined in the next step.

Keywords: Business reengineering, Petri net, process-based simulation, process mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 483
7488 Rainfall and Flood Forecast Models for Better Flood Relief Plan of the Mae Sot Municipality

Authors: S. Chuenchooklin, S. Taweepong, U. Pangnakorn

Abstract:

This research was conducted in the Mae Sot Watershed where located in the Moei River Basin at the Upper Salween River Basin in Tak Province, Thailand. The Mae Sot Municipality is the largest urban area in Tak Province and situated in the midstream of the Mae Sot Watershed. It usually faces flash flood problem after heavy rain due to poor flood management has been reported since economic rapidly bloom up in recent years. Its catchment can be classified as ungauged basin with lack of rainfall data and no any stream gaging station was reported. It was attached by most severely flood events in 2013 as the worst studied case for all those communities in this municipality. Moreover, other problems are also faced in this watershed, such shortage water supply for domestic consumption and agriculture utilizations including a deterioration of water quality and landslide as well. The research aimed to increase capability building and strengthening the participation of those local community leaders and related agencies to conduct better water management in urban area was started by mean of the data collection and illustration of the appropriated application of some short period rainfall forecasting model as they aim for better flood relief plan and management through the hydrologic model system and river analysis system programs. The authors intended to apply the global rainfall data via the integrated data viewer (IDV) program from the Unidata with the aim for rainfall forecasting in a short period of 7-10 days in advance during rainy season instead of real time record. The IDV product can be present in an advance period of rainfall with time step of 3-6 hours was introduced to the communities. The result can be used as input data to the hydrologic modeling system model (HEC-HMS) for synthesizing flood hydrographs and use for flood forecasting as well. The authors applied the river analysis system model (HEC-RAS) to present flood flow behaviors in the reach of the Mae Sot stream via the downtown of the Mae Sot City as flood extents as the water surface level at every cross-sectional profiles of the stream. Both models of HMS and RAS were tested in 2013 with observed rainfall and inflow-outflow data from the Mae Sot Dam. The result of HMS showed fit to the observed data at the dam and applied at upstream boundary discharge to RAS in order to simulate flood extents and tested in the field, and the result found satisfying. The product of rainfall from IDV was fair while compared with observed data. However, it is an appropriate tool to use in the ungauged catchment to use with flood hydrograph and river analysis models for future efficient flood relief plan and management.

Keywords: Global rainfall, flood forecasting, hydrologic modeling system, river analysis system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2415
7487 Evaluation of the Performance of ACTIFLO® Clarifier in the Treatment of Mining Wastewaters: Case Study of Costerfield Mining Operations, Victoria, Australia

Authors: Seyed Mohsen Samaei, Shirley Gato-Trinidad

Abstract:

A pre-treatment stage prior to reverse osmosis (RO) is very important to ensure the long-term performance of the RO membranes in any wastewater treatment using RO. This study aims to evaluate the application of the Actiflo® clarifier as part of a pre-treatment unit in mining operations. It involves performing analytical testing on RO feed water before and after installation of Actiflo® unit. Water samples prior to RO plant stage were obtained on different dates from Costerfield mining operations in Victoria, Australia. Tests were conducted in an independent laboratory to determine the concentration of various compounds in RO feed water before and after installation of Actiflo® unit during the entire evaluated period from December 2015 to June 2018. Water quality analysis shows that the quality of RO feed water has remarkably improved since installation of Actiflo® clarifier. Suspended solids (SS) and turbidity removal efficiencies has been improved by 91 and 85 percent respectively in pre-treatment system since the installation of Actiflo®. The Actiflo® clarifier proved to be a valuable part of pre-treatment system prior to RO. It has the potential to conveniently condition the mining wastewater prior to RO unit, and reduce the risk of RO physical failure and irreversible fouling. Consequently, reliable and durable operation of RO unit with minimum requirement for RO membrane replacement is expected with Actiflo® in use.

Keywords: Actiflo® clarifier, membrane, mining wastewater, reverse osmosis, wastewater treatment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1200
7486 Performance Evaluation of Data Mining Techniques for Predicting Software Reliability

Authors: Pradeep Kumar, Abdul Wahid

Abstract:

Accurate software reliability prediction not only enables developers to improve the quality of software but also provides useful information to help them for planning valuable resources. This paper examines the performance of three well-known data mining techniques (CART, TreeNet and Random Forest) for predicting software reliability. We evaluate and compare the performance of proposed models with Cascade Correlation Neural Network (CCNN) using sixteen empirical databases from the Data and Analysis Center for Software. The goal of our study is to help project managers to concentrate their testing efforts to minimize the software failures in order to improve the reliability of the software systems. Two performance measures, Normalized Root Mean Squared Error (NRMSE) and Mean Absolute Errors (MAE), illustrate that CART model is accurate than the models predicted using Random Forest, TreeNet and CCNN in all datasets used in our study. Finally, we conclude that such methods can help in reliability prediction using real-life failure datasets.

Keywords: Classification, Cascade Correlation Neural Network, Random Forest, Software reliability, TreeNet.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1839
7485 Observations about the Principal Components Analysis and Data Clustering Techniques in the Study of Medical Data

Authors: Cristina G. Dascâlu, Corina Dima Cozma, Elena Carmen Cotrutz

Abstract:

The medical data statistical analysis often requires the using of some special techniques, because of the particularities of these data. The principal components analysis and the data clustering are two statistical methods for data mining very useful in the medical field, the first one as a method to decrease the number of studied parameters, and the second one as a method to analyze the connections between diagnosis and the data about the patient-s condition. In this paper we investigate the implications obtained from a specific data analysis technique: the data clustering preceded by a selection of the most relevant parameters, made using the principal components analysis. Our assumption was that, using the principal components analysis before data clustering - in order to select and to classify only the most relevant parameters – the accuracy of clustering is improved, but the practical results showed the opposite fact: the clustering accuracy decreases, with a percentage approximately equal with the percentage of information loss reported by the principal components analysis.

Keywords: Data clustering, medical data, principal components analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1501
7484 CoP-Networks: Virtual Spaces for New Faculty’s Professional Development in the 21st Higher Education

Authors: Eman AbuKhousa, Marwan Z. Bataineh

Abstract:

The 21st century higher education and globalization challenge new faculty members to build effective professional networks and partnership with industry in order to accelerate their growth and success. This creates the need for community of practice (CoP)-oriented development approaches that focus on cognitive apprenticeship while considering individual predisposition and future career needs. This work adopts data mining, clustering analysis, and social networking technologies to present the CoP-Network as a virtual space that connects together similar career-aspiration individuals who are socially influenced to join and engage in a process for domain-related knowledge and practice acquisitions. The CoP-Network model can be integrated into higher education to extend traditional graduate and professional development programs.

Keywords: Clustering analysis, community of practice, data mining, higher education, new faculty challenges, social networks, social influence, professional development.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 972
7483 Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy

Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie

Abstract:

In recent years, there has been an explosion in the rate of using technology that help discovering the diseases. For example, DNA microarrays allow us for the first time to obtain a "global" view of the cell. It has great potential to provide accurate medical diagnosis, to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied on such micro-array datasets to devise methods that can predict the occurrence of Leukemia disease. In this study, we compared the classification accuracy and response time among eleven decision tree methods and six rule classifier methods using five performance criteria. The experiment results show that the performance of Random Tree is producing better result. Also it takes lowest time to build model in tree classifier. The classification rules algorithms such as nearest- neighbor-like algorithm (NNge) is the best algorithm due to the high accuracy and it takes lowest time to build model in classification.

Keywords: Data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2557