Search results for: mining wastewater

444 Mining User-Generated Contents to Detect Service Failures with Topic Model

Abstract:

Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and a pressing need to handle the overwhelmingly plethora amount of various UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a certain management suffers from. In this paper, we employ text mining of Latent Dirichlet Allocation (LDA) on a popular online review site dedicated to complaint from users. We find that the employed LDA efficiently detects customer complaints, and a further inspection with the visualization technique is effective to categorize the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights by applying machine learning techniques in marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R program from a beginning (data collection in R) to an end (LDA analysis in R) since the instruction is still largely undocumented. In this regard, it will help lower the boundary for interdisciplinary researcher to conduct related research.

Keywords: Latent Dirichlet allocation, R program, text mining, topic model, user generated contents, visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1216

443 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein

Abstract:

The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.

Keywords: Information Gain (IG), Intrusion Detection System (IDS), K-means Clustering, Weka.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2776

442 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Keywords: Remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2053

441 Post Mining- Discovering Valid Rules from Different Sized Data Sources

Authors: R. Nedunchezhian, K. Anbumani

Abstract:

A big organization may have multiple branches spread across different locations. Processing of data from these branches becomes a huge task when innumerable transactions take place. Also, branches may be reluctant to forward their data for centralized processing but are ready to pass their association rules. Local mining may also generate a large amount of rules. Further, it is not practically possible for all local data sources to be of the same size. A model is proposed for discovering valid rules from different sized data sources where the valid rules are high weighted rules. These rules can be obtained from the high frequency rules generated from each of the data sources. A data source selection procedure is considered in order to efficiently synthesize rules. Support Equalization is another method proposed which focuses on eliminating low frequency rules at the local sites itself thus reducing the rules by a significant amount.

Keywords: Association rules, multiple data stores, synthesizing, valid rules.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1404

440 Optimization of the Co-Precipitation of Industrial Waste Metals in a Continuous Reactor System

Authors: Thomas S. Abia II, Citlali Garcia-Saucedo

Abstract:

A continuous copper precipitation treatment (CCPT) system was conceived at Intel Chandler Site to serve as a first-of-kind (FOK) facility-scale waste copper (Cu), nickel (Ni), and manganese (Mn) co-precipitation facility. The process was designed to treat highly variable wastewater discharged from a substrate packaging research factory. The paper discusses metals co-precipitation induced by internal changes for manufacturing facilities that lack the capacity for hardware expansion due to real estate restrictions, aggressive schedules, or budgetary constraints. Herein, operating parameters such as pH and oxidation reduction potential (ORP) were examined to analyze the ability of the CCPT System to immobilize various waste metals. Additionally, influential factors such as influent concentrations and retention times were investigated to quantify the environmental variability against system performance. A total of 2,027 samples were analyzed and statistically evaluated to measure the performance of CCPT that was internally retrofitted for Mn abatement to meet environmental regulations. In order to enhance the consistency of the influent, a separate holding tank was cannibalized from another system to collect and slow-feed the segregated Mn wastewater from the factory into CCPT. As a result, the baseline influent Mn decreased from 17.2+18.7 mg¹L^-1 at pre-pilot to 5.15+8.11 mg¹L^-1 post-pilot (70.1% reduction). Likewise, the pre-trial and post-trial average influent Cu values to CCPT were 52.0+54.6 mg¹L^-1 and 33.9+12.7 mg¹L^-1, respectively (34.8% reduction). However, the raw Ni content of 0.97+0.39 mg¹L^-1 at pre-pilot increased to 1.06+0.17 mg¹L^-1 at post-pilot. The average Mn output declined from 10.9+11.7 mg¹L^-1 at pre-pilot to 0.44+1.33 mg¹L^-1 at post-pilot (96.0% reduction) as a result of the pH and ORP operating setpoint changes. In similar fashion, the output Cu quality improved from 1.60+5.38 mg¹L^-1 to 0.55+1.02 mg¹L^-1 (65.6% reduction) while the Ni output sustained a 50% enhancement during the pilot study (0.22+0.19 mg¹L^-1 reduced to 0.11+0.06 mg¹L^-1). pH and ORP were shown to be significantly instrumental to the precipitative versatility of the CCPT System.

Keywords: Copper, co-precipitation, industrial wastewater treatment, manganese, optimization, pilot study.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 983

439 Content Based Sampling over Transactional Data Streams

Authors: Mansour Tarafdar, Mohammad Saniee Abade

Abstract:

This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS as a content based sampling algorithm that works on a landmark window model of data streams and preserve more informed sample in sample space. This algorithm that work based on closed frequent itemset mining tasks, first initiate a concept lattice using initial data, then update lattice structure using an incremental mechanism.Incremental mechanism insert, update and delete nodes in/from concept lattice in batch manner. Presented algorithm extracts the final samples on demand of user. Experimental results show the accuracy of CFISDS on synthetic and real datasets, despite on CFISDS algorithm is not faster than exist sampling algorithms such as Z and DSS.

Keywords: Sampling, data streams, closed frequent item set mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1709

438 Mining Implicit Knowledge to Predict Political Risk by Providing Novel Framework with Using Bayesian Network

Authors: Siavash Asadi Ghajarloo

Abstract:

Nowadays predicting political risk level of country has become a critical issue for investors who intend to achieve accurate information concerning stability of the business environments. Since, most of the times investors are layman and nonprofessional IT personnel; this paper aims to propose a framework named GECR in order to help nonexpert persons to discover political risk stability across time based on the political news and events. To achieve this goal, the Bayesian Networks approach was utilized for 186 political news of Pakistan as sample dataset. Bayesian Networks as an artificial intelligence approach has been employed in presented framework, since this is a powerful technique that can be applied to model uncertain domains. The results showed that our framework along with Bayesian Networks as decision support tool, predicted the political risk level with a high degree of accuracy.

Keywords: Bayesian Networks, Data mining, GECRframework, Predicting political risk.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2174

437 Appraisal of Methods for Identifying, Mapping, and Modelling of Fluvial Erosion in a Mining Environment

Authors: F. F. Howard, I. Yakubu, C. B. Boye, J. S. Y. Kuma

Abstract:

Natural and human activities, such as mining operations, expose the natural soil to adverse environmental conditions, leading to contamination of soil, groundwater, and surface water, which has negative effects on humans, flora, and fauna. Bare or partly exposed soil is most liable to fluvial erosion. This paper enumerates various methods used to identify, map, and model fluvial erosion in a mining environment. Classical, Artificial Intelligence (AI), and GIS methods have been reviewed. One of the many classical methods used to estimate river erosion is the Revised Universal Soil Loss Equation (RUSLE) model. The RUSLE model is easy to use. Its reliance on empirical relationships that may not always be applicable to specific circumstances or locations is a flaw. Other classical models for estimating fluvial erosion are the Soil and Water Assessment Tool (SWAT) and the Universal Soil Loss Equation (USLE). These models offer a more complete understanding of the underlying physical processes and encompass a wider range of situations. Although more difficult to utilise, they depend on the availability and dependability of input data for correctness. AI can help deal with multivariate and complex difficulties and predict soil loss with higher accuracy than traditional methods, and also be used to build unique models for identifying degraded areas. AI techniques have become popular as an alternative predictor for degraded environments. However, this research proposed a hybrid of classical, AI, and GIS methods for efficient and effective modelling of fluvial erosion.

Keywords: Fluvial erosion, classical methods, Artificial Intelligence, Geographic Information System.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 185

436 A Hybrid Approach for Thread Recommendation in MOOC Forums

Authors: Ahmad. A. Kardan, Amir Narimani, Foozhan Ataiefard

Abstract:

Recommender Systems have been developed to provide contents and services compatible to users based on their behaviors and interests. Due to information overload in online discussion forums and users diverse interests, recommending relative topics and threads is considered to be helpful for improving the ease of forum usage. In order to lead learners to find relevant information in educational forums, recommendations are even more needed. We present a hybrid thread recommender system for MOOC forums by applying social network analysis and association rule mining techniques. Initial results indicate that the proposed recommender system performs comparatively well with regard to limited available data from users' previous posts in the forum.

Keywords: Association rule mining, hybrid recommender system, massive open online courses, MOOCs, social network analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1263

435 The Use of Classifiers in Image Analysis of Oil Wells Profiling Process and the Automatic Identification of Events

Authors: Jaqueline M. R. Vieira

Abstract:

Different strategies and tools are available at the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be possible and convenient with small images and few data, it may become difficult and suitable to errors when big databases of images must be treated. While the patterns may differ among the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). In this work we propose the inclusion of data-mining classification strategies in order to create a knowledge database of the segmented curves. These classifiers allow that, after some time using and manually pointing parts of borehole images that correspond to tension regions and breakout areas, the system will indicate and suggest automatically new candidate regions, with higher accuracy. We suggest the use of different classifiers methods, in order to achieve different knowledge dataset configurations.

Keywords: Brazil, classifiers, data-mining, Image Segmentation, oil well visualization, classifiers.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2544

434 Q-Map: Clinical Concept Mining from Clinical Documents

Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala

Abstract:

Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.

Keywords: Information retrieval (IR), unified medical language system (UMLS), Syntax Based Analysis, natural language processing (NLP), medical informatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 779

433 Simultaneous Treatment and Catalytic Gasification of Olive Mill Wastewater under Supercritical Conditions

Authors: Ekin Kıpçak, Sinan Kutluay, Mesut Akgün

Abstract:

Recently, a growing interest has emerged on the development of new and efficient energy sources, due to the inevitable extinction of the nonrenewable energy reserves. One of these alternative sources which has a great potential and sustainability to meet up the energy demand is biomass energy. This significant energy source can be utilized with various energy conversion technologies, one of which is biomass gasification in supercritical water. Water, being the most important solvent in nature, has very important characteristics as a reaction solvent under supercritical circumstances. At temperatures above its critical point (374.8oC and 22.1 MPa), water becomes more acidic and its diffusivity increases. Working with water at high temperatures increases the thermal reaction rate, which in consequence leads to a better dissolving of the organic matters and a fast reaction with oxygen. Hence, supercritical water offers a control mechanism depending on solubility, excellent transport properties based on its high diffusion ability and new reaction possibilities for hydrolysis or oxidation. In this study the gasification of a real biomass, namely olive mill wastewater (OMW), in supercritical water is investigated with the use of Pt/Al2O3 and Ni/Al2O3 catalysts. OMW is a by-product obtained during olive oil production, which has a complex nature characterized by a high content of organic compounds and polyphenols. These properties impose OMW a significant pollution potential, but at the same time, the high content of organics makes OMW a desirable biomass candidate for energy production. All of the catalytic gasification experiments were made with five different reaction temperatures (400, 450, 500, 550 and 600°C), under a constant pressure of 25 MPa. For the experiments conducted with Ni/Al2O3 catalyst, the effect of five reaction times (30, 60, 90, 120 and 150 s) was investigated. However, procuring that similar gasification efficiencies could be obtained at shorter times, the experiments were made by using different reaction times (10, 15, 20, 25 and 30 s) for the case of Pt/Al2O3 catalyst. Through these experiments, the effects of temperature, time and catalyst type on the gasification yields and treatment efficiencies were investigated.

Keywords: Catalyst, Gasification, Olive mill wastewater, Supercritical water.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1747

432 Role of Organic Wastewater Constituents in Iron Redox Cycling for Ferric Sludge Reuse in the Fenton-Based Treatment

Authors: J. Bolobajev, M. Trapido, A. Goi

Abstract:

The practical application of the Fenton-based treatment method for organic compounds-contaminated water purification is limited mainly because of the large amount of ferric sludge formed during the treatment, where ferrous iron (Fe(II)) is used as the activator of the hydrogen peroxide oxidation processes. Reuse of ferric sludge collected from clarifiers to substitute Fe(II) salts allows reducing the total cost of Fenton-type treatment technologies and minimizing the accumulation of hazardous ferric waste. Dissolution of ferric iron (Fe(III)) from the sludge to liquid phase at acidic pH and autocatalytic transformation of Fe(III) to Fe(II) by phenolic compounds (tannic acid, lignin, phenol, catechol, pyrogallol and hydroquinone) added or present as water/wastewater constituents were found to be essentially involved in the Fenton-based oxidation mechanism. Observed enhanced formation of highly reactive species, hydroxyl radicals, resulted in a substantial organic contaminant degradation increase. Sludge reuse at acidic pH and in the presence of ferric iron reductants is a novel strategy in the Fenton-based treatment application for organic compounds-contaminated water purification.

Keywords: Ferric sludge reuse, ferric iron reductant, water treatment, organic pollutant.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1669

431 Genetic-based Anomaly Detection in Logs of Process Aware Systems

Authors: Hanieh Jalali, Ahmad Baraani

Abstract:

Nowaday-s, many organizations use systems that support business process as a whole or partially. However, in some application domains, like software development and health care processes, a normative Process Aware System (PAS) is not suitable, because a flexible support is needed to respond rapidly to new process models. On the other hand, a flexible Process Aware System may be vulnerable to undesirable and fraudulent executions, which imposes a tradeoff between flexibility and security. In order to make this tradeoff available, a genetic-based anomaly detection model for logs of Process Aware Systems is presented in this paper. The detection of an anomalous trace is based on discovering an appropriate process model by using genetic process mining and detecting traces that do not fit the appropriate model as anomalous trace; therefore, when used in PAS, this model is an automated solution that can support coexistence of flexibility and security.

Keywords: Anomaly Detection, Genetic Algorithm, ProcessAware Systems, Process Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1925

430 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1985

429 Discovery of Time Series Event Patterns based on Time Constraints from Textual Data

Authors: Shigeaki Sakurai, Ken Ueno, Ryohei Orihara

Abstract:

This paper proposes a method that discovers time series event patterns from textual data with time information. The patterns are composed of sequences of events and each event is extracted from the textual data, where an event is characteristic content included in the textual data such as a company name, an action, and an impression of a customer. The method introduces 7 types of time constraints based on the analysis of the textual data. The method also evaluates these constraints when the frequency of a time series event pattern is calculated. We can flexibly define the time constraints for interesting combinations of events and can discover valid time series event patterns which satisfy these conditions. The paper applies the method to daily business reports collected by a sales force automation system and verifies its effectiveness through numerical experiments.

Keywords: Text mining, sequential mining, time constraints, daily business reports.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1488

428 Knowledge Mining in Web-based Learning Environments

Authors: Nittaya Kerdprasop, Kittisak Kerdprasop

Abstract:

The state of the art in instructional design for computer-assisted learning has been strongly influenced by advances in information technology, Internet and Web-based systems. The emphasis of educational systems has shifted from training to learning. The course delivered has also been changed from large inflexible content to sequential small chunks of learning objects. The concepts of learning objects together with the advanced technologies of Web and communications support the reusability, interoperability, and accessibility design criteria currently exploited by most learning systems. These concepts enable just-in-time learning. We propose to extend theses design criteria further to include the learnability concept that will help adapting content to the needs of learners. The learnability concept offers a better personalization leading to the creation and delivery of course content more appropriate to performance and interest of each learner. In this paper we present a new framework of learning environments containing knowledge discovery as a tool to automatically learn patterns of learning behavior from learners' profiles and history.

Keywords: Knowledge mining, Web-based learning, Learning environments.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1786

427 A Hybrid Recommendation System Based On Association Rules

Authors: Ahmed Mohammed K. Alsalama

Abstract:

Recommendation systems are widely used in e-commerce applications. The engine of a current recommendation system recommends items to a particular user based on user preferences and previous high ratings. Various recommendation schemes such as collaborative filtering and content-based approaches are used to build a recommendation system. Most of current recommendation systems were developed to fit a certain domain such as books, articles, and movies. We propose1 a hybrid framework recommendation system to be applied on two dimensional spaces (User × Item) with a large number of Users and a small number of Items. Moreover, our proposed framework makes use of both favorite and non-favorite items of a particular user. The proposed framework is built upon the integration of association rules mining and the content-based approach. The results of experiments show that our proposed framework can provide accurate recommendations to users.

Keywords: Data Mining, Association Rules, Recommendation Systems, Hybrid Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3989

426 CoP-Networks: Virtual Spaces for New Faculty’s Professional Development in the 21st Higher Education

Authors: Eman AbuKhousa, Marwan Z. Bataineh

Abstract:

The 21^st century higher education and globalization challenge new faculty members to build effective professional networks and partnership with industry in order to accelerate their growth and success. This creates the need for community of practice (CoP)-oriented development approaches that focus on cognitive apprenticeship while considering individual predisposition and future career needs. This work adopts data mining, clustering analysis, and social networking technologies to present the CoP-Network as a virtual space that connects together similar career-aspiration individuals who are socially influenced to join and engage in a process for domain-related knowledge and practice acquisitions. The CoP-Network model can be integrated into higher education to extend traditional graduate and professional development programs.

Keywords: Clustering analysis, community of practice, data mining, higher education, new faculty challenges, social networks, social influence, professional development.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 972

425 Anomaly Based On Frequent-Outlier for Outbreak Detection in Public Health Surveillance

Authors: Zalizah Awang Long, Abdul Razak Hamdan, Azuraliza Abu Bakar

Abstract:

Public health surveillance system focuses on outbreak detection and data sources used. Variation or aberration in the frequency distribution of health data, compared to historical data is often used to detect outbreaks. It is important that new techniques be developed to improve the detection rate, thereby reducing wastage of resources in public health. Thus, the objective is to developed technique by applying frequent mining and outlier mining techniques in outbreak detection. 14 datasets from the UCI were tested on the proposed technique. The performance of the effectiveness for each technique was measured by t-test. The overall performance shows that DTK can be used to detect outlier within frequent dataset. In conclusion the outbreak detection technique using anomaly-based on frequent-outlier technique can be used to identify the outlier within frequent dataset.

Keywords: Outlier detection, frequent-outlier, outbreak, anomaly, surveillance, public health

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2274

424 Determination of the Bank's Customer Risk Profile: Data Mining Applications

Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge

Abstract:

In this study, the clients who applied to a bank branch for loan were analyzed through data mining. The study was composed of the information such as amounts of loans received by personal and SME clients working with the bank branch, installment numbers, number of delays in loan installments, payments available in other banks and number of banks to which they are in debt between 2010 and 2013. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, 5 different types of customers have been determined on the decision tree. The classification of these types of customers has been created with the rating of those posing a risk for the bank branch and the customers have been classified according to the risk ratings.

Keywords: Client classification, loan suitability, risk rating, CART analysis, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1072

423 Catalytic Gasification of Olive Mill Wastewater as a Biomass Source under Supercritical Conditions

Authors: Ekin Kıpçak, Mesut Akgün

Abstract:

Recently, a growing interest has emerged on the development of new and efficient energy sources, due to the inevitable extinction of the nonrenewable energy reserves. One of these alternative sources which have a great potential and sustainability to meet up the energy demand is biomass energy. This significant energy source can be utilized with various energy conversion technologies, one of which is biomass gasification in supercritical water.

Water, being the most important solvent in nature, has very important characteristics as a reaction solvent under supercritical circumstances. At temperatures above its critical point (374.8^oC and 22.1MPa), water becomes more acidic and its diffusivity increases. Working with water at high temperatures increases the thermal reaction rate, which in consequence leads to a better dissolving of the organic matters and a fast reaction with oxygen. Hence, supercritical water offers a control mechanism depending on solubility, excellent transport properties based on its high diffusion ability and new reaction possibilities for hydrolysis or oxidation.

In this study the gasification of a real biomass, namely olive mill wastewater (OMW), in supercritical water conditions is investigated with the use of Ru/Al₂O₃ catalyst. OMW is a by-product obtained during olive oil production, which has a complex nature characterized by a high content of organic compounds and polyphenols. These properties impose OMW a significant pollution potential, but at the same time, the high content of organics makes OMW a desirable biomass candidate for energy production.

The catalytic gasification experiments were made with five different reaction temperatures (400, 450, 500, 550 and 600°C) and five reaction times (30, 60, 90, 120 and 150s), under a constant pressure of 25MPa. Through these experiments, the effects of reaction temperature and time on the gasification yield, gaseous product composition and OMW treatment efficiency were investigated.

Keywords: Catalyst, Gasification, Olive mill wastewater, Ru/Al2O3, Supercritical water.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2279

422 Mercury Removal Using Pseudomonas putida (ATTC 49128): Effect of Acclimatization Time, Speed and Temperature of Incubator Shaker

Authors: A. A. M. Azoddein, R. M. Yunus, N. M. Sulaiman, A. B. Bustary, K. Sabar

Abstract:

Microbes have been used to solve environmental problems for many years. The role of microorganism to sequester, precipitate or alter the oxidation state of various heavy metals has been extensively studied. Treatment using microorganism interacts with toxic metal are very diverse. The purpose of this research is to remove the mercury using Pseudomonas putida (P. putida), pure culture ATTC 49128 at optimum growth parameters such as techniques of culture, acclimatization time and speed of incubator shaker. Thus, in this study, the optimum growth parameters of P. putida were obtained to achieve the maximum of mercury removal. Based on the optimum parameters of P. putida for specific growth rate, the removal of two different mercury concentration, 1 ppm and 4 ppm were studied. From mercury nitrate solution, a mercuryresistant bacterial strain which is able to reduce from ionic mercury to metallic mercury was used to reduce ionic mercury. The overall levels of mercury removal in this study were between 80% and 89%. The information obtained in this study is of fundamental for understanding of the survival of P. putida ATTC 49128 in mercury solution. Thus, microbial mercury removal is a potential bioremediation for wastewater especially in petrochemical industries in Malaysia.

Keywords: Pseudomonas putida, growth kinetic, biosorption, mercury, petrochemical wastewater.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2421

421 Improving University Operations with Data Mining: Predicting Student Performance

Authors: Mladen Dragičević, Mirjana Pejić Bach, Vanja Šimičević

Abstract:

The purpose of this paper is to develop models that would enable predicting student success. These models could improve allocation of students among colleges and optimize the newly introduced model of government subsidies for higher education. For the purpose of collecting data, an anonymous survey was carried out in the last year of undergraduate degree student population using random sampling method. Decision trees were created of which two have been chosen that were most successful in predicting student success based on two criteria: Grade Point Average (GPA) and time that a student needs to finish the undergraduate program (time-to-degree). Decision trees have been shown as a good method of classification student success and they could be even more improved by increasing survey sample and developing specialized decision trees for each type of college. These types of methods have a big potential for use in decision support systems.

Keywords: Data mining, knowledge discovery in databases, prediction models, student success.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2540

420 Data Mining on the Router Logs for Statistical Application Classification

Authors: M. Rahmati, S.M. Mirzababaei

Abstract:

With the advance of information technology in the new era the applications of Internet to access data resources has steadily increased and huge amount of data have become accessible in various forms. Obviously, the network providers and agencies, look after to prevent electronic attacks that may be harmful or may be related to terrorist applications. Thus, these have facilitated the authorities to under take a variety of methods to protect the special regions from harmful data. One of the most important approaches is to use firewall in the network facilities. The main objectives of firewalls are to stop the transfer of suspicious packets in several ways. However because of its blind packet stopping, high process power requirements and expensive prices some of the providers are reluctant to use the firewall. In this paper we proposed a method to find a discriminate function to distinguish between usual packets and harmful ones by the statistical processing on the network router logs. By discriminating these data, an administrator may take an approach action against the user. This method is very fast and can be used simply in adjacent with the Internet routers.

Keywords: Data Mining, Firewall, Optimization, Packetclassification, Statistical Pattern Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1655

419 Study on the Optimization of Completely Batch Water-using Network with Multiple Contaminants Considering Flow Change

Authors: Jian Du, Shui Hong Hong, Lu Meng, Qing Wei Meng

Abstract:

This work addresses the problem of optimizing completely batch water-using network with multiple contaminants where the flow change caused by mass transfer is taken into consideration for the first time. A mathematical technique for optimizing water-using network is proposed based on source-tank-sink superstructure. The task is to obtain the freshwater usage, recycle assignments among water-using units, wastewater discharge and a steady water-using network configuration by following steps. Firstly, operating sequences of water-using units are determined by time constraints. Next, superstructure is simplified by eliminating the reuse and recycle from water-using units with maximum concentration of key contaminants. Then, the non-linear programming model is solved by GAMS (General Algebra Model System) for minimum freshwater usage, maximum water recycle and minimum wastewater discharge. Finally, numbers of operating periods are calculated to acquire the steady network configuration. A case study is solved to illustrate the applicability of the proposed approach.

Keywords: Completely batch process, flow change, multiple contaminants, water-using network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1451

418 Composite Kernels for Public Emotion Recognition from Twitter

Authors: Chien-Hung Chen, Yan-Chun Hsing, Yung-Chun Chang

Abstract:

The Internet has grown into a powerful medium for information dispersion and social interaction that leads to a rapid growth of social media which allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates tree kernel with the linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation to analyze the syntactic and content information in tweets. The experiment results demonstrate that our method can effectively detect public emotion of tweets while outperforming the other compared methods.

Keywords: Public emotion recognition, natural language processing, composite kernel, sentiment analysis, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 773

417 Text Mining Analysis of the Reconstruction Plans after the Great East Japan Earthquake

Authors: Minami Ito, Akihiro Iijima

Abstract:

On March 11, 2011, the Great East Japan Earthquake occurred off the coast of Sanriku, Japan. It is important to build a sustainable society through the reconstruction process rather than simply restoring the infrastructure. To compare the goals of reconstruction plans of quake-stricken municipalities, Japanese language morphological analysis was performed by using text mining techniques. Frequently-used nouns were sorted into four main categories of “life”, “disaster prevention”, “economy”, and “harmony with environment”. Because Soma City is affected by nuclear accident, sentences tagged to “harmony with environment” tended to be frequent compared to the other municipalities. Results from cluster analysis and principle component analysis clearly indicated that the local government reinforces the efforts to reduce risks from radiation exposure as a top priority.

Keywords: Eco-friendly reconstruction, harmony with environment, decontamination, nuclear disaster.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1966

416 The Benefits of End-To-End Integrated Planning from the Mine to Client Supply for Minimizing Penalties

Authors: G. Martino, F. Silva, E. Marchal

Abstract:

The control over delivered iron ore blend characteristics is one of the most important aspects of the mining business. The iron ore price is a function of its composition, which is the outcome of the beneficiation process. So, end-to-end integrated planning of mine operations can reduce risks of penalties on the iron ore price. In a standard iron mining company, the production chain is composed of mining, ore beneficiation, and client supply. When mine planning and client supply decisions are made uncoordinated, the beneficiation plant struggles to deliver the best blend possible. Technological improvements in several fields allowed bridging the gap between departments and boosting integrated decision-making processes. Clusterization and classification algorithms over historical production data generate reasonable previsions for quality and volume of iron ore produced for each pile of run-of-mine (ROM) processed. Mathematical modeling can use those deterministic relations to propose iron ore blends that better-fit specifications within a delivery schedule. Additionally, a model capable of representing the whole production chain can clearly compare the overall impact of different decisions in the process. This study shows how flexibilization combined with a planning optimization model between the mine and the ore beneficiation processes can reduce risks of out of specification deliveries. The model capabilities are illustrated on a hypothetical iron ore mine with magnetic separation process. Finally, this study shows ways of cost reduction or profit increase by optimizing process indicators across the production chain and integrating the different plannings with the sales decisions.

Keywords: Clusterization and classification algorithms, integrated planning, optimization, mathematical modeling, penalty minimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 645

415 An Efficient Graph Query Algorithm Based on Important Vertices and Decision Features

Authors: Xiantong Li, Jianzhong Li

Abstract:

Graph has become increasingly important in modeling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. Different from the existing methods, our approach, called VFM (Vertex to Frequent Feature Mapping), makes use of vertices and decision features as the basic indexing feature. VFM constructs two mappings between vertices and frequent features to answer graph queries. The VFM approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. The results show that the proposed method not only avoids the enumeration method of getting subgraphs of query graph, but also effectively reduces the subgraph isomorphism tests between the query graph and graphs in candidate answer set in verification stage.

Keywords: Decision Feature, Frequent Feature, Graph Dataset, Graph Query

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1871