Search results for: mining wastewater

142 Topic Modeling Using Latent Dirichlet Allocation and Latent Semantic Indexing on South African Telco Twitter Data

Authors: Phumelele P. Kubheka, Pius A. Owolawi, Gbolahan Aiyetoro

Abstract:

Twitter is one of the most popular social media platforms where users share their opinions on different subjects. Twitter can be considered a great source for mining text due to the high volumes of data generated through the platform daily. Many industries such as telecommunication companies can leverage the availability of Twitter data to better understand their markets and make an appropriate business decision. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA). The obtained results are benchmarked with another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics on a Twitter dataset containing user tweets on South African Telcos. Results from this study show that LSI is much faster than LDA. However, LDA yields better results with higher topic coherence by 8% for the best-performing model in this experiment. A higher topic coherence score indicates better performance of the model.

Keywords: Big data, latent Dirichlet allocation, latent semantic indexing, Telco, topic modeling, Twitter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 388

141 Removal of Ni(II), Zn(II) and Pb(II) ions from Single Metal Aqueous Solution using Activated Carbon Prepared from Rice Husk

Authors: Mohd F. Taha, Chong F. Kiat, Maizatul S. Shaharun, Anita Ramli

Abstract:

The abundance and availability of rice husk, an agricultural waste, make them as a good source for precursor of activated carbon. In this work, rice husk-based activated carbons were prepared via base treated chemical activation process prior the carbonization process. The effect of carbonization temperatures (400, 600 and 800oC) on their pore structure was evaluated through morphology analysis using scanning electron microscope (SEM). Sample carbonized at 800oC showed better evolution and development of pores as compared to those carbonized at 400 and 600oC. The potential of rice husk-based activated carbon as an alternative adsorbent was investigated for the removal of Ni(II), Zn(II) and Pb(II) from single metal aqueous solution. The adsorption studies using rice husk-based activated carbon as an adsorbent were carried out as a function of contact time at room temperature and the metal ions were analyzed using atomic absorption spectrophotometer (AAS). The ability to remove metal ion from single metal aqueous solution was found to be improved with the increasing of carbonization temperature. Among the three metal ions tested, Pb(II) ion gave the highest adsorption on rice husk-based activated carbon. The results obtained indicate the potential to utilize rice husk as a promising precursor for the preparation of activated carbon for removal of heavy metals.

Keywords: Activated carbon, metal ion adsorption, rice husk, wastewater treatment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2682

140 An Intelligent System for Phish Detection, using Dynamic Analysis and Template Matching

Authors: Chinmay Soman, Hrishikesh Pathak, Vishal Shah, Aniket Padhye, Amey Inamdar

Abstract:

Phishing, or stealing of sensitive information on the web, has dealt a major blow to Internet Security in recent times. Most of the existing anti-phishing solutions fail to handle the fuzziness involved in phish detection, thus leading to a large number of false positives. This fuzziness is attributed to the use of highly flexible and at the same time, highly ambiguous HTML language. We introduce a new perspective against phishing, that tries to systematically prove, whether a given page is phished or not, using the corresponding original page as the basis of the comparison. It analyzes the layout of the pages under consideration to determine the percentage distortion between them, indicative of any form of malicious alteration. The system design represents an intelligent system, employing dynamic assessment which accurately identifies brand new phishing attacks and will prove effective in reducing the number of false positives. This framework could potentially be used as a knowledge base, in educating the internet users against phishing.

Keywords: World Wide Web, Phishing, Internet security, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1794

139 A Distance Function for Data with Missing Values and Its Application

Authors: Loai AbdAllah, Ilan Shimshoni

Abstract:

Missing values in data are common in real world applications. Since the performance of many data mining algorithms depend critically on it being given a good metric over the input space, we decided in this paper to define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. According to this distance, the distance between two points without missing attributes values is simply the Mahalanobis distance. When on the other hand there is a missing value of one of the coordinates, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because its performance depends strongly on the chosen distance measure, we opted for the k nearest neighbor classifier to evaluate its ability to accurately reflect object similarity. We experimented on standard numerical datasets from the UCI repository from different fields. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance to other three basic methods. Our experiments show that kNN using our distance function outperforms the kNN using other methods. Moreover, the runtime performance of our method is only slightly higher than the other methods.

Keywords: Missing values, Distance metric, Bhattacharyya distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2704

138 Application of Artificial Neural Network to Classification Surface Water Quality

Authors: S. Wechmongkhonkon, N.Poomtong, S. Areerachakul

Abstract:

Water quality is a subject of ongoing concern. Deterioration of water quality has initiated serious management efforts in many countries. This study endeavors to automatically classify water quality. The water quality classes are evaluated using 6 factor indices. These factors are pH value (pH), Dissolved Oxygen (DO), Biochemical Oxygen Demand (BOD), Nitrate Nitrogen (NO3N), Ammonia Nitrogen (NH3N) and Total Coliform (TColiform). The methodology involves applying data mining techniques using multilayer perceptron (MLP) neural network models. The data consisted of 11 sites of canals in Dusit district in Bangkok, Thailand. The data is obtained from the Department of Drainage and Sewerage Bangkok Metropolitan Administration during 2007-2011. The results of multilayer perceptron neural network exhibit a high accuracy multilayer perception rate at 96.52% in classifying the water quality of Dusit district canal in Bangkok Subsequently, this encouraging result could be applied with plan and management source of water quality.

Keywords: artificial neural network, classification, surface water quality

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3162

137 Adaptive Network Intrusion Detection Learning: Attribute Selection and Classification

Authors: Dewan Md. Farid, Jerome Darmont, Nouria Harbi, Nguyen Huu Hoa, Mohammad Zahidur Rahman

Abstract:

In this paper, a new learning approach for network intrusion detection using naïve Bayesian classifier and ID3 algorithm is presented, which identifies effective attributes from the training dataset, calculates the conditional probabilities for the best attribute values, and then correctly classifies all the examples of training and testing dataset. Most of the current intrusion detection datasets are dynamic, complex and contain large number of attributes. Some of the attributes may be redundant or contribute little for detection making. It has been successfully tested that significant attribute selection is important to design a real world intrusion detection systems (IDS). The purpose of this study is to identify effective attributes from the training dataset to build a classifier for network intrusion detection using data mining algorithms. The experimental results on KDD99 benchmark intrusion detection dataset demonstrate that this new approach achieves high classification rates and reduce false positives using limited computational resources.

Keywords: Attributes selection, Conditional probabilities, information gain, network intrusion detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2655

136 Bayesian Networks for Earthquake Magnitude Classification in a Early Warning System

Authors: G. Zazzaro, F.M. Pisano, G. Romano

Abstract:

During last decades, worldwide researchers dedicated efforts to develop machine-based seismic Early Warning systems, aiming at reducing the huge human losses and economic damages. The elaboration time of seismic waveforms is to be reduced in order to increase the time interval available for the activation of safety measures. This paper suggests a Data Mining model able to correctly and quickly estimate dangerousness of the running seismic event. Several thousand seismic recordings of Japanese and Italian earthquakes were analyzed and a model was obtained by means of a Bayesian Network (BN), which was tested just over the first recordings of seismic events in order to reduce the decision time and the test results were very satisfactory. The model was integrated within an Early Warning System prototype able to collect and elaborate data from a seismic sensor network, estimate the dangerousness of the running earthquake and take the decision of activating the warning promptly.

Keywords: Bayesian Networks, Decision Support System, Magnitude Classification, Seismic Early Warning System

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3561

135 Classification of Political Affiliations by Reduced Number of Features

Authors: Vesile Evrim, Aliyu Awwal

Abstract:

By the evolvement in technology, the way of expressing opinions switched direction to the digital world. The domain of politics, as one of the hottest topics of opinion mining research, merged together with the behavior analysis for affiliation determination in texts, which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 were constituted by Linguistic Inquiry and Word Count (LIWC) features were tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that the “Decision Tree”, “Rule Induction” and “M5 Rule” classifiers when used with “SVM” and “IGR” feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “Function”, as an aggregate feature of the linguistic category, was found as the most differentiating feature among the 68 features with the accuracy of 81% in classifying articles either as Republican or Democrat.

Keywords: Politics, machine learning, feature selection, LIWC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2320

134 Lexical Database for Multiple Languages: Multilingual Word Semantic Network

Authors: K. K. Yong, R. Mahmud, C. S. Woo

Abstract:

Data mining and knowledge engineering have become a tough task due to the availability of large amount of data in the web nowadays. Validity and reliability of data also become a main debate in knowledge acquisition. Besides, acquiring knowledge from different languages has become another concern. There are many language translators and corpora developed but the function of these translators and corpora are usually limited to certain languages and domains. Furthermore, search results from engines with traditional 'keyword' approach are no longer satisfying. More intelligent knowledge engineering agents are needed. To address to these problems, a system known as Multilingual Word Semantic Network is proposed. This system adapted semantic network to organize words according to concepts and relations. The system also uses open source as the development philosophy to enable the native language speakers and experts to contribute their knowledge to the system. The contributed words are then defined and linked using lexical and semantic relations. Thus, related words and derivatives can be identified and linked. From the outcome of the system implementation, it contributes to the development of semantic web and knowledge engineering.

Keywords: Multilingual, semantic network, intelligent knowledge engineering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1918

133 Attacks Classification in Adaptive Intrusion Detection using Decision Tree

Authors: Dewan Md. Farid, Nouria Harbi, Emna Bahri, Mohammad Zahidur Rahman, Chowdhury Mofizur Rahman

Abstract:

Recently, information security has become a key issue in information technology as the number of computer security breaches are exposed to an increasing number of security threats. A variety of intrusion detection systems (IDS) have been employed for protecting computers and networks from malicious network-based or host-based attacks by using traditional statistical methods to new data mining approaches in last decades. However, today's commercially available intrusion detection systems are signature-based that are not capable of detecting unknown attacks. In this paper, we present a new learning algorithm for anomaly based network intrusion detection system using decision tree algorithm that distinguishes attacks from normal behaviors and identifies different types of intrusions. Experimental results on the KDD99 benchmark network intrusion detection dataset demonstrate that the proposed learning algorithm achieved 98% detection rate (DR) in comparison with other existing methods.

Keywords: Detection rate, decision tree, intrusion detectionsystem, network security.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3563

132 Decision Trees for Predicting Risk of Mortality using Routinely Collected Data

Authors: Tessy Badriyah, Jim S. Briggs, Dave R. Prytherch

Abstract:

It is well known that Logistic Regression is the gold standard method for predicting clinical outcome, especially predicting risk of mortality. In this paper, the Decision Tree method has been proposed to solve specific problems that commonly use Logistic Regression as a solution. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital from 1 January to 31 December 2001 was divided into four subsets. One subset of training data was used to generate a model, and the model obtained was then applied to three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under ROC curve or c-index). The experiment presented that both methods have reasonable results in the case of the c-index. However, in some cases the calibration value (χ2) obtained quite a high result. After conducting experiments and investigating the advantages and disadvantages of each method, we can conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.

Keywords: Decision Trees, Logistic Regression, clinical outcome, risk of mortality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2481

131 Modelling Phytoremediation Rates of Aquatic Macrophytes in Aquaculture Effluent

Authors: E. A. Kiridi, A. O. Ogunlela

Abstract:

Pollutants from aquacultural practices constitute environmental problems and phytoremediation could offer cheaper environmentally sustainable alternative since equipment using advanced treatment for fish tank effluent is expensive to import, install, operate and maintain, especially in developing countries. The main objective of this research was, therefore, to develop a mathematical model for phytoremediation by aquatic plants in aquaculture wastewater. Other objectives were to evaluate the retention times on phytoremediation rates using the model and to measure the nutrient level of the aquaculture effluent and phytoremediation rates of three aquatic macrophytes, namely; water hyacinth (Eichornia crassippes), water lettuce (Pistial stratoites) and morning glory (Ipomea asarifolia). A completely randomized experimental design was used in the study. Approximately 100 g of each macrophyte were introduced into the hydroponic units and phytoremediation indices monitored at 8 different intervals from the first to the 28^th day. The water quality parameters measured were pH and electrical conductivity (EC). Others were concentration of ammonium–nitrogen (NH4⁺ -N), nitrite- nitrogen (NO₂^- -N), nitrate- nitrogen (NO₃^- -N), phosphate –phosphorus (PO₄^3- -P), and biomass value. The biomass produced by water hyacinth was 438.2 g, 600.7 g, 688.2 g and 725.7 g at four 7–day intervals. The corresponding values for water lettuce were 361.2 g, 498.7 g, 561.2 g and 623.7 g and for morning glory were 417.0 g, 567.0 g, 642.0 g and 679.5g. Coefficient of determination was greater than 80% for EC, TDS, NO₂^- -N, NO₃^- -N and 70% for NH₄⁺ -N using any of the macrophytes and the predicted values were within the 95% confidence interval of measured values. Therefore, the model is valuable in the design and operation of phytoremediation systems for aquaculture effluent.

Keywords: Phytoremediation, macrophytes, hydroponic unit, aquaculture effluent, mathematical model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1577

130 The Development of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications

Authors: Mohamed R. Mhereeg

Abstract:

The paper investigates the feasibility of constructing a software multi-agent based monitoring and classification system and utilizing it to provide an automated and accurate classification of end users developing applications in the spreadsheet domain. The agents function autonomously to provide continuous and periodic monitoring of excels spreadsheet workbooks. Resulting in, the development of the MultiAgent classification System (MACS) that is in compliance with the specifications of the Foundation for Intelligent Physical Agents (FIPA). However, different technologies have been brought together to build MACS. The strength of the system is the integration of the agent technology with the FIPA specifications together with other technologies that are Windows Communication Foundation (WCF) services, Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). The Microsoft's .NET widows service based agents were utilized to develop the monitoring agents of MACS, the .NET WCF services together with SOA approach allowed the distribution and communication between agents over the WWW that is in order to satisfy the monitoring and classification of the multiple developer aspect. ODM was used to automate the classification phase of MACS.

Keywords: Autonomous, Classification, MACS, Multi-Agent, SOA, WCF.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1544

129 Rapid Determination of Biochemical Oxygen Demand

Authors: Mayur Milan Kale, Indu Mehrotra

Abstract:

Biochemical Oxygen Demand (BOD) is a measure of the oxygen used in bacteria mediated oxidation of organic substances in water and wastewater. Theoretically an infinite time is required for complete biochemical oxidation of organic matter, but the measurement is made over 5-days at 20 0C or 3-days at 27 0C test period with or without dilution. Researchers have worked to further reduce the time of measurement. The objective of this paper is to review advancement made in BOD measurement primarily to minimize the time and negate the measurement difficulties. Survey of literature review in four such techniques namely BOD-BARTTM, Biosensors, Ferricyanidemediated approach, luminous bacterial immobilized chip method. Basic principle, method of determination, data validation and their advantage and disadvantages have been incorporated of each of the methods. In the BOD-BARTTM method the time lag is calculated for the system to change from oxidative to reductive state. BIOSENSORS are the biological sensing element with a transducer which produces a signal proportional to the analyte concentration. Microbial species has its metabolic deficiencies. Co-immobilization of bacteria using sol-gel biosensor increases the range of substrate. In ferricyanidemediated approach, ferricyanide has been used as e-acceptor instead of oxygen. In Luminous bacterial cells-immobilized chip method, bacterial bioluminescence which is caused by lux genes was observed. Physiological responses is measured and correlated to BOD due to reduction or emission. There is a scope to further probe into the rapid estimation of BOD.

Keywords: BOD, Four methods, Rapid estimation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3597

128 Using Data Mining Methodology to Build the Predictive Model of Gold Passbook Price

Authors: Chien-Hui Yang, Che-Yang Lin, Ya-Chen Hsu

Abstract:

Gold passbook is an investing tool that is especially suitable for investors to do small investment in the solid gold. The gold passbook has the lower risk than other ways investing in gold, but its price is still affected by gold price. However, there are many factors can cause influences on gold price. Therefore, building a model to predict the price of gold passbook can both reduce the risk of investment and increase the benefits. This study investigates the important factors that influence the gold passbook price, and utilize the Group Method of Data Handling (GMDH) to build the predictive model. This method can not only obtain the significant variables but also perform well in prediction. Finally, the significant variables of gold passbook price, which can be predicted by GMDH, are US dollar exchange rate, international petroleum price, unemployment rate, whole sale price index, rediscount rate, foreign exchange reserves, misery index, prosperity coincident index and industrial index.

Keywords: Gold price, Gold passbook price, Group Method ofData Handling (GMDH), Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2244

127 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2419

126 DCBOR: A Density Clustering Based on Outlier Removal

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well known single link clustering algorithm. We will refer to this algorithm as DCBOR. The proposed algorithm alleviates the chain effect by removing the outliers from the given dataset. So this algorithm provides outlier detection and data clustering simultaneously. This algorithm does not need to update the distance matrix, since the algorithm depends on merging the most k-nearest objects in one step and the cluster continues grow as long as possible under specified condition. So the algorithm consists of two phases; at the first phase, it removes the outliers from the input dataset. At the second phase, it performs the clustering process. This algorithm discovers clusters of different shapes, sizes, densities and requires only one input parameter; this parameter represents a threshold for outlier points. The value of the input parameter is ranging from 0 to 1. The algorithm supports the user in determining an appropriate value for it. We have tested this algorithm on different datasets contain outlier and connecting clusters by chain of density points, and the algorithm discovers the correct clusters. The results of our experiments demonstrate the effectiveness and the efficiency of DCBOR.

Keywords: Data Clustering, Clustering Algorithms, Handling Noise, Arbitrary Shape of Clusters.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1891

125 Dynamics of Mini Hydraulic Backhoe Excavator: A Lagrange-Euler (L-E) Approach

Authors: Bhaveshkumar P. Patel, J. M. Prajapati

Abstract:

Excavators are high power machines used in the mining, agricultural and construction industry whose principal functions are digging (material removing), ground leveling and material transport operations. During the digging task there are certain unknown forces exerted by the bucket on the soil and the digging operation is repetitive in nature. Automation of the digging task can be performed by an automatically controlled excavator system, which is not only control the forces but also follow the planned digging trajectories. To develop such a controller for automated excavation, it is required to develop a dynamic model to describe the behavior of the control system during digging operation and motion of excavator with time. The presented work described a dynamic model needed for controller design and which is derived by applying Lagrange-Euler approach. The developed dynamic model is intended for further development of an automated excavation control system for light duty construction work and can be applied for heavy duty or all types of backhoe excavators.

Keywords: Backhoe excavator, controller, digging, excavation, trajectory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4407

124 On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel

Authors: Tatjana Eitrich, Bruno Lang

Abstract:

This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared memory parallel mode we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but usage of other kernel functions is possible, too. In order to further speed up the decomposition algorithm we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and the improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.

Keywords: Support Vector Machine Training, Multi-ParameterKernels, Shared Memory Parallel Computing, Large Data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1399

123 Product Features Extraction from Opinions According to Time

Authors: Kamal Amarouche, Houda Benbrahim, Ismail Kassou

Abstract:

Nowadays, e-commerce shopping websites have experienced noticeable growth. These websites have gained consumers’ trust. After purchasing a product, many consumers share comments where opinions are usually embedded about the given product. Research on the automatic management of opinions that gives suggestions to potential consumers and portrays an image of the product to manufactures has been growing recently. After launching the product in the market, the reviews generated around it do not usually contain helpful information or generic opinions about this product (e.g. telephone: great phone...); in the sense that the product is still in the launching phase in the market. Within time, the product becomes old. Therefore, consumers perceive the advantages/ disadvantages about each specific product feature. Therefore, they will generate comments that contain their sentiments about these features. In this paper, we present an unsupervised method to extract different product features hidden in the opinions which influence its purchase, and that combines Time Weighting (TW) which depends on the time opinions were expressed with Term Frequency-Inverse Document Frequency (TF-IDF). We conduct several experiments using two different datasets about cell phones and hotels. The results show the effectiveness of our automatic feature extraction, as well as its domain independent characteristic.

Keywords: Opinion mining, product feature extraction, sentiment analysis, SentiWordNet.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1268

122 Hybrid Advanced Oxidative Pretreatment of Complex Industrial Effluent for Biodegradability Enhancement

Authors: K. Paradkar, S. N. Mudliar, A. Sharma, A. B. Pandit, R. A. Pandey

Abstract:

The study explores the hybrid combination of Hydrodynamic Cavitation (HC) and Subcritical Wet Air Oxidation-based pretreatment of complex industrial effluent to enhance the biodegradability selectively (without major COD destruction) to facilitate subsequent enhanced downstream processing via anaerobic or aerobic biological treatment. Advanced oxidation based techniques can be less efficient as standalone options and a hybrid approach by combining Hydrodynamic Cavitation (HC), and Wet Air Oxidation (WAO) can lead to a synergistic effect since both the options are based on common free radical mechanism. The HC can be used for initial turbulence and generation of hotspots which can begin the free radical attack and this agitating mixture then can be subjected to less intense WAO since initial heat (to raise the activation energy) can be taken care by HC alone. Lab-scale venturi-based hydrodynamic cavitation and wet air oxidation reactor with biomethanated distillery wastewater (BMDWW) as a model effluent was examined for establishing the proof-of-concept. The results indicated that for a desirable biodegradability index (BOD: COD - BI) enhancement (up to 0.4), the Cavitation (standalone) pretreatment condition was: 5 bar and 88 min reaction time with a COD reduction of 36 % and BI enhancement of up to 0.27 (initial BI - 0.17). The optimum WAO condition (standalone) was: 150oC, 6 bar and 30 minutes with 31% COD reduction and 0.33 BI. The hybrid pretreatment (combined Cavitation + WAO) worked out to be 23.18 min HC (at 5 bar) followed by 30 min WAO at 150oC, 6 bar, at which around 50% COD was retained yielding a BI of 0.55. FTIR & NMR analysis of pretreated effluent indicated dissociation and/or reorientation of complex organic compounds in untreated effluent to simpler organic compounds post-pretreatment.

Keywords: BI, hybrid, hydrodynamic cavitation, wet air oxidation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1717

121 Experimental and CFD Simulation of the Jet Pump for Air Bubbles Formation

Authors: L. Grinis, N. Lubashevsky, Y. Ostrovski

Abstract:

A jet pump is a type of pump that accelerates the flow of a secondary fluid (driven fluid) by introducing a motive fluid with high velocity into a converging-diverging nozzle. Jet pumps are also known as adductors or ejectors depending on the motivator phase. The ejector's motivator is of a gaseous nature, usually steam or air, while the educator's motivator is a liquid, usually water. Jet pumps are devices that use air bubbles and are widely used in wastewater treatment processes. In this work, we will discuss about the characteristics of the jet pump and the computational simulation of this device. To find the optimal angle and depth for the air pipe, so as to achieve the maximal air volumetric flow rate, an experimental apparatus was constructed to ascertain the best geometrical configuration for this new type of jet pump. By using 3D printing technology, a series of jet pumps was printed and tested whilst aspiring to maximize air flow rate dependent on angle and depth of the air pipe insertion. The experimental results show a major difference of up to 300% in performance between the different pumps (ratio of air flow rate to supplied power) where the optimal geometric model has an insertion angle of 60⁰ and air pipe insertion depth ending at the center of the mixing chamber. The differences between the pumps were further explained by using CFD for better understanding the reasons that affect the airflow rate. The validity of the computational simulation and the corresponding assumptions have been proved experimentally. The present research showed high degree of congruence with the results of the laboratory tests. This study demonstrates the potential of using of the jet pump in many practical applications.

Keywords: Air bubbles, CFD simulation, jet pump, practical applications.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1988

120 A Reference Framework Integrating Lean and Green Principles within Supply Chain Management

Authors: M. Bortolini, E. Ferrari, F. G. Galizia, C. Mora

Abstract:

In the last decades, an increasing set of companies adopted lean philosophy to improve their productivity and efficiency promoting the so-called continuous improvement concept, reducing waste of time and cutting off no-value added activities. In parallel, increasing attention rises toward green practice and management through the spread of the green supply chain pattern, to minimise landfilled waste, drained wastewater and pollutant emissions. Starting from a review on contributions deepening lean and green principles applied to supply chain management, the most relevant drivers to measure the performance of industrial processes are pointed out. Specific attention is paid on the role of cost because it is of key importance and it crosses both lean and green principles. This analysis leads to figure out an original reference framework for integrating lean and green principles in designing and managing supply chains. The proposed framework supports the application, to the whole value chain or to parts of it, e.g. distribution network, assembly system, job-shop, storage system etc., of the lean-green integrated perspective. Evidences show that the combination of the lean and green practices lead to great results, higher than the sum of the performances from their separate application. Lean thinking has beneficial effects on green practices and, at the same time, methods allowing environmental savings generate positive effects on time reduction and process quality increase.

Keywords: Environmental sustainability, green supply chain, integrated framework, lean thinking, supply chain management.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2740

119 Cost Sensitive Feature Selection in Decision-Theoretic Rough Set Models for Customer Churn Prediction: The Case of Telecommunication Sector Customers

Authors: Emel Kızılkaya Aydogan, Mihrimah Ozmen, Yılmaz Delice

Abstract:

In recent days, there is a change and the ongoing development of the telecommunications sector in the global market. In this sector, churn analysis techniques are commonly used for analysing why some customers terminate their service subscriptions prematurely. In addition, customer churn is utmost significant in this sector since it causes to important business loss. Many companies make various researches in order to prevent losses while increasing customer loyalty. Although a large quantity of accumulated data is available in this sector, their usefulness is limited by data quality and relevance. In this paper, a cost-sensitive feature selection framework is developed aiming to obtain the feature reducts to predict customer churn. The framework is a cost based optional pre-processing stage to remove redundant features for churn management. In addition, this cost-based feature selection algorithm is applied in a telecommunication company in Turkey and the results obtained with this algorithm.

Keywords: Churn prediction, data mining, decision-theoretic rough set, feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1727

118 Artificial Intelligence Techniques Applications for Power Disturbances Classification

Authors: K.Manimala, Dr.K.Selvi, R.Ahila

Abstract:

Artificial Intelligence (AI) methods are increasingly being used for problem solving. This paper concerns using AI-type learning machines for power quality problem, which is a problem of general interest to power system to provide quality power to all appliances. Electrical power of good quality is essential for proper operation of electronic equipments such as computers and PLCs. Malfunction of such equipment may lead to loss of production or disruption of critical services resulting in huge financial and other losses. It is therefore necessary that critical loads be supplied with electricity of acceptable quality. Recognition of the presence of any disturbance and classifying any existing disturbance into a particular type is the first step in combating the problem. In this work two classes of AI methods for Power quality data mining are studied: Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs). We show that SVMs are superior to ANNs in two critical respects: SVMs train and run an order of magnitude faster; and SVMs give higher classification accuracy.

Keywords: back propagation network, power quality, probabilistic neural network, radial basis function support vector machine

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1520

117 Treatment of Paper and Pulp Mill Effluent by Coagulation

Authors: Pradeep Kumar, Tjoon Tow Teng, Shri Chand, Kailas L. Wasewar

Abstract:

The pulp and paper mill effluent is one of the high polluting effluent amongst the effluents obtained from polluting industries. All the available methods for treatment of pulp and paper mill effluent have certain drawbacks. The coagulation is one of the cheapest process for treatment of various organic effluents. Thus, the removal of chemical oxygen demand (COD) and colour of paper mill effluent is studied using coagulation process. The batch coagulation process was performed using various coagulants like: aluminium chloride, poly aluminium chloride and copper sulphate. The initial pH of the effluent (Coagulation pH) has tremendous effect on COD and colour removal. Poly aluminium chloride (PAC) as coagulant reduced COD to 84 % and 92 % of colour was removed at an optimum pH 5 and coagulant dose of 8 ml l-1. With aluminium chloride at an optimum pH = 4 and coagulant dose of 5 g l-1, 74 % COD and 86 % colour removal were observed. The results using copper sulphate as coagulant (a less commercial coagulant) were encouraging. At an optimum pH 6 and mass loading of 5 g l-1, 76 % COD reduction and 78 % colour reduction were obtained. It was also observed that after addition of coagulant, the pH of the effluent decreases. The decrease in pH was highest for AlCl3, which was followed by PAC and CuSO4. Significant amount of COD reductions was obtained by coagulation process. Since the coagulation process is the first stage for treatment of effluent and some of the coagulant cations usually remain in the treated effluents. Thus, cation like copper may be one of the good catalyst for second stage of treatment process like wet oxidation. The copper has been found to be good oxidation catalyst then iron and aluminum.

Keywords: Aluminium based coagulants, Coagulation, Copper, PAC, Pulp and paper mill effluent, Wastewater treatment

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6942

116 Phytoremediation Potential of Native Plants Growing on a Heavy Metals Contaminated Soil of Copper mine in Iran

Authors: B. Lorestani, M. Cheraghi, N. Yousefi

Abstract:

A research project dealing with the phytoremediation of a soil polluted by some heavy metals is currently running. The case study is represented by a mining area in Hamedan province in the central west part of Iran. The potential of phytoextraction and phytostabilization of plants was evaluated considering the concentration of heavy metals in the plant tissues and also the bioconcentration factor (BCF) and the translocation factor (TF). Also the several established criteria were applied to define hyperaccumulator plants in the studied area. Results showed that none of the collected plant species were suitable for phytoextraction of Cu, Zn, Fe and Mn, but among the plants, Euphorbia macroclada was the most efficient in phytostabilization of Cu and Fe, while, Ziziphora clinopodioides, Cousinia sp. and Chenopodium botrys were the most suitable for phytostabilization of Zn and Chondrila juncea and Stipa barbata had the potential for phytostabilization of Mn. Using the most common criterion, Euphorbia macroclada and Verbascum speciosum were Fe hyperaccumulator plants. Present study showed that native plant species growing on contaminated sites may have the potential for phytoremediation.

Keywords: Bioconcentration factor, Heavy metals, Hyperaccumulator, Phytoremediation, Translocation factor

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3894

115 A Novel In-Place Sorting Algorithm with O(n log z) Comparisons and O(n log z) Moves

Authors: Hanan Ahmed-Hosni Mahmoud, Nadia Al-Ghreimil

Abstract:

In-place sorting algorithms play an important role in many fields such as very large database systems, data warehouses, data mining, etc. Such algorithms maximize the size of data that can be processed in main memory without input/output operations. In this paper, a novel in-place sorting algorithm is presented. The algorithm comprises two phases; rearranging the input unsorted array in place, resulting segments that are ordered relative to each other but whose elements are yet to be sorted. The first phase requires linear time, while, in the second phase, elements of each segment are sorted inplace in the order of z log (z), where z is the size of the segment, and O(1) auxiliary storage. The algorithm performs, in the worst case, for an array of size n, an O(n log z) element comparisons and O(n log z) element moves. Further, no auxiliary arithmetic operations with indices are required. Besides these theoretical achievements of this algorithm, it is of practical interest, because of its simplicity. Experimental results also show that it outperforms other in-place sorting algorithms. Finally, the analysis of time and space complexity, and required number of moves are presented, along with the auxiliary storage requirements of the proposed algorithm.

Keywords: Auxiliary storage sorting, in-place sorting, sorting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1878

114 Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering

Authors: Yogita, Durga Toshniwal

Abstract:

Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and also new concepts may keep evolving. Irrelevant attributes can be termed as noisy attributes and such attributes further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. This scheme is based on clustering as clustering is an unsupervised data mining task and it does not require labeled data, both density based and partitioning clustering are combined for outlier detection. In this scheme partitioning clustering is also used to assign weights to attributes depending upon their respective relevance and weights are adaptive. Weighted attributes are helpful to reduce or remove the effect of noisy attributes. Keeping in view the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real world data sets show that our proposed approach outperforms other existing approach (CORM) in terms of outlier detection rate, false alarm rate, and increasing percentages of outliers.

Keywords: Concept Evolution, Irrelevant Attributes, Streaming Data, Unsupervised Outlier Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2594

113 The Implementation of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications

Authors: Mohamed R. Mhereeg

Abstract:

The paper discusses the implementation of the MultiAgent classification System (MACS) and utilizing it to provide an automated and accurate classification of end users developing applications in the spreadsheet domain. However, different technologies have been brought together to build MACS. The strength of the system is the integration of the agent technology with the FIPA specifications together with other technologies, which are the .NET widows service based agents, the Windows Communication Foundation (WCF) services, the Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). The Microsoft's .NET widows service based agents were utilized to develop the monitoring agents of MACS, the .NET WCF services together with SOA approach allowed the distribution and communication between agents over the WWW. The Monitoring Agents (MAs) were configured to execute automatically to monitor excel spreadsheets development activities by content. Data gathered by the Monitoring Agents from various resources over a period of time was collected and filtered by a Database Updater Agent (DUA) residing in the .NET client application of the system. This agent then transfers and stores the data in Oracle server database via Oracle stored procedures for further processing that leads to the classification of the end user developers.

Keywords: MACS, Implementation, Multi-Agent, SOA, Autonomous, WCF.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1673