Search results for: data stream mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7707

Search results for: data stream mining

7347 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects.  Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: Data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1451
7346 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1957
7345 Mining User-Generated Contents to Detect Service Failures with Topic Model

Authors: Kyung Bae Park, Sung Ho Ha

Abstract:

Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and a pressing need to handle the overwhelmingly plethora amount of various UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a certain management suffers from. In this paper, we employ text mining of Latent Dirichlet Allocation (LDA) on a popular online review site dedicated to complaint from users. We find that the employed LDA efficiently detects customer complaints, and a further inspection with the visualization technique is effective to categorize the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights by applying machine learning techniques in marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R program from a beginning (data collection in R) to an end (LDA analysis in R) since the instruction is still largely undocumented. In this regard, it will help lower the boundary for interdisciplinary researcher to conduct related research.

Keywords: Latent Dirichlet allocation, R program, text mining, topic model, user generated contents, visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1189
7344 A Hybrid Recommendation System Based On Association Rules

Authors: Ahmed Mohammed K. Alsalama

Abstract:

Recommendation systems are widely used in e-commerce applications. The engine of a current recommendation system recommends items to a particular user based on user preferences and previous high ratings. Various recommendation schemes such as collaborative filtering and content-based approaches are used to build a recommendation system. Most of current recommendation systems were developed to fit a certain domain such as books, articles, and movies. We propose1 a hybrid framework recommendation system to be applied on two dimensional spaces (User × Item) with a large number of Users and a small number of Items. Moreover, our proposed framework makes use of both favorite and non-favorite items of a particular user. The proposed framework is built upon the integration of association rules mining and the content-based approach. The results of experiments show that our proposed framework can provide accurate recommendations to users.

Keywords: Data Mining, Association Rules, Recommendation Systems, Hybrid Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3955
7343 Multi-Dimensional Concerns Mining for Web Applications via Concept-Analysis

Authors: Carlo Bellettini, Alessandro Marchetto, Andrea Trentini

Abstract:

Web applications have become very complex and crucial, especially when combined with areas such as CRM (Customer Relationship Management) and BPR (Business Process Reengineering), the scientific community has focused attention to Web applications design, development, analysis, and testing, by studying and proposing methodologies and tools. This paper proposes an approach to automatic multi-dimensional concern mining for Web Applications, based on concepts analysis, impact analysis, and token-based concern identification. This approach lets the user to analyse and traverse Web software relevant to a particular concern (concept, goal, purpose, etc.) via multi-dimensional separation of concerns, to document, understand and test Web applications. This technique was developed in the context of WAAT (Web Applications Analysis and Testing) project. A semi-automatic tool to support this technique is currently under development.

Keywords: Concepts Analysis, Concerns Mining, Multi-Dimensional Separation of Concerns, Impact Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1439
7342 Real-Time Image Encryption Using a 3D Discrete Dual Chaotic Cipher

Authors: M. F. Haroun, T. A. Gulliver

Abstract:

In this paper, an encryption algorithm is proposed for real-time image encryption. The scheme employs a dual chaotic generator based on a three dimensional (3D) discrete Lorenz attractor. Encryption is achieved using non-autonomous modulation where the data is injected into the dynamics of the master chaotic generator. The second generator is used to permute the dynamics of the master generator using the same approach. Since the data stream can be regarded as a random source, the resulting permutations of the generator dynamics greatly increase the security of the transmitted signal. In addition, a technique is proposed to mitigate the error propagation due to the finite precision arithmetic of digital hardware. In particular, truncation and rounding errors are eliminated by employing an integer representation of the data which can easily be implemented. The simple hardware architecture of the algorithm makes it suitable for secure real-time applications.

Keywords: Chaotic systems, image encryption, 3D Lorenz attractor, non-autonomous modulation, FPGA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1185
7341 Unsteady 3D Post-Stall Aerodynamics Accounting for Effective Loss in Camber Due to Flow Separation

Authors: Aritras Roy, Rinku Mukherjee

Abstract:

The current study couples a quasi-steady Vortex Lattice Method and a camber correcting technique, ‘Decambering’ for unsteady post-stall flow prediction. The wake is force-free and discrete such that the wake lattices move with the free-stream once shed from the wing. It is observed that the time-averaged unsteady coefficient of lift sees a relative drop at post-stall angles of attack in comparison to its steady counterpart for some angles of attack. Multiple solutions occur at post-stall and three different algorithms to choose solutions in these regimes show both unsteadiness and non-convergence of the iterations. The distribution of coefficient of lift on the wing span also shows sawtooth. Distribution of vorticity changes both along span and in the direction of the free-stream as the wake develops over time with distinct roll-up, which increases with time.

Keywords: Post-stall, unsteady, wing, aerodynamics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 954
7340 Yield Prediction Using Support Vectors Based Under-Sampling in Semiconductor Process

Authors: Sae-Rom Pak, Seung Hwan Park, Jeong Ho Cho, Daewoong An, Cheong-Sool Park, Jun Seok Kim, Jun-Geol Baek

Abstract:

It is important to predict yield in semiconductor test process in order to increase yield. In this study, yield prediction means finding out defective die, wafer or lot effectively. Semiconductor test process consists of some test steps and each test includes various test items. In other world, test data has a big and complicated characteristic. It also is disproportionably distributed as the number of data belonging to FAIL class is extremely low. For yield prediction, general data mining techniques have a limitation without any data preprocessing due to eigen properties of test data. Therefore, this study proposes an under-sampling method using support vector machine (SVM) to eliminate an imbalanced characteristic. For evaluating a performance, randomly under-sampling method is compared with the proposed method using actual semiconductor test data. As a result, sampling method using SVM is effective in generating robust model for yield prediction.

Keywords: Yield Prediction, Semiconductor Test Process, Support Vector Machine, Under Sampling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2362
7339 Unsupervised Text Mining Approach to Early Warning System

Authors: Ichihan Tai, Bill Olson, Paul Blessner

Abstract:

Traditional early warning systems that alarm against crisis are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to the traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against market crash, and spikes in the event of crisis. In this study, news data is consumed for prediction of whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal against VIX index data from CBOE.

Keywords: Early Warning System, Knowledge Management, Topic Modeling, Market Prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1886
7338 A Study on Finding Similar Document with Multiple Categories

Authors: R. Saraçoğlu, N. Allahverdi

Abstract:

Searching similar documents and document management subjects have important place in text mining. One of the most important parts of similar document research studies is the process of classifying or clustering the documents. In this study, a similar document search approach that includes discussion of out the case of belonging to multiple categories (multiple categories problem) has been carried. The proposed method that based on Fuzzy Similarity Classification (FSC) has been compared with Rocchio algorithm and naive Bayes method which are widely used in text mining. Empirical results show that the proposed method is quite successful and can be applied effectively. For the second stage, multiple categories vector method based on information of categories regarding to frequency of being seen together has been used. Empirical results show that achievement is increased almost two times, when proposed method is compared with classical approach.

Keywords: Document similarity, Fuzzy classification, Multiple categories, Text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1672
7337 Lead and Cadmium Spatial Pattern and Risk Assessment around Coal Mine in Hyrcanian Forest, North Iran

Authors: Mahsa Tavakoli, Seyed Mohammad Hojjati, Yahya Kooch

Abstract:

In this study, the effect of coal mining activities on lead and cadmium concentrations and distribution in soil was investigated in Hyrcanian forest, North Iran. 16 plots (20×20 m2) were established by systematic-randomly (60×60 m2) in an area of 4 ha (200×200 m2-mine entrance placed at center). An area adjacent to the mine was not affected by the mining activity; considered as the controlled area. In order to investigate soil lead and cadmium concentration, one sample was taken from the 0-10 cm in each plot. To study the spatial pattern of soil properties and lead and cadmium concentrations in the mining area, an area of 80×80m2 (the mine as the center) was considered and 80 soil samples were systematic-randomly taken (10 m intervals). Geostatistical analysis was performed via Kriging method and GS+ software (version 5.1). In order to estimate the impact of coal mining activities on soil quality, pollution index was measured. Lead and cadmium concentrations were significantly higher in mine area (Pb: 10.97±0.30, Cd: 184.47±6.26 mg.kg-1) in comparison to control area (Pb: 9.42±0.17, Cd: 131.71±15.77 mg.kg-1). The mean values of the PI index indicate that Pb (1.16) and Cd (1.77) presented slightly polluted. Results of the NIPI index showed that Pb (1.44) and Cd (2.52) presented slight pollution and moderate pollution respectively. Results of variography and kriging method showed that it is possible to prepare interpolation maps of lead and cadmium around the mining areas in Hyrcanian forest. According to results of pollution and risk assessments, forest soil was contaminated by heavy metals (lead and cadmium); therefore, using reclamation and remediation techniques in these areas is necessary.

Keywords: Traditional coal mining, heavy metals, pollution indicators, geostatistics, caspian forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1005
7336 Efficient STAKCERT KDD Processes in Worm Detection

Authors: Madihah Mohd Saudi, Andrea J Cullen, Mike E Woodward

Abstract:

This paper presents a new STAKCERT KDD processes for worm detection. The enhancement introduced in the data-preprocessing resulted in the formation of a new STAKCERT model for worm detection. In this paper we explained in detail how all the processes involved in the STAKCERT KDD processes are applied within the STAKCERT model for worm detection. Based on the experiment conducted, the STAKCERT model yielded a 98.13% accuracy rate for worm detection by integrating the STAKCERT KDD processes.

Keywords: data mining, incident response, KDD processes, security metrics and worm detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1624
7335 Discovery of Sequential Patterns Based On Constraint Patterns

Authors: Shigeaki Sakurai, Youichi Kitahata, Ryohei Orihara

Abstract:

This paper proposes a method that discovers sequential patterns corresponding to user-s interests from sequential data. This method expresses the interests as constraint patterns. The constraint patterns can define relationships among attributes of the items composing the data. The method recursively decomposes the constraint patterns into constraint subpatterns. The method evaluates the constraint subpatterns in order to efficiently discover sequential patterns satisfying the constraint patterns. Also, this paper applies the method to the sequential data composed of stock price indexes and verifies its effectiveness through comparing it with a method without using the constraint patterns.

Keywords: Sequential pattern mining, Constraint pattern, Attribute constraint, Stock price indexes

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1394
7334 Growing Self Organising Map Based Exploratory Analysis of Text Data

Authors: Sumith Matharage, Damminda Alahakoon

Abstract:

Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.

Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1923
7333 Investigation of Scour Depth at Bridge Piers using Bri-Stars Model in Iran

Authors: Gh. Saeidifar, F. Raeiszadeh

Abstract:

BRI-STARS (BRIdge Stream Tube model for Alluvial River Simulation) program was used to investigate the scour depth around bridge piers in some of the major river systems in Iran. Model calibration was performed by collecting different field data. Field data are cataloged on three categories, first group of bridges that their rivers bed are formed by fine material, second group of bridges that their rivers bed are formed by sand material, and finally bridges that their rivers bed are formed by gravel or cobble materials. Verification was performed with some field data in Fars Province. Results show that for wide piers, computed scour depth is more than measured one. In gravel bed streams, computed scour depth is greater than measured scour depth, the reason is due to formation of armor layer on bed of channel. Once this layer is eroded, the computed scour depth is close to the measured one.

Keywords: BRI-STARS, local scour, bridge, computer modeling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1958
7332 A Numerical Study on the Effects of N2 Dilution on the Flame Structure and Temperature Distribution of Swirl Diffusion Flames

Authors: Yasaman Tohidi, Shidvash Vakilipour, Saeed Ebadi Tavallaee, Shahin Vakilipoor Takaloo, Hossein Amiri

Abstract:

The numerical modeling is performed to study the effects of N2 addition to the fuel stream on the flame structure and temperature distribution of methane-air swirl diffusion flames with different swirl intensities. The Open source Field Operation and Manipulation (OpenFOAM) has been utilized as the computational tool. Flamelet approach along with modified k-ε model is employed to model the flame characteristics.  The results indicate that the presence of N2 in the fuel stream leads to the flame temperature reduction. By increasing of swirl intensity, the flame structure changes significantly. The flame has a conical shape in low swirl intensity; however, it has an hour glass-shape with a shorter length in high swirl intensity. The effects of N2 dilution decrease the flame length in all swirl intensities; however, the rate of reduction is more noticeable in low swirl intensity.

Keywords: Swirl diffusion flame, N2 dilution, OpenFOAM, Swirl intensity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 572
7331 The Impact of Temporal Impairment on Quality of Experience (QoE) in Video Streaming: A No Reference (NR) Subjective and Objective Study

Authors: Muhammad Arslan Usman, Muhammad Rehan Usman, Soo Young Shin

Abstract:

Live video streaming is one of the most widely used service among end users, yet it is a big challenge for the network operators in terms of quality. The only way to provide excellent Quality of Experience (QoE) to the end users is continuous monitoring of live video streaming. For this purpose, there are several objective algorithms available that monitor the quality of the video in a live stream. Subjective tests play a very important role in fine tuning the results of objective algorithms. As human perception is considered to be the most reliable source for assessing the quality of a video stream subjective tests are conducted in order to develop more reliable objective algorithms. Temporal impairments in a live video stream can have a negative impact on the end users. In this paper we have conducted subjective evaluation tests on a set of video sequences containing temporal impairment known as frame freezing. Frame Freezing is considered as a transmission error as well as a hardware error which can result in loss of video frames on the reception side of a transmission system. In our subjective tests, we have performed tests on videos that contain a single freezing event and also for videos that contain multiple freezing events. We have recorded our subjective test results for all the videos in order to give a comparison on the available No Reference (NR) objective algorithms. Finally, we have shown the performance of no reference algorithms used for objective evaluation of videos and suggested the algorithm that works better. The outcome of this study shows the importance of QoE and its effect on human perception. The results for the subjective evaluation can serve the purpose for validating objective algorithms.

Keywords: Objective evaluation, subjective evaluation, quality of experience (QoE), video quality assessment (VQA).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1603
7330 Application of a New Hybrid Optimization Algorithm on Cluster Analysis

Authors: T. Niknam, M. Nayeripour, B.Bahmani Firouzi

Abstract:

Clustering techniques have received attention in many areas including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points, which are close to one another. The K-means algorithm is one of the most widely used techniques for clustering. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima and global solutions of large problems cannot found with reasonable amount of computation effort. In order to overcome local optima problem lots of studies done in clustering. This paper is presented an efficient hybrid evolutionary optimization algorithm based on combining Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N object into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handing data clustering.

Keywords: Ant Colony Optimization (ACO), Data clustering, Hybrid evolutionary optimization algorithm, K-means clustering, Particle Swarm Optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2175
7329 Ammonia Gas Removal from Gas Stream by Biofiltration using Agricultural Residue Biofilter Medias in Laboratory-scale Biofilter

Authors: Thaniya Kaosol, Nuttawut Pongpat

Abstract:

In this research, a biofiltration process to remove ammonia gas from gas stream using agricultural residue biofilter medias is studied. The experiments were conducted in laboratoryscale biofilter. The biofilter medias were a mixture of manure fertilizer and bagasse at various ratios i.e., 1:3, 1:5 and 1:7. The experiments were performed for a period of 40 days. The empty bed retention time (EBRT) is 78s. The moisture content of biofilter media was maintained at 45-60% using water. The results showed that the agricultural residues (manure fertilizer and bagasse) are suitable as biofilter media for ammonia gas removal in biofiltration process. The maximum efficiency of ammonia gas removal is observed from the 1:5 of manure fertilizer: bagasse ratio at 89.93%. The biofiltration is more effective at low ammonia gas concentration. In addition, the mixture ratio of biofilter media is not a significant factor in biofiltration operation while the most significant factor for biofiltration operation is the inlet ammonia gas concentration.

Keywords: ammonia gas, biofiltration, biofilter media, removal efficiency, elimination capacity

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2468
7328 Appraisal of Methods for Identifying, Mapping, and Modelling of Fluvial Erosion in a Mining Environment

Authors: F. F. Howard, I. Yakubu, C. B. Boye, J. S. Y. Kuma

Abstract:

Natural and human activities, such as mining operations, expose the natural soil to adverse environmental conditions, leading to contamination of soil, groundwater, and surface water, which has negative effects on humans, flora, and fauna. Bare or partly exposed soil is most liable to fluvial erosion. This paper enumerates various methods used to identify, map, and model fluvial erosion in a mining environment. Classical, Artificial Intelligence (AI), and GIS methods have been reviewed. One of the many classical methods used to estimate river erosion is the Revised Universal Soil Loss Equation (RUSLE) model. The RUSLE model is easy to use. Its reliance on empirical relationships that may not always be applicable to specific circumstances or locations is a flaw. Other classical models for estimating fluvial erosion are the Soil and Water Assessment Tool (SWAT) and the Universal Soil Loss Equation (USLE). These models offer a more complete understanding of the underlying physical processes and encompass a wider range of situations. Although more difficult to utilise, they depend on the availability and dependability of input data for correctness. AI can help deal with multivariate and complex difficulties and predict soil loss with higher accuracy than traditional methods, and also be used to build unique models for identifying degraded areas. AI techniques have become popular as an alternative predictor for degraded environments. However, this research proposed a hybrid of classical, AI, and GIS methods for efficient and effective modelling of fluvial erosion.

Keywords: Fluvial erosion, classical methods, Artificial Intelligence, Geographic Information System.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 130
7327 The Self-Propelled Model of a Boat, Based on the Wave Thrust

Authors: V. Arabadzhi

Abstract:

We attempted investigate a boat model, based on the conversion of energy of surface wave into a sequence of unidirectional pulses of jet spurts, in other words - model of the boat, which is thrusting by the waves field on water surface. These pulses are forming some average reactive stream from the output nozzle on the stern of boat. The suggested model provides the conversion of its oscillatory motions (both pitching and rolling) into a jet flow. This becomes possible due to special construction of the boat and due to several details, sensitive to the local wave field. The boat model presents the uniflow jet engine without slow conversions of mechanical energy into intermediate forms and without any external sources of energy (besides surface waves). Motion of boat is characterized by fast jerks and average onward velocity, which exceeds the velocities of liquid particles in the wave.

Keywords: Flat-bottomed boat, Underwater wing, Input and output nozzles, Wave thrust, Conversion of wave into a jet stream, Oscillatory motion and onward motion, Squid-like pump, Hatch-like pump, The thrust due to lifting float, The thrust due to radiation reaction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1803
7326 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain subgroups of time series data with normal distribution from the inflow into wastewater treatment plant data, composed of several groups differing by mean value. Two simple algorithms, K-mean and EM, were chosen as a clustering method. The Rand index was used to measure the similarity. After simple meta-clustering, a regression model was performed for each subgroups. The final model was a sum of the subgroups models. The quality of the obtained model was compared with the regression model made using the same explanatory variables, but with no clustering of data. Results were compared using determination coefficient (R2), measure of prediction accuracy- mean absolute percentage error (MAPE) and comparison on a linear chart. Preliminary results allow us to foresee the potential of the presented technique.

Keywords: Clustering, Data analysis, Data mining, Predictive models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1918
7325 An Attribute-Centre Based Decision Tree Classification Algorithm

Authors: Gökhan Silahtaroğlu

Abstract:

Decision tree algorithms have very important place at classification model of data mining. In literature, algorithms use entropy concept or gini index to form the tree. The shape of the classes and their closeness to each other some of the factors that affect the performance of the algorithm. In this paper we introduce a new decision tree algorithm which employs data (attribute) folding method and variation of the class variables over the branches to be created. A comparative performance analysis has been held between the proposed algorithm and C4.5.

Keywords: Classification, decision tree, split, pruning, entropy, gini.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1342
7324 Combining Bagging and Boosting

Authors: S. B. Kotsiantis, P. E. Pintelas

Abstract:

Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of classifiers using the same learning algorithm for the base-classifiers. Boosting algorithms are considered stronger than bagging on noisefree data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble using a voting methodology of bagging and boosting ensembles with 10 subclassifiers in each one. We performed a comparison with simple bagging and boosting ensembles with 25 sub-classifiers, as well as other well known combining methods, on standard benchmark datasets and the proposed technique was the most accurate.

Keywords: data mining, machine learning, pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2521
7323 Production Optimization through Ejector Installation at ESA Platform Offshore North West Java Field

Authors: Arii Bowo Yudhaprasetya, Ario Guritno, Agus Setiawan, Recky Tehupuring, Cosmas Supriatna

Abstract:

The offshore facilities condition of Pertamina Hulu Energi Offshore North West Java (PHE ONWJ) varies greatly from place to place, depending on the characteristics of the presently installed facilities. In some locations, such as ESA platform, gas trap is mainly caused by the occurrence of flash gas phenomenon which is known as mechanical-physical separation process of multiphase flow. Consequently, the presence of gas trap at main oil line would accumulate on certain areas result in a reduced oil stream throughout the pipeline. Any presence of discrete gaseous along continuous oil flow represents a unique flow condition under certain specific volume fraction and velocity field. From gas lift source, a benefit line is used as a motive flow for ejector which is designed to generate a syphon effect to minimize the gas trap phenomenon. Therefore, the ejector’s exhaust stream will flow to the designated point without interfering other systems.

Keywords: Ejector, diffuser, multiphase flow, syphon effects.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 935
7322 Plant Varieties Selection System

Authors: Kitti Koonsanit, Chuleerat Jaruskulchai, Poonsak Miphokasap, Apisit Eiumnoh

Abstract:

In the end of the day, meteorological data and environmental data becomes widely used such as plant varieties selection system. Variety plant selection for planted area is of almost importance for all crops, including varieties of sugarcane. Since sugarcane have many varieties. Variety plant non selection for planting may not be adapted to the climate or soil conditions for planted area. Poor growth, bloom drop, poor fruit, and low price are to be from varieties which were not recommended for those planted area. This paper presents plant varieties selection system for planted areas in Thailand from meteorological data and environmental data by the use of decision tree techniques. With this software developed as an environmental data analysis tool, it can analyze resulting easier and faster. Our software is a front end of WEKA that provides fundamental data mining functions such as classify, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with exporting result. After that, our software can export and display data result to Google maps API in order to display result and plot plant icons effectively.

Keywords: Plant varieties selection system, decision tree, expert recommendation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1762
7321 Application Methodology for the Generation of 3D Thermal Models Using UAV Photogrammety and Dual Sensors for Mining/Industrial Facilities Inspection

Authors: Javier Sedano-Cibrián, Julio Manuel de Luis-Ruiz, Rubén Pérez-Álvarez, Raúl Pereda-García, Beatriz Malagón-Picón

Abstract:

Structural inspection activities are necessary to ensure the correct functioning of infrastructures. UAV techniques have become more popular than traditional techniques. Specifically, UAV Photogrammetry allows time and cost savings. The development of this technology has permitted the use of low-cost thermal sensors in UAVs. The representation of 3D thermal models with this type of equipment is in continuous evolution. The direct processing of thermal images usually leads to errors and inaccurate results. In this paper, a methodology is proposed for the generation of 3D thermal models using dual sensors, which involves the application of RGB and thermal images in parallel. Hence, the RGB images are used as the basis for the generation of the model geometry, and the thermal images are the source of the surface temperature information that is projected onto the model. Mining/industrial facilities representations that are obtained can be used for inspection activities.

Keywords: Aerial thermography, data processing, drone, low-cost, point cloud.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 282
7320 On Speeding Up Support Vector Machines: Proximity Graphs Versus Random Sampling for Pre-Selection Condensation

Authors: Xiaohua Liu, Juan F. Beltran, Nishant Mohanchandra, Godfried T. Toussaint

Abstract:

Support vector machines (SVMs) are considered to be the best machine learning algorithms for minimizing the predictive probability of misclassification. However, their drawback is that for large data sets the computation of the optimal decision boundary is a time consuming function of the size of the training set. Hence several methods have been proposed to speed up the SVM algorithm. Here three methods used to speed up the computation of the SVM classifiers are compared experimentally using a musical genre classification problem. The simplest method pre-selects a random sample of the data before the application of the SVM algorithm. Two additional methods use proximity graphs to pre-select data that are near the decision boundary. One uses k-Nearest Neighbor graphs and the other Relative Neighborhood Graphs to accomplish the task.

Keywords: Machine learning, data mining, support vector machines, proximity graphs, relative-neighborhood graphs, k-nearestneighbor graphs, random sampling, training data condensation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892
7319 Modeling Ambient Carbon Monoxide Pollutant Due to Road Traffic

Authors: Anjaneyulu M.V.L.R., Harikrishna M., Chenchuobulu S.

Abstract:

Rapid urbanization, industrialization and population growth have led to an increase in number of automobiles that cause air pollution. It is estimated that road traffic contributes 60% of air pollution in urban areas. A case by case assessment is required to predict the air quality in urban situations, so as to evolve certain traffic management measures to maintain the air quality levels with in the tolerable limits. Calicut city in the state of Kerala, India has been chosen as the study area. Carbon Monoxide (CO) concentration was monitored at 15 links in Calicut city and air quality performance was evaluated over each link. The CO pollutant concentration values were compared with the National Ambient Air Quality Standards (NAAQS), and the CO values were predicted by using CALINE4 and IITLS and Linear regression models. The study has revealed that linear regression model performs better than the CALINE4 and IITLS models. The possible association between CO pollutant concentration and traffic parameters like traffic flow, type of vehicle, and traffic stream speed was also evaluated.

Keywords: CO pollution, Modelling, Traffic stream parameters.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2324
7318 Despiking of Turbulent Flow Data in Gravel Bed Stream

Authors: Ratul Das

Abstract:

The present experimental study insights the decontamination of instantaneous velocity fluctuations captured by Acoustic Doppler Velocimeter (ADV) in gravel-bed streams to ascertain near-bed turbulence for low Reynolds number. The interference between incidental and reflected pulses produce spikes in the ADV data especially in the near-bed flow zone and therefore filtering the data are very essential. Nortek’s Vectrino four-receiver ADV probe was used to capture the instantaneous three-dimensional velocity fluctuations over a non-cohesive bed. A spike removal algorithm based on the acceleration threshold method was applied to note the bed roughness and its influence on velocity fluctuations and velocity power spectra in the carrier fluid. The velocity power spectra of despiked signals with a best combination of velocity threshold (VT) and acceleration threshold (AT) are proposed which ascertained velocity power spectra a satisfactory fit with the Kolmogorov “–5/3 scaling-law” in the inertial sub-range. Also, velocity distributions below the roughness crest level fairly follows a third-degree polynomial series.

Keywords: Acoustic Doppler Velocimeter, gravel-bed, spike removal, Reynolds shear stress, near-bed turbulence, velocity power spectra.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1150