Search results for: data stream mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7782

Search results for: data stream mining

7272 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: Cancer classification, feature selection, deep learning, genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1271
7271 Unsupervised Clustering Methods for Identifying Rare Events in Anomaly Detection

Authors: Witcha Chimphlee, Abdul Hanan Abdullah, Mohd Noor Md Sap, Siriporn Chimphlee, Surat Srinoy

Abstract:

It is important problems to increase the detection rates and reduce false positive rates in Intrusion Detection System (IDS). Although preventative techniques such as access control and authentication attempt to prevent intruders, these can fail, and as a second line of defence, intrusion detection has been introduced. Rare events are events that occur very infrequently, detection of rare events is a common problem in many domains. In this paper we propose an intrusion detection method that combines Rough set and Fuzzy Clustering. Rough set has to decrease the amount of data and get rid of redundancy. Fuzzy c-means clustering allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us to recognize not only known attacks but also to detect suspicious activity that may be the result of a new, unknown attack. The experimental results on Knowledge Discovery and Data Mining-(KDDCup 1999) Dataset show that the method is efficient and practical for intrusion detection systems.

Keywords: Network and security, intrusion detection, fuzzy cmeans, rough set.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2862
7270 Experience Modularization for New Value of Evanescent Cultural Communities: Developing Creative Tourism Services in Bangkok

Authors: Wuttigrai Ngamsirijit

Abstract:

Creative tourism is an ongoing development in many countries as an attempt to moving away from serial reproduction of culture and reviving the culture. Despite, in the destinations with diverse and potential cultural resources, creating new tourism services can be vague. This paper presents how tourism experiences are modularized and consolidated in order to form new creative tourism service offerings in evanescent cultural communities of Bangkok, Thailand. The benefits from data mining in accommodating value co-creation are discussed, and implication of experience modularization to national creative tourism policy is addressed.

Keywords: Co-creation, Creative tourism, New Service Design

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2403
7269 Development of Innovative Islamic Web Applications

Authors: Farrukh Shahzad

Abstract:

The rich Islamic resources related to religious text, Islamic sciences, and history are widely available in print and in electronic format online. However, most of these works are only available in Arabic language. In this research, an attempt is made to utilize these resources to create interactive web applications in Arabic, English and other languages. The system utilizes the Pattern Recognition, Knowledge Management, Data Mining, Information Retrieval and Management, Indexing, storage and data-analysis techniques to parse, store, convert and manage the information from authentic Arabic resources. These interactive web Apps provide smart multi-lingual search, tree based search, on-demand information matching and linking. In this paper, we provide details of application architecture, design, implementation and technologies employed. We also presented the summary of web applications already developed. We have also included some screen shots from the corresponding web sites. These web applications provide an Innovative On-line Learning Systems (eLearning and computer based education).

Keywords: Islamic resources, Muslim scholars, hadith, narrators, history, fiqh.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1302
7268 MCOKE: Multi-Cluster Overlapping K-Means Extension Algorithm

Authors: Said Baadel, Fadi Thabtah, Joan Lu

Abstract:

Clustering involves the partitioning of n objects into k clusters. Many clustering algorithms use hard-partitioning techniques where each object is assigned to one cluster. In this paper we propose an overlapping algorithm MCOKE which allows objects to belong to one or more clusters. The algorithm is different from fuzzy clustering techniques because objects that overlap are assigned a membership value of 1 (one) as opposed to a fuzzy membership degree. The algorithm is also different from other overlapping algorithms that require a similarity threshold be defined a priori which can be difficult to determine by novice users.

Keywords: Data mining, k-means, MCOKE, overlapping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2754
7267 Hydrodynamic Characterisation of a Hydraulic Flume with Sheared Flow

Authors: Daniel Rowe, Christopher R. Vogel, Richard H. J. Willden

Abstract:

This study documents the hydrodynamic characteristics of a recirculating water flume in preparation for experimental testing of horizontal axis tidal stream turbine models. An Acoustic Doppler Velocimeter (ADV) was used to measure the flow at high temporal resolution at various locations throughout the flume, enabling the spatial uniformity and turbulence flow parameters to be investigated. The mean velocity profiles exhibited high levels of spatial uniformity at the design speed of the flume, 0.6 ms−1, with variations in the three-dimensional velocity components on the order of ±1% at the 95% confidence level, along with a modest streamwise acceleration through the measurement domain, a target 5m working section of the flume. A high degree of uniformity was also apparent for the turbulence intensity, with values ranging between 1-2% across the intended swept area of the turbine rotor. The integral scales of turbulence exhibited a far higher degree of variation throughout the water column, particularly in the streamwise and vertical scales. This behaviour is believed to be due to the high signal noise content leading to decorrelation in the sampling records. To achieve more realistic levels of vertical velocity shear in the flume, a simple procedure to practically generate target vertical shear profiles in open-channel flows is described. Here, we arranged a series of non-uniformly spaced parallel bars placed across the width of the flume and normal to the onset flow. By adjusting the resistance grading across the height of the working section, the downstream profiles could be modified accordingly, characterised by changes in the velocity profile power-law exponent, 1/n. Considering the significant temporal variation in a tidal channel, the choice of the exponent denominator, n = 6 and n = 9, effectively provides an achievable range around the much-cited value of n = 7 observed at many tidal sites. The resulting flow profiles, which we intend to use in future turbine tests, have been characterised in detail. The results indicate non-uniform vertical shear across the survey area and reveal substantial corner flows, arising from the differential shear between the target vertical and cross-stream shear profiles throughout the measurement domain. In vertically sheared flow, the rotor-equivalent turbulence intensity ranges between 3.0-3.8% throughout the measurement domain for both bar arrangements, while the streamwise integral length scale grows from a characteristic dimension on the order of the bar width, similar to the flow downstream of a turbulence-generating grid. The experimental tests are well-defined and repeatable and serve as a reference for other researchers who wish to undertake similar investigations.

Keywords: Acoustic Doppler velocimetry, experimental hydrodynamics, open-channel flow, shear profiles, tidal stream turbines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 72
7266 A Growing Natural Gas Approach for Evaluating Quality of Software Modules

Authors: Parvinder S. Sandhu, Sandeep Khimta, Kiranpreet Kaur

Abstract:

The prediction of Software quality during development life cycle of software project helps the development organization to make efficient use of available resource to produce the product of highest quality. “Whether a module is faulty or not" approach can be used to predict quality of a software module. There are numbers of software quality prediction models described in the literature based upon genetic algorithms, artificial neural network and other data mining algorithms. One of the promising aspects for quality prediction is based on clustering techniques. Most quality prediction models that are based on clustering techniques make use of K-means, Mixture-of-Guassians, Self-Organizing Map, Neural Gas and fuzzy K-means algorithm for prediction. In all these techniques a predefined structure is required that is number of neurons or clusters should be known before we start clustering process. But in case of Growing Neural Gas there is no need of predetermining the quantity of neurons and the topology of the structure to be used and it starts with a minimal neurons structure that is incremented during training until it reaches a maximum number user defined limits for clusters. Hence, in this work we have used Growing Neural Gas as underlying cluster algorithm that produces the initial set of labeled cluster from training data set and thereafter this set of clusters is used to predict the quality of test data set of software modules. The best testing results shows 80% accuracy in evaluating the quality of software modules. Hence, the proposed technique can be used by programmers in evaluating the quality of modules during software development.

Keywords: Growing Neural Gas, data clustering, fault prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1865
7265 Bayesian Networks for Earthquake Magnitude Classification in a Early Warning System

Authors: G. Zazzaro, F.M. Pisano, G. Romano

Abstract:

During last decades, worldwide researchers dedicated efforts to develop machine-based seismic Early Warning systems, aiming at reducing the huge human losses and economic damages. The elaboration time of seismic waveforms is to be reduced in order to increase the time interval available for the activation of safety measures. This paper suggests a Data Mining model able to correctly and quickly estimate dangerousness of the running seismic event. Several thousand seismic recordings of Japanese and Italian earthquakes were analyzed and a model was obtained by means of a Bayesian Network (BN), which was tested just over the first recordings of seismic events in order to reduce the decision time and the test results were very satisfactory. The model was integrated within an Early Warning System prototype able to collect and elaborate data from a seismic sensor network, estimate the dangerousness of the running earthquake and take the decision of activating the warning promptly.

Keywords: Bayesian Networks, Decision Support System, Magnitude Classification, Seismic Early Warning System

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3598
7264 Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores

Authors: Ankit Sinha, Soham Banerjee, Pratik Chattopadhyay

Abstract:

Automated product recognition in retail stores is an important real-world application in the domain of Computer Vision and Pattern Recognition. In this paper, we consider the problem of automatically identifying the classes of the products placed on racks in retail stores from an image of the rack and information about the query/product images. We improve upon the existing approaches in terms of effectiveness and memory requirement by developing a two-stage object detection and recognition pipeline comprising of a Faster-RCNN-based object localizer that detects the object regions in the rack image and a ResNet-18-based image encoder that classifies  the detected regions into the appropriate classes. Each of the models is fine-tuned using appropriate data sets for better prediction and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model. This encoder is trained using a triplet loss function following the strategy of online-hard-negative-mining for improved prediction. The proposed models are lightweight and can be connected in an end-to-end manner during deployment to automatically identify each product object placed in a rack image. Extensive experiments using Grozi-32k and GP-180 data sets verify the effectiveness of the proposed model.

Keywords: Retail stores, Faster-RCNN, object localization, ResNet-18, triplet loss, data augmentation, product recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 583
7263 Numerical Study of the Influence of the Primary Stream Pressure on the Performance of the Ejector Refrigeration System Based on Heat Exchanger Modeling

Authors: Elhameh Narimani, Mikhail Sorin, Philippe Micheau, Hakim Nesreddine

Abstract:

Numerical models of the heat exchangers in ejector refrigeration system (ERS) were developed and validated with the experimental data. The models were based on the switched heat exchangers model using the moving boundary method, which were capable of estimating the zones’ lengths, the outlet temperatures of both sides and the heat loads at various experimental points. The developed models were utilized to investigate the influence of the primary flow pressure on the performance of an R245fa ERS based on its coefficient of performance (COP) and exergy efficiency. It was illustrated numerically and proved experimentally that increasing the primary flow pressure slightly reduces the COP while the exergy efficiency goes through a maximum before decreasing.

Keywords: Coefficient of performance, ejector refrigeration system, exergy efficiency, heat exchangers modeling, moving boundary method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 551
7262 Using Data Mining Methodology to Build the Predictive Model of Gold Passbook Price

Authors: Chien-Hui Yang, Che-Yang Lin, Ya-Chen Hsu

Abstract:

Gold passbook is an investing tool that is especially suitable for investors to do small investment in the solid gold. The gold passbook has the lower risk than other ways investing in gold, but its price is still affected by gold price. However, there are many factors can cause influences on gold price. Therefore, building a model to predict the price of gold passbook can both reduce the risk of investment and increase the benefits. This study investigates the important factors that influence the gold passbook price, and utilize the Group Method of Data Handling (GMDH) to build the predictive model. This method can not only obtain the significant variables but also perform well in prediction. Finally, the significant variables of gold passbook price, which can be predicted by GMDH, are US dollar exchange rate, international petroleum price, unemployment rate, whole sale price index, rediscount rate, foreign exchange reserves, misery index, prosperity coincident index and industrial index.

Keywords: Gold price, Gold passbook price, Group Method ofData Handling (GMDH), Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2285
7261 Accumulation of Pollutants, Self-purification and Impact on Peripheral Urban Areas: A Case Study in Shantytowns in Argentina

Authors: N. Porzionato, M. Mantiñan, E. Bussi, S. Grinberg, R. Gutierrez, G. Curutchet

Abstract:

This work sets out to debate the tensions involved in the processes of contamination and self-purification in the urban space, particularly in the streams that run through the Buenos Aires metropolitan area. For much of their course, those streams are piped; their waters do not come into contact with the outdoors until they have reached deeply impoverished urban areas with high levels of environmental contamination. These are peripheral zones that, until thirty years ago, were marshlands and fields. They are now densely populated areas largely lacking in urban infrastructure. The Cárcova neighborhood, where this project is underway, is in the José León Suárez section of General San Martín county, Buenos Aires province. A stretch of José León Suarez canal crosses the neighborhood. Starting upstream, this canal carries pollutants due to the sewage and industrial waste released into it. Further downstream, in the neighborhood, domestic drainage is poured into the stream. In this paper, we formulate a hypothesis diametrical to the one that holds that these neighborhoods are the primary source of contamination, suggesting instead that in the stretch of the canal that runs through the neighborhood the stream’s waters are actually cleaned and the sediments accumulate pollutants. Indeed, the stretches of water that runs through these neighborhoods act as water processing plants for the metropolis. This project has studied the different organic-load polluting contributions to the water in a certain stretch of the canal, the reduction of that load over the course of the canal, and the incorporation of pollutants into the sediments. We have found that the surface water has considerable ability to self-purify, mostly due to processes of sedimentation and adsorption. The polluting load is accumulated in the sediments where that load stabilizes slowly by means of anaerobic processes. In this study, we also investigated the risks of sediment management and the use of the processes studied here in controlled conditions as tools of environmental restoration.

Keywords: Bioremediation, pollutants, sediments, urban streams.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2480
7260 Annual Power Load Forecasting Using Support Vector Regression Machines: A Study on Guangdong Province of China 1985-2008

Authors: Zhiyong Li, Zhigang Chen, Chao Fu, Shipeng Zhang

Abstract:

Load forecasting has always been the essential part of an efficient power system operation and planning. A novel approach based on support vector machines is proposed in this paper for annual power load forecasting. Different kernel functions are selected to construct a combinatorial algorithm. The performance of the new model is evaluated with a real-world dataset, and compared with two neural networks and some traditional forecasting techniques. The results show that the proposed method exhibits superior performance.

Keywords: combinatorial algorithm, data mining, load forecasting, support vector machines

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1646
7259 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.

Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4876
7258 An Optimal Control of Water Pollution in a Stream Using a Finite Difference Method

Authors: Nopparat Pochai, Rujira Deepana

Abstract:

Water pollution assessment problems arise frequently in environmental science. In this research, a finite difference method for solving the one-dimensional steady convection-diffusion equation with variable coefficients is proposed; it is then used to optimize water treatment costs.

Keywords: Finite difference, One-dimensional, Steady state, Waterpollution control, Optimization, Convection-diffusion equation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1747
7257 Aerodynamic Models for the Analysis of Vertical Axis Wind Turbines (VAWTs)

Authors: T. Brahimi, F. Saeed, I. Paraschivoiu

Abstract:

This paper details the progress made in the development of the different state-of-the-art aerodynamic tools for the analysis of vertical axis wind turbines including the flow simulation around the blade, viscous flow, stochastic wind, and dynamic stall effects. The paper highlights the capabilities of the developed wind turbine aerodynamic codes over the last thirty years which are currently being used in North America and Europe by Sandia Laboratories, FloWind, IMST Marseilles, and Hydro-Quebec among others. The aerodynamic codes developed at Ecole Polytechnique de Montreal, Canada, represent valuable tools for simulating the flow around wind turbines including secondary effects. Comparison of theoretical results with experimental data have shown good agreement. The strength of the aerodynamic codes based on Double-Multiple Stream tube model (DMS) lies in its simplicity, accuracy, and ability to analyze secondary effects that interfere with wind turbine aerodynamic calculations.

Keywords: Aerodynamics, wind turbines, VAWT, CARDAAV, Darrieus, dynamic stall.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2601
7256 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2612
7255 Two Different Solutions for Gigabit Ethernet Transmission over POF

Authors: Stefano Straullu, Silvio Abrate, Antonino Nespola, Paolo Savio, Roberto Gaudino

Abstract:

Two completely different approaches for a Gigabit Ethernet compliant stream transmission over 50m of 1mm PMMA SI-POF have been experimentally demonstrated and are compared in this paper. The first solution is based on a commercial RC-LED transmission and a careful optimization of the physical layer architecture, realized during the POF-PLUS EU Project. The second solution exploits the performance of an edge-emitting laser at the transmitter side in order to avoid any sort of electrical equalization at the receiver side.

Keywords: Gigabit Ethernet, Home Networking, Step-Index Polymer Optical Fiber (SI-POF)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1790
7254 Fault Localization and Alarm Correlation in Optical WDM Networks

Authors: G. Ramesh, S. Sundara Vadivelu

Abstract:

For several high speed networks, providing resilience against failures is an essential requirement. The main feature for designing next generation optical networks is protecting and restoring high capacity WDM networks from the failures. Quick detection, identification and restoration make networks more strong and consistent even though the failures cannot be avoided. Hence, it is necessary to develop fast, efficient and dependable fault localization or detection mechanisms. In this paper we propose a new fault localization algorithm for WDM networks which can identify the location of a failure on a failed lightpath. Our algorithm detects the failed connection and then attempts to reroute data stream through an alternate path. In addition to this, we develop an algorithm to analyze the information of the alarms generated by the components of an optical network, in the presence of a fault. It uses the alarm correlation in order to reduce the list of suspected components shown to the network operators. By our simulation results, we show that our proposed algorithms achieve less blocking probability and delay while getting higher throughput.

Keywords: Alarm correlation, blocking probability, delay, fault localization, WDM networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2068
7253 Eclectic Rule-Extraction from Support Vector Machines

Authors: Nahla Barakat, Joachim Diederich

Abstract:

Support vector machines (SVMs) have shown superior performance compared to other machine learning techniques, especially in classification problems. Yet one limitation of SVMs is the lack of an explanation capability which is crucial in some applications, e.g. in the medical and security domains. In this paper, a novel approach for eclectic rule-extraction from support vector machines is presented. This approach utilizes the knowledge acquired by the SVM and represented in its support vectors as well as the parameters associated with them. The approach includes three stages; training, propositional rule-extraction and rule quality evaluation. Results from four different experiments have demonstrated the value of the approach for extracting comprehensible rules of high accuracy and fidelity.

Keywords: Data mining, hybrid rule-extraction algorithms, medical diagnosis, SVMs

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1708
7252 Coalescing Data Marts

Authors: N. Parimala, P. Pahwa

Abstract:

OLAP uses multidimensional structures, to provide access to data for analysis. Traditionally, OLAP operations are more focused on retrieving data from a single data mart. An exception is the drill across operator. This, however, is restricted to retrieving facts on common dimensions of the multiple data marts. Our concern is to define further operations while retrieving data from multiple data marts. Towards this, we have defined six operations which coalesce data marts. While doing so we consider the common as well as the non-common dimensions of the data marts.

Keywords: Data warehouse, Dimension, OLAP, Star Schema.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1559
7251 Spatial Clustering Model of Vessel Trajectory to Extract Sailing Routes Based on AIS Data

Authors: Lubna Eljabu, Mohammad Etemad, Stan Matwin

Abstract:

The automatic extraction of shipping routes is advantageous for intelligent traffic management systems to identify events and support decision-making in maritime surveillance. At present, there is a high demand for the extraction of maritime traffic networks that resemble the real traffic of vessels accurately, which is valuable for further analytical processing tasks for vessels trajectories (e.g., naval routing and voyage planning, anomaly detection, destination prediction, time of arrival estimation). With the help of big data and processing huge amounts of vessels’ trajectory data, it is possible to learn these shipping routes from the navigation history of past behaviour of other, similar ships that were travelling in a given area. In this paper, we propose a spatial clustering model of vessels’ trajectories (SPTCLUST) to extract spatial representations of sailing routes from historical Automatic Identification System (AIS) data. The whole model consists of three main parts: data preprocessing, path finding, and route extraction, which consists of clustering and representative trajectory extraction. The proposed clustering method provides techniques to overcome the problems of: (i) optimal input parameters selection; (ii) the high complexity of processing a huge volume of multidimensional data; (iii) and the spatial representation of complete representative trajectory detection in the context of trajectory clustering algorithms. The experimental evaluation showed the effectiveness of the proposed model by using a real-world AIS dataset from the Port of Halifax. The results contribute to further understanding of shipping route patterns. This could aid surveillance authorities in stable and sustainable vessel traffic management.

Keywords: Vessel trajectory clustering, trajectory mining, Spatial Clustering, marine intelligent navigation, maritime traffic network extraction, sdailing routes extraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 455
7250 Modeling Language for Constructing Solvers in Machine Learning: Reductionist Perspectives

Authors: Tsuyoshi Okita

Abstract:

For a given specific problem an efficient algorithm has been the matter of study. However, an alternative approach orthogonal to this approach comes out, which is called a reduction. In general for a given specific problem this reduction approach studies how to convert an original problem into subproblems. This paper proposes a formal modeling language to support this reduction approach in order to make a solver quickly. We show three examples from the wide area of learning problems. The benefit is a fast prototyping of algorithms for a given new problem. It is noted that our formal modeling language is not intend for providing an efficient notation for data mining application, but for facilitating a designer who develops solvers in machine learning.

Keywords: Formal language, statistical inference problem, reduction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1328
7249 Dynamic Features Selection for Heart Disease Classification

Authors: Walid MOUDANI

Abstract:

The healthcare environment is generally perceived as being information rich yet knowledge poor. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. In fact, valuable knowledge can be discovered from application of data mining techniques in healthcare system. In this study, a proficient methodology for the extraction of significant patterns from the Coronary Heart Disease warehouses for heart attack prediction, which unfortunately continues to be a leading cause of mortality in the whole world, has been presented. For this purpose, we propose to enumerate dynamically the optimal subsets of the reduced features of high interest by using rough sets technique associated to dynamic programming. Therefore, we propose to validate the classification using Random Forest (RF) decision tree to identify the risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions based on the medical profile of patient. Moreover, the experts- knowledge in this field has been taken into consideration in order to define the disease, its risk factors, and to establish significant knowledge relationships among the medical factors. A computer-aided system is developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated based on set of benchmark techniques applied in this classification problem.

Keywords: Multi-Classifier Decisions Tree, Features Reduction, Dynamic Programming, Rough Sets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2532
7248 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.

Keywords: Cooccurrence graph, entity relation graph, unstructured text, weighted distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 684
7247 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.

Keywords: Classifier ensemble, breast cancer survivability, data mining, SEER.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1671
7246 Knowledge Discovery from Production Databases for Hierarchical Process Control

Authors: Pavol Tanuska, Pavel Vazan, Michal Kebisek, Dominika Jurovata

Abstract:

The paper gives the results of the project that was oriented on the usage of knowledge discoveries from production systems for needs of the hierarchical process control. One of the main project goals was the proposal of knowledge discovery model for process control. Specifics data mining methods and techniques was used for defined problems of the process control. The gained knowledge was used on the real production system thus the proposed solution has been verified. The paper documents how is possible to apply the new discovery knowledge to use in the real hierarchical process control. There are specified the opportunities for application of the proposed knowledge discovery model for hierarchical process control.

Keywords: Hierarchical process control, knowledge discovery from databases, neural network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1775
7245 Impact of Vehicle Travel Characteristics on Level of Service: A Comparative Analysis of Rural and Urban Freeways

Authors: Anwaar Ahmed, Muhammad Bilal Khurshid, Samuel Labi

Abstract:

The effect of trucks on the level of service is determined by considering passenger car equivalents (PCE) of trucks. The current version of Highway Capacity Manual (HCM) uses a single PCE value for all tucks combined. However, the composition of truck traffic varies from location to location; therefore, a single PCE value for all trucks may not correctly represent the impact of truck traffic at specific locations. Consequently, present study developed separate PCE values for single-unit and combination trucks to replace the single value provided in the HCM on different freeways. Site specific PCE values, were developed using concept of spatial lagging headways (that is the distance between rear bumpers of two vehicles in a traffic stream) measured from field traffic data. The study used data from four locations on a single urban freeway and three different rural freeways in Indiana. Three-stage-leastsquares (3SLS) regression techniques were used to generate models that predicted lagging headways for passenger cars, single unit trucks (SUT), and combination trucks (CT). The estimated PCE values for single-unit and combination truck for basic urban freeways (level terrain) were: 1.35 and 1.60, respectively. For rural freeways the estimated PCE values for single-unit and combination truck were: 1.30 and 1.45, respectively. As expected, traffic variables such as vehicle flow rates and speed have significant impacts on vehicle headways. Study results revealed that the use of separate PCE values for different truck classes can have significant influence on the LOS estimation.

Keywords: Level of Service, Capacity Analysis, Lagging Headway.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2096
7244 High-Speed Pipeline Implementation of Radix-2 DIF Algorithm

Authors: Christos Meletis, Paul Bougas, George Economakos , Paraskevas Kalivas, Kiamal Pekmestzi

Abstract:

In this paper, we propose a new architecture for the implementation of the N-point Fast Fourier Transform (FFT), based on the Radix-2 Decimation in Frequency algorithm. This architecture is based on a pipeline circuit that can process a stream of samples and produce two FFT transform samples every clock cycle. Compared to existing implementations the architecture proposed achieves double processing speed using the same circuit complexity.

Keywords: Digital signal processing, systolic circuits, FFTalgorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2215
7243 Destination Port Detection for Vessels: An Analytic Tool for Optimizing Port Authorities Resources

Authors: Lubna Eljabu, Mohammad Etemad, Stan Matwin

Abstract:

Port authorities have many challenges in congested ports to allocate their resources to provide a safe and secure loading/unloading procedure for cargo vessels. Selecting a destination port is the decision of a vessel master based on many factors such as weather, wavelength and changes of priorities. Having access to a tool which leverages Automatic Identification System (AIS) messages to monitor vessel’s movements and accurately predict their next destination port promotes an effective resource allocation process for port authorities. In this research, we propose a method, namely, Reference Route of Trajectory (RRoT) to assist port authorities in predicting inflow and outflow traffic in their local environment by monitoring AIS messages. Our RRo method creates a reference route based on historical AIS messages. It utilizes some of the best trajectory similarity measures to identify the destination of a vessel using their recent movement. We evaluated five different similarity measures such as Discrete Frechet Distance (DFD), Dynamic Time ´ Warping (DTW), Partial Curve Mapping (PCM), Area between two curves (Area) and Curve length (CL). Our experiments show that our method identifies the destination port with an accuracy of 98.97% and an f-measure of 99.08% using Dynamic Time Warping (DTW) similarity measure.

Keywords: Spatial temporal data mining, trajectory mining, trajectory similarity, resource optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 696