Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3

concept drift Related Abstracts

3 A New Approach for Improving Accuracy of Multi Label Stream Data

Authors: Kunal Shah, Swati Patel

Abstract:

Many real world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. Classification is used to predict class of unseen instance as accurate as possible. Multi label classification is a variant of single label classification where set of labels associated with single instance. Multi label classification is used by modern applications, such as text classification, functional genomics, image classification, music categorization etc. This paper introduces the task of multi-label classification, methods for multi-label classification and evolution measure for multi-label classification. Also, comparative analysis of multi label classification methods on the basis of theoretical study, and then on the basis of simulation was done on various data sets.

Keywords: Data Stream Mining, binary relevance, concept drift, MLSC, multiple window with buffer

Procedia PDF Downloads 377
2 Concept Drifts Detection and Localisation in Process Mining

Authors: M. V. Manoj Kumar, Likewin Thomas, Annappa

Abstract:

Process mining provides methods and techniques for analyzing event logs recorded in modern information systems that support real-world operations. While analyzing an event-log, state-of-the-art techniques available in process mining believe that the operational process as a static entity (stationary). This is not often the case due to the possibility of occurrence of a phenomenon called concept drift. During the period of execution, the process can experience concept drift and can evolve with respect to any of its associated perspectives exhibiting various patterns-of-change with a different pace. Work presented in this paper discusses the main aspects to consider while addressing concept drift phenomenon and proposes a method for detecting and localizing the sudden concept drifts in control-flow perspective of the process by using features extracted by processing the traces in the process log. Our experimental results are promising in the direction of efficiently detecting and localizing concept drift in the context of process mining research discipline.

Keywords: Process Mining, concept drift, abrupt drift, sudden drift, control-flow perspective, detection and localization

Procedia PDF Downloads 202
1 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

Authors: Gaelle Candel, David Naccache

Abstract:

t-SNE is an embedding method that the data science community has widely used. It helps two main tasks: to display results by coloring items according to the item class or feature value; and for forensic, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high to low dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embedding. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers nor concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the support embedding’ match. The embedding with the support process can be repeated more than once, with the newly obtained embedding. The successive embedding can be used to study the impact of one variable over the dataset distribution or monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n²) to O(n²=k), and the memory requirement from n² to 2(n=k)², which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing to observe the birth, evolution, and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring high dimensional datasets’ dynamics.

Keywords: Data Visualization, monitoring, unsupervised learning, Dimension Reduction, reusability, embedding, concept drift, t-SNE

Procedia PDF Downloads 1