Lu Si and Jie Yu and Shasha Li and Jun Ma and Lei Luo and Qingbo Wu and Yongqi Ma and Zhengji Liu
FCNNMR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule
855 - 861
2017
11
7
International Journal of Information and Communication Engineering
https://publications.waset.org/pdf/10007534
https://publications.waset.org/vol/127
World Academy of Science, Engineering and Technology
Instance selection (IS) techniques reduce the size of a data set
to improve the performance of data mining methods.
Recently, to process very large data sets, several methods have been proposed that
divide the training set into disjoint subsets and apply IS
algorithms independently to each subset. In this paper, we analyze
the limitations of these methods and give our view on how to
divide and conquer in the IS procedure. Then, based on the fast condensed
nearest neighbor (FCNN) rule, we propose FCNNMR, an instance
selection method for large data sets built on the MapReduce framework. Besides ensuring the
prediction accuracy and reduction rate, it has two desirable properties:
first, it reduces the workload on the aggregation node; second,
and most importantly, it produces the same result as the sequential
version, which other parallel methods cannot achieve. We evaluate the
performance of FCNNMR on one small data set and two large data
sets. The experimental results show that it is effective and practical.
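
As a rough illustration of the kind of rule the abstract builds on, the following is a minimal sequential sketch of an FCNN-style condensation step. It is not the authors' FCNNMR implementation and shows no MapReduce parallelization. It assumes that the rule starts from one seed per class (the training point nearest to the class mean) and then repeatedly adds, for each selected point, the nearest training point that the current subset misclassifies. The function names, the tie-breaking by index, and the toy data are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid_seeds(points, labels):
    """Return, for each class, the index of the point nearest to the class mean."""
    seeds = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        dim = len(points[idx[0]])
        mean = [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
        # Ties are broken by the lower index (an assumption of this sketch).
        seeds.append(min(idx, key=lambda i: (dist(points[i], mean), i)))
    return seeds


def fcnn_style_select(points, labels):
    """Grow a subset until 1-NN on the subset classifies every training point."""
    selected = []
    delta = centroid_seeds(points, labels)
    while delta:
        selected.extend(delta)
        chosen = set(selected)
        best = {}  # selected index -> (distance, index) of its nearest "enemy"
        for q in range(len(points)):
            if q in chosen:
                continue
            # Nearest selected point to q under the current subset.
            p = min(selected, key=lambda s: (dist(points[q], points[s]), s))
            if labels[p] != labels[q]:  # q is misclassified by the subset
                cand = (dist(points[q], points[p]), q)
                if p not in best or cand < best[p]:
                    best[p] = cand
        delta = sorted(q for _, q in best.values())
    return sorted(selected)


def predict_1nn(points, labels, subset, x):
    """Classify x by its nearest neighbor among the selected indices."""
    nearest = min(subset, key=lambda s: (dist(points[s], x), s))
    return labels[nearest]


# Toy data: two well-separated clusters (hypothetical example).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
subset = fcnn_style_select(points, labels)
print(subset)
```

Because each round depends only on distances to the current subset, and not on the order in which training points are visited, a rule of this shape is a plausible candidate for a parallel version that matches the sequential result; how FCNNMR actually achieves this is described in the paper itself.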
Open Science Index 127, 2017