TY - JFULL
AU - Lu Si and Jie Yu and Shasha Li and Jun Ma and Lei Luo and Qingbo Wu and Yongqi Ma and Zhengji Liu
PY - 2017/8/
TI - FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule
T2 - International Journal of Information and Communication Engineering
SP - 854
EP - 861
VL - 11
SN - 1307-6892
UR - https://publications.waset.org/pdf/10007534
PU - World Academy of Science, Engineering and Technology
NX - Open Science Index 127, 2017
N2 - Instance selection (IS) techniques reduce data size in order to improve the performance of data mining methods. To process very large data sets, several recently proposed methods divide the training set into disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitations of these methods and give our view on how to divide and conquer in the IS procedure. Then, based on the fast condensed nearest neighbor (FCNN) rule, we propose an instance selection method for large data sets built on the MapReduce framework. Besides preserving prediction accuracy and reduction rate, it has two desirable properties: first, it reduces the workload on the aggregation node; second, and most important, it produces the same result as the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.
ER -