Decision Tree-based Feature Ranking using Manhattan Hierarchical Cluster Criterion

Authors: Yasmin Mohd Yacob, Harsa A. Mat Sakim, Nor Ashidi Mat Isa

Abstract:

Feature selection is gaining importance because it reduces classification cost in terms of time and computational load. One way to search for essential features is via a decision tree, which acts as an intermediate feature space inducer from which essential features are chosen. In decision tree-based feature selection, some studies use the decision tree as a feature ranker with a direct threshold measure, while others retain the decision tree but use a pruning condition that acts as a threshold mechanism to choose features. This paper proposes a threshold measure based on Manhattan hierarchical cluster distance for use in feature ranking, in order to choose relevant features as part of the feature selection process. The result is promising, and the method can be improved in the future by including test cases with a higher number of attributes.

Keywords: Feature ranking, decision tree, hierarchical cluster, Manhattan distance.
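
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes features are scored by information gain (the split criterion of ID3-style decision trees), that the scores are grouped by average-linkage agglomerative clustering with Manhattan distance, and that the cluster with the higher mean score is kept. The function names, the choice of two clusters, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction from splitting `labels` on a categorical `column`."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def two_clusters(points):
    """Average-linkage agglomerative clustering, stopped at two clusters.

    `points` maps a name to a vector; returns a list of clusters (lists of names).
    """
    clusters = [[name] for name in points]
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(manhattan(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters


def select_features(data, labels):
    """Rank features by information gain; keep the higher-scoring cluster."""
    scores = {name: information_gain(col, labels) for name, col in data.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    groups = two_clusters({name: (scores[name],) for name in ranking})
    if len(groups) < 2:  # nothing to separate
        return ranking, ranking

    def mean_score(group):
        return sum(scores[name] for name in group) / len(group)

    keep = max(groups, key=mean_score)
    return ranking, [name for name in ranking if name in keep]


# Hypothetical toy data: f1 copies the class, f2 mostly agrees with it,
# f3 and f4 carry no class information.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy = {
    "f1": [1, 1, 1, 1, 0, 0, 0, 0],
    "f2": [1, 1, 1, 1, 0, 0, 0, 1],
    "f3": [1, 1, 0, 0, 1, 0, 0, 1],
    "f4": [0, 1, 0, 1, 0, 1, 0, 1],
}
ranking, kept = select_features(toy, labels)
print("ranking:", ranking)
print("kept:", kept)
```

In this sketch the clustering step plays the role of the threshold: instead of fixing a cut-off score by hand, the gap between the two clusters decides where the ranked list is cut. A different linkage rule or number of clusters would move that cut, so those choices should be treated as tunable assumptions.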

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1056236

