Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas


Hierarchical classification is a special type of classification task in which the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a given dataset, based on the past performance of the candidate algorithms on other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for the more challenging task of hierarchical classification, and evaluates it on a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. The proposed approach recommends the best hierarchical classification algorithm for a given hierarchical classification dataset. This work's contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.

Keywords: bioinformatics, meta-learning, hierarchical classification, algorithm recommendation
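
To make the idea of algorithm recommendation from meta-features more concrete, the following is a minimal illustrative sketch and not the authors' method. It assumes a toy class hierarchy, three hypothetical meta-features (number of classes, maximum depth, mean depth), invented meta-instances, and placeholder algorithm names. It also swaps in a 1-nearest-neighbour recommender for simplicity, whereas the paper interprets decision-tree meta-models.

```python
from math import dist

# Toy class hierarchy as child -> parent (None marks a top-level class).
# These labels are invented for illustration only.
PARENT = {"A": None, "B": None, "A.1": "A", "A.2": "A", "A.1.1": "A.1"}


def ancestors(label, parent):
    """Return all ancestors of `label`, nearest first."""
    chain = []
    node = parent[label]
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def meta_features(parent):
    """Hypothetical meta-features: class count, max depth, mean depth."""
    depths = [len(ancestors(c, parent)) + 1 for c in parent]
    return (len(parent), max(depths), sum(depths) / len(depths))


def recommend(new_features, meta_dataset):
    """1-nearest-neighbour lookup over (meta-features, best algorithm) pairs."""
    _, best = min(meta_dataset, key=lambda row: dist(row[0], new_features))
    return best


# Invented meta-instances: meta-features of previously seen datasets and the
# algorithm that performed best on each (names are placeholders).
META_DATASET = [
    ((4, 2, 1.5), "local_classifier"),
    ((40, 6, 3.8), "global_classifier"),
]

features = meta_features(PARENT)
print(features, recommend(features, META_DATASET))
```

Each row of `META_DATASET` plays the role of a meta-instance: one past dataset described by its meta-features and labelled with its best-performing algorithm. In practice the meta-features would be scaled before computing distances, since raw class counts would otherwise dominate the comparison.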


