A Character Detection Method for Ancient Yi Books Based on Connected Components and Regressive Character Segmentation

Xu Han; Shanxiong Chen; Shiyu Zhu; Xiaoyu Lin; Fujia Zhao; Dingwang Wang

Abstract:

Character detection is a key step in character recognition for ancient Yi books, and detection accuracy directly affects recognition performance. Because these books have complex layouts, lack standard typesetting, and mix images with text, we propose a character detection method based on connected components and regressive character segmentation. First, the scanned images are denoised with non-local means filtering, and a modified local adaptive threshold binarization algorithm then separates foreground from background. Second, non-text areas are removed using a connected-component-based method. Finally, individual characters are segmented with the proposed segmentation method. Experimental results show that the method effectively separates text areas from non-text areas in ancient Yi books and achieves higher accuracy and recall in character detection, addressing the problem of character detection and segmentation in the recognition of ancient books.
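
The second step, removing non-text areas with connected components, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the filtering criteria, so the function names `connected_components` and `keep_text_like`, the thresholds `min_area` and `max_side`, and the toy `page` below are assumptions chosen only for illustration. The sketch labels 8-connected foreground regions in an already binarized image, then discards regions that are too small (specks) or too long on one side (rules, borders, illustrations).

```python
from collections import deque


def connected_components(binary):
    """Return (top, left, bottom, right, area) for each 8-connected foreground region."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            top, left, bottom, right, area = y0, x0, y0, x0, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right, area))
    return boxes


def keep_text_like(boxes, min_area=2, max_side=4):
    """Keep regions whose size is plausible for a character (hypothetical thresholds)."""
    kept = []
    for top, left, bottom, right, area in boxes:
        height, width = bottom - top + 1, right - left + 1
        if area >= min_area and max(height, width) <= max_side:
            kept.append((top, left, bottom, right, area))
    return kept


# Toy binarized page: 1 = ink, 0 = background (made-up data).
page = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],  # single-pixel speck at column 4
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # long horizontal rule
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
]

all_boxes = connected_components(page)
text_boxes = keep_text_like(all_boxes)
print(text_boxes)
```

In practice the thresholds would be derived from the statistics of the page, for example the median component height, rather than fixed by hand as they are here.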

Keywords: Computing methodologies, interest point, salient region detections, image segmentation.

Digital Object Identifier (DOI): doi.org/1
