Component-based Segmentation of Words from Handwritten Arabic Text

Authors: Jawad H AlKhateeb, Jianmin Jiang, Jinchang Ren, Stan S Ipson

Abstract:

Efficient preprocessing is essential for the automatic recognition of handwritten documents. This paper presents techniques for segmenting words in handwritten Arabic text. First, connected components (CCs) are extracted and the distances between components are analyzed. The statistical distribution of these distances is then used to determine an optimal threshold for word segmentation. In addition, an improved projection-based method is employed for baseline detection. The proposed method was tested on the IFN/ENIT database, which consists of 26459 Arabic words handwritten by 411 different writers. The results were encouraging, showing more accurate baseline detection and word segmentation for subsequent recognition.
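
The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary image stored as a list of rows (1 = ink), 8-connectivity, and a hypothetical threshold rule (mean of the gaps plus `k` standard deviations) standing in for the paper's optimal threshold, which the abstract does not specify. The baseline is taken as the row with the largest horizontal projection, which is the plain projection method and not the improved variant the paper describes. All function names and the parameter `k` are illustrative.

```python
from collections import deque
from statistics import mean, pstdev


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge_overlapping(boxes):
    """Merge boxes whose column ranges overlap (e.g. a dot above a stroke)."""
    if not boxes:
        return []
    spans = sorted((left, right) for _, left, _, right in boxes)
    merged = [list(spans[0])]
    for left, right in spans[1:]:
        if left <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], right)
        else:
            merged.append([left, right])
    return [tuple(s) for s in merged]


def segment_words(img, k=0.5):
    """Group column spans into words; k is an assumed tuning parameter."""
    spans = merge_overlapping(connected_components(img))
    gaps = [b[0] - a[1] - 1 for a, b in zip(spans, spans[1:])]
    if not gaps:
        return spans, None
    # Assumed rule: a gap wider than mean + k * std separates two words.
    threshold = mean(gaps) + k * pstdev(gaps)
    words = [list(spans[0])]
    for gap, span in zip(gaps, spans[1:]):
        if gap > threshold:
            words.append(list(span))   # wide gap: start a new word
        else:
            words[-1][1] = span[1]     # narrow gap: extend the current word
    return [tuple(w) for w in words], threshold


def baseline_row(img):
    """Index of the row with the most ink pixels (peak of the horizontal projection)."""
    profile = [sum(row) for row in img]
    return profile.index(max(profile))
```

`segment_words` returns word spans as (first column, last column) pairs ordered left to right; since Arabic is written right to left, a caller would reverse the list to obtain reading order. Merging components by column overlap before measuring gaps is a design choice made here so that diacritic dots do not register as separate units.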

Keywords: Arabic OCR, off-line recognition, baseline estimation, word segmentation.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1333879

