Segmentation Problems and Solutions in Printed Degraded Gurmukhi Script

Authors: M. K. Jindal, G. S. Lehal, R. K. Sharma

Abstract:

Character segmentation is an important preprocessing step for text recognition. In degraded documents, touching characters drastically reduce the recognition rate of any optical character recognition (OCR) system. In this paper we propose a complete solution for segmenting touching characters in all three zones of printed Gurmukhi script. We study touching Gurmukhi characters and, after careful analysis, divide them into categories defined by the structural properties of the Gurmukhi characters. New algorithms are proposed to segment touching characters in the middle zone, upper zone and lower zone. These algorithms show a reasonable improvement in segmenting touching characters in degraded printed Gurmukhi script. The proposed algorithms are applicable only to machine-printed text. We also discuss a new and useful technique for segmenting horizontally overlapping lines.

Keywords: Character Segmentation, Middle Zone, Upper Zone, Lower Zone, Touching Characters, Horizontally Overlapping Lines.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1060978

