Urdu Nastaleeq Optical Character Recognition

Authors: Zaheer Ahmad, Jehanzeb Khan Orakzai, Inam Shamsher, Awais Adnan

Abstract:

This paper discusses the characteristics of the Urdu script and the Nastaleeq style, and presents a simple yet novel and robust technique for recognizing printed Urdu script without a lexicon. Urdu, which belongs to the Arabic script family, is cursive and inherently complex. The main difficulty of compound (connected) Urdu text lies not in the connections themselves but in the way a character changes shape depending on whether it occurs at the beginning, in the middle, or at the end of a word. The character recognition technique presented here exploits this inherent complexity of the script to solve the problem. A word is scanned and analyzed for its level of complexity; each point where the level of complexity changes is marked as delimiting a character, which is then segmented and fed to neural networks. A prototype of the system has been tested on Urdu text and currently achieves an average accuracy of 93.4%.
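
The segmentation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not define how the "level of complexity" is measured, so the sketch below assumes one plausible measure: the number of separate vertical ink runs in each pixel column of a binarized word image. The names `column_complexity` and `segment_by_complexity` are hypothetical and are not taken from the paper, and the neural network stage is not shown.

```python
from typing import List, Tuple

# A binarized word image: rows of 0 (background) and 1 (ink).
Image = List[List[int]]


def column_complexity(image: Image) -> List[int]:
    """Count separate vertical ink runs per column.

    This measure is an assumption made for illustration; the paper's
    actual definition of complexity is not given in the abstract.
    """
    if not image:
        return []
    height, width = len(image), len(image[0])
    levels = []
    for x in range(width):
        runs = 0
        previous = 0
        for y in range(height):
            pixel = image[y][x]
            # A new run starts wherever ink follows background.
            if pixel == 1 and previous == 0:
                runs += 1
            previous = pixel
        levels.append(runs)
    return levels


def segment_by_complexity(image: Image) -> List[Tuple[int, int]]:
    """Split the image into column spans (start, end), end exclusive.

    A new span begins wherever the complexity level changes.
    Spans with level 0 (no ink) are dropped.
    """
    levels = column_complexity(image)
    segments = []
    start = 0
    for x in range(1, len(levels) + 1):
        if x == len(levels) or levels[x] != levels[start]:
            if levels[start] > 0:
                segments.append((start, x))
            start = x
    return segments


word = [
    [0, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 1, 1, 0, 1],
]
print(column_complexity(word))      # [1, 1, 2, 2, 0, 1]
print(segment_by_complexity(word))  # [(0, 2), (2, 4), (5, 6)]
```

The spans are returned in left-to-right column order. Because Urdu is written right to left, a caller would likely reverse them before passing each cropped segment to a classifier.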

Keywords: Cursive Script, OCR, Urdu.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1055691

