Segmentation Free Nastalique Urdu OCR

Authors: Sobia T. Javed, Sarmad Hussain, Ameera Maqbool, Samia Asloob, Sehrish Jamil, Huma Moin

Abstract:

Much of the electronically available Urdu data exists only in image form, which is very difficult to process. The root cause of the problem is that Urdu data is largely available as printed text. For the rapid progress of the Urdu language, an OCR system is needed that can make Urdu data available to the common person. Research has been carried out for years to automate the recognition of Arabic and Urdu script. However, the biggest hurdle in the development of an Urdu OCR is recognizing Nastalique, the script taken as the standard for writing Urdu. Nastalique is written diagonally with no fixed baseline, which makes the script somewhat complex. Overlap occurs not only between characters but between ligatures as well. This paper proposes a method that allows successful recognition of Nastalique script.

Keywords: HMM, Image processing, Optical Character Recognition, Urdu OCR.
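The title and keywords point to an HMM-based approach that recognizes whole ligatures without first segmenting them into characters. The following is a minimal sketch of that general idea under stated assumptions; it is not the authors' system. Everything in it is hypothetical: the column ink-density feature, the discrete (rather than continuous) emission model, the hand-set toy models, and the names `column_symbols`, `DiscreteHMM`, and `recognize`. Each ligature class gets its own HMM, and a ligature image is assigned to the class whose model gives its column-feature sequence the highest forward-algorithm log-likelihood.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


def column_symbols(image: Sequence[Sequence[int]], n_bins: int = 3) -> List[int]:
    """Turn a binary ligature image (1 = ink) into one discrete symbol per column.

    Hypothetical feature: the column's ink count, quantized into n_bins levels.
    Urdu is read right to left, so columns are scanned from the rightmost one.
    The whole ligature is scanned as a unit; no character segmentation is done.
    """
    height = len(image)
    symbols = []
    for x in reversed(range(len(image[0]))):
        ink = sum(row[x] for row in image)
        symbols.append(min(n_bins - 1, ink * n_bins // height))
    return symbols


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


@dataclass
class DiscreteHMM:
    pi: List[float]        # initial state probabilities
    A: List[List[float]]   # A[i][j] = P(state j at t+1 | state i at t)
    B: List[List[float]]   # B[i][k] = P(symbol k | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """log P(obs | model), computed with the forward algorithm in log space."""
        if not obs:
            raise ValueError("empty observation sequence")
        n = len(self.pi)
        alpha = [_log(self.pi[i]) + _log(self.B[i][obs[0]]) for i in range(n)]
        for o in obs[1:]:
            alpha = [
                _logsumexp([alpha[i] + _log(self.A[i][j]) for i in range(n)])
                + _log(self.B[j][o])
                for j in range(n)
            ]
        return _logsumexp(alpha)


def recognize(image: Sequence[Sequence[int]], models: Dict[str, DiscreteHMM],
              n_bins: int = 3) -> str:
    """Return the label of the ligature model that best explains the image."""
    obs = column_symbols(image, n_bins)
    return max(models, key=lambda label: models[label].log_likelihood(obs))


if __name__ == "__main__":
    # Two toy 2-state left-to-right models with hand-set (untrained) parameters.
    A = [[0.6, 0.4], [0.0, 1.0]]
    models = {
        "thin": DiscreteHMM([1.0, 0.0], A, [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]),
        "thick": DiscreteHMM([1.0, 0.0], A, [[0.1, 0.2, 0.7], [0.1, 0.3, 0.6]]),
    }
    image = [
        [1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
    ]
    print(column_symbols(image))     # [0, 2, 2, 2]
    print(recognize(image, models))  # thick
```

Working in log space keeps the forward recursion from underflowing on wide ligatures. A practical recognizer would train one model per ligature class (for example with Baum-Welch re-estimation, as HMM toolkits such as HTK provide) and would likely use richer features than a single ink count per column.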

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1080342

