Identification of Printed Punjabi Words and English Numerals Using Gabor Features
Authors: Rajneesh Rani, Renu Dhir, G. S. Lehal
Abstract:
Script identification is one of the challenging steps in the development of an optical character recognition system for bilingual or multilingual documents. In this paper, an attempt is made to identify English numerals at the word level in Punjabi documents using Gabor features. A support vector machine (SVM) classifier with five-fold cross-validation is used to classify the word images. The results obtained are quite encouraging: the average accuracy with the RBF, polynomial, and linear kernel functions is greater than 99%.
Keywords: Script identification, Gabor features, support vector machines.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1084141
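The pipeline summarized in the abstract (Gabor features from word images, then an SVM evaluated with five-fold cross-validation under RBF, polynomial, and linear kernels) can be illustrated with a minimal sketch. This is not the authors' implementation: the filter size, wavelengths, orientations, the choice of mean and standard deviation as features, the feature scaling step, and the synthetic striped "word images" are all assumptions made for illustration, and the helper names `gabor_kernel`, `gabor_features`, and `synthetic_word` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def gabor_kernel(size, wavelength, theta, sigma):
    """Real (even) part of a Gabor filter: a Gaussian-windowed cosine."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * x_rot / wavelength)


def gabor_features(image, wavelengths=(4.0, 8.0), n_orientations=4):
    """Mean magnitude and standard deviation of each filter response.

    The filter bank and the two statistics are assumed, not taken from the paper.
    """
    feats = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kern = gabor_kernel(15, wavelength, theta, wavelength / 2.0)
            # Circular convolution via FFT; the kernel is zero-padded to the image size.
            spectrum = np.fft.fft2(image) * np.fft.fft2(kern, s=image.shape)
            response = np.real(np.fft.ifft2(spectrum))
            feats.extend([np.abs(response).mean(), response.std()])
    return np.array(feats)


def synthetic_word(rng, vertical):
    """Toy stand-in for a word image: noisy vertical or horizontal stripes."""
    rows, cols = np.mgrid[0:32, 0:64]
    coord = cols if vertical else rows
    stripes = (np.sin(2.0 * np.pi * coord / 8.0) > 0).astype(float)
    return stripes + 0.2 * rng.standard_normal(stripes.shape)


rng = np.random.default_rng(0)
labels = np.array([0, 1] * 20)  # two toy classes, 20 samples each
features = np.array(
    [gabor_features(synthetic_word(rng, bool(lbl))) for lbl in labels]
)

# Five-fold cross-validation for each of the three kernels named in the abstract.
scores = {}
for kernel in ("rbf", "poly", "linear"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores[kernel] = cross_val_score(model, features, labels, cv=5)

for kernel, fold_scores in scores.items():
    print(kernel, fold_scores.mean())
```

With real data, `synthetic_word` would be replaced by segmented word images labelled as Punjabi words or English numerals; the accuracies printed for the toy data say nothing about the results reported in the paper.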