Ultra High Speed Approach for Document Skew Detection and Correction Based On Centre of Gravity
Authors: Seyyed Yasser Hashemi
Abstract:
Skew detection and correction (SDC) directly affects the efficiency and accuracy of document segmentation and analysis, and is therefore an important step in document analysis. Skew is a major problem in document analysis for every language; for Arabic/Persian scripts it is more severe because of the special features of these languages. This paper proposes an efficient and fast algorithm for Document Skew Detection (DSD) based on segmentation and the Centre of Gravity (COG). The algorithm was tested on 150 Arabic/Persian and English documents, and SDC succeeded for 93 percent of them with an error of less than 1°. It gives better results for English documents than for Arabic/Persian documents. The proposed method also gives favorable results for handwritten, printed, and complex documents such as newspapers and journals, even at very low quality and resolution.
Keywords: Arabic/Persian document, Baseline, Centre of gravity, Document segmentation, Skew detection and correction.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1336586
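The abstract names segmentation and centre of gravity as the basis of the skew estimate but does not spell out the steps. The following Python sketch shows one plausible reading, not the author's published algorithm: the binary image is cut into vertical strips, the centre of gravity of the ink pixels in each strip is computed, and the skew angle is taken from a least-squares line through those points. The function names `strip_centroids` and `estimate_skew_degrees`, the strip count, and the line-fitting step are all assumptions made for illustration.

```python
import math


def strip_centroids(image, n_strips):
    """Centre of gravity (x, y) of ink pixels in each vertical strip.

    image: list of rows, each a list of 0/1 values (1 = ink).
    Strips with no ink are skipped. Hypothetical helper, not from the paper.
    """
    height = len(image)
    width = len(image[0])
    strip_w = width / n_strips
    centroids = []
    for s in range(n_strips):
        x0 = int(round(s * strip_w))
        x1 = int(round((s + 1) * strip_w))
        count = sum_x = sum_y = 0
        for y in range(height):
            row = image[y]
            for x in range(x0, x1):
                if row[x]:
                    count += 1
                    sum_x += x
                    sum_y += y
        if count:
            centroids.append((sum_x / count, sum_y / count))
    return centroids


def estimate_skew_degrees(image, n_strips=8):
    """Estimate skew as the angle of a least-squares line through strip COGs.

    Assumes the image holds a single text line or a uniform text block.
    The y axis grows downward, so a positive angle means the text
    descends from left to right on screen.
    """
    pts = strip_centroids(image, n_strips)
    if len(pts) < 2:
        return 0.0  # not enough ink to fit a line
    mean_x = sum(p[0] for p in pts) / len(pts)
    mean_y = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    if sxx == 0:
        return 0.0
    return math.degrees(math.atan(sxy / sxx))
```

Under these assumptions, correction would amount to rotating the image by the negative of the returned angle. Each pixel is visited once, so the cost is linear in the image size, which is consistent with the speed claim in the title, although the paper's actual procedure may differ.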