Fast Document Segmentation Using Contour and X-Y Cut Technique

Authors: Boontee Kruatrachue, Narongchai Moongfangklang, Kritawan Siriboon

Abstract:

This paper describes a fast and efficient method for page segmentation of documents containing nonrectangular blocks. The segmentation is based on an edge-following algorithm that uses a small window of 16 by 32 pixels. It is very fast because only the border pixels of each paragraph are visited, without scanning the whole page. However, the segmentation may contain errors when the space between adjacent blocks is smaller than the window used in edge following. This paper reduces such errors by first identifying the missed segmentation points using the direction information from edge following, and then applying an X-Y cut at each missed segmentation point to separate the connected columns. The advantage of the proposed method is the fast identification of missed segmentation points. The method is faster and has less overhead than algorithms that need to access many more pixels of a document.
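
The X-Y cut step mentioned in the abstract can be illustrated with a minimal sketch. This is a generic recursive X-Y cut on a binary image using projection profiles, not the authors' implementation: the function names (`projection`, `runs`, `xy_cut`), the `min_gap` parameter, and the toy `page` are all assumptions made for illustration. In the paper, the cut would be applied only to blocks flagged at missed segmentation points rather than to the whole page.

```python
def projection(img, top, left, bottom, right, axis):
    """Count ink pixels per column (axis=0) or per row (axis=1) inside a box."""
    if axis == 0:
        return [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]
    return [sum(img[r][c] for c in range(left, right)) for r in range(top, bottom)]


def runs(profile, min_gap):
    """Return (start, end) ink runs separated by at least min_gap empty bins."""
    spans, start, gap = [], None, 0
    for i, v in enumerate(profile):
        if v > 0:
            if start is None:
                start = i
            elif gap >= min_gap:
                spans.append((start, i - gap))  # close the run before the gap
                start = i
            gap = 0
        elif start is not None:
            gap += 1
    if start is not None:
        spans.append((start, len(profile) - gap))  # drop trailing empty bins
    return spans


def xy_cut(img, box=None, min_gap=2):
    """Recursively split a box (top, left, bottom, right; end-exclusive)."""
    if box is None:
        box = (0, 0, len(img), len(img[0]))
    top, left, bottom, right = box
    for axis in (0, 1):  # try vertical cuts (columns) first, then horizontal
        spans = runs(projection(img, top, left, bottom, right, axis), min_gap)
        if not spans:
            return []  # the box contains no ink
        if axis == 0:
            parts = [(top, left + a, bottom, left + b) for a, b in spans]
        else:
            parts = [(top + a, left, top + b, right) for a, b in spans]
        if len(parts) > 1 or parts[0] != box:
            out = []
            for part in parts:
                out.extend(xy_cut(img, part, min_gap))
            return out
    return [box]  # no further cut possible on either axis


# Hypothetical toy page: a left column with two paragraphs and a right column.
ink, gap = [1, 1, 1], [0, 0, 0]
page = [
    ink + gap + ink,
    ink + gap + ink,
    gap + gap + ink,
    gap + gap + ink,
    ink + gap + ink,
    ink + gap + ink,
]
blocks = xy_cut(page, min_gap=2)
print(blocks)
```

Here `min_gap` plays a role loosely analogous to the window size in edge following: a white gap narrower than `min_gap` bins is not treated as a separator, so blocks closer than that stay merged.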

Keywords: Contour Direction Technique, Missed Segmentation Points, Page Segmentation, Recursive X-Y Cut Technique

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1335518

