Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition

L. Hamsaveni; Navya Prakash; Suresha

Abstract:

Document Image Analysis recognizes text and graphics in documents acquired as images. This paper adopts an approach to degraded document image analysis that does not use Optical Character Recognition (OCR). The technique applies document imaging methods, namely Image Fusing and Speeded Up Robust Features (SURF) Detection, to identify and extract the degraded regions from a set of document images and thereby obtain an original document with complete information. If a captured degraded document image is skewed, it must first be straightened (deskewed) before further processing. The YCbCr image storage format is used as a tool to convert the grayscale image to RGB format. The presented algorithm is tested on various types of degraded documents, including printed documents, handwritten documents, old script documents and handwritten image sketches in documents. The purpose of this research is to obtain the original document from a given set of degraded documents of the same source.
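The abstract does not state the fusion rule, and the SURF-based detection of degraded regions and the deskewing step are not reproduced here. As a minimal sketch, assuming the copies are already deskewed, registered to the same pixel grid and loaded as 8-bit grayscale images, the snippet below swaps in a per-pixel median as a hypothetical fusion rule. It then converts the grayscale result to RGB through the YCbCr color space using the full-range ITU-R BT.601 (JFIF) coefficients. The function names fuse_median, ycbcr_to_rgb and gray_to_rgb are illustrative and are not taken from the paper.

```python
from statistics import median

# Hypothetical representation: a grayscale image as rows of 0-255 intensities.
Image = list[list[int]]


def fuse_median(images: list[Image]) -> Image:
    """Fuse aligned grayscale copies of the same page by per-pixel median.

    Hypothetical fusion rule (not the paper's): when fewer than half of the
    copies are degraded at a pixel and the undamaged copies agree, the median
    returns the undamaged value.
    """
    if not images:
        raise ValueError("need at least one image")
    h, w = len(images[0]), len(images[0][0])
    if any(len(img) != h or any(len(row) != w for row in img) for img in images):
        raise ValueError("images must be aligned and the same size")
    return [
        [int(median(img[y][x] for img in images)) for x in range(w)]
        for y in range(h)
    ]


def clamp(v: float) -> int:
    """Round and clip a channel value to the 8-bit range."""
    return max(0, min(255, round(v)))


def ycbcr_to_rgb(y: float, cb: float, cr: float) -> tuple[int, int, int]:
    """Full-range ITU-R BT.601 (JFIF) YCbCr -> RGB conversion."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


def gray_to_rgb(gray: Image) -> list[list[tuple[int, int, int]]]:
    """Treat each gray level as luma Y with neutral chroma (Cb = Cr = 128),
    then convert YCbCr -> RGB."""
    return [[ycbcr_to_rgb(p, 128, 128) for p in row] for row in gray]


if __name__ == "__main__":
    copies = [
        [[255, 0, 255]],    # clean copy: white, black text pixel, white
        [[255, 0, 40]],     # stain on the last pixel
        [[255, 255, 255]],  # faded text in the middle pixel
    ]
    fused = fuse_median(copies)
    print(fused)               # [[255, 0, 255]]
    print(gray_to_rgb(fused))  # [[(255, 255, 255), (0, 0, 0), (255, 255, 255)]]
```

Because a grayscale pixel carries only luma, its chroma is neutral (Cb = Cr = 128) and the conversion yields R = G = B = Y. A median fusion also needs at least three copies before a single degraded copy can be outvoted at a given pixel.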

Keywords: Grayscale image format, image fusing, SURF detection, YCbCr image format.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1128989
