A Study of the Variability of Very Low Resolution Characters and the Feasibility of Their Discrimination Using Geometrical Features

Authors: Farshideh Einsele, Rolf Ingold

Abstract:

Current OCR technology cannot accurately recognize small text images, such as those found in web images. Our goal is to investigate new approaches to recognizing very low resolution text images containing antialiased character shapes. This paper presents a preliminary study of the variability of such characters and the feasibility of discriminating them using geometrical features. In the first stage, we analyze the distribution of these features. In the second stage, we study their discriminative power for recognizing isolated characters under various rendering methods and font properties. Finally, we present the results of our evaluation tests, which lead to our conclusions and future focus.
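To make the notion of feature variability concrete, the following minimal sketch computes a few intensity-weighted geometrical features (centroid and second-order central moments) for a tiny antialiased glyph. The abstract does not state which geometrical features the authors use, so the choice of moments, the function name `geometric_features`, and the two toy bitmaps are assumptions made purely for illustration.

```python
# Illustrative sketch only: the feature set below (centroid and central
# moments) is an assumption, not the feature set used in the paper.

def geometric_features(glyph):
    """Return intensity-weighted geometric features of a grayscale glyph.

    glyph: list of rows; each value is ink coverage in [0, 1], so
    antialiased (partially covered) pixels contribute fractionally.
    """
    mass = sum(v for row in glyph for v in row)
    if mass == 0:
        raise ValueError("glyph contains no ink")

    # Centroid: coverage-weighted mean of pixel coordinates.
    cx = sum(x * v for row in glyph for x, v in enumerate(row)) / mass
    cy = sum(y * v for y, row in enumerate(glyph) for v in row) / mass

    # Second-order central moments, normalized by total coverage.
    mu20 = sum((x - cx) ** 2 * v
               for row in glyph for x, v in enumerate(row)) / mass
    mu02 = sum((y - cy) ** 2 * v
               for y, row in enumerate(glyph) for v in row) / mass
    mu11 = sum((x - cx) * (y - cy) * v
               for y, row in enumerate(glyph)
               for x, v in enumerate(row)) / mass

    return {"mass": mass, "cx": cx, "cy": cy,
            "mu20": mu20, "mu02": mu02, "mu11": mu11}


# Hypothetical example: the same vertical stroke rendered at two
# sub-pixel offsets. Antialiasing spreads the ink over two columns
# in the second rendering.
stroke_aligned = [[0.0, 1.0, 0.0]] * 3
stroke_shifted = [[0.0, 0.5, 0.5]] * 3

features_aligned = geometric_features(stroke_aligned)
features_shifted = geometric_features(stroke_shifted)

if __name__ == "__main__":
    print(features_aligned)
    print(features_shifted)
```

In this toy case the total ink coverage is identical for both renderings, while the horizontal centroid and horizontal spread differ. This is the kind of rendering-dependent variation that a distribution analysis of such features would need to account for.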

Keywords: World Wide Web, optical character recognition, document analysis, pattern recognition

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1071126

