Abstract
This article reviews the history and state of the art of optical character recognition systems, such as ABBYY FineReader, Tesseract, and CuneiForm, with particular attention to their internal algorithms, including page layout analysis, page segmentation, and document skew angle estimation. The overview describes methods proposed over the last 30 years and compares them in terms of speed and versatility. A critical analysis of the status of the field and a discussion of open problems are also provided.
Original language | English |
---|---|
Pages (from-to) | 441-452 |
Number of pages | 12 |
Journal | Computer Optics |
Volume | 41 |
Issue number | 3 |
DOIs | |
Publication status | Published - 2017 |
Keywords
- OCR
- Page layout analysis
- Skew detection
- Text segmentation
ASJC Scopus subject areas
- Atomic and Molecular Physics, and Optics
- Computer Science Applications
- Electrical and Electronic Engineering