Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Source – Our Search for the Best OCR Tool, and What We Found

Source is an OpenNews project designed to amplify the impact of journalism code and the community of developers, designers, journalists, and editors who make it.

Our Search for the Best OCR Tool, and What We Found: A side-by-side comparison of seven OCR tools using multiple kinds of documents, from Factful – “There are a lot of OCR options available. Some are easy to use, some require a bit of programming to make them work, and some require a lot of programming. Some are quite expensive, some are free and open source. We selected several documents (two easy-to-read reports, a receipt, an historical document, a legal filing with a lot of redaction, a filled-in disclosure form, and a water-damaged page) to run through the OCR engines we are most interested in. We tested three free and open source options (Calamari, OCRopus and Tesseract) as well as one desktop app (Adobe Acrobat Pro) and three cloud services (Abbyy Cloud, Google Cloud Vision, and Microsoft Azure Computer Vision). All the scripts we used, as well as the complete output from each OCR engine, are available on GitHub. You can use the scripts to check our work, or to run your own documents against any of the clients we tested…”
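The excerpt does not say how Factful scored the engines against each other, so what follows is only a hypothetical sketch of one common way to compare OCR output side by side: character error rate (CER), the edit distance between an engine's output and a hand-checked transcription, divided by the transcription's length. The function names, engine labels, and sample text are illustrative assumptions, not taken from Factful's scripts.

```python
# Hypothetical scoring helpers: an illustration, NOT the method used in Factful's scripts.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance divided by reference length; 0.0 means a perfect match."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)


def rank_engines(outputs: dict[str, str], reference: str) -> list[tuple[str, float]]:
    """Sort engines from lowest (best) to highest character error rate."""
    scores = {name: character_error_rate(text, reference) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    reference = "Total due: $12.50"  # hypothetical hand-checked receipt line
    outputs = {
        "engine_a": "Tota1 due: $12.50",  # one misread character
        "engine_b": "Total due: $12.50",  # exact match
    }
    for name, cer in rank_engines(outputs, reference):
        print(f"{name}: CER = {cer:.3f}")
```

Plain Levenshtein distance weights every character equally, so a misread digit on a receipt costs the same as a dropped comma; a fuller comparison might also normalize whitespace or score at the word level.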
