Transkribus and the Altona Case Team
This post first appeared on the blog for the Centre for Privacy Studies:

In the Altona Case Team at PRIVACY, we are working with two versions of a late 18th-century text by Johann Peter Willebrand. The text appears in French as Abrégé de la police, accompagné de réflexions sur l'accroissement des villes and in German as Innbegriff der Policey: nebst Betrachtungen über das Wachsthum der Städte. To make our lives easier, our team decided to run the PDFs through OCR so that we would have searchable and editable texts to work with. However, accuracy differed hugely from one OCR tool to another.

We started with the French version of the text, which we downloaded in PDF format from Google Books. First, we tried ABBYY FineReader. This is a very good (proprietary) app for running OCR on scanned text written in modern languages, but when dealing with our early modern material, the results were far from acceptable. Ne...