This repo documents experiments with Optical Character Recognition (OCR) technology carried out across the Congruence Engine project. It includes the Google Colab notebooks and pipelines the project used to run OCR tools, mostly to extract raw text for subsequent analysis, as well as for post-processing.
Max Long: Investigation, Data curation, Formal analysis, Methodology, Writing
Natasha Kitcher: Investigation, Data curation, Formal analysis, Methodology, Writing
Daniel Belteki: Investigation, Data curation, Formal analysis, Methodology, Writing
Alex Butterworth: Investigation, Methodology
Nayomi Kasthuri Arachchi: Software
Felix Needham-Simpson: Software
ABBYY, Surya, Tesseract, LLM vision models
Google Colab notebooks
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).