Challenges for automated OCR/HWR postprocessing

OCR results, especially for languages other than English, still contain many mistakes. One option to reduce these errors is to train better models for specific use cases, but this is often not possible. Automated postprocessing of the plain text generated by the OCR tool is another option, and several researchers are actively exploring the role that AI / ML can play in this process.

The code for text postprocessing provided in this repository covers the following two use cases (a minimal sketch of both follows the list below):

Test-case 1: using a Large Language Model (LLM) for the correction of spelling mistakes / misidentified characters

Test-case 2: using a Large Language Model (LLM) for Named Entity Recognition (NER)
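
Both test cases boil down to prompting a model with the OCR output plus a task description. Below is a minimal, backend-agnostic sketch: `query_llm` is a hypothetical placeholder for whichever model interface is used (GPT4All, a Hugging Face model, or a hosted API), and the prompt wording is illustrative, not the repository's exact prompts.

```python
# Minimal sketch of the two test cases. `query_llm` is a placeholder for
# whichever model backend is plugged in (GPT4All, Hugging Face, hosted API).

def query_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM and return its reply."""
    raise NotImplementedError("plug in your model backend here")

def correct_ocr_text(ocr_text: str) -> str:
    # Test-case 1: fix spelling mistakes / misidentified characters
    # without otherwise rewriting the content.
    prompt = (
        "Correct the spelling mistakes and misrecognized characters in the "
        "following OCR output. Return only the corrected text.\n\n" + ocr_text
    )
    return query_llm(prompt)

def extract_entities(text: str) -> str:
    # Test-case 2: Named Entity Recognition via prompting.
    prompt = (
        "List all named entities (persons, places, organizations, dates) "
        "in the following text, one per line:\n\n" + text
    )
    return query_llm(prompt)
```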

Challenges of working with LLMs via (free-tier) coding environments

LLMs can be accessed via graphical user interfaces (GUIs) on the provider websites or via an application programming interface (API). Most providers enforce strict rate limits and charge for advanced data processing. Many popular models accept only around 2,000 tokens per request, so longer inputs must be truncated or split into chunks (see the sketch below). In addition, API calls (which rely on secret API keys) should only be made from stable and secure coding environments.
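
A common workaround for per-request token limits is to split the input into chunks before sending it to the model. The sketch below approximates token counts by whitespace-separated words, since exact counts depend on each model's tokenizer; the 2,000-token limit, the safety margin, and the `MY_API_KEY` variable name are all illustrative. It also shows the usual way of keeping secret keys out of the source code.

```python
import os

# Never hard-code secret keys; read them from the environment instead.
api_key = os.environ.get("MY_API_KEY")  # "MY_API_KEY" is a placeholder name

def chunk_text(text: str, max_tokens: int = 2000, margin: int = 200) -> list[str]:
    """Split text into chunks of at most (max_tokens - margin) words.

    Word counts only approximate real token counts, which depend on the
    model's tokenizer; the default values are illustrative.
    """
    budget = max_tokens - margin      # leave headroom for prompt and reply
    words = text.split()
    chunks, current = [], []
    for word in words:
        current.append(word)
        if len(current) >= budget:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks
```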

When trying to access LLMs in environments like Google Colab, users often encounter installation issues. Some models are “gated” and require authentication via the Hugging Face platform. There are also challenges related to the formats in which models are distributed: GPT4All, for instance, only supports .gguf models, while many models are still provided as .bin files, which causes load errors (a loading sketch follows below). Popular models are also occasionally removed from an AI provider’s official repository, forcing users to fetch models manually or find alternatives.
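
For reference, loading a local .gguf file with the gpt4all Python bindings looks roughly like this; the model file name is only an example, and passing a .bin file instead will fail.

```python
# Sketch of loading a local GGUF model with the gpt4all Python bindings.
# The file name below is an example from the GPT4All catalog; any .gguf
# file works, whereas .bin formats are rejected.

from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # must be a .gguf file

with model.chat_session():
    reply = model.generate(
        "Correct the spelling in: 'Thc quick brwn fox'",
        max_tokens=100,
    )
    print(reply)
```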

Another challenge is computational speed. Google Colab, for instance, grants free users only limited computational resources (GPU access is restricted and not guaranteed), which makes accessing LLMs via code in Colab considerably slower than working with an AI on its official cloud instance. Python is also not always the fastest way to work with AI / LLMs: the GPT4All desktop GUI ships with optimized C++ inference, whereas Python scripts typically add interpreter overhead on top of the model backend. Python users should therefore try the smaller models first.
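
A quick way to decide on a model size up front is to check whether the current runtime offers a GPU at all. The sketch below uses PyTorch for the check; the model names are placeholders, not real model identifiers.

```python
# Check whether the runtime offers GPU acceleration before choosing a
# model size; on a CPU-only Colab session, smaller models are advisable.

import torch

if torch.cuda.is_available():
    device = "cuda"
    model_name = "a-larger-model"    # placeholder name, not a real model
else:
    device = "cpu"
    model_name = "a-smaller-model"   # placeholder name, not a real model

print(f"Running on {device}; selected {model_name}")
```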

Comparisons between API and GUI access

To test the performance differences between API access and GUI access, here are links to the official web interfaces, online demos, or installers of some popular AI tools:

Using the model via Hugging Face requires a Hugging Face account. The details are explained in the following Medium article: Chat with Llama-2 (7B) from HuggingFace (Llama-2-7b-chat-hf). A minimal access sketch follows below.
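
For completeness, here is a minimal sketch of accessing the gated Llama-2 chat model programmatically. It assumes a Hugging Face account, a personal access token, and prior approval of Meta's license for the meta-llama repository; note that downloading the 7B weights takes several gigabytes of memory and is slow on free-tier hardware.

```python
# Sketch of accessing the gated Llama-2 chat model through Hugging Face.
# Requires a HF account, an access token, and approval of Meta's license
# for the meta-llama repository.

from huggingface_hub import login
from transformers import pipeline

login(token="hf_...")  # personal access token from your HF account settings

chat = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")
result = chat(
    "Correct the OCR errors in: 'Thc Frcnch Rcvolution'",
    max_new_tokens=100,
)
print(result[0]["generated_text"])
```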