OCRopus check groundtruth

When training a character model for OCRopus you need a good selection of ground truth data for training and testing. To be able to recognize a certain character, it must be included in the training data. Otherwise the neural network has no chance to learn the character’s appearance. Although not strictly required, it is also a good idea to include every possible character at least once in the ground truth data used for testing.
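A minimal sketch of such a check, assuming the ground truth text is already available as plain lists of strings (one string per text line). The function names `char_counts` and `missing_chars` are made up for this illustration and are not part of OCRopus.

```python
from collections import Counter


def char_counts(lines):
    """Count every non-whitespace character in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(ch for ch in line if not ch.isspace())
    return counts


def missing_chars(train_lines, test_lines):
    """Return the characters that occur in the test ground truth
    but never in the training ground truth, sorted by code point."""
    train = char_counts(train_lines)
    test = char_counts(test_lines)
    return sorted(set(test) - set(train))


if __name__ == "__main__":
    # Hypothetical sample data, only for illustration.
    train = ["Haus", "Maus"]
    test = ["Häuser"]
    print(missing_chars(train, test))  # ['e', 'r', 'ä']
```

Any character reported here cannot be recognized by a model trained on the training set alone, so either the training data needs more samples or the test result for these characters should be read with care.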

Testing OCRopus character models

After you have trained an OCRopus character model or selected an existing one, you want to measure its character recognition accuracy. To measure it you need ground truth data (images and text) that has not been used in the training of the model. If you used images that were part of the training process, you might only measure an overfitting of the model (i.e., the character model ‘knows’ the solution for exactly this image).
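One common way to express recognition accuracy is the character error rate: the edit distance between recognized text and ground truth, divided by the number of ground truth characters. The following is only a sketch under that assumption, with hypothetical helper names (`edit_distance`, `character_error_rate`); it works on pairs of strings and does not call any OCRopus tool.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimal number of insertions, deletions
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def character_error_rate(pairs):
    """pairs: iterable of (ground_truth, recognized) strings.
    Returns total edit distance divided by total ground truth length."""
    errors = 0
    total = 0
    for truth, recognized in pairs:
        errors += edit_distance(truth, recognized)
        total += len(truth)
    return errors / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical held-out line and OCR output, only for illustration.
    print(character_error_rate([("abcd", "abxd")]))  # 0.25
```

The pairs passed in should come only from lines that were held out of training; otherwise the number mostly reflects how well the model has memorized its training images.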

Ready2Power Multimedia 3D All in One

A couple of days ago I bought an interesting piece of hardware: “Ready2Power Multimedia 3D All in One” - a pair of VR glasses with a built-in computer and monitor. My main motivation for buying the device is my 3D photo camera. I have many 3D photographs and 3D films but no device to watch them properly. It is very difficult to find any information on the glasses online. Therefore, I decided to write some blog posts about the device. Even the user’s manual is difficult to find. It is available in an English and a German version.

Optimizing Binarization for OCRopus

In many hours of work and frustration I have learned that page segmentation and character models have a strong influence on the result of OCR. However, I always underestimated the effects of the initial step – the binarization.

Cleaning up font chaos on Ubuntu

Have you also been annoyed that the list of fonts on Ubuntu is so incredibly cluttered because it contains many fonts that you never need in Europe? Here is an example from LibreOffice when selecting the font “Noto”.