Benjamin Smith analyzes a text:
Since the text that I’m using has two columns per page, the text will need to be cropped by columns before OCR is applied. Prior to that, the [pages need to be converted to] .png format.
Read on to see the code for the entire process, using the tidyverse, magick, and tesseract packages.
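
For a rough idea of what the column-cropping step can look like, here is a minimal sketch using magick and tesseract. It is not Benjamin's code. It assumes the page is already a .png image and that the two columns split at the horizontal midpoint, which may not hold for every scan. The helper names column_geometries and ocr_two_columns, and the file name page_01.png, are hypothetical.

```r
# Sketch only: assumes a page image already in .png format whose two columns
# split at the horizontal midpoint. "page_01.png" is a hypothetical file name.

# Build ImageMagick geometry strings ("WIDTHxHEIGHT+X+Y") for the two halves.
column_geometries <- function(width, height) {
  width <- as.integer(width)
  height <- as.integer(height)
  half <- width %/% 2L
  c(
    left  = sprintf("%dx%d+0+0", half, height),
    right = sprintf("%dx%d+%d+0", width - half, height, half)
  )
}

# Crop one page into its columns and OCR them in reading order (left, right).
ocr_two_columns <- function(path, language = "eng") {
  page <- magick::image_read(path)
  info <- magick::image_info(page)
  engine <- tesseract::tesseract(language)

  texts <- vapply(column_geometries(info$width, info$height), function(geom) {
    column <- magick::image_crop(page, geom)
    # Write each column to a temporary .png and pass the file path to ocr().
    tmp <- tempfile(fileext = ".png")
    on.exit(unlink(tmp), add = TRUE)
    magick::image_write(column, path = tmp, format = "png")
    tesseract::ocr(tmp, engine = engine)
  }, character(1))

  paste(texts, collapse = "\n")
}

# Usage (uncomment once the packages are installed and the file exists):
# cat(ocr_two_columns("page_01.png"))
```

If the gutter between columns is off-center, the midpoint split in column_geometries would need to be replaced with offsets measured from the actual page.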