OCR With Tesseract

Amuda Adelou shows how to use Tesseract’s Java API to perform character recognition in images:

Extracting text from a flowchart image means processing the image to extract its text components and then its geometrical shape components. The text components are extracted along with the geometrical components. The internal relationships between the components are established by tracing the flow lines that connect them. The extracted components are written out as machine-readable metadata in XML format. This metadata can be archived, stored in a knowledge base, or shared with others.
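To make the output step a little more concrete, here is a minimal sketch of how extracted components and their flow-line relationships might be written out as XML. It is not the code from the linked article: the `Component` and `Connection` classes, the element and attribute names, and the sample values are all hypothetical, and the text fields simply stand in for whatever the OCR step would return.

```java
import java.util.List;

public class FlowchartMetadata {

    // Hypothetical model: one extracted component (a shape plus its recognized text).
    public static final class Component {
        public final String id;
        public final String shape;
        public final String text;

        public Component(String id, String shape, String text) {
            this.id = id;
            this.shape = shape;
            this.text = text;
        }
    }

    // Hypothetical model: a flow line traced between two components.
    public static final class Connection {
        public final String from;
        public final String to;

        public Connection(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Serialize components and connections into a small XML document.
    // The <flowchart>, <component>, and <connection> names are assumptions.
    public static String toXml(List<Component> components, List<Connection> connections) {
        StringBuilder sb = new StringBuilder();
        sb.append("<flowchart>\n");
        for (Component c : components) {
            sb.append("  <component id=\"").append(escape(c.id))
              .append("\" shape=\"").append(escape(c.shape)).append("\">")
              .append(escape(c.text))
              .append("</component>\n");
        }
        for (Connection k : connections) {
            sb.append("  <connection from=\"").append(escape(k.from))
              .append("\" to=\"").append(escape(k.to)).append("\"/>\n");
        }
        sb.append("</flowchart>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample data standing in for the results of OCR and shape detection.
        List<Component> components = List.of(
                new Component("c1", "ellipse", "Start"),
                new Component("c2", "diamond", "x < 10?"));
        List<Connection> connections = List.of(new Connection("c1", "c2"));
        System.out.print(toXml(components, connections));
    }
}
```

Keeping the connections as separate elements that refer to component ids is one way to record the relationships found by tracing flow lines; nesting or a different schema would work just as well.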

Click through for a demo app and code.

