OCR With Tesseract

Amuda Adelou shows how to use Tesseract’s Java API to perform character recognition in images:

Extracting text from a flowchart image means processing the image to extract its text components and then its geometric shape components. The text components are extracted together with the geometric components. The internal relationship between the components is established by tracing the flow lines that connect them. The extracted components are written out as machine-readable metadata in XML format, which can be archived, stored in a knowledge base, or shared with others.
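To make the last step of that description concrete, here is a minimal sketch of writing already-extracted components and their connecting flow lines to XML. It uses only the JDK's built-in XML classes and assumes the recognition step has already run. The class name `FlowchartMetadata`, the `Component` and `Connection` types, and the element and attribute names are all hypothetical, chosen for illustration; they are not the schema or code from the linked article.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: names and XML layout are assumptions, not the article's schema.
class FlowchartMetadata {

    // One extracted component: a piece of text or a geometric shape.
    static final class Component {
        final String id, type, label;
        Component(String id, String type, String label) {
            this.id = id;
            this.type = type;
            this.label = label;
        }
    }

    // One traced flow line linking two components by id.
    static final class Connection {
        final String from, to;
        Connection(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    // Builds a DOM tree from the components and connections and serializes it.
    static String toXml(List<Component> components, List<Connection> connections)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("flowchart");
        doc.appendChild(root);

        for (Component c : components) {
            Element e = doc.createElement("component");
            e.setAttribute("id", c.id);
            e.setAttribute("type", c.type);
            e.setTextContent(c.label);
            root.appendChild(e);
        }
        for (Connection c : connections) {
            Element e = doc.createElement("connection");
            e.setAttribute("from", c.from);
            e.setAttribute("to", c.to);
            root.appendChild(e);
        }

        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample data standing in for real recognition output.
        List<Component> components = Arrays.asList(
                new Component("c1", "text", "Start"),
                new Component("c2", "rectangle", "Process order"));
        List<Connection> connections = Arrays.asList(new Connection("c1", "c2"));
        System.out.println(toXml(components, connections));
    }
}
```

Keeping connections as separate elements that refer to component ids, rather than nesting components inside one another, is one way to represent flow lines that branch or loop back; a real schema may well differ.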

Click through for a demo app and code.

