Image Processing In U-SQL

Kevin Feasel

2016-08-19

U-SQL

Rukmani Gopalan and Apostolos Lerios show how to perform image processing using U-SQL:

We have published C# libraries that supply UDOs and UDFs for processing images with U-SQL in our GitHub site. In this section, we introduce these UDOs and UDFs and, in the next section, we use them within a U-SQL walkthrough to operate on images.

The basic flow behind processing images in U-SQL has three stages:

  1. Use the custom UDO extractor ImageExtractor to read a (JPEG or non-JPEG) image file and return the image data as a byte[] column value which contains the same exact image as the file in an (always) JPEG representation. Please note that there is a current limitation in U-SQL that a row cannot exceed a size of 4 MB, so you will run into issues if your image size is greater than 4 MB.

  2. Use the image processing UDFs to manipulate this byte[] (the UDFs support JPEG and non-JPEG representations within this byte[] despite the previous step always producing a JPEG representation). For example, one UDF extracts metadata from an image to produce textual or numeric data. More interesting UDFs derive an output image from an input image; that output represents the visually transformed input (e.g. rotated or scaled/resized), also stored as a byte[] containing an (always) JPEG representation of the output.

  3. Use the custom UDO outputter ImageOutputter to writes each byte[] to a JPEG image file so that we can view the output images of the aforementioned UDFs.

The major value proposition to me for U-SQL is “doing stuff SQL can’t do very well.”  This is one of those cases.

Related Posts

Reusing U-SQL Scripts

Kevin Feasel

2017-12-20

U-SQL

Matthew Hicks shows how to use Powershell to parameterize U-SQL scripts: You can use this feature either via Azure Cloud Shell or on a Windows machine with Azure PowerShell installed. When submitting, simply construct a hashtable of U-SQL variable names to values and pass it in using the -ScriptParameter cmdlet parameter. The .NET type of each value in the hashtable is used when defining […]

Read More

Fetching U-SQL Job Input And Output Paths

Matthew Hicks shows how to retrieve information on U-SQL input and output paths using Powershell: Each time you submit a U-SQL job, a job folder is created in your Azure Data Lake Store account. This folder contains useful debugging information about the job, including a file called the U-SQL algebra file. This is an XML […]

Read More

Categories

August 2016
MTWTFSS
« Jul Sep »
1234567
891011121314
15161718192021
22232425262728
293031