Image Processing In U-SQL

Kevin Feasel



Rukmani Gopalan and Apostolos Lerios show how to perform image processing using U-SQL:

We have published C# libraries that supply UDOs and UDFs for processing images with U-SQL in our GitHub site. In this section, we introduce these UDOs and UDFs and, in the next section, we use them within a U-SQL walkthrough to operate on images.

The basic flow behind processing images in U-SQL has three stages:

  1. Use the custom UDO extractor ImageExtractor to read a (JPEG or non-JPEG) image file and return the image data as a byte[] column value which contains the same exact image as the file in an (always) JPEG representation. Please note that there is a current limitation in U-SQL that a row cannot exceed a size of 4 MB, so you will run into issues if your image size is greater than 4 MB.

  2. Use the image processing UDFs to manipulate this byte[] (the UDFs support JPEG and non-JPEG representations within this byte[] despite the previous step always producing a JPEG representation). For example, one UDF extracts metadata from an image to produce textual or numeric data. More interesting UDFs derive an output image from an input image; that output represents the visually transformed input (e.g. rotated or scaled/resized), also stored as a byte[] containing an (always) JPEG representation of the output.

  3. Use the custom UDO outputter ImageOutputter to write each byte[] to a JPEG image file so that we can view the output images of the aforementioned UDFs.
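
The three stages above can be sketched as a single U-SQL script. This is a rough illustration rather than the authors' actual walkthrough: only the names ImageExtractor and ImageOutputter come from the text. The assembly name `Images`, the `Images.ImageOps.ScaleImageTo` UDF, its parameters, and both file paths are assumptions made for the example, so check the published libraries for the real names and signatures.

```usql
// Hypothetical sketch. The assembly name, namespace, UDF name, and paths
// are assumptions for illustration, not the confirmed library API.
REFERENCE ASSEMBLY Images;

// Stage 1: read one image file into a single byte[] column.
// The whole image lives in one row, so it must fit under the 4 MB row limit.
@image_data =
    EXTRACT image_data byte[]
    FROM @"/Samples/Data/input.jpg"
    USING new Images.ImageExtractor();

// Stage 2: apply an image UDF to the byte[] column.
// ScaleImageTo(bytes, width, height) is a hypothetical resize UDF that
// returns a new byte[] holding the transformed image.
@thumbnail =
    SELECT Images.ImageOps.ScaleImageTo(image_data, 150, 150) AS thumbnail
    FROM @image_data;

// Stage 3: write the resulting byte[] out as a JPEG file.
OUTPUT @thumbnail
TO @"/Samples/Output/thumbnail.jpg"
USING new Images.ImageOutputter();
```

Because each image travels through the script as an ordinary byte[] column, the UDF call in stage 2 can be swapped for a metadata-extracting UDF that returns text or numbers instead, in which case the result would be written with a standard text outputter rather than ImageOutputter.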

The major value proposition to me for U-SQL is “doing stuff SQL can’t do very well.”  This is one of those cases.


August 2016