Polybase – Curated SQL

Creating Parquet Files in SQL Server 2022 via PolyBase

Published 2025-03-03 by Kevin Feasel

Deb Melkin uses a favorite feature of mine:

Have you ever wanted to know how to create a parquet file directly from your SQL Server 2022 instance? Did you even know that it was an option?

Setting this up sounds like it would be really complicated, but it’s much simpler than it seems. Join me as I show you how:

Click through for the video.

Read Local Files in SQL Server with PolyBase and MinIO

Published 2025-01-29 by Kevin Feasel

I have a new video:

In this video, I demonstrate how we can use PolyBase and MinIO to read files on a local machine in SQL Server.

This is one of the reasons I’m really happy that SQL Server introduced access to AWS S3 and S3-compatible storage with PolyBase in SQL Server 2022. The results are definitely slower than if you had direct file access, but it is possible.
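
As a rough sketch of what that setup can look like (not the exact script from the video), the pattern is a credential, an external data source pointing at the S3-compatible endpoint, and a query over a file in a bucket. The names MinioCredential, MinioSource, and demo-bucket, the port, and the Parquet file are all assumptions for illustration. This also assumes PolyBase is installed and enabled, that the database already has a master key, and, as I understand the requirements, that the MinIO endpoint is served over TLS.

```sql
-- Hypothetical names throughout; swap in your own keys, host, and bucket.
CREATE DATABASE SCOPED CREDENTIAL MinioCredential
WITH IDENTITY = 'S3 Access Key',               -- fixed identity string for S3-compatible storage
     SECRET = '<access key>:<secret key>';     -- MinIO access key and secret key, colon-separated

-- The s3:// prefix tells PolyBase to use the S3-compatible connector.
CREATE EXTERNAL DATA SOURCE MinioSource
WITH (
    LOCATION = 's3://localhost:9000',          -- assumed local MinIO endpoint
    CREDENTIAL = MinioCredential
);

-- Ad hoc read of a Parquet file stored in an assumed bucket named demo-bucket.
SELECT TOP (10) *
FROM OPENROWSET(
    BULK '/demo-bucket/people.parquet',
    FORMAT = 'PARQUET',
    DATA_SOURCE = 'MinioSource'
) AS p;
```

For delimited files, the same data source works with an external file format and an external table instead of OPENROWSET.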

Using PolyBase for Archiving Data

Published 2025-01-23 by Kevin Feasel

Andy Yun is speaking my language:

One of SQL Server 2022’s new features is something called Data Virtualization. It enables T-SQL to directly query files that reside in Azure object storage or S3-compatible object storage. In my opinion, since SQL Server 2022’s release, it’s one of those underrated capabilities that I think many have glossed over. But I strongly believe that it is insanely useful and you should take a few minutes to learn more!

Read on to learn more. Also, Andy mentions using PolyBase with local S3-compatible storage. As a spoiler, I have a video coming out on January 28th that covers exactly that topic, though without the benefit of snappy all-flash storage arrays.

Ingesting Blob Storage Data into SQL Server

Published 2024-10-16 by Kevin Feasel

Andy Brownsword brings in some data:

We may associate consuming data from Azure Storage with tools like Data Factory or even SSIS as we saw recently. We don’t always need the middle man though.

Here we’ll demonstrate how to use an External Data Source to perform the ingestion directly into SQL Server.

Click through for the solution. As a quick note, the TYPE attribute that Andy uses in CREATE EXTERNAL DATA SOURCE was necessary from SQL Server 2016 through SQL Server 2019, but no longer exists for SQL Server 2022. Instead, for SQL Server 2022, you’d switch the LOCATION to start with abs:// for Azure Blob Storage and PolyBase would infer the type from the protocol.
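
To make the SQL Server 2022 form concrete, here is a minimal sketch under stated assumptions; it is not Andy's script. The names BlobSasCredential, BlobSource, and dbo.Orders, the container and storage account placeholders, and the Parquet file path are all hypothetical, and the database is assumed to already have a master key.

```sql
-- Hypothetical names; the SAS token needs at least read and list permissions.
CREATE DATABASE SCOPED CREDENTIAL BlobSasCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token, without the leading ?>';

-- No TYPE attribute: the abs:// prefix identifies Azure Blob Storage.
CREATE EXTERNAL DATA SOURCE BlobSource
WITH (
    LOCATION = 'abs://<container>@<storage account>.blob.core.windows.net',
    CREDENTIAL = BlobSasCredential
);

-- Ingest straight into an existing local table (dbo.Orders is assumed to exist
-- with matching columns).
INSERT INTO dbo.Orders (OrderID, OrderDate, Amount)
SELECT OrderID, OrderDate, Amount
FROM OPENROWSET(
    BULK '/incoming/orders.parquet',
    FORMAT = 'PARQUET',
    DATA_SOURCE = 'BlobSource'
) AS src;
```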

The Joy of Partitioned Views

Published 2024-05-16 by Kevin Feasel

Rod Edwards talks partitioned views:

This post came around when I was at a loose end one evening, and just started poking at a local sandpit database, and it got me reminiscing and revisiting / testing a few things. The devil makes work for idle thumbs and all that…

Partitioned Views…do they have a place in society anymore?

Rod does a great job of following Betteridge’s Law of Headlines, as well as saving the ‘Yes’ answer for the post itself. Partitioned views come with their own pains, though one use case Rod did not include is using PolyBase and partitioned views to move “cold” data to slower external storage.
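
Here is a rough sketch of that cold-data pattern, with every object name invented for illustration. It assumes dbo.Orders_Cold is an existing PolyBase external table over archived files with the same three columns. External tables cannot carry check constraints, so the date boundary for the cold branch lives in the view itself.

```sql
-- "Hot" data stays local, with a check constraint describing its date range.
CREATE TABLE dbo.Orders_Hot
(
    OrderID   INT            NOT NULL,
    OrderDate DATE           NOT NULL,
    Amount    DECIMAL(10, 2) NOT NULL,
    CONSTRAINT PK_Orders_Hot PRIMARY KEY (OrderID, OrderDate),
    CONSTRAINT CK_Orders_Hot_Date CHECK (OrderDate >= '2024-01-01')
);
GO

-- The view stitches local and external data together behind one name.
CREATE VIEW dbo.Orders_All
AS
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders_Hot
UNION ALL
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders_Cold                 -- assumed external table over archived files
WHERE OrderDate < '2024-01-01';      -- boundary expressed as a predicate, not a constraint
GO

-- A query on recent dates should ideally avoid touching external storage at all.
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders_All
WHERE OrderDate >= '2024-06-01';
```

Whether the optimizer actually skips the cold branch depends on how the query is written, so check the execution plan rather than taking elimination for granted.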

Running PolyBase in SQL Server on Linux

Published 2024-01-17 by Kevin Feasel

I have a new video out:

In this video, we install and demonstrate PolyBase running in SQL Server on Linux hosted on Ubuntu 22.04 and get a friendly reminder to read the manual along the way.

In all seriousness, the largest percentage of time I spent on this video was because I didn’t read the manual.
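
For reference, once the PolyBase package is installed and the SQL Server service has been restarted, a quick check along these lines confirms the feature is present and turns it on. This is a minimal sketch rather than the full walkthrough from the video.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Enable PolyBase at the instance level.
EXEC sp_configure 'polybase enabled', 1;
RECONFIGURE;

-- Confirm the configured and running values.
EXEC sp_configure 'polybase enabled';
```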

CETAS in SQL Server 2022

Published 2023-05-04 by Kevin Feasel

Eric Rouach shows off a nice extension to T-SQL in SQL Server 2022:

Create External Table As Select or “CETAS” has finally become available on SQL Server with the release of the 2022 version.

After a short setup, we can create various formats files containing any query’s result set. The created file/s must be kept on an Azure storage solution i.e. Azure Blob Storage.

The process also creates an external table reflecting the updated file’s content.

We’ve been able to do this in Azure Synapse Analytics dedicated and serverless SQL pools for a while, so it’s good to be able to create an external table from a SELECT query on-premises, especially considering that it’s the only way we have left to write to external sources using PolyBase.
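
A minimal sketch of the pattern, not Eric's exact script: the names ParquetFormat, ArchiveSource, dbo.Orders, and dbo.Orders_Archive_2023 are hypothetical, and ArchiveSource is assumed to be an existing external data source whose credential allows writes.

```sql
-- CETAS needs export switched on at the instance level.
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- Writes the result set out as Parquet files and registers an external table over them.
CREATE EXTERNAL TABLE dbo.Orders_Archive_2023
WITH (
    LOCATION = '/archive/orders/2023/',   -- assumed folder within the data source
    DATA_SOURCE = ArchiveSource,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders
WHERE OrderDate >= '2023-01-01'
  AND OrderDate <  '2024-01-01';

-- The archived rows are then queryable like any other table.
SELECT COUNT(*) AS ArchivedRows
FROM dbo.Orders_Archive_2023;
```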

CETAS to Parquet Files in Azure SQL Managed Instance

Published 2023-04-27 by Kevin Feasel

Michael Bourgon gives CETAS a chance:

TL;DR – the below lines will allow you to query a table on your MI, creating Parquet files in Azure blob storage. And you can query it! Next up is partitioning over time, etc, etc. But this is freaking fantastic. I have a python script I wrote that does it, but it’s nowhere as nice/easy as this.

Why do you care? Because it’s a fantastically easy way to archive older data to blob storage, and I suspect (need to test) that if you do it right, you can then have it go to cool/archive storage via a lifecycle setup, so that if you need it much later, you can.

Yep, this is historically one of the best use cases for PolyBase. Unfortunately, we can’t do this in SQL Server 2022, though you can in pre-2022 versions using the Hadoop process. Given that it’s now available in SQL MI, I wouldn’t be too shocked to see it on-premises at some point, with the big question being whether that happens in SQL Server 2022 or vNext.

PolyBase Generic ODBC Bug in 2019 CU19 and 2022 CU2

Published 2023-04-06 by Kevin Feasel

Nathan Schoenack warns us of a bug in SQL Server:

After you install SQL Server 2019 CU19 or SQL Server 2022 CU2, external data sources using generic ODBC connector may no longer work. When you try to query external tables that were created before installing the cumulative update, you receive the following error message:

Msg 7320, Level 16, State 110, Line 68

Cannot execute the query “Remote Query” against OLE DB provider “MSOLEDBSQL” for linked server “(null)”. Object reference not set to an instance of an object.

If you try to create a new external table, you receive the following error message:

Msg 110813, Level 16, State 1, Line 64

Object reference not set to an instance of an object.

Click through to learn the cause of the issue and a workaround to resolve it.

PolyBase, JRE7, and TLS Support

Published 2023-02-28 by Kevin Feasel

Nathan Schoenack explains an error:

At end of October 2022 we saw an issue where a customer using PolyBase external query to Azure Storage started seeing queries fail with the following error:

Msg 7320, Level 16, State 110, Line 2

Cannot execute the query “Remote Query” against OLE DB provider “SQLNCLI11” for linked server “(null)”. EXTERNAL TABLE access failed due to internal error: ‘Java exception raised on call to HdfsBridge_IsDirExist: Error [com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: ]occurred while accessing external file.’

Prior to this, everything was working fine; the customer made no changes to SQL Server or Azure Storage.

I guess it doesn’t matter so much unless you’re interested in getting support, but Java SE 7 is no longer supported. Java SE 8 is still in support and JRE 8 remains the best version for PolyBase integration in my experience.
