Press "Enter" to skip to content

Category: Polybase

Using PolyBase for Archiving Data

Andy Yun is speaking my language:

One of SQL Server 2022’s new features is something called Data Virtualization. It enables T-SQL to directly query files that reside in Azure object storage or S3-compatible object storage. In my opinion, since SQL Server 2022’s release, it’s one of those underrated capabilities that I think many have glossed over. But I strongly believe that it is insanely useful and you should take a few minutes to learn more!

Read on to learn more. Also, Andy mentions using S3-compatible local storage with PolyBase for local storage. As a spoiler, I have a video coming out on January 28th that covers exactly that same topic, though without the benefit of snappy all-flash storage arrays.

1 Comment

Ingesting Blob Storage Data into SQL Server

Andy Brownsword brings in some data:

We may associate consuming data from Azure Storage with tools like Data Factory or even SSIS as we saw recently. We don’t always need the middle man though.

Here we’ll demonstrate how to use an External Data Source to perform the ingestion directly into SQL Server.

Click through for the solution. As a quick note, the TYPE attribute that Andy uses in CREATE EXTERNAL DATA SOURCE was necessary from SQL Server 2016 through SQL Server 2019, but no longer exists for SQL Server 2022. Instead, for SQL Server 2022, you’d switch the LOCATION to start with abs:// for Azure Blob Storage and PolyBase would infer the type from the protocol.

Comments closed

The Joy of Partitioned Views

Rod Edwards talks partitioned views:

This post came around when I was at a loose end one evening, and just started poking at a local sandpit database, and it got me reminiscing and revisiting / testing a few things. The devil makes work for idle thumbs and all that…

Partitioned Views…do they have a place in society anymore?

Rod does a great job of following Betteridge’s Law of Headlines, as well as saving the ‘Yes’ answer for the post itself. Partitioned views come with their own pains, though one use case Rod did not include is using PolyBase and partitioned views to move “cold” data to slower external storage.

Comments closed

CETAS in SQL Server 2022

Eric Rouach shows off a nice extension to T-SQL in SQL Server 2022:

Create External Table As Select or “CETAS” has finally become available on SQL Server with the release of the 2022 version.

After a short setup, we can create various formats files containing any query’s result set. The created file/s must be kept on an Azure storage solution i.e. Azure Blob Storage.

The process also creates an external table reflecting the updated file’s content.

We’ve been able to do this in Azure Synapse Analytics dedicated and serverless SQL pools for a while, so it’s good to be able to create an external table from a SELECT query on-premises, especially considering that it’s the only way we have left to write to external sources using PolyBase.

Comments closed

CETAS to Parquet Files in Azure SQL Managed Instance

Michael Bourgon gives CETAS a chance:

TL;DR – the below lines will allow you to query a table on your MIcreating Parquet files in Azure blob storageAnd you can query it! Next up is partitioning over time, etc, etc. But this is freaking fantastic. I have a python script I wrote that does it, but it’s nowhere as nice/easy as this.

Why do you care? Because it’s a fantastically easy way to archive older data to blob storage, and I suspect (need to test) that if you do it right, you can then have it go to cool/archive storage via a lifecycle setup, so that if you need it much later, you can.

Yep, this is historically one of the best use cases for PolyBase. Unfortunately, we can’t do this in SQL Server 2022, though you can in pre-2022 versions using the Hadoop process. Given that it’s now available in SQL MI, I wouldn’t be too shocked to see it on-premises at some point, with the big question being in SQL Server 2022 or vNext.

2 Comments

PolyBase Generic ODBC Bug in 2019 CU19 and 2022 CU2

Nathan Schoenack warns us of a bug in SQL Server:

After you install SQL Server 2019 CU19 or SQL Server 2022 CU2, external data sources using generic ODBC connector may no longer work. When you try to query external tables that were created before installing the cumulative update, you receive the following error message:

Msg 7320, Level 16, State 110, Line 68

Cannot execute the query “Remote Query” against OLE DB provider “MSOLEDBSQL” for linked server “(null)”. Object reference not set to an instance of an object.

If you try to create a new external table, you receive the following error message:

Msg 110813, Level 16, State 1, Line 64

Object reference not set to an instance of an object.

Click through to learn the cause of the issue and a workaround to resolve it.

2 Comments

PolyBase, JRE7, and TLS Support

Nathan Schoenack explains an error:

At end of October 2022 we saw an issue where a customer using PolyBase external query to Azure Storage started seeing queries fail with the following error:

Msg 7320, Level 16, State 110, Line 2

Cannot execute the query “Remote Query” against OLE DB provider “SQLNCLI11” for linked server “(null)”. EXTERNAL TABLE access failed due to internal error: ‘Java exception raised on call to HdfsBridge_IsDirExist: Error [com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: ]occurred while accessing external file.’

Prior to this, everything was working fine; the customer made no changes to SQL Server or Azure Storage.

I guess it doesn’t matter so much unless you’re interested in getting support, but Java SE 7 is no longer supported. Java SE 8 is still in support and JRE 8 remains the best version for PolyBase integration in my experience.

Comments closed

PolyBase and S3 Integration in SQL Server 2022 on Containers

Amit Khandelwal combines a bunch of things together:

One of the new features introduced with SQL Server 2022 is the ability to connect to any S3-compatible object storage and SQL Server supports both Backup/Restore and data lake virtualization with Polybase integration.  In this blog, we will demonstrate both of these features for SQL Server 2022 Containers running on Kubernetes. As usual, I will use the Azure Kubernetes Service as my Kubernetes environment

Most of the work is in the container configuration, which is good on net, as it means you only have to do it once.

Comments closed