Press "Enter" to skip to content

Category: Polybase

Ingesting Blob Storage Data into SQL Server

Andy Brownsword brings in some data:

We may associate consuming data from Azure Storage with tools like Data Factory or even SSIS as we saw recently. We don’t always need the middle man though.

Here we’ll demonstrate how to use an External Data Source to perform the ingestion directly into SQL Server.

Click through for the solution. As a quick note, the TYPE attribute that Andy uses in CREATE EXTERNAL DATA SOURCE was necessary from SQL Server 2016 through SQL Server 2019, but no longer exists for SQL Server 2022. Instead, for SQL Server 2022, you’d switch the LOCATION to start with abs:// for Azure Blob Storage and PolyBase would infer the type from the protocol.

Comments closed

The Joy of Partitioned Views

Rod Edwards talks partitioned views:

This post came around when I was at a loose end one evening, and just started poking at a local sandpit database, and it got me reminiscing and revisiting / testing a few things. The devil makes work for idle thumbs and all that…

Partitioned Views…do they have a place in society anymore?

Rod does a great job of following Betteridge’s Law of Headlines, as well as saving the ‘Yes’ answer for the post itself. Partitioned views come with their own pains, though one use case Rod did not include is using PolyBase and partitioned views to move “cold” data to slower external storage.

Comments closed

CETAS in SQL Server 2022

Eric Rouach shows off a nice extension to T-SQL in SQL Server 2022:

Create External Table As Select or “CETAS” has finally become available on SQL Server with the release of the 2022 version.

After a short setup, we can create various formats files containing any query’s result set. The created file/s must be kept on an Azure storage solution i.e. Azure Blob Storage.

The process also creates an external table reflecting the updated file’s content.

We’ve been able to do this in Azure Synapse Analytics dedicated and serverless SQL pools for a while, so it’s good to be able to create an external table from a SELECT query on-premises, especially considering that it’s the only way we have left to write to external sources using PolyBase.

Comments closed

CETAS to Parquet Files in Azure SQL Managed Instance

Michael Bourgon gives CETAS a chance:

TL;DR – the below lines will allow you to query a table on your MIcreating Parquet files in Azure blob storageAnd you can query it! Next up is partitioning over time, etc, etc. But this is freaking fantastic. I have a python script I wrote that does it, but it’s nowhere as nice/easy as this.

Why do you care? Because it’s a fantastically easy way to archive older data to blob storage, and I suspect (need to test) that if you do it right, you can then have it go to cool/archive storage via a lifecycle setup, so that if you need it much later, you can.

Yep, this is historically one of the best use cases for PolyBase. Unfortunately, we can’t do this in SQL Server 2022, though you can in pre-2022 versions using the Hadoop process. Given that it’s now available in SQL MI, I wouldn’t be too shocked to see it on-premises at some point, with the big question being in SQL Server 2022 or vNext.

2 Comments

PolyBase Generic ODBC Bug in 2019 CU19 and 2022 CU2

Nathan Schoenack warns us of a bug in SQL Server:

After you install SQL Server 2019 CU19 or SQL Server 2022 CU2, external data sources using generic ODBC connector may no longer work. When you try to query external tables that were created before installing the cumulative update, you receive the following error message:

Msg 7320, Level 16, State 110, Line 68

Cannot execute the query “Remote Query” against OLE DB provider “MSOLEDBSQL” for linked server “(null)”. Object reference not set to an instance of an object.

If you try to create a new external table, you receive the following error message:

Msg 110813, Level 16, State 1, Line 64

Object reference not set to an instance of an object.

Click through to learn the cause of the issue and a workaround to resolve it.

2 Comments

PolyBase, JRE7, and TLS Support

Nathan Schoenack explains an error:

At end of October 2022 we saw an issue where a customer using PolyBase external query to Azure Storage started seeing queries fail with the following error:

Msg 7320, Level 16, State 110, Line 2

Cannot execute the query “Remote Query” against OLE DB provider “SQLNCLI11” for linked server “(null)”. EXTERNAL TABLE access failed due to internal error: ‘Java exception raised on call to HdfsBridge_IsDirExist: Error [com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: ]occurred while accessing external file.’

Prior to this, everything was working fine; the customer made no changes to SQL Server or Azure Storage.

I guess it doesn’t matter so much unless you’re interested in getting support, but Java SE 7 is no longer supported. Java SE 8 is still in support and JRE 8 remains the best version for PolyBase integration in my experience.

Comments closed

PolyBase and S3 Integration in SQL Server 2022 on Containers

Amit Khandelwal combines a bunch of things together:

One of the new features introduced with SQL Server 2022 is the ability to connect to any S3-compatible object storage and SQL Server supports both Backup/Restore and data lake virtualization with Polybase integration.  In this blog, we will demonstrate both of these features for SQL Server 2022 Containers running on Kubernetes. As usual, I will use the Azure Kubernetes Service as my Kubernetes environment

Most of the work is in the container configuration, which is good on net, as it means you only have to do it once.

Comments closed

Data Virtualization in SQL Server 2022

Hugo Queiroz provides an overview of data virtualization options in SQL Server 2022:

SQL Server 2022 now supports CSV, Parquet, and Deltafiles stored on Azure Storage Account v2, Azure Data Lake Storage Gen2, or any simple storage service (S3)–compliant object storage—the last as an on-premises offering or in the cloud. Finally, SQL Server 2022 can now use Create External Table as Select (CETAS), together with commands like OPENROWSETCreate External Table (CET), and all the new T-SQL enhancements. SQL Server 2022 is a powerful data hub.

The post doesn’t get too deep into the topic, though a search here will find you links to articles with concrete examples.

Comments closed

Data Virtualization in Azure SQL Managed Instance

Mladen Andzic has an announcement:

We are excited to announce the general availability (GA) of data virtualization capabilities in Azure SQL Managed Instance, with improved query performance and managed identity as a new supported option for authenticating to storage accounts.  

The data virtualization enables you to execute Transact-SQL (T-SQL) queries on files storing data in common data formats in Azure Data Lake Storage Gen2 or Azure Blob Storage and combine it with relational data stored locally in the managed instance using logical joins. This way you can transparently access external data while keeping it in its original format and location. There is no data duplication or need to run and maintain ETL processes, which means that you can extract and deliver insights faster. The supported file formats are Parquet, CSV, and JSON.

This is similar to PolyBase in SQL Server 2019 but is a different underlying technology. In SQL Managed Instance, it looks like we only get API-based data virtualization, not the ODBC-based PolyBase we saw in SQL Server 2019.

Comments closed