Press "Enter" to skip to content

Category: Polybase

Build a Sandbox for Testing PolyBase and Hadoop

Fernando Sibaja Araya has a step-by-step guide to building a Hadoop sandbox for testing PolyBase on SQL Server:

This guide will take you step by step into deploying a hadoop sandbox into Azure. You then will connect to the sandbox through SSH and tunnel all the required ports to your machine so you can access all the endpoints to execute hadoop queries from Polybase.

We will be deploying Hortonworks Data Platform Sandbox 2.6.4. This will be 1 VM running in azure and within this VM a docker container will have all the HDP services running.

Click through for the full set of instructions. I’m a little overjoyed that my blog snuck into the set of links and resources at the end.

Comments closed

Loading Azure Synapse Analytics using PolyBase

Gauri Mahajan needs to load some data:

Azure Synapse Analytics is Microsoft’s data warehousing offering on Azure Cloud. It supports three types of runtimes – SQL Serverless Pool, SQL Dedicated Pool, and Spark Pools. As there are a variety of data sources on Azure, it’s very obvious that there can be varying types and volumes of data that would have to be loaded into Azure Synapse pools. There are three major types of data ingestion approaches that can be used to load data into Synapse. The COPY command is the most flexible and elaborate mechanism, where someone can execute this command from a SQL pool to load data from supported data repositories. This command is convenient to load ad-hoc and small to medium-sized data loads into Synapse. The second method of loading data is the Bulk Insert, where the method name is self-relevant regarding the approach functionality. To ingest the data from supported repositories into dedicated SQL pools, PolyBase is as efficient and at times it’s even more efficient than the COPY command. This article will help you understand the process to ingest data into Azure Synapse Analytics using PolyBase to load the data.

Click through for the process.

Comments closed

Creating Parquet Files from SQL Server Data

Andy Leonard answers a challenge:

I searched and found some promising Parquet SSIS components available from CData Software and passed that information along. I shared my inexperience in exporting to parquet format and asked a few friends how they’d done it.

I thought: How many times have I demonstrated Azure Data Factory and clicked right past file format selection without giving Parquet a second thought? Too many times. It was time to change that.

Another route is to use PolyBase. If you’re okay with writing the results to Azure Blob Storage, you can insert directly into Parquet files the results of a SQL query. If that sounds interesting, here are posts on connecting to Azure Blob Storage via PolyBase and inserting into Azure Blob Storage. I insert in CSV format to make it easier for people to follow, but swap the file format with Parquet and it works all the same.

Comments closed

Connecting to Postgres with PolyBase

I clear one blog post off my backlog:

Now that we have some data, let’s go back to SQL Server. I assume you’ve already installed and configured PolyBase—if not, check out my presentation on PolyBase. Note that this requires SQL Server 2019 or later, as that’s the first version which supports PolyBase to ODBC. Here’s a script which assumes a database named Scratch and a master key <<SomeSecureKey>>.

Click through for step-by-step instructions to get started, though I will freely admit that I don’t have the Postgres knowledge to give you a full listing of sharp edges.

Comments closed

PolyBase and Availability Groups

Rajendra Gupta has a detailed article on working with PolyBase in an Availability Group:

In this 28th article for SQL Server Always On Availability Group series, we explore the high-availability for the SQL Server PolyBase(SSB) external tables using AG groups.

There’s a lot of detail in the article and it’s worth reading in conjunction with Nathan Schoenack’s post. Someday I’ll get to the blog post on my backlog around PolyBase and AGs, especially with scale-out clusters. Someday.

Comments closed

Querying an AS400 with PolyBase

Lee Markum proves you can read data from an AS400 via PolyBase:

Before I dive into that, why was I interested in this feature? What did I hope to gain? Well, first of all, there was definitely the motivation of wondering, “Can we get this to work?” Secondly, and more practically, the promise of SQL Server Data Virtualization is to make other data sources available without using a Linked Server and without the time it takes to develop an ETL process to move the data. On a related note, you can cut out the time it takes for an ETL job to actually move the data somewhere like a data warehouse or flattened tables for reporting. Third, the Polybase feature has a built in engine that can provide query performance not available via Linked Server. Fourth, I wanted to provide a way for developers to query data in the AS400 without having to learn the different syntax required by the AS400 iSeries. Fifth, query writers can also join the external able to local SQL Server data.

Lee used the ODBC driver functionality; click through to see how that worked and what needed to change.

Comments closed

Issue with PolyBase and Cosmos DB

I found an issue with connecting to Cosmos DB from PolyBase after installing SQL Server 2019 CU2:

After upgrading to SQL Server 2019 CU2, I noticed some issues when trying to connect to a Cosmos DB collection via PolyBase. Specifically, I started getting the following error message:

Msg 105082, Level 16, State 1, Line 35
105082;Generic ODBC error: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: Server at <<my Cosmos account name>>.documents.azure.com:10255 reports wire version 2, but this version of libmongoc requires at least 3 (MongoDB 3.0) (Error Code: 15) Additional error <2>: ErrorMsg: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: Server at <<my Cosmos account name>> .documents.azure.com:10255 reports wire version 2, but this version of libmongoc requires at least 3 (MongoDB 3.0) (Error Code: 15), SqlState: HY000, NativeError: 110 .

Read on for a couple attempts at a solution and some more detail.

Comments closed

PolyBase Bug Around Windows Authentication

I have a post documenting a bug in SQL Server 2019:

Here’s the short version of the bug. If you are connected using a Windows authenticated account and attempt to perform a PolyBase-related action, such as creating an external data source or querying from an external table, you receive the following error:

Msg 46721, Level 20, State 1, Line 5
Login failed. The login is from an untrusted domain and cannot be used with Integrated authentication.

Because this is an error with a severity level of 20, it kills your session.

Click through for the workaround. I had hoped that this would have been fixed with CU3, but it’s still in there.

Comments closed