Polybase – Page 4 – Curated SQL

Connecting to Postgres with PolyBase

Published 2020-11-03 by Kevin Feasel

Now that we have some data, let’s go back to SQL Server. I assume you’ve already installed and configured PolyBase—if not, check out my presentation on PolyBase. Note that this requires SQL Server 2019 or later, as that’s the first version which supports PolyBase to ODBC. Here’s a script which assumes a database named Scratch and a master key <<SomeSecureKey>>.

Click through for step-by-step instructions to get started, though I will freely admit that I don’t have the Postgres knowledge to give you a full listing of sharp edges.

Comments closed

PolyBase and Availability Groups

Published 2020-09-30 by Kevin Feasel

Rajendra Gupta has a detailed article on working with PolyBase in an Availability Group:

In this 28^th article for SQL Server Always On Availability Group series, we explore the high-availability for the SQL Server PolyBase(SSB) external tables using AG groups.

There’s a lot of detail in the article and it’s worth reading in conjunction with Nathan Schoenack’s post. Someday I’ll get to the blog post on my backlog around PolyBase and AGs, especially with scale-out clusters. Someday.

Comments closed

PolyBase, Excel, and the TOP Operator

Published 2020-06-25 by Kevin Feasel

I follow up on a bug from SQL Server 2019 CU2:

Back with SQL Server 2019 CU2, I reported an error with PolyBase connecting to Excel when trying to select TOP(10) from the table. I’m using the Microsoft Access Database Engine 2016 Redistributable’s Excel driver.

Click through for an explanation of what’s happened since then on this front.

Comments closed

Querying an AS400 with PolyBase

Published 2020-04-09 by Kevin Feasel

Lee Markum proves you can read data from an AS400 via PolyBase:

Before I dive into that, why was I interested in this feature? What did I hope to gain? Well, first of all, there was definitely the motivation of wondering, “Can we get this to work?” Secondly, and more practically, the promise of SQL Server Data Virtualization is to make other data sources available without using a Linked Server and without the time it takes to develop an ETL process to move the data. On a related note, you can cut out the time it takes for an ETL job to actually move the data somewhere like a data warehouse or flattened tables for reporting. Third, the Polybase feature has a built in engine that can provide query performance not available via Linked Server. Fourth, I wanted to provide a way for developers to query data in the AS400 without having to learn the different syntax required by the AS400 iSeries. Fifth, query writers can also join the external able to local SQL Server data.

Lee used the ODBC driver functionality; click through to see how that worked and what needed to change.

Comments closed

Issue with PolyBase and Cosmos DB

Published 2020-03-19 by Kevin Feasel

I found an issue with connecting to Cosmos DB from PolyBase after installing SQL Server 2019 CU2:

After upgrading to SQL Server 2019 CU2, I noticed some issues when trying to connect to a Cosmos DB collection via PolyBase. Specifically, I started getting the following error message:
Msg 105082, Level 16, State 1, Line 35
105082;Generic ODBC error: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: Server at <<my Cosmos account name>>.documents.azure.com:10255 reports wire version 2, but this version of libmongoc requires at least 3 (MongoDB 3.0) (Error Code: 15) Additional error <2>: ErrorMsg: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: Server at <<my Cosmos account name>> .documents.azure.com:10255 reports wire version 2, but this version of libmongoc requires at least 3 (MongoDB 3.0) (Error Code: 15), SqlState: HY000, NativeError: 110 .

Read on for a couple attempts at a solution and some more detail.

Comments closed

PolyBase Bug Around Windows Authentication

Published 2020-03-17 by Kevin Feasel

I have a post documenting a bug in SQL Server 2019:

Here’s the short version of the bug. If you are connected using a Windows authenticated account and attempt to perform a PolyBase-related action, such as creating an external data source or querying from an external table, you receive the following error:
Msg 46721, Level 20, State 1, Line 5
Login failed. The login is from an untrusted domain and cannot be used with Integrated authentication.
Because this is an error with a severity level of 20, it kills your session.

Click through for the workaround. I had hoped that this would have been fixed with CU3, but it’s still in there.

Comments closed

PolyBase and Excel

Published 2020-02-21 by Kevin Feasel

I have a post on setting up PolyBase to work with Microsoft Excel:

If you tried to use Microsoft’s Excel driver prior to 2019 CU2, you’d get the following error:
Msg 105082, Level 16, State 1, Line LineNumber
105082;Generic ODBC error: [Microsoft][ODBC Excel Driver]Optional feature not implemented
To this point, I recommended in PolyBase Revealed that you use a different driver, like CData’s, which did work. CData’s driver still works (I assume…PolyBase ODBC support is a fluid situation, it seems), but now I can officially say that PolyBase supports the Microsoft Access Database Engine Redistributable driver for Microsoft Excel. Let’s go to the tape.

Click through for the instructions.

Comments closed

Investigating the Big Data Cluster Data Pool

Published 2020-01-14 by Kevin Feasel

Mohammad Darab takes us through Big Data Cluster data pools:

Data pools enable the creation of scale-out data marts. Whether your data is being ingested from Spark jobs or SQL, it is stored into the data pool. Data is distributed across one, or two, SQL Server instances running queries against it is more efficient.
Whether the data is being ingested from IoT device, Kafka, another relational data source (like Oracle or Teradata), it all is stored into the data pool instances and are available as “data marts” for the consumer to work with. There is no need to go back out to the original data source each time you want to query the data. It is all available inside the data pool instances.

This lets you cache data brought in via PolyBase and spread it across a number of instances. That’s pretty powerful.

Comments closed

Diagnosing PolyBase Errors

Published 2019-12-31 by Kevin Feasel

Niels Berglund takes us through an odd “incorrect syntax” error with PolyBase:

What we see in Figure 1, incorrect syntax exception, is strange, as I have executed the same code in a SQL Server 2019 Big Data Cluster, (BDC), without any issues, and the forum poster executed the same in SQL Server 2019 Enterprise Edition also without any issues.
Ok, but what about creating an external table against a relational data source – where we do not need to define an external file format?

There is a straightforward answer as to why the specific error message pops up, but I agree with Niels that it’d be nice to have “here’s the problem and here’s the solution” types of error messages. The deeper you get into the product—especially the older Hadoop external data source—the worse the error messages get.

Comments closed

When PolyBase Startup Fails in SQL Server 2019

Published 2019-11-27 by Kevin Feasel

Niels Berglund hits an annoying error after installing PolyBase on SQL Server 2019:

At MS Ignite in Orlando November 4 – 8, 2019, Microsoft announced the general availability of SQL Server 2019. At the same time, the SQL Server 2019 Developers Edition appeared as an MSDN download, and of course, I downloaded it and installed it on my dev box.
After the installation, I noticed that PolyBase did not start up correctly, and I saw dump files all over the place. After some investigation, I figured out what the issue was, and this blog post describes the fix.

This only affects Developer and Express editions, not Standard or Enterprise.

Comments closed

Category: Polybase