Polybase – Page 6 – Curated SQL

Let me tell you about one of my least favorite things I like to see in PolyBase:
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
This error is not limited to PolyBase but is instead an issue when trying to run MapReduce jobs in Hadoop. There are several potential causes, so let’s cover each of them as they relate to PolyBase and hopefully one of these solves your issue.

Click through for four potential solutions to what ails you.

Comments closed

PolyBase and Hive Shim Errors

Published 2019-03-14 by Kevin Feasel

I ran into a problem with Hive 3 and PolyBase:

My initial plan was to google things. The specific error: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number. That pops up HIVE-15326 and HIVE-15016 but gave me no immediate joy.
After reaching out to James Rowland-Jones (t), we (by which I mean he) eventually figured out the issue.

Click through for the solution.

Comments closed

PolyBase and Pushdown Limitations

Published 2019-03-07 by Kevin Feasel

I have a post covering something I learned about predicate pushdown against Hadoop in PolyBase:

Before I start, let’s talk about predicate pushdown for a moment. The gist of it is that when you have data in two sources, you have two options for combining the data:
1. Bring the data in its entirety from your remote source to your local target and work as though everything were in the local target to begin with. I’ll call this the streaming approach.
2. Send as much of your query’s filters, projections, and pre-conditions to the remote source, have the remote source perform some of the work, and then have the remote source send its post-operative data to the local target. Then, the local target once more treats this as though it were simply local data. This is the pushdown approach because you push down those predicates (that is, filters, projections, and pre-conditions).

Click through for the unfortunate finding and also vote up my UserVoice feature request if you want to see string columns as filters.

Comments closed

Querying CosmosDB with PolyBase

Published 2019-03-06 by Kevin Feasel

Hasan Savran shows how you can query CosmosDB with T-SQL statements:

SQL Server Polybase Services lets us to pull data from other data resources by using T-SQL queries. SQL Server 2019 introduces new connectors for Polybase services like Oracle, Teradata and MongoDB. In one of the SQL Server 2019 presentations from Bob Ward, I saw the CosmosDB logo when he was talking about the new connectors of SQL Server 2019. CosmosDB already has an ODBC driver and you can use it as a datasource for your Power BI, SSIS or SSMS. SQL Server 2019 makes this connection easier by using the Polybase services.

In this post, I am going to show you how to configure SQL Server 2019 Polybase to connect Azure CosmosDB Mongo databases

I have a bit of a vested interest in this, so it is heartening to see people trying out PolyBase. It’s more heartening when they use it after they try it out.

Comments closed

Tips For Using PolyBase With Cloudera QuickStart VM

Published 2019-02-21 by Kevin Feasel

I have a post on using Cloudera’s QuickStart VM with PolyBase:

Here’s something which tripped me up a little bit while connecting to Cloudera using SQL Server. The data node name, instead of being quickstart.cloudera like the host name, is actually localhost. You can change this in /etc/cloudera-scm-agent/config.ini.
Because PolyBase needs to have direct access to the data nodes, having a node called localhost is a bit of a drag.

I’m used to the Hortonworks Data Platform, so this is a quick compendium of things I noticed to were different.

Comments closed

Data Transformation Tools In The Azure Space

Published 2019-01-11 by Kevin Feasel

James Serra gives us an overview of the major tools you would use for ETL and ELT in Azure:

If you are building a big data solution in the cloud, you will likely be landing most of the source data into a data lake. And much of this data will need to be transformed (i.e. cleaned and joined together – the “T” in ETL). Since the data lake is just storage (i.e. Azure Data Lake Storage Gen2 or Azure Blob Storage), you need to pick a product that will be the compute and will do the transformation of the data. There is good news and bad news when it comes to which product to use. The good news is there are a lot of products to choose from. The bad news is there are a lot of products to choose from :-). I’ll try to help your decision-making by talking briefly about most of the Azure choices and the best use cases for each when it comes to transforming data (although some of these products also do the Extract and Load part

The only surprise is the non-mention of Azure Data Lake Analytics, and there is a good conversation in the comments section explaining why.

Comments closed

Keeping Polybase Tables In Sync With Biml

Published 2018-12-20 by Kevin Feasel

Ben Weissman combines two of my favorite things:

If you have started playing with polybase, you probably figured out by now, that – as awesome as it is – it’s still a bit of a pain to set it up and maintain external tables. There is a wizard in Azure Data Studio but it’s still under development, especially from a usability standpoint.
So what can be done about that? Well, we effectively looking for an easy way to read metadata from a relational database and automate T-SQL to mirror that metadata. HELLO?! Perfect usecase for Biml – which is NOT just for SSIS.
Let’s take a look at how that can be done…

If only Ben could have used F# instead of VB and VB with curly braces…

Comments closed

Technologies Surrounding Big Data Clusters In SQL Server 2019

Published 2018-12-04 by Kevin Feasel

Buck Woody has some long-term homework for people:

Some of these technologies and concepts are not owned or created by Microsoft – the concepts are universal, and a few of the technologies are open-source. I’ve marked those in italics.

I’ve also included a few links to a training resource I’ve found to be useful. I normally use LinkedIn Learning for larger courses, along with EdX, DataCamp, and many other platforms for in-depth training. The links I have indicated here are by no means exhaustive, but they are free, and provide a good starting point.

Click through for a list of some of the technologies in play.

Comments closed

Polybase And Azure Data Studio

Published 2018-11-07 by Kevin Feasel

Rajendra Gupta continues his series on Polybase in SQL Server 2019 with a look at Polybase integration in Azure Data Studio:

We have learned earlier that PolyBase in SQL Server 2019 Preview allows access to various data sources such as SQL Server, Oracle, MongoDB, Teradata, and ODBC based sources etc. Azure Data Studio SQL Server 2019 preview extension currently supports for SQL Server and Oracle data sources only from the External table wizard.

In this series, we will create an external table for SQL Server and explore some more features around it.

Launch Azure Data Studio and connect to the SQL Server 2019 preview instance. Right click on the database and launch ‘Create External Table’.

Rajendra also looks at some of the Polybase DMVs and the notion of predicate pushdown, which is critical to understand for writing Polybase queries which perform well.

Comments closed

Using Polybase External Tables To Connect To Oracle

Published 2018-11-01 by Kevin Feasel

Rajendra Gupta continues his Polybase series:

In part 2 of the series, we saw that the external table could be accessed similarly to a relational database table. One more advantage is that we can join them with any relational tables.

Let us see how we can join the external table with the relational DB tables. I have saved the data into a CSV file so we will import the table using my earlier article, SQL Server Data Import using SQL Operations Studio. Therefore, you can follow the article in the same way in the Azure Data Studio also. I will just give high-level steps to import data from flat file into Azure Data Studio in this article.

Click through for more.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Category: Polybase

MRAppMaster Errors Running MapReduce Jobs

PolyBase and Hive Shim Errors

PolyBase and Pushdown Limitations

Querying CosmosDB with PolyBase

Tips For Using PolyBase With Cloudera QuickStart VM

Data Transformation Tools In The Azure Space

Keeping Polybase Tables In Sync With Biml

Technologies Surrounding Big Data Clusters In SQL Server 2019

Polybase And Azure Data Studio

Using Polybase External Tables To Connect To Oracle