Polybase Design Patterns On Azure SQL DW

Simon Whiteley continues his Polybase on Azure SQL Data Warehouse series.  First, he covers data loading patterns:

That’s enough about data loading for now, there’s another major use case for Polybase that we haven’t yet discussed. Many data processing solutions have a huge, unwieldy overnight batch job that performs aggregates, lookups, analytics and various other calculations.

However, it is often the case that this is not timely enough for many business requirements. This is where Polybase can help.

If we have an External Table over the newest files, this will read these new records at query time. We can write a view that combines the External Table with our batch-produced table. This will obviously go a little slower than usual, given it has to read data from flat files each time, however the results returned will be up to date.

Simon then covers the Create Table As Select statement:

In order to utilise SQLDW effectively, we write SQL for our transformations, rather than relying on external tools such as SSIS. This ensures the work is being done by our compute nodes and, therefore, can be scaled up to increase performance.

General best practice, therefore, would be write stored procedures for each of the data movements we want to occur. This allows us to add in auditing, logging etc. But what we’re interested in here is the core data movement itself.

Writing a traditional INSERT statement isn’t the fastest way to get data into a table. There is a special syntax which creates a new table and inserts into it, that is automatically configured for optimal bulk loading, this is the CTAS, or “Create Table as Select” statement.

This is a pair of interesting posts from Simon.

Related Posts

Data Lake Analysis With Excel And Power BI

Sachin C Sheth announces support for Azure Data Lake Store within Excel and Power BI: Until now, if you had to analyze data stored in ADLS with Excel, you would have to copy it into a relational data store like Azure SQL Data Warehouse or download the data onto a machine, and then use Excel […]

Read More

Azure SQL Database Performance Tips

Arun Sirpal looks at some performance issues in Azure SQL Database: Let’s assume that you are not driven by logins, workers and session counts how does one select the right level? What exactly does DTUs (Database Transaction Units) mean? I suggest reading this post by Andy Mallon https://sqlperformance.com/2017/03/azure/what-the-heck-is-a-dtu I am going to undersize my database and […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930