Category: Warehousing

The Key Concepts of Azure Synapse Analytics

Published 2020-06-18 by Kevin Feasel

Simon Whiteley takes a look at what Azure Synapse Analytics really is:

You might have seen that I’ve been pretty busy recently, digging into the new Azure Synapse Analytics preview, announced back at Microsoft Build 2020. I’ve explored the spark engine, SQL serverless/On-Demand and various other bits… but I’m still getting the same question of “Cool!…. but what actually is it?”. One of the problems here is that Azure SQL Data Warehouse was rebranded as “Azure Synapse Analytics”… but it’s not the same as the full workspace. Having two products, both talked about in Marketing, one generally available, one still in preview – it’s no wonder people are still confused!

Simon also has a video, which I recommend so that you can enjoy the funny way he pronounces “Synapse.” That said, next time I’m in the UK, it’ll be just as fair for someone to point out the funny way I pronounce “Synapse.” Also, you should watch the video because Simon knows the topic cold and does a great job of explaining things.

Comments closed

The Power BI Pro’s Guide to Azure Synapse Analytics

Published 2020-06-02 by Kevin Feasel

Brett Powell gives a quick overview of a new whitepaper:

The Power BI Professional’s Guide to Azure Synapse Analytics, a white paper I wrote describing the Synapse Analytics platform and its benefits and use cases for Power BI professionals, was published a couple weeks ago. This post discusses a few themes from this paper and also shares a couple notes that weren’t included.

There are some interesting notes in here, so check those out and also get ahold of the whitepaper to understand how Power BI relates to the artist formerly known as Azure SQL Data Warehouse.

Comments closed

Azure Synapse Analytics in Preview

Published 2020-05-20 by Kevin Feasel

Simon Whiteley clarifies a Build announcement:

Today’s the day! There’s much buzz & excitement as we FINALLY get to see Azure Synapse Analytics in public preview, ready for us all to get our hands on it. There’s a raft of other announcements that come hand & hand with it too.
What’s that? You thought Azure Synapse Analytics was already available? You’ve been using all year and don’t see what the fuss is about??
I’m expecting this to be the common reaction. The marketing story for Synapse has been… interesting… to say the least. I’ve been asked several times in the last week exactly what the new story is and, given today’s news, I thought I’d clarify.

The big picture is the version of Azure Synapse Analytics I’ve been interested in for a bit, so it’s nice to see the movement here.

Comments closed

Monitoring Azure Synapse Analytics SQL Pools with Power BI

Published 2020-05-12 by Kevin Feasel

Brett Powell has a pair of Power BI templates for monitoring Azure Synapse Analytics:

Upon clicking ‘Load’ you’ll either need to provide your credentials for this source (if you don’t have this data source saved from previous use) or the queries will execute and the following report pages will be available:
– Executions
– Waits
– Sessions
– Waits Detail
– Execution Detail
– Memory
– ExecutionDrillThrough (hidden)

Click through to see what the templates look like and how to obtain them.

Comments closed

Workload Isolation in Azure Synapse Analytics

Published 2020-04-27 by Kevin Feasel

Niko Neugebauer explains how resource governance works with Azure Synapse Analytics SQL Pools:

Carrying on with the Azure Synapse series on the workload identification, classification and isolation started with
Query Identification in Azure SQL DW (Synapse Analytics), in this post I wanted to focus on the workload groups and the workload isolation (aka Resource Governance).
Before advancing and looking into Azure Synapse Analytics “Resource Governor” (my own naming, my fault – and yeah, I shall keep it naming properly), we need to look at the resource classes in Azure Synapse Analytics.
But even before that et me start with WTH – Where is the Heck of Resource Governance in Azure SQL Database ? (Don’t throw at me those Managed Instances, which is a SQL Server with Availability Group running in tuned VM in the background – I want & need the Azure SQL Database to have the proper Resource Governance.

Click through for an explanation plus demonstration.

Comments closed

The Flink-Hive Integration

Published 2020-04-01 by Kevin Feasel

Bowen Li takes us through Apache Flink 1.10’s integration with Apache Hive:

On the other hand, Apache Hive has established itself as a focal point of the data warehousing ecosystem. It serves as not only a SQL engine for big data analytics and ETL, but also a data management platform, where data is discovered and defined. As business evolves, it puts new requirements on data warehouse.
Thus we started integrating Flink and Hive as a beta version in Flink 1.9. Over the past few months, we have been listening to users’ requests and feedback, extensively enhancing our product, and running rigorous benchmarks (which will be published soon separately). I’m glad to announce that the integration between Flink and Hive is at production grade in Flink 1.10 and we can’t wait to walk you through the details.

Click through to see how it works.

Comments closed

Sort Keys and Join Types in Amazon Redshift

Published 2020-03-02 by Kevin Feasel

Derik Hammer takes us through query tuning a nasty job on Amazon Redshift:

My team built a process to load from a couple of base tables, in our Amazon Redshift enterprise data warehouse, into an other table which would act as a data mart entity. The data was rolled up and it included some derived fields. The SQL query had some complicity [complexity?, ed.] to it.
This process ran daily and was being killed by our operations team after running for 22 hours.
I stepped in to assist with performance tuning and discovered that join choices, such as INNER vs. OUTER joins have a big impact on whether Redshift can use its sort keys or not.

Click through for more details and what Derik ended up doing.

Comments closed

RESULT_SCAN() in Snowflake

Published 2020-02-06 by Kevin Feasel

Koen Verbeeck introduces us to the RESULT_SCAN() function in Snowflake DB:

I’m doing a little series on some of the nice features/capabilities in Snowflake (the cloud data warehouse). In each part, I’ll highlight something that I think it’s interesting enough to share. It might be some SQL function that I’d really like to be in SQL Server, it might be something else.
This post builds upon part 6 of the series, which dealt with query history. There it is explained how Snowflake caches the query results. You can find a query in the history and take a look at what was returned. Using the RESULT_SCAN table function, you can do this with SQL. Let’s take a look at an example.

This is an interesting function. Click through to see it in action.

Comments closed

Connecting to Snowflake with Power BI

Published 2020-02-06 by Kevin Feasel

Gilbert Quevauvilliers shows us how we can connect from a Snowflake DB instance to Power BI using DirectQuery:

The first thing I did was to install the ODBC Drivers.
I installed the 64bit drivers where I had my Power BI Desktop installed, and I also installed it on all the Servers where I had the On-Premise Data gateway installed.
Below is the link that I used which should always be the latest version
https://sfc-repo.snowflakecomputing.com/odbc/win64/latest/index.html
One thing to note is all that I did was I installed the ODBC driver I did not actually do any configuration of the ODBC driver, this is because it will be configured in Power BI Desktop.

Read on for the configuration instructions as well as getting past “it works in Power BI Desktop.”

Comments closed

Moving Data Around in Azure Synapse Analytics

Published 2020-01-31 by Kevin Feasel

Niko Neugebauer looks at some techniques for copying data into a table in an Azure Synapse Analytics SQL Pool:

First of all, let us list some of them (and I am not even attempting on providing all of them, of course):
– INSERT INTO … SELECT FROM … (the most well known one)
– SELECT INTO … FROM … (the most well-known to perform well, since it will create a HEAP while copying most of the properties from the original table(s))
– CREATE TABLE … AS SELECT … (the old way, which must be like 10 years old on PDW/APS & Azure SQL DW, but that has never gotten into a Box Product or Azure SQL Database)
– Polybase (that will use the External Tables & externally allocated data to transfer into Azure SQL DW)
– BCP (good old tested friend that will give you a pain in the neck until you dominate it)
– OPENROWSET / BULK INSERT (some very good and very old friends with complicated histories (who remembers all the code pages?, settings and uncertain future mostly because of their original restrictions, I guess)
– COPY INTO … (the brand new command that will allow you under very neat privileges to copy data from the external storage accounts, much like BULK INSERT)
In this blog post I will simply focus on those features that have not been ported (hopefully just yet): CTAS & COPY INTO.

Read on to see how these two work. Also, I too have wanted CTAS in on-premises SQL Server for years.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31