Press "Enter" to skip to content

Category: Synapse Analytics

Querying Serverless SQL Pools from Spark Notebooks in Scala

Jovan Popovic shows off one integration point between the data services in Azure Synapse Analytics:

Azure Synapse Analytics provides multiple query runtimes that you can use to query in-database or external data. You have the choice to use T-SQL queries using a serverless Synapse SQL pool or notebooks in Apache Spark for Synapse analytics to analyze your data.

You can also connect these runtimes and run the queries from Spark notebooks on a dedicated SQL pool.

In this post, you will see how to create Scala code in a Spark notebook that executes a T-SQL query on a serverless SQL pool.

Read on to see how. You can also query Spark pool and dedicated SQL pool tables from serverless SQL pools.

4 Comments

Synapse Studio in 5 Minutes

Kevin Chant wants 4 minutes and 58 seconds of your time:

In this post I want to do a five minute crash course about Synapse Studio. Because I have recently been asked to do this by colleagues.

In addition, I want to clear up some confusion about what you need to do before you can access Synapse Studio.

Aim of this post is for you will have a better overview of Synapse Studio within five minutes. Which happens to be the estimated reading time of this post.

Click through and be sure to start the stopwatch.

Comments closed

Loading Azure Synapse Analytics using PolyBase

Gauri Mahajan needs to load some data:

Azure Synapse Analytics is Microsoft’s data warehousing offering on Azure Cloud. It supports three types of runtimes – SQL Serverless Pool, SQL Dedicated Pool, and Spark Pools. As there are a variety of data sources on Azure, it’s very obvious that there can be varying types and volumes of data that would have to be loaded into Azure Synapse pools. There are three major types of data ingestion approaches that can be used to load data into Synapse. The COPY command is the most flexible and elaborate mechanism, where someone can execute this command from a SQL pool to load data from supported data repositories. This command is convenient to load ad-hoc and small to medium-sized data loads into Synapse. The second method of loading data is the Bulk Insert, where the method name is self-relevant regarding the approach functionality. To ingest the data from supported repositories into dedicated SQL pools, PolyBase is as efficient and at times it’s even more efficient than the COPY command. This article will help you understand the process to ingest data into Azure Synapse Analytics using PolyBase to load the data.

Click through for the process.

Comments closed

Hash Distributions, Shuffling, and Data Types

Reiss McSporran explains an issue in Azure Synapse Analytics dedicated SQL pools:

Oh dear. Shuffle. On the surface, shuffle is not necessarily a bad thing and may even be expected, but not in this scenario. So what is shuffle and why can it be bad?

Shuffle occurs when a part of a distributed table is moved to a different node during query execution. To do this a hash value is computed using the join columns, the node is then found that has that hash value and the row is then sent to that node for processing. If the tables you’re joining on have different hash distributions, you’d expect this to happen, (hence why it’s not always a bad thing!), but in this case I’ve used the same value to hash both of my tables, so what’s gone wrong?

Click through to see what’s gone wrong.

Comments closed

Azure Synapse Pathway

John Macintyre announces a new product:

Azure Synapse Pathway connects to the source system and inspects details about your database objects. An assessment report captures further details on the database objects that can be translated into Azure Synapse Analytics. With Azure Synapse Pathway the source database objects are automatically converted and optimized to T-SQL code on Azure Synapse Analytics. This means your existing code, whether a thousand or million lines of code, will be converted by Azure Synapse Pathway.

As a result of these capabilities, the traditional process of manual code conversion can now be automated in a fraction of the time; all while cutting out manual errors and reducing the total cost of the migration.

They’re starting with a few data sources (including Snowflake), but it’s an interesting product. I could see it useful for getting 80-85% of the migration done, though I don’t trust auto-generated code to be optimal.

Comments closed

Synchronizing Metadata between Spark Tables and Serverless Pool

Charl Roux takes us through one back-end integration mechanism between tables in Azure Synapse Analytics Spark pools and serverless SQL pool:

Synapse provides an exciting feature which allows you to sync Spark database objects to Serverless pools and to query these objects without the Spark pool being active or running.  Synapse workspaces are accessed exclusively through an Azure AD Account and objects are created within this context in the Spark pool. In some scenarios I would like to share the data which I’ve created in my Spark database with other users for reporting or analysis purposes. This is possible with Serverless and in this article I will show you how to complete the required steps from creation of the object to successful execution. 

Click through for the demonstration.

Comments closed

The Production-Readiness of Azure Synapse Analytics

Paul Andrew casts some harsh light:

While I completely share and actually like Microsoft’s vision of an analytics resource…

“that brings together data integration, enterprise data warehousing and big data analytics”

https://azure.microsoft.com/en-gb/services/synapse-analytics/

… the marketing, hype and technical implementation have resulted in a lot of confusion and disappointment.

So, to answer the title of this blog post directly. My opinion, as I write on 29th January 2021, is: NoAzure Synapse Analytics is not ready. Sorry Microsoft, but you’ve had long enough. I can’t hold back the questions and demands from customers anymore on why Synapse still isn’t included in my architecture diagrams.

Paul raises many good points, and the positive takeaway is that these are fixable issues. But as of today, they are definitely things you want to consider before jumping in.

Comments closed

Combining Azure Synapse Analytics and Azure Purview

Wolfgang Strasser shows how we can integrate Azure Synapse Analytics with Azure Purview:

In the past months I had the chance to play with and build solutions based on Azure Synapse Analytics and Azure Purview.

Azure Synapse (my Synapse blog entries) as the foundation for a solid platform to store, analyze and build data solutions and Azure Purview (my Purview blog posts) as the data governance and data catalog solution in Azure.

During the writing of my latest blog post (What’s new in Azure Synapse Analytics?), I found a very interesting entry in the update feature list: Azure Purview Integration.

Read on to see how.

Comments closed

Optimizing a SQL Server 2019 Project for a Dedicated SQL Pool

Kevin Chant shows us how we can modify a database schema intended for SQL Server 2019 to work best with an Azure Synapse Analytics dedicated SQL pool:

In this post I want to cover how you can transform your SQL Server database schema for a dedicated SQL Pool if you are using Azure DevOps. Because I covered it at Data Toboggan over the weekend and it can be very useful.

By the end of this post, you will know one way you can transform the schema of a database project for SQL Server 2019 if you are using Azure DevOps. So that you can make it optimal for dedicated SQL Pools.

Click through for the process and an example. Note that this isn’t a quick “check this box and you’re done” type of solution, but if you already have a proper star schema, this will help you think through some of the things you’ll need to do.

Comments closed