Synapse Analytics – Page 18

In my previous article, I demonstrated how we could easily use the Synapse Workspace Deployment extension to accomplish this second stage of the process. I’m now coming back to this topic as I realized that many of our customers were reporting difficulties in completing this second stage of their Synapse CICD process because they were failing to deploy Synapse artifacts to a Managed VNET Synapse Workspace.
In this particular scenario, the deployment was failing because their target workspace was not allowing access from public networks.

Fortunately, the answer isn’t “Allow access from public networks.” Click through to see what you can do instead.

Comments closed

From Azure Data Factory to Synapse Pipelines

Published 2021-09-24 by Kevin Feasel

Kevin Chant copies and pastes:

In this post I want to share an alternative way to copy an Azure Data Factory pipeline to Synapse Studio. Because I think it can be useful.
For those who are not aware, Synapse Studio is the frontend that comes with Azure Synapse Analytics. You can find out more about it in another post I did, which was a five minute crash course about Synapse Studio.
By the end of this post, you will know one way to copy objects used for an Azure Data factory pipeline to Synapse Studio. Which works as long as both are configured to use Git.

Click through to see how.

Comments closed

Spark Performance Improvements in Azure Synapse

Published 2021-09-22 by Kevin Feasel

Balaji Sankaran shows improvements Microsoft has made over open-source Apache Spark 3 in Azure Synapse Analytics:

Azure Synapse Analytics is continually focused on delivering a highly performant and scalable platform for supporting Spark Workload. We are focused on improving the query performance for the typical workload patterns that we see with our customers. By combining the latest open-source updates in Apache Spark with our team’s focus on performance updates we have made significant performance gains in standard TPC-DS benchmarking tests.

I expect it will never be as fast as what Databricks can do, but getting a 2x performance improvement over the open source version of Spark is nothing to sneeze at.

Comments closed

Distribution Techniques in Azure Synapse Analytics

Published 2021-09-21 by Kevin Feasel

Gauri Mahajan takes us through three distribution techniques when working with Azure Synapse Analytics dedicated SQL pool tables:

Data warehouses host much larger volumes of data compared to transactional databases, the volume of reads is much more compared to writes and queries tend to result in much larger result sets compared to queries that retrieve scalar values or paginated record sets from transactional databases. Due to this nature of data warehouses, there is a higher impetus on the server to perform faster. Modern data warehouses like AWS Redshift, Azure Synapse, Snowflake and others employ approaches like data sharding where data is distributed horizontally on multiple nodes which process data in parallel. This approach is highly scalable as nodes can be easily added to a data cluster as the storage and performance need increases. One key aspect that is different for tables hosted on such data warehouses is that tables are distributed horizontally using different distribution algorithms, so that all the nodes in an Azure Synapse cluster have an equal share of responsibility for hosting, processing, and delivering data for any given query to maximize performance.
In this article, we will learn about the table distribution styles supported in an Azure Synapse and how to use them for creating distributed tables.

Read on to learn more. This is an example of something we don’t think about on the SQL Server side, so when moving to Azure Synapse Analytics dedicated SQL pools, it can be easy to get this wrong and end up with sub-optimal performance.

Comments closed

Interchangability between ADF and Synapse Integration Pipelines

Published 2021-09-09 by Kevin Feasel

Paul Andrew makes a discovery:

Inspired by an earlier blog where we looked at ‘How Interchangeable Delta Tables Are Between Databricks and Synapse‘ I decided to do a similar exercise, but this time with the integration pipeline components taking centre stage.
As I said in my previous blog post, the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure based data platform solution so to answer it directly:

Read on to learn the answer.

Comments closed

Configuring Classifiers for Workload Management

Published 2021-08-02 by Kevin Feasel

Reiss McSporran continues a series on workload management in Azure Synapse Analytics:

So in part 1 we found out what Workload Management was and why we wanted to use it. (If you haven’t read that part, please click here and read that post before you carry on, as we’re jumping in at the deep end!) In this part we’ll be learning how to configure it.
Continuing from the end of the previous post, you have two main options to configure this. First is to assign a user or role to one of the existing system defined workload groups, second is to create a custom workload group and assign a user or role to this instead.

Read on to learn how to create classifiers, what the rules look like for them, and their importance.

Comments closed

Querying Private Blob Storage Containers with Azure Synapse Analytics

Published 2021-07-22 by Kevin Feasel

Dennes Torres looks at some private information:

The queries from the previous article were made against the public container in the blob storage. However, if the container is private, you will need to authenticate with the container. In this article, you’ll learn how to query private blob storage with SQL.
NOTE: Be sure that the Azure Synapse Workspace and the storage account with the sample files are set up before following along with this article. You will also need to replace your storage account URL each time that a storage account URL is used in the article.
There are three possible authentication methods, and these methods may have some variation according to the type of storage account and the access configuration. I will not dig into details about storage here and leave that for a future article.

Read on for the three authorization methods and a lot of detail on using SAS tokens (the preferred method) to access this data.

Comments closed

A Review of Serverless SQL Pools

Published 2021-06-30 by Kevin Feasel

Teo Lachev lays out a review of serverless SQL pools in Azure Synapse Analytics:

Being able to query files using SQL is great. Some of you might remember the U-SQL language that was introduced a few years ago alongside Azure Data Lake Storage (ADLS) Gen 1 which is now deprecated. It never caught up because it looked like SQL but it wasn’t (it was actually closer to C#). Now we’re talking about real SQL. To query files! This opens the possibility to implement a logical warehouse (the emphasis is on logical as everyone to my knowledge who tried to replace a data warehouse with a data lake has failed). Or, you can connect Power BI to the serverless endpoint and start querying all these files in DirectQuery mode. So, this enables real-time BI on top of file extracts.

Click through for the pros and cons of using serverless SQL pools today.

Comments closed

Moving Artifacts between Folders in Synapse Studio

Published 2021-06-25 by Kevin Feasel

Wolfgang Strasser looks at a recent update:

Another small but very powerful usability extension in Azure Synapse Studio was added at the beginning of June: Move artifacts across folders in Synapse Studio (without extra clicks but with drag&drop)
Once again, the release notes list contained the short sentence that made me curious… hmm… that sound nice… In one of my previous post, I described the “old” way of moving artifacts around in Synapse Studio.

Click through for a demonstration.

Comments closed

Linking Multiple Power BI Workspaces to Synapse Analytics

Published 2021-06-15 by Kevin Feasel

Wolfgang Strasser points out a new feature with Azure Synapse Analytics:

Let’s call it by accident, that I found out that Azure Synapse Analytics now let’s you to link more than just one Power BI workspace.. There is an entry in the Release notes/Update menu that lists this option.

Read on to see what this means for you.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Category: Synapse Analytics

Deploying Synapse Artifacts to a Managed vNet Workspace

From Azure Data Factory to Synapse Pipelines

Spark Performance Improvements in Azure Synapse

Distribution Techniques in Azure Synapse Analytics

Interchangability between ADF and Synapse Integration Pipelines

Configuring Classifiers for Workload Management

Querying Private Blob Storage Containers with Azure Synapse Analytics

A Review of Serverless SQL Pools

Moving Artifacts between Folders in Synapse Studio

Linking Multiple Power BI Workspaces to Synapse Analytics