Press "Enter" to skip to content

Category: Synapse Analytics

Serverless SQL Pool CI/CD

Kevin Chant doesn’t have time for manual deployments:

I want to cover one way you can do CI/CD for Azure Synapse Analytics serverless SQL pools using Azure DevOps in this post. Because I know it is a popular topic.

It’s related to my post about how you can create a dacpac for an Azure Synapse Analytics dedicated SQL pool using Azure DevOps. Since they are both based in the same service.

Plus, a while ago I wrote about the increase in demand for Data Platform automation. So, I really wanted to do a post about how you can do CI/CD for Azure Synapse Analytics serverless SQL pools.

Read on to learn how.

Comments closed

Azure Synapse Analytics October 2021 Update

Saveen Reddy summarizes the newest updates in Azure Synapse Analytics:

Use Stringify in data flows to easily transform complex data types to strings

Mapping data flows helps you perform code-free data transformation your Synapse pipelines. When you work with complex data types such as structures, arrays, map, you need to transform them into strings. You can do this by using the new Stringify data transformation simplifying this common task.

Read on for the full set of updates.

Comments closed

Optimizing Blob Storage Query Performance

Dennes Torres compares several strategies for querying data stored in Azure Blob Storage:

In the third part of the series Querying Blob Storage with SQL, I will focus on the performance behaviour of queries: What makes them faster, slower, and some syntax beyond the basics.

The performance tests in this article are repeated, and the best time of the queries is recorded. This doesn’t mean you will always achieve the same timing. Many architectural details will affect the timing, such as cache, first execution, and so on. The timing exposed on each query is only a reference pointing to the differences of the query methods that can affect the time and the usual result for better or worse performance.

Click through to see which patterns perform well and which don’t.

Comments closed

Using Query Labels in Azure Synapse Analytics

Gauri Mahajan shows one of the pieces of functionality in Azure Synapse Analytics dedicated SQL pools that I’d like to see on-premises:

Azure Synapse supports a concept known as “query labels” that allows tagging any DDL or DML queries that are executed on the dedicated SQL pool. These labels can be queried using the dynamic management views (DMVs). One can use these labels to describe the purpose of the query or add any metadata to the query being executed and the same can be used later for instrumenting the queries, specifically to identify the queries that meet the desired search criteria. Let’s walk through a step-by-step exercise to understand this concept practically.

Click through for the process.

Comments closed

Synapse vs Snowflake

Travis Manning has a throw-down:

Data warehousing has become a hot topic for most organizations as data volume grows exponentially, and yet the capacity to manually manage it all but diminishes. The ecosystem is replete with options, each with a host of features and integrations. In this article, we will discuss two of the most common (and commonly discussed!) data warehousing services, Azure Synapse and Snowflake Data Warehouse (DW). For this article, we will try to focus on use cases, and which option is appropriate in that context.

Click through for the product comparison. One big difference not covered is pricing uncertainty. If you have a good understanding of the number of executions and computational complexity of your queries, as well as data quantities, Snowflake can be very competitively priced. But what can happen is that the competitive price turns into a much-less-competitive price by the time you’re fully up to speed.

1 Comment

Deploying Synapse Artifacts to a Managed vNet Workspace

Rui Cunha takes us through an Azure Synapse Analytics deployment scenario:

In my previous article, I demonstrated how we could easily use the Synapse Workspace Deployment extension to accomplish this second stage of the process. I’m now coming back to this topic as I realized that many of our customers were reporting difficulties in completing this second stage of their Synapse CICD process because they were failing to deploy Synapse artifacts to a Managed VNET Synapse Workspace.

In this particular scenario, the deployment was failing because their target workspace was not allowing access from public networks.

Fortunately, the answer isn’t “Allow access from public networks.” Click through to see what you can do instead.

Comments closed

From Azure Data Factory to Synapse Pipelines

Kevin Chant copies and pastes:

In this post I want to share an alternative way to copy an Azure Data Factory pipeline to Synapse Studio. Because I think it can be useful.

For those who are not aware, Synapse Studio is the frontend that comes with Azure Synapse Analytics. You can find out more about it in another post I did, which was a five minute crash course about Synapse Studio.

By the end of this post, you will know one way to copy objects used for an Azure Data factory pipeline to Synapse Studio. Which works as long as both are configured to use Git.

Click through to see how.

Comments closed

Spark Performance Improvements in Azure Synapse

Balaji Sankaran shows improvements Microsoft has made over open-source Apache Spark 3 in Azure Synapse Analytics:

Azure Synapse Analytics is continually focused on delivering a highly performant and scalable platform for supporting Spark Workload. We are focused on improving the query performance for the typical workload patterns that we see with our customers. By combining the latest open-source updates in Apache Spark with our team’s focus on performance updates we have made significant performance gains in standard TPC-DS benchmarking tests.

I expect it will never be as fast as what Databricks can do, but getting a 2x performance improvement over the open source version of Spark is nothing to sneeze at.

Comments closed

Distribution Techniques in Azure Synapse Analytics

Gauri Mahajan takes us through three distribution techniques when working with Azure Synapse Analytics dedicated SQL pool tables:

Data warehouses host much larger volumes of data compared to transactional databases, the volume of reads is much more compared to writes and queries tend to result in much larger result sets compared to queries that retrieve scalar values or paginated record sets from transactional databases. Due to this nature of data warehouses, there is a higher impetus on the server to perform faster. Modern data warehouses like AWS Redshift, Azure Synapse, Snowflake and others employ approaches like data sharding where data is distributed horizontally on multiple nodes which process data in parallel. This approach is highly scalable as nodes can be easily added to a data cluster as the storage and performance need increases. One key aspect that is different for tables hosted on such data warehouses is that tables are distributed horizontally using different distribution algorithms, so that all the nodes in an Azure Synapse cluster have an equal share of responsibility for hosting, processing, and delivering data for any given query to maximize performance.

In this article, we will learn about the table distribution styles supported in an Azure Synapse and how to use them for creating distributed tables.

Read on to learn more. This is an example of something we don’t think about on the SQL Server side, so when moving to Azure Synapse Analytics dedicated SQL pools, it can be easy to get this wrong and end up with sub-optimal performance.

Comments closed

Interchangability between ADF and Synapse Integration Pipelines

Paul Andrew makes a discovery:

Inspired by an earlier blog where we looked at ‘How Interchangeable Delta Tables Are Between Databricks and Synapse‘ I decided to do a similar exercise, but this time with the integration pipeline components taking centre stage.

As I said in my previous blog post, the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure based data platform solution so to answer it directly:

Read on to learn the answer.

Comments closed