Press "Enter" to skip to content

Category: Synapse Analytics

Migrating the Serverless SQL Pool to Fabric

Kevin Chant makes a move:

By the end of this post, you will know how to migrate serverless SQL Pool objects to a Microsoft Fabric Data Warehouse using Azure DevOps. Along the way I share plenty of links and advice.

Please note that Microsoft Fabric is currently in Public Preview and what you see in this post is subject to change.

This is the relatively easy one. The real challenge will be dedicated SQL pool migration.

Comments closed

Auto-Pausing Synapse Dedicated SQL Pools

Mark Broadbent saves some money via pool auto-pausing:

This capability is neither earth shatteringly new nor unexpected, and something that Databricks has provided for some time. Of the two Data Exploration & Data Warehousing Pool types, Synapse Serverless Pool (otherwise know as the built-in Pool) by its very definition does not incur compute charges when it is not running.

Therefore this leaves us with only dedicated SQL Pool to worry about and this is where our problems begin.

Click through for the scripts to pause and resume a dedicated SQL pool, and Mark promises a part 2 in which we see the automation.

Comments closed

Microsoft Fabric vs Synapse

Nikola Ilic shares some thoughts:

I’ve already introduced Microsoft Fabric in the previous article, so if you’re still not sure what is it all about and why you can think of Fabric as your “data football team”, I strongly encourage you to check that article. Additionally, there are many great articles and videos, both from Microsoft and the community, where you can find out more about Fabric and its various scenarios and components.

In the above-mentioned article, I scratched the surface of the inevitable topic that now comes into focus: “What now for Azure Synapse Analytics?” Since I’ve been asked this exact question multiple times in the previous days, I’ve decided to put down my thoughts and share them in this article.

Read the whole thing. My thoughts, which are generally similar to Nikola’s:

  • There are no plans (at this time) to remove Synapse, and even if there were, prior history—like with Azure SQL DW—says that the deprecation timeframe is something we can measure in years rather than months
  • Fabric is intended to replace Synapse one of these days, and new customers should start with Fabric
  • Current Synapse customers should stay on Synapse for now, especially given that there is currently no easy migration plan. Give partners and Microsoft some time to sort that out, though, and I expect you’ll see tools and products for this by the time Fabric goes GA
  • PaaS and SaaS are quite different and that can be an influential factor. My personal preference is for SaaS, especially knowing how difficult it can be to secure Synapse while still enabling developer functionality
  • We’re on day 4 of Fabric being a thing (at least in public), and it’ll probably be in a public preview for a while, so there’s still plenty of baking left to do
Comments closed

Feature Branching and Hotfixes for Azure DevOps

Vytas Suopys covers a bit of source control strategy:

Have you ever deployed a release to production only to find out a bug has escaped your testing process and now users are being severely impacted? In this post, I’ll discuss how to deploy a fix from your development Synapse Workspace into a production Synapse Workspace without adversely affecting ongoing development projects.

This example uses Azure DevOps for CICD along with a Synapse extension for Azure DevOps: Synapse Workspace Deployment. In this example, I assume Synapse is already configured for source control with Azure DevOps Git and Build and Release pipelines are already defined in Azure DevOps. Instructions on how to apply this this can be found in the Azure Synapse documentation for continuous integration and delivery.

The specific example covers Synapse, though the general principle applies no matter what you’re deploying.

Comments closed

An Overview of Azure Synapse Analytics

Kevin Chant offers a primer on Azure Synapse Analytics:

In reality, there are a lot of features within Azure Synapse Analytics where your SQL Server background can prove to be useful.

By the end of this post, you will have a good overview of Azure Synapse Analytics. In addition, where your SQL Server background can prove to be useful. Plus, there are plenty of links included in this post.

This is not the slimmest of primers, which makes sense given how broad Synapse is.

Comments closed

Slowly-Changing Dimensions in the Serverless SQL Pool

Lilliam Leme is building a serverless warehouse:

As organizations continue to collect and store large volumes of data in their data lakes, managing this data effectively becomes increasingly important. One key aspect of this is implementing Slow Change Dimension type 2, which allows organizations to track historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. In this blog post we will address the following scenario: a customer wants to implement Slow Change Dimension type 2 on top of their data lake.

For this example, we will use Serverless SQL Pool to demonstrate how this can be done. Additionally, in the next post, we will explore how the same approach can be used with Spark.

This turns out to be more work than a classic SQL Server-based solution because of the fact that the serverless SQL pool is read-only, save for CETAS statements.

Comments closed

Azure Synapse Analytics April 2023 Updates

Ryan Majidimehr has an update for us:

Low Shuffle Merge optimization for Delta tables is now available in Apache Spark 3.2 and 3.3 pools. You can now update a Delta table with advanced conditions using the Delta Lake MERGE command. It can update data from a source table, view, or DataFrame into a target table. The current algorithm of the MERGE command is not optimized for handling unmodified rows. With Low Shuffle Merge optimization, unmodified rows are excluded from expensive shuffling execution and written separately.

To learn more about this new command, read Low Shuffle Merge optimization on Delta tables

Looks like a bit of work on Data Explorer pools and a little bit on Spark pools and Synapse Link to Cosmos DB to round out the month.

Comments closed

Converting JSON to a Relational Schema with KQL

Devang Shah does some flattening and moving:

In the world of IoT devices, industrial historians, infrastructure and application logs, and metrics, machine-generated or software-generated telemetry, there are often scenarios where the upstream data producer produces data in non-standard schemas, formats, and structures that often make it difficult to analyze the data contained in these at scale. Azure Data Explorer provides some useful features to run meaningful, fast, and interactive analytics on such heterogenous data structures and formats. 

In this blog, we’re taking an example of a complex JSON file as shown in the screenshot below. You can access the JSON file from this GitHub page to try the steps below.

Click through for the example, which is definitely non-trivial.

Comments closed

Spark ELT in Synapse Notebooks

Liliam Leme performs some data movement:

I often receive various requests from customers while working on FastTrack projects, and I have compiled some examples to help you build your solution on top of a data lake using useful tips. Most of the examples in this post use pandas, and I hope they will be helpful for you as they were for me.

Please note that all examples in this post use pyspark.

In my scenario, I exported multiple tables from SQLDB to a folder using a notebook and ran the requests in parallel.

Read on for the examples and some of the things you can do with Spark notebooks in Azure Synapse Analytics.

Comments closed

Azure Synapse Analytics March 2023 Update

Ryan Majidimehr has an update for us:

We are excited to announce that Multi-Column Distribution (MCD) for Azure Synapse Dedicated SQL pools is now Generally Available! MCD is highly desirable for easing migrations, promotes faster query performance, and reduces data skew. 

You can choose to distribute data on multiple columns to balance the data distribution in your tables and reduce data movement during query execution. Multi-Column distribution will allow you to choose up to eight columns for distribution. 

Click through for the full list of changes. It seems like there’s a decent spread between the four major pool types, with the emphasis on Data Explorer pools this month.

Comments closed