
Category: Synapse Analytics

Rolling Your Own Serverless SQL Pool Database Project

Kevin Chant doesn’t let the lack of support for a product limit him:

In this post I want to share how I created a homemade serverless SQL Pool database project.

Because I know people are keen to work this way right now. Mostly due to the comments I received when I covered how to deploy a dacpac to a serverless SQL pool.

By the end of this post you will know how I created a database project for it. Plus, how you can deploy the contents of the database project with Azure DevOps. I also share plenty of links along the way.

Though Kevin did run into some challenges trying to hack in a solution, so it’s not quite as useful as you’d first hope.
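Kevin's actual project layout and tooling are in his post; purely to make the idea concrete, a "homemade" database project can be as little as a folder of numbered .sql scripts that a pipeline replays in order against the serverless endpoint. The sketch below is a hedged illustration under that assumption (hypothetical scripts/ folder, SQL authentication, single-batch scripts), not necessarily how Kevin wired his up.

```python
# Minimal sketch: replay a folder of ordered .sql scripts against a
# serverless SQL pool. Server, database, credentials, and folder layout
# are placeholders, and each script is assumed to be a single batch
# (no GO separators, which pyodbc cannot split for you).
from pathlib import Path

import pyodbc  # pip install pyodbc; requires the Microsoft ODBC Driver for SQL Server

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<workspace-name>-ondemand.sql.azuresynapse.net;"
    "DATABASE=<database-name>;"
    "UID=<sql-login>;PWD=<password>;",
    autocommit=True,
)
cursor = conn.cursor()

# Run scripts in lexical order, e.g. 001_schemas.sql, 002_views.sql, ...
for script in sorted(Path("scripts").glob("*.sql")):
    print(f"Deploying {script.name}")
    cursor.execute(script.read_text())

conn.close()
```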


Synapse and Azure ML Pipelines

Santosh Thomas integrates two Azure products:

As more customers standardize on the Synapse data platform, enabling machine learning workflows through Azure Machine Learning (Azure ML) becomes particularly interesting. This is especially true as more customers look to bring their data engineering and data science practices together and mature capabilities on both sides.

The goal of this blog post is to highlight how Synapse and Azure ML can work well together to deliver key insights. This is motivated by a scenario where a customer modernized their data platform on Azure Synapse but was looking to improve their data science practices through Azure ML. The focus of this blog is to expose existing functionality, and it is not a “hardened” solution with security or other cloud best practice implementations. The workflow steps also assume some level of comfort with Python and working with the Azure Python SDKs.

There was a time when Microsoft wanted us to remain in Synapse for machine learning tasks, but that time is gone: the emphasis is now squarely on doing machine learning in Azure ML, regardless of where the data lives…unless there's a Spark job involved, in which case things get all weird again.
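Santosh's walkthrough covers the full wiring between the two services; as a hedged taste of just the Azure ML half with the v2 Python SDK, submitting a training script as a command job looks roughly like this. The subscription, workspace, compute, environment, and data path below are all placeholders.

```python
# Sketch: submit a training script to Azure ML as a command job (SDK v2).
# Everything in angle brackets is a placeholder; the data path assumes your
# Synapse pipelines have landed curated data in an ADLS-backed datastore.
from azure.ai.ml import MLClient, Input, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<azure-ml-workspace>",
)

job = command(
    code="./src",  # local folder containing train.py
    command="python train.py --data ${{inputs.training_data}}",
    inputs={
        "training_data": Input(
            type="uri_folder",
            path="azureml://datastores/workspaceblobstore/paths/curated/",
        )
    },
    # A curated environment name; swap for one available in your workspace.
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",
)

returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)
```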


Principles of Synapse Security

Liliam Leme provides an overview of security options in Azure Synapse Analytics:

This blog post will provide an overview of the Synapse security environment focused on Dedicated SQL Pool, Serverless SQL Pool, and Spark.

Security has many layers and frequently it will determine how you build your process. I start this post by reviewing several important security considerations which you can later apply to your Synapse environment. 

This is a fairly lengthy post and it still only covers a moderate amount of what you’d want to do for Azure Synapse Analytics. This is the downside to having a complex interplay of several products: there’s a lot to secure and a lot to think about along the way.


Getting Started with Azure Synapse Analytics

Shabnam Watson gets us started with Azure Synapse Analytics:

In this blog post, I show you how easy it is to start an Azure Synapse Analytics workspace (instance) and use its Serverless SQL Pool engine to analyze sample publicly available data. As you will read shortly, Azure Synapse Analytics provides many compute engines for different use cases. The easiest one to get started with is its Serverless SQL Pool since every Azure Synapse Analytics instance comes with one already created and ready to use. It also does not have any cost unless you use it, which makes it very attractive to those who have a limited Azure budget.

Click through to see how to create a workspace, load some data, and query it via the serverless SQL pool.
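Shabnam does all of this through Synapse Studio; just to give a flavor of the kind of query the serverless pool runs over files, here is a hedged sketch issuing an OPENROWSET query from Python instead. The workspace endpoint and storage path are placeholders.

```python
# Sketch: query a public Parquet dataset through the serverless SQL pool's
# endpoint using OPENROWSET. Endpoint name and file path are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<workspace-name>-ondemand.sql.azuresynapse.net;"
    "DATABASE=master;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)

query = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<public-storage-account>.blob.core.windows.net/<container>/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;
"""

cursor = conn.cursor()
for row in cursor.execute(query):
    print(row)
conn.close()
```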


Recommendations for Dedicated SQL Pool Data Modeling

Bhaskar Sharma has some advice:

In this article, I will discuss how to physically model an Azure Synapse Analytics data warehouse while migrating from an existing on-premises MPP (Massive Parallel Processing) data warehouse solution like Teradata and Netezza. The approach and methodologies discussed in this article are purely based on the knowledge and insight I have gained while migrating these data warehouses to Azure Synapse dedicated SQL pool. 

Dedicated SQL pools are close enough to regular SQL Server that we make a lot of assumptions about them, some of which may be wrong.
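To make "physically model" concrete: in a dedicated SQL pool, every table carries a distribution strategy (hash, round robin, or replicate) and an index choice that plain SQL Server never asks for, and those are exactly the sort of decisions a Teradata or Netezza migration forces you to revisit. A hedged sketch with illustrative names only:

```python
# Sketch: the kind of physical-design decision the article is about.
# Table, column, and connection details are illustrative placeholders.
import pyodbc

ddl = """
CREATE TABLE dbo.FactSales
(
    SaleKey     BIGINT         NOT NULL,
    CustomerKey INT            NOT NULL,
    SaleAmount  DECIMAL(18, 2) NOT NULL,
    SaleDate    DATE           NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),  -- co-locate rows that join on CustomerKey
    CLUSTERED COLUMNSTORE INDEX        -- the default, suited to large fact tables
);
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<workspace-name>.sql.azuresynapse.net;"
    "DATABASE=<dedicated-pool-name>;"
    "UID=<user>;PWD=<password>;",
    autocommit=True,
)
conn.cursor().execute(ddl)
conn.close()
```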


Deploying a dacpac to the Serverless SQL Pool

Kevin Chant drops some dacpacs off at the (serverless) pool:

In this post I want to cover deploying a dacpac to a serverless SQL pool using Azure DevOps. Yes, you are reading that right. It is now possible thanks to a sqlpackage update.

To clarify, a dacpac file is a special file that you can use to deploy updates to SQL Server related databases using a state-based deployment. Plus, when I say serverless SQL pool I mean an Azure Synapse Analytics serverless SQL Pool.

Kevin includes examples for Azure DevOps as well as GitHub Actions.
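Kevin's post has the full Azure DevOps and GitHub Actions pipelines; as a rough sketch of the underlying step, the deployment boils down to a single SqlPackage publish call, shown here wrapped in Python only to keep the examples in one language. Paths, server name, and credentials are placeholders.

```python
# Sketch: the SqlPackage publish step a pipeline would run to deploy a
# dacpac to a serverless SQL pool. Assumes SqlPackage is on PATH; all
# paths and connection details are placeholders.
import subprocess

result = subprocess.run(
    [
        "SqlPackage",
        "/Action:Publish",
        "/SourceFile:bin/Release/MyServerlessDb.dacpac",
        "/TargetServerName:<workspace-name>-ondemand.sql.azuresynapse.net",
        "/TargetDatabaseName:MyServerlessDb",
        "/TargetUser:<sql-login>",
        "/TargetPassword:<password>",
    ],
    check=True,
    capture_output=True,
    text=True,
)
print(result.stdout)
```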


Trying out Azure Synapse Link for SQL Server 2022

Kevin Chant looks at Azure Synapse Link for SQL Server 2022:

My first topic is about a new feature that covers both SQL Server 2022 and Azure. Which is Azure Synapse Link, or to be more precise Azure Synapse Link for SQL Server 2022.

I have been doing various tests with this feature recently. Which has led to some interesting blog posts about Azure Synapse Link for SQL Server 2022.

Read on for a few more thoughts, as well as deployment scripts via Azure DevOps and GitHub Actions.


Automated Delta Lake Maintenance in Synapse

Shalu Ganotra Chadha, et al., explain how to keep your Delta Lake tidy:

The useful features of Delta Lake come at the cost of requiring regular maintenance. Delta Lake requires periodic cleanup as it accumulates files over time with each upsert and retains previous snapshots of the data. They can quickly convert a small dataset (in MBs) to several GBs of storage. This is because deleted data is not really removed but retained as an older snapshot of the Delta Lake dataset.

Click through for two operations you can perform on a Delta Lake, as well as some recommendations on when to do what via the Genie Delta Lake Auto Maintenance scripts they provide.
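The usual pair of maintenance operations here is file compaction (OPTIMIZE) and VACUUM. As a minimal sketch of what they look like from a Synapse Spark notebook, assuming a placeholder table path, the default seven-day retention, and a Delta Lake version that supports OPTIMIZE (the Genie scripts automate and parameterize this far more thoroughly):

```python
# Sketch: routine Delta Lake maintenance from a Synapse Spark notebook,
# where `spark` is the SparkSession the notebook provides. The table path
# and retention window are placeholders.
from delta.tables import DeltaTable

delta_path = "abfss://<container>@<storage-account>.dfs.core.windows.net/delta/sales"

# Compact many small files into fewer, larger ones.
spark.sql(f"OPTIMIZE delta.`{delta_path}`")

# Remove files no longer referenced by the current table version and older
# than the retention threshold (default 7 days = 168 hours).
DeltaTable.forPath(spark, delta_path).vacuum(retentionHours=168)
```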


Disabling Public Network Access in Synapse

Ryan Adams builds a private endpoint:

If you disable public access to your Azure Synapse Workspace you will get the following error message when attempting to open Synapse Studio. 

“Failed to load one or more resources due to forbidden issue, error code 403.” 

Click through for more information about routing for Synapse resources and what you’d need to do in order to disable public network access entirely.


Power BI Incremental Refresh with Non-Standard Dates in Parquet Files

Shabnam Watson hits on a specific but interesting use case:

The most common scenario for setting up the out of the box incremental refresh in Power BI is to base it off of a datetime column; however, there are cases when you may want to set up incremental refresh based off of a column with a data type other than datetime. Examples are when you are working with a smart date ID (01012023 for Jan 1, 2023) column or when you are working with a source system that has partitioned data using a column such as Year that has a numeric data type.

A use case for the latter scenario is when you are working with Parquet/Delta files via Azure Synapse Analytics Serverless SQL Pool. When working with larger datasets, it is typical to see the Parquet/Delta files partitioned by date ranges. Depending on how much data there is, the partitioning may be at the Year level instead of Day.

With that scenario in mind, read on to learn how you can minimize your Power BI processing time and costs when doing incremental refresh.
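Power BI still hands you datetime RangeStart and RangeEnd parameters even when the table's key is numeric, so the heart of the technique is converting those parameters into the numeric key before filtering. Shabnam does that conversion in Power Query; purely to illustrate the logic (and to keep the examples in one language), here it is sketched in Python.

```python
# Sketch of the conversion idea only: translate datetime RangeStart/RangeEnd
# values into the numeric keys the source is partitioned by. The real
# implementation lives in Power Query in the linked post.
from datetime import datetime

def to_smart_date_id(dt: datetime) -> str:
    """MMDDYYYY key such as '01012023' for Jan 1, 2023 (the post's example)."""
    return dt.strftime("%m%d%Y")

def year_key_filter(range_start: datetime, range_end: datetime) -> str:
    """Filter on a numeric Year partition column instead of a datetime column."""
    return f"Year >= {range_start.year} AND Year < {range_end.year}"

range_start = datetime(2023, 1, 1)
range_end = datetime(2024, 1, 1)
print(to_smart_date_id(range_start))            # 01012023
print(year_key_filter(range_start, range_end))  # Year >= 2023 AND Year < 2024
```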
