Press "Enter" to skip to content

Category: Synapse Analytics

Monitoring the Serverless SQL Pool via Log Analytics

Sidney Cirqueira shows how to monitor SQL requests in Azure Synapse Analytics:

Today I would like to share a scenario that I was working on one of my serverless SQL Pool support cases. The customer asked for an advice on how to monitor serverless SQL requests by using log analytics.

The intention of this guide is to help you with choosing the configuration required to easily setup the Synapse Analytics Workspace monitoring and all other considerations about how to monitor serverless SQL requests with Azure Monitor. Spoiler: At the end of this article, I will share the latest version of the serverless workbook posted on the Azure_Synapse_Tool_Box. This includes a really cool way to see query execution information.

Read on for that and definitely check out the Azure Synapse Toolbox if you’re a Synapse user.

Comments closed

Transferring Data between Dedicated SQL and Spark Pools in Synapse

Sidney Cirqueria shows off a connector available to us in Azure Synapse Analytics:

Usually, customers do this kind of operation using Synapse Apache Spark to load data to Dedicated Pool within Azure Synapse Workspace, but today, I would like to reproduce a different scenario that I was working on one of my support cases.  Consider a scenario where you are trying to load data from Synapse Spark to Dedicated pool (formerly SQL DW) using Synapse Pipelines, and additionally you are using Synapse Workspace deployed with Managed Virtual Network.

The intention of this guide is to help you with which configuration will be required if you need to load data from Azure Synapse Apache Spark to Dedicated SQL Pool (formerly SQL DW). If you prefer take advantage of the new feature-rich capabilities now available via the Synapse workspace and Studio and load data directly from Azure Apache Spark to Dedicated Pool in Azure Synapse Workspace is recommended that you enable Synapse workspace features on an existing dedicated SQL pool (formerly SQL DW).

Read on for a few tips a nd a step-by-step walkthrough of the process.

Comments closed

Replicated Tables in Dedicated SQL Pools

Pedro Martinez explains the idea behind replicated tables in Azure Synapse Analytics dedicated SQL pools:

If you have ever used Azure Synapse Analytics dedicated SQL pool you would know there are multiple table types to choose from, for your workload. You might ask yourself, “when can I use Replicated table type and how I can efficiently use them”?  

This blog is going to talk in detail about replicated table type, when to use and what are best practices for its usage. But before that, let’s start by understanding the different table types: 

I’ve seen replicated tables get overused, so check out Pedro’s advice on how not to get burned with them.

Comments closed

CI/CD with Azure Synapse Link for SQL Server 2022

Kevin Chant gives us the whole story:

In this post I want to show you a complete CI/CD experience for Azure Synapse Link for SQL Server 2022 tables. Which uses a YAML Pipeline in Azure DevOps. Including how to automatically stop and start it in the pipeline.

In a previous post I showed how an easier way to perform CI/CD for Azure Synapse Link for SQL Server 2022. Where you only need to stop the link, update the SQL Server database and afterwards start the link again.

However, the best CI/CD solutions are the ones where you do not do any manual work at all. This includes stopping and starting the link.

And that’s just what Kevin gives us.

Comments closed

Reading Delta Tables from Power BI via Synapse Serverless

Dan English is up for a data lake change:

In this post I just wanted to show the ability to use the Delta Lake format that is very common now with Power BI. I will go over a quick example of creating the files to reference, building a view to use with Power BI, and then querying the data in Power BI.

In my Synapse Workspace I created a Linked service connection to an Azure SQL Database that has the AdventureWorksLT database loaded which is the sample database you can create when you first create a SQL instance in Azure and here is a walkthrough link and see the Additional settings section.

Dan shows how to create the lake files in delta format via Synapse pipeline and then how to query the data from there.

Comments closed

Named Entity Encryption in Spark

Arshad Ali wants to secure some data being used in a Synapse Spark pool:

As a data engineer, we often get requirements to encrypt, decrypt, mask, or anonymize certain columns of data in files sitting in the data lake when preparing and transforming data with Apache Spark. The extensibility feature of Spark allows us to leverage a library which is not native to Spark. One such library is Microsoft Presidio, which provides fast identification and anonymization modules for private entities in text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data, and more. It facilitates both fully automated and semi-automated PII (Personal Identifiable Information) de-identification and anonymization flows on multiple platforms.

In this blog post, I am going to demonstrate step by step how to download and use this library to meet the above requirements with Spark pool of Azure Synapse Analytics.

Read on to see how it works.

Comments closed

Interacting with Microsoft Graph API via Synapse

Paul Hernandez starts a new series:

In this and the next post I want to show you how to connect to the Microsoft Graph API, request some data, process it and store it in a database using Synapse Analytics. 

This first post presents a sample use case, briefly introduces the Graph API, how to create a linked service to it, and how to start querying data. In the next post a sample  Synapse pipeline will be described. The pipeline grabs some data and copies it into some target tables. Finally, I will create a sample query to showcase the newly imported data. 

Because there’s some potential confusion to people, Graph API is completely different from the idea of graph databases.

Comments closed

Azure Synapse Analytics September 2022 Update

Ryan Majidimehr has an update for us:

Serverless SQL pool relies on statistics to generate an optimal query execution plan and resource allocation. Previously, you had to manually create statistics for your CSV datasets when querying with OPENROWSET to get optimal performance due to the lack of CSV auto-statistics feature. With this improvement, serverless SQL pool will automatically create statistics for CSV datasets when needed to ensure an optimal query execution plan is generated. This feature is currently enabled for OPENROWSET only. 

How much performance improvement CSV statistics will bring depends on query complexity and the amount of data processed.

Click through for the full list of updates.

Comments closed

MERGE in Dedicated SQL Pools

Emily Tehrani notes an addition to Azure Synapse Analytics:

We are thrilled to announce that the MERGE T-SQL command for Azure Synapse Dedicated SQL pools is now Generally Available! MERGE has been a highly requested addition to the Synapse T-SQL library that encapsulates INSERTs/UPDATEs/DELETEs into a single statement, drastically easing migrations and making table synchronization a breeze.

If you do decide to use this, I’d expect it to have the same bugs which make its use on-premises a mess. As always, MERGE responsibly.

Comments closed

CI/CD for Synapse Link for SQL Server 2022

Kevin Chant makes some changes:

In another post I showed how you can use CI/CD to update both ends of Azure Synapse Link for SQL Server 2022 using Azure DevOps. Allowing you to update both a SQL Server 2022 database and an Azure Synapse Analytics dedicated SQL Pool in the same deployment pipeline.

By my own admission, that method can become complex. Plus, I showed some more advanced concepts in that post. With this in mind, I have decided to cover an easier way in this post.

Read on for the simpler technique.

Comments closed