Press "Enter" to skip to content

Category: Synapse Analytics

CI/CD with Azure Synapse Link for SQL Server 2022

Kevin Chant gives us the whole story:

In this post I want to show you a complete CI/CD experience for Azure Synapse Link for SQL Server 2022 tables. Which uses a YAML Pipeline in Azure DevOps. Including how to automatically stop and start it in the pipeline.

In a previous post I showed how an easier way to perform CI/CD for Azure Synapse Link for SQL Server 2022. Where you only need to stop the link, update the SQL Server database and afterwards start the link again.

However, the best CI/CD solutions are the ones where you do not do any manual work at all. This includes stopping and starting the link.

And that’s just what Kevin gives us.

Comments closed

Reading Delta Tables from Power BI via Synapse Serverless

Dan English is up for a data lake change:

In this post I just wanted to show the ability to use the Delta Lake format that is very common now with Power BI. I will go over a quick example of creating the files to reference, building a view to use with Power BI, and then querying the data in Power BI.

In my Synapse Workspace I created a Linked service connection to an Azure SQL Database that has the AdventureWorksLT database loaded which is the sample database you can create when you first create a SQL instance in Azure and here is a walkthrough link and see the Additional settings section.

Dan shows how to create the lake files in delta format via Synapse pipeline and then how to query the data from there.

Comments closed

Named Entity Encryption in Spark

Arshad Ali wants to secure some data being used in a Synapse Spark pool:

As a data engineer, we often get requirements to encrypt, decrypt, mask, or anonymize certain columns of data in files sitting in the data lake when preparing and transforming data with Apache Spark. The extensibility feature of Spark allows us to leverage a library which is not native to Spark. One such library is Microsoft Presidio, which provides fast identification and anonymization modules for private entities in text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data, and more. It facilitates both fully automated and semi-automated PII (Personal Identifiable Information) de-identification and anonymization flows on multiple platforms.

In this blog post, I am going to demonstrate step by step how to download and use this library to meet the above requirements with Spark pool of Azure Synapse Analytics.

Read on to see how it works.

Comments closed

Interacting with Microsoft Graph API via Synapse

Paul Hernandez starts a new series:

In this and the next post I want to show you how to connect to the Microsoft Graph API, request some data, process it and store it in a database using Synapse Analytics. 

This first post presents a sample use case, briefly introduces the Graph API, how to create a linked service to it, and how to start querying data. In the next post a sample  Synapse pipeline will be described. The pipeline grabs some data and copies it into some target tables. Finally, I will create a sample query to showcase the newly imported data. 

Because there’s some potential confusion to people, Graph API is completely different from the idea of graph databases.

Comments closed

Azure Synapse Analytics September 2022 Update

Ryan Majidimehr has an update for us:

Serverless SQL pool relies on statistics to generate an optimal query execution plan and resource allocation. Previously, you had to manually create statistics for your CSV datasets when querying with OPENROWSET to get optimal performance due to the lack of CSV auto-statistics feature. With this improvement, serverless SQL pool will automatically create statistics for CSV datasets when needed to ensure an optimal query execution plan is generated. This feature is currently enabled for OPENROWSET only. 

How much performance improvement CSV statistics will bring depends on query complexity and the amount of data processed.

Click through for the full list of updates.

Comments closed

MERGE in Dedicated SQL Pools

Emily Tehrani notes an addition to Azure Synapse Analytics:

We are thrilled to announce that the MERGE T-SQL command for Azure Synapse Dedicated SQL pools is now Generally Available! MERGE has been a highly requested addition to the Synapse T-SQL library that encapsulates INSERTs/UPDATEs/DELETEs into a single statement, drastically easing migrations and making table synchronization a breeze.

If you do decide to use this, I’d expect it to have the same bugs which make its use on-premises a mess. As always, MERGE responsibly.

Comments closed

CI/CD for Synapse Link for SQL Server 2022

Kevin Chant makes some changes:

In another post I showed how you can use CI/CD to update both ends of Azure Synapse Link for SQL Server 2022 using Azure DevOps. Allowing you to update both a SQL Server 2022 database and an Azure Synapse Analytics dedicated SQL Pool in the same deployment pipeline.

By my own admission, that method can become complex. Plus, I showed some more advanced concepts in that post. With this in mind, I have decided to cover an easier way in this post.

Read on for the simpler technique.

Comments closed

Rebuilding a Dedicated SQL Pool via Azure DevOps

Sarath Sasidharan clones an Azure Synapse Analytics dedicated SQL pool:

There are many scenarios where you want to create a new Synapse dedicated SQL pool environment based on an existing Synapse dedicated SQL pool environment. This may be required when you need to create a development or test environment based on your production environment by copying complete schemas and without copying data.

Note that this process won’t move the data itself—given that you’re starting with terabytes for an effective dedicated SQL pool, trying to create a bacpac would be an exercise in misery.

Comments closed

Tips for using Synapse Database Templates

James Serra provides some guidance:

I had previously blogged about Azure Synapse Analytics database templates, and wanted to follow-up with some notes and tips on that feature as I have been involved on a project that is using it:

– Purview does not yet pull in the metadata for database templates (table/field descriptions and table relationships). Right now it pulls in the metadata as if it was a SQL table or as if it was a file in ADLS. Both just have the basic information supported by those types. The SQL one is probably preferred

– Power BI does not import the table and field descriptions when connecting to a lake database (where the database templates are stored), but it does import the table relationships. You can see the table descriptions by hovering over the table names in the navigator when importing tables using the “Azure Synapse Analytics workspace (Beta)” connector. Note you are not able to see the table descriptions when hovering over the table names using the “Azure Synapse Analytics SQL” connector. Also note the “Select Related Tables” button does not work in the navigator

Click through for more notes from the field.

Comments closed

Loading the Synapse Data Explorer Pool

Gauri Mahajan loads some event data for analysis:

In my previous article, Getting started with Data Explorer pools in Azure Synapse, we learned how to create Data Explorer pools in Azure Synapse and the unique value that Data Explorer brings to semi-structured and free-text data. The creation of the Data Explorer pool is the first step in the process. After the pool is created, one can create data structures, ingest data and then use it for consumption. Data Explorer pool interface provides different ways of ingesting data into the pool including one-click ingestion.

Some of these techniques are really straightforward. Others—especially if you’re talking about large amounts of data—do require installing and working with local tooling.

Comments closed