
Category: Synapse Analytics

Azure Synapse Analytics June 2022 Updates

Ryan Majidimehr has some updates for us:

Fuzzy matching with a sliding similarity score option has been added to the Join transformation in Mapping Data Flows. You can create inner and outer joins on data values that are similar rather than exact matches! Previously, you would have had to use an exact match. The sliding scale value goes from 60% to 100%, making it easy to adjust the similarity threshold of the match. 

Read on for the full list of updates.
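
To make the idea of a similarity threshold concrete, here is a minimal Python sketch of a fuzzy inner join. It is only an illustration: it uses the standard library's `difflib.SequenceMatcher` as the similarity measure, which is an assumption and not the algorithm Mapping Data Flows uses, and the function names and sample values are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_inner_join(left, right, threshold=80.0):
    """Pair each left value with every right value scoring at or above threshold.

    The 60-100 range mirrors the sliding scale described above; the scoring
    itself is a stand-in, not the service's own implementation.
    """
    if not 60.0 <= threshold <= 100.0:
        raise ValueError("threshold must be between 60 and 100")
    return [(l, r) for l in left for r in right if similarity(l, r) >= threshold]


# Hypothetical sample data.
customers = ["Contoso Ltd", "Fabrikam Inc"]
vendors = ["Contoso Ltd.", "Northwind Traders"]

# A threshold of 80 tolerates the trailing period; 100 demands an exact match.
matches = fuzzy_inner_join(customers, vendors, threshold=80.0)
exact = fuzzy_inner_join(customers, vendors, threshold=100.0)
```

Lowering the threshold admits more near matches at the risk of false positives, which is the trade-off the sliding scale exposes.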


Triggering a Power BI Dataset Refresh from Synapse

Nick Edwards updates a dataset:

Log in to powerbi.com and, in the top right-hand corner, locate “Settings” and then “Admin portal”.

Under “Tenant settings” locate “Developer Settings” and then “Allow service principals to use Power BI APIs”.

Set this service to “Enabled” using the toggle. Next under the heading “Apply to:” select “Specific security groups (Recommended)”. Next add the newly created security group “AzureSynapsePowerBIIntegration” and click apply.

Click through for the full process.
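
Once a service principal is allowed to use the Power BI APIs, the refresh itself comes down to a single REST call. The Python sketch below builds that request against the Power BI REST API's dataset refresh endpoint without sending it. The workspace ID, dataset ID, and token are placeholders, and acquiring the token for the service principal is assumed to happen elsewhere; this is not necessarily how the linked article wires it up inside Synapse.

```python
import json
import urllib.request

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"


def build_refresh_request(group_id: str, dataset_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that queues a dataset refresh."""
    url = f"{POWER_BI_API}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    # Assumption: no e-mail notification is wanted for an automated refresh.
    body = json.dumps({"notifyOption": "NoNotification"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder identifiers; substitute real values from your tenant.
request = build_refresh_request(
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002",
    "placeholder-access-token",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

In a Synapse pipeline, a Web activity would typically issue the same call, with the URL and body taking the place of the Python request object.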


Finding Assigned Synapse RBAC Roles in Powershell

Charith Caldera wants to know your access level:

One of the key difficulties most customers face while retrieving or assigning role-based access control in Azure Synapse Analytics is that they cannot find the correct usernames, group names, or service principal names using the PowerShell cmdlet “Get-AzSynapseRoleAssignment”. The cmdlet provides only limited information, and its output is difficult to understand because it contains object IDs.

Read on for a script which helps with this problem.


Monitoring Open Connections in the Serverless SQL Pool

Liliam Leme has a pair of queries for us:

Consider a scenario where you are trying to monitor the connections from other applications to serverless SQL. I hit this need while trying to understand how many open connections I had coming from an application, and whether I had hit some kind of limit on serverless SQL. Spoiler: There is no limit for connections on Synapse serverless SQL as you would find with a dedicated SQL pool (formerly SQL DW).

Alternatively, the limit to the number of serverless SQL pool connections is how much cash you have in your bank account…though given that it’s $5 per TB processed, if you’re writing good queries, that’s a lot of queries and connections.
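
As a rough illustration of that arithmetic, the Python sketch below estimates what a query costs and how many queries a budget covers at the $5 per TB figure above. It assumes decimal terabytes and ignores any minimum charge or rounding the service applies, so treat the numbers as back-of-the-envelope only; the function names are made up for the example.

```python
PRICE_PER_TB_USD = 5.0   # figure quoted in the text above
BYTES_PER_TB = 10 ** 12  # assumption: decimal terabytes


def estimated_cost_usd(bytes_processed: int) -> float:
    """Estimate the charge for a query that processes the given number of bytes."""
    return PRICE_PER_TB_USD * bytes_processed / BYTES_PER_TB


def queries_per_budget(budget_usd: float, bytes_per_query: int) -> int:
    """Estimate how many equally sized queries fit within a budget."""
    affordable_bytes = budget_usd / PRICE_PER_TB_USD * BYTES_PER_TB
    return int(affordable_bytes // bytes_per_query)


# Hypothetical workload: each query scans about 100 MB.
one_terabyte_cost = estimated_cost_usd(10 ** 12)
queries_for_five_dollars = queries_per_budget(5.0, 100 * 10 ** 6)
```

Under these assumptions, a query that reads 100 MB costs a small fraction of a cent, which is why well-written queries make connection counts a non-issue for the bill.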


Azure Synapse Analytics May 2022 Updates

Ryan Majidimehr lays out some updates for Azure Synapse Analytics:

Serverless SQL pools let you query files in the data lake without knowing the schema upfront. The best practice was to specify the lengths of character columns to get optimal performance. Not anymore!  

Previously, you had to explicitly define the schema to get optimal query performance. In this case, the column countries_and_territories is defined as varchar(50).

There are some interesting updates in this month’s release, including the public preview of Azure Synapse Link for SQL, which connects to Azure SQL DB and SQL Server 2022.
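
For readers who have not seen an explicit schema on a serverless query, the Python sketch below assembles one as a T-SQL string using `OPENROWSET` with a `WITH` clause. It is not the code from the original post: the storage URL is a placeholder, the Parquet format is an assumption, and only the column name and `varchar(50)` type come from the excerpt above.

```python
def openrowset_query(file_url: str, columns: dict[str, str], file_format: str = "PARQUET") -> str:
    """Build a serverless SQL query that reads files with an explicit schema."""
    with_clause = ",\n        ".join(
        f"{name} {sql_type}" for name, sql_type in columns.items()
    )
    return (
        "SELECT TOP 10 *\n"
        "FROM OPENROWSET(\n"
        f"        BULK '{file_url}',\n"
        f"        FORMAT = '{file_format}'\n"
        "    )\n"
        "    WITH (\n"
        f"        {with_clause}\n"
        "    ) AS [result];"
    )


# Placeholder location; replace with a real path in your data lake.
explicit_schema_query = openrowset_query(
    "https://<storage-account>.dfs.core.windows.net/<container>/<path>/*.parquet",
    {"countries_and_territories": "varchar(50)"},
)
print(explicit_schema_query)
```

With the update described above, leaving out the `WITH` clause is no longer expected to cost performance on character columns, though an explicit schema remains useful when you want to control types.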


Azure Synapse Link for SQL

Chuck Heinzelman makes an announcement:

Azure Synapse Link for SQL is an automated system for replicating data from your transactional databases (both SQL Server 2022 and Azure SQL Database) into a dedicated SQL pool in Azure Synapse Analytics. The process of setting up a link from your SQL data to Azure Synapse takes just a few clicks and a matter of minutes rather than hours or days for traditional ETL processes. Once configured, your initial data is replicated into the target dedicated SQL pool. After the initial table seeding, changes made to your source data are replicated in near real-time. 

I’ll be interested in seeing how it performs and how efficient that change feed processor is.


Low-Code Churn Prediction with Synapse Analytics

Gavita Regunath shows off a capability in Azure Synapse Analytics:

We will build a machine learning solution to predict churn using Azure Synapse Analytics and Azure Machine Learning.

Azure Synapse Analytics is Microsoft’s limitless analytics platform that combines enterprise data warehousing and big data analytics. In simple terms, it is a one-stop-shop that allows you to ingest, prepare, and manage data that can then be used for machine learning and business intelligence, all from a single place. It provides a unified platform and encourages collaboration between data and machine learning professionals.

This article will show you how to build an end-to-end solution to train a machine learning model from Azure Synapse Analytics using AutoML functionality within Azure Machine Learning. Using the T-SQL Predict statement, we can then use the trained machine learning model to make predictions against the churn dataset stored in the SQL Pool table. One of the key benefits of working from within Azure Synapse is that all the necessary steps required to train and make predictions with the trained model can be done from a single platform, Azure Synapse.

Click through for the three-step process and a demonstration.


Comparing Databricks to Synapse Spark Pools

Corrinna Peters makes comparisons:

There are different cases for using each, depending on specific needs and requirements. Synapse and Databricks are similar, but each has its own specialities, or rather areas where it is ahead of the other.

Data Lake – they both allow you to query data from the data lake. Synapse uses either the SQL on-demand pool or Spark, and Databricks uses the Databricks workspace once you have mounted the data lake. If you are predominantly a SQL user and prefer the code and the BI developer feel, then Synapse would be the correct choice, whereas if you are a Data Scientist and prefer to code in Python or R, then Databricks would feel more at home.

Read on for a nuanced take. My less nuanced take is that Databricks beats the pants off of Synapse Spark pools in terms of performance, while Synapse has a much better overall ecosystem, expanding beyond Spark and into T-SQL (in two flavors) and log/event analytics with KQL. If you’re spending 100% of your time in Spark and don’t care about the rest, use Databricks; if Spark is a relatively small part of your warehousing work, use Synapse.


This Month in Synapse

Ryan Majidimehr recaps the updates for Azure Synapse Analytics in April of 2022:

Since the GA of Synapse, customers have asked for a fine-grained RBAC (role-based access control) role that allows a user persona to monitor the execution of Synapse Pipelines and Spark applications without having the ability to run or cancel the execution of these applications.  

Now, customers can assign the Synapse Monitoring Operator role to such monitoring personas. This allows organizations to stay compliant while having flexibility in the delegation of tasks to individuals or teams. 

That is indeed useful.


Synapse Database Templates GA

Kevin Schofield makes an announcement:

We’re pleased to announce today that Synapse Database Templates are now Generally Available and that we are also making available three additional Synapse Database Templates for Healthcare Insurance, Healthcare Providers, and R&D and Clinical Trials.

The Healthcare Insurance template is a comprehensive data model that addresses the typical data requirements of organizations providing insurance to cover healthcare needs (sometimes known as Payors).

The Healthcare Providers template is a comprehensive data model that addresses the typical data requirements of organizations providing healthcare services.

The R&D and Clinical Trials template is a comprehensive data model that addresses the typical data requirements of organizations involved in research and development and clinical trials of pharmaceutical products and devices.

Read on to learn more about how these templates work and what you can do with them.
