Press "Enter" to skip to content

Category: Synapse Analytics

Monitoring Open Connections in the Serverless SQL Pool

Liliam Leme has a pair of queries for us:

Consider a scenario where you are trying to monitor the connections from other applications to serverless SQL. I hit this need while trying to understand how many open connections I had coming from an application, and whether I had hit some kind of limitation on serverless SQL. Spoiler: there is no limit on connections to Synapse serverless SQL as you would find with a dedicated SQL pool (formerly SQL DW).

Put another way, the limit to the number of serverless SQL pool connections is how much cash you have in your bank account…though given that it’s $5 per TB processed, if you’re writing good queries, that’s a lot of queries and connections.
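If you want a rough idea before clicking through, a sketch along these lines (using the standard DMVs; these are not necessarily Liliam’s exact queries) shows open sessions per application:

-- Count open sessions per application and login.
-- A sketch against the standard DMVs, not the queries from the post.
SELECT
    s.program_name,
    s.login_name,
    COUNT(*) AS open_sessions
FROM sys.dm_exec_sessions AS s
    INNER JOIN sys.dm_exec_connections AS c
        ON s.session_id = c.session_id
WHERE s.is_user_process = 1
GROUP BY s.program_name, s.login_name
ORDER BY open_sessions DESC;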


Azure Synapse Analytics May 2022 Updates

Ryan Majidimehr lays out some updates for Azure Synapse Analytics:

Serverless SQL pools let you query files in the data lake without knowing the schema upfront. The best practice was to specify the lengths of character columns to get optimal performance. Not anymore!  

Previously, you had to explicitly define the schema to get optimal query performance. In this case, the column countries_and_territories is defined as varchar(50):  
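The post includes the full example; roughly, the before and after look like this (a sketch with an invented storage path and a single column):

-- Before: explicit column lengths in the WITH clause for best performance.
SELECT TOP 10 countries_and_territories
FROM OPENROWSET(
    BULK 'https://mylake.dfs.core.windows.net/data/cases.csv',
    FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE
) WITH (countries_and_territories VARCHAR(50)) AS r;

-- After the update: the WITH clause is no longer needed to get
-- good performance from character columns.
SELECT TOP 10 countries_and_territories
FROM OPENROWSET(
    BULK 'https://mylake.dfs.core.windows.net/data/cases.csv',
    FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE
) AS r;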

There are some interesting updates in this month’s release, including the public preview of Azure Synapse Link for SQL, which connects to Azure SQL DB and SQL Server 2022.


Azure Synapse Link for SQL

Chuck Heinzelman makes an announcement:

Azure Synapse Link for SQL is an automated system for replicating data from your transactional databases (both SQL Server 2022 and Azure SQL Database) into a dedicated SQL pool in Azure Synapse Analytics. The process of setting up a link from your SQL data to Azure Synapse takes just a few clicks and a matter of minutes rather than hours or days for traditional ETL processes. Once configured, your initial data is replicated into the target dedicated SQL pool. After the initial table seeding, changes made to your source data are replicated in near real-time. 

I’ll be interested in seeing how it performs and how efficient that change feed processor is.


Low-Code Churn Prediction with Synapse Analytics

Gavita Regunath shows off a capability in Azure Synapse Analytics:

We will build a machine learning solution to predict churn using Azure Synapse Analytics and Azure Machine Learning.

Azure Synapse Analytics is Microsoft’s limitless analytics platform that combines enterprise data warehousing and big data analytics. In simple terms, it is a one-stop-shop that allows you to ingest, prepare, and manage data that can then be used for machine learning and business intelligence, all from a single place. It provides a unified platform and encourages collaboration between data and machine learning professionals.

This article will show you how to build an end-to-end solution to train a machine learning model from Azure Synapse Analytics using AutoML functionality within Azure Machine Learning. Using the T-SQL PREDICT statement, we can then use the trained machine learning model to make predictions against the churn dataset stored in the SQL pool table. One of the key benefits of working from within Azure Synapse is that all the steps required to train the model and make predictions with it can be done from a single platform, Azure Synapse.
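For a sense of the scoring step, a PREDICT call in a dedicated SQL pool looks roughly like this (a sketch; the model and table names are invented, and the trained model is assumed to be stored in ONNX format in a table):

-- Sketch: score rows in the SQL pool with a trained ONNX model.
SELECT
    d.CustomerId,
    p.predicted_label AS ChurnPrediction
FROM PREDICT(
        MODEL = (SELECT Model FROM dbo.Models WHERE ModelName = 'ChurnModel'),
        DATA = dbo.CustomerChurn AS d,
        RUNTIME = ONNX
     ) WITH (predicted_label BIGINT) AS p;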

Click through for the three-step process and a demonstration.


Comparing Databricks to Synapse Spark Pools

Corrinna Peters makes comparisons:

There are different cases for using each depending on specific needs and requirements. Synapse and Databricks are similar, but each has its own areas of speciality, or rather areas where it is ahead of the other.

Data Lake – both allow you to query data from the data lake. Synapse uses either the SQL on-demand (serverless) pool or Spark, while Databricks uses the Databricks workspace once you have mounted the data lake. If you are predominantly a SQL user and prefer the code and the BI developer feel, then Synapse would be the correct choice, whereas if you are a Data Scientist and prefer to code in Python or R, then Databricks would feel more like home.

Read on for a nuanced take. My less nuanced take is that Databricks beats the pants off of Synapse Spark pools in terms of performance. Synapse has a much better overall ecosystem, expanding beyond Spark and into T-SQL (in two flavors) and log/event analytics with KQL. If you’re spending 100% of your time in Spark and don’t care about the rest, use Databricks; if Spark is a relatively small part of your warehousing work, use Synapse.


This Month in Synapse

Ryan Majidimehr recaps the updates for Azure Synapse Analytics in April of 2022:

Since the GA of Synapse, customers have asked for a fine-grained RBAC (role-based access control) role that allows a user persona to monitor the execution of Synapse Pipelines and Spark applications without having the ability to run or cancel the execution of these applications.  

Now, customers can assign the Synapse Monitoring Operator role to such monitoring personas. This allows organizations to stay compliant while having flexibility in the delegation of tasks to individuals or teams. 

That is indeed useful.


Synapse Database Templates GA

Kevin Schofield makes an announcement:

We’re pleased to announce today that Synapse Database Templates are now Generally Available and that we are also making available three additional Synapse Database Templates for Healthcare Insurance, Healthcare Providers, and R&D and Clinical Trials.

The Healthcare Insurance template is a comprehensive data model that addresses the typical data requirements of organizations providing insurance to cover healthcare needs (sometimes known as Payors).

The Healthcare Providers template is a comprehensive data model that addresses the typical data requirements of organizations providing healthcare services.

The R&D and Clinical Trials template is a comprehensive data model that addresses the typical data requirements of organizations involved in research and development and clinical trials of pharmaceutical products and devices.

Read on to learn more about how these templates work and what you can do with them.


Cross-Subscription Restore for Dedicated SQL Pools

Steve Howard announces some good news:

We are excited to announce the release of cross-subscription restore. This has been one of our top requested features from customers as it unlocks multiple scenarios from dev/test to simplified billing at the subscription level for restored data warehouses.

Click through to see how you can do this. There was a workaround in the past, but this should be quite a bit faster.


Streaming Data into Synapse Dedicated SQL Pool

Lionel Penuchot loads some data:

This article reviews a common pattern of streaming data (i.e. real-time message ingestion) into a Synapse dedicated SQL pool. It opens a discussion on the simple, standard way to implement this, as well as the challenges and drawbacks. It then presents an alternate solution which enables optimal performance and greatly reduces maintenance tasks when using clustered columnstore indexes. This is aimed at developers, DBAs, architects, and anyone who works with streams of data that are captured in real-time.

I’d probably avoid the MERGE statement in there because of how many problems there are with it. That said, this is a useful pattern for trickle-loading columnstore tables.
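The broad shape of the pattern (a sketch with invented table names, not the article’s exact code) is to land streamed rows in a staging heap and flush them to the columnstore table in large batches:

-- Streamed rows land in a staging heap: cheap, frequent inserts.
INSERT INTO dbo.Events_Staging (EventId, EventTime, Payload)
VALUES (@EventId, @EventTime, @Payload);

-- A periodic job flushes a batch to the columnstore table. Large
-- batches compress into healthy rowgroups instead of trickling
-- rows into the delta store. The watermark keeps the insert and
-- delete aligned if new rows arrive mid-flush.
BEGIN TRANSACTION;

INSERT INTO dbo.Events   -- table with a clustered columnstore index
SELECT EventId, EventTime, Payload
FROM dbo.Events_Staging
WHERE EventTime <= @FlushWatermark;

DELETE FROM dbo.Events_Staging
WHERE EventTime <= @FlushWatermark;

COMMIT;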
