Press "Enter" to skip to content

Category: Synapse Analytics

Troubleshooting Azure Synapse Link for SQL Server Issues

Kevin Chant diagnoses an issue:

In this post I want to cover common Azure Synapse Link for SQL storage permission issues, since I helped a fellow MVP out with this recently.

To be more precise, I want to show how you can fix one of the most common issues I tend to encounter with Azure Synapse Link for SQL: access to the Data Lake Storage Gen2 account.

I have encountered issues like this a few times now. For example, when I was performing my file tests for Azure Synapse Link for SQL Server 2022.

Click through to learn more about a couple of common issues, their causes, and resolutions.
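
The most frequent culprit is the Synapse workspace identity lacking rights on the target storage account, typically resolved by granting it the Storage Blob Data Contributor role. As a quick way to confirm whether a given identity can actually reach the Data Lake Storage Gen2 container, a minimal sketch like the following can help; the account URL and container name are placeholders.

```python
# Quick smoke test: can this identity list and write to the ADLS Gen2 container
# that Azure Synapse Link for SQL targets? Account and container names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://<yourstorageaccount>.dfs.core.windows.net"  # placeholder
FILE_SYSTEM = "synapselink"                                        # placeholder container

credential = DefaultAzureCredential()
service = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=credential)
fs = service.get_file_system_client(FILE_SYSTEM)

# Listing paths requires at least Storage Blob Data Reader on the container/account.
for path in fs.get_paths(recursive=False):
    print(path.name)

# Writing a small test file requires Storage Blob Data Contributor,
# which is what Synapse Link for SQL needs in order to land its change data.
fs.get_file_client("_permission_check.txt").upload_data(b"ok", overwrite=True)
```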


Tips for Using a Data Lakehouse

James Serra shares some advice:

As I mentioned in my Data Mesh, Data Fabric, Data Lakehouse presentation, the data lakehouse architecture, where you use a data lake with delta lake as a software layer and skip using a relational data warehouse, is becoming more and more popular. For some customers, I will recommend “Use a data lake until you can’t”. What I mean by this is to take the following steps when building a new data architecture in Azure with Azure Synapse Analytics:

Click through for six notes.
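
If you want to kick the tires on the lakehouse idea in a Synapse Spark pool, the core mechanic is just writing and reading Delta tables on the data lake. Here is a minimal sketch; the abfss path is a placeholder.

```python
# Minimal lakehouse round trip in a Synapse Spark pool: write a Delta table to the
# data lake and read it back. Delta Lake ships with the Synapse Spark runtime.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # returns the built-in session in a Synapse notebook

df = spark.createDataFrame(
    [(1, "bronze"), (2, "silver"), (3, "gold")],
    ["id", "layer"],
)

delta_path = "abfss://lake@<yourstorageaccount>.dfs.core.windows.net/demo/layers"  # placeholder

df.write.format("delta").mode("overwrite").save(delta_path)

spark.read.format("delta").load(delta_path).show()
```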


Performance Tuning a Dedicated SQL Pool

Sarath Sasidharan has some guidance for us:

Synapse dedicated pools have been battle-tested at enterprise customers across the globe. We deal with data in the magnitude of petabytes. Synapse can provide you with the scale of the cloud and the high performance required for your enterprise-grade requirements. The key to maximizing your performance is to follow best practices; check out the best practices for dedicated SQL pools in Azure Synapse Analytics.

Failure to do so causes performance issues. In such scenarios, it is important to understand where the bottlenecks are. This blog focuses on the different steps a query goes through, from the time the query is fired from the client until the results come back. A delay in any of these steps impacts the overall run-time of the query and hence indicates degraded performance.

Click through for a walkthrough of each step along the way, potential problems you could run into, and remediations for those problems. Much of the advice is similar to what you’d get with SQL Server, though there are differences interspersed throughout each level.
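
For the "where is the bottleneck" question, the dedicated SQL pool DMVs are the usual starting point. Here is a rough sketch of pulling them from Python; the server, database, and request ID are placeholders, and you could just as easily run the same queries from any SQL client.

```python
# Rough sketch: find long-running requests on a dedicated SQL pool and break one
# down by step using the distributed query DMVs. Connection details are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<yourworkspace>.sql.azuresynapse.net;"
    "DATABASE=<yourdedicatedpool>;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# Recent requests, slowest first.
cursor.execute("""
    SELECT TOP 10 request_id, status, total_elapsed_time, command
    FROM sys.dm_pdw_exec_requests
    ORDER BY total_elapsed_time DESC
""")
for row in cursor.fetchall():
    print(row.request_id, row.status, row.total_elapsed_time)

# Drill into one request to see which step (data movement, compute, etc.) dominates.
cursor.execute("""
    SELECT step_index, operation_type, status, total_elapsed_time, row_count
    FROM sys.dm_pdw_request_steps
    WHERE request_id = ?
    ORDER BY step_index
""", "QID1234")  # replace with a request_id from the first query
for row in cursor.fetchall():
    print(row.step_index, row.operation_type, row.total_elapsed_time)
```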


February 2023 Updates for Azure Synapse Analytics

Ryan Majidimehr has a new round-up for us:

Azure Synapse Runtime for Apache Spark 3.3 has been in Public Preview since November 2022. We are excited to announce that, after notable improvements in performance and stability, Azure Synapse Runtime for Apache Spark 3.3 is now Generally Available and ready for production workloads.

The essential changes include features that come from upgrading Apache Spark to version 3.3.1, Delta Lake to version 2.2.0, and Python to 3.10. 

This month’s set of changes isn’t quite as big as some prior months, though there are a couple items of great importance to make up for it.
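
If you want to confirm which runtime a given notebook session is actually on after switching a pool over, a quick version check like the one below will tell you; it is not Synapse-specific, and exposing Delta Lake's version through pip metadata as "delta-spark" is an assumption about how the runtime packages it.

```python
# Sanity check after moving a Spark pool to the 3.3 runtime: confirm the Spark,
# Python, and Delta Lake versions the session sees.
import sys
import importlib.metadata as md

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # the built-in session in a Synapse notebook

print("Spark :", spark.version)               # expect 3.3.x on the new runtime
print("Python:", sys.version.split()[0])      # expect 3.10.x
try:
    print("Delta :", md.version("delta-spark"))  # expect 2.2.x, if exposed as a pip package
except md.PackageNotFoundError:
    print("Delta : version not exposed via pip metadata")
```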


SqlPackage Support for the Serverless SQL Pool

Rui Cunha announces an update to SqlPackage.exe:

SqlPackage release 161.8089.0, dated February 13, 2023, brings a new feature: support for Synapse serverless SQL pools. According to the release notes, you can use SqlPackage to extract and publish both external and internal objects from serverless SQL pools. The T-SQL surface area of serverless SQL pools is limited by design to external tables, external views, stored procedures, functions, statistics, and credentials.

The following features are included in support for serverless SQL pools:

  • DELTA external file format
  • OPENROWSET provider CosmosDB with PROVIDER, CONNECTION, OBJECT, and CREDENTIAL or SERVER_CREDENTIAL specified
  • OPENROWSET format SStream
  • WITH () clause on OPENROWSET

Rui also includes a quick demonstration of this new functionality.
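
To give a sense of the shape of it, here is a rough sketch of scripting an extract against a serverless SQL pool. It assumes SqlPackage 161.8089.0 or later is installed and on the PATH, and the server, database, and credential values are placeholders.

```python
# Rough sketch: extract a .dacpac from a Synapse serverless SQL pool with SqlPackage.
# Assumes SqlPackage is on the PATH; server, database, and credentials are placeholders.
import subprocess

subprocess.run(
    [
        "sqlpackage",
        "/Action:Extract",
        "/SourceServerName:<yourworkspace>-ondemand.sql.azuresynapse.net",
        "/SourceDatabaseName:<yourserverlessdb>",
        "/SourceUser:<sqluser>",       # or use Azure AD authentication parameters instead
        "/SourcePassword:<password>",
        "/TargetFile:serverless_pool.dacpac",
    ],
    check=True,
)
```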


Creating a Disaster Recovery Plan for Synapse

Freddie Santos talks HA/DR with Synapse:

Many of our customers have been asking about creating a disaster recovery plan for their Synapse Workspace. In a new blog series, we will cover the basics of disaster recovery and business continuity, discussing available options and custom solutions.

In this first post, we’ll review important concepts and questions to answer before building a disaster recovery plan, including the differences between High Availability and Disaster Recovery.

The focus in this post is on the dedicated SQL pool and Azure Data Lake Storage Gen2 (because people still think about Gen1?), though that's the majority of what you'd need to think about—Spark pools and the serverless SQL pool really work from the data lake. There are also Data Explorer pools, which have their own storage and HA/DR capabilities.


Unit Testing Spark Notebooks in Synapse

Arun Sethia grabs the oscilloscope:

In this blog post, we will cover how to test and create unit test cases for Spark jobs developed using Synapse Notebook. This is an extension of my previous blog, Synapse – Choosing Between Spark Notebook vs Spark Job Definition, where we discussed selecting between Spark Notebook and Spark Job Definition. Unit testing is an automated approach that developers use to test individual self-contained code units. By verifying code behavior early, it helps to streamline coding practices for larger systems.

Arun covers three major use cases: when your code is in an external library, when it is in a separate notebook, and when it is in the same notebook.
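
For the external-library and separate-notebook cases, the tests usually boil down to something like this illustrative sketch: factor the transformation into a plain function, then exercise it with a local SparkSession under pytest. The function here is hypothetical, not one of Arun's.

```python
# Illustrative sketch: unit-test a Spark transformation factored out of a Synapse
# notebook. Run locally with pytest; the function and data are hypothetical.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_full_name(df):
    """The kind of small, self-contained unit a notebook can delegate to."""
    return df.withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))


@pytest.fixture(scope="session")
def spark():
    # A local session is enough for unit tests; no Synapse pool required.
    return SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()


def test_add_full_name(spark):
    source = spark.createDataFrame([("Ada", "Lovelace")], ["first_name", "last_name"])
    result = add_full_name(source).collect()
    assert result[0]["full_name"] == "Ada Lovelace"
```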


Working with Managed Private Endpoints in Synapse

Sergio Fonseca continues a series on Synapse connectivity:

When you create your Azure Synapse workspace, you can choose to associate it with an Azure Virtual Network. The Virtual Network associated with your workspace is managed by Azure Synapse and is called a Managed Workspace Virtual Network, or Synapse Managed VNET.

I am 100% in favor of using managed vNETs with Synapse and about 40% in favor of using Data Exfiltration Protection—it’s a lot lower because of the impact it has on your developers, though if you need it, developers will just have to deal with the added pain.


Rolling Your Own Serverless SQL Pool Database Project

Kevin Chant doesn’t let the lack of support for a product limit him:

In this post I want to share how I created a homemade serverless SQL Pool database project.

Because I know people are keen to work this way right now, mostly due to the comments I received when I covered how to deploy a dacpac to a serverless SQL pool.

By the end of this post you will know how I created a database project for it, plus how you can deploy the contents of the database project with Azure DevOps. I also share plenty of links along the way.

Though Kevin did run into some challenges trying to hack in a solution, so it’s not quite as useful as you’d first hope.
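
To give a flavor of the deployment half, a bare-bones approach is to have the pipeline loop over the project's scripts and run them against the serverless endpoint in order. This sketch is illustrative rather than Kevin's actual setup, and the paths and connection details are placeholders.

```python
# Bare-bones sketch of a deployment step: run every .sql file in a folder against a
# serverless SQL pool, in name order. Paths and connection details are placeholders.
from pathlib import Path

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<yourworkspace>-ondemand.sql.azuresynapse.net;"
    "DATABASE=<yourserverlessdb>;"
    "Authentication=ActiveDirectoryServicePrincipal;"
    "UID=<client-id>;PWD=<client-secret>;",
    autocommit=True,  # keep DDL against the serverless pool outside a transaction
)
cursor = conn.cursor()

for script in sorted(Path("serverless_project").glob("*.sql")):
    print(f"Deploying {script.name}")
    # Naive GO-splitting; a real pipeline would want something sturdier.
    for batch in script.read_text().split("\nGO\n"):
        if batch.strip():
            cursor.execute(batch)
```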


Synapse and Azure ML Pipelines

Santosh Thomas integrates two Azure products:

As more customers standardize on the Synapse data platform, enabling machine learning workflows through Azure Machine Learning (Azure ML) becomes particularly interesting. This is especially true as more customers look to bring their data engineering and data science practices together and mature capabilities on both sides.

The goal of this blog post is to highlight how Synapse and Azure ML can work well together to deliver key insights. This is motivated by a scenario where a customer modernized their data platform on Azure Synapse but was looking to improve their data science practices through Azure ML. The focus of this blog is to expose existing functionality, and it is not a “hardened” solution with security or other cloud best practice implementations. The workflow steps also assume some level of comfort with Python and working with the Azure Python SDKs.

There was a time in which Microsoft wanted us to remain in Synapse for machine learning tasks, but that time is gone: the emphasis is definitely to do machine learning tasks in Azure ML, regardless of where the data lives…unless there’s a Spark job involved, in which case things get all weird again.
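
For a sense of the hand-off, once Synapse has landed curated data in the lake, submitting the training side from the Azure ML Python SDK (v2) looks roughly like this; the workspace details, datastore path, compute target, and training script are all placeholders.

```python
# Rough sketch of the Azure ML side of the hand-off: submit a training script as a
# command job once Synapse has curated data in the lake. All names are placeholders.
from azure.ai.ml import Input, MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<azure-ml-workspace>",
)

job = command(
    code="./src",  # folder containing a hypothetical train.py
    command="python train.py --input-path ${{inputs.curated}}",
    inputs={
        "curated": Input(
            type="uri_folder",
            path="azureml://datastores/lake/paths/curated/sales",  # placeholder datastore path
        )
    },
    environment="azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",
)

returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)
```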
