
Category: Synapse Analytics

Synapse Runtime for Spark 3.3 Now in Public Preview

Estera Kot has an announcement:

We are excited to announce the preview availability of Apache Spark™ 3.3 on Synapse Analytics. The essential changes include features which come from upgrading Apache Spark to version 3.3.1 and upgrading Delta Lake to version 2.1.0.

Check out the official release notes for Apache Spark 3.3.0 and Apache Spark 3.3.1 for the complete list of fixes and features. In addition, review the migration guidelines between Spark 3.2 and 3.3 to assess potential changes to your applications, jobs and notebooks.

There’s a lot in there, though I did snicker a bit at log4j 2 being more secure than log4j v1 given what we saw last year. That gaping hole has since been fixed.

Comments closed

A Crash Course on Synapse Studio

Kevin Chant wants six minutes of your time:

In this post I want to do a six-minute crash course about Synapse Studio. I wanted to do this follow-up post for a couple of reasons.

First reason is because a while ago somebody who was fairly new to Azure Data Engineering Services mentioned that they thought a lot of my posts were for advanced users. So, I showed them a previous post which was a five-minute crash course about Synapse Studio.

Whilst showing them that post I realized that some of the screenshots were out of date. With this in mind I thought I would do an updated version of the crash course for Synapse Studio, which also allows me to highlight where to find some features.

Start your timers and get reading.


Roll Your Own Row-Level Security for the Serverless SQL Pool

Randheer Parmar wants row-level security:

Row-level security is a key requirement for most database or data lake applications. Most databases have natively built row-level security, but the Synapse serverless SQL pool doesn’t support this functionality out of the box. In this article, we will see how to implement it.

Row-level security has always seemed to me to be a great idea but not one I can implement because its performance cost is always too high.


InvalidAbfsRestOperationException in Synapse Managed VNet

Kamil Nowinski goes down a rabbit hole:

This happens on the customer’s Synapse workspace where we have a public network disabled, so only private endpoint and managed VNET are available. Additionally, you probably spotted that it took over 3 minutes to actually get this message. Hence, as a next step, in order to minimize the potential causes, I simplified the query to make sure I have access to the Storage, by listing the files:

Click through for a story of pain, followed by glorious resolution.


Data Exfiltration Protection and Synapse Pipelines

Luke Moloney shuts it down:

Before we discuss how DEP applies to Synapse Pipelines, it is important to level-set on some Synapse Pipelines specific concepts – if you are familiar with Synapse Pipelines or Azure Data Factory you can skip over this section and jump to Synapse Pipeline connectivity without DEP enabled.

For a more generalized introduction to Synapse Pipelines check out this doc article.

Synapse Pipelines enables users to connect to a range of different data services, through what is called a Linked Service. 

The big trick, using self-hosted integration runtimes, is something Luke spends a fair amount of time on.


Azure Synapse Analytics Updates for November 2022

Ryan Majidimehr has a bundle of updates for us:

We are always working to improve Azure Analytics Spark performance. We are making significant changes that will increase Spark performance by up to 77%.  

Based on our testing using the 1TB TPC-H industry benchmark, you’re likely to see up to 77% increased performance. While your workload may perform differently than the TPC-H benchmark, everyone is expected to see improved performance. These Spark performance improvements come from moving to the latest Azure v5 VMs which have improved CPU performance, increased temporary SSD throughput, and lastly higher remote storage IOPS.  

Click through for a whole bunch of updates.


Reading Serverless SQL Pool Data with Data Factory

Koen Verbeeck wants to read from the serverless SQL pool in Azure Synapse Analytics:

We have some data we can query using the serverless SQL pools in Azure Synapse Analytics. For this blog post, I’m querying data that is stored in Azure Cosmos DB. Read the blog post How to Store Normalized SQL Server Data into Azure Cosmos DB to learn more about how that data got there.

Suppose I now want to read the data using Azure Data Factory. You can read data from Cosmos DB directly, but let’s pretend I want to do some transformations first using my favorite language: SQL. How can we do this?

Read on to learn how.


REST APIs for Synapse Spark Pools

Abid Nazir Guroo looks at some endpoints:

Azure Synapse Analytics Representational State Transfer (REST) APIs are secure HTTP service endpoints that support creating and managing Azure Synapse resources using Azure Resource Manager and Azure Synapse web endpoints. This article provides instructions on how to set up and use Synapse REST endpoints and describes the Apache Spark Pool operations supported by REST APIs.

Read on to see some of the Spark pool management options available to you via the REST API.
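To make the idea of an Azure Resource Manager endpoint for Spark pools a little more concrete, here is a minimal sketch in Python that only builds the request URL. It is not taken from the linked article: the helper name `spark_pool_url`, the placeholder IDs, and the default `api-version` value are assumptions for illustration, and a real call would also need an Azure AD bearer token in the `Authorization` header.

```python
from urllib.parse import quote, urlencode

# Azure Resource Manager base endpoint for the public cloud.
ARM_ENDPOINT = "https://management.azure.com"


def spark_pool_url(subscription_id, resource_group, workspace,
                   pool=None, api_version="2021-06-01"):
    """Build the ARM URL for Spark pools (bigDataPools) in a workspace.

    Hypothetical helper: the default api_version is an assumption, so
    check the current REST reference before relying on it.
    With pool=None the URL addresses the list of pools in the workspace;
    with a pool name it addresses that single pool.
    """
    segments = [
        "subscriptions", subscription_id,
        "resourceGroups", resource_group,
        "providers", "Microsoft.Synapse",
        "workspaces", workspace,
        "bigDataPools",
    ]
    if pool:
        segments.append(pool)
    # Percent-encode each path segment so unusual names cannot break the path.
    path = "/".join(quote(segment, safe="") for segment in segments)
    query = urlencode({"api-version": api_version})
    return f"{ARM_ENDPOINT}/{path}?{query}"


if __name__ == "__main__":
    # Placeholder values, not real resources.
    print(spark_pool_url("my-subscription-id", "my-rg", "my-workspace"))
    print(spark_pool_url("my-subscription-id", "my-rg", "my-workspace", "sparkpool1"))
```

Sending a GET to the first URL would list pools and a GET to the second would fetch one pool, assuming the caller has the right permissions on the workspace.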


Time Travel with Delta Tables in Synapse

Liliam Leme reverses the clock:

Scenario

While working with a customer, they had a requirement to restore modified files to a specific point in time. They had built their architecture on top of a Data lake.

Looking for options

While working on this scenario, we explored some storage options available without any side customization, for example, Soft delete for blobs – Azure Storage | Microsoft Docs.

Read on to see what they landed on.
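The post title points at Delta Lake time travel, so here is a rough sketch of what reading a Delta table as of a point in time can look like from PySpark. This is an illustration under assumptions rather than the approach in the linked post: the helper names and the example path are hypothetical, and the `spark` argument is assumed to be an existing session with Delta Lake available, as in a Synapse notebook. The `timestampAsOf` and `versionAsOf` reader options are the ones Delta Lake documents for time travel.

```python
from typing import Dict, Optional


def time_travel_options(timestamp: Optional[str] = None,
                        version: Optional[int] = None) -> Dict[str, str]:
    """Return Delta reader options for exactly one time travel selector.

    Hypothetical helper: pass either a timestamp string or a table
    version number, not both.
    """
    if (timestamp is None) == (version is None):
        raise ValueError("Provide exactly one of timestamp or version")
    if timestamp is not None:
        return {"timestampAsOf": timestamp}
    return {"versionAsOf": str(version)}


def read_delta_as_of(spark, path: str,
                     timestamp: Optional[str] = None,
                     version: Optional[int] = None):
    """Read a Delta table at `path` as it was at a timestamp or version.

    `spark` is assumed to be an active SparkSession with Delta Lake support.
    """
    options = time_travel_options(timestamp=timestamp, version=version)
    return spark.read.format("delta").options(**options).load(path)


# Example call in a notebook (the path is a placeholder):
# df = read_delta_as_of(spark, "abfss://container@account.dfs.core.windows.net/table",
#                       timestamp="2022-11-01 00:00:00")
```

Keeping the option-building step separate from the read makes the selector logic easy to check without a Spark session.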
