Category: Synapse Analytics

Executing Multiple Notebooks in one Spark Pool with Genie

Shalu Ganotra Chadra et al. explain what Synapse Genie is:

The Genie framework is a metadata-driven utility written in Python. It is implemented using threading (ThreadPoolExecutor from the concurrent.futures module) and a directed acyclic graph (NetworkX library). It consists of a wrapper notebook that reads metadata of notebooks and executes them within a single Spark session. Each notebook is invoked on a thread with the mssparkutils.notebook.run() command, based on the available resources in the Spark pool. The dependencies between notebooks are understood and tracked through a directed acyclic graph.
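To make the thread-pool-plus-DAG idea concrete, here is a minimal sketch of that kind of scheduling loop. It is not Genie's actual code: it swaps Python's standard-library graphlib.TopologicalSorter in for NetworkX, and run_notebook is a hypothetical stand-in for mssparkutils.notebook.run() so the sketch runs outside Synapse.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_notebook(name: str) -> str:
    # Hypothetical stand-in for mssparkutils.notebook.run(path, timeout, params);
    # here it just pretends the notebook succeeded and returns an exit value.
    return f"{name}: succeeded"


def run_dag(dependencies: dict[str, set[str]], max_workers: int = 4) -> list[str]:
    """Run each notebook on a worker thread once all of its upstream notebooks finish.

    `dependencies` maps a notebook name to the set of notebooks it depends on.
    Returns the notebook names in the order they completed.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    completed: list[str] = []
    running: dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every notebook whose dependencies have all completed
            for name in sorter.get_ready():
                running[pool.submit(run_notebook, name)] = name
            # Block until at least one running notebook finishes
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raises the exception if the notebook failed
                sorter.done(name)  # may unlock downstream notebooks
                completed.append(name)
    return completed
```

For example, run_dag({"bronze": set(), "silver": {"bronze"}, "gold": {"silver"}}) runs the three notebooks strictly in sequence, while independent notebooks with no edges between them share the thread pool concurrently, capped by max_workers.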

Read on for more information about how you can use it and what the setup process looks like.

Cosmos DB to Data Explorer Synapse Link

Vincent-Philippe Lauzon makes an announcement:

We recently made our new Kusto data connection available in public preview: Cosmos DB to Azure Data Explorer Synapse Link.

This does look like a marketing-heavy announcement, but the short version is that you can ingest data from Cosmos DB into Data Explorer pools via Synapse Link rather than building your own ETL process. The previous Cosmos DB connector for Synapse Link was tied to a dedicated SQL pool.

Finding Data Factory Objects in Synapse Studio

Kevin Chant pulls out the magnifying glass and compass:

In this post I want to cover where you can find Azure Data Factory objects in Synapse Studio. I want to do this post for a couple of reasons.

The first reason is that at the start of the year I published a post on how to automate a Data Factory pipeline migration to an Azure Synapse Analytics workspace using Azure DevOps.

Even though I showed one way that you can automate the migration of a Data Factory to a Synapse workspace, I did not show where you can view the migrated objects in Synapse Studio.

The second reason is that I keep telling everybody they can use the same pipelines in Azure Synapse Analytics, but the objects can be found in different places.

Read on for the two places to find Data Factory objects.

Sharing Results between Notebooks with MSSparkUtils

Liliam Leme provides an answer to a common Synapse Spark pool question:

I’ve been reviewing customer questions centered around “Have I tried using MSSparkUtils to solve the problem?”

One of the questions asked was how to share results between notebooks. Every time you hit “run” in a notebook, it starts a new Spark cluster, which means that each notebook uses a different session, making it impossible to share results between executions of notebooks. MSSparkUtils offers a solution to handle this exact scenario.
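One way MSSparkUtils can help is through notebook exit values: a child notebook ends with mssparkutils.notebook.exit(value), and the parent receives that value as the return of mssparkutils.notebook.run(path, timeout_seconds, params). Exit values are strings, so structured results are usually serialized, for example as JSON. The sketch below simulates that round trip in plain Python; child_notebook, parent_notebook, the min_amount parameter and the sample data are all hypothetical, and the real mssparkutils calls appear only in comments.

```python
import json


def child_notebook(params: dict) -> str:
    # Hypothetical body of a child notebook. In Synapse it would end with
    # mssparkutils.notebook.exit(json.dumps(summary)); exit values are strings,
    # so the structured result is serialized to JSON first.
    orders = [120.0, 80.5, 310.25, 45.0]  # hypothetical sample data
    big = [amount for amount in orders if amount >= params["min_amount"]]
    summary = {"order_count": len(big), "total": sum(big)}
    return json.dumps(summary)


def parent_notebook() -> dict:
    # Stand-in for:
    # exit_value = mssparkutils.notebook.run("ChildNotebook", 90, {"min_amount": 100})
    exit_value = child_notebook({"min_amount": 100})
    return json.loads(exit_value)  # turn the exit string back into a dict


print(parent_notebook())
```

Serializing to JSON keeps the contract between the two notebooks explicit; for large results, writing to storage or a table and returning only its path is a common alternative.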

Read on to see what MSSparkUtils is and how it helps in this case.

Synapse Runtime for Spark 3.3 Now in Public Preview

Estera Kot has an announcement:

We are excited to announce the preview availability of Apache Spark™ 3.3 on Synapse Analytics. The essential changes include features which come from upgrading Apache Spark to version 3.3.1 and upgrading Delta Lake to version 2.1.0.

Check out the official release notes for Apache Spark 3.3.0 and Apache Spark 3.3.1 for the complete list of fixes and features. In addition, review the migration guidelines between Spark 3.2 and 3.3 to assess potential changes to your applications, jobs and notebooks.

There’s a lot in there. I did snicker a bit at log4j 2 being billed as more secure than log4j 1 given what we saw last year, though that gaping hole has since been fixed.

A Crash Course on Synapse Studio

Kevin Chant wants six minutes of your time:

In this post I want to do a six-minute crash course about Synapse Studio. I wanted to do this follow-up post for a couple of reasons.

The first reason is that a while ago somebody who was fairly new to Azure Data Engineering Services mentioned that they thought a lot of my posts were for advanced users. So, I showed them a previous post which was a five-minute crash course about Synapse Studio.

Whilst showing them that post, I realized that some of the screenshots were out of date. With this in mind, I thought I would do an updated version of the crash course for Synapse Studio, which also allows me to highlight where to find some features.

Start your timers and get reading.

Roll Your Own Row-Level Security for the Serverless SQL Pool

Randheer Parmar wants row-level security:

Row-level security is a key requirement for most database and data lake applications. Most databases have built-in row-level security, but Synapse serverless SQL pool doesn’t support it natively. In this article, we will see how to implement it.
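A common way to roll your own row-level security is a mapping table that says which rows each user may see, plus a view that filters the data against the caller's identity (in T-SQL, via a function such as USER_NAME() or SUSER_SNAME()). The Python sketch below models only that filtering logic; it is not necessarily the approach in the linked article, and the SaleRow type, USER_REGIONS table and user names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleRow:
    region: str
    amount: float


# Hypothetical security mapping table: which regions each user may see
USER_REGIONS = {
    "alice@contoso.com": {"East"},
    "bob@contoso.com": {"East", "West"},
}


def secure_view(rows: list[SaleRow], current_user: str) -> list[SaleRow]:
    # Models a view that joins the data to the mapping table and filters on the
    # caller's identity; users absent from the mapping see nothing (default deny)
    allowed = USER_REGIONS.get(current_user, set())
    return [row for row in rows if row.region in allowed]
```

Defaulting unknown users to an empty set means a missing mapping row hides data rather than exposing it. The per-query join against the mapping table is also where the performance cost of this pattern comes from.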

Row-level security has always seemed to me to be a great idea, but not one I’ve been able to implement, because its performance cost has always been too high.

InvalidAbfsRestOperationException in Synapse Managed VNet

Kamil Nowinski goes down a rabbit hole:

This happens on the customer’s Synapse workspace where we have public network access disabled, so only the private endpoint and managed VNet are available. Additionally, you probably spotted that it took over 3 minutes to actually get this message. Hence, as a next step, in order to narrow down the potential causes, I simplified the query to make sure I have access to the storage by listing the files.

Click through for a story of pain, followed by glorious resolution.

Data Exfiltration Protection and Synapse Pipelines

Luke Moloney shuts it down:

Before we discuss how data exfiltration protection (DEP) applies to Synapse Pipelines, it is important to level-set on some Synapse Pipelines-specific concepts. If you are familiar with Synapse Pipelines or Azure Data Factory, you can skip this section and jump to Synapse Pipeline connectivity without DEP enabled.

For a more generalized introduction to Synapse Pipelines check out this doc article.

Synapse Pipelines enables users to connect to a range of different data services through what is called a Linked Service.

The big trick, using self-hosted integration runtimes, is something Luke spends a fair amount of time on.
