Synapse Analytics – Page 15

TokenLibrary and Data Exfiltration Protection

Published 2021-12-31 by Kevin Feasel

I troubleshoot an unfortunate error message:

Based on this error message, the command is timing out after one minute. My initial thought is that there must be a network restriction preventing me from communicating with Key Vault, and I know what my first answer is.

This story even has a red herring, which means it’s a set of junior sleuths and a talking dog away from being a ’90s cartoon.

Data Exfiltration Protection and Pip

Published 2021-12-30 by Kevin Feasel

I have a post born of frustration:

I have an Azure Synapse Analytics workspace which uses a managed virtual network and includes data exfiltration protection. I also have a Spark pool. My goal is to import a few packages and use them in a Spark notebook.
Doing so is pretty easy from the Synapse workspace. I navigate to the Manage hub and then choose Apache Spark pools from the Analytics pools menu. Select the ellipsis for my Spark pool and then choose Packages.
From there, because I plan to update Python packages, I can upload a requirements.txt file and have Pip do its job.

But then it doesn’t… Click through to learn why, as well as the workaround for this. It’s stuff like this which makes me say data exfiltration protection is a feature administrators will (mostly) like and developers will hate, especially because the error message itself gave no obvious indication of why this was happening.

Creating a Synapse Workspace with Data Exfiltration Protection

Published 2021-12-29 by Kevin Feasel

I have a post on creating a new Azure Synapse Analytics workspace:

As a quick upshot, having a managed VNet set up means that any Spark pools you create will have subnet segregation, meaning that the Spark machines will be in their own subnet, away from everything else. This provides a bit of cross-pool protection for you automatically. It also performs similar network isolation for your Synapse workspace, keeping it separated from other workspaces. The other big thing it does is create managed private endpoints to the serverless and dedicated SQL pools, which means that any network traffic between these pools and resources in the Synapse workspace will be guaranteed to transit over Azure networks and not the public internet, at least until it gets to you hitting the web.azuresynapse.net URL (and there are additional methods to lock down that part of it that we won’t cover today).
By default, the portal will not create a managed virtual network, so you’ll need to enable it at creation time. You cannot enable or disable the managed virtual network setting after a workspace has been created, so if you make a mistake, you’d need to rebuild the workspace, though you can at least use the same storage account.
One last thing that managed virtual networks offer you is the ability to enable data exfiltration protection.

Click through to see how it all works. Data exfiltration protection can limit you a bit, and that can be quite frustrating, but it does what it says…in the same way that Draconis did what he said.

Azure Synapse Analytics: Success by Design

Published 2021-12-23 by Kevin Feasel

Wolfgang Strasser digs up some documents:

Today, I stumbled upon a very interesting link – the Azure Synapse Analytics – Success by Design site (follow this link).
If you need guidance, best practices links, POC playbooks, links to blogs & videos, tools, ... THIS is the site you need to bookmark.

Click through for a bit more information, as well as links to other relevant Azure Synapse Analytics resources.

Dedicated SQL Pool Index, Distribution, and Partition Guidance

Published 2021-12-22 by Kevin Feasel

I have a write-up on the specific value of distributions, indexes, and partitions in Azure Synapse Analytics dedicated SQL pools:

Not too long ago, I ended up taking the DP-203 certification exam for sundry reasons. On that exam, they ask a lot about Azure Synapse Analytics, including indexing, distribution, and partitioning strategies. Because these can be a bit different from on-premises SQL Server, I wanted to cover what options are available and when you might choose them. Let’s start with distributions, as that’s the biggest change in thought process.

Read on for the guidance.
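
To make the distribution idea a little more concrete, here is a minimal sketch that compares hash distribution with round-robin distribution. It is a toy model, not how the engine is implemented: a dedicated SQL pool does spread each table across 60 distributions, but `zlib.crc32` is only a stand-in for the pool's internal hash function, and the function names and sample data are made up for illustration.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def hash_distribute(keys):
    """Assign each row to a distribution by hashing its distribution key."""
    # zlib.crc32 is a stand-in; the pool's real hash function is internal.
    return Counter(zlib.crc32(str(k).encode("utf-8")) % DISTRIBUTIONS for k in keys)


def round_robin_distribute(keys):
    """Assign rows to distributions in turn, ignoring their values."""
    return Counter(i % DISTRIBUTIONS for i, _ in enumerate(keys))


# Hypothetical data: 6,000 rows but only three distinct key values.
skewed_keys = ["US", "CA", "MX"] * 2000

hashed = hash_distribute(skewed_keys)
spread = round_robin_distribute(skewed_keys)

# Number of distributions that actually received rows under each strategy.
print(len(hashed), len(spread))
```

With only three distinct key values, hashing can fill at most three of the 60 distributions, while round-robin spreads the rows evenly. That is the usual argument for picking a high-cardinality column as a hash key, and the trade-off is that round-robin gives up co-locating rows that share a key.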

Kicking off Synapse Pipeline with CI/CD

Published 2021-12-22 by Kevin Feasel

Hiram Fleitas builds a process:

I use an Azure Automation account to host my PowerShell script because it provides reusability and management of required modules. Schedules and logging are included but most importantly, it provides a REST API webhook that I can call from pipelines if necessary.

Click through to see how this all works.
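
As a rough illustration of the webhook part, the sketch below builds the kind of HTTP POST a pipeline step might send to an Azure Automation webhook. The URL, token, and payload fields are placeholders I made up, not values from Hiram's post, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for an Azure Automation webhook."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical URL and payload; a real webhook URL embeds a secret token,
# so it should be treated like a password and kept out of source control.
request = build_webhook_request(
    "https://example.invalid/webhooks?token=PLACEHOLDER",
    {"pipelineName": "hypothetical_pipeline"},
)
print(request.get_method(), request.full_url)

# To actually trigger the runbook, send the request:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.status)
```

The webhook only queues the runbook job, so a successful response means the job was accepted, not that the script has finished.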

Azure Synapse Analytics Updates

Published 2021-12-21 by Kevin Feasel

Saveen Reddy catalogs what’s new in Azure Synapse Analytics:

Quick Reuse of Spark clusters
By default, every data flow activity spins up a new Spark cluster based upon the Azure Integration Runtime (IR) configuration. Cold cluster start-up time takes a few minutes. If your pipelines contain multiple sequential data flows, you can enable a time-to-live (TTL) value, which keeps a cluster alive for a certain period of time after its execution completes. If a new job starts using the IR during the TTL duration, it will reuse the existing cluster and start-up time will be greatly reduced.

Read on for the full list of updates.
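
The TTL behavior described in the quote can be sketched as a small simulation. This is a toy model of the idea only: the class name, the timings, and the assumption that jobs run one after another are all mine, not figures from the announcement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterSimulator:
    """Toy model of time-to-live (TTL) cluster reuse; all times in minutes."""

    ttl: float         # how long an idle cluster is kept alive
    cold_start: float  # assumed start-up cost of a brand-new cluster
    warm_start: float  # assumed start-up cost when reusing a live cluster
    idle_since: Optional[float] = None  # when the previous job finished

    def run_job(self, submitted_at: float, duration: float) -> float:
        """Run one job (jobs are assumed sequential) and return its start-up delay."""
        reusable = (
            self.idle_since is not None
            and self.idle_since <= submitted_at <= self.idle_since + self.ttl
        )
        delay = self.warm_start if reusable else self.cold_start
        # The cluster becomes idle again once this job has started and finished.
        self.idle_since = submitted_at + delay + duration
        return delay


# Assumed numbers: 10-minute TTL, 4-minute cold start, 30-second warm start.
sim = ClusterSimulator(ttl=10, cold_start=4, warm_start=0.5)
delays = [sim.run_job(0, 5), sim.run_job(12, 5), sim.run_job(40, 5)]
print(delays)
```

In this run the second job arrives while the first cluster is still inside its TTL window and pays only the warm start, while the third arrives after the window has lapsed and pays the cold start again.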

A Primer on the Serverless SQL Pool

Published 2021-12-20 by Kevin Feasel

Tino Zishiri has some tips for people trying out the Azure Synapse Analytics serverless SQL pool:

Serverless SQL Pools or SQL on-demand is a serverless distributed data processing service offered by Microsoft. The service is comparable to Amazon Athena. The serverless nature of the service means that there is no infrastructure to manage, and you only pay for what you use (pay-per-query model).
Through Serverless SQL pools, you query the data in your data lake using T-SQL. The architecture behind the service is optimized for querying and analyzing big data by running queries in parallel.

Read on to understand where the serverless SQL pool fits, as well as some tips about data transformation with this pool.
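
For a sense of what querying the lake with T-SQL looks like, here is a small sketch that assembles an `OPENROWSET` query as a string. The helper name and the storage URL are placeholders of my own, and the helper only covers the Parquet and Delta formats because those need no extra parser options. Running the generated query still requires a connection to the workspace's serverless endpoint, which is out of scope here.

```python
def build_openrowset_query(storage_url: str, file_format: str = "PARQUET", top: int = 10) -> str:
    """Return a T-SQL query that reads files from a data lake via OPENROWSET."""
    if file_format.upper() not in {"PARQUET", "DELTA"}:
        raise ValueError(f"unsupported format: {file_format}")
    if top <= 0:
        raise ValueError("top must be positive")
    escaped_url = storage_url.replace("'", "''")  # escape single quotes for T-SQL
    return (
        f"SELECT TOP {top} *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{escaped_url}',\n"
        f"    FORMAT = '{file_format.upper()}'\n"
        f") AS [result];"
    )


# Hypothetical storage account, container, and folder.
query = build_openrowset_query(
    "https://examplelake.dfs.core.windows.net/raw/sales/*.parquet"
)
print(query)
```

Because billing is per query, limiting the files matched by the `BULK` path is one of the simpler ways to keep exploratory queries cheap.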

Microsoft.DataFactory and Storage Event Triggers in Synapse

Published 2021-12-16 by Kevin Feasel

Cathrine Wilhelmsen troubleshoots an Azure issue:

I ran into an issue today while trying to publish a storage event trigger in Azure Synapse Analytics. After publishing, I got error messages that said “failed to subscribe” and “failed to activate”. The storage event trigger had been published, but it wouldn’t start. Help!

Click through for some resources on documentation, a few things which didn’t work, and what finally resolved the issue.

Azure Synapse Analytics November Updates

Published 2021-12-10 by Kevin Feasel

James Serra keeps us up to date on Synapse:

Delta Lake support for serverless SQL is generally available: Azure Synapse has had preview-level support for serverless SQL pools querying the Delta Lake format. This enables BI and reporting tools to access data in Delta Lake format through standard T-SQL. With this latest update, the support is now Generally Available and can be used in production. See How to query Delta Lake files using serverless SQL pools.

Click through for the full list of what James likes.
