Press "Enter" to skip to content

Category: Synapse Analytics

January 2022 Updates for Azure Synapse Analytics

Saveen Reddy has an update for us:

You can now easily add data quality, data validation, and schema validation to your Synapse ETL jobs by leveraging the Assert transformation in Synapse data flows. Add expectations to your data streams that will execute from the pipeline data flow activity to evaluate whether each row or column in your data meets your assertion. Tag the rows as pass or fail and add row-level details about how a constraint has been breached. This is a critical new addition to an already effective ETL framework, ensuring that you are loading and processing quality data for your analytical solutions.

Read on for the full list of changes.
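The Assert transformation itself is configured in the data flow designer rather than in code, but here's a rough conceptual sketch in PySpark of the same idea: evaluating row-level expectations and tagging rows as pass or fail (the column names and expectations are invented for illustration, and this is not the Assert transformation's own syntax):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical incoming data; in a data flow this would be your source stream.
df = spark.createDataFrame(
    [(1, "2021-12-01", 150.0), (2, None, 75.5), (3, "2022-01-15", -20.0)],
    ["order_id", "order_date", "amount"],
)

# Evaluate each expectation per row, analogous to Assert's expectation checks.
checked = (
    df.withColumn("assert_order_date_not_null", F.col("order_date").isNotNull())
      .withColumn("assert_amount_positive", F.col("amount") > 0)
)

# Tag rows as pass/fail, keeping row-level detail on which constraint failed.
tagged = checked.withColumn(
    "assert_passed",
    F.col("assert_order_date_not_null") & F.col("assert_amount_positive"),
)

tagged.filter(~F.col("assert_passed")).show()  # rows that breached a constraint
```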


Synapse and ADF Pipeline Dependency Diagrams

Kamil Nowinski uses one of my favorite tools for diagram creation:

Documenting object dependencies in ETL processes is a tough task, regardless of whether it is SSIS, ADF, pipelines in Azure Synapse, or another system. The reasons for needing to understand the current solution vary: handover to another team or team member, troubleshooting, refactoring, debugging, investigating dependencies due to an error or performance issue, or wanting to remove selected or duplicated pipelines or logic.

But there is never a good time to write documentation, and even when it has been done, no one knows how up to date it is. The situation is not helped by the fact that quite often there are no (free or built-in) tools for generating such documentation. Sound familiar? I bet it does.

Click through to learn more and to see how to use that tool (Mermaid).
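Mermaid describes diagrams in plain text, which makes it easy to generate from metadata. As a hypothetical sketch (not Kamil's tool), here are a few lines of Python that walk exported pipeline JSON files and emit a Mermaid flowchart of Execute Pipeline dependencies; the folder path is a placeholder and node names aren't sanitized:

```python
import json
from pathlib import Path

def pipeline_edges(folder: str):
    """Yield (pipeline, referenced_pipeline) pairs from exported pipeline JSON."""
    for path in Path(folder).glob("*.json"):
        doc = json.loads(path.read_text())
        name = doc.get("name", path.stem)
        for activity in doc.get("properties", {}).get("activities", []):
            # Execute Pipeline activities reference other pipelines by name.
            if activity.get("type") == "ExecutePipeline":
                yield name, activity["typeProperties"]["pipeline"]["referenceName"]

def to_mermaid(edges) -> str:
    """Render the dependency pairs as a Mermaid flowchart definition."""
    lines = ["graph LR"]
    for parent, child in edges:
        lines.append(f"    {parent} --> {child}")
    return "\n".join(lines)

print(to_mermaid(pipeline_edges("./pipelines")))  # placeholder export folder
```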


Automating Pipeline Migration to Synapse via Azure DevOps

Kevin Chant deploys some Synapse pipelines:

In this post I want to cover how you can automate a pipeline migration to a Synapse workspace using Azure DevOps, as a follow-up to a previous post I did about one way to copy an Azure Data Factory pipeline to Synapse Studio.

Because even though that post is good, it deserves a follow-up showing an automated way of doing it. I wanted to show that it can be done more gracefully.

And we all want to be graceful, right?
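To give a sense of what such automation does under the covers, here is a minimal sketch of pushing a pipeline definition to a Synapse workspace through its REST API, the sort of call an Azure DevOps task would wrap. The workspace name, pipeline name, and file path are placeholders, not anything from Kevin's post:

```python
import json
import requests
from azure.identity import DefaultAzureCredential

workspace = "mysynapseworkspace"  # placeholder workspace name
pipeline_name = "CopyPipeline"    # placeholder pipeline name

# The Synapse development endpoint uses its own Azure AD scope.
credential = DefaultAzureCredential()
token = credential.get_token("https://dev.azuresynapse.net/.default").token

# A pipeline definition exported from ADF / Synapse Studio (placeholder path);
# the API expects the JSON body to carry the pipeline's "properties".
with open("pipeline.json") as f:
    definition = json.load(f)

resp = requests.put(
    f"https://{workspace}.dev.azuresynapse.net/pipelines/{pipeline_name}",
    params={"api-version": "2020-12-01"},
    headers={"Authorization": f"Bearer {token}"},
    json=definition,
)
resp.raise_for_status()
print(resp.status_code)  # 200, or 202 while the create/update is processed
```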


Using Synapse Link for Cosmos DB

I have a post combining Synapse Link for Cosmos DB and the Spark to Synapse SQL Connector:

In this post, we saw how to enable Cosmos DB’s Analytical store, access data using Synapse Link for Cosmos DB, and use the Spark to Synapse SQL Connector to move that data into a dedicated SQL pool. We saw how to do this in a workspace using a managed virtual network with data exfiltration protection enabled, meaning this is the largest number of steps necessary.

Click through for product descriptions and step-by-step instructions.
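For a taste of the first step, reading the analytical store from a Synapse Spark notebook looks something like this (the linked service and container names are placeholders; the subsequent write goes through the Spark to Synapse SQL Connector, whose syntax is in the post):

```python
# Read from the Cosmos DB analytical store via Synapse Link.
# "CosmosDbLinkedService" and "Orders" are placeholder names; `spark` is the
# session predefined in a Synapse notebook.
df = (spark.read
      .format("cosmos.olap")
      .option("spark.synapse.linkedService", "CosmosDbLinkedService")
      .option("spark.cosmos.container", "Orders")
      .load())

df.createOrReplaceTempView("orders")

# From here, the Spark to Synapse SQL Connector moves the data into a
# dedicated SQL pool table; see the post for the connector calls.
```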


Azure Synapse Analytics Integration Points

Warner Chaves takes us through several integration points with Azure Synapse Analytics:

Azure Stream Analytics allows for in-flight querying of streaming data from Blob storage, Data Lake Storage, IoT Hub, or Event Hubs. The querying is done through an easily adoptable SQL language, and it really speeds up the development of a streaming solution.

The nice thing here is that Stream Analytics allows the use of a Synapse SQL Pool table as the target for the results of the streaming query. So, this is another way to do near real-time analytics by passing data from a streaming source through a Stream Analytics job and into a Synapse table. You could do this to pre-aggregate data on the fly, score data in real-time, perform real-time calculations over specific time or event windows, etc.

Click through for several examples of this.


Simple Mapping Data Flows in Synapse

Joshuha Owen announces a new feature:

This week, we are excited to announce the public preview of Map Data, a new feature for Azure Synapse Analytics and Database Templates! The Map Data tool is a guided process to help users create ETL mappings and mapping data flows from their source data to Synapse lake database tables without writing code. This experience will help you get started with transformations into your Synapse lake database quickly while still giving you the power of Mapping Data Flows.

This process starts with the user choosing the destination tables in Synapse lake databases and then mapping their source data into these tables. We will be following up with a demo video shortly.

Click through for more details on how it works.


New Azure Synapse Database Templates

Kevin Schofield has some new database templates for us:

The Synapse Database Template for Automotive Industries is a comprehensive data model that addresses the typical data requirements of organizations engaged in manufacturing automobiles, heavy vehicles, tires, and other automotive components.

The Synapse Database Template for Genomics is a comprehensive data model that addresses the typical data requirements of organizations engaged in acquiring and analyzing genomic data about human beings or other species.

Click through for more information on these, as well as two other fields.


Data Exfiltration Protection and Pip

I have a post borne from frustration:

I have an Azure Synapse Analytics workspace which uses a managed virtual network and includes data exfiltration protection. I also have a Spark pool. My goal is to import a few packages and use them in a Spark notebook.

Doing so is pretty easy from the Synapse workspace. I navigate to the Manage hub and then choose Apache Spark pools from the Analytics pools menu. Select the ellipsis for my Spark pool and then choose Packages.

From there, because I plan to update Python packages, I can upload a requirements.txt file and have Pip do its job.

But then it doesn’t… Click through to learn why, as well as the workaround. It’s stuff like this that makes me say data exfiltration protection is a feature administrators will (mostly) like and developers will hate, especially because the error message itself gives no obvious indication of why this is happening.
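For context, the package upload expects a standard pip requirements file, along these lines (the packages and version pins are just examples):

```text
# requirements.txt uploaded via the Spark pool's Packages settings
# (example packages and versions only)
xgboost==1.5.1
shap==0.40.0
```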


Creating a Synapse Workspace with Data Exfiltration Protection

I have a post on creating a new Azure Synapse Analytics workspace:

As a quick upshot, having a managed VNet set up means that any Spark pools you create will have subnet segregation, meaning that the Spark machines will be in their own subnet, away from everything else. This provides a bit of cross-pool protection for you automatically. It also performs similar network isolation for your Synapse workspace, keeping it separated from other workspaces. The other big thing it does is create managed private endpoints to the serverless and dedicated SQL pools, which means that any network traffic between these pools and resources in the Synapse workspace is guaranteed to transit over Azure networks rather than the public internet, at least until it reaches you at the web.azuresynapse.net URL (and there are additional methods to lock down that part of it that we won’t cover today).

By default, the portal will not create a managed virtual network, so you’ll need to enable it at creation time. You cannot enable or disable the managed virtual network setting after a workspace has been created, so if you make a mistake, you’d need to rebuild the workspace, though you can at least use the same storage account.

One last thing that managed virtual networks offer you is the ability to enable data exfiltration protection.

Click through to see how it all works. Data exfiltration protection can limit you a bit, and that can be quite frustrating, but it does what it says…in the same way that Draconis did what he said.
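If you'd rather script the workspace creation than click through the portal, here is a rough sketch using the azure-mgmt-synapse SDK, mainly to show where the managed virtual network and data exfiltration protection settings plug in. All names and the region are placeholders, and the exact model fields are my reading of the SDK, so verify against its documentation:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient
from azure.mgmt.synapse.models import (
    DataLakeStorageAccountDetails,
    ManagedIdentity,
    ManagedVirtualNetworkSettings,
    Workspace,
)

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = SynapseManagementClient(DefaultAzureCredential(), subscription_id)

workspace = Workspace(
    location="eastus",
    identity=ManagedIdentity(type="SystemAssigned"),
    default_data_lake_storage=DataLakeStorageAccountDetails(
        account_url="https://mydatalake.dfs.core.windows.net",  # placeholder
        filesystem="workspace",
    ),
    sql_administrator_login="sqladminuser",
    sql_administrator_login_password="<generate-a-real-password>",
    # Must be set at creation time; it cannot be toggled afterward.
    managed_virtual_network="default",
    managed_virtual_network_settings=ManagedVirtualNetworkSettings(
        prevent_data_exfiltration=True,
    ),
)

poller = client.workspaces.begin_create_or_update(
    "my-resource-group", "my-synapse-workspace", workspace
)
print(poller.result().provisioning_state)
```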
