Press "Enter" to skip to content

Category: Synapse Analytics

From Cosmos DB to Dedicated SQL Pools via Synapse Link

Jovan Popovic shows off Azure Synapse Link:

At the time of writing this article, the dedicated SQL pool doesn't have the ability to read data from CosmosDB/Dataverse using the Synapse link. There are scenarios where you would need to use CosmosDB data in your dedicated SQL pool, so you would need to find a way to load that data. In theory, you could create an ADF pipeline that reads data from CosmosDB or Dataverse and stores it in the dedicated SQL pool as a target. This might be a problem if your pipeline is reading data directly from the CosmosDB account, because it might impact both operational workload performance and cost. The analytical storage is the recommended location that you should use to fetch all data from CosmosDB/Dataverse.

In this post, I will describe how to use a two-step approach where you export your data using the serverless SQL pool via Synapse Link into Azure Data Lake storage, and then load that data into the dedicated SQL pool table.

A couple of weeks back, I wrote about another method of doing this through the Spark pool. Now you have two options.
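
If you want a rough feel for the two steps, a minimal T-SQL sketch might look something like the following; the account, key, storage path, and object names are all placeholders, and it assumes the staging schema, external data source, and external file format already exist in the serverless SQL pool:

```sql
-- Step 1 (serverless SQL pool): export the Cosmos DB analytical store
-- to Parquet files in the data lake using CETAS over the Synapse Link data.
-- MyDataLake and ParquetFormat stand in for an existing external data source
-- and external file format; all names here are placeholders.
CREATE EXTERNAL TABLE staging.Orders
WITH (
    LOCATION = 'orders/',
    DATA_SOURCE = MyDataLake,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT *
FROM OPENROWSET(
         'CosmosDB',
         'Account=myCosmosAccount;Database=SalesDb;Key=<account key>',
         Orders
     ) AS docs;

-- Step 2 (dedicated SQL pool): load the exported Parquet files into a table.
COPY INTO dbo.Orders
FROM 'https://mydatalake.dfs.core.windows.net/export/orders/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
```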


January 2022 Updates for Azure Synapse Analytics

Saveen Reddy has an update for us:

You can now easily add data quality, data validation, and schema validation to your Synapse ETL jobs by leveraging the Assert transformation in Synapse data flows. Add expectations to your data streams that will execute from the pipeline data flow activity to evaluate whether each row or column in your data meets your assertion. Tag the rows as pass or fail and add row-level details about how a constraint has been breached. This is a critical new addition to an already effective ETL framework, helping ensure that you are loading and processing quality data for your analytical solutions.

Read on for the full list of changes.


Synapse and ADF Pipeline Dependency Diagrams

Kamil Nowinski uses one of my favorite tools for diagram creation:

Documenting the object dependencies of ETL processes is a tough task, regardless of whether it is SSIS, ADF, pipelines in Azure Synapse, or other systems. The reasons for needing to understand the current solution can vary: handover to another team or team member, troubleshooting, refactoring, debugging, investigating dependencies due to an error or performance issue, or wanting to remove selected or duplicated pipelines or logic.

But there is never a good time to write documentation, and even if it has been done, no one knows how up to date it is. The situation is not improved by the fact that quite often there is a lack of (free or built-in) tools for generating such documentation. Sounds familiar? I bet it does.

Click through to learn more and to see how to use that tool (Mermaid).


Automating Pipeline Migration to Synapse via Azure DevOps

Kevin Chant deploys some Synapse pipelines:

In this post I want to cover how you can automate a pipeline migration to a Synapse workspace using Azure DevOps. It is a follow-up to a previous post I did about one way to copy an Azure Data Factory pipeline to Synapse Studio.

Even though that post is good, it deserves a follow-up showing an automated way of doing it. I wanted to show that it can be done more gracefully.

And we all want to be graceful, right?


Using Synapse Link for Cosmos DB

I have a post combining Synapse Link for Cosmos DB and the Spark to Synapse SQL Connector:

In this post, we saw how to enable Cosmos DB's Analytical store, access data using Synapse Link for Cosmos DB, and use the Spark to Synapse SQL Connector to move that data into a dedicated SQL pool. We saw how to do this in a workspace using a managed virtual network with data exfiltration protection enabled, meaning this scenario requires the largest number of steps.

Click through for product descriptions and step-by-step instructions.
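
As a small taste, the first step (exposing the analytical store to Spark) can be done entirely in Spark SQL, along the lines of the sketch below; the linked service and container names are placeholders, and the subsequent write through the Spark to Synapse SQL Connector happens from Scala or Python rather than SQL:

```sql
-- Expose a Cosmos DB container's analytical store as a Spark table via
-- Synapse Link; the linked service and container names are placeholders.
CREATE TABLE SalesOrders
USING cosmos.olap
OPTIONS (
    spark.synapse.linkedService 'CosmosDbLinkedService',
    spark.cosmos.container 'SalesOrders'
);

-- Query the analytical store without touching the transactional store.
SELECT COUNT(*) AS OrderCount FROM SalesOrders;
```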


Azure Synapse Analytics Integration Points

Warner Chaves takes us through several integration points with Azure Synapse Analytics:

Azure Stream Analytics allows for in-flight querying of streaming data from Blob storage, Data Lake Storage, IoT Hub, or Event Hubs. The querying is done through an easily adoptable SQL language and it really speeds up the development of a streaming solution.

The nice thing here is that Stream Analytics allows the use of a Synapse SQL Pool table as the target for the results of the streaming query. So, this is another way to do near real-time analytics by passing data from a streaming source through a Stream Analytics job and into a Synapse table. You could do this to pre-aggregate data on the fly, score data in real-time, perform real-time calculations over specific time or event windows, etc.

Click through for several examples of this.
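
To give a feel for the shape of such a job, here is a minimal Stream Analytics query sketch; the input alias, output alias, and column names are hypothetical, with the output configured to land in a Synapse SQL pool table:

```sql
-- Aggregate events per device over one-minute tumbling windows and write
-- the results to an output bound to a dedicated SQL pool table.
-- IoTHubInput and SynapseSqlPoolOutput are hypothetical input/output aliases.
SELECT
    DeviceId,
    COUNT(*)           AS EventCount,
    AVG(Temperature)   AS AvgTemperature,
    System.Timestamp() AS WindowEnd
INTO
    SynapseSqlPoolOutput
FROM
    IoTHubInput TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY
    DeviceId,
    TumblingWindow(minute, 1)
```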


Simple Mapping Data Flows in Synapse

Joshuha Owen announces a new feature:

This week, we are excited to announce the public preview for Map Data, a new feature for Azure Synapse Analytics and Database Templates! The Map Data tool is a guided process to help users create ETL mappings and mapping data flows from their source data to Synapse lake database tables without writing code. This experience will help you get started with transformations into your Synapse Lake database quickly but still give you the power of Mapping Data Flows.

This process starts with the user choosing the destination tables in Synapse lake databases and then mapping their source data into these tables. We will be following up with a demo video shortly.

Click through for more details on how it works.


New Azure Synapse Database Templates

Kevin Schofield has some new database templates for us:

The Synapse Database Template for Automotive Industries is a comprehensive data model that addresses the typical data requirements of organizations engaged in manufacturing automobiles, heavy vehicles, tires, and other automotive components.

The Synapse Database Template for Genomics is a comprehensive data model that addresses the typical data requirements of organizations engaged in acquiring and analyzing genomic data about human beings or other species.

Click through for more information on these, as well as two other fields.
